I am trying to extract dates from email texts using the datefinder Python library.
Below is a code snippet of what I am trying to do.
    import datefinder

    # body has list of email texts
    email_dates = []
    for b in body:
        dates = datefinder.find_dates(b)
        date = []
        for d in dates:
            date.append(d)
        email_dates.append(date)
datefinder tries to turn every number in the email into a date, so I get a lot of false positives. I can remove those with some logic. But I get an IllegalMonthError in some emails, and I am unable to get past the error and retrieve dates from the other emails. Below is the error:
    ---------------------------------------------------------------------------
    IllegalMonthError                         Traceback (most recent call last)
    c:\python\python38\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
        654         try:
    --> 655             ret = self._build_naive(res, default)
        656         except ValueError as e:

    c:\python\python38\lib\site-packages\dateutil\parser\_parser.py in _build_naive(self, res, default)
       1237
    -> 1238         if cday > monthrange(cyear, cmonth)[1]:
       1239             repl['day'] = monthrange(cyear, cmonth)[1]

    c:\python\python38\lib\calendar.py in monthrange(year, month)
        123     if not 1 <= month <= 12:
    --> 124         raise IllegalMonthError(month)
        125     day1 = weekday(year, month, 1)

    IllegalMonthError: bad month number 42; must be 1-12

    During handling of the above exception, another exception occurred:

    TypeError                                 Traceback (most recent call last)
    <ipython-input-39-1fbacc8ca3f6> in <module>
          7     dates = datefinder.find_dates(b)
          8     date = []
    ----> 9     for d in dates:
         10         date.append(d)
         11

    c:\python\python38\lib\site-packages\datefinder\__init__.py in find_dates(self, text, source, index, strict)
         30         ):
         31
    ---> 32             as_dt = self.parse_date_string(date_string, captures)
         33             if as_dt is None:
         34                 ## Dateutil couldn't make heads or tails of it

    c:\python\python38\lib\site-packages\datefinder\__init__.py in parse_date_string(self, date_string, captures)
        100         # otherwise self._find_and_replace method might corrupt them
        101         try:
    --> 102             as_dt = parser.parse(date_string, default=self.base_date)
        103         except (ValueError, OverflowError):
        104             # replace tokens that are problematic for dateutil

    c:\python\python38\lib\site-packages\dateutil\parser\_parser.py in parse(timestr, parserinfo, **kwargs)
       1372         return parser(parserinfo).parse(timestr, **kwargs)
       1373     else:
    -> 1374         return DEFAULTPARSER.parse(timestr, **kwargs)
       1375
       1376

    c:\python\python38\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
        655             ret = self._build_naive(res, default)
        656         except ValueError as e:
    --> 657             six.raise_from(ParserError(e.args[0] + ": %s", timestr), e)
        658
        659         if not ignoretz:

    TypeError: unsupported operand type(s) for +: 'int' and 'str'
Suppose I get this error in the 5th email: I will not be able to retrieve dates from the 5th email onwards. How can I bypass this error, remove the entries causing it, and retrieve all the other dates?
Thanks in advance.
Answer
Use a try/except block:
    from calendar import IllegalMonthError

    try:
        datefinder.find_dates(b)
    except IllegalMonthError as e:
        # this will print the error, but will not stop the program
        print(e)
    except Exception as e:
        # any other unexpected error will be propagated
        raise e
Update from the edits:
Notice that the traceback shows that the exception is raised at this line:
    ----> 9 for d in dates:
Indeed, the documentation for find_dates says that it returns a generator:
Returns a generator that produces datetime.datetime objects, or a tuple with the source text and index, if requested
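To see what that means in practice, here is a small stand-alone sketch. It does not use datefinder at all: parse_all is a hypothetical stand-in for any lazy parser, and the only point is that calling a generator function runs none of its body, so an error can only surface once you start iterating.

```python
def parse_all(tokens):
    """Hypothetical stand-in for a lazy parser; this is not datefinder's code."""
    for token in tokens:
        yield int(token)  # raises ValueError on a token that is not a number


events = []

try:
    numbers = parse_all(["4", "oops", "15"])  # nothing has been parsed yet
    events.append("call succeeded")
except ValueError:
    events.append("call failed")

try:
    for n in numbers:  # parsing happens here, one item at a time
        events.append(n)
except ValueError:
    events.append("iteration failed")

print(events)  # ['call succeeded', 4, 'iteration failed']
```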
So the actual parsing of the date does not happen when you call find_dates, but when you iterate over the results. This makes it trickier to wrap in a try/except, as you have to advance the generator item by item, each step in its own try/except block:
    from datefinder import find_dates

    string_with_dates = """
    ...
    entries are due by January 4th, 2017 at 8:00pm
    ...
    created 01/15/2005 by ACME Inc. and associates.
    ...
    Liverpool NY 13088 42 cases
    """

    matches = find_dates(string_with_dates)
    print(type(matches))  # <class 'generator'>

    while True:
        try:
            m = next(matches)
        # this is the exception seen by the program, rather than IllegalMonthError
        except TypeError as e:
            print(f"TypeError {e}")
            continue
        # the generator has no more items
        except StopIteration as e:
            print(f"StopIteration {e}")
            break
        # any other unexpected error will be propagated
        except Exception as e:
            raise e
        print(f"m {m}")
You can do with m whatever you need.
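For the original loop over many emails, the same idea can be sketched as a small helper. One caveat: in Python, a generator that has raised an exception is finished and cannot be resumed, so the call to next() after the error simply gives StopIteration. You keep the dates found before the bad match in that email, and every other email is still processed. The names collect_dates and fake_finder are hypothetical, and fake_finder is only a stand-in so the sketch runs without datefinder installed.

```python
def collect_dates(texts, finder):
    """Return one list of results per text, keeping what was found before a failure.

    `finder` is any callable that returns an iterator for a text; the
    assumption here is that you would pass datefinder.find_dates.
    """
    all_dates = []
    for text in texts:
        found = []
        try:
            for match in finder(text):
                found.append(match)
        except (TypeError, ValueError) as e:
            # TypeError is what surfaced in the traceback above; the generator
            # is finished after raising, so we move on to the next text.
            print(f"stopped early on one text: {e}")
        all_dates.append(found)
    return all_dates


def fake_finder(text):
    """Hypothetical stand-in for find_dates: yields words, fails on '42'."""
    for word in text.split():
        if word == "42":
            raise TypeError("bad month number 42")
        yield word


result = collect_dates(["a b", "c 42 d", "e"], fake_finder)
print(result)  # [['a', 'b'], ['c'], ['e']]
```

With datefinder installed, the call would presumably look like collect_dates(body, datefinder.find_dates), which gives one list of dates per email, in the same shape as email_dates in the question.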
Cheers!