I am trying to extract dates from email texts using the datefinder Python library.
Below is the code snippet of what I am trying to do.
import datefinder
# body is a list of email texts
email_dates=[]
for b in body: 
    dates = datefinder.find_dates(b)
    date = []
    for d in dates:
        date.append(d)
    email_dates.append(date)
datefinder tries to turn every number in the email into a date, so I get a lot of false positives. I can remove those with some logic. But I get an IllegalMonthError on some emails, and I am unable to get past the error and retrieve dates from the other emails. Below is the error:
---------------------------------------------------------------------------
IllegalMonthError                         Traceback (most recent call last)
c:\python\python38\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    654         try:
--> 655             ret = self._build_naive(res, default)
    656         except ValueError as e:
c:\python\python38\lib\site-packages\dateutil\parser\_parser.py in _build_naive(self, res, default)
   1237 
-> 1238             if cday > monthrange(cyear, cmonth)[1]:
   1239                 repl['day'] = monthrange(cyear, cmonth)[1]
c:\python\python38\lib\calendar.py in monthrange(year, month)
    123     if not 1 <= month <= 12:
--> 124         raise IllegalMonthError(month)
    125     day1 = weekday(year, month, 1)
IllegalMonthError: bad month number 42; must be 1-12
During handling of the above exception, another exception occurred:
TypeError                                 Traceback (most recent call last)
<ipython-input-39-1fbacc8ca3f6> in <module>
      7     dates = datefinder.find_dates(b)
      8     date = []
----> 9     for d in dates:
     10         date.append(d)
     11 
c:\python\python38\lib\site-packages\datefinder\__init__.py in find_dates(self, text, source, index, strict)
     30         ):
     31 
---> 32             as_dt = self.parse_date_string(date_string, captures)
     33             if as_dt is None:
     34                 ## Dateutil couldn't make heads or tails of it
c:\python\python38\lib\site-packages\datefinder\__init__.py in parse_date_string(self, date_string, captures)
    100         # otherwise self._find_and_replace method might corrupt them
    101         try:
--> 102             as_dt = parser.parse(date_string, default=self.base_date)
    103         except (ValueError, OverflowError):
    104             # replace tokens that are problematic for dateutil
c:\python\python38\lib\site-packages\dateutil\parser\_parser.py in parse(timestr, parserinfo, **kwargs)
   1372         return parser(parserinfo).parse(timestr, **kwargs)
   1373     else:
-> 1374         return DEFAULTPARSER.parse(timestr, **kwargs)
   1375 
   1376 
c:\python\python38\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    655             ret = self._build_naive(res, default)
    656         except ValueError as e:
--> 657             six.raise_from(ParserError(e.args[0] + ": %s", timestr), e)
    658 
    659         if not ignoretz:
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Suppose I get this error on the 5th email: I will not be able to retrieve dates from the 5th email onwards. How can I bypass this error, remove the entries causing it, and retrieve all the other dates?
Thanks in advance.
Answer
Use a try/except block:
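# IllegalMonthError is defined in the standard-library calendar module
from calendar import IllegalMonthError
try: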
    datefinder.find_dates(b)
except IllegalMonthError as e:
    # this will print the error, but will not stop the program
    print(e)
except Exception as e:
    # any other unexpected error will be propagated
    raise e
Update from the edits:
Notice that the traceback shows
----> 9 for d in dates:
so the exception is raised here. Indeed, the documentation for find_dates says that it returns a generator:
Returns a generator that produces datetime.datetime objects, or a tuple with the source text and index, if requested
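To make the consequence of "returns a generator" concrete, here is a minimal stand-in sketch. The `lazy_parse` function is hypothetical and is not part of datefinder; it only mimics a generator-based parser so the timing of the error can be observed without the library installed.

```python
def lazy_parse(tokens):
    # Hypothetical stand-in for a generator-based parser such as find_dates.
    for tok in tokens:
        yield int(tok)  # raises ValueError on a non-numeric token


events = []

# Calling a generator function does not run its body, so nothing fails here.
gen = lazy_parse(["1", "x", "3"])
events.append("created")

try:
    # The body runs only as items are requested, so the error surfaces here.
    for value in gen:
        events.append(value)
except ValueError:
    events.append("error")

print(events)
```

The bad token only causes an error once the loop reaches it, which is why a try/except placed around the call alone would not see it.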
So the actual parsing of the date is not done when you call find_dates, but when you iterate over the results. This makes it trickier to wrap in a try/except, as you have to pull items from the generator one at a time with next(), inside a try/except block:
from datefinder import find_dates
string_with_dates = """
...
entries are due by January 4th, 2017 at 8:00pm
...
created 01/15/2005 by ACME Inc. and associates.
...
Liverpool NY 13088 42 cases
"""
matches = find_dates(string_with_dates)
print(type(matches))  # <class 'generator'>
while True:
    try:
        m = next(matches)
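    # this is the exception seen by the program, rather than IllegalMonthError
    except TypeError as e:
        print(f"TypeError {e}")
        # a generator that has raised is finished, so the next call to next()
        # raises StopIteration and the loop ends in the branch below
        continue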
    # the generator has no more items
    except StopIteration as e:
        print(f"StopIteration {e}")
        break
    # any other unexpected error will be propagated
    except Exception as e:
        raise e
    print(f"m {m}")
You can then do whatever you need with m.
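Applied to the loop over emails from the question, the same idea could look roughly like the sketch below. The helper name `collect_dates` and the stand-in `fake_find_dates` are assumptions made for illustration; in real use you would pass `datefinder.find_dates` as the second argument. The set of caught exceptions is also a guess based on the traceback above and may need adjusting for your dateutil version.

```python
def collect_dates(texts, find_dates):
    """Return one list of matches per text, moving on when a text fails to parse."""
    all_dates = []
    for text in texts:
        found = []
        try:
            # Errors surface here, during iteration, not at the find_dates call.
            for match in find_dates(text):
                found.append(match)
        except (TypeError, ValueError, OverflowError) as exc:
            # calendar.IllegalMonthError is a ValueError subclass, so it is covered.
            print(f"stopped early on one text: {exc!r}")
        all_dates.append(found)
    return all_dates


def fake_find_dates(text):
    # Hypothetical stand-in for datefinder.find_dates, used only for this demo.
    for token in text.split():
        if token == "BAD":
            raise TypeError("unsupported operand type(s) for +: 'int' and 'str'")
        yield token


result = collect_dates(["a b", "c BAD d", "e"], fake_find_dates)
print(result)
```

In the demo, the second text keeps only what was found before the failure, because a generator that has raised cannot be resumed; the third text is still processed normally. With this shape, one bad email no longer stops the run, though any dates after the failing match in that same email are lost.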
Cheers!