
Using datetime.strptime when data is a little messy: extra spaces, Jan or January

The text I'm currently dealing with consists of dates in a somewhat standard format; however, the data isn't very clean.

For example the text can be in these formats:

Jan. 1, 2021 (dot after Jan)
Jan, 1 2021 (comma after Jan)
January, 1 2020 (Full month with comma)
Jan,  1 2020 (two spaces after Jan, instead of one)

I'm not quite sure how to deal with this. I want to convert these strings into YYYY-MM-DD format (e.g. 2021-01-01).

My plan was to convert to datetime object, and then convert back to string.
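For an input that already matches the format exactly, that string-to-datetime-and-back round trip looks roughly like this. It is a minimal standard-library sketch, and the sample string is an assumption chosen to fit the format:

```python
from datetime import datetime

# Parse a clean string into a datetime object...
dt = datetime.strptime("Jan 1, 2021", "%b %d, %Y")
# ...then format it back out as year-month-day.
iso = dt.strftime("%Y-%m-%d")
print(iso)
```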

However, when using strptime, the format string seemingly has to be rigid and doesn't allow for regex-like patterns:

import datetime
print(datetime.datetime.strptime(timestamp, '%b %d, %Y'))  # only matches e.g. "Jan 1, 2021"

instead of something like '%b|%B\s[.,]?'.
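To see that rigidity concretely, a small check like the following tries each sample from the list above against '%b %d, %Y'. This is a sketch; the helper name `parses` is hypothetical. Each messy variant is rejected with ValueError, while a string that matches the format exactly is accepted:

```python
from datetime import datetime

samples = ["Jan. 1, 2021", "Jan, 1 2021", "January, 1 2020", "Jan,  1 2020"]

def parses(text: str, fmt: str = "%b %d, %Y") -> bool:
    """Hypothetical helper: True if strptime accepts text with fmt."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

for s in samples:
    print(f"{s!r}: {parses(s)}")
```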

Does anyone have suggestions for converting my text into year-month-day format?
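If adding a dependency isn't an option, one standard-library workaround (an alternative technique, not something from the original question) is to do the regex matching yourself, rebuild a clean string, and only then hand it to strptime. The helper name `to_iso` and the exact pattern are assumptions tuned to the four sample formats listed above:

```python
import re
from datetime import datetime

# Hypothetical pattern: month name, optional "." and/or "," after it,
# optional "," after the day, and any run of whitespace between parts.
_DATE_RE = re.compile(r"^\s*([A-Za-z]+)\.?\s*,?\s*(\d{1,2})\s*,?\s*(\d{4})\s*$")

def to_iso(text: str) -> str:
    """Convert e.g. 'Jan. 1, 2021' or 'January, 1 2020' to 'YYYY-MM-DD'."""
    m = _DATE_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognised date: {text!r}")
    month, day, year = m.groups()
    clean = f"{month} {day} {year}"
    # Try the abbreviated month name (%b) first, then the full name (%B);
    # strptime still validates the month name and the day-of-month.
    for fmt in ("%b %d %Y", "%B %d %Y"):
        try:
            return datetime.strptime(clean, fmt).strftime("%Y-%m-%d")
        except ValueError:
            pass
    raise ValueError(f"could not parse date: {text!r}")

for s in ["Jan. 1, 2021", "Jan, 1 2021", "January, 1 2020", "Jan,  1 2020"]:
    print(to_iso(s))
```

Trying %b before %B lets the same helper accept both "Jan" and "January"; anything the pattern doesn't anticipate (for example "Sept.") still raises ValueError rather than being guessed at.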


Answer

You can try the dateutil library (installed as python-dateutil; it's one of the most downloaded PyPI packages):

>>> from dateutil import parser
>>>
>>> print(parser.parse("Jan. 1, 2021"))
2021-01-01 00:00:00
>>>
>>> print(parser.parse("Jan, 1 2021"))
2021-01-01 00:00:00
>>>
>>> print(parser.parse("January, 1 2020"))
2020-01-01 00:00:00
>>>
>>> print(parser.parse("Jan,  1 2020"))
2020-01-01 00:00:00
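Since the question asks for a 2021-01-01 style string rather than a full datetime, the parsed result can be formatted with the standard datetime methods. A minimal sketch, assuming python-dateutil is installed; `dateutil_to_iso` is a hypothetical helper name:

```python
from dateutil import parser  # third-party: pip install python-dateutil

def dateutil_to_iso(text: str) -> str:
    """Hypothetical helper: parse a loosely formatted date, return 'YYYY-MM-DD'."""
    # parser.parse returns a datetime; .date().isoformat() drops the time part.
    return parser.parse(text).date().isoformat()

for s in ["Jan. 1, 2021", "Jan, 1 2021", "January, 1 2020", "Jan,  1 2020"]:
    print(dateutil_to_iso(s))
```

Note that parser.parse is deliberately lenient; for all-numeric dates such as 01/02/2021 the day/month order matters, and it can be controlled with the dayfirst argument.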
User contributions licensed under: CC BY-SA