The text I'm currently dealing with consists of dates in a somewhat standard format, but the data isn't very clean.
For example, the text can be in any of these formats:
- Jan. 1, 2021 (dot after Jan)
- Jan, 1 2021 (comma after Jan)
- January, 1 2020 (full month name with comma)
- Jan,  1 2020 (two spaces after Jan, instead of one)
I’m not quite sure how to deal with this.
I want to convert these strings into the 2021-01-01 format.
My plan was to convert each string to a datetime object and then format it back into a string.
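For a string that already matches one fixed layout, that plan works as-is: strptime parses the text into a datetime object and strftime formats it back out. A minimal sketch, assuming the clean "Jan 1, 2021" layout and English month names under the default C locale (to_iso is a hypothetical helper name):

```python
from datetime import datetime

# to_iso is a hypothetical helper name, not part of any library.
def to_iso(text: str) -> str:
    """Parse a clean 'Mon D, YYYY' string and return it as 'YYYY-MM-DD'."""
    # strptime only succeeds if the input matches the pattern exactly;
    # otherwise it raises ValueError.
    parsed = datetime.strptime(text, "%b %d, %Y")
    # strftime renders the datetime back into the target string layout.
    return parsed.strftime("%Y-%m-%d")

print(to_iso("Jan 1, 2021"))  # 2021-01-01
```

A variant such as "Jan. 1, 2021" makes this raise ValueError, which is exactly the rigidity problem with the messier inputs.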
However, strptime seems to require a rigid pattern and doesn't allow regex-like alternatives. This, for example, only matches one exact layout:
import datetime
print(datetime.datetime.strptime(timestamp, '%b %d, %Y'))  # timestamp is one of the strings above
I'd like to write something like '%b|%B\s[.,]?' instead, but strptime doesn't support that.
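Since strptime won't take alternations, one workaround is to do the regex part first and hand strptime a normalized string. A minimal sketch using only the standard library, assuming the only variations are the ones listed above (a dot or comma after the month, the comma in either position, extra spaces, abbreviated or full English month names under the default C locale); normalize_date is a hypothetical name:

```python
import re
from datetime import datetime

# normalize_date is a hypothetical helper name. It assumes English month
# names and only the dot/comma/extra-space variations from the question.
def normalize_date(text: str) -> str:
    """Convert strings like 'Jan. 1, 2021' or 'January, 1 2020' to 'YYYY-MM-DD'."""
    # Collapse any run of dots, commas and whitespace into a single space,
    # so every variant becomes e.g. 'Jan 1 2021' or 'January 1 2020'.
    cleaned = re.sub(r"[.,\s]+", " ", text).strip()
    # strptime has no alternation, so try the abbreviated month name (%b)
    # first and fall back to the full month name (%B).
    for fmt in ("%b %d %Y", "%B %d %Y"):
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {text!r}")

for s in ["Jan. 1, 2021", "Jan, 1 2021", "January, 1 2020", "Jan,  1 2020"]:
    print(normalize_date(s))
```

Trying formats in order is a simple way to cover both %b and %B; an input that matches neither still raises ValueError, so malformed rows surface instead of being silently guessed.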
Does anyone have suggestions for converting this text into year-month-day format?
Answer
You can try the dateutil library (python-dateutil on PyPI, one of the most downloaded PyPI packages):
>>> from dateutil import parser
>>> print(parser.parse("Jan. 1, 2021"))
2021-01-01 00:00:00
>>> print(parser.parse("Jan, 1 2021"))
2021-01-01 00:00:00
>>> print(parser.parse("January, 1 2020"))
2020-01-01 00:00:00
>>> print(parser.parse("Jan,  1 2020"))
2020-01-01 00:00:00
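parser.parse returns a datetime, so one more step gives the 2021-01-01 string the question asks for. A sketch, assuming python-dateutil is installed (to_iso_date is a hypothetical helper name):

```python
from dateutil import parser

# to_iso_date is a hypothetical helper name; requires python-dateutil.
def to_iso_date(text: str) -> str:
    """Parse a loosely formatted date string and return 'YYYY-MM-DD'."""
    # parser.parse tolerates the stray dots, commas and extra spaces.
    parsed = parser.parse(text)
    # .date() drops the 00:00:00 time part; isoformat() yields YYYY-MM-DD.
    return parsed.date().isoformat()

for s in ["Jan. 1, 2021", "Jan, 1 2021", "January, 1 2020", "Jan,  1 2020"]:
    print(to_iso_date(s))
```

One caveat: when a field is missing from the input, parser.parse fills it in from its default, which is today's date at midnight, so a string like "Jan 2021" quietly gets today's day of the month. Passing an explicit default=datetime(...) to parser.parse makes that behavior predictable.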