The text I am dealing with consists of dates in a somewhat standard format, but the data isn’t very clean.
For example, the text can be in any of these formats:
Jan. 1, 2021 (dot after Jan)
Jan, 1 2021 (comma after Jan)
January, 1 2020 (full month with comma)
Jan,  1 2020 (two spaces after Jan, instead of one)
I’m not quite sure how to deal with this.
I want to convert these strings into 2021-01-01 format.
My plan was to convert each string to a datetime object, and then convert it back to a string.
However, when using strptime, the pattern seemingly needs to be rigid and doesn’t allow for regex-like patterns.
import datetime
print(datetime.datetime.strptime(timestamp, '%b %d, %Y'))
instead of something like '%b|%Bs[.,]?'
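To make the problem concrete, here is a minimal reproduction sketch, assuming the sample strings listed earlier plus one clean string for comparison. The helper name fits_format is hypothetical and used only for illustration. It simply reports whether strptime accepts each string under the single format '%b %d, %Y' (month names are assumed to be in the default English locale).

```python
import datetime


def fits_format(timestamp, fmt="%b %d, %Y"):
    """Return True if strptime accepts timestamp under fmt, else False.

    fits_format is a hypothetical helper, not part of the datetime module.
    """
    try:
        datetime.datetime.strptime(timestamp, fmt)
        return True
    except ValueError:
        # strptime raises ValueError when the string does not match the format
        return False


samples = [
    "Jan 1, 2021",      # clean string that matches the format exactly
    "Jan. 1, 2021",     # dot after Jan
    "Jan, 1 2021",      # comma after Jan
    "January, 1 2020",  # full month name with comma
    "Jan,  1 2020",     # two spaces after the comma
]

for s in samples:
    print(repr(s), fits_format(s))
```

Only the clean string is accepted; each of the messy variants is rejected, which is why a single strptime pattern does not seem to be enough here.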
Does anyone have suggestions for how to convert my text into year-month-day format?
Answer
You can try the dateutil library (it’s one of the most downloaded PyPI packages):
>>> from dateutil import parser
>>>
>>> print(parser.parse("Jan. 1, 2021"))
2021-01-01 00:00:00
>>>
>>> print(parser.parse("Jan, 1 2021"))
2021-01-01 00:00:00
>>>
>>> print(parser.parse("January, 1 2020"))
2020-01-01 00:00:00
>>>
>>> print(parser.parse("Jan,  1 2020"))
2020-01-01 00:00:00
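parser.parse returns a datetime object, so the printed values still carry a 00:00:00 time part. To get the 2021-01-01 string the question asks for, one option is to take the date and format it. The sketch below assumes python-dateutil is installed. The helper name to_iso_date is hypothetical, not something dateutil provides.

```python
from dateutil import parser  # third-party: pip install python-dateutil


def to_iso_date(text):
    """Parse a loosely formatted date string and return it as 'YYYY-MM-DD'.

    to_iso_date is a hypothetical helper name, not part of dateutil.
    """
    # parser.parse returns a datetime; .date() drops the time part,
    # and isoformat() renders the date as year-month-day.
    return parser.parse(text).date().isoformat()


samples = ["Jan. 1, 2021", "Jan, 1 2021", "January, 1 2020", "Jan,  1 2020"]
for s in samples:
    print(to_iso_date(s))
```

Because dateutil guesses the layout of each string, it may be worth spot-checking the results on a sample of the real data, and wrapping the call in a try/except ValueError if some rows might not be dates at all.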