I have a dataframe with a column of dates in the format MMDDYYYY. I want to convert the dates into the format YYYY-MM-DD. This works for most dates, but for 7-digit dates starting with 1, the output is wrong. In this example, the last 3 rows are wrong. There are too many rows to hardcode the correct values.
OriginalDates (MMDDYYYY)  OutputDates (YYYY-MM-DD)  ExpectedDates (YYYY-MM-DD)  Correct Output?
5011989                   1989-05-01                1989-05-01                  Yes
6011989                   1989-06-01                1989-06-01                  Yes
12042009                  2009-12-04                2009-12-04                  Yes
01012001                  2001-01-01                2001-01-01                  Yes
1161955                   1955-11-06                1955-01-16                  No
1051991                   1991-10-05                1991-01-05                  No
1011933                   1933-10-01                1933-01-01                  No
My code:
df['OutputDates'] = pd.to_datetime(df['OriginalDates'], format='%m%d%Y')
df['OutputDates'] = pd.to_datetime(df['OutputDates'], format='%Y-%m-%d')
Answer
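With format='%m%d%Y', the parser reads two digits for the month whenever they form a valid month, so 1161955 is read as month 11, day 6 instead of month 1, day 16. Here is a solution using string slicing. It is not the cleanest, but it does what you need: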
import pandas as pd

def format_date(x):
    # 7 digits: the month has no leading zero (MDDYYYY)
    if len(x) == 7:
        return x[-4:] + '-' + x[1:3] + '-' + x[0:1]
    # 8 digits: full MMDDYYYY
    if len(x) == 8:
        return x[-4:] + '-' + x[2:4] + '-' + x[0:2]

# Rebuild each value as a YYYY-DD-MM string, then parse it with the matching format
df['OriginalDates (MMDDYYYY)'] = df['OriginalDates (MMDDYYYY)'].apply(lambda x: format_date(str(x)))
df['OriginalDates (MMDDYYYY)'] = pd.to_datetime(df['OriginalDates (MMDDYYYY)'], format='%Y-%d-%m')
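If the 7-digit values are only missing the leading zero of the month (that is an assumption about the data, not something the question confirms), a shorter route might be to left-pad every value to 8 characters with `str.zfill` and then parse with the original `%m%d%Y` format. The sketch below builds a small sample frame from the rows in the question; the frame itself and the `padded` variable are made up for illustration, and the column names follow the question's code.

```python
import pandas as pd

# Sample rows copied from the question (the frame itself is illustrative)
df = pd.DataFrame({
    "OriginalDates": [5011989, 6011989, 12042009, "01012001",
                      1161955, 1051991, 1011933]
})

# Convert to strings and left-pad to 8 characters so the month always has two digits
padded = df["OriginalDates"].astype(str).str.zfill(8)

# Every value is now unambiguous MMDDYYYY
df["OutputDates"] = pd.to_datetime(padded, format="%m%d%Y")

print(df)
```

This writes the result to a separate `OutputDates` column, so the original values stay available for checking.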