I have a dataframe with a column of dates in the format MMDDYYYY. I want to convert the dates into the format YYYY-MM-DD. This works for most dates, but for some dates starting with 1 the output is wrong. In this example, the last 3 rows are wrong. There are many rows, so I cannot hardcode the correct values.
OriginalDates (MMDDYYYY)  OutputDates (YYYY-MM-DD)  ExpectedDates (YYYY-MM-DD)  Correct Output?
5011989                   1989-05-01                1989-05-01                  Yes
6011989                   1989-06-01                1989-06-01                  Yes
12042009                  2009-12-04                2009-12-04                  Yes
01012001                  2001-01-01                2001-01-01                  Yes
1161955                   1955-11-06                1955-01-16                  No
1051991                   1991-10-05                1991-01-05                  No
1011933                   1933-10-01                1933-01-01                  No
My code:
df['OutputDates'] = pd.to_datetime(df['OriginalDates'], format='%m%d%Y')
df['OutputDates'] = pd.to_datetime(df['OutputDates'], format='%Y-%m-%d')
Answer
Here is a solution using string slicing. It is not the cleanest approach, but it does what you need:
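import pandas as pd

def format_date(x):
    # 7 characters: the month lost its leading zero (MDDYYYY)
    if len(x) == 7:
        return x[-4:] + '-' + x[1:3] + '-' + x[0]
    # 8 characters: full MMDDYYYY
    if len(x) == 8:
        return x[-4:] + '-' + x[2:4] + '-' + x[0:2]

# Build a YYYY-DD-MM string, then parse it with the matching format
df['OriginalDates (MMDDYYYY)'] = df['OriginalDates (MMDDYYYY)'].apply(lambda x: format_date(str(x)))
df['OriginalDates (MMDDYYYY)'] = pd.to_datetime(df['OriginalDates (MMDDYYYY)'], format='%Y-%d-%m')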
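As a possible alternative to slicing, the sketch below pads each value to 8 characters before parsing. Judging by the table in the question, the wrong rows appear to come from 7-character values where `%m` consumes two digits (for example `11` in `1161955`), so restoring the dropped leading zero removes the ambiguity. This assumes that every 7-character value is missing the zero of the month and that the day is always two digits, which matches the expected output in the question but is not guaranteed for other data. The sample dataframe and the column names `OriginalDates` and `OutputDates` are taken from the question; the variable names `padded` and `result` are only illustrative.

```python
import pandas as pd

# Sample values from the question, stored as strings (an assumption;
# integer columns are handled by the astype(str) call below).
df = pd.DataFrame({
    "OriginalDates": ["5011989", "12042009", "01012001",
                      "1161955", "1051991", "1011933"]
})

# Left-pad with zeros to 8 characters so every value reads as MMDDYYYY.
padded = df["OriginalDates"].astype(str).str.zfill(8)

# Parse with the original format; no ambiguity remains after padding.
df["OutputDates"] = pd.to_datetime(padded, format="%m%d%Y")

# Render as YYYY-MM-DD strings for inspection.
result = df["OutputDates"].dt.strftime("%Y-%m-%d").tolist()
print(result)
```

This keeps the result in a separate `OutputDates` column, so the original values stay available for checking.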