Python Pandas – Datetime gives wrong output only for certain dates

I have a dataframe with a column of dates in the format MMDDYYYY. I want to convert the dates into the format YYYY-MM-DD. This works for most dates, but for dates starting with 1 the output is wrong. In this example, the last 3 rows are wrong. There are too many rows to hardcode the correct values.

 OriginalDates (MMDDYYYY)   OutputDates (YYYY-MM-DD)   ExpectedDates (YYYY-MM-DD)   Correct Output?
 5011989                    1989-05-01                 1989-05-01                   Yes
 6011989                    1989-06-01                 1989-06-01                   Yes
 12042009                   2009-12-04                 2009-12-04                   Yes
 01012001                   2001-01-01                 2001-01-01                   Yes
 1161955                    1955-11-06                 1955-01-16                   No
 1051991                    1991-10-05                 1991-01-05                   No
 1011933                    1933-10-01                 1933-01-01                   No

My code:

df['OutputDates'] = pd.to_datetime(df['OriginalDates'], format='%m%d%Y')
df['OutputDates'] = pd.to_datetime(df['OutputDates'], format='%Y-%m-%d')
                


Answer

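The wrong rows come from values whose month lost its leading zero: with format='%m%d%Y', the %m directive takes two digits whenever they form a valid month (10 to 12), so 1161955 is read as month 11, day 6 instead of month 1, day 16. Here is a fix using string slicing. It is not the cleanest solution, but it does what you need: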

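import pandas as pd

def format_date(x):
    # 7 characters: the month lost its leading zero, so the layout is MDDYYYY
    if len(x) == 7:
        return x[3:] + '-' + x[1:3] + '-' + x[0]

    # 8 characters: the full MMDDYYYY layout
    if len(x) == 8:
        return x[4:] + '-' + x[2:4] + '-' + x[:2]

    # Any other length is unexpected; return None explicitly
    return None

# Build 'YYYY-DD-MM' strings, then parse them with the matching format
df['OriginalDates (MMDDYYYY)'] = df['OriginalDates (MMDDYYYY)'].apply(lambda x: format_date(str(x)))
df['OriginalDates (MMDDYYYY)'] = pd.to_datetime(df['OriginalDates (MMDDYYYY)'], format='%Y-%d-%m')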
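
A shorter alternative, offered as a minimal sketch: if every 7-character value is simply an MMDDYYYY date whose leading zero was dropped (an assumption, but it fits the sample data), you can pad the values back to 8 characters with zeros and parse them directly. The sample frame below is rebuilt from the table in the question, and the name padded is only illustrative.

```python
import pandas as pd

# Sample values rebuilt from the table in the question
df = pd.DataFrame({
    "OriginalDates": [5011989, 6011989, 12042009, "01012001", 1161955, 1051991, 1011933]
})

# Left-pad every value with zeros to 8 characters so the month always has two digits
padded = df["OriginalDates"].astype(str).str.zfill(8)

# With a fixed-width MMDDYYYY string, %m%d%Y is no longer ambiguous
df["OutputDates"] = pd.to_datetime(padded, format="%m%d%Y")

print(df)
```

This relies on the day always being written with two digits. If some 7-character values were really a two-digit month followed by a one-digit day, neither this approach nor the slicing above could tell them apart.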
User contributions licensed under: CC BY-SA