i have a dataframe data
d=pd.DataFrame({"dat":["01-06-68", "01-06-57","14-02-80","01-01-04","07-11-20"], "j":[34,2,1,7,8]})
i want to convert the dat column to “YYYY-MM-DD” format which is currently in dd-mm-yy format
Code using
pd.to_datetime(d.dat)
The output of this is coming out to be wrong
0 2068-01-06 1 2057-01-06 2 1980-02-14 3 2004-01-01 4 2020-07-11 Name: dat, dtype: datetime64[ns]
Problems
- it was supposed to be giving output year as 1968 and not 2068
- Months and date are also not coming in proper order
Required Output:
0 1968-06-01 1 1957-06-01 2 1980-02-14 3 2004-01-01 4 2020-11-07 Name: dat, dtype: datetime64[ns]
Advertisement
Answer
Solution with replace in callable for test last digits and then use %Y
for match years in YYYY format:
f = lambda x: '19' + x.group() if int(x.group()) > 22 else '20' + x.group() d.dat = d.dat.str.replace('(d+)$', f, regex=True) d.dat = pd.to_datetime(d.dat, format='%d-%m-%Y') print (d) dat j 0 1968-06-01 34 1 1957-06-01 2 2 1980-02-14 1 3 2004-01-01 7 4 2020-11-07 8
Or subtract 100 years if year greater like 2022
:
d.dat = pd.to_datetime(d.dat, format='%d-%m-%y') d.dat = d.dat.mask(d.dat.dt.year.gt(2022), d.dat - pd.offsets.DateOffset(years=100)) print (d) dat j 0 1968-06-01 34 1 1957-06-01 2 2 1980-02-14 1 3 2004-01-01 7 4 2020-11-07 8