I have a DataFrame:
    import pandas as pd

    d = pd.DataFrame({"dat": ["01-06-68", "01-06-57", "14-02-80", "01-01-04", "07-11-20"],
                      "j": [34, 2, 1, 7, 8]})
I want to convert the dat column, which is currently in dd-mm-yy format, to “YYYY-MM-DD” format.
The code I am using:
    pd.to_datetime(d.dat)
The output is wrong:
    0   2068-01-06
    1   2057-01-06
    2   1980-02-14
    3   2004-01-01
    4   2020-07-11
    Name: dat, dtype: datetime64[ns]
Problems:
- The year should come out as 1968, not 2068.
- The day and month are also swapped.
Required Output:
    0   1968-06-01
    1   1957-06-01
    2   1980-02-14
    3   2004-01-01
    4   2020-11-07
    Name: dat, dtype: datetime64[ns]
Answer
One solution is to use str.replace with a callable that tests the trailing year digits and prepends the century, then parse with %Y so the years are matched in YYYY format:
    # Prepend '19' when the two-digit year is greater than 22, otherwise '20'
    f = lambda x: '19' + x.group() if int(x.group()) > 22 else '20' + x.group()
    # Expand the trailing digits of each string to a four-digit year
    d.dat = d.dat.str.replace(r'(\d+)$', f, regex=True)
    d.dat = pd.to_datetime(d.dat, format='%d-%m-%Y')

    print(d)
             dat   j
    0 1968-06-01  34
    1 1957-06-01   2
    2 1980-02-14   1
    3 2004-01-01   7
    4 2020-11-07   8
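The century rule in the callable can be checked on its own, without pandas. The following is only a minimal sketch using the standard library: the names PIVOT, expand_year and parse_ddmmyy are made up for illustration, and the pivot of 22 is taken from the answer above, so adjust it to your data.

```python
import re
from datetime import datetime

# Assumption: two-digit years above this pivot belong to the 1900s.
PIVOT = 22


def expand_year(match: re.Match) -> str:
    """Prepend the century to the two-digit year found by the regex."""
    yy = match.group()
    century = "19" if int(yy) > PIVOT else "20"
    return century + yy


def parse_ddmmyy(text: str) -> datetime:
    """Parse a dd-mm-yy string after expanding the year to four digits."""
    full = re.sub(r"\d{2}$", expand_year, text)
    return datetime.strptime(full, "%d-%m-%Y")


samples = ["01-06-68", "01-06-57", "14-02-80", "01-01-04", "07-11-20"]
parsed = [parse_ddmmyy(s).strftime("%Y-%m-%d") for s in samples]
print(parsed)
# ['1968-06-01', '1957-06-01', '1980-02-14', '2004-01-01', '2020-11-07']
```

Passing format='%d-%m-%Y' explicitly is also what fixes the swapped day and month, since the parser no longer has to guess the field order.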
Or parse with %y and then subtract 100 years wherever the resulting year is greater than 2022:
    # %y expands two-digit years, so 68 becomes 2068 at this point
    d.dat = pd.to_datetime(d.dat, format='%d-%m-%y')

    # Shift dates whose year is after 2022 back by a century
    d.dat = d.dat.mask(d.dat.dt.year.gt(2022), d.dat - pd.offsets.DateOffset(years=100))
    print(d)

             dat   j
    0 1968-06-01  34
    1 1957-06-01   2
    2 1980-02-14   1
    3 2004-01-01   7
    4 2020-11-07   8
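To see why the subtraction is needed, it may help to look at how %y behaves in plain Python: two-digit years 00 to 68 are mapped to 2000 to 2068, and 69 to 99 to 1969 to 1999. The sketch below uses only the standard library rather than pandas; CUTOFF_YEAR and parse_with_cutoff are hypothetical names, and the cutoff of 2022 is taken from the answer above.

```python
from datetime import datetime

# Assumption: no date in the data should fall after this year.
CUTOFF_YEAR = 2022


def parse_with_cutoff(text: str, cutoff: int = CUTOFF_YEAR) -> datetime:
    """Parse dd-mm-yy with %y, then move too-late years back by 100."""
    parsed = datetime.strptime(text, "%d-%m-%y")
    if parsed.year > cutoff:
        # For years 2023 to 2068 the shifted year has the same leap-year
        # status, so replace() does not fail on 29 February here.
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed


# %y on its own places 68 in the 2000s and 69 in the 1900s
print(datetime.strptime("01-06-68", "%d-%m-%y").year)  # 2068
print(datetime.strptime("01-06-69", "%d-%m-%y").year)  # 1969

print(parse_with_cutoff("01-06-68").strftime("%Y-%m-%d"))  # 1968-06-01
print(parse_with_cutoff("07-11-20").strftime("%Y-%m-%d"))  # 2020-11-07
```

Either approach relies on a cutoff chosen by hand, so a two-digit year cannot represent dates on both sides of it at once.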