For this dataframe (link):
JavaScript
x
7
1
Issue date Redemption date
2
0 14-Jan-20 14-Jan-21
3
1 31-Jul-20 29-Jan-21
4
2 14-Feb-20 12-Feb-21
5
3 31-Aug-20 26-Feb-21
6
4 1-Sep-10 1-Mar-21
7
I want to convert columns [Issue date, Redemption date] to datetime format.
The format variations are:
- Month is lower case: 01-set-09
- Month in Italian 01-set-09 (settembre-italian instead of September-English)
I have tried using:
JavaScript
1
2
1
df['Issue date'] = pd.to_datetime(df['Issue date'], infer_datetime_format=True)
2
and this one:
JavaScript
1
2
1
pd.to_datetime(df['Issue date'], format='%d-%b-%y')
2
however i get the following error:
JavaScript
1
2
1
ParserError: Unknown string format: 01-set-09
2
Here is a working code that generates my error:
JavaScript
1
13
13
1
import pandas as pd
2
import numpy as np
3
#setup dateframe
4
url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv"
5
skiprows=4
6
df = pd.read_csv(url, sep=None, skiprows=skiprows)
7
print (df.dtypes)
8
9
# convert date wont work
10
df['Issue date'] = pd.to_datetime(df['Issue date'], infer_datetime_format=True)
11
#also tried this ad wont work
12
pd.to_datetime(df['Issue date'], format='%d-%b-%y')
13
EDIT FROM MrFuppes answer
when running the code suggested (I am using GoogleColab):
JavaScript
1
17
17
1
import locale
2
import pandas as pd
3
4
url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv"
5
df = pd.read_csv(url, skiprows=4)
6
7
locale.setlocale(locale.LC_TIME, "en_us") # try English first
8
df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce')
9
10
locale.setlocale(locale.LC_TIME, "it_it") # we also have Italian...
11
m = df['datetime'].isnull() # where English did not work
12
df.loc[m, 'datetime'] = pd.to_datetime(df['Issue date'][m], format='%d-%b-%y', errors='coerce')
13
14
print(df['datetime'].isnull().value_counts())
15
# False 238
16
# True 2 # these two were already NaN on import
17
I get the following error:
JavaScript
1
17
17
1
Error Traceback (most recent call last)
2
<ipython-input-1-be04fb51ee5f> in <module>()
3
5 df = pd.read_csv(url, skiprows=4)
4
6
5
----> 7 locale.setlocale(locale.LC_TIME, "en_us") # try English first
6
8 df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce')
7
9
8
9
/usr/lib/python3.7/locale.py in setlocale(category, locale)
10
606 # convert to string
11
607 locale = normalize(_build_localename(locale))
12
--> 608 return _setlocale(category, locale)
13
609
14
610 def resetlocale(category=LC_ALL):
15
16
Error: unsupported locale setting
17
Advertisement
Answer
If you know which languages to expect in advance, another option would be to set the appropriate locale. Side note, the names of the locales (i.e. the string to use for setting a specific one) depend on the operating system.
Example, Windows locale:
JavaScript
1
17
17
1
import locale
2
import pandas as pd
3
4
url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv"
5
df = pd.read_csv(url, skiprows=4)
6
7
locale.setlocale(locale.LC_TIME, "en_us") # try English first
8
df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce')
9
10
locale.setlocale(locale.LC_TIME, "it_it") # we also have Italian...
11
m = df['datetime'].isnull() # where English did not work
12
df.loc[m, 'datetime'] = pd.to_datetime(df['Issue date'][m], format='%d-%b-%y', errors='coerce')
13
14
print(df['datetime'].isnull().value_counts())
15
# False 238
16
# True 2 # these two were already NaN on import
17