For this dataframe (link):
Issue date Redemption date 0 14-Jan-20 14-Jan-21 1 31-Jul-20 29-Jan-21 2 14-Feb-20 12-Feb-21 3 31-Aug-20 26-Feb-21 4 1-Sep-10 1-Mar-21
I want to convert columns [Issue date, Redemption date] to datetime format.
The format variations are:
- Month is lower case: 01-set-09
- Month in Italian 01-set-09 (settembre-italian instead of September-English)
I have tried using:
df['Issue date'] = pd.to_datetime(df['Issue date'], infer_datetime_format=True)
and this one:
pd.to_datetime(df['Issue date'], format='%d-%b-%y')
however i get the following error:
ParserError: Unknown string format: 01-set-09
Here is a working code that generates my error:
import pandas as pd import numpy as np #setup dateframe url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv" skiprows=4 df = pd.read_csv(url, sep=None, skiprows=skiprows) print (df.dtypes) # convert date wont work df['Issue date'] = pd.to_datetime(df['Issue date'], infer_datetime_format=True) #also tried this ad wont work pd.to_datetime(df['Issue date'], format='%d-%b-%y')
EDIT FROM MrFuppes answer
when running the code suggested (I am using GoogleColab):
import locale import pandas as pd url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv" df = pd.read_csv(url, skiprows=4) locale.setlocale(locale.LC_TIME, "en_us") # try English first df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce') locale.setlocale(locale.LC_TIME, "it_it") # we also have Italian... m = df['datetime'].isnull() # where English did not work df.loc[m, 'datetime'] = pd.to_datetime(df['Issue date'][m], format='%d-%b-%y', errors='coerce') print(df['datetime'].isnull().value_counts()) # False 238 # True 2 # these two were already NaN on import
I get the following error:
Error Traceback (most recent call last) <ipython-input-1-be04fb51ee5f> in <module>() 5 df = pd.read_csv(url, skiprows=4) 6 ----> 7 locale.setlocale(locale.LC_TIME, "en_us") # try English first 8 df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce') 9 /usr/lib/python3.7/locale.py in setlocale(category, locale) 606 # convert to string 607 locale = normalize(_build_localename(locale)) --> 608 return _setlocale(category, locale) 609 610 def resetlocale(category=LC_ALL): Error: unsupported locale setting
Advertisement
Answer
If you know which languages to expect in advance, another option would be to set the appropriate locale. Side note, the names of the locales (i.e. the string to use for setting a specific one) depend on the operating system.
Example, Windows locale:
import locale import pandas as pd url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv" df = pd.read_csv(url, skiprows=4) locale.setlocale(locale.LC_TIME, "en_us") # try English first df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce') locale.setlocale(locale.LC_TIME, "it_it") # we also have Italian... m = df['datetime'].isnull() # where English did not work df.loc[m, 'datetime'] = pd.to_datetime(df['Issue date'][m], format='%d-%b-%y', errors='coerce') print(df['datetime'].isnull().value_counts()) # False 238 # True 2 # these two were already NaN on import