Skip to content
Advertisement

Convert object column format “%d-%b-%y” to datetime, where month initial letter is Upper or lowercase and Month can be in italian or english

For this dataframe (link):

    Issue date Redemption date
0    14-Jan-20       14-Jan-21
1    31-Jul-20       29-Jan-21
2    14-Feb-20       12-Feb-21
3    31-Aug-20       26-Feb-21
4     1-Sep-10        1-Mar-21

I want to convert columns [Issue date, Redemption date] to datetime format.

The format variations are:

  • Month is lower case: 01-set-09
  • Month in Italian 01-set-09 (settembre-italian instead of September-English)

I have tried using:

df['Issue date'] = pd.to_datetime(df['Issue date'], infer_datetime_format=True)

and this one:

pd.to_datetime(df['Issue date'], format='%d-%b-%y')

however i get the following error:

ParserError: Unknown string format: 01-set-09

Here is a working code that generates my error:

import pandas as pd 
import numpy as np 
 #setup dateframe 
url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv"
skiprows=4
df = pd.read_csv(url, sep=None, skiprows=skiprows)
print (df.dtypes)

 # convert date wont work 
df['Issue date'] = pd.to_datetime(df['Issue date'], infer_datetime_format=True)
 #also tried this ad wont work 
pd.to_datetime(df['Issue date'], format='%d-%b-%y')

EDIT FROM MrFuppes answer

when running the code suggested (I am using GoogleColab):

import locale
import pandas as pd

url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv"
df = pd.read_csv(url, skiprows=4)

locale.setlocale(locale.LC_TIME, "en_us") # try English first
df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce')

locale.setlocale(locale.LC_TIME, "it_it") # we also have Italian...
m = df['datetime'].isnull() # where English did not work
df.loc[m, 'datetime'] = pd.to_datetime(df['Issue date'][m], format='%d-%b-%y', errors='coerce')

print(df['datetime'].isnull().value_counts())
# False    238
# True       2 # these two were already NaN on import

I get the following error:

Error                                     Traceback (most recent call last)
<ipython-input-1-be04fb51ee5f> in <module>()
      5 df = pd.read_csv(url, skiprows=4)
      6 
----> 7 locale.setlocale(locale.LC_TIME, "en_us") # try English first
      8 df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce')
      9 

/usr/lib/python3.7/locale.py in setlocale(category, locale)
    606         # convert to string
    607         locale = normalize(_build_localename(locale))
--> 608     return _setlocale(category, locale)
    609 
    610 def resetlocale(category=LC_ALL):

Error: unsupported locale setting

Advertisement

Answer

If you know which languages to expect in advance, another option would be to set the appropriate locale. Side note, the names of the locales (i.e. the string to use for setting a specific one) depend on the operating system.

Example, Windows locale:

import locale
import pandas as pd

url = "http://www.dt.mef.gov.it/modules/documenti_en/debito_pubblico/scadenze_titoli_suddivise_per_anno/Maturities_broken_down_by_year_xupdate_July_31x_2021x.csv"
df = pd.read_csv(url, skiprows=4)

locale.setlocale(locale.LC_TIME, "en_us") # try English first
df['datetime'] = pd.to_datetime(df['Issue date'], format='%d-%b-%y', errors='coerce')

locale.setlocale(locale.LC_TIME, "it_it") # we also have Italian...
m = df['datetime'].isnull() # where English did not work
df.loc[m, 'datetime'] = pd.to_datetime(df['Issue date'][m], format='%d-%b-%y', errors='coerce')

print(df['datetime'].isnull().value_counts())
# False    238
# True       2 # these two were already NaN on import
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement