Hi already referred to this post but I could not get through my issue. I have a column in my CSV which is string and the sample values are below (note that the month
and year
positioning are sometimes reversed). What format do I need to set in my to_datetime
? I tried all the below approaches
df = pd.read_csv("filename.csv") #Imagine there is a Month column #[1] df["Month"] = pd.to_datetime(df["Month"]) #[2] df["Month"] = pd.to_datetime(df["Month"], format="%m/%d/%Y") [Month] Mar-97 Apr-97 May-97 Jun-97 Nov-00 Dec-00 1-Jan 1-Feb 1-Mar 1-Apr
I get the error
ValueError: day is out of range for month
for [1] and I get
ValueError: time data ‘Mar-97’ does not match format ‘%m/%d/%Y’ (match)
for [2]. I tried to remove the %d
too but no luck. Could you please point me what is going wrong here.
Advertisement
Answer
One way is to use try
/ except
with pd.Series.apply
:
s = pd.Series(['Mar-97', 'May-97', 'Nov-00', '1-Jan', '1-Mar']) def converter(x): try: return pd.datetime.strptime(x, '%b-%y') except ValueError: year, month = x.split('-') # split by delimiter x = year.zfill(2) + '-' + month # %y requires 0-padding return pd.datetime.strptime(x, '%y-%b') res = s.apply(converter) print(res) 0 1997-03-01 1 1997-05-01 2 2000-11-01 3 2001-01-01 4 2001-03-01 dtype: datetime64[ns]
Since we have defined converter
as a function, we can use this directly with pd.read_csv
:
df = pd.read_csv('file.csv', parse_dates=['dt_col_name'], date_parser=converter)
Python’s strftime directives is a useful reference for constructing datetime
format strings.