I tried to convert a column of dates to datetime
using pd.to_datetime(df, format='%Y-%m-%d_%H-%M-%S')
but I received the error ValueError: unconverted data remains: .1
I ran:
data.loc[pd.to_datetime(data.date, format='%Y-%m-%d_%H-%M-%S', errors='coerce').isnull(), 'date']
to identify the problem. 119/1037808 dates in the date
column have an extra “.1” at the end of them. Other than the “.1”, the dates are fine. How can I remove the “.1” from the end of those dates only and then convert the column values to datetime?
Here is an example dataframe that recreates the issue:
import pandas as pd data = pd.DataFrame({"date" : ["2022-01-15_08-11-00.1","2022-01-15_08-11-30","2022-01-15_08-12-00.1", "2022-01-15_08-12-30"], "value" : [1,2,3,4]})
I have tried:
data.date = data.date.replace(".1", "")
and
data = data.replace(".1", "")
but these did not remove the “.1”. The final result should look like this:
data = pd.DataFrame({"date" : ["2022-01-15_08-11-00","2022-01-15_08-11-30","2022-01-15_08-12-00", "2022-01-15_08-12-30"], "value" : [1,2,3,4]})
Advertisement
Answer
You can use pandas.Series.replace
to get rid of the extra dot/number :
data["date"]= pd.to_datetime(data["date"].replace(r".d+", "", regex=True), format="%Y-%m-%d_%H-%M-%S")
# Output :
print(data) print(data.dtypes) date value 0 2022-01-15 08:11:00 1 1 2022-01-15 08:11:30 2 2 2022-01-15 08:12:00 3 3 2022-01-15 08:12:30 4 date datetime64[ns] value int64 dtype: object
If you don’t want a datetime format, use just data["date"].replace(r".d+", "", regex=True)