Remove part of a string from pd.to_datetime() unconverted values

I tried to convert a column of dates to datetime using pd.to_datetime(df, format='%Y-%m-%d_%H-%M-%S'), but I received the error ValueError: unconverted data remains: .1

I ran:

data.loc[pd.to_datetime(data.date, format='%Y-%m-%d_%H-%M-%S', errors='coerce').isnull(), 'date']

to identify the problem. 119/1037808 dates in the date column have an extra “.1” at the end of them. Other than the “.1”, the dates are fine. How can I remove the “.1” from the end of those dates only and then convert the column values to datetime?

Here is an example dataframe that recreates the issue:

import pandas as pd

data = pd.DataFrame({"date" : ["2022-01-15_08-11-00.1","2022-01-15_08-11-30","2022-01-15_08-12-00.1", "2022-01-15_08-12-30"], 
                     "value" : [1,2,3,4]})

I have tried:

data.date = data.date.replace(".1", "")

and

data = data.replace(".1", "")

but these did not remove the “.1”. The final result should look like this:

data = pd.DataFrame({"date" : ["2022-01-15_08-11-00","2022-01-15_08-11-30","2022-01-15_08-12-00", "2022-01-15_08-12-30"],
                     "value" : [1,2,3,4]})

Answer

You can use pandas.Series.replace with regex=True to strip the trailing dot and digits before converting:

# Strip a trailing ".<digits>" suffix, then parse with the expected format
data["date"] = pd.to_datetime(data["date"].replace(r"\.\d+$", "",
                                                   regex=True),
                              format="%Y-%m-%d_%H-%M-%S")

# Output:

print(data)
print(data.dtypes)

                 date  value
0 2022-01-15 08:11:00      1
1 2022-01-15 08:11:30      2
2 2022-01-15 08:12:00      3
3 2022-01-15 08:12:30      4
date     datetime64[ns]
value             int64
dtype: object
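
As a rough sketch of why the attempts in the question had no effect: without regex=True, Series.replace only substitutes cells whose entire value equals the search string, so ".1" never matches a full date string. The variable names unchanged, cleaned and parsed below are illustrative only, and the `$` anchor is an added assumption that the unwanted suffix always sits at the very end of the value.

```python
import pandas as pd

data = pd.DataFrame({
    "date": ["2022-01-15_08-11-00.1", "2022-01-15_08-11-30",
             "2022-01-15_08-12-00.1", "2022-01-15_08-12-30"],
    "value": [1, 2, 3, 4],
})

# Plain replace compares whole cell values, so no cell equals ".1"
# and the column comes back untouched.
unchanged = data["date"].replace(".1", "")

# With regex=True the pattern is searched for inside each string;
# r"\.\d+$" matches a literal dot followed by digits at the end.
cleaned = data["date"].replace(r"\.\d+$", "", regex=True)

# Every value now fits the format, so no error is raised.
parsed = pd.to_datetime(cleaned, format="%Y-%m-%d_%H-%M-%S")
```

Escaping the dot matters here: an unescaped "." in a regular expression matches any character, not just a literal period.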

If you only want the cleaned strings and not a datetime column, use just data["date"].replace(r"\.\d+$", "", regex=True)
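
An equivalent way to keep the column as strings, offered as an alternative sketch and not as part of the original answer, is the .str accessor. The names dates, stripped and remaining are hypothetical. The last step reuses the errors='coerce' check from the question to confirm that nothing is left unparsed.

```python
import pandas as pd

dates = pd.Series(["2022-01-15_08-11-00.1", "2022-01-15_08-11-30"])

# Series.str.replace works on substrings; regex=True is passed explicitly
# because the default for this argument has differed between pandas versions.
stripped = dates.str.replace(r"\.\d+$", "", regex=True)

# Count values that still fail to parse (NaT after coercion).
remaining = pd.to_datetime(
    stripped, format="%Y-%m-%d_%H-%M-%S", errors="coerce"
).isna().sum()
```

If remaining is 0, the column can be converted without errors='coerce'.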

User contributions licensed under: CC BY-SA