
pandas convert timestamp in a dynamic way

I have a dataframe with the following timestamps, 5,600,000 rows in total:

time                                   SD
2016-01-02 11:23:04.299000+00:00       1
2016-02-02 11:23:04.299000+00:00       2
2016-03-02 11:23:04.299000+00:00       2
2016-04-01 11:23:04.299000+00:00       3   

The operation below throws this error:

(TypeError: Timestamp subtraction must have the same timezones or no timezones)

(pd.to_datetime("2016-01-02 11:23:04.299000+00:00") - pd.Timestamp("1970-01-01"))

I know what the problem is: the first timestamp carries a timezone offset and the second does not. Dropping the offset makes it work: (pd.to_datetime("2016-01-02 11:23:04") - pd.Timestamp("1970-01-01"))
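To make the mismatch concrete, here is a minimal sketch for a single value. It assumes that keeping the offset and giving the epoch a timezone as well (tz="UTC") is acceptable; the variable names are only illustrative.

```python
import pandas as pd

aware = pd.to_datetime("2016-01-02 11:23:04.299000+00:00")  # timezone-aware (offset +00:00)
naive_epoch = pd.Timestamp("1970-01-01")                     # timezone-naive

# Mixing aware and naive timestamps raises TypeError
try:
    aware - naive_epoch
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Giving the epoch a timezone too makes the subtraction valid
aware_epoch = pd.Timestamp("1970-01-01", tz="UTC")
delta = aware - aware_epoch  # a pd.Timedelta
```

The original string is left untouched this way, so the offset information is not lost.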

I am looking for an optimized solution, because iterating over every row of “time” to strip the offset takes a long time.

There is also a chance that the timestamps will arrive in a different format from the one currently in the dataframe.

Is there anything I can do with pandas? Thanks in advance!


Answer

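IIUC, this uses pandas and NumPy to compute the date difference between the two columns, with the second column carrying no timezone. Taking .values on both columns yields plain NumPy datetime64 arrays, so the timezone mismatch never arises.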

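import pandas as pd
import numpy as np

# Parse the strings into datetimes (infer_datetime_format is deprecated since pandas 2.0, where inference is the default)
df['time'] = pd.to_datetime(df['time'], infer_datetime_format=True)
# Timezone-naive epoch, broadcast to every row
df['date_to_subtract'] = pd.to_datetime('1970-01-01')
# .values gives timezone-naive NumPy datetime64 arrays (in UTC), so the subtraction succeeds
df['difference'] = df['time'].values - df['date_to_subtract'].values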
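A possible variation, sketched under the assumption that all timestamps should be interpreted in UTC: parse the column with utc=True and subtract a timezone-aware epoch, which stays vectorized and avoids the helper column. The two-row frame and the epoch_seconds column are made up for illustration.

```python
import pandas as pd

# Small stand-in for the real 5.6M-row frame (illustrative data only)
df = pd.DataFrame({
    "time": ["2016-01-02 11:23:04.299000+00:00",
             "2016-02-02 11:23:04.299000+00:00"],
    "SD": [1, 2],
})

# utc=True converts offset-bearing strings to UTC and treats naive ones as UTC
df["time"] = pd.to_datetime(df["time"], utc=True)

# Timezone-aware epoch, so both sides of the subtraction match
epoch = pd.Timestamp("1970-01-01", tz="UTC")

df["difference"] = df["time"] - epoch                      # timedelta64 column
df["epoch_seconds"] = df["difference"].dt.total_seconds()  # float seconds since epoch
```

Because utc=True normalizes both offset-bearing and offset-free inputs, this may also cope with the case where the incoming format changes, though inputs in a genuinely different layout would still need checking.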