Skip to content
Advertisement

Python calculated Timedelta 50 years in future, should be same day

This is a follow up to Calculating new column value in dataframe based on next rows column value

The solution in the previous question worked for a column holding hh:mm:ss values as a string.

I tried applying (no pun intended) the same logic to calculate the 1 second difference on a column of pandas Timestamps:

# df.start_time is now of type <class 'pandas._libs.tslibs.timestamps.Timestamp'>
# in yyyy-dd-mm hh:mm:ss format
s = pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1))
df = df.assign( end_time=s.add(pd.Timestamp('now').normalize()).dt.time.astype(str) )

By mistake in one round of coding I change the line where the series is applied as a column to the df to:

df = df.assign( end_time=s.add(pd.Timestamp('now').normalize()))

The results were… interesting. The end_time is in the correct format, but the date portion…

    start_time                   end_time
2021-03-30 16:58:13     2072-06-28 03:17:30.192227
2021-03-30 17:00:00     2072-06-28 03:17:32.192227

I expected the end_time Timedelta of 1 second less than the start_time. As you can see that is not the case! The end_time Timedelta is 51 years in the future!

Can someone please explain how/why this happened? There is no explicit call of pd.offsets.DateOffset(years=50)

Advertisement

Answer

The solution to this was easy, and staring me in the face.

The offending code:

s = pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1))

The correct way to create an end_time off of a timestamp type series/column:

s = pd.to_timestamp(df.start_time).shift(-1).sub(pd.offsets.Second(1))
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement