This is a follow-up to Calculating new column value in dataframe based on next rows column value.
The solution in the previous question worked for a column holding hh:mm:ss values as a string.
I tried applying (no pun intended) the same logic to calculate the 1-second difference on a column of pandas Timestamps:
    # df.start_time is now of type <class 'pandas._libs.tslibs.timestamps.Timestamp'>
    # in yyyy-mm-dd hh:mm:ss format
    s = pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1))
    df = df.assign(
        end_time=s.add(pd.Timestamp('now').normalize()).dt.time.astype(str)
    )
By mistake, in one round of coding I changed the line that assigns the series as a column on the df to:
    df = df.assign(
        end_time=s.add(pd.Timestamp('now').normalize())
    )
The results were… interesting. The end_time is in the correct format, but the date portion…
    start_time           end_time
    2021-03-30 16:58:13  2072-06-28 03:17:30.192227
    2021-03-30 17:00:00  2072-06-28 03:17:32.192227
I expected end_time to be 1 second less than the next row's start_time. As you can see, that is not the case: end_time lands 51 years in the future!

Can someone please explain how/why this happened? There is no explicit call of pd.offsets.DateOffset(years=50) anywhere in the code.
Answer
The solution to this was easy, and staring me in the face.
The offending code:

    s = pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1))

pd.to_timedelta reinterprets each Timestamp as a duration measured from the Unix epoch (1970-01-01), and as of 2021 that duration is roughly 51 years. Adding pd.Timestamp('now').normalize() to that duration then lands about 51 years in the future: hence the 2072 dates above.
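The effect can be seen in miniature. This sketch uses a single hypothetical Timestamp (pinned to a 2021 date for reproducibility, not taken from the question's actual data): a Timestamp is stored internally as nanoseconds since the epoch, and treating that count as a Timedelta gives a duration of about 51 years.

```python
import pandas as pd

# A Timestamp is stored internally as nanoseconds since 1970-01-01 (Unix epoch).
ts = pd.Timestamp("2021-03-30 16:58:13")

# Reinterpret those nanoseconds as a duration -- effectively what happens
# when a datetime column is passed through pd.to_timedelta.
td = pd.Timedelta(ts.value, unit="ns")
print(td)  # 18716 days 16:58:13, i.e. a bit over 51 years

# Adding a 2021 "today" to that ~51-year duration lands in 2072:
print(pd.Timestamp("2021-03-30") + td)  # 2072-06-26 16:58:13
```

So the mysterious DateOffset(years=50) was really just "nanoseconds since 1970" being added to a 2021 date.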
The correct way to create an end_time off of a timestamp-typed series/column is to keep the values as datetimes with pd.to_datetime (pandas has no top-level pd.to_timestamp) instead of converting them to Timedelta:

    s = pd.to_datetime(df.start_time).shift(-1).sub(pd.offsets.Second(1))

Since the column is already datetime-typed, the pd.to_datetime call is effectively a no-op; df.start_time.shift(-1).sub(pd.offsets.Second(1)) works just as well.
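A runnable sketch of the corrected pipeline, using made-up sample times in place of the question's data:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame.
df = pd.DataFrame({
    "start_time": pd.to_datetime(
        ["2021-03-30 16:58:13", "2021-03-30 17:00:00", "2021-03-30 17:02:00"]
    )
})

# Keep the column as datetimes (no detour through Timedelta), take the
# next row's start_time, and step back one second:
s = pd.to_datetime(df.start_time).shift(-1).sub(pd.offsets.Second(1))
df = df.assign(end_time=s)

# Row 0's end_time is now 2021-03-30 16:59:59 (one second before row 1's
# start_time); the last row is NaT because there is no following start_time.
```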