
Pandas: Subtract timestamps

I grouped a dataframe test_df2 by frequency 'B' (business day, so each group's name is that day's date at 00:00) and am now looping over the groups to calculate timestamp differences and save them in the dict grouped_bins. The data in the original dataframe and in the groups looks like this:

    timestamp                status  externalId
0   2020-05-11 13:06:05.922  1       1
7   2020-05-11 13:14:29.759  10      1
8   2020-05-11 13:16:09.147  1       2
16  2020-05-11 13:19:08.641  10      2

What I want is to calculate the difference between the timestamps of rows that share the same externalId, for example between rows 7 and 0.

What I did for that purpose is the following.

JavaScript

After grouping, I loop over each group (to maintain the day) and try to calculate the difference between the timestamps, which is where it goes wrong.

JavaScript

The problem I face is the following. The element in the dict (elem) looks like this in the end:

JavaScript

What I want is something like this:

JavaScript

Though I am open to suggestions on how to store time durations in Python/Pandas.

I am also open to suggestions regarding the loop itself.


Answer

To obtain timestamp differences between consecutive rows of the same externalId, you should be able to simply write, for example:

JavaScript
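
A minimal sketch of such a one-liner, assuming the column names from the question and that a `groupby` on externalId followed by `diff()` is the intended approach. The sample frame is rebuilt by hand from the four rows shown in the question.

```python
import pandas as pd

# Sample rows copied from the question; the index mirrors the original row labels.
test_df2 = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2020-05-11 13:06:05.922",
                "2020-05-11 13:14:29.759",
                "2020-05-11 13:16:09.147",
                "2020-05-11 13:19:08.641",
            ]
        ),
        "status": [1, 10, 1, 10],
        "externalId": [1, 1, 2, 2],
    },
    index=[0, 7, 8, 16],
)

# Within each externalId, subtract the previous row's timestamp from the current one.
# The first row of every externalId has no predecessor, so it gets NaT.
test_df2["time_diff"] = test_df2.groupby("externalId")["timestamp"].diff()

# Row 7 ends up with 8 min 23.837 s and row 16 with 2 min 59.494 s.
print(test_df2[["externalId", "time_diff"]])
```

The result column holds Timedelta values, so it can be compared, summed, or averaged directly.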

On the example you give:

JavaScript
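
To tie this back to the per-business-day dict from the question, the same diff can be taken inside each 'B' group and stored as plain seconds. This is only a sketch: the helper name `durations_per_business_day` is hypothetical, and grouping with `pd.Grouper(key="timestamp", freq="B")` is an assumption about how the original grouping was done.

```python
import pandas as pd

def durations_per_business_day(df: pd.DataFrame) -> dict:
    """Map each business day to the per-externalId timestamp gaps, in seconds."""
    grouped_bins = {}
    ordered = df.sort_values("timestamp")
    # Assumed grouping: one group per business day, keyed by that day at 00:00.
    for day, group in ordered.groupby(pd.Grouper(key="timestamp", freq="B")):
        if group.empty:
            continue  # skip days without any rows
        # Differences are taken only between rows of the same day and externalId.
        diffs = group.groupby("externalId")["timestamp"].diff().dropna()
        if diffs.empty:
            continue
        # Store durations as floats (seconds); keeping Timedelta objects also works.
        grouped_bins[day] = diffs.dt.total_seconds().tolist()
    return grouped_bins

# Sample rows copied from the question.
sample = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2020-05-11 13:06:05.922",
                "2020-05-11 13:14:29.759",
                "2020-05-11 13:16:09.147",
                "2020-05-11 13:19:08.641",
            ]
        ),
        "status": [1, 10, 1, 10],
        "externalId": [1, 1, 2, 2],
    },
    index=[0, 7, 8, 16],
)

bins = durations_per_business_day(sample)
print(bins)
```

Seconds as floats are easy to serialize and plot; if the durations are only used inside pandas, leaving them as Timedelta values avoids the conversion.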

If your timestamps are not actually of type Timestamp yet, then you can convert them into it first:

JavaScript
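
A sketch of that conversion, assuming the timestamps arrive as strings in the format shown in the question. The frame name `raw` is made up for illustration.

```python
import pandas as pd

# Hypothetical frame in which the timestamps are still plain strings.
raw = pd.DataFrame(
    {
        "timestamp": ["2020-05-11 13:06:05.922", "2020-05-11 13:14:29.759"],
        "externalId": [1, 1],
    }
)

# Parse the strings into datetime64 values so that subtraction is defined.
raw["timestamp"] = pd.to_datetime(raw["timestamp"])

# Subtraction now yields Timedelta values instead of raising a TypeError.
raw["time_diff"] = raw.groupby("externalId")["timestamp"].diff()
print(raw.dtypes)
```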
User contributions licensed under: CC BY-SA