I have the following problem. My data has this structure:
```python
import pandas as pd
import numpy as np

input = {
    "Name": ["Tom", "Tom", "nick", "krish", "krish", "jack", "krish"],
    "Age": [20, 20, 21, 19, 19, 18, 19],
    "Time": [
        "2021-09-23 00:01:00",
        "2021-09-24 00:02:00",
        "2021-09-23 00:01:00",
        "2021-09-23 00:01:00",
        "2021-09-23 00:10:00",
        "2021-09-23 00:01:00",
        "2021-09-25 00:03:00",
    ],
    "Value": [1, 5, 1, 1, 17, 2, 8],
}

df_input = pd.DataFrame(input)
```
I would like to calculate the difference in minutes based on:
- `Name`, and
- `Value`, where the interval starts at a row with `Value` 1 and ends at a row with `Value` 9 or 17.
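For the rows of a single `Name`, the rule above might be sketched like this. It is only an illustration under assumptions the question does not state: the interval runs from the earliest row with `Value == 1` to the earliest row with `Value` 9 or 17 at or after it, and any group missing either marker yields NaN. The helper name `minutes_between_markers` and the constants are hypothetical.

```python
import numpy as np
import pandas as pd

START_VALUES = {1}      # assumption: an interval starts at Value == 1
END_VALUES = {9, 17}    # assumption: an interval ends at Value == 9 or 17


def minutes_between_markers(group: pd.DataFrame) -> float:
    """Minutes from the first start row to the first end row after it, else NaN."""
    times = pd.to_datetime(group["Time"])
    start_times = times[group["Value"].isin(START_VALUES)]
    if start_times.empty:
        return np.nan  # no start marker in this group
    start = start_times.min()
    # Only end markers at or after the start count
    end_times = times[group["Value"].isin(END_VALUES) & (times >= start)]
    if end_times.empty:
        return np.nan  # no end marker after the start
    return (end_times.min() - start).total_seconds() / 60


# krish's rows from the example data
krish = pd.DataFrame({
    "Time": ["2021-09-23 00:01:00", "2021-09-23 00:10:00", "2021-09-25 00:03:00"],
    "Value": [1, 17, 8],
})
print(minutes_between_markers(krish))  # 9.0
```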
The desired output is:
```python
output = {
    "Name": ["Tom", "Tom", "nick", "krish", "krish", "jack", "krish"],
    "Age": [20, 20, 21, 19, 19, 18, 19],
    "Time": [
        "2021-09-23 00:01:00",
        "2021-09-24 00:02:00",
        "2021-09-23 00:01:00",
        "2021-09-23 00:01:00",
        "2021-09-23 00:10:00",
        "2021-09-23 00:01:00",
        "2021-09-25 00:03:00",
    ],
    "Value": [1, 5, 1, 1, 17, 2, 8],
    # The values are minutes, despite the column name
    "Diff_hours": [
        np.nan, np.nan, np.nan,  # because Value has no 9 or 17 at the end
        9,  # because 2021-09-23 00:10:00 minus 2021-09-23 00:01:00
        9,
        np.nan,  # because Value has neither a 1 at the start nor a 9 or 17 at the end
        9,
    ],
}

df_output = pd.DataFrame(output)
```
I found this related question, but it did not help me: Time difference in day based on specific condition in pandas
Answer
Here is the solution I came up with, though there might be a better one:
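```python
import pandas as pd

# Keep only the rows that mark a start (1) or an end (9 or 17) of an interval.
# .copy() avoids SettingWithCopyWarning when adding columns below.
helper = df_input.loc[df_input["Value"].isin([1, 9, 17]), ["Name", "Time", "Value"]].copy()

helper["Time"] = pd.to_datetime(helper["Time"])
# Time elapsed since the previous marker row of the same Name
helper["diff"] = helper.sort_values(["Name", "Time"]).groupby("Name")["Time"].diff()

# The first marker row of each Name has no predecessor (NaT); treat it as 0
helper["diff"] = helper["diff"].fillna(pd.Timedelta(seconds=0))
helper["diff"] = helper["diff"].dt.total_seconds().div(60).astype(int)
# Drop the zero rows, keeping one (Name, diff) pair per interval
helper = helper[helper["diff"] != 0][["Name", "diff"]]

# Broadcast the difference back onto every row of the matching Name
df_output = df_input.merge(helper, how="left", on="Name")

print(df_output)
```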
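A merge-free alternative might look like the sketch below. It assumes each `Name` has at most one start row (`Value == 1`) and one end row (`Value` 9 or 17), with the end after the start; it does not check ordering. It takes the earliest start and end time per `Name`, subtracts them (pandas aligns the two Series on `Name`, so a missing marker gives NaT and therefore NaN), and maps the minutes back onto every row. The column name `Diff_minutes` is a hypothetical choice.

```python
import pandas as pd

df_input = pd.DataFrame({
    "Name": ["Tom", "Tom", "nick", "krish", "krish", "jack", "krish"],
    "Age": [20, 20, 21, 19, 19, 18, 19],
    "Time": [
        "2021-09-23 00:01:00",
        "2021-09-24 00:02:00",
        "2021-09-23 00:01:00",
        "2021-09-23 00:01:00",
        "2021-09-23 00:10:00",
        "2021-09-23 00:01:00",
        "2021-09-25 00:03:00",
    ],
    "Value": [1, 5, 1, 1, 17, 2, 8],
})

df = df_input.copy()
df["Time"] = pd.to_datetime(df["Time"])

# Earliest start-marker time per Name (Value == 1)
start = df.loc[df["Value"].eq(1)].groupby("Name")["Time"].min()
# Earliest end-marker time per Name (Value 9 or 17)
end = df.loc[df["Value"].isin([9, 17])].groupby("Name")["Time"].min()

# Subtraction aligns on Name; a Name missing either marker gets NaT -> NaN minutes
minutes = (end - start).dt.total_seconds() / 60

# Names absent from `minutes` (e.g. jack) also map to NaN
df["Diff_minutes"] = df["Name"].map(minutes)
print(df)
```

Unlike the answer above, this does not filter out zero differences, so an interval whose start and end share the same timestamp would be kept as 0.0 minutes.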