Closed. This question needs details or clarity. It is not currently accepting answers. Closed 1 year ago.

Compute time difference in pandas based on conditions [closed]

I have the following problem. My data has this structure:

import pandas as pd
import numpy as np

input = {
    "Name": ["Tom", "Tom", "nick", "krish", "krish", "jack", "krish"],
    "Age": [20, 20, 21, 19, 19, 18, 19],
    "Time": [
        "2021-09-23 00:01:00",
        "2021-09-24 00:02:00",
        "2021-09-23 00:01:00",
        "2021-09-23 00:01:00",
        "2021-09-23 00:10:00",
        "2021-09-23 00:01:00",
        "2021-09-25 00:03:00",
    ],
    "Value": [1, 5, 1, 1, 17, 2, 8],
}

df_input = pd.DataFrame(input)


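I would like to calculate the time difference in minutes, based on:

Name (the difference is computed within each Name)
and Value (the interval starts at the row where Value is 1 and ends at the row where Value is 9 or 17).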
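To make the rule concrete, here is a minimal sketch of the calculation for a single name. It uses krish's two rows from the data above (Value 1 and Value 17); the names `start`, `end` and `diff_minutes` are only illustrative.

```python
import pandas as pd

# krish's start row (Value == 1) and end row (Value == 17) from the data above
start = pd.Timestamp("2021-09-23 00:01:00")
end = pd.Timestamp("2021-09-23 00:10:00")

# Subtracting two Timestamps yields a Timedelta; convert it to minutes
diff_minutes = (end - start).total_seconds() / 60
print(diff_minutes)
```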

Desired output is:

output = {
    "Name": ["Tom", "Tom", "nick", "krish", "krish", "jack", "krish"],
    "Age": [20, 20, 21, 19, 19, 18, 19],
    "Time": [
        "2021-09-23 00:01:00",
        "2021-09-24 00:02:00",
        "2021-09-23 00:01:00",
        "2021-09-23 00:01:00",
        "2021-09-23 00:10:00",
        "2021-09-23 00:01:00",
        "2021-09-25 00:03:00",
    ],
    "Value": [1, 5, 1, 1, 17, 2, 8],
    "Diff_hours": [  # values are in minutes, despite the column name
        np.nan, np.nan, np.nan,  # because no 9 or 17 at the end in Value
        9,  # because 2021-09-23 00:10:00 minus 2021-09-23 00:01:00 is 9 minutes
        9,
        np.nan,  # because Value has neither a 1 at the beginning nor a 9 or 17 at the end
        9,
    ],
}

df_output = pd.DataFrame(output)


I found this, but it did not help me: Time difference in day based on specific condition in pandas

Answer

Here is the solution I came up with, though there may be a better one:

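# Keep only the rows that mark a start (Value 1) or an end (Value 9 or 17).
# .copy() avoids pandas' SettingWithCopyWarning on the assignments below.
helper = df_input.loc[
    df_input["Value"].isin([1, 9, 17]), ["Name", "Time", "Value"]
].copy()

helper["Time"] = pd.to_datetime(helper["Time"])
# Difference between consecutive start/end rows within each Name.
helper["diff"] = helper.sort_values(["Name", "Time"]).groupby("Name")["Time"].diff()

# The first row of each group has no predecessor (NaT); treat it as 0 minutes.
helper["diff"] = helper["diff"].fillna(pd.Timedelta(seconds=0))
helper["diff"] = helper["diff"].dt.total_seconds().div(60).astype(int)
# Drop the zero rows so that only the real start-to-end differences remain.
helper = helper[helper["diff"] != 0][["Name", "diff"]]

# Attach the per-Name difference to every row of the original frame.
df_output = df_input.merge(helper, how="left", on="Name")

print(df_output)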
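
A possible alternative, offered only as a sketch: pick the start and end time per Name explicitly and subtract them, instead of relying on `diff()` and filtering out zeros. It assumes each Name has at most one interval of interest, and it pairs the earliest Value 1 row with the latest Value 9 or 17 row. That pairing is my assumption, not something the question states. The names `df`, `starts`, `ends` and `minutes` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Tom", "nick", "krish", "krish", "jack", "krish"],
    "Time": pd.to_datetime([
        "2021-09-23 00:01:00", "2021-09-24 00:02:00", "2021-09-23 00:01:00",
        "2021-09-23 00:01:00", "2021-09-23 00:10:00", "2021-09-23 00:01:00",
        "2021-09-25 00:03:00",
    ]),
    "Value": [1, 5, 1, 1, 17, 2, 8],
})

# Assumption: earliest start (Value == 1) and latest end (Value 9 or 17) per Name
starts = df.loc[df["Value"] == 1].groupby("Name")["Time"].min()
ends = df.loc[df["Value"].isin([9, 17])].groupby("Name")["Time"].max()

# Index-aligned subtraction: a Name missing a start or an end yields NaT
minutes = (ends - starts).dt.total_seconds() / 60

# Broadcast the per-Name result back onto every row; unknown Names become NaN
df["diff"] = df["Name"].map(minutes)
print(df)
```

Unlike the merge-based version, this does not duplicate rows if a Name happens to have several start/end rows, and a genuine difference of 0 minutes is not discarded.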
