Pandas: Check each row for condition and insert row below if condition is met

This is my first question here, as I couldn't work it out from related answers. I have a list of dataframes, "df_list", with one dataframe per user. Each one looks like this:

-   |User |Timestamp                |Timestamp2         |check            |in_out
0   |0001 |2022-01-07 07:40:44  |2022-01-07 17:32:18    |0 days 09:51:34  |in
1   |0001 |2022-01-07 17:32:18  |2022-01-10 17:31:50    |2 days 23:59:32  |out
2   |0001 |2022-01-10 17:31:50  |2022-01-11 07:43:48    |0 days 14:11:58  |in

Data:

data = {'User': [1, 1, 1],
 'Timestamp': ['2022-01-07 07:40:44',
  '2022-01-07 17:32:18',
  '2022-01-10 17:31:50'],
 'Timestamp2': ['2022-01-07 17:32:18',
  '2022-01-10 17:31:50',
  '2022-01-11 07:43:48'],
 'check': ['0 days 09:51:34', '2 days 23:59:32', '0 days 14:11:58'],
 'in_out': ['in', 'out', 'in']}

I would like to go through all the dataframes in df_list and, in each one, insert an empty row directly below every row whose 'check' value is greater than 15 hours.

Desired Output

-   |User |Timestamp                |Timestamp2         |check            |in_out
0   |0001 |2022-01-07 07:40:44  |2022-01-07 17:32:18    |0 days 09:51:34  |in
1   |0001 |2022-01-07 17:32:18  |2022-01-10 17:31:50    |2 days 23:59:32  |out
2   |0001  
3   |0001 |2022-01-10 17:31:50  |2022-01-11 07:43:48    |0 days 14:11:58  |in

Attempt:

I tried a for-loop over the list with a second loop over the rows of each dataframe. An if-clause checks my condition; when it holds, I create a blank row with the required index and concat it into the dataframe so the empty row is included.

for dfx in df_list:     
    for i in dfx.index.values.tolist():
        if (pd.Timedelta(dfx["check"].loc[i]) > pd.Timedelta("15 hours")) == True:
            line = pd.DataFrame({'ID': '', 'Timestamp': None,'Timestamp2': '','check': None}, index=[i+1])
            dfx = pd.concat([dfx.iloc[:i+1], line, dfx.iloc[i:]]).reset_index(drop=True)

I also found "Add empty row if a condition is met Pandas", but I can't get that running either:

for i in dfy.index.values.tolist():
    indices = dfy.loc[pd.Timedelta(((dfy["check"].loc[i]) > pd.Timedelta("15 hours"))== True)].index.tolist()
    ERROR: Value must be Timedelta, string, integer, float, timedelta or convertible, not bool

As a next step I want to fill the new row with the 'Timestamp' value from the previous row plus or minus 10 hours, depending on the value in the in_out column, but I thought I should solve one problem at a time.

Answer

You can create a boolean mask from the "check" column and duplicate the matching rows using Index.repeat and reindexing. Then call sort_index to restore the order and blank out the values in the duplicated rows:

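import numpy as np
import pandas as pd
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
msk = df['check'] > pd.Timedelta(hours=15)
df = pd.concat([df.reindex(df[msk].index.repeat(2)), df[~msk]]).sort_index()
df.loc[df.duplicated(['Timestamp','check']), df.columns.drop('User')] = np.nan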

Note that this assumes "check" already has a timedelta dtype (timedelta64) rather than holding strings.
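If the columns are still strings, as in the sample data from the question, they can be converted first. The following is a minimal sketch that assumes the string formats shown in the question; the names df and msk are only illustrative.

```python
import pandas as pd

# Sample data from the question; every value starts out as a string
data = {'User': [1, 1, 1],
        'Timestamp': ['2022-01-07 07:40:44',
                      '2022-01-07 17:32:18',
                      '2022-01-10 17:31:50'],
        'Timestamp2': ['2022-01-07 17:32:18',
                       '2022-01-10 17:31:50',
                       '2022-01-11 07:43:48'],
        'check': ['0 days 09:51:34', '2 days 23:59:32', '0 days 14:11:58'],
        'in_out': ['in', 'out', 'in']}
df = pd.DataFrame(data)

# Parse the timestamp strings into datetimes
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['Timestamp2'] = pd.to_datetime(df['Timestamp2'])

# Parse strings such as '2 days 23:59:32' into timedeltas
df['check'] = pd.to_timedelta(df['check'])

# With a timedelta column, the comparison yields a boolean mask directly
msk = df['check'] > pd.Timedelta(hours=15)
```

Once "check" is a timedelta column, there is no need to wrap the comparison in pd.Timedelta, which is what raised the "not bool" error in the question.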

Output:

   User           Timestamp          Timestamp2           check in_out
0     1 2022-01-07 07:40:44 2022-01-07 17:32:18 0 days 09:51:34     in
1     1 2022-01-07 17:32:18 2022-01-10 17:31:50 2 days 23:59:32    out
1     1                 NaT                 NaT             NaT    NaN
2     1 2022-01-10 17:31:50 2022-01-11 07:43:48 0 days 14:11:58     in
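The output above keeps a duplicated index label (1 appears twice), while the desired output in the question is numbered 0 to 3 and should be produced for every dataframe in df_list. A possible way to wrap this up is sketched below. The helper name insert_blank_rows and its parameters are hypothetical, and it is a variant of the approach above rather than the same code: it repeats rows by position and marks the repeated copies, so genuinely identical rows in the input are not blanked by accident.

```python
import numpy as np
import pandas as pd

def insert_blank_rows(df, threshold=pd.Timedelta(hours=15), keep='User'):
    """Return a copy of df with a blank row below each row where check > threshold."""
    msk = (df['check'] > threshold).to_numpy()
    # Row positions, with each matching position listed twice
    pos = np.arange(len(df)).repeat(1 + msk)
    out = df.iloc[pos].reset_index(drop=True)
    # The second copy of a repeated position is the row to blank out
    is_blank = pd.Series(pos).duplicated().to_numpy()
    out.loc[is_blank, out.columns.drop(keep)] = np.nan
    return out

# Sample data from the question, converted to datetime/timedelta dtypes
df = pd.DataFrame({
    'User': [1, 1, 1],
    'Timestamp': pd.to_datetime(['2022-01-07 07:40:44',
                                 '2022-01-07 17:32:18',
                                 '2022-01-10 17:31:50']),
    'Timestamp2': pd.to_datetime(['2022-01-07 17:32:18',
                                  '2022-01-10 17:31:50',
                                  '2022-01-11 07:43:48']),
    'check': pd.to_timedelta(['0 days 09:51:34',
                              '2 days 23:59:32',
                              '0 days 14:11:58']),
    'in_out': ['in', 'out', 'in'],
})

df_list = [df]  # stand-in for the real list of per-user dataframes
# Build a new list; rebinding the loop variable would leave df_list unchanged
df_list = [insert_blank_rows(d) for d in df_list]
result = df_list[0]
```

Building a new list matters here: in the attempt from the question, assigning to dfx inside the for-loop only rebinds the loop variable and never updates the entry in df_list.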

Answer