This is my first question here, as I really couldn't figure it out from related answers. I have a list of dataframes, `df_list`, with one dataframe per user. Each one basically looks like this:
|   | User | Timestamp | Timestamp2 | check | in_out |
|---|------|-----------|------------|-------|--------|
| 0 | 0001 | 2022-01-07 07:40:44 | 2022-01-07 17:32:18 | 0 days 09:51:34 | in |
| 1 | 0001 | 2022-01-07 17:32:18 | 2022-01-10 17:31:50 | 2 days 23:59:32 | out |
| 2 | 0001 | 2022-01-10 17:31:50 | 2022-01-11 07:43:48 | 0 days 14:11:58 | in |
Data:
data = {'User': [1, 1, 1], 'Timestamp': ['2022-01-07 07:40:44', '2022-01-07 17:32:18', '2022-01-10 17:31:50'], 'Timestamp2': ['2022-01-07 17:32:18', '2022-01-10 17:31:50', '2022-01-11 07:43:48'], 'check': ['0 days 09:51:34', '2 days 23:59:32', '0 days 14:11:58'], 'in_out': ['in', 'out', 'in']}
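In the dict above every column except `User` is a plain string. A minimal sketch for loading it with real datetime/timedelta dtypes (which the answer below assumes), using only `pd.to_datetime` and `pd.to_timedelta`:

```python
import pandas as pd

data = {'User': [1, 1, 1],
        'Timestamp': ['2022-01-07 07:40:44', '2022-01-07 17:32:18', '2022-01-10 17:31:50'],
        'Timestamp2': ['2022-01-07 17:32:18', '2022-01-10 17:31:50', '2022-01-11 07:43:48'],
        'check': ['0 days 09:51:34', '2 days 23:59:32', '0 days 14:11:58'],
        'in_out': ['in', 'out', 'in']}

df = pd.DataFrame(data)
# parse the string columns into datetime64 / timedelta64 dtypes
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['Timestamp2'] = pd.to_datetime(df['Timestamp2'])
df['check'] = pd.to_timedelta(df['check'])

# in this sample, 'check' is exactly the gap between the two timestamps
gap_matches = ((df['Timestamp2'] - df['Timestamp']) == df['check']).all()
print(gap_matches)  # True
```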
I would like to go through all the dataframes in `df_list` and, inside each one, insert one empty row directly below every row whose 'check' value is greater than 15 hours.
Desired Output
|   | User | Timestamp | Timestamp2 | check | in_out |
|---|------|-----------|------------|-------|--------|
| 0 | 0001 | 2022-01-07 07:40:44 | 2022-01-07 17:32:18 | 0 days 09:51:34 | in |
| 1 | 0001 | 2022-01-07 17:32:18 | 2022-01-10 17:31:50 | 2 days 23:59:32 | out |
| 2 | 0001 | | | | |
| 3 | 0001 | 2022-01-10 17:31:50 | 2022-01-11 07:43:48 | 0 days 14:11:58 | in |
Attempt:
What I tried: an outer for loop over the list and an inner loop over all rows of each dataframe. Inside, an if statement checks my condition; when it is true, I create a new blank row with the required index and concat it into the dataframe so the empty row is included.
```python
for dfx in df_list:
    for i in dfx.index.values.tolist():
        if (pd.Timedelta(dfx["check"].loc[i]) > pd.Timedelta("15 hours")) == True:
            line = pd.DataFrame({'ID': '', 'Timestamp': None, 'Timestamp2': '', 'check': None}, index=[i+1])
            dfx = pd.concat([dfx.iloc[:i+1], line, dfx.iloc[i:]]).reset_index(drop=True)
```
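Several things keep the loop above from working: `dfx = pd.concat(...)` only rebinds the loop variable, so the dataframes stored in `df_list` never change; `dfx.iloc[i:]` starts at row `i` again, so that row is duplicated; the index shifts after each insert while the loop still walks the old labels; and the blank row uses an `'ID'` column instead of `'User'`. A minimal sketch of a loop-based fix, assuming the column names from the question (`insert_blank_rows` is a hypothetical helper name), collects the pieces and builds a new list:

```python
import pandas as pd

data = {'User': [1, 1, 1],
        'Timestamp': ['2022-01-07 07:40:44', '2022-01-07 17:32:18', '2022-01-10 17:31:50'],
        'Timestamp2': ['2022-01-07 17:32:18', '2022-01-10 17:31:50', '2022-01-11 07:43:48'],
        'check': pd.to_timedelta(['0 days 09:51:34', '2 days 23:59:32', '0 days 14:11:58']),
        'in_out': ['in', 'out', 'in']}
df_list = [pd.DataFrame(data)]


def insert_blank_rows(df, threshold=pd.Timedelta(hours=15)):
    """Return a new frame with an empty row after every row whose 'check' exceeds threshold."""
    pieces = []
    for i in range(len(df)):
        row = df.iloc[[i]]  # one-row DataFrame, keeps the column dtypes
        pieces.append(row)
        if pd.Timedelta(row['check'].iloc[0]) > threshold:
            # blank row: keep the User, every other column is left missing
            pieces.append(pd.DataFrame({'User': [row['User'].iloc[0]]}))
    # concat fills the columns missing from the blank rows with NaN/NaT
    return pd.concat(pieces, ignore_index=True)


# build a new list instead of rebinding the loop variable
df_list = [insert_blank_rows(dfx) for dfx in df_list]
print(df_list[0])
```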
I also found the question "Add empty row if a condition is met Pandas", but I can't get that approach running either:
```python
for i in dfy.index.values.tolist():
    indices = dfy.loc[pd.Timedelta(((dfy["check"].loc[i]) > pd.Timedelta("15 hours")) == True)].index.tolist()
```

ERROR: Value must be Timedelta, string, integer, float, timedelta or convertible, not bool
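The error comes from the outer `pd.Timedelta(...)`: the comparison inside it already produces a bool, and `pd.Timedelta` cannot turn a bool into a duration. No per-row loop is needed for this part; a sketch, assuming `check` is already a timedelta column, compares the whole column once and selects with the resulting boolean mask:

```python
import pandas as pd

dfy = pd.DataFrame({'check': pd.to_timedelta(['0 days 09:51:34', '2 days 23:59:32',
                                              '0 days 14:11:58'])})

# one vectorized comparison -> boolean Series, True where the gap exceeds 15 hours
over_15h = dfy['check'] > pd.Timedelta('15 hours')
# index labels of the rows that need an empty row below them
indices = dfy.loc[over_15h].index.tolist()
print(indices)  # [1]
```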
In the next step I want to fill the new row's 'Timestamp' with the previous row's 'Timestamp' plus or minus 10 hours, depending on the value of the `in_out` column, but I thought I should solve one problem at a time.
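A possible sketch of that next step, assuming the blank rows already exist with `NaT` in 'Timestamp' and the timestamps are datetime64. The sign convention in `OFFSETS` (subtract 10 hours after an 'in' row, add 10 hours after an 'out' row) is a hypothetical placeholder, since the question doesn't say which state goes which way:

```python
import numpy as np
import pandas as pd

# sample frame that already contains the blank row (label 2) after the long 'out' row
df = pd.DataFrame({
    'User': [1, 1, 1, 1],
    'Timestamp': pd.to_datetime(['2022-01-07 07:40:44', '2022-01-07 17:32:18',
                                 None, '2022-01-10 17:31:50']),
    'in_out': ['in', 'out', np.nan, 'in'],
})

# HYPOTHETICAL sign convention: swap the signs if your data needs the opposite
OFFSETS = {'in': pd.Timedelta(hours=-10), 'out': pd.Timedelta(hours=10)}

blank = df['Timestamp'].isna()
prev_ts = df['Timestamp'].shift()        # Timestamp of the previous row
offset = df['in_out'].shift().map(OFFSETS)  # offset chosen by the previous row's in_out
df.loc[blank, 'Timestamp'] = prev_ts[blank] + offset[blank]
print(df)
```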
Answer
You can create a boolean mask from the "check" column and add a row by repeating the matching index labels with `Index.repeat` and reindexing. Then `sort_index` and blank out the duplicated values:
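```python
import numpy as np
import pandas as pd

# True for rows whose 'check' duration exceeds 15 hours
msk = df['check'] > pd.Timedelta(hours=15)
# repeat the matching rows twice, keep the others once, and restore the row order
df = pd.concat([df.reindex(df[msk].index.repeat(2)), df[~msk]]).sort_index()
# blank every column except 'User' in the second copy of each repeated row
df.loc[df.duplicated(['Timestamp', 'check']), df.columns.drop('User')] = np.nan
```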
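Note that this assumes "check" already has a timedelta dtype (and the two timestamp columns a datetime dtype). If they are still strings, as in the data dict in the question, convert them first with `pd.to_timedelta` and `pd.to_datetime`.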
Output:
```
   User           Timestamp          Timestamp2           check in_out
0     1 2022-01-07 07:40:44 2022-01-07 17:32:18 0 days 09:51:34     in
1     1 2022-01-07 17:32:18 2022-01-10 17:31:50 2 days 23:59:32    out
1     1                 NaT                 NaT             NaT    NaN
2     1 2022-01-10 17:31:50 2022-01-11 07:43:48 0 days 14:11:58     in
```
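To apply this to every dataframe in `df_list`, the steps can be wrapped in a function. A minimal sketch, assuming "check" is already a timedelta column; `add_blank_rows` is a hypothetical helper name. It varies the approach slightly: it marks the second copy by its duplicated index label rather than by `['Timestamp', 'check']` (so two genuine rows with equal values are never blanked), and it renumbers the rows 0..n-1 as in the desired output:

```python
import numpy as np
import pandas as pd


def add_blank_rows(df, threshold=pd.Timedelta(hours=15)):
    """Repeat each row whose 'check' exceeds threshold, then blank the second copy."""
    msk = df['check'] > threshold
    out = pd.concat([df.reindex(df[msk].index.repeat(2)), df[~msk]])
    out = out.sort_index(kind='stable')          # keep both copies in order
    second_copy = out.index.duplicated(keep='first')
    out = out.reset_index(drop=True)             # renumber rows 0..n-1
    out.loc[second_copy, out.columns.drop('User')] = np.nan
    return out


data = {'User': [1, 1, 1],
        'Timestamp': ['2022-01-07 07:40:44', '2022-01-07 17:32:18', '2022-01-10 17:31:50'],
        'Timestamp2': ['2022-01-07 17:32:18', '2022-01-10 17:31:50', '2022-01-11 07:43:48'],
        'check': pd.to_timedelta(['0 days 09:51:34', '2 days 23:59:32', '0 days 14:11:58']),
        'in_out': ['in', 'out', 'in']}
base = pd.DataFrame(data)
df_list = [base, base.copy()]

# build a new list so every dataframe in df_list gets its blank rows
df_list = [add_blank_rows(d) for d in df_list]
print(df_list[0])
```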