I have a dataframe with 3 columns: `session_id`, `name`, `reset_flag`.
I need to make a new column, `new_name`, where the new name will be set to the first name where `reset_flag=True`, and then it will continue as that name WITHIN that session, until there is a new `reset_flag`.
I'm not really sure of the best way to approach this.
EDIT: I thought of a way to do so with df.iterrows(), by storing into a list and then appending, but it seems very bulky. Is there a more efficient ‘pandas’ way?
Sample expected output
session_id | name | reset_flag | new_name |
---|---|---|---|
06c97a-bc7-6cc-29f-65978ee8d | some_name_1 | TRUE | some_name_1 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_1 | | some_name_1 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_1 | | some_name_1 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_2 | TRUE | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_2 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_2 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_3 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_3 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_4 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_4 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_4 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_5 | TRUE | some_name_5 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_1 | TRUE | some_name_1 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_1 | | some_name_1 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_1 | | some_name_1 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_2 | | some_name_1 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_2 | | some_name_1 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_2 | | some_name_1 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_3 | TRUE | some_name_3 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_3 | | some_name_3 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_4 | | some_name_3 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_4 | | some_name_3 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_4 | | some_name_3 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_5 | TRUE | some_name_5 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_6 | | some_name_5 |
Answer
An efficient way to go about this would be to use `cumsum` on the “reset_flag” column: this will give you a column of numbers that increases by one every time a `True` is encountered.
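To make the running count concrete, here is a small sketch on made-up flags (the values below are hypothetical and not taken from the question's data). Booleans are summed as 0 and 1, so every `True` starts a new block number:

```python
import pandas as pd

# Hypothetical flags for illustration; True marks a reset row.
reset_flag = pd.Series([True, False, False, True, False, True])

# Booleans are added as 0/1, so the running total steps up at each True
# and stays flat on the False rows that follow it.
block_id = reset_flag.cumsum()
print(block_id.tolist())  # [1, 1, 1, 2, 2, 3]
```

Rows that share a block number belong to the same reset.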
You can then simply group by this column to get the desired result (I’m assuming your “reset_flag” column is boolean):
df["new_name"] = df.groupby(df["reset_flag"].cumsum())["name"].transform("first")
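Note that the one-liner groups on the running count alone, so it stays within a session only when every session starts with a `True` row, as in the sample. If that is not guaranteed, one option is to add `session_id` to the group keys. The following is a minimal sketch on made-up data; the column `new_name_by_session` and the use of `.eq(True)` to treat blank flags as `False` are assumptions for illustration, not part of the original answer. With this variant, rows that come before the first reset in a session fall back to the first name seen in that session, which may or may not be what you want.

```python
import pandas as pd

# Hypothetical data: session "b" does not start with a reset row.
df = pd.DataFrame({
    "session_id": ["a", "a", "a", "b", "b", "b"],
    "name":       ["n1", "n2", "n3", "n4", "n5", "n5"],
    "reset_flag": [True, False, True, False, True, False],
})

# eq(True) maps anything that is not True (including NaN) to False;
# it leaves a clean boolean column unchanged.
is_reset = df["reset_flag"].eq(True)
block_id = is_reset.cumsum()

# The answer's approach: group on the running count only.
df["new_name"] = df.groupby(block_id)["name"].transform("first")

# Variant: also group on session_id so a block never spans two sessions.
df["new_name_by_session"] = (
    df.groupby(["session_id", block_id])["name"].transform("first")
)

print(df)
```

On this data the first row of session "b" gets `n3` (carried over from session "a") in `new_name`, but `n4` in `new_name_by_session`.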