I have a table with the following structure in pandas:
import pandas as pd df = pd.DataFrame({ "user_id": [1, 2, 3], "user_events": [1, 2, 3], "user_successes": [0, 1, 2] })
I would like to put it in a long format. In this case, we have, for each user, a different number of events, and successes. I would like to transform this into an event table (each row corresponds to an event, and there is a column that tells you whether it was a success or not). In this case, the output dataframe should be:
out_df = pd.DataFrame({ "user_id": [1, 2, 2, 3, 3, 3], "success": [0, 1, 0, 1, 1, 0] })
What’s the simplest way of doing this in pandas? I would like to avoid for loops that iterate on each user, create dataframes and then append them.
Advertisement
Answer
You may try with reindex
with repeat
then assign the value with limit condition created by groupby
cumsum
s = df.reindex(df.index.repeat(df.user_events)) s['success'] = 1 s['success'] = s['success'].where(s.groupby('user_id')['success'].cumsum()<=s['user_successes'],0) s Out[54]: user_id user_events user_successes success 0 1 1 0 0 1 2 2 1 1 1 2 2 1 0 2 3 3 2 1 2 3 3 2 1 2 3 3 2 0