Skip to content
Advertisement

Pandas long format of success table

I have a table with the following structure in pandas:

import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "user_events": [1, 2, 3],
    "user_successes": [0, 1, 2]
})

I would like to put it in a long format. In this case, we have, for each user, a different number of events, and successes. I would like to transform this into an event table (each row corresponds to an event, and there is a column that tells you whether it was a success or not). In this case, the output dataframe should be:

out_df = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 3, 3],
    "success": [0, 1, 0, 1, 1, 0]
})

What’s the simplest way of doing this in pandas? I would like to avoid for loops that iterate on each user, create dataframes and then append them.

Advertisement

Answer

You may try with reindex with repeat then assign the value with limit condition created by groupby cumsum

s = df.reindex(df.index.repeat(df.user_events))
s['success'] = 1
s['success'] = s['success'].where(s.groupby('user_id')['success'].cumsum()<=s['user_successes'],0)
s
Out[54]: 
   user_id  user_events  user_successes  success
0        1            1               0        0
1        2            2               1        1
1        2            2               1        0
2        3            3               2        1
2        3            3               2        1
2        3            3               2        0
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement