Inserting missing categories and dates in a pandas DataFrame

Question

I have the following data frame. I want to add all score levels (high, mid, low) for each group (a, b, c, d) for all dates (there are two dates: 2020-06-01 and 2020-06-02). I can add the score categories for all subjects with the following, but I am having trouble adding the date as well. The expected

Accepted Answer

Your solution can be modified by also taking the unique values of the date column. This version works even if the (date, group, score) triples are not unique in the input data:

```python
import pandas as pd
from itertools import product

cats = ['high', 'mid', 'low']
# every (date, group, score) combination
x_re = pd.DataFrame(list(product(x['date'].unique(),
                                 x['group'].unique(),
                                 cats)), columns=['date', 'group', 'score'])
# the left merge keeps all combinations; missing counts become 0
x = x_re.merge(x, how='left').fillna(0)
```

A solution with reindex against a three-level MultiIndex is similar, but it requires the triples to be unique, because reindex cannot work on an index with duplicate labels:

```python
cats = ['high', 'mid', 'low']
x_re = pd.MultiIndex.from_product([x['date'].unique(),
                                   x['group'].unique(),
                                   cats], names=['date', 'group', 'score'])
# reindex inserts NaN for combinations missing from x
x = x.set_index(['date', 'group', 'score']).reindex(x_re).reset_index()
print(x)
```

```
          date group score  count
0   2020-06-01     a  high   12.0
1   2020-06-01     a   mid    NaN
2   2020-06-01     a   low   13.0
3   2020-06-01     b  high    NaN
4   2020-06-01     b   mid    NaN
5   2020-06-01     b   low   19.0
6   2020-06-01     c  high    3.0
7   2020-06-01     c   mid    NaN
8   2020-06-01     c   low    NaN
9   2020-06-01     d  high    NaN
10  2020-06-01     d   mid    NaN
11  2020-06-01     d   low    NaN
12  2020-06-02     a  high    NaN
13  2020-06-02     a   mid    2.0
14  2020-06-02     a   low    NaN
15  2020-06-02     b  high   22.0
16  2020-06-02     b   mid    NaN
17  2020-06-02     b   low    NaN
18  2020-06-02     c  high    4.0
19  2020-06-02     c   mid   49.0
20  2020-06-02     c   low    NaN
21  2020-06-02     d  high   12.0
22  2020-06-02     d   mid    NaN
23  2020-06-02     d   low    NaN
```

It is also possible to use one call to unstack and one call to stack, but then every value in cats must already exist in the input data. Note that the scores come out sorted alphabetically (high, low, mid):

```python
# dropna=False keeps the all-NaN combinations; newer pandas versions (2.1+)
# may warn that this stack implementation is deprecated
x = (x.set_index(['date', 'group', 'score'])
      .unstack(['group', 'score'])
      .stack([1, 2], dropna=False)
      .reset_index())
print(x)
```

```
          date group score  count
0   2020-06-01     a  high   12.0
1   2020-06-01     a   low   13.0
2   2020-06-01     a   mid    NaN
3   2020-06-01     b  high    NaN
4   2020-06-01     b   low   19.0
5   2020-06-01     b   mid    NaN
6   2020-06-01     c  high    3.0
7   2020-06-01     c   low    NaN
8   2020-06-01     c   mid    NaN
9   2020-06-01     d  high    NaN
10  2020-06-01     d   low    NaN
11  2020-06-01     d   mid    NaN
12  2020-06-02     a  high    NaN
13  2020-06-02     a   low    NaN
14  2020-06-02     a   mid    2.0
15  2020-06-02     b  high   22.0
16  2020-06-02     b   low    NaN
17  2020-06-02     b   mid    NaN
18  2020-06-02     c  high    4.0
19  2020-06-02     c   low    NaN
20  2020-06-02     c   mid   49.0
21  2020-06-02     d  high   12.0
22  2020-06-02     d   low    NaN
23  2020-06-02     d   mid    NaN
```
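To compare the merge and reindex approaches side by side, here is a minimal self-contained sketch. The input frame is an assumption: it is rebuilt from the non-missing rows of the printed output, since the question's original data frame is not shown. The names keys, grid, mi, merged, reindexed and dup are illustrative only.

```python
from itertools import product

import pandas as pd

# Assumed input: rebuilt from the non-missing rows of the printed output,
# because the question's original frame is not shown.
x = pd.DataFrame({
    'date': ['2020-06-01'] * 4 + ['2020-06-02'] * 5,
    'group': ['a', 'a', 'b', 'c', 'a', 'b', 'c', 'c', 'd'],
    'score': ['high', 'low', 'low', 'high', 'mid', 'high', 'high', 'mid', 'high'],
    'count': [12, 13, 19, 3, 2, 22, 4, 49, 12],
})
cats = ['high', 'mid', 'low']
keys = ['date', 'group', 'score']

# Approach 1: build every (date, group, score) combination, then left-merge.
grid = pd.DataFrame(list(product(x['date'].unique(),
                                 x['group'].unique(),
                                 cats)), columns=keys)
merged = grid.merge(x, how='left').fillna(0)

# Approach 2: reindex against the same combinations as a 3-level MultiIndex.
mi = pd.MultiIndex.from_product([x['date'].unique(),
                                 x['group'].unique(),
                                 cats], names=keys)
reindexed = x.set_index(keys).reindex(mi).reset_index()

# 2 dates x 4 groups x 3 scores = 24 rows for both approaches
print(len(merged), len(reindexed))  # 24 24

# Duplicate one (date, group, score) triple to see how each approach reacts.
dup = pd.concat([x, x.head(1)], ignore_index=True)
print(len(grid.merge(dup, how='left')))  # 25: the duplicated key matches twice
try:
    dup.set_index(keys).reindex(mi)
    reindex_failed = False
except ValueError as err:
    # the message wording varies between pandas versions
    reindex_failed = True
    print('reindex failed:', err)
```

With a duplicated triple, the merge simply returns an extra row for that key, while reindex raises a ValueError. That is why the merge version is the safer choice when uniqueness of the input is not guaranteed.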


Answer