Identify number of occurrence of any combination of equal elements in DataFrame

Question

I have a mock dataframe, df1 with 6 columns and 5 rows, i.e., with shape (5 x 6). Each column represents the price of an area, and rows are time. Now, I want to identify - in each row - the areas with the same price as the first column "DK1", and then being able to sum up how often

Accepted Answer

Compare DataFrame by first column by DataFrame.eq, then use matrix multiplication by DataFrame.dot with columns names with separator and last count with Series.value_counts and convert to DataFrame:df = (df.eq(df['DK1'], axis=0)        .dot(df.columns + ',')        .str[:-1]        .value_counts()        .rename_axis('Combo')        .reset_index(name='Occurrence'))print (df)                     Combo  Occurrence0              DK1,DK2,NO4           21                  DK1,NO1           12          DK1,DK2,NO1,NO2           13  DK1,DK2,NO1,NO2,NO3,NO4           1EDIT: For groups is possible create dictionary by all values and then call replace:s = df.columns.to_series()s.index = s.index.str.replace('d+','', regex=True)d = s.groupby(level=0).agg(','.join).to_dict()d = {v:k for k, v in d.items()}print (d){'DK1,DK2': 'DK', 'NO1,NO2,NO3,NO4': 'NO'}df = (df.eq(df['DK1'], axis=0)        .dot(df.columns + ',')        .str[:-1]        .value_counts()        .rename_axis('Combo')        .reset_index(name='Occurrence'))df['Combo'] = df['Combo'].replace(d, regex=True)print (df)        Combo  Occurrence0      DK,NO4           21     DK1,NO1           12  DK,NO1,NO2           13       DK,NO           1

Advertisement

Answer