Skip to content
Advertisement

Select entries in one dataframe based on cross-sectional statistic of another dataframe

I want to select the entries of one dataframe, say df2, based on the cross-sectional statistic of another dataframe, say df1:

df1 = pd.DataFrame([[4, 5, 9, 11],
               [3, 1, 45, 1],
               [88, 314, 2, 313]], columns = ['A', 'B', 'C', 'D'])
df2 = pd.DataFrame([['h','e','l','p'],
               ['m','y','q','u'],
               ['e','r','y','.']], columns = ['A', 'B', 'C', 'D'])

For instance, if the cross-sectional statistic on df1 is a max operation, then for the 3 rows in df1 the corresponding columns with the max entries are ‘D’, ‘C’, ‘B’ (corresponding to entries 11, 45, 314).

Selecting only those entries in df2 should give me:

enter image description here

which I can achieve by:

mask_ = pd.DataFrame(False, index=df1.idxmax(1).index, columns=df1.idxmax(1))
for k,i in enumerate(df1.idxmax(1)):
    mask_.loc[k, i] = True
df2[mask_]

However, this feels cumbersome; is there an easier way to do this?

Advertisement

Answer

Solution working if index and columns names are same in both DataFrames.

Use DataFrame.where with mask for compare maximal values by all values of rows:

df = df2.where(df1.eq(df1.max(axis=1), axis=0))
print (df)
     A    B    C    D
0  NaN  NaN  NaN    p
1  NaN  NaN    q  NaN
2  NaN    r  NaN  NaN
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement