Select entries in one dataframe based on cross-sectional statistic of another dataframe

I want to select the entries of one dataframe, say df2, based on the cross-sectional statistic of another dataframe, say df1:

df1 = pd.DataFrame([[4, 5, 9, 11],
               [3, 1, 45, 1],
               [88, 314, 2, 313]], columns = ['A', 'B', 'C', 'D'])
df2 = pd.DataFrame([['h','e','l','p'],
               ['m','y','q','u'],
               ['e','r','y','.']], columns = ['A', 'B', 'C', 'D'])

JavaScript
​x
 
df1 = pd.DataFrame([[4, 5, 9, 11],
               [3, 1, 45, 1],
               [88, 314, 2, 313]], columns = ['A', 'B', 'C', 'D'])
df2 = pd.DataFrame([['h','e','l','p'],
               ['m','y','q','u'],
               ['e','r','y','.']], columns = ['A', 'B', 'C', 'D'])
​

For instance, if the cross-sectional statistic on df1 is a max operation, then for the 3 rows in df1 the corresponding columns with the max entries are ‘D’, ‘C’, ‘B’ (corresponding to entries 11, 45, 314).

Selecting only those entries in df2 should give me:

which I can achieve by:

mask_ = pd.DataFrame(False, index=df1.idxmax(1).index, columns=df1.idxmax(1))
for k,i in enumerate(df1.idxmax(1)):
    mask_.loc[k, i] = True
df2[mask_]

JavaScript
 
mask_ = pd.DataFrame(False, index=df1.idxmax(1).index, columns=df1.idxmax(1))
for k,i in enumerate(df1.idxmax(1)):
    mask_.loc[k, i] = True
df2[mask_]
​

However, this feels cumbersome; is there an easier way to do this?

Answer

Solution working if index and columns names are same in both DataFrames.

Use DataFrame.where with mask for compare maximal values by all values of rows:

df = df2.where(df1.eq(df1.max(axis=1), axis=0))
print (df)
     A    B    C    D
0  NaN  NaN  NaN    p
1  NaN  NaN    q  NaN
2  NaN    r  NaN  NaN

JavaScript
 
df = df2.where(df1.eq(df1.max(axis=1), axis=0))
print (df)
     A    B    C    D
0  NaN  NaN  NaN    p
1  NaN  NaN    q  NaN
2  NaN    r  NaN  NaN
​

Advertisement

Answer