I want to select the entries of one dataframe, say df2, based on the cross-sectional statistic of another dataframe, say df1:
df1 = pd.DataFrame([[4, 5, 9, 11], [3, 1, 45, 1], [88, 314, 2, 313]], columns = ['A', 'B', 'C', 'D']) df2 = pd.DataFrame([['h','e','l','p'], ['m','y','q','u'], ['e','r','y','.']], columns = ['A', 'B', 'C', 'D'])
For instance, if the cross-sectional statistic on df1 is a max operation, then for the 3 rows in df1 the corresponding columns with the max entries are ‘D’, ‘C’, ‘B’ (corresponding to entries 11, 45, 314).
Selecting only those entries in df2 should give me:
which I can achieve by:
mask_ = pd.DataFrame(False, index=df1.idxmax(1).index, columns=df1.idxmax(1)) for k,i in enumerate(df1.idxmax(1)): mask_.loc[k, i] = True df2[mask_]
However, this feels cumbersome; is there an easier way to do this?
Advertisement
Answer
Solution working if index and columns names are same in both DataFrames.
Use DataFrame.where
with mask for compare maximal values by all values of rows:
df = df2.where(df1.eq(df1.max(axis=1), axis=0)) print (df) A B C D 0 NaN NaN NaN p 1 NaN NaN q NaN 2 NaN r NaN NaN