Filter out dataframe based on values being within the 90th percentile

Question

Suppose I have this dataframe Now I want to go through each column and filter out the low percentiles keeping only values that are contained in the 90th percentile. Thus since apple and bob are each within their associated 90th percentiles I would have this dataframe How do I achieve this? Answer Hope this he…

Accepted Answer

Hope this helps:Calculate 90th percentile and keep them in a dictionary:perc_dic = df.apply(lambda x : np.percentile(x.value_counts().tolist(), 90) , axis= 0).to_dict()perc_dic {'fruit': 2.8, 'names': 2.8}2.Calculate value counts of each value:value_dic = df.apply(lambda x : x.value_counts() , axis= 0).to_dict()value_dic{'fruit': {'apple': 3.0, 'bob': nan, 'joe': nan, 'orange': 1.0}, 'names': {'apple': nan, 'bob': 3.0, 'joe': 1.0, 'orange': nan}}Compare value counts and percentile:df_result =  df.apply(lambda row : row  if value_dic['fruit'][row['fruit']] >= perc_dic['fruit']  and                                           value_dic['names'][row['names']] >= perc_dic['names']                                          else np.nan , axis= 1)df_result.dropna()    fruit   names0   apple   bob1   apple   bob2   apple   bobNote:  There are some nan values in value_dic but it won&#8217;t hurt the performance.

Advertisement

Answer