Skip to content
Advertisement

Pandas filtering based on minimum data occurrences across multiple columns

I have a dataframe like this

country     data_fingerprint   organization     
US          111                Tesco         
UK          222                IBM          
US          111                Yahoo           
PY          333                Tesco
US          111                Boeing   
CN          333                TCS  
NE          458                Yahoo
UK          678                Tesco

I want those data_fingerprint for where the organisation and country with top 2 counts exists

So if see in organization top 2 occurrences are for Tesco,Yahoo and for country we have US,UK .

So based on that the output of data_fingerprint should be having

data_fingerprint
111
678

What i have tried for organization to exist in my complete dataframe is this

# First find top 2 occurances of organization
nd = df['organization'].value_counts().groupby(level=0, group_keys=False).head(2)
# Then checking if the organization exist in the complete dataframe and filtering those rows
new = df["organization"].isin(nd)

But i am not getting any data here.Once i get data for this I can do it along with country Can someone please help to get me the output.I have less data so using Pandas

Advertisement

Answer

here is one way to do it

df[
    df['organization'].isin(df['organization'].value_counts().head(2).index) &
    df['country'].isin(df['country'].value_counts().head(2).index)
]['data_fingerprint'].unique()
array([111, 678], dtype=int64)
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement