Skip to content
Advertisement

Group by column on basis of occurrences

This is my dataframe:

df = pd.DataFrame({
    'User': ["alex", "alex", "ravi", "dodo", "dodo", "dodo", "cokie","dodo","nemo","ravi"],
    'Id': ['a', 'b', 'b', 'a', 'b', 'b', 'c','a','e','b']
})

This is how my dataframe looklike:

    User Id
0   alex  a
1   alex  b
2   ravi  b
3   dodo  a
4   dodo  b
5   dodo  b
6  cokie  c
7   dodo  a
8   nemo  e
9   ravi  b

I first counted the no of items for each user using the following code:

df_group = df.groupby(['User', 'Id'])

# size of group to count observations
df_group = df_group.size()
  
# make a column name 
df_group = df_group.reset_index(name='Observation')

This is how it looks:

  User  Id Observation
0   alex    a   1
1   alex    b   1
2   cokie   c   1
3   dodo    a   2
4   dodo    b   2
5   nemo    e   1
6   ravi    b   2

I want to remove a user who is coming 1 time and also observation is 1. For example user nemo and cokie. But I don’t want to remove user alex, even though items a and b are coming 1 time or user ravi.

How can I do it?

My end goal:

    User    Id Observation
0   alex    a   1
1   alex    b   1

3   dodo    a   2
4   dodo    b   2

6   ravi    b   2

Advertisement

Answer

This works:

s = df.groupby('User').size() > 1
df = df.merge(df_group, how='left')[df['User'].isin(s[s].index)].drop_duplicates()
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement