This is my dataframe:
import pandas as pd

df = pd.DataFrame({
    'User': ["alex", "alex", "ravi", "dodo", "dodo", "dodo", "cokie", "dodo", "nemo", "ravi"],
    'Id': ['a', 'b', 'b', 'a', 'b', 'b', 'c', 'a', 'e', 'b'],
})
This is what my dataframe looks like:
    User Id
0   alex  a
1   alex  b
2   ravi  b
3   dodo  a
4   dodo  b
5   dodo  b
6  cokie  c
7   dodo  a
8   nemo  e
9   ravi  b
I first counted the number of occurrences of each (User, Id) pair using the following code:
df_group = df.groupby(['User', 'Id'])
# size of each group = number of observations
df_group = df_group.size()
# turn the counts into a column named 'Observation'
df_group = df_group.reset_index(name='Observation')
This is how it looks:
    User Id  Observation
0   alex  a            1
1   alex  b            1
2  cokie  c            1
3   dodo  a            2
4   dodo  b            2
5   nemo  e            1
6   ravi  b            2
I want to remove users who appear only once, i.e. a user with a single row whose Observation is 1, for example nemo and cokie. But I don't want to remove alex, even though each of his items (a and b) appears only once, nor ravi, who has a single row but with Observation 2.
How can I do it?
My end goal:
    User Id  Observation
0   alex  a            1
1   alex  b            1
3   dodo  a            2
4   dodo  b            2
6   ravi  b            2
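One way to express the rule (a sketch, not the accepted answer's code) is that a user appears exactly once in df precisely when their Observation values in df_group sum to 1. GroupBy.transform('sum') broadcasts that per-user total back onto every row of df_group, so a single boolean mask keeps alex, dodo and ravi and preserves df_group's original index (0, 1, 3, 4, 6), as in the goal table. The names user_total and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'User': ["alex", "alex", "ravi", "dodo", "dodo", "dodo", "cokie", "dodo", "nemo", "ravi"],
    'Id': ['a', 'b', 'b', 'a', 'b', 'b', 'c', 'a', 'e', 'b'],
})

# Count each (User, Id) pair, as in the question.
df_group = df.groupby(['User', 'Id']).size().reset_index(name='Observation')

# Total observations per user, repeated on every df_group row of that user.
user_total = df_group.groupby('User')['Observation'].transform('sum')

# A total of 1 means the user occurs once in df (cokie, nemo); drop those rows.
result = df_group[user_total > 1]
print(result)
```

Because the mask is aligned on df_group's own index, no merge or drop_duplicates step is needed.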
Answer
This works:
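# True for every user that has more than one row in the original df
s = df.groupby('User').size() > 1
# attach Observation to each original row, keep only those users, then drop repeated rows
df = df.merge(df_group, how='left')[df['User'].isin(s[s].index)].drop_duplicates()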
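The merge in the answer only serves to attach Observation to the original rows. Since df_group already carries that column, a variant (a sketch, not the answerer's code) can apply the same `s` mask to df_group directly. The merge version returns rows in the order users first appear in df (alex, ravi, dodo); this variant keeps df_group's order and index, matching the goal table. The name result is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'User': ["alex", "alex", "ravi", "dodo", "dodo", "dodo", "cokie", "dodo", "nemo", "ravi"],
    'Id': ['a', 'b', 'b', 'a', 'b', 'b', 'c', 'a', 'e', 'b'],
})
df_group = df.groupby(['User', 'Id']).size().reset_index(name='Observation')

# Boolean Series indexed by user: True if the user has more than one row in df.
s = df.groupby('User').size() > 1

# Keep only df_group rows whose user passed the test; no merge needed.
result = df_group[df_group['User'].isin(s[s].index)]
print(result)
```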