This is my dataframe:
import pandas as pd

df = pd.DataFrame({
    'User': ["alex", "alex", "ravi", "dodo", "dodo", "dodo", "cokie", "dodo", "nemo", "ravi"],
    'Id': ['a', 'b', 'b', 'a', 'b', 'b', 'c', 'a', 'e', 'b']
})
This is what my dataframe looks like:
    User Id
0   alex  a
1   alex  b
2   ravi  b
3   dodo  a
4   dodo  b
5   dodo  b
6  cokie  c
7   dodo  a
8   nemo  e
9   ravi  b
I first counted the number of observations for each user and item using the following code:
df_group = df.groupby(['User', 'Id'])

# size of group to count observations
df_group = df_group.size()

# make a column name
df_group = df_group.reset_index(name='Observation')
This is how it looks:
    User Id  Observation
0   alex  a            1
1   alex  b            1
2  cokie  c            1
3   dodo  a            2
4   dodo  b            2
5   nemo  e            1
6   ravi  b            2
I want to remove any user who appears only once, that is, who has a single item with an observation count of 1, such as nemo and cokie. I don't want to remove alex, even though each of alex's items (a and b) appears only once, and I don't want to remove ravi either.
How can I do it?
My end goal:
   User Id  Observation
0  alex  a            1
1  alex  b            1
3  dodo  a            2
4  dodo  b            2
6  ravi  b            2
Answer
This works:
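# True for users that have more than one row in the original dataframe
s = df.groupby('User').size() > 1
# attach the counts, keep only those users, then drop repeated (User, Id) rows
df = df.merge(df_group, how='left')[df['User'].isin(s[s].index)].drop_duplicates()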
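As an alternative, the same filter can probably be applied directly to `df_group` without merging back into `df`. The following is a minimal sketch, assuming the goal is to drop every user whose observation counts sum to 1; the names `rows_per_user` and `result` are introduced here for illustration and are not part of the original question or answer.

```python
import pandas as pd

df = pd.DataFrame({
    'User': ["alex", "alex", "ravi", "dodo", "dodo", "dodo", "cokie", "dodo", "nemo", "ravi"],
    'Id': ['a', 'b', 'b', 'a', 'b', 'b', 'c', 'a', 'e', 'b'],
})

# Count observations per (User, Id) pair, as in the question.
df_group = df.groupby(['User', 'Id']).size().reset_index(name='Observation')

# Sum the counts per user and broadcast that total back onto
# every row of df_group (transform keeps the original index).
rows_per_user = df_group.groupby('User')['Observation'].transform('sum')

# Keep only users that appear more than once overall.
result = df_group[rows_per_user > 1]
print(result)
```

Because `transform` returns a Series aligned with `df_group`, the boolean mask needs no merge or `drop_duplicates`, and the surviving rows keep the index labels they had in `df_group`.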