I have a df
    id  val1  val2
     1   1.1   2.2
     1   1.1   2.2
     2   2.1   5.5
     3   8.8   6.2
     4   1.1   2.2
     5   8.8   6.2
I want to group by val1 and val2 and get a similar dataframe containing only the rows whose val1 and val2 combination occurs more than once.
Final df:
    id  val1  val2
     1   1.1   2.2
     4   1.1   2.2
     3   8.8   6.2
     5   8.8   6.2
Answer
You need duplicated with the subset parameter to specify which columns to check, and keep=False to mark every occurrence of a duplicate as True. The result is a boolean mask that you can use to filter the rows by boolean indexing:
    df = df[df.duplicated(subset=['val1','val2'], keep=False)]
    print(df)

       id  val1  val2
    0   1   1.1   2.2
    1   1   1.1   2.2
    3   3   8.8   6.2
    4   4   1.1   2.2
    5   5   8.8   6.2
Detail:
    print(df.duplicated(subset=['val1','val2'], keep=False))

    0     True
    1     True
    2    False
    3     True
    4     True
    5     True
    dtype: bool
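Since the question asks about grouping, here is a self-contained sketch that rebuilds the sample data and compares the duplicated mask with a groupby-based alternative. The names dupes, group_sizes, dupes_via_groupby and grouped_view are only illustrative, and the groupby variant is an alternative for comparison, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],
})

# Approach from the answer: mark every row whose (val1, val2) pair repeats.
mask = df.duplicated(subset=["val1", "val2"], keep=False)
dupes = df[mask]

# Alternative (illustrative): size of each (val1, val2) group,
# broadcast back to the original rows, then keep groups larger than 1.
group_sizes = df.groupby(["val1", "val2"])["id"].transform("size")
dupes_via_groupby = df[group_sizes > 1]

# Optional: place rows with the same (val1, val2) pair next to each other.
grouped_view = dupes.sort_values(["val1", "val2"])

print(dupes)
print(grouped_view)
```

Both filters select the same rows on this data. The duplicated version is shorter and avoids building groups, while the groupby version extends more naturally if you later need a different threshold, such as groups with at least three rows.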