Let's say I have a Pandas dataframe with one column for the measurement parameter and one for the corresponding measurement value.
ID    Parameter    Value
0     'A'          4.3
1     'B'          3.1
2     'C'          8.9
3     'A'          2.1
4     'A'          3.9
...   ...          ...
100   'B'          3.8
How can I filter this dataframe to keep only the parameters that appear more than X times? For example, for this dataframe I want to get all rows whose parameter has more than 5 measurements (let's say only parameters 'A' and 'B' appear more than 5 times), giving a dataframe like the one below.
ID    Parameter    Value
0     'A'          4.3
1     'B'          3.1
3     'A'          2.1
...   ...          ...
100   'B'          3.8
Answer
You can use value_counts + isin:
    v = df.Parameter.value_counts()
    df[df.Parameter.isin(v.index[v.gt(5)])]
For example, with X = 2 (get all rows whose parameter has more than 2 readings):
    df

       ID Parameter  Value
    0   0         A    4.3
    1   1         B    3.1
    2   2         C    8.9
    3   3         A    2.1
    4   4         A    3.9
    5   5         B    4.5

    v = df.Parameter.value_counts()
    v

    A    3
    B    2
    C    1
    Name: Parameter, dtype: int64

    df[df.Parameter.isin(v.index[v.gt(2)])]

       ID Parameter  Value
    0   0         A    4.3
    3   3         A    2.1
    4   4         A    3.9
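If it helps, the same idea can be wrapped in a small helper so the column and threshold are not hard-coded. This is only a sketch: the function name `filter_by_min_count` and its arguments are made up for illustration, and the sample data is the six-row frame from the example above. It also shows a `groupby` + `transform("size")` variant, which is a different technique from the `value_counts` + `isin` approach but should select the same rows.

```python
import pandas as pd


def filter_by_min_count(df, column, threshold):
    """Keep rows whose value in `column` occurs more than `threshold` times.

    Hypothetical helper, not part of pandas.
    """
    counts = df[column].value_counts()             # occurrences per distinct value
    frequent = counts.index[counts.gt(threshold)]  # values seen more than `threshold` times
    return df[df[column].isin(frequent)]


# Sample data matching the worked example above.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "Parameter": ["A", "B", "C", "A", "A", "B"],
    "Value": [4.3, 3.1, 8.9, 2.1, 3.9, 4.5],
})

result = filter_by_min_count(df, "Parameter", 2)
print(result)

# Alternative technique: attach each row's group size, then filter on it.
sizes = df.groupby("Parameter")["Value"].transform("size")
alt = df[sizes > 2]
print(alt)
```

Note that the comparison is strict (`gt`), so a parameter with exactly X readings is dropped; use `ge` (or `>=`) if "at least X" is what you want.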