Let’s say I have a Pandas DataFrame with a column of measurement parameters and a column of the corresponding measurement values.
    ID    Parameter  Value
    0     'A'        4.3
    1     'B'        3.1
    2     'C'        8.9
    3     'A'        2.1
    4     'A'        3.9
    .     .          .
    .     .          .
    .     .          .
    100   'B'        3.8
How can I filter this DataFrame to keep only the parameters that appear more than X times? For example, I want all rows whose parameter has more than 5 measurements (let’s say only parameters ‘A’ and ‘B’ appear more than 5 times), giving a DataFrame like the one below.
    ID    Parameter  Value
    0     'A'        4.3
    1     'B'        3.1
    3     'A'        2.1
    .     .          .
    .     .          .
    .     .          .
    100   'B'        3.8
Answer
You can use value_counts + isin:
    v = df.Parameter.value_counts()
    df[df.Parameter.isin(v.index[v.gt(5)])]
For example, with a threshold of 2 (keep all parameters that have more than 2 readings):
    df

       ID Parameter  Value
    0   0         A    4.3
    1   1         B    3.1
    2   2         C    8.9
    3   3         A    2.1
    4   4         A    3.9
    5   5         B    4.5

    v = df.Parameter.value_counts()
    v

    A    3
    B    2
    C    1
    Name: Parameter, dtype: int64

    df[df.Parameter.isin(v.index[v.gt(2)])]

       ID Parameter  Value
    0   0         A    4.3
    3   3         A    2.1
    4   4         A    3.9
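If you want to run this end to end, the steps above can be sketched as a small self-contained script. The sample data is copied from the example; `filter_frequent` is a hypothetical helper name used here for illustration, not part of pandas. The last lines show a different technique (`groupby` + `transform("size")`) that should select the same rows, included only as a cross-check.

```python
import pandas as pd

# Sample data copied from the example above.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "Parameter": ["A", "B", "C", "A", "A", "B"],
    "Value": [4.3, 3.1, 8.9, 2.1, 3.9, 4.5],
})


def filter_frequent(frame, column, threshold):
    """Keep rows whose value in `column` occurs more than `threshold` times.

    Hypothetical helper for illustration; not a pandas function.
    """
    counts = frame[column].value_counts()           # occurrences per distinct value
    frequent = counts.index[counts.gt(threshold)]   # labels strictly above the threshold
    return frame[frame[column].isin(frequent)]      # boolean mask over the rows


result = filter_frequent(df, "Parameter", 2)
print(result)

# Alternative technique: broadcast each group's size back onto its rows,
# then compare against the threshold directly.
sizes = df.groupby("Parameter")["Parameter"].transform("size")
alternative = df[sizes > 2]
```

Note that `gt` is a strict comparison, so a parameter with exactly `threshold` readings is dropped; use `ge` instead if you want "at least" rather than "more than".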