Let’s say I have a pandas DataFrame:
```python
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 2, 2, 2, 1, 1, 1, 2, 2]})
>>> df
   a
0  1
1  2
2  2
3  2
4  2
5  1
6  1
7  1
8  2
9  2
```
I want to drop consecutive duplicates when a run of them exceeds a certain threshold `n`, capping that run at `n` rows. Let's say that `n = 3`. Then my target DataFrame is:
```python
>>> df
   a
0  1
1  2
2  2
3  2
5  1
6  1
7  1
8  2
9  2
```
EDIT: Each set of consecutive repetitions is considered separately. In this example, rows 8 and 9 should be kept.
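To see why the "considered separately" rule matters, here is a small contrast sketch (not part of the original question): grouping by the value alone pools non-adjacent rows, so the trailing 2s in rows 8 and 9 get counted together with rows 1 to 4 and are dropped, which is not the desired result.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 2, 2, 1, 1, 1, 2, 2]})

# Naive approach, shown for contrast only: group by the value itself.
# All 2s (rows 1-4 and 8-9) fall into one group, so head(3) keeps rows 1-3
# and drops rows 4, 8 and 9. Likewise, rows 0, 5 and 6 are the first three 1s,
# so row 7 is dropped even though its run (rows 5-7) has only 3 rows.
naive = df.groupby('a').head(3)
print(naive.index.tolist())  # [0, 1, 2, 3, 5, 6]
```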
Answer
You can create a unique value for each group of consecutive equal values, then use `groupby` and `head`:
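```python
import numpy as np

# Give each run of consecutive equal values its own integer label:
# the label increments whenever a value differs from the previous row.
group_value = np.cumsum(df.a.shift() != df.a)

# Keep at most the first 3 rows of each run.
df.groupby(group_value).head(3)

# result:
#    a
# 0  1
# 1  2
# 2  2
# 3  2
# 5  1
# 6  1
# 7  1
# 8  2
# 9  2
```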
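If you need this for other columns or thresholds, the same run-label idea can be wrapped in a small function. This is a minimal sketch: `cap_consecutive_runs` is a hypothetical name, and instead of `groupby(...).head(n)` it builds a boolean mask from `groupby(...).cumcount()` (each row's position within its run), which gives the same rows here.

```python
import pandas as pd


def cap_consecutive_runs(df: pd.DataFrame, col: str, n: int) -> pd.DataFrame:
    """Keep at most n rows from each run of consecutive equal values in `col`.

    Hypothetical helper for illustration, not a pandas API.
    """
    s = df[col]
    # A new run starts wherever the value differs from the previous row
    # (the first row always starts a run because shift() yields NaN there).
    run_id = s.ne(s.shift()).cumsum()
    # Position of each row within its run: 0, 1, 2, ...
    pos_in_run = s.groupby(run_id).cumcount()
    # Keep only the first n positions of every run.
    return df[pos_in_run < n]


df = pd.DataFrame({'a': [1, 2, 2, 2, 2, 1, 1, 1, 2, 2]})
result = cap_consecutive_runs(df, 'a', n=3)
print(result.index.tolist())  # [0, 1, 2, 3, 5, 6, 7, 8, 9]
```

Note that because `NaN != NaN`, each missing value in `col` would be treated as its own run of length 1.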