Let’s say I have a pandas DataFrame:
import pandas as pd

df = pd.DataFrame({'a': [1,2,2,2,2,1,1,1,2,2]})

>> df
   a
0  1
1  2
2  2
3  2
4  2
5  1
6  1
7  1
8  2
9  2
I want to drop repeated values when a run of them exceeds a certain threshold n, keeping only the first n rows of that run. Let’s say that n=3. Then, my target dataframe is
>> df
   a
0  1
1  2
2  2
3  2
5  1
6  1
7  1
8  2
9  2
EDIT: Each set of consecutive repetitions is considered separately. In this example, rows 8 and 9 should be kept.
Answer
You can create a unique value for each consecutive group, then use groupby and head:
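import numpy as np

# New group id each time the value differs from the previous row
group_value = np.cumsum(df.a.shift() != df.a)
# Keep at most the first 3 rows of each consecutive group
df.groupby(group_value).head(3)

# result:
   a
0  1
1  2
2  2
3  2
5  1
6  1
7  1
8  2
9  2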
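To see why this works, it may help to look at the intermediate group ids and to wrap the idea in a function with n as a parameter. The following is only a sketch: the helper name cap_consecutive_runs and its arguments are made up for illustration, and it uses the Series cumsum method instead of np.cumsum, which should behave the same here.

```python
import pandas as pd


def cap_consecutive_runs(df, column, n):
    """Keep at most the first n rows of every run of equal consecutive values.

    Hypothetical helper, not part of pandas.
    """
    # A new run starts wherever the value differs from the previous row.
    run_id = (df[column] != df[column].shift()).cumsum()
    # head(n) on a groupby keeps the first n rows of each group,
    # preserving the original row order and index.
    return df.groupby(run_id).head(n)


df = pd.DataFrame({'a': [1, 2, 2, 2, 2, 1, 1, 1, 2, 2]})

# Intermediate step: one id per consecutive run.
run_id = (df['a'] != df['a'].shift()).cumsum()
print(run_id.tolist())        # [1, 2, 2, 2, 2, 3, 3, 3, 4, 4]

capped = cap_consecutive_runs(df, 'a', 3)
print(capped.index.tolist())  # [0, 1, 2, 3, 5, 6, 7, 8, 9]
```

Rows 8 and 9 get their own run id (4) even though they hold the same value as rows 1 to 4, which is why they are kept. Only row 4, the fourth row of run 2, is dropped.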