How to drop duplicates in pandas but keep more than the first

Question

Let's say I have a pandas DataFrame with a column a. I want to drop consecutive duplicates only when a run exceeds a certain threshold n, keeping the first n rows of that run. Let's say that n=3; then my target DataFrame contains at most 3 consecutive repetitions of each value.

EDIT: Each set of consecutive repetitions is considered separately. In this example, rows 8 and 9 should be kept.

Accepted Answer

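You can create a unique value for each consecutive group, then use groupby and head:

    import numpy as np

    # The label increases by 1 each time the value differs from the previous row
    group_value = np.cumsum(df.a.shift() != df.a)
    # Keep at most the first 3 rows of each consecutive group
    df.groupby(group_value).head(3)

    # result:
       a
    0  1
    1  2
    2  2
    3  2
    5  1
    6  1
    7  1
    8  3
    9  3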
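For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same shift/cumsum/head idea. The input values are an assumption reconstructed from the result shown in the answer (the original input table is not reproduced here), and the helper name keep_first_n_per_run is hypothetical, not part of pandas.

```python
import pandas as pd


def keep_first_n_per_run(df, column, n):
    """Keep at most the first n rows of each run of consecutive equal values.

    Hypothetical helper for illustration; not a pandas function.
    """
    # A new run starts wherever the value differs from the previous row.
    # The cumulative sum of those change points gives each run its own label.
    run_id = (df[column] != df[column].shift()).cumsum()
    # head(n) keeps the first n rows of every group, in original row order.
    return df.groupby(run_id).head(n)


# Assumed input: inferred from the answer's result, where index 4 is dropped.
df = pd.DataFrame({"a": [1, 2, 2, 2, 2, 1, 1, 1, 3, 3]})

result = keep_first_n_per_run(df, "a", 3)
print(result)
```

Because the run label depends only on whether a value changed from the previous row, the later run of 1s (rows 5 to 7) is counted separately from the 1 in row 0, which is why rows 8 and 9 also survive. Note that NaN never compares equal to NaN, so consecutive missing values would each be treated as their own run in this sketch.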

Answer