How can I delete only the three consecutive rows in a pandas DataFrame that have the same value (in the example below, the value “4”)?
Consider the following code:
import pandas as pd

df = pd.DataFrame({'rating': [4, 4, 3.5, 15, 5, 4, 4, 4, 4, 4]})

   rating
0     4.0
1     4.0
2     3.5
3    15.0
4     5.0
5     4.0
6     4.0
7     4.0
8     4.0
9     4.0
I would like to get the following result as output with the three consecutive rows containing the value “4” being removed:
0     4.0
1     4.0
2     3.5
3    15.0
4     5.0
5     4.0
6     4.0
Answer
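First, start a new group each time the value changes from the previous row, so that every run of consecutive equal values gets its own group. Then use GroupBy.head(2) to keep at most the first two rows of each run, which drops the extra three rows from the run of five 4s: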
new_df = df.groupby(df['rating'].ne(df['rating'].shift()).cumsum()).head(2)
print(new_df)

   rating
0     4.0
1     4.0
2     3.5
3    15.0
4     5.0
5     4.0
6     4.0
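To see why the one-liner above works, it may help to split it into its intermediate steps. The sketch below rebuilds the same example; the names `starts_new_run`, `run_id`, and the helper `keep_first_n_per_run` are hypothetical and were introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'rating': [4, 4, 3.5, 15, 5, 4, 4, 4, 4, 4]})

# True wherever the value differs from the previous row.
# The first row is always True because shift() puts NaN there.
starts_new_run = df['rating'].ne(df['rating'].shift())

# The cumulative sum of those flags gives each run of equal values its own ID.
run_id = starts_new_run.cumsum()
print(run_id.tolist())
# [1, 1, 2, 3, 4, 5, 5, 5, 5, 5]

# Keep at most the first two rows of each run.
new_df = df.groupby(run_id).head(2)
print(new_df.index.tolist())
# [0, 1, 2, 3, 4, 5, 6]


def keep_first_n_per_run(frame, column, n):
    """Hypothetical helper: keep at most the first n rows of every run of
    consecutive equal values in `column`."""
    ids = frame[column].ne(frame[column].shift()).cumsum()
    return frame.groupby(ids).head(n)


# Same result as above when n=2.
print(keep_first_n_per_run(df, 'rating', 2).index.tolist())
# [0, 1, 2, 3, 4, 5, 6]
```

Note that this keeps the first two rows of every run, not only runs of 4. It matches the desired output here because all other runs are already two rows or shorter. Also, since NaN never compares equal to NaN, each missing value would start its own run.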