How can I delete three consecutive rows in a pandas DataFrame that have the same value (in the example below, the value 4)?
Consider the following code:
```python
import pandas as pd

df = pd.DataFrame({
    'rating': [4, 4, 3.5, 15, 5, 4, 4, 4, 4, 4]
})
```

```
   rating
0     4.0
1     4.0
2     3.5
3    15.0
4     5.0
5     4.0
6     4.0
7     4.0
8     4.0
9     4.0
```
I would like to get the following output, with the last three consecutive rows containing the value 4 removed:
```
   rating
0     4.0
1     4.0
2     3.5
3    15.0
4     5.0
5     4.0
6     4.0
```
Answer
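First, start a new group each time the value changes, so that every run of identical consecutive values gets its own group id. Then use GroupBy.head to keep only the first two rows of each group: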
```python
new_df = df.groupby(df['rating'].ne(df['rating'].shift()).cumsum()).head(2)
print(new_df)
```

```
   rating
0     4.0
1     4.0
2     3.5
3    15.0
4     5.0
5     4.0
6     4.0
```
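To see why this works, it can help to look at the intermediate group ids. The sketch below splits the one-liner into steps and wraps it in a small helper; `cap_runs` and its `max_run` parameter are hypothetical names introduced only for illustration, not part of pandas. Instead of `head`, it uses `GroupBy.cumcount` to build a boolean mask, which should select the same rows.

```python
import pandas as pd

df = pd.DataFrame({'rating': [4, 4, 3.5, 15, 5, 4, 4, 4, 4, 4]})

# True wherever a value differs from the previous row. The first row is
# compared against NaN (from shift), so it is always True and opens run 1.
starts_new_run = df['rating'].ne(df['rating'].shift())

# The cumulative sum of those booleans gives each row its run id:
# [1, 1, 2, 3, 4, 5, 5, 5, 5, 5]
run_id = starts_new_run.cumsum()


def cap_runs(frame: pd.DataFrame, column: str, max_run: int = 2) -> pd.DataFrame:
    """Keep at most `max_run` rows from each run of identical consecutive values.

    Hypothetical helper for illustration; not a pandas API.
    """
    col = frame[column]
    runs = col.ne(col.shift()).cumsum()
    # cumcount numbers the rows inside each run 0, 1, 2, ...
    # so "< max_run" keeps only the first max_run rows of every run.
    return frame[frame.groupby(runs).cumcount() < max_run]


result = cap_runs(df, 'rating', max_run=2)
print(result)
```

With `max_run=2` this keeps rows 0 to 6, the same rows as `head(2)` in the answer. One caveat: NaN never compares equal to anything, so consecutive missing values would each start a new run rather than being grouped together.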