I have the following df
JavaScript
x
14
14
1
id type
2
3
0 1 A
4
1 1 B
5
2 1 A
6
3 2 A
7
4 2 B
8
5 3 A
9
6 3 B
10
7 3 A
11
8 3 B
12
9 3 A
13
10 3 A
14
We can assume that this data is already sorted. What i need to do is, for every id, I need to remove rows under the following conditions
- the first entry for every id is type
A
- the last entry for every id is type
B
- the last entry’s
B
is the last one that appears (data is already sorted)
I’ve accomplished 1. with the following:
df = df.groupby('id').filter(lambda x: x['Type'].iloc[0] != 'A')
Which removes ids entirely if their first type isn’t A
However, for 2. and 3., I don’t want to remove the id if the last type isn’t B
, instead I just want to remove everything in the middle
Resulting df:
JavaScript
1
9
1
id type
2
3
0 1 A
4
1 1 B
5
3 2 A
6
4 2 B
7
5 3 A
8
8 3 B
9
example code:
JavaScript
1
15
15
1
d = {'id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 3, 8: 3, 9: 3, 10: 3},
2
'type': {0: 'A',
3
1: 'B',
4
2: 'A',
5
3: 'A',
6
4: 'B',
7
5: 'A',
8
6: 'B',
9
7: 'A',
10
8: 'B',
11
9: 'A',
12
10: 'A'}}
13
14
df = pd.DataFrame.from_dict(d)
15
Advertisement
Answer
It seems you could use drop_duplicates
with different rule depending on type
:
JavaScript
1
3
1
out = pd.concat([df.query("type=='A'").drop_duplicates(subset=['id','type'], keep='first'),
2
df.query("type=='B'").drop_duplicates(subset=['id','type'], keep='last')]).sort_index()
3
Output:
JavaScript
1
8
1
id type
2
0 1 A
3
1 1 B
4
3 2 B
5
4 2 A
6
5 3 A
7
8 3 B
8