I have the following dataframe
JavaScript
x
17
17
1
id date other variables..
2
A 2019Q4
3
A 2020Q4
4
A 2021Q4
5
B 2018Q4
6
B 2019Q4
7
B 2020Q4
8
B 2021Q4
9
C 2020Q4
10
C 2021Q4
11
D 2021Q4
12
E 2018Q4
13
E 2019Q4
14
E 2020Q4
15
E 2021Q4
16
. .
17
I want to group by id and keep those ids if it contains all of the designated values (i.e. 2019Q4, 2020Q4, 2021Q4
) then extract rows that correspond to those values. isin()
won’t work because it won’t drop C and D.
desired output
JavaScript
1
11
11
1
A 2019Q4
2
A 2020Q4
3
A 2021Q4
4
B 2019Q4
5
B 2020Q4
6
B 2021Q4
7
E 2019Q4
8
E 2020Q4
9
E 2021Q4
10
. .
11
Advertisement
Answer
You can use set operations to filter the id and isin
for the date:
JavaScript
1
6
1
target = {'2019Q4', '2020Q4', '2021Q4'}
2
3
id_ok = df.groupby('id')['date'].agg(lambda x: target.issubset(x))
4
5
df2 = df[df['date'].isin(target) & df['id'].map(id_ok)]
6
or, using transform
:
JavaScript
1
6
1
target = {'2019Q4', '2020Q4', '2021Q4'}
2
3
mask = df.groupby('id')['date'].transform(lambda x: target.issubset(x))
4
5
df2 = df[df['date'].isin(target) & mask]
6
output:
JavaScript
1
11
11
1
id date other
2
0 A 2019Q4 NaN
3
1 A 2020Q4 NaN
4
2 A 2021Q4 NaN
5
4 B 2019Q4 NaN
6
5 B 2020Q4 NaN
7
6 B 2021Q4 NaN
8
11 E 2019Q4 NaN
9
12 E 2020Q4 NaN
10
13 E 2021Q4 NaN
11
id_ok
:
JavaScript
1
8
1
id
2
A True
3
B True
4
C False
5
D False
6
E True
7
Name: date, dtype: bool
8