I have the following dataframe
id date other variables.. A 2019Q4 A 2020Q4 A 2021Q4 B 2018Q4 B 2019Q4 B 2020Q4 B 2021Q4 C 2020Q4 C 2021Q4 D 2021Q4 E 2018Q4 E 2019Q4 E 2020Q4 E 2021Q4 . .
I want to group by id and keep those ids if it contains all of the designated values (i.e. 2019Q4, 2020Q4, 2021Q4
) then extract rows that correspond to those values. isin()
won’t work because it won’t drop C and D.
desired output
A 2019Q4 A 2020Q4 A 2021Q4 B 2019Q4 B 2020Q4 B 2021Q4 E 2019Q4 E 2020Q4 E 2021Q4 . .
Advertisement
Answer
You can use set operations to filter the id and isin
for the date:
target = {'2019Q4', '2020Q4', '2021Q4'} id_ok = df.groupby('id')['date'].agg(lambda x: target.issubset(x)) df2 = df[df['date'].isin(target) & df['id'].map(id_ok)]
or, using transform
:
target = {'2019Q4', '2020Q4', '2021Q4'} mask = df.groupby('id')['date'].transform(lambda x: target.issubset(x)) df2 = df[df['date'].isin(target) & mask]
output:
id date other 0 A 2019Q4 NaN 1 A 2020Q4 NaN 2 A 2021Q4 NaN 4 B 2019Q4 NaN 5 B 2020Q4 NaN 6 B 2021Q4 NaN 11 E 2019Q4 NaN 12 E 2020Q4 NaN 13 E 2021Q4 NaN
id_ok
:
id A True B True C False D False E True Name: date, dtype: bool