I have a dataframe similar to this one
df = pd.DataFrame({'date':[20220101,20220102,20220103,20220101,20220102,20220101], 'id':[1,1,1,2,2,3], 'value':[11,22,33,44,55,66], 'categorie':['a','a','c','a','c','c']}) date id value categorie 20220101 1 11 a 20220102 1 22 a 20220103 1 33 c 20220101 2 44 a 20220102 2 55 c 20220101 3 66 c
I would now like to slice the df based on multiple values from column ‘categorie’ and am currently using
df = df[df['categorie'].isin(['a','c'])]
In addition to that I would like to be able to only get the [-1] row back for categorie ‘a’
date id value categorie 20220102 1 22 a 20220103 1 33 c 20220101 2 44 a 20220102 2 55 c 20220101 3 66 c
instead of
date id value categorie 20220101 1 11 a 20220102 1 22 a 20220103 1 33 c 20220101 2 44 a 20220102 2 55 c 20220101 3 66 c
I think the closest would be to think about it as a groupby max value on id and categorie but I am curious if there is a more pythonic way.
Advertisement
Answer
‘a’ and ‘c’ are the only categories in your data, if you just need the latest then drop the duplicates
# drop duplicates and keep the last df.drop_duplicates(subset=['id','categorie'], keep='last')
or
# select the categories of 'a' and 'c' and drop the duplicates from among them (df.loc[df['categorie'].isin(['a','c'])] .drop_duplicates(subset=['id','categorie'], keep='last'))
date id value categorie 1 20220102 1 22 a 2 20220103 1 33 c 3 20220101 2 44 a 4 20220102 2 55 c 5 20220101 3 66 c