I have a dataframe similar to this one:

import pandas as pd

df = pd.DataFrame({'date':[20220101,20220102,20220103,20220101,20220102,20220101], 'id':[1,1,1,2,2,3], 'value':[11,22,33,44,55,66], 'categorie':['a','a','c','a','c','c']})

    date  id  value categorie
20220101   1     11         a
20220102   1     22         a
20220103   1     33         c
20220101   2     44         a
20220102   2     55         c
20220101   3     66         c

I would now like to filter the df on multiple values of the column ‘categorie’ and am currently using:

df = df[df['categorie'].isin(['a','c'])]

In addition, for categorie ‘a’ I would like to get back only the last ([-1]) row of each id, i.e.

    date  id  value categorie
20220102   1     22         a
20220103   1     33         c
20220101   2     44         a
20220102   2     55         c
20220101   3     66         c

instead of

    date  id  value categorie
20220101   1     11         a
20220102   1     22         a
20220103   1     33         c
20220101   2     44         a
20220102   2     55         c
20220101   3     66         c

I think the closest would be to treat it as a groupby max on id and categorie, but I am curious whether there is a more pythonic way.
Answer
‘a’ and ‘c’ are the only categories in your data, so if you just need the latest row for each combination of id and categorie, drop the duplicates and keep the last:

# drop duplicates and keep the last
df.drop_duplicates(subset=['id','categorie'], keep='last')

or

# select the categories of 'a' and 'c' and drop the duplicates from among them
(df.loc[df['categorie'].isin(['a','c'])]
   .drop_duplicates(subset=['id','categorie'], keep='last'))


       date  id  value categorie
1  20220102   1     22         a
2  20220103   1     33         c
3  20220101   2     44         a
4  20220102   2     55         c
5  20220101   3     66         c
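
Note that drop_duplicates on ['id','categorie'] also collapses repeated ‘c’ rows of the same id; that just happens not to matter for this sample. If only ‘a’ should be reduced to its last row, one possible sketch is below. The names subset, is_a, earlier_a and result are only illustrative, and it assumes the rows are already sorted by date within each id.

```python
import pandas as pd

df = pd.DataFrame({
    'date': [20220101, 20220102, 20220103, 20220101, 20220102, 20220101],
    'id': [1, 1, 1, 2, 2, 3],
    'value': [11, 22, 33, 44, 55, 66],
    'categorie': ['a', 'a', 'c', 'a', 'c', 'c'],
})

# Keep only the categories of interest.
subset = df.loc[df['categorie'].isin(['a', 'c'])]

# Flag 'a' rows that are NOT the last 'a' row of their id.
# Assumption: rows are already ordered by date within each id.
is_a = subset['categorie'].eq('a')
earlier_a = is_a & subset.duplicated(subset=['id', 'categorie'], keep='last')

# Drop only those earlier 'a' rows; every other row is left untouched.
result = subset.loc[~earlier_a]
print(result)
```

If the frame is not sorted, calling sort_values(['id', 'date']) on it first should make "last" mean "latest date".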