I have this dataframe:
    import pandas as pd

    df = pd.DataFrame({
        'thread_id': [0, 0, 1, 1, 1, 2, 2],
        'message_id_in_thread': [0, 1, 0, 1, 2, 0, 1],
        'text': ['txt0', 'txt1', 'txt2', 'txt3', 'txt4', 'txt5', 'txt6']
    }).set_index(['thread_id', 'message_id_in_thread'])
I want to keep only the last second-level row of each thread, meaning that:
- For thread_id==0 I want to keep the row message_id_in_thread==1
- For thread_id==1 I want to keep the row message_id_in_thread==2
- For thread_id==2 I want to keep the row message_id_in_thread==1
This can easily be achieved with df.iterrows(), but I would like to know whether there is a direct indexing method.
I am looking for something like df.loc[(:, -1)], which would select, from all (:) level 1 groups, the last (-1) row of that block/group, but obviously this does not work.
Answer
If you need both levels, use GroupBy.tail:

    df = df.groupby(level=0).tail(1)
    print(df)
                                    text
    thread_id message_id_in_thread
    0         1                     txt1
    1         2                     txt4
    2         1                     txt6
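Since the question asks for something closer to direct indexing, here is a minimal sketch of one alternative that is not part of the answer above: a boolean mask built with Index.duplicated. It assumes the rows of each thread are already in the desired order. The names is_last, last_rows and via_tail are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'thread_id': [0, 0, 1, 1, 1, 2, 2],
    'message_id_in_thread': [0, 1, 0, 1, 2, 0, 1],
    'text': ['txt0', 'txt1', 'txt2', 'txt3', 'txt4', 'txt5', 'txt6'],
}).set_index(['thread_id', 'message_id_in_thread'])

# duplicated(keep='last') marks every occurrence of a thread_id except the
# last one, so negating it leaves True only on the last row of each thread.
is_last = ~df.index.get_level_values('thread_id').duplicated(keep='last')
last_rows = df[is_last]

# For comparison: the groupby-based selection from the answer.
via_tail = df.groupby(level=0).tail(1)

print(last_rows)
print(last_rows.equals(via_tail))
```

Both approaches keep the full two-level index. The mask avoids a groupby, but it relies on row order in the same way tail(1) does.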
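If you need only the first level, use GroupBy.last or GroupBy.nth:

    df = df.groupby(level=0).last()
    # Before pandas 2.0, nth(-1) gives the same result here; from 2.0 on,
    # nth acts as a filter and keeps both index levels, like tail(1).
    #df = df.groupby(level=0).nth(-1)
    print(df)
               text
    thread_id
    0          txt1
    1          txt4
    2          txt6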
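One difference between the two approaches matters when the data has missing values: by default GroupBy.last returns the last non-null entry of each column, while tail(1) returns the last row as it is. The sketch below uses a small made-up frame (not the one from the question) with a NaN in the last message of thread 0. The names by_last and by_tail are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last message of thread 0 has no text.
df = pd.DataFrame({
    'thread_id': [0, 0, 1, 1],
    'message_id_in_thread': [0, 1, 0, 1],
    'text': ['txt0', np.nan, 'txt2', 'txt3'],
}).set_index(['thread_id', 'message_id_in_thread'])

# last() skips missing values by default, so thread 0 falls back to 'txt0'.
by_last = df.groupby(level=0).last()

# tail(1) keeps the physically last row, NaN included; dropping the second
# index level makes the result comparable to by_last.
by_tail = df.groupby(level=0).tail(1).droplevel('message_id_in_thread')

print(by_last)
print(by_tail)
```

If the last row must be returned exactly as stored, tail(1) followed by droplevel is the safer choice of the two.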