I have a pandas DataFrame like the following:
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'value': ["first", "second", "second", "first", "second", "first",
              "third", "fourth", "fifth", "second", "fifth", "first",
              "first", "second", "third", "fourth", "fifth"],
})
I want to group this by ["id","value"] and get the first row of each group:
    id   value
0    1   first
1    1  second
2    1  second
3    2   first
4    2  second
5    3   first
6    3   third
7    3  fourth
8    3   fifth
9    4  second
10   4   fifth
11   5   first
12   6   first
13   6  second
14   6   third
15   7  fourth
16   7   fifth
Expected outcome:
id   value
1    first
2    first
3    first
4   second
5    first
6    first
7   fourth
I tried the following, which only gives the first row of the DataFrame. Any help regarding this is appreciated.
In [25]: for index, row in df.iterrows():
   ....:     df2 = pd.DataFrame(df.groupby(['id','value']).reset_index().ix[0])
Answer
>>> df.groupby('id').first()
     value
id
1    first
2    first
3    first
4   second
5    first
6    first
7   fourth
If you need id as a column:
>>> df.groupby('id').first().reset_index()
   id   value
0   1   first
1   2   first
2   3   first
3   4  second
4   5   first
5   6   first
6   7  fourth
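As a self-contained sketch of the same idea, the script below rebuilds the DataFrame from the question and keeps id as a column in one step by passing as_index=False to groupby. It also shows drop_duplicates as a possible alternative for this particular data. The variable names first_rows and dedup are only illustrative and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first", "second", "first",
              "third", "fourth", "fifth", "second", "fifth", "first",
              "first", "second", "third", "fourth", "fifth"],
})

# as_index=False keeps "id" as a regular column, so no reset_index() is needed.
first_rows = df.groupby("id", as_index=False).first()

# Alternative (illustrative): keep the first occurrence of each id,
# then renumber the rows from 0.
dedup = df.drop_duplicates(subset="id").reset_index(drop=True)

print(first_rows)
```

Both approaches should give one row per id here; which one reads better is mostly a matter of taste.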
To get the first n records of each group, you can use head():
>>> df.groupby('id').head(2).reset_index(drop=True)
    id   value
0    1   first
1    1  second
2    2   first
3    2  second
4    3   first
5    3   third
6    4  second
7    4   fifth
8    5   first
9    6   first
10   6  second
11   7  fourth
12   7   fifth
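One difference that may matter if your data has missing values: first() returns the first non-null entry of each column within a group, while head(1) returns the first row as it is. The small DataFrame below (df_nan) is made up for illustration and is not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of group 1 has a missing value.
df_nan = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "value": [np.nan, "second", "first", "second"],
})

# first() skips the NaN and picks the first non-null value per group.
by_first = df_nan.groupby("id")["value"].first()

# head(1) keeps the literal first row of each group, NaN included,
# and preserves the original row labels.
by_head = df_nan.groupby("id").head(1)

print(by_first)
print(by_head)
```

With the data in the question there are no missing values, so both approaches agree there.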