
Keeping ‘key’ column when using groupby with transform in pandas

Computing a group-normalized DataFrame with transform removes the column being grouped by, so it can't be used in subsequent groupby operations. For example (edit: updated):

    import pandas as pd

    df = pd.DataFrame({'a': [1, 1, 2, 3, 2, 3], 'b': [0, 1, 2, 3, 4, 5]})

       a  b
    0  1  0
    1  1  1
    2  2  2
    3  3  3
    4  2  4
    5  3  5

    df.groupby('a').transform(lambda x: x)

       b
    0  0
    1  1
    2  2
    3  3
    4  4
    5  5

With most operations on groups, the 'missing' column becomes the new index (which can then be turned back into a column with reset_index, or kept as a column by setting as_index=False). With transform, however, it simply disappears: the result keeps the original index and contains only the non-key columns.
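A minimal sketch of that difference, assuming a recent pandas version and the example df from the question: a reduction such as `mean()` returns one row per group with the key as the index, while `transform` returns one row per original row, keeps the original index, and leaves the key column out.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 2, 3], 'b': [0, 1, 2, 3, 4, 5]})

# Reduction: one row per group; the key 'a' becomes the index.
reduced = df.groupby('a').mean()

# Transform: one row per original row; the original index is kept,
# but the key column 'a' is not part of the result.
transformed = df.groupby('a').transform(lambda x: x)

print(reduced)
print(transformed)
```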

Edit: here's a one-liner of what I would like to be able to do:

    df.groupby('a').transform(lambda x: x+1).groupby('a').mean()
    KeyError: 'a'

In the example from the pandas docs, a function is used to split based on the index, which appears to avoid this issue entirely. Alternatively, it would always be possible to add the column back after the groupby/transform, but surely there's a better way?
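A sketch of the "add the column back" workaround, using the example df from the question. Because transform preserves the original index, assigning `df['a']` aligns row by row, after which the second groupby works. The `assign` variant transforms only the value column and writes it back into a copy of df, so 'a' never leaves; both are common pandas idioms, not something the question itself prescribes.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 2, 3], 'b': [0, 1, 2, 3, 4, 5]})

# Transform, then re-attach the key by index alignment.
out = df.groupby('a').transform(lambda x: x + 1)
out['a'] = df['a']
result = out.groupby('a').mean()

# Variant: transform only 'b' and write it back into a copy of df,
# so the key column is kept throughout.
alt = df.assign(b=df.groupby('a')['b'].transform(lambda x: x + 1)).groupby('a').mean()

print(result)
```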

Update: It looks like reset_index/as_index are intended only for functions that reduce each group to a single row. There seem to be a couple of options, from answers
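For reductions, both mechanisms do keep the key as an ordinary column. A minimal sketch with the example df from the question, assuming a recent pandas version:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 2, 3], 'b': [0, 1, 2, 3, 4, 5]})

# Keep 'a' as a regular column instead of making it the index.
kept = df.groupby('a', as_index=False).mean()

# Or let 'a' become the index, then move it back out.
reset = df.groupby('a').mean().reset_index()

print(kept)
```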


Answer

That is bizarre!

I tricked it like this:

    df.groupby(df.a.values).transform(lambda x: x)
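A sketch of why this works, assuming current pandas behavior: when the grouper is a plain array rather than a column label, pandas does not treat 'a' as the key column, so it is passed to the function and kept in the result (`df['a'].to_numpy()` is the modern spelling of `df.a.values`). One caveat: the function is then applied to 'a' as well, so with something like `x + 1` the key values themselves change.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 2, 3], 'b': [0, 1, 2, 3, 4, 5]})

# Group by the raw values of 'a' rather than by the label 'a';
# column 'a' is then treated as data and kept in the output.
out = df.groupby(df['a'].to_numpy()).transform(lambda x: x)

# 'a' survived, so a second groupby on it works.
means = out.groupby('a').mean()

# Caveat: a non-identity function also rewrites the key column.
shifted = df.groupby(df['a'].to_numpy()).transform(lambda x: x + 1)

print(means)
print(shifted)
```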


User contributions licensed under: CC BY-SA