Finding a normalized dataframe with groupby/transform removes the column being used to group by, so it can't be used in subsequent groupby operations. For example (edit: updated):
df = pd.DataFrame({'a': [1, 1, 2, 3, 2, 3], 'b': [0, 1, 2, 3, 4, 5]})

   a  b
0  1  0
1  1  1
2  2  2
3  3  3
4  2  4
5  3  5

df.groupby('a').transform(lambda x: x)

   b
0  0
1  1
2  2
3  3
4  4
5  5
Now, with most operations on groups the 'missing' column becomes a new index (which can then be adjusted using reset_index, or by setting as_index=False), but when using transform it just disappears, leaving the original index and a new dataset without the key.
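To make the contrast concrete, here is a minimal sketch of how an aggregation keeps the key, using the same example frame as above. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 2, 3], 'b': [0, 1, 2, 3, 4, 5]})

# With an aggregation, the grouping key 'a' becomes the index of the result.
agg_indexed = df.groupby('a').sum()

# reset_index moves the key back into an ordinary column...
agg_reset = df.groupby('a').sum().reset_index()

# ...and as_index=False keeps it as a column from the start.
agg_flat = df.groupby('a', as_index=False).sum()

print(agg_indexed)
print(agg_flat)
```

Neither option changes what transform returns, since transform keeps the original row index rather than building one from the key.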
Edit: here's a one-liner of what I would like to be able to do:

df.groupby('a').transform(lambda x: x+1).groupby('a').mean()

KeyError: 'a'
In the example from the pandas docs a function is used to split based on the index, which appears to avoid this issue entirely. Alternatively, it would always be possible just to add the column after the groupby/transform, but surely there’s a better way?
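For reference, the "add the column back afterwards" fallback could look roughly like the sketch below. It relies on transform returning a result aligned with the original row index, so the key can be copied straight across. Selecting [['b']] explicitly and the names transformed and result are choices made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 2, 3], 'b': [0, 1, 2, 3, 4, 5]})

# transform returns a frame with the same row index as df, minus the key column.
transformed = df.groupby('a')[['b']].transform(lambda x: x + 1)

# Re-attach the key; this is safe because the indexes line up row for row.
transformed['a'] = df['a']

# The second groupby now finds 'a' again.
result = transformed.groupby('a').mean()
print(result)
```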
Update: It looks like reset_index/as_index are intended only for functions that reduce each group to a single row. There seem to be a couple of options, from answers
Answer
That is bizarre!
I tricked it like this:
df.groupby(df.a.values).transform(lambda x: x)
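A short sketch of why this seems to work: grouping by a plain array of values, rather than by the column name, means pandas does not treat 'a' as the key column to exclude, so it comes through transform like any other column. The names out and means are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 2, 3], 'b': [0, 1, 2, 3, 4, 5]})

# Group by the raw values instead of the column label, so 'a' is not dropped.
out = df.groupby(df['a'].values).transform(lambda x: x)
print(out.columns.tolist())

# 'a' is still present, so a second groupby works.
means = out.groupby('a').mean()
print(means)
```

One caveat: because 'a' is now treated as a regular column, the transform function is applied to it as well. With something like lambda x: x + 1 the key values themselves would be shifted, so for non-identity functions it may be safer to transform only the value columns and re-attach the key.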