I have a dataframe which has a column to group by and several other columns. Toy dataframe:
    import pandas as pd

    d = {'group_col': ["a", "b", "b", "a"], 'col1': [1, 2, 3, 4], 'col2': [3, 4, 5, 6]}
    df = pd.DataFrame(data=d)
When I use groupby on this dataframe followed by a built-in aggregation, the grouping column becomes the index and is not included among the result columns:
    # using sum as an example
    df.groupby('group_col').sum()
But when I define a custom function and use apply, I get an unwanted additional column:
    # Sum function for use by apply
    def sum_2(x):
        return x.sum()

    df.groupby('group_col').apply(sum_2)
How do I avoid having this additional column?
The actual function I want to use is the following:
    def tss(x):
        return ((x - x.mean(numeric_only=True)) ** 2).sum()

    df.groupby('group_col').apply(tss)
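For reference, tss computes, for each group g and column c, the total sum of squared deviations from the group mean. Below, σ²_{g,c} is the population variance (dividing by n_g, the number of rows in the group), and the second line works the formula out for group a, col1, whose values are 1 and 4:

```latex
\bar{x}_{g,c} = \frac{1}{n_g} \sum_{i \in g} x_{i,c}, \qquad
\mathrm{TSS}_{g,c} = \sum_{i \in g} \left( x_{i,c} - \bar{x}_{g,c} \right)^2 = n_g \, \sigma^2_{g,c}

\bar{x}_{a,\mathrm{col1}} = \tfrac{1 + 4}{2} = 2.5, \qquad
\mathrm{TSS}_{a,\mathrm{col1}} = (1 - 2.5)^2 + (4 - 2.5)^2 = 2.25 + 2.25 = 4.5
```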
Answer
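You can try to use .agg instead of .apply. With .apply, the function receives each group's whole sub-DataFrame, including group_col, so x.sum() also adds up (concatenates) the strings in that column; that is the extra column. .agg with a custom function calls it once per non-grouping column, passing a single Series, so group_col only ends up in the index. Since x is then a numeric Series, numeric_only=True can be dropped: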
    def tss(x):
        # x is one column (a Series) of one group
        return ((x - x.mean()) ** 2).sum()

    print(df.groupby("group_col").agg(tss))
Prints:
               col1  col2
    group_col
    a           4.5   4.5
    b           0.5   0.5
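If you would rather keep apply, or skip the custom function entirely, two other options should give the same numbers. This is a minimal sketch that re-creates the toy df so it runs on its own; the names via_apply, via_var and g are only illustrative. The first option selects col1 and col2 before calling apply, so group_col is never passed to tss. The second relies on the identity that a sum of squared deviations equals n times the population variance (ddof=0), so it uses only built-in groupby methods.

```python
import pandas as pd

# Same toy data as in the question
d = {'group_col': ["a", "b", "b", "a"], 'col1': [1, 2, 3, 4], 'col2': [3, 4, 5, 6]}
df = pd.DataFrame(data=d)


def tss(x):
    # x is each group's sub-DataFrame, restricted to the selected columns
    return ((x - x.mean(numeric_only=True)) ** 2).sum()


# Option 1: select the value columns before apply, so group_col is never
# handed to tss and cannot reappear as an extra column
via_apply = df.groupby('group_col')[['col1', 'col2']].apply(tss)

# Option 2: no custom function. Sum of squared deviations per group
# = population variance (ddof=0) * number of non-missing values
g = df.groupby('group_col')[['col1', 'col2']]
via_var = g.var(ddof=0) * g.count()

print(via_apply)
print(via_var)
```

Both results should be indexed by group_col with only col1 and col2 as columns. In recent pandas releases (2.2 and later, if I recall correctly), apply also accepts include_groups=False, which excludes the grouping columns without selecting the value columns by hand.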