Applying custom function to groupby object keeps groupby column

I have a dataframe which has one column for grouping by and several other columns. Toy dataframe:

import pandas as pd

d = {'group_col': ["a","b","b","a"],'col1': [1, 2, 3, 4], 'col2': [3, 4, 5, 6]}
df = pd.DataFrame(data=d)

When I use a groupby on this dataframe followed by a built-in aggregation, the groupby column becomes the index and is not included among the result columns:

# using sum as an example
df.groupby('group_col').sum()

But when I define a custom function and use apply, I get an unwanted additional column:

# Sum function for use by apply
def sum_2(x):
    return x.sum()

df.groupby('group_col').apply(sum_2)

How do I avoid having this additional column?

The actual function I want to use is the following:

def tss(x):
    return ((x - x.mean(numeric_only=True)) ** 2).sum()
df.groupby('group_col').apply(tss)

Answer

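You can try .agg instead of .apply. With .agg, the function is called on each non-grouping column of each group separately, so the groupby column only shows up as the index: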

def tss(x):
    return ((x - x.mean()) ** 2).sum()


print(df.groupby("group_col").agg(tss))

Prints:

           col1  col2
group_col            
a           4.5   4.5
b           0.5   0.5
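
If you would rather keep .apply, one alternative is to select the value columns before calling it, so the function never sees the groupby column. The following is a minimal sketch using the toy dataframe from the question; the explicit column list ["col1", "col2"] is an assumption and would need to match your real data:

```python
import pandas as pd

# Toy dataframe from the question
df = pd.DataFrame({
    "group_col": ["a", "b", "b", "a"],
    "col1": [1, 2, 3, 4],
    "col2": [3, 4, 5, 6],
})


def tss(x):
    # Sum of squared deviations from the mean, computed per column
    return ((x - x.mean()) ** 2).sum()


# Selecting the value columns first keeps group_col out of what apply passes to tss
result = df.groupby("group_col")[["col1", "col2"]].apply(tss)
print(result)
```

This should give the same values as the .agg version above. pandas 2.2 also added an include_groups parameter to .apply for the same purpose, but selecting the columns explicitly works on older versions too.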
User contributions licensed under: CC BY-SA