
How to apply a class function to replace NaN with the group mean in a subset of pandas DataFrame columns?

The class is composed of a set of attributes and functions including:

Attributes:

  • df: a pandas DataFrame.
  • numerical_feature_names: df columns with numeric values.
  • label_column_names: df string columns to group by.

Functions:

  • mean(nums): takes a list of numbers as input and returns the mean
  • fill_na(df, numerical_feature_names, label_columns): takes class attributes as inputs and returns a transformed df.

And here’s the class:


When trying to apply it to a pandas df:


The following error arises:

ValueError: Grouper and axis must be same length

Data and class parameters:


How could I change the class to get the transformed df (i.e. the one that replaces np.nan with its group mean)?


Answer

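First, the error occurs because label_column_names is already a list, so you don’t need the [] around it in the groupby call: it should be df.groupby(label_column_names)... instead of df.groupby([label_column_names])... Wrapping it in another list makes pandas treat the inner list as an array of group labels (one per row), and because its length doesn’t match the number of rows, pandas raises the ValueError.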
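A minimal sketch of the difference between the two calls, using a small made-up DataFrame (the columns city, kind and price are hypothetical stand-ins for the asker's data, which isn't shown):

```python
import pandas as pd

# Hypothetical data: two label columns and one numeric column.
df = pd.DataFrame({
    "city": ["A", "A", "B"],
    "kind": ["x", "x", "y"],
    "price": [1.0, 3.0, 5.0],
})
label_column_names = ["city", "kind"]  # already a list

# Wrong: [label_column_names] is [["city", "kind"]], a list holding one list.
# pandas reads the inner list as one group label per row, so its length (2)
# would have to equal the number of rows (3); it doesn't, so ValueError.
try:
    df.groupby([label_column_names])["price"].mean()
    error = None
except ValueError as exc:
    error = exc

# Right: pass the list itself, so each string is treated as a column key.
means = df.groupby(label_column_names)["price"].mean()
print(repr(error))
print(means)
```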

Now, to actually solve your problem, replace the for loop in the fill_na function of your class (you don’t actually need it) with


in which you fillna the numerical_feature_names columns with the result of a groupby.transform that computes the mean of those columns within each group.
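Putting both fixes together, here is a minimal sketch of what the class might look like. The asker's original class isn't shown, so the class name NaNFiller, its constructor and the sample data are assumptions. This fill_na reads the attributes from self instead of taking them as parameters, and the mean(nums) helper is left out because transform("mean") already computes the group means (skipping NaN).

```python
import numpy as np
import pandas as pd


class NaNFiller:
    """Hypothetical reconstruction of the asker's class (name and constructor assumed)."""

    def __init__(self, df, numerical_feature_names, label_column_names):
        self.df = df
        self.numerical_feature_names = numerical_feature_names  # numeric columns to fill
        self.label_column_names = label_column_names  # string columns to group by

    def fill_na(self):
        df = self.df.copy()  # leave the original DataFrame untouched
        cols = self.numerical_feature_names
        # Pass the list of label columns directly, without extra [] around it.
        # transform("mean") returns one row per original row, holding that
        # row's group mean for each numeric column (NaN values are skipped).
        group_means = df.groupby(self.label_column_names)[cols].transform("mean")
        # fillna with a DataFrame aligns on index and columns, so every NaN
        # is replaced by the mean of its own group and column.
        df[cols] = df[cols].fillna(group_means)
        return df


# Hypothetical sample data with two label columns and two numeric columns.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B"],
    "kind": ["x", "x", "y", "y", "z"],
    "price": [1.0, np.nan, 4.0, np.nan, 7.0],
    "qty": [np.nan, 10.0, 2.0, 6.0, np.nan],
})

filler = NaNFiller(df, ["price", "qty"], ["city", "kind"])
result = filler.fill_na()
print(result)
```

A group whose values in a column are all NaN (here qty for the ("B", "z") group) has a NaN mean, so those cells stay NaN after filling; handling them would need a separate fallback, such as the overall column mean.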

User contributions licensed under: CC BY-SA