Change pandas dataframe content in a function

I’m writing a class that does one hot encoding, but it doesn’t work as I expected.

In my main code I have this:

for col in train_x_categorical.columns:
   dataCleaner.addFeatureToBeOneHotEncoded(col)

dataCleaner.applyOneHotEncoding(train_x_categorical)

train_x_categorical.head()

The class methods are the following:

def addFeatureToBeOneHotEncoded(self, featureName):
    self._featuresToBeOneHotEncoded.append(featureName)

def applyOneHotEncoding(self, data):
    for feature in self._featuresToBeOneHotEncoded:
        dummies = pd.get_dummies(data[feature])
        dummies.drop(dummies.columns[-1],axis=1,inplace=True) 
        data.drop(feature, axis=1, inplace=True) 
        data = pd.concat([data, dummies], axis=1)
        print(data.columns)

Now, with print(data.columns) I can see that the method works correctly, but when train_x_categorical.head() runs I can't see the effect of applyOneHotEncoding.

I don't understand why this is happening or how to fix it. I thought that since Python passes values by reference, the variable data points to the same object as the variable train_x_categorical, so inside applyOneHotEncoding I was working on the same object, but clearly I am wrong. Can someone explain why my reasoning is wrong and how I can solve the problem?

Answer

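It is because applyOneHotEncoding rebinds its local variable data. Python passes a reference to the object, so data initially refers to the same DataFrame as train_x_categorical, but data = pd.concat(...) creates a new DataFrame and points only the local name data at it; train_x_categorical still refers to the original object. (Only the in-place data.drop call in the first loop iteration touches the original.) This is well-known Python behaviour. There are a couple of ways around this that I know of. One is to have your method return the value: return data after the loop and assign the result in the caller, e.g. train_x_categorical = dataCleaner.applyOneHotEncoding(train_x_categorical). The loop does not prevent this, since data is only returned once it finishes. The other option is to put the variable to be updated in a wrapper class and pass that to the method. Then updating the variable that is part of the wrapper class will work.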
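A minimal sketch of the return-the-value approach, assuming a stripped-down DataCleaner class and a small made-up DataFrame (neither is from the question; both are hypothetical stand-ins). It also passes prefix=feature to pd.get_dummies, which the original code does not do, so that dummy columns from different features cannot collide:

```python
import pandas as pd


class DataCleaner:
    """Hypothetical minimal version of the asker's class."""

    def __init__(self):
        self._featuresToBeOneHotEncoded = []

    def addFeatureToBeOneHotEncoded(self, featureName):
        self._featuresToBeOneHotEncoded.append(featureName)

    def applyOneHotEncoding(self, data):
        for feature in self._featuresToBeOneHotEncoded:
            # prefix=feature is an addition for this sketch, not in the question.
            dummies = pd.get_dummies(data[feature], prefix=feature)
            # Drop the last dummy column, as the original method does.
            dummies = dummies.drop(columns=dummies.columns[-1])
            # drop() without inplace returns a new frame, so the caller's
            # DataFrame is never modified; `data` is rebound locally.
            data = pd.concat([data.drop(columns=feature), dummies], axis=1)
        # Return the new DataFrame so the caller can use the result.
        return data


# Made-up sample data, for illustration only.
train_x_categorical = pd.DataFrame(
    {"color": ["red", "green", "blue"], "size": ["S", "M", "S"]}
)

dataCleaner = DataCleaner()
for col in train_x_categorical.columns:
    dataCleaner.addFeatureToBeOneHotEncoded(col)

# The caller decides what to do with the result: keep both, or reassign.
encoded = dataCleaner.applyOneHotEncoding(train_x_categorical)
```

Here encoded holds the one-hot encoded columns while train_x_categorical is left untouched. Writing train_x_categorical = dataCleaner.applyOneHotEncoding(train_x_categorical) instead would replace the original, which is closer to what the question expected.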

For an exhaustive discussion, see the Stack Overflow question "How do I pass a variable by reference?"

User contributions licensed under: CC BY-SA