
Change pandas dataframe content in a function

I’m writing a class that does one hot encoding, but it doesn’t work as I expected.

In my main code I have this:

JavaScript

The class method is the following:

JavaScript

Now, with print(data.columns) I can see that the method works correctly, but when train_x_categorical.head() runs I can’t see the effect of the method applyOneHotEncoding.

I don’t understand why this is happening or how to fix it. I thought that since Python passes values by reference, the variable data points to the same object as the variable train_x_categorical, so in the method applyOneHotEncoding I was working on the same object, but clearly I am wrong. Can someone explain why my reasoning is wrong and how I can solve the problem?


Answer

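It is because applyOneHotEncoding assigns a new object to its local variable data instead of modifying the object it was given. Your reasoning is right up to that point: when the method starts, data and train_x_categorical do refer to the same object. But the name data is local to the method, so assigning to it only makes that name point at a new object; train_x_categorical still points at the original, unchanged DataFrame. This is well-known Python behavior. There are a couple of ways around it that I know of. One is to have your method return the value. That won’t work in your case since you are doing this as part of a loop. The other option is to put the variable to be updated in a wrapper class and pass the wrapper to the method. Updating the attribute on the wrapper then works, because the caller and the method share the same wrapper object.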
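Since the original method is not shown here, the following is only a minimal sketch of the mechanism, using plain lists as a stand-in for the DataFrame. The names `Holder`, `rebind`, `return_new`, `update_holder` and `mutate_in_place` are hypothetical and chosen for illustration. The same behavior would apply to a method that does something like `data = pd.get_dummies(data)`, because `pd.get_dummies` returns a new DataFrame rather than changing the one passed in.

```python
class Holder:
    """Hypothetical wrapper: caller and function share this one object."""

    def __init__(self, value):
        self.value = value


def rebind(data):
    # Assignment rebinds the local name only; the caller's object is untouched.
    data = data + ["encoded"]


def return_new(data):
    # Build a new object and hand it back; the caller must reassign.
    return data + ["encoded"]


def update_holder(holder):
    # Setting an attribute on the shared wrapper is visible to the caller.
    holder.value = holder.value + ["encoded"]


def mutate_in_place(data):
    # In-place mutation changes the shared object itself.
    data.append("encoded")


columns = ["color"]
rebind(columns)
after_rebind = list(columns)      # the caller still sees the original content

columns = return_new(columns)     # reassigning the result makes the change stick
after_return = list(columns)

holder = Holder(["color"])
update_holder(holder)             # change is visible through holder.value

in_place = ["color"]
mutate_in_place(in_place)         # change is visible through in_place
```

Depending on how the loop is written, reassigning the returned value on each iteration may still be an option; the wrapper approach avoids that reassignment by keeping the current value in `holder.value`.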

See this for an exhaustive discussion: How do I pass a variable by reference?

User contributions licensed under: CC BY-SA