Changing one data frame also changes its copy

Couldn't come up with a better title, so here we are. I am running the following code:

dow_23457 = df
dow_23457 = dow_23457.set_index('date', inplace = True)
dow_23457 = dow_23457.shift(24)
dow_23457 = dow_23457.reset_index()

As far as I understand, I first make a copy of ‘df’ and then I change the copy. What confuses me is that when I run the second line, the ‘date’ column becomes the index even in the ‘df’ data frame. The changes from the two following lines only apply to the copied (dow_23457) data frame, though. How can this happen?

Answer

I first make a copy of ‘df’ and then I change the copy

Nope! When you do dow_23457 = df, you’re making dow_23457 look at the same underlying object that df has been looking at. In Python, direct assignment doesn’t copy data; it only binds another name to the same object.
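To see the aliasing in action, here is a minimal sketch. The small dataframe and the name alias are made up for illustration and merely stand in for the df and dow_23457 from the question:

```python
import pandas as pd

# Hypothetical stand-in for the question's df
df = pd.DataFrame({"date": ["2021-01-01", "2021-01-02"], "value": [1, 2]})

alias = df                  # plain assignment: a second name for the same object
same_object = alias is df   # identity check, no data was copied

# inplace=True mutates the shared object and returns None
result = alias.set_index("date", inplace=True)

# The change is visible through the original name as well
index_name_seen_via_df = df.index.name
```

Note that set_index(..., inplace=True) returns None, so assigning its result back to a name (as in the second line of the question) leaves that name bound to None. It is usually best to pick one style: either call it in place without assigning, or drop inplace and assign the returned dataframe.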

You need to be explicit:

dow_23457 = df.copy()

which makes dow_23457 now look at an entirely different, newly made dataframe object that is independent of what df looks at. (Well, unless you have lists, dicts etc. in the cells of the dataframe: those are not copied recursively, so both dataframes would still share them… but you shouldn’t have them in the cells of a dataframe in the first place!)
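A small sketch of both points, again with made-up data (the tags column of lists is there only to demonstrate the caveat and is not from the question):

```python
import pandas as pd

# Hypothetical data; the list-valued "tags" column exists only to show the caveat
df = pd.DataFrame({"date": ["2021-01-01", "2021-01-02"], "tags": [["a"], ["b"]]})

independent = df.copy()                      # a new dataframe object
independent = independent.set_index("date")  # returns a new frame; df is untouched

df_still_has_date_column = "date" in df.columns

# Caveat: Python objects stored in cells are not copied recursively,
# so the list in the copy is the very same list object as in df
independent["tags"].iloc[0].append("x")
shared_list = df["tags"].iloc[0]
```

Here df keeps its date column after the copy is re-indexed, while the append made through the copy shows up in df too, because both frames hold references to the same list.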


For more on this “naming” subject, you might want to see here (it is available as a video as well as in plain text).
