Creating a new column for predicted cluster: SettingWithCopyWarning

This question will unfortunately be a duplicate, but I could not fix the issue in my code even after looking at the other similar questions and their answers. I need to split my dataset into train and test sets. However, it seems I am making a mistake when I add a new column for the predicted cluster. The error that I get is:

JavaScript

There are a few questions on this error, but I am probably doing something wrong, as I have not fixed the issue yet and still get the same error as above. The dataset is the following:

JavaScript

I have split the dataset into train and test sets as follows:

JavaScript

The lines of code that cause the error are:

JavaScript

I think these two questions should have been able to help me with my code:

How to add k-means predicted clusters in a column to a dataframe in Python

How to deal with SettingWithCopyWarning in Pandas?

but something is still wrong in my code.

Could you please have a look at it and help me to fix this issue before closing this question as duplicate?

Answer

IMHO, train_test_split returns a list, and when you call copy() on its result, that copy() is the list's method, not pandas'. The DataFrames inside are not copied, which is what triggers pandas' infamous SettingWithCopyWarning when you later assign a new column.

So you only create a shallow copy of the list, not of its elements. In other words

JavaScript

is equivalent to:

JavaScript

Since Python variables only hold references to DataFrames, X_train and X_test may or may not share their underlying data with X. If you want independent DataFrames, you should explicitly call copy() on each one:

JavaScript

or

JavaScript

Then X_train and X_test are each a new DataFrame with its own data in memory.
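As a minimal sketch of the point above (not the asker's actual code): the toy DataFrame, its column names a and b, and the KMeans settings below are all assumptions made up for illustration. It shows that copy() on the result of train_test_split leaves the same DataFrame objects in place, and that copying each part explicitly gives frames you can safely add a cluster column to.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

# Hypothetical toy data; the column names are illustrative only.
X = pd.DataFrame({
    "a": [1.0, 1.1, 0.9, 8.0, 8.2, 7.9, 1.2, 8.1],
    "b": [2.0, 2.1, 1.9, 9.0, 9.1, 8.8, 2.2, 9.2],
})

# train_test_split returns a list of the split parts.
splits = train_test_split(X, test_size=0.25, random_state=0)

# list.copy() builds a new list that holds the very same DataFrame objects.
shallow = splits.copy()
same_objects = all(s is t for s, t in zip(splits, shallow))

# Copy each DataFrame explicitly so each one owns its data.
X_train, X_test = (part.copy() for part in splits)

# Fit on the training features, then store the labels as a new column.
features = ["a", "b"]
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train[features])
X_train["cluster"] = km.labels_
X_test["cluster"] = km.predict(X_test[features])
```

Here same_objects is True, which is why the list-level copy() does not help, while the original X stays untouched by the column assignments.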


Update: Tested this code without any warnings:

JavaScript
User contributions licensed under: CC BY-SA