TypeError: train_test_split() got an unexpected keyword argument ‘test_size’

I’m trying to find the best feature set using a random forest approach, and I need to split the dataset into training and test sets. Here is my code:

from sklearn.model_selection import train_test_split

def train_test_split(x,y):
    # split data train 70 % and test 30 %
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
    #normalization
    x_train_N = (x_train-x_train.mean())/(x_train.max()-x_train.min())
    x_test_N = (x_test-x_test.mean())/(x_test.max()-x_test.min())

train_test_split(data,data_y)

The arguments data and data_y are passed in correctly, but I get the error shown in the title and can’t figure out why.

Answer

Your function has the same name as the train_test_split you import from sklearn.model_selection, so your definition replaces the imported one. Renaming your function fixes the error. Something like this:

from sklearn.model_selection import train_test_split

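def my_train_test_split(x, y):
    # split data: train 70 % and test 30 %
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
    # normalization (mean-centered, scaled by the range of each split)
    x_train_N = (x_train - x_train.mean()) / (x_train.max() - x_train.min())
    x_test_N = (x_test - x_test.mean()) / (x_test.max() - x_test.min())
    # return the results so the caller can use them
    return x_train_N, x_test_N, y_train, y_test

x_train_N, x_test_N, y_train, y_test = my_train_test_split(data, data_y)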

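Explanation: Python does not support function overloading (choosing among same-named functions by the types of their arguments). A def statement simply rebinds the name, so your train_test_split(x, y) replaces the imported function. The call inside it then resolves to your own two-argument function, which does not accept that keyword argument, hence the TypeError. Giving your function a different name is the simplest fix. Alternatively, import the module (from sklearn import model_selection) and call model_selection.train_test_split.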
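The name shadowing described here can be reproduced without scikit-learn. The following is only a minimal sketch: library_split is a hypothetical stand-in for the imported function (it slices without shuffling, unlike the real train_test_split), and the assignment plays the role of the import.

```python
def library_split(x, y, test_size=0.25):
    """Hypothetical stand-in for sklearn's train_test_split (no shuffling)."""
    cut = round(len(x) * (1 - test_size))
    return x[:cut], x[cut:], y[:cut], y[cut:]

# Plays the role of `from sklearn.model_selection import train_test_split`.
train_test_split = library_split

def train_test_split(x, y):  # this def rebinds the name and hides the "import"
    # The call below resolves to this same two-argument function,
    # which has no test_size parameter.
    return train_test_split(x, y, test_size=0.3)

try:
    train_test_split(list(range(10)), list(range(10)))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # the "unexpected keyword argument" TypeError text

def my_train_test_split(x, y):  # distinct name: nothing is shadowed
    return library_split(x, y, test_size=0.3)

x_train, x_test, y_train, y_test = my_train_test_split(
    list(range(10)), list(range(10))
)
print(len(x_train), len(x_test))
```

With the distinct name, the wrapper reaches the library-style function and the split succeeds; with the shadowing name, the same call fails before any data is touched.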

User contributions licensed under: CC BY-SA