I’m trying to find the best feature set using random forest approach I need to split the dataset into test and train. here is my code
from sklearn.model_selection import train_test_split def train_test_split(x,y): # split data train 70 % and test 30 % x_train, x_test, y_train, y_test = train_test_split(x, y,train_size=0.3,random_state=42) #normalization x_train_N = (x_train-x_train.mean())/(x_train.max()-x_train.min()) x_test_N = (x_test-x_test.mean())/(x_test.max()-x_test.min()) train_test_split(data,data_y)
parameters data,data_y are parsing correctly. But I’m getting the following error. I couldn’t figure out why this is.
Advertisement
Answer
You are using the same function name in your code same as the one from sklearn.preprocessing, changing your function name would do the job. Something like this,
from sklearn.model_selection import train_test_split def my_train_test_split(x,y): # split data train 70 % and test 30 % x_train, x_test, y_train, y_test = train_test_split(x,y,train_size=0.3,random_state=42) #normalization x_train_N = (x_train-x_train.mean())/(x_train.max()-x_train.min()) x_test_N = (x_test-x_test.mean())/(x_test.max()-x_test.min()) my_train_test_split(data,data_y)
Explaination :- Although there is method overloading in python (ie. same named function selected on the basis on the type of arguments) here in your case turns out both the functions need the same type of arguments, so different naming is the only possible solution IMO.