I have code for splitting a data set dfXa
of size 351 by 14 into 10 folds, choosing one fold for validation, denoted by dfX_val
(of size 35 by 14), and using the remaining 9 folds for training, denoted by dfX_train
(of size 316 by 14).
But how do I do this for 5-fold CV? I want to implement 5-fold CV without using sklearn.
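A minimal sketch of a scikit-learn-free 5-fold split, assuming dfXa is a pandas DataFrame and using only NumPy. The helper name `k_fold_indices` and the random stand-in data are hypothetical. The rows are shuffled once and cut into 5 nearly equal folds with `np.array_split`, so with 351 rows the validation folds hold 71, 70, 70, 70 and 70 rows, and each fold is used for validation exactly once.

```python
import numpy as np
import pandas as pd

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) index arrays for k-fold cross-validation.

    Rows are shuffled once and cut into k folds whose sizes differ by at
    most one; each fold is used as the validation set exactly once.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Hypothetical stand-in for dfXa (351 rows x 14 columns of random numbers).
dfXa = pd.DataFrame(np.random.default_rng(1).normal(size=(351, 14)))

shapes = []
for train_idx, val_idx in k_fold_indices(len(dfXa), k=5):
    dfX_train = dfXa.iloc[train_idx]  # 4 folds for training
    dfX_val = dfXa.iloc[val_idx]      # 1 fold for validation
    shapes.append((dfX_train.shape, dfX_val.shape))
    # ...fit your model on dfX_train and evaluate it on dfX_val here...

print(shapes)
```

Passing `k=10` reproduces the 10-fold setup from the question; with 351 rows the validation folds then hold 36 or 35 rows, because 351 does not divide evenly by 10.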
Answer
You can use `cross_val_score` from the scikit-learn library, as mentioned here.
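```python
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score

m = 2  # placeholder: number of clusters, e.g. the number of classes in y_train
estimator = KMeans(n_clusters=m, random_state=0)
scores = cross_val_score(estimator, X_train, y_train, scoring='accuracy', cv=5)  # one score per fold
```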
To get the features and the labels, i.e., the X_train and y_train values, you can do:
```python
X_train = df.iloc[:, 1:].values  # all columns except the first: features
y_train = df.iloc[:, 0].values   # first column: labels
```
where df is your data frame of size 351 by 14. I am assuming here that the first column of your data frame holds the labels, which is typical for such tasks.
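To see roughly what `cross_val_score(..., cv=5)` does, while staying within the question's no-sklearn constraint, here is a NumPy-only sketch. It assumes, as above, that the first column holds the labels. The random data frame and the nearest-centroid classifier are illustrative stand-ins, not part of the original answer, and unlike scikit-learn's default for a clusterer such as KMeans (unshuffled `KFold`), this sketch shuffles the rows before splitting.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 351 x 14 data frame: column 0 holds the
# class labels (0 or 1), columns 1..13 hold the features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(351, 14)))
df[0] = rng.integers(0, 2, size=351)

X = df.iloc[:, 1:].values  # every column except the first -> features, shape (351, 13)
y = df.iloc[:, 0].values   # first column -> labels, shape (351,)

def nearest_centroid_accuracy(X_tr, y_tr, X_va, y_va):
    """Illustrative stand-in model: assign each row to the class whose
    training-set mean (centroid) is closest in Euclidean distance."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_va[:, None, :] - centroids[None, :, :], axis=2)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == y_va).mean())

# 5-fold CV without scikit-learn: shuffle once, split into 5 folds,
# then score each fold while training on the other 4.
folds = np.array_split(rng.permutation(len(X)), 5)
scores = []
for i, val_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    scores.append(nearest_centroid_accuracy(X[train_idx], y[train_idx], X[val_idx], y[val_idx]))

print(scores, np.mean(scores))
```

Swapping `nearest_centroid_accuracy` for your own fit-and-evaluate function keeps the fold logic unchanged.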