I have a dataframe that looks like this one:
    import numpy as np
    import pandas as pd

    column_headers = ['sex', 'length', 'diamater', 'height', 'whole_weight',
                      'shucked_weight', 'viscera_weight', 'shell_weight', 'rings']
    # Data source: https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/
    abalone = pd.read_csv('abalone.data', header=None, names=column_headers)

    # Split the data cross validation
    shuffled_index = np.random.permutation(len(scaled))
    shuffled_df = scaled.reindex(shuffled_index)
    shuffled_df.head()

    # Split the dataset in cross fold validation
    k = 4
    folds = np.array_split(shuffled_df, k)
    ....
    k = 5
    scores = list()
    for fold in folds:
        training_set = list(folds)
        training_set.remove(fold)
        training_set = pd.concat(training_set)
        d = fold.apply(lambda row: distance(row, training_set, k), axis=1)
        error = root_mean_squared_error(fold['rings'], d)
        scores.append(error)

    >> ValueError: Can only compare identically-labeled DataFrame objects
I am implementing the K-Neighbors algorithm with Pandas and NumPy. I have a list of DataFrames (the folds), and inside the loop I can't remove the fold I am currently iterating over with list.remove(). How can I remove the current fold from the list so that I can concatenate the remaining ones for cross-fold validation?
Answer
You can delete the DataFrame from your list by index with del, which avoids the equality comparison that list.remove() performs.
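To see why remove() fails here: it looks for the item by comparing it against each list element with ==, and comparing two DataFrames that carry different index labels raises the ValueError from the question. The snippet below is only a small sketch of that behaviour; the toy DataFrame and the names remaining_first and remove_failed are made up for illustration and are not part of the original code.

```python
import pandas as pd

# Hypothetical toy data standing in for the abalone folds.
df = pd.DataFrame({"rings": range(6)})
folds = [df.iloc[0:2], df.iloc[2:4], df.iloc[4:6]]

# Removing the first fold happens to work: the object is found by
# identity before any DataFrame == DataFrame comparison is needed.
remaining_first = list(folds)
remaining_first.remove(folds[0])

# Removing a later fold compares it with the folds in front of it,
# and that comparison raises ValueError.
remove_failed = False
try:
    remaining_second = list(folds)
    remaining_second.remove(folds[1])
except ValueError as exc:
    remove_failed = True
    print(type(exc).__name__, exc)
```

This is why the loop in the question may get through its first iteration and then fail on the second one.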
Try this minimal example:
    column_headers = ['sex', 'length', 'diamater', 'height', 'whole_weight',
                      'shucked_weight', 'viscera_weight', 'shell_weight', 'rings']
    abalone = pd.read_csv('__data_input/abalone.data', header=None, names=column_headers)
    ...
    folds = np.array_split(abalone, 4)
    for idx, fold in enumerate(folds):
        training_set = folds.copy()
        del training_set[idx]
        training_set = pd.concat(training_set)
        ...
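If you want something you can run without the data file, here is a self-contained variant of the same idea. It is a sketch under a few assumptions: the random DataFrame stands in for the abalone data, the folds are built by splitting shuffled row positions and selecting with iloc (rather than calling np.array_split on the DataFrame itself), and the current fold is skipped with a list comprehension over enumerate instead of del. The names fold_positions and fold_sizes are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the abalone data: 10 rows, two columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "length": rng.random(10),
    "rings": rng.integers(1, 20, size=10),
})

# Shuffle the row positions, then cut them into 4 roughly equal folds.
shuffled_positions = rng.permutation(len(df))
fold_positions = np.array_split(shuffled_positions, 4)
folds = [df.iloc[positions] for positions in fold_positions]

fold_sizes = []
for idx, fold in enumerate(folds):
    # Keep every fold except the one at position idx; no == comparison
    # between DataFrames is involved.
    training_set = pd.concat([f for i, f in enumerate(folds) if i != idx])
    fold_sizes.append((len(fold), len(training_set)))
    # ... compute predictions for `fold` from `training_set` here ...

print(fold_sizes)
```

Selecting by position keeps each row in exactly one fold, so the held-out fold and the training set never share rows.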