Remove DataFrames from a list of DataFrames

I have a DataFrame loaded like this:

import numpy as np
import pandas as pd

column_headers = ['sex', 'length', 'diamater', 'height', 'whole_weight', 
                  'shucked_weight', 'viscera_weight', 'shell_weight', 
                  'rings']

# Data source: https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/
    
abalone = pd.read_csv('abalone.data', header=None, names=column_headers)

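# Shuffle the rows before cross-validation
# (`scaled` comes from code not shown here, presumably a scaled copy of `abalone`)

shuffled_index = np.random.permutation(len(scaled))

shuffled_df = scaled.reindex(shuffled_index)

shuffled_df.head()

# Split the shuffled data into k folds for cross-validation

k = 4
folds = np.array_split(shuffled_df, k)

....

# k appears to be reused here as the number of neighbours passed to distance()
k = 5
scores = list()
for fold in folds:
    training_set = list(folds)
    # Fails from the second fold on: list.remove() compares elements with ==,
    # and pandas refuses to compare DataFrames with different index labels
    training_set.remove(fold)
    training_set = pd.concat(training_set)
    d = fold.apply(lambda row: distance(row, training_set, k), axis=1)
    error = root_mean_squared_error(fold['rings'], d)
    scores.append(error)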
   
Running the loop raises:

ValueError: Can only compare identically-labeled DataFrame objects

I am implementing the k-nearest neighbors algorithm with pandas and NumPy. After splitting the data into a list of DataFrames, I can't remove the fold I am currently looping over with list.remove(). How can I remove the current fold from the list so that I can concatenate the remaining folds for cross-validation?

Answer

You can delete the current DataFrame from a copy of the list by its position, using del with the index that enumerate() gives you.
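The reason list.remove() fails is that it locates the item to delete by comparing list elements with ==. In CPython the first fold is matched by identity before == is called, so removing it works, but for any later fold remove() first evaluates folds[0] == fold, and pandas raises ValueError when asked to compare DataFrames whose index labels differ. Deleting by position with del never compares the DataFrames at all. A minimal reproduction, using a made-up single-column DataFrame rather than the abalone data (try_remove is a hypothetical helper written only for this illustration):

```python
import pandas as pd

# Hypothetical toy data standing in for the abalone features.
df = pd.DataFrame({"rings": range(8)})

# Four folds of two rows each; every fold keeps its own slice of the index labels.
folds = [df.iloc[i:i + 2] for i in range(0, len(df), 2)]


def try_remove(folds, fold):
    """Copy the list and remove `fold` the way the question does."""
    training_set = list(folds)
    try:
        training_set.remove(fold)
        return "removed"
    except ValueError as exc:
        return f"ValueError: {exc}"


# The first fold is found by identity, so remove() succeeds.
print(try_remove(folds, folds[0]))
# For the second fold, remove() first compares folds[0] == folds[1],
# which pandas rejects because the two index labels differ.
print(try_remove(folds, folds[1]))
```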

Try this minimal example:

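import numpy as np
import pandas as pd

column_headers = ['sex', 'length', 'diamater', 'height', 'whole_weight', 
                  'shucked_weight', 'viscera_weight', 'shell_weight', 
                  'rings']

abalone = pd.read_csv('__data_input/abalone.data', header=None, names=column_headers)

...

# Split the data into 4 folds
folds = np.array_split(abalone, 4)

for idx, fold in enumerate(folds):
    # Copy the list so the original folds stay intact for later iterations
    training_set = folds.copy()
    # Delete the current fold by position; unlike list.remove(), del never compares DataFrames
    del training_set[idx]
    # Combine the remaining folds into a single training DataFrame
    training_set = pd.concat(training_set)
    ...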
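A self-contained sketch of the same del-by-index idea that runs without the abalone file. The toy columns, row count, and seed are made up for illustration. It shuffles with rng.permutation in place of the question's reindex step, and it builds the folds with DataFrame.iloc on split row positions rather than calling np.array_split on the DataFrame directly, which recent pandas versions may flag with a FutureWarning about DataFrame.swapaxes. The slicing line shows an equivalent way to build the training set without del.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the abalone data: 10 rows, one feature, one target.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "length": rng.random(10),
    "rings": rng.integers(1, 30, size=10),
})

k_folds = 4

# Shuffle, then split row positions (not the DataFrame itself) into k chunks.
positions = np.array_split(rng.permutation(len(data)), k_folds)
folds = [data.iloc[pos] for pos in positions]

for idx, fold in enumerate(folds):
    # Copy the list, then delete the current fold by position, as in the answer.
    training_parts = folds.copy()
    del training_parts[idx]
    training_set = pd.concat(training_parts)

    # Equivalent without del: slice the list around the current position.
    same_training_set = pd.concat(folds[:idx] + folds[idx + 1:])
    assert training_set.equals(same_training_set)

    # The held-out fold and the training set never share a row label.
    assert fold.index.intersection(training_set.index).empty
    print(f"fold {idx}: test rows={len(fold)}, train rows={len(training_set)}")
```

With 10 rows and 4 folds, np.array_split gives fold sizes of 3, 3, 2 and 2, so each training set has 7 or 8 rows.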


Answer