I have a dataframe that looks like this one:
    import numpy as np
    import pandas as pd

    column_headers = ['sex', 'length', 'diamater', 'height', 'whole_weight',
                      'shucked_weight', 'viscera_weight', 'shell_weight',
                      'rings']

    # Data source: https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/

    abalone = pd.read_csv('abalone.data', header=None, names=column_headers)

    # Shuffle the data for cross-validation

    shuffled_index = np.random.permutation(len(scaled))

    shuffled_df = scaled.reindex(shuffled_index)

    shuffled_df.head()

    # Split the dataset into folds for cross-validation

    k = 4
    folds = np.array_split(shuffled_df, k)

    .

    k = 5
    scores = list()
    for fold in folds:
        training_set = list(folds)
        training_set.remove(fold)
        training_set = pd.concat(training_set)
        d = fold.apply(lambda row: distance(row, training_set, k), axis=1)
        error = root_mean_squared_error(fold['rings'], d)
        scores.append(error)

    >>

    ValueError: Can only compare identically-labeled DataFrame objects

I am implementing the k-nearest neighbors algorithm with pandas and NumPy. After splitting the data into a list of DataFrames, I can't remove the fold I am currently looping over from the list with list.remove(). How can I remove the current fold from the list so that I can concatenate the remaining folds for k-fold cross-validation?
Answer
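You can delete the DataFrame from your list by index with del. list.remove() finds its target by comparing each element with ==, and comparing two DataFrames whose index labels differ raises the ValueError you are seeing; del works by position and never compares anything.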
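To see where the error comes from, here is a small sketch on toy data. The two-column frame and its values are made up for illustration and are not taken from the abalone file. It builds four folds by slicing and then calls list.remove() on the second fold, which makes Python evaluate folds[0] == second before it ever reaches the matching element.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the abalone data (assumed values, not the real dataset).
df = pd.DataFrame({"length": np.arange(8.0), "rings": np.arange(8)})

# Slice into 4 folds of 2 rows each; every fold keeps its own index labels.
folds = [df.iloc[i:i + 2] for i in range(0, len(df), 2)]

second = folds[1]
candidates = list(folds)

try:
    # remove() compares folds[0] == second first; their indexes differ,
    # so pandas raises instead of returning a boolean.
    candidates.remove(second)
    outcome = "removed"
except ValueError as exc:
    outcome = f"ValueError: {exc}"

print(outcome)
```

The exact wording of the message may differ between pandas versions. Removing the first fold happens to work, because Python checks object identity before falling back to ==, which is why the loop in the question only fails on its second pass. Deleting by position avoids the comparison entirely.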
Try this minimal example:
    import numpy as np
    import pandas as pd

    column_headers = ['sex', 'length', 'diamater', 'height', 'whole_weight',
                      'shucked_weight', 'viscera_weight', 'shell_weight',
                      'rings']

    abalone = pd.read_csv('__data_input/abalone.data', header=None, names=column_headers)

    folds = np.array_split(abalone, 4)

    for idx, fold in enumerate(folds):
        # Copy the list so the original folds stay untouched
        training_set = folds.copy()
        # Drop the current fold by position; no DataFrame comparison is involved
        del training_set[idx]
        training_set = pd.concat(training_set)
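If you don't have the data file at hand, the same pattern can be tried on synthetic data. The sketch below is only an illustration: the frame, the seed, and the names k_folds, remaining and alt_training_set are assumptions, not part of the question. It splits row positions rather than the DataFrame itself, which is a choice made for this sketch, and it also shows an identity-based list comprehension as one possible alternative to del.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the abalone data (assumed values, not the real dataset).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "length": rng.random(10),
    "rings": rng.integers(1, 20, size=10),
})

k_folds = 4
# Split row positions into k_folds nearly equal chunks, then slice the frame.
chunks = np.array_split(np.arange(len(data)), k_folds)
folds = [data.iloc[chunk] for chunk in chunks]

train_sizes = []
for idx, fold in enumerate(folds):
    # Shallow copy of the list, so deleting from it leaves `folds` intact.
    remaining = folds.copy()
    del remaining[idx]
    training_set = pd.concat(remaining)

    # Alternative without a copy: keep every fold that is not this exact object.
    alt_training_set = pd.concat([f for f in folds if f is not fold])
    assert training_set.equals(alt_training_set)

    train_sizes.append(len(training_set))

print(train_sizes)  # prints [7, 7, 8, 8]
```

Both variants rely on position or identity rather than ==, so neither triggers the DataFrame comparison that list.remove() performs.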