I am applying PCA to my CSV data. After normalization, PCA seems to work. I want to plot the projection onto 4 components, but I am stuck with this error:
   type         x         y             fx             fy   fz
0     0 -0.639547 -1.013450 -8.600000e-231 -1.390000e-230  0.0
0     1 -0.497006 -2.311890   0.000000e+00   0.000000e+00  0.0
1     0  0.154376 -0.873189  1.150000e-228 -1.480000e-226  0.0
1     1 -0.342055 -2.179370   0.000000e+00   0.000000e+00  0.0
2     0  0.312719 -0.872756 -2.370000e-221  2.420000e-221  0.0

[5 rows x 10 columns]

(1047064, 10)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-0b631a51ce61> in <module>()
     33
     34
---> 35 finalDf = pd.concat([principalDf, df[['type']]], axis = 1)

4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in _verify_integrity(self)
    327         for block in self.blocks:
    328             if block.shape[1:] != mgr_shape[1:]:
--> 329                 raise construction_error(tot_items, block.shape[1:], self.axes)
    330         if len(self.items) != tot_items:
    331             raise AssertionError(

ValueError: Shape of passed values is (2617660, 5), indices imply (1570596, 5)

This is my code:
import sys
import pandas as pd
import pylab as pl
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


df1 = pd.read_csv('./data/1.csv')
df2 = pd.read_csv('./data/2.csv')
df = pd.concat([df1, df2], axis=0).sort_index()
print(df.head())
print(df.shape)

features = ['x', 'y', 'z', 'vx', 'vy', 'vz', 'fx', 'fy', 'fz']
# Separating out the features
x = df.loc[:, features].values
# Separating out the target
y = df.loc[:, ['type']].values
# Standardizing the features
x = StandardScaler().fit_transform(x)

pca = PCA(n_components=4)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['pcc1', 'pcc2', 'pcc3', 'pcc4'])


finalDf = pd.concat([principalDf, df[['type']]], axis=1)

I guess the error occurs when I concatenate my components with df['type'].
How can I get rid of this error?
Thank you.
Answer
The index in df is not the same as in principalDf. We have (using a short version of your data):
df.index
Int64Index([0, 0, 1, 1, 2, 2, 3, 3, 4, 4], dtype='int64')

and
principalDf.index
RangeIndex(start=0, stop=10, step=1)

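A quick way to confirm this kind of mismatch is to compare the two indexes directly. The sketch below is only an illustration: df1 and df2 are small made-up frames, not your data, and a random array stands in for the PCA output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two CSV files (5 rows each).
df1 = pd.DataFrame({"type": [0] * 5, "x": np.arange(5.0)})
df2 = pd.DataFrame({"type": [1] * 5, "x": np.arange(5.0) + 10})

# Same construction as in the question: labels 0..4 each appear twice.
df = pd.concat([df1, df2], axis=0).sort_index()

# Stand-in for the PCA output; a new DataFrame gets a fresh 0..9 RangeIndex.
rng = np.random.default_rng(0)
principalDf = pd.DataFrame(
    rng.normal(size=(len(df), 4)),
    columns=["pcc1", "pcc2", "pcc3", "pcc4"],
)

print(df.index.is_unique)                  # False: labels are duplicated
print(df.index.equals(principalDf.index))  # False: the two indexes differ

# concat with axis=1 aligns on labels, so the duplicated labels make it fail.
try:
    pd.concat([principalDf, df[["type"]]], axis=1)
    concat_failed = False
except Exception as exc:  # the exact exception type depends on the pandas version
    concat_failed = True
    print(type(exc).__name__, exc)
```

If is_unique is False or equals() is False, a column-wise concat cannot be relied on to line rows up by position.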
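concat with axis=1 aligns rows by index label rather than by position, so the duplicated labels in df cannot be matched against the unique labels in principalDf. You can fix this by resetting the index early on: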
df = pd.concat([df1, df2], axis=0).sort_index().reset_index() # note reset_index() added
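Here is a minimal end-to-end sketch of the fix, assuming small made-up frames in place of your CSV files and a random array in place of the PCA output. It passes drop=True to reset_index(), which is optional: without it the old labels are kept as an extra column named index, which is harmless here because the features are selected by name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two CSV files (5 rows each).
df1 = pd.DataFrame({"type": [0] * 5, "x": np.arange(5.0)})
df2 = pd.DataFrame({"type": [1] * 5, "x": np.arange(5.0) + 10})

# Reset the index right after combining, so the labels become a unique 0..9.
df = pd.concat([df1, df2], axis=0).sort_index().reset_index(drop=True)

# Stand-in for pca.fit_transform(x): one row per row of df, four components.
rng = np.random.default_rng(0)
principalDf = pd.DataFrame(
    rng.normal(size=(len(df), 4)),
    columns=["pcc1", "pcc2", "pcc3", "pcc4"],
)

# Both frames now share the labels 0..9, so the column-wise concat lines up.
finalDf = pd.concat([principalDf, df[["type"]]], axis=1)
print(finalDf.shape)  # (10, 5)

# Alternative: attach the column by position, bypassing index alignment.
# This assumes principalDf rows are in the same order as df rows.
altDf = principalDf.copy()
altDf["type"] = df["type"].to_numpy()
```

The positional variant also works when df still has its duplicated index, because a NumPy array carries no labels to align on.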