I am applying PCA to my CSV data. After normalization, PCA seems to work. I want to plot the projection onto 4 components, but I am stuck on this error:
       type         x         y  ...             fx             fy   fz
    0     0 -0.639547 -1.013450  ... -8.600000e-231 -1.390000e-230  0.0
    0     1 -0.497006 -2.311890  ...   0.000000e+00   0.000000e+00  0.0
    1     0  0.154376 -0.873189  ...  1.150000e-228 -1.480000e-226  0.0
    1     1 -0.342055 -2.179370  ...   0.000000e+00   0.000000e+00  0.0
    2     0  0.312719 -0.872756  ... -2.370000e-221  2.420000e-221  0.0

    [5 rows x 10 columns]
    (1047064, 10)
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-28-0b631a51ce61> in <module>()
         33
         34
    ---> 35 finalDf = pd.concat([principalDf, df[['type']]], axis = 1)

    4 frames
    /usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in _verify_integrity(self)
        327         for block in self.blocks:
        328             if block.shape[1:] != mgr_shape[1:]:
    --> 329                 raise construction_error(tot_items, block.shape[1:], self.axes)
        330         if len(self.items) != tot_items:
        331             raise AssertionError(

    ValueError: Shape of passed values is (2617660, 5), indices imply (1570596, 5)
This is my code:
    import sys
    import pandas as pd
    import pylab as pl
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import preprocessing
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    df1=pd.read_csv('./data/1.csv')
    df2=pd.read_csv('./data/2.csv')
    df = pd.concat([df1, df2], axis=0).sort_index()
    print(df.head())
    print(df.shape)

    features = ['x', 'y', 'z', 'vx', 'vy', 'vz', 'fx', 'fy', 'fz']
    # Separating out the features
    x = df.loc[:, features].values
    # Separating out the target
    y = df.loc[:,['type']].values
    # Standardizing the features
    x = StandardScaler().fit_transform(x)

    pca = PCA(n_components=4)
    principalComponents = pca.fit_transform(x)
    principalDf = pd.DataFrame(data = principalComponents
                 , columns = ['pcc1','pcc2','pcc3', 'pcc4'])

    finalDf = pd.concat([principalDf, df[['type']]], axis = 1)
I guess the error occurs when I concatenate my components with df['type'].
How can I get rid of this error?
Thank you.
Answer
The index in df is not the same as in principalDf. We have (using a short version of your data):
    df.index
    Int64Index([0, 0, 1, 1, 2, 2, 3, 3, 4, 4], dtype='int64')
and
    principalDf.index
    RangeIndex(start=0, stop=10, step=1)
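To see where the two different indexes come from, here is a minimal sketch. The two small frames are hypothetical stand-ins for 1.csv and 2.csv, and the single `pcc1` column only imitates the array that PCA returns; none of these values come from the original data.

```python
import pandas as pd

# Hypothetical stand-ins for 1.csv and 2.csv: each file gets its own 0..n-1 index.
df1 = pd.DataFrame({"type": [0, 0, 0], "x": [0.1, 0.2, 0.3]})
df2 = pd.DataFrame({"type": [1, 1, 1], "x": [1.1, 1.2, 1.3]})

# Stacking the frames keeps both sets of labels, so every label appears twice.
df = pd.concat([df1, df2], axis=0).sort_index()
labels = df.index.tolist()
is_unique = df.index.is_unique

# A DataFrame built from a bare NumPy array gets a fresh 0..n-1 index,
# just like the frame built from the PCA output.
principal_df = pd.DataFrame(df[["x"]].to_numpy(), columns=["pcc1"])
fresh_labels = principal_df.index.tolist()

print(labels, is_unique)
print(fresh_labels)
```

The first frame carries duplicated labels and the second carries unique ones, which is the same mismatch shown above.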
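Hence concat is getting confused: with axis=1 it aligns rows by index label, and the duplicated labels in df cannot be matched one to one with the unique labels in principalDf. You can fix this by resetting the index early on: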
    ...
    df = pd.concat([df1, df2], axis=0).sort_index().reset_index()  # note reset_index() added
    ...
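As a rough end-to-end check of this fix, the sketch below uses small hypothetical frames in place of the CSV files and a random array in place of the output of `pca.fit_transform(x)`, so it runs without scikit-learn. It also passes `drop=True`, which is my assumption rather than part of the fix above: a plain `reset_index()` keeps the old labels as an extra `index` column, which is harmless here because the features are selected by name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for 1.csv and 2.csv.
df1 = pd.DataFrame({"type": [0, 0, 0], "x": [0.1, 0.2, 0.3], "y": [1.0, 2.0, 3.0]})
df2 = pd.DataFrame({"type": [1, 1, 1], "x": [1.1, 1.2, 1.3], "y": [4.0, 5.0, 6.0]})

# drop=True discards the old duplicated labels instead of keeping them
# as an extra "index" column.
df = pd.concat([df1, df2], axis=0).sort_index().reset_index(drop=True)

# Stand-in for pca.fit_transform(x): any array with one row per row of df.
rng = np.random.default_rng(0)
principal_components = rng.normal(size=(len(df), 2))
principal_df = pd.DataFrame(principal_components, columns=["pcc1", "pcc2"])

# Both frames now share the same 0..n-1 index, so rows line up one to one.
final_df = pd.concat([principal_df, df[["type"]]], axis=1)
print(final_df.shape)
```

With matching indexes the result has one row per input row and no NaN padding in the `type` column.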