Shape error while concatenating columns after Principal Component Analysis on CSV data

I am applying PCA to my CSV data. After normalization, PCA itself seems to work. I want to plot the projection onto 4 components, but I am stuck with this error:

 type         x         y  ...             fx             fy   fz
0     0 -0.639547 -1.013450  ... -8.600000e-231 -1.390000e-230  0.0
0     1 -0.497006 -2.311890  ...   0.000000e+00   0.000000e+00  0.0
1     0  0.154376 -0.873189  ...  1.150000e-228 -1.480000e-226  0.0
1     1 -0.342055 -2.179370  ...   0.000000e+00   0.000000e+00  0.0
2     0  0.312719 -0.872756  ... -2.370000e-221  2.420000e-221  0.0

[5 rows x 10 columns]

(1047064, 10)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-0b631a51ce61> in <module>()
     33 
     34 
---> 35 finalDf = pd.concat([principalDf, df[['type']]], axis = 1)

4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in _verify_integrity(self)
    327         for block in self.blocks:
    328             if block.shape[1:] != mgr_shape[1:]:
--> 329                 raise construction_error(tot_items, block.shape[1:], self.axes)
    330         if len(self.items) != tot_items:
    331             raise AssertionError(

ValueError: Shape of passed values is (2617660, 5), indices imply (1570596, 5)

This is my code:

import sys
import pandas as pd
import pylab as pl
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


df1=pd.read_csv('./data/1.csv')
df2=pd.read_csv('./data/2.csv')
df = pd.concat([df1, df2], axis=0).sort_index()
print(df.head())
print(df.shape)

features = ['x', 'y', 'z', 'vx', 'vy', 'vz', 'fx', 'fy', 'fz']
# Separating out the features
x = df.loc[:, features].values
# Separating out the target
y = df.loc[:,['type']].values
# Standardizing the features
x = StandardScaler().fit_transform(x)

pca = PCA(n_components=4)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
             , columns = ['pcc1','pcc2','pcc3', 'pcc4'])


finalDf = pd.concat([principalDf, df[['type']]], axis = 1)

I guess the error occurs when I concatenate my components with df['type'].

How can I get rid of this error?

Thank you.

Answer

The index of df is not the same as the index of principalDf. Using a shortened version of your data, we have

df.index
Int64Index([0, 0, 1, 1, 2, 2, 3, 3, 4, 4], dtype='int64')

and

principalDf.index
RangeIndex(start=0, stop=10, step=1)

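pd.concat with axis=1 aligns rows by index label. df has duplicated labels (0, 0, 1, 1, ...), while principalDf has a unique range from 0 to n-1, so the rows cannot be matched one-to-one and the concat fails. You can fix this by resetting the index early on: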

...
df = pd.concat([df1, df2], axis=0).sort_index().reset_index(drop=True)  # reset_index added; drop=True discards the old duplicated labels instead of keeping them as an 'index' column
...
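
To see the fix end to end, here is a minimal sketch on made-up data. The two small frames stand in for 1.csv and 2.csv, and the `components` array is only a placeholder for the output of `pca.fit_transform(x)`; both are assumptions for illustration, not your real data. It shows two ways to get a clean result: resetting the index of df, or attaching the `type` values by position so that index labels are not used at all.

```python
import numpy as np
import pandas as pd

# Two small frames standing in for 1.csv and 2.csv (made-up values).
df1 = pd.DataFrame({"type": [0, 0, 0], "x": [0.1, 0.2, 0.3]})
df2 = pd.DataFrame({"type": [1, 1, 1], "x": [1.1, 1.2, 1.3]})

# Same construction as in the question: the index becomes 0, 0, 1, 1, 2, 2.
df = pd.concat([df1, df2], axis=0).sort_index()

# Placeholder for pca.fit_transform(x): one row per row of df, two columns.
components = np.arange(len(df) * 2, dtype=float).reshape(len(df), 2)
principal_df = pd.DataFrame(components, columns=["pcc1", "pcc2"])

# Option 1: give df a fresh 0..n-1 index so it matches principal_df.
df_reset = df.reset_index(drop=True)
final_reset = pd.concat([principal_df, df_reset[["type"]]], axis=1)

# Option 2: leave df untouched and attach 'type' by position, not by label.
final_values = principal_df.assign(type=df["type"].to_numpy())

print(final_reset.shape)
print(final_values.shape)
```

Both options keep the row order of df, so each row of components stays next to the `type` it was computed from. Option 2 can be handy if you need to keep the original index of df for something else.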
