I’m trying to get the variances from the eigenvectors.
What is the difference between explained_variance_ratio_ and explained_variance_ in PCA?
Answer
The fraction of the total variance explained by each component is:
explained_variance_ratio_
The variance explained by each component, i.e. the eigenvalues of the covariance matrix of X, is:
explained_variance_
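Formula (when all components are kept):
explained_variance_ratio_ = explained_variance_ / np.sum(explained_variance_)
If fewer components are kept, the denominator is still the total variance of X over all components, so the ratios sum to less than 1.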
Example:
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
pca.fit(X)

pca.explained_variance_        # the actual eigenvalues (variances)
# array([7.93954312, 0.06045688])

pca.explained_variance_ratio_  # the fraction of the total variance
# array([0.99244289, 0.00755711])
Also based on the above formula:
7.93954312 / (7.93954312 + 0.06045688) = 0.99244289
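The same numbers can be reproduced without scikit-learn, which may make the relationship clearer. This is only a minimal sketch using plain NumPy on the example data above; the variable names (cov, eigenvalues, ratios) are illustrative and not part of any library API.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

# Sample covariance matrix of the features (columns), with the n - 1 denominator.
cov = np.cov(X, rowvar=False)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order;
# reverse them so the largest variance comes first, as PCA reports them.
eigenvalues = np.linalg.eigvalsh(cov)[::-1]

# Dividing each eigenvalue by the total variance gives the explained ratio.
ratios = eigenvalues / eigenvalues.sum()

print(eigenvalues)  # matches pca.explained_variance_ from the example
print(ratios)       # matches pca.explained_variance_ratio_ from the example
```

The eigenvalues sum to the trace of the covariance matrix, i.e. the total variance of X, which is why the ratios sum to 1 when every component is kept.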
From the documentation:
explained_variance_ : array, shape (n_components,)
    The amount of variance explained by each of the selected components. Equal to n_components largest eigenvalues of the covariance matrix of X. New in version 0.18.

explained_variance_ratio_ : array, shape (n_components,)
    Percentage of variance explained by each of the selected components. If n_components is not set then all components are stored and the sum of the ratios is equal to 1.0.
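The last sentence of the documentation can be checked directly. The sketch below assumes scikit-learn is installed and reuses the example data; names such as pca_all, pca_one and manual_ratio are made up for illustration. It compares the sum of the ratios when n_components is left unset with the sum when only one component is kept.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

# n_components left unset: all components are kept, so the ratios sum to 1.
pca_all = PCA().fit(X)
sum_all = pca_all.explained_variance_ratio_.sum()

# Only the first component kept: its ratio is still measured against the
# total variance of X, so the sum of the stored ratios is below 1.
pca_one = PCA(n_components=1).fit(X)
sum_one = pca_one.explained_variance_ratio_.sum()

# Total variance of X: sum of the per-feature sample variances (ddof=1).
total_variance = X.var(axis=0, ddof=1).sum()
manual_ratio = pca_one.explained_variance_ / total_variance

print(sum_all, sum_one, manual_ratio)
```

So dividing explained_variance_ by its own sum only reproduces explained_variance_ratio_ when no components were dropped; otherwise the total variance of X is the appropriate denominator.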