
Matrix from outer product not positive definite

I was trying to simulate multivariate normal data in Python in the following way:

import numpy as np
import pandas as pd

n = 100
p = 10
mu = np.random.normal(loc=0, scale=3, size=p)
A = np.random.uniform(low = -4, high = 4, size = (p, 1))
A = pd.DataFrame(A)
Sigma = np.dot(A, np.transpose(A))
Sigma = pd.DataFrame(Sigma)
X = np.random.multivariate_normal(mean = mu, cov = Sigma, size = n)

However, the resulting variables are always perfectly linearly correlated. I then checked np.linalg.eigvals(Sigma) and realized that Sigma is not positive semi-definite. This surprised me since 1) I did not get an error from multivariate_normal and 2) Sigma was generated from an outer product, which is supposed to be positive semi-definite. Can anyone tell me what’s going on and how to fix this?
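As a minimal check (same setup as above, with an arbitrary seed added only for reproducibility), the rank and eigenvalues of Sigma can be inspected directly:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for reproducibility
p = 10
A = np.random.uniform(low=-4, high=4, size=(p, 1))
Sigma = A @ A.T  # outer product of a p x 1 vector, so rank 1

print(np.linalg.matrix_rank(Sigma))
# Mathematically the eigenvalues are ||A||^2 and p - 1 zeros; in floating
# point the zeros can come out as tiny negative numbers.
print(np.sort(np.linalg.eigvalsh(Sigma)))
```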


Answer

Here’s one idea. The principle is to generate p random vectors (rather than the p scalars in the original code), find the covariance between each pair of vectors, and then use that covariance matrix to construct the multivariate distribution. This removes the perfect linear dependence among the variables that the original rank-one Sigma produced.

import numpy as np

n = 100
p = 10

mu = np.random.normal(loc=0, scale=3, size=p)
A_vec = np.random.uniform(low=-4, high=4, size=(p, p))
Sigma_vec = np.cov(A_vec)  # each row of A_vec is treated as one variable
X = np.random.multivariate_normal(mean=mu, cov=Sigma_vec, size=n)

Inspecting the result with np.corrcoef(X.T) shows that the generated variables are no longer perfectly correlated.
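As a sanity check (a quick self-contained sketch, again with an arbitrary seed), one can verify that the covariance matrix built from np.cov is positive semi-definite and that the sampled variables are not perfectly correlated:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for reproducibility
n = 100
p = 10

mu = np.random.normal(loc=0, scale=3, size=p)
A_vec = np.random.uniform(low=-4, high=4, size=(p, p))
Sigma_vec = np.cov(A_vec)

# All eigenvalues are nonnegative up to floating-point noise, so Sigma_vec
# is positive semi-definite, as a sample covariance matrix must be.
print(np.linalg.eigvalsh(Sigma_vec))

X = np.random.multivariate_normal(mean=mu, cov=Sigma_vec, size=n)
corr = np.corrcoef(X.T)
off_diag = corr[~np.eye(p, dtype=bool)]
print(np.abs(off_diag).max())  # strictly below 1: no perfect correlation
```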

User contributions licensed under: CC BY-SA