Currently I want to generate some samples to get expectation & variance of it.
Given the probability density function: f(x) = {2x, 0 <= x <= 1; 0 otherwise}
I already found that E(X) = 2/3, Var(X) = 1/18, my detail solution is from here https://math.stackexchange.com/questions/4430163/simulating-expectation-of-continuous-random-variable
But here is what I have when simulating using python:
import numpy as np N = 100_000 X = np.random.uniform(size=N, low=0, high=1) Y = [2*x for x in X] np.mean(Y) # 1.00221 <- not equal to 2/3 np.var(Y) # 0.3323 <- not equal to 1/18
What am I doing wrong here? Thank you in advanced.
Advertisement
Answer
To approximate the integral of some function of x
, say, g(x)
, over S = [0, 1]
, using Monte Carlo simulation, you
generate
N
random numbers in[0, 1]
(i.e. draw from the uniform distributionU[0, 1]
)calculate the arithmetic mean of
g(x_i)
overi = 1
toi = N
wherex_i
is thei
th random number: i.e.(1 / N)
times the sum fromi = 1
toi = N
ofg(x_i)
.
The result of step 2 is the approximation of the integral.
The expected value of continuous random variable X
with pdf f(x)
and set of possible values S
is the integral of x * f(x)
over S
. The variance of X
is the expected value of X
-squared minus the square of the expected value of X
.
- Expected value: to approximate the integral of
x * f(x)
overS = [0, 1]
(i.e. the expected value ofX
), setg(x) = x * f(x)
and apply the method outlined above. - Variance: to approximate the integral of
(x * x) * f(x)
overS = [0, 1]
(i.e. the expected value ofX
-squared), setg(x) = (x * x) * f(x)
and apply the method outlined above. Subtract the result of this by the square of the estimate of the expected value ofX
to obtain an estimate of the variance ofX
.
Adapting your method:
import numpy as np N = 100_000 X = np.random.uniform(size = N, low = 0, high = 1) Y = [x * (2 * x) for x in X] E = [(x * x) * (2 * x) for x in X] # mean print((a := np.mean(Y))) # variance print(np.mean(E) - a * a)
Output
0.6662016482614397 0.05554821798023696
Instead of making Y
and E
lists, a much better approach is
Y = X * (2 * X) E = (X * X) * (2 * X)
Y
, E
in this case are numpy
arrays. This approach is much more efficient. Try making N = 100_000_000
and compare the execution times of both methods. The second should be much faster.