Currently I want to generate some samples to get expectation & variance of it.
Given the probability density function: f(x) = {2x, 0 <= x <= 1; 0 otherwise}
I already found that E(X) = 2/3, Var(X) = 1/18, my detail solution is from here https://math.stackexchange.com/questions/4430163/simulating-expectation-of-continuous-random-variable
But here is what I have when simulating using python:
import numpy as np N = 100_000 X = np.random.uniform(size=N, low=0, high=1) Y = [2*x for x in X] np.mean(Y) # 1.00221 <- not equal to 2/3 np.var(Y) # 0.3323 <- not equal to 1/18
What am I doing wrong here? Thank you in advanced.
Advertisement
Answer
To approximate the integral of some function of x, say, g(x), over S = [0, 1], using Monte Carlo simulation, you
generate
Nrandom numbers in[0, 1](i.e. draw from the uniform distributionU[0, 1])calculate the arithmetic mean of
g(x_i)overi = 1toi = Nwherex_iis theith random number: i.e.(1 / N)times the sum fromi = 1toi = Nofg(x_i).
The result of step 2 is the approximation of the integral.
The expected value of continuous random variable X with pdf f(x) and set of possible values S is the integral of x * f(x) over S. The variance of X is the expected value of X-squared minus the square of the expected value of X.
- Expected value: to approximate the integral of
x * f(x)overS = [0, 1](i.e. the expected value ofX), setg(x) = x * f(x)and apply the method outlined above. - Variance: to approximate the integral of
(x * x) * f(x)overS = [0, 1](i.e. the expected value ofX-squared), setg(x) = (x * x) * f(x)and apply the method outlined above. Subtract the result of this by the square of the estimate of the expected value ofXto obtain an estimate of the variance ofX.
Adapting your method:
import numpy as np N = 100_000 X = np.random.uniform(size = N, low = 0, high = 1) Y = [x * (2 * x) for x in X] E = [(x * x) * (2 * x) for x in X] # mean print((a := np.mean(Y))) # variance print(np.mean(E) - a * a)
Output
0.6662016482614397 0.05554821798023696
Instead of making Y and E lists, a much better approach is
Y = X * (2 * X) E = (X * X) * (2 * X)
Y, E in this case are numpy arrays. This approach is much more efficient. Try making N = 100_000_000 and compare the execution times of both methods. The second should be much faster.