I am trying to plot a normalized histogram using the example from the numpy.random.normal documentation. For this purpose, I generate a normally distributed random sample.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

mu_true = 0
sigma_true = 0.1
s = np.random.normal(mu_true, sigma_true, 2000)
Then I fit a normal distribution to the data and calculate the pdf.
mu, sigma = stats.norm.fit(s)
points = np.linspace(stats.norm.ppf(0.01, loc=mu, scale=sigma),
                     stats.norm.ppf(0.9999, loc=mu, scale=sigma), 100)
pdf = stats.norm.pdf(points, loc=mu, scale=sigma)
Then I display the fitted pdf and the data histogram.
plt.hist(s, 30, density=True)
plt.plot(points, pdf, color='r')
plt.show()
I use density=True, but it seems obvious that the pdf and the histogram are not normalized.
What can one suggest to plot a truly normalized histogram and pdf?
Seaborn distplot also doesn’t solve the problem.
import seaborn as sns
ax = sns.distplot(s)
Answer
What makes you think it is not normalised? At a guess, it’s probably because the heights of the columns extend to values greater than 1. However, this thinking is flawed: in a normalised histogram/pdf, it is the total area under it that should equal one, not the heights. The area of each column is its height times its width, so when the steps in x are much smaller than one (as yours are), it is not surprising that the column heights are greater than one!
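To check this numerically, here is a minimal sketch that computes the same density histogram with np.histogram and sums height times width over all bins. The fixed seed and the variable names (heights, edges, area) are my own choices for illustration, not part of the question.

```python
import numpy as np

# Fixed seed is an assumption, used only to make the run reproducible.
rng = np.random.default_rng(0)
s = rng.normal(0, 0.1, 2000)

# density=True rescales the counts so that the total area is 1.
heights, edges = np.histogram(s, bins=30, density=True)
widths = np.diff(edges)

# Area of the histogram = sum over bins of (height * width).
area = np.sum(heights * widths)

print("tallest bin:", heights.max())  # larger than 1 for sigma = 0.1
print("total area: ", area)           # 1 up to floating-point rounding
```

The tallest bin is well above 1, yet the area is 1, which is exactly what density=True promises.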
You can see this clearly in the scipy example you link: the x-values are much greater (by an order of magnitude), so it follows that the y-values are correspondingly smaller. You will see the same effect if you change your distribution to cover a wider range of values. Try a sigma of 10 instead of 0.1 and see what happens!
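The effect of sigma on the height can be sketched directly from the normal pdf, whose maximum (at x = mu) is 1 / (sigma * sqrt(2 * pi)). The helper name normal_peak below is hypothetical and introduced only for this illustration.

```python
import math

def normal_peak(sigma):
    """Maximum value of a normal pdf with standard deviation sigma."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))

# A narrower distribution must be taller to keep the area equal to 1.
for sigma in (0.1, 1.0, 10.0):
    print(f"sigma = {sigma:>4}: peak height = {normal_peak(sigma):.4f}")
```

With sigma = 0.1 the peak is close to 4, while with sigma = 10 it drops to about 0.04, even though both curves enclose an area of one.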