Could someone help me fit the data `collapse_fractions` with a lognormal function whose median and standard deviation are derived via the maximum likelihood method? I tried `scipy.stats.lognorm.fit(data)`, but I did not obtain the results I got with Excel. The Excel file can be downloaded here: https://stacks.stanford.edu/file/druid:sw589ts9300/p_collapse_from_msa.xlsx
Any reference is also very welcome.
```python
import numpy as np

intensity_measure_vector = np.array([[0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1]])

no_analyses = 40
no_collapses = np.array([[0, 0, 0, 4, 6, 13, 12, 16]])

collapse_fractions = np.array(no_collapses / no_analyses)

print(collapse_fractions)
# array([[0.   , 0.   , 0.   , 0.1  , 0.15 , 0.325, 0.3  , 0.4  ]])

collapse_fractions.shape
# (1, 8)

import matplotlib.pyplot as plt

plt.scatter(intensity_measure_vector, collapse_fractions)
```
Answer
I couldn't figure out how to get `lognorm.fit` to work, so I just implemented the functions from your Excel file and used `scipy.optimize` as the optimizer. The added benefit is that it is easier to understand what is actually going on compared to `lognorm.fit`, especially with the Excel file open on the side.
Here is my implementation:
```python
from functools import partial

import numpy as np
from scipy import optimize, stats

# observed data: intensity measures, number of analyses and collapses per level
im = np.array([0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1])
im_log = np.log(im)
number_of_analyses = np.array([40, 40, 40, 40, 40, 40, 40, 40])
number_of_collapses = np.array([0, 0, 0, 4, 6, 13, 12, 16])

FORMAT_STRING = "{:<20}{:<20}{:<20}"
print(FORMAT_STRING.format("sigma", "beta", "log_likelihood_sum"))


def neg_log_likelihood_sum(params, im_l, no_a, no_c):
    sigma, beta = params
    # lognormal fragility curve evaluated at the (log) intensity measures
    theoretical_fragility_function = stats.norm(np.log(sigma), beta).cdf(im_l)
    # binomial likelihood of the observed collapse counts given that curve
    likelihood = stats.binom.pmf(no_c, no_a, theoretical_fragility_function)
    log_likelihood = np.log(likelihood)
    log_likelihood_sum = np.sum(log_likelihood)
    print(FORMAT_STRING.format(sigma, beta, log_likelihood_sum))
    # negated so that a minimizer maximizes the log-likelihood
    return -log_likelihood_sum


neg_log_likelihood_sum_partial = partial(
    neg_log_likelihood_sum, im_l=im_log, no_a=number_of_analyses, no_c=number_of_collapses
)

res = optimize.minimize(neg_log_likelihood_sum_partial, (1, 1), method="Nelder-Mead")
print(res)
```
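Before running the optimizer, you can sanity-check the objective with a single evaluation by hand. This is just a sketch, assuming the block above has already been run; a smaller value means a better fit, because the function returns the negative log-likelihood sum:

```python
# One evaluation of the objective at the start values (1, 1) used by the optimizer.
start_value = neg_log_likelihood_sum((1, 1), im_log, number_of_analyses, number_of_collapses)
print(start_value)
```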
And the final result is:
```
final_simplex: (array([[1.07613697, 0.42927824],
       [1.07621925, 0.42935678],
       [1.07622438, 0.42924577]]), array([10.7977048 , 10.79770573, 10.79770723]))
          fun: 10.797704803509903
      message: 'Optimization terminated successfully.'
         nfev: 68
          nit: 36
       status: 0
      success: True
            x: array([1.07613697, 0.42927824])
```
The interesting part for you is the parameter vector (the first row of `final_simplex`, repeated as `x` at the bottom): it is the same final result as in the Excel calculation (sigma=1.07613697 and beta=0.42927824).
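If you want to work with the fitted values directly instead of reading them off the printout, they can be pulled out of the result object. This is just a sketch using the `x` attribute of scipy's `OptimizeResult`; the parameters come out in the same order they are unpacked inside `neg_log_likelihood_sum`:

```python
# res.x holds the fitted parameters as (sigma, beta)
sigma_fit, beta_fit = res.x
print(f"sigma (median): {sigma_fit:.8f}")
print(f"beta (log-space standard deviation): {beta_fit:.8f}")
```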
If you have any questions about what I did here, don't hesitate to ask, as you said you are new to Python. A few things in advance:
- I minimized the negative log-likelihood sum, as there is no maximizer in scipy.
- `partial` from `functools` returns a function that has the specified arguments already defined (in this case `im_l`, `no_a` and `no_c`, as they don't change); the partial function can then be called with just the missing argument. An alternative without `partial` is sketched after this list.
- The `neg_log_likelihood_sum` function is basically what is happening in the Excel file, so it should be easy to understand when viewing the two side by side.
- `scipy.optimize.minimize` minimizes the function given as the first argument by changing the parameters (start value as the second argument). The method was chosen because it gave good results; I didn't dive deep into the abyss of different optimization methods, gradients etc. So it may well be that there is a better setup, but this one works fine and seems faster than the optimization with `lognorm.fit`.
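As an aside, the fixed arguments can also be handed to the optimizer through its `args` parameter instead of going through `partial`. Just a variant sketch, assuming the data arrays and `neg_log_likelihood_sum` from above are already defined:

```python
# minimize() forwards everything in args to the objective after the parameter vector,
# so functools.partial is not strictly needed here.
res_alt = optimize.minimize(
    neg_log_likelihood_sum,
    (1, 1),
    args=(im_log, number_of_analyses, number_of_collapses),
    method="Nelder-Mead",
)
print(res_alt.x)  # should match res.x from the partial-based call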
The plot like the one in the Excel file is done like this, using the result `res` from the optimization:
```python
import matplotlib.pyplot as plt

x = np.linspace(0, 2.5, 100)
y = stats.norm(np.log(res["x"][0]), res["x"][1]).cdf(np.log(x))

plt.plot(x, y)
plt.scatter(im, number_of_collapses / number_of_analyses)
plt.show()
```
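To tie this back to your original question: the same fitted curve can also be evaluated with `scipy.stats.lognorm` directly, because a lognormal distribution with shape `s=beta` and `scale=sigma` has exactly this CDF. A small sketch, assuming `res`, `x` and `y` from the plotting code above:

```python
# stats.norm(np.log(sigma), beta).cdf(np.log(x)) and stats.lognorm(s=beta, scale=sigma).cdf(x)
# describe the same fragility curve, just in different parametrizations.
sigma_fit, beta_fit = res.x
y_lognorm = stats.lognorm(s=beta_fit, scale=sigma_fit).cdf(x)
print(np.allclose(y, y_lognorm))  # True: the two curves agree point by point
```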