Fitting data to a complementary error function with multiple variables in Python

Tags: , , , ,



I am having trouble to fit experimental data to a complementary error function in Python 3.7.4.

import matplotlib.pyplot as plt
import math 
import numpy as np
from scipy import optimize
from scipy import integrate

with open('C:Path\Data\test.txt', 'r') as f:
    lines = f.readlines()
    x = [float(line.split(',')[0]) for line in lines]
    y = [float(line.split(',')[1]) for line in lines]


int_start = 35
int_end = 75

start = float(x[int_start])
end = float(x[int_end]) 
  
x_data = np.linspace(start, end, (int_end-int_start)+1)
y_data = y[int_start: int_end+1]


def integrand(x, a, b, c):
    return a*np.exp(((-1)*(x-b)**2)/(2*(c**2)))

def cerf(x, a, b, c):
    return integrate.quad(integrand, x, np.inf)

params, params_covariance = optimize.curve_fit(cerf, x_data, y_data)

plt.plot(x_data, y_data, 'x', label='Data')
plt.plot(x_data, integrand(x_data, params[0], params[1], params[2]), '-', label="fit")

plt.legend(loc='best')
plt.show()

More precisely, I want to fit my data to the complementary error function consisting of the integrand function with the parameters a, b, c, and the cerf function doing the actual integration. The integration should go from x (the argument of the function) to +infinity. Afterwards, I wanted to use standard curve_fit from scipy. But now I am getting a value error as follows:

> ValueError                                Traceback (most recent call last)
<ipython-input-33-8130a3eb44bb> in <module>
     29     return integrate.quad(integrand, x, np.inf)
     30 
---> 31 params, params_covariance = optimize.curve_fit(cerf, x_data, y_data)

~AppDataRoamingPythonPython37site-packagesscipyintegratequadpack.py in quad(func, a, b, args, full_output, epsabs, epsrel, limit, points, weight, wvar, wopts, maxp1, limlst)
    346 
    347     # check the limits of integration: int_a^b, expect a < b
--> 348     flip, a, b = b < a, min(a, b), max(a, b)
    349 
    350     if weight is None:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I would be really thankful, if someone knew how to do the fit of the function with the x-arguments as the lower boundary for the integral.

The data look like this:

0.20,0.40
0.21,0.40
0.22,0.40
0.23,0.40
0.24,0.40
0.25,0.40
0.26,0.40
0.27,0.40
0.28,0.40
0.29,0.40
0.30,0.40
0.31,0.40
0.32,0.40
0.33,0.40
0.34,0.40
0.35,0.40
0.36,0.40
0.37,0.40
0.38,0.40
0.39,0.40
0.40,0.40
0.41,0.40
0.42,0.39
0.43,0.39
0.44,0.38
0.45,0.38
0.46,0.37
0.47,0.37
0.48,0.35
0.49,0.34
0.50,0.33
0.51,0.31
0.52,0.30
0.53,0.28
0.54,0.26
0.55,0.24
0.56,0.21
0.57,0.19
0.58,0.16
0.59,0.14
0.60,0.12
0.61,0.10
0.62,0.09
0.63,0.07
0.64,0.06 
0.65,0.05
0.66,0.04
0.67,0.03
0.68,0.02
0.69,0.02
0.70,0.01
0.71,0.01
0.72,0.00
0.73,0.00
0.74,0.00
0.75,0.00
0.76,0.00
0.77,0.00
0.78,-0.00
0.79,0.00
0.80,0.00
0.81,-0.00
0.82,-0.00
0.83,-0.00
0.84,0.00
0.85,-0.00
0.86,0.00
0.87,0.00
0.88,0.00
0.89,-0.00
0.90,0.00

Answer

According to the documentation, scipy.integrate.quad does not take arrays, and it cannot call a function with arguments. So, we have to construct a helper function f within the function that is addressed by scipy.curve_fit:

import matplotlib.pyplot as plt
import numpy as np
from scipy import optimize, integrate

#define a function that integrates or evaluates f depending on the Boolean flag func_integr
def cerf(x, a, b, c, func_integr=True):
    f = lambda x: a*np.exp(((-1)*(x-b)**2)/(2*(c**2)))
    #flag is preset to True, so will return the integrated values
    if func_integr:
        return np.asarray([integrate.quad(f, i, np.inf)[0] for i in x])
    #unless the flag func_integr is set to False, then it will return the function values
    else:
        return f(x)

#read file
arr=np.genfromtxt("test.txt", delimiter=",")
x_data = arr[:, 0]
y_data = arr[:, 1]

#provide reasonable start values...
start_p = [1, 0, -1]
#...for scipy.curve_fit
params, params_covariance = optimize.curve_fit(cerf, x_data, y_data, p0=start_p)
print(params)
#[ 2.26757666  0.56501062 -0.0704476 ]


#plot our results
plt.plot(x_data, y_data, 'x', label='Data')
plt.plot(x_data, cerf(x_data, *params), '-', label="fit")    
plt.legend(loc='best')
plt.show()

Sample output:

enter image description here

This approach is not the fastest – every x-value is integrated individually. Maybe there are other scipy.integrate functions that can work with numpy arrays; I would not know.
The part evaluating f instead of its integrated values is not really necessary here. But I used it initially to verify that cerf works as expected, so I left it in the script.



Source: stackoverflow