Curve fitting with NumPy polyfit: estimating the constants of a square-root function



First of all, sorry for my poor English, and thanks for reading this question.

I already have x and y data sets, and I want to fit a curve to them.

The model I want to fit is a square-root function with several constants (the equation was posted as an image).

How can I estimate the constants of this model with polyfit?

I know that

np.polyfit(x,y,1)

fits a linear equation (the 1 is the degree of the polynomial).

But how can I fit a different equation, such as a square root with three or more constants, to my data sets?

Answer

You can use scipy.optimize.curve_fit. Here is an example of how to do this:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

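def func(x, a, b, c):
    # Square-root model; real-valued only where x >= b
    return a * np.sqrt(x - b) + c

# Synthetic data: the model at known parameters plus Gaussian noise
x = np.linspace(2, 20, 100)
y = func(x, 2, -2, 3)
y_true = y + 0.1 * np.random.normal(size=len(x))

# popt holds the fitted (a, b, c); pcov is their estimated covariance
popt, pcov = curve_fit(func, x, y_true)
y_pred = func(x, *popt)

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(x, y_true, c='r', label='true', s=6)
ax.plot(x, y_pred, c='g', label='pred')
ax.legend(loc='best')
plt.show()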

This will give you a plot of the noisy data points (red) with the fitted curve (green).

The array popt is the list of (a,b,c) values.
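As a small follow-up sketch: curve_fit also returns pcov, the estimated covariance of the parameters, and the square roots of its diagonal are commonly read as one-standard-deviation uncertainties on (a,b,c). The fixed seed and the names rng, y_noisy and perr are my own choices for illustration, not part of the example above.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.sqrt(x - b) + c

# Same synthetic setup as above, with a fixed seed (an assumption, for repeatability)
rng = np.random.default_rng(0)
x = np.linspace(2, 20, 100)
y_noisy = func(x, 2, -2, 3) + 0.1 * rng.normal(size=len(x))

popt, pcov = curve_fit(func, x, y_noisy)

# One-standard-deviation uncertainty of each parameter
perr = np.sqrt(np.diag(pcov))
for name, value, err in zip("abc", popt, perr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```

The three constants of this model are strongly correlated, so the individual uncertainties can be fairly large even when the fitted curve follows the data closely.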


UPDATE

After testing curve_fit using the real dataset provided by reaver lover, I was surprised to find that curve_fit can fail on this relatively simple regression task.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

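def func(x, a, b, c):
    # Print every parameter set the optimizer tries, to trace its progress
    print('%.3f, %.3f, %.3f' % (a, b, c))
    return a * np.sqrt(x - b) + c

x = np.array([5, 11, 15, 44, 60, 70, 75, 100, 120, 200])
y_true = np.array([2.492, 8.330, 11.000, 19.394, 24.466, 27.777, 29.878, 26.952, 35.607, 46.966])

popt, pcov = curve_fit(func, x, y_true)
# The fit ends at (nan, nan, nan), so overwrite popt by hand with the
# last finite (a, b, c) that func printed during the optimization
popt = [2.252, 5.000, 6.908]
y_pred = func(x, *popt)

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(x, y_true, c='r', label='true', s=6)
ax.plot(x, y_pred, c='g', label='pred')
ax.legend(loc='best')
plt.show()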

If you run this script, you will see that the coefficients (a,b,c) become (nan,nan,nan) near the end of the optimization. However, the last (a,b,c) that curve_fit found before that was already good enough, as the plot produced by the script shows.

I’m really clueless why curve_fit can fail.
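A possible explanation, offered as a guess rather than a confirmed diagnosis: np.sqrt(x - b) returns nan as soon as b exceeds the smallest x (5 here). The last finite b printed by the script sits right at that limit, so one optimizer step past it would turn the residuals into nan. The sketch below keeps b at or below x.min() through the bounds argument of curve_fit, which also makes SciPy switch from its default unbounded solver to a bounded one. The starting point p0 is an arbitrary choice of mine, and this is only one way to keep the model defined, not necessarily the best one.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.sqrt(x - b) + c

x = np.array([5, 11, 15, 44, 60, 70, 75, 100, 120, 200], dtype=float)
y_true = np.array([2.492, 8.330, 11.000, 19.394, 24.466,
                   27.777, 29.878, 26.952, 35.607, 46.966])

# Assumption: b must not exceed the smallest x, so x - b never goes negative
lower = [-np.inf, -np.inf, -np.inf]
upper = [np.inf, x.min(), np.inf]

# p0 is an arbitrary feasible starting guess (hypothetical, not from the question)
popt, pcov = curve_fit(func, x, y_true, p0=[1.0, 0.0, 1.0],
                       bounds=(lower, upper))
print(popt)
```

With the bound in place the optimizer never evaluates the model at a b where the square root is undefined, so popt should stay finite and can be used directly instead of being overwritten by hand.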



Source: stackoverflow