I have data that depends on 4 independent variables (x1,x2,x3,x4) and I need a model (available in Python) to evaluate f(x1,x2,x3,x4) outside the data points. In principle, if I hold 3 of my variables constant, I can always use a polynomial fit of a reasonable degree (<5) to interpolate the data in the remaining dimension. I would therefore like to generate a function that is capable of interpolating in all dimensions at once using a multivariate polynomial fit. Note that the underlying function is non-linear (meaning that I should expect terms of the form x1^n*x2^m where n,m are not 0). What do you recommend?
To illustrate, I am including a small sample of the data:
(Note that some variables appear to be constant only because this is just a small sample.)
x1   x2   x3   x4   f
15   10   5    3    0.621646
15   10   5    5    0.488879
15   10   5    10   0.490204
15   10   7    0    0.616027
15   10   7    0.5  0.615497
15   10   7    1    0.619804
15   10   7    3    0.614494
15   10   7    5    0.556772
15   10   7    10   0.555393
15   20   0.5  0    0.764692
15   20   0.5  0.5  0.78774
15   20   0.5  1    0.799749
15   20   0.5  3    0.567796
15   20   0.5  5    0.328497
15   20   0.5  10   0.0923708
15   20   1    0    0.802219
15   20   1    0.5  0.811475
15   20   1    1    0.822908
15   20   1    3    0.721053
15   20   1    5    0.573549
15   20   1    10   0.206259
15   20   2    0    0.829069
15   20   2    0.5  0.831135
15   0    7    1    0.240144
15   0    7    3    0.258186
15   0    7    5    0.260836
Answer
You can do multivariate curve fitting using the scipy.optimize.curve_fit() function. It is well documented, and there are multiple questions and answers on StackOverflow on using it for multivariate fitting.
For your case, something like this can help you start off:

    import numpy
    from scipy.optimize import curve_fit

    # Example function to fit to your data
    def non_linear_func(x, a, b, c, d):
        return x[0] ** a * x[1] ** b + x[2] ** c + x[3] ** d

    # X is your multivariate x data, arranged so that x[0] holds all x1 values
    # y is your f data
    # p0 is an initial guess for a, b, c, d in your fitting function
    p0 = [1, 2, 3, 4]
    fitParams, fitCov = curve_fit(non_linear_func, X, y, p0=p0)
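A couple of things to note: make sure that the X and y you pass to curve_fit() have the correct dimensions. Because the example function indexes x[0] through x[3], X must have dimensions M x N, where M is the number of independent variables and N is the number of data points you have, so that each x[i] holds one variable for all points. If your data is stored as N x M (one row per data point), transpose it first. y should be of length N.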
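As a minimal sketch of getting the shapes right, assume the data sits in an N x 4 table (one row per data point) and the model indexes the variables along the first axis, as the example function does. The model `model`, the synthetic `table`, and `true_params` below are made up for illustration and are not the asker's data:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model, for illustration only: linear in its parameters,
# with one cross term (x2 * x3).
def model(x, a, b, c, d, e):
    x1, x2, x3, x4 = x  # x has shape (4, N); each xi is a length-N array
    return a + b * x1 + c * x2 * x3 + d * x3 + e * x4

# Synthetic N x 4 table (one row per data point), standing in for real data.
rng = np.random.default_rng(0)
table = rng.uniform(0.0, 10.0, size=(50, 4))

# Transpose so the variables run along the first axis: shape (4, N).
X = table.T

# Noise-free synthetic targets generated from known parameters.
true_params = (0.5, 0.1, -0.02, 0.3, -0.05)
y = model(X, *true_params)  # length N

fit_params, fit_cov = curve_fit(model, X, y, p0=np.ones(5))
print(fit_params)
```

With real data, `table` would be the loaded columns x1..x4 and `y` the measured f column.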
You must also define your fitting function based on the form that you would like, and try to give an initial guess, p0, for the parameters in the function to help curve_fit find the optimal values.
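Since the question asks for a multivariate polynomial with cross terms, one possible form for the fitting function is a sum of all monomials x1^e1 * x2^e2 * x3^e3 * x4^e4 up to a chosen total degree. The sketch below is an assumption about what such a model could look like, not a prescribed method: `monomial_exponents`, `EXPONENTS`, and `poly_model` are hypothetical names, the data is synthetic, and degree 2 is picked only to keep the example small. Because the function takes `*coeffs`, curve_fit cannot infer the number of parameters, so p0 has to be passed.

```python
import itertools
import numpy as np
from scipy.optimize import curve_fit

def monomial_exponents(n_vars, max_degree):
    """Exponent tuples (e1, ..., en) with e1 + ... + en <= max_degree."""
    return [e for e in itertools.product(range(max_degree + 1), repeat=n_vars)
            if sum(e) <= max_degree]

# 4 variables, total degree <= 2 gives 15 monomials (including the constant).
EXPONENTS = monomial_exponents(4, 2)

def poly_model(x, *coeffs):
    # x has shape (4, N): one row per independent variable.
    x = np.asarray(x, dtype=float)
    # Each term is x1**e1 * x2**e2 * x3**e3 * x4**e4, evaluated at all N points.
    terms = [np.prod(x ** np.array(e)[:, None], axis=0) for e in EXPONENTS]
    return np.dot(coeffs, terms)

# Synthetic data generated from known coefficients (illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 3.0, size=(4, 200))
true_coeffs = rng.normal(size=len(EXPONENTS))
y = poly_model(X, *true_coeffs)

# p0 sets the number of coefficients; zeros are enough for a model
# that is linear in its parameters.
p0 = np.zeros(len(EXPONENTS))
fit_coeffs, fit_cov = curve_fit(poly_model, X, y, p0=p0)

# Evaluate the fitted polynomial at a new point (x1, x2, x3, x4).
new_point = np.array([[1.0], [2.0], [0.5], [1.5]])
print(poly_model(new_point, *fit_coeffs))
```

A model like this is linear in its coefficients, so the same terms could also be fitted with a linear least-squares solver such as `numpy.linalg.lstsq`. The number of terms grows quickly with the degree, so it helps to check that there are comfortably more data points than coefficients.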
Hope that helps. There are lots of good answers on multivariate fitting with curve_fit() on StackOverflow (see here and here), and the curve_fit documentation should be of help as well.