#!/usr/bin/env python
# coding: utf-8

# # Curve Fitting

# Suppose that you want to fit a set of data points $(x_i,y_i)$, where
# $i = 1,2,\ldots,N$, to a function that can't be linearized. For example, the function could be a second-order polynomial, $f(x,a,b,c)=ax^2+bx+c$. There isn't an analytical expression for finding the best-fit parameters ($a$, $b$, and $c$ in this example) as there is for linear regression with uncertainties in one dimension. The usual approach is to optimize the parameters to minimize the sum of the squares of the differences between the data and the function. How the differences are defined varies. If there are only uncertainties in the $y$ direction, then the differences in the vertical direction (the gray lines in the figure below) are used. If there are uncertainties in both the $x$ and $y$ directions, the orthogonal (perpendicular) distances from the line (the dotted red lines in the figure below) are used.

# In[1]:

from IPython.display import Image
Image(filename="Normal_vs_ODR.png", width=400)

# Image from http://blog.rtwilson.com/orthogonal-distance-regression-in-python/
#
# For the case where there are only uncertainties in the $y$ direction, if the uncertainty in $y_i$ is $\sigma_i$, then the difference squared for each point is weighted by $w_i=1/\sigma_i^2$. If there are no uncertainties, each point is given an equal weight of one and the results should be used with caution. The function to be minimized with respect to variations in the parameters is
# $$
# \chi^2 = \sum_{i=1}^N w_i \left[y_i - f\left(x_i,a,b,c\right)\right]^2.
# $$
# For the case where there are uncertainties in both $x$ and $y$, the function to be minimized is more complicated.
#
# The **`general_fit`** function that performs this minimization is defined in the file "fitting.py", which must be in the same directory as the Python program.
# If there are only uncertainties in the $y$ direction, it uses the **`curve_fit`** function from the "scipy" library to find the best-fit parameters. If there are uncertainties in both $x$ and $y$, the **`odr`** package from the "scipy" library is used.

# An example of performing a fit with uncertainties in only the $y$ direction is shown below. The first command imports the **`general_fit`** function. Second, the function (**`fitfunc`**) to be fit is defined. The function could have more than 3 parameters. Initial guesses at the parameters are placed in the list $p0$. The name of the fitting function, arrays containing the data points ($x$ and $y$), the initial guesses at the parameters, and an array of uncertainties ($yerr$) are sent to the **`general_fit`** function. If it succeeds, the **`general_fit`** function returns arrays with the optimal parameters and estimates of their uncertainties (in lists called $popt$ and $punc$), the reduced chi squared ($rchi2$), and the degrees of freedom ($dof$).

# In[2]:

from fitting import *

# Define the fitting function
def fitfunc(x, a, b, c):
    return (a*x**2 + b*x + c)

x = array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = array([-1.78, 4.09, 8.85, 17.9, 26.1, 35.2])
yerr = array([1.24, 1.46, 1.05, 1.68, 1.18, 1.56])
p0 = [1.0, 3.0, -2.0]
popt, punc, rchi2, dof = general_fit(fitfunc, x, y, p0, yerr)
print('optimal parameters: ', popt)
print('uncertainties of parameters: ', punc)

# For this example, the optimal parameters are
#
# $$ a = 0.61500361\\
# b = 4.33082898\\
# c = -1.66737444 $$
#
# and their uncertainties are
#
# $$ \sigma_a = 0.11813261 \\
# \sigma_b = 0.57818057 \\
# \sigma_c = 0.50099218 $$
#
# If the initial guesses at the parameters are not reasonable, the optimization may fail. It is often helpful to plot the data first to help make a good guess at the parameters.
# If you're performing a fit with uncertainties in both the $x$ and $y$ directions, arrays containing the data points ($x$ and $y$) and their uncertainties ($yerr$ and $xerr$) are sent to the **`general_fit`** function as follows:
#
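# As noted above, this case is handled internally by "scipy"'s **`odr`** package, which can also be used directly. The sketch below illustrates that underlying machinery (not the **`general_fit`** call itself); the $xerr$ values are hypothetical, and note that an ODR model function takes its parameters packed into a single vector rather than as separate arguments.

```python
import numpy as np
from scipy import odr

# In scipy.odr the model function signature is f(beta, x), with all
# parameters packed into the vector beta.
def fitfunc_odr(beta, x):
    a, b, c = beta
    return a*x**2 + b*x + c

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1.78, 4.09, 8.85, 17.9, 26.1, 35.2])
yerr = np.array([1.24, 1.46, 1.05, 1.68, 1.18, 1.56])
xerr = np.full_like(x, 0.05)   # hypothetical x uncertainties

model = odr.Model(fitfunc_odr)
data = odr.RealData(x, y, sx=xerr, sy=yerr)  # sx, sy: standard deviations
fit = odr.ODR(data, model, beta0=[1.0, 3.0, -2.0])
output = fit.run()

popt = output.beta       # optimal parameters
punc = output.sd_beta    # estimated parameter uncertainties
rchi2 = output.res_var   # residual variance (a reduced chi-squared analog)
```

# With small $x$ uncertainties like these, the orthogonal-distance fit should land close to the $y$-only fit above.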