Linear Non-Linear Regressions (and Why They're Bad)

Consider the classical ways of determining Michaelis-Menten constants. In the old days, curve fitting was hard, and a lot of effort was spent linearizing problems to avoid having to perform a nonlinear fit. One such method is the Lineweaver-Burk plot, which plots the reciprocal of the reaction rate against the reciprocal of the substrate concentration. The problem here is that taking the reciprocal distorts the measurement errors: a small error in a small measured rate becomes a large error in its reciprocal. Moreover, the constants are determined by the axis intercepts, which may be some distance from the data, and extrapolation over long distances is always dicey. Another method is the Hanes-Woolf plot, which plots the substrate concentration divided by the reaction rate against the substrate concentration. The problem here is that both axes depend on the substrate concentration, so typical methods of error estimation are no longer valid.
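
For reference, both linearizations follow from rearranging the Michaelis-Menten equation $v = \frac{V_{max}[S]}{K_m + [S]}$. The symbols $v$ (reaction rate) and $[S]$ (substrate concentration) are notation assumed here for illustration:

$$\frac{1}{v} = \frac{K_m}{V_{max}} \cdot \frac{1}{[S]} + \frac{1}{V_{max}} \qquad \text{(Lineweaver-Burk)}$$

$$\frac{[S]}{v} = \frac{1}{V_{max}} \cdot [S] + \frac{K_m}{V_{max}} \qquad \text{(Hanes-Woolf)}$$

To first order, a small error $\delta v$ in the measured rate propagates to the reciprocal as

$$\delta\!\left(\frac{1}{v}\right) \approx \frac{\delta v}{v^2},$$

so the same absolute error is amplified most at the smallest rates, which are also the points farthest out on the $1/[S]$ axis.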

These methods are still taught today because they provide intuition into the problem and because it's necessary to understand how the kinetic constants were found in older papers. However, we now have access to fast and ubiquitous computing, so a much better solution is to use nonlinear fitting. For example, set up the Michaelis-Menten equation in gnuplot and use its `fit` command, and you'll get a very quick, very accurate estimate of the constants, with error estimates that actually mean something! For a more detailed comparison of the different methods and their errors, see the following paper: Current statistical methods for estimating the Km and Vmax of Michaelis-Menten kinetics (DOI: 10.1016/S0307-4412(96)00089-1)

Here, we will work out an example using Python...

First, let's import everything we're going to need...

In [5]:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

Now, let's load in some kinetics data. You can download the example data, or you can use your own data if you would like. You may need to adjust the code below to suit your data.

In [6]:
A = np.loadtxt('vdata.asc')
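
If your own data is not a plain whitespace-separated, two-column file, `np.loadtxt` accepts extra arguments. Here is a minimal sketch, assuming a hypothetical comma-separated layout with one header line. The column names and values are invented for illustration and are not the example data set:

```python
import io
import numpy as np

# Hypothetical CSV content standing in for a file on disk.
csv_text = """substrate,velocity
0.1,0.10
0.5,0.32
2.0,0.54
"""

# delimiter=',' splits on commas; skiprows=1 skips the header line.
data = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1)

s = data[:, 0]  # substrate concentrations (first column)
v = data[:, 1]  # reaction velocities (second column)
print(data.shape)
```

For a real file, pass its path in place of the `io.StringIO` object.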

Then we define the function we want to fit, which is just the Michaelis-Menten equation, $v = \frac{V_{max}[S]}{K_m + [S]}$:

In [7]:
def michaelis_menten(s, km, vmax):
    """Reaction rate at substrate concentration s for the given Km and Vmax."""
    return s * vmax / (s + km)
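
A quick sanity check can confirm the function behaves like the Michaelis-Menten equation should. The sketch below uses illustrative parameter values (0.6 and 0.7 are assumptions, not fitted results) and checks the two textbook properties: the rate is half of Vmax when the substrate concentration equals Km, and it approaches Vmax at saturation.

```python
import numpy as np

def michaelis_menten(s, km, vmax):
    return s * vmax / (s + km)

km, vmax = 0.6, 0.7  # illustrative values only

# At [S] = Km the rate is half of Vmax.
half = michaelis_menten(km, km, vmax)

# At very high substrate concentration the rate approaches Vmax.
saturated = michaelis_menten(1e6, km, vmax)

# The function also works element-wise on NumPy arrays.
rates = michaelis_menten(np.array([0.0, km, 1e6]), km, vmax)
print(half, saturated, rates)
```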

Now, we use the nonlinear fitting function from SciPy. We're going to treat these answers as the "correct" ones and preserve them for later use.

In [8]:
opt, cov = curve_fit(michaelis_menten, A[:,0], A[:,1], p0=(0.5,0.5))
correct_km = opt[0]
correct_vmax = opt[1]
print('Km=%.3f, Vmax=%.3f' % (correct_km, correct_vmax))
Km=0.597, Vmax=0.690
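
The second value returned by `curve_fit` is the covariance matrix of the fitted parameters, which is where the error estimates come from. Here is a minimal sketch, assuming synthetic data in place of `vdata.asc`. The true values 0.6 and 0.7 and the noise level are made up for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(s, km, vmax):
    return s * vmax / (s + km)

# Synthetic data (assumed values, for illustration only).
rng = np.random.default_rng(0)
s = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0])
v = michaelis_menten(s, 0.6, 0.7) + rng.normal(0.0, 0.01, size=s.size)

opt, cov = curve_fit(michaelis_menten, s, v, p0=(0.5, 0.5))

# The diagonal of the covariance matrix holds the parameter variances,
# so its square root gives one-standard-deviation errors.
km_err, vmax_err = np.sqrt(np.diag(cov))
print('Km=%.3f +/- %.3f, Vmax=%.3f +/- %.3f' % (opt[0], km_err, opt[1], vmax_err))
```

The off-diagonal entries of `cov` describe how strongly the Km and Vmax estimates are correlated.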

The classic method of determining Km and Vmax is to use a double-reciprocal plot, also known as the Lineweaver-Burk plot. The data are fit with a linear model, $y = ax + b$, where $x = 1/[S]$ and $y = 1/v$. The slope is $a = \frac{K_m}{V_{max}}$ and the intercept is $b = \frac{1}{V_{max}}$, so we can derive $K_m = \frac{a}{b}$ and $V_{max} = \frac{1}{b}$. Now let's try it using Python...

In [9]:
(a,b) = np.polyfit(1.0/A[:,0], 1.0/A[:,1], 1)
km = a/b
vmax = 1.0/b
print('Km=%.3f, Vmax=%.3f' % (km, vmax))
print('Km error = %f' % (km - correct_km))
Km=0.441, Vmax=0.585
Km error = -0.155920

That's a pretty sizable error!
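
The Hanes-Woolf plot mentioned earlier can be fit the same way. This is a minimal sketch on noise-free synthetic data (the parameters 0.6 and 0.7 are assumptions for illustration). It plots $[S]/v$ against $[S]$, so the slope is $1/V_{max}$ and the intercept is $K_m/V_{max}$:

```python
import numpy as np

def michaelis_menten(s, km, vmax):
    return s * vmax / (s + km)

# Noise-free synthetic data (assumed parameters, for illustration only).
true_km, true_vmax = 0.6, 0.7
s = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 5.0])
v = michaelis_menten(s, true_km, true_vmax)

# Hanes-Woolf: [S]/v = (1/Vmax) * [S] + Km/Vmax
a, b = np.polyfit(s, s / v, 1)
hw_vmax = 1.0 / a
hw_km = b / a
print('Km=%.3f, Vmax=%.3f' % (hw_km, hw_vmax))
```

With perfect data every method recovers the true constants; the methods only differ once measurement error is present.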

Handling Measurement Errors

Let's see how these two methods respond to different levels of measurement error. First, let's set up some functions that will let us compare the two methods...

In [10]:
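# ntries is the 1-based number of the current trial. It starts at 1 so that
# the running averages in compareFits never divide by zero.
ntries = 1
nonlin_km_errsum = 0.0
lin_km_errsum = 0.0

In [11]: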
def compareFits(s, v):
    global ntries
    global nonlin_km_errsum
    global lin_km_errsum
    
    opt, cov = curve_fit(michaelis_menten, s, v, p0=(0.5, 0.5))
    km = opt[0]
    vmax = opt[1]
    
    si = 1.0 / s
    vi = 1.0 / v
    (a,b) = np.polyfit(si, vi, 1)
    km2 = a / b
    vmax2 = 1.0 / b
    
    plt.subplot(1,2,1)
    plt.plot(s,v,'ro')
    plt.plot(s, michaelis_menten(s, km, vmax), 'r-')
    plt.xlabel('[S]',labelpad=-2)
    plt.ylabel('V')
    text = 'Km = %.3f, Vm = %.3f' % (km, vmax)
    plt.figtext(0.15,0.85, text)
    err = abs(km - correct_km)
    nonlin_km_errsum += err
    text = 'Km error = %.3f' % err
    plt.figtext(0.15, 0.15, text)
    text = 'Avg Km error = %.3f' % (nonlin_km_errsum / ntries)
    plt.figtext(0.15, 0.0, text)
    plt.title('Nonlinear fit')
    
    plt.subplot(1,2,2)
    plt.plot(si, vi, 'go')
    y2 = np.polyval( [a, b], si)
    plt.plot(si, y2, 'g-')
    plt.xlabel('1/[S]', labelpad=-2)
    plt.ylabel('1/V')
    text = 'Km = %.3f, Vm = %.3f' % (km2, vmax2)
    plt.figtext(0.6, 0.85, text)
    err = abs(km2 - correct_km)
    lin_km_errsum += err
    text = 'Km error = %.3f' % err
    plt.figtext(0.6, 0.15, text)
    text = 'Avg Km error = %.3f' % (lin_km_errsum / ntries)
    plt.figtext(0.6, 0.0, text)
    plt.title('Lineweaver-Burk Plot')
    
    ntries += 1    
    plt.show()
    
def compareWithError(s, v, serror, verror):
    serr = (np.random.random(len(s)) * 2.0 - 1.0) * (serror / 100.0)
    verr = (np.random.random(len(v)) * 2.0 - 1.0) * (verror / 100.0)
    ve = v + v * verr
    se = s + s * serr
    compareFits(se, ve)
    

The following code block compares the nonlinear fit with the linearized (double-reciprocal) fit and shows the error in the calculated Km for the current trial along with the average error over all trials so far. There are two sliders: serror and verror. These correspond to the error introduced into the measurements of substrate concentration and reaction velocity. The sliders let you add errors from 0% up to 20% of the value of each data point (drawn from a uniform distribution). Each time you move a slider, a new set of errors is introduced. Try playing with both sliders and see how the error in estimating Km (shown below the graphs) changes...

In [12]:
ntries = 1
nonlin_km_errsum = 0.0
lin_km_errsum = 0.0
interact(compareWithError, s = fixed(A[:,0]), v = fixed(A[:,1]),
         serror=widgets.FloatSlider(min=0.0, max=20.0, step=0.1, value=5),
         verror=widgets.FloatSlider(min=0.0, max=20.0, step=0.1, value=10));
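
If the widgets are not available, the same experiment can be run without plots. This is a minimal sketch under stated assumptions: the helper names `perturb` and `fit_both` are hypothetical, the data is synthetic (true Km 0.6, Vmax 0.7), and the error levels are fixed at the slider defaults of 5% and 10%.

```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(s, km, vmax):
    return s * vmax / (s + km)

def perturb(x, percent, rng):
    """Add uniform relative error of up to +/- percent to each value."""
    err = (rng.random(len(x)) * 2.0 - 1.0) * (percent / 100.0)
    return x + x * err

def fit_both(s, v):
    """Return (Km from the nonlinear fit, Km from the Lineweaver-Burk fit)."""
    opt, _ = curve_fit(michaelis_menten, s, v, p0=(0.5, 0.5))
    a, b = np.polyfit(1.0 / s, 1.0 / v, 1)
    return opt[0], a / b

# Synthetic, noise-free starting data (assumed parameters).
true_km, true_vmax = 0.6, 0.7
s = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 5.0])
v = michaelis_menten(s, true_km, true_vmax)

rng = np.random.default_rng(42)
ntrials = 200
errors = np.empty((ntrials, 2))
for i in range(ntrials):
    km_nl, km_lb = fit_both(perturb(s, 5.0, rng), perturb(v, 10.0, rng))
    errors[i] = abs(km_nl - true_km), abs(km_lb - true_km)

avg_nl, avg_lb = errors.mean(axis=0)
print('Avg Km error: nonlinear=%.3f, Lineweaver-Burk=%.3f' % (avg_nl, avg_lb))
```

Changing the two percentages passed to `perturb` plays the role of moving the sliders.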

Conclusions

In general, why should you not trust a linear regression on log-scaled or otherwise linearized data? A fundamental assumption of linear least-squares regression is that the errors in your measurements are normally distributed with mean 0 and constant variance. When you transform your data, you also transform, and typically skew, your error distribution, which leads to unreliable parameter estimates. Certainly, there are cases where such a transformation is appropriate and the linear regression still works, but you must be very careful when using that approach. In general, you want to use a nonlinear least-squares fitting method.
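
The skewing effect is easy to see numerically. The sketch below assumes a hypothetical measurement (a true rate of 0.1 with Gaussian error of 10%; both numbers are made up) and compares the distribution of the raw values with that of their reciprocals:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)

# Hypothetical repeated measurement: true rate 0.1, Gaussian error of 10%.
v = rng.normal(loc=0.1, scale=0.01, size=100_000)
inv_v = 1.0 / v

# The original errors are symmetric (skewness near 0); the reciprocals are not.
print('skewness of v:   %.3f' % skew(v))
print('skewness of 1/v: %.3f' % skew(inv_v))

# The transformed values are also biased: mean(1/v) exceeds 1/mean(v).
print('mean(1/v) = %.4f, 1/mean(v) = %.4f' % (inv_v.mean(), 1.0 / v.mean()))
```

A straight-line fit to the reciprocals treats these lopsided errors as if they were symmetric and equal in size, which is exactly the assumption the transformation broke.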
