Linear Regression of a Nonlinear Model

CH EN 2450 - Numerical Methods

Prof. Tony Saad (www.tsaad.net)
Department of Chemical Engineering
University of Utah

Synopsis

In this lesson, you will exercise your knowledge on how to perform linear least-squares regression on a nonlinear exponential model.

Problem Statement

Chemical reacting systems consist of different substances reacting with each other. These reactants have different reaction rates. A reaction rate is the speed at which reactants are converted into products. A popular model for the reaction rate is known as the Arrhenius reaction rate model given by \begin{equation} k(T) = A e^{-\frac{E_\text{a}}{RT}} \label{eq:arrhenius} \end{equation} where $k$ is known as the reaction rate constant, $T$ is the absolute temperature (in Kelvin), $A$ is the pre-exponential factor and is a measure of the frequency of molecular collisions, $E_\text{a}$ is the activation energy or the energy required to get the reaction started, and $R$ is the universal gas constant.

We can typically measure $k$ for a given temperature and chemical reaction. Our goal is to find $A$ and $E_\text{a}$ given those data.

T (K) k (1/s)
313 0.00043
319 0.00103
323 0.00180
328 0.00355
333 0.00717

First, let's get some boiler plate out of the way

In [19]:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
import numpy as np

Now input the data from the table into numpy arrays

In [20]:
R=8.314
T=np.array([313, 319, 323, 328, 333])
k=np.array([0.00043, 0.00103, 0.00180, 0.00355, 0.00717])

As usual, you should generate a plot to get an idea of how the data looks like

In [21]:
plt.plot(T,k,'o')
plt.xlabel('Temperature (K)')
plt.ylabel('k (1/s)')
plt.title('Arrhenius reaction rate')
plt.grid()

The data do indeed look like they follow an exponential trend. We will use linear regression now to fit this model. We will first take the logarithm of the Arrhenius equation \begin{equation} \ln k = \ln A + - \frac{1}{RT} E_\text{a} \end{equation} where we will choose our new y-values as $y_\text{new} = \ln (k)$ and the new x-values as $x_\text{new} = - \frac{1}{RT}$. We will also call $a_0 = \ln{A}$ and $a_1=E_\text{a}$ so that our new model fit is \begin{equation} y_\text{new} = a_0 + a_1 x_\text{new} \end{equation}

To program this, we will first create arrays for $y_\text{new}$ and $x_\text{new}$

In [22]:
ynew = np.log(k)
xnew = -1.0/R/T

Let's plot the transformed data to see how they look like

In [23]:
plt.plot(xnew,ynew,'o')
plt.xlabel('-1/RT')
plt.ylabel('log(k)')
plt.title('Arrhenius reaction rate')
plt.grid()

As expected, the transformed data do follow a linear trend. Now we can apply the usual linear least-squares regression to these transformed data.

Let's use the normal equations. In this case, we have

In [24]:
N = len(ynew)
A = np.array([np.ones(N), xnew]).T
ATA = A.T @ A
coefs = np.linalg.solve(ATA, A.T @ ynew)
print(coefs)
[3.89245804e+01 1.21481509e+05]

The variable coefs contains the coefficients of the straight line fit. We can untangle that using poly1d

In [25]:
p = np.poly1d(np.flip(coefs,0)) # must reverse array for poly1d to work

Finally, we can plot the fit over the transformed data

In [26]:
plt.plot(xnew, p(xnew), label='best fit')
plt.plot(xnew, ynew, 'o', label='original data')
plt.xlabel('-1/RT')
plt.ylabel('log(k)')
plt.title('Arrhenius reaction rate')
plt.grid()

we can also calculate the $R^2$ value. Recall that the $R^2$ value can be computed as: \begin{equation}R^2 = 1 - \frac{\sum{(y_i - f_i)^2}}{\sum{(y_i - \bar{y})^2}}\end{equation} where \begin{equation} \bar{y} = \frac{1}{N}\sum{y_i} \end{equation}

In [27]:
ybar = np.average(ynew)
fnew = p(xnew)
rsq = 1 - np.sum( (ynew - fnew)**2 )/np.sum( (ynew - ybar)**2 )
print(rsq)
0.9998570770329075

This is an unbelievable fit! But it is correct since the data was actually setup to produce such a high R2 value.

Now we need to compute the original coefficients of the model, $A$ and $E_\text{a}$. This can be done by inverting the original transfomration \begin{equation} A = e^{a_0} \end{equation} and \begin{equation} E_\text{a} = a_1 \end{equation}

In [38]:
a0 = coefs[0]
a1 = coefs[1]
A = np.exp(a0)
Ea = a1
In [39]:
# construct a function  for the regressed line
y_fit = lambda x: A * np.exp(-Ea/R/x)

# To produce a nice smooth plot for the fit, generate points between Tmin and Tmax and use those as the x data
Tpoints = np.linspace(min(T), max(T))
plt.plot(Tpoints, y_fit(Tpoints), label='Best fit' )

# now plot the original input data
plt.plot(T, k, 'o', label='Original data')

plt.xlabel('Temperature (K)')
plt.ylabel('Reaction rate, k (1/s)')
plt.title('Arrhenius reaction rate')
plt.legend()
plt.grid()
In [37]:
from IPython.core.display import HTML
def css_styling():
    styles = open("../../styles/custom.css", "r").read()
    return HTML(styles)
css_styling()
Out[37]:
CSS style adapted from https://github.com/barbagroup/CFDPython. Copyright (c) Barba group