$\newcommand{\xv}{\mathbf{x}} \newcommand{\Xv}{\mathbf{X}} \newcommand{\yv}{\mathbf{y}} \newcommand{\zv}{\mathbf{z}} \newcommand{\av}{\mathbf{a}} \newcommand{\Wv}{\mathbf{W}} \newcommand{\wv}{\mathbf{w}} \newcommand{\tv}{\mathbf{t}} \newcommand{\Tv}{\mathbf{T}} \newcommand{\muv}{\boldsymbol{\mu}} \newcommand{\sigmav}{\boldsymbol{\sigma}} \newcommand{\phiv}{\boldsymbol{\phi}} \newcommand{\Phiv}{\boldsymbol{\Phi}} \newcommand{\Sigmav}{\boldsymbol{\Sigma}} \newcommand{\Lambdav}{\boldsymbol{\Lambda}} \newcommand{\half}{\frac{1}{2}} \newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}} \newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}$

Given $N$ observations, $\xv_n$, for $n=1,\ldots,N$, and target values, $t_n$, for $n=1,\ldots,N$, what is the simplest model, $g(\xv)$, you can think of?

$$ g(\xv) = 0 $$or maybe

$$ g(\xv) = c $$What is next simplest model?

$$ \begin{align*} g(\xv;\wv) &= w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_D x_D \\ &= w_0 + \sum_{i=1}^D w_i x_i \\ & = \sum_{i=0}^D w_i x_i \mbox{, where } x_0 = 1 \\ &= \wv^T \xv \end{align*} $$- This is nice because it is linear in the parameters $\wv$; optimizations based on derivatives might be solvable analytically.
- This is not so nice, because it is also linear in the inputs, $\xv$; greatly limits the complexity of the model.
- But, a model linear in the inputs might be the best you can do for many cases, such as a sparsely sampled distribution, process, population, thing...whatever it is you want to model.

Force exerted by spring is proportional to its length. The potential energy stored in the spring is proportinal to the square of its length. Say we want the rod to settle at position that minimizes the potential energy in the springs. $$ \begin{align*} \sum_{n=1}^N (t_n - g(\xv_n;\wv))^2 \end{align*} $$

If $g$ is an affine (linear + constant) function of $x$, $$ g(\xv;\wv) = w_0 + w_1 x $$ with parameters $\wv = (w_0, w_1)$, which parameter values give best fit? $$ \wv_{\mbox{best}} = \argmin{\wv} \sum_{n=1}^N (t_n - g(x_n ; \wv))^2 $$

Set derivative (gradient) with respect to $\wv$ to zero and solve for $\wv$. Let's do this with matrices.

Collect all targets into matrix $T$ and $x$ samples into matrix $X$. ($N$=number samples, $D$=sample dimensionality) $$ \begin{align*} T &= \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_N \end{bmatrix} \\ X &= \begin{bmatrix} x_{1,0} & x_{1,1} & x_{1,2} & \dotsc & x_{1,D} \\ x_{2,0} & x_{2,1} & x_{2,2} & \dotsc & x_{2,D} \\ \vdots \\ x_{N,0} & x_{N,1} & x_{N,2} & \dotsc & x_{N,D} \end{bmatrix}\\ \wv &= \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_D \end{bmatrix} \end{align*} $$

Collection of all differences is $T - X\wv$, which is an $N \times 1$ matrix. To form the square of all values and add them up, just do a dot product $(T-X\wv)^T (T-X\wv)$. This only works if the value we are predicting is a scalar, which means $T$ is a column matrix. If we want to predict more than one value for each sample, $T$ will have more than one column. Let's continue with the derivation assuming $T$ has $K$ columns, meaning we want a linear model with $K$ outputs.

To find the best value for $\wv$, we take the derivative the the sum of squared error objective, set it equal to 0 and solve for $\wv$.

$$ \begin{align*} \frac{\partial \sum_{n=1}^N (\tv_n - g(\xv_n;\wv))^2}{\partial \wv} &= -2 \sum_{n=1}^N (\tv_n - g(\xv_n;\wv) \frac{\partial g(\xv_n;\wv)}{\partial \wv}\\ &= -2 \sum_{n=1}^N (\tv_n - \xv_n^T \wv) \frac{\partial \xv_n^T \wv}{\partial \wv}\\ &= -2 \sum_{n=1}^N (\tv_n - \xv_n^T \wv) \xv_n^T \end{align*} $$Here's where we get the benefit of expressing the $\xv_n$ and $t_n$ samples as matrices. The sum can be performed with a dot product:

$$ \begin{align*} \frac{\partial \sum_{n=1}^N (\tv_n - g(\xv_n;\wv))^2}{\partial \wv} &= -2 \sum_{n=1}^N (\tv_n - \xv_n^T \wv) \xv_n^T\\ &= -2 \Xv^T (\Tv - \Xv \wv) \end{align*} $$Check the sizes and shapes of each matrix in the last equation above.

Now we can set this equal to zero and solve for $\wv$.

$$ \begin{align*} -2 \Xv^T (\Tv - \Xv \wv) &= 0\\ \Xv^T (\Tv - \Xv \wv) &= 0\\ \Xv^T \Tv &= \Xv^T \Xv \wv\\ \wv &= (\Xv^T \Xv)^{-1} \Xv^T \Tv \end{align*} $$And, in python

```
w = np.linalg.inv(X.T @ X), X.T @ T)
```

or, you may use the *solve* function that assumes $\Xv^T \Xv$ is full rank (no linearly dependent columns),

```
w = np.linalg.solve(X.T @ X, X.T @ T)
```

or, better yet, use the *lstsq* function that does not make that assumption.

```
w = np.linalg.lstsq(X.T @ X, X.T @ T))
```

The `lstsq`

and `solve`

functions can be written with simpler arguments, like

```
w = np.linalg.lstsq(X, T))
```

because they are designed to find the value of $\wv$ that minimized the squared error between $X \wv$ and $T$. Using the matrix products as arguments will simplify our implementation of ridge regression a little bit later in this course. For now, let's use the simpler version.

Try the automobile data set at UCI Machine Learning Database
Repository. Download
*auto-mpg.data* and *auto-mpg.names* from the "Data Folder" link
near the top of the web page.

In [ ]:

```
!curl -O http://mlr.cs.umass.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
!curl -O http://mlr.cs.umass.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.names
```

First, take a look at *auto-mpg.names*. There you will learn that there are 398 samples, each with 8 numerical attributes and one string attribute. Their names are

- mpg: continuous
- cylinders: multi-valued discrete
- displacement: continuous
- horsepower: continuous
- weight: continuous
- acceleration: continuous
- model year: multi-valued discrete
- origin: multi-valued discrete
- car name: string (unique for each instance)

First step, read it into python and look at it.

In [ ]:

```
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
```

Take a look at a few lines of the data file to figure out how to read it in.

In [ ]:

```
!head -40 auto-mpg.data
```

Instead of relying on the *pandas* package to read this data, let's try using the simpler *numpy.loadtxt* function.
We see that each line contains a sample consisting of 8 numbers and
one string. However, first we have to figure out how to deal with the
missing values. The documentation for *numpy.loadtxt* reveals that
the *converters* argument can be used to specify a function to apply
to the values from particular columns of data. We can use this to
deal with the missing values. Looking at *auto-mpg.data* we see
that the missing values are marked with question marks. These are all in the fourth column, the horsepower attribute.

Here is a simple function that translates '?' to NaN (*np.nan*) and anything else is converted to a floating point number.

In [ ]:

```
def missingIsNan(s):
if s == b'?': # single character read as b'?' != '?'
return np.nan
else:
return float(s)
print(missingIsNan('12.32'))
print(missingIsNan(b'?'))
```

Hey, why not get fancy and write a one-liner?

In [ ]:

```
def missingIsNan(s):
return np.nan if s == b'?' else float(s)
print(missingIsNan('12.32'))
print(missingIsNan(b'?'))
```

The *converters* argument to *np.loadtxt* accepts a python dictionary, which is python's associative array structure.

In [ ]:

```
d = {1: 'a', 2: 'yo', 3: 42, 4: missingIsNan}
d
```

In [ ]:

```
d[1]
```

In [ ]:

```
d[4]
```

In [ ]:

```
d[4](b'?')
```

Let's also restrict *np.loadtxt* to reading just the first 8 columns to avoid dealing with the string in the 9th column.

In [ ]:

```
data = np.loadtxt('auto-mpg.data', usecols=range(8), converters={3: missingIsNan})
```

In [ ]:

```
data.shape
```

In [ ]:

```
data[:3,:]
```

We can change the precision that ipython uses to display floating point values.

In [ ]:

```
%precision 3
```

In [ ]:

```
data[:3,:]
```

As we have done before, we must find the missing values. Let's just remove the samples with missing values.

In [ ]:

```
np.isnan(data)
```

In [ ]:

```
np.sum(np.isnan(data))
```

In [ ]:

```
np.sum(np.isnan(data), axis=0)
```

What does this result tell us?

Yep, all 6 *NaN* values are in the fourth column, the horsepower column.

Let's just
remove those 6 samples from the array. To do this build a boolean
mask of length equal to the number of rows in *data* that is
*False* except for any row that contains a NaN. We can use the
*np.any* or *np.all* methods to combine boolean values in each row or column. First, always try
such ideas in multiple steps, checking each step.

In [ ]:

```
nans = np.isnan(data)
nans.shape
```

In [ ]:

```
nans.any(axis=0).shape
```

In [ ]:

```
nans.any(axis=1).shape
```

That's the one we want, a boolean value for each row, or sample.

In [ ]:

```
goodRowsMask = nans.any(axis=1)
goodRowsMask
```

In [ ]:

```
dataNew = data[goodRowsMask,:]
dataNew.shape
```

Wait a minute! This gives us only 6 samples. We wanted all samples but these 6, right?

In [ ]:

```
dataNew
```

So, let's change all *False* values to *True* and *True* values to *False*.

In [ ]:

```
goodRowsMask = np.logical_not(goodRowsMask)
```

In [ ]:

```
dataNew = data[goodRowsMask,:]
dataNew.shape
```

In [ ]:

```
dataNew[:3,:]
```

In [ ]:

```
np.sum(np.isnan(dataNew),axis=0)
```

Remember, the next step after reading data into python is to visualize it. One thing we can do is just plot the value of each attribute in a separate graph. Let's make an array of column names to label the y axes.

In [ ]:

```
names = ['mpg','cylinders','displacement','horsepower','weight',
'acceleration','year','origin']
```

In [ ]:

```
plt.figure(figsize=(10, 10))
nrow, ncol = dataNew.shape
for c in range(ncol):
plt.subplot(3, 3, c+1)
plt.plot(dataNew[:, c])
plt.ylabel(names[c])
```

What is interesting to you in these graphs?

Let's try to predict miles per gallon, *mpg*, from the other attributes. To set this up, let's make a 392 x 1 column vector, *T*, of target values containing all of the *mpg* values, and a 392 x 7 matrix, *X*, to hold the inputs to our model.

In [ ]:

```
names
```

In [ ]:

```
T = dataNew[:, 0:1] # dataNew[:,0] results in a one-dimensional matrix, dataNew[:,0:1] preserves its two-dimensional nature.
```

In [ ]:

```
X = dataNew[:, 1:]
```

In [ ]:

```
X.shape, T.shape
```

In [ ]:

```
Xnames = names[1:]
Tname = names[0]
Xnames,Tname
```

Now, let's see if a linear model makes some sense by plotting the target values versus each of the input variables.

In [ ]:

```
plt.figure(figsize=(10, 10))
for c in range(X.shape[1]):
plt.subplot(3,3, c+1)
plt.plot(X[:, c], T, 'o', alpha=0.5)
plt.ylabel(Tname)
plt.xlabel(Xnames[c])
```

What do you think? Are there any linear relationships between the individual input variables and the target variable? Do they make sense, given your knowledge of automobiles?

Now, to build the linear model. First, let's tack on a column of constant 1 values to the left side of *X*. With this addition, our matrix multiplication takes care of the $w_0$ coefficient as described above. Here are two ways to add the column of 1's. First, we can use `np.hstack`

.

In [ ]:

```
X1 = np.hstack((np.ones((X.shape[0], 1)), X))
X.shape, X1.shape
```

In [ ]:

```
X1[:3, :]
```

We can also use `np.insert`

whose arguments are `np.insert(array, where, value, axis)`

, so to add 1's as a column at index 0, we can do this.

In [ ]:

```
X1 = np.insert(X, 0, 1, 1)
X.shape, X1.shape
```

And, let's add a name to *Xnames*.

In [ ]:

```
Xnames.insert(0, 'bias')
Xnames
```

We could try to fit a linear model to all of the data and check to see
how accurately we predict *mpg* for each sample. However, this will
give a much too optimistic expectation of how well the model will do
on new data.

A much better way to evaluate a model is to remove, or hold out, some
of the samples from the data set used to fit the model. Then apply
the model to the held out samples and compare the predicted *mpg*
with the actual *mpg*. Call these held out samples the **test set**
and the other samples used to fit the model the **train set**.

How should we partition the data into these two disjoint subsets? A common practice is to randomly select 80% of the samples as training samples and use the remaining 20% of the samples as testing samples.

To partition our samples *mpg* (rows of X and T ) into training and test sets, let's deal with just the row indices. Then we will use the same selected row indices to slice out corresponding rows of X and T.

In [ ]:

```
nrows = X1.shape[0]
nTrain = int(round(nrow*0.8))
nTest = nrow - nTrain
nTrain, nTest, nTrain+nTest
```

In [ ]:

```
rows = np.arange(nrows)
np.random.shuffle(rows)
rows
```

In [ ]:

```
trainIndices = rows[:nTrain]
testIndices = rows[nTrain:]
trainIndices, testIndices
```

Check that the training and testing sets are disjoint.

In [ ]:

```
np.intersect1d(trainIndices, testIndices)
```

In [ ]:

```
Xtrain = X1[trainIndices, :]
Ttrain = T[trainIndices, :]
Xtest = X1[testIndices, :]
Ttest = T[testIndices, :]
Xtrain.shape, Ttrain.shape, Xtest.shape, Ttest.shape
```

We want to solve for $\wv$ in the equation $X^T X \wv = X^T T$.
This is done with the *numpy.linalg.lstsq* function. Don't be confused by the `T`

s. Remember *something.T* is the
transpose of *something*.

In [ ]:

```
w = np.linalg.lstsq(Xtrain.T @ Xtrain, Xtrain.T @ Ttrain, rcond=None)
w = w[0] # to only keep the weights, and discard other information returned by lstsq
w
```

In [ ]:

```
for wi,name in zip(w.flat, Xnames):
print('{:8.3f} {:s}'.format(wi, name))
```

How can you figure out what *np.linalg.lstsq* is doing? Try finding the source code!

In [ ]:

```
!locate linalg.py
```

In my version I see documentation for *lstsq* that states

```
Return the least-squares solution to a linear matrix equation.
Solves the equation `a x = b` by computing a vector `x` that
minimizes the Euclidean 2-norm `|| b - a x ||^2`. The equation may
be under-, well-, or over- determined (i.e., the number of
linearly independent rows of `a` can be less than, equal to, or
greater than its number of linearly independent columns). If `a`
is square and of full rank, then `x` (but for round-off error) is
the "exact" solution of the equation.
```

Now that we have a linear model, which is simply $w$, let's use it to predict *mpg* for the first four samples.

In [ ]:

```
predict = X1[:4,:] @ w
predict
```

How do these predictions compare with the actual *mpg* values? We can either make a two column matrix, or use a *for* loop to print them.

In [ ]:

```
np.hstack(( predict, Ttrain[:4, :]))
```

In [ ]:

```
print('{:^5} {:^5}'.format('P', 'T'))
for (p, t) in zip(predict, Ttrain[0:4, :]):
# print(p,t)
print('{:5.2f} {:5.2f}'.format(p[0], t[0]))
```

Let's try all of the test data and plot the results.

In [ ]:

```
predict = Xtest @ w
predict.shape, Ttest.shape
```

In [ ]:

```
plt.plot(predict, Ttest, 'o')
plt.xlabel('Predicted MPG')
plt.ylabel('Actual MPG')
# add a 45 degree line
a = max(min(predict), min(Ttest))
b = min(max(predict), max(Ttest))
plt.plot([a, b], [a ,b], 'r', linewidth=3, alpha=0.7);
```

Not too shabby! But, how about a numerical measure of accuracy?

Right, just calculate the root-mean-square error (RMSE).

In [ ]:

```
np.sqrt( np.mean( (Xtest @ w - Ttest)**2))
```

This means we are about 3 mpg off in our predictions, on average.

Here again are the weights we learned before for predicting *mpg*.
Can we use these to decide which attibutes are most important? Can
we remove some from the model?

In [ ]:

```
for wi, name in zip(w.flat, Xnames):
print('{:8.3f} {:s}'.format(wi, name))
```

Perhaps year and origin are the most significant. Does this make sense?

A weight magnitude is determined not only by the linear relationship between the corresponding input variable with the target, but also by the variable's range.

What are the ranges of the input variables?

In [ ]:

```
for n, mn, mx in zip(Xnames, np.min(X1, axis=0), np.max(X1, axis=0)):
print('{:>20} {:8.2f} {:8.2f}'.format(n, mn, mx))
```

Remember that you can also call `numpy`

functions as methods applied to `numpy`

array objects.

In [ ]:

```
for n, mn, mx in zip(Xnames, X1.min(axis=0), X1.max(axis=0)):
print('{:>20} {:8.2f} {:8.2f}'.format(n, mn, mx))
```

The weight for *weight* is the smallest magnitude, but the range of its values are the largest.

The weight for *origin* is the largest, but the range of its values are the smallest.

If we could remove the effect of the ranges on the weight magnitudes, we could use the weight magnitudes to judge the relative importance of each input variable. How?

A common approach is to *standardize* the input variables by adjusting the values to have zero mean and unit variance.

In [ ]:

```
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
Xs.shape
```

In [ ]:

```
Xs.mean(0)
```

In [ ]:

```
Xs.std(0)
```

To do this correctly when partitioning data into training and testing sets, we must always calculate means and standard deviations using only the training set, and use the same means and standard deviations when standardizing the testing set. Remember, you must not use any information about the testing set when building a model. If you do, your test error will be lower than it will be when you truly see new data.

You can do this by storing the means and standard deviations obtained from the training data, and just use them when standardizing the testing data. Don't forget to add the column of 1's after standardizing.

In [ ]:

```
means = X.mean(0)
stds = X.std(0)
Xs = (X - means) / stds
Xs1 = np.insert(Xs, 0, 1, 1)
w = np.linalg.lstsq( Xs1.T @ Xs1, Xs1.T @ T, rcond=None)[0] # What is the [0] doing here??
w
```

Another way is to construct functions for standardizing that include the calculated means and standard deviations as local variables, by using function closures.

In [ ]:

```
def makeStandardize(X):
means = X.mean(axis=0)
stds = X.std(axis=0)
def standardize(origX):
return (origX - means) / stds
def unStandardize(stdX):
return stds * stdX + means
return (standardize, unStandardize)
```

Let's start with *X* again, and tack on the column of 1's after dividing data into training and testing partitions. What happens if you tack on the constant column of 1's before standardizing?

In [ ]:

```
Xtrain = X[trainIndices, :]
Ttrain = T[trainIndices, :]
Xtest = X[testIndices, :]
Ttest = T[testIndices, :]
(standardize, unStandardize) = makeStandardize(Xtrain)
XtrainS = standardize(Xtrain)
XtestS = standardize(Xtest)
XtrainS.mean(0), XtrainS.std(0), XtestS.mean(0), XtestS.std(0)
```

Notice that the means and standard deviations for the testing set are not as close to 0 and 1 as they are for the training set. Why?

Now we can tack on the column of 1's, solve for *w*, and examine the weight magnitudes.

In [ ]:

```
XtrainS1 = np.insert(XtrainS, 0, 1, 1)
XtestS1 = np.insert(XtestS, 0, 1, 1)
w = np.linalg.lstsq( XtrainS1.T @ XtrainS1, XtrainS1.T @ Ttrain, rcond=None)[0] # see this [0]?
for wi, name in zip(w.flat, Xnames):
print('{:8.3f} {:s}'.format(wi, name))
```

Now what do you observe about the relative magnitudes? If you had a ton of input variables, it would be easier to see if we sorted them by their magnitudes.

In [ ]:

```
np.abs(w)
```

In [ ]:

```
np.argsort(np.abs(w.flat))
```

In [ ]:

```
np.argsort(np.abs(w.flat))[::-1]
```

In [ ]:

```
sortedOrder = np.argsort(np.abs(w.flat))[::-1]
Xnames = np.array(Xnames)
for wi, name in zip(w.flat[sortedOrder], Xnames[sortedOrder]):
print('{:8.3f} {:s}'.format(wi, name))
```

The beauty of matrix equations now shines. Previously $T$ was an $N \times 1$ matrix of target values, one per sample.

Just add additional columns of target values for each target component. So $T$ becomes an $N \times K$ matrix, if there are $K$ output components. Each row is one $K$-dimensional sample.

Looking again at the solution for $\wv$,

$$ \begin{align*} \wv &= (X^T X)^{-1} X^T T \end{align*} $$we see that this changes $\wv$ from its original form of $D \times 1$ to $D \times K$.

Let's use this to predict miles per gallon **and** horsepower.

First, let's assemble the data for predicting MPG and HP.

In [ ]:

```
X = dataNew[:, [1, 2, 4, 5, 6, 7]]
T = dataNew[:, [0, 3]]
Tnames = [names[0], names[3]]
X.shape, Xnames, T.shape, Tnames
```

In [ ]:

```
Xtrain = X[trainIndices, :]
Ttrain = T[trainIndices, :]
Xtest = X[testIndices, :]
Ttest = T[testIndices, :]
Xtrain.shape, Ttrain.shape, Xtest.shape, Ttest.shape
```

In [ ]:

```
standardize,_ = makeStandardize(Xtrain)
XtrainS = standardize(Xtrain)
XtestS = standardize(Xtest)
XtrainS1 = np.insert(XtrainS, 0, 1, 1)
Xnames = np.array(['bias'] +names)[[0, 2, 3, 5, 6, 7, 8]]
Xnames
```

In [ ]:

```
XtrainS1.shape, Ttrain.shape
```

In [ ]:

```
w = np.linalg.lstsq( XtrainS1.T @ XtrainS1, XtrainS1.T @ Ttrain, rcond=None)[0]
w
```

In [ ]:

```
Xnames = np.array(Xnames)
for targeti in range(2):
print('\n Weights for {} target\n'.format(Tnames[targeti]))
thisw = w[:, targeti]
sortedOrder = np.argsort(np.abs(thisw))[::-1]
for wi ,name in zip(thisw[sortedOrder], Xnames[sortedOrder]):
print('{:8.3f} {:s}'.format(wi, name))
```

Now let's use our single model to predict both mpg and horsepower.

In [ ]:

```
XtestS1 = np.insert(XtestS, 0, 1, 1)
prediction = XtestS1 @ w
prediction.shape
```

Or do the insertion of a column of 1's and the multiplication by the weight matrix in one statement.

In [ ]:

```
prediction = np.insert(XtestS, 0, 1, 1) @ w
prediction.shape
```

In [ ]:

```
plt.figure(figsize=(10,10))
for p in range(2):
plt.subplot(2, 1, p+1)
plt.plot(prediction[:, p], Ttest[:, p], 'o')
plt.xlabel("Predicted " + Tnames[p])
plt.ylabel("Actual " + Tnames[p])
a = max(min(prediction[:, p]), min(Ttest[:, p]))
b = min(max(prediction[:, p]), max(Ttest[:, p]))
plt.plot([a, b], [a, b], 'r', linewidth=3)
```

How well did we do in terms of RMSE?

In [ ]:

```
rmseTrain = np.sqrt(np.mean((XtrainS1 @ w - Ttrain)**2, axis=0))
rmseTrain
```

In [ ]:

```
rmseTest = np.sqrt(np.mean((XtestS1 @ w - Ttest)**2, axis=0))
rmseTest
```

In [ ]:

```
print('Training RMSE: MPG {:4.2f} HP {:4.2f}'.format(*rmseTrain)) #what is that * doing there??
print(' Testing RMSE: MPG {:4.2f} HP {:4.2f}'.format(*rmseTest))
```

In [ ]:

```
```