This is a Jupyter notebook. Lectures about Python, useful both for beginners and experts, can be found at http://scipy-lectures.github.io.

Open the notebook by (1) copying this file into a directory, (2) typing jupyter-notebook in that directory, and (3) selecting the notebook.

Written By: **Riddhish Bhalodia**

In this exercise, we will learn about different kernel methods for pattern recognition. The reader should be familiar with basic machine learning, estimation theory, and linear algebra, and have a good command of regression.

**Dependencies**: numpy, matplotlib

*Dual Representation* is an alternative way of formulating many **linear models** used for regression and classification problems. Here the predictions are expressed as a linear combination of a **kernel function** evaluated at the training data points. To see what kernel functions are and how they arise, we consider the regularized least-squares error for linear regression,

$$J(\textbf{w}) = \frac{1}{2}\sum \limits_{n=1}^N \{\textbf{w}^T \phi(\textbf{x}_n) - \textbf{t}_n\}^2 + \frac{\lambda}{2}\textbf{w}^T\textbf{w} \quad \quad \quad (1)$$

By setting the gradient of the error function (1) with respect to the vector $\textbf{w}$ to zero, we obtain the solution as a linear combination of the vectors $\phi(\textbf{x}_n)$, with coefficients that are themselves functions of $\textbf{w}$, as follows.

$$\textbf{w} = -\frac{1}{\lambda} \sum \limits_{n=1}^N \{\textbf{w}^T \phi(\textbf{x}_n) - \textbf{t}_n\}\phi (\textbf{x}_n) = \pmb{\Phi}^T\textbf{a} \quad \quad \quad (2)$$

Here $\pmb{\Phi}$ is the *design matrix*, whose $n$-th row is given by $\phi(\textbf{x}_n)^T$. The vector $\textbf{a} = (a_1, a_2, \cdots, a_N)^T$ is given by

$$a_n = -\frac{1}{\lambda} \{\textbf{w}^T \phi(\textbf{x}_n) - \textbf{t}_n\} \quad \quad \quad (3)$$
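To make the design matrix concrete, the cell below is a minimal numpy sketch. The polynomial feature map $\phi(x) = (1, x, x^2, x^3)^T$ and the toy one-dimensional inputs are assumptions made purely for illustration:

In [ ]:

```
import numpy as np

def phi(x, degree=3):
    # Illustrative polynomial feature map: phi(x) = (1, x, x^2, ..., x^degree)^T
    return np.array([x ** p for p in range(degree + 1)])

# Toy one-dimensional training inputs x_1, ..., x_N
x_train = np.linspace(0.0, 1.0, 10)

# Design matrix Phi: its n-th row is phi(x_n)^T, so Phi has shape (N, M)
Phi = np.array([phi(x) for x in x_train])
print(Phi.shape)  # (10, 4)
```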

Now, we would like to express $J(\textbf{w})$ in terms of $\textbf{a}$, and hence we substitute $\textbf{w} = \pmb{\Phi}^T\textbf{a}$ from equation (2) into equation (1). So we get

$$J(\textbf{a}) = \frac{1}{2}\textbf{a}^T \pmb{\Phi} \pmb{\Phi}^T \pmb{\Phi} \pmb{\Phi}^T \textbf{a} - \textbf{a}^T \pmb{\Phi} \pmb{\Phi}^T \textbf{t} + \frac{1}{2}\textbf{t}^T \textbf{t} + \frac{\lambda}{2}\textbf{a}^T \pmb{\Phi} \pmb{\Phi}^T \textbf{a} \quad \quad \quad (4)$$

Now we define something called the *Gram matrix* as

$$\textbf{K} = \pmb{\Phi}\pmb{\Phi}^T \quad \quad \quad (5)$$

The Gram matrix is a symmetric $N \times N$ matrix whose elements are given by

$$K_{nm} = \phi(\textbf{x}_n)^T\phi(\textbf{x}_m) = k(\textbf{x}_n, \textbf{x}_m) \quad \quad \quad (6)$$

Here, $k(\textbf{x}_n, \textbf{x}_m)$ is called the **kernel function**, which is defined as the inner product in feature space, as follows.

$$k(\textbf{x}, \textbf{x}') = \phi(\textbf{x})^T\phi(\textbf{x}') \quad \quad \quad (7)$$
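To connect equations (5)–(7), the following sketch (reusing the illustrative feature map and toy inputs from the previous cell) builds the Gram matrix in two ways, once as $\pmb{\Phi}\pmb{\Phi}^T$ and once by evaluating the kernel function on every pair of training points, and checks that the two agree:

In [ ]:

```
import numpy as np

def phi(x, degree=3):
    # Illustrative polynomial feature map phi(x) = (1, x, ..., x^degree)^T
    return np.array([x ** p for p in range(degree + 1)])

def k(xn, xm):
    # Kernel function: the inner product in feature space, equation (7)
    return phi(xn) @ phi(xm)

x_train = np.linspace(0.0, 1.0, 10)
Phi = np.array([phi(x) for x in x_train])

# Gram matrix from the design matrix, K = Phi Phi^T (equation 5)
K_design = Phi @ Phi.T

# Gram matrix element by element from the kernel function (equation 6)
K_kernel = np.array([[k(xn, xm) for xm in x_train] for xn in x_train])

print(np.allclose(K_design, K_kernel))  # True: both constructions coincide
```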

Hence, we can see that equation (4) can be written completely in terms of the Gram matrix $\textbf{K}$ as

$$J(\textbf{a}) = \frac{1}{2}\textbf{a}^T \textbf{K} \textbf{K} \textbf{a} - \textbf{a}^T \textbf{K} \textbf{t} + \frac{1}{2}\textbf{t}^T \textbf{t} + \frac{\lambda}{2}\textbf{a}^T \textbf{K} \textbf{a} \quad \quad \quad (8)$$
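Finally, here is a minimal sketch of the full dual formulation in numpy. The feature map, the noisy sine-wave targets, and the value of $\lambda$ are illustrative assumptions. Setting the gradient of equation (8) with respect to $\textbf{a}$ to zero gives $\textbf{a} = (\textbf{K} + \lambda \textbf{I}_N)^{-1}\textbf{t}$, and the sketch checks that the resulting predictions match those of the usual regularized least-squares solution for $\textbf{w}$:

In [ ]:

```
import numpy as np

def phi(x, degree=3):
    # Illustrative polynomial feature map phi(x) = (1, x, ..., x^degree)^T
    return np.array([x ** p for p in range(degree + 1)])

rng = np.random.RandomState(0)
x_train = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x_train) + 0.1 * rng.randn(x_train.size)  # noisy toy targets
lam = 0.1                                                        # regularization parameter

Phi = np.array([phi(x) for x in x_train])  # design matrix, shape (N, M)
K = Phi @ Phi.T                            # Gram matrix, shape (N, N)
N, M = Phi.shape

# Dual solution: a = (K + lambda I_N)^{-1} t
a = np.linalg.solve(K + lam * np.eye(N), t)

# Primal regularized least-squares solution: w = (Phi^T Phi + lambda I_M)^{-1} Phi^T t
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ t)

# Predict at new points: the dual uses k(x, x_n) = phi(x)^T phi(x_n), the primal uses w^T phi(x)
x_new = np.linspace(0.0, 1.0, 5)
y_dual = np.array([np.array([phi(x) @ phi(xn) for xn in x_train]) @ a for x in x_new])
y_primal = np.array([w @ phi(x) for x in x_new])

print(np.allclose(y_dual, y_primal))  # True: the two formulations give the same predictions
```

Note that the dual route only ever touches the data through inner products $\phi(\textbf{x})^T\phi(\textbf{x}_n)$, which is what will later allow the feature map to be replaced by a kernel function directly.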

In [ ]:

```
```