- We import NumPy, matplotlib, and scikit-learn.

In [ ]:

```
import numpy as np
import sklearn
import sklearn.decomposition as dec
import sklearn.datasets as ds
import matplotlib.pyplot as plt
%matplotlib inline
```

- The Iris flower dataset is available in the *datasets* module of scikit-learn.

In [ ]:

```
iris = ds.load_iris()
X = iris.data
y = iris.target
print(X.shape)
```
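Beyond `data` and `target`, the object returned by `load_iris()` also carries metadata we can inspect; a quick sketch:

```python
import sklearn.datasets as ds

iris = ds.load_iris()
# Names of the four measured features (sepal/petal length and width)...
print(iris.feature_names)
# ...and of the three Iris species encoded in `target`.
print(iris.target_names)
```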

In [ ]:

```
plt.figure(figsize=(6,3));
plt.scatter(X[:,0], X[:,1], c=y,
s=30, cmap=plt.cm.rainbow);
```

- We now apply PCA to the dataset to obtain the transformed matrix. This operation can be done in a single line with scikit-learn: we instantiate a `PCA` model and call the `fit_transform()` method. This function computes the principal components first, then projects the data onto them.

In [ ]:

```
X_bis = dec.PCA().fit_transform(X)
```
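A fitted `PCA` model also exposes how much variance each principal component captures, via its `explained_variance_ratio_` attribute; a minimal sketch:

```python
import sklearn.decomposition as dec
import sklearn.datasets as ds

X = ds.load_iris().data
pca = dec.PCA()
X_bis = pca.fit_transform(X)
# Each entry is the fraction of the total variance captured by one component;
# the entries sum to 1 when all components are kept.
print(pca.explained_variance_ratio_)
```

For the Iris dataset, the first component alone captures the bulk of the variance, which is why the 2D projection above is so informative.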

In [ ]:

```
plt.figure(figsize=(6,3));
plt.scatter(X_bis[:,0], X_bis[:,1], c=y,
s=30, cmap=plt.cm.rainbow);
```

- The `PCA` estimator did *not* use the labels. PCA was able to find a projection maximizing the variance, which here corresponds to a projection where the classes are well separated.

- The `sklearn.decomposition` module contains several variants of the classic `PCA` estimator: `ProbabilisticPCA`, `SparsePCA`, `RandomizedPCA`, `KernelPCA`... As an example, let's take a look at `KernelPCA`, a non-linear version of PCA.

In [ ]:

```
X_ter = dec.KernelPCA(kernel='rbf').fit_transform(X)
plt.figure(figsize=(6,3));
plt.scatter(X_ter[:,0], X_ter[:,1], c=y, s=30, cmap=plt.cm.rainbow);
```
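With the RBF kernel, `KernelPCA` also accepts a `gamma` parameter controlling the kernel width, which can change the projection substantially. A sketch (the `gamma` values here are illustrative, not tuned):

```python
import sklearn.decomposition as dec
import sklearn.datasets as ds

X = ds.load_iris().data
# gamma sets the width of the RBF kernel; larger values make the kernel
# more local. We keep only the first two components for plotting.
for gamma in (0.1, 1.0, 10.0):
    X_k = dec.KernelPCA(n_components=2, kernel='rbf',
                        gamma=gamma).fit_transform(X)
    print(gamma, X_k.shape)
```

Re-running the scatter plot above for each `gamma` shows how the separation between classes varies with the kernel width.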

You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).