SVM (support vector machine) is a *supervised* learning algorithm that can be used for both *classification* and *regression*.
Given a *p*-dimensional data set, an SVM model separates the points as well as possible by generating a *(p-1)-dimensional hyperplane*, chosen so that the distance from it to the nearest data point on each side is maximized (the *maximum-margin* hyperplane).
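As a quick illustration, here is a minimal sketch (toy data of our own, not part of the demo below) of reading the fitted hyperplane and its margin width 2/||w|| off a linear SVC:
# sketch : fit a linear SVM on toy 2-D data, then read off the
# separating hyperplane w.x + b = 0 and the margin width 2/||w||
import numpy as np
from sklearn import svm
X_toy = np.array([[0, 0], [1, 1], [2, 0], [3, 3], [4, 2], [4, 4]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
clf = svm.SVC(kernel='linear', C=1.0).fit(X_toy, y_toy)
w, b = clf.coef_[0], clf.intercept_[0]
print('hyperplane : {:.2f}*x1 + {:.2f}*x2 + {:.2f} = 0'.format(w[0], w[1], b))
print('margin width :', 2 / np.linalg.norm(w))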
SVM works well on
- Small-to-medium data sets only; training becomes extremely slow on larger data sets.
- Data sets with low noise. When the data set is noisier, i.e. the target classes overlap, SVM performs poorly.
- Problems where no. of features >> no. of samples; here SVMs are extremely helpful.

Non-linear kernels (rbf, poly, ...) work reasonably well with various types of data, but they cost much more training time, especially on medium-to-large data sets. Since only a subset of the training points is used in the decision function (the *support vectors*), the model is quite memory efficient and prediction is fast; see the sketch just below.
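For example, a short sketch (using the same two iris features as the demo below) of inspecting the support vectors of a fitted model:
# sketch : only a subset of the training points (the support vectors)
# is kept to define the decision function
from sklearn import svm, datasets
iris = datasets.load_iris()
X_sv, y_sv = iris.data[:, :2], iris.target
svc = svm.SVC(kernel='rbf', C=1.0).fit(X_sv, y_sv)
print('training points :', X_sv.shape[0])
print('support vectors :', svc.support_vectors_.shape[0])
print('per class :', svc.n_support_)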
- **gamma** defines how far the influence of a single training example reaches, with low values meaning 'far' and high values meaning 'close'.
    - gamma ↑, fit ↑, over-fitting ↑
- **C** trades off misclassification of training examples against simplicity of the decision surface. A low C makes the decision surface smooth, while a high C aims at classifying all training examples correctly by giving the model freedom to select more samples as support vectors.
    - C ↓, fit ↓, under-fitting ↑
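In practice the two knobs are tuned together. Below is a hedged sketch (our addition, assuming a scikit-learn version with sklearn.model_selection, i.e. >= 0.18) of searching C and gamma with GridSearchCV instead of eyeballing decision surfaces as we do later in this notebook:
# sketch : cross-validated grid search over C and gamma
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
X_gs, y_gs = iris.data[:, :2], iris.target
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 10]}
search = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=5).fit(X_gs, y_gs)
print('best params :', search.best_params_)
print('best CV accuracy :', search.best_score_)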
In this notebook, we demo basic applications of SVM, model tuning, and SVM kernels.
#cd analysis/ML_/doc/
# credit :
from IPython.display import Image
print ("""
SVM demo :
How SVM classifies data via an optimal hyperplane
that separates the data points with the maximum margin
""")
Image(filename='svm.png')
SVM demo : How SVM classifies data via an optimal hyperplane that separates the data points with the maximum margin
#cd analysis/ML_/doc/
# analysis library
import pandas as pd, numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import matplotlib.cm as cm
%matplotlib inline
# ML
from sklearn import svm, datasets
from sklearn.svm import SVR
# helper functions
def get_data():
iris = datasets.load_iris()
X = iris.data[:, :2] # only take the first two features.
y = iris.target
return X,y
def get_data_2():
# np.sort
# Return a sorted copy of an array.
X = np.sort(5 * np.random.rand(200, 1), axis=0)
# np.ravel
# Return a contiguous flattened array.
y = np.cos(X).ravel()
# add some noise
y[::5] += 2*(.5- np.random.rand(40))
return X,y
def plot_SVM(Z,subplot_id,title):
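    # NOTE : relies on the globals xx, yy (plot mesh) and X, y (data) defined below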
plt.figure(figsize=(15, 5))
plt.subplot(subplot_id)
plt.contourf(xx, yy, Z, cmap=plt.cm.terrain_r, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.hot_r)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.title('SVC with {} kernel'.format(title))
plt.show()
X,y = get_data()
plt.scatter(X[:,:1],X[:,1:2],c=y,cmap=plt.cm.hot_r)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title('Iris data (first two features)')
plt.show()
# prepare plot data
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = (x_max - x_min)/100  # mesh step size
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
X_plot = np.c_[xx.ravel(), yy.ravel()]
kernel_list = ['linear','rbf','poly']
subplot_list = [121,122,122]
C = 1.0
for kernel_, subplot_id in zip(kernel_list,subplot_list):
print ('kernel_ :', kernel_)
print ('subplot_id :', subplot_id)
svc = svm.SVC(kernel=kernel_, C=C, decision_function_shape='ovr').fit(X, y)
Z = svc.predict(X_plot)
Z = Z.reshape(xx.shape)
plot_SVM(Z,subplot_id,kernel_)
kernel_ : linear subplot_id : 121
kernel_ : rbf subplot_id : 122
kernel_ : poly subplot_id : 122
# rbf Kernel
kernel_list = ['rbf','rbf','rbf']
subplot_list = [121,122,122]
C_list = [1,10,100]
gamma_list = [0.5,1,10]
print ('---------------------------')
print ('tuning with C')
for kernel_, subplot_id,C in zip(kernel_list,subplot_list,C_list):
print ('kernel_ :', kernel_)
print ('subplot_id :', subplot_id)
print ('C :', C)
svc = svm.SVC(kernel=kernel_, C=C, decision_function_shape='ovr').fit(X, y)
Z = svc.predict(X_plot)
Z = Z.reshape(xx.shape)
plot_SVM(Z,subplot_id,kernel_)
print ('---------------------------')
print ('tuning with gamma')
for kernel_, subplot_id,gamma in zip(kernel_list,subplot_list,gamma_list):
print ('kernel_ :', kernel_)
print ('subplot_id :', subplot_id)
print ('gamma :', gamma)
svc = svm.SVC(kernel=kernel_, gamma=gamma, decision_function_shape='ovr').fit(X, y)
Z = svc.predict(X_plot)
Z = Z.reshape(xx.shape)
plot_SVM(Z,subplot_id,kernel_)
--------------------------- tuning with C kernel_ : rbf subplot_id : 121 C : 1
kernel_ : rbf subplot_id : 122 C : 10
kernel_ : rbf subplot_id : 122 C : 100
--------------------------- tuning with gamma kernel_ : rbf subplot_id : 121 gamma : 0.5
kernel_ : rbf subplot_id : 122 gamma : 1
kernel_ : rbf subplot_id : 122 gamma : 10
# poly Kernel
kernel_list = ['poly','poly','poly']
subplot_list = [121,122,122]
C_list = [1,10,100]
gamma_list = [0.5,1,10]
print ('---------------------------')
print ('tuning with C')
for kernel_, subplot_id,C in zip(kernel_list,subplot_list,C_list):
print ('kernel_ :', kernel_)
print ('subplot_id :', subplot_id)
print ('C :', C)
svc = svm.SVC(kernel=kernel_, C=C, decision_function_shape='ovr').fit(X, y)
Z = svc.predict(X_plot)
Z = Z.reshape(xx.shape)
plot_SVM(Z,subplot_id,kernel_)
print ('---------------------------')
print ('tuning with gamma')
for kernel_, subplot_id,gamma in zip(kernel_list,subplot_list,gamma_list):
print ('kernel_ :', kernel_)
print ('subplot_id :', subplot_id)
print ('gamma :', gamma)
svc = svm.SVC(kernel=kernel_, gamma=gamma, decision_function_shape='ovr').fit(X, y)
Z = svc.predict(X_plot)
Z = Z.reshape(xx.shape)
plot_SVM(Z,subplot_id,kernel_)
--------------------------- tuning with C kernel_ : poly subplot_id : 121 C : 1
kernel_ : poly subplot_id : 122 C : 10
kernel_ : poly subplot_id : 122 C : 100
--------------------------- tuning with gamma kernel_ : poly subplot_id : 121 gamma : 0.5
kernel_ : poly subplot_id : 122 gamma : 1
kernel_ : poly subplot_id : 122 gamma : 10
We can see that gamma and C modify the SVM model in entirely different ways:
- A high **gamma** tries to fit every training point, which can cause an *over-fitting* problem: the SVM *"learns too much"* from the training data and loses generalization.
- **C** controls the trade-off between a *smooth decision boundary* and *classifying the training points correctly*.
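To see the over-fitting numerically rather than visually, here is a small sketch (our addition) comparing train vs. held-out accuracy at a moderate and a very large gamma; the large gamma typically scores higher on the training split but lower on the test split:
# sketch : a very large gamma tends to memorize the training set
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
iris = datasets.load_iris()
X_of, y_of = iris.data[:, :2], iris.target
X_tr, X_te, y_tr, y_te = train_test_split(X_of, y_of, test_size=0.3, random_state=0)
for gamma in [0.5, 100]:
    svc = svm.SVC(kernel='rbf', gamma=gamma).fit(X_tr, y_tr)
    print('gamma = {} : train acc = {:.2f}, test acc = {:.2f}'.format(
        gamma, svc.score(X_tr, y_tr), svc.score(X_te, y_te)))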
X,y = get_data_2()
svr_rbf = SVR(kernel='rbf')
svr_lin = SVR(kernel='linear')
svr_poly = SVR(kernel='poly')
y_rbf = svr_rbf.fit(X, y).predict(X)
y_lin = svr_lin.fit(X, y).predict(X)
y_poly = svr_poly.fit(X, y).predict(X)
lw = 2
plt.figure(figsize=(12, 7))
plt.scatter(X, y, color='darkorange', label='data')
plt.plot(X, y_rbf, color='navy', lw=lw, label='RBF model')
plt.plot(X, y_lin, color='c', lw=lw, label='Linear model')
plt.plot(X, y_poly, color='cornflowerblue', lw=lw, label='Polynomial model')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.show()
plt.figure(figsize=(15,10))
C_list = [.01,.5,1,10,100,300]
gamma_list = [.01,.5,1,10,100,300]
print ('---------------------------')
print ('tuning with C')
for C in C_list:
svr_rbf = SVR(kernel='rbf',C=C)
y_rbf = svr_rbf.fit(X, y).predict(X)
lw = 2
#plt.figure(figsize=(12, 7))
plt.subplot(211)
plt.plot(X, y_rbf, lw=lw, label='RBF model with C = {}'.format(C))
plt.scatter(X, y, color='darkorange', label='data')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.show()
print ('---------------------------')
print ('tuning with gamma')
plt.figure(figsize=(15,10))
for gamma in gamma_list:
svr_rbf = SVR(kernel='rbf',gamma=gamma)
y_rbf = svr_rbf.fit(X, y).predict(X)
lw = 2
#plt.figure(figsize=(12, 7))
plt.subplot(211)
plt.plot(X, y_rbf, lw=lw, label='RBF model with gamma = {}'.format(gamma))
plt.scatter(X, y, color='darkorange', label='data')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.show()
--------------------------- tuning with C
--------------------------- tuning with gamma
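Note that the SVR demos above predict on the very data they were trained on; below is a quick sketch (our addition, with illustrative C and gamma values) of checking generalization on a held-out split instead:
# sketch : evaluate SVR on data it was not trained on
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
X_r = np.sort(5 * np.random.rand(200, 1), axis=0)
y_r = np.cos(X_r).ravel()
y_r[::5] += 2 * (.5 - np.random.rand(40))  # same noise recipe as get_data_2
X_tr, X_te, y_tr, y_te = train_test_split(X_r, y_r, test_size=0.3, random_state=0)
svr = SVR(kernel='rbf', C=10, gamma=1).fit(X_tr, y_tr)
print('held-out MSE :', mean_squared_error(y_te, svr.predict(X_te)))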