SVM (support vector machine) is a *supervised* learning algorithm that can be used for both *classification* and *regression*.
Given a *p*-dimensional data set, an SVM model separates the points as well as possible by generating a *(p-1)-dimensional hyperplane*, chosen so that the distance from it to the nearest data point on each side is maximized (the *maximum-margin* hyperplane).
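As a quick illustration, here is a minimal sketch (toy data of our own, not part of the demo below) of reading the fitted hyperplane and its margin width 2/||w|| off a linear SVC:
# sketch : fit a linear SVM on toy 2-D data, then read off the
# separating hyperplane w.x + b = 0 and the margin width 2/||w||
import numpy as np
from sklearn import svm
X_toy = np.array([[0, 0], [1, 1], [2, 0], [3, 3], [4, 2], [4, 4]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
clf = svm.SVC(kernel='linear', C=1.0).fit(X_toy, y_toy)
w, b = clf.coef_[0], clf.intercept_[0]
print('hyperplane : {:.2f}*x1 + {:.2f}*x2 + {:.2f} = 0'.format(w[0], w[1], b))
print('margin width :', 2 / np.linalg.norm(w))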
SVM works well on
- Small-to-medium data sets only; training becomes extremely slow on larger data sets.
- Data sets with low noise. When the data set is noisier, i.e. the target classes overlap, SVM performs poorly.
- Problems where no. of features >> no. of samples; here SVMs are extremely helpful.

Non-linear kernels (rbf, poly, ...) work reasonably well with various types of data, but they cost much more training time, especially on medium-to-large data sets. Since only a subset of the training points is used in the decision function (the *support vectors*), the model is quite memory efficient and prediction is fast; see the sketch just below.
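For example, a short sketch (using the same two iris features as the demo below) of inspecting the support vectors of a fitted model:
# sketch : only a subset of the training points (the support vectors)
# is kept to define the decision function
from sklearn import svm, datasets
iris = datasets.load_iris()
X_sv, y_sv = iris.data[:, :2], iris.target
svc = svm.SVC(kernel='rbf', C=1.0).fit(X_sv, y_sv)
print('training points :', X_sv.shape[0])
print('support vectors :', svc.support_vectors_.shape[0])
print('per class :', svc.n_support_)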
- **gamma** defines how far the influence of a single training example reaches, with low values meaning 'far' and high values meaning 'close'.
    - gamma ↑, fit ↑, over-fitting ↑
- **C** trades off misclassification of training examples against simplicity of the decision surface. A low C makes the decision surface smooth, while a high C aims at classifying all training examples correctly by giving the model freedom to select more samples as support vectors.
    - C ↓, fit ↓, under-fitting ↑
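In practice the two knobs are tuned together. Below is a hedged sketch (our addition, assuming a scikit-learn version with sklearn.model_selection, i.e. >= 0.18) of searching C and gamma with GridSearchCV instead of eyeballing decision surfaces as we do later in this notebook:
# sketch : cross-validated grid search over C and gamma
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
X_gs, y_gs = iris.data[:, :2], iris.target
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 10]}
search = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=5).fit(X_gs, y_gs)
print('best params :', search.best_params_)
print('best CV accuracy :', search.best_score_)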
In this notebook, we demo basic applications of SVM, model tuning, and SVM kernels.
#cd analysis/ML_/doc/
# credit :
from IPython.display import Image
print ("""
SVM demo :
How SVM classifies data via an optimal hyperplane
that separates the data points with the maximum margin
""")
Image(filename='svm.png')
SVM demo : How SVM classifies data via an optimal hyperplane that separates the data points with the maximum margin
#cd analysis/ML_/doc/
# analysis library
import pandas as pd, numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import matplotlib.cm as cm
%matplotlib inline
# ML
from sklearn import svm, datasets
from sklearn.svm import SVR
# helper functions
def get_data():
iris = datasets.load_iris()
X = iris.data[:, :2] # only take the first two features.
y = iris.target
return X,y
def get_data_2():
# np.sort
# Return a sorted copy of an array.
X = np.sort(5 * np.random.rand(200, 1), axis=0)
# np.ravel
# Return a contiguous flattened array.
y = np.cos(X).ravel()
# add some noise
y[::5] += 2*(.5- np.random.rand(40))
return X,y
def plot_SVM(Z,subplot_id,title):
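    # NOTE : relies on the globals xx, yy (plot mesh) and X, y (data) defined below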
plt.figure(figsize=(15, 5))
plt.subplot(subplot_id)
plt.contourf(xx, yy, Z, cmap=plt.cm.terrain_r, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.hot_r)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.title('SVC with {} kernel'.format(title))
plt.show()
X,y = get_data()
plt.scatter(X[:,:1],X[:,1:2],c=y,cmap=plt.cm.hot_r)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title('Iris data (first two features)')
plt.show()
# prepare plot data
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = (x_max - x_min)/100  # mesh step size
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
X_plot = np.c_[xx.ravel(), yy.ravel()]
kernel_list = ['linear','rbf','poly']
subplot_list = [121,122,122]
C = 1.0
for kernel_, subplot_id in zip(kernel_list,subplot_list):
print ('kernel_ :', kernel_)
print ('subplot_id :', subplot_id)
svc = svm.SVC(kernel=kernel_, C=C, decision_function_shape='ovr').fit(X, y)
Z = svc.predict(X_plot)
Z = Z.reshape(xx.shape)
plot_SVM(Z,subplot_id,kernel_)
kernel_ : linear subplot_id : 121
kernel_ : rbf subplot_id : 122
kernel_ : poly subplot_id : 122
# rbf Kernel
kernel_list = ['rbf','rbf','rbf']
subplot_list = [121,122,122]
C_list = [1,10,100]
gamma_list = [0.5,1,10]
print ('---------------------------')
print ('tuning with C')
for kernel_, subplot_id,C in zip(kernel_list,subplot_list,C_list):
print ('kernel_ :', kernel_)
print ('subplot_id :', subplot_id)
print ('C :', C)
svc = svm.SVC(kernel=kernel_, C=C, decision_function_shape='ovr').fit(X, y)
Z = svc.predict(X_plot)
Z = Z.reshape(xx.shape)
plot_SVM(Z,subplot_id,kernel_)
print ('---------------------------')
print ('tuning with gamma')
for kernel_, subplot_id,gamma in zip(kernel_list,subplot_list,gamma_list):
print ('kernel_ :', kernel_)
print ('subplot_id :', subplot_id)
print ('gamma :', gamma)
svc = svm.SVC(kernel=kernel_, gamma=gamma, decision_function_shape='ovr').fit(X, y)
Z = svc.predict(X_plot)
Z = Z.reshape(xx.shape)
plot_SVM(Z,subplot_id,kernel_)
--------------------------- tuning with C kernel_ : rbf subplot_id : 121 C : 1
kernel_ : rbf subplot_id : 122 C : 10
kernel_ : rbf subplot_id : 122 C : 100
--------------------------- tuning with gamma kernel_ : rbf subplot_id : 121 gamma : 0.5
kernel_ : rbf subplot_id : 122 gamma : 1
kernel_ : rbf subplot_id : 122 gamma : 10
# poly Kernel
kernel_list = ['poly','poly','poly']
subplot_list = [121,122,122]
C_list = [1,10,100]
gamma_list = [0.5,1,10]
print ('---------------------------')
print ('tuning with C')
for kernel_, subplot_id,C in zip(kernel_list,subplot_list,C_list):
print ('kernel_ :', kernel_)
print ('subplot_id :', subplot_id)
print ('C :', C)
svc = svm.SVC(kernel=kernel_, C=C, decision_function_shape='ovr').fit(X, y)
Z = svc.predict(X_plot)
Z = Z.reshape(xx.shape)
plot_SVM(Z,subplot_id,kernel_)
print ('---------------------------')
print ('tuning with gamma')
for kernel_, subplot_id,gamma in zip(kernel_list,subplot_list,gamma_list):
print ('kernel_ :', kernel_)
print ('subplot_id :', subplot_id)
print ('gamma :', gamma)
svc = svm.SVC(kernel=kernel_, gamma=gamma, decision_function_shape='ovr').fit(X, y)
Z = svc.predict(X_plot)
Z = Z.reshape(xx.shape)
plot_SVM(Z,subplot_id,kernel_)
--------------------------- tuning with C kernel_ : poly subplot_id : 121 C : 1
kernel_ : poly subplot_id : 122 C : 10
kernel_ : poly subplot_id : 122 C : 100
--------------------------- tuning with gamma kernel_ : poly subplot_id : 121 gamma : 0.5
kernel_ : poly subplot_id : 122 gamma : 1
kernel_ : poly subplot_id : 122 gamma : 10
We can see that gamma and C modify the SVM model in entirely different ways:
- A high **gamma** tries to fit every training point, which can cause an *over-fitting* problem: the SVM *"learns too much"* from the training data and loses generalization.
- **C** controls the trade-off between a *smooth decision boundary* and *classifying the training points correctly*.
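To see the over-fitting numerically rather than visually, here is a small sketch (our addition) comparing train vs. held-out accuracy at a moderate and a very large gamma; the large gamma typically scores higher on the training split but lower on the test split:
# sketch : a very large gamma tends to memorize the training set
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
iris = datasets.load_iris()
X_of, y_of = iris.data[:, :2], iris.target
X_tr, X_te, y_tr, y_te = train_test_split(X_of, y_of, test_size=0.3, random_state=0)
for gamma in [0.5, 100]:
    svc = svm.SVC(kernel='rbf', gamma=gamma).fit(X_tr, y_tr)
    print('gamma = {} : train acc = {:.2f}, test acc = {:.2f}'.format(
        gamma, svc.score(X_tr, y_tr), svc.score(X_te, y_te)))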
X,y = get_data_2()
svr_rbf = SVR(kernel='rbf')
svr_lin = SVR(kernel='linear')
svr_poly = SVR(kernel='poly')
y_rbf = svr_rbf.fit(X, y).predict(X)
y_lin = svr_lin.fit(X, y).predict(X)
y_poly = svr_poly.fit(X, y).predict(X)
lw = 2
plt.figure(figsize=(12, 7))
plt.scatter(X, y, color='darkorange', label='data')
plt.plot(X, y_rbf, color='navy', lw=lw, label='RBF model')
plt.plot(X, y_lin, color='c', lw=lw, label='Linear model')
plt.plot(X, y_poly, color='cornflowerblue', lw=lw, label='Polynomial model')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.show()
plt.figure(figsize=(15,10))
C_list = [.01,.5,1,10,100,300]
gamma_list = [.01,.5,1,10,100,300]
print ('---------------------------')
print ('tuning with C')
for C in C_list:
svr_rbf = SVR(kernel='rbf',C=C)
y_rbf = svr_rbf.fit(X, y).predict(X)
lw = 2
#plt.figure(figsize=(12, 7))
plt.subplot(211)
plt.plot(X, y_rbf, lw=lw, label='RBF model with C = {}'.format(C))
plt.scatter(X, y, color='darkorange', label='data')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.show()
print ('---------------------------')
print ('tuning with gamma')
plt.figure(figsize=(15,10))
for gamma in gamma_list:
svr_rbf = SVR(kernel='rbf',gamma=gamma)
y_rbf = svr_rbf.fit(X, y).predict(X)
lw = 2
#plt.figure(figsize=(12, 7))
plt.subplot(211)
plt.plot(X, y_rbf, lw=lw, label='RBF model with gamma = {}'.format(gamma))
plt.scatter(X, y, color='darkorange', label='data')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.show()
--------------------------- tuning with C
--------------------------- tuning with gamma
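Note that the SVR demos above predict on the very data they were trained on; below is a quick sketch (our addition, with illustrative C and gamma values) of checking generalization on a held-out split instead:
# sketch : evaluate SVR on data it was not trained on
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
X_r = np.sort(5 * np.random.rand(200, 1), axis=0)
y_r = np.cos(X_r).ravel()
y_r[::5] += 2 * (.5 - np.random.rand(40))  # same noise recipe as get_data_2
X_tr, X_te, y_tr, y_te = train_test_split(X_r, y_r, test_size=0.3, random_state=0)
svr = SVR(kernel='rbf', C=10, gamma=1).fit(X_tr, y_tr)
print('held-out MSE :', mean_squared_error(y_te, svr.predict(X_te)))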