# Breast Cancer Wisconsin (Diagnostic) Data Set

https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic)

## Attribute Information:

1. ID number
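2. Diagnosis (M = malignant, B = benign)
3. Columns 3-32: real-valued features

Ten real-valued features are computed for each cell nucleus, and the mean, standard error, and worst value of each are recorded, giving 30 columns: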

• radius (mean of distances from center to points on the perimeter)
• texture (standard deviation of gray-scale values)
• perimeter
• area
• smoothness (local variation in radius lengths)
• compactness (perimeter^2 / area - 1.0)
• concavity (severity of concave portions of the contour)
• concave points (number of concave portions of the contour)
• symmetry
• fractal dimension ("coastline approximation" - 1)

http://people.idsia.ch/~juergen/deeplearningwinsMICCAIgrandchallenge.html

## Hypothesis class

$$C(\boldsymbol{y}) = \left\{ \begin{array}{ccc} +1 & {\rm when} & h(\boldsymbol{y})\geq 0\\ -1 & {\rm when} & h(\boldsymbol{y})<0 \end{array} \right.$$
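
The rule above can be sketched in a few lines of NumPy. The form of $h$ is not given in this section; the sketch assumes a linear score $h(\boldsymbol{y}) = \boldsymbol{y} \cdot \boldsymbol{w}$, which matches how the later cells compute predictions. The function name `classify` and the toy numbers are hypothetical.

```python
import numpy as np

def classify(Y, w):
    """Return +1 where h(y) >= 0 and -1 otherwise.

    Assumption: h is linear, h(y) = y . w (not stated in the text).
    """
    h = Y @ w                       # one score per row of Y
    return np.where(h >= 0, 1, -1)  # h == 0 is assigned to the +1 class

# Toy data: three samples with two features each (made up for illustration).
Y = np.array([[1.0, 2.0], [3.0, -1.0], [0.0, 0.0]])
w = np.array([1.0, -1.0])
labels = classify(Y, w)
print(labels)
```

Note that the boundary case $h(\boldsymbol{y}) = 0$ falls in the $+1$ class, as in the definition.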

# Steepest descent method

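A search procedure of this kind, which repeatedly moves $w$ a small step in the direction of the negative gradient, is called the steepest descent method.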
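
As a worked form of the update, assume the quantity being minimized is the least-squares loss over the training matrix $A$ and the label vector $b$. The symbols $L$ and $\sigma$ follow the names `dLw` and `sigma` in the code below, and the factor $\tfrac{1}{2}$ is a convention chosen here, not something the text states:

$$L(w) = \tfrac{1}{2}\|Aw - b\|^2, \qquad \nabla L(w) = A^t (Aw - b), \qquad w \leftarrow w - \sigma\, A^t (Aw - b)$$

Each step moves $w$ against the gradient. The step size $\sigma$ has to be small enough that the iteration does not diverge, which is presumably why the code uses a value as small as $3 \times 10^{-9}$ on the unscaled features.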

# Code (Python)

• /Users/bob/python/doing_math_with_python/numerical_calc/breast_cancer_detector/codes/python.ipynb

In [8]:
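import numpy as np

def print_w(w):
    # Accept either a (30, 1) column vector or a flat array of 30 weights.
    w = np.asarray(w).ravel()
    # Names of the ten nucleus features; three weights are printed per feature.
    params = ["radius","texture","perimeter","area",
              "smoothness","compactness","concavity","concave points",
              "symmetry","fractal dimension"]
    print("    (params)      :    ",end="")
    print("   (mean)     (stderr)      (worst)")
    for i, param in enumerate(params):
        print("%18s:" %param, end="")
        for j in range(3):
            print("%13.9f" % w[i*3+j], end="")
        print()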


## Loading and initializing the data

In [9]:
import numpy as np

# Training set: 300 samples with 30 features each, stored as space-separated text.
tmp = np.fromfile('./codes/train_A.data', np.float64, -1, " ")
A = tmp.reshape(300,30)
# Labels: one value per sample.
tmp = np.fromfile('./codes/train_b.data', np.float64, -1, " ")
b = tmp.reshape(300,1)
# Start the search from w = 0.
w = np.zeros((30,1))
print(w.shape)

(30, 1)


## Searching for w by steepest descent

In [10]:
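loop, sigma = 300, 3.0*10**(-9)  # number of iterations, step size
for i in range(loop):
    dLw = A.dot(w)-b  # residual A.w - b
    # (dLw^t.A)^t = A^t.(A.w - b): gradient of ||A.w - b||^2 / 2 with respect to w
    w = w - (dLw.transpose().dot(A)).transpose()*sigma

print_w(w)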

    (params)      :       (mean)     (stderr)      (worst)
texture:  0.001687946  0.000004707  0.000000127
perimeter: -0.000003968 -0.000002078  0.000008954
area:  0.000003595  0.000002569  0.000070324
smoothness:  0.000001139 -0.000881778  0.000000430
compactness:  0.000000441  0.000000723  0.000000267
concavity:  0.000001200  0.000000191  0.000411499
concave points:  0.000921972  0.002395138 -0.001932789
symmetry:  0.000005930 -0.000003750 -0.000008147
fractal dimension: -0.000002341  0.000011565  0.000003523


# Results

In [11]:
def show_accuracy(mA, vb, vw):
    # M: malignant (-1), B: benign (+1)

    correct,safe_error,critical_error=0,0,0
    predict = mA.dot(vw)
    n = vb.size
    for i in range(n):
        if predict[i]*vb[i]>0:
            correct += 1
        elif (predict[i]<0 and vb[i]>0):
            # benign sample predicted as malignant
            safe_error += 1
        elif (predict[i]>0 and vb[i]<0):
            # malignant sample predicted as benign
            critical_error += 1
    print("       correct: %4d/%4d" % (correct,n))
    print("    safe error: %4d" % safe_error)
    print("critical error: %4d" % critical_error)
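
The same three counts can also be computed without an explicit loop. This is a minimal sketch using NumPy boolean masks; the function name `count_outcomes` and the toy scores are hypothetical and not part of the original notebook.

```python
import numpy as np

def count_outcomes(scores, labels):
    """Return (correct, safe_error, critical_error) for signed scores and labels.

    Convention taken from the notebook: malignant = -1, benign = +1.
    """
    scores = np.asarray(scores).ravel()
    labels = np.asarray(labels).ravel()
    correct = int(np.sum(scores * labels > 0))
    # Benign sample scored as malignant.
    safe_error = int(np.sum((scores < 0) & (labels > 0)))
    # Malignant sample scored as benign.
    critical_error = int(np.sum((scores > 0) & (labels < 0)))
    return correct, safe_error, critical_error

# Toy example: four made-up scores with their true labels.
scores = np.array([0.7, -0.2, 0.4, -0.9])
labels = np.array([1, 1, -1, -1])
counts = count_outcomes(scores, labels)
print(counts)
```

Like the loop version, this leaves a score of exactly zero out of all three counts.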

In [45]:
show_accuracy(A, b, w)

       correct:  258/ 300
safe error:    1
critical error:   41

In [12]:
tmp = np.fromfile('./codes/validate_A.data', np.float64, -1, " ")
A = tmp.reshape(260,30)
tmp = np.fromfile('./codes/validate_b.data', np.float64, -1, " ")
b = tmp.reshape(260,1)

show_accuracy(A, b, w)

       correct:  240/ 260
safe error:   10
critical error:   10


# QR decomposition

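With a QR decomposition, the minimum can be found more easily. The matrix $A$ is not square, so it has no inverse. Even so, we can still find the $w$ that minimizes $||A.w - b||^2$.

A QR decomposition factors an $n \times m$ matrix as $$A = QR$$ where $Q$ is an $n \times m$ matrix with orthonormal columns and $R$ is an $m \times m$ square (upper triangular) matrix. When $A$ has full column rank, $R$ can be inverted.

Using $QR$ and $Q^t.Q = I$, the condition for $|| Aw - b ||$ to be zero appears to become $$\begin{aligned} Q.R.w &= b \\ R.w &= Q^t.b \\ R^{-1}.R.w &= R^{-1}.Q^t.b \end{aligned}$$ so that $w = R^{-1}.Q^t.b$. For a non-square $A$ the residual generally cannot be made exactly zero; the $w$ obtained this way is the one that minimizes $||A.w - b||^2$.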
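
The relation $R.w = Q^t.b$ can be checked on a small synthetic problem. This is a minimal sketch, not the notebook's data: the random matrix, the seed, and the variable names `w_qr` and `w_ref` are assumptions made for illustration. It solves the triangular system with `np.linalg.solve` instead of forming $R^{-1}$ explicitly, and compares the result with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))   # tall matrix: more rows than columns
b = rng.normal(size=(20, 1))

# Reduced QR: q is 20x3 with orthonormal columns, r is 3x3 upper triangular.
q, r = np.linalg.qr(A)

# Solve r.w = q^t.b without computing inv(r).
w_qr = np.linalg.solve(r, q.T @ b)

# Reference least-squares solution from NumPy.
w_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(w_qr, w_ref))

# The residual is not zero, but it is orthogonal to the columns of A,
# which is the condition for a least-squares minimum.
residual = A @ w_qr - b
print(np.linalg.norm(residual), np.abs(A.T @ residual).max())
```

Solving the triangular system is usually preferred to `np.linalg.inv(r)` for numerical reasons, although both give the same $w$ on well-conditioned data.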

In [13]:
import numpy as np

tmp = np.fromfile('./codes/train_A.data', np.float64, -1, " ")
A = tmp.reshape(300,30)
tmp = np.fromfile('./codes/train_b.data', np.float64, -1, " ")
b = tmp.reshape(300,1)

q, r = np.linalg.qr(A)

In [14]:
ww = np.linalg.inv(r).dot(np.transpose(q).dot(b))

In [15]:
q.shape

Out[15]:
(300, 30)
In [16]:
print(r[0,0:5])

[ -2.57579883e+02  -3.32324268e+02  -1.68607899e+03  -1.29450676e+04
-1.65446346e+00]

In [17]:
show_accuracy(A, b, ww)

       correct:  286/ 300
safe error:    1
critical error:   13

In [18]:
print_w(ww)

    (params)      :       (mean)     (stderr)      (worst)
texture: -0.003274619 -8.790300861  1.747147500
perimeter: -0.202849407 -6.506451098  5.061760446
area: 49.167541566 -0.956591421 -0.082052658
smoothness: -0.007943157  0.004976908-27.841944367
compactness:  3.301527110  4.985959134-16.318886295
concavity: 10.316289081-21.332232171 -0.408605816
concave points: -0.003345722 -0.000677873  0.002510735
symmetry:  4.531369718  0.590110016 -0.719368704
fractal dimension: -2.158965299 -3.803467225-12.298417038

In [19]:
tmp = np.fromfile('./codes/validate_A.data', np.float64, -1, " ")
A = tmp.reshape(260,30)
tmp = np.fromfile('./codes/validate_b.data', np.float64, -1, " ")
b = tmp.reshape(260,1)

show_accuracy(A, b, ww)

       correct:  252/ 260
safe error:    6
critical error:    2
