where the coefficient vector is $\vec{w}=[w_0,w_1,w_2,\dots,w_{M-1}]^T$ and the polynomial feature vector is $\phi(x_n)=[1,x_n,x_n^2,\dots,x_n^{M-1}]^T$ (here we add a bias term $\phi_0(x_n)=1$ to the features).
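As a concrete sketch (assuming NumPy; `poly_features` is a hypothetical helper name, not from the notebook), the polynomial feature vector with the bias term can be built as:

```python
import numpy as np

def poly_features(x_n, M):
    """Polynomial feature vector phi(x_n) = [1, x_n, x_n^2, ..., x_n^(M-1)]."""
    return np.array([x_n ** j for j in range(M)])

print(poly_features(2.0, 4))  # [1. 2. 4. 8.]
```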
regression_example_draw(degree1=0,degree2=1,degree3=3, ifprint=True)
The expression for the first polynomial is y=-0.129
The expression for the second polynomial is y=-0.231+0.598x^1
The expression for the third polynomial is y=0.085-0.781x^1+1.584x^2-0.097x^3
basis_function_plot()
where $\phi(\vec{x}_n)=[\phi_0(\vec{x}_n),\phi_1(\vec{x}_n),\phi_2(\vec{x}_n),\dots,\phi_{M-1}(\vec{x}_n)]^T$.
Main Idea: Instead of computing the batch gradient (over the entire training set), compute the gradient for a single training sample and update the weights immediately.
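The per-sample update can be sketched as follows (a minimal NumPy sketch, not the notebook's own implementation; the function name, learning rate, and epoch count are illustrative assumptions):

```python
import numpy as np

def sgd_linear_regression(Phi, t, lr=0.01, epochs=100, seed=0):
    """Stochastic gradient descent for least-squares linear regression.

    Per-sample loss E_n = 0.5 * (w^T phi_n - t_n)^2,
    so the per-sample gradient is dE_n/dw = (w^T phi_n - t_n) * phi_n.
    """
    rng = np.random.default_rng(seed)
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(epochs):
        for n in rng.permutation(N):   # visit samples in random order
            err = Phi[n] @ w - t[n]
            w -= lr * err * Phi[n]     # update from this single sample
    return w
```

Each update touches only one row of $\Phi$, so the cost per step is $O(M)$ rather than $O(NM)$ for a full batch gradient.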
Main Idea: Compute the gradient, set it to zero, and solve for the weights in closed form.
Here $\Phi\in\mathbb{R}^{N\times M}$ is called the design matrix. Each row represents one sample and each column represents one feature:
$$\Phi=\begin{bmatrix}\phi(\vec{x}_1)^T\\\phi(\vec{x}_2)^T\\\vdots\\\phi(\vec{x}_N)^T\end{bmatrix}=\begin{bmatrix}\phi_0(\vec{x}_1)&\phi_1(\vec{x}_1)&\cdots&\phi_{M-1}(\vec{x}_1)\\\phi_0(\vec{x}_2)&\phi_1(\vec{x}_2)&\cdots&\phi_{M-1}(\vec{x}_2)\\\vdots&\vdots&\ddots&\vdots\\\phi_0(\vec{x}_N)&\phi_1(\vec{x}_N)&\cdots&\phi_{M-1}(\vec{x}_N)\end{bmatrix}$$
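For polynomial features, this design matrix is exactly a Vandermonde matrix, so it can be built in one call (a NumPy sketch; `design_matrix` is a hypothetical name):

```python
import numpy as np

def design_matrix(x, M):
    """N x M design matrix Phi; row n holds [1, x_n, x_n^2, ..., x_n^(M-1)]."""
    x = np.asarray(x)
    # increasing=True orders columns as phi_0, phi_1, ..., phi_{M-1}
    return np.vander(x, M, increasing=True)

Phi = design_matrix([1.0, 2.0, 3.0], 3)  # rows: [1, x_n, x_n^2]
```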
where $\Phi^\dagger$ is called the Moore-Penrose pseudoinverse of $\Phi$.
where $(\Phi^T\Phi)^\dagger\Phi^T=\Phi^\dagger$. This is left as an exercise. (Hint: use the SVD of $\Phi$.)
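The closed-form solution $\vec{w}=\Phi^\dagger \vec{t}$ can be checked numerically with NumPy, whose `pinv` computes the Moore-Penrose pseudoinverse via the SVD (the data here are an illustrative noiseless line, not from the notebook):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
t = 1.0 + 2.0 * x                        # targets from the line y = 1 + 2x
Phi = np.vander(x, 2, increasing=True)   # design matrix with rows [1, x_n]

# Moore-Penrose pseudoinverse (computed internally via SVD)
w = np.linalg.pinv(Phi) @ t
# equivalently: w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
```

On noiseless data from $y=1+2x$ this recovers $\vec{w}\approx[1,2]^T$ up to floating-point error.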