# Understanding SVM(2)

A brief Introduction here. (Wrote a blog about it last year, but do not think it is detailed.)

This blog is learning notes from this video (English slides but Chinese speaker). First a quick introduction on SVM, then the magic of how to solve max/min values. Also, you could find Kernel SVM.

##### Hard-Margin Support Vector Machines

Shown in the graph, we have two classes (circles and triangles) and we want to separate them by one single line. Note that here we consider they are linear separatable, and use a straight line.

So the goal is to find the blue line, but what is a “good” line? Because you might find a number of lines that can separate the two classes. That is those dots should be as far as they could to the blue line. We find some lines that are parallel to the blue one, then always there are two lines (dash lines) reach the closest sample (or more than one). Then we define the distances of both two lines to the blue line as margin. The goal for SVM is to find a max value of margin.

Notations:
Blue line (the wanted line): ${w}^{T}x+b=0$
Dots above the blue line, the tags are +1: ${w}^{T}x+b=1$
Dots below the blue line, the tags are -1:${w}^{T}x+b=-1$
margin $= ({d}_{+})+({d}_{-})$
Goal: max {margin}

##### Max the Margin

Refresh our mind, how do you calculate distance of two lines:

So for the dash line1: ${w}^{T}x+b=1$ and blue line: ${w}^{T}x+b=0$:(two features x1 and x2)
${w}^{T}x+b-1=0$ => ${w}_{1}{x}_{1}+{w}_{2}{x}_{2}+(b-1)=0$
${w}^{T}x+b=0$ =>${w}_{1}{x}_{1}+{w}_{2}{x}_{2}+b=0$
${d}_{+}=\frac { |b-1-b| }{ \sqrt { { w_{ 1 } }^{ 2 }+{ w_{ 2 } }^{ 2 } } } =\frac { 1 }{ \sqrt { { w_{ 1 } }^{ 2 }+{ w_{ 2 } }^{ 2 } } } =\frac { 1 }{ ||w|| } =\frac { 2 }{ \sqrt { w^{ T }w } }$
(You will find d+ and d- are the same)
So we have $margin=\frac { 2 }{ ||w|| }$, recall the goal is to find a max value of it. Thus, the min value of ||w||. Then we define a loss function: $\phi (w)=\frac { 1 }{ 2 } w^{ T }w$, the goal is to find max ${\phi (w)}$. So do we have constraints for $\phi (w)$? Yes, we have:
$w^{ T }x_{ i }+b\ge +1$ if $y_i=+1$
$w^{ T }x_{ i }+b\le -1$ if $y_i=-1$
They are equal to $y_i(w^{ T }x_{ i }+b)\ge 1, b_i=1,2,..,l$.

So the problem becomes, given a function $\phi (w)=\frac { 1 }{ 2 } w^{ T }w$, which is subjected to $y_i(w^{ T }x_{ i }+b)\ge 1, b_i=1,2,..,l$, then find out the max value of $\phi (w)$.

##### Lagrange Multipliers

Define our L expression:
$L(w,b,\alpha )=\frac { 1 }{ 2 } w^{ T }w-\sum _{ i=1 }^{ l }{ \alpha _{ i }[y_{ i }(w^{ i }x_{ i }+b)-1] }$
1) Get relationship of w and $\alpha$
We let $\frac { \partial L }{ \partial w } =0\quad$, where we have
$\frac { \partial L }{ \partial w } =w-\sum _{ i=1 }^{ l }{ \alpha _{ i }y_{ i }x_{ i } } =0$
Then we will get $w=\sum _{ i=1 }^{ l }{ \alpha _{ i }y_{ i }x_{ i } }$.

* *More detailed: how to get $\frac { \partial w^Tw}{ \partial w }$ (the first term of the above equation)
Recall $w^T=[w_1,w_2,..w_l]$, then $w^Tw=w_1^2+w_2^2+..=m$, then:
$\frac { \partial m}{ \partial w } = [\frac { \partial m }{ \partial w_{ 1 } } ,\frac { \partial m }{ \partial w_{ 2 } } ,..\frac { \partial m }{ \partial w_{ l } } ]^T = [w_1,w_2,..,w_l]^T=w$.

2) Get relationship of b and $\alpha$
Let $\frac { \partial L }{ \partial b } =0\quad$.
$\frac { \partial L }{ \partial b } = \sum \alpha_iy_i = 0$, but we found it is totally eliminating b.

Then the problem becomes:
$\phi (w)=\frac { 1 }{ 2 } w^{ T }w = \frac { 1 }{ 2 } (\sum _{ i=1 }^{ l }{ \alpha _{ i }x_{ i }y_{ i } } )^{ T }(\sum _{ j=1 }^{ l }{ \alpha _{ j }x_{ j }y_{ j } } )$
The job is to find out the $\alpha_i$ when we have the max value of $\phi (w)$. Also, very easy to deal with the second term of the L expression.