# Two-Sample Problem (1): Parzen Windows, Maximum Mean Discrepancy

There is a nice tutorial from Alex. I expanded the math part to show you more details. I typeset it in LaTeX and posted screenshots.

#### Parzen Windows

Suppose we have two sets of samples drawn from two distributions p and q, and we want to test whether q = p. We can either estimate densities $\hat { p }$ and $\hat { q }$ from the observations, or directly measure a distance between the two sample sets and see the difference.
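As a concrete sketch of the density-estimation route — a minimal 1-D example with Gaussian windows; the bandwidth `h = 0.3`, the sample sizes, and the function name `parzen_estimate` are all my own illustrative choices:

```python
import numpy as np

def parzen_estimate(data, grid, h):
    # Parzen-window density estimate with a Gaussian window:
    # p_hat(t) = (1/m) * sum_i N(t; x_i, h^2)
    data = np.asarray(data)[:, None]   # shape (m, 1)
    grid = np.asarray(grid)[None, :]   # shape (1, g)
    w = np.exp(-0.5 * ((grid - data) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return w.mean(axis=0)              # average the windows over the m samples

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=500)     # samples from p
y = rng.normal(0.5, 1.0, size=500)     # samples from q (mean shifted)

grid = np.linspace(-5.0, 6.0, 400)
p_hat = parzen_estimate(x, grid, h=0.3)
q_hat = parzen_estimate(y, grid, h=0.3)

# Squared L2 distance between the two estimated densities (Riemann sum)
dist = ((p_hat - q_hat) ** 2).sum() * (grid[1] - grid[0])
print(dist)
```

The distance comes out clearly positive here because q is shifted; for two sample sets from the same distribution it would shrink toward zero as the sample sizes grow.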

With Parzen windows, each density is estimated by averaging a smoothing kernel over the samples, e.g. $\hat{p}(x)=\frac{1}{m}\sum_{i=1}^{m}\kappa(x_i,x)$ and $\hat{q}(x)=\frac{1}{n}\sum_{j=1}^{n}\kappa(y_j,x)$, and the two estimates can be compared through the squared $L_2$ distance $\|\hat{p}-\hat{q}\|^2=\langle\hat{p},\hat{p}\rangle-2\langle\hat{p},\hat{q}\rangle+\langle\hat{q},\hat{q}\rangle$. Expanding the first term gives a double sum of kernel evaluations over the samples from p; the other two terms can be simplified in the same way. So, finally, we reach a statistic built entirely from pairwise kernel evaluations between the observed samples.

#### Maximum Mean Discrepancy (MMD)

##### Definition

$D(p,q,\mathcal{F}) =\sup_{f\in \mathcal{F}} E_p[f(x)]-E_q[f(y)]$

It is defined as the supremum of the set S of differences between expectations. The supremum of S is its least upper bound: an element $a$ such that $m \le a$ for every element m in S, and no smaller number has this property. For example, $\sup\{1,2,3\}$ = 3.
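A quick illustration in plain Python (the sets here are just toy examples of mine): for a finite set the supremum is attained and equals the maximum, while an infinite set can approach its supremum without ever containing it.

```python
# Finite set: the supremum is attained, so it equals the maximum.
S = {1, 2, 3}
print(max(S))          # 3

# The set {1 - 1/n : n >= 1} has supremum 1, but no element reaches it:
# every element stays strictly below 1.
approx = [1 - 1 / n for n in range(1, 10_000)]
print(max(approx))     # close to, but strictly less than, 1
```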

In a Reproducing Kernel Hilbert Space $\mathcal{H}$ that is universal, we have the theorem that $D(p,q,\mathcal{F}) = 0$ iff $p=q$, when $\mathcal{F}$ is the unit ball in the space.
I am not going to prove it here (because I don’t know how to 😦 ). But for intuition, think of it this way: if $p=q$, the two sets of samples come from exactly the same distribution, and there is no doubt that the discrepancy is zero. If $p \neq q$, we map the data into a feature space (the RKHS); if that space is rich enough, we can always find a mapping function $f$ whose expectations under $p$ and $q$ differ, which yields a nonzero mean discrepancy.

In other words, given the two distributions we obtain a squared distance between their embeddings in the RKHS. The goal for us is then to estimate: $MMD(P,Q) = ||\mu_p-\mu_q||^2_\mathcal{H}$

##### Replacing with $\mu_p$ and $\mu_q$

Now I will explain $\mu_p$ and $\mu_q$.
We have a function $\phi(x)$ that maps the data into a feature space $\mathcal{F}$; for example, the quadratic features $\phi(x)= (1,x,x^2)$. An advantage of kernels is that there is no need to compute $\phi(x)$ explicitly, because of eq2: $\langle\phi(x),\phi(x')\rangle=k(x,x')$.
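To see eq2 concretely, here is a small check (my own toy example) using the slightly rescaled quadratic features $\phi(x)=(1,\sqrt{2}x,x^2)$, whose inner product matches the degree-2 polynomial kernel $k(x,x')=(1+xx')^2$ exactly:

```python
import numpy as np

def phi(x):
    # Explicit quadratic feature map; the sqrt(2) weight on the linear
    # term makes <phi(x), phi(x')> equal (1 + x*x')^2 exactly.
    return np.array([1.0, np.sqrt(2.0) * x, x ** 2])

def k(x, xp):
    # Degree-2 polynomial kernel: no explicit features needed.
    return (1.0 + x * xp) ** 2

x, xp = 1.5, -0.7
lhs = phi(x) @ phi(xp)   # explicit inner product in feature space
rhs = k(x, xp)           # kernel trick
print(lhs, rhs)          # the two agree
```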

In the RKHS with kernel $k$, function evaluation takes the form of eq3: $f(x)= \langle k(x,\cdot ),f \rangle$. This is called the reproducing property. Taking the feature map to be the kernel function itself, $\phi(x) = k(x,\cdot)$, and substituting into eq2, we get eq4: $\langle k(x,\cdot),k(x',\cdot) \rangle=k(x,x')$.

Applying eq3 inside an expectation then gives $E_p[f(x)]=E_p[\langle k(x,\cdot),f \rangle]= \langle E_p[k(x,\cdot)],f \rangle = \langle \mu_p,f \rangle$, where we write $\mu_p := E_p[k(x,\cdot)]$ (and similarly $\mu_q := E_q[k(y,\cdot)]$).
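An empirical version of $\mu_p$ is easy to form: replace the expectation by a sample average, $\hat{\mu}_p(\cdot)=\frac{1}{m}\sum_i k(x_i,\cdot)$, which is itself a function we can evaluate anywhere. A small sketch with an RBF kernel — the bandwidth, sample size, and the name `mu_p_hat` are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=300)   # samples from p = N(0, 1)

def rbf(a, b, sigma=1.0):
    # Gaussian (RBF) kernel k(a, b)
    return np.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def mu_p_hat(t):
    # Empirical mean embedding evaluated at t:
    # mu_p(t) = E_p[k(x, t)]  ~  (1/m) * sum_i k(x_i, t)
    return rbf(x, t).mean()

# The embedding is large where p puts mass, and decays far away from it.
print(mu_p_hat(0.0), mu_p_hat(5.0))
```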

##### Optimization Problem

Then let’s take a deep breath and start optimizing.

We take $\mu_p$ and $\mu_q$, and note that over the unit ball $\|f\|_\mathcal{H} \le 1$ the inner product $\langle \mu_p-\mu_q,f \rangle$ is maximized by $f = (\mu_p-\mu_q)/\|\mu_p-\mu_q\|_\mathcal{H}$: $\sup_{f\in \mathcal{F}} E_p[f(x)]-E_q[f(y)] = \sup_{\|f\|_\mathcal{H}\le 1}\langle \mu_p-\mu_q,f \rangle = ||\mu_p-\mu_q||_\mathcal{H}$

Add kernels and calculate the squared distance:

$$\begin{aligned} MMD(P,Q) = ||\mu_p-\mu_q||^2_\mathcal{H} &= \langle \mu_p-\mu_q,\mu_p-\mu_q \rangle \\ &=\langle \mu_p,\mu_p -\mu_q \rangle -\langle \mu_q,\mu_p-\mu_q \rangle \\ &=\langle \mu_p,\mu_p \rangle-2\langle \mu_p,\mu_q \rangle + \langle \mu_q,\mu_q \rangle \end{aligned}$$
(Remember $\langle a,b \rangle = \langle b,a \rangle$)

We take the first term (same as the third): $\langle \mu_p,\mu_p \rangle$ $=\langle E_p[k(x,\cdot)], E_p[k(x',\cdot)]\rangle$ $=E_{pp}k(x,x')$
(Use eq4)

The second term: $\langle \mu_p,\mu_q \rangle$ $=\langle E_p[k(x,\cdot)], E_q[k(y,\cdot)]\rangle$ $=E_{pq}k(x,y)$
(Use eq4)

Finally, $||\mu_p-\mu_q||^2_\mathcal{H} = E_{pp}k(x,x')-2E_{pq}k(x,y)+E_{qq}k(y,y')$, which is exactly the square of $D(p,q,\mathcal{F})$. The goal is to estimate this quantity.
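The final expression suggests a plug-in estimator: replace each expectation with an average over the corresponding kernel (Gram) matrix. A minimal sketch of this biased (V-statistic) estimate with an RBF kernel — the bandwidth and sample sizes are arbitrary illustrative choices:

```python
import numpy as np

def rbf_gram(a, b, sigma=1.0):
    # Pairwise RBF kernel matrix with entries k(a_i, b_j)
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased (V-statistic) estimate of
    # E_pp k(x,x') - 2 E_pq k(x,y) + E_qq k(y,y')
    kxx = rbf_gram(x, x, sigma).mean()
    kxy = rbf_gram(x, y, sigma).mean()
    kyy = rbf_gram(y, y, sigma).mean()
    return kxx - 2.0 * kxy + kyy

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 500)
y_same = rng.normal(0.0, 1.0, 500)   # same distribution as x
y_diff = rng.normal(1.0, 1.0, 500)   # shifted mean

print(mmd2(x, y_same))  # close to 0
print(mmd2(x, y_diff))  # clearly positive
```

An unbiased variant (the U-statistic) simply drops the diagonal terms of the two within-sample Gram matrices.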

That means we do not need to know the true distributions p and q (usually we do not have them!). We can estimate the MMD directly from i.i.d. samples drawn from each of the two datasets. Alternatively, if we want explicit density estimates $\hat{p}$ and $\hat{q}$, we could use Parzen windows as in the first section.