Two-sample problem (1): Parzen Windows, Maximum Mean Discrepancy

There is a nice tutorial from Alex. I expanded the math part to show you more details. I wrote the equations in LaTeX and posted them as screenshots.

Parzen Windows

Suppose we have two sets of samples drawn from two distributions, p and q. How do we tell whether they differ?
We want to test whether q = p. We can either estimate \hat { p } and \hat { q } from the observations, or measure a distance between the distributions directly and look at the difference.
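Concretely (a sketch of the standard construction; the notation is my own: samples X=\{x_1,\dots,x_m\}\sim p, samples Y=\{y_1,\dots,y_n\}\sim q, and a smoothing window \kappa), the Parzen window estimates are

\hat{p}(x)=\frac{1}{m}\sum_{i=1}^{m}\kappa(x-x_i),\qquad \hat{q}(x)=\frac{1}{n}\sum_{j=1}^{n}\kappa(x-y_j)

and a natural distance between them is the squared L^2 norm, which expands into three terms:

D(\hat{p},\hat{q})=\int\big(\hat{p}(x)-\hat{q}(x)\big)^2\,dx=\int\hat{p}^2\,dx-2\int\hat{p}\,\hat{q}\,dx+\int\hat{q}^2\,dx

The first term reduces to pairwise window overlaps:

\int\hat{p}(x)^2\,dx=\frac{1}{m^2}\sum_{i,i'}\int\kappa(x-x_i)\,\kappa(x-x_{i'})\,dx=\frac{1}{m^2}\sum_{i,i'}k(x_i,x_{i'})

where k=\kappa\star\kappa is the convolution of the window with itself.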



The other two terms can be simplified in the same way:
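For instance, with the Parzen estimates \hat{p}(x)=\frac{1}{m}\sum_i\kappa(x-x_i) and \hat{q}(x)=\frac{1}{n}\sum_j\kappa(x-y_j) (a sketch of the standard computation, with k=\kappa\star\kappa):

\int\hat{p}(x)\,\hat{q}(x)\,dx=\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}k(x_i,y_j),\qquad \int\hat{q}(x)^2\,dx=\frac{1}{n^2}\sum_{j,j'}k(y_j,y_{j'})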


So, finally we will reach:
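Putting the three terms together (again a sketch under the notation above: m samples x_i from p, n samples y_j from q, and k=\kappa\star\kappa):

D(\hat{p},\hat{q})=\frac{1}{m^2}\sum_{i,i'}k(x_i,x_{i'})-\frac{2}{mn}\sum_{i,j}k(x_i,y_j)+\frac{1}{n^2}\sum_{j,j'}k(y_j,y_{j'})

Note the three-kernel-sum shape; the MMD estimate we derive next has exactly the same structure.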


Maximum Mean Discrepancy (MMD)

D(p,q,\mathcal{F}) =\sup_{f\in \mathcal{F}} \big( E_p[f(x)]-E_q[f(y)] \big)

It is defined as the supremum of the set S, where S contains the differences between expectations. The supremum of S is its least upper bound: an element $a$ such that m \le a for every element m in S, and no smaller element has this property. For example, \sup\{1,2,3\} = 3.

In a Reproducing Kernel Hilbert Space \mathcal{H} with a universal kernel, we have the theorem that D(p,q,\mathcal{F}) = 0 iff p=q, when \mathcal{F} is the unit ball in \mathcal{H}.
I am not going to prove it here (well, because I don’t know how 😦 ). But for intuition, let us understand it this way: if p=q, the two sets of samples come from exactly the same distribution, so there is no doubt that the discrepancy is zero. If p\neq q, we map the data into a feature space (the RKHS), and there we can always find a function f that exposes a nonzero mean discrepancy.

In other words, given the two distributions, we can compute a squared distance between their embeddings in the RKHS. The goal then is to estimate:
MMD(P,Q) = ||\mu_p-\mu_q||^2_\mathcal{H}

Replacing with \mu_p and \mu_q

Now I will explain \mu_p and \mu_q.
We have a function \phi(x) that maps the data into a feature space F; for example, the quadratic features \phi(x)= (1,x,x^2). An advantage of kernels is that there is no need to compute \phi(x) explicitly, because of eq2: \langle\phi(x),\phi(x')\rangle=k(x,x').
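As a quick numerical check of eq2 (a sketch; I use a \sqrt{2}-scaled variant of the quadratic features so that the inner product matches the degree-2 polynomial kernel (1+xx')^2 exactly):

```python
import numpy as np

def phi(x):
    # Explicit feature map; the sqrt(2) scaling makes the inner product
    # reproduce the degree-2 polynomial kernel (1 + x*x')**2 exactly.
    return np.array([1.0, np.sqrt(2.0) * x, x ** 2])

def k(x, xp):
    # The same quantity via the kernel, with no explicit feature map.
    return (1.0 + x * xp) ** 2

x, xp = 0.7, -1.3
lhs = float(phi(x) @ phi(xp))   # <phi(x), phi(x')>
rhs = k(x, xp)                  # k(x, x')
```

Both sides agree to machine precision, so the feature space never has to be materialized when only inner products are needed.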

In the RKHS with kernel k, evaluation takes the form of eq3: f(x)= \langle k(x,\cdot ),f \rangle, known as the reproducing property. In particular, the feature map can be taken to be the kernel function itself, \phi(x) = k(x,\cdot). Plugging this into eq2, we finally have eq4:
\langle k(x,\cdot),k(x',\cdot) \rangle=k(x,x')

E_p[f(x)]=E_p[\langle k(x,\cdot),f \rangle]= \langle E_p[k(x,\cdot)],f \rangle = \langle \mu_p,f \rangle (using eq3 and the linearity of the inner product and expectation)
We write \mu_p for E_p[k(x,\cdot)], the mean embedding of p; \mu_q is defined analogously.

Optimization Problem

Then let’s take a deep breath and start optimizing.

We substitute \mu_p and \mu_q (with \mathcal{F} the unit ball, \|f\|_\mathcal{H}\le 1):
\sup_{f\in \mathcal{F}} E_p[f(x)]-E_q[f(y)] = \sup_{\|f\|_\mathcal{H}\le 1}\langle \mu_p-\mu_q,f \rangle = ||\mu_p-\mu_q||_\mathcal{H}
(the supremum is attained at f = (\mu_p-\mu_q)/\|\mu_p-\mu_q\|_\mathcal{H}, by Cauchy–Schwarz)

Add kernels and calculate the squared distance:
MMD(P,Q) = ||\mu_p-\mu_q||^2_\mathcal{H}
= \langle \mu_p-\mu_q,\mu_p-\mu_q \rangle
=\langle \mu_p,\mu_p -\mu_q \rangle -\langle \mu_q,\mu_p-\mu_q \rangle
=\langle \mu_p,\mu_p \rangle-2\langle \mu_p,\mu_q \rangle + \langle \mu_q,\mu_q \rangle
(Remember \langle a,b \rangle = \langle b,a \rangle)

We take the first term (the third is analogous):
\langle \mu_p,\mu_p \rangle
=\langle E_p[k(x,\cdot)], E_p[k(x',\cdot)]\rangle
=E_{p,p}[k(x,x')]
(use eq4; here x and x' are independent draws from p)

The second term:
\langle \mu_p,\mu_q \rangle
=\langle E_p[k(x,\cdot)], E_q[k(y,\cdot)]\rangle
=E_{p,q}[k(x,y)]
(use eq4)

Finally, ||\mu_p-\mu_q||^2_\mathcal{H} = E_{pp}[k(x,x')]-2E_{pq}[k(x,y)]+E_{qq}[k(y,y')], which is the squared D(p,q,\mathcal{F}). The goal is to estimate this quantity.

That means we do not need to know the real distributions p and q (usually we do not have them!). We can estimate the MMD directly from i.i.d. data drawn from the two datasets. If we instead want estimates of the densities p and q themselves, we can use Parzen windows, as above.
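As an illustration, here is a minimal sketch of the (biased) plug-in estimate of the squared MMD with a Gaussian kernel; the function names and the bandwidth choice are my own, not from the tutorial:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix: k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2)).
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2_biased(x, y, sigma=1.0):
    # Plug-in estimate of E_pp[k(x,x')] - 2 E_pq[k(x,y)] + E_qq[k(y,y')],
    # replacing each expectation by the mean over all sample pairs.
    return (gaussian_kernel(x, x, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean())

rng = np.random.default_rng(0)
same = mmd2_biased(rng.normal(0.0, 1.0, (500, 1)), rng.normal(0.0, 1.0, (500, 1)))
diff = mmd2_biased(rng.normal(0.0, 1.0, (500, 1)), rng.normal(2.0, 1.0, (500, 1)))
# 'same' should be close to zero; 'diff' should be clearly positive.
```

Note that only kernel evaluations on the samples are needed: no density estimation, and no explicit feature map.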

Published by Irene

Keep calm and update blog.
