Sampling Methods

This post contains very basic knowledge of few sampling methods. As I am going to implement a java version of Gibbs Sampling, I went through some materials on the internet, and kept a learning journal here.
Will learn more about Gibbs Sampling in few days, and I will focus on how it involves with Graph Processing.

PS. I found in a time series, RNN (Recurrent Neural Networks) has something in common with Gibbs Sampling.

Background
Stochastic Simulation: Monte Carlo Simulation. 20th 40s.
Idea: given a probability distribution p(x), how to generate the samples?
Uniform(0,1) is easier to generate, to some extend, fake random numbers.

1. Monte Carlo Method
If we want to get derivative of a f(x):

But when f(x) is complicated and not that easy to get derivative, we use MC Method.

q(x): a prob dist. in a range about x.
f(x) : a function.
get n samples from q(x), when n is big enough, we can use avg to approximately get:

So the main thing is how to sampling when given a prob. dist.

2.Uniform Dist, Box-Muller
The fake random numbers of [0,1]…Uniform Dist.
Box-Muller: if U1 and U2 are independent, and U1,U2 ~ Uniform[0,1] :

Then Z0,Z1 ～N(0,1). >> Standard Normal Dist.

3. Monte Carlo Principle
MC sampling calculate Exceptions.
x ~ p(x), so what’s E(x)?
Let’s get samples xi from p(x):

4. Acceptance-Rejection Sampling
In the real world, sometimes it’s hard to sample in P(x).
Use an approximate way to reject some of the sampling, set a proposal diet q(x), maybe a Gaussian Dist.

The steps are as follows:

set a function q(x), and a constant k, which allow p(x) is below kq(x):

– x dimension: samples from q(x) to get a.

– y dimension: samples from Uniform Dist (0,kq(a) ) to get u.

– If the point is in the gray field: u > p(a), then reject; unless accept

– repeat.

In high dimensions, RS is in face of two problems: it is hard to find a suitable q distribution, and it is not easy to find a k.

5. Markov Chain

We have a state matrix:

if the init is: u(x) = (0.5, 0.2, 0.3)
so the next state should be :
u(x)T = (0.18, 0.64, 0.18)
if keep going, then >>> (0.22, 0.41, 0.37)
No matter the starting state, after long way of Markov Chain, all going to the nealy same result.

Python code:
import numpy as np

a = [0.5,0.2,0.3]
b = [(0,1,0),(0,0.1,0.9),(0.6,0.4,0)]
print a,b
c = np.dot(a,b)

for i in range(1,100):
c = np.dot(c,b);

print c

The result is:

(100 times)
You can change the starting of a.

In the end, u(x) will be -> p(x).

马氏链的收敛性质主要由转移矩阵决定, 所以基于马氏链做采样的关键问题是如何构造转移矩阵,使得平稳分布恰好是我们要的分布p(x)

6. Gibbs Sampling

Gibbs sampling is a widely-used MCMC method.
In RBM of Deep Learning, Gibbs Sampling is that when the weight attributes and a variable v are certain, through sampling we can get h. Then through sampling of h, we can get another variable v. Iterate v and h, then it’s going to converges to a joint probability. (It seems like x and y, the 2D example ??? Need to double check!)
In Machine Learning, Gibbs Sampling is used in the inference of training: to calculate derivatives, expectations (average).

MCMC:Monte Carlo Markov Chain

2-D x, y, Gibbs Sampling

For multiple dimensions :

The algorithm is as follows.

An example of 4 D:
(A,B,C,D):

It seems that G.S is used for sampling only.
I am now trying to find out how it work in the inference of Graph Models.