Previously, we introduced Autoencoders and Hierarchical Variational Autoencoders (HVAEs). In this post, we will cover the details of Denoising Diffusion Probabilistic Models (DDPM).
Diffusion Models
We can treat DDPM as a restricted HVAE. Here, each $x_t$ only depends on $x_{t-1}$. In DDPM, the forward (noising) process has no learnable parameters; it is a predefined linear Gaussian model. This brings a computational convenience: we can obtain any arbitrary $x_t$ quickly.
As shown in the following image, DDPM has two phases: 1) forward diffusion: noise is added to an input image step by step, and after $T$ steps the image becomes pure noise; 2) reverse process: we try to generate the original input image from the noised version at $t = T$.
Forward Diffusion: $q(x_{1:T} \mid x_0)$
As in an HVAE, each step uses a linear Gaussian to add noise to the output of the previous step. So the forward process from time step $0$ to $T$ is:

$$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}),$$

and for each step we have:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\big).$$
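As a concrete sketch (NumPy; the toy 4×4 "image" and the `beta` value are illustrative assumptions, not from the post), one forward step just scales the previous sample and adds Gaussian noise:

```python
import numpy as np

def forward_step(x_prev, beta, rng):
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta) * x_{t-1}, beta * I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta) * x_prev + np.sqrt(beta) * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))          # a toy "image"
x1 = forward_step(x0, beta=0.02, rng=rng) # one noising step
```

Note that with $\beta_t = 0$ the step is the identity, and with $\beta_t = 1$ the output is pure noise, matching the schedule discussion below.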
Note that $\sqrt{1-\beta_t}\, x_{t-1}$ is the mean and $\beta_t \mathbf{I}$ is the variance at step $t$, where $\beta_t$ ranges from 0 to 1. Let's first look at the variance term $\beta_t$. Because at time step $T$ we should have exactly $x_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, one common choice is to start with a small value and increase it: $\beta_1 < \beta_2 < \cdots < \beta_T$. The mean coefficient $\sqrt{1-\beta_t}$ then follows the reversed (decreasing) trend.
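For instance, a linear schedule can be sketched as follows (the endpoints $10^{-4}$ and $0.02$ and $T = 1000$ follow the original DDPM paper; they are not specified in this post):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # beta_t increases with t
alphas = 1.0 - betas                # mean coefficient sqrt(alpha_t) decreases
alpha_bars = np.cumprod(alphas)     # abar_t = prod_{s<=t} alpha_s

# At t = T, alpha_bar is close to 0, so x_T is (almost) pure noise.
print(alpha_bars[0], alpha_bars[-1])
```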
As I mentioned before, such a definition makes it possible to obtain a sample at any arbitrary forward step $t$ directly from $x_0$, because the sum of independent Gaussians is still a Gaussian:

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\, x_0,\ (1-\bar\alpha_t)\mathbf{I}\big),$$

where $\alpha_t = 1 - \beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$.
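With $\bar\alpha_t$ precomputed, sampling $x_t$ from $x_0$ in one shot is a one-liner; a minimal sketch (the schedule values are illustrative assumptions):

```python
import numpy as np

def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in one shot."""
    abar = alpha_bars[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps

rng = np.random.default_rng(0)
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
x0 = rng.standard_normal((4, 4))
xt = q_sample(x0, t=500, alpha_bars=alpha_bars, rng=rng)
```

This is the computational convenience mentioned at the start: no need to iterate through the first $t$ steps one by one.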
Reverse Process: $p_\theta(x_{0:T})$
While the forward diffusion is unparameterized, the reverse process is parameterized by $\theta$ (omitted in the image). So we define the following:

$$p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t).$$

Given that $x_T$ is pure noise, $p(x_T) = \mathcal{N}(x_T;\ \mathbf{0},\ \mathbf{I})$, so we do not have $\theta$ in this term.
The issue is how to define the objective function. Maximizing $\log p_\theta(x_0)$ directly is intractable, since it would require integrating over all possible trajectories $x_{1:T}$. As with VAEs, we instead maximize an evidence lower bound (ELBO):

$$\log p_\theta(x_0) \ge \mathbb{E}_{q(x_1 \mid x_0)}\big[\log p_\theta(x_0 \mid x_1)\big] - D_{\mathrm{KL}}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big) - \sum_{t=2}^{T} \mathbb{E}_{q(x_t \mid x_0)}\Big[D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)\Big].$$
Similar to VAEs, the first term here is the reconstruction loss and the last term is the consistency (denoising matching) loss. We can drop the middle term at step $T$: since $p(x_T)$ is a known distribution with no learnable parameters, we simply ignore it. Since the reverse steps are also Gaussians, we have:
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 \mathbf{I}\big).$$
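Each reverse step then just samples from this Gaussian. A minimal sketch, where `mu_theta` is a hypothetical stand-in for the learned mean network (in practice a neural network), and the value of `sigma_t` is an illustrative assumption:

```python
import numpy as np

def mu_theta(x_t, t):
    # Hypothetical stand-in for the learned mean network mu_theta(x_t, t).
    return 0.9 * x_t

def reverse_step(x_t, t, sigma_t, rng):
    """Sample x_{t-1} ~ p_theta(x_{t-1} | x_t) = N(mu_theta(x_t, t), sigma_t^2 I)."""
    z = rng.standard_normal(x_t.shape)
    return mu_theta(x_t, t) + sigma_t * z

rng = np.random.default_rng(0)
x_t = rng.standard_normal((4, 4))
x_prev = reverse_step(x_t, t=10, sigma_t=0.1, rng=rng)
```

Iterating this step from $t = T$ down to $t = 1$ turns pure noise into a generated sample.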
Unlike VAEs, we learn the mean $\mu_\theta(x_t, t)$ but fix the variance $\sigma_t^2$. After reparameterization, the loss reduces to predicting the noise $\epsilon$ ($w_t$ is a weight at step $t$):

$$L = \sum_{t=1}^{T} w_t\, \mathbb{E}_{x_0,\ \epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon,\ t\big)\big\|^2\Big].
$$
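A single training step of this objective can be sketched as follows, with $w_t = 1$ (as in the DDPM "simple" loss); `eps_theta` is a hypothetical placeholder for the noise-prediction network, and the schedule values are illustrative assumptions:

```python
import numpy as np

def eps_theta(x_t, t):
    # Hypothetical stand-in for the noise-prediction network epsilon_theta.
    return np.zeros_like(x_t)

def simple_loss(x0, t, alpha_bars, rng):
    """|| eps - eps_theta(sqrt(abar_t) x0 + sqrt(1 - abar_t) eps, t) ||^2 with w_t = 1."""
    eps = rng.standard_normal(x0.shape)
    abar = alpha_bars[t]
    x_t = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps  # one-shot forward sample
    return np.mean((eps - eps_theta(x_t, t)) ** 2)

rng = np.random.default_rng(0)
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
x0 = rng.standard_normal((4, 4))
t = int(rng.integers(1000))  # in practice, t is sampled uniformly per example
loss = simple_loss(x0, t=t, alpha_bars=alpha_bars, rng=rng)
```

Note how the one-shot forward sample $q(x_t \mid x_0)$ derived earlier is exactly what makes this per-step training cheap: each minibatch draws a random $t$ instead of simulating the whole chain.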
There are a few small tricks needed to obtain this simplification, and we will include the details in a slide file. Stay tuned!
References
[1] https://youtu.be/fbLgFrlTnGU
[2] Understanding Diffusion Models: A Unified Perspective (Calvin Luo)