Deep Learning 03: about Training

Shallow Learning & Deep Learning

In the late 20th century, backpropagation (BP) was applied to neural networks, bringing new hope for machine learning. It allows a neural network to learn patterns from a large number of training samples in order to make predictions, and it rests on a statistical foundation. At the time, such a machine was called a multi-layer perceptron, although it actually had only one hidden layer. During the 1990s, many machine learning models were proposed: SVM (Support Vector Machines), Boosting, Logistic Regression, etc.
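The BP training loop on such a single-hidden-layer perceptron can be sketched as follows. This is a toy numpy example on XOR; the layer sizes, learning rate, and epoch count are illustrative choices, not from any particular paper:

```python
import numpy as np

# Minimal sketch of BP on a "multi-layer perceptron" with one hidden
# layer, trained on XOR. All hyper-parameters here are illustrative.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # input  -> hidden
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses, lr = [], 1.0
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)             # forward pass
    out = sigmoid(h @ W2 + b2)
    losses.append(np.mean((out - y) ** 2))
    d_out = (out - y) * out * (1 - out)  # error at the output layer...
    d_h = (d_out @ W2.T) * h * (1 - h)   # ...propagated back to the hidden layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)   # gradient-descent update
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)
```

The key statistical idea is visible in the loop: the error between prediction and label is pushed backwards through the network, and each weight is nudged in the direction that reduces that error.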

In 2006, Prof. Geoffrey Hinton and his students published a paper arguing that: 1. multiple hidden layers give a network more capacity to learn features; 2. the resulting training problems can be solved by layer-wise pre-training (unsupervised learning).

Most classification and regression algorithms are based on shallow learning. Deep learning provides a non-linear structure with the capability of approximating complicated target functions and learning features from limited samples. With multiple layers, one can represent a complicated function with fewer parameters.

Training NNs

Traditional training methods cannot be applied directly to deep neural networks. If BP is applied to a network with many layers, the results are poor. The problems are:

1) Vanishing gradient: from the top (output) layer back to the root (input) layers, the error signal becomes smaller and smaller.

2) Generally, BP can only learn from tagged or labelled data, unlike real brains.

3) Exploding gradient: conversely, the gradient can grow much larger in the earlier layers.
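A quick numeric sketch of problems 1) and 3): the sigmoid derivative is at most 0.25, so each layer multiplies the backpropagated signal by roughly 0.25 times the weight magnitude. With moderate weights this factor is below 1 and the gradient vanishes geometrically with depth; with large weights it exceeds 1 and explodes. The depth and the two weight scales below are illustrative:

```python
def grad_scale(weight, depth):
    """Rough per-layer gradient factor (max |sigmoid'| * |weight|),
    compounded over `depth` layers."""
    per_layer = 0.25 * abs(weight)
    return per_layer ** depth

vanish = grad_scale(weight=1.0, depth=10)    # 0.25**10, about 1e-6
explode = grad_scale(weight=20.0, depth=10)  # 5.0**10, about 1e7
```

With ten layers the gradient reaching the root is either a millionth of the output error or millions of times larger, which is why plain BP struggles as depth grows.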

Training in Deep Learning

The cost would be extremely high if all the hidden layers were trained at once. If the layers are trained one by one with BP alone, the errors are passed along and grow larger.

Prof. Hinton proposed a method in 2006. The idea has two steps: train one layer at a time, then refine the whole network using the wake-sleep algorithm.

The steps:

1) From the root (input) to the top (output): unsupervised learning.

Training each layer (obtaining its parameters) can be treated as an unsupervised learning process, i.e. a feature-learning step.

More specifically, training the first layer can be regarded as training the hidden layer of a 3-layer network. After training the (n-1)th layer, its output is used as the input to the nth layer.

2) From the top to the root: supervised learning. (Errors are passed down towards the root, refining the whole net.)

Thanks to the feature learning in the first step, the initial parameters are not random but already quite reasonable, so the network starts much closer to a good result.
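The two steps above can be sketched in code. This is a simplified numpy sketch that pre-trains each layer as a plain auto-encoder with tied weights; Hinton's actual 2006 method used RBMs and the wake-sleep algorithm, and all sizes and hyper-parameters here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, lr=0.5, epochs=500):
    """Step 1 (unsupervised): train one auto-encoder layer to reconstruct X."""
    n_in = X.shape[1]
    W = rng.normal(0, 0.1, (n_in, n_hidden))
    b, c = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        h = sigmoid(X @ W + b)        # encode
        r = sigmoid(h @ W.T + c)      # decode with tied weights
        d_r = (r - X) * r * (1 - r)
        d_h = (d_r @ W) * h * (1 - h)
        W -= lr * (X.T @ d_h + (h.T @ d_r).T) / len(X)
        b -= lr * d_h.mean(0)
        c -= lr * d_r.mean(0)
    return W, b

# From the root to the top: train layers one at a time; each layer's
# codes become the next layer's input.
X = rng.random((64, 16))              # toy unlabelled data
inputs, stack = X, []
for n_hidden in (8, 4):
    W, b = pretrain_layer(inputs, n_hidden)
    stack.append((W, b))
    inputs = sigmoid(inputs @ W + b)

# Step 2 (supervised, top to root) would now run ordinary BP with labels
# through the whole stack, starting from these pre-trained weights
# instead of random ones.
```

The point of the design is visible in the loop: each layer only ever solves a shallow, 3-layer problem, so the vanishing-gradient issue of deep BP never arises during pre-training.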

Deep Learning Algorithms (update soon)
Deep Boltzmann Machine (DBM)
Deep Belief Networks (DBN)
Convolutional Neural Network (CNN)
Stacked Auto-Encoders


Published by Irene

Keep calm and update blog.
