
Deep Learning 01: Basics + NN for Handwritten Digit Recognition


These are my learning notes from the book!

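Until 2006, people didn't know how to train neural networks well enough to beat more traditional approaches. In 2006, techniques for learning in deep neural networks were discovered (deep learning, D.L.).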

Applications: computer vision, speech recognition, NLP, classification of music collections. Deployed at large scale by Google, Microsoft and Facebook.

Chap 1.

NN approach: take a large number of handwritten digits, known as training examples, and let the network learn the rules from them. With a larger training set, the network can learn more features and reach higher accuracy.

The example:

76 lines of Python reach about 96% accuracy, and later improvements bring it up to 99%.

Later on:

Two important types of artificial neuron: the perceptron and the sigmoid neuron.

A standard learning algorithm for NNs: stochastic gradient descent.


Inputs & outputs of a perceptron:

output = 0 if w*x + b <= 0, and output = 1 if w*x + b > 0

We use a bias b (the negative of the threshold) instead of a threshold.
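The perceptron rule above can be sketched in a few lines of Python. This is only an illustration: the function name `perceptron` and the example weights (-2, -2) with bias 3, which make the neuron behave like a NAND gate, are assumptions chosen for the demo.

```python
def perceptron(x, w, b):
    """Return 1 if w . x + b > 0, otherwise 0."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0


# Illustrative weights and bias: with these values the perceptron
# computes NAND of its two binary inputs.
w = [-2, -2]
b = 3

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, w, b))
```

Only the input (1, 1) gives w*x + b = -1 <= 0, so it is the only input whose output is 0.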

Sigmoid neurons:

In the real world we start from raw data, and we'd like the network to learn the weights and biases itself.

Results & feedback. (TO LEARN)

Each input xi can take any value between 0 and 1.

Sigmoid function (see Baidu Baike): sigma(z) = 1 / (1 + e^(-z))
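A sigmoid neuron can be sketched like this, assuming the standard logistic function sigma(z) = 1 / (1 + e^(-z)) applied to w*x + b. The function names and the example numbers are only for illustration.

```python
import math


def sigmoid(z):
    """Standard logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_neuron(x, w, b):
    """Output of one sigmoid neuron for inputs x, weights w and bias b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# Illustrative values: a small change in a weight gives a small change
# in the output, instead of the sudden 0/1 flip of a perceptron.
x = [0.2, 0.9]
print(sigmoid_neuron(x, [0.5, -0.4], 0.1))
print(sigmoid_neuron(x, [0.5, -0.41], 0.1))
```

This smoothness is what makes it possible to learn by nudging the weights and biases a little at a time.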

The Architecture of Neural Networks

MLP: Multilayer perceptrons.

For a 64 x 64 grayscale image we need 4096 (64 x 64) input neurons, each with a value between 0 and 1. The output is a single neuron with a threshold of 0.5 (is it the digit or not?).

How to design the hidden layers? By heuristics.

Feedforward: the output of one layer is the input of the next layer. No loops.

RNN (recurrent neural networks): feedback loops are possible.

A simple network to classify handwritten digits.

1. First break the image into separate digits. (Segmentation is the easier part.)

2. Recognize each digit individually. (This is the focus here!)

The 3-layer neural network:

Inputs: 784 (28 x 28) pixels, where 0.0 is white and 1.0 is black

Hidden: n neurons, where n can vary

Outputs: 10 (output 0 for the number 0, and so on). In principle 4 outputs would be enough, since 2^4 = 16.
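The 784 / n / 10 layout can be sketched as a feedforward pass in plain Python. This is not the book's code: the helper names, the random Gaussian initialization, and the choice of n = 15 hidden neurons are assumptions for illustration, and the network is untrained, so the predicted digit is meaningless.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and n_out biases."""
    weights = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.gauss(0, 1) for _ in range(n_out)]
    return weights, biases


def feedforward(x, layers):
    """Feed the input through each layer; one layer's output is the next layer's input."""
    a = x
    for weights, biases in layers:
        a = [sigmoid(sum(w * ai for w, ai in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    return a


rng = random.Random(0)
sizes = [784, 15, 10]  # n = 15 hidden neurons is an arbitrary choice
layers = [make_layer(n_in, n_out, rng)
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = [rng.random() for _ in range(784)]  # stand-in for a 28 x 28 image
output = feedforward(x, layers)

# The predicted digit is the index of the most active output neuron.
digit = max(range(10), key=lambda i: output[i])
print(len(output), digit)
```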

Learning with gradient descent 

Data set: MNIST

Training set: 60,000 images

Testing set: 10,000 images

Cost function.

w: collection of weights

b: all biases

n: number of training inputs

a: vector of outputs

x: vector of inputs

C: the cost, a mean squared error (MSE); smaller is better
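With these symbols, the quadratic cost is usually written as C(w, b) = 1/(2n) * sum over x of ||y(x) - a||^2, where y(x) is the desired output for input x (the name y(x) is not in the list above and is assumed here). A minimal sketch of that computation, with a hypothetical function name:

```python
def quadratic_cost(outputs, targets):
    """Mean squared error: 1/(2n) * sum of squared distances ||y - a||^2.

    outputs: list of network output vectors a, one per training input
    targets: list of desired output vectors y, in the same order
    """
    n = len(outputs)
    total = 0.0
    for a, y in zip(outputs, targets):
        total += sum((yi - ai) ** 2 for yi, ai in zip(y, a))
    return total / (2 * n)


# Perfect outputs give zero cost; worse outputs give a larger cost.
print(quadratic_cost([[0.0, 1.0]], [[0.0, 1.0]]))
print(quadratic_cost([[0.5, 0.5]], [[0.0, 1.0]]))
```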

The goal is to find the values of the weights and biases that minimize C, so we need gradient descent.
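A minimal sketch of gradient descent, the method named in this section's heading: repeatedly move the variables a small step against the gradient, v -> v - eta * grad C(v). To keep it short, the cost here is a toy function C(v) = v1^2 + v2^2 rather than the network's cost, and all names are assumptions.

```python
def gradient_descent(grad, v, eta, steps):
    """Repeat the update v -> v - eta * grad(v) for a fixed number of steps."""
    for _ in range(steps):
        g = grad(v)
        v = [vi - eta * gi for vi, gi in zip(v, g)]
    return v


def grad_C(v):
    """Gradient of the toy cost C(v) = v1^2 + v2^2."""
    return [2 * vi for vi in v]


# A small learning rate walks down to the minimum at (0, 0).
print(gradient_descent(grad_C, [3.0, -4.0], eta=0.1, steps=100))

# A learning rate that is too large overshoots and moves away from it.
print(gradient_descent(grad_C, [3.0, -4.0], eta=1.5, steps=10))
```

For a real network the same update is applied to every weight and bias, with the gradient estimated from training examples.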


After training, that is, once a good set of weights and biases has been found for a network, it can easily be ported to run in a web browser or as a native app on a mobile device.

Run the code:

>>> import mnist_loader

>>> import network

>>> training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

>>> net = network.Network([784, 100, 10])

>>> net.SGD(training_data, 30, 10, 3.0, test_data=test_data)

30 -> number of epochs, 10 -> mini-batch size, 3.0 -> learning rate (for gradient descent)
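In the SGD call above, the arguments 30 and 10 are the number of epochs and the mini-batch size. The sketch below shows only how one epoch could shuffle the data and cut it into mini-batches. It is not the code from network.py: the function name and the toy data set of 25 items are assumptions.

```python
import random


def mini_batches(training_data, mini_batch_size, rng):
    """Shuffle a copy of the data and split it into consecutive mini-batches."""
    data = list(training_data)
    rng.shuffle(data)
    return [data[k:k + mini_batch_size]
            for k in range(0, len(data), mini_batch_size)]


rng = random.Random(0)
toy_data = list(range(25))  # stand-in for the (image, label) pairs
epochs = 3

for epoch in range(epochs):
    batches = mini_batches(toy_data, 10, rng)
    # A real SGD step would update the weights once per mini-batch here.
    print("Epoch", epoch, "batch sizes:", [len(b) for b in batches])
```

Each epoch sees every training example once, in a different random order, and the last mini-batch is smaller when the data set size is not a multiple of the mini-batch size.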

Different learning rates can result in different accuracy.

We can also change the hidden layer size and the number of epochs. (When I used 100 instead of 10, the accuracy generally rose from about 80% to about 90%, but I didn't measure the running time.)

Compared to random guessing (the simplest baseline, about 10% for ten digits), we do much better!

sophisticated algorithm ≤ simple learning algorithm + good training data.

About Deep Learning:

A good example:

More hidden nodes, more detailed questions:

Networks with 5-10 hidden layers do far better than networks with only 1.


Keep calm and update blog.
