Based on:

http://neuralnetworksanddeeplearning.com/about.html

**These are my learning notes from the book!**

It wasn't until 2006 that people learned how to train deep neural networks (the discovery of deep learning).

Applications: computer vision, speech recognition, NLP, classifying music collections. Used at large scale by Google, Microsoft, and Facebook.

**Chapter 1**

NN approach: take a large number of handwritten digits, known as training examples, and learn rules from them. A larger training set lets the network learn more features and reach higher accuracy.

The example: 76 lines of Python reaching 96%-99% accuracy.

Later on:

Two important types of artificial neuron: the perceptron and the sigmoid neuron.

A standard learning algorithm for NNs: stochastic gradient descent (SGD).

Perceptron:

inputs & outputs:

output = 0 if w·x + b <= 0; output = 1 if w·x + b > 0

We use a bias instead of a threshold: b = -threshold.
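
A minimal sketch of this rule in Python (my own illustration; the NAND weights -2, -2 and bias 3 come from the book's example):

```python
import numpy as np

def perceptron(x, w, b):
    """Perceptron rule: output 1 if w.x + b > 0, otherwise 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# The book's NAND example: weights -2, -2 and bias 3 make the
# perceptron compute the NAND of its two inputs.
w = np.array([-2.0, -2.0])
b = 3.0
print(perceptron(np.array([0, 0]), w, b))  # 1
print(perceptron(np.array([1, 1]), w, b))  # 0
```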

Sigmoid neurons:

In the real world we have raw data, and we'd like the network to learn the weights and biases itself.

Results & feedback: a small change in the weights or bias should cause only a small change in the output, which is what makes learning from feedback possible. (TO LEARN)

The inputs x_i can take any value between 0 and 1, not just 0 or 1.

Sigmoid function (see Baidu Baike): σ(z) = 1 / (1 + e^(-z)).
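
A minimal sketch of a sigmoid neuron (my own illustration, not the book's code):

```python
import numpy as np

def sigmoid(z):
    """The sigmoid function: 1 / (1 + e^(-z)), squashing z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron(x, w, b):
    """A sigmoid neuron: like a perceptron, but with a smooth output."""
    return sigmoid(np.dot(w, x) + b)

# Unlike a perceptron, small changes in w and b cause small changes
# in the output, which is what makes gradient-based learning work.
```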

**The Architecture of Neural Networks**

MLP: Multilayer perceptrons.

For a 64 x 64 grayscale image we need 4,096 (= 64 x 64) input neurons, with intensities scaled between 0 and 1. The output is a single neuron with a threshold of 0.5 (is it the target or not?).

**Design hidden layers? Heuristics**

Feedforward: the output of each layer is the input to the next layer. No loops.
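
This is essentially what the feedforward method in the book's network.py does, applying a' = sigmoid(W a + b) layer by layer; a sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, weights, biases):
    """Pass activation a through the network one layer at a time,
    using each layer's weight matrix W and bias vector b."""
    for W, b in zip(weights, biases):
        a = sigmoid(np.dot(W, a) + b)
    return a
```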

RNNs (recurrent neural networks): feedback loops are possible.

A simple network to classify handwritten digits.

1. First, break the image into separate images, one per digit. (Segmentation turns out to be the easier problem.)

2. Recognize each digit individually. (The focus here!)

The 3-layer neural network:

Inputs: 784 (= 28 x 28) neurons; 0.0 means white, 1.0 means black, with greys in between.

Hidden: n neurons, where n can vary.

Outputs: 10 neurons (output 0 fires for digit 0, and so on). In principle 4 would be enough (2^4 = 16 >= 10), but 10 outputs work better in practice; see the sketch below.
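
A sketch of the 10-output encoding (the book's mnist_loader has a similar vectorized_result helper):

```python
import numpy as np

def one_hot(j):
    """Target vector for digit j: a 10-dimensional vector with
    1.0 in position j and zeros elsewhere."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

def predict(output_activations):
    """The predicted digit is the index of the most active output neuron."""
    return np.argmax(output_activations)
```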

**Learning with gradient descent**

Data set: MNIST

Training set: 60,000 images

Test set: 10,000 images

Cost function.

w: collection of weights

b: all biases

n: number of training inputs

a: the vector of outputs from the network when x is the input

x: a training input (vector)

C: the mean squared error (MSE); smaller is better
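
Putting the definitions together, the book's quadratic cost is:

```latex
C(w, b) \equiv \frac{1}{2n} \sum_x \lVert y(x) - a \rVert^2
```

where y(x) is the desired output for training input x, and a is the network's actual output for x.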

Goal: find the values of the weights and biases that minimize C. -> So we need gradient descent.
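
The update rule is v -> v - η∇C. A toy sketch of gradient descent on C(v) = v², with an illustrative learning rate eta (not from the book's code):

```python
# Gradient descent on the toy cost C(v) = v^2, whose minimum is at v = 0.
eta = 0.1   # learning rate (illustrative value)
v = 5.0     # starting point
for step in range(100):
    grad = 2 * v           # dC/dv for C(v) = v^2
    v = v - eta * grad     # update rule: v -> v - eta * grad
print(v)  # ~1e-9, close to the minimum at 0
```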

*******

After training, that is, after finding a good set of weights and biases for a network, it can easily be ported to run in a web browser, or as **a native app on a mobile device.**

**Run the code:**

>>> import mnist_loader

>>> import network

>>> training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

>>> net = network.Network([784, 100, 10])

>>> net.SGD(training_data, 30, 10, 3.0, test_data=test_data)

3.0 -> the learning rate η for gradient descent (30 = epochs, 10 = mini-batch size).

Different learning rates can give different accuracies.

We can also change the number of hidden neurons. (When I used 100 instead of 10, accuracy generally rose from about 80% to 90%, but I didn't measure the running time.) A sweep sketch follows below.
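
A sketch of a systematic sweep, assuming the book's mnist_loader and network modules are on the path; the hidden sizes and learning rates below are illustrative choices, not the book's:

```python
import mnist_loader
import network

# list() makes the data reusable across runs (needed for Python 3 ports
# of the loader, which return zip objects).
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
training_data = list(training_data)
test_data = list(test_data)

# Illustrative sweep over hidden-layer size and learning rate.
for hidden in (10, 30, 100):
    for eta in (0.5, 3.0, 10.0):
        print("hidden =", hidden, "eta =", eta)
        net = network.Network([784, hidden, 10])
        net.SGD(training_data, 30, 10, eta, test_data=test_data)
```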

Compared with random guessing (the simplest baseline, 10% accuracy), we do much better!

## sophisticated algorithm ≤ simple learning algorithm + good training data.

About Deep Learning:

A good example: deciding whether an image shows a face. More hidden layers let the network break the question into ever more detailed sub-questions (is there an eye in the top left? a nose in the middle?).

Networks with 5-10 hidden layers perform far better than networks with just 1.