These are my learning notes from the book!
Around 2006, people discovered techniques for training deep neural networks (and so deep learning, D.L., was born).
Applications: computer vision, speech recognition, NLP, classification of music collections. Deployed at large scale by Google, Microsoft, and Facebook.
NN approach: take a large number of handwritten digits, known as training examples -> automatically infer rules -> with a larger training set, the network learns more features and becomes more accurate.
76 lines of Python reach 96%…99% accuracy.
Two important types of artificial neuron: the perceptron and the sigmoid neuron.
A standard learning algorithm for NNs: stochastic gradient descent (SGD).
Inputs & output of a perceptron:
output = 0 if w*x + b <= 0, and 1 if w*x + b > 0
We use a bias b instead of a threshold (b = -threshold).
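The perceptron rule above can be sketched in a few lines of Python. The NAND weights below (w = -2, -2, b = 3) are the book's example; the function itself is a minimal illustration, not the book's code.

```python
def perceptron(w, x, b):
    """Perceptron rule: output 0 if w*x + b <= 0, else 1."""
    total = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 0 if total <= 0 else 1

# A two-input perceptron implementing NAND (weights from the book's example).
w, b = [-2, -2], 3
print(perceptron(w, [0, 0], b))  # 1
print(perceptron(w, [1, 1], b))  # 0
```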
Real-world use: given raw data, we'd like the network to learn the weights and biases by itself.
Results & feedback. (TO LEARN)
For sigmoid neurons, each input xi can take any value between 0 and 1 (not just 0 or 1).
Sigmoid function (see Baidu Baike): σ(z) = 1 / (1 + e^(-z)).
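A quick sketch of the sigmoid function σ(z) = 1 / (1 + e^(-z)), a smooth version of the perceptron's step function, with outputs in (0, 1):

```python
import math

def sigmoid(z):
    """Sigmoid: smooth step function, output strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # 0.5
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```

Small changes in weights and biases now produce small changes in the output, which is what makes learning possible.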
The Architecture of Neural Networks
MLP: Multilayer perceptrons.
64 x 64 grayscale image -> 4,096 (64 x 64) input neurons, with intensities scaled between 0 and 1. Output is one neuron; threshold = 0.5 (is it or not?)
How to design the hidden layers? Heuristics.
Feedforward: each layer's output is the next layer's input. No loops.
RNNs (recurrent neural networks): feedback loops are possible.
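The feedforward pass can be sketched as repeated matrix multiplies: each layer computes a' = σ(w·a + b) and hands the result to the next layer. The network shape and weights below are made-up toy values for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, weights, biases):
    # Each layer's output a' = sigmoid(w a + b) becomes the next layer's input.
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(w, a) + b)
    return a

# Toy 3-2-1 network with random weights (illustration only).
rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 3)), rng.standard_normal((1, 2))]
biases = [rng.standard_normal((2, 1)), rng.standard_normal((1, 1))]
out = feedforward(rng.standard_normal((3, 1)), weights, biases)
print(out.shape)  # (1, 1)
```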
A simple network to classify handwritten digits.
1. First break the image into single-digit pieces. (Segmentation is the easier part.)
2. Recognize each digit individually. (The focus here!)
The 3-layer neural network:
Inputs: 784 (28 x 28) neurons; 0.0 = white, 1.0 = black
Hidden: n neurons (n can be varied)
Outputs: 10 neurons (neuron 0 fires for digit 0, etc.) -> we could get away with only 4, since 2^4 = 16 >= 10, but 10 outputs work better in practice.
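With the 10-neuron encoding, the predicted digit is simply the index of the most activated output neuron. The activation values below are hypothetical, just to show the decoding step:

```python
import numpy as np

# Hypothetical output-layer activations for one input image.
activations = np.array([0.02, 0.01, 0.9, 0.05, 0.0, 0.1, 0.0, 0.0, 0.01, 0.02])

# The network's "vote" is the index of the largest activation.
print(np.argmax(activations))  # 2 -> the network predicts digit 2
```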
Learning with gradient descent
Data set: MNIST
Training set: 60,000 images
Test set: 10,000 images
w: collection of weights
b: all biases
n: number of training inputs
a: vector of outputs
x: vector of inputs
C: mean squared error (MSE) cost, C(w, b) = (1/2n) * Σ_x ||y(x) - a||^2; smaller is better
To find a minimum of C, and the corresponding weights and biases -> so we need gradient descent: repeatedly update w and b in the direction that decreases C (w -> w - η·∂C/∂w, b -> b - η·∂C/∂b).
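The gradient descent update rule can be sketched on a toy cost. Here C(v) = v1² + v2² stands in for the real C(w, b); the point is only the repeated step v -> v - η·∇C:

```python
import numpy as np

def grad_C(v):
    """Gradient of the toy cost C(v) = v1^2 + v2^2."""
    return 2 * v

v = np.array([3.0, -4.0])  # arbitrary starting point
eta = 0.1                  # learning rate

for _ in range(100):
    v = v - eta * grad_C(v)  # the gradient descent update step

print(v)  # very close to the minimum at (0, 0)
```

Each step shrinks v by a factor of (1 - 2·eta), so after enough steps we land at the minimum; too large an eta would overshoot instead.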
After training (that is, once a good set of weights and biases has been found), the network can easily be ported to run in a web browser, or as a native app on a mobile device.
Run the code:
>>> import mnist_loader
>>> import network
>>> training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
>>> net = network.Network([784, 100, 10])
>>> net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
3.0 -> the learning rate η for gradient descent (the other arguments: 30 epochs, mini-batch size 10).
Different rates can give different accuracy.
We can also change the size of the hidden layer. (When I used 100 hidden neurons instead of 10, the accuracy generally rose from ~80% to ~90%, but I didn't measure the running time.)
Compared to random guessing (the simplest baseline, ~10% over 10 classes)… we do much better!
sophisticated algorithm ≤ simple learning algorithm + good training data.
About Deep Learning:
A good example: deep networks break a question into ever more detailed sub-questions, layer by layer. Networks with 5-10 hidden layers perform far better than networks with a single hidden layer.