Let’s try some ways to speed up our learning!

**Cross Entropy**

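Cross entropy (CE) is a widely used cost function. It is always larger than 0 and approaches 0 as the neuron's output gets close to the desired output, which is a good feature for a cost function. CE also helps to avoid the learning slowdown that a sigmoid neuron suffers from with the quadratic cost. For a single neuron with output a and desired output y, averaged over n training inputs x, it is defined as below [1]:

$$C = -\frac{1}{n} \sum_x \left[ y \ln a + (1 - y) \ln(1 - a) \right]$$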

(some calculation is needed here…)

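Eventually, you will get:

$$\frac{\partial C}{\partial w_j} = \frac{1}{n} \sum_x x_j \left( \sigma(z) - y \right), \qquad \frac{\partial C}{\partial b} = \frac{1}{n} \sum_x \left( \sigma(z) - y \right)$$

where a = σ(z) is the sigmoid function, x_j is the j-th input, w_j is its weight, and b is the bias.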

From the partial derivatives, we can see that the learning speed is proportional to (σ(z)−y), the error in the output. When we are making a terrible mistake, the error is large, so the gradient is large and the neuron learns quickly.
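To see this effect in numbers, here is a minimal NumPy sketch of the cross-entropy cost and its gradients for a single sigmoid neuron, following the formulas in [1]. The function names (`sigmoid`, `cross_entropy`, `cross_entropy_grad`) and the toy data are my own assumptions for illustration, not code from [1].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(a, y):
    """Mean cross-entropy cost over the training examples."""
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

def cross_entropy_grad(w, b, x, y):
    """Gradients of the cross-entropy cost for one sigmoid neuron.

    x has shape (n, d), y has shape (n,), w has shape (d,).
    """
    a = sigmoid(x @ w + b)
    error = a - y                    # this is (sigma(z) - y)
    grad_w = x.T @ error / len(y)    # dC/dw_j = mean of x_j * error
    grad_b = np.mean(error)          # dC/db   = mean of error
    return grad_w, grad_b

# Toy data (hypothetical): a single input x = 1 with desired output y = 0.
x = np.array([[1.0]])
y = np.array([0.0])

# A badly wrong neuron (output close to 1) vs. a nearly right one (output close to 0).
gw_bad, gb_bad = cross_entropy_grad(np.array([2.0]), 2.0, x, y)
gw_good, gb_good = cross_entropy_grad(np.array([-2.0]), -2.0, x, y)

print("gradient when badly wrong: ", gw_bad, gb_bad)
print("gradient when nearly right:", gw_good, gb_good)
```

The badly wrong neuron gets a much larger gradient than the nearly right one, which is the "big mistake, fast learning" behaviour described above.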

**Softmax**

Instead of applying a sigmoid function to the weighted inputs z of the output layer to get the outputs, we use a softmax here.

According to Michael Nielsen [1], the activation value at output neuron j is defined as below:

$$a_j = \frac{e^{z_j}}{\sum_k e^{z_k}}$$

where the denominator sums over all the output neurons. If one of the activation values a_j increases, the others must decrease, because they always sum up to 1.

The softmax output can be treated as a probability distribution: 1) each value lies in (0,1); 2) the values sum up to 1. In practice, we would like to map real values to probabilities, which helps us to classify. The softmax function converts raw values into posterior probabilities. This mapping is also where the non-linearity comes from.
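The two properties above can be checked with a small NumPy sketch. The `softmax` helper and the example values are assumptions for illustration; subtracting the maximum before exponentiating is a common numerical-stability trick that I added and that does not change the result.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis."""
    # Subtracting the max avoids overflow in exp and leaves the output unchanged.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

# Hypothetical weighted inputs of three output neurons.
z = np.array([2.0, 1.0, 0.1])
a = softmax(z)

# Increase only the first weighted input and recompute.
z_up = z.copy()
z_up[0] += 1.0
a_up = softmax(z_up)

print("activations:        ", a, "sum =", a.sum())
print("after raising z[0]: ", a_up, "sum =", a_up.sum())
```

Raising one weighted input pushes its activation up and all the others down, while the total stays at 1.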

* Sigmoid: σ(z) = 1 / (1 + e^(−z))

Comparison [3]:

ReLU (red), which the softplus function smoothly approximates, has range [0,∞).

Sigmoid (green)

The sigmoid function is useful for binary outputs (0 or 1). It saturates toward 0 or 1, i.e. it shows high confidence, when the input is very negative or very positive.

The softmax function, designed for multi-class outputs, performs well in our case.
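For a quick feel of how these functions behave, here is a small NumPy sketch that evaluates sigmoid, ReLU and softplus at a few points. The helper names and sample inputs are assumptions for illustration; `np.logaddexp(0, z)` is used as a numerically stable way to compute log(1 + e^z).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def softplus(z):
    # log(1 + exp(z)), computed in a numerically stable way
    return np.logaddexp(0.0, z)

# Hypothetical inputs: very negative, zero, very positive.
z = np.array([-10.0, 0.0, 10.0])

s = sigmoid(z)     # saturates toward 0 and 1 at the extremes
r = relu(z)        # exactly 0 for negative inputs, identity otherwise
sp = softplus(z)   # smooth curve that stays slightly above ReLU

print("sigmoid: ", s)
print("relu:    ", r)
print("softplus:", sp)
```

The sigmoid values at the two extremes are almost exactly 0 and 1, which is the "high belief" behaviour mentioned above, while ReLU and softplus grow without bound for large inputs.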

For other activation functions such as tan-sigmoid, linear combination and the step function, see [2].

[1] http://neuralnetworksanddeeplearning.com/chap3.html

[2] https://en.wikibooks.org/wiki/Artificial_Neural_Networks/Activation_Functions

[3] https://www.quora.com/What-is-special-about-rectifier-neural-units-used-in-NN-learning

## Published by Irene
