TensorFlow 08: save and restore a subset of variables

TensorFlow provides save and restore functions that let us save and re-use model parameters. If you have a trained VGG model, for example, it is helpful to restore its first few layers and reuse them in your own network. This raises a question: how do we restore only a subset of the parameters? You can always check the official TensorFlow documentation here. In this post, I will take some code from the documentation and add a few practical points.
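As a quick illustration (a minimal sketch, not the code from the post; the variable names and checkpoint path below are made up), `tf.train.Saver` accepts a `var_list` argument, so only the variables you list are restored:

```python
import tensorflow as tf

# Hypothetical variables: two "pre-trained" layers plus a new layer of our own.
conv1_w = tf.get_variable("vgg/conv1/weights", shape=[3, 3, 3, 64])
conv1_b = tf.get_variable("vgg/conv1/biases", shape=[64])
fc_w = tf.get_variable("my_net/fc/weights", shape=[1024, 10])  # trained from scratch

# Pass only the subset we want to restore to the Saver.
restore_saver = tf.train.Saver(var_list=[conv1_w, conv1_b])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())       # initialize everything first
    restore_saver.restore(sess, "/path/to/vgg.ckpt")  # then overwrite just the subset
```

The key point is that variables not listed in `var_list` (here `fc_w`) keep their freshly initialized values instead of being looked up in the checkpoint.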

Continue reading “TensorFlow 08: save and restore a subset of variables”

To copy or not, that is the question: copying mechanism

In daily conversation, we often repeat something mentioned earlier in the dialogue, such as the names of people or organizations: “Hi, my name is Pikachu.” “Hi, Pikachu, …” There is a high probability that the word “Pikachu” will not appear in the vocabulary extracted from the training data. In the paper Incorporating Copying Mechanism in Sequence-to-Sequence Learning, the authors propose CopyNet, which adds a copying mechanism to encoder-decoder seq2seq models. Read my old post for the prerequisite knowledge.
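A toy sketch of the core idea (my own illustration, not the actual CopyNet equations; all numbers below are made up): the decoder mixes a “generate” distribution over the fixed vocabulary with a “copy” distribution over the source words, so an out-of-vocabulary name like “Pikachu” can still be produced by pointing back at the source.

```python
import numpy as np

vocab = ["hi", "my", "name", "is", "<unk>"]        # fixed target vocabulary
source = ["hi", "my", "name", "is", "pikachu"]     # current source sentence

p_generate = np.array([0.05, 0.05, 0.10, 0.10, 0.70])  # over vocab ("pikachu" falls into <unk>)
p_copy = np.array([0.02, 0.02, 0.02, 0.04, 0.90])      # over source positions
mix = 0.6                                               # a learned gate in the real model

# Combine into one distribution over the extended vocabulary (vocab + source words).
extended = {w: (1 - mix) * p for w, p in zip(vocab, p_generate)}
for w, p in zip(source, p_copy):
    extended[w] = extended.get(w, 0.0) + mix * p

print(max(extended, key=extended.get))  # "pikachu", even though it is out-of-vocabulary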

Continue reading “To copy or not, that is the question: copying mechanism”

What matters: attention mechanism

People are usually drawn to only part of an image, say a person in a photo. Similarly, for a given sequence of words, we should pay attention to a few keywords instead of treating every word equally. Take “this is an apple”: when you read it out loud, you will stress “apple” rather than “is” or “an”, because you naturally pay attention to the word that carries the meaning of the sentence. In seq2seq models (check this post if you forget), we learn a weight for each word, and important words get higher weights.
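As a minimal sketch (my own toy example, not code from the post), here is how dot-product attention turns per-word scores into weights that sum to one for a single decoder step:

```python
import numpy as np

def attention_weights(decoder_state, encoder_states):
    # One score per source word: higher means more relevant to this decoder step.
    scores = encoder_states @ decoder_state          # shape (seq_len,)
    # Softmax turns the scores into weights that sum to 1.
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

encoder_states = np.random.randn(4, 8)   # "this is an apple" -> 4 hidden states
# Make the decoder state most similar to the state of "apple".
decoder_state = encoder_states[3] + 0.1 * np.random.randn(8)

print(attention_weights(decoder_state, encoder_states))  # largest weight lands on "apple"
```

The weighted sum of the encoder states under these weights is the context vector the decoder consumes at that step.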

Continue reading “What matters: attention mechanism”