If you use the ROUGE evaluation metric for text summarization or machine translation systems, you must have noticed that there are many versions of it. So how do you get it to work with your own systems in Python? Which packages are helpful? In this post, I will give some ideas from an engineering point of view (which means I am not going to introduce what ROUGE is). I also ran into a few issues and finally got them solved. My methods might not be the best, but they worked.
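For a rough feel of what those packages compute, here is a minimal pure-Python sketch of ROUGE-N based on clipped n-gram overlap. It is only an illustration: the function names `ngrams` and `rouge_n` are made up for this example, tokenization is plain lowercased whitespace splitting, and real packages typically add stemming, multiple references and other options, which is one reason their scores differ.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Return (recall, precision, f1) from clipped n-gram overlap.

    Hypothetical helper for illustration; not the API of any ROUGE package.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    if overlap == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n(reference, candidate, n=1))
print(rouge_n(reference, candidate, n=2))
```

Comparing the numbers from a sketch like this with those from a package is a quick way to spot differences in tokenization or stemming settings.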
First introduced by Mikolov et al. [1] in 2013, word2vec learns distributed representations of words (word embeddings) with a neural network. It is based on the distributional hypothesis: words that occur in similar contexts (neighboring words) tend to have similar meanings. There are two models: CBOW (continuous bag of words), where we use a bag of context words to predict a target word, and skip-gram, where we use one word to predict its neighbors. For more, although not highly recommended, have a look at the TensorFlow tutorial here.
Continue reading “NLP 05: From Word2vec to Doc2vec: a simple example with Gensim”
We will focus on POS tagging in this post.
An HMM gives us a joint probability over tags and words: $p(t_1, \dots, t_n, w_1, \dots, w_n)$. Tags $t$ and words $w$ map one-to-one, so the two sequences have the same length.
Continue reading “NLP 04: Log-Linear Models for Tagging Task (Python)”
It may be a little early to start on machine translation (MT).
Anyway, this post is fairly superficial: it gives you a view of the basics, along with an implementation that produces a poor result… which leaves you plenty of room to optimize. Btw, you might learn some Chinese here 😛
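To give a flavor of what IBM Model 1 estimates, here is a minimal sketch of its EM training loop on a three-sentence toy corpus. This is not the implementation from the full post: the corpus, the function names `train_ibm1` and `best_translation`, and the translation direction (estimating t(zh | en)) are all assumptions made for illustration, and the NULL source word is left out to keep the code short.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=20):
    """Estimate t(zh | en) with EM; pairs is a list of (en_tokens, zh_tokens)."""
    zh_vocab = {zh for _, zh_sent in pairs for zh in zh_sent}
    # Uniform initialization over the Chinese vocabulary.
    t = defaultdict(lambda: 1.0 / len(zh_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (zh, en) alignments
        total = defaultdict(float)  # expected count of en being aligned
        # E-step: spread each Chinese word over the English words in its sentence.
        for en_sent, zh_sent in pairs:
            for zh in zh_sent:
                z = sum(t[(zh, en)] for en in en_sent)
                for en in en_sent:
                    c = t[(zh, en)] / z
                    count[(zh, en)] += c
                    total[en] += c
        # M-step: renormalize the expected counts per English word.
        for (zh, en), c in count.items():
            t[(zh, en)] = c / total[en]
    return t


def best_translation(t, en_word):
    """Return the Chinese word with the highest t(zh | en_word)."""
    candidates = {zh: p for (zh, en), p in t.items() if en == en_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus: "red house", "red book", "blue book".
corpus = [
    (["red", "house"], ["红", "房子"]),
    (["red", "book"], ["红", "书"]),
    (["blue", "book"], ["蓝", "书"]),
]
t = train_ibm1(corpus)
for word in ["red", "house", "book", "blue"]:
    print(word, best_translation(t, word))
```

Even on this tiny corpus, co-occurrence alone is enough for EM to separate the words, because "red" and "book" each appear in two different contexts. On real data the results are much noisier, which is where the room for optimization comes from.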
Continue reading “NLP 03: Finding Mr. Alignment, IBM Translation Model 1”
After HMMs, let’s work on a trigram HMM directly on text. I will first introduce the model, then give some pieces of code for practice.
I am not going to give a full solution, though, as the course still runs every year; you can find out more in the references.
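As a small taste of the model (and deliberately not a tagger), the sketch below computes the joint probability of a sentence and a tag sequence under a trigram HMM. It assumes the usual parameterization with transition parameters q(s | u, v) and emission parameters e(x | s), two `*` start symbols and a final `STOP` tag. The function name `joint_log_prob`, the dictionary layout and every probability value here are made up for illustration; in practice the parameters are estimated from counts and smoothed.

```python
import math


def joint_log_prob(words, tags, q, e):
    """Return log p(words, tags) under a trigram HMM.

    q maps (u, v, s) to q(s | u, v); e maps (word, tag) to e(word | tag).
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    padded = ["*", "*"] + list(tags) + ["STOP"]
    logp = 0.0
    # Transition terms, including the final transition to STOP.
    for i in range(2, len(padded)):
        logp += math.log(q[(padded[i - 2], padded[i - 1], padded[i])])
    # Emission terms, one per word.
    for word, tag in zip(words, tags):
        logp += math.log(e[(word, tag)])
    return logp


# Hypothetical hand-set parameters for a single three-word sentence.
q = {
    ("*", "*", "D"): 0.8,
    ("*", "D", "N"): 0.9,
    ("D", "N", "V"): 0.5,
    ("N", "V", "STOP"): 0.6,
}
e = {("the", "D"): 0.7, ("dog", "N"): 0.1, ("barks", "V"): 0.05}

words = ["the", "dog", "barks"]
tags = ["D", "N", "V"]
print(math.exp(joint_log_prob(words, tags, q, e)))
```

Working in log space avoids underflow on longer sentences. A trigram or emission that is missing from the dictionaries raises a `KeyError` here, which is a reminder that unseen events need smoothing in a real tagger.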
Continue reading “NLP 02: A Trigram Hidden Markov Model (Python)”
Lecture notes from Natural Language Processing (by Michael Collins)
Continue reading “NLP 01: Language Modeling Problems”