Posted in Algorithm, Natural Language Processing, Python, Theory

## NLP 05: From Word2vec to Doc2vec: a simple example with Gensim

#### Introduction

First introduced by Mikolov [1] in 2013, word2vec learns distributed representations of words (word embeddings) using a neural network. It is based on the distributional hypothesis: words that occur in similar contexts (similar neighboring words) tend to have similar meanings. There are two models: CBOW (continuous bag of words), which uses the bag of surrounding context words to predict a target word, and skip-gram, which uses one word to predict its neighbors. For more, have a look at the TensorFlow word2vec tutorial, although I do not highly recommend it.
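To make the difference between the two models concrete, here is a minimal sketch (plain Python, no Gensim) of how each one turns a sentence into training examples, assuming a symmetric context window of `window` words on each side. The function names are hypothetical helpers for illustration. The original word2vec implementation also shrinks the window at random and subsamples frequent words; this sketch omits both.

```python
from typing import List, Tuple


def context_window(tokens: List[str], i: int, window: int) -> List[str]:
    """Return the neighbors of tokens[i] within `window` positions on each side."""
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: the bag of context words is the input, the center word is the target."""
    return [(context_window(tokens, i, window), tokens[i]) for i in range(len(tokens))]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: the center word is the input, each neighbor is a separate target."""
    return [
        (tokens[i], ctx)
        for i in range(len(tokens))
        for ctx in context_window(tokens, i, window)
    ]


sentence = "the cat sat on the mat".split()

# CBOW: one example per position, e.g. (['the', 'sat'], 'cat')
print(cbow_pairs(sentence, window=1)[1])

# Skip-gram: one example per (center, neighbor) pair,
# e.g. [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
print(skipgram_pairs(sentence, window=1)[:3])
```

CBOW produces one example per word (context bag in, center word out), while skip-gram produces one example per neighbor, so it sees more (and smaller) training examples from the same text. In Gensim, the choice is made with the `sg` parameter of `gensim.models.Word2Vec` (`sg=0`, the default, for CBOW; `sg=1` for skip-gram).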

Posted in Algorithm, Natural Language Processing, Python, Theory

## NLP 04: Log-Linear Models for Tagging Task (Python)

In this post, we focus on part-of-speech (POS) tagging.

##### Notations

A hidden Markov model (HMM) gives us a joint probability over tags and words, $p(t_{[1:n]}, w_{[1:n]})$. Tags $t$ and words $w$ are in one-to-one correspondence, so the two sequences have the same length $n$.
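As an illustration of what this joint probability looks like, here is a sketch assuming a first-order (bigram) HMM with a start symbol `*` and a stop symbol `STOP`; the post itself may use a different order (for example a trigram HMM), and the symbols, helper name `hmm_joint_prob`, and all probabilities below are hypothetical toy values:

$$p(t_{[1:n]}, w_{[1:n]}) = \prod_{i=1}^{n+1} q(t_i \mid t_{i-1}) \prod_{i=1}^{n} e(w_i \mid t_i), \qquad t_0 = *,\ t_{n+1} = \mathrm{STOP}$$

```python
import math
from typing import Dict, List, Tuple

# (prev_tag, tag) -> q(tag | prev_tag); toy values, assumed for illustration
Transitions = Dict[Tuple[str, str], float]
# (tag, word) -> e(word | tag); toy values, assumed for illustration
Emissions = Dict[Tuple[str, str], float]


def hmm_joint_prob(tags: List[str], words: List[str], q: Transitions,
                   e: Emissions, start: str = "*", stop: str = "STOP") -> float:
    """Joint probability p(tags, words) under an assumed bigram HMM."""
    if len(tags) != len(words):
        # One tag per word: the two sequences must have the same length n.
        raise ValueError("tags and words must have the same length")
    padded = [start] + list(tags) + [stop]
    prob = 1.0
    # n + 1 transitions, including the final transition into STOP.
    for prev, cur in zip(padded, padded[1:]):
        prob *= q.get((prev, cur), 0.0)
    # n emissions, one per (tag, word) pair.
    for tag, word in zip(tags, words):
        prob *= e.get((tag, word), 0.0)
    return prob


q = {("*", "D"): 0.5, ("D", "N"): 0.8, ("N", "V"): 0.4, ("V", "STOP"): 0.6}
e = {("D", "the"): 0.6, ("N", "dog"): 0.1, ("V", "barks"): 0.05}

p = hmm_joint_prob(["D", "N", "V"], ["the", "dog", "barks"], q, e)
print(p, math.log(p))  # in practice, sums of log-probabilities avoid underflow
```

Unseen transitions or emissions get probability 0 here; a real tagger would smooth these estimates.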