NLP 01: Language Modeling Problems

Lecture notes from Natural Language Processing (by Michael Collins)


Tagging: Strings to Tagged Sequences

Example 1: Part-of-speech Tagging (POS tags)

Tag each word in a sentence with its part of speech (for example noun, verb, or determiner).
Example 2: Named Entity Recognition (NER)
Find and classify names in text.
To understand a role in a sentence.



Focuses on relations between words.


Why is NLP Hard? Ambiguity.

Let’s go back to the example.
“I saw a girl with a telescope.”
How should this sentence be read: as “I saw a girl, and the girl had a telescope”, or as “I used a telescope and saw a girl”?
Different structures in parsing lead to different interpretations.

Ambiguity also exists at the semantic (meaning) level.
“They put money in the bank.” The word “bank” has more than one meaning: a financial institution, or the side of a river. Did they bury the money in the mud of a river bank? Obviously, that’s not the case.

Ambiguity also arises at the discourse (multi-clause) level. For example, a system should be able to work out whom a pronoun such as “he” or “she” refers to.

Language Modeling

A finite vocabulary, V = \{ the, a, man, \ldots \}
An infinite set of strings: every finite sequence of words drawn from V.

The Language Modeling Problem
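– Training sample: a set of example sentences in English
– Learn a probability distribution p over sentences, such that p(x) \ge 0 for every sentence x and \sum_{x} p(x) = 1
(The input is a sentence x; the output p(x) is the probability of that sentence.)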
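To make "learning a distribution over sentences" concrete, here is a minimal sketch of the most naive estimate: the probability of a sentence is the fraction of training sentences identical to it. The function name `naive_sentence_model` and the toy corpus are assumptions made for illustration and are not part of the lecture notes.

```python
from collections import Counter


def naive_sentence_model(training_sentences):
    """Return a function p with p(sentence) = count(sentence) / N."""
    # Whitespace tokenization is an assumption of this sketch.
    counts = Counter(tuple(s.split()) for s in training_sentences)
    total = sum(counts.values())

    def p(sentence):
        # A Counter returns 0 for sentences it has never seen.
        return counts[tuple(sentence.split())] / total

    return p


# Hypothetical toy training sample.
corpus = ["the dog barks", "the dog barks", "the man saw a dog", "a man barks"]
p = naive_sentence_model(corpus)
print(p("the dog barks"))  # seen twice out of four sentences
print(p("the dog saw"))    # never seen, so probability zero
```

This is a valid distribution, but it gives probability zero to every sentence absent from the training data, which is one motivation for models that factor a sentence into smaller pieces.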


Markov Processes

A sequence of random variables X_1, X_2, \ldots, X_n, each of which can take any value in a finite set V. For now, assume the length n is fixed.
The model is a joint probability over every sequence x_1, \ldots, x_n in V^n:
P(X_1 = x_1, X_2 = x_2, X_3 = x_3, \ldots, X_n = x_n).
First-Order Markov Processes
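The state of each random variable depends only on the one immediately before it, and is independent of everything earlier.
So we have:
P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)
= P(X_1 = x_1) \prod_{i=2}^{n} P(X_i = x_i | X_1 = x_1, \ldots, X_{i-1} = x_{i-1})
= P(X_1 = x_1) \prod_{i=2}^{n} P(X_i = x_i | X_{i-1} = x_{i-1})
The second line is exact (it is the chain rule). The step to the third line is justified by the Markov assumption.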
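The first-order factorization, P(X_1 = x_1) times a product of P(X_i = x_i | X_{i-1} = x_{i-1}), can be sketched in a few lines. The function name `first_order_prob` and the toy probability tables below are invented for illustration; they are not taken from the lecture.

```python
def first_order_prob(sequence, initial, transition):
    """Joint probability of a non-empty sequence under a first-order Markov model.

    initial[x]          stands for P(X_1 = x)
    transition[prev][x] stands for P(X_i = x | X_{i-1} = prev)
    """
    prob = initial.get(sequence[0], 0.0)
    # Each later variable is conditioned only on its immediate predecessor.
    for prev, cur in zip(sequence, sequence[1:]):
        prob *= transition.get(prev, {}).get(cur, 0.0)
    return prob


# Hypothetical toy parameters over V = {the, a, dog, man, barks}.
initial = {"the": 0.6, "a": 0.4}
transition = {
    "the": {"dog": 0.5, "man": 0.5},
    "a": {"dog": 0.5, "man": 0.5},
    "dog": {"barks": 1.0},
    "man": {"barks": 1.0},
}

seq_prob = first_order_prob(["the", "dog", "barks"], initial, transition)
print(seq_prob)  # 0.6 * 0.5 * 1.0
```

With the length fixed at n = 3, the probabilities of all sequences in V^3 sum to one under these tables, which is what makes this a proper joint distribution.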
Second-Order Markov Processes:
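The current state now depends on the states of the previous two variables:
P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i | X_{i-2} = x_{i-2}, X_{i-1} = x_{i-1})
where, for convenience, x_0 = x_{-1} = *, a special start symbol.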

Trigram Language Models

– A finite set V
– A parameter q(w|u,v) for each trigram u,v,w, where w \in V \cup \{STOP\} and u,v \in V \cup \{*\}. That is, w may be STOP but is never the start symbol *, while u and v may be * but are never STOP.
The trigram language model (TLM):
p(x_1, \ldots, x_n) = \prod_{i=1}^{n} q(x_i | x_{i-2}, x_{i-1})
where x_0 = x_{-1} = * and x_n = STOP.
An example:
Given the sentence “the dog barks STOP”, the model gives:
p(the dog barks STOP) = q(the | *, *) \times q(dog | *, the) \times q(barks | the, dog) \times q(STOP | dog, barks)
The sentence has four tokens (including the STOP symbol), so the product has four factors, and each factor conditions on the two preceding tokens. So how do we know these probabilities? We need to estimate them.
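As a rough sketch of the product for “the dog barks STOP”: pad the sentence with two start symbols, append STOP, and multiply one q value per token. The function name `trigram_sentence_prob` and the numeric q values are made up for illustration only.

```python
def trigram_sentence_prob(words, q):
    """Probability of a sentence under a trigram model.

    words: list of tokens without the STOP symbol
    q:     dict mapping (u, v, w) to q(w | u, v)
    """
    # x_{-1} = x_0 = "*", and the final token is STOP.
    padded = ["*", "*"] + list(words) + ["STOP"]
    prob = 1.0
    for i in range(2, len(padded)):
        u, v, w = padded[i - 2], padded[i - 1], padded[i]
        # Trigrams missing from the table are treated as probability zero.
        prob *= q.get((u, v, w), 0.0)
    return prob


# Hypothetical parameter values, chosen only to make the arithmetic easy.
q = {
    ("*", "*", "the"): 0.5,
    ("*", "the", "dog"): 0.5,
    ("the", "dog", "barks"): 0.25,
    ("dog", "barks", "STOP"): 1.0,
}

p_sentence = trigram_sentence_prob(["the", "dog", "barks"], q)
print(p_sentence)  # 0.5 * 0.5 * 0.25 * 1.0 = 0.0625
```

Storing q as a dictionary keyed by the trigram (u, v, w) keeps the lookup order identical to the notation q(w | u, v).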
Estimation of Trigram Model
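A natural estimate (the maximum likelihood estimate):
q(w_i | w_{i-2}, w_{i-1}) = \frac{Count(w_{i-2}, w_{i-1}, w_i)}{Count(w_{i-2}, w_{i-1})}
That is, count how often each trigram appears in the training data and divide by the count of its two-word context. The resulting ratio is the maximum likelihood estimate of the parameter.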
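The counting procedure might look like the sketch below, which divides each trigram count by the count of its two-word context. The function name `estimate_trigram_q`, the whitespace tokenization, and the three-sentence corpus are assumptions for illustration.

```python
from collections import Counter


def estimate_trigram_q(sentences):
    """Maximum likelihood estimates q(w | u, v) = Count(u, v, w) / Count(u, v)."""
    trigram_counts = Counter()
    context_counts = Counter()
    for sentence in sentences:
        # Pad with two start symbols and a STOP symbol.
        padded = ["*", "*"] + sentence.split() + ["STOP"]
        for i in range(2, len(padded)):
            u, v, w = padded[i - 2], padded[i - 1], padded[i]
            trigram_counts[(u, v, w)] += 1
            context_counts[(u, v)] += 1
    return {
        (u, v, w): count / context_counts[(u, v)]
        for (u, v, w), count in trigram_counts.items()
    }


# Hypothetical toy training sample.
corpus = ["the dog barks", "the dog runs", "the man barks"]
q = estimate_trigram_q(corpus)
print(q[("the", "dog", "barks")])  # "the dog" is followed by "barks" once out of twice
print(q[("*", "the", "dog")])      # "* the" is followed by "dog" twice out of three times
```

Any trigram that never occurs in the training data gets no entry at all, so under this estimate its probability is zero.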



Published by Irene
