NLP 01: Language Modeling Problems

Lecture notes from Natural Language Processing (by Michael Collins)


Tagging

Tagging: Strings to Tagged Sequences

Example 1: Part-of-speech Tagging (POS tags)

Tag each word in a sentence with its part of speech.
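As a quick illustration, here is a minimal sketch using the NLTK library and its pre-trained English tagger (NLTK is an assumed tool choice here, not something the lecture prescribes):

```python
# A minimal POS-tagging sketch with NLTK (an assumed tool choice).
# Requires: pip install nltk, then nltk.download("punkt") and
# nltk.download("averaged_perceptron_tagger").
import nltk

tokens = nltk.word_tokenize("The man saw the dog")
print(nltk.pos_tag(tokens))
# Roughly: [('The', 'DT'), ('man', 'NN'), ('saw', 'VBD'), ('the', 'DT'), ('dog', 'NN')]
```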
Example 2: Named Entity Recognition (NER)
Find and classify names in text.
This helps us understand the role each name plays in a sentence.
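A minimal sketch of NER using the spaCy library (again an assumed tool choice, not from the lecture):

```python
# A minimal NER sketch with spaCy (an assumed tool choice).
# Requires: pip install spacy, then python -m spacy download en_core_web_sm.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Michael Collins teaches at Columbia University in New York.")
for ent in doc.ents:
    print(ent.text, ent.label_)
# Roughly: "Michael Collins" PERSON, "Columbia University" ORG, "New York" GPE
```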


Parsing

Parsing focuses on the relations between words in a sentence, recovering its syntactic structure.



Why is NLP Hard? Ambiguity.

Let’s go back to the example:
“I saw a girl with a telescope.”
How should this sentence be understood: “I saw a girl, and the girl had a telescope,” or “I used a telescope and saw a girl”?
Different structures in parsing lead to different interpretations.
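We can see both structures concretely with a toy grammar. Below is a minimal sketch using NLTK’s chart parser; the grammar itself is an illustrative assumption, not taken from the lecture:

```python
# A sketch of structural ambiguity: one sentence, two parse trees.
# The toy grammar below is an illustrative assumption.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | V NP PP
Det -> 'a'
N -> 'girl' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
tokens = "I saw a girl with a telescope".split()
for tree in parser.parse(tokens):
    print(tree)
# Prints two trees: the PP "with a telescope" attached to the NP "a girl"
# (the girl has the telescope) vs. attached to the VP (I used the telescope).
```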

Ambiguity also exists at the semantic (meaning) level.
“They put money in the bank.” The word “bank” has more than one meaning. Did they bury the money in the mud by a riverbank? Obviously, that’s not the case.

Ambiguity also exists at the discourse (multi-clause) level. For example, a system should be able to work out whom a pronoun like “he” or “she” refers to.


Language Modeling

– A finite vocabulary, V=\{ the, a, man, \dots \}
– An infinite set of strings V^{ \dagger }: all sentences that can be formed from words in V (each sentence ending with a special STOP symbol).

The Language Modeling Problem
– We have a training sample of example sentences in English.
– We want to learn a probability distribution p, i.e., a function that assigns a probability p(x) to every possible sentence x.

p(x)\ge 0 \text{ for all } x\in V^{ \dagger },\quad \sum _{ x\in V^{ \dagger } }{ p(x) } =1
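To make the second condition concrete, here is a toy sketch (the vocabulary and probabilities are made-up illustrative numbers): a model that ends a sentence with probability 0.4 and otherwise emits one of two words with probability 0.3 each. Summing p over all sentences up to a growing maximum length approaches 1:

```python
# Toy check that a sentence distribution can sum to 1 over an infinite
# set of strings. Vocabulary and probabilities are illustrative assumptions.
from itertools import product

V = ["the", "dog"]
q_word = 0.3   # probability of emitting each word (0.3 + 0.3 = 0.6 to continue)
q_stop = 0.4   # probability of emitting STOP and ending the sentence

def p(sentence):
    """Probability of a word sequence followed by STOP."""
    prob = 1.0
    for _ in sentence:
        prob *= q_word
    return prob * q_stop

total = 0.0
for length in range(12):                      # sentences of length 0..11
    for sentence in product(V, repeat=length):
        total += p(sentence)
print(total)  # -> 0.9978..., approaching 1 as the maximum length grows
```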


Markov Processes

A sequence of random variables { X }_{ 1 },{ X }_{ 2 },\dots ,{ X }_{ n }, each of which can take any value in a finite set V; assume the length n is fixed.
We want to model the joint probability of any sequence { x }_{ 1 }\dots { x }_{ n }\in { V }^{ n }:
P({ X }_{ 1 }={ x }_{ 1 },{ X }_{ 2 }={ x }_{ 2 },{ X }_{ 3 }={ x }_{ 3 },\dots ,{ X }_{ n }={ x }_{ n }).
First-Order Markov Processes
The current state of each random variable depends only on the immediately preceding one; it is conditionally independent of everything before that.
So we have:
P({ X }_{ 1 }={ x }_{ 1 },\dots ,{ X }_{ n }={ x }_{ n })
=P({ X }_{ 1 }={ x }_{ 1 })\prod _{ i=2 }^{ n }{ P({ X }_{ i }={ x }_{ i }|{ X }_{ 1 }={ x }_{ 1 },\dots ,{ X }_{ i-1 }={ x }_{ i-1 }) }
=P({ X }_{ 1 }={ x }_{ 1 })\prod _{ i=2 }^{ n }{ P({ X }_{ i }={ x }_{ i }|{ X }_{ i-1 }={ x }_{ i-1 }) }
The first step is just the chain rule; it is valid to pass to the third line because of the (first-order) Markov assumption.
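A small sketch of the first-order factorization in code (all probabilities below are made up for illustration):

```python
# A sketch of the first-order Markov factorization.
# The initial and transition probabilities are illustrative assumptions.
initial = {"the": 0.5, "a": 0.5}
transition = {                      # P(x_i | x_{i-1})
    ("the", "dog"): 0.6, ("the", "man"): 0.4,
    ("a", "dog"): 0.5, ("a", "man"): 0.5,
    ("dog", "barks"): 0.7, ("dog", "sleeps"): 0.3,
}

def first_order_prob(xs):
    """P(X_1=x_1) * prod_{i=2}^{n} P(X_i=x_i | X_{i-1}=x_{i-1})."""
    prob = initial[xs[0]]
    for prev, cur in zip(xs, xs[1:]):
        prob *= transition[(prev, cur)]
    return prob

print(first_order_prob(["the", "dog", "barks"]))  # 0.5 * 0.6 * 0.7 = 0.21
```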
Second-Order Markov Processes:
P({ X }_{ 1 }={ x }_{ 1 },\dots ,{ X }_{ n }={ x }_{ n })=\prod _{ i=1 }^{ n }{ P({ X }_{ i }={ x }_{ i }|{ X }_{ i-2 }={ x }_{ i-2 },{ X }_{ i-1 }={ x }_{ i-1 }) }
(where we define { x }_{ 0 }={ x }_{ -1 }=*, with * a special start symbol)
The current state now depends on the states of the previous two variables.
For more about the Markov assumption, see the references below.

Trigram Language Models

– A finite set V
– A parameter q(w|u,v) for each trigram u,v,w, where w\in V\cup \{ STOP \} and u,v\in V\cup \{ * \}.
The trigram language model (TLM) then defines:
p({ x }_{ 1 },\dots ,{ x }_{ n })=\prod _{ i=1 }^{ n }{ q({ x }_{ i }|{ x }_{ i-2 },{ x }_{ i-1 }) }
where { x }_{ 0 }={ x }_{ -1 }=* and { x }_{ n }=STOP.
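Here is a minimal sketch of this definition in code (the q values below are made-up toy numbers, not estimated from data):

```python
# A sketch of the trigram model: p(x_1..x_n) = prod_i q(x_i | x_{i-2}, x_{i-1}),
# with x_0 = x_{-1} = "*" and x_n = "STOP". The q values are toy assumptions.
q = {
    ("*", "*", "the"): 0.5,
    ("*", "the", "dog"): 0.4,
    ("the", "dog", "barks"): 0.3,
    ("dog", "barks", "STOP"): 0.6,
}

def trigram_prob(words):
    """Probability of a sentence; `words` must end with 'STOP'."""
    xs = ["*", "*"] + list(words)
    prob = 1.0
    for i in range(2, len(xs)):
        prob *= q[(xs[i - 2], xs[i - 1], xs[i])]
    return prob
```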
An example: given the sentence “the dog barks STOP”, the model gives:
p(\text{the dog barks STOP})=q(the|*,*)\times q(dog|*,the)\times q(barks|the,dog)\times q(STOP|dog,barks)

There are four tokens (including the STOP symbol), so the product has four factors, and each factor conditions on the two previous words. So how do we know these probabilities? We need an estimation method.
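Using the hypothetical trigram_prob sketch from above:

```python
print(trigram_prob(["the", "dog", "barks", "STOP"]))  # 0.5 * 0.4 * 0.3 * 0.6 = 0.036
```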
Estimation of the Trigram Model
A natural estimate (maximum likelihood estimate):
q(w|u,v)=\frac { Count(u,v,w) }{ Count(u,v) }

By simply counting how often each trigram and its bigram context appear in the training data, we obtain the maximum likelihood estimates of the parameters.
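A small sketch of this estimate computed from a toy corpus (the corpus itself is a made-up assumption):

```python
# A sketch of the MLE q(w|u,v) = Count(u,v,w) / Count(u,v),
# computed from a toy corpus (an illustrative assumption).
from collections import Counter

corpus = [["the", "dog", "barks"], ["the", "dog", "sleeps"]]

trigram_counts, bigram_counts = Counter(), Counter()
for sentence in corpus:
    xs = ["*", "*"] + sentence + ["STOP"]
    for i in range(2, len(xs)):
        trigram_counts[(xs[i - 2], xs[i - 1], xs[i])] += 1
        bigram_counts[(xs[i - 2], xs[i - 1])] += 1

def q_mle(w, u, v):
    """q(w | u, v) = Count(u, v, w) / Count(u, v)."""
    return trigram_counts[(u, v, w)] / bigram_counts[(u, v)]

print(q_mle("barks", "the", "dog"))  # Count(the,dog,barks)/Count(the,dog) = 1/2 = 0.5
```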


References:
– https://class.coursera.org/nlangp-001/lecture
– part-of-speech-tagging.pdf
– nlp-programming-en-10-parsing.pdf
