NLP 04: Log-Linear Models for Tagging Task (Python)

We will focus on POS tagging in this blog.


An HMM gives us a joint probability over tags and words: p({t}_{[1:n]},{w}_{[1:n]}). Tags t and words w map one-to-one, so the two sequences share the same length n.

We will use a log-linear model to define the conditional probability p({t}_{[1:n]}|{w}_{[1:n]}): given the current sentence, how likely is each possible sequence of tags for its words.
So our goal is to find the most likely tag sequence given word sequence W:
{t}_{[1:n]}^{*} = {argmax}_{{t}_{[1:n]}} p({t}_{[1:n]}|{w}_{[1:n]}).
The trigram Log-Linear Tagger is given below:
The first line is simply the chain rule, which conditions on the words from the start to the current position and the tags from the start to the previous position. The second line applies the independence assumption: because it is a trigram model, we condition only on the previous two tags, not on every tag in the history.
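Written out, the two lines are:

p({t}_{[1:n]}|{w}_{[1:n]}) = \prod_{i=1}^{n} p({t}_{i}|{w}_{[1:n]},{t}_{1},...,{t}_{i-1}) = \prod_{i=1}^{n} p({t}_{i}|{w}_{[1:n]},{t}_{i-2},{t}_{i-1}),

where {t}_{-1} and {t}_{0} are set to a padding symbol * for the first two positions.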


We will define a 4-tuple representation as the history: ({t}_{-2},{t}_{-1},{w}_{[1:n]},i). Given the current position i, the first two elements are the previous two tags, and the third element is the whole sentence.

The set of all possible histories is very large: the tag set is finite, but sentences can have any length n, so the sentence component makes the history space effectively unbounded.
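As a quick sketch in Python (with hypothetical tags for illustration), the history at position i = 3 of a three-word sentence is just such a 4-tuple:

```python
# 4-tuple history <t-2, t-1, w, i>: the previous two tags, the whole
# sentence, and the current (1-indexed) position.
sentence = ['Characteristics', 'of', 'lipase']
history = ('O', 'O', sentence, 3)   # hypothetical tags for illustration

t_sub2, t_sub1, w, i = history
print(t_sub2, t_sub1, w[i - 1])  # -> O O lipase
```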

Feature Vectors

A feature here means a function f that maps the input domain (\chi) and the tag set (\gamma) to a real value. For example, a binary feature is defined as f: \chi \times \gamma \rightarrow \{0,1\}. We usually have more than one feature, so with m features we get an m-dimensional feature vector.
Here are two features we will use in the implementation:

The first is a trigram feature, very similar to the trigram in an HMM. The second is a tag feature, which pairs the current word with its tag, analogous to an emission in a language model.
Say we have a labeled sentence: Characteristics \O of \O lipase \I-GENE.., so we have x=(Characteristics, of, lipase, etc). At position i = 3, the word “lipase”, both features are equal to 1.
We also have a weight vector v, which decides how much each feature weighs.
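A minimal sketch of the two features as Python functions; the specific trigram and word/tag pair below are just the ones that fire in the example sentence above:

```python
def f_trigram(history, t):
    """Binary trigram feature: fires when <t-2, t-1, t> = (O, O, I-GENE)."""
    t_sub2, t_sub1, w, i = history
    return 1 if (t_sub2, t_sub1, t) == ('O', 'O', 'I-GENE') else 0

def f_tag(history, t):
    """Binary tag feature: fires when the current word 'lipase' gets I-GENE."""
    t_sub2, t_sub1, w, i = history
    return 1 if (w[i - 1], t) == ('lipase', 'I-GENE') else 0

x = ['Characteristics', 'of', 'lipase']
h = ('O', 'O', x, 3)                       # history at position i = 3
print(f_trigram(h, 'I-GENE'), f_tag(h, 'I-GENE'))  # -> 1 1
```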

GEN Function

GEN(x) is defined as the set of all possible output structures for a sentence x. Each y \in GEN(x) is a tag sequence {t}_{[1:n]}.
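A brute-force sketch of GEN for the two-tag set used here. Its size is exponential in n, which is exactly why we will decode with Viterbi instead of enumerating it:

```python
from itertools import product

def GEN(x, tag_set=('O', 'I-GENE')):
    """All |tag_set| ** len(x) candidate tag sequences for sentence x."""
    return list(product(tag_set, repeat=len(x)))

candidates = GEN(['Characteristics', 'of', 'lipase'])
print(len(candidates))  # -> 8, i.e. 2 ** 3
```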


Now we have everything, and the decoding problem is given below:
We can use the Viterbi algorithm to find the tag sequence maximizing the local scores v \times g((t,u,x,k),s). A full version is given below:
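In symbols, the recurrence is the standard Viterbi one: \pi(k,u,v) stores the best score of any tag sequence whose last two tags are (u,v) at position k,

\pi(k,u,v) = {max}_{t} [\pi(k-1,t,u) \times q(v|t,u,{w}_{[1:n]},k)],

with base case \pi(0,*,*) = 1; the best full sequence scores {max}_{u,v} \pi(n,u,v).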


The data is described in this blog. Full data and code can be found here. We are using the same dataset for tagging these words. My code is missing the q function, so ViterbiGLM is only pseudocode; but you will find the code for computing the trigram and tag features.
The v vector can be estimated with the perceptron algorithm; here we instead read it directly from the tag.model file (part of the data is shown below).
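As a hedged sketch of how such weights could be trained (this is not the code that produced tag.model): the structured perceptron decodes each training sentence with the current weights and, on a mistake, adds the gold feature counts and subtracts the predicted ones. Phi and viterbi here are hypothetical stand-ins for a global feature-count function and a decoder.

```python
from collections import defaultdict

def perceptron(train, Phi, viterbi, epochs=5):
    """train: list of (sentence, gold_tag_tuple) pairs; returns weights v.

    Phi(x, y) is assumed to return a dict of feature counts for the whole
    sequence; viterbi(x, v) returns the best tag tuple under weights v.
    """
    v = defaultdict(float)
    for _ in range(epochs):
        for x, y_gold in train:
            y_hat = viterbi(x, v)          # decode with the current weights
            if y_hat != y_gold:
                # reward the gold features, penalize the predicted ones
                for f, c in Phi(x, y_gold).items():
                    v[f] += c
                for f, c in Phi(x, y_hat).items():
                    v[f] -= c
    return v
```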



import sys
from collections import defaultdict
class decoder():
    def __init__(self):
        self.trigram = defaultdict(float)
        self.tag = defaultdict(float)

    def build_maps(self, infile):
        for line in infile:
            # each line looks like e.g. TAG:photobleached:I-GENE -1.0
            group = line.strip().split(' ')
            if group[0].split(':')[0] == 'TAG':
                self.tag[group[0]] = float(group[1])
            elif group[0].split(':')[0] == 'TRIGRAM':
                self.trigram[group[0]] = float(group[1])

        print(len(self.tag), len(self.trigram))
        #print(self.tag)

    def get_g(self, x, tags):
        trigram_values = defaultdict(float)  # g trigram = 1 or 0
        tag_values = defaultdict(float)      # g tag = 1 or 0

        history = defaultdict(tuple)  # tuple of four <t-2, t-1, x, i>

        # let's create the history
        tags.insert(0, '*')  # init
        tags.insert(0, '*')  # init
        for i in range(0, len(x)):
            history[i + 1] = (tags[i], tags[i + 1], x, i + 1)

        # let's create the trigram_values and tag_values
        # for each word
        for i in range(0, len(x)):
            # for all possible tags in y
            y_current = ['O', 'I-GENE']
            if i == 0:
                y_sub2 = y_sub1 = ['*']
            elif i == 1:
                y_sub2 = ['*']
                y_sub1 = y_current
            else:
                y_sub2 = y_sub1 = y_current

            for y_sub2_tag in y_sub2:
                for y_sub1_tag in y_sub1:
                    for y_current_tag in y_current:
                        if y_sub2_tag == tags[i] and y_sub1_tag == tags[i + 1] and y_current_tag == tags[i + 2]:
                            trigram_values[(y_sub2_tag, y_sub1_tag, y_current_tag, i + 1)] = 1.
                        if y_current_tag == tags[i + 2]:
                            tag_values[(tags[i + 2], x[i])] = 1.

        # get v * g
        for i in range(0, len(x)):
            trigram_list = ['TRIGRAM', history[i + 1][0], history[i + 1][1], tags[i + 2]]
            v_trigram = self.trigram[':'.join(trigram_list)]
            now = (history[i + 1][0], history[i + 1][1], tags[i + 2], i + 1)

            g_tag = tag_values[(tags[i + 2], x[i])]
            tag_word_list = ['TAG', x[i], tags[i + 2]]
            v_tag = self.tag[':'.join(tag_word_list)]

            score = trigram_values[now] * v_trigram + v_tag * g_tag
            print('position i:', i, ';score:', score)

    # pseudo code of ViterbiGLM
    # q function is missing 😦
    def ViterbiGLM(self, v, g, x):
        pi = defaultdict(float)
        pi[(0, '*', '*')] = 1.

        for k in range(1, len(x) + 1):
            s = ['O', 'I-GENE']
            if k == 1:
                s_sub2 = s_sub1 = ['*']
            elif k == 2:
                s_sub2 = ['*']
                s_sub1 = s
            else:
                s_sub2 = s_sub1 = s

            for u in s_sub1:
                for v in s:
                    max_pi = []
                    for t in s_sub2:
                        max_pi.append(pi[(k - 1, t, u)] * q(v | t, u, w(1:n), k))
                    pi[(k, u, v)] = max(max_pi)

        # follow back-pointers to get tags here....
        return tag_seq

    def load_sentence(self, train_file):
        sentence = []
        tags = []
        for line in train_file:
            if len(line.strip().split(' ')) == 2:
                line_list = line.strip().split(' ')
                sentence.append(line_list[0])
                tags.append(line_list[1])
            elif len(line.strip()) == 0:
                # blank line marks the end of a sentence: score it, then reset
                self.get_g(sentence, tags)
                sentence = []
                tags = []


def usage():
    print('usage: python decoder.py (expects tag.model and small_gene.train)')

if __name__ == "__main__":
    decoder = decoder()
    input_file = open(r'tag.model', 'r')
    train_file = open(r'small_gene.train', 'r')
    print('start')
    decoder.build_maps(input_file)
    decoder.load_sentence(train_file)

Here are some sample outputs, showing the score at each position i (on a part of the training data):

Published by Irene
