Experiment No 05: Part-of-Speech Tagging

Aim : To implement Part-of-Speech tagging with Hidden Markov Models.

Theory :

Using a Hidden Markov Model (HMM) for part-of-speech tagging is a special case of Bayesian inference. In a classification task, we are given some observation(s) and our job is to determine which of a set of classes it belongs to. Part-of-speech tagging is generally treated as a sequence classification task: the observation is a sequence of words (a sentence), and the job is to assign it a sequence of part-of-speech tags. For example, given a sentence like "Secretariat is expected to race tomorrow", what is the best sequence of tags corresponding to this sequence of words?

An HMM is specified by two sets of parameters: the transition probabilities between hidden states (the part-of-speech tags) and the observation likelihoods of words given tags, illustrated in the figures below.

Figure : Tag transition probabilities
Figure : Observation likelihoods

The symbol <s> is the start-of-sentence marker and <\s> is the end-of-sentence marker.

Viterbi algorithm

The task of the Viterbi algorithm is to find the best hidden state (tag) sequence given the observed words and the model. It can be described in three steps:

1. Initialization : Obtain the required transition and emission probabilities from the trained HMM (training is done by counting over a tagged corpus), and score each possible tag of the first word by the transition probability out of <s> times its emission score.

2. Recursion : Where the forward algorithm sums over previous states to compute a belief state, Viterbi takes a maximum: for each word position t and tag j, v_t(j) = max_i [ v_{t-1}(i) * a(i, j) * b_j(o_t) ], with a(i, j) the tag transition probability and b_j(o_t) the emission score of the observed word; a backpointer records the maximising previous tag i.

3. Termination : Select the highest-scoring state at the final position and follow the backpointers to recover the best hidden state sequence.

Programming Exercises

1. Write a Python program to build a bigram HMM, computing transition and emission probabilities from a tagged dataset (simple sentences using at most 5 tags).

lines = ['<s> Mary/N Jane/N can/M see/V Will/N <\\s>',
         '<s> Spot/N Will/M see/V Mary/N <\\s>',
         '<s> Will/M Jane/N Spot/V Mary/N <\\s>',
         '<s> Mary/N Will/M see/V Spot/N <\\s>']

# Write the tagged sentences to a file that serves as the training corpus.
with open("hmm.txt", "w") as outF:
    for i in lines:
        outF.write(i + "\n")

word_ct = {}   # occurrences of each word, tags stripped
tag_ct = {}    # occurrences of each tag, counting <s> and <\s> as tags
bigram = {}    # occurrences of each adjacent (previous tag, current tag) pair
cth = {}       # occurrences of each distinct word/tag pair

with open("hmm.txt", encoding='utf-8') as f:
    for line in f:
        tags = []
        for token in line.strip().split(' '):
            sp = token.split('/')
            if len(sp) == 2:                  # an ordinary word/tag token
                word, tag = sp
                cth[token] = cth.get(token, 0) + 1
                word_ct[word] = word_ct.get(word, 0) + 1
            else:                             # a <s> or <\s> marker
                tag = sp[0]
            tag_ct[tag] = tag_ct.get(tag, 0) + 1
            tags.append(tag)
        # Count tag-to-tag transitions within the sentence.
        for prev, cur in zip(tags, tags[1:]):
            bi = prev + ' | ' + cur
            bigram[bi] = bigram.get(bi, 0) + 1

print("Transition probabilities:")
print("Tag bigram\tP(tag(n) | tag(n-1))")
for key, value in bigram.items():
    prev = key.split(' | ')[0]
    print(key + "\t" + str(value / tag_ct[prev]))

# Note: this exercise scores emissions as P(tag | word), the form
# consumed by the Viterbi program in Exercise 2.
print("Emission scores:")
print("Word/Tag\tP(tag | word)")
for key, value in cth.items():
    word = key.split('/')[0]
    print(key + "\t" + str(value / word_ct[word]))

2. Use the HMM model created above to tag a given sentence using the Viterbi algorithm.
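The transition and emission tables below are exactly the probabilities printed by the Exercise 1 program, gathered into nested dictionaries (with words lower-cased). They could also be built directly from the count dictionaries instead of being typed in by hand; the following is a minimal sketch, assuming the bigram, tag_ct, cth and word_ct dictionaries from Exercise 1 are still in scope (build_tables is a hypothetical helper name, not part of the original exercise):

def build_tables(bigram, tag_ct, cth, word_ct):
    # Hypothetical helper: packs the Exercise 1 counts into the nested
    # tp/em dictionaries consumed by the Viterbi program below.
    tp, em = {}, {}
    for key, value in bigram.items():
        prev, cur = key.split(' | ')
        tp.setdefault(prev, {})[cur] = value / tag_ct[prev]
    for key, value in cth.items():
        word, tag = key.split('/')
        em.setdefault(word.lower(), {})[tag] = value / word_ct[word]
    return tp, em

tp, em = build_tables(bigram, tag_ct, cth, word_ct)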
# Emission scores P(tag | word) and tag transition probabilities
# P(tag(n) | tag(n-1)), as produced by the Exercise 1 program (words lower-cased).
em = {'can':  {'M': 1.0},
      'jane': {'N': 1.0},
      'mary': {'N': 1.0},
      'see':  {'V': 1.0},
      'spot': {'N': 2/3, 'V': 1/3},     # one dict per word: a key may not repeat
      'will': {'M': 0.75, 'N': 0.25}}
tp = {'<s>': {'M': 0.25, 'N': 0.75},
      'M':   {'N': 0.25, 'V': 0.75},
      'N':   {'<\\s>': 4/9, 'M': 1/3, 'N': 1/9, 'V': 1/9},
      'V':   {'N': 1.0}}
tags = ['M', 'N', 'V']

line = 'can mary see jane'
words = line.split()

# Viterbi lattice: res_mat[i][t] is the probability of the best tag path for
# words[0..i] ending in tag t; back[i][t] stores the maximising previous tag.
res_mat = [{} for _ in words]
back = [{} for _ in words]
for i, w in enumerate(words):
    for t in tags:
        e = em.get(w, {}).get(t, 0)
        if e == 0:
            continue                     # this word was never seen with tag t
        if i == 0:
            # Initialization: transition out of the start symbol <s>.
            res_mat[i][t] = tp['<s>'].get(t, 0) * e
            back[i][t] = '<s>'
        else:
            # Recursion: maximise over the tags of the previous word.
            best_prev = max(res_mat[i - 1],
                            key=lambda p: res_mat[i - 1][p] * tp.get(p, {}).get(t, 0))
            res_mat[i][t] = (res_mat[i - 1][best_prev]
                             * tp.get(best_prev, {}).get(t, 0) * e)
            back[i][t] = best_prev

# Termination: take the best final tag and follow the backpointers
# (the <\s> end-of-sentence transition is not applied here).
best = max(res_mat[-1], key=res_mat[-1].get)
path = [best]
for i in range(len(words) - 1, 0, -1):
    path.append(back[i][path[-1]])
path.reverse()

print(res_mat)
print(list(zip(words, path)))    # [('can', 'M'), ('mary', 'N'), ('see', 'V'), ('jane', 'N')]
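Because every word in the test sentence admits exactly one tag under em, the answer can be verified by hand: the only possible path is M N V N, and its score is the product of the transition and emission terms along that path. A quick check using the tables above:

# Hand check of the Viterbi score for 'can mary see jane' (single possible path).
p = (tp['<s>']['M'] * em['can']['M']     # <s> -> M, emit 'can'
     * tp['M']['N'] * em['mary']['N']    # M -> N, emit 'mary'
     * tp['N']['V'] * em['see']['V']     # N -> V, emit 'see'
     * tp['V']['N'] * em['jane']['N'])   # V -> N, emit 'jane'
print(p)                                 # about 0.00694, equal to res_mat[-1]['N'] above

For a sentence containing ambiguous words, e.g. 'will spot see mary' (where 'will' and 'spot' each carry two possible tags in em), the maximisation over previous tags in the recursion step is what selects the best path.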