Part of Speech
• Each word belongs to a word class. The word class of a word is known as the part of speech (POS) of that word.
• Most POS tags implicitly encode fine-grained specializations of eight basic parts of speech:
  – noun, verb, pronoun, preposition, adjective, adverb, conjunction, article
• These categories are based on morphological and distributional similarities (not semantic similarities).
• Part of speech is also known as:
  – word classes
  – morphological classes
  – lexical tags

Part of Speech (cont.)
• A POS tag of a word describes the major and minor word classes of that word.
• A POS tag of a word gives a significant amount of information about that word and its neighbours. For example, a possessive pronoun (my, your, her, its) will most likely be followed by a noun, and a personal pronoun (I, you, he, she) will most likely be followed by a verb.
• Most words have a single POS tag, but some have more than one (2, 3, 4, …).
• For example, book/noun or book/verb:
  – I bought a book.
  – Please book that flight.

Tag Sets
• There are various tag sets to choose from.
• The choice of tag set depends on the nature of the application:
  – we may use a small tag set (more general tags) or
  – a large tag set (finer tags).
• Some widely used part-of-speech tag sets:
  – Penn Treebank has 45 tags
  – Brown Corpus has 87 tags
  – the C7 tag set has 146 tags
• In a tagged corpus, each word is associated with a tag from the chosen tag set.

English Word Classes
• Parts of speech can be divided into two broad categories:
  – closed class types -- such as prepositions
  – open class types -- such as nouns and verbs
• Closed-class words are generally also function words.
  – Function words play an important role in grammar.
  – Some function words are: of, it, and, you.
  – Function words are usually very short and occur frequently.
• There are four major open classes:
  – noun, verb, adjective, adverb
  – a new word may easily enter an open class.
• Word classes may differ from one natural language to another, but all natural languages have at least two word classes: noun and verb.

Nouns
• Nouns can be divided into:
  – proper nouns -- names of specific entities, such as Ankara, John, Ali
  – common nouns
• Proper nouns do not take an article, but common nouns may.
• Common nouns can be divided into:
  – count nouns -- they can be singular or plural -- chair/chairs
  – mass nouns -- used when something is conceptualized as a homogeneous group -- snow, salt
• Mass nouns cannot take the articles a and an, and they cannot be plural.

Verbs
• The verb class includes words referring to actions and processes.
• Verbs can be divided into:
  – main verbs -- open class -- draw, bake
  – auxiliary verbs -- closed class -- can, should
• Auxiliary verbs can be divided into:
  – copula -- be, have
  – modal verbs -- may, can, must, should
• Verbs have different morphological forms:
  – non-3rd-person-sg -- eat
  – 3rd-person-sg -- eats
  – progressive -- eating
  – past -- ate
  – past participle -- eaten

Adjectives
• Adjectives describe properties or qualities:
  – for color -- black, white
  – for age -- young, old
• In Turkish, all adjectives can also be used as nouns:
  – kırmızı kitap -- red book
  – kırmızıyı -- the red one (ACC)
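Aside: recalling the book/noun vs. book/verb example above, the sketch below shows the kind of lookup a tagger's dictionary provides: a word mapped to its set of possible word classes. The entries and the coarse tag names are invented for illustration only; a real dictionary would use a fine-grained tag set such as the Penn Treebank's 45 tags.

```python
# Toy POS lexicon: maps a word to its set of possible word classes.
# Entries and coarse tag names are illustrative, not from a real tag set.
LEXICON = {
    "book":   {"noun", "verb"},              # I bought a book. / Please book that flight.
    "can":    {"auxiliary", "verb", "noun"}, # a highly ambiguous common word
    "the":    {"article"},
    "flight": {"noun"},
}

def possible_tags(word):
    """Return all candidate word classes for a word (empty set if unknown)."""
    return LEXICON.get(word.lower(), set())

print(possible_tags("Book"))    # {'noun', 'verb'} -- ambiguous
print(possible_tags("flight"))  # {'noun'}         -- unambiguous
```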
Adverbs
• Adverbs normally modify verbs.
• Adverb categories:
  – locative adverbs -- home, here, downhill
  – degree adverbs -- very, extremely
  – manner adverbs -- slowly, delicately
  – temporal adverbs -- yesterday, Friday
• Because of the heterogeneous nature of adverbs, some adverbs, such as Friday, may be tagged as nouns.

Major Closed Classes
• Prepositions -- on, under, over, near, at, from, to, with
• Determiners -- a, an, the
• Pronouns -- I, you, he, she, who, others
• Conjunctions -- and, but, if, when
• Particles -- up, down, on, off, in, out
• Numerals -- one, two, first, second

Prepositions
• Occur before noun phrases.
• Indicate spatial or temporal relations.
• Examples:
  – on the table
  – under the chair
• They occur very frequently. For example, some frequency counts from a 16-million-word corpus (COBUILD):
  – of    540,085
  – in    331,235
  – for   142,421
  – to    125,691
  – with  124,965
  – on    109,129
  – at    100,169

Particles
• A particle combines with a verb to form a larger unit called a phrasal verb:
  – go on
  – turn on
  – turn off
  – shut down

Articles
• A small closed class.
• Only three words in the class: a, an, the.
• Articles mark definiteness or indefiniteness.
• They occur very frequently. For example, the frequency counts in the same 16-million-word corpus (COBUILD):
  – the  1,071,676
  – a    413,887
  – an   59,359
• Almost 10% of the words in this corpus are articles.

Conjunctions
• Conjunctions are used to combine or join two phrases, clauses, or sentences.
• Coordinating conjunctions -- and, or, but
  – join two elements of equal status
  – Example: you and me
• Subordinating conjunctions -- that, who
  – combine a main clause with a subordinate clause
  – Example: I thought that you might like milk.

Pronouns
• Shorthand for referring to some entity or event.
• Pronouns can be divided into:
  – personal -- you, she, I
  – possessive -- my, your, his
  – wh-pronouns -- who, what -- Who is the president?

Tag Sets for English
• Several tag sets are in wide practical use for part of speech.
• The Penn Treebank tag set has 45 tags, including:
  – IN   preposition/subordinating conjunction
  – DT   determiner
  – JJ   adjective
  – NN   noun, singular or mass
  – NNS  noun, plural
  – VB   verb, base form
  – VBD  verb, past tense
• A sentence from the Brown corpus tagged with the Penn Treebank tag set:
  – The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

Part of Speech Tagging
• Part-of-speech tagging is simply assigning the correct part of speech to each word in an input sentence.
• We assume that we have the following:
  – a set of tags (our tag set)
  – a dictionary that tells us the possible tags for each word (including all morphological variants)
  – a text to be tagged
• There are different algorithms for tagging:
  – rule-based tagging
  – statistical tagging (stochastic tagging)
  – transformation-based tagging

How hard is tagging?
• Most words in English are unambiguous: they have only a single tag.
• But many of the most common words are ambiguous:
  – can/verb, can/auxiliary, can/noun
• The number of word types in the Brown Corpus:
  – unambiguous (one tag): 35,340
  – ambiguous (2-7 tags):  4,100
    • 2 tags: 3,760
    • 3 tags: 264
    • 4 tags: 61
    • 5 tags: 12
    • 6 tags: 2
    • 7 tags: 1
• While only 11.5% of word types are ambiguous, over 40% of Brown corpus tokens are ambiguous.
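Aside: figures like the ones above can be reproduced approximately with NLTK's copy of the tagged Brown corpus, as in the minimal sketch below. It assumes NLTK is installed and the corpus can be downloaded; the exact percentages depend on case folding and on the tag set used, so they will not match the slide's numbers exactly.

```python
from collections import defaultdict

import nltk
from nltk.corpus import brown

nltk.download("brown", quiet=True)  # fetch the tagged Brown corpus if missing

# Collect the set of tags observed for each word type (case-folded).
tags_per_type = defaultdict(set)
tagged = brown.tagged_words()
for word, tag in tagged:
    tags_per_type[word.lower()].add(tag)

ambiguous_types = {w for w, tags in tags_per_type.items() if len(tags) > 1}
ambiguous_tokens = sum(1 for word, _ in tagged if word.lower() in ambiguous_types)

print(f"ambiguous word types:  {len(ambiguous_types) / len(tags_per_type):.1%}")
print(f"ambiguous word tokens: {ambiguous_tokens / len(tagged):.1%}")
```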
Rule-Based Part-of-Speech Tagging
• The rule-based approach uses handcrafted sets of rules to tag an input sentence.
• There are two stages in rule-based taggers:
  – First stage: uses a dictionary to assign each word a list of potential parts of speech.
  – Second stage: uses a large list of handcrafted rules to winnow this list down to a single part of speech for each word.
• ENGTWOL is a rule-based tagger:
  – in the first stage, it uses a two-level lexicon transducer
  – in the second stage, it uses hand-crafted rules (about 1,100 rules)

After the First Stage
• Example: He had a book.
• After the first stage:
  – he: he/pronoun
  – had: have/verb-past, have/auxiliary-past
  – a: a/article
  – book: book/noun, book/verb

Tagging Rules
• Rule-1: if (the previous tag is an article) then eliminate all verb tags
• Rule-2: if (the next tag is a verb) then eliminate all verb tags

Transformation-Based Tagging
• Transformation-based tagging is also known as Brill tagging.
• It is similar to rule-based tagging, but the rules are learned from a tagged corpus.
• These learned rules are then used in tagging.

How TBL Rules are Applied
• Before the rules are applied, the tagger labels every word with its most likely tag.
• We get these most likely tags from a tagged corpus.
• Example:
  – He is expected to race tomorrow
  – he/PRN is/VBZ expected/VBN to/TO race/NN tomorrow/NN
• After selecting the most likely tags, we apply transformation rules:
  – Change NN to VB when the previous tag is TO.
  – This rule converts race/NN into race/VB.
• This may not work in every case:
  – … according to race

How TBL Rules are Learned
• We assume that we have a tagged corpus.
• Brill's TBL algorithm has three major steps:
  – Tag the corpus with the most likely tag for each word (unigram model).
  – Choose the transformation that deterministically replaces an existing tag with a new tag such that the resulting tagged training corpus has the lowest error rate of all candidate transformations.
  – Apply the transformation to the training corpus.
• These steps are repeated until a stopping criterion is reached.
• The result (which will be our tagger) works as follows:
  – first tag the input using the most likely tags
  – then apply the learned transformations

Transformations
• A transformation is selected from a small set of templates.
  Change tag a to tag b when:
  – the preceding (following) word is tagged z
  – the word two before (after) is tagged z
  – one of the two preceding (following) words is tagged z
  – one of the three preceding (following) words is tagged z
  – the preceding word is tagged z and the following word is tagged w
  – the preceding (following) word is tagged z and the word two before (after) is tagged w

Basic Results
• We get about 91% accuracy just by picking the most likely tag for each word.
• We want to improve on this baseline.
• Some taggers can reach 99 percent accuracy.
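Aside: the race example from the "How TBL Rules are Applied" slide can be traced in a few lines of code. This is a minimal sketch of rule application only (not rule learning); the most-likely tags are hard-coded from the slide, and only one transformation template (previous-tag trigger) is implemented.

```python
# Most-likely (unigram) tags for the slide's example sentence.
# Tag values follow the slide; everything else is illustrative.
MOST_LIKELY = {
    "he": "PRN", "is": "VBZ", "expected": "VBN",
    "to": "TO", "race": "NN", "tomorrow": "NN",
}

def initial_tagging(words):
    """Stage 1: label every word with its most likely tag."""
    return [(w, MOST_LIKELY[w.lower()]) for w in words]

def apply_rule(tagged, old, new, prev_tag):
    """Stage 2: change tag `old` to `new` when the previous tag is `prev_tag`."""
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == old and out[i - 1][1] == prev_tag:
            out[i] = (word, new)
    return out

tagged = initial_tagging("He is expected to race tomorrow".split())
tagged = apply_rule(tagged, old="NN", new="VB", prev_tag="TO")
print(tagged)
# [('He', 'PRN'), ('is', 'VBZ'), ('expected', 'VBN'),
#  ('to', 'TO'), ('race', 'VB'), ('tomorrow', 'NN')]
```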
Statistical Part-of-Speech Tagging
• Choosing the best tag sequence $T = t_1, t_2, \ldots, t_n$ for a given word sequence (sentence) $W = w_1, w_2, \ldots, w_n$:

$$\hat{T} = \operatorname*{argmax}_{T} P(T \mid W)$$

• By Bayes' rule:

$$\hat{T} = \operatorname*{argmax}_{T} \frac{P(W \mid T)\, P(T)}{P(W)}$$

• Since $P(W)$ is the same for every tag sequence:

$$\hat{T} = \operatorname*{argmax}_{T} P(W \mid T)\, P(T)$$

Statistical POS Tagging (cont.)
• If we have a tagged corpus and assume a trigram language model over tags, then $P(T)$ can be approximated as:

$$P(T) \approx P(t_1)\, P(t_2 \mid t_1) \prod_{i=3}^{n} P(t_i \mid t_{i-2}, t_{i-1})$$

• This formula is simple to evaluate: the probabilities are obtained by simple counting over the tagged corpus (plus smoothing).

Statistical POS Tagging (cont.)
• To evaluate $P(W \mid T)$, we make the simplifying assumption that each word depends only on its own tag:

$$P(W \mid T) \approx \prod_{i=1}^{n} P(w_i \mid t_i)$$

• So we want the tag sequence that maximizes the following quantity:

$$\hat{T} = \operatorname*{argmax}_{T}\; P(t_1)\, P(t_2 \mid t_1) \prod_{i=3}^{n} P(t_i \mid t_{i-2}, t_{i-1}) \prod_{i=1}^{n} P(w_i \mid t_i)$$

• The best tag sequence can be found with the Viterbi algorithm.
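Aside: a minimal sketch of Viterbi decoding for this model. To keep the code short it uses a bigram transition model $P(t_i \mid t_{i-1})$ rather than the trigram above, and all probabilities are invented toy values rather than counts estimated from a corpus; the tiny floor probability stands in for real smoothing.

```python
import math

# Toy bigram-HMM tagger. States and probabilities are invented for
# illustration; the slides' trigram P(T) would add one more tag of history.
STATES = ["DT", "NN", "VB"]
START = "<s>"  # start-of-sentence pseudo-tag

TRANS = {  # P(tag_i | tag_{i-1})
    (START, "DT"): 0.6, (START, "NN"): 0.3, (START, "VB"): 0.1,
    ("DT", "NN"): 0.9,  ("DT", "DT"): 0.05, ("DT", "VB"): 0.05,
    ("NN", "VB"): 0.5,  ("NN", "NN"): 0.3,  ("NN", "DT"): 0.2,
    ("VB", "DT"): 0.6,  ("VB", "NN"): 0.3,  ("VB", "VB"): 0.1,
}
EMIT = {  # P(word | tag)
    ("DT", "the"): 0.7,
    ("NN", "dog"): 0.1,    ("VB", "dog"): 0.01,
    ("NN", "barks"): 0.02, ("VB", "barks"): 0.1,
}

def logp(table, key):
    """Log-probability with a tiny floor standing in for real smoothing."""
    return math.log(table.get(key, 1e-12))

def viterbi(words):
    """Return the tag sequence maximizing P(W|T) P(T) under the toy model."""
    # delta[t] = best log-prob of any tag path ending in t at the current word
    delta = {t: logp(TRANS, (START, t)) + logp(EMIT, (t, words[0])) for t in STATES}
    backptrs = []
    for w in words[1:]:
        new_delta, ptr = {}, {}
        for t in STATES:
            prev = max(STATES, key=lambda p: delta[p] + logp(TRANS, (p, t)))
            new_delta[t] = delta[prev] + logp(TRANS, (prev, t)) + logp(EMIT, (t, w))
            ptr[t] = prev
        delta = new_delta
        backptrs.append(ptr)
    # Trace the back-pointers from the best final tag.
    tag = max(delta, key=delta.get)
    path = [tag]
    for ptr in reversed(backptrs):
        tag = ptr[tag]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["the", "dog", "barks"]))  # expected: ['DT', 'NN', 'VB']
```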