Morphology: Words and their Parts
CS 4705

Basic Uses of Morphology
• The study of how words are composed from smaller, meaning-bearing units (morphemes)
• Applications:
  – Spelling correction: referece
  – Hyphenation algorithms: refer-ence
  – Part-of-speech analysis: googler
  – Text-to-speech: grapheme-to-phoneme conversion
    • hothouse (/T/ or /D/)
  – Speech recognition: phoneme-to-grapheme conversion
  – Artificial languages in standardized tests
    • ‘Twas brillig and the slithy toves…
    • Muggles moogled migwiches

What is a word?
• In formal languages, words are arbitrary strings
• In natural languages, words are made up of meaningful subunits called morphemes
  – Allows for productivity: googled, texted
  – Subword units express concepts denoting entities or relationships in the world
    • Roots +
    • Syntactic or grammatical elements
  – Realizations of morphemes: morphs
    • Door realizes door; take and took realize take
    • Allomorphs are classes of related morphs that realize a given morpheme
      – Allomorphs of plural -s in English include -en (oxen), -es, and the vowel change in men
      – Take and took are allomorphs of take
• Syntactic or grammatical morphemes can convey many things
  – In Italian, nouns are marked for gender and number

            Singular    Plural
    Masc    pomodoro    pomodori
    Fem     cipolla     cipolle

  – pomodor- and cipoll- are called stems, which may or may not occur on their own as words
  – Stem may not occur as a word: derivative/deriv
  – Base form (lemma) occurs as word: derivative/derive
  – Sometimes the same: cars has stem ‘car’ and base form or lemma ‘car’ too

What information does morphology give us?
• Differs by language
  – Spanish: hablo, hablaré / English: I speak, I will speak
  – English: book, books / Japanese: hon, hon
• Languages also differ in how they encode information
  – Isolating languages (e.g. Mandarin) have no bound forms (affixes) that attach to a word
  – Agglutinative languages (e.g.
    Finnish, Turkish) build words from prefixes and suffixes added to a stem like beads on a string; each feature is expressed by a single affix
  – Inflectional languages (e.g. English) merge different features into a single affix (e.g. person and tense of verbs); the same feature can be realized by different affixes
  – Polysynthetic languages (e.g. Inuit languages) express much of their syntax in their morphology, incorporating a verb’s arguments into the verb
• So… different languages may require very different morphological analyzers

Morphology Helps Define Word Classes
• AKA morphological classes, parts-of-speech
• Closed vs. open (function vs. content) class words
  – Closed: pronoun, preposition, conjunction, determiner,…
  – Open: noun, verb, adverb, adjective,…
• Identifying word classes is useful for almost any task in NLP, from translation to speech recognition to topic detection…

Inflectional Morphology
• Word stem + grammatical morpheme --> different forms of same word
  – Usually produces word of same class
  – Usually serves a syntactic or grammatical function (e.g. agreement)
      like --> likes or liked
      bird --> birds
• Nominal morphology
  – Plural forms
    • -s or -es
    • Irregular forms (goose/geese)
    • Mass vs. count nouns (fish/fish(es), email or emails?)
  – Possessives (cat’s, cats’)
• Verbal inflection
  – Main verbs (sleep, like, fear) relatively regular
    • -s, -ing, -ed
    • And productive: emailed, instant-messaged, faxed, homered
    • But some are not:
      – eat/ate/eaten, catch/caught/caught
  – Primary (be, have, do) and modal verbs (can, will, must) often irregular and not productive
    » Be: am/is/are/were/was/been/being
  – Irregular verbs few (~250) but frequently occurring
• Particles occur in only one form in English
  – Prepositions: to, from
  – Adverbs: happily, quickly
  – Conjunctions: but, and
  – Articles: the, a, an
• So… English inflectional morphology is fairly easy to model… with some special cases…
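
The "regular rules plus special cases" picture of English inflection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pluralize` and `past_tense` are hypothetical, the irregular tables hold only the examples from the slide, and the spelling rules cover just -es after sibilant endings, consonant + y, and final e (no consonant doubling as in stop/stopped).

```python
# Exception tables: irregular forms must be listed, not derived by rule.
# Assumption: only the examples from the slide are included here.
IRREGULAR_PLURALS = {"goose": "geese", "fish": "fish"}
IRREGULAR_PAST = {"eat": "ate", "catch": "caught"}

VOWELS = "aeiou"


def _ends_in_consonant_y(word):
    """True for words like 'pity' or 'city' (consonant followed by final y)."""
    return len(word) > 1 and word.endswith("y") and word[-2] not in VOWELS


def pluralize(noun):
    """Return the plural of a noun: lexical exceptions first, then regular rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"            # box -> boxes
    if _ends_in_consonant_y(noun):
        return noun[:-1] + "ies"      # city -> cities
    return noun + "s"                 # bird -> birds


def past_tense(verb):
    """Return the past tense of a verb: lexical exceptions first, then regular -ed."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"             # like -> liked
    if _ends_in_consonant_y(verb):
        return verb[:-1] + "ied"      # pity -> pitied
    return verb + "ed"                # email -> emailed


if __name__ == "__main__":
    for noun in ["bird", "goose", "box"]:
        print(noun, "->", pluralize(noun))
    for verb in ["like", "email", "homer", "eat"]:
        print(verb, "->", past_tense(verb))
```

Because the regular branch applies to any string, new verbs such as email or homer are inflected without being listed, which is the sense in which the regular pattern is productive; the irregular verbs need explicit entries.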

Derivational Morphology
• Word stem + syntactic/grammatical morpheme --> new words
  – Usually produces word of different class
  – Incomplete process: derivational morphs cannot be applied to just any member of a class
• Verbs --> nouns
  – -ize verbs --> -ation nouns
  – generalize, realize --> generalization, realization
• Verbs, nouns --> adjectives
  – embrace, pity --> embraceable, pitiable
  – care, wit --> careless, witless
• Adjective --> adverb
  – happy --> happily
• But process is selective in unpredictable ways
  – Less productive: nerveless/*evidence-less, malleable/*sleep-able, rar-ity/*rareness
  – Meanings of derived terms harder to predict by rule
    • clueless, careless, nerveless, sleepless
• Derivation can be applied recursively:
  – hospital --> hospitalize --> hospitalization --> prehospitalization …
  – Morphological analysis identifies concatenative process as well as morphemes
      [pre[[[hospital]ize]ation]]
  – Bracketing paradoxes
      unhappier
      [un[happier]]: not happier
      [[unhappy]er]: more unhappy

Compounding
• Two base forms join to form a new word
  – Bedtime, Wienerschnitzel, Rotwein
  – Careful? Compound or derivation?

Affixes can be attached to stems in different ways
  – Prefixation
    • Immaterial
  – Suffixation: more common across languages than prefixation
    • Trying
  – Circumfixation: combine prefixation and suffixation
    • Gesagt
  – Infixation
    • English: Absobl**dylutely
    • Bontoc: ‘um’ turns adjectives and nouns into verbs (kilad (red) --> kumilad (to be red))

Concatenative vs. non-concatenative morphology
• Semitic root-and-pattern morphology
  – Root (2-4 consonants) conveys basic semantics (e.g.
    Arabic /ktb/)
  – Vowel pattern conveys voice and aspect
  – Derivational template (binyan) identifies word class

                Vowel pattern
    Template    active      passive     Gloss
    CVCVC       katab       kutib       write
    CVCCVC      kattab      kuttib      cause to write
    CVVCVC      ka:tab      ku:tib      correspond
    tVCVVCVC    taka:tab    tuku:tib    write each other
    nCVVCVC     nka:tab     nku:tib     subscribe
    CtVCVC      ktatab      ktutib      write
    stVCCVC     staktab     stuktib     dictate

Morphotactics
• What are the ‘rules’ for word construction in a language?
  – pseudointellectual vs. *intellectualpseudo
  – rationalize vs. *izerational
  – cretinous vs. *cretinly vs. *cretinacious
• Possible ‘rules’
  – Suffixes are suffixes and prefixes are prefixes
  – Certain affixes attach to certain types of stems (nouns, verbs, etc.)
  – Certain stems can/cannot take certain affixes, e.g.
    • Semantics: In English, un- cannot attach to adjectives that already have a negative connotation:
      – Unhappy vs. *unsad
      – Unhealthy vs. *unsick
      – Unclean vs. *undirty
    • Phonology: In English, -er cannot attach to words of more than two syllables
      – great, greater
      – Happy, happier
      – Competent, *competenter
      – Elegant, *eleganter
      – Unruly, unrulier????

Morphological Representations: Evidence from Human Performance
• Hypotheses:
  – Full listing hypothesis: words listed
  – Minimum redundancy hypothesis: morphemes listed
• Experimental evidence:
  – Priming experiments (Does seeing/hearing one word facilitate recognition of another?) suggest neither
  – Regularly inflected forms (e.g. cars) prime their stems (car), but derived forms (e.g. management) do not prime theirs (manage)
  – But spoken derived words can prime stems if they are semantically close (e.g.
    government/govern but not department/depart)
• Speech errors suggest affixes must be represented separately in the mental lexicon
  – ‘easy enoughly’ for ‘easily enough’

Summing Up
• Different languages have different morphological systems
  – If we can discover how to decode such a system, we can identify useful information about the word class and the semantic meaning of a word
  – Morphological rules provide basis for morphological analyzers (computational morphology)
• Next time:
  – Read Ch 3.2-3.8 (new version)
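
As one small illustration of morphological rules turned into code, the Arabic /ktb/ root-and-pattern forms from the Semitic morphology slide can be regenerated by slotting root consonants and a vowel melody into a template. This is a sketch under assumptions that are not stated in the slides and are chosen only to fit the listed forms: the function names `tokenize` and `interdigitate` are hypothetical; a template with one more C slot than the root has consonants doubles the middle consonant (as in kattab); the active melody is a throughout; the passive melody is u with i in the final vowel slot; and VV is written as a vowel followed by ":".

```python
import re


def tokenize(template):
    """Split a template like 'tVCVVCVC' into slots: 'VV', 'V', 'C', or a literal affix letter."""
    return re.findall(r"VV|V|C|[a-z]", template)


def interdigitate(root, template, voice="active"):
    """Fill C slots with root consonants and V slots with the voice's vowel melody."""
    if voice not in ("active", "passive"):
        raise ValueError("voice must be 'active' or 'passive'")
    slots = tokenize(template)
    consonants = list(root)

    # Assumption: one extra C slot means the middle root consonant is doubled.
    n_c_slots = slots.count("C")
    if len(consonants) == 3 and n_c_slots == 4:
        consonants.insert(1, consonants[1])
    if n_c_slots != len(consonants):
        raise ValueError("root does not fit template")

    vowel_slots = [i for i, s in enumerate(slots) if s in ("V", "VV")]
    last_vowel = vowel_slots[-1] if vowel_slots else -1

    pieces = []
    next_consonant = iter(consonants)
    for i, slot in enumerate(slots):
        if slot == "C":
            pieces.append(next(next_consonant))
        elif slot in ("V", "VV"):
            if voice == "active":
                vowel = "a"
            else:
                vowel = "i" if i == last_vowel else "u"
            # A VV slot is a long vowel, written with a trailing colon.
            pieces.append(vowel + (":" if slot == "VV" else ""))
        else:
            pieces.append(slot)  # literal affix consonant such as t, n, s
    return "".join(pieces)


if __name__ == "__main__":
    templates = ["CVCVC", "CVCCVC", "CVVCVC", "tVCVVCVC",
                 "nCVVCVC", "CtVCVC", "stVCCVC"]
    for t in templates:
        print(t, interdigitate("ktb", t, "active"), interdigitate("ktb", t, "passive"))
```

The point of the sketch is that the root, the template, and the vowel melody are three independent inputs, which is what makes this morphology non-concatenative: no single prefix or suffix carries the voice or the word class.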