Morphology

Introduction to Morphology
Lexicon
Word
Morphology
Word: the smallest free form that appears in a language.
English: What are those things in the tree? Birds.
French: Qu'est-ce que c'est que ça? ('What is that?') Des oiseaux. ('Birds.')
Not: *Oiseaux. (The bare noun cannot stand alone as an answer.)

Lexeme (lemma)
We can begin with a rough conception of the forms that a word can take on while still being the same word:
one entry in the dictionary for sing, sang, sung; another for singer.
Alternate forms of the same lexeme are formed by inflectional morphology; if there is a common (fixed) form, it's called the inflectional stem.

Derivational morphology
Forms new words (new lexemes) from other words. Typically, the meaning changes. (When does it not? No problemo!)
The change in meaning can be subtle and difficult to make explicit; conditions on the base may be complex; each suffix has its own history. This is unlike the case of inflectional morphology.

Morpheme: the smallest unit of language that carries information about meaning or function: build; build-er; house; house-s.
Are we saying that a morpheme must have a characterizable meaning? No. But that is the usual case, without a doubt.

Grammatical vs. lexical morphemes
When we can identify a word (or a part of a word) as a morphological constituent composed of two morphemes, we can identify one of them as the base and the other as the affix.
Except when... (compounds, reduplication, ...)

Lexical morpheme
When a word consists of one morpheme, that morpheme is a lexical morpheme.
When it consists of two morphemes, the lexical morpheme is the base: that which is not the affix.

Derivational morphology
Deals with the relationship between morphologically simple forms (roots) and more complex forms that are distinct lexemes.
Monomorphemic (simple) words; complex or polymorphemic words.
No natural connection between sound and meaning (??)
Free morpheme: can stand as a word by itself.
Bound morpheme: cannot.

Allomorphs: the different phonological realizations of a single morpheme: say/says ("sez"); a/an. Often the result of the history of the language.
Terms:
Bound morpheme, free morpheme
base plus affix
root
inflectional versus derivational:
ODA: same word; change/not change category or "type of meaning";
order: derivational before inflectional
productivity
regularity of form?
word = stem + inflection
semantically transparent versus opaque
compounds: 2 stems: endocentric (normal) vs.
exocentric (redskin, highbrow, Maple Leafs)
?Sino-Soviet, Howard Johnson (John Goldsmith) Anglophobe
upgrades of affixes to stems: emic, etic, ese, ism.
Baby sitter, ice breaker, cake-icer, lawn-mower, star-gazer
5 footer

Roots and affixes: complex words often consist of a root plus affixes (prefixes, suffixes). The category of the word may be determined by either, though it's typically the affix, not the base.

List prefixes in English.
p. 129: Prefixes and suffixes. Associated with categories (in input and in output: suffixes change category): the -able suffix.

From Aronoff:
Assume the English stress rule is the Latin stress rule: stress falls on the penult or the antepenult.
inexplicable
hospitable
explicable
despicable
formidable

Two forms:
cómparable    compárable
réparable     repáirable
réfutable     refútable
préferable    preférable

Base            -ible form          -able form
circumscribe    circumscriptible    circumscribable
extend          extensible          extendable
defend          defensible          defendable
perceive        perceptible         perceivable
divide          divisible           dividable
deride          derisible           deridable

Truncation:
Base            truncated form      untruncated form
tolerate        tolerable           *toleratable
negotiate       negotiable          *negotiatable
vindicate       vindicable          *vindicatable
demonstrate     demonstrable        *demonstratable
exculpate       exculpable          *exculpatable

but

debate          debatable           *debable
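
The truncation pattern above can be sketched in a few lines of Python. This is only a rough illustration, not Aronoff's formal rule: the function name `able_form` and the `ate_is_suffix` flag are hypothetical, and the flag is supplied by hand on the assumption that truncation applies only when -ate is a separate morpheme, which spelling alone cannot show (debate is the case in point).

```python
def able_form(verb, ate_is_suffix=True):
    """Attach -able to a verb, truncating a suffixal -ate first (sketch)."""
    if verb.endswith("ate") and ate_is_suffix:
        return verb[:-3] + "able"   # tolerate -> toler + able
    if verb.endswith("e"):
        return verb[:-1] + "able"   # drop final e: divide -> dividable
    return verb + "able"            # extend -> extendable


for v in ["tolerate", "negotiate", "vindicate", "demonstrate", "exculpate"]:
    print(v, "->", able_form(v))

# debate keeps its -ate: here the caller says -ate is not a suffix.
print("debate ->", able_form("debate", ate_is_suffix=False))
```

Calling `able_form("debate")` with the default flag would produce the starred form *debable, which is why the morphological status of -ate has to come from outside the string.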

Infixes: fuckin' in English; others in Tagalog:
takbuh    t-um-akbuh    run/ran
lakad     l-um-akad     walk/walked
pili?     p-in-ili?     choose/chose
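
As a small sketch of the Tagalog pattern above, the infix can be placed immediately before the first vowel of the stem. The helper name `infix` and the five-letter vowel set are assumptions made for illustration; real infixation has conditions that this ignores.

```python
VOWELS = set("aeiou")  # assumed vowel inventory for this toy example


def infix(stem, affix):
    """Insert `affix` before the first vowel of `stem`."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[:i] + affix + stem[i:]
    return affix + stem  # no vowel found: fall back to prefixing


print(infix("takbuh", "um"))  # tumakbuh
print(infix("lakad", "um"))   # lumakad
print(infix("pili?", "in"))   # pinili?
```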
Arabic intercalation:
katab 'write'
kutib 'have been written'
aktub 'be writing'
uktab 'being written'

Cliticization: short unstressed forms that 'lean on' a neighboring word:
I'm leaving now.
Mary's going to succeed.
They're here now.
Je ne le crois pas. ('I don't believe it.')

Internal change: relating allomorphs: run/ran; sing/sang/sung.

Nouns: often marked for:
number
possessor
case
gender

Verbs:
subject agreement
object agreement
tense, aspect

Adjectives:
agreement with the object referred to, for number, case, gender
degree of comparison (-er, -est)

1. ninasema
2. wunasema
3. anasema
4. ninaona
5. ninamupika
6. tunasema
7. munasema
8. wanasema
9. ninapika
10. ninaupika
11. ninakupika
12. ninawapika
13. ananipika
14. ananupika
15. nilipika
16. nilimupika
17. nitakanupika
18. nitakapikiwa
19. wutakapikiwa
20. ninapikiwa
21. nilipikiwa
22. nilipikaka
23. wunapikizwa
24. wunanipikizwa
25. wutakanipikizwa
26. sitanupika
27. hatanupika
28. hatutanupika
29. hawatatupika

Morphology: Words and their Parts
CS 4705

Basic Uses of Morphology
• The study of how words are composed from smaller, meaning-bearing units (morphemes)
• Applications:
  – Spelling correction: referece
  – Hyphenation algorithms: refer-ence
  – Part-of-speech analysis: googler
  – Text-to-speech: grapheme-to-phoneme conversion
    • hothouse (/T/ or /D/)
  – Speech recognition: phoneme-to-grapheme conversion
  – Amusing poetry and artificial languages in standardized tests
    • ‘Twas brillig and the slithy toves…
    • Muggles moogled migwiches

What is a word?
• In formal languages, words are arbitrary strings
• In natural languages, words are made up of meaningful subunits called morphemes
  – Allows for productivity: googled, texted
  – Roots (abstract concepts denoting entities or relationships in the world) + syntactic or grammatical elements
• Realizations of morphemes: morphs
  – Door realizes door; take and took realize take
• Allomorphs are classes of related morphs that realize a given morpheme
  – Allomorphs of plural s include en, men, es in English
  – Take and took are allomorphs of take
  – Sum: Morpheme [s] is realized by an allomorph class that includes the related morphs {en, men, es}
• Syntactic or grammatical morphemes can convey many things
  – In Italian, they mark nouns for gender and number

            Singular    Plural
    Masc    pomodoro    pomodori
    Fem     cipolla     cipolle

  – pomodor-, cipoll-: stems, may or may not occur on their own as words
  – Stem may not occur as a word: derivative/deriv
  – Base form (lemma) occurs as word: derivative/derive
  – Sometimes the same: cars has stem ‘car’ and base form or lemma ‘car’ too
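
The Italian paradigm above can be pictured as a lookup from (gender, number) to an ending that attaches to a bound stem. The sketch below is illustrative only: the names `ENDINGS` and `inflect` are hypothetical, and the table covers just the two noun classes shown, not Italian nouns in general.

```python
# Endings for the two noun classes in the table above (assumed to be exhaustive
# only for this toy example).
ENDINGS = {
    ("masc", "sg"): "o",
    ("masc", "pl"): "i",
    ("fem", "sg"): "a",
    ("fem", "pl"): "e",
}


def inflect(stem, gender, number):
    """Attach the gender/number ending to a bound stem such as 'pomodor'."""
    return stem + ENDINGS[(gender, number)]


print(inflect("pomodor", "masc", "pl"))  # pomodori
print(inflect("cipoll", "fem", "sg"))    # cipolla
```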

What useful information does morphology give us?
• Different things in different languages
  – Spanish: hablo, hablaré / English: I speak, I will speak
  – English: book, books / Japanese: hon, hon
• Languages differ in how they encode morphological information
  – Isolating languages (e.g. Cantonese) have no affixes: each word usually has 1 morpheme
  – Agglutinative languages (e.g. Finnish, Turkish) build words from prefixes and suffixes added to a stem (like beads on a string), with each feature realized by a single affix, e.g. Finnish
      epäjärjestelmällistyttämättömyydellänsäkäänköhän
      ‘Wonder if he can also ... with his capability of not causing things to be unsystematic’
  – Inflectional languages (e.g. English) merge different features into a single affix (e.g. ‘s’ in likes indicates both person and tense); the same feature can also be realized by different affixes
  – Polysynthetic languages (e.g. Inuit languages) express much of their syntax in their morphology, incorporating a verb’s arguments into the verb, e.g. Western Greenlandic
      Aliikusersuillammassuaanerartassagaluarpaalli.
      aliiku-sersu-i-llammas-sua-a-nerar-ta-ssa-galuar-paal-li
      entertainment-provide-SEMITRANS-one.good.at-COP-say.that-REP-FUT-sure.but-3.PL.SUBJ/3SG.OBJ-but
      'However, they will say that he is a great entertainer, but ...'
• So… different languages may require very different morphological analyzers

Morphology Can Help Define Word Classes
• AKA morphological classes, parts-of-speech
• Closed vs. open (function vs. content) class words
  – Closed: pronoun, preposition, conjunction, determiner, …
  – Open: noun, verb, adverb, adjective, …
• Identifying word classes is useful for almost any task in NLP, from translation to speech recognition to topic detection… very basic semantics

(English) Inflectional Morphology
• Word stem + grammatical morpheme --> different forms of same word
  – Usually produces word of same class
  – Usually serves a syntactic or grammatical function (e.g. agreement)
      like --> likes or liked
      bird --> birds
• Nominal morphology
  – Plural forms
    • s or es
    • Irregular forms (goose/geese)
    • Mass vs. count nouns (fish/fish(es), email or emails?)
  – Possessives (cat’s, cats’)
• Verbal inflection
  – Main verbs (sleep, like, fear) relatively regular
    • -s, -ing, -ed
    • And productive: emailed, instant-messaged, faxed, homered
    • But some are not regular: eat/ate/eaten, catch/caught/caught
  – Primary (be, have, do) and modal verbs (can, will, must) often irregular and not productive
    » Be: am/is/are/were/was/been/being
  – Irregular verbs few (~250) but frequently occurring
• In English, particles occur in only one form
  – Prepositions: to, from
  – Adverbs: happily, quickly
  – Conjunctions: but, and
  – Articles: the, a, an
  – Japanese?
• So… English inflectional morphology is fairly easy to model… with some special cases...
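
To make "fairly easy to model, with some special cases" concrete, here is a minimal sketch of English noun pluralization: an exception table for irregular forms, then the s/es rule. The names `IRREGULAR_PLURALS` and `pluralize` are hypothetical, the sibilant-ending test is a common spelling heuristic rather than anything stated in the slides, and "fox" is an illustrative word not taken from them.

```python
# Special cases are listed; everything else goes through the regular rule.
IRREGULAR_PLURALS = {"goose": "geese"}

# Spelling heuristic (an assumption): endings that take -es rather than -s.
ES_ENDINGS = ("s", "x", "z", "ch", "sh")


def pluralize(noun):
    """Return a plural form: irregular lookup first, then s/es."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(ES_ENDINGS):
        return noun + "es"
    return noun + "s"


for n in ["bird", "book", "goose", "fox"]:
    print(n, "->", pluralize(n))
```

The design mirrors the slide: the regular rule is tiny, and the cost of the model lies in how long the exception table has to be.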

Derivational Morphology
• Word stem + syntactic/grammatical morpheme --> new words
  – Usually produces word of different class
  – Incomplete process: derivational morphs cannot be applied to just any member of a class
• Verbs --> nouns
  – -ize verbs --> -ation nouns
  – generalize, realize --> generalization, realization
  – synthesize but no synthesization
• Verbs, nouns --> adjectives
  – embrace, pity --> embraceable, pitiable
  – care, wit --> careless, witless
• Adjective --> adverb
  – happy --> happily
• Process selective in unpredictable ways
  – Less productive: nerveless/*evidence-less, malleable/*sleep-able, rar-ity/*rareness
  – Meanings of derived terms harder to predict by rule
    • clueless, careless, nerveless, sleepless
• Derivation can be applied recursively:
  – Hospital --> hospitalize --> hospitalization --> prehospitalization --> …
  – Morphological analysis identifies concatenative processes as well as morphemes
      [pre[[[hospital]ize]ation]]
  – But there are bracketing paradoxes
      unhappier
      [un[happier]]: not happier
      [[unhappy]er]: more unhappy
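
The bracketed analyses above can be generated by applying affixes to a root one step at a time, wrapping the result in a new pair of brackets at each step. This is a hedged sketch: `bracket` and its (kind, affix) step format are hypothetical, and spelling changes such as happy + er = happier are ignored, so morphemes appear in their citation form.

```python
def bracket(root, steps):
    """Build a bracketed structure by applying (kind, affix) steps in order."""
    tree = f"[{root}]"
    for kind, affix in steps:
        if kind == "prefix":
            tree = f"[{affix}{tree}]"
        elif kind == "suffix":
            tree = f"[{tree}{affix}]"
        else:
            raise ValueError(f"unknown affix kind: {kind}")
    return tree


# Recursive derivation: hospital -> hospitalize -> hospitalization -> pre...
print(bracket("hospital", [("suffix", "ize"), ("suffix", "ation"), ("prefix", "pre")]))

# The two readings of 'unhappier' differ only in the order of the steps.
print(bracket("happy", [("suffix", "er"), ("prefix", "un")]))  # not happier
print(bracket("happy", [("prefix", "un"), ("suffix", "er")]))  # more unhappy
```

Because the order of steps is the only thing that differs between the two readings, the same surface string maps to two structures, which is the ambiguity a morphological analyzer has to resolve or preserve.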

Compounding
• Two base forms join to form a new word
  – Bedtime, Wienerschnitzel, Rotwein
  – Careful? Compound or derivation?
• Affixes can be attached to stems in different ways
  – Prefixation
    • Immaterial
  – Suffixation: more common across languages than prefixation
    • Trying
  – Circumfixation: combine prefixation and suffixation
    • Gesagt
  – Infixation
    • English: Absobl**dylutely
    • Bontoc: ‘um’ turns adjectives and nouns into verbs (kilad (red) --> kumilad (to be red))

Concatenative vs. Non-concatenative Morphology
• Semitic root-and-pattern morphology
  – Root (2-4 consonants) conveys basic semantics (e.g. Arabic /ktb/)
  – Vowel pattern conveys voice and aspect
  – Derivational template (binyan) identifies word class

                Vowel pattern
    Template    active      passive     Gloss
    CVCVC       katab       kutib       write
    CVCCVC      kattab      kuttib      cause to write
    CVVCVC      ka:tab      ku:tib      correspond
    tVCVVCVC    taka:tab    tuku:tib    write each other
    nCVVCVC     nka:tab     nku:tib     subscribe
    CtVCVC      ktatab      ktutib      write
    stVCCVC     staktab     stuktib     dictate
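
Root-and-pattern formation can be sketched as filling a template with root consonants and a vowel melody. The notation here is an assumption made for the example, not the slides' own: digits 1-3 mark which root consonant fills a slot (so gemination is written `22`), `V` is a short vowel, `VV` a long vowel written with `:`, and the melody is taken to be a...a for active and u...i for passive, with the second vowel used only in the last vowel slot.

```python
import re

MELODIES = {"active": ("a", "a"), "passive": ("u", "i")}  # assumed melodies


def interdigitate(root, template, voice="active"):
    """Fill a template such as '1V22V3' with root consonants and vowels."""
    first, last = MELODIES[voice]
    slots = re.findall(r"VV|V|\d|[a-z]", template)
    n_vowel_slots = sum(1 for s in slots if s.startswith("V"))
    out, seen = [], 0
    for s in slots:
        if s.startswith("V"):
            seen += 1
            vowel = last if seen == n_vowel_slots else first
            out.append(vowel + (":" if s == "VV" else ""))
        elif s.isdigit():
            out.append(root[int(s) - 1])  # 1-based index into the root
        else:
            out.append(s)                 # fixed template consonant (t, n, s)
    return "".join(out)


for template in ["1V2V3", "1V22V3", "1VV2V3", "tV1VV2V3", "n1VV2V3", "1tV2V3", "stV12V3"]:
    print(template, interdigitate("ktb", template), interdigitate("ktb", template, "passive"))
```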

Morphotactics
• What are the ‘rules’ for constructing a word in a given language?
  – Pseudo-intellectual vs. *intellectual-pseudo
  – Rational-ize vs. *ize-rational
  – Cretin-ous vs. *cretin-ly vs. *cretin-acious
• Possible ‘rules’
  – Suffixes are suffixes and prefixes are prefixes
  – Certain affixes attach to certain types of stems (nouns, verbs, etc.)
  – Certain stems can/cannot take certain affixes
• Semantics: In English, un- cannot attach to adjectives that already have a negative connotation:
  – Unhappy vs. *unsad
  – Unhealthy vs. *unsick
  – Unclean vs. *undirty
• Phonology: In English, -er cannot attach to words of more than two syllables
  – Great, greater
  – Happy, happier
  – Competent, *competenter
  – Elegant, *eleganter
  – Unruly, ?unrulier
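
The phonological constraint on -er can be approximated in code by estimating syllables and refusing the suffix above two. This is a deliberately crude sketch: `estimate_syllables` counts runs of vowel letters (including y), which is a spelling heuristic assumed for illustration and not a real syllabifier, and both function names are hypothetical.

```python
import re


def estimate_syllables(word):
    """Rough syllable estimate: number of vowel-letter runs (heuristic)."""
    return len(re.findall(r"[aeiouy]+", word.lower()))


def can_take_er(adjective):
    """Apply the slide's rule: -er attaches only to words of <= 2 syllables."""
    return estimate_syllables(adjective) <= 2


for adj in ["great", "happy", "competent", "elegant", "unruly"]:
    print(adj, estimate_syllables(adj), can_take_er(adj))
```

Under this estimate unruly counts as three syllables and is rejected, whereas the slide marks unrulier only as questionable, which shows that the rule as stated is a first approximation.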

Morphological Parsing
• These regularities enable us to create software to parse words into their component parts
  – Known words and new ones (e.g. Pneumonoultramicroscopicsilicovolcanoconiosis, Columbianize, Columbianization)
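
A toy version of such a parser strips known suffixes from the right until it reaches a stem in a lexicon. Everything here is a simplifying assumption for illustration: the three-entry `SUFFIXES` table, the "restore" strings that undo spelling changes (-ation is taken to have replaced a final e), the `parse` name, and the one-word lexicon passed in by the caller. Practical analyzers are usually built with finite-state methods instead.

```python
# (surface ending, letters to restore on the stem, morpheme label)
SUFFIXES = [
    ("ation", "e", "-ation"),  # Columbianiz+ation -> Columbianize
    ("ize", "", "-ize"),
    ("s", "", "-s"),
]


def parse(word, lexicon):
    """Return [stem, suffix, ...] or None if no analysis is found."""
    suffixes = []
    current = word
    while current not in lexicon:
        for surface, restore, label in SUFFIXES:
            if current.endswith(surface) and len(current) > len(surface):
                current = current[: -len(surface)] + restore
                suffixes.insert(0, label)  # keep suffixes in surface order
                break
        else:
            return None  # no suffix applies and the stem is unknown
    return [current] + suffixes


lexicon = {"Columbian", "car"}
print(parse("Columbianization", lexicon))  # ['Columbian', '-ize', '-ation']
print(parse("cars", lexicon))              # ['car', '-s']
```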

Morphological Representations: Evidence from Human Performance
• Hypotheses:
  – Full listing hypothesis: words listed
  – Minimum redundancy hypothesis: morphemes listed
• Experimental evidence:
  – Priming experiments (Does seeing/hearing one word facilitate recognition of another?) suggest neither
  – Regularly inflected forms (e.g. cars) prime their stem (car), but derived forms (e.g. management) do not prime theirs (manage)
  – But spoken derived words can prime stems if they are semantically close (e.g. government/govern but not department/depart)
• Speech errors suggest affixes must be represented separately in the mental lexicon
  – ‘easy enoughly’ for ‘easily enough’

Summing Up
• Different languages have different morphological systems
  – If we can discover how to decode such a system, we can identify useful information about the word class and the meaning of a word
  – Morphological regularities provide basis for building (automatic) morphological analyzers
• Next time: Read Ch 3.2-3.6
  – HW1 will be assigned (check the course syllabus and courseworks)

Announcements
• HW1 will now be due 9/25/07
• WICS lunch tomorrow at noon in the CS Lounge, 452 MUDD (rsvp to hila@cs.columbia.edu)