A definition of productivity

LIN 3098 Corpus Linguistics
Lecture 7
Albert Gatt

In this lecture
• We look at some ways in which corpora can be useful in morphological research.
• Main focus: morphological productivity.

Part 1: Morphology, corpora and productivity

Productivity in linguistics
• The term “productivity” is used in a wide variety of contexts.
• Syntactic rules are “productive” in the sense that they can be used to generate new phrases.
• The same can be said of some morphological rules.

A definition of productivity
• A linguistic process is productive if it can be used to produce novel forms.
• If a rule is productive, then:
    • novel (previously unheard) forms can be understood and produced;
    • there is no need to store all forms in the mental lexicon.

A couple of examples
• Imagine an English adjective garmy. How would you derive a noun from this adjective?
    • Many speakers might say garminess.
    • This suggests that -ness suffixation is a productive derivational process.
• Similarly, imagine a Maltese verb intoffa. How would you derive a noun from it?
    • Speakers might say intoffar, intoffament or intoffazzjoni.
    • This suggests that -ar and -ment suffixation are productive derivational processes in Maltese.

Productive vs non-productive
• Some morphological processes or categories seem to have greater potential to form new words than others:
    • e.g. English -able, -ness;
    • compare English -th: warmth, strength… (much less productive).

Classical approaches to productivity
• Jackendoff (1975): morphological rules are called redundancy rules.
    • They capture the relationship between related forms, e.g.
      warm → warmth (ADJ → N via addition of -th)
      desire → desirable (V → ADJ via addition of -able)
• If a rule is productive, then it can be used to create novel forms:
    • e.g. adjectives with -able can be produced “online”.

Features of classical approaches
1. They rely on a binary distinction (productive vs unproductive).
2. Productive rules are typically regular; sub-regularities are not considered much (Dressler 2003).
3. Most of these approaches do not look at corpus data.

Productive vs regular
• Usually, productive morphological rules are regular. Irregular forms are likely to be stored in the lexicon.
• However, we can sometimes detect “sub-regularities”:
    • sing-sang, ring-rang, bring-brang (?)
• Speakers can sometimes generalise these sub-regular processes, perhaps by analogy.
    • What’s the past tense of tring or spling?

“Possible” vs “attested”
• Our tentative definition of productivity focuses on the production of novel forms.
• By definition, novel forms are:
    • possible words of the language;
    • previously unattested.
• This would suggest that we can’t use corpora to study productivity, since corpora only contain attested forms.

The problem of frequency
• Suppose we find that a corpus contains lots of words ending in some suffix -X.
• This doesn’t necessarily imply that the -X suffix is productive.
    • It could have been productive in the past, but no longer be.
    • In that case, the likelihood of a new word ending in -X is low, despite the high frequency.

Getting around the problem
• Frequency can’t give us all the answers. However, one interesting solution is to look at hapax legomena.
• A corpus will usually contain lots of words occurring only once.
    • We can think of hapaxes as “one-offs”.
    • It seems likely that some hapaxes will be “new formations”.
• NB: We can only make this assumption if the corpus is very large.
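As a rough illustration of the hapax idea, the Python sketch below does three things on a tokenised corpus:

• it counts word frequencies;
• it lists the hapaxes;
• it picks out the hapaxes ending in a given suffix.

Several parts of the sketch are assumptions made for illustration:

• The toy token list is invented.
• The helper names `hapaxes` and `hapaxes_with_suffix` are hypothetical.
• Matching the suffix by spelling alone is crude. It would wrongly count words like business or witness as -ness formations.

Real work would need proper morphological analysis and, as noted above, a very large corpus.

```python
from collections import Counter


def hapaxes(tokens):
    """Return the set of word types that occur exactly once in `tokens`."""
    freq = Counter(tokens)
    return {word for word, count in freq.items() if count == 1}


def hapaxes_with_suffix(tokens, suffix):
    """Hapaxes whose spelling ends in `suffix`.

    Hypothetical helper: plain string matching is only a crude stand-in
    for 'formed using the suffix'; no morphological analysis is done.
    """
    return {w for w in hapaxes(tokens) if w.endswith(suffix)}


# Hypothetical toy corpus, already lower-cased and tokenised.
tokens = ("the goodness of the warmth and the garminess of the "
          "strength of the warmth and the goodness").split()

print(sorted(hapaxes(tokens)))                      # → ['garminess', 'strength']
print(sorted(hapaxes_with_suffix(tokens, "ness")))  # → ['garminess']
```

Note that goodness ends in -ness and occurs more often, but it is not a hapax. It is the one-off garminess that hints at a new formation.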
Corpus-based approaches
• View productivity as a gradable phenomenon:
    • some forms become ingrained through frequent usage;
    • the category can still be productive to some extent;
    • productivity is estimated in terms of a category’s potential to produce new forms.
• They can account for sub-regularities: the productivity of a category depends on many factors, including analogy to existing words.

The continuum
[Figure: a continuum with a productive morphological process (e.g. ADJ+ness → Noun) at one end and a lexicalised word (e.g. ADJ+th → Noun) at the other.]
• Productive processes tend to:
    • be compositional;
    • result in a lot of new words.

Why is productivity interesting?
• No finite lexicon can contain all the words of a language at a given time.
    • Productive processes can be exploited to parse new/unseen lexical items.
    • This is helped by the compositionality of productive processes.
• Productivity can also help to distinguish creative neologism from systematic rule application. Compare:
    • well-defined, well-intentioned, well-specified: lots of adjectives with a well- prefix;
    • YouTube: a one-off.

Theoretical implications
• Productivity raises interesting questions about the relationship between corpus-based measures and psycholinguistic data.
• The likelihood of a morphological process being applied depends on style, genre, speech community…
• Productivity can give an indication of language change over time (some processes become fossilised, others become more productive).

Statistical measures of productivity (Baayen 2006)

What we need
• A measure of the productivity of a process/category C should reflect:
    • our intuitions about how frequently we encounter C;
    • how easily native speakers can form new words using C.
• Is it easier to produce a noun with -th (like warmth) or one with -ness (like goodness)?

An analogy
• We can compare morphological processes to companies.
• All try to dominate a market where the number of clients (words) is limited.
• Productivity reflects the extent to which these companies:
    • have managed to dominate in the past (how many words they’ve formed);
    • are expanding into new areas of the market (how many new words they’re forming);
    • may expand in the future (how many as yet unseen words they’re going to form).

Realised productivity (RP)
• Given a morphological category C, RP gives a rough indication of the past utility of C in forming new words.
• It is measured as the number of distinct types in C in a corpus of size N.
    • E.g. the regular past tense -ed displays many more types than sub-regular forms such as keep-kept/sleep-slept.

Realised productivity cont’d
• Why types, not tokens?
    • Productive processes have lots of types which are hapaxes, or are very infrequent (low token frequency).
    • Words formed by irregular processes tend to be very frequent (high token frequency).
• Some limitations:
    • a high RP for a category does not imply that it will keep forming lots of new words;
    • RP is heavily dependent on corpus size.

Expanding productivity (P*)
• P* gives a rough indication of the rate of expansion of C.
• It focuses on the number of hapaxes produced using C in the corpus.
• Also known as hapax-conditioned productivity:
  $$P^{*} = \frac{\text{no. of hapaxes formed using } C}{\text{total no. of hapaxes in the corpus}}$$
• NB: P* is still heavily dependent on corpus size!

Potential productivity (P)
• P gives an indication of how likely a category C is to form new words in the future.
    • i.e. whether C is already saturated (in which case P is low).
• Also known as category-conditioned productivity:
  $$P = \frac{\text{no. of hapaxes formed using } C}{\text{total no. of tokens formed using } C}$$

Some more on P
• Unlike RP and P*, P is not very sensitive to corpus size as such.
• However, it is very sensitive to the frequency of the category.
    • E.g. if C is realised only once in a corpus of size N, that single token is also a hapax, so P = 1/1 = 1!
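The three measures can be computed from a frequency list once each word type has been assigned to a category. The Python sketch below is a minimal version of the definitions on the slides above:

• RP is the number of types in C.
• P* is C’s hapaxes divided by all hapaxes in the corpus.
• P is C’s hapaxes divided by C’s tokens.

The sketch rests on several assumptions made for illustration:

• The function names `productivity` and `suffix_category` are hypothetical.
• The toy corpus is invented.
• Categories are assigned by spelling alone. This would, for instance, also put with or both into the -th category.

```python
from collections import Counter
from typing import Callable, Optional


def productivity(tokens: list[str],
                 category_of: Callable[[str], Optional[str]],
                 category: str) -> dict[str, float]:
    """Compute RP, P* and P for `category` over a tokenised corpus.

    category_of(word) returns the category of a word type, or None;
    it is a hypothetical stand-in for real morphological analysis.
    """
    freq = Counter(tokens)
    # Word types and tokens that belong to the category C.
    types_c = [w for w in freq if category_of(w) == category]
    tokens_c = sum(freq[w] for w in types_c)
    # Hapaxes in the whole corpus, and hapaxes formed using C.
    hapaxes_all = sum(1 for n in freq.values() if n == 1)
    hapaxes_c = sum(1 for w in types_c if freq[w] == 1)

    rp = len(types_c)                                         # realised productivity
    p_star = hapaxes_c / hapaxes_all if hapaxes_all else 0.0  # expanding productivity
    p = hapaxes_c / tokens_c if tokens_c else 0.0             # potential productivity
    return {"RP": rp, "P*": p_star, "P": p}


def suffix_category(word: str) -> Optional[str]:
    """Hypothetical category assignment by spelling alone."""
    if word.endswith("ness"):
        return "-ness"
    if word.endswith("th"):
        return "-th"
    return None


# Hypothetical toy corpus.
tokens = ("warmth warmth warmth strength strength goodness goodness "
          "garminess happiness the the").split()

print(productivity(tokens, suffix_category, "-ness"))
# → {'RP': 3, 'P*': 1.0, 'P': 0.5}
print(productivity(tokens, suffix_category, "-th"))
# → {'RP': 2, 'P*': 0.0, 'P': 0.0}
# A category realised by a single token gets P = 1:
print(productivity("the the garminess".split(), suffix_category, "-ness"))
# → {'RP': 1, 'P*': 1.0, 'P': 1.0}
```

The last call shows the sensitivity of P to category frequency described above. A single occurrence is automatically a hapax, so P reaches its maximum regardless of how little evidence there is.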
Recent empirical work (Vegnaduzzo 2009) has shown that RP and P* may correlate very strongly with each other, while both correlate only weakly with P:
• the pattern non-X has high RP and P*, but low P;
• the pattern X-ish has low RP and P*, but high P.

P vs. RP and P*
• A category C can have low RP and P*, but high P.
    • In this case, C hasn’t been used much in the past, but is being used quite productively at the moment.
    • This corresponds to the “ease” with which new words can be formed using the category.
• Conversely, a category with high RP may still be saturated, and so have low P.

The psycholinguistic connection 1
• Rule vs. direct access:
    • To produce a word (e.g. illegal), you can either retrieve it directly from storage, or apply the rule on the fly.
    • Evidence suggests that the relative frequency of the base form vs. the derived form is related to which of the two alternatives applies.

The psycholinguistic connection 2
• Complexity-based affix ordering:
    • Corpus research: more productive affixes follow less productive ones in word formation.
    • It seems that more highly predictable (low-productivity) affixes are processed first.
    • High productivity may also imply a lower likelihood of entering into further derivational processes.

Works cited
• S. Vegnaduzzo (2009). Morphological productivity rankings of complex adjectives. Proc. NAACL-HLT Workshop on Computational Approaches to Linguistic Creativity.
• K. Moilanen and S. Pulman (2008). The good, the bad and the unknown: Morphosyllabic sentiment tagging of unseen words. Proc. ACL 2008.
• R. H. Baayen (2006). Linked from the course web page.
