Salience and Frequency of Meanings
Comparison of Corpus and Experimental Data on Polysemy

Daniela Marzo, Verena Rube, Birgit Umbreit (University of Tübingen)
Corpus and Cognition Workshop, Corpus Linguistics Conference 2007

Structure
1. Hypotheses
2. Evidence from other studies
3. Production experiment
4. Corpus analysis
5. Comparison
6. Résumé

1. Hypotheses

Hypothesis 1: Frequency = Entrenchment (≈ Salience)
Schmid (2000): From-Corpus-to-Cognition Principle; compare Langacker (1991)

Hypothesis 2: Frequency ≠ Salience (Gernsbacher 1984, De Mauro 1980)

2. Evidence from previous studies

Roland & Jurafsky (2002), Gilquin (2006 and 2007)
- comparison of corpus studies and experiments on polysemy
→ differences in verb sense distributions
→ Evidence against Hypothesis 1
- But Gilquin (2007) claims that there are some links between salience and frequency: If phrasal and colloquial meanings are excluded from her analysis of to take, the ‘move’ sense is most prominent in the corpus and in the experiment.
- Gilquin (2007) also claims that the prototypical (= most salient) sense has to be a concrete sense.

3. Production experiment

Sentence Generation Task (SGT)
Subjects are asked to produce sentences disambiguating the meanings of the stimulus (Caramazza and Grober 1976; Colombo and Flores d’Arcais 1984; Raukko 2003).

Definition Task (DT)
Sentences should be accompanied by definitions (Raukko 2003).

Questionnaire

Example: Disambiguation by sentences
(1) Quel bimbo diventerà grande in fretta (crescere).
    This child will get big very soon (to grow).
(2) Ha un grande appartamento (di vaste dimensioni).
    He has got a big apartment (of big dimensions).

Two closely related meanings
- ‘spatially extended’
- ‘tall, extended in height’
were disambiguated by the informants.

Example: Disambiguation by definition
(3) Quel cantante è grande.
    This singer is ?.
Occurrence is ambiguous: 1. ‘tall, extended in height’? or 2.
‘admirable, famous’? or 3. ‘very able, very gifted’?

The definition disambiguates the sentence:
Quel cantante è grande (famoso).
→ 2. ‘admirable, famous’

Comparison sample
- 400 stimuli
- 20 stimuli per questionnaire
- 30 informants per questionnaire
Comparison sample: Results for the 15 most frequent words:
andare, avere, cosa, dire, dare, dovere, essere, fare, grande, potere, sapere, stare, vedere, venire, volere

4. Corpus Analysis (CA)

Corpus of the Lessico di frequenza dell’italiano parlato (LIP)
- Corpus of transcribed spoken language
- easily accessible via the banca dati dell’italiano parlato (BADIP)
- ca. 490,000 words
- 5 different types of oral texts: telephone conversations, university lectures, etc.
- recordings were made in 4 Italian cities: Florence, Milan, Rome and Naples.

Constituting the Subcorpus
For each of the 15 sample words:
- 50 randomly selected occurrences of the Florence subcorpus (10 per text type) ~ average number of sentences in the SG&DT
- Only occurrences of the original stimulus! (No idioms, etc.)

5. Comparison

(i) The most salient meanings in the SG&DT in the majority of the cases (9/15) correspond to the most frequent meanings of the CA.
venire, andare, dire, dovere, essere, fare, potere, sapere, vedere

[Figure: Comparison for sapere. Bar chart by meaning, Experiment vs. Corpus; vertical axis 0 to 70.]

[Figure: Comparison for essere. Bar chart by meaning, Experiment vs. Corpus; vertical axis 0 to 50.]

(ii) If the most important meaning in one data type does not correspond to the most important meaning in the other data type, it is likely to correspond to the second most important meaning. Sometimes this works in both directions.
cosa, avere, dare, grande, stare, volere

[Figure: Comparison for grande. Bar chart by meaning, Experiment vs. Corpus; vertical axis 0 to 40.]

(iii) The less frequent meanings in the CA and the less salient meanings in the SG&DT usually diverge.

[Figure: Comparison for essere, shown again. Bar chart by meaning, Experiment vs. Corpus; vertical axis 0 to 50.]

Importance of concrete meanings
In the cases of result (ii):
- Experiment: most important meaning = “concrete” meaning
- Corpus: most important meaning = “abstract” meaning
e.g.
cosa
- Corpus: ‘object #abstract’ 84 %
- Experiment: ‘object #concrete’ 39 %
⇒ confirmation of the claim that the “prototypical” sense is a concrete sense
⇒ the importance of concrete meanings in the cognitive system seems to inhibit the total matching of the results

6. Résumé

The comparison study confirmed:
No 1:1 correspondence between experiment and corpus data
⇒ frequency ≠ salience (Hypothesis 2)
BUT:
In almost two thirds of the data (9 of 15 words) the most important meanings in CA and in SG&DT are identical. If they do not match, there is correspondence to the second most important meaning.
⇒ frequency ≈ salience (Tendency towards Hypothesis 1?)

Comparability of the evidence?
- Production task data = language reflection data
- Corpus data = language use data

THANK YOU FOR YOUR ATTENTION!

References
Caramazza, A. and E. Grober (1976) Polysemy and the structure of the subjective lexicon, in C. Rameh (ed.) Semantics: Theory and Application. Georgetown University Round Table on Linguistics 1976, pp. 181-206. Washington D.C.: Georgetown University Press.
Colombo, L. and G. B. Flores d’Arcais (1984) ‘The meaning of Dutch prepositions: Psycholinguistic study of polysemy’. Linguistics 22, 51-98.
De Mauro, T. (1980) Guida all’uso delle parole. Come parlare e scrivere semplice e preciso. Uno stile italiano per capire e farsi capire. Roma: Editori Riuniti.
De Mauro, T. and F. Mancini (1993) Lessico di frequenza dell’italiano parlato. Milano: Etas Libri.
Gernsbacher, M. A. (1984) ‘Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy’. Journal of Experimental Psychology 113, 256-81.
Gilquin, G. (2006) Towards an empirically grounded definition of prototypes. Poster presentation at Linguistic Evidence II, Tübingen, 2-4 February 2006.
Gilquin, G. (2007) Universality and language specificity in prototypicality. Paper presented at the AFLiCo conference, Lille, 10-12 May 2007.
Juilland, A. and V.
Traversa (1973) Frequency dictionary of Italian words. The Hague: Mouton de Gruyter.
Langacker, R. W. (1991) Foundations of Cognitive Grammar. Vol. II. Stanford: Stanford University Press.
Raukko, J. (2003) Polysemy as flexible meaning: experiments with English get and Finnish pitää, in B. Nerlich et al. (eds.) Polysemy: Flexible Patterns of Meaning in Mind and Language, pp. 161-93. Berlin: Mouton de Gruyter.
Roland, D. and D. Jurafsky (2002) Verb sense and verb subcategorization probabilities, in S. Stevenson and P. Merlo (eds.) The Lexical Basis of Sentence Processing: Formal, Computational, and Experimental Issues, pp. 325-46. Amsterdam: Benjamins.
Schmid, H. J. (2000) English Abstract Nouns as Conceptual Shells. From Corpus to Cognition. Berlin: Mouton de Gruyter.
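
Supplementary sketch: checking results (i) and (ii) for one word

The comparison in section 5 asks, for each word, whether the top-ranked meaning in the experiment is also the top-ranked (or at least the second-ranked) meaning in the corpus. A minimal Python sketch of such a check might look as follows. It is not the procedure used in the study: the function names, the result labels and the toy counts are all assumptions made up for illustration.

```python
from collections import Counter


def rank_meanings(counts):
    """Return meanings ordered from most to least frequent (ties broken alphabetically)."""
    return [m for m, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def compare_top_meanings(experiment, corpus):
    """Classify one word by how its top meanings line up across the two data types.

    Returns "(i)" if the top meanings are identical, "(ii)" if the top meaning of
    one data type is the second meaning of the other, and "none" otherwise.
    """
    exp_rank = rank_meanings(experiment)
    cor_rank = rank_meanings(corpus)
    if exp_rank[0] == cor_rank[0]:
        return "(i)"
    # Slicing with [1:2] avoids an IndexError when a word has only one meaning.
    if exp_rank[0] in cor_rank[1:2] or cor_rank[0] in exp_rank[1:2]:
        return "(ii)"
    return "none"


# Hypothetical toy counts for an invented word, NOT data from the study.
experiment = Counter({"concrete sense": 12, "abstract sense": 9, "other": 4})
corpus = Counter({"abstract sense": 30, "concrete sense": 15, "other": 5})

print(compare_top_meanings(experiment, corpus))  # prints "(ii)"
```

Run over all 15 sample words, a check of this kind would yield the tallies reported above (9 words of type (i), 6 of type (ii)), provided the meanings are annotated with the same labels in both data types.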