Verbal Synonymy in Practice: Combining Corpus-based and Psycholinguistic Evidence

Antti Arppe
antti.arppe@helsinki.fi
Department of General Linguistics
University of Helsinki

Juhani Järvikivi
juhani.jarvikivi@joensuu.fi
General Linguistics
University of Joensuu

Table of contents
• Background
• Goals of this research
• Some words about synonymy
• Corpus-based results
• Psycholinguistic test results
• Combining and interpreting the evidence
• Conclusion

Background: traditional descriptions of synonyms and their usage
• lexical descriptions that contain information about synonyms, i.e. general dictionaries or specific synonym dictionaries/thesauri, rarely provide extensive and/or explicit information on the usage or contextual limitations of these synonyms or on their interchangeability
• synonyms are actually used to describe each other
• Examples – cognitive verbs ~ think/ponder:
  – Collins Cobuild English dictionary (2001)
    • corpus-based
  – Comprehensive dictionary of Finnish (i.e. PSK 1990/1997)
    • word-card corpus-based

Collins
– ponder
  • If you ponder something, you think about it carefully
  • I found myself constantly pondering the question: ’How could anyone do these things’ ... The prime minister pondered on when to go to the polls ... I’m continually pondering how to improve the team
  • V n | V on/over n | V wh | ALSO V
– deliberate
  • [3/3] If you deliberate, you think about something carefully, especially before making a very important decision
  • She deliberated over the decision for a good few years before she finally made up her mind ... The six-person jury deliberated about two hours before returning with the verdict ... The Court of Appeals has been deliberating his case for almost two weeks
  • V prep | V | V n

What are the indicated differences between ponder and deliberate
• frequency
  – ponder vs. deliberate
• description
  – if you deliberate/ponder, you think about something carefully ...
    • deliberate: ...
      especially before making a very important decision
• syntax
  – common: V n | V
  – ponder: V on/over n | V wh
  – deliberate: V prep

PSK
– [1/2] miettiä
  • ajatella, harkita, pohtia, punnita, tuumia, aprikoida, järkeillä, mietiskellä
  • Mitä mietit? ... Asiaa täytyy vielä miettiä ... Mietin juuri, kannattaako ollenkaan lähteä ... Vastasi sen enempää miettimättä. ... Mietti päänsä puhki.
– pohtia
  • ajatella jotakin perusteellisesti, eri mahdollisuuksia arvioiden, harkita, miettiä, tuumia, ajatella, järkeillä, punnita, aprikoida
  • Pohtia arvoitusta, ongelmaa ... Pohtia kysymystä joka puolelta ... Pohtia keinoja asian auttamiseksi.

A rough English approximation of the PSK examples for pohtia and miettiä
– miettiä ~ M-think
  • think, consider, ponder, weigh, muse, wonder, think rationally, contemplate
  • What are you thinking about? ... One still has to think about the issue ... I’m thinking right now whether it is worth going at all ... Answered without any further thought ... Pondered his head ”off”
– pohtia ~ ponder
  • consider something thoroughly, evaluating the different possibilities, consider, M-think, muse, think, think rationally, weigh, wonder
  • ponder a puzzle, problem ... Consider the issue from every angle ...
    Consider ways to improve the situation

What are the differences between miettiä and pohtia
• descriptions
  – common: ajatella ~ think, harkita ~ consider, tuumia ~ muse, järkeillä ~ think rationally, punnita ~ weigh, aprikoida ~ wonder
  – miettiä: mietiskellä ~ contemplate, meditate
  – pohtia: ajatella jotakin perusteellisesti, eri mahdollisuuksia arvioiden ~ consider something thoroughly, evaluating the different possibilities
• no differences indicated in grammatical usage

Background: Linguistic studies on synonym usage
• numerous studies have shown that a wide range of factors influence which word in a synonym group is actually chosen
• synonyms are not as fully interchangeable as they have naively been taken to be
• these studies are typically corpus-based

Linguistic studies cont’d
• These factors include e.g.
  – register, intended style, situation (Zgusta 1971, Biber 1998)
  – lexical and syntactic context (Biber 1998)
  – functional context (Atkins 199x)
  – (word-internal) morphological context, i.e. inflected form (Arppe 2002)
• Sinclair (1991) has further argued that each inflected form of a lexeme could in principle have independent usage contexts, e.g. concerning collocates

Goals of this study
• The factors that have been noted to influence the selection and usage of synonyms have been observed mainly using large corpora
• Do the corpus-based results on differences in the usage of synonyms match the linguistic intuitions of native speakers, i.e. subjective acceptability ratings?
• How could combining two types of linguistic evidence be used to enhance existing lexicographical descriptions of word usage?

A few words on synonymy
• as a premise absolute synonymy, i.e.
  full interchangeability in all possible contexts, is not expected to exist in practice or to be found in corpora or otherwise
• on a naive level synonymy is believed to exist, as speakers of a language feel that some words can be interchanged with each other without an essential change in the meaning and connotations of an utterance
• synonymy is interpreted as near-synonymy in this study

A description of synonymy (Cruse 2000: 156-160)
• ”based on empirical, contextual evidence”
• ”synonyms are words
  1) whose semantic similarities are more salient than their differences
  2) that do not primarily contrast with each other; and
  3) whose permissible differences must in general be either minor, backgrounded, or both”

The corpus-based study
• A refinement of Arppe (2002)
• on the basis of lexicographical sources (descriptions, examples) and frequency information, a pair of Finnish cognitive verbs was chosen: miettiä and pohtia ~ think, consider, ponder
• approximately 2 million words of Finnish newspaper text
• automatically morphosyntactically analyzed using Connexor’s Functional Dependency Grammar (FDG) parser

The corpus-based study (cont’d)
• all instances of the two selected verbs and of the selected argument types (agent) were manually identified, and the analyses were corrected where necessary
• the agents were manually classified semantically according to WordNet (Miller et al. 1991)
• the t-score (Church et al. 1991) is used to highlight differences in the frequency of contextual features
  – morpho-syntactic features are treated analogously to the lexemes that Church et al. studied

Judgements in synonymy: pohtia
• Hallitus pohtii lähiviikkoina, pitääkö se kiinni lupauksestaan painaa valtion menot vuonna 1995 reaalisesti vuoden 1991 tasolle.
  The government is considering in the coming weeks whether it will keep its promise to push public spending in 1995 down to the level of 1991.
• ???
  Hallitus miettii lähiviikkoina, pitääkö se kiinni lupauksestaan painaa valtion menot vuonna 1995 reaalisesti vuoden 1991 tasolle.

Judgements in synonymy: pohtia
• Työryhmässä oli erillinen jaos, joka pohti moottorikelkkailua Lapin läänissä.
  There was a separate subgroup in the working group which was considering motor-sledding in the province of Lapland.
• ??? Työryhmässä oli erillinen jaos, joka mietti moottorikelkkailua Lapin läänissä.

Judgements in synonymy: pohtia
• Nato pohtii laajentamiskysymystä kokouksessaan Brysselissä.
  Nato is considering the issue of expansion in its meeting in Brussels.
• ??? Nato miettii laajentamiskysymystä kokouksessaan Brysselissä.

Judgements in synonymy: miettiä
• Mietin muuttoa pari vuotta, laskin yhteen plussia ja miinuksia.
  I considered moving for a couple of years; I added up the pluses and minuses.
• ??? Pohdin muuttoa pari vuotta, laskin yhteen plussia ja miinuksia.

Judgements in synonymy: miettiä
• Aina kun mietin, että synnyttäisin lapsen, ajatus tuntui mahdottomalle.
  Whenever I considered giving birth to a child, the thought seemed inconceivable.
• Aina kun pohdin, että synnyttäisin lapsen, ajatus tuntui mahdottomalle.

Obvious conclusions?
• pohtia is tilted towards collective human subjects such as eduskunta ’parliament’, jaos ’subdivision’ or Nato ’NATO’
• miettiä is tilted towards individual, personal subjects, as in the 1st person singular

The corpus strikes back I
• ... miksi Suomessa jopa eduskunta miettii milloin kaupan ovi saa olla auki?
  ... why in Finland even the Parliament is considering when a shop can have its doors open?
• MTK miettii ehtoja tänään.
  MTK is considering its negotiation terms today.

The corpus strikes back II
• Liikenneministeriön työryhmä miettii parhaillaan, miten tunnuksettomia puheluita pitäisi kohdella.
  A working group in the Transport Ministry is presently considering how nonprefixed calls should be treated.
• Yhtä kuitenkin pohdin.
  There is one issue that I’m considering.

Preliminary conclusions
• the two verbs are more interchangeable, i.e. synonymous, than one would suspect at first
  – collective human subjects can also be used with miettiä
  – individual, personal subjects can also be used with pohtia

Data on the occurrences of the two verbs
• 410 occurrences of miettiä, representing 49 unique word forms
• 445 occurrences of pohtia, representing 45 unique word forms
• 25 of the morphological analyses were common to both verbs
• active indicative present tense third person singular was the most frequent form
  – 85 occurrences of miettii
  – 145 occurrences of pohtii

Corpus-based results – morphological preferences

t-score   Fisher’s exact test   Verb      n(feature,verb)/n(feature,total)   Morpho-syntactic feature
 2.358    1.000000              miettiä   24/26                              0_SG1
 2.148    1.000000              pohtia    206/336                            0_SG3
-2.705    0.000013              miettiä   130/336                            0_SG3
-8.170    0.000001              pohtia    2/26                               0_SG1

Corpus-based results – preferences of agent types

t-score   Fisher’s exact test   Verb      n(feature,verb)/n(feature,total)   Semantic category of subject/agent
 1.908    1.0000                pohtia    34/44                              SEM_HUMAN_GROUP
 1.844    1.0000                pohtia    155/254                            SEM_HUMAN_INDIVIDUAL
 0.679    1.0000                pohtia    2/2                                SEM_COGNITION
 0.560    0.9089                miettiä   4/6                                SEM_LOCATION
 0.480    1.0000                pohtia    1/1                                SEM_ACTIVITY
 0.0      0.2700                miettiä   0/2                                SEM_COGNITION
 0.0      0.5199                miettiä   0/1                                SEM_ACTIVITY
-0.791    0.3067                pohtia    2/6                                SEM_LOCATION
-2.307    0.0004                miettiä   99/254                             SEM_HUMAN_INDIVIDUAL
-3.518    0.0004                miettiä   10/44                              SEM_HUMAN_GROUP

Corpus-based results - summary
• there seemed to be statistically significant differences in the preferences of either verb according to the person and countability of the agent
• 1st person singular frames prefer miettiä
• 3rd person singular collective human frames prefer pohtia

Psycholinguistic Experiments
• Two off-line experiments
  – Forced Choice
  – Acceptability Rating
• Hypotheses based on the corpus-based results
  – 1st person singular agents (1SG) prefer miettiä
  – 3rd person collective agents (3COLL) prefer pohtia

XP 1:
Forced choice
Materials
• 31 sentence triplets with 31 sentence frames and three different verbs for each triplet, e.g.,
  • Anu Joutsasta pohti hetken ~ Anu from Joutsa thought for a moment
  • Anu Joutsasta mietti hetken
  • Anu Joutsasta ajatteli hetken
• The materials were constructed from (slightly edited) natural instances of either experimental verb, each serving as the sentence frame for the other verb(s)
  – the natural instances came from the same corpus as in the corpus-based study

Forced choice (cont’d)
• The two experimental verbs (pohtia vs. miettiä) and the filler (ajatella) were presented semi-randomized within each triplet in the appropriate inflected form.
• The participants were instructed to select the most natural sentence from each triplet and check the appropriate box on the experimental sheet.
• 21 Finnish native speakers participated in the experiment

Results: XP 1 (1) (N=520)

%         1sg    3sg    3coll
miettiä   45.0   35.8   19.2
pohtia    10.4   31.9   57.7

[Figure: bar chart of the percentages above for miettiä and pohtia by agent type (1sg, 3sg, 3coll)]

Results: XP 1 (2)
• The overall distribution of responses differed significantly from chance (χ², p < .0001)
• The 1SG agent clearly preferred the verb miettiä (χ², p < .001)
• The 3SG-COLLECTIVE agent had a clear preference for the verb pohtia (χ², p < .001)
• There was no preference either way in the 3SG (non-collective) category (χ², n.s.)
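The comparisons against chance reported above can be illustrated with a small sketch. The slides give neither the χ² statistics nor the raw counts, so everything concrete here is an assumption: the function name `chi_square_vs_chance` is made up for illustration, "chance" is taken to mean a 50/50 split between the two experimental verbs within one agent type, and the counts in the usage line are hypothetical, not the experimental data.

```python
import math


def chi_square_vs_chance(count_a: int, count_b: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test of two counts against a 50/50 split.

    Returns (chi2, p_value). With two categories there is 1 degree of
    freedom, so the p-value can be computed as erfc(sqrt(chi2 / 2)).
    """
    total = count_a + count_b
    expected = total / 2  # assumption: chance = equal preference
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for one agent type (NOT taken from the slides):
# 70 choices of one verb against 30 of the other.
chi2, p = chi_square_vs_chance(70, 30)
print(f"chi2 = {chi2:.2f}, p = {p:.6f}")
```

A test that also counted the filler verb ajatella would have three categories and 2 degrees of freedom, for which this closed-form p-value does not apply.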
XP 2: Acceptability rating
• Sentence frames with each Agent Type (1SG, 3SG & 3COLL), 21 frames each, were used to construct the experimental sentences with both of the verbs miettiä and pohtia as well as with the closely related filler verb ajatella ~ think (generic)
  – the sentence frames were based on natural instances extracted from the corpus used in the corpus-based study
  – 1/3 of the sentences had the original verb from the corpus, 2/3 had another verb in the corresponding form
• this amounted to 63 test sentences per test subject
• 40 filler sentences were constructed with the verbs käsittää and ymmärtää ~ understand (20 + 20)

Acceptability rating (cont’d)
• The experimental sentences as well as the sentences with ajatella were counter-balanced over three experimental lists
• Each list included the same 40 filler sentences
• Altogether 103 sentences were presented in randomized order on each of the three experimental sheets
• The verbs were presented in angle brackets, e.g., Anu Joutsasta <ajatteli> hetken

Acceptability rating (cont’d)
• The three sheets were distributed as evenly as possible among 54 Finnish native speakers
• The participants were instructed to evaluate the acceptability of each verb in the sentence frame on a scale of 1-7 by checking the appropriate box on the sheet.
Mean Acceptability Scores XP2

          1SG   3SG   3COLL
miettiä   5.6   5.3   4.5
pohtia    5.2   5.6   5.4

[Figure: Mean Acceptability Scores (MAS), XP2, for miettiä and pohtia by agent type (1sg, 3sg, 3coll)]

• Significant main effect of Agent Type
• Significant interaction of Agent Type and Verb
• Agent Type significant with miettiä but not with pohtia
• miettiä: 3COLL significantly less acceptable than either 1SG or 3SG (p < .001), no difference between 1SG and 3SG (p > .2)

• Within the three Agent Types:
  – 1SG: miettiä significantly more acceptable than pohtia (p < .01)
  – 3SG: no significant difference (p > .1)
  – 3COLL: miettiä significantly less acceptable than pohtia (p < .001)

Discussion
• the corpus-based evidence and the psycholinguistic test results converge
• the psycholinguistic test results deepen the picture that the corpus provides and give an explanation for the mechanism that drives the selection of either verb in a particular context/frame
  – a word can be selected simply because the alternative is not preferred

Relationships between the different types of evidence
• the forced choice tests reflect normal actual usage situations (~ performance) and thus mirror the corpus-based results
• the acceptability tests reflect the general linguistic insights about what is considered possible and what is not (~ competence)
  – sounds like building blocks for generative descriptions

Conclusions
• the two types of empirical evidence clearly show that the two near-synonymous verbs differ in usage regarding the studied features
• combining two types of empirical linguistic evidence can be used to enhance and enrich lexical descriptions

Questions, Comments, Critique, Discussion
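Supplementary note on the corpus-based results: the slides do not spell out how the t-scores were computed. The sketch below assumes one common formulation, (O − E) / √O, where O is the observed count of a feature with a verb and E is the count expected from the verb's share of all occurrences. It further assumes that the verb totals are the 410 (miettiä) and 445 (pohtia) occurrences reported earlier, and that a zero observed count yields 0.0. The function name `t_score` is illustrative only. Under these assumptions the results come out close to, but not identical with, the reported values, so the original computation probably differed in some detail.

```python
import math


def t_score(observed: int, feature_total: int, verb_total: int, corpus_total: int) -> float:
    """Assumed t-score formulation: (O - E) / sqrt(O).

    observed      -- occurrences of the feature with this verb (O)
    feature_total -- occurrences of the feature with either verb
    verb_total    -- all occurrences of this verb
    corpus_total  -- all occurrences of both verbs
    """
    if observed == 0:
        return 0.0  # assumption: avoids division by zero for unattested pairs
    expected = feature_total * verb_total / corpus_total
    return (observed - expected) / math.sqrt(observed)


# Verb totals from the slide on occurrences (assumed to be the denominators).
MIETTIA_TOTAL, POHTIA_TOTAL = 410, 445
BOTH = MIETTIA_TOTAL + POHTIA_TOTAL

# 1st person singular (0_SG1): 24 of 26 with miettiä, 2 of 26 with pohtia.
sg1_miettia = t_score(24, 26, MIETTIA_TOTAL, BOTH)
sg1_pohtia = t_score(2, 26, POHTIA_TOTAL, BOTH)
print(f"0_SG1 miettiä: {sg1_miettia:.3f} (slide: 2.358)")
print(f"0_SG1 pohtia:  {sg1_pohtia:.3f} (slide: -8.170)")
```

Dividing by √O rather than by a pooled variance means that a strongly under-represented pairing (small O) gets a large negative score, which matches the pattern of the reported values.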