The usage patterns and selectional preferences of synonyms in a

Verbal Synonymy in Practice:
Combining Corpus-based and
Psycholinguistic Evidence
Antti Arppe
antti.arppe@helsinki.fi
Department of General
Linguistics
University of Helsinki
Juhani Järvikivi
juhani.jarvikivi@joensuu.fi
General Linguistics
University of Joensuu
1
Table of contents
• Background
• Goals of this research
• Some words about synonymy
• Corpus-based results
• Psycholinguistic test results
• Combining and interpreting the evidence
• Conclusion
2
Background: traditional
descriptions of synonyms and
their usage
• lexical descriptions that contain information about
synonyms, i.e. general dictionaries or specific synonym
dictionaries/thesauri, rarely provide extensive and/or
explicit information on the usage or contextual
limitations of these synonyms or their interchangeability
• synonyms are actually used to describe each other
• Examples – cognitive verbs ~ think/ponder:
– Collins Cobuild English dictionary (2001)
• corpus-based
– Comprehensive dictionary of Finnish (i.e. PSK 1990/1997)
• word-card → corpus-based
3
Collins
– ponder 
• If you ponder something, you think about it carefully
• I found myself constantly pondering the question: ’How could
anyone do these things’ ... The prime minister pondered on
when to go to the polls .. I’m continually pondering how to
improve the team
• V n | V on/over n | V wh | ALSO V
– deliberate 
• [3/3] if you deliberate, you think about something carefully,
especially before making a very important decision
• She deliberated over the decision for a good few years before
she finally made up her mind ... The six-person jury
deliberated about two hours before returning with the verdict
... The Court of Appeals has been deliberating his case for
almost two weeks
• V prep | V | V n
4
What are the indicated
differences between ponder and
deliberate?
• frequency
– ponder  vs. deliberate 
• description
– if you deliberate/ponder, you think about something
carefully ...
• deliberate: ... especially before making a very important
decision
• syntax
– common: V n | V
– ponder: V on/over n | V wh
– deliberate: V prep
5
PSK
– [1/2] miettiä
• ajatella, harkita, pohtia, punnita, tuumia, aprikoida, järkeillä,
mietiskellä
• Mitä mietit? ... Asiaa täytyy vielä miettiä .. Mietin juuri,
kannattaako ollenkaan lähteä ... Vastasi sen enempää
miettimättä. ... Mietti päänsä puhki.
– pohtia
• ajatella jotakin perusteellisesti, eri mahdollisuuksia arvioiden,
harkita, miettiä, tuumia, ajatella, järkeillä, punnita, aprikoida
• Pohtia arvoitusta, ongelmaa ... Pohtia kysymystä joka puolelta
... Pohtia keinoja asian auttamiseksi.
6
A rough English approximation
of the PSK examples for pohtia
and miettiä
– miettiä ~ M-think
• think, consider, ponder, weigh, muse, wonder, think rationally,
contemplate
• What are you thinking about? ... One still has to think about
the issue ... I’m just wondering whether it is worth going at all
... Answered without any further thought ... Pondered his
head ”off”
– pohtia ~ ponder
• consider something thoroughly, evaluating every possibility,
consider, M-think, muse, think, think rationally, weigh, wonder
• ponder a puzzle, problem ... Consider the issue from every
angle ... Consider ways to improve the situation
7
What are the differences between
miettiä and pohtia?
• descriptions
– common: ajatella ~ think, harkita ~ consider, tuumia ~
muse, järkeillä ~ think rationally, punnita ~ weigh,
aprikoida ~ wonder
– miettiä: mietiskellä ~ contemplate, meditate
– pohtia: ajatella jotakin perusteellisesti, eri
mahdollisuuksia arvioiden ~ consider something
thoroughly, evaluating the different possibilities
• no differences indicated in grammatical usage
8
Background: Linguistic studies
on synonym usage
• numerous studies have shown that a wide
range of factors influence which word in a
synonym group is actually chosen
• Synonyms are not as fully interchangeable as
has been naively assumed
• These studies are typically corpus-based
9
Linguistic studies cont’d
• These factors include e.g.
– register, intended style, situation (Zgusta 1971, Biber
1998)
– lexical and syntactic context (Biber 1998)
– functional context (Atkins 199x)
– (word-internal) morphological context, i.e. inflected
form (Arppe 2002)
→ Sinclair (1991) has further argued that each inflected
form of a lexeme could in principle have independent
usage contexts, e.g. concerning collocates
10
Goals of this study
• The factors that have been noted to influence the
selection and usage of synonyms have been
observed mainly using large corpora
→ Do the corpus-based results on differences in the usage
of synonyms match the linguistic intuitions of native
speakers, i.e. subjective acceptability ratings?
→ How could combining two types of linguistic evidence
be used to enhance existing lexicographical
descriptions of word usage?
11
A few words on synonymy
• as a premise absolute synonymy, i.e. full
interchangeability in all possible contexts, is not
expected to exist in practice or to be found in the
corpora or otherwise
• on a naive level synonymy is believed to exist, as
speakers of a language feel that some words can
be interchanged with each other without an
essential change in the meaning and connotations
of an utterance
• synonymy is interpreted as near-synonymy in this
study
12
A description of synonymy
(Cruse 2000: 156-160)
• ”based on empirical, contextual evidence”
• ”synonyms are words
1) whose semantic similarities are more salient
than their differences
2) that do not primarily contrast with each other;
and
3) whose permissible differences must in general
be either minor, backgrounded, or both”
13
The corpus-based study
• A refinement of Arppe (2002)
• based on lexicographical sources (descriptions,
examples) and frequency information, a pair of
Finnish cognitive verbs was chosen
→ miettiä and pohtia ~ think, consider, ponder
• approximately 2 million words of Finnish
newspaper text
• automatically morphosyntactically analyzed
using Connexor’s Functional Dependency Grammar (FDG)
parser
14
The corpus-based study (cont’d)
• all instances of the selected two verbs and selected
argument types (agent) were manually identified
and the analyses were corrected if necessary
• the agents were manually semantically classified
according to WordNet (Miller et al. 1991)
• t-score (Church et al. 1991) is used to highlight
the differences in the frequency of contextual
features
→ morpho-syntactic features are treated in the same way as
the lexemes that Church et al. examined
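A minimal sketch of how such a t-score could be computed, assuming the common collocation formulation t = (O - E) / sqrt(O), where O is the observed count of a feature with one verb and E is the count expected from the marginal totals. The slides do not spell out the exact formula or totals, so the function and parameter names are illustrative. The counts (24 of 26 first person singular forms with miettiä, 2 of 26 with pohtia, 410 and 445 verb occurrences) are taken from the results slides of this presentation. Returning 0.0 when O is zero is likewise an assumption.

```python
import math


def t_score(observed: int, feature_total: int, verb_total: int, corpus_total: int) -> float:
    """Return (O - E) / sqrt(O) for one feature/verb pair.

    E is estimated from the marginals: feature_total * verb_total / corpus_total.
    ASSUMPTION: a zero observed count yields 0.0 instead of dividing by zero.
    """
    if observed == 0:
        return 0.0
    expected = feature_total * verb_total / corpus_total
    return (observed - expected) / math.sqrt(observed)


# Verb totals as reported on the slide listing the occurrences of the two verbs.
N_MIETTIA, N_POHTIA = 410, 445
TOTAL = N_MIETTIA + N_POHTIA  # ASSUMPTION: the two verbs together form the sample

# First person singular forms: 24 of 26 with miettiä, 2 of 26 with pohtia.
t_miettia_sg1 = t_score(24, 26, N_MIETTIA, TOTAL)
t_pohtia_sg1 = t_score(2, 26, N_POHTIA, TOTAL)

print(f"miettiä, 1SG: {t_miettia_sg1:.2f}")
print(f"pohtia,  1SG: {t_pohtia_sg1:.2f}")
```

With these inputs the sketch gives roughly 2.35 for miettiä and -8.15 for pohtia, close to but not identical with the scores reported on the results slide, presumably because the totals used there differ slightly.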
15
Judgements in synonymy: pohtia
• Hallitus pohtii lähiviikkoina, pitääkö se
kiinni lupauksestaan painaa valtion menot
vuonna 1995 reaalisesti vuoden 1991 tasolle.
→ The government will consider in the coming
weeks whether to keep its promise to push public
spending in 1995 down, in real terms, to the 1991 level.
• ??? Hallitus miettii lähiviikkoina, pitääkö se
kiinni lupauksestaan painaa valtion menot
vuonna 1995 reaalisesti vuoden 1991 tasolle.
16
Judgements in synonymy: pohtia
• Työryhmässä oli erillinen jaos, joka
pohti moottorikelkkailua Lapin
läänissä.
→ There was a separate subgroup in the
working group which was considering
snowmobiling in the province of Lapland.
• ??? Työryhmässä oli erillinen jaos,
joka mietti moottorikelkkailua Lapin
läänissä.
17
Judgements in synonymy: pohtia
• Nato pohtii laajentamiskysymystä
kokouksessaan Brysselissä.
→ NATO is considering the issue of
expansion at its meeting in Brussels.
• ??? Nato miettii laajentamiskysymystä
kokouksessaan Brysselissä.
18
Judgements in synonymy: miettiä
• Mietin muuttoa pari vuotta, laskin
yhteen plussia ja miinuksia.
→ I considered moving for a couple of
years, adding up the pluses and
minuses.
• ??? Pohdin muuttoa pari vuotta, laskin
yhteen plussia ja miinuksia.
19
Judgements in synonymy: miettiä
• Aina kun mietin, että synnyttäisin
lapsen, ajatus tuntui mahdottomalle.
→ Whenever I considered giving
birth to a child, the thought
seemed impossible.
• Aina kun pohdin, että synnyttäisin
lapsen, ajatus tuntui mahdottomalle.
20
Obvious conclusions?
• pohtia is tilted toward collective human
subjects such as eduskunta ’parliament’,
jaos ’subdivision’ or Nato ’NATO’
• miettiä is tilted towards individual, personal
subjects, as in the 1st person singular
21
The corpus strikes back I
• ... miksi Suomessa jopa eduskunta
miettii milloin kaupan ovi saa olla
auki?
→ ... why is it that in Finland even Parliament
considers when a shop may keep its doors
open?
• MTK miettii ehtoja tänään.
→ MTK is considering its negotiation
terms today.
22
The corpus strikes back II
• Liikenneministeriön työryhmä miettii
parhaillaan, miten tunnuksettomia
puheluita pitäisi kohdella.
→ A working group in the Transport
Ministry is presently considering how
non-prefixed calls should be treated.
• Yhtä kuitenkin pohdin.
→ There is one issue that I’m considering.
23
Preliminary conclusions
• the two verbs are more interchangeable, i.e.
synonymous, than one would suspect at first
→ collective human subjects can also be used
with miettiä
→ individual, personal subjects can also be used
with pohtia
24
Data on the occurrences of the
two verbs
• 410 occurrences of miettiä
→ representing 49 unique word forms
• 445 occurrences of pohtia
→ representing 45 unique word forms
• 25 of the morphological analyses were common to both verbs
• active indicative present tense third person
singular was the most frequent form
→ 85 occurrences of miettii
→ 145 occurrences of pohtii
25
Corpus-based results –
morphological preferences
Morpho-syntactic feature   n(feature,verb)/n(feature,total)   Verb      t-score   Fisher’s exact test
0_SG1                      24/26                              miettiä    2.358    1.000000
0_SG3                      206/336                            pohtia     2.148    1.000000
0_SG3                      130/336                            miettiä   -2.705    0.000013
0_SG1                      2/26                               pohtia    -8.170    0.000001
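The values in the Fisher’s exact test column behave like one-sided (lower-tail) probabilities: they are 1.0 for the verb that a feature favours and small for the disfavoured verb. A minimal sketch under that assumption follows. It computes the probability of seeing at most the observed number of co-occurrences if the feature were spread over the two verbs in proportion to their frequencies. Which tail and which totals were actually used is not stated on the slides, so the function name, the parameter names and the choice of 410 + 445 as the sample size are illustrative.

```python
from math import comb


def fisher_lower_tail(observed: int, feature_total: int, verb_total: int, corpus_total: int) -> float:
    """One-sided Fisher's exact test via the hypergeometric distribution.

    Returns P(X <= observed), where X is the number of the feature's
    occurrences that fall on the given verb when feature_total items are
    drawn from corpus_total items, verb_total of which belong to the verb.
    """
    other_total = corpus_total - verb_total
    favourable = sum(
        comb(verb_total, i) * comb(other_total, feature_total - i)
        for i in range(observed + 1)
    )
    return favourable / comb(corpus_total, feature_total)


# Verb totals from the slide on the occurrences of the two verbs.
N_MIETTIA, N_POHTIA = 410, 445
TOTAL = N_MIETTIA + N_POHTIA  # ASSUMPTION: the sample consists of these two verbs only

p_miettia_sg1 = fisher_lower_tail(24, 26, N_MIETTIA, TOTAL)  # favoured verb
p_pohtia_sg1 = fisher_lower_tail(2, 26, N_POHTIA, TOTAL)     # disfavoured verb

print(f"miettiä, 1SG: {p_miettia_sg1:.6f}")
print(f"pohtia,  1SG: {p_pohtia_sg1:.6f}")
```

The exact integer arithmetic of `math.comb` avoids overflow for corpus-sized totals. Because the assumed totals may differ from those used in the study, the output need not reproduce the reported values digit for digit.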
26
Corpus-based results –
preferences of agent types
Semantic category of subject/agent   n(feature,verb)/n(feature,total)   Verb      t-score   Fisher’s exact test
SEM_HUMAN_GROUP                      34/44                              pohtia     1.908    1.0000
SEM_HUMAN_INDIVIDUAL                 155/254                            pohtia     1.844    1.0000
SEM_COGNITION                        2/2                                pohtia     0.679    1.0000
SEM_LOCATION                         4/6                                miettiä    0.560    0.9089
SEM_ACTIVITY                         1/1                                pohtia     0.480    1.0000
SEM_COGNITION                        0/2                                miettiä    0.0      0.2700
SEM_ACTIVITY                         0/1                                miettiä    0.0      0.5199
SEM_LOCATION                         2/6                                pohtia    -0.791    0.3067
SEM_HUMAN_INDIVIDUAL                 99/254                             miettiä   -2.307    0.0004
SEM_HUMAN_GROUP                      10/44                              miettiä   -3.518    0.0004
27
Corpus-based results - summary
• there appear to be statistically significant
differences between the two verbs’ preferences
with respect to the person and countability
(individual vs. collective) of the agent
• 1st person singular frames prefer miettiä
• 3rd person singular collective human frames
prefer pohtia
28
Psycholinguistic Experiments
• Two off-line experiments
– Forced Choice
– Acceptability Rating
• Hypotheses based on the corpus-based
results
– 1st person singular agents (1SG) prefer miettiä
– 3rd person collective agents (3COLL) prefer
pohtia
29
XP 1: Forced choice
Materials
• 31 sentence triplets with 31 sentence frames and
three different verbs for each triplet, e.g.,
• Anu Joutsasta pohti hetken ~ Anu from Joutsa thought for a
moment
• Anu Joutsasta mietti hetken
• Anu Joutsasta ajatteli hetken
• The materials were constructed by using (slightly
edited) natural instances with either experimental
verb as the sentence frame for the other(s)
→ the source of the natural instances was the same corpus
as in the corpus-based study
30
Forced choice (cont’d)
• The two experimental verbs (pohtia vs. miettiä)
and the filler (ajatella) were presented semi-randomized
within each triplet in the appropriate
inflected form.
• The participants were instructed to select the most
natural sentence from each triplet and check the
appropriate box on the experimental sheet.
• 21 native speakers of Finnish participated in the
experiment
31
Results: XP 1(1) (N=520)
Responses (%) by agent type:
          1sg    3sg    3coll
miettiä   45.0   35.8   19.2
pohtia    10.4   31.9   57.7
[Figure: bar chart of the same percentages for miettiä and pohtia by agent type (1sg, 3sg, 3coll); y-axis from 0 to 60 %]
32
Results: XP1(2)
• The overall distribution of responses differed
significantly from chance (χ², p < .0001)
• The 1SG agent clearly preferred the verb miettiä
(χ², p < .001)
• The 3SG-COLLECTIVE agent had a clear
preference for the verb pohtia (χ², p < .001)
• There was no preference either way in the 3SG
(non-collective) category (χ², n.s.)
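One way such a χ² test of the overall distribution could be run is sketched below. The slides report percentages and N=520 but not the raw counts or the test statistics, so the counts here are back-calculated from the percentages on the results slide under the loudly stated assumption that the 520 responses split evenly into 260 per verb. They are illustrative only, and the function names are hypothetical.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Percentages from the XP 1 results slide (columns: 1sg, 3sg, 3coll).
percentages = {
    "miettiä": [45.0, 35.8, 19.2],
    "pohtia": [10.4, 31.9, 57.7],
}
ASSUMED_N_PER_VERB = 260  # ASSUMPTION: N=520 split evenly; not stated on the slides

counts = [
    [round(p / 100 * ASSUMED_N_PER_VERB) for p in percentages[verb]]
    for verb in ("miettiä", "pohtia")
]

stat, df = chi_square_independence(counts)
# For df = 2 the chi-square survival function has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2) if df == 2 else None

print(counts, round(stat, 2), df, p_value)
```

Under these assumptions the statistic is very large for two degrees of freedom, which is at least consistent with the reported p < .0001; it should not be read as the value obtained in the study.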
33
XP 2: Acceptability rating
• Sentence frames with each Agent Type (1SG, 3SG
& 3COLL) – 21 frames each – were used to
construct the experimental sentences with both the
verbs miettiä and pohtia as well as the closely
related filler verb ajatella ~ think (generic)
→ The sentence frames were based on natural instances
extracted from the corpus used in the corpus-based
study
→ 1/3 of the sentences had the original verb in the corpus,
2/3 had another verb in the corresponding form
• this amounted to 63 test sentences per test subject
• 40 filler sentences were constructed with the verbs
käsittää and ymmärtää ~ understand (20 + 20)
34
Acceptability rating (cont’d)
• The experimental sentences as well as the
sentences with ajatella were counter-balanced
over three experimental lists
• Each list included the same 40 filler sentences
• Altogether 103 sentences were presented
randomized on three experimental sheets
• The verbs were presented in angle brackets, e.g.,
Anu Joutsasta <ajatteli> hetken
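The counterbalancing described above can be illustrated with a Latin-square rotation: every frame appears once on each list, and its verb rotates across the three lists. The slides do not say how frames were actually assigned to lists, so the rotation rule, the frame numbering and all names below are assumptions made for the sake of the sketch.

```python
VERBS = ["miettiä", "pohtia", "ajatella"]
AGENT_TYPES = ["1SG", "3SG", "3COLL"]
FRAMES_PER_TYPE = 21  # 21 frames per agent type, as stated on the slides


def build_lists():
    """Return three lists of (agent_type, frame_index, verb) test items.

    ASSUMPTION: the verb for a frame on a given list is chosen by rotating
    through VERBS (a Latin square), so each frame meets each verb exactly
    once across the three lists.
    """
    lists = [[] for _ in VERBS]
    for agent in AGENT_TYPES:
        for frame in range(FRAMES_PER_TYPE):
            for list_idx in range(len(VERBS)):
                verb = VERBS[(frame + list_idx) % len(VERBS)]
                lists[list_idx].append((agent, frame, verb))
    return lists


lists = build_lists()
for idx, items in enumerate(lists, start=1):
    per_verb = {v: sum(1 for _, _, verb in items if verb == v) for v in VERBS}
    print(f"list {idx}: {len(items)} test sentences, {per_verb}")
```

Each list then holds 63 test sentences, 21 per verb and 7 per verb within each agent type. The 40 shared filler sentences and the randomization of presentation order would be added on top of this.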
35
Acceptability rating (cont’d)
• The three sheets were distributed as evenly as
possible to 54 native speakers of Finnish
• The participants were instructed to evaluate
the acceptability of each verb in the
sentence frame on a scale of 1-7 by
checking the appropriate box on the sheet.
36
Mean Acceptability Scores XP2
          1SG    3SG    3COLL
miettiä   5.6    5.3    4.5
pohtia    5.2    5.6    5.4
37
[Figure: Mean Acceptability Scores (MAS), XP2, for miettiä and pohtia by agent type (1sg, 3sg, 3coll); y-axis from 3.0 to 6.0]
38
• Significant main effect of Agent Type
• Significant interaction of Agent Type and
Verb
→ Agent Type significant with miettiä but
not with pohtia
• miettiä: 3COLL significantly less
acceptable than either 1SG or 3SG
(p < .001), no difference between 1SG and
3SG (p > .2)
39
• Within the three Agent Types:
– 1SG: miettiä significantly more acceptable than
pohtia (p < .01)
– 3SG: no significant difference (p > .1)
– 3COLL: miettiä significantly less acceptable
than pohtia (p < .001)
40
Discussion
• the corpus-based evidence and the
psycholinguistic test results converge
• the psycholinguistic test results deepen the
picture that the corpus provides and give an
explanation for the mechanism that drives
the selection of either verb in a particular
context/frame
→ A word can be selected simply because the
alternative is not preferred
41
Relationships between the
different types of evidence
• the forced choice tests reflect normal actual
usage situations (~ performance) and thus
mirror the corpus-based results
• the acceptability tests reflect the general
linguistic insights about what is considered
possible and what is not (~ competence)
→ sounds like building blocks for generative
descriptions
42
Conclusions
• the two types of empirical evidence clearly show
that the two near-synonymous verbs differ
in usage regarding the studied features
• combining two types of empirical linguistic
evidence can be used to enhance and enrich
lexical descriptions
43
Questions, Comments, Critique,
Discussion