Two-dimensional clusters in grammatical relations

advertisement
From: AAAI Technical Report SS-95-01. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved.
Two-dimensional clusters
grammatical relations
in
Mats Rooth
1
Introduction
If someone asked me to explain the word "product" as used below, I might
say that it refers to things like drugs, cars, and software which are developed,
produced, sold, and used.
a new product.
(i) The companyannounced
Whileit is drugs,
carsandsoftware
whichareserving
as examples
of products,
thelistof verbshasa roleas well.If I merely
saidthatproducts
werethings
likedrugs,
cars,andsoftware,
someone
wouldhardly
be in a position
to say
whether
toothpaste
countsas a product.
And a definition
of "product"
as
simply"something
whichis produced"
getsa moregeneral
senseof theword,
including
thingssuchas toxicwasteproduced
by an industrial
process,
and
carbondioxide
produced
by humanrespiration.
Theexplanation
of "product"
by multiplied
example
involves
the implicit
claimthatdrugs,
cars,andsoftware
arealldeveloped,
andallproduced,
all
sold,andallused.Notonlyis thistrue,butin a largeenough
textcorpus
we canlindexamples
examples
of drugs,
cars,andsoftware
beingdescribed
as
beingdeveloped,
produced,
soldandused,in allcombinations:
(2)a. ~95~,.2 Theirgoalis to develop
newdrugsto compete
in worldmarkets
b. 44~u~Genzyme
sellsspecialty
chemicals
usedby othercompanies
to
produce
drugs,
diagnostic
testsand otherproducts
andperforms
contractresearch
fordrugcompanies.
"
c. 442~17s We absolutely
mustbe in theU.S.market
by 1992," he says
, " to sellthenewdrugsourresearch
labswillbe producing.
"
d. ~,7~4Milton
Bass,a NewYorkattorney
forZenith,
asserted
thatthe
appeals
court’s
ruling
willspurcivil
suitsagainst
otherdrugcompanies
thathave" conducted
campaigns
to frighten
doctors
and othersnot
to usegeneric
drugs.
"
135
(3)
s,,,2The Europeans’
problem:
thehugesumsit takesto develop
new
carsand bringthemto market.
Theeightmajorautomakers
didn’t
produce
anycarslastweek
b. ,7o~2o
because
of theholidays
This year,forthefirst
C. m5~2 "
time,
we’ve
hadtoworktosellthecars
,"says
Michael
J.
Jackson
,aSaab
dealer
in
suburban
Washington
,
D.C.
do
61~3~- A haltto importsof sedans, plusa requirement
thattop
officials
useonlyChinese-made
cars.
pavesthewayfor thecompetition
to develop
(4) a. ~2~,n" Thissettlement
software
in theoperating
systemarena.
"
b.
~6 He guessedit willtakeMicrosoft
six to 12 monthsto produce
the software
basedon theSybasetechnology.
~,
The
company
operates
retailstoresthatsellcomputer
software
Co
d. ~7,~Theyare writinga bookto showcampaign
managers
how to use
Lotus1-2-3software
to analyze
localvoting
habits.
Theseexamples
are froma corpusof WallStreetJournal
newspaper
stories,
Containing
roughly
60 million
word-like
tokens
(Liberman
[Liberman,
1992]).
Then-tuber
at thebeginning
of an example
indicates
itsposition
in thecorpus.
Thiscombinatory
pattern
evident
in thesesentences
is of independent
interest,
because
it suggests
a simple
wayof capturing
selectional
patterns
relatingverbsto theirobjects,
or lexical
pairsparticipating
in othergrammatical
relations.
In a number
of problems
arising
in computational
linguistics,
we
needto be ableto decide
whether
a givenphrase
is an appropriate
filler
fora
grammatical
relation
assigned
by another
word.Thisoftencomesdownto a
question
of semantic
compatibility
between
theheadof thefiller
phrase
and
thesemantic
roleassociated
withthegrammatical
relation.
For instance,
in
thesentence
below,resulted
and approved
bothhavemorphological
readings
as bothtensed
verbsandpastparticiples.
resulted
froma settlement
approved
yesterday.
(5) Thecharge
Themorphological
indeterminacy
results
in syntactic
ambiguity.
In thefirst
structure
in thefirstrowof tableI , charge
is thesubject
of resulted
and
approved a postmodifier of settlement. In the second analysis, resulted is a
postmodifier of charge, and charge is the subject of approved. Deciding between the syntactic analyses below comes down, at least in part, to a matter
of semantic selection. In choosing an analysis, it would be useful to know
(amongother things) whether settlement is a good object for approve (it is)
and whether charge is a good subject for approve (it is not). (Note that in terms
of the underlying role assigned, the postmodified noun phrases are equivalent
to objects of their modifiers.) The sentence below has the same ambiguity,
136
~P
N~R
iwl,m~,~.~r mumcemn~ V
obq:""
~^a
~
P
~iIlw&Io*,,~ wqm
im:z~m~
¯
Table 1: First row: corrext and incorrect syntactic analyses for sentence (x).
Secondrow: incorrect and correct syntactic analyses for sentence (y).
andinthiscaseit isthesecond
syntactic
analyis
whichis correct.
union
contracts
signed
in thethird
quarter
granted
slightly
(6) Private-sector
lowerwageincreases.
Suppose
thatwe wantedto attacksuchproblems
usinginformation
from
a training
corpus.
Sincethousands
of verbsand no-nsare involved,
we can
notconclude
thata givenverb-object
pairis impossible,
simplybecause
we
can not findit in the corpus.
Takefor instance
a slightly
moreuncommon
productverbsuch as e~ort,and a moreuncommonproductnoun,suchas
engine.
Although
theverb-object
paire~ortengine
is intuitively
plausible,
it
wasnotdetected
at allby the method
described
belowin a six million
word
training
corpsus.
A selection
modelwhichgeneralizes
amongwordshas the
potential
of solving
theproblem,
sincewhiletheverbexport
e~ortdoesnot
occurwithengine,
it occurs
withotherproduct
nouns,
andengine
occurs
with
otherproduct
verbs.
Thepurpose
of thispaperis to develop
a mathematical
andcomputational
modelwhichcaptures
thenotion
of a selectional
dependency
between
a setof
verbsanda setof nouns,
or moregenerally
twosetsof wordsparticipating
in a
grammatical
relation.
Section
2 describes
a simple
categorical
selection
model,
whichin section
3 is givena probabilistic
twist.
Locally
optimal
probabilistic
modelscan be generated
by an incremental
procedure
similar
to Ba-m-Welch
137
acquire 16
bo~t
buy 16
climb
cut
de~Jine
drop
1
dump
1
f-II
1
1
1
2 2 48
1
2
8
1
104
18
2
19
2 3
I 1
3
2 I
132
20
7
1
2 28
3
20
3
8
2
1
23
1 76
bold t 18
|~1
6 3
jump
2
lower
plunlie
purcha~
8
push
1 2
r~se
reduce
g 1
ret~n -1
rime
13 9
138 18
sell 114 2 1 40
slmdl
17
tr~le
1 1
2 2 2
2
I 18
2
35
6 7"r
I I
1 39 21 8
28
53 I
36 348
2 10 13
!
10 11
1 66 64 11
1
13
9 16
1
2
1 2 30
9
6 5
2 10
1
2 14 171 38 33
3 2 11 62
9
25
I 22
3
5 68
5 26
8
1
3 26 36 2 1 36
2 8
9
2
1
23 83 88
2
14
4
2
I 17
6 98
1
3 44 20
1
4
5 28
8$
131 149 26
5
105 3
S
1 22 56 8
9
1
13
17
-2 2 3 52 125 18 19
1
6
7"J 2
1 12
8
48 243
4 9
2O 6
9
’;’ 22
87 29
59 10
107 190
5
19
1
29
2 30
3
19
5
7
10
1
4
121
75
16
24
74
41
21
144
1
2
25
30
2 11
1
2
28
28
3 1
16
1 2
2
20
6
16 1
1
4(5
2 26
1
3
1
1
149
104
1 3
3’r
5
4
2 1
11 1
21
1
2 22
2
3
1
Table2: Frequency
counts
for24 verbsand24 object
heads.
reestimation
of hidden
Markovmodels.
In section
4 we lookat results
fora
reasonably
largesample
of verbsan nouns.
Section
5 discusses
applications
of
theprobabilistic
model,
giving
prelimi-ary
results
fora parsing
problem.
2
Categorical
selectiontypes
Table(2) is a matrixof frequency
countsforverb-object
occurrences
twenty-four
verbsandtwenty-four
nouns.
The tableis a sub-part
of a verbobjectmatrixderived
froman approximately
6 million
words~mpleof the
II
WallStreetJournal,
parsedby DonaldHindlewithhis Fidditch
parser.
extracted
verb-noun
pairswitha Lispprogram
fromlistrepresentations
of
parses,
and mappedthemto uni-flected
formsusinga fullformwordlist.
The resulting
listcontained
84182non-zero
frequency
counts.
The frequency
matrixwas reduced
by eliminating
rows(corresponding
to verbs)withfewer
thanfivenon-zero
entries,
andsubsequently
eliminating
col-m-s
withfewer
thanf~venon-zero
entries.
Thisgavea matrix
indexed
by 992verbsand1027
nouns,
containing
55251non-zero
entries.
Theverbsandnounsin the24 × 24
sub-table
wereselected
by handforillustrative
purposes.
Re-arranging
the rowsand columsin thesmalltablebrin~out a dependencybetween
the rowsandcolumns.
In table2, mostof thenon-zero
entries
arein thethreeblockson the diagonal.
We can thinkof theblockorganization
as capturing
threesemantic
selectional
typeswithin
thispartof the
verb-object
grammatical
relation.
Thefirstblockcorresponds
to a notion
of
IThe
parser
isdescribed
in[Hindle,
1983]
and[Hindle,
1994].
S;rnilar
verb-object
data
is
discussed
in [Church
et al., 199x].
138
"~
~quire i 16
buy 16
dump
1
bold 18
purchm
8
ret~n
I
sell 114
trado
1
climb
decline
dl.op
1
hdl
|sin
jump
rise
plun|e
boost
cut
incTe~m
6
Imem.
push
1
reduce
slseh
9
38 5
48 53 38
3
2
7 22 5
5 17 6
13
40 72 48
2
7
77
S48
10
68
98
17
243
22
87 29 19
1
107 190 29 2 2
10
121 30 3
24 20 6
21
1
3
144 149 104 2 1
2 37
5 1
1
3
2 $
I 1
2 I
1
1
3
25 4
2
1
1
28
8
1
2 1
8
5
1 36
1
4
6
9
1
2
59 10
8
2
78
2
16
1
16
74
41
2
1
1
2
I 13 9
1
3
1
2
1
I
1
1 1
1
2
to
!
2
2
3
1
2
1
I
2
1 2
6
12
8
2
1 2
9
8
2 10 13
1
1
18
13
2
1
9 16
lg 2 1 2 30 7
9
e 5
132 I 2 14 171 28
38 33
20
2 11 62 28
g 25
g
3
2 8
136 2 3 52 12822 18
18 19
2
’ 3
2
14
4
1
1 1
2
I 18 39 21 5 1 19
1
104 10 11 66 64 11 30 5
2
1
3
25
5 26 26 36 2 11 16
2O
2
23 83 55 2 4
3 1
1
44 20
1 2
5
1 23 5 28 131 149 26 46 11
1
1
76 105 3 22 55 8 26 21
17
4 9 20
6
3 3
Table 3: The same counts in another order.
exchange of financial instruments or ownership interests. The second block
involves measurementof a scalar motion by some dimensioned quantity, and
the third block involves changein somescalar quantity, such as a stock price.
To represent the blocks, we need not re-order the matrix: we can equate a
block with a pair of a subset of the verb set and and a subset of the nounset.
A closer examinationshows that it is not reasonable to insist that the noun
sets for different blocks be disjoint. The noun8take occurs freqently with the
verbs of the third block (e.g. boost, increase, ra~e, andreduce), as well as
with the verbs of the first (e.g. acquire, 5u7/, sell, and trude). Here are some
examplesentences:
(7) a. lu~ Most of Japan’s big computercompanieshold a 1%stake in Ascii,
and Mitsui ~ Co., one of Japan’s largest trading companies, plan~ to
boost its stake to 5%.
b. 12~2, ThoughKoito is resisting, and Mr. Pickens has announcedplans
to increase his stake to 26%,he maintains "there’s nothing hostile"
about his investment.
c. ~ego3, Ford will raise its Jaguar stake to the maximum
15%after the
30-day waiting period expires, the Fordexecutive said.
d. ~,, The sales reduce the group’s stake to 740,500 shares.
(8) a.
Galesi Groupsaid it is seeking to acquireLoneStar Technologies
Inc.’s stake in AmericanFederal Bankfor $58 million.
b. ~o24,4 TheItalian financier is close to announcingthat he’s setting up
a holding companyto buy controlling stakes in Hungariancompxnies,
according to Hungariansources.
678667"r
139
c. 2~6 Domino’s Pizza owner Thomas Monaghanmay sell his 97% stake
in the chain, which is the nation’s largest pizza delivery company.
d. 4~7~2One possibility would be for the U.S. group, formally called Newgateway PLC, to trade its troublesome stake to Isosceles for certain
Gatewayassets.
The reason for the overlap is that a stake (say in a company)is both something
whichcanbe bought
or sold,anda scalar
quantity
whichcanbe increased
or
decreased.
Theverbincrease
is in principle
a symmetric
example,
though
thisis not
really
obvious
in thesedata.It occursbothwithobjects
denoting
a changing scalarquantity,
andwitha objects
(orpseodo-objects)
measuring
that
change:
(9) a. ~,8 JackW. Forrest,
Environmental
Systems
president
and chiefexecutive
oi~icer,
saidthechange
increases
thecompany’s
effective
tax
rateto about35% from20%.
b. ~8,~ou
If Mr.Wanniski’s
theory
is right,
thata taxcutincreases
the
value
ofcapital
assets
heldbyowners,
wouldn’t
thatincrease
alsorepresenta rather
dramatic
andtotally
unjustified
inflation
of those
values?
c. 4~02~u
Georgia
Gulfsaidit increased
theexercise
priceof therights
to
$120from$50.
d. ~5+8o60
"Ithasincreased
thelevelof caution
in themarket,"
saidtlae
Chicago
trader.
~o Economists
saidthe Augustcivilian
unemployment
ratewillhave
(10)a.
increased
0.1percentage
pointto 5.3%.
b. ~070Unleaded-gasoline
futures
weremixed,although
the September
contract
increased
0.22centto settle
at 54.15centsa gallon.
In thelatter
group,
thechanging
scalar
quantity
is realized
as subject.
Giventwo vocabularies
V and N, we definea selection
typeas a pair
(V’,N’>,whereV’ C V and N’ C N. A selectional
modelis simplya set of
selectional
types,and givenwhatwassaidabove,we shouldnotimposeany
requirement
of non-overlap
forthenounsetsor verbsetsof a selection
model.
140
A reasonable
selection
modelforthe24 × 24 frequency
matrix
is:
r acquire
buy
dump hold
~interest
security
purchase
retain ’|share
stake
~stock
unit
Sc~mbdecline
dropfallgain
,~footmarkpence
jumprise
l.point
yen
,plunge
increase
fboostcut
./~increase
lower
\ Lpushraise
reduce slash
3
Probabilistic
|dividend
price
,~raterating
tax
|valueshare
tstakestock
selection
types
Theaboveconstruction,
because
it is categorical,
encodes
no information
about
therelative
freqeuncy
of,forinstance,
buyanddumpas verbal
realizations
of
thefirstselectionai
type.In applications,
having
access
to graded
information
is useful,
in thatit allows
thelargenumbers
o~ analyses
-- suchas syntactic
parses-- to be ranked.Futhermore,
oncewe get beyondsimpleexamples,
it is notclearthatmembership
in selectional
patterns
should
be considered
2
discrete.
Amongwaysof introducing
graded
distinctions,
probabilistic
modelsareappealing,
because
theyhavethepotential
of telling
us, in complex
situations,
howa number
of graded
distinctions
areto be combined.
Thesimplerecipe
forturning
a categorical
modelintoa probability
modelis replace
characteristic
functions
of setswithprobability
distributions.
In thepresent
case,
we redefine
a selection
typeas a pairof discrete
distributions,
oneonthe
3
verbsand oneon the nouns:
Thefunction
Avp~mapsa verbto a number
in theinterval
[0,1],meeting
the
constraint
thatthesetof verbprobaiities
sumsto one,andsimilarly
forthe
nouns.In orderto use thisnotation,
we mustassumethatthe verb,noun
2Fttrthermore,
I did notsaywhatmakesone categorical
selectional
modelbetterthan
another,
andhow theyareto be discovered
computationally.
I haveexperimented
withan
incremental
searchforselection
models,
usinga Solomonoff-Kolmogoroff-Chaitin
measure
to
evaluate
thecombination
of the complexity
of themodelwiththecomplexity
of describing
thedatamatrixgiventhemodel.I willnotdescribe
thismethodhere,sincethesearch
was
computationally
expensive,
andtheresultsonlymoderately
encouraging.
3Avp~, generates a probability distribution AX~’~x p.r measuring subsets of V. In the
text, I surpress the distinction between discrete probability distributions and their generators.
141
type 3 .481
tFpe 1 .204
tTpe 2 .315
fail .344
point .364
raise .268
ret~t .238
sell .319
share .&tO
cent
.2gl
rise .343
reduce .189
price .207
buy .288
stsJm .198
gain .128
pence .080
cut .102
coat .143
hold .098
stock .172
drop .059
yon .0TI
lower .109
stake .lOS
acquire .093 interest .078
deeUue,038
price .060 increase .I00
debt .OTI pu~ .0~3
sumet.065
climb .028
ruts .089
bo~t .0~J
tax .Oe3 increase .030
unit .089
jump .020
tax .021
push .036
ratin E .059
trade .027 sw.-urity .038
plunp .019
o~ .01T
sl~th .034 dividend .080
boost .024
bond .037
increnm .006
cost .016
sell .010
value .044
retain .019
debt .004
push .008
bit .011
decline .009 interest .006
gain .011
price .002
trade .003
drop .006
stock .OO3
mark .011
dump.009 aversge .002
reduce .002
foot .004
trade .005
mark .003
push .009
yen .002
boo~ .001
stock .002
hold .002 avere~e.002
reduce .008
bit .001
sell .001
value .00~
retain .001
ses~ .002
raise .003
mark .001
cut .001 interest .002
acquire .001
yea .001
decline .001
foot .001
bold .001
unit .001
gain .000
share .000
rise .001
value .0400
raise .000
ses~ .000
rise .000
point .000
plunse .001
rate .0400
buy .000
shm .000
plunKm.000
cent .000
drop .000
point .000
cluh .000
rstlng .000
tall .000
pence .000
slseb .OGO
coat .000
lower .000
stske .000
c/lmb .000
bond .004)
lower .000
cent .000
acquire .000
debt .000 purchase .000 security .000
cut .000 dividend .004)
purchase .04}0 dividend .000
boy .000
bit .000
fail .000
tsx .000
retain .000 security .000
jump .000
unit .000
jump .000
pence .000
dump .000
bond .000
dump .000
toot .000
climb .000
rstin s .000
Table 4: Parametersof a selection model.
and type sets are (or have been rendered)disjoint: we do not want to identify
the verb probability P~/v with the noun probability pirncrease/N . We also
add a probability distribution over the types. Using an initial segmentof the
natural numbersto index the types, a probabilistic selection modelfor V and
N with k types then consists of a probability distribution Arprover the set of
integers {1, ..., k} (= T), andfor eachtype r in {1, ..., k}, a pair of probability
distributions (Avp~, Anp~), as described above.
Derivatively, for any type ~" we construct a probability distribution on V × N
as a product:
p;,. =p;p
Weconstuct a probability distribution on T × V x N as a disjoint union of
these products:
Such a modelcan by used to assign probabilities to verb-object pairs. In
the probility space just defined, if a verb-nounpair is generated,it is generated
in sometype, and we obtain the probability for the verb-nounpair by summing
over the types:
= E Ep1.p;p;,
I"
In section 5, such probabilities are used to comparetwo grammaticalanalyses.
Table 3 gives the parametersof a selection modelof order three for the 24 × 24
data.4 Notice that the nounstake is rankedhigh in both the second and third
types.
40r rather, as one can discover by s.mmirlg the first col,,mn of numbers,the approximate
parameters.
142
L~/pe I .204
~rpe 2 .315
fall .344 point .364
rehm.26S
l~te .238
rise .343
cent .281
reduce .109
price .207
pence
.080
cut
.162
coot .143
I~in .128
drop .089
)qDn.OTZ
lower .109
debt .071
decline .038 ovlra4~e .017 increase .100
tez .063
climb .028
bit .011
boast .OY3 min K .089
jump .020
mark .011
push .036 dividend .050
plunse .019
foot .004
sluh .034
value .044
t}rpe 3 .481
sell .319 8bin .340
buy .288
stake A98
bold .096
stork .1T2
acquire .093 intet~a .OT8
mrrk,,,e .063
ammt.068
trsdo .027
unit .059
retain .019 Imcurity .038
dump .009
bond .03T
Table 5: The same model, with verbs and no-us represented only where they
are mostlikely to be generated.
Estimating a model
Givena selection
modelandobserved
verb-object
pair(v,n),theprobability
thatit isgenerated
intyper is:
Pv,n
Multiplying by the frequency .f~,n, we obtain the expected numberof occurrences of the event (%v, n) given the observed frequency and the model:
Thisformsa basisforre-estimating
theprobability
parameters:
e,.,,, =Z,~e,.,,,,,,e,.,~,=F_,,,e,-,,,,,,e,.=Z,,,,~e,.,,,,,,
The probabilities q, the parametersof the newmodel, are computedas relative
frequencies of expected numbersof events, as determined by the old model.
For instance, the probability of the verb fall within type 1 wouldthe expected
numberof occurences of fall in that type (given the data and model), divided
by the expected numberof occurrences of that type of verb-object pair.
The formulas are similar to the Bantu-Welchre-estimation formulas for
hidden Markov models (Bantu 1972). s Adapting Baum’s result for HMMs,
it can be shownthat an iterative re~estimatiom of parametersproduceslocal
improvementsin the probability of the observed data given the model, and
converges to a local maximum
of this probability. Table 3 was derived in one
hundrediterations starting from a randomstate, using the frequency data in
5Selection
modelsas described
herecan be viewedas zero-order
HMMs,augmented
with
a secondsurface
vocabulary
andassociated
emission
probabilities.
Thatis,givena state
(ortype),two surfacesymbolsare independently
generated.
Furthermore,
as usedhere,
thetypesarehiddenin thesensethattheyaregivenno priorinterpretation,
andarenot
observed
in the data.
143
Table2. Table3 is a different
wayof looking
at thesamemodel:
eachverbor
nounis is shownonlyin thetypewhereit is mostlikely
to be generated,
i.e.
whereprp~(orp~.p~)
is maximal.
Thisrepresentation
reconstructs
thethree
eight-by-eight
boxesof table2.
4
Results for a larger data matrix
Storagerequirements
for the estimation
algorithm
are modest.Thereare
[T[]V[
+ ITI]N]
+ [T[ probability
parameters.
There-estimation
formulas
sum
overnon-zero
frequencies,
andthe expectations
canbe computed
by summing
iteratively
oversuchfrequencies.
In eachstep,a frequency
fv,nis apportioned
amongthetypesaccording
to theratio~.~,and theportion
fora typev is
addedto running
subtotals
of e,,~ande~,n.Thisprocedure
requires
intermediatestorage
of thesamesizeas theprobability
model.
Sinceno random
access
to the frequencies
is required
thereis no needto represent
a V x N matrix.
(Thismightturnoutto be useful.
In a larger
datamatrix
basedon 60 million
wordsof text,about3500verbrootsoccurwithfiveor moredifferent
nouns,
andabout7500nounroots,
notcounting
proper
no-heor numbers,
occurwith
fiveor moreverbs.One of thesenumbers
wouldbe multiplied
if verbs-with
complements
accompanied
by prepositions
wereincluded
as separate
entries.)
The algorithm
was implemented
in CommonLisp.Tables4, 4 and 4 give
themostprobable
nounsandverbsin eachtypeof a probabiity
modelforthe
992x 1027matrix
withthirty-two
types,
obtained
withfourhundred
iterations.
Thethreeblocks
of 2 arerepresented
here:type3 is theproduct
type(e.g.
develop software), changing dimensionedobjects (raise price) are in type 8, and
scalar increments (rise cent) in type 26. Type 30 is a related one where the
object typically denotes a scalar motionevent, such as a decline or an increase.
The nouns of types 19 and 29 primarily name people. In the nouns, the split
seems to amount roughly to a distinction between powerful, active people
(executives, lawyers, officials) and weakor passive ones (workers, clients,
shareholders). Looking at the verbs, the type 19 people are appointed and
°replaced; the type 29 people are given orders and permissions.
Several types are dominated by a commonand semantically empty verb,
such as be (12), have (23), or make(31). In these cases, the noun sets are not
intuitively coherent, presumably because these verbs impose such weak selectional restrictions. The verb sets are not particularly coherent either, though
this is balanced by the fact that most of the verbs have low probabilities. These
common
verbs also occur in manyother types, for instace be in top position in
type 32, which is an intuitively coherent type.
Sin verb group 19, a number of items resulting from parsing errors are evident. Presumably, manage comes from managing director misidentified as a verb phrase.
144
t~/pe 1.015
~pe 2 .021
meet .23606
standard .06597;
end .10815
ysar .11873
set .15089
record .05207!
trade .07582
Friday .09733
be .04807
Soaa .04920
say .06728
w~k .08977
keep .042O8
need .04483
work .O~l~S,
month .08074
bit .03637 requiremmlt .04162
bel;in .06120!
d&y .06488
mt-, .03403
level .03564
spend .08076
time .04698
exceed .03322
demand .03168 ~anounco .02880
tod~r .04067
reach .02533
tarpt .02963
close .02638
~.aday .03573
~.hieve .01923
high .02828
open .02420
hour .00881
fulfill .01518
.02678
start .02219
Monda~r .02724
pace
sut ~..01330
.02158
expire .01718
wsy .02458
llve .01241
date .01860
mark .01818 Wadnwday .02287
surpus .01163
pertinent .01819
lut .01548
c&pitai .02116
ut~bllsh .01126 expocta~on .01764
wait .01396
taik .01933
point .01623
strc~ .00919
follow .01147
night .01526
:eliminate .00888
limit .01898
serve .0U17
w~r .01030
typc 4 .029
t~e 3 .032
type 6 .035
use .16863
product .07639 continue .06015
oper~ion .08075
complete .08198
sslc .10672
de,lop .09296
system .04469
be~n .08647
effort .07703
flnLnco .08686 ! acq,,;sition .06132
produce .08613
drug .03406
16unch .04128
progx~un .0g099
include .04893 transaction .08817
introduce .03273 teebnololD, .03110 conduct .03157
csmpaJgn .02961
be .03700
chamge.04977
markct .02933 computer .02236
tlsum@ .02848 production .02943
.03244
purchsee .04142
sell .02778
country .01948 I expand .02828 iuwlctiSatlon .02209
,,pprove .03176
merger .03092
include .03140
car .01875I
start .02476
service .02029 rcprseeut .02?03
order .0270g
s~p .02109 equipment .01685 I
be .02280
proem .01670 ~ounce
.02536
incruue .02646
supply .01770
line .01666
my .02189
work .01808
block .02272
action .02278
version .01500
h-St .01456 negotiation .01414 consider .02208
Set .O16O2
ta~keover .01989
distribute .01149
machine .01410
follow .01338
psyment .01407
involve .01758
proje~ .01827
test .01145
procmKI .01391
ban .01304:
activity .01a83
propose .01702
imue .01792
msnufacture .01107
prosrnm .01230
r~trict .01189:
practice .01290
explore .01670
offering .01475
promote .01035
ton .01226 support .011871
talk .01264
expect .01570
move .01278
lnm~l .00983
software .01178
o~mme .01148 development .01288
follow .01496
use .01208
build .00971
model .01146
)i~n .01145
use .01224
discuss .01474 iuvwtment .01098
t, ~e 6.024
t]vpe 7.023
We 8.041
file .14736
suit .11129
p~ .09036 money .20670
raise .17768
price .15348
follow .09798
cm .06304
spend .07740
S .07810
incrom .09248
rm .15152
settle .04932
report .06176 raise .05364
role .05575
boost .07783
st’ke .047234
deny .03748
decision .04991
be .03728
fund .06401
reduce .04806
question .03490
imue .03688
charSc .04782
cub .03552
lower .04484
set .03375
rating .03048
he~r .03011
Iowsuit .03991
Isee .03372;
lut .02660
cut .03509
v~ue .02881
gay .02876
st&tement .03674
ccot .03356
time .02688
.02377
sale .02844
8&vg
dismiss .02466
&pp~ .02439
.03050 dollar .02812
offer .02240
earning .02379
bring .02329
complaint .02390
use .02727
part .02037
ptmh .01979
number .02363 :
be .02301
cJsJm.02312
put .02146 i million .01805
expect .01608
level .02135
connrm .02157
ailelKatiou .021421 lend .01760 gsme .01680
keep .01407
doUar .01719
appeal .02020
.01720
cupitai
.01508
ruling .01910;
P~Y
disclcoe .01198
c~pitai .0170Y
include .02019
piss .01561 inv~t .01698 total .01428
double .01191
revenue .01661
review .01744
action .01490
need .01668 billion .01380
lu~wer .01182
sise .01569
reverse .01627
rumor .01408
give .01620
llfe .01291
bring .01062
¢cet .01369
see .01521 recommendation .01402
add .01520
7ear .01182 maintain .01042 production .01367
~/pe9 .023
t,/pe 10.014
11 .032
reject .08033
bid .15796
hold .42625 company .29762
I~y .21425
cost .13843
consider .06530
prupceai .11701 acquire .04866
meeting .09623
reduce .14588
debt .08962
accept .08288
call .04400
o~r .11493
~ .05878
cut .09988
rex .05801
suppo~ .Q3836
plan .06008
attend .02986
t41k .04176 : incremm .03444
dividend .03704
decline .03064
comment .03193 schedule .02436
hesrinK .03762 I
cover .02748
price .03451
drop .02975
requ~t .02891
lesve .02433 conference .03723
include .02187
$ .02684
close .02825
ides .02331
.02301
pceitiou .02211
be .01433
fee .02370
submit .02401
attempt .01 774
tell .02178
election .01934
repay .01388
expense .02319
be .02389
claim .01346
force .02111
concern .01878
slub .01313
deRclt .02184
race/re .02382
effort .01319
follow .01705 discussion .01367
keep .01286
interest .019U
upprove .02158
issue .01301
be .01290
poet .01207
limit .01228
bill .01868
launch .02135
strateSy .01216
mine .01207
stock .01060
control .01161
IoLn .01818
b4w.k .01941
option .01175
coutrui .01057
hoct~e .01017
ovoid .00944 ~capitaJ-Kains .01785
withdrsw .01771 application .01093
mk .00993
p~ty .00770
trim .00897
premium .01715
review .01711 ameudment .01044
expect .00783
~ion .00729
raise .00868
mount .01678
eanouuce .01688
$.00988
v~ue .00758
interest .00715
colh~t .00847
force .01879
Table
6: Partof a 32typeselection
model.
145
13 .026
t~ 14 .030
put .03984
do .20440
business .15112
show .10237
075o~
become .02744
w~y .02070
run .10720
job .09334
reflect .06022
Krowth .06279
find .01g18
time .02(JO1
be .09109
thin K .05915
express .03968
vlJue .03758
see .00639 pnmident .02846
cr~te .06320
compe~y .03691
m .03130
economy .03209
live .00600 cempe~y .01500
form .05178
work .03633
improve .02T87
concern .03047
include .00383
reason .01314
lesve .03496
venture .02406
slow .01795
¯ demand .03012
pt .00316
lot .01245
Set .O3488
lot .01903
coy .01794
siKn .02552
identify .00295 problem .01181
and .01K91 :veroment .01850
enhance .01611 performsnce .02153
remLin .00257 chairmen .01181
m .01579
fund .01547 continue .01538
mum .02080
cite .00242
bume .01158
see .01489
pro~,.ol~t pursue .01512 confidence .02062
ropromnt .00222
cue .01024 eliminate .01072
d,,,-s .01143I
grow .01808
ahillty .02002
Nee .00217
unit .00933
imp .0O987
.01139
fuel .01503
strenKth .01637
prove .00217
sign .00926
lose .00979
risk .00082
indicate .01460
inflntion .01438
-tlow .00201
thin
.00897
expect .00978
year .0(0)40
cite .01430
benefit .01353
K
went
.m
8yetem .0(0J26 represent .01363
tez’V~ .00818
ente.r .008~J
support .01348
msrk .00197
Nlmblieh .00798
room .00784
mmmumlmmnmmmm
~mmlm
17 .020
~_6
.o22 qFeemcnt
msr
.25364
n
plen .18452
can-/ .088~
dividend .08273
expead .07881
pruluro .06255
siK .11081
announce .08203
bill .08091
receive .06249
wsmmt .04434
put .06053
business .03365
&pprmm .06146
contract .04019
write .08310
memmge.03976
enter .00256
bolrd .02548
peas .04008 leKisi&tion .03891
declare .04667
book .02798
be .03619
c&pacity .02433
be .03342
letter .02473
note .02597
I~ .02317
industry .02232
Set .04338
ueloti&te .02940
accord .02398
publish .03854
lignal .02243
tep .01821 membership .02207
s~y .02620 settlement .02249
iaue .02821
stock .02046 dominste .01791
line .01869
introduce .01946
dcsJ .01028
rend .02800
~rticle .02037
keep .01595
economy .01522
.022g7
term/n~te .01562 mmure
.01622
deliver
informstion .01688
open .01580
Iron .01442
spend .01310
pact .01546
include .01617
sentence .01583
serve .01438
b4um.01228
,,4 .01531
veto .01293
psckqe .01450
Iil~m .01473
hit .01415
end .01221
putt .01408 competition
-,4opt ,01238
level .01253 enbordinMe .01438
price .01447
.0U85
propose .01171 resolution .01110
see .01412
news .01350
&fleet .01397
country .01128
.01142
proKrm
.01028
suspend
.01406
report
.01294
brin
.01250
side
.01123
fOrKe
K
diecmm .01009
bcsd .0U19
decision
.00951J~
~
typ 18 .021
&w .
impels .04621
riKht .083091
adopt .04096
rulc .07114
ce~ .03129
policy .08715!
exercise .03047
provision .04382!
include .02978
rmtriction .04050 I
tighten .02341
credit .03410
propose .02050
ban .03290
enforce .01916 r~ulation .02586
use .01698
security .02819
control .022TT
chamKe.01665
br~.k .01621
option .02191
extend .01587
8t&nds~l .01958
lib .01441 requirement .01490
oppcee .01376
tax .01227
remove .01323
mmlmmlmm
20 .015
~1.0~-r6------.0835K
usume .06687
position .1
fill .04823
become.11146 presidcnt .0K178
account .06605
iud .09514 mortK~KC .06931
meln .04154
control .04908
cap .07303
group .05734
n.mq~ .07970
offlci~d .08558
enldyst
.04518
return
.03868
powlr
.04743
bsnk
.07174 company .06733
be .00654
rn~sln .05788 chsirmen .04384
be .02827
consult .06777 &ttention .08347
imsKe .04092
8hm .02387 responsibility .03439
hcad .05826; concern .03268
tell .03499 executive .04324
include .03194
m~q~er .03607 strenlthcn .02213
pcet .03189
draw .04489;
force .02934
name .02968
pa~ner .02709
malnt-;n .02148
csJl .03184 indicate .03417
coupon .02364
member .02248
seice .02047
view .01745 en~inco1" .02217
hire .021K1
list .02076
ml.r ket .0133K
trader .01837
bur .01857
home .01714
trade .01944 indicator .01819
set .00933
bosrd .01599
chenge .01815
ownership .01703
attract .01666:
unit .01787
elect .00929
firm .01408
improve .01648
job .01598
form .01528
of~os .017T2
ouet .00924
l&w3~r .01362
use .01636
order .01501
etsSe .01513
bom’d .01306
uk .00854
company .01347
shii~ .01626
nsJme .01296
focus .01419 operltion .01214
&ppoint .00786 consultant .01221
place .01532
title .01296
turn .01066
te am .01078
banker .01196
bolster .01465
mt .01233
cre&te
Table7: Anotherpart.
146
type 23 .083
t,tpe 22 .025
t}rpe 24 .088
opermte .23595
officer .08099
have .88786
Io~ .03589
uU .24706
shlure .17737
build .12124
phult .03029
be .02008
ohm .03026
buy .21141
stock .0972~
open .08096
profit .05795
m~r .00934
effect .02868 acquire .06323
stlke
.~)~
clcoe .03310
system .04020
impa~t .02606 )urchin .04274 company ,
8ct .00734
maaufa~’turo .02973
Itorl .03467
loee .00644
. ~le .02217
own .03660 intermit
"03~
keep .02182
flciilty .03319
interpret .02Q82 include .03528;
m .00506
suet .
run .01790 operstion .02793,
limit .0087?
income .01999
totLl .0~347~
unit .02787
turn .01747
office .02522I
lack .00488 problem .01842
hold .02227
bond .02452
eltablilb .01689
center .02376 i increue .00432
ri3bt .01540
be .01894 businm .02242
mmm.01831
c~r .01748I
cite .00375
pl,-, .01473
issue .01459 security .01817
be .01426 comp4my.01672
cut .00366
time .01334
prefer .01360 opeTstion .01475
fly .0139e
home .01634
¯cld .00228 cbamce .01282~
trldo .01205
dollsr .01390
eximmd .01338
door .01471
exceed .0022Q trouble .01257
ret~n .00987
ton .01164
say .01300
mile .0U44
feel .00208 comment .01108
offer .00840
imuo .01091
mLint~dn .01149 I buiidin I .01131 coEmlder .00198
vLiuo .01099
receive .OOT02 product .00996
mmm.00194
lmme.01147 I
e~,.o11
suet .01068 incromm .00684
car .00955
t~e 25 .031
t~tpe 26.054
27 ~029
VP
receive .15326 IppruvLi .09040
rilml .22903
~1~ .74167 )rovide .22624
eervtce .08067
Ip~ .14544 controct .08712
yield .13168
point .07754
offer .11291
w~y .04967
win .11622
control .04008
fLll .12219i
cant .05860
chanlfe .08438 inform&tion .04529
mk .11268
order .027T5
be .08788
yea .01537
live.0e825
detadi .04282
Ialn .07135
support .02298
own .03445
ponce .01498
find .04507
hand .04176
obt~n .03024
shin .01816
drop .02992
price .00969 dis¢l~e .03263
aline .02648
live .02811 daumlqle .01668
jump .02685
ro~e .00935
Set .03251
d~t~ .02517
require .02492
licenee .01496
.024T4
year .00776
be .01960
benefit .01661
lose .02422
scce~ .01461 incremm .02322 average.00451
mink .01937
term .01436
need .02124
help .01388
climb .02222
c~t .0Q328 discmm .01320
incentive .01346
Iocee .01948 benefit ¯OI2U
define .02091
bit .00276
use .01279
figure .01070
lp~mt .01426
vote .01237
g-;n .02001 demand .00283
cles.r .01147
evidemce .01069
demand.01242 t
riKht .01~7
hold .01557
ton .00250 include .01115
l .01045
secure .01030
loam .01140
buy .01167
fee .00244 obtsin .01086
loan .01006
&wLrd .00975 attention .01126 a~quiro .01089 inflation .00244
need .01088
tenon .009~8
Lvmit .00788
boom: .OIOT5
8osr .0089
ran§e .00231
releeee .00901
record .00987
He 28 .023
t,~e 29 .052
t~e 30 ¯037
t~ke .71341 yesterday .08897
"tlow .05695
company .07287
report .19367
lea .17676
trade .21074
pl-~e .06415
live .04996
people .05791
poct .12246
soda . 10309
handle .00380 ~dvLntqe .05361
tell .04769
immctor .04724
ely .07210 e~rninK .09973
offer .00343
step .04657
help .04348 overnment .02802
expect .06355
profit .08199
c-q .00334
action .03613
uk .03068
customer .02600
include .04197 income .08284
include .00324
effect .03290
Z~Kluire .02360 employee .02558 operate .02780
rode .04715
m .00264
volume .03119
ssy .02168
worker .02435
show .02579 de~line .04575
find .00205
cha.r|e .02457
sttract .02134
client .01987 attribute
.02379i increase .04519
otTeet .00204
control .02322
force .02101
bank .01928
follow .020491 result .04247
represent .00179 position .01873 represent .02067 sbaz~holder .01402
l~ve .01666
riee .02973
enjoy.00148
y~ur .018412 protect .01823
qency .01363 generate.01453
drop .02688
relnMJn .00146
profit .01806
leave .01803
coeeumer .01271 produce .01249 revenue .02411
¯ -’cept .00121
look .01TTO
u~o .01768
Foup .01270
be .01199
event .01580
live .00120
time .01438
Imp .01568
state .01284
.01023
net .01198
overcome .00117
risk .013345 include .01482
reporter .01234
r~fleet .01006 return .01168
note .OOU2
.01320
.01475
encouroKo
court
.0U83
estimate
.00984
pLrt
charlle .00803
t,/pe 31 o29
~p 32 .029
ma~e .83180
decision .06421
be .11257
problem .14799
My .01986
money .03619
face .08331
issue .02183
ceJl .0o735
payment .03842
cemm.04889
pr~meuTe.02099
.00696
.03247
sense
resolve .02475
d~e
.02080
expect .00582
bid .02883
solve .02389
recension .01835
Lccept .00552
offer .02289
avoid .02373
chaJlenee .01664
welcome .00613
product .02105 ~cldrmm.02302
situation .01(163
olm .00451
chooKe .02097
flKht .02070
effect .01617
avoid .00400
-loan .01927
cute .02037 competition .01527
lanv* .00360
pose .01971
move .01093
thre&t .01394
ke*p .00387
profit .01682
reflect .01856
risk .01369
aE*Ct .0o315 inveetmant .01484
m .01673
COUCeru.01343
guarantee .00299 difflmlnce .01479 ,revQnt .01505
question .01327
clncei .00278
effort .01280
emm.01416
crisis .01323
defer .0o277 statement .01278
suffer .01239
sbort~qle .01284
reflect .0G250
$ .01184
cite .01236
crime .01239
Table
8:Still
another
part.
147
5
Application
to parsing
A casual examination of clusters can at most suggest that the right sort of
thing is going on; the point is to use such representations to do something
that we want to do for an independent reason. In the introduction, I said that
resolving manyparsing ambiguities comes downto evaluating selectional compatibility betweentwo lexical items. Given a probabilistic selection model, we
can assign a probability to any verb-object pair drawn from the lexicon of 992
verbs and 1027 nouns. In order to evaluate parses, we need to include other
grammaticalrelations. In somecases, such as subjects, this is fairly straightforward. In others, such as second complements of verbs, getting access to
frequency counts is not straightforward, since identifying second complements
such as prepositional phrases -- requires resolving attachment ambiguities.
This can not be done systematically without the hind of lexical information we
are trying to induce. Presumably, a procedure learning selectional restrictions
for second complementswould have to initially consider several attachments,
and iteratively learn the lexical information required for dis~mbiguation. This
method is applied to simpler data (paying attention just to prepositions and
not the heads of their objects) in Hindle and Rooth [Hindle and Rooth, 1993].
To investigate the possibility of disambiguating syntactic ambiguities with
selectional information, I considered the past participle vs. tensed verb ambiguity of sold in positions immediately following a noun phrase. In the first
example below, sold is a tensed verb, and the preceding noun administration
is the head of its subject, describing the agent. In the second example, sold
is a participial post-modifier, and the preceding noun phase denotes the sold
object.
(ll)a. ~o6o~5evidence that the Reagan administration sold arms to Iran
b. ~, represented payments for arms sold to Nicaraguan insurgents
The situation is actually more complex, since sold occurs frequently in the
middle construction, with a syntactic subject filling the semantic role of a sold
object, rather than the seller:
(12)~,7.,. Thefranchise
soldin 1979for$Iimillion
To sidestep
thisproblem,
I defined
theproblem
to be solved
as one of identifying
thesemantic
relation
(seller
or soldobject)
of thenounphrase,
rather
thanidentifying
a grammatical
relation
or partof speech.
In otherwords,
the
middleconstructions
~re grouped
withthe postmodifiers.
By hand,I classifiedthefirst
relevant
occurrence
of eachnoun-sold
pairin thefullWallStreet
Journal
corpusfrom[Liberman,
1992].Of the 1027nounsin the model,220
wererepresented
in thisconfiguration,
of which87 werepasttenseverbs,
118
wereparticipial
postmodifiers,
and 15 middleconstructions.
An augmented
selection
modelwas obtained
(in a somewhat
ad hoc way)by addingtwo ad-
148
ditional
verbssold/VBN
and sold/VBDto the verbset.Training material
7 Themodelwas usedto classify
consisted
similar
but independent
data.
the
220testexamples
by meansof theratio:
Psold/VBD,n
Psold/VBV,n"{- Psold/VBN,tt
Pairs with a score greater than 0.5 were assigned to the seller role, and those
with a lower score to the sold object role. Of the 118 post-modlfler pairs, 110
were correctly classified into the sold object role. Of the 87 ordinary subjectverb items, 69 were correctly classified into the seller role. Of the 15 instances
of the middle construction, 11 were correctly classified into the sold object
role. This gives an rate of correct classification of 86.4%. The disambiguation
scores are listed in tables 5, 5, and 5. Appositely, the least ambiguousinstance
of the seller role is the noun seller. The least ambiguousex~tmple of the sold
object role is output. Manyof the problematic nouns -- those with scores
below 0.5 in table 5 -- nameinstitutions which can be agents, but can also be
bought and sold, for instance company,airline, and 8tore. Since companies are
actually described in the Wall Street Journal both as being sold and as selling
things, we would not expect a selectional approach to be uniformly successful
in this case. However,we want to resolve clear cases correctly m artefacts,
materials, and financial instruments are unambiguossold objects, and people
are unambiguoussellers. With few exceptions, such clear cases are resolved
correctly.
7In writing this section, I found that the training data and my record of how they were
constructed had been lost. Therefore, the evaluation will be redone, and final results may
differ from those described here.
149
output .O0(X)O
volume .00001
suit .01225
guolin8 .044M
bill .04526
missile .08?26
movie .0?033
system .08970
technololy .09182
par~ .09475
film .09783 advertisinl .11294
mstarisl .11427
shoe .11488
oil .12449
product .12479
chip .12479
plant .12868
device .13161 merchandise .13183
|cod .13240
car .13363
machine .13475
operation .13514
barrel .13680
bulidlnl; .13923
loan .14066
document .14141
copy .14185
iold .14160
debt .14402
truck .14454
debenture .14468
bond .14543
brand .14633
share .14641
Sis .14673
food .14708
certiflclte
.14839
covera4Ke .14758
inventory .14886
~ .14924
~t ,14957
acre .15149
phone .15157
property .15334
note .15414
version ,15509
warrant .15515
ticket .1332S
block .13751
card .15806
pound .16148
business .16485
home .16962
contract .ITIO0
computer .17181
steel .17360
show .17449
amount .17630
dollar .18039
proKrmm .18111
sJrcre/t .13204
insurance .18227
enKinc .18332
book .!8543
site .19478
line .19487
land .19868
dru s .19588
security .19686
wespon .19701
test .19931
ttpe .20481
item .21203
house .21368
rest .21T06
piano .21825
station .21854
piece .22123
psper .22272
msrk .22307
import .22753
arm .23139
model .23296
mqsaJne .23470
call .24358
billion .24488
service .25488
issue .28688
work .23714
polio/ .26237
oncqD’ .20286
supply .20845
pcettion .29432
vehicle .29797
tbrih .30081 collection .30386
list .33141’
sd .33409
million .33923
|ace .34198
one .34246
shipment .38118
art .38336
article .39738 newspaper .39756
t,/pe .41898
set .43282 tronsaction .45646
hcepiteJ .55241
kind .68506
pla~er .69998
export .71760
switch .74617
snimsJ .84308
afent .75588
premium .84474
Table9: Dis~mbiguation
scoresfor NP/postmodifier
combinations.
Items
above
thelinearecorrectly
classified.
trade .00001
effort
ut&te .16620
division
subsidiary
.26404
account
comp4my .36211
trust
s/rilne
.44064 institution
|isnt
.60438
team
oper&tor .63T52
utifity
industry
.66728
couple
country
.71725
firm
foreiKner .75349
room
corporotion
.78073
hnsband
n~tion .83127 stockholder
school .89880
broker
producer .90640
s&ency
wife .94643
ch-;rmsa
a~n .96147 department
people .97064
membcr
employee .97603
pislntiff
defendant .98917
fsrm~.
-I
trsder .ggT3T
ot~¢i
sdmin/strstion 1.00000
an&lynn
found&rich 1.00000
omcer
seller 1.00000
.08683
plan .11408
shop .14468
.17522
store .17609
unit .23840
.27786
network .32662
fund .32709
.39613
psrtnership
.40965
concern .43030
.46481
.64480
psrent
.63279
.69602 orsanins~ion
.63796
insurer
.64974
ba~k .664S7
.678U
msmqer
.68160
maker .?0603
.72093 manufacturer
.?2643 shareholder
.74399
.78649
lender .73983
I~’oup .77943
.80379
publisher
.80629
developer .81175
.83832
st&re .87692
family .88925
.90178
individuaA .90383
executive
.90404
.91013
psrtncr
.93268
customer .94037
.94838
holder .98723
boa~rd .95960
.96479
f&ther .96875
dcsJer .97063
.97166
investor .97463
brotbcr
.97580
.97753
world .97878 l[overnment
.98305
.99021
friend .99417
l&wyer .99581
.99880
.99883
client
owner 1.00000
buyer 1.00000
1.00000
director 1.00000
1.00000
psrticipant 1.00000
president 1.00000
Table10: Disarnbiguation
scoresfor agent/verb
combinations.
Itemsbelow
thelinearecorrectly
classified.
I
o/ferin|
franchise
price
other
mctsA .14244 &psrtment .14406 stock .14880 I
.13354
.15312 mcmbersbJp.16788
future .19601 scot .27562
.43832
hundred .44699 conference .49000
.37470
thin~ .65876
target .99409 run .99997
Table 11: Disambiguation scores for middle constructions.
line are correctlyclassified.
15o
Items above the
°.
References
[Church
et al.,199x]Church,K. W., Gale,W. A., Hanks,P., and Hindle,
D. (199x).
Usingstatistics
in lexical
analysis.
In Zeruik,
editor,
Le~cal
acquisition: e~loiting on-line resource8 to build a le.zicon.
[Hindle, 1983] Hindle, D. (1983). User manual for fidditch, a deterministic
parser. Technical Memorandum7590-142, Naval Research Laboratory.
[Hindie,
1994]Hindie,
D. (1994).
A parser
fortextcorpora.
In Atkins,
B. and
Z~mpolli,
A., editors,
Computational
Approaches
to the Le~con.Oxford
University
Press,
Oxford.
[Hindle
andRooth,1993]Hindle,
D. and Rooth,M. (1993).
Structural
ambiguityandlexical
relations.
Computational
Linguistics,
18:xx-yy.
[Liberman, 1992] Liberman, M. (1992).
for Computational Linguistics.
151
ACL-DCICD ROM1. Association
Download