The Topology of WordNet: Some Metrics

Ann Devitt and Carl Vogel
Computational Linguistics Group
Trinity College Dublin, Ireland
Introduction
Measures
 WordNet “sub-hierarchies”
 Multiple inheritance
 Branching Factor
 Depth versus Height
 Cluster coefficients
 Specificity pilot study

Ann Devitt, TCD
Terminology

WordNet as directed acyclic graph

Node and synset interchangeable
Dimensional distribution
Overlap between hierarchies

2072 synsets: more than 1 top hierarchy

35 synsets: more than 2 top hierarchies
Some overlap examples

Abstraction and Event

948 synsets


group action
Entity and Group

250 nodes

weaponry
Multiple inheritance
2.6% of nodes
 Normal distribution throughout depth
 Significantly different in different taxonomies:
 χ²(8, N = 75180) = 324.27, p ≤ 0.001
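
The share of multiply inherited nodes can be illustrated with a minimal sketch. The edge list below is a hypothetical toy hierarchy, not WordNet data, and the helper names `parents_by_node` and `multiple_inheritance_share` are invented for this illustration.

```python
from collections import defaultdict

# Hypothetical toy hypernym edges (child, parent); not actual WordNet data.
EDGES = [
    ("animal", "entity"),
    ("pet", "entity"),
    ("dog", "animal"),
    ("dog", "pet"),
    ("beagle", "dog"),
]

def parents_by_node(edges):
    """Map every node to the set of its direct parents (hypernyms)."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
        parents[parent]  # touch the key so roots appear with an empty set
    return dict(parents)

def multiple_inheritance_share(edges):
    """Fraction of nodes that have more than one direct parent."""
    parents = parents_by_node(edges)
    multi = [node for node, ps in parents.items() if len(ps) > 1]
    return len(multi) / len(parents)

share = multiple_inheritance_share(EDGES)
print(f"{share:.1%} of nodes have more than one parent")
```

In this toy graph only "dog" has two parents, so one node in five is multiply inherited.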

Specificity examples

Parents = 1, depth < 3

Parents > 1, depth < 3
person
 artefact

damnation
 office


Parents = 1, depth > 8

Parents > 1, depth > 8
sea bass
 self-condemnation
 bombardon

beagle
 palomino

Branching Factor
Number of children + 1
 Including leaf nodes
 Range: 1 – 573
 Average: 2.023
 Excluding leaf nodes:
 Average: 5.793
 97% less than 20
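
A minimal sketch of the branching factor as defined above (number of children + 1), averaged with and without leaf nodes. The edge list is a hypothetical toy hierarchy, and `branching_factors` and `mean` are names chosen for this illustration.

```python
from collections import defaultdict

# Hypothetical toy hyponym edges (parent, child); not actual WordNet data.
EDGES = [
    ("entity", "animal"),
    ("entity", "artefact"),
    ("animal", "dog"),
    ("animal", "cat"),
    ("animal", "horse"),
    ("dog", "beagle"),
]

def branching_factors(edges):
    """Return {node: number of children + 1}."""
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
        children[child]  # touch the key so leaves appear with no children
    return {node: len(kids) + 1 for node, kids in children.items()}

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

bf = branching_factors(EDGES)
avg_all = mean(bf.values())                            # leaf nodes included
avg_internal = mean(v for v in bf.values() if v > 1)   # leaf nodes excluded
print(f"including leaves: {avg_all:.3f}, excluding leaves: {avg_internal:.3f}")
```

A leaf has a branching factor of exactly 1 under this definition, which is why a large proportion of leaves pulls the overall average down.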

Branching factor
Overall low branching factor
 Same distribution in all sub-hierarchies
 Large number of nodes in total
 Greater overall depth in paths
 Not a shallow structure, despite 55,000 leaf nodes
Depth vs Height
Depth:
 Maximum = 18
 Normal distribution
 Height:
 Maximum = 5
 93.6% of nodes are 1 or 2 nodes from a leaf node
 Zipfian distribution
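
The two distance measures can be sketched as follows. The slides do not say how paths are chosen when a node has several, so this sketch assumes depth is the shortest distance from a root and height is the shortest distance to a leaf. The data is a hypothetical toy hierarchy, and the function names are invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical toy hyponym edges (parent, child); not actual WordNet data.
EDGES = [
    ("entity", "animal"),
    ("entity", "artefact"),
    ("animal", "dog"),
    ("dog", "beagle"),
]

def bfs_distances(adjacency, sources):
    """Shortest edge-count distance from the nearest source to each node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def depth_and_height(edges):
    """Assumed definitions: depth = distance from root, height = distance to leaf."""
    down, up = defaultdict(set), defaultdict(set)
    nodes = set()
    for parent, child in edges:
        down[parent].add(child)
        up[child].add(parent)
        nodes.update((parent, child))
    roots = sorted(n for n in nodes if not up[n])     # nodes without parents
    leaves = sorted(n for n in nodes if not down[n])  # nodes without children
    return bfs_distances(down, roots), bfs_distances(up, leaves)

depth, height = depth_and_height(EDGES)
for node in sorted(depth):
    print(node, "depth", depth[node], "height", height[node])
```

Under these assumptions "entity" has height 1 because its nearest leaf, "artefact", is one step away, even though "beagle" is three steps below it.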

Depth vs Height

Reported distributions are the same across the different sub-hierarchies

Depth is a more informative measure
Clustering coefficient
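Measure of graph connectivity
 Ratio for a node i with k_i neighbours:
 (number of connections between the neighbours) / (possible number of connections)
 $C_i = \frac{2\,\Sigma_i}{k_i (k_i - 1)}$, where $\Sigma_i$ is the number of connections between the neighbours of node $i$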
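
The ratio above can be computed per node with a short sketch. It assumes the graph is treated as undirected and uses a hypothetical four-node toy graph. The helper names `neighbours` and `clustering_coefficient` are invented for illustration.

```python
from itertools import combinations

# Hypothetical toy graph as undirected edges; not actual WordNet data.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]

def neighbours(edges):
    """Build {node: set of adjacent nodes}, ignoring edge direction."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs

def clustering_coefficient(nbrs, node):
    """2 * (links between neighbours) / (k * (k - 1)); 0.0 when k < 2."""
    k = len(nbrs[node])
    if k < 2:
        return 0.0
    links = sum(
        1 for u, v in combinations(sorted(nbrs[node]), 2) if v in nbrs[u]
    )
    return 2 * links / (k * (k - 1))

nbrs = neighbours(EDGES)
for node in sorted(nbrs):
    print(node, round(clustering_coefficient(nbrs, node), 3))
```

In a pure tree no two neighbours of a node are ever linked to each other, so this coefficient is zero at every node. A non-zero value requires a triangle, such as a-b-c in the toy graph.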

Cluster coefficients

First-order measure
 Not useful for WordNet
 Only 62 nodes have a coefficient > 0
 Does not form clusters readily
Cluster coefficients

Second-order measure
 Average 0.337
 Normal distribution
 May form clusters of wider diameter
Pilot Study Aims
1. Do people have a notion of generality/specificity for concepts?
2. Do people agree on what is more/less general/specific?
3. What features of WordNet do these judgments correlate with?
Sample ranking task I
Axis, axis of rotation – (the center around which something rotates)
 River boat – (a boat used on rivers or to ply a river)
 Remains – (any object that is left unused or still extant; “I threw out the remains of my dinner”)

Sample ranking task II
rational motive – (a motive that can be defended by reasoning or logical argument)
 disapproval – (the act of disapproving or condemning)
 harmony, concord, concordance – (agreement of opinions)

Do people agree on what is
more/less general/specific?



YES
Cochran Q statistic (Cochran 1950)
H0: any agreement between respondents is due to chance
Overall, for 11 respondents:
 Cochran's Q = 165.859
 44 degrees of freedom
 Asymptotic significance: p < 0.001
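
For readers unfamiliar with the statistic, the following is a minimal sketch of Cochran's Q for a table of binary outcomes. The layout (rows as blocks, columns as conditions) and the toy data are assumptions for illustration. The slides do not specify how the pilot responses were coded, and `cochran_q` is a name invented here.

```python
def cochran_q(table):
    """Cochran's Q for a list of rows (blocks) of 0/1 outcomes per condition.

    Returns (Q, degrees of freedom), with df = number of conditions - 1.
    """
    k = len(table[0])                                   # number of conditions
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)                                 # grand total of 1s
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every row is all 0s or all 1s")
    return numerator / denominator, k - 1

# Hypothetical toy data: 4 blocks (rows) by 3 conditions (columns).
TABLE = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]

q, df = cochran_q(TABLE)
print(f"Q = {q:.3f} with {df} degrees of freedom")
```

Under the null hypothesis Q is approximately chi-square distributed with the returned degrees of freedom, so a significance value is obtained by comparing Q against that distribution.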
What WN features correlate?




Depth
 Less deep = more general
Children
 Inconclusive
Sisters
 Fewer sisters = more general
Sub-hierarchy
 Did not seem to affect judgments
 Did increase the difficulty of the task
Conclusion
WordNet metrics
 Inheritance: Sub-hierarchy and parentage
 Branching Factor
 Distance: depth and height
 Clustering
 Pilot study
 Suggests where to go with a larger study

Bibliography



W. G. Cochran: The comparison of percentages in matched samples. Biometrika, 37:256-266 (1950)
David S. Touretzky: The Mathematics of Inheritance Systems. Los Altos, CA: Morgan Kaufmann (1986)
D. J. Watts and S. H. Strogatz: Collective dynamics of 'small-world' networks. Nature, 393:440-442 (1998)
Multiple Inheritance vs Depth