InPhOrmed Philosophy

Combining text mining and expert judgment
Colin Allen <colallen@indiana.edu>
Mathias Niepert <mniepert@indiana.edu>
Cameron Buckner <cbuckner@indiana.edu>
http://inpho.cogs.indiana.edu/
InPhO
the Indiana Philosophy Ontology project
Goals
• To build and maintain a “dynamic ontology” for the discipline of philosophy
• To deploy the InPhO in a variety of Digital Philosophy applications
Digital Tools for the Humanities
InPhO Architecture Overview
SEP Data
• 1289 authors (+12)
• 114 subject editors (+1)
• 1037 published entries (+20)
• 12.05 million words (+250,000)
• 600-850K articles accessed/week
Other data sources
Wanted: Structured Data Out
Ontology: a formal, machine-readable specification of the types of
entities in a domain and the relationships between them
Bridging the Data-Metadata Gap
Two “extremes”:
• Hire experts to design & maintain an ontology
  Problems: labor-intensive, expensive, depends on “double” experts
• Tagging approaches, folksonomies
  Problems: may not meet academic standards; noisy
Building ontologies: A third way
Stratified collaboration
[Figure: The InPhO “layer cake”. Labels: Authors & Editors (Experts); Expert feedback software; Ontology Model Induction (Answer Sets); General feedback software; Expert-written content; semi-expert input and usage; Software Semantic/Statistical Analysis]
• Experts are busy people. Experts don’t want to be bothered with
  garbage. Experts don’t like their hard work messed up by amateurs.
• Knowledgeable amateurs often have more time and motivation to fix
  things, but they are rare. They don’t like having their hard work
  messed up by the clueless either.
• Well-intentioned amateurs are plentiful and motivated to donate their
  time, but they make mistakes.
• Software has lots of time and no motivation problems, but it is
  clueless.
Two networks
authors, editors, readers, etc.
Software Layer 1
Is it possible to detect hyper/hyponymy statistically?
If t1 is a hypernym of t2, then
• t1 is semantically similar to t2
• t1 is more general than t2 with respect to a taxonomy
The probabilistic J-measure is widely used to estimate the semantic
similarity between two terms.
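As a minimal sketch, the J-measure in its standard form (Smyth and Goodman) can be computed from one marginal and one conditional probability per term pair. The function name, the argument order, and the base-2 logarithm are assumptions made for illustration, and the project may use a different variant. The priors are assumed to lie strictly between 0 and 1.

```python
import math

def j_measure(p_t1, p_t2, p_t2_given_t1):
    """Standard J-measure J(t2; t1): information that t1 carries about t2.

    p_t1          -- p(t1), probability that t1 occurs in a unit of text
    p_t2          -- p(t2), assumed to satisfy 0 < p(t2) < 1
    p_t2_given_t1 -- p(t2 | t1)
    """
    def term(posterior, prior):
        # Usual convention: 0 * log(0 / x) = 0
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    # Divergence between the posterior and prior distributions of t2
    divergence = term(p_t2_given_t1, p_t2) + term(1.0 - p_t2_given_t1, 1.0 - p_t2)
    return p_t1 * divergence
```

When t1 tells us nothing about t2, that is p(t2 | t1) = p(t2), the measure is zero; it grows as the occurrence of t1 shifts the probability of t2 away from its prior.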
Semantic similarity
We build a directed, weighted co-occurrence graph G = (V, E) in which
each node represents a term in our set of keywords. An edge between two
terms t1 and t2 indicates that the terms co-occur in the encyclopedia at
least once, and the weight of the edge is a measure of their semantic
similarity.
By iterating over all documents in the encyclopedia and counting term
occurrences and co-occurrences, we can estimate the probabilities p(ti),
p(tj), p(ti, tj), and thus p(ti | tj) for all terms ti, tj with respect
to a unit of text.
Currently we consider two units of text, a document and a one-sentence
sliding window; that is, two co-occurrence records are created, one for
the sentence level and one for the document level. The latter is
projected into the former by treating an entire document as one large
sentence, giving us more co-occurrences at the cost of possibly
including some connected but unrelated terms in our graph.
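The counting step described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function name `cooccurrence_probs` is hypothetical, each unit of text is assumed to be already tokenized, and keywords are assumed to be single tokens (multi-word terms would need phrase matching first).

```python
from collections import Counter
from itertools import permutations

def cooccurrence_probs(units, keywords):
    """Estimate p(t) and p(ti | tj) from a list of text units.

    units    -- list of token lists, one per unit of text (e.g. a sentence)
    keywords -- set of terms to track
    """
    n = len(units)
    single = Counter()  # number of units containing t
    pair = Counter()    # number of units containing both ti and tj
    for tokens in units:
        present = keywords & set(tokens)
        single.update(present)
        # Ordered pairs, so both (ti, tj) and (tj, ti) are recorded
        pair.update(permutations(sorted(present), 2))

    p = {t: single[t] / n for t in single}
    # p(ti | tj) = p(ti, tj) / p(tj) = count(ti, tj) / count(tj)
    cond = {(ti, tj): c / single[tj] for (ti, tj), c in pair.items()}
    return p, cond

units = [["mind", "body", "dualism"], ["mind", "consciousness"], ["dualism", "mind"]]
p, cond = cooccurrence_probs(units, {"mind", "dualism", "consciousness"})
```

Passing whole documents as units gives the document-level record; passing sentences gives the sentence-level record. The conditional probabilities are asymmetric, which is what makes the resulting graph directed.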
Hyper/hyponymy
We hypothesize that encyclopedias are “balanced”: terms representing
more general categories tend to co-occur with more terms in the
encyclopedia’s text.
Normalized node in-degree will usually be a good measure of the
generality of a category, but we anticipate that entropy is an even
better approximation because it takes into account not only the
in-degree of the node but also how evenly its adjacent nodes are
conditionally distributed.
Node entropy provides a measure of generality that can be used to rank
hypernym/hyponym candidates via the “R-measure” (Niepert et al. 2007).
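A minimal sketch of node entropy as a generality score, assuming the weights on a node's adjacent edges are conditional probabilities that are normalized to sum to one before the Shannon entropy is taken. The function name and the normalization step are assumptions for illustration; this is not the R-measure itself.

```python
import math

def node_entropy(cond_weights):
    """Shannon entropy (in bits) of a node's normalized edge-weight distribution.

    cond_weights -- conditional probabilities on the edges adjacent to the
                    node, one per neighboring term
    """
    total = sum(cond_weights)
    if total == 0:
        return 0.0
    probs = [w / total for w in cond_weights if w > 0]
    return -sum(q * math.log2(q) for q in probs)
```

A node with many evenly weighted neighbors scores high, while a node dominated by a single neighbor scores low even if its in-degree is the same, which is the distinction the entropy measure is meant to capture.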
Leveraging Expertise
• Simple question interface to gather feedback on statistically
  generated “hypotheses”
• Automated (nonmonotonic) reasoning to put the pieces together
  (Answer Set Programming)
Software 2
Is it possible to use possibly conflicting evaluations?
Yes: nonmonotonic reasoning.
We use answer-set programming.
Answer Set Programming
Three parts:
• Signature: predicate symbols (e.g., desc) and a set of terms (here:
  terms referring to ideas in philosophy)
• Declaration: a set of feedback facts (e.g.,
  morespecific(Neural Network, Connectionism)) and the facts given by
  the existing ontological structure (e.g.,
  is-a(Thinking Machines, Artificial Intelligence))
• Regular part: a set of rules
Answer Set Programming
Conflicting Feedback
Conflicting feedback is possible!
Modeled using predicate ic (inconsistent):
ic(X, Y) :- ms(X, Y), mg(X, Y).
Can be used to model “semantic links” between
incomparable ideas:
plink(X, Y) :- s4(X, Y), ic(X, Y), not desc(X, Y), class(X).
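The two rules above can be mirrored with plain set operations, as a rough sketch. The readings of the predicate names are assumptions: ms as "more specific", mg as "more general", s4 as a high similarity rating, desc as "descendant of", and class as "is a class". The Python function names are hypothetical.

```python
def inconsistent_pairs(ms, mg):
    """ic(X, Y): pairs rated both 'more specific' and 'more general'."""
    return ms & mg

def semantic_links(s4, ic, desc, classes):
    """plink(X, Y): similar, inconsistently ranked, not descendants, X a class."""
    return {(x, y) for (x, y) in s4 & ic if (x, y) not in desc and x in classes}

ms = {("a", "b"), ("c", "d")}
mg = {("a", "b")}
ic = inconsistent_pairs(ms, mg)
links = semantic_links({("a", "b"), ("c", "d")}, ic, set(), {"a"})
```

The `not desc(X, Y)` literal is negation as failure in the answer-set program; the sketch approximates it by treating `desc` as a fixed, fully known set.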
Quality of response
Is it possible to weight feedback according to quality?
• If X is an author or editor at the SEP, then X is an expert in
  subject areas a, b, ...
• If Y provides feedback in area a that is well correlated with experts
  in a, then Y’s feedback about edge E in a may be trusted in the
  absence of contrary expert feedback.
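One simple way to read the second rule is sketched below, with "well correlated" approximated by the fraction of shared edges on which a user agrees with the experts in that area. The function names, the dictionary layout (edge to rating, for one subject area), and the 0.8 threshold are all assumptions for illustration.

```python
def agreement_with_experts(user_feedback, expert_feedback):
    """Fraction of shared edges on which the user's rating matches the experts'.

    Returns None when the user has rated no edge that experts also rated.
    """
    shared = user_feedback.keys() & expert_feedback.keys()
    if not shared:
        return None
    matches = sum(user_feedback[e] == expert_feedback[e] for e in shared)
    return matches / len(shared)

def trusted_rating(edge, user_feedback, expert_feedback, threshold=0.8):
    """Expert rating wins; otherwise use the user's rating if the user is trusted."""
    if edge in expert_feedback:
        return expert_feedback[edge]
    score = agreement_with_experts(user_feedback, expert_feedback)
    if score is not None and score >= threshold and edge in user_feedback:
        return user_feedback[edge]
    return None

experts = {("a", "b"): "ms", ("c", "d"): "mg"}
reliable = {("a", "b"): "ms", ("c", "d"): "mg", ("e", "f"): "ms"}
unreliable = {("a", "b"): "mg", ("e", "f"): "ms"}
```

Here the reliable user's rating of the edge ("e", "f") is accepted because no expert has rated it, while the unreliable user's rating of the same edge is not.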
Managing Expertise
10/20/08 5:53 PM
%stratified input predicates are named same as unstratified counterparts but with "i" suffix
rating(0).
rating(1).
rating(2).
rating(3).
rating(4).
%process expert-level-stratified input versions of predicates into those used by program
%inverse
msi(X, Y, A) :- mgi(Y, X, A), rating(A).
mgi(X, Y, A) :- msi(Y, X, A), rating(A).
%similarity symmetry
p4i(X, Y, A) :- p4i(Y, X, A), rating(A).
p3i(X, Y, A) :- p3i(Y, X, A), rating(A).
%incomparable symmetry
ici(X, Y, A) :- ici(Y, X, A), rating(A).
%evidence against similarity ratings if contradicted by those at a higher level
np4(X, Y, A) :- p4i(X, Y, A), p0i(X, Y, B), B>A, rating(A), rating(B).
np4(X, Y, A) :- p4i(X, Y, A), p1i(X, Y, B), B>A, rating(A), rating(B).
np3(X, Y, A) :- p3i(X, Y, A), p0i(X, Y, B), B>A, rating(A), rating(B).
np3(X, Y, A) :- p3i(X, Y, A), p1i(X, Y, B), B>A, rating(A), rating(B).
%if no evidence against the similarity at that level, allow to pass through the "filter"
p4(X, Y) :- p4i(X, Y, A), not np4(X, Y, A), rating(A).
p3(X, Y) :- p3i(X, Y, A), not np3(X, Y, A), rating(A).
%allow lower-level generality evaluations to pass through if not contradicted by a higher level
%mg(X, Y) :- mgi(X, Y, A), rating(A), rating(B), not msi(X, Y, B), B>A.
%ms(X, Y) :- msi(X, Y, A), rating(A), rating(B), not msi(X, Y, B), B>A.
%evidence against the generality ratings at one level if contradicted by those at a higher level
nmg(X, Y, A) :- mgi(X, Y, A), msi(X, Y, B), B>A, rating(A), rating(B).
nms(X, Y, A) :- msi(X, Y, A), mgi(X, Y, B), B>A, rating(A), rating(B).
%if no evidence against the generality at that level, allow to pass through the
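The stratified filter for generality judgments in the listing above can be paraphrased procedurally, as a sketch. It assumes that the pass-through rule for generality (cut off in the listing) works by analogy with the similarity rules, that mg and ms stand for "more general" and "more specific", and that ratings run from 0 (lowest expertise) to 4 (highest). The function name and tuple layout are hypothetical.

```python
def filter_generality(feedback):
    """Keep a generality judgment unless a higher-rated source contradicts it.

    feedback -- set of (relation, x, y, level) tuples, where relation is
                "mg" (more general) or "ms" (more specific) and level is the
                source's expertise rating
    Returns the set of (relation, x, y) judgments that pass the filter.
    """
    opposite = {"mg": "ms", "ms": "mg"}
    passed = set()
    for rel, x, y, level in feedback:
        # Mirrors nmg/nms: the opposite judgment exists at a strictly higher level
        contradicted = any(
            r == opposite[rel] and (a, b) == (x, y) and lvl > level
            for r, a, b, lvl in feedback
        )
        if not contradicted:
            passed.add((rel, x, y))
    return passed

feedback = {("mg", "a", "b", 1), ("ms", "a", "b", 4), ("mg", "c", "d", 2)}
result = filter_generality(feedback)
```

Contradictory judgments at the same level both pass the filter, which leaves them available to the inconsistency predicate rather than silently discarding one of them.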
Representing Philosophy
Three Models
• Wiki: Power to the people! The world is flat!
• Peer reviewed: Experts know best! Mountaintop sanctuaries (SEP,
  “Formal” Ontology)
• Stratified: From each according to ability! A complex landscape
  (InPhO)