Markov Logic in Natural Language Processing

Hoifung Poon
Dept. of Computer Science & Eng.
University of Washington
Overview




Motivation
Foundational areas
Markov logic
NLP applications



Basics
Supervised learning
Unsupervised learning
2
Holy Grail of NLP:
Automatic Language Understanding
Natural language search
Answer questions
Knowledge discovery
……
Text
Meaning
3
Reality: Increasingly Fragmented
Parsing
Semantics
Tagging
Morphology
Information Extraction
4
Time for a New Synthesis?



Speed up progress
New opportunities to improve performance
But we need a new tool for this …
5
Languages Are Structural
[Figure: examples of linguistic structure: morphological segmentation
("govern-ment-s"; Hebrew "l-m$px-t-m", "according to their families"); a
syntactic parse (S, NP, VP, V) of "IL-4 induces CD11B"; a nested bio-event
graph for "Involvement of p70(S6)-kinase activation in IL-10 up-regulation
in human monocytes by gp41 ..."; and coreferent mentions in text: "George
Walker Bush was the 43rd President of the United States. ... Bush was the
eldest son of President G. H. W. Bush and Barbara Bush. ... In November
1977, he met Laura Welch at a barbecue."]
6–8
Processing Is Complex
Morphology
POS Tagging
Semantic Role Labeling
Coreference Resolution
Chunking
Syntactic Parsing
Information Extraction
……
9
Pipeline Is Suboptimal
Morphology
POS Tagging
Chunking
Semantic Role Labeling
Syntactic Parsing
Coreference Resolution
Information Extraction
……
10
First-Order Logic




Main theoretical foundation of computer science
General language for describing
complex structures and knowledge
Trees, graphs, dependencies, hierarchies, etc.
easily expressed
Inference algorithms (satisfiability testing,
theorem proving, etc.)
11
Languages Are Statistical
[Figure: "I saw the man with the telescope", shown with alternative parse
attachments for the prepositional phrase]
Microsoft buys Powerset
Microsoft acquires Powerset
Powerset is acquired by Microsoft Corporation
The Redmond software giant buys Powerset
Microsoft’s purchase of Powerset, …
……
Here in London, Frances Deek is a retired teacher …
In the Israeli town …, Karen London says …
Now London says …
  London: PERSON or LOCATION?
G. W. Bush ……
…… Laura Bush ……
Mrs. Bush ……
  Which one?
12
Languages Are Statistical





Languages are ambiguous
Our information is always incomplete
We need to model correlations
Our predictions are uncertain
Statistics provides the tools to handle this
13
Probabilistic Graphical Models







Mixture models
Hidden Markov models
Bayesian networks
Markov random fields
Maximum entropy models
Conditional random fields
Etc.
14
The Problem




Logic is deterministic, requires manual coding
Statistical models assume i.i.d. data,
objects = feature vectors
Historically, statistical and logical NLP
have been pursued separately
We need to unify the two!
15
Also, Supervision Is Scarce



Supervised learning needs training examples
Tons of texts … but most are not annotated
Labeling is expensive (Cf. Penn-Treebank)
 Need to leverage indirect supervision
16
A Promising Solution:
Statistical Relational Learning



Emerging direction in machine learning
Unifies logical and statistical approaches
Principal way to leverage direct and indirect
supervision
17
Key: Joint Inference



Models complex interdependencies
Propagates information from more certain
decisions to resolve ambiguities in others
Advantages:




Better and more intuitive models
Improve predictive accuracy
Compensate for lack of training examples
SRL can have even greater impact when
direct supervision is scarce
18
Challenges in Applying
Statistical Relational Learning



Learning is much harder
Inference becomes a crucial issue
Greater complexity for user
19
Progress to Date



Probabilistic logic [Nilsson, 1986]
Statistics and beliefs [Halpern, 1990]
Knowledge-based model construction
[Wellman et al., 1992]

Stochastic logic programs [Muggleton, 1996]
Probabilistic relational models [Friedman et al., 1999]
Relational Markov networks [Taskar et al., 2002]

Etc.

This talk: Markov logic [Domingos & Lowd, 2009]


20
Markov Logic:
A Unifying Framework





Probabilistic graphical models and
first-order logic are special cases
Unified inference and learning algorithms
Easy-to-use software: Alchemy
Broad applicability
Goal of this tutorial:
Quickly learn how to use Markov logic and Alchemy
for a broad spectrum of NLP applications
21
Overview


Motivation
Foundational areas






Probabilistic inference
Statistical learning
Logical inference
Inductive logic programming
Markov logic
NLP applications



Basics
Supervised learning
Unsupervised learning
22
Markov Networks

Undirected graphical models
[Figure: network over Smoking, Cancer, Asthma, Cough]
Potential functions defined over cliques:
  $P(x) = \frac{1}{Z} \prod_c \Phi_c(x_c)$
  $Z = \sum_x \prod_c \Phi_c(x_c)$

  Smoking  Cancer  Φ(S,C)
  False    False   4.5
  False    True    4.5
  True     False   2.7
  True     True    4.5
23
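To make the potential-function definition concrete, here is a minimal Python sketch (not part of the original slides) that normalizes the Smoking-Cancer potential table above into a joint distribution; the Asthma-Cough clique is omitted for brevity.

# A minimal sketch: normalize the clique potential Phi(Smoking, Cancer)
# from the slide's table into a joint distribution P(Smoking, Cancer).
from itertools import product

phi = {  # potential values taken from the slide's table
    (False, False): 4.5,
    (False, True): 4.5,
    (True, False): 2.7,
    (True, True): 4.5,
}

Z = sum(phi.values())                     # partition function
P = {x: phi[x] / Z for x in phi}          # P(x) = Phi(x) / Z

for s, c in product([False, True], repeat=2):
    print(f"P(Smoking={s}, Cancer={c}) = {P[(s, c)]:.3f}")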
Markov Networks

Undirected graphical models
[Figure: network over Smoking, Cancer, Asthma, Cough]
Log-linear model:
  $P(x) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(x) \right)$
  $w_i$: weight of feature i;  $f_i$: feature i, e.g.
  $f_1(\mathrm{Smoking}, \mathrm{Cancer}) = 1$ if $\neg \mathrm{Smoking} \lor \mathrm{Cancer}$, 0 otherwise
  $w_1 = 1.5$
24
Markov Nets vs. Bayes Nets
Property           Markov Nets          Bayes Nets
Form               Prod. potentials     Prod. potentials
Potentials         Arbitrary            Cond. probabilities
Cycles             Allowed              Forbidden
Partition func.    Z = ?                Z = 1
Indep. check       Graph separation     D-separation
Indep. props.      Some                 Some
Inference          MCMC, BP, etc.       Convert to Markov
25
Inference in Markov Networks

Goal: compute marginals & conditionals of
  $P(X) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(X) \right)$,   $Z = \sum_X \exp\left( \sum_i w_i f_i(X) \right)$
Exact inference is #P-complete
Conditioning on the Markov blanket is easy:
  $P(x \mid MB(x)) = \frac{\exp\left( \sum_i w_i f_i(x) \right)}{\exp\left( \sum_i w_i f_i(x=0) \right) + \exp\left( \sum_i w_i f_i(x=1) \right)}$
Gibbs sampling exploits this
26
MCMC: Gibbs Sampling
state ← random truth assignment
for i ← 1 to num-samples do
for each variable x
sample x according to P(x|neighbors(x))
state ← state with new value of x
P(F) ← fraction of states in which F is true
27
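A minimal Python sketch (not from the slides) of the Gibbs sampling loop above, run on the two-variable Smoking-Cancer potential from slide 23; it estimates P(Cancer) by sampling each variable from its conditional given the other. The sample count and random seed are arbitrary.

# Gibbs sampling sketch on the Smoking-Cancer potential; estimates P(Cancer).
import random

phi = {(False, False): 4.5, (False, True): 4.5,
       (True, False): 2.7, (True, True): 4.5}

def sample_conditional(state, var):
    """Resample variable `var` (0 = Smoking, 1 = Cancer) given the other one."""
    weights = []
    for value in (False, True):
        s = list(state)
        s[var] = value
        weights.append(phi[tuple(s)])
    p_true = weights[1] / (weights[0] + weights[1])
    state[var] = random.random() < p_true

random.seed(0)
state = [random.random() < 0.5, random.random() < 0.5]  # random truth assignment
cancer_true = 0
num_samples = 10000
for _ in range(num_samples):
    for var in (0, 1):                    # resample each variable in turn
        sample_conditional(state, var)
    cancer_true += state[1]

print("P(Cancer) ≈", cancer_true / num_samples)   # exact value: 9.0 / 16.2 ≈ 0.556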
Other Inference Methods


Belief propagation (sum-product)
Mean field / Variational approximations
28
MAP/MPE Inference

Goal: Find most likely state of world given
evidence
  $\max_y P(y \mid x)$   (y: query, x: evidence)
29
MAP Inference Algorithms





Iterated conditional modes
Simulated annealing
Graph cuts
Belief propagation (max-product)
LP relaxation
30
Overview


Motivation
Foundational areas






Probabilistic inference
Statistical learning
Logical inference
Inductive logic programming
Markov logic
NLP applications



Basics
Supervised learning
Unsupervised learning
31
Generative Weight Learning



Maximize likelihood
Use gradient ascent or L-BFGS
No local maxima
  $\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]$
  $n_i(x)$: no. of times feature i is true in the data
  $E_w[n_i(x)]$: expected no. of times feature i is true according to the model

Requires inference at each step (slow!)
32
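A toy illustration (not from the slides) of the gradient above: one binary variable with a single feature f(X) = X, so the expectation E_w[n_i(x)] can be computed exactly by enumeration. The data and learning rate are hypothetical; the learned weight converges to the log odds of the feature.

# Generative weight learning by gradient ascent on a toy single-feature model.
import math

data = [1, 1, 1, 0]          # hypothetical observations of X (feature true 3/4 of the time)
w = 0.0
lr = 0.5

for step in range(200):
    # Exact expectation by enumeration: E_w[f] = e^w / (e^0 + e^w)
    expected = math.exp(w) / (1.0 + math.exp(w))
    observed = sum(data) / len(data)          # empirical count per example
    w += lr * (observed - expected)           # gradient ascent on the log-likelihood

print("learned weight:", round(w, 3))          # ≈ log(0.75 / 0.25) ≈ 1.099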
Pseudo-Likelihood
  $PL(x) = \prod_i P(x_i \mid \mathrm{neighbors}(x_i))$




Likelihood of each variable given its
neighbors in the data
Does not require inference at each step
Widely used in vision, spatial statistics, etc.
But PL parameters may not work well for
long inference chains
33
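A minimal sketch (not from the slides) of the pseudo-likelihood of one state under the Smoking-Cancer potential from slide 23: the product of each variable's conditional probability given its single neighbor.

# Pseudo-likelihood of a joint state under the pairwise potential:
# PL(x) = P(Smoking | Cancer) * P(Cancer | Smoking).
def cond(phi, var, state):
    """P(state[var] | the other variable), from the pairwise potential."""
    num = phi[tuple(state)]
    other = list(state); other[var] = not state[var]
    return num / (num + phi[tuple(other)])

phi = {(False, False): 4.5, (False, True): 4.5,
       (True, False): 2.7, (True, True): 4.5}
state = (True, True)
pl = cond(phi, 0, state) * cond(phi, 1, state)
print("PL(Smoking=True, Cancer=True) =", round(pl, 3))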
Discriminative Weight Learning

Maximize conditional likelihood of query (y) given evidence (x)
  $\frac{\partial}{\partial w_i} \log P_w(y \mid x) = n_i(x, y) - E_w[n_i(x, y)]$
  $n_i(x, y)$: no. of true groundings of clause i in the data
  $E_w[n_i(x, y)]$: expected no. of true groundings according to the model

Approximate expected counts by counts in
MAP state of y given x
34
Voted Perceptron



Originally proposed for training HMMs
discriminatively
Assumes network is linear chain
Can be generalized to arbitrary networks
wi ← 0
for t ← 1 to T do
yMAP ← Viterbi(x)
wi ← wi + η [counti(yData) – counti(yMAP)]
return Σt wi,t / T   (averaged weights)
35
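A minimal sketch (not from the slides) of the voted (averaged) perceptron loop above on a hypothetical toy structured problem: x and y are bit pairs, the feature counts are made up for illustration, and MAP inference is done by brute-force enumeration in place of Viterbi.

# Averaged structured perceptron on a toy problem with exhaustive MAP inference.
from itertools import product

def counts(x, y):
    # feature counts n_i(x, y): agreement of each y_j with x_j, plus y1 == y2
    return [int(y[0] == x[0]), int(y[1] == x[1]), int(y[0] == y[1])]

def map_y(w, x):
    # brute-force MAP over the four possible labelings
    return max(product([0, 1], repeat=2),
               key=lambda y: sum(wi * ni for wi, ni in zip(w, counts(x, y))))

data = [((0, 0), (0, 0)), ((1, 1), (1, 1)), ((0, 1), (0, 0))]   # hypothetical (x, y) pairs
w = [0.0, 0.0, 0.0]
avg = [0.0, 0.0, 0.0]
eta, T = 0.1, 50
for _ in range(T):
    for x, y in data:
        y_map = map_y(w, x)
        w = [wi + eta * (d - m)
             for wi, d, m in zip(w, counts(x, y), counts(x, y_map))]
    avg = [a + wi for a, wi in zip(avg, w)]   # accumulate for averaging

print("averaged weights:", [round(a / T, 2) for a in avg])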
Overview


Motivation
Foundational areas






Probabilistic inference
Statistical learning
Logical inference
Inductive logic programming
Markov logic
NLP applications



Basics
Supervised learning
Unsupervised learning
36
First-Order Logic





Constants, variables, functions, predicates
E.g.: Anna, x, MotherOf(x), Friends(x, y)
Literal: Predicate or its negation
Clause: Disjunction of literals
Grounding: Replace all variables by constants
E.g.: Friends (Anna, Bob)
World (model, interpretation):
Assignment of truth values to all ground
predicates
37
Inference in First-Order Logic





Traditionally done by theorem proving
(e.g.: Prolog)
Propositionalization followed by model
checking turns out to be faster (often by a lot)
Propositionalization:
Create all ground atoms and clauses
Model checking: Satisfiability testing
Two main approaches:


Backtracking (e.g.: DPLL)
Stochastic local search (e.g.: WalkSAT)
38
Satisfiability






Input: Set of clauses
(Convert KB to conjunctive normal form (CNF))
Output: Truth assignment that satisfies all clauses,
or failure
The paradigmatic NP-complete problem
Solution: Search
Key point:
Most SAT problems are actually easy
Hard region: Narrow range of
#Clauses / #Variables
39
Stochastic Local Search






Uses complete assignments instead of partial
Start with random state
Flip variables in unsatisfied clauses
Hill-climbing: Minimize # unsatisfied clauses
Avoid local minima: Random flips
Multiple restarts
40
The WalkSAT Algorithm
for i ← 1 to max-tries do
solution = random truth assignment
for j ← 1 to max-flips do
if all clauses satisfied then
return solution
c ← random unsatisfied clause
with probability p
flip a random variable in c
else
flip variable in c that maximizes # satisfied clauses
return failure
41
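A compact Python sketch (not from the slides) of the WalkSAT pseudocode above. Clauses are lists of signed integers (a common DIMACS-style convention); the three-clause formula at the bottom is a hypothetical example.

# WalkSAT sketch: stochastic local search over truth assignments.
import random

def satisfied(clause, assign):
    return any(assign[abs(l)] == (l > 0) for l in clause)

def num_satisfied(clauses, assign):
    return sum(satisfied(c, assign) for c in clauses)

def walksat(clauses, n_vars, max_tries=10, max_flips=1000, p=0.5):
    for _ in range(max_tries):
        assign = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
        for _ in range(max_flips):
            unsat = [c for c in clauses if not satisfied(c, assign)]
            if not unsat:
                return assign                  # all clauses satisfied
            c = random.choice(unsat)           # random unsatisfied clause
            if random.random() < p:
                v = abs(random.choice(c))      # random-walk move
            else:                              # greedy move
                v = max((abs(l) for l in c),
                        key=lambda v: num_satisfied(
                            clauses, {**assign, v: not assign[v]}))
            assign[v] = not assign[v]
    return None                                # failure

# (x1 v x2) ^ (!x1 v x3) ^ (!x2 v !x3)
print(walksat([[1, 2], [-1, 3], [-2, -3]], n_vars=3))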
Overview


Motivation
Foundational areas






Probabilistic inference
Statistical learning
Logical inference
Inductive logic programming
Markov logic
NLP applications



Basics
Supervised learning
Unsupervised learning
42
Rule Induction

Given: Set of positive and negative examples of some concept
  Example: (x1, x2, ..., xn, y)
  y: concept (Boolean)
  x1, x2, ..., xn: attributes (assume Boolean)
Goal: Induce a set of rules that cover all positive examples and no negative ones
  Rule: xa ^ xb ^ ... => y  (xa: literal, i.e., xi or its negation)
  Same as Horn clause: Body => Head
  Rule r covers example x iff x satisfies body of r
  Eval(r): Accuracy, info gain, coverage, support, etc.
43
Learning a Single Rule
head ← y
body ← Ø
repeat
for each literal x
rx ← r with x added to body
Eval(rx)
body ← body ^ best x
until no x improves Eval(r)
return r
44
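A minimal sketch (not from the slides) of this greedy single-rule learner on a hypothetical Boolean dataset, using accuracy on the covered examples as Eval(r).

# Greedy rule learning: grow the body one best literal at a time.
examples = [  # hypothetical (attributes, y) pairs
    ({"x1": 1, "x2": 0, "x3": 1}, 1),
    ({"x1": 1, "x2": 1, "x3": 0}, 1),
    ({"x1": 0, "x2": 1, "x3": 1}, 0),
    ({"x1": 0, "x2": 0, "x3": 0}, 0),
]
literals = [(a, v) for a in ("x1", "x2", "x3") for v in (1, 0)]  # xi or its negation

def covers(body, attrs):
    return all(attrs[a] == v for a, v in body)

def accuracy(body):                       # Eval(r)
    covered = [(attrs, y) for attrs, y in examples if covers(body, attrs)]
    if not covered:
        return 0.0
    return sum(y for _, y in covered) / len(covered)

body, best = [], accuracy([])
while True:
    cands = [body + [l] for l in literals if l[0] not in {a for a, _ in body}]
    scored = max(cands, key=accuracy)
    if accuracy(scored) <= best:          # stop when no literal improves Eval
        break
    body, best = scored, accuracy(scored)

print("learned rule:", " ^ ".join(f"{a}={v}" for a, v in body), "=> y")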
Learning a Set of Rules
R←Ø
S ← examples
repeat
learn a single rule r
R←RU{r}
S ← S − positive examples covered by r
until S = Ø
return R
45
First-Order Rule Induction





y and xi are now predicates with arguments
E.g.: y is Ancestor(x,y), xi is Parent(x,y)
Literals to add are predicates or their negations
Literal to add must include at least one variable
already appearing in rule
Adding a literal changes # groundings of rule
E.g.: Ancestor(x,z) ^ Parent(z,y) => Ancestor(x,y)
Eval(r) must take this into account
E.g.: Multiply by # positive groundings of rule
still covered after adding literal
46
Overview




Motivation
Foundational areas
Markov logic
NLP applications



Basics
Supervised learning
Unsupervised learning
47
Markov Logic




Syntax: Weighted first-order formulas
Semantics: Feature templates for Markov
networks
Intuition: Soften logical constraints
Give each formula a weight
(Higher weight => stronger constraint)
  $P(\mathrm{world}) \propto \exp\left( \sum \text{weights of formulas it satisfies} \right)$

48
Example: Coreference Resolution
Mentions of Obama are often headed by "Obama"
Mentions of Obama are often headed by "President"
Appositions usually refer to the same entity
Barack Obama, the 44th President
of the United States, is the first
African American to hold the office.
……
49
Example: Coreference Resolution
∀x MentionOf(x, Obama) => Head(x, "Obama")
∀x MentionOf(x, Obama) => Head(x, "President")
∀x,y,c Apposition(x, y) ^ MentionOf(x, c) => MentionOf(y, c)
50
Example: Coreference Resolution
1.5  ∀x MentionOf(x, Obama) => Head(x, "Obama")
0.8  ∀x MentionOf(x, Obama) => Head(x, "President")
100  ∀x,y,c Apposition(x, y) ^ MentionOf(x, c) => MentionOf(y, c)
51
Example: Coreference Resolution
1.5  ∀x MentionOf(x, Obama) => Head(x, "Obama")
0.8  ∀x MentionOf(x, Obama) => Head(x, "President")
100  ∀x,y,c Apposition(x, y) ^ MentionOf(x, c) => MentionOf(y, c)
Two mention constants: A and B
Apposition(A,B)
Head(A,“President”)
Head(B,“President”)
MentionOf(A,Obama)
MentionOf(B,Obama)
Head(A,“Obama”)
Head(B,“Obama”)
Apposition(B,A)
52
Markov Logic Networks

MLN is template for ground Markov nets

Probability of a world x:
  $P(x) = \frac{1}{Z} \exp\left( \sum_i w_i n_i(x) \right)$
  $w_i$: weight of formula i
  $n_i(x)$: no. of true groundings of formula i in x
Typed variables and constants greatly reduce size of
ground Markov net
Functions, existential quantifiers, etc.
Can handle infinite domains [Singla & Domingos, 2007]
and continuous domains [Wang & Domingos, 2008]
53
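To ground the definition, a small Python sketch (not from the slides) that enumerates all 2^8 worlds over the eight ground atoms of the coreference example (slides 51-52) and computes P(x) ∝ exp(Σ_i w_i n_i(x)). Predicate names are shortened (HeadObama, HeadPres, Appo), only the ground atoms shown on slide 52 are used, and the printed query is just illustrative.

# Exhaustive computation of the ground-MLN distribution for the toy example.
from itertools import product
import math

mentions = ["A", "B"]
atoms = ([("MentionOf", m) for m in mentions] +
         [("HeadObama", m) for m in mentions] +
         [("HeadPres", m) for m in mentions] +
         [("Appo", x, y) for x in mentions for y in mentions if x != y])

def n_counts(world):
    g = dict(zip(atoms, world))        # truth value of each ground atom
    n1 = sum((not g[("MentionOf", m)]) or g[("HeadObama", m)] for m in mentions)
    n2 = sum((not g[("MentionOf", m)]) or g[("HeadPres", m)] for m in mentions)
    n3 = sum((not (g[("Appo", x, y)] and g[("MentionOf", x)])) or g[("MentionOf", y)]
             for x in mentions for y in mentions if x != y)
    return n1, n2, n3

weights = (1.5, 0.8, 100.0)
worlds = list(product([False, True], repeat=len(atoms)))
scores = [math.exp(sum(w * n for w, n in zip(weights, n_counts(wld)))) for wld in worlds]
Z = sum(scores)

both = sum(s for wld, s in zip(worlds, scores)
           if dict(zip(atoms, wld))[("MentionOf", "A")]
           and dict(zip(atoms, wld))[("MentionOf", "B")])
print("P(MentionOf(A,Obama) ^ MentionOf(B,Obama)) =", round(both / Z, 3))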
Relation to Statistical Models

Special cases:











Markov networks
Markov random fields
Bayesian networks
Log-linear models
Exponential models
Max. entropy models
Gibbs distributions
Boltzmann machines
Logistic regression
Hidden Markov models
Conditional random fields

Obtained by making all
predicates zero-arity

Markov logic allows
objects to be
interdependent
(non-i.i.d.)
54
Relation to First-Order Logic



Infinite weights => first-order logic
Satisfiable KB, positive weights =>
satisfying assignments = modes of the distribution
Markov logic allows contradictions between
formulas
55
MLN Algorithms:
The First Three Generations
Problem               First generation       Second generation   Third generation
MAP inference         Weighted               Lazy inference      Cutting planes
                      satisfiability
Marginal inference    Gibbs sampling         MC-SAT              Lifted inference
Weight learning       Pseudo-likelihood      Voted perceptron    Scaled conj. gradient
Structure learning    Inductive logic        ILP + PL (etc.)     Clustering + pathfinding
                      programming
56
MAP/MPE Inference

Problem: Find most likely state of world
given evidence
  $\max_y P(y \mid x)$   (y: query, x: evidence)
57
MAP/MPE Inference

Problem: Find most likely state of world
given evidence
  $\max_y \frac{1}{Z_x} \exp\left( \sum_i w_i n_i(x, y) \right)$
58
MAP/MPE Inference

Problem: Find most likely state of world
given evidence
  $\max_y \sum_i w_i n_i(x, y)$
59
MAP/MPE Inference

Problem: Find most likely state of world
given evidence
  $\max_y \sum_i w_i n_i(x, y)$
This is just the weighted MaxSAT problem
Use weighted SAT solver
(e.g., MaxWalkSAT [Kautz et al., 1997] )
60
The MaxWalkSAT Algorithm
for i ← 1 to max-tries do
solution = random truth assignment
for j ← 1 to max-flips do
if Σ weights(sat. clauses) > threshold then
return solution
c ← random unsatisfied clause
with probability p
flip a random variable in c
else
flip variable in c that maximizes Σ weights(sat. clauses)
return failure, best solution found
61
Computing Probabilities




P(Formula|MLN,C) = ?
MCMC: Sample worlds, check formula holds
P(Formula1|Formula2,MLN,C) = ?
If Formula2 = Conjunction of ground atoms


First construct min subset of network necessary to
answer query (generalization of KBMC)
Then apply MCMC
62
But … Insufficient for Logic

Problem:
Deterministic dependencies break MCMC
Near-deterministic ones make it very slow

Solution:
Combine MCMC and WalkSAT
→ MC-SAT algorithm [Poon & Domingos, 2006]
63
Auxiliary-Variable Methods

Main ideas:



Use auxiliary variables to capture dependencies
Turn difficult sampling into uniform sampling
Given distribution P(x), define
  $f(x, u) = \begin{cases} 1 & \text{if } 0 \le u \le P(x) \\ 0 & \text{otherwise} \end{cases}$
  so that $\int f(x, u)\, du = P(x)$
Sample from f (x, u), then discard u
64
Slice Sampling [Damien et al. 1999]
[Figure: slice sampling under the curve P(x): given x(k), draw u(k) uniformly
from [0, P(x(k))], then draw x(k+1) uniformly from the slice {x : P(x) ≥ u(k)}]
65
Slice Sampling

Identifying the slice may be difficult
  $P(x) = \frac{1}{Z} \prod_i \Phi_i(x)$
Introduce an auxiliary variable $u_i$ for each $\Phi_i$:
  $f(x, u_1, \ldots, u_n) = \begin{cases} 1 & \text{if } 0 \le u_i \le \Phi_i(x) \text{ for all } i \\ 0 & \text{otherwise} \end{cases}$
66
The MC-SAT Algorithm

Select a random subset M of the satisfied clauses
  Each clause Ci is selected with probability 1 – exp(–wi)
  Larger wi => Ci more likely to be selected
  Hard clause (wi → ∞): always selected
Slice = states that satisfy all clauses in M
Uses SAT solver to sample x | u
Orders of magnitude faster than Gibbs sampling, etc.
67
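A minimal sketch (not from the slides) of the MC-SAT loop on a hypothetical weighted clause set: each satisfied clause is kept with probability 1 – exp(–w), and the next state is drawn uniformly from the assignments satisfying the kept clauses (here by brute-force enumeration rather than a SAT-based sampler).

# MC-SAT sketch on a toy weighted clause set (weights are hypothetical).
import math, random
from itertools import product

clauses = [([1, 2], 1.5), ([-1, 3], 0.8), ([-2, -3], 100.0)]   # (literals, weight)
n_vars = 3

def sat(clause, a):
    return any(a[abs(l) - 1] == (l > 0) for l in clause)

random.seed(0)
state = tuple(random.random() < 0.5 for _ in range(n_vars))
samples = []
for _ in range(5000):
    # keep each currently satisfied clause with probability 1 - exp(-w)
    kept = [c for c, w in clauses
            if sat(c, state) and random.random() < 1 - math.exp(-w)]
    # uniform sample from the slice: assignments satisfying all kept clauses
    candidates = [a for a in product([False, True], repeat=n_vars)
                  if all(sat(c, a) for c in kept)]
    state = random.choice(candidates)
    samples.append(state)

print("estimated P(x1) =", round(sum(s[0] for s in samples) / len(samples), 3))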
But … It Is Not Scalable




1000 researchers
Coauthor(x,y): 1 million ground atoms
Coauthor(x,y) ^ Coauthor(y,z) => Coauthor(x,z):
1 billion ground clauses
Exponential in arity
68
Sparsity to the Rescue




1000 researchers
Coauthor(x,y): 1 million ground atoms
But … most atoms are false
Coauthor(x,y) ^ Coauthor(y,z) => Coauthor(x,z):
1 billion ground clauses
Most trivially satisfied if most atoms are false
No need to explicitly compute most of them
69
Lazy Inference

LazySAT [Singla & Domingos, 2006a]




Lazy version of WalkSAT [Selman et al., 1996]
Grounds atoms/clauses as needed
Greatly reduces memory usage
The idea is much more general
[Poon & Domingos, 2008a]
70
General Method for Lazy Inference


If most variables assume the default value,
wasteful to instantiate all variables / functions
Main idea:



Allocate memory for a small subset of
“active” variables / functions
Activate more if necessary as inference proceeds
Applicable to a diverse set of algorithms:
Satisfiability solvers (systematic, local-search), Markov chain Monte
Carlo, MPE / MAP algorithms, Maximum expected utility algorithms,
Belief propagation, MC-SAT, Etc.

Reduce memory and time by orders of magnitude
71
Lifted Inference





Consider belief propagation (BP)
Often in large problems, many nodes are
interchangeable:
They send and receive the same messages
throughout BP
Basic idea: Group them into supernodes,
forming lifted network
Smaller network → Faster inference
Akin to resolution in first-order logic
72
Belief Propagation
Messages from nodes (x) to features (f):
  $\mu_{x \to f}(x) = \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x)$
Messages from features (f) to nodes (x):
  $\mu_{f \to x}(x) = \sum_{\sim \{x\}} \left( e^{w f(\mathbf{x})} \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \right)$
73
Lifted Belief Propagation
Same message equations as above, now passed in the lifted network of
supernodes (x) and superfeatures (f).
74
Lifted Belief Propagation
α, β: functions of edge counts in the lifted network
  $\mu_{x \to f}(x) = \alpha \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x)$
  $\mu_{f \to x}(x) = \sum_{\sim \{x\}} \left( e^{w f(\mathbf{x})} \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \right)$
75
Learning




Data is a relational database
Closed world assumption (if not: EM)
Learning parameters (weights)
Learning structure (formulas)
76
Parameter Learning

Parameter tying: Groundings of same clause
  $\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]$
  $n_i(x)$: no. of times clause i is true in the data
  $E_w[n_i(x)]$: expected no. of times clause i is true according to the MLN


Generative learning: Pseudo-likelihood
Discriminative learning: Conditional likelihood,
use MC-SAT or MaxWalkSAT for inference
77
Parameter Learning



Pseudo-likelihood + L-BFGS is fast and
robust but can give poor inference results
Voted perceptron:
Gradient descent + MAP inference
Scaled conjugate gradient
78
Voted Perceptron for MLNs



HMMs are special case of MLNs
Replace Viterbi by MaxWalkSAT
Network can now be arbitrary graph
wi ← 0
for t ← 1 to T do
yMAP ← MaxWalkSAT(x)
wi ← wi + η [counti(yData) – counti(yMAP)]
return Σt wi,t / T   (averaged weights)
79
Problem: Multiple Modes



Not alleviated by contrastive divergence
Alleviated by MC-SAT
Warm start: Start each MC-SAT run at
previous end state
80
Problem: Extreme Ill-Conditioning



Solvable by quasi-Newton, conjugate gradient, etc.
But line searches require exact inference
Solution: Scaled conjugate gradient
[Lowd & Domingos, 2008]



Use Hessian to choose step size
Compute quadratic form inside MC-SAT
Use inverse diagonal Hessian as preconditioner
81
Structure Learning



Standard inductive logic programming optimizes
the wrong thing
But can be used to overgenerate for L1 pruning
Our approach:
ILP + Pseudo-likelihood + Structure priors
[Kok & Domingos 2005, 2008, 2009]


For each candidate structure change:
Start from current weights & relax convergence
Use subsampling to compute sufficient statistics
82
Structure Learning




Initial state: Unit clauses or prototype KB
Operators: Add/remove literal, flip sign
Evaluation function:
Pseudo-likelihood + Structure prior
Search: Beam search, shortest-first search
83
Alchemy
Open-source software including:
 Full first-order logic syntax
 Generative & discriminative weight learning
 Structure learning
 Weighted satisfiability, MCMC, lifted BP
 Programming language features
alchemy.cs.washington.edu
84
Alchemy
                   Alchemy                    Prolog             BUGS
Representation     F.O. logic + Markov nets   Horn clauses       Bayes nets
Inference          Model checking, MCMC,      Theorem proving    MCMC
                   lifted BP
Learning           Parameters & structure     No                 Parameters
Uncertainty        Yes                        No                 Yes
Relational         Yes                        Yes                No
85
Running Alchemy

Programs:
  Infer
  Learnwts
  Learnstruct
Options
MLN file:
  Types (optional)
  Predicates
  Formulas
Database files
86
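For concreteness, a minimal sketch of what the two input files might contain, reusing the text-classification formulas from the later slides; the constants, topics, and file names are illustrative, not from the tutorial.

// topic.mln  (illustrative sketch)
// Types and constants
page = { P1, P2 }
topic = { Politics, Sports }
word = { Election, Basketball }
// Predicates
Topic(page, topic)
HasWord(page, word)
// Formulas (as in the text classification slides)
Topic(p,t)
HasWord(p,+w) => Topic(p,+t)

// train.db  (illustrative evidence database)
HasWord(P1, Election)
Topic(P1, Politics)
HasWord(P2, Basketball)
Topic(P2, Sports)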
Overview




Motivation
Foundational areas
Markov logic
NLP applications



Basics
Supervised learning
Unsupervised learning
87
Uniform Distribn.: Empty MLN
Example: Unbiased coin flips
Type:       flip = { 1, ..., 20 }
Predicate:  Heads(flip)
  $P(\mathrm{Heads}(f)) = \frac{\frac{1}{Z} e^0}{\frac{1}{Z} e^0 + \frac{1}{Z} e^0} = \frac{1}{2}$
88
Binomial Distribn.: Unit Clause
Example: Biased coin flips
Type:       flip = { 1, ..., 20 }
Predicate:  Heads(flip)
Formula:    Heads(f)
Weight:     log odds of heads, $w = \log\left( \frac{p}{1-p} \right)$
  $P(\mathrm{Heads}(f)) = \frac{\frac{1}{Z} e^w}{\frac{1}{Z} e^0 + \frac{1}{Z} e^w} = \frac{e^w}{1 + e^w} = p$
By default, MLN includes unit clauses for all predicates
(captures marginal distributions, etc.)
89
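A quick numeric check (not from the slides) of the relationship above, with a hypothetical bias p = 0.7.

# Unit-clause weight as log odds: w = log(p / (1 - p)) gives P(Heads) = p.
import math

p = 0.7                                   # hypothetical bias of the coin
w = math.log(p / (1 - p))                 # log odds of heads
print("w =", round(w, 3))
print("P(Heads) =", round(math.exp(w) / (1 + math.exp(w)), 3))   # recovers 0.7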
Multinomial Distribution
Example: Throwing die
Types:      throw = { 1, ..., 20 }
            face = { 1, ..., 6 }
Predicate:  Outcome(throw,face)
Formulas:   Outcome(t,f) ^ f != f’ => !Outcome(t,f’).
            Exist f Outcome(t,f).
Too cumbersome!
90
Multinomial Distrib.: ! Notation
Example: Throwing die
Types:      throw = { 1, ..., 20 }
            face = { 1, ..., 6 }
Predicate:  Outcome(throw,face!)
Formulas:   (none needed)
Semantics: Arguments without “!” determine arguments with “!”.
Also makes inference more efficient (triggers blocking).
91
Multinomial Distrib.: + Notation
Example: Throwing biased die
Types:      throw = { 1, ..., 20 }
            face = { 1, ..., 6 }
Predicate:  Outcome(throw,face!)
Formulas:   Outcome(t,+f)
Semantics: Learn a weight for each grounding of the args with “+”.
92
Logistic Regression (MaxEnt)
Logistic regression:
  $\log \frac{P(C=1 \mid F=f)}{P(C=0 \mid F=f)} = a + \sum_i b_i f_i$
Type:                obj = { 1, ..., n }
Query predicate:     C(obj)
Evidence predicates: Fi(obj)
Formulas (weight, formula):
  a    C(x)
  bi   Fi(x) ^ C(x)
Resulting distribution:
  $P(C=c, F=f) = \frac{1}{Z} \exp\left( a c + \sum_i b_i f_i c \right)$
Therefore:
  $\log \frac{P(C=1 \mid F=f)}{P(C=0 \mid F=f)} = \log \frac{\exp\left( a + \sum_i b_i f_i \right)}{\exp(0)} = a + \sum_i b_i f_i$
Alternative form:    Fi(x) => C(x)
93
Hidden Markov Models
obs = { Red, Green, Yellow }
state = { Stop, Drive, Slow }
time = { 0, ..., 100 }
State(state!,time)
Obs(obs!,time)
State(+s,0)
State(+s,t) ^ State(+s',t+1)
Obs(+o,t) ^ State(+s,t)
Sparse HMM:
State(s,t) => State(s1,t+1) v State(s2, t+1) v ... .
94
Bayesian Networks






Use all binary predicates with same first argument
(the object x).
One predicate for each variable A: A(x,v!)
One clause for each line in the CPT and
value of the variable
Context-specific independence:
One clause for each path in the decision tree
Logistic regression: As before
Noisy OR: Deterministic OR + Pairwise clauses
95
Relational Models

Knowledge-based model construction
  Allow only Horn clauses
  Same as Bayes nets, except arbitrary relations
  Combining function: logistic regression, noisy-OR or external
Stochastic logic programs
  Allow only Horn clauses
  Weight of clause = log(p)
  Add formulas: Head holds => Exactly one body holds
Probabilistic relational models
  Allow only binary relations
  Same as Bayes nets, except first argument can vary
96
Relational Models

Relational Markov networks
  SQL → Datalog → First-order logic
  One clause for each state of a clique
  The “+” syntax in Alchemy facilitates this
Bayesian logic
  Object = cluster of similar/related observations
  Observation constants + object constants
  Predicate InstanceOf(Obs,Obj) and clauses using it
Unknown relations: second-order Markov logic
  S. Kok & P. Domingos, “Statistical Predicate Invention”, in Proc. ICML-2007.
97
Overview




Motivation
Foundational areas
Markov logic
NLP applications



Basics
Supervised learning
Unsupervised learning
98
Text Classification
The 56th quadrennial United States presidential
election was held on November 4, 2008. Outgoing
Republican President George W. Bush's policies and
actions and the American public's desire for change
were key issues throughout the campaign. ……
Topic = politics
The Chicago Bulls are an American professional
basketball team based in Chicago, Illinois, playing in
the Central Division of the Eastern Conference in the
National Basketball Association (NBA). ……
Topic = sports
……
99
Text Classification
page = {1, ..., max}
word = { ... }
topic = { ... }
Topic(page,topic)
HasWord(page,word)
Topic(p,t)
HasWord(p,+w) => Topic(p,+t)
If topics mutually exclusive: Topic(page,topic!)
100
Text Classification
page = {1, ..., max}
word = { ... }
topic = { ... }
Topic(page,topic)
HasWord(page,word)
Links(page,page)
Topic(p,t)
HasWord(p,+w) => Topic(p,+t)
Topic(p,t) ^ Links(p,p') => Topic(p',t)
Cf. S. Chakrabarti, B. Dom & P. Indyk, “Hypertext Classification
Using Hyperlinks,” in Proc. SIGMOD-1998.
101
Entity Resolution
AUTHOR: H. POON & P. DOMINGOS
TITLE: UNSUPERVISED SEMANTIC PARSING
VENUE: EMNLP-09
AUTHOR: Hoifung Poon and Pedro Domings
TITLE: Unsupervised semantic parsing
VENUE: Proceedings of the 2009 Conference on Empirical Methods in
Natural Language Processing
SAME?
AUTHOR: Poon, Hoifung and Domings, Pedro
TITLE: Unsupervised ontology induction from text
VENUE: Proceedings of the Forty-Eighth Annual Meeting of the
Association for Computational Linguistics
SAME?
AUTHOR: H. Poon, P. Domings
TITLE: Unsupervised ontology induction
VENUE: ACL-10
102
Entity Resolution
Problem: Given database, find duplicate records
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)
HasToken(+t,+f,r) ^ HasToken(+t,+f,r’)
=> SameField(f,r,r’)
SameField(f,r,r’) => SameRecord(r,r’)
103
Entity Resolution
Problem: Given database, find duplicate records
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)
HasToken(+t,+f,r) ^ HasToken(+t,+f,r’)
=> SameField(f,r,r’)
SameField(f,r,r’) => SameRecord(r,r’)
SameRecord(r,r’) ^ SameRecord(r’,r”)
=> SameRecord(r,r”)
Cf. A. McCallum & B. Wellner, “Conditional Models of Identity Uncertainty
with Application to Noun Coreference,” in Adv. NIPS 17, 2005.
104
Entity Resolution
Can also resolve fields:
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)
HasToken(+t,+f,r) ^ HasToken(+t,+f,r’)
=> SameField(f,r,r’)
SameField(f,r,r’) <=> SameRecord(r,r’)
SameRecord(r,r’) ^ SameRecord(r’,r”)
=> SameRecord(r,r”)
SameField(f,r,r’) ^ SameField(f,r’,r”)
=> SameField(f,r,r”)
More: P. Singla & P. Domingos, “Entity Resolution with Markov
Logic”, in Proc. ICDM-2006.
105
Information Extraction
Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos.
Proceedings of the 2009 Conference on Empirical Methods in Natural
Language Processing. Singapore: ACL.
UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS.
EMNLP-2009.
106
Information Extraction
Author
Title
Venue
Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos.
Proceedings of the 2009 Conference on Empirical Methods in Natural
Language Processing. Singapore: ACL.
SAME?
UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS.
EMNLP-2009.
107
Information Extraction



Problem: Extract database from text or
semi-structured sources
Example: Extract database of publications
from citation list(s) (the “CiteSeer problem”)
Two steps:


Segmentation:
Use HMM to assign tokens to fields
Entity resolution:
Use logistic regression and transitivity
108
Information Extraction
Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ InField(i+1,+f,c)
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
109
Information Extraction
Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ !Token(“.”,i,c) ^ InField(i+1,+f,c)
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
More: H. Poon & P. Domingos, “Joint Inference in Information
Extraction”, in Proc. AAAI-2007.
110
Biomedical Text Mining

Traditionally, named entity recognition or
information extraction
E.g., protein recognition, protein-protein interaction identification

BioNLP-09 shared task: Nested bio-events



Much harder than traditional IE
Top F1 around 50%
Naturally calls for joint inference
111
Bio-Event Extraction
Involvement of p70(S6)-kinase activation in IL-10
up-regulation in human monocytes by gp41 envelope
protein of human immunodeficiency virus type 1 ...
[Figure: nested event structure. involvement: Theme = up-regulation, Cause =
activation; up-regulation: Theme = IL-10, Cause = gp41, Site = human monocyte;
activation: Theme = p70(S6)-kinase]
112
Bio-Event Extraction
Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)
Base model (logistic-regression-style rules):
Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
…
113
Bio-Event Extraction
Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)
Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
…
InArgPath(i,j,Theme) => IsProtein(j) v
(Exist k k!=i ^ InArgPath(j, k, Theme)).
…
Adding a few joint inference rules doubles the F1
More: H. Poon and L. Vanderwende, “Joint Inference for Knowledge
Extraction from Biomedical Literature”, NAACL-2010.
114
Temporal Information Extraction


Identify event times and temporal relations
(BEFORE, AFTER, OVERLAP)
E.g., who is the President of U.S.A.?



Obama: 1/20/2009 – present
G. W. Bush: 1/20/2001 – 1/19/2009
Etc.
115
Temporal Information Extraction
DepEdge(position, position, dependency)
Event(position, event)
After(event, event)
DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)
After(p,q) ^ After(q,r) => After(p,r)
116
Temporal Information Extraction
DepEdge(position, position, dependency)
Event(position, event)
After(event, event)
Role(position, position, role)
DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)
Role(i,j,ROLE-AFTER) ^ Event(i,p) ^ Event(j,q) => After(p,q)
After(p,q) ^ After(q,r) => After(p,r)
More:
K. Yoshikawa, S. Riedel, M. Asahara and Y. Matsumoto, “Jointly
Identifying Temporal Relations with Markov Logic”, in Proc. ACL-2009.
X. Ling & D. Weld, “Temporal Information Extraction”, in Proc. AAAI-2010.
117
Semantic Role Labeling


Problem: Identify arguments for a predicate
Two steps:


Argument identification:
Determine whether a phrase is an argument
Role classification:
Determine the type of an argument (agent, theme,
temporal, adjunct, etc.)
118
Semantic Role Labeling
Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)
Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)
HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)
Cf. K. Toutanova, A. Haghighi, C. Manning, “A global joint model for
semantic role labeling”, in Computational Linguistics 2008.
119
Joint Semantic Role Labeling
and Word Sense Disambiguation
Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)
Sense(position, sense!)
Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)
Sense(i,s) => IsPredicate(i)
HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)
Token(i,+t) ^ Role(i,j,+r) => Sense(i,+s)
More: I. Meza-Ruiz & S. Riedel, “Jointly Identifying Predicates,
Arguments and Senses using Markov Logic”, in Proc. NAACL-2009.
120
Practical Tips: Modeling


Add all unit clauses (the default)
How to handle uncertain data:
R(x,y) ^ R’(x,y) (the “HMM trick”)

Implications vs. conjunctions
For soft correlations, conjunctions are often better:
  An implication A => B is equivalent to !(A ^ !B)
  It shares cases with other implications like A => C
  This can make learning unnecessarily harder
121
Practical Tips: Efficiency




Open/closed world assumptions
Low clause arities
Low numbers of constants
Short inference chains
122
Practical Tips: Development




Start with easy components
Gradually expand to full task
Use the simplest MLN that works
Cycle: Add/delete formulas, learn and test
123
Overview




Motivation
Foundational areas
Markov logic
NLP applications



Basics
Supervised learning
Unsupervised learning
124
Unsupervised Learning: Why?




Virtually unlimited supply of unlabeled text
Labeling is expensive (Cf. Penn-Treebank)
Often difficult to label with consistency and
high quality (e.g., semantic parses)
Emerging field: Machine reading
Extract knowledge from unstructured text with
high precision/recall and minimal human effort
(More in tomorrow’s talk at 4PM)
125
Unsupervised Learning: How?


I.i.d. learning: Sophisticated model requires
more labeled data
Statistical relational learning: Sophisticated
model may require less labeled data



Relational dependencies constrain problem space
One formula is worth a thousand labels
Small amount of domain knowledge =>
large-scale joint inference
126
Unsupervised Learning: How?
Ambiguities vary among objects
Joint inference => Propagate information from
unambiguous objects to ambiguous ones
E.g.:  “G. W. Bush …  He …  …  Mrs. Bush …”
  Are “He” and “Mrs. Bush” coreferent?
  “He” should be coreferent with “G. W. Bush”, so it must be singular and male
  “Mrs. Bush” must be singular and female
  Verdict: not coreferent!
127–131
Parameter Learning

Marginalize out hidden variables z:
  $\frac{\partial}{\partial w_i} \log P_w(x) = E_{z \mid x}[n_i(x, z)] - E_{x, z}[n_i(x, z)]$
  First term: sum over z, conditioned on the observed x
  Second term: summed over both x and z


Use MC-SAT to approximate both expectations
May also combine with contrastive estimation
[Poon & Cherry & Toutanova, NAACL-2009]
132
Unsupervised Coreference Resolution
Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)
Mixture model
MentionOf(+m,+e)
Type(+m,+t)
Head(+m,+h) ^ MentionOf(+m,+e)
Joint inference formulas:
Enforce agreement
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
… (similarly for Number, Gender etc.)
133
Unsupervised Coreference Resolution
Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)
Apposition(mention, mention)
MentionOf(+m,+e)
Type(+m,+t)
Head(+m,+h) ^ MentionOf(+m,+e)
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
… (similarly for Number, Gender etc.)
Joint inference formulas:
Leverage apposition
Apposition(a,b) => (MentionOf(a,e) <=> MentionOf(b,e))
More: H. Poon and P. Domingos, “Joint Unsupervised Coreference
Resolution with Markov Logic”, in Proc. EMNLP-2008.
134
USP: End-to-End Machine Reading
[Poon & Domingos, EMNLP-2009, ACL-2010]



Read text, extract knowledge, answer
questions, all without any training examples
Recursively clusters expressions composed
with or by similar expressions
Compared to state of the art like TextRunner,
five-fold increase in recall, precision from
below 60% to 91%
(More in tomorrow’s talk at 4PM)
135
End of The Beginning …


Let’s not forget our grand goal:
Computers understand natural languages
Time to think about a new synthesis





Integrate previously fragmented subfields
Adopt 80/20 rule
End-to-end evaluations
Statistical relational learning offers a
promising new tool for this
Growth area of machine learning and NLP
136
Future Work: Inference

Scale up joint inference




Cutting-planes methods (e.g., [Riedel, 2008])
Unify lifted inference with sampling
Coarse-to-fine inference [Kiddon & Domingos, 2010]
Alternative technology
E.g., linear programming, Lagrangian relaxation
137
Future Work: Supervised Learning

Alternative optimization objectives
E.g., max-margin learning [Huynh & Mooney, 2009]

Learning for efficient inference
E.g., learning arithmetic circuits [Lowd & Domingos, 2008]

Structure learning:
Improve accuracy and scalability
E.g., [Kok & Domingos, 2009]
138
Future Work: Unsupervised Learning




Model: Learning objective, formalism, etc.
Learning: Local optima, intractability, etc.
Hyperparameter tuning
Leverage available resources




Semi-supervised learning
Multi-task learning
Transfer learning (e.g., domain adaptation)
Human in the loop
E.g., interactive ML, active learning, crowdsourcing
139
Future Work: NLP Applications

Existing application areas:





More joint inference opportunities
Additional domain knowledge
Combine multiple pipeline stages
A “killer app”: Machine reading
Many, many more awaiting YOU to discover
140
Summary


We need to unify logical and statistical NLP
Markov logic provides a language for this






Syntax: Weighted first-order formulas
Semantics: Feature templates of Markov nets
Inference: Satisfiability, MCMC, lifted BP, etc.
Learning: Pseudo-likelihood, VP, PSCG, ILP, etc.
Growing set of NLP applications
Open-source software: Alchemy
alchemy.cs.washington.edu

Book: Domingos & Lowd, Markov Logic,
Morgan & Claypool, 2009.
141
References
[Banko et al., 2007] Michele Banko, Michael J. Cafarella, Stephen
Soderland, Matt Broadhead, Oren Etzioni, "Open Information
Extraction From the Web", In Proc. IJCAI-2007.
[Chakrabarti et al., 1998] Soumen Chakrabarti, Byron Dom, Piotr Indyk,
"Hypertext Classification Using Hyperlinks", in Proc. SIGMOD-1998.
[Damien et al., 1999] Paul Damien, Jon Wakefield, Stephen Walker,
"Gibbs sampling for Bayesian non-conjugate and hierarchical
models by auxiliary variables", Journal of the Royal Statistical
Society B, 61:2.
[Domingos & Lowd, 2009] Pedro Domingos and Daniel Lowd, Markov
Logic, Morgan & Claypool.
[Friedman et al., 1999] Nir Friedman, Lise Getoor, Daphne Koller, Avi
Pfeffer, "Learning probabilistic relational models", in Proc. IJCAI-1999.
References
[Halpern, 1990] Joe Halpern, "An analysis of first-order logics of
probability", Artificial Intelligence 46.
[Huynh & Mooney, 2009] Tuyen Huynh and Raymond Mooney, "Max-Margin
Weight Learning for Markov Logic Networks", In Proc. ECML-2009.
[Kautz et al., 1997] Henry Kautz, Bart Selman, Yuejun Jiang, "A general
stochastic approach to solving problems with hard and soft
constraints", In The Satisfiability Problem: Theory and Applications.
AMS.
[Kok & Domingos, 2007] Stanley Kok and Pedro Domingos, "Statistical
Predicate Invention", In Proc. ICML-2007.
[Kok & Domingos, 2008] Stanley Kok and Pedro Domingos, "Extracting
Semantic Networks from Text via Relational Clustering", In Proc. ECML-2008.
References
[Kok & Domingos, 2009] Stanley Kok and Pedro Domingos, "Learning
Markov Logic Network Structure via Hypergraph Lifting", In Proc.
ICML-2009.
[Ling & Weld, 2010] Xiao Ling and Daniel S. Weld, "Temporal
Information Extraction", In Proc. AAAI-2010.
[Lowd & Domingos, 2007] Daniel Lowd and Pedro Domingos, "Efficient
Weight Learning for Markov Logic Networks", In Proc. PKDD-2007.
[Lowd & Domingos, 2008] Daniel Lowd and Pedro Domingos,
"Learning Arithmetic Circuits", In Proc. UAI-2008.
[Meza-Ruiz & Riedel, 2009] Ivan Meza-Ruiz and Sebastian Riedel,
"Jointly Identifying Predicates, Arguments and Senses using Markov
Logic", In Proc. NAACL-2009.
144
References
[Muggleton, 1996] Stephen Muggleton, "Stochastic logic programs", in
Proc. ILP-1996.
[Nilsson, 1986] Nils Nilsson, "Probabilistic logic", Artificial Intelligence
28.
[Page et al., 1998] Lawrence Page, Sergey Brin, Rajeev Motwani, Terry
Winograd, "The PageRank Citation Ranking: Bringing Order to the
Web", Tech. Rept., Stanford University, 1998.
[Poon & Domingos, 2006] Hoifung Poon and Pedro Domingos, "Sound
and Efficient Inference with Probabilistic and Deterministic
Dependencies", In Proc. AAAI-06.
[Poon & Domingos, 2007] Hoifung Poon and Pedro Domingo, "Joint
Inference in Information Extraction", In Proc. AAAI-07.
145
References
[Poon & Domingos, 2008a] Hoifung Poon, Pedro Domingos, Marc
Sumner, "A General Method for Reducing the Complexity of
Relational Inference and its Application to MCMC", In Proc. AAAI-08.
[Poon & Domingos, 2008b] Hoifung Poon and Pedro Domingos, "Joint
Unsupervised Coreference Resolution with Markov Logic", In Proc.
EMNLP-08.
[Poon & Domingos, 2009] Hoifung Poon and Pedro Domingos,
"Unsupervised Semantic Parsing", In Proc. EMNLP-09.
[Poon & Cherry & Toutanova, 2009] Hoifung Poon, Colin Cherry,
Kristina Toutanova, "Unsupervised Morphological Segmentation
with Log-Linear Models", In Proc. NAACL-2009.
146
References
[Poon & Vanderwende, 2010] Hoifung Poon and Lucy Vanderwende,
"Joint Inference for Knowledge Extraction from Biomedical
Literature", In Proc. NAACL-10.
[Poon & Domingos, 2010] Hoifung Poon and Pedro Domingos,
"Unsupervised Ontology Induction From Text", In Proc. ACL-10.
[Riedel 2008] Sebastian Riedel, "Improving the Accuracy and Efficiency
of MAP Inference for Markov Logic", In Proc. UAI-2008.
[Riedel et al., 2009] Sebastian Riedel, Hong-Woo Chun, Toshihisa
Takagi and Jun'ichi Tsujii, "A Markov Logic Approach to Bio-Molecular Event Extraction", In Proc. BioNLP 2009 Shared Task.
[Selman et al., 1996] Bart Selman, Henry Kautz, Bram Cohen, "Local
search strategies for satisfiability testing", In Cliques, Coloring, and
Satisfiability: Second DIMACS Implementation Challenge. AMS. 147
References
[Singla & Domingos, 2006a] Parag Singla and Pedro Domingos,
"Memory-Efficient Inference in Relational Domains", In Proc. AAAI2006.
[Singla & Domingos, 2006b] Parag Singla and Pedro Domingos, "Entity
Resolution with Markov Logic", In Proc. ICDM-2006.
[Singla & Domingos, 2007] Parag Singla and Pedro Domingos,
"Markov Logic in Infinite Domains", In Proc. UAI-2007.
[Singla & Domingos, 2008] Parag Singla and Pedro Domingos, "Lifted
First-Order Belief Propagation", In Proc. AAAI-2008.
[Taskar et al., 2002] Ben Taskar, Pieter Abbeel, Daphne Koller,
"Discriminative probabilistic models for relational data", in Proc. UAI2002.
148
References
[Toutanova & Haghighi & Manning, 2008] Kristina Toutanova, Aria
Haghighi, Chris Manning, "A global joint model for semantic role
labeling", Computational Linguistics.
[Wang & Domingos, 2008] Jue Wang and Pedro Domingos, "Hybrid
Markov Logic Networks", In Proc. AAAI-2008.
[Wellman et al., 1992] Michael Wellman, John S. Breese, Robert P.
Goldman, "From knowledge bases to decision models", Knowledge
Engineering Review 7.
[Yoshikawa et al., 2009] Katsumasa Yoshikawa, Sebastian Riedel,
Masayuki Asahara and Yuji Matsumoto, "Jointly Identifying
Temporal Relations with Markov Logic", In Proc. ACL-2009.
149