Slides - Ivan Titov

Unsupervised and Weakly-Supervised
Probabilistic Modeling of Text
Ivan Titov
Outline

Introduction to the Topic

Seminar Plan

Requirements and Grading
What do we want to do with text?

One of the ultimate goals of natural language processing is
to teach a computer to understand text


Text understanding in an open domain is a very complex
problem that cannot possibly be solved with a set of handcrafted rules
Instead, essentially all modern approaches to natural
language processing use statistical techniques
Examples of Ambiguities
… Nissan car and truck plant is located in …
… divide life into plant and animal kingdom …
… (Article This) (Noun can) (Modal will) (Verb rust ) …
The dog bit the kid. He was taken to a (veterinarian | hospital).
Tiger was in Washington for the PGA tour
NLP Tasks

“Full” language understanding is beyond the state of the art and
cannot be approached as a single task. Instead:
 Practical Applications:


Relation extraction, question answering, text summarization,
translation, ….
Prediction of Linguistic Representations:

Syntactic parsing, shallow semantic parsing (semantic role labeling),
discourse parsing, …
Supervised Statistical Methods

Annotate texts with (structured) labels and learn a
model from this data
Supervised Statistical Methods

More formally:



X – text, Y – label (e.g., syntactic structure)
Construct a parameterized model P(Y | X, W)
Estimate W on a collection {(X_i, Y_i)}_{i=1…N}:

Maximum likelihood estimation:
\hat{W} = \arg\max_W \prod_{i=1}^{N} P(Y_i \mid X_i, W)

Predict a label for a new example X:
\hat{Y} = \arg\max_Y P(Y \mid X, \hat{W})
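As a rough illustration of the two steps above (estimate W by maximum likelihood, then predict with arg max), here is a minimal sketch. It assumes the simplest possible model, where W is just a table of conditional probabilities and the maximum likelihood estimate is the relative frequency. The function names `fit_mle` and `predict` and the toy word/tag data are hypothetical, not part of the seminar material.

```python
from collections import Counter, defaultdict

def fit_mle(pairs):
    """Maximum likelihood estimate for a table-parameterized P(Y | X, W):
    W[x][y] = count(x, y) / count(x)."""
    counts = defaultdict(Counter)
    for x, y in pairs:
        counts[x][y] += 1
    return {x: {y: c / sum(ys.values()) for y, c in ys.items()}
            for x, ys in counts.items()}

def predict(W, x):
    """Return arg max_Y P(Y | X = x, W). Raises KeyError for an unseen x."""
    return max(W[x], key=W[x].get)

# Hypothetical toy data: (word, part-of-speech tag) pairs.
train = [("can", "Modal"), ("can", "Modal"), ("can", "Noun"),
         ("rust", "Verb"), ("rust", "Noun"), ("rust", "Verb")]

W_hat = fit_mle(train)
print(W_hat["can"])
print(predict(W_hat, "can"))
```

Real NLP models condition on much richer inputs and structured labels, so the table is replaced by a parameterized function, but the estimate-then-arg-max pattern stays the same.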
Supervised Statistical Models

Most tasks in NLP are complex, and therefore large amounts of
data are needed
Annotation is not feasible for many tasks and very expensive for others
Labels are not just YES or NO, but usually complex graphs
E.g., the standard Penn Treebank Wall Street Journal dataset has around
40,000 sentences (2 mln words)
Domain variability: models are brittle when applied out-of-domain
A question answering model learned on biological data will
work badly on news data
Many languages
Need data for every language, every domain, every task?
Unsupervised and Weakly-Supervised Models


Virtually unlimited amount of unlabeled text (e.g., on the Web)
Unsupervised Models



Do not use any kind of labeled data
Model jointly P(H, X | W), where H represents the structure of interest for the task in
question (latent semantic topics, syntactic relations, etc.)
Estimation on an unlabeled dataset {X_i}_{i=1…N}:

Maximum likelihood estimation:
\hat{W} = \arg\max_W \prod_{i=1}^{N} \sum_{H_i} P(H_i, X_i \mid W)
(sum over the variable you do not observe)
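The objective above can be made concrete with a small sketch. It assumes a mixture-of-unigrams model, where H is a single topic per document and W = (pi, theta), and it uses EM as one standard way to climb the marginal likelihood. The slide itself only states the objective, so the model choice, the function names, and the toy documents are all assumptions made for illustration.

```python
import math

def joint(h, doc, pi, theta):
    """P(H = h, X = doc | W) for a mixture of unigrams, W = (pi, theta)."""
    p = pi[h]
    for w in doc:
        p *= theta[h][w]
    return p

def log_likelihood(docs, pi, theta):
    """sum_i log sum_h P(h, X_i | W): the unobserved H is summed out."""
    return sum(math.log(sum(joint(h, d, pi, theta) for h in range(len(pi))))
               for d in docs)

def em_step(docs, pi, theta):
    """One EM iteration: soft-assign documents to topics, then re-estimate."""
    K = len(pi)
    new_pi = [0.0] * K
    counts = [{w: 0.0 for w in theta[h]} for h in range(K)]
    for d in docs:
        p = [joint(h, d, pi, theta) for h in range(K)]
        z = sum(p)
        for h in range(K):
            r = p[h] / z                  # E-step: P(H = h | X = d, W)
            new_pi[h] += r / len(docs)    # M-step: expected topic frequency
            for w in d:
                counts[h][w] += r         # M-step: expected word counts
    new_theta = [{w: c / sum(cs.values()) for w, c in cs.items()}
                 for cs in counts]
    return new_pi, new_theta

# Hypothetical toy corpus and a slightly asymmetric starting point.
docs = [["room", "bath", "room"], ["river", "street", "river"],
        ["bath", "room"], ["street", "river"]]
pi = [0.5, 0.5]
theta = [{"bath": 0.3, "river": 0.2, "room": 0.3, "street": 0.2},
         {"bath": 0.2, "river": 0.3, "room": 0.2, "street": 0.3}]

ll_before = log_likelihood(docs, pi, theta)
for _ in range(10):
    pi, theta = em_step(docs, pi, theta)
ll_after = log_likelihood(docs, pi, theta)
print(ll_before, ll_after)
```

EM only finds a local maximum, so the result depends on the starting point. With a perfectly symmetric initialization the two topics would never separate.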
Example: Unsupervised Topic Segmentation
Location: [The hotel is located on Maguire street, one block from the river. Public
transport in London is straightforward, the tube station is about an 8
minute walk or you can get a bus for £1.50.]
View: [We had a stunning view (from the floor to ceiling window) of the Tower and the Thames.]
Rooms: [One thing we really enjoyed about this place – our huge bath tub with
jacuzzi, this is so different from usually small European hotels. Rooms
are nicely decorated and very light.] ...

Useful for:




Summarization (summarize multiple reviews along key aspects)
Sentiment prediction (predict star ratings for each aspect)
Visualization
....
Semi-Supervised Learning




Small amount of labeled data {(X_i, Y_i)}_{i=1…N_L}
Large amount of unlabeled data {X_i}_{i=N_L+1…N_L+N_U}
Define a joint model P(X, Y | W)
Model estimated on both datasets:

Maximum likelihood estimation:
\hat{W} = \arg\max_W \prod_{i=1}^{N_L} P(Y_i, X_i \mid W) \prod_{i=N_L+1}^{N_L+N_U} \sum_{Y_i} P(Y_i, X_i \mid W)
(sum over the unobserved variable on the unlabeled dataset)
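To show how the two products in this objective combine in practice, here is a minimal sketch. It assumes a naive Bayes joint model P(Y, X | W) with W = (prior, word) and uses EM for the estimation. Both are illustrative assumptions rather than the method prescribed in the seminar, and the labels, documents, and function names are hypothetical.

```python
import math

def joint(y, doc, prior, word):
    """P(Y = y, X = doc | W) for a naive Bayes model, W = (prior, word)."""
    p = prior[y]
    for w in doc:
        p *= word[y][w]
    return p

def log_likelihood(labeled, unlabeled, prior, word):
    """Labeled term uses the observed Y; unlabeled term sums Y out."""
    ll = sum(math.log(joint(y, doc, prior, word)) for doc, y in labeled)
    ll += sum(math.log(sum(joint(y, doc, prior, word) for y in prior))
              for doc in unlabeled)
    return ll

def em_step(labeled, unlabeled, prior, word):
    """Labeled documents give hard counts, unlabeled ones give counts
    weighted by the posterior P(Y | X, W)."""
    weighted = [(doc, {y: 1.0 if y == gold else 0.0 for y in prior})
                for doc, gold in labeled]
    for doc in unlabeled:
        p = {y: joint(y, doc, prior, word) for y in prior}
        z = sum(p.values())
        weighted.append((doc, {y: v / z for y, v in p.items()}))
    class_counts = {y: 0.0 for y in prior}
    word_counts = {y: {w: 0.0 for w in word[y]} for y in prior}
    for doc, resp in weighted:
        for y, r in resp.items():
            class_counts[y] += r
            for w in doc:
                word_counts[y][w] += r
    new_prior = {y: c / len(weighted) for y, c in class_counts.items()}
    new_word = {y: {w: c / sum(ws.values()) for w, c in ws.items()}
                for y, ws in word_counts.items()}
    return new_prior, new_word

# Hypothetical toy data: two labeled and three unlabeled documents.
labeled = [(["river", "street"], "Location"), (["bath", "room"], "Rooms")]
unlabeled = [["street", "river", "river"], ["room", "bath", "room"],
             ["room", "room"]]
vocab = ["bath", "river", "room", "street"]
prior = {"Location": 0.5, "Rooms": 0.5}
word = {y: {w: 0.25 for w in vocab} for y in prior}

ll_before = log_likelihood(labeled, unlabeled, prior, word)
for _ in range(10):
    prior, word = em_step(labeled, unlabeled, prior, word)
ll_after = log_likelihood(labeled, unlabeled, prior, word)
print(ll_before, ll_after)
```

Here the labeled documents break the symmetry between the classes, so a uniform starting point is enough. The unlabeled documents then sharpen the word distributions.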
Weakly-Supervised Learning (Web)


Texts are not just isolated sequences of sentences
We always have additional information

User-generated annotation
Can we learn how to
summarize, segment, and
understand using this
information?
Weakly-Supervised Learning (Web)


Texts are not just isolated sequences of sentences
We always have additional annotation

Temporal Relations between documents
Can we learn to translate, or
port a semantic model from one
language to another?
Weakly-Supervised Learning (Web)


Texts are not just isolated sequences of sentences
We always have additional annotation







User-Generated annotation
Temporal Relations between documents
Links between documents
Clusters of similar documents
.......
How useful is it?

Can we project annotated resources from language to language?

Can we improve unsupervised / supervised models?
Hot topic in NLP recently
Why will we consider probabilistic models?


In the class we will focus on (Bayesian) probabilistic models
Why?




They provide a concise way to define a model and its approximation assumptions
They are like LEGO blocks: we can combine different models as building
blocks to obtain a new model for the task
Prior knowledge can be integrated in them in a simple and consistent way
Missing data can be easily accounted for (just sum over the corresponding
variable)

We saw an example in semi-supervised learning
Goals of the seminar

Understand the methodology:


Classes of models considered in NLP
Approximation techniques for learning and inference



(Exact inference will not be tractable for most of the considered problems)
Learn interesting applications of the methods in NLP
See that sometimes we can substitute expensive annotation with a
surrogate signal and obtain good results
Plan

Next class (April 23):


Introduction:

Topic models (PLSA, LDA)

Basic learning / inference techniques: EM and Gibbs sampling
Decide on the paper to present


On the basis of the survey and the number of registered students, I will adjust my list and it
will be online on Wednesday
Starting from April 30: paper presentations by you
Topics

Modelling semantic topics of data collections:



Integrating syntax



Grounded language acquisition
Joint modelling of multiple languages
Modelling multiple modes:


Modeling syntax and topics
Shallow models of semantics


Topic segmentation models (including modelling order of topics)
Topic hierarchies
Gestures and Discourse
Learning feature representations from text
Requirements

Present a paper to the class
We will see how long the presentations should be depending on the number of
students
Write 3 critical “reviews” of 3 selected papers (1.5 - 2 pages each)
A term paper (12-15 pages) for those getting 7 points
Make sure you are registered for the right “version” in HISPOS!
Read papers and participate in discussion
Grades

Class participation grade: 60 %




Your talk and the discussion after your talk
Your participation in discussion of other talks
3 reviews (5 % each)
Term paper grade: 40 %


Only if you get 7 points, otherwise you do not need one
Presentation





Present a paper in an accessible way
Have a critical view on the paper: discuss shortcomings, possible future
work, etc
To give a good presentation in most of the cases you may need to read
one or two additional papers (e.g., those referenced in the paper)
Links to the tutorials on how to make a good presentation will be
available on the class web-page
Send me your slides 4 days before the talk by 6 pm
If we keep the class on Friday, the deadline is Monday, 6 pm
I will give my feedback within 2 days of receiving them
(The first 2 presenters can send me slides 2 days before if they prefer)

Term paper

Goal



Describe the paper you presented in class
Your ideas, analysis, comparison (more later)
It should be written in the style of a research paper; the only difference is that in this
paper most of the work you present is not your own

Length: 12 – 15 pages

Grading criteria





Clarity
Paper organization
Technical correctness
New ideas are meaningful and interesting
Submitted in PDF to my email
Critical review


A short critical (!) essay reviewing one of the papers presented in class

One or two paragraphs presenting the essence of the paper

Other parts highlighting both the positive sides of the paper (what you like) and its shortcomings
The review should be submitted before its presentation in class

(The exception is the additional reviews submitted for the seminars you
skipped; more on this later)

No copy-paste from the paper

Length: 1.5 – 2 pages
Your ideas / analysis



Comparison of the methods used in the paper with other material
presented in the class or any other related work
Any ideas on improvement of the approach
....
Attendance policy


You can skip ONE class without any explanation
Otherwise, you will need to write an additional critical review (for the
paper which was presented while you were absent)
Office Hours

I would be happy to see you for discussion after the talk, from 16:00 –
17:00 on Fridays (may change if the seminar timing changes):


Office 3.22, C 7.4
Otherwise, send me an email and I will find a time
Other stuff



Timing of the class
Survey (Doodle poll?)
Select a paper to present and papers to review by the next class
(we will use Google docs)