Unsupervised and Weakly-Supervised Probabilistic Modeling of Text
Ivan Titov

Outline
- Introduction to the topic
- Seminar plan
- Requirements and grading

What do we want to do with text?
- One of the ultimate goals of natural language processing is to teach a computer to understand text.
- Text understanding in an open domain is a very complex problem, which cannot possibly be solved with a set of handcrafted rules.
- Instead, essentially all modern approaches to natural language processing use statistical techniques.

Examples of Ambiguities
- Word sense: "... Nissan car and truck plant is located in ..." vs. "... divide life into plant and animal kingdom ..."
- Part of speech: "... (Article This) (Noun can) (Modal will) (Verb rust) ..."
- Pronoun resolution: "The dog bit the kid. He was taken to (a veterinarian | a hospital)."
- Named entities: "Tiger was in Washington for the PGA tour."

NLP Tasks
"Full" language understanding is beyond the state of the art and cannot be approached as a single task. Instead:
- Practical applications: relation extraction, question answering, text summarization, machine translation, ...
- Prediction of linguistic representations: syntactic parsing, shallow semantic parsing (semantic role labeling), discourse parsing, ...

Supervised Statistical Methods
- Annotate texts with (structured) labels and learn a model from this data.
- More formally: X is the text, Y is the label (e.g., a syntactic structure).
- Construct a parameterized model P(Y | X, W).
- Estimate W on a collection {(X_i, Y_i)}_{i=1..N} by maximum likelihood:
  W* = argmax_W prod_{i=1..N} P(Y_i | X_i, W)
- Predict the label for a new example X:
  Y* = argmax_Y P(Y | X, W*)

Supervised Statistical Models
- Most tasks in NLP are complex, and therefore large amounts of data are needed.
- Annotations are not just YES or NO, but usually complex graphs.
- Annotation is not feasible for many tasks and very expensive for others: e.g., the standard Penn Treebank Wall Street Journal dataset is only around 40,000 sentences (2 million words).
- Domain variability: models are brittle when applied out-of-domain. A question answering model learned on biological data will work badly on news data.
- There are many languages. Do we need data for every language, every domain, every task?

Unsupervised and Weakly-Supervised Models
- There is a virtually unlimited amount of unlabeled text (e.g., on the Web).

Unsupervised Models
- Do not use any kind of labeled data.
- Model jointly P(H, X | W), where H represents what is of interest for the task in question (latent semantic topics, syntactic relations, etc.).
- Estimation on an unlabeled dataset {X_i}_{i=1..N} by maximum likelihood:
  W* = argmax_W prod_{i=1..N} sum_{H_i} P(H_i, X_i | W)
  (sum over the variables you do not observe)

Example: Unsupervised Topic Segmentation
(Location) [The hotel is located on Maguire street, one block from the river. Public transport in London is straightforward, the tube station is about an 8 minute walk or you can get a bus for £1.50.]
(View) [We had a stunning view (from the floor to ceiling window) of the Tower and the Thames.]
(Rooms) [One thing we really enjoyed about this place: our huge bath tub with jacuzzi, which is so different from the usually small European hotels. Rooms are nicely decorated and very light.]
...
Useful for:
- Summarization (summarize multiple reviews along key aspects)
- Sentiment prediction (predict star ratings for each aspect)
- Visualization
- ...

Semi-Supervised Learning
- A small amount of labeled data {(X_i, Y_i)}_{i=1..N_L} and a large amount of unlabeled data {X_i}_{i=N_L+1..N_L+N_U}.
- Define a joint model P(X, Y | W).
- The model is estimated on both datasets by maximum likelihood:
  W* = argmax_W [prod_{i=1..N_L} P(Y_i, X_i | W)] × [prod_{i=N_L+1..N_L+N_U} sum_{Y_i} P(Y_i, X_i | W)]
  (sum over the unobserved variable on the unlabeled dataset)

Weakly-Supervised Learning (Web)
- Texts are not just isolated sequences of sentences; we always have additional information:
  - user-generated annotation
  - temporal relations between documents
  - links between documents
  - clusters of similar documents
  - ...
- Can we learn how to summarize, segment, and understand text using this information?
- Can we learn to translate, or port a semantic model from one language to another?
- How useful is it? Can we project annotated resources from language to language? Can we improve unsupervised / supervised models?
- This has been a hot topic in NLP recently.

Why will we consider probabilistic models?
In this class we will focus on (Bayesian) probabilistic models. Why?
- They provide a concise way to define model and approximation assumptions.
- They are like LEGO blocks: we can combine different models as building blocks to obtain a new model for the task.
- Prior knowledge can be integrated into them in a simple and consistent way.
- Missing data can easily be accounted for (just sum over the corresponding variables); we saw an example in semi-supervised learning.

Goals of the Seminar
- Understand the methodology:
  - the classes of models considered in NLP
  - approximation techniques for learning and inference (exact inference will not be tractable for most of the problems considered)
- Learn about interesting applications of these methods in NLP.
- See that we can sometimes substitute expensive annotation with a surrogate signal and obtain good results.

Plan
- Next class (April 23):
  - Introduction: topic models (PLSA, LDA)
  - Basic learning / inference techniques: EM and Gibbs sampling
  - Decide on the paper to present
- On the basis of the survey and the number of registered students, I will adjust my list of papers; it will be online on Wednesday.
- Starting from April 30: paper presentations by you.

Topics
- Modelling semantic topics of document collections
- Integrating syntax; modelling syntax and topics
- Grounded language acquisition
- Joint modelling of multiple languages
- Modelling multiple modalities
- Shallow models of semantics
- Topic segmentation models (including modelling the order of topics)
- Topic hierarchies
- Gestures and discourse
- Learning feature representations from text

Requirements
- Present a paper to the class.
- Write critical "reviews" of 3 selected papers (1.5–2 pages each).
- Write a term paper (12–15 pages) if you are taking the 7-point version.
- We will decide how long the presentations should be depending on the number of students.
- Make sure you are registered for the right "version" in HISPOS!
- Read papers and participate in the discussion.

Grades
- Class participation grade: 60%
  - your talk and the discussion after your talk
  - your participation in the discussion of other talks
  - 3 reviews (5% each)
- Term paper grade: 40%
  - Only if you are taking the 7-point version; otherwise you do not need one.

Presentation
- Present a paper in an accessible way.
- Take a critical view of the paper: discuss shortcomings, possible future work, etc.
- To give a good presentation, in most cases you may need to read one or two additional papers (e.g., those referenced in the paper).
- Links to tutorials on how to give a good presentation will be available on the class web page.
- Send me your slides 4 days before the talk, by 6 pm. If we keep the class on Friday, the deadline is Monday, 6 pm. I will give my feedback within 2 days of receiving them.
- (The first 2 presenters can send me their slides 2 days before if they prefer.)

Term Paper
- Goal: describe the paper you presented in class, plus your ideas, analysis, and comparisons (more on this later).
- It should be written in the style of a research paper; the only difference is that most of the work you present is not your own.
- Length: 12–15 pages.
- Grading criteria: clarity; paper organization; technical correctness; whether the new ideas are meaningful and interesting.
- Submit as PDF to my email.

Critical Review
- A short critical (!) essay reviewing one of the papers presented in class.
- One or two paragraphs presenting the essence of the paper.
- The other parts should highlight both the positive sides of the paper (what you liked) and its shortcomings.
- The review should be submitted before the paper is presented in class. (The exception is the additional reviews submitted for the seminars you skipped; more about this later.)
- No copy-paste from the paper.
- Length: 1.5–2 pages.

Your Ideas / Analysis
- Comparison of the methods used in the paper with other material presented in the class, or with any other related work.
- Any ideas for improving the approach.
- ...

Attendance Policy
- You can skip ONE class without any explanation.
- Otherwise, you will need to write an additional critical review (of the paper that was presented while you were absent).

Office Hours
- I would be happy to see you and discuss after the talk, 16:00–17:00 on Fridays (may change if the seminar timing changes): Office 3.22, C 7.4.
- Otherwise, send me an email and I will find a time.

Other Stuff
- Timing of the class
- Survey (Doodle poll?)
- Select a paper to present and papers to review by the next class (we will use Google Docs)
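The supervised estimation and prediction rules from the introduction (fit W* = argmax_W prod_i P(Y_i | X_i, W) on labeled data, then predict Y* = argmax_Y P(Y | X, W*)) can be made concrete with a tiny sketch. The toy corpus, the Naive Bayes factorization, and all names below are illustrative assumptions, not material from the slides; the data echoes the "plant" word-sense ambiguity example:

```python
import math
from collections import Counter

# Invented toy labeled collection {(X_i, Y_i)}: word-sense labels
# for "plant" contexts, echoing the ambiguity example above.
data = [
    ("the plant produces cars and trucks", "factory"),
    ("assembly line workers at the plant", "factory"),
    ("the plant needs water and sunlight", "botany"),
    ("leaves of the plant turned green", "botany"),
]

def train(data):
    """Maximum-likelihood estimation of a Naive Bayes model; for this
    generative model, argmax_W of the likelihood reduces to counting."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for doc, y in data:
        class_counts[y] += 1
        wc = word_counts.setdefault(y, Counter())
        for w in doc.split():
            wc[w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict(doc, class_counts, word_counts, vocab):
    """Y* = argmax_Y P(Y | X, W*), computed in log space with
    add-one smoothing for unseen words."""
    n = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for y, cy in class_counts.items():
        lp = math.log(cy / n)
        total = sum(word_counts[y].values())
        for w in doc.split():
            lp += math.log((word_counts[y][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = y, lp
    return best

params = train(data)
prediction = predict("workers at the truck plant", *params)  # -> "factory"
```

Strictly speaking, Naive Bayes maximizes the joint likelihood P(X, Y | W) rather than the conditional P(Y | X, W) in the formula above; it stands in here only because its maximum-likelihood estimate has a closed form.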
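The unsupervised objective above, W* = argmax_W prod_i sum_{H_i} P(H_i, X_i | W), is typically optimized with EM (previewed in the plan). Below is a minimal sketch for a two-topic mixture of unigrams; the toy corpus and all names are invented for illustration:

```python
import math
import random

# Invented unlabeled toy corpus {X_i}; H_i, the latent topic of
# document i, is never observed.
docs = [d.split() for d in [
    "cheap flight ticket deal",
    "flight ticket to london deal",
    "hotel room with river view",
    "room view and cheap hotel",
]]
vocab = sorted({w for d in docs for w in d})
K = 2  # number of latent topics

# Parameters W: topic priors pi[k] and per-topic word distributions theta[k][w].
random.seed(0)
pi = [1.0 / K] * K
theta = []
for _ in range(K):
    raw = {w: random.random() for w in vocab}
    z = sum(raw.values())
    theta.append({w: v / z for w, v in raw.items()})

def log_lik():
    """Marginal log-likelihood: sum_i log sum_{H_i} P(H_i, X_i | W)."""
    return sum(
        math.log(sum(pi[k] * math.prod(theta[k][w] for w in d) for k in range(K)))
        for d in docs)

before = log_lik()
for _ in range(20):  # EM iterations
    # E-step: posterior over the unobserved topic, P(H_i = k | X_i, W).
    post = []
    for d in docs:
        joint = [pi[k] * math.prod(theta[k][w] for w in d) for k in range(K)]
        z = sum(joint)
        post.append([j / z for j in joint])
    # M-step: re-estimate W from expected counts.
    for k in range(K):
        pi[k] = sum(p[k] for p in post) / len(docs)
        counts = {w: 1e-9 for w in vocab}  # tiny floor keeps probabilities nonzero
        for d, p in zip(docs, post):
            for w in d:
                counts[w] += p[k]
        z = sum(counts.values())
        theta[k] = {w: c / z for w, c in counts.items()}
after = log_lik()  # EM never decreases the marginal likelihood
```

Clamping the E-step posterior to the observed label for the labeled documents, while keeping the sum over H for the unlabeled ones, turns this into the semi-supervised objective discussed earlier.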