Finding and Using Rhetorical-Semantic Relations in Text
Sasha Blair-Goldensohn
28 April 2005
Outline
• Background
• Relations and Definitional QA
• Exploring Statistical Techniques for
Relation Finding
• Using Mined Relations For Fun and Profit
Situating This Talk
• Various levels of textual relations (a.k.a. predicates)
– Word-level, e.g. hypernym-hyponym
• WordNet catalogs many of these
– Syntactic, e.g. verb-argument
– Propositional, e.g. agent-patient
• Wide array of work on parsers for syntactic and propositional
structure can derive relations at the sentence level
– Rhetorical, e.g. cause-effect, contrast
• Work in this domain more theoretical, no “general use” parser
• This talk
– How rhetorical-type relations can be useful for a particular task
• Interaction between rhetorical and word-level relations
– Experiments in detecting and using these relations
Motivation
• Definitional Questions
– “What/Who is X?”
• Concepts / Things / Processes: Muzak, thin layer
chromatography, Hogwarts, Aum Shinrikyo, etc.
• People: Sonia Gandhi, Neil Diamond
• Exploratory manual analysis of definitions
– Some properties consistently “good” across topics
• e.g., Superordinate, Cause-Effect, Contrast
– Other “good” properties harder to generalize
• Different for a chemical procedure (applications, process
components) vs. a cult (founder, beliefs, membership)
– Templates could be useful here for certain broad categories
(people, organizations, etc.)
– … but our focus is on a system to define any term
DefScriber: A Hybrid System
– Knowledge-driven: three predicates (a.k.a. relations):
• Genus: category information (“Shiraz is a grape.”)
• Species: differentiating the subject from other category
members (“Shiraz is used to make a popular style of red
wine…”)
– Sentences containing both Genus and Species identified by
pattern
• Non-specific Definitional (NSD): relevant information
that may be impractical to classify generally (“Reds are now
in favor in Australia, but in the 1970s white wine was more
popular.”)
– NSD sentences identified (mainly) as a function of term
concentration (sketch after this slide)
– Data-driven: statistical summarization-esque
techniques to organize NSD information
• Separate core concepts from more marginal ones
• Cluster key subtopics
• Order sentences using importance and cohesion
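A minimal sketch of the term-concentration heuristic for NSD identification (hypothetical code; DefScriber's actual feature set is richer than this):

import re

def term_concentration(sentence, term):
    # Fraction of the sentence's tokens that belong to the target term.
    # Hypothetical scoring; the real system combines this with other features.
    tokens = re.findall(r"\w+", sentence.lower())
    term_tokens = set(re.findall(r"\w+", term.lower()))
    hits = sum(1 for tok in tokens if tok in term_tokens)
    return hits / len(tokens) if tokens else 0.0

# Sentences dense in the target term are candidate NSD sentences.
print(term_concentration("Shiraz is used to make a popular style of red wine.", "Shiraz"))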
Pattern-Based Relation Identification (G-S)
[Figure: an example sentence is converted to a syntax-tree pattern, which then matches a new input sentence.]
• Original Genus-Species sentence: “The Hindu Kush is the boundary between two major plates: the Indian and Eurasian.”
• Extracted partial syntax-tree pattern: S → NP [DT? TERM] VP [FormativeVb NP (Genus) PP (PREP Species)]
• A matching input sentence: “The Hajj, or Pilgrimage to Makkah (Mecca), is the central duty of Islam.”
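A toy illustration of this kind of pattern match, using a flat regular expression over tokens rather than real syntax trees (the actual system matches partial syntax-tree patterns; all names here are hypothetical):

import re

def match_genus(sentence, term):
    # Crude stand-in for the partial syntax-tree pattern
    # [NP: DT? TERM] [VP: FormativeVb [NP: Genus] [PP: PREP Species]].
    pattern = rf"(?:the\s+)?{re.escape(term)}\b[^.]*?\b(?:is|are|was|were)\s+((?:a|an|the)\s[\w\s-]+?)(?=\s(?:of|in|that|which)\b|[.,])"
    m = re.search(pattern, sentence, re.IGNORECASE)
    return m.group(1) if m else None

print(match_genus(
    "The Hajj, or Pilgrimage to Makkah (Mecca), is the central duty of Islam.", "Hajj"))
# -> "the central duty" (the genus NP; the trailing PP "of Islam" carries species info)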
Example Output (From DUC 2004)
Who is Sonia Gandhi?
Congress President Sonia Gandhi, who married into what was once India’s
most powerful political family, is the first non-Indian since independence 50
years ago to lead the Congress. After Prime Minister Rajiv Gandhi was
assassinated in 1991, Gandhi was persuaded by the Congress to succeed
her husband to continue leading the party as the chief, but she refused. The
BJP had shrugged off the influence of the 51-year-old Sonia Gandhi when
she stepped into politics early this year, dismissing her as a “foreigner.” Sonia
Gandhi is now an Indian citizen. Gandhi, who is 51, met her husband when
she was an 18-year old student at Cambridge in London, the first time she
was away from her native Italy.
• Starting with Genus and Species information gives answer context
• Word-based chaining of concepts for cohesion
• Use of pronoun rewriting (Nenkova, 2003) to clarify initial
references and make later ones more fluid
• Contrast reads well – but we were just lucky!
• Statistical analysis (data-driven techniques) creates a definition that
proceeds from more to less central topics
– Five sentences extracted from four different documents
Some Formal Evaluations
• Survey-based evaluation (2003)
– Users rated five qualitative aspects of definitions
– Showed significant improvement over query-focused
multi-document summarization
• Automatic and manual evals in DUC 2004 “Who
is X?” task
– Best results among 22 teams in automated (ROUGE)
evaluation (significantly better than 20)
– Less distinguished in manual evaluation of coverage,
responsiveness, and quality
• Little significant difference: on average, 1.1 systems better, 2 worse
• Perhaps because it is an extractive task?
Informal Observations
• DefScriber Pros
– Robust: Data-driven approaches will provide an
answer for any topic, dynamically
• Stock answer for “Why not use Google definitions?”
– Nice answers when we find a G-S sentence and we
have some coherent threads
• Cons
– Predicate coverage for G-S only
– Data-driven techniques are limited
• Similarity-based (word-overlap)
• Use data from retrieved documents only (mod IDF)
Adding Predicates
• We want to add predicates that are consistently
useful, e.g. Cause-Effect, Contrast
– The syntax-tree pattern approach achieves high precision
(~96%) but uneven recall, and requires significant
manual effort
– An initial markup study indicates these predicates are
stated in highly varied ways, and not always explicitly,
e.g.
• “Diabetes is a disease of the endocrine system.
Symptoms can include tiredness, thirst and the need to
urinate frequently.”
• Idea: A technique to determine a relation using
word pairs, even when it is not explicitly stated
Strengthening Data-driven Techniques
• We want to strengthen our techniques, because
word-based similarity can limit us in some cases,
e.g.:
• We would like to follow:
– Tachyons are a class of particles which are able to travel faster
than the speed of light.
• With:
– By extension of this terminology, particles that travel slower
than light are called tardyons, and particles, such as photons,
that travel exactly at the speed of light are called luxons.
• but the felicitousness of this combination, due to Contrast, is
missed by a similarity-based metric
• Idea: A technique that adds relations, in addition to
similarity/identity, to a cohesion metric
Choosing an Approach
• Learning relationship content, e.g. that disease causes
symptoms, or that faster contrasts with slower
– Echihabi and Marcu (2002) use cue phrases to mine large
corpora to construct a word-pair-based classifier for four
relations including Cause and Contrast and detect these
relations across clauses or sentences
– Lapata and Lascarides (2004) use a similar approach for
sentence-internal temporal relations (Before, After, During, etc.)
using word pairs and other features like verb tenses
• As opposed to learning patterns
– Snow, Jurafsky et al. (2005) use a supervised approach to learn
patterns for the hypernymy relation based on dependency-tree paths
• e.g., “X is a Y”, “X, Y and other Z”, etc.
– Some issues including usefulness for non-explicit relations and
cohesion application (more later)
The Approach
• Begin by following Echihabi and Marcu:
– Compile a small set of cue-phrases for each relation, e.g.
• Cause: [Because X, Y], [X. As a consequence, Y], etc.
• Contrast: [X. However, Y], [X even though Y], etc.
• Baseline: choose random non-contiguous sentences from a document
– Mine a large amount of (noisy) data:
• If we find a sentence “Because [x1 x2 … xn] , [y1 y2 … ym] .”
• … we note that every cross-product pair (xi, yj) was observed in a
causal setting
• So if we find: “Because [of poaching, smuggling and related
treacheries], [tigers, rhinos and civets are endangered species].”
• … our belief that the pair (poaching, endangered) indicates a causal
relationship is increased
– Construct a naïve Bayes classifier such that, for two text spans W1 and
W2, the probability of relation rk is estimated as:
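The formula itself did not survive extraction; following Echihabi and Marcu (2002), it is presumably the naïve Bayes estimate over the word-pair cross product:

P(r_k \mid W_1, W_2) \;\propto\; P(r_k) \prod_{(w_i, w_j) \in W_1 \times W_2} P\big((w_i, w_j) \mid r_k\big)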
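A minimal sketch of this mine-then-classify pipeline, assuming a toy cue set and in-memory counts in place of the MySQL-backed models (all names hypothetical):

import math
import re
from collections import defaultdict

# Toy cue set; the real system compiles several cue phrases per relation.
CAUSE_CUES = [re.compile(r"^because\s+(?P<x>.+?),\s*(?P<y>.+)$", re.IGNORECASE)]

pair_counts = {"cause": defaultdict(int), "norel": defaultdict(int)}
totals = {"cause": 0, "norel": 0}

def mine(sentence, relation="cause"):
    # Record every cross-product word pair (xi, yj) as evidence for the relation.
    for cue in CAUSE_CUES:
        m = cue.match(sentence.strip().rstrip("."))
        if not m:
            continue
        for x in m.group("x").split():
            for y in m.group("y").split():
                pair_counts[relation][(x.lower(), y.lower())] += 1
                totals[relation] += 1

def log_score(span1, span2, relation, lam=0.01, n_pair_types=64_000_000):
    # Naive Bayes log-probability over the word-pair cross product,
    # Laplace-smoothed with parameter lam; priors omitted since the
    # classes are roughly balanced (~400k examples each).
    score = 0.0
    for x in span1.lower().split():
        for y in span2.lower().split():
            c = pair_counts[relation].get((x, y), 0)
            score += math.log((c + lam) / (totals[relation] + lam * n_pair_types))
    return score

mine("Because of poaching, tigers are endangered.")
# Classify by comparing against log_score(..., "norel").
print(log_score("poaching continues", "tigers are endangered", "cause"))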
Goals
• Attain “good” accuracy
– Not essential to exceed previous numbers
since we are concerned with application
• Apply model to address DefScriber “cons”
– Make a system that can be used in an online
setting
• Consider alternative uses for model
System Design
• Corpus: AQUAINT collection (LDC) of
approximately 20M sentences of newswire text
from 1996-2000
• Mined examples of Cause and Contrast
– Approx 407k cause
– Approx 943k contrast
– Trained system on approx 400k each, and added
400k “no relation” as baseline
• “No relation” is taken as sentence pairs from the same
document which are at least 3 sentences apart
• 64M word pairs with counts stored in a MySQL database
– Efficiency concerns
Classification Task
• Given two text spans, predict the relation
between them when cue patterns are
removed
• Used 10k held out test data for each
relation type
– Baseline for binary classifier = 50%
Smoothing
• Our data is very sparse given the possible number of
word pairs (99% of possible pairs unseen in 400k norel
sentence pairs)
• Using Laplace smoothing, we estimate the probability of a
given word pair as:
P_Lap(x, y) = (C(x, y) + λ) / (N + λB)
• Where B is the number of unseen events. But with λ = 1,
94% of the probability space goes to unseen events
• We can experiment with smaller λ
– Or estimate values empirically
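Rendered as code (a sketch; argument names are assumptions):

def p_laplace(count, n, b, lam=1.0):
    # count = C(x,y); n = N total observations; b = B from the formula above.
    return (count + lam) / (n + lam * b)

# With lam = 1 and b >> n, unseen pairs soak up most of the probability
# mass, motivating the smaller lam values explored on the next slide.
print(p_laplace(0, 400_000, 64_000_000))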
Effect of λ Parameter
[Chart: Binary Classification @ 100k Training Examples. Accuracy (0.40-0.75) as a function of the Laplace parameter λ, from 1.0 down to 0.0001, for cause vs. norel and contrast vs. norel.]
Good-Turing Smoothing
• Smooths all counts based on the ratio of
frequencies of frequencies
– Gives N1/N = .08 probability to unseen events
• Depends on choice of smoothing function for
higher frequencies where we have few
examples
• In limited experiments, performed moderately
worse than Laplace (within .05)
– May improve with more data (and effort!)
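A minimal sketch of the Good-Turing idea (assuming the simple unsmoothed frequency-of-frequencies estimator; as noted above, the N_c curve itself needs smoothing at high counts):

from collections import Counter

def good_turing(pair_counts):
    # noff[c] = N_c, the number of pair types seen exactly c times.
    noff = Counter(pair_counts.values())
    n_total = sum(pair_counts.values())
    p_unseen = noff[1] / n_total  # total mass for unseen events = N1/N

    def adjusted_count(c):
        # c* = (c+1) * N_{c+1} / N_c; breaks down when N_{c+1} = 0,
        # hence the need for a smoothing function at higher frequencies.
        return (c + 1) * noff[c + 1] / noff[c] if noff[c] else 0.0

    return p_unseen, adjusted_count

p0, adj = good_turing({("fast", "slow"): 3, ("hot", "cold"): 1, ("up", "down"): 1})
print(p0, adj(1))  # adj(1) is 0.0 here because N_2 = 0: the sparsity problem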
Stemming
• Experimented with Porter Stemmer to
address sparsity
– Improves classification accuracy marginally (<
1 percent)
• However, somewhat coarse-grained for
other tasks
– Currently using unstemmed models;
lemmatization might be better
Classification Results
[Chart: Binary Classification: Unstemmed / Laplace @ 0.01. Accuracy as a function of training examples (100k, 200k, 400k, 890k, 3882k; x-axis not to scale) for cause vs. norel and contrast vs. norel, each with a reference (“ref”) line. Labeled accuracies rise with more data, from 0.69 (cause) and 0.64 (contrast) at the low end to 0.85 and 0.80 at the high end.]
Another Task: Term Suggestion
• We can also use these models to look for pairs
of words which are most strongly linked for a
given relation, e.g. Contrast
• Using a log-likelihood measure à la Dunning (sketch below)
– Null hypothesis is that for two terms w and t, the pair
(w,t) is equally likely under the Contrast model and outside it:
H0: P(w,t | ContrastModel) = P(w,t | ¬ContrastModel)
– So given a word w, we wish to suggest the term(s) t
for which H0 is most unlikely
• Issues: Evaluation and Sparsity
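A sketch of the log-likelihood ratio test in this setting (standard Dunning formulation; the count arguments are assumptions about how the models are stored):

import math

def _ll(k, n, p):
    # Binomial log likelihood, guarding the p = 0/1 edge cases.
    p = min(max(p, 1e-12), 1 - 1e-12)
    return k * math.log(p) + (n - k) * math.log(1 - p)

def dunning_g2(k_rel, n_rel, k_bg, n_bg):
    # k_rel: count of pair (w,t) in the Contrast model, out of n_rel pairs;
    # k_bg: count of (w,t) outside it, out of n_bg pairs.
    p = (k_rel + k_bg) / (n_rel + n_bg)
    p1, p2 = k_rel / n_rel, k_bg / n_bg
    return 2 * (_ll(k_rel, n_rel, p1) + _ll(k_bg, n_bg, p2)
                - _ll(k_rel, n_rel, p) - _ll(k_bg, n_bg, p))

# High G2 means H0 (equal likelihood under both models) is unlikely,
# so t is a good Contrast suggestion for w.
print(dunning_g2(50, 400_000, 5, 400_000))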
Term Suggestion: an Example
• Recall our example:
• Tachyons are a class of particles which are able to travel faster than
the speed of light.
• By extension of this terminology, particles that travel slower than
light are called tardyons, and particles, such as photons, that travel
exactly at the speed of light are called luxons.
• Contrast terms above the log-likelihood threshold:
• Speed: not, still, only, speed, average, exactly, football, slower, dial,
race, faster, isn’t, efficient, strength, toughness
• Faster: buyer, perhaps, #unk#, speed
• Class: not, restroom, island, mostly, individual, down, lost, subject,
guys, only, schools
– Non-content terms: May indicate contrast language
– Noise / context-specific suggestions
– Useful terms: some antonyms, but also pseudo-coordinates, and
often the term itself – we are interested in rhetorical relevance
more than strict relation
• Seems promising, but only anecdotal evidence here
Applying to Definitional Answers
• Several potential directions for algorithm input
from relation models
– As additional weight when selecting “next” sentence
by measuring cause/contrast-ness of pairing
• Idea: encourage causal / contrast “chains” in the definition
• Could be done as classification or with term suggestions
– Use term suggestions to boost “importance” measure
at word level
• Idea: even if a sentence doesn’t seem ideal from a cohesion
perspective, it may be important enough to insert anyway if it
has strong relation links with the cluster as a whole
– “Needle in Haystack” issue
• Which terms to use as seeds for suggestion?
Contrast Chain Weighting
Idea: Use suggested terms rather than the span
classifier, since the textual regularities of adjacent
sentences may be missing
Algorithm:
1. Extract keywords K from current sent
2. For each k in K
1. Get terms T with LogLike(Contrast(t,k)) > threshold
2. For each potential next sent S, ContrastScore(S) =
WeightedOverlap(T,S)
3. Choose best next S as a function of
ContrastScore(S) and other weights
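A sketch of this algorithm (hypothetical code; extract_keywords and suggest_contrast_terms stand in for the keyword extractor and term-suggestion model above):

def contrast_chain_scores(current_sent, candidates, extract_keywords,
                          suggest_contrast_terms, threshold=10.0):
    # 1. Extract keywords K from the current sentence.
    keywords = extract_keywords(current_sent)
    # 2. Collect contrast terms T whose log-likelihood score beats the threshold.
    contrast_terms = {}
    for k in keywords:
        for t, score in suggest_contrast_terms(k):
            if score > threshold:
                contrast_terms[t] = max(score, contrast_terms.get(t, 0.0))
    # 3. ContrastScore(S) = weighted overlap between T and each candidate S.
    def weighted_overlap(sent):
        words = set(sent.lower().split())
        return sum(w for term, w in contrast_terms.items() if term in words)
    return {s: weighted_overlap(s) for s in candidates}

# The final choice of next sentence would combine these scores with
# DefScriber's other weights (importance, cohesion).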
Applying To Definitions:
“What is bankruptcy?”
Old Answer:
• There are two types of bankruptcy Chapter 7 bankruptcy and Chapter 13 bankruptcy.
• People with insufficient assets or income could still file a Chapter 7 bankruptcy, which if approved by a judge erases debts entirely after certain assets are forfeited.
• File bankruptcy petition with the clerk of the bankruptcy courts.
• Bankruptcy spawns new restaurant Jan 25, 2005 Lansdale Reporter, According to United States Bankruptcy Court documents Memphis Magic filed for Chapter 11 bankruptcy on Oct. 29 which had voluntarily ...
• Some people file bankruptcy because of the automatic stay provision, the part of the bankruptcy code that offers legal protection against bill collectors.
New Answer:
• There are two types of bankruptcy Chapter 7 bankruptcy and Chapter 13 bankruptcy.
• When a co-signer is involved in consumer debt situations, a Chapter 13 proceeding could protect the co-signer who has not also filed for bankruptcy protection.
• People with insufficient assets or income could still file a Chapter 7 bankruptcy, which if approved by a judge erases debts entirely after certain assets are forfeited.
• Just filing the bankruptcy does not breach the mortgage; failing to make payments according to the loan agreement is a breach.
• Personal debt pushes more into bankruptcy Jan 26, 2005 Manawatu Standard, The rules that apply to personal bankruptcy are similar to those that govern company bankruptcy: the slate is wiped clean after three years.
Further Uses for Model
• For coherence/cohesion in general-purpose
summarization
• For answering causal or comparative questions
– “Why did Dow-Corning go bankrupt?”
• Filter by terms that have causal relationship with bankruptcy
– “How fast is a lion?”
• Filter by terms that are contrasted with fast
• As added weight on bootstrapped data for, e.g. opinions
– If we believe term X has strong positive orientation, and we
believe X causes/contrasts reliably with Y, we can
increase/decrease our belief about the positive orientation of Y
• As general tool for applications that can accept weaker
inferences in exchange for broad coverage
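The bootstrapping idea in the last bullets, as a tiny sketch (hypothetical names; the strength value would come from the relation model's confidence):

def propagate_orientation(orientation_x, relation, strength):
    # Cause preserves orientation; Contrast flips it.
    # orientation_x in [-1, 1]; strength in [0, 1] from the relation model.
    sign = -1.0 if relation == "contrast" else 1.0
    return sign * strength * orientation_x

# If X is positive (0.9) and contrasts strongly (0.8) with Y,
# we decrease our belief that Y is positive.
print(propagate_orientation(0.9, "contrast", 0.8))  # -> -0.72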
Alternatives
• “Couldn’t you just use WordNet?”
– Certainly complementary
– WN has issues of coverage
• Number of terms, number of relations both limited
• Much more precise, but doesn’t clearly contain things like the
“contrast” between speed and strength
– Probabilities over relations
• “What about patterns?”
– Again complementary
– Issues with explicit statement of relations
– For methods like Snow et al., need training data
Issues
• Sparsity
– More effort into smoothing (class-based methods,
principled estimation for parameter-based techniques)
– Additional data, features
• Pattern inaccuracy
– Estimated at up to 15% by Echihabi – address with
syntax-aware patterns
– e.g., “I think the bond is going to pass as it is
because it's an excellent proposal,” [she said].
– Pattern-learning can discover and rank patterns, but
most methods need training data
• Evaluation
– DUC, TREC, and others!
Wrap Up
• Building a model of certain rhetorical-semantic relations seems feasible
• Validated previous work on classification
• Exploring new avenues for applying these
models to QA, summarization, and beyond
Example Run: “What is the Hajj?”
Goal-Driven
• Use definitional predicates such as Genus and Species to search for sentences conveying typical definitional information.
• Implementation combines feature-based classification and pattern recognition over syntax trees.
Data-Driven
• Adapt techniques from summarization to maximize content importance, cohesion and coverage.
• Implementation uses lexical distance for centroid-based clustering and cohesion metrics.
[Flowchart: Document Retrieval (11 Web documents, 1127 total sentences) → Predicate Identification, yielding 383 Non-specific Definitional sentences (passed to Data-Driven Analysis for clusters and ordering information) and 9 Genus-Species sentences, e.g.:
1. The Hajj, or pilgrimage to Makkah (Mecca), is the central duty of Islam.
2. The Hajj is a milestone event in a Muslim's life.
3. The hajj is one of five pillars that make up the foundation of Islam.
4. The hajj is a week-long pilgrimage that begins in the 12th month of the Islamic lunar calendar. …
Both feed Definition Creation:]
The Hajj, or pilgrimage to Makkah
[Mecca], is the central duty of Islam.
More than two million Muslims are
expected to take the Hajj this year.
Muslims must perform the hajj at
least once in their lifetime if
physically and financially able. The
Hajj is a milestone event in a
Muslim's life. The annual hajj begins
in the twelfth month of the Islamic
year (which is lunar, not solar, so
that hajj and Ramadan fall
sometimes in summer, sometimes in
winter). The Hajj is a week-long
pilgrimage that begins in the 12th
month of the Islamic lunar calendar.
Another ceremony, which was not
connected with the rites of the Ka'ba
before the rise of Islam, is the Hajj,
the annual pilgrimage to 'Arafat,
about two miles east of Mecca,
toward Mina. The hajj is one of five
pillars that make up the foundation
of Islam.