Information extraction

Lecture 8: Information Extraction from unstructured sources

[Figure: course roadmap listing Semantic IE, Reasoning, Fact Extraction (you are here), Is-A Extraction, Entity Disambiguation, Entity Recognition, Source Selection and Preparation]

Overview
• Fact extraction from text
• Difficulties
• POS Tagging
• Dependency grammars

Reminder: Fact Extraction
Fact extraction is the extraction of facts about entities from a corpus.
Example: “Coyote chases Roadrunner.” yields chases(Coyote, Roadrunner).

Def: Extraction Pattern
An extraction pattern for a binary relation r is a string that contains two placeholders X and Y, indicating that two entities stand in relation r.
Extraction patterns for chases(X,Y):
• “X chases Y”
• “X runs after Y”
• “X’s chase for Y”
• “Y is chased by X”

Task: Extraction Pattern
Define extraction patterns for
• wasBornInCity(X,Y)
• isDating(X,Y)

Def: Extraction Pattern Match
A match for an extraction pattern p in a corpus c is a substring s of c such that there exists a substitution σ from X and Y to words such that σ(p) = s.
Example: for the pattern “X’s chase for Y” and the corpus “Coyote’s chase for RR rarely succeeds.”, the substitution σ = {X → Coyote, Y → RR} gives the match “Coyote’s chase for RR”.

Def: Pattern Application
Given a pattern p for a relation r, and a corpus, pattern application produces, for every match of p with substitution σ, the fact r(σ(X), σ(Y)).
Example: “X’s chase for Y” is a pattern for chases(). With σ = {X → Coyote, Y → RR}, the sentence “Coyote’s chase for RR rarely succeeds.” yields chases(Coyote, RR).
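Pattern matching and pattern application as defined above can be sketched with regular expressions. This is a minimal illustration, not a method prescribed by the lecture: the helper names compile_pattern and apply_pattern are hypothetical, placeholders are assumed to bind to single words, and each placeholder is assumed to occur exactly once in a pattern.

```python
import re

def compile_pattern(pattern):
    """Turn an extraction pattern such as "X's chase for Y" into a regex.

    Assumption: X and Y each occur exactly once and bind to one word.
    """
    regex = ""
    # Split on the standalone placeholders; the capturing group keeps them.
    for part in re.split(r"\b([XY])\b", pattern):
        if part in ("X", "Y"):
            regex += rf"(?P<{part}>\w+)"   # the substitution for X or Y
        else:
            regex += re.escape(part)       # literal text of the pattern
    return re.compile(regex)

def apply_pattern(relation, pattern, corpus):
    """Produce the fact relation(sigma(X), sigma(Y)) for every match."""
    regex = compile_pattern(pattern)
    return [(relation, m.group("X"), m.group("Y"))
            for m in regex.finditer(corpus)]

facts = apply_pattern("chases", "X's chase for Y",
                      "Coyote's chase for RR rarely succeeds.")
print(facts)
```

The sketch returns the raw words bound to X and Y; it performs no entity disambiguation.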
Disambiguation is required!
Applying the pattern to “Coyote’s chase for RR rarely succeeds.” yields chases(Coyote, RR), but the words “Coyote” and “RR” still have to be mapped to entities. Strictly speaking, we produce the fact r(d(σ(X)), d(σ(Y))), where d is a disambiguation function.

Where do we get the patterns?
• Option 1: Manually compile patterns.
• Option 2: Manually annotate patterns in text. Example: annotating “While Coyote chases Roadrunner, he experiences certain inconveniences.” yields “X chases Y”.
• Option 3: Pattern deduction. Retrieve patterns with the help of known facts. Example: from the known fact chases(Obama, Osama) and the sentence “Obama chases Osama since 2009.”, we obtain the pattern “X chases Y” for chases(X,Y).

Def: Pattern Deduction
Given a corpus, and given a KB, pattern deduction is the process of finding extraction patterns that produce facts of the KB when applied to the corpus.
Example: KB with chases(Obama, Osama) + corpus “Obama chases Osama.” => “X chases Y” is a pattern for chases(X,Y)

Def: Pattern iteration/DIPRE
Pattern iteration (also: DIPRE) is the process of repeatedly applying pattern deduction and pattern application, thus continuously augmenting the KB.
Known facts => Patterns => New facts => Patterns => ...

Example: DIPRE
Corpus: Obama chases Osama. Coyote runs after RR. Coyote chases RR.
KB: chases(Obama, Osama)
• Pattern deduction: “Obama chases Osama.” gives the pattern “X chases Y”.
• Pattern application: “X chases Y” matches “Coyote chases RR.” and adds the new fact chases(Coyote, RR).
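The alternation of pattern deduction and pattern application can be sketched as a small loop over the example corpus. This is a simplified illustration and not the original DIPRE system: the function names are hypothetical, facts are matched by plain word lookup without disambiguation, and a pattern is assumed to consist of the text between X and Y with X occurring first.

```python
import re

def deduce_patterns(kb, corpus):
    """Pattern deduction: keep the text between the two arguments of a known fact."""
    patterns = set()
    for relation, x, y in kb:
        for sentence in corpus:
            m = re.search(rf"\b{re.escape(x)}\b(.+?)\b{re.escape(y)}\b", sentence)
            if m:
                patterns.add((relation, f"X{m.group(1)}Y"))
    return patterns

def apply_patterns(patterns, corpus):
    """Pattern application: every match of a pattern produces a fact."""
    facts = set()
    for relation, pattern in patterns:
        middle = re.escape(pattern[1:-1])          # text between X and Y
        regex = re.compile(rf"(\w+){middle}(\w+)")
        for sentence in corpus:
            for m in regex.finditer(sentence):
                facts.add((relation, m.group(1), m.group(2)))
    return facts

def dipre(kb, corpus, max_rounds=10):
    """Repeat deduction and application until no new fact is found."""
    kb, patterns = set(kb), set()
    for _ in range(max_rounds):
        patterns |= deduce_patterns(kb, corpus)
        new_facts = apply_patterns(patterns, corpus) - kb
        if not new_facts:
            break
        kb |= new_facts
    return kb, patterns

corpus = ["Obama chases Osama.", "Coyote runs after RR.", "Coyote chases RR."]
final_kb, final_patterns = dipre({("chases", "Obama", "Osama")}, corpus)
print(sorted(final_kb))
print(sorted(final_patterns))
```

On this corpus the loop stops after the second round, once “X runs after Y” has been deduced and produces no further facts.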
Example: DIPRE (next iteration)
With the new fact chases(Coyote, RR) in the KB, pattern deduction on “Coyote runs after RR.” gives the further pattern “X runs after Y”, and so on.

Task: DIPRE
Given this KB, apply DIPRE to this corpus:
[Figure: KB containing a wantsToEat fact]
• Obama likes Michelle.
• Coyote wants to eat Roadrunner.
• Harry wants to eat cornflakes.
• Harry likes cornflakes for breakfast.

We use labels to find patterns
KB: chases(Obama, Osama), where the two entities carry the labels “Obama” and “Osama”
Corpus: Obama chases Osama.
=> “X chases Y” is a pattern for chases(X,Y)

Ambiguity is a problem
[Figure: KB in which the labels “Obama” and “Osama” are ambiguous; depending on which entities the labels denote, the entities are linked by chases or by loves]
Corpus: Obama chases Osama.
=> “X chases Y” is a pattern for loves(X,Y) or for chases(X,Y)?

Disambiguation helps
Once the words in the corpus are mapped to the right entities of the KB, only the chases link remains.
Corpus: Obama chases Osama.
=> “X chases Y” is a pattern for chases(X,Y)

Phrase structure can be a problem
KB: chases(Obama, Osama), with the labels “Obama” and “Osama”
Corpus: Obama hat Osama gejagt. (German: “Obama has chased Osama.”)
=> “X hat Y” is a pattern for chases(X,Y)?

Multiple links are a problem
KB: chases(Obama, Osama) and shot(Obama, Osama), with the labels “Obama” and “Osama”
Corpus: Obama chases Osama.
=> “X chases Y” is a pattern for chases(X,Y) or for shot(X,Y)?

Confidence of a pattern
The confidence of an extraction pattern is the number of matches that produce known facts divided by the total number of matches.
• Pattern produces mostly new facts => risky
• Pattern produces mostly known facts => safe

Simple word match is not enough
Coyote invents a wonderful machine. + “X invents a Y” = invents(Coyote, wonderful)

Patterns may be too specific
Consider again the sentence: Coyote invents a wonderful machine.
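The confidence measure defined above is a simple ratio and can be sketched as follows. The two-sentence corpus, the known fact invents(Coyote, catapult), the function name confidence, and the choice to return 0.0 for a pattern without matches are all assumptions made for this illustration.

```python
import re

def confidence(pattern_regex, corpus, known_pairs):
    """Matches that produce known facts divided by the total number of matches."""
    matches = [m.groups() for sentence in corpus
               for m in re.finditer(pattern_regex, sentence)]
    if not matches:
        return 0.0  # assumption: a pattern without matches gets confidence 0
    known = sum(1 for pair in matches if pair in known_pairs)
    return known / len(matches)

# Hypothetical data: one known fact and two sentences.
known_pairs = {("Coyote", "catapult")}
corpus = ["Coyote invents a catapult.", "Coyote invents a wonderful machine."]

# "X invents a Y" with word-level placeholders.
c = confidence(r"(\w+) invents a (\w+)", corpus, known_pairs)
print(c)
```

The second sentence binds Y to “wonderful”, which is not a known fact, so only one of the two matches counts as known.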
+ “X invents a great Y” = no match for “Coyote invents a wonderful machine.”, so invents(Coyote, machine) is not extracted

Def: POS
A Part-of-Speech (also: POS, POS-tag, word class, lexical class, lexical category) is a set of words with the same grammatical role.
• Proper nouns (PN): Wile, Elvis, Obama, ...
• Nouns (N): desert, sand, ...
• Adjectives (ADJ): fast, funny, ...
• Adverbs (ADV): fast, well, ...
• Verbs (V): run, jump, dance, ...

Def: POS (2)
• Pronouns (PRON): he, she, it, this, ... (what can replace a noun)
• Determiners (DET): the, a, these, your, ... (what goes before a noun)
• Prepositions (PREP): in, with, on, ... (what goes before determiner + noun)
• Subordinators (SUB): who, whose, that, which, because, ... (what introduces a sub-sentence)

Def: POS-Tagging
POS-Tagging is the process of assigning to each word in a sentence its POS.
Example: Wile/PN tries/V a/DET new/ADJ machine/N.

Task: POS-Tagging
POS-Tag the following sentence: The coyote falls into a canyon.
Use the tags N, V, PN, ADJ, ADV, PRON, DET, PREP, SUB as defined above.

Probabilistic POS-Tagging
Introduce random variables for words and for POS-tags: the words $X_1, X_2, \ldots$ are visible, the tags $Y_1, Y_2, \ldots$ are hidden.
[Figure: possible worlds over the visible words $X_1$ = Wile, $X_2$ = runs and the hidden tags $Y_1$, $Y_2$, e.g. (PN, V), (ADJ, V), (PN, PREP), with probabilities such as $P(w_1) = 0.7$, $P(w_2) = 0.1$, $P(w_3) = 0.1$]

Independence Assumptions
Every tag depends just on its predecessor:
$P(Y_i \mid Y_1, \ldots, Y_{i-1}) = P(Y_i \mid Y_{i-1})$
Every word depends just on its tag:
$P(X_i \mid X_1, \ldots, X_{i-1}, Y_1, \ldots, Y_i) = P(X_i \mid Y_i)$
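Why the two independence assumptions are convenient can be seen from a short derivation. The sketch below applies the chain rule to the joint probability of words and tags (in the order $y_1, x_1, y_2, x_2, \ldots$) and then substitutes the two assumptions. It additionally assumes that a tag does not depend on earlier words once its predecessor tag is given, and it writes $y_0$ for a fixed start tag; both are conventions introduced here for illustration.

```latex
\begin{align*}
P(x_1,\ldots,x_n,y_1,\ldots,y_n)
  &= \prod_{i=1}^{n} P(y_i \mid x_1,\ldots,x_{i-1},y_1,\ldots,y_{i-1})
     \; P(x_i \mid x_1,\ldots,x_{i-1},y_1,\ldots,y_i)
     && \text{(chain rule)} \\
  &= \prod_{i=1}^{n} P(y_i \mid y_{i-1}) \; P(x_i \mid y_i)
     && \text{(independence assumptions)}
\end{align*}
```

Under these assumptions, the joint probability needs only one tag-to-tag factor and one tag-to-word factor per position.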
Homogeneity Assumption 1
$P(Y_i \mid Y_1, \ldots, Y_{i-1}) = P(Y_i \mid Y_{i-1}) = P(Y_k \mid Y_{k-1}) \quad \forall i, k$
Example: The probability that a verb comes after a noun is the same at position 3 and at position 9:
$P(Y_3 = \text{Verb} \mid Y_2 = \text{Noun}) = P(Y_9 = \text{Verb} \mid Y_8 = \text{Noun})$
Let’s denote this probability by $P(\text{Verb} \mid \text{Noun})$.
“Transition probability”: $P(v \mid w) := P(Y_i = v \mid Y_{i-1} = w)$ (for any i)

Homogeneity Assumption 2
$P(X_i \mid X_1, \ldots, X_{i-1}, Y_1, \ldots, Y_i) = P(X_i \mid Y_i) = P(X_k \mid Y_k) \quad \forall i, k$
Example: The probability that a PN is “Wile” is the same at position 3 and 9:
$P(X_3 = \text{Wile} \mid Y_3 = \text{PN}) = P(X_9 = \text{Wile} \mid Y_9 = \text{PN})$
Let’s denote this probability by $P(\text{Wile} \mid \text{PN})$.
“Emission probability”: $P(v \mid w) := P(X_i = v \mid Y_i = w)$ (for any i, word v)

Def: HMM
A (homogeneous) Hidden Markov Model (also: HMM) is a sequence of random variables $X_1, \ldots, X_n, Y_1, \ldots, Y_n$ such that
$P(\langle x_1, \ldots, x_n, y_1, \ldots, y_n \rangle) = \prod_i P(x_i \mid y_i) \times P(y_i \mid y_{i-1})$
where $y_0$ is fixed as $y_0 := \text{Start}$.
• $x_1, \ldots, x_n$ are the words of the sentence, $y_1, \ldots, y_n$ are the POS-tags.
• $P(x_i \mid y_i)$ are the emission probabilities, $P(y_i \mid y_{i-1})$ are the transition probabilities.
• We use random variables $X_1, \ldots, X_n, Y_1, \ldots, Y_n$ that span the possible worlds.

HMMs as graphs
$P(\langle x_1, \ldots, x_n, y_1, \ldots, y_n \rangle) = \prod_i P(x_i \mid y_i) \times P(y_i \mid y_{i-1})$
[Figure: graph with the tag nodes Start, PN, DET, N, V, END. Edges between tags carry transition probabilities: P(PN|Start), P(DET|Start), P(V|PN), P(N|DET), P(V|N), P(END|V). Edges from tags to words carry emission probabilities such as P(the|DET), P(a|DET), P(coyote|N), P(bird|N). The words are Wile, the, a, RoadRunner, coyote, bird, eats, sleeps, and “.”]

Example: HMMs as graphs
$P(\langle \text{the}, \text{bird}, \text{sleeps}, ., \text{DET}, \text{N}, \text{V}, \text{END} \rangle) = 0.2 \times 0.5 \times 1 \times 0.7 \times 1 \times 0.9 \times 1 \times 1 = 0.063$
Each pair of factors is a transition probability followed by an emission probability: P(DET|Start) × P(the|DET), P(N|DET) × P(bird|N), P(V|N) × P(sleeps|V), P(END|V) × P(.|END).
[Figure: the HMM graph with numeric edge weights]

Task: HMMs
Write the product of values for
• $P(\langle \text{Wile}, \text{sleeps}, ., \text{PN}, \text{V}, \text{END} \rangle)$
• $P(\langle \text{a}, \text{coyote}, \text{eats}, ., \text{DET}, \text{N}, \text{V}, \text{END} \rangle)$

All other probabilities are 0! For example:
$P(\langle \text{Wile}, \text{bird}, \text{PN}, \text{V} \rangle) = 0$
$P(\langle \text{Wile}, \text{sleeps}, \text{V}, \text{V} \rangle) = 0$
The graph is just one way to visualize the emission and transition probabilities!
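The product formula of the HMM can be sketched in a few lines. The tables below contain only the transition and emission values that the worked example “the bird sleeps .” uses; the remaining edge weights of the graph are left out, and every missing entry is treated as probability 0. The function name joint_probability is hypothetical.

```python
import math

# Only the values used by the worked example; missing entries count as 0.
transition = {("Start", "DET"): 0.2, ("DET", "N"): 1.0,
              ("N", "V"): 1.0, ("V", "END"): 1.0}
emission = {("DET", "the"): 0.5, ("N", "bird"): 0.7,
            ("V", "sleeps"): 0.9, ("END", "."): 1.0}

def joint_probability(words, tags):
    """Product over i of P(x_i | y_i) * P(y_i | y_{i-1}), with y_0 = Start."""
    prob = 1.0
    previous = "Start"
    for word, tag in zip(words, tags):
        prob *= transition.get((previous, tag), 0.0) * emission.get((tag, word), 0.0)
        previous = tag
    return prob

p = joint_probability(["the", "bird", "sleeps", "."], ["DET", "N", "V", "END"])
print(round(p, 3))

# A tag sequence that uses an edge missing from the graph has probability 0.
p_zero = joint_probability(["Wile", "sleeps"], ["V", "V"])
print(p_zero)
```

Storing the probabilities in two dictionaries mirrors the remark that the graph is only one way to visualize the emission and transition probabilities.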
[Figure: the HMM graph again, with the missing edges drawn as edges of probability 0]

Estimate probabilities
Start with a manually annotated corpus:
The/DET laws/N of/PREP gravity/N apply/V selectively/ADV

$P(\text{tag}_i \mid \text{tag}_j) = \frac{\#\,\text{cases where } \text{tag}_i \text{ follows } \text{tag}_j}{\#\,\text{appearances of } \text{tag}_j}$
e.g. $P(\text{ADV} \mid \text{V}) = \frac{1}{1}$, $P(\text{V} \mid \text{N}) = \frac{1}{2}$, ...

$P(\text{word}_i \mid \text{tag}_j) = \frac{\#\,\text{cases where } \text{word}_i \text{ has } \text{tag}_j}{\#\,\text{appearances of } \text{tag}_j}$
e.g. $P(\text{the} \mid \text{DET}) = \frac{1}{1}$, $P(\text{laws} \mid \text{N}) = \frac{1}{2}$, ...

Task: Estimate probabilities
Estimate the transition and emission probs from this annotated sentence:
START Wile/PN runs/V many/ADJ runs/N ./END
• P(PN|Start) = ?
• P(Wile|PN) = ?
• P(V|PN) = ?
• P(runs|V) = ?
• ...
Draw the HMM as a graph.

Task: Compute probabilities
Using the probabilities from the last task, compute the following probabilities:
• $P(\langle \text{Wile}, \text{runs}, ., \text{PN}, \text{V}, \text{END} \rangle)$
• $P(\langle \text{Wile}, \text{runs}, ., \text{PN}, \text{N}, \text{END} \rangle)$

Def: Probabilistic POS Tagging
Given a sentence and transition and emission probabilities, Probabilistic POS Tagging computes the sequence of tags that has maximal probability (in an HMM).
Example: $\vec{X}$ = “Wile runs.”
• $P(\text{Wile}, \text{runs}, \text{PN}, \text{N}) = 0.01$
• $P(\text{Wile}, \text{runs}, \text{V}, \text{N}) = 0.01$
• $P(\text{Wile}, \text{runs}, \text{PN}, \text{V}) = 0.1$ (winner)
• ...

POS Tagging is not trivial
Several cases are difficult to tag:
• the word “blue” has 4 letters.
• pre- and post-secondary
• look (a word) up
• The Duchess was entertaining last night.
(Source: Wikipedia/POS tagging)

POS Taggers
There are several free POS taggers:
• Stanford POS tagger
• hunpos
• MBT: Memory-based Tagger
• TreeTagger
• ACOPOST (formerly ICOPOST)
• YamCha
• ...
(Source: Stanford NLP/Resources)
Try it out! POS taggers achieve very good precision.

POS Tagging helps pattern IE
We can choose to match the placeholders with only nouns or proper nouns:
“X invents a Y”
• Coyote invents a catapult. => invents(Coyote, catapult)
• Coyote invents a really cool thing. => invents(Coyote, really) is no longer produced, because “really” is not a noun

POS-tags can generalize patterns
The pattern “X invents a ADJ Y” matches each of:
• Coyote invents a cool catapult.
• Coyote invents a great catapult.
• Coyote invents a super-duper catapult.
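Patterns that mix words, POS-tags, and noun-only placeholders can be sketched as a match over tagged tokens. The tiny lexicon below is a hypothetical stand-in for a real POS tagger, and the function names tag and match_pattern are made up for this illustration; the sketch assumes that every word of the sentence is in the lexicon.

```python
# Hypothetical hand-made lexicon standing in for a real POS tagger.
LEXICON = {
    "Coyote": "PN", "invents": "V", "a": "DET", "cool": "ADJ", "great": "ADJ",
    "super-duper": "ADJ", "very": "ADV", "really": "ADV",
    "catapult": "N", "thing": "N",
}
NOUN_TAGS = {"N", "PN"}
POS_TAGS = set(LEXICON.values())

def tag(sentence):
    """Look up the tag of every word (fails on words outside the lexicon)."""
    return [(word, LEXICON[word]) for word in sentence.rstrip(".").split()]

def match_pattern(pattern, tagged):
    """Return the (X, Y) bindings of all windows that match the pattern."""
    elements = pattern.split()
    results = []
    for start in range(len(tagged) - len(elements) + 1):
        window = tagged[start:start + len(elements)]
        binding = {}
        for element, (word, pos) in zip(elements, window):
            if element in ("X", "Y"):
                if pos not in NOUN_TAGS:   # placeholders bind only to (proper) nouns
                    break
                binding[element] = word
            elif element in POS_TAGS:
                if pos != element:         # a tag in the pattern matches any word with that tag
                    break
            elif element != word:          # everything else must match literally
                break
        else:
            results.append((binding["X"], binding["Y"]))
    return results

for adjective in ("cool", "great", "super-duper"):
    print(match_pattern("X invents a ADJ Y", tag(f"Coyote invents a {adjective} catapult.")))

# With noun-only placeholders, "really" is no longer bound to Y.
print(match_pattern("X invents a Y", tag("Coyote invents a really cool thing.")))
```

The pattern is matched against a contiguous window of tokens, so any extra word between the pattern elements prevents a match.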
Phrase structure is a problem
The pattern “X invents a ADJ Y” matches sentences such as “Coyote invents a cool catapult.”, but it does not match:
• Coyote invents a very great catapult. (no match)
• Coyote, who is very hungry, invents a great catapult. (no match)
We will see next time how to solve it.

References
• Brin: Extracting Patterns and Relations from the WWW
• Agichtein: Snowball
• Ramage: HMM Fundamentals
• Web data mining class
