Cross-genre Information Linking and Summarization

Heng Ji
jih@rpi.edu
Acknowledgement: summarization introduction slides from
Ed Hovy, Daniel Marcu, Sujian Li
Outline
 Summarization Recap
 Cross-genre Information Linking
 Cross-genre Summarization
What happened?
MILAN, Italy, April 18. A small airplane crashed into a government building in the heart of Milan, setting the top floors on fire, Italian police reported. There were no immediate reports on casualties as rescue workers attempted to clear the area in the city's financial district. Few details of the crash were available, but news reports about it immediately set off fears that it might be a terrorist act akin to the Sept. 11 attacks in the United States. Those fears sent U.S. stocks tumbling to session lows in late morning trading. Witnesses reported hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station. Italian state television said the crash put a hole in the 25th floor of the Pirelli building. News reports said smoke poured from the opening. Police and ambulances rushed to the building in downtown Milan. No further details were immediately available.
(Annotations on the slide: How many victims? When, where? Says who? Was it a terrorist act? What was the target?)
1. How many people were injured?
2. How many people were killed? (age, number, gender, description)
3. Was the pilot killed?
4. Where was the plane coming from?
5. Was it an accident (technical problem, illness, terrorist act)?
6. Who was the pilot? (age, number, gender, description)
7. When did the plane crash?
8. How tall is the Pirelli building?
9. Who was on the plane with the pilot?
10. Did the plane catch fire before hitting the building?
11. What was the weather like at the time of the crash?
12. When was the building built?
13. What direction was the plane flying?
14. How many people work in the building?
15. How many people were in the building at the time of the crash?
16. How many people were taken to the hospital?
17. What kind of aircraft was used?
Questions
 What kinds of summaries do people want?
 What are summarizing, abstracting, gisting, ...?
 How sophisticated must summarization systems be?
  Are statistical techniques sufficient?
  Or do we need symbolic techniques and deep understanding as well?
 What milestones would mark quantum leaps in summarization theory and practice?
 How do we measure summarization quality?
Definitions
 Summary definition (Sparck Jones, 1999):
"a reductive transformation of source text to summary text through content condensation by selection and/or generalization on what is important in the source."
Schematic summary processing model:
Source text → (Interpretation) → Source representation → (Transformation) → Summary representation → (Generation) → Summary text
Summarizing factors (Sparck Jones 2007)
 Input
  subject type: domain
  genre: newspaper articles, editorials, letters, reports...
  form: regular text structure; free-form
  source size: single doc; multiple docs (few; many)
 Purpose
  situation: embedded in a larger system (MT, IR) or not?
  audience: focused or general
  usage: IR, sorting, skimming...
 Output
  completeness: include all aspects, or focus on some?
  format: paragraph, table, etc.
  style: informative, indicative, aggregative, critical...
Examples
Exercise: summarize the following texts for the following readers:
 text 1: Coup Attempt; text 2: children's story
 reader 1: your friend, who knows nothing about South Africa
 reader 2: someone who lives in South Africa and knows the political position
 reader 3: your 4-year-old niece
 reader 4: the Library of Congress
‘Genres’ of Summary?
 Indicative vs. informative
...used for quick categorization vs. content processing.
 Extract vs. abstract
...lists fragments of text vs. re-phrases content coherently.
 Generic vs. query-oriented
...provides author’s view vs. reflects user’s interest.
 Background vs. just-the-news
...assumes reader’s prior knowledge is poor vs. up-to-date.
 Single-document vs. multi-document source
...based on one text vs. fuses together many texts.
A Summarization Machine
[Diagram: documents (DOC, MULTIDOCS) and an optional QUERY feed a summarization machine. Its control parameters include length (headline, very brief 10%, brief 50%, long 100%), Extract vs. Abstract, Indicative vs. Informative, Generic vs. Query-oriented, and Background vs. Just the news. Intermediate representations range from EXTRACTS (index terms, clause fragments) to ABSTRACTS (case frames, templates, core concepts, core events, relationships).]
Computational Approach
 Top-Down: "I know what I want!"
  User needs: only certain types of info
  System needs: particular criteria of interest, used to focus search
 Bottom-Up: "I'm dead curious: what's in the text?"
  User needs: anything that's important
  System needs: generic importance metrics, used to rate content
Review of Methods
 Bottom-up methods
  Text location: title, position
  Cue phrases
  Word frequencies
  Internal text cohesion: word co-occurrences, local salience, co-reference of names and objects, lexical similarity, semantic rep/graph centrality
  Discourse structure centrality
 Top-down methods
  Information extraction templates
  Query-driven extraction: query expansion lists, co-reference with query names, lexical similarity to query
Query-Driven vs. Text-Driven Focus
 Top-down: query-driven focus
  Criteria of interest encoded as search specs.
  System uses specs to filter or analyze text portions.
  Examples: templates with slots with semantic characteristics; termlists of important terms.
 Bottom-up: text-driven focus
  Generic importance metrics encoded as strategies.
  System applies strategies over a representation of the whole text.
  Examples: degree of connectedness in semantic graphs; frequency of occurrence of tokens.
Bottom-Up, using Info. Retrieval
 IR task: Given a query, find the relevant document(s) from a large set of documents.
 Summ-IR task: Given a query, find the relevant passage(s) from a set of passages (i.e., from one or more documents).
 Questions:
  1. IR techniques work on large volumes of data; can they scale down accurately enough?
  2. IR works on words; do abstracts require abstract representations?
Top-Down, using Info. Extraction
 IE task: Given a template and a text, find all the information relevant to each slot of the template and fill it in.
 Summ-IE task: Given a query, select the best template, fill it in, and generate the contents.
 Questions:
  1. IE works only for very particular templates; can it scale up?
  2. What about information that doesn't fit into any template? Is this a generic limitation of IE?
Paradigms: NLP/IE vs. IR/Statistics
 NLP/IE:
  Approach: try to 'understand' the text; re-represent content using a 'deeper' notation, then manipulate that.
  Need: rules for text analysis and manipulation, at all levels.
  Strengths: higher quality; supports abstracting.
  Weaknesses: speed; still needs to scale up to robust open-domain summarization.
 IR/Statistics:
  Approach: operate at the lexical level; use word frequency, collocation counts, etc.
  Need: large amounts of text.
  Strengths: robust; good for query-oriented summaries.
  Weaknesses: lower quality; inability to manipulate information at abstract levels.
Toward the Final Answer...
 Problem: What if neither IR-like nor IE-like methods work?
  Sometimes counting and templates are insufficient,
  and then you need to do inference to understand.
 Example:
  Mrs. Coolidge: "What did the preacher preach about?"
  Coolidge: "Sin."
  Mrs. Coolidge: "What did he say?"
  Coolidge: "He's against it."
 Solution:
  semantic analysis of the text (NLP),
  using adequate knowledge bases that support inference (AI).
(Spectrum of methods: word counting ↔ inference)
Example (Marcu 97): discourse structure for a short text about Martian weather
Text units:
(1) With its distant orbit (50 percent farther from the sun than Earth) and slim atmospheric blanket,
(2) Mars experiences frigid weather conditions.
(3) Surface temperatures typically average about -60 degrees Celsius (-76 degrees Fahrenheit) at the equator and can dip to -123 degrees C near the poles.
(4) Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion,
(5) but any liquid water formed in this way would evaporate almost instantly
(6) because of the low atmospheric pressure.
(7) Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop,
(8) most Martian weather involves blowing dust and carbon monoxide.
(9) Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry-ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap.
(10) Yet even on the summer pole, where the sun remains in the sky all day long, temperatures never warm enough to melt frozen water.
[Diagram: the rhetorical structure tree over units (1)-(10), built from relations such as Background, Justification, Elaboration, Example, Contrast, Concession, Evidence, Cause, and Antithesis, with nuclei promoted toward the root.]
Summarization = selection of the most important units: 2 > 8 > 3, 10 > 1, 4, 5, 7, 9 > 6
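The ranking above follows Marcu's idea of promoting nuclei up the discourse tree: the fewer levels a unit must climb to reach the root, the more important it is. Below is a minimal Python sketch of that promotion idea; the tree encoding, the toy example, and the exact scoring rule are simplifying assumptions, not Marcu's published algorithm.

# A minimal sketch of promotion-based importance ranking in the spirit of Marcu (1997),
# using a toy tree encoding; this simplifies his algorithm rather than reproducing it.
from dataclasses import dataclass, field

@dataclass
class RSTNode:
    unit: object = None                              # leaf: elementary unit id; internal node: None
    nuclei: list = field(default_factory=list)       # nucleus children
    satellites: list = field(default_factory=list)   # satellite children

def promotion(node):
    # Units promoted to this node: the leaf's own unit, or the union of the
    # promotions of its nucleus children.
    if node.unit is not None:
        return {node.unit}
    promoted = set()
    for child in node.nuclei:
        promoted |= promotion(child)
    return promoted

def rank_units(root):
    # Score each unit by the shallowest tree depth at which it is promoted;
    # units promoted all the way to the root come first.
    best = {}
    def walk(node, depth):
        for u in promotion(node):
            best[u] = min(best.get(u, depth), depth)
        for child in node.nuclei + node.satellites:
            walk(child, depth + 1)
    walk(root, 0)
    return sorted(best, key=lambda u: best[u])

# Toy tree: unit 2 is the top-level nucleus, unit 1 its Background satellite,
# and units 3-4 sit in a deeper Elaboration subtree.
leaf = lambda i: RSTNode(unit=i)
tree = RSTNode(nuclei=[leaf(2)],
               satellites=[leaf(1), RSTNode(nuclei=[leaf(3)], satellites=[leaf(4)])])
print(rank_units(tree))   # [2, 1, 3, 4]: the top nucleus first, deep satellites last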
Information Extraction Method
 Idea: content selection using templates
 Predefine a template, whose slots specify what is of
interest.
 Use a canonical IE system to extract from a (set of)
document(s) the relevant information; fill the template.
 Generate the content of the template as the summary.
 Previous IE work:
 FRUMP (DeJong, 78): ‘sketchy scripts’ of terrorism,
natural disasters, political visits...
 (Mauldin, 91): templates for conceptual IR.
 (Rau and Jacobs, 91): templates for business.
 (McKeown and Radev, 95): templates for news.
Information Extraction Method
 Example template (filled):
  MESSAGE:ID          TSL-COL-0001
  SECSOURCE:SOURCE    Reuters
  SECSOURCE:DATE      26 Feb 93, Early afternoon
  INCIDENT:DATE       26 Feb 93
  INCIDENT:LOCATION   World Trade Center
  INCIDENT:TYPE       Bombing
  HUM TGT:NUMBER      AT LEAST 5
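For concreteness, here is one way such a filled template could be represented as a data structure before generation; this is a sketch for illustration, not the representation used in the original systems.

# One possible in-memory representation of the filled MUC-style template above
# (illustrative only; slot names mirror the slide).
template = {
    "MESSAGE:ID": "TSL-COL-0001",
    "SECSOURCE:SOURCE": "Reuters",
    "SECSOURCE:DATE": "26 Feb 93, Early afternoon",
    "INCIDENT:DATE": "26 Feb 93",
    "INCIDENT:LOCATION": "World Trade Center",
    "INCIDENT:TYPE": "Bombing",
    "HUM TGT:NUMBER": "AT LEAST 5",
}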
Full Generation Example
 Challenge: Pack content densely!
 Example (McKeown and Radev, 95):
 Traverse templates and assign values to
‘realization switches’ that control local choices
such as tense and voice.
 Map modified templates into a representation of
Functional Descriptions (input representation to
Columbia’s NL generation system FUF).
 FUF maps Functional Descriptions into English.
Generation Example (McKeown and Radev, 95)
NICOSIA, Cyprus (AP) – Two bombs exploded near government ministries in Baghdad, but there was no immediate word of any casualties, Iraqi dissidents reported Friday. There was no independent confirmation of the claims by the Iraqi National Congress. Iraq's state-controlled media have not mentioned any bombings.
 Multiple sources and disagreement
 Explicit mentioning of "no information"
Graph-based Methods
 Degree Centrality
 LexRank
 Continuous LexRank
Degree Centrality
 Problem formulation:
  Represent each sentence by a vector
  Denote each sentence as a node of a graph
  Cosine similarity determines the edges between nodes
Degree Centrality
 Since we are interested in significant similarities, we can eliminate low values in the similarity matrix by defining a threshold.
 Compute the degree of each sentence.
 Pick the nodes (sentences) with high degrees (a sketch follows below).
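A minimal sketch of this pipeline, assuming scikit-learn is available; the tf-idf features and the 0.1 threshold are illustrative choices.

# Degree centrality for sentence ranking: build a cosine similarity graph,
# keep only edges above a threshold, and rank sentences by node degree.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def degree_centrality(sentences, threshold=0.1):
    tfidf = TfidfVectorizer().fit_transform(sentences)   # one vector per sentence
    sim = cosine_similarity(tfidf)                        # sentence-sentence cosine matrix
    adj = (sim >= threshold).astype(int)                  # keep only significant similarities
    degrees = adj.sum(axis=1) - 1                         # degree of each node (drop the self-loop)
    return sorted(range(len(sentences)), key=lambda i: -degrees[i])

sentences = [
    "A small airplane crashed into a government building in Milan.",
    "The plane crash set the top floors of the Milan building on fire.",
    "U.S. stocks fell on fears of a terrorist act.",
]
print(degree_centrality(sentences))   # sentence indices, most central first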
Degree Centrality
 Disadvantage of the degree centrality approach: every edge counts as an equal vote, so a sentence's centrality ignores how central its neighbours are; a cluster of mutually similar but unimportant sentences can still obtain high degrees.
LexRank
 Centrality vector p gives the LexRank of each sentence (similar to PageRank), defined by p = B^T p.
 B is a stochastic matrix (the row-normalized adjacency matrix of the similarity graph).
 Is B irreducible and aperiodic? Damping (Page et al. 1998) makes it so.
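Below is a minimal power-iteration sketch in the spirit of LexRank with damping, assuming a precomputed cosine similarity matrix; the damping convention and threshold are illustrative, not the authors' exact settings.

# LexRank-style ranking: threshold the similarity matrix, row-normalize it,
# and iterate a damped random walk to a stationary score vector.
import numpy as np

def lexrank(sim, threshold=0.1, damping=0.85, iters=100):
    n = len(sim)
    adj = (sim >= threshold).astype(float)          # binary graph (use sim itself for continuous LexRank)
    row_sums = adj.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0                    # guard against dangling nodes
    B = adj / row_sums                               # row-stochastic matrix
    p = np.ones(n) / n
    for _ in range(iters):
        # damping makes the chain irreducible and aperiodic, so p converges
        p = (1 - damping) / n + damping * B.T @ p
    return p

sim = np.array([[1.0, 0.6, 0.1],
                [0.6, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
print(lexrank(sim))    # stationary scores; higher = more central sentence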
Summarization & Keyphrase Extraction
 Sentence-sentence relation graph
 Word-word relation graph
 3 kinds of relations:
  SS-Relation: U
  WW-Relation: V
  SW-Relation: W
Build Sentence-Sentence Graph
 Sentence relations:
  U = [U_{ij}]_{n \times n}, with U_{ij} = sim(s_i, s_j) for i \ne j and U_{ij} = 0 for i = j
  s_i = (w_{t_1}, w_{t_2}, ..., w_{t_n}), where w_{t_i} = tf_{t_i} \cdot isf_{t_i} and isf_{t_i} = 1 + \log(N / n_{t_i})
Build Word-Word Graph
 Word relations:
  V = [V_{ij}]_{n \times n}, with V_{ij} = sim(t_i, t_j) for i \ne j and V_{ij} = 0 for i = j
 Word similarity computation:
  Based on a dictionary (WordNet)
  Based on a corpus (mutual information)
Build Sentence-Word Graph
 Relations between sentences and words:
  W = [W_{ij}]_{m \times n}, with W_{ij} = aff(s_i, t_j)
  S = \{s_i \mid 1 \le i \le m\}, T = \{t_j \mid 1 \le j \le n\}
 Affinity computation:
  aff(s_i, t_j) = \frac{tf_{t_j} \cdot isf_{t_j}}{\sum_{t \in s_i} tf_t \cdot isf_t}
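A minimal sketch of building U, V, and W from raw sentences, following the tf.isf and affinity formulas above; the word-word similarity here is a simple sentence co-occurrence count standing in for the WordNet- or mutual-information-based similarity.

# Build the three graphs: U (sentence-sentence cosine over tf.isf vectors),
# V (word-word relations, here via co-occurrence), W (sentence-word affinity).
import numpy as np
from collections import Counter

def build_graphs(sentences):
    docs = [s.lower().split() for s in sentences]
    vocab = sorted({t for d in docs for t in d})
    idx = {t: j for j, t in enumerate(vocab)}
    m, n = len(docs), len(vocab)

    # tf.isf sentence vectors: w_t = tf_t * isf_t, isf_t = 1 + log(N / n_t)
    n_t = Counter(t for d in docs for t in set(d))
    isf = np.array([1 + np.log(m / n_t[t]) for t in vocab])
    X = np.zeros((m, n))
    for i, d in enumerate(docs):
        for t, c in Counter(d).items():
            X[i, idx[t]] = c * isf[idx[t]]

    # U: cosine similarity between sentences, zero diagonal
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Xn = X / norms
    U = Xn @ Xn.T
    np.fill_diagonal(U, 0.0)

    # V: word-word relations (sentence co-occurrence counts), zero diagonal
    cooc = (X > 0).astype(float)
    V = cooc.T @ cooc
    np.fill_diagonal(V, 0.0)

    # W: aff(s_i, t_j) = tf*isf of t_j in s_i, normalized per sentence
    row = X.sum(axis=1, keepdims=True)
    row[row == 0] = 1.0
    W = X / row
    return U, V, W, vocab

U, V, W, vocab = build_graphs(["the crash set the building on fire",
                               "police reported the crash in milan"])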
Document Model
 Assumption 1: If a sentence is important, its closely connected sentences are also important; if a word is important, its closely related words are also important.
 Assumption 2: The more important words a sentence contains, the more important the sentence is; the more frequently a word occurs in important sentences, the more important the word is.
Reinforcement Algorithm
 Assumptions, written as ranking equations:
  u(s_i) \propto \sum_j U_{ji} u(s_j),   v(t_j) \propto \sum_i V_{ij} v(t_i)
  u(s_i) \propto \sum_j W_{ji} v(t_j),   v(t_j) \propto \sum_i W_{ij} u(s_i)
 Combining both sources of evidence with weights \alpha and \beta:
  u(s_i) = \alpha \sum_{j=1}^{m} U_{ji} u(s_j) + \beta \sum_{j=1}^{n} W_{ji} v(t_j)
  v(t_j) = \alpha \sum_{i=1}^{n} V_{ij} v(t_i) + \beta \sum_{i=1}^{m} W_{ij} u(s_i)
 Matrix form: u = \alpha U^T u + \beta W v,   v = \alpha V^T v + \beta W^T u
 Then we can simultaneously rank sentences (u) and words (v); a small sketch of the iteration follows below.
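A minimal numpy sketch of the mutual-reinforcement iteration above; the weights, normalization, and stopping rule are illustrative assumptions rather than the exact published settings.

# Mutual reinforcement between sentence scores (u) and word scores (v),
# given the graphs U (sentence-sentence), V (word-word), W (sentence-word).
import numpy as np

def reinforce(U, V, W, alpha=0.7, beta=0.3, iters=50):
    m, n = W.shape                     # m sentences, n words
    u = np.ones(m) / m                 # sentence scores
    v = np.ones(n) / n                 # word scores
    for _ in range(iters):
        u_new = alpha * U.T @ u + beta * W @ v     # sentences reinforced by sentences and words
        v_new = alpha * V.T @ v + beta * W.T @ u   # words reinforced by words and sentences
        u = u_new / (np.linalg.norm(u_new, 1) or 1.0)   # renormalize to keep scores bounded
        v = v_new / (np.linalg.norm(v_new, 1) or 1.0)
        # (convergence check omitted for brevity)
    return u, v

# Toy example: 2 sentences, 3 words
U = np.array([[0.0, 0.5], [0.5, 0.0]])
V = np.full((3, 3), 1/3) - np.eye(3) / 3
W = np.array([[0.6, 0.4, 0.0], [0.0, 0.3, 0.7]])
u, v = reinforce(U, V, W)
print(u, v)   # higher u -> more important sentence; higher v -> more important word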
Postprocessing
 Simple processing
 Extract the highest scored sentences until the
length reaches the requirement.
 Problems
 1: redundancy
 2: meaningless words in sentences (rules
based)
 3: coherence
Sentence simplification
 Delete meaningless words in sentences
  News-specific noisy words
  Content-irrelevant words
 Rule-based method (see the sketch below)
  The beginning of a news story, e.g., "ALBUQUERQUE, N.M. (AP)";
  Initial words in the sentence, such as "and", "also", "besides,", "though,", "in addition", "somebody said", "somebody says";
  "somebody (pronoun)/It is said/reported/noticed/thought that";
  Parenthesized content in capitalized letters ...
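A rough regex sketch of these cleanup rules; the patterns below are illustrative approximations, not the exact rule set.

# Rule-based sentence simplification: strip datelines, leading connectives,
# attribution phrases, and parenthesized all-caps content.
import re

DATELINE = re.compile(r"^[A-Z][A-Z.,' ]+\([A-Z]+\)\s*[-–]*\s*")   # e.g. "ALBUQUERQUE, N.M. (AP) -"
LEADING = re.compile(r'^(and|also|besides|though|in addition)[, ]+', re.I)
ATTRIBUTION = re.compile(r"\b(it|he|she|they) (is|was|are|were)? ?(said|reported|noticed|thought) that\b", re.I)
PAREN_CAPS = re.compile(r'\([A-Z ]{2,}\)')                          # parenthesized all-caps content

def simplify(sentence):
    s = DATELINE.sub('', sentence)
    s = LEADING.sub('', s)
    s = ATTRIBUTION.sub('', s)
    s = PAREN_CAPS.sub('', s)
    return re.sub(r'\s{2,}', ' ', s).strip()

print(simplify('ALBUQUERQUE, N.M. (AP) - Also, it is reported that the storm (CNN) weakened.'))
# -> "the storm weakened."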
Sentence ordering
 Sentence ordering by score: no logic in the content
 Temporally-based sentence ordering
  Acquire the time stamp from the original texts
  Order sentences according to the publication time of their documents; for sentences from the same document, order them by their occurrence in the document
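A minimal sketch of this ordering strategy, assuming each selected sentence carries its document's publication time and its position within that document (the field names are hypothetical).

# Temporal ordering of selected sentences: by document publication time,
# then by position within the source document.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Selected:
    text: str
    doc_time: datetime      # publication time of the source document
    position: int           # sentence index within the source document

def order_sentences(selected):
    return sorted(selected, key=lambda s: (s.doc_time, s.position))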
Outline
 Summarization Recap
 Cross-genre Information Linking (Huang et al., 2012)
 Cross-genre Summarization
What are Other People Planning?
 10K tweets posted each hour
 Top tweets ranked by TextRank
Desperately Looking for a Better Ranker
 Top-ranked tweets from our system
What is Informativeness?
 After temporal and spatial constraints: informative to a general audience or helpful for tracking events
Informative Tweet Examples
 Break news
 Real-time coverage of ongoing events
 ...
  New Yorkers, find your exact evacuation zone by your address here: http://t.co/9NhiGKG /via @user #Irene #hurricane #NY
  Hurricane Irene: Latest developments http://t.co/2nQOJLO
Non-Informative Tweet Examples
 Me, Myself, and Hurricane Irene.
 I'm ready For hurricane Irene.
Limitations of Previous Work
 Supervised ranking models require a large amount of labeled data and multiple levels of features (e.g., content and user account features)
 Ignored cross-genre linkages and background knowledge
 Tweets about events of general interest are sent by many disconnected users
  Need to handle link sparsity with implicit user network prediction
 Ignored subjectivity detection and redundancy removal
(Illustration caption: "Our relationships have been ignored")
Motivations and Hypotheses
 Informative tweets often contain rich links to diverse networks
 Hypothesis 1: Informative tweets are more likely to be posted by credible users, and vice versa (credible users are more likely to post informative tweets).
 Hypothesis 2: Tweets involving many users are more likely to be informative.
  Similar tweets appear with high frequency
  Synchronous behavior of users indicates informative information
  Example tweets:
   Had fun in the excursion bus for 16 hours
   Saw some empty ancient caves in darkness
   Our bus hit a house before the second caves!
Motivations and Hypotheses (cont'd)
 Hypothesis 3: Tweets aligned with the contents of web documents are more likely to be informative.
  New Yorkers, find your exact evacuation zone by your address here: http://t.co/9NhiGKG /via @user #Irene #hurricane #NY
  Details of Aer Lingus flights affected by Hurricane Irene can be found at http://t.co/PCqE74V
  Hurricane Irene: City by City Forecasts http://t.co/x1t122A
Approach Overview
 Make use of correlations to formal-genre web documents
 Infer implicit tweet-user relations to enrich network linkages
 Extend to heterogeneous networks instead of homogeneous networks
 Effective propagation model to consider global evidence from different genres
Web-Tweet-User Networks
[Diagram: a heterogeneous graph linking web documents (D1-D3) to tweets (T1-T4) through Web-Tweet networks, and tweets to users (U1-U3) through Tweet-User networks.]
Non-Informative Tweet Filtering
 Capture the characteristics of a noisy tweet with a few patterns (a sketch follows below):
  very short tweets without a complementary URL
  tweets with subjective opinions (e.g., containing I, me, my, ...)
   I'm ready for hurricane Irene
   I hope New York and New Jersey are ok when the hurricane hits
  informal tweets containing slang words
 Precision: 96.59%
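A rough sketch of such a pattern-based filter; the length threshold, the first-person pattern, and the slang list are illustrative assumptions.

# Heuristic noise filter for tweets: very short tweets without a URL,
# first-person subjective tweets, and tweets containing slang words.
import re

URL = re.compile(r'https?://\S+')
FIRST_PERSON = re.compile(r"\b(i|i'm|me|my|myself)\b", re.I)
SLANG = {'lol', 'omg', 'lmao', 'smh'}

def is_noise(tweet, min_tokens=6):
    tokens = tweet.lower().split()
    if len(tokens) < min_tokens and not URL.search(tweet):
        return True                     # very short tweet without a complementary URL
    if FIRST_PERSON.search(tweet):
        return True                     # subjective, first-person tweet
    if any(t.strip('.,!?') in SLANG for t in tokens):
        return True                     # informal tweet containing slang words
    return False

print(is_noise("I'm ready For hurricane Irene."))                             # True
print(is_noise("Hurricane Irene: Latest developments http://t.co/2nQOJLO"))   # False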
Initializing Ranking Scores
 Initializing tweet and web document scores
  TextRank based on content similarity (cosine over tf.idf)
 Initializing user credibility scores
  TextRank based on retweet/reply/user-mention networks
  Bayesian ranking approach to consider user and tweet networks simultaneously (Wang et al. 2011)
   Rank(x): the increase in posterior probability that a user is credible, normalized by the prior probability
   Other quantities in the formula: the percentage of true claims, the percentage of credible users, and the explicit tweet-user networks
Constructing Heterogeneous Networks
 Tweet-User networks
  Explicit tweet-user relations are sparse
  Infer implicit tweet-user relations: if U1 posts T1 and sim(T1, T2) exceeds a threshold, a direct edge is created between U1 and T2 (see the sketch below).
 Web-Tweet networks
  Ti is aligned with a relevant web document Dj if they are on a similar topic (cosine over tf.idf)
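A minimal sketch of the implicit-edge inference just described; the threshold and the similarity function are assumed to be supplied by the caller.

# Infer implicit tweet-user edges: if a user posted t1 and sim(t1, t2) exceeds
# a threshold, add an edge between that user and t2.
def infer_implicit_edges(explicit_edges, tweets, sim, threshold=0.7):
    """explicit_edges: set of (user, tweet_id) pairs; tweets: dict tweet_id -> text;
    sim: function (text, text) -> similarity in [0, 1]."""
    edges = set(explicit_edges)
    for user, t1 in explicit_edges:
        for t2 in tweets:
            if t2 != t1 and sim(tweets[t1], tweets[t2]) >= threshold:
                edges.add((user, t2))       # implicit edge between the user and a similar tweet
    return edges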
Tri-HITS: Preliminaries
 Similarity matrices (e.g., Wdt, Wtd for web-tweet links; Wtu, Wut for tweet-user links) and the corresponding transition matrices (e.g., Pdt)
 Heterogeneous networks over web documents, tweets, and users
 Initial ranking scores S0(d), S0(t), S0(u)
[Diagram: a small example network of documents D1-D2, tweets T1-T3, and users U1-U2, with initial scores and edge similarities (values such as 0.45, 0.5, 0.8, 0.1, 1.0); implicit links between tweets and web documents (Wtd, Wdt) and explicit/implicit links between tweets and users (Wtu, Wut).]
Propagation from Tweets to Web Documents
 Tri-HITS: based on the similarity matrix; a document's updated score combines its initial score with the score propagated from tweets.
 Co-HITS: based on the transition matrix (Deng et al. 2009)
 Differences between Tri-HITS and Co-HITS:
  Tri-HITS normalizes the propagated ranking scores based on the original similarity matrix
  Co-HITS propagates normalized ranking scores using the transition matrix
Tri-HITS (cont'd)
 Propagation from tweets to users
 Propagation from web documents and users to tweets
  Setting the web-document propagation weight to 0 considers only tweet-user networks; setting the user propagation weight to 0 considers only web-tweet networks.
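The exact update equations are in Huang et al. (2012) and are not reproduced on the slide; the following is a heavily simplified, assumed instantiation of the propagation idea (scores flow over the similarity matrices, are normalized by similarity mass, and are interpolated with the initial scores).

# An assumed, simplified propagation loop over the web-tweet-user networks.
import numpy as np

def normalize(x):
    s = x.sum()
    return x / s if s > 0 else x

def propagate(W_dt, W_tu, s_d0, s_t0, s_u0, lam_d=0.5, lam_u=0.5, iters=20):
    # W_dt: |D| x |T| doc-tweet similarities; W_tu: |T| x |U| tweet-user links.
    # Assumes lam_d + lam_u <= 1; setting lam_d or lam_u to 0 drops that genre.
    s_d, s_t, s_u = s_d0.copy(), s_t0.copy(), s_u0.copy()
    for _ in range(iters):
        # tweets -> documents, normalized by each document's similarity mass
        from_t = W_dt @ s_t / np.maximum(W_dt.sum(axis=1), 1e-12)
        s_d = normalize((1 - lam_d) * s_d0 + lam_d * from_t)
        # tweets -> users
        from_t_u = W_tu.T @ s_t / np.maximum(W_tu.sum(axis=0), 1e-12)
        s_u = normalize((1 - lam_u) * s_u0 + lam_u * from_t_u)
        # documents and users -> tweets
        from_d = W_dt.T @ s_d / np.maximum(W_dt.sum(axis=0), 1e-12)
        from_u = W_tu @ s_u / np.maximum(W_tu.sum(axis=1), 1e-12)
        s_t = normalize((1 - lam_d - lam_u) * s_t0 + lam_d * from_d + lam_u * from_u)
    return s_d, s_t, s_u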
An Example over a Bipartite Graph
[Diagram: tweets T1, T2, T3 with initial scores 0.45, 0.8, 0.1 linked to web documents D1, D2 (scores 0.6, 0.4) with edge weights 0.2, 0.3, 0.5; the similarity matrix Wtd and transition matrices Pdt, Ptd; propagated scores after the first iteration of Tri-HITS and Co-HITS.]
 Choosing the interpolation weight = 0.5, the final ranking of the tweets:
  Tri-HITS: (0.276, 0.463, 0.261)
  Co-HITS: (0.202, 0.331, 0.467)
 Co-HITS weakens or damages the original meaning of the semantic similarity.
Overall Performance (nDCG @ top-n ranked tweets)
 Non-informative tweet filtering is important for informal information from social media.
 Evidence from multi-genre networks improves over TextRank significantly.
 Knowledge transferred from the Web and from the inferred implicit social networks dramatically boosted quality.
Remaining Error Analysis
 Topically-relevant tweet identification
 Non-informative tweet identification by performing deeper linguistic analysis and rumor/sarcasm detection
  Hurricane names hurricane names http://t.co/iisc7UY ;)
  My favorite parts of Hurricane coverage is when the weathercasters stand in those 100 MPH winds right on the beach. Good stuff.
 Deep semantic analysis to improve the inference of implicit linkages
  Hurricane Kitty: http://t.co/cdIexE3
  "MTA closed" = "Subway shut down"
 Subjectivity detection with opinion mining
  Damn earthquake & hurricane in the same week... = Worst week to live on the East Coast
Outline
 Summarization Recap
 Cross-genre Information Linking
 Cross-genre Summarization
Tweet Ranking (Liu et al., 2012)
 Modifying weights in TextRank:
  Retw: a tweet is more important if it has been re-tweeted more times
  Foll: a tweet is more important if it is published by an account with more followers
  Readability: sentence length, word length, OOV
 They also considered user diversity
Another Similar Work by Yan et al., 2012
 Tweet graph
 User graph
[Figure: co-ranking over the tweet graph and the user graph; labels include popularity, personalization (user's topic preference), and #tweets.]
Discussion
 No solid work yet on news summarization using newsworthy tweets
 How to proceed?
 Look at Jesse's data and results