Tracking the Dynamic Evolution of Participant Salience in a Discussion
Ahmed Hassan
University of Michigan
hassanam@umich.edu
Anthony Fader
University of Michigan
afader@umich.edu
Kevin M. Quinn
Harvard University
kquinn@fsa.harvard.edu
Burt L. Monroe
Pennsylvania State University
burtmonroe@psu.edu
Michael H. Crespin
University of Georgia
crespin@uga.edu
Michael Colaresi
Michigan State University
colaresi@msu.edu
Dragomir R. Radev
University of Michigan
radev@umich.edu
Abstract

We introduce a technique for analyzing the temporal evolution of the salience of participants in a discussion. Our method dynamically tracks how the relative importance of speakers evolves over time using graph-based techniques. Speaker salience is computed from eigenvector centrality in a graph representation of the participants in a discussion, where two participants are linked with an edge if they use similar rhetoric. The method is dynamic in the sense that the graph evolves over time to capture the evolution inherent in participant salience. We used our method to track the salience of members of the US Senate using data from the US Congressional Record, investigating how the salience of speakers changes over time. Our results show that the scores capture speaker centrality in topics, as well as events that result in a change of salience or influence among participants.
1 Introduction
There are several sources of data that record speeches or contributions in debates or discussions among a group of speakers or participants, including parliamentary records, blogs, and newsgroups. This data represents an important and largely unexploited source of information containing many trends and ideas. In any debate or discussion, there are certain people who influence others and pass information or advice to them. Such people are often regarded as experts in the field, or simply as influential participants, and they tend to affect the ideas and rhetoric of other participants. This effect can be traced by measuring the similarity between different speeches. We can thus view a debate among many people arguing about many different things as a network of interacting speeches or contributions. We can then identify the most salient or important participants by finding the most central speeches in this network and associating them with their speakers. When we have a large dataset of debates and conversations that extends over a long period of time, the salience of participants becomes a dynamic property that changes over time. To capture this dynamic nature of the process, the graph of speeches must evolve over time, so that at each instant we have a different graph reflecting the interaction of speeches at that instant.
We apply our method to the US Congressional Record, which documents everything said and done in the US Congress, House and Senate. The speeches in this data set are made by a large number of people over a long period of time. Using political speeches as test data for the proposed method adds an extra layer of meaning to the measure of speaker salience: the salience of members of Congress can reflect their importance or influence in the US legislative process. The way salience scores evolve over time can answer several interesting questions, such as how the influence of speakers varies with majority status and changes of party control. It can also reveal the dynamics of the relative distribution of attention to each topic area in different time periods.
The rest of this paper will proceed as follows.
Section 2 reviews some related work. In Section 3,
we describe how the data can be clustered into topic clusters. In Section 4, we describe our method for computing the salience of different participants in a discussion; we also describe how the network of speakers varies over time. Section 5 describes the experimental setup and results. Finally,
we present the conclusions in Section 6.
2 Related Work
Several methods have been proposed for identifying the most central nodes in a network. Degree centrality, closeness, and betweenness (Newman, 2003) are among the best-known measures of node centrality. Eigenvector centrality is another powerful method that has been applied to several types of networks. For example, it has been used to measure centrality in networks of hyperlinked web pages (Brin
and Page, 1998; Kleinberg, 1998), lexical networks (Erkan and Radev, 2004; Mihalcea and Tarau, 2004; Kurland and Lee, 2005; Kurland and
Lee, 2006), and semantic networks (Mihalcea et
al., 2004).
Interest in applying natural language processing techniques to political science problems has been increasing recently.
(Quinn et al., 2006) introduce a multinomial
mixture model to cluster political speeches into
topics or related categories. In (Porter et al., 2005),
a network analysis of the members and committees
of the US House of Representatives is performed.
The authors show that there are connections linking certain political positions to certain committees, suggesting that committee memberships are not determined at random. In (Thomas et al., 2006), the authors automatically classify speeches from US Congress debates as supporting or opposing a given topic by taking advantage of the voting records of the speakers. (Fader et al., 2007) introduce MavenRank, a method based on lexical centrality that identifies the most influential members of the US Senate. It computes a single salience score for each speaker that is constant
over time.
In this paper, we introduce a new method for tracking the evolution of the salience of participants in a discussion over time. Our method builds on those described in (Erkan and Radev, 2004; Mihalcea and Tarau, 2004; Fader et al., 2007). The objective of this paper is to dynamically rank speakers or participants in a discussion; the proposed method is dynamic in the sense that the computed importance varies over time.
3 Topic Clusters
Before applying the proposed method to a data
set with speeches in multiple topics, we first need
to divide the speech documents into topic clusters. We used the model described in (Quinn et al.,
2006) for this purpose. That model assumes that the probability of a document belonging to a certain topic varies smoothly over time, and that the words within a given document all have exactly the same probability of being drawn from a particular topic (Quinn et al., 2006). These two
properties make the model different from standard
mixture models (McLachlan and Peel, 2000) and
the latent Dirichlet allocation model of (Blei et al.,
2003). The model of (Quinn et al., 2006) is most
closely related to the model of (Blei and Lafferty,
2006), who present a generalization of the model
used by (Quinn et al., 2006).
The output from the topic model is a $D \times K$ matrix $Z$, where $D$ is the number of speeches, $K$ is the number of topics, and the element $z_{dk}$ represents the probability of the $d$th speech being generated by topic $k$. We then assign each speech $d$ to the cluster $k = \arg\max_j z_{dj}$. If the maximum value is not unique, one of the clusters achieving the maximum is selected arbitrarily.
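To make the assignment step concrete, here is a minimal sketch in Python (with NumPy); the variable names are ours, and the random matrix stands in for the actual topic model output:

```python
import numpy as np

# Z is the D x K topic-probability matrix from the topic model:
# Z[d, k] = probability that speech d was generated by topic k.
# Random Dirichlet draws stand in for the real model output here.
rng = np.random.default_rng(0)
Z = rng.dirichlet(np.ones(42), size=1000)  # 1000 speeches, 42 topics

# Assign each speech d to cluster argmax_j Z[d, j]. np.argmax breaks
# ties by taking the first maximal index, one of the "arbitrary"
# tie-breaking choices allowed by the text.
clusters = np.argmax(Z, axis=1)
print(clusters[:10])  # topic indices of the first ten speeches
```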
4 Speaker Centrality
In this section, we describe how to build a network of speeches and use it to identify speaker centrality. We also describe how to generate different projections of the network at different times, and how to use those projections to obtain dynamic salience scores.
4.1 Computing Speaker Salience
The method we used is similar to the methods described in (Erkan and Radev, 2004; Mihalcea and
Tarau, 2004; Kurland and Lee, 2005), which were
originally used for ranking sentences and documents in extractive summarization and information
retrieval systems.
A collection of speeches can be represented as
a network where similar speeches are linked to
each other. The proposed method is based on
the premise that important speeches tend to be
lexically similar to other important speeches, and
important speeches tend to belong to important
speakers. Hence, given a collection of speeches and a similarity measure, we can build a network and define the centrality score of a speech recursively in terms of the scores of other similar speeches. We can then compute the salience of a speaker as the sum of the centrality scores of all of that speaker's speeches.
To measure the similarity between two speeches, we use the bag-of-words model to represent each speech as an $N$-dimensional vector of tf-idf scores, where $N$ is the number of all possible words in the target language. The similarity between two speeches is then computed as the cosine similarity between the two vectors.
Each speech is thus represented by a vector of term frequencies (tf), weighted by inverse document frequencies (idf) to account for the relative importance of each term in the cluster. The inverse document frequency of a term $w$ is given by (Sparck-Jones, 1972)

$$\mathrm{idf}(w) = \log\frac{N}{n_w} \quad (1)$$

where $n_w$ is the number of speeches in the cluster containing the term $w$, and $N$ is the number of speeches in the cluster. We calculated idf values specific to each topic, rather than over all speeches, because the relative importance of words may vary from one topic to another.
The tf-idf cosine similarity measure is computed as the cosine of the angle between the tf-idf vectors. It is defined as follows:

$$\text{tf-idf-cosine}(u,v) = \frac{\sum_{w \in u,v} \mathrm{tf}_u(w)\, \mathrm{tf}_v(w)\, \mathrm{idf}(w)^2}{\sqrt{\sum_{w \in u} \left(\mathrm{tf}_u(w)\, \mathrm{idf}(w)\right)^2}\, \sqrt{\sum_{w \in v} \left(\mathrm{tf}_v(w)\, \mathrm{idf}(w)\right)^2}} \quad (2)$$
The choice of tf-idf scores to measure speech similarity is somewhat arbitrary; other possible similarity measures include edit distance, language models (Kurland and Lee, 2005), and generation probabilities (Erkan, 2006).
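As an illustration, the following is a minimal sketch of Equations (1) and (2), assuming each speech is already tokenized into a list of words (the tokenization and preprocessing used in the actual pipeline are not specified here):

```python
import math
from collections import Counter

def idf_table(speeches):
    """Topic-specific idf, Equation (1); speeches is a list of token
    lists drawn from a single topic cluster."""
    n = len(speeches)
    df = Counter()
    for tokens in speeches:
        df.update(set(tokens))                 # document frequency n_w
    return {w: math.log(n / n_w) for w, n_w in df.items()}

def tfidf_cosine(u, v, idf):
    """tf-idf cosine similarity between token lists u and v, Equation (2)."""
    tf_u, tf_v = Counter(u), Counter(v)
    dot = sum(tf_u[w] * tf_v[w] * idf.get(w, 0.0) ** 2
              for w in tf_u if w in tf_v)
    norm_u = math.sqrt(sum((tf * idf.get(w, 0.0)) ** 2 for w, tf in tf_u.items()))
    norm_v = math.sqrt(sum((tf * idf.get(w, 0.0)) ** 2 for w, tf in tf_v.items()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy cluster of three "speeches":
cluster = [["farm", "bill", "subsidy"], ["farm", "subsidy"], ["armed", "forces"]]
idf = idf_table(cluster)
print(tfidf_cosine(cluster[0], cluster[1], idf))
```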
The recursive definition of the score of any speech $s$ in the speech network is given by

$$p(s) = \sum_{t \in \mathrm{adj}[s]} \frac{p(t)}{\deg(t)} \quad (3)$$

where $\deg(t)$ is the degree of node $t$, and $\mathrm{adj}[s]$ is the set of all speeches adjacent to $s$ in the network.
This can be rewritten in matrix notation as

$$\mathbf{p} = \mathbf{p} B \quad (4)$$

where $\mathbf{p} = (p(s_1), p(s_2), \ldots, p(s_N))$ and the matrix $B$ is the row-normalized similarity matrix of the graph,

$$B(i,j) = \frac{S(i,j)}{\sum_k S(i,k)} \quad (5)$$

where $S(i,j) = \mathrm{sim}(s_i, s_j)$. Equation (4) shows that the vector of salience scores $\mathbf{p}$ is the left eigenvector of $B$ with eigenvalue 1.
The matrix $B$ can be thought of as a stochastic matrix that acts as the transition matrix of a Markov chain: an element $X(i,j)$ of a stochastic matrix specifies the transition probability from state $i$ to state $j$ in the corresponding Markov chain, and the whole process can be seen as a Markovian random walk on the speech graph. To help the random walker escape from periodic or disconnected components, (Brin and Page, 1998) suggest reserving a small escape probability at each node, representing a chance of jumping to any node in the graph. This makes the Markov chain irreducible and aperiodic, which guarantees the existence of the eigenvector.
Equation (4) can then be rewritten, assuming a uniform escape probability, as

$$\mathbf{p} = \mathbf{p}\left[ d\,U + (1-d)\,B \right] \quad (6)$$

where $N$ is the total number of nodes, $U$ is a square matrix with $U(i,j) = 1/N$ for all $i,j$, and $d$ is the escape probability, chosen in the interval $[0.1, 0.2]$ (Brin and Page, 1998).
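Concretely, the left eigenvector of Equation (6) can be approximated by power iteration, as in PageRank. The following sketch assumes a precomputed pairwise similarity matrix; the damping value and tolerance are illustrative choices, not parameters reported in this paper:

```python
import numpy as np

def lexrank_scores(S, d=0.15, tol=1e-8, max_iter=200):
    """Power iteration for p = p[dU + (1 - d)B], Equations (4)-(6).

    S: N x N similarity matrix (e.g., tf-idf cosine values).
    d: escape probability, chosen from [0.1, 0.2] following
       Brin and Page (1998).
    """
    N = S.shape[0]
    row_sums = S.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # guard against isolated nodes
    B = S / row_sums                       # row normalization, Equation (5)
    M = d / N + (1 - d) * B                # dU + (1 - d)B, with U(i, j) = 1/N
    p = np.full(N, 1.0 / N)                # start from the uniform distribution
    for _ in range(max_iter):
        p_next = p @ M                     # left multiplication, p = pM
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p

# Toy graph of three speeches; speaker salience would then aggregate
# each speaker's speech scores, as described above.
S = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.3],
              [0.1, 0.3, 0.0]])
print(lexrank_scores(S))
```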
4.2 Dynamic Salience Scores
We use the time stamps associated with the data to compute dynamic salience scores $p_T(u)$ that identify central speakers at some time $T$. To do this, we create a speech graph that evolves over time. Let $T$ be the current date and let $u$ and $v$ be two speech documents that occur on days $t_u$ and $t_v$. Our goal is to discount the lexical similarity of $u$ and $v$ based on how far apart they are. One way to do this is by defining a new similarity measure $s(u,v;T)$ as

$$s(u,v;T) = \text{tf-idf-cosine}(u,v) \cdot f(u,v;T) \quad (7)$$

where $f(u,v;T)$ is a function taking values in $[0,1]$.

If $f(u,v;T) = 1$ for all $u$, $v$, and $T$, then time is ignored when calculating similarity and $p_T(u) = p(u)$. On the other hand, suppose we let
$$f(u,v;T) = \begin{cases} 1 & \text{if } t_u = t_v = T, \\ 0 & \text{otherwise.} \end{cases} \quad (8)$$
This removes all edges that link a speech occurring at time $T$ to speeches occurring at any other time, so the ranking algorithm is run on what is essentially the subgraph of documents restricted to time $T$ (although the isolated speech documents will receive small non-zero scores because of the escape probability from Section 4.1). These two cases act as the extreme boundaries of possible functions $f$: in the first case, time difference has no effect on document similarity, while in the second case, two documents must occur on the same day to be similar.
We use the following time weight functions in our experiments. In each case, we assume that the speeches represented by speech documents $u$ and $v$ have already occurred, that is, $t_u, t_v \le T$. For all time weight functions, we use the convention that $f(u,v;T) = 0$ if $t_u > T$ or $t_v > T$, which captures the idea that speeches that have not yet occurred have no influence on the graph at time $T$. Also define

$$\mathrm{age}(u,v;T) = T - \min\{t_u, t_v\} \quad (9)$$

which gives the age of the older speech document from the pair $u, v$ at time $T$.

• Exponential: Given a parameter $a > 0$, define

$$f_{\mathrm{exp},a}(u,v;T) = e^{-a\,\mathrm{age}(u,v;T)}. \quad (10)$$

This function decreases the impact of similarity exponentially as time increases. The parameter $a$ controls how fast this happens: a larger value of $a$ makes earlier speeches have a smaller impact on current scores, while a smaller value of $a$ means that earlier speeches have a larger impact on current scores.

• Linear: Given $b > 0$, define

$$f_{\mathrm{lin},b}(u,v;T) = \begin{cases} 1 - \frac{1}{b}\,\mathrm{age}(u,v;T) & \text{if } \mathrm{age}(u,v;T) \le b, \\ 0 & \text{if } \mathrm{age}(u,v;T) > b. \end{cases} \quad (11)$$

This function gives speech documents that occur at time $T$ full weight and then decreases their weight linearly, reaching 0 at age $b$.

• Boundary: Given $d \ge 0$, define

$$f_{\mathrm{bnd},d}(u,v;T) = \begin{cases} 1 & \text{if } \mathrm{age}(u,v;T) \le d, \\ 0 & \text{if } \mathrm{age}(u,v;T) > d. \end{cases} \quad (12)$$

This function gives speech documents occurring within $d$ days of $T$ the regular tf-idf similarity score, but sets the similarity of speech documents occurring more than $d$ days before $T$ to 0. The case $d = 0$ is one of the boundary cases explained above.
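The following sketch implements the three time weight functions and the time-discounted similarity of Equation (7); dates are integer day offsets, and the parameter values ($a = 0.02$ as in the figures below; $b$ and $d$ chosen arbitrarily) are illustrative:

```python
import math

def age(t_u, t_v, T):
    """Equation (9): age of the older speech in the pair at time T."""
    return T - min(t_u, t_v)

def f_exp(t_u, t_v, T, a=0.02):
    """Equation (10): exponential decay at rate a."""
    if t_u > T or t_v > T:
        return 0.0          # future speeches carry no weight
    return math.exp(-a * age(t_u, t_v, T))

def f_lin(t_u, t_v, T, b=30):
    """Equation (11): linear decay, reaching 0 at age b."""
    if t_u > T or t_v > T:
        return 0.0
    g = age(t_u, t_v, T)
    return 1.0 - g / b if g <= b else 0.0

def f_bnd(t_u, t_v, T, d=7):
    """Equation (12): full weight within d days of T, zero outside."""
    if t_u > T or t_v > T:
        return 0.0
    return 1.0 if age(t_u, t_v, T) <= d else 0.0

def time_discounted_similarity(cos_uv, t_u, t_v, T, f=f_exp):
    """Equation (7): s(u, v; T) = tf-idf-cosine(u, v) * f(u, v; T)."""
    return cos_uv * f(t_u, t_v, T)

print(time_discounted_similarity(0.8, t_u=10, t_v=12, T=40))  # older pair, discounted
```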
[Figure 1: The dynamic boundary cases for Senator Santorum.]

Figure 1 gives an example of different time
weighting functions for Senator Rick Santorum
(R - Pennsylvania) on topic 22 (Abortion) during
1997, the first session of the 105th Congress. The
dashed line shows the case when time has no effect on similarity (his score is constant over time),
while the solid line shows the case where only
speeches on the current day are considered similar (his score spikes only on days where he speaks
and is near zero otherwise). The dotted line shows
the case when the influence of older speeches decreases exponentially, which is more dynamic than
the first case but smoother than the second case.
5 Experiments and Results

5.1 Data
We used the United States Congressional Speech
corpus (Monroe et al., 2006) in our experiment.
This corpus is an XML-formatted version of the electronic United States Congressional Record from the Library of Congress (http://thomas.loc.gov). The Congressional
Record is a verbatim transcript of the speeches
made in the US House of Representatives and Senate and includes tens of thousands of speeches per
year (Monroe et al., 2006). The data we used cover
the period from January 2001 to January 2003.
5.2 Experimental Setup
We used results from (Quinn et al., 2006) to get
topic clusters from the data, as described in Section 3. The total number of topics was 42. The
average topic cluster contained several hundred speech documents (Quinn et al., 2006).
We set up a pipeline using a Perl implementation of the proposed method and ran it on the topic clusters, ranking the speakers based on the centrality scores of their speeches. The graph nodes were speech documents. A speaker's score was determined by the average of the scores of the speeches given by that speaker. After comparing the different time weighting functions, as shown in Figure 1, we decided to use the exponential time weight function for all the experiments discussed below. The exponential function decreases the impact of similarity exponentially as time increases and allows us to control the rate of decay using the parameter $a$.
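Putting the pieces together, here is a hedged sketch of this per-day scoring loop (the actual pipeline was implemented in Perl; tfidf_cosine, lexrank_scores, and f_exp refer to the Python sketches above, and the per-speaker averaging follows the description in this section):

```python
import numpy as np
from collections import defaultdict

def dynamic_speaker_salience(speeches, dates, speakers, days, idf, f=f_exp):
    """For each day T, build the time-discounted graph and score speakers.

    speeches: token list per speech; dates: day offset per speech;
    speakers: speaker id per speech; days: time points T to evaluate.
    Returns {T: {speaker: average score of that speaker's speeches}}.
    """
    N = len(speeches)
    cos = np.zeros((N, N))                 # static tf-idf cosine part
    for i in range(N):
        for j in range(i + 1, N):
            cos[i, j] = cos[j, i] = tfidf_cosine(speeches[i], speeches[j], idf)
    results = {}
    for T in days:
        # Equation (7): discount each edge by the time weight f.
        S = np.array([[cos[i, j] * f(dates[i], dates[j], T)
                       for j in range(N)] for i in range(N)])
        p = lexrank_scores(S)
        per_speaker = defaultdict(list)
        for score, who in zip(p, speakers):
            per_speaker[who].append(score)
        results[T] = {who: float(np.mean(sc)) for who, sc in per_speaker.items()}
    return results
```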
5.3 Baseline
We compare the performance of our system to a simple baseline that calculates the salience of a speaker as a weighted count of the number of times he or she has spoken. The baseline gives high weight to recent speeches, and the weight decreases as the speeches get older. The salience score of a speaker is calculated as follows:

$$BS(i) = \sum_d \gamma^{\,d_0 - d} \cdot S_d^i \quad (13)$$

where $BS(i)$ is the baseline score of speaker $i$, $\gamma$ is the discounting factor, $d_0$ is the current date, and $S_d^i$ is the number of speeches made by speaker $i$ at date $d$. We used $\gamma = 0.9$ for all our experiments.
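A minimal sketch of Equation (13), assuming per-day speech counts and integer day offsets:

```python
def baseline_score(speech_counts, d0, gamma=0.9):
    """Equation (13): BS(i) = sum_d gamma^(d0 - d) * S_d^i.

    speech_counts: dict mapping date d -> number of speeches made by
    speaker i on that date (d <= d0); d0: the current date.
    """
    return sum(gamma ** (d0 - d) * count for d, count in speech_counts.items())

# A speaker who spoke 3 times on day 98 and twice on day 100, scored at day 100:
print(baseline_score({98: 3, 100: 2}, d0=100))  # 0.9**2 * 3 + 2 = 4.43
```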
5.4 Results
One way to evaluate the dynamic salience scores is to look at changes when party control of the
chamber switches. Similar to (Hartog and Monroe, 2004), we exploit the party switch made by Senator Jim Jeffords of Vermont, and the resulting change in majority control of the Senate during the 107th Congress, as a quasi-experimental design. In short, Jeffords announced on May 24, 2001 his switch from Republican to Independent status, effective June 6, 2001. Jeffords stated that he would vote with the Democrats to organize the Senate, giving the Democrats a one-seat advantage and returning control of the Senate from the Republicans to the Democrats. This change of majority status during the 107th Congress allows us to ignore many of the factors that could potentially influence dynamic salience scores at the start of a new Congress.
On average, we expect committee chairs or members of the majority party to be the most important speakers on each topic, followed by ranking members or members of the minority party. If our measure is capturing dynamics in the centrality of Senators, we expect Republicans to be more central before the Jeffords switch and Democrats to become more central soon afterward, assuming the topic is being discussed on the Senate floor. We show that the proposed technique captures several interesting events in the data, and that the baseline explained above fails to capture the same set of events.
Figure 2(a) shows the dynamic salience scores
over time for Senator John McCain (R - Arizona)
and Senator Carl Levin (D - Michigan) on topic
5 (Armed Forces 2) for the 107th Senate. McCain was the most salient speaker for this topic
until June 2001. Soon after the change in majority status, Levin, the new
chair of Senate Armed Services, replaced McCain
as the most salient speaker. On the other hand,
Figure 2(b) shows the baseline scores for the same
topic and same speakers. We notice here that the
baseline failed to capture the switch of salience
near June 2001.
We can also observe similar behavior in Figure 3(a). This figure shows how Senate Majority
Leader Trent Lott (R - Mississippi) was the most
salient speaker on topic 35 (Procedural Legislation) until July 2001. Topic 35 does not map to
a specific committee but rather is related to maneuvering bills through the legislative process on
the floor, a job generally delegated to members in
the Senate leadership.

[Figure 2: The switch of speaker salience near June 2001 for Topic 5 (Armed Forces 2): (a) dynamic LexRank scores for Senators McCain and Levin (exponential weighting, a = 0.02); (b) baseline scores.]

[Figure 3: The switch of speaker salience near June 2001 for Topic 35 (Procedural Legislation): (a) dynamic LexRank scores for Senators Lott and Reid (exponential weighting, a = 0.02); (b) baseline scores.]

Just after his party gained
majority status, Senator Harry Reid (D - Nevada)
became the most salient speaker for this topic. This is consistent with Reid's move from Assistant Minority Leader to Assistant Majority Leader. Again, the baseline scores for the same topic and speakers in Figure 3(b) fail to capture the switch.
An even more interesting test is to check whether the Democrats in general became more central than the Republicans after the Jeffords switch. Figure 4(a) shows the normalized sum of the scores of all the Democrats and all the Republicans on topic 5 (Armed Forces 2) for the 107th Senate. The figure shows how the Republicans were most salient until soon after the Jeffords switch, when the Democrats regained the majority and became more salient. We observed similar behavior when we studied how the average salience of Democrats and Republicans changes across all topics. This is shown in Figure 5(a), where we can see that the Republicans were more salient on average across all topics until June 2001. Soon after the change in majority status, Democrats became more central. Figures 4(b) and 5(b) show the same results using the baseline system. We notice that the numbers of speeches made by the Democrats and the Republicans are very similar most of the time; even when one party has more speeches than the other, this does not reflect the salience of the speakers or the parties in general.
An alternative approach to evaluate the dynamic
scores is to exploit the cyclical nature of the legislative process as some bills are re-authorized on
a fairly regular time schedule. For example, the
farm bill comes due about every five years. As a
new topic comes up for debate, we expect the salience scores of the relevant legislators to increase.
[Figure 4: The switch of speaker salience near June 2001 for Topic 5 (Armed Forces 2), Republicans vs. Democrats: (a) dynamic LexRank; (b) baseline.]

[Figure 5: The switch of speaker salience near June 2001 for all topics, Republicans vs. Democrats: (a) dynamic LexRank; (b) baseline.]

Figure 6 shows the dynamic scores of Senator Thomas Harkin (D - Iowa) and Senator Richard
Lugar (R - Indiana) during the 107th Senate on topic 24 (Agriculture). The proposed method identified the two senators as the most salient speakers for this topic, as expected, since each served as chairman of the Senate Committee on Agriculture, Nutrition, and Forestry when his party was in the majority during the 107th Senate. This committee was in charge of shepherding the Farm Bill through the Senate. The scores of both senators on the agriculture topic increased significantly starting in late 2001 and remained high until June 2002; debate on the bill began in September 2001, and it was not passed until May 2002.

[Figure 6: The farm bill discussions and the relative distribution of attention to Topic 24 (Agriculture): dynamic LexRank scores for Senators Lugar and Harkin (exponential weighting, a = 0.02).]
6 Conclusion
We presented a graph-based method for analyzing the temporal evolution of the salience of participants in a discussion. We used this method to track the evolution of the salience of speakers in the US Congressional Record, and showed that the way salience scores evolve over time can answer several interesting questions. We tracked how the influence of speakers varies with majority status and changes of party control, and showed how a baseline system that depends only on the number of speeches fails to capture the interesting events captured by the proposed system. We also studied the dynamics of the relative distribution of attention to each topic area in different time periods, capturing the cyclical nature of the legislative process as some bills are re-authorized on a fairly regular time schedule.
Acknowledgments

This paper is based upon work supported by the National Science Foundation under Grant No. 0527513, "DHB: The dynamics of Political Representation and Political Rhetoric". Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

Blei, David and John Lafferty. 2006. Dynamic topic models. In ICML 2006.

Blei, David, Andrew Ng, and Michael Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022.

Brin, Sergey and Lawrence Page. 1998. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117.

Erkan, Güneş and Dragomir Radev. 2004. LexRank: Graph-based centrality as salience in text summarization. Journal of Artificial Intelligence Research (JAIR).

Erkan, Güneş. 2006. Language model-based document clustering using random walks. In HLT/NAACL 2006, pages 479–486.

Fader, Anthony, Dragomir Radev, Michael Crespin, Burt Monroe, Kevin Quinn, and Michael Colaresi. 2007. MavenRank: Identifying influential members of the US Senate using lexical centrality. In EMNLP 2007.

Hartog, Chris Den and Nathan Monroe. 2004. The value of majority status: The effect of Jeffords's switch on asset prices of Republican and Democratic firms. Legislative Studies Quarterly, 33:63–84.

Kleinberg, Jon. 1998. Authoritative sources in a hyperlinked environment. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pages 668–677.

Kurland, Oren and Lillian Lee. 2005. PageRank without hyperlinks: Structural re-ranking using links induced by language models. In SIGIR 2005, pages 306–313.

Kurland, Oren and Lillian Lee. 2006. Respect my authority! HITS without hyperlinks, utilizing cluster-based language models. In SIGIR 2006, pages 83–90.

McLachlan, Geoffrey and David Peel. 2000. Finite Mixture Models. New York: Wiley.

Mihalcea, Rada and Paul Tarau. 2004. TextRank: Bringing order into texts. In EMNLP 2004.

Mihalcea, Rada, Paul Tarau, and Elizabeth Figa. 2004. PageRank on semantic networks, with application to word sense disambiguation. In COLING 2004, pages 1126–1132.

Monroe, Burt, Cheryl Monroe, Kevin Quinn, Dragomir Radev, Michael Crespin, Michael Colaresi, Anthony Fader, Jacob Balazer, and Steven Abney. 2006. United States Congressional Speech Corpus. Department of Political Science, The Pennsylvania State University.

Newman, Mark. 2003. A measure of betweenness centrality based on random walks. Technical Report cond-mat/0309045, Arxiv.org.

Porter, Mason, Peter Mucha, Mark Newman, and Casey Warmbrand. 2005. A network analysis of committees in the U.S. House of Representatives. PNAS, 102(20).

Quinn, Kevin, Burt Monroe, Michael Colaresi, Michael Crespin, and Dragomir Radev. 2006. An automated method of topic-coding legislative speech over time with application to the 105th–108th U.S. Senate. In Midwest Political Science Association Meeting.

Sparck-Jones, Karen. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11–20.

Thomas, Matt, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. In EMNLP 2006, pages 327–335.