Slides

advertisement
Topic-Factorized Ideal Point
Estimation Model for Legislative
Voting Network
Yupeng Gu†, Yizhou Sun†, Ning Jiang‡, Bingyu Wang†, Ting Chen†
†Northeastern
University
‡University of Illinois at Urbana-Champaign
Background
Federal
Legislation
(bill)
The House
Senate
Law
……
Bill 1 Bill 2
United States Congress
Ronald Paul
Barack Obama
The House
Senate
Politician
Republican
Democrat
Ronald Paul
Barack Obama
liberal
conservative
2 /27
Outline
• Background and Problem Definition
• Topic Factorized Ideal Point Model and Learning Algorithm
• Experimental Results
• Conclusion
3 /27
Legislative Voting Network
4 /27
Problem Definition
Output:
Input:
Legislative Network
๐’™๐‘ข : Ideal Points for Politician ๐‘ข
๐’‚๐‘‘ : Ideal Points for Bill ๐‘‘
๐’™๐‘ข ’s on different topics
5 /27
Existing Work
• 1 dimensional ideal point model (Poole and Rosenthal, 1985; Gerrish
and Blei, 2011)
• High-dimensional ideal point model (Poole and Rosenthal, 1997)
• Issue-adjusted ideal point model (Gerrish and Blei, 2012)
6 /27
Motivations
•
Voters have different attitudes on different topics.
Topic 1
Topic 2
Topic 3
Topic 4
•
Traditional matrix factorization method cannot give the meanings for each
dimension.
๐‘€
•
≈ ๐‘ˆ
⋅
๐‘‰๐‘‡
๐‘˜ ๐‘กโ„Ž latent factor
Topics of bills can influence politician’s voting, and the voting behavior can better
interpret the topics of bills as well.
Topic Model:
• Health
• Public Transport
• …
Voting-guided Topic Model:
• Health Service
• Health Expenses
• Public Transport
• …
7 /27
Outline
• Background and Problem Definition
• Topic Factorized Ideal Point Model and Learning Algorithm
• Experimental Results
• Conclusion
8 /27
Topic Factorized IPM
๐‘›(๐‘‘, ๐‘ค)
๐‘ฃ๐‘ข๐‘‘
๐‘ข
๐‘ค
๐‘‘
Entities:
• Politicians
• Bills
• Terms
Links:
• (๐‘ƒ, ๐ต)
• (๐ต, ๐‘‡)
Politicians
Bills
Terms
Heterogeneous Voting Network
9 /27
Text Part
Politicians
Bills
Terms
10 /27
Text Part
• We model the probability of each word in each document as a
mixture of multinomial distributions, as in PLSA (Hofmann, 1999) and
LDA (Blei et al., 2003)
๐’˜๐‘‘ = ๐‘› ๐‘‘, 1 , ๐‘› ๐‘‘, 2 , … , ๐‘› ๐‘‘, ๐‘๐‘ค
๐œƒ๐‘‘๐‘˜ = ๐‘(๐‘˜|๐‘‘)
๐›ฝ๐‘˜๐‘ค = ๐‘(๐‘ค|๐‘˜)
๐‘‘
๐‘˜
๐‘ค
Bill
Topic
Word
๐‘›(๐‘‘,๐‘ค)
๐‘ ๐’˜๐‘‘ ๐œฝ, ๐œท =
(
๐‘ค
๐œƒ๐‘‘๐‘˜ ๐›ฝ๐‘˜๐‘ค )
๐‘˜
๐‘›(๐‘‘,๐‘ค)
๐‘ ๐‘พ ๐œฝ, ๐œท =
(
๐‘‘
๐‘ค
๐œƒ๐‘‘๐‘˜ ๐›ฝ๐‘˜๐‘ค )
๐‘˜
11 /27
Voting Part
Politicians
Bills
Terms
12 /27
Topic Topic
1
2
Voting Part
……
๐‘‘1 ๐‘‘2
……
Topic …… Topic
๐‘˜
Voter ๐‘ข
๐’™๐‘ข
๐‘ฅ๐‘ข1 ๐‘ฅ๐‘ข2
๐‘ฅ๐‘ข๐‘˜
๐พ
๐‘ฅ๐‘ข๐พ
Bill ๐‘‘
๐’‚๐‘‘
๐‘Ž๐‘‘1 ๐‘Ž๐‘‘2
๐‘Ž๐‘‘๐‘˜
๐‘Ž๐‘‘๐พ
-1
1
1
1
1
1
๐‘ข2 0
0
-1
1
1
1
-1
1
๐พ
๐œƒ๐‘‘1 ๐‘ฅ๐‘ข1 ๐‘Ž๐‘‘1
๐‘Ÿ๐‘ข๐‘‘ =
……
1
๐œƒ๐‘‘๐‘˜ ๐‘ฅ๐‘ข๐‘˜ ๐‘Ž๐‘‘๐‘˜
……
……
1
1
-1
1
0
๐พ
๐‘Ÿ๐‘ข๐‘‘ =
๐œƒ๐‘‘๐‘˜ ๐‘ฅ๐‘ข๐‘˜ ๐‘Ž๐‘‘๐‘˜
๐‘˜=1
0
๐‘ ๐‘ฃ๐‘ข๐‘‘ = 1 = ๐œŽ(
User-Bill voting matrix ๐‘ฝ
YEA
๐œƒ๐‘‘๐‘˜ ๐‘ฅ๐‘ข๐‘˜ ๐‘Ž๐‘‘๐‘˜ + ๐‘๐‘‘ )
๐‘˜
๐‘ ๐‘ฃ๐‘ข๐‘‘ = −1 = 1 − ๐œŽ(
๐ผ{๐‘ฃ๐‘ข๐‘‘ =1}
๐‘ ๐‘ฝ ๐œฝ, ๐‘ฟ, ๐‘จ, ๐’ƒ =
๐‘ฅ๐‘ข๐‘˜ ๐‘Ž๐‘‘๐‘˜
๐‘˜=1
๐œƒ๐‘‘๐พ ๐‘ฅ๐‘ข๐พ ๐‘Ž๐‘‘๐พ
1
๐‘Ž๐‘‘๐‘˜ ∈ ๐‘น
๐‘‘๐‘๐ท
๐‘ข1 0
๐‘ข๐‘๐‘ˆ 1
๐‘ฅ๐‘ข๐‘˜ ∈ ๐‘น
(๐‘ ๐‘ฃ๐‘ข๐‘‘ = 1
๐‘ข,๐‘‘ :๐‘ฃ๐‘ข๐‘‘ ≠0
1+๐‘ฃ๐‘ข๐‘‘
2 ๐‘
๐ผ{๐‘ฃ๐‘ข๐‘‘ =−1}
๐‘ฃ๐‘ข๐‘‘ = −1
1−๐‘ฃ๐‘ข๐‘‘
2 )
๐œƒ๐‘‘๐‘˜ ๐‘ฅ๐‘ข๐‘˜ ๐‘Ž๐‘‘๐‘˜ + ๐‘๐‘‘ )
NAY
๐‘˜
13 /27
Combining Two Parts Together
• The final objective function is a linear combination of the two average
log-likelihood functions over the word links and voting links.
๐ฝ ๐œฝ, ๐œท, ๐‘ฟ, ๐‘จ, ๐’ƒ = 1 − ๐œ†
๐‘‘,๐‘ค ๐‘› ๐‘‘, ๐‘ค log(
๐‘๐น
๐‘˜ ๐œƒ๐‘‘๐‘˜ ๐›ฝ๐‘˜๐‘ค )
+๐œ†
1 + ๐‘ฃ๐‘ข๐‘‘
1 − ๐‘ฃ๐‘ข๐‘‘
log ๐‘ ๐‘ฃ๐‘ข๐‘‘ = 1 +
log ๐‘ ๐‘ฃ๐‘ข๐‘‘ = −1 )
2
2
๐‘๐‘‰
๐‘ข,๐‘‘ :๐‘ฃ๐‘ข๐‘‘ ≠0(
s.t.
0 ≤ ๐œƒ๐‘‘๐‘˜ ≤ 1,
and
๐œƒ๐‘‘๐‘˜ = 1
0 ≤ ๐›ฝ๐‘˜๐‘ค ≤ 1,
๐‘˜
๐›ฝ๐‘˜๐‘ค = 1
๐‘ค
• We also add an ๐‘™2 regularization term to ๐ด and ๐‘‹ to reduce over-fitting.
๐ฝ ๐œฝ, ๐œท, ๐‘ฟ, ๐‘จ, ๐’ƒ = 1 − ๐œ†
−
1
(
2๐œŽ 2
๐‘‘,๐‘ค ๐‘› ๐‘‘, ๐‘ค log(
๐‘๐น
๐’™๐’–
๐‘ข
2
+
2
๐’‚๐‘‘
๐‘‘
๐‘˜ ๐œƒ๐‘‘๐‘˜ ๐›ฝ๐‘˜๐‘ค )
2
2
+๐œ†
1 + ๐‘ฃ๐‘ข๐‘‘
1 − ๐‘ฃ๐‘ข๐‘‘
log ๐‘ ๐‘ฃ๐‘ข๐‘‘ = 1 +
log ๐‘ ๐‘ฃ๐‘ข๐‘‘ = −1 )
2
2
๐‘๐‘‰
๐‘ข,๐‘‘ :๐‘ฃ๐‘ข๐‘‘ ≠0(
)
14 /27
Learning Algorithm
• An iterative algorithm where ideal points related parameters (๐‘‹, ๐ด, ๐‘)
and topic model related parameters (๐œƒ, ๐›ฝ) enhance each other.
• Step 1: Update ๐‘‹, ๐ด, ๐‘ given ๐œƒ, ๐›ฝ
• Gradient descent
• Step 2: Update ๐œƒ, ๐›ฝ given ๐‘‹, ๐ด, ๐‘
• Follow the idea of expectation-maximization (EM) algorithm: maximize a lower bound of
the objective function in each iteration
15 /27
Learning Algorithm
• Update ๐œƒ: A nonlinear constrained optimization problem.
Remove the constraints by a logistic function based transformation:
๐‘’ ๐œ‡๐‘‘๐‘˜
๐œƒ๐‘‘๐‘˜ =
1+
๐พ−1 ๐œ‡๐‘‘๐‘˜′
๐‘˜ ′ =1 ๐‘’
1
1+
๐พ−1 ๐œ‡๐‘‘๐‘˜′
๐‘˜ ′ =1 ๐‘’
if 1 ≤ ๐‘˜ ≤ ๐พ − 1
if ๐‘˜ = ๐พ
and update ๐œ‡๐‘‘๐‘˜ using gradient descent.
• Update ๐›ฝ:
Since ๐›ฝ only appears in the topic model part, we use the same updating rule as in PLSA:
where
16 /27
Outline
• Background and Problem Definition
• Topic Factorized Ideal Point Model and Learning Algorithm
• Experimental Results
• Conclusion
17 /27
Data Description
• Dataset:
• U.S. House and Senate roll call data in the years between 1990 and 2013.∗
• 1,540 legislators
• 7,162 bills
• 2,780,453 votes (80% are “YEA”)
• Keep the latest version of a bill if there are multiple versions.
• Randomly select 90% of the votes as training and 10% as testing.
∗
Downloaded from http://thomas.loc.gov/home/rollcallvotes.html
18 /27
Evaluation Measures
• Root mean square error (RMSE) between the predicted vote score and the
ground truth
RMSE =
1+๐‘ฃ๐‘ข๐‘‘
−๐‘
๐‘ข,๐‘‘ :๐‘ฃ๐‘ข๐‘‘ ≠0
2
2
๐‘ฃ๐‘ข๐‘‘ =1
๐‘๐‘‰
• Accuracy of correctly predicted votes (using 0.5 as a threshold for the
predicted accuracy)
Accuracy =
๐‘ข,๐‘‘(๐ผ ๐‘ ๐‘ฃ๐‘ข๐‘‘ =1 >0.5 && ๐‘ฃ๐‘ข๐‘‘ =1
+๐ผ ๐‘ ๐‘ฃ =1 <0.5 && ๐‘ฃ =−1 )
๐‘ข๐‘‘
๐‘ข๐‘‘
๐‘๐‘‰
• Average log-likelihood of the voting link
AvelogL =
๐‘ข,๐‘‘ :๐‘ฃ๐‘ข๐‘‘ ≠0
1+๐‘ฃ๐‘ข๐‘‘
2
1−๐‘ฃ๐‘ข๐‘‘
2
log ๐‘ ๐‘ฃ๐‘ข๐‘‘ =1 +
log ๐‘(๐‘ฃ๐‘ข๐‘‘ =−1)
๐‘๐‘‰
19 /27
Experimental Results
Training Data set
Testing Data set
20 /27
Parameter Study
1
๐ฝ ๐œฝ, ๐œท, ๐‘ฟ, ๐‘จ, ๐’ƒ = 1 − ๐œ† ⋅ ๐‘Ž๐‘ฃ๐‘’๐‘™๐‘œ๐‘”๐ฟ ๐‘ก๐‘’๐‘ฅ๐‘ก + ๐œ† ⋅ ๐‘Ž๐‘ฃ๐‘’๐‘™๐‘œ๐‘”๐ฟ(๐‘ฃ๐‘œ๐‘ก๐‘–๐‘›๐‘”) − 2 (
2๐œŽ
Parameter study on ๐œ†
๐’™๐’–
๐‘ข
2
2
+
๐’‚๐‘‘
2
2
)
๐‘‘
Parameter study on ๐œŽ (regularization coefficient)
21 /27
Case Studies
• Ideal points for three famous politicians:
(Republican, Democrat)
• Ronald Paul (R), Barack Obama (D), Joe Lieberman (D)
Public Transportation
Funds
Health Expenses
Health Service
Law
Military
Financial Institution
Individual Property
Education
Foreign
Ronald Paul
Barack Obama
Joe Lieberman
22 /27
Case Studies
• Pick three topics:
(Republican, Democrat)
23 /27
Case Studies
Bill: H_R_4548 (in 2004)
๐‘ฅ๐‘ข๐‘˜
R. Paul
Nay
H_R_4548
For Unseen Bill ๐‘‘:
๐‘Ž๐‘‘๐‘˜
Topic Model
๐œฝ๐‘‘
TF-IPM
๐’™๐‘ข
Experts
๐’‚๐‘‘ , ๐‘๐‘‘
๐œƒ๐‘‘๐‘˜
๐‘ ๐‘ฃ๐‘ข๐‘‘ = 1 = ๐œŽ(
๐œƒ๐‘‘๐‘˜ ๐‘ฅ๐‘ข๐‘˜ ๐‘Ž๐‘‘๐‘˜ + ๐‘๐‘‘ )
๐‘˜
๐‘ ๐‘ฃ๐‘ข๐‘‘ = 1
24 /27
Outline
• Background and Problem Definition
• Topic Factorized Ideal Point Model and Learning Algorithm
• Experimental Results
• Conclusion
25 /27
Conclusion
• We estimate the ideal points of legislators and bills on multiple
dimensions instead of global ones.
• The generation of topics are guided by two types of links in the
heterogeneous network.
• We present a unified model that combines voting behavior and topic
modeling, and propose an iterative learning algorithm to learn the
parameters.
• The experimental results on real-world roll call data show our
advantage over the state-of-the-art methods.
26 /27
Thanks
Q&A
27 /27
Download