Slides

Topic-Factorized Ideal Point Estimation Model for Legislative Voting Network Yupeng Gu†, Yizhou Sun†, Ning Jiang‡, Bingyu Wang†, Ting Chen† †Northeastern University ‡University of Illinois at Urbana-Champaign Background Federal Legislation (bill) The House Senate Law …… Bill 1 Bill 2 United States Congress Ronald Paul Barack Obama The House Senate Politician Republican Democrat Ronald Paul Barack Obama liberal conservative 2 /27 Outline • Background and Problem Definition • Topic Factorized Ideal Point Model and Learning Algorithm • Experimental Results • Conclusion 3 /27 Legislative Voting Network 4 /27 Problem Definition Output: Input: Legislative Network 𝒙𝑢 : Ideal Points for Politician 𝑢 𝒂𝑑 : Ideal Points for Bill 𝑑 𝒙𝑢 ’s on different topics 5 /27 Existing Work • 1 dimensional ideal point model (Poole and Rosenthal, 1985; Gerrish and Blei, 2011) • High-dimensional ideal point model (Poole and Rosenthal, 1997) • Issue-adjusted ideal point model (Gerrish and Blei, 2012) 6 /27 Motivations • Voters have different attitudes on different topics. Topic 1 Topic 2 Topic 3 Topic 4 • Traditional matrix factorization method cannot give the meanings for each dimension. 𝑀 • ≈ 𝑈 ⋅ 𝑉𝑇 𝑘 𝑡ℎ latent factor Topics of bills can influence politician’s voting, and the voting behavior can better interpret the topics of bills as well. Topic Model: • Health • Public Transport • … Voting-guided Topic Model: • Health Service • Health Expenses • Public Transport • … 7 /27 Outline • Background and Problem Definition • Topic Factorized Ideal Point Model and Learning Algorithm • Experimental Results • Conclusion 8 /27 Topic Factorized IPM 𝑛(𝑑, 𝑤) 𝑣𝑢𝑑 𝑢 𝑤 𝑑 Entities: • Politicians • Bills • Terms Links: • (𝑃, 𝐵) • (𝐵, 𝑇) Politicians Bills Terms Heterogeneous Voting Network 9 /27 Text Part Politicians Bills Terms 10 /27 Text Part • We model the probability of each word in each document as a mixture of multinomial distributions, as in PLSA (Hofmann, 1999) and LDA (Blei et al., 2003) 𝒘𝑑 = 𝑛 𝑑, 1 , 𝑛 𝑑, 2 , … , 𝑛 𝑑, 𝑁𝑤 𝜃𝑑𝑘 = 𝑝(𝑘|𝑑) 𝛽𝑘𝑤 = 𝑝(𝑤|𝑘) 𝑑 𝑘 𝑤 Bill Topic Word 𝑛(𝑑,𝑤) 𝑝 𝒘𝑑 𝜽, 𝜷 = ( 𝑤 𝜃𝑑𝑘 𝛽𝑘𝑤 ) 𝑘 𝑛(𝑑,𝑤) 𝑝 𝑾 𝜽, 𝜷 = ( 𝑑 𝑤 𝜃𝑑𝑘 𝛽𝑘𝑤 ) 𝑘 11 /27 Voting Part Politicians Bills Terms 12 /27 Topic Topic 1 2 Voting Part …… 𝑑1 𝑑2 …… Topic …… Topic 𝑘 Voter 𝑢 𝒙𝑢 𝑥𝑢1 𝑥𝑢2 𝑥𝑢𝑘 𝐾 𝑥𝑢𝐾 Bill 𝑑 𝒂𝑑 𝑎𝑑1 𝑎𝑑2 𝑎𝑑𝑘 𝑎𝑑𝐾 -1 1 1 1 1 1 𝑢2 0 0 -1 1 1 1 -1 1 𝐾 𝜃𝑑1 𝑥𝑢1 𝑎𝑑1 𝑟𝑢𝑑 = …… 1 𝜃𝑑𝑘 𝑥𝑢𝑘 𝑎𝑑𝑘 …… …… 1 1 -1 1 0 𝐾 𝑟𝑢𝑑 = 𝜃𝑑𝑘 𝑥𝑢𝑘 𝑎𝑑𝑘 𝑘=1 0 𝑝 𝑣𝑢𝑑 = 1 = 𝜎( User-Bill voting matrix 𝑽 YEA 𝜃𝑑𝑘 𝑥𝑢𝑘 𝑎𝑑𝑘 + 𝑏𝑑 ) 𝑘 𝑝 𝑣𝑢𝑑 = −1 = 1 − 𝜎( 𝐼{𝑣𝑢𝑑 =1} 𝑝 𝑽 𝜽, 𝑿, 𝑨, 𝒃 = 𝑥𝑢𝑘 𝑎𝑑𝑘 𝑘=1 𝜃𝑑𝐾 𝑥𝑢𝐾 𝑎𝑑𝐾 1 𝑎𝑑𝑘 ∈ 𝑹 𝑑𝑁𝐷 𝑢1 0 𝑢𝑁𝑈 1 𝑥𝑢𝑘 ∈ 𝑹 (𝑝 𝑣𝑢𝑑 = 1 𝑢,𝑑 :𝑣𝑢𝑑 ≠0 1+𝑣𝑢𝑑 2 𝑝 𝐼{𝑣𝑢𝑑 =−1} 𝑣𝑢𝑑 = −1 1−𝑣𝑢𝑑 2 ) 𝜃𝑑𝑘 𝑥𝑢𝑘 𝑎𝑑𝑘 + 𝑏𝑑 ) NAY 𝑘 13 /27 Combining Two Parts Together • The final objective function is a linear combination of the two average log-likelihood functions over the word links and voting links. 𝐽 𝜽, 𝜷, 𝑿, 𝑨, 𝒃 = 1 − 𝜆 𝑑,𝑤 𝑛 𝑑, 𝑤 log( 𝑁𝐹 𝑘 𝜃𝑑𝑘 𝛽𝑘𝑤 ) +𝜆 1 + 𝑣𝑢𝑑 1 − 𝑣𝑢𝑑 log 𝑝 𝑣𝑢𝑑 = 1 + log 𝑝 𝑣𝑢𝑑 = −1 ) 2 2 𝑁𝑉 𝑢,𝑑 :𝑣𝑢𝑑 ≠0( s.t. 0 ≤ 𝜃𝑑𝑘 ≤ 1, and 𝜃𝑑𝑘 = 1 0 ≤ 𝛽𝑘𝑤 ≤ 1, 𝑘 𝛽𝑘𝑤 = 1 𝑤 • We also add an 𝑙2 regularization term to 𝐴 and 𝑋 to reduce over-fitting. 𝐽 𝜽, 𝜷, 𝑿, 𝑨, 𝒃 = 1 − 𝜆 − 1 ( 2𝜎 2 𝑑,𝑤 𝑛 𝑑, 𝑤 log( 𝑁𝐹 𝒙𝒖 𝑢 2 + 2 𝒂𝑑 𝑑 𝑘 𝜃𝑑𝑘 𝛽𝑘𝑤 ) 2 2 +𝜆 1 + 𝑣𝑢𝑑 1 − 𝑣𝑢𝑑 log 𝑝 𝑣𝑢𝑑 = 1 + log 𝑝 𝑣𝑢𝑑 = −1 ) 2 2 𝑁𝑉 𝑢,𝑑 :𝑣𝑢𝑑 ≠0( ) 14 /27 Learning Algorithm • An iterative algorithm where ideal points related parameters (𝑋, 𝐴, 𝑏) and topic model related parameters (𝜃, 𝛽) enhance each other. • Step 1: Update 𝑋, 𝐴, 𝑏 given 𝜃, 𝛽 • Gradient descent • Step 2: Update 𝜃, 𝛽 given 𝑋, 𝐴, 𝑏 • Follow the idea of expectation-maximization (EM) algorithm: maximize a lower bound of the objective function in each iteration 15 /27 Learning Algorithm • Update 𝜃: A nonlinear constrained optimization problem. Remove the constraints by a logistic function based transformation: 𝑒 𝜇𝑑𝑘 𝜃𝑑𝑘 = 1+ 𝐾−1 𝜇𝑑𝑘′ 𝑘 ′ =1 𝑒 1 1+ 𝐾−1 𝜇𝑑𝑘′ 𝑘 ′ =1 𝑒 if 1 ≤ 𝑘 ≤ 𝐾 − 1 if 𝑘 = 𝐾 and update 𝜇𝑑𝑘 using gradient descent. • Update 𝛽: Since 𝛽 only appears in the topic model part, we use the same updating rule as in PLSA: where 16 /27 Outline • Background and Problem Definition • Topic Factorized Ideal Point Model and Learning Algorithm • Experimental Results • Conclusion 17 /27 Data Description • Dataset: • U.S. House and Senate roll call data in the years between 1990 and 2013.∗ • 1,540 legislators • 7,162 bills • 2,780,453 votes (80% are “YEA”) • Keep the latest version of a bill if there are multiple versions. • Randomly select 90% of the votes as training and 10% as testing. ∗ Downloaded from http://thomas.loc.gov/home/rollcallvotes.html 18 /27 Evaluation Measures • Root mean square error (RMSE) between the predicted vote score and the ground truth RMSE = 1+𝑣𝑢𝑑 −𝑝 𝑢,𝑑 :𝑣𝑢𝑑 ≠0 2 2 𝑣𝑢𝑑 =1 𝑁𝑉 • Accuracy of correctly predicted votes (using 0.5 as a threshold for the predicted accuracy) Accuracy = 𝑢,𝑑(𝐼 𝑝 𝑣𝑢𝑑 =1 >0.5 && 𝑣𝑢𝑑 =1 +𝐼 𝑝 𝑣 =1 <0.5 && 𝑣 =−1 ) 𝑢𝑑 𝑢𝑑 𝑁𝑉 • Average log-likelihood of the voting link AvelogL = 𝑢,𝑑 :𝑣𝑢𝑑 ≠0 1+𝑣𝑢𝑑 2 1−𝑣𝑢𝑑 2 log 𝑝 𝑣𝑢𝑑 =1 + log 𝑝(𝑣𝑢𝑑 =−1) 𝑁𝑉 19 /27 Experimental Results Training Data set Testing Data set 20 /27 Parameter Study 1 𝐽 𝜽, 𝜷, 𝑿, 𝑨, 𝒃 = 1 − 𝜆 ⋅ 𝑎𝑣𝑒𝑙𝑜𝑔𝐿 𝑡𝑒𝑥𝑡 + 𝜆 ⋅ 𝑎𝑣𝑒𝑙𝑜𝑔𝐿(𝑣𝑜𝑡𝑖𝑛𝑔) − 2 ( 2𝜎 Parameter study on 𝜆 𝒙𝒖 𝑢 2 2 + 𝒂𝑑 2 2 ) 𝑑 Parameter study on 𝜎 (regularization coefficient) 21 /27 Case Studies • Ideal points for three famous politicians: (Republican, Democrat) • Ronald Paul (R), Barack Obama (D), Joe Lieberman (D) Public Transportation Funds Health Expenses Health Service Law Military Financial Institution Individual Property Education Foreign Ronald Paul Barack Obama Joe Lieberman 22 /27 Case Studies • Pick three topics: (Republican, Democrat) 23 /27 Case Studies Bill: H_R_4548 (in 2004) 𝑥𝑢𝑘 R. Paul Nay H_R_4548 For Unseen Bill 𝑑: 𝑎𝑑𝑘 Topic Model 𝜽𝑑 TF-IPM 𝒙𝑢 Experts 𝒂𝑑 , 𝑏𝑑 𝜃𝑑𝑘 𝑝 𝑣𝑢𝑑 = 1 = 𝜎( 𝜃𝑑𝑘 𝑥𝑢𝑘 𝑎𝑑𝑘 + 𝑏𝑑 ) 𝑘 𝑝 𝑣𝑢𝑑 = 1 24 /27 Outline • Background and Problem Definition • Topic Factorized Ideal Point Model and Learning Algorithm • Experimental Results • Conclusion 25 /27 Conclusion • We estimate the ideal points of legislators and bills on multiple dimensions instead of global ones. • The generation of topics are guided by two types of links in the heterogeneous network. • We present a unified model that combines voting behavior and topic modeling, and propose an iterative learning algorithm to learn the parameters. • The experimental results on real-world roll call data show our advantage over the state-of-the-art methods. 26 /27 Thanks Q&A 27 /27

Slides

Related documents

Products

Support

Slides

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib