Topic-Factorized Ideal Point Estimation Model for Legislative Voting Network Yupeng Gu†, Yizhou Sun†, Ning Jiang‡, Bingyu Wang†, Ting Chen† †Northeastern University ‡University of Illinois at Urbana-Champaign Background Federal Legislation (bill) The House Senate Law …… Bill 1 Bill 2 United States Congress Ronald Paul Barack Obama The House Senate Politician Republican Democrat Ronald Paul Barack Obama liberal conservative 2 /27 Outline • Background and Problem Definition • Topic Factorized Ideal Point Model and Learning Algorithm • Experimental Results • Conclusion 3 /27 Legislative Voting Network 4 /27 Problem Definition Output: Input: Legislative Network ๐๐ข : Ideal Points for Politician ๐ข ๐๐ : Ideal Points for Bill ๐ ๐๐ข ’s on different topics 5 /27 Existing Work • 1 dimensional ideal point model (Poole and Rosenthal, 1985; Gerrish and Blei, 2011) • High-dimensional ideal point model (Poole and Rosenthal, 1997) • Issue-adjusted ideal point model (Gerrish and Blei, 2012) 6 /27 Motivations • Voters have different attitudes on different topics. Topic 1 Topic 2 Topic 3 Topic 4 • Traditional matrix factorization method cannot give the meanings for each dimension. ๐ • ≈ ๐ ⋅ ๐๐ ๐ ๐กโ latent factor Topics of bills can influence politician’s voting, and the voting behavior can better interpret the topics of bills as well. Topic Model: • Health • Public Transport • … Voting-guided Topic Model: • Health Service • Health Expenses • Public Transport • … 7 /27 Outline • Background and Problem Definition • Topic Factorized Ideal Point Model and Learning Algorithm • Experimental Results • Conclusion 8 /27 Topic Factorized IPM ๐(๐, ๐ค) ๐ฃ๐ข๐ ๐ข ๐ค ๐ Entities: • Politicians • Bills • Terms Links: • (๐, ๐ต) • (๐ต, ๐) Politicians Bills Terms Heterogeneous Voting Network 9 /27 Text Part Politicians Bills Terms 10 /27 Text Part • We model the probability of each word in each document as a mixture of multinomial distributions, as in PLSA (Hofmann, 1999) and LDA (Blei et al., 2003) ๐๐ = ๐ ๐, 1 , ๐ ๐, 2 , … , ๐ ๐, ๐๐ค ๐๐๐ = ๐(๐|๐) ๐ฝ๐๐ค = ๐(๐ค|๐) ๐ ๐ ๐ค Bill Topic Word ๐(๐,๐ค) ๐ ๐๐ ๐ฝ, ๐ท = ( ๐ค ๐๐๐ ๐ฝ๐๐ค ) ๐ ๐(๐,๐ค) ๐ ๐พ ๐ฝ, ๐ท = ( ๐ ๐ค ๐๐๐ ๐ฝ๐๐ค ) ๐ 11 /27 Voting Part Politicians Bills Terms 12 /27 Topic Topic 1 2 Voting Part …… ๐1 ๐2 …… Topic …… Topic ๐ Voter ๐ข ๐๐ข ๐ฅ๐ข1 ๐ฅ๐ข2 ๐ฅ๐ข๐ ๐พ ๐ฅ๐ข๐พ Bill ๐ ๐๐ ๐๐1 ๐๐2 ๐๐๐ ๐๐๐พ -1 1 1 1 1 1 ๐ข2 0 0 -1 1 1 1 -1 1 ๐พ ๐๐1 ๐ฅ๐ข1 ๐๐1 ๐๐ข๐ = …… 1 ๐๐๐ ๐ฅ๐ข๐ ๐๐๐ …… …… 1 1 -1 1 0 ๐พ ๐๐ข๐ = ๐๐๐ ๐ฅ๐ข๐ ๐๐๐ ๐=1 0 ๐ ๐ฃ๐ข๐ = 1 = ๐( User-Bill voting matrix ๐ฝ YEA ๐๐๐ ๐ฅ๐ข๐ ๐๐๐ + ๐๐ ) ๐ ๐ ๐ฃ๐ข๐ = −1 = 1 − ๐( ๐ผ{๐ฃ๐ข๐ =1} ๐ ๐ฝ ๐ฝ, ๐ฟ, ๐จ, ๐ = ๐ฅ๐ข๐ ๐๐๐ ๐=1 ๐๐๐พ ๐ฅ๐ข๐พ ๐๐๐พ 1 ๐๐๐ ∈ ๐น ๐๐๐ท ๐ข1 0 ๐ข๐๐ 1 ๐ฅ๐ข๐ ∈ ๐น (๐ ๐ฃ๐ข๐ = 1 ๐ข,๐ :๐ฃ๐ข๐ ≠0 1+๐ฃ๐ข๐ 2 ๐ ๐ผ{๐ฃ๐ข๐ =−1} ๐ฃ๐ข๐ = −1 1−๐ฃ๐ข๐ 2 ) ๐๐๐ ๐ฅ๐ข๐ ๐๐๐ + ๐๐ ) NAY ๐ 13 /27 Combining Two Parts Together • The final objective function is a linear combination of the two average log-likelihood functions over the word links and voting links. ๐ฝ ๐ฝ, ๐ท, ๐ฟ, ๐จ, ๐ = 1 − ๐ ๐,๐ค ๐ ๐, ๐ค log( ๐๐น ๐ ๐๐๐ ๐ฝ๐๐ค ) +๐ 1 + ๐ฃ๐ข๐ 1 − ๐ฃ๐ข๐ log ๐ ๐ฃ๐ข๐ = 1 + log ๐ ๐ฃ๐ข๐ = −1 ) 2 2 ๐๐ ๐ข,๐ :๐ฃ๐ข๐ ≠0( s.t. 0 ≤ ๐๐๐ ≤ 1, and ๐๐๐ = 1 0 ≤ ๐ฝ๐๐ค ≤ 1, ๐ ๐ฝ๐๐ค = 1 ๐ค • We also add an ๐2 regularization term to ๐ด and ๐ to reduce over-fitting. ๐ฝ ๐ฝ, ๐ท, ๐ฟ, ๐จ, ๐ = 1 − ๐ − 1 ( 2๐ 2 ๐,๐ค ๐ ๐, ๐ค log( ๐๐น ๐๐ ๐ข 2 + 2 ๐๐ ๐ ๐ ๐๐๐ ๐ฝ๐๐ค ) 2 2 +๐ 1 + ๐ฃ๐ข๐ 1 − ๐ฃ๐ข๐ log ๐ ๐ฃ๐ข๐ = 1 + log ๐ ๐ฃ๐ข๐ = −1 ) 2 2 ๐๐ ๐ข,๐ :๐ฃ๐ข๐ ≠0( ) 14 /27 Learning Algorithm • An iterative algorithm where ideal points related parameters (๐, ๐ด, ๐) and topic model related parameters (๐, ๐ฝ) enhance each other. • Step 1: Update ๐, ๐ด, ๐ given ๐, ๐ฝ • Gradient descent • Step 2: Update ๐, ๐ฝ given ๐, ๐ด, ๐ • Follow the idea of expectation-maximization (EM) algorithm: maximize a lower bound of the objective function in each iteration 15 /27 Learning Algorithm • Update ๐: A nonlinear constrained optimization problem. Remove the constraints by a logistic function based transformation: ๐ ๐๐๐ ๐๐๐ = 1+ ๐พ−1 ๐๐๐′ ๐ ′ =1 ๐ 1 1+ ๐พ−1 ๐๐๐′ ๐ ′ =1 ๐ if 1 ≤ ๐ ≤ ๐พ − 1 if ๐ = ๐พ and update ๐๐๐ using gradient descent. • Update ๐ฝ: Since ๐ฝ only appears in the topic model part, we use the same updating rule as in PLSA: where 16 /27 Outline • Background and Problem Definition • Topic Factorized Ideal Point Model and Learning Algorithm • Experimental Results • Conclusion 17 /27 Data Description • Dataset: • U.S. House and Senate roll call data in the years between 1990 and 2013.∗ • 1,540 legislators • 7,162 bills • 2,780,453 votes (80% are “YEA”) • Keep the latest version of a bill if there are multiple versions. • Randomly select 90% of the votes as training and 10% as testing. ∗ Downloaded from http://thomas.loc.gov/home/rollcallvotes.html 18 /27 Evaluation Measures • Root mean square error (RMSE) between the predicted vote score and the ground truth RMSE = 1+๐ฃ๐ข๐ −๐ ๐ข,๐ :๐ฃ๐ข๐ ≠0 2 2 ๐ฃ๐ข๐ =1 ๐๐ • Accuracy of correctly predicted votes (using 0.5 as a threshold for the predicted accuracy) Accuracy = ๐ข,๐(๐ผ ๐ ๐ฃ๐ข๐ =1 >0.5 && ๐ฃ๐ข๐ =1 +๐ผ ๐ ๐ฃ =1 <0.5 && ๐ฃ =−1 ) ๐ข๐ ๐ข๐ ๐๐ • Average log-likelihood of the voting link AvelogL = ๐ข,๐ :๐ฃ๐ข๐ ≠0 1+๐ฃ๐ข๐ 2 1−๐ฃ๐ข๐ 2 log ๐ ๐ฃ๐ข๐ =1 + log ๐(๐ฃ๐ข๐ =−1) ๐๐ 19 /27 Experimental Results Training Data set Testing Data set 20 /27 Parameter Study 1 ๐ฝ ๐ฝ, ๐ท, ๐ฟ, ๐จ, ๐ = 1 − ๐ ⋅ ๐๐ฃ๐๐๐๐๐ฟ ๐ก๐๐ฅ๐ก + ๐ ⋅ ๐๐ฃ๐๐๐๐๐ฟ(๐ฃ๐๐ก๐๐๐) − 2 ( 2๐ Parameter study on ๐ ๐๐ ๐ข 2 2 + ๐๐ 2 2 ) ๐ Parameter study on ๐ (regularization coefficient) 21 /27 Case Studies • Ideal points for three famous politicians: (Republican, Democrat) • Ronald Paul (R), Barack Obama (D), Joe Lieberman (D) Public Transportation Funds Health Expenses Health Service Law Military Financial Institution Individual Property Education Foreign Ronald Paul Barack Obama Joe Lieberman 22 /27 Case Studies • Pick three topics: (Republican, Democrat) 23 /27 Case Studies Bill: H_R_4548 (in 2004) ๐ฅ๐ข๐ R. Paul Nay H_R_4548 For Unseen Bill ๐: ๐๐๐ Topic Model ๐ฝ๐ TF-IPM ๐๐ข Experts ๐๐ , ๐๐ ๐๐๐ ๐ ๐ฃ๐ข๐ = 1 = ๐( ๐๐๐ ๐ฅ๐ข๐ ๐๐๐ + ๐๐ ) ๐ ๐ ๐ฃ๐ข๐ = 1 24 /27 Outline • Background and Problem Definition • Topic Factorized Ideal Point Model and Learning Algorithm • Experimental Results • Conclusion 25 /27 Conclusion • We estimate the ideal points of legislators and bills on multiple dimensions instead of global ones. • The generation of topics are guided by two types of links in the heterogeneous network. • We present a unified model that combines voting behavior and topic modeling, and propose an iterative learning algorithm to learn the parameters. • The experimental results on real-world roll call data show our advantage over the state-of-the-art methods. 26 /27 Thanks Q&A 27 /27