ECE 539: Artificial Neural Networks
Prediction of Voting Patterns Based on Census and Demographic Data
Analysis Performed By: Mike He
Abstract:
This project aims to predict the voting patterns of counties in Wisconsin and
Minnesota in the 2004 Presidential Election based on the analysis of demographic data by
multi-layer perceptrons with back-propagation learning algorithms. Demographic data will
consist of population levels, composition by gender, composition by race, and composition by
age group.
Problem Statement
The early prediction of presidential elections is the subject of much debate and
speculation by a whole industry of pundits and specialists, and for good reason too: the results
of presidential elections can have an enormous impact on national and world affairs. Predictions
are made based on a wide range of indicators, from standard predictors, such as opinion polls,
to the more eccentric, such as pop culture superstitions.
There are many long-standing generalizations about what factors lead to certain voting
patterns. A large number of these center around the demographic composition of districts. For
example, it is widely generalized that women tend to vote more liberal than men; that
minorities tend to vote more liberal than Caucasians; that urban centers tend to vote more
liberal than rural areas; and that older people tend to vote more conservatively than younger
people.
As an example, here is a graph of county electoral results, where blue marks
counties that voted predominantly Democratic and red those that voted predominantly Republican. It is clear that
the central region of the country tends to vote Republican. This region is characterized by
being more rural and having a smaller percentage of minorities. The blue regions are
primarily found near the big cities or in major coastal areas, both of which are marked by high
urban density and a greater proportion of minorities.
[Figure: 2004 Presidential Election Results by County; Michael Gastner, Cosma Shalizi, and Mark Newman; University of Michigan [3].]
This map is also misleading, as it appears that the vast majority of the country votes
Republican. The reason for this is the disproportionately high population of the
regions that tended to vote Democratic. This can be clearly seen in the next graph, which
scales the area of each county according to its population:
[Figure: 2004 Presidential Election Results by County, with county areas scaled by population; Michael Gastner, Cosma Shalizi, and Mark Newman; University of Michigan [3].]
This map clearly shows a fairly even split between red counties and blue counties, and also
highlights the extreme urban density seen in the blue regions in the first map.
Even the second graph is misleading in another way. It appears as though sharp
divisions exist between Republican and Democratic counties. The reality is very different:
most wins are by slim margins, representing a much more homogeneous country than
what appears from the graph. This can be seen from the following graph, which displays
voting results as shades of red, purple, and blue, rather than single colors:
[Figure: 2004 Presidential Election Results by County, shaded red to blue by vote share; Michael Gastner, Cosma Shalizi, and Mark Newman; University of Michigan [3].]
This also highlights one of the difficulties in this analysis. With margins of victory that are
often single percentage points or fewer, it is difficult to decisively call a region for one
candidate or the other. This will manifest itself later, when many close calls are predicted in
the results. This is also a reason that the decision was made to judge error by vote percentages
rather than strict calls for one candidate or another.
Data Specification
The voting records were obtained for the counties in Wisconsin and Minnesota for the
2004 Presidential Election. Data was acquired from the USA Today 2004 Presidential
Election Results Report [1]. Demographic data was obtained from the US Census Bureau's
2000 US Census [2]. It is assumed that there was no significant shift in county
demographics between 2000 and 2004, and that any changes were largely random and
evenly distributed.
The county-by-county voting and demographic data was obtained from two states:
Wisconsin and Minnesota. The data from Wisconsin was used for training and the data from
Minnesota used for testing.
There were a total of 14 features. These were:

- Total County Population
- Percentage of Population of Males
- Percentage of Population of Females
- Percentage of Population of White/Caucasians
- Percentage of Population of Black/African Americans
- Percentage of Population of American Indian/Native Americans
- Percentage of Population of Asian Americans
- Percentage of Population of Pacific Islanders
- Percentage of Population of Multiracial People
- Percentage of Population of 18-24 age group
- Percentage of Population of 25-44 age group
- Percentage of Population of 45-64 age group
- Percentage of Population of 65+ age group
- Median Age of Resident
The features can be roughly divided into four categories: those dealing with population size,
those dealing with gender composition, those dealing with racial composition, and those
dealing with age composition.
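As a rough illustration of how one county becomes a feature vector, consider the following MATLAB sketch. The counts and variable names here are hypothetical, not taken from the census data:

% Hypothetical raw census counts for one county.
pop    = 85000;                          % total county population
males  = 42000;  females = 43000;        % population by gender
race   = [78000 2500 1500 2000 100 900]; % white, black, native american,
                                         % asian, pacific islander, multiracial
age    = [9000 24000 21000 12000];       % 18-24, 25-44, 45-64, 65+ groups
medAge = 37.2;                           % median age of resident

% Assemble the 14-element feature vector (1 + 2 + 6 + 4 + 1 features).
x = [pop, 100*males/pop, 100*females/pop, 100*race/pop, 100*age/pop, medAge];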
The output was specified as the percentage of votes that went to President
George W. Bush, Senator John Kerry, or Ralph Nader. The percentage was used for error
analysis rather than a straight binary vote due to the large number of extremely close votes
and the resulting difficulty in calling these counties one way or another. Furthermore, it is a
better representation of the actual nature of our political composition.
Data Processing
The data collected was first processed to obtain a usable form, such as calculating the
percentage of men and women in each county rather than using total amounts. Then, each
feature was scaled to have zero mean and unity variance. This was done to give each feature
equal weight in the analysis process. The output targets were scaled to the range 0.2 to 0.8
to better suit the activation functions, discussed below.
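A minimal MATLAB sketch of this preprocessing step, assuming X is an N-by-14 matrix of raw features and Y an N-by-3 matrix of vote fractions (both names are illustrative, not from the project scripts):

% Scale each feature column to zero mean and unit variance.
mu    = mean(X, 1);
sigma = std(X, 0, 1);
Xs    = (X - repmat(mu, size(X,1), 1)) ./ repmat(sigma, size(X,1), 1);

% One plausible mapping of vote fractions in [0,1] onto [0.2, 0.8],
% keeping the targets in the responsive region of the sigmoid.
Ys = 0.2 + 0.6 * Y;

% Test data should be scaled with the training-set statistics.
Xtest_s = (Xtest - repmat(mu, size(Xtest,1), 1)) ./ repmat(sigma, size(Xtest,1), 1);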
Programming
Programming and experimental analysis were performed in MATLAB 7.0. A number
of library files from class were used in this experiment, most modified to suit the particular
needs of the project. Additional scripts and functions were written to handle the running of the
experiments.
Methodology
The analysis of the processed data was performed using Multi-Layer Perceptrons
(MLP), utilizing a back-propagation learning algorithm. The equations can be summarized as
follows:
The error back-propagation pass is:

\[
\delta_i^{(\ell)}(k) =
\begin{cases}
f'\bigl(u_i^{(\ell)}(k)\bigr) \displaystyle\sum_{m=1}^{N^{(\ell+1)}} \delta_m^{(\ell+1)}(k)\, w_{im}^{(\ell+1)}(t), & \ell < L, \\
f'\bigl(u_i^{(L)}(k)\bigr) \bigl[ d_i(k) - z_i^{(L)}(k) \bigr], & \ell = L,
\end{cases}
\]

where \(\delta_i^{(\ell)}(k)\) is the error term of neuron \(i\) in layer \(\ell\) for training sample \(k\), \(u\) is the neuron's net input, \(z\) its output, \(d\) the desired output, and \(L\) the output layer.

And the two equations that determine the weight update pass are:

\[
\frac{\partial E}{\partial w_{ij}^{(\ell)}(t)} = -\sum_{k=1}^{K} \delta_i^{(\ell)}(k)\, z_j^{(\ell-1)}(k)
\]

\[
w_{ij}^{(\ell)}(t+1) = w_{ij}^{(\ell)}(t) - \alpha \frac{\partial E}{\partial w_{ij}^{(\ell)}(t)} + \mu \bigl[ w_{ij}^{(\ell)}(t) - w_{ij}^{(\ell)}(t-1) \bigr],
\]

where \(\alpha\) is the learning coefficient and \(\mu\) the momentum coefficient.
The MLP consists of an input layer, which holds the input feature data; a number
of hidden layers, each with a number of hidden neurons; and an output layer. The hidden
layers use hyperbolic tangent (tanh) activation functions while the output layer uses sigmoid
activation functions. The sigmoid activation functions work better when outputs are scaled to
between 0.2 and 0.8, which is the reason for doing so in data processing.
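The class library routines are not reproduced here, but the following MATLAB sketch illustrates the computation for a single hidden layer; the project's networks stack several such layers, and the per-sample (online) update shown here is one common variant of the batch equations above. All variable names are illustrative:

% One pass of back-propagation over the training set for a network with
% one tanh hidden layer and a sigmoid output layer (illustrative sketch).
% X: N-by-14 matrix of scaled features, T: N-by-3 matrix of scaled targets.
alpha = 0.2;  mu = 0.5;                 % learning and momentum coefficients
H = 15;  [N, d] = size(X);  c = size(T, 2);
W1 = 0.1*randn(H, d+1);  W2 = 0.1*randn(c, H+1);  % +1 column for bias weights
dW1 = zeros(size(W1));   dW2 = zeros(size(W2));   % previous updates (momentum)

for k = 1:N
    x   = [X(k,:)'; 1];                  % input vector plus bias term
    z1  = tanh(W1 * x);                  % hidden layer: tanh activation
    z1b = [z1; 1];
    z2  = 1 ./ (1 + exp(-(W2 * z1b)));   % output layer: sigmoid activation

    % Output-layer delta: f'(u).*(d - z), with f'(u) = z.*(1-z) for sigmoid
    d2 = z2 .* (1 - z2) .* (T(k,:)' - z2);
    % Hidden-layer delta: f'(u) = 1 - z.^2 for tanh
    d1 = (1 - z1.^2) .* (W2(:,1:H)' * d2);

    % Weight updates with momentum
    dW2 = alpha * (d2 * z1b') + mu * dW2;
    dW1 = alpha * (d1 * x')   + mu * dW1;
    W2  = W2 + dW2;
    W1  = W1 + dW1;
end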
The training of the MLP was performed with the training data set, in this case county
data from Wisconsin. The testing was performed with either the training data set when
training errors were needed and with the testing data set, the county data from Minnesota,
when the final testing results were needed. The learning coefficient was set to 0.2 and the
momentum coefficient to 0.5; these values were determined as part of an experiment, discussed
further below.
Determination of Network Structure
The first step taken in the creation of an analytical MLP tool is determining what
network structure to use. Many different configurations were tested by setting up an MLP
with that configuration, performing training, then using the trained MLP to analyze the
training data. The training error was found for each by measuring the total square error of
each element. Configurations with exceptionally high training error were thrown out. The best
performers were further evaluated by performing multiple trials and summing the errors. This
error was compared to the errors produced by other configurations, and the best configuration
chosen. The results of this comparison are shown:
Configuration    15x5        20x5        15x3        15x8        14x3        8x4         5x3
Trial 1 Error    0.889456    0.935048    0.721439    0.823031    0.949631    0.864291    0.822366
Trial 2 Error    0.811848    0.741209    0.946445    0.913062    0.682764    0.676449    0.798034
Trial 3 Error    0.704948    0.980096    0.786312    0.712313    1.006100    0.800232    0.744853
Trial 4 Error    0.938739    0.743952    0.815924    0.991811    0.929048    0.905284    0.791891
Trial 5 Error    0.735102    0.893687    0.780944    0.906314    0.823046    0.786811    0.970131
Trial 6 Error    0.865320    0.928440    0.909761    0.869915    0.878523    0.765392    0.802426
Trial 7 Error    0.788699    0.721015    0.962262    0.877011    0.770735    0.825166    0.938739
Trial 8 Error    0.829218    0.791872    0.810785    0.960730    0.839168    0.892710    0.890154
Trial 9 Error    0.961989    0.865029    0.709157    0.921192    0.753341    0.849462    0.803414
Trial 10 Error   0.648543    0.716065    0.882153    0.889000    0.831507    0.877225    0.778872
Total Error      8.173863    8.316413    8.325182    8.864380    8.463862    8.243021    8.340881
The final configuration chosen was 15x5, indicating 15 neurons per hidden layer and 5 layers
in total: 4 hidden layers plus one output layer.
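The selection loop can be sketched as follows, where trainMLP is a hypothetical wrapper around the modified class library routines that trains a network of the given configuration and returns its total square error on the training data:

% Screen candidate configurations (neurons x layers) over repeated trials.
configs  = {'15x5', '20x5', '15x3', '15x8', '14x3', '8x4', '5x3'};
nTrials  = 10;
totalErr = zeros(1, numel(configs));

for c = 1:numel(configs)
    for t = 1:nTrials
        totalErr(c) = totalErr(c) + trainMLP(configs{c}, Xtrain, Ytrain);
    end
end

[bestErr, best] = min(totalErr);
fprintf('Best configuration: %s (summed error %.4f)\n', configs{best}, bestErr);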
Determination of Coefficients
Next, an experiment was performed to determine the optimal setting for the learning
and momentum coefficients. Using the network structure determined before, different
coefficients were evaluated for their performance. This time, the trials were evaluated by two
criteria. One was the square error as before, and the other was the maximum square error
produced from all the data points. First, a large number of coefficient pairs were tested in an initial screening.
The better performers were isolated for further testing, detailed below.
            α, μ = 0.2, 0.5     α, μ = 0.1, 0.8     α, μ = 0.4, 0.8     α, μ = 0.01, 0.5    α, μ = 0.6, 0.8
            Total     Max       Total     Max       Total     Max       Total     Max       Total     Max
Trial 1     0.9524    0.0861    0.9887    0.0651    3.1269    0.0389    1.0604    0.1019    3.4711    0.0607
Trial 2     0.8459    0.0584    0.9368    0.0997    3.6762    0.0584    0.9735    0.0729    3.5553    0.0487
Trial 3     0.8562    0.0657    0.6276    0.0501    3.3724    0.0671    0.8484    0.0723    3.4959    0.0693
Trial 4     0.8055    0.0634    0.7808    0.0771    3.4069    0.0643    1.0305    0.1044    3.4841    0.0698
Trial 5     0.8501    0.0819    0.9003    0.0945    3.6020    0.0676    0.9709    0.0864    3.4718    0.0651
Trial 6     0.8693    0.0688    0.8712    0.0726    3.4037    0.1188    1.0792    0.0747    3.4879    0.0682
Trial 7     0.9642    0.0661    0.8630    0.0734    3.4384    0.0596    1.1311    0.1249    3.4861    0.0541
Trial 8     0.8084    0.0669    0.9234    0.0751    3.6010    0.0881    0.8182    0.0671    3.5458    0.0680
Trial 9     0.8651    0.0876    0.8566    0.0814    3.4643    0.0543    0.8831    0.0760    3.4716    0.0580
Trial 10    0.7102    0.0678    0.9186    0.0735    3.6253    0.0481    0.8455    0.0617    3.4743    0.0628
Total       8.5273    0.7127    8.6670    0.7625    34.7171   0.6652    9.6408    0.8423    34.9439   0.6247

("Total" and "Max" denote total square error and maximum square error, respectively.)
The chosen values for the learning and momentum coefficients were 0.2 and 0.5 respectively, as this pair produced the lowest total square error.
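The coefficient screening can be sketched in the same style, where evalMLP is a hypothetical wrapper that trains with the given coefficients and returns the vector of per-county square errors:

% Evaluate candidate (alpha, mu) pairs by total and maximum square error.
pairs   = [0.2 0.5; 0.1 0.8; 0.4 0.8; 0.01 0.5; 0.6 0.8];
nTrials = 10;
totSE   = zeros(size(pairs,1), 1);
maxSE   = zeros(size(pairs,1), 1);

for p = 1:size(pairs, 1)
    for t = 1:nTrials
        e = evalMLP(pairs(p,1), pairs(p,2), Xtrain, Ytrain);
        totSE(p) = totSE(p) + sum(e);   % accumulate total square error
        maxSE(p) = maxSE(p) + max(e);   % accumulate worst-county error
    end
end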
Testing Against Training Data
Next, the trained and configured MLP was tested against the training data to evaluate
the performance so far. The criteria for this evaluation included the total square error, the max
square error, and an additional test of classification rate to see which candidate the MLP
decided the county would vote for, compared with the actual voting results. Here the MLP is
attempting to predict which candidate will win the popular vote of the county based on the
predicted voting percentages; this is a more difficult task due to the closeness of the vote
in many counties.
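The winner call can be derived from the predicted percentages as in this sketch, where Ypred and Yactual are hypothetical N-by-3 matrices of predicted and actual vote shares (Bush, Kerry, Nader):

% Call each county for the candidate with the largest predicted share,
% then compare against the actual winner.
[dummy, predWinner]   = max(Ypred,   [], 2);
[dummy, actualWinner] = max(Yactual, [], 2);
correct   = sum(predWinner == actualWinner);
classRate = correct / size(Ypred, 1);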
One hundred trials of training and testing of the MLP were performed. The results are
summarized below:
Trial     Total Square Error    Max Square Error    Classification Rate    Correct Classifications
1         0.7935                0.1080              0.7778                 56
2         0.8454                0.0790              0.7778                 56
3         0.9309                0.0865              0.7917                 57
4         0.7057                0.0824              0.8056                 58
5         0.9628                0.0968              0.8056                 58
6         0.7859                0.0414              0.8056                 58
7         0.7761                0.0512              0.7500                 54
8         0.8668                0.0714              0.8056                 58
9         0.8613                0.0693              0.7778                 56
10        0.8420                0.0829              0.7500                 54
11        0.8391                0.0652              0.7222                 52
12        0.8424                0.0977              0.7778                 56
13        0.7801                0.0620              0.7222                 52
14        0.7799                0.0793              0.7361                 53
15        0.7735                0.0609              0.7361                 53
16        0.7181                0.0747              0.7500                 54
17        1.0524                0.0807              0.8056                 58
18        0.7658                0.0795              0.7639                 55
19        0.8647                0.0559              0.8056                 58
20        0.7479                0.0584              0.7639                 55
21        0.8722                0.0700              0.8056                 58
22        0.8409                0.0738              0.7361                 53
23        0.9002                0.0814              0.7361                 53
24        0.9044                0.0840              0.7917                 57
25        0.8354                0.0652              0.7917                 57
26        0.9474                0.0676              0.7917                 57
27        0.8559                0.0695              0.7639                 55
28        1.0559                0.1112              0.7222                 52
29        0.6800                0.0672              0.7639                 55
30        0.6815                0.0541              0.7639                 55
31        1.1191                0.1316              0.8056                 58
32        0.8938                0.0805              0.7639                 55
33        0.8494                0.0673              0.7361                 53
34        0.8450                0.0687              0.7639                 55
35        0.7824                0.0667              0.7361                 53
36        0.8824                0.0671              0.7778                 56
37        0.8216                0.0827              0.7500                 54
38        0.8822                0.0865              0.7500                 54
39        0.8369                0.0883              0.7778                 56
40        0.8011                0.0688              0.7500                 54
41        0.8296                0.0660              0.7639                 55
42        0.9624                0.1320              0.7778                 56
43        0.7568                0.0782              0.7361                 53
44        0.7391                0.0638              0.7917                 57
45        0.7675                0.0389              0.7222                 52
46        0.7400                0.0668              0.7500                 54
47        0.8799                0.0642              0.7778                 56
48        0.7199                0.0601              0.7639                 55
49        0.8072                0.0559              0.7917                 57
50        0.8719                0.0588              0.7917                 57
51        0.8574                0.0665              0.8194                 59
52        0.8965                0.0731              0.7639                 55
53        0.8521                0.0553              0.6944                 50
54        0.9080                0.0556              0.7778                 56
55        0.8920                0.0705              0.7222                 52
56        0.8507                0.0718              0.7639                 55
57        0.8021                0.0804              0.7222                 52
58        0.7270                0.0723              0.7778                 56
59        0.6613                0.0761              0.7778                 56
60        0.9072                0.0706              0.7500                 54
61        0.8518                0.0753              0.7778                 56
62        0.8954                0.0944              0.7917                 57
63        0.8082                0.0693              0.7639                 55
64        0.7965                0.0297              0.7500                 54
65        1.0761                0.1147              0.7778                 56
66        0.8317                0.0897              0.7778                 56
67        0.8492                0.0721              0.7778                 56
68        0.9011                0.0739              0.7639                 55
69        0.7404                0.0760              0.7500                 54
70        0.7979                0.0858              0.7778                 56
71        0.8649                0.1044              0.8056                 58
72        0.9108                0.0817              0.7500                 54
73        0.9303                0.0621              0.7222                 52
74        0.7469                0.0825              0.7361                 53
75        0.7501                0.0428              0.7639                 55
76        0.8051                0.0741              0.7778                 56
77        0.8298                0.0672              0.7778                 56
78        1.1192                0.0985              0.7917                 57
79        0.8145                0.1210              0.7639                 55
80        0.9637                0.0939              0.7500                 54
81        0.8705                0.0675              0.7778                 56
82        0.5819                0.0288              0.7639                 55
83        0.9716                0.0838              0.7639                 55
84        0.7284                0.0617              0.7639                 55
85        0.9753                0.0688              0.7917                 57
86        0.8956                0.0618              0.7639                 55
87        0.9305                0.0703              0.7778                 56
88        0.6456                0.0297              0.7778                 56
89        0.7965                0.0689              0.7778                 56
90        0.7891                0.0582              0.7361                 53
91        0.7510                0.0712              0.7778                 56
92        0.7657                0.0689              0.8056                 58
93        0.7817                0.0603              0.7917                 57
94        0.6483                0.0351              0.7778                 56
95        0.8187                0.0863              0.7500                 54
96        0.8620                0.0645              0.7361                 53
97        0.8736                0.0811              0.7639                 55
98        0.9155                0.0816              0.7917                 57
99        0.7323                0.0745              0.7222                 52
100       0.8082                0.0726              0.8472                 61
Average   0.8368                0.0732              0.7675                 55.26
Testing Against Testing Data
Finally, the trained and configured MLP was tested against the testing data, in this
case the county data from an entirely different state: Minnesota. Minnesota is similar to
Wisconsin in a number of ways, most importantly in its demographic composition. It
differs in other important ways, however, such as its political and economic
tendencies, and therefore makes a good comparison. The criteria for this evaluation again
included the total square error, the max square error, and classification rate. The results are
summarized below:
Trial     Total Square Error    Max Square Error    Classification Rate    Correct Classifications
1         1.7480                0.1005              0.7011                 61
2         1.8486                0.1389              0.7011                 61
3         1.8273                0.1393              0.7816                 68
4         1.8099                0.1159              0.7586                 66
5         1.8162                0.0962              0.7241                 63
6         1.8132                0.1314              0.7701                 67
7         1.9981                0.1272              0.6667                 58
8         1.7961                0.1266              0.7701                 67
9         1.8640                0.1082              0.6897                 60
10        1.8386                0.1078              0.7356                 64
11        1.8519                0.1251              0.7011                 61
12        1.8899                0.1203              0.6897                 60
13        1.7599                0.1014              0.7701                 67
14        1.6720                0.0797              0.7126                 62
15        1.6927                0.0724              0.7471                 65
16        1.7560                0.1056              0.7701                 67
17        1.8629                0.1153              0.7471                 65
18        1.8327                0.1215              0.7701                 67
19        1.7759                0.1051              0.7816                 68
20        1.7943                0.1446              0.7471                 65
21        1.8807                0.0946              0.7241                 63
22        1.7642                0.1172              0.7701                 67
23        1.7899                0.1134              0.7701                 67
24        1.7999                0.0898              0.7241                 63
25        1.8506                0.0855              0.7241                 63
26        1.9034                0.1038              0.7471                 65
27        1.9119                0.1245              0.7356                 64
28        1.8503                0.1097              0.7356                 64
29        1.8749                0.1302              0.7816                 68
30        1.7480                0.1054              0.7356                 64
31        1.8485                0.0792              0.7471                 65
32        1.7729                0.1165              0.7701                 67
33        1.6900                0.0750              0.7701                 67
34        1.8061                0.0985              0.7701                 67
35        1.8597                0.1354              0.7586                 66
36        1.7511                0.1190              0.7586                 66
37        1.8260                0.0910              0.7471                 65
38        1.7359                0.1074              0.7701                 67
39        1.8854                0.0878              0.7701                 67
40        1.8667                0.1255              0.7356                 64
41        1.8069                0.1206              0.7586                 66
42        1.7613                0.1001              0.7241                 63
43        1.7990                0.0990              0.7701                 67
44        1.8376                0.0975              0.7471                 65
45        1.7743                0.1177              0.7701                 67
46        1.8991                0.1159              0.7471                 65
47        1.7946                0.1113              0.7586                 66
48        1.8624                0.1283              0.7701                 67
49        1.9459                0.0972              0.6552                 57
50        1.7988                0.1002              0.7701                 67
51        1.7727                0.1228              0.7356                 64
52        1.8577                0.1193              0.7471                 65
53        1.8843                0.1208              0.7586                 66
54        1.8083                0.1207              0.7586                 66
55        1.7571                0.1150              0.7586                 66
56        1.8365                0.0832              0.7816                 68
57        1.7481                0.1111              0.7701                 67
58        1.8276                0.1098              0.7471                 65
59        1.8400                0.0876              0.7471                 65
60        1.6395                0.1149              0.7586                 66
61        1.7308                0.1187              0.7701                 67
62        1.8694                0.1066              0.6897                 60
63        1.8201                0.0938              0.7471                 65
64        1.7706                0.0726              0.7241                 63
65        1.7251                0.1310              0.7931                 69
66        1.9569                0.1361              0.7126                 62
67        1.7963                0.1179              0.7701                 67
68        1.7257                0.1015              0.7701                 67
69        1.9231                0.1198              0.7471                 65
70        1.8420                0.1035              0.7586                 66
71        1.7438                0.1005              0.7356                 64
72        1.8028                0.1332              0.7471                 65
73        1.8951                0.1340              0.7701                 67
74        1.8768                0.1353              0.7471                 65
75        1.8105                0.1070              0.7241                 63
76        1.9133                0.1450              0.6897                 60
77        1.8098                0.1220              0.7816                 68
78        1.7634                0.1050              0.7816                 68
79        1.8445                0.1178              0.7356                 64
80        1.9025                0.1017              0.7586                 66
81        1.8114                0.1306              0.7701                 67
82        1.8955                0.1613              0.7701                 67
83        1.7216                0.0913              0.7471                 65
84        1.7065                0.0874              0.7471                 65
85        1.7466                0.0955              0.7356                 64
86        1.7825                0.1311              0.7701                 67
87        1.7952                0.1203              0.7356                 64
88        1.8082                0.1309              0.7356                 64
89        1.8200                0.1444              0.7701                 67
90        1.8987                0.1417              0.7816                 68
91        1.8746                0.1304              0.7241                 63
92        1.9090                0.1176              0.7011                 61
93        1.8671                0.1370              0.7241                 63
94        1.9113                0.1077              0.7586                 66
95        1.8970                0.0977              0.7126                 62
96        1.7034                0.1014              0.7701                 67
97        1.7478                0.0990              0.7471                 65
98        1.8667                0.1216              0.7356                 64
99        1.7049                0.0687              0.7586                 66
100       1.8847                0.1236              0.7701                 67
Average   1.8179                0.1123              0.7474                 65.02
Discussion of Results
The results obtained from both the training data tests and the testing data tests show a
very promising analysis method. The average classification rate among all the counties over
100 trials was 77% for Wisconsin and 75% for Minnesota, very impressive numbers. The
ability to predict how counties will vote with 77% accuracy after training would be a very
powerful tool in presidential elections. Even more impressive is that moving to a state that is
different in political tendencies still yields a 75% classification rate.
It is important to keep in mind that these predictions are made purely on the basis of
demographic data. This information is readily accessible to anyone, is already extensively
tracked by government agencies, is much easier to obtain than many other forms of
information, and changes in it are easily tracked. Such innocuous data could easily be
overlooked because, on the surface, it has nothing to do with politics. However, as
this analysis has demonstrated, even such simple data has good predictive power.
It also demonstrates that at least some of the common generalizations about the voting
tendencies of certain segments of the population have some element of truth. If that were not
the case, this analysis would not have provided predictive power. An interesting topic for
further research would be the relative importance of the different demographic features in
making predictions. This would perhaps give new insight into the structure of society with
respect to politics and identity groups.
This is perhaps also a demonstration of the power of Multi-Layer Perceptrons, as well
as of demographic data, in making predictions about elections. The MLP is the mechanism that
enables these predictions. It also provides consistently good results: the standard
deviation of the classification rate is less than 3% for both the Wisconsin and Minnesota data,
which is better than that for most polls. Further research could perhaps make the MLP a
useful tool in political elections on many levels throughout many different states.
References:
[1] USA Today 2004 Presidential Election Results Report.
    http://www.usatoday.com/news/politicselections/vote2004/results.htm
[2] US Census 2000 Full Data Sets, US Census Bureau.
    http://www.census.gov/popest/datasets.html
[3] 2004 Presidential Election Results by County; Michael Gastner, Cosma Shalizi, and
    Mark Newman; University of Michigan.
    http://www-personal.umich.edu/~mejn/election/