ECE 539: Artificial Neural Networks
Prediction of Voting Patterns Based on Census and Demographic Data
Analysis Performed By: Mike He

Abstract: This project aims to predict the voting patterns of counties in Wisconsin and Minnesota in the 2004 Presidential Election from demographic data, using multi-layer perceptrons trained with the back-propagation learning algorithm. The demographic data consist of population levels and composition by gender, race, and age group.

Problem Statement
The early prediction of presidential elections is the subject of much debate and speculation by a whole industry of pundits and specialists, and for good reason: the results of presidential elections can have an enormous impact on national and world affairs. Predictions are based on a wide range of indicators, from standard predictors such as opinion polls to more eccentric ones such as pop culture superstitions.

There are many long-standing generalizations about which factors lead to certain voting patterns. A large number of these center on the demographic composition of districts. For example, it is widely held that women tend to vote more liberally than men; that minorities tend to vote more liberally than Caucasians; that urban centers tend to vote more liberally than rural areas; and that older people tend to vote more conservatively than younger people.

As an example, consider the following map of county electoral results, in which blue marks counties that voted predominantly Democratic and red marks counties that voted predominantly Republican. The central region of the country clearly tends to vote Republican. This region is more rural and has a smaller percentage of minorities. The blue regions are found primarily near the big cities or in major coastal areas, both of which are marked by high urban density and a greater proportion of minorities.
Figure: 2004 Presidential Election Results by County; Michael Gastner, Cosma Shalizi, and Mark Newman; University of Michigan [3]

This map is also misleading, as it appears that the vast majority of the country votes Republican. The reason is that a disproportionately large share of the population lives in the regions that tended to vote Democratic. This can be seen in the next map, which scales the area of each county by its population:

Figure: 2004 Presidential Election Results by County; Michael Gastner, Cosma Shalizi, and Mark Newman; University of Michigan [3]

This map shows a fairly even split between red counties and blue counties, and also highlights the extreme urban density of the blue regions in the first map.

Even the second map is misleading in another way. It appears as though sharp divisions exist between Republican and Democratic counties. The reality is very different: most wins are by slim margins only, which reflects a much more homogeneous country than the map suggests. This can be seen in the following map, which displays voting results as shades of red, purple, and blue rather than single colors:

Figure: 2004 Presidential Election Results by County; Michael Gastner, Cosma Shalizi, and Mark Newman; University of Michigan [3]

This also highlights one of the difficulties in this analysis. With margins of victory that are often a single percentage point or less, it is difficult to call a region decisively for one candidate or the other. This will manifest itself later, when many close calls are predicted in the results. It is also one reason that error is judged by vote percentages rather than by strict calls for one candidate or another.

Data Specification
The voting records were obtained for the counties in Wisconsin and Minnesota for the 2004 Presidential Election.
The data was acquired from the USA Today 2004 Presidential Election Results Report [1]. Demographic data was obtained from the US Census Bureau's 2000 US Census [2]. It is assumed that there was no significant shift in county demographics between 2000 and 2004, and that any changes were largely random and evenly distributed.

The county-by-county voting and demographic data was obtained for two states: Wisconsin and Minnesota. The data from Wisconsin was used for training and the data from Minnesota was used for testing.

There were a total of 14 features:
  - Total County Population
  - Percentage of Population of Males
  - Percentage of Population of Females
  - Percentage of Population of White/Caucasians
  - Percentage of Population of Black/African Americans
  - Percentage of Population of American Indian/Native Americans
  - Percentage of Population of Asian Americans
  - Percentage of Population of Pacific Islanders
  - Percentage of Population of Multiracial People
  - Percentage of Population of 18-24 age group
  - Percentage of Population of 25-44 age group
  - Percentage of Population of 45-64 age group
  - Percentage of Population of 65+ age group
  - Median Age of Resident

The features can be roughly divided into four categories: population size, gender composition, racial composition, and age composition.

The output for each county was specified as the percentage of votes that went to President George W. Bush, Senator John Kerry, and Ralph Nader. The percentage was used for error analysis rather than a straight binary vote because of the large number of extremely close votes and the resulting difficulty in calling these counties one way or the other. Furthermore, it is a better representation of the actual nature of our political composition.

Data Processing
The collected data was first processed into a usable form, for example by calculating the percentage of men and women in each county rather than using total counts.
Then, each feature was scaled to zero mean and unit variance so that every feature carries equal weight in the analysis. The output targets were scaled to the range 0.2 to 0.8 to better suit the activation functions, discussed below.

Programming
Programming and experimental analysis were performed in MATLAB 7.0. A number of library files from class were used in this experiment, most of them modified to suit the particular needs of the project. Additional scripts and functions were written to run the experiments.

Methodology
The processed data was analyzed using Multi-Layer Perceptrons (MLP) trained with the back-propagation learning algorithm. The equations can be summarized as follows.

The error back-propagation pass is:

$$
\delta_i^{(\ell)}(k) =
\begin{cases}
f'\big(u_i^{(\ell)}(k)\big) \sum_{m=1}^{N^{(\ell+1)}} \delta_m^{(\ell+1)}(k)\, w_{mi}^{(\ell+1)}(t), & \ell < L, \\
f'\big(u_i^{(L)}(k)\big) \big[ d_i(k) - z_i^{(L)}(k) \big], & \ell = L.
\end{cases}
$$

The two equations that determine the weight update pass are:

$$
\frac{\partial E}{\partial w_{ij}^{(\ell)}(t)} = -\sum_{k=1}^{K} \delta_i^{(\ell)}(k)\, z_j^{(\ell-1)}(k)
$$

$$
w_{ij}^{(\ell)}(t+1) = w_{ij}^{(\ell)}(t) - \alpha \frac{\partial E}{\partial w_{ij}^{(\ell)}(t)} + \mu \big[ w_{ij}^{(\ell)}(t) - w_{ij}^{(\ell)}(t-1) \big]
$$

Here u is the net input of a neuron, z its output, d the target, f the activation function, K the number of training samples, L the index of the output layer, α the learning coefficient, and μ the momentum coefficient.

The MLP consists of an input layer, which holds the input feature data; a number of hidden layers, each with a number of hidden neurons; and an output layer. The hidden layers use hyperbolic tangent (tanh) activation functions, while the output layer uses sigmoid activation functions. The sigmoid activation functions work better when the targets lie between 0.2 and 0.8, which is the reason for the target scaling performed during data processing.

The MLP was trained with the training data set, in this case the county data from Wisconsin. Testing was performed either with the training data set, when training errors were needed, or with the testing data set, the county data from Minnesota, when the final testing results were needed.

The learning coefficient was set to 0.2 and the momentum coefficient to 0.5. These values were selected by the coefficient experiment described later.
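The delta rule and momentum update described above can be illustrated with a minimal Python sketch. It is not the MATLAB class library used in the project: the bias handling, the weight initialization range, the function names, and the toy data are all assumptions made for illustration. It does follow the stated design of tanh hidden layers, a sigmoid output layer, a gradient summed over all K samples, and a momentum term.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def init_weights(sizes, seed=0):
    """Build one weight matrix per layer; column 0 of each row is the bias weight.
    The uniform(-0.5, 0.5) range is an assumption, not taken from the report."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(weights, x):
    """Return the outputs z of every layer; zs[0] is the input vector itself."""
    zs = [list(x)]
    for layer, W in enumerate(weights):
        inputs = [1.0] + zs[-1]  # constant 1.0 feeds the bias weight
        u = [sum(w * v for w, v in zip(row, inputs)) for row in W]
        is_output = layer == len(weights) - 1
        zs.append([sigmoid(ui) if is_output else math.tanh(ui) for ui in u])
    return zs


def train_epoch(weights, prev_weights, X, D, alpha, mu):
    """One batch update: w(t+1) = w(t) + alpha * sum_k delta_i z_j + mu * (w(t) - w(t-1)).
    Returns (new weights, current weights) so the caller can carry the history."""
    grads = [[[0.0] * len(row) for row in W] for W in weights]
    for x, d in zip(X, D):
        zs = forward(weights, x)
        # Output layer: delta = f'(u) * (d - z); for a sigmoid, f'(u) = z * (1 - z).
        delta = [z * (1.0 - z) * (di - z) for z, di in zip(zs[-1], d)]
        for layer in range(len(weights) - 1, -1, -1):
            inputs = [1.0] + zs[layer]
            for i, delta_i in enumerate(delta):
                for j, z_j in enumerate(inputs):
                    grads[layer][i][j] += delta_i * z_j
            if layer > 0:
                # Hidden layer: delta_i = f'(u_i) * sum_m delta_m * w_mi; tanh' = 1 - z^2.
                W_up = weights[layer]
                delta = [(1.0 - z_i ** 2) *
                         sum(d_m * W_up[m][i + 1] for m, d_m in enumerate(delta))
                         for i, z_i in enumerate(zs[layer])]
    new = [[[w + alpha * g + mu * (w - wp)
             for w, g, wp in zip(row, g_row, p_row)]
            for row, g_row, p_row in zip(W, G, P)]
           for W, G, P in zip(weights, grads, prev_weights)]
    return new, weights


def total_square_error(weights, X, D):
    return sum((di - zi) ** 2
               for x, d in zip(X, D)
               for di, zi in zip(d, forward(weights, x)[-1]))


def demo(epochs=300, alpha=0.2, mu=0.5):
    """Train a tiny 1-3-1 network on made-up data and report the error before and after."""
    X = [[-1.0], [-0.5], [0.5], [1.0]]   # toy standardized feature (hypothetical)
    D = [[0.2], [0.35], [0.65], [0.8]]   # toy targets inside [0.2, 0.8] (hypothetical)
    weights = init_weights([1, 3, 1])
    prev = weights                        # no momentum contribution on the first step
    before = total_square_error(weights, X, D)
    for _ in range(epochs):
        weights, prev = train_epoch(weights, prev, X, D, alpha, mu)
    return before, total_square_error(weights, X, D)
```

Calling `demo()` returns the total square error on the toy data before and after training. A network shaped like the report's features and outputs would be built with a longer `sizes` list, for example `init_weights([14, 15, 15, 15, 15, 3])`, although the exact layer layout used in the project is not confirmed by this sketch.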

Determination of Network Structure
The first step in creating an analytical MLP tool is determining which network structure to use. Many different configurations were tested by setting up an MLP with that configuration, training it, and then using the trained MLP to analyze the training data. The training error for each configuration was found by measuring the total square error over all output elements. Configurations with exceptionally high training error were discarded. The best performers were evaluated further by running multiple trials and summing the errors. This summed error was compared across configurations, and the best configuration was chosen. The results of this comparison (total square error on the training data per trial) are shown below:

Config  Trial 1   Trial 2   Trial 3   Trial 4   Trial 5   Trial 6   Trial 7   Trial 8   Trial 9   Trial 10  Total Error
15x5    0.889456  0.811848  0.704948  0.938739  0.735102  0.86532   0.788699  0.829218  0.961989  0.648543  8.173863
20x5    0.935048  0.741209  0.980096  0.743952  0.893687  0.92844   0.721015  0.791872  0.865029  0.716065  8.316413
15x3    0.721439  0.946445  0.786312  0.815924  0.780944  0.909761  0.962262  0.810785  0.709157  0.882153  8.325182
15x8    0.823031  0.913062  0.712313  0.991811  0.906314  0.869915  0.877011  0.96073   0.921192  0.889     8.86438
14x3    0.949631  0.682764  1.0061    0.929048  0.823046  0.878523  0.770735  0.839168  0.753341  0.831507  8.463862
8x4     0.864291  0.676449  0.800232  0.905284  0.786811  0.765392  0.825166  0.89271   0.849462  0.877225  8.243021
5x3     0.822366  0.798034  0.744853  0.791891  0.970131  0.802426  0.938739  0.890154  0.803414  0.778872  8.340881

The final configuration chosen was 15x5, which had the lowest total error over the ten trials (8.173863). The name denotes 15 neurons in each hidden layer and five layers in total: four hidden layers plus one output layer.

Determination of Coefficients
Next, an experiment was performed to determine the best setting for the learning and momentum coefficients. Using the network structure determined above, different coefficient pairs were evaluated for their performance.
This time, the trials were evaluated by two criteria. One was the total square error as before, and the other was the maximum square error over all the data points. First, a large number of coefficient pairs were tested in an initial screening. The better performers were isolated for further testing, detailed below. In the table, α is the learning coefficient and μ is the momentum coefficient; "Total" is the total square error and "Max" is the max square error.

           α = 0.2, μ = 0.5    α = 0.1, μ = 0.8    α = 0.4, μ = 0.8    α = 0.01, μ = 0.5   α = 0.6, μ = 0.8
Trial      Total     Max       Total     Max       Total     Max       Total     Max       Total     Max
1          0.9524    0.0861    0.9887    0.0651    3.1269    0.0389    1.0604    0.1019    3.4711    0.0607
2          0.8459    0.0584    0.9368    0.0997    3.6762    0.0584    0.9735    0.0729    3.5553    0.0487
3          0.8562    0.0657    0.6276    0.0501    3.3724    0.0671    0.8484    0.0723    3.4959    0.0693
4          0.8055    0.0634    0.7808    0.0771    3.4069    0.0643    1.0305    0.1044    3.4841    0.0698
5          0.8501    0.0819    0.9003    0.0945    3.602     0.0676    0.9709    0.0864    3.4718    0.0651
6          0.8693    0.0688    0.8712    0.0726    3.4037    0.1188    1.0792    0.0747    3.4879    0.0682
7          0.9642    0.0661    0.863     0.0734    3.4384    0.0596    1.1311    0.1249    3.4861    0.0541
8          0.8084    0.0669    0.9234    0.0751    3.601     0.0881    0.8182    0.0671    3.5458    0.068
9          0.8651    0.0876    0.8566    0.0814    3.4643    0.0543    0.8831    0.076     3.4716    0.058
10         0.7102    0.0678    0.9186    0.0735    3.6253    0.0481    0.8455    0.0617    3.4743    0.0628
Total      8.5273    0.7127    8.667     0.7625    34.7171   0.6652    9.6408    0.8423    34.9439   0.6247

The values chosen for the learning and momentum coefficients are 0.2 and 0.5 respectively. This pair gave the lowest total square error over the ten trials (8.5273). The two pairs with α of 0.4 or more produced slightly lower maximum errors, but roughly four times the total error.

Testing Against Training Data
Next, the trained and configured MLP was tested against the training data to evaluate its performance so far. The criteria for this evaluation were the total square error, the max square error, and an additional classification rate, which compares the candidate the MLP predicted a county would vote for with the actual voting results.
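The three evaluation criteria named above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's MATLAB code: it assumes the square error is taken per output element (one value per candidate per county), that the predicted winner is the candidate with the largest predicted share, and the function name and the four example counties are hypothetical.

```python
def evaluate(predicted, actual):
    """Compute the three criteria for lists of per-county vote shares.

    Each row is one county, e.g. [bush_share, kerry_share, nader_share].
    Assumption: errors are measured per output element, not per county.
    """
    square_errors = [(p - a) ** 2
                     for p_row, a_row in zip(predicted, actual)
                     for p, a in zip(p_row, a_row)]
    # A county counts as correct when the predicted winner matches the actual winner.
    correct = sum(p_row.index(max(p_row)) == a_row.index(max(a_row))
                  for p_row, a_row in zip(predicted, actual))
    return {
        "total_square_error": sum(square_errors),
        "max_square_error": max(square_errors),
        "correct": correct,
        "classification_rate": correct / len(predicted),
    }


# Four made-up counties; the second is a close race that the prediction calls wrongly.
example_predicted = [[0.55, 0.44, 0.01], [0.48, 0.51, 0.01],
                     [0.60, 0.39, 0.01], [0.49, 0.50, 0.01]]
example_actual = [[0.52, 0.47, 0.01], [0.51, 0.48, 0.01],
                  [0.58, 0.41, 0.01], [0.47, 0.52, 0.01]]
result = evaluate(example_predicted, example_actual)
```

In this example three of the four counties are called correctly, so the classification rate is 0.75, even though every predicted share is within three percentage points of the actual share. That is the effect the report describes: small percentage errors can still flip the call in a close county.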
The classification test requires the MLP to predict who will win the popular vote in each county from the predicted vote percentages, which is a harder task because the vote is close in many counties. The MLP was trained and tested in 100 trials. The results are summarized below:

Trial    Total Square Error  Max Square Error  Classification Rate  Correct Classifications (of 72)
1        0.7935              0.1080            0.7778               56
2        0.8454              0.0790            0.7778               56
3        0.9309              0.0865            0.7917               57
4        0.7057              0.0824            0.8056               58
5        0.9628              0.0968            0.8056               58
6        0.7859              0.0414            0.8056               58
7        0.7761              0.0512            0.7500               54
8        0.8668              0.0714            0.8056               58
9        0.8613              0.0693            0.7778               56
10       0.8420              0.0829            0.7500               54
11       0.8391              0.0652            0.7222               52
12       0.8424              0.0977            0.7778               56
13       0.7801              0.0620            0.7222               52
14       0.7799              0.0793            0.7361               53
15       0.7735              0.0609            0.7361               53
16       0.7181              0.0747            0.7500               54
17       1.0524              0.0807            0.8056               58
18       0.7658              0.0795            0.7639               55
19       0.8647              0.0559            0.8056               58
20       0.7479              0.0584            0.7639               55
21       0.8722              0.0700            0.8056               58
22       0.8409              0.0738            0.7361               53
23       0.9002              0.0814            0.7361               53
24       0.9044              0.0840            0.7917               57
25       0.8354              0.0652            0.7917               57
26       0.9474              0.0676            0.7917               57
27       0.8559              0.0695            0.7639               55
28       1.0559              0.1112            0.7222               52
29       0.6800              0.0672            0.7639               55
30       0.6815              0.0541            0.7639               55
31       1.1191              0.1316            0.8056               58
32       0.8938              0.0805            0.7639               55
33       0.8494              0.0673            0.7361               53
34       0.8450              0.0687            0.7639               55
35       0.7824              0.0667            0.7361               53
36       0.8824              0.0671            0.7778               56
37       0.8216              0.0827            0.7500               54
38       0.8822              0.0865            0.7500               54
39       0.8369              0.0883            0.7778               56
40       0.8011              0.0688            0.7500               54
41       0.8296              0.0660            0.7639               55
42       0.9624              0.1320            0.7778               56
43       0.7568              0.0782            0.7361               53
44       0.7391              0.0638            0.7917               57
45       0.7675              0.0389            0.7222               52
46       0.7400              0.0668            0.7500               54
47       0.8799              0.0642            0.7778               56
48       0.7199              0.0601            0.7639               55
49       0.8072              0.0559            0.7917               57
50       0.8719              0.0588            0.7917               57
51       0.8574              0.0665            0.8194               59
52       0.8965              0.0731            0.7639               55
53       0.8521              0.0553            0.6944               50
54       0.9080              0.0556            0.7778               56
55       0.8920              0.0705            0.7222               52
56       0.8507              0.0718            0.7639               55
57       0.8021              0.0804            0.7222               52
58       0.7270              0.0723            0.7778               56
59       0.6613              0.0761            0.7778               56
60       0.9072              0.0706            0.7500               54
61       0.8518              0.0753            0.7778               56
62       0.8954              0.0944            0.7917               57
63       0.8082              0.0693            0.7639               55
64       0.7965              0.0297            0.7500               54
65       1.0761              0.1147            0.7778               56
66       0.8317              0.0897            0.7778               56
67       0.8492              0.0721            0.7778               56
68       0.9011              0.0739            0.7639               55
69       0.7404              0.0760            0.7500               54
70       0.7979              0.0858            0.7778               56
71       0.8649              0.1044            0.8056               58
72       0.9108              0.0817            0.7500               54
73       0.9303              0.0621            0.7222               52
74       0.7469              0.0825            0.7361               53
75       0.7501              0.0428            0.7639               55
76       0.8051              0.0741            0.7778               56
77       0.8298              0.0672            0.7778               56
78       1.1192              0.0985            0.7917               57
79       0.8145              0.1210            0.7639               55
80       0.9637              0.0939            0.7500               54
81       0.8705              0.0675            0.7778               56
82       0.5819              0.0288            0.7639               55
83       0.9716              0.0838            0.7639               55
84       0.7284              0.0617            0.7639               55
85       0.9753              0.0688            0.7917               57
86       0.8956              0.0618            0.7639               55
87       0.9305              0.0703            0.7778               56
88       0.6456              0.0297            0.7778               56
89       0.7965              0.0689            0.7778               56
90       0.7891              0.0582            0.7361               53
91       0.7510              0.0712            0.7778               56
92       0.7657              0.0689            0.8056               58
93       0.7817              0.0603            0.7917               57
94       0.6483              0.0351            0.7778               56
95       0.8187              0.0863            0.7500               54
96       0.8620              0.0645            0.7361               53
97       0.8736              0.0811            0.7639               55
98       0.9155              0.0816            0.7917               57
99       0.7323              0.0745            0.7222               52
100      0.8082              0.0726            0.8472               61
Average  0.8368              0.0732            0.7675               55.26

Testing Against Testing Data
Finally, the trained and configured MLP was tested against the testing data, which in this case means applying it to an entirely different state: Minnesota. Minnesota is similar to Wisconsin in a number of ways, most importantly in the demographic composition of the state.
At the same time, it differs in other important ways, such as its political and economic tendencies, and therefore makes a good comparison. The criteria for this evaluation were again the total square error, the max square error, and the classification rate. The results are summarized below:

Trial    Total Square Error  Max Square Error  Classification Rate  Correct Classifications (of 87)
1        1.748               0.1005            0.7011               61
2        1.8486              0.1389            0.7011               61
3        1.8273              0.1393            0.7816               68
4        1.8099              0.1159            0.7586               66
5        1.8162              0.0962            0.7241               63
6        1.8132              0.1314            0.7701               67
7        1.9981              0.1272            0.6667               58
8        1.7961              0.1266            0.7701               67
9        1.864               0.1082            0.6897               60
10       1.8386              0.1078            0.7356               64
11       1.8519              0.1251            0.7011               61
12       1.8899              0.1203            0.6897               60
13       1.7599              0.1014            0.7701               67
14       1.672               0.0797            0.7126               62
15       1.6927              0.0724            0.7471               65
16       1.756               0.1056            0.7701               67
17       1.8629              0.1153            0.7471               65
18       1.8327              0.1215            0.7701               67
19       1.7759              0.1051            0.7816               68
20       1.7943              0.1446            0.7471               65
21       1.8807              0.0946            0.7241               63
22       1.7642              0.1172            0.7701               67
23       1.7899              0.1134            0.7701               67
24       1.7999              0.0898            0.7241               63
25       1.8506              0.0855            0.7241               63
26       1.9034              0.1038            0.7471               65
27       1.9119              0.1245            0.7356               64
28       1.8503              0.1097            0.7356               64
29       1.8749              0.1302            0.7816               68
30       1.748               0.1054            0.7356               64
31       1.8485              0.0792            0.7471               65
32       1.7729              0.1165            0.7701               67
33       1.69                0.075             0.7701               67
34       1.8061              0.0985            0.7701               67
35       1.8597              0.1354            0.7586               66
36       1.7511              0.119             0.7586               66
37       1.826               0.091             0.7471               65
38       1.7359              0.1074            0.7701               67
39       1.8854              0.0878            0.7701               67
40       1.8667              0.1255            0.7356               64
41       1.8069              0.1206            0.7586               66
42       1.7613              0.1001            0.7241               63
43       1.799               0.099             0.7701               67
44       1.8376              0.0975            0.7471               65
45       1.7743              0.1177            0.7701               67
46       1.8991              0.1159            0.7471               65
47       1.7946              0.1113            0.7586               66
48       1.8624              0.1283            0.7701               67
49       1.9459              0.0972            0.6552               57
50       1.7988              0.1002            0.7701               67
51       1.7727              0.1228            0.7356               64
52       1.8577              0.1193            0.7471               65
53       1.8843              0.1208            0.7586               66
54       1.8083              0.1207            0.7586               66
55       1.7571              0.115             0.7586               66
56       1.8365              0.0832            0.7816               68
57       1.7481              0.1111            0.7701               67
58       1.8276              0.1098            0.7471               65
59       1.84                0.0876            0.7471               65
60       1.6395              0.1149            0.7586               66
61       1.7308              0.1187            0.7701               67
62       1.8694              0.1066            0.6897               60
63       1.8201              0.0938            0.7471               65
64       1.7706              0.0726            0.7241               63
65       1.7251              0.131             0.7931               69
66       1.9569              0.1361            0.7126               62
67       1.7963              0.1179            0.7701               67
68       1.7257              0.1015            0.7701               67
69       1.9231              0.1198            0.7471               65
70       1.842               0.1035            0.7586               66
71       1.7438              0.1005            0.7356               64
72       1.8028              0.1332            0.7471               65
73       1.8951              0.134             0.7701               67
74       1.8768              0.1353            0.7471               65
75       1.8105              0.107             0.7241               63
76       1.9133              0.145             0.6897               60
77       1.8098              0.122             0.7816               68
78       1.7634              0.105             0.7816               68
79       1.8445              0.1178            0.7356               64
80       1.9025              0.1017            0.7586               66
81       1.8114              0.1306            0.7701               67
82       1.8955              0.1613            0.7701               67
83       1.7216              0.0913            0.7471               65
84       1.7065              0.0874            0.7471               65
85       1.7466              0.0955            0.7356               64
86       1.7825              0.1311            0.7701               67
87       1.7952              0.1203            0.7356               64
88       1.8082              0.1309            0.7356               64
89       1.82                0.1444            0.7701               67
90       1.8987              0.1417            0.7816               68
91       1.8746              0.1304            0.7241               63
92       1.909               0.1176            0.7011               61
93       1.8671              0.137             0.7241               63
94       1.9113              0.1077            0.7586               66
95       1.897               0.0977            0.7126               62
96       1.7034              0.1014            0.7701               67
97       1.7478              0.099             0.7471               65
98       1.8667              0.1216            0.7356               64
99       1.7049              0.0687            0.7586               66
100      1.8847              0.1236            0.7701               67
Average  1.8179              0.1123            0.7474               65.02

Discussion of Results
The results obtained from both the training data tests and the testing data tests indicate a promising method of analysis. The average classification rate over 100 trials was 77% for Wisconsin and 75% for Minnesota. The ability to predict how counties will vote with 77% accuracy after training would be a powerful tool in presidential elections. Even more notable is that moving to a state with different political tendencies still yields a 75% classification rate.

It is important to keep in mind that these predictions are made purely on the basis of demographic data. This information is readily accessible to anyone, is already extensively tracked by government agencies, is much easier to obtain than many other forms of information, and changes in the data are easily tracked.
Such innocuous data could easily be overlooked because, on the surface, it has nothing to do with politics. However, as this analysis has demonstrated, even such simple data has good predictive power.

The analysis also suggests that at least some of the common generalizations about the voting tendencies of certain segments of the population have some element of truth. If that were not the case, the analysis would not have provided predictive power. An interesting topic for further research would be the relative importance of the different demographic features in making predictions. This could give new insight into the structure of society with respect to politics and identity groups.

The results also demonstrate the power of Multi-Layer Perceptrons, in combination with demographic data, for making predictions about elections. The MLP is the mechanism that enables the predictions. It also provides consistently good results: the standard deviation of the classification rate is less than 3% for both the Wisconsin and Minnesota data, which compares favorably with the variability of most polls. Further research could make the MLP a useful tool in political elections at many levels and in many different states.

References:
1. 2004 Presidential Election voting results. http://www.usatoday.com/news/politicselections/vote2004/results.htm
2. US Census 2000 Full Data Sets. http://www.census.gov/popest/datasets.html
3. 2004 Presidential Election Results by County; Michael Gastner, Cosma Shalizi, and Mark Newman; University of Michigan. http://www-personal.umich.edu/~mejn/election/