Artificial Neural Network Prediction of Major League Baseball Teams' Winning Percentages
Scott Wiese, ECE 539, Professor Hu

Motivation
- Current trends in managing player personnel focus heavily on statistics, weighing future production against potential salaries.
- Statistics are used to decide whether to sign specific players and to determine whether current players are overpaid.
- It is claimed that statistics are a valid predictor of both a player's and a team's production.
- It is claimed that one season (162 games) is a long enough trial period for statistics to predict a team's winning percentage.

Goal
- Can an artificial neural network, given a team's statistics for a season, accurately predict that team's winning percentage?

Data Collection
- Collected 3 years of data for all 30 Major League Baseball teams.
- Gathered from the statistical database available at www.MLB.com.
- 74 statistics besides winning percentage were gathered.

Neural Network Selection
- Backpropagation-trained multilayer perceptron (MLP)
- Excellent at analyzing large feature sets
- Supervised training
- Good at classification problems

Preprocessing
- Normalized each feature vector.
- Used singular value decomposition to emphasize the most important features.

Testing
- Goal: determine which MLP configuration best predicts winning percentage.
- Baseline MLP: 1 hidden layer with 1 hidden neuron.
- Tested MLPs: 1 through 5 hidden layers, with 1, 3, or 5 hidden neurons in every layer.

Testing Results
Average success rates (%), exact matching:

              1 hidden layer   2 hidden layers   3 hidden layers   4 hidden layers
  1 neuron         33.33            56.67             50.00             60.00
  3 neurons        45.56            35.56             41.11             31.11
  5 neurons        32.22            45.56             43.33             40.00

[Chart: Average Success Rates - Exact Matching; % success rate vs. number of hidden layers, one line each for 1, 3, and 5 neurons]

Testing
- Knowing that the 4-hidden-layer, 1-hidden-neuron network performed best, test it again against the baseline with new data.
- A prediction counts as a success when it falls within +/- 0.15 of the actual winning percentage.

Testing Results
Final testing:
              Trial 1   Trial 2   Trial 3   Mean
  Baseline     23.33     16.67     33.33    24.44
  Best MLP     73.33     43.33     26.67    47.78

- The best MLP's mean performance was almost twice the baseline's.

Preliminary Conclusions
- The deeper MLP structure is better at predicting a team's winning percentage.
- Unfortunately, the success rate is still under 50% even with a .15 error bound.
- Can classification work better?

Classification Testing
- Classify teams into 3 groups:
  - Division winners (winning percentage above .590)
  - Winning teams (.500 to .589)
  - Losing teams (below .500)
- Same testing process as above.

Classification Results
Average success rates (%):

              1 hidden layer   2 hidden layers   3 hidden layers   4 hidden layers
  1 neuron        55.56             61.11             56.67             53.33
  3 neurons       58.89             61.11             66.67             57.78
  5 neurons       66.67             62.22             73.33             60.00

- The 3-hidden-layer, 5-hidden-neuron network performed best.

[Chart: Average Success Rates - Classification; % success rate vs. number of hidden layers, one line each for 1, 3, and 5 neurons]

Classification Results
- Again, knowing the best advanced network, test it against the baseline with more data.

Classification Results
Final testing:

              Trial 1   Trial 2   Trial 3   Mean
  Baseline     66.67     63.33     60.00    63.33
  Best MLP     66.67     63.33     63.33    64.44

- Negligible difference between the two networks, even though the best MLP showed nearly a 50% improvement in the original trial.

Conclusions
- The advanced network is better than the baseline at direct prediction of winning percentage.
- Still only a moderate success rate given the error bounds.
- Classification results are very promising.
- Shows that statistics are important in separating teams' results.
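Appendix: illustrative sketches. The preprocessing described in the slides (normalizing each feature vector, then using singular value decomposition to emphasize the most important features) could look roughly like the following numpy sketch. The function name, the z-score style of normalization, and the choice to keep the top-k singular components are my assumptions; the slides give no implementation details, and the random matrix merely stands in for the real 74 team statistics.

```python
import numpy as np

def preprocess(X, k=10):
    """Normalize features, then project onto the top-k SVD directions.

    X : (n_teams, n_features) raw statistics matrix.
    k : number of singular components kept (assumed; the slides do not
        say how many were retained).
    """
    # Z-score each feature (column) so no statistic dominates by scale.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard against constant features
    Z = (X - mu) / sigma

    # SVD: Z = U @ diag(S) @ Vt; the leading right singular vectors are
    # the directions of greatest variance in the statistics.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T              # (n_teams, k) reduced features

# Random stand-in data: 3 seasons x 30 teams, 74 statistics each.
X = np.random.default_rng(0).normal(size=(90, 74))
Xr = preprocess(X, k=10)
print(Xr.shape)                      # (90, 10)
```

The projected columns come out ordered by decreasing variance, which is what "emphasizing the most important features" amounts to here.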
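The slides compare backpropagation-trained MLPs with 1-5 hidden layers and 1, 3, or 5 neurons per layer. As a rough illustration of that kind of network, here is a minimal numpy MLP with configurable depth and width. The class name, the activations (tanh hidden layers, sigmoid output), the learning rate, and the toy demo data are all my assumptions, not the author's code.

```python
import numpy as np

class MLP:
    """Minimal backprop-trained MLP: n_layers hidden layers of n_neurons
    each (matching the grid tested in the slides), one sigmoid output."""

    def __init__(self, n_in, n_layers=4, n_neurons=1, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [n_in] + [n_neurons] * n_layers + [1]
        self.W = [rng.normal(0.0, 0.5, (a, b)) for a, b in zip(sizes, sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def forward(self, X):
        acts = [X]
        for i, (W, b) in enumerate(zip(self.W, self.b)):
            z = acts[-1] @ W + b
            # tanh on hidden layers, sigmoid on the output layer
            acts.append(np.tanh(z) if i < len(self.W) - 1
                        else 1.0 / (1.0 + np.exp(-z)))
        return acts

    def train(self, X, y, lr=0.5, epochs=2000):
        y = np.asarray(y, dtype=float).reshape(-1, 1)
        for _ in range(epochs):
            acts = self.forward(X)
            # output-layer error times sigmoid derivative
            delta = (acts[-1] - y) * acts[-1] * (1.0 - acts[-1])
            for i in reversed(range(len(self.W))):
                grad_W = acts[i].T @ delta / len(X)
                grad_b = delta.mean(axis=0)
                if i:  # propagate through tanh derivative
                    delta = (delta @ self.W[i].T) * (1.0 - acts[i] ** 2)
                self.W[i] -= lr * grad_W
                self.b[i] -= lr * grad_b
        return self

    def predict(self, X):
        return self.forward(X)[-1].ravel()

# Toy demo on synthetic data (not real team statistics): training error
# should drop. A 1-layer, 5-neuron config is used here just for speed.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X.sum(axis=1) > 0).astype(float)
net = MLP(n_in=3, n_layers=1, n_neurons=5, seed=1)
mse_before = np.mean((net.predict(X) - y) ** 2)
net.train(X, y)
mse_after = np.mean((net.predict(X) - y) ** 2)
print(mse_before, mse_after)
```

Because the output neuron is a sigmoid, predictions land in (0, 1), which conveniently matches the range of a winning percentage.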
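The two evaluation schemes in the slides (a regression prediction counted as a success when within +/- 0.15 of the true winning percentage, and a three-way classification by record) amount to the following checks. The function names and the example numbers are mine; the tolerance and class boundaries follow the slides.

```python
def regression_success_rate(predicted, actual, tol=0.15):
    """Fraction of predictions within +/- tol of the true winning pct."""
    hits = sum(abs(p - a) <= tol for p, a in zip(predicted, actual))
    return hits / len(actual)

def classify(win_pct):
    """Map a winning percentage to the three groups from the slides."""
    if win_pct > 0.590:
        return "division winner"
    if win_pct >= 0.500:
        return "winning team"
    return "losing team"

# Example: three teams with predicted vs. actual winning percentages.
predicted = [0.55, 0.48, 0.70]
actual    = [0.62, 0.40, 0.44]
print(regression_success_rate(predicted, actual))   # 2 of 3 within 0.15
print([classify(a) for a in actual])
```

The much looser classification criterion explains why the classification success rates in the slides are so much higher than the exact-matching ones: a prediction only has to land in the right band, not near the right value.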