CS 487/519 APPLIED MACHINE LEARNING I AT NEW MEXICO STATE UNIVERSITY UNDER DR. HUIPING CAO

“Pokémon Predictor” Project Stage III

Ziad Arafat†, Student, NMSU, Angel Camacho†, Student, NMSU, Mathew Groover†, Student, NMSU, and Jason Ivey, Member, NMSU Nano-Sat Lab

NMSU, March 16th, 2022

• A. Camacho is a student in the Computer Science department of New Mexico State University. E-mail: angelcam@nmsu.edu
• M. Groover is a student in the Computer Science department of New Mexico State University. E-mail: mgroov@nmsu.edu
• J. Ivey is a student in the Computer Science department of New Mexico State University, as well as a member of the INCA/SAS-Sat Nano-Satellite teams. E-mail: jiveyguy@nmsu.edu
• Z. Arafat is a student in the Computer Science department of New Mexico State University. E-mail: ziada@nmsu.edu

Manuscript received March 16, 2022; revision is scheduled for late March 2022.

Abstract—This report details the preliminary motivation and description of the “Pokémon Predictor,” an assignment in CS 487/519 Applied Machine Learning I at New Mexico State University.

Keywords—Computer Society, IEEEtran, journal, LaTeX, Machine Learning, PyTorch.

1 MOTIVATION

Our machine learning case study focuses on a Pokémon battle predictor. In recent years, online competitive games have boomed in popularity, and Pokémon battles are no exception. Our motive is simple: we want the battle predictor to give its user an advantage over their competitor. To be effective, the predictor has one situational condition: it must be used before the battle takes place.

1.1 INTRODUCTION

The problem at the core of our case study is predicting which party is more likely to win a battle, given the input parameters (each Pokémon and its attributes, or “stats”). The training data consist of a list of matchups and their results (the target classes).
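To make this framing concrete, the following is a minimal sketch of how matchups and per-Pokémon stats could be turned into a feature matrix and a binary target. It assumes pandas and uses small hypothetical tables: the column names (`id`, `hp`, `first_id`, `winner_id`, and so on) and the helper `build_features` are illustrative only, not the actual schema of the Kaggle dataset [1] or the team's code.

```python
import pandas as pd

# Hypothetical toy tables; column names are illustrative assumptions,
# not necessarily those used in the Kaggle dataset.
stats = pd.DataFrame({
    "id": [1, 2, 3],
    "hp": [45, 60, 80],
    "attack": [49, 62, 82],
    "speed": [45, 60, 80],
})
matchups = pd.DataFrame({
    "first_id": [1, 2, 3],
    "second_id": [2, 3, 1],
    "winner_id": [2, 3, 3],
})


def build_features(matchups: pd.DataFrame, stats: pd.DataFrame):
    """Join each combatant's stats onto its matchup row (hypothetical helper)."""
    first = stats.add_prefix("first_")    # id -> first_id, hp -> first_hp, ...
    second = stats.add_prefix("second_")  # id -> second_id, hp -> second_hp, ...
    df = (matchups
          .merge(first, on="first_id", how="left")
          .merge(second, on="second_id", how="left"))
    # Target class: 1 if the first Pokémon won the matchup, 0 otherwise.
    y = (df["winner_id"] == df["first_id"]).astype(int)
    # Drop every ID column so the model only sees attributes.
    X = df.drop(columns=["first_id", "second_id", "winner_id"])
    return X, y


X, y = build_features(matchups, stats)
print(list(X.columns))
print(y.tolist())
```

Encoding the target as "did the first Pokémon win" turns each matchup into one row of an ordinary binary classification problem, and dropping the IDs keeps the model from memorizing individual Pokémon instead of learning from their stats.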
1.2 PROPOSED SOLUTION

Our proposed method relies on four major steps, beginning with analyzing and visualizing the data. First, we will collect and analyze data about the classes, such as their size, noise, and randomness. We will then determine which data are most relevant to our case.

Second, we will use pre-processing techniques to restructure and augment our data to fit our problem definition and other needs. This includes dimension reduction, standardization, pipelines, and restructuring of the data. For dimension reduction, we will experiment with and test PCA, Kernel PCA, and LDA to determine which technique works best. For standardization, we will select the appropriate scaling method, because different dimension reduction and machine learning algorithms expect data to be standardized in different ways.

For pipelines, we will test different methods and hyperparameters in scikit-learn, which provides a library that facilitates this process. When restructuring the data, we will include not only the data that the Kaggle dataset [1] provides but also the corresponding Pokémon attributes. In other words, we will combine these attributes with the Kaggle dataset, which provides more features. Furthermore, the “ID” feature will be excluded from training: the goal is to learn from the attributes rather than from individual IDs, which should help the model handle new data.

Third, to train our models properly, we are implementing a pipeline-based training approach.
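A pipeline of this kind might look roughly as follows in scikit-learn. This is a sketch under stated assumptions: `make_classification` generates synthetic stand-in data, and the two classifiers (logistic regression and a decision tree) and all grid values are placeholders, not the models or settings the project settled on. The grid swaps PCA, Kernel PCA, and LDA into the same `reduce` step after `StandardScaler`, and `GridSearchCV` records accuracy, precision, and mean fit time for every combination.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the matchup feature matrix (assumption: not real data).
X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           random_state=0)

# Standardize, then reduce dimensions, then classify.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Placeholder classifiers shared by every reducer configuration.
classifiers = [LogisticRegression(max_iter=1000),
               DecisionTreeClassifier(random_state=0)]

# Each dict swaps a different reducer into the "reduce" step.
param_grid = [
    {"reduce": [PCA()], "reduce__n_components": [2, 6],
     "clf": classifiers},
    {"reduce": [KernelPCA(kernel="rbf")], "reduce__n_components": [2, 6],
     "reduce__gamma": [0.01, 0.1], "clf": classifiers},
    # A binary target allows LDA at most one component.
    {"reduce": [LinearDiscriminantAnalysis()], "reduce__n_components": [1],
     "clf": classifiers},
]

search = GridSearchCV(pipe, param_grid, scoring=["accuracy", "precision"],
                      refit="accuracy", cv=5)
search.fit(X, y)

# Compare every combination on accuracy, precision, and fit time.
res = search.cv_results_
for params, acc, prec, secs in zip(res["params"], res["mean_test_accuracy"],
                                   res["mean_test_precision"],
                                   res["mean_fit_time"]):
    print(f"{type(params['reduce']).__name__:>26} "
          f"{type(params['clf']).__name__:>22} "
          f"acc={acc:.3f} prec={prec:.3f} fit={secs:.4f}s")

print("best:", search.best_params_)
```

Placing the scaler inside the pipeline means it is refit on each training fold, so no statistics leak from the validation folds into the reducer or classifier. LDA is capped at one component here because it can produce at most (number of classes minus one) components, and a win/lose target has two classes.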
We are using a predetermined dataset, which will speed up training. A pipeline system will let us rapidly iterate on hyperparameters and compare multiple models to determine the proper model for our data. Once we determine which model works best according to metrics such as accuracy, running time, and precision, we will seek to improve the selected model further and arrive at our final battle predictor model.

Finally, the goal is a standalone Python executable that uses the researched and completed models to predict which Pokémon will win. Our focus in this standalone application is a quick and usable graphical user interface (GUI). We intend to apply design ideas such as spacing, signifiers, and color theory to make the interface intuitive and discoverable. Our design assumes that the end user can read and use a keyboard with relative ease.

2 REFERENCES

[1] J. Bouchet, “Pokemon battles,” Kaggle, Oct. 2017. Retrieved February 23, 2022, from https://www.kaggle.com/jonathanbouchet/pokemonbattles/report

Angel Camacho did not write his biography.

Mathew Groover is a senior in computer science at NMSU, graduating this May. He completed a full internship in the summer of 2020. He enjoys programming games on the side as a hobby.

Jason Ivey is a senior at New Mexico State University focusing on artificial intelligence and minoring in Electrical Engineering. He is a member of the Nano-Sat lab in NMSU's Electrical Engineering Department and previously delivered a nano-satellite to a publicly traded spaceflight company. This satellite was NMSU's first satellite mission and the first instance of NMSU code in low Earth orbit.

Ziad Arafat is a computer science major at NMSU with a strong career interest in AI and data science.
Ziad seeks to apply artificial intelligence to safety inspection systems and image analysis. He also enjoys finding clever ways to automate tedious business tasks at work.

ACKNOWLEDGMENTS

We thank Dr. Huiping Cao, Shahriar Rahman Dipon, and Erick Draayer of NMSU, as well as IEEE, Overleaf, and the LaTeX community for making documentation and templates readily available to us while we wrote this paper.