International Journal of Conceptions on Computing and Information Technology, Vol. 3, Issue 3, October 2015; ISSN: 2345 - 9808

UPCAnalysis: Predictive Analysis of the Examinees' Outcome in UPCAT for Rosales National High School, Philippines

Aldous Val D. Basco, Jerald F. Dacumos, Ma. Kristine A. Dolor, Lian Grace Y. Perez, Jennifer T. Zamora
College of Computer Studies, New Era University
No. 9 Central Avenue, New Era, Quezon City, Philippines
{aldousval, iamjfd.z, mariakristine.dolor, liangraceperez}@gmail.com and jtzamora@neu.edu.ph

As UP is the national university, a UP education is the pot of gold that students throughout the nation yearn for. The University of the Philippines (UP) holds the crown as the nation's top school (QS, 2013)[4]. To be admitted to UP, however, a student needs to pass the UPCAT, or the UP College Admission Test (Sicat et al., 2009)[2]. Unfortunately, not everybody reaches the end of the rainbow and gets the pot of gold. Because of UP's legacy of excellence, leadership and honor, and its breed of genuine "Iskolars ng bayan", it is the top choice among college entrance examinees.

Abstract- Although students from Rosales National High School (RNHS) perform well, statistics show that few of them pass the University of the Philippines College Admission Test (UPCAT). The UPCAT comprises four subtests: Language Proficiency, Science, Mathematics, and Reading Comprehension. The scores on these subtests are combined with the weighted average of final grades in the first three years of high school to determine qualification into UP. Using data mining, this study will formulate a forecasting model to help RNHS predict the next UPCAT passers.
This study compares data mining methods and techniques (Linear Regression, Decision Tree, and Neural Networks) in predicting future UPCAT passers, applying data collected from 4th year students in the first, second and third sections of Rosales National High School. The data include their grades from 1st to 3rd year and some personal information such as Student No., Name, Address, and General Weighted Average. The study would determine the future UPCAT results from Rosales National High School.

In this study, the researcher will use data mining techniques that will help to predict the outcome of the University of the Philippines College Admission Test (UP, 2013)[6]. Data mining is the computer-assisted process of digging through and analyzing large sets of data and then extracting the meaning of the data (Alexander, 2015)[1]. It also predicts behaviors and future trends, allowing organizations to make proactive, knowledge-driven decisions. Data mining tools can also answer business questions that traditionally were too time-consuming to resolve. It is an interdisciplinary field with contributions from many areas such as pattern recognition and bioinformatics (Han and Kamber, 2000)[3].

Other predictive problems include forecasting and modeling. Modeling is basically the act of building a model based on data from situations where the answer is known and then applying the model to other situations where the answers are not known. The researcher used data of fourth year students who took the exam from 2008 to 2012. At the end of the study, a forecasting model will be developed and used to predict whether a student passes or fails the UPCAT. The model can be considered long term because the UP College Admission Test is held once a year.

Keywords- Education, College entrance exam, UPCAT, Data mining

I.
INTRODUCTION

Education is a powerful driver of development and is one of the strongest instruments for reducing poverty and improving health, gender equality, peace, and stability (The World Bank, 2015)[5]. Education provides children, youth and adults with the knowledge and skills to be active citizens and to fulfill themselves as individuals. Moreover, it can bring about change and progress in society. It makes the people of a country civilized and well mannered, which is why many students aspire to study in a university of high standard.

A college entrance exam is a standardized aptitude test that measures cumulative learning in different skill areas, such as verbal, math, analytical and writing skills. These tests are not intended to measure what students have learned in school but to indicate the performance of the student.

II. OBJECTIVES

This study aims to design a forecasting model that will predict the outcome for the examinees from Rosales National High School who will take the University of the Philippines College Admission Test. It also aims to identify the indicators to be considered in designing a forecasting model for the UPCAT, as well as the significant relationships of each variable. In addition, this study focuses on determining the data mining technique that yields the most accurate result and the level of acceptance of the predictive model to be developed.

An entrance examination is conducted by educational institutions to determine whether prospective students are qualified to enter. It is also used to determine the candidate's preparation for a course of study.
Research data show that individually administered aptitude tests have the following qualities: a) they are excellent predictors of future scholastic performance, b) they provide ways of comparing an individual's performance with that of others in the same situation, c) they provide a profile of strengths and weaknesses, d) they assess differences among individuals, e) they uncover hidden talents in individuals, thus improving their educational opportunities, and f) they serve as valuable tools for working with the handicapped. GPA has been used as a predictive factor of student success, alone or in combination with other selective admission criteria (Stuenkel, 2009).

The study concentrates on the performance of high school students to determine whether a student is likely to pass the UPCAT, and on the different variables that influence scores in the exam. The researcher will use data on the grades and performance of previous UPCAT passers. The study is limited to designing a forecasting model that will help to analyze the relationship between the performance rating and the dataset from the records of the students who passed the exam from school year 2008-2012 in Rosales National High School and the profiles of the current senior students.

With data mining techniques, a cycle is built in the educational system that consists of forming hypotheses, testing and training; that is, its use can be directed to the various actors of the educational process in accordance with their specific needs: a) students; b) professors; and c) administration and supporting administration.

III. RELATED WORKS

The following information, gathered from reference books, journals, the internet and other related materials, helped the researcher to forecast the students' outcome in the UPCAT.

Application of data mining in educational systems can be directed to support the specific needs of each of the participants in the educational process.
For the student, data mining can recommend additional activities, teaching materials and tasks that would favor and improve his/her learning. Professors would have feedback and the possibility to classify students into groups based on their need for guidance and monitoring, to find the most frequently made mistakes, to find effective actions, etc. Administration and administrative staff will receive the parameters that will improve system performance (Romero et al., 2007).

Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. Data mining typically deals with data that have already been collected for some purpose other than the data mining analysis (for example, they may have been collected in order to maintain an up-to-date record of all the transactions in a bank). This means that the objectives of the data mining exercise play no role in the data collection strategy. This is one way in which data mining differs from much of statistics, in which data are often collected by using efficient strategies to answer specific questions.

Simply stated, data mining refers to extracting or "mining" knowledge from large amounts of data. The term is actually a misnomer. Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, data mining should have been more appropriately named "knowledge mining from data," which is unfortunately somewhat long. "Knowledge mining," a shorter term, may not reflect the emphasis on mining from large amounts of data. Nevertheless, mining is a vivid term characterizing the process that finds a small set of precious nuggets from a great deal of raw material (Kamber et al., 2000).

IV. METHODOLOGY OF THE RESEARCH

Figure 1 shows the Architectural Design of the study.

Fig. 1.
Architectural Design

Data mining is one component of the exciting area of machine learning and adaptive computation. The goal of building computer systems that can adapt to their environments and learn from their experience has attracted researchers from many fields, including computer science, engineering, mathematics, physics, neuroscience, and cognitive science.

The current senior profile will be divided into two sections: External and Internal. External data consist of Name, Address and Student Number. Internal data consist of the grades from 1st year to 3rd year together with the grades of UPCAT passers from 2008-2012. After the students' data are saved in a database, they are imported for data preprocessing using data mining techniques.

Data mining (also called knowledge or data discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining is one of a number of analytical tools for analyzing data.

Linear regression, decision tree and neural networks are the techniques used in forecasting UPCAT passers in Rosales National High School. The researcher uses the three data mining techniques to determine which forecasting model gives the most accurate result.

Table I shows the summary for the Decision Tree in RapidMiner. The table indicates that 83.64% of the instances were correctly classified. The Class Precision column, also called positive predictive value, indicates the fraction of retrieved instances that are relevant, while the Class recall, also called sensitivity, is the fraction of relevant instances that are retrieved.
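As a minimal sketch of how Class Precision, Class recall and accuracy follow from a two-class confusion matrix, the Python snippet below recomputes them from the four counts reported in Table I (7, 4, 13 and 80). The function and variable names are illustrative assumptions and are not part of RapidMiner.

```python
def confusion_summary(tp, fp, fn, tn):
    """Return per-class precision/recall and accuracy as percentages.

    tp: predicted P, truly P     fp: predicted P, truly F
    fn: predicted F, truly P     tn: predicted F, truly F
    """
    total = tp + fp + fn + tn
    return {
        "precision_P": 100 * tp / (tp + fp),  # row "Pred. P"
        "precision_F": 100 * tn / (tn + fn),  # row "Pred. F"
        "recall_P": 100 * tp / (tp + fn),     # column "True P"
        "recall_F": 100 * tn / (tn + fp),     # column "True F"
        "accuracy": 100 * (tp + tn) / total,
    }


# Counts as reported in Table I of this paper.
summary = confusion_summary(tp=7, fp=4, fn=13, tn=80)
for name, value in summary.items():
    print(f"{name}: {value:.2f}%")
```

With these counts the precision values are 7/11 and 80/93, the recall values are 7/20 and 80/84, and the accuracy is 87/104, which corresponds to the micro-averaged figure in the table.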
Both precision and recall are therefore based on an understanding and measure of relevance. High precision means that an algorithm returned substantially more relevant results than irrelevant ones, while high recall means that it returned most of the relevant results. As illustrated in Table I, the precision on the first row (Pred. P) is 7/11, or 63.64%, and the precision on the second row (Pred. F) is 80/93, or 86.02%, which means that the model returned more relevant results than irrelevant ones. The recall in column True P is 35.00% (7/20), whereas the recall in column True F is 95.24% (80/84): the model retrieved most of the students who failed but only about a third of those who passed.

Statistical analysis was also used by the researcher for this study. In Business Intelligence (BI), statistical analysis involves collecting and scrutinizing every data sample in a set of items from which samples can be drawn. Aside from data mining and statistical analysis, the researcher used the Correlational Research Design, which attempts to explore relationships to make predictions. Constructive Research Design is also used in the study because it is a method that builds an artifact that solves a domain problem in order to create knowledge about how problems can be solved.

V. RESULTS AND DISCUSSION

Fig. 2. Decision Tree from RapidMiner

Figure 2 shows the decision tree. It indicates that Math, English and Science are the most influential attributes in the model, in that order. P means Passed and F means Failed. To interpret the tree, one starts at the root and reads through it until a leaf node is reached.

TABLE I. SUMMARY TABLE FOR DECISION TREE FROM RapidMiner

                True P    True F    Class Precision
Pred. P         7         4         63.64%
Pred. F         13        80        86.02%
Class recall    35.00%    95.24%

**accuracy: 83.64 +/- (mikro: 83.65%)

Fig. 3. Neural Network Model using WEKA

Figure 3 shows the Neural Network Model using the WEKA data mining tool. In WEKA, the more attributes or perceptrons, the more accurate the results it provided.

Fig. 4. Results of Neural Network using WEKA

Figure 4 illustrates the result of the model. It indicates that 87.0149% of the instances are correctly classified, so about 13% of the instances are misclassified. The mean absolute error of the model is 0.1241, which means that the model obtained a low error rate.

TABLE II. COEFFICIENT TABLE FOR MULTIPLE LINEAR REGRESSIONS USING IBM SPSS

Model: Constant, FIL_AVE, ENG_AVE, MATH_AVE, SCI_AVE
Unstandardized Coefficients: B, Standard Error, T Stat, P-value
19.204 19.934 -0.620 0.293 -0.411 0.431 0.248 0.253 0.382 0.317 0.232 0.424 0.293 0.287
0.371032 0.03884 0.083707 0.223149 0.157988
Dependent Variable: UPG_Converted

Based on the mean absolute percentage error and the resulting accuracy rate, the 92.7894% efficiency of the forecasting model for the outcome of the UPCAT passers means that the model would give reliable results at that level of efficiency. The model also represents that 9 out of 10 students, or 92.7894 of every 100 examinees, have a chance of passing.

VI. CONCLUSION AND DISCUSSION

The significant predictors that the forecasting model identified are the subjects English, Filipino, Mathematics and Science, which is consistent with the fact that these subjects are covered by the four subtests comprising the UPCAT. As the student's performance in English, Science, Math and Filipino increases, there would be an increase in the University Predicted Grade (UPG). That means that the UPG and the four variables have a significant correlation.

In the research, three data mining algorithms were applied on the assessment data to predict the future UPCAT passers in Rosales National High School, that is, whether the examinees passed or failed. The best data mining technique for this study is the Multiple Linear Regression using IBM SPSS as it provides the most accurate results.
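To illustrate how a multiple linear regression model of this kind is applied, and how a mean absolute percentage error (MAPE) is computed for it, the following Python sketch may help. The coefficients, subject averages and observed grades are hypothetical placeholders chosen only for illustration; they are not the values of Table II or data from Rosales National High School. Reading efficiency as 100 minus MAPE is likewise an assumption, not something the study states explicitly.

```python
def predict_upg(coefs, averages):
    """Evaluate Y = B0 + B1*X1 + ... + Bk*Xk for one student."""
    intercept, *slopes = coefs
    if len(slopes) != len(averages):
        raise ValueError("need exactly one slope per predictor")
    return intercept + sum(b * x for b, x in zip(slopes, averages))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100 * total / len(actual)


# HYPOTHETICAL coefficients: B0, then Filipino, English, Math, Science.
coefs = [10.0, 0.2, 0.3, 0.25, 0.15]
# HYPOTHETICAL subject averages for three students.
students = [
    [85.0, 88.0, 90.0, 87.0],
    [80.0, 82.0, 78.0, 81.0],
    [92.0, 91.0, 95.0, 93.0],
]
# HYPOTHETICAL observed grades for the same three students.
observed = [90.0, 80.0, 95.0]

predicted = [predict_upg(coefs, s) for s in students]
error = mape(observed, predicted)
print("predicted:", [round(p, 2) for p in predicted])
print(f"MAPE: {error:.2f}%  (100 - MAPE: {100 - error:.2f}%)")
```

Keeping prediction and error measurement in separate functions means the same `mape` helper could be reused to compare the outputs of other techniques on the same observed values.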
The 92.7894% efficiency of the forecasting model for the outcome of the UPCAT passers means that the model would give reliable results at that level of efficiency. The model also represents that 9 out of 10 students, or 92.7894 of every 100 examinees, have a chance of passing.

In Table II, column 1 shows the predictor variables (Constant, FIL_AVE, ENG_AVE, MATH_AVE, and SCI_AVE). In Multiple Linear Regression, there are several independent variables or functions of independent variables. Adding a term for each additional independent variable to the simple regression equation gives:

$Y = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + B_4 X_4$ (1)

The first variable, Constant, is the Y-intercept, the height of the regression line where it crosses the Y-axis; it is the predicted value of UPG_Converted when all independent variables are zero. The second column, the B values, is used to predict the dependent variable from the independent variables. The B coefficients of FIL_AVE, ENG_AVE, MATH_AVE, and SCI_AVE indicate the predicted change in UPG_Converted for every unit increase in the corresponding variable.

The efficiency of the forecasting model can be determined using the standard errors and using the formula of Mean Absolute Percentage Error:

$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|$ (2)

where $A_t$ is the actual value and $F_t$ is the fitted or forecasted value at point $t$. The absolute value in this calculation is summed for every fitted or forecasted point in time and divided by the number of fitted points n; multiplying by 100 makes it a percentage error.

Future developers should use this model, as the study formulated a forecasting model that can be extended with more distinctive attributes to obtain accurate results and be useful in improving the students' learning outcomes.

REFERENCES

[1] Doug Alexander, "Data Mining", unpublished.
[2] Gerardo P. Sicat and Marian Panganiban, "High School Background and Academic Performance", August 2009, in press.
[3] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques, 2nd edition", 2000.
[4] QS Quacquarelli Symonds, "Top Universities", 2013, unpublished.
[5] The World Bank Group, "Education", 2015, unpublished.
[6] University of the Philippines, 2015.
[7] Novelozo, Diaz, "Predictive Analysis of Examinees outcome in UPCAT for Rosales National High School", unpublished.