6th National and 3rd International Conference of e-Learning and e-Teaching (ICELET2012)
Conference Paper · February 2012 · DOI: 10.1109/ICELET.2012.6333365
978-1-4673-0957-8/12/$31.00 ©2012 IEEE

Predicting GPA and Academic Dismissal in LMS Using Educational Data Mining: A Case Mining

Mahdi Nasiri, Nasiri_m@iust.ac.ir
Behrouz Minaei, minaei_b@iust.ac.ir
Fereydoon Vafaei, vafaee@vu.iust.ac.ir
Iran University of Science and Technology (IUST)

Abstract— In this paper, we describe an educational data mining (EDM) case study based on data collected from the learning management system (LMS) of the e-learning center and the electronic education system of Iran University of Science and Technology (IUST). Our main goal is to illustrate applications of EDM in the domain of e-learning and online courses by building models that predict academic dismissal and the GPA of graduated students. Many educational institutions consider monitoring and supporting freshmen and first-year students very important. If the probability of dismissal, dropout, and other obstacles on the way to graduation can be estimated, and if reliable tools exist to predict GPA or even semester-by-semester grades, university officials can design more efficient strategies for their education systems, especially for e-learning systems, whose problems are less well understood and more complicated. To achieve this goal, we followed a common data mining methodology called CRISP. Our results (a regression model for GPA with R Square = 0.632 and a C5.0 classifier for academic dismissal with 88.5% accuracy) show that reliable models can be built for predicting educational attributes.

Keywords- Educational Data Mining (EDM), Prediction, C5.0 Algorithm, Regression

I. Introduction

a. What's EDM?
Educational Data Mining (EDM) is an interdisciplinary field bringing together researchers from computer science, education, psychology, psychometrics, and statistics to analyze large data sets in order to answer educational research questions. [3] The growth of instrumented educational software and of databases of student test scores has created large repositories of data reflecting how students progress along their learning paths. EDM focuses on computational approaches for using those data to address important educational questions. [6] In effect, EDM applies data mining methods to education and pedagogy; some of these methods, such as classification, are predictive [9], whereas others, such as clustering, are descriptive [10]. Interest in data mining for educational systems is currently increasing, which has made EDM a new and growing research community.

b. Related Work
Analyzing educational datasets has become widespread. There are many related conferences, workshops, and symposiums around the world, along with academic studies in this domain [4, 5, 7, 8], and these studies include case studies that measure the accuracy of the methods discussed. In the Netherlands, universities are legally obliged to provide students with the support they need to evaluate their study choice. A case study performed in the Department of Electrical Engineering at Eindhoven University of Technology to predict the dropout of electrical engineering students showed that rather simple and intuitive classifiers (decision trees) give useful results, with accuracies between 75% and 80%. [4] However, lessons drawn from a small-scale case study in the Department of Computer Science and Information Systems at the University of Jyväskylä in Finland show that, even with a modest-sized dataset and well-defined problems, it is still rather hard to obtain meaningful and truly insightful results with a set of traditional data mining (DM) approaches and techniques, including clustering, classification, and association analysis. [5] There have also been studies that predict educational attributes such as grades or GPA. [11] In this paper, we attempt to build reliable models for predicting two such attributes: GPA and the probability of academic dismissal.

II. Methodology
The data mining process must be reliable and repeatable by people with little data mining skill. For this reason, a widely used methodology called the Cross-Industry Standard Process for Data Mining (CRISP) was launched as an initiative in late 1996. [2] We have followed CRISP in this paper. It consists of six phases, which are described below. Although we have focused on prediction methods, we also performed a sequence mining analysis and obtained some association rules, which are presented in Appendix 1. Since we have not concentrated on descriptive models, these results are not discussed in the conclusion, but they can be considered in further work.

a. Business Understanding
This first phase covers the project objectives, the understanding of requirements, and the definition of the data mining problem. Our major goal is to predict the academic dismissal of students and to find a model that predicts the GPA of graduated students in the e-learning center of IUST. Regression and classification (the C5.0 algorithm) have been used for these predictions. The objectives are intended to help university officials evaluate and monitor the efficiency of the system, and the models may also be applied in a support system for students.

b. Data Understanding
The second phase consists of initial data collection, familiarization with the data, and identification of data quality problems. Since the electronic education system of IUST is built on a relational database implemented in Microsoft SQL Server, we had to analyze the records of the database tables and understand the relations between them. Three major tables were used in our project, each containing fields of student data such as ID, nationality, age, educational history, passed courses, failed courses, and deleted courses. In addition, analytical graphs and statistical charts were created with SPSS Clementine 12.0 for a better understanding of the relations between the fields; some of them are presented in Appendix 2.

c. Data Preparation
This is one of the most time-consuming phases. First, we needed to separate e-learning students from the others, because all student data are stored in a single database that includes both students in the traditional system and students in hybrid systems. This was done with SQL queries at the first stage. It is also important to detect noise in the data, which would cause errors later; no noise was detected at this stage. Then, at the feature selection stage, 13 fields were selected for building the model, among them passed courses, uncompleted courses, failed courses, and deleted courses. One feature, Semester Enroll Time Status ID, was ignored at this stage because a single category accounted for too large a share of its records. Besides, to avoid over-fitting, which would cause substantial errors in the results, three categories of dismissed students were merged into one category. Students leave their education incomplete for many reasons; withdrawal for personal reasons and some other causes were added to the academic dismissal category because they are a minority in comparison with it.

d. Modeling
Two common prediction methods in data mining are regression and the C5.0 algorithm (a type of decision tree). We used regression analysis to predict GPA and the C5.0 algorithm to predict academic dismissal. Straight-line regression analysis involves a response variable, y, and a single predictor variable, x. It is the simplest form of regression and models y as a linear function of x [1]; that is,
$y = b + wx$ (1)
where the variance of y is assumed to be constant, and b and w are regression coefficients specifying the y-intercept and the slope of the line, respectively. The regression coefficients w and b can also be thought of as weights, so we can equivalently write
$y = w_0 + w_1 x$ (2)
Because C5.0 is a commercial algorithm, no detailed public description of it is available; we used the implementation provided in SPSS Clementine 12.0 for classification.

e. Evaluation
This phase evaluates whether the business objectives have been achieved; this is discussed in the Results section of this paper.

f. Deployment
This phase covers deploying the resulting model and implementing a repeatable data mining process. Case study projects are highly dependent on their data, which may vary even within the same project; the significant point is that the accuracy of their results can assure researchers of the usefulness of these models.

III. Results
Below are the results of our analysis using SPSS Clementine 12.0.

a. Regression Analysis Results for GPA
Some main features of the analysis are as follows:
Variables entered (predictors): Unfinished Units, Rehearsal Units, Semester Education Status ID, Deleted Units, Counted Units, Semester Admission Status ID, Failed Units, Semester Number of Courses Status ID, Taken Units, Passed Units, and Effective Units.
Model summary: R = 0.795, R Square = 0.632, Adjusted R Square = 0.631, Std. Error of the Estimate = 4.020549.
Analysis of Variance (ANOVA), Regression: Sum of Squares = 102994.643, df = 12, Mean Square = 8582.887, F = 530.961, Sig. = 0.000; Dependent Variable = Average.

b. C5.0 Algorithm Results for Academic Dismissal
So far, 17 rules have been obtained with this algorithm. Some of them are:
Among bachelor students, if Total Rehearsal Units = 2, Total Unfinished Units = 0, and the student's military status is Exempt, then the student is expelled.
Among master students, if Total Unfinished Units is 10, 16, or 20, then the student is expelled; if Total Unfinished Units = 3, then the student has withdrawn.
The measured accuracy of the model is 88.5%.

IV. Conclusion
This work has presented an analysis of real data from an e-learning system. The regression and C5.0 classification results also reveal a weakness of these EDM techniques: they depend strongly on the distribution of the data, so small variations in the data could lead to different conclusions. Accordingly, future work may include enhancing these techniques with other data mining methods such as association rules and sequence mining. However, the outcomes of such case mining can reveal new ways of supporting and monitoring educational processes in the complicated systems of online courses. Finally, it is important to evaluate and validate the proposed analysis with other types of data and to share the results with experts to gain their comments. In this sense, future work may also include applying this analysis to data from different educational systems in order to improve the methods.

V. Appendix 1
Here, we present some results of the sequence mining analysis in the form of association rules. As mentioned earlier in this paper, association rules are considered descriptive models in data mining.
Format: Antecedent => Consequent, Support = X, Confidence = X
1- If a student's average is B in a semester, then it is B again in the next semester. Support = 68.936, Confidence = 54.115
2- If a student's average is A in a semester, then it is B in the next semester. Support = 60.851, Confidence = 53.613
3- If a student's average is B in a semester, then it is A in the next semester. Support = 60.851, Confidence = 50.412
4- If a student's average is B in a semester, then it is D in the next semester. Support = 68.936, Confidence = 35.802
5- If a student's average is A in a semester, then it is D in the next semester. Support = 60.851, Confidence = 34.965
These results show that large changes in average between two consecutive semesters are not supported with high confidence. Consequently, students' averages show some stability from one semester to the next.
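The Support and Confidence columns of Appendix 1 can be illustrated with a short sketch. It assumes, as an illustration rather than as Clementine's exact definition, that support is the percentage of students whose semester-average sequence contains the antecedent grade, and confidence is the percentage of those students for whom the consequent grade follows the antecedent in the very next semester. The function name `rule_stats` and the five grade sequences are hypothetical and are not data from this study.

```python
from collections.abc import Sequence

# Hypothetical semester-by-semester averages for five students (NOT data from this study).
STUDENTS: list[list[str]] = [
    ["B", "B", "A"],
    ["A", "B", "B"],
    ["B", "D", "C"],
    ["C", "C", "B"],
    ["A", "A", "D"],
]


def rule_stats(sequences: Sequence[Sequence[str]], antecedent: str,
               consequent: str) -> tuple[float, float]:
    """Return (support %, confidence %) for the rule
    'average is `antecedent` in a semester => average is `consequent` next semester'.

    Assumed definitions (illustrative; a given tool may define them differently):
      support    = % of students whose sequence contains the antecedent at least once
      confidence = % of those students with at least one antecedent -> consequent
                   transition between two consecutive semesters
    """
    if not sequences:
        return 0.0, 0.0
    with_antecedent = [s for s in sequences if antecedent in s]
    with_transition = [
        s for s in with_antecedent
        # zip(s, s[1:]) pairs each semester with the semester that follows it
        if any(a == antecedent and b == consequent for a, b in zip(s, s[1:]))
    ]
    support = 100.0 * len(with_antecedent) / len(sequences)
    confidence = (100.0 * len(with_transition) / len(with_antecedent)
                  if with_antecedent else 0.0)
    return support, confidence


if __name__ == "__main__":
    for ante, cons in [("B", "B"), ("A", "B"), ("B", "D")]:
        support, confidence = rule_stats(STUDENTS, ante, cons)
        print(f"{ante} => {cons}: Support={support:.3f}, Confidence={confidence:.3f}")
```

On this toy data, B => B gets a support of 80% and a confidence of 50%, while B => D keeps the same 80% support but only 25% confidence: rules sharing an antecedent share its support, and confidence is what separates a stable transition from a large jump, which mirrors how rules 1 and 4 above are read.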
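The regression statistics reported in Section III.a can be cross-checked with the standard identities of an ordinary least-squares ANOVA table: Mean Square = Sum of Squares / df, F = MS(regression) / MS(residual), Std. Error of the Estimate = sqrt(MS(residual)), and R = sqrt(R Square). The sketch below, assuming these standard identities hold for the SPSS output, derives the residual mean square (which the paper does not report) from the reported F value. The helper name `anova_check` is hypothetical; the input constants are the values reported in Section III.a.

```python
import math

# Figures as reported in Section III.a (Dependent Variable = Average).
SS_REGRESSION = 102994.643   # regression sum of squares
DF_REGRESSION = 12           # regression degrees of freedom
F_STATISTIC = 530.961        # F = MS_regression / MS_residual
R_SQUARE = 0.632
STD_ERROR_REPORTED = 4.020549


def anova_check(ss_reg: float, df_reg: int, f_stat: float,
                r_square: float) -> dict[str, float]:
    """Derive ANOVA quantities from reported values using standard OLS identities.

    Hypothetical helper for illustration; it does not reproduce the original analysis.
    """
    ms_reg = ss_reg / df_reg          # Mean Square = Sum of Squares / df
    ms_res = ms_reg / f_stat          # rearranged from F = MS_reg / MS_res
    std_error = math.sqrt(ms_res)     # Std. Error of the Estimate = sqrt(MS_res)
    multiple_r = math.sqrt(r_square)  # multiple R = sqrt(R Square)
    return {"ms_reg": ms_reg, "ms_res": ms_res, "std_error": std_error, "r": multiple_r}


if __name__ == "__main__":
    result = anova_check(SS_REGRESSION, DF_REGRESSION, F_STATISTIC, R_SQUARE)
    for name, value in result.items():
        print(f"{name}: {value:.6f}")
    print(f"reported std. error: {STD_ERROR_REPORTED}")
```

With the reported values, the mean square rounds to the reported 8582.887, the standard error derived from F agrees with the reported 4.020549 to about six decimal places, and sqrt(0.632) rounds to the reported R = 0.795, so the reported figures are mutually consistent.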
VI. Appendix 2
[a] The filtered relation graph between the number of taken units (4, 6, 11, 12, 14) of courses and the gained average (A, B, C)
[b] The unfiltered relation graph between the number of taken units (4, 6, 12, 14, 16) of courses and the gained average (A, B, C, F); line thickness indicates frequency
[c] Sex by study period (Bachelor = 2.0, Master = 3.0)
[d] Distribution of taken units and deleted units
[d] Distribution of taken units and their scores
[e] Distribution of taken units and failed units
[f] Distribution of scores and taken units
[g] Distribution of GPA (A to F, plus grades not exactly specified: N, R, X)

References
[1] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Second Edition, University of Illinois at Urbana-Champaign, 2006.
[2] Pang-Ning Tan, Michael Steinbach and Vipin Kumar, "Introduction to Data Mining", Addison Wesley, Boston, USA, 2006.
[3] C. Romero and S. Ventura (Eds.), "Data Mining in e-Learning", 2006, pp. 261-278.
[4] Gerben W. Dekker, Mykola Pechenizkiy and Jan M. Vleeshouwers, "Predicting Students Drop Out: A Case Study", Proceedings of the 2nd International Conference on Educational Data Mining, Cordoba, Spain, July 1-3, 2009.
[5] Mykola Pechenizkiy, Toon Calders, Ekaterina Vasilyeva and Paul De Bra, "Mining the Student Assessment Data: Lessons Drawn from a Small Scale Case Study", Proceedings of the 1st International Conference on Educational Data Mining, Montréal, Québec, Canada, June 20-21, 2008.
[6] C. Romero and S. Ventura, "Educational Data Mining: A Survey from 1995 to 2005", Department of Computer Sciences, University of Cordoba, Cordoba, Spain, 2005.
[7] W. Hamalainen, J. Suhonen, E. Sutinen and H. Toivonen, "Data Mining in Personalizing Distance Education Courses", World Conference on Open Learning and Distance Education, Hong Kong, 2004.
[8] M. Hanna, "Data Mining in the e-Learning Domain", Computers & Education, 42(3), pp. 267-287, 2004.
[9] Cristóbal Romero, Sebastián Ventura, Pedro G. Espejo and César Hervás, "Data Mining Algorithms to Classify Students", Proceedings of the 1st International Conference on Educational Data Mining, Montréal, Québec, Canada, June 20-21, 2008.
[10] Elizabeth Ayers, Rebecca Nugent and Nema Dean, "Skill Set Profile Clustering Based on Weighted Student Responses", Proceedings of the 1st International Conference on Educational Data Mining, Montréal, Québec, Canada, June 20-21, 2008.
[11] Amelia Zafra and Sebastián Ventura, "Predicting Student Grades in Learning Management Systems with Multiple Instance Genetic Programming", Proceedings of the 2nd International Conference on Educational Data Mining, Cordoba, Spain, July 1-3, 2009.