Conference Paper · February 2012
DOI: 10.1109/ICELET.2012.6333365
6th National and 3rd International Conference of e-Learning and e-Teaching (ICELET2012)
Predicting GPA and Academic Dismissal in LMS
Using Educational Data Mining: A Case Mining
Mahdi Nasiri
Nasiri_m@iust.ac.ir
Behrouz Minaei
minaei_b@iust.ac.ir
Fereydoon Vafaei, IUST
vafaee@vu.iust.ac.ir
Abstract— In this paper, we describe an educational data mining (EDM) case study based on data collected from the learning management system (LMS) of the e-learning center and the electronic education system of Iran University of Science and Technology (IUST). Our main goal is to illustrate applications of EDM in the domain of e-learning and online courses by building models that predict academic dismissal and the GPA of graduated students. Monitoring and supporting freshmen and first-year students is considered very important in many educational institutions. Consequently, if there are ways to estimate the probability of dismissal, dropout and other obstacles on the way to graduation, and tools capable of predicting GPA or even semester-by-semester grades, university officials can design more efficient strategies for educational systems, especially e-learning systems, whose problems are less well understood and more complicated.
To achieve this goal, we used a common data mining methodology called CRISP. Our results show that reliable models can be built for predicting educational attributes.

Keywords- Educational Data Mining (EDM), Prediction, C5.0 Algorithm, Regression

978-1-4673-0957-8/12/$31.00 ©2012 IEEE

I. Introduction
a. What’s EDM?
Educational Data Mining (EDM) is an interdisciplinary field bringing together researchers from computer science, education, psychology, psychometrics, and statistics to analyze large data sets in order to answer educational research questions. [3] Currently there is increasing interest in applying data mining to educational systems, making educational data mining a new and growing research community.
The increasing use of instrumented educational software and of databases of student test scores has created large repositories of data reflecting how students progress along their learning paths. EDM focuses on computational approaches for using those data to address important educational questions. [6]
In fact, EDM applies data mining methods in the field of education and pedagogy. Some of these methods, such as classification [9], are predictive, whereas others, such as clustering, are considered descriptive [10].

b. Related Work
Analyzing educational datasets has become widespread. There are many related conferences, workshops and symposiums around the world, along with academic studies in this domain. [4, 5, 7, 8]
These studies also include case studies that measure the accuracy of the discussed methods. In the Netherlands, universities are legally obliged to provide students with the support necessary to evaluate their study choice. A case study was performed to predict the dropout of students in the Department of Electrical Engineering at Eindhoven University of Technology. The experimental results show that rather simple and intuitive classifiers (decision trees) give useful results, with accuracies between 75 and 80%. [4]
However, lessons drawn from a small-scale case study in the Department of Computer Science and Information Systems at the University of Jyväskylä in Finland show that, even with a modest-size dataset and well-defined problems, it is still rather hard to obtain meaningful and truly insightful results with a set of traditional data mining (DM) approaches and techniques including clustering, classification and association analysis. [5]
Other studies have also predicted educational attributes such as grades or GPA. [11]
We attempt to provide a reliable model to predict two such attributes: GPA and the probability of academic dismissal.
II. Methodology
The data mining process must be reliable and repeatable, even by people with little data mining expertise. For this purpose, a widely used methodology called the Cross-Industry Standard Process for Data Mining (CRISP) was launched as an industry initiative in late 1996. [2]
We have used CRISP in this paper. It consists of six phases, which are described below.
Although we have focused on prediction methods, we have also performed some sequence mining analysis and obtained association rules, which are presented in Appendix 1. Since we have not concentrated on descriptive models, these results are not discussed in the conclusion, but they can be considered in further work.

a. Business Understanding
This first phase covers the project objectives, understanding of the requirements, and definition of the data mining problem.
Our major goal is to predict the academic dismissal of students and to find a model that predicts the GPA of graduated students in the e-learning center of IUST. Regression and classification (the C5.0 algorithm) have been used for these predictions.
The objectives are intended to help university officials evaluate and monitor the efficiency of the system, and the models may also be applied in a support system for students.

b. Data Understanding
The second phase consists of initial data collection, familiarization with the data, and identification of data quality problems.
Since the electronic education system of IUST was designed as a relational database and implemented in Microsoft SQL Server, we had to analyze the records of the database tables and understand the relations between them.
Three major tables were used in our project, each containing fields of student data such as ID, nationality, age, educational history, passed courses, failed courses and deleted courses.
Furthermore, analytical graphs and statistical charts were created with SPSS Clementine 12.0 for a better understanding of the relations between the fields. Some of these graphs and charts are presented in Appendix 2.

c. Data Preparation
This is one of the most time-consuming phases. First, we needed to separate e-learning students from the others, because all student data, covering both students in the traditional system and those in hybrid systems, are stored in a single database. This was done with SQL queries at the first stage.
It is also important to detect noise in the data, which could cause errors later. No noise was detected at this stage.
Then, at the feature selection stage, 13 fields were selected for building the model; passed courses, uncompleted courses, failed courses and deleted courses were among them. One feature, Semester Enroll Time Status ID, was ignored at this stage because a single category contained too large a share of its records.
In addition, to avoid over-fitting, which causes considerable errors in the results, three categories of dismissed students were merged into one category. Students leave their studies unfinished for many reasons. Personal reasons for withdrawing and some other causes were merged into academic dismissal because they are a minority in comparison with it.

d. Modeling
Two common prediction methods in data mining are regression and the C5.0 algorithm (a type of decision tree).
We have used regression analysis to predict GPA and the C5.0 algorithm to predict academic dismissal.
Straight-line regression analysis involves a response variable, y, and a single predictor variable, x. It is the simplest form of regression, and models y as a linear function of x. [1]
That is,
$y = b + wx$ (1)
where the variance of y is assumed to be constant, and b and w are regression coefficients specifying the y-intercept and slope of the line, respectively. The regression coefficients, w and b, can also be thought of as weights, so that we can equivalently write
$y = w_0 + w_1 x$ (2)
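As an illustration of equations (1) and (2), the following minimal Python sketch computes the ordinary least-squares estimates of w0 and w1 for the straight-line case, together with R² and the standard error of the estimate, two statistics reported in the Results section. The function names and the toy data are hypothetical; the GPA model in this paper is a multiple regression with several predictors fitted in SPSS Clementine 12.0, so the sketch only shows the single-predictor principle.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares estimates (w0, w1) for the straight line y = w0 + w1 * x."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    w1 = sxy / sxx
    # Intercept: the fitted line passes through (x_mean, y_mean).
    w0 = y_mean - w1 * x_mean
    return w0, w1


def fit_quality(xs, ys, w0, w1):
    """R^2 and standard error of the estimate for a fitted straight line."""
    n = len(xs)
    y_mean = sum(ys) / n
    ss_res = sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y must not be constant")
    r_squared = 1 - ss_res / ss_tot
    # One predictor plus an intercept leaves n - 2 residual degrees of freedom.
    std_error = sqrt(ss_res / (n - 2))
    return r_squared, std_error


# Hypothetical toy data: x could be passed units, y a semester average.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
w0, w1 = fit_line(xs, ys)
r_squared, std_error = fit_quality(xs, ys, w0, w1)
print(f"y = {w0:.2f} + {w1:.2f}x, R^2 = {r_squared:.2f}, SE = {std_error:.3f}")
# prints: y = 2.20 + 0.60x, R^2 = 0.60, SE = 0.894
```

For a multiple regression, the standard error of the estimate is likewise the square root of the residual mean square. Applied to the ANOVA figures in the Results section, the residual mean square is 8582.887 / 530.961 ≈ 16.165, whose square root ≈ 4.0205 agrees with the reported Std. Error of the Estimate (4.020549).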
Because C5.0 is a commercial algorithm, no detailed public description of it is available; we used the implementation provided in SPSS Clementine 12.0 for classification.
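Since the paper treats C5.0 as a black box, the sketch below is offered only for intuition: it shows the gain-ratio criterion that C4.5, the published predecessor of C5.0, uses to choose the attribute a tree splits on. It is not the Clementine implementation, and C5.0 adds refinements (such as boosting) that are not shown. The field names and records are hypothetical, and every attribute, including the numeric one, is treated as categorical to keep the example short.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """C4.5-style gain ratio for splitting `rows` on a categorical `attribute`."""
    if not rows:
        raise ValueError("rows must not be empty")
    base = entropy([row[target] for row in rows])
    # Group the class labels by the value the attribute takes.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = base - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical records loosely modeled on the fields in the C5.0 rules.
rows = [
    {"military_status": "exempt", "unfinished_units": 0, "outcome": "expelled"},
    {"military_status": "exempt", "unfinished_units": 3, "outcome": "expelled"},
    {"military_status": "served", "unfinished_units": 0, "outcome": "enrolled"},
    {"military_status": "served", "unfinished_units": 3, "outcome": "enrolled"},
]
for attr in ("military_status", "unfinished_units"):
    print(attr, gain_ratio(rows, attr, "outcome"))
# prints: military_status 1.0, then unfinished_units 0.0
```

Here the hypothetical military status separates the two outcomes perfectly, so it would be chosen as the split, while the unfinished-units field carries no information about the outcome in this toy sample.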
e. Evaluation
This phase evaluates how well the business objectives have been achieved; this is discussed later in the Results section of this paper.

f. Deployment
This phase covers deploying the resulting model and implementing a repeatable data mining process.
Case study projects depend heavily on their data, which may vary even within the same project; the significant point is that the accuracy of their results can assure researchers of the usefulness of these models.

III. Results
Below are the results of our analysis using SPSS Clementine 12.0.

a. Regression Analysis Results for GPA
Some main features of the analysis are as follows:
Variables Entered: Unfinished Units, Rehearsal Units, Semester Education Status ID, Deleted Units, Counted Units, Semester Admission Status ID, Failed Unit, Semester Num Course Status ID, Taken Unit, Passed Unit and Effective Units.
R = 0.795, where the predictors are: Unfinished Units, Rehearsal Units, Semester Education Status ID, Deleted Units, Counted Units, Semester Admission Status ID, Failed Units, Semester Number of Courses Status ID, Taken Units, Passed Units and Effective Units.
R Square = 0.632, Adjusted R Square = 0.631, Std. Error of the Estimate = 4.020549.
Analysis of Variance (ANOVA):
Regression: Sum of Squares = 102994.643, df = 12, Mean Square = 8582.887, F = 530.961, Sig. = 0.000, Dependent Variable = Average.

b. C5.0 Algorithm Results for Academic Dismissal
In total, 17 rules were obtained with this algorithm. Some of them are:
Among bachelor's students, if Total Rehearsal Units = 2, Total Unfinished Units = 0 and the student's Military Status is Exempt, then the student is expelled.
Among master's students, if Total Unfinished Units is 10, 16 or 20, then the student is expelled, and if Total Unfinished Units = 3, then the student has withdrawn.
The accuracy of the model was measured at 88.5%.

IV. Conclusion
This work has presented an analysis of real data from an e-learning system. Regression and the C5.0 classification algorithm reveal some weaknesses of EDM, since they depend strongly on the distribution of the data; small variations in the data could therefore lead to different conclusions. For this reason, future work may include complementing these techniques with other data mining methods such as association rules and sequence mining. However, the outcomes of such case mining can reveal new ways of supporting and monitoring educational processes in complicated systems of online courses.
Finally, it is important to evaluate and validate the proposed analysis with other types of data and to share the results with experts to obtain their comments. In this sense, future work may also include applying this analysis to data from different educational systems so as to improve the methods.

V. Appendix 1
Here, we present some results of the sequence mining analysis in the form of association rules. As mentioned earlier in this paper, association rules are considered descriptive models in data mining.
Format: Antecedent => Consequent, Support = X, Confidence = X
1- If a student's average in a semester is B, then the student's average in the next semester will again be B, Support = 68.936, Confidence = 54.115
2- If a student's average in a semester is A, then the student's average in the next semester will be B, Support = 60.851, Confidence = 53.613
3- If a student's average in a semester is B, then the student's average in the next semester will be A, Support = 60.851, Confidence = 50.412
4- If a student's average in a semester is B, then the student's average in the next semester will be D, Support = 68.936, Confidence = 35.802
5- If a student's average in a semester is A, then the student's average in the next semester will be D, Support = 60.851, Confidence = 34.965
These results show that large changes in average between two consecutive semesters are not supported with high confidence. Consequently, students' averages show some stability from one semester to the next.
VI. Appendix 2
[a] The filtered relation graph between the number of taken units (4, 6, 11, 12, 14) of courses and the gained average (A, B, C)
[b] The unfiltered relation graph between the number of taken units (4, 6, 12, 14, 16) of courses and the gained average (A, B, C, F), where the thickness of the lines indicates the frequency
[c] Sex on Study Period (Bachelor = 2.0, Master = 3.0)
[d] Distribution of taken units and deleted units
[d] Distribution of taken units and their scores
[e] Distribution of taken units and failed units
[f] Distribution of scores and taken units
[g] Distribution of GPA (A to F, and not exactly mentioned: N, R, X)
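The support and confidence values listed for the association rules in Appendix 1 can be illustrated with the following Python sketch. In Appendix 1, rules 1 and 4 (antecedent B) share Support = 68.936 and rules 2 and 5 (antecedent A) share Support = 60.851, which suggests, although rule 3 does not fit the pattern, that the reported support may be the support of the antecedent. The sketch adopts that reading as an assumption, counts students rather than semesters, and considers only consecutive semesters. The function name and the toy sequences are hypothetical, and Clementine's sequence mining may count occurrences differently.

```python
def rule_stats(sequences, antecedent, consequent):
    """Support and confidence (in %) of 'antecedent => consequent in the next semester'.

    Assumption: support is the share of students whose grade sequence contains
    the antecedent in a semester that has a following semester (antecedent support).
    """
    if not sequences:
        raise ValueError("sequences must not be empty")
    with_antecedent = 0
    with_rule = 0
    for grades in sequences:
        # Pairs of consecutive semester averages for one student.
        pairs = list(zip(grades, grades[1:]))
        if any(a == antecedent for a, _ in pairs):
            with_antecedent += 1
            if any(a == antecedent and b == consequent for a, b in pairs):
                with_rule += 1
    support = 100.0 * with_antecedent / len(sequences)
    confidence = 100.0 * with_rule / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Hypothetical semester-average sequences, one list per student.
students = [["B", "B", "A"], ["B", "D"], ["A", "B"], ["C", "C"]]
print(rule_stats(students, "B", "B"))  # (50.0, 50.0)
print(rule_stats(students, "A", "B"))  # (25.0, 100.0)
```

Counting each student once per rule keeps a student with many B semesters from inflating the support of rules with antecedent B; a semester-level count is an equally plausible alternative.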
References
[1] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Second Edition, University of Illinois at Urbana-Champaign, 2006.
[2] Pang-Ning Tan, Michael Steinbach and Vipin Kumar, "Introduction to Data Mining", Addison Wesley Publishers, Boston, USA, 2006.
[3] C. Romero and S. Ventura (Eds.), "Data Mining in e-Learning", 2006, pp. 261-278.
[4] Gerben W. Dekker, Mykola Pechenizkiy and Jan M. Vleeshouwers, "Predicting Students Drop Out: A Case Study", 2nd International Conference on Educational Data Mining Proceedings, Cordoba, Spain, July 1-3, 2009.
[5] Mykola Pechenizkiy, Toon Calders, Ekaterina Vasilyeva and Paul De Bra, "Mining the Student Assessment Data: Lessons Drawn from a Small Scale Case Study", 1st International Conference on Educational Data Mining Proceedings, Montréal, Québec, Canada, June 20-21, 2008.
[6] C. Romero and S. Ventura, "Educational data mining: A survey from 1995 to 2005", Department of Computer Sciences, University of Cordoba, Cordoba, Spain, 2005.
[7] W. Hamalainen, J. Suhonen, E. Sutinen and H. Toivonen, "Data mining in personalizing distance education courses", World Conference on Open Learning and Distance Education, Hong Kong, 2004.
[8] M. Hanna, "Data mining in the e-learning domain", Computers & Education Journal, 42(3), pp. 267-287, 2004.
[9] Cristóbal Romero, Sebastián Ventura, Pedro G. Espejo and César Hervás, "Data Mining Algorithms to Classify Students", 1st International Conference on Educational Data Mining Proceedings, Montréal, Québec, Canada, June 20-21, 2008.
[10] Elizabeth Ayers, Rebecca Nugent and Nema Dean, "Skill Set Profile Clustering Based on Weighted Student Responses", 1st International Conference on Educational Data Mining Proceedings, Montréal, Québec, Canada, June 20-21, 2008.
[11] Amelia Zafra and Sebastián Ventura, "Predicting Student Grades in Learning Management Systems with Multiple Instance Genetic Programming", 2nd International Conference on Educational Data Mining Proceedings, Cordoba, Spain, July 1-3, 2009.