Uploaded by Prince Ngema

Predicting the completion of a Learner's undergraduate Degree using Machine Learning

advertisement
Using Machine Learning to predict the completion of a learner’s
Undergraduate Science Degree based on their First-year Marks
Prince Ngema
Supervised by DR Ritesh Adjhooda and DR Ashwini
BSC Hons Computer Science - 2019
The University of Witwatersrand
Course: COMS4030A- Adaptive Computation and Machine Learning
Abstract— Advances in the field of Machine learning due
to the increased usage of computers and the availability of
data has led to major improvements in domain of ”student
performance prediction”. Machine Learning can be used
in the development of models that can predict students’
performance. Predicting the students’ performance is a taxing
task because a student academic performance is dependant
on a number of factors. How a student performs in their
first year of study may give a vivid idea of where the student
is headed academically. We can therefore, using machine
learning techniques to leverage these marks and predict the
performance of students. The focus of this study, taking a
conceptual framework proposed by Spady [1970] as a rationale
is to identify the optimal Machine Learning algorithm for
predicting the completion of a learner’s Computer science
degree using marks obtained in their first year of study at
a South African University. We also focus on ranking the
predictive power of the courses taken in first year and we
also provide an interactive program that is able to predict the
completion of the student degree. The ML algorithms used are
J48 tree, K-Star, Naive Bayes, Multilayer perceptron, Support
Vector Machines(SVM) and the Logistic Regression. The
results of the algorithms were compared using 10-fold cross
validation method in terms of prediction accuracy, precision,
recall, AUROC curves and computation time. Results show
that... This study also revealed that previous knowledge of
Mathematics and Physics both at Ordinary level and 100
level are essential determinants of students’ performance in a
computer programming course.
index Terms– Image classification, image histogram, support
vector machines, RBF kernels.
1. INTRODUCTION
Many a time, admittance into a University programme is a
transformative experience for students, it gives them and their
families hope for a lustrous future. Alas, some students who
are accepted into the university programme fail to complete
their degree due poor academic performances. These students
are left lamenting and drowning in debt accumulated during
their years of study. In attempts to remedy this situation ,
many researchers from different walks of life have developed
models to predict students’ academic performance. Studies
show that the biggest attrition rate occurs at fresh-man level.
In South Africa, 29% of students drop out after doing their
fresh-man year studies and just 30% of fresh-man students
graduate after five years [Scott et al. 2007]. This study is
based on Computer Science learners at a South African
Higher-Education Research-Intensive Institution between the
years 2008 and 2017. These learners are categorized into
three groups: ”Completed”, this where a learner completes
their degree in minimum time ( 3 years); ”Delayed”, this
when the learner completes their degree in more than 3
years; ”Failed”, this where the learner drops out and does
not complete their degree. At this institution, they have been
21% of learners categorised as Completed; 24% as Delayed;
and 55% as Failed. In light of these alarming figures, higher
learning institutions are in need programs that will improve
student retention rates and graduation rates. These programs
aim to improve the performance of students by looking at
students’ previous performance and using this knowledge to
predict how a student is likely to perform in their field of
study. [Yadav et al. 2012].
Previous work in field of predicting student performance has
seen researchers employ different techniques like machine
learning, statistical analysis and data mining in attempts
of building models to do the prediction. Spady [1970]
proposed proposed that the student’s decision to stay and
complete their degree or withdraw from the academic
institution is influenced by Grades, intellectual development,
normative congruence and friendship support. He grouped
these factors under 2 systems: Grades and intellectual
development under the Academic system; and normative
congruence and friendship support under Social system. This
study lies squarely within the academic system. Grades
obtained during a student’s first-year of study are taken as
the predictor variable.
Following the conceptual framework of Spady [1970], we
define several features associated with first-year marks that
will be used to classify the student into three completion
profiles: ”Completed”, ”Delayed”, and ”Failed”. During their
first year of study, a student must have three majors and
one elective module. These four constitute our feature space
along with the student’s degree completion outcome and
the aggregate of all the marks obtain. Different machine
learning classifiers were used to predict the performance of
a student. These classifiers are : J48, K ∗ , Support Vector
Machines, Logistic Regression; Naive Bayes; and Multilayer
perceptron.
The goal of this study is therefore , exploring the
possibility of applying these machine learning algorithms
to predict the completion of a student’s Computer Science
degree based on first-year marks and investigating the value
of using first-year marks as predictor variables. The research
question can be seen as two fold:
• Are the above mentioned algorithms capable of
predicting students’ performance?
• Is using first-year marks as a predictor variables in a
student performance prediction task viable?
Using 10-fold cross validation method, Accuracy, AUROC
values and time taken to execute were used to scale the
performance of these models. Information gain was used to
rank features according to their predictive power. Based on
the experiments it is found that the accuracy level of the
classifiers range between 52% and 70%. The J48 achieved
the biggest recorded accuracy. The AUROC values vary
between 0,7 to 0,8. The J48 algorithm had the highest
AUROC value. The J48 algorithm also took the least time
to execute. In terms of Information gain, MajorOne is the
highest ranked feature, that is, MATHS 1 is the module with
the highest deterministic power.
An application software which uses the J48 predictive
model to predict the completion of a learner’s Computer
Science degree has been prepared. The stake holders of this
application are:
• Computer Science Students: Students who want to know
their chances of completing their degrees.
• Companies: Companies that offer scholarship or
bursaries to students might want to use this application
during the selection process.
• The institution: The institution
The reasons are to identify the student at risk of attrition
early enough in order to provide necessary support and
intervention for them with the goals of reducing attrition,
increasing retention, performance and graduation rate
Aljohani [2016]. A diagrammatic representation of the goals
of predicting student performance is shown in Fig1
There are three major general contributions of this paper:
(a) an interactive program which is able to predict the
completion of a student’s Computer Science degree (b)
a comparison of various classification models to classify
learner instances into these four completion outcomes; and
(c) the first trained classifier able to calculate the probability
of a learner completing their degree at a South African
University focused on the the conceptual framework of
Spady [1970].
This document is structured as follows.......................
2. R ELATED WORK
Many theoretical models (conceptual frameworks) on
student performance have been developed to date. In this
study we will adopt a conceptional framework proposed by
Spady [1970] as a rationale to predict the completion of a
Computer Science degree using first-year marks. To explain
Fig. 1: A diagrammatic presentation of the goals of predicting
student performance
the attrition process, Spady [1970] investigated the calibre
of reciprocity between the students and the environment of
their academic institutions. He asserted that this interaction
is a consequence of the exposure of individual students’
attributes such as dispositions, interests, attitudes and skills
to the influences, expectations and demands of the different
components of their institutions including courses, faculty
members, administrators and peers. His key premise was
that the effect of this reciprocity determines the level of
students’ integration within the academic and social systems
of their institutions and subsequently their tenacity. He
further postulated that the student’s decision to stay and
complete their degree or withdraw from the academic
institution is impacted by two chief factors in each of
the two systems as seen in Figure 1. These factors are
grades and intellectual development (Academic system)
and normative congruence and friendship support (social
system) [Spady 1970].
Studies to predict students academic performance have
been made over the years. Various techniques used to predict
students’ performance like data mining, statistical analysis
and machine learning have been employed. In this section,
we review some of the work previously done in this field.
In this paper student performance measures or indicates
whether a student was able to complete their degree.
A study to identify at risk-students in Mathematical
Sciences using biographical data and enrollment
observations to was conducted by researchers at a
South African University [Ajoodha and Jadhav 2017].
The basic methodology was to indicate influence of four
biographical characteristics (i.e. gender, spoken home
language, home province, and race description) on student
aggregates, explore the trajectory of student performance
over the period 2008 to 2017 respect to biographical
characteristics and calculated the posterior probability of
Fig. 2
The use of grades as a factor that affects students’ performance has been investigated by many researchers. In all of the
studies reviewed in Table??, researchers used grades as
failing to complete the minimum requirements given various
biographical profiles using Bayesian analysis. The results
showcased at-risk biographical profiles with a Bayesian
estimate that was greater than 0.7 for failing to complete
the requirements for a degree in the Mathematical Sciences.
performs better than the other the machine learning
algorithms. Comparing the performance of these algorithms
on engineered data and raw data, better performance results
were obtained on engineered data.
Oyelade et al. [2010] used 6 learning algorithms namely
K-Nearest Neighbour, Neural networks, naive Bayes
Algorithm, SVM and other machine learning algorithms
to predict using first-year marks the completion of a
student’s degree and other contributing factors. They also
compared these learning algorithms and concluded that the
Naive Bayes algorithm performs better and its accuracy is
satisfactory. [Oyelade et al. 2010].
Cortez and Silva [2008] investigated students’
performance using machine learning algorithms including
SVM and NB. They stressed out the point that grades are a
key feature in predicting students’ performance. They came
to this conclusion when they employed the Decision tree
algorithm, students grades were the root node which clearly
emphasizes the significance of marks in predicting outcome
of a student. They also concluded that the NB algorithm
out performs the SVM algorithm.
A similar study was conducted by Butcher and Muth
[1985], they used American Collage Testing Program(ACT)
test scores along with performance in high school and
information regarding the students’ programs to predict
how students will perform will perform in an Computer
Science course and semester one’s collage average of their
grade points. They took the statistical analysis approach and
used the Statistical system (SAS) to perform all statistical
analyses. They concluded that semester one’s marks may
indeed be used to predict students’ performance.
Daud et al. [2017] investigated students’ performance
prediction methods. The prediction methods investigated
included the NB , SVM and Bayesian networks. These
were applied to student data and the performance of each
classifier was measured using the F-measure method. After
carrying out experiments, they came to the conclusion that
the SVM out performs the other classifiers. This is a clear
contradiction with what Cortez and Silva [2008] found.
In Cortez and Silva [2008], it was concluded that the NB
algorithm performs better than the SVM classifier.
Another study was done by Pojon [2017], they investigated
students’ performance using ML. They used various ML
techniques(Linear regression, Decision tress, naive Bayes’
classifier) and compared their performance. Feature
engineering methods were used to better the performance
of the prediction model. They found that the NB classifier
Bydžovská [2016] having the grade average of a student
as one of their data attributes compared ML techniques
for predicting student performance. They employed SVM ,
Random forests, Naive Bayes and decision trees to predict
the students’ performance. Using the F-measure method
to measure the performance of each algorithm, the SVM
was ranked as number one, out performing all the other
algorithms. These results are in line with what Daud et al.
[2017] found.
3. M ETHODOLOGY
In this study we attempt to predict the completion of
a student’s Computer Science degree using first-year
marks. In simpler terms, we are trying to answer the
following question, Will a student in question complete their
computer science degree? The are three possible outcomes:
“Completed” , the student is expected to complete their
degree in minimum time (3 years); ”Failed”, the student
is expected not to complete their degree; and “Delayed”,
the students successfully completes their degree but not
in minimum time. Several machine learning classification
models to deduce a learner into one of three categories
Completed, Delayed or Failed will be employed.
The best performing algorithm will be used to create
an application that will be used to predict the outcome
of a student. To gauge the performance of the trained
classification models, confusion matrices and AUROC
curves will be used.
This section is structured as follows: A brief description
of the data collection procedure which includes information
concerning the ethics clearance certificate details is given
firstly. Secondly, preprocessing steps taken to prepare the
data for this research objective are outlined. Thirdly, a brief
descriptions of each machine learning classifier to be used
is given . Fourthly, the feature analysis process is presented
and finally brief descriptions of the evaluation metrics used
to gauge the performance of the classifiers.
Fig. 3: Proposed Methodology of Classification Model
A. Data collection and Ethics
The data utilized in this study was obtained from a South
African University. The data consists of biographical, student
marks from first year to third , all postgraduate marks and
enrollment observations of students from the Faculty of
Science doing Mathematical Science Degrees. The study
participants are students who studied anytime between the
years 2008 to 2017 at a research-concentrated South African
university. This study was approved by the Human Research
Ethics Committee of the University. The committee also
TABLE I: Systematic Literature Review
Study
Study Purpose
Attributes used
Models used
Findings
Pavlou et al. (2007)
Trust
A buyer’s intentions to accept
vulnerability based on her beliefs
that transactions with a seller will
meet her...
Trust, web site
asvxngs
Pavlou et al. (2007)
Trust
A buyer’s intentions to accept
vulnerability based on her beliefs
that transactions with a seller will
meet her...
Trust, web site
asvxngs
Variables
Description
Type
Major One
Aggregate of final Calculus 1
and Algebra 1 marks
Nominal
Major Two
Aggregate of all Coms 1 marks
Nominal
Major Three
Computation and Applied
Mathematics orInformation
Nominal
Systems Final Marks
Marks of any Level 1 course
Elective
Nominal
(i.e. Physics)
Aggregate
Aggregate of all marks obtained
Nominal
Progress Outcome
Outcome of a student in their final year
Categorical
Variable encoding values
4 = 70% - 100%
3 = 60% - 69 %
2 = 50 % - 59%
1 = 0% - 49%
4 = 70% -100%
3 = 60% - 69%
2 = 50% - 59%
1 = 0% - 49%
4 = 70% - 100%
3 = 60% - 69%
2 = 50% - 59%
1 = 0% - 49%
4 = 70% - 100%
3 = 60% - 69%
2 = 50% - 59%
1 = 0% - 49%
4 = 70% - 100%
3 = 60% - 69%
2 = 50% - 59%
1 = 0% - 49%
Completed - Yes
Failed - No
Delayed - NRT
TABLE II: The Students data set description
tackled ethical issues of protecting the identity of the students
participating in the research and ensuring the security of data.
Data preparation
First year Computer Science students undertake three major
courses and one elective. The data taken from the AISU ,
contains the marks obtained by each student during their first
year. Using this data, the features that will help us predict the
completion of the degree were engineered. These features are
MajorOne, MajorTwo, MajorThree, Elective, Aggregate and
ProgressOutcome. The first four attributes are the predictors
and Progress outcome is the target attribute. The target
attribute contains three classes, Completed , Delayed, and
Failed. Table IV shows the description of these attributes.
Data preprocessing
Some entries in the data contained missing values. This is
due to a number of reasons. The most common reason being
that some students did not enrol for some of the courses.
To combat this problem, entries with missing values where
removed. According to figure 4, there is a huge gap between
the number of students who are labeled as ’Failed” and those
those who are labelled ”Completed” or ”delayed.
The data set contains a total number of 624 instances.
The classes are imbalanced , this may lead to reduced
accuracy.To combat this, we undersampled the data using
the spreadsubsample filter on WEKA. The final data contains
393 instances with 131 in each class.
Attribute ranking
The goal of attribute ranking is to determine the predictive
power of each variable in our feature set. To achieve this, the
Information Gain Ranking algorithm will be used. The IGR
algorithm calculates the information gain for each feature
Major One
3
2
3
4
3
1
Major two
3
1
3
2
4
1
Major three
4
3
4
3
2
1
Elective
3
2
4
2
2
2
Aggregate
3
2
3
3
4
1
Outcome
Completed
Failed
Completed
Delayed
Delayed
Failed
TABLE III: The resulting data set after preprocessing
Completed
Delayed
21%
24%
55%
Failed
Fig. 4: The percentage of Delayed, Completed and Failed instances
contained in the data set
with respect to target feature. Information gain measures how
much information a feature gives us about a class. The values
of Information gain (entropy) are between 0 and 1, that is,
the minimum entropy is 0 and the maximum entropy is 1.
Classification Algorithms
In this study, the machine learning algorithms employed for
the classification process were K-Star, Naive Bayes, SVM,
Decision tree, Logistic Regression, and the Multi-layer
Perceptron.
k-Star:.The K* instance-based classifier uses an
entropy-based distance function to classify test instances
using the training instance most similar to them. The K*
implementation used in this paper closely followed the
implementation by Clearly and Trigg (1995). Using an
entropy-based distance function allows consistency in the
classification of real-valued and symbolic features found in
our experiments.
Naive Bayes: For prediction problems, this is the most
used algorithm [Pojon 2017]. It is beloved for its pragmatic
approach to machine learning problems. It uses Bayes’
theorem to classify instances to one or a number of
independent classes using probabilistic approach [Koller
et al. 2009]. It is the easiest learning algorithm to implement
[Pojon 2017]. It attempts to find (provided that all the
features are conditionally independent given the class label
of each instance) the likelihood of features occurring in
each class and take the class with the largest posterior
probability as our predicted class. The main assumption
is that all features are conditionally independent given the
class label of each instance.
SVM: Support Vector Machines (SVM) is supervised ML
algorithm which was initially developed for binary class
classification cases [Madzarov et al. 2009]. However, it
be extended to classification cases with more than two
classes by breaking them to classification cases with only
two classes [Madzarov et al. 2009]. The idea is to find
a multi-dimensional hyperplane that will best divide the
dataset into two classes. Test instances are then mapped on
the same space and predicted based on which side of the
hyperplane they fall on. SVMs can be scaled for nonlinear
and high-dimensional classification problems by employing
the kernel trick and one to many partitioning.
J48: The J48 algorithm , a successor of ID3 was developed
bY Ross Quinlan and is implemented in WEKA using Java
[Bashir and Chachoo 2017]. The decision tree’s greedy
top-down approach is adopted by this algorithm. It is
used for classification in which new instance is labelled
according to the training data.In this paper, we will use this
algorithm to classify a student into one of the 3 categories
( completed, not completed, NRT).
Logistic Regression: The Linear Logistic Regression model
predicts probabilities directly by using the Logit transform.
The implementation in this paper follows Sumner et al.
[2005].
Multi-layer Perceptron:
B. Model Evaluation
To gauge the performance of the trained classification
models, confusion matrices and Receiver Operating
Characteristic curves will be used. From the confusion
we will derive performance measure metrics from like
Precision, Recall and Accuracy and from the ROC curve
we will derive the AUC (area under the ROC curve metric).
From the confusion matrix depicted in Fig 5, True Positive
(TP) gives the proportion of positive entries that are
accurately identified, False Positive (FP) is the proportion
of positive entries that are classified as negative, False
Negative (FN) is the fraction of negative entries that are
classified as positive and True Negative (TN) is the number
of negative entries correctly classified.
A. Feature Ranking
Predicted Values
True
Positive
False
Negative
False
Positive
True
Negative
One important part of the study is feature selection,
which is used to rank the predictor variables according
to the strength of their relationship with dependent or
outcome variable, which is completion outcome in our case.
Information gain can be used to rank the predictor variables
according to the strength of their relationship with the
outcome variable. To rank the predictor variables in terms of
their predictive power, the IGR algorithm was used. Table 3
depicts the ranking of the features using information gain.
Column 1 indicates the rank of each feature , column 2
indicates the feature names and column 3 indicates the
entropy (information gain).
The attribute ranking with respect to the class label
actual
values
Fig. 5: Confusion Matrix
Featur
Accuracy which is the proportion of correct predictions is
calculated as follows:
TN + TP
Accuracy =
(1)
TP + TN + FP + FN
The higher the accuracy, the better the model’s performance.
Precision which is the capacity of a model not to classify a
positive instance as negative. It is calculated as follows :
(2)
The higher the precision, the better the model’s performance.
Recall which is ability of a classifier to find all positive
instances is calculated as follows:
TP
Recall =
(3)
TP + FN
The higher the Recall, the better the model’s performance.
Receiver Operating Characteristic (ROC) is a curve with true
positive rate is on the Y axis and false positive rate on
the X axis that visualizes the tradeoff between the model’s
sensitivity and specificity. Using this curve we can calculate
the AUC (Area under the ROC curve) metric. The AUC of a
model tells us about the classification model’s discriminatory
capabilities, that is, the ability of a model to discriminate
between classes. The higher the AUC of a model, the better
the performance of the model.
C. The Software Application Methodology
4. R ESULTS AND D ISCUSSION
In this study, Six classification models were employed with
the purpose of predicting the completion of a learner’s
undergraduate degree based on their first-year marks. This
section presents the results obtained when these classification
models were evaluated using Accuracy, Precision, Recall,
AUC and Execution time.
Rank
Aggregate
0.308
1
Elective
0.104
5
MajorOne
0.294
2
MajorTwo
0.189
3
MajorThree
0.186
4
TABLE IV: Feature Ranking
using gain ratio criteria shows that students with very
good background knowledge in MAT 103, MAT 102,
PHY 101, PHY 102 and MAT 101 will perform very
well in computer programming courses. These courses are
calculation-intensive and they require very sharp brains that
can think very fast. Computer programming is one of the
courses that involve developing resourceful algorithms and
the ability to transform these algorithms to efficient working
software. Sometime, these algorithms are mathematically
based. The results of this work validate this fact
Rank vs Entropy
Entropy
0.35
0.3
Entropy
TP
P recision =
TP + FP
Entropy
0.25
0.2
0.15
0.1
5 · 10−2
1
2
3
4
5
Feature Rank
The attribute ranking with respect to the class label
using gain ratio criteria shows that students with very
good background knowledge in MAT 103, MAT 102,
PHY 101, PHY 102 and MAT 101 will perform very
well in computer programming courses. These courses are
calculation-intensive and they require very sharp brains that
can think very fast. Computer programming is one of the
courses that involve developing resourceful algorithms and
the ability to transform these algorithms to efficient working
software. Sometime, these algorithms are mathematically
based. The results of this work validate this fact.
Classification Model Evaluation
Different classifiers whose results vary depending on
efficiency were employed. In this paper the following
algorithms were employed: Naive Bayes, SVMs, Decision
trees, K* , Multilayer- perceptron and Logistic Regression.
To gauge the performance of these classifiers, we used
confusion matrices and ROC curves. The confusion matrix
of each model is depicted in Fig 6.In addition, table 4 shows
the execution time (in seconds) used by each classifier when
building its model for the training data. J48 tree used the
shortest time (0.02 seconds) for classification while CART
and BF Tree used the same time (0.22 seconds) for their
classification. Therefore, J48 tree also has the least execution
time among the three algorithms under investigation.
Comparison was also made based on ROC Curves and
AUROC. From Figure 5d, it is indicated that the J48 model
is the best performer in terms of the area under the ROC
curve (AUROC) It achieved an AUROC value of 0.8645
using 10-fold cross validation. With the exception of the K*
and naive Bayes, compared to the other four classification
models employed in this paper the J48 took the least time
to build.
Figure 5a illustrates the ROC curve for the SVM
model which achieves an AUROC of 0.754 using
10-fold cross validation. Furthermore, With the exception
of the Multi-layer perceptron, compared to other four
classifications, the SVM classification model took the longest
time to build.
Figure 5b illustrates the ROC curve for the naive Bayes
model which achieves an AUROC of 0.783 using 10-fold
cross validation.With the exception of the K* classification
model, from the other five models employed in this paper
Naive Bayes took the least time to build. Figure 5c
illustrates the ROC curve for the Multi-layer perceptron
model which achieves an AUROC of 0.7867 using 10-fold
cross validation. The Multi-layer perceptron model took the
longest time to build as compared to any model employed
in this study.
Figure 5e illustrates the ROC curve for the K ∗ model
which achieves an AUROC of 0.7852 using 10-fold cross
validation. The k ∗ model took the least time to build as
compared to any model employed in this study.
Figure 5f illustrates the ROC curve for the Logistic
Regression model which achieves an AUROC of 0.7764
using 10-fold cross validation. With the exception of the
SVM and the Multi-layer perceptron classification models,
the Logistic Regression Model took the longest time to build.
Algorithm
Naive Bayes
SVM
J48
Logistic Regresion
Multi-layer Perceptron
K-Star
Accuracy (%)
52.013
58.015
70.125
60.560
54.199
57.506
Precision
0.571
0.581
0.690
0.605
0.548
0.548
Recall
0.583
0.580
0.692
0.606
0.542
0.542
Execution time(sec)
0.10
0.40
0.09
0.20
0.56
0.01
TABLE V
(a) A confusion matrix describing the performance of the Naive
Bayes predictive model. The NB model achieves an accuracy of
52%, a precision value of 0.58 and a recall value of 0.58. Out of
the 393 instances, 225 were correctly classified and only 168 were
classified incorrectly.
Actual
Completed
Failed
Delayed
74
4
40
9
48
102 25
29 62
(b) A confusion matrix describing the performance of the Logistic Regression
predictive model. The NB model achieves an accuracy of 52%, a precision
value of 0.58 and a recall value of 0.58. Out of the 393 instances, 225 were
correctly classified and only 168 were classified incorrectly.
(c) A confusion matrix describing the performance of the SVM
predictive model. The SVM model achieves an accuracy of 58%,
a precision value of 0.60 and a recall value of 0.60. Out of the
393 instances, 228 were correctly classified and only 165 were
classified incorrectly
Completed
Failed
Delayed
Delayed
47
29
55
Failed
7
96
26
Completed
Delayed
77
6
50
Actual
Failed
Predicted
Completed
Actual
Predicted
Completed
Failed
Delayed
70
8
43
9
94
26
52
29
62
(d) A confusion matrix describing the performance of the K-star predictive
model. The K-star model achieves an accuracy of 52%, a precision value of
0.58 and a recall value of 0.58. Out of the 393 instances, 225 were correctly
classified and only 168 were classified incorrectly.
(e) A confusion matrix describing the performance of the J48
predictive model. The SVM model achieves an accuracy of 70%,
a precision value of 0.69 and a recall value of 0.69. Out of the
393 instances, 272 were correctly classified and only 121 were
classified incorrectly
Delayed
Actual
Completed
Failed
Delayed
Failed
7
32
104 18
25 76
Completed
92
9
30
Delayed
Failed
Predicted
Completed
Actual
Predicted
Completed
Failed
Delayed
Delayed
Failed
13 59
102 80
33 64
Completed
59
1
34
Delayed
Failed
Completed
Failed
Delayed
Predicted
Completed
Actual
Predicted
69
18
56
8
91
22
54
22
53
(f) A confusion matrix describing the performance of the Multilayer
Perceptron predictive model. The Multilayer perceptron model achieves an
accuracy of 52%, a precision value of 0.58 and a recall value of 0.58. Out of
the 393 instances, 225 were correctly classified and only 168 were classified
incorrectly.
Fig. 6: A set of confusion matrices describing the performance of several predictive models on a set of test data. Each predictive model’s
accuracy and indicated along with the correctly and incorrectly classified instances
R EFERENCES
Ritesh Ajoodha and Ashwini Jadhav. Identifying at-risk
undergraduate students using biographical and enrollment
observations for mathematical science degrees at a south
african university. Private Communication, 1(1):1–21, nov
2017.
Othman Aljohani. A comprehensive review of the major
studies and theoretical models of student retention in
higher education. Higher education studies, 6(2):1–18,
2016.
Uzair Bashir and Manzoor Chachoo. Performance evaluation
of j48 and bayes algorithms for intrusion detection system.
(a) ROC curve showing the tradeoff between sensitivity and the
specificity of the K-star predictive classification model. The AUC
of the K-star is 0.69
(b) ROC curve showing the tradeoff between sensitivity and the specificity
of the SVM predictive classification model. The AUC of the SVM predictive
classification model is 0.72
(c) ROC curve showing the tradeoff between sensitivity and the
specificity of the MLP predictive classification model. The AUC
of the MLP predictive classification model is 0.72
(d) ROC curve showing the tradeoff between sensitivity and the specificity
of the Logistic Regression predictive classification model. The AUC of the
Logistic Regression predictive classification model is 0.71
(e) ROC curve showing the tradeoff between sensitivity and the
specificity of the Naive Bayes predictive classification
model. The AUC of the Naive Bayes predictive
classification model is 0.68
(f) ROC curve showing the tradeoff between sensitivity and the specificity of
the J48 predictive classification model. The AUC of the J48 predictive
classification model is 0.75
Fig. 7: Figures illustrating the ROC curves of the predictive classification models. AUC values of each class and the average AUC value
of the predictive classification models are given.
Int. J. Netw. Secur. Its Appl, 2017.
DF Butcher and WA Muth. Predicting performance in an
introductory computer science course. Communications
of the ACM, 28(3):263–268, 1985.
Hana Bydžovská. A comparative analysis of techniques for
predicting student performance. International Educational
Data Mining Society, 2016.
Paulo Cortez and Alice Maria Gonçalves Silva. Using data
mining to predict secondary school student performance.
2008.
Ali Daud, Naif Radi Aljohani, Rabeeh Ayaz Abbasi,
Miltiadis D Lytras, Farhat Abbas, and Jalal S Alowibdi.
Predicting student performance using advanced learning
analytics.
In Proceedings of the 26th International
Conference on World Wide Web Companion, pages
415–421. International World Wide Web Conferences
Steering Committee, 2017.
Daphne Koller, Nir Friedman, and Francis Bach.
Probabilistic graphical models: principles and techniques.
2009.
OJ Oyelade, OO Oladipupo, and IC Obagbuwa. Application
of k means clustering algorithm for prediction of students
academic performance. arXiv preprint arXiv:1002.2425,
2010.
Murat Pojon. Using machine learning to predict student
performance. Master’s thesis, 2017.
Ian Scott, Nanette Yeld, and Jane Hendry. Higher education
monitor: A case for improving teaching and learning in
South African higher education, volume 6. Council on
Higher Education Pretoria, 2007.
William G Spady. Dropouts from higher education: An
interdisciplinary review and synthesis. Interchange, 1(1):
64–85, 1970.
Marc Sumner, Eibe Frank, and Mark Hall. Speeding up
logistic model tree induction. In European conference on
principles of data mining and knowledge discovery, pages
675–683. Springer, 2005.
Surjeet Kumar Yadav, Brijesh Bharadwaj, and Saurabh Pal.
Mining education data to predict student’s retention: a
comparative study. arXiv preprint arXiv:1203.2987, 2012.
5. ACKNOWLEDGEMENTS
This research was funded by EPSRC grant EP/N035437/1.
Download