International Journal of Engineering Trends and Technology (IJETT) – Volume 26 Number 4- August 2015
Machine Learning Based Student Model of an Intelligent
Tutoring System
#1Achi I. I., #2Prof. Inyiama H. C., #3Prof. Bakpo F. F., #4Agwu C. O.
1PhD Student, 2Professor, 3Professor, 4Lecturer
Department of Computer Science, University of Nigeria - Nsukka, Nigeria.
Department of Electronic and Computer Engineering, Nnamdi Azikiwe University – Awka, Nigeria.
Department of Computer Science, University of Nigeria - Nsukka, Nigeria.
Department of Computer Science, Ebonyi State University – Abakaliki, Nigeria.
Abstract- This paper focuses on the design and
construction of a Computer Based Learning
System (CBLS) that can learn from students by
way of modeling, advise the tutor module on the best
way a student could learn, and store that best way of
learning for each student for future use. The
need to understand and predict the student in a
Learning Environment (LE) so as to provide an
accurate guide on their learning procedure has been
a major concern. As such, modeling students has
been the subject of much research over the years,
and many approaches have been used in the past with
some success. This paper explores how
Supervised Machine Learning could use the path to
student learning as an input to modeling students’
comprehension ability as an output in a learning
environment, and advise on the best way a student
could learn. In this paper, learning styles are used
as the path to students’ knowledge. Four learning styles
are considered in the course of this work:
Auditory Learners (By Hearing), Visual Learners
(By Sight), Kinesthetic Learners (Through Touch)
and Hybrid (a combination of two or more).
Keywords- Computer Based Learning System
(CBLS), Learning Environment (LE), Intelligent
Tutoring System (ITS), Machine Learning (ML),
Inductive Machine Learning (IML)
I. Introduction
Machine learning is a subfield of computer science
[1] that evolved from the study of pattern recognition
and computational learning theory in artificial
intelligence.[1] Machine learning explores the
construction and study of algorithms that can learn
from and make predictions on data.[2] Such
algorithms operate by building a model from example
inputs in order to make data-driven predictions or
decisions,[3] rather than following strictly static
program instructions. Machine learning is closely
related to and often overlaps with computational
statistics, a discipline that also specializes in
prediction-making. It has strong ties to mathematical
ISSN: 2231-5381
optimization, which delivers methods, theory and
application domains to the field. Machine learning is
employed in a range of computing tasks where
designing and programming explicit algorithms is
infeasible. Examples of applications where machine
learning is deployed include spam filtering, optical
character recognition (OCR),[4] search engines and
computer vision. Machine learning is sometimes
conflated with data mining,[5] although that focuses
more on exploratory data analysis.[6] Machine
learning and pattern recognition "can be viewed as
two facets of the same field".[3] When employed in
industrial contexts, machine learning methods may be
referred to as predictive analytics or predictive
modelling.
II. Types of Problems and Tasks
Machine learning tasks are typically classified into
three broad categories, depending on the nature of the
learning "signal" or "feedback" available to a learning
system. These are:[10],[7]
a. Supervised learning: The computer is presented
with example inputs and their desired outputs,
given by a "teacher", and the goal is to learn a
general rule that maps inputs to outputs. In the
case of this paper, the input is made up of the
four learning styles, which are provided by the
teacher for modeling the student's
comprehension of a subject matter. The student
comprehension being the output, goal of the
b. Unsupervised learning: No labels are given to
the learning algorithm, leaving it on its own to
find structure in its input. Unsupervised
learning can be a goal in itself (discovering
hidden patterns in data) or a means towards an end.
c. Reinforcement learning: A computer program
interacts with a dynamic environment in which
it must perform a certain goal (such as driving a
vehicle), without a teacher explicitly telling it
whether it has come close to its goal or not.
Another example is learning to play a game by
playing against an opponent.[3]
Between supervised and unsupervised learning is
semi-supervised learning, where the teacher gives an
incomplete training signal: a training set with some
(often many) of the target outputs missing.
Transduction is a special case of this principle where
the entire set of problem instances is known at
learning time, except that part of the targets is
missing.
Figure: A support vector machine is a classifier that
divides its input space into two regions, separated by
a linear boundary. Here, it has learned to distinguish
black and white circles.
Among other categories of machine learning
problems, learning to learn learns its own inductive
bias based on previous experience. Developmental
learning, elaborated for robot learning, generates its
own sequences (also called curriculum) of learning
situations to cumulatively acquire repertoires of
novel skills through autonomous self-exploration and
social interaction with human teachers, and using
guidance mechanisms such as active learning,
maturation, motor synergies, and imitation. Another
categorization of machine learning tasks arises when
one considers the desired output of a machine-learned
system:[3] in classification, inputs are divided into
two or more classes, and the learner must produce a
model that assigns unseen inputs to one or more of these
classes (the latter being multi-label classification). This is
typically tackled in a supervised way. Spam filtering
is an example of classification, where the inputs are
email (or other) messages and the classes are "spam"
and "not spam". In regression, also a supervised
problem, the outputs are continuous rather than
discrete. In clustering, a set of inputs is to be divided
into groups. Unlike in classification, the groups are
not known beforehand, making this typically an
unsupervised task. Density estimation finds the
distribution of inputs in some space. Dimensionality
reduction simplifies inputs by mapping them into a
lower-dimensional space. Topic modeling is a related
problem, where a program is given a list of human
language documents and is tasked to find out which
documents cover similar topics.
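As a minimal sketch of the classification task described above, the following Python example assigns an unseen input to the class of its nearest labelled example. The 1-nearest-neighbour rule is chosen here only for brevity and is not a method named in this paper; the feature values, the class labels and the function name classify are illustrative assumptions.

```python
import math

# Hypothetical training set: each input is (audio, visual, hands-on) minutes
# spent per session; each label is the class assigned by the "teacher".
TRAINING = [
    ((40.0, 10.0, 5.0), "auditory"),
    ((35.0, 15.0, 10.0), "auditory"),
    ((10.0, 45.0, 5.0), "visual"),
    ((5.0, 40.0, 15.0), "visual"),
    ((5.0, 10.0, 50.0), "kinesthetic"),
    ((10.0, 5.0, 45.0), "kinesthetic"),
]


def classify(x, training=TRAINING):
    """Return the label of the training input closest to x (1-nearest neighbour)."""
    _, nearest_label = min(training, key=lambda pair: math.dist(pair[0], x))
    return nearest_label


print(classify((38.0, 12.0, 6.0)))  # an unseen input close to the auditory examples
print(classify((8.0, 9.0, 48.0)))   # an unseen input close to the kinesthetic examples
```

Because the labels are supplied with the training inputs, this is a supervised task; a regression variant would return a continuous value instead of a class name.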
III. Supervised Machine Learning Technique in ITS
This technique is about identifying both the input and
the output of a problem domain and formulating
rules on the input side that will produce
the output. In this paper, the goal, which is the
output, is the student's comprehension of a
subject matter in a typical learning environment, and
the inputs are the four different
learning styles. Every individual has a
learning style that is most suitable for him or her.
Therefore, this paper is about trying the four
learning styles to determine the one most suitable
for the student. The
diagram below describes the entire process.
Figure 1: Supervised Learning Technique
A, B, C and D represent the four learning styles, which
are provided as input, and E is the output, which is the
target goal. The goal here is the comprehension of the
student being modeled. A set of rules will be
formulated for the input that will help choose the
learning style that is most appropriate for the student
being modeled.
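The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not specify its rules, so the rule used here (pick the style with the highest mean comprehension score), the 0-100 quiz scores, the student identifier and the names best_learning_style and student_model are all hypothetical.

```python
from statistics import mean

# The four inputs (A, B, C, D in Figure 1).
STYLES = ("auditory", "visual", "kinesthetic", "hybrid")


def best_learning_style(trial_scores):
    """Pick the style with the highest mean comprehension score.

    trial_scores maps a style name to a list of scores (hypothetical 0-100
    quiz results) recorded while the student was taught in that style.
    """
    averages = {
        style: mean(scores)
        for style, scores in trial_scores.items()
        if style in STYLES and scores
    }
    if not averages:
        raise ValueError("no trial scores recorded for any known style")
    return max(averages, key=averages.get)


# Store the result per student so that the tutor module can reuse it later.
student_model = {}
scores = {
    "auditory": [55, 60],
    "visual": [80, 85],
    "kinesthetic": [70, 65],
    "hybrid": [75, 78],
}
student_model["student-001"] = best_learning_style(scores)
print(student_model)  # {'student-001': 'visual'}
```

A deployed system would presumably replace the single averaging rule with the learned rule set, but the inputs (scores per style) and the output (the stored best style) would keep the same shape.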
IV. Supervised Learning Algorithms
Inductive machine learning is the process of learning
a set of rules from instances (examples in a training
set), or more generally speaking, creating a classifier
that can be used to generalize to new instances. The
process of applying supervised ML to a real-world
problem is described in Figure 2.
Figure 2. The process of supervised ML
The first step is collecting the dataset. If a requisite
expert is available, then s/he could suggest which
fields (attributes, features) are the most informative.
If not, then the simplest method is that of
“brute-force,” which means measuring everything available
in the hope that the right (informative, relevant)
features can be isolated. However, a dataset collected
by the “brute-force” method is not directly suitable
for induction. In most cases it contains noise and
missing feature values, and therefore requires
significant pre-processing [8]. The second step is the
data preparation and data pre-processing. Depending
on the circumstances, researchers have a number of
methods to choose from to handle missing data [9].
Hodge and Austin (2004) [10] introduced a survey of
contemporary techniques for outlier (noise) detection.
These researchers identified the techniques' advantages
and disadvantages. Instance selection is used not only to
handle noise but also to cope with the infeasibility of
learning from very large datasets. Instance selection
in these datasets is an optimization problem that
attempts to maintain the mining quality while
minimizing the sample size[11]. It reduces data and
enables a data mining algorithm to function and work
effectively with very large datasets. There is a variety
of procedures for sampling instances from a large
dataset[12]. Feature subset selection is the process of
identifying and removing as many irrelevant and
redundant features as possible[12]. This reduces the
dimensionality of the data and enables data mining
algorithms to operate faster and more effectively. The
fact that many features depend on one another often
unduly influences the accuracy of supervised ML
classification models. This problem can be addressed
by constructing new features from the basic feature
set [13]. This technique is called feature
construction/transformation. These newly generated
features may lead to the creation of more concise and
accurate classifiers. In addition, the discovery of
meaningful features contributes to better
comprehensibility of the produced classifier, and a
better understanding of the learned concept.
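Two of the pre-processing steps described above, handling missing feature values and feature subset selection, can be sketched as follows. The concrete choices are assumptions made for illustration and are not taken from the cited surveys: missing values are filled with the column mean, and a feature is dropped when it is constant or almost perfectly correlated with a feature already kept. The function names and the toy dataset are hypothetical.

```python
from statistics import mean, pstdev


def impute_missing(rows):
    """Replace None in each column with that column's mean (one simple option)."""
    columns = list(zip(*rows))
    fills = [mean(v for v in col if v is not None) for col in columns]
    return [
        [fills[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


def pearson(xs, ys):
    """Pearson correlation of two equal-length columns with non-zero spread."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))


def select_features(rows, threshold=0.95):
    """Return indices of columns kept after dropping constant and redundant ones."""
    columns = list(zip(*rows))
    kept = []
    for j, col in enumerate(columns):
        if pstdev(col) == 0:
            continue  # constant column: carries no information
        if any(abs(pearson(col, columns[k])) > threshold for k in kept):
            continue  # nearly a copy of a column that is already kept
        kept.append(j)
    return kept


# Toy dataset: column 1 is twice column 0, column 2 is constant,
# and column 3 has one missing value.
rows = [
    [1.0, 2.0, 5.0, 7.0],
    [2.0, 4.0, 5.0, None],
    [3.0, 6.0, 5.0, 1.0],
    [4.0, 8.0, 5.0, 4.0],
]
cleaned = impute_missing(rows)
print(cleaned[1][3])             # 4.0, the mean of 7.0, 1.0 and 4.0
print(select_features(cleaned))  # [0, 3]
```

Mean imputation and a correlation filter are among the simplest options; the sketch assumes every column has at least one observed value.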
V. Algorithm Selection
The choice of which specific learning algorithm we
should use is a critical step. Once preliminary testing
is judged to be satisfactory, the classifier (mapping
from unlabeled instances to classes) is available for
routine use. The classifier’s evaluation is most often
based on prediction accuracy (the number of
correct predictions divided by the total number of
predictions). There are at least three techniques
which are used to calculate a classifier’s accuracy.
One technique is to split the training set by using
two-thirds for training and the other third for
estimating performance. In another technique, known
as cross-validation, the training set is divided into
mutually exclusive and equal-sized subsets and for
each subset the classifier is trained on the union of all
the other subsets. The average of the error rates of
the subsets is therefore an estimate of the error rate
of the classifier. Leave-one-out validation is a special
case of cross-validation in which all test subsets consist of a
single instance. This type of validation is, of course,
more expensive computationally, but useful when the
most accurate estimate of a classifier’s error rate is
required. If the error rate evaluation is unsatisfactory,
we must return to a previous stage of the supervised
ML process (as detailed in Figure 2). A variety of
factors must be examined: perhaps relevant features
for the problem are not being used, a larger training
set is needed, the dimensionality of the problem is
too high, the selected algorithm is inappropriate or
parameter tuning is needed. Another problem could
be that the dataset is imbalanced [14]. A common
method for comparing supervised ML algorithms is
to perform statistical comparisons of the accuracies
of trained classifiers on specific datasets. If we have
a sufficient supply of data, we can sample a number of
training sets of size N, run the two learning
algorithms on each of them, and estimate the
difference in accuracy for each pair of classifiers on a
large test set. The average of these differences is an
estimate of the expected difference in generalization
error across all possible training sets of size N, and
their variance is an estimate of the variance of the
classifier in the total set. Our next step is to perform
a paired t-test to check the null hypothesis that the
mean difference between the classifiers is zero. This
test can produce two types of errors. Type I error is
the probability that the test rejects the null hypothesis
incorrectly (i.e. it finds a “significant” difference
although there is none). Type II error is the
probability that the null hypothesis is not rejected,
when there is actually a difference. The test’s Type I
error will be close to the chosen significance level. In
practice, however, we often have only one dataset of
size N and all estimates must be obtained from this
sole dataset. Different training sets are obtained by
sub-sampling, and the instances not sampled for
training are used for testing. Unfortunately this
violates the independence assumption necessary for
proper significance testing. The consequence of this
is that Type I errors exceed the significance level.
This is problematic because it is important for the
researcher to be able to control Type I errors and
know the probability of incorrectly rejecting the null
hypothesis. Several heuristic versions of the t-test
have been developed to alleviate this problem [15],
[16]. Ideally, we would like the test’s outcome to be
independent of the particular partitioning resulting
from the randomization process, because this would
make it much easier to replicate experimental results
published in the literature. However, in practice there
is always certain sensitivity to the partitioning used.
To measure replicability we need to repeat the same
test several times on the same data with different
random partitionings (usually ten repetitions) and count
how often the outcome is the same [17]. Supervised
classification is one of the tasks most frequently
carried out by so-called Intelligent Systems. Thus, a
large number of techniques have been developed
based on Artificial Intelligence (Logical/Symbolic
techniques), Perceptron-based techniques and
Statistics (Bayesian Networks, Instance-based
techniques). However, in our research we suggested
and deployed a supervised machine learning
technique, starting with logical/symbolic algorithms.
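The cross-validation estimate and the paired t-test discussed in this section can be sketched as follows. This is a minimal illustration, not the evaluation procedure used in this research: the two toy learners, the synthetic dataset and all function names are assumptions. As noted above, folds drawn from a single dataset are not independent, so the resulting statistic should be read as a heuristic.

```python
import math
import random
from statistics import mean, stdev


def k_fold_error_rates(data, train, k=5, seed=0):
    """Error rate on each of k folds, training on the union of the other folds."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # mutually exclusive subsets
    rates = []
    for i, test_fold in enumerate(folds):
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        predict = train(training)
        errors = sum(predict(x) != y for x, y in test_fold)
        rates.append(errors / len(test_fold))
    return rates


def paired_t_statistic(differences):
    """t statistic for the null hypothesis that the mean difference is zero."""
    n = len(differences)
    centre, spread = mean(differences), stdev(differences)
    if spread == 0:  # identical differences: t is zero or unbounded
        return 0.0 if centre == 0 else math.copysign(math.inf, centre)
    return centre / (spread / math.sqrt(n))


def train_majority(training):
    """Toy learner: always predict the most common training label."""
    labels = [y for _, y in training]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def train_threshold(training):
    """Toy learner: cut halfway between the two class means."""
    low = mean(x for x, y in training if y == 0)
    high = mean(x for x, y in training if y == 1)
    cut = (low + high) / 2
    return lambda x: 1 if x >= cut else 0


# Hypothetical data: a score x and a label saying whether x is at least 50.
data = [(x, int(x >= 50)) for x in range(100)]
majority_rates = k_fold_error_rates(data, train_majority)
threshold_rates = k_fold_error_rates(data, train_threshold)
# The same seed gives both learners the same folds, so the rates are paired.
differences = [a - b for a, b in zip(majority_rates, threshold_rates)]
print("estimated error rates:", mean(majority_rates), mean(threshold_rates))
print("paired t statistic:", paired_t_statistic(differences))
```

Setting k equal to the number of instances turns the same function into leave-one-out validation, at a correspondingly higher computational cost.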
VI. Conclusion
This paper has explained how supervised
machine learning can be used
to model students in a learning environment. This was
possible because the set of inputs has been defined
as well as the output. All that is required is to create a
set of rules that can be used to realize the goal, which
is comprehension. The importance of knowing the
actual learning style of a student cannot be
overemphasized, and using machine learning makes it
much easier to identify. The model, when deployed, can
serve as an alternative platform to students for
learning in our universities and tertiary institutions in
References
[1] William, L. H., (2015). Machine Learning, Encyclopedia
[2] Ron, K. and Foster, P., (1998). "Glossary of terms". Machine
Learning 30: 271–274.
[3] Bishop, C. M., (2006). Pattern Recognition and Machine
Learning. Springer. ISBN 0-387-31073-8.
[4] Wernick, M. N., Yang, Y., Brankov, J. G., Yourganov, G., and
Strother, S. C., (2010). Machine Learning in Medical Imaging,
IEEE Signal Processing Magazine, vol. 27, no. 4, pp. 25-38.
[5] Mannila, H., (1996). Data mining: machine learning, statistics,
and databases. Int'l Conf. Scientific and Statistical Database
Management. IEEE Computer Society.
[6] Friedman, J. H., (1998). "Data Mining and Statistics: What's
the connection?". Computing Science and Statistics 29 (1): 3–9.
[7] Russell, S., and Norvig, P., (2003). Artificial Intelligence: A
Modern Approach (2nd ed.). Prentice Hall. ISBN 9780137903955.
[8] Zhang, S., Zhang, C., and Yang, Q., (2002). Data Preparation
for Data Mining. Applied Artificial Intelligence, Volume 17, pp.
375 – 381
[9] Batista, G., and Monard, M.C., (2003), An Analysis of Four
Missing Data Treatment Methods for Supervised Learning,
Applied Artificial Intelligence, vol.17, pp.519-533.
[10] Hodge, V., and Austin, J., (2004). A Survey of Outlier
Detection Methodologies, Artificial Intelligence Review, Volume
22, Issue 2, pp. 85-126
[11] Liu, H., and Motoda, H., (2001). Instance Selection and
Constructive Data Mining, Kluwer, Boston.
[12] Reinartz, T., (2002). A Unifying View on Instance Selection,
Data Mining and Knowledge Discovery, 5, 191–210, Kluwer
Academic Publishers.
[13] Markovitch, S., and Rosenstein, D., (2002). Feature
Generation Using General Construction Functions, Machine
Learning 49: 59-98.
[14] Japkowicz, N., and Stephen, S., (2002). The Class Imbalance
Problem: A Systematic Study. Intelligent Data Analysis, Volume 6,
Number 4.
[15] Dietterich, T. G., (1998). Approximate Statistical Tests for
Comparing Supervised Classification Learning Algorithms. Neural
Computation, 10(7) 1895–1924.
[16] Nadeau, C., and Bengio, Y., (2003). Inference for the
generalization error. In Machine Learning 52:239–281.
[17] Bouckaert, R., (2003). Choosing between two learning
algorithms based on calibrated tests. Proc 20th Int. Conf on
Machine Learning, pp. 51-58. Morgan Kaufmann