Machine Learning Based Student Model of an Intelligent Tutoring System

1 Achi I. I., 2 Prof. Inyiama H. C., 3 Prof. Bakpo F. F., 4 Agwu C. O.
1 PhD Student, 2 Professor, 3 Professor, 4 Lecturer
1 Department of Computer Science, University of Nigeria, Nsukka, Nigeria
2 Department of Electronic and Computer Engineering, Nnamdi Azikiwe University, Awka, Nigeria
3 Department of Computer Science, University of Nigeria, Nsukka, Nigeria
4 Department of Computer Science, Ebonyi State University, Abakaliki, Nigeria

Abstract- This paper focuses on the design and construction of a Computer Based Learning System (CBLS) that can learn from students by way of modeling, advise the tutor module on the best way each student learns, and store that best way of learning for each student for future use. The need to understand and predict the student in a Learning Environment (LE), so as to provide accurate guidance on the learning procedure, has been a major concern; modeling students has therefore been the subject of much research over the years, and many approaches have been used in the past with some success. This paper explores how supervised machine learning can take the path to student learning as an input for modeling a student's comprehension ability, the output, in a learning environment, and advise on the best way the student can learn. In this paper, learning styles are used as the path to student knowledge. Four learning styles are considered in the course of this work: Auditory Learners (by hearing), Visual Learners (by sight), Kinesthetic Learners (through touch) and Hybrid Learners (a combination of two or more).

Keywords- Computer Based Learning System (CBLS), Learning Environment (LE), Intelligent Tutoring System (ITS), Machine Learning (ML), Inductive Machine Learning (IML)

I. Introduction

Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence [1]. Machine learning explores the construction and study of algorithms that can learn from and make predictions on data [2]. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions [3], rather than following strictly static program instructions. Machine learning is closely related to, and often overlaps with, computational statistics, a discipline that also specializes in prediction-making. It also has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms is infeasible. Examples of applications where machine learning is deployed include spam filtering, optical character recognition (OCR) [4], search engines and computer vision. Machine learning is sometimes conflated with data mining [5], although the latter focuses more on exploratory data analysis [6]. Machine learning and pattern recognition "can be viewed as two facets of the same field" [3]. When employed in industrial contexts, machine learning methods may be referred to as predictive analytics or predictive modeling.

II. Types of Problems and Tasks

Machine learning tasks are typically classified into three broad categories, depending on the nature of the learning "signal" or "feedback" available to a learning system.
These are [7], [10]:

a. Supervised learning: The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs. In the case of this paper, the input is made up of the four learning styles, which are provided by the teacher for modeling the student's comprehension of a subject matter; the student's comprehension is the output, the teacher's goal.

b. Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end.

c. Reinforcement learning: A computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle), without a teacher explicitly telling it whether it has come close to its goal. Another example is learning to play a game by playing against an opponent [3].

Between supervised and unsupervised learning lies semi-supervised learning, where the teacher gives an incomplete training signal: a training set with some (often many) of the target outputs missing. Transduction is a special case of this principle in which the entire set of problem instances is known at learning time, except that part of the targets is missing.

Among other categories of machine learning problems, learning to learn learns its own inductive bias based on previous experience. Developmental learning, elaborated for robot learning, generates its own sequences (also called curriculum) of learning situations to cumulatively acquire repertoires of novel skills through autonomous self-exploration and social interaction with human teachers, using guidance mechanisms such as active learning, maturation, motor synergies, and imitation.

Another categorization of machine learning tasks arises when one considers the desired output of a machine-learned system [3]. In classification, inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one of these classes (or, in multi-label classification, to more than one). This is typically tackled in a supervised way; a support vector machine, for instance, is a classifier that divides its input space into two regions separated by a linear boundary. Spam filtering is another example of classification, where the inputs are email (or other) messages and the classes are "spam" and "not spam". In regression, also a supervised problem, the outputs are continuous rather than discrete. In clustering, a set of inputs is to be divided into groups. Unlike in classification, the groups are not known beforehand, making this typically an unsupervised task. Density estimation finds the distribution of inputs in some space. Dimensionality reduction simplifies inputs by mapping them into a lower-dimensional space. Topic modeling is a related problem, where a program is given a list of human-language documents and is tasked to find out which documents cover similar topics.

III. Supervised Machine Learning Technique in ITS

This technique is about identifying both the input and the output of a problem domain and formulating rules over the input that will produce the output. In this paper the goal, which is the output, is the student's comprehension of a subject matter in a typical learning environment, and we already have the set of inputs: the four different learning styles. In any case, every individual has a learning style that is most suitable for him or her. The trial of the four learning styles to determine which one actually suits the student is therefore what this paper is all about. Figure 1 describes the entire process.

Figure 1: Supervised Learning Technique (inputs A, B, C and D feed the ITS, which produces output E)

A, B, C and D represent the four learning styles, which are provided as input, and E is the output, the target goal. The goal here is the comprehension of the student being modeled. A set of rules will be formulated over the input to help choose the learning style that is most appropriate for the student being modeled.
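Since this section stops short of specifying the rule set, the following is only a minimal sketch of how such rules might look, assuming comprehension is measured as a test score obtained after the material is delivered in each style; the Python function, the five-point hybrid margin and the sample scores are our own hypothetical illustrations, not this paper's implementation.

# Hypothetical rule set: trial the material under each style (A-D),
# record a comprehension score, and recommend the best-scoring style.
def recommend_style(trial_scores, hybrid_margin=5):
    """trial_scores: dict mapping a learning style to a comprehension score."""
    best = max(trial_scores, key=trial_scores.get)
    # Assumption: several styles scoring within `hybrid_margin` of the
    # best suggest a hybrid learner rather than a single dominant style.
    near_best = [s for s, v in trial_scores.items()
                 if trial_scores[best] - v <= hybrid_margin]
    return "hybrid" if len(near_best) > 1 else best

scores = {"auditory": 62, "visual": 88, "kinesthetic": 55, "hybrid": 70}
print(recommend_style(scores))   # E, the target output: "visual"

In the full system, the recommendation would be stored in the student model and passed to the tutor module, as the abstract describes.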
IV. Supervised learning algorithms

Inductive machine learning is the process of learning a set of rules from instances (examples in a training set) or, more generally speaking, creating a classifier that can be used to generalize from new instances. The process of applying supervised ML to a real-world problem is described in Figure 2.

Figure 2: The process of supervised ML

The first step is collecting the dataset. If a requisite expert is available, then s/he could suggest which fields (attributes, features) are the most informative. If not, then the simplest method is "brute force", which means measuring everything available in the hope that the right (informative, relevant) features can be isolated. However, a dataset collected by the brute-force method is not directly suitable for induction; in most cases it contains noise and missing feature values, and therefore requires significant pre-processing [8].

The second step is data preparation and pre-processing. Depending on the circumstances, researchers have a number of methods to choose from to handle missing data [9]. Hodge and Austin [10] have introduced a survey of contemporary techniques for outlier (noise) detection, identifying the techniques' advantages and disadvantages. Instance selection is used not only to handle noise but also to cope with the infeasibility of learning from very large datasets. Instance selection in these datasets is an optimization problem that attempts to maintain the mining quality while minimizing the sample size [11]; it reduces the data and enables a data mining algorithm to function effectively with very large datasets. There is a variety of procedures for sampling instances from a large dataset [12]. Feature subset selection is the process of identifying and removing as many irrelevant and redundant features as possible [12]. This reduces the dimensionality of the data and enables data mining algorithms to operate faster and more effectively. The fact that many features depend on one another often unduly influences the accuracy of supervised ML classification models. This problem can be addressed by constructing new features from the basic feature set [13], a technique called feature construction/transformation. The newly generated features may lead to the creation of more concise and accurate classifiers. In addition, the discovery of meaningful features contributes to better comprehensibility of the produced classifier, and a better understanding of the learned concept.
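As a hedged illustration of two of these pre-processing steps, the sketch below imputes missing feature values and then performs feature subset selection; the toy data and the choice of scikit-learn's SimpleImputer and SelectKBest are our own assumptions, standing in for the families of methods surveyed in [9] and [12].

# Pre-processing sketch: handle missing data, then select features.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, f_classif

# Toy dataset: four features per instance, np.nan marks missing values.
X = np.array([[85.0, 40.0, np.nan, 55.0],
              [35.0, 90.0, 45.0, 60.0],
              [30.0, np.nan, 88.0, 50.0],
              [70.0, 75.0, 72.0, 90.0]])
y = np.array([0, 0, 1, 1])                      # class labels

X_filled = SimpleImputer(strategy="mean").fit_transform(X)          # missing-data treatment
X_reduced = SelectKBest(f_classif, k=2).fit_transform(X_filled, y)  # feature subset selection
print(X_reduced.shape)   # (4, 2): dimensionality cut from 4 features to 2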
V. Algorithm selection

The choice of which specific learning algorithm to use is a critical step. Once preliminary testing is judged to be satisfactory, the classifier (a mapping from unlabeled instances to classes) is available for routine use. The classifier's evaluation is most often based on prediction accuracy (the number of correct predictions divided by the total number of predictions). There are at least three techniques for estimating a classifier's accuracy. One technique is to split the training set, using two-thirds for training and the other third for estimating performance. In another technique, known as cross-validation, the training set is divided into mutually exclusive and equal-sized subsets, and for each subset the classifier is trained on the union of all the other subsets; the average of the error rates on each subset is then an estimate of the error rate of the classifier. Leave-one-out validation is a special case of cross-validation in which every test subset consists of a single instance. This type of validation is, of course, more expensive computationally, but useful when the most accurate estimate of a classifier's error rate is required.

If the error-rate evaluation is unsatisfactory, we must return to a previous stage of the supervised ML process (as detailed in Figure 2). A variety of factors must be examined: perhaps relevant features for the problem are not being used, a larger training set is needed, the dimensionality of the problem is too high, the selected algorithm is inappropriate, or parameter tuning is needed. Another problem could be that the dataset is imbalanced [14].

A common method for comparing supervised ML algorithms is to perform statistical comparisons of the accuracies of trained classifiers on specific datasets. If we have a sufficient supply of data, we can sample a number of training sets of size N, run the two learning algorithms on each of them, and estimate the difference in accuracy for each pair of classifiers on a large test set. The average of these differences is an estimate of the expected difference in generalization error across all possible training sets of size N, and their variance is an estimate of the variance of the classifier over the total set. The next step is to perform a paired t-test to check the null hypothesis that the mean difference between the classifiers is zero. This test can produce two types of errors. A Type I error is the probability that the test rejects the null hypothesis incorrectly (i.e., it finds a "significant" difference although there is none). A Type II error is the probability that the null hypothesis is not rejected when there actually is a difference. The test's Type I error will be close to the chosen significance level.

In practice, however, we often have only one dataset of size N, and all estimates must be obtained from this sole dataset. Different training sets are obtained by sub-sampling, and the instances not sampled for training are used for testing. Unfortunately, this violates the independence assumption necessary for proper significance testing. The consequence is that Type I errors exceed the significance level. This is problematic because it is important for the researcher to be able to control Type I errors and know the probability of incorrectly rejecting the null hypothesis.
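To make the procedure concrete, here is a minimal sketch assuming an arbitrary public dataset (iris) and an arbitrary pair of algorithms (a decision tree versus naive Bayes); as the preceding paragraph warns, the overlapping training sets produced by sub-sampling mean this naive paired t-test will overstate significance.

# Naive comparison of two learners: paired t-test over accuracies from
# ten random sub-samplings of a single dataset (see caveat above).
from scipy.stats import ttest_rel
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
acc_a, acc_b = [], []
for seed in range(10):                               # ten random partitionings
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3,
                                              random_state=seed)
    acc_a.append(accuracy_score(y_te, DecisionTreeClassifier(random_state=0)
                                .fit(X_tr, y_tr).predict(X_te)))
    acc_b.append(accuracy_score(y_te, GaussianNB().fit(X_tr, y_tr).predict(X_te)))

stat, p = ttest_rel(acc_a, acc_b)          # H0: mean accuracy difference is zero
print("t = %.3f, p = %.3f" % (stat, p))    # small p rejects H0 (Type I risk applies)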
Several heuristic versions of the t-test have been developed to alleviate this problem [15], [16]. Ideally, we would like the test's outcome to be independent of the particular partitioning produced by the randomization process, because this would make it much easier to replicate experimental results published in the literature. In practice, however, there is always a certain sensitivity to the partitioning used. To measure replicability, we need to repeat the same test several times on the same data with different random partitionings (usually ten repetitions) and count how often the outcome is the same [17].
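A rough sketch of that replicability measure, under the same illustrative assumptions as the previous snippet (iris data, decision tree versus naive Bayes, a 0.05 significance level), might look as follows.

# Replicability: repeat the same cross-validated t-test ten times with
# different random partitionings and count how often the outcome agrees.
from scipy.stats import ttest_rel
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rejections = 0
for seed in range(10):                               # ten repetitions, as in [17]
    cv = KFold(n_splits=10, shuffle=True, random_state=seed)
    acc_a = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
    acc_b = cross_val_score(GaussianNB(), X, y, cv=cv)
    if ttest_rel(acc_a, acc_b).pvalue < 0.05:        # same test, new partitioning
        rejections += 1

# Fully replicable outcomes would give 0 or 10 rejections out of 10 runs.
print("null hypothesis rejected in %d of 10 repetitions" % rejections)

An outcome such as six rejections in ten runs would indicate poor replicability under this measure.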
Supervised classification is one of the tasks most frequently carried out by so-called intelligent systems. Thus, a large number of techniques have been developed, based on artificial intelligence (logical/symbolic techniques), perceptron-based techniques and statistics (Bayesian networks, instance-based techniques). In our research, however, we suggest and deploy a supervised machine learning technique, starting with logical/symbolic algorithms.

VI. Conclusion

This paper has explained in detail how the supervised branch of machine learning can be used to model a student in a learning environment. This is possible because the set of inputs has been defined, as well as the output; all that remains is to create the set of rules that realizes the goal, which is comprehension. The importance of knowing a student's actual learning style cannot be overemphasized, and machine learning makes it much easier to discover. The model, when deployed, can serve as an alternative learning platform for students in universities and other tertiary institutions in Nigeria.

REFERENCES
[1] William, L. H., (2015). Machine Learning. Encyclopedia Britannica. http://www.britannica.com/EBchecked/topic/1116194/machine-learning
[2] Kohavi, R., and Provost, F., (1998). Glossary of Terms. Machine Learning, 30, 271–274.
[3] Bishop, C. M., (2006). Pattern Recognition and Machine Learning. Springer. ISBN 0-387-31073-8.
[4] Wernick, M. N., Yang, Y., Brankov, J. G., Yourganov, G., and Strother, S. C., (2010). Machine Learning in Medical Imaging. IEEE Signal Processing Magazine, 27(4), 25–38.
[5] Mannila, H., (1996). Data Mining: Machine Learning, Statistics, and Databases. Int'l Conf. on Scientific and Statistical Database Management. IEEE Computer Society.
[6] Friedman, J. H., (1998). Data Mining and Statistics: What's the Connection? Computing Science and Statistics, 29(1), 3–9.
[7] Russell, S., and Norvig, P., (2003). Artificial Intelligence: A Modern Approach (2nd ed.). Prentice Hall. ISBN 978-0137903955.
[8] Zhang, S., Zhang, C., and Yang, Q., (2002). Data Preparation for Data Mining. Applied Artificial Intelligence, 17, 375–381.
[9] Batista, G., and Monard, M. C., (2003). An Analysis of Four Missing Data Treatment Methods for Supervised Learning. Applied Artificial Intelligence, 17, 519–533.
[10] Hodge, V., and Austin, J., (2004). A Survey of Outlier Detection Methodologies. Artificial Intelligence Review, 22(2), 85–126.
[11] Liu, H., and Motoda, H., (2001). Instance Selection and Constructive Data Mining. Kluwer, Boston.
[12] Reinartz, T., (2002). A Unifying View on Instance Selection. Data Mining and Knowledge Discovery, 5, 191–210. Kluwer Academic Publishers.
[13] Markovitch, S., and Rosenstein, D., (2002). Feature Generation Using General Constructor Functions. Machine Learning, 49, 59–98.
[14] Japkowicz, N., and Stephen, S., (2002). The Class Imbalance Problem: A Systematic Study. Intelligent Data Analysis, 6(4).
[15] Dietterich, T. G., (1998). Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10(7), 1895–1924.
[16] Nadeau, C., and Bengio, Y., (2003). Inference for the Generalization Error. Machine Learning, 52, 239–281.
[17] Bouckaert, R., (2003). Choosing Between Two Learning Algorithms Based on Calibrated Tests. Proc. 20th Int. Conf. on Machine Learning, pp. 51–58. Morgan Kaufmann.