International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 3, Issue 3, March 2014 ISSN 2319 - 4847

An Applied Research of Rasch GSP for Evaluating Difficulty of Test Questions

Tian-Wei Sheu1, Phung-Tuyen Nguyen2, Duc-Hieu Pham3, Phuoc-Hai Nguyen4 and Nagai Masatake5
1,2,3,4,5 Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education, Taichung, Taiwan

ABSTRACT
Educational assessment usually focuses on determining how well students are learning and has become an integral part of the quest to improve education. An assessment method is required to be simple, understandable, and easily applied, yet objective and accurate. The purpose of this paper is to propose a new application of Rasch GSP that can evaluate the difficulty of questions in educational testing. The method combines the advantages of the Rasch model and the Grey Student-Problem (GSP) chart to evaluate questions on the basis of statistics and an ordering by priority criteria. To examine the rationality of the method, its experimental results are compared with results produced by the BILOG-MG 3 software, which performs the task based on Item Response Theory (IRT). The results show that the proposed method is simple and easy to apply. In particular, it is well suited to small samples and is also applicable to large samples. This new application is a more convenient way for teachers and education managers to carry out the evaluation process.

Keywords: assessment method, Rasch GSP, GSP, small sample, large sample.

1.
INTRODUCTION
Assessment plays an integral role in teaching and learning. For assessment to be used in classrooms to help students learn, it must be transformed in two fundamental ways: the content and character of assessments must be significantly improved, and the gathering and use of assessment information must become a part of the ongoing learning process [1]. Whether within the scope of a class or the wider scope of a school, the academic achievement of students is evaluated frequently so that timely feedback can serve teaching; this requires evaluation results to be obtained quickly but also accurately and objectively. The construction and design of highly effective assessment models and methods is therefore of considerable interest. An important requirement for assessment methods is that they be accurate and easy to apply, but also objective, valid, reliable, and practicable, to meet the needs of teaching [2].

In the classroom, the Student-Problem chart (S-P chart) is an evaluation method that effectively evaluates students’ learning results through tests, and it has been in use for many years. The S-P chart was invented by Takahiro Sato in 1969 and can help teachers diagnose abnormal performance. Its main purpose is to obtain diagnostic data for each student, so that teachers can give each student better academic advice based on the analyzed data [3]. In 1982, Deng proposed grey system theory, in which grey relational analysis is an effective mathematical tool. Grey relational analysis measures the degree of similarity or difference between two sequences based on the grade of relationship between them [4]. To overcome the weakness of the S-P chart, which processes only dichotomous data, Nagai proposed the GSP chart in 2010; it combines the S-P chart with grey system theory to analyze S-P chart data more specifically.
With GSP chart analysis, the uncertain factors in a study can be analyzed clearly [5]. In 1960, Rasch introduced a model for analyzing test data, where a test broadly refers to an assessment of an examinee’s level of ability in a particular domain such as math or reading, or a survey of an examinee’s behaviors or attitudes toward something [6]. The aim of this model is to measure each examinee’s level of the latent trait that underlies his or her scores on the items of a test, so it is suitable for the analysis of large data sets. The concepts of item response theory (IRT) were developed during the 1950s and 1960s by three pioneers, Lord, Rasch, and Lazarsfeld, who pursued parallel research independently, but IRT did not become widely used until the late 1970s and 1980s. The purpose of IRT is to provide a framework for evaluating how well assessments work and how well individual items on assessments work. The most common application of IRT is in education, where psychometricians use it for developing and designing exams, building item banks for exams, and equating the difficulties of items for successive versions of exams [7]. In practice, the parameters of IRT are estimated by computer programs because of the vast number of parameters that must be estimated. One of the most popular programs is BILOG-MG [8]. Applying the view of the Rasch model to the GSP chart was a creative suggestion made by Nagai in 2010. This theory has become a method for judging uncertain factors [9]. Rasch GSP can make problem analysis more specific and clear; it shows the subject’s status completely and provides results that can then be used to support digital curriculum design and aid the development of a reference index [10].
The S-P chart analysis method has been widely used for many years; however, it is suitable only for small samples, and it evaluates students and questions only at a general level, without a specific ordering of questions by difficulty. BILOG-MG, in contrast, processes data based on IRT and gives evaluation results with specific question difficulties; however, samples that satisfy its assumptions are always large statistical data sets. It is hypothesized that an assessment method can be constructed that applies not only to small samples but also to large statistical data sets. The current study proposes a new application of Rasch GSP that can evaluate the difficulty of questions in a test and can be applied to both small and large samples. Rasch GSP has been implemented as a non-parametric statistical method for small samples, and its success has been shown in several studies [10, 11]; this study proposes a new application of it. To show its features, the study applies it in educational testing to a large statistical sample, with data collected from a 46-question test taken by more than 1000 students. The evaluation results are compared with the results produced by the BILOG-MG 3 software to determine their rationality.

2. BASIC THEORY
This study proposes a new application of the Rasch GSP method to evaluate the difficulty of questions, and its experimental results are then compared with the results of BILOG-MG 3. The basic theories needed are therefore introduced as follows.

2.1 Rasch model
The Rasch model is named after the Danish mathematician Georg Rasch. The model shows what should be expected in responses to items if measurement is to be achieved [12].
The model assumes that the probability of a given respondent affirming an item is a logistic function of the relative distance between the item location and the examinee location on a linear scale. Georg Rasch first presented this model for analyzing examinees’ responses in order to obtain an objective interval scale that can measure an examinee’s latent trait [13]. In the Rasch model, the probability that a student responds correctly is a logistic function of the difference between that student’s ability and the item difficulty [14]. The relation between the latent trait (theta) and the probability of a correct response is described by an item characteristic curve (ICC). The Rasch model has a descriptive function and a predictive function. Descriptively, the model can clearly explain the relationship between student ability and item difficulty, the differences between students, and the differences between items. Predictively, the model can give the probability that a student with a specified ability answers a specified item correctly [13].

Figure 1 ICC for three different items in Rasch model

2.2 Student - Problem Chart
The S-P chart is a method that can analyze, process, and arrange data in a defined order; it is very useful for diagnosing the learning state of students and the quality of questions [15, 16].
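The logistic relation described in Section 2.1 can be made concrete with a short sketch. It assumes the standard one-parameter form, in which the probability of a correct response is $1/(1 + e^{-(\theta - b)})$ for ability $\theta$ and item difficulty $b$. The function name and the three example difficulties are illustrative and are not taken from Figure 1.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model.

    theta: ability of the student; b: difficulty of the item.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical item difficulties (easy, medium, hard); not values from the paper.
item_difficulties = [-1.0, 0.0, 1.0]

# Tabulate a few points of each item characteristic curve.
for theta in (-2.0, 0.0, 2.0):
    probs = [round(rasch_probability(theta, b), 3) for b in item_difficulties]
    print(theta, probs)
```

A student whose ability equals the item difficulty has a probability of 0.5 of answering correctly, and for a fixed ability the probability falls as the item gets harder, which is the behavior an ICC displays.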
Definition 1: S-P chart matrix
Let $X = [x_{ij}]_{m \times n}$ be the student response matrix, called the S-P chart matrix, where $i = 1, 2, \ldots, m$ is the order of the student, $j = 1, 2, \ldots, n$ is the order of the question, $m, n \in \mathbb{N}$, and

$$x_{ij} = \begin{cases} 0, & \text{student } i \text{ gives a wrong answer to problem } j \\ 1, & \text{student } i \text{ gives a correct answer to problem } j \end{cases} \qquad (1)$$

Caution Index for Student (CS):

$$CS_i = 1 - \frac{\sum_{j=1}^{n} (x_{ij})(x_{\cdot j}) - (x_{i\cdot})(\bar{x})}{\sum_{j=1}^{l} (x_{\cdot j}) - (x_{i\cdot})(\bar{x})} \qquad (2)$$

where $\bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_{\cdot j}$ and $l = x_{i\cdot} = \sum_{j=1}^{n} x_{ij}$.

Caution Index for Problem (CP):

$$CP_j = 1 - \frac{\sum_{i=1}^{m} (x_{ij})(x_{i\cdot}) - (x_{\cdot j})(\bar{x}')}{\sum_{i=1}^{l'} (x_{i\cdot}) - (x_{\cdot j})(\bar{x}')} \qquad (3)$$

where $\bar{x}' = \frac{1}{m} \sum_{i=1}^{m} x_{i\cdot}$ and $l' = x_{\cdot j} = \sum_{i=1}^{m} x_{ij}$.

Based on the CS value and the rate of problems answered correctly by a student, students are classified and diagnosed; similarly, based on the CP value and the rate of students answering a problem correctly, problems are classified.

2.3 Grey Relational Analysis (GRA)
In grey system theory, GRA is an effective mathematical tool for treating uncertain, multiple, discrete, and incomplete information. This study uses the localized grey relational grade proposed by Nagai [17].
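The caution indices of Section 2.2 can be sketched in a few lines of Python. The sketch follows formulas (2) and (3), with two assumptions that the text does not state: the partial sums in the denominators are taken over the largest column (or row) totals, which is equivalent to sorting the S-P chart first, and an index whose denominator is zero (for example a student with every answer correct) is reported as 0. The function names and the toy matrix are hypothetical.

```python
from typing import List, Sequence, Tuple


def _caution(pattern: Sequence[int], weights: Sequence[float], mean_weight: float) -> float:
    """Caution index of one 0/1 pattern against the totals of the other dimension."""
    total = sum(pattern)  # l in formula (2), l' in formula (3)
    observed = sum(p * w for p, w in zip(pattern, weights)) - total * mean_weight
    # Assumption: the "first l" totals are the l largest ones (sorted S-P chart).
    ideal = sum(sorted(weights, reverse=True)[:total]) - total * mean_weight
    if abs(ideal) < 1e-12:
        return 0.0  # Assumption: undefined index reported as 0.
    return 1.0 - observed / ideal


def caution_indices(X: List[List[int]]) -> Tuple[List[float], List[float]]:
    """Return (CS, CP) for an m x n matrix of 0/1 responses."""
    m, n = len(X), len(X[0])
    row_totals = [sum(row) for row in X]                               # x_i.
    col_totals = [sum(X[i][j] for i in range(m)) for j in range(n)]   # x_.j
    col_mean = sum(col_totals) / n
    row_mean = sum(row_totals) / m
    cs = [_caution(X[i], col_totals, col_mean) for i in range(m)]
    cp = [_caution([X[i][j] for i in range(m)], row_totals, row_mean) for j in range(n)]
    return cs, cp


# Toy data: the last student answers only the hardest question correctly.
toy = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
cs, cp = caution_indices(toy)
print(cs)
print(cp)
```

In this paper the indices serve mainly to break ties between questions with equal Gamma values (Step 3 of the algorithm in Section 3); the localized grey relational grade introduced above is computed separately.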
The procedure for the localized grey relational grade is as follows.

2.3.1 Establishment of original vectors
Definition 2: Basic vectors in GRA
The reference vector $x_0$ and the inspected vectors of original data $x_i$ are established as follows:

$$\begin{aligned}
x_0 &= (x_0(1), x_0(2), \ldots, x_0(j), \ldots, x_0(n)) \\
x_1 &= (x_1(1), x_1(2), \ldots, x_1(j), \ldots, x_1(n)) \\
x_2 &= (x_2(1), x_2(2), \ldots, x_2(j), \ldots, x_2(n)) \\
&\;\;\vdots \\
x_i &= (x_i(1), x_i(2), \ldots, x_i(j), \ldots, x_i(n)) \\
&\;\;\vdots \\
x_m &= (x_m(1), x_m(2), \ldots, x_m(j), \ldots, x_m(n))
\end{aligned} \qquad (4)$$

where $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$; $m, n \in \mathbb{N}$.

Let $X = [x_i(j)] = [x_{ij}]_{m \times n}$ represent the original data in the S-P chart. If $S_i$ denotes the series of answer states of the $i$-th student for all questions and $P_j$ the series of answer states of all students for the $j$-th question, then:

$$\begin{aligned}
S_i &= (x_i(1), x_i(2), \ldots, x_i(j), \ldots, x_i(n)) \\
P_j &= (x_1(j), x_2(j), \ldots, x_i(j), \ldots, x_m(j))
\end{aligned} \qquad (5)$$

The S-P chart data shown in formula (5) are used for GRA.

2.3.2 Calculation of GRA
In the original formula, $x_0$ is the reference sequence of the local grey relational grade, and the $x_i$ are the inspected sequences. The established sequences have to satisfy three conditions: non-dimension, scaling, and polarization. Grey relational generation can be done in three ways: larger-the-better (the expected goal is the bigger the better), smaller-the-better (the expected goal is the smaller the better), and nominal-the-better (the expected goal lies between the maximum and the minimum).

Definition 3: Localized grey relational grade (LGRG)
The localized grey relational grade is defined as follows:

$$\gamma_{0i} = \gamma(x_0(j), x_i(j)) = \frac{\Delta_{\max} - \Delta_{0i}}{\Delta_{\max} - \Delta_{\min}} \qquad (6)$$

where $\Delta_{\max}$ and $\Delta_{\min}$ are the maximum and minimum values of $\Delta_{0i}$, respectively, and $\Delta_{0i}$ is the absolute distance between $x_0$ and $x_i$:

$$\Delta_{0i} = \lVert x_0 - x_i \rVert_{\rho} = \left( \sum_{j=1}^{n} \lvert x_0(j) - x_i(j) \rvert^{\rho} \right)^{1/\rho} \qquad (7)$$

$\Delta_{0i}$ is called the Minkowski distance. This study uses $\rho = 2$, so $\Delta_{0i}$ is the Euclidean distance. When $\gamma_{0i}$ is close to 1, $x_0$ and $x_i$ are highly correlated; in contrast, when $\gamma_{0i}$ is close to 0, the relationship between $x_0$ and $x_i$ is weak.
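Formulas (6) and (7) translate directly into code. The sketch below is a minimal illustration: it takes the reference vector to be a row of all ones (a fully correct response pattern), which is an assumption made for the example and not a choice stated in the text, and it returns 1 for every vector when all distances coincide, since formula (6) is undefined in that case. The function name `lgrg` is hypothetical.

```python
from typing import List, Sequence


def lgrg(reference: Sequence[float], vectors: Sequence[Sequence[float]], rho: float = 2.0) -> List[float]:
    """Localized grey relational grade of each vector against the reference."""
    # Formula (7): Minkowski distance of order rho (Euclidean for rho = 2).
    distances = [
        sum(abs(a - b) ** rho for a, b in zip(reference, v)) ** (1.0 / rho)
        for v in vectors
    ]
    d_max, d_min = max(distances), min(distances)
    if d_max == d_min:
        # Assumption: all vectors are equally close, so give every one grade 1.
        return [1.0 for _ in vectors]
    # Formula (6): rescale so the closest vector gets 1 and the farthest gets 0.
    return [(d_max - d) / (d_max - d_min) for d in distances]


# Toy response patterns of three students on four questions.
students = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
reference = [1, 1, 1, 1]  # Assumed reference: every answer correct.
print(lgrg(reference, students))
```

Applying the same function to the columns of the response matrix, with a reference of matching length, gives grades for problems instead of students.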
Grey relational ordinal: The whole decision is made by comparing the grey relational grades $\gamma_{0i}$. Through the ordinal, different causes can be identified, and the most important influence can be found, which becomes the relational standard in the system.

2.4 GSP chart and Rasch GSP
The GSP chart combines GRA and the S-P chart; it was developed to overcome the weaknesses of the S-P chart. The GSP chart can make the analysis more concrete and accurate, and the uncertain factors in a study can also be analyzed [18]. Its description is shown in Table 1.

Definition 4: Gamma value
In the GSP chart, $GS_i$ is the localized grey relational grade of the $i$-th student, and $GP_j$ is the localized grey relational grade of the $j$-th problem. They are generally called Gamma values; specifically, GS is called the Gamma value for students and GP is called the Gamma value for problems:

$$GS_i = \gamma_{0i} = \frac{\Delta_{\max} - \Delta_{0i}}{\Delta_{\max} - \Delta_{\min}}, \; i = 1, 2, \ldots, m; \qquad GP_j = \gamma_{0j} = \frac{\Delta_{\max} - \Delta_{0j}}{\Delta_{\max} - \Delta_{\min}}, \; j = 1, 2, \ldots, n \qquad (8)$$

Table 1: GSP chart

Nagai applied the view of the Rasch model to the GSP chart to propose the Rasch GSP method, which uses logistic regression to analyze the relationship between two sets of data: the order values of students (or problems) and the localized grey relational grades. The purpose is to find a function that represents the characteristics of the entire data set [11]. This function is called the Rasch GSP function, and its graph is called the Rasch GSP graph.

Definition 5: Rasch GSP function
Let

$$y = f(x) = \frac{1}{1 + e^{(\ldots x \ldots)}} \qquad (9)$$

be the three-parameter logistic regression function, where the three regression coefficients are real numbers. When $x$ is the order of student ability or the order of item difficulty and $y$ is the localized grey relational grade, the function $y = f(x)$ is called the Rasch GSP function.
If $x$ is the order of student ability and $y$ is the LGRG-S, then $y = f(x)$ is called the Rasch GSP function for students; similarly, if $x$ is the order of item difficulty and $y$ is the LGRG-P, then $y = f(x)$ is called the Rasch GSP function for problems.

2.5 Item Response Theory with BILOG-MG software
IRT has become one of the most popular scoring frameworks for measurement data. IRT models are used frequently in computerized adaptive testing, cognitively diagnostic assessment, and test equating. One of the programs available for this purpose is BILOG-MG, which has proven particularly useful and reliable over recent decades for many applications. To fit IRT models estimated with BILOG-MG, experimental data have to satisfy three assumptions: local independence, monotonicity, and unidimensionality [19]. When IRT model parameters are estimated in BILOG-MG, the degree of bias and estimation error of the parameter estimates depends on factors such as the number of parameters, the number of examinees, and the test length. As a general guideline, for tests with between 15 and 50 items, at least about 250 examinees are required for the one-parameter logistic model, at least about 500 for the two-parameter logistic model, and perhaps as many as 1,000 for the three-parameter logistic model [20].

3. METHODOLOGY
In this paper, three terms are used with the same meaning: problem = question = item. This study applies the Rasch GSP method to assessing academic achievement; the process of applying it is presented in the following diagram (Figure 2).
Raw data -> Analyze data by GSP chart -> Rearrange data by CS, CP -> Apply regression method by using Rasch GSP curve -> Determine the last Gamma values
Figure 2 Diagram of new application of Rasch GSP method

The main purpose of the study is to evaluate the difficulty of questions, so the data are processed through the following algorithm.

Algorithm and process of processing data to evaluate the difficulty of questions in the test:
Step 1: Calculate and arrange the data with GSP. The original data are processed and arranged according to the rules of the GSP chart, and the results are presented according to the specifications in Table 1 (in the basic theory section).
Step 2: Give priorities in order to arrange the data according to Gamma value. The questions are arranged in the order of their Gamma values (GP) from small to large.
Step 3: Rearrange the order according to the caution index for problem (CP). Questions that have the same Gamma value are rearranged in order of CP value from large to small.
Step 4: Apply the logistic regression method to the Gamma values arranged in the above steps. The Rasch GSP method is applied, and the Rasch GSP curve is plotted for the entire data set.
Step 5: Match the new Gamma values to all the questions. Each question corresponds to a point on the Rasch GSP curve; the value on the y axis corresponding to each question is its new Gamma value, which is the value to be determined.
The difficulty of the questions is calculated according to the new Gamma value.

4.
EXPERIMENT, RESULTS, AND DISCUSSION
4.1 Experimental design
Two experiments were performed to test the ability of the Rasch GSP method to assess question difficulty.

Experiment 1 administered a 22-question Math test to 37 students in Taichung, Taiwan; the test reached a Cronbach’s Alpha of 0.884. The data were analyzed with the MATLAB toolbox for Rasch GSP using the method described above, and the result was compared with the result obtained from Classical Test Theory. To demonstrate the strengths of the Rasch GSP method, the research went on to conduct an experiment with a large statistical data set.

Experiment 2 administered a 46-question English test to 1119 students, also in Taichung; the test also had a high Cronbach’s Alpha of 0.833. These data satisfied the assumptions of IRT and of the estimation of IRT model parameters in BILOG-MG. The data were analyzed using the Rasch GSP method to give assessment results and were then also analyzed with the BILOG-MG 3 software; the two analytical results were compared with each other.

4.2 Results
The result of the Rasch GSP analysis (Figure 3) is the Rasch GSP curve for problems mentioned in Step 4 of the methodology section.

Figure 3 Rasch GSP curve for problem in experiment 1

To show the evaluated difficulty of each question, a coordinate method was applied. The difficulties of the questions were calculated and exported from a program written in MATLAB (Program for Rasch GSP) and are presented in Table 2, arranged from small to large. This result was compared with the result calculated by Classical Test Theory (Table 3). Questions were highlighted in the two tables to show where the order of their difficulties occupies the same position under the two methods.
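Steps 2 to 5 of the algorithm in Section 3 can be sketched as follows. The sketch is not the authors' MATLAB program: it assumes the Gamma values (GP) and caution indices (CP) are already available, and it replaces the three-parameter Rasch GSP function with a simpler two-parameter logistic curve fitted by least squares on the logit scale, purely to keep the example free of external dependencies. All names and sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _logit(y: float, eps: float) -> float:
    """Logit of y, clipped away from 0 and 1 so the logarithm stays finite."""
    y = min(max(y, eps), 1.0 - eps)
    return math.log(y / (1.0 - y))


def fit_logistic(xs: Sequence[float], ys: Sequence[float], eps: float = 1e-3) -> Tuple[float, float]:
    """Fit y = 1 / (1 + exp(-(a*x + b))) by ordinary least squares on logit(y)."""
    zs = [_logit(y, eps) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs)) / sxx
    b = mean_z - a * mean_x
    return a, b


def rasch_gsp_difficulty(questions: List[Tuple[str, float, float]]) -> List[Tuple[str, float]]:
    """questions: (label, GP, CP) triples. Returns (label, new Gamma value) in order."""
    # Steps 2 and 3: GP ascending, ties broken by CP descending.
    ordered = sorted(questions, key=lambda q: (q[1], -q[2]))
    xs = list(range(1, len(ordered) + 1))
    # Step 4: fit a logistic curve to (order, GP). Simplified stand-in curve.
    a, b = fit_logistic(xs, [q[1] for q in ordered])
    # Step 5: read the fitted value for each question off the curve.
    return [(q[0], 1.0 / (1.0 + math.exp(-(a * x + b)))) for q, x in zip(ordered, xs)]


# Hypothetical Gamma values and caution indices for four questions.
sample = [("Q1", 0.5, 0.2), ("Q2", 0.1, 0.0), ("Q3", 0.5, 0.4), ("Q4", 0.9, 0.1)]
for label, value in rasch_gsp_difficulty(sample):
    print(label, round(value, 3))
```

Because the fitted curve is strictly monotonic in the order index, two questions with the same original Gamma value (Q1 and Q3 in the sample) end up with different new values once the tie has been broken by CP.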
Table 2: Evaluation result for question difficulty by the proposed method in experiment 1
Question-number | 9 | 11 | 8 | 1 | 6 | 19 | 2 | 5 | 21 | 7 | 16
Difficulty | 0.153 | 0.176 | 0.201 | 0.229 | 0.261 | 0.295 | 0.331 | 0.370 | 0.412 | 0.445 | 0.449
Question-number | 10 | 15 | 14 | 12 | 17 | 4 | 20 | 18 | 13 | 22 | 3
Difficulty | 0.544 | 0.589 | 0.634 | 0.677 | 0.719 | 0.759 | 0.796 | 0.831 | 0.863 | 0.892 | 0.919

Table 3: Evaluation result for question difficulty by Classical Test Theory in experiment 1
Question-number | 9 | 11 | 8 | 1 | 6 | 19 | 2 | 5 | 21 | 7 | 16
Dj | 0.865 | 0.838 | 0.784 | 0.757 | 0.757 | 0.730 | 0.730 | 0.730 | 0.676 | 0.649 | 0.649
Question-number | 10 | 15 | 14 | 12 | 17 | 4 | 20 | 18 | 13 | 22 | 3
Dj | 0.622 | 0.622 | 0.595 | 0.568 | 0.541 | 0.541 | 0.514 | 0.486 | 0.459 | 0.405 | 0.351

In Table 3, the difficulty $D_j$ of the $j$-th question was calculated by the following formula:

$$D_j = \frac{n_j}{N}$$

where $n_j$ is the number of students who answered the $j$-th question correctly and $N$ is the number of students taking the test. The larger the $D_j$ value, the lower the question difficulty, so the questions are arranged by $D_j$ from large to small.

In experiment 2, Rasch GSP analysis was applied (Figure 4); the difficulties of the questions were output by the Program for Rasch GSP and then arranged from small to large (Table 4).

Figure 4 Rasch GSP curve for problem in experiment 2
Figure 5 Evaluation results for questions processed by BILOG-MG 3 in experiment 2

The results of evaluating the difficulty of the questions with BILOG-MG 3 are presented in Figure 5, in which a and b are two parameters of each question estimated by Item Response Theory.
The difficulties of the questions, represented by the parameter b, were also arranged from small to large (Table 5) for comparison with the results of the proposed method; question number 1 corresponds to item 01, question number 2 to item 02, and so on.

Table 4: Evaluation results for question difficulty by the proposed method in experiment 2
Question-number | 5 | 22 | 28 | 36 | 20 | 2 | 9 | 16 | 29 | 19 | 17 | 31
Difficulty | 0.156 | 0.165 | 0.174 | 0.184 | 0.194 | 0.205 | 0.216 | 0.227 | 0.239 | 0.252 | 0.265 | 0.278
Question-number | 6 | 24 | 23 | 21 | 18 | 33 | 25 | 40 | 30 | 41 | 26 | 39
Difficulty | 0.292 | 0.307 | 0.322 | 0.338 | 0.354 | 0.370 | 0.388 | 0.405 | 0.423 | 0.442 | 0.461 | 0.480
Question-number | 42 | 4 | 34 | 35 | 44 | 1 | 45 | 43 | 15 | 32 | 13 | 14
Difficulty | 0.499 | 0.520 | 0.540 | 0.561 | 0.581 | 0.603 | 0.624 | 0.645 | 0.667 | 0.688 | 0.710 | 0.731
Question-number | 10 | 3 | 27 | 38 | 12 | 7 | 11 | 46 | 37 | 8
Difficulty | 0.753 | 0.774 | 0.796 | 0.817 | 0.837 | 0.858 | 0.878 | 0.898 | 0.918 | 0.937

Table 5: Evaluation results for question difficulty from BILOG-MG 3 in experiment 2
Question-number | 5 | 22 | 28 | 36 | 20 | 2 | 9 | 16 | 29 | 19 | 17 | 31
Difficulty | -4.141 | -3.99 | -2.934 | -2.896 | -2.813 | -2.647 | -2.626 | -2.576 | -2.536 | -2.243 | -1.992 | -1.971
Question-number | 6 | 24 | 23 | 21 | 18 | 33 | 25 | 40 | 30 | 41 | 26 | 39
Difficulty | -1.936 | -1.882 | -1.796 | -1.758 | -1.665 | -1.641 | -1.558 | -1.305 | -1.305 | -1.264 | -1.088 | -0.902
Question-number | 42 | 4 | 34 | 35 | 44 | 1 | 45 | 43 | 15 | 32 | 13 | 14
Difficulty | -0.879 | -0.838 | -0.793 | -0.792 | -0.729 | -0.668 | -0.663 | -0.628 | -0.615 | -0.468 | -0.186 | -0.157
Question-number | 10 | 3 | 27 | 38 | 12 | 7 | 11 | 46 | 37 | 8
Difficulty | 0.113 | 0.130 | 0.201 | 0.247 | 0.310 | 0.784 | 1.427 | 1.450 | 1.782 | 2.070

4.3 Discussion
The proposed method evaluated the question difficulty in the tests and gave an evaluated difficulty for each question.
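The Classical Test Theory difficulty used for Table 3, $D_j = n_j / N$, and the ties it can produce are easy to illustrate with a short sketch. The response matrix below is a toy example invented for illustration, not the data of experiment 1, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def ctt_difficulty(X: List[List[int]]) -> List[float]:
    """D_j = n_j / N for each question j of an N x n matrix of 0/1 responses."""
    n_students = len(X)
    n_questions = len(X[0])
    return [sum(row[j] for row in X) / n_students for j in range(n_questions)]


def tied_groups(difficulties: List[float], ndigits: int = 3) -> List[List[int]]:
    """Groups of 1-based question numbers that share the same rounded D value."""
    groups: Dict[float, List[int]] = defaultdict(list)
    for number, d in enumerate(difficulties, start=1):
        groups[round(d, ndigits)].append(number)
    return [members for members in groups.values() if len(members) > 1]


# Toy data: four students, three questions.
toy = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
d = ctt_difficulty(toy)
print(d)               # [0.75, 0.5, 0.5]
print(tied_groups(d))  # [[2, 3]]
```

Questions 2 and 3 receive the same D value because the same number of students answered them correctly, regardless of which students did so; this is the situation that the priority criteria of the proposed method are meant to resolve.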
In experiment 1, with a small sample, the orderings of question difficulty produced by the proposed method and by Classical Test Theory were similar, namely at the highlighted positions of 11 questions (50%). At these positions the difficulties of the questions are all different, so the questions are arranged in a logical order. At the positions that are not highlighted, the proposed method clearly distinguished the difficulties of the questions, so they are also arranged in a logical order. According to Classical Test Theory, in contrast, these questions are not clearly ordered, namely questions 1 and 6; questions 19, 2, and 5; questions 7 and 16; questions 10 and 15; and questions 17 and 4, because they have the same difficulty values. This weakness arises because Classical Test Theory does not consider priority criteria: questions with the same rate of correct answers always have the same difficulty, even though the abilities of the students differ from each other. This shows the advantage of the proposed method, which takes the quality and reliability of questions into account.

In experiment 2, with a large sample, the two methods arranged the questions by difficulty in exactly the same order. This supports the rationality and reliability of the proposed method and indicates that it can be used to handle large samples. The basic difference is that the difficulty distances estimated between corresponding questions differ between the two methods, because the two methods use different difficulty scales. This is also a limitation of the proposed method.

5.
CONCLUSION
With the assumption that it was possible to build an educational assessment method that can be applied to both small and large statistical samples, the present study proposed a new evaluation method based on a new application of Rasch GSP to evaluating question difficulty. The contributions of this study are as follows:
1. The study has developed an assessment method that applies well not only to small statistical samples but also to large ones.
2. The evaluation results of the proposed method were compared with those of previous methods; the fit is relatively high and shows the advantages of the new method.
3. The advantage of the proposed method is that it is simple, easily understandable, and easy to apply; the processed results are accurate and objective.
In summary, this new application of Rasch GSP makes it more convenient for teachers and education managers to carry out evaluation in the teaching process. They will have more time to construct better multiple-choice questions.

References
[1] L. A. Shepard, "The role of classroom assessment in teaching and learning," 2000.
[2] D. Pulakos, "Selection assessment methods," SHRM Foundation’s Effective Practice Guidelines, Society for Human Resource Management, Alexandria, VA, p. 55, 2005.
[3] M. N. Yu, Educational Testing and Assessment, Third ed. Taiwan: Psychology Publisher, 2011.
[4] K. L. Wen, C. Chao, H. Chang, S. Chen, and H. Wen, Grey system theory and applications. Taipei: Wu-Nan Book Inc, 2009.
[5] T. W. Sheu, C. P. Tsai, J. W. Tzeng, D. H. Pham, H. J. Chiang, C. L. Chang, et al., "An Improved Teaching Strategies Proposal Based on Students' Learning Misconceptions," International Journal of Kansei Information, vol. 4, pp. 1-12, 2013.
[6] Karabatsos, "The Rasch model, additive conjoint measurement, and new models of probabilistic measurement theory," Journal of Applied Measurement, vol. 2, pp. 389-423, 2001.
[7] R. K. Hambleton, H. Swaminathan, and H. J. Rogers, Fundamentals of item response theory: Sage, 1991.
[8] M. Du Toit, IRT from SSI: Bilog-MG, multilog, parscale, testfact: Scientific Software International, 2003.
[9] T. W. Sheu, J. W. Tzeng, J. C. Liang, B. T. Wang, and M. Nagai, "The Use of Rasch Model GSP Chart and Grey Structural Model Analysis for Vocational Education and Training Courses: Taking Enterprise Ethics and Professional Ethics Courses as an Example," Journal of Educational Research and Development, vol. 8(4), pp. 53-80, 2012.
[10] J. W. Tzeng, T. W. Sheu, J. C. Liang, B. T. Wang, and M. Nagai, "A new proposal based on Rasch model GSP chart and grey structural model with analysis and discussion," International Journal of Advancements in Computing Technology, pp. 111-121, 2012.
[11] J. W. Tzeng, T. W. Sheu, J. C. Liang, B. T. Wang, and M. Nagai, "A New Proposal Based on Rasch Model GSP Chart and Grey Structural Model with Analysis and Discussion," International Journal of Advancements in Computing Technology, vol. 4, pp. 111-121, 2012.
[12] G. Rasch, Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press, 1960.
[13] W. C. Wang, "Rasch Measurement Theory and Application in Education and Psychology," Journal of Education & Psychology, vol. 27, pp. 637-694, 2004.
[14] B. Baker, The basics of item response theory, Second ed. United States of America: ERIC Clearinghouse on Assessment and Evaluation, 2001.
[15] Y. C. Ho, "An Experimental Study of the Effects of Mastery Learning Combined with the Diagnosis of Microcomputerized S-P Chart Analysis on Students' Learning," Educational Psychology, vol. 22, pp. 191-214, 1989.
[16] S. C. You and M. N. Yu, "The Relationships among Indices of Diagnostic Assessments Knowledge Structures and S-P Chart Analysis," Education and Psychology, vol. 29, pp. 183-208, 2006.
[17] Yamaguchi, G.-D. Li, and M. Nagai, "Verification of effectiveness for grey relational analysis models," Journal of Grey System, vol. 10, pp. 169-181, 2007.
[18] T. W. Sheu, T. L. Chen, J. W. Tzeng, C. P. Tsai, H. J. Chiang, C. L. Chang, et al., "Applying Misconception Domain and Structural Analysis to Explore the Effects of the Remedial Teaching," Journal of Grey System, vol. 16, pp. 17-34, 2013.
[19] A. Rupp, "Item response modeling with BILOG-MG and MULTILOG for Windows," International Journal of Testing, vol. 3, pp. 365-384, 2003.
[20] L. Hulin, R. I. Lissak, and F. Drasgow, "Recovery of two- and three-parameter logistic item characteristic curves: A Monte Carlo study," Applied Psychological Measurement, vol. 6, pp. 249-260, 1982.
Volume 3, Issue 3, March 2014 Page 221 International Journal of Application or Innovation in Engineering & Management (IJAIEM) Web Site: www.ijaiem.org Email: editor@ijaiem.org Volume 3, Issue 3, March 2014 ISSN 2319 - 4847 Appendices Table 6: Response of students in experiment 1 Student Code 11201 11202 11203 11204 11205 11206 11207 11208 11209 11210 11211 11212 11213 11214 11215 11216 11217 11218 11219 11220 11221 11222 11223 11224 11225 11226 11227 11228 11229 11230 11231 11232 11233 11234 11235 11236 11237 Students’ response 1 1 1 0 1 1 1 1 1 1 1 0 0 1 1 0 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 1 0 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 1 0 0 0 1 0 0 0 1 1 0 0 1 0 1 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 0 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 0 1 0 1 1 1 Volume 3, Issue 3, March 2014 1 1 0 0 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0 1 1 1 1 1 0 1 0 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 0 1 0 0 1 1 1 0 0 1 0 0 0 1 0 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 1 1 1 0 1 1 0 0 0 0 1 0 1 1 0 1 1 1 1 1 0 0 0 1 1 0 1 0 0 1 0 1 0 1 1 1 0 1 1 0 0 1 1 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 0 1 1 1 0 0 1 1 0 1 1 0 0 0 0 1 1 1 0 0 1 1 1 Number of correct 1 1 1 0 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 1 1 0 1 0 0 0 1 1 1 0 1 0 0 0 1 1 0 0 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 0 0 0 0 0 1 0 0 1 1 0 1 0 1 0 1 1 1 1 0 1 1 0 0 0 0 1 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 0 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 0 0 0 
1 1 0 0 1 0 0 0 0 1 1 1 1 1 0 0 0 1 1 1 0 0 1 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 0 1 0 1 1 0 0 0 1 1 1 0 1 0 0 1 1 1 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 13 22 10 4 9 20 22 10 14 17 9 8 11 18 10 16 17 19 4 14 18 17 7 19 17 8 13 11 13 20 13 11 16 0 21 21 21

AUTHORS
Tian-Wei Sheu received the Ph.D. degree in Mathematics from National Osaka University, Japan in 1990. He is the Dean of the College of Education and a professor at the Graduate Institute of Educational Measurement, National Taichung University, Taichung, Taiwan. His studies focus on IRT, educational measurement, e-learning, etc. He is the director of TKIA (Taiwan Kansei Information Association).

Phung-Tuyen Nguyen is currently a Ph.D. candidate in the Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taiwan. He received a Master’s degree in Physics in 2003 from Hanoi University of Education, Vietnam. His research interests focus on item response theory, grey system theory, and educational measurement.

Duc-Hieu Pham received a Master’s degree in Education from Hanoi Pedagogical University No. 2, Vietnam, in 2009. He works as a lecturer in the Primary Education Faculty of Hanoi Pedagogical University No. 2, Vietnam. He is currently a Ph.D. candidate in the Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taiwan. His research interests include grey system theory, educational measurement and primary education.

Phuoc-Hai Nguyen received a Master’s degree in Biology from Hanoi University of Education, Vietnam, in 2006. He is currently a Ph.D. candidate in the Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taiwan. His research interests include biology, item response theory, grey system theory, ordering theory and educational measurement.

Masatake Nagai received his Master’s degree in Engineering from Toukai University of Japan in 1969. He worked at Oki Electric Industry Co., Ltd. for 18 years and was mainly engaged in the design and development of ME systems, communication network systems, OA systems, etc. He was also a researcher (Dr. Matsuo research) at Tohoku University while working toward his Ph.D. in Engineering. From 1989, he worked at the Teikyo University Department of Science and Engineering as an assistant professor and eventually as an engineering professor. He is now a chair professor in the Graduate Institute of Educational Measurement, National Taichung University, Taiwan. His research interests include approximation, strategy system engineering, information communication network technology, agent, kansei information processing, grey system theory and engineering application. He is a regular member of IEICE.