Yin, P.-Y., Chang, K.-C., Hwang, G.-J., Hwang, G.-H., & Chan, Y. (2006). A Particle Swarm Optimization Approach to Composing Serial Test Sheets for Multiple Assessment Criteria. Educational Technology & Society, 9 (3), 3-15.

A Particle Swarm Optimization Approach to Composing Serial Test Sheets for Multiple Assessment Criteria

Peng-Yeng Yin and Kuang-Cheng Chang
Department of Information Management, National Chi Nan University, Pu-Li, Nan-Tou, Taiwan 545, R.O.C.

Gwo-Jen Hwang
Department of Information and Learning Technology, National University of Tainan, 33, Sec. 2 Shulin St., Tainan city 70005, Taiwan, R.O.C.
gjhwang@mail.nutn.edu.tw
Tel: 886-915396558  Fax: 886-6-3017001

Gwo-Haur Hwang
Information Management Department, Ling Tung University, Taichung, Taiwan 40852, R.O.C.

Ying Chan
Graduate Institute of Educational Policy and Leadership, Tamkang University, Tamsui, Taipei County, Taiwan 251, R.O.C.

ABSTRACT
To accurately analyze the problems students encounter in learning, the composed test sheets must meet multiple assessment criteria, such as the expected ratio of relevant concepts to be evaluated, the average degrees of discrimination and difficulty, and the estimated testing time. Furthermore, to precisely evaluate the improvement of students' learning performance over a period of time, a series of relevant test sheets needs to be composed. In this paper, a particle swarm optimization-based approach is proposed to improve the efficiency of composing near-optimal serial test sheets from very large item banks to meet multiple assessment criteria. From the experimental results, we conclude that the novel approach is effective in composing near-optimal serial test sheets from large item banks and hence can support the evaluation of students' learning status.

Keywords
Computer-assisted testing, serial test-sheet composing, particle swarm optimization, computer-assisted assessment

1. Introduction

As the efficiency and efficacy of computer-based tests have been confirmed by many early studies, many researchers in both technical and educational fields have engaged in the development of computerized testing systems (Fan et al., 1996; Olsen et al., 1986). Some researchers have even proposed computerized adaptive testing, which uses prediction methodologies to shorten the length of the test sheets without sacrificing their precision (Wainer, 1990). A well-scrutinized test helps teachers verify how well students have digested the relevant knowledge and skills and helps them recognize students' learning bottlenecks (Hwang et al., 2003a). In a computerized learning environment, which provides students with greater flexibility during the learning process, information concerning the students' learning status is even more important (Hwang, 2003a).

The key to a good test depends not only on the subjective appropriateness of the test items, but also on the way the test sheet is constructed. To continuously evaluate the learning performance of a student, it is usually more desirable to compose a series of relevant test sheets that meet a predefined set of assessment criteria, such that test sheets in the same series contain no identical test items (or only an acceptable percentage of overlapping test items). Because the number of test items in an item bank is usually large and the number of feasible combinations for forming test sheets thus grows exponentially, composing an optimal test sheet takes an enormous amount of time (Garey & Johnson, 1979).
Previous investigations have shown that a near-optimal solution is difficult to find when the number of candidate test items exceeds five thousand (Hwang et al., 2003b), not to mention the composition of a series of relevant test sheets from larger item banks for evaluating the improvement of students' learning performance over a period of time.

To cope with the problem of composing optimal serial test sheets from large item banks, a particle swarm optimization (PSO)-based algorithm (Kennedy and Eberhart, 1995) is proposed to optimize the selection of test items for composing serial test sheets. By employing this novel approach, the allocation of test items in each of the serial test sheets will meet multiple criteria, including the expected testing time, the degree of difficulty, the expected ratio of unit concepts, and the acceptable percentage of overlapping test items among test sheets, while approximating the optimal allocation. Based on this approach, an Intelligent Tutoring, Evaluation and Diagnosis (ITED III) system has been developed. Experimental results indicate that the proposed approach is efficient and effective in generating near-optimal compositions of serial test sheets that satisfy the specified requirements.

2. Background and Relevant Research

In recent years, researchers have developed various computer-assisted testing systems to more precisely evaluate students' learning status. For example, Feldman and Jones (1997) attempted to perform semi-automatic testing of student software using Unix systems; Rasmussen et al. (1997) proposed a system to evaluate student learning status on computer networks while taking Feldman and Jones' progress into consideration. Additionally, Chou (2000) proposed the CATES system, an interactive testing system developed in a collective and collaborative project with theoretical and practical research on complex technology-dependent learning environments.

Unfortunately, although many computer-assisted testing systems have been proposed, few of them have addressed the problem of finding a systematic approach for composing test sheets that satisfy multiple assessment requirements. Most of the existing systems construct a test sheet by manually or randomly selecting test items from their item banks. Such manual or random test item selection strategies are inefficient and unable to meet multiple assessment requirements simultaneously. Previous investigations have shown that a well-constructed test sheet not only helps in the evaluation of students' learning status, but also facilitates the diagnosis of the problems embedded in the learning process (Hwang, 2003a; Hwang et al., 2003a; Hwang, 2005).
Selecting proper test items is critical to constituting a test sheet that meets multiple assessment criteria, including the expected time needed for answering the test sheet, the number of test items, the specified distribution of course concepts to be learned, and, most importantly, the maximization of the average degree of discrimination (Hwang et al., 2005). Since satisfying multiple requirements (or constraints) when selecting test items is difficult, most computerized testing systems generate test sheets in a random fashion. Hwang et al. (2003b) formulated a multiple-criteria test sheet-composing problem as a dynamic programming model (Hillier and Lieberman, 2001) that minimizes the distance between the parameters (e.g., discrimination, difficulty, etc.) of the generated test sheets and the objective values, subject to the distribution of concept weights. A critical issue arising from the use of a dynamic programming approach is the exceedingly long execution time required for producing optimal solutions. As the time complexity of the dynamic programming algorithm is exponential in the size of the input, the execution time becomes unacceptably long when the number of candidate test items is large. Consequently, Hwang et al. (2005) attempted to solve the test sheet-composing problem by optimizing the discrimination degree of the generated test sheets within a specified range of assessment time and under several other constraints.

Nevertheless, in developing an e-learning system, it is necessary to conduct a long-term assessment of each student; that is, optimizing a single test sheet is not enough for such long-term observation of the student. Therefore, a series of relevant test sheets satisfying multiple assessment criteria needs to be composed for such continuous evaluation of learning performance. As this problem is much more difficult than that of composing a single test sheet, a more efficient and effective approach is needed. In this paper, a particle swarm optimization-based algorithm is proposed to find quality approximate solutions in an acceptable time. A series of experiments will also be presented to show the performance of the novel approach.

3. Problem Description

In this section, a mixed integer programming model (Linderoth and Savelsbergh, 1999) is presented to formulate the underlying problem. In order to conduct a long-term observation of the student's learning status, a series of K relevant test sheets will be composed. The model aims at minimizing the difference between the average difficulty of each test sheet and the specified difficulty target, within a specified range of assessment time and under several other constraints.

Assume K serial test sheets with a specific difficulty degree are to be composed out of a test bank consisting of N items, $Q_1, Q_2, \ldots, Q_N$. To compose test sheet k, $1 \le k \le K$, a subset of $n_k$ candidate test items will be selected. Assume that in total M concepts $C_j$, $1 \le j \le M$, are involved in the K tests; each test item is relevant to one or more of them. For example, to test the multimedia knowledge of students, $C_j$ might be "MPEG", "Video Streaming" or "Video-on-Demand". We shall call this problem the STSC (Serial Test Sheets Composition) problem. In the STSC problem, we also need to confine the similarity between each pair of tests.
Such a constraint, imposed on each pair of tests k and l, $1 \le k, l \le K$, is specified by a parameter f, which indicates that any two tests can have at most f items in common. The variables used in this model are given as follows:

- Decision variables $x_{ik}$, $1 \le i \le N$, $1 \le k \le K$: $x_{ik}$ is 1 if test item $Q_i$ is included in test sheet k, and 0 otherwise.
- Coefficient $d_i$, $1 \le i \le N$: degree of difficulty of $Q_i$.
- Coefficient D: target difficulty level for each of the serial test sheets generated.
- Coefficient $r_{ij}$, $1 \le i \le N$, $1 \le j \le M$: degree of association between $Q_i$ and concept $C_j$.
- Coefficient $t_i$, $1 \le i \le N$: expected time needed for answering $Q_i$.
- Right-hand side $h_j$, $1 \le j \le M$: lower bound on the expected relevance of $C_j$ for each of the K test sheets.
- Right-hand side l: lower bound on the expected time needed for answering each of the K test sheets.
- Right-hand side u: upper bound on the expected time needed for answering each of the K test sheets.
- Right-hand side f: the maximum number of identical test items between any two composed test sheets.

Formal definition of the STSC model:

Minimize
$$Z_k = \left( \sum_{k=1}^{K} \left| \frac{\sum_{i=1}^{N} d_i x_{ik}}{\sum_{i=1}^{N} x_{ik}} - D \right|^{p} \right)^{1/p}$$

subject to:

$$\sum_{i=1}^{N} r_{ij} x_{ik} \ge h_j, \quad 1 \le j \le M, \ 1 \le k \le K; \quad (1)$$

$$\sum_{i=1}^{N} t_i x_{ik} \ge l, \quad 1 \le k \le K; \quad (2)$$

$$\sum_{i=1}^{N} t_i x_{ik} \le u, \quad 1 \le k \le K; \quad (3)$$

$$\sum_{i=1}^{N} x_{ij} x_{ik} \le f, \quad 1 \le j \ne k \le K. \quad (4)$$

In the above formulation, constraint set (1) indicates that the selected test items in each generated test sheet must have a total relevance no less than the expected relevance to each concept to be covered. Constraint sets (2) and (3) indicate that the total expected test time of each generated test sheet must lie within its specified range. Constraint set (4) indicates that no pair of test sheets can contain more than f identical test items.

The objective function
$$Z_k = \left( \sum_{k=1}^{K} \left| \frac{\sum_{i=1}^{N} d_i x_{ik}}{\sum_{i=1}^{N} x_{ik}} - D \right|^{p} \right)^{1/p}$$
is the p-norm of the deviation of the average difficulty degree of each test sheet from the target difficulty degree specified by the teacher. In particular, the objective function gives the absolute distance when p = 1 and the root squared distance when p = 2. Therefore, the objective of this model is to select test items such that the average difficulty of each generated test sheet is as close as possible to the target difficulty value D. Without loss of generality, we let p = 2 in the simulations.

The computational complexity of obtaining the optimal solution to the STSC problem is analyzed as follows. The number of possible combinations of test items for composing a single test sheet is $\sum_{i \in \Omega} \binom{N}{i}$, where $\Omega$ is the range for the number of test items that could be answered within the specified time frame [l, u], while the parameters $h_j$ and f further restrict the number of feasible solutions among those combinations. Composing K serial test sheets thus requires a computational complexity of $O\!\left(\left(\sum_{i \in \Omega} \binom{N}{i}\right)^{K}\right)$, which is extremely high. Hence, seeking the optimal solution to the STSC problem is computationally prohibitive.
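To make the formulation concrete, the following Python sketch evaluates the objective and checks constraint sets (1)-(4) for a given binary assignment. It is a minimal restatement of the model above, not the authors' implementation; the function names (stsc_objective, stsc_feasible) and the list-of-lists data layout are illustrative assumptions.

```python
# Sketch of the STSC model (Section 3); names and data layout are illustrative only.
# x[i][k] = 1 if item Q_i is placed on test sheet k; d, r, t, h, D, l, u, f follow
# the paper's notation. Each sheet is assumed to contain at least one item.

def stsc_objective(x, d, D, p=2):
    """p-norm deviation of each sheet's average difficulty from the target D."""
    N, K = len(x), len(x[0])
    total = 0.0
    for k in range(K):
        selected = [i for i in range(N) if x[i][k] == 1]
        avg_difficulty = sum(d[i] for i in selected) / len(selected)
        total += abs(avg_difficulty - D) ** p
    return total ** (1.0 / p)

def stsc_feasible(x, r, t, h, l, u, f):
    """Check constraint sets (1)-(4) for a candidate assignment x."""
    N, K = len(x), len(x[0])
    M = len(h)
    for k in range(K):
        # (1) concept relevance lower bounds for every concept C_j
        for j in range(M):
            if sum(r[i][j] * x[i][k] for i in range(N)) < h[j]:
                return False
        # (2), (3) total expected test time within [l, u]
        total_time = sum(t[i] * x[i][k] for i in range(N))
        if not (l <= total_time <= u):
            return False
    # (4) at most f common items between any two sheets
    for k1 in range(K):
        for k2 in range(k1 + 1, K):
            if sum(x[i][k1] * x[i][k2] for i in range(N)) > f:
                return False
    return True
```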
4. PSO-based Algorithm for Serial Test Sheet Composition

Linderoth and Savelsbergh (1999) conducted a comprehensive computational study showing that mixed integer programming problems are NP-hard, which implies that composing optimal serial test sheets from a large item bank is computationally prohibitive. To cope with this difficulty, a particle swarm optimization (PSO)-based algorithm is proposed to find quality approximate solutions within a reasonable time.

A. STSCPSO (Serial Test Sheets Composition with PSO) Algorithm

The PSO algorithm was developed by Kennedy and Eberhart (1995). It is a biologically inspired algorithm which models the social dynamics of bird flocking and fish schooling. Ethologists have found that swarms of birds or schools of fish flock synchronously, change direction suddenly, scatter and regroup iteratively, and finally settle on a common target. The collective intelligence of the individuals not only increases the success rate of food foraging but also expedites the process. The PSO algorithm uses simple rules simulating bird flocking and fish schooling and can serve as an optimizer for nonlinear functions. Kennedy and Eberhart (1997) further presented a discrete binary version of PSO for combinatorial optimization, in which the particles are represented by binary vectors of length d and the velocity represents the probability that a decision variable will take the value 1. PSO has delivered many successful applications (Eberhart and Shi, 1998; Yoshida et al., 1999; Shigenori et al., 2003). The convergence and parameterization aspects of PSO have also been discussed thoroughly (Clerc and Kennedy, 2002; Trelea, 2003). In the following, a PSO-based algorithm, STSCPSO (Serial Test Sheets Composition with PSO), is proposed to find quality approximate solutions for the STSC problem.

Input: N test items $Q_1, Q_2, \ldots, Q_N$, M concepts $C_1, C_2, \ldots, C_M$, the target difficulty level D, and the number of required test sheets, K.

Step 1. Generate initial swarm

Since all decision variables of the STSC problem take binary values (either 0 or 1), a particle in the STSCPSO algorithm can be represented by
$$x = [\, x_{11} x_{21} \cdots x_{N1} \;\; x_{12} x_{22} \cdots x_{N2} \;\; \cdots \;\; x_{1K} x_{2K} \cdots x_{NK} \,],$$
which is a vector of NK binary bits, where $x_{ik}$ is 1 if test item $Q_i$ is included in test sheet k and 0 otherwise. Due to constraints (2) and (3) on test time, the number of selected items in any test sheet is bounded within $[\, l / \max_{i=1 \sim N}\{t_i\},\; u / \min_{i=1 \sim N}\{t_i\} \,]$. Hence, we enforce the integrity rule
$$l / \max_{i=1 \sim N}\{t_i\} \;\le\; \sum_{i=1}^{N} x_{ik} \;\le\; u / \min_{i=1 \sim N}\{t_i\}, \quad \forall k = 1, 2, \ldots, K,$$
during every step of the algorithm. To generate the initial swarm, we randomly determine the number of items for each test sheet according to the integrity rule. The selection probability of each item follows the selection rule, which gives higher selection probability to items whose difficulty level is closer to the target. In particular, the selection probability of item $Q_i$ is defined as $(S - |d_i - D|)/S$, where S is a constant. As such, the initial swarm contains solutions that have good objective values but may violate some of the constraint sets. The particle swarm then evolves toward quality solutions that not only optimize the objective function but also meet all of the constraint sets.

Step 2. Fitness evaluation of particles

The original objective function of the STSC problem measures the quality of a candidate solution that meets all the constraints (1)-(4). However, the particles generated by the PSO-based algorithm may violate one or more of these constraints. To cope with this problem, the merit of a particle is evaluated by incorporating penalty terms into the objective function whenever a constraint is violated. The penalty terms corresponding to the separate constraints are described as follows.

- $\alpha$ penalty for violating the concept relevance bound constraint:
$$\alpha = \sum_{k=1}^{K} \sum_{j=1}^{M} \max\!\left( h_j - \sum_{i=1}^{N} r_{ij} x_{ik},\; 0 \right).$$
This term sums up the relevance deficit of the selected test items with respect to the specified relevance lower bound of each concept, over all test sheets.

- $\beta$ penalty for violating the test time bound constraint:
$$\beta = \sum_{k=1}^{K} \left( \max\!\left( l - \sum_{i=1}^{N} t_i x_{ik},\; 0 \right) + \max\!\left( 0,\; \sum_{i=1}^{N} t_i x_{ik} - u \right) \right).$$
This term penalizes the cases where the expected test time falls below the specified lower bound or exceeds the upper bound.

- $\gamma$ penalty for violating the common item constraint:
$$\gamma = \sum_{j \ne k} \max\!\left( \sum_{i=1}^{N} x_{ij} x_{ik} - f,\; 0 \right).$$
This term penalizes the case where the number of common items between two different tests exceeds the threshold f.

- Function $J(\cdot)$ for evaluating the fitness of a particle x:
$$\text{Minimize } J(x) = Z_k + w_1 \alpha + w_2 \beta + w_3 \gamma,$$
where $w_1$, $w_2$, and $w_3$ denote the relative weights of the three penalty terms. As such, the fitness of a particle x accounts for both quality (objective value) and feasibility (penalty terms). The smaller the fitness value, the better the particle.
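As a schematic restatement of Step 2, the Python sketch below computes the penalized fitness J(x); the function name penalized_fitness and the data layout are our illustrative assumptions, and the code mirrors the formulas above rather than reproducing the authors' C# implementation.

```python
# Sketch of the penalized fitness J(x) = Z_k + w1*alpha + w2*beta + w3*gamma.
# x[i][k], d, r, t, h, D, l, u, f follow the paper's notation; names are illustrative.

def penalized_fitness(x, d, r, t, h, D, l, u, f, w=(0.01, 0.01, 0.01), p=2):
    N, K = len(x), len(x[0])
    M = len(h)

    # Objective Z_k: p-norm deviation of each sheet's average difficulty from D
    z = 0.0
    for k in range(K):
        items = [i for i in range(N) if x[i][k]]
        avg_d = sum(d[i] for i in items) / len(items)
        z += abs(avg_d - D) ** p
    z = z ** (1.0 / p)

    # alpha: relevance deficits over all sheets and concepts
    alpha = sum(max(h[j] - sum(r[i][j] * x[i][k] for i in range(N)), 0)
                for k in range(K) for j in range(M))

    # beta: test-time shortfall below l plus excess above u
    beta = 0.0
    for k in range(K):
        time_k = sum(t[i] * x[i][k] for i in range(N))
        beta += max(l - time_k, 0) + max(0, time_k - u)

    # gamma: overlap beyond f common items, summed over ordered pairs of sheets
    gamma = sum(max(sum(x[i][k1] * x[i][k2] for i in range(N)) - f, 0)
                for k1 in range(K) for k2 in range(K) if k1 != k2)

    w1, w2, w3 = w
    return z + w1 * alpha + w2 * beta + w3 * gamma
```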
Step 3. Determination of pbest_i and gbest using the bounding criterion

In the original PSO, the fitness evaluation of the particles, which is required for determining pbest_i and gbest, is the most time-consuming part. Here we propose a bounding criterion to speed up the process. We observe that the fitness value of a particle is used only for determining pbest_i and gbest, not directly for the velocity update. Since $Z_k$ and $J(\cdot)$ accumulate nonnegative terms, their intermediate values increase monotonically during evaluation; we can therefore use the fitness of the incumbent pbest_i as a bound and terminate the fitness evaluation of the i-th particle as soon as the intermediate fitness value exceeds this bound. Moreover, only those pbest_i that have been updated at the current iteration need to be compared with gbest for its possible update. The bounding criterion can reduce the computational time significantly.

Step 4. Update of velocities and particles

The updating of velocities and particle positions follows the discrete version of PSO; that is, the velocity is scaled into [0.0, 1.0] by a transformation function $S(\cdot)$ and is used as the probability with which the particle bit takes the value 1. In this paper, we adopt the linear interpolation function
$$S(v_{ij}) = \frac{v_{ij}}{2 v_{\max}} + 0.5$$
to transform velocities into probabilities.

B. An Illustrative Example

Herein, an illustrative example of the STSCPSO algorithm is provided. Assume that two test sheets with target difficulty level D = 0.5 are to be generated from 10 test items. The 10 test items are relevant to 3 concepts, and the relevance association ($r_{ij}$) between each test item and each concept is shown in Table 1. The estimated answering time ($t_i$) and difficulty degree ($d_i$) of the 10 test items are tabulated in Table 2. Let h1 = 2, h2 = 2, h3 = 1, l = 10, u = 16, f = 3, and w1 = w2 = w3 = 0.01. The algorithm proceeds as follows.

Table 1. Relevance association between each test item and each concept

        C1  C2  C3
Q1       1   0   0
Q2       0   1   0
Q3       0   0   1
Q4       0   1   0
Q5       0   0   1
Q6       0   1   0
Q7       1   0   0
Q8       1   1   1
Q9       0   0   0
Q10      0   1   0

Table 2. Estimated answering time and difficulty degree for each test item

        Q1   Q2   Q3   Q4   Q5   Q6   Q7   Q8   Q9   Q10
ti       4    3    3    5    3    2    4    3    5    4
di      0.5  0.9  0.1  0.7  0.4  0.5  0.2  0.6  0.3  0.5

Initial swarm generation

Let the algorithm proceed with a swarm of two particles.
To generate the initial swarm, the range for the feasible number of selected items in a test sheet is first determined using the integrity rule
$$l / \max_{i=1 \sim N}\{t_i\} \;\le\; \sum_{i=1}^{N} x_{ik} \;\le\; u / \min_{i=1 \sim N}\{t_i\}.$$
Hence, each test sheet can select 2 to 8 items from the test item bank. According to our particle representation scheme, each particle is represented as a binary vector with 20 bits and is generated based on the selection rule $(S - |d_i - D|)/S$, which gives higher selection probability to the items whose difficulty level is closer to the target D. It is observed from Table 2 that test items Q1, Q6, and Q10 have the highest selection probability. With the integrity and selection rules, the initial swarm can be generated as shown in the first generation of Figure 1. Particle 1 selects items Q1, Q6, and Q8 for the first test sheet, and chooses Q1, Q4, Q5, Q8, and Q10 for the second test sheet. As for particle 2, the first test sheet consists of Q2, Q4, Q8, and Q10, and the second test sheet is composed of Q1, Q3, Q7, and Q9.

Generation 1:
              test sheet 1   test sheet 2   Zk      α   β   γ   J
  particle 1  1000010100     1001100101     0.05    0   3   0   0.08
  particle 2  0101000101     1010001010     0.285   3   0   0   0.315

Generation 2:
              test sheet 1   test sheet 2   Zk      α   β   γ   J
  particle 1  0100010100     1000100101     >0.08   -   -   -   -
  particle 2  1000000100     1000101101     0.078   1   5   0   0.138

Figure 1. Swarm evolution and the corresponding fitness evaluation

Particle fitness evaluation

The particle fitness evaluation function $J(x) = Z_k + w_1 \alpha + w_2 \beta + w_3 \gamma$ consists of the objective value and the penalty terms, which can be easily computed. In particular, particle 1 attains an objective value of 0.05 and incurs a β penalty of 3 because the expected test time exceeds the upper limit, resulting in a fitness value of 0.08. Particle 2, meanwhile, has an objective value of 0.285 and incurs an α penalty of 3 due to its deficit of concept relevance. The fitness of particle 2 is thus 0.315. For the initial swarm, the personal best experience (pbest_i) is the current particle itself. Since the fitness value of particle 1 is smaller, it is taken as gbest. The fitness values of pbest_i and gbest will be used as bounds to expedite the process in the next generation.

Update of velocities and particles

For particle 1, the incumbent particle is equivalent to both pbest_i and gbest, resulting in the same v_ij values as before and thus the same probabilities; only a very small number of bits will be changed. Assume that particle 1 replaces Q1 with Q2 in the first test sheet and removes Q4 from the second test sheet (see generation 2 in Figure 1). For particle 2, pbest_i is the particle itself, but gbest is particle 1, so v_ij will be changed at the bits where pbest_i and gbest differ. In essence, particle 2 will be dragged, to some extent, toward particle 1 in the discrete space. Assume that particle 2 becomes [10000001001000101101].

Use of the bounding criterion for determining pbest_i and gbest

Now we proceed with the fitness evaluation of the two particles. For particle 1, we find that the intermediate objective value during the computation has already exceeded the current bound (0.08); hence, the computation is terminated according to the bounding criterion, and there is no need to derive the penalty terms. As such, the computational time is significantly reduced. As for particle 2, it attains an objective value of 0.078 and incurs α and β penalties of 1 and 5, resulting in a fitness of 0.138. Compared with the incumbent pbest_2 (fitness = 0.315), the fitness is improved, so pbest_2 is updated to the current particle 2, while gbest remains unchanged (fitness = 0.08). The STSCPSO algorithm iterates this process until a given maximum number of iterations has elapsed, and gbest is then reported as the best solution found by the algorithm.
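For readers who want to see how the discrete update of Step 4 can be realized, the sketch below follows the binary PSO of Kennedy and Eberhart (1997) with the linear transform S(v) = v/(2 v_max) + 0.5 described above. The function names, the inertia and acceleration constants (w, c1, c2), the velocity limit V_MAX, and the clamping step are our illustrative assumptions, not values taken from the paper.

```python
import random

# Illustrative sketch of the discrete (binary) PSO update used in Step 4.
# Each particle is a list of N*K bits; velocities are real-valued and clamped
# to [-V_MAX, V_MAX]. The constants w, c1, c2, V_MAX are assumed for the sketch.

V_MAX = 4.0

def transform(v, v_max=V_MAX):
    # Linear interpolation S(v) = v / (2*v_max) + 0.5, mapping [-v_max, v_max] to [0, 1]
    return v / (2.0 * v_max) + 0.5

def update_particle(position, velocity, pbest, gbest, w=0.8, c1=2.0, c2=2.0):
    """One discrete-PSO step: update the velocities, then resample each bit."""
    new_velocity, new_position = [], []
    for x, v, pb, gb in zip(position, velocity, pbest, gbest):
        v_new = (w * v
                 + c1 * random.random() * (pb - x)
                 + c2 * random.random() * (gb - x))
        v_new = max(-V_MAX, min(V_MAX, v_new))      # clamp to [-V_MAX, V_MAX]
        new_velocity.append(v_new)
        # The transformed velocity is the probability that the bit takes value 1
        new_position.append(1 if random.random() < transform(v_new) else 0)
    return new_position, new_velocity
```

In the full algorithm, a repair step enforcing the integrity rule on the number of selected items per sheet would follow each such update, as described in Step 1.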
5. Experiment and Discussion

The STSCPSO approach has been applied to the development of an Intelligent Tutoring, Evaluation and Diagnosis (ITED III) system, which contains a large item bank for many science courses. The interface of the developed system and the experiments for evaluating the performance of the novel approach are given in the following subsections.

A. System Development

The teacher interface of ITED III provides step-by-step instructions to guide teachers in defining the goal and parameters of a test. In the first step, the teacher is asked to define the type and date/time of the test. The test type might be "certification" (for performing a formal test), "personal practice" (for performing a test based on the to-be-enhanced concepts of each student), or "group practice" (for performing a test to evaluate the computer knowledge of a group of students). Each student is asked to take the test using an assigned computer in a monitored computer room, and is allowed to answer the test sheet only within the specified test date/time range.

In the following steps, the teachers are asked to define the parameters for composing the test sheets, such as the lower and upper bounds of the expected test time (i.e., l and u) and the lower bound on the expected relevance of each concept or skill to be evaluated (i.e., hj). Figure 2 demonstrates the third step of defining a test, with the lower and upper bounds of the expected test time being 60 minutes and 80 minutes, respectively. Moreover, in this step, the teacher is asked to define the lower bounds on the expected relevance of the concepts for the test, which are all set to 0.9 in the given example.

Figure 2. Teacher interface for defining the goal and parameters of a test

The entire test sheet is presented in one Web page with a scroll bar for moving the page up and down. After submitting the answers to the test items, each student receives a scored test result and a personalized learning guide, which indicates the learning status of the student for each computer skill evaluated. Figure 3 is an illustrative example of a personalized learning guide for evaluating the skills of web-based programming. Such learning guidance has been shown to help students improve their learning performance if remedial teaching or practice is conducted accordingly (Hwang, 2003a; Hwang et al., 2003a; Hwang, 2005).

Figure 3. Illustrative example of a personalized learning guide

B. Experimental Design

To evaluate the performance of the proposed STSCPSO algorithm, a series of experiments has been conducted to compare the execution times and the solution quality of three competing approaches: the STSCPSO algorithm, Random Selection with Feasible Solution (RSFS), and exhaustive search. The RSFS program generates test sheets by selecting test items randomly to meet all of the constraints, while the exhaustive search program examines every feasible combination of the test items to find the optimal solution. The platform for the experiments is a personal computer with a Pentium IV 1.6 GHz CPU, 1 GB of RAM, and an 80 GB hard disk (5,400 RPM). The programs were coded in C#.
To analyze the comparative performance of the competing approaches, twelve item banks, with the number of candidate items ranging from 15 to 10,000, were constructed by randomly selecting test items from a computer skill certification test bank. Table 3 shows the features of each item bank.

Table 3. Description of the experimental item banks

Item bank   Number of test items   Average difficulty   Average expected answer time of each test item (minutes)
1           15                     0.761525             3.00000
2           20                     0.765460             3.25000
3           25                     0.770409             3.20000
4           30                     0.758647             2.93333
5           40                     0.720506             2.95000
6           250                    0.741738             2.94800
7           500                    0.746789             2.97600
8           1000                   0.751302             2.97200
9           2000                   0.746708             3.00450
10          4000                   0.747959             3.00550
11          5000                   0.7473007            2.99260
12          10000                  0.7503020            3.00590

The experiment was conducted by applying each approach twenty times to each item bank, with the objective values (Zk) and the average execution time recorded. The lower and upper bounds of the testing time are 60 and 120 minutes, respectively, and the maximal number of common test items between each pair of test sheets is 5. To make the solutions hard to obtain, we set the target difficulty level D = 0.5, which deviates substantially from the average difficulty of the item banks. The STSCPSO algorithm is executed with 10 particles for 100 generations. The execution time of RSFS is set to be the same as that of the STSCPSO algorithm, while the maximal execution time of the exhaustive search is set to 7 days to obtain the optimal solution.

Table 4. Experimental results

            STSCPSO                   RSFS                      Optimum solution
N           Average Zk  Time (sec)    Average Zk  Time (sec)    Zk      Average time
15          0.44        50            0.62        50            0.42    2 days
20          0.44        63            0.70        63            -       > 7 days
25          0.44        80            0.66        80            -       > 7 days
30          0.44        102           0.68        102           -       > 7 days
40          0.40        134           0.76        134           -       > 7 days
250         0.36        815           0.54        815           -       > 7 days
500         0.28        1805          0.68        1805          -       > 7 days
1000        0.22        3120          0.46        3120          -       > 7 days
2000        0.18        6403          0.60        6403          -       > 7 days
4000        0.16        12770         0.62        12770         -       > 7 days
5000        0.16        15330         0.54        15330         -       > 7 days
10000       0.14        21210         0.56        21210         -       > 7 days

Table 4 shows the experimental results of the objective values (Zk) and execution times of the three methods. We observe that the exhaustive search method can only obtain the optimal solution for the smallest test item bank (N = 15), since the computational complexity of the STSC problem is extremely high, as described in Section 3. As for the STSCPSO algorithm and the RSFS, approximate solutions for all of the test item banks can be obtained within reasonable times ranging from 50 seconds to 5.89 hours, but the solution quality delivered by the STSCPSO algorithm is significantly better. In particular, for N = 15, the objective value obtained by the STSCPSO algorithm is 0.44, which is very close to the optimal value (0.42), while the objective value obtained by the RSFS is 0.62. For the other cases with larger item banks, the superiority of the STSCPSO algorithm over the RSFS becomes more prominent as the size of the item bank increases.

Figure 4 shows the variations of the objective value obtained using the STSCPSO algorithm and the RSFS. The objective value derived by the RSFS fluctuates, while the objective value derived by the STSCPSO algorithm steadily decreases, since more candidate test items are available for constructing better solutions as the test bank grows larger.

Figure 4. Variations of the objective value as the size of the test item bank increases

Figure 5 shows the fitness value obtained by gbest of the STSCPSO algorithm as the number of generations increases for the test item bank with N = 1000. We observe that the global swarm intelligence improves, with a decreasing fitness value, as the evolution proceeds.
This validates the feasibility of the proposed particle representation and shows that the fitness function fits the STSC problem scenario.

Figure 5. Fitness value obtained by gbest as the number of generations increases (N = 1000)

To analyze the convergence behavior of the particles, we examine whether the swarm evolves toward the same optimization goal. We employ information entropy to measure the similarity convergence among the particles, as follows. Let $p_{ij}$ be the binary value of the j-th bit of the i-th particle, i = 1, 2, ..., R, and j = 1, 2, ..., NK, where R is the swarm size. We calculate $prob_j$ as the conditional probability that value one occurs at the j-th bit, given the total number of bits that take value one in the entire swarm:
$$prob_j = \frac{\sum_{i=1}^{R} p_{ij}}{\sum_{i=1}^{R} \sum_{h=1}^{NK} p_{ih}}.$$
The particle entropy is then defined as
$$\text{Entropy} = -\sum_{j=1}^{NK} prob_j \log_2 (prob_j).$$
The particle entropy is smaller if the probability distribution is denser. As such, the variation of the particle entropy during the swarm evolution measures the convergence of the similarity among the particles. If the particles are highly similar to one another, the non-zero $prob_j$ values will be high, resulting in a denser probability distribution and a lower entropy value. This also means that the swarm particles reach a consensus on which test items should be selected for composing the test sheets.

Figure 6 shows the variations of the particle entropy as the number of generations increases. It is observed that the entropy value drops drastically during the first 18 generations, since the particles exchange information by referring to the swarm's best solution. After this period, the entropy value remains relatively stable because good-quality solutions have been found and the particles are highly similar, meaning that the particles settle on the same high-quality solution as the swarm converges.

Figure 6. The particle entropy as the number of generations increases
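As a concrete restatement of this measure, the short Python sketch below computes prob_j and the particle entropy from a swarm of binary particles; the function name particle_entropy is ours, and the code simply mirrors the two formulas above. The usage example encodes the two generation-1 particles from the illustrative example of Section 4.

```python
import math

def particle_entropy(swarm):
    """Entropy of a swarm of binary particles (each particle is a list of N*K bits).

    prob_j is the fraction of all 1-bits in the swarm that occur at bit position j;
    the entropy is -sum_j prob_j * log2(prob_j), taken over positions with prob_j > 0.
    """
    total_ones = sum(sum(particle) for particle in swarm)
    num_bits = len(swarm[0])
    entropy = 0.0
    for j in range(num_bits):
        prob_j = sum(particle[j] for particle in swarm) / total_ones
        if prob_j > 0:
            entropy -= prob_j * math.log2(prob_j)
    return entropy

# Example: the two particles of generation 1 in Figure 1
swarm = [
    [1,0,0,0,0,1,0,1,0,0, 1,0,0,1,1,0,0,1,0,1],   # particle 1
    [0,1,0,1,0,0,0,1,0,1, 1,0,1,0,0,0,1,0,1,0],   # particle 2
]
print(particle_entropy(swarm))
```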
6. Conclusions and Future Work

In this paper, a particle swarm optimization-based approach is proposed to cope with the serial test sheet composition problem. The algorithm has been embedded in an intelligent tutoring, evaluation and diagnosis system with large-scale test banks that are accessible to students and instructors through the World Wide Web. To evaluate the performance of the proposed algorithm, a series of experiments was conducted to compare the execution time and the solution quality of three solution-seeking strategies on twelve item banks. Experimental results show that serial test sheets whose average difficulty is near-optimal with respect to a specified target value can be obtained within a reasonable time using the novel approach. For further application, collaborative projects with several local e-learning companies are under way, in which the present approach is used for the testing and assessment of students in elementary and junior high schools.

Acknowledgement

This study is supported in part by the National Science Council of the Republic of China under contract numbers NSC-94-2524-S-024-001 and NSC-94-2524-S-024-003.

References

Chou, C. (2000). Constructing a computer-assisted testing and evaluation system on the World Wide Web: The CATES experience. IEEE Transactions on Education, 43 (3), 266-272.

Clerc, M., & Kennedy, J. (2002). The particle swarm explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation, 6, 58-73.

Eberhart, R. C., & Shi, Y. (1998). Evolving artificial neural networks. In Proceedings of the International Conference on Neural Networks and Brain, PL5-PL13.

Fan, J. P., Tina, K. M., & Shue, L. Y. (1996). Development of knowledge-based computer-assisted instruction system. The 1996 International Conference Software Engineering: Education and Practice, Dunedin, New Zealand.

Feldman, J. M., & Jones, J. Jr. (1997). Semiautomatic testing of student software under Unix(R). IEEE Transactions on Education, 40 (2), 158-161.

Garey, M. R., & Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness, San Francisco, CA: Freeman.

Hillier, F. S., & Lieberman, G. J. (2001). Introduction to Operations Research, 7th Ed., New York: McGraw-Hill.

Hwang, G. J. (2003a). A concept map model for developing intelligent tutoring systems. Computers & Education, 40 (3), 217-235.

Hwang, G. J. (2003b). A Test Sheet Generating Algorithm for Multiple Assessment Criteria. IEEE Transactions on Education, 46 (3), 329-337.

Hwang, G. J., Hsiao, J. L., & Tseng, J. C. R. (2003a). A Computer-Assisted Approach for Diagnosing Student Learning Problems in Science Courses. Journal of Information Science and Engineering, 19 (2), 229-248.

Hwang, G. J., Lin, B. M. T., & Lin, T. L. (2003b). An effective approach to the composition of test sheets from large item banks. 5th International Congress on Industrial and Applied Mathematics, Sydney, Australia, July 7-11, 2003.

Hwang, G. J. (2005). A Data Mining Algorithm for Diagnosing Student Learning Problems in Science Courses. International Journal of Distance Education Technologies, 3 (4), 35-50.

Hwang, G. J., Yin, P. Y., Hwang, G. H., & Chan, Y. (2005). A Novel Approach for Composing Test Sheets from Large Item Banks to Meet Multiple Assessment Criteria. The 5th IEEE International Conference on Advanced Learning Technologies, Kaohsiung, Taiwan, July 5-8, 2005.

Kennedy, J., & Eberhart, R. C. (1995). Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, IV, 1942-1948.

Kennedy, J., & Eberhart, R. C. (1997). A discrete binary version of the particle swarm algorithm. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 4104-4108.

Linderoth, J. T., & Savelsbergh, M. W. P. (1999). A computational study of search strategies for mixed integer programming. INFORMS Journal on Computing, 11 (2), 173-187.

Olsen, J. B., Maynes, D. D., Slawson, D., & Ho, K. (1986). Comparison and equating of paper-administered, computer-administered and computerized adaptive tests of achievement. The Annual Meeting of the American Educational Research Association, California, April 16-20, 1986.

Rasmussen, K., Northrup, P., & Lee, R. (1997). Implementing Web-based instruction. In Web-Based Instruction, Khan, B. H. (Ed.), Englewood Cliffs, NJ: Educational Technology, 341-346.

Shigenori, N., Takamu, G., Toshiku, Y., & Yoshikazu, F. (2003). A hybrid particle swarm optimization for distribution state estimation. IEEE Transactions on Power Systems, 18, 60-68.

Trelea, I. C. (2003). The particle swarm optimization algorithm: convergence analysis and parameter selection. Information Processing Letters, 85, 317-325.

Wainer, H. (1990). Computerized Adaptive Testing: A Primer, Hillsdale, NJ: Lawrence Erlbaum Associates.

Yoshida, H., Kawata, K., Fukuyama, Y., & Nakanishi, Y. (1999). A particle swarm optimization for reactive power and voltage control considering voltage stability. In Proceedings of the International Conference on Intelligent System Application to Power Systems, 117-121.