Yin, P.-Y., Chang, K.-C., Hwang, G.-J., Hwang, G.-H., & Chan, Y. (2006). A Particle Swarm Optimization Approach to
Composing Serial Test Sheets for Multiple Assessment Criteria. Educational Technology & Society, 9 (3), 3-15.
A Particle Swarm Optimization Approach to Composing Serial Test Sheets
for Multiple Assessment Criteria
Peng-Yeng Yin and Kuang-Cheng Chang
Department of Information Management, National Chi Nan University, Pu-Li, Nan-Tou, Taiwan 545, R.O.C.
Gwo-Jen Hwang
Department of Information and Learning Technology, National University of Tainan, 33, Sec. 2
Shulin St.,Tainan city 70005, Taiwan, R.O.C.
gjhwang@mail.nutn.edu.tw
Tel: 886-915396558
Fax: 886-6-3017001
Gwo-Haur Hwang
Information Management Department, Ling Tung University, Taichung, Taiwan 40852, R.O.C.
Ying Chan
Graduate Institute of Educational Policy and Leadership, Tamkang University
Tamsui, Taipei County, Taiwan 251, R.O.C.
ABSTRACT
To accurately analyze students' learning problems, composed test sheets must meet multiple
assessment criteria, such as the ratio of relevant concepts to be evaluated, the average degree of
discrimination, the degree of difficulty, and the estimated testing time. Furthermore, to precisely evaluate the
improvement of a student's learning performance over a period of time, a series of relevant test sheets needs to be composed.
In this paper, a particle swarm optimization-based approach is proposed to improve the efficiency of
composing near-optimal serial test sheets from very large item banks subject to multiple assessment criteria.
From the experimental results, we conclude that the novel approach is effective in composing near-optimal
serial test sheets from large item banks and hence can support the evaluation of student learning status.
Keywords
Computer-assisted testing, serial test-sheet composing, particle swarm optimization, computer-assisted
assessment
1. Introduction
As the efficiency and efficacy of the deployment of computer-based tests have been confirmed by many early
studies, many researchers in both technical and educational fields have engaged in the development of
computerized testing systems (Fan et al., 1996; Olsen et al., 1986). Some researchers have even proposed
computerized adaptive testing, which uses prediction methodologies to shorten the length of the test sheets
without sacrificing their precision (Wainer, 1990).
A well-scrutinized test is helpful for teachers wanting to verify whether students have well digested the relevant
knowledge and skills, and for recognizing students' learning bottlenecks (Hwang et al., 2003a). In a computerized
learning environment, which provides students with greater flexibility during the learning process, information
concerning the student's learning status is even more important (Hwang, 2003a). The key to a good test depends
not only on the subjective appropriateness of test items, but also on the way the test sheet is constructed. To
continuously evaluate the learning performance of a student, it is usually more desirable to compose a series of
relevant test sheets to meet a predefined set of assessment criteria such that those test sheets in the same series
will not contain identical test items (or contain only an acceptable percentage of overlapped test items). Because
the number of test items in an item bank is usually large and the number of feasible combinations to form test
sheets thus grows exponentially, an optimal test sheet takes enormous time to build up (Garey & Johnson, 1979).
Previous investigation has even shown that a near-optimal solution is difficult to find when the number of
candidate test items is larger than five thousand (Hwang et al., 2003b), not to mention the composition of a series
ISSN 1436-4522 (online) and 1176-3647 (print). © International Forum of Educational Technology & Society (IFETS). The authors and the forum jointly retain the
copyright of the articles. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and that copies bear the full citation on the first page. Copyrights for components of this work owned by
others than IFETS must be honoured. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior
specific permission and/or a fee. Request permissions from the editors at kinshuk@ieee.org.
of relevant test sheets from larger item banks for evaluating the improvement of student’s learning performance
during a period of time.
To cope with the problem of composing optimal serial test sheets from large item banks, a particle swarm
optimization (PSO)-based algorithm (Kennedy and Eberhart, 1995) is proposed to optimize the selection of test
items for composing serial test sheets. By employing this novel approach, the allocation of test items in each of
the serial test sheets approximates the optimal allocation while meeting multiple criteria, including the expected
testing time, the degree of difficulty, the expected ratio of unit concepts, and the acceptable percentage of
overlapping test items among test sheets. Based on this approach, an Intelligent Tutoring, Testing and
Diagnostic (ITED III) system has been developed. Experimental results indicated that the proposed approach is
efficient and effective in generating near-optimal compositions of serial test sheets that satisfy the specified
requirements.
2. Background and Related Research
In recent years, researchers have developed various computer-assisted testing systems to more precisely evaluate
student’s learning status. For example, Feldman and Jones (1997) attempted to perform semi-automatic testing
of student software using Unix systems; Rasmussen, et al. (1997) proposed a system to evaluate student learning
status on computer networks while taking Feldman and Jones’ progress into consideration. Additionally, Chou
(2000) proposed the CATES system, which is an interactive testing system developed in a collective and
collaborative project with theoretical and practical research on complex technology-dependent learning
environments. Unfortunately, although many computer-assisted testing systems have been proposed, few of
them have addressed the problem of finding a systematic approach for composing test sheets that satisfy multiple
assessment requirements. Most of the existing systems construct a test sheet by manually or randomly selecting
test items from their item banks. Such manual or random test item selection strategies are inefficient and are
unable to meet multiple assessment requirements simultaneously.
Some previous investigations showed that a well-constructed test sheet not only helps in the evaluation of
student’s learning status, but also facilitates the diagnosis of the problems embedded in the learning process
(Hwang, 2003a; Hwang et al. 2003a; Hwang 2005). Selecting proper test items is critical to constituting a test
sheet that meets multiple assessment criteria, including the expected time needed for answering the test sheet, the
number of test items, the specified distribution of course concepts to be learned, and, most importantly, the
maximization of the average degree of discrimination (Hwang et al., 2005).
Since satisfying multiple requirements (or constraints) when selecting test items is difficult, most computerized
testing systems generate test sheets in a random fashion. Hwang et al. (2003b) proposed a multiple-criteria
test sheet-composing model, in which the problem is formulated as a dynamic programming model (Hillier and
Lieberman, 2001) to minimize the distance between the parameters (e.g., discrimination, difficulty, etc.) of the
generated test sheets and the objective values, subject to the distribution of concept weights. A critical issue arising from the use of a
dynamic programming approach is the exceedingly long execution time required for producing optimal
solutions. As the time-complexity of the dynamic programming algorithm is exponential in terms of input data,
the execution time will become unacceptably long if the number of candidate test items is large. Consequently,
Hwang et al. (2005) attempted to solve the test sheet-composing problem by optimizing the discrimination
degree of the generated test sheets with a specified range of assessment time and some other multiple constraints.
Nevertheless, in developing an e-learning system, it is necessary to conduct a long-term assessment of each
student; that is, optimizing a single test sheet is not enough for such long-term observation of the student.
Therefore, a series of relevant test sheets meeting multiple assessment criteria needs to be composed for such
continuous learning-performance evaluation. As the problem is much more difficult than that of composing a
single test sheet, a more efficient and effective approach is needed. In this paper, a particle swarm
optimization-based algorithm is proposed to find quality approximate solutions in an acceptable time. A series of
experiments will also be presented to show the performance of the novel approach.
3. Problem Description
In this section, a mixed integer programming model (Linderoth and Savelsbergh, 1999) is presented to formulate
the underlying problem. In order to conduct a long-term observation on the student’s learning status, a series of
K relevant test sheets will be composed. The model aims at minimizing the differences between the average
difficulty of each test sheet and the specified difficulty target, with a specified range of assessment time and
some other multiple constraints. Assume K serial test sheets with a specific difficulty degree will be composed
out of a test bank consisting of N items, Q1, Q2, …, QN. To compose test sheet k, 1 ≤ k ≤ K, a subset of nk
candidate test items will be selected. Assume that in total M concepts will be involved in the K tests. With the
specified course concepts to be learned, say Cj, 1 ≤ j ≤ M , each test item is relevant to one or more of them.
For example, to test the multimedia knowledge of students, Cj might be "MPEG", "Video Streaming" or
"Video-on-Demand". We shall call this problem the STSC (Serial Test Sheets Composition) problem. In the STSC
problem, we need to confine the similarity between each pair of tests. Such a constraint, imposed upon each pair
of tests k and l, 1 ≤ k, l ≤ K, is specified by the parameter f, which indicates that any two tests can have at most f
items in common. The variables used in this model are given as follows:
• Decision variables x_{ik}, 1 ≤ i ≤ N, 1 ≤ k ≤ K: x_{ik} is 1 if test item Q_i is included in test sheet k; 0 otherwise.
• Coefficient d_i, 1 ≤ i ≤ N: degree of difficulty of Q_i.
• Coefficient D: target difficulty level for each of the serial test sheets generated.
• Coefficient r_{ij}, 1 ≤ i ≤ N, 1 ≤ j ≤ M: degree of association between Q_i and concept C_j.
• Coefficient t_i, 1 ≤ i ≤ N: expected time needed for answering Q_i.
• Right-hand side h_j, 1 ≤ j ≤ M: lower bound on the expected relevance of C_j for each of the K test sheets.
• Right-hand side l: lower bound on the expected time needed for answering each of the K test sheets.
• Right-hand side u: upper bound on the expected time needed for answering each of the K test sheets.
• Right-hand side f: the maximum number of identical test items between two composed test sheets.
Formal definition of the STSC Model:
Minimize

  Z_k = \left( \sum_{1 \le k \le K} \left| \frac{\sum_{i=1}^{N} d_i x_{ik}}{\sum_{i=1}^{N} x_{ik}} - D \right|^p \right)^{1/p}

subject to:

  \sum_{i=1}^{N} r_{ij} x_{ik} \ge h_j,  1 ≤ j ≤ M, 1 ≤ k ≤ K;   (1)

  \sum_{i=1}^{N} t_i x_{ik} \ge l,  1 ≤ k ≤ K;   (2)

  \sum_{i=1}^{N} t_i x_{ik} \le u,  1 ≤ k ≤ K;   (3)

  \sum_{i=1}^{N} x_{ij} x_{ik} \le f,  1 ≤ j ≠ k ≤ K.   (4)
In the above formula, constraint set (1) indicates the selected test items in each generated test sheet must have a
total relevance no less than the expected relevance to each concept assumed to be covered. Constraint sets (2)
and (3) indicate that total expected test time of each generated test sheet must be in its specified range.
Constraint set (4) indicates that no pair of test sheets can contain more than f identical test items.
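As a concrete illustration, the four constraint sets can be checked programmatically for a candidate assignment. The following is a minimal sketch of the model's feasibility conditions; the function name `feasible`, the 0-based index layout, and the toy data in the usage below are our own illustrative assumptions, not details fixed by the paper:

```python
def feasible(x, r, t, h, l, u, f):
    """Check constraint sets (1)-(4) of the STSC model for assignment x,
    where x[i][k] = 1 iff item Q_{i+1} is on test sheet k+1 (0-based)."""
    N, K, M = len(x), len(x[0]), len(h)
    for k in range(K):
        # (1) per-concept relevance of each sheet must reach h_j
        for j in range(M):
            if sum(r[i][j] * x[i][k] for i in range(N)) < h[j]:
                return False
        # (2)-(3) expected answering time must lie in [l, u]
        time_k = sum(t[i] * x[i][k] for i in range(N))
        if not l <= time_k <= u:
            return False
    # (4) any two distinct sheets share at most f items
    for k in range(K):
        for m in range(k + 1, K):
            if sum(x[i][k] * x[i][m] for i in range(N)) > f:
                return False
    return True

# Toy instance: 4 items, 1 concept, 2 sheets of 2 items each.
r = [[1], [1], [1], [1]]
t = [2, 2, 2, 2]
ok = feasible([[1, 0], [1, 0], [0, 1], [0, 1]], r, t, [1], 2, 4, 1)
```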
In the objective function, Z_k = \left( \sum_{1 \le k \le K} \left| \frac{\sum_{i=1}^{N} d_i x_{ik}}{\sum_{i=1}^{N} x_{ik}} - D \right|^p \right)^{1/p} is the p-norm of the deviation of the average
difficulty degree of each test sheet from the target difficulty degree specified by the teacher. In particular, the
objective function indicates the absolute distance when p = 1, and it calculates the root squared distance when p
= 2. Therefore, the objective of this model seeks to select a number of test items such that the average difficulty
of each generated test sheet is closest to the target difficulty value D. Without loss of generality, we let p = 2 for
simulation.
The computation complexity for obtaining the optimal solution to the STSC problem is analyzed as follows. The
number of possible combinations of test items for composing a single test sheet is \sum_{i \in \Omega} \binom{N}{i}, where \Omega is the range
of the number of test items that could be answered within the specified time frame [l, u], while the parameters h_j
and f further restrict the number of feasible solutions among those combinations. For composing K serial test sheets, a
computation complexity of O\left( \left( \sum_{i \in \Omega} \binom{N}{i} \right)^K \right) is required, which is extremely high. Hence, seeking the optimal solution
to the STSC problem is computationally prohibitive.
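The size of this search space is easy to tabulate for small banks. The sketch below is our own illustration: the range Ω = {2, …, 8} is borrowed from the integrity-rule bounds of the later example, and the function name is invented for this sketch:

```python
from math import comb

def stsc_search_space(N, omega, K):
    """(sum over i in Omega of C(N, i)) ** K candidate K-tuples of sheets."""
    return sum(comb(N, i) for i in omega) ** K

# e.g. a 10-item bank, sheets of 2..8 items, K = 2 serial sheets:
space = stsc_search_space(10, range(2, 9), 2)   # 1002 ** 2 = 1_004_004
```

Even at N = 40 with this illustrative Ω, the count for K = 2 is already on the order of 10^16, which is why the experiments in Section 5 resort to approximate search for all but the smallest bank.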
4. PSO-based Algorithm for Serial Test Sheet Composition
Linderoth and Savelsbergh (1999) conducted a comprehensive computational study showing that mixed
integer programming problems are NP-hard, which implies that composing optimal serial test sheets from a large
item bank is computationally prohibitive. To cope with this difficulty, a particle swarm optimization (PSO)-based
algorithm is proposed to find quality approximate solutions within reasonable time.
A. STSCPSO (Serial Test Sheets Composition with PSO) Algorithm
The PSO algorithm was developed by Kennedy and Eberhart (1995). It is a biologically inspired algorithm
which models the social dynamics of bird flocking and fish schooling. Ethologists find that a swarm of
birds/fishes flock synchronously, change direction suddenly, scatter and regroup iteratively, and finally stop on a
common target. The collective intelligence from each individual not only increases the success rate for food
foraging but also expedites the process. The PSO algorithm employs simple rules simulating bird flocking and
fish schooling and can serve as an optimizer for nonlinear functions. Kennedy and Eberhart (1997) further
presented a discrete binary version of PSO for combinatorial optimization where the particles are represented by
binary vectors of length d and the velocity represents the probability that a decision variable will take the value
1. PSO has delivered many successful applications (Eberhart and Shi, 1998; Yoshida et al., 1999; Shigenori et
al., 2003). The convergence and parameterization aspects of the PSO have also been discussed thoroughly (Clerc
and Kennedy, 2002; Trelea, 2003).
In the following, a PSO-based algorithm, STSCPSO (Serial Test Sheets Composition with PSO approach), is
proposed to find quality approximate solutions for the STSC problem.
Input: N test items Q_1, Q_2, …, Q_N, M concepts C_1, C_2, …, C_M, the target difficulty level D, and the number of
required test sheets, K.
Step 1. Generate initial swarm
Since all decision variables of the STSC problem take binary values (either 0 or 1), a particle in the STSCPSO
algorithm can be represented by x = [x_{11} x_{21} \cdots x_{N1}\; x_{12} x_{22} \cdots x_{N2}\; \cdots\; x_{1K} x_{2K} \cdots x_{NK}], which is a vector of NK
binary bits where x_{ik} is 1 if test item Q_i is included in test sheet k and 0 otherwise. Due to
constraints (2) and (3) on the test time, the number of selected items in any test sheet is bounded within
[l / \max_{i=1..N} t_i,\; u / \min_{i=1..N} t_i]. Hence, we should enforce the integrity rule
l / \max_i t_i \le \sum_{i=1}^{N} x_{ik} \le u / \min_i t_i, for all k = 1, 2, ..., K, during every step of our algorithm. To generate the
initial swarm, we randomly determine the number of items for each test sheet according to the integrity rule. The
selection probability of each item is based on the selection rule, which gives higher selection probability to the
items whose difficulty level is closer to the target. In particular, the selection probability of item Q_i is defined as
(S - |d_i - D|)/S, where S is a constant. As such, the initial swarm contains solutions that have good objective
values but may violate constraint sets. Then the particle swarm evolves to quality solutions that not only
optimize the objective function but also meet all of the constraint sets.
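Step 1 can be sketched as follows. This is a hypothetical rendering: the default constant S, the helper name `initial_particle`, and the use of weighted sampling without replacement are our assumptions, not details fixed by the paper:

```python
import math
import random

def initial_particle(N, K, d, t, D, l, u, S=1.0, rng=random):
    """Generate one particle (an N x K bit matrix) obeying the integrity rule.
    Items whose difficulty d[i] is close to the target D are favoured via
    the selection rule (S - |d_i - D|) / S; S is an assumed constant."""
    lo = math.ceil(l / max(t))        # fewest items allowed per sheet
    hi = math.floor(u / min(t))       # most items allowed per sheet
    weights = [(S - abs(di - D)) / S for di in d]
    x = [[0] * K for _ in range(N)]
    for k in range(K):
        n_k = rng.randint(lo, hi)     # sheet size drawn per integrity rule
        chosen = set()
        while len(chosen) < n_k:      # weighted sampling without replacement
            chosen.add(rng.choices(range(N), weights=weights)[0])
        for i in chosen:
            x[i][k] = 1
    return x
```

With the data of the later illustrative example (l = 10, u = 16, answering times 2 to 5 minutes), every generated sheet contains between 2 and 8 items.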
Step 2. Fitness evaluation of particles
The original objective function of the STSC problem measures the quality of a candidate solution which meets all
the constraints (1)-(4). However, the particles generated by the PSO-based algorithm may violate one or more of
these constraints. To cope with this problem, the merit of a particle is evaluated by incorporating penalty terms
into the objective function if any constraint is violated. The penalty terms corresponding to separate constraints
are described as follows.
• α penalty for violating the concept relevance bound constraint:

  \alpha = \sum_{k=1}^{K} \sum_{j=1}^{M} \max\left( h_j - \sum_{i=1}^{N} r_{ij} x_{ik},\; 0 \right).

  This term sums up the relevance deficit of the selected test items with respect to the specified relevance
  lower bound of each concept, over all test sheets.

• β penalty for violating the test time bound constraint:

  \beta = \sum_{k=1}^{K} \left( \max\left( l - \sum_{i=1}^{N} t_i x_{ik},\; 0 \right) + \max\left( 0,\; \sum_{i=1}^{N} t_i x_{ik} - u \right) \right).

  This term penalizes the case where the expected test time of a sheet falls below the specified lower bound
  or exceeds the upper bound.

• γ penalty for violating the common item constraint:

  \gamma = \sum_{j \ne k} \max\left( \sum_{i=1}^{N} x_{ij} x_{ik} - f,\; 0 \right).

  This term penalizes the case where the number of common items between two different tests exceeds
  the threshold f.

• Function J(·) for evaluating the fitness of a particle x:

  Minimize J(x) = Z_k + w_1 \alpha + w_2 \beta + w_3 \gamma,

  where w_1, w_2, and w_3 denote the relative weights of the three penalty terms. As such, the fitness of a
  particle x accounts for both quality (objective value) and feasibility (penalty terms). The smaller the fitness
  value, the better the particle.
Step 3. Determination of pbesti and gbest using the bounding criterion
In the original PSO, the fitness evaluation of the particles, which is necessary for determining pbest_i and gbest,
is the most time-consuming part. Here we propose a bounding criterion to speed up the process. We observe that
the fitness value of a particle is only used for determining pbest_i and gbest, not directly for the velocity
update. Since Z_k and J(·) are accumulated as sums of nonnegative terms, their intermediate values increase
monotonically during evaluation; we can therefore use the fitness of the incumbent pbest_i as a fitness bound and
terminate the fitness evaluation of the ith particle as soon as the intermediate fitness value exceeds the bound.
Also, only those pbest_i that have been updated at the current iteration need to be compared with gbest for its
possible update. The use of the bounding criterion saves computational time significantly.
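The bounding criterion amounts to accumulating the fitness as a running sum of nonnegative contributions and aborting as soon as the total exceeds the incumbent pbest_i fitness. A minimal sketch; decomposing the fitness into a flat list of `terms` is our simplification:

```python
def bounded_fitness(terms, bound):
    """Accumulate nonnegative fitness contributions, aborting early once
    the running total exceeds `bound` (the incumbent pbest fitness).
    Returns (value, evaluated_fully)."""
    total = 0.0
    for term in terms:
        total += term
        if total > bound:
            return total, False   # pruned: cannot beat the incumbent
    return total, True

# usage: terms could be per-sheet deviations plus weighted penalties
value, complete = bounded_fitness([1.0, 2.0, 5.0], bound=4.0)
```

Only particles whose evaluation completes (and improves on pbest_i) can go on to update gbest, matching Step 3.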
Step 4. Update of velocities and particles
The updating of velocities and particle positions follows the discrete version of PSO; i.e., the velocity is scaled
into [0.0, 1.0] by a transformation function S(·) and is used as the probability with which the particle bit takes the
value 1. In this paper, we adopt the linear interpolation function S(v_{ij}) = \frac{v_{ij}}{2 v_{max}} + 0.5 to transform velocities
into probabilities.
C. An Illustrative Example
Herein, an illustrative example of the STSCPSO algorithm is provided. Assume that two test sheets with target
difficulty level D = 0.5 are to be generated from 10 test items. The 10 test items are relevant to 3
concepts, and the relevance association (r_ij) between each test item and each concept is shown in Table 1. The
estimated answering time (t_i) and difficulty degree (d_i) for the 10 test items are tabulated in Table 2. Let h1 = 2,
h2 = 2, h3 = 1, l = 10, u = 16, f = 3, and w1 = w2 = w3 = 0.01. The algorithm proceeds as follows.
Table 1. Relevance association between each test item and each concept

      Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10
C1    1   0   0   0   0   0   1   1   0   0
C2    0   1   0   1   0   1   0   1   0   1
C3    0   0   1   0   1   0   0   1   0   0
Table 2. Estimated answering time and difficulty degree for each test item

      Q1   Q2   Q3   Q4   Q5   Q6   Q7   Q8   Q9   Q10
ti    4    3    3    5    3    2    4    3    5    4
di    0.5  0.9  0.1  0.7  0.4  0.5  0.2  0.6  0.3  0.5
Initial swarm generation
Let the algorithm proceed with a swarm of two particles. To generate the initial swarm, the range of the feasible
number of selected items in a test sheet is first determined using the integrity rule:
l / \max_i t_i \le \sum_{i=1}^{N} x_{ik} \le u / \min_i t_i. Hence, each test sheet can select 2 to 8 items from the test item bank
(10/5 = 2 and 16/2 = 8). According to our particle representation scheme, each particle is represented as a binary
vector with 20 bits and is generated based on the selection rule (S - |d_i - D|)/S, which gives higher selection probability to the items
that have closer difficulty level to the target D. It is observed from Table 2 that test items Q1, Q6, and Q10 will
have the highest selection probability. With the integrity and selection rules, the initial swarm can be generated
as shown in the first generation in Figure 1. Particle 1 selects items Q1, Q6, and Q8 for the first test sheet, and
chooses Q1, Q4, Q5, Q8, and Q10 for the second test sheet. As for particle 2, the first test sheet consists of Q2, Q4,
Q8, and Q10 and the second test sheet is composed of Q1, Q3, Q7, and Q9.
Generation 1:
             test sheet 1   test sheet 2   Zk      α   β   γ   J
particle 1   1000010100     1001100101     0.05    0   3   0   0.08
particle 2   0101000101     1010001010     0.285   3   0   0   0.315

Generation 2:
particle 1   0100010100     1000100101     >0.08   -   -   -   -
particle 2   1000000100     1000101101     0.078   1   5   0   0.138

Figure 1. Swarm evolution and the corresponding fitness evaluation
Particle fitness evaluation
The particle fitness evaluation function J(x) = Z_k + w_1\alpha + w_2\beta + w_3\gamma consists of the objective value and the
penalty terms, which can be easily computed. In particular, particle 1 attains an objective value of 0.05 and incurs
a β penalty of 3 because the expected test time exceeds the upper limit, resulting in a fitness value of 0.08.
Particle 2 has an objective value of 0.285 and incurs an α penalty of 3 due to the deficit of concept relevance. The
fitness of particle 2 is thus 0.315. For the initial swarm, the personal best experience (pbesti) is the current
particle itself. Since the fitness value of particle 1 is smaller, it is considered as gbest. The fitness values of pbesti
and gbest will be used as bounds to expedite the process of next generation.
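The hand computation above can be verified mechanically. The sketch below re-implements J(x) from Step 2 over the data of Tables 1 and 2 (with each deficit clipped at zero) and reproduces the generation-1 fitness of particle 2; representing a sheet as a list of selected item indices is our own convenience:

```python
# Data from Tables 1 and 2 (0-based indices: item i is Q_{i+1}).
r = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1],
     [0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [0, 1, 0]]
t = [4, 3, 3, 5, 3, 2, 4, 3, 5, 4]                       # answering times
d = [0.5, 0.9, 0.1, 0.7, 0.4, 0.5, 0.2, 0.6, 0.3, 0.5]   # difficulties
h, l, u, f, D = [2, 2, 1], 10, 16, 3, 0.5
w1 = w2 = w3 = 0.01

def fitness(sheets, p=2):
    """J(x) = Z + w1*alpha + w2*beta + w3*gamma; each sheet is a list
    of selected item indices."""
    Z = sum(abs(sum(d[i] for i in s) / len(s) - D) ** p
            for s in sheets) ** (1.0 / p)
    alpha = sum(max(h[j] - sum(r[i][j] for i in s), 0)
                for s in sheets for j in range(len(h)))
    beta = sum(max(l - sum(t[i] for i in s), 0) +
               max(sum(t[i] for i in s) - u, 0) for s in sheets)
    gamma = sum(max(len(set(a) & set(b)) - f, 0)
                for ai, a in enumerate(sheets) for b in sheets[ai + 1:])
    return Z + w1 * alpha + w2 * beta + w3 * gamma

# particle 2, generation 1: bit strings 0101000101 and 1010001010
p2 = fitness([[1, 3, 7, 9], [0, 2, 6, 8]])   # about 0.315, as in Figure 1
```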
Update of velocities and particles
As for particle 1, the incumbent particle is equivalent to both pbest_i and gbest, resulting in the same v_ij values as
before and thus the same probabilities; only a very small number of bits will be changed. Assume that
particle 1 replaces Q1 by Q2 in the first test sheet and removes Q4 from the second test sheet (see generation 2 in
Figure 1). For particle 2, pbest_i is the particle itself, but gbest is equivalent to particle 1, so v_ij will be
changed at the bits where pbest_i and gbest differ. In essence, particle 2 will be dragged, to some extent,
toward particle 1 in the discrete space. Assume that particle 2 becomes [10000001001000101101].
Use of bounding criterion for determining pbesti and gbest
Now we proceed with the fitness evaluation of the two particles. For particle 1, we find that the
intermediate objective value during the computation has already exceeded the current bound (0.08); hence, the
computation is terminated according to the bounding criterion, and there is no need to derive the penalty terms. As
such, the computational time is significantly reduced. As for particle 2, it attains an objective value of 0.078 and
incurs α and β penalties of 1 and 5, respectively, resulting in a fitness of 0.138. Compared to the incumbent pbest_2
(fitness = 0.315), the fitness is improved, so pbest_2 is updated to the current particle 2, while gbest remains
unchanged (fitness = 0.08).
The STSCPSO algorithm iterates this process until a given maximum number of iterations has passed and gbest
is considered as the best solution found by the algorithm.
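Putting Steps 1 through 4 together, the overall control flow of STSCPSO follows a generic discrete binary PSO loop. The sketch below is a stand-alone illustration, not the ITED III implementation: the toy fitness (count of zero bits), the parameter values, and the function name `discrete_pso` are all our assumptions; in STSCPSO the fitness would be the penalized J(x) of Step 2, pruned by the bounding criterion of Step 3:

```python
import random

def discrete_pso(fitness, n_bits, n_particles=10, n_iters=100,
                 c1=2.0, c2=2.0, vmax=4.0, seed=0):
    """Generic discrete binary PSO (after Kennedy & Eberhart, 1997),
    using the linear velocity-to-probability map adopted in the paper.
    `fitness` is minimized; smaller is better."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    V = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pfit = [fitness(x) for x in X]
    g = min(range(n_particles), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_bits):
                # velocity pulled toward pbest and gbest, then clamped
                V[i][j] += (c1 * rng.random() * (pbest[i][j] - X[i][j])
                            + c2 * rng.random() * (gbest[j] - X[i][j]))
                V[i][j] = max(-vmax, min(vmax, V[i][j]))
                prob = V[i][j] / (2 * vmax) + 0.5   # linear S(v)
                X[i][j] = 1 if rng.random() < prob else 0
            fi = fitness(X[i])
            if fi < pfit[i]:
                pbest[i], pfit[i] = X[i][:], fi
                if fi < gfit:
                    gbest, gfit = X[i][:], fi
    return gbest, gfit

# Toy usage: drive a 12-bit string toward all ones.
best, val = discrete_pso(lambda x: x.count(0), n_bits=12)
```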
5. Experiment and Discussion
The STSCPSO approach has been applied to the development of an Intelligent Tutoring, Evaluation and
Diagnosis (ITED III) system, which contains a large item bank for many science courses. The interface of the
developed system and the experiments for evaluating the performance of the novel approach are given in the
following subsections.
A. System Development
The teacher interface of ITED III provides a step-by-step instruction to guide teachers in defining the goal and
parameters of a test. In the first step, the teacher is asked to define the type and date/time of the test. The test
type might be "certification" (for performing a formal test), "personal practice" (for performing a test based on the
to-be-enhanced concepts of each student), or "group practice" (for performing a test to evaluate the computer
knowledge of a group of students). Each student is asked to take the test, using an assigned computer in a
monitored computer room, and is allowed to answer the test sheet only within the specified test date/time range.
In the following steps, the teachers are asked to define the parameters for composing the test sheets, such as the
lower bound and upper bound of the expected test times (i.e., l and u), and the lower bound on the expected
relevance of each concept or skill to be evaluated (i.e., hj). Figure 2 demonstrates the third step for defining a test
with lower bound and upper bound of the expected test times being 60 minutes and 80 minutes, respectively.
Moreover, in this step, the teacher is asked to define the lower bounds on the expected relevance of the concepts
for the test, which are all set to 0.9 in the given example.
Figure 2. Teacher interface for defining the goal and parameters of a test
The entire test sheet is presented in one Web page with a scroll bar for moving the page up and down. After
submitting the answers for the test items, each student will receive a scored test result and a personalized
learning guide, which indicates the learning status of the student for each computer skill evaluated. Figure 3 is an
illustrative example of a personalized learning guide for evaluating the skills of web-based programming. Such
learning guidance has been shown to be helpful to students in improving their learning performance when
remedial teaching or practice is conducted accordingly (Hwang, 2003a; Hwang et al. 2003a; Hwang 2005).
Figure 3. Illustrative example of a personalized learning guide
B. Experimental Design
To evaluate the performance of the proposed STSCPSO algorithm, a series of experiments have been conducted
to compare the execution times and the solution quality of three competing approaches: STSCPSO algorithm,
Random Selection with Feasible Solution (RSFS), and exhaustive search. The RSFS program generates the test
sheet by selecting test items randomly to meet all of the constraints, while the exhaustive search program
examines every feasible combination of the test items to find the optimal solution. The platform of the
experiments is a personal computer with a Pentium IV 1.6 GHz CPU, 1 GB of RAM, and an 80 GB (5400 RPM) hard disk. The programs were coded in C#.
To analyze the comparative performances of the competing approaches, twelve item banks with number of
candidate items ranging from 15 to 10,000 were constructed by randomly selecting test items from a computer
skill certification test bank. Table 3 shows the features of each item bank.
Table 3. Description of the experimental item banks

Item bank   Number of test items   Average difficulty   Average expected answer time of each test item (minutes)
1           15                     0.761525             3.00000
2           20                     0.765460             3.25000
3           25                     0.770409             3.20000
4           30                     0.758647             2.93333
5           40                     0.720506             2.95000
6           250                    0.741738             2.94800
7           500                    0.746789             2.97600
8           1000                   0.751302             2.97200
9           2000                   0.746708             3.00450
10          4000                   0.747959             3.00550
11          5000                   0.7473007            2.99260
12          10000                  0.7503020            3.00590
The experiment is conducted by applying each approach twenty times to each item bank, with the objective
values (Z_k) and the average execution time recorded. The lower and upper bounds of the testing time are 60
and 120 minutes, respectively, and the maximal number of common test items between each pair of test sheets is
5. To make the solutions hard to obtain, we set the target difficulty level D = 0.5, which deviates substantially
from the average difficulty of the item banks. The STSCPSO algorithm is executed with 10 particles for 100
generations. The execution time of RSFS is set to be the same as that of the STSCPSO algorithm, while the maximal
execution time of the exhaustive search is set to 7 days to obtain the optimal solution.
Table 4. Experimental results

            STSCPSO               RSFS                  Optimum solution
N           Zk     Avg time (sec) Zk     Avg time (sec) Zk      Avg time
15          0.44   50             0.62   50             0.42    2 days
20          0.44   63             0.70   63             -       > 7 days
25          0.44   80             0.66   80             -       > 7 days
30          0.44   102            0.68   102            -       > 7 days
40          0.40   134            0.76   134            -       > 7 days
250         0.36   815            0.54   815            -       > 7 days
500         0.28   1805           0.68   1805           -       > 7 days
1000        0.22   3120           0.46   3120           -       > 7 days
2000        0.18   6403           0.60   6403           -       > 7 days
4000        0.16   12770          0.62   12770          -       > 7 days
5000        0.16   15330          0.54   15330          -       > 7 days
10000       0.14   21210          0.56   21210          -       > 7 days
Table 4 shows the experimental results of objective values (Z_k) and execution times using the three methods. We
observe that the exhaustive search method can only obtain the optimal solution for the smallest test item bank
with N = 15, since the computational complexity of the STSC problem is extremely high, as described in Section 3.
As for the STSCPSO algorithm and the RSFS, approximate solutions to all of the test item banks can be obtained
with reasonable times ranging from 50 seconds to 5.89 hours, but the solution quality delivered by the STSCPSO
algorithm is significantly better. In particular, for N = 15, the objective value obtained by the STSCPSO
algorithm is 0.44 which is very close to the optimal value (0.42), while the objective value obtained by the RSFS
is 0.62. For the other cases with larger item banks, the superiority of the STSCPSO algorithm over the RSFS
becomes more prominent as the size of the item banks increases. Figure 4 shows the variations of the objective
value obtained using the STSCPSO algorithm and the RSFS. The objective value derived by the RSFS fluctuates,
while the objective value derived by the STSCPSO algorithm steadily decreases, since a larger test bank offers
more candidate test items from which better solutions can be constructed.
Figure 4. Variations of the objective value as the size of the test item banks increases
Figure 5 shows the fitness value obtained by gbest of the STSCPSO algorithm as the number of generations
increases, for the test item bank with N = 1000. We observe that the global swarm intelligence improves, with a
decreasing fitness value, as the evolution proceeds. This validates that the proposed particle
representation and fitness function fit the STSC problem scenario.
Figure 5. Variations of the fitness value of gbest as the number of generations increases
To analyze the convergence behavior of the particles, we examine whether the swarm evolves toward the same
optimization goal. We propose an information-entropy measure of the similarity convergence among the
particles, as follows. Let p_ij be the binary value of the jth bit of the ith particle, i = 1, 2, …, R, and j = 1, 2, …,
NK, where R is the swarm size. We calculate prob_j, the proportion of ones occurring at the jth bit relative to the
total number of ones in the entire swarm, as follows.
prob_j = \frac{\sum_{i=1}^{R} p_{ij}}{\sum_{i=1}^{R} \sum_{h=1}^{NK} p_{ih}} .
The particle entropy can then be defined as
Entropy = -\sum_{j=1}^{NK} prob_j \log_2 (prob_j) .
The particle entropy is smaller when the probability distribution is denser. The variation of the particle entropy during the swarm evolution therefore measures the convergence in similarity among the particles. If the particles are highly similar to one another, the non-zero prob_j values are large, resulting in a denser probability distribution and a smaller entropy value. This also means that the swarm particles have reached a consensus about which test items should be selected for composing the test sheets.
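The entropy measure defined above can be computed directly from the swarm's bit matrix. The sketch below follows the paper's notation (p_ij, prob_j, R, NK); the two toy swarms are assumed examples chosen only to show that a converged swarm yields a smaller entropy than a diverse one.

```python
import math

def particle_entropy(swarm):
    """Entropy = -sum_j prob_j * log2(prob_j), where prob_j is the share
    of all one-valued bits in the swarm that fall at bit position j."""
    total_ones = sum(sum(p) for p in swarm)
    entropy = 0.0
    for j in range(len(swarm[0])):
        prob_j = sum(p[j] for p in swarm) / total_ones
        if prob_j > 0:  # 0 * log2(0) is taken as 0 by convention
            entropy -= prob_j * math.log2(prob_j)
    return entropy

# Diverse swarm: each particle selects a different bit, so the one-bits
# are spread uniformly over 4 positions (prob_j = 1/4 each).
diverse = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Converged swarm: all particles identical, one-bits concentrated on
# 2 positions (prob_j = 1/2 each).
converged = [[1, 1, 0, 0]] * 4

print(particle_entropy(diverse))    # 2.0
print(particle_entropy(converged))  # 1.0
```

As expected, the identical particles produce a denser probability distribution and hence a smaller entropy, which is the drop that Figure 6 tracks over the generations.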
Figure 6 shows the variations of the particle entropy as the number of generations increases. It is observed that the entropy value drops drastically during the first 18 generations, as the particles exchange information by referring to the swarm's best solution. After this period, the entropy value remains relatively stable because good-quality solutions have been found and the particles are highly similar, meaning the particles converge to the same high-quality solution as the swarm evolves.
Figure 6. The particle entropy as the number of generations increases
6. Conclusions and Future Work
In this paper, a particle swarm optimization-based approach is proposed to cope with the serial test sheet composition problem. The algorithm has been embedded in an intelligent tutoring, evaluation and diagnosis system with large-scale test banks that are accessible to students and instructors through the World-Wide Web. To evaluate the performance of the proposed algorithm, a series of experiments has been conducted to compare the execution time and the solution quality of three solution-seeking strategies on twelve item banks. The experimental results show that serial test sheets whose average difficulty is close to a specified target value can be obtained within reasonable time by employing the novel approach.
For further application, collaborative plans with several local e-learning companies are in progress, in which the present approach is applied to the testing and assessment of students in elementary and junior high schools.
Acknowledgement
This study is supported in part by the National Science Council of the Republic of China under contract numbers NSC-94-2524-S-024-001 and NSC-94-2524-S-024-003.
References
Chou, C. (2000). Constructing a computer-assisted testing and evaluation system on the World Wide Web: the CATES experience. IEEE Transactions on Education, 43 (3), 266-272.
Clerc, M., & Kennedy, J. (2002). The particle swarm explosion, stability, and convergence in a multidimensional
complex space. IEEE Transaction on Evolutionary Computation, 6, 58-73.
Eberhart, R. C., & Shi, Y. (1998). Evolving artificial neural networks. In Proceedings of the International
Conference on Neural Networks and Brain, PL5-PL13.
Fan, J. P., Tina, K. M., & Shue, L. Y. (1996). Development of knowledge-based computer-assisted instruction system. The 1996 International Conference on Software Engineering: Education and Practice, Dunedin, New Zealand.
Feldman, J. M., & Jones, J. Jr. (1997). Semiautomatic testing of student software under Unix(R). IEEE
Transactions on Education, 40 (2), 158-161.
Garey, M. R., & Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness, San Francisco, CA: W. H. Freeman.
Hillier, F. S., & Lieberman, G. J. (2001). Introduction to Operations Research, 7th Ed., New York: McGraw-Hill.
Hwang, G. J. (2003a). A concept map model for developing intelligent tutoring systems. Computers &
Education, 40 (3), 217-235.
Hwang, G. J. (2003b). A Test Sheet Generating Algorithm for Multiple assessment criteria. IEEE Transactions
on Education, 46 (3), 329-337.
Hwang, G. J., Hsiao, J. L., & Tseng, J. C. R. (2003a). A Computer-Assisted Approach for Diagnosing Student
Learning Problems in Science Courses. Journal of Information Science and Engineering, 19 (2), 229-248.
Hwang, G. J., Lin, B. M. T., & Lin, T. L. (2003b). An effective approach to the composition of test sheets from
large item banks. 5th International Congress on Industrial and Applied Mathematics, Sydney, Australia, 7-11
July, 2003.
Hwang, G. J. (2005). A Data Mining Algorithm for Diagnosing Student Learning Problems in Science Courses.
International Journal of Distance Education Technologies, 3 (4), 35-50.
Hwang, G. J., Yin, P. Y., Hwang, G. H., & Chan, Y. (2005). A Novel Approach for Composing Test Sheets from
Large Item Banks to Meet Multiple Assessment Criteria. The 5th IEEE International Conference on Advanced
Learning Technologies, Kaohsiung, Taiwan, July 5-8, 2005.
Kennedy, J., & Eberhart, R. C. (1995). Particle swarm optimization. In Proceedings of the IEEE International
Conference on Neural Networks, IV, 1942-1948.
Kennedy, J., & Eberhart, R. C. (1997). A discrete binary version of the particle swarm algorithm. In Proceedings
of the IEEE International Conference on Systems, Man, and Cybernetics, 4104-4108.
Linderoth, J. T., & Savelsbergh, M. W. P. (1999). A computational study of search strategies for mixed integer programming. INFORMS Journal on Computing, 11 (2), 173-187.
Olsen, J. B., Maynes, D. D., Slawson, D., & Ho, K. (1986). Comparison and equating of paper-administered,
computer-administered and computerized adaptive tests of achievement. The Annual Meeting of American
Educational Research Association, California, April 16-20, 1986.
Rasmussen, K., Northrup, P., & Lee, R. (1997). Implementing Web-based instruction. In Web-Based Instruction, Khan, B. H. (Ed.), Englewood Cliffs, NJ: Educational Technology, 341-346.
Shigenori, N., Takamu, G., Toshiku, Y., & Yoshikazu, F. (2003). A hybrid particle swarm optimization for
distribution state estimation. IEEE Transaction on Power Systems, 18, 60-68.
Trelea, I. C. (2003). The particle swarm optimization algorithm: convergence analysis and parameter selection.
Information Processing Letters, 85, 317-325.
Wainer, H. (1990). Computerized Adaptive Testing: A Primer, Lawrence Erlbaum Associates, Hillsdale, NJ.
Yoshida, H., Kawata, K., Fukuyama, Y., & Nakanishi, Y. (1999). A particle swarm optimization for reactive
power and voltage control considering voltage stability. In Proceedings of the International Conference on
Intelligent System Application to Power Systems, 117-121.