ITEM RESPONSE THEORY (IRT) AS A MEASUREMENT MODEL TO ASSESS
STUDENTS’ CHARACTERISTICS ON A BROAD SCALE: A STUDY ON A MULTIPLE INTELLIGENCES INSTRUMENT
ABSTRACT
The measurement model used so far to assess students’ characteristics is the classical test
theory model. Its weakness in analyzing characteristic instruments is that it is sample-dependent.
This weakness is addressed by Item Response Theory (IRT). IRT has the advantage of not being
sample-dependent, so it can be used on a population or on a broad scale. This study aims to develop
an IRT measurement model for a multiple intelligences (MI) instrument so that the instrument can
be used on a broader scale.
The study developed the MI instrument through the following steps: determining the
construct and specification of the instrument, writing items, trying out the instrument, and
analyzing the items using IRT. The item analysis estimated the item difficulty index, item
discrimination, and item fit. Parameters were estimated under the graded response model (GRM)
using the PARSCALE program. The resulting parameter estimates were then used to estimate the
ability parameter (θ) and the item information function.
The data analysis yielded the difficulty index, discrimination, and model fit of each item.
The item difficulty indices were generally at a medium level, and the item discrimination
indices were acceptable. However, not all items fit the graded response model. In addition,
the average item information function was 2.816 at an ability-scale value of θ = 1.273. Thus,
the MI items that met the model-fit requirement can be used to measure the characteristics of
students’ intelligences on a broad scale.
Objectives or Purposes
Measurement is the assignment of scores or numbers to learning outcomes and student
ability. There are two models of measurement theory: classical and modern. Classical theory
is often used to assess learning outcomes and to identify students' abilities. However, this
model has a weakness: its scores do not take item characteristics into account and depend on
the sample. This weakness is addressed by item response theory (IRT). Under this theory,
instrument characteristics do not depend on the characteristics of groups or samples, so the
instrument can be used on a population, because parameters are estimated for every person and
every item. Thus, the modern theory approach is considered more powerful than the classical theory.
Multiple intelligences (MI) is Gardner's theory of intelligence, which assumes that each
person has a different form of intelligence. He divides intelligence into 9 forms, namely linguistic,
logical-mathematical, spatial, musical, kinesthetic, interpersonal, intrapersonal, naturalistic, and
existential. MI instruments for determining students' intelligence profiles have been developed
by several experts. Because MI instruments can be used by students in many different places, IRT
analysis is needed so that they can be used more effectively. The purpose of this study was to
develop an MI instrument and analyze it with IRT.
Perspective(s) or Theoretical framework
Item response theory (IRT) was popularized by Hambleton and Swaminathan in 1985. IRT
models rest on a mathematical concept stating that the probability of a subject correctly
answering an item depends on the subject's ability and on the item's characteristics. A person
with a high level of ability (latent trait) is therefore expected to respond differently, with a
higher probability of a high response, than a person with low ability. With respect to item
characteristics, IRT analysis is invariant: item parameters do not depend on the ability level of
the sample, and vice versa. Thus subject scores will not change regardless of whether the subject
is in a high- or low-ability group.
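The invariance idea is easiest to see in the Rasch (1PL) model. There it takes an exact form, often called specific objectivity: the difference in log-odds of success between two persons is the same on every item, whatever the item's difficulty. The minimal sketch below checks this with hypothetical person and item values that do not come from this study. In the 2PL model, the gap is additionally scaled by each item's discrimination.

```python
import math

def rasch_prob(theta: float, b: float) -> float:
    """P(score = 1) under the Rasch (1PL) model for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def logit(p: float) -> float:
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))

# Hypothetical persons (high and low ability) and items of different difficulty.
theta_high, theta_low = 1.5, -0.5
for b in (-2.0, 0.0, 2.0):
    p_high, p_low = rasch_prob(theta_high, b), rasch_prob(theta_low, b)
    # Equals theta_high - theta_low (up to rounding) for every item difficulty b.
    gap = logit(p_high) - logit(p_low)
    print(f"b = {b:+.1f}: P(high) = {p_high:.3f}, P(low) = {p_low:.3f}, log-odds gap = {gap:.3f}")
```

The success probabilities themselves change from item to item. Under this model, however, the comparison between the two persons does not.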
IRT models can be grouped into two kinds: dichotomous and polytomous models.
The dichotomous model is used when the response of the person taking the test is dichotomous
data (scores 1 and 0): a score of 1 if the participant answers correctly, and 0 if the participant
fails or answers wrongly. Scores of 1 and 0 can also represent participants' responses, where a
score of 1 indicates a higher response than a score of 0. Many kinds of instruments use traditional
dichotomous models, including true-false tests and agree-disagree, appropriate-inappropriate, or yes-no items.
Dichotomous item response theory was developed using three logistic models: the one-parameter,
two-parameter, and three-parameter logistic models. The one-parameter logistic (1PL) model is
used when there is only one item parameter, the item difficulty index. The 1PL model is the
well-known Rasch model developed by Georg Rasch. Based on item difficulty, it provides unbiased,
efficient, and consistent estimates for person and item calibration. The two-parameter logistic
(2PL) model uses two parameters: the item difficulty index and the discrimination power. The
three-parameter logistic (3PL) model adds a guessing (pseudo-guessing) parameter to the item
difficulty index and discrimination.
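As a minimal sketch, the three logistic models can be written as one function in which the 2PL and 1PL are special cases of the 3PL. In it, `a` is discrimination, `b` is difficulty, `c` is the pseudo-guessing lower asymptote, and `D` is an optional scaling constant. These names and the parameter values used are illustrative assumptions, not taken from the paper.

```python
import math

def icc_3pl(theta: float, b: float, a: float = 1.0, c: float = 0.0, D: float = 1.0) -> float:
    """Probability of a correct response under the 3PL model.

    With c = 0 this is the 2PL model; with c = 0 and a = 1 it is the 1PL (Rasch) model.
    D is an optional scaling constant (1.7 is often used to approximate the normal
    ogive); D = 1 gives the plain logistic form.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

theta = 0.5
print(icc_3pl(theta, b=0.5))                # 1PL at theta == b: probability 0.5
print(icc_3pl(theta, b=0.0, a=1.8))         # 2PL: a larger a gives a steeper curve
print(icc_3pl(theta, b=0.5, a=1.2, c=0.2))  # 3PL at theta == b: c + (1 - c) / 2
```

With this parameterization, the guessing parameter raises the lower asymptote of the curve from 0 to `c`. The difficulty `b` still marks the curve's inflection point.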
Several IRT models are used for polytomous scoring, including the graded response model
(GRM), the modified graded response model (M-GRM), the partial credit model (PCM), the
generalized partial credit model (G-PCM), the rating scale model (RSM), and the nominal response
model (NRM) (Embretson & Reise, 2000: 95). The GRM and M-GRM are based on the 2PL model.
The PCM is based on the Rasch (1PL) model, and the G-PCM extends it with an item discrimination
parameter. The RSM uses Andrich's location-scale model, and the NRM is used for unordered
(nominal) response categories.
One polytomous IRT model that can be used for Likert scales is the graded response model
(GRM). This model was developed by Samejima in 1969 (Ostini, 2006: 61; van der Linden &
Hambleton, 1997: 86). The graded response model is used when participants' responses to an
item fall into ordered score categories that increase sequentially.
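A minimal sketch of Samejima's GRM for a single item follows, assuming the logistic form without a scaling constant. The discrimination a = 1.0 is hypothetical. The four thresholds reuse the average category values b1 to b4 reported in this study's Results, purely for illustration. Each boundary curve gives the probability of responding in category k or higher, and the probability of exactly category k is the difference between adjacent boundary curves. The function name `grm_category_probs` is illustrative.

```python
import math
from typing import List, Sequence

def grm_category_probs(theta: float, a: float, thresholds: Sequence[float], D: float = 1.0) -> List[float]:
    """Category probabilities for one item under Samejima's graded response model.

    thresholds must be strictly increasing (b_1 < ... < b_m); the item then has
    m + 1 ordered score categories 0..m.
    """
    if any(t2 <= t1 for t1, t2 in zip(thresholds, thresholds[1:])):
        raise ValueError("GRM thresholds must be strictly increasing")
    # Boundary (cumulative) probabilities P*(X >= k | theta), padded with 1 and 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-D * a * (theta - b))) for b in thresholds] + [0.0]
    # P(X = k | theta) is the difference between adjacent boundary curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Illustrative 5-category item: a = 1.0 (assumed), thresholds = averages reported in the Results.
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-3.481, -1.042, 1.088, 2.902])
print([round(p, 3) for p in probs], sum(probs))
```

Because the boundary curves are ordered, every category probability is positive and the probabilities sum to 1. As θ increases, probability mass shifts toward the higher categories.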
Methods, Techniques or Modes of Inquiry
The study developed the MI instrument through the following steps: determining the
construct and specification of the instrument, writing items, trying out the instrument, and
analyzing the items using IRT. The item analysis estimated the item difficulty index, item
discrimination, and item fit. Parameters were estimated under the graded response model (GRM)
using the PARSCALE program. The resulting parameter estimates were then used to estimate the
ability parameter (θ) and the item information function.
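The paper does not state which ability estimator was used to obtain θ. As an assumption, the sketch below shows one common option: an expected a posteriori (EAP) estimate of θ under the GRM, with a standard normal prior and a simple rectangular quadrature grid. The item parameters and response patterns are hypothetical. `grm_category_probs` and `eap_theta` are illustrative helper names, not PARSCALE functions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical item parameters: (discrimination a, increasing thresholds b_1..b_4),
# i.e. three 5-category items. These are NOT estimates from this study.
Item = Tuple[float, List[float]]
items: List[Item] = [
    (1.2, [-3.0, -1.0, 1.0, 3.0]),
    (0.9, [-3.5, -1.2, 0.9, 2.8]),
    (1.5, [-3.2, -0.8, 1.2, 3.0]),
]

def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """P(X = k | theta) for k = 0..m under the logistic GRM (no scaling constant)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def eap_theta(responses: Sequence[int], items: Sequence[Item],
              n_points: int = 81, lo: float = -4.0, hi: float = 4.0) -> Tuple[float, float]:
    """EAP estimate of theta and its posterior SD, with a N(0, 1) prior on a rectangular grid."""
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior density
        for x, (a, bs) in zip(responses, items):
            w *= grm_category_probs(t, a, bs)[x]  # likelihood of the observed category x
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

# Usage: responses coded 0..4; a student choosing high categories vs. one choosing low ones.
print(eap_theta([4, 3, 4], items))
print(eap_theta([0, 1, 0], items))
```

EAP is used here because it always returns a finite estimate, even for all-lowest or all-highest response patterns. Maximum likelihood would diverge for those patterns.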
Data sources or evidence
The research data consist of the responses of 443 students to the multiple intelligences
instrument.
Results and/or conclusions/points of view
The IRT model used to analyze this instrument is the graded response model
(GRM). The model provides estimates of item parameters and item fit. On this instrument,
there are four between-category threshold parameters b, moving from very easy to difficult.
Parameter b1 is below -2 for all items, b2 mostly ranges from -2 to 0, b3 from 0 to 2, and
b4 is above 2. Overall, the average item difficulty is -0.956. The average b of each category
is b1 = -3.481, b2 = -1.042, b3 = 1.088, and b4 = 2.902. In general, the instrument has a
moderate level of item difficulty. The discrimination parameters of all items are above 0.2.
Thus, the items of the instrument can distinguish between people with high and low ability.
The model-fit test on the estimated ICCs shows that 37 of the 72 items written fit the model.
The information function, which indicates measurement precision, shows an average item
information of 2.816 at θ = 1.273.
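To make the information figure concrete, the following sketch performs three steps:

- It computes GRM item information as the sum over categories of (dP_k/dθ)²/P_k.
- It adds the item informations into a test information function.
- It locates the maximum of that function on a θ grid.

The three items are hypothetical, so the output will not reproduce the 2.816 and 1.273 values reported above. Programs such as PARSCALE may also apply a scaling constant (often 1.7), which is omitted here as a simplifying assumption. `grm_item_information` is an illustrative name.

```python
import math
from typing import List, Sequence, Tuple

def grm_item_information(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Fisher information of one logistic GRM item: sum over k of P_k'(theta)**2 / P_k(theta)."""
    # Boundary curves P*(X >= k), padded with the constants 1 (k = 0) and 0 (k = m + 1).
    star = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Derivatives of the boundary curves, a * P* * (1 - P*); this is 0 for the padded constants.
    dstar = [a * p * (1.0 - p) for p in star]
    info = 0.0
    for k in range(len(thresholds) + 1):
        p_k = star[k] - star[k + 1]
        dp_k = dstar[k] - dstar[k + 1]
        if p_k > 0.0:  # guard against a category probability that underflows to 0
            info += dp_k * dp_k / p_k
    return info

# Hypothetical 5-category items (a, thresholds); NOT the parameters estimated in this study.
items: List[Tuple[float, List[float]]] = [
    (1.2, [-3.0, -1.0, 1.0, 3.0]),
    (0.8, [-3.5, -1.2, 0.9, 2.8]),
    (1.6, [-2.8, -0.5, 1.5, 3.1]),
]

# Test information is the sum of item information; scan theta from -4 to 4 in steps of 0.01.
grid = [i / 100 for i in range(-400, 401)]
test_info = [sum(grm_item_information(t, a, bs) for a, bs in items) for t in grid]
peak = max(range(len(grid)), key=test_info.__getitem__)
print(f"max test information {test_info[peak]:.3f} at theta = {grid[peak]:.2f}")
print(f"mean item information at that theta: {test_info[peak] / len(items):.3f}")
```

For a single threshold, the formula reduces to the familiar 2PL item information a²P(1 - P).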
Educational importance of this study for theory, practice, and/or policy
IRT analysis is useful for studying instruments that are used on a wide scale with a large
number of subjects. This follows from the invariance property of IRT analysis, under which the
resulting scores do not depend on item characteristics. The analysis therefore does not need to
be repeated when the test is used elsewhere. Thus, standardized measurements required in various
areas, such as the UN tests, TPA tests, and psychological tests, are well suited to IRT analysis.
Connection to the themes of the congress
This study is relevant to the following themes:

Educational standards, equity and quality in education

Assessment and evaluation in standards-based education