
Running head: MODERN TEST THEORY/RASCH THEORY DATA ANALYSIS PART 2
Data Analysis of Items Report
Part 2 Modern Test Theory-Rasch Theory
Willie J Jones III
EADM 738
California State University, San Bernardino
Abstract
The wide use of the Excel and Winsteps programs has become essential in the
development of test items for reliability and validity. The key to developing quality
assessments of students' academic levels is that the data are of high quality. With
quality data that have an even distribution, teachers and school staff can place
students at the appropriate levels for academic success. The Winsteps program was
developed to help researchers support students, teachers, and parents with
instrumentation that produces effective and reliable data for assessing students'
knowledge of Algebra after entering the alternative educational system. The
conversion of results from Excel to Winsteps addressed whether students were
familiar with algebraic concepts. Students in the 6th, 7th, and 8th grades were
examined; N = 30 middle school students were assessed using survey procedures,
and the Rasch scale model was used to analyze the results. The researcher analyzes
the data and gives a brief interpretation of it to the best of his ability.
Purpose and Framework
The purpose of this report is to use the Rasch model to analyze the data.
The intent of this analysis is to show how the Rasch model can be applied to
dichotomous data in an alternative junior high school setting of 6th, 7th, and 8th
grade students. The data were generated from an Algebra 1 assessment, and the
Rasch model was used to identify error in the items in an alternative setting. This
report will provide basic data and incorporate the Rasch model. Historically,
teachers have been given the task of developing assessments to gain data on
students in order to develop curriculum. Subsequently, many types of assessments
were developed with the idea of benefiting the student. However, teachers did not
realize that they were not using measurement theory to construct their
assessments; teachers used a variety of assessments, ranging from informal to
summative assessments, in their classrooms. Fortunately, as school districts began
to realize the need for reliable data through well-constructed assessments, teachers
were encouraged to participate in professional development that would emphasize
the construction of quality tests. Teachers will then be able to construct reliable and
valid measurements that increase the quality of measurement. The intent of this
data analysis report is to measure and identify, through the Rasch model, quality
items inside the Algebra Formula Survey (AFS). This report will concentrate on
individuals' ability and items' difficulty with regard to the quality of the items
within the AFS under the Rasch model.
Method
Rasch analysis gave the researcher the ability to include an examination of
the person and item reliabilities; construct validity, including item and person fit
statistics; and differential item functioning (DIF) across a group. DIF analysis
allowed us to evaluate whether the relative item estimates (i.e., item difficulty
estimates) remained invariant across the group of persons. The person fit group
analysis allowed us to identify the expected and unexpected patterns of raw scores
in terms of the outcomes of the Rasch model. Data were collected on 30 persons
who were instructed in an Algebra 1 lesson on Slope-Intercept Form.
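As an illustration of the DIF check described above, an item's difficulty can be estimated separately in two groups and the contrast inspected. The sketch below is a minimal illustration using a crude proportion-based difficulty estimate, not the joint estimation Winsteps actually performs:

    import math

    def item_logit_difficulty(scores):
        # scores: 0/1 responses of one group to a single item.
        # Crude difficulty estimate: ln(proportion wrong / proportion right),
        # so items answered correctly less often get higher (harder) logits.
        p = sum(scores) / len(scores)
        p = min(max(p, 0.05), 0.95)  # keep extreme proportions off 0 and 1
        return math.log((1 - p) / p)

    def dif_contrast(group_a_scores, group_b_scores):
        # Invariance means the item is about equally difficult in both groups,
        # so the contrast should be near 0 logits.
        return item_logit_difficulty(group_a_scores) - item_logit_difficulty(group_b_scores)

    # Example: an item answered correctly by 8 of 10 in one group, 4 of 10 in the other.
    print(dif_contrast([1] * 8 + [0] * 2, [1] * 4 + [0] * 6))  # about -1.8 logits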
Rasch Modeling
The Rasch model is the only item response theory (IRT) model in
which the total score across items characterizes a person's total ability. It is also the
simplest of such models, having a minimum of parameters for the person (just one)
and just one parameter corresponding to each category of an item. This item
parameter is generically referred to as the threshold. There is just one in the case of
a dichotomous item, and two in the case of three categories.
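For a dichotomous item, the model can be written in the notation used in the worked example in the appendix, with B for a person's ability and D for an item's difficulty, both in logits:

    P(X = 1) = e^(B - D) / (1 + e^(B - D))

so a person whose ability exactly matches the item's difficulty (B = D) has a probability of .50 of answering correctly, while higher ability or an easier item raises that probability.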
The Rasch model, where the total score summarizes completely a person's
standing on a variable, arises from a fundamental requirement: the comparison of
two people must be independent of which items, among those assessing the same
variable, are used. Thus the Rasch model is taken as a criterion for the structure of
the responses. For example, the comparison of the performance of two students'
work on a basic Algebra 1 assessment marked by different graders should be
independent of the graders. In this case it is considered that the researcher is
deliberately developing items that are valid for the purpose and that meet the Rasch
requirement of invariance of comparisons.
Examining data with regard to the Rasch model means conducting a Rasch
analysis, which gives a range of details for checking whether or not adding the
scores is justified in the data. This is called the test of fit between the data and the
model. Under the Rasch model, if invariance does not hold, then taking the total
score to characterize a person is not justified. Data in many assessments do not
align appropriately, and it is considered crucial to examine the fit of the data to the
model with regard to the uses of the total scores. When the data adequately fit the
model, the Rasch model linearizes the total score, which is bounded by 0 and the
maximum score on the items, into measurements (Rasch, 1960). The linearized
value is the location of the person on the one-dimensional continuum; this value is
called a parameter in the model, and there can be only one number in a
one-dimensional framework (Rasch, 1960). The parameter can then be used in
analysis of variance and regression more cleanly than the raw total score, which has
low (floor) and high (ceiling) effects (paraphrased from Rasch, 1960).
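As a minimal illustration of this linearization (the idea only; Winsteps estimates measures iteratively rather than with this one-step transform), a raw total score bounded by 0 and a maximum can be mapped onto the unbounded logit scale:

    import math

    def raw_score_to_logit(raw_score, max_score):
        # ln(proportion correct / proportion incorrect) removes the floor at 0
        # and the ceiling at max_score that distort the raw total score.
        r = min(max(raw_score, 0.5), max_score - 0.5)  # extreme scores have no finite logit
        proportion = r / max_score
        return math.log(proportion / (1 - proportion))

    # Example: 7 of 10 items correct -> ln(0.7 / 0.3), about 0.85 logits.
    print(raw_score_to_logit(7, 10))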
Rasch analysis can be applied to assessments in a variety of disciplines,
including the social sciences, counseling, psychology, sociology, and education. Many
assessments in these disciplines involve well-established groups of people
responding to items. The responses on these items are scored 0, 1 (for two ordered
categories) or 0, 1, 2, 3 (for four ordered categories) (Rasch, 1960). For this report,
the data come from items that are scored 0, 1 (0 = No and 1 = Yes). The responses
on these items were intended to indicate levels of competency on a variable related
to academic achievement. These responses are then added across items to give each
person a total score. This total score summarizes the responses to all the items, and
a person with a higher total score than another is deemed to show more of the
variable assessed (refer to Table 1.10). Summing the scores of the items is intended
to give a single score for each person. This implies that the items are intended to
measure a single variable and to report the quality of the item.
The Rasch model, compared to the classical test model, appears to indicate
that individual items interact with students as a function of their ability. The Rasch
model appears to provide more information on the individual student. Rasch
reports on the logit scale, which has a center point of 0, with most scores ranging
from -2 to 2. Students who fall in the positive range of logit scores have the ability
to score above average, and students who score in the negative range score below
average. Rasch does not use the percentage scale of 0 to 100. Rasch gives
instructors the ability to identify the difficulty of items through the logit scale, and
instructors can compare item difficulty with student ability (refer to Table 1.10).
The Algebra Formula Survey was developed to test students' knowledge of
Algebra 1 vocabulary; it was made up of only ten items. This survey was developed
to encourage students on Individualized Education Plans (I.E.P.s) to perform at a
high level in math. The purpose of the survey was to examine the students'
competency and mastery in the educational setting. The survey format appeared
too simple and was adjusted to meet the individual needs of the student in the
Special Education forum. The students were assessed on information that had been
taught within the week by a teacher qualified in Algebra. The class was instructed in
the normal manner with the use of formula maps, visuals, examples, auditory
videos, and review of the subject matter. The survey was developed with only two
answers: (a) 1 = yes and (b) 0 = no. This survey did not measure the students'
strength in basic math or Geometry. This survey was developed to test student
resiliency, memory, and recall. Students were given a lesson on Slope-Intercept
Form one week prior to the test being administered. Students appeared not to have
been introduced to Algebra 1 prior to their lesson on Slope-Intercept Form; this
simple quiz gave the researcher an opportunity to gather data for future assessment
design in the classroom. Students were recognized as being in the sixth, seventh,
and eighth grades. With permission from parents, students were identified as being
on Individualized Education Plans (I.E.P.s).
Data Collection Procedures
As shown in Appendix A, the survey questions were given to the students in
written form, and the students were given five minutes to look at each question.
This process was repeated for every question: the students were given
approximately five minutes before the instructor would read the question aloud.
The students had to circle 1 = yes if they thought the equation was in
Slope-Intercept Form and 0 = no if they believed the equation was not in
Slope-Intercept Form.
The researcher instructed the instructor to conduct the assessment in the
manner in which they would normally deliver a test. The instructor did not let the
students have anything on their desks, and the students were not allowed to use
their notes. The instructor informed the students of the directions and handed the
test to the students. The instructor stated to the students, "Today we are being
tested on the formula for Slope-Intercept; please circle 1 = yes or 0 = no to indicate
whether the particular equation provided is in Slope-Intercept Form."
The instructor discussed five examples that were independent of the survey
before the test was handed out. These equations consisted of 3y = 2x + 7, 7x = 4 + y,
12 = 9x + 3y, y = 2x + 9, and 5x – 9 = y. The instructor stated, "Here are five
equations; which equations are in Slope-Intercept Form?" The instructor stated,
"Please try to remember the strategies that were given to you over the week; make
sure you take your time and do not rush; you know the material and are going to do
great." The instructor stated, "Please do not leave any questions blank; I know you
will all do the best you can; I believe in you."
Sample Data
This data was collected by a researcher to help instructors develop
assessments that would provide valuable data identifying students' strengths and
weaknesses. This assessment was constructed by an instructor in the Special
Education department who wanted to meet the needs of the examinees while
maintaining the rigor of the test. The instructor needed this data to evaluate
whether the assessment was a quality test. This sample did not take into
consideration ethnicity, socioeconomics, or demographics. The participants
included all genders. The researcher did not assess the ability levels of each
student. Participants were taught in the same manner, with supports within the
lesson. Students received supports before the assessment and at its beginning.
Students did not receive any supports in the middle of the assessment. Ten students
were assessed in Algebra 1, between the ages of 11 and 13, in a summer school
setting. Every participant was on an I.E.P. and was identified as needing remedial
assistance in math.
Most Unexpected Item Responses in Terms of Measure
Table 10.4 shows the most unexpected item responses in terms of the
measures. The items are listed in increasing order of difficulty. The person entry
numbers are in each column and listed vertically. Person 22 had a low measure of
.77. This person answered "0" to the easy Item #2 but answered "1" to all other
items.
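A sketch of how such responses are flagged (the standard standardized-residual logic; the exact Winsteps computation may differ in detail): each observed 0/1 response is compared with the probability the model assigns it.

    import math

    def rasch_probability(ability, difficulty):
        # Probability of a correct (1) response under the dichotomous Rasch model.
        return 1.0 / (1.0 + math.exp(difficulty - ability))

    def standardized_residual(observed, ability, difficulty):
        # z = (x - p) / sqrt(p * (1 - p)); |z| greater than about 2 marks
        # an unexpected response, such as an able person missing an easy item.
        p = rasch_probability(ability, difficulty)
        return (observed - p) / math.sqrt(p * (1.0 - p))

    # An able person (1.5 logits) answering "0" on an easy item (-1.0 logits):
    print(standardized_residual(0, ability=1.5, difficulty=-1.0))  # about -3.5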
Variable Map Table 1.10
In Table 1.10, the relation between the students and the items on the
assessment can be examined. The item mean is lower than the person mean, which
indicates that the assessment was easy for the students. Items 6, 7, and 8 fall below
the mean; three of ten items are estimated below the mean. Gaps between the items
demonstrate where the assessment is not distributing the students' performance
evenly along the continuum. In Table 1.10, gaps occur between the item and person
means, one standard deviation below the person mean.
Seven items (among them Items 1 and 10) are estimated to be greater than
two standard deviations above the person mean, which is troublesome if mastery of
the Algebra concepts is ultimately expected to meet proficiency according to
Common Core standards. In evaluating these gaps for mastery, we need to view the
gaps along the continuum with regard to the items on the assessment. Each gap
illustrates the concept of hierarchy and the levels of knowledge required to
understand Slope-Intercept Form, an Algebra concept. The analysis rests on the
expectation that once students have been instructed in the concepts of certain math
formulas, they will be able to perform at high levels on different assessments. It is
unfortunate that seven of the ten items (1, 2, 3, 4, 5, 9, and 10) did not meet the
differential levels of responding from the baseline to the post assessment.
Consequently, there are four possible reasons the students' patterns appeared to be
unequally distributed. First, the items may have been too easy. Secondly, the
students had a fifty-fifty chance of getting each item correct or wrong and may have
guessed. Thirdly, the instructor may have given hints or clues during the initial
instruction process. And finally, the instructor may have given an effective lesson on
the subject matter, so the students were able to confidently answer the questions
correctly.
Results and Discussion
Little attention in many districts is given to investigating whether specific
topics in individual classrooms become less difficult by the end of the assessment.
Within the analysis, item difficulty estimates are inspected and utilized to make
inferences about changes in curriculum and the delivery of concepts in Algebra 1.
The data within the Winsteps program focus on the strength of the items rather
than on the student's ability to understand the concept. This is necessary for the
many teachers who have the opportunity to develop their own assessments. In
addition, curriculum developers need this information to develop meaningful items
impacting the learning process of students in many classrooms. Finally, the goal of
item difficulty estimation is to provide an even distribution of strong items that will
challenge the students' ability to think critically.
Misfit Order Table 10.1
Table 10.1 presents a moderate person reliability index (.72) and a
medium item reliability index (.60). These are considered acceptable average
indices for both items and persons. The infit and outfit mean squares for persons
and items are expected to be 1.00, and for these data they are all close to 1.00. The
mean standardized infit and outfit are expected to be 0.0.
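The infit and outfit mean squares discussed here can be sketched from the model's standardized residuals. The following is a minimal sketch using the standard formulas from the Rasch literature, not the Winsteps implementation itself:

    def fit_mean_squares(observations):
        # observations: list of (x, p) pairs, where x is the observed 0/1
        # response and p is the model probability of success for that
        # person-item pair. Both statistics have expectation 1.00.
        squared_residuals, infit_num, infit_den = [], 0.0, 0.0
        for x, p in observations:
            variance = p * (1.0 - p)            # model variance of a 0/1 response
            squared_residuals.append((x - p) ** 2 / variance)
            infit_num += (x - p) ** 2           # infit weights each residual by its
            infit_den += variance               # information, muting outliers
        outfit = sum(squared_residuals) / len(squared_residuals)  # unweighted mean
        infit = infit_num / infit_den
        return outfit, infit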
However, the table shows that the z-scores for infit and outfit are 0.60 for
persons and 1.24 for items, which indicates some misfit: the data do not fit the
model perfectly. The data show an overall unacceptable fit, as the value of the
standardized outfit standard deviation for persons is 1.32. The separation index for
items is .68, a moderately low spread of items and persons along the continuum.
The separation index shows an increase in value from 1.24 to 1.61. These indices
indicate that persons' ability levels can be categorized into a spread of 2 to 3 person
positions. The initial person reliability index was estimated at .60 and increased to
.72 when all misfit responses were removed. For the item summary statistics, there
were slight improvements in the fit statistics compared to before the removal of the
misfit responses. The overall item reliability index was estimated at .60 but
increased to .72 as more misfit responses were removed. Item spread also shows
more variability, as the separation index increased from 1.24 to 1.61.
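The separation and reliability values reported here are algebraically linked: for separation G and reliability R,

    R = G^2 / (1 + G^2), or equivalently G = sqrt(R / (1 - R)),

which closely reproduces the reported pairs: sqrt(.60/.40) = 1.22, near the separation of 1.24 paired with reliability .60, and sqrt(.72/.28) = 1.60, near the 1.61 paired with .72 (the small differences reflect how the "real" standard errors are estimated).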
Correlations for Items
To judge the strength of the measurement dimension, we used the following
guidelines for variance explained by the measure: greater than 40% is considered a
strong measurement dimension (Linacre, 2006), greater than 30% a moderate
measurement dimension, and greater than 20% a minimal dimension. The 20%
criterion is taken from Reckase (1979). In Table 23, the variance explained by the
measure is 44.7%, which is strong. Secondly, 26.4% of the variance is explained by
the first factor of the residuals; the ratio of 44.7 to 26.4 is about 1.7 to 1, which is
supportive of unidimensionality.
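Variance-explained figures of this kind come from a principal components analysis of the standardized residuals. A minimal numpy sketch of that computation, illustrative rather than the Winsteps implementation:

    import numpy as np

    def first_contrast_share(responses, probabilities):
        # responses, probabilities: (n_persons, n_items) arrays of observed 0/1
        # scores and Rasch-modeled success probabilities (strictly between 0 and 1).
        # Returns the share of residual variance captured by the first principal
        # component, the "first contrast" used to judge unidimensionality.
        variance = probabilities * (1.0 - probabilities)
        z = (responses - probabilities) / np.sqrt(variance)  # standardized residuals
        z = z - z.mean(axis=0)                               # center each item's residuals
        singular_values = np.linalg.svd(z, compute_uv=False)
        component_variances = singular_values ** 2
        return component_variances[0] / component_variances.sum()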
Dichotomous Curves
In Table 21.1, the curves show how probable the observation of each
category is for measures relative to the item measure. Ordinarily, 0 logits on the
plot corresponds to the item measure and is the point at which the highest and
lowest categories are equally likely to be observed. Usually, the plot should look like
a range of hills. In Table 21.1, we observe an incline of "0"s that tops off at .80 to the
left and "1"s that top off to the right at .80. The patterns within Table 21.1 suggest
the need to reconsider the choice of response options, both in terms of the number
of response options and the corresponding labels.
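For a dichotomous item, the two category curves are mirror images that cross at a probability of .50 where the measure equals the item difficulty. A short matplotlib sketch of the idealized plot (an illustration, with assumed axis ranges):

    import numpy as np
    import matplotlib.pyplot as plt

    measure = np.linspace(-3, 3, 200)     # person measure relative to item difficulty
    p_yes = 1 / (1 + np.exp(-measure))    # probability of scoring "1"
    p_no = 1 - p_yes                      # probability of scoring "0"

    plt.plot(measure, p_no, label='category "0"')
    plt.plot(measure, p_yes, label='category "1"')
    plt.axvline(0, linestyle=":")         # curves cross at 0 logits, both at .50
    plt.xlabel("measure relative to item difficulty (logits)")
    plt.ylabel("category probability")
    plt.legend()
    plt.show()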
Conclusion
To review the quality of the Algebra Formula Survey items under the Rasch
model, a Rasch analysis was performed. After evaluating this survey of items, the
data suggest that the items had a reliability of -0.01. This implies that the questions
were either too easy, that participants were well prepared, or that the students
guessed. Although the survey was developed to test the quality of the tool, the
survey appeared to have errors within its development, the item difficulties being
either too easy or only moderately easy. The scale did not appear to have an equal
distribution of quality measures. Although the measure appeared not to have high
reliability, it did provide data for the development of tests by instructors trying to
provide rigor to students with special circumstances. Interestingly, the data may
suggest that the students understood the material being presented; because each
question had only two response options, students had a 50% chance of getting a
question right or wrong by guessing. The assessment had a low reliability of -0.01,
whereas the threshold for high reliability appears to be .40 and higher. In this case,
the developer would need to go back through the assessment to restructure the
items and the tool with an even distribution of easy, moderate, and difficult items.
In conclusion, the small sample of students and the small number of items may have
significantly damaged the reliability and validity of the Algebra Formula Survey
items. Significantly, the Rasch model evaluated each individual student, giving each
a total score. This report was able to conclude that many of the items in the Algebra
Formula Survey had error. Upon interpreting the item analysis of the Rasch model,
many of the items needed to be restructured for difficulty. Subsequently, the Rasch
model provided analysis and data that will be used to design a more reliable and
valid measure for individual students in Algebra 1.
Appendix A
Name:
Date:
Instructor:
Algebra Formula Survey
Directions: Identify the equation that represents the Slope-Intercept Formula. Circle
1 = Yes, if it represents Slope-Intercept and 0 = No, if it does not.
_________________________________________________________________________________________________
1. y = 10x + 4          0 = No    1 = Yes
2. y = 9 – 8x           0 = No    1 = Yes
3. 7 = 8x – 9y          0 = No    1 = Yes
4. y = 5x + 1           0 = No    1 = Yes
5. 20x = 4y + 4         0 = No    1 = Yes
6. y = 20x + 5          0 = No    1 = Yes
7. 35x = 35 + 35y       0 = No    1 = Yes
8. 100y = 25x – 25      0 = No    1 = Yes
9. 10x + 4 = 9y         0 = No    1 = Yes
10. 21y + 7 = 14x       0 = No    1 = Yes
Item Difficulty
[Figure: Item Difficulty. Probability of success P (vertical axis) plotted against item
difficulty D in logits (horizontal axis, -3 to +3) for a person of fixed ability; easy
items at the left yield a higher P, and difficult items at the right a lower P.
B = ability; D = difficulty.]
Personal Ability
The worked example below applies the model equation P = e^(B-D) / (1 + e^(B-D))
for a person of ability B (in logits) answering an item of difficulty D = 0:

B (Ability)   D (Difficulty)    B-D    e^(B-D)   1+e^(B-D)      P
   -2.0             0          -2.0      0.14       1.14       0.12
   -1.5             0          -1.5      0.22       1.22       0.18
   -1.0             0          -1.0      0.37       1.37       0.27
   -0.5             0          -0.5      0.61       1.61       0.38
    0.0             0           0.0      1.00       2.00       0.50
    0.5             0           0.5      1.65       2.65       0.62
    1.0             0           1.0      2.72       3.72       0.73
    1.5             0           1.5      4.48       5.48       0.82
    2.0             0           2.0      7.39       8.39       0.88
    2.5             0           2.5     12.18      13.18       0.92

[Figure: Personal Ability. Probability of success P (vertical axis) plotted against
person ability B in logits (horizontal axis, -3 to +3); P rises with ability.]
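The P column above can be regenerated directly from the model equation; a few illustrative lines of Python reproduce the table:

    import math

    D = 0.0  # item difficulty fixed at 0 logits
    for B in (-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
        e = math.exp(B - D)
        print(f"B = {B:+.1f}   e^(B-D) = {e:6.2f}   P = {e / (1 + e):.2f}")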
Table 1.10 Variable Map (Items and Persons)
Table 23.0 Standard Residual Variance
Table 10.1 Misfit Order
PERSON: REAL SEP.: 1.24 REL.: .60 ... ITEM: REAL SEP.: 1.61 REL.: .72
Table 10.4 Most Unexpected Item Responses in Terms of Measure
Table 21.1 Dichotomous Curves
References
Dodeen, H. (2004). The relationship between item parameters and item fit. Journal
of Educational Measurement, 41, 261-270.
Linacre, J.M. (1998a). Structure in Rasch residuals: Why principal components
analysis (PCA)? Rasch Measurement Transactions, 1(2), 636.
Linacre, J.M. (1998b). Detecting multidimensionality: Which residual data-type
works best? Journal of Outcome Measurement, 2(3), 266-283.
Linacre, J.M. (2006). Data variance explained by measures. Rasch Measurement
Transactions, 20(1), 1045.
Linacre, J.M. (2011). Winsteps Rasch Measurement (Version 3.71.1) [Computer
software]. www.winstep.com.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests.
Copenhagen, Denmark: Danish Institute for Educational Research.
Rasch analysis. (2013). Retrieved September 4, 2013, from
http://www.rasch-analysis.com/rasch-analysis.htm and
http://www.rasch-analysis.com/itemAnalysis.htm
Reckase, M. (1979). Unifactor latent trait models applied to multifactor tests:
Results and implications. Journal of Educational Statistics, 4, 207-230.
Reise, S. (1990). A comparison of item and person fit methods of assessing
model fit in IRT. Applied Psychological Measurement, 42, 127-137.
Winsteps & Rasch measurement software. (2010a). Misfit diagnosis: Infit outfit
mean-square standardized. Retrieved from
http://www.winstep.com/winman/index.htm
Winsteps & Rasch measurement software. (2010b). Item statistics in misfit order.
Retrieved from http://www.winstep.com/winman/index.htm?table10_1.htm