Running head: MODERN TEST THEORY/RASCH THEORY DATA ANALYSIS PART 2

Data Analysis of Items Report Part 2: Modern Test Theory-Rasch Theory
Willie J Jones III
EADM 738
California State University, San Bernardino

Abstract

The wide use of Excel and the Winsteps program has become essential in developing test items for reliability and validity. The key to developing quality assessments of students' academic levels is quality data. With quality data that have an even distribution, teachers and school staff can place students at the appropriate levels for academic success. The Winsteps program was developed to help researchers support students, teachers, and parents with instrumentation that produces effective and reliable data for assessing students' knowledge of Algebra after entering the alternative educational system. The results, converted from Excel to Winsteps, concerned whether students were familiar with algebraic concepts. A sample of N = 30 middle school students in the 6th, 7th, and 8th grades was examined using survey procedures, and the Rasch scale model was used to analyze the results. The researcher will analyze the data and give a brief interpretation to the best of his ability.

Purpose and Framework

The purpose of this report is to use the Rasch model in analyzing the data. The intent of this analysis is to show how the Rasch model can be applied to dichotomous data in an alternative junior high school setting of 6th-, 7th-, and 8th-grade students. The data were generated from an Algebra 1 survey and analyzed with the Rasch model in order to identify error in the items in an alternative setting. This report will provide basic data and incorporate the Rasch model. Historically, teachers have been given the task of developing assessments to gather data on students and to develop curriculum.
Subsequently, many types of assessments were developed with the idea of benefiting the student. Teachers did not realize, however, that they were not using measurement theory to construct their assessments; they used a variety of assessments in the classroom, ranging from informal to summative. Fortunately, as school districts began to realize the need for reliable data from well-constructed assessments, teachers were encouraged to participate in professional development that would emphasize the construction of quality tests. Teachers would then be able to construct reliable and valid measurements that would increase the quality of measurement. The intent of this data analysis report is to identify, through the Rasch model, quality items within the Algebra Formula Survey (AFS). This report will concentrate on individuals' ability and items' difficulty with regard to the quality of the items within the AFS under the Rasch model.

Method

Rasch analysis gave the researcher the ability to include an examination of person and item reliabilities; construct validity, including item and person fit statistics; and differential item functioning (DIF) across a group. DIF analysis allowed us to evaluate whether the relative item estimates (i.e., item difficulty estimates) remained invariant across the group of persons. The person fit group analysis allowed us to identify expected and unexpected patterns of raw scores in terms of the outcomes of the Rasch model. Data were collected on 30 persons who were instructed in an Algebra 1 lesson on Slope-Intercept Form.

Rasch Modeling

The Rasch model is the only item response theory (IRT) model in which the total score across items characterizes a person's total ability. It is also the simplest of such models, having a minimum of parameters: just one for the person, and just one parameter corresponding to each category of an item.
This item parameter is generically referred to as the threshold: there is just one in the case of a dichotomous item, and two in the case of three ordered categories. The Rasch model, in which the total score completely summarizes a person's standing on a variable, arises from a fundamental requirement: the comparison of two people must be independent of which items are used, provided the items assess the same variable. Thus the Rasch model is taken as a criterion for the structure of the responses. For example, the comparison of two students' performance on a basic Algebra 1 assessment marked by different graders should be independent of the graders. In this case it is assumed that the researcher is deliberately developing items that are valid for the purpose and that meet the Rasch requirement of invariance of comparisons. Examining data with regard to the Rasch model means conducting a Rasch analysis, which gives a range of details for checking whether or not adding the scores is justified by the data. This is called the test of fit between the data and the model. Under the Rasch model, if invariance does not hold, then taking the total score to characterize a person is not justified. Data in many assessments do not align appropriately, so it is crucial to examine the fit of the data to the model before using total scores. When the data adequately fit the model, the Rasch model linearizes the total score, which is bounded by 0 and the maximum score of the items, into measurements (Rasch, 1960). The linearized value is the location of the person on the one-dimensional continuum; this value is called the person parameter in the model, and there can be only one number in a one-dimensional framework (Rasch, 1960). The parameter can then be used in analysis of variance and regression more appropriately than the raw total score, which has low (floor) and high (ceiling) effects (paraphrased from Rasch, 1960).
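The linearization just described rests on the dichotomous Rasch probability, P = e^(B-D) / (1 + e^(B-D)), where B is the person's ability and D is the item's difficulty, both in logits. A minimal sketch in plain Python (the values are illustrative, not this study's estimates):

```python
import math

def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: probability of a correct (1) response
    given a person ability and an item difficulty, both in logits."""
    return math.exp(ability - difficulty) / (1 + math.exp(ability - difficulty))

# When ability equals difficulty, success is exactly a 50-50 proposition.
print(round(rasch_probability(0.0, 0.0), 2))  # 0.5
# A person 1 logit above an item's difficulty succeeds about 73% of the time.
print(round(rasch_probability(1.0, 0.0), 2))  # 0.73
```

Because the model depends only on the difference B - D, persons and items sit on the same logit scale, which is what makes person-item comparisons on a variable map possible.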
Rasch analysis can be applied to assessments in a variety of disciplines, including the social sciences, counseling, psychology, sociology, and education. Many assessments in these disciplines involve well-established groups of people responding to items. The responses to these items are scored 0, 1 (for two ordered categories) or 0, 1, 2, 3 (for four ordered categories) (Rasch, 1960). For this report, the data come from items that are scored 0, 1 (0 = No and 1 = Yes). The responses to these items were intended to indicate levels of competency on a variable related to academic achievement. These responses are then added across items to give each person a total score. This total score summarizes the responses to all the items, and a person with a higher total score than another is deemed to show more of the variable assessed (refer to Table 1.10). Summing the item scores is intended to give a single score for each person. This implies that the items are intended to measure a single variable and to report the quality of the item. Compared with the classical test model, the Rasch model indicates that individual items interact with students as a function of their ability, and it appears to provide more information on the individual student. The Rasch model reports on a logit scale that has a center point of 0, with most scores ranging from -2 to 2. Students who fall in the positive range of logit scores have the ability to score above average, and students who score in the negative range score below average. Rasch does not use the percentage scale of 0 to 100. With the logit scale, instructors can identify the difficulty of items and compare item difficulty with student ability (refer to Table 1.10).
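The summing of 0/1 responses into a total score, and the mapping of that score onto the logit scale, can be sketched as follows. The response strings here are hypothetical, not the study's data, and the log-odds formula is only a crude stand-in: Winsteps estimates person measures jointly with item difficulties rather than from raw proportions.

```python
import math

# Hypothetical 0/1 responses for three persons on ten dichotomous items.
responses = {
    "Person A": [1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
    "Person B": [1, 0, 1, 1, 0, 1, 0, 1, 0, 0],
    "Person C": [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
}

for person, items in responses.items():
    raw = sum(items)               # total score, bounded by 0 and 10
    p = raw / len(items)           # proportion correct
    logit = math.log(p / (1 - p))  # crude log-odds version of the raw score
    print(f"{person}: raw={raw}, logit={logit:+.2f}")
```

Note how a raw score of 5/10 lands at 0 logits, while higher and lower raw scores spread out above and below it, matching the roughly -2 to 2 range described above.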
The Algebra Formula Survey was developed to test students' knowledge of Algebra 1 vocabulary; it was made up of only ten items. The survey was developed to encourage students on Individualized Education Plans (I.E.P.s) to perform at a high level in math. Its purpose was to examine the students' competency and mastery in the educational setting. The survey format appeared too simple and was adjusted to meet the individual needs of the students in the Special Education forum. The students were assessed on information that had been taught within the week by a teacher qualified in Algebra. The class was instructed in the normal manner with the use of formula maps, visuals, examples, auditory videos, and review of the subject matter. The survey was developed with only two answers: (a) 1 = Yes and (b) 0 = No. It did not measure the students' strength in basic math or Geometry; it was developed to test student resiliency, memory, and recall. Students were given a lesson on Slope-Intercept Form one week prior to the test being administered. Students appeared not to have been introduced to Algebra 1 prior to their lesson on Slope-Intercept Form; this simple quiz gave the researcher an opportunity to gather data for future assessment design in the classroom. Students were recognized as being in the sixth, seventh, and eighth grades. With permission from parents, students were identified as being on Individualized Education Plans (I.E.P.s).

Data Collection Procedures

The survey questions in Appendix A were given to the students in written form, and the students were given five minutes to look at each question. This process was repeated for every question: the students were given approximately five minutes before the instructor would read the question. The students had to circle 1 = Yes if they thought the equation was in Slope-Intercept Form and 0 = No if they believed the equation was not in Slope-Intercept Form.
The researcher instructed the instructor to conduct the assessment in the manner in which they would normally deliver a test. The instructor did not let the students have anything on their desks, and the students were not allowed to use their notes. The instructor informed the students of the directions and handed out the test, stating, "Today we are being tested on the formula for Slope-Intercept. Please circle 1 = Yes or 0 = No depending on whether the equation provided is in Slope-Intercept Form." The instructor discussed five examples that were independent of the survey before the test was handed out. These equations were 3y = 2x + 7, 7x = 4 + y, 12 = 9x + 3y, y = 2x + 9, and 5x - 9 = y. The instructor asked, "Here are five equations; which equations are in Slope-Intercept Form?" The instructor stated, "Please try to remember the strategies that were given to you over the week. Make sure you take your time and do not rush; you know the material and are going to do great." The instructor added, "Please do not leave any questions blank. I know you will all do the best you can; I believe in you."

Sample Data

This data was collected by a researcher to help instructors develop assessments that would provide valuable data for identifying students' strengths and weaknesses. The assessment was constructed by an instructor in the Special Education department who wanted to meet the needs of the examinee while maintaining the rigor of the test. The instructor needed this data to evaluate whether the assessment was a quality test. The sample did not take into consideration ethnicity, socioeconomics, or demographics. The participants included all genders. The researcher did not assess the ability levels of each student. Participants were taught in the same manner, with supports within the lesson.
Students received supports before the assessment and at the beginning, but no supports in the middle of the assessment. Thirty students between the ages of 11 and 13 were assessed in Algebra 1 in a summer school setting. Every participant was on an I.E.P. and was identified as needing remedial assistance in math.

Most Unexpected Item Responses in Terms of Measure

Table 10.4 shows the most unexpected item responses in terms of the measures. The items are listed in increasing order of difficulty, and the person entry numbers are listed vertically in each column. Person 22 had a low measure of .77; this person answered "0" to the easy Item #2 but answered "1" to all other items.

Variable Map Table 1.10

In Table 1.10, the relation between the students and the items on the assessment can be examined. The item mean is lower than most items, which indicates that the assessment was easy for the students. Items 6, 7, and 8 fall below the mean; three of the ten items are estimated below the mean. Gaps between the items demonstrate where the assessment is not distributing the students' performance evenly along the continuum. In Table 1.10, gaps occur between the item mean and the person mean, one standard deviation below the person mean. Seven items (1, 10, ) are estimated to be greater than two standard deviations above the person mean, which is troublesome if mastery of the Algebra concepts is expected to ultimately meet proficiency according to Common Core standards. In evaluating these gaps for mastery, we need to view them along the continuum with regard to the items on the assessment. Each gap illustrates the concept of hierarchy and the levels of knowledge required to understand Slope-Intercept Form, an Algebra concept.
Upon analysis, the expectation was that once students had been instructed in the concepts of certain math formulas, they would be able to perform at high levels on different assessments. Unfortunately, seven of the ten items (1, 2, 3, 4, 5, 9, and 10) did not meet the differential levels of responding from the baseline to the post-assessment. There are four possible reasons the students' response patterns appeared to be unequally distributed. First, the items may have been too easy. Second, the students had a fifty-fifty chance of getting each item correct or wrong and may have guessed. Third, the instructor may have given hints or clues during the initial instruction process. Finally, the instructor may have given an effective lesson on the subject matter, so that the students were able to confidently answer the questions correctly.

Results and Discussion

In many districts, little attention is given to investigating whether specific topics in individual classrooms become less difficult by the end of the assessment. Within this analysis, item difficulty estimates are inspected and used to make inferences about changes in curriculum and the delivery of concepts in Algebra 1. The data within the Winsteps program focus on the strength of the items rather than on the student's ability to understand the concept. This gives many teachers the opportunity to develop their own assessments. In addition, curriculum developers need this information to develop meaningful items that impact the learning process of students in many classrooms. Finally, the goal of item difficulty estimation is to provide an even distribution of strong items that will challenge the students' ability to think critically.

Misfit Order Table 10.1

Table 10.1 presents a moderate person reliability index (.72) and a moderate item reliability index (.60). These are considered average indices for both items and persons.
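The reliability and separation indices discussed in this section are tied together by a standard Rasch relationship, reliability = G² / (1 + G²), where G is the separation index. A quick check in plain Python against the values reported in this analysis (separations of 1.24 and 1.61, reliabilities of .60 and .72):

```python
def reliability_from_separation(g):
    """Standard Rasch relation between a separation index G and reliability."""
    return g * g / (1 + g * g)

def strata(g):
    """Winsteps convention for statistically distinct levels: (4G + 1) / 3."""
    return (4 * g + 1) / 3

print(round(reliability_from_separation(1.24), 2))  # 0.61 (reported as .60)
print(round(reliability_from_separation(1.61), 2))  # 0.72 (reported as .72)
print(round(strata(1.61), 1))                       # 2.5 distinct levels
```

A strata value of roughly 2.5 is consistent with the 2- to 3-level spread of person positions described in this section.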
The mean infit and outfit mean squares for persons and items are expected to be 1.00, and for these data they are all close to 1.00. The mean standardized infit and outfit are expected to be 0.0; however, the table shows that the z-scores for infit and outfit are 0.60 for persons and 1.24 for items, which indicates some misfit and suggests that the data do not fit the model. The data show an overall unacceptable fit, as the standardized outfit standard deviation for persons is 1.32. The separation index for items is .68, a moderately low spread of items and persons along the continuum. The separation index shows a slight increase in value from 1.24 to 1.61; these indices indicate that persons' ability levels can be categorized into a 2- to 3-level spread of person positions. The initial person reliability index was estimated at .60 and increased to .72 when all misfit responses were removed. For the item summary statistics, there were slight improvements in the fit statistics compared to before the removal of misfit responses. The overall item reliability index was estimated at .60 but increased to .72 as more misfit responses were removed. The item spread also shows more variability, as the separation index increased from 1.24 to 1.61.

Correlations for Items

To judge the strength of the measurement dimension, we used the following guidelines for variance explained by the measure: at least 40% is considered a strong measurement dimension (Linacre, 2006), at least 30% a moderate measurement dimension, and at least 20% a minimal dimension. The 20% criterion is taken from Reckase (1979). In Table 23, the variance explained by the measure is 44.7%, which is strong.
Additionally, 26.4% of the variance is explained by the first factor of the residuals; the ratio of 44.7 to 26.4 is approximately 1.7 to 1, which is supportive of unidimensionality.

Dichotomous Curves

In Table 21.1, the curves show how probable the observation of each category is for measures relative to the item measure. Ordinarily, 0 logits on the plot corresponds to the item measure and is the point at which the highest and lowest categories are equally likely to be observed. Usually, the plot should look like a range of hills. In Table 21.1, we observe an incline of "0"s that tops off at .80 to the left and of "1"s that tops off at .80 to the right. The patterns within Table 21.1 suggest the need to reconsider the choice of response options, both in terms of the number of options and the corresponding labels.

Conclusion

To review the quality of the Algebra Formula Survey items with regard to the Rasch model, a Rasch analysis was performed on the data in this report. After evaluating the survey items, the data suggest that the items had a reliability of -0.01. This implies that the questions were either too easy, the participants were well prepared, or the students guessed. Although the survey was developed to test the quality of the tool, it appeared to have errors in its development, as the item difficulties were either too easy or only moderately easy. The scale did not appear to have an even distribution of quality measures. Although the measure did not have high reliability, it did provide data for the development of tests by instructors trying to provide rigor to students with special circumstances. Interestingly, the data may suggest that the students understood the material being presented; because each question is dichotomous, students also had a 50% chance of getting each question right or wrong by guessing. The assessment had a low reliability of -0.01; the threshold for high reliability appears to be .40 and higher.
In this case, the developer would need to go back through the assessment and restructure the items and the tool with an even distribution of easy, moderate, and difficult items. In conclusion, the small sample of students and the small number of items may have significantly damaged the reliability and validity of the Algebra Formula Survey items. Significantly, the Rasch model evaluated each individual student, giving each a total score. This report was able to conclude that many of the items in the Algebra Formula Survey had error. Upon interpreting the item analysis of the Rasch model, many of the items needed to be restructured for difficulty. Subsequently, the Rasch model provided analysis and data that will be used to design a more reliable and valid measure for individual students in Algebra 1.

Appendix A

Name: Date: Instructor:

Algebra Formula Survey

Directions: Identify the equations that represent the Slope-Intercept Formula. Circle 1 = Yes if the equation is in Slope-Intercept Form and 0 = No if it is not.

1. 0 = No 1 = Yes y = 10x + 4
2. 0 = No 1 = Yes y = 9 – 8x
3. 0 = No 1 = Yes 7 = 8x – 9y
4. 0 = No 1 = Yes y = 5x + 1
5. 0 = No 1 = Yes 20x = 4y + 4
6. 0 = No 1 = Yes y = 20x + 5
7. 0 = No 1 = Yes 35x = 35 + 35y
8. 0 = No 1 = Yes 100y = 25x – 25
9. 0 = No 1 = Yes 10x + 4 = 9y
10. 0 = No 1 = Yes 21y + 7 = 14x

Figure: Item Difficulty (probability P plotted against item difficulty D from -3 to 3, easy to difficult; B = ability, D = difficulty)

Figure: Personal Ability (probability P = e^(B-D) / (1 + e^(B-D)) plotted against ability B from -3 to 3)

Table 10.1 Map Item

Table 23.0 Standard Residual Variance

Table 10.2 Misfit Order (PERSON: REAL SEP.: 1.24, REL.: .60; ITEM: REAL SEP.: 1.61, REL.: .7)

Table 10.4 Most Unexpected Item Responses in Terms of Measure

Table 21.1 Dichotomous Curves

References

Dodeen, H. (2004). The relationship between item parameters and item fit. Journal of Educational Measurement, 41, 261-270.

Linacre, J.M. (1998a). Structure in Rasch residuals: Why principal components analysis (PCA)? Rasch Measurement Transactions, 1(2), 636.

Linacre, J.M. (1998b). Detecting multidimensionality: Which residual data-type works best? Journal of Outcome Measurement, 2(3), 266-283.

Linacre, J.M. (2006). Data variance explained by measures. Rasch Measurement Transactions, 20(1), 1045.

Linacre, J.M. (2011). Winsteps Rasch Measurement (Version 3.71.1) [Computer software]. www.winstep.com.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danish Institute for Educational Research.

Rasch analysis. (n.d.). Retrieved September 4, 2013, from http://www.rasch-analysis.com/itemAnalysis.htm

Reckase, M. (1979). Unifactor latent trait models applied to multifactor tests: Results and implications. Journal of Educational Statistics, 4, 207-230.

Reise, S. (1990). A comparison of item- and person-fit methods of assessing model fit in IRT. Applied Psychological Measurement, 42, 127-137.

Winsteps & Rasch measurement software. (2010a). Misfit diagnosis: Infit and outfit mean-square standardized. Retrieved from http://www.winstep.com/winman/index.htm

Winsteps & Rasch measurement software. (2010b). Item statistics in misfit order. Retrieved from http://www.winstep.com/winman/index.htm?table10_1.htm