STAT 778 / EDRM 828 – Exam 1

Due: Tuesday, February 14th at the start of class

You may not consult or receive help from any other students or faculty on this assignment. You may ask me for clarification on the questions or for advice with using R. Include copies of any computer code used and show any work done by hand. You may use any software you desire. If you are using a menu driven program say which program and what options you used.

The data set http://www.stat.sc.edu/~habing/courses/data/rexam.txt is used throughout this assignment. These 21 items are an excerpt from a reading exam given at a midwestern university. While the exam was designed to measure reading, the questions are based around four reading passages about different subjects. Questions 1-5 are based on passage 1, 6-10 on passage 2, 11-15 on passage 3, and 16-21 on passage 4.

The estimated parameters from fitting the 2PL and 3PL models to this data set can be found at http://www.stat.sc.edu/~habing/courses/data/rexam2pl.txt and http://www.stat.sc.edu/~habing/courses/data/rexam3pl.txt respectively.

1) Provide an estimate of how long a reading exam like this would need to be in order to be reliable enough for making high stakes decisions about individual students. Briefly justify your choice of any estimates used and state any simplifying assumptions made.

2) Imagine that a newer version of the exam was to have only four questions for each passage. If this new version was designed to evaluate students of all ability levels, which of items 1-5 would you remove? Briefly justify your choice.

3) A test is said to be “speeded” when students do not have enough time to think about their answers to the questions near the end of the exam. Why or why doesn’t speededness appear to be a problem for this exam?

4) The structure of this exam should make you question whether one or more of the assumptions of the monotone homogeneity model actually holds. Which assumption(s) and why?

5) Assume the 3PL model fits for this data set. Which item is estimated to be easiest for examinees with ability level θ = 1, and what is the estimated (total) true score for examinees with θ = 1?

6) In general, does the 3PL model make more sense when thought of using the platonic/random sampling formulation of testing or using the expected value/stochastic subject formulation? Explain your choice.

7) Simulate a data set of 5,000 examinees using the estimated 2PL item parameters and another using the estimated 3PL item parameters (assume a standard normal distribution for the underlying abilities). Using the distribution of observed scores, briefly argue whether this test seems to have had guessing or not.

Code like the following may be useful for question 7:

    par2PL <- read.table("http://www.stat.sc.edu/~habing/courses/data/rexam2pl.txt", header=TRUE)
    a2PL <- par2PL[,2]

8) Consider Samejima’s Graded Response Model for an item scored as either 0, 1, or 2. What restriction needs to be placed on the two category threshold parameters for item i (the parameters subscripted i1 and i2)? Justify your answer.
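
For the simulation in question 7, one possible way to generate 0/1 responses is sketched below in R. This is only a sketch under stated assumptions: the item parameters are made-up placeholders (not the estimates in rexam2pl.txt or rexam3pl.txt), the function name simulate_3pl is hypothetical, and the model is written in the logistic metric without a 1.7 scaling constant, which may or may not match how the posted estimates were obtained.

```r
# Hypothetical item parameters for illustration only; these are NOT the
# estimates stored in rexam2pl.txt or rexam3pl.txt.
set.seed(778)
n_examinees <- 5000
a <- c(0.8, 1.2, 1.0, 1.5)         # discriminations (hypothetical)
b <- c(-1.0, 0.0, 0.5, 1.0)        # difficulties (hypothetical)
c_guess <- c(0.2, 0.2, 0.2, 0.2)   # lower asymptotes (hypothetical)

# Simulate a 0/1 response matrix under the 3PL model, assuming
#   P(X = 1 | theta) = c + (1 - c) / (1 + exp(-a * (theta - b))).
# Leaving c at its default of 0 gives the 2PL model.
simulate_3pl <- function(theta, a, b, c = rep(0, length(a))) {
  n <- length(theta)
  k <- length(a)
  # n x k matrix whose (i, j) entry is a_j * (theta_i - b_j)
  z <- sweep(outer(theta, b, "-"), 2, a, "*")
  # apply the lower asymptote column by column
  p <- sweep(sweep(plogis(z), 2, 1 - c, "*"), 2, c, "+")
  # one Bernoulli draw per examinee-item pair (column-major on both sides)
  matrix(rbinom(n * k, size = 1, prob = p), nrow = n, ncol = k)
}

theta <- rnorm(n_examinees)                  # standard normal abilities
x2 <- simulate_3pl(theta, a, b)              # 2PL responses
x3 <- simulate_3pl(theta, a, b, c_guess)     # 3PL responses

# Observed (number-correct) score distributions
scores2 <- rowSums(x2)
scores3 <- rowSums(x3)
print(table(scores2))
print(table(scores3))
```

With the real estimates, the vectors a, b, and c_guess would be replaced by the corresponding columns of the parameter files; which column holds which parameter should be checked against the file headers rather than assumed.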