STAT 778 / EDRM 828 – Exam 1

Due: Tuesday, February 14th at the start of class



You may not consult or receive help from any other students or faculty on this assignment.
You may ask me for clarification on the questions or for advice with using R.
Include copies of any computer code used and show any work done by hand. You may use any software
you desire. If you are using a menu-driven program, say which program and what options you used.
The data set http://www.stat.sc.edu/~habing/courses/data/rexam.txt is used throughout this
assignment. These 21 items are an excerpt from a reading exam given at a midwestern university. While the
exam was designed to measure reading, the questions are based around four reading passages about different
subjects. Questions 1-5 are based on passage 1, 6-10 on passage 2, 11-15 on passage 3, and 16-21 on passage
4. The estimated parameters from fitting the 2PL and 3PL models to this data set can be found at
http://www.stat.sc.edu/~habing/courses/data/rexam2pl.txt and
http://www.stat.sc.edu/~habing/courses/data/rexam3pl.txt respectively.
1) Provide an estimate of how long a reading exam like this would need to be in order to be reliable enough for
making high stakes decisions about individual students. Briefly justify your choice of any estimates used and
state any simplifying assumptions made.
2) Imagine that a newer version of the exam were to have only four questions for each passage. If this new
version was designed to evaluate students of all ability levels, which of items 1-5 would you remove? Briefly
justify your choice.
3) A test is said to be “speeded” when students do not have enough time to think about their answers to the
questions near the end of the exam. Why does or doesn’t speededness appear to be a problem for this exam?
4) The structure of this exam should make you question whether one or more of the assumptions of the
monotone homogeneity model actually holds. Which assumption(s) and why?
5) Assume the 3PL model fits for this data set. Which item is estimated to be easiest for examinees with ability
level θ=1, and what is the estimated (total) true score for examinees with θ=1?
6) In general, does the 3PL model make more sense when thought of using the platonic/random sampling
formulation of testing or using the expected value/stochastic subject formulation? Explain your choice.
7) Simulate a data set of 5,000 examinees using the estimated 2PL item parameters and another using the
estimated 3PL item parameters (assume a standard normal distribution for the underlying abilities). Using the
distribution of observed scores, briefly argue whether this test seems to have had guessing or not.
Code like the following may be useful for question 7:
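# Read the estimated 2PL item parameters (the file has a header row)
par2PL <- read.table("http://www.stat.sc.edu/~habing/courses/data/rexam2pl.txt", header=TRUE)
a2PL <- par2PL[,2]   # the a parameters, taken from column 2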
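Once the item parameters are in vectors, the simulation step for question 7 could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `sim_irt`, the scaling argument `D`, and the three-item parameter values are hypothetical and are not taken from the posted files, so substitute the estimated parameters and check which scaling the estimates use.

```r
# Simulate 0/1 responses under the 3PL model; leaving c at 0 gives the 2PL.
# D is a scaling constant (1 = logistic metric, 1.7 = normal-ogive approximation).
# Which value matches the posted estimates is an assumption to verify.
sim_irt <- function(n, a, b, c = rep(0, length(a)), D = 1) {
  theta <- rnorm(n)                      # standard normal abilities
  z <- D * outer(theta, b, "-")          # n x items matrix of (theta - b_j)
  z <- sweep(z, 2, a, "*")               # multiply column j by a_j
  p <- plogis(z)                         # 2PL probabilities
  p <- sweep(p, 2, 1 - c, "*")           # scale by (1 - c_j)
  p <- sweep(p, 2, c, "+")               # add the lower asymptote c_j
  u <- matrix(runif(length(p)), nrow = n)
  (u < p) * 1L                           # 1 = correct, 0 = incorrect
}

set.seed(778)
a  <- c(0.8, 1.2, 1.5)   # hypothetical discriminations
b  <- c(-1, 0, 1)        # hypothetical difficulties
c3 <- c(0.2, 0.2, 0.2)   # hypothetical lower asymptotes

resp2 <- sim_irt(5000, a, b)        # 2PL data
resp3 <- sim_irt(5000, a, b, c3)    # 3PL data

# Observed-score distributions for the two simulated data sets
table(rowSums(resp2))
table(rowSums(resp3))
```

The two tables can then be compared with the observed-score distribution from the real data, for example with side-by-side bar plots.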
8) Consider Samejima’s Graded Response Model for an item scored as either 0, 1, or 2. What restriction needs
to be placed on the item’s two category threshold parameters (those subscripted i1 and i2)? Justify your answer.
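For question 8, it may help to compute the category probabilities directly. The sketch below assumes the usual logistic form of the Graded Response Model with no 1.7 scaling constant; the function name `grm_probs`, the argument names `a`, `b1`, and `b2`, and the numeric values are hypothetical choices for illustration only.

```r
# Category probabilities for a three-category (0, 1, 2) graded response item.
# a is the discrimination; b1 and b2 are the two threshold parameters.
grm_probs <- function(theta, a, b1, b2) {
  p_ge1 <- plogis(a * (theta - b1))   # P(X >= 1 | theta)
  p_ge2 <- plogis(a * (theta - b2))   # P(X >= 2 | theta)
  cbind(X0 = 1 - p_ge1,               # P(X = 0 | theta)
        X1 = p_ge1 - p_ge2,           # P(X = 1 | theta)
        X2 = p_ge2)                   # P(X = 2 | theta)
}

probs <- grm_probs(theta = c(-2, 0, 2), a = 1.2, b1 = -0.5, b2 = 0.75)
probs
rowSums(probs)   # each row sums to 1
```

Trying other values of `b1` and `b2` and inspecting the three columns is one way to explore what the model requires of them.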