COURSE SYLLABUS
Instructional Science 752: Test Theory
Fall 2005
Instructor:
Dr. Richard Sudweeks, email: richard_sudweeks@byu.edu
Office: 150-M McKay Bldg.; Telephone: 422-7078
Hours: 9:00-11:50 a.m. and 1:00-2:00 p.m., Monday and Wednesday
Class Meeting Schedule:
4:00-5:20 p.m., Monday & Wednesday, 359 MCKB
Required Textbooks:
Embretson, S.E. & Reise, S.P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Shavelson, R.J. & Webb, N.M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.
Course Packets:
Course Materials for IPT 752
Supplementary Readings for IPT 752
Course Rationale:
Test theory is a discipline that focuses on explaining how reliable and valid inferences
about unobservable characteristics of persons can be made from their responses to the items on a test.
The discipline is also concerned with studying and resolving practical problems and issues associated
with the development and use of tests and other assessment procedures.
Modern test theory is a composite of classical true-score theory, item response theory, and
generalizability theory. This composite theory has three important uses or applications:
1. It provides criteria for evaluating both the process used to develop tests and the resulting
instruments and procedures, as a basis for deciding how and where they need to be improved.
2. It provides a set of procedures and a rationale for solving practical measurement problems such as
estimating reliability and validity, conducting item analyses, detecting potentially biased items,
equating test scores, and determining appropriate scoring procedures.
3. It provides a rationale for identifying the limitations of the scores obtained from a particular test and
the cautions that should be kept in mind when interpreting the results.
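For orientation, the classical true-score component of this composite models each observed score X as
the sum of a true score T and a random error E; the reliability of a set of scores is then the
proportion of observed-score variance attributable to true scores:

    X = T + E, \qquad \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}

Estimating the true-score variance from observable data is precisely the problem that the reliability
methods treated in this course address.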
Course Objectives:
The purpose of this course is to help students understand modern test theory and to critically examine
current practices and problems in educational and psychological measurement from the perspective of this
theory. Hence, the course is designed to help students become proficient in applying modern test theory to
evaluate and improve existing assessment instruments and procedures, to solve practical measurement
problems, and to interpret test scores in a responsible way.
In addition to this main goal, the course is designed to help students be able to:
1. Distinguish between the classical and modern approaches to test theory, including the assumptions on
which each is based and the implications of each for constructing, scoring, and interpreting tests.
2. Describe the relative advantages and disadvantages associated with classical true-score theory, and the
contributions and limitations of generalizability theory and item response theory.
3. Use classical methods plus item response theory and generalizability theory to solve practical problems in
estimating the reliability of test scores, conducting item analyses, detecting item bias, etc.
4. Understand the current controversies and problems in educational and psychological measurement
and identify lines of inquiry for further research that are likely to be fruitful.
Software:
Students are expected to develop proficiency in using four computer programs:
1. EXCEL (or a similar program with both spreadsheet and graphing capabilities):
This program will be frequently used to provide demonstrations and examples of concepts taught in
this course. Some homework assignments are facilitated by using spreadsheet software. Students who
know how to use a spreadsheet can readily create and examine examples of their own.
An instructional module developed by Robert Miller will be provided to help students become
proficient in using EXCEL. This module includes explanations, demonstrations, examples, and learning
activities. (A sketch of one such spreadsheet-style computation appears after this list.)
2. SPSS (for use in conducting item analysis, reliability studies, and factor analyses).
3. GENOVA (for use in conducting both G studies and D studies using generalizability theory):
Both PC and Macintosh versions of this program, developed by Joe Crick and Robert Brennan, are
available from the authors. GENOVA handles only balanced designs, but a newer version called
urGENOVA handles both balanced and unbalanced designs. The assignments in this class will include
balanced designs only, so GENOVA is sufficient.
Copies of GENOVA are available on some computers in the I.P. & T. graduate student lab in 150
MCKB. The manual is on reserve at the Lee Library, but directions for preparing the control files
needed to complete the homework assignments are included in the Course Packet. (The
variance-component arithmetic GENOVA performs is sketched after this list.)
4. WINSTEPS (for Rasch analyses using the dichotomous, rating scale, or partial-credit models):
This program, developed by Michael Linacre, can handle data sets with up to 10,000 items
and 1,000,000 persons. A small-scale student version called MINISTEP may be downloaded free from
www.winsteps.com/ministep.htm. This reduced version is limited to a maximum of 25 items and 100
cases, but that will be sufficient for the assignments in this class. (The model these programs fit
is sketched after this list.)
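As one illustration of the spreadsheet-style computations mentioned under item 1, the following
minimal sketch (written in Python rather than EXCEL, with a hypothetical score matrix) works out
coefficient alpha, one of the reliability estimates treated in this course; the item variances and
the total-score variance are all a spreadsheet user would need to compute:

    import statistics as stats

    # Hypothetical scores for 5 persons on 4 dichotomous items
    # (rows = persons, columns = items); any rectangular numeric
    # matrix entered in a spreadsheet works the same way.
    scores = [
        [1, 0, 1, 1],
        [1, 1, 1, 0],
        [0, 0, 1, 0],
        [1, 1, 1, 1],
        [0, 1, 0, 0],
    ]
    k = len(scores[0])  # number of items
    item_vars = [stats.variance(col) for col in zip(*scores)]
    total_var = stats.variance([sum(row) for row in scores])

    # Coefficient alpha: (k / (k - 1)) * (1 - sum of item variances / total variance)
    alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
    print(f"coefficient alpha = {alpha:.3f}")  # 0.519 for this matrix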
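Item 3 describes GENOVA's role in G studies and D studies. GENOVA itself is driven by control
files, but for the simplest balanced design the arithmetic it performs can be checked by hand. The
sketch below (Python, not GENOVA syntax; the score matrix is hypothetical) estimates the variance
components for a balanced persons-by-items G study and then uses them in a small D study:

    # Hypothetical ratings of 4 persons on 4 items (balanced p x i design).
    scores = [
        [7, 6, 5, 6],
        [8, 7, 7, 8],
        [4, 5, 3, 4],
        [6, 6, 5, 7],
    ]
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(x for row in scores for x in row) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(col) / n_p for col in zip(*scores)]

    # Sums of squares and mean squares for the two-way crossed design
    # with one observation per cell.
    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = (ss_tot - ss_p - ss_i) / ((n_p - 1) * (n_i - 1))

    # G study: expected-mean-square solutions for the variance components.
    var_p = (ms_p - ms_res) / n_i  # persons (universe-score variance)
    var_i = (ms_i - ms_res) / n_p  # items
    var_res = ms_res               # person-by-item interaction confounded with error

    # D study: generalizability coefficient for a test lengthened to 10 items.
    n_prime = 10
    g_coef = var_p / (var_p + var_res / n_prime)
    print(f"persons: {var_p:.3f}  items: {var_i:.3f}  residual: {var_res:.3f}")
    print(f"generalizability coefficient (10 items): {g_coef:.3f}")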
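Item 4 refers to Rasch analyses. The dichotomous Rasch model fitted by WINSTEPS and MINISTEP has a
compact form that is easy to explore directly. The small Python sketch below (WINSTEPS itself
requires no programming) computes the model probability of a correct response from a person ability
and an item difficulty, both expressed in logits:

    import math

    def rasch_probability(theta: float, b: float) -> float:
        """Dichotomous Rasch model: probability that a person of ability
        theta answers an item of difficulty b correctly (logit scale)."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    # A person located at an item's difficulty has a 50% chance of success;
    # a person one logit above it succeeds about 73% of the time.
    print(rasch_probability(theta=0.0, b=0.0))  # 0.5
    print(rasch_probability(theta=1.0, b=0.0))  # about 0.731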
Course Requirements:
1. Become proficient in using each of the four computer programs described above.
2. Complete all assigned homework exercises.
3. Successfully complete the four examinations: three interim exams and a final exam.
4. Locate two published journal articles of interest to you, each focusing on the use of item response
theory or generalizability theory. Prepare a written summary and critique of each article in which you:
a. State the problem addressed by the study, including any research questions or hypotheses.
b. Describe how the method (IRT or G-theory) was used in this study to address the purpose of the
study.
c. Critique the use of the method in the context of this study:
(1). Was the method appropriate in this context or would another method be more appropriate?
(2). To what extent are the researcher's conclusions supported by the data?
5. Complete a data analysis project involving the use of IRT or G-theory.
Testing and Grading Policy:
Course grades will be determined by performance on the four exams (50%), the two journal article
critiques (15% each = 30%) and the homework exercises (20%). Students are expected to complete the
reading assignments prior to and in preparation for discussion in class. Homework exercises should also
be completed in preparation for review and discussion in class. The examinations will include problems
similar to the problems encountered in the homework. Students who understand the homework exercises
should perform well on the examinations.
Course Schedule:
Dates and topics are listed below, with assigned readings beneath each topic (Text = textbook
chapters; Supplementary = supplementary readings; SRP = Supplementary Readings Packet).

8/28    Course overview and introduction to test theory
        Supplementary: Ghiselli et al., ch. 1
8/30    Basic statistics in educational and psychological measurement
        Supplementary: Miller (2000)
9/8     Test scores as composites
        Text: Crocker & Algina, ch. 5
        Supplementary: Ghiselli et al., ch. 7
9/13    Test dimensionality
        Text: Netemeyer et al., chs. 1-2
        Supplementary: Tate (2002)
9/15    Classical test theory
        Supplementary: Traub (1997; SRP)
9/20    Methods of estimating reliability
        Text: Netemeyer et al., ch. 3
        Supplementary: Traub & Rowley (1991; SRP)
9/22    Coefficient alpha
        Supplementary: Cortina (1993); Miller (1995); Schmitt (1996)
9/27    Estimating true scores
        Supplementary: Harvill (1991; SRP)
9/29    Reliability generalization
        Supplementary: Vacha-Haase (1998); Thompson & Vacha-Haase (2000)
10/4    Classical item analysis
        Text: Crocker & Algina, ch. 14
10/4-10/11    FIRST EXAM
10/6    Introduction to Rasch scaling and item response theory
        Text: Embretson & Reise, chs. 1-3
        Supplementary: Henard (2000)
10/11   Binary and polytomous IRT models
        Text: Embretson & Reise, chs. 4-5
        Supplementary: Harris (1989)
10/13   Scale meaning and properties
        Text: Embretson & Reise, ch. 6
        Supplementary: Hambleton & Jones (1993)
10/18   Measuring persons and calibrating items
        Text: Embretson & Reise, chs. 7-8
10/20   Assessing model fit
        Text: Embretson & Reise, ch. 9
        Supplementary: Smith (2000)
10/25   Using IRT software
        Supplementary: Linacre & Wright (2000)
10/27   Typical applications of Rasch procedures
        Supplementary: Schulman & Wolfe (2000)
10/27-11/3    SECOND EXAM
11/1    Introduction to generalizability theory
        Text: Shavelson & Webb, ch. 1
        Supplementary: Hoyt & Melby (1999)
11/3    Analysis of variance and variance component estimation
        Text: Shavelson & Webb, ch. 2
        Supplementary: Strube (2000)
11/8    G studies with crossed facets
        Text: Shavelson & Webb, ch. 3
        Supplementary: Brennan (1992)
11/10   G studies with nested facets
        Text: Shavelson & Webb, ch. 4
        Supplementary: Webb et al. (1988)
11/15   D studies based on the original G-study design
        Text: Shavelson & Webb, chs. 6-7
11/17   D studies based on alternative G-study designs
        Text: Shavelson & Webb, chs. 8-9
11/22   Using generalizability studies
        Supplementary: Sudweeks et al. (2004)
11/27   G studies and D studies with fixed facets
        Text: Shavelson & Webb, ch. 5
        Supplementary: Strube (2000)
11/27-12/4    THIRD EXAM
11/29   Introduction to validity
        Text: Netemeyer et al., chs. 4-6
        Supplementary: Clark & Watson (1995)
12/1    Additional validity issues
        Supplementary: Bryant (2000); Benson (1998)
12/6    Factor analytic studies of construct validity
        Text: Brace, Kemp & Snelgar, ch. 11
        Supplementary: Thompson & Daniel (1996)
12/14   FINAL EXAM (Tuesday, 7:00-10:00 a.m.)
Supplementary Readings Packet:
Brennan, R.L. (1992). Generalizability theory. Educational Measurement: Issues and Practice, 11
(4), 27-34.
Hambleton, R.K. & Jones, R.W. (1993). Comparison of classical test theory and item response
theory and their applications to test development. Educational Measurement: Issues and
Practice, 12(3), 38-47.
Harris, D. (1989). Comparison of the 1-, 2-, and 3-parameter IRT models. Educational
Measurement: Issues and Practice, 8(1), 35-41.
Harvill, L.M. (1991). Standard error of measurement. Educational Measurement: Issues and
Practice, 10(2), 33-41.
Traub, R.E. & Rowley, G.L. (1991). Understanding reliability. Educational Measurement: Issues
and Practice, 10(1), 37-45.
References for Supplementary Readings:
Complete bibliographic information for each of the Supplementary Readings listed in the
Course Schedule is given below. Copies of the books referenced are available on two-hour reserve
at the Reserve Desk in the Lee Library. Complete copies of the journal articles that are not copyrighted
are included in the Supplementary Readings packet. The other periodical articles are available
either on Electronic Reserve or in the Periodical Room at the Lee Library.
Benson, J. (1998). Developing a strong program of construct validation: A test anxiety example.
Educational Measurement: Issues and Practice, 17(1), 10-17 & 22.
Crocker, L. & Algina, J. (1986). Introduction to classical and modern test theory. New York:
Holt, Rinehart & Winston. [Reserve Desk at Lee Library]
Henard, D.H. (2000). Item response theory. In L.G. Grimm & P.R. Yarnold (Eds.), Reading and
understanding more multivariate statistics (pp. 67-97). Washington, DC: American
Psychological Association. [Reserve Desk at Lee Library]
Hoyt, W.T. & Melby, J.N. (1999). Dependability of measurement in counseling psychology: An
introduction to generalizability theory. Counseling Psychologist, 27, 325-351. [Electronic
Reserve]
Linacre, J.M. & Wright, B.D. (2000). A user's guide to WINSTEPS. Chicago: MESA Press.
[Reserve Desk at Lee Library]
McKinley, R.L. (1989). An introduction to item response theory. Measurement and Evaluation in
Counseling and Development, 22, 37-57. [Electronic Reserve]
Schulman, J.A. & Wolfe, E.W. (2000). Development of a Nutrition Self-Efficacy Scale for
Prospective Physicians. Journal of Applied Measurement, 1, 107-130.
Smith, R.M. (2000). Fit analysis in latent trait measurement models. Journal of Applied
Measurement, 1, 199-218.
Strube, M.J. (2000). Reliability and generalizability theory. In L.G. Grimm & P.R. Yarnold (Eds.),
Reading and understanding more multivariate statistics (pp. 23-66). Washington, DC:
American Psychological Association. [Reserve Desk at Lee Library]
Tate, R. (2002). Test dimensionality. In G. Tindal & T.M. Haladyna (Eds.), Large-scale assessment
programs for all students: Validity, technical adequacy, and implementation (pp. 181-211).
Mahwah, NJ: Erlbaum.
Thompson, B. & Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable.
Educational and Psychological Measurement, 60, 174-195. [Periodical Room at Lee Library]
Traub, R.E. (1997). Classical test theory in historical perspective. Educational Measurement: Issues
and Practice, 16(4), 8-14. [Electronic Reserve]
Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error
affecting score reliability across studies. Educational and Psychological Measurement, 58, 6-20. [Electronic Reserve]
Webb, N.M., Rowley, G.L. & Shavelson, R.J. (1988). Using generalizability theory in counseling
and development. Measurement and Evaluation in Counseling and Development, 21, 81-90.
[Electronic Reserve]
Additional Helpful References
The books listed below are additional sources that are useful supplements for this course. Copies of
most are available at the Lee Library Reserve Desk.
Allen, M.J. & Yen, W.M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
Bond, T.G. & Fox, C.M. (2001). Applying the Rasch model: Fundamental measurement in the
human sciences. Mahwah, NJ: Erlbaum.
Brace, N., Kemp, R., & Snelgar, R. (2003). SPSS for psychologists (2nd ed.). Mahwah, NJ: Erlbaum.
Brennan, R.L. (1992). Elements of generalizability theory (Rev. ed.). Iowa City, IA: American
College Testing Program.
Brennan, R.L. (2001). Generalizability theory. New York: Springer-Verlag.
Crick, J.E. & Brennan, R.L. (1983). Manual for GENOVA: A generalized analysis of variance system.
ACT Technical Bulletin No. 43. Iowa City, IA: American College Testing Program.
Fischer, G.H. & Molenaar, I.W. (Eds.) (1995). Rasch models: Foundations, recent developments, and
applications. New York: Springer.
Ghiselli, E.E., Campbell, J.P. & Zedeck, S. (1981). Measurement theory for the behavioral sciences.
San Francisco: W.H. Freeman.
Gulliksen, H. (1950). Theory of mental tests. New York: John Wiley.
Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response theory.
Newbury Park, CA: Sage Publications.
Linn, R.L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.
McDonald, R.P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum.
Nunnally, J.C. & Bernstein, I.H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
Rust, J. & Golombok, S. (1999). Modern psychometrics: The science of psychological assessment
(2nd ed.). London: Routledge.
Smith, E.V., Jr. & Smith, R.M. (2004). Introduction to Rasch measurement. Maple Grove, MN: JAM Press.
Thompson, B. (2002). Score reliability: Contemporary thinking on reliability issues. Thousand Oaks,
CA: Sage.
Thorndike, R.L. (1982). Applied psychometrics. Boston: Houghton Mifflin.
van der Linden, W.J. & Hambleton, R.K. (Eds.) (1997). Handbook of modern item response theory.
New York: Springer.
RELATED PERIODICALS:
Students are also encouraged to become familiar with the following journals, which often include
articles and research reports related to issues treated in this course. Copies of these journals are available
in the Periodicals Room at the Lee Library.
Applied Measurement in Education
Applied Psychological Measurement
Educational and Psychological Measurement
Educational Measurement: Issues and Practice
Journal of Educational and Behavioral Statistics
Journal of Educational Measurement
Measurement and Evaluation in Counseling and Development