INTRODUCTION TO TEST THEORY
Education 252
Spring 2004
MWF 9:00-10:50, School of Education Building, room 206
Instructor:
Edward Haertel
haertel@stanford.edu
office: e331
650-725-1251
home:
650-494-6432
(Office hours by arrangement, individual or small group)
This course introduces classical test theory, including definitions and formulas for test reliability,
standard error of measurement, and related statistics. Additional topics include test validity, item
statistics useful in test construction, score scales and norms commonly used in educational testing,
item bias and test bias, and ideas of fairness and equity in educational and psychological testing.
Factor analysis is briefly introduced, as are the major extensions of and alternatives to classical test
theory: generalizability theory and item response theory (latent trait theory).
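As a rough illustration of two of the statistics named above (not drawn from the course materials), the following Python sketch computes coefficient alpha, one common estimate of test reliability, and the corresponding standard error of measurement. The function names and the small response matrix are hypothetical, and population variances (dividing by N) are an assumption of this sketch.

```python
from statistics import pstdev, pvariance


def coefficient_alpha(scores):
    """Coefficient alpha for a list of examinee rows of item scores."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(scores, reliability):
    """SEM = SD of total scores * sqrt(1 - reliability)."""
    total_sd = pstdev([sum(row) for row in scores])
    return total_sd * (1 - reliability) ** 0.5


# Hypothetical data: 4 examinees by 3 dichotomously scored items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

alpha = coefficient_alpha(responses)  # 0.75
sem = standard_error_of_measurement(responses, alpha)
print(alpha, sem)
```

Alpha itself is unchanged if sample variances (dividing by N - 1) are used consistently, but the standard error of measurement depends on which standard deviation is chosen.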
This course is intended to equip students to read the literature in their own substantive areas more
critically, to use tests more intelligently in research, and to pursue further studies in psychometrics.
It is prerequisite to Education 353A (Item Response Theory) and Education 353C (Generalizability
Theory). Although the focus of the course is on paper-and-pencil measures of cognitive abilities and
academic achievement, most of the concepts and methods developed apply equally to performance
testing, as well as the assessment of attitudes and personality constructs, ratings based on systematic
observations, and other kinds of assessments of individuals or groups.
Materials
The text for the course is Introduction to Classical and Modern Test Theory,
by Linda Crocker and James Algina (1986).
There is also a collection of "programmed instruction" materials for this course, which provide a
systematic introduction to and practice in using the notation, statistical concepts, and algebraic
manipulations required. Various authors, primarily L. J. Cronbach, have contributed to these
materials over the years. In 1982, the teaching assistant for this course, David King, retyped and
extensively revised these materials to standardize their format, correct minor errors, and provide
coverage of some topics that had been omitted. Some students have found these materials quite
helpful. Two copies are on reserve in Cubberley Library. Anyone wishing to purchase a copy (at
cost, about $12) should talk with me or send me an email. Use of the PI materials is optional.
References to the PI materials are indicated on the course schedule.
Students are also encouraged to pursue the readings listed in the attached bibliography. These are
intended both to provide additional review and practice for students especially insecure about their
statistical preparation, and to offer more thorough coverage of selected topics, for those who wish to
go beyond what is provided in the text or who are concerned with specific kinds of applications of
measurement theory. In previous years, optional readings have been placed on reserve. The library
has found, however, that the demand for optional readings is generally low, and that the restricted
loan period for materials on reserve inconveniences those students who do make use of them. For
the past several years, I have followed the library's recommendation in not placing these materials on
reserve, and students have not reported any difficulties. If access to or competition for these
materials should pose any problem, please let me know, and I'll try to locate additional copies for
you to use.
Course Meetings, Requirements, and Grading
Attendance is expected at lectures, 9:00-10:50. Regular meetings are on Mondays and
Wednesdays when possible, but the AERA and NCME Annual Meetings and other commitments
necessitate one regular class meeting on Friday, April 9. I have found that scheduled office
hours do not provide sufficient flexibility to accommodate students' varied schedules, which is why I
have listed "office hours by arrangement, individual or small group." Please do not hesitate to send
me an email to set up a time to meet if necessary. Early in the quarter, on some Fridays when there
is no lecture scheduled, I plan to conduct optional discussion sections for students wishing extra help
or practice with course material.
The course is listed as "3-4 units." For most students, this should be a four-unit course, but
enrollment for three units is offered as a courtesy for those with limited tuition grants or stipends.
Enrollment for 3 or 4 units makes no difference in the work expected.
The satisfactory/no credit (+/NC) option is available for this course.
Brief in-class quizzes and homework exercises will be assigned to provide practice and to check
comprehension. Completion of these quizzes and exercises is required, but grades will be based
solely on the take-home midterm and final examinations, with more weight on the final.
This course is not designed for students with exceptionally strong mathematical and statistical
preparation, but such students may nonetheless desire a systematic introduction to measurement
theory. If you fall into this category, please see me individually to arrange some special project in
connection with the course, such as a brief paper on some measurement topic, a critique of
measurement methods in one or more pieces of published research, a critique of the measurement of
some psychological construct, or a review of one or more published tests. I will be happy to arrange
one or more additional units of credit for such special projects in conjunction with the course, if you
wish. Small groups of students are also welcome to undertake such projects.
Students with Documented Disabilities
Students who have a disability that may necessitate an academic accommodation or the use of
auxiliary aids and services in a class must initiate the request with the Disability Resource Center
(DRC). The DRC will evaluate the request with required documentation, recommend appropriate
accommodations, and prepare a verification letter dated in the current academic term in which the
request is being made. Please contact the DRC as soon as possible; timely notice is needed to
arrange for appropriate accommodations. The DRC is located at 563 Salvatierra Walk
(phone 723-1066; TDD 725-1067).
Tentative Course Schedule
Date      Topic                                           Readings (a)
W 3/31    Introduction and Overview                       C&A 3-15; PI 1-39
F 4/2     No class
M 4/5     Properties of composite scores                  C&A 16-50, 60-64, 87-101; PI 40-111;
                                                          WE&C 56-73, Ch. 12
W 4/7     The classical (weak) true-score model           C&A 36-42 (review), 105-30; PI 112-133, §§ B, Q;
                                                          WE&C 178-192; Gulliksen Chs. 2, 3
F 4/9     Estimating reliability (1)                      C&A 131-43; PI C1-C11; Feldt & Brennan
                                                          105-113; Gulliksen Chs. 6-7
M 4/12    AERA/NCME Annual meetings: Class cancelled
W 4/14    AERA/NCME Annual meetings: Class cancelled
F 4/16    AERA/NCME Annual meetings: Class cancelled
M 4/19    Estimating reliability (2); stratified alpha    Feldt & Brennan 113-118; Feldt (1990)
W 4/21    Useful applications (1)                         C&A 143-56; Gulliksen Chs. 8, 10
F 4/23    Optional discussion section
M 4/26    Useful applications (2); rel. of group means    Feldt & Brennan §§ 2.7.1, 2.8.8
W 4/28    Conditional standard errors of measurement      Feldt & Brennan § 2.8.3
F 4/30    Optional discussion section
M 5/3     Generalizability theory                         C&A 157-91; Shavelson & Webb
W 5/5     Validity (1)                                    C&A 217-42; PI C11-C17; Campbell
                                                          & Fiske (1959); APA Standards
F 5/7     Optional discussion section
M 5/10    Validity (2); MIDTERM DUE                       Messick (1989, 1995)
W 5/12    Factor Analysis                                 C&A 287-308
F 5/14    No class
M 5/17    Test construction and item analysis             C&A 66-86, 311-38; Haertel (1985);
                                                          Millman & Greene; WE&C 196-203
W 5/19    Scores, scales, and norms                       C&A 399-409, 431-32, 438-55;
                                                          Petersen, Kolen, & Hoover
F 5/21    No class
M 5/24    Item Bias, Test Bias, & Equity                  C&A 267-78, 283-85; Cronbach (1976)
W 5/26    Item response theory                            C&A 339-50, 352-54, 361-71
F 5/28    No class
M 5/31    Memorial Day Observed (University Holiday)
W 6/2     Testing and Educational Policy
W 6/9     TAKE-HOME FINAL EXAMINATION DUE, 12:00 noon
_________________________________________________________________________
(a) "C&A" refers to Crocker and Algina (1986) and "WE&C" to Welkowitz, Ewen, and Cohen (2000). "PI" refers to the
"Programmed Instruction" materials available as a course reader. Written examinations will cover both lectures and reading
assignments from C&A. References to PI and WE&C are to provide additional explanatory material for students who lack a strong
statistical background. The remaining readings treat specific topics in greater depth.
References
American Educational Research Association (AERA), American Psychological Association (APA), and
the National Council on Measurement in Education (NCME). (1999). Standards for educational and
psychological testing. Washington, D.C.: American Psychological Association.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the
multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: CBS
College Publishing.
Cronbach, L. J. (1976). Equity in selection: When psychometrics and political philosophy meet. Journal of
Educational Measurement, 13, 31-41.
Feldt, L. S. (1990). The sampling theory for the intraclass reliability coefficient. Applied Measurement in
Education, 3, 361-367.
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed.,
pp. 105-146). New York: Macmillan.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Haertel, E. H. (1985). Construct validity and criterion-referenced testing. Review of Educational Research,
55, 23-46.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New
York: Macmillan.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons'
responses and performances as scientific inquiry into score meaning. American Psychologist, 50,
741-749.
Millman, J., & Greene, J. (1989). The specification and development of tests of achievement and ability.
In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 335-366). New York: Macmillan.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.),
Educational measurement (3rd ed., pp. 221-262). New York: Macmillan.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.
Welkowitz, J., Ewen, R. B., & Cohen, J. (2000). Introductory statistics for the behavioral sciences (5th
ed.). Orlando: Harcourt Brace College.