Update on the Revisions to the Standards for Educational and Psychological Testing: Overview
2010 Annual Meeting of the NCME
Denver, Colorado
May 1, 2010, 4:05 – 6:05 p.m.
Michael Kolen
University of Iowa
Joint Committee Members
• Lauress Wise, Co-Chair, HumRRO
• Barbara Plake, Co-Chair, University of Nebraska
• Linda Cook, ETS
• Fritz Drasgow, University of Illinois
• Brian Gong, NCIEA
• Laura Hamilton, RAND Corporation
• Jo-Ida Hansen, University of Minnesota
• Joan Herman, UCLA
Joint Committee Members
• Michael Kane, ETS
• Michael Kolen, University of Iowa
• Antonio Puente, UNC-Wilmington
• Paul Sackett, University of Minnesota
• Nancy Tippins, Valtera Corporation
• Walter (Denny) Way, Pearson
• Frank Worrell, University of California, Berkeley
Scope of the Revision
• Based on comments each organization received from an invitation to comment
• Summarized by the Management Committee in consultation with the Co-Chairs:
  • Wayne Camara, Chair, APA
  • Suzanne Lane, AERA
  • David Frisbie, NCME
Five Identified Areas for the Revisions
• Access/Fairness
• Accountability
• Technology
• Workplace
• Format issues
Theme Teams
• Working teams
• Cross-team collaborations
• Chapter Leaders
• Focus on bringing content related to themes into chapters in coherent and meaningful ways
Presentation: Five Identified Areas & Discussant
• Fairness – Joan Herman
• Accountability – Laura Hamilton
• Technology – Denny Way
• Workplace – Laurie Wise
• Format and Publication Options – Barbara Plake
• Discussant – Steve Ferrara, NCME Liaison to the JC
Timeline
• First meeting: January 2009
• Three-year process for completing the text of the revision
• Release of draft revision following the December 2010 JC meeting
• Open comment/organization reviews
• Projected publication: Summer 2012
Revision of the Standards for Educational and Psychological Testing: Fairness
2010 Annual Meeting of the NCME
Denver, Colorado
May 1, 2010, 4:05 – 6:05 p.m.
Joan Herman
CRESST/UCLA
Overview
• 1999 Approach to Fairness
• Committee Charge
• Revision Response
1999 Approach
• Standards related to fairness appear throughout many chapters
• Concentrated attention in:
  • Chapter 7: Fairness in Testing and Test Use
  • Chapter 8: Rights and Responsibilities of Test Takers
  • Chapter 9: Testing Individuals of Diverse Linguistic Backgrounds
  • Chapter 10: Testing Individuals with Disabilities
Committee Charge
• Five elements of the charge focused on accommodations/modifications:
  • Impact/differentiation of accommodation and modification
  • Appropriate selection/use for ELL and EWD
  • Attention to other groups, e.g., pre-K, older populations
  • Flagging
  • Comparability/validity
• One element focused on adequacy and comparability of translations
• One element focused on Universal Design
Revision Response
• Fairness is fundamental to test validity: include as a foundation chapter
• Fairness and access are inseparable
• The same principles of fairness and access apply to all individuals, regardless of specific subgroup
• From three chapters to a single chapter describing core principles and standards
• Examples drawn from ELs, EWD, and other groups (young children, aging adults, etc.)
• Comments point to applications for specific groups
• Special standards retained where appropriate (e.g., test translations)
Overview to Fairness Chapter
• Section I: General Views of Fairness
• Section II: Threats to the Fair and Valid Interpretations of Test Scores
• Section III: Minimizing Construct-Irrelevant Components Through the Use of Test Design and Testing Adaptations
• Section IV: The Standards
Four Clusters of Standards
1. Use test design, development, administration, and scoring procedures that minimize barriers to valid test interpretations for all individuals.
2. Conduct studies to examine the validity of test score inferences for the intended examinee population (one common form of such a study is sketched below).
3. Provide appropriate accommodations to remove barriers to the accessibility of the construct measured by the assessment and to the valid interpretation of the assessment scores.
4. Guard against inappropriate interpretations, uses, and/or unintended consequences of test results for individuals or subgroups.
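Cluster 2 calls for studies of the validity of score inferences across the intended population. One familiar form such a study takes is a differential item functioning (DIF) analysis. The sketch below is a minimal Mantel-Haenszel DIF check on synthetic responses; the two-group setup, the generated data, and the choice of studied item are all invented for illustration and are not part of the Standards themselves.

```python
# Minimal Mantel-Haenszel DIF sketch (synthetic data; illustrative only).
# Examinees are matched on their rest score (total score minus the studied
# item), and the MH odds ratio compares the odds of a correct response for
# the reference vs. focal group within each matched stratum.
import numpy as np

rng = np.random.default_rng(3)
n, n_items = 2000, 20
group = rng.integers(0, 2, n)                 # 0 = reference, 1 = focal (hypothetical)
ability = rng.normal(0, 1, n)
difficulty = rng.normal(0, 1, n_items)
p_correct = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
responses = (rng.random((n, n_items)) < p_correct).astype(int)

item = 0                                      # item under scrutiny
matching = responses.sum(1) - responses[:, item]   # rest score

num = den = 0.0
for k in np.unique(matching):
    at_k = matching == k
    ref, foc = at_k & (group == 0), at_k & (group == 1)
    a = responses[ref, item].sum()            # reference group, correct
    b = (1 - responses[ref, item]).sum()      # reference group, incorrect
    c = responses[foc, item].sum()            # focal group, correct
    d = (1 - responses[foc, item]).sum()      # focal group, incorrect
    n_k = a + b + c + d
    if n_k > 0:
        num += a * d / n_k
        den += b * c / n_k

print(f"MH odds ratio for item {item}: {num / den:.2f} (near 1 suggests little DIF)")
```

An odds ratio well away from 1 would flag the item for review by content experts; the statistic alone does not establish bias.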
Revision of the Standards for Educational and Psychological Testing: Accountability
2010 Annual Meeting of the NCME
Denver, Colorado
May 1, 2010, 4:05 – 6:05 p.m.
Laura Hamilton
RAND Corporation
Overview
• Use of tests for accountability has expanded
  • Most notably in education, but also in other areas such as behavioral health
  • Facilitated by increasing availability of data and analysis tools
  • Recent and impending federal and state initiatives will likely lead to further expansion
• Under NCLB, or new pay-for-performance programs, tests often have consequences for individuals other than the examinees
• Use of test scores in policy and program evaluations continues to be widespread
  • Reinforced by groups that fund and evaluate research (e.g., IES, What Works Clearinghouse)
Organization of Accountability Material
• Chapter on policy uses of tests focuses on use of aggregate scores for accountability and policy
• Chapter on educational testing addresses student-level accountability (e.g., promotional gates, high school exit exams) and interim assessment
• Validity, reliability, and fairness standards in earlier chapters apply to accountability testing as well
Some Key Accountability Issues Included in Our Charge
1. Calculation of accountability indices using composite scores at the level of institution or individual
   • Institutional level (e.g., conjunctive and disjunctive rules for combining scores; see the sketch below)
   • Individual level (e.g., teacher value-added modeling)
2. Issues related to validity, reliability, and reporting of individual and aggregate scores
3. Test preparation
4. Interim assessments
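To make the contrast between those two kinds of combination rules concrete, here is a minimal sketch; the subject names and cut scores are hypothetical and not drawn from any particular accountability system.

```python
# Conjunctive vs. disjunctive combination of subject results into a single
# accountability decision (hypothetical subjects and cut scores).
cuts = {"reading": 230, "math": 240}

def conjunctive(scores: dict) -> bool:
    """Meets the target only if EVERY subject clears its cut score."""
    return all(scores[subject] >= cut for subject, cut in cuts.items())

def disjunctive(scores: dict) -> bool:
    """Meets the target if ANY subject clears its cut score."""
    return any(scores[subject] >= cut for subject, cut in cuts.items())

school = {"reading": 245, "math": 235}        # above one cut, below the other
print("conjunctive:", conjunctive(school))    # False: math falls short
print("disjunctive:", disjunctive(school))    # True: reading suffices
```

Because conjunctive rules are stricter, the choice of rule can change which institutions meet a target even when the underlying scores are identical, which is one reason documenting such rules matters.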
1. Accountability Indices
• Most test-based accountability systems require calculation of indices using a complex set of rules
• Advances in data systems and statistical methodology have led to more sophisticated indices intended to support causal inferences
  • E.g., teacher and principal value-added measures (a simplified sketch follows)
  • Consequences attached to these measures are growing increasingly significant
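As a rough illustration of what a value-added measure does, the sketch below fits the simplest possible covariate-adjustment model (current score regressed on prior score plus teacher indicators) to synthetic data. Operational value-added models are far more elaborate, with additional controls and shrinkage estimation; everything here is an invented toy example.

```python
# A highly simplified covariate-adjustment sketch of the kind of model
# behind teacher value-added measures. All data are synthetic.
import numpy as np

rng = np.random.default_rng(0)

n_teachers, n_students = 5, 200
teacher = rng.integers(0, n_teachers, n_students)   # teacher assignment
prior = rng.normal(500, 50, n_students)             # prior-year scale score
true_effect = np.linspace(-10, 10, n_teachers)      # hypothetical teacher effects
current = 0.8 * prior + 120 + true_effect[teacher] + rng.normal(0, 20, n_students)

# Design matrix: intercept, prior score, and teacher indicator contrasts
X = np.column_stack([np.ones(n_students), prior,
                     (teacher[:, None] == np.arange(1, n_teachers)).astype(float)])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)

# Estimated effects relative to teacher 0 (the omitted reference category)
print("estimated effects vs. teacher 0:", np.round(beta[2:], 1))
print("true effects vs. teacher 0:     ", np.round(true_effect[1:] - true_effect[0], 1))
```

Even in this toy setting the estimates are noisy with a few dozen students per teacher, which hints at why the consequences attached to such measures deserve scrutiny.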
2. Validity, Reliability, and Reporting Requirements
• Accountability indices should be subjected to validation related to intended purposes
• Error estimates should be incorporated into score reports, including those that provide subscores and diagnostic guidance for individuals or groups (a minimal example follows this slide)
• Reports should provide clear, detailed information on rules used to create aggregate scores or indices
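One minimal way to meet the error-reporting expectation is to attach a standard error of measurement (SEM) band to each reported score. The sketch below uses the classical SEM formula; the reliability coefficient and scale statistics are hypothetical.

```python
# Classical SEM and an approximate 95% error band for a reported score
# (hypothetical reliability and scale SD).
import math

scale_sd = 40.0       # standard deviation of the reported score scale (assumed)
reliability = 0.91    # e.g., an internal-consistency estimate (assumed)

sem = scale_sd * math.sqrt(1.0 - reliability)

def score_band(score: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% band around an observed score."""
    return (score - z * sem, score + z * sem)

low, high = score_band(512.0)
print(f"Reported score 512, SEM = {sem:.1f}, 95% band = ({low:.0f}, {high:.0f})")
```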
2. Validity, Reliability, and Reporting Requirements, cont.
• Guidance should be provided for interpretation of scores from subgroups
  • Describe exclusion rules, accommodations, and modifications
  • Address error stemming from small subgroups
  • Explain contribution of subgroup performance to accountability index
• Teachers and other users should be given assistance to ensure appropriate interpretation and use of information from tests
3. Test Preparation
• High-stakes testing raises concerns about inappropriate test preparation
• Users should take steps to reduce the likelihood of test preparation that undermines validity
  • Help administrators and teachers understand what kinds of preparation are appropriate and desirable
  • Design tests and testing systems to limit the likelihood of harmful test preparation
• Consequences of accountability policies should be monitored
4. Addressing Interim Assessments
• Interim assessments are common but take many different forms
  • Some produced by commercial publishers, others homegrown
  • Vary in the extent to which they provide formative feedback vs. benchmarking to end-of-year tests
  • Need to determine which of these tests should be subjected to the Standards
• Requirements for validity and reliability depend in part on how scores are used
  • If used for high-stakes decisions such as placement, evidence of validity for that purpose should be provided
  • Systems that provide instructional guidance should include rationale and evidence to support it
Revision of the Standards for Educational and Psychological Testing: Technology
2010 Annual Meeting of the NCME
Denver, Colorado
May 1, 2010, 4:05 – 6:05 p.m.
Denny Way
Pearson
Overview
• Technological advances are changing the way tests are delivered, scored, and interpreted, and in some cases the nature of the tests themselves
• The Joint Committee has been charged with considering how technological advances should affect revisions to the Standards
• As with the other themes, comments on the standards related to technology were compiled by the Management Committee and summarized in their charge to the Joint Committee
Key Technology Issues Included in Our Charge
• Reliability & validity of innovative item formats
• Validity issues associated with the use of:
  • Automated scoring algorithms
  • Automated score reports and interpretations
• Security issues for tests delivered over the internet
• Issues with web-accessible data, including data warehousing
Reliability & Validity of Innovative Item Formats
• What special issues exist for innovative items with respect to access and elimination of bias against particular groups? How might the standards reflect these issues?
• What steps should the standards suggest with regard to “usability” of innovative items?
• What issues will emerge over the next five years related to innovative items/test formats that need to be addressed by the standards?
Automated Scoring Algorithms
• What level of documentation/disclosure is appropriate and tolerable for automated scoring developers/vendors?
• What sorts of evidence seem most important for demonstrating the validity and “reliability” of automated scoring systems? (One familiar kind of evidence is sketched after this slide.)
• What issues will emerge over the next five years related to automated scoring systems that need to be addressed by the standards?
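One familiar piece of such evidence is agreement between machine scores and human ratings. The sketch below computes exact agreement and quadratic weighted kappa, two statistics commonly reported in automated-scoring evaluations; the 0–4 rubric and the synthetic scores are invented for illustration.

```python
# Human-machine agreement sketch for an automated scoring evaluation
# (synthetic ratings on a hypothetical 0-4 rubric).
import numpy as np

def quadratic_weighted_kappa(a: np.ndarray, b: np.ndarray, n_cats: int) -> float:
    """Chance-corrected agreement with quadratic penalties for larger gaps."""
    observed = np.zeros((n_cats, n_cats))
    for i, j in zip(a, b):
        observed[i, j] += 1
    expected = np.outer(observed.sum(1), observed.sum(0)) / observed.sum()
    w = np.array([[(i - j) ** 2 for j in range(n_cats)] for i in range(n_cats)])
    w = w / (n_cats - 1) ** 2
    return 1.0 - (w * observed).sum() / (w * expected).sum()

rng = np.random.default_rng(1)
human = rng.integers(0, 5, 300)              # human ratings, 0-4 rubric
noise = rng.integers(-1, 2, 300)             # machine mostly agrees, off by at most 1
machine = np.clip(human + noise, 0, 4)

print("exact agreement:", np.mean(human == machine).round(2))
print("quadratic weighted kappa:", round(quadratic_weighted_kappa(human, machine, 5), 2))
```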
Expert Panel Input
• To address issues related to innovative item formats and automated scoring algorithms, we convened a panel of experts from the field and solicited their advice
• Invited members made presentations on these topics and discussed the associated issues with the Joint Committee
Highlights of Technology Panel Input
• Test development and simulations
• Rationale / validity argument
• Usability studies / field testing
• Security & Fairness
• Timed tasks & processing speed
• Innovative clinical assessments & faking
(effort assessment)
Highlights of Technology Panel Input
• Disclosure of automated scoring algorithms: differing viewpoints
  • Disclose everything in great detail (use patents to protect proprietary IP) vs. provide sufficient documentation for other experts to confirm the validity of the process
  • Possible compromise: expert review under conditions of nondisclosure
• Quality assurance: importance of “independent calibrations”
Automated Score Reports and Interpretation
• Use of the computer for score interpretation
• “Actionable” reports (e.g., routing students and teachers to instructional materials and lesson plans based on test results)
  • Documentation of rationale
  • Supporting validity evidence
Revision of the Standards for Educational and Psychological Testing: Workplace Testing
2010 Annual Meeting of the NCME
Denver, Colorado
May 1, 2010, 4:05 – 6:05 p.m.
Laurie Wise
Human Resources Research Organization
(HumRRO)
Overview
• Standards for testing in the workplace are currently covered in Chapter 14 (one of the testing application chapters).
• Workplace testing includes employment testing as well as licensure, certification, and promotion testing.
• Comments on standards related to workplace testing were received by the Management Committee and summarized in their charge to the Joint Committee.
• Comments suggested areas for extending or clarifying testing standards, but did not suggest major revisions to existing standards.
Key Workplace Testing Issues Included in Our Charge
1. Validity and reliability requirements for certification and licensure tests.
2. Issues when tests are administered only to small populations of job incumbents.
3. Requirements for tests for new, innovative job positions that do not have incumbents or job history to provide validity evidence.
4. Assuring access to licensure and certification tests for examinees with disabilities that may limit participation in regular testing sessions.
5. Differential requirements for certification and licensure and employment tests.
1. Validity and Reliability Requirements for Certification
• Some specific issues:
  • Documenting and communicating the validity and reliability of pass-fail decisions in addition to the underlying scores (a minimal sketch follows this slide)
  • How cut-offs are determined
  • How validity and reliability information is communicated to relevant stakeholders
• A key change is the need for focus on pass-fail decisions
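A minimal sketch of what reliability of pass-fail decisions (as opposed to scores) can look like: decision consistency, the proportion of examinees classified the same way on two parallel forms. The ability distribution, error SD, and cut score below are all hypothetical.

```python
# Decision-consistency sketch for a pass-fail certification decision
# (synthetic data; hypothetical cut score of 70).
import numpy as np

rng = np.random.default_rng(2)
true_ability = rng.normal(72, 8, 1000)
form_a = true_ability + rng.normal(0, 4, 1000)   # observed score, form A
form_b = true_ability + rng.normal(0, 4, 1000)   # observed score, form B
cut = 70.0

pass_a, pass_b = form_a >= cut, form_b >= cut
consistency = np.mean(pass_a == pass_b)          # same decision on both forms
print(f"decision consistency at cut {cut:.0f}: {consistency:.2f}")
```

Note that consistency depends on where the cut sits relative to the score distribution, which is one reason documenting how cut-offs are determined matters alongside score reliability.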
2. Issues with Small Examinee Populations
• Including:
  • Alternatives to statistical tools for item screening
  • Assuring fairness
  • Assuring technical accuracy
  • Alternatives to empirical validity evidence
  • Maintaining comparability of scores from different test forms
• The key concern is the appropriate use of expert judgment
3. Requirements for New Jobs
• Issues include:
  • Identifying test content
  • Establishing passing scores
  • Assessing reliability
  • Demonstrating validity
• The key here is also the appropriate use of expert judgment
4. Assuring Access to Certification and Licensure Testing
• See also the separate presentation on fairness
• Issues include:
  • Determining appropriate versus inappropriate accommodations
  • Relating testing accommodations to accommodations available in the workplace
5. Certification and Licensure versus Employment Testing
• Currently, two sections in the same chapter
• Examples of relevant issues:
  • Differences in how test content is identified
  • Differences in validation strategies
  • Differences in test score use
  • Who oversees testing
• Goal is to increase coherence in the approach to these two related uses of tests
Revision of the Standards for Educational and Psychological Testing: Format and Publication
2010 Annual Meeting of the NCME
Denver, Colorado
May 1, 2010, 4:05 – 6:05 p.m.
Barbara Plake
University of Nebraska-Lincoln
Format Issues
• Organization of Chapters
• Consideration of ways to identify “Priority Standards”
• More parallelism between chapters
  • Tone
  • Complexity
  • Technical language
Organization of Chapters
• 1999 Testing Standards
  • Three sections:
    • Foundation: Validity, Reliability, Test Development, Scaling & Equating, Administration & Scoring, Documentation
    • Fairness: Fairness, Test Takers’ Rights and Responsibilities, Disabilities, Linguistic Minorities
    • Applications: Test Users, Psychological, Educational, Workplace, Policy
Revised Test Standards: Possible Chapter Organization
• Section 1: Validity, Reliability, Fairness
• Section 2: Test Design and Development, Scaling & Equating, Test Administration & Scoring, Documentation, Test Takers, Test Users
• Section 3: Psychological, Educational, Workplace, Policy and Accountability
Possible Ways to Identify “Priority Standards”
• Clustering of Standards into thematic topics
• Over-arching Standards / Guiding Principles
• Application Chapters
• Connection of standards to previous standards
More Parallelism Across Chapters
• Cross-team collaborations
• Content editor with psychometric expertise
• Structural continuity
Publication Options
• Management Committee responsibility
• Goal is for electronic access
• Pursuing options for Kindle, etc.
• Concerns about retaining integrity and financial support for future revision efforts