Update on the Revisions to the Standards for Educational and Psychological Testing: Overview
2010 Annual Meeting of the NCME, Denver, Colorado
May 1, 2010, 4:05 – 6:05 p.m.
Michael Kolen, University of Iowa

Joint Committee Members
• Lauress Wise, Co-Chair, HumRRO
• Barbara Plake, Co-Chair, University of Nebraska-Lincoln
• Linda Cook, ETS
• Fritz Drasgow, University of Illinois
• Brian Gong, NCIEA
• Laura Hamilton, RAND Corporation
• Jo-Ida Hansen, University of Minnesota
• Joan Herman, UCLA
• Michael Kane, ETS
• Michael Kolen, University of Iowa
• Antonio Puente, UNC-Wilmington
• Paul Sackett, University of Minnesota
• Nancy Tippins, Valtera Corporation
• Walter (Denny) Way, Pearson
• Frank Worrell, University of California, Berkeley

Scope of the Revision
• Based on comments each organization received in response to the invitation to comment
• Summarized by the Management Committee in consultation with the Co-Chairs:
  • Wayne Camara, Chair, APA
  • Suzanne Lane, AERA
  • David Frisbie, NCME

Five Identified Areas for the Revisions
• Access/Fairness
• Accountability
• Technology
• Workplace
• Format issues

Theme Teams
• Working teams
• Cross-team collaborations
• Chapter Leaders
• Focus on bringing content related to the themes into the chapters in coherent and meaningful ways

Presentation: Five Identified Areas & Discussant
• Fairness – Joan Herman
• Accountability – Laura Hamilton
• Technology – Denny Way
• Workplace – Laurie Wise
• Format and Publication Options – Barbara Plake
• Discussant: Steve Ferrara, NCME Liaison to the Joint Committee

Timeline
• First meeting: January 2009
• Three-year process for completing the text of the revision
• Release of draft revision following the December 2010 Joint Committee meeting
• Open comment/organization reviews
• Projected publication: Summer 2012

Revision of the Standards for Educational and Psychological Testing: Fairness
Joan Herman, CRESST/UCLA

Overview
• 1999 approach to fairness
• Committee charge
• Revision response

1999 Approach
• Standards related to fairness appear throughout many chapters
• Concentrated attention in:
  • Chapter 7: Fairness in Testing and Test Use
  • Chapter 8: Rights and Responsibilities of Test Takers
  • Chapter 9: Testing Individuals of Diverse Linguistic Backgrounds
  • Chapter 10: Testing Individuals with Disabilities

Committee Charge
• Five elements of the charge focused on accommodations/modifications:
  • Impact/differentiation of accommodation and modification
  • Appropriate selection/use for ELL and EWD
  • Attention to other groups, e.g., pre-K, older populations
  • Flagging
  • Comparability/validity
• One element focused on adequacy and comparability of translations
• One element focused on Universal Design

Revision Response
• Fairness is fundamental to test validity: include as a foundation chapter
• Fairness and access are inseparable
• The same principles of fairness and access apply to all individuals, regardless of specific subgroup
• From three chapters to a single chapter describing core principles and standards
• Examples drawn from ELs, EWD, and other groups (young children, aging adults, etc.)
• Comments point to applications for specific groups
• Special standards retained where appropriate (e.g., test translations)

Overview of Fairness Chapter
• Section I: General Views of Fairness
• Section II: Threats to the Fair and Valid Interpretations of Test Scores
• Section III: Minimizing Construct-Irrelevant Components Through the Use of Test Design and Testing Adaptations
• Section IV: The Standards

Four Clusters of Standards
1. Use test design, development, administration, and scoring procedures that minimize barriers to valid test interpretations for all individuals.
2. Conduct studies to examine the validity of test score inferences for the intended examinee population (see the sketch after this list).
3. Provide appropriate accommodations to remove barriers to the accessibility of the construct measured by the assessment and to the valid interpretation of the assessment scores.
4. Guard against inappropriate interpretations, use, and/or unintended consequences of test results for individuals or subgroups.
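As one concrete illustration of the validation studies called for in cluster 2, testing programs often run a differential item functioning (DIF) screen that compares item performance across subgroups after matching examinees on overall proficiency. What follows is a minimal Python sketch of a Mantel-Haenszel screen; the data layout, variable names, and simulated demo are hypothetical and are not prescribed by the Standards.

import numpy as np

def mantel_haenszel_odds_ratio(correct, group, total_score):
    """Mantel-Haenszel common odds ratio for one item.

    correct:     0/1 array of item responses
    group:       0/1 array, 1 = focal group, 0 = reference group
    total_score: total test scores used as the matching variable
    """
    correct = np.asarray(correct)
    group = np.asarray(group)
    total_score = np.asarray(total_score)
    num = den = 0.0
    for s in np.unique(total_score):
        at = total_score == s  # stratum of examinees matched at this score
        n = at.sum()
        a = np.sum(at & (group == 0) & (correct == 1))  # reference, right
        b = np.sum(at & (group == 0) & (correct == 0))  # reference, wrong
        c = np.sum(at & (group == 1) & (correct == 1))  # focal, right
        d = np.sum(at & (group == 1) & (correct == 0))  # focal, wrong
        num += a * d / n
        den += b * c / n
    return num / den if den > 0 else float("nan")

# Hypothetical usage with simulated responses for one item
rng = np.random.default_rng(0)
total = rng.integers(0, 11, size=400)   # matching scores 0..10
grp = rng.integers(0, 2, size=400)      # 0 = reference, 1 = focal
p = 0.2 + 0.06 * total                  # same difficulty for both groups
item = (rng.random(400) < p).astype(int)
print(mantel_haenszel_odds_ratio(item, grp, total))  # should be near 1.0

Values near 1.0 suggest the item functions comparably for the two groups once overall proficiency is matched; marked departures flag the item for content review rather than automatic deletion.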
Revision of the Standards for Educational and Psychological Testing: Accountability
Laura Hamilton, RAND Corporation

Overview
• Use of tests for accountability has expanded
  • Most notably in education, but also in other areas such as behavioral health
  • Facilitated by the increasing availability of data and analysis tools
  • Recent and impending federal and state initiatives will likely lead to further expansion
• Under NCLB, or new pay-for-performance programs, tests often have consequences for individuals other than the examinees
• Use of test scores in policy and program evaluations continues to be widespread
  • Reinforced by groups that fund and evaluate research (e.g., IES, What Works Clearinghouse)

Organization of Accountability Material
• Chapter on policy uses of tests focuses on the use of aggregate scores for accountability and policy
• Chapter on educational testing addresses student-level accountability (e.g., promotional gates, high school exit exams) and interim assessment
• Validity, reliability, and fairness standards in earlier chapters apply to accountability testing as well

Some Key Accountability Issues Included in Our Charge
1. Calculation of accountability indices using composite scores at the level of institution or individual
  • Institutional level (e.g., conjunctive and disjunctive rules for combining scores)
  • Individual level (e.g., teacher value-added modeling)
2. Issues related to validity, reliability, and reporting of individual and aggregate scores
3. Test preparation
4. Interim assessments

1. Accountability Indices
• Most test-based accountability systems require calculation of indices using a complex set of rules (simple combination rules are sketched below)
• Advances in data systems and statistical methodology have led to more sophisticated indices to support causal inferences
  • E.g., teacher and principal value-added measures
  • Consequences attached to these measures are growing increasingly significant
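To make the combination rules concrete, here is a minimal Python sketch contrasting conjunctive, disjunctive, and compensatory approaches to turning several subtest scores into a single pass-fail accountability decision. The subject labels, cut scores, and weights are hypothetical, not drawn from the Standards.

def conjunctive(scores, cuts):
    # Pass only if every component meets its own cut score.
    return all(s >= c for s, c in zip(scores, cuts))

def disjunctive(scores, cuts):
    # Pass if any single component meets its cut score.
    return any(s >= c for s, c in zip(scores, cuts))

def compensatory(scores, weights, composite_cut):
    # High scores on one component can offset low scores on another.
    return sum(w * s for w, s in zip(weights, scores)) >= composite_cut

scores = [52, 48, 61]   # e.g., reading, writing, mathematics (hypothetical)
cuts = [50, 50, 50]
print(conjunctive(scores, cuts))                  # False: writing below 50
print(disjunctive(scores, cuts))                  # True: reading meets 50
print(compensatory(scores, [1/3, 1/3, 1/3], 50))  # True: mean is about 53.7

Because the three rules can classify the same score profile differently, reports need to state exactly which rule an index uses, a point taken up in the reporting requirements that follow.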
2. Validity, Reliability, and Reporting Requirements
• Accountability indices should be subjected to validation related to intended purposes
• Error estimates should be incorporated into score reports, including those that provide subscores and diagnostic guidance for individuals or groups
• Reports should provide clear, detailed information on the rules used to create aggregate scores or indices

2. Validity, Reliability, and Reporting Requirements, cont.
• Guidance should be provided for interpretation of scores from subgroups
  • Describe exclusion rules, accommodations, and modifications
  • Address error stemming from small subgroups
  • Explain the contribution of subgroup performance to the accountability index
• Teachers and other users should be given assistance to ensure appropriate interpretation and use of information from tests

3. Test Preparation
• High-stakes testing raises concerns about inappropriate test preparation
• Users should take steps to reduce the likelihood of test preparation that undermines validity
  • Help administrators and teachers understand what kinds of preparation are appropriate and desirable
  • Design tests and testing systems to limit the likelihood of harmful test preparation
• Consequences of accountability policies should be monitored

4. Addressing Interim Assessments
• Interim assessments are common but take many different forms
  • Some produced by commercial publishers, others homegrown
  • Vary in the extent to which they provide formative feedback vs. benchmarking to end-of-year tests
  • Need to determine which of these tests should be subject to the Standards
• Requirements for validity and reliability depend in part on how scores are used
  • If used for high-stakes decisions such as placement, evidence of validity for that purpose should be provided
  • Systems that provide instructional guidance should include a rationale and evidence to support it

Revision of the Standards for Educational and Psychological Testing: Technology
Denny Way, Pearson

Overview
• Technological advances are changing the way tests are delivered, scored, and interpreted, and in some cases the nature of the tests themselves
• The Joint Committee has been charged with considering how technological advances should shape revisions to the Standards
• As with the other themes, comments on the standards related to technology were compiled by the Management Committee and summarized in their charge to the Joint Committee

Key Technology Issues Included in Our Charge
• Reliability and validity of innovative item formats
• Validity issues associated with the use of:
  • Automated scoring algorithms
  • Automated score reports and interpretations
• Security issues for tests delivered over the internet
• Issues with web-accessible data, including data warehousing

Reliability & Validity of Innovative Item Formats
• What special issues exist for innovative items with respect to access and elimination of bias against particular groups? How might the standards reflect these issues?
• What steps should the standards suggest with regard to "usability" of innovative items?
• What issues will emerge over the next five years related to innovative items/test formats that need to be addressed by the standards?

Automated Scoring Algorithms
• What level of documentation/disclosure is appropriate and tolerable for automated scoring developers/vendors?
• What sorts of evidence seem most important for demonstrating the validity and "reliability" of automated scoring systems? (One common kind of evidence is sketched below.)
• What issues will emerge over the next five years related to automated scoring systems that need to be addressed by the standards?
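As one example of agreement evidence for automated scoring, machine scores are commonly compared with human ratings of the same responses, often summarized with quadratic weighted kappa. The Python sketch below is minimal and hypothetical; the 0-4 rubric range and the sample scores are invented for illustration.

import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Chance-corrected agreement for ordinal scores 0..n_categories-1."""
    observed = np.zeros((n_categories, n_categories))
    for a, b in zip(rater_a, rater_b):
        observed[a, b] += 1                 # confusion matrix of score pairs
    observed /= observed.sum()
    # Expected agreement if the two raters were statistically independent
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Quadratic disagreement weights: larger score gaps penalized more
    idx = np.arange(n_categories)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_categories - 1) ** 2
    return 1 - (weights * observed).sum() / (weights * expected).sum()

# Hypothetical check: machine scores vs. one human rater on a 0-4 rubric
human = [3, 2, 4, 1, 3, 0, 2, 2]
machine = [3, 2, 3, 1, 4, 0, 2, 1]
print(quadratic_weighted_kappa(human, machine, n_categories=5))

Exact-agreement rates and score distributions are usually reported alongside such a statistic, since a single summary number can mask systematic machine-human differences.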
Expert Panel Input
• To address issues related to innovative item formats and automated scoring algorithms, we convened a panel of experts from the field and solicited their advice
• Invited members made presentations on these topics and discussed associated issues with the Joint Committee

Highlights of Technology Panel Input
• Test development and simulations
• Rationale/validity argument
• Usability studies/field testing
• Security & fairness
• Timed tasks & processing speed
• Innovative clinical assessments & faking (effort assessment)

Highlights of Technology Panel Input, cont.
• Disclosure of automated scoring algorithms: differing viewpoints
  • Disclose everything in great detail (use patents to protect proprietary IP) vs. provide sufficient documentation for other experts to confirm the validity of the process
  • Possible compromise: expert review under conditions of nondisclosure
• Quality assurance: importance of "independent calibrations"

Automated Score Reports and Interpretation
• Use of the computer for score interpretation
• "Actionable" reports (e.g., routing students and teachers to instructional materials and lesson plans based on test results)
  • Documentation of rationale
  • Supporting validity evidence

Revision of the Standards for Educational and Psychological Testing: Workplace Testing
Laurie Wise, Human Resources Research Organization (HumRRO)

Overview
• Standards for testing in the workplace are currently covered in Chapter 14 (one of the testing application chapters)
• Workplace testing includes employment testing as well as licensure, certification, and promotion testing
• Comments on standards related to workplace testing were received by the Management Committee and summarized in their charge to the Joint Committee
• Comments suggested areas for extending or clarifying testing standards, but did not suggest major revisions to existing standards

Key Workplace Testing Issues Included in Our Charge
1. Validity and reliability requirements for certification and licensure tests
2. Issues when tests are administered only to small populations of job incumbents
3. Requirements for tests for new, innovative job positions that do not have incumbents or job history to provide validity evidence
4. Assuring access to licensure and certification tests for examinees with disabilities that may limit participation in regular testing sessions
5. Differential requirements for certification and licensure tests versus employment tests

1. Validity and Reliability Requirements for Certification
• Some specific issues:
  • Documenting and communicating the validity and reliability of pass-fail decisions in addition to the underlying scores (a minimal consistency check is sketched below)
  • How cut-offs are determined
  • How validity and reliability information is communicated to relevant stakeholders
• A key change is the needed focus on pass-fail decisions
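One way to document the reliability of pass-fail decisions, as distinct from the reliability of the underlying scores, is a decision-consistency estimate. The Python sketch below assumes the simplest design, two parallel forms taken by the same examinees; in practice consistency is often estimated from a single administration with model-based methods, and the cut score and data here are hypothetical.

import numpy as np

def decision_consistency(form_a, form_b, cut):
    """Consistency of pass-fail classifications across two parallel forms."""
    a_pass = np.asarray(form_a) >= cut
    b_pass = np.asarray(form_b) >= cut
    observed = np.mean(a_pass == b_pass)    # raw classification agreement
    # Chance agreement from each form's passing rate (Cohen's kappa)
    pa, pb = a_pass.mean(), b_pass.mean()
    chance = pa * pb + (1 - pa) * (1 - pb)
    kappa = (observed - chance) / (1 - chance)
    return observed, kappa

# Hypothetical scores for eight candidates on two forms, cut score of 70
form_a = [72, 65, 80, 69, 91, 55, 74, 70]
form_b = [70, 68, 78, 72, 88, 58, 69, 73]
print(decision_consistency(form_a, form_b, cut=70))

Reporting decision-level statistics like these alongside score reliability speaks directly to the cut-off and stakeholder-communication issues listed above.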
2. Issues with Small Examinee Populations
• Including:
  • Alternatives to statistical tools for item screening
    • Assuring fairness
    • Assuring technical accuracy
  • Alternatives to empirical validity evidence
  • Maintaining comparability of scores from different test forms
• A key concern is the appropriate use of expert judgment

3. Requirements for New Jobs
• Issues include:
  • Identifying test content
  • Establishing passing scores
  • Assessing reliability
  • Demonstrating validity
• The key here is also the appropriate use of expert judgment

4. Assuring Access to Certification and Licensure Testing
• See also the separate presentation on fairness
• Issues include:
  • Determining appropriate versus inappropriate accommodations
  • Relating testing accommodations to accommodations available in the workplace

5. Certification and Licensure versus Employment Testing
• Currently, two sections in the same chapter
• Examples of relevant issues:
  • Differences in how test content is identified
  • Differences in validation strategies
  • Differences in test score use
  • Who oversees testing
• The goal is to increase coherence in the approach to these two related uses of tests

Revision of the Standards for Educational and Psychological Testing: Format and Publication
Barbara Plake, University of Nebraska-Lincoln

Format Issues
• Organization of chapters
• Consideration of ways to identify "Priority Standards"
• More parallelism between chapters
  • Tone
  • Complexity
  • Technical language

Organization of Chapters
• 1999 Testing Standards: three sections
  • Foundation: Validity; Reliability; Test Development; Scaling & Equating; Administration & Scoring; Documentation
  • Fairness: Fairness; Test Takers' Rights and Responsibilities; Disabilities; Linguistic Minorities
  • Applications: Test Users; Psychological; Educational; Workplace; Policy

Revised Test Standards: Possible Chapter Organization
• Section 1: Validity; Reliability; Fairness
• Section 2: Test Design and Development; Scaling & Equating; Test Administration & Scoring; Documentation; Test Takers; Test Users
• Section 3: Psychological; Educational; Workplace; Policy and Accountability

Possible Ways to Identify "Priority Standards"
• Clustering of standards into thematic topics
• Over-arching standards/guiding principles
• Application chapters
• Connection of standards to previous standards

More Parallelism Across Chapters
• Cross-team collaborations
• Content editor with psychometric expertise
• Structural continuity

Publication Options
• Management Committee responsibility
• Goal is electronic access
• Pursuing options for Kindle, etc.
• Concerns about retaining integrity and financial support for future revision efforts