EDUCATION RESEARCH
MEETS THE GOLD
STANDARD:
STATISTICS, EDUCATION, AND
RESEARCH METHODS AFTER
“NO CHILD LEFT BEHIND”
Mack C. Shelley, II
Iowa State University
mshelley@iastate.edu
Presented at the Joint Statistical Meetings,
August 7-11, 2005, Minneapolis, MN
Background


• This session is meant to help inform the national debate over the role of scientific standards for research in education, particularly as those research standards are influenced by statistical methods and theory.
• This session builds on a National Science Foundation award to me and Brian Hand (University of Iowa).
Background


• The panel is designed to meld research interests in statistics, education, and related disciplines, and to discuss the dramatically changing context of contemporary education research.
• Why, exactly, is the context changing for statistical research in education?
Background

Standards for acceptable research in education are affected greatly by:
• the recent creation of the Institute of Education Sciences in the U.S. Department of Education
• passage of the No Child Left Behind Act of 2001, and
• passage of the Education Sciences Reform Act (H.R. 3801) in 2002
Background

Together, these developments
• have reconstituted federal support for research and dissemination of information in education
• are meant to foster “scientifically valid research,” and
• have established what is referred to as the “gold standard” for research in education.

Background

These and other developments mean that greater emphasis in education research is now placed on
• quantification,
• the use of randomized trials, and
• the selection of valid control groups.
Background

This panel is intended to be part of a sustained and expanded dialogue
• between the statistical community and those who implement the education research agenda
• through a discussion of whether and how to implement the new standards for statistical work in the field of education research

What Is The “Gold Standard”?

U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance
• Identifying and implementing educational practices supported by rigorous evidence: A user friendly guide
• http://www.ed.gov/about/offices/list/ies/news.html#guide
What Is The “Gold Standard”?

This publication emphasizes:
• evidence-based interventions
• educational outcomes that have been found to be effective in randomized controlled trials
• “research’s ‘gold standard’ for establishing what works”
• following patterns of evidence use in medicine and welfare policy
What Is The “Gold Standard”?

The quality of studies needed to establish “strong” evidence requires
• randomized controlled trials that are well-designed and implemented
• a quantity of evidence that spans trials showing effectiveness in two or more typical school settings
  - including a setting similar to that of one’s own schools/classrooms
What Is The “Gold Standard”?

“Possible” evidence may include
• randomized controlled trials whose quality/quantity are good but fall short of “strong” evidence
• and/or comparison-group studies in which the intervention and comparison groups are very closely matched
  - in academic achievement, demographics, and other characteristics
What Is The “Gold Standard”?

Evaluating whether an intervention is backed by “strong” evidence of effectiveness hinges on
• well-designed and well-implemented randomized controlled trials
• demonstrating that there are no systematic differences between intervention and control groups before the intervention
• the use of measures and instruments of proven validity
• “real-world” objective measures of the outcomes the intervention is designed to affect
• attrition of no more than 25% of the original sample
• effect size combined with statistical significance (see the sketch after this list)
• an adequate sample size to achieve statistical significance
• controlled trials implemented in more than one site, in schools that represent a cross-section of all schools
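To make the criterion of effect size combined with statistical significance concrete, here is a minimal sketch in Python that reports both quantities for a simple two-group trial; the simulated scores, group sizes, and seed are illustrative assumptions rather than values taken from the guide.

```python
# Minimal sketch: effect size (Cohen's d) plus a significance test for a
# two-group trial. The scores below are simulated for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
treatment = rng.normal(loc=52.0, scale=10.0, size=120)  # intervention group scores
control = rng.normal(loc=49.0, scale=10.0, size=120)    # control group scores

# Two-sample t-test: statistical significance alone
t_stat, p_value = stats.ttest_ind(treatment, control)

# Cohen's d: mean difference scaled by the pooled standard deviation
n1, n2 = len(treatment), len(control)
pooled_sd = np.sqrt(((n1 - 1) * treatment.var(ddof=1) +
                     (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (treatment.mean() - control.mean()) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}")
```

Reporting d alongside p separates the practical magnitude of an effect from the mere detectability that a large sample can confer.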
No Child Left Behind

Public Law 107–110 [H.R. 1]
• signed into law on January 8, 2002
• “An Act to close the achievement gap with accountability, flexibility, and choice, so that no child is left behind”
• the “No Child Left Behind Act of 2001” (NCLB)
  - established standards for academic assessments in mathematics, reading or language arts, and science
  - multiple up-to-date measures of student academic achievement, including measures that assess higher-order thinking skills and understanding
These requirements for program assessment lead to many opportunities and circumstances for the application of statistical methods.
No Child Left Behind

The research program under NCLB was designed to examine the effect of the assessment and accountability systems on students, teachers, parents, families, schools, school districts, and States, including correlations between such systems and
• student academic achievement
• progress toward meeting the State-defined level of proficiency
• progress toward closing achievement gaps
• changes in course offerings, teaching practices, course content, and instructional material
• teacher, principal, and pupil-services personnel turnover rates
• student dropout, grade-retention, and graduation rates
• students with disabilities
• student socioeconomic status
• level of student English proficiency
• student ethnicity and race
The Education Sciences Reform Act and IES

“The Education Sciences Reform Act”
• “An Act to provide for improvement of Federal education research, statistics, evaluation, information, and dissemination, and for other purposes”
• H.R. 3801, enacted in 2002
• reconstituted federal support for research and dissemination of information in education, to foster “scientifically valid research”
• established the Institute of Education Sciences (IES)
  - replacing the Office of Educational Research and Improvement
  - part of the Department of Education but functioning separately from it
The Education Sciences Reform Act and IES


• IES is the research arm of the Department of Education
• Mission: to expand knowledge and provide information on
  - the condition of education
  - practices that improve academic achievement
  - the effectiveness of Federal and other education programs
• Goal: the transformation of education into an evidence-based field in which decision makers routinely seek out the best available research and data before adopting programs or practices that will affect significant numbers of students
• Consists of
  - Grover J. (Russ) Whitehurst, first Director, since November 2002
  - Office of the Director
  - National Center for Education Research
  - National Center for Education Statistics
  - National Center for Education Evaluation and Regional Assistance
  - National Center for Special Education Research
The Education Sciences Reform Act and IES

H.R. 3801 defined “scientifically based research standards” to
• apply rigorous, systematic, and objective methodology to obtain reliable and valid knowledge relevant to education activities and programs
• present findings and make claims that are appropriate to and supported by the methods that have been employed
The Education Sciences Reform Act and IES

“Scientifically based research” also includes
• employing systematic, empirical methods that draw on observation or experiment
• involving data analyses that are adequate to support the general findings
• relying on measurements or observational methods that provide reliable data
• making claims of causal relationships only in random-assignment experiments or other designs (to the extent such designs substantially eliminate plausible competing explanations for the obtained results)
• ensuring that studies and methods are presented in sufficient detail and clarity to allow for replication or, at a minimum, to offer the opportunity to build systematically on the findings of the research
• obtaining acceptance by a peer-reviewed journal or approval by a panel of independent experts through a comparably rigorous, objective, and scientific review
• using research designs and methods appropriate to the research question posed
The Education Sciences Reform Act and IES

“Scientifically valid education evaluation” means an evaluation that
• adheres to the highest possible standards of quality with respect to research design and statistical analysis
• provides an adequate description of the programs evaluated and, to the extent possible, examines the relationship between program implementation and program impacts
• provides an analysis of the results achieved by the program with respect to its projected effects
• employs experimental designs using random assignment, when feasible, and other research methodologies that allow for the strongest possible causal inferences when random assignment is not feasible
• may study program implementation through a combination of scientifically valid and reliable methods
What Works

What Works Clearinghouse (WWC)
• established in 2002 by IES to provide educators, policymakers, and the public with a central and trusted source of scientific evidence of what works in education
• administered by the U.S. Department of Education, through a contract to a joint venture of the American Institutes for Research and the Campbell Collaboration
• reviews and reports on existing studies of interventions (education programs, products, practices, and policies) in selected topic areas
• applies standards that follow scientifically valid criteria for determining the effectiveness of these interventions
Technical Advisory Group (TAG)
• leading experts in research design, program evaluation, and research synthesis
• advises on the standards for evaluation research reviews
• monitors and informs the methodological aspects of WWC reviews and reports
www.whatworks.ed.gov
What Works - TAG
Dr. Larry V. Hedges, Chairperson, Stella M. Rowley Professor of Education, Psychology,
Public Policy Studies, and Sociology, University of Chicago, and editorial board member of the
American Journal of Sociology, the Review of Educational Research, and Psychological Bulletin.
Dr. Betsy Jane Becker, Professor of Measurement and Quantitative Methods, College of
Education, Michigan State University.
Dr. Jesse A. Berlin, Professor of Biostatistics, University of Pennsylvania School of Medicine,
and Director of Biostatistics at the university's Comprehensive Cancer Center.
Dr. Douglas Carnine, Professor of Education, University of Oregon, and Director of the
National Center to Improve the Tools of Educators.
Dr. Thomas D. Cook, Professor of Sociology, Psychology, Education and Social Policy,
Northwestern University, and Faculty Fellow at the Institute for Policy Research.
Dr. David J. Francis, Professor of Quantitative Methods, Chairman of the Department of
Psychology, and Director of the Texas Institute for Measurement, Evaluation, and Statistics,
University of Houston.
Dr. Robert L. Linn, Distinguished Professor of Education, University of Colorado at Boulder,
and Co-Director of the National Center for Research on Evaluation, Standards, and Student
Testing.
Dr. Mark W. Lipsey, Senior Research Associate, Vanderbilt Institute for Public Policy Studies,
and Director of the Center for Evaluation Research and Methodology.
Dr. David Myers, Senior Fellow, Mathematica Policy Research, and former Director of the U.S.
Department of Education's national evaluation of Upward Bound.
Dr. Andrew C. Porter, Patricia and Rodes Hart Professor of Educational Leadership and Policy
and Director of the Learning Sciences Institute at Vanderbilt University.
Dr. David Rindskopf, Professor of Psychology and Educational Psychology, City University of
New York Graduate Center, and elected Fellow of the American Statistical Association.
Dr. Cecilia E. Rouse, Professor of Economics and Public Affairs, and joint appointee in the
Economics Department and Woodrow Wilson School, Princeton University.
Dr. William R. Shadish, Founding Faculty and Professor of Social Sciences, Humanities, and
Arts at the University of California, Merced.
What Works Current Topics
The What Works Clearinghouse (WWC) prioritizes topics based on the following criteria:
• potential to improve important student outcomes;
• applicability to a broad range of students or to particularly important subpopulations;
• policy relevance and perceived demand within the education community; and
• likely availability of scientific studies.
Specifically, the topics were selected from nominations received through:
• emails from the public;
• meetings and presentations sponsored by the What Works Clearinghouse;
• the What Works Network;
• suggestions presented by senior members of education associations, policymakers, and the U.S. Department of Education; and
• reviews of existing research.
What Works Current Topics
Topics include:
• Math—Curriculum-Based Interventions for Increasing Middle School Math
• Reading—Interventions for Beginning Reading
• Character Education—Comprehensive Schoolwide Character Education Interventions: Benefits for Character Traits, Behavioral, and Academic Outcomes
• Dropout Prevention—Interventions for Preventing High School Dropout
• English Language Learning—Interventions for Elementary School English Language Learners: Increasing English Language Acquisition and Academic Achievement
• Math—Curriculum-Based Interventions for Increasing Elementary School Math
• Early Childhood—Interventions for Improving Preschool Children’s School Readiness
• Delinquent, Disorderly, and Violent Behavior—Interventions to Reduce Delinquent, Disorderly, and Violent Behavior in Middle and High Schools
• Adult Literacy—Interventions for Increasing Adult Literacy
• Peer-Assisted Learning—Peer-Assisted Learning Interventions in Elementary Schools: Reading, Mathematics, and Science Gains
“Does Not Meet Evidence Screens”
Studies may not pass WWC screening requirements for the following reasons:
• Evaluation research design. The study did not meet certain design standards. Study designs that provide the strongest evidence of effects include
  - randomized controlled trials
  - regression discontinuity designs
  - quasi-experimental designs (must use a similar comparison group and have no attrition or disruption problems)
  - single-subject designs
• Topic area definition. The study did not meet the intervention definition developed by the WWC for a particular topic.
• Time period definition (generally, the last 20 years)
• Relevant outcome
  - academic outcomes, not, for example, student self-confidence
  - needs to have only one relevant outcome to pass this screen
  - test reliability or validity
  - a sample or description of relevant test items if a study outcome test is not known or available
• Relevant student sample
A Real Live Current Example
MATHEMATICS AND SCIENCE EDUCATION RESEARCH GRANTS PROGRAM
• CFDA (Catalog of Federal Domestic Assistance) NUMBER: 84.305
• RELEASE DATE: May 6, 2005
• REQUEST FOR APPLICATIONS NUMBER: NCER-06-02, Mathematics and Science Education Research Grants Program
• http://www.ed.gov/about/offices/list/ies/programs.html
• LETTER OF INTENT RECEIPT DATE: September 12, 2005
• APPLICATION RECEIPT DATE: November 3, 2005, 8:00 p.m. Eastern time
A Real Live Current Example
REVIEW CRITERIA FOR SCIENTIFIC MERIT
• Significance
  - Does the applicant make a compelling case for the potential contribution of the project to the solution of an education problem?
  - Does the applicant present a strong rationale justifying the need to evaluate the selected intervention (e.g., does prior evidence suggest that the intervention is likely to substantially improve student learning and achievement)?
• Research Plan
  - Does the applicant present (a) clear hypotheses or research questions, (b) clear descriptions of and strong rationales for the sample, measures (including information on reliability and validity), data collection procedures, and research design, and (c) a detailed and well-justified data analysis plan?
  - Does the research plan meet the requirements described in the section on the Requirements of the Proposed Research?
  - Is the research plan appropriate for answering the research questions or testing the proposed hypotheses?
A Real Live Current Example

Applications under Goal Three (Efficacy and Replication Trials)
• Under Goal Three, the Institute requests proposals to test the efficacy of fully developed interventions that already have evidence of potential efficacy.
• By efficacy, the Institute means the degree to which an intervention has a net positive impact on the outcomes of interest in relation to the program or practice to which it is being compared.
A Real Live Current Example
Methodological requirements
• (i) Sample
  - The applicant should define, as completely as possible, the sample to be selected and the sampling procedures to be employed for the proposed study. Additionally, the applicant should describe strategies to ensure that participants will remain in the study over the course of the evaluation.
A Real Live Current Example

(ii) Design
• Applicants should describe how potential threats to internal and external validity will be addressed.
• Studies using randomized assignment to treatment and comparison conditions are strongly preferred.
• When a randomized trial is used, the applicant should clearly state the unit of randomization (e.g., students, classroom, teacher, or school).
• Choice of randomizing unit or units should be grounded in a theoretical framework.
• Applicants should explain the procedures for assignment of groups (e.g., schools, classrooms) or participants to treatment and comparison conditions; a minimal sketch of cluster-level assignment appears below.
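To illustrate what stating the unit of randomization can mean in practice, the sketch below randomly assigns whole schools, rather than individual students, to conditions; the school names and counts are hypothetical placeholders, and the RFA does not prescribe any particular assignment code.

```python
# Minimal sketch: random assignment at the cluster (school) level,
# so that every student in a school shares that school's condition.
# School names and the 20-school count are hypothetical.
import random

schools = [f"School_{i:02d}" for i in range(1, 21)]

random.seed(2005)
random.shuffle(schools)
half = len(schools) // 2
assignment = {school: "treatment" for school in schools[:half]}
assignment.update({school: "comparison" for school in schools[half:]})

for school in sorted(assignment):
    print(school, assignment[school])
```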
A Real Live Current Example
(ii) Design (continued)
• Only in circumstances in which a randomized trial is not possible may alternatives that substantially minimize selection bias, or allow it to be modeled, be employed. Applicants … must make a compelling case that randomization is not possible.
• Acceptable alternatives include appropriately structured regression-discontinuity designs or other well-designed quasi-experimental designs that come close to true experiments in minimizing the effects of selection bias on estimates of effect size.
A Real Live Current Example
(ii) Design (continued)
• A well-designed quasi-experiment substantially reduces the potential influence of selection bias on membership in the intervention or comparison group. This involves:
  - demonstrating equivalence between the intervention and comparison groups at program entry on the variables measuring program outcomes (e.g., math achievement test scores), or obtaining such equivalence through statistical procedures such as propensity score balancing or regression (a sketch of propensity score balancing follows this list)
  - demonstrating equivalence or removing statistically the effects of other variables on which the groups may differ and that may affect intended outcomes of the program being evaluated (e.g., demographic variables, experience and level of training of teachers, motivation of parents or students)
  - a design for the initial selection of the intervention and comparison groups that minimizes selection bias or allows it to be modeled
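One common route to the statistical equivalence described above is propensity score balancing. The following minimal sketch estimates propensity scores by logistic regression and matches comparison units to intervention units on those scores; all data are simulated, and nearest-neighbor matching is just one of several defensible balancing choices.

```python
# Minimal sketch: propensity score estimation and nearest-neighbor matching
# to balance intervention and comparison groups on baseline covariates.
# All data are simulated for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
# Hypothetical baseline covariates: prior achievement, teacher experience
X = np.column_stack([rng.normal(50, 10, n), rng.normal(8, 3, n)])
# Treatment uptake depends on prior achievement, so groups start non-equivalent
treated = (rng.random(n) < 1 / (1 + np.exp(-(X[:, 0] - 50) / 10))).astype(int)

# Estimate each unit's propensity score: P(treated | covariates)
model = LogisticRegression().fit(X, treated)
pscore = model.predict_proba(X)[:, 1]

# Match each treated unit to the comparison unit with the closest score
treated_idx = np.where(treated == 1)[0]
control_idx = np.where(treated == 0)[0]
matches = [control_idx[np.argmin(np.abs(pscore[control_idx] - pscore[i]))]
           for i in treated_idx]

# Check balance: baseline means should be closer after matching
print("pre-match control mean :", X[control_idx, 0].mean().round(2))
print("matched control mean   :", X[matches, 0].mean().round(2))
print("treated mean           :", X[treated_idx, 0].mean().round(2))
```

The printed means show how matching pulls the comparison group's baseline achievement toward the treated group's, which is exactly the entry-equivalence the RFA asks applicants to demonstrate.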
A Real Live Current Example

(iii) Power
• Applicants should clearly address the power of the evaluation design to detect a reasonably expected and minimally important effect.
• For determining the sample size, applicants need to consider the number of clusters, the number of individuals within clusters, the potential adjustment from covariates, the desired effect, the intraclass correlation (i.e., the variance between clusters relative to the total variance between and within clusters), the desired power of the design, one-tailed vs. two-tailed tests, repeated observations, attrition of participants, etc. (a sketch of such a calculation follows this list).
• Applicants should anticipate the degree to which the magnitude of the expected effect may vary across the primary outcomes of interest.
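To make the clustering adjustment concrete, here is a minimal sketch of a sample-size calculation that inflates a simple two-group formula by the design effect 1 + (m - 1) * ICC; the effect size, ICC, and cluster size used are illustrative assumptions, not values from the RFA.

```python
# Minimal sketch: sample size for a cluster-randomized trial, adjusting a
# simple two-group calculation by the design effect 1 + (m - 1) * ICC.
# Effect size, ICC, and cluster size below are illustrative assumptions.
from scipy import stats

effect_size = 0.25   # minimally important standardized effect (assumed)
alpha, power = 0.05, 0.80
icc = 0.15           # intraclass correlation (assumed)
m = 25               # students per classroom/school cluster (assumed)

# Per-group n for individual randomization (normal-approximation formula)
z_alpha = stats.norm.ppf(1 - alpha / 2)   # two-tailed test
z_beta = stats.norm.ppf(power)
n_individual = 2 * ((z_alpha + z_beta) / effect_size) ** 2

# Inflate by the design effect to account for clustering
design_effect = 1 + (m - 1) * icc
n_clustered = n_individual * design_effect
clusters_per_group = n_clustered / m

print(f"per-group n, individual randomization: {n_individual:.0f}")
print(f"design effect: {design_effect:.2f}")
print(f"per-group n, cluster randomization: {n_clustered:.0f}"
      f" (~{clusters_per_group:.0f} clusters of {m})")
```

Under these assumptions the design effect is 4.6, so a trial powered for about 250 students per group under individual randomization needs roughly 1,150 students (about 46 clusters of 25) per group when whole classrooms are assigned.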
A Real Live Current Example

(iv) Measures
• Investigators should include
  - relevant standardized measures of student achievement (e.g., standardized measures of mathematics achievement)
  - other measures of student learning and achievement (e.g., researcher-developed measures)
  - measures of teacher practices
  - information on the reliability, validity, and appropriateness of proposed measures

A Real Live Current Example

(v) Fidelity of implementation of the intervention
• The applicant should
  - specify how the implementation of the intervention will be documented and measured
  - either indicate how the intervention will be maintained consistently across multiple groups (e.g., classrooms and schools) over time or describe the parameters under which variations in the implementation may occur
  - propose research designs that permit the identification and assessment of factors impacting the fidelity of implementation
A Real Live Current Example



(vi) Comparison group, where applicable
• The applicant should
  - describe strategies to avoid contamination between treatment and comparison groups
  - include procedures for describing practices in the comparison groups
  - be able to compare intervention and comparison groups on the implementation of key features of the intervention
• Using a business-as-usual comparison group is acceptable
  - applicants should specify the treatment or treatments received in the comparison group
  - applicants should account for the ways in which what happens in the comparison group is important to understanding the net impact of the experimental treatment
A Real Live Current Example


(vii) Mediating and moderating variables
• Mediating and moderating variables that are measured in the intervention condition and are also likely to affect outcomes in the comparison condition should be measured in the comparison condition (e.g., student time-on-task, teacher experience/time in position).
• The evaluation should account for sources of variation in outcomes across settings (i.e., to account for what might otherwise be part of the error variance).
(viii) Data analysis
• specific statistical procedures should be described
• the relation between hypotheses, measures, and independent and dependent variables should be clear
• the effects of clustering must be accounted for in the analyses, even when individuals are randomly assigned to condition (a minimal sketch of one such analysis follows)
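One standard way to account for clustering in the analysis itself is a mixed-effects model with a random intercept for each cluster. The sketch below fits such a model with statsmodels on simulated classroom data; the variable names and simulated effects are illustrative assumptions, not the analysis the RFA mandates.

```python
# Minimal sketch: accounting for clustering with a random-intercept
# mixed-effects model. Data are simulated: students nested in classrooms,
# with treatment assigned at the classroom level.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_classrooms, n_students = 40, 25
rows = []
for c in range(n_classrooms):
    treat = c % 2                       # half the classrooms treated
    class_effect = rng.normal(0, 3)     # shared classroom-level variation
    for _ in range(n_students):
        rows.append({"classroom": c,
                     "treat": treat,
                     "score": 50 + 2.5 * treat + class_effect
                              + rng.normal(0, 10)})
df = pd.DataFrame(rows)

# Random intercept per classroom absorbs the between-cluster variance,
# so the treatment standard error is not understated
model = smf.mixedlm("score ~ treat", df, groups=df["classroom"])
result = model.fit()
print(result.summary())
```

Ignoring the classroom random effect here would understate the standard error of the treatment coefficient and overstate its significance, which is precisely the error the RFA warns against.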