Validity and Validation Methods

Workshop Flow
• The construct of MKT
– Gain familiarity with the construct of MKT
– Examine available MKT instruments in the field
• Assessment Design
– Gain familiarity with the Evidence-Centered Design approach
– Begin to design a framework for your own assessment
• Assessment Development
– Begin to create your own assessment items in line with your
framework
• Assessment Validation
– Learn basic tools for how to refine and validate an assessment
• Plan next steps for using assessments
Assessment Development Process
[Flowchart; stages listed in the order extracted]
• Domain Analysis (Define Test Specs)
• Define Item Template
• Define Item Specs
• Domain Modeling (Design Pattern)
• Develop Pool of Items
• Refine Items
• Refine Items
• Collect/Analyze Validity Data
• Assemble Test
• Document Technical Info
Validity: The Cardinal Virtue of Assessment
• The degree to which empirical evidence and theoretical
rationales support the adequacy and appropriateness of
inferences and actions based on test scores or other
modes of assessment.
– Mislevy, Steinberg, and Almond, 2003
• Validation is a process of accumulating evidence to provide
a scientifically sound validity argument to support the
intended interpretation of test scores
– Standards for Educational and Psychological Testing (AERA/APA/NCME, 1999)
Jargon Note:
• Two kinds of “evidence”
Assessment Reliability
The extent to which an instrument yields consistent, stable, and
uniform results over repeated administrations under the same
conditions each time
Figure obtained from the website: http://www.socialresearchmethods.net/KB/rel&val.htm
Iterative Refinement
Steps of Item Validation (step: method)
1. Expert Panel Review (Formative): alignment and ratings of items
2. Feasibility of items: think-alouds
3. Field testing for psychometric information: testing with a large sample
4. Expert Panel Review (Summative): alignment and ratings of items
1. Expert Panel Review (Formative)
• Are the items aligned with…
– The test specifications?
– Content covered in the curriculum?
– State or national standards?
• Is the complexity level aligned with intended
use (e.g., target population, grade-level)?
• Are the item’s prompts and rubrics aligned?
2. Feasibility of Items (Think-Alouds)
• Does the item make sense to the teacher?
• Does the item elicit the cognitive processes intended?
• Can the item be completed in the available time?
• Can respondents use the diagrams, charts, tables as intended?
• Is the language clear?
• Are there differences in approaches by experts and novices (or teachers exposed or not to the relevant instruction)?
SimCalc Example: Think-Alouds
Proportional Reasoning Problem #3
Expected proportional reasoning:
$\frac{3.5\ \text{white}}{3\ \text{dark}} = \frac{x\ \text{white}}{5\ \text{dark}}$
Found:
Just draw the bars!!
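For reference, the expected proportional approach can be worked through by cross-multiplying. This is only a sketch of the arithmetic implied by the proportion on the slide; the slide itself does not show the solution steps.

```latex
\frac{3.5}{3} = \frac{x}{5}
\;\Rightarrow\; 3x = 3.5 \times 5 = 17.5
\;\Rightarrow\; x = \frac{17.5}{3} \approx 5.83 \text{ white}
```

The think-alouds showed that respondents could bypass this reasoning entirely by drawing the bars, which is the kind of response-process problem think-alouds are meant to surface.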
Conducting Think-Alouds
• Sample
– N: You learn the most from the first 3-6 think-alouds
– Who
• Experts and Novices
• Low, Medium, and High Achievers
• Varying in proficiency in English
• Data capture and analysis
– Data can be extremely rich and can be analyzed at varying levels of detail
• Often sufficient to do real-time note-taking
• Videotaping can be helpful
– Document
• Problems with item clarity (language, graphics)
• Response processes – What strategies are they using?
3. Field Testing
• Item-level concerns
– Are there ceiling or floor effects?
– What is the range of responses we can expect from a variety of teachers?
– Is the amount of variation in responses sufficient to support statistical analysis?
– What is the distribution of responses across distracters?
– Do the items discriminate among teachers performing at different levels?
• Assessment-level concerns
– Are there biases among subgroups?
– Does the assessment have high internal reliability?
– What is the factor structure of the test?
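One simple way to check whether an item discriminates among teachers at different levels is an upper-lower discrimination index: the proportion correct in the top-scoring group minus the proportion correct in the bottom-scoring group. The sketch below is illustrative only; the function name, the use of top and bottom quarters, and the data are assumptions, not part of the workshop materials.

```python
def discrimination_index(item_correct, total_scores, fraction=0.25):
    """Upper-lower discrimination index for one item.

    item_correct: 0/1 score on the item for each respondent.
    total_scores: total test score for each respondent (same order).
    fraction: share of respondents in each comparison group (assumed 25%).
    """
    n = len(total_scores)
    k = max(1, int(n * fraction))
    # Rank respondents from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_low = sum(item_correct[i] for i in low) / k
    p_high = sum(item_correct[i] for i in high) / k
    return p_high - p_low


# Hypothetical data for eight respondents.
item = [0, 0, 0, 1, 0, 1, 1, 1]
totals = [3, 5, 6, 8, 9, 12, 14, 15]
d = discrimination_index(item, totals)
print(d)
```

Values near 0 (or negative) suggest the item does not separate higher and lower performers; values closer to 1 suggest it does.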
Key Item Statistic: Percent Correct
• What percent of people get it correct?
• Gives us a sense of:
– The item difficulty
– The range of responses
• Alerts you to potential problems:
– Floor = roughly 0-10%
– Ceiling = roughly 85-100%
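As a minimal sketch of this statistic, the snippet below computes percent correct per item and flags possible floor or ceiling effects using the rough cutoffs above. The function names, item labels, and response data are hypothetical.

```python
def percent_correct(responses):
    """Percent of respondents answering correctly (responses are 0/1)."""
    return 100.0 * sum(responses) / len(responses)


def flag_item(pct, floor_max=10.0, ceiling_min=85.0):
    """Flag an item using the rough cutoffs: floor 0-10%, ceiling 85-100%."""
    if pct <= floor_max:
        return "floor"
    if pct >= ceiling_min:
        return "ceiling"
    return "ok"


# Hypothetical 0/1 responses from ten teachers on three items.
items = {
    "item_a": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "item_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "item_c": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
}
report = {name: flag_item(percent_correct(r)) for name, r in items.items()}
print(report)
```

Because the cutoffs are described as rough, they are exposed as parameters rather than fixed in the code.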
SimCalc Example: Exploratory Results for Item #20
[Figure: count of teachers who chose each distracter (1-5), by ability level (quartiles 1-4 of total test score); N=179]
SimCalc Example: Exploratory Results for Item #43
[Figure: count of teachers who chose each distracter (Skip, 1-5), by ability level (1-4); N=179]
SimCalc Example: Exploratory Results for Item #6
Response: Count
Correct (12): 160 (70%)
Additive error (8): 42 (18%)
Other: 20 (9%)
Skip: 8 (3%)
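The exploratory results above break down response choices by ability level, defined as quartiles of total test score. A tabulation of that kind could be sketched as follows; the function names and the small data set are hypothetical and are not the SimCalc data.

```python
from collections import Counter


def quartile_groups(total_scores):
    """Assign each respondent to ability level 1-4 by rank on total score."""
    n = len(total_scores)
    order = sorted(range(n), key=lambda i: total_scores[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * 4 // n + 1
    return groups


def distracter_table(choices, groups):
    """Count respondents in each (ability level, chosen option) cell."""
    return Counter(zip(groups, choices))


# Hypothetical data: total scores and the option each respondent chose.
totals = [2, 4, 5, 7, 9, 10, 12, 15]
choices = [3, 3, 1, 3, 2, 2, 2, 2]
groups = quartile_groups(totals)
table = distracter_table(choices, groups)
for (level, option), count in sorted(table.items()):
    print(f"ability level {level}, option {option}: {count}")
```

A distracter chosen mostly by the lowest ability levels is behaving as intended; one chosen often by the highest level may point to an ambiguous item or key.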
Conducting a Field Test
• Test under conditions as close to “real” as possible
– Analogous population of teachers
– Administration conditions
– Formatting
– Scoring
• Gather and use demographic data
• Determine sample size based on
– The number of teachers you can get
– The kinds of statistical analyses you decide to conduct
• e.g., 5-10 respondents per item for fancy statistics
• Can use simple and fancy statistics
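The 5-10 respondents-per-item guideline translates into a target sample range by simple multiplication. The helper below is a hypothetical illustration of that arithmetic, not a substitute for a power analysis, and the 30-item test length is an assumed example.

```python
def sample_size_range(n_items, per_item_low=5, per_item_high=10):
    """Target sample range from the 5-10 respondents-per-item rule of thumb."""
    return n_items * per_item_low, n_items * per_item_high


# Hypothetical 30-item assessment.
low, high = sample_size_range(30)
print(low, high)
```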
Field Testing with Teachers by Mail
• Purchasing national mailing lists of teachers
– http://www.schooldata.com/
– http://www.qeddata.com
• Best practices mailing sequence (Cook et al., 2000)
– An introductory postcard announcing that a survey will be sent
– About a week later, a packet containing the survey
– About two weeks later, a reminder postcard
– About two weeks later, a second packet containing the survey
and a reminder letter
– About three weeks later, a ‘third appeal’ postcard
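The mailing sequence above can be turned into a concrete calendar once a start date is chosen. The sketch below treats each "about N weeks later" as exactly N weeks after the previous mailing, which is a simplifying assumption; the start date and names are hypothetical.

```python
from datetime import date, timedelta

# (mailing, days after the previous mailing); intervals are approximate in the source.
MAILINGS = [
    ("Introductory postcard", 0),
    ("First survey packet", 7),
    ("Reminder postcard", 14),
    ("Second survey packet with reminder letter", 14),
    ("Third-appeal postcard", 21),
]


def mailing_schedule(start):
    """Return (mailing, date) pairs, each offset from the previous mailing."""
    schedule = []
    current = start
    for label, gap_days in MAILINGS:
        current = current + timedelta(days=gap_days)
        schedule.append((label, current))
    return schedule


# Hypothetical start date.
schedule = mailing_schedule(date(2024, 9, 2))
for label, when in schedule:
    print(when.isoformat(), label)
```

Under these assumptions the full sequence spans eight weeks from the first postcard to the final appeal.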
4. Expert Panel Review (Summative)
• Similar questions as in Step 1 (Formative review)
• Same or different panel of experts
• Ratings and alignment collected after items are fully
refined
• Results of summative expert panel review provide
evidence of alignment of items with
standards/curriculum, content validity, and grade-level
appropriateness
• This could be reported in technical documentation
Creating a Validity Argument
• Integrates all evidence into a coherent
account of the degree to which existing
evidence and theory support the
intended interpretation of test scores
For a Sound Validity Argument, at Minimum, Pay Attention to…
Sources of Evidence and Procedures
1. Test content
• Conduct alignment of items to standards/curriculum by content experts
2. Response processes
• Have at least one or two teachers do think-alouds
• Administer test to at least one group
3. Relationships to other variables
If possible, conduct one or more of the following:
• Conduct instructional sensitivity study
• Correlate with existing measures
• Correlate with construct-irrelevant variables
4. Internal structure
• Establish internal reliability (alpha)
• Assess inter-scorer reliability, if there is a rubric
5. Consequences of testing
• Be aware of the limitations of your test; do not go beyond its intended purposes and its intended role on your project
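Internal reliability (alpha) is commonly estimated with Cronbach's alpha, which compares the sum of the item variances to the variance of the total scores. The sketch below is one way to compute it with the standard library; the function name and the score matrix are hypothetical, and population variance is used throughout as an assumption.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 scores: four respondents, three items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

With so few respondents the estimate is only illustrative; a field-test sample would be needed for a usable value.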
Activity #5
Conduct Think-Aloud
Be the observer for your own items!
• Break into groups of 3 and select roles
– 1 interviewer
– 1 interviewee
– 1 observer to complete observation recording sheet
• Select set of 2 items
• Conduct think-alouds. Interviewer and observers take notes on the form in the protocol.
• Repeat two more times, switching roles, with new items.
• Revise your own items.
• Following, we will have a discussion about
– Insights about development of assessment items
– Questions and challenges
Activity #5
Think-Aloud Pointers
• Find out how long problems take to do
• Uncover issues of item clarity and level of difficulty
• Derive a model of the knowledge and thinking that the
students engage in when solving each problem. In
observation notes, describe:
– How problems are solved, focusing on the underlying
knowledge, skills, and structures of item performance
– Actions, thought processes, and strategies
Activity #5
Think-Aloud Pointers
• Interviewers SHOULD
– Prompt the teacher to keep talking
– Ask clarifying questions about what teachers are saying (but not as scaffolding)
• Interviewers SHOULD NOT
– Help teachers in any way during the interview (e.g., no hints, tips, or scaffolding). Be sure to avoid unintentional hints, such as being more encouraging when answers are correct.