Validity

Reliability or Validity
Reliability gets more attention:
 Easier to understand
 Easier to measure
 More formulas (like stats!)
 Base for validity
Need for validity
Does test measure what it claims?
 Can test be used to make decisions?

Validity
Reliability is a necessary, but not a sufficient
condition for validity.
Validity: a definition
“A test is valid to the extent that inferences
made from it are appropriate, meaningful,
and useful”
Standards for Educational and Psychological Testing, 1999
“Face Validity”
“looks good to me!”
Trinitarian view of Validity
Content (meaning)
 Construct (meaning)
 Criterion (use)

1) Content Validity
“How adequately a test samples behaviors
representative of the universe of behaviors
the test was designed to measure.”
Determining Content Validity
Describe the domain
 Specify areas to be measured
 Compare test to domain
Content Validity Ratio (CVR)
Agreement among raters if item is:
 Essential
 Useful but not essential
 Not necessary
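The CVR can be computed with Lawshe's formula, CVR = (n_e − N/2) / (N/2), where n_e is the number of raters calling the item "essential" and N is the total number of raters. A minimal sketch (the rater counts are made up):

```python
def content_validity_ratio(n_essential, n_raters):
    """Lawshe's CVR: (n_e - N/2) / (N/2); ranges from -1 to +1."""
    half = n_raters / 2
    return (n_essential - half) / half

# 8 of 10 raters judge the item "essential"
print(content_validity_ratio(8, 10))  # 0.6
```

A CVR of 0 means exactly half the raters called the item essential; negative values mean fewer than half did.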
2) Construct validity
“A theoretical intangible”
“An informed, scientific idea”
-- how well the test measures that construct
Determining Construct validity
Identify behaviors related to the construct
 Identify related/unrelated constructs
 Identify relationships among them
 Multitrait-multimethod matrix

Multitrait-Multimethod Matrix
Correlate scores from 2 (or more) traits
 Correlate scores obtained from 2 (or more) methods
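The matrix logic can be sketched with simulated data: convergent correlations (same trait measured by different methods) should be high, and discriminant correlations (different traits) should be low. The traits, methods, and numbers below are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
anxiety    = rng.normal(size=n)   # latent trait 1 (simulated)
depression = rng.normal(size=n)   # latent trait 2, independent here

# Two methods (e.g., self-report and clinician rating) per trait, each noisy
scores = np.column_stack([
    anxiety    + 0.5 * rng.normal(size=n),  # anxiety, method A
    anxiety    + 0.5 * rng.normal(size=n),  # anxiety, method B
    depression + 0.5 * rng.normal(size=n),  # depression, method A
    depression + 0.5 * rng.normal(size=n),  # depression, method B
])
mtmm = np.corrcoef(scores, rowvar=False)    # 4x4 multitrait-multimethod matrix

convergent   = mtmm[0, 1]  # same trait, different methods: should be high
discriminant = mtmm[0, 2]  # different traits: should be near zero
print(round(convergent, 2), round(discriminant, 2))
```

Evidence for construct validity comes from the pattern: convergent entries exceed discriminant entries.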

Evidence of Construct Validity
Upholds theoretical predictions: changes (?) over time, gender, training
 Homogeneity of questions (internal consistency, factor or item analysis)
 Convergent/discriminant: multitrait-multimethod matrix
Decision Making
How well the test can be used to help in
decision making about a particular criterion.
Decision Theory
Base rate
 Hit rate
 Miss rate
 False positive
 False negative
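These rates can be illustrated with a toy screening table (the counts are invented, and textbooks vary in how they define "hit rate"; this sketch counts each outcome as a fraction of all cases):

```python
# Hypothetical screening outcomes for 100 applicants.
# "pos" = test flags the applicant; True = applicant actually meets the criterion.
results = ([("pos", True)] * 30 + [("pos", False)] * 10
           + [("neg", True)] * 5 + [("neg", False)] * 55)

n = len(results)
hits            = sum(1 for d, a in results if d == "pos" and a)      # true positives
false_positives = sum(1 for d, a in results if d == "pos" and not a)
misses          = sum(1 for d, a in results if d == "neg" and a)      # false negatives
base_rate = sum(1 for _, a in results if a) / n  # prevalence of the criterion

print(hits / n, misses / n, false_positives / n, base_rate)  # 0.3 0.05 0.1 0.35
```

Note how the usefulness of a test depends on the base rate: when the criterion is rare, even an accurate test produces many false positives relative to hits.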

3) Criterion Validity
“The relationship between performance on the
test and on some other criterion.”
Validity coefficient
Correlation between test score and score on
criterion measure.
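A validity coefficient is simply a Pearson correlation between test scores and criterion scores. A minimal sketch with made-up data (six applicants' test scores vs. later job ratings):

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

test_scores = [50, 55, 60, 65, 70, 75]   # hypothetical test scores
job_ratings = [2.0, 2.5, 2.4, 3.1, 3.4, 3.9]  # hypothetical criterion
print(round(pearson_r(test_scores, job_ratings), 2))  # 0.97
```

In practice validity coefficients for real tests rarely approach values this high; anything above roughly .3 to .4 is often considered useful for selection.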
Two ways to establish
Criterion Validity
A) Concurrent validity
B) Predictive validity
Determining Concurrent validity
Assess individuals on construct
 Administer test to lo/hi on construct
 Correlate test scores to prior identification
 Use test later to make decisions

Determining Predictive validity
Give the test to a group of people
 Follow up with the group
 Assess the criterion later
 Review test scores
 If scores correlate with the later behavior, the test can be used to make decisions

Incremental validity
Value of including more than one predictor
 Based on multiple regression
 What is added to prediction not present with previous measures?
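Incremental validity can be sketched as the gain in R² when a second predictor enters a multiple regression. All data below are simulated (an "ability" test plus a hypothetical "interview" score predicting performance):

```python
import numpy as np

def r_squared(X, y):
    """R-squared from an ordinary least squares fit of y on X (with intercept)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(0)
ability   = rng.normal(size=200)             # first predictor (simulated)
interview = rng.normal(size=200)             # candidate second predictor
performance = ability + 0.5 * interview + rng.normal(size=200)

r2_one = r_squared(ability.reshape(-1, 1), performance)
r2_two = r_squared(np.column_stack([ability, interview]), performance)
print(round(r2_two - r2_one, 3))  # the interview's incremental validity
```

If the second predictor overlaps heavily with the first, the R² gain is small: that is the question incremental validity asks.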
Expectancy data
Taylor-Russell Table
 Naylor-Shine Tables
 Too vague, outdated, biased

Unified Validity - Messick
“Validity is not a property of the test, but
rather the meaning of the scores.”
Value implications
Relevance and utility
Unitarian considerations
Content
 Construct
 Criterion
 Consequences

Threats to validity
Construct underrepresentation (too narrow)
 Construct-irrelevant variance (too broad):
  construct-irrelevant difficulty
  construct-irrelevant easiness

Example 1
Dr. Heidi considers using the Scranton
Depression Inventory to help identify
severity of depression and especially to
distinguish depression from anxiety. What
evidence should Dr. Heidi use to determine
if the test does what she hopes it will do?
Example 2
The newly published Diagnostic Wonder Test
promises to identify children with a
mathematics learning disability. How will
we know whether the test does so or is
simply a slickly packaged general ability
test?
Example 3
Ivy College uses the Western Admissions Test
(WAT) to select applicants who should be
successful in their studies. What type of
evidence should we seek to determine if the
WAT satisfies its purpose?
Example 4
Mike is reviewing a narrative report of his
scores on the Nifty Personality
Questionnaire (NPQ). The report says he is
exceptionally introverted and unusually
curious about the world around him. Can
Mike have any confidence in these
statements or should they be dismissed as
equivalent to palm readings at the county
fair?
Example 5
A school system wants to use an achievement
battery that will measure the extent to which
students are learning the curriculum
specified by the school. How should the
school system proceed in reviewing the
available achievement tests?
Example 6
Super Sun Computers needs to hire three new
employees. They have decided to administer
the Computer Skills Assessment (CSA) to
their applicants and use the results as the basis
of their decision. How can they determine if
that measure is a good fit for their hiring
practice?
Project homework question
What content or construct is your measure assessing? (explain your answer)
 What do you think convergent and discriminant constructs would be to the one in your measure?
 How would you determine the content or construct validity of your measure?
 How would you determine the criterion validity of your measure?
 Why would you use those approaches?
Project homework question
Select a standardized instrument from MMY to use as a comparison for your measure.
 Copy the relevant data.
 Why did you select that instrument?
 How would you use it to help standardize your measure?
