Research Design Part 2
Variability, Validity, Reliability

Objectives
 Refine research purpose & questions
 Variability
 Validity
 External, internal, criterion, content, construct
 Reliability
 Test-retest, alternate forms, internal consistency, inter-rater
Variability
 Different values of the independent variable
 3 sources… systematic, error, extraneous
Variability
 1. Systematic
 Variability within the independent variables
 Design study to maximize systematic variability
 Rewards vs. management styles
 Select the right sample & methods
Variability
 2. Error
 Sampling & measurement error
 Eliminate as many error-producing conditions as possible
 Similar leagues, ages, abilities
 Increase reliability of the instrument
Variability
 3. Extraneous
 Control as much as possible
 Not a planned part of the research
 Influences the outcome in ways we don’t want
 Examples…
Variability
 3. Extraneous
 Examples
 Measure the teaching techniques of V & R between 2 sections of 497 to see the level of comprehension
 Measure the differences between a week of TRX & CrossFit using the same fitness assessment at the end
Main Function of Research…
 Maximize systematic variability, control extraneous variability, & minimize error variability
Validity & Reliability
 Validity
 Degree to which something measures what it is supposed to
measure
 Reliability
 Consistency or repeatability of results
Validity & Reliability
[Target diagram: four panels relating reliability and validity]
 Reliable, not valid: you are hitting the target consistently, but missing the center; you consistently and systematically measure the wrong value for all respondents
 Valid, not reliable: random hits spread across the target that seldom hit the center; you get a good group average, but not a consistent one
 Neither reliable nor valid: hits are spread across the target and consistently miss the center
 Both reliable and valid: hits consistently strike the center of the target
Validity & Reliability
 Can a measurement/instrument be reliable, but not valid?
 Weighing on a broken scale
 Can a measurement/instrument be valid, but not reliable?
 To be useful, a test/measurement must be both valid and
reliable
Validity
 External validity
 Internal validity
 Test/criterion validity
 Content validity
 Construct validity
External Validity
 Generalizability of the results
 Population external validity
 Characteristics & results can only be generalized to those with
similar characteristics
 Does the sample represent the entire population?
 Demographics
 Psych experiments with college students
 Use multiple PE classes, intramural leagues, sports, teams,
conferences
 Control through sampling
External Validity
 Ecological external validity
 Conditions of the research are generalizable to settings with similar characteristics
 Physical surroundings
 Time of day
 AM vs. PM
 More common in testing … GRE
Internal Validity
 Confidence in the cause-and-effect relationship in a study
 Strongest when the study’s design (subjects, instruments/measurements, and procedures) effectively controls possible sources of error so that those sources are not reasonably related to the study’s results
 The key question to ask in any experiment: “Could there be an alternative cause, or causes, that explain my observations and results?”
Internal Validity
 History
 Extraneous incidents/events that occur during the research & affect results
 Only impacts studies across time
 Attendance at football games/coaching change
 Survey at IHSA on parent behavior & a parent fight breaks out across the gym in the middle of the survey
Internal Validity
 Selection
 If there are systematic differences in groups of subjects
 Gender – boys more active than girls
 Higher motivation level
 More positive attitude toward study
 Compare GRE scores & grad school performance between sequences
 Occurs when random sampling isn’t used
Internal Validity
 Statistical regression
 If doing a pre-test/post-test, those scoring extremely high or low on the first test will often “regress to the mean” on the second test
 Scoring is based more on luck than actual performance
 The regression effect causes the change & not the treatment
 Don’t group the high/low scores for the post-test
 Note: the less reliable the instrument, the greater the regression (see the sketch below)
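A minimal simulation of the regression effect, assuming Python with numpy and made-up score distributions (none of this is from the original slides). With no treatment at all, the highest pre-test scorers drift back toward the mean on the post-test, and the drift grows as the instrument gets noisier:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
true_score = rng.normal(50, 10, n)        # stable ability; nothing changes between tests

for noise_sd in (2, 8):                   # smaller noise = more reliable instrument
    pre = true_score + rng.normal(0, noise_sd, n)
    post = true_score + rng.normal(0, noise_sd, n)
    top = pre >= np.quantile(pre, 0.90)   # "extremely high" scorers on the pre-test
    drop = pre[top].mean() - post[top].mean()
    print(f"noise sd {noise_sd}: top group falls {drop:.1f} points with no treatment")
```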
Internal Validity
 Pre-testing
 Pre-test can increase/decrease motivation
 Gives subjects opportunities to practice
 Practice can be a positive so subjects produce a true score
 Pedometers (A. McGee thesis)
 Instrument can make people think after the pre-test
 Motivation instruments
Internal Validity
 Instrumentation
 Changes in calibration of the exam or instrument – experimental research
 Changes in observer scoring
 Fatigue/ boredom
 Reality judging shows
 Maturation
 Experimental research
Internal Validity
 Diffusion of intervention
 Experimental research
 Attrition/Mortality
 Subjects drop out/lost
 Low scorers on GRE drop out of grad school
 Coaching techniques & loss of players
Internal Validity
 Experimenter effect
 Presence & demeanor of the researcher impacts results +/–
 Course instructor is PI
 Course evals
 Coach or teacher conducting the study
 Teacher staying in the room when they complete PAQ-C
 Subject effect
 Subjects’ behaviors change because they are subjects
 Subjects want to present themselves in the best light
 Hawthorne effect
Test/Criterion Validity
 Degree to which a measure/test is related to some recognized
standard or criterion
 Increase test validity
 Create an intelligence test and then compare subjects’ scores on our test with their scores on the IQ test
 Use 2 motivation instruments
 Giving subjects our intelligence test and the IQ test at the same time
 Use abbreviated Myers Briggs – 126 vs. 72 items at the same time
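A hedged sketch of the idea behind a validity coefficient, using hypothetical scores on a new test and an established criterion collected at about the same time (the data are invented; numpy is assumed):

```python
import numpy as np

# Hypothetical data: 8 subjects take our new intelligence test and an
# established IQ test at about the same time (concurrent criterion validity)
our_test = np.array([72, 85, 90, 65, 78, 95, 70, 88])
iq_test  = np.array([98, 112, 118, 90, 105, 125, 96, 115])

# The validity coefficient is the correlation between test and criterion
validity_r = np.corrcoef(our_test, iq_test)[0, 1]
print(f"criterion validity r = {validity_r:.2f}")
```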
Content Validity
 Also called face validity
 Degree to which a test adequately samples what is covered in a
course
 Usually used in education
 Does a measurement appear to measure what it purports to
measure?
 No statistical measure/systematic procedure to test this
Content Validity
 Often, experts (panel) are used to verify the content validity of
measurements in research studies
 Content validity is useful, but not the strongest/most credible
way of evaluating a measurement
 Examples
 Rewards listing
 Competency categories
Construct Validity
 Degree to which a test/ measurement measures a hypothetical
construct
 Overall quality of measurement
 Construct
 Variables… recruitment, motivation, mental preparation
 Examples
 Do the selected variables completely measure recruitment?
 How well does the instrument measure mental preparation?
 Do the questions adequately test motivation?
Construct Validity
 Threats to construct validity
 Using one method to measure the construct
 Inadequate explanation of a construct
 Ex. depression = lethargy, loss of appetite, difficulty in concentration, etc.
 Measuring just one construct & making inferences
 Using 1 item to measure personality
 Ex. Myers Briggs = 4 dichotomies
Validity Overview
 Content – based on test content
 Test/criterion – based on relationship to standards
 Construct – based on how well constructs describe relationships
Reliability
 Degree to which a test/measurement yields consistent and
repeatable results
 Often reported as a correlation coefficient… Cronbach’s alpha (see the sketch after the table)
Cronbach’s alpha      Internal consistency
α ≥ 0.9               Excellent
0.8 ≤ α < 0.9         Good
0.7 ≤ α < 0.8         Acceptable
0.6 ≤ α < 0.7         Questionable
0.5 ≤ α < 0.6         Poor
α < 0.5               Unacceptable
 Look at articles.
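A minimal sketch of how alpha can be computed from raw item scores; the respondent data and the helper name cronbach_alpha are hypothetical, and numpy is assumed:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of respondents' totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 respondents answering 4 Likert-type items
scores = np.array([[4, 5, 4, 4],
                   [2, 2, 3, 2],
                   [5, 4, 5, 5],
                   [3, 3, 2, 3],
                   [1, 2, 1, 2]])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # judge against the table above
```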
4 Sources of Measurement Error
 1. The participants
 Health
 Motivation
 Mood
 Fatigue
 Anxiety
4 Sources of Measurement Error
 2. The testing
 Changes in time limits
 Changes in directions
 How rigidly the instructions were followed
 Atmosphere of the test/conditions
4 Sources of Measurement Error
 3. The instrumentation
 Sampling of items
 Calibration of (mechanical) instruments
 Poor questions
 4. The scoring
 Different scoring procedures
 Competence, experience, dedication of scorers
 GRE…
Methods to Establish Reliability
 Test-Retest Reliability (stability)
 Alternate forms
 Internal consistency
 Agreement/ Inter-rater Reliability
Test-Retest Reliability (1)
 Repeat test on same subjects at a later time
 Usually retest on a different day
 Use correlation coefficients between subjects’ two scores
 Used extensively in fitness & motor skills tests
 Used less for pencil and paper tests
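As a minimal sketch (hypothetical fitness scores for the same six subjects on two days; numpy assumed), the stability coefficient is simply the correlation between the two administrations:

```python
import numpy as np

# Hypothetical fitness-test scores for the same 6 subjects, two days apart
day1 = np.array([12.1, 10.4, 15.2, 9.8, 13.3, 11.0])
day2 = np.array([12.4, 10.1, 14.8, 10.2, 13.0, 11.5])

# Test-retest (stability) coefficient = correlation between the two days
r = np.corrcoef(day1, day2)[0, 1]
print(f"test-retest r = {r:.2f}")
```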
Alternate Forms Reliability (2)
 Alternate forms reliability
 Construct 2 tests that measure the same thing
 Give the 2 tests to the same individuals at about the same
time.
 Widely used on standardized tests
 CPRP/CPRE exam (125 questions)
 Rarely used on physical tests because of the difficulty to
develop 2 equivalent tests
Internal Consistency Reliability (3)
 Split-half reliability
 Similar to alternate forms except 1 form is used
 Divide the form into comparable halves (even questions/odd questions)
 Do not use first half vs. second half because of testing fatigue
 Correlate the number of odd & even items answered correctly (see the sketch below)
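A minimal split-half sketch on a hypothetical 10-item test (numpy assumed). The Spearman-Brown step, a standard correction not named on the slide, estimates full-length reliability from the half-test correlation:

```python
import numpy as np

# Hypothetical scores: 6 respondents x 10 items, 1 = correct, 0 = incorrect
items = np.array([[1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
                  [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
                  [0, 1, 1, 0, 1, 1, 0, 1, 0, 1],
                  [1, 0, 0, 1, 0, 1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 0, 0, 1, 0, 0, 1],
                  [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]])

odd_total  = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7, 9
even_total = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, 8, 10

r_half = np.corrcoef(odd_total, even_total)[0, 1]
r_full = 2 * r_half / (1 + r_half)        # Spearman-Brown correction
print(f"half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```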
Internal Consistency Reliability (3)
 Average inter-item correlation
 Identify the question numbers that measure a construct
 Correlate the responses to these questions (see the sketch below)
 Psychological tests
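A minimal sketch of the average inter-item correlation, assuming hypothetical responses to four items thought to tap one construct (numpy assumed):

```python
import numpy as np

# Hypothetical responses: 6 respondents x 4 items measuring one construct
items = np.array([[5, 4, 5, 4],
                  [3, 3, 2, 3],
                  [4, 5, 4, 4],
                  [2, 2, 3, 2],
                  [5, 5, 4, 5],
                  [1, 2, 1, 2]])

r_matrix = np.corrcoef(items, rowvar=False)            # 4 x 4 item correlations
pairs = r_matrix[np.triu_indices_from(r_matrix, k=1)]  # each item pair counted once
print(f"average inter-item r = {pairs.mean():.2f}")
```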
Inter-Rater Reliability (4)
 Two or more persons rate or observe
 Common in observational research & performance based
assessments involving judgments
 GRE writing exam scoring
 Will be expressed as a correlation coefficient or a percentage of
agreement
 Does not indicate anything about consistency of performances
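A minimal sketch of the two ways the slide says inter-rater reliability can be expressed, using hypothetical ratings from two judges: the percentage of agreement, plus a hand-rolled Cohen's kappa (a common chance-corrected agreement statistic, named here as an assumption, not taken from the slides):

```python
import numpy as np

# Hypothetical scores from two judges rating 10 performances on a 1-3 scale
rater_a = np.array([3, 2, 3, 1, 2, 3, 2, 1, 3, 2])
rater_b = np.array([3, 2, 2, 1, 2, 3, 2, 2, 3, 2])

p_o = (rater_a == rater_b).mean()                 # percentage of agreement
# Chance agreement: probability both raters pick each category independently
p_e = sum((rater_a == c).mean() * (rater_b == c).mean()
          for c in np.union1d(rater_a, rater_b))
kappa = (p_o - p_e) / (1 - p_e)                   # Cohen's kappa
print(f"agreement = {p_o:.0%}, kappa = {kappa:.2f}")
```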
In Summary
 Pick variables that have a chance of varying (systematic
variability)
 Pick a reliable instrument (error variability, statistical regression,
reliability)
 Use random sampling whenever possible (extraneous
variability, internal validity)
 Control external validity through the sampling process at multiple sites
(population external validity)
In Summary
 Control external validity through similar environmental processes
(ecological external validity)
 Make sure survey measures what it is supposed to (content &
construct validity)
 Fully plan data collection process (reliability)