Introduction to Psychometrics

Week 1: Introduction
and Research Design
PSYCHOMETRICS
MGMT 6971
Michael J. Kalsher
Lally School of
Management &
Technology
MGMT 6971
PSYCHOMETRICS
© 2014, Michael Kalsher
Course Overview
• Review of research design/methodology and
statistical concepts
• Review of SPSS (data entry; setting up variables; graphing;
syntax; etc.)
• Statistical analysis techniques
– Covariance, correlation, simple regression, multiple regression
– t-tests, ANOVA / ANCOVA / MANOVA
– Non-parametric statistics
– Factor analysis, Multilevel Linear Models, Structural Equation Models
• Grading requirements
– Exams, Labs, Problem Sets, Data Collection/Analysis Project
Research Methods & Design:
Establishing Control over your variables
• Historical foundations of scientific research in
the behavioral and social sciences.
• The importance of research design
– Ruling out alternative explanations.
– Establishing control of IVs.
• Research Design vs. Statistical Analysis
Methods of Establishing Truth
• Tenacity
– “It’s so because it’s so”
• Authority
– “Aristotle said it’s so”
• Logical Deduction (Rationalism)
– Aristotle said women have fewer teeth than men (Premise)
– You are a woman
– Therefore, you have fewer teeth than I
• Empiricism
– Combines Logical Deduction with observation
(measurement)
– “Let’s count your teeth”
Scientific Method
• Shared observations
– Rules out individual experiences like religious
revelations or esthetic experiences (William
James).
• Reproducible Effects
– “No miracles”
• Conditional Truths
– Premises may be wrong
– Necessary Connection may be wrong
Types of Relationship
(between two concepts)
Spurious Relationships
[Diagram: Heat Wave → Ice Cream Sales; Heat Wave → Swimming Pool Drownings]
A city's ice cream sales are found to be highest when the rate of drownings in the city’s swimming
pools is highest. To allege that ice cream sales cause drowning, or vice-versa, would be to imply a
spurious relationship between the two. In reality, a third variable, in this instance a heat wave, more
likely caused both.
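A minimal simulation sketch of the heat-wave example. All numbers (temperature range, slopes, noise levels) are hypothetical and chosen only for illustration; the helper `pearson_r` is written here and is not part of the course materials. Sales and drownings each depend on temperature alone, yet they end up correlated with each other.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation computed from first principles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical daily temperatures (the third variable).
temperature = [rng.uniform(15, 40) for _ in range(1000)]

# Each outcome depends on temperature plus its own independent noise.
# Neither outcome appears in the other's equation.
ice_cream_sales = [20 * t + rng.gauss(0, 100) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 2) for t in temperature]

r_spurious = pearson_r(ice_cream_sales, drownings)
print(f"r(ice cream sales, drownings) = {r_spurious:.2f}")
```

The correlation is clearly positive even though the code contains no causal link between the two outcomes, which is what makes the relationship spurious.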
Sets of Relationships (a theory)
A Model of the Research Process:
Levels of Constraint
(Model used to illustrate the continuum of demands placed on the adequacy of the
information used in research and on the nature of the processing of that information.)
High constraint: Research plan becomes increasingly detailed (e.g., precise hypotheses and analyses) but less flexible.
• Experimental Research
• Differential Research
• Correlational Research
• Case-study Research
• Naturalistic Observation
• Exploratory Research
Low constraint: Research plan may be general; ideas, questions, and procedures relatively unrefined.
Classes of Research Variables:
Variables defined by their use in research
Independent variable
A variable that is actively manipulated
by the researcher to see what its impact
will be on other variables.
Dependent variable
A variable that is hypothesized to be
affected by the independent-variable
manipulation.
Extraneous variable
Any variable (usually unplanned or
uncontrolled factors), other than the
independent variable, that might affect
the dependent measure in a study.
A constant
Any variable prevented from varying
(by holding variables constant, they do not affect the outcome of the research).
Classes of Research Variables:
The Measurement Model
Variable values are represented by numbers, but these numbers
may not demonstrate all the characteristics of true numbers.
1. Nominal. A variable made up of discrete, unordered categories. Each
category is either present or absent and categories are mutually exclusive
and exhaustive (e.g., gender).
2. Ordinal. A variable for which different values indicate a difference in the
relative amount of the characteristic being measured.
3. Interval. A variable for which equal intervals between variable values
indicate equal differences in amount of the characteristic being measured.
4. Ratio. Ratios between measurements as well as intervals are meaningful
because there is a starting point (zero).
Scales of Measurement:
Some Examples
Levels of Measurement
Nominal
– Examples: Diagnostic categories; brand names; political or religious affiliation
– Properties: Identity
– Mathematical operations: None
– Type of data: Nominal
– Typical statistics: Chi Square
Ordinal
– Examples: Socioeconomic class; ranks
– Properties: Identity; magnitude
– Mathematical operations: Rank order
– Type of data: Ordered
– Typical statistics: Mann-Whitney U-test
Interval
– Examples: Test scores; personality and attitude scales
– Properties: Identity; magnitude; equal intervals
– Mathematical operations: Add; subtract
– Type of data: Score
– Typical statistics: t-test; ANOVA
Ratio
– Examples: Weight; length; reaction time; # of responses
– Properties: Identity; magnitude; equal intervals; true zero point
– Mathematical operations: Add; subtract; multiply; divide
– Type of data: Score
– Typical statistics: t-test; ANOVA
The Role of Variance
- In an experiment, IV(s) are manipulated to cause variation between
experimental and control conditions.
- Experimental design helps control extraneous variation--the variance
due to factors other than the manipulated variable(s).
Sources of Variance
- Systematic between-subjects variance
Experimental variance due to manipulation of the IV(s) [The Good Stuff]
Extraneous variance due to confounding variables.
[The Not-So-Good Stuff]
Natural variability due to sampling error
- Non-systematic within-groups variance
Error variance due to chance factors (individual differences) that affect some
participants more than others within a group
Separating Out The Variance
SST = Sums of Squares Total
SSM = Sums of Squares Model
SSR = Sums of Squares Residual (error)
SST = SSM + SSR
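As a rough sketch of how the total variation splits into model and residual parts, the code below uses two small hypothetical groups (not data from the course) and treats the group means as the "model". The variable names are illustrative assumptions.

```python
# Hypothetical scores for two conditions (illustration only).
groups = {
    "control": [2.0, 3.0, 4.0],
    "treatment": [6.0, 7.0, 8.0],
}

all_scores = [score for scores in groups.values() for score in scores]
grand_mean = sum(all_scores) / len(all_scores)

# SST: squared deviations of every score from the grand mean.
sst = sum((score - grand_mean) ** 2 for score in all_scores)

# SSM: squared deviations of each group mean from the grand mean,
# weighted by group size (the model here is the set of group means).
# SSR: squared deviations of each score from its own group mean.
ssm = 0.0
ssr = 0.0
for scores in groups.values():
    group_mean = sum(scores) / len(scores)
    ssm += len(scores) * (group_mean - grand_mean) ** 2
    ssr += sum((score - group_mean) ** 2 for score in scores)

print(f"SST = {sst}, SSM = {ssm}, SSR = {ssr}")
```

For these made-up scores the model and residual sums of squares add up to the total, which is the partition the slide's diagram depicts.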
Controlling Variance in Experiments
In experimentation, each study is designed to:
1. Maximize experimental variance.
2. Control extraneous variance.
3. Minimize error variance.
• Good measurement
• Manipulated and Statistical control
Controlling Variance in Observational
Studies
• Choose IVs with large natural variance
• Control for alternate explanations by
measuring confounding variables and
statistically removing their variance
• Minimize error variance
– Good measurement
– Statistical control
Maximizing Experimental Variance:
Strong manipulations and Manipulation Checks
Experimental Variance
(The Good Stuff)
Due to the effects of the IV(s) on the DV(s)
Ensure that experimental manipulations are strong and
reliable!
Manipulation Check
Procedures designed to determine whether manipulation
of the IV(s) had the intended effect on participants (i.e., whether the IV was actually varied as intended)
Controlling Extraneous Variance
Extraneous variables: Between-group variables--other
than the IV(s)--that have effects on whole groups and thus
may confound the results.
Goal: To prevent extraneous variables from differentially affecting the groups.
Solution: Take steps to ensure that: (1) the experimental and control groups
are equivalent at the beginning of the study; and (2) groups are treated exactly
the same--save for the intended manipulation (of the IV).
Methods (for controlling extraneous variance):
1. Random Assignment of subjects to experimental conditions
2. Select participants on the basis of one or more potentially confounding
variables (e.g., age, ethnicity, social class, IQ, sex).
3. Build the confounding variables into the study as additional IVs.
4. Match participants on confounding variable or use within-subjects design
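The first method, random assignment, can be sketched as follows. The function name `randomly_assign`, the participant labels, and the condition names are hypothetical; the approach shown (shuffle, then deal participants out in turn) is one simple way to get equal-sized groups, not the only one.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    assignment = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        condition = conditions[index % len(conditions)]
        assignment[condition].append(person)
    return assignment

# Hypothetical participant IDs P01 to P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
assigned = randomly_assign(participants, ["experimental", "control"], seed=42)

for condition, members in assigned.items():
    print(condition, len(members), members)
```

Because membership is decided by the shuffle rather than by any participant characteristic, known and unknown confounds are expected to balance out across groups on average, although any single randomization can still be unlucky.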
Test Statistics
Essentially, most test statistics are of the following
form:
Test statistic = Systematic variance / Unsystematic variance
Test statistics are used to judge whether an observed difference is
larger than would be expected by chance alone, and are
usually accompanied by a “p” value (e.g., p<.05, p<.01,
etc.)
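One concrete instance of this ratio is the independent-samples t statistic: the difference between group means (systematic) divided by its standard error (unsystematic). The sketch below assumes equal-variance pooling and uses hypothetical scores; `independent_t` is an illustrative name, not a function from SPSS or the course.

```python
import math

def independent_t(sample_a, sample_b):
    """t = (difference between means) / (standard error of that difference)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    # Unbiased sample variances (divide by n - 1).
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (n_b - 1)
    # Pooled variance: assumes both groups share one population variance.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical treatment and control scores.
t_value = independent_t([6.0, 7.0, 8.0], [2.0, 3.0, 4.0])
print(f"t = {t_value:.3f}")
```

The numerator grows with the effect of the manipulation, and the denominator grows with within-group noise, so the statistic is large only when the systematic part dominates.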
A Very Simple Statistical Model
outcome_i = (model) + error_i
• model – an equation made up of variables and parameters
• variables – measurements from our research (X)
• parameters – estimates based on our data (b)
outcome_i = (b X_i) + error_i
outcome_i = (b_1 X_1i + b_2 X_2i + b_3 X_3i) + error_i
Examples of Statistical Models
• One Predictor (e.g. deviance):
outcome_i = (b X_i) + error_i
outcome_lecturer1 = mean + error_lecturer1
error_lecturer1 = outcome_lecturer1 – mean = 1 – 2.6 = -1.6
• Multiple Predictors (e.g. sum of squared errors):
outcome_i = (b_1 X_1i + b_2 X_2i …) + error_i
sum of squared errors = (outcome_1 – model_1)^2 + (outcome_2 – model_2)^2 …
= (-1.6)^2 + (-0.6)^2 + (0.4)^2 + (0.4)^2 + (1.4)^2 = 5.20
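The arithmetic above can be checked with a short sketch. The five scores below are an assumption: they are the values implied by a mean of 2.6, the listed deviations, and the stated total of 5.20, and they are not given explicitly on the slide.

```python
# Assumed scores: chosen so that the mean is 2.6 and the deviations
# match those on the slide (-1.6, -0.6, 0.4, 1.4, with 0.4 occurring twice).
scores = [1, 2, 3, 3, 4]

mean = sum(scores) / len(scores)  # the simplest model: predict the mean

# Deviance (error) for each observation: outcome minus model.
deviances = [score - mean for score in scores]

# Sum of squared errors: squaring stops positive and negative
# deviations from cancelling each other out.
sum_squared_errors = sum(d ** 2 for d in deviances)

print("mean:", mean)
print("deviances:", [round(d, 1) for d in deviances])
print("sum of squared errors:", round(sum_squared_errors, 2))
```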
Types of Hypothesis
• Neyman and Pearson proposed organizing
scientific statements into testable hypotheses.
– H0 – null hypothesis, that no effect will occur
• Adding a narrative component to a video game will not affect
gameplay experience
– H1 – alternative (or experimental) hypothesis, that the effect
you are testing for will occur
• Playing a game with a narrative component will improve your
gameplay experience
• Data cannot prove alternative hypotheses,
only reject null ones
Null Hypothesis Significance
Testing (NHST)
• NHST combines Fisher’s work with Neyman
and Pearson’s
– Initially assume null hypothesis is true
– Choose a statistical model that represents an
alternative hypothesis
– Calculate the p-value: the probability of obtaining data at least
this extreme if the null hypothesis were true
– If p < .05 (generally), the null hypothesis is rejected and the
alternative hypothesis is supported
• We’re never certain, we just have evidence
One- and Two-tailed Tests
• One-tailed: directional
hypothesis (effect in the
predicted direction, or not)
• Two-tailed: non-directional
hypothesis (effect
increases, decreases,
or no effect)
Types of Mistakes
Statistical decision vs. true state of null hypothesis
– Reject Ho when Ho is true: Type I error
– Reject Ho when Ho is false: Correct
– Don’t reject Ho when Ho is true: Correct
– Don’t reject Ho when Ho is false: Type II error
Inflated Error Rates
• A measure of how well Type I errors have been
avoided
• In most research, the complexity of the question
requires more than one test. The rate of error
increases with the number of tests done, increasing
the Type I error. This is called familywise error.
• Solution? Choose a stricter p-value for each
individual test (Bonferroni correction)
required p-value per test =
(desired overall p-value)/(number of tests)
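A small sketch of why the error rate inflates and what the Bonferroni correction does about it. The familywise formula 1 − (1 − α)^k is an added assumption that holds when the k tests are independent; the function names are illustrative.

```python
def familywise_error_rate(alpha_per_test, n_tests):
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha_per_test) ** n_tests

def bonferroni_alpha(alpha_overall, n_tests):
    """Required p-value per test = desired overall p-value / number of tests."""
    return alpha_overall / n_tests

n_tests = 10
uncorrected = familywise_error_rate(0.05, n_tests)
per_test = bonferroni_alpha(0.05, n_tests)
corrected = familywise_error_rate(per_test, n_tests)

print(f"familywise error, 10 tests at .05 each: {uncorrected:.3f}")
print(f"Bonferroni per-test criterion: {per_test:.4f}")
print(f"familywise error after correction: {corrected:.3f}")
```

The stricter per-test criterion keeps the overall rate at or below the desired level, at the cost of lower power for each individual test.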
Statistical Power
• A measure of how well Type II errors
have been avoided (i.e. how well a test
is able to find an effect)
• Power = 1 – Type II error rate
• Power should be 0.8 or higher, so Type
II error rate should not exceed .20.
Confidence Intervals &
Statistical Significance
• The p-value for H0 gets smaller as the overlap between
two confidence intervals gets smaller
• Moderate overlap (defined as ½ the average Margin
Of Error) indicates p ≈ .05.
• MOE = ½ the length of the confidence interval:
MOE = (Upper bound of CI − Lower bound of CI) / 2
• So moderate overlap is:
((MOE_1 + MOE_2) / 2) / 2
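The two formulas can be sketched in a few lines. The interval bounds are hypothetical, and `actual_overlap` is an added helper (not from the slide) that measures how far the two intervals actually overlap so it can be compared with the "moderate overlap" benchmark.

```python
def margin_of_error(lower, upper):
    """MOE is half the length of the confidence interval."""
    return (upper - lower) / 2

def moderate_overlap(ci_1, ci_2):
    """Half of the average margin of error of the two intervals."""
    moe_1 = margin_of_error(*ci_1)
    moe_2 = margin_of_error(*ci_2)
    return ((moe_1 + moe_2) / 2) / 2

def actual_overlap(ci_1, ci_2):
    """Length of the region shared by both intervals (0 if they are disjoint)."""
    return max(0.0, min(ci_1[1], ci_2[1]) - max(ci_1[0], ci_2[0]))

# Hypothetical 95% confidence intervals for two group means.
ci_a = (20.0, 40.0)
ci_b = (35.0, 51.0)

print("moderate overlap benchmark:", moderate_overlap(ci_a, ci_b))
print("actual overlap:", actual_overlap(ci_a, ci_b))
```

When the actual overlap is close to the benchmark, the rule of thumb on this slide suggests a p-value near .05; this is a visual heuristic, not a replacement for the test itself.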
Sample Size & Statistical
Significance
• Because MOE is a result of sample size
(via the confidence interval), small
differences can be significant in large
samples, and large differences might
not be significant in small samples.
– This is because larger samples have more
power to detect effects when they exist.
Effect Sizes:
The Correlation coefficient
The statistical test only tells us whether it is safe to
conclude that the means come from different populations.
It doesn’t tell us anything about how strong these
differences are. So, we need a standard metric to gauge
the strength of the effects.
The correlation coefficient (r) is one metric for gauging
effect size.
• As an effect size, the absolute value of r ranges from 0 to 1 (no effect to perfect effect)
• Rough cutoffs (nonlinear, that is twice the r value
doesn’t necessarily mean twice the effect)
– 0.10 – small effect (explains 1% of the variance)
– 0.30 – medium effect (explains 9% of the variance)
– 0.50 – large effect (explains 25% of the variance)
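The percentages in parentheses come from squaring r. A brief sketch (the function name `variance_explained` is an illustrative assumption) makes the nonlinearity visible:

```python
def variance_explained(r):
    """Proportion of variance accounted for by a correlation of size r."""
    return r ** 2

for label, r in [("small", 0.10), ("medium", 0.30), ("large", 0.50)]:
    share = variance_explained(r)
    print(f"{label}: r = {r:.2f} explains {share:.0%} of the variance")

# Doubling r quadruples the variance explained, so r is not a linear scale.
ratio = variance_explained(0.20) / variance_explained(0.10)
print(f"r = .20 explains {ratio:.0f} times as much variance as r = .10")
```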
Effect Sizes:
The coefficient of determination
The statistical test only tells us whether it is safe to
conclude that the means come from different populations.
It doesn’t tell us anything about how strong these
differences are. So, we need a standard metric to gauge
the strength of the effects.
r² (r-Square), or the “Coefficient of Determination”, is one
metric for gauging effect size.
Rules of Thumb regarding effects sizes:
Small effect: 1-3% of the total variance
Medium effect: 10% of the total variance
Large effect: 25% of the variance
r² = SSM / SST
Effect Sizes:
Cohen’s d
– Uses the same unit for all data (standard deviation
units)
– Provides information about the signal-to-noise
ratio – how large is the effect in comparison to
other effects on the same data?
– d = (the difference of the means) divided by the
standard deviation
– Effect cutoffs (but remember this is only rough):
• 0.2 – small
• 0.5 – medium
• 0.8 – large
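A minimal sketch of the calculation. The slide says only "the standard deviation"; using the pooled standard deviation of the two groups is an assumption made here (the control group's standard deviation is another common choice). The scores and the name `cohens_d` are hypothetical.

```python
import math

def cohens_d(sample_a, sample_b):
    """Difference of the means divided by the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (n_b - 1)
    # Assumption: pool the two sample variances, weighting by degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Hypothetical scores for a treatment and a control group.
treatment = [5.0, 6.0, 7.0, 8.0, 9.0]
control = [4.0, 5.0, 6.0, 7.0, 8.0]

d = cohens_d(treatment, control)
print(f"d = {d:.2f}")
```

With these made-up scores d falls between the "medium" and "large" cutoffs listed above.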
Meta-Analysis
• An average of the effect size of multiple
studies that all address the same question
– Weighted to favor more precise studies over less
precise ones
• Useful for getting the most accurate
information about the population as a whole
• Not easily done in SPSS
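A rough sketch of a precision-weighted average. Weighting each study by the inverse of its squared standard error is an assumption (it corresponds to a common fixed-effect approach); the slide does not say which weighting scheme is intended. The effect sizes and standard errors are invented for illustration.

```python
def weighted_mean_effect(effects, standard_errors):
    """Average effect size, weighting each study by 1 / SE^2."""
    weights = [1 / se ** 2 for se in standard_errors]
    weighted_sum = sum(w * e for w, e in zip(weights, effects))
    return weighted_sum / sum(weights)

# Hypothetical studies: the most precise study (smallest SE) found the smallest effect.
effects = [0.20, 0.50, 0.80]
standard_errors = [0.10, 0.20, 0.40]

pooled = weighted_mean_effect(effects, standard_errors)
simple_mean = sum(effects) / len(effects)

print(f"unweighted mean effect: {simple_mean:.2f}")
print(f"precision-weighted mean effect: {pooled:.2f}")
```

The weighted estimate is pulled toward the most precise study, which is the sense in which the average favors more precise studies over less precise ones.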
Reporting Statistical Models
• APA recommends exact p-values for all reported
results; best to include an effect size, too
– Effect “x” was not statistically significant in condition y, p =
.24, d = .21
• Report a mean and the upper and lower boundaries
of the confidence interval as M = 30, 95% CI [20,40]
– If all confidence intervals you are reporting are 95%, it’s
acceptable to say so and then later say something like:
In this condition, effect x increased, M = 30 [20,40].
Essential Elements of Research:
Reliability, Validity, Control and Importance
Reliability
Getting the same result when a measurement device is applied to the
same quantity repeatedly.
Validity
The extent to which a measurement tool (test,
device) measures what it purports to measure.
Control
Behavior can be influenced by many factors, some known and others
unknown to the researcher. Control refers to the systematic
methods employed by a researcher to reduce threats to the
validity of the study posed by extraneous influences on the behavior of
both the participants and the observer.
Importance
Does the research question we are trying to answer warrant the
expenditure of resources (i.e., time, money, effort) that will be
required to complete the study?
Types of Reliability
Test-retest Reliability
Consistency of measurement over time
Internal Consistency
Inter-item correlation
Interrater Reliability
Level of agreement between independent
observers of behavior(s). Assessed via
correlation or the following procedure:
Percent agreement = [Agreement / (Agreement + Disagreement)] x 100
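The percent-agreement procedure can be sketched directly. The two raters' codes below are hypothetical observations of ten intervals, and `percent_agreement` is an illustrative name.

```python
def percent_agreement(rater_a, rater_b):
    """Agreements / (Agreements + Disagreements) x 100."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must code the same number of observations")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    disagreements = len(rater_a) - agreements
    return agreements / (agreements + disagreements) * 100

# Hypothetical codes from two independent observers for ten intervals.
rater_a = ["hit", "none", "hit", "hit", "none", "none", "hit", "none", "hit", "none"]
rater_b = ["hit", "none", "none", "hit", "none", "none", "hit", "hit", "hit", "none"]

print(f"{percent_agreement(rater_a, rater_b):.0f}% agreement")
```

Percent agreement does not correct for agreement expected by chance, so it tends to look more favorable when one code is very common.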
Evaluating Measures:
Effective Range
Effective Range:
Scales sensitive enough to detect differences among one
group of subjects may be too insensitive to detect differences
among another.
Scale Attenuation (or range restriction).
A problem associated with scales not ranging high enough,
low enough, or both.
Leads to “ceiling” effects and “floor” effects that distort data
by not measuring the full range of a variable.
Types of Validity
Face validity. The (non-empirical) degree to which a test
appears to be a sensible measure.
Content validity. The extent to which a test adequately
samples the domain of information, knowledge, or skill that
it purports to measure.
Criterion validity.
Now (concurrent) and Later (predictive).
Involves determining the relationship (correlation) between
the predictor (IV) and the criterion (DV).
Construct validity. The degree to which the theory or
theories behind the research study provide(s) the best
explanation for the results observed.
Internal vs. External Validity
Internal Validity
Extent to which causal/independent variable(s) and no
other extraneous factors caused the change being
measured.
External Validity (generalizability)
Degree to which the results and conclusions of your
study would hold for other persons, in other places,
and at other times.
Threats to Internal Validity:
Factors that reduce our ability to draw valid conclusions
Selection
History
Maturation
Repeated Testing
Instrumentation
Regression to the mean
Subject mortality
Selection-interactions
Experimenter bias
Reducing Threats to Internal Validity
The role of Control
Behavior is influenced by many factors, termed confounding
variables, that tend to distort the results of a study, thereby
making it impossible for the researcher to draw meaningful
conclusions. Some of these may be unknown to the researcher.
Control refers to the systematic methods (e.g., research
designs) employed to reduce threats to the validity of the study
posed by extraneous influences on both the participants and the
observer (researcher).
Group/Selection threat
Occurs when nonrandom procedures are used to assign
subjects to conditions or when random assignment fails
to balance out differences among subjects across the
different conditions of the experiment.
Example:
A researcher is interested in determining the factors most likely to
elicit aggressive behavior in male college students. He exposes
subjects in the experimental group to stimuli thought to provoke
aggression and subjects in the control group to stimuli thought to
reduce aggression and then measures aggressive behaviors of the
students. How would the selection threat operate in this instance?
History threat
Events that happen to participants during the
research which affect results but are not linked to
the independent variable.
Example:
The reported effects of a program designed to improve
medical residents’ prescription writing practices by the
medical school may have been confounded by a self-directed
continuing education series on medication errors provided to
the residents by a pharmaceutical firm's medical education
liaison.
Maturation threat
Can operate when naturally occurring biological or
psychological changes occur within subjects and
these changes may account in part or in total for
effects discerned in the study.
Example:
A reported decrease in emergency room visits in a long-term
study of pediatric patients with asthma may be due to subjects
outgrowing childhood asthma rather than to any treatment
regimen introduced to treat the asthma.
Repeated testing threat
May occur when changes in test scores occur not
because of the intervention but rather because of
repeated testing. This is of particular concern when
researchers administer identical pretests and
posttests.
Example:
A reported improvement in medical resident prescribing
behaviors and order-writing practices in the study previously
described may have been due to repeated administration of the
same short quiz. That is, the residents simply learned to provide
the right answers rather than truly achieving improved
prescribing habits.
Instrumentation threat
When study results are due to changes in instrument
calibration or observer changes rather than to a true
treatment effect, the instrumentation threat is in
operation.
Example:
In Kalsher’s Experimental Methods and Statistics course, he
evaluates students’ progress in understanding principles of research
design at week 3 of the semester. A graduate T.A. evaluates the
students at the conclusion of the course. If the evaluators are
dissimilar enough in their approach, perhaps because of lack of
training, this difference may contribute to measurement error in
trying to determine how much learning occurred over the semester.
Statistical Regression threat
The regression threat can occur when subjects
have been selected on the basis of extreme
scores, because extreme (low and high) scores in
a distribution tend to move closer to the mean (i.e.,
regress) in repeated testing.
Example:
If a group of subjects is recruited on the basis of extremely high
stress scores and an educational intervention is then implemented,
any improvement seen could be due partly, if not entirely, to
regression to the mean rather than to the coping techniques
presented in the educational program.
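A simulation sketch of this threat. Everything here is hypothetical: each person has a stable "true" stress level, each test adds independent measurement noise, and no intervention of any kind is applied between the two tests.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 5000

# Hypothetical stable stress level per person, plus noisy measurements of it.
true_stress = [rng.gauss(50, 10) for _ in range(n)]
test_1 = [t + rng.gauss(0, 10) for t in true_stress]
test_2 = [t + rng.gauss(0, 10) for t in true_stress]  # no treatment in between

# Recruit only the people scoring in roughly the top 10% on the first test.
cutoff = sorted(test_1)[int(0.9 * n)]
selected = [i for i in range(n) if test_1[i] >= cutoff]

mean_1 = sum(test_1[i] for i in selected) / len(selected)
mean_2 = sum(test_2[i] for i in selected) / len(selected)

print(f"selected group, first test:  {mean_1:.1f}")
print(f"selected group, second test: {mean_2:.1f}")
```

The selected group's mean drops on retest even though nothing was done to them, because part of what made their first scores extreme was measurement noise that does not repeat. A study without a comparison group could mistake that drop for a treatment effect.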
Experimental Mortality threat
Experimental mortality—also known as attrition,
withdrawals, or dropouts—is problematic when there
is a differential loss of subjects from comparison
groups subsequent to randomization, resulting in
unequal groups at the end of a study.
Example:
Suppose a researcher conducts a study to compare the effects of a
corticosteroid nasal spray with a saline nasal spray in alleviating
symptoms of allergic rhinitis (irritation and inflammation of the nasal
passages). If subjects with the most severe symptoms preferentially
drop out of the active treatment group, the treatment may appear
more effective than it really is.
Selection Interaction threats
A family of threats to internal validity produced
when a selection threat combines with one or
more of the other threats to internal validity.
When a selection threat is already present, other
threats can affect some experimental groups,
but not others.
Example:
If one group is dominated by members of one fraternity
(selection threat), and that fraternity has a party the night
before the experiment (history threat), the results may be
altered for that group.
Threats to External Validity:
Ways you might be wrong in making generalizations
People, Places, and Times
Demand Characteristics
Hawthorne Effects
Order Effects (or carryover effects)
People threat:
Are the results due to the unusual
type of people in the study?
Example:
You learn that the grant you submitted to assess average
drinking rates among college students in the U.S. has been
funded. In late November, you post an announcement
about the study on campus to get subjects for the study.
100 students sign up for the study. Of these, 78 are
members of campus fraternities; the other 22 are members
of the school’s football team.
Places threat:
Did the study work because of the
unusual place you did the study in?
Example:
Suppose that you conduct an “educational” study in a
college town with lots of high-achieving, educationally oriented kids.
Time threat:
Was the study conducted at a peculiar time?
Example:
Suppose that you conducted a smoking cessation study
the week after the U.S. Surgeon General issued the well-publicized
results of the latest smoking and cancer studies.
In this instance, you might get different results than if you
had conducted the study the week before.
Demand Characteristics
Participants are often provided with cues to the
anticipated results of a study.
Example:
When asked a series of questions about depression, participants
may become wise to the hypothesis that certain treatments may
work better in treating mental illness than others. When participants
become wise to anticipated results (termed a placebo effect), they
may begin to exhibit performance that they believe is expected of
them.
Making sure that subjects are not aware of anticipated outcomes
(termed a blind study) reduces the possibility of this threat.
Hawthorne Effects
Similar to a placebo effect: research has found that the mere
presence of others watching a person’s performance
causes a change in that performance. If this change is
significant, can we be reasonably sure that it will also
occur when no one is watching?
Addressing this issue can be tricky but employing a
control group to measure the Hawthorne effect of those
not receiving any treatment can be very helpful. In this
sense, the control group is also being observed and will
exhibit similar changes in their behavior as the
experimental group therefore negating the Hawthorne
effect.
Order Effects (carryover effects)
Order effects refer to the order in which treatment
is administered and can be a major threat to
external validity if multiple treatments are used.
Example:
If subjects are given medication for two months, therapy for another
two months, and no treatment for another two months, it would be
possible, and even likely, that the level of depression would be least
after the final no treatment phase. Does this mean that no treatment
is better than the other two treatments? It likely means that the
benefits of the first two treatments have carried over to the last phase,
artificially elevating the no treatment success rates.
The Role of Experimental Design
In most social and behavioral research studies, we attempt to
obtain at least one score from each participant (usually
more!). Any obtained score is comprised of a number of
components:
1. A ‘true score’ for the thing we hope we are measuring.
2. A ‘score for other things’ that we measure inadvertently.
3. Systematic (non-random) bias (usually ok as long as it affects all
participants equally).
4. Random (non-systematic) error (which should cancel out over large
numbers of observations).
We want our obtained score to consist of as much ‘true
score’, and as little of the other factors, as possible.
Research Study Control
Control removes sources of error in inferences
– Reduces the chance of wrong conclusions
– Increases the power of statistics to find
relationships in the presence of random error
(“noise”)
Types of Control
– Direct Manipulation
– Randomization
– Statistical Control
Types of Control:
Direct Manipulation
Sources of error held constant by
research design or sampling decisions
– Example: a researcher investigating the effects of
seeing justified violence in video games on
children knows that young children cannot
interpret the motives of characters accurately.
She decides to limit her study to older children
only, to eliminate random responses or
unresponsiveness of younger children.
Types of Control:
Randomization
Unknown sources of error are equalized
across all research conditions by randomly
assigning subjects or by randomly choosing
experimental materials.
– Example: Many different factors are known to affect
the amount of use of Internet social networking sites. A
researcher wants to test two different site designs. He
randomly assigns subjects to work with each of the two
designs. This equalizes the amount of confounding
error from unknown factors in both groups.
Types of Control:
Statistical Control
Known confounding variables are measured,
and mathematical procedures are used to
remove their effect.
– Example: A political communication researcher
interested in studying emotional appeals versus
rational appeals in political commercials suspects that
the effects vary with the age of the viewer. She
measures age, and uses it as an independent predictor
(with multivariate statistics) to isolate, describe, and
remove its effect.
Contrasting Methods of Control
Direct Manipulation
– Strength: Removes effect completely
– Weakness: Must know source of effect
– Weakness: Reduces generalizability
Randomization
– Strength: Don’t have to know source of effect
– Strength: Equalizes effect so there is no systematic confound
– Weakness: Reduces statistical power by adding to unsystematic error variance
Statistical Control
– Strength: Estimates effect of confounding variables
– Strength: Expands theoretical model
– Weakness: Must know source of effect
– Weakness: Requires more complex statistics
Basic Types of Research
• Observational Methods
• Quasi-Experimental Designs
• True Experimental Designs
Observational Methods
No direct manipulation of variables by the
researcher. Behavior is merely recorded--but
systematically and objectively so that the
observations are potentially replicable.
Advantages
• Reveals how people normally behave.
• Experimentation without prior careful observation can lead to a
distorted or incomplete picture.
Disadvantages
• Generally more time-consuming.
• Doesn’t allow identification of cause and effect.
Quasi-Experimental Design
In a quasi-experimental study, the experimenter
does not have complete control over manipulation
of the independent variable or how participants
are assigned to the different conditions of the
study.
Advantages
• Natural setting
• Higher face validity (from practitioner viewpoint)
Disadvantages
• Not possible to isolate cause and effect as conclusively as with a
“true” experiment.
Types of
Quasi-Experimental Designs
One Group Post-Test Design
Time: Treatment → Measurement
Change in participants’ behavior may or may not be
due to the intervention.
Prone to time effects, and lacks a baseline against
which to measure the strength of the intervention.
One Group Pre-test Post-test Design
Time: Measurement → Treatment → Measurement
Comparison of pre- and post-intervention scores
allows assessment of the magnitude of the
treatment’s effects.
Prone to time effects, and it is not possible to
determine whether performance would have
changed without the intervention.
Interrupted Time-Series Design
Time: Measurement → Measurement → Measurement → Treatment → Measurement → Measurement → Measurement
Don’t have full control over manipulations of
the IV. No way of ruling out other factors.
Potential changes in measurement.
Static Group Comparison Design
Group A (experimental group): Treatment → Measurement
Group B (control group): No Treatment → Measurement
(Time runs left to right.)
Participants are not assigned to the conditions randomly.
Observed differences may be due to other factors.
Strength of conclusions depends on the extent to which
we can identify and eliminate alternative explanations.
Experimental Research:
Between-Groups and
Within-Groups Designs
Between-Groups Designs
Separate groups of participants are used for each
condition of the experiment.
Within-Groups (Repeated Measures) Designs
Each participant is exposed to each condition of
the experiment (requires fewer participants than a
between-groups design).
Between-Groups Designs
Advantages
• Simplicity
• Less chance of practice and fatigue effects
• Useful when it is not possible for an individual to participate in all of the experimental conditions
Disadvantages
• Can be expensive in terms of time, effort, and number of participants
• Less sensitive to experimental manipulations
Examples of
Between-Groups Designs
Post-test Only / Control Group Design
Random allocation, then over time:
Group A: Treatment → Measurement (experimental group)
Group B: No Treatment → Measurement (control group)
If randomization fails to produce equivalence, there is no way
of knowing that it has failed. Experimenter cannot be certain
that the two groups were comparable before the treatment.
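As a minimal sketch of the random allocation step in this design, the Python snippet below shuffles a list of participant IDs and splits it into an experimental and a control group. The function name `randomly_allocate`, the participant IDs, and the fixed seed are illustrative assumptions, not part of the original slides.

```python
import random


def randomly_allocate(participants, seed=None):
    """Randomly split participants into experimental and control groups."""
    rng = random.Random(seed)        # local generator; leaves global random state alone
    shuffled = list(participants)    # copy so the caller's sequence is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


# Hypothetical participant IDs, for illustration only.
groups = randomly_allocate([f"P{i:02d}" for i in range(1, 21)], seed=42)
print(len(groups["experimental"]), len(groups["control"]))
```

Random allocation makes the groups equivalent only on average. Without a pre-test, nothing in this procedure checks whether a particular allocation happened to produce comparable groups.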
Pre-test / Post-test Control Group Design
Random allocation, then over time:
Group A: Measurement → Treatment → Measurement
Group B: Measurement → No Treatment → Measurement
Pre-testing allows the experimenter to determine the equivalence
of the groups prior to the intervention. However, pre-testing may affect participants’ subsequent performance.
Solomon Four-Group Design
Random allocation, then over time:
Group A: Measurement → Treatment → Measurement
Group B: Measurement → No Treatment → Measurement
Group C: Treatment → Measurement
Group D: No Treatment → Measurement
Within-Groups Designs:
Repeated Measures
Advantages
• Economy
• Sensitivity
Disadvantages
• Carry-over effects from one condition to another
• The need for conditions to be reversible
Repeated-Measures Design
Random allocation to an order of conditions, then over time:
Order 1: Treatment → Measurement → No Treatment → Measurement
Order 2: No Treatment → Measurement → Treatment → Measurement
The potential for carry-over effects can be reduced by randomizing the order
of presentation of the different conditions or counterbalancing the order
in which participants experience them.
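Counterbalancing can be sketched in a few lines of Python: list every possible order of the conditions and hand the orders out to participants in rotation, so each order is used equally often. The function name `counterbalanced_orders`, the condition labels, and the participant IDs are hypothetical.

```python
from itertools import cycle, permutations


def counterbalanced_orders(conditions, participants):
    """Assign each participant one order of conditions, cycling through all possible orders."""
    all_orders = list(permutations(conditions))
    return {p: order for p, order in zip(participants, cycle(all_orders))}


# Hypothetical conditions and participant IDs.
assignment = counterbalanced_orders(["Treatment", "No Treatment"], ["P1", "P2", "P3", "P4"])
for participant, order in assignment.items():
    print(participant, " -> ".join(order))
```

In practice the participant list would usually be shuffled first, so that which participant receives which order is itself random. With many conditions the number of possible orders grows quickly, which is one reason to use a Latin square instead of full counterbalancing.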
Latin Squares Design
Three Conditions or Trials
Order of conditions or trials:
One group of participants: A B C
Another group of participants: B C A
Yet another group of participants: C A B
The order of presentation of conditions in a within-subjects design can be
counterbalanced so that each condition occurs just once in each ordinal position.
The problem is not completely eliminated, because A precedes B twice but B precedes
A only once. The same holds for C and A.
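The imbalance described above can be checked with a short Python sketch. It builds the three-condition square by rotating the condition list one step per row, then counts in how many rows one condition comes before another. The function names `cyclic_latin_square` and `precedence_counts` are illustrative assumptions.

```python
from itertools import permutations


def cyclic_latin_square(conditions):
    """Build a Latin square by rotating the list of conditions one step per row."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def precedence_counts(square):
    """Count, for each ordered pair (x, y), the rows in which x comes before y."""
    conditions = square[0]
    return {
        (x, y): sum(row.index(x) < row.index(y) for row in square)
        for x, y in permutations(conditions, 2)
    }


square = cyclic_latin_square(["A", "B", "C"])
counts = precedence_counts(square)
print(square)
print(counts[("A", "B")], counts[("B", "A")])
```

For the three orders on the slide, the counts come out as 2 for A before B and 1 for B before A, and likewise 2 for C before A and 1 for A before C.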
Balanced Latin Squares Design
Four Conditions or Trials
Order of conditions or trials:
One group of participants: A B C D
Another group of participants: B D A C
Yet another group of participants: D C B A
And yet another group of participants: C A D B
Note: This approach works only for experiments with an even number of conditions. For
additional help with more complex multi-factorial designs, see: http://www.jic.bbsrc.ac.uk
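The following Python sketch checks the four orders shown on this slide against the usual criterion for a balanced Latin square, assumed here to mean that each condition appears once in every position and immediately precedes every other condition exactly once. The function names `is_latin_square` and `is_balanced` are hypothetical.

```python
from collections import Counter


def is_latin_square(square):
    """True if every condition appears exactly once in each row and each column."""
    conditions = set(square[0])
    rows_ok = all(set(row) == conditions and len(row) == len(conditions) for row in square)
    cols_ok = all(set(col) == conditions for col in zip(*square))
    return len(square) == len(conditions) and rows_ok and cols_ok


def is_balanced(square):
    """True if each condition immediately precedes every other condition exactly once."""
    pairs = Counter((row[i], row[i + 1]) for row in square for i in range(len(row) - 1))
    n = len(square[0])
    return len(pairs) == n * (n - 1) and all(count == 1 for count in pairs.values())


# The four orders shown on the slide.
slide_square = [list("ABCD"), list("BDAC"), list("DCBA"), list("CADB")]
print(is_latin_square(slide_square), is_balanced(slide_square))
```

Running the same check on the three-condition square (A B C, B C A, C A B) returns False for `is_balanced`, because A immediately precedes B in two rows and B never immediately precedes A.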
Factorial Designs
• include multiple independent variables
• allow for analysis of interactions
between variables
• facilitate increased generalizability
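To make the structure of a factorial design concrete, the Python sketch below enumerates the cells of a design by crossing every level of one independent variable with every level of another. The two factors and their levels are hypothetical examples, not taken from the slides.

```python
from itertools import product

# Hypothetical independent variables and their levels.
factors = {
    "warning_label": ["absent", "present"],
    "font_size": ["small", "medium", "large"],
}

# Each cell of the design is one combination of levels, one level per factor.
cells = [dict(zip(factors, levels)) for levels in product(*factors.values())]

print(len(cells))  # 2 x 3 = 6 cells
for cell in cells:
    print(cell)
```

Because every level of one factor is paired with every level of the other, the design can show whether the effect of one variable depends on the level of the other, which is what an interaction means.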
Important Concepts
Alternative hypothesis
Between-groups design
Categorical variable
Central tendency
Confidence intervals
Confounding variable
Construct validity
Content validity
Continuous variable
Correlational research
Counterbalancing
Criterion validity
Degrees of Freedom
Dependent variable
Discrete variable
Dispersion
Effect Size
Experimental research
Face validity
Frequency distribution
Independent variable
Kurtosis
Leptokurtic
Level of Measurement
Mean
Measurement error
Median
Mode
Nominal variable
Normal Distribution
Null hypothesis
Observational study
One-tailed test
Ordinal variable
Outcome variable
Platykurtic
Power
Practice effects
Predictor variable
Quasi-experimental research
Randomization
Range
Reliability
Repeated measures
Sampling distribution
Score-level variable
Skew
Standard Deviation
Standard Error
Systematic variation
Two-tailed test
Type I error
Type II error
Unsystematic variation
Validity
Variance
Within-groups design
z-scores