- Syneratio

Lecture summary: Experimental Research
HC 1: Introduction
Experimental research – key features:
An experimental researcher:
A. Tries to provide insights into the behavior of consumers
B. By identifying causal relationships between variables
C. Using controlled experiments
A – Behavior of consumers. Experimental research studies various influences on behavior:
- Situational influences (external/contextual differences), e.g., 3 vs. 7 commercials on attitude toward the target ad
- Chronic influences (individual differences), e.g., low-involvement vs. high-involvement consumers
- Interactions between situational and chronic influences, e.g., LI vs. HI consumers × number of commercials
Behavior is not only about what people do and their acts, but also about their thoughts, emotions, physiological reactions, etc.
Consumer behavior: subdomain in marketing based on psychology.
B – Causal relationships – 3 key features of CR
1. The cause must be related to the effect
Evidence that IV and DV covary (change together) in predicted ways
2. The cause must precede the effect
Evidence that IV occurs before DV to solve directionality problem.
o Aggressive behavior and watching aggressive movies
o Self-esteem and addictive behavior
3. No plausible alternative explanations must exist for the effect other than the cause
Evidence that IV is the only factor affecting DV to solve third variable problem
o Car color and accident rate
o Maternal smoking and later criminality
C – Experiments
Main data collection method when conducting experimental research. Basic idea is quite simple:
1. Manipulate IV
a. Give treatments to different groups of participants (conditions)
2. Measure effects on DV
a. Record the responses following the treatments
3. Control for confounding factors
a. Keep everything else between groups identical
Experimental research – Settings
Laboratory experiments
True experimental designs
Higher internal validity
Lower external validity
E.g., unconscious thinking study
Field experiments
Quasi experimental designs
Lower internal validity
Higher external validity
E.g. towel reuse study
Experimental research: (dis)advantages
Causal inferences can be made because problems of directionality and third variables are ruled out
Inadequate method of scientific inquiry because it views behavior as mechanistic and manipulable
Artificial because it is conducted in highly controlled environments
Only limited number of variables can be investigated at the same time
HC 2: Theoretical framework
Problem identification
Sources of research ideas
Real life experiences (we’re all customers)
Previous research and theory
o Conflicting findings
o Boundary conditions
o Find explanation for findings
o Propositions of a theory that are not tested yet
o Applying a theory to a consumer setting
Further define your topic by identifying all relevant factors
What do you want to explain/clarify?
How do you want to explain it?
What is your field of interest?
This leads to your problem statement
Good research problems should both
1. Have real life relevance (implications for managers or consumers)
2. Contribute to current knowledge (theoretical contribution)
o Low: consumers rely on heuristics
o High: consumers rely on reason
Consult literature again:
Investigate to what extent your problem has already been researched
Look for relevant theories and prior findings to design your hypotheses
Search in databases: ABI/Inform, Econlit, PsycInfo, Web of Science etc.
Only use articles that were published in TOP JOURNALS
A hypothesis is not a vague statement but should be:
As specific as possible
Indicate a clear causal prediction between the variables
Capable of being refuted or confirmed
o Happy moods make consumers mindless
o Thin models are bad
A hypothesis should be based on strong support
Two types of effects can be hypothesized
Main effect:
The effect of one IV on DV over all treatment levels of the other IVs (direct effect)
Interaction effect (for 2 IVs or more)
The effect of one IV on the DV depends on the level of the other IV
[Figure: IV A with three treatment levels (A1, A2, A3) × IV B with two treatment levels (B1, B2); no interaction]
Example: you expect an effect in response to your ad, but a stronger one for low-involvement than for high-involvement consumers!
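To make the main-effect versus interaction distinction concrete, here is a minimal Python sketch that computes both from the four cell means of a 2 x 2 design. The function name and all cell means are hypothetical (not lecture data); A = argument strength (1 weak, 2 strong), B = involvement (1 low, 2 high). The main effect of A averages its simple effects over the levels of B; the interaction is the difference between those simple effects, so parallel lines give an interaction of zero.

```python
def effects_2x2(means):
    """Main effects and interaction contrast for a 2 x 2 design.

    `means` maps (a_level, b_level) -> cell mean, with levels 1 and 2.
    """
    # Simple effect of A at each level of B (A2 minus A1)
    a_at_b1 = means[(2, 1)] - means[(1, 1)]
    a_at_b2 = means[(2, 2)] - means[(1, 2)]
    # Simple effect of B at each level of A (B2 minus B1)
    b_at_a1 = means[(1, 2)] - means[(1, 1)]
    b_at_a2 = means[(2, 2)] - means[(2, 1)]
    return {
        "main_A": (a_at_b1 + a_at_b2) / 2,  # averaged over the levels of B
        "main_B": (b_at_a1 + b_at_a2) / 2,  # averaged over the levels of A
        "interaction": a_at_b2 - a_at_b1,   # difference of differences
    }

# Hypothetical attitude means (not lecture data)
parallel = {(1, 1): 3.0, (2, 1): 4.0, (1, 2): 3.5, (2, 2): 4.5}
crossed = {(1, 1): 3.0, (2, 1): 3.5, (1, 2): 3.0, (2, 2): 5.0}

print(effects_2x2(parallel))  # interaction 0.0: parallel lines
print(effects_2x2(crossed))   # interaction 1.5: effect of A depends on B
```

In the second set the argument-strength effect is 0.5 under low involvement but 2.0 under high involvement, which is the pattern a moderator produces.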
Example A
Hypothesis – strong argumentation results in higher product attitudes than weak argumentation, but only when consumers are in a negative mood and not when they are in a positive mood.
Support – ELM model (argumentation strength only affects product attitudes when likelihood of
elaboration is high) – Mood research (happy individuals elaborate less than unhappy individuals).
[Figure: argument strength → consumer attitude for low- vs. high-involved consumers; H1: positive mood; H2: negative mood]
Example B
Hypothesis: participants will demonstrate higher self-esteem after exposure to moderately thin models
than after exposure to moderately thick models and lower self-esteem after exposure to extremely thin
models than after exposure to extremely thick models.
Support: Selective accessibility model (Mussweiler 2003) – standard selection – holistic similarity assessment – self-evaluation (assimilation or contrast)
[Figure: size of models (thin vs. thick models) → self-esteem; H1: moderately; moderator: how extreme?]
Example C
Hypothesis – Distraction will result in a higher preference for tasted food products, and this effect should be stronger for impulsive than for prudent consumers
Support – Two-component model of sensory experiences (pain): an “affective component (emotional reactions)” and an “informational component (objective features)” – Research on impulsivity (prudent consumers think a lot about objective features and have more accessible cognitions than impulsive consumers)
Statistical conclusion validity
Accuracy of accepting or rejecting your hypotheses
E.g., mood affects product attitudes
Types of hypotheses
Scientific hypothesis (or alternative hypothesis)
o Represents the relationship among the variables examined
o E.g., attitudes are higher when mood is positive compared to negative
Null hypothesis
o A statement of NO relationship among the variables
o E.g., attitudes are NOT higher when mood is positive compared to negative
In an experiment, you test the null hypothesis
As a researcher you expect that the null hypothesis will be rejected
To accept the scientific hypothesis, you must collect evidence to reject the null hypothesis.
Type 1 error
You think you have a sig. effect of an IV on a
DV, but in fact there is NO effect
The higher the confidence level of an
experiment, the lower the chance of a Type 1 error
Confidence level is typically set at 95% so
that the chance of a Type 1 error (α) is limited to 5%
- If the obtained p-value of the effect ≤ .05,
then reject the null hypothesis
- If the obtained p-value of the effect > .05,
then do not reject the null hypothesis
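The claim that α limits the Type 1 error rate to 5% can be checked by simulation. A minimal sketch, assuming normally distributed scores with a known SD of 1 (so a z-test from Python's standard library stands in for the usual t-test); the function names are hypothetical. Both conditions are drawn from the same population, so the null hypothesis is true and every rejection is a Type 1 error.

```python
import random
from statistics import NormalDist

def z_test_p(group1, group2, sigma):
    """Two-sided p-value of a two-sample z-test with known population SD."""
    se = sigma * (1 / len(group1) + 1 / len(group2)) ** 0.5
    diff = sum(group1) / len(group1) - sum(group2) / len(group2)
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type1_rate(n_experiments=2000, n_per_group=20, alpha=0.05, seed=1):
    """Share of simulated experiments that reject H0 although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # Both conditions come from the SAME population: no real effect exists
        g1 = [rng.gauss(0, 1) for _ in range(n_per_group)]
        g2 = [rng.gauss(0, 1) for _ in range(n_per_group)]
        # Decision rule from the notes: reject H0 if p <= alpha
        if z_test_p(g1, g2, sigma=1) <= alpha:
            rejections += 1
    return rejections / n_experiments

print(type1_rate())
```

Across 2,000 simulated experiments the rejection rate should land close to .05; lowering `alpha` lowers it accordingly.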
Type 2 error
You think you do NOT have a sig effect of an
IV on a DV, but in fact there is an effect
The higher the sensitivity (power) of an
experiment, the lower the chance of a Type 2 error.
Sensitivity (power) is a function of 3 things:
Alpha level (but typically fixed at 5%)
Sample size (# of participants)
o The larger the sample size,
the higher the sensitivity
o In practice: 20 participants per
condition (between-participants)
Effect size (magnitude of the relation between IV and DV), increased by:
o Better operationalization of variables
o Better control of the experimental setting
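How alpha, sample size and effect size jointly drive power can be sketched with a normal approximation for a two-sided, two-sample comparison. Assumptions: effect size is expressed as a standardized mean difference d, the SD is treated as known, and `approx_power` is a hypothetical helper, not a function from a statistics package (dedicated power software uses the t-distribution and gives slightly lower values for small samples).

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    d: standardized effect size (mean difference / SD)
    n_per_group: participants per condition (between-participants)
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # about 1.96 for alpha = .05
    shift = d * (n_per_group / 2) ** 0.5  # expected z-value when H1 is true
    # Probability that the observed z falls beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

for n in (20, 50, 100):
    print(n, round(approx_power(0.5, n), 2))
```

With d = 0.5 and 20 participants per condition the approximate power is only about .35, so a medium effect would be missed most of the time; more participants or a larger effect (better operationalization and control) raise it.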
HC 3: Variable operationalization
Manipulation of IVs
Different steps to manipulate your IVs
1. Decide on the number of IVs in your experimental design
2. Create different treatment levels of your IVs
3. Find ways to experimentally induce the various treatments
Number of IVs (1)
Theoretically no limit to the number that can be used, but practically there is a limit
Number of participants needed
Interpretation of data analysis (the larger the number of IVs, the more participants are needed)
Limitations – only include multiple IVs when:
The effect of one IV is expected to depend on the level of other IVs
o One IV takes the role of moderator (in practice)
They can be manipulated orthogonally
o 2 IVs = 3 effects (A; B; A-B)
o 3 IVs = 7 effects (A; B; C; A-B; A-C; B-C; A-B-C)
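The counts above follow the rule 2^k − 1 effects for k orthogonally manipulated IVs: every non-empty combination of IVs is one effect (single IVs are main effects, larger combinations are interactions). A small Python sketch with a hypothetical helper name that lists them:

```python
from itertools import combinations

def factorial_effects(ivs):
    """List all main effects and interactions in a full factorial design."""
    effects = []
    for k in range(1, len(ivs) + 1):  # k = 1: main effects; k >= 2: interactions
        for combo in combinations(ivs, k):
            effects.append("x".join(combo))
    return effects

print(factorial_effects(["A", "B"]))       # ['A', 'B', 'AxB']
print(factorial_effects(["A", "B", "C"]))  # 7 effects = 2**3 - 1
```

With a fourth IV the count already jumps to 15, which is one reason to include extra IVs only when they serve as moderators.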
Previous lecture: example A (IV: argumentation strength; moderator: mood) – example B (IV: model size; moderator: extremity) – example C (IV: distraction; moderator: decision type)
Creating treatments (2)
There are three possibilities to create treatment levels
A. Presence versus absence
B. Amount of a variable
C. Type of a variable
A – presence vs. absence - Idea
One group receives the treatment condition (experimental condition) and the other group does not (control condition)
Aim is to tell if a variable/treatment has an effect
B – amount of a variable - Idea
Administer different amounts of the variable to each of several groups
Aim is not only to tell if a variable has an effect but also to examine what influence varying
amounts of the IV have
Combination of presence vs. absence and amount of a variable is sometimes possible
C – type of a variable – Idea
Vary the type of variable under investigation
Aim is to tell if different types of a variable cause variation on the DV
E.g., immediate reward vs. future reward
Orthogonality: check whether all combinations of conditions are possible, e.g., a 2x2 design
Not orthogonal: in the no-ad condition you cannot distinguish between creative and typical
So this is actually not a moderator, but one ad factor with 3 levels: TP/CRE/NO (typical/creative/no ad)
Inducing treatments (3)
There are several possibilities to induce the various treatments
A. Instructional manipulation
B. Event manipulation
C. Individual manipulation
A - Instructional manipulation
Variation in the IV is caused by differences in instructions provided by the experimenter
Example of instructional mood inductions
Writing a report of a negative or positive personal event
Important to be careful with manipulations
You have to make sure that everyone interprets the instructions in the same way
Instructions should be clear and unambiguous
B – event manipulation
Changes in the physical and social environment/context or changes in exposed stimuli
Examples of contextual mood inductions
C – Individual difference manipulation
Aka personality variables; a measurement of a variable on which individuals may differ
E.g., need for cognition, self-monitoring, self-esteem, anxiety, impulsivity
Example of an individual difference mood induction
Mood assessment scales (e.g., positive affect negative affect schedule (PANAS) scale)
The way you treat the individual difference variable in your analysis matters
Use it as a continuous variable → regression analysis (this is the usual and also the best approach)
Split the variable into high vs. low based on some preset cut-off value or the median → ANOVA
Personality variables are measured instead of manipulated; in practice a median split is also commonly used
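A median split can be sketched in a few lines. Assumption: scores exactly at the median are labelled "low" (conventions differ, and some researchers drop them); the helper name and the scores are hypothetical. The split throws away information about how far each person is from the cut-off, which is one reason the continuous (regression) approach is preferred.

```python
from statistics import median

def median_split(scores):
    """Label each score 'high' or 'low' relative to the sample median.

    Assumption: scores exactly at the median are labelled 'low'.
    """
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores], cut

# Hypothetical need-for-cognition scores of six participants
labels, cut = median_split([2.5, 4.0, 3.1, 5.2, 3.8, 2.9])
print(cut, labels)
```

Here the median is 3.45, so 3.8 and 4.0 end up in the same "high" group as 5.2 even though they are much closer to the "low" scores.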
Main problem with this type of manipulation
The IV is measured instead of experimentally manipulated
Thus, another variable correlated with the IV might be the real cause driving the effects
E.g., need for cognition, verbal intelligence and life satisfaction
E.g., mood, self-esteem, and motivation
Therefore, studies applying an individual difference manipulation are often followed by studies directly
manipulating the IV. E.g., need for cognition followed by an instructional manipulation (think hard vs.
don’t think too much)
Inducing treatments:
Finding reliable manipulations
Ease with which the IV is translated into operational terms varies greatly depending on the
abstractness of the construct.
o Easy (concrete): exposure time, model size, number of arguments
o Hard (abstract): frustration, processing mindsets, product involvement
Three techniques to increase chances of successful IV manipulations
1. Look for manipulations in the literature: look in the literature for manipulations that consistently
and successfully produced the desired treatments, but adapt them to your own population. Example:
Velten mood induction procedure (1968)
2. Add a manipulation check: test whether the experimental manipulation induced the expected
treatments. Example: induce mood in participants AND measure mood at the very end
of the experiment.
3. Pretest your treatments extensively: this helps to design valid operationalizations of IVs. Test
whether the experimental manipulation will induce the expected treatment (especially useful if a manipulation
check would reveal the aim of the study). Example: ad model study. First pretest: participants rated 23 ad
models in terms of size and attractiveness → selection of 4 extremely and 4 moderately
thin/heavy models, which differed in terms of size but not in terms of attractiveness. A second pretest
confirmed that moderate models were perceived as more similar than extreme models.
Combinations of manipulations possible
Manipulations of same IV
E.g., two mood manipulations
o Reading sad vs. happy stories
o Listening to sad vs. happy music
Single factor design containing 2 conditions
Manipulations of different IVs
E.g., mood and argumentation strength
o Listening to sad vs. happy music
o Exposed to strong vs. weak arguments
Two factor design containing 4 conditions: 2 (mood: pos, neg) x 2 (arg. strength: strong, weak)
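In a fully crossed (orthogonal) design, the conditions are every combination of the levels of each IV. A minimal sketch with a hypothetical helper and factor names, reproducing the 2 x 2 mood × argument strength design:

```python
from itertools import product

def conditions(factors):
    """Enumerate all cells of a fully crossed (orthogonal) factorial design."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

design = conditions({"mood": ["pos", "neg"], "arg_strength": ["strong", "weak"]})
for cell in design:
    print(cell)
print(len(design), "conditions")  # 2 x 2 = 4
```

If one combination cannot exist (as with "no ad" crossed with creative vs. typical), the factors are not orthogonal and should be collapsed into a single factor instead.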
Order of manipulating multiple IVs
What makes most sense?
So: option 1!
Measurement of DV
Different steps to measure DV
1. Decide on the number of DVs
2. Decide on the specific measure of the various DVs
1 – number of DVs
More than one DV is possible when IVs are expected to affect multiple constructs (higher chance of finding an effect)
How to analyze?
When DVs are moderately correlated: MANOVA
o Protection against Type 1 errors
o More powerful
o Easier to analyze
When DVs capture unrelated constructs: Multiple ANOVAs
o Sometimes one measure is a mediating (underlying) variable: mediation analysis
2 – Measurement of DVs
Behavior can be continuous (interval or ratio scale)
Time of investigating products, decision time
Number of word puzzles you resolve
7-point attitude/purchase intention scales
Appearance self-esteem scale (Heatherton and Polivy 1991)
Behavior can be discrete (nominal scale): choice
Product choice
E.g., choice between tasted food product and well-known alternative
!!!! Select a continuous DV when possible because (a) more sensitive and (b) easier to analyze!
Example A
Approval of an increase in student fees from 1 (strongly disapprove) to 9 (strongly approve)
(continuous DV)
Indication of the fee they would consider appropriate (continuous DV)
Number of thoughts that came to mind within 3 min (continuous DV)
Example B
Example C
Choice between Lindt (sampled product) and Godiva (non-sampled product) (discrete DV, coded 0/1)
Finding reliable DV measurements
1. Get your participants committed
a. Make the experiment interesting, attractive, fun
b. Give rewards to motivate them, but this may reduce external validity (motivation is then higher than in real life)
2. Control of participant interpretation
a. Retrospective verbal report
i. Ask each individual after the experiment about what they think the purpose of
the experiment was
b. Concurrent verbal reports
i. Ask them to report their thoughts during the experiment
3. Disguise as much as possible the aim of the study (most important one!)
a. Disguise the relationship between IV and DV
i. Take care of the cover story of the study (deception)
1. Ad models: magazine attractiveness, target audience
ii. Use different (ostensibly unrelated) phases (with different experimenters)
1. E.g., a “next, different” study on students’ self-esteem
b. Disguise the DV measurement
i. Outside lab setting (e.g., reward choice)
ii. In unobtrusive way (e.g., taped, timed behavior)
iii. Use filler items (beware of their influence)
Construct validity
Accuracy of the operationalizations
How good do they represent intended constructs?
Main threats to construct validity
1. Reactivity to experimental situation
2. Experimenter effect
1. Reactivity
a. Motives of participants can influence their perception of the experiment and their responses
b. Two main types of reactivity
i. Demand characteristics
1. Participants try to figure out the purpose of the study and answer accordingly
ii. Positive self-presentation
1. Participants try to appear as positive as possible
c. Techniques to avoid reactivity
i. Disguise purpose of study as much as possible = (partial) blind experiment
ii. Control of participant interpretation
Answering what is desired: e.g., in a Unilever study participants are more positive (“we love Unilever”). Double blind: the lab assistant does not know anything either.
2. Experimenter effect
a. Experimenter has motive to support the hypothesis which can unintentionally lead to
recording errors
b. Techniques to reduce experimenter effect
i. Control of recording errors
1. Use multiple data recorders or have participants make response on a
ii. Control of attribute errors
1. Preferably use the same experimenter in all treatment conditions
iii. Control of experimenter expectancies
1. (partial) blind technique: experimenter is not aware of the condition the
participant is assigned to
Find evidence to support hypothesis.
HC 4: Control
Internal validity: refers to the accuracy of the inference that the IV caused the effect observed in the DV.
The accuracy with which you can say with certainty that you found a causal relationship. Example:
What is the effect of argumentation strength on product attitudes under a negative mood?
o Sad participants are exposed to ads with either strong or weak arguments
o Strong > weak : favorable attitude
o However, participants in the strong arguments condition are more familiar with the
product compared to participants in the weak arguments condition
o Familiarity is an extraneous variable → we did not control for this variable, so the internal
validity of our experiment is low!
Threats to internal validity
History: any event occurring after the experimental treatment (IV) is introduced that could
produce the observed effect. Example: test market of a new ad campaign; after half a year, sales
changes are measured; however, in the meantime, a competitor went bankrupt (an alternative
explanation for the increase in sales).
Maturation: changes in biological and psychological conditions that occur with the passage of
time. Example: product attitude measured under low distraction; 30-minute filler task; product
attitude measured under high distraction. Distraction seems to result in lower product
evaluations, but participants got bored/tired/stressed, which is why they gave more negative reactions.
Instrumentation: changes in the assessment of the DV. Example: effect of package color on
customer approach in a supermarket. An observer measures customer approach: 2 hours with package A,
the next 2 hours with package B. Package B results in less approach compared to package A. Alternative
explanation: after 2 hours the observer gets tired and misses approaches.
Testing: prior measurement of the DV affects the results of subsequent measurements (multiple
measurements). Example: measurement of product attitude at time 0, exposure to a product ad
at time 1, measurement of product attitude at time 2. Increase in product attitude from time 0
to time 2. Alternative explanation: a mere exposure effect, so people react more
positively. Also social desirability, because people start to realize they are being tested!
Regression artifact: tendency of extreme scores to become less extreme on a second
assessment. Example 1: testing the effect of noise on product attitude. Select participants
with a high product attitude, expose them to noise during product evaluation, and observe a
decrease in product attitude. So noise reduces product attitude? No: extremely high scorers tend to
score less extremely on a second measurement anyway. Example 2: last year, particularly high
accident rates; as a consequence, a new traffic policy was introduced; the subsequent year, accident
rates dropped. Conclusion: the traffic policy was effective? Alternative explanation: the drop may be
regression to the mean, so you don't know if it is due to the new traffic policy!
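A short simulation shows that a regression artifact needs no treatment at all. Assumptions (hypothetical, for illustration only): each observed score is a stable true attitude plus independent random measurement error, both with SD 1; participants are selected for being in the top 10% at time 1, and nothing is done to them before time 2. The function name is hypothetical.

```python
import random
from statistics import mean

def regression_artifact(n=10_000, top_share=0.10, seed=7):
    """Simulate two measurements with NO treatment in between.

    Observed score = stable true attitude + random measurement error.
    Returns the time-1 and time-2 means of the participants selected
    for extreme (high) scores at time 1.
    """
    rng = random.Random(seed)
    true = [rng.gauss(0, 1) for _ in range(n)]
    time1 = [t + rng.gauss(0, 1) for t in true]
    time2 = [t + rng.gauss(0, 1) for t in true]  # nothing happens in between
    cutoff = sorted(time1)[int(n * (1 - top_share))]  # top 10% at time 1
    selected = [i for i in range(n) if time1[i] >= cutoff]
    return mean(time1[i] for i in selected), mean(time2[i] for i in selected)

t1, t2 = regression_artifact()
print(round(t1, 2), round(t2, 2))
```

Under these assumptions the selected group's time-2 mean drops to roughly half its time-1 mean, a "decrease" that a design without a control group would wrongly attribute to the treatment (e.g., the noise).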
Attrition (experimental mortality): some participants do not show up or do not complete the test.
Example: effect of product experiences on attitudes toward an unknown brand. Week 1: exposure to
product information and measurement of product attitude. Week 2: measurement of product
attitude. An increase in product attitude is observed, but only 60% returned for week 2: participants who
didn't like the product didn't show up anymore, so overall attitudes are more positive!
Selection: unequal distribution of participant-related characteristics over conditions. Example:
does following the lectures result in better course performance? Compare grades of students
who followed the lectures with those who did not. Students who followed the lectures obtained
higher grades; however, maybe these students were more motivated. Thus, motivation may be
the explaining factor for the higher grades: not the quality of the lectures but the intrinsic motivation of the students.
Additive and interactive effects
Threats to internal validity can operate simultaneously, typically in combination with selection.
Example: test of a new store layout → store A gets the new layout whereas store B
keeps the old layout. A competitor located close to store A starts an ad campaign. The new
store layout did not increase sales. Selection-history effect: one group of
participants is different from the other group, and the competitor's ad campaign has a
stronger influence on one group than on the other.
Controlling for confounds
Two ways to control for potential confounding factors
A. True experimental design (HC 4)
a. Control/comparison condition
b. Assignment of participants to conditions
B. Statistical control (HC 10)
a. Add potential confounding factors as covariates in your analysis (ANCOVA)
b. See lecture 10 on data analysis: final issues
Control conditions
A control condition is a condition that does not get the treatment, or gets a standard value
o Serves as source of comparison to the experimental group
o Controls for rival hypothesis (by controlling for extraneous variables)
o The effect of alcohol on risk taking
o Placebo effect
o Hawthorne effect
Placebo example: patients who suffer pain
Experimental group: medicine
Control group: no medicine
Placebo control group: placebo
Real effect: the post-test difference between the medicine and placebo groups
Hawthorne example: to increase productivity, lighting was increased for a random group of
employees in a corner; productivity went up. Lighting was decreased in another corner;
productivity also went up.
Conclusion: smaller groups make employees feel more special, which increases productivity.
They should have put a small group in a corner with normal lighting AND a small group in a
corner with increased lighting.
Assigning participants
a) Between-participants design (only measured once!)
a. Each participant is only exposed to one treatment
b. Independent measures design
b) Within-participants design (repeated measures)
a. Each participant is exposed to all treatments
b. Repeated measures design
c) Mixed design
a. Combination of between- and within-participants design
b. Hybrid design
a)- between participants design – Example (Sela, Berger, and Liu 2009)
Hypothesis: when assortment size is larger, consumers choose products that are easier to justify
Experiment 1a – 2 (assortment size: smaller vs. larger) between-participants design
Choice for reduced fat ice cream: 20% in smaller assortment size condition; 37% in larger assortment
size condition.
Experiment 4 – 2 (assortment size: smaller vs. larger) x 2 (licensing: low vs. high) between-participants
Choice for work (vs. fun) laptop
[Figure: choice for work laptop by licensing condition (low vs. high), smaller vs. larger assortment]
Who chooses the work laptop? Only the low-licensing column shows the same effect as the ice cream study. Low licensing → imagine 3 hours of
hedonic activity. High licensing → imagine 3 hours of community service. Then a choice between work and
fun laptops. Conclusion: justification is driving the choice of laptop!
Major threat to internal validity → selection bias!
Control techniques
o Randomly assigning participants across conditions (non-randomly selected sample but
that is randomly distributed over different groups)
o Should not be confused with random selection
o Matching participants for an extraneous variable
o Results in more sensitivity (e.g., only men!)
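Random assignment (as opposed to random selection) can be sketched as shuffling whoever is in the sample and dealing them over the conditions like cards, which also keeps group sizes equal. The helper name and participant IDs are hypothetical; 20 per condition follows the rule of thumb from HC 2.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Randomly assign participants to conditions in (near-)equal group sizes.

    The sample itself need not be randomly selected; only the
    distribution over conditions is random.
    """
    rng = random.Random(seed)
    shuffled = participants[:]  # copy so the original list is untouched
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)  # deal like cards
    return groups

groups = randomly_assign([f"P{i}" for i in range(1, 41)], ["control", "treatment"], seed=42)
print({c: len(members) for c, members in groups.items()})  # 20 per condition
```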
When to use matching?
1. Small N, so randomization is risky and might yield groups that are unequal on influential extraneous variables
2. The matching variable is expected to be correlated with the DV and so exert an effect on it (confound)
3. There is a way to measure participants on the matching variable in advance.
Matching by holding variables constant
All individuals in all conditions will have the same degree or type of extraneous variable. Examples:
have only males, impulsive consumers, introverts, or price-sensitive consumers in your design
Disadvantages: restricts the population size and restricts generalization to the type of participants in the study
Matching by equating participants
Participants in the various conditions are explicitly equated on the extraneous variable
How can we equate participants?
Precision control
o Each participant is matched with other participants of the same age, gender, …, and
randomly assigned to conditions
o Excellent for increasing sensitivity
o But: impractical + many exclusions
Frequency distribution control
o Equate overall distribution of selected variables
o But: combination of variables may be mismatched (not perfect but better than not
matching the participants)
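Precision control can be approximated with a matched-groups procedure: sort participants on the matching variable, form blocks of as many people as there are conditions, and randomize within each block. A minimal sketch matching on a single variable (age) only; the helper, names and ages are hypothetical. Participants without a complete match are excluded, which illustrates the "many exclusions" drawback.

```python
import random

def matched_assignment(participants, key, conditions, seed=None):
    """Match participants on `key`, then randomize within each matched set.

    Participants are sorted on the matching variable and cut into blocks of
    len(conditions); each block is spread at random over the conditions.
    Leftover participants who cannot form a full block are excluded.
    """
    rng = random.Random(seed)
    ordered = sorted(participants, key=key)
    k = len(conditions)
    groups = {c: [] for c in conditions}
    excluded = []
    for start in range(0, len(ordered), k):
        block = ordered[start:start + k]
        if len(block) < k:
            excluded.extend(block)  # no complete match available
            continue
        rng.shuffle(block)  # random assignment within the matched block
        for cond, person in zip(conditions, block):
            groups[cond].append(person)
    return groups, excluded

# Hypothetical participants (name, age), matched on age
people = [("Ann", 19), ("Bob", 23), ("Cas", 20), ("Dee", 31), ("Eli", 24)]
groups, excluded = matched_assignment(people, key=lambda p: p[1],
                                      conditions=["control", "treatment"], seed=3)
print(groups, excluded)
```

Ann and Cas (19 and 20) end up in different conditions, as do Bob and Eli (23 and 24); Dee has no match and is excluded.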
Matching by building extraneous variable in the model
Treat the extraneous variable as another IV. Example
Effect of argumentation strength on attitude towards brand
This effect may depend on brand familiarity
Brand familiarity measured and built into the model as an IV
Should be used only when
The effect of the extraneous variable is interesting
The effect of the IV on the DV will not be the same on all levels of the extraneous variable (i.e.,
extraneous variable functions as a moderator)
Matching by yoked control
The temporal sequence of events in an experiment is kept constant across conditions. Examples:
Study of the relationship between stress and stomach ulcers
Effect of control over product information acquisition on product attitude
Should be used when a temporal relationship between event and response is expected.
Also measuring DV before treatment
Given that you randomize or match participants:
Both groups should be similar except for the manipulation of the IV, so pretest scores should also be similar
Thus, one can directly observe a change in behavior as a result of the treatment /manipulation
IV by looking at the posttests
However, pretesting:
Costs time and money
May lead to testing effects
So why can it still be interesting to pretest?
Advantages of pretesting
To ensure initial comparability
o What if randomization or matching was not successful?
To test for a ceiling or floor effect
o What if individuals have an extremely high or low initial score on the DV?
To test for initial position
o What if participants vary strongly with respect to their initial DV scores?
To obtain evidence of change
o Treatment condition should show a change from pretest to posttest due to treatment;
the control condition should not.
Within-participants design – example
Hypothesis: it is easier to recall high meaningful brands than low meaningful brands
Experiment (option 1): 2 (meaningfulness: low vs. high) within participants design
Major threats: History, instrumentation, maturation, testing effects!
Control technique:
o Techniques to control order effects
- Intraparticipant counterbalancing
- Intragroup counterbalancing
Intraparticipant counterbalancing
Control for ordering effects by changing the order of treatments for each individual participant
Intragroup counterbalancing
Control for ordering effects by changing order of treatments for different groups of participants
Instead of varying the order within participants, we vary it between groups of participants
This has the main advantage that participants only have to take each treatment once
Two main methods of intragroup counterbalancing
1. Complete counterbalancing
2. Incomplete counterbalancing
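Complete counterbalancing uses every possible order of the treatments, one order per group of participants. A short sketch with a hypothetical helper shows why this quickly becomes impractical: the number of orders is n!.

```python
from itertools import permutations

def complete_counterbalancing(treatments):
    """All possible treatment orders; each order goes to a different group."""
    return [list(order) for order in permutations(treatments)]

orders = complete_counterbalancing(["A", "B", "C"])
for group, order in enumerate(orders, start=1):
    print(f"group {group}: {' -> '.join(order)}")
print(len(orders), "groups needed")  # 3! = 6; with 5 treatments already 120
```

With 5 treatments and 20 participants per order this would require 2,400 participants, which is why incomplete counterbalancing is used instead.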
Mixed design
So, which design should you use?
Use within-participants manipulations because
o More economic
o Higher sensitivity
But in practice often between-participants manipulations are used because
o Conditions are not reversible
o Simplicity
- Procedure (no counterbalancing)
- Statistical analyses (fewer assumptions)
HC 5: External validity and quasi-experimental designs
Previous lecture, 5 treatments (incomplete counterbalancing):
B follows A 2x
C follows A 0x
D follows A 2x
E follows A 0x
No more than that.
Not good: if A is difficult, it has more negative consequences for the recall of B and D, so also use the reversed orders:
E follows A 2x
C follows A 2x
20 participants per condition
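The pattern in these notes (B and D each follow A twice, C and E never, until the reversed orders are added) is exactly what a cyclic balanced Latin square with five treatments produces. A minimal sketch, assuming the Williams construction of a balanced Latin square (a standard technique; the lecture may have built its square differently); the helper names are hypothetical. For an odd number of treatments the reversed orders are appended, after which every treatment immediately follows every other treatment equally often.

```python
from collections import Counter

def balanced_latin_square(n):
    """Treatment orders (0..n-1) for a Williams balanced Latin square.

    For odd n, the mirrored orders are appended so that every treatment
    immediately follows every other treatment equally often.
    """
    first, low, high = [0], 1, n - 1
    while len(first) < n:  # first row: 0, 1, n-1, 2, n-2, ...
        if len(first) % 2 == 1:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
    orders = [[(t + shift) % n for t in first] for shift in range(n)]
    if n % 2 == 1:
        orders += [order[::-1] for order in orders]
    return orders

def follows_counts(orders, labels):
    """How often each treatment immediately follows another across orders."""
    counts = Counter()
    for order in orders:
        for prev, nxt in zip(order, order[1:]):
            counts[(labels[prev], labels[nxt])] += 1
    return counts

labels = "ABCDE"
square = balanced_latin_square(5)
first_half = follows_counts(square[:5], labels)
print([first_half[("A", x)] for x in "BCDE"])  # [2, 0, 2, 0]
full = follows_counts(square, labels)
print([full[("A", x)] for x in "BCDE"])        # [2, 2, 2, 2]
```

Each row is one treatment order given to one group; with the reversed orders included, five treatments need ten groups instead of the 120 required by complete counterbalancing.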
External validity: the extent to which experimental results can be generalized across people, settings,
treatment variations, outcomes and times
Types of external validity:
Population validity
Ecological validity
Temporal validity
Treatment variation validity
Outcome validity
Population validity: generalizability across populations
Threats: Use convenience sample (e.g., students), Matching (e.g., only females)
Ecological validity: Generalizability across settings or environments
Threats: use of scenarios, simulated shopping shelves, exposure to ads, etc.
Temporal validity: Generalizability across time
Threats: (time between IV and DV: ad exposure immediately followed by product evaluations, product
information → recall task) (different points in time: morning vs. evening, Monday vs. Friday, winter vs. summer)
Treatment variation validity: generalizability across manipulations (e.g., can the results obtained with one
mood induction also be obtained with a similar mood induction?)
Outcome validity: generalizability across different but related DVs (e.g., product evaluations and
product choices, implicit vs. explicit measures, different stimuli (products, ads, etc.))
Field experiments:
More external validity because in natural environment
Less internal validity because there is no complete control over the manipulation of the IV and/or the assignment of participants
Applying true experimental designs impossible
Instead researchers need to rely on:
o Weak experimental designs
o Quasi experimental designs
Weak experimental designs
- Designs with severe threats to internal validity
- Different types of weak experimental designs
o One-group posttest-only design
o One-group pretest-posttest design
o Nonequivalent posttest-only design
One-group posttest-only design
The influence of a treatment condition is investigated on only one group of individuals
Weak control: almost all threats to internal validity apply because no control group and no comparison
Rarely used in research settings!
One-group pretest-posttest design
A pretest measurement is taken to serve as a baseline
Weak control of threats to internal validity
History: something else happened (advertising)
Maturation: participants become more positive over time (e.g., through care or belief)
Regression to the mean
Testing effect
Nonequivalent posttest-only design
A control group is added to serve as a comparison standard, but randomization is not possible
Weak because of selection bias and all other threat-selection combinations
Quasi experimental designs: design elements are added to weak designs to reduce chances of internal
validity threats
Three main types:
Nonequivalent comparison group design
Interrupted time series design (event study)
o (single) interrupted time series design
o Multiple interrupted time series design
Regression discontinuity design
Nonequivalent comparison group design: control method – combination of a nonequivalent
comparison group and pretests; major threats – selection, and other threats in
interaction with selection
(Single) interrupted time series design: control method – multiple measurements before and after;
major threat – history
Multiple interrupted time series design:
X = change in store layout
Between Oc5 and Oc6 (comparison series) there is NO treatment X.
For example, one store does change its layout and one store does not; danger: different types of
customers = selection
Control method:
Multiple measurements before and after the treatment
Addition of a nonequivalent control group
Major threats:
Other threats in interaction with selection
Compare the top 2 lines!
Regression Discontinuity Design: used to determine if the special treatment some individuals receive has
any effect.
Characteristics of the design
All individuals are pretested
Individuals who score above some cutoff score receive the treatment
All individuals are post-tested
Discontinuity in the regression line indicates a treatment effect
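The logic of a regression discontinuity design can be sketched by fitting a separate straight line on each side of the cutoff and comparing the two fitted values at the cutoff. Assumptions: the pretest-posttest relation is linear on both sides (the "relationship should be known" requirement), individuals scoring at or above the cutoff are treated, and the data are simulated with a true treatment effect of 5 points; `ols` and `discontinuity` are hypothetical helpers.

```python
import random

def ols(xs, ys):
    """Intercept and slope of a simple least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def discontinuity(pre, post, cutoff):
    """Jump in the fitted posttest line at the cutoff (treated minus untreated).

    Assumes a linear relation on both sides of the cutoff and that
    individuals scoring at or above the cutoff received the treatment.
    """
    below = [(x, y) for x, y in zip(pre, post) if x < cutoff]
    above = [(x, y) for x, y in zip(pre, post) if x >= cutoff]
    a0, b0 = ols(*zip(*below))
    a1, b1 = ols(*zip(*above))
    return (a1 + b1 * cutoff) - (a0 + b0 * cutoff)

# Hypothetical simulated data: true treatment effect of 5 points for pretest >= 50
rng = random.Random(11)
pre = list(range(100))
post = [10 + 0.5 * x + (5 if x >= 50 else 0) + rng.gauss(0, 1) for x in pre]
print(round(discontinuity(pre, post, cutoff=50), 1))  # close to 5
```

If the treatment had no effect, the two lines would meet at the cutoff and the estimated jump would be close to 0.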
Regression discontinuity design – requirements:
Assignment must be based on the cutoff score
Assignment cannot be based on a nominal variable such as gender, or drug user vs. nondrug user
The closer the cutoff score is to the mean, the more power
The experimenter should control group assignment
The form of the relationship between pretest and posttest (linear, curvilinear, etc.) should be known
Participants must be from the same population
Major threat: selection × history interaction
HC 6: Data analysis – introduction
Previous lecture: experimental designs for field experiments. Never use a weak experimental design unless there really is no alternative!
Descriptives → always look at these first!
Mood affects the number of candies people eat
Collected data in the negative mood condition
What is a good summary statistic of the real behavior in a particular condition?
How well does the summary statistic fit the real data?
Summary statistics
Mode = most common score = 3
Median = Middle score ((n+1)/2) when ranked in order of magnitude = 3
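Mean = $\bar{X} = \frac{\sum x_i}{n} = 2.6$
2.6 is the mean, but nobody eats 2.6 candies, so there is error (or: fit)!
How to calculate the error: the sum of squared errors, $SS = \sum (x_i - \bar{X})^2$
Drawback → the fit always becomes worse with more people, because SS is simply a sum!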
Variance: The sum of squares is a good measure of overall variability, but is dependent on the number of scores. We calculate the average variability by dividing by the degrees of freedom. This value is called the variance: $s^2 = \frac{SS}{N-1}$
The variance is already better: 1.3 candies², but squared candies are not really meaningful, hence the square root → SD
Standard deviation: the variance has one problem: it is measured in units squared. This isn't a very meaningful metric, so we take the square root. This is the standard deviation: $s = \sqrt{s^2} = \sqrt{1.3} \approx 1.14$ candies. This fit was computed with the mean!
Calculating the fit of the mode/median: SD = 1.22, which is higher than with the mean (1.14), so by default the mean (the first method) is used.
Important to remember
The mean is the most accurate estimate of the observed behavior in a particular condition (when reporting, give the mean and SD)
Sum of squares, variance, and SD represent the same thing
The “fit” of the mean to the data
The variability in the data
How well the mean represents the observed data
How to report this:
“Participants ate fewer candies in the negative (M = 2.6; SD = 1.14) than in the positive mood condition (M = 5.7; SD = 1.63).”
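The summary statistics and their fit can be checked with a short script. This is a sketch on hypothetical scores: the raw data of the negative mood condition are not in these notes, so the five values below were chosen only because they reproduce the reported numbers (mode = median = 3, M = 2.6, SD = 1.14, and SD = 1.22 around the mode/median).

```python
import math
import statistics

# Hypothetical candy counts in the negative mood condition (assumed, not the real data)
candies = [1, 2, 3, 3, 4]

mode = statistics.mode(candies)      # most common score
median = statistics.median(candies)  # middle score after sorting
mean = statistics.mean(candies)      # sum of scores / n

def fit(scores, centre):
    """Return SS, variance and SD of the scores around a chosen summary value."""
    ss = sum((x - centre) ** 2 for x in scores)  # sum of squared errors
    variance = ss / (len(scores) - 1)            # average per degree of freedom
    return ss, variance, math.sqrt(variance)     # SD is back in candies

print(mode, median, mean)                 # 3 3 2.6
print(fit(candies, mean))                 # SS ≈ 5.2, variance ≈ 1.3, SD ≈ 1.14
print(round(fit(candies, median)[2], 2))  # 1.22: a worse fit than the mean
```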
Assumptions:
1. Data should be measured on an interval or ratio scale
2. Data should be normally distributed
a. Check for outliers with boxplots
b. Look at histogram
c. Conduct a Kolmogorov-Smirnov test/Shapiro-Wilk test
3. Assumptions specific to the analysis
a. Between-participants: homogeneity of variances
b. Within-participants: sphericity
c. Mixed design: both homogeneity and sphericity
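Always check for outliers: the mean changes dramatically. Suppose one person has a 6 → the mean goes from 2.6 to 4.7.
In a boxplot, the asterisk (*) marks the 6 as an outlier. SPSS marks extreme values with an asterisk: values more than 3 box lengths (3 × IQR) beyond the edge of the box (a common alternative rule of thumb is more than 3 SD from the mean). Remove at most 10% of the data as outliers → more than 10% means a bad experiment; it is best to run the experiment again!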
You can also check for normality! SPSS gives 2 tests (Kolmogorov-Smirnov and Shapiro-Wilk) as an indication of normality. In this example → no normality!
What now?
1. Remove outliers
2. If the data are still not normally distributed, transform them:
a. Log transformation (log(Xi)): especially suited for long tails on the right side
b. Square root transformation (√Xi)
c. Reciprocal transformation (1/Xi)
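The effect of the transformations on a long right tail can be sketched as follows. The scores are hypothetical, and `skewness` is a simple moment-based g1 written here for illustration; it may differ slightly from the skewness value SPSS reports.

```python
import math

def skewness(xs):
    """Moment-based sample skewness g1 (0 = symmetric, > 0 = long right tail)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical positively skewed scores (one long tail on the right)
scores = [1, 2, 2, 3, 3, 4, 20]

transforms = {
    "log": math.log10,              # needs x > 0
    "square root": math.sqrt,       # needs x >= 0
    "reciprocal": lambda x: 1 / x,  # needs x != 0; reverses the order of the scores
}

print(f"raw: skew = {skewness(scores):.2f}")
for name, f in transforms.items():
    print(f"{name}: skew = {skewness([f(x) for x in scores]):.2f}")
```

After any transformation the normality checks have to be repeated on the transformed scores.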
Delete the 6: then the data are normally distributed!
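Between participants: each participant is randomly assigned to 1 condition → homogeneity of variances.
SPSS Levene's test p = .93 → good! You want it to be non-significant. If it is significant: with equal numbers of participants in every condition the problem is limited; with unequal group sizes, use the Welch test!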
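What Levene's test computes can be sketched in a few lines: a one-way ANOVA F on the absolute deviations of each score from its own group mean (the mean-centred version of the test). The groups below are hypothetical. Getting a p-value would require an F distribution, which Python's standard library lacks, so W is meant to be compared with the critical value of F(k − 1, N − k) from a table.

```python
def levene_w(*groups):
    """Levene's W (mean-centred version): one-way ANOVA F on absolute deviations."""
    # Absolute deviation of every score from its own group mean
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    k, n_total = len(z), sum(len(g) for g in z)
    grand = sum(map(sum, z)) / n_total
    means = [sum(g) / len(g) for g in z]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(z, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical groups: identical spread vs. very different spread
print(levene_w([1, 2, 3], [4, 5, 6]))   # 0.0
print(levene_w([4, 5, 6], [0, 5, 10]))  # ≈ 2.46
```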
Within participants: each participant does all 3 tests → sphericity.
Differences between conditions can only be computed for the same people! Only relevant with ≥ 3 within-participants levels! Sig. must be > .05; then it is fine!
T-test → a simple ANOVA with 1 degree of freedom for the IV (F = t²)
The larger the t-value, the greater the likelihood that the group differences are real and not due to chance
The t-value expresses how much greater the between-group mean difference is than the average within-group variability.
Independent measures t-test: single factor between participants design
Example: participants exposed to two types of ads
Ads with moderately thin models (condition = 1)
Ads with extremely thin models (condition = 2)
2 levels
DV: appearance self-esteem (7 point)
Test: compare to which extent self-esteem is different when consumers are exposed to moderately vs.
extremely thin models
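The independent measures t-test can be sketched with the pooled variance (the "equal variances assumed" case). The self-esteem scores below are hypothetical, not the lecture data (which had df = 28). SPSS's second row, for unequal variances, uses Welch's correction of the df, which is not implemented here.

```python
import math
import statistics

def independent_t(group1, group2):
    """Student's t for two independent groups, assuming equal variances (pooled s²)."""
    n1, n2 = len(group1), len(group2)
    # s² = SS / (n - 1) within each group, pooled with weights n - 1
    pooled = ((n1 - 1) * statistics.variance(group1)
              + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))  # standard error of the mean difference
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, n1 + n2 - 2                       # t-value and df

# Hypothetical 7-point appearance self-esteem scores
moderately_thin = [5, 6, 5, 4, 6]
extremely_thin = [3, 4, 3, 2, 3]
t, df = independent_t(moderately_thin, extremely_thin)
print(f"t({df}) = {t:.2f}")  # t(8) = 4.49
```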
Compute the deviation of each self-esteem score from its group mean; SS = the sum of the squared deviations.
s² = SS/(n − 1) = 0.71
Usually two-tailed: t > 6.18, so probably significant, p < .0001. See the t-table online.
Condition → grouping variable
SelfEsteem → test variable
Define groups
Cut point → used for a median split (“cut-off”)
Equal variances? (Levene's test)
Sig. = 1 → yes, so look at the first row, df = 28.
If the variances are not equal, use the second row; then, among other things, the df are adjusted.
→ Report it like this!!!
Repeated measures t-test (same participants but different conditions – within)
Participants are tested on their recall of two different types of words that have earlier been presented
to them.
Valence is the within-participants IV
Neutral words vs. positive valence words
DV is recall per valence condition
(Statistical) null hypothesis: no difference in recall across different conditions.
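A paired t-test is a one-sample t-test on the difference scores. The recall scores below are hypothetical, chosen so that the 3 difference scores have mean 5 and SS = 14 (variance 14/2 = 7), which matches the hand calculation in these notes; the resulting t lies between the two-tailed critical values 2.920 (α = .10) and 4.303 (α = .05) for df = 2.

```python
import math
import statistics

def paired_t(cond_a, cond_b):
    """Repeated measures (paired) t: a one-sample t-test on the difference scores."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1   # t-value and df

# Hypothetical recall scores of 3 participants (differences: 8, 4, 3)
positive = [10, 9, 7]
neutral = [2, 5, 4]
t, df = paired_t(positive, neutral)
print(f"t({df}) = {t:.2f}")  # t(2) = 3.27
```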
14/(n − 1) = 14/2 = 7
t-table → two-tailed, df = 2: critical value 4.303 (α = .05), so p > .05 but p < .10 → marginally significant (p < .10)
Not 2 DVs! Only one (even though there are 2 columns in SPSS)
SPSS calls it “paired” (repeated)
1 pair = 2 levels!
On average, 5 more words were recalled in the positive valence condition
Only 2 conditions, so sphericity cannot be tested; remember that you need ≥ 3 levels!
→ p = .082
HC 7: Data analysis – One way ANOVA
ANOVA= Analysis of Variance
F as large as possible
Why WP-designs are more sensitive
Advantages WP design (lecture 4):
More economic
More sensitive → now we can see why
Suppose these are attitudes: then 1 = more negative than 4
Independent measures ANOVA: error var. = total var. – IV var.
Repeated measures ANOVA: error var. = total var. – IV var. – var. due to participants
Suppose 1/3 of the variance is error variance in the independent design; of that 1/3, a repeated design may explain another 1/9 as variance due to participants.
One-way independent measures ANOVA – calculating the ANOVA table by hand!
IV – ad with strong arguments vs. weak arguments vs. no arguments (example: vitamin water)
DV – Attitude rating on a 1-7 scale
Observed data:
Note: 12 participants, randomly assigned to 3 conditions
Total sum of squares – step 1: calculate SStotal (add up, for all scores, the squared difference between the score and the grand mean): $SS_{total} = \sum (x_i - \bar{X}_{grand})^2 = 50$ (the variance is $s^2 = SS/(N-1)$)
mean_strong = (5+6+7+6)/4 = 6
mean_weak = (7+2+4+3)/4 = 4
mean_no = (3+1+2+2)/4 = 2
Grand mean = 4
Variance due to the IV – step 1: calculate SSiv: $SS_{IV} = \sum_k n_k (\bar{X}_k - \bar{X}_{grand})^2 = 4(6-4)^2 + 4(4-4)^2 + 4(2-4)^2 = 32$
Step 2: calculate df = number of conditions – 1 = 2
Step 3: calculate MSiv
MSiv = SSiv / dfiv = 32/2 = 16
Error variance – step 1: calculate SSerror: $SS_{error} = \sum_k \sum_i (x_{ik} - \bar{X}_k)^2 = 2 + 14 + 2 = 18$
Step 2: calculate df = N – k = 12 – 3 = 9
Step 3: calculate MSerror
MSerror = SSerror / dferror = 18/9 = 2
MSiv = 16, df = 2
MSerror = 2, df = 9
F = MSiv / MSerror = 16/2 = 8
In the F-table, select the column for the df of the numerator (MSiv, df = 2) and the row for the df of the denominator (MSerror, df = 9). For F(2, 9): 4.26 (α = .05) < F < 8.02 (α = .01). F = 8, so the effect is significant at .05.
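The hand calculation above can be reproduced with a small script on the observed data (strong, weak and no arguments; 4 participants each). The function name and dictionary keys are illustrative, not SPSS output labels.

```python
def one_way_anova(groups):
    """ANOVA table for a one-way independent measures design."""
    scores = [x for g in groups for x in g]
    n_total, k = len(scores), len(groups)
    grand = sum(scores) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_iv = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_iv, df_error = k - 1, n_total - k        # conditions - 1, N - k
    ms_iv, ms_error = ss_iv / df_iv, ss_error / df_error
    return {"SS_total": ss_total, "SS_iv": ss_iv, "SS_error": ss_error,
            "df_iv": df_iv, "df_error": df_error,
            "MS_iv": ms_iv, "MS_error": ms_error, "F": ms_iv / ms_error}

strong, weak, no_args = [5, 6, 7, 6], [7, 2, 4, 3], [3, 1, 2, 2]
table = one_way_anova([strong, weak, no_args])
print(table["F"])  # 8.0
```

Note that SS_total = SS_iv + SS_error (50 = 32 + 18), which is a useful check on a hand calculation.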
One-way repeated measures ANOVA – calculated by hand
Variance due to the participants: SSbetween-participants
When we have a WP design, we can also correct for individual differences (variance between participants):
$SS_{\text{between participants}} = k \sum_j (\bar{X}_j - \bar{X}_{grand})^2$
- Sum of squared differences of each participant's mean ($\bar{X}_j$) from the grand mean, multiplied by the number of conditions (k)!
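To see why removing participant variance makes the WP design more sensitive, here is a sketch of a one-way repeated measures ANOVA. As a hypothetical illustration only, the 12 attitude scores from the independent example are reread as 4 participants who each took part in all 3 conditions (one row per participant).

```python
def rm_anova(data):
    """One-way repeated measures ANOVA; data[i][j] = score of participant i in condition j."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    part_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_iv = n * sum((m - grand) ** 2 for m in cond_means)
    # Individual differences: participant means vs. grand mean, times the number of conditions
    ss_participants = k * sum((m - grand) ** 2 for m in part_means)
    ss_error = ss_total - ss_iv - ss_participants  # what is left after removing both
    df_iv, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_iv / df_iv) / (ss_error / df_error)
    return {"SS_participants": ss_participants, "SS_error": ss_error,
            "df": (df_iv, df_error), "F": f}

# Hypothetical: the 12 scores above, read as 4 participants x 3 conditions
data = [[5, 7, 3], [6, 2, 1], [7, 4, 2], [6, 3, 2]]
result = rm_anova(data)
print(round(result["F"], 2), result["df"])  # 8.47 (2, 6)
```

With the same numbers, F rises from 8 (independent analysis) to about 8.47, because part of the error variance is now attributed to participants.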
Data analysis – four steps:
1. Look at the descriptives
2. Check assumptions
a. General assumptions
i. Dependent variable should be measured on an interval/ratio scale
ii. Data should be normally distributed
b. Assumptions specific to the analysis:
i. BP IV: assumption of homogeneity of variances
3. Conduct main analysis
4. Conduct follow-up analyses
For assumptions you want p > .05
For effects you want p < .05
Both good here!
The F-value indicates that the null hypothesis can be rejected
- there are some sig. differences among the condition means
Does this mean that all three conditions are sig. different from each other? NO!
Conclusion so far:
- “Argumentation strength had a sig. effect on product attitude (Mstrong=6.0; Mweak=4.0;Mno=2.0;
F(2,9)=8.00; p<.05)”
Next step: make comparisons between all condition pairs to see which means differ sig. from each other
→ follow-up analysis!
Follow-up analyses
Main analysis results:
→ A sig. F-value tells us that the group means are different, but does not tell WHICH group means are different (if we have more than two groups)
Therefore: follow-up analyses
→ Which follow-up test is appropriate does NOT depend on how the IVs are manipulated (between/within)
→ It does depend on other factors:
Significance of main/interaction effects
Number of levels of the IVs
Whether the researcher had a priori hypotheses or not
Today: only the left side of this scheme
Conclusions so far…
Which means are significantly different from each other?
We have to conduct a comparison of marginal means to find out….
Comparing groups: why not just do a number of t-tests instead?
→ Example: 3 conditions
T-test 1: mean(condition 1) vs. mean (condition 2)
T-test 2: mean(condition 1) vs. mean (condition 3)
T-test 3: mean(condition 2) vs. mean (condition 3)
For each t-test: level of significance = .05. Probability of no type 1 error is .95 for each test. Familywise error rate = 1 – (.95)³ = .143
Inflation of type 1 error. This becomes a big problem when the number of comparisons becomes large!
We need a way to compare different groups without inflating the type 1 error rate. How? Compare
condition means but use a more conservative test (the familywise error rate should not exceed ,05).
Post-hoc comparisons (e.g., Bonferroni or Tukey)
However, if you predicted the difference between condition means in advance, you are ALLOWED to do a less conservative test
You are not HUNTING for differences
Rather, only a few comparisons are of interest, and familywise error is not likely to be a serious problem
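The familywise error rate and a Bonferroni-corrected alpha can be computed directly. This sketch assumes, like the formula above, that the m tests are independent; Bonferroni simply divides α by the number of comparisons.

```python
def familywise_error(alpha, m):
    """Probability of at least one type 1 error across m independent tests."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-comparison alpha that keeps the familywise rate at or below alpha."""
    return alpha / m

print(round(familywise_error(0.05, 3), 3))   # 0.143 (3 t-tests)
print(round(familywise_error(0.05, 10), 3))  # 0.401 (10 t-tests)
print(round(bonferroni_alpha(0.05, 3), 4))   # 0.0167 per comparison
```

The second line shows why the inflation becomes a big problem when the number of comparisons grows.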
Data analysis – 4 steps!
SPSS: Analyze → General Linear Model → Repeated Measures
Arguments – 3 levels!
Sig. in the bottom table = .240, so happy! → We are checking assumptions, so we want to retain H0: p has to be > .05
Ignore this table!
On the exam: give the main analysis results and say which follow-up test you use. Just “post hoc” is not enough; explain how you conclude this!
→ This is how to answer on the exam!
HC 8: Data analysis – N-way ANOVA
N-way independent measures ANOVA
More than one BP IV
N stands for the number of IVs
2 IVs  2-way ANOVA
Number of IVs and number of levels per IV determine the number of conditions
E.g., price (hi vs lo) and gender (m vs fm) → 2x2 = 4 conditions
E.g., mood (pos vs neg vs neutral), argument strength (hi vs lo) and number of arguments (7 vs 3) → 3x2x2 = 12 conditions
Participants are randomly assigned to the conditions
Example: 2x2 (two-way) independent measures design
Hypothesis: a delay increases
consumption enjoyment for pleasant
products, but decreases enjoyment for
unpleasant products
BP IV 1: product type (pl vs unpl)
BP IV 2: delay (yes vs. no)
DV: consumption enjoyment (15 point)
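How a balanced two-way independent measures ANOVA splits the variance into two main effects and an interaction can be sketched as follows. The enjoyment scores are hypothetical, constructed to show the predicted crossover (a delay raises enjoyment of pleasant products and lowers it for unpleasant ones); the function assumes equal cell sizes.

```python
from statistics import fmean

def two_way_anova(cells):
    """F-values for a balanced two-way independent measures ANOVA.

    cells[(a, b)] holds the scores for level a of IV 1 and level b of IV 2;
    every cell must contain the same number of participants.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # participants per cell
    grand = fmean([x for xs in cells.values() for x in xs])
    cell_means = {key: fmean(xs) for key, xs in cells.items()}
    a_means = [fmean([cell_means[(a, b)] for b in b_levels]) for a in a_levels]
    b_means = [fmean([cell_means[(a, b)] for a in a_levels]) for b in b_levels]
    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_means)
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_means)
    # Interaction = variance between cells not explained by the two main effects
    ss_ab = n * sum((m - grand) ** 2 for m in cell_means.values()) - ss_a - ss_b
    ss_error = sum((x - cell_means[key]) ** 2 for key, xs in cells.items() for x in xs)
    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    ms_error = ss_error / (len(cells) * (n - 1))
    return {"F_A": ss_a / df_a / ms_error,
            "F_B": ss_b / df_b / ms_error,
            "F_AxB": ss_ab / (df_a * df_b) / ms_error}

# Hypothetical enjoyment scores (15-point scale), 4 participants per cell
cells = {
    ("pleasant", "delay"): [12, 13, 11, 12],
    ("pleasant", "no delay"): [9, 10, 8, 9],
    ("unpleasant", "delay"): [4, 5, 3, 4],
    ("unpleasant", "no delay"): [7, 8, 6, 7],
}
print({name: round(f, 2) for name, f in two_way_anova(cells).items()})
# {'F_A': 150.0, 'F_B': 0.0, 'F_AxB': 54.0}
```

Here the main effect of delay is exactly 0 while the interaction is large: the opposite effects of delay cancel out in the marginal means, which is why the interaction needs follow-up tests.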
Again the 4 steps of data analysis!!!
Not sig., so happy!!!
This is your answer!
Main effects: follow up test? Only if effect is sig. and IV has
more than 2 levels!!!
Interaction effect: follow up test? Which comparisons do
we need to test the hypothesis?
What if we had had a different hypothesis? An example (although it doesn’t make that much sense):
Consumption enjoyment is higher for pleasant than for unpleasant products, but only after a delay.
What if we had had no hypotheses at all?
Should we do a more conservative simple effects test?
No, the simple effects test does not depend on whether you have a priori hypotheses or not (see scheme). Why?
Because the simple effects are independent (i.e.
each simple effect is related to a different part
of the data) and as a result the type 1 error
rate is not inflated!
N-way repeated measures ANOVA
More than one WP IV
N stands for the number of IVs
2 IVs  2-way ANOVA
Number of IVs and number of levels per IV determine the number of conditions
E.g., price (hi vs lo) and gender (m vs fm) 2x2 = 4 conditions
E.g., mood (pos vs neg vs neutral), argument strength (hi vs lo) and number of arguments (7 vs 3) → 3x2x2 = 12 conditions
Participants participate in ALL conditions!
Example : two-way repeated measures ANOVA 2x2
Idea: People are more likely to underestimate the caloric content of main dishes and to choose higher-calorie side dishes, drinks or desserts when fast food restaurants claim to be healthy (e.g., Subway) compared to when they do not (McDonald's)
Hypotheses: H1: if health claim is present (vs. not), people underestimate the amount of calories in the
food more. H2: this effect is larger for high-calorie dishes than for low-calorie dishes.
WP IV1: health claim (McD vs Subway); WP IV2: actual calories (330 vs 600); DV: estimated calories
Data analysis – four steps!!!
What if we had had a different hypothesis?
If the actual amount of calories increases, the estimated amount of calories increases less when health claims are present (Subway) vs. not (McD); see the figure on the next page
What if we would have had no hypotheses at all? Should we do a more conservative simple effects test?
No, simple effects test does not depend on whether you have a priori hypotheses or not (see scheme).
Why? Because the simple effects are independent (i.e. each simple effect is related to a different part of
the data) and as a result the type 1 error rate is not inflated.
Theoretically, there is no limit to the number of IVs (N)
Practically, there is a limit – results become difficult to interpret (e.g., a four-way interaction)
HC 9: Data analysis – N-way ANOVA continued!
(Error) variance due to IV 2 and (error) variance due to the interaction are new in this analysis!
Example McDonald's vs. Subway: see previous lecture; extra notes below:
seems to support our hypothesis but
we cannot say if these results are
significant. We do not know anything
about the error variances!
sphericity: looks at the variance of the
differences! So if A-B ; A-C ; B-C variances
are not significantly different, then the
assumption of sphericity is met!
Why are there no results in the Sig
column? There are only 2 levels so,
differences are not an issue! The only
difference is A-B
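The quantity that sphericity is about, the variance of the difference scores for every pair of WP conditions, can be sketched as below (Mauchly's test checks formally whether these variances differ). The scores for 4 participants in 3 conditions are hypothetical. With only 2 levels there is a single difference score and nothing to compare, which is why the Sig. column stays empty.

```python
import statistics
from itertools import combinations

def difference_variances(data, labels):
    """Variance of the difference scores for every pair of WP conditions.

    data[i][j] is participant i's score in condition j. Sphericity requires
    these variances to be (roughly) equal.
    """
    result = {}
    for j, h in combinations(range(len(labels)), 2):
        diffs = [row[j] - row[h] for row in data]
        result[f"{labels[j]}-{labels[h]}"] = statistics.variance(diffs)
    return result

# Hypothetical scores: 4 participants x 3 conditions (A, B, C)
data = [[5, 7, 3], [6, 2, 1], [7, 4, 2], [6, 3, 2]]
print(difference_variances(data, ["A", "B", "C"]))
```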
Main effect of claim
Main effect of actual calories
Interaction effect
Mean square column = type III sum of squares / df
F-value (interaction) = mean square (interaction) / mean square error (interaction)
→ answer on exam!
Test for simple effects – health claim is moderator – compare 1-2 and 3-4
Test for simple effects – actual calories is moderator – compare 1-3 and 2-4
Pick option 2: we want to know the effect of health claim → caloric perception, with actual calories as moderator. So…
COMPARE(claim) → the true IV, NOT the moderator!!
→ This supports our hypothesis because the underestimation is highly significant in the 600-calorie condition
→ Total answer on your exam!!!
What if – different hypothesis → now, health claim is the moderator!!
So actual calories → caloric perception, with health claim as moderator!
Add the true IV (actual calories) to COMPARE
1 = Subway, 2 = McDonald's.
Difference 3-4 (Subway) is not sig.
Difference 1-2 (McDonald's) is sig.
This supports our hypothesis
So the simple effects test depends on the moderator you choose!
What if we had no a priori hypothesis at all?
No more conservative test is needed here: the simple effects (1-2 and 3-4) are independent. Only with more than 2 levels are the comparisons related: if you know 1 < 2 and 2 < 3, it follows that 1 < 3, so the effects are no longer independent.
N-way Mixed ANOVA
- Combination of at least one BP IV and one WP IV
- Participants are randomly assigned to the levels of the BP variable(s) and participate in all
conditions of the WP variable(s)
- This lecture, we will stick to the 2-way mixed ANOVA
o 1WP IV and 1BP IV
- You can have multiple WP and BP variables
o E.g., a 3 way-mixed ANOVA
 2 WP IVs and 1 BP IV, or
 1 WP IV and 2 BP IVs
Variance due to interaction is new!
Example: two-way mixed ANOVA
Positive mood → more outside-the-box thinking → you see more similarities
H1: the larger the distance between the core brand and the extension, the more negative the extension
is evaluated
H2: Positive mood will enhance the evaluation of moderate extensions, but not the evaluation of near or
far extensions
2x3 mixed ANOVA
BP IV: Mood (pos vs. neg)
WP IV: Brand extension (near vs. moderate vs. far)
DV: brand extension evaluation (7-point scale)
SPSS: Analyze → General Linear Model → Repeated Measures
→ within-participants factor: extension type
Options: descriptives and homogeneity tests
Then follow the four steps of data analysis (see slide)
H1 seems to be supported
Check assumptions: because this is a mixed design, we need to check the general assumptions, homogeneity and sphericity!! (see next page)
Sig. in the top table is .965: this is good. We don't want to reject H0; we want to retain that the variances of the differences are homogeneous (sphericity).
Sig. in the bottom table is good too: these should be non-sig. (homogeneity of variances).
2 significante main effects, 1 significant
interaction effect.
See notes for the graphs: basically the same as the follow-up graphs for McD and Subway, only now with 3 levels. Look at the horizontal comparisons 1-2-3 and 4-5-6, or at the vertical comparisons 1-4, 2-5 and 3-6. Look at the hypothesis: what is the moderator? Draw it! And see it! Then you know how to compare. In this case: mood → similarities, with extension type as moderator, so option 2, vertical. H2: positive mood will enhance the evaluation of moderate extensions, but not the evaluation of near or far extensions!
LSD, because we have a priori hypotheses
COMPARE(mood) → true IV
What if… we had had different hypotheses? Then the horizontal comparisons 1-2-3 and 4-5-6!
1-2 is non-sig., the rest is sig.! So: top left, the first segment of the green line (positive) in the figure above!
Within the simple effects, each condition is significantly different!
→ exam answer
What if… we had no hypothesis at all?
Bonferroni, because we do not have a priori hypotheses
HC 10: Data Analysis – Final issues
Previous lectures on data analysis: each time descriptives – assumptions – main analysis – follow-up tests
Very important: check for outliers; data that are not normally distributed violate an assumption. Outliers have huge effects on the descriptives and make predictions less accurate!
1. Main analyses: how to calculate them by hand AND how to run them in SPSS
2. Follow-up tests: when/in which situation to use them, AND how to run them in SPSS
Comparison test: comparing effects for more conditions
Simple effects: separate effects for one condition (of the moderator)
The follow-up test is determined by the results of the ANOVA, to pinpoint where the differences are.
Test for simple effects → open the syntax and add a COMPARE statement → 2 p-values, one for each of the 2 conditions of the moderator!
Simple comparisons: come along with the simple effects test
Post hoc comparisons (Bonferroni or Tukey): e.g., add ADJ(BONFERRONI) after COMPARE in the syntax → more conservative, they take the familywise type 1 error into account; used when you find differences you did not expect.
Planned contrasts: LSD does not control for type 1 error; based on theory you expected the differences, so you can be less conservative!
Today – final issues on data analysis
MANOVA – multivariate analysis of variance – more than one DV in the analysis
ANCOVA – analysis of covariance – an extraneous, continuous variable in the analysis
Until now, univariate; now multivariate → DON'T MIX UP WITH REPEATED MEASURES
Two or more continuous DVs: multivariate ANOVA (MANOVA). Do not mix up with repeated measures, which refers to a WP variable (i.e., 1 DV measured at different levels of an IV). Do not mix up with the items of a scale (which capture the same DV) → calculate Cronbach's alpha; if it is > .7, you can simply take the average, because the items are not separate DVs but measure the same DV. In MANOVA, the DVs should be moderately correlated with each other.
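Cronbach's alpha can be sketched from the item variances and the variance of the summed scale score: α = k/(k − 1) × (1 − Σ item variances / variance of the total). The three items and five participants below are hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha; items[j][i] is participant i's answer to item j."""
    k = len(items)
    item_vars = sum(statistics.variance(item) for item in items)
    totals = [sum(answers) for answers in zip(*items)]  # scale score per participant
    return k / (k - 1) * (1 - item_vars / statistics.variance(totals))

# Hypothetical answers of 5 participants to 3 attitude items (7-point scale)
items = [[4, 5, 3, 6, 2],
         [4, 6, 3, 5, 2],
         [5, 5, 2, 6, 3]]
alpha = cronbach_alpha(items)
print(round(alpha, 2))  # 0.94
```

Because α > .7 here, the rule of thumb in these notes would allow averaging the three items into one DV score.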
Example: which is more effective, an entertaining or an informative commercial?
It depends on consumers’ state
When distracted: E>I (H1)
When not distracted E<I (H2)
Design: 2 (ad type:A: I vs B: E) x 2 (state: distracted vs. not distracted) BP design
Willingness to pay for the advertised product …..AND
Attitude towards the advertised product
2 DVs, so MANOVA
How does MANOVA work?
First, new DV that is a linear combination of the DVs
maximizing the differences among the treatment
groups (cf. discriminant analysis)
Then, ANOVA is conducted on this composite DV
But why not analyze multiple DVs with multiple ANOVAs?
Advantages of MANOVA
1. Strong protection against type 1 error: since only
one DV is tested, the researcher is protected against
inflating the type 1 error due to multiple comparisons
2. Sometimes shows differences that individual
ANOVAs do not: MANOVA is powerful; it takes account of the relationship between DVs → when you take more DVs that are correlated, you have a higher probability of finding a relationship!
3. Hard to interpret if multiple ANOVAs are significant: easier to discover truly important effects
with a MANOVA
Assumptions of MANOVA (here MANOVA is only used for between designs; for within designs, use repeated measures ANOVA)
1. Multivariate normality
a. DVs (collectively) should be normally distributed
b. For each DV:
i. Test for outliers
ii. Kolmogorov-Smirnov test/Shapiro-Wilk test (normality)
2. Homogeneity of variances
a. DVs exhibit equal levels of variance across conditions
b. For each DV: Levene's test → needs to be non-significant
3. Homogeneity of covariances!!!
a. Intercorrelations between the DVs should be homogeneous across the conditions of the IV(s)
b. Box’s M (use p<.001 as criterion)
Box's test is not significant, thus the covariances are homogeneous!!
If you have an interaction effect, use the simple effects test!
Bonferroni is not needed, because we had a priori hypotheses!
Analysis of covariance
- To test for differences between group means when we know that an extraneous variable affects
the outcome variable
- Used to control known extraneous and confounding variables
- ANCOVA is an extension of ANOVA in which main effects and interactions of IVs on DV are
assessed after removal of one or more covariates
How does ANCOVA work?
First, for each condition the DV is regressed on the covariate
o Covariate is typically a continuous (or dummy) variable
Then, the effects of the IVs (experimental factors) are computed on the variance that is left
o This way we find the effect of the IVs after removing (controlling for) the effect of the covariate
Advantages of ANCOVA
Reduces error variance
o By explaining some of the unexplained variance, the error variance in the model can be reduced
o Increases sensitivity of the experiment (more power)
→ Instead of the mean, the regression line is used as the prediction; the SD (error) becomes smaller, so there is more power and a smaller chance of a type 2 error.
Example: 2 price coupons.
Condition 1: buy two and get 50% off
Condition 2: buy one get one free
Both are the same deal, only phrased differently!
Covariate X = coupon proneness
Advantages of ANCOVA
Greater experimental control
o Adjust the means on the DV itself to what it would be if all cases scored identically on
the covariate
o Differences between participants are removed, so the remaining differences would be the real effects of the IV(s) on the DV
o By controlling known confounds, we gain greater insight into the effect of the predictor
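How ANCOVA adjusts the means can be sketched with the usual formula: adjusted mean = group mean − b_w × (group covariate mean − grand covariate mean), where b_w is the pooled within-group slope of the DV on the covariate. The coupon-proneness and purchase-intention scores are hypothetical; the sketch assumes homogeneity of regression (equal slopes in every condition).

```python
from statistics import fmean

def ancova_adjusted_means(groups):
    """Covariate-adjusted group means, using the pooled within-group slope.

    groups maps a condition name to (covariate scores, DV scores).
    Assumes homogeneity of regression (equal slopes in every condition).
    """
    sxx = sxy = 0.0
    for xs, ys in groups.values():
        mx, my = fmean(xs), fmean(ys)
        sxx += sum((x - mx) ** 2 for x in xs)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b_within = sxy / sxx  # common regression slope of DV on covariate
    grand_x = fmean([x for xs, _ in groups.values() for x in xs])
    # Shift each mean to what it would be if the group had the grand covariate mean
    return {name: fmean(ys) - b_within * (fmean(xs) - grand_x)
            for name, (xs, ys) in groups.items()}

# Hypothetical data: (coupon proneness, purchase intention) per condition
groups = {
    "50% off": ([2, 3, 4, 5], [3, 4, 5, 6]),
    "one free": ([4, 5, 6, 7], [6, 7, 8, 9]),
}
print(ancova_adjusted_means(groups))  # {'50% off': 5.5, 'one free': 6.5}
```

The raw means (4.5 vs. 7.5) differ by 3 points, but after adjusting for coupon proneness only 1 point remains: part of the raw difference came from the covariate, not from the coupon framing.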
Assumptions of ANCOVA: same as for ANOVA, plus
Homogeneity of regression: the slopes of the regression lines are the same for all cells of the design
ANCOVA: other issues
The extraneous variable can also be used as another IV
o When the effect of the manipulations depends on the covariate
o Categorize it: use it as another IV in an ANOVA
o Leave it continuous: use it in a regression
MANCOVA
o Effects of IV(s) on multiple DVs can also be corrected for covariates
o Multivariate extension of ANCOVA
o Same function as ANCOVA with respect to error reduction and the adjusting of means