Lecture summary: Experimental Research

HC 1: Introduction

Experimental research: key features
An experimental researcher:
A. Tries to provide insights into the behavior of consumers
B. By identifying causal relationships between variables
C. Using controlled experiments

A. Behavior of consumers
Experimental research studies various influences on behavior:
- Situational influences (external/contextual differences), e.g., the effect of 3 vs. 7 commercials on the attitude toward the target ad
- Chronic influences (individual differences), e.g., low-involvement vs. high-involvement consumers
- Interactions between situational and chronic influences, e.g., low vs. high involvement x number of commercials
Behavior is not only about what people do (their acts), but also about their thoughts, emotions, physiological reactions, etc.
Consumer behavior: a subdomain of marketing based on psychology.

B. Causal relationships: 3 key features
1. The cause must be related to the effect
   - Evidence that the IV and DV vary together in the predicted way
2. The cause must precede the effect
   - Evidence that the IV occurs before the DV, to solve the directionality problem
     o Aggressive behavior and watching aggressive movies
     o Self-esteem and addictive behavior
3. No plausible alternative explanation for the effect may exist other than the cause
   - Evidence that the IV is the only factor affecting the DV, to solve the third-variable problem
     o Car color and accident rate
     o Maternal smoking and later criminality

C. Experiments
The main data collection method in experimental research. The basic idea is quite simple:
1. Manipulate the IV
   a. Give treatments to different groups of participants (conditions)
2. Measure the effects on the DV
   a. Record the responses following the treatments
3. Control for confounding factors
   a. Keep everything else identical between groups

Experimental research: settings
Laboratory experiments
- True experimental designs
- Higher internal validity
- Lower external validity
- E.g., unconscious thinking study
Field experiments
- Quasi-experimental designs
- Lower internal validity
- Higher external validity
- E.g., towel reuse study

Experimental research: (dis)advantages
Advantage:
- Causal inferences can be made because the problems of directionality and third variables are ruled out.
Disadvantages:
- Criticized as an inadequate method of scientific inquiry because it views behavior as mechanical and manipulable
- Artificial, because it takes place in highly controlled environments
- Only a limited number of variables can be investigated at the same time

HC 2: Theoretical framework

Problem identification
Sources of research ideas:
- Real-life experiences (we are all customers)
- Previous research and theory
  o Conflicting findings
  o Boundary conditions
  o Finding an explanation for findings
  o Propositions of a theory that have not been tested yet
  o Applying a theory to a consumer setting
Further define your topic by identifying all relevant factors:
- DV: What do you want to explain/clarify?
- IV: How do you want to explain it?
- Setting: What is your field of interest?
This leads to your problem statement.
Good research problems should both:
1. Have real-life relevance (implications for managers or consumers)
2. Contribute to current knowledge (theoretical contribution)
ELM: under low elaboration consumers rely on heuristics; under high elaboration consumers rely on reasoning.
Consult the literature again:
- Investigate to what extent your problem has already been researched
- Look for relevant theories and prior findings to design your hypotheses
- Search in databases: ABI/Inform, Econlit, PsycInfo, Web of Science, etc.
- Only use articles that were published in TOP JOURNALS

Hypotheses
A hypothesis is not a vague statement but should:
- Be as specific as possible
- Indicate a clear causal prediction between the variables
- Be capable of being refuted or confirmed
NOT:
  o Happy moods make consumers mindless
  o Thin models are bad
A hypothesis should be based on strong support.
Two types of effects can be hypothesized:
Main effect
- The effect of one IV on the DV over all treatment levels of the other IVs (direct effect)
Interaction effect (for 2 IVs or more)
- The effect of one IV on the DV depends on the level of the other IV
Exercise
- IV A with three treatment levels: A1, A2, and A3
- IV B with two treatment levels: B1 and B2
- No interaction
Example: you expect an effect on the response to your ad, but a stronger one under low involvement than under high involvement!

Example A
Hypothesis: strong argumentation results in higher product attitudes than weak argumentation, but only when consumers are in a negative mood and not when they are in a positive mood.
Support:
- ELM (argumentation strength only affects product attitudes when the likelihood of elaboration is high)
- Mood research (happy individuals elaborate less than unhappy individuals)
[Diagram: argument strength -> consumer attitude, moderated by mood (H1: positive mood; H2: negative mood); the slide also carries the labels "low involved" and "high involved"]

Example B
Hypothesis: participants will demonstrate higher self-esteem after exposure to moderately thin models than after exposure to moderately thick models, and lower self-esteem after exposure to extremely thin models than after exposure to extremely thick models.
Support: selective accessibility model (Mussweiler 2003): standard selection, holistic similarity assessment, self-evaluation (assimilation or contrast)
[Diagram: size of models (thin vs. thick) -> self-esteem, moderated by "how extreme?" (H1: moderately; H2: extreme)]

Example C
Hypothesis: distraction will result in a higher preference for tasted food products, and this effect should be stronger for impulsive than for prudent consumers.
Support:
- Two-component model of sensory experiences (pain): an "affective component (emotional reactions)" and an "informational component (objective features)"
- Research on impulsivity (prudents think a lot about objective features and have more accessible cognitions than impulsives)

Statistical conclusion validity
Accuracy of accepting or rejecting your hypotheses
- E.g., mood affects product attitudes
Types of hypotheses
- Scientific hypothesis (or alternative hypothesis)
  o Represents the relationship among the variables examined
  o E.g., attitudes are higher when mood is positive compared to negative
- Null hypothesis
  o A statement of NO relationship among the variables
  o E.g., attitudes are NOT higher when mood is positive compared to negative
In an experiment, you test the null hypothesis
- As a researcher you expect that the null hypothesis will be rejected
- To accept the scientific hypothesis, you must collect evidence to reject the null hypothesis

Type 1 error
You think you have a significant effect of an IV on a DV, but in fact there is NO effect.
The higher the confidence level of an experiment, the lower the chance of a Type 1 error.
The confidence level is typically set at 95%, so that the chance of a Type 1 error (α) is limited to 5%.
Thus:
- If the obtained p-value of the effect is ≤ .05, reject the null hypothesis
- If the obtained p-value of the effect is > .05, do not reject the null hypothesis

Type 2 error
You think you do NOT have a significant effect of an IV on a DV, but in fact there is an effect.
The higher the sensitivity (power) of an experiment, the lower the chance of a Type 2 error.
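The decision rule for Type 1 errors and the idea of power can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the power calculation uses a normal (z-test) approximation for two equal-sized groups with a standardized effect size, which is not part of the lecture. A real analysis would normally rely on a t-test or ANOVA in a statistics package.

```python
from statistics import NormalDist

ALPHA = 0.05  # conventional Type 1 error rate (95% confidence level)


def reject_null(p_value, alpha=ALPHA):
    """Decision rule: reject the null hypothesis when p <= alpha."""
    return p_value <= alpha


def approx_power(effect_size, n_per_condition, alpha=ALPHA):
    """Approximate power of a two-sided, two-group z-test.

    effect_size is a standardized mean difference (hypothetical input);
    the normal approximation is an assumption of this sketch.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_condition / 2) ** 0.5
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    print(reject_null(0.03), reject_null(0.20))
    for n in (10, 20, 50):
        # Power rises with sample size, so the Type 2 error rate falls.
        print(n, round(approx_power(0.8, n), 2))
```

With an effect size of zero the function returns alpha itself, which matches the definition of a Type 1 error: the chance of finding an "effect" when none exists.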
Sensitivity (power) is a function of 3 things:
- Alpha level, but this is set at 5%
- Sample size (number of participants)
  o The larger the sample size, the higher the sensitivity
  o In practice: 20 participants per condition (between-participants)
- Effect size (magnitude of the relation between IV and DV), which can be increased by
  o Better operationalization of variables
  o Better control of the experimental setting

HC 3: Variable operationalization

Manipulation of IVs
Different steps to manipulate your IVs:
1. Decide on the number of IVs in your experimental design
2. Create different treatment levels of your IVs
3. Find ways to experimentally induce the various treatments

Number of IVs (1)
Theoretically there is no limit to the number that can be used, but practically there is a limit:
- Number of participants needed (the more IVs, the more participants are needed)
- Interpretation of the data analysis
Limitations: only include multiple IVs when
- The effect of one IV is expected to depend on the level of the other IVs
  o In practice, one IV takes the role of moderator
- They can be manipulated orthogonally
  o 2 variables = 3 effects
  o 3 variables = 7 effects (A; B; C; A-B; A-C; B-C; A-B-C)
Previous lecture:
- Example A (IV: argumentation strength; moderator: mood)
- Example B (IV: model size; moderator: extremity)
- Example C (IV: distraction; moderator: decision type)

Creating treatments (2)
There are three possibilities to create treatment levels:
A. Presence versus absence
B. Amount of a variable
C. Type of a variable

A. Presence vs. absence
- Idea: one group receives the treatment (experimental condition) and the other group does not (control condition)
- Aim: to tell whether a variable/treatment has an effect

B. Amount of a variable
- Idea: administer different amounts of the variable to each of several groups
- Aim: not only to tell whether a variable has an effect, but also to examine what influence varying amounts of the IV have
A combination of presence vs. absence and amount of a variable is sometimes possible.

C. Type of a variable
- Idea: vary the type of variable under investigation
- Aim: to tell whether different types of a variable cause variation in the DV
- Example: immediate reward vs. future reward

Checking orthogonality
You can check orthogonality by checking whether all conditions are possible. Example: a 2 x 2 design with ad (yes vs. no) and ad type (typical vs. creative), with purchase as the DV.

             Typical   Creative
  Ad: YES      OK         OK
  Ad: NO       X          X

This design is not orthogonal: in the no-ad condition you cannot distinguish between creative and typical. So ad type is not really a moderator; instead there is one IV "ad" with 3 levels: typical / creative / none.

Inducing treatments (3)
There are several possibilities to induce the various treatments:
A. Instructional manipulation
B. Event manipulation
C. Individual difference manipulation

A. Instructional manipulation
Variation in the IV is caused by differences in the instructions provided by the experimenter.
Example of an instructional mood induction:
- Writing a report of a negative or positive personal event
It is important to be careful with manipulations:
- You have to make sure that everyone interprets the instructions in the same way
- Instructions should be clear and unambiguous

B. Event manipulation
Changes in the physical and social environment/context, or changes in the stimuli participants are exposed to.
- Examples: contextual mood inductions

C. Individual difference manipulation
Also known as personality variables; a measurement of a variable on which individuals may differ.
- E.g., need for cognition, self-monitoring, self-esteem, anxiety, impulsivity
Example of an individual difference mood induction:
- Mood assessment scales (e.g., the Positive Affect Negative Affect Schedule (PANAS) scale)
The way you treat the individual difference variable in your analysis matters:
- Use it as a continuous variable: regression analysis (this is used most often and is also the best option)
- Split the variable into high vs. low based on some preset cut-off value or the median: ANOVA
Personality variables are only measured instead of manipulated; usually a median split is used.
Main problem with this type of manipulation:
- The IV is measured instead of experimentally manipulated
- Thus, another variable correlated with the IV might be the real cause driving the effects
  o E.g., need for cognition, verbal intelligence, and life satisfaction
  o E.g., mood, self-esteem, and motivation
Therefore, studies applying an individual difference manipulation are often followed by studies directly manipulating the IV.
- E.g., need for cognition followed by an instructional manipulation (think hard vs. don't think too much)

Inducing treatments: finding reliable manipulations
The ease with which the IV is translated into operational terms varies greatly depending on the abstractness of the construct.
  o Concrete: exposure time, model size, number of arguments
  o Abstract: frustration, processing mindsets, product involvement
Three techniques to increase the chances of successful IV manipulations:
1. Look for manipulations in the literature: look for manipulations that consistently and successfully produced the desired treatments, but adapt them to your own population. Example: the Velten mood induction procedure (1968).
2. Add a manipulation check: test whether the experimental manipulation induced the expected treatments. Example: induce a mood in participants AND measure mood at the very end of the experiment.
3. Pretest your treatments extensively: this helps to design valid operationalizations of the IVs. Test in advance whether the experimental manipulation will induce the expected treatment (useful when a manipulation check would reveal the aim of the study).
Example: ad model study. First pretest: participants rated 23 ad models in terms of size and attractiveness, which led to the selection of 4 extremely and 4 moderately thin/heavy models. The models differed in terms of size but not in terms of attractiveness.
Second pretest: confirmed that moderate models were perceived as more similar than extreme models.

Combinations of manipulations are possible
Manipulations of the same IV
- E.g., two mood manipulations
  o Reading sad vs. happy stories
  o Listening to sad vs. happy music
- Single-factor design containing 2 conditions
Manipulations of different IVs
- E.g., mood and argumentation strength
  o Listening to sad vs. happy music
  o Exposure to strong vs. weak arguments
- Two-factor design containing 4 conditions: 2 (mood: positive, negative) x 2 (argument strength: strong, weak)
Order of manipulating multiple IVs: what makes most sense? Option 1!

Measurement of DV
Different steps to measure the DV:
1. Decide on the number of DVs
2. Decide on the specific measure of the various DVs

1. Number of DVs
More than one DV is possible when the IVs are expected to affect multiple constructs (this gives a higher chance of finding an effect).
How to analyze?
- When DVs are moderately correlated: MANOVA
  o Protection against Type 1 errors
  o More powerful
  o Easier to analyze
- When DVs capture unrelated constructs: multiple ANOVAs
  o Sometimes one measurement is a mediating (underlying) variable: mediation analyses

2. Measurement of DVs
Behavior can be continuous (interval or ratio scale):
- Time spent investigating products, decision time
- Number of word puzzles solved
- 7-point attitude / purchase intention scales
- Appearance self-esteem scale (Heatherton and Polivy 1991)
Behavior can be discrete (nominal scale): choice
- Product choice
- E.g., choice between a tasted food product and a well-known alternative
Select a continuous DV when possible, because it is (a) more sensitive and (b) easier to analyze!
Example A
- Approval of an increase in student fees from 1 (strongly disapprove) to 9 (strongly approve) (continuous DV)
- Indication of the fee they would consider appropriate (continuous DV)
- Number of thoughts that came to mind within 3 minutes (continuous DV)
Example B
Example C
- Choice between Lindt (sampled product) and Godiva (non-sampled product) (discrete DV, 0-1 variable)

Finding reliable DV measurements
1. Get your participants committed
   a. Make the experiment interesting, attractive, fun
   b. Give rewards to motivate them, but this may reduce external validity (motivation becomes higher than in real life)
2. Control of participant interpretation
   a. Retrospective verbal report
      i. Ask each individual after the experiment what they think the purpose of the experiment was
   b. Concurrent verbal reports
      i. Ask them to report their thoughts during the experiment
3. Disguise the aim of the study as much as possible (the most important one!)
   a. Disguise the relationship between IV and DV
      i. Take care of the cover story of the study (deception)
         1. Ad models: magazine attractiveness, target audience
      ii. Use different (ostensibly unrelated) phases (with different experimenters)
         1. Next: a different study on students' self-esteem
   b. Disguise the DV measurement
      i. Outside the lab setting (e.g., reward choice)
      ii. In an unobtrusive way (e.g., taped, timed behavior)
      iii. Use filler items (beware of their influence)

Construct validity
Accuracy of the operationalizations
- How well do they represent the intended constructs?
Main threats to construct validity:
1. Reactivity to the experimental situation
2. Experimenter effect

1. Reactivity
   a. Motives of participants can influence their perception of the experiment and their responses
   b. Two main types of reactivity
      i. Demand characteristics
         1. Participants try to guess the purpose of the study and answer accordingly
      ii. Positive self-presentation
         1. Participants try to appear as positive as possible
   c. Techniques to avoid reactivity
      i. Disguise the purpose of the study as much as possible = (partial) blind experiment
      ii. Control of participant interpretation
   Example: participants give the desired answers, e.g., in a study for Unilever they respond more positively ("we love Unilever"). Double blind: the lab assistant does not know the purpose either.
2. Experimenter effect
   a. The experimenter has a motive to support the hypothesis (wants to find evidence for it), which can unintentionally lead to recording errors
   b. Techniques to reduce the experimenter effect
      i. Control of recording errors
         1. Use multiple data recorders or have participants respond on a computer
      ii. Control of attribute errors
         1. Preferably use the same experimenter in all treatment conditions
      iii. Control of experimenter expectancies
         1. (Partial) blind technique: the experimenter is not aware of the condition the participant is assigned to

HC 4: Control

Internal validity: the accuracy of the inference that the IV caused the effect observed in the DV, i.e., how certain you can be that you found a causal relationship.
Example:
- What is the effect of argumentation strength on product attitudes under negative mood?
  o Sad participants are exposed to ads with either strong or weak arguments
  o Strong > weak: more favorable attitude
  o However, participants in the strong arguments condition are more familiar with the product than participants in the weak arguments condition
  o Familiarity is an extraneous variable. We did not control for it, so the internal validity of the experiment is low!

Threats to internal validity
History: any event occurring after the experimental treatment (IV) is introduced that could produce the observed effect. Example: test market of a new ad campaign; after half a year the change in sales is measured. In the meantime, however, a competitor went bankrupt (an alternative explanation for the increase in sales).
Maturation: changes in biological and psychological conditions that occur with the passage of time. Example: product attitude is measured under low distraction.
A filler task of 30 minutes follows. Product attitude is then measured under high distraction. Result: distraction leads to lower product evaluations. Alternative explanation: participants got bored/tired/stressed, and that is why they gave more negative reactions.
Instrumentation: changes in the assessment of the DV. Example: effect of package color on customer approach in a supermarket. An observer measures customer approach: 2 hours with package A, then 2 hours with package B. Package B results in less approach than package A. Alternative explanation: after 2 hours the observer gets more tired and misses approaches.
Testing: a prior measurement of the DV affects the results of a subsequent measurement (multiple measurements). Example: measurement of product attitude at time 0, exposure to the product ad at time 1, measurement of product attitude at time 2. Product attitude increases from time 0 to time 2. Alternative explanations: a mere exposure effect makes people react more positively; also social desirability, because people start to realize they are being tested!
Regression artifact: the tendency of extreme scores to become less extreme on a second assessment.
Example 1: testing the effect of noise on product attitude. Select participants with a high product attitude and expose them to noise during product evaluation. A decrease in product attitude is observed. So noise reduces product attitude? No: extremely high scores tend to come down on a second measurement anyway.
Example 2: last year accident rates were particularly high. As a consequence, a new traffic policy was introduced. The subsequent year accident rates dropped. Conclusion: the traffic policy was effective. Alternative explanation: you do not know whether the drop is due to the new traffic policy!
Attrition (experimental mortality): some participants do not show up or do not complete the test. Example: effect of product experiences on attitudes toward an unknown brand. Week 1: exposure to product information and measurement of product attitude. Week 2: measurement of product attitude.
An increase in product attitude is observed. However, only 60% returned for week 2: participants who did not like the product did not show up anymore, so overall attitudes are more positive!
Selection: unequal distribution of participant-related characteristics over conditions. Example: does following the lectures result in better course performance? Compare the grades of students who followed the lectures with those who did not. Students who followed the lectures obtained higher grades; however, maybe these students were more motivated. Thus, motivation may be the explaining factor for the higher grades: not the quality of the lectures but the intrinsic motivation of the participants!

Additive and interactive effects
Threats to internal validity can operate simultaneously, typically in combination with selection.
Example: test of a new store layout. Store A gets the new layout whereas store B keeps the old layout. A competitor located close to store A starts an ad campaign. Result: the new store layout did not increase sales.
Selection-history effect: one group of participants differs from the other group, and the ad campaign of the competitor has a stronger influence on one group than on the other.

Controlling for confounds
Two ways to control for potential confounding factors:
A. True experimental design (HC 4)
   a. Control/comparison condition
   b. Assignment of participants to conditions
B. Statistical control (HC 10)
   a. Add potential confounding factors as covariates in your analysis (ANCOVA)
   b. See lecture 10 on data analysis: final issues

Control conditions
- A control condition is a condition that does not get the treatment, or gets a standard value
  o Serves as a source of comparison for the experimental group
  o Controls for rival hypotheses (by controlling for extraneous variables)
- Examples
  o The effect of alcohol on risk taking
  o Placebo effect
  o Hawthorne effect
Placebo effect: patients who suffer pain. Experimental group: medicine. Control group: no medicine. Placebo control group: placebo. The real effect is assessed at the post-test.
Hawthorne effect: to increase productivity, lighting was increased for randomly chosen employees in one corner, and productivity went up. Lighting was decreased in another corner, and productivity went up as well. Conclusion: smaller groups make employees feel more special, which increases productivity. The study should have put one small group in a corner with normal lighting AND one small group in a corner with increased lighting.

Assigning participants
a) Between-participants design (each participant is measured only once!)
   a. Each participant is only exposed to one treatment
   b. Independent measures design
b) Within-participants design (repeated measures)
   a. Each participant is exposed to all treatments
   b. Repeated measures design
c) Mixed design
   a. Combination of between- and within-participants design
   b. Hybrid design

a) Between-participants design: example (Sela, Berger, and Liu 2009)
Hypothesis: when assortment size is larger, consumers choose products that are easier to justify.
Experiment 1a: 2 (assortment size: smaller vs. larger) between-participants design.
Choice for reduced-fat ice cream: 20% in the smaller assortment condition; 37% in the larger assortment condition.
Experiment 4: 2 (assortment size: smaller vs. larger) x 2 (licensing: low vs. high) between-participants design.
Choice for the work (vs. fun) laptop:

  Licensing condition   Smaller assortment   Larger assortment
  Low                         30%                  52%
  High                        63%                  39%

Who chooses the work laptop? Only the low-licensing row shows the same effect as for the ice cream.
Low licensing: imagine 3 hours of hedonic activity.
High licensing: imagine 3 hours of community service. Then participants choose between work and fun laptops. Conclusion: justification is driving the choice for the laptop!
Major threat to internal validity: selection bias!
Control techniques
- Randomization
  o Randomly assigning participants across conditions (the sample itself need not be randomly selected, but it is randomly distributed over the different groups)
  o Should not be confused with random selection
- Matching
  o Matching participants on an extraneous variable
  o Results in more sensitivity (e.g., only men!)

Matching: when to use it?
1. N is small, so randomization is risky and might yield groups that are unequal on influential extraneous variables
2. The matching variable is expected to be correlated with the DV and so to exert an effect on it (confound)
3. There is a way to measure participants on the matching variable in advance

Matching by holding variables constant
All individuals in all conditions have the same degree or type of the extraneous variable.
Examples: include only males, impulsive consumers, introverts, or price-sensitive consumers in your design.
Disadvantages: restricts the population size and restricts generalization to the type of participants in the study.

Matching by equating participants
Participants in the various conditions are explicitly equated on the extraneous variable.
How can we equate participants?
- Precision control
  o Each participant is matched with other participants of the same age, gender, etc., and randomly assigned to conditions
  o Excellent for increasing sensitivity
  o But: impractical, and many exclusions
- Frequency distribution control
  o Equate the overall distribution of the selected variables
  o But: combinations of variables may be mismatched (not perfect, but better than not matching the participants)

Matching by building the extraneous variable into the model
Treat the extraneous variable as another IV.
Example
- Effect of argumentation strength on attitude toward the brand
- This effect may depend on brand familiarity
- Brand familiarity is measured and built into the model as an IV
Should be used only when
- The effect of the extraneous variable is interesting
- The effect of the IV on the DV will not be the same at all levels of the extraneous variable (i.e., the extraneous variable functions as a moderator)

Matching by yoked control
The temporal sequence of events in an experiment is kept constant across conditions.
Examples
- Study of the relationship between stress and stomach ulcers
- Effect of controlling product information acquisition on product attitude
Should be used when a temporal relationship between event and response is expected.

Pretesting
Also measuring the DV before the treatment.
Given that you randomize or match participants:
- Both groups should be similar except for the manipulation of the IV, so the pretests should be similar as well
- Thus, one can directly observe a change in behavior as a result of the treatment/manipulation of the IV by looking at the posttests
In addition, pretesting
- Costs time and money
- May lead to testing effects
Why can it still be interesting to pretest? Advantages of pretesting:
- To ensure initial comparability
  o What if randomization or matching was not successful?
- To test for a ceiling or floor effect
  o What if individuals have an extremely high or low initial score on the DV?
- To test for initial position
  o What if participants vary strongly with respect to their initial DV scores?
- To obtain evidence of change
  o The treatment condition should show a change from pretest to posttest due to the treatment; the control condition should not

Within-participants design: example
Hypothesis: it is easier to recall highly meaningful brands than less meaningful brands.
Experiment (option 1): 2 (meaningfulness: low vs. high) within-participants design.
Major threats: history, instrumentation, maturation, testing effects!
Control technique:
- Counterbalancing
  o Techniques to control order effects
    - Intraparticipant counterbalancing
    - Intragroup counterbalancing

Intraparticipant counterbalancing
Control for order effects by changing the order of treatments for each individual participant.
(Example values from the slide: B 0, A 3, A 5, B 6.)

Intragroup counterbalancing
Control for order effects by changing the order of treatments for different groups of participants.
Instead of randomizing the order within participants, we do it between participants.
This has the main advantage that participants only have to take each treatment once.
Two main methods of intragroup counterbalancing:
1. Complete counterbalancing
2. Incomplete counterbalancing

Mixed design
So, which design should you use?
- Use within-participants manipulations because they offer
  o More economy
  o Higher sensitivity
- But in practice between-participants manipulations are often used because
  o Conditions are not reversible
  o Simplicity
    - Procedure (no counterbalancing)
    - Statistical analyses (fewer assumptions)

HC 5: External validity and quasi-experimental designs

Previous lecture: counterbalancing with 5 treatments (orders and scores as shown on the slide)

  A B C D E   0
  B C D E A   2
  E A B C D   3
  C D E A B   3
  D E A B C   2   (total = 10)

Slide annotations: learning; learning decreases; no more learning; tiredness/boredom. B follows A 2x, C follows A 0x, D follows A 2x, E follows A 0x.
Not good: if A is difficult, this has more negative consequences for the recall of B and D, so also use the reversed orders.
Reversed orders from the slide: D C E B A; E D A C B; A E B D C; B A C E D; C B D A E (annotations: E 2x, C 2x). 20 participants per condition.

External validity: the extent to which experimental results can be generalized across people, settings, treatment variations, outcomes, and times.
Types of external validity:
- Population validity
- Ecological validity
- Temporal validity
- Treatment variation validity
- Outcome validity
Population validity: generalizability across populations.
  Threats: use of a convenience sample (e.g., students), matching (e.g., only females)
Ecological validity: generalizability across settings or environments.
  Threats: use of scenarios, simulated shopping shelves, exposure to ads, etc.
Temporal validity: generalizability across time.
  Threats:
  - Time between IV and DV: ad exposure immediately followed by product evaluations, product information recall task
  - Different points in time: morning vs. evening, Monday vs. Friday, winter vs. summer
Treatment variation validity: generalizability across manipulations (e.g., can the results obtained with one mood induction also be obtained with a similar mood induction?)
Outcome validity: generalizability across different but related DVs (e.g., product evaluations and product choices, implicit vs. explicit measures, different stimuli such as products, ads, etc.)
Field experiments: More external validity because in natural environment Les internal validity because no complete control over manipulation of IV and/or assignment of participants Applying true experimental designs impossible Instead researchers need to rely on: o Weak experimental designs o Quasi experimental designs Weak experimental designs Designs with severe threats to internal validity Different types of weak experimental designs o One-group posttest-only design o One-group pretest-posttest design o Nonequivalent posttest-only design One-group posttest-only design The influence of a treatment condition is investigated on only one group of individuals Weak control: almost all threats to internal validity apply because no control group and no comparison Rarely used in research settings! One-group pretest-posttest design a pretest measurement is taking to serve as baseline Weak control of threats to internal validity History: something else happened (advertising) Maturation: become positive door caring voor belief Regression to the mean Testing effect Nonequivalent posttest-only design a control group is added to serve as comparison standard but randomization is not possible Weak because of selection bias and all other threat-selection combinations Quasi experimental designs: design elements are added to weak designs to reduce chances of internal validity threats three main types: Nonequivalent comparison group design Interrupted time series design (event study) o (single) interrupted time series design o Multiple interrupted time series design Regression discontinuity design Control method – combination of nonequivalent comparison group and pretests Major threats – selection, and other threats in interaction with selection Control method – multiple measurements before and after treatment Major threat – History X= change in store lay-out Tussen Oc5 en Oc6 zit GEEN X treatment. Bijvoorbeeld. 
One store does change its layout and one store does not; danger: different types of customers = selection.
Control method:
- Multiple measurements before and after the treatment
- Addition of a nonequivalent control group
Major threats:
- Selection
- Other threats in interaction with selection
Compare the two top lines!
Regression discontinuity design: used to determine whether the special treatment some individuals receive has any effect.
Characteristics of the design
- All individuals are pretested
- Individuals who score above some cutoff score receive the treatment
- All individuals are post-tested
- A discontinuity in the regression line indicates a treatment effect
Regression discontinuity design – requirements:
- Assignment must be based on the cutoff score
- Assignment cannot be based on a nominal variable such as gender, or drug user vs. non-drug user
- The closer the cutoff score is to the mean, the more power
- The experimenter should control group assignment
- The relationship (linear, curvilinear, etc.) should be known
- Participants must be from the same population
Threats:
- Selection-history effect
- Attrition
HC 6: Data analysis – introduction
Previous lecture: experimental designs for field experiments. Never use a weak experimental design unless there really is no other option!
Descriptives: always look at these first!
Example
- Mood affects the number of candies people eat
- Collected data in the negative mood condition: 3, 2, 4, 3, 1
What is a good summary statistic of the real behavior in a particular condition?
How well does the summary statistic fit the real data?
Summary statistics (ranked scores: 1 2 3 3 4)
Mode = most common score = 3
Median = middle score (position (n+1)/2) when ranked in order of magnitude = 3
Mean = 2.6; nobody eats 2.6 candies, so there is error (or fit)!
How to calculate the error: take the deviation of each score from the mean, square it, and add the squares up (sum of squares).
Drawback: the fit always gets worse with more participants, because it is a simple sum!
Variance: The sum of squares is a good measure of overall variability, but is dependent on the number of scores.
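As a rough check of the candy example, the sketch below recomputes the summary statistics and the sum of squares in plain Python. The helper name `sum_of_squares` and the "doubled" data set are illustrative assumptions, not part of the lecture; the doubled data only serves to show that the sum of squares grows with the number of scores.

```python
import statistics

candies = [3, 2, 4, 3, 1]  # negative mood condition, as in the notes


def sum_of_squares(scores, centre):
    """Sum of squared deviations of the scores from a summary statistic."""
    return sum((x - centre) ** 2 for x in scores)


mode = statistics.mode(candies)
median = statistics.median(candies)
mean = statistics.mean(candies)

ss_mean = sum_of_squares(candies, mean)      # error around the mean
ss_median = sum_of_squares(candies, median)  # error around the mode/median

# Assumption for illustration: the same scores from twice as many participants
doubled = candies * 2
ss_doubled = sum_of_squares(doubled, statistics.mean(doubled))

print(mode, median, mean)
print(round(ss_mean, 2), ss_median, round(ss_doubled, 2))
```

The error around the mean (5.2) is smaller than around the mode/median (6), and doubling the number of scores doubles the sum of squares even though the spread of the data is unchanged.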
We calculate the average variability by dividing by the degrees of freedom. This value is called the variance (s²): s² = SS / (n − 1)
The variance is already better: 1.3 candies² (not a real unit, hence the square root: SD)
Standard deviation: the variance has one problem: it is measured in units squared. This isn’t a very meaningful metric, so we take the square root. This is the standard deviation (s): s = √(s²) = 1.14, in number of candies. This was with the mean!
Calculating the fit of the mode/median: 1.22. The error for the mode/median is higher, so by default the mean is used. So the first method!
Important to remember
- The mean is the most accurate estimation of observed behavior in a particular condition (when reporting, give the mean and SD)
- Sum of squares, variance, and SD represent the same thing:
o The “fit” of the mean to the data
o The variability in the data
o How well the mean represents the observed data
o Error
How to report this: “Participants ate fewer candies in the negative (M = 2.6; SD = 1.14) than in the positive mood condition (M = 5.7; SD = 1.63)”
Assumptions
1. Data should be measured on an interval or ratio scale
2. Data should be normally distributed
a. Check for outliers with boxplots
b. Look at the histogram
c. Conduct the Kolmogorov-Smirnov test / Shapiro-Wilk test
3. Assumptions specific to the analysis
a. Between-participants: homogeneity of variances
b. Within-participants: sphericity
c. Mixed design: both homogeneity and sphericity
Always check for outliers: the mean changes dramatically. Suppose one participant has a 6: the mean changes from 2.6 to 4.7.
You can make a boxplot: *6 is an outlier (more than 3 SDs away from the mean), so SPSS flags it as an outlier.
Remove at most 10% outliers; more than 10% points to a bad experiment, and it is best to redo the experiment!
You can also check for normality! SPSS gives 2 tests as an indication of normality. In this example: no normality!
What now?
1. Remove outliers
2.
If still not normally distributed, transform the data: log transformation (log(Xi)), square root transformation (√Xi), or reciprocal transformation (1/Xi). The log transformation is especially suited for large tails on the right-hand side.
Delete 6: then there is normality here!
Homogeneity of variances (between-participants): participants are randomly assigned to 1 condition. Is the difference in variances significant? SPSS Levene’s test: p = 0.93. Good! You do not want this to be significant. If it is significant and all conditions have the same number of participants, the problem is limited. If the conditions are not equally large, do a Welch test!
Sphericity (within-participants): each participant does all 3 tests. The mean difference can only be computed for the same people! Only relevant with ≥ 3 within-participants levels! Sig. must be > 0.05. Then it is good!
IVs? A t-test is a simple ANOVA with 1 degree of freedom.
T-tests
- The larger the t-value, the greater the chance that the group differences are real and not due to chance
- The t-value expresses how much larger the between-group mean difference is than the average within-group variability
Variance also plays a big role! High variability means a huge overlap between the groups. Low variability gives the best chance of finding a difference: almost no overlap! (The mean is the same everywhere!)
Independent measures t-test: single factor between-participants design
Example
Participants are exposed to two types of ads
- Ads with moderately thin models (condition = 1)
- Ads with extremely thin models (condition = 2)
IV with 2 levels
DV: appearance self-esteem (7-point scale)
Test: compare to what extent self-esteem differs when consumers are exposed to moderately vs. extremely thin models
By hand: take the absolute difference between each self-esteem score and the mean, square and add up to get SS; variance s² = SS / (n − 1) = 0.71. Usually 2-tailed; t > 6.18, so probably significant, p < 0.0001. See the t-table online for this.
SPSS: Condition = grouping variable, SelfEsteem = test variable, then Define groups. Cut point is for a median split (“cut off” point)!
Equal variances? Sig. = 1, so yes: look at the first row, df = 28. Report it like this!!! If the variances are not equal, use the second row, in which (among other things) the df are adjusted.
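The independent measures t-test described above can be sketched by hand in plain Python. This is a minimal sketch that assumes equal variances (i.e., a non-significant Levene's test); the scores are hypothetical and are not the lecture data, and the function name `independent_t` is an illustrative assumption.

```python
import math


def independent_t(group1, group2):
    """Pooled-variance t statistic and df for two independent groups."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((x - m1) ** 2 for x in group1)  # sum of squares, group 1
    ss2 = sum((x - m2) ** 2 for x in group2)  # sum of squares, group 2
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df             # average within-group variance
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Hypothetical 7-point appearance self-esteem scores (not the lecture data)
moderately_thin = [5, 6, 5, 4, 5]
extremely_thin = [3, 4, 3, 2, 3]

t, df = independent_t(moderately_thin, extremely_thin)
T_CRIT_DF8 = 2.306  # two-tailed critical t for df = 8, alpha = .05 (t-table)
print(f"t({df}) = {t:.2f}, significant: {abs(t) > T_CRIT_DF8}")
```

With these made-up scores the between-group difference (2 points) is large relative to the within-group variability, so t exceeds the table value. If the variances were unequal, the Welch correction mentioned in the notes would be needed instead of the pooled variance.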
Repeated measures t-test (same participants but different conditions – within)
Participants are tested on their recall of two different types of words that have earlier been presented to them.
Valence is the within-participants IV
- Neutral words vs. positive valence words
DV is recall per valence condition
(Statistical) null hypothesis: no difference in recall across the different conditions.
By hand: variance of the differences = SS / (n − 1) = 14 / 2 = 7. t-table, 2 tails, row df = 2: critical t = 4.303. p-value < 0.10, so the difference is marginally significant (p < 0.1).
Not 2 DVs, just 1 (even though you have 2 columns): SPSS calls this paired or repeated! 1 pair, 2 levels!
On average, 5 more words are recalled in the positive valence condition.
Only 2 conditions, so you cannot test sphericity; remember you need ≥ 3 levels for that! p = 0.082
HC 7: Data analysis – One way ANOVA
ANOVA = Analysis of Variance
F should be as large as possible
Why WP designs are more sensitive
Advantages of a WP design (lecture 4):
- More economic
- More sensitive: now we can see why
Suppose these are attitudes; then 1 is more negative than 4.
Independent measures ANOVA: error variance = total variance − IV variance
Repeated measures ANOVA: error variance = total variance − IV variance − variance due to participants
Suppose 1/3 of the variance is error variance in the independent design; of that 1/3 you can perhaps explain another 1/9 with the repeated design, due to the variance between participants.
One-way independent measures ANOVA – calculating the ANOVA table by hand!
IV – ad with strong arguments vs. weak arguments vs. no arguments (vitamin water example)
DV – Attitude rating on a 1-7 scale
Observed data (strong: 5, 6, 7, 6; weak: 7, 2, 4, 3; no arguments: 3, 1, 2, 2)
Note: 12 participants, randomly assigned to 3 conditions
Total sum of squares
- Step 1: calculate SStotal: add up, for all scores, the squared difference between the score and the grand mean (variance s² = SS / (n − 1))
mean_strong = (5+6+7+6)/4 = 6
mean_weak = (7+2+4+3)/4 = 4
mean_no = (3+1+2+2)/4 = 2
Grand mean = 4
Variance due to the IV
- Step 1: calculate SSiv
- Step 2: calculate dfiv = number of conditions − 1 = k − 1
- Step 3: calculate MSiv: MSiv = SSiv / dfiv
See the formula on the previous page.
Error variance
- Step 1: calculate SSerror
- Step 2: calculate dferror = number of participants − number of conditions = N − k
- Step 3: calculate MSerror: MSerror = SSerror / dferror
F-statistic: F = MSiv / MSerror
MSiv = 16, df = 2
MSerror = 2, df = 9
F-table: look at the greatest MS and select the matching df for the column, then take the row of the other df value. So, 2 and 9: the critical values are 4.26 (.05) and 8.02 (.01). 4.26 < F < 8.02 with F = 8, so the effect is significant at the 0.05 level!
One-way repeated measures ANOVA – calculated by hand
Variance due to the participants: SSbetween-participants
When we have a WP design, we can also correct for individual differences (variance between participants)!
SSbetween-participants
- Sum of squared differences of each participant’s mean from the grand mean, multiplied by the number of conditions!
Data analysis – four steps:
1. Look at descriptives
2. Check assumptions
o General assumptions
a. Dependent variable should be measured on an interval/ratio scale
b. Data should be normally distributed
o Assumptions specific to the analysis:
a. BP IV: assumption of homogeneity of variances
3. Conduct main analysis
4. Conduct follow-up analyses
For assumptions you want p > 0.05; for effects you want p < 0.05. Both good!
Results
The F-value indicates that the null hypothesis can be rejected
- There are some sig. differences among the condition means
Does this mean that all three conditions are sig. different from each other? NO!
Conclusion so far:
- “Argumentation strength had a sig.
effect on product attitude (Mstrong = 6.0; Mweak = 4.0; Mno = 2.0; F(2,9) = 8.00; p < .05)”
Next step: make comparisons between all condition pairs to see which means differ sig. from each other: follow-up analysis!
Follow-up analyses
Main analysis results: a sig. F-value tells us that the group means are different, but does not tell WHICH group means are different (if we have more than two groups)
Therefore: follow-up analyses
Which follow-up test is appropriate does NOT depend on how the IVs are manipulated (between/within); it does depend on other factors:
- Significance of main/interaction effects
- Number of levels of the IVs
- Whether the researcher had a priori hypotheses or not
Today only the left side of this flowchart!
Conclusions so far…
Which means are significantly different from each other? We have to conduct a comparison of marginal means to find out…
Comparing groups: why not just do a number of t-tests instead?
Example: 3 conditions
- T-test 1: mean (condition 1) vs. mean (condition 2)
- T-test 2: mean (condition 1) vs. mean (condition 3)
- T-test 3: mean (condition 2) vs. mean (condition 3)
For each t-test: level of significance = 0.05. Probability of no type 1 error is .95 for each test.
Familywise error rate = 1 − (.95)^3 = .143
Inflation of type 1 error. This becomes a big problem when the number of comparisons becomes large!
We need a way to compare different groups without inflating the type 1 error rate. How?
Compare condition means but use a more conservative test (the familywise error rate should not exceed .05).
- Post-hoc comparisons (e.g., Bonferroni or Tukey)
However, if you predicted the difference between condition means in advance, you are ALLOWED to do a less conservative test
- You are not HUNTING for differences
- Rather, only a few comparisons are of interest, and familywise error is not likely to be a serious concern: PLANNED CONTRASTS
Data analysis – 4 steps!
Analyze – General Linear Model – Repeated Measures!
Arguments – 3 levels!
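The hand calculation of the one-way independent measures ANOVA (reported above as F(2,9) = 8.00) and the familywise error rate for three pairwise t-tests can be reproduced with a short plain-Python sketch. The scores are the ones used in the mean calculations of the notes; the variable names and the dictionary layout are illustrative assumptions.

```python
# Attitude ratings per condition, as used in the hand calculation
data = {
    "strong": [5, 6, 7, 6],
    "weak": [7, 2, 4, 3],
    "none": [3, 1, 2, 2],
}

scores = [x for group in data.values() for x in group]
n_total, k = len(scores), len(data)
grand_mean = sum(scores) / n_total
means = {name: sum(g) / len(g) for name, g in data.items()}

# Partition the total variability into IV and error components
ss_total = sum((x - grand_mean) ** 2 for x in scores)
ss_iv = sum(len(g) * (means[name] - grand_mean) ** 2 for name, g in data.items())
ss_error = sum((x - means[name]) ** 2 for name, g in data.items() for x in g)

df_iv, df_error = k - 1, n_total - k
ms_iv, ms_error = ss_iv / df_iv, ss_error / df_error
f_value = ms_iv / ms_error

# Familywise error rate if all pairwise t-tests are run at alpha = .05
n_comparisons = k * (k - 1) // 2
familywise = 1 - (1 - 0.05) ** n_comparisons

print(ss_total, ss_iv, ss_error)
print(f"F({df_iv},{df_error}) = {f_value:.2f}")
print(f"familywise error rate = {familywise:.3f}")
```

SStotal (50) splits into SSiv (32) and SSerror (18), which gives MSiv = 16, MSerror = 2 and F = 8, matching the values in the notes; the familywise error rate for three comparisons comes out at .143.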
Sig. in the bottom table = 0.240, so we are happy! We’re checking assumptions, so we want to accept H0, so p has to be > 0.05
Ignore this table!
On the exam: given the main analysis results, which follow-up test? Just saying “post-hoc” is not enough, explain how you reach this conclusion! So answer like this on the exam!
HC 8: Data analysis – N-way ANOVA
N-way independent measures ANOVA
- More than one BP IV
- N stands for the number of IVs: 2 IVs = 2-way ANOVA
- Number of IVs and number of levels per IV determine the number of conditions
o E.g., price (hi vs. lo) and gender (m vs. f): 2x2 = 4 conditions
o E.g., mood (pos vs. neg vs. neutral), argument strength (hi vs. lo) and number of arguments (7 vs. 3): 3x2x2 = 12 conditions
- Participants are randomly assigned to the conditions
Example: 2x2 independent measures ANOVA
Hypothesis: a delay increases consumption enjoyment for pleasant products, but decreases enjoyment for unpleasant products
BP IV 1: product type (pleasant vs. unpleasant)
BP IV 2: delay (yes vs. no)
DV: consumption enjoyment (15-point scale)
SPSS: General Linear Model – Univariate
Again 4 steps of data analysis!
Not sig., so we are happy! This is your answer!
[The scanned page goes in between here]
Main effects: follow-up test? Only if the effect is sig. and the IV has more than 2 levels!
Interaction effect: follow-up test? Which comparisons do we need to test the hypothesis?
PASTE!! This is what it is about.
What if… we would have had a different hypothesis?
An example (although it doesn’t make that much sense): consumption enjoyment is higher for pleasant than for unpleasant products, but only after a delay.
What if… we would have had no hypotheses at all?
Should we do a more conservative simple effects test? No, the simple effects test does not depend on whether you have a priori hypotheses or not (see scheme). Why? Because the simple effects are independent (i.e., each simple effect is related to a different part of the data) and as a result the type 1 error rate is not inflated!
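To make the 2x2 independent measures ANOVA above more concrete, the sketch below partitions the variance into two main effects, the interaction and error for a balanced design. The enjoyment scores are hypothetical (chosen to follow the delay hypothesis, not taken from the lecture), and the helper names are illustrative assumptions.

```python
# Hypothetical enjoyment scores per cell (product type, delay); not the lecture data
cells = {
    ("pleasant", "no delay"): [9, 10, 11],
    ("pleasant", "delay"): [12, 13, 14],
    ("unpleasant", "no delay"): [6, 7, 8],
    ("unpleasant", "delay"): [3, 4, 5],
}


def mean(values):
    return sum(values) / len(values)


all_scores = [x for scores in cells.values() for x in scores]
grand_mean = mean(all_scores)


def ss_main(factor_index):
    """SS for one main effect: squared deviation of each marginal mean, weighted by n."""
    levels = {key[factor_index] for key in cells}
    total = 0.0
    for level in levels:
        pooled = [x for key, s in cells.items() if key[factor_index] == level for x in s]
        total += len(pooled) * (mean(pooled) - grand_mean) ** 2
    return total


ss_product = ss_main(0)
ss_delay = ss_main(1)
ss_cells = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in cells.values())
ss_interaction = ss_cells - ss_product - ss_delay  # valid for equal cell sizes
ss_error = sum((x - mean(s)) ** 2 for s in cells.values() for x in s)

df_error = len(all_scores) - len(cells)
ms_error = ss_error / df_error
# Every effect in a 2x2 design has df = 1, so MS_effect equals SS_effect
f_product = ss_product / ms_error
f_delay = ss_delay / ms_error
f_interaction = ss_interaction / ms_error
print(f_product, f_delay, f_interaction, df_error)
```

In this made-up data set the delay has no main effect (the marginal means are equal), while the interaction is large: the delay raises enjoyment for the pleasant product and lowers it for the unpleasant one. That crossover pattern is exactly the situation in which simple effects of delay within each product type are the comparisons of interest.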
N-way repeated measures ANOVA
- More than one WP IV
- N stands for the number of IVs: 2 IVs = 2-way ANOVA
- Number of IVs and number of levels per IV determine the number of conditions
o E.g., price (hi vs. lo) and gender (m vs. f): 2x2 = 4 conditions
o E.g., mood (pos vs. neg vs. neutral), argument strength (hi vs. lo) and number of arguments (7 vs. 3): 3x2x2 = 12 conditions
- Participants participate in ALL conditions!
Example: two-way repeated measures ANOVA 2x2
Idea: people are more likely to underestimate the caloric content of main dishes and to choose higher-calorie side dishes, drinks or desserts when fast food restaurants claim to be healthy (e.g., Subway) compared to when they do not (McDonald’s)
Hypotheses:
H1: if a health claim is present (vs. not), people underestimate the amount of calories in the food more.
H2: this effect is larger for high-calorie dishes than for low-calorie dishes.
WP IV 1: health claim (McD vs. Subway); WP IV 2: actual calories (330 vs. 600); DV: estimated calories
Data analysis – four steps!
What if… we would have had a different hypothesis?
If the actual amount of calories increases, the estimated amount of calories increases less when health claims are present (Subway) vs. not (McD). See the figure on the next page.
What if we would have had no hypotheses at all?
Should we do a more conservative simple effects test? No, the simple effects test does not depend on whether you have a priori hypotheses or not (see scheme). Why? Because the simple effects are independent (i.e., each simple effect is related to a different part of the data) and as a result the type 1 error rate is not inflated.
N-way ANOVA
Theoretically there is no limit to the number of IVs (N); practically there is a limit, because results become difficult to interpret (e.g., a four-way interaction)
HC 9: Data analysis – N-way ANOVA continued!
(Error) variance due to IV 2 and (error) variance due to the interaction are new in this analysis!
Example McDonald’s vs.
Subway: see previous lecture, extra notes below:
This seems to support our hypothesis, but we cannot say whether these results are significant. We do not know anything about the error variances!
Sphericity: looks at the variance of the differences! So if the variances of A-B, A-C and B-C are not significantly different, then the assumption of sphericity is met!
Why are there no results in the Sig. column? There are only 2 levels, so differences are not an issue! The only difference is A-B.
Rows in the output: main effect claim, main effect actual calories, interaction effect
Mean square column = type III sum of squares / df
F-value (interaction) = mean square (interaction) / mean square error (interaction): answer on the exam!
1. Test for simple effects – health claim is moderator – compare 1-2 and 3-4
2. Test for simple effects – actual calories is moderator – compare 1-3 and 2-4
Pick option 2: we want to know the effect of health claim on caloric perception, with actual calories as moderator. So… COMPARE (claim): the true IV, NOT the moderator!!
This supports our hypothesis because the underestimation is highly significant in the 600 calorie condition: total answer on your exam!
What if – different hypothesis: now health claim is the moderator! So actual calories → caloric perception, with health claim as moderator! Add the true IV. 1 = Subway, 2 = McDonald’s. Difference 3-4 (Subway) is not sig. Difference 1-2 (McDonald’s) is sig. This supports our hypothesis.
So the simple effects test depends on the moderator you choose!
What if we had no a priori hypothesis at all?
More than 2 variables: then they are related. If you know that 1 < 2 and 2 < 3, it holds that 1 < 3, so a simple effect is not necessary, because the results/effects are not independent like 1-2 and 3-4.
N-way mixed ANOVA
- Combination of at least one BP IV and one WP IV
- Participants are randomly assigned to the levels of the BP variable(s) and participate in all conditions of the WP variable(s)
- This lecture, we will stick to the 2-way mixed ANOVA
o 1 WP IV and 1 BP IV
- You can have multiple WP and BP variables
o E.g., a 3-way mixed ANOVA: 2 WP IVs and 1 BP IV, or 1 WP IV and 2 BP IVs
Variance due to the interaction is new!
Example: two-way mixed ANOVA
Positive mood → more outside-the-box thinking → you see more similarities
H1: the larger the distance between the core brand and the extension, the more negatively the extension is evaluated
H2: positive mood will enhance the evaluation of moderate extensions, but not the evaluation of near or far extensions
2x3 mixed ANOVA
BP IV: mood (pos vs. neg)
WP IV: brand extension (near vs. moderate vs. far)
DV: brand extension evaluation (7-point scale)
SPSS: Analyze – General Linear Model – Repeated Measures: extension type; Descriptive and Homogeneity
Then the slide with the four steps of analysis
H1 seems to be supported
Check assumptions: because this is a mixed design, we need to check the general assumptions, homogeneity and sphericity!! See next page.
Sig. in the top table is 0.965: this is good, we don’t want to reject H0, we want to accept that the variances of the differences are homogeneous. Sig. in the bottom table is good: these should be non-sig.
2 significant main effects, 1 significant interaction effect.
ALWAYS CHECK THE FOLLOW-UP SCHEME!!!
See the notes for the graphs; in principle they are the same as the follow-up graphs of McD and Subway, only now with 3 variables. Look at the horizontal comparison 1-2-3 and 4-5-6, or at the vertical comparison 1-4, 2-5 and 3-6.
Look at the hypothesis: what is the moderator? Draw it and see it! Then you know how to compare.
In this case: mood → similarities, with exttype as moderator, so option 2: vertical.
H2: positive mood will enhance the evaluation of moderate extensions, but not the evaluation of near or far extensions!
LSD because we have an a priori hypothesis! COMPARE (mood): the true IV
What if… we would have had different hypotheses?
Then the horizontal comparisons 1-2-3 and 4-5-6! 1-2 is non-sig., the rest is! So top left, the first piece of the green line (positive) in the figure above!
Within the simple effects each condition is significantly different! Exam answer.
What if… we had no hypothesis at all?
THEN POST-HOC COMPARISONS: Bonferroni, because we do not have hypotheses!!
COMPARE ADJ (BONFERRONI)
COMPARE (exttype) ADJ (BONFERRONI)
HC 10: Data analysis – Final issues
Previous lectures on data analysis, always: descriptives – assumptions – main analysis – follow-up tests
Very important: check for outliers; normality is an assumption. Outliers have huge effects on the descriptives and make predictions less accurate!
1. How to calculate by hand AND how to run them in SPSS
2. When/in which situation to use them AND how to run them in SPSS
Comparison test: comparing effects for more conditions
Simple effects: separate effects for one condition
The follow-up test is determined by the results from the ANOVA, to pinpoint where the differences are.
Test for simple effects: open the syntax and add a COMPARE statement; you get 2 p-values for the 2 conditions of the moderator!
Simple comparisons: come along with the simple effects test
Post-hoc comparisons (Bonferroni or Tukey): in the syntax, add ADJ (Bonferroni or Tukey) after COMPARE; more conservative, takes the familywise type 1 error into account; use when you see differences you did not expect.
Planned contrasts: LSD does not control for the familywise type 1 error; based on theory you expected the differences, so you are allowed to be less conservative!
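The difference between planned contrasts (LSD) and post-hoc comparisons (Bonferroni) summarized above can be illustrated with a small sketch. It assumes the common Bonferroni logic of multiplying each p-value by the number of comparisons (capped at 1); the p-values and the function name `adjust_p` are hypothetical and not taken from the lecture output.

```python
def adjust_p(p_values, method="lsd"):
    """Return p-values adjusted for multiple pairwise comparisons.

    'lsd' leaves the p-values untouched (planned contrasts);
    'bonferroni' multiplies each p by the number of comparisons, capped at 1.
    """
    m = len(p_values)
    if method == "lsd":
        return dict(p_values)
    if method == "bonferroni":
        return {pair: min(1.0, p * m) for pair, p in p_values.items()}
    raise ValueError(f"unknown method: {method}")


# Hypothetical unadjusted p-values for three pairwise comparisons
raw = {"near-moderate": 0.004, "near-far": 0.030, "moderate-far": 0.200}

results = {}
for method in ("lsd", "bonferroni"):
    adjusted = adjust_p(raw, method)
    results[method] = [pair for pair, p in adjusted.items() if p < 0.05]
    print(method, results[method])
```

With these made-up values, LSD flags two comparisons as significant, while Bonferroni keeps only one: the more conservative test protects the familywise error rate at the cost of power, which is why it is reserved for situations without a priori hypotheses.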
Today – final issues on data analysis
MANOVA – multivariate analysis of variance – more than one DV in the analysis
ANCOVA – analysis of covariance – extraneous, continuous variable in the analysis
Until now univariate, now multivariate. DON’T MIX UP WITH REPEATED MEASURES
MANOVA
Two or more continuous DVs: multivariate ANOVA (MANOVA).
Do not mix up with repeated measures, which refers to a WP variable (i.e., 1 DV measured at different levels of an IV).
Do not mix up with the items of a scale (these capture the same DV): calculate Cronbach’s alpha; if > 0.7 you can simply take the average, because the items are not separate DVs.
DVs should be moderately correlated with each other.
Example: what is more effective, an entertaining or an informative commercial?
It depends on the consumer’s state
- When distracted: E > I (H1)
- When not distracted: E < I (H2)
Design: 2 (ad type: A: I vs. B: E) x 2 (state: distracted vs. not distracted) BP design
DVs: willingness to pay for the advertised product AND attitude towards the advertised product: 2 DVs, so MANOVA
How does MANOVA work?
- First, a new DV is created that is a linear combination of the DVs, maximizing the differences among the treatment groups (cf. discriminant analysis)
- Then, an ANOVA is conducted on this composite DV
But why not analyze multiple DVs with multiple ANOVAs?
Advantages of MANOVA
1. Strong protection against type 1 error: since only one (composite) DV is tested, the researcher is protected against inflating the type 1 error due to multiple comparisons
2. Sometimes shows differences that individual ANOVAs do not: MANOVA is powerful; it takes account of the relationship between DVs. When you take more DVs that are correlated, you have a higher probability of finding a relationship!
3. It is hard to interpret results if multiple ANOVAs are significant: it is easier to discover truly important effects with a MANOVA
Assumptions of MANOVA (MANOVA only for between designs; for within designs use repeated measures ANOVA)
1. Multivariate normality
a. DVs (collectively) should be normally distributed
b. For each DV:
i.
Test for outliers
ii. Kolmogorov-Smirnov test / Shapiro-Wilk test (normality)
2. Homogeneity of variances
a. DVs exhibit equal levels of variance across conditions
b. For each DV: Levene’s test needs to be non-significant
3. Homogeneity of covariances!!!
a. Intercorrelations between the DVs should be homogeneous across the conditions of the design
b. Box’s M (use p < .001 as criterion)
Box’s test is not significant, thus the covariances are homogeneous!!
If you have an interaction effect, use a simple effects test! Bonferroni is not needed because we had a priori hypotheses!
Analysis of covariance
- To test for differences between group means when we know that an extraneous variable affects the outcome variable
- Used to control known extraneous and confounding variables
- ANCOVA is an extension of ANOVA in which main effects and interactions of IVs on the DV are assessed after removal of the effects of one or more covariates
How does ANCOVA work?
- First, for each condition the DV is regressed on the covariate
o The covariate is typically a continuous (or dummy) variable
- Then, the effects of the IVs (experimental factors) are computed on the variance that is left
o This way we find the effect of the IVs after the effect of the covariate has been controlled for
Advantages of ANCOVA
- Reduces error variance
o By explaining some of the unexplained variance, the error variance in the model can be reduced
o Increases sensitivity of the experiment (more power)
Instead of the mean, use the regression line: the SD then becomes smaller, so more power. The chance of a type 2 error becomes smaller.
Example: 2 price coupons.
Condition 1: buy two and get 50% off
Condition 2: buy one, get one free
This is the same, just phrased differently!
X = coupon proneness
Advantages of ANCOVA
- Reduces error variance
o By explaining some of the unexplained variance, the error variance in the model can be reduced
o Increases sensitivity of the experiment (more power)
- Greater experimental control
o Adjust the means on the DV itself to what they would be if all cases scored identically on the covariate
o Differences between participants are removed, so the remaining differences would be the real effects of the IV(s) on the DV
o By controlling known confounds, we gain greater insight into the effect of the predictor variable(s)
Assumptions
Same as for ANOVA, plus homogeneity of regression: the slopes of the regression are the same for all cells of a design.
ANCOVA: other issues
- Extraneous variable can also be used as another IV
o Effect of manipulations depends on the covariate
o Categorize it: use it as another IV in an ANOVA
o Leave it continuous: use it in a regression
- MANCOVA
o Effects of IV(s) on multiple DVs can also be corrected for covariates
o Multivariate extension of ANCOVA
o Same function as ANCOVA with respect to error reduction and the adjusting of means
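The idea of adjusting the means on the DV to what they would be if all cases scored identically on the covariate can be sketched as follows. This is a simplified illustration using the standard adjusted-mean formula with a pooled within-group slope, which presumes homogeneity of regression; the data, the DV (purchase intention) and all variable names are hypothetical assumptions, with coupon proneness as the covariate.

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical data: x = coupon proneness (covariate), y = purchase intention (DV)
groups = {
    "buy two, 50% off": {"x": [1, 2, 3], "y": [2, 4, 6]},
    "buy one, get one free": {"x": [3, 4, 5], "y": [7, 9, 11]},
}

grand_x = mean([x for g in groups.values() for x in g["x"]])

# Pooled within-group regression slope of the DV on the covariate
sxy = sxx = 0.0
for g in groups.values():
    mx, my = mean(g["x"]), mean(g["y"])
    sxy += sum((x - mx) * (y - my) for x, y in zip(g["x"], g["y"]))
    sxx += sum((x - mx) ** 2 for x in g["x"])
slope = sxy / sxx

# Adjusted mean: the group mean on the DV if the group had scored
# at the grand mean of the covariate
adjusted = {
    name: mean(g["y"]) - slope * (mean(g["x"]) - grand_x)
    for name, g in groups.items()
}
for name, g in groups.items():
    print(f"{name}: raw M = {mean(g['y']):.1f}, adjusted M = {adjusted[name]:.1f}")
```

In this made-up example the raw means differ by 5 points, but most of that gap comes from the second group being more coupon prone; after adjustment the difference shrinks to 1 point. Both groups were given the same slope on purpose, so the homogeneity of regression assumption holds by construction.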