Experimental Design Part II

Keith Smolkowski
April 30, 2008


Where Are We Now?
- Research Review
- Research Design: The Plan
- Internal Validity: Statements of Causality
- External Validity: Statements of Generalizability
- Designs: Experimental Designs, Quasi-Experimental Designs, Poor Designs, Non-Experimental Designs


External Validity: Statements of Generalizability
- The extent to which findings can be applied to other individuals or groups and to other settings.
- Two aspects of external validity:
  - Population validity: the extent to which you can generalize results to a specified larger group.
  - Ecological validity: the extent to which you can generalize results to other environmental conditions.
- A variety of threats to external validity were identified by Bracht and Glass (1968).


Threats to External Validity: Population Validity
- Population validity: generalization from a sample to a specific larger group.
- Threats:
  - Inability to generalize from the experimental sample to a defined population, the target population.
  - The extent to which personological variables interact with treatment effects.


Generalize from Experimental Sample to a Defined Population
- The sample may differ from the target population on location or environment.
- You can only generalize to the population from which the sample was drawn.
- You can compare populations on relevant characteristics and hope results will generalize, but that is risky.

[Figure: Public schools and students by type of locale, 1996-97, comparing Oregon, Mississippi, and California.]


Interaction Between Personological Variables and Treatment Effects
- The sample may differ from the target population on personological variables.
- Personological variables include locus of control, gender, SES, education, alcohol consumption, anxiety level, various academic or social skills, confidence level, and so on.

[Figure: Public school student membership by racial and ethnic category, comparing Oregon, Hawaii, and Mississippi.]


Threats to External Validity: Ecological Validity
- Ecological validity: generalization from study conditions to a different environment.
- Threats:
  1) Explicit description of experimental treatment
  2) Hawthorne effect
  3) Multiple treatment interference
  4) Novelty and disruption effects
  5) Experimenter effect
  6) Pretest sensitization
  7) Posttest sensitization
  8) Interaction of history and treatment
  9) Measurement of dependent variable
  10) Interaction of time of measurement and treatment effects


Explicit Description of IV
- Give a complete and detailed description of the method such that another researcher could replicate the procedure:
  - Write out the steps.
  - Draw pictures.
  - Build a timeline.


Multiple-Treatment Interference
- Occurs when a study uses multiple interventions in an uncontrolled manner, for example:
  4. Infect with a cold virus
  5. Lock "treatment" group in cold room
  6. Have participants count symptoms
- Which treatment caused the change?
- Cannot generalize findings to situations with only one treatment.
- Design studies that systematically separate the participants into different intervention groups.


Hawthorne Effect
- The "I know you're watching me" phenomenon: knowledge of the study influences performance by participants, who are
  - aware of the aims or hypothesis, or
  - receive special attention.
- (Not really the cause of the problem in the original Hawthorne study; see Gilbert, 1996.)
Novelty and Disruption Effects
- Novel interventions may cause changes in the DV simply because they are new or different.
- Disruption effects occur when a disruption in routine inhibits performance.


Experimenter Effect
- Effectiveness or ineffectiveness of an intervention due to the individual administering the intervention.
- Hence the importance of replication.


Pretest Sensitization
- The pretest may interact with the intervention and produce different results than if the participants had not taken the pretest.
- The pretest "clues in" the participants.
- Most common with self-reports of attitudes or personality.


Posttest Sensitization
- The posttest is itself a learning experience: participants' performance is affected by the test.
- The test extends the intervention and helps participants "put the pieces together"; practice during the intervention was probably insufficient.
- Applies to both groups.


Interaction of History and Treatment Effects
- It may be difficult to generalize findings outside of the time period in which the research was done.
- Example: a school safety study. March to June 1998: intervene to improve safety in non-classroom settings. May 1998: the Thurston shooting.
- Difficult to avoid!


Measurement of the DV
- Results are limited to the particular mode of assessment used in the study.
- Example: Project DARE.
  - Sample: 6th graders.
  - Intervention: a police officer dares kids to "say no to drugs."
  - Assessment: the officer plays a drug dealer: "Wanna buy some drugs?"
  - Results: DARE 6th graders say "no."
  - Yet DARE is unrelated to later drug use.


Interaction of Time of Measurement and Treatment Effects
- Results may depend on the amount of time between intervention and assessment: the last day of the intervention, a month later, a year later.
- Is assessment immediately after the intervention better?


Threats to External Validity: Population Validity (Review)
- Population validity: generalization from a sample to a specific larger group.
- Threats:
  - Inability to generalize from the experimental sample to a defined population, the target population.
  - The extent to which personological variables interact with treatment effects.


Threats to External Validity: Ecological Validity (Review)
- Ecological validity: generalization from your study to some other environment, from the research project to the real world (lab to school, clinic to home).
- Threats: explicit treatment description, Hawthorne effect, multiple treatment interference, novelty and disruption, experimenter effects, pretest sensitization, posttest sensitization, interaction of history & treatment, measurement of the DV, interaction of time of measurement & treatment.


Threats to Study Validity in Examples on Slides 18, 19, & 20 from Part I
a) Experimental treatment diffusion (i9), also called "contamination."
b) History (i1).
c) Controls for differential selection (i6).
d) A true placebo may control for experimental treatment diffusion (i9), compensatory equalization of treatments (i11), or resentful demoralization of the control group (i12).
e) Multi-treatment interference (e2), a threat to external validity.
f) Not a typical threat to internal validity, but probably falls under maturation (i2).
g) Differential selection (i6).
h) Compensatory rivalry by the control group (i10).
i) Generalization from the sample to an undefined population (i6; see also the population validity slides earlier in Part II).
j) Statistical regression toward the mean (i5).
k) Testing: participants become "test wise" (i3).
l) Hawthorne effect (e3).
(Key: i = internal, p = population, e = ecological.)


Group Designs

  Experimental                        Quasi-Experimental
  ----------------------------------  ----------------------------------
  Two or more groups                  Two or more groups
  Comparisons between groups          Comparisons between groups
  Random assignment                   Not random assignment
  Equivalent groups;                  Nonequivalent groups;
    adjustment unnecessary              must adjust for differences
  Manipulate the IV                   Manipulate the IV
  Provide an intervention             Provide an intervention
  One or more IVs;                    One or more IVs;
    separate groups for the IVs         separate groups for the IVs
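To make the random-assignment row of the table concrete, here is a minimal sketch of a seeded assignment procedure in Python (the participant IDs and the two condition labels are hypothetical, not from the slides):

```python
import random

def randomly_assign(participants, conditions=("intervention", "control"), seed=42):
    """Shuffle participants, then deal them out round-robin so the
    conditions end up as close to equal in size as possible."""
    pool = list(participants)
    random.Random(seed).shuffle(pool)  # seeded, so the assignment is reproducible
    return {cond: pool[i::len(conditions)] for i, cond in enumerate(conditions)}

# Hypothetical roster of eight participant IDs.
groups = randomly_assign([f"P{i:02d}" for i in range(1, 9)])
print(groups["intervention"])
print(groups["control"])
```

Recording the seed also serves the "explicit description of the IV" point above: another researcher could reproduce the exact assignment.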
Study Designs
- Experimental: show causality.
- Quasi-experimental: imply causality.
- Nonexperimental: show relationships.
  - Descriptive
  - Correlational
  - Causal comparative
  - Case studies
  - Single-group pre-post


Experimental Designs
- Show causality.
- Requirements:
  - Randomly assigned groups, two or more.
  - Manipulation of one IV (or more).
- Examples: post-only control-group design, pre-post control-group design, Solomon four-group design, factorial designs.


Post-Only Control Group Design
- Collect data (O) only after manipulation of the IV (X).

    Intervention:  R  X  O
    Control:       R     O

  Alternate specification:

    R  X1  O
    R  X2  O

- Threats to validity (examples):
  - Internal (causation): differential mortality.
  - External (generalization): potentially many.


Pre-Post Control Group Design
- Data collection (O) before and after the intervention (X).

    R  O  X  O
    R  O     O

- Threats to validity (examples):
  - Internal: attrition, treatment diffusion.
  - External: pretest sensitization, novelty.


Solomon 4-Group Design
- A combination of the post-only and pre-post designs.

    R  O  X  O
    R  O     O
    R     X  O
    R        O

- Ideal but difficult to use: requires a larger sample size, and the analysis is cumbersome.
- Threats to validity (examples):
  - Internal (causality): attrition.
  - External (generalizability): experimenter, disruption, but not pretest sensitization.


Factorial Designs
- Similar to pre-post, but with two IVs, e.g., X = intervention (drug) and Y = setting (cold room).

    R  O  X1 Y1  O
    R  O  X1 Y2  O
    R  O  X2 Y1  O
    R  O  X2 Y2  O

- Threats to validity (examples):
  - Internal: mortality.
  - External: interaction of testing & treatment.


Unit of Analysis: Individuals
- Standard procedure:
  - Sample individuals.
  - Randomize individuals to groups.
  - Unit of analysis = individuals.
  - Analyze individuals.
  - Inference based on individuals.
- But what if you cannot recruit and assign individuals, only intact groups?


Unit of Analysis: Groups
- Challenges to the standard procedure:
  - You can only recruit intact groups, say, classrooms.
  - The intervention applies to groups, such as schools or communities.
- Alternative procedure (see the sketch below):
  - Randomize groups (e.g., classrooms, cities).
  - Unit of analysis = groups; the intervention applies to groups.
  - Analyze group means (other options available).
  - Inference based on groups.
- Requires more groups and more people.
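A minimal sketch of the group-as-unit approach, assuming SciPy is available (the classroom names and scores are hypothetical): each intact classroom is collapsed to its mean, and the test compares classroom means, so the effective n is the number of classrooms, not students.

```python
from statistics import mean
from scipy.stats import ttest_ind

# Hypothetical scores nested within intact classrooms.
classrooms = {
    "Room A": ("intervention", [88, 74, 91, 69]),
    "Room B": ("intervention", [72, 85, 78, 80]),
    "Room C": ("control",      [70, 66, 74, 78]),
    "Room D": ("control",      [81, 63, 75, 70]),
}

# Unit of analysis = groups: one mean per classroom.
tx = [mean(s) for cond, s in classrooms.values() if cond == "intervention"]
ct = [mean(s) for cond, s in classrooms.values() if cond == "control"]

# Two classroom means per condition is far too few in practice,
# which is exactly why group-randomized trials need many groups.
t, p = ttest_ind(tx, ct)
print(f"t = {t:.2f}, p = {p:.3f}")
```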
Quasi-Experimental Designs
- Quasi means "resembling."
- Requirements:
  - Two or more groups, not randomly assigned.
  - Manipulation of one or more IVs.
- Examples: static-group comparison design, nonequivalent-group design, interrupted time series design.


Static Group Comparison
- Exactly like a post-only design, but with no random assignment:

    Not random:  X  O
                    O

- Threats to validity (examples):
  - Internal: differential selection, mortality.
  - External: interaction of selection and treatment.


Nonequivalent Group Designs
- Similar to the experimental pre-post design, but not randomized:

    Not random:  O  X  O
                 O     O

- Threats to validity (examples):
  - Internal (causality): differential selection.
  - External: interaction of testing and treatment, experimenter effect.


Interrupted Time Series Design
- Many assessments with the IV in the middle:

    O  O  O  O  X  O  O  O  O

- Threats to validity (examples):
  - Internal (causality): history, maturation.
  - External (generalizability): interaction of testing and treatment.
- Contrast with true single-case research, which can achieve good internal validity (a single-subject research sequence is highly recommended).


Regression Discontinuity
- A two-group design: assign participants to condition based on a pretest.
- A cut score must be used for assignment, which accounts for selection bias.
- Example: assign all students scoring below 20 on the Beck Depression Inventory to the intervention.
- Discontinuity in regression: regress the posttest on the pretest and test for a discontinuity at the assignment cut score, as in the sketch below.
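A minimal sketch of that regression test with simulated data, assuming NumPy and statsmodels are available (the cut score of 20 echoes the example above; everything else is invented): the coefficient on the treatment indicator estimates the jump in the regression line at the cut.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
CUT = 20  # assignment cut score, as in the example above

# Simulated study: everyone below the cut receives the intervention.
pretest = rng.uniform(5, 35, size=200)
treated = (pretest < CUT).astype(float)
posttest = 10 + 0.8 * pretest + 5.0 * treated + rng.normal(0, 3, size=200)

# Regress the posttest on the pretest (centered at the cut) and treatment.
X = sm.add_constant(np.column_stack([pretest - CUT, treated]))
fit = sm.OLS(posttest, X).fit()

# A significant `treated` coefficient is the discontinuity at the cut score.
print(fit.params)      # [intercept, pretest slope, discontinuity]
print(fit.pvalues[2])  # p-value for the discontinuity
```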
Nonexperimental Designs
- Show only relationships.
- Requirements: few.
- Examples: descriptive, correlational, causal comparative designs, case studies, single-group pre-post.


Correlational Designs
- Determine the relationship between two variables; correlation does not imply causation.
- Example: teacher training and student performance in 1st grade.
  - Variable 1: hours spent in practicum.
  - Variable 2: students' reading scores.


Causal Comparative Designs
- Also called ex post facto studies.
- Suspected cause: a past event or experience. Likely effect: a present event or state.
- Studies temporal relationships and shows relationships over time, but cannot establish causality; experimental control is not possible.
- Example: gang membership. School dropout "may lead to" gang membership. An alternative: a poor school environment leads to both dropout and gang membership.


Single-Group Pre-Post
- Assess, intervene, assess: a bad idea.

    O  X  O

- Threats to validity (examples):
  - Internal: history, maturation, testing, interaction of selection and other factors.
  - External: interaction of testing and treatment, interaction of selection and treatment.


Group Design Review
- How do the design types differ? Experimental? Quasi-experimental? Nonexperimental?
- Valid experiments: What is internal validity? What is external validity?


Single-Case Research (a digression)
- Powerful, flexible, and parsimonious.
- Very different from group designs: 1 to 5 participants, and each participant serves as his or her own control.
- May achieve excellent internal and external validity with an appropriate design.
- Many experimental designs: multiple baseline, ABAB, alternating treatments.
- Beyond the scope of the current presentation. Key sources:
  - Kennedy, Single Case Designs for Educational Research, 2005.
  - Zhan & Ottenbacher, "Single subject research designs . . .," Disability and Rehabilitation, 2001, 23(1).
  - Carr, "A classroom demonstration of single-subject research designs," Teaching of Psychology, 1997, 24(3).
  - Fisher, Kelley, & Lomas, "Visual aids and structured criteria for . . . single-case designs," JABA, 2003, 36(3).


Experimental Validity (Review)
- Internal validity: valid statements about causality. Can we draw conclusions about cause?
- External validity: valid statements about generalization. Can we expect the same results at other places or times, with other people, and with the intervention we reported?


Design Review: The Plan
- Hypothesis statement, research question.
- Design overview: timeline or design figure.
- Participants: sampling & recruitment.
- Intervention (IV): theory, implementation, fidelity, strength.
- Data collection (DV): measures with reliability & validity; carefully identify procedures & timing.
- Intended analysis & power.
- Critique: strengths & weaknesses.


Design: Research Question
- States the relationship between the IV(s) and the DVs; identifies the effect of the IV on the DV for the study sample.
- Must represent falsifiable hypotheses: suggests empirical tests and implies measurable, well-defined DVs.
- State it "clearly and unambiguously" (Kerlinger & Lee, 2000).


Design: Overview & Sample
- Overview: draw a design picture or timeline; define the relationships among variables.
- Sample: To whom do I want to generalize? How will I sample participants? How do I assign participants? How large a sample do I need?


Design: Independent Variable
- Operational definition of the IV.
- Expected strength of the IV.
- Fidelity of implementation, also called treatment fidelity: "the extent to which the treatment conditions, as implemented, conform to the researcher's specifications" (Gall, Borg, & Gall, 1996, p. 481).


Design: Dependent Variable
- Choose measures carefully: borrow measures from the research literature, or create new ones, which usually requires considerable pilot work.
- Report reliability & validity.
- Carefully identify procedures & the assessment schedule.


Design: Analysis & Critique
- Choose an analysis method. Factors to consider:
  - The research question.
  - The type of data collected.
  - The number of groups: the treatment condition as well as other groups.
  - The type of design (e.g., pre-post, correlational).
- Report power (coming next).
- Critique: strengths & weaknesses.


Design: Power Analysis
- "How large must my sample be?" A big question.
- Reality and our glimpse of it: the real world is not known; our sample is what we know. We try to infer reality from what we know in our sample.


Design: Power Analysis (cont'd)
- A step back: two "realities."
- Reality I: no difference in the real world.
  - The assumption for statistical tests (not power).
  - Type I error: accidentally concluding we have a difference when it does not really exist.
  - The chance of a Type I error: the p-value, or alpha (α).
- Reality II: differences exist in the real world.
  - The assumption for power; assumes "true" differences: how likely are we to see them?
  - Type II error: accidentally concluding we have no difference when one exists in reality.
  - The chance of a Type II error: beta (β).


Error Types
- Probability of a Type I error: α. Probability of a Type II error: β.

                                    Real world (unknown to us)
  What we know from our sample      No difference        Difference
  (test results):
    No difference (accept null)     Correct              Type II error (β)
    Groups differ (reject null)     Type I error (α)     Correct


Design: Computing Power
- Important considerations:
  - Analysis: t-test, ANOVA, correlation, etc.
  - Alpha level: the level of statistical significance chosen to reject the null, often .05 or .01.
  - Direction of hypothesis: one- or two-tailed.
  - Expected magnitude of the effect, the "effect size."
  - Desired power: 1 - β, often 80%, so β = .20.
  - Attrition rate.
- Consult Cohen (1988) or a similar source for power tables, or get G*Power (free); a code sketch follows.
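In place of power tables, a minimal sketch of the same computation with statsmodels (assumed available; G*Power returns the same kind of answer), using hypothetical values for the considerations listed above:

```python
import math
from statsmodels.stats.power import TTestIndPower

effect_size = 0.5  # expected magnitude of effect (Cohen's d), hypothetical
alpha = 0.05       # alpha level for rejecting the null
power = 0.80       # desired power, 1 - beta, so beta = .20
attrition = 0.15   # expected proportion of participants lost, hypothetical

# Per-group n for a two-tailed, two-sample t-test.
n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=alpha, power=power, alternative="two-sided"
)

# Recruit extra people so the design keeps its power after attrition.
n_recruit = math.ceil(n_per_group / (1 - attrition))
print(f"analyze {math.ceil(n_per_group)} per group; recruit {n_recruit} per group")
```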
Questions?
- Research designs?
- Internal validity?
- External validity?
- Power?