Module 8: Experimental validity

“IT MIGHT BE ON THE QUIZ”

1. What are “threats to internal validity” and “threats to external validity”? How are they different?

'Threats to internal validity' are things that might go wrong in trying to answer questions about cause and effect. They are ways a study might fail to provide convincing or valid evidence (handout packet). 'Threats to external validity' are threats to the generalizability of results, "that is, the extent to which the results of a study could be expected to apply to other individuals or settings" (handout packet).

They differ in that threats to internal validity threaten to make the study itself, and thus its results, invalid, whereas threats to external validity concern how the results are applied or interpreted when generalizing to the population and/or other contexts. I see the difference as inward threats that could potentially destroy the study versus outward threats that could potentially make a study useless for my purposes. Random selection improves the external validity of an experiment, and random assignment improves the internal validity. This is why “randomization” is so important in research.

External validity is a way to determine to whom the results are applicable. This speaks to the generalizability of the results. In other words, from what population were the subjects selected, and how confident are we that the sample is representative of the population? External validity can be viewed in terms of units (subjects), treatments, outcomes (measures), and settings.
Subjects: Does the effect hold with a different group?
Treatments: Does the effect hold when there are variations in treatment implementation?
Measures: Does the effect hold on different measures?
Settings: Does the effect found in one setting hold in other settings?

With internal validity, what you are really interested in is the quality of the research design. Did the researcher plan ahead effectively?
Did he or she take into consideration the issues that will provide the strongest evidence about cause and effect? Many issues that could compromise a research design, and thus its results, can be addressed by careful planning, but that still doesn’t guarantee that things won’t go wrong. How confident can we be that the independent variable, rather than extraneous or confounding variables, produced the observed effect? “Threats to internal validity” refers to categories of ways that the internal validity of a study might be called into question.

2. Gall, Gall, and Borg (2003) list the Hawthorne effect as a threat to external validity, whereas McMillan (2003) uses it as an example of a threat to internal validity. How would you resolve this difference? Under what conditions is the Hawthorne effect a matter of internal validity? External validity?

McMillan believes it is an internal validity problem because the subjects respond in a particular way because they know they are participating in a study, so one is unable to conclude that the treatment caused the desired effect. This is an example of “subject effects” (the subjects’ awareness of their involvement in the study may affect their performance). Gall, Gall, and Borg believe the Hawthorne effect is a threat to external validity because the results were obtained while the Hawthorne effect was taking place. That is, the results cannot be generalized outside of this group because new subjects will not be experiencing the “Hawthorne effect” (i.e., being in a study and receiving special treatment).

3. For items 4-11, choose from the following list the threat to internal validity that is likely to be the greatest, or most obvious. Give a justification for your choice.
history
selection
maturation
pretesting
instrumentation
treatment replications
subject attrition
statistical regression
diffusion of treatment
experimenter effects
subject effects

4. Dr.
Jones finds a large school district in which half of the elementary schools have just adopted the Accelerated Reader program. At the end of the school year (ex post facto), he gets standardized test scores for students from all the elementary schools in the district, and finds that those with the Accelerated Reader program have significantly higher test scores. He concludes that the Accelerated Reader program produces gains in standardized test scores. Which threat to internal validity most directly and obviously calls Dr. Jones’ conclusion into question?

Selection

5. Dr. Jones found two first grade teachers who were willing to take part in an experiment comparing two different approaches to reading instruction. By conducting tests at the very beginning of the year, he found that the two classes were almost perfectly matched in terms of students’ verbal IQ and reading ability. By flipping a coin, he randomly assigned Miss Murphy to the experimental condition, Jolly Phonics, and Mrs. Butler, her friend in the classroom across the hall, to the control group, continuing her normal instruction, using the basal reading series adopted by the district. What threat to internal validity can Dr. Jones anticipate?

Diffusion of treatment or history, but I think this could be even more of a history threat. On page 215, the textbook says "history is the category of threat in studies where different teachers are responsible for implementing each intervention."

6. Dr. Jones wanted to examine the effectiveness of his new critical thinking skills program for gifted students. He gave a test of critical thinking to all the 9th graders at a local middle school, and selected the top ten percent of the students for a trial of his program. After 3 weeks of critical thinking lessons, he gave the critical thinking test to the participants, and found that their posttest scores were significantly lower than their pretest scores.
He concluded that his critical thinking program had a negative effect on gifted students’ critical thinking skills. What threat to internal validity calls his conclusion into question?

Statistical regression

7. Dr. Jones takes over the math period in two fifth grade classrooms. In one classroom he uses his Jones’ Mighty Math textbook, and in the other he uses the math textbook adopted by the district. The two classrooms were equivalent on a test of math knowledge at the beginning of the year, but an end-of-year test shows that the students using Dr. Jones’ textbook learned significantly more. Dr. Jones claims that this result demonstrates the superiority of his textbook. A critic of Dr. Jones’ claim is most likely to mention which kind of threat to internal validity?

Experimenter effects

8. A longitudinal panel study of students at a certain private school found that in the year the study began, 22% of the first grade students were diagnosed as having ADHD. Five years later, those same students were contacted. Of those still at this private school, only 14% were now categorized as having ADHD. The principal concluded that the strict discipline program used at that school was improving students’ ability to concentrate, and therefore reducing the number showing signs of ADHD. What threat to internal validity offers a different explanation of the change observed?

Subject attrition

9. Dr. Jones gives a group of 100 students a spelling test containing words of Greek origin. He then gives these students 15 minutes of daily instruction for a week on the history of Greece and its contribution to Western civilization. At the end of the week he tests them on the same words, and finds that their accuracy in spelling has improved. He attributes the gain in spelling accuracy to his intervention. What threat to internal validity calls this interpretation into question?

Pretesting

10.
Two math teachers in a large middle school each taught algebra to 3 classes of 7th graders, and there were 25 students in each of these classes, so a total of 150 students were involved in the study. Dr. Jones got each of the teachers to try one of two new instructional approaches. The mean performance of students who had experienced instructional method A was higher than the mean performance of students who had experienced instructional method B. Conducting a t-test with an N of 150, Dr. Jones found the difference in means to be statistically significant, and concluded that instructional method A was superior. What threat to internal validity calls this interpretation into question?

Treatment replications

11. Dr. Jones was investigating the effects of two different behavior management techniques. Ten classrooms were randomly assigned to the Assertive Discipline condition, and 10 to the Positive Humor condition. Dr. Jones, having received lots of funding, had 20 graduate assistants, and assigned a different graduate assistant to observe in each classroom. By chance, all the graduate students who were observing in the Assertive Discipline classrooms were from the counseling program, and all the graduate students who were observing in Positive Humor classrooms were from the educational leadership program. The frequency of disruptive behaviors was found to be significantly higher in the classrooms using Assertive Discipline. Dr. Jones wants to publish an article using this data to prove that Positive Humor is superior to Assertive Discipline. What threat to internal validity calls this claim into question?

Instrumentation

12. What is the most serious threat to internal validity of an ex post facto design?

The most serious threat to internal validity of an ex post facto design is subject or group selection.
The researcher needs to be very careful that extraneous variables have not influenced the study and that differences between the groups being compared have been controlled.

13. Which threats to internal validity are controlled for by a true experimental design? Which are not?

This question confused me a bit, because I was not sure what was meant by a "true experiment," but after reading that it refers to designs with randomized groups, it became a bit easier. Controlled threats would be: selection, maturation, and statistical regression. Threats that are not controlled, and so could still have an effect on the experiment, would be all the others: history, pretesting, instrumentation, treatment replications, subject attrition, diffusion of treatment, experimenter effects, and subject effects. I used the chart on page 232 to help me with this.

Internal validity threats:

History: uncontrolled events that might impact the dependent variable. Example: half the control group gets the flu; a big news story appears on the topic being studied in the experiment.

Selection: a difference between the treatment groups (other than the treatment). Example: schools in the treatment condition have more experienced teachers; students in the control condition have higher verbal IQs. Selection is almost always a serious potential problem for ex post facto studies.

Maturation: any change over time (whether long term or short term) that might impact the dependent variable. Example: subjects are older and wiser after a year of schooling; subjects are tired and bored after 20 minutes of testing. Maturation is a threat to internal validity in single group designs, or if it differentially affects the treatment groups.

Pretesting: taking the pretest can impact performance on the posttest. Example: taking the pretest can alert subjects to the purpose of the experiment; taking the pretest can increase subjects’ awareness of, and hence subsequent learning about, the topic of the test.
Pretesting is a threat to internal validity only in single group designs, or if it differentially affects the treatment groups.

Instrumentation: unreliability or bias in the measures or procedures used to obtain data. Example: using different observers for different treatment conditions; using a dependent measure that is not sensitive to the treatment. Instrumentation is a threat to internal validity only in single group designs, or if it differentially affects the treatment groups.

Treatment replications: inferential statistical tests assume that each observation (the value of the dependent variable for each subject) is independent. Example: a teacher teaching a class of 25 students with a new method should count as only one instance of the treatment, not 25.

Subject attrition: subjects drop out of the study. Example: in a longitudinal study of high school students, poor readers are more likely to drop out. Subject attrition is a threat to internal validity only in single group designs, or if it differentially affects the treatment groups.

Statistical regression: due to measurement error, subjects with extreme high or low scores on the pretest are likely to score closer to the mean on the posttest. Example: if you use a reading test to identify the lowest 20% of readers in your class, these students will (on average) do better on the posttest, whether or not there has been any actual improvement in their reading ability.

Diffusion of treatment: the control group may find out about, and imitate, the treatment received by the experimental group. Example: the control group teachers find out what the experimental group teachers are doing, and try it out.

Experimenter effects: the influence of the experimenter's behavior, personality traits, or expectancies on the results of his or her own research. Expectations or attributes of the experimenter differentially affect the treatment groups.
Example: the experimenter implements two treatments, and unconsciously favors the one he/she hopes will do better.

Subject effects: changes in behavior generated by the subject by virtue of being in a study. The behavior of subjects is influenced by their knowledge that they are taking part in the study. Examples: Hawthorne effect: the subjects perform better because they know they are in an experiment. Compensatory rivalry: control group subjects try harder. Resentful demoralization: control group subjects resent not getting the treatment and give up.
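
To check my understanding of statistical regression (item 6 and the definition above), here is a minimal Python sketch. It is only an illustration, and everything in it is my own assumption rather than anything from the handout or textbook: the function name simulate_regression_to_mean is hypothetical, true ability is drawn with mean 100 and SD 15, and each test adds independent measurement error with SD 10. There is no treatment effect anywhere in the simulation; the only thing that happens is that the top ten percent on the pretest are selected and tested again.

```python
import random
import statistics


def simulate_regression_to_mean(n_students=10_000, top_fraction=0.10, seed=0):
    """Select the top scorers on a noisy pretest and compare their two test means.

    Hypothetical setup: there is NO treatment effect at all. Each observed
    score is a stable true ability plus independent measurement error.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    students = []
    for _ in range(n_students):
        ability = rng.gauss(100, 15)           # assumed true-score distribution
        pretest = ability + rng.gauss(0, 10)   # assumed measurement error
        posttest = ability + rng.gauss(0, 10)  # new, independent error
        students.append((pretest, posttest))

    # Keep only the top fraction on the pretest, as in the item 6 scenario.
    students.sort(key=lambda s: s[0], reverse=True)
    selected = students[: int(n_students * top_fraction)]

    pre_mean = statistics.mean(s[0] for s in selected)
    post_mean = statistics.mean(s[1] for s in selected)
    return pre_mean, post_mean


if __name__ == "__main__":
    pre, post = simulate_regression_to_mean()
    print(f"selected group pretest mean:  {pre:.1f}")
    print(f"selected group posttest mean: {post:.1f}")
```

Under these assumed numbers, the selected group's posttest mean comes out lower than its pretest mean even though nothing was done to the students. That is the alternative explanation for the drop Dr. Jones observed in item 6: students picked for extreme pretest scores tend to score closer to the mean when tested again.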