Study Questions for Chapter 11 of Shadish et al., "Generalized Causal Inference: A Grounded Theory"

p. 341 What is the strength of the randomized experiment? What is one of the oldest criticisms of the randomized experiment?

p. 341 What are the two different kinds of generalizations that can be made about units, treatments, outcomes, and settings? To what type of validity does each of these kinds of generalizations refer?

p. 342 Comment on the following statement: "Our theories and methods regarding generalized causal inference are considerably more advanced than our theories and methods regarding internal validity."

p. 342 Shadish et al. state that "Most of this knowledge about generalization is the product of multiple attempts at replication, of reflection on why successful or failed generalization might have occurred, and of empirical tests of which reasons are true." This procedure is most useful in which kinds of scientific fields? How does this apply to the social sciences?

p. 343 What is the difference between simple random sampling and stratified random sampling? Give a concrete example that illustrates the difference.

p. 343 Describe how cluster sampling works. Give a concrete example. How does the margin of error for cluster sampling compare with the margin of error for simple random sampling?

pp. 343-344 According to Lavori et al. (1986), the question of generalizability may not be resolvable unless ___________________________.

p. 344 Most surveys are concerned with the random sampling of people. However, Shadish et al. point out that in experiments, one also needs to sample _______________.

p. 345 Instead of randomly selecting from possible treatments, field researchers generally prefer to add ____________________.

p. 345 In theory, it would be possible to administer a randomly selected set of measures. But field researchers generally prefer to select measures that ___________________.

p. 345 In field research, what choice do researchers usually make regarding (a) the number of constructs/outcomes measured versus (b) the number of measures of each construct?

p. 345 Give a concrete example to illustrate the comment by Shadish et al. that "item selection tends to drift toward rational selection based on human judgments about the degree to which particular components are prototypical of a construct and then about the degree to which specific items are prototypical of those components."

p. 346 In theory, it would be possible to use cluster sampling to randomly sample settings (such as community mental health centers). However, what practical constraints still limit the experimenter's ability to sample settings formally? [Note: The constraints that apply to settings are not all the same as those that apply to persons.]

pp. 346-347 Shadish et al. give four examples in which experimenters randomly selected participants from a large population, then randomly assigned these participants to treatments. Describe any single one of these experiments, including (a) the population sampled and its approximate size (if given), and (b) the type of treatment that was employed.

p. 347 Although Wallace et al. randomly selected patients with drinking problems from a population of hundreds of thousands, what problem did they then encounter? How did this affect the representativeness of their sample?

p. 347 What is the financial reason that random sampling of settings is often impractical?

p. 348 Overall, what do Shadish et al. conclude about random sampling as the best model for dealing with the problems of generalized causal inference?

p. 349 What do Shadish et al. mean when they say that their theory of causal generalization is "grounded"?

p. 349 Comment on the following statement: "Although uneducated people commonly make generalizations, scientists do so only rarely."

p. 349 What is perhaps the most common form of human generalization? Give a concrete example.

pp. 349-350 How does the Beck Depression Inventory illustrate the use of generalization in measurement?

p. 350 Give an example from the cognitive sciences that illustrates how similar results have been found over heterogeneous instances. In this example, identify what is consistent about the findings. In this example, also identify some of the irrelevant characteristics that have varied from one study to the next.

pp. 350-351 In the neurochemistry of psychiatric disorder, what three criteria are used to evaluate whether findings from animal models can be generalized to human applications?

p. 351 When the EPA sets limits for human exposure to toxic chemicals, what adjustments are made for (a) interindividual differences in human responsiveness to chemicals and (b) effects on infant development?

pp. 351-352 According to Gross (1993), how did the findings on secondhand smoke differ among studies of (a) nonsmoking spouses of smokers, (b) settings outside the U.S., where living conditions, ventilation, and smoking patterns are different from the U.S., and (c) workplace exposure in the U.S.? Which set of studies did Gross believe was most relevant to policies against smoking in U.S. public settings?

p. 352 In evaluating new drugs, the FDA depends on a three-stage process. In which stage do clinical trials of the drug take place? In general terms, what is accomplished in the other two stages of the evaluation process?

p. 352 Give an example of how a patient with prostate cancer might find out more about his chances of survival following surgery, other than simply considering the average survival rate of patients who have had the surgery.

p. 353 What is the difference between "research therapy" and "clinic therapy" in studies of psychotherapy effects? Which of the two is generally agreed to be effective? Do the results generalize to the other?

pp. 353-354 Shadish et al. describe five simple principles that scientists use in making generalizations. You do not have to memorize the names of these five principles. However, if given the name for a particular principle, be able to explain it and give an example.

p. 354 Although field researchers seldom use random sampling techniques to make generalized causal inferences, they are more likely to use _________________ sampling strategies.

p. 356 Explain what is meant by "purposive sampling of typical instances (PSI-Typ)" and give a concrete example. Explain what is meant by "purposive sampling of heterogeneous instances (PSI-Het)" and give a concrete example. (Note: You may want to wait to answer this question until after you have read Chapter 12.)

p. 356 What are the two tasks of generalized causal inference? (Consult the questions for page 341.)

p. 356 "Surface similarity" is similar to what (somewhat outdated) concept from measurement theory?

pp. 356 and 358 What do scientists probably rely on most of the time when deciding how to label treatments, outcomes, units (or persons), and settings?

p. 359 Give a concrete example of "thoughtful matching of the surface characteristics of an operation to those that are prototypical of a construct."

p. 359 Give an example of how a researcher would go about constructing a test of second-grade arithmetic ability that has "content relevance" and "representativeness."

p. 360 Give an example of how a researcher would go about selecting a sample of "Hispanics" that has "content relevance" and "representativeness."

p. 360 What is Campbell's "principle of proximal similarity"? To what kind of validity is it most relevant?

p. 360 Why can surface similarity be a "superficial" basis for making generalizations? What practical problem is likely to arise if a group of researchers tries to identify the "prototypical" characteristics of a construct?

p. 361 Explain why the use of metronomic dosing for the treatment of cancer in humans was an example of "making a generalization based on surface similarities." (You do not have to explain all the reasons physicians used this treatment. Just briefly describe the role that "generalization" probably played in their decision.)

p. 361 Some jurors are male, some are female. However, gender is irrelevant to membership in the category of "juror," or to the prototypical characteristics of being a juror. Therefore, for the construct of "juror," gender is what measurement theorists call a C_________________ I_______________.

p. 361 The idea that two different measures should correlate with each other no matter what unique irrelevancies are associated with each of them is called ______________.

p. 361 M__________________ O_________________ requires that all the operations used to index a construct are relevant to the construct of interest but that, across the set of operations, there will be heterogeneity in conceptually irrelevant features.

p. 362 Give a concrete example of how a researcher might inadvertently select participants in a way that accidentally creates homogeneity (the opposite of heterogeneity) on supposedly irrelevant characteristics. Why does this create a problem of confounding?

p. 362 In Brunswik's hypothetical study of whether people with squints are perceived as sly, what are the heterogeneous irrelevancies that would need to be varied? Why should they be varied? If they were NOT varied, why might the results be misleading?

p. 363 Suppose that a researcher studies the effect of family support on rehospitalization rates among non-Hispanic whites with schizophrenia. The researcher finds that the correlation between family support and rehospitalization is .25. A second researcher carries out a similar study, but with Hispanics. If the ethnicity of participants is irrelevant to the relationship of family support and rehospitalization, what is the expected outcome of the second study? If ethnicity is relevant to the relationship, what is the expected outcome of the second study?

p. 363 Describe a study in which a researcher begins by assuming that a certain feature of his/her study is irrelevant, but later learns that it is relevant.

pp. 364-365 A researcher wants to study the effects of a particular medication on the symptoms of schizophrenic patients in mental health long-term care settings. The researcher therefore identifies all patients in 10 nursing homes whose charts indicate that they have been diagnosed with schizophrenia. The patients are randomly assigned to receive the medication or not. The effects on their symptoms are measured with a self-report questionnaire. A critic questions the construct validity of the study in regard to (a) its characterization of the setting, (b) its characterization of the patients' diagnosis, and (c) its characterization of what the questionnaire is measuring. What specific criticisms can the critic make for each of these points, according to Shadish et al.?

p. 365 Give a concrete example of a study in which the findings indicate that a causal relationship between two variables goes to zero or is reversed depending on some third variable or factor.

p. 366 Using an example, explain the difference between interpolation and extrapolation.

p. 367 Confidence in extrapolation is greatest when ____________________.

p. 368 In contrast to surface similarity, causal explanations refer to what cognitive psychologists call d___________ or s_______________ similarity.

pp. 368-369 Give an example of a situation in which a deep similarity does not give rise to the same surface similarity. Do not use an example provided by Shadish. Instead, make up your own example.

p. 369 Using an example, explain why detailed knowledge of underlying or deep causal relationships can be a particularly powerful basis for making generalizations.

pp. 370-371 Why are Shadish et al. less than optimistic about the possibility of making generalizations based on complete causal theories in the social sciences?

p. 371 Explain the financial reasons that researchers do not frequently include features in their research that enhance its generalizability. Which two principles of generalized causal inference are probably most practical for application to single studies?

p. 371 A researcher wishes to study the effect of an exercise program on obesity among Hispanics. How might the researcher address the principle of surface similarity, to show that the results can indeed be generalized to "Hispanics"?

p. 372 Because a single study usually cannot address most questions about the generalizability of a research finding, what alternative do Shadish et al. suggest (which they will treat more fully in Chapter 13)?

p. 373 Which kind of validity is rarely a high priority in the prospective design of individual experiments? Why?
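A note on the sampling questions for p. 343: the relative margins of error of the three designs can be seen in a small simulation. The sketch below is not from Shadish et al.; the population (10 hypothetical schools of 100 students each, with scores that cluster within schools) and all numbers are invented for illustration. It draws many replications of each design and compares how much the sample mean bounces around: stratifying on school tends to shrink the standard error relative to simple random sampling, while sampling whole schools as clusters inflates it, because schools differ from one another.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: 10 schools ("clusters"/strata) of 100 students each.
# Scores are similar within a school but differ between schools, which is the
# condition that makes cluster samples noisier than simple random samples.
population = []
for school in range(10):
    school_mean = random.gauss(70, 8)   # schools differ from each other
    for _ in range(100):
        population.append((school, random.gauss(school_mean, 3)))

def simple_random_sample(n):
    # Simple random sampling: every student equally likely, drawn individually.
    return [score for _, score in random.sample(population, n)]

def stratified_sample(n):
    # Stratified random sampling: draw n // 10 students from EVERY school.
    out = []
    for school in range(10):
        members = [score for s, score in population if s == school]
        out.extend(random.sample(members, n // 10))
    return out

def cluster_sample(n_schools):
    # Cluster sampling: randomly pick whole schools, take everyone in them.
    schools = random.sample(range(10), n_schools)
    return [score for school, score in population if school in schools]

# Replicate each design 500 times; the spread of the replicated sample means
# estimates each design's standard error (i.e., its margin of error).
srs_means = [statistics.mean(simple_random_sample(200)) for _ in range(500)]
strat_means = [statistics.mean(stratified_sample(200)) for _ in range(500)]
cluster_means = [statistics.mean(cluster_sample(2)) for _ in range(500)]  # 2 schools = 200 students

print("Stratified std. error:", round(statistics.stdev(strat_means), 2))
print("SRS std. error:       ", round(statistics.stdev(srs_means), 2))
print("Cluster std. error:   ", round(statistics.stdev(cluster_means), 2))
```

All three designs use 200 students per replication, so the comparison isolates the design itself: the cluster design's standard error is much larger because picking 2 of 10 schools leaves the between-school variation largely uncancelled, whereas stratification removes it.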