AP STATISTICS
LESSON 4 – 2 (DAY 1)
Cautions About Correlation and Regression

ESSENTIAL QUESTION: What is causation and how can it be determined?

OBJECTIVES:
• To examine data for causation.
• To be careful when using models for extrapolation.
• To understand that causation may be hard to determine due to the effects of lurking variables, common response, and confounding variables.

Correlation and Regression
• Correlation and regression describe only linear relationships.
• The correlation r and the least-squares regression line are not resistant. One influential or incorrectly entered data point can greatly change these measures.
• Always plot the data before interpreting regression or correlation.

Extrapolation
Extrapolation is the use of a regression line or curve for prediction far outside the domain of values of the explanatory variable x that you used to obtain the line or curve. Such predictions are often not accurate.

Page 226, Example 4.10: DISCRIMINATION IN MEDICAL TREATMENT

Lurking Variables
A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.
• The relationship between two variables can be strongly influenced by lurking variables.
• Many lurking variables change systematically over time.
• One method of detecting whether time has an influence is to plot both the residuals and the response variable against the time order of the observations, whenever the time order is available.

Using averaged data
• Many regression or correlation studies work with averages or other measures that combine information from many individuals. Note this carefully and resist the temptation to apply the results of such studies to individuals.
• Correlations based on averages are usually too high when applied to individuals. This is another reminder that it is important to note exactly what variables were measured in a statistical study.
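The point about averaged data can be checked with a small simulation. The sketch below is only an illustration on made-up data: the number of groups, the group size, the amount of individual scatter, and all variable names are assumptions chosen for the demonstration, not values from the lesson. It computes r once for all individuals and once for the group averages.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation r, computed directly from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)   # fixed seed so the sketch is repeatable

NUM_GROUPS = 10          # hypothetical groups (for example, schools)
GROUP_SIZE = 200         # hypothetical number of individuals per group
NOISE_SD = 5.0           # individual scatter, chosen only for illustration

individual_x, individual_y = [], []
mean_x_by_group, mean_y_by_group = [], []

for g in range(NUM_GROUPS):
    # Every individual in group g shares the group level g,
    # plus a large amount of individual-to-individual scatter.
    xs = [g + rng.gauss(0, NOISE_SD) for _ in range(GROUP_SIZE)]
    ys = [g + rng.gauss(0, NOISE_SD) for _ in range(GROUP_SIZE)]
    individual_x.extend(xs)
    individual_y.extend(ys)
    mean_x_by_group.append(sum(xs) / GROUP_SIZE)
    mean_y_by_group.append(sum(ys) / GROUP_SIZE)

r_individuals = pearson_r(individual_x, individual_y)
r_means = pearson_r(mean_x_by_group, mean_y_by_group)

print(f"r for individuals:    {r_individuals:.2f}")
print(f"r for group averages: {r_means:.2f}")
```

Under these assumptions the correlation for the group averages should come out much closer to 1 than the correlation for individuals, because averaging cancels most of the individual scatter. That is why a correlation reported for averages overstates how well x predicts y for a single individual.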
The question of causation
In many studies of the relationship between two variables, the goal is to establish that changes in the explanatory variable cause changes in the response variable. Even when a strong association is present, the conclusion that this association is due to a causal link between the variables is often elusive.

Explaining association: causation
Variables x and y show a strong association (dashed line). This association may be the result of any of several causal relationships (solid arrows).

Causal associations
• Causation: Changes in x cause changes in y.
• Common response: Changes in both x and y are caused by changes in a lurking variable z.
• Confounding: The effect (if any) of x on y is confounded with the effect of a lurking variable z.
• Even when direct causation is present, it is rarely a complete explanation of an association between two variables.

Explaining association: Common Response
• Beware of lurking variables when thinking about an association between two variables.
• The observed association between the variables x and y is explained by a lurking variable z. Both x and y change in response to changes in z. This common response creates an association even though there may be no direct causal link between x and y.

Confounding
• Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be either explanatory variables or lurking variables.
• Even a very strong association between two variables is not by itself good evidence that there is a cause-and-effect link between the variables.

What are the criteria for establishing causation when we can’t do an experiment?
• The association is strong.
• The association is consistent.
• Higher doses are associated with stronger responses.
• The alleged cause precedes the effect in time.
• The alleged cause is plausible.
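The common response situation described earlier can be illustrated with simulated data. The sketch below is a hypothetical example, not data from the lesson: the coefficients, sample size, and variable names are assumptions. Here x and y each depend on a lurking variable z and have no direct link to each other. The code compares the correlation between x and y with the correlation that remains after the linear effect of z has been removed from both (by taking least-squares residuals on z).

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation r, computed directly from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(xs, ys):
    """Residuals of ys from the least-squares regression line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(1)   # fixed seed so the sketch is repeatable
n = 500                  # hypothetical sample size

# z is the lurking variable; x and y both respond to z,
# but neither one appears in the formula for the other.
z = [rng.uniform(0, 10) for _ in range(n)]
x = [2.0 * zi + rng.gauss(0, 1) for zi in z]
y = [3.0 * zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)

# Remove the linear effect of z from both variables, then correlate what is left.
r_after_z = pearson_r(residuals(z, x), residuals(z, y))

print(f"r between x and y:            {r_xy:.2f}")
print(f"r after removing effect of z: {r_after_z:.2f}")
```

With these assumed settings, x and y should show a strong association even though neither causes the other, and the association should nearly vanish once z is accounted for. In a real study the lurking variable is usually unmeasured, so this check is often not available, which is why a strong association alone is weak evidence of causation.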