Chapter 3 Review: Examining Relationships
Sam R., Kenan F., Rohil T., Daisy S.

The Big Idea 3.1
◦ The relationship between two variables can be strongly influenced by other variables that are lurking in the background.
◦ Explanatory variables can help explain or even cause changes in response variables. However, explanatory variables don't necessarily cause changes in response variables.
◦ A scatterplot is the most effective way to display the relationship between two quantitative variables.
◦ Our eyes are not good judges of how strong a linear relationship is.
◦ Correlation requires that both variables be quantitative, and it does not describe curved relationships between variables.
◦ Correlation is not resistant, and it is not a complete summary of two-variable data.

The Big Idea 3.2
◦ Regression requires that we have an explanatory variable and a response variable.
◦ Extrapolation can be used to predict outside the range of values of the explanatory variable.
◦ Residual plots make it easier to study the residuals.
◦ The coefficient of determination tells us how well the least-squares line does at predicting values of the response variable.

The Big Idea 3.3
◦ Correlation and regression describe only linear relationships.
◦ Extrapolation often produces unreliable predictions.
◦ Correlation is not resistant.
◦ Lurking variables can make a correlation or regression misleading.
◦ Association does not imply causation.
◦ Correlations based on averages are usually too high when applied to individuals.

Vocabulary 3.1
◦ A response variable measures an outcome of a study.
◦ An explanatory variable helps explain or influences changes in a response variable.
◦ Calling one variable explanatory and the other response doesn't necessarily mean that changes in one cause changes in the other!
◦ A scatterplot shows the relationship between two quantitative variables measured on the same individuals.
◦ Direction: positive or negative association
◦ Form: linear relationships, curved relationships, and clusters
◦ Strength: determined by how close the points in the scatterplot lie to a simple form such as a line
◦ Correlation (r) measures the strength and direction of the linear association between two quantitative variables x and y.
◦ r > 0 for a positive association and r < 0 for a negative association.
◦ Correlation is always between -1 and 1. The linear relationship is strongest when r is closest to 1 or -1.
◦ Correlation is not resistant, so outliers can greatly change the value of r.

Vocabulary 3.2
◦ A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
◦ The slope b of a regression line ŷ = a + bx is the rate at which the predicted response ŷ changes along the line as the explanatory variable x changes. b is the change in ŷ when x increases by 1.
◦ The y intercept a of a regression line is the predicted response ŷ when the explanatory variable x = 0.
◦ Extrapolation is the use of a regression line for prediction outside the range of values of the explanatory variable x from which the line was calculated.
◦ The least-squares regression line is the line that minimizes the sum of the squares of the vertical distances of the observed points from the line.
◦ Residuals are the differences between observed and predicted values of y: residual = y - ŷ.
◦ The coefficient of determination (r²) is the fraction of the variance of one variable that is explained by the least-squares regression on the other variable.

Vocab 3.3
Outliers
◦ An observation that lies outside the overall pattern of the other observations.
◦ Points that are outliers in the y direction of a scatterplot have large regression residuals, but other outliers need not have large residuals.
Influential observations
◦ An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation.
◦ Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line.
Lurking variables
◦ A variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.

Key topics
When exploring a bivariate relationship:
◦ Make a scatterplot and remember to interpret it: strength, direction, form.
◦ Define x and y. Describe the mean and standard deviation of each in context.
◦ Find the least-squares regression line. Write it in context.
◦ Construct and interpret a residual plot.
◦ Interpret r and r² in context.
◦ Use the LSRL to make predictions...

Formula Cheat sheet
◦ Correlation always satisfies -1 ≤ r ≤ 1.
◦ If r is equal to ±1, then all points lie exactly on a straight line.
◦ The least-squares regression line is ŷ = a + bx, with slope b = r∙Sy/Sx.
◦ r² = (SST - SSE)/SST
◦ SST = ∑(y - ȳ)²
◦ SSE = ∑(y - ŷ)²
◦ residual = y - ŷ

Calculator keystrokes
◦ LinRegBx(Xlist,Ylist,frequency) for data table
◦ ShowLinear() for graphs
◦ 2VarStat(Xlist,Ylist)

Question 1
Explain why you should not use the LSRL calculated earlier (for predicting fat gain from NEA) to predict NEA from fat gain.

NEA change (cal)   -94   -57   -29   135   143   151   245   355   392
Fat gain (kg)      4.2   3.0   3.7   2.7   3.2   3.6   2.4   1.3   3.8

NEA change (cal)   473   489   535   571   580   620   690
Fat gain (kg)      1.7   1.6   2.2   1.0   0.4   2.3   1.1

Question 2
Use the data from Example 3.9 and your calculator to obtain the equation of the LSRL that would be appropriate for predicting NEA from fat gain.

Questions 3 and 4
Suppose the new subject's fat gain is 3.0 kg. One of the original 16 subjects had a fat gain of 3.0 kg, and that subject's NEA change was -57 calories. Explain why you should not just predict an NEA change of -57 calories for this new subject. What NEA change should you predict for this individual?
Interpret the value of r² you obtained in part (b). How does this compare to the r² we obtained earlier for the line ŷ = 3.505 - 0.00344x? Explain why this makes sense.

Question 5
The Sanchez household is about to install solar panels to reduce the cost of heating their house. In order to know how much the panels help, they record their consumption of natural gas before the panels are installed. Gas consumption is higher in cold weather, so the relationship between outside temperature and gas consumption is important.
A) Describe the direction, form, and strength of the relationship.
B) About how much gas does the regression line predict that the family will use in a month that averages 20 degree-days per day?

Question 6
Consider the following historical data:
How strong is the relationship, and is it positive or negative?

Answers
1. The earlier LSRL was fit to predict fat gain from NEA: it minimizes the squared errors in fat gain, not in NEA. Predicting NEA from fat gain requires a new least-squares line with fat gain as the explanatory variable.
2. Regressing NEA on fat gain gives a different line from the earlier one, roughly NEA = 745.7 - 176.3(fat gain). The earlier line, ŷ = 3.505 - 0.00344x, predicts fat gain from NEA.
3 & 4. The residuals are large, so we cannot assume that one individual observation stands in for the whole association. The line from Question 2 predicts an NEA change of roughly 217 calories for a fat gain of 3.0 kg. The r² is still about 0.606, meaning that about 60.6% of the variation in NEA change is explained by the least-squares regression on fat gain. It matches the r² for ŷ = 3.505 - 0.00344x because r² does not depend on which variable is explanatory.
5. A) Positive, linear, and very strong. B) 500 cubic feet per day.
6. There is a perfect linear relationship between x and y. This is true because r² is equal to 1.00.

Helpful Hints
◦ Correlation makes no distinction between explanatory and response variables. (It makes no difference which variable is x or y.)
◦ The correlation, r, does not change when units of measurement are changed.
◦ Correlation never describes curved relationships between variables, no matter the strength.
◦ Correlation is not resistant and is strongly affected by outlying observations.
◦ The size of the regression slope does not tell how important a relationship is.
◦ Extrapolation often produces unreliable results.
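
Worked check for Questions 1 to 4
The cheat-sheet formulas can be checked numerically on the NEA and fat gain table from Question 1. The following is only a sketch in plain Python, not the calculator procedure from the slides. The helper names (mean, lsrl) are made up for illustration. It assumes the garbled table entries "3." and "1 .4" stand for 3.0, 1.0, and 0.4, which gives 16 subjects and agrees with the 3.0 kg subject mentioned in Questions 3 and 4. It fits both regressions (fat gain on NEA, and NEA on fat gain) and compares the proper prediction at 3.0 kg with what you would get by solving the first line for NEA.

```python
# Sketch: least-squares fits for the NEA / fat gain table (values as transcribed).
# Assumption: "3." is read as 3.0 and "1 .4" as the two values 1.0 and 0.4.
nea = [-94, -57, -29, 135, 143, 151, 245, 355, 392,
       473, 489, 535, 571, 580, 620, 690]
fat_gain = [4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3, 3.8,
            1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1]


def mean(values):
    return sum(values) / len(values)


def lsrl(x, y):
    """Return (a, b, r_squared) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                 # algebraically the same as b = r * Sy / Sx
    a = y_bar - b * x_bar         # the line passes through (x-bar, y-bar)
    sst = sum((yi - y_bar) ** 2 for yi in y)                     # total variation in y
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # squared residuals
    r_squared = (sst - sse) / sst
    return a, b, r_squared


# Fat gain predicted from NEA (the "earlier" line).
a_fat, b_fat, r2_fat = lsrl(nea, fat_gain)

# NEA predicted from fat gain (the line Question 2 asks for).
a_nea, b_nea, r2_nea = lsrl(fat_gain, nea)

# Prediction for a new subject with a fat gain of 3.0 kg.
predicted_nea = a_nea + b_nea * 3.0

# For contrast: solving the earlier line for NEA, which is NOT the LSRL prediction.
inverted_nea = (3.0 - a_fat) / b_fat

print(f"fat gain = {a_fat:.3f} + ({b_fat:.5f}) * NEA, r^2 = {r2_fat:.3f}")
print(f"NEA = {a_nea:.1f} + ({b_nea:.1f}) * fat gain, r^2 = {r2_nea:.3f}")
print(f"LSRL prediction at 3.0 kg: {predicted_nea:.1f}")
print(f"Inverting the earlier line: {inverted_nea:.1f}")
```

With the values as transcribed, the first fit should land close to ŷ = 3.505 - 0.00344x with r² near 0.606, and both fits should report the same r². The two predictions at 3.0 kg differ noticeably (roughly 217 versus roughly 147), which is why the line has to be refit with fat gain as the explanatory variable.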