Looking at Data-Relationships 2.1 – Scatter plots

Definitions
• Scatter plot: shows the relationship between two quantitative variables measured on the same individuals
• Explanatory variable: a variable that may explain or even cause changes in another variable
• Response variable: a variable that changes in response to the explanatory variable
• Scatter plot axes: x axis (explanatory variable), y axis (response variable)
• Examining a scatter plot
  – Look for the overall pattern (linear, non-linear, quadratic, etc.) and for deviations from it
  – Describe the overall pattern by the form (line, parabola), direction (positive, negative), and strength (strong, weak) of the relationship
  – An important kind of deviation is an outlier
  – Positive association: high values of the two variables tend to occur together
  – Negative association: high values of one variable tend to occur with low values of the other variable
  – Strength: determined by how closely the points in the scatter plot follow a simple form such as a line
• Prep work
  – Do problem 2.7 in the textbook. Store second-test scores in list L3 and final-exam scores in list L4
  – Do problem 2.11

Looking at Data-Relationships 2.2 – Correlation

Definitions
• Correlation r: measures the direction and strength of the linear (straight-line) association between two quantitative variables x and y
• You can calculate a correlation for any scatter plot, but r measures only linear relationships
• r > 0 -> positive association
• r < 0 -> negative association
• r is always between -1 and 1, endpoints included
• Perfect correlation, r = +1 or r = -1, occurs only when the points lie exactly on a straight line
• Formula for the correlation coefficient between x and y: $r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$, where the standard deviation of x is $s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ and $s_y$ is defined the same way for y
• Correlation ignores the distinction between explanatory and response variables
• Correlation is not resistant: outliers can greatly change the value of r
• Prep work: Do problem 2.29.
Store price in list L5 and deforestation in list L6

Looking at Data-Relationships 2.3 – Least-Squares Regression

Definitions
• Regression line: a straight line that describes the relationship between x and y. Requires an explanatory variable and a response variable
• Fitting a line to data: $y = b_0 + b_1 x$, where $b_0$ is the y-intercept and $b_1$ is the slope (the amount by which y changes when x increases by one unit)
• Extrapolation: use of a regression line for prediction far outside the range of values of the explanatory variable x used to obtain the line
• Least-squares regression line of y on x: the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible. Requires an explanatory variable and a response variable
• Equation of the least-squares regression line: $\hat{y} = b_0 + b_1 x$, with y-intercept $b_0 = \bar{y} - b_1 \bar{x}$ and slope $b_1 = r \frac{s_y}{s_x}$
• $r^2$ in regression: the fraction of the variation in the values of y that is explained by the least-squares regression of y on x

Looking at Data-Relationships 2.4 – Cautions about Correlation and Regression

Definitions
• Residual: the difference between an observed value of the response and the value predicted by the regression line. Requires an explanatory variable and a response variable
• Residual equation: residual = observed y - predicted y = $y - \hat{y}$
• Special property: the mean of the least-squares residuals is always zero
• Residual plot: a scatter plot of the regression residuals against the explanatory variable. It helps us assess the fit of a regression line
• Outlier: an observation that lies outside the overall pattern
• Influential observation: an observation is influential if removing it would markedly change the result of the calculation
• Lurking variable: a variable that is not among the explanatory or response variables but may still influence the interpretation of relationships among those variables
• Association does not imply causation
• Prep work: Brain Activity vs. Empathy score example. Will women who are higher in empathy respond more strongly when their partner has a painful experience?
  1) Store empathy scores in list L1 and brain activity in list L2
  2) Use the TI-84 to find the equation of the least-squares regression line of brain activity on empathy score (use 4 decimals for coefficients)
  3) Use the equation to predict the brain activity for subject 1 from her empathy score
  4) Find the residual for subject 1
  5) Subject 16 can be considered a possible outlier. Find the equation of the least-squares regression line of brain activity on empathy score without this outlier

Looking at Data-Relationships 2.6 – The Question of Causation

Definitions
• Some observed associations between two variables are due to a cause-and-effect relationship between these variables, but others are explained by lurking variables
• The effect of lurking variables can operate through common response, in which changes in both the explanatory and response variables are caused by changes in lurking variables
• Confounding of two variables (either explanatory or lurking variables) means that we cannot distinguish their effects on the response variable
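
The correlation, least-squares, and residual formulas from sections 2.2 to 2.4 can be checked numerically with a short Python sketch. This is only an illustration: the data are made-up values (not the textbook's empathy or deforestation data), and the function names `correlation`, `least_squares_line`, and `residuals` are hypothetical helpers, not part of any TI-84 or library interface.

```python
import math
from statistics import mean, stdev


def correlation(x, y):
    """r = 1/(n-1) * sum of products of the standardized x and y values."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n-1)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (n - 1)


def least_squares_line(x, y):
    """Return (b0, b1) for y-hat = b0 + b1*x.

    Uses slope b1 = r * s_y / s_x and intercept b0 = y_bar - b1 * x_bar.
    """
    r = correlation(x, y)
    b1 = r * stdev(y) / stdev(x)
    b0 = mean(y) - b1 * mean(x)
    return b0, b1


def residuals(x, y, b0, b1):
    """Residual = observed y - predicted y, one value per data point."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # explanatory variable
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # response variable

r = correlation(x, y)
b0, b1 = least_squares_line(x, y)
res = residuals(x, y, b0, b1)

# r^2 as the fraction of variation in y explained by the regression:
# 1 - (sum of squared residuals) / (total variation of y about its mean).
sse = sum(e ** 2 for e in res)
sst = sum((yi - mean(y)) ** 2 for yi in y)
explained_fraction = 1 - sse / sst

print(f"r = {r:.4f}")
print(f"y-hat = {b0:.4f} + {b1:.4f} x")
print(f"residual for the first point = {res[0]:.4f}")
print(f"mean of residuals = {mean(res):.4f}")
print(f"r^2 = {r ** 2:.4f}, explained fraction = {explained_fraction:.4f}")
```

For these made-up values the slope is 0.6 and the intercept is 2.2, the residuals average to zero, and the explained fraction matches r squared, which is what the special property and the r-squared definition lead one to expect. Refitting after dropping one point (as in step 5 of the prep work) only requires calling `least_squares_line` on the shortened lists.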