Chapter 4
Describing the Relation Between Two Variables
Sullivan – Statistics: Informed Decisions Using Data – 2nd Edition

Overview
● Data on a single variable is univariate data
● Many, if not most, real-world models involve more than one variable … multivariate data
● In this chapter we study relations between two variables … bivariate data

Chapter 4 – Describing the Relation Between Two Variables
● Only Sections 1 and 2 are covered
  Section 1 – Scatter Diagrams and Correlation
  Section 2 – Least-Squares Regression

Chapter 4 – Section 1: Scatter Diagrams and Correlation
● In many studies, we measure more than one variable for each individual
● Some examples are
  Rainfall amounts and plant growth
  Exercise and cholesterol levels for a group of people
  Height and weight for a group of people
● In these cases, we are interested in whether the two variables have some kind of relationship
● When we have two variables, they could be related in one of several ways
  They could be unrelated
  One variable (the explanatory or predictor variable) could be used to explain the other (the response or dependent variable)
  One variable could be thought of as causing the other variable to change
● In this chapter, we examine the second case … explanatory and response variables
● Sometimes it is not clear which variable is the explanatory variable and which is the response variable
● Sometimes the two variables are related without either one being an explanatory variable
● Sometimes the two variables are both affected by a third variable, a lurking variable, that was not included in the study
● An example of a lurking variable
  A researcher studies a group of elementary school children
  Y = the student's height
  X = the student's shoe size
  It is not reasonable to claim that shoe size causes height to change
  The lurking variable of age affects both of these variables
● Some other examples
  Rainfall amounts and plant growth
    Explanatory variable – rainfall
    Response variable – plant growth
    Possible lurking variable – amount of sunlight
  Exercise and cholesterol levels
    Explanatory variable – amount of exercise
    Response variable – cholesterol level
    Possible lurking variable – diet
● The most useful graph for showing the relationship between two quantitative variables is the scatter diagram
● Each individual is represented by a point in the diagram
  The explanatory (X) variable is plotted on the horizontal axis
  The response (Y) variable is plotted on the vertical axis
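As a minimal sketch of how such a scatter diagram might be produced with software: the package (matplotlib), the rainfall/plant-growth numbers, and the variable names below are assumptions chosen for illustration, not part of the text.

```python
# A minimal sketch: drawing a scatter diagram with the explanatory
# variable on the horizontal axis and the response on the vertical axis.
# The rainfall/plant-growth data are invented for illustration.
import matplotlib.pyplot as plt

rainfall_inches = [10, 14, 18, 22, 26, 30, 34]   # explanatory (X) variable
plant_growth_cm = [12, 15, 19, 24, 26, 31, 33]   # response (Y) variable

plt.scatter(rainfall_inches, plant_growth_cm)
plt.xlabel("Rainfall (inches)")                  # explanatory variable, horizontal axis
plt.ylabel("Plant growth (cm)")                  # response variable, vertical axis
plt.title("Scatter diagram of plant growth versus rainfall")
plt.show()
```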
● An example of a scatter diagram
  [Scatter diagram omitted – note that its vertical scale was truncated]
● There are several different types of relations between two variables
  A relationship is linear when, plotted on a scatter diagram, the points follow the general pattern of a line
  A relationship is nonlinear when, plotted on a scatter diagram, the points follow a general pattern, but that pattern is not a line
  There is no relation when, plotted on a scatter diagram, the points do not show any pattern
● Linear relations have points that cluster around a line
● Linear relations can be either positive (the points slant upward to the right) or negative (the points slant downward to the right)
● For positive (linear) associations
  Above-average values of one variable are associated with above-average values of the other (above/above, the points trend up and to the right)
  Below-average values of one variable are associated with below-average values of the other (below/below, the points trend down and to the left)
  Examples: "Age" and "Height" for children; "Temperature" and "Sales of ice cream"
● For negative (linear) associations
  Above-average values of one variable are associated with below-average values of the other (above/below, the points trend down and to the right)
  Below-average values of one variable are associated with above-average values of the other (below/above, the points trend up and to the left)
  Examples: "Age" and "Time required to run 50 meters" for children; "Temperature" and "Sales of hot chocolate"
● Nonlinear relations have points that follow a trend, but not one that clusters around a line
  The trend has some bend in it
● When two variables are not related
  There is no linear trend and no nonlinear trend
  Changes in the value of one variable do not seem to have any relation to changes in the other
● Nonlinear relations and no relations are very different
  Nonlinear relations are definite patterns … just not patterns that look like lines
  No relation means that no pattern appears at all
● Examples of nonlinear relations
  "Age" and "Height" for people (including both children and adults)
  "Temperature" and "Comfort level" for people
● Examples of no relation
  "Temperature" and "Closing price of the Dow Jones Industrial Average" (probably)
  "Age" and "Last digit of telephone number" for adults
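The sketch below simulates the four patterns just described (positive linear, negative linear, nonlinear, no relation) so they can be compared side by side. The data-generating formulas are assumptions made only to produce recognizable shapes; they are not from the text.

```python
# Simulated examples of four patterns: positive linear, negative linear,
# nonlinear, and no relation.  The generating formulas are illustrative only.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 60)

panels = {
    "Positive linear":  2.0 * x + rng.normal(0, 2, x.size),
    "Negative linear": -2.0 * x + rng.normal(0, 2, x.size),
    "Nonlinear":        (x - 5) ** 2 + rng.normal(0, 2, x.size),
    "No relation":      rng.normal(0, 5, x.size),
}

fig, axes = plt.subplots(1, 4, figsize=(14, 3))
for ax, (title, y) in zip(axes, panels.items()):
    ax.scatter(x, y, s=10)
    ax.set_title(title)
plt.tight_layout()
plt.show()
```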
● The linear correlation coefficient is a measure of the strength of the linear relation between two quantitative variables
● The sample correlation coefficient r is

  r = Σ [ (x_i − x̄)/s_x · (y_i − ȳ)/s_y ] / (n − 1)

  where x̄ and ȳ are the sample means, s_x and s_y are the sample standard deviations, and n is the number of (x, y) pairs
● This should be computed with software (and not by hand) whenever possible
● Some properties of the linear correlation coefficient
  r is a unitless measure (so r is the same for a data set whether x and y are measured in feet, inches, meters, or fathoms)
  r is always between –1 and +1
  Positive values of r correspond to positive relations
  Negative values of r correspond to negative relations
● Some more properties of the linear correlation coefficient
  The closer r is to +1, the stronger the positive relation … when r = +1, there is a perfect positive relation
  The closer r is to –1, the stronger the negative relation … when r = –1, there is a perfect negative relation
  The closer r is to 0, the weaker the linear relation (either positive or negative)
● Examples of positive correlation
  [Scatter diagrams omitted: strong positive r = 0.8, moderate positive r = 0.5, very weak r = 0.1]
  In general, if the correlation is visible to the eye, it is likely to be strong
● Examples of negative correlation
  [Scatter diagrams omitted: strong negative r = –0.8, moderate negative r = –0.5, very weak r = –0.1]
  In general, if the correlation is visible to the eye, it is likely to be strong
● Nonlinear correlation and no correlation
  [Scatter diagrams omitted: one nonlinear relation and one with no relation]
  Both sets of variables have r = 0.1, but the difference is that the nonlinear relation shows a clear pattern while the other shows none
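A hedged sketch of how software might compute r, once directly from the formula above and once with a library routine (np.corrcoef is just one such function); the small data set is invented for illustration.

```python
# Computing the sample correlation coefficient r two ways:
# (1) directly from  r = sum[ (x_i - xbar)/s_x * (y_i - ybar)/s_y ] / (n - 1)
# (2) with numpy's built-in corrcoef.  The data are invented for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

n = x.size
x_bar, y_bar = x.mean(), y.mean()
s_x = x.std(ddof=1)          # sample standard deviations (divide by n - 1)
s_y = y.std(ddof=1)

r_formula = np.sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)) / (n - 1)
r_numpy = np.corrcoef(x, y)[0, 1]

print(f"r from the formula : {r_formula:.4f}")
print(f"r from np.corrcoef : {r_numpy:.4f}")   # the two values agree
```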
● Correlation is not causation!
● Just because two variables are correlated does not mean that one causes the other to change
● There is a strong correlation between shoe size and vocabulary size for grade school children
  Clearly, larger shoe sizes do not cause larger vocabularies
  Clearly, larger vocabularies do not cause larger shoe sizes
  The lurking variable of age affects both
● Lurking variables often result in confounding

Summary: Chapter 4 – Section 1
● Correlation between two variables can be described with both visual (graphical) and numeric methods
● Visual method: scatter diagrams
● Numeric method: the linear correlation coefficient

Chapter 4 – Section 2: Least-Squares Regression
● If we have two variables X and Y, we often would like to model the relation as a line
● Draw a line through the scatter diagram
● We want to find the line that "best" describes the linear relationship … the regression line
● We want to use a linear model
● Linear models can be written in several different (equivalent) ways
  y = mx + b
  y – y1 = m(x – x1)
  y = b1 x + b0
● Because the slope and the intercept are important to analyze, we will use the form y = b1 x + b0
● The difference between the observed value and the predicted value is called an error or residual
● The formula for the residual is always
  Residual = Observed – Predicted
● For example, say that we want to predict a value of y for a specific value of x
  Assume that we are using y = 10x + 25 as our model
  To predict the value of y when x = 3, the model gives y = 10 · 3 + 25 = 55, a predicted value of 55
  Assume the actual value of y when x = 3 is 50
  The actual value is 50 and the predicted value is 55, so the residual (or error) is 50 – 55 = –5
● On the scatter diagram, the residual is the vertical distance between the observed value of y and the predicted value of y at the x value of interest
  [Diagram omitted: the model line, the observed value of y, the predicted value of y, and the residual at the x value of interest]
● We want to minimize the residuals, but we need to define what this means
● We use the method of least squares
  We consider a possible linear model
  We calculate the residual for each point
  We add up the squares of the residuals … Σ(residuals)²
● The line with the smallest Σ(residuals)² is called the least-squares regression line
● The equation of the least-squares regression line is written y = b1 x + b0
  b1 is the slope of the least-squares regression line
  b0 is the y-intercept of the least-squares regression line
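To make the residual arithmetic and the least-squares criterion concrete, here is a sketch that reproduces the y = 10x + 25 residual calculation and then compares Σ(residuals)² for a candidate line against the least-squares line. The five data points are invented, and np.polyfit is used only as one example of least-squares software.

```python
# Residuals and the least-squares criterion, illustrated on invented data.
import numpy as np

# (1) The worked example: model y = 10x + 25, observed y = 50 at x = 3.
predicted = 10 * 3 + 25            # 55
observed = 50
residual = observed - predicted    # Residual = Observed - Predicted = -5
print("residual for the worked example:", residual)

# (2) Comparing candidate lines by their sum of squared residuals.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([37.0, 44.0, 50.0, 68.0, 74.0])   # invented observations

def sum_sq_residuals(b1, b0):
    """Sum of squared residuals for the candidate line y = b1*x + b0."""
    return float(np.sum((y - (b1 * x + b0)) ** 2))

print("candidate line y = 10x + 25 :", sum_sq_residuals(10, 25))

# np.polyfit(x, y, 1) returns the least-squares slope and intercept;
# by definition its sum of squared residuals is the smallest possible.
b1, b0 = np.polyfit(x, y, 1)
print(f"least-squares line          : y = {b1:.2f}x + {b0:.2f},",
      "sum of squares =", round(sum_sq_residuals(b1, b0), 2))
```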
● Finding the values of b1 and b0 by hand is a very tedious process
● You should use software for this
● Finding the coefficients b1 and b0 is only the first step of a regression analysis
  We need to interpret the slope b1
  We need to interpret the y-intercept b0
● Interpreting the slope b1
  The slope is sometimes referred to as Rise / Run
  The slope is also sometimes referred to as (Change in y) / (Change in x)
  The slope relates changes in y to changes in x
● For example, if b1 = 4
  If x increases by 1, then y is predicted to increase by 4
  If x decreases by 1, then y is predicted to decrease by 4
  A positive linear relationship
● For example, if b1 = –7
  If x increases by 1, then y is predicted to decrease by 7
  If x decreases by 1, then y is predicted to increase by 7
  A negative linear relationship
● For example, say that a researcher studies the population of a town (the y, or response, variable) in each year (the x, or predictor, variable)
  To simplify the calculations, years are measured from 1900 (i.e. x = 55 is the year 1955)
  The model used is y = 300x + 12,000
  A slope of 300 means that the model predicts that, on average, the population increases by 300 per year
● Interpreting the y-intercept b0
  Sometimes b0 has an interpretation, and sometimes it does not
  If 0 is a reasonable value for x, then b0 can be interpreted as the value of y when x is 0
  If 0 is not a reasonable value for x, then b0 does not have an interpretation
  In general, we should not use the model for values of x that are much larger or much smaller than the observed values
● In the population example above, the intercept of 12,000 means that the model predicts the town had a population of 12,000 in the year 1900 (i.e. when x = 0)
● After finding the slope b1 and the intercept b0, it is very useful to compute the residuals, particularly Σ(residuals)²
  Again, this is a tedious computation by hand
  Any least-squares regression software will compute this quantity

Summary: Chapter 4 – Section 2
● We can find the least-squares regression line, the "best" linear model for a set of data
● The slope can be interpreted as the change in y for every change of 1 in x
● The intercept can be interpreted as the value of y when x is 0, as long as a value of 0 for x is reasonable
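As a closing illustration of Section 2, the sketch below uses the given y = 300x + 12,000 population model for a prediction, then shows how regression software could recover the slope, intercept, and Σ(residuals)² from data. The yearly population figures are invented so the code has something to fit; they are not from the text.

```python
# The population model from the slides, plus a software fit on invented data.
import numpy as np

# Prediction with the given model y = 300x + 12,000 (x = years since 1900).
x_1955 = 55
predicted_pop_1955 = 300 * x_1955 + 12_000     # 28,500 people predicted for 1955
print("predicted population in 1955:", predicted_pop_1955)

# Invented census figures (years since 1900, population) for illustration only.
years = np.array([0, 10, 20, 30, 40, 50])
population = np.array([12_100, 14_900, 18_200, 20_800, 24_300, 26_900])

b1, b0 = np.polyfit(years, population, 1)      # least-squares slope and intercept
residuals = population - (b1 * years + b0)     # Residual = Observed - Predicted
print(f"slope b1     = {b1:.1f}   (predicted average growth per year)")
print(f"intercept b0 = {b0:.1f}   (predicted population when x = 0, i.e. in 1900)")
print(f"sum of squared residuals = {np.sum(residuals ** 2):.1f}")
```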