Simple Linear Regression and Correlation: Inferential Methods
Chapter 13, AP Statistics
Peck, Olsen and Devore

Topic 2: Summary of Bivariate Data
In Topic 2 we discussed summarizing bivariate data. Specifically, we were interested in summarizing linear relationships between two measurable characteristics. We summarized these linear relationships by performing a linear regression using the method of least squares.

Least Squares Regression
- Graphically display the data in a scatterplot (form, strength and direction).
- Calculate Pearson's correlation coefficient (the strength of the linear association).
- Perform the least squares regression: ŷ = a + bx.
- Determine if the model is appropriate: inspect the residual plot (no patterns).
- Determine the coefficient of determination (how good the model is as a prediction tool).
- Use the model as a prediction tool.

Interpretation
We practiced interpreting:
- Pearson's correlation coefficient
- the coefficient of determination
- the variables in ŷ = a + bx
- the standard deviation of the residuals

Minitab Output
(Example Minitab regression output shown on slide.)

Simple Linear Regression Model
'Simple' because we had only one independent variable: ŷ = a + bx. We interpreted ŷ as the predicted value of y for a specific value of x.

When y = f(x), we can describe this as a deterministic model; that is, the value of y is completely determined by a given value of x. That wasn't really the case when we used our linear regressions: the observed value of y was equal to our predicted value plus or minus some amount. That is, y = a + bx + e. We call this a probabilistic model. Without e, the observed (x, y) pairs would fall exactly on the regression line.

Now consider this …
How did we calculate the coefficients in our linear regression models? We were actually estimating population parameters using a sample. That is, the fitted model y = a + bx + e is an estimate of the population regression line y = α + βx + e. We can consider a and b estimates for α and β.

Basic Assumptions for the Simple Linear Regression Model
- The distribution of e at any particular value of x has a mean value of 0; that is, μ_e = 0.
- The standard deviation of e is the same for any value of x; it is always denoted σ.
- The distribution of e at any value of x is normal.
- The random deviations e are independent of one another.

Another interpretation of ŷ
Consider y = α + βx + e, where the coefficients are fixed and e is normally distributed. The sum of a fixed number (α + βx) and a normally distributed variable is itself normally distributed (Chapter 7), so y is normally distributed. The mean of y is α + βx plus the mean of e, which is 0. So another interpretation: the mean y value for a given x value is α + βx.

Distribution of y
Since y = α + βx + e, we can now see that y is normally distributed with mean α + βx. The variance of y is the same as the variance of e, which is σ². An estimate for σ is s_e, the standard deviation of the residuals.

Assumption
The major assumption behind all of this is that the random deviation e is normally distributed. We'll talk more later about how to check whether this assumption is reasonable.
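To make the arithmetic concrete, here is a minimal sketch (not from the textbook or the slides) of computing the least squares coefficients a and b, Pearson's r, and the residual standard deviation s_e by hand with NumPy. The data values are hypothetical and stand in for any paired (x, y) measurements.

```python
# Minimal sketch: least squares line y-hat = a + bx, Pearson's r, and s_e.
import numpy as np

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
a = y_bar - b * x_bar                                             # intercept

y_hat = a + b * x                  # predicted values
residuals = y - y_hat              # observed minus predicted
ss_resid = np.sum(residuals ** 2)
s_e = np.sqrt(ss_resid / (n - 2))  # estimate for sigma, df = n - 2

r = np.sum((x - x_bar) * (y - y_bar)) / np.sqrt(
    np.sum((x - x_bar) ** 2) * np.sum((y - y_bar) ** 2))          # Pearson's r
print(f"y-hat = {a:.3f} + {b:.3f}x,  r = {r:.3f},  r^2 = {r**2:.3f},  s_e = {s_e:.3f}")
```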
Inferences about the slope of the population regression line
Now we are going to make some inferences about the slope β of the population regression line. Specifically, we'll construct a confidence interval and then perform a hypothesis test: the model utility test for simple linear regression.

Just to repeat …
We said the population regression model is y = α + βx + e. The coefficients of this model are fixed but unknown (parameters), so using the method of least squares we estimate these parameters from a sample of data (statistics) and get y = a + bx + e.

Sampling distribution of b
We use b as an estimate for the population coefficient β in the simple regression model. b is therefore a statistic determined by a random sample, and it has a sampling distribution. When the four assumptions of the simple linear regression model are met:
- The mean value of the sampling distribution of b is β; that is, μ_b = β.
- The standard deviation of the statistic b is σ_b = σ / √(Σ(x − x̄)²).
- The sampling distribution of b is normal.

Estimates for σ_b
The estimate for the standard deviation of b is s_b = s_e / √(Σ(x − x̄)²). When we standardize b, the statistic t = (b − β) / s_b has a t distribution with n − 2 degrees of freedom.

Confidence Interval
Sample statistic ± (critical value)(standard deviation of the statistic), which here is b ± t*·s_b.

Hypothesis Test
We're normally interested in the null hypothesis H₀: β = 0, because if we reject the null, the data suggest there is a useful linear relationship between our two variables. We call this the model utility test for simple linear regression.

Summary of the Test
H₀: β = 0    Hₐ: β ≠ 0    Test statistic: t = b / s_b
The assumptions are the same four as those for the simple linear regression model.

Minitab Output
(Example Minitab regression output shown on slide.)
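As a companion to the summary above, here is a minimal sketch (again with hypothetical data, and using SciPy's t distribution in place of Minitab) of the confidence interval b ± t*·s_b and the model utility test statistic t = b / s_b with n − 2 degrees of freedom.

```python
# Minimal sketch: confidence interval for beta and the model utility test.
import numpy as np
from scipy import stats

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])
n = len(x)

x_bar = x.mean()
sxx = np.sum((x - x_bar) ** 2)
b = np.sum((x - x_bar) * (y - y.mean())) / sxx
a = y.mean() - b * x_bar
s_e = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))

s_b = s_e / np.sqrt(sxx)                         # estimated std dev of b
t_crit = stats.t.ppf(0.975, df=n - 2)            # critical value for a 95% CI
ci = (b - t_crit * s_b, b + t_crit * s_b)        # b +/- t* s_b

t_stat = b / s_b                                 # test statistic under H0: beta = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-sided p-value

print(f"b = {b:.3f}, s_b = {s_b:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

These are the same quantities (the slope coefficient, its standard error, t, and the p-value) that a regression table such as Minitab's reports for the slope.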