Business Statistics - QBM117

Interval estimation for the slope and y-intercept
Hypothesis tests for regression

Objectives
• To determine confidence interval estimators of the slope and the y-intercept.
• To test hypotheses about the slope of the regression line.

Estimating the slope and the y-intercept

As we saw in our earlier study of confidence interval estimators, there are two types of estimators of a population parameter: point estimators and interval estimators.

The point estimators for the slope and the y-intercept can be read directly from the Excel output generated when fitting the regression. The interval estimators can be read from the same output just as easily.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.86673932
R Square             0.75123705
Adjusted R Square    0.68904631
Standard Error       6.51658723
Observations         6

ANOVA
              df    SS          MS          F           Significance F
Regression    1     512.9697    512.9697    12.07956    0.025454365
Residual      4     169.8636    42.46591
Total         5     682.8333

              Coefficients    Std Error    t Stat      P-value     Lower 95%       Upper 95%
Intercept     15.3181818      6.213322     2.465377    0.06929     -1.93280173     32.5691654
Experience    1.67272727      0.481282     3.475567    0.025454    0.336471833     3.00898271

Therefore the 95% confidence interval estimate of the slope is from 0.336 to 3.009, i.e. from $336 to $3009. (The conversion implies that salary is recorded in thousands of dollars, so each additional unit of experience is estimated to add between $336 and $3009 to mean salary.)

Excel also generates a confidence interval estimate for the y-intercept. This will only be considered if the y-intercept has a sensible interpretation in the situation described.

For our salary and experience example, the y-intercept does have a sensible interpretation: it is the salary for a person with no experience. As such, we would also be interested in determining a confidence interval estimate of the intercept.

Therefore the 95% confidence interval estimate of the intercept is from -1.933 to 32.569, i.e. from -$1933 to $32,569.

Testing whether the relationship is real or coincidence

We can always summarise the relationship between two variables in a sample, whether or not a relationship really exists.
Hypothesis testing will tell us whether the relationship that appears to be there is pure coincidence or whether there is in fact a significant relationship between the two variables.

The null hypothesis states that there is no relationship between x and y. Therefore the hypotheses for testing a significant relationship are

\[ H_0: \beta_1 = 0 \qquad H_A: \beta_1 \neq 0 \]

Why Statistical Inference?
• Because there can seem to be a relationship when, in fact, the population is just random
• Below are plots of the data from samples of size n = 10 from a population with no relationship (correlation 0)
• Notice that the sample correlations are not zero!
• This is due to the randomness of sampling

[Figure: three scatter plots of samples of size n = 10, with sample correlations r = -0.471, r = 0.089 and r = 0.395]

For our example, we would be testing: is there a significant relationship between salary and experience?

Step 1
\[ H_0: \beta_1 = 0 \qquad H_A: \beta_1 \neq 0 \]

Step 2
\[ t = \frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}} \]

Step 3
\[ \alpha = 0.05, \qquad t_{\alpha/2,\,n-2} = t_{0.025,\,4} = 2.776 \]

Step 4
Reject \(H_0\) if \(t_{\text{sample}} > 2.776\) or \(t_{\text{sample}} < -2.776\)

Step 5
\[ t = \frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}} = \frac{1.67 - 0}{0.48} = 3.48 \]

The same value can be read from the t Stat column of the Excel output:

              Coefficients    Std Error    t Stat      P-value     Lower 95%       Upper 95%
Intercept     15.3181818      6.213322     2.465377    0.06929     -1.93280173     32.5691654
Experience    1.67272727      0.481282     3.475567    0.025454    0.336471833     3.00898271

\(t_{\text{sample}} = 3.476\) (from Excel output)

Step 6
Since 3.48 > 2.776 we reject \(H_0\). There is sufficient evidence at \(\alpha = 0.05\) to conclude that there is a significant linear relationship between salary and experience.

Using the p-value to test: is there a significant relationship between salary and experience?
Hypotheses: \(H_0: \beta_1 = 0\), \(H_A: \beta_1 \neq 0\)
Level of significance: \(\alpha = 0.05\)
Decision rule: Reject \(H_0\) if p-value \(< \alpha = 0.05\)

              Coefficients    Std Error    t Stat      P-value     Lower 95%       Upper 95%
Intercept     15.3181818      6.213322     2.465377    0.06929     -1.93280173     32.5691654
Experience    1.67272727      0.481282     3.475567    0.025454    0.336471833     3.00898271

Since p-value = 0.025 < \(\alpha\) = 0.05 we reject \(H_0\). There is sufficient evidence at \(\alpha = 0.05\) to conclude that there is a significant linear relationship between salary and experience.

An important point to remember about using the p-value to test a hypothesis is that the p-value gives us a good indication of how much evidence exists to support the alternative hypothesis. The smaller the p-value, the stronger the evidence in support of the alternative hypothesis.

In our example here, the p-value was 0.025. This allows us to conclude that a linear relationship exists when testing at \(\alpha = 0.05\) or \(\alpha = 0.1\), but our conclusion would be different at \(\alpha = 0.01\): since 0.025 > 0.01, we would not reject \(H_0\) at that level.

Testing for a significant correlation

In situations where we are interested in how the independent variable affects the dependent variable, we estimate and test hypotheses about the linear regression model.

In many situations, however, one variable does not influence the other, and therefore we are not interested in estimating how the independent variable affects the dependent variable. We simply want to test whether there is a linear correlation between the two variables.

For these situations the null hypothesis states that there is no linear correlation between x and y. Therefore the hypotheses for testing a significant linear correlation are

\[ H_0: \rho = 0 \qquad H_A: \rho \neq 0 \]

When we test for a significant correlation, we find that the value of the test statistic and the conclusion are exactly the same as when we test for a significant relationship between two variables. This is because we are in fact testing the same thing: are the two variables linearly related (correlated)?
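The claim that the two test statistics coincide can be checked numerically. The following is a minimal sketch, not part of the original lecture: it takes R Square, the slope coefficient and its standard error from the Excel output for the salary and experience example, and compares the correlation-based statistic r / s_r, with s_r = sqrt((1 - r^2)/(n - 2)), against the slope-based statistic. The variable names are illustrative only.

```python
import math

# Values copied from the Excel output for the salary/experience example
n = 6                    # Observations
r_squared = 0.75123705   # R Square
slope = 1.67272727       # Experience coefficient
se_slope = 0.481282      # Std Error of the Experience coefficient

# Sample correlation; it takes the sign of the slope, which is positive here
r = math.sqrt(r_squared)

# Standard error of r under H0: rho = 0
s_r = math.sqrt((1 - r_squared) / (n - 2))

t_corr = (r - 0) / s_r             # test statistic for H0: rho = 0
t_slope = (slope - 0) / se_slope   # test statistic for H0: beta_1 = 0

print(f"r = {r:.4f}, s_r = {s_r:.4f}")
print(f"t from correlation = {t_corr:.3f}")
print(f"t from slope       = {t_slope:.3f}")
```

Both statistics agree up to rounding in the printed Excel figures, which is why only one of the two tests needs to be carried out.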
Therefore we perform one test or the other, not both!

For our previous example, we would be testing: is there a significant linear correlation between salary and experience?

Step 1
\[ H_0: \rho = 0 \qquad H_A: \rho \neq 0 \]

Step 2
\[ t = \frac{r - \rho}{s_r} \quad \text{where} \quad s_r = \sqrt{\frac{1 - r^2}{n - 2}} \]

Step 3
\[ \alpha = 0.05, \qquad t_{\alpha/2,\,n-2} = t_{0.025,\,4} = 2.776 \]

Step 4
Reject \(H_0\) if \(t_{\text{sample}} > 2.776\) or \(t_{\text{sample}} < -2.776\)

Step 5
With r = 0.867 (Multiple R) and r² = 0.751 (R Square) from the Excel regression output:
\[ t = \frac{r - \rho}{s_r} = \frac{0.867 - 0}{0.249} = 3.48 \quad \text{where} \quad s_r = \sqrt{\frac{1 - 0.751}{6 - 2}} = 0.249 \]

This agrees with the t Stat for Experience in the Excel output:

              Coefficients    Std Error    t Stat      P-value     Lower 95%       Upper 95%
Intercept     15.3181818      6.213322     2.465377    0.06929     -1.93280173     32.5691654
Experience    1.67272727      0.481282     3.475567    0.025454    0.336471833     3.00898271

\(t_{\text{sample}} = 3.476\) (from Excel output)

Step 6
Since 3.48 > 2.776 we reject \(H_0\). There is sufficient evidence at \(\alpha = 0.05\) to conclude that there is a significant linear correlation between salary and experience.

Using the p-value to test: is there a significant correlation between salary and experience?

Hypotheses: \(H_0: \rho = 0\), \(H_A: \rho \neq 0\)
Level of significance: \(\alpha = 0.05\)
Decision rule: Reject \(H_0\) if p-value \(< \alpha = 0.05\)

From the Experience row of the Excel output, the p-value is 0.025454.

Since p-value = 0.025 < \(\alpha\) = 0.05 we reject \(H_0\). There is sufficient evidence at \(\alpha = 0.05\) to conclude that there is a significant linear correlation between salary and experience.

Reading for next lecture
Read Chapter 18 Section 18.6 (Chapter 11 Section 11.6 abridged)

Exercises to be completed before next lecture
S&S 18.27 (11.27 abridged), 18.29 (11.29 abridged)