- Perform a linear regression t-test, and calculate and interpret a confidence interval for the regression slope.

a = y-intercept of our sample data
b = slope of our sample data

Estimating Parameters (we need to denote our population data differently than our sample data)
Let:
α = true population y-intercept
β = true population slope

Step 1: Create a scatter plot so you can see what the data looks like. Think about which variable is the explanatory variable and which is the response variable.

Suppose a local restaurant wanted to predict the amount of tip left based on the amount of the customer's bill.

Find the LSRL in your calculator: ŷ = -0.7367 + 0.164x
x = amount of bill
y = amount of tip
(Don't forget to define your variables!)

Whenever we have a linear regression test on the AP exam, they will give you computer output with the numbers all crunched for you! The first step with a linear regression t-test and interval is to learn how to read the computer output!

So this is what you would get! Notice it's the same equation we got when typing it in our calculator earlier. After you get your LSRL, we don't need any more data from the top row, so cross it out! (Leave your y-intercept: -0.7367.)

Our question of interest: Using a 5% significance level, is there evidence of a linear relationship between the amount of a bill and the amount that was tipped? (Assume the conditions for inference are met.)

Remember: If they ask you "is there evidence", you have to complete a test. We will use a linear regression t-test, since we are determining whether there is a relationship between 2 quantitative variables. (** The chi-squared independence test was for categorical data.)

To show a linear relationship, we test whether the slope is different from zero, that is, positive or negative (slope of zero = no linear association). Since the sample data gives us a slope "b", we denote the population slope "β".
β = true slope of y per x (in context of the problem)
H0: β = 0 (this really means no association)
Ha: β ≠ 0 (this really means there is an association)
Assumptions: If you have linear regression output on the AP exam, the problem will always state that the assumptions are met. (So don't worry about them!)
Test Name: Linear Regression T-test
Alpha: 0.05
Calculations: 2P(t > ___) = p-value (doubled because Ha is two-sided)
Degrees of Freedom: n - 2 (we estimate 2 parameters, the slope and the y-intercept, so we use n - 2, not n - 1)
Decision and Statement: Since p < α, ... SAME THING WE'VE BEEN DOING!!

So let's look at the output again:

β = true slope of amount tipped per amount of the bill
H0: β = 0
Ha: β ≠ 0
Assumptions: Stated in the problem that they are met.
Test Name: Linear Regression T-test
α = 0.05
Calculations (given in the table): 2P(t > 9.18) = 0.0027
Degrees of Freedom: 5 customers - 2 = 3
Decision and Statement: Since p = 0.0027 < α = 0.05, the result is statistically significant, so we reject H0. There is enough evidence to suggest a linear relationship between the amount of a bill and the amount tipped.

What is the slope? Interpret.
b = 0.75. On average, for every 1-point increase in a student's quiz grade, the final grade increases by 0.75 points.

What % of the variation in the final grade can be explained by the least-squares regression line of final grade on quiz grade?
r² = 37%

What is the correlation? Interpret.
r = 0.61 (it is positive because the slope is positive). There is a moderate positive linear relationship between quiz grade and final grade.

Is there evidence of an association between a student's quiz grade and their final grade?

β = true slope of final grade per quiz grade
H0: β = 0
Ha: β ≠ 0
Assumptions: Stated in the problem that they are met.
Test Name: Linear Regression T-test
α = 0.05
Calculations (given in the table): 2P(t > 5.31) ≈ 0.000
Degrees of Freedom: 50 - 2 = 48
Decision and Statement: Since p < α, the result is statistically significant, so we reject H0. There is enough evidence to suggest a linear relationship between a student's quiz grade and final grade.
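
The two-sided p-values quoted in the calculations above can be checked without a table. What follows is only a minimal sketch, assuming nothing beyond the Python standard library: the helper names t_pdf and two_sided_p_value are made up for illustration, and Simpson's rule stands in for the tcdf function on a calculator. The t values and degrees of freedom are the ones read off the output in the tip and quiz examples.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=20000):
    """Return 2 * P(T > |t|) by integrating the density with Simpson's rule.

    steps must be even (a requirement of Simpson's rule).
    """
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3          # P(0 < T < |t|)
    return 2 * (0.5 - area_0_to_t)       # both tails beyond |t|


# Tip example: t = 9.18 with df = 5 - 2 = 3
p_tip = two_sided_p_value(9.18, df=3)
# Quiz example: t = 5.31 with df = 50 - 2 = 48
p_quiz = two_sided_p_value(5.31, df=48)

print(f"tip:  p = {p_tip:.4f}")
print(f"quiz: p = {p_quiz:.3f}")
```

Both p-values come out below α = 0.05, which matches the "reject H0" decisions in the two examples.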

A level C confidence interval for the slope β of the true regression line is:
b ± t* SE_b
where SE_b = standard error of the slope.
We find t* in the table in the back of your book (use the degrees of freedom and the confidence level to find it).

Ex: Compute a 95% confidence interval for the true slope of amount tipped per cost of bill.
Name: Linear Regression t-interval
Assumptions: Stated in the problem that they are met.
Calculations: First look up the t* value: go to 95%, df = 3, so t* = 3.182.
b ± t* SE_b = 0.16406 ± 3.182(0.01787) = (0.107, 0.221)
Statement: We are 95% confident that the true slope of amount tipped per cost of bill is between 0.107 and 0.221.

Example: How well do golfers' scores in the first round of a two-round tournament predict their scores in the second round? The data for 12 members of a college's women's golf team in a recent tournament are listed below. Is there good evidence that there is an association between first and second round scores? (Assume conditions for inference are met.)

Golfer    1   2   3   4   5   6    7    8   9  10  11  12
Round A  89  90  87  95  86  81  102  105  83  88  91  79
Round B  94  85  89  89  81  76  107   89  87  91  88  80

β = true slope of score on Round B per score on Round A
H0: β = 0
Ha: β ≠ 0
Assumptions: Stated in the problem that they are met.
Test Name: Linear Regression T-test
α = 0.05
Calculations (given in the table): 2P(t > 2.99) = 0.0136
Degrees of Freedom: 12 - 2 = 10
Decision and Statement: Since p = 0.0136 < α = 0.05, the result is statistically significant, so we reject H0. There is enough evidence to suggest a linear relationship between the score on Round A and the score on Round B.

Give a 95% confidence interval for the rate at which Round B scores increase with Round A scores.
Name: Linear Regression t-interval
Assumptions: Stated in the problem that they are met.
Calculations: t* = 2.228, df = 10
b ± t* SE_b = 0.6877 ± 2.228(0.23) = (0.1753, 1.200)
Statement: We are 95% confident that the true slope of score on Round B per score on Round A is between 0.1753 and 1.200.

What is the line of best fit? Define any variables.
ŷ = 26.332 + 0.6877x
x = score on Round A
y = score on Round B

Interpret the slope: b = 0.6877
On average, for every 1-point increase in the Round A score, we predict the Round B score to increase by 0.6877 points.

Interpret the y-intercept: a = 26.332
When the score on Round A is 0, we predict the score on Round B to be 26.332.

Name: Linear Regression t-interval
Assumptions: Stated in the problem that they are met.
Calculations: t* = 1.860, df = 10 - 2 = 8
b ± t* SE_b = 2.1495 ± 1.860(0.1396) = (1.890, 2.409)
Statement: We are 90% confident that the true slope of fuel consumption per railcar is between 1.890 and 2.409.
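
The numbers in the golf example can be reproduced from the raw scores in the table. The sketch below is not the calculator's LinRegTTest, just the textbook formulas written out in standard-library Python: b = Sxy/Sxx, a = ȳ - b x̄, s = sqrt(SSE/(n - 2)), SE_b = s/sqrt(Sxx), t = b/SE_b. The function name slope_inference and its return format are assumptions made for illustration, and t* = 2.228 is still taken from the table (95%, df = 10) rather than computed.

```python
import math

round_a = [89, 90, 87, 95, 86, 81, 102, 105, 83, 88, 91, 79]  # x: Round A score
round_b = [94, 85, 89, 89, 81, 76, 107, 89, 87, 91, 88, 80]   # y: Round B score


def slope_inference(x, y, t_star):
    """Fit the LSRL and return slope test/interval quantities.

    t_star is the critical value looked up in a t table for df = n - 2.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b = sxy / sxx                 # sample slope
    a = y_bar - b * x_bar         # sample y-intercept

    # Standard deviation of the residuals uses n - 2 degrees of freedom
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_b = s / math.sqrt(sxx)     # standard error of the slope

    t = b / se_b                  # test statistic for H0: beta = 0
    margin = t_star * se_b
    return {"a": a, "b": b, "se_b": se_b, "t": t, "df": n - 2,
            "ci": (b - margin, b + margin)}


result = slope_inference(round_a, round_b, t_star=2.228)
for name, value in result.items():
    print(name, value)
```

Rounded, the output agrees with the values used above: a ≈ 26.332, b ≈ 0.6877, SE_b ≈ 0.23, t ≈ 2.99 with df = 10, and an interval of roughly 0.175 to 1.200.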