14.1 Inference for Regression


-Perform a Linear Regression T-test and
calculate and interpret a confidence interval
for regression slope.

a= y-intercept of our sample data

b=slope of our sample data.
Estimating Parameters (we need to denote our
population parameters differently than our sample
statistics)
Let:
α= true population y-intercept
β= true population slope

Step 1: Create a scatter plot so you can
visually see what this data looks like. Think
about which variable is the explanatory variable
and which is the response variable.


Suppose a local restaurant wanted to predict
the amount of tip left based on the amount of
the customer’s bill.
Find the LSRL in your calculator:
ŷ = -0.7367 + 0.164x
x=amount of bill
y=amount of tip
(Don’t forget to define your variables!)
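Once the variables are defined, the LSRL can be used to predict a tip from a bill. Here is a minimal Python sketch of that substitution. The function name and the $50 bill are made up for illustration; only the intercept and slope come from the line above.

```python
def predicted_tip(bill):
    """Predicted tip (y-hat) from the LSRL: y-hat = -0.7367 + 0.164x."""
    intercept = -0.7367   # a, from the calculator output
    slope = 0.164         # b, from the calculator output
    return intercept + slope * bill

example_bill = 50.00      # hypothetical bill amount, not from the data set
print(f"Predicted tip on a ${example_bill:.2f} bill: ${predicted_tip(example_bill):.2f}")
```

Plugging in x = 0 returns the y-intercept, which is why a negative "tip" appears there: the line is only meant for bills in the range of the data.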
Whenever we have a linear regression test on
the AP exam, they will give you computer
output of the numbers all crunched for you!
The first step with a Linear Regression t-test
and interval is to learn how to read the
computer output!!
So this is what you would get!

Notice it’s the same equation we got when
typing it in our calculator earlier.
After you get your LSRL, we don’t need any
more data from the top row so cross it out!
(leave your y-intercept: -0.7367)



Our question of interest: Using a 5% significance
level, is there evidence of a linear relationship
between the amount of a bill and the amount that
was tipped? (Assume the conditions for inference
are met)
Remember: If they ask you “is there evidence”,
you have to complete a test.
We will use a linear regression t-test, since we
are determining if there is a relationship between
2 quantitative variables.
(** The chi-squared test of independence was for
when we have categorical data)






In order to show a linear relationship, we can test
to see if the slope is different from 0 (slope of
0 = no linear association).
Since the sample data gives us a slope using “b”,
we can denote the population slope using “β”.
β= true slope of y per x (in context of the
problem)
H0: β=0
(this really means no association)
Ha: β≠0
(this really means there is an
association)
Assumptions:
If you have linear regression output on
the AP exam, it will always state: assume the
conditions for inference are met. (So don’t worry
about them!)



Test Name:
Linear Regression T-test
Alpha:
0.05

Calculations:
2P(t> ___)=p-value (doubled because Ha is two-sided)
Degrees of Freedom: n − 2
(we estimate 2 parameters, the y-intercept and the
slope, so we use n-2, not n-1)
Decision and Statement:
Since p<α, …….SAME THING WE’VE BEEN
DOING!!


So let’s look at the output again:

β= true slope of amount tipped per the amount of the bill
𝐻0 : β=0
𝐻𝑎 : β≠0

Assumptions: stated in problem they are met.






Linear Regression T-test
α = 0.05
Calculations (given in the table):
2P(t> 9.18)=0.0027
Degrees of Freedom: 5 customers − 2 = 3
Decision and Statement: Since p<α, it is statistically
significant, therefore we reject 𝐻0 . There’s enough
evidence to suggest there’s a relationship between the
amount of a bill and the amount tipped.
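The t statistic in the table is the sample slope divided by its standard error, and the p-value is the two-sided tail area of a t distribution with n − 2 degrees of freedom. The sketch below redoes that arithmetic in plain Python. The helper names are hypothetical, and the tail area is approximated with Simpson's rule instead of being read from a table or calculator.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=20000):
    """Approximate 2 * P(T > |t|) by integrating the density from 0 to |t|."""
    upper = abs(t)
    h = upper / steps                      # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total     # P(-|t| < T < |t|), using symmetry
    return 1 - central_area

b = 0.16406       # sample slope from the computer output
se_b = 0.01787    # standard error of the slope from the computer output
n = 5             # number of customers

t_stat = b / se_b
df = n - 2
p_value = two_sided_p_value(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, p-value = {p_value:.4f}")
# prints: t = 9.18, df = 3, p-value = 0.0027
```

Since 0.0027 is less than 0.05, the code agrees with the decision to reject the null hypothesis.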

What is the slope? Interpret it.
b=0.75
On average, for every one-point increase in a student’s
quiz grade, the predicted final grade increases by 0.75 points.

What % of the variation in the final grade can be
explained by the least-squares regression line of
final grade on quiz grade?
r²=37%

What is the correlation? Interpret?
r=0.61 (It is positive b/c the slope is positive)
There is a moderate positive linear relationship
between quiz grades and a final grade.
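The link between r² and r can be checked with a couple of lines: r is the square root of r², and it takes the sign of the slope. This is a small sketch using the rounded values quoted above; the variable names are illustrative.

```python
import math

r_squared = 0.37   # r² from the output, as a proportion
slope = 0.75       # sample slope b; only its sign matters here

# r has the same sign as the slope of the regression line
r = math.copysign(math.sqrt(r_squared), slope)
print(f"r = {r:.2f}")   # prints: r = 0.61
```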

Is there evidence of an association between a student’s
quiz grade and their final grade?

β= true slope of final grade per quiz grade
𝐻0 : β=0
𝐻𝑎 : β≠0

Assumptions: stated in problem they are met.






Linear Regression T-test
α = 0.05
Calculations (given in the table):
2P(t> 5.31)≈0.000 (to three decimal places)
Degrees of Freedom: 50-2= 48
Decision and Statement: Since p<α, it is statistically
significant, therefore we reject 𝐻0 . There’s enough
evidence to suggest there’s a relationship between the
quiz grade and a student’s final grade.

A level C confidence interval for the slope β
of the true regression line is:
b ± t* SE_b
where SE_b = standard error of the slope
We find t* in the table in the back of your book
(use the degrees of freedom and CI % to find
it).

Ex: Compute a 95% confidence interval for
the true slope of amount tipped per cost of
bill.

Name: Linear Regression t-interval

Assumptions: Stated in the problem they are met
Calculations:
First look up the t* value: Go to 95%, df=3

t* = 3.182
df=3
b ± t* SE_b = 0.16406 ± 3.182(0.01787)
= (0.107, 0.221)

Statement: We are 95% confident that the true
slope of amount tipped per cost of bill is
between 0.107 and 0.221.
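The interval arithmetic is the same for every problem: take the slope, then add and subtract the critical value times the standard error. A minimal sketch follows, assuming t* has already been looked up in the table. The function name is made up for illustration.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for b ± t* · SE_b."""
    margin = t_star * se_b   # margin of error
    return b - margin, b + margin

# Values from the tip example: b and SE_b from the output, t* for 95% with df = 3
lower, upper = slope_confidence_interval(b=0.16406, se_b=0.01787, t_star=3.182)
print(f"({lower:.3f}, {upper:.3f})")   # prints: (0.107, 0.221)
```

Because 0 is not inside the interval, the interval agrees with the two-sided test at the 5% level.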

Example:
How well do golfers’ scores in the first round
of a two-round tournament predict their
scores in the second round? The data for 12
members of a college’s women’s golf team in
a recent tournament are listed below. Is
there good evidence that there is an
association between first and second round
scores? (Assume conditions for inference are
met)
Golfer     1    2    3    4    5    6    7    8    9   10   11   12
Round A   89   90   87   95   86   81  102  105   83   88   91   79
Round B   94   85   89   89   81   76  107   89   87   91   88   80








β= true slope of score on round B per score of round A
𝐻0 : β=0
𝐻𝑎 : β≠0
Assumptions: stated in problem they are met.
Linear Regression T-test
α = 0.05
Calculations (given in the table):
2P(t> 2.99)=0.0136
Degrees of Freedom: 12-2= 10
Decision and Statement: Since p<α, it is statistically
significant, therefore we reject 𝐻0 . There’s enough
evidence to suggest there’s a relationship between the
score on round A and round B.
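As a cross-check, the slope, y-intercept, standard error, and t statistic for this example can be recomputed from the 12 pairs of scores. The sketch below uses the standard least-squares formulas in plain Python. The variable names are illustrative, and this is not the calculator's or the software's own routine.

```python
import math

round_a = [89, 90, 87, 95, 86, 81, 102, 105, 83, 88, 91, 79]   # x = Round A score
round_b = [94, 85, 89, 89, 81, 76, 107, 89, 87, 91, 88, 80]    # y = Round B score

n = len(round_a)
x_bar = sum(round_a) / n
y_bar = sum(round_b) / n

# Sums of squares and cross-products about the means
s_xx = sum((x - x_bar) ** 2 for x in round_a)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(round_a, round_b))

b = s_xy / s_xx            # sample slope
a = y_bar - b * x_bar      # sample y-intercept

# Standard deviation of the residuals uses n - 2 degrees of freedom
residuals = [y - (a + b * x) for x, y in zip(round_a, round_b)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

se_b = s / math.sqrt(s_xx)   # standard error of the slope
t_stat = b / se_b            # test statistic for H0: slope = 0

print(f"a = {a:.3f}, b = {b:.4f}, SE_b = {se_b:.2f}, t = {t_stat:.2f}, df = {n - 2}")
```

The printed slope, standard error, and t statistic should match the values used in the test above (0.6877, 0.23, and 2.99).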

Give a 95% confidence interval for the true slope
of Round B score per Round A score.
Linear Regression t-interval
Assumptions: Stated in the problem they are met
Calculations:
t* = 2.228
df=10
b ± t* SE_b = 0.6877 ± 2.228(0.23)
= (0.1753, 1.200)

Statement: We are 95% confident that the true
slope of score of round B per Round A is b/w
0.1753 and 1.200.

What is the line of best fit? Define any variables.
ŷ = 26.332 + 0.6877x
x=score of Round A
y=score of Round B
Interpret the slope:
b=0.6877
On average, for every 1-point increase in the score of Round A,
we expect the score of Round B to increase by 0.6877 points.

Interpret the y-intercept:
a= 26.332 When the score on round A is 0,
we predict the score of round B to be 26.332

Linear Regression t-interval
Assumptions: Stated in the problem they are met
t* = 1.860, df=10-2=8
2.1495 ± 1.860(0.1396) = (1.890, 2.409)
We are 90% confident that the true slope of
fuel consumption per railcar is
b/w 1.890 and 2.409.