STAT 250
Dr. Kari Lock Morgan
Simple Linear Regression
SECTION 9.1
• Inference for correlation
• Inference for slope
• Conditions for inference
Statistics: Unlocking the Power of Data
Lock5
Social Networks and the Brain
• Is the size of certain regions of your brain correlated with the size of your social network?
• Data from 40 students at City College London
• How to measure brain size?
• How to measure social network size?
Source: R. Kanai, B. Bahrami, R. Roylance and G. Rees (2011). Online social network
size is reflected in human brain structure, Proceedings of the Royal Society B:
Biological Sciences. 10/19/11.
Measuring Brain Size
• Structural Magnetic Resonance Imaging (MRI)
• Voxel-based morphometry (VBM) to compute regional grey matter volume based on T1-weighted anatomical MRI scans
• Brain regions found significant in initial study
  • Amygdala (emotion and emotional memory)
  • Middle temporal gyrus (social perception)
  • Entorhinal cortex (memory and navigation)
  • Superior temporal sulcus (perception of others)
• Response: normalized z-score of grey matter density for these brain regions
Brain Regions
Image from Do our Brains Determine our Facebook Friend Count? (www.nature.com)
Social Networks and the Brain
• How to measure size of social network?
  • How many were present at your 18th or 21st birthday party?
  • If you were going to have a party now, how many people would you invite?
  • What is the total number of friends in your phonebook?
  • Write down the names of the people to whom you would send a text message marking a celebratory event. How many people is that?
  • Write down the names of people in your phonebook you would meet for a chat in a small group (one to three people). How many people is that?
  • How many friends have you kept from school and university whom you could have a friendly conversation with now?
  • How many friends do you have on ‘Facebook’? (Explanatory variable)
  • How many friends do you have from outside school or university?
  • Write down the names of the people of whom you feel you could ask a favor and expect to have it granted. How many people is that?
Social Networks and the Brain
r = 0.436
Is the association significant?
Standard Error Formulas
Parameter                   Distribution                Standard Error
Proportion                  Normal                      $\sqrt{\frac{p(1-p)}{n}}$
Difference in Proportions   Normal                      $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Mean                        t, df = n – 1               $\sqrt{\frac{\sigma^2}{n}}$
Difference in Means         t, df = min(n1, n2) – 1     $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Correlation                 t, df = n – 2               $\sqrt{\frac{1-r^2}{n-2}}$
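The standard error formulas on this slide can be written as small functions. This is only a sketch: the function and argument names are made up for illustration, and the only values taken from the slides are r = 0.436 and n = 40.

```python
import math

def se_proportion(p, n):
    """SE for a single proportion."""
    return math.sqrt(p * (1 - p) / n)

def se_diff_proportions(p1, n1, p2, n2):
    """SE for a difference in two proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def se_mean(sigma, n):
    """SE for a single mean."""
    return math.sqrt(sigma**2 / n)

def se_diff_means(sigma1, n1, sigma2, n2):
    """SE for a difference in two means."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

def se_correlation(r, n):
    """SE for a sample correlation (used with a t-distribution, df = n - 2)."""
    return math.sqrt((1 - r**2) / (n - 2))

# Values from the social network example: r = 0.436, n = 40
print(round(se_correlation(0.436, 40), 3))
```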
Social Networks and the Brain
• Is the grey matter volume of these regions of the brain
significantly correlated with number of Facebook
friends?
• From n = 40 people, we find r = .436. Is this
significant?
(a) Yes
(b) No
Social Networks and the Brain
1. State hypotheses:
2. Check conditions:
3. Calculate test statistic:
4. Compute p-value:
5. Interpret in context:
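One way to trace steps 1, 3 and 4 numerically is sketched below, assuming a two-sided test of H0: ρ = 0 against Ha: ρ ≠ 0 and using only n = 40 and r = 0.436 from the slides. The conditions in step 2 cannot be checked without the raw data. The helper names and the Simpson's-rule integration of the t density are illustrative choices, not part of the course materials; statistical software would normally supply the p-value.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=20000):
    """Two-sided p-value by Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # P(0 < T < |t|)
    return 1 - 2 * central           # area in both tails

n, r = 40, 0.436                     # values from the slides
se = math.sqrt((1 - r**2) / (n - 2)) # standard error of r
t_stat = (r - 0) / se                # null value is 0
p_value = two_sided_p(t_stat, n - 2)
print(f"SE = {se:.3f}, t = {t_stat:.2f}, df = {n - 2}, p-value = {p_value:.4f}")
```

The t statistic comes out near 2.99 on 38 degrees of freedom, so the p-value is well below 0.05.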
Social Networks and the Brain
Should you go out and add more Facebook
friends to increase the size of your brain?
a) Yes
b) No
Limitations
Social Networks and the Brain
Give a 95% confidence interval for ρ, the true
correlation between grey matter volume in the
left middle temporal gyrus and number of
Facebook friends. (Can use t* = 2).
a) (0.34, 0.54)
b) (0.24, 0.64)
c) (0.14, 0.73)
d) (0.04, 0.83)
r = 0.436
$SE = \sqrt{\frac{1 - 0.436^2}{40 - 2}} = 0.146$
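The interval arithmetic can be checked with a few lines of code. This is a minimal sketch using the slide's values (r = 0.436, n = 40) and the rounded critical value t* = 2 suggested on the slide; the variable names are illustrative.

```python
import math

n, r = 40, 0.436
t_star = 2                                # rounded critical value from the slide

se = math.sqrt((1 - r**2) / (n - 2))      # standard error of the correlation
lower = r - t_star * se
upper = r + t_star * se
print(f"SE = {se:.3f}; 95% CI for rho: ({lower:.2f}, {upper:.2f})")
```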
$R^2$
$R^2$ is the proportion of the variability in the response variable, Y, that is explained by the explanatory variable, X
• For simple linear regression, $R^2 = r^2$ ($R^2$ is just the sample correlation squared)
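The identity between the proportion of variability explained and the squared correlation can be verified directly. The sketch below uses a small made-up dataset (hypothetical values, not from the study) and computes the proportion of variability explained as 1 − SSE/SST from a least squares fit.

```python
import math

# Hypothetical toy data, not from the study
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.6]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)       # total variability in y (SST)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)                # sample correlation

b1 = sxy / sxx                                # least squares slope
b0 = ybar - b1 * xbar                         # least squares intercept
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / syy                     # proportion of variability explained

print(f"r^2 = {r**2:.6f}, 1 - SSE/SST = {r_squared:.6f}")
```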
$R^2$
[Scatterplots: one with $R^2 = 0.67$, one with $R^2 = 0.09$]
How much does the variability in Y decrease if you know X?
Regression in Minitab
• Stat -> Regression -> Fitted Line Plot
$0.436^2 = 0.19$
Sample to Population
• Everything we have done so far is based solely on sample data
• Now, we will extend from the sample to the population
• Statistical inference!
Simple Linear Model
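• The population/true simple linear model is
  $y = \beta_0 + \beta_1 x + \varepsilon$
  where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon$ is the random error
• $\beta_0$ and $\beta_1$ are unknown parameters
• Can use familiar inference methods!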
Inference for the Slope
• Test for whether the slope is significantly different from 0 (whether there is any linear relationship between x and y):
  $H_0: \beta_1 = 0$
  $H_a: \beta_1 \neq 0$
• Confidence interval for the true slope
Inference for the Slope
• Confidence intervals and hypothesis tests for the slope can be done using the familiar formulas:
  $\text{sample statistic} \pm t^* \times SE$
  $t = \frac{\text{sample statistic} - \text{null value}}{SE}$
• Population parameter: $\beta_1$; sample statistic: $\hat{\beta}_1$
• Use t-distribution with n – 2 degrees of freedom
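The two formulas translate directly into code. The sketch below applies them to the slope estimate (0.0023) and standard error (0.00077) quoted on the Confidence Interval slide, with n = 40 and the rounded critical value t* = 2; the function names are hypothetical.

```python
def t_statistic(sample_statistic, null_value, se):
    """Standardized distance of the sample statistic from the null value."""
    return (sample_statistic - null_value) / se

def confidence_interval(sample_statistic, t_star, se):
    """Interval: sample statistic plus or minus t* times SE."""
    margin = t_star * se
    return sample_statistic - margin, sample_statistic + margin

# Slope estimate and SE as quoted on the Confidence Interval slide
slope_hat, slope_se = 0.0023, 0.00077
n = 40

t_stat = t_statistic(slope_hat, 0, slope_se)             # H0: slope = 0
lower, upper = confidence_interval(slope_hat, 2, slope_se)
print(f"t = {t_stat:.2f} on {n - 2} df; 95% CI: ({lower:.5f}, {upper:.5f})")
```

The t statistic is compared against a t-distribution with n − 2 = 38 degrees of freedom.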
Regression in Minitab
Stat -> Regression -> Regression -> Fit Regression Model
Inference for Slope
n = 40
Is the slope significantly different from 0?
(a) Yes
(b) No
Give a 95% confidence interval for the true slope.
Hypothesis Test
Regression in Minitab
Stat -> Regression -> Regression -> Fit Regression Model
Two Quantitative Variables
• The t-statistic (and p-value) for a test for a
non-zero slope and a test for a non-zero
correlation are identical!
• They are equivalent ways of testing for a linear
association between two quantitative variables.
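The equivalence can be seen numerically. The sketch below uses a small made-up dataset (hypothetical values, not from the study) and computes the t statistic two ways: from the correlation, and from the least squares slope with its usual standard error (residual standard error divided by the square root of Sxx).

```python
import math

# Hypothetical toy data, not from the study
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.8, 3.1, 2.9, 4.6, 5.2, 5.1, 7.0, 7.4]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Test for a non-zero correlation
r = sxy / math.sqrt(sxx * syy)
t_corr = r / math.sqrt((1 - r**2) / (n - 2))

# Test for a non-zero slope
b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
t_slope = b1 / se_slope

print(f"t from correlation = {t_corr:.6f}, t from slope = {t_slope:.6f}")
```

Both statistics use n − 2 degrees of freedom, so identical t values give identical p-values.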
Confidence Interval
$\text{statistic} \pm t^* \times SE$
$0.0023 \pm 2 \times 0.00077$
(0.00076, 0.00384)
We are 95% confident that the true slope, regressing grey matter volume of the left middle temporal gyrus on number of Facebook friends, is between 0.00076 and 0.00384.
Multiple Testing?
False Positive (Type I Error) Protection
• To further protect against Type I errors, they performed two independent analyses on two separate samples (n = 125, then n = 40)
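A rough sense of why replication helps can be had from a short calculation. This sketch assumes a significance level of 0.05 (not stated on the slides), true null hypotheses, and fully independent tests; the number of tests k is a hypothetical input.

```python
alpha = 0.05   # assumed significance level; not stated on the slides

def familywise_error_rate(k, alpha):
    """Chance of at least one false positive among k independent tests of true nulls."""
    return 1 - (1 - alpha) ** k

# Running many tests inflates the chance of some false positive
for k in (1, 5, 20):
    print(f"{k:>2} tests: P(at least one Type I error) = {familywise_error_rate(k, alpha):.3f}")

# Requiring significance in two independent samples makes a false positive much rarer
both_false_positive = alpha ** 2
print(f"P(false positive in both samples) = {both_false_positive:.4f}")
```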
Real-World Network Size
• What about real-world network size?
Conditions
Inference based on the simple linear
model is only valid if the following
conditions hold:
1) Linearity
2) Constant Variability of Residuals
3) Normality of Residuals
Linearity
• The relationship between x and y is
linear (it makes sense to draw a line
through the scatterplot)
Dog Years
Charlie
• From www.dogyears.com:
“The old rule-of-thumb that one dog year
equals seven years of a human life is not
accurate. The ratio is higher with youth
and decreases a bit as the dog ages.”
• 1 dog year = 7 human years
• Linear: human age = 7×dog age
[Plot comparing the ACTUAL and LINEAR conversions]
A linear model can still be useful,
even if it doesn’t perfectly fit the data.
“All models are wrong,
but some are useful”
-George Box
Residuals (errors)
Conditions for residuals:
$\varepsilon_i \sim N(0, \sigma_\varepsilon)$
• The errors are normally distributed: check with a histogram
• The average of the errors is 0 (always true for least squares regression)
• The standard deviation of the errors is constant for all cases: constant spread of points around the line
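These residual conditions can be explored numerically as well as graphically. The sketch below uses made-up data (hypothetical, not from the study), fits the least squares line, confirms that the residuals average to zero, and compares the residual spread in the lower and upper halves of x as a crude stand-in for looking at the plot.

```python
import math

# Hypothetical toy data, not from the study (x already sorted)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.3, 2.8, 4.1, 4.4, 5.9, 6.2, 6.8, 8.3, 8.6, 10.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx                    # least squares slope
b0 = ybar - b1 * xbar             # least squares intercept

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

mean_resid = sum(residuals) / n   # zero (up to rounding) for least squares
resid_sd = math.sqrt(sum(e**2 for e in residuals) / (n - 2))

def spread(values):
    """Root mean square of a list of residuals."""
    return math.sqrt(sum(v**2 for v in values) / len(values))

half = n // 2
print(f"mean residual = {mean_resid:.2e}")
print(f"residual standard error = {resid_sd:.3f}")
print(f"spread for low x = {spread(residuals[:half]):.3f}, "
      f"for high x = {spread(residuals[half:]):.3f}")
```

A histogram of `residuals` would be the usual check for normality; with this few points it is only suggestive.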
Regression in Minitab
Is the association
approximately
linear?
a) Yes
b) No
Is the spread of the
points around the
line approximately
constant?
a) Yes
b) No
Histogram of Residuals
Are the residuals
approximately
normally distributed?
a) Yes
b) No
Non-Constant Variability
Non-Normal Residuals
Conditions not Met?
• If the association isn’t linear: don’t use simple
linear regression
• If variability is not constant, or residuals are
not normal: The model itself is still valid, but
inference may not be accurate
• If you want to do something more fancy so the
conditions are met… take STAT 462!
Simple Linear Regression
1) Plot your data!
   • Association approximately linear?
   • Outliers?
   • Constant variability?
2) Fit the model (least squares)
3) Use the model
   • Interpret coefficients
   • Make predictions
4) Look at histogram of residuals (normal?)
5) Inference (extend to population)
   • Inference on slope (interval and test)
To Do
• Read Section 9.1
• Do HW 9.1 (due Friday, 3/24)
• Study for Exam 3 (Friday, 3/24)