Lecture 11

N318b Winter 2002
Nursing Statistics
Lecture 11
Specific statistical tests:
Regression
Today’s Class
 Discussion of final exam
 Regression
<< 10 min break >>
 Applying knowledge to assigned reading
 Ferketich & Mercer (1995)
Followed by small groups 12-2 PM
Focus on interpreting Regression results
School of
Nursing
Institute for Work & Health
Nur 318b 2002 Lecture 11: page 2
“In Group” Session
Focuses on Ferketich & Mercer (1995)
Q1. Chance to interpret regression findings
Q2. Know main advantage of regression
Key points about regression relating to group
work will be covered in the 2nd part of lecture
Outline of Final Exam
Last year’s exam is now on class web page
Answers will go up next week !
3 parts (60 marks): (3 hours)
Section A: true/false (8 marks)
Section B: multiple choice (22 marks)
Section C: short answer (30 marks) on
interpreting results of a research study
(4 questions – choose 3 @ 10 marks each)
INCLUDES ALL LECTURES (INCLUDING TODAY) !
Outline of Review Lecture
First part focuses on answering select
questions from each of the 3 sections
Second part focuses on addressing
questions directly from students in class
Work group will be a chance to review own
mid-term results and ask further questions
Correlation vs. Regression
              Correlation             Regression
Used for?     association             prediction
Variables?    two only                two or more
Statistics?   “r” (+’ve or –’ve)      “b” (+’ve or –’ve) and “R2”
Tells you?    direction & strength    direction, strength & magnitude
Types of Regression
Like ANOVA, regression is a family of methods
Simple linear regression – one dependent
variable and one independent variable
Multiple linear regression – one dependent
variable and >1 independent variables
Multiple logistic regression – one binary (e.g.
Y/N) dependent variable and >1 independent
variables (especially useful for case-control)
Plus MANY others !
Main Purpose of Regression
Correlation tells you only if there is a
relationship between two variables – strength
(“r”) and direction (positive or negative)
Simple linear regression tells you if there is a
relationship between two variables and uses
score on one to predict score on the other
Multiple linear regression tells you if there is a
relationship between two variables and uses
score on one to predict score on the other and
controls for other factors that might influence
(“confound”) the relationship being studied
Statistical Tests – Review
How do you know when to use which test?
Helps to ask some basic questions:
1. What kind of data are used?
- ratio/interval or categorical (ordinal/nominal)
- dependent (e.g. follow-up) or independent
2. What kind of relationship is of interest?
- prediction, association or difference?
3. How many groups (samples) involved?
- one, two, or more than two
Regression – when to use it?
How do you know when to use regression?
Referring back to the 3 “basic questions”:
1. What kind of data are used?
- most often numeric/continuous (ratio/interval)
- can also be binary (logistic regression)
2. What kind of relationship is of interest?
- prediction (and/or association)
3. How many groups (samples) involved?
- often just one sample involved (in some
ways regression tries to “find” groups )
Regression (linear) assumptions
BASICALLY SAME AS FOR CORRELATION
1. The relationship under study is linear –
important non-linear relationships can be
overlooked with simple regression analysis
2. Data are (approximately) normally distributed
3. Variability of Y around the regression line is roughly
equal across the range of X (i.e. homoscedasticity)
Simple Linear Regression
Although useful for understanding concepts
behind regression it is not used very often in
research articles (as main analytic tool) – why?
Example – can use simple linear regression to
determine if there is a relationship between
mother’s age and baby’s birth weight
Analysis shows a significant relationship
between age and birth weight (for each 1 yr of
age there was a 12.3 g increase in birth weight)
Any doubts about this result?
Simple Linear Regression – cont’d
What other things might be related to child’s
birth weight other than just mother’s age?
Income?
Mother’s weight?
Smoking?
Diabetes?
Gestational age?
Simple linear regression can be used to
separately examine relationship between
birth weight and each of these but not the
combined effects of any of them together
Need multiple regression to do this !
The Simple Regression Model
Regression quantifies a (linear) relationship
that minimizes variation in predicted versus
observed values around the regression line
Y = a + bX + e, so the predicted value is Y' = a + bX
Y = observed value (of dependent variable)
Y' = predicted value (of dependent variable)
a = intercept (value of Y' when X = 0)
b = slope of regression line (“beta” coefficient)
X = value of independent variable (“predictor”)
e = error (how much observed Y differs from Y')
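The slide does not show how a and b are obtained. Assuming the usual ordinary least squares fit (the standard method, but an assumption here since the lecture does not name it), the line is chosen to minimize the sum of squared errors, which gives:

```latex
\[
\min_{a,\,b} \sum_{i=1}^{n} \bigl(Y_i - (a + bX_i)\bigr)^2
\quad\Longrightarrow\quad
b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
\]
```

Here \bar{X} and \bar{Y} are the sample means of the independent and dependent variables.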
Testing Fit of Regression Model
Very similar to ANOVA – how was that done?
F = Between-group variability / Within-group variability
For regression, we also use variance from two
sources: model (variables) and error (chance)
F = Variance attributed to model term(s) / Variance from random error
larger F-statistic = smaller p-value
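As a rough worked example of this ratio, the sketch below computes F from sums of squares and degrees of freedom. The function name `f_statistic` is made up for illustration; the numbers are the regression and residual rows of the BP-versus-age output shown in this lecture.

```python
def f_statistic(ss_model, df_model, ss_error, df_error):
    """Return F = (mean square for the model) / (mean square for error)."""
    ms_model = ss_model / df_model  # variance attributed to model term(s)
    ms_error = ss_error / df_error  # variance from random error
    return ms_model / ms_error


# Sums of squares and df copied from the BP-vs-age ANOVA table in this lecture
f_simple = f_statistic(383.7822323, 1, 267.8177677, 8)
print(round(f_simple, 3))
```

The result should agree with the F value reported in that table (11.46398), up to rounding.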
Interpreting Regression Output
What does a significant F-statistic tell you?
independent variable(s), X significantly predicts
value of dependent variable, Y' (i.e. a “good” fit)
What if F is non-significant?
X not associated with Y' or there is a non-linear
relationship (i.e. model not a “good” fit for data)
What does a significant beta coefficient tell you?
amount Y' changes for each one-unit increase in X
What does a significant “R2” tell you?
proportion of total variation in Y attributable to model terms
Regression Output – Example
Is there a relationship between systolic BP and age?
Sample of ten subjects

Systolic BP   Age
121           46
132           52
114           45
119           43
126           45
137           51
109           42
128           48
122           50
130           56

Correlation
       BP            Age
BP     1
Age    0.767453188   1
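The correlation between BP and age can be reproduced from the ten data pairs. The following is a minimal sketch using only the standard library; the helper name `pearson_r` is chosen here for illustration and does not come from the lecture.

```python
from math import sqrt

# Ten subjects from the slide: systolic BP and age, in the same order
bp = [121, 132, 114, 119, 126, 137, 109, 128, 122, 130]
age = [46, 52, 45, 43, 45, 51, 42, 48, 50, 56]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(age, bp)
print(round(r, 4))
```

The value should match the 0.767 shown in the correlation table: a positive and fairly strong association.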
Regression Table
Regression Statistics
Multiple R          0.767453188
  (Same as correlation coefficient seen before! True only for simple regression.)
R Square            0.588984396
Adjusted R Square   0.537607445
Standard Error      5.785950307
Observations        10

ANOVA
             df   SS            MS         F          Significance F
Regression   1    383.7822323   383.7822   11.46398   0.009558492
Residual     8    267.8177677   33.47722
Total        9    651.6

               Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%
Intercept      53.13439636    20.95090714      2.536138   0.03492    4.821486619   101.4473061
X Variable 1   1.478359909    0.436628866      3.38585    0.009558   0.471491288   2.48522853
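To see where the main numbers in this output come from, here is a sketch that refits the simple regression of BP on age from the raw data. It assumes an ordinary least squares fit, which is what the reported values are consistent with; the variable names are illustrative.

```python
from math import sqrt

# Ten subjects from the slide: systolic BP (Y) and age (X)
bp = [121, 132, 114, 119, 126, 137, 109, 128, 122, 130]
age = [46, 52, 45, 43, 45, 51, 42, 48, 50, 56]

n = len(age)
mean_x = sum(age) / n
mean_y = sum(bp) / n

# Sums of squares and cross-products around the means
sxx = sum((x - mean_x) ** 2 for x in age)
syy = sum((y - mean_y) ** 2 for y in bp)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, bp))

slope = sxy / sxx                      # "b": change in BP per year of age
intercept = mean_y - slope * mean_x    # "a": predicted BP when age = 0

ss_total = syy
ss_model = slope * sxy                 # variation explained by age
ss_error = ss_total - ss_model         # residual (unexplained) variation
r_squared = ss_model / ss_total

ms_error = ss_error / (n - 2)          # residual mean square, df = n - 2
se_slope = sqrt(ms_error / sxx)        # standard error of the slope
t_slope = slope / se_slope             # t statistic for the slope

print(round(slope, 4), round(intercept, 4), round(r_squared, 4))
```

With a single predictor, the square of `t_slope` equals the model F statistic, which is why the slope's p-value and "Significance F" are identical in the table.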
Plot of BP and Age
[Figure: Plot of BP vs. Age, showing observed Y and predicted Y (BP, axis 100 to 140) against Age (axis 40 to 60)]
Is age a good predictor of systolic BP?
10 minute break !
Multiple Linear Regression
What about other factors – e.g. weight?
Does weight affect the relationship between systolic BP and age?
Same ten subjects

Systolic BP   Age   Weight
121           46    81
132           52    92
114           45    76
119           43    74
126           45    81
137           51    88
109           42    69
128           48    79
122           50    88
130           56    92

Correlation table
          BP           Age           Weight
BP        1
Age       0.76745319   1
Weight    0.81702624   0.934676222   1
Multiple Regression Table
Regression Statistics
Multiple R          0.81709609
R Square            0.66764602
Adjusted R Square   0.57268774
Standard Error      5.56214054
Observations        10

Note R2 increases since more terms are in this model (thus less “unexplained” variance)

ANOVA
             df   SS            MS         F          Significance F
Regression   2    435.0381479   217.5191   7.030941   0.02116426
Residual     7    216.5618521   30.93741
Total        9    651.6

                        Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%
Intercept               50.7454735     20.22582729      2.508944   0.040459   2.919026047   98.571921
X Variable 1 (Age)      0.05789293     1.180700937      0.049033   0.962263   -2.73401914   2.849805
X Variable 2 (Weight)   0.85716152     0.665936202      1.287153   0.238965   -0.71752625   2.43184928
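The coefficients in this table can be approximated with a short sketch that solves the least squares equations for two predictors by hand. This assumes an ordinary least squares fit; the helper names (`centered`, `dot`) are invented for illustration, and the adjusted R2 formula used is the standard one, 1 - (1 - R2)(n - 1)/(n - k - 1).

```python
# Ten subjects from the slide: systolic BP (Y), age (X1), weight (X2)
bp = [121, 132, 114, 119, 126, 137, 109, 128, 122, 130]
age = [46, 52, 45, 43, 45, 51, 42, 48, 50, 56]
weight = [81, 92, 76, 74, 81, 88, 69, 79, 88, 92]


def mean(values):
    return sum(values) / len(values)


def centered(values):
    """Subtract the mean from every value."""
    m = mean(values)
    return [v - m for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


y, x1, x2 = centered(bp), centered(age), centered(weight)

# Sums of squares and cross-products of the centered variables
s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
s1y, s2y = dot(x1, y), dot(x2, y)

# Solve the 2x2 normal equations with Cramer's rule
det = s11 * s22 - s12 ** 2
b_age = (s1y * s22 - s12 * s2y) / det
b_weight = (s11 * s2y - s12 * s1y) / det
intercept = mean(bp) - b_age * mean(age) - b_weight * mean(weight)

ss_total = dot(y, y)
ss_model = b_age * s1y + b_weight * s2y
r_squared = ss_model / ss_total

n, k = len(bp), 2  # observations, number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(b_age, 4), round(b_weight, 4), round(r_squared, 4))
```

Note how small `det` is relative to `s11 * s22`: age and weight carry much of the same information, which is what makes the individual coefficients unstable.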
Multiple Regression Summary
What does a significant F-statistic tell you?
together age and weight are strong predictors of
systolic BP (i.e. model is a “good” fit of data)
Why did effects of age change so much?
age and weight are strongly correlated with
each other (note “r” = 0.93 !)
What do non-significant beta coefficients tell you?
each has non-significant (independent) effect
Model Building in Regression
The main advantage of regression is that it
allows you to “control” for multiple factors
How does this happen?
Statistical models are developed that (should) fit
the theoretical model proposed by researchers
Different procedures available but all essentially
try to “weed out” non-significant factors to explain
the most variation with least number of variables
As terms are added (or deleted) in model process,
you look for significant changes in R2 – if not, then
variable has no significant (independent) effect
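The BP example from earlier in this lecture can illustrate the "change in R2" idea. The sketch below compares the age-only model with the age-plus-weight model using the sums of squares from the two regression outputs. The F-for-change formula is the standard one for nested models and is an addition here, not something stated on the slide.

```python
# Values copied from the two regression outputs in this lecture
ss_total = 651.6
ss_model_age_only = 383.7822323      # model with age only
ss_model_age_weight = 435.0381479    # model with age and weight
ms_error_full = 216.5618521 / 7      # residual mean square of the larger model

r2_reduced = ss_model_age_only / ss_total
r2_full = ss_model_age_weight / ss_total
delta_r2 = r2_full - r2_reduced      # extra variation explained by adding weight

terms_added = 1
# F for the change: extra explained variation per added term,
# relative to the residual variance of the larger model
f_change = ((ss_model_age_weight - ss_model_age_only) / terms_added) / ms_error_full

print(round(delta_r2, 4), round(f_change, 3))
```

When one term is added, this F equals the square of that term's t statistic in the larger model (1.287153 for weight, p = 0.238965), so here the increase in R2 is not statistically significant.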
Part 2:
Application to the
Assigned Readings
Ferketich & Mercer (1995)
Quick summary of the paper:
– a follow-up study examining the effects of
previous fatherhood experience on paternal
competence
– also looked at “predictors” of competence
– 172 subjects (79 exp., 93 inexp.) were
recruited during partner’s pregnancy
– analysis included t-tests, ANOVA and
multiple regression
Main study research question
“Are there any differences in paternal
competence between first-time (inexp)
and experienced (exp) fathers in the
transition to the father role?”
see first sentence of article – page 53 of syllabus
Is this question related to regression results?
Not directly – more specific if rephrased as:
“Are there any differences in “predictors” of
paternal competence between …” (e.g. last
sentence of 1st paragraph under “Method”)
Overview of analysis
How did the researchers “structure” analysis?
Started with …..
1. Comparison of the two main groups on
several demographic characteristics
Then …..
2. Comparison of the two main groups on
key independent variables. How? (t-tests,
RANOVA)
Then …..
3. Identification of predictors of competence
for the two main groups. How? (multiple
regression)
Interpreting Table 1 …
See Table 1 page 92 of paper
Look at structure of the table …
1. Who is in the table – e.g. groups ?
- only the experienced fathers
2. What is being tested – e.g. dep/indep?
- predictors of competence (many factors)
3. How is it being tested – e.g. test used?
- multiple regression (at 4 time points)
4. Was it statistically significant – e.g. p<0.05
- everything ?! (What happened to rest?)
Interpreting Table 1 - cont’d
Why does Table 1 seem to have different
variables being tested at each time point?
Only those that were significant shown !
What do all the columns mean (at each time)?
Unique R2 = variance for that term only
Cumulative R2 = variance for all model terms
Beta = change in Y per unit of X
F = test statistic from regression output (model)
p = significance of test statistic, F (model)
For Table 1 (cont’d)
Questions for interpreting results in workshop …
1. What is the relative contribution of
the terms in each model?
- i.e. what is its change in R2?
2. How was model developed?
- i.e. where are the other terms?
3. Was it in the expected direction?
- i.e. is there a +’ve or –’ve relationship?
4. What seems to be related?
- i.e. how can a term be significant
in one model but not in another?
Next Week:
Review
For next week’s class please review:
1. Last year’s final exam (web page)