PowerPoint Presentation - Vanderbilt Biostatistics

How to Analyze and Graphically
Present Longitudinal Data
Ayumi Shintani, Ph.D., M.P.H.
Department of Biostatistics
Ayumi.shintani@vanderbilt.edu
For handouts and datasets:
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/GCRCNoonWorkshops
Example 1. More than 2 repeated measures with 1 group
From Table 1 of Deal et al (1979): Role of respiratory heat exchange in
production of exercise-induced asthma. J Appl Physiol 46:467-475
Minute ventilation volume vs. temperature, dry gas experiments
Ventilation in l/min, by temperature

ID      -10     25      37      50      65      80      Mean    SD      Slope
1       74.5    81.5    83.6    68.6    73.1    79.4    76.8    5.7     -0.01
2       75.5    84.6    70.6    87.3    73.0    75.0    77.7    6.7     -0.02
3       68.9    71.6    55.9    61.9    60.5    61.8    63.4    5.8     -0.10
4       57.0    61.3    54.1    59.2    56.6    58.8    57.8    2.5     0
5       78.3    84.9    64.0    62.2    60.1    78.7    71.4    10.5    -0.12
6       54.0    62.8    63.0    58.0    56.0    51.5    57.6    4.7     -0.04
7       72.5    68.3    67.8    71.5    65.0    67.7    68.8    2.8     -0.06
8       80.8    89.9    83.2    83.0    85.7    79.6    83.7    3.7     -0.01
Mean    70.2    75.6    67.8    69.0    66.3    69.1                    -4.5
SD      9.8     11.0    11.1    11.0    10.3    10.8                    4.6

The Mean and SD entries under Slope are in units of 10^-2 (that is, mean slope -0.045, SD 0.046).
[Figure: mean minute ventilation volume at each temperature (Temp_10, Temp25, Temp37, Temp50, Temp65, Temp80). Dots/lines show means; error bars show 95% CI of the mean.]
[Figure: minute ventilation volume vs. temperature (-10, 25, 37, 50, 65, 80), one line per subject (ID 1 to 8). Dots/lines show means.]
We want to analyze whether there is an association
between minute ventilation volume and temperature.
What hypothesis do you want to test, i.e., what
exactly do you want to compare?
Null hypothesis 1: Mean minute ventilation
volume is the same at all temperatures.
[Figure: mean minute ventilation volume at each temperature, with 95% CI error bars (same plot as above).]
First, let's ignore the repeated measures and perform a one-way ANOVA.
To perform the ANOVA, you first need to transform the data from horizontal to longitudinal format.
(Longitudinal format uses a single variable for the outcome, as opposed to horizontal format, where
a separate outcome variable is created for each repeated measure.)
In SPSS go:
Data
Restructure
Step 1: Welcome Data Structure Wizard - Select the first choice
Step 2: Variables to Cases: Number of variable groups - Select the first choice
Step 3: Variables to Cases: Select variables
* Move Temp_10 through Temp80 (in the right order) into the "Variables to be
transposed" box (use the Shift key on your keyboard to highlight them all at once)
* Type the name of the output variable (for example, Vent)
* Move ID into the Fixed variable box, then click Next
Step 4: Variables to Cases: Create Index Variables - Select the first choice
Step 5: Variables to Cases: Create One Index Variable
Click on the first choice (sequential numbers)
Rename the index variable from Index1 to TEMP (for temperature)
Step 6: Handling variables not selected
Keep and treat as fixed variables
Leave the rest unchanged
Step 7: Do nothing
Finish (before you do this, make sure you have saved the original horizontal file)
Recode the levels of Temp to the actual values (-10, 25, 37, …80)
Save the new longitudinal dataset as Deal Longitudinal.sav
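For readers working outside SPSS, the same horizontal-to-longitudinal restructuring can be sketched in Python. This is only an illustration, assuming pandas is installed; the column names follow the slides (ID, Temp_10 ... Temp80, Vent), and only subjects 1 and 2 from the table are used to keep it short.

```python
import pandas as pd

# Horizontal layout: one row per subject, one column per temperature.
# Values for subjects 1 and 2 are copied from the data table.
wide = pd.DataFrame({
    "ID": [1, 2],
    "Temp_10": [74.5, 75.5],
    "Temp25": [81.5, 84.6],
    "Temp37": [83.6, 70.6],
    "Temp50": [68.6, 87.3],
    "Temp65": [73.1, 73.0],
    "Temp80": [79.4, 75.0],
})

temp_cols = ["Temp_10", "Temp25", "Temp37", "Temp50", "Temp65", "Temp80"]

# Variables to cases: stack the six temperature columns into one outcome column.
long = wide.melt(id_vars="ID", value_vars=temp_cols,
                 var_name="TempVar", value_name="Vent")

# Recode the index variable to the actual temperatures.
actual = dict(zip(temp_cols, [-10, 25, 37, 50, 65, 80]))
long["Temp"] = long["TempVar"].map(actual)
long = (long.drop(columns="TempVar")
            .sort_values(["ID", "Temp"])
            .reset_index(drop=True))

print(long)
```

Each subject now contributes six rows, one per temperature, which is the layout the ANOVA and mixed-model steps expect.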
Now, let's perform a one-way ANOVA:
Analyze
Compare Means
One-way ANOVA
Select Vent as the dependent variable and Temp as the factor

ANOVA (dependent variable: VF)
                  Sum of Squares   df   Mean Square   F      Sig.
Between Groups    413.867          5    82.773        .727   .607
Within Groups     4780.830         42   113.829
Total             5194.697         47

Another way:
Analyze
General linear model
Univariate (Multivariate is for when you have more than one dependent variable)
Select Vent as the dependent variable and Temp as a fixed factor
Click Plots
Select Temp as the horizontal variable, click Add, Continue
Click Models
Select Custom
Move Temp into the Model box, Continue
OK
Authors' test for the effect of temperature: one-way analysis of variance, F(5,42) = 0.72, p > 0.5
Results of one-way ANOVA
Tests of Between-Subjects Effects
Dependent Variable: Vent

Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   413.867(a)                5    82.773        .727       .607
Intercept         232798.163                1    232798.163    2045.152   .000
Temp              413.867                   5    82.773        .727       .607
Error             4780.830                  42   113.829
Total             237992.860                48
Corrected Total   5194.697                  47
a. R Squared = .080 (Adjusted R Squared = -.030)
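As a cross-check outside SPSS, the one-way ANOVA that ignores the repeated measures can be sketched with SciPy. This assumes scipy is installed; the data are copied from the table of ventilation volumes, and the names VENT and TEMPS are just for this illustration.

```python
from scipy import stats

TEMPS = [-10, 25, 37, 50, 65, 80]
# Ventilation (l/min) per subject, in the order of TEMPS.
VENT = {
    1: [74.5, 81.5, 83.6, 68.6, 73.1, 79.4],
    2: [75.5, 84.6, 70.6, 87.3, 73.0, 75.0],
    3: [68.9, 71.6, 55.9, 61.9, 60.5, 61.8],
    4: [57.0, 61.3, 54.1, 59.2, 56.6, 58.8],
    5: [78.3, 84.9, 64.0, 62.2, 60.1, 78.7],
    6: [54.0, 62.8, 63.0, 58.0, 56.0, 51.5],
    7: [72.5, 68.3, 67.8, 71.5, 65.0, 67.7],
    8: [80.8, 89.9, 83.2, 83.0, 85.7, 79.6],
}

# One group per temperature, treating all 48 values as independent
# (this is exactly the flaw discussed in the text).
groups = [[VENT[i][j] for i in VENT] for j in range(len(TEMPS))]

result = stats.f_oneway(*groups)
f_stat, p_value = result.statistic, result.pvalue
print(f"F(5, 42) = {f_stat:.3f}, p = {p_value:.3f}")
```

The F statistic and p-value should agree with the SPSS output above (F = .727, Sig. = .607).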
The authors concluded that minute ventilation volume did not differ
across temperatures. The flaws in this analysis are:
1. The normality of the distribution of ventilation volumes was not
checked.
Use the Kruskal-Wallis test.
2. The observations within a single subject are not independent. The
subject identification was not used in the analysis. The analysis does
not remove variation among subjects by treating each subject as his
own control, which would exploit the fact that variation within subjects is
usually smaller than variation between subjects.
                             Ignoring correlation            Considering correlation
                             among measurements              among measurements (Problem 2)
Nonparametric (Problem 1)    Kruskal-Wallis test             Friedman test
                             (p=0.613)                       (p=0.023)
Parametric                   One-way ANOVA (GLM)             Linear mixed effect model
                             (p=0.607)                       (???)
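The two nonparametric cells of this comparison can be reproduced with a short SciPy sketch (assuming scipy is installed; VENT and TEMPS are illustrative names, with data copied from the ventilation table). Kruskal-Wallis pools all values by temperature, while the Friedman test ranks the six temperatures within each subject.

```python
from scipy import stats

TEMPS = [-10, 25, 37, 50, 65, 80]
VENT = {
    1: [74.5, 81.5, 83.6, 68.6, 73.1, 79.4],
    2: [75.5, 84.6, 70.6, 87.3, 73.0, 75.0],
    3: [68.9, 71.6, 55.9, 61.9, 60.5, 61.8],
    4: [57.0, 61.3, 54.1, 59.2, 56.6, 58.8],
    5: [78.3, 84.9, 64.0, 62.2, 60.1, 78.7],
    6: [54.0, 62.8, 63.0, 58.0, 56.0, 51.5],
    7: [72.5, 68.3, 67.8, 71.5, 65.0, 67.7],
    8: [80.8, 89.9, 83.2, 83.0, 85.7, 79.6],
}

# One list per temperature; position k in every list belongs to the same subject.
groups = [[VENT[i][j] for i in VENT] for j in range(len(TEMPS))]

# Ignores the pairing: all 48 values are ranked together.
kw = stats.kruskal(*groups)

# Uses the pairing: temperatures are ranked within each subject.
friedman = stats.friedmanchisquare(*groups)

print(f"Kruskal-Wallis: H = {kw.statistic:.3f}, p = {kw.pvalue:.3f}")
print(f"Friedman: chi-square = {friedman.statistic:.3f}, p = {friedman.pvalue:.3f}")
```

The contrast between the two p-values illustrates how much information is lost when the within-subject pairing is ignored.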
To solve problem 2, we can use a linear mixed effect model.
A linear mixed effect model is similar to linear regression (or a general
linear model) in that the outcome variable is continuous. Linear
regression assumes normality and independence of the residuals. A
linear mixed model also requires the normality assumption, but it does
not require the independence assumption. We will talk about the normality
part later; here let's spend some time on the independence
assumption.
ID    temp    trans1
1     -10     74.5
1     25      81.5
1     37      83.6
1     50      68.6
1     65      73.1
1     80      79.4
2     -10     75.5
2     25      84.6
2     37      70.6
2     50      87.3
2     65      73.0
2     80      75.0
3     -10     68.9
3     25      71.6
3     37      55.9

Are they independent from each other?
A quick way to check this is to do a correlation analysis.

Correlations: Spearman's rho between temperatures (N = 8 for every pair; two-tailed Sig. in parentheses)

        -10              25               37               50               65               80
-10     1.000            .952** (.000)    .690 (.058)      .786* (.021)     .738* (.037)     .929** (.001)
25      .952** (.000)    1.000            .667 (.071)      .690 (.058)      .690 (.058)      .881** (.004)
37      .690 (.058)      .667 (.071)      1.000            .762* (.028)     .857** (.007)    .833* (.010)
50      .786* (.021)     .690 (.058)      .762* (.028)     1.000            .857** (.007)    .738* (.037)
65      .738* (.037)     .690 (.058)      .857** (.007)    .857** (.007)    1.000            .857** (.007)
80      .929** (.001)    .881** (.004)    .833* (.010)     .738* (.037)     .857** (.007)    1.000
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

If the observations were independent, you would expect Spearman correlation
coefficients near zero, with p>0.05 for most pairs. Here every coefficient is
at least .667, so the repeated measures are clearly correlated.
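The same check can be sketched in Python with SciPy's Spearman correlation (assuming scipy is installed; VENT and TEMPS are illustrative names, data copied from the ventilation table). Each column holds the eight subjects' values at one temperature.

```python
from scipy import stats

TEMPS = [-10, 25, 37, 50, 65, 80]
VENT = {
    1: [74.5, 81.5, 83.6, 68.6, 73.1, 79.4],
    2: [75.5, 84.6, 70.6, 87.3, 73.0, 75.0],
    3: [68.9, 71.6, 55.9, 61.9, 60.5, 61.8],
    4: [57.0, 61.3, 54.1, 59.2, 56.6, 58.8],
    5: [78.3, 84.9, 64.0, 62.2, 60.1, 78.7],
    6: [54.0, 62.8, 63.0, 58.0, 56.0, 51.5],
    7: [72.5, 68.3, 67.8, 71.5, 65.0, 67.7],
    8: [80.8, 89.9, 83.2, 83.0, 85.7, 79.6],
}

# Horizontal layout: one list of 8 subject values per temperature.
cols = {t: [VENT[i][j] for i in VENT] for j, t in enumerate(TEMPS)}

# Correlation between the first two temperatures.
rho_10_25, p_10_25 = stats.spearmanr(cols[-10], cols[25])
print(f"-10 vs 25: rho = {rho_10_25:.3f}, p = {p_10_25:.4f}")

# Full matrix of coefficients, one row per temperature.
for a in TEMPS:
    row = []
    for b in TEMPS:
        rho, _ = stats.spearmanr(cols[a], cols[b])
        row.append(f"{rho:6.3f}")
    print(f"{a:>4}", " ".join(row))
```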
To perform the correlation analysis, the data must be entered horizontally; you can use the deal.sav
dataset for this. If you created your original database longitudinally and want to
convert it to horizontal, here is how to do it.
Read Deal.long.sav into SPSS
Before you restructure the data, recode the negative value of the Temperature variable, either by recoding the
levels to (1,2,3,4,5,6) or by replacing -10 with 10.
Go to:
Data
Restructure
Step 1: Welcome Data Structure Wizard - Select the second choice, "Restructure selected
cases into variables", click "Next"
Step 2: Move "PATIENT ID" into the identifier variable box (upper left box) and
"Temperature" into the index variable box (lower left box), click "Next"
Step 3: Finish
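The reverse restructuring (longitudinal to horizontal) can also be sketched in Python, assuming pandas is installed. The column names ID, Temp and Vent and the output names Temp_10 ... Temp80 mirror the slides; only subjects 1 and 2 are included for brevity.

```python
import pandas as pd

# Longitudinal layout: one row per subject and temperature.
long = pd.DataFrame({
    "ID": [1] * 6 + [2] * 6,
    "Temp": [-10, 25, 37, 50, 65, 80] * 2,
    "Vent": [74.5, 81.5, 83.6, 68.6, 73.1, 79.4,
             75.5, 84.6, 70.6, 87.3, 73.0, 75.0],
})

# Cases to variables: ID identifies the row, Temp indexes the new columns.
wide = long.pivot(index="ID", columns="Temp", values="Vent")

# Build variable names without a minus sign, as in the horizontal file.
wide.columns = [f"Temp_{abs(t)}" if t < 0 else f"Temp{t}" for t in wide.columns]
wide = wide.reset_index()

print(wide)
```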
Mathematical Presentation of Correlation Structures
• Let $r_{jk}$ denote the correlation coefficient between the jth and kth repeated measures
on the same patient.
• $R(r_{jk})$ is the working correlation matrix of Y. Under independence:
\[
R = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
\]
This is what a linear regression model assumes.
Since observations are not independent in repeated measures data
(observations within a patient are dependent), we cannot use the
independence assumption. A linear mixed model requires an
assumption about the "correlation structure".
You need to make a guess about how measures taken repeatedly are correlated.
[Figure: Value (y-axis, -2.00 to 2.00) plotted against Category (Y1 to Y6), points labeled 1 to 20.]
The data in the figure above assume that the variance of Y is the same
across all categories and that the correlation between any 2 sets of Ys is zero
(independent). This structure is called independent (scaled identity in SPSS).
This is what 2-way ANOVA assumes for the structure of the error terms, which is now
obviously not a good fit to the isoproterenol data.
Let's look at the next figure. The variance of each Y is the same across all the
categories, and the correlation between any 2 Ys is the same (i.e., the correlation is the
same whether 2 doses are close together or far apart). This structure is called Compound
Symmetry (Exchangeable).
[Figure: Value (y-axis, -2.00 to 2.00) plotted against Category (Y1 to Y6), points labeled 1 to 20; compound symmetry example.]
Let's look at the next figure. The variance of each Y is the same across all
the categories, and the correlation between any 2 Ys is equal to r^(distance
between Ys), where -1<r<1 (i.e., the correlation weakens as 2 doses get
farther apart). This structure is called First-Order Autoregressive.
[Figure: Value (y-axis, -2.00 to 2.00) plotted against Category (Y1 to Y6), points labeled 1 to 20; first-order autoregressive example.]
AR(1): Heterogeneous. This is a first-order autoregressive structure with
heterogeneous variances.
[Figure: Value (y-axis, -2.00 to 2.00) plotted against Category (Y1 to Y6), points labeled 1 to 20; heterogeneous AR(1) example.]
Mathematical Presentation of Correlation Structures
• Let $r_{jk}$ denote the correlation coefficient between the jth and kth repeated
measures on the same patient.
• $R(r_{jk})$ is the working correlation matrix of Y
\[
R = \begin{pmatrix}
1 & r_{12} & \cdots & r_{1J} \\
r_{21} & 1 & \cdots & r_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
r_{J1} & r_{J2} & \cdots & 1
\end{pmatrix}
\]
Model for the correlation
• Independence (called "Scaled Identity" in SPSS)
\[
R = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
\]
• Any two observations within the same patient are uncorrelated
Model for the correlation (cont.)
• Exchangeable (compound symmetry)
\[
R = \begin{pmatrix}
1 & r & \cdots & r \\
r & 1 & \cdots & r \\
\vdots & \vdots & \ddots & \vdots \\
r & r & \cdots & 1
\end{pmatrix}
\]
• Any two distinct observations from the
same patient have the same correlation
coefficient ($r$)
Model for the correlation (cont.)
• Unstructured
\[
R = \begin{pmatrix}
1 & r_{12} & \cdots & r_{1J} \\
r_{21} & 1 & \cdots & r_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
r_{J1} & r_{J2} & \cdots & 1
\end{pmatrix}
\]
• Each $r_{jk}$ can take a different value; no structure is assumed in R
Model for the correlation (cont.)
• Autoregressive (1)
\[
R = \begin{pmatrix}
1 & r & \cdots & r^{J-1} \\
r & 1 & \cdots & r^{J-2} \\
\vdots & \vdots & \ddots & \vdots \\
r^{J-1} & r^{J-2} & \cdots & 1
\end{pmatrix}
\]
• $r_{jk} = r^{|j-k|}$ is a function of the time lag between 2 points
Toeplitz: Often fits well for experimental data.
Toeplitz. This covariance structure has homogeneous variances and
heterogeneous correlations between elements. The correlation between
adjacent elements is the same for every pair of adjacent elements.
The correlation between elements two positions apart (separated by a third
element) is again the same for every such pair, and so on.
\[
R = \begin{pmatrix}
1 & r_1 & r_2 & r_3 \\
r_1 & 1 & r_1 & r_2 \\
r_2 & r_1 & 1 & r_1 \\
r_3 & r_2 & r_1 & 1
\end{pmatrix}
\]
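To make the structures above concrete, here is a small NumPy sketch that builds each working correlation matrix. It assumes numpy is installed; the function names (independence, exchangeable, ar1, toeplitz) and the example values of r are hypothetical, chosen only for illustration.

```python
import numpy as np

def lag_matrix(n):
    """|j - k| for every pair of the n repeated measures."""
    idx = np.arange(n)
    return np.abs(np.subtract.outer(idx, idx))

def independence(n):
    """Scaled identity: all off-diagonal correlations are zero."""
    return np.eye(n)

def exchangeable(n, r):
    """Compound symmetry: every pair shares the same correlation r."""
    R = np.full((n, n), float(r))
    np.fill_diagonal(R, 1.0)
    return R

def ar1(n, r):
    """First-order autoregressive: correlation r ** |j - k|."""
    return float(r) ** lag_matrix(n)

def toeplitz(rs):
    """Toeplitz: one free correlation per lag, rs = [r1, r2, ...]."""
    band = np.array([1.0] + list(rs))
    return band[lag_matrix(len(band))]

print(exchangeable(4, 0.5))
print(ar1(4, 0.5))
print(toeplitz([0.6, 0.3, 0.1]))
```

Printing the matrices side by side shows the key difference: exchangeable keeps the same correlation at every lag, AR(1) forces a geometric decay, and Toeplitz lets each lag have its own value.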
Selection of correlation
structure
• If the number of repeats is small and the data are balanced and
complete, then an unstructured matrix is recommended.
• If observations are measured over time, then use a structure that
models correlation as a function of time (e.g., autoregressive).
Among candidate structures, choose the model with the smallest AIC value.
• If observations are clustered (i.e., no logical ordering), then
exchangeable may be appropriate.
SPSS offers the following covariance structures for repeated
measures data:
Ante-Dependence: First Order
AR(1)
AR(1): Heterogeneous
ARMA(1,1)
Compound Symmetry
Compound Symmetry: Correlation Metric
Compound Symmetry: Heterogeneous
Diagonal
Factor Analytic: First Order
Factor Analytic: First Order, Heterogeneous
Huynh-Feldt
Scaled Identity
Toeplitz
Toeplitz: Heterogeneous
Unstructured
Unstructured: Correlations
Let's use a linear mixed effect model to analyze the minute ventilation volume data.
Read deal.long.sav into SPSS.
Analyze,
Mixed Models, Linear,
Select ID as the Subject variable
Select Temp as the Repeated variable
Select an appropriate covariance structure in Repeated Covariance Type,
for example AR(1): Heterogeneous, Continue
Dependent variable: Vent
Factor variable (categorical independent variable): Temp
Covariates (continuous independent variables):
Click Fixed
Click Custom, highlight all independent variables in the box
Choose Main Effects, and put them (Temp) in the Model box
Click EM Means
Move Temp into the "Display means for" box
Click Compare main effects, set the reference category to "first"
Select the Bonferroni method
Click Statistics
Choose Parameter estimates, Tests for covariance parameters, Covariances of residuals
Click Save
Select Residuals, Predicted Values
OK
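A rough Python counterpart can be sketched with statsmodels, with one important swap: this sketch fits a random intercept per subject (which implies a compound-symmetry correlation) instead of the ARH(1) residual structure selected in SPSS, so standard errors and tests will not match the SPSS output. It assumes pandas and statsmodels are installed; the column names id, temp and vent are illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf

TEMPS = [-10, 25, 37, 50, 65, 80]
VENT = {
    1: [74.5, 81.5, 83.6, 68.6, 73.1, 79.4],
    2: [75.5, 84.6, 70.6, 87.3, 73.0, 75.0],
    3: [68.9, 71.6, 55.9, 61.9, 60.5, 61.8],
    4: [57.0, 61.3, 54.1, 59.2, 56.6, 58.8],
    5: [78.3, 84.9, 64.0, 62.2, 60.1, 78.7],
    6: [54.0, 62.8, 63.0, 58.0, 56.0, 51.5],
    7: [72.5, 68.3, 67.8, 71.5, 65.0, 67.7],
    8: [80.8, 89.9, 83.2, 83.0, 85.7, 79.6],
}

# Longitudinal layout: one row per subject and temperature.
rows = [{"id": i, "temp": t, "vent": v}
        for i, vals in VENT.items()
        for t, v in zip(TEMPS, vals)]
long = pd.DataFrame(rows)

# Temperature as a categorical fixed effect (reference level: -10),
# subject as the grouping factor with a random intercept.
model = smf.mixedlm("vent ~ C(temp)", data=long, groups=long["id"])
fit = model.fit(reml=True)

print(fit.summary())
```

Because the design is balanced, the fixed-effect estimates are simply the temperature means relative to -10; what changes with the assumed correlation structure is their precision.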
Result of the linear mixed model with ARH(1)

Information Criteria(a)
-2 Restricted Log Likelihood              297.714
Akaike's Information Criterion (AIC)      311.714
Hurvich and Tsai's Criterion (AICC)       315.008
Bozdogan's Criterion (CAIC)               330.877
Schwarz's Bayesian Criterion (BIC)        323.877
The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: Minute Ventilation Volume.

Type III Tests of Fixed Effects(a)
Source       Numerator df   Denominator df   F         Sig.
Intercept    1              7.049            485.337   .000
temp         5              21.222           3.099     .030
a. Dependent Variable: Minute Ventilation Volume.
Pairwise Comparisons(b)
                                      Mean Difference                                      95% Confidence Interval for Difference(a)
(I) Temperature   (J) Temperature     (I-J)             Std. Error   df       Sig.(a)     Lower Bound   Upper Bound
25                -10                 5.425             2.339        14.346   .178        -1.513        12.363
37                -10                 -2.413            3.412        16.480   1.000       -12.341       7.516
50                -10                 -1.225            3.783        20.843   1.000       -11.943       9.493
65                -10                 -3.938            3.843        23.946   1.000       -14.688       6.813
80                -10                 -1.125            4.246        19.444   1.000       -13.243       10.993
Based on estimated marginal means
a. Adjustment for multiple comparisons: Bonferroni.
b. Dependent Variable: Minute Ventilation Volume.
Pairwise Comparisons(b)
                                      Mean Difference                                      95% Confidence Interval for Difference(a)
(I) Temperature   (J) Temperature     (I-J)             Std. Error   df       Sig.(a)     Lower Bound   Upper Bound
-10               25                  -5.425*           2.339        14.346   .036        -10.431       -.419
-10               37                  2.413             3.412        16.480   .489        -4.804        9.629
-10               50                  1.225             3.783        20.843   .749        -6.645        9.095
-10               65                  3.938             3.843        23.946   .316        -3.995        11.870
-10               80                  1.125             4.246        19.444   .794        -7.749        9.999
25                -10                 5.425*            2.339        14.346   .036        .419          10.431
25                37                  7.838*            2.678        16.722   .010        2.180         13.495
25                50                  6.650             3.458        23.877   .066        -.489         13.789
25                65                  9.363*            3.787        26.581   .020        1.585         17.140
25                80                  6.550             4.285        23.101   .140        -2.312        15.412
37                -10                 -2.413            3.412        16.480   .489        -9.629        4.804
37                25                  -7.838*           2.678        16.722   .010        -13.495       -2.180
37                50                  -1.188            2.748        19.753   .670        -6.925        4.550
37                65                  1.525             3.528        21.550   .670        -5.800        8.850
37                80                  -1.288            4.177        22.832   .761        -9.932        7.357
50                -10                 -1.225            3.783        20.843   .749        -9.095        6.645
50                25                  -6.650            3.458        23.877   .066        -13.789       .489
50                37                  1.188             2.748        19.753   .670        -4.550        6.925
50                65                  2.713             2.579        17.391   .307        -2.720        8.145
50                80                  -.100             3.513        20.601   .978        -7.415        7.215
65                -10                 -3.938            3.843        23.946   .316        -11.870       3.995
65                25                  -9.363*           3.787        26.581   .020        -17.140       -1.585
65                37                  -1.525            3.528        21.550   .670        -8.850        5.800
65                50                  -2.713            2.579        17.391   .307        -8.145        2.720
65                80                  -2.813            2.496        14.017   .279        -8.165        2.540
80                -10                 -1.125            4.246        19.444   .794        -9.999        7.749
80                25                  -6.550            4.285        23.101   .140        -15.412       2.312
80                37                  1.288             4.177        22.832   .761        -7.357        9.932
80                50                  .100              3.513        20.601   .978        -7.215        7.415
80                65                  2.813             2.496        14.017   .279        -2.540        8.165
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
a. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).
b. Dependent Variable: Minute Ventilation Volume.

In this table the p-values were not adjusted for multiple comparisons.
Select the Bonferroni option to adjust for multiple comparisons.
With the adjustment, none of the pairwise comparisons was significant.
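The Bonferroni adjustment itself is simple arithmetic: each unadjusted p-value is multiplied by the number of comparisons and capped at 1. The sketch below applies it to the five comparisons against -10, using the rounded unadjusted p-values from the table above; treating these five comparisons as the family is an assumption that matches the "reference category: first" setting. The function name bonferroni is illustrative.

```python
# Unadjusted p-values for 25, 37, 50, 65 and 80 vs. -10,
# copied (rounded to 3 decimals) from the unadjusted pairwise table.
unadjusted = {25: 0.036, 37: 0.489, 50: 0.749, 65: 0.316, 80: 0.794}

def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {key: min(1.0, p * m) for key, p in p_values.items()}

adjusted = bonferroni(unadjusted)
for temp, p in adjusted.items():
    print(f"{temp:>3} vs -10: adjusted p = {p:.3f}")
```

The adjusted value for 25 vs. -10 comes out near .18, close to the .178 in the Bonferroni table; a small difference is expected because the input here is already rounded.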
Predicted model by the linear mixed effect model with ARH(1)
correlation structure.
[Figure: fixed predicted values (y-axis, about 66 to 74) vs. Temperature (x-axis, 0 to 75). Dots/lines show means.]
Performing one-way ANOVA using the linear mixed effect model option in SPSS
When you use the independence (scaled identity) structure for the correlation, the
model becomes equivalent to one-way ANOVA.
Result of the linear mixed model ignoring dependency among repeated measures.

Information Criteria(a)
-2 Restricted Log Likelihood              330.525
Akaike's Information Criterion (AIC)      332.525
Hurvich and Tsai's Criterion (AICC)       332.625
Bozdogan's Criterion (CAIC)               335.263
Schwarz's Bayesian Criterion (BIC)        334.263
The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: Minute Ventilation Volume.
These values are larger than those of the ARH(1) model above.

Type III Tests of Fixed Effects(a)
Source       Numerator df   Denominator df   F          Sig.
Intercept    1              42               2045.152   .000
temp         5              42               .727       .607
a. Dependent Variable: Minute Ventilation Volume.
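The information criteria in the two SPSS tables can be reproduced from the -2 restricted log likelihood alone. The sketch below uses the textbook formulas and two assumptions that happen to reproduce the printed values: that only covariance parameters are counted (d = 7 for ARH(1) with six time points, d = 1 for scaled identity) and that the effective sample size is n = 48 - 6 = 42 (observations minus fixed-effect parameters). The function name information_criteria is illustrative.

```python
import math

def information_criteria(neg2_rll, d, n):
    """Smaller-is-better criteria from a -2 restricted log likelihood.

    d: number of covariance parameters (assumption: only these are counted)
    n: effective sample size (assumption: observations minus fixed effects)
    """
    return {
        "AIC": neg2_rll + 2 * d,
        "AICC": neg2_rll + 2 * d * n / (n - d - 1),
        "CAIC": neg2_rll + d * (math.log(n) + 1),
        "BIC": neg2_rll + d * math.log(n),
    }

n_eff = 48 - 6  # 48 observations, intercept plus 5 temperature contrasts

arh1 = information_criteria(297.714, d=7, n=n_eff)      # 6 variances + 1 rho
identity = information_criteria(330.525, d=1, n=n_eff)  # 1 common variance

for name in ("AIC", "AICC", "CAIC", "BIC"):
    print(f"{name:<5} ARH(1) = {arh1[name]:.3f}   scaled identity = {identity[name]:.3f}")
```

Every criterion is smaller for ARH(1), which is the basis for preferring it over the independence structure for these data.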
Predicted model by the linear mixed effect model with independence
correlation structure.
[Figure: predicted values (y-axis, about 66 to 74) vs. Temperature (x-axis, 0 to 75). Dots/lines show means.]
Model parameter estimates are the same as those of the model with ARH(1);
however, standard errors are much smaller in the model that accounts for
correlation among repeated measures.
Performing residual diagnostics for a linear mixed model in SPSS.
When you ran the ARH(1) analysis above with the Save option,
you created 2 new variables: residuals and predicted values.
Go to Graphs, Histogram to create graph 1.
Go to Graphs, Scatter plot to create graph 2.
Graph 1: Checking for normality
Graph 2: Checking for trend
in residuals (if there is a trend,
you may want to try a
transformation of Y)
Fitting slopes (null hypothesis 2: slope = 0)
In the previous analysis, we treated temperature as a categorical variable, which
tests whether mean ventilation volume is the same at every temperature. After
adjustment for multiple comparisons, none of the pairwise comparisons showed a
difference, because the adjustment costs power. To assess whether there is an
increasing or decreasing trend with temperature, we can of course analyze these data
with a regression slope, treating temperature as continuous instead of categorical.
[Figure: Minute Ventilation Volume (y-axis, 50 to 90) vs. Temperature (x-axis, 0 to 75), with a linear regression line and 95% mean prediction interval.
Minute Ventilation Volume = 71.48 - 0.04 * temp, R-Square = 0.02. A LOWESS curve is overlaid.]
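The pooled regression line in the figure can be reproduced with an ordinary least-squares fit over all 48 observations. This is a sketch assuming numpy is installed; like the figure, it ignores which subject each value came from. VENT and TEMPS are illustrative names, with data copied from the ventilation table.

```python
import numpy as np

TEMPS = [-10, 25, 37, 50, 65, 80]
VENT = {
    1: [74.5, 81.5, 83.6, 68.6, 73.1, 79.4],
    2: [75.5, 84.6, 70.6, 87.3, 73.0, 75.0],
    3: [68.9, 71.6, 55.9, 61.9, 60.5, 61.8],
    4: [57.0, 61.3, 54.1, 59.2, 56.6, 58.8],
    5: [78.3, 84.9, 64.0, 62.2, 60.1, 78.7],
    6: [54.0, 62.8, 63.0, 58.0, 56.0, 51.5],
    7: [72.5, 68.3, 67.8, 71.5, 65.0, 67.7],
    8: [80.8, 89.9, 83.2, 83.0, 85.7, 79.6],
}

# Stack all subjects: temperature repeated 8 times, ventilation flattened.
x = np.array(TEMPS * len(VENT), dtype=float)
y = np.array([v for vals in VENT.values() for v in vals])

# Degree-1 polynomial fit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
r_squared = np.corrcoef(x, y)[0, 1] ** 2

print(f"Vent = {intercept:.2f} + {slope:.2f} * temp, R-Square = {r_squared:.2f}")
```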
Fitting a linear mixed effect model to assess whether the slope is greater or less than zero.
Read deal.long.sav into SPSS.
Analyze,
Mixed Models,
Linear,
Select ID as the Subject variable
Select Temp as the Repeated variable
Select an appropriate covariance structure in Repeated Covariance Type,
for example AR(1): Heterogeneous
Continue
Dependent variable: Vent
Factor variable (categorical independent variable):
Covariates (continuous independent variables): Temp
Click Fixed
Click Custom, highlight all independent variables in the box
Choose Main Effects, and put them (Temp) in the Model box
Click Statistics
Choose Parameter estimates, Tests for covariance parameters
Click Save
Select Residuals, Predicted Values
OK
Result of the linear mixed model with ARH(1), with temp as continuous.

Information Criteria(a)
-2 Restricted Log Likelihood              332.417
Akaike's Information Criterion (AIC)      346.417
Hurvich and Tsai's Criterion (AICC)       349.364
Bozdogan's Criterion (CAIC)               366.217
Schwarz's Bayesian Criterion (BIC)        359.217
The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: Minute Ventilation Volume.

Estimates of Fixed Effects(a)
                                                                     95% Confidence Interval
Parameter    Estimate    Std. Error   df       t        Sig.   Lower Bound   Upper Bound
Intercept    66.97027    2.929443     8.409    22.861   .000   60.271723     73.668818
temp         .013971     .045589      22.702   .306     .762   -.080406      .108348
a. Dependent Variable: Minute Ventilation Volume.

P=0.762 gives no evidence that the slope differs from zero.
A much simpler way to analyze these data:
Response feature analysis, i.e., analysis of summary
measures, using the slope as the summary measure:
Read the Deal Longitudinal.sav dataset into SPSS
Graphs, Interactive, Scatterplot,
Select Vent as Y-axis, Temp (as Scale) as X-axis, ID (as Categorical) as panel variable
Click Fit
Select "Include constant in equation"
Prediction line for "Mean"
Fit line for "Total"
OK
[Figure: VF (y-axis, 50 to 100) vs. Temp (x-axis, 0 to 75), one panel per subject (ID 1 to 8), each with a linear regression line and 95% mean prediction interval. Fitted lines:]

ID    Fitted line                  R-Square
1     VF = 77.16 - 0.01 * Temp     0.00
2     VF = 78.49 - 0.02 * Temp     0.01
3     VF = 67.73 - 0.10 * Temp     0.33
4     VF = 57.65 + 0.00 * Temp     0.00
5     VF = 76.32 - 0.12 * Temp     0.13
6     VF = 59.13 - 0.04 * Temp     0.07
7     VF = 71.14 - 0.06 * Temp     0.43
8     VF = 84.25 - 0.01 * Temp     0.01
Now, open Deal.sav, create a new variable Slope, and type in each person's slope value.
Then perform a one-sample nonparametric test (N is small) on the slopes.
To perform a one-sample nonparametric
test in SPSS, you need this trick.
Create a dummy variable with all 0's
Then go:
Analyze
Nonparametric Tests
2-related samples
Move Slope and Dummy into the test pair list
Select Wilcoxon as the test type
OK
(SPSS does not run this test with the Slope variable alone)
ID    Slope    Dummy
1     -0.01    0
2     -0.02    0
3     -0.10    0
4     0        0
5     -0.12    0
6     -0.04    0
7     -0.06    0
8     -0.01    0

Test Statistics(b)
                          dummy - Slope
Z                         -2.371(a)
Asymp. Sig. (2-tailed)    .018
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

Using slopes to test for trends:
Wilcoxon signed-rank test: P=0.018
We can now conclude that there is a significant association between
minute ventilation volume and temperature (as temperature increases,
ventilation volume decreases).
Using summary measures can provide a more intuitive and simpler
approach, which sometimes provides greater power to detect
differences.
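The whole response feature analysis (fit one slope per subject, then test the slopes against zero) can be sketched in a few lines of Python, assuming numpy and scipy are installed. Note one difference from the SPSS run: this sketch uses the unrounded fitted slopes, whereas the table above uses slopes rounded to two decimals (so subject 4 drops out as an exact zero there), which means the p-value need not equal 0.018 exactly. VENT and TEMPS are illustrative names.

```python
import numpy as np
from scipy import stats

TEMPS = [-10, 25, 37, 50, 65, 80]
VENT = {
    1: [74.5, 81.5, 83.6, 68.6, 73.1, 79.4],
    2: [75.5, 84.6, 70.6, 87.3, 73.0, 75.0],
    3: [68.9, 71.6, 55.9, 61.9, 60.5, 61.8],
    4: [57.0, 61.3, 54.1, 59.2, 56.6, 58.8],
    5: [78.3, 84.9, 64.0, 62.2, 60.1, 78.7],
    6: [54.0, 62.8, 63.0, 58.0, 56.0, 51.5],
    7: [72.5, 68.3, 67.8, 71.5, 65.0, 67.7],
    8: [80.8, 89.9, 83.2, 83.0, 85.7, 79.6],
}

# Summary measure: least-squares slope of ventilation on temperature,
# fitted separately for each subject.
slopes = {i: float(np.polyfit(TEMPS, vals, 1)[0]) for i, vals in VENT.items()}
for i, s in slopes.items():
    print(f"ID {i}: slope = {s:.3f}")

# One-sample Wilcoxon signed-rank test of the slopes against zero.
result = stats.wilcoxon(list(slopes.values()))
print(f"Wilcoxon signed-rank: statistic = {result.statistic}, p = {result.pvalue:.3f}")
```

Reducing each subject to a single number sidesteps the correlation problem entirely, because the eight slopes come from eight different subjects and can reasonably be treated as independent.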