How to Analyze and Graphically Present Longitudinal Data
Ayumi Shintani, Ph.D., M.P.H.
Department of Biostatistics
Ayumi.shintani@vanderbilt.edu
For handouts and datasets: http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/GCRCNoonWorkshops

Example 1. More than 2 repeated measures with 1 group

From Table 1 of Deal et al (1979): Role of respiratory heat exchange in production of exercise-induced asthma. J Appl Physiol 46:467-475

Minute Ventilation Volume vs. Temperature: Dry Gas Experiments
Ventilation in l/min at each temperature

ID      -10     25      37      50      65      80      Mean    SD      Slope
1       74.5    81.5    83.6    68.6    73.1    79.4    76.8    5.7     -0.01
2       75.5    84.6    70.6    87.3    73.0    75.0    77.7    6.7     -0.02
3       68.9    71.6    55.9    61.9    60.5    61.8    63.4    5.8     -0.10
4       57.0    61.3    54.1    59.2    56.6    58.8    57.8    2.5     0
5       78.3    84.9    64.0    62.2    60.1    78.7    71.4    10.5    -0.12
6       54.0    62.8    63.0    58.0    56.0    51.5    57.6    4.7     -0.04
7       72.5    68.3    67.8    71.5    65.0    67.7    68.8    2.8     -0.06
8       80.8    89.9    83.2    83.0    85.7    79.6    83.7    3.7     -0.01
Mean    70.2    75.6    67.8    69.0    66.3    69.1                    -4.5
SD      9.8     11.0    11.1    11.0    10.3    10.8                    4.6

The mean and SD of the slopes in the last two rows are in units of 0.01 (mean slope -0.045, SD 0.046).

[Figure: Mean minute ventilation volume (60 to 80) at each temperature (Temp_10, Temp25, Temp37, Temp50, Temp65, Temp80). Dots/lines show means; error bars show 95% CI of the mean.]

[Figure: Minute ventilation volume (50 to 90) at each temperature (-10, 25, 37, 50, 65, 80), one line per subject (ID 1 to 8).]

We want to analyze whether there is an association between minute ventilation volume and temperature. What hypothesis do you want to test, i.e., what exactly do you want to compare?

Null hypothesis 1: The mean minute ventilation volume is the same at all temperatures.
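The per-temperature means, SDs and 95% confidence intervals shown in the table and the error-bar plot can be recomputed from the raw values. The following is a minimal sketch in Python (pandas and SciPy are swapped in here for SPSS; the names `wide` and `summary` are illustrative, not part of the handout), assuming the usual t-based interval for a mean.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Minute ventilation volume (l/min), one row per subject, one column per temperature
temps = [-10, 25, 37, 50, 65, 80]
vent = [
    [74.5, 81.5, 83.6, 68.6, 73.1, 79.4],
    [75.5, 84.6, 70.6, 87.3, 73.0, 75.0],
    [68.9, 71.6, 55.9, 61.9, 60.5, 61.8],
    [57.0, 61.3, 54.1, 59.2, 56.6, 58.8],
    [78.3, 84.9, 64.0, 62.2, 60.1, 78.7],
    [54.0, 62.8, 63.0, 58.0, 56.0, 51.5],
    [72.5, 68.3, 67.8, 71.5, 65.0, 67.7],
    [80.8, 89.9, 83.2, 83.0, 85.7, 79.6],
]
wide = pd.DataFrame(vent, columns=temps, index=pd.Index(range(1, 9), name="ID"))

n = len(wide)                      # 8 subjects
mean = wide.mean()                 # mean at each temperature
sd = wide.std(ddof=1)              # sample SD at each temperature
half_width = stats.t.ppf(0.975, df=n - 1) * sd / np.sqrt(n)

summary = pd.DataFrame(
    {"mean": mean, "sd": sd, "ci_low": mean - half_width, "ci_high": mean + half_width}
)
print(summary.round(1))
```

The wide intervals relative to the differences between the means already hint at why an analysis that ignores the pairing within subjects has little power here.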
First, let's ignore the repeated measures and perform a one-way ANOVA.

To perform ANOVA, you first need to transform the data from horizontal to longitudinal format. The longitudinal format uses only one variable for the outcome measure, as opposed to the horizontal format, where a separate outcome variable is created for each repeated measure.

In SPSS go: Data > Restructure
Step 1: Welcome to the Restructure Data Wizard
 - Select the first choice
Step 2: Variables to Cases: Number of variable groups
 - Select the first choice
Step 3: Variables to Cases: Select variables
 * Select Temp_10 through Temp80 (in the right order) into the Variables to be transposed box (use the Shift key on your keyboard to highlight them all at once)
 * Type the name of the output variable (for example, Vent)
 * Select ID into the Fixed variable box, Next
Step 4: Variables to Cases: Create Index Variables
 - Select the first choice
Step 5: Variables to Cases: Create One Index Variable
 - Click on the first choice (sequential numbers)
 - Edit the index variable name from Index1 to TEMP (for temperature)
Step 6: Handling variables not selected
 - Keep and treat as fixed variables; the rest remains the same
Step 7: Do nothing, Finish (before you do this, make sure you have saved the original horizontal file)

Recode the levels of Temp to the actual values (-10, 25, 37, ..., 80).
Save the new longitudinal dataset as Deal Longitudinal.sav

Now, let's perform one-way ANOVA:
Analyze > Compare Means > One-way ANOVA
Select Vent as the dependent variable and Temp as the Factor variable

ANOVA (VF)
                  Sum of Squares    df    Mean Square    F       Sig.
Between Groups    413.867           5     82.773         .727    .607
Within Groups     4780.830          42    113.829
Total             5194.697          47

Another way:
Analyze > General Linear Model > Univariate (Multivariate is for when you have more than 1 dependent variable)
Select Vent as the dependent variable and Temp as the fixed Factor variable
Click Plots: select Temp as the horizontal variable, click Add, Continue
Click Model: select Custom, select Temp into the Model box, Continue
OK

Authors' test for effect of temperature: one-way analysis of variance, F(5,42) = .72, p > 0.5

Results of one-way ANOVA

Tests of Between-Subjects Effects (Dependent Variable: Vent)
Source             Type III Sum of Squares    df    Mean Square    F           Sig.
Corrected Model    413.867 (a)                5     82.773         .727        .607
Intercept          232798.163                 1     232798.163     2045.152    .000
Temp               413.867                    5     82.773         .727        .607
Error              4780.830                   42    113.829
Total              237992.860                 48
Corrected Total    5194.697                   47
a. R Squared = .080 (Adjusted R Squared = -.030)

The authors concluded that minute ventilation volume did not differ across temperatures.

The flaws in this analysis are:
1. The normality of the distributions of ventilation volumes has not been checked. If normality is in doubt, use the Kruskal-Wallis test.
2. The observations within a single subject are not independent. The subject identification was not used in the analysis. The analysis does not remove variation among subjects by considering each subject as his own control, making use of the fact that variation within subjects is usually less than the variation between subjects.

                 Ignoring correlation among measurements    Considering correlation among measurements
Nonparametric    Kruskal-Wallis test (p=0.613)              Friedman test (p=0.023)
Parametric       One-way ANOVA (GLM) (p=0.607)              Linear mixed effect model (???)

Problem 1 (normality) is the difference between the two rows; Problem 2 (independence) is the difference between the two columns.

To solve problem 2, we can use a linear mixed effect model. A linear mixed effect model is similar to linear regression (or the general linear model) in that the outcome variable is continuous. Linear regression assumes normality and independence of residuals. A linear mixed model also requires the normality assumption; however, it does not require the independence assumption.
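The restructuring step and the three tests in the table above that have a reported p-value can be sketched outside SPSS as follows. This is a minimal illustration in Python, assuming pandas and SciPy as stand-ins for the SPSS dialogs; the column names `ID`, `Temp` and `Vent` mirror the handout, while `wide`, `long` and `groups` are illustrative names.

```python
import pandas as pd
from scipy import stats

temps = [-10, 25, 37, 50, 65, 80]
vent = [
    [74.5, 81.5, 83.6, 68.6, 73.1, 79.4],
    [75.5, 84.6, 70.6, 87.3, 73.0, 75.0],
    [68.9, 71.6, 55.9, 61.9, 60.5, 61.8],
    [57.0, 61.3, 54.1, 59.2, 56.6, 58.8],
    [78.3, 84.9, 64.0, 62.2, 60.1, 78.7],
    [54.0, 62.8, 63.0, 58.0, 56.0, 51.5],
    [72.5, 68.3, 67.8, 71.5, 65.0, 67.7],
    [80.8, 89.9, 83.2, 83.0, 85.7, 79.6],
]
wide = pd.DataFrame(vent, columns=temps, index=pd.Index(range(1, 9), name="ID"))

# Horizontal (wide) -> longitudinal (long): one row per subject and temperature
long = wide.reset_index().melt(id_vars="ID", var_name="Temp", value_name="Vent")
long["Temp"] = long["Temp"].astype(int)
long = long.sort_values(["Temp", "ID"]).reset_index(drop=True)

# One array per temperature, each ordered by subject ID
groups = [g["Vent"].to_numpy() for _, g in long.groupby("Temp")]

anova = stats.f_oneway(*groups)             # ignores the pairing within subjects
kruskal = stats.kruskal(*groups)            # nonparametric, also ignores the pairing
friedman = stats.friedmanchisquare(*groups) # nonparametric, uses the pairing

print(f"One-way ANOVA : F = {anova.statistic:.3f}, p = {anova.pvalue:.3f}")
print(f"Kruskal-Wallis: H = {kruskal.statistic:.3f}, p = {kruskal.pvalue:.3f}")
print(f"Friedman      : chi2 = {friedman.statistic:.3f}, p = {friedman.pvalue:.3f}")
```

Only the Friedman test treats each subject as a block, which is why it is the only one of the three that detects a temperature effect.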
We will talk about the normality part later; here let's spend some time on the independence assumption.

ID    temp    trans1
1     -10     74.5
1     25      81.5
1     37      83.6
1     50      68.6
1     65      73.1
1     80      79.4
2     -10     75.5
2     25      84.6
2     37      70.6
2     50      87.3
2     65      73.0
2     80      75.0
3     -10     68.9
3     25      71.6
3     37      55.9

Are they independent of each other? A quick way to check this is to do a correlation analysis.

Correlations (Spearman's rho; N = 8 for every pair)

Temperature                  -10       25        37        50        65        80
-10    Coefficient           1.000     .952**    .690      .786*     .738*     .929**
       Sig. (2-tailed)       .         .000      .058      .021      .037      .001
25     Coefficient           .952**    1.000     .667      .690      .690      .881**
       Sig. (2-tailed)       .000      .         .071      .058      .058      .004
37     Coefficient           .690      .667      1.000     .762*     .857**    .833*
       Sig. (2-tailed)       .058      .071      .         .028      .007      .010
50     Coefficient           .786*     .690      .762*     1.000     .857**    .738*
       Sig. (2-tailed)       .021      .058      .028      .         .007      .037
65     Coefficient           .738*     .690      .857**    .857**    1.000     .857**
       Sig. (2-tailed)       .037      .058      .007      .007      .         .007
80     Coefficient           .929**    .881**    .833*     .738*     .857**    1.000
       Sig. (2-tailed)       .001      .004      .010      .037      .007      .
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

If the observations were independent, you would expect p > 0.05 for the Spearman correlation coefficients. Here 11 of the 15 pairs are significant at the 0.05 level, so the repeated measures are clearly correlated.

To perform the correlation analysis, the data must be entered horizontally; you can use the deal.sav dataset for this. If you created your original database longitudinally and want to convert it to horizontal, here is how to do it.

Read Deal.long.sav into SPSS.
Before you restructure this data, recode the negative value of the Temperature variable, either by recoding the levels to (1,2,3,4,5,6) or by replacing -10 with 10.
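The correlation check above, together with the longitudinal-to-horizontal conversion, can also be sketched in Python. This is an illustration under the assumption that pandas and SciPy stand in for SPSS; `long`, `wide_again`, `rho` and `n_significant` are illustrative names. No recoding of the negative temperature is needed in this setting, since column labels may be any number.

```python
from itertools import combinations

import pandas as pd
from scipy import stats

temps = [-10, 25, 37, 50, 65, 80]
vent = [
    [74.5, 81.5, 83.6, 68.6, 73.1, 79.4],
    [75.5, 84.6, 70.6, 87.3, 73.0, 75.0],
    [68.9, 71.6, 55.9, 61.9, 60.5, 61.8],
    [57.0, 61.3, 54.1, 59.2, 56.6, 58.8],
    [78.3, 84.9, 64.0, 62.2, 60.1, 78.7],
    [54.0, 62.8, 63.0, 58.0, 56.0, 51.5],
    [72.5, 68.3, 67.8, 71.5, 65.0, 67.7],
    [80.8, 89.9, 83.2, 83.0, 85.7, 79.6],
]

# Longitudinal layout: one row per (ID, Temp) pair
long = pd.DataFrame(
    [(i + 1, t, v) for i, row in enumerate(vent) for t, v in zip(temps, row)],
    columns=["ID", "Temp", "Vent"],
)

# Longitudinal -> horizontal: one row per subject, one column per temperature
wide_again = long.pivot(index="ID", columns="Temp", values="Vent")

# Spearman correlation between every pair of temperatures
rho = wide_again.corr(method="spearman")
print(rho.round(3))

# Two-sided p-value for each of the 15 distinct pairs
n_significant = 0
for a, b in combinations(temps, 2):
    p = stats.spearmanr(wide_again[a], wide_again[b]).pvalue
    n_significant += p < 0.05
    print(f"{a:>4} vs {b:>3}: rho = {rho.loc[a, b]:.3f}, p = {p:.3f}")
print("pairs significant at 0.05:", n_significant)
```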
Go to: Data > Restructure
Step 1: Welcome to the Restructure Data Wizard
 - Select the second choice, "Restructure selected cases into variables", click Next
Step 2: Select "PATIENT ID" into the identifier variable box (upper left box) and "Temperature" into the index variable box (lower left box), click Next
Step 3: Finish

Mathematical Presentation of Correlation Structures
• Let r_jk denote the correlation coefficient between the jth and kth repeated measures on the same patient.
• R = (r_jk) is the working correlation matrix of Y.

\[
R = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
\]

This is what a linear regression model assumes. Since observations are not independent for repeated measures data (observations within a patient are dependent), we cannot use the independence assumption. A linear mixed model requires an assumption about the "correlation structure": you need to make a guess about how the repeated measures correlate.

[Figure: Value (-2 to 2) plotted for categories Y1 to Y6, points labeled 1 to 20; independent structure.]

The data in the above figure assume that the variance of Y is the same in every category and that the correlation between any 2 sets of Ys is zero (independent). This structure is called independent (Scaled Identity in SPSS). This is what 2-way ANOVA assumes for the structure of the error terms, which obviously does not provide a good fit to the isoproterenol data.

Let's look at the next figure. The variance of Y is the same in every category, and the correlation between any 2 Ys is the same (i.e., the correlation is the same whether 2 doses are close together or not). This structure is called Compound Symmetry (Exchangeable).

[Figure: Value (-2 to 2) plotted for categories Y1 to Y6, points labeled 1 to 20; compound symmetry structure.]

Let's look at the next figure. The variance of Y is the same in every category, and the correlation between any 2 Ys is equal to r^(distance between the Ys), where -1 < r < 1 (i.e., the correlation weakens as 2 doses get farther apart).
This structure is called First-Order Autoregressive, AR(1).

[Figure: Value (-2 to 2) plotted for categories Y1 to Y6, points labeled 1 to 20; AR(1) structure.]

AR(1): Heterogeneous. This is a first-order autoregressive structure with heterogeneous variances.

[Figure: Value (-2 to 2) plotted for categories Y1 to Y6, points labeled 1 to 20; heterogeneous AR(1) structure.]

In general, the working correlation matrix of Y for J repeated measures is

\[
R = \begin{pmatrix}
1 & r_{12} & \cdots & r_{1J} \\
r_{21} & 1 & \cdots & r_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
r_{J1} & r_{J2} & \cdots & 1
\end{pmatrix}
\]

Model for the correlation
• Independence (called "Scaled Identity" in SPSS)

\[
R = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
\]

Any two observations within the same patient are uncorrelated.

Model for the correlation (cont.)
• Exchangeable (compound symmetry)

\[
R = \begin{pmatrix}
1 & r & \cdots & r \\
r & 1 & \cdots & r \\
\vdots & \vdots & \ddots & \vdots \\
r & r & \cdots & 1
\end{pmatrix}
\]

Any two distinct observations from the same patient have the same correlation coefficient (r).

Model for the correlation (cont.)
• Unstructured

\[
R = \begin{pmatrix}
1 & r_{12} & \cdots & r_{1J} \\
r_{21} & 1 & \cdots & r_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
r_{J1} & r_{J2} & \cdots & 1
\end{pmatrix}
\]

Each r_jk has a different value; no structure is assumed in R.

Model for the correlation (cont.)
• Autoregressive (1)

\[
R = \begin{pmatrix}
1 & r & \cdots & r^{J-1} \\
r & 1 & \cdots & r^{J-2} \\
\vdots & \vdots & \ddots & \vdots \\
r^{J-1} & r^{J-2} & \cdots & 1
\end{pmatrix}
\]

r_jk is a function of the time lag between the 2 points.

• Toeplitz: often fits well for experimental data. This covariance structure has homogeneous variances and heterogeneous correlations between elements. The correlation between adjacent elements is homogeneous across pairs of adjacent elements. The correlation between elements separated by a third is again homogeneous, and so on.

\[
R = \begin{pmatrix}
1 & r_1 & r_2 & r_3 \\
r_1 & 1 & r_1 & r_2 \\
r_2 & r_1 & 1 & r_1 \\
r_3 & r_2 & r_1 & 1
\end{pmatrix}
\]

Selection of correlation structure
• If the number of repeats is small and the data are balanced and complete, then an unstructured matrix is recommended.
• If observations are measured over time, then use a structure that accounts for correlation as a function of time (e.g., autoregressive); choose the model that gives the smallest AIC value.
• If observations are clustered (i.e., no logical ordering), then exchangeable may be appropriate.

The following covariance structures for repeated measures data are available in SPSS:
Ante-Dependence: First Order
AR(1)
AR(1): Heterogeneous
ARMA(1,1)
Compound Symmetry
Compound Symmetry: Correlation Metric
Compound Symmetry: Heterogeneous
Diagonal
Factor Analytic: First Order
Factor Analytic: First Order, Heterogeneous
Huynh-Feldt
Scaled Identity
Toeplitz
Toeplitz: Heterogeneous
Unstructured
Unstructured: Correlations

Let's use a linear mixed effect model to analyze the minute ventilation volume data.
Read Deal.long.sav into SPSS.
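Before running the model, it can help to see the correlation structures described above as actual numbers. The sketch below builds each one with NumPy; the function names and the example values of r and of the standard deviations are hypothetical and chosen only for illustration, not estimated from the ventilation data.

```python
import numpy as np
from scipy.linalg import toeplitz


def independence(J):
    """Scaled identity: no correlation between repeated measures."""
    return np.eye(J)


def exchangeable(J, r):
    """Compound symmetry: the same correlation r for every distinct pair."""
    R = np.full((J, J), float(r))
    np.fill_diagonal(R, 1.0)
    return R


def ar1(J, r):
    """First-order autoregressive: correlation r**lag decays with the lag."""
    idx = np.arange(J)
    lag = np.abs(idx[:, None] - idx[None, :])
    return float(r) ** lag


def toeplitz_corr(rs):
    """Toeplitz: one free correlation per lag (rs[0] for lag 1, rs[1] for lag 2, ...)."""
    return toeplitz(np.concatenate(([1.0], np.asarray(rs, dtype=float))))


def arh1_cov(sds, r):
    """Heterogeneous AR(1): AR(1) correlations with a separate variance per occasion."""
    D = np.diag(np.asarray(sds, dtype=float))
    return D @ ar1(len(sds), r) @ D


np.set_printoptions(precision=3, suppress=True)
print(independence(4))
print(exchangeable(4, 0.5))
print(ar1(4, 0.5))
print(toeplitz_corr([0.6, 0.4, 0.2]))
print(arh1_cov([2.0, 3.0, 3.0, 4.0], 0.5))
```

Counting the free parameters makes the trade-off concrete: exchangeable and AR(1) each use a single correlation, Toeplitz uses J-1, and unstructured uses J(J-1)/2.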
Analyze > Mixed Models > Linear
Select ID as the Subject variable
Select Temp as the Repeated variable
Select an appropriate covariance structure in Repeated Covariance Type, for example AR(1): Heterogeneous
Continue
Dependent variable: Vent
Factor variable (categorical independent variable): Temp
Covariates (continuous independent variables): (none)
Click Fixed
 - Click Custom, highlight all independent variables in the box
 - Choose Main Effects, and put them (Temp) in the Model box
Click EM Means
 - Select Temp into the "Display means for" box
 - Select the Bonferroni method
 - Click Compare main effects, set the reference category to "first"
Click Statistics
 - Choose Parameter estimates, Tests for covariance parameters, Covariances of residuals
Click Save
 - Select Residuals, Predicted Values
OK

Result of the linear mixed model with ARH(1)

Information Criteria (a)
-2 Restricted Log Likelihood             297.714
Akaike's Information Criterion (AIC)     311.714
Hurvich and Tsai's Criterion (AICC)      315.008
Bozdogan's Criterion (CAIC)              330.877
Schwarz's Bayesian Criterion (BIC)       323.877
The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: Minute Ventilation Volume.

Type III Tests of Fixed Effects (a)
Source       Numerator df    Denominator df    F          Sig.
Intercept    1               7.049             485.337    .000
temp         5               21.222            3.099      .030
a. Dependent Variable: Minute Ventilation Volume.

Pairwise Comparisons (b), each temperature vs. the reference (-10)
(I) Temp    (J) Temp    Mean Difference (I-J)    Std. Error    df        Sig. (a)    95% CI Lower (a)    95% CI Upper (a)
25          -10         5.425                    2.339         14.346    .178        -1.513              12.363
37          -10         -2.413                   3.412         16.480    1.000       -12.341             7.516
50          -10         -1.225                   3.783         20.843    1.000       -11.943             9.493
65          -10         -3.938                   3.843         23.946    1.000       -14.688             6.813
80          -10         -1.125                   4.246         19.444    1.000       -13.243             10.993
Based on estimated marginal means
a. Adjustment for multiple comparisons: Bonferroni.
b. Dependent Variable: Minute Ventilation Volume.

Pairwise Comparisons (b), all pairs
(I) Temp    (J) Temp    Mean Difference (I-J)    Std. Error    df        Sig. (a)    95% CI Lower (a)    95% CI Upper (a)
-10         25          -5.425*                  2.339         14.346    .036        -10.431             -.419
-10         37          2.413                    3.412         16.480    .489        -4.804              9.629
-10         50          1.225                    3.783         20.843    .749        -6.645              9.095
-10         65          3.938                    3.843         23.946    .316        -3.995              11.870
-10         80          1.125                    4.246         19.444    .794        -7.749              9.999
25          -10         5.425*                   2.339         14.346    .036        .419                10.431
25          37          7.838*                   2.678         16.722    .010        2.180               13.495
25          50          6.650                    3.458         23.877    .066        -.489               13.789
25          65          9.363*                   3.787         26.581    .020        1.585               17.140
25          80          6.550                    4.285         23.101    .140        -2.312              15.412
37          -10         -2.413                   3.412         16.480    .489        -9.629              4.804
37          25          -7.838*                  2.678         16.722    .010        -13.495             -2.180
37          50          -1.188                   2.748         19.753    .670        -6.925              4.550
37          65          1.525                    3.528         21.550    .670        -5.800              8.850
37          80          -1.288                   4.177         22.832    .761        -9.932              7.357
50          -10         -1.225                   3.783         20.843    .749        -9.095              6.645
50          25          -6.650                   3.458         23.877    .066        -13.789             .489
50          37          1.188                    2.748         19.753    .670        -4.550              6.925
50          65          2.713                    2.579         17.391    .307        -2.720              8.145
50          80          -.100                    3.513         20.601    .978        -7.415              7.215
65          -10         -3.938                   3.843         23.946    .316        -11.870             3.995
65          25          -9.363*                  3.787         26.581    .020        -17.140             -1.585
65          37          -1.525                   3.528         21.550    .670        -8.850              5.800
65          50          -2.713                   2.579         17.391    .307        -8.145              2.720
65          80          -2.813                   2.496         14.017    .279        -8.165              2.540
80          -10         -1.125                   4.246         19.444    .794        -9.999              7.749
80          25          -6.550                   4.285         23.101    .140        -15.412             2.312
80          37          1.288                    4.177         22.832    .761        -7.357              9.932
80          50          .100                     3.513         20.601    .978        -7.215              7.415
80          65          2.813                    2.496         14.017    .279        -2.540              8.165
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
a. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).
b. Dependent Variable: Minute Ventilation Volume.

The p-values in this second table were not adjusted for multiple comparisons. Select the Bonferroni option to adjust for multiple comparisons. With the adjustment, none of the pairwise comparisons was significant.

Predicted model by the linear mixed effect model with ARH(1) correlation structure:
[Figure: Fixed predicted values (about 66 to 74) against temperature (0 to 75). Dots/lines show means.]

Performing one-way ANOVA using the linear mixed effect model option in SPSS

When you use the independence (scaled identity) structure for the correlation, the model becomes equivalent to one-way ANOVA.

Result of the linear mixed model ignoring dependency among repeated measures

Information Criteria (a)
-2 Restricted Log Likelihood             330.525
Akaike's Information Criterion (AIC)     332.525
Hurvich and Tsai's Criterion (AICC)      332.625
Bozdogan's Criterion (CAIC)              335.263
Schwarz's Bayesian Criterion (BIC)       334.263
The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: Minute Ventilation Volume.

The AIC (332.525) is larger than the one obtained with ARH(1) (311.714), so the ARH(1) model fits better.

Type III Tests of Fixed Effects (a)
Source       Numerator df    Denominator df    F           Sig.
Intercept    1               42                2045.152    .000
temp         5               42                .727        .607
a. Dependent Variable: Minute Ventilation Volume.

Predicted model by the linear mixed effect model with independence correlation structure:

[Figure: Predicted values (about 66 to 74) against temperature (0 to 75). Dots/lines show means.]

Model parameter estimates are the same as those of the model with ARH(1); however, standard errors are much smaller with a model that considers the correlation among repeated measures.

Performing residual diagnosis for a linear mixed model in SPSS

When you ran the mixed model analysis above with the Save option, you created 2 new variables: residuals and predicted values.
Go to Graphs > Histogram to create graph 1.
Go to Graphs > Scatter plot to create graph 2.
Graph 1: Checking for normality
Graph 2: Checking for a trend in the residuals (if there is a trend, you may want to try a transformation of Y)

Fitting slopes (null hypothesis 2: slope = 0)

In the previous analysis, we treated temperature as a categorical variable, which tests whether the mean ventilation volumes were the same or not.
None of the pairwise comparisons showed a difference, because of the loss of power caused by the adjustment for multiple comparisons. To assess whether there is an increasing or decreasing trend with temperature, we can of course analyze the data with a regression slope, treating temperature as continuous instead of categorical.

[Figure: Scatterplot of minute ventilation volume (50 to 90) against temperature (0 to 75) with a LOWESS curve and a linear regression line with 95% mean prediction interval. Minute Ventilation Volume = 71.48 + -0.04 * temp, R-Square = 0.02.]

Performing a linear mixed effect model to assess whether the slope is greater or less than zero

Read deal.long.sav into SPSS.
Analyze > Mixed Models > Linear
Select ID as the Subject variable
Select Temp as the Repeated variable
Select an appropriate covariance structure in Repeated Covariance Type, for example AR(1): Heterogeneous
Continue
Dependent variable: Vent
Factor variable (categorical independent variable): (none)
Covariates (continuous independent variables): Temp
Click Fixed
 - Click Custom, highlight all independent variables in the box
 - Choose Main Effects, and put them (Temp) in the Model box
Click Statistics
 - Choose Parameter estimates, Tests for covariance parameters
Click Save
 - Select Residuals, Predicted Values
OK

Result of the linear mixed model with ARH(1), with temp as continuous

Information Criteria (a)
-2 Restricted Log Likelihood             332.417
Akaike's Information Criterion (AIC)     346.417
Hurvich and Tsai's Criterion (AICC)      349.364
Bozdogan's Criterion (CAIC)              366.217
Schwarz's Bayesian Criterion (BIC)       359.217
The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: Minute Ventilation Volume.

Estimates of Fixed Effects (a)
Parameter    Estimate    Std. Error    df        t         Sig.    95% CI Lower    95% CI Upper
Intercept    66.97027    2.929443      8.409     22.861    .000    60.271723       73.668818
temp         .013971     .045589       22.702    .306      .762    -.080406        .108348
a. Dependent Variable: Minute Ventilation Volume.

P=0.762 indicates that the slope is not significantly different from zero.
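For comparison, the simple regression line printed on the scatterplot (71.48 + -0.04 * temp, R-Square = 0.02) can be reproduced by pooling all 48 observations. The sketch below uses `scipy.stats.linregress`; note that this is ordinary least squares, which ignores the repeated-measures structure, and is not the ARH(1) mixed model fitted in SPSS. The names `x`, `y` and `fit` are illustrative.

```python
import numpy as np
from scipy import stats

temps = [-10, 25, 37, 50, 65, 80]
vent = [
    [74.5, 81.5, 83.6, 68.6, 73.1, 79.4],
    [75.5, 84.6, 70.6, 87.3, 73.0, 75.0],
    [68.9, 71.6, 55.9, 61.9, 60.5, 61.8],
    [57.0, 61.3, 54.1, 59.2, 56.6, 58.8],
    [78.3, 84.9, 64.0, 62.2, 60.1, 78.7],
    [54.0, 62.8, 63.0, 58.0, 56.0, 51.5],
    [72.5, 68.3, 67.8, 71.5, 65.0, 67.7],
    [80.8, 89.9, 83.2, 83.0, 85.7, 79.6],
]

# Stack the 8 subjects into one long vector; temperatures repeat for each subject
x = np.tile(temps, len(vent))
y = np.asarray(vent).ravel()

fit = stats.linregress(x, y)  # pooled OLS, treats all 48 points as independent
print(f"Vent = {fit.intercept:.2f} + {fit.slope:.2f} * temp")
print(f"R-Square = {fit.rvalue ** 2:.2f}, p for slope = {fit.pvalue:.3f}")
```

Because the between-subject spread dominates the residual variance in this pooled fit, the slope's standard error is large, which is one motivation for the summary-measure approach.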
A much simpler way to analyze these data: response feature analysis, i.e., analysis of summary measures.

Using slope as a summary measure:
Read the Deal Longitudinal.sav dataset into SPSS
Graphs > Interactive > Scatterplot
Select Vent as Y-axis, Temp (as Scale) as X-axis, ID (as Categorical) as panel variable
Click Fit
 - Select "Include constant in equation"
 - Prediction line for "Mean"
 - Fit line for "Total"
OK

[Figure: One panel per subject (ID 1 to 8), VF (50 to 100) against Temp (0 to 75), each with a linear regression line and 95% mean prediction interval.]

Fitted line in each panel:
ID    Fitted line                   R-Square
1     VF = 77.16 + -0.01 * Temp     0.00
2     VF = 78.49 + -0.02 * Temp     0.01
3     VF = 67.73 + -0.10 * Temp     0.33
4     VF = 57.65 + 0.00 * Temp      0.00
5     VF = 76.32 + -0.12 * Temp     0.13
6     VF = 59.13 + -0.04 * Temp     0.07
7     VF = 71.14 + -0.06 * Temp     0.43
8     VF = 84.25 + -0.01 * Temp     0.01

Now, open Deal.sav, create a new variable Slope, and type in each person's slope.
Then perform a one-sample non-parametric test (N is small) on the slope.

To perform a one-sample non-parametric test in SPSS, you need this trick:
Create a dummy variable with all 0's
Then go: Analyze > Nonparametric Tests > 2 Related Samples
Select Slope and Dummy into the test pair list
Select Wilcoxon as the test type
OK
(SPSS does not work with only the Slope variable)

ID    Slope    Dummy
1     -0.01    0
2     -0.02    0
3     -0.10    0
4     0        0
5     -0.12    0
6     -0.04    0
7     -0.06    0
8     -0.01    0

Test Statistics (b)
                          dummy - Slope
Z                         -2.371 (a)
Asymp. Sig. (2-tailed)    .018
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

Using slopes to test for trends:
Wilcoxon signed-rank test: P=0.018

We can now conclude that there is a significant association between minute ventilation volume and temperature (as temperature increases, ventilation volume decreases).

Using summary measures can provide a more intuitive and simpler approach, which sometimes provides more power to detect differences.
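The response feature analysis can be sketched in a few lines of Python: fit one regression line per subject, keep the slope, and test the slopes against zero. This is an illustration assuming SciPy in place of SPSS; `slopes` and `result` are illustrative names. `scipy.stats.wilcoxon` accepts a single sample directly, so the dummy-variable trick is not needed. Because the slopes here are computed to full precision rather than typed in rounded to two decimals, and because SciPy may use an exact rather than an asymptotic p-value, the result need not match the SPSS figure exactly.

```python
import numpy as np
from scipy import stats

temps = [-10, 25, 37, 50, 65, 80]
vent = [
    [74.5, 81.5, 83.6, 68.6, 73.1, 79.4],
    [75.5, 84.6, 70.6, 87.3, 73.0, 75.0],
    [68.9, 71.6, 55.9, 61.9, 60.5, 61.8],
    [57.0, 61.3, 54.1, 59.2, 56.6, 58.8],
    [78.3, 84.9, 64.0, 62.2, 60.1, 78.7],
    [54.0, 62.8, 63.0, 58.0, 56.0, 51.5],
    [72.5, 68.3, 67.8, 71.5, 65.0, 67.7],
    [80.8, 89.9, 83.2, 83.0, 85.7, 79.6],
]

# Summary measure: one least-squares slope per subject
slopes = np.array([stats.linregress(temps, y).slope for y in vent])
for subject_id, slope in enumerate(slopes, start=1):
    print(f"ID {subject_id}: slope = {slope:.2f}")

# One-sample Wilcoxon signed-rank test of H0: median slope = 0
result = stats.wilcoxon(slopes)
print(f"mean slope = {slopes.mean():.3f}, Wilcoxon p = {result.pvalue:.3f}")
```

Each subject contributes a single number to the test, so the within-subject correlation that complicated the earlier analyses no longer needs to be modeled.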