Supplementary Material: Appendix I

SPSS
Step 1: Normality tests
Go to Analyze – Descriptive Statistics <select> Explore from the drop down list. In
the box that opens, enter the column identifier for the data that you wish to test in the
Dependent List box. Click on Plots and tick the Normality plots with tests option.
Click Continue, then click Ok.
The relevant output for this test can be found in the following table:
Tests of Normality

             Kolmogorov-Smirnov(a)          Shapiro-Wilk
             Statistic   df   Sig.          Statistic   df   Sig.
Metabolism   .140        15   .200*         .957        15   .633

a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.
The significance of the test is indicated by the p value in the table. If the p value is
less than 0.05 then the distribution of data differs significantly from normal. If it
is > .05 then the data can be considered normally distributed. A normality plot will
also be shown called Normal Q-Q Plot of ‘column identifier’.
EXAMINE VARIABLES=Metabolism
/PLOT BOXPLOT STEMLEAF NPPLOT
/COMPARE GROUPS
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
Step 3: Attempt to normalise the distribution by transforming it.
Data can be easily transformed by using the Transform – Compute Variable
command. Enter a name for your new variable in the Target Variable box and enter
your transformation in the Numeric Expression box (e.g., LG10(Variable name)).
SPSS will create a new column with the transformed variable.
COMPUTE LOGMetabolism=LG10(Metabolism).
EXECUTE.
Alternatively, data may be transformed using the Box-Cox procedure.
Go to Transform – Prepare Data for Modelling <select> Automatic from the
drop down list. In the Fields tab you can specify which variables to transform by
moving them to the Inputs box. In the Settings tab click on Rescale Fields. Tick the
box before ‘Rescale a continuous target with a Box-Cox transformation to reduce
skew’. Click Run. This will create a new column with the transformed variable.
Step 7: Paired comparisons of parametric normally distributed data
For two sample t-test
Go to Analyze – Compare Means <select> Independent Samples T-Test from the
drop down list. This will open a new window. Data should be organized as shown in
Table 1. Add the variable to be tested in the Test Variable(s) box. You can enter
multiple variables (e.g., metabolism and body mass) if you want. Identify which data
belong to which treatment by entering your treatment column into the Grouping
Variable box. You now need to define your treatment groups by clicking the Define
Groups button. A new window will pop up. When using numeric treatment groups
enter the values for Group 1 and Group 2 (i.e., 1 and 2 respectively in our example in
Table 1). If you have used string codes you need to enter the codes between
quotation marks, e.g., for control = C and treatment = T, enter "C" for Group 1 and "T"
for Group 2. Click Continue. Click Ok. The output appears as follows.
Group Statistics

             Treatment   N    Mean     Std. Deviation   Std. Error Mean
Metabolism   1.00        10   1.5380   .19510           .06169
             2.00         5    .9760   .20403           .09125

Independent Samples Test

                                Levene's Test for
                                Equality of
                                Variances       t-test for Equality of Means
                                                                                                    95% Confidence Interval
                                                              Sig.         Mean         Std. Error  of the Difference
                                F      Sig.    t       df     (2-tailed)   Difference   Difference  Lower     Upper
Metabolism   Equal variances    .348   .565    5.185   13     .000         .56200       .10839      .32784    .79616
             assumed
             Equal variances                   5.102   7.771  .001         .56200       .11015      .30670    .81730
             not assumed
The first table summarizes your data and gives sample sizes, mean, SD and
SEM. The second table gives the results for the t-test. A t-test assumes equal
variances, and the first two columns in the table show the results for Levene's
test for equality of variances. If p > 0.05 the variances are equal and you should use
the top row. The next 3 columns give the results for the independent t-test (t-value,
degrees of freedom and p-value respectively). In this case p < 0.05 and
the difference between the groups is statistically significant.
T-TEST GROUPS=treatment(1 2)
/MISSING=ANALYSIS
/VARIABLES=Metabolism
/CRITERIA=CI(.95).
For Paired t-test
Go to Analyze – Compare Means <select> Paired Samples T-Test from
the drop down list. This will open a new window. Data should be organized
as shown in Table 2.
Add the column identifiers of interest into the Paired Variables box next to
pair 1 (i.e., under variable1 and variable2). Click Ok. The output appears as
follows.
Paired Samples Correlations

                                    N    Correlation   Sig.
Pair 1   Metabolism & Metabolism2   15   .909          .000

Paired Samples Test

                                    Paired Differences
                                                                       95% Confidence Interval
                                              Std.        Std. Error   of the Difference
                                    Mean      Deviation   Mean         Lower      Upper       t        df   Sig. (2-tailed)
Pair 1   Metabolism - Metabolism2   -.12000   .14933      .03856       -.20270    -.03730     -3.112   14   .008

Paired Samples Statistics

                       Mean     N    Std. Deviation   Std. Error Mean
Pair 1   Metabolism    1.3507   15   .33401           .08624
         Metabolism2   1.4707   15   .25018           .06460
The first table shows whether there is a significant correlation between the two
measurements of interest. In this case there is a significant correlation
between the two measurements of metabolism (p < 0.01). This need not always
be the case, and the results for the paired t-test remain valid even if there is no
significant relationship between the variables of interest. The last three
columns of the second table show the results for your paired t-test (t-value,
degrees of freedom and p-value respectively). In this case p < 0.05 and a
significant effect of treatment is shown. The third table shows your
descriptive statistics (mean, SD and SEM).
T-TEST PAIRS=Metabolism WITH Metabolism2 (PAIRED)
/CRITERIA=CI(.9500)
/MISSING=ANALYSIS.
Step 8: multiple treatment levels, parametric tests
For One Way ANOVA
Go to Analyze – Compare Means <select> One way ANOVA from the drop down
list. This will open a new window. Data should be organized as shown in Table 1.
Add variable of interest (e.g., metabolism) to Dependent List box. Add column
identifier for treatment levels to the Factor box (e.g., levels). Click Ok. The output
appears as follows.
ANOVA
Metabolism

                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   1.342             2   .671          36.582   .000
Within Groups     .220            12   .018
Total            1.562            14
The F and p-values for the treatment effect are shown in
the last two columns of the table. A p-value < .05 indicates a significant effect.
In this example there was a significant treatment effect.
ONEWAY Metabolism BY levels
/MISSING ANALYSIS.
For Repeated Measures ANOVA
Data should be organized as shown in Table 2. Go to Analyze – General Linear
Model <select> Repeated Measures from the drop down list. This will open a new
window. Enter the 'Number of levels' that you have in the appropriate box (i.e., the
number of repeated measurements, in this case 3). Then click Add. Click Define.
This will open a new window.
Add the column identifiers of interest into the Within-Subjects Variables box. The
number of levels that you have identified in the previous step will show up in the
box and you need to add the column identifiers for each level. If you used different
levels of treatments, or another factor that differs between subjects, such as
sex, this can be added to the Between-Subjects Factor(s) box. Click Ok. The
relevant output for this test can be found in the following tables:
Tests of Within-Subjects Effects
Measure: MEASURE_1

                                        Type III Sum
Source                                  of Squares     df       Mean Square   F       Sig.
Treatment          Sphericity Assumed   .109           2        .054          3.779   .035
                   Greenhouse-Geisser   .109           1.799    .060          3.779   .041
                   Huynh-Feldt          .109           2.000    .054          3.779   .035
                   Lower-bound          .109           1.000    .109          3.779   .072
Error(Treatment)   Sphericity Assumed   .403           28       .014
                   Greenhouse-Geisser   .403           25.189   .016
                   Huynh-Feldt          .403           28.000   .014
                   Lower-bound          .403           14.000   .029

Tests of Between-Subjects Effects
Measure: MEASURE_1
Transformed Variable: Average

            Type III Sum
Source      of Squares     df   Mean Square   F         Sig.
Intercept   89.183         1    89.183        445.546   .000
Error        2.802         14     .200
The Tests of Within-Subjects Effects table shows the results for your repeated
measures (e.g., metabolism measured at different time points in the same individual).
In our example (Table 2) this refers to metabolism measured at control, treatment 1
and treatment 2. The first row (Sphericity Assumed) shows whether there was a significant
effect of treatment (F2,28 = 3.8, p = 0.035 in our example).
If you added a Between-Subjects Factor (e.g., sex) the F and p-values will be shown
in the last two columns of the Tests of Between-Subjects Effects Table. In this case,
no value is shown because no factor was added.
Note that sphericity is assumed in a RM ANOVA. A violation of sphericity occurs
when the variances of the differences between all combinations of the groups are not
equal; this is tested by Mauchly's test of sphericity, for which the results
are given as part of the SPSS output in a RM ANOVA (in the Mauchly's Test of
Sphericity table). When the probability of Mauchly's test statistic is less than or equal
to 0.05, sphericity cannot be assumed and a correction needs to be
made to the F and p-values. SPSS provides three corrections: Greenhouse-Geisser,
Huynh-Feldt and Lower-bound. For more details about these corrections we would refer
you to a statistical textbook.
GLM Metabolism1 Metabolism2 Metabolism3
/WSFACTOR=Treatment 3 Polynomial
/METHOD=SSTYPE(3)
/CRITERIA=ALPHA(.05)
/WSDESIGN=Treatment.
Step 10: post hoc tests
For One Way ANOVA
Conduct the one-way ANOVA as described under Step 8. However, before
clicking on OK, click on the button labelled Post Hoc. This will open a new
window. Choose the test you want to use by ticking the appropriate box, e.g., Tukey
test. Click Continue and click OK. This time in addition to the ANOVA results there
is an additional output below the analysis of variance table as follows.
Post Hoc Tests

Multiple Comparisons
Metabolism
Tukey HSD

                          Mean Difference                       95% Confidence Interval
(I) levels   (J) levels   (I-J)             Std. Error   Sig.   Lower Bound   Upper Bound
1.00         2.00          .34000*          .08565       .005    .1115         .5685
             3.00          .73200*          .08565       .000    .5035         .9605
2.00         1.00         -.34000*          .08565       .005   -.5685        -.1115
             3.00          .39200*          .08565       .002    .1635         .6205
3.00         1.00         -.73200*          .08565       .000   -.9605        -.5035
             2.00         -.39200*          .08565       .002   -.6205        -.1635
*. The mean difference is significant at the 0.05 level.
Homogeneous Subsets

Metabolism
Tukey HSD(a)

                  Subset for alpha = 0.05
Levels   N        1        2        3
3.00     5        .9760
2.00     5                 1.3680
1.00     5                          1.7080
Sig.              1.000    1.000    1.000

Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 5.000.
The Multiple Comparisons table shows pair-wise comparisons for the different
levels of treatment. P-values < 0.05 indicate that the corresponding groups differ
significantly. The Homogeneous Subsets table summarises the results from the
multiple comparisons and shows the mean values for the different levels of treatment
and whether they differ or not. In this case metabolism differs between all levels
of treatment.
ONEWAY Metabolism BY levels
/MISSING ANALYSIS
/POSTHOC=TUKEY ALPHA(0.05).
For Repeated Measures ANOVA
Conduct the Repeated Measures ANOVA as described under Step 8. However,
before clicking on OK, click on the button labelled Post Hoc. This will open a new
window. Add the Factor of interest to the 'Post Hoc Tests For' box. Choose the test
you want to use by ticking the appropriate box, e.g., Tukey test. Click Continue and
click OK. This time, in addition to the ANOVA results, there is an additional output
below the analysis of variance table that is similar to the output shown above for the
One-Way ANOVA.
GLM Metabolism Metabolism2 BY levels
/WSFACTOR=Time 2 Polynomial
/METHOD=SSTYPE(3)
/POSTHOC=levels(TUKEY)
/CRITERIA=ALPHA(.05)
/WSDESIGN=Time
/DESIGN=levels.
Step 11: Power analysis
SPSS does not provide the capability to perform power analysis. Alternative
programs need to be used instead.
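For example, base R's power.t.test() function performs this calculation for t-tests. A
minimal sketch is given below; the difference, standard deviation and group size are
taken from the worked power example in the MINITAB section of this appendix:

# Minimal sketch: power of a two-sample t-test in R
# (difference, sd and per-group n taken from the MINITAB power example below)
power.t.test(n = 10, delta = 0.068, sd = 0.133, sig.level = 0.05)

This returns a power of about 0.19, matching the MINITAB output shown later.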
Step 13: Two-way ANOVA
For Two-way ANOVA
Data should be organized as shown in Table 1. Go to Analyze – General Linear
Model <select> Univariate from the drop down list. This will open a new window.
Add your variable to be tested to the Dependent Variable box (e.g., metabolism).
Add your fixed factors to the Fixed Factor(s) box (e.g., treatment and sex). The
output appears as follows.
Between-Subjects Factors

                N
Sex      .00    9
         1.00   6
Levels   1.00   5
         2.00   5
         3.00   5
Tests of Between-Subjects Effects
Dependent Variable: Metabolism1

                  Type III Sum
Source            of Squares     df   Mean Square   F          Sig.
Corrected Model    1.417(a)       5     .283          17.596   .000
Intercept         26.645          1   26.645        1654.406   .000
Sex                 .033          1     .033           2.065   .185
Levels             1.210          2     .605          37.577   .000
Sex * Levels        .042          2     .021           1.300   .319
Error               .145          9     .016
Total             28.926         15
Corrected Total    1.562         14

a. R Squared = .907 (Adjusted R Squared = .856)
The last two columns of the Tests of Between-Subjects Effects table show the F and
p-values. In this case there was a significant effect of treatment level (p<0.001), but
there was no significant effect of sex (p=0.185) and no significant sex by treatment
interaction (p=0.319). A significant interaction effect implies that both sexes
responded differently to the treatment (e.g., one sex in or decreased more than the
other).
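For readers working outside SPSS, a rough equivalent of this two-way ANOVA can be
sketched in R; it assumes a data frame Table1 with Metabolism, Sex and Levels columns
matching the layout above. Note that aov() reports sequential rather than Type III sums
of squares, so the output matches SPSS exactly only for balanced designs:

# Sketch of the same two-way ANOVA in R (Table1 layout assumed)
Table1$Sex <- factor(Table1$Sex)       # treat the sex codes as a factor
Table1$Levels <- factor(Table1$Levels) # treat the treatment levels as a factor
summary(aov(Metabolism ~ Sex * Levels, data = Table1))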
For Repeated Measures Two-way ANOVA
Data should be organised as shown in Table 2. Proceed as for the repeated measures
one-way ANOVA (see Step 8), but add a Between-Subjects Factor (e.g., sex). In the
output the extra factor will be included in both the Tests of Within-Subjects Effects
and Tests of Between-Subjects Effects tables.
Step 14: Same as 13 for Two-way ANOVA, but add extra fixed factor(s) in the Fixed Factor(s)
box. Factors can be added to or removed from the model by using the Model button. This will open
a new window. Tick Custom and add/remove variables of interest.
Step 18: Non-parametric tests, paired comparisons
Mann-Whitney U-test
Go to Analyze – Nonparametric tests – Legacy dialogs <select> Two
independent samples from the drop down list. This will open a new window. Data
should be organized as shown in Table 1. Add variable to be tested to the Test
Variable list box (e.g., metabolism). Identify which data belong to which treatment
by entering your treatment column into the Grouping Variable box. You now need
to define your treatment groups by clicking the Define Groups button. A new
window will pop up. When using numeric treatment groups enter the values for
Group 1 and Group 2 (i.e., 1 and 2 respectively in our example in Table 1). If you
have used string codes you need to enter the codes between quotation marks, e.g., for
control = C and treatment = T, enter "C" for Group 1 and "T" for Group 2. Click
Continue. Make sure the box before Mann-Whitney U is ticked. Click Ok. The
output appears as follows.
Ranks

             treatment   N    Mean Rank   Sum of Ranks
Metabolism   1.00        10   10.50       105.00
             2.00         5    3.00        15.00
             Total       15

Test Statistics(b)

                                 Metabolism
Mann-Whitney U                   .000
Wilcoxon W                       15.000
Z                                -3.067
Asymp. Sig. (2-tailed)           .002
Exact Sig. [2*(1-tailed Sig.)]   .001(a)

a. Not corrected for ties.
b. Grouping Variable: treatment
Z and p-value (Asymp. Sig. (2-tailed)) are shown in the Test Statistics table.
NPAR TESTS
/M-W= Metabolism BY treatment(1 2)
/MISSING ANALYSIS.
Wilcoxon matched pairs test
Go to Analyze – Nonparametric Tests – Legacy dialogs <select> Two Related
Samples from the drop down list. This will open a new window. Data should be
organized as shown in Table 2.
Add the column identifiers of interest into the Test Pairs box next to pair 1 (i.e.,
under variable1 and variable2). Make sure the box before Wilcoxon is ticked. Click
Ok. The output appears as follows.
Ranks

                                              N       Mean Rank   Sum of Ranks
Metabolism2 - Metabolism   Negative Ranks      3(a)   4.00         12.00
                           Positive Ranks     12(b)   9.00        108.00
                           Ties                0(c)
                           Total              15

a. Metabolism2 < Metabolism
b. Metabolism2 > Metabolism
c. Metabolism2 = Metabolism

Test Statistics(b)

                         Metabolism2 - Metabolism
Z                        -2.728(a)
Asymp. Sig. (2-tailed)   .006

a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test
Z and p-value (Asymp. Sig. (2-tailed)) are shown in the Test Statistics table.
NPAR TESTS
/WILCOXON=Metabolism WITH Metabolism2 (PAIRED)
/MISSING ANALYSIS.
Step 19: Non-parametric analysis when there are multiple treatments or levels
Kruskal-Wallis ANOVA
Go to Analyze – Nonparametric tests – Legacy dialogs <select> k independent
samples from the drop down list. This will open a new window. Data should be
organized as shown in Table 1. Add variable to be tested to the Test Variable list box
(e.g., metabolism). Identify which data belong to which treatment by entering your
treatment column into the Grouping Variable box. You now need to define your
treatment groups by clicking the Define Range button. A new window will pop up.
Enter Minimum and Maximum values for your groups (i.e., 1 - 3 respectively for the
treatment levels in our example in Table 1). Click Continue. Make sure the box
before Kruskal-Wallis H is ticked. Click Ok. The output appears as follows.
Ranks

             levels   N    Mean Rank
Metabolism   1.00      5   13.00
             2.00      5    8.00
             3.00      5    3.00
             Total    15

Test Statistics(a,b)

              Metabolism
Chi-Square    12.545
df            2
Asymp. Sig.   .002

a. Kruskal Wallis Test
b. Grouping Variable: levels
The chi-square statistic and p-value (Asymp. Sig.) are shown in the Test Statistics table.
NPAR TESTS
/K-W=Metabolism BY levels(1 3)
/MISSING ANALYSIS.
Repeated measures Friedman Test
Go to Analyze – Nonparametric Tests – Legacy dialogs <select> k Related
Samples from the drop down list. This will open a new window. Data should be
organized as shown in Table 2.
Add the column identifiers of interest into the Test Variables box. Click Ok. The output
appears as follows.
Ranks

              Mean Rank
Metabolism    1.20
Metabolism2   1.80

Test Statistics(a)

N             15
Chi-Square    5.400
df            1
Asymp. Sig.   .020

a. Friedman Test
The chi-square statistic and p-value (Asymp. Sig.) are shown in the Test Statistics table.
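For comparison, a minimal sketch of the same test in R, assuming a Table2 data frame
whose rows are individuals and whose columns Metabolism and Metabolism2 hold the two
repeated measurements:

# Friedman test in R: rows are individuals (blocks), columns are treatments
friedman.test(as.matrix(Table2[, c("Metabolism", "Metabolism2")]))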
Step 23: Analysis of Covariance (ANCOVA)
Go to Analyze – General Linear Model <select> Univariate from the drop down list. This will
open a new window. Data should be organized as shown in Table 1. Add your variable to be tested
to the Dependent Variable box (e.g., metabolism). Add the column identifier for treatments to the
Fixed Factor(s) box (e.g., levels) and add the covariate (e.g., body mass) to the Covariate box. The
output appears as follows.
Tests of Between-Subjects Effects
Dependent Variable: Metabolism

                  Type III Sum
Source            of Squares     df   Mean Square   F        Sig.
Corrected Model    1.403(a)       3   .468          32.441   .000
Intercept           .001          1   .001            .065   .804
Bodymass            .061          1   .061           4.264   .063
Levels             1.325          2   .663          45.957   .000
Error               .159         11   .014
Total             28.926         15
Corrected Total    1.562         14

a. R Squared = .898 (Adjusted R Squared = .871)
F and p-values are shown in the final two columns. In this example the body mass
effect did not reach statistical significance (p = 0.063), but there was a significant
effect of treatment (p < 0.001). In SPSS the default full factorial model does not
include an interaction effect between body mass and treatment.
To include this interaction rerun the analysis (go to Analyze – General Linear
Model <select> Univariate from the drop down list) and click the Model button.
This will open a new window. Tick the Custom box and then click and add factors as
appropriate using the arrow button. To add an interaction effect, select two factors
(in this case body mass and levels) and click the arrow button and the interaction
effect will show up in the right box (i.e., levels x body mass). The output is as
follows.
Tests of Between-Subjects Effects
Dependent Variable: Metabolism

                     Type III Sum
Source               of Squares     df   Mean Square   F        Sig.
Corrected Model       1.439(a)       5   .288          21.138   .000
Intercept              .004          1   .004            .288   .604
Levels                 .056          2   .028           2.055   .184
Bodymass               .077          1   .077           5.644   .042
Levels * Bodymass      .036          2   .018           1.323   .314
Error                  .123          9   .014
Total                28.926         15
Corrected Total       1.562         14

a. R Squared = .922 (Adjusted R Squared = .878)
Note that when the interaction effect is included there is no significant effect of
treatment level. The interaction effect is also not significant. In this case you would
remove the interaction effect and analyse the data without the interaction effect as
shown above.
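For comparison, the same two-stage ANCOVA can be sketched in R, assuming a Table1 data
frame with Metabolism, Bodymass and Levels columns; aov() uses sequential sums of
squares, so term order matters and the p-values will only approximate the SPSS Type III
output:

# ANCOVA sketch in R (Table1 layout assumed)
Table1$Levels <- factor(Table1$Levels)
summary(aov(Metabolism ~ Bodymass * Levels, data = Table1)) # with interaction
summary(aov(Metabolism ~ Bodymass + Levels, data = Table1)) # interaction removed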
Step 37 Correlation matrix
Select Analyze <select> Correlate and <select> Bivariate. This will open a new
window. Add the variables of interest to the Variables box. Tick the relevant box under
Correlation Coefficients (i.e., Pearson). The output is a correlation matrix.
A typical matrix might be as follows for a situation where 5 organ weights are available.
Organ             Liver   WAT    Brain   Skeletal muscle
Liver             1.00
WAT               0.32    1.00
Brain             0.22    0.11   1.00
Skeletal muscle   0.66    0.17   0.55    1.00
BAT               0.13    0.93   0.03    0.61
This table highlights that WAT and BAT are highly correlated and hence not independent
predictor variables. Skeletal muscle is also quite highly correlated with liver mass. One can
proceed with the analysis ignoring these effects but one should be aware that such
correlations may compromise the outcome. In this case a strong effect of WAT might
emerge because of the effect of BAT on metabolism combined with the high correlation of
WAT with BAT. This analysis requires the number of observations (i.e., individuals) to
exceed by at least a factor of 3 the number of predictor variables included into the analysis.
Hence in this situation one would have 5 predictors so for each group (i.e., treatment levels
and control) one would need at least 15 individuals – and preferably many more.
Interpretation of these effects depends on the complexity of the interactions. The bottom line
is to diagnose an overall treatment effect controlling for these body composition variables. If
there is an overall treatment effect one can establish where this occurs using the multiple
range tests (TUKEY TEST and DUNCAN’S MULTIPLE RANGE TEST).
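For comparison, a correlation matrix like the one above can be produced in R with the
cor() function; the sketch assumes a data frame, here arbitrarily called organs, holding
the five organ-mass columns:

# Pearson correlation matrix in R ("organs" is an assumed data frame name)
round(cor(organs, use = "pairwise.complete.obs"), 2)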
Step 38: PRINCIPAL COMPONENTS ANALYSIS.
Principal Component Analysis can be performed using SPSS, but the procedure to
do so is hidden within the procedure for factor analysis. Go to Analyze <select>
Dimension Reduction <select> Factor. This opens a new window. Add the column
identifiers of the variables of interest to the Variables box. Click the Extraction
button. This will open a new window. Under Method select Principal Components
using the drop-down menu. Tick Correlation matrix and Unrotated factor solution.
To restrict the number of components tick Fixed number of Factors and type the
number of components you want in the Factors to extract box (e.g., 5). Click
Continue this will take you back to the first window. Click Ok. The output is as
follows.
Communalities

                      Initial   Extraction
Carcass               1.000     .924
HEART                 1.000     .874
LIVER                 1.000     .784
KIDNEY                1.000     .926
BRAIN                 1.000     .672
Brown Fat             1.000     .861
Abdominal Fat         1.000     .926
Gonadal Fat           1.000     .939
Mesenteric Fat        1.000     .836
Gonads                1.000     .914
Large Intestine (g)   1.000     .807
Small Intestine (g)   1.000     .735
Stomach               1.000     .805
Lungs                 1.000     .911
Pancreas              1.000     .839
Pelage                1.000     .942
Tail                  1.000     .895

Extraction Method: Principal Component Analysis.
Total Variance Explained

            Initial Eigenvalues                     Extraction Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %    Total   % of Variance   Cumulative %
1           7.688   45.221           45.221         7.688   45.221           45.221
2           3.430   20.176           65.396         3.430   20.176           65.396
3           1.614    9.493           74.889         1.614    9.493           74.889
4           1.060    6.235           81.123         1.060    6.235           81.123
5            .798    4.696           85.819          .798    4.696           85.819
6            .648    3.815           89.634
7            .550    3.237           92.871
8            .354    2.083           94.953
9            .283    1.663           96.616
10           .180    1.061           97.677
11           .125     .735           98.412
12           .103     .605           99.018
13           .075     .440           99.458
14           .046     .272           99.731
15           .023     .134           99.865
16           .019     .110           99.974
17           .004     .026          100.000

Extraction Method: Principal Component Analysis.
Component Matrix(a)

                      Component
                      1        2        3        4        5
Carcass                .923    -.198    -.090    -.145     .069
HEART                  .655     .530     .067     .212     .337
LIVER                  .785    -.399     .021    -.017     .089
KIDNEY                 .800     .501     .113    -.035    -.145
BRAIN                  .373     .687     .229    -.090     .012
Brown Fat             -.639    -.218     .615     .021    -.161
Abdominal Fat          .625    -.617     .178     .335    -.106
Gonadal Fat            .743    -.472     .140     .287    -.250
Mesenteric Fat         .796    -.179     .310     .202     .182
Gonads                 .386    -.559    -.186    -.465     .449
Large Intestine (g)    .869    -.025    -.124     .009     .187
Small Intestine (g)    .412     .107     .730    -.132     .059
Stomach                .370     .589     .430    -.368    -.027
Lungs                 -.052     .689    -.145     .586     .264
Pancreas               .765     .223    -.320    -.068    -.312
Pelage                 .909    -.283    -.054     .056    -.173
Tail                   .635     .518    -.340    -.168    -.284

Extraction Method: Principal Component Analysis.
a. 5 components extracted.
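For comparison, an equivalent analysis can be sketched in R with the prcomp() function,
assuming a data frame (here arbitrarily called organs) containing the 17 organ-mass
columns; scale. = TRUE extracts the components from the correlation matrix, as in the
output above:

# PCA sketch in R ("organs" is an assumed data frame of the 17 organ masses)
pc <- prcomp(na.omit(organs), scale. = TRUE)
summary(pc)         # proportion of variance per component
pc$rotation[, 1:5]  # eigenvectors for the first 5 components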
MINITAB
Step 1: Normality tests
Go to the statistics tab, <select> basic statistics from the drop down list, <select>
normality test (second from bottom). In the box that opens enter the column
identifier for the data that you wish to test. Click on the test you wish to
perform (e.g., the Anderson-Darling test).
A typical output for this test looks like the following for the data in Table 1, column 2:

[Figure: Probability Plot of Metabolism (Normal), plotting Percent against Metabolism.
The panel beside the plot reports Mean = 1.351, StDev = 0.3340, N = 15, AD = 0.239,
P-Value = 0.732.]
The significance of the test is indicated by the p value in the box to the right of the
plot. If the p value is less than 0.01 then the distribution of data differs significantly
from normal. If it is > .01 then the data can be considered normally distributed.
Step 3: Attempt to normalise the distribution by transforming it.
Data can be easily transformed by going to Calc - Calculator. This will open a
window. Enter a name for your new variable in the Store results in Variable box and
enter your transformation in the Expression box (e.g., LOGTEN(Column Identifier)).
Click Ok. Minitab will create a new column with the transformed variable.
Alternatively, data can be transformed using the Box-Cox procedure.
Go to the statistics tab. Select Control charts from the dropdown box. Select Box-Cox
from the options that appear. This opens a new window. Type the column
identifier of the data you want to transform in the box. Insert the number 1 in the box
that says sub-group size. Click on options. In the new window that appears, type a
column identifier (e.g., C9 for column 9) in the box that says 'store transformed data
in'; this is where the transformed data will be stored. Click OK to close the window,
then click OK to perform the analysis. A typical output looks like this.
[Figure: Box-Cox Plot of BEE, plotting StDev against Lambda (using 95.0% confidence).
The panel beside the plot reports Lambda: Estimate = 0.88, Lower CL = -0.00,
Upper CL = 1.91, Rounded Value = 1.00.]
The plot shows the optimal transformation value (lambda). The transformed data will
now be in the column you specified in the options.
Step 7: Paired comparisons of parametric normally distributed data
For two sample t-test
Go to the statistics tab and select 'basic statistics' from the drop down tab.
Select '2-Sample t...' from the available options and click on it. This opens a new
window.
If you have formatted the data as detailed in Table 1 then the data you are testing
will be in one column (e.g., in the above example the energy expenditure data are in
column 2) and the codes identifying which data are treatment and which control are
in another column (in the above example column 4). In the new window select the
'data in one column' button and enter C2 in the data box and C4 in the subscripts
box.
Typical output (for analysis of metabolism against treatment group in Table 1)
looks as follows:
Two-sample T for Metabolism

Treatment    N   Mean    StDev   SE Mean
1           10   1.538   0.195   0.062
2            5   0.976   0.204   0.091

Difference = mu (1) - mu (2)
Estimate for difference: 0.562
95% CI for difference: (0.302, 0.822)
T-Test of difference = 0 (vs not =): T-Value = 5.10  P-Value = 0.001  DF = 7
The t-value and p-value are shown on the bottom line. If P < .05 then the difference
between the two groups is significant. Mean, SD and SE for each of the
treatment groups are shown in the table. In this case (metabolism data from Table 1)
there was a significant difference between treatment and control groups.
For Paired t-test
To use the paired t-test the data in Minitab needs to be organised as shown in Table
2, i.e., the data we are interested in testing for the treatment needs to be placed in a
separate column from the control data and data from the same individual needs to be
aligned in the same row.
Go to the statistics tab. Select 'basic statistics' from the dropdown box. This opens a
new window. Select 'Paired t...'. This opens a new window. Click the 'samples in
columns' button.
Enter the column identifiers into the two boxes. Click OK.
Typical output (for data in table 2 comparing metabolism 1 and metabolism 2) looks
as follows:
Paired T for Metabolism1 - Metabolism2

              N   Mean      StDev    SE Mean
Metabolism1   15   1.3507   0.3340   0.0862
Metabolism2   15   1.4707   0.2502   0.0646
Difference    15  -0.1200   0.1493   0.0386

95% CI for mean difference: (-0.2027, -0.0373)
T-Test of mean difference = 0 (vs not = 0): T-Value = -3.11  P-Value = 0.008
The t-value and p-value are shown on the bottom line. If P < .05 then there is a
difference between the treatment and control. The sign of t and the values in the table
indicate the direction of the difference. In this case the difference is highly
significant (P < .01) and metabolism 1 is lower than metabolism 2.
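For comparison, the same paired test can be sketched in R, assuming a Table2 data frame
with columns Metabolism1 and Metabolism2 as in the worksheet above:

# Paired t-test sketch in R (Table2 layout assumed)
t.test(Table2$Metabolism1, Table2$Metabolism2, paired = TRUE)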
Step 8: multiple treatment levels, parametric tests
For One way ANOVA
Go to the statistics tab. <Select> ANOVA from the drop-down box. If the data are
all in a single column with the identifiers for them in a second column then <Select>
‘one way...’ from the options. This opens a new window. In the response box type
the identifier for the variable being tested (e.g., metabolism). In the factor box type
the column that contains the treatment levels. On the other hand if the data are
structured as in Table 2 with each measurement in a separate column <Select> ‘one
way (unstacked)...’. This opens a new window. Enter the column identifiers for the
columns containing the data into the box marked ‘Responses (in separate columns)’.
Then click on OK.
Using the data from Table 1 the output appears as follows:
One-way ANOVA: Metabolism versus Levels

Source   DF   SS       MS       F       P
Levels    2   1.3418   0.6709   36.58   0.000
Error    12   0.2201   0.0183
Total    14   1.5619

S = 0.1354   R-Sq = 85.91%   R-Sq(adj) = 83.56%

Level   N   Mean     StDev
1       5   1.7080   0.0769
2       5   1.3680   0.0864
3       5   0.9760   0.2040

[Character plot of individual 95% CIs for the means, based on pooled StDev, omitted.]

Pooled StDev = 0.1354
F and P values are shown in the variance table at the start of the output. If the P value
is less than .05 then there is a significant effect of the treatment. In this case there is a
highly significant treatment effect. (Note Minitab refers to p values less than .001 as
0.000. These should be cited as P < .001).
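For comparison, the same one-way ANOVA can be sketched in R on stacked data, assuming a
Table1 data frame with Metabolism and Levels columns:

# One-way ANOVA sketch in R (Table1 layout assumed)
summary(aov(Metabolism ~ factor(Levels), data = Table1))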
Repeated measures ANOVA:
There is no specific procedure in MINITAB to perform a repeated measures
ANOVA. The best way to perform this test is to use the general linear model test and
include individual ID as a random factor in the model. The data need to be in the
'stacked format' for this analysis, i.e., all the data need to be in a single column with
other columns identifying the treatment and the individual IDs. The following
analysis uses the data from Table 2, where 15 individuals are measured in 3
conditions (control and 2 treatments, labelled metabolism 1, 2 and 3).
To perform this test, go to the statistics tab. <Select> ANOVA. From the options
that appear <select> GLM – general linear model. This opens a new window. In
the box that opens type the column identifier for the variable you are interested in
testing into the response box. In the model box you need to enter the column
identifier for the treatment levels and the column identifier for the column containing
the individual IDs. In the box marked ‘random factors’ enter the same column
identifier for the IDs. (Note if each individual is measured only once in each
condition an interaction of individual and treatment level cannot be tested). The
output appears as follows.
General Linear Model: Metabolism versus ID, Treatment

Factor   Type     Levels   Values
ID       random   15       1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
trmt     fixed     3       Metabolism1, Metabolism2, Metabolism3

Analysis of Variance for C21, using Adjusted SS for Tests

Source   DF   Seq SS    Adj SS    Adj MS    F       P
ID       14   2.80231   2.80231   0.20017   13.91   0.000
trmt      2   0.10875   0.10875   0.05438    3.78   0.035
Error    28   0.40292   0.40292   0.01439
Total    44   3.31398

S = 0.119958   R-Sq = 87.84%   R-Sq(adj) = 80.89%
The F and P values for the treatment effect and for the individual effect are shown in
the variance table. A value less than .05 indicates a significant effect. In this example
there was both a significant treatment effect, and also a significant individual effect.
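For comparison, the same model can be sketched in R using an error stratum for individual
ID; the data frame (here arbitrarily called stacked) is assumed to hold the stacked
Metabolism values with Treatment and ID columns, mirroring the worksheet described above:

# Repeated measures ANOVA sketch in R (stacked layout assumed)
stacked$Treatment <- factor(stacked$Treatment)
stacked$ID <- factor(stacked$ID)
summary(aov(Metabolism ~ Treatment + Error(ID), data = stacked))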
Step 10: post hoc tests
For the paired t-test see the procedure detailed above under Step 7. For post hoc tests
proceed as follows. Conduct the one-way ANOVA as described under Step 8. However,
before clicking on OK, click on the button labelled comparisons. Choose the test you
want to use e.g. Tukey test. This time in addition to the ANOVA results there is an
additional output below the analysis of variance table as follows.
Grouping Information Using Tukey Method

Levels   N   Mean     Grouping
1        5   1.7080   A
2        5   1.3680     B
3        5   0.9760       C

Means that do not share a letter are significantly different.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of Levels

Individual confidence level = 97.94%

Levels = 1 subtracted from:

Levels   Lower     Center    Upper
2        -0.5683   -0.3400   -0.1117
3        -0.9603   -0.7320   -0.5037

Levels = 2 subtracted from:

Levels   Lower     Center    Upper
3        -0.6203   -0.3920   -0.1637

[Character plots of the pairwise confidence intervals omitted.]
The first part of this output shows the pairwise comparisons of each level. In this case
the 3 groups all differ significantly from each other, which is indicated in the table by
the fact that none of them share a letter adjacent to the level identifier.
The information under the table shows the pairwise differences and their confidence
limits.
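For comparison, Tukey comparisons after a one-way ANOVA can be sketched in R by passing
the fitted model to TukeyHSD(); a Table1 data frame with Metabolism and Levels columns is
assumed:

# Tukey post hoc sketch in R (Table1 layout assumed)
fit <- aov(Metabolism ~ factor(Levels), data = Table1)
TukeyHSD(fit, conf.level = 0.95)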
For repeated measures ANOVA tested in GLM
Repeat the analysis as specified above under step 8 but before clicking on OK to run
the test click on ‘comparisons’. In the new window that opens enter the column
identifier for the treatment in the box labelled ‘terms’. Select the test required, e.g.
Tukey test. The output appears as follows.
Grouping Information Using Tukey Method and 95.0% Confidence

C22           N    Mean   Grouping
Metabolism2   15   1.5    A
Metabolism3   15   1.4    A B
Metabolism1   15   1.4      B

Means that do not share a letter are significantly different.
In this instance metabolism 1 doesn’t differ from metabolism 3 but it does differ
from metabolism 2. However metabolism 2 and 3 are also not significantly different.
Step 11 Power analysis
Go to the statistics tab. Select 'power analysis and sample size'. Select the test you
used from the options that appear. Under each of the options you need to specify all
the values except 'power', e.g., under two sample t-test you need to fill in the boxes
that specify sample sizes, differences and standard deviation. The sample size is the
number of measurements in each group. The difference is the size of the effect that
you would consider important to detect. For example, if you felt a difference between
groups would need to be 5% or larger before you would consider it important, then
you need to take the mean value across all the measurements and calculate 5% of
that value. Finally, add the pooled standard deviation from the output of the test. For
example, in the 2 sample t-test detailed above the overall mean was 1.368, so 5% of
this would be 0.0684. The standard deviation was 0.133 and the sample size per
group was 10. Putting these into the respective boxes and clicking Ok runs the
analysis.
The typical output looks like this
Power and Sample Size

2-Sample t Test

Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05  Assumed standard deviation = 0.133

             Sample
Difference   Size     Power
0.068        10       0.191413

The sample size is for each group.
The power for this example is 0.191; multiply this by 100 to express it as a
percentage. So the power in this case to detect a 5% difference between means was
only 19.1%. This means we cannot be sure that the absence of a difference wasn't just
a type 2 error because of the low sample size. If you run the test again, but this time
put the desired minimum power into the power box (i.e., 0.8) and leave the sample
size box empty, the analysis will show that to detect a 5% difference would need 62
animals per group.
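For comparison, power.t.test() in R performs the same calculations; supplying the desired
power and leaving the sample size unspecified makes it solve for the per-group n, which
should agree approximately with the 62 animals per group quoted above:

# Solve for the per-group sample size at 80% power (values from the example above)
power.t.test(delta = 0.0684, sd = 0.133, sig.level = 0.05, power = 0.8)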
Step 13: Two-way ANOVA
Two-way ANOVA
In Minitab the best way to perform two-way ANOVA is to use the general linear
model (GLM) as detailed under Step 8 above. Although there is a facility to do two-way
ANOVA in MINITAB under the statistics and ANOVA tabs, this only works for
completely balanced designs. To perform two-way ANOVA by GLM the data need
to be in the 'stacked format', i.e., all the response data need to be in a single column
with other columns identifying the treatment or factor variables and individual IDs.
To select this test choose <statistics>, <ANOVA>, <general linear model>. In the
box labelled ‘responses’ type the column identifier for the variable you want to
analyse (e.g. Metabolism). In the box labelled ‘model’ it is necessary to include the
column identifiers for both of the treatment variables plus an additional term which
is the multiplication of these two variables to reflect the interaction of the predictors.
For example, using the data in Table 1, the treatment levels are in
column 5 (C5) and the sex identifiers in column 6 (C6). Hence the model would be
C5 C6 C5*C6
In the case of the metabolism data detailed in Table 1 above, with treatment and sex as
factors, the output appears as follows.
General Linear Model: Metabolism versus Treatment, Sex

Factor   Type    Levels   Values
Levels   fixed   3        1, 2, 3
Sex      fixed   2        0, 1

Analysis of Variance for Metabolism, using Adjusted SS for Tests

Source       DF   Seq SS    Adj SS    Adj MS    F       P
Levels        2   1.34181   1.21041   0.60520   37.58   0.000
Sex           1   0.03325   0.03325   0.03325    2.06   0.185
Levels*Sex    2   0.04188   0.04188   0.02094    1.30   0.319
Error         9   0.14495   0.14495   0.01611
Total        14   1.56189

S = 0.126908   R-Sq = 90.72%   R-Sq(adj) = 85.56%
The significance of the different effects is shown in the ANOVA table.
In this case there is a significant treatment effect (p < 0.001), no significant sex
effect (p > 0.05) and no significant sex by treatment interaction (p = 0.319).
Repeated measures 2-way ANOVA
To perform this test, simply repeat the above procedure but include a column
with the individual IDs in it and put this into the model. This column is also entered
into the box labelled 'random factors'.
Step 14: Proceed in the same way as for step 13 Two-way ANOVA. Add additional factors
and interactions into the ‘model’ box.
Step 18: Non-parametric tests, paired comparisons
Mann-Whitney U-test
The data need to be organised with the data for treatment and control in different
columns (here using the data from Table 1).
Go to Statistics – nonparametrics – Mann Whitney. This opens a new window. Enter
the column identifier for the treatment data and control data in the respective boxes.
Click on OK. The output appears as follows.
Mann-Whitney Test and CI: Treatment, Control

    N    Median
T   10   1.5650
C    5   1.0000

Point estimate for ETA1-ETA2 is 0.5600
95.7 Percent CI for ETA1-ETA2 is (0.2599, 0.7799)
W = 105.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0027
The test is significant at 0.0026 (adjusted for ties)
The p value for the test is shown on the bottom line of the output. If this value is
< .05 the difference between the columns is significant.
Wilcoxon matched pairs
The data need to be organised as in Table 2. Before running the test it is necessary to
subtract one column from the other. To do this go to Calc and select calculator. Identify
the column where you wish the result to be placed (e.g., C10). Put the subtraction
calculation into the 'Expression' box, i.e., using the data in Table 2 the expression
would be 'C3-C2' (i.e., treatment minus control data).
Go to Statistics – nonparametrics – 1-Sample Wilcoxon. This opens a new
window. Put the column identifier for the column that contains the differences into
the box marked 'variables'. Click the button marked 'test median' and enter the value
0.0 into the box if it doesn't appear automatically.
The output appears as follows.
Wilcoxon Signed Rank Test: C10

Test of median = 0.000000 versus median not = 0.000000

      N    N for Test   Wilcoxon Statistic   P       Estimated Median
C10   15   15           108.0                0.007   0.1075
The value for the Wilcoxon statistic and the associated p value are displayed in the
table. If the P value is less than 0.05 as it is in this case there is a significant
difference between the treatment and control.
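For comparison, the matched-pairs test can be run directly in R without forming a
difference column; a Table2 data frame with control and treatment columns Metabolism1
and Metabolism2 is assumed:

# Wilcoxon matched-pairs sketch in R (Table2 layout assumed)
wilcox.test(Table2$Metabolism2, Table2$Metabolism1, paired = TRUE)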
Step 19: Non-parametric analysis when there are multiple treatments or levels
Kruskal-Wallis ANOVA
Go to Statistics <select> Nonparametrics <select> Kruskal-Wallis. In the response box
enter the column identifier for the dependent variable (e.g., metabolism) and in the
factor box type the identifier for the treatment variable. Click OK. The output
appears as follows.
Kruskal-Wallis Test: Metabolism versus Treatment

Kruskal-Wallis Test on Metabolism

Treatment    N    Median   Ave Rank   Z
1           10    1.565    10.5        3.06
2            5    1.000     3.0       -3.06
Overall     15              8.0

H = 9.38  DF = 1  P = 0.002
H = 9.41  DF = 1  P = 0.002  (adjusted for ties)
The significance is indicated on the last line as a P-value. If P < .05 there is a
significant treatment effect.
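For comparison, the same test can be sketched in R, assuming a Table1 data frame with
Metabolism and treatment columns in the stacked format:

# Kruskal-Wallis sketch in R (Table1 layout assumed)
kruskal.test(Metabolism ~ factor(treatment), data = Table1)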
Friedman test
To apply the Friedman test in Minitab the data need to be structured in a different
way. All the data need to be in a single column, with the individual identifiers in a
separate column and the treatments in a third column. To generate these columns
from the data in Table 2 you can use the 'stack' command, i.e., go to Data <select>
Stack and then select Columns from the options that appear. In the first box type the
column identifiers for the two sets of metabolism data (C2 and C3) and then click the
button 'column of current worksheet' and type in the column number of the column
where you want the stacked data to be stored (e.g., C10). In the store subscripts box
type another column name (e.g., C11); this will identify which values correspond to
treatment and which to control. To get the corresponding individual IDs type C1 C1
in the 'stack the following columns' box; this will give you the individual IDs in a
third new column (e.g., C12).
To perform the Friedman test choose Statistics – Nonparametrics – Friedman test.
In the box that opens type the column identifier for the stacked metabolism data (in
the above case C10). In the treatment box type the column identifier for the
subscripts (C11) and in the blocks box type the column identifying the individual
IDs (C12). The output is as follows.
Friedman Test: metabolism versus treatment blocked by IDs

S = 5.40  DF = 1  P = 0.020

trtment    N    Est Median   Sum of Ranks
1         15    1.3900       18.0
2         15    1.4900       27.0

Grand median = 1.4400
The result and P value are on the first line of the output.
Step 23: Analysis of Covariance (ANCOVA)
Data should be organized as shown in Table 1. Go to Statistics <select> ANOVA,
<select> general linear model. In the box labelled response, add the column
identifier for the dependent variable (e.g., metabolism, column 2 in Table 1). In the
box labelled 'model' add the column identifiers for the treatment variable, the
covariate (i.e., body mass) and the treatment by covariate interaction. E.g., using the
data in Table 1 as an example, the model is specified as
C3 C4 C3*C4
It is necessary to declare that body weight is a covariate in the model. To do this
click on the box labelled ‘covariates’ and then type the column identifier for the
covariate (in this case C3) into the box. Close this box and then click on Ok to run
the analysis. The output appears as follows
General Linear Model: Metabolism versus Levels

Factor   Type    Levels   Values
Levels   fixed   3        1, 2, 3

Analysis of Variance for Metabolism, using Adjusted SS for Tests

Source             DF   Seq SS    Adj SS    Adj MS    F      P
Body Mass           1   0.07801   0.07686   0.07686   5.64   0.042
Levels              2   1.32527   0.05598   0.02799   2.06   0.184
Levels*Body Mass    2   0.03604   0.03604   0.01802   1.32   0.314
Error               9   0.12257   0.12257   0.01362
Total              14   1.56189

S = 0.116699   R-Sq = 92.15%   R-Sq(adj) = 87.79%

Term               Coef       SE Coef   T       P
Constant           -0.3954    0.7367    -0.54   0.604
Body Mass           0.07448   0.03135    2.38   0.042
Body Mass*Levels
  1                -0.05786   0.04214   -1.37   0.203
  2                -0.01052   0.04386   -0.24   0.816
The significance of the different effects is shown in the ANOVA table.
In this case body mass was a significant covariate (p = 0.042), and no significant
effect of treatment or of the interaction between body mass and treatment level was
found (p > 0.05).
Note: see the results under Part 14, where treatment levels were shown to significantly
affect metabolism. The analysis of covariance here suggests that these effects on
metabolism might be explained by differences in body mass between individuals and
are thus not caused by the treatment. However, to perform the final analysis of these
data the analysis needs to be repeated omitting the non-significant interaction effect
from the model, i.e., the model should be respecified as
C3 C4
This should only be done when the interaction term is NOT significant. In this
analysis keep C3 as a covariate. The revised output is as follows.
General Linear Model: Metabolism versus Levels

Factor   Type    Levels   Values
Levels   fixed   3        1, 2, 3

Analysis of Variance for Metabolism, using Adjusted SS for Tests

Source      DF   Seq SS    Adj SS    Adj MS    F       P
Body Mass    1   0.07801   0.06147   0.06147    4.26   0.063
Levels       2   1.32527   1.32527   0.66264   45.96   0.000
Error       11   0.15861   0.15861   0.01442
Total       14   1.56189

S = 0.120078   R-Sq = 89.85%   R-Sq(adj) = 87.08%

Term        Coef      SE Coef   T       P
Constant    -0.1899   0.7467    -0.25   0.804
Body Mass    0.06559  0.03177    2.06   0.063
This revised analysis excluding the interaction term shows that, consistent with the
data analysis in section 14, there is a significant treatment effect and an effect of body
mass that just fails to reach statistical significance (p = 0.063). This emphasises the
critical importance of re-running such analyses excluding non-significant interaction
terms.
Step 37 Correlation matrix
Select Statistics <select> display basic statistics and <select> correlation. In the
variables box enter the column identifiers for all the predictor variables that you
want to correlate together. The output is a correlation matrix.
A typical matrix might be as follows for a situation where 5 organ weights are available.
Organ             Liver   WAT    Brain   Skeletal muscle
Liver             1.00
WAT               0.32    1.00
Brain             0.22    0.11   1.00
Skeletal muscle   0.66    0.17   0.55    1.00
BAT               0.13    0.93   0.03    0.61
This table highlights that WAT and BAT are highly correlated and hence not independent
predictor variables. Skeletal muscle is also quite highly correlated with liver mass. One can
proceed with the analysis ignoring these effects but one should be aware that such
correlations may compromise the outcome. In this case a strong effect of WAT might
emerge because of the effect of BAT on metabolism combined with the high correlation of
WAT with BAT. This analysis requires the number of observations (i.e., individuals) to
exceed by at least a factor of 3 the number of predictor variables included into the analysis.
Hence in this situation one would have 5 predictors so for each group (i.e., treatment levels
and control) one would need at least 15 individuals – and preferably many more.
Interpretation of these effects depends on the complexity of the interactions. The bottom line
is to diagnose an overall treatment effect controlling for these body composition variables. If
there is an overall treatment effect one can establish where this occurs using the multiple
range tests (TUKEY TEST and DUNCAN’S MULTIPLE RANGE TEST).
Step 38: PRINCIPAL COMPONENTS ANALYSIS.
To perform a principal components analysis the data for the individual organ weights
need to be organised such that the organ weights are in separate columns and the
organ weights for a given individual are in a single row. An example set of data is
included in Appendix one. These data are 17 organ weights in grams from 30 rats.
The original data were published in Selman et al (2008). To perform a principal
components analysis on these data select Statistics - multivariate and principal
components . In the box labelled variables type the column identifiers for the 17
organs (eg c1 – c17). In the box labelled ‘number of components to compute’ type 5.
This will restrict the analysis to calculate only the first 5 components. Otherwise if
this is left blank the analysis will compute n components where n is the original
number of columns entered into the analysis. Click on the button labelled ‘storage’
and in the new window that opens type column identifiers for the same number of
columns that you asked the program to compute. E.g. if you asked it to compute 5
components then type 5 column identifiers for example C18-c22. Click OK. Closes
new window. Click OK. Runs analysis. The output looks as follows:
Principal Component Analysis: Carcass, HEART, LIVER, KIDNEY, BRAIN, Brown Fat, ...

Eigenanalysis of the Correlation Matrix

28 cases used, 2 cases contain missing values

Eigenvalue   7.6875   3.4298   1.6138   1.0599   0.7983   0.6485   0.5503   0.3541
Proportion    0.452    0.202    0.095    0.062    0.047    0.038    0.032    0.021
Cumulative    0.452    0.654    0.749    0.811    0.858    0.896    0.929    0.950

Eigenvalue   0.2827   0.1803   0.1250   0.1029   0.0749   0.0463   0.0228   0.0187
Proportion    0.017    0.011    0.007    0.006    0.004    0.003    0.001    0.001
Cumulative    0.966    0.977    0.984    0.990    0.995    0.997    0.999    1.000

Eigenvalue   0.0044
Proportion    0.000
Cumulative    1.000

Variable              PC1      PC2      PC3      PC4      PC5
Carcass                0.333   -0.107   -0.071    0.141    0.077
HEART                  0.236    0.286    0.052   -0.206    0.378
LIVER                  0.283   -0.215    0.017    0.016    0.100
KIDNEY                 0.288    0.270    0.089    0.034   -0.163
BRAIN                  0.134    0.371    0.180    0.087    0.014
Brown Fat             -0.231   -0.118    0.485   -0.020   -0.180
Abdominal Fat          0.225   -0.333    0.140   -0.326   -0.118
Gonadal Fat            0.268   -0.255    0.110   -0.279   -0.280
Mesenteric Fat         0.287   -0.097    0.244   -0.196    0.204
Gonads                 0.139   -0.302   -0.147    0.451    0.502
Large Intestine (g)    0.314   -0.014   -0.098   -0.009    0.209
Small Intestine (g)    0.149    0.058    0.574    0.128    0.066
Stomach                0.133    0.318    0.339    0.357   -0.030
Lungs                 -0.019    0.372   -0.114   -0.569    0.296
Pancreas               0.276    0.120   -0.252    0.066   -0.349
Pelage                 0.328   -0.153   -0.042   -0.055   -0.194
Tail                   0.229    0.280   -0.267    0.164   -0.317
At the top of the output the note reminds us that for this analysis to run it is
necessary to have complete data for all animals. If the data for a given animal is
incomplete it is excluded from the analysis. Beneath this is a table containing 17 sets
of values labelled Eigenvalue, proportion and cumulative. These are the proportions
of the original variance contained in the 17 computed components ordered by size.
Hence the first principal component explains 45.2% of the original variation. The
eigenvalue is a representation of how much better this variable is at describing the
variance compared to the original variables. As there were 17 original variables they
each contain 1/17th of the total variation (p = 0.0588). Since this new variable
contains p = 0.452 of the variation it is 0.452/0.0588 = 7.68x better than the original
variables at describing the data. Another way of thinking about the eigenvalue is that
it is the number of original variables that the current variable is ‘worth’. The second
principal component in this case has an eigen value of 3.43 and explains 20.2% of
the variation, so the cumulative variance explained by components 1 and 2 is 65.4%.
Looking at this table you can see that beyond the 4th principal component the
eigenvalue falls below 1 so these variables explain less than the original variables.
Moreover, the first 4 components explain together 81% of the original variation.
This means that by looking at just the first 4 components retains 81% of the original
information but in just 4 as opposed to 17 variables.
Below the eigenvalues and variance table is a second table showing each of the
original variables alongside the new principal components (PC1 to PC5). The values
in this table are ‘eigenvectors’ that show the strength and direction of the association
between the original variable and the new component. As you can see almost all the
variables affect PC1 in a positive way and so it reflects an overall size component,
while PC2 is negatively affected by all the body fat components so it is a reflection
of leanness of the animals. We can use the ‘scores’ on these principal components in
a general linear model (see above) in place of the original organ masses. The major
advantage of this is that these principal components are by definition completely
independent of each other. This makes their use in the general linear model more
statistically valid.
R
Step 1: Normality tests
The following code will perform the Anderson-Darling and the Shapiro-Wilk
tests for normality on the Metabolism variable in the Table1 data frame. To
perform the Anderson-Darling test using the ad.test() function, the “nortest”
package must be installed. On most systems the install.packages() function
can be used to install packages; otherwise, there are package installation
wizards associated with all script editors. To use the “nortest” package, it must
be loaded using the library() function.
Anderson-Darling test
install.packages("nortest")
library(nortest)
ad.test(Table1$Metabolism)
Anderson-Darling normality test
data: Table1$Metabolism
A = 0.2395, p-value = 0.732
Shapiro-Wilk test
shapiro.test(Table1$Metabolism)
Shapiro-Wilk normality test
data: Table1$Metabolism
W = 0.9566, p-value = 0.6333
If the p-value is less than 0.05, then the distribution of the data differs
significantly from normal. If the p-value is greater than 0.05, then the data can
be considered normally distributed. Normality Q-Q plots can be made for the
Metabolism variable in the Table1 data frame using the qqnorm() function. A
Q-Q line that reflects what would be expected if the distribution is normal can
be added using the qqline() function. Note that to make this plot with the
observed values on the x-axis, datax = TRUE must be specified in both the
qqnorm() and qqline() functions.
qqnorm(Table1$Metabolism, datax = TRUE, xlab = "Expected Normal", ylab =
"Observed Values")
qqline(Table1$Metabolism, datax = TRUE)
Step 3: Attempt to normalise the distribution by transforming it.
New log10 or square root transformed variables can be added to the data frame
using the log10() and sqrt() functions, respectively. Transformed variables
(e.g. log10Metabolism or sqrtMetabolism) can be analyzed in the steps below.
Table1$log10Metabolism <- log10(Table1$Metabolism)
Table1$sqrtMetabolism <- sqrt(Table1$Metabolism)
Alternatively, data may be transformed using the Box-Cox procedure.
Prior to performing a Box-Cox transformation, the "MASS" library needs to
be loaded using the library() function (the "MASS" package comes with the
installation of R). To perform the Box-Cox transformation in R, an ANOVA
or ANCOVA model needs to be specified using either the lm() or aov()
functions (e.g. aov(Metabolism ~ Levels, data = Table1)). The boxcox()
function calculates the log-likelihood of a sequence of lambda (λ) values,
attempting to normalise the residuals of the linear model that is specified. The
default of the boxcox() function is to calculate log-likelihood values for λ
values between -2 and 2 at 0.1 intervals. In this example, the plateau of the
log-likelihood function peaks outside these λ values; therefore, in the function
below log-likelihood values between -5 and 5 at 0.1 intervals are calculated
using the seq() function. The default of the boxcox() function is to plot
log-likelihood values against λ values with the 95% confidence interval of λ values.
In this example, the results of the boxcox() function will be placed in a list that
we have arbitrarily called bc. This list contains two vectors: (1) a vector of the
λ values used between -5 and 5 (this vector is called x and it can be seen with
the command bc$x), and (2) a vector of all the log-likelihood values
calculated from the λ values (this vector is called y and it is shown below
rounded to two decimal places using the round() function). The max()
function is used to determine the largest log-likelihood value. The
which.max() function is used to determine the position of the maximum
log-likelihood value in the log-likelihood vector. In this case, the 77th value in the
log-likelihood vector was the maximum value. The vector notation bc$x[] will
be used to output the λ value in the position corresponding to the maximum
log-likelihood value; this value will be called Lambda. Finally, values
transformed by the Box-Cox transformation (yi(λ)) are calculated using the
following formulas: (1) if the maximum log-likelihood λ ≠ 0: yi(λ) = (yi^λ - 1) / λ
(shown below); (2) if the maximum log-likelihood λ = 0: yi(λ) = loge(yi),
specified using the log() function in R. This transformation is then applied to
the dependent variable (i.e. Metabolism) and then the ANOVA or ANCOVA
model is re-run using the Box-Cox transformed dependent variable.
library(MASS)
bc <- boxcox(aov(Metabolism ~ Levels, data = Table1), lambda = seq(-5, 5, 0.1))
bc$x
  [1] -5.0 -4.9 -4.8 -4.7 -4.6 -4.5 -4.4 -4.3 -4.2 -4.1 -4.0 -3.9 -3.8 -3.7 -3.6
 [16] -3.5 -3.4 -3.3 -3.2 -3.1 -3.0 -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3 -2.2 -2.1
 [31] -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6
 [46] -0.5 -0.4 -0.3 -0.2 -0.1  0.0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
 [61]  1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9  2.0  2.1  2.2  2.3  2.4
 [76]  2.5  2.6  2.7  2.8  2.9  3.0  3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9
 [91]  4.0  4.1  4.2  4.3  4.4  4.5  4.6  4.7  4.8  4.9  5.0
round(bc$y, digits = 2)
  [1] -31.68 -30.83 -29.98 -29.13 -28.28 -27.44 -26.60 -25.76 -24.93 -24.10
 [11] -23.27 -22.45 -21.62 -20.81 -19.99 -19.18 -18.37 -17.56 -16.76 -15.97
 [21] -15.17 -14.38 -13.59 -12.81 -12.03 -11.26 -10.49  -9.72  -8.96  -8.20
 [31]  -7.45  -6.70  -5.96  -5.23  -4.49  -3.77  -3.05  -2.33  -1.63  -0.93
 [41]  -0.23   0.45   1.13   1.80   2.47   3.12   3.76   4.40   5.02   5.63
 [51]   6.23   6.81   7.38   7.94   8.48   9.01   9.52  10.01  10.48  10.93
 [61]  11.35  11.76  12.14  12.50  12.83  13.14  13.42  13.67  13.89  14.09
 [71]  14.26  14.40  14.51  14.60  14.65  14.69  14.69  14.67  14.63  14.56
 [81]  14.47  14.36  14.23  14.09  13.92  13.74  13.54  13.33  13.10  12.86
 [91]  12.61  12.35  12.08  11.80  11.51  11.21  10.91  10.59  10.27   9.95
[101]   9.62
max(bc$y)
[1] 14.6913
which.max(bc$y)
[1] 77
Lambda <- bc$x[which.max(bc$y)]
Lambda
[1] 2.6
Table1$MetabolismBC <- (Table1$Metabolism^Lambda) / Lambda
summary(Table1$MetabolismBC)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1157  0.5230  0.8232  0.9376  1.4030  1.7990
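The two branches of the Box-Cox formula can also be wrapped in a small helper function so that the λ = 0 case is handled automatically; a minimal sketch (the function name boxcox_transform is ours, not part of any package, and unlike the line above it includes the subtraction of 1 from the formula):
boxcox_transform <- function(y, lambda) {
  # (y^lambda - 1) / lambda when lambda != 0; natural log when lambda == 0
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
Table1$MetabolismBC2 <- boxcox_transform(Table1$Metabolism, Lambda)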
Step 7: Paired comparisons of parametric normally distributed data
For two sample t-test
Using the data presented in Table 1, the difference between the Treatments
can be evaluated using the t.test() function. In this function, a two-sided test
is the default comparison between the Treatments. If equal variances in the
two Treatments are assumed, then use the following code:
t.test(Metabolism ~ Treatment, data = Table1, var.equal = TRUE)
If equal variances in the two treatments are not assumed, then use the
following code:
t.test(Metabolism ~ Treatment, data = Table1, var.equal = FALSE)
The output from the t.test() function where equal variances in the two
treatments are not assumed is:
Welch Two Sample t-test
data: Metabolism by Treatment
t = 5.1023, df = 7.771, p-value = 0.001014
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.3066951 0.8173049
sample estimates:
mean in group 1 mean in group 2
          1.538           0.976
The t-value, df, and p-value are shown on the same line. If the p-value < 0.05
then the difference between the two Treatments is significant. In this case,
Metabolism in group 1 was significantly greater than Metabolism in group 2.
If you want to output the standard deviation and the standard error (standard
deviation / square root of the number of samples) for both groups, then use the
tapply() function.
tapply(Table1$Metabolism, Table1$Treatment, sd)
        1         2
0.1950954 0.2040343
tapply(Table1$Metabolism, Table1$Treatment, sd) /
sqrt(tapply(Table1$Metabolism, Table1$Treatment, length))
         1          2
0.06169459 0.09124692
For Paired t-test
This analysis uses the data provided in Table 2. To run this analysis, the data
in Table 2 needs to be in ‘stacked format’. In the stacked format data frame,
there are five columns: 1) ID, 2) Metabolism – all values from the Metabolism
Control, Metabolism Treatment 1, and Metabolism Treatment 2 columns, 3)
Levels - the Metabolism Control, Metabolism Treatment 1, and Metabolism
Treatment 2 values are specified as Metabolism1, Metabolism2, and
Metabolism3, respectively, 4) Body Mass and 5) Sex. In the stacked data, all
categorical variables (ID and Sex) must be specified as factors using the
as.factor() function. This paired t-test compares Metabolism between the
Levels Metabolism1 and Metabolism2 (the Metabolism3 values are removed
using the subset argument with the not-equal operator !=).
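If only the wide-format Table 2 is available, the stacked data frame can be built directly in R; a minimal sketch, assuming the wide file contains columns named ID, Control, Treatment1, Treatment2, Body.Mass, and Sex (adjust these names to match your file):
# Build a stacked (long-format) data frame from hypothetical wide-format columns
Table2wide <- read.table("Table2.txt", header = TRUE)
Table2stack <- data.frame(
  ID = as.factor(rep(Table2wide$ID, 3)),
  Metabolism = c(Table2wide$Control, Table2wide$Treatment1, Table2wide$Treatment2),
  Levels = as.factor(rep(c("Metabolism1", "Metabolism2", "Metabolism3"),
                         each = nrow(Table2wide))),
  Body.Mass = rep(Table2wide$Body.Mass, 3),
  Sex = as.factor(rep(Table2wide$Sex, 3)))
Alternatively, the pre-stacked file Table2b.txt can be read in directly: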
Table2stack <- read.table("Table2b.txt", header=TRUE)
Table2stack$Sex <- as.factor(Table2stack$Sex)
Table2stack$ID <- as.factor(Table2stack$ID)
t.test(Metabolism ~ Levels, data = Table2stack, subset = Table2stack$Levels != "Metabolism3", paired = TRUE, var.equal = FALSE)
Paired t-test
data: Metabolism by Levels
t = -3.1122, df = 14, p-value = 0.007644
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.20269722 -0.03730278
sample estimates:
mean of the differences
-0.12
In this case, the difference between the Metabolism1 and Metabolism2 Levels
is highly significant (i.e. P < .01). To calculate the mean within each Level,
use the tapply() function.
tapply(Table2stack$Metabolism, Table2stack$Levels, mean)
Metabolism1 Metabolism2 Metabolism3
   1.350667    1.470667    1.402000
Step 8: multiple treatment levels, parametric tests
R
For One way ANOVA
Using the data from Table 1, to compare the Control, Treatment1, and
Treatment2 Levels, use the aov() function.
summary(aov(Metabolism ~ Levels, data = Table1))
            Df  Sum Sq Mean Sq F value    Pr(>F)
Levels       2 1.34181 0.67091  36.582 7.827e-06 ***
Residuals   12 0.22008 0.01834
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The p-value in this summary table is under the Pr(>F) heading. In this case,
the difference between Levels is highly significant (i.e. P < .01).
Repeated-measures ANOVA
This analysis uses the data provided in Table 2 and compares Metabolism
measured under three conditions (Levels: Metabolism1, Metabolism2, and
Metabolism3). To run this analysis, the data in Table 2 needs to be in ‘stacked
format’ – i.e. all the Metabolism data need to be in a single column, with four
other columns identifying the Levels (Metabolism1, Metabolism2, and
Metabolism3), Body Mass, Sex, and individual IDs. Repeated measures
ANOVA can be performed in R using either the lme() or the aov() functions.
The random effect of ID is included in the lme() function by adding “random
= ~1|ID” as a separate argument, whereas in the aov() function it is included
by adding “+ Error(ID)” in the formula. In order to use the lme() function, the
“nlme” package must be installed and loaded using the library() function.
install.packages("nlme")
library(nlme)
anova(lme(Metabolism~Levels, random = ~1|ID, data = Table2stack))
            numDF denDF  F-value p-value
(Intercept)     1    28 445.5459  <.0001
Levels          2    28   3.7787  0.0353
summary(aov(Metabolism ~ Levels + Error(ID), data = Table2stack))
Error: ID
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 14 2.8023 0.20017
Error: Within
           Df  Sum Sq  Mean Sq F value  Pr(>F)
Levels      2 0.10875 0.054376  3.7787 0.03525 *
Residuals  28 0.40292 0.014390
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The p-value in this summary table is under the Pr(>F) heading. The difference
between Levels is significant (i.e. P < .05). The significance of the random
effect is not provided by either the lme() or the aov() functions, but the
F-value can be calculated from the aov() output by dividing the Mean Square of
the ID effect (0.20017) by the Mean Square Residual (0.014390). Using this
F-value, the p-value can be calculated from the F distribution with degrees
of freedom equal to 14 and 28 using the pf() function. In this case, the random
effect of ID is highly significant (P < 0.0001).
pf(0.20017 / 0.014390, df1 = 14, df2 = 28, lower.tail = FALSE)
[1] 4.51458e-09
Step 10: post hoc tests
For One way ANOVA
The one-way ANOVA in step 8 suggested that Metabolism was significantly
different among the Levels in Table 1. Tukey's post-hoc test can be used to
determine which Levels were significantly different from each other, using the
TukeyHSD() function.
TukeyHSD(aov(Metabolism ~ Levels, data = Table1))
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Metabolism ~ Levels, data = Table1)
$Levels
     diff        lwr        upr     p adj
2-1 -0.340 -0.5685037 -0.1114963 0.0048896
3-1 -0.732 -0.9605037 -0.5034963 0.0000053
3-2 -0.392 -0.6205037 -0.1634963 0.0017004
The Levels that are being compared are given in the left-most column. Based
on the p-values (p adj column), all three Levels differ significantly from each
other (P < 0.005).
Repeated-measures ANOVA
The repeated measures ANOVA in step 8 suggested that Metabolism differed
among the Metabolism1, Metabolism2, and Metabolism3 Levels. This
analysis determines which Levels were significantly different from each other.
The data in Table 2 needs to be in ‘stacked format’ for this analysis. Post-hoc
tests following repeated measures ANOVA can be performed on lme()
function objects (see step 8 for the installation of the "nlme" package for the
lme() function), but not on aov() function objects. To proceed, the
"multcomp" package must be installed and loaded using the
library() function, and the glht() function is then used.
library(nlme)
install.packages("multcomp")
library(multcomp)
summary(glht(lme(Metabolism ~ Levels, random = ~1|ID, data = Table2stack),
linfct=mcp(Levels="Tukey")))
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lme.formula(fixed = Metabolism ~ Levels, data = Table2stack,
random = ~1 | ID)
Linear Hypotheses:
                               Estimate Std. Error z value Pr(>|z|)
Metabolism2 - Metabolism1 == 0  0.12000    0.04380   2.740    0.017 *
Metabolism3 - Metabolism1 == 0  0.05133    0.04380   1.172    0.470
Metabolism3 - Metabolism2 == 0 -0.06867    0.04380  -1.568    0.260
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)
The Levels that are being compared are given in the left-most column. Based
on the p-values (Pr(>|z|) column), Metabolism2 differed significantly from
Metabolism1 (P = 0.017), but none of the other Levels differed significantly
from each other (P > .05).
Step 11: Power analysis
To perform power analyses, the “pwr” package must be installed using the
install.packages() function. To use the functions in the “pwr” package, it
must be loaded using the library() function. To perform a power analysis on a
two sample t-test using the pwr.t2n.test() function, you need to input the
sample sizes of the two samples (i.e. n1 = 10, and n2 = 10; note that the
default significance level is set to 0.05). To specify the effect size d in R, the
difference that you would consider important to detect needs to be divided by
the standard deviation; thus, using the same mean (1.368), % difference (5%), and
standard deviation (0.133) as in the MINITAB example: d = 1.368 * 0.05 / 0.133.
install.packages("pwr")
library(pwr)
pwr.t2n.test(n1 = 10, n2 = 10, d = 1.368 * 0.05 / 0.133)
     t test power calculation

             n1 = 10
             n2 = 10
              d = 0.5142857
      sig.level = 0.05
          power = 0.1931212
    alternative = two.sided
As outlined in the MINITAB example, the pwr.t.test() function can be used to
calculate the sample sizes required to obtain a specified level of power (e.g.
power = 0.8).
pwr.t.test(power = 0.8, d = 1.368 * 0.05 / 0.133, type = "two.sample")
     Two-sample t test power calculation

              n = 60.32651
              d = 0.5142857
      sig.level = 0.05
          power = 0.8
    alternative = two.sided
NOTE: n is number in *each* group
Step 13: Two-way ANOVA
Two-way ANOVA
This analysis uses the data in Table 1. In order to use Type III sums-of-squares
as is done in MINITAB and SPSS, the “car” package must be installed and
loaded using the library() function. Two-way ANOVAs can be analyzed using
the aov() function. The aov() function is nested within the Anova() function
specifying that we want the analysis to use type III sums-of-squares (i.e. type
= “III”). R offers a number of different ways by which factor levels can be
compared (i.e. contrasted in statistical terms). Treatment contrasts are the
default method in R; however, this type of contrast is not valid for two-way
ANOVAs using type III sums-of-squares. Helmert or sum contrasts are two
types of contrasts that are valid for this type of ANOVA. Sum contrasts are
set in the command below by passing "contr.sum" to the contrasts argument
of the options() function. The Anova() function will output F-values, p-values,
and the other values that are used to calculate them. In this example, the
Levels:Sex interaction was non-significant (F2,9 = 1.3, P = 0.32), and thus it
can be removed from the model. The model without the interaction suggests
that there is a strong effect of Levels (F2,11 = 39.5, P < 0.0001), but that the
effect of Sex is non-significant (F1,11 = 1.96, P = 0.19).
install.packages("car")
library(car)
options(contrasts=c("contr.sum", "contr.poly"))
Anova(aov(Metabolism ~ Levels + Sex + Levels:Sex, data = Table1), type = "III")
Anova Table (Type III tests)
Response: Metabolism
             Sum Sq Df   F value    Pr(>F)
(Intercept) 26.6451  1 1654.4056 1.634e-11 ***
Levels       1.2104  2   37.5774 4.278e-05 ***
Sex          0.0333  1    2.0648    0.1846
Levels:Sex   0.0419  2    1.3000    0.3192
Residuals    0.1450  9
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Anova(aov(Metabolism ~ Levels + Sex, data = Table1), type = "III")
Anova Table (Type III tests)
Response: Metabolism
             Sum Sq Df  F value    Pr(>F)
(Intercept) 26.6451  1 1568.824 3.221e-13 ***
Levels       1.3418  2   39.502 9.533e-06 ***
Sex          0.0333  1    1.958    0.1893
Residuals    0.1868 11
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Repeated-measures two-way ANOVA
This analysis examines the effect of Levels and Sex on Metabolism including
a random factor of ID. This analysis can be performed using either the lme()
or the aov() functions; however, only the commands for the aov() function
will be shown. The random effect of ID is included in the aov() function by
adding “+ Error(ID)” to the formula. For this analysis, the data in
Table 2 needs to be in ‘stacked format’. The “Levels * Sex” notation specifies
that both the main effects of Levels and Sex are tested in addition to their
interaction. The Levels:Sex interaction was not significant (F2,26 = 0.06, P
= 0.94), and thus it was removed from the model. There was a significant
difference in Metabolism among the Levels (F2,28 = 3.78, P = 0.04); however,
Sex did not significantly affect Metabolism (F1,13 = 0.33, P = 0.58).
summary(aov(Metabolism ~ Levels * Sex + Error(ID), data = Table2stack))
Error: ID
          Df  Sum Sq  Mean Sq F value Pr(>F)
Sex        1 0.06848 0.068481  0.3256  0.578
Residuals 13 2.73383 0.210295

Error: Within
           Df  Sum Sq  Mean Sq F value  Pr(>F)
Levels      2 0.10875 0.054376  3.5258 0.04417 *
Levels:Sex  2 0.00193 0.000967  0.0627 0.93936
Residuals  26 0.40098 0.015422
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
summary(aov(Metabolism ~ Levels + Sex + Error(ID), data = Table2stack))
Error: ID
          Df  Sum Sq  Mean Sq F value Pr(>F)
Sex        1 0.06848 0.068481  0.3256  0.578
Residuals 13 2.73383 0.210295

Error: Within
          Df  Sum Sq  Mean Sq F value  Pr(>F)
Levels     2 0.10875 0.054376  3.7787 0.03525 *
Residuals 28 0.40292 0.014390
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Step 14: Proceed in the same way as for step 13 for Two-way ANOVA, but with
additional factors added to the model formula
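For example, a third factor can be added to the two-way model from step 13 as follows; a minimal sketch in which the factor Strain is hypothetical and would need to exist as a column in Table1:
# Hypothetical third factor Strain, with all two-way interactions included
library(car)
options(contrasts = c("contr.sum", "contr.poly"))
Anova(aov(Metabolism ~ Levels + Sex + Strain + Levels:Sex + Levels:Strain +
          Sex:Strain, data = Table1), type = "III")
As in step 13, non-significant interactions can then be removed and the model re-run.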
Step 18: Non-parametric tests, paired comparisons
Mann-Whitney U/ two-sample Wilcoxon test
This analysis compares Metabolism between the two Treatments in Table 1.
The Mann-Whitney U test can also be called a two-sample Wilcoxon test. To
determine the median of each Treatment, use the tapply() function.
tapply(Table1$Metabolism, Table1$Treatment, median)
    1     2
1.565 1.000
wilcox.test(Metabolism ~ Treatment, data = Table1, conf.int = TRUE)
Wilcoxon rank sum test with continuity correction
data: Metabolism by Treatment
W = 50, p-value = 0.002647
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
0.2600396 0.7799804
sample estimates:
difference in location
0.5543361
R gives a warning message for this analysis that it cannot calculate exact
p-values and confidence intervals because of ties. The p-value and the 95%
confidence interval are corrected because there are ties amongst the
Metabolism values. The p-value suggests that Metabolism is significantly
different between the two Treatments (i.e. P < .05).
Wilcoxon matched pairs
This analysis uses the wilcox.test() function to compare the Metabolism
Control and Metabolism Treatment1 columns in Table 2 assuming that these
values come from the same individual (paired = TRUE). To proceed with this
analysis, the Table 2 data need to be imported using: Table2 <- read.table("Table2.txt", header = TRUE).
wilcox.test(Table2$Control, Table2$Treatment1, paired = TRUE, conf.int = TRUE)
Wilcoxon signed rank test with continuity correction
data: Table2$Control and Table2$Treatment1
V = 12, p-value = 0.006945
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
-0.20504427 -0.03504422
sample estimates:
(pseudo)median
-0.1072468
Again, R gives a warning message for this analysis that it cannot calculate
exact p-values and confidence intervals because of ties. The p-value and the
95% confidence interval are corrected because there are ties amongst the
Metabolism values. This analysis suggests that there is a significant difference
between Control and Treatment1.
Step 19: Non-parametric analysis when there are multiple treatments or levels
Kruskal-Wallis ANOVA
This analysis compares Metabolism between the Treatments in Table 1 using
the kruskal.test() function. The Kruskal-Wallis statistic (χ2) and the p-value
adjusted for ties are given on the last line. A p-value of < .05 suggests that
Metabolism is significantly different between the Treatments.
kruskal.test(Metabolism ~ Treatment, data = Table1)
Kruskal-Wallis rank sum test
data: Metabolism by Treatment
Kruskal-Wallis chi-squared = 9.4086, df = 1, p-value = 0.00216
Repeated measures Friedman test
This analysis compares Metabolism between the Levels in Table 2. The data in
Table 2 needs to be in ‘stacked format’ and this analysis will focus on
comparing Metabolism between Levels Metabolism1 and Metabolism2. The
subset() function is used to omit the Metabolism3 Level and create a new
data frame called Table2stacksub. The as.factor(as.character()) functions
are required in this case because otherwise R retains Metabolism3 as an
empty factor level (i.e. a Level with zero individuals), which causes an error
with the Friedman test; equivalently, the droplevels() function can be used.
Table2stacksub <- subset(Table2stack, Table2stack$Levels != "Metabolism3")
Table2stacksub$Levels <- as.factor(as.character(Table2stacksub$Levels))
friedman.test(Table2stacksub$Metabolism, Table2stacksub$ID,
Table2stacksub$Levels)
Friedman rank sum test
data: Table2stacksub$Metabolism, Table2stacksub$ID and
Table2stacksub$Levels
Friedman chi-squared = 26.6679, df = 14, p-value = 0.02126
The significance of this test is indicated on the last line as a p-value. The
p-value is < .05 in this case, suggesting that Metabolism differs significantly
between the Metabolism1 and Metabolism2 Levels, taking into account that
these Levels were examined in the same individuals. The pchisq() function
converts a χ2 statistic and its degrees of freedom into a p-value; the first
call below reproduces the Friedman p-value given above.
pchisq(26.6679, df=14, lower.tail = F)
[1] 0.02125833
pchisq(5.4, df=1, lower.tail = F)
[1] 0.02013675
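For reference, friedman.test() takes its arguments in the order (y, groups, blocks), and with two groups the statistic has df equal to the number of groups minus one (i.e. df = 1, the form used in the second pchisq() call above). A sketch of the call with the Levels as the groups and the individuals as the blocks:
friedman.test(Table2stacksub$Metabolism, Table2stacksub$Levels,
              Table2stacksub$ID)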
Step 23 Analysis of Covariance: ANCOVA
This analysis uses the data in Table 1. In order to use Type III sums-of-squares as is
done in MINITAB and SPSS, the “car” package must be installed (see step 13 for
installation of the “car” package) and loaded using the library() function. ANCOVAs
can be analyzed using the aov() function. In this example, we will first run the aov()
function and call the resulting model aov1; naming the model simplifies the notation
for the post-hoc Tukey test shown at the end of this step. Next, we will use the
Anova() function on aov1, specifying that we want the analysis to use type III
sums-of-squares (i.e. type = “III”). Remember that the correct contrasts need to be
specified in the options() function in order for the results of Anova() to be correct
(see two-way ANOVA in step 13). In this example, the Levels:Body.Mass interaction
was non-significant (F2,9 = 1.32, P = 0.31), and thus it can be removed from the model. The
ANOVA tables both including (aov1) and excluding (aov2) the Levels:Body.Mass
interaction are presented. The interpretation of the analysis excluding the interaction
is that there is a strong effect of Levels (F2,11 = 46.0, P < 0.0001), but that the effect of
body mass is only a trend assuming a significance threshold of 0.05 (F1,11 = 4.3, P =
0.06).
library(car)
options(contrasts=c("contr.sum", "contr.poly"))
aov1 <- aov(Metabolism ~ Body.Mass + Levels + Levels:Body.Mass, data = Table1)
Anova(aov1, type = "III")
Anova Table (Type III tests)
Response: Metabolism
                   Sum Sq Df F value  Pr(>F)
(Intercept)      0.003923  1  0.2881 0.60446
Body.Mass        0.076857  1  5.6436 0.04152 *
Levels           0.055980  2  2.0553 0.18399
Body.Mass:Levels 0.036039  2  1.3231 0.31351
Residuals        0.122567  9
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
aov2 <- aov(Metabolism ~ Body.Mass + Levels, data = Table1)
Anova(aov2, type = "III")
Anova Table (Type III tests)
Response: Metabolism
             Sum Sq Df F value    Pr(>F)
(Intercept) 0.00093  1  0.0647   0.80395
Body.Mass   0.06147  1  4.2635   0.06334 .
Levels      1.32527  2 45.9567 4.561e-06 ***
Residuals   0.15861 11
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
To perform a post-hoc test on an ANCOVA, the "multcomp" package must be
installed (see step 10 for the installation of the "multcomp" package). The
"multcomp" package must then be loaded using the library() function, and the
glht() function is used. In the example below, the glht() function compares the
Levels using a Tukey test, based on the aov2 model.
library(multcomp)
summary(glht(model = aov2, linfct = mcp(Levels = "Tukey"), data = Table1))
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: aov(formula = Metabolism ~ Body.Mass + Levels, data = Table1)
Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)
2 - 1 == 0 -0.37673    0.07800  -4.830  0.00123 **
3 - 1 == 0 -0.72806    0.07597  -9.584  < 0.001 ***
3 - 2 == 0 -0.35133    0.07846  -4.478  0.00219 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)
Step 37: Correlation matrix
Assuming that you have already imported a data frame called BodyComp into
R, you can assign a new data frame called cors that only contains the organ
weights that you want to compare. R uses [row, column] notation to refer to
specific values within a data frame. For example, if you want to return the
value in the second row and third column of the Table1 data frame, you
would type:
Table1[2, 3]
[1] 22.3
Because we want to have all rows from the BodyComp data frame in the cors
data frame, we leave the rows position in the [row, column] notation blank. In
the column position, we include all the variable names that we want to include
in a vector (i.e. vectors are specified using c() notation). Finally, to display the
correlation matrix using Pearson correlation coefficients, use the cor()
function. The “use” argument within the cor() function allows you to deal
with missing values (for more information on the “use” argument, see the help page by typing ?cor).
cors <- BodyComp[ , c("Liver", "WAT", "Brain", "Skeletal.Muscle", "BAT")]
cor(cors, method = "pearson", use = "pairwise.complete.obs")
A typical matrix might be as follows for a situation where 5 organ weights are
available.
Organ            Liver  WAT    Brain  Skeletal muscle  BAT
Liver            1.00
WAT              0.32   1.00
Brain            0.61   0.11   1.00
Skeletal muscle  0.66   0.17   0.55   1.00
BAT              0.13   0.93   0.03   0.22             1.00
This table highlights that WAT and BAT are highly correlated and hence are not
independent predictor variables. Skeletal muscle is also quite highly correlated with
liver mass. One can proceed with the analysis ignoring these effects, but one should be
aware that such correlations may compromise the outcome. In this case a strong effect
of WAT might emerge because of the effect of BAT on metabolism combined with
the high correlation of WAT with BAT. This analysis requires the number of
observations (i.e., individuals) to exceed the number of predictor variables included
in the analysis by at least a factor of 3. Hence in this situation one would have
5 predictors, so for each group (i.e., treatment levels and control) one would need at
least 15 individuals – and preferably many more. Interpretation of these effects
depends on the complexity of the interactions. The bottom line is to diagnose an
overall treatment effect controlling for these body composition variables. If there is an
overall treatment effect one can establish where this occurs using the multiple range
tests (TUKEY TEST and DUNCAN’S MULTIPLE RANGE TEST). If there are large
numbers of high correlations in the matrix go to step 38, otherwise END.
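Where the matrix flags a strong pairwise correlation (e.g. WAT with BAT above), its significance can be checked with the cor.test() function; a minimal sketch, assuming the BodyComp column names used earlier:
# Test whether the WAT-BAT correlation differs significantly from zero
cor.test(BodyComp$WAT, BodyComp$BAT, method = "pearson")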
Step 38: PRINCIPAL COMPONENTS ANALYSIS.
This analysis will use the BodyComp data set that was used in step 37. In
order to run a PCA in R using prcomp(), individuals without complete data
must be removed using the na.omit() function nested within the prcomp()
function. The argument scale = TRUE is specified because the variances
among the body composition variables vary considerably (the spread within
each variable within the BodyComp data frame can be seen with the
summary() function). The following line of code will give the proportion and
cumulative proportion of the total variance explained.
summary(prcomp(na.omit(BodyComp), scale = TRUE))
                          PC1    PC2     PC3     PC4     PC5     PC6     PC7
Standard deviation     2.7726 1.8520 1.27034 1.02950 0.89347 0.80528 0.74181
Proportion of Variance 0.4522 0.2018 0.09493 0.06235 0.04696 0.03815 0.03237
Cumulative Proportion  0.4522 0.6540 0.74889 0.81123 0.85819 0.89634 0.92871
                           PC8     PC9    PC10    PC11    PC12   PC13    PC14
Standard deviation     0.59503 0.53166 0.42466 0.35356 0.32082 0.2736 0.21518
Proportion of Variance 0.02083 0.01663 0.01061 0.00735 0.00605 0.0044 0.00272
Cumulative Proportion  0.94953 0.96616 0.97677 0.98412 0.99018 0.9946 0.99731
                          PC15   PC16    PC17
Standard deviation     0.15093 0.1366 0.06610
Proportion of Variance 0.00134 0.0011 0.00026
Cumulative Proportion  0.99865 0.9997 1.00000
Eigenvalues for all seventeen components are the squares of the standard
deviations given by the previous line of code and can be output using:
prcomp(na.omit(BodyComp), scale=TRUE)$sd^2
 [1] 7.687505211 3.429847979 1.613766157 1.059868403 0.798288931 0.648470172
 [7] 0.550281167 0.354066373 0.282664344 0.180336227 0.125007493 0.102923245
[13] 0.074873073 0.046300902 0.022779398 0.018651206 0.004369719
To output the principal component loadings associated with all seventeen organ
weights (only the first four components are shown), use the following line of code.
The loadings show the strength and direction of the association between each
original variable and the new component.
prcomp(na.omit(BodyComp), scale=TRUE)
                            PC1         PC2         PC3          PC4
Carcass             -0.33276942  0.10678015 -0.07113286  0.140815809
HEART               -0.23637808 -0.28630188  0.05238709 -0.205892396
LIVER               -0.28312835  0.21517741  0.01690019  0.016342695
KIDNEY              -0.28843254 -0.27045367  0.08891807  0.033666276
BRAIN               -0.13447712 -0.37113185  0.18025001  0.087331685
Brown.Fat            0.23057185  0.11777104  0.48450263 -0.020379466
Abdominal.Fat       -0.22535429  0.33295878  0.14048190 -0.325546388
Gonadal.Fat         -0.26791674  0.25488352  0.11035956 -0.278616530
Mesenteric.Fat      -0.28709897  0.09681311  0.24393549 -0.196235051
Gonads              -0.13919520  0.30207318 -0.14657328  0.451415952
Large.Intestine..g. -0.31351304  0.01358145 -0.09778719 -0.008909068
Small.Intestine..g. -0.14871276 -0.05786933  0.57434774  0.128275550
Stomach             -0.13339934 -0.31818402  0.33869275  0.357140179
Lungs                0.01886844 -0.37180831 -0.11381438 -0.569416313
Pancreas            -0.27587262 -0.12034584 -0.25189547  0.066332253
Pelage              -0.32768510  0.15295340 -0.04247428 -0.054512252
Tail                -0.22887083 -0.27969993 -0.26741094  0.163568995
The signs of the loadings within each principal component are arbitrary; it may
be that all of them need to be multiplied by −1 in order to obtain consistent
results between different statistical programs. Within a given principal
component, variables whose loadings share the same sign are correlated in the
same direction with that component. For example, fifteen of the seventeen
variables are correlated with PC1 in the same direction, suggesting that this
component gives a general indication of body size.
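If the components themselves are to be used in later analyses (e.g. PC1 as a composite body-size index), the per-individual component scores can be extracted from the object returned by prcomp(); a minimal sketch:
pca <- prcomp(na.omit(BodyComp), scale = TRUE)
head(pca$x[, 1:4])           # scores of each individual on PC1-PC4
BodySize <- pca$x[, "PC1"]   # PC1 as a single body-size variable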