ASCA - Biosystems Data Analysis Group

ASCA: analysis of multivariate data from an experimental design
Biosystems Data Analysis group
Universiteit van Amsterdam
Contents
• ANOVA
• SCA
• ASCA
• Conclusions
ANOVA
• different design factors contribute to the
variation
For two treatments A and B the total sum of
squares can be split into several contributions
$SS_{total} = SS_A + SS_B + SS_{AB} + SS_{within}$
$x_{cdq} = \mu_q + \alpha_{cq} + \beta_{dq} + (\alpha\beta)_{cdq}$
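The split of the total sum of squares can be checked numerically. The sketch below assumes a balanced design with random placeholder data (3 levels of A, 2 levels of B, 4 replicates per cell); the sizes and variable names are illustrative and not taken from the slides, and the total sum of squares is taken about the grand mean.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical balanced design: x[a, b, r] with 3 levels of A,
# 2 levels of B and 4 replicates per cell (placeholder data).
x = rng.normal(size=(3, 2, 4))
n_a, n_b, n_rep = x.shape

grand = x.mean()                 # grand mean
mean_a = x.mean(axis=(1, 2))     # mean per level of A
mean_b = x.mean(axis=(0, 2))     # mean per level of B
mean_ab = x.mean(axis=2)         # mean per (A, B) cell

# Sums of squares, all relative to the grand mean.
ss_total = ((x - grand) ** 2).sum()
ss_a = n_b * n_rep * ((mean_a - grand) ** 2).sum()
ss_b = n_a * n_rep * ((mean_b - grand) ** 2).sum()
interaction = mean_ab - mean_a[:, None] - mean_b[None, :] + grand
ss_ab = n_rep * (interaction ** 2).sum()
ss_within = ((x - mean_ab[:, :, None]) ** 2).sum()

print(ss_total, ss_a + ss_b + ss_ab + ss_within)
```

For a balanced design the two printed numbers agree up to rounding; with unequal cell sizes the contributions are in general no longer additive in this simple form.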
Example
Experiment: rats are given bromobenzene (BB), which affects the liver
Measurements: NMR spectroscopy of urine
Experimental design:
• Time: 6, 24 and 48 hours
• Groups: 3 doses of BB, vehicle group, control group
• Animals: 3 rats per dose per time point
[Figure: NMR spectrum; x-axis: chemical shift (ppm), from 10 to 0; labelled peaks at 5.38, 3.7525, 3.675, 3.285, 3.0475, 3.0275, 2.93, 2.7175, 2.075 and 2.055 ppm]
NMR Spectroscopy
- Each type of H-atom has a specific chemical shift
- The peak height is proportional to the number of H-atoms at this chemical shift, i.e. to the metabolite concentration
- NMR measures 'concentrations' of different types of H-atoms
[Figure: NMR spectrum; x-axis: chemical shift (ppm), from 10 to 0; labelled peaks at 5.38, 3.7525, 3.675, 3.285, 3.0475, 3.0275, 2.93, 2.7175, 2.075 and 2.055 ppm]
Different contributions
[Figure: metabolite concentration against time, showing how the experimental design contributes to the variation; panels: Time, Dose, Animal, Trajectories]
The Method I: ANOVA
$x_{hki_{hk}} = \mu + \alpha_k + (\alpha\beta)_{hk} + (\alpha\beta\gamma)_{hki_{hk}}$

Symbol          Meaning
k               Time
h               Dose group
i_{hk}          Individual
x_{hki_{hk}}    Data
Estimates of these factors:
$x_{hki_{hk}} = \bar{x}_{...} + (\bar{x}_{..k} - \bar{x}_{...}) + (\bar{x}_{h.k} - \bar{x}_{..k}) + (x_{hki_{hk}} - \bar{x}_{h.k})$
Constraints:
$\sum_k \alpha_k = 0$
$\sum_h (\alpha\beta)_{hk} = 0$
$\sum_{i_{hk}} (\alpha\beta\gamma)_{hki_{hk}} = 0$
[Figure: each of the three contributions plotted against time]

The Method II
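ANOVA is a univariate technique
$x_{hki_{hk}} = \mu + \alpha_k + (\alpha\beta)_{hk} + (\alpha\beta\gamma)_{hki_{hk}}$
[Figure: NMR spectrum with labelled peaks]
$x_{hki_{hk}} \rightarrow \mathbf{x} \rightarrow X$, for all values in the ANOVA equation, e.g.: $\alpha_k \rightarrow X_\alpha$
MATRICES: Structured!
$X = \mathbf{1}m^T + X_\alpha + X_{\alpha\beta} + X_{\alpha\beta\gamma}$
$\|X\|^2 = \|\mathbf{1}m^T\|^2 + \|X_\alpha\|^2 + \|X_{\alpha\beta}\|^2 + \|X_{\alpha\beta\gamma}\|^2$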
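The matrix form of the decomposition and the split of the squared norm can be sketched as follows. The data are random placeholders, the design is assumed balanced, and the label arrays `dose` and `time` as well as the helper `ss` are hypothetical names introduced for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_dose, n_time, n_ind, n_var = 5, 3, 3, 8
# Design labels, one entry per row (spectrum) of X.
dose = np.repeat(np.arange(n_dose), n_time * n_ind)
time = np.tile(np.repeat(np.arange(n_time), n_ind), n_dose)
X = rng.normal(size=(dose.size, n_var))      # placeholder data

M = np.tile(X.mean(axis=0), (X.shape[0], 1))  # 1 m^T: overall mean in every row

time_mean = np.zeros_like(X)   # mean spectrum per time point
cell_mean = np.zeros_like(X)   # mean spectrum per (dose, time) cell
for k in range(n_time):
    time_mean[time == k] = X[time == k].mean(axis=0)
    for h in range(n_dose):
        cell = (time == k) & (dose == h)
        cell_mean[cell] = X[cell].mean(axis=0)

X_a = time_mean - M            # time effect
X_ab = cell_mean - time_mean   # dose effect within time
X_abg = X - cell_mean          # individual effect within dose and time

def ss(A):
    """Sum of squared elements (squared Frobenius norm)."""
    return (A ** 2).sum()

print(ss(X), ss(M) + ss(X_a) + ss(X_ab) + ss(X_abg))
```

Because the design is balanced, the four matrices are mutually orthogonal (for example `X_a.T @ X_ab` is zero up to rounding), which is why the squared norms add up.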
Multivariate Data
[Figure: NMR spectrum; x-axis: chemical shift (ppm), from 10 to 0, with labelled peaks]
Or: relationship between the columns of X
[Figure: scatter plot of two columns of X, the variables at 2.05 ppm and 6.01 ppm]
Covariance between the variables
The Method III: Principal Component Analysis
$X = \mathbf{1}m^T + TP^T + E$
T: scores
P: loadings
E: residuals
[Figure: data points in the space of x1, x2 and x3 with the directions Loading PC 1 and Loading PC 2, next to a scores plot of PC 2 against PC 1]
3D → 2D … Imagine!
350D → 2D !!!
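A PCA model of this form can be computed with a singular value decomposition. The sketch below uses random placeholder data and keeps two components; the matrix size and the choice of two components are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(45, 8))   # placeholder data: 45 samples, 8 variables

m = X.mean(axis=0)             # column means
Xc = X - m                     # X - 1 m^T

# Singular value decomposition of the centred data.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

n_pc = 2
T = U[:, :n_pc] * s[:n_pc]     # scores, one row per sample
P = Vt[:n_pc].T                # loadings, one column per component
E = Xc - T @ P.T               # residuals not described by the model

print(T.shape, P.shape, E.shape)
```

Each sample is now described by two scores instead of eight variables; the same projection works when the number of variables is in the hundreds.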
The Method IV: ANOVA and PCA → ASCA
$X = \mathbf{1}m^T + X_\alpha + X_{\alpha\beta} + X_{\alpha\beta\gamma}$
Column spaces are orthogonal
$X = \mathbf{1}m^T + T_\alpha P_\alpha^T + T_{\alpha\beta} P_{\alpha\beta}^T + T_{\alpha\beta\gamma} P_{\alpha\beta\gamma}^T + E$
E: parts of the data not explained by the component models
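Putting both steps together, ASCA can be sketched as an ANOVA-style split of X followed by a separate PCA of each part. This is an illustration under stated assumptions, not the group's implementation: the data are random placeholders, the design is balanced, two components per submodel are chosen arbitrarily, and the helpers `pca` and `group_means` are hypothetical names.

```python
import numpy as np

def pca(A, n_pc):
    """Scores, loadings and residuals of a rank-n_pc model of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    T = U[:, :n_pc] * s[:n_pc]
    P = Vt[:n_pc].T
    return T, P, A - T @ P.T

def group_means(X, labels):
    """Replace each row of X by the mean of the rows sharing its label."""
    out = np.zeros_like(X)
    for g in np.unique(labels):
        out[labels == g] = X[labels == g].mean(axis=0)
    return out

rng = np.random.default_rng(4)
n_dose, n_time, n_ind, n_var = 5, 3, 3, 8
dose = np.repeat(np.arange(n_dose), n_time * n_ind)
time = np.tile(np.repeat(np.arange(n_time), n_ind), n_dose)
X = rng.normal(size=(dose.size, n_var))      # placeholder data

# Step 1: ANOVA-style split into structured matrices.
M = group_means(X, np.zeros(dose.size, dtype=int))   # 1 m^T
time_mean = group_means(X, time)
cell_mean = group_means(X, time * n_dose + dose)     # one label per cell
parts = {"a": time_mean - M,
         "ab": cell_mean - time_mean,
         "abg": X - cell_mean}

# Step 2: one PCA per submatrix; the residuals are collected in E.
model, E = {}, np.zeros_like(X)
for name, X_part in parts.items():
    T, P, resid = pca(X_part, n_pc=2)
    model[name] = (T, P)
    E += resid

X_hat = M + sum(T @ P.T for T, P in model.values())
print(np.allclose(X, X_hat + E))
```

In this sketch the scores of the α submodel are identical for all samples measured at the same time point, because the rows of the time-effect matrix are identical within a time point.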
In Words:
• ASCA models the different contributions
to the variation in the data
• ASCA takes the covariance between the
variables into account
• ASCA gives a solution for the problem
at hand.
Results I
[Figure: scores of the X_αβ submodel (αβ-scores) against time (6, 24 and 48 hours) for the control, vehicle, low, medium and high groups; labels: X_α, X_αβ, X_αβγ; annotation: 40 %]
Results II
• Quantitative effect!
• No effect of vehicle
• Scores are in agreement with visual inspection
[Figure: scores against time (6, 24 and 48 hours) for the control, vehicle, low, medium and high groups]
Results III → biomarkers
[Figure: three panels labelled α, αβ and αβγ; x-axis: chemical shift (ppm), from 10 to 0; labelled peaks across the panels at 5.38, 3.9675, 3.8875, 3.7525, 3.73, 3.675, 3.285, 3.2625, 3.0475, 3.0275, 2.93, 2.91, 2.735, 2.6975, 2.5825, 2.5425, 2.075 and 2.055 ppm]
Annotations in the figure:
• Unique to the α submodel
• Differences between submodels
• Interesting for Biology
• Interesting for Diagnostics
Conclusions
• Metabolomics (and other –omics)
techniques give multivariate datasets
with an underlying experimental design
• For this type of data, ASCA can be used
• The results observed for this experiment
are in accordance with clinical
observations
• The metabolites that are responsible for
this variation can be found using ASCA
→ BIOMARKERS
Discussion
1. How can I perform statistics on the
ASCA model? (e.g. Significance
testing)
2. Are there other constraints possible for
this model? (e.g. stochastic
independence)
3. Are there alternative methods for
solving this problem?