Experimental Design

Why Experimental Design?
Simple Example
The Intuitive Approach
Variable        (-)    0    (+)
Sugar [dl]       1     3     5
Butter [g]      50    125   200
Eggs [number]    1     3     5
Exp   Sugar   Butter   Eggs
 1      1       50       1
 2      5       50       1
 3      1      200       1
 4      5      200       1
 5      1       50       5
 6      5       50       5
 7      1      200       5
 8      5      200       5
 9      3      125       3
10      3      125       3
Exp   Sugar   Butter   Eggs   Taste
 1      1       50       1     2.12
 2      5       50       1     5.98
 3      1      200       1     5.52
 4      5      200       1     8.71
 5      1       50       5     1.42
 6      5       50       5     7.59
 7      1      200       5     5.22
 8      5      200       5     9.87
 9      3      125       3     6.25
10      3      125       3     6.14
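The eight corner runs plus the replicated center point can be generated mechanically. A minimal Python sketch, with factor names and levels taken from the tables above (the run order differs from the worksheet, which varies sugar fastest):

```python
from itertools import product

# Low/high settings from the level table (sugar in dl, butter in g, eggs as count)
levels = {"Sugar": (1, 5), "Butter": (50, 200), "Eggs": (1, 5)}
center = {"Sugar": 3, "Butter": 125, "Eggs": 3}

# 2^3 full factorial: every combination of low and high settings
runs = [dict(zip(levels, combo)) for combo in product(*levels.values())]

# Two replicated center points (runs 9 and 10) to estimate pure error
runs += [dict(center), dict(center)]
```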
Investigation: cake (MLR)
[Coefficient plot: Scaled & Centered Coefficients for Taste, terms S, B, E, S*B, S*E, B*E, shown at Egg = 3. N=10, DF=3, R2=0.994, Q2=0.896, R2 Adj.=0.983, RSD=0.3373, Conf. lev.=0.95]
Design of Experiments

Aim of Study -> Important Factors -> Type of Design

[Figures: designs in three variables x1, x2, x3: Full Factorial Design, Fractional Factorial Design, and Central Composite Circumscribed (CCC) Design]
Can we measure what we want?

Make Model
Find the relationship (equation) between the variables and the response:

    y = f(X) = b0 + Xb + e

Ordinary Least Squares (OLS) / Multiple Linear Regression (MLR):

    b = (X'X)^-1 X'y
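The OLS estimate can be written out directly in numpy. A minimal sketch; the coded design matrix and response values here are made-up illustration data, not the cake experiment:

```python
import numpy as np

# Hypothetical coded design matrix (two factors at -1/+1) and responses
X = np.array([[-1., -1.],
              [ 1., -1.],
              [-1.,  1.],
              [ 1.,  1.]])
y = np.array([2.0, 6.0, 5.5, 8.7])

# Prepend a column of ones so the intercept b0 is estimated too
X1 = np.column_stack([np.ones(len(X)), X])

# b = (X'X)^-1 X'y; np.linalg.lstsq computes the same estimate more stably
b = np.linalg.solve(X1.T @ X1, X1.T @ y)
b_lstsq, *_ = np.linalg.lstsq(X1, y, rcond=None)
```

For an orthogonal two-level design the coefficients are simply half the difference between the average response at the high and low settings of each factor.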
Evaluate Model

Model Evaluation

Investigation: cake (MLR)
[Summary of Fit bar plot for Taste: R2, Q2, Model Validity and Reproducibility on a 0.00-1.00 scale. N=10, DF=3, Cond. no.=1.1180, Y-miss=0]
Important Note #1
Midpoint
Important Note #2
Important Note #3
Correlation between the number of storks and the number of newborn babies in Germany, 1930-1936
How Do We Use Experimental Design?
Design in Scores
Multivariate Projection Methods Used
Data Analysis
PCA
A data matrix X of n observations (timepoints) and k variables (genes).
[Excerpt of the data matrix: genes YAL001C, YAL002W, YAL003W, YAL004W, YAL005C against timepoints alpha0-alpha63]
Explore Data
[Table of food patterns: countries (observations) by food variables]
European Food Preferences

[Score plot t[1]/t[2]: Sweden, Finland, Norway, Denmark, Portugal, Austria, Italy, Spain, Germany, Belgium, Switzerland, Luxembourg, Ireland, Holland, England, France]

[Loading plot p[1]/p[2]: Crisp_Bread, Fro_Fish, Fro_Veg, Gr_Coffee, Margarine, Butter, Olive_Oil, In_Potatoes, Garlic, Youghurt, Tea, Sweetner, Jam, Ti_Soup, Biscuits, Oranges, Ti_Fruit, Apples, Pa_Soup, Inst_Coffee]
[Score plot t[2]/t[3] of the same countries]

[Loading plot p[2]/p[3] of the same food variables]
Example 1
Release experiment
[Protocol: prelabeling with [3H]aspartate and [14C]GABA; incubation with SKF 89976-A; rinsing (5 min, 5 min, 5 min, 30 s)]

Lewin, L., et al. Inhibition of Transporter-Mediated γ-Aminobutyric Acid (GABA) Release by SKF 89976-A, a GABA Uptake Inhibitor, Studied in a Primary Neuronal Culture from Chicken. Neurochemical Research, 17, 1992, 577-584.
Type of design

CCF design in three variables (x1, x2, x3)
Results

Investigation: Neuro (MLR)
[Summary of Fit bar plot: R2, Q2, Model Validity and Reproducibility (0.00-1.00) for GABA and D-aspartate. N=29, DF=18, Cond. no.=6.4693, Y-miss=0]
Investigation: Neuro (MLR)
[Coefficient plot: Scaled & Centered Coefficients for GABA (%), terms SKF, K+, Ca+, T, SKF*SKF, K+*K+, Ca+*Ca+, T*T, SKF*T, K+*Ca+. N=29, DF=18, R2=0.749, Q2=0.413, R2 Adj.=0.610, RSD=1.7786, Conf. lev.=0.95]
Investigation: Neuro (MLR)
[Response surface plot of GABA at K+ = 15.5, Time = 97.5 (MODDE 6.0)]
Uncontrolled variables

The combinatorial explosion

Synthesis: 100 diamines + 200 ketones + 300 carboxylic acids -> 6 000 000 products
HTS/screening -> ID, verification -> new lead compounds & new drugs

The chemical "space"
...if you believe in that model...
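The library size is the plain product of the building-block counts:

```python
# Each product combines one diamine, one ketone and one carboxylic acid,
# chosen independently, so the counts multiply
diamines, ketones, acids = 100, 200, 300
products = diamines * ketones * acids
print(f"{products:,} products")  # 6,000,000 products
```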
Ketone Example

[PCA of ketone descriptors (mol. V, IR, F.W., mp, bp, nd, dens). Score plot t1/t2 spans compounds from 2-Butanone to 10-Nonadecanone; loading plot p[1]/p[2] shows the descriptors]
Interpretation of the PCA result
Multivariate Design
Principal Properties and Factorial Designs
[Reaction scheme: building blocks from three classes A, B and C (with substituents R1, R2, R3) are combined into a core structure with three BB's]
Each building-block class A, B, C is described by its principal-property scores t1A, t2A, t1B, t2B, t1C, t2C. The six score settings are varied with the generators a, b, c and their interactions ab, bc, abc (a standard 2^3 design plus a center point):

Run   a   b   c   ab  bc  abc
 1    -   -   -   +   +   -
 2    +   -   -   -   +   +
 3    -   +   -   -   -   +
 4    +   +   -   +   -   -
 5    -   -   +   +   -   +
 6    +   -   +   -   -   -
 7    -   +   +   -   +   -
 8    +   +   +   +   +   +
 9    0   0   0   0   0   0
Defining the settings: high and low levels are read from the quadrants of the score plot (- +, + +, - -, + -).

Is the design "optimal"? Explain.
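The design columns follow mechanically from the three generators. A sketch, assuming the standard 2^3 run order (the column ordering here is my choice):

```python
import numpy as np
from itertools import product

# Base 2^3 factorial in the generators a, b, c (coded -1/+1)
base = np.array(list(product([-1, 1], repeat=3)))
a, b, c = base.T

# Interaction columns ab, bc, abc supply the remaining three settings
design = np.column_stack([a, b, c, a * b, b * c, a * b * c])

# Run 9: center point, all generators at 0
design = np.vstack([design, np.zeros((1, 6), dtype=int)])
```

Because all six columns are mutually orthogonal, the nine runs spread the building-block combinations evenly over the principal-property space.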
How Does PCA Work?

PCA is a window into a multidimensional space.
PCA decomposes a matrix of descriptors, the X-matrix (N observations by K variables), into scores t and loadings p.

[Diagram: X-matrix decomposed into score vectors t1, t2, t3 and loading vectors p1, p2, with variable means xm]
Graphical View of PCA

Multiple variables (X1, X2, X3, ..., XN) are used to generate principal components that are orthogonal to each other and that represent the variance of the observations in the variable frame.

[Figure: observations plotted in the X1/X2/X3 variable space]
[Figure sequence, in the X1/X2/X3 variable space: the observations are plotted as points; PC1 is fitted along the direction of greatest variance; PC2 is added orthogonal to PC1; the PC1/PC2 plane gives the score plot]
Scores and Loadings

For the first principal component, each observation i gets a score value t_i:

    t_i = sum_{j=1}^{k} p_j x_{i,j}

where p_j is the loading for variable x_j:

    p_j = cos(theta_j)

[Figure: observation (i) projected onto PC1 in the X1/X2 plane; t_i is its coordinate along PC1]
Scores and Loadings

The score vector t shows the similarities/differences between the observations.
The loading vector p shows how important the variables are in forming the scores t.

[Figure: observation (i) and its score t_i along PC1 in the X1/X2 plane]
Model of X

After the first component, the value x_{i,j} can be approximated by

    x̂_{i,j} = t_i p_j

The difference between the predicted x and the observed x is known as the residual e_{i,j}:

    e_{i,j} = x_{i,j} - x̂_{i,j}

[Figure: the residual of observation (i) is its distance from PC1]
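The rank-1 approximation and its residual can be checked numerically. A sketch using SVD on a small made-up centered matrix (t1 is the first score vector, p1 the first loading vector):

```python
import numpy as np

# Toy centered data matrix: rows = observations, columns = variables
X = np.array([[ 2.,  1.,  0.],
              [ 1.,  0., -1.],
              [-1.,  0.,  1.],
              [-2., -1.,  0.]])

# First principal component via SVD
U, s, Vt = np.linalg.svd(X, full_matrices=False)
t1 = U[:, 0] * s[0]   # score vector
p1 = Vt[0]            # loading vector, unit length

X_hat = np.outer(t1, p1)   # x̂_ij = t_i p_j, the rank-1 model of X
E = X - X_hat              # residuals e_ij = x_ij - x̂_ij
```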
Model of X

After the first principal component:

    X = t1 p1^T + E

After A principal components are included in the model:

    X = T P^T + E = t1 p1^T + t2 p2^T + ... + tA pA^T + E
Eigenvalues

t is the eigenvector corresponding to the largest eigenvalue of X X^T.
p is the eigenvector corresponding to the largest eigenvalue of X^T X.
The NIPALS algorithm is one method for finding these eigenvectors.
NIPALS PCA algorithm

1. Choose a start vector t (e.g. a column of X)
2. p = X^T t / (t^T t)
3. p = p / ||p||
4. t = X p / (p^T p)
   Repeat steps 2-4 until ||t_new - t_old|| / ||t_old|| is below a tolerance
5. E = X - t p^T

For additional components, use X = E and return to step 1.
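The five steps above translate almost line-for-line into numpy. A minimal sketch (the function name, starting-column heuristic, and the random test matrix are my additions):

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """Extract principal components one at a time by NIPALS."""
    E = np.asarray(X, dtype=float).copy()
    T, P = [], []
    for _ in range(n_components):
        t = E[:, np.argmax(E.var(axis=0))].copy()  # 1. choose a start t
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)                  # 2. p = X't / t't
            p /= np.linalg.norm(p)                 # 3. normalise p
            t_new = E @ p / (p @ p)                # 4. t = Xp / p'p
            converged = np.linalg.norm(t_new - t) / np.linalg.norm(t_new) < tol
            t = t_new
            if converged:
                break
        E -= np.outer(t, p)                        # 5. deflate: E = X - tp'
        T.append(t)
        P.append(p)
    return np.array(T).T, np.array(P).T

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X -= X.mean(axis=0)        # PCA assumes column-centred data
T, P = nipals_pca(X, 2)
```

Up to sign, the loadings converge to the right singular vectors of X, so the result can be cross-checked against an SVD.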
Other Similar Methods