Experimental Design

Why Experimental Design? A Simple Example: The Intuitive Approach
Baking a cake from three ingredients: sugar, butter and eggs.
Figure: the experimental region spanned by SUGAR, BUTTER and EGG, from the [- - -] corner to the [+ + +] corner.

Variable settings:

  Variable        (-)     0    (+)
  Sugar [dl]        1     3      5
  Butter [g]       50   125    200
  Eggs [number]     1     3      5

Design (a 2^3 full factorial, runs 1-8, plus two replicated center points, runs 9-10) with the measured response:

  Run   Sugar  Butter  Eggs  Taste
   1      1      50     1     2.1
   2      5      50     1     5.98
   3      1     200     1     5.52
   4      5     200     1     8.7
   5      1      50     5     1.42
   6      5      50     5     7.59
   7      1     200     5     5.22
   8      5     200     5     9.87
   9      3     125     3     6.25
  10      3     125     3     6.1

Investigation: cake (MLR)
Figure: scaled and centered coefficients for Taste (terms S, B, E, S*B, S*E, B*E; Egg = 3). N=10, DF=3, R2=0.994, Q2=0.896, R2 adj.=0.983, RSD=0.3373, conf. lev.=0.95.

Design of Experiments
Aim of study -> important factors -> type of design.

Types of design in three variables (x1, x2, x3):
- Full factorial design
- Fractional factorial design
- Central composite circumscribed (CCC) design

Can we measure what we want?

Make Model
Find the relationship (equation) between the variables and the response:

    y = f(X) = Xb + e

with the constant term included as a column of ones in X. The coefficients are estimated by ordinary least squares (OLS) / multiple linear regression (MLR):

    b = (X'X)^-1 X'y

Evaluate Model
Investigation: cake (MLR), Summary of Fit.
Figure: bar chart (scale 0.00-1.00) of R2, Q2, model validity and reproducibility for Taste. N=10, DF=3, Cond. no.=1.1180, Y-miss=0.

Important Note #1: Midpoint (replicated center points).
Important Note #2
Important Note #3: correlation does not imply causation - compare the correlation between the number of storks and the number of newborn babies in Germany between 1930 and 1936.

How Do We Use Experimental Design? Design in Scores

Multivariate Projection Methods
Data analysis with PCA: a data matrix X of n observations (timepoints) and k variables (genes).
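The cake model above can be reproduced with a short least-squares sketch. This is an illustrative reconstruction, not the original MODDE analysis: the coded -1/0/+1 factor levels (matching the slide's "scaled & centered" convention) and the choice of a main-effects-plus-two-factor-interactions model are my own assumptions. It estimates b = (X'X)^-1 X'y directly from the normal equations:

```python
import numpy as np

# Cake design in coded levels (-1 = low, 0 = center, +1 = high) and taste response
runs = np.array([
    [-1, -1, -1], [ 1, -1, -1], [-1,  1, -1], [ 1,  1, -1],
    [-1, -1,  1], [ 1, -1,  1], [-1,  1,  1], [ 1,  1,  1],
    [ 0,  0,  0], [ 0,  0,  0],
], dtype=float)
taste = np.array([2.1, 5.98, 5.52, 8.7, 1.42, 7.59, 5.22, 9.87, 6.25, 6.1])

# Model matrix: constant, main effects S, B, E, and two-factor interactions
S, B, E = runs.T
X = np.column_stack([np.ones(len(runs)), S, B, E, S * B, S * E, B * E])

# OLS / MLR: b = (X'X)^-1 X'y, computed by solving the normal equations
b = np.linalg.solve(X.T @ X, X.T @ taste)

# Goodness of fit: R2 = 1 - RSS/TSS
residuals = taste - X @ b
r2 = 1 - residuals @ residuals / np.sum((taste - taste.mean()) ** 2)
print(dict(zip(["const", "S", "B", "E", "S*B", "S*E", "B*E"], b.round(3))))
print("R2 =", round(r2, 3))
```

Because the coded design columns are mutually orthogonal, each coefficient reduces to the corresponding factorial contrast divided by 8; sugar comes out as the dominant effect, and R2 lands close to the value reported on the slide.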
Figure: excerpt of a yeast gene-expression data matrix - genes YAL001C-YAL005C (rows) by alpha-factor timepoints alpha0-alpha63 (columns), with expression log-ratios as values.

Explore Data
European Food Preferences: a table of food-consumption patterns, with countries as observations and foods as variables.

Figure: PCA score plot t[1]/t[2] (countries, e.g. Sweden, Finland, Norway and Denmark grouping together, with Portugal, Spain and Italy elsewhere) and loading plot p[1]/p[2] (foods such as crisp bread, frozen fish, frozen vegetables, ground coffee, margarine, butter, olive oil, garlic, yoghurt, instant potatoes, tea, sweetener, jam, tinned soup, biscuits, oranges, tinned fruit, apples, packet soup and instant coffee).

Figure: score plot t[2]/t[3] and loading plot p[2]/p[3] for the same data; England and Ireland stand apart in t[3], with tea and jam among the foods on the same side of the loading plot.

Example 1
Release experiment: prelabeling with [3H]aspartate and [14C]GABA, incubation with SKF 89976-A, rinsing, and release (step times of 5 min, 5 min, 5 min and 30 s).

Lewin, L., et al., Inhibition of Transporter Mediated γ-Aminobutyric Acid (GABA) Release by SKF 89976-A, a GABA Uptake Inhibitor, Studied in a Primary Neuronal Culture from Chicken. Neurochemical Research, 17, 1992, 577-584.

Type of design: a CCF design in three variables (x1, x2, x3).

Results
Investigation: Neuro (MLR), Summary of Fit.
Figure: bar chart (scale 0.00-1.00) of R2, Q2, model validity and reproducibility for the responses GABA and D-aspartate. N=29, DF=18, Cond.
no.=6.4693, Y-miss=0.

Investigation: Neuro (MLR)
Figure: scaled and centered coefficients for GABA (%), with terms SKF, K+, Ca+, T, SKF*SKF, K+*K+, Ca+*Ca+, T*T, SKF*T and K+*Ca+. N=29, DF=18, R2=0.749, Q2=0.413, R2 adj.=0.610, RSD=1.7786, conf. lev.=0.95.

Investigation: Neuro (MLR)
Figure: response surface plot of GABA at K+ = 15.5 and Time = 97.5 (MODDE 6.0).

Uncontrolled variables

The Combinatorial Explosion
Synthesis from 100 diamines, 200 ketones and 300 carboxylic acids gives 6,000,000 possible products for HTS/screening, identification and verification - and, eventually, new lead compounds and new drugs.

The chemical "space"
...if you believe in that model...

Ketone Example
Figure: PCA scores (t1/t2) and loadings (p1/p2) for a series of ketones from 2-butanone to 10-nonadecanone, described by properties such as molecular volume, IR data, formula weight (F.W.), melting point (mp), boiling point (bp), refractive index (nd) and density.

Interpretation of the PCA result

Multivariate Design
Principal properties and factorial designs.
Figure: reaction scheme showing a core structure assembled from three building blocks (a carboxylic acid, an amine R1R2NH and an R3 component).

Core structure with three building blocks (BB's): A, B and C, each characterized by its principal-property scores (t1A, t2A, t1B, t2B, t1C, t2C).

Design in the score variables, generators a, b, c, ab, bc, abc (run 9 is the center point):

  Run   t1A(a)  t2A(b)  t1B(c)  t2B(ab)  t1C(bc)  t2C(abc)
   1      -       -       -        +        +         -
   2      +       -       -        -        +         +
   3      -       +       -        -        -         +
   4      +       +       -        +        -         -
   5      -       -       +        +        -         +
   6      +       -       +        -        -         -
   7      -       +       +        -        +         -
   8      +       +       +        +        +         +
   9      0       0       0        0        0         0

Defining the settings: the high and low levels (- -, - +, + -, + +) are chosen in the score plot.

Is the design "optimal"? Explain?

How Does PCA Work?
PCA is a window into a multidimensional space. PCA decomposes a matrix of descriptors, the X-matrix (N observations by K variables), into scores t and loadings p.
Figure: the N x K X-matrix decomposed into score vectors t1, t2, t3 and corresponding loading vectors p1, p2, ...

Graphical View of PCA
Multiple variables (X1, X2, X3, . . .
XN) are used to generate principal components that are orthogonal to each other and that represent the variance of the observations in the variable frame.

Figure sequence: observations plotted in the space of the variables X1, X2, X3; the first principal component PC1 is fitted through the data, then PC2; projecting the observations onto the PC1/PC2 plane gives the score plot.

Scores and Loadings
For the first principal component, each observation i gets a score value t:

    ti = sum_{j=1..k} pj * xi,j

where pj is the loading for variable xj; pj = cos(theta_j), the cosine of the angle between PC1 and the xj axis.

The score vector t shows the similarities/differences of the observations. The loading vector p shows how important the variables are in forming the scores t.

Model of X
After the first component, the value xi,j can be approximated by

    x̂i,j = ti * pj

The difference between the predicted x and the observed x is known as the residual ei,j:

    ei,j = xi,j - x̂i,j

In matrix form, after the first principal component:

    X = t1 p1^T + E

After A principal components are included in the model:

    X = T P^T + E = t1 p1^T + t2 p2^T + ... + tA pA^T + E

Eigenvalues
t is the eigenvector corresponding to the largest eigenvalue of X X^T; p is the eigenvector corresponding to the largest eigenvalue of X^T X. The NIPALS algorithm is one method for finding these eigenvectors.

NIPALS PCA algorithm
1. Choose a start vector t (e.g. a column of X)
2. p = X^T t / (t^T t)
3. p = p / ||p||
4. t = X p / (p^T p)
Iterate steps 2-4 until convergence of (t_new - t_old) / t_old
5. E = X - t p^T
For additional components, use X = E and return to step 1.

Other Similar Methods
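The NIPALS steps above translate almost line for line into code. The sketch below is a minimal illustrative implementation (the function name, the variance-based choice of start column and the exact convergence test are my own choices), assuming X has already been mean-centered:

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """NIPALS PCA on a mean-centered matrix X (n observations x k variables).

    Returns scores T (n x A) and loadings P (k x A) with X ~ T @ P.T + E."""
    X = X.copy()                                  # work on a deflatable copy
    n, k = X.shape
    T = np.zeros((n, n_components))
    P = np.zeros((k, n_components))
    for a in range(n_components):
        # 1. Start t as the column of X with the largest variance
        t = X[:, np.argmax(X.var(axis=0))].copy()
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)                 # 2. loadings from current scores
            p /= np.linalg.norm(p)                # 3. normalize loadings
            t_new = X @ p                         # 4. scores from loadings (p'p = 1)
            if np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new):
                t = t_new
                break
            t = t_new
        T[:, a], P[:, a] = t, p
        X -= np.outer(t, p)                       # 5. deflate: E = X - t p'
    return T, P

# Demo: extract two components from a small centered random matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X -= X.mean(axis=0)
T, P = nipals_pca(X, 2)
```

Because each score vector converges to a dominant singular direction of the (deflated) matrix, the norm of T[:, a] equals the a-th singular value of X, which gives a simple cross-check against an SVD of the same data.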