einstein - University of Miami

Statistical design and modeling of experiments with high-tech applications
C. F. Jeff Wu
School of Industrial and Systems Engineering, Georgia Institute of Technology
• A statistical trilogy: data collection, analysis, decision making
• Examples in high-tech applications:
  – nanotechnology
  – cell biology
  – complex system simulations

A Statistical Trilogy
I. Data collection: experimental design, sample surveys.
II. Data modeling (incl. inference): regression, analysis of variance, time series analysis, survival data analysis.
III. Optimization and decision making: decision analysis, Bayesian method.

What’s Next? The High-Tech Revolution
• Availability of massive data: cannot do design of experiments, but can do data mining and data experimentation.
• “The sexy job in the next 10 years will be statisticians,” Google chief economist (NY Times, 2009/8/5).
• Physical experiments replaced by computer experiments (savings in cost and time, more feasible): a definite opportunity.
• Other opportunities abound (nanotechnology, molecular medicine, biotech devices, alternative fuel): unknown territory, tremendous promise.

Statistical Work in Nanotechnology
The nano part is based on two papers:
  – A Statistical Approach to Quantifying the Elastic Deformation of Nanomaterials (X. Deng, V. R. Joseph, W. Mai*, Z. L. Wang*, C. F. J. Wu). Proc. Nat. Acad. Sciences, 106, 11845-50, 2009.
  – Robust optimization of the output voltage of nanogenerators by statistical design of experiments (J. Song*, H. Xie, W. Wu*, V. R. Joseph, C. F. J. Wu, Z. L. Wang*). Nano Research, 3(9), 613-9, 2010.
*School of Materials Science and Engineering, Georgia Tech

A Statistical Approach to Quantifying the Elastic Deformation of Nanomaterials
• Existing method and drawbacks
• A new method: Sequential Profile Adjustment by Regression (SPAR)
• Demonstration on nanobelt data

Introduction
• One-dimensional (1D) nanomaterials: fundamental building blocks for constructing nanodevices and nanosystems.
• It is important to quantify the mechanical properties of 1D nanomaterials, such as the elastic modulus: these properties dictate their applications in nanotechnology.
• A common strategy is to deform a 1D nanostructure using an atomic force microscope (AFM) tip.
[Figure: schematic diagram of AFM]

Method of Experimentation and Modeling
• Mai and Wang (2006, Appl. Phys. Lett.) proposed a new approach to measure the elastic modulus of a ZnO nanobelt (NB).
[Figure: AFM images of a suspended ZnO nanobelt]
• The AFM tip scans along the length of the NB under a constant applied force.
• A series of bending profiles of the same NB is obtained by sequentially changing the magnitude of the contact force.

Free-Free Beam Model
• Mai and Wang (2006) suggested a free-free beam model (FFBM) to quantify the elastic deflection (with free boundary condition).
[Figure: beam schematics with labels F, h, x, L, A, and B]
• The deflection v of the NB at x is determined by [equation not recovered], where E is the elastic modulus, L is the width of the trench, and I is the moment of inertia.
• The FFBM gives a better fit than the clamped-clamped beam model.

FFBM Profiles Example
• The profiles are calculated based on the FFBM. The force F changes from a low of 78 nN to a high of 261 nN.

Profiles of the Nanobelt Experiment
• AFM image profiles of the NB under load forces from a low of 78 nN to a high of 261 nN.
• Initial bias of the nanobelt:
  – The NB is not perfectly straight: initial bending occurs during sample manipulation.
  – The profile curves in the figure are not smooth: this is caused by a small surface roughness (around 1 nm) of the NB.
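The FFBM profiles described above can be illustrated with a rough sketch. The FFBM equation itself is not reproduced in this text, so the code assumes the textbook deflection of a simply supported beam under a point load, v = F x²(L − x)² / (3 E I L), as a stand-in; that formula, the function names, the beam dimensions, and the choice of 15 evenly spaced force levels are all assumptions for illustration, not values taken from the slides. Because the assumed deflection is linear in 1/E, the elastic modulus can be recovered from a set of profiles by ordinary least squares.

```python
import numpy as np


def beam_deflection(x, force, length, modulus, inertia):
    """Deflection at the load point x of a simply supported beam.

    ASSUMPTION: textbook form v = F x^2 (L - x)^2 / (3 E I L), used here
    as a stand-in for the FFBM equation that is missing from the text.
    """
    x = np.asarray(x, dtype=float)
    return force * x**2 * (length - x) ** 2 / (3.0 * modulus * inertia * length)


def estimate_modulus(x, forces, profiles, length, inertia):
    """Least-squares estimate of E from profiles that are linear in 1/E."""
    x = np.asarray(x, dtype=float)
    forces = np.asarray(forces, dtype=float)
    # basis[k, j] is the deflection of profile k at x[j] when E = 1
    basis = forces[:, None] * x**2 * (length - x) ** 2 / (3.0 * inertia * length)
    inv_modulus = np.sum(basis * profiles) / np.sum(basis**2)
    return 1.0 / inv_modulus


# Hypothetical values for illustration only (SI units), not from the slides.
L = 1.0e-6   # trench width, m
E = 1.0e11   # elastic modulus, Pa
I = 5.0e-30  # moment of inertia, m^4
x = np.linspace(0.0, L, 101)
# 78 nN to 261 nN as in the slides; 15 evenly spaced levels is an assumption
forces = np.linspace(78e-9, 261e-9, 15)

# One noise-free profile per force level, then recover E from them
profiles = np.array([beam_deflection(x, F, L, E, I) for F in forces])
E_hat = estimate_modulus(x, forces, profiles, L, I)
```

Under this assumed model the profiles are zero at both ends of the trench, peak at the midpoint, and grow with the applied force, so a profile at a higher force should never lie above one at a lower force.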
MW Method
• Eliminate the initial bias: normalize the profiles by subtracting the first profile (acquired at 78 nN) from the profiles in (a).
• The elastic modulus is estimated by fitting the normalized AFM image profiles using the FFBM (the MW method).

Problem with MW Method
• Subtracting the first profile to normalize the data can result in poor estimation if the first profile behaves poorly.
• Systematic biases can occur during the measurement.
[Figure: profiles labeled by applied force: 131, 144, 157, 170, 183, 209, 222, 235, 248, and 261 nN]
• Inconsistent (order reversal) pattern: the profiles at applied forces of 235, 248, and 261 nN lie above those obtained at the lower forces F = 209 and 222 nN. This pattern persists in the normalized profiles.

Countermeasures
• Experimenters: drop the data (i.e., five belts) that exhibit inconsistency.
  – Loss of data and waste of information.
• Statisticians: keep the data and use statistical modeling to remove the inconsistency.
  – The remaining information in the data can be utilized.

SPAR: A New Method
• The FFBM itself cannot explain the inconsistency.
  – A more general model is required to include other factors besides the initial bias.
• Propose a general model to incorporate the initial bias and other potential systematic biases.
• Use model selection to choose an appropriate model.
• The method is called sequential profile adjustment by regression (SPAR).

Causes of Systematic Biases
• Changes in the boundary conditions:
  – Can be nonlinear and irreversible during the measurement.
  – Can cause occasional stick-slip events.
• Wear and tear of the AFM tip and the nanobelt surface.
• Lateral shifting and sliding, and other artifacts.
• Because of the nano scale, such causes are more acute in nano experiments and can occur at any stage of the experiment.

Model Selected from Deflection Data
[Figures: profiles labeled F11 = 209 nN, F12 = 222 nN, F13 = 235 nN, F14 = 248 nN, and F15 = 261 nN]
• Matching the FFBM better, but the inconsistent pattern persists.
• Inconsistent pattern removed.
• The $\delta_{12}$ term over-corrects and moves the curves down; this is rectified by adding $\delta_{10}$: the curves are moved up and the middle part is smoothed, giving a better match with the FFBM.
• Standard deviation reduced by 50%.

Mechanistic vs. Statistical Modeling
• The error and noise of the experiment are stochastic in nature.
• It is difficult to develop a catch-all mechanistic model.
  – The mechanistic model is deterministic and predictive.
• A purely statistical model lacks predictive power.
• The proposed mechanistic-empirical modeling strategy can be a useful approach.
  – Make the statistical corrections physically meaningful.
  – Improve the estimation of physical parameters.

Understanding Cell Adhesion State Using Hidden Markov Model
C. F. Jeff Wu+ (joint with Y. Hung*, V. Zarnitsyna§, Yijie Wang+, & C.
Zhu§)
+ Georgia Tech, Industrial & Systems Engineering
* Rutgers, the State University of New Jersey
§ Georgia Tech, Biomedical Engineering
Based on NIH-GMS Grant

Cell adhesion
• Motivated by the statistical analysis of biomechanical experiments at Georgia Tech.
• Cell adhesion: binding of a cell to another cell or surface.
  – Mediated by the interaction between cell adhesion proteins (receptors) and the molecules that they bind to (ligands).
• Biologists describe receptor-ligand binding as a key-to-lock type of relation.
• What makes cells sticky? When, how, and to what do cells adhere?
• Why important? It plays an important role in many physiological and pathological processes and in tumor metastasis in cancer studies.

Thermal fluctuation experiment
• It uses reduced thermal fluctuations to indicate the presence of receptor-ligand bonds.
• Objective: identify the association and dissociation points for receptor-ligand bonds.
• Accurate estimation of these points is essential because it is required for precise measurement of bond lifetimes and waiting times, which form the basis for subsequent estimation of the kinetic parameters.

Experimental setting
• A micropipette-aspirated red blood cell with a bead (probe) glued to its apex (left) was aligned against another bead (target) aspirated by another pipette (right). (Developed at Georgia Tech.)
• Driven by a piezoelectric translator, each computer-programmed test cycle consisted of approach, push, retract, hold, and return phases.
• During the holding period, the left pipette was held stationary to allow the probe and the target to contact via thermal fluctuations, thereby providing an opportunity for the receptors and ligands to interact.
• The position of the probe was tracked by image analysis software to produce the data.

Data
• Interested in the thermal fluctuation during the holding period.
• Bond formation is equivalent to adding a molecular spring in parallel to the force transducer spring to stiffen the system: the fluctuation decreases when a receptor-ligand bond forms and resumes when the bond dissociates.
[Figure: data trace annotated “Bond forms” and “Bond dissociates”]

Challenges
• Challenges in identifying the bond association/dissociation points:
  – Points are not directly observable. They can only be detected by variance changes.
  – Observations are not independent. The cell memory effect must be taken into account: the binding probability increases if there is a binding in the immediate past.
  – In practice, the data contain an unknown number of bond types, and each bond type is associated with a different decrease in fluctuation because their spring strengths differ.
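The first two challenges can be made concrete with a toy simulation. The sketch below is illustrative only: it assumes a two-state switching process (0 = no bond, 1 = bond) with a high probability of staying in the current state, and zero-mean Gaussian position noise whose standard deviation is smaller in the bond state. The function names, the stay probabilities, the standard deviations, and the window length are hypothetical and are not taken from the experiment.

```python
import numpy as np


def simulate_fluctuation(n_steps, stay_prob, sigmas, seed=0):
    """Simulate a two-state bond process and the probe position it drives.

    State 0 = no bond, state 1 = bond. stay_prob[s] is the probability of
    remaining in state s at the next step (hypothetical values).
    """
    rng = np.random.default_rng(seed)
    states = np.zeros(n_steps, dtype=int)
    for t in range(1, n_steps):
        prev = states[t - 1]
        stays = rng.random() < stay_prob[prev]
        states[t] = prev if stays else 1 - prev
    # Position noise: the standard deviation depends on the hidden state
    positions = rng.normal(0.0, np.asarray(sigmas, dtype=float)[states])
    return states, positions


def rolling_std(values, window):
    """Standard deviation over each sliding window of the given length."""
    windows = np.lib.stride_tricks.sliding_window_view(values, window)
    return windows.std(axis=1)


states, positions = simulate_fluctuation(
    n_steps=5000, stay_prob=(0.99, 0.98), sigmas=(10.0, 4.0)
)
local_std = rolling_std(positions, window=25)
```

In this simulation the state sequence is known, which it never is in the experiment. Only `positions` would be observed, and a windowed summary such as `local_std` drops when a bond forms and recovers when it dissociates, but it blurs the exact change points.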
Hidden Markov Models (HMM) Framework
• Assume the probe fluctuates with different variances that correspond to different underlying binding states.
• These states, including no bond and a number of distinct types of bonds, are not observable, but the process by which these binding states change can be captured by a Markov chain model.
• Such a Markov chain process can also be used to capture the cell memory effect.

Hidden Markov Model with two states

Transition Probability in HMM
• $a_{ij}$, $i, j \in \{0, 1\}$, denotes the probability of going from state $i$ to state $j$.
• A large $a_{11}$ indicates a memory effect.
• Called “hidden” because the Markov chain transition works underneath the normal distribution $N(\mu_i, \sigma_i^2)$ for state $i$.

Analysis Results for Two States

HMM with three states
• No bond, P-selectin bond, L-selectin bond: P-selectin and L-selectin are different proteins on the cell surface. They play an important role in the transient rolling process of cells.
• It is known that L-selectin has a stiffer spring than P-selectin: $\sigma_L^2 < \sigma_P^2$. This physical knowledge allows us to focus the HMM on the variance change as an indication of a change of bond type.

Thermal fluctuation data: Three states
[Figure: Three States Experiment Data; position x (nm) versus observation]

Estimation for HMM
\[
\begin{pmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{pmatrix}
=
\begin{pmatrix} 0.9499 & 0.0498 & 0 \\ 0.0018 & 0.8953 & 0.1029 \\ 0.0449 & 0.0636 & 0.8915 \end{pmatrix}
\]
• $a_{01} > a_{02}$: no bond (state 0) is more likely to transit to P-bond (state 1) than to L-bond (state 2).
• $a_{12} > a_{10}$: P-bond is more likely to transit to L-bond than to no bond.
• $a_{20} \approx a_{21}$: not much difference.
• Estimates are reported with their statistical significance.

Analysis for three states

Why computer experiments?
Some examples

Statistical Meta-Modeling of Computer Experiments
Uncertainty Quantification

Gaussian process (GP) with quantitative and qualitative factors: Data Center Thermal Distribution

Configuration Variables for Data Center Example
• Five quantitative factors: rack temperature rise, rack power, diffuser angle, diffuser flow rate, ceiling height
• Three qualitative factors: diffuser location, hot-air return-vent location, power allocation

Gaussian Process Models with Quantitative and Qualitative Factors

Summary
• Statistics is not used in some high-tech applications; e.g., Nobel-winning experimental effects (or those published in Science or Nature) are expected to be “obvious”.
• Statistics has made an impact in industrial work, where “incremental” improvement needs statistical tools; it is increasingly popular in high-tech work, where “subtle” effects need to be ascertained.
• Massive online data is the biggest opportunity for statistics, e.g., webpage design and optimization using statistical design of experiments (DOE).
• Statistics has a major role in the study of complex stochastic systems.
