Non-linear Regression Analysis with Fitter Software Application Alexey Pomerantsev Semenov Institute of Chemical Physics Russian Chemometrics Society 12.02.02 1 Agenda 1. Introduction 2. TGA Example 3. NLR Basics 4. Multicollinearity 5. Prediction 6. Testing 7. Bayesian Estimation 8. Conclusions 12.02.02 2 1. Introduction 12.02.02 3 Linear and Non-linear Regressions Linear f a1 1 ( x ) a p p ( x ) f f=aa1exp(-20x) 1 ( x) f=exp(-ax) any fa(pa, xp)( x ) Formula Example Source Choice Dimension Soft Interpretation Soft Tools 12.02.02 Hard Close relatives? Multicollinearity Purpose Non Linear any f (a, x) Easy ? Large Difficult ? Small Excess of parameters Well-known Interpolation Many Lack of data Uncommon Extrapolation Few 4 2. Thermo Gravimetric Analysis Example Object Goal PVC Cable Isolation Let’s see it! Service-Life Prediction Experiment Thermo Gravimetric Method Tool Non-Linear Regression and Fitter 12.02.02 5 TGA Experiment and Data TGA Experiment TGA Data 510 Sample 490 0.98 470 0.96 450 430 0.94 410 0.92 390 0.90 HEAT Temperature T , K Plasticizer Chamge in mass, y 1.00 370 0 10 20 30 40 50 Time t , min This is Experiment! Not a Hell of Flame! 12.02.02 6 TGA Example Variables Estimated Measured Intermediate Response y=m/m Change in mass 0 Predictors t Time C Plasticizer concentration Small size problem! Parameters y0 Initial value of y C0 Initial concentration k0 Evaporation rate constant v T0 Heating rate E Activation energy F Sample specific surface Initial temperature Sample specific surface F S 22 R 2 V R -r 12.02.02 7 Plasticizer Evaporation Model Evaporation Law dy -k C, y(0) y 0 dt Diffusion is not relevant! Volume Change The Arrhenius law Temperature growth 12.02.02 C 1- 1- C0 y æ ç ç ç è ö E ÷ k F exp k0 ÷ RT ÷ø T=T0+vt 8 Fitter Worksheet for TGA Example A B C D E F C 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 F 2.4 2.4 2.4 2.4 2.4 2.4 2.4 2.4 2.4 2.4 t 0.00 1.15 2.25 3.40 4.55 5.65 6.75 7.85 9.00 10.10 0.4 0.4 0.4 0.4 0.4 2 2 2 2 2 3E+06 5E+06 8E+06 1E+07 1E+07 G H I J K L M N y 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 f 1.001 1.001 1.001 1.001 1.001 1.001 1.001 1 1 1 Left 1.001 1.001 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 'TGA Desorbtion Model D[y]/D[t]=-k*[1-(1-C0)/y];y(0)=y0 k=F*exp(k0-E/R/T) T=T0+v*t R=1.98717 y0=? k0=? E=? 0.931 0.871 0.821 0.781 0.748 0.915 0.844 0.788 0.746 0.714 Parameters estimation Name Initial Final Deviation y0 1 1.00088 0.0002 k0 10 13.9964 0.24016 10000 18052.3 225.424 E O 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Data v T0 3 373 3 373 3 373 3 373 3 373 3 373 3 373 3 373 3 373 3 373 0 0 0 0 0 12.02.02 293 293 293 293 293 9 Service Life Prediction by TGA Data 12.02.02 10 3. NLR Basics 12.02.02 11 Data and Errors Response y1 y2 . . . . y yN Predictors x 11 x 12 x 21 x 22 . . . . . . . . . . . Weights x 1m x 2m Weight is . . . . a w X . . an effective . . instrument! . . . . x N1 x N2 a1 a2 ap x Nm Absolute error yi f i e i Relative error yi f i (1 e i ) 12.02.02 w1 w2 Parameters wN Fit f1 f2 . . f . fN Weight and variance wi2 cov(e i , e i ) s e2 Const 12 Model f(x,a) 'Цикл "увлажнение-сушка" M=Sor*hev(t1-t)+Des*[hev(t-t1)+imp(t-t1)] 'Кинетика "увлажнения" Sor=Sor1*hev(USESor1)+Sor2*[hev(-USESor1)+imp(-USESor1)] 'Кинетика "сушки" Des=Des1*hev(USEDes1)+Des2*[hev(-USEDes1)+imp(-USEDes1)] 'Условие применимости асимптотик USESor1=Sor2-Sor1 USEDes1=Des1-Des2 'константы и промежуточные величины t3=(t-t1)*hev(t-t1) t4=t*hev(t1-t)+t1*[hev(t-t1)+imp(t-t1)] P2=PI*PI P12=(PI)^(-0.5) R=r*(M1-M0)*exp(-r*t4) K=M1+(M0-M1)*exp(-r*t4) V0=M0-C0 V1=M1-C0 'асимптотика сорбции при 0<t<tau Sor1=C0+4*P12*(d*t)^0.5*[M0-C0+(M1-M0)*beta] beta=1-exp(-z) x=r*t z=(a1*x+a2*x*x+a3*x*x*x)/(1+b1*x+b2*x*x+b3*x*x*x) a1=0.6666539250029 a2=0.0121051017749 a3=0.0099225322428 b1=0.0848006232519 b2=0.0246634591223 b3=0.0017549947958 'кинетика сорбции при tau<t<t1 Sor2=K-8*S1 S1=U01/n0+U11/n1+U21/n2+U31/n3+U41/n4 n0=P2 U01=[(V0*n0*d-V1*r)*exp(-n0*d*t4)+R]/(n0*d-r) n1=P2*9 U11=[(V0*n1*d-V1*r)*exp(-n1*d*t4)+R]/(n1*d-r) n2=P2*25 U21=[(V0*n2*d-V1*r)*exp(-n2*d*t4)+R]/(n2*d-r) n3=P2*49 U31=[(V0*n3*d-V1*r)*exp(-n3*d*t4)+R]/(n3*d-r) n4=P2*81 U41=[(V0*n4*d-V1*r)*exp(-n4*d*t4)+R]/(n4*d-r) 'асимптотика десорбции при t1<t<t1+tau Des1=K*[1-4*P12*(d*t3)^0.5]-8*S1 'кинетика десорбции при t1+tau<t Des2=8*S2 S2=U02/n0+U12/n1+U22/n2+U32/n3+U42/n4 U02=(K-U01)*exp(-n0*d*t3) U12=(K-U11)*exp(-n1*d*t3) U22=(K-U21)*exp(-n2*d*t3) U32=(K-U31)*exp(-n3*d*t3) U42=(K-U41)*exp(-n4*d*t3) 'неизвестные параметры d=? M0=? M1=? C0=? r=? t1=? Different shapes of the same model Explicit model Implicit model Diff. equation y = a + (b – a)*exp(–c*x) Rather d[y]/d[x] = – c (y –a); y(0) = b complex model! 0 = a + (b – a)*exp(–c*x) – y * Presentation at worksheet ' Explicit model y=a+(b-a)*exp(-c*t) a=? b=? c=? 12.02.02 ' Diff. equation d[y]/d[t]=-c*(y-a); y(0)=b a=? b=? c=? 13 Data & Model Prepared for Fitter Predictor A B 1 2 3 4 5 6 7 8 9 10 11 C D Values Response Comment Weight E F G H I J K L M Apply Fitter! BoxBod Data x y w f 0 0 0.00 1 109 1 90.11 2 149 1 142.24 3 149 1 172.41 5 191 1 199.95 7 213 1 209.17 10 224 1 212.91 Parameters a 100 213.80941 b 0.4 0.5472375 'BoxBOD model y=a*[1-exp(-b*x)] a=? b=? y 200 100 0 0 Fitting 4 8 x Equation Parameters 12.02.02 14 Objective Function Q(a) N Sum of squares S (a) å wi2 ( yi - f i ) 2 g i2 i 1 [ ]Q Objective function is a sum Q(a ) Sof (a ) Bsquares (a ) and may be more… Bayesian term B(a ) s02 N 0 (a - b) t H (a - b) Objective function Parameter estimates aˆ arg min Q(a) 12.02.02 Weighted variance estimate S (aˆ ) s Nf 2 Nf N- p 15 Very Important Matrix A Hesse’s matrix Gauss’ approximation Aab 1 ¶ 2 Q (a ) , a , b 1,K, p 2 ¶aa ¶a b A V tV ( X t X in linear regression) Matrix A is the ¶ f ( x , a) V w , a 1,..., p; i 1,.., N cause of¶ a troubles.. Model derivatives ai i i a Covariance matrix C s 2 A-1 F -1 F-matrix F s -2 A C -1 12.02.02 16 Quality of Estimation Covariances Matrix Deviations Vector cov (aˆ, aˆ ) C s 2 A-1 dev (aˆa ) Caa Matrix Acor (is the C aˆ , aˆ ) C C measure of quality! a Correlations Matrix Final Objective Value Error Variance and Number Degrees of Freedom 12.02.02 ab b aa bb Q(aˆ) min Q(a) S (aˆ ) s Nf 2 Nf N- p 17 Search by Gradient Method Q (a ) Q(a n ) bnt (a - a n ) 1 (a - a n ) t A (a - a n ) 2 a n1 a n A bn Matrix A is the key to search! a Search problems Initial point 0 det( A) 0 Local minima 12.02.02 Objective function Q (a ) A A I Parameters a a 7aa6 âa 5 4a3 a2 a1 a0 18 4. Multicollinearity 12.02.02 19 Multicollinearity: View Multicollinearity is degradation of matrix A Objective function Q(a) Spread of eigenvalues æ l max N( A) log 10 ç ç l min è ö ÷ ÷ ø 8 7.2 a1 6.4 5.6 5.6 4.8 4.8 44 3.2 3.2 2.4 1.6 0.8 00 1.2 1.2 a2 2.4 3.6 4.8 4.8 6 7.2 7.2 is a measure of degradation N(A) = 2 1 7 6 5 4 12.02.02 20 Multicollinearity: Source “Hard” multicollinearity “Soft” multicollinearity ( ) y a1 1 - e -a2 x a1a2 x at a2 x << 1 y a1a2 x y aa22xx=0.05 =1.0 =0.5 =0.1 0.05 0.1 0.8 0.4 00 00 12.02.02 55 10 10 x 21 Data & Model Preprocessing ((a + b) + c) + d a + (b + (c + d)) as 1+10 –20 = 1 Representation of a number in computer with 64 bits 1 0 0 1 … 0 1 1 Target is to make Hesse matrix A regular and decrease N(A) M e a n s 12.02.02 Data Scaling Data Centering Model Adjusting X mX X X – X0 a (a) x (x) y (y) 22 Example: The Arrhenius Law Standard form Adjusted form æ E ö k exp ç ÷ è RT ø exp (a1 - a2 X ) Scaling & centering 1000 X - X0 T 1 n 1000 X0 å n i 1 Ti Adjusting a1 ln (k ) - a2 X 0 a2 k 10+11 E 10+4 N(A) = 20 12.02.02 a1 1 E 1000 R a2 10 N(A) = 2 23 Derivative Calculation and Precision N(A) A-1 y=f (a,x) 0=f (y,a,x) dy/dx=f (y,a,x) 6 10 8 6+2=8 10+2=12 8+2=10 8+0=8 12+0=12 10+0=10 8+2=10 10+2=12 12+2=14 14+2=16 10+2=12 12+2=14 1) Numerical calculation of difference derivatives C 2) Auto calculation of analytical derivatives f=exp(-a*t) 12.02.02 df/da=-t*exp(-a*t) 24 5. Prediction 12.02.02 25 Reliable Prediction Estimate of response Confidence limits Linearization yˆ f ( x, aˆ ) Prob{l ( x, P) < f ( x, a) < r ( x, P) } ³ P r ( x , P) f ( x , aˆ ) g ( P) v t Cv Forecast should include uncertainties! 12.02.02 26 Nonlinearity and Simulation * a * ~ N (aˆ, C ) a1* ,K, a*M f 1* ,K, f M r ( P,x) Non-linear models call for special methods of reliable prediction! 12.02.02 27 Prediction: Example Model of rubber aging Y 1 - e -( kt) , a Accelerated aging tests T=383K Y 1.00 T=368K 1.00 Simulation Mean 0.80 0.80 0.60 0.60 0.40 0.40 0.20 0.20 0.00 0.00 12.02.02 48 T=293K Linearization T=353K 24 E RT Upper confidence limits Y 0 k e k0 - 72 time, hr 96 120 0 60 120 180 240 time, day 300 360 28 6. Testing 12.02.02 29 Hypotheses Testing Test statistics x is compared with critical value t (a) From experiment Test don’t prove a model! It just shows that the hypothesis is accepted or rejected! 12.02.02 From theory 30 Lack-of-Fit and Variances Tests These hypotheses are based on variances and they can’t be tested without replicas! 100 Replica 1 6 Variances by replicas Replica 2 50 Lack-of-Fit is a wily test! 4 Variance Response 75 2 25 0 0 0.5 1.5 2.5 3.5 4.5 5.5 Predictor 12.02.02 31 Outlier and Series Tests These hypotheses are based on residuals and they can be tested without replicas Response Positive residual Acceptable deviation 1.2 Series test is very sensitive! 5 Series of signs Positive residuals 6 Negative residuals 11 1 0.8 Negative residual 0.6 Outlier 0.4 0.2 0 5 10 15 Predictor 12.02.02 32 7. Bayesian Estimation 12.02.02 33 Bayesian Estimation How to eat away an elephant? Slice by slice! 12.02.02 34 Posterior and Prior Information. Type I Posterior Information Prior Information Parameter estimates â Prior parameter values b Matrix F Recalculated matrix H Error variance s2 Prior variance value NDF Prior NDF The same error sin N each portion ofN data! f 0 2 0 Objective Function Q ( a ) S ( a ) B (a ) B(a ) s02 (N 0 R(a ) ) R(a ) (a - b) t H (a - b) 12.02.02 35 Posterior and Prior Information. Type II Posterior Information Prior Information Parameter estimates â Prior parameter values b Matrix F Recalculated matrix H Different errors in Q (a ) S (a ) B (a ) each portion of data! æ R (a ) ö B(a ) exp ç R (a ) (a - b ) H (a - b ) ÷ Objective Function t è N ø 12.02.02 36 8. Conclusions Mysterious Nature LR Model NLR Model Thank you! 12.02.02 37