Molecular Weight Determination of Unknown Proteins
for NASA/JPL PAIR Program
August 24, 2001
Barbara Falkowski, Falgun Patel, Celia Smith

The Overall Goal
• To determine the molecular weights of unknown proteins from electrophoresis data

Method to Achieve the Goal
• Measure band distances for the standards and unknowns with Photoshop and SpotViewer.
• Decide whether SpotViewer or Photoshop is the better measuring tool.
• Run models on standard proteins.
• Decide which model(s) work best for the standards.
• Run model(s) on unknown proteins.
• Decide which model(s) worked best on the unknowns.

SpotViewer Disadvantages
• Did not measure dye-front distance.
    • One needed to go into Photoshop to mark or crop the dye-front distance.
• SpotViewer missed bands.
    • Did not always pick up bands that were thin, blurry or close together.
• Sometimes gave two measurement values to one band.
    • Or gave values that were not associated with any band.
• Did not pick up very light bands.

Photoshop Advantages
• Did not need assistance from another program.
• Not as time consuming.
• Light bands could be more easily discerned through color inversion/manipulation of the image.
    • This also worked well with tightly packed, thin and blurred bands.

Models Tested
• Quadratic Regression
• Quadratic Cross Validation
• SLIC
• Log-Linear Model
• Log-Log Model
• Local Linear Model
• Quadratic Interpolation

Gels/Protein Used
• Vitelline Envelopes (VE) for two species (Strongylocentrotus purpuratus and Lytechinus pictus)
• Vitelline Envelopes for two methods (DTT and mechanically isolated)

Which model worked the best?
• No single model was best for all of the gels.
• It was found that different models worked better for different gels.
    • Quadratic Regression Model: 15% Gel (Gel #1 S. purp/L. pictus VE DTT Removal)
    • SLIC Model: Gradient Gel (Gel #2 Jelly + Seminal Plasma + VE Time Courses)
    • LOG-LOG Model: 12.5% Gels (Gel #4 VE + Tris Supernatant Time Course and Gel #6 VE + Tris Pellet Time Course)

Why was the Quadratic Model chosen for Gel #1?

Gel #1, 15% Gel, S. purp/L. pictus VE DTT Removal, Lane 2

            SLIC                             Quadratic Regression
            Residuals    Residuals Squared   Residuals    Residuals Squared
Band 1      0.01132444   0.00012824          0.03853645   0.00148506
Band 2      0.00300890   0.00000905          0.00309766   0.00000960
Band 3      0.00109427   0.00000120          0.02988180   0.00089292
Band 4      0.00827156   0.00006842          0.05186572   0.00269005
Band 5      0.00688821   0.00004745          0.04430756   0.00196316
Band 6      0.00783585   0.00006140          0.02173100   0.00047224
Band 7      0.00975655   0.00009519          0.01972984   0.00038927
Sums        0.00000000   0.00041095          0.00000000   0.00790229
R-Squared   0.92768402                       0.98858508

            LOG-LOG                          Quadratic Interpolation
            Residuals    Residuals Squared   Residuals    Residuals Squared
Band 1      0.00026347   0.00000007          0.10070348   0.01014119
Band 2      0.01672254   0.00027964          0.01238500   0.00015339
Band 3      0.00637001   0.00004058          0.00503027   0.00002530
Band 4      0.03252705   0.00105801          0.01377812   0.00018984
Band 5      0.03261521   0.00106375          0.08413179   0.00707816
Band 6      0.00057092   0.00000033          0.03163810   0.00100097
Band 7      0.01074814   0.00011552          0.12022332   0.01445365
Sums        0.00000000   0.00255790          0.09314636   0.00867624
R-Squared   0.91630500

            LOG Linear                       Local Linear
            Residuals    Residuals Squared   Residuals    Residuals Squared
Band 1      0.06289249   0.00395547          0.02458821   0.00060458
Band 2      0.10387439   0.01078989          0.02458821   0.00060458
Band 3      0.07983165   0.00637309          0.00236728   0.00000560
Band 4      0.03433984   0.00117922          0.01533503   0.00023516
Band 5      0.04476902   0.00200427          0.03416853   0.00116749
Band 6      0.02631810   0.00069264          0.02429362   0.00059018
Band 7      0.08820824   0.00778069          0.03316659   0.00110002
Sums        0.21118107   0.03277527          0.15850747   0.02512462
R-Squared   0.955670994                      0.97855752

• Took a quadratic regression of the standards to find the intercept and coefficients.
• Used the intercept and coefficients in the equation: LOG MW = a*RM^2 + b*RM + c
    • With the Gel #1 coefficients below: LOG MW = 1.017416*RM^2 - 2.13847*RM + 5.445671

Sea Urchins
    Intercept                      5.072481
    PS RM / Square RM Average     -1.10049

Gel #1, 15% Gel, S. purp/L. pictus VE DTT Removal, Lane 2

          Average   Square RM   Molecular   Log Molecular   Predicted Log                  Residuals
          RM        Average     Weight      Weight          Molecular Weight   Residuals   Squared
Band 1    0.09      0.008       200000      5.30            5.26               0.04        0.001
Band 2    0.19      0.038       116500      5.07            5.07               0.00        0.000
Band 3    0.22      0.050       97000       4.99            5.02               0.03        0.001
Band 4    0.32      0.100       66000       4.82            4.87               0.05        0.003
Band 5    0.52      0.270       45000       4.65            4.61               0.04        0.002
Band 6    0.67      0.449       31000       4.49            4.47               0.02        0.000
Band 7    0.88      0.772       21500       4.33            4.35               0.02        0.000
Sums                                                                           0.00        0.008

Coefficients
    Intercept             5.445671
    Average RM           -2.13847
    Square RM Average     1.017416

Regression Statistics
    Multiple R            0.994276
    R Square              0.988585
    Adjusted R Square     0.982878
    Standard Error        0.044447

• Put the relative mobility of the unknowns into the equation to come up with the following results:

Log Molecular Weight Results for 15% Gel (S. Purp and L. Pict); all values are PREDLOGMW

          Lane 1    Lane 3    Lane 4    Lane 5    Lane 6    Lane 7    Lane 8    Lane 9
Band 1    5.287194  5.366035  5.337433  5.337433  5.370153  5.369856  5.325993  5.345953
Band 2    4.959881  5.325296  5.250127  5.313233  5.211740  5.226168  5.243535  5.067651
Band 3    4.751998  5.234675  5.122917  4.539243  5.031963  5.188135  5.061425  4.663953
Band 4    4.624020  5.185351  5.038765            4.965727  5.047612  4.703594  4.562605
Band 5    4.513194  5.105001  4.754128            4.930680  4.990248  4.639752  4.477437
Band 6    4.427591  5.066297  4.640942            4.863528  4.944781  4.533252  4.439041
Band 7    4.399273  4.927542  4.526304            4.837187  4.855694  4.467592  4.366838
Band 8    4.362973  4.872454  4.362123            4.791981  4.795473  4.427966  4.343091
Band 9              4.808690                      4.764781  4.741470
Band 10             4.775564                      4.728064  4.715694
Band 11             4.735798                      4.702809  4.690736
Band 12             4.712814                      4.673574  4.659512
Band 13             4.652414                      4.645507  4.631915
Band 14             4.605594                      4.603454  4.614171
Band 15             4.568308                      4.568308  4.599074
Band 16             4.499914                      4.512002  4.520488
Band 17             4.364580                      4.451155  4.456455
Band 18             4.356674                      4.422551  4.423228
Band 19             4.337822                      4.343295  4.329330
Band 20             4.326006                      4.333159  4.334527

What type of Cross Validation was done?
• Quadratic Cross Validation using relative mobility and Log Molecular Weight
• Cross Validation was not chosen at all.
    • The predicted value for the missing band was not close to the actual value in any of the gel cases.

Results for Cross Validation Model on Standards
Gels: Gel #1 15% Gel S. purp/L. pictus VE DTT Removal; Gel #4 VE and Tris Supernatant Time Course; Gel #6 VE and Tris Pellet Time Course; Gel #2 Seminal and Jelly

Band   Square Residuals (one value per gel that has the band)   Sums
1      0.050049   0.057916   0.043083   0.051572                0.20262
2      0.10454    0.063949   0.091593   0.114309                0.374392
3      0.139053   0.15307    0.155971   0.172042                0.620136
4      0.274772   0.315013   0.337997   0.25786                 1.185643
5      0.647048   0.597523   0.601238   0.534674                2.380482
6      0.860863   0.93935    0.91457    0.923428                3.638211
7      1.219472   1.159049   1.41531                            3.793831
8      1.774662                                                 1.774662
9      1.370537                                                 1.370537

Why was the SLIC Model chosen for the Gradient Gel #2?
• Residual Sum = 0.00
• Residual Squared Sum = 0.00
• Largest R^2 = 0.99

Why was the SLIC Model chosen for Gel #2?

Gel #2 Seminal and Jelly and VE Time Courses, Lane 1

            SLIC                             Quadratic Regression
            Residuals    Residuals Squared   Residuals    Residuals Squared
Band 1      0.00338646   0.00001147          0.04133391   0.00170849
Band 2      0.00255562   0.00000653          0.00296709   0.00000880
Band 3      0.00108465   0.00000118          0.01503750   0.00022613
Band 4      0.00613833   0.00003768          0.08140173   0.00662624
Band 5      0.00513082   0.00002633          0.02365412   0.00055952
Band 6      0.00211144   0.00000446          0.04925389   0.00242595
Band 7      0.00387409   0.00001501          0.10904385   0.01189056
Band 8      0.00788489   0.00006217          0.07433042   0.00552501
Band 9      0.00540511   0.00002922          0.15090162   0.02277130
Sums        0.00000000   0.00019403          0.00000000   0.05174200
R-Squared   0.98849460                       0.97175674

            LOG-LOG                          Local Quadratic
            Residuals    Residuals Squared   Residuals    Residuals Squared
Band 1      0.02354931   0.00055457          0.13307584   0.01770918
Band 2      0.03054624   0.00093307          0.02239282   0.00050144
Band 3      0.02199111   0.00048361          0.02827159   0.00079928
Band 4      0.06126702   0.00375365          0.05660829   0.00320450
Band 5      0.02770156   0.00076738          0.10757500   0.01157238
Band 6      0.09272466   0.00859786          0.01117711   0.00012493
Band 7      0.11599313   0.01345441          0.01824890   0.00033302
Band 8      0.06878414   0.00473126          0.12395746   0.01536545
Band 9      0.21494843   0.04620283          0.23603719   0.05571356
Sums        0.00000000   0.07947863          0.10726760   0.10532374
R-Squared   0.95589400

            LOG Linear                       Local Linear
            Residuals    Residuals Squared   Residuals    Residuals Squared
Band 1      1.61764360   2.61677082          0.03770193   0.00142144
Band 2      1.55422217   2.41560655          0.00384170   0.00001476
Band 3      1.51345930   2.29055906          0.01173907   0.00013781
Band 4      1.38880849   1.92878902          0.02347037   0.00055086
Band 5      1.28979944   1.66358258          0.02104061   0.00044271
Band 6      1.17819529   1.38814415          0.01748156   0.00030560
Band 7      1.05505930   1.11315014          0.01392608   0.00019394
Band 8      0.90411208   0.81741865          0.01381797   0.00019094
Band 9      0.56946702   0.32429269          0.01381797   0.00019094
Sums        11.07076670  14.55831368         0.15683723   0.00344898
R-Squared   0.96081131                       0.9608

Compare Values
• SLIC Type Models: Log( LN(MW) ) = A + B * LN( -LN(RM) )
• Compare Log Molecular Weight
    • X = e ^ ( LN( X ) )
• Convert Log( LN(MW) ) into Log( MW )
    • Log( MW ) = Log( e ^ LN(MW) )

Log Molecular Weight Results for SLIC
Gel #2 Seminal and Jelly and VE Time Courses, Lane 3

    Intercept   1.05917902
    Slope       0.03963918

Standards
                                                             Predicted
                                                             Log(LN(MW))
          Avg RM   LN(-LN(RM))   MW       Log(LN(MW))       on Stds.      Residuals   Residuals^2
Band 1    0.16      0.61         200000   1.09              1.08          0.0034      1.14681E-05
Band 2    0.32      0.13         116500   1.07              1.06          0.0026      6.53117E-06
Band 3    0.37     -0.01         97000    1.06              1.06          0.0011      1.17647E-06
Band 4    0.44     -0.20         66000    1.05              1.05          0.0061      3.76791E-05
Band 5    0.58     -0.61         45000    1.03              1.04          0.0051      2.63253E-05
Band 6    0.71     -1.07         31000    1.01              1.02          0.0021      4.45819E-06
Band 7    0.82     -1.62         21500    1.00              1.00          0.0039      1.50086E-05
Band 8    0.90     -2.25         13400    0.98              0.97          0.0079      6.21715E-05
Band 9    0.94     -2.78         6500     0.94              0.95          0.0054      2.92152E-05
Sums                                                                      0.00        0.000194034
R-Squared 0.99

Unknowns
          Unknown RM   Predicted LOG(LN(MW))   LOG(MW of Unknowns)
Band 1    0.29         1.070674384             2.92
Band 2    0.41         1.075431086             2.93
Band 3    0.54         1.08058418              2.95
Band 4    0.62         1.083755314             2.96

Graph result of SLIC Model
[Figure: SLIC Plot for Standards and Unknowns, Gel 2 Seminal and Jelly + VE Time Courses. x-axis: Ln(-LN(Relative Mobility)); y-axis: Log(LN(Molecular Weight)); series: Standards for Gel 2, Unknowns for Gel 2.]

Why was the LOG-LOG Model chosen for the 12.5% Gels?
• The LOG-LOG Model worked best for the 12.5% Gels (Gel #4 VE + Tris Supernatant Time Course and Gel #6 VE + Tris Pellet Time Course).
    • Small residuals
    • R^2 > 0.9
    • Residuals did not show long runs of only positive or only negative values.

The Log-Log Model
• The Log-Log model is of the form: Log(MW) = a + b*Log(RM) + c*Log(RM)^2
• It combines the Log model and the quadratic model to make a more successful model.

Model Comparison
• Gel #1, 15% Gel, S. purp/L. pictus VE DTT Removal, Lane 2: the residuals, squared residuals, sums and R-squared values for the SLIC, Quadratic Regression, LOG-LOG, Quadratic Interpolation, LOG Linear and Local Linear models repeat the Gel #1 table shown earlier for the Quadratic Model.

Predictions

Relative Mobilities
#     RM 1       RM 2       RM 3       RM 4       RM 5       RM 6
1     0.043103   0.034483   0.042017   0.033898   0.033898   0.033898
2     0.12069    0.12931    0.12605    0.127119   0.118644   0.118644
3     0.241379   0.241379   0.193277   0.237288   0.237288   0.220339
4     0.310345   0.310345   0.243697   0.322034   0.305085   0.305085
5     0.431034   0.431034   0.428571   0.432203   0.423729   0.40678
6     0.5        0.517241   0.504202   0.5        0.508475   0.508475
7     0.551724   0.543103   0.529412   0.533898   0          0
8     0.603448   0.568966   0.554622   0.550847   0.542373   0.542373
9     0.637931   0.603448   0.596639   0.59322    0.59322    0.584746
10    0.689655   0.689655   0.672269   0.677966   0.669492   0.677966

Predicted MW (for RM 1 to RM 4)
#     MW 1       MW 2       MW 3       MW 4       Sum        Average
1     680.5166   869.4138   699.8616   885.8835   3135.675   783.9189
2     219.76     203.73     209.5214   207.5894   840.6008   210.1502
3     102.6781   102.6781   131.0499   104.6232   441.0293   110.2573
4     77.92182   77.92182   101.6064   74.8224    332.2724   83.0681
5     54.32988   54.32988   54.67275   54.16859   217.5011   54.37527
6     46.16116   44.47475   45.73904   46.16116   182.5361   45.63403
7     41.43274   42.15527   43.35362   42.95384   169.8955   42.47387
8     37.55081   40.05647   41.19515   41.50513   160.3076   40.07689
9     35.32852   37.55081   38.02157   38.26216   149.1631   37.29076
10    32.43066   32.43066   33.35257   33.04501   131.2589   32.81472

Conclusion
• Different models worked better on different gel types.
• The Quadratic Regression Model worked best on the 15% gel, the SLIC Model on the gradient gel, and the LOG-LOG Model on the 12.5% gels.
• This process could be much improved if there were more data on the different gel types.

Thank You
Open for Questions…
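Supplementary sketch: the quadratic regression step used for Gel #1 (LOG MW = a*RM^2 + b*RM + c) could be reproduced roughly as follows. This is a minimal illustration, not the presenters' actual spreadsheet workflow. It assumes the seven Gel #1 standards with their average RM rounded to two decimals as listed on the slide, so the fitted coefficients will only approximate the reported ones. The function names and the `unknown_rm` value of 0.40 are hypothetical and serve only to show the prediction step.

```python
import math

# Gel #1 standards from the slide: (average RM rounded to 2 decimals, molecular weight).
STANDARDS = [
    (0.09, 200000), (0.19, 116500), (0.22, 97000), (0.32, 66000),
    (0.52, 45000), (0.67, 31000), (0.88, 21500),
]


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(rm, log_mw):
    """Least-squares fit of log_mw = a*rm^2 + b*rm + c via the normal equations."""
    rows = [[x * x, x, 1.0] for x in rm]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, log_mw)) for i in range(3)]
    return tuple(solve_linear(ata, aty))  # (a, b, c)


def predict_log_mw(coeffs, rm):
    """Evaluate the fitted quadratic at a relative mobility."""
    a, b, c = coeffs
    return a * rm * rm + b * rm + c


rm = [s[0] for s in STANDARDS]
log_mw = [math.log10(s[1]) for s in STANDARDS]
coeffs = fit_quadratic(rm, log_mw)

# Residuals and R-squared on the standards, as tabulated on the slides.
residuals = [y - predict_log_mw(coeffs, x) for x, y in zip(rm, log_mw)]
sse = sum(e * e for e in residuals)
mean_y = sum(log_mw) / len(log_mw)
sst = sum((y - mean_y) ** 2 for y in log_mw)
r_squared = 1.0 - sse / sst

# Hypothetical unknown band (not from the slides), to show the prediction step.
unknown_rm = 0.40
predicted_mw = 10 ** predict_log_mw(coeffs, unknown_rm)

print("a, b, c =", coeffs)
print("R-squared =", r_squared)
print("Predicted MW at RM 0.40 =", predicted_mw)
```

The fit is written in plain Python so it runs without extra packages. A library least-squares routine would serve the same purpose.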