Genomics, Computing, Economics
10 AM Tue 13-Feb
Harvard Biophysics 101 (MIT-OCW Health Sciences & Technology 508)
http://openwetware.org/wiki/Harvard:Biophysics_101/2007

Binomial, Poisson, Normal
[Figure: probability (y axis, 0.00 to 0.10) versus X (x axis, 0 to 50) for three distributions: Normal ($\mu=20$, $\sigma=4.47$), Poisson ($\mu=20$), Binomial (N=2020, p=.01)]

Binomial frequency distribution as a function of X ∈ {int 0 ... n}
p and q: two types of object or event; $q = 1 - p$
Factorials: $0! = 1$, $n! = n\,(n-1)!$
Combinatorics: C = number of subsets of size X that are possible from a set of total size n
$C(n,X) = \dfrac{n!}{X!\,(n-X)!}$
$B(X) = C(n,X)\, p^X q^{n-X}$, $\mu = np$, $\sigma^2 = npq$
$(p+q)^n = \sum_X B(X) = 1$
$B(X{:}\ 350,\ n{:}\ 700,\ p{:}\ 0.1) = 1.53148 \times 10^{-157}$
  = PDF[ BinomialDistribution[700, 0.1], 350]   (Mathematica)
  ~= 0.00 = BINOMDIST(350,700,0.1,0)   (Excel)

Poisson frequency distribution as a function of X ∈ {int 0 ... ∞}
$P(X) = P(X-1)\,\mu/X = \mu^X e^{-\mu}/X!$, $\sigma^2 = \mu$
n large & p small: $P(X) \approx B(X)$, $\mu = np$
For example, estimating the expected number of positives in a given sized library of cDNAs, genomic clones, combinatorial chemistry, etc. X = # of hits.
Zero hit term $= e^{-\mu}$

Normal frequency distribution as a function of X ∈ {-∞ ... ∞}
$Z = (X-\mu)/\sigma$   Normalized (standardized) variables
$N(X) = \exp(-Z^2/2) \,/\, (2\pi\sigma^2)^{1/2}$   probability density function
npq large: $N(X) \approx B(X)$

Mean, variance, & linear correlation coefficient
Expectation E (rth moment) of random variables X for any distribution f(X)
First moment = mean $\mu$; variance $\sigma^2$ and standard deviation $\sigma$
$E(X^r) = \sum_X X^r f(X)$, $\mu = E(X)$, $\sigma^2 = E[(X-\mu)^2]$
Pearson correlation coefficient: $C = \mathrm{cov}(X,Y)/(\sigma_X \sigma_Y) = E[(X-\mu_X)(Y-\mu_Y)]/(\sigma_X \sigma_Y)$
Independent X,Y implies C = 0, but C = 0 does not imply independent X,Y (e.g. $Y = X^2$).
P = TDIST(C*sqrt((N-2)/(1-C^2))) with dof = N-2 and two tails, where N is the sample size.
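The binomial, Poisson and normal formulas above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names are hypothetical, only the standard library is used, and 0 < p < 1 and mu > 0 are assumed. Probabilities are evaluated in log space (via `math.lgamma`) because a direct product of a huge C(n, X) and a tiny p^X underflows in double precision, which may be why the spreadsheet value on the slide shows as roughly 0.00.

```python
import math

def binomial_pmf(x, n, p):
    """B(X) = C(n, X) p^X q^(n-X), evaluated in log space (assumes 0 < p < 1)."""
    log_c = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_c + x * math.log(p) + (n - x) * math.log1p(-p))

def poisson_pmf(x, mu):
    """P(X) = mu^X e^(-mu) / X!, evaluated in log space (assumes mu > 0)."""
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))

def poisson_by_recurrence(x, mu):
    """Same value, built up from the zero-hit term e^(-mu) with P(X) = P(X-1) mu / X."""
    prob = math.exp(-mu)
    for k in range(1, x + 1):
        prob *= mu / k
    return prob

def normal_pdf(x, mu, sigma):
    """N(X) = exp(-Z^2 / 2) / (sigma sqrt(2 pi)), with Z = (X - mu) / sigma."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2.0) / (sigma * math.sqrt(2.0 * math.pi))

# Worked example from the slide: B(X: 350, n: 700, p: 0.1)
b_example = binomial_pmf(350, 700, 0.1)

# Parameters of the three curves in the figure
n, p = 2020, 0.01
mean = n * p                 # mu = np
variance = n * p * (1 - p)   # sigma^2 = npq
at_20 = {
    "binomial": binomial_pmf(20, n, p),
    "poisson": poisson_pmf(20, 20.0),
    "normal": normal_pdf(20, 20.0, 4.47),
}

# (p + q)^n = sum over X of B(X) = 1
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))

if __name__ == "__main__":
    print("B(350; 700, 0.1) =", b_example)
    print("mean, variance =", mean, variance)
    print("values at X = 20:", at_20)
    print("sum of B(X) =", total)
```

With n large and p small, the three values at X = 20 should land close to one another, which is the point of the overlaid curves in the figure.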
www.stat.unipg.it/IASC/Misc-stat-soft.html

One form of HIV-1 Resistance

Association test for CCR-5 & HIV resistance

Alleles        Obs Neg   Obs SeroPos   total   Expec Neg   Expec Pos
CCR-5+            1278          1368    2646        1305        1341
Δccr-5             130            78     208         103         105
total             1408          1446    2854

dof = (r-1)(c-1) = 1
ChiSq = sum[(o-e)^2/e] = 15.6
P ≈ 0.00008
Samson et al. Nature 1996 382:722-5

But what if we test more than one locus?
[Figure: three panels, each with Y = Number of Sib Pairs (Association) on a log scale.
 Panel 1: X = Number of Alleles (Hypotheses) Tested; GRR=1.5, p=0.5 (population frequency).
 Panel 2: X = Population frequency (p); GRR=1.5, #alleles=1E6.
 Panel 3: X = Genotypic Relative Risk (GRR); #alleles=1E6, p=0.5 (population frequency).]
GRR = Genotypic relative risk
Ref: The future of genetic studies of complex human diseases. [based on Risch & Merikangas (1996) Science 273: 1516]

Class outline
(1) Topic priorities for homework since last class
(2) Quantitative exercises so far: psycho-statistics, combinatorials, exponential/logistic, bits, association & multi-hypotheses
(3) Project level presentation & discussion
(4) Discuss communication/presentation tools
Spontaneous chalkboard discussions of t-test, genetic code, non-coding RNAs & predicting deleteriousness of various mutation types.
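The CCR-5 association test above (expected counts, ChiSq and P) can be recomputed from the observed counts alone. The following is a minimal Python sketch, not part of the original slides; variable names are hypothetical. For dof = 1 the chi-square upper tail equals `erfc(sqrt(ChiSq/2))`, so no statistics library is needed. The last lines are an assumption added for illustration: a Bonferroni correction for 1E6 tested alleles at a nominal 0.05 level, which is one common way to answer "what if we test more than one locus?" and is not stated on the slide.

```python
import math

# Observed allele counts from the slide: (seronegative, seropositive)
observed = {
    "CCR-5+": (1278, 1368),
    "delta ccr-5": (130, 78),
}

row_totals = {allele: sum(counts) for allele, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

# Expected count = row total * column total / grand total
expected = {
    allele: tuple(row_totals[allele] * col_totals[j] / grand_total for j in range(2))
    for allele in observed
}

# ChiSq = sum[(o - e)^2 / e] over all four cells
chi_sq = sum(
    (o - e) ** 2 / e
    for allele in observed
    for o, e in zip(observed[allele], expected[allele])
)

dof = (len(observed) - 1) * (len(col_totals) - 1)  # (r-1)(c-1) = 1

# Valid for dof == 1 only: chi-square is then the square of a standard normal
p_value = math.erfc(math.sqrt(chi_sq / 2.0))

# Assumption (not from the slide): Bonferroni threshold for 1E6 tested alleles
n_tests = 10 ** 6
bonferroni_threshold = 0.05 / n_tests
survives_correction = p_value < bonferroni_threshold

if __name__ == "__main__":
    print("expected:", expected)
    print("ChiSq =", chi_sq, "dof =", dof, "P =", p_value)
    print("survives Bonferroni at", bonferroni_threshold, ":", survives_correction)
```

Under that assumed correction, a P value that looks strong for a single locus would not pass a genome-wide threshold, which is why the required number of sib pairs in the graphs grows with the number of alleles tested.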