who was Bayes?
• Reverend Thomas Bayes (1702-1761)
  – part-time mathematician
  – buried in Bunhill Fields Cemetery, Moorgate, London
  – famous paper in 1763, Phil Trans Roy Soc London
  – was Bayes the first with this idea? (Laplace?)
• basic idea (from Bayes’ original example)
  – two billiard balls tossed at random (uniform) on table
  – where is first ball if the second is to its left (right)?
[Figure: billiard table showing the first and second ball; the two possible orderings are labelled Y=0 and Y=1]
  prior: $\mathrm{pr}(\theta) = 1$
  likelihood: $\mathrm{pr}(Y \mid \theta) = \theta^{1-Y}(1-\theta)^{Y}$
  posterior: $\mathrm{pr}(\theta \mid Y) = ?$

BS Yandell © 2005, Plant Microarray Course

what is Bayes theorem?
• posterior = likelihood * prior / C
  pr( parameter | data ) = pr( data | parameter ) * pr( parameter ) / pr( data )
  $\mathrm{pr}(\theta \mid Y) = \dfrac{\mathrm{pr}(\theta, Y)}{\mathrm{pr}(Y)} = \dfrac{\mathrm{pr}(Y \mid \theta)\,\mathrm{pr}(\theta)}{\mathrm{pr}(Y)}$
• prior: probability of parameter before observing data
  – $\mathrm{pr}(\theta)$ = pr( parameter )
  – equal chance of first ball being anywhere on the table
• posterior: probability of parameter after observing data
  – $\mathrm{pr}(\theta \mid Y)$ = pr( parameter | data )
  – more likely second to left if first is near right end of table
• likelihood: probability of data given parameters
  – $\mathrm{pr}(Y \mid \theta)$ = pr( data | parameter )
  – basis for classical statistical inference about $\theta$ given Y

Bayes posterior for normal data
[Figure: two panels, small prior variance and large prior variance; each shows the prior and the posteriors for n small and n large, with the prior mean and actual mean marked; x-axis: y = phenotype values]

Bayes posterior for normal data
  model: $Y_i = \mu + E_i$
  environment: $E \sim N(0, \sigma^2)$, $\sigma^2$ known
  likelihood: $Y \sim N(\mu, \sigma^2)$
  prior: $\mu \sim N(\mu_0, \kappa\sigma^2)$, $\kappa$ known
  posterior: mean tends to sample mean
  – single individual: $\mu \sim N\big(\mu_0 + B_1(Y_1 - \mu_0),\; B_1\sigma^2\big)$
  – sample of n individuals: $\mu \sim N\big(B_n\bar{Y} + (1 - B_n)\mu_0,\; B_n\sigma^2/n\big)$ with $\bar{Y} = \sum_{i=1}^{n} Y_i / n$
  – fudge factor (shrinks to 1): $B_n = \dfrac{\kappa n}{\kappa n + 1} \to 1$

posterior genotypic means Gq
[Figure: prior and posterior densities of the genotypic means $G_q$ for genotypes qq, Qq and QQ, for n small and n large; x-axis: y = phenotype values]
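As a rough illustration of the normal-data posterior on the "Bayes posterior for normal data" slide, the following Python sketch computes the shrinkage factor and the posterior mean and variance. The function name `normal_posterior` and the argument name `kappa` (the known multiple of the environmental variance used as the prior variance) are assumptions made for this example, not names from the slides.

```python
def normal_posterior(y, mu0, kappa, sigma2):
    """Posterior of mu when Y_i ~ N(mu, sigma2) and mu ~ N(mu0, kappa * sigma2).

    Returns (posterior mean, posterior variance, shrinkage factor B_n).
    Both sigma2 and kappa are treated as known, as on the slide.
    """
    n = len(y)
    if n == 0:
        raise ValueError("need at least one observation")
    ybar = sum(y) / n
    # Shrinkage ("fudge") factor B_n = kappa*n / (kappa*n + 1); tends to 1 as n grows.
    b_n = kappa * n / (kappa * n + 1.0)
    # Posterior mean is a weighted average of the sample mean and the prior mean.
    post_mean = b_n * ybar + (1.0 - b_n) * mu0
    # Posterior variance is the sampling variance of the mean, shrunk by B_n.
    post_var = b_n * sigma2 / n
    return post_mean, post_var, b_n


if __name__ == "__main__":
    # Hypothetical phenotype values, chosen only for illustration.
    for sample in ([12.0], [12.0, 11.0, 13.0, 12.5, 11.5] * 20):
        mean, var, b = normal_posterior(sample, mu0=8.0, kappa=1.0, sigma2=4.0)
        print(len(sample), round(b, 3), round(mean, 3), round(var, 4))
```

With `kappa = 1` and a single observation, the factor is 1/2, so the posterior mean sits halfway between the prior mean and the observation; as n grows the factor approaches 1 and the posterior mean approaches the sample mean.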
posterior genotypic means Gq
posterior centered on sample genotypic mean but shrunken slightly toward overall mean
  prior: $G_q \sim N(\bar{Y}, \kappa\sigma^2)$
  posterior: $G_q \sim N\big(B_q\bar{Y}_q + (1 - B_q)\bar{Y},\; B_q\sigma^2/n_q\big)$
  $n_q = \mathrm{count}\{Q_i = q\}$, $\bar{Y}_q = \sum_{\{Q_i = q\}} Y_i / n_q$
  fudge factor: $B_q = \dfrac{\kappa n_q}{\kappa n_q + 1} \to 1$

Are strain differences real?
[Figure: expression of SREBP1, SCD1, PEPCK, G6Pase, PPARalpha, ACO, PPARgamma, GPAT and FAS in strains B6 and BTBR, plotted for fat, liver, muscle and islet]
• strain differences? similar pattern, parallel lines, no interaction
• noise negligible? few d.f. per gene
• can we trust $SD_g$?

Bayesian shrinkage of gene-specific SD
• gene-specific SD from replication
  – $SD_g$ = gene-specific standard deviation (df = $\nu_1$)
• robust abundance-based estimate
  – $\varsigma(A_g)$ = smoothed over mRNA
  – depends only on abundance level $A_g$ (or constant)
• combine ideas into gene-specific hybrid
  – “prior” $\sigma_g^2 \sim \text{inv-}\chi^2(\nu_0, \varsigma(A_g)^2)$
  – “posterior” shrinkage estimate $\dfrac{\nu_1 SD_g^2 + \nu_0\,\varsigma(A_g)^2}{\nu_1 + \nu_0}$
  – combines two “statistically independent” estimates

SD for strain differences
[Figure: SD = spread (log scale, 0.05 to 1.00) against average intensity for B6 and BTBR in fat, liver, muscle and islet; shows gene-specific $\sigma_g$ and the smooth of $\sigma_g$, with curves $\varsigma(A_g)$ for the fat and liver main effects and the fat-liver interaction]

Shrinkage Estimates of SD
[Figure: SD = spread (log scale, 0.05 to 1.00) against average intensity; shows gene-specific $\sigma_g$, abundance-based $\varsigma(A_g)$, the new (shrunk) $\sigma_g$ and 95% $\chi^2_8$ limits; size of shrinkage given by $\dfrac{\nu_1\sigma_g^2 + \nu_0\,\varsigma(A_g)^2}{\nu_1 + \nu_0}$]

How good is shrinkage model?
• prior for gene-specific $\sigma_g^2$: $\sigma_g^2 \sim \text{inv-}\chi^2(\nu_0, \varsigma(A_g)^2)$
  – $\nu_0 = 5.45$, scale = 1
• $\chi^2$ approximation with $\nu_0 = 3.56$, scale = .809
  – fudge to adjust mean
• shrinkage estimate: $\dfrac{\nu_1\sigma_g^2 + \nu_0\,\varsigma(A_g)^2}{\nu_1 + \nu_0}$
[Figure: density histogram of the ratio $\sigma_g^2 / \varsigma(A_g)^2$ (empirical Bayes estimates) over the range 0 to 10, with the $\chi^2$ approximation overlaid]

Effect of SD Shrinkage on Detection
[Figure: S = (D-center)/spread against A = average intensity (log scale, 0.02 to 10.00) for the fat-liver interaction in B6 and BTBR; compares shrinkage-based and abundance-based limits; 9 genes identified]

QTL Mapping (Gary Churchill)
Key Idea: Crossing two inbred lines creates linkage disequilibrium, which in turn creates associations between markers and linked, segregating QTL.
[Diagram: QTL, Marker, Trait]

Bayes factors for comparing models
• goal of BF: balance model fit with model “complexity”
  – want “best model” that captures key features (model bias)
  – want to avoid “overfitting” the data in hand (poor prediction)
• what is a Bayes factor (BF)?
  – ratio of posterior odds to prior odds
  – ratio of model (marginal) likelihoods
• BF is approximately equivalent to the Bayesian Information Criterion (BIC)
  – penalty on likelihood ratio (LR)
• want Bayes factor to be much larger than 1 (ideally > 10)

  $BF_{12} = \dfrac{\mathrm{pr}(\text{model}_1 \mid \text{data}) \,/\, \mathrm{pr}(\text{model}_2 \mid \text{data})}{\mathrm{pr}(\text{model}_1) \,/\, \mathrm{pr}(\text{model}_2)} = \dfrac{\mathrm{pr}(\text{data} \mid \text{model}_1)}{\mathrm{pr}(\text{data} \mid \text{model}_2)}$

  $-2\log(BF_{12}) \approx -2\log(LR) - (p_2 - p_1)\log(n)$
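The BIC-style penalty on the likelihood ratio can be tried out numerically with a short Python sketch. It assumes that LR is the ratio of maximized likelihoods of model 1 to model 2, that p1 and p2 are the numbers of free parameters, and that n is the sample size; the function names and the numbers in the usage example are hypothetical, and the result is only the large-sample approximation, not an exact Bayes factor.

```python
import math


def neg2_log_bf12(loglik1, loglik2, p1, p2, n):
    """Approximate -2 log(BF12) as -2 log(LR) - (p2 - p1) log(n).

    loglik1, loglik2: maximized log-likelihoods of models 1 and 2 (assumed inputs).
    p1, p2: number of free parameters in each model.
    n: sample size.
    """
    neg2_log_lr = -2.0 * (loglik1 - loglik2)  # LR taken as L1 / L2
    return neg2_log_lr - (p2 - p1) * math.log(n)


def approx_bf12(loglik1, loglik2, p1, p2, n):
    """Approximate Bayes factor of model 1 against model 2 on the original scale."""
    return math.exp(-0.5 * neg2_log_bf12(loglik1, loglik2, p1, p2, n))


if __name__ == "__main__":
    # Hypothetical fits: model 2 has one extra parameter and a slightly better fit.
    bf = approx_bf12(loglik1=-100.0, loglik2=-98.0, p1=2, p2=3, n=100)
    print("approximate BF12:", bf)
    print("strong support for model 1 (BF12 > 10)?", bf > 10)
```

In this form the extra parameters of the larger model are charged log(n) each, so a modest gain in log-likelihood does not by itself favour the more complex model.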