The Co-Evolution of Genetics and Statistics
Bio-Stat seminar
2 February 2011

From the First
• Gregor Mendel is recognized as the founder of genetics
• He was the first to use “math” to define a biological process

Role of Biostat
• Fisher suggested in 1936 that Mendel’s data was a little too good
• Fisher is thought of as a geneticist

The Chemistry of DNA

Structure of DNA
• Double helix = 2 nm
• 10 base pairs/turn = 10 nm
• 140 BP/nucleosome

How Has “Math” Driven Genetics?
Genotype      Phenotype
GG = 0.25     Gx = 0.75
Gg = 0.50
gg = 0.25     gg = 0.25

Some Questions in Genetics
• There are 4 bases in DNA
• There are 20 amino acids
• How do you order 4 to code for 20?

Simple Math
4 = 4
4 × 4 = 16
4 × 4 × 4 = 64
So 3 bases are required at a minimum

More Questions
• If 3 are required, how are they spaced?
Boxcar = ATGCAGT
Sequential = ATGCAGT
Spaced = ATGaCAGaT

Solution
First a homo-polymer (TTTTTTT)
This produces a peptide of phenylalanine
Then a co-polymer (TTCCTTCCTTCC)
The pattern of AA would allow dissection

Example
• TTCCTTCCTTCCTTCCTTCC
Boxcar        Sequential
TTC = Phe     TTC = Phe
TCC = Ser     CTT = Leu
CCT = Pro     CCT = Pro
CTT = Leu     TCC = Ser

How Did We Get Here?
• Genetics is the study of variation
• “Easy” genetics involved variation caused by genes of major effect
• Sickle cell disease and cystic fibrosis are examples of single-gene diseases

Finding Single Genes
• Collect families that show the trait
• Analyze their DNA to find sections that are shared along with the trait
• Assess the probability that these are shared randomly (LOD ratio)

How is DNA Measured?
• Before the age of the genome, centimorgans
• Humans have 22 paired autosomes plus a pair of sex chromosomes
• These segregate at cell division independently
• Along a chromosome, the probability that a trait is near something is measured in centimorgans

DNA is in Base Pairs Now
• The chromosomes are numbered largest to smallest (1-22)
• Positions are now located by Chr # and position along that Chr (Chr2:108234125)
• There are a little more than 3 × 10^9 BP

Mutations vs. SNP
• Currently the trend is to talk of “variation”, not mutation.
• SNP = Single Nucleotide Polymorphism
• Most SNPs are biallelic (A/G), and each allele has a frequency (0.895/0.105)
• SNPs mark positions, not “mutations”!

Other Terms
• InDel
• VNTR
• marker
• Coding
• NonCoding
• synonymous
• promoter
• epigenetic
• imprinting
• mitochondrial
• Intron/exon

Data Sets
• Arrays
• SNP
• Expression
Genotype: SNP - Looking for regions of DNA associated with a trait
Phenotype: Expression - What genes are “produced”, how the biochemistry is changed

Help From the “Math” Gifted!
• These are complex datasets
• Analysis can be “simple”, it shouldn’t be!
• Getting in early is critical

Next Big Challenge
• Network analysis
• Andrew Mugler, Boris Grinshpun, Riley Franks, and Chris H. Wiggins. Statistical method for revealing form-function relations in biological networks. PNAS 2011 108 (2) 446-451; published ahead of print December 23, 2010, doi:10.1073/pnas.
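
Worked sketch: codon length and reading the co-polymer
The “Simple Math” and “Example” slides can be checked with a few lines of Python. This is only an illustrative sketch: the function names (min_codon_length, read_codons, translate) are hypothetical and not from the seminar, and the codon table holds just the four codons that occur in the TTCC co-polymer. Following the slide's column labels, “Boxcar” is taken to mean advancing one base at a time and “Sequential” to mean advancing three bases at a time.

```python
# Illustrative sketch only; all names below are hypothetical helpers.

def min_codon_length(n_bases=4, n_amino_acids=20):
    """Smallest word length n such that n_bases**n >= n_amino_acids."""
    length = 1
    while n_bases ** length < n_amino_acids:
        length += 1
    return length

# Only the four codons found in the TTCC co-polymer (standard genetic code).
CODON_TABLE = {"TTC": "Phe", "TCC": "Ser", "CCT": "Pro", "CTT": "Leu"}

def read_codons(seq, step):
    """Read triplets from seq, advancing `step` bases each time.

    step=1 reads overlapping triplets (the slide's "Boxcar" column);
    step=3 reads non-overlapping triplets (the slide's "Sequential" column).
    """
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]

def translate(codons):
    """Map each triplet to its amino acid using the small table above."""
    return [CODON_TABLE[codon] for codon in codons]

if __name__ == "__main__":
    seq = "TTCC" * 5  # TTCCTTCCTTCCTTCCTTCC, as on the Example slide
    print("Minimum codon length:", min_codon_length())
    print("Boxcar:    ", translate(read_codons(seq, step=1))[:4])
    print("Sequential:", translate(read_codons(seq, step=3))[:4])
```

The two readings give the same four amino acids in a different order (Phe, Ser, Pro, Leu versus Phe, Leu, Pro, Ser), which is why the pattern of amino acids in the peptide can distinguish the two schemes.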