The Co-Evolution of Genetics and Statistics

Bio-Stat seminar 2 February 2011
From the First
• Gregor Mendel is recognized as the founder of genetics
• He was the first to use “math” to define a biological process.
Role of Biostat
• Fisher suggested in 1936 that Mendel’s
data was a little too good.
• Fisher is thought of as a geneticist.
The Chemistry of DNA
Structure of DNA
Double helix diameter = 2 nm
10 base pairs/turn = 3.4 nm
140 BP/nucleosome
How Has “Math” Driven Genetics?
Genotype: GG = 0.25, Gg = 0.50, gg = 0.25
Phenotype: Gx = 0.75, gg = 0.25
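The genotype and phenotype proportions above are what a single-gene cross between two heterozygous parents would give. The sketch below enumerates that cross. It assumes both parents are Gg and that G is fully dominant, neither of which the slide states outright, and the function name `cross` is only illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(parent1: str, parent2: str) -> dict:
    """Genotype probabilities for a single-gene cross, e.g. cross("Gg", "Gg")."""
    counts = Counter()
    for a, b in product(parent1, parent2):
        # Sort the two alleles so "gG" and "Gg" count as the same genotype
        counts["".join(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

# Assumption: both parents are heterozygous (Gg)
genotypes = cross("Gg", "Gg")

# Assumption: G is fully dominant, so any genotype carrying G shows the trait ("Gx")
dominant = sum(p for g, p in genotypes.items() if "G" in g)
recessive = genotypes.get("gg", Fraction(0))
```

Here `genotypes` holds 1/4, 1/2 and 1/4 for GG, Gg and gg, and `dominant` and `recessive` come out as 3/4 and 1/4.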
Some Questions in Genetics
• There are 4 bases in DNA
• There are 20 amino acids
• How do you order 4 to code for 20?
Simple Math
4 = 4
4 x 4 = 16 (fewer than 20)
4 x 4 x 4 = 64
So 3 bases are required at a minimum
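The counting argument above can be written as a short loop: keep lengthening the code word until the number of possible words reaches the number of amino acids. This is only a sketch of the arithmetic, and the function name is illustrative.

```python
def min_codon_length(n_bases: int, n_amino_acids: int) -> int:
    """Smallest word length L such that n_bases ** L >= n_amino_acids."""
    length = 1
    while n_bases ** length < n_amino_acids:
        length += 1
    return length

# 4 bases, 20 amino acids: 4 and 16 are too few, 64 is enough
codon_length = min_codon_length(4, 20)
```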
More Questions
• If 3 are required, how are they spaced?
Boxcar= ATGCAGT
Sequential=ATGCAGT
Spaced=ATGaCAGaT
Solution:
First, a homopolymer (TTTTTTT). This produces a peptide of phenylalanine.
Then a copolymer (TTCCTTCCTTCC). The pattern of amino acids (AA) would allow dissection of the reading scheme.
Example
• TTCCTTCCTTCCTTCCTTCC
Boxcar      Sequential
TTC=Phe     TTC=Phe
TCC=Ser     CTT=Leu
CCT=Pro     CCT=Pro
CTT=Leu     TCC=Ser
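The two columns of the example can be reproduced by sliding a three-base window along the copolymer: advancing one base at a time gives the Boxcar order, and advancing three bases at a time gives the Sequential order. The sketch below uses only the four codon assignments shown in the example, and the function name `read_triplets` is an illustrative choice.

```python
# Codon assignments taken from the example (DNA letters, as on the slide)
CODONS = {"TTC": "Phe", "TCC": "Ser", "CCT": "Pro", "CTT": "Leu"}

def read_triplets(seq: str, step: int) -> list:
    """Translate seq by reading 3-base windows that advance `step` bases each time."""
    return [CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, step)]

copolymer = "TTCC" * 5  # TTCCTTCCTTCCTTCCTTCC

boxcar = read_triplets(copolymer, step=1)      # windows overlap by two bases
sequential = read_triplets(copolymer, step=3)  # windows sit end to end
```

The first four entries of `boxcar` are Phe, Ser, Pro, Leu and the first four of `sequential` are Phe, Leu, Pro, Ser, so the order of amino acids in the peptide tells the two schemes apart.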
How Did We Get Here?
• Genetics is the study of variation
• “Easy” genetics involved variation by
genes of major effect.
• Sickle cell disease and cystic fibrosis are examples of single-gene diseases
Finding Single Genes
• Collect families that show the trait
• Analyze their DNA to find sections that are shared by those with the trait
• Assess the probability that these are shared by chance (LOD score, the log of the odds ratio)
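As a rough illustration of the last step, the sketch below computes a two-point LOD score: the log10 of the likelihood of the observed meioses at a recombination fraction theta, divided by the likelihood under no linkage (theta = 0.5). It assumes the simplest textbook case, where every meiosis can be scored unambiguously as recombinant or non-recombinant. The function name and the example counts are hypothetical, not taken from the seminar.

```python
import math

def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score, assuming each meiosis is scored unambiguously."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    # log10 likelihood if marker and trait are linked at recombination fraction theta
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    # log10 likelihood if they segregate independently (theta = 0.5)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked

# Hypothetical family data: 1 recombinant among 10 informative meioses
example_lod = lod_score(recombinants=1, meioses=10, theta=0.1)
```

For these made-up counts the score is about 1.6. By convention a LOD of 3 or more is usually taken as evidence for linkage, so this family alone would not be enough.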
How is DNA Measured?
• Before the age of the genome: centimorgans
• Humans have 22 paired chromosomes, plus the sex chromosomes
• These segregate independently at cell division
• Along a chromosome, the probability that a trait is near a marker is measured in centimorgans
DNA is in Base Pairs Now
• The chromosomes are numbered largest to smallest (1-22)
• Positions are now located by Chr # and position along that Chr. (Chr2:108234125)
• There are a little more than 3 x 10^9 BP
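A position written this way splits naturally into a chromosome label and a base-pair offset. The sketch below parses the example string from the slide. The `Locus` type and `parse_locus` function are hypothetical names, and the accepted format is assumed to be exactly "Chr" + label + ":" + digits.

```python
from typing import NamedTuple

class Locus(NamedTuple):
    chromosome: str  # chromosome label without the "Chr" prefix
    position: int    # offset along the chromosome, in base pairs

def parse_locus(text: str) -> Locus:
    """Parse a string such as "Chr2:108234125" into (chromosome, position)."""
    chrom, _, pos = text.partition(":")
    if not chrom.lower().startswith("chr") or len(chrom) <= 3 or not pos.isdigit():
        raise ValueError(f"not a Chr:position string: {text!r}")
    return Locus(chrom[3:], int(pos))

locus = parse_locus("Chr2:108234125")
```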
Mutations vs. SNP
• Currently the trend is to talk of “variation”, not mutation.
• SNP = Single Nucleotide Polymorphism
• Most SNPs are biallelic (e.g., A/G), and each allele has a frequency (e.g., 0.895/0.105)
• SNPs mark positions, not “mutations”!
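Allele frequencies like the pair quoted above come from counting alleles across genotyped individuals, where each person carries two copies. The sketch below shows that count for an A/G SNP. The genotype counts are invented, chosen only so that the result matches the 0.895/0.105 example.

```python
def allele_frequencies(n_aa: int, n_ag: int, n_gg: int) -> tuple:
    """Frequencies of the A and G alleles from counts of AA, AG and GG individuals."""
    total_alleles = 2 * (n_aa + n_ag + n_gg)  # two alleles per individual
    freq_a = (2 * n_aa + n_ag) / total_alleles
    freq_g = (2 * n_gg + n_ag) / total_alleles
    return freq_a, freq_g

# Hypothetical sample of 1000 individuals: 800 AA, 190 AG, 10 GG
freq_a, freq_g = allele_frequencies(800, 190, 10)
```

With these counts, 1790 of the 2000 alleles are A and 210 are G, giving 0.895 and 0.105.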
Other Terms
• InDel
• VNTR
• marker
• Coding
• NonCoding
• synonymous
• promoter
• epigenetic
• imprinting
• mitochondrial
• Intron/exon
Data Sets
• Arrays
• SNP (genotype): looking for regions of DNA associated with a trait
• Expression (phenotype): what genes are “produced”, and how the biochemistry is changed
Help From the “Math” Gifted!
• These are complex datasets
• Analysis can be “simple”, but it shouldn’t be!
• Getting in early is critical
Next Big Challenge
• Network analysis
• Andrew Mugler, Boris Grinshpun, Riley Franks, and Chris H. Wiggins. Statistical method for revealing form-function relations in biological networks. PNAS 2011 108 (2) 446-451; published ahead of print December 23, 2010, doi:10.1073/pnas.