Human Evolution Timelines
Research History
Evolution History
Source: Jobling, Hurles & Tyler-Smith (2004) Human Evolutionary Genetics.
Human Evolution
The Data
Genetic: Allele Frequencies, SNPs, Haplotypes
Non-genetic: Language, culture, pets, pathogens, ...
The Dynamics
Mutation, selection, recombination,
The Genealogical Structure
Phylogeny, Ancestral Recombination Graph, Pedigree
Relationship to the great Apes, Ancestral Population of Human/Chimp Ancestor, Out of Africa Ancestral Population
Structure, Selection, Migrations & Age of Alleles.
Genealogies
Iceland
Models of Pedigrees
Languages & Pathogens
Populations & Basic Genealogical Structures
Pedigree: Trace the ancestry of individuals
Phylogeny: Trace the ancestry of sequence points.
ARG: Trace the ancestry of sequences
[Figure: the three structures drawn over the generations Grandparents, Parents, Now.]
Other Genealogical Structures are possible
network, language merging, population splitting
Recombination
Recombination:
Gene Conversion:
1 meiosis
•Total haploid map length: males 25.9 M, females 44.6 M.
•Gene conversions are 1-2 orders of magnitude more frequent. Length 300-2000 bp.
Lander et al. (2001) “Initial sequencing and analysis of the human genome” Nature 409.860-912. + Kong, A. et al. (2002) “A high-resolution recombination map of the human genome” Nature Genetics
Mutations and Mutation Rates
Rates per mitosis or generation:
• Single nucleotide substitutions: ~$10^{-7}$
• Microsatellites (~100.000): ~$10^{-2}$
• Small insertions/deletions: ~$10^{-8}$
Average Number of Mitoses:
• Per male generation: 35 at age 15 .. 150 at age 20
• Per female generation: ~24
Selection: Positive & Negative
Crow, J.F. (2000) “The Origins, Patterns and Implications of Human Spontaneous Mutation” Nature Reviews Genetics 1.1.40-47 + Strachan and Read (2004) chapter 11 + Jobling, Hurles and Tyler-Smith (2004) chapter 2
Coalescent Issues
1. The number of genetic ancestors
2. When gene-trees differ from species trees
3. Out of Africa
4. Ages of Alleles
5. Allele Gradients
6. Number of Genetic Ancestors
7. Selective Sweeps
Human History Levels: Physical, Cultural & Genealogical
The physical population size, N(t), and the effective population size, Ne(t), are separate concepts.
i. N(t) can mainly be studied by historical/archaeological means,
ii. Ne(t) can be studied genealogically, for instance by tracing the ancestries of DNA sequences.
Main departures from simplest Population Genetical Models:
A. Long epochs of exponential growth at increasing rates
B. Bottlenecks & small populations.
C. Migrations & Geographical subdivisions
Our relationship to the great Apes.
From Nei, 2003
[Figure: phylogeny of Orangutan, Gorilla, Humans, Chimp and Pygmy Chimp, with node ages 13 Myr, 7 Myr, 5.5 Myr and 1 Myr.]
Ancestral Population of Human and Chimp
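[Figure: species tree of Human (H), Chimp (C) and Gorilla (G) with time marks at 7 Myr, 5 Myr and Now.]
$P(\text{GeneTree} \neq \text{SpeciesTree}) = \frac{2}{3} e^{-t/(2N_e)}$, where $t$ is the time between the two speciation events, in generations.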
Example: Chen & Li (2001) 53 triads: 31 (H,C), 10 (H,G) & 11 (C,G)
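The discordance formula can be turned around to ask what ancestral population size the triad counts would imply. The following is a minimal sketch, not the analysis of Chen & Li: the function names are hypothetical, the 20-year generation time is an assumption, and the 2 Myr internal branch is read off the slide's 7 Myr and 5 Myr marks.

```python
import math

def discordance_probability(t_generations, ne):
    """P(gene tree != species tree) = (2/3) * exp(-t / (2 * Ne))."""
    return (2.0 / 3.0) * math.exp(-t_generations / (2.0 * ne))

def implied_ne(p_discordant, t_generations):
    """Invert the formula: Ne = -t / (2 * ln(3p/2)), valid for 0 < p < 2/3."""
    if not 0.0 < p_discordant < 2.0 / 3.0:
        raise ValueError("p_discordant must lie strictly between 0 and 2/3")
    return -t_generations / (2.0 * math.log(1.5 * p_discordant))

# Counts quoted on the slide: 31 (H,C), 10 (H,G), 11 (C,G).
counts = {"HC": 31, "HG": 10, "CG": 11}
p_hat = (counts["HG"] + counts["CG"]) / sum(counts.values())

# Assumptions, not from the slide: 20-year generations, 7 - 5 = 2 Myr internal branch.
t_gen = 2_000_000 / 20

print(f"observed discordant fraction: {p_hat:.3f}")
print(f"implied ancestral Ne: {implied_ne(p_hat, t_gen):,.0f}")
```

The estimate is only as good as the assumed branch length and generation time; halving either halves the implied Ne.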
Out-of-Africa and different degrees of replacements
[Figure: three scenarios, Total replacement, Partial replacement and No replacement, each drawn for Africa, Asia and Europe with time marks at 1-1.2 Myr and 80-130 Kyr.]
Example: Takahata (2001) found data could be explained by total replacement.
Allele Frequencies and Principal Components
Cavalli-Sforza,2001
•Allele frequencies for different localities are subjected to a smoothing procedure.
•Principal Components are found and projected on geographical maps.
•Strongly criticized (Sokal et al.): even data with no geographical structure will “look like” geographical structure, the gradients carry no timing, ...
1. Agriculture 6-10 Kyr
2. Greek Colonisation 3 Kyr
3. Retraction of the Basques
4. Uralic People
5. Horse domestication
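The principal-component step described above can be illustrated on a toy table of allele frequencies. This is a sketch only: the smoothing procedure is skipped, the frequencies are invented, and the power-iteration routine `first_pc_scores` is a hypothetical helper, not the method used for the published maps.

```python
import math
import random

def first_pc_scores(freqs, iterations=100, seed=0):
    """Return the first-principal-component score of each locality.

    freqs[i][j] is the frequency of allele j at locality i.
    """
    n, m = len(freqs), len(freqs[0])
    # Centre every allele (column) on its mean over localities.
    means = [sum(row[j] for row in freqs) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in freqs]

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]  # random start vector
    for _ in range(iterations):
        # One power-iteration step: v <- X^T (X v), then normalise.
        scores = [sum(x[i][j] * v[j] for j in range(m)) for i in range(n)]
        w = [sum(x[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # no variation along the current direction
        v = [c / norm for c in w]
    return [sum(x[i][j] * v[j] for j in range(m)) for i in range(n)]

# Invented frequencies for four localities along a transect (three alleles).
toy = [
    [0.1, 0.8, 0.3],
    [0.2, 0.7, 0.3],
    [0.3, 0.6, 0.3],
    [0.4, 0.5, 0.3],
]
print(first_pc_scores(toy))
```

With a clean gradient like this one the scores are ordered along the transect (the sign is arbitrary); plotting such scores at the localities' coordinates gives the kind of map the slide refers to.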
Time slices
[Figure: a population of size N followed back in time, with two marked time slices: "All positions have found a common ancestor" and "All positions have found a common ancestor on one sequence".]
Number of genetic ancestors to the Human Genome
$S_r$ – number of Segments
$E(S_r) = 1 + r$
[Figure: a sequence traced back in time through coalescence (C) and recombination (R) events.]
Simulations
Statements about number of
ancestors are much harder to make.
Wiuf conjectured ~r/ln(r)
Applications to Human Genome
(Wiuf and Hein, 97)
Parameters used: 4Ne = 20.000; Chromos. 1: 263 Mb, 263 cM.
Chromosome 1: Segments 52.000, Ancestors 6.800
All chromosomes: Ancestors 86.000
Physical Population: 1.3-5.0 Mill.
A randomly picked ancestor (ancestral material comes in batteries!):
[Figure: ancestral material of one ancestor along chromosome 1 at three scales: 0-260 Mb (segments 0-52.000); magnified *35 to 7.5 Mb (segments 6890-8360); magnified *250 to 30 kb.]
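The chromosome 1 figures can be reproduced approximately from E(S_r) = 1 + r. The sketch below assumes that r is the scaled recombination rate, 4Ne times the map length in Morgans; that reading is an interpretation of the slide, and the function names are hypothetical.

```python
import math

def scaled_recombination_rate(four_ne, map_length_cm):
    """rho = 4Ne * (map length in Morgans); 100 cM = 1 Morgan."""
    return four_ne * map_length_cm / 100.0

def expected_segments(rho):
    """E(S_r) = 1 + r, with r taken to be the scaled rate rho."""
    return 1.0 + rho

def conjectured_ancestors(rho):
    """Conjectured order of magnitude r / ln(r); only meaningful for r > 1."""
    return rho / math.log(rho)

# Slide parameters: 4Ne = 20.000 and chromosome 1 = 263 cM.
rho = scaled_recombination_rate(20_000, 263)
print(f"expected segments:     {expected_segments(rho):,.0f}")
print(f"conjectured ancestors: {conjectured_ancestors(rho):,.0f}")
```

The segment count lands close to the 52.000 quoted for chromosome 1. The r/ln(r) value is of the same order as the 6.800 ancestors quoted, which is all a conjectured asymptotic form can be expected to deliver.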
Many sampled alleles relative to Ne
Wakeley03, Pitman, Schweinsberg
1. Simultaneous Events
2. Multifurcations.
3. Underestimation of Coalescent Rates
Cystic Fibrosis
(Wiuf 2000)
ΔF508 – possibly maintained by heterosis (1.023): higher resistance to Salmonella infections.
Data:
1. Frequency of ΔF508-allele: .022.
2. Inter variability in 1.705 individuals: 46 variable positions.
3. Model of human demography.
Model parameters: mutation rate, heterosis advantage and an exponential growth model of human population expansion.
The age of ΔF508 is estimated to be *
Pedigree Issues
Chinese
http://demography.anu.edu.au/People/Staff/zhongwei.html
Burke’s British Peerage
http://www.burkes-peerage.net/sites/wars/sitepages/home.asp
Mormons
http://genealogy-mormons.com/
Icelandic
http://www.decode.com + Helgason, A. et al. (2003 June) “A population-wide coalescent analysis of
Icelandic matrilineal and patrilineal genealogies: Evidence for a faster evolutionary rate of mtDNA
lineages than Y-chromosomes” American Journal Human Genetics.
i. Icelandic Pedigree
ii. Theoretical Models
Quebec French
Heyer and Tremblay, 1998 PNAS
Icelandic Genealogies
Helgason, 2003
[Figure: example genealogy from the ancestor cohort (1848-1892) to the contemporary cohort (1972-2002) along a Year axis, shown three ways: Total Genealogy, Males only, Females only.]
Of the 276,00 Icelanders (June 2002), the 131,060 born after 1972 were traced back.
Icelandic Genealogies
Helgason, 2003
[Figure: histograms of Percent of ancestors against No. of descendants, for Matrilines and Patrilines, showing the ancestors to the 1972 cohort and the backtraceable share of each cohort.]
Descendant cohort born after 1972: N = 66,910 and N = 64,150
Ancestral cohort born 1848-1892: N = 31,817 and N = 31,659; g = 4.3 and 3.8
Ancestral cohort born 1698-1742: N = 20,443 and N = 18,023; g = 7.9 and 8.8
Percentage pairs shown on the panels: 73.9/26.1, 77.9/22.1, 91.7/8.3, 86.2/13.8, 93.4/6.6, 89.7/10.3, 70.7/29.3, 61.8/38.2
Icelandic Genealogies
Helgason, 2003
Variation in annual offspring number is greater for females than for males, due to shorter generation time.
Positive correlation in fertility between parent and offspring.
[Figure: Age of parent (years), 20-45, against Birth year of individual, 1700-1950, for Patrilines and Matrilines.]
[Figure: Average number of offspring, 0-2, against Birth year of parent, 1700-2000, for Patrilines and Matrilines.]
Finding (Great)^k Grand Parents.
Finding Ancestral Individuals.
Joe Chang 1999 Dec. Adv. Appl. Prob.
[Figure: pedigree traced back from NOW through generations 0 to 11.]
Let T be the time when somebody was everybody’s ancestor.
Chang’s result:
$\lim_{N \to \infty} T / \log_2(N) = 1$ (prob. 1)
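The log2(N) scaling can be explored by simulation. The sketch below assumes a constant population of size N in which every individual picks two parents uniformly at random, with replacement, from the previous generation; the function name and the chosen population sizes are illustrative only.

```python
import math
import random

def generations_to_common_ancestor(n, rng, max_generations=10_000):
    """Generations back until some individual is an ancestor of all n present individuals.

    Assumed model: constant size n; each individual draws two parents uniformly
    at random (with replacement) from the previous generation.
    """
    full = (1 << n) - 1
    # desc[j]: bitmask of present-day individuals descended from individual j.
    desc = [1 << i for i in range(n)]
    for g in range(max_generations + 1):
        if any(d == full for d in desc):
            return g
        parents_desc = [0] * n
        for d in desc:
            if d == 0:
                continue  # not an ancestor of anyone alive now
            for _ in range(2):
                parents_desc[rng.randrange(n)] |= d
        desc = parents_desc
    raise RuntimeError("no common ancestor found within max_generations")

rng = random.Random(1)
for n in (64, 256, 1024):
    times = [generations_to_common_ancestor(n, rng) for _ in range(20)]
    mean_t = sum(times) / len(times)
    print(n, round(mean_t, 2), round(mean_t / math.log2(n), 2))
```

The last column is the ratio T/log2(N); the limit is approached slowly, so values somewhat above 1 are expected at these sizes.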
Combining Ancestral Individuals and the Coalescent
Finding Common Ancestors.
Wiuf & Hein, 2000.
Unify the two processes:
Sample more individuals
Let each have p parents (p possibly stochastic, p >= 1).
Result:
A discontinuity at p = 1.
For p > 1, $\log_2$ changes to $\log_p$.
Comment: Genetic ancestors are a vanishing subset of the genealogical ancestors.
Derrida
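[Figure: genealogical tree between generations G and G+1.]
Offspring distribution for a marriage, $p_k$.
m is the expected offspring number.
Recursion: $w_\alpha(G+1) = \frac{1}{2} \sum_{\alpha' \text{ children of } \alpha} w_{\alpha'}(G)$
$\alpha$: individual (ancestor in the tree); $w_\alpha$: weight, the probability that a uniformly random path leads to $\alpha$.
Initialization: $w_\alpha(0) = \delta_\alpha$ (weight 1 on the starting individual, 0 elsewhere).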
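The weight recursion is easy to run on a simulated population. This is a toy sketch: the population model (constant size, two distinct parents drawn at random for each individual) and the function name `weights_after` are assumptions made for illustration, not the model analysed on the slide.

```python
import random
from fractions import Fraction

def weights_after(generations, n, rng):
    """Ancestor weights of individual 0 after the given number of generations.

    Assumed toy model: constant size n >= 2; each individual draws two
    distinct parents at random from the previous generation.
    """
    w = [Fraction(0)] * n
    w[0] = Fraction(1)  # w(0): all weight on the starting individual
    for _ in range(generations):
        nxt = [Fraction(0)] * n
        for child in range(n):
            if w[child] == 0:
                continue  # not in the tree, passes nothing on
            for parent in rng.sample(range(n), 2):
                # Each parent receives half of the child's weight.
                nxt[parent] += w[child] / 2
        w = nxt
    return w

w = weights_after(10, 50, random.Random(0))
print("total weight:", sum(w))
print("ancestors with positive weight:", sum(1 for x in w if x > 0))
print("largest weight:", max(w))
```

Exact fractions make it visible that the weights always sum to 1 while their spread across ancestors becomes uneven.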
Kammerle 89: Pair Moran Model
A pair of children are born – they choose parents randomly.
A pair is erased and the children pair take their place.
A. The stationary distribution of the number of ancestors to the present population is hypergeometric:
$\pi(i) = \binom{N}{i} \binom{N}{i-1} \Big/ \binom{2N}{N-1}$
B. If $R_N \sim \pi(i)$, $i = 1, \dots, N$, then $(R_N - N/2)/\sqrt{N} \to N(0, 1/8)$.
C. Let $\hat{R}_N(t) := (R(t) - N/2)/\sqrt{N}$ and $S_N(t) := \hat{R}_N(t)$. $S_N(t)$ converges to an Ornstein-Uhlenbeck process with infinitesimal variance $\sigma^2(y) = 1/4$ and infinitesimal drift $\mu(y) = -y$.
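Statements A and B can be checked numerically with exact arithmetic. The sketch below only evaluates the stationary distribution as written above; it does not simulate the pair Moran model, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def stationary_distribution(n):
    """pi(i) = C(n, i) * C(n, i - 1) / C(2n, n - 1) for i = 1..n."""
    denom = comb(2 * n, n - 1)
    return {i: Fraction(comb(n, i) * comb(n, i - 1), denom) for i in range(1, n + 1)}

def mean_and_variance(dist):
    """Exact mean and variance of a distribution given as {value: probability}."""
    mean = sum(i * p for i, p in dist.items())
    var = sum((i - mean) ** 2 * p for i, p in dist.items())
    return mean, var

n = 200
pi = stationary_distribution(n)
mean, var = mean_and_variance(pi)
print("total probability:", sum(pi.values()))
print("mean / N:", float(mean) / n)      # compare with 1/2
print("variance / N:", float(var) / n)   # compare with 1/8
```

The probabilities sum to one by the Vandermonde identity, and the variance per individual approaches the 1/8 of the normal limit in B.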
Non-Contributing Ancestors
Kevin Donnelly, 1983 TPB
[Figure: the chromosomes 1, ..., 22, x, y of one individual traced back through generations 0, 1, ..., k.]
Generation: 0, 1, ..., k
Ancestors: 1, $2^1$, ..., $2^k$
No Recombination: 46 packets in every generation
Recombination: 46 packets, < ≈72 + 46 packets, ..., < ≈k*72 + 46 packets
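The packet bound grows linearly while the number of genealogical ancestors doubles each generation, so at some depth there must be ancestors who contributed no genetic material. A minimal sketch of that comparison, assuming no inbreeding (2^k distinct ancestors) and that each packet traces to a single ancestor; the function names are hypothetical.

```python
def max_packets(k, crossovers_per_generation=72, chromosomes=46):
    """Rough upper bound from the slide: k * 72 + 46 packets, k generations back."""
    return k * crossovers_per_generation + chromosomes

def first_generation_with_non_contributors(limit=100):
    """Smallest k at which 2**k ancestors exceed the packet bound."""
    for k in range(limit + 1):
        if 2 ** k > max_packets(k):
            return k
    return None

for k in range(13):
    print(k, 2 ** k, max_packets(k))

print("first generation with forced non-contributors:",
      first_generation_with_non_contributors())
```

Under these assumptions the crossover happens at k = 10, where 1024 ancestors face at most 766 packets.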
Non-Contributing Ancestors
Yun Song, pers. comm., 2003; Kevin Donnelly, 1983
The probability
1. that there is any non-contributing ancestor
2. that a randomly chosen ancestor is non-contributing
[Figure: both probabilities plotted against an axis marked 1, 2, 4, ..., 512.]
Pedigree Inference
Prior on Pedigrees
Three Processes
1. Choosing Parents
2. Recombination
3. The Mutational Process
Probability of data given pedigree
Mother
Father
Posterior on Pedigrees
Elston-Stewart (1971) -Temporal Peeling Algorithm
Lander-Green (1987) - Genotype Scanning Algorithm
Inheritable phenomena
Genetic Material
Sequences
“Allele Frequencies”
Language
Culture
Pathogens
Pests
Pets
Morphological Characters
Pathogen phylogenies
Falush 2003
Helicobacter pylori is transmitted from mother to child.
Falush et al. sequenced 8 genes from 370 strains from 27 populations – 3850 nucleotides each.
5 ancestral populations: East Asia, Euro1, Euro2, Afr1, Afr2
Structure assigns each polymorphism to an ancestral population.
American Indians are grouped as Asian, showing that H. pylori infection is ancient.
Diversity of H. pylori is 50 times larger than that of humans.
Much recombination – i.e. positions can be treated as independent
A. Maori is East Asian.
B. Inuit is Euro1 + Euro2
C. South African: Afr2
D. English
Cavalli- Sforza: Language Trees
Cavalli-Sforza (1997) Genes Peoples and Languages PNAS 94.7719-24
Principle of Comparison.
Loss of cognates (“homologous” words)
Syntax Comparison.
Sound use.
Reconstruction (dependent on interpretation) – stretches back 2-6.000 years dependent on criteria.
Historical Linguistics
William Jones 1786 observes similarities between Sanskrit, Greek & Latin
Swadesh (1952) makes one of the first glottochronological studies
Kruskal, Dyen & Black (1971) large successful investigation.
Principles:
Distance - Swadesh’s rule: 20% lost per millennium.
Parsimony
Compatibility
Likelihood
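The distance principle can be made concrete with the 20% per millennium loss rate. The sketch below assumes the usual glottochronological reasoning that two sister languages lose words independently, so the fraction of cognates they still share after t millennia is 0.8^(2t); that independence assumption and the function name are not from the slide.

```python
import math

RETENTION_PER_MILLENNIUM = 0.8  # slide: 20% lost per millennium

def divergence_time_millennia(shared_fraction, retention=RETENTION_PER_MILLENNIUM):
    """Time since two languages split, from the fraction of cognates both retain.

    Assumes independent loss in both lineages: shared = retention ** (2 * t).
    """
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    return math.log(shared_fraction) / (2.0 * math.log(retention))

for shared in (0.9, 0.64, 0.3):
    print(shared, round(divergence_time_millennia(shared), 2))
```

A shared fraction of 0.64 corresponds to one millennium, since 0.8 squared is 0.64. The estimate inherits the clock assumption, so it is only as reliable as a constant loss rate.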
Criticisms:
Word loss is not clocklike
Languages merge and borrow, giving non-tree-like structure
Not much research goes into this area.
Global Phylogeny
Khoisan
African
Niger-Kordofanian
Congo-Saharan
Cavalli-Sforza,2001
Ruhlen, 1994
Nilo-Saharan
Afro-Asiatic
Kartvelian
Dravidian
Indo-European
Uralic
Eurasiatic 20-10 Kyr
Eurasian/American 40-20 Kyr
Altaic
Eskimo-Aleut
Chukchi-Kamchatkan
Homo sapiens sapiens 100-70 Kyr
Amerind
Na-Dene
Eurasian 60-40 Kyr
Dene-Caucasian 40-20 Kyr
Sino-Tibetan
Caucasian/Basque/Burushaski
Asian 70-50 Kyr
Austronesian
Austro-Tai
Daic
Miao-Yao
Austric
Austro-Asiatic
Pacific
Indo-Pacific
Australian
Indo-European Language Trees
Dyen, Kruskal & Black, 1992; Piazza, Cavalli-Sforza, 2001
[Figure: tree of Indo-European languages against a time axis running from 9000 to 1000.]
Afghan, Baluchi, Persian, Ossetic, Bengali, Hindi, Punjabi, Marathi, Nepali, Kashmiri, Singhalese
Celtic: Welsh, Irish, Breton
Slavic: Bulgarian, Macedonian, Belorussian, Ukrainian, Polish, Czech, Russian, Serbo-Croatian, Slovenian
Latvian, Lithuanian
Romance: Walloon, French, Italian, Ladin, Portuguese, Spanish, Sardinian, Rumanian
Germanic: Danish, Swedish, Riksmaal, Faroese, Icelandic, Dutch, German, Frisian, English
Greek, Armenian, Albanian
Germanic Language Trees
From Embleton, 1986
[Figure: tree of Germanic languages with dated nodes.]
Languages: Swedish, Danish, Norwegian, Faroese, Icelandic, English, Tok Pisin, Frisian, Flanders, Afrikaans, Dutch, Yiddish, Hamburg Lower Saxony, Pennsylvanian German, German
Node labels: 1540, 1803, 1051, 1842, 194, 1051, 246, 1239, 1423, 1025, 1668, 1051, 1234, 1476, 1558
The Coalescent & Human Evolution (11.6.04)
Human History
Methodological Problems: Reconstructing haplotypes, defining haplotype blocks + HapMap.
Relationship to the great Apes, Ancestral Population of Human/Chimp Ancestor, Out of Africa, The Neanderthal.
Human Population Growth, Ancestral Population Structure, Selection, Migrations & Age of Alleles.
SNPs Haplotypes, Recombination Hotspots & Haplotype Blocks.
Individual Stories: Mitochondria, Y, autosomal chromosomes & alleles.
Empirical Genealogies
Iceland
Other Genealogical Issues: Genealogical Ancestors, Genetic and Non-Contributing Ancestors
Heritable Characters
Languages
Associated Animals, Plants & Pathogens
Surnames
Morphological Characters
The Role of Coalescent Theory