BIOL 303: Genetics

Deutsch 1
Ashley Deutsch
Dr. Ely
BIOL 303: Genetics
November 1, 2014
Human Migration Population Studies Using Multiple Data Types
The human species has a worldwide distribution, yet the most recent common ancestor of modern
humans lived in Africa 100,000 to 200,000 years ago (Pakendorf and Stoneking 2005). Because of
this recent origin, humans have accumulated relatively few polymorphisms, which makes human
populations difficult to study (Wirth et al. 2004). The global distribution of the human
species has been shaped by demographic events, such as migration and changes in population
size over time (Via et al. 2011). Understanding DNA variation among human populations is
important for medicine, developmental biology, and reconstructing the history of H. sapiens (Cavalli-Sforza and Feldman 2003). Haploid markers, such as mitochondrial DNA and the Y-chromosome, are often
useful in human population studies (Cavalli-Sforza and Feldman 2003); however, in inferring
population history, the incorporation of multiple data types is essential (Via et al. 2011). This
incorporation can be done by supplementing data from one haploid marker with another haploid
marker or nuclear DNA variation (Pakendorf and Stoneking 2005) or by incorporating social and
geographical information, including historical and archaeological data, with the genetic
information (Via et al. 2011). Less traditionally, but increasingly (Devi et al. 2007), language
and cultural traits, as well as commensals and parasites that have co-evolved with humans and
were carried along during migration, can be used to support genetic data in migration studies
(Cavalli-Sforza and Feldman 2003). According to Cavalli-Sforza and Feldman (2003), such
multidisciplinary approaches have been essential to advances in the understanding of human
evolutionary history.
A recent study by Secher et al. (2014) took a multidisciplinary approach to population genetics. Secher et al. examined the mitochondrial (mt) DNA of 230 individuals from Africa, Europe, and the Middle East who belong to the U6 haplogroup in order to construct a phylogeny. mtDNA is inherited only maternally, does not recombine, and has a high mutation rate, which makes it a good marker for tracing lineages (Pakendorf and Stoneking 2005). The study examined mutations within mtDNA hypervariable region 1 (HVR1) in order to track dispersal and build the phylogeny. The U6 haplogroup is a group of related mtDNA sequences that share a common ancestor; its members live mainly in North Africa. The 230 mtDNA samples were sequenced, and the differences between individual sequences were analyzed. Similar studies had been conducted previously, but with smaller sample sizes and less complete sequencing, so this study's results are more precise. The study used the mtDNA mutation rate of one mutation every 3,624 years published by Soares et al. (2009) to estimate the coalescence age of each haplogroup and subhaplogroup within U6. Because mutations accumulate slowly, it can be assumed that all individuals carrying the same mutation descend from a common ancestor, and that haplotypes differing by many mutations are more distantly related than those differing by few. Using these data together with the calculated coalescence ages, Secher et al. constructed a phylogenetic tree with branch ages for all 230 sequences in the study. Based on the mutation rate, the most recent common ancestor of the U6 lineage lived approximately 35.3 kya (thousand years ago), which is similar to the results of previous studies. This result was then considered in relation to climate, an important factor in population studies. The common ancestor lived during the Early Upper Paleolithic era, before the Last Glacial Maximum, but at a time that was cold and dry enough to force people to follow a North African coastal route. The data were then used to construct a phylogeography, the genetic and geographic distribution of a species. Using only the HVR1 sequences, the authors mapped the distribution of U6 and its subgroups by analyzing their frequencies throughout Africa, Europe, and the Middle East (figure 1). The figure shows that total U6a (Tot U6a) has areas of high frequency in northeast and southwest Africa. The map of U6a without the 16189 transition mutation (U6a) shows high frequency only in northeast Africa, suggesting that region as the origin of this haplogroup. The authors concluded that more data are needed to determine the most probable origin of the U6 haplogroup. A calculation was then performed, using anthropological, climate, and geographic data, to estimate where the U6 haplogroup originated outside of Africa. The authors assumed that the North African coastal route was 5,000 km long. To travel this distance, people carrying the U6 haplotype would have had to migrate at a rate of 11.2 km/year, which is reasonable for Paleolithic hunter-gatherers. Given the mutation rate, there is likely a 7,000-year gap between the formation of the U macro-haplogroup and that of the U6 haplogroup. Together, the mutation rate and migration rate indicate that the U macro-haplogroup originated about 4,000 km outside of Africa, in Eurasia. This prediction of the migration route, which took both migration rate and climate into account, agreed with archaeological data.

Figure 1: (Secher et al. 2014)
The figure shows distribution maps, based on HVR1 frequencies, for total U6 (U6), total U6a (Tot U6a), U6a without 16189 (U6a), U6a with 16189 (U6a-189), U6b'd, U6c, U6b, and U6d.
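The coalescence ages described above rest on a proportionality between accumulated mutations and elapsed time. One common way to apply a rate such as one mutation every 3,624 years is the rho statistic: average the number of mutations separating each sampled sequence from the inferred root of a clade, then multiply by the years per mutation. The sketch below assumes that rho approach and uses made-up mutation counts; it illustrates the arithmetic and is not necessarily the exact procedure Secher et al. (2014) followed.

```python
# Sketch of a rho-statistic coalescence estimate.
# Assumption: age = mean mutations-from-root x years-per-mutation.
# The mutation counts below are hypothetical, not data from Secher et al. (2014).

YEARS_PER_MUTATION = 3624  # one mutation every 3,624 years (Soares et al. 2009, as cited)


def rho(mutations_from_root: list[int]) -> float:
    """Mean number of mutations between each sampled sequence and the clade's root."""
    if not mutations_from_root:
        raise ValueError("need at least one sequence")
    return sum(mutations_from_root) / len(mutations_from_root)


def coalescence_age_years(mutations_from_root: list[int],
                          years_per_mutation: float = YEARS_PER_MUTATION) -> float:
    """Convert rho into an age in years using a fixed mutation rate."""
    return rho(mutations_from_root) * years_per_mutation


if __name__ == "__main__":
    # Hypothetical clade of five sequences, each given as its mutation count from the root.
    counts = [9, 10, 11, 10, 10]
    age = coalescence_age_years(counts)
    print(f"rho = {rho(counts):.1f}, age = {age / 1000:.1f} kya")
```

With these invented counts the sketch prints `rho = 10.0, age = 36.2 kya`. Published estimates also report a confidence interval around rho, which is omitted here for brevity.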
Migration routes for later U6 sub-haplogroups were also proposed by combining genetic data
with climatic and archaeological data. In one instance, the authors combined climate data with
haplogroup frequencies to analyze the U6a2 branch. This branch showed a radiation centered in
Ethiopia about 20 kya. That period was one of maximal aridity in North Africa, making it unlikely
that the migration back to East Africa crossed the Sahara Desert. The authors instead proposed
that this migration occurred through the gradual movement of small groups over a long period.
In another example, archaeological data were used to interpret a radiation in Morocco around
26 kya. It had previously been suggested that this migration was associated with the Aterian,
but the authors found that it more likely correlates with the Iberomaurusian archaeological record.
The study demonstrated the reliability of this multidisciplinary approach by successfully using
the same method to explain migration patterns in the post-colonial era, which are known from
historical records. The analysis compared a phylogeny based on a uniparental marker, mtDNA,
with archaeological, geographic, climate, and anthropological information. The authors suggest
that this combination of complete genome sequencing and complex statistical analysis will serve
as a model for future studies in population genetics.
Trivedi et al. (2008) also studied haploid markers and interpreted their results in light of other data; however, their research analyzed Y-chromosome data along with linguistic, sociocultural, and geographic data to make predictions about human migration in India. The results were then compared with mtDNA results from previous studies. The Y-chromosome is a good indicator of relationships among human populations because it is restricted to the male germ line and undergoes limited recombination during meiosis, yet it accumulates a relatively high number of mutations that create variability (Hughes and Rozen 2012). Trivedi et al. collected blood samples from 1,152 unrelated males from 80 populations that varied in language family (Indo-European, Austro-Asiatic, Dravidian, and Tibeto-Burman), socio-ethnic affiliation, and geographic area within India in order to analyze their Y-chromosomes. In addition, 282 Indian samples from the Punjab, Konkanstha Brahmin, Koya, Yerava, Mullukunan, Kuruchian, and Koraga populations and 3,047 samples from 76 populations outside of India were taken from the literature for analyses of genetic distance and origin. The Y-chromosomes were analyzed for 38 previously described binary polymorphisms. From these data, haplotypes of Y-chromosome short tandem repeats (Y-STRs) and Y-chromosome single nucleotide polymorphisms (Y-SNPs) were constructed. Genetic differences among socio-ethnic, linguistic, and geographic groups were analyzed at three levels: among individuals within a population, among populations within a group, and among groups of populations. The authors estimated the time to the most recent common ancestor by analyzing seven Y-STR loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, and DYS393), using a generation time of 25 years and the effective mutation rate of 6.9 × 10^-4 per locus per generation described by Zhivotovsky et al. (2004).
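The time-to-most-recent-common-ancestor estimate described above can be illustrated with a short sketch. It assumes one common form of the calculation: take the average squared difference (ASD) in repeat number between each chromosome and a founder haplotype across the seven loci, divide by the effective mutation rate to obtain generations, and multiply by the 25-year generation time. Treating the modal haplotype as the founder, the function names, and the sample repeat counts are illustrative assumptions, not details reported by Trivedi et al. (2008).

```python
# Sketch: Y-STR TMRCA from the average squared difference (ASD) to a founder haplotype.
# Assumptions: the founder is the modal haplotype; the repeat counts are hypothetical.
from statistics import mode

LOCI = ["DYS19", "DYS389I", "DYS389II", "DYS390", "DYS391", "DYS392", "DYS393"]
EFFECTIVE_RATE = 6.9e-4   # per locus per generation (Zhivotovsky et al. 2004, as cited)
GENERATION_YEARS = 25     # generation time used in the text


def founder_haplotype(haplotypes: list[list[int]]) -> list[int]:
    """Most common repeat count at each locus, used here as a stand-in for the founder."""
    return [mode([h[i] for h in haplotypes]) for i in range(len(LOCI))]


def average_squared_difference(haplotypes: list[list[int]], founder: list[int]) -> float:
    """Squared repeat-count differences from the founder, averaged over chromosomes and loci."""
    total = sum((r - f) ** 2 for h in haplotypes for r, f in zip(h, founder))
    return total / (len(haplotypes) * len(founder))


def tmrca_years(haplotypes: list[list[int]]) -> float:
    """ASD divided by the effective rate gives generations; scale by generation time."""
    founder = founder_haplotype(haplotypes)
    generations = average_squared_difference(haplotypes, founder) / EFFECTIVE_RATE
    return generations * GENERATION_YEARS


if __name__ == "__main__":
    # Four hypothetical chromosomes, repeat counts listed in LOCI order.
    sample = [
        [15, 13, 29, 24, 10, 11, 13],
        [16, 13, 29, 24, 10, 11, 13],
        [15, 13, 29, 23, 10, 11, 13],
        [15, 13, 29, 24, 10, 11, 14],
    ]
    print(f"TMRCA = {tmrca_years(sample):,.0f} years")
```

This toy sample has three single-step differences over 4 × 7 locus comparisons, so ASD = 3/28 and the sketch prints `TMRCA = 3,882 years`. A real estimate would use many more chromosomes, since ASD from a handful of haplotypes is very noisy.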
In total, 24 paternal lineages (haplogroups) were observed within India. This haplogroup diversity is higher than that of Europe or East Asia and more similar to that of Central Asia. Austro-Asiatic and Tibeto-Burman tribes showed lower diversity than the sample as a whole. The linguistic comparison showed that Dravidian-speaking populations had higher diversity than Indo-European-speaking populations. In addition, geographic analysis showed that groups in South India had higher haplogroup diversity than groups in North India. These data were compiled into a phylogeographic map of haplogroup frequencies (figure 2). The proportion of variation among geographic regions was greater than that among populations within regions, indicating that geography was the most significant factor; however, geographic distribution also tended to mirror the distribution of language families. No socio-ethnic influence was evident in the geographic distribution, and there was little difference between the genetic makeup of castes and that of tribes. The coalescence ages calculated for haplogroups R1a1, H, C, O2a, and R2 placed their most recent common ancestors at around 32, 44, 49, 36, and 40 kya, respectively. Archaeological evidence from this period is very sparse, so predictions about human migration had to be made through other methods.
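Comparisons of haplogroup diversity like those above usually rest on a single summary statistic. One standard choice is Nei's unbiased gene diversity, h = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of haplogroup i among n individuals. Whether Trivedi et al. (2008) used exactly this statistic is an assumption here, and the haplogroup labels in the example are invented.

```python
# Sketch: Nei's unbiased haplogroup (gene) diversity for one population sample.
# The haplogroup assignments below are hypothetical.
from collections import Counter


def haplogroup_diversity(assignments: list[str]) -> float:
    """Probability that two individuals drawn without replacement carry different haplogroups."""
    n = len(assignments)
    if n < 2:
        raise ValueError("need at least two individuals")
    freqs = [count / n for count in Counter(assignments).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


if __name__ == "__main__":
    diverse = ["H", "R1a1", "R2", "L", "J2", "O2a"]    # six different haplogroups
    founder_like = ["O2a", "O2a", "O2a", "O2a", "O3"]  # dominated by one haplogroup
    print(round(haplogroup_diversity(diverse), 3))       # 1.0
    print(round(haplogroup_diversity(founder_like), 3))  # 0.4
```

A population shaped by a founder event and bottleneck, like the second hypothetical sample, scores low because most individuals share one haplogroup.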
A phylogenetic tree constructed for the samples (figure 3) showed that haplotype distributions
were closely linked with language families. Austro-Asiatic and Tibeto-Burman speakers clustered
around the O2a haplogroup, separate from the Indo-European and Dravidian clusters. A comparison
that also included Y-chromosome haplogroups from Southeast Asia indicated a connection between
Austro-Asiatic and Tibeto-Burman speakers and Southeast Asian populations. The other populations
showed a connection with Indo-European speakers in Central Asia and Eastern Europe.
Using the collected and analyzed data, the authors made predictions about the evolutionary
events (such as founder effects, gene flow, and genetic drift) and the factors (including
geographic, linguistic, and cultural barriers) that produced the patrilineal distribution they
observed in India. The large diversity of Y-haplogroups in India suggested early settlement.
Figure 2: (Trivedi et al. 2008)
In this figure, the frequency distribution of Y-chromosome haplogroups is
mapped in different regional populations of India. On this map, the Bay of
Bengal, part of the Indian Ocean, is to the right of the land and the Arabian
Sea is to the left.

Figure 3: (Trivedi et al. 2008)
This figure depicts a phylogeny constructed from Y-haplogroups found in
India, with population relationships indicated based on frequency.
The four major haplogroups identified in this study were H, R1a1, O2a, and R2. The H haplogroup was found to be confined mainly to India. Viewing the data alongside mtDNA haplogroups, a strategy that provides a more complete picture, the authors suggested that this haplogroup was associated with an eastward migration through the Levantine corridor during the late Pleistocene. The H-M69 group shows a fairly uniform distribution across populations, which suggests that it appeared early in the lineage cluster. The R1a1 haplogroup showed high STR variance, indicating a period of population growth and expansion. These data were consistent with an early migration from Central or South Asia. Analysis of the geographic distribution of the R2 haplogroup indicated an Indian origin for this haplogroup, as well as several small migrations out of India. The authors attributed its uneven distribution within India (figure 2) to genetic drift or a bottleneck rather than to migration. In these data, the lack of C haplogroup sub-lineages indicated to the authors that most Indian populations originated within the subcontinent, which they took as evidence against the theory of an Aryan migration into India from Central Asia. Although socio-ethnic factors, mainly the caste system, are a large part of Indian society, no significant variation was found between caste groups and tribes. This result mirrored the previously published mtDNA results with which the sample was compared. This evidence again supports the hypothesis that Indian populations are likely derived from common settlers during the Pleistocene. Limited gene flow from Europe and Central and West Asia is further indicated by the absence of genetic markers associated with Neolithic farming. The authors propose that agriculture arose in India through the earliest migration of Dravidian speakers and then again later through the migration of rice cultivators from Southeast Asia. The analysis of the Austro-Asiatic and Tibeto-Burman language families was also informative. The low haplotype diversity in these populations indicated a demographic event that reduced diversity; the authors suggested a common founder event followed by a bottleneck. Linguistic branches within Tibeto-Burman populations were reflected in haplogroup diversity, whereas those within Austro-Asiatic populations were not. The authors hypothesized that this difference is due to multiple migration events of Tibeto-Burman speakers and only a single, earlier Austro-Asiatic migration event. Geographic distribution supports this hypothesis. mtDNA data from a previous study by Metspalu et al. (2004) indicated that Tibeto-Burman speakers contributed many maternal lineages, while a study by Thangaraj et al. (2005) indicated an absence of Southeast Asian maternal markers in Austro-Asiatic tribes. Combining this information with their own data, the authors propose that either the migration from Southeast Asia was male-dominated or the corresponding mtDNA lineages have been completely lost. This conclusion was supported by data on agricultural expansion.
Trivedi et al. (2008) proposed that the settlers of South India, rather than the Austro-Asiatic
tribes as previously assumed, were the original settlers of the subcontinent. This prediction was
supported by the geographic and linguistic comparisons in this study. The study's
multidisciplinary approach, which included analyzing the data with respect to agricultural
history, linguistic distributions, and geographic information and comparing them with mtDNA
from the same region, allowed more conclusions to be drawn and a more complete picture of
migration to be formed than analysis of Y-chromosome data alone would have.
Devi et al. (2007) took an approach to studying human migration that was very different from the analysis of haploid markers used by Trivedi et al. (2008) and Secher et al. (2014). The study analyzed variation in Helicobacter pylori, a highly variable bacterium that colonizes the human stomach, has co-evolved with humans, and is transmitted primarily vertically (Dominguez-Bello and Blaser 2011). Vertical transmission, directly from mother to child, makes this bacterium useful for tracking lineages (Dominguez-Bello and Blaser 2011), and at a finer resolution than mtDNA (Wirth et al. 2004). The study analyzed its data with respect to geography, culture, religion, and linguistics, as well as mtDNA data published in other studies, to draw conclusions about waves of human migration within India. A total of 63 H. pylori isolates were collected from native Indian people, primarily of Aryan and Dravidian ancestry. For each isolate, a 600-base-pair region of each of seven housekeeping genes (atpA, efp, ureI, ppa, mutY, trpC, and yphC) was sequenced, and seven haplotypes were identified.
An additional 600 sequences from other databases were used to construct phylogenetic trees. Almost all of the strains in the study were most similar to those of the European H. pylori subpopulation (hpEurope). This similarity indicates a phylogenetic relationship between the H. pylori populations of Europe and India and, by extension, between their human hosts. The 400 previously published H. pylori sequences from geographically and ethnically diverse hosts and the newly sequenced isolates were used to depict global population structure (figure 4, left). The phylogeny, constructed from 650 polymorphic positions, shows that H. pylori spread out of Africa in a pattern mirroring the spread of humans, consistent with the co-evolution of these species. On the right side of the figure, the branches are spread out for greater clarity. The result revealed a clear geographic distribution of populations and subpopulations. All of the isolates from North and South India and two from Ladakh clustered within the hpEurope population (green), while seventeen of the Ladakhi sequences clustered on a single hpAsia2 branch (gray). Finer distinctions within the hpEurope population were made by analyzing an additional 650 polymorphic sites, producing the more detailed phylogeny in the center box of the figure.

Figure 4: (Devi et al. 2007)
Phylogenetic depiction of new and previously published data, created by analyzing 650 polymorphic sites (left), with a more easily visualized version to the right. Populations are color coded according to the key. The European cluster resolved more clearly into Ladakhi (yellow), other Indian (light green), and European (dark green) lineages when the analyses included the remaining 650 polymorphic positions (center box).
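The multilocus approach described above joins the seven housekeeping-gene fragments of each isolate into one sequence and compares isolates at their polymorphic positions. A minimal sketch of that bookkeeping follows, assuming equal-length, already-aligned fragments. The three-base toy fragments and isolate names are hypothetical (the real fragments are about 600 bp), and the resulting difference counts would still need a tree-building method such as neighbor-joining, which is not shown.

```python
# Sketch: concatenate housekeeping-gene fragments per isolate and count pairwise
# differences at polymorphic sites. Assumes aligned, equal-length fragments;
# the toy sequences below are hypothetical.
from itertools import combinations

GENES = ["atpA", "efp", "ureI", "ppa", "mutY", "trpC", "yphC"]


def concatenate(isolate: dict[str, str]) -> str:
    """Join gene fragments in a fixed gene order."""
    return "".join(isolate[gene] for gene in GENES)


def polymorphic_positions(seqs: list[str]) -> list[int]:
    """Alignment columns where at least two isolates differ."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def pairwise_differences(isolates: dict[str, dict[str, str]]):
    """Return polymorphic sites and the number of differences for every isolate pair."""
    seqs = {name: concatenate(genes) for name, genes in isolates.items()}
    sites = polymorphic_positions(list(seqs.values()))
    diffs = {
        (a, b): sum(seqs[a][i] != seqs[b][i] for i in sites)
        for a, b in combinations(seqs, 2)
    }
    return sites, diffs


if __name__ == "__main__":
    base = {"atpA": "ACG", "efp": "TTA", "ureI": "GGC", "ppa": "CAT",
            "mutY": "AGT", "trpC": "CCA", "yphC": "TGA"}
    isolates = {
        "iso1": base,
        "iso2": dict(base, atpA="ACT"),
        "iso3": dict(base, atpA="ACT", trpC="CGA"),
    }
    sites, diffs = pairwise_differences(isolates)
    print(sites)  # [2, 16]
    print(diffs)
```

Fixing the gene order before concatenating matters: if isolates were joined in different orders, the same column would compare different genes.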
H. pylori isolates were also sequenced at the cag pathogenicity island (cagPAI) locus. This locus encodes the machinery that translocates the bacterial CagA protein into host gastric epithelial cells (Terry et al. 2005). This analysis showed that the Indian cagPAI sequences in this sample (red) were part of the European cluster (figure 5). Using geographic data in combination with the genetic results in figures 4 and 5, the authors hypothesized that H. pylori was likely introduced to India by Indo-European people, which is consistent with the idea of gene flow into India from Indo-Aryans. They suggested that this event occurred when the Indo-European languages arrived, between 4,000 and 10,000 years ago. In comparison with the mtDNA results of Kivisild et al. (1999), however, the data do not rule out the alternative possibility that Indian and European strains share a much earlier common origin, during the Upper Paleolithic migration of humans in Eurasia. Considering other indications of migration into India from elsewhere before the Indo-Aryan migration, the authors hypothesized that H. pylori was present in the Indian population before that migration, but that strains carrying the European cagPAI outcompeted those from other locations. The H. pylori population was homogeneous regardless of host religion and language, so these factors did not support any additional conclusions. Because no mutation rate is known for H. pylori, the data could not be used to estimate the dates of migration events. This lack of dates prevents comparison with historical and archaeological data. Thus, the conclusions of this study relied heavily on comparing genetic data with geographic distribution data.

Figure 5: (Devi et al. 2007)
Phylogenetic tree based on a 219-bp region of the cagPAI locus; the representative Indian samples (red) fall within the European cluster.
The studies conducted by Secher et al. (2014), Trivedi et al. (2008), and Devi et al. (2007)
demonstrate the value of comparing genetic data with other forms of data in order to draw more,
and more valuable, conclusions in studies of human migration. The greater effectiveness of a
multidisciplinary approach also extends beyond studies of human migration to other kinds of
population studies (Cavalli-Sforza and Feldman 2003). Nor are the methods confined to those used
in the studies above. The use of co-evolved species to infer migration patterns and phylogeny can
draw on many organisms, including Mycobacterium tuberculosis and hepatitis viruses
(Dominguez-Bello and Blaser 2011). In addition to uniparental markers, variation in nuclear DNA
can be used to construct phylogenies (Pakendorf and Stoneking 2005). These methods, in
conjunction with geographic or cultural data, can provide support for genetic data. As a whole,
these multidisciplinary methods provide context for genetic information that allows knowledge in
the field of human migration to advance (Cavalli-Sforza and Feldman 2003).
Literature Cited:
Cavalli-Sforza LL, Feldman MW (2003) The application of molecular genetic approaches to the study of human evolution. Nat Genet 33:266-275
Devi SM, Ahmed I, Francalacci P, Hussain MA, Akhter Y, Alvi A, Sechi LA, Mégraud F, Ahmed N (2007) Ancestral European roots of Helicobacter pylori in India. BMC Genomics 8:184 doi:10.1186/1471-2164-8-184
Dominguez-Bello MG, Blaser MJ (2011) The human microbiota as a marker for migration of individuals and populations. Annu Rev Anthropol 40:451-474 doi:10.1146/annurev-anthro-081309-145711
Hughes JF, Rozen S (2012) Genomics and genetics of human and primate Y chromosomes. Annu Rev Genomics Hum Genet 13:83-108 doi:10.1146/annurev-genom-090711-163855
Kivisild T, Bamshad MJ, Kaldma K, Metspalu M, Metspalu E, Reidla M, Laos S, Parik J, Watkins WS, Dixon ME, Papiha SS, Mastana SS, Mir MR, Ferak V, Villems R (1999) Deep common ancestry of Indian and western-Eurasian mitochondrial DNA lineages. Curr Biol 9:1331-1334
Metspalu M, Kivisild T, Metspalu E, Parik J, Hudjashov G, Kaldma K, Serk P, Karmin M, Behar DM, Gilbert MT, Endicott P, Mastana S, Papiha SS, Skorecki K, Torroni A, Villems R (2004) Most of the extant mtDNA boundaries in south and southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. BMC Genet 5:26
Pakendorf B, Stoneking M (2005) Mitochondrial DNA and human evolution. Annu Rev Genomics Hum Genet 6:165-183 doi:10.1146/annurev.genom.6.080604.162249
Secher B, Fregel R, Larruga JM, Cabrera VM, Endicott P, Pestano JJ, González AM (2014) The history of the North African mitochondrial DNA haplogroup U6 gene flow into the African, Eurasian, and American continents. BMC Evol Biol 14:109 doi:10.1186/1471-2148-14-109
Soares P, Ermini L, Thomson N, Mormina M, Rito T, Rohl A, Salas A, Oppenheimer S, Macaulay V, Richards MB (2009) Correcting for purifying selection: an improved human mitochondrial molecular clock. Am J Hum Genet 84:740-759
Terry CE, McGinnis LM, Madigan KC, Cao P, Cover TL, Liechti GW, Peek RM, Forsyth MH (2005) Genomic comparison of cag pathogenicity island (PAI)-positive and -negative Helicobacter pylori strains: identification of novel markers for cag PAI-positive strains. Infect Immun 73:3794-3798 doi:10.1128/IAI.73.6.3794-3798.2005
Thangaraj K, Sridhar V, Kivisild T, Reddy AG, Chaubey G, Singh VK, Kaur S, Agarawal P, Rai A, Gupta J, Mallick CB, Kumar N, Velavan TP, Suganthan R, Udaykumar D, Kumar R, Mishra R, Khan A, Annapurna C, Singh L (2005) Different population histories of the Mundari- and Mon-Khmer-speaking Austro-Asiatic tribes inferred from the mtDNA 9-bp deletion/insertion polymorphism in Indian populations. Hum Genet 116:507-517
Trivedi R, Sanghamitra S, Amika S, Bindu GH, Banerjee J, Tandon M, Gaikwad S, Rajkumar R, Sitalaximi T, Richa, Chainy GBN, Kashyap VK (2008) Genetic imprints of Pleistocene origin of Indian populations: a comprehensive phylogeographic sketch of Indian Y-chromosomes. Int J Hum Genet 8:97-118
Via M, Gignoux CR, Roth LA, Fejerman L, Galanter J, Choudhry S, Toro-Labrador G, Viera-Vera J, Oleksyk TK, Beckman K, Ziv E, Risch N, Burchard EG, Martinez-Cruzado JC (2011) History shaped the geographic distribution of genic admixture on the island of Puerto Rico. PLoS One 6:e16513 doi:10.1371/journal.pone.0016513
Wirth T, Wang XY, Linz B, Novick RP, Lum JK, Blaser M, Morelli G, Falush D, Achtman M (2004) Distinguishing human ethnic groups by means of sequences from Helicobacter pylori: lessons from Ladakh. PNAS 101:4746-4751 doi:10.1073/pnas.0306629101
Zhivotovsky LA, Underhill PA, Cinnioglu C, Kayser M, Morar B, Kivisild T, Scozzari R, Cruciani F, Destro-Bisol G, Spedini G, Chambers G, Herrera RJ, Yong KK, Gresham D, Tournev I, Feldman MW, Kalaydjieva L (2004) The effective mutation rate at Y chromosome short tandem repeats, with application to human population-divergence time. Am J Hum Genet 74:50-61