From Gerbault, P., Thomas, M.G., 2015. Human Evolutionary Genetics. In: James D. Wright (editor-in-chief), International Encyclopedia of the Social & Behavioral Sciences, 2nd edition, Vol 11. Oxford: Elsevier. pp. 289–296. ISBN: 9780080970868. http://dx.doi.org/10.1016/B978-0-08-097086-8.82020-6 Copyright © 2015 Elsevier Ltd. unless otherwise stated. All rights reserved.

Human Evolutionary Genetics

Pascale Gerbault and Mark G Thomas, Research Department of Genetics, Evolution and Environment, University College London, London, UK

© 2015 Elsevier Ltd. All rights reserved. This article is a revision of the previous edition article by J.L. Mountain, volume 10, pp. 6984–6991, © 2001, Elsevier Ltd.

Abstract

Traditionally, our knowledge of human evolution has come from the fossil and material culture records, and has been studied by paleontologists, anthropologists, archaeologists, anatomists, and – to some extent – linguists.
In the past 25 years genetics has made substantial contributions to this field. While much of this has been driven by advances in molecular biology techniques (i.e., the methods used to obtain genetic data), the principles underlying how genetic data can be used to make inferences about our evolutionary past come from the field of population genetics. In this article we discuss how population genetics has been used to address two sets of questions regarding human evolution: (1) when and where did human populations originate (demographic history) and (2) to what extent and in what ways have humans adapted to changes in our ecology by natural selection (adaptation history).

Introduction to Population Genetics

Genetic information is carried by the sequence of bases in deoxyribonucleic acid (DNA). One of the most important features of the DNA molecule is its ability to be replicated, and so be passed on from a cell to its daughter cells, and from one generation to the next, mostly unchanged. Much of human DNA (referred to as the human genome) appears not to have a direct function, and is sometimes referred to as junk DNA. In the past it was generally thought that the only functionally important parts of our genome were genes – regions of DNA that provide information on how to make specific proteins – and regions near those genes that control their expression. However, in recent years other genomic regions have been shown to be important, such as those coding for functionally active ribonucleic acid (RNA) molecules. During the replication process, some changes can occur in the DNA sequence; any such change is termed a mutation. Mutations can occur in reproductive and nonreproductive cells (i.e., germ and somatic cells, respectively). However, only mutations occurring in germ cells are heritable and therefore a substrate for evolutionary processes.
Mutations give rise to new genetic variants, called alleles (Kimura, 1971), and the overall constellation of alleles in an individual is known as its genotype. It should be noted that mutations are relatively rare – even on the evolutionary timescale – so most of the DNA sequences in our genome are identical between individuals, and indeed between humans and other primates. However, the human genome is very large (around 3 billion base pairs) so while comparatively rare, sites in the genome (loci) that are variable between individuals are still numerous. In addition to mutation, a process called recombination can shuffle the distribution of alleles along DNA sequences into new combinations (see later). The field of population genetics is chiefly concerned with describing and understanding the distribution and fate of genetic variation. Ultimately, an allele only has two fates, loss (from the population) or fixation (i.e., loss of variation at that locus). However, in the intervening time between mutation and loss or fixation a locus will be variable or polymorphic. The distribution and fate of genetic variation in a population is shaped by three processes: mutation (including recombination), genetic drift, and natural selection. Mutation, as explained above, gives rise to variation, natural selection favors (and so increases/decreases in frequency) particular alleles, and genetic drift is the random sampling of alleles from one generation to the next through reproduction, and leads to random changes in allele frequencies through time. When a new mutation occurs in the protein-coding region of a gene, or in other functionally important parts of the genome, it may change the biological characteristics (phenotype) of an individual.
While phenotypes are only partly determined by genotype, they can affect the overall fitness of individuals, and hence lead to differential survival and reproduction. This process is known as natural selection, and it can act on phenotypes related to early development and survival to reproductive age (viability), on success in attracting a mate (sexual selection), on the consequent ability to fertilize (gamete selection), or on the number of progeny produced (fecundity). The sum of these various selection stages constitutes the fitness of a phenotype, which is partly dependent on environmental variables. The selection coefficient is the relative fitness of a genotype or genotypes in relation to others. A new mutation might not affect the underlying phenotype. Such mutations give rise to alleles that are neither advantageous nor disadvantageous, and are said to be neutral. The fate of neutral mutations is determined by random genetic drift – the chance passing on of alleles to future generations. Genetic drift reduces genetic diversity through time since it ultimately leads to the loss or fixation of alleles, but its effect depends strongly on the effective size of the population considered. The effective population size (Ne) is the size of a randomly mating population that would show the same degree of genetic drift as the actual population (Wright, 1931). The relationship between effective and census population sizes (Ne and N, respectively) varies depending on certain properties of the population considered (Rice, 2004). For example, if we assume that individuals mate randomly and generations are not overlapping, then when N changes through time it can be shown that Ne is the harmonic mean of N over a given period of time.
In such conditions, Ne is disproportionately affected by small values of the fluctuating population size, and Ne will be smaller than the average value of N across generations. However, these assumptions do not always hold (Rice, 2004). The effect of genetic drift can be illustrated by looking at the chance an allele has of being fixed in a population. For autosomes (chromosomes other than the sex chromosomes: X or Y) the probability of fixation of a new allele equals its relative frequency in the population and is therefore 1/(2Ne); 2Ne because every individual carries two copies of each autosome. For the Y chromosome (one copy, only carried by males) or the mitochondrial DNA (mtDNA; only transmitted by females) this probability is approximately 1/(0.5Ne), and for the X chromosome (2 copies in females, 1 copy in males) this probability becomes approximately 1/(1.5Ne). This highlights that the larger the population the less is the effect of drift since the smaller the chance a new allele has of being fixed in this population. An appreciation of the importance of genetic drift emerged from a major branch of population genetics called Neutral Theory (Kimura, 1971). Prior to this it was thought that natural selection was the most important force shaping the fates of alleles, and so evolution. Neutral Theory itself arose out of the realization that there was far more genetic variation in natural populations (including humans) than could be maintained by natural selection alone. It postulates that most mutations give rise to alleles that are either disadvantageous – so lost rapidly from the population – or are selectively neutral, or sufficiently nearly neutral that drift rather than selection is the main force governing their fate. Neutral Theory does admit the possibility of selectively advantageous alleles arising by mutation, but assumes that such events are sufficiently rare that they contribute little to overall patterns of genetic variation (Kimura, 1968). 
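The harmonic-mean relationship and the fixation probabilities described earlier in this section can be made concrete with a short calculation. The following is a minimal sketch in Python, assuming random mating and non-overlapping generations as in the text; the function names and the census sizes (a hypothetical one-generation bottleneck) are illustrative assumptions, not values from any study.

```python
from statistics import harmonic_mean


def effective_size(census_sizes):
    """Ne as the harmonic mean of per-generation census sizes.

    Assumes random mating and discrete, non-overlapping generations.
    """
    return harmonic_mean(census_sizes)


def neutral_fixation_probability(ne, copies_factor=2.0):
    """Fixation probability of one new neutral allele: 1 / (copies_factor * Ne).

    copies_factor follows the text: 2 for autosomes, 0.5 for the
    Y chromosome or mtDNA, 1.5 for the X chromosome.
    """
    return 1.0 / (copies_factor * ne)


# Hypothetical census sizes over five generations, with one bottleneck.
sizes = [10_000, 10_000, 100, 10_000, 10_000]
ne = effective_size(sizes)
arithmetic_mean = sum(sizes) / len(sizes)

print(f"arithmetic mean N = {arithmetic_mean:.0f}, harmonic-mean Ne = {ne:.1f}")
for label, factor in [("autosome", 2.0), ("Y or mtDNA", 0.5), ("X", 1.5)]:
    print(label, neutral_fixation_probability(ne, factor))
```

A single generation at N = 100 pulls Ne far below the arithmetic mean of N, which illustrates why small values of a fluctuating population size dominate the effective size.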
One interesting prediction of Neutral Theory relates to the rate of change of DNA sequences through time. If selection plays a major role in the determination of survival of new mutations, then a constant mutation rate should not lead to a constant rate of evolution (substitution rate, or the rate of fixation of new alleles). Instead the substitution rate would be expected to vary episodically through time, as selection intensity acting on different traits varies. Alternatively, if the Neutral Theory holds then it is expected that the substitution rate is set by the underlying mutation rate and the proportion of new alleles that are selectively disadvantageous (the selective constraint). Given that the mutation rate (μ) and selective constraint for any given gene are approximately constant through time, this predicts that the rate of genetic change is also constant over all evolutionary lineages. This is known as the molecular clock hypothesis (Kimura, 1968; Zuckerkandl and Pauling, 1965) and it permits the estimation of lineage/species divergence times using genetic data. Neutral Theory has had a profound effect on population genetics; so much so that while there are undoubtedly examples where natural selection has shaped the fate of alleles, neutrality is now widely accepted as the null hypothesis – the default assumed status of alleles at polymorphic loci. Given that patterns of genetic variation in populations are shaped by mutation, drift (itself shaped by demographic history) and natural selection, it stands to reason that those patterns of variation in populations contain information about mutation, demographic history, and natural selection. However, extracting that information to make inferences about our evolutionary past is not trivial, primarily because any particular pattern of variation in one or a set of populations can be the result of a very wide range of different evolutionary scenarios (equifinality).
Population Genetics Inferences: Demographic History and Natural Selection

One of the advantages of the neutral model of evolution is that it makes predictions about the relationship between patterns of genetic variation, the mutation rate, and demographic parameters, such as the effective population size (Kimura, 1968; Zuckerkandl and Pauling, 1965). These predictions include: (1) the expected level of diversity within a species at equilibrium is a function of the mutation rate and the effective population size; (2) as two species diverge, they accumulate differences at the same rate (i.e., substitution rate) at which neutral mutations arise; and (3) the expected allele frequencies in a sample of DNA sequences are a function of the effective population size and of the sample size. These predictions can then be compared to empirical data (Zuckerkandl and Pauling, 1965). From there, any significant deviations from neutral expectations would suggest the locus or loci under study did not evolve neutrally but instead were subjected to other forces, including natural selection. In other words, the study of natural selection is mainly based upon tests of significant deviation from the null hypothesis of neutral evolution, rather than any direct measurements of selection. However, an important complicating factor is that real populations typically have complicated demographic histories that cannot be collapsed to a single population size parameter, and some demographic histories can lead to patterns in genetic variation that mimic those formed by natural selection (e.g., Currat et al., 2006). A wide range of methods have been developed to detect genomic signatures of natural selection. These methods differ primarily in the nature of the genetic data they consider and the evolutionary time frame they are sensitive to.
Allele frequencies and other measures of genetic variation can be used to estimate the extent and nature of demographic changes, and population structuring, or departures from neutrality. The extent of Hardy–Weinberg disequilibrium can be used as a crude test of neutral evolution. The Hardy–Weinberg theorem assumes an idealized population with an infinite number of randomly mating individuals, no selection, no mutation, and no migration, and predicts that for a single locus with two alleles, the three genotypes AA, Aa, and aa follow the proportions p², 2pq, and q², where p and q are the initial relative frequencies of the two alleles A and a, respectively. Different types of natural selection will alter these proportions in distinct ways (Nielsen, 2005). However, any deviation from Hardy–Weinberg equilibrium can be due to various factors, such as population structure, and should thus be treated with caution as evidence for selection. Importantly, natural selection affects only those selected alleles, while demographic factors and genetic drift act on the whole genome. Classical methods for detecting selection are based on the distribution of allele frequencies in single nucleotide polymorphisms (SNPs), also called the allele- or site-frequency spectrum (SFS)-based methods. The allele-frequency spectrum can be ‘unfolded’ or ‘folded’ depending on whether the derived and the ancestral (i.e., fixed in other apes) allele can be distinguished or not, respectively. This basically involves identifying the number of allele-frequency classes observed, and counting the number of loci falling in each frequency class. When considering the SFS, negative selection tends to increase the proportion of low-frequency variants compared to neutral expectations (Nielsen, 2005).
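The counting procedure behind the site-frequency spectrum can be sketched as follows. This is a minimal Python illustration, assuming each SNP is summarized by the number of sampled chromosomes carrying the derived allele; the function names and the example counts are hypothetical.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Unfolded SFS for a sample of n chromosomes.

    Entry i-1 is the number of segregating sites at which the derived
    allele is carried by exactly i chromosomes (i = 1 .. n-1).
    """
    counts = Counter(c for c in derived_counts if 0 < c < n)
    return [counts.get(i, 0) for i in range(1, n)]


def folded_sfs(derived_counts, n):
    """Folded SFS, used when ancestral and derived alleles cannot be told apart.

    Each site is classed by its minor-allele count (1 .. n // 2).
    """
    counts = Counter(min(c, n - c) for c in derived_counts if 0 < c < n)
    return [counts.get(i, 0) for i in range(1, n // 2 + 1)]


# Hypothetical derived-allele counts at eight SNPs in six chromosomes.
derived = [1, 1, 1, 2, 5, 3, 1, 4]
print(unfolded_sfs(derived, 6))  # [4, 1, 1, 1, 1]
print(folded_sfs(derived, 6))    # [5, 2, 1]
```

Folding merges frequency classes i and n - i, so the site with five derived copies joins the singleton class in the folded spectrum.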
Alternatively, a new mutation arising in a population may be advantageous, in which case it will increase in frequency in the population by positive selection. Much interest has focused on positive selection due to its association with adaptation and the evolution of new forms and functions. Over a prolonged period, positive selection increases the fixation rate of beneficial function-altering alleles. It can be detected by comparison of changes between species. The first test of selection based on detecting regions where patterns of variation depart from those expected under neutrality is the Hudson–Kreitman–Aguade (HKA) test (Hudson et al., 1987). It compares levels of diversity in different genes or genomic regions, calibrated by between-species divergence rates for the same regions, to test whether these levels are significantly increased or reduced in the region of interest, compared to theoretical expectation or empirical data for presumed neutral parts of the genome (Nielsen, 2005). Balancing selection is a case of positive selection where the new variant is advantageous in combination with other alleles. Balancing selection thereby increases the proportion of intermediate-frequency variants with respect to neutral expectations, whereas positive selection increases the proportion of high-frequency variants. A selective sweep is an example of positive selection where the new favored variant reaches a high frequency in the population, resulting in an overall decrease of the genetic variation at the selected locus, as well as in the genomic region surrounding it. A selective sweep tends to increase the proportion of low-frequency variants with respect to neutral expectations. A widely used selection test based on the SFS is Tajima’s D test (see Tajima, 1989).
In cases of a selective sweep or purifying selection, this test detects an excess of low-frequency variants (indicated by negative values of the Tajima’s D statistic) compared to what is expected under neutrality, or to empirical data for presumed neutral parts of the genome. Alternatively, in cases of balancing selection this test detects an excess of alleles of intermediate frequency (indicated by positive values of D). Other extensions of this type of test are based on similar principles (e.g., Fay and Wu, 2000). Selection can also differ between populations. This may happen due to adaptations to local environment, in which case the level of population differentiation may increase for loci under different selection in different populations. The Lewontin–Krakauer (1973) test was one of the first neutrality tests to use the level of genetic differentiation between populations. This test rejects neutral evolution when differentiation between populations at a specific locus is larger than that expected under a neutral model, or is outside the empirical range observed in presumed neutral regions of the genome. Other tests have been developed under the same principle (reviewed in Nielsen (2005)). Notably, Akey et al. (2002) used genome-wide data to look at the variation in FST (a traditional measure of allele-frequency difference between populations) in humans. Various statistical methods can use genome-wide scans to detect selective sweeps without prior hypotheses on candidate genes or regions under selection (Nielsen, 2005). Another feature of selected genomic regions is an increase in the level of linkage disequilibrium (LD). When a new allele arises it is physically linked to alleles at other loci nearby on the same chromosome. This generates a nonrandom association of alleles known as LD, and initially, any new allele will be in complete LD with other nearby alleles.
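To make the Tajima’s D statistic discussed above concrete, the sketch below computes it from per-site derived-allele counts using the standard constants from Tajima (1989). It is a minimal illustration in Python rather than a reference implementation: the function name and input format are assumptions, and the two toy data sets are hypothetical extremes chosen only to show the sign of D.

```python
import math


def tajimas_d(derived_counts, n):
    """Tajima's D for a sample of n chromosomes.

    derived_counts holds, for each site, the number of chromosomes carrying
    one of the two alleles. Returns None when D is undefined (no segregating
    sites, or fewer than four chromosomes).
    """
    seg = [c for c in derived_counts if 0 < c < n]
    s = len(seg)
    if s == 0 or n < 4:
        return None

    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(2.0 * c * (n - c) / (n * (n - 1)) for c in seg)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two diversity estimators, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical extremes in a sample of 10 chromosomes.
print(tajimas_d([1] * 10, 10))  # all singletons: negative D
print(tajimas_d([5] * 10, 10))  # all intermediate-frequency: positive D
```

An excess of rare variants lowers the pairwise diversity relative to the number of segregating sites and gives a negative D, while an excess of intermediate-frequency variants does the opposite.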
These combinations of alleles in a region of a chromosome are sometimes referred to as haplotypes. Over time, recombination breaks down the association between alleles at nearby loci, and the allele of interest will become associated with an increasing number of different haplotypes. LD also decreases with increasing genetic distance from the selected site as recombination shuffles allele combinations in proportion to distance away from it on the chromosome. Thus, the number and lengths of haplotypes associated with an allele of interest can act as a proxy for the age of that allele. If an allele has been positively selected then it will rise to high-frequency quicker than expected under genetic drift alone. It therefore follows that high-frequency alleles that are still in high LD with other nearby alleles (i.e., are recent in origin) are good candidates for selection. These principles have been used to develop a number of statistics for detecting recent and strong natural selection. For example, the extended haplotype test identifies tracts of homozygosity (identity in randomly drawn pairs of sequences) associated with a core haplotype using the ‘extended haplotype homozygosity’ (EHH) statistic (e.g., Voight et al., 2006). A haplotype containing an allele that has been positively selected is expected to display high EHH values and high frequencies. In contrast, haplotypes that reach high frequencies due to genetic drift are likely to have taken longer to reach those high frequencies and are therefore likely to have been subjected to more recombination and mutation events, and would consequently show lower values of EHH. A fundamental problem with some methods of detecting genomic signatures of natural selection is the bias generated when genetic data are obtained not through direct sequencing, but by genotyping SNPs that have been discovered in other samples. This is called ascertainment bias. 
Patterns observed in the data, such as allele frequencies, population differentiation, and LD, all depend on the procedure used to discover these SNPs. For example, ascertained SNPs will almost invariably be biased toward more common variants, thus skewing statistics based on the SFS. Additionally, many SNPs were identified in only a subset of populations, particularly Europeans. When the SNP-discovery protocol is known, statistical methods can be used to correct for ascertainment biases to some extent (e.g., Voight et al., 2006). More importantly, most methods for detecting selection are challenged by the confounding effects of demographic history (e.g., Przeworski, 2002; Currat et al., 2006). For example, Tajima’s D test can reject a neutral model of evolution if the population has undergone an expansion or was strongly structured (e.g., Przeworski, 2002) since an expansion can lead to an increase in the proportion of low-frequency variants, mirroring the effect of a selective sweep, and the population structure can lead to an increase in the proportion of intermediate-frequency variants, mirroring balancing selection. Allele surfing, a process whereby a rare allele is driven to high frequencies at the wave front of an expanding population, can also mirror a selective sweep (e.g., Currat et al., 2006). Alternatively, a preferential loss of low-frequency variants is expected to occur during a population bottleneck, thereby leading to an excess of intermediate-frequency variants, mirroring balancing selection. In consequence, failure to reject a neutral model of evolution for a locus might be the result of a particular demographic history.
Methods to account for the confounding effects of demography usually involve comparisons of distributions of genomic patterns of diversity at a locus of interest – such as selection–detection statistics – to those derived from the rest of the genome (which is presumed to be mostly neutral), or generation of distributions of statistics under specific demographic scenarios by simulation, and detection of outliers (e.g., Xue et al., 2009). These methods involve modeling specific demographic histories by computer simulation. Alternatively, a recently developed method for genome-wide data sets accounts for shared population history and gene flow by generating an empirical pattern of covariance in allele frequencies between populations from a set of markers (Coop et al., 2010). This is then used as a null model for identifying loci involved in local adaptation by looking at unusual correlations between allele frequencies and ecological variables (Coop et al., 2010). An alternative approach to detecting selection makes use of advances in ancient DNA analysis to directly assess the rate of allele-frequency change through time (Wilde et al., 2014; Sverrisdottir et al., 2014). While this approach currently requires the assumption of population continuity between temporally distinct DNA samples (Sverrisdottir et al., 2014), it does have some key advantages over other methods. First, it is a direct method to test for selection, unlike others based only on contemporary DNA samples. Secondly, given sufficient data it has the potential to detect episodic changes in selection intensity through time. Thirdly, unlike some approaches it is not dependent on estimates of mutation or recombination rates (Wilde et al., 2014).

The Origins of Modern Humans: The Genetic Record

Population genetics is grounded in mathematics and probability theory (e.g., Kimura, 1968). The field has however become fundamentally data driven since the discovery of the ABO blood group system.
Cavalli-Sforza and Edwards (1964) estimated genetic distances between five populations based on five different blood groups and inferred a tree in which Europeans were separated from an Afro-Asian lineage. A larger study analyzing 35 proteins linked Europeans and Asians to the exclusion of Africans, estimating a Europeans/Asians split 55 000 years ago and an older divergence from Africans 120 000 years ago (Nei and Roychoudry, 1974). As genetic data increased, multivariate methods, such as principal component analysis (PCA), started to be used to summarize patterns of variation for multiple genetic markers (including HLA and blood group protein markers) over space (Cavalli-Sforza et al., 1994). Spatial gradients of allele frequencies were often interpreted as corresponding to hypothesized linguistic or cultural expansions. It has however been shown that spatial patterns of genetic variation can arise even when no major demographic events have occurred (Novembre and Stephens, 2008) but simply because similarity between locations tends to decay with geographic distance, as predicted under an isolation by distance (IBD) model. One of the earliest influential genetic studies of human origins examined mtDNA variation from worldwide population sample of 147 individuals (Cann et al., 1987). The inferred maximum parsimony tree indicated that the deepest branches were in Africa and a molecular clock estimate placed the deepest split, that is, the time to the most recent common ancestor, at just less than 200 000 years ago. Despite the caveats of this study, notably the ‘African sample’ was actually a sample of African Americans, and the maximum parsimony analysis used mid-point rooting, instead of the more reliable out-group rooting, these results were corroborated some years later by a more robust study (Ingman et al., 2000). 
Since then, estimates of the time to the most recent common ancestor (TMRCA) for mtDNA and the Y chromosome have been highly influential in studies of human evolution. Estimates for mtDNA TMRCA currently range around 140–240 kya (e.g., Behar et al., 2012) while, until recently, estimates for the Y chromosome TMRCA ranged between 60 and 140 kya (e.g., Wei et al., 2013). These TMRCA estimates for mtDNA and the Y chromosome differ considerably and might initially seem contradictory. However, large differences are expected for two major reasons. First, the variance in expected TMRCAs is large compared to its mean, as predicted by coalescent theory (Kingman, 1982). Second, the mode of inheritance of mtDNA and the Y chromosome (maternally and paternally inherited, respectively) mean they may have different effective population sizes. Coalescent theory is a retrospective mathematical model of gene genealogies for a sample of DNA sequences under a defined population history (Kingman, 1982). Demographic processes shape the pattern in which those lineages (i.e., branches on a genealogical tree) connect to one another. We say those lineages converge or ‘coalesce’ when this process is looked at backward through time. Gene coalescence can be thought of as drift run backwards through the genealogical tree. However, while drift and coalescence are conceptually related, simulating coalescence backward through time is considerably more computationally efficient than simulating drift forward through time, because it only considers the sample, not the whole population. Coalescent theory shows that the expected TMRCA is 2Ne in generations, and the final time interval, when the remaining two lineages coalesce (i.e., join) into the MRCA, represents more than half of the variance in the TMRCA. This is because coalescences take longer when there are fewer lineages and those times will tend to strongly affect the shape of the genealogy. 
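The claim that the last coalescence dominates the TMRCA can be checked with a short calculation under the standard (Kingman) coalescent. The sketch below is a minimal Python illustration under stated assumptions: a constant-size, randomly mating population, with time measured in coalescent units rather than generations (the conversion to generations depends on the model's scaling and is not attempted here). The function names are hypothetical.

```python
def interval_moments(n):
    """Mean and variance of each coalescent waiting time for a sample of n.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate k(k-1)/2 in coalescent time units, so its mean
    is 2 / (k(k-1)) and its variance is the square of that mean.
    """
    means = {k: 2.0 / (k * (k - 1)) for k in range(2, n + 1)}
    variances = {k: m * m for k, m in means.items()}
    return means, variances


def final_interval_shares(n):
    """Share of the TMRCA's mean and variance due to the last two lineages.

    Waiting times are independent, so means and variances add.
    """
    means, variances = interval_moments(n)
    mean_share = means[2] / sum(means.values())
    var_share = variances[2] / sum(variances.values())
    return mean_share, var_share


for n in (5, 10, 100):
    mean_share, var_share = final_interval_shares(n)
    print(f"n={n}: final interval is {mean_share:.2f} of the mean "
          f"and {var_share:.2f} of the variance of the TMRCA")
```

Under these assumptions the final interval accounts for more than half of both the expected TMRCA and its variance for any sample size, which is why TMRCA estimates from a single nonrecombining locus are so variable.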
These high expected variances in TMRCAs have been highlighted by the recent discovery of a Y chromosome lineage branching at the basal portion of the Y chromosome genealogical tree (Mendez et al., 2013). This Y chromosome lineage was identified in individuals of West African origin and recent West African descent, and was labeled A00 (Mendez et al., 2013). The updated Y chromosome genealogical tree provided a TMRCA estimate of 338 kya (95% confidence interval (237–581) kya) (Mendez et al., 2013). This estimate predates not only current mtDNA TMRCA estimates but also the earliest anatomically modern human fossils. This demonstrates that interpretation of TMRCA date estimates for a single nonrecombining region of the genome should be treated with caution – they do not necessarily represent the founding dates of a population or species. Furthermore, because of the ubiquity of past migration and the stochastic process inherent to gene genealogies, it is difficult to infer the geographic origin of a genealogical lineage from its current distribution (Beaumont et al., 2010). Regardless of the TMRCA estimates, studies of mtDNA variation (Cann et al., 1987; Ingman et al., 2000) and of nonrecombining regions of the Y chromosome variation (e.g., Ke et al., 2001) in modern populations are consistent with, and have been interpreted as strongly supporting a recent African origin of our species. This relies on the observation that Africa is the source of the deepest lineages and harbors the greatest diversity. The genetic diversity of both systems is characteristic of a rapidly expanding population or one that was subject to positive selection, that is, long terminal branches and excess of low-frequency polymorphisms. A similar signal was also identified in various autosomal regions (e.g., Voight et al., 2005).
A demographic process where modern humans expanded out of Africa between 50 000 and 100 000 years ago and replaced other archaic humans has been shown to better explain the general pattern of genetic diversity observed (e.g., Fagundes et al., 2007). Although classification systems vary, the term archaic humans refers to any Homo-related fossil remains that are not those of anatomically modern humans (Homo sapiens sapiens). Traditionally, and prior to the development of molecular genetics techniques, information on the morphology and behavior of our ancestors and the related species at different times came from fossil and archaeological evidence. One of the most widely known archaic human groups is Neanderthal (Homo neanderthalensis). It represents a morphologically distinct group, with robust morphology, large brains, and prominent brow ridges. These fossils date to between 250 000 and 39 000 years ago (Higham et al., 2014) and have been found in Europe and western and central Asia. In contrast, the earliest widely accepted fully modern human skull is Omo I, from Ethiopia, and dates to 195 000 years ago (McDougall et al., 2005). Recently, the genomes of two archaic human forms, Neanderthals (Green et al., 2010) and Denisovans (Reich et al., 2010), have been partially sequenced, thereby providing some estimates of the relationships of ancient and modern humans. This was performed by first focusing on biallelic SNPs where two present-day humans carry different alleles and the archaic human genome carried the derived allele, that is, not matching chimpanzee (Green et al., 2010; Reich et al., 2010). Then a measure called D(H1, H2, A, chimpanzee) was computed to assess whether the derived allele in the archaic human (A) genome matched the modern human genome H1 more often than the modern human genome H2. D is positive if the archaic genome matches H1 more often and negative if it matches H2 more often.
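The site-counting logic behind D(H1, H2, A, chimpanzee) can be sketched as follows. This is a minimal Python illustration that follows the sign convention stated in the text above (published implementations may order the populations differently); the function name, the single-sequence-per-genome input format, and the toy sequences are hypothetical simplifications.

```python
def d_statistic(h1, h2, archaic, chimp):
    """D computed over aligned sequences, one character per site.

    Only biallelic sites are used where the two modern humans differ and
    the archaic genome carries the derived (non-chimpanzee) allele.
    Returns None if no site is informative.
    """
    match_h1 = match_h2 = 0
    for a1, a2, arch, anc in zip(h1, h2, archaic, chimp):
        if a1 == a2:                 # the two humans must carry different alleles
            continue
        if arch == anc:              # the archaic allele must be derived
            continue
        if {a1, a2} != {arch, anc}:  # keep strictly biallelic sites only
            continue
        if a1 == arch:
            match_h1 += 1
        else:
            match_h2 += 1
    total = match_h1 + match_h2
    if total == 0:
        return None
    return (match_h1 - match_h2) / total


# Hypothetical three-site alignment: the archaic genome shares the derived
# allele with H1 at two sites and with H2 at one site.
print(d_statistic("GGA", "CCG", "GGG", "CCA"))  # (2 - 1) / 3
```

Under a simple model with no gene flow the two match counts are expected to be equal, so D is expected to be near zero; a consistent excess in one direction is what is taken as evidence of archaic ancestry.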
This measure led to an estimate of 1–4% Neanderthal ancestry in non-African genomes (Green et al., 2010) and of 4–6% Denisovan ancestry in Melanesian genomes (Reich et al., 2010). It thus appears that Neanderthals and Denisovans did contribute some ancestry to non-African modern humans.

Adaptations and Detection of Positive Selection

We now inhabit radically different ecological (e.g., hot/cold and tundra/forest), cultural (e.g., variety of food resources and of their uses, and social and mating systems), and demographic (e.g., high population densities and rates of gene flow) environments worldwide. Environmental changes have triggered human adaptation, a process that involves physiological, biochemical, and behavioral adjustments. Genetic adaptations are additional responses of humans to changing environments. Genome-wide scans have the potential to generate lists of putatively selected genes that can be further studied from a functional perspective. The most striking examples of signals of selection include those identified in genomic regions involved in human pigmentation (e.g., Wilde et al., 2014) and those for genetic polymorphisms conferring some resistance to malaria (Kwiatkowski, 2005) and other infectious diseases (e.g., Fumagalli et al., 2012). Both emphasize that the various environments we have evolved in, whether in the form of climate factors such as ultraviolet exposure (e.g., Hancock et al., 2010) or of pathogens (e.g., Fumagalli et al., 2012), have driven components of our evolution. In this respect, correlations between patterns of genetic variation and environmental variables such as temperature, precipitation, solar radiation, latitude, or elevation have been identified (e.g., Hancock et al., 2010). However, identifying the precise selective pressure and assessing the functional advantage of candidate genetic variants remain a challenge.
EDAR

This challenge is well exemplified by genes involved in hair follicle development, which have shown higher levels of population differentiation than expected under neutrality. In particular, the ectodysplasin-A receptor gene (EDAR; MIM 604095), located on chromosome 2q12.3, plays a central role in the generation of the primary hair follicle pattern. In humans a derived allele in the EDAR gene, called EDAR370A, results in a valine-to-alanine substitution at position 370 of the protein sequence. This allele is associated with thicker hair in East Asian populations (Kamberov et al., 2013) and has shown a strong signal of positive selection in tests based on the SFS (e.g., Kelley et al., 2006), haplotype structure (Voight et al., 2006), and population differentiation (e.g., Xue et al., 2009). Using haplotype analysis, the EDAR370A allele has been dated to between 1133 and 73 996 years ago (Bryk et al., 2008), but the reasons for strong selection are not yet known.

The only selection hypothesis explicitly formulated so far is that this allele may have been beneficial under Asian climate conditions some 25 000 to 10 000 years ago, a time when the climate in eastern and northern Asia was significantly colder and drier than at present. This hypothesis proposes that the EDAR370A allele contributed to increased lubrication and humidification of exposed surfaces (face and scalp) during the drier and colder Ice Age in East Asia (Chang et al., 2009). This has, however, not been supported by a recent study reporting an EDAR370A mouse model (Kamberov et al., 2013). In this study EDAR370A mice showed (1) increased hair thickness; (2) a higher mammary branch density and smaller mammary fat pad area; and (3) an increase in eccrine sweat gland density (Kamberov et al., 2013).
This study also showed that EDAR370A was associated with (4) tooth variation (including single and double shoveling of the upper incisors); and (5) a higher active eccrine sweat gland density in Han Chinese (Kamberov et al., 2013). A spatially explicit forward-in-time simulation model, integrating the origins and spread of farming populations and fitted to observed modern allele frequencies, suggested that EDAR370A arose in East Asia around 30 000 years ago (95% credible interval 13 175–39 575 years ago) with a selection coefficient among the highest estimated for the human genome (0.122, with a 95% credible interval of 0.030–0.186). This nonetheless does not explicitly tell us which of the pleiotropic effects of EDAR370A was, or were, the actual target of natural selection. For example, as EDAR affects the development of sweat glands and hair morphology, it may be hypothesized that EDAR370A alters thermoregulation, and possibly mate preference. Alternatively, mammary gland branching and/or fat pad size may have been the adaptive phenotype, either because of associated benefits for lactation (not yet assessed) or because of increased mate preferences (Dixson et al., 2011). Identifying the selective pressure on EDAR370A and the target phenotype of this selection is made difficult by the various developmental pathways in which EDAR is involved.

LCT

Pathways involved in nutrient intake, digestion, and metabolism are essential to energy production and growth, and consequently to development and survival. Since the range expansions out of Africa, our species has undergone several dietary transitions, including a shift from a diet based on animal hunting and a broad range of food gathering to a diet based on farming domesticated plants and animals (Luca et al., 2010). This Neolithic transition had profound effects on our diet, particularly dietary breadth and carbohydrate content.
The Neolithic transition dates back to about 11 000 years ago in its core regions and marks the transition from foraging (i.e., hunter-gathering) to food-producing (i.e., farming) societies. It is associated with changes that include an increasingly sedentary lifestyle; the development of alternative economies that focus on animal and/or plant domesticates; and technical innovations that include polished stone tools and pottery. The dietary breadth of farming populations is known to be narrower than that of hunter-gatherers (Luca et al., 2010).

Patterns of genetic variation in the gene coding for lactase (LCT) provide what is probably the strongest evidence of genetic adaptation to dietary specialization (Holden and Mace, 1997; Gerbault et al., 2011). Lactase is the enzyme that digests the milk sugar lactose. It is expressed in young mammals, but its expression usually decreases after the weaning period is over, a trait known as lactase nonpersistence. However, in certain human populations many individuals continue to express lactase throughout adulthood, a trait termed lactase persistence. Large differences in the frequency of lactase persistence among populations have been known for some time (Holden and Mace, 1997). More recently, at least four independent alleles associated with lactase persistence have been identified, indicating convergent evolution in different geographic locations (Enattah et al., 2007; Gerbault et al., 2011). In Europe and southern and central Asia, a single allele (–13 910*T) predominates, whereas in the Middle East and Africa three additional alleles (–13 915*G, –13 907*G, and –14 010*C) are commonly found in lactase-persistent individuals. Furthermore, the correlation between the distribution of lactase persistence and the distribution of pastoralism/dairying (Holden and Mace, 1997) provides strong support for a gene-culture coevolutionary process.
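To make the notion of a selection coefficient concrete, such as the value of 0.122 estimated for EDAR370A above, the deterministic change in the frequency of a favored allele can be sketched as follows. This uses the textbook haploid (genic) selection recursion p' = p(1 + s)/(1 + sp) purely as an illustration. It ignores drift, dominance, migration, and spatial structure, so it is not the spatially explicit simulation modeling cited in the text; the function name, starting frequency, and number of generations are arbitrary assumptions.

```python
def allele_frequency_trajectory(p0, s, generations):
    """Deterministic frequency of a favored allele under genic selection.

    p0: starting allele frequency (assumed value, strictly between 0 and 1)
    s: selection coefficient (relative fitness advantage of the allele)
    generations: number of generations to iterate
    Returns a list of length generations + 1, starting with p0.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if s <= -1.0:
        raise ValueError("s must be greater than -1")

    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Carriers have relative fitness 1 + s; mean fitness is 1 + s * p.
        p = p * (1.0 + s) / (1.0 + s * p)
        freqs.append(p)
    return freqs


# Illustrative run: a rare allele with s = 0.122 followed for 100 generations.
trajectory = allele_frequency_trajectory(p0=0.001, s=0.122, generations=100)
print(f"start: {trajectory[0]:.3f}, after 100 generations: {trajectory[-1]:.3f}")
```

Under this simple model the odds p/(1 - p) are multiplied by (1 + s) every generation, which is why coefficients of a few percent or more can carry an allele from rarity to high frequency within the time spans discussed here.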
Selection coefficients on the –13 910*T allele have been estimated to be between 1.4 and 19% using extended haplotype homozygosity (Bersaglieri et al., 2004), between 5.2 and 15.9% using spatially explicit simulation modeling (Itan et al., 2009), and around 2.4% using ancient DNA (Sverrisdottir et al., 2014); these values are among the highest estimated for the human genome over the past 30 000 years. Furthermore, the age estimates for the –13 910*T allele are relatively recent: 2188–20 650 years ago using extended haplotype variation (Bersaglieri et al., 2004), 7475–10 250 years ago using closely linked microsatellite variation (Mulcare, 2006), and 6256–8683 years ago using simulation modeling (Itan et al., 2009). These age estimates bracket dates for the domestication of milkable animals and dairying.

Our knowledge of the evolution of lactase persistence in Europe has benefitted considerably from archaeological data, notably from the age of dairy animals at death (e.g., Vigne, 2008) and lipid residue analyses from potsherds (Evershed et al., 2008). Both have shown that dairying was an early feature of European Neolithic economies. Ancient DNA data on the occurrence of the –13 910*T allele, considered in combination with archaeological data on milk use, indicate that dairying was practiced before lactase persistence arose or became common (Burger et al., 2007).

Conclusion

The propensity of our species to colonize a wide range of environments illustrates our species' plasticity and adaptability. While much of that plasticity and adaptability is underwritten by our capacity for cumulative culture (Powell et al., 2009), it now seems certain that some biological adaptation has occurred. Our ability to detect the genomic signatures of natural selection has improved dramatically in the last 10 years.
This has been simultaneously driven by advances in the molecular techniques used to generate genetic data, by improvements in ancient DNA technologies, and, critically, by innovations in statistical inference and computer simulation modeling. However, challenges remain in disentangling the relative effects of mutation, recombination, migration, population structure, genetic drift, and natural selection on patterns of genetic diversity. Strong evidence for the adaptive status of a trait entails (1) evidence of differential fertility or mortality dependent on particular genetic variation; (2) evidence from in vitro and/or in vivo studies of functional differences between genotypes that affect the reproductive success of their carriers; and (3) evidence of concordance between the distribution of a genetic trait and the environmental factors that drive selective pressures. Because providing these three types of evidence remains a challenging task, few strong examples of natural selection exist. We have presented two of them, those involving the EDAR and LCT genes.

As molecular techniques improve and become easier and cheaper to perform, more genetic information accumulates, bringing new challenges in handling and analyzing so much data (Pool et al., 2010). With improvements in our understanding of metabolic networks and biological pathways, it becomes increasingly evident that mutations leading to changes in a gene product can affect phenotypes in multiple and subtle ways. This makes it difficult to identify the targets of selective pressures and their potential adaptive consequences. Integrative computational modeling conditioned on multiple data types offers one solution to this problem, and provides a robust means of testing evolutionary hypotheses.
See also: Adaptation, Fitness, and Evolution; Darwinism; Evolution, History of; Genetics and Anthropology; Human Behavioral Ecology; Microevolution.

Bibliography

Akey, J.M., Zhang, G., Zhang, K., Jin, L., Shriver, M.D., 2002. Interrogating a high-density SNP map for signatures of natural selection. Genome Research 12, 1805–1814.
Beaumont, M.A., Nielsen, R., Robert, C., Hey, J., Gaggiotti, O.E., Knowles, L., Estoup, A., Panchal, M., Corander, J., Hickerson, M., Sisson, S.A., Fagundes, N.J., Chikhi, L., Beerli, P., Vitalis, R., Cornuet, J.M., Huelsenbeck, J.P., Foll, M., Yang, Z., Rousset, F., Balding, D., Excoffier, L., 2010. In defense of model-based inference in phylogeography. Molecular Ecology 19 (3), 436–446.
Behar, D.M., Van Oven, M., Rosset, S., Metspalu, M., Loogvali, E.L., Silva, N.M., Kivisild, T., Torroni, A., Villems, R., 2012. A “Copernican” reassessment of the human mitochondrial DNA tree from its root. American Journal of Human Genetics 90, 675–684.
Bersaglieri, T., Sabeti, P.C., Patterson, N., Vanderploeg, T., Schaffner, S.F., Drake, J.A., Rhodes, M., Reich, D.E., Hirschhorn, J.N., 2004. Genetic signatures of strong recent positive selection at the lactase gene. American Journal of Human Genetics 74, 1111–1120.
Burger, J., Kirchner, M., Bramanti, B., Haak, W., Thomas, M.G., 2007. Absence of the lactase-persistence-associated allele in early Neolithic Europeans. Proceedings of the National Academy of Sciences of the United States of America 104, 3736–3741.
Bryk, J., Hardouin, E., Pugach, I., Hughes, D., Strotmann, R., Stoneking, M., Myles, S., 2008. Positive selection in East Asians for an EDAR allele that enhances NF-kappaB activation. PLoS ONE 3, e2209.
Cann, R.L., Stoneking, M., Wilson, A.C., 1987. Mitochondrial DNA and human evolution. Nature 325, 31–36.
Cavalli-Sforza, L., Edwards, A.W., 1964. Analysis of human evolution. Genetics Today 3, 923–933.
Cavalli-Sforza, L., Menozzi, P., Piazza, A., 1994. The History and Geography of Human Genes.
Princeton University Press, Princeton, New Jersey, USA.
Chang, S.H., Jobling, S., Brennan, K., Headon, D.J., 2009. Enhanced EDAR signalling has pleiotropic effects on craniofacial and cutaneous glands. PLoS ONE 4, e7591.
Coop, G., Witonsky, D., Di Rienzo, A., Pritchard, J.K., 2010. Using environmental correlations to identify loci underlying local adaptation. Genetics 185, 1411–1423.
Currat, M., Excoffier, L., Maddison, W., Otto, S.P., Ray, N., Whitlock, M.C., Yeaman, S., 2006. Comment on “Ongoing adaptive evolution of ASPM, a brain size determinant in Homo sapiens” and “Microcephalin, a gene regulating brain size, continues to evolve adaptively in humans”. Science 313 (5784), 172; author reply 172.
Dixson, B.J., Vasey, P.L., Sagata, K., Sibanda, N., Linklater, W.L., Dixson, A.F., 2011. Men’s preferences for women’s breast morphology in New Zealand, Samoa, and Papua New Guinea. Archives of Sexual Behavior 40, 1271–1279.
Enattah, N.S., Trudeau, A., Pimenoff, V., Maiuri, L., Auricchio, S., Greco, L., Rossi, M., Lentze, M., Seo, J.K., Rahgozar, S., Khalil, I., Alifrangis, M., Natah, S., Groop, L., Shaat, N., Kozlov, A., Verschubskaya, G., Comas, D., Bulayeva, K., Mehdi, S.Q., Terwilliger, J.D., Sahi, T., Savilahti, E., Perola, M., Sajantila, A., Jarvela, I., Peltonen, L., 2007. Evidence of still-ongoing convergence evolution of the lactase persistence T-13910 alleles in humans. American Journal of Human Genetics 81, 615–625.
Evershed, R.P., Payne, S., Sherratt, A.G., Copley, M.S., Coolidge, J., Urem-Kotsu, D., Kotsakis, K., Ozdogan, M., Ozdogan, A.E., Nieuwenhuyse, O., Akkermans, P.M., Bailey, D., Andeescu, R.R., Campbell, S., Farid, S., Hodder, I., Yalman, N., Ozbasaran, M., Bicakci, E., Garfinkel, Y., Levy, T., Burton, M.M., 2008. Earliest date for milk use in the Near East and southeastern Europe linked to cattle herding. Nature 455, 528–531.
Fagundes, N.J., Ray, N., Beaumont, M., Neuenschwander, S., Salzano, F.M., Bonatto, S.L., Excoffier, L., 2007.
Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America 104, 17614–17619.
Fay, J.C., Wu, C.I., 2000. Hitchhiking under positive Darwinian selection. Genetics 155, 1405–1413.
Fumagalli, M., Sironi, M., Pozzoli, U., Ferrer-Admetlla, A., Pattini, L., Nielsen, R., 2012. Signatures of environmental genetic adaptation pinpoint pathogens as the main selective pressure through human evolution. PLoS Genetics 7, e1002355.
Gerbault, P., Liebert, A., Itan, Y., Powell, A., Currat, M., Burger, J., Swallow, D.M., Thomas, M.G., 2011. Evolution of lactase persistence: an example of human niche construction. Philosophical Transactions of the Royal Society London B: Biological Sciences 366, 863–877.
Green, R.E., Krause, J., Briggs, A.W., Maricic, T., Stenzel, U., Kircher, M., Patterson, N., Li, H., Zhai, W., Fritz, M.H., Hansen, N.F., Durand, E.Y., Malaspinas, A.S., Jensen, J.D., Marques-Bonet, T., Alkan, C., Prufer, K., Meyer, M., Burbano, H.A., Good, J.M., Schultz, R., Aximu-Petri, A., Butthof, A., Hober, B., Hoffner, B., Siegemund, M., Weihmann, A., Nusbaum, C., Lander, E.S., Russ, C., Novod, N., Affourtit, J., Egholm, M., Verna, C., Rudan, P., Brajkovic, D., Kucan, Z., Gusic, I., Doronichev, V.B., Golovanova, L.V., Lalueza-Fox, C., De La Rasilla, M., Fortea, J., Rosas, A., Schmitz, R.W., Johnson, P.L., Eichler, E.E., Falush, D., Birney, E., Mullikin, J.C., Slatkin, M., Nielsen, R., Kelso, J., Lachmann, M., Reich, D., Paabo, S., 2010. A draft sequence of the Neanderthal genome. Science 328, 710–722.
Hancock, A.M., Alkorta-Aranburu, G., Witonsky, D.B., Di Rienzo, A., 2010. Adaptations to new environments in humans: the role of subtle allele frequency shifts. Philosophical Transactions of the Royal Society London B: Biological Sciences 365, 2459–2468.
Higham, T., Douka, K., Wood, R., Ramsey, C.B., Brock, F., Basell, L., Camps, M., Arrizabalaga, A., Baena, J., Barroso-Ruiz, C., Bergman, C., Boitard, C., Boscato, P., Caparros, M., Conard, N.J., Draily, C., Froment, A., Galvan, B., Gambassini, P., Garcia-Moreno, A., Grimaldi, S., Haesaerts, P., Holt, B., Iriarte-Chiapusso, M.J., Jelinek, A., Jorda Pardo, J.F., Maillo-Fernandez, J.M., Marom, A., Maroto, J., Menendez, M., Metz, L., Morin, E., Moroni, A., Negrino, F., Panagopoulou, E., Peresani, M., Pirson, S., De La Rasilla, M., Riel-Salvatore, J., Ronchitelli, A., Santamaria, D., Semal, P., Slimak, L., Soler, J., Soler, N., Villaluenga, A., Pinhasi, R., Jacobi, R., 2014. The timing and spatiotemporal patterning of Neanderthal disappearance. Nature 512, 306–309.
Holden, C., Mace, R., 1997. Phylogenetic analysis of the evolution of lactose digestion in adults. Human Biology 69, 605–628.
Hudson, R.R., Kreitman, M., Aguade, M., 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116, 153–159.
Ingman, M., Kaessmann, H., Paabo, S., Gyllensten, U., 2000. Mitochondrial genome variation and the origin of modern humans. Nature 408, 708–713.
Itan, Y., Powell, A., Beaumont, M.A., Burger, J., Thomas, M.G., 2009. The origins of lactase persistence in Europe. PLoS Computational Biology 5, e1000491.
Kamberov, Y.G., Wang, S., Tan, J., Gerbault, P., Wark, A., Tan, L., Yang, Y., Li, S., Tang, K., Chen, H., Powell, A., Itan, Y., Fuller, D., Lohmueller, J., Mao, J., Schachar, A., Paymer, M., Hostetter, E., Byrne, E., Burnett, M., McMahon, A.P., Thomas, M.G., Lieberman, D.E., Jin, L., Tabin, C.J., Morgan, B.A., Sabeti, P.C., 2013. Modeling recent human evolution in mice by expression of a selected EDAR variant. Cell 152, 691–702.
Ke, Y., Su, B., Song, X., Lu, D., Chen, L., Li, H., Qi, C., Marzuki, S., Deka, R., Underhill, P., Xiao, C., Shriver, M., Lell, J., Wallace, D., Wells, R.S., Seielstad, M., Oefner, P., Zhu, D., Jin, J., Huang, W., Chakraborty, R., Chen, Z., Jin, L., 2001. African origin of modern humans in East Asia: a tale of 12,000 Y chromosomes. Science 292, 1151–1153.
Kelley, J.L., Madeoy, J., Calhoun, J.C., Swanson, W., Akey, J.M., 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Research 16, 980–989.
Kimura, M., 1968. Evolutionary rate at the molecular level. Nature 217, 624–626.
Kimura, M., 1971. Theoretical foundation of population genetics at the molecular level. Theoretical Population Biology 2, 174–208.
Kingman, J.F.C., 1982. The coalescent. Stochastic Processes and Their Applications 13, 235–248.
Kwiatkowski, D.P., 2005. How malaria has affected the human genome and what human genetics can teach us about malaria. American Journal of Human Genetics 77, 171–192.
Lewontin, R.C., Krakauer, J., 1973. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74, 175–195.
Luca, F., Perry, G.H., Di Rienzo, A., 2010. Evolutionary adaptations to dietary changes. Annual Review of Nutrition 30, 291–314.
McDougall, I., Brown, F.H., Fleagle, J.G., 2005. Stratigraphic placement and age of modern humans from Kibish, Ethiopia. Nature 433, 733–736.
Mendez, F.L., Krahn, T., Schrack, B., Krahn, A.M., Veeramah, K.R., Woerner, A.E., Fomine, F.L., Bradman, N., Thomas, M.G., Karafet, T.M., Hammer, M.F., 2013. An African American paternal lineage adds an extremely ancient root to the human Y chromosome phylogenetic tree. American Journal of Human Genetics 92, 454–459.
Mulcare, C.A., 2006. The Evolution of the Lactase Persistence Phenotype. Genetic, Evolution and Environment. University of London, London.
Nei, M., Roychoudhury, A.K., 1974.
Genic variation within and between the three major races of man, Caucasoids, Negroids, and Mongoloids. American Journal of Human Genetics 26 (4), 421.
Nielsen, R., 2005. Molecular signatures of natural selection. Annual Review of Genetics 39, 197–218.
Novembre, J., Stephens, M., 2008. Interpreting principal component analyses of spatial population genetic variation. Nature Genetics 40, 646–649.
Pool, J.E., Hellmann, I., Jensen, J.D., Nielsen, R., 2010. Population genetic inference from genomic sequence variation. Genome Research 20, 291–300.
Powell, A., Shennan, S., Thomas, M.G., 2009. Late Pleistocene demography and the appearance of modern human behavior. Science 324, 1298–1301.
Przeworski, M., 2002. The signature of positive selection at randomly chosen loci. Genetics 160, 1179–1189.
Reich, D., Green, R.E., Kircher, M., Krause, J., Patterson, N., Durand, E.Y., Viola, B., Briggs, A.W., Stenzel, U., Johnson, P.L., Maricic, T., Good, J.M., Marques-Bonet, T., Alkan, C., Fu, Q., Mallick, S., Li, H., Meyer, M., Eichler, E.E., Stoneking, M., Richards, M., Talamo, S., Shunkov, M.V., Derevianko, A.P., Hublin, J.J., Kelso, J., Slatkin, M., Paabo, S., 2010. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468, 1053–1060.
Rice, S.H., 2004. Effective Population Size. Evolutionary Theory: Mathematical and Conceptual Foundations. Sinauer Associates, Inc., Sunderland, MA, USA.
Sverrisdottir, O.O., Timpson, A., Toombs, J., Lecoeur, C., Froguel, P., Carretero, J.M., Ferreras, A., Arsuaga Gotherstrom, J.L., Thomas, M.G., 2014. Direct estimates of natural selection in Iberia indicate calcium absorption was not the only driver of lactase persistence in Europe. Molecular Biology and Evolution 31, 975–983.
Tajima, F., 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595.
Vigne, J.D., 2008.
Zooarchaeological aspects of the neolithic diet transition in the Near East and Europe, and their putative relationships with the neolithic demographic transition. In: Bocquet-Appel, J.P., Bar-Yosef, O. (Eds.), The Neolithic Demographic Transition and Its Consequences. Springer, New York.
Voight, B.F., Adams, A.M., Frisse, L.A., Qian, Y., Hudson, R.R., Di Rienzo, A., 2005. Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes. Proceedings of the National Academy of Sciences of the United States of America 102, 18508–18513.
Voight, B.F., Kudaravalli, S., Wen, X., Pritchard, J.K., 2006. A map of recent positive selection in the human genome. PLoS Biology 4, e72.
Wei, W., Ayub, Q., Chen, Y., McCarthy, S., Hou, Y., Carbone, I., Xue, Y., Tyler-Smith, C., 2013. A calibrated human Y-chromosomal phylogeny based on resequencing. Genome Research 23, 388–395.
Wilde, S., Timpson, A., Kirsanow, K., Kaiser, E., Kayser, M., Unterlander, M., Hollfelder, N., Potekhina, I.D., Schier, W., Thomas, M.G., Burger, J., 2014. Direct evidence for positive selection of skin, hair, and eye pigmentation in Europeans during the last 5,000 y. Proceedings of the National Academy of Sciences of the United States of America 111, 4832–4837.
Wright, S., 1931. Evolution in Mendelian populations. Genetics 16, 97–159.
Xue, Y., Zhang, X., Huang, N., Daly, A., Gillson, C.J., MacArthur, D.G., Yngvadottir, B., Nica, A.C., Woodwark, C., Chen, Y., Conrad, D.F., Ayub, Q., Mehdi, S.Q., Li, P., Tyler-Smith, C., 2009. Population differentiation as an indicator of recent positive selection in humans: an empirical evaluation. Genetics 183, 1065–1077.
Zuckerkandl, E., Pauling, L., 1965. Molecules as documents of evolutionary history. Journal of Theoretical Biology 8, 357–366.