THE GENETIC POPULATION STRUCTURE OF MARINE SPECIES IN RELATION TO THEIR GENERAL BIOLOGY

J. Mork, TBS
Lecture notes in population genetics at MNK BI 260, spring 2001

CONTENTS:

INTRODUCTION

A. DEFINITIONS AND TOPICS
   I. Species
   II. Population
   III. Evolution
   IV. Genetic population structure
   V. The four evolutionary forces
   VI. The Hardy-Weinberg theorem
   VII. Estimating gene frequencies
   VIII. How to test for H-W genotypic proportions

B. GENETIC DIFFERENTIATION
   I. How to test for differences in allelic proportions
   II. Measures of general differentiation
   IIa. Relative measures (Wright's Fst, Nei's Gst)
   IIb. Absolute measures (Nei's I and D)
   III. Genetically effective population size
   IV. Genetic drift
   V. Fitness- and selection coefficients
   VI. Gene flow
   VII. Genetic equilibrium situation
   VIII. Marine, versus anadromous and limnic species' pop. structure

C. SPECIFIC BIOLOGY AND GENETIC STRUCTURE

D. TYPES OF MILIEU ADAPTATIONS

Referenced articles

INTRODUCTION

Many aquatic organisms are notoriously difficult study objects. This applies in particular to marine species in the oceanic environment.
Their generally low accessibility, and hence the difficulty of obtaining direct observations of behaviour, interactions and distribution, has often made e.g. taxonomic problems hard to solve. Below the species level, the identification and delineation of subtle population structures has proved a rather hazardous exercise when attacked with traditional methodology (general biology, morphology, migrations etc.).

Several characteristic effects of a marine way of life lead to these difficulties:

1. The variances in growth patterns and other biological characteristics are larger than for terrestrial organisms. Fish are, for example, phenotypically more variable than terrestrial vertebrates (Mayr 1969).
2. The strong influence of milieu regimes may mask genetic differences between groups.
3. In poikilotherms, ocean current transport and different temperature regimes can create temporal and spatial ecotypes which are easily misclassified as genetic "races".
4. Migrations are often on a large geographic scale and include areas not available for study.
5. Recaptures from tagging experiments often depend on commercial catches and are therefore restricted to a narrow range of species.
6. Animal communities are still unexplored in large areas of the oceans and the seafloor.
7. Large depths, an environment hostile to man, and non-optimal sampling techniques may result in inferior experimental designs for traditional studies of demographics and dynamics.

Thus both the characteristic traits of marine organisms and the difficulty of obtaining reliable measurements of those traits are obstacles to achieving good species and population descriptions with classical tools.

However, modern population genetic techniques offer alternative approaches in taxonomy and population descriptions. Genetic characteristics of species and populations include stable traits which are not affected by short-term changes in the environment.
Furthermore, genes are present in all individuals and at all stages of development. This makes them suited not only for taxonomic purposes, but also for delineating the population structure that may exist within species.

This compendium contains a brief introduction to population genetics: its basic theory, a few statistical tools, and some general patterns resulting from applications in marine biology.

A. DEFINITIONS AND TOPICS

I. Species: A structure of populations. These populations may in practice be wholly or partially reproductively isolated from each other, but interbreeding between them would produce fertile offspring. The structure as a whole is totally reproductively isolated from the similar population groups that constitute other species, and would not produce fertile offspring if interbred with them.

II. Population: An intraspecific group of individuals which share a common gene pool, and which is wholly or partially reproductively isolated from other such groups within the species. Populations are the real evolutionary units. It is populations that give rise to new species if their gene frequencies become sufficiently changed by reproductive isolation from each other (i.e. little or no gene flow) over extended evolutionary periods.

III. Evolution: Any change in gene frequency (although it does not cover all aspects of evolution, this definition is very useful in population genetics).

IV. Genetic population structure: The distribution of genetic variation within and between populations within a species, i.e. the relative parts of the total genetic variability within a species that are manifested as genetic differences between populations and between individuals. Strongly structured species show large genetic differences between populations, while a totally unstructured species would consist of only one population.

V. The 4 evolutionary forces acting on gene frequencies in populations:

1. Mutations within populations
2. Genetic drift within populations (see B. III and IV)
3. Gene flow between populations
4. Natural selection within populations (see B. V)

Mutations and genetic drift favour evolutionary change (they increase the genetic differences between populations and higher taxa), while gene flow inhibits evolution by counteracting the development of gene frequency differences between populations and levelling out those that may exist. Natural selection can favour or inhibit evolution, depending on whether the selection regime is local or universal.

VI. The Hardy-Weinberg theorem: "In a panmictic population (random mating) the expected proportions of genotypes at a polymorphic two-allele locus are determined by the frequencies (p and q) of the alleles according to the binomial formula $(p+q)^2 = p^2 + 2pq + q^2$. In the absence of effects from the evolutionary forces (1-4 above) the gene frequencies and genotype frequencies are constant over generations and can be used as population characteristics." With more than two alleles the principle is the same, but the expected genotype distribution is multinomial: $(p+q+r)^2$, $(p+q+r+s)^2$, etc. In general, the number of possible genotypes with n alleles is $n(n+1)/2$.

VII. The calculation of gene frequencies from an observed genotypic distribution: Consider a locus with 2 alleles, A and B, which by sexual reproduction are combined in (segregate to) the genotypes AA (homozygote), AB (heterozygote), and BB (homozygote). In a random sample from a natural population the following genotype distribution is observed among 100 individuals:

AA    AB    BB    N      qA     qB
30    60    10    100    0.60   0.40

100 diploid individuals will altogether have 200 genes at each locus. Of these 200, 2*30 + 1*60 = 120 are of type A. Hence the frequency of A is qA = 120/200 = 0.60. The gene frequency of B is calculated in the same manner, or as (1 - qA) = 0.40, since the frequencies must sum to 1.0.

VIII. How to test for Hardy-Weinberg distribution of genotypes: Observed distributions of categorical ("Yes-No") variables like sex, genotype etc. can be tested for deviations from expected values by the chi-square test. The test statistic (chi-square) for each class is $(\text{observed} - \text{expected})^2 / \text{expected}$, and it is summed over classes. The degrees of freedom (DF) in the test is the number of classes which can vary without "locking" the total. Ordinarily this number is the total number of classes minus 1, but since in this "goodness-of-fit" test we use the observed gene frequency to estimate the expected numbers of the various genotypes, we subtract one extra DF (see sect. VII). In the goodness-of-fit test for Hardy-Weinberg proportions, therefore, the DF is generally the number of genotype classes minus the number of alleles used to calculate them.

In the table above we calculated the gene frequencies qA = 0.60 and qB = 0.40. According to the Hardy-Weinberg theorem and the binomial distribution ($p^2 N + 2pqN + q^2 N$), the numbers of the different genotypes among 100 individuals (N = 100) should ideally be as shown in the row "Calculated" in the following test for differences between observed and expected.

              AA        AB        BB        N          qA     qB
Observed      30        60        10        100        0.60   0.40
Calculated    (36.0)    (48.0)    (16.0)    (100.0)    0.60   0.40
Chi-square    1.00      3.00      2.25

Sum chi-square = 1.00 + 3.00 + 2.25 = 6.25. DF = no. of genotypes minus no. of alleles = 3 - 2 = 1. Hence, P ~ 0.012.

The deviation between observed and expected genotypic numbers in the table may be caused by stochastic sampling error (only 100 individuals), but how probable is that? The chi-square goodness-of-fit test indicates that a deviation as large as that observed here is expected in only about 12 out of 1000 trials by chance alone if we were re-sampling a population in Hardy-Weinberg equilibrium.
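
The gene-counting step of sect. VII and the goodness-of-fit test of sect. VIII can be reproduced with a short script. This is a minimal sketch and not part of the original notes: the function names are illustrative, and the P-value uses the identity P = erfc(sqrt(chi2/2)), which holds only for one degree of freedom (the two-allele case treated here).

```python
import math


def allele_frequencies(n_aa, n_ab, n_bb):
    """Gene counting at a two-allele locus: return (qA, qB)."""
    total_genes = 2 * (n_aa + n_ab + n_bb)  # diploid: two genes per individual
    q_a = (2 * n_aa + n_ab) / total_genes
    return q_a, 1.0 - q_a


def hardy_weinberg_test(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test against H-W proportions (DF = 3 - 2 = 1)."""
    n = n_aa + n_ab + n_bb
    q_a, q_b = allele_frequencies(n_aa, n_ab, n_bb)
    observed = [n_aa, n_ab, n_bb]
    expected = [q_a ** 2 * n, 2 * q_a * q_b * n, q_b ** 2 * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Assumption: one degree of freedom, so the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return expected, chi2, p_value


expected, chi2, p_value = hardy_weinberg_test(30, 60, 10)
print([round(e, 1) for e in expected])  # expected numbers of AA, AB, BB
print(round(chi2, 2), round(p_value, 3))  # chi-square and its P-value
```

With the sample from the text (30 AA, 60 AB, 10 BB) the script gives a chi-square of 6.25 and P of about 0.012. A locus with an expected count of zero (a monomorphic sample) would need separate handling, which the sketch leaves out.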
We therefore reject the null hypothesis, which is that the sample is from one population in H-W equilibrium, and accept the alternative hypothesis that the observed deviation reflects reality. Deviations of this type, i.e. an excess of heterozygotes compared to the H-W expected proportion, have few explanations other than selection favouring the survival of heterozygotes ("heterosis", "overdominance", "balanced polymorphism"). The opposite situation, i.e. a deficit of heterozygotes, would indicate that the sample is taken from a physical mixture of populations with different gene frequencies at the locus under study (the "Wahlund" effect).

The relative proportion of heterozygotes at a locus is called the heterozygosity at the locus. We distinguish between observed heterozygosity (actually counted) and expected heterozygosity (the proportion calculated from the gene frequencies assuming H-W).

B. GENETIC DIFFERENTIATION

Given this genotypic distribution at one polymorphic locus (alleles A and B) in two populations (H-W expected numbers for the total material in parentheses):

         AA         AB          BB         N      qA     qB
Pop1     36         48          16         100    0.6    0.4
Pop2     16         48          36         100    0.4    0.6
Total    52 (50)    96 (100)    52 (50)    200    0.5    0.5

I. How to test for differences in allelic proportions (= frequencies) between samples: The null hypothesis is (as in most tests) that there are no differences between samples, i.e. that they could be two samples drawn from the same population. Under this assumption, the best estimate we have of the real allelic proportions in that population is the allele proportions in the pooled samples (because a larger sample gives a more precise estimate). The first step in the test is to calculate the observed numbers (NB: not proportions) of alleles in the two samples and in the pooled samples. These numbers are easily obtained by counting two A-alleles in each AA homozygote and one A-allele in each AB heterozygote, and so on.
Thereafter, we calculate the expected numbers under the null hypothesis for each sample, using the allele numbers in the pooled samples as the model (the relative allelic proportions in the separate samples A and B are expected to be the same as in the pooled samples). The following table can be set up (Sample A = Pop1, Sample B = Pop2):

                        # allele A (exp. #)    # allele B (exp. #)    Total
Sample A                72+48 = 120 (100)      32+48 =  80 (100)      200
Sample B                32+48 =  80 (100)      72+48 = 120 (100)      200
Pooled samples A & B    200                    200                    400

We then calculate the partial chi-square value for each cell, and sum them to a total chi-square. For example, the chi-square for allele A in sample A is $(120-100)^2/100 = 4$. In this particular case, since the sample sizes are equal and the allelic proportions in the pooled samples are also equal, the chi-square is the same (= 4) in all four cells, giving a total chi-square of 16. Before looking up this value in a chi-square table, we need the DF (degrees of freedom). In R x C (rows x columns) tables like this the DF is (R-1) x (C-1), here (2-1) x (2-1) = 1. Thus, the chi-square value is 16 with one degree of freedom, which corresponds to a probability of P < 0.0001. We can safely conclude that the allelic proportions in the two samples are too different to have been caused by chance in the sampling, and that they therefore very probably reflect real differences in allelic frequencies. Often we can also infer from such results that the populations from which the two samples were drawn must be reproductively isolated (i.e., gene flow is too low to 'homogenize' the populations).

II. Measures of general differentiation:

Relative measures: Fst (S. Wright), Gst (M. Nei)
Absolute measures: Genetic Identity (I) and Genetic Distance (D) (M. Nei)

IIa. Wright's Fst = 1 - (Hs/Ht), where Hs is the mean observed heterozygosity over populations, while Ht is the expected heterozygosity in the total material (i.e. based on the "overall" gene frequencies).
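
The allele-count test of section B.I can be written out as a small script. This is a sketch under stated assumptions rather than the notes' own procedure: the function names are illustrative, the input is restricted to two samples at a two-allele locus (a 2 x 2 table), and the P-value again relies on the one-degree-of-freedom identity P = erfc(sqrt(chi2/2)).

```python
import math


def allele_counts(n_aa, n_ab, n_bb):
    """Return the numbers (not proportions) of A and B alleles in a sample."""
    return 2 * n_aa + n_ab, 2 * n_bb + n_ab


def allele_homogeneity_test(sample_1, sample_2):
    """2 x 2 chi-square test of equal allelic proportions in two samples.

    Each sample is a tuple (n_AA, n_AB, n_BB). Expected counts come from
    the pooled allele numbers, as under the null hypothesis in the text.
    """
    rows = [allele_counts(*sample_1), allele_counts(*sample_2)]
    col_totals = [rows[0][j] + rows[1][j] for j in range(2)]
    grand_total = sum(col_totals)
    chi2 = 0.0
    for row in rows:
        row_total = sum(row)
        for j, observed in enumerate(row):
            expected = row_total * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    # DF = (2-1) x (2-1) = 1, so the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p_value = allele_homogeneity_test((36, 48, 16), (16, 48, 36))
print(chi2)            # 16.0
print(p_value < 1e-4)  # True
```

For Pop1 and Pop2 every cell contributes 4, so the total is 16 and P falls below 0.0001, in line with the conclusion above. Tables with more alleles or more samples have more degrees of freedom and would need a general chi-square distribution function instead of the erfc shortcut.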
Wright's Fst is calculated for single loci with two alleles. From the tabulated data, both Pop1 and Pop2 have an observed heterozygosity of 0.48, giving a mean (Hs) of 0.48. The expected number of heterozygotes in the total material (Hardy-Weinberg expectation) is 2*0.5*0.5*200 = 100, i.e. Ht = 0.5. Therefore Fst = 1 - (0.48/0.50) = 0.04 in this case.

Nei's Gst expands Fst to include multiple alleles at multiple loci in one single measure. While Wright's Fst is based on the actual genotypic distribution, Nei's Gst is calculated from the observed gene frequencies (assuming H-W equilibrium in each of the single populations). Mathematically the two measures do not differ in principle, and Gst can be viewed as an average Fst over loci.

IIb. Nei's Genetic Identity:

$I = \sum_i x_i y_i \,/\, \sqrt{\left(\sum_i x_i^2\right)\left(\sum_i y_i^2\right)}$

where $x_i$ and $y_i$ are the frequencies of the i-th allele in populations X and Y, respectively. For Pop1 and Pop2:

$I = (0.6 \cdot 0.4 + 0.4 \cdot 0.6) \,/\, \sqrt{(0.6^2 + 0.4^2)(0.4^2 + 0.6^2)} = 0.48/0.52 = 0.9231$

Nei's Genetic Distance: $D = -\ln(I) = -\ln(0.9231) = 0.08$ in this case.

This D-value (genetic distance) is an absolute measure of genetic differences, and an estimate of the mean number of amino acid substitutions ("opposite fixations") per locus. Usually, D-values are estimated as an average over many loci (>10), monomorphic as well as polymorphic.

III. Ne - genetically effective population size:

Definition: "The size of the ideal (H-W) population which loses genetic variability (by genetic drift) at the same rate as the one under study." Ne is strongly affected by (1) the relative proportions of males and females, (2) the degree of generation overlap, and (3) the historical variation in population size.

(1) $N_e = 4 N_m N_f / (N_m + N_f)$
(2) $N_e = N_0 \, t \, l$
(3) $N_e = n \,/\, \sum_i (1/N_i)$

Symbols in (1)-(3): m = males, f = females, $N_0$ = number born, t = mean reproductive age, l = probability of surviving to reproductive age, n = no. of generations, $N_i$ = N in the i-th generation.

IV. Genetic drift: In populations of limited size, the transfer of a gene frequency (p) from parents to offspring will usually not be completely accurate. The value of p in the offspring generation therefore has a variance whose magnitude depends on the frequency itself and on the effective size of the parental generation: $\mathrm{Var}(p) = p(1-p)/(2N_e)$, and $SE = \sqrt{\mathrm{Var}(p)}$.

V. Fitness- and selection coefficients: Consider a locus with two alleles A and B and three possible genotypes AA, AB and BB. Also consider a population of 100 individuals where the frequency of both alleles initially was 0.5, and the genotype distribution therefore 25 AA, 50 AB, and 25 BB. Towards reproductive age the genotypes show differential survival, i.e. selection is taking place:

                    AA              AB              BB              N      qA
Before selection    25              50              25              100    0.500
After selection     15              45              10              70     0.536
Survival            15/25 = 0.6     45/50 = 0.9     10/25 = 0.4
w (fitness)         0.6/0.9 = 0.67  0.9/0.9 = 1.00  0.4/0.9 = 0.44
s (= 1-w)           0.33            0.00            0.56

The genotypic fitness coefficient (w) is the survival relative to that of the "best" genotype (AB in this example). The selection coefficient is defined as s = 1 - w.

VI. Gene flow: If population A (gene frequency q) receives a proportion m of immigrants each generation from population B (gene frequency p), there will be a change $\Delta q$ in gene frequency at each locus according to the formula:

$\Delta q = m(p - q)$

The magnitude of the impact in each generation thus depends on the proportion of immigrants and on the actual difference in gene frequency between donor and recipient population.

VII. Genetic equilibrium situation: When an evolutionary regime has been stable for evolutionarily significant periods of time (on the order of 4N generations, where N = population size), an equilibrium situation is expected where the various evolutionary forces cancel each other out and the change in gene frequency between generations is small.
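
The formulas of sections B.II to B.VI can be collected in a few small functions to check the worked examples. This is a rough sketch, not the author's code: all function names are illustrative, and the inputs in the last four print lines (10 males and 90 females, the population sizes 1000, 10, 1000, an Ne of 50, and a 10 % immigration rate) are hypothetical numbers that do not come from the notes.

```python
import math


def wright_fst(observed_het_per_pop, overall_allele_freqs):
    """Fst = 1 - Hs/Ht for a single locus."""
    h_s = sum(observed_het_per_pop) / len(observed_het_per_pop)
    h_t = 1.0 - sum(q ** 2 for q in overall_allele_freqs)  # expected heterozygosity
    return 1.0 - h_s / h_t


def nei_identity(x, y):
    """Nei's genetic identity I from allele frequencies in populations X and Y."""
    numerator = sum(a * b for a, b in zip(x, y))
    return numerator / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def nei_distance(x, y):
    """Nei's genetic distance D = -ln(I)."""
    return -math.log(nei_identity(x, y))


def ne_sex_ratio(n_males, n_females):
    """Formula (1): Ne = 4*Nm*Nf / (Nm + Nf)."""
    return 4 * n_males * n_females / (n_males + n_females)


def ne_fluctuating(sizes):
    """Formula (3): harmonic mean of population sizes over n generations."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def drift_se(p, ne):
    """Standard error of p after one generation of drift."""
    return math.sqrt(p * (1.0 - p) / (2.0 * ne))


def fitness_and_selection(before, after):
    """Return (w, s) per genotype from counts before and after selection."""
    survival = [a / b for a, b in zip(after, before)]
    best = max(survival)
    w = [s / best for s in survival]
    return w, [1.0 - value for value in w]


def gene_flow_step(q, p, m):
    """Recipient frequency after one generation: q + m*(p - q)."""
    return q + m * (p - q)


# Worked examples from the text (Pop1 and Pop2, and the selection table).
print(round(wright_fst([0.48, 0.48], [0.5, 0.5]), 4))    # 0.04
print(round(nei_identity([0.6, 0.4], [0.4, 0.6]), 4))    # 0.9231
print(round(nei_distance([0.6, 0.4], [0.4, 0.6]), 4))    # 0.08
w, s = fitness_and_selection([25, 50, 25], [15, 45, 10])
print([round(v, 2) for v in w], [round(v, 2) for v in s])

# Hypothetical inputs, for illustration only.
print(ne_sex_ratio(10, 90))                       # 36.0
print(round(ne_fluctuating([1000, 10, 1000]), 1))
print(round(drift_se(0.5, 50), 3))
print(round(gene_flow_step(0.4, 0.6, 0.1), 3))
```

The two hypothetical Ne examples show the same point from two sides: a skewed sex ratio or a single small generation pulls the effective size far below the census size, which is why a brief bottleneck can dominate Ne for a long time.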
How genetically different the populations will be at this stage depends on the relative magnitudes of the evolutionary forces involved. Thus the equilibrium population structure will, besides the effect of the age of the system, be affected by the populations' general biology and habitats through their effective population sizes (genetic drift), stationarity/migration habits (effective gene flow), and degree of genetic adaptation to local environmental factors. Knowledge of biological traits may therefore form a basis for general considerations concerning genetic differentiation in various species. One problem, however, is that the evolutionary factors population size, gene flow and selection are probably not constant over time, and that dramatic events like severe population bottlenecks (see B. III (3)), large immigration fluxes, and milieu catastrophes may leave their mark on the genetic population structure for long evolutionary periods even if they occur only rarely.

VIII. Marine versus anadromous and limnic species' population structures

Some characteristics of marine organisms and their environment have a clear bearing on the important evolutionary forces genetic drift, gene flow and selection. First, since space and geography usually pose no restriction on the size of marine populations, these are often more abundant (e.g. cod, herring, capelin) than anadromous (e.g. salmon, trout) and limnic (e.g. whitefish, garpike, charr, trout) populations. Since the number of individuals is often so large in marine populations, the evolutionary factors gene flow and local adaptation are probably much more significant than genetic drift in moulding the genetic population structures (see C and Tables I and II). This argument holds for invertebrates as well as for fish, and has been supported by studies in both taxa.
Also, there are relatively fewer physical barriers to migration and gene flow in the marine milieu, and many marine species have pelagic stages in their life cycle. Furthermore, "homing" to the place of birth seems less common among marine than among anadromous species. All these factors have a limiting effect on genetic differentiation. In addition, important physical factors in the marine environment (e.g. temperature, salinity) are characterized by homogeneity over vast geographical ranges. This reduces the necessity of local genetic adaptation to milieu factors and leads to a lower overall degree of genetic differentiation.

In accordance with this, comparative studies have shown that marine species in general are less genetically structured and differentiated than anadromous and limnic species. There is, nevertheless, some genetic substructuring among marine fishes (e.g. in herring, cod, and blue whiting), at least on a large geographic scale (e.g. East and West Atlantic). In many cases, however, geographic variability in morphological and meristic traits as well as in single-locus molecular markers has been shown to be adaptive (i.e., the result of selection) rather than the effect of isolation and genetic drift. To the degree that the term "reproductive isolation" applies to marine fish species, it often appears to be "isolation by distance".

C. SPECIFIC BIOLOGY AND GENETIC STRUCTURE

Table I. Biological traits relevant for the 4 main evolutionary forces and therefore for the level of genetic structuring in marine species (the number of plus signs denotes relative potential).

Mutations   Genetic drift   Selection   Gene flow
+++ +++ ++
Stationarity ++ +++
Milieu tolerance +++ +++
Fecundity +++ ++++
Shoaling behav. + +++
Homing + +++
Pelagic/benthic eggs/larvae Weaning ++ ++++ + ++
Pop. size Multiple spawn ++
Migrations Marine/anadrom. + + +++ +++ +++

This table indicates that the biology of the species has the greatest potential for affecting the evolutionary forces SELECTION and GENE FLOW. These two forces will often have opposite effects on genetic differentiation, and their relative magnitudes will therefore determine the actual level of genetic structuring of the populations within a species. Hence, species with populations spread over many types of habitats with large milieu fluctuations will be expected to show genetic substructuring. The actual level of structuring will depend on the effective gene flow between populations. Biological traits that increase the tendency towards structuring would thus be high stationarity, solitary habits, strong homing, benthic rather than pelagic eggs and larvae, low migration tendency, and an anadromous/catadromous rather than a purely marine life history.

ATLANTIC COD. Distributed on the shelves on both sides of the North Atlantic. Very large populations which experience relatively similar milieu conditions. Can undertake extensive migrations, but does not show very accurate homing. The pelagic egg and larval stage lasts for several months. Thrives in cold and temperate climatic conditions. Extensive genetic studies have revealed only limited genetic differentiation throughout the range.

ATLANTIC SALMON. Distributed all over the North Atlantic. Anadromous (spawning in fresh water). Relatively small river populations, often with substantial differences in local environment between rivers. Benthic egg, larval and young stages. Thrives in cold and temperate climatic conditions. Can undertake extensive migrations and shows extremely accurate homing. Extensive studies of genetic structure have indicated a moderate level of genetic differentiation.

DOG WHELK. Nucella lamellosa is distributed on the American West coast. Relatively small spawning groups. Lives in the tidal zone, where the microclimate can vary extensively.
Limited migration capability, but shows a tendency to home. Benthic egg capsules in which the larvae develop into small whelks before hatching. No pelagic stages. Thrives in cold and temperate climatic conditions. Genetic studies have revealed considerable genetic differentiation on both small and large geographic scales.

Table II. Two marine and one anadromous species; biological traits relevant for the relative strength of the two evolutionary forces gene flow and selection (local adaptation).

                          Atlantic cod    Atlantic salmon    Dog whelk
Pop. size                 Large           Small              Small
Stationary                No              Yes                Yes
Milieu tolerance          High            High               High
Fecundity                 High            Medium             Low/medium
Homing                    No (?)          Yes                Yes
Pelagic/benthic life      Both            Both               Benthic
Pelagic eggs or larvae    Pelagic         Benthic            Benthic
Migrations                Large           Large              Small
Marine/anadrom.           Marine          Anadromous         Marine

Adult cod may undertake extensive migrations in connection with annual spawning. Cod population sizes are usually very large, and the spawning products (eggs and larvae) drift pelagically for extended periods of time (several months). The salmon also performs extensive migrations, but its anadromous way of life includes very precise homing to home rivers, which may have very different environmental conditions. The eggs are buried in the river-bottom gravel, and the young individuals are stationary until smoltification. The dog whelk has limited migration capacity. Its population sizes are typically small, and the offspring (from 40-60 egg capsules) are benthic at all stages. Its intertidal habitat is often characterized by highly variable environmental conditions.
Population genetic studies using electrophoretic (presumably neutral) markers in these three species have shown that, with respect to level of genetic substructuring, they can be ranked as follows (from lower to higher genetic distance):

Cod ---> salmon ---> dog whelk

For neutral characters (which indicate the balance between genetic drift and gene flow) this ranking appears reasonable in light of the biological characterization in Table II. With respect to local adaptation (selection), very small population size is a disadvantage, whereas medium-sized populations may achieve considerable local genetic adaptation if the gene flow between populations is low. The homing tendency of both salmon and whelk, and the low migration capacity of the dog whelk, are factors that act to reduce gene flow and thus facilitate adaptation. There is evidence that salmon has developed much more local genetic adaptation than cod, but little is known about the dog whelk in this respect.

D. TYPES OF MILIEU ADAPTATIONS

1. Biochemical
2. Morphological
3. Behavioural
4. Resistance (disease/parasites)

Ad. 1. LDH genotypes in Fundulus heteroclitus, haemoglobin genotypes in cod (Ref. II and III).

Ad. 2. Pronounced differences in body form and size between salmon populations are believed to reflect genetic adaptations to the size, depth and water flow of the rivers. The relevance of body form may be difficult to assess, but body size clearly affects the salmon's ability to ascend rivers, rapids and waterfalls. In general, small rivers have small salmon. The regulation mechanism appears, at least partly, to be the number of years spent at sea before the return spawning migration (Ref. IV).

Ad. 3. The time of return from feeding areas at sea differs greatly between salmon stocks. For example, the Figga salmon at Steinkjær, which return from the Faroe feeding areas, are several weeks ahead of other river stocks in that region.
This early return appears necessary in order to ascend the Figga river during the spring flood, and is probably a strongly selected trait (Ref. V).

Ad. 4. The parasite Gyrodactylus salaris occurs naturally in the Baltic. The Baltic salmon (e.g. Neva salmon) is resistant. This parasite/host relationship has probably developed over thousands of generations. The parasite was accidentally imported to Norway together with salmon for production plants. It was soon discovered that Norwegian salmon stocks were not resistant, and that the parasite could totally wipe out salmon river stocks when introduced during stock reinforcement programs. The status in 1992 was that the parasite had wiped out the salmon stocks in 35 Norwegian rivers (Ref. VI).

REFERENCED ARTICLES [AVAILABLE ON REQUEST AT TBS]

I. Mayr, E. 1970. Chapter 11 ("Geographic variation") in: Populations, Species, and Evolution. Harvard University Press, Cambridge, Massachusetts. SBN: 674-69010-9. 453 p.

II. Place, A.R. & D.A. Powers 1979. Genetic variation and relative catalytic efficiencies: Lactate dehydrogenase B allozymes of Fundulus heteroclitus. Proc. Natl. Acad. Sci. USA 76(5): 2354-2358.

III. Mork, J., Giskeødegård, R. & G. Sundnes 1983. Haemoglobin polymorphism in Gadus morhua: genotypic differences in maturing age and within-season gonad maturation. Helgoländer Meeresunters. 36: 313-322.

IV. Jonsson, N., Hansen, L.P. & B. Jonsson 1991. Variation in age, size and repeat spawning of adult Atlantic salmon in relation to river discharge. Journal of Animal Ecology 60: 937-947.

V. Hansen, L.P. & B. Jonsson 1991. Evidence of a genetic component in the seasonal return pattern of Atlantic salmon, Salmo salar L. J. Fish. Biol. 38: 251-258.

VI. Bakke, T.A., Jansen, P.A. & L.P. Hansen 1990. Differences in the host resistance of Atlantic salmon, Salmo salar L., stocks to the monogenean Gyrodactylus salaris Malmberg, 1957. J. Fish. Biol. 37: 577-587.