
Multivariate analysis of allozyme variation patterns in coastal Douglas-fir from southwest Oregon¹

S. A. MERKLE² AND W. T. ADAMS
Department of Forest Science, College of Forestry, Oregon State University, Corvallis, OR, U.S.A. 97331

AND

R. K. CAMPBELL
Pacific Northwest Forest and Range Experiment Station, United States Department of Agriculture, Forest Service, Corvallis, OR, U.S.A. 97331

Received January 14, 1987
Accepted October 13, 1987

MERKLE, S. A., ADAMS, W. T., and CAMPBELL, R. K. 1988. Multivariate analysis of allozyme variation patterns in coastal Douglas-fir from southwest Oregon. Can. J. For. Res. 18: 181-187.

Isozyme data collected from megagametophytes of coastal Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco var. menziesii) parent trees, representing 22 southwest Oregon breeding zones, were analyzed by multivariate techniques to describe the distribution of genotypic variation among and within breeding zones and to relate genotypic and environmental variation. Data entered were mean haploid genotype scores obtained by averaging two haploid genotype scores from each parent tree. Haploid genotype scores were created from 27-locus haploid genotypes of two megagametophytes collected from each of 1230 parent trees. Although principal components analysis did not indicate the presence of linkage disequilibria among loci, canonical discriminant analysis suggested that much more genotypic variation may be accounted for by breeding-zone differences than was evident from single-locus techniques. The first two canonical variables, which accounted for about 25% of the genotypic variation, appeared to separate breeding zones on the basis of geographic and elevational differences among zones. Regressing canonical variable scores against location variables failed to provide a model attributing >10% of genotypic variation to latitude, elevation, or distance from the ocean.
Although canonical correlation analysis of mean haploid genotype scores with the same location variables produced two significant canonical variables accounting for 39% of the variation, little of the variation accounted for by the canonical variables was related to location variables. Although these results may be due to the small geographic scale of the study, the region covered is characterized by extreme environmental heterogeneity, to which variability in seedling quantitative traits has been strongly correlated in a companion common garden study. In sum, multivariate techniques were not markedly better than single-locus techniques in providing evidence that allozyme variation is adaptive in the coastal Douglas-fir breeding zones studied. Consequently, multivariate techniques cannot be expected to improve the use of allozymes for certifying seed or for designating breeding zones in this region.

¹ Paper No.
2148, Forest Research Laboratory, Oregon State University, Corvallis.
² Present address: School of Forest Resources, University of Georgia, Athens, GA, U.S.A. 30602.

Introduction

Studies of genetic diversity in coastal Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco var. menziesii) have led to opposing conclusions depending on whether the investigated characteristics were quantitative (seedling) traits or single-locus (allozyme) markers. Common garden studies have revealed strong associations between patterns of genetic variation in quantitative traits and environmental variables, including evidence of consistent clines over geographic transects, which suggests that adaptation has influenced the observed patterns (Irgens-Moller 1967; Griffin and Ching 1977; Hermann and Lavender 1968; Campbell and Sorensen 1978; Loopstra 1984). Allozyme studies, on the other hand, although indicating high levels of genetic diversity in coastal Douglas-fir, have shown only weak associations between allele frequencies and environmental variables (Li 1986; Yang et al. 1977; Yeh and O'Malley 1980). Furthermore, <10% of the total genetic diversity as defined by Nei (1973) in coastal Douglas-fir resides among populations and over 90% within populations (Yeh and O'Malley 1980; El-Kassaby and Sziklai 1982; Merkle and Adams 1987).

Patterns of genetic variation within and among southwest Oregon breeding zones have been investigated for both quantitative traits and single loci. This region is characterized by variable climate and topography. Climate changes dramatically from the coast eastward, with decreasing rainfall, decreasing winter temperatures, and increasing summer temperatures (Franklin and Dyrness 1973). Although quantitative traits showed strong clines over environmental gradients and varied substantially among breeding zones (Loopstra 1984), single loci did not (Merkle and Adams 1987).
Lack of breeding-zone differences in allele frequencies, however, does not necessarily mean that these loci are neutral to selection pressure. In fact, Lewontin (1984) has shown that differentiation is much more difficult to detect statistically at the single-locus than at the quantitative-trait level.

This inability to detect differences among populations at the single-locus level has prompted application of multivariate statistical techniques to the analysis of polymorphic marker loci (Smouse et al. 1982). Whereas conventional analyses restrict the researcher to viewing variation at individual loci or their average over several loci, multivariate techniques make it possible to examine variation in multilocus sets. Furthermore, since genes do not act independently of other loci in the genome, examining multilocus sets may reveal differences among populations not reflected at individual loci. The use of multivariate techniques may even provide a crude method for screening coadapted gene complexes that could then be examined further.

Recently, Conkle and Westfall (1984) have modified the methods described by Smouse et al. (1982) to evaluate allozyme differences among ponderosa pine (Pinus ponderosa Dougl. ex Laws.) breeding zones in California. Using canonical correlation analysis, the authors could account for 49% of the genotypic variation with two canonical variables associated with latitude, longitude, and elevation. Similarly, in a canonical discriminant analysis of allozyme data from 17 populations of lodgepole pine (Pinus contorta Dougl. ex Loud.) in the Yukon and British Columbia, Yeh et al. (1985) accounted for 38% of the variation in 20 polymorphic loci with two significant canonical functions associated with latitude and elevation.

In this paper, we examine patterns of allozyme variation among and within breeding zones of coastal Douglas-fir in southwest Oregon using multivariate techniques.
The objectives were to determine whether multivariate analyses would uncover more genetic differentiation among zones than was revealed when the same data were analyzed with conventional single-locus techniques (Merkle and Adams 1987), and whether multivariate patterns of allozyme variation are associated with environmental variables. The ability to genetically differentiate among Douglas-fir populations with multivariate techniques would have practical applications in seed certification and in refining seed- and breeding-zone boundaries (Conkle and Westfall 1984).

Materials and methods

Wind-pollinated seeds of 31-72 (mean, 56) trees were collected from each of 22 Douglas-fir breeding zones in southwest Oregon (Table 1, Fig. 1). Breeding zones, established in regional Douglas-fir tree improvement programs (Silen and Wheat 1979), are elevation bands within geographically designated areas called breeding units. Each breeding zone spans an altitudinal range of <300 m (1000 ft) and is generally smaller than 60 000 ha (150 000 acres). Subsamples of the 300+ parent trees initially selected for breeding from wild stands in each zone were chosen for study. Among the trees for which wind-pollinated seed were stored, individuals were chosen so as to be distributed as uniformly as possible over the entire breeding zone. Megagametophytes from two seeds sampled from each parent tree were analyzed electrophoretically for 27 allozyme loci. Further information on the sampled zones and trees and details of seed preparation, electrophoretic methods, allozyme loci, and single-locus population analyses can be found in Merkle and Adams (1987). In addition, location data compiled for 1180 of the 1230 parent trees included latitude to the nearest 0.001°, distance inland from the Pacific Ocean (i.e., distance east of 124°30' longitude) to the nearest 0.1 km, and elevation to the nearest 5 m.
Scoring procedure

Allelic data for each of the megagametophytes were first transformed to haploid genotype scores in a manner similar to that employed for diploid genotypes by Smouse et al. (1982). With diploid genotypes, the genotype score for an allele is 1, 1/2, or 0 depending on whether the allele is homozygous, heterozygous, or absent in the individual, respectively. Since only haploid genotype data were available, we modified the scoring procedure with a method suggested by R.D. Westfall (Pacific Southwest Forest and Range Experiment Station, United States Department of Agriculture, Forest Service, personal communication, 1983) in which a megagametophyte's genotype score for a given locus with k alleles consists of k-1 positions coded with 0's and 1's. For the first k-1 alleles, a 1 indicates that the allele is present, a 0 that it is absent. Zeros in all k-1 positions indicate that the kth allele is present. For example, in the case of a three-allele locus, the two positions can be filled in one of three ways: 1 0, indicating allele 1; 0 1, indicating allele 2; or 0 0, indicating allele 3. Thus the haploid genotype score for a given locus is a vector of zeros and ones, and haploid genotype scores for all loci may be combined into one multilocus vector for each megagametophyte. With this procedure, we transformed each 27-locus haploid genotype into a 78-variable haploid genotype score and performed subsequent analyses using the resulting set of haploid scores.

As noted earlier, each parent tree was represented by only two megagametophytes; thus, we could not obtain the true diploid genotype of each parent tree. Although we might have drawn a single megagametophyte per mother tree (the loss in precision would have been minimal because the second megagametophyte of each pair does not represent an independent gamete drawn from the population), we decided to use both available gametophyte genotypes in order to compare the results of this analysis with those of the companion study (Merkle and Adams 1987), which applied single-locus techniques to the same data set.

Following another suggestion by R.D. Westfall (personal communication, 1985), we computed mean haploid genotype scores for each maternal tree based on the two megagametophytes sampled from each tree and performed a second set of analyses using the mean haploid scores. Mean haploid scores based on large numbers of megagametophytes (i.e., >7) sampled per mother tree would closely approximate the diploid genotype scores employed by Smouse et al. (1982). With samples of only two megagametophytes per mother tree, however, mean haploid scores must be considered crude approximations of diploid scores because the probability of detecting both alleles at a heterozygous locus is only 0.50. Despite limitations in interpreting mean haploid scores in this study, we expected multivariate analyses based on these scores to more effectively discriminate among breeding zones than analyses based on individual megagametophyte scores. Eliminating the variation between megagametophytes from the same tree was likely to reduce genetic variation within breeding zones, relative to that among zones, such that multivariate analyses would be better able to detect among-zone variation.

TABLE 1. Coastal Douglas-fir seed sources, southwest Oregon, for the multivariate allozyme analysis

Breeding unit       Breeding zone   Elevation range (× 100 m)   No. of parent trees
Butte Falls         BF1              7.6-10.7                   63
                    BF2             10.7-13.7                   57
                    BF3             13.7-16.8                   39
North Gold Beach    GBN1             0  - 4.6                   49
                    GBN2             4.6- 7.6                   58
South Gold Beach    GBS1             0  - 4.6                   57
                    GBS2             4.6- 7.6                   57
Grants Pass         GP1              3.0- 6.1                   31
                    GP2              6.1- 9.1                   43
Jacksonville        JV1              4.6- 7.6                   69
                    JV2              7.6-10.7                   48
                    JV3             10.7-13.7                   49
North Umpqua        NU1              1.5- 6.1                   72
                    NU2              6.1- 7.6                   68
                    NU3              7.6- 9.1                   72
                    NU4              9.1-10.7                   70
                    NU5             10.7-12.2                   57
South Umpqua        SU1              3.0- 6.1                   51
                    SU2              6.1- 7.6                   49
                    SU3              7.6- 9.1                   67
                    SU4              9.1-10.7                   62
                    SU5             10.7-13.7                   42

Statistical analysis

The first multivariate procedure was a principal components analysis (PRINCOMP; SAS Institute, Inc. 1982), applied to both individual and mean haploid genotype scores without regard to location or breeding-zone affiliation of parent trees, to determine if alleles were "clustered" into groups within which the alleles displayed similar patterns of variation. If any alleles were so clustered, then variation in haploid scores could be summarized by fewer principal components than the original 78 variables. We conducted principal components analysis based on both correlation and variance-covariance matrices.

In four additional steps, we examined how haploid scores might be associated with the location of parent trees to establish (i) whether genotypes could be grouped according to the breeding zones from which they originated, and (ii) whether haploid scores were related to the environment surrounding the parent trees. To examine haploid scores with respect to breeding zones, we used canonical discriminant analysis (CANDISC; SAS Institute, Inc. 1982) to estimate the canonical correlations between haploid scores and a set of dummy variables coded for breeding zones. Because each resulting canonical variable accounts for a given proportion of the variation in haploid scores, canonical discriminant analysis provides an index of the proportion of variation accounted for by differences among breeding zones, the canonical R².

If the variation in haploid scores results from selection by the local environment, the average canonical variable scores for contiguous or environmentally similar breeding zones might be expected to be more similar than those for noncontiguous or environmentally different breeding zones. To examine the potential similarities or differences, we first plotted the mean canonical variable scores of each breeding zone for pairs of canonical variables. Then we conducted a hierarchical cluster analysis of the Mahalanobis distances (Rao 1973), generated by CANDISC, between each pair of breeding zones. Three clustering algorithms were tried: average linkage, complete linkage, and single linkage (CLUSTER; SAS Institute, Inc. 1982).

FIG. 1. Douglas-fir breeding units in southwest Oregon. Shaded units were included in this study and are subdivided into two to five breeding (elevation) zones (from Merkle and Adams 1987).

Variation among breeding zones might also follow regional gradients because environments in mountainous areas tend to vary along gradients. We therefore regressed the canonical variable scores for each significant (p < 0.05) canonical variable against the location variables latitude (L), distance from the ocean (D), and elevation (E). The preliminary model for each canonical variable had nine terms (L, D, E, L², D², E², L × D, L × E, D × E) and was modified by stepwise backward elimination (STEPWISE; SAS Institute, Inc. 1982) to a final model retaining only terms significant at p < 0.10. Each final model was tested for lack of fit (Draper and Smith 1966, p. 26).

The canonical variable scores within each breeding zone used in the above regression were adjusted by CANDISC to the breeding zone mean. Because the adjustment is based on a linear combination chosen to maximize differences among breeding zones, it could either camouflage or accentuate the relationship between the haploid scores and location variables, depending on whether those maximum differences were more closely associated with geographic or other factors. To check these possibilities, we compared results from the CANDISC analysis with those from a canonical correlation analysis (CANCORR; SAS Institute, Inc. 1982), in which the nine permutations of L, D, and E composed the X matrix and the haploid scores the Y matrix.

TABLE 2. Correlation coefficients generated from canonical discriminant analysis of mean haploid genotype scores and breeding zones for the first four significant (p < 0.0001) canonical variables

Loci (rows): Fest, Pgm1, Pgm2, Pgi1, Pgi2, Lap1, Lap2, Pep2, Got1, Got2, Got3, G6pd, Gly, Cat, Gdh, Sod, Mpi, Fhex, 6pgd, Aco1, Aco2, Idh, Dia, Mdh1, Mdh2, Mdh3, Mdh4
[Per-locus correlation coefficients could not be recovered from the extracted table layout.]

                          CAN1      CAN2      CAN3      CAN4
Eigenvalue                0.2888    0.1886    0.1696    0.1405
Percent of variance       15.50     10.12     9.10      7.54
Canonical correlation     0.4734    0.3984    0.3808    0.3510

NOTE: Only the largest correlation coefficient at each locus is shown.

Results

Results from the analyses based on the original haploid genotype scores and on the mean haploid genotype scores were very similar. However, as expected, slightly more genetic variation could be attributed to breeding zones on the basis of mean haploid scores. Therefore, we report here results based on mean haploid scores, unless otherwise indicated.
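The scoring procedure described under Materials and methods can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the authors' SAS workflow: the function names are hypothetical, alleles are assumed to be numbered 1..k at each locus, and 1:1 segregation of alleles among megagametophytes of a heterozygous tree is assumed for the detection probability.

```python
def haploid_score(allele, k):
    """Code one locus with k alleles as k-1 indicator positions.

    Alleles are assumed to be numbered 1..k (a labeling chosen for this sketch).
    Allele i < k puts a 1 in position i; allele k is coded as all zeros.
    """
    if not 1 <= allele <= k:
        raise ValueError("allele must be between 1 and k")
    return [1 if allele == i else 0 for i in range(1, k)]


def multilocus_score(alleles, allele_counts):
    """Concatenate the per-locus scores into one multilocus vector."""
    score = []
    for allele, k in zip(alleles, allele_counts):
        score.extend(haploid_score(allele, k))
    return score


def mean_haploid_score(score_a, score_b):
    """Average the scores of the two megagametophytes sampled from one tree."""
    return [(a + b) / 2 for a, b in zip(score_a, score_b)]


def prob_detect_both_alleles(n):
    """Chance that n megagametophytes from a heterozygous tree show both alleles.

    Assumes 1:1 segregation: the sample misses one allele only if all n
    megagametophytes carry the same allele, which has probability 2 * (1/2)**n.
    """
    return 1 - 0.5 ** (n - 1)


if __name__ == "__main__":
    # Three-allele locus from the text: allele 1 -> 1 0, allele 2 -> 0 1, allele 3 -> 0 0
    print([haploid_score(a, 3) for a in (1, 2, 3)])
    # Two megagametophytes of one tree at a three-allele and a two-allele locus
    first = multilocus_score([1, 2], [3, 2])
    second = multilocus_score([2, 2], [3, 2])
    print(mean_haploid_score(first, second))
    print(prob_detect_both_alleles(2))
```

With two megagametophytes the detection probability is 0.50, as stated in the text, and it rises toward 1 as more megagametophytes are sampled, which is why mean scores from larger samples would approximate diploid scores more closely.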
Examination of the eigenvectors generated by principal components analysis of either the correlation or variance-covariance matrix of haploid scores showed that none of the principal components reflected variation in more than two or three loci. For example, the first principal component had relatively high coefficients (or "loadings") only for alleles of the G6pd, Gly, and Cat loci. Evidently, few clusters of alleles varied together in the same pattern. In the analysis based on the correlation matrix, the first principal component (which had the largest eigenvalue and thus explained the largest proportion of the variation) accounted for only 3% of the variation in haploid scores. The first 48 of the 78 principal components accounted for 90% of the variation. In the analysis based on the variance-covariance matrix, the first principal component accounted for 10% of the variation and the first 20 principal components for 90%.

The first 4 of the 21 canonical variables generated by canonical discriminant analysis were highly significant (p < 0.0001), accounting for 42% of the variation in haploid scores (Table 2). Canonical R² values indicated that the breeding-zone contributions to explaining variation in these scores ranged from 22% in the first canonical variable to 12% in the fourth canonical variable. The other 17 canonical variables were not significant (p > 0.05). The first canonical variable was dominated by the Dia locus (Table 2). However, the second, third, and fourth canonical variables were associated with several loci, each having correlation coefficients of 0.20 or higher with alleles at 5, 4, and 4 loci, respectively (Table 2). The same analysis performed with the original haploid genotype scores resulted in four highly significant (p < 0.0001) canonical variables accounting for only 36% of the variation in scores. On the basis of canonical R² values obtained from this analysis, we found that the breeding-zone contributions to this variation ranged from only 13% for the first canonical variable to 8% for the fourth canonical variable (data not shown).

Plotting the breeding-zone mean canonical scores for the first and second canonical variables, which together represented 25.6% of the variation in haploid scores, also reflected the moderate relationship of genotype variation with breeding zones (Fig. 2). The first canonical variable (horizontal axis) appeared to partially separate the breeding zones according to breeding unit. For example, the Grants Pass zones (GP1, GP2) were widely segregated from all others. The first canonical variable also separated breeding units from the west side of the Cascade Mountains (North Umpqua (NU), South Umpqua (SU), Grants Pass) in a north-south fashion, the higher values for the southernmost unit, Grants Pass, the lower values for the northernmost unit, North Umpqua. The second canonical variable (vertical axis) seemed more strongly associated with elevation zones. For example, the three Jacksonville zones (JV1-JV3), which were not separated at all by the first canonical variable, were spread from the highest elevation zone to the lowest by the second canonical variable, the highest zone (JV3) having the lowest value. Breeding zones within the North Umpqua and South Umpqua (SU1-SU5) units were separated by the second canonical variable but showed no clinal pattern with elevation (Fig. 2).

Average-linkage clustering, based on Mahalanobis distances between pairs of breeding zones, gave little evidence to relate geographic or environmental differences to variation in haploid scores.
For example, some Butte Falls (BF) and Gold Beach (GB) zones, widely separated both geographically and environmentally, tended to cluster together relatively early (Fig. 3). However, the Grants Pass breeding unit seemed separated genetically from the other units in that its two zones clustered together last with the remaining zones. One Jacksonville zone (JV3) also clustered with the other zones very late. Maximum- and minimum-linkage clustering resulted in dendrograms similar to that of Fig. 3.

FIG. 2. Mean canonical variable scores of 22 southwest Oregon Douglas-fir breeding zones plotted for the first two canonical variables. See Table 1 for breeding-zone abbreviations.

FIG. 3. Average-linkage clustering of 22 southwest Douglas-fir breeding zones, based on a matrix of Mahalanobis distances. See Table 1 for breeding-zone abbreviations.

Of the location variables, only distance from the ocean was significantly associated with scores from all four significant canonical variables (Table 3). In no case did the regression models account for more than 11% of the variation in canonical variable scores. Nevertheless, lack of fit (Draper and Smith 1966) was significant (p < 0.05) only for the regression model relating geographic variables to scores for the first canonical variable.

TABLE 3. Outcome of regressing canonical variable scores for the first four canonical variables against the location variables latitude (L), distance from the ocean (D), and elevation (E)

Canonical variable    Probability of lack of fit    Canonical R²
CAN1                  <0.05                         0.11
CAN2                  >0.25                         0.10
CAN3                  >0.05                         0.07
CAN4                  >0.25                         0.05

[The column listing the retained location variables and their significance levels could not be recovered from the extracted table layout.]

Canonical correlation of haploid scores with location variables produced two significant (p < 0.01) canonical correlations. The canonical variables of these first two correlations accounted for 39% of the variation in haploid scores (Table 4). However, the canonical R² values associated with the canonical variables indicated that only 16% and 13% of the variation accounted for by the first and second canonical variables, respectively, could be attributed to the effects of the location variables. Both canonical variables were associated with latitude, its square, and the interactions of latitude and distance from the ocean and of latitude and elevation. As with the results of canonical discriminant analysis, the first canonical variable was dominated by the Dia locus; the second canonical variable was correlated with alleles at three loci (Table 4).

TABLE 4. Correlation coefficients generated by canonical correlation analysis of mean haploid genotype scores with location variables for the first two significant canonical variables

Loci (rows): Fest, Pgm1, Pgm2, Pgi1, Lap1, Lap2, Pep2, Got1, Got2, Got3, G6pd, Gly, Cat, Gdh, Sod, Mpi, Fhex, 6pgd, Aco1, Aco2, Idh, Dia, Mdh1, Mdh2, Mdh3, Mdh4
Location variables (rows): latitude (L), distance to the ocean (D), elevation (E), their squares, L × D, L × E, D × E
[Per-locus and per-location-variable correlation coefficients could not be recovered from the extracted table layout.]

                          CAN1      CAN2
Eigenvalue                0.1939    0.1558
Percent of variance       21.64     17.38
Canonical correlation     0.4030    0.3671
Significance              0.0001    0.0012

NOTE: Only the largest correlation coefficient at each locus is shown.

Discussion

The application of multivariate techniques to isozyme data provides only a crude analysis of interlocus associations. Nevertheless, the inability of individual principal components to account for more than a few percent of the variation in mean haploid genotype scores and the fact that the principal components were dominated by alleles at a low number of loci suggest that linkage disequilibria among loci included in this study are weak or nonexistent.

However, this study provides evidence that canonical discriminant analysis may be more sensitive than single-locus methods to revealing allozyme variation associated with geography. Canonical discriminant analysis assigned a larger proportion of genotypic variation to breeding zones than did single-locus techniques. When we applied Nei's (1973) gene diversity analysis for subdivided populations, 0.73% of the variation in allele frequencies could be attributed to variation among breeding zones: 0.22% to breeding units, 0.51% to elevation bands within breeding units (Merkle and Adams 1987). In our canonical discriminant analysis, on the other hand, the first canonical variable accounted for over 15% of the variation in mean haploid scores, of which about 22% could be attributed to breeding-zone differences (Table 2). Thus, 3.3% (0.15 × 0.22 × 100%) of the total genetic variation could be assigned to breeding-zone differences in the first canonical variable alone.

Although it was possible through canonical discriminant analysis to assign a larger proportion of allozyme variation to differences among breeding zones, the patterns of variation revealed were only weakly associated with geographic variables. Cluster analysis based on Mahalanobis distances between pairs of breeding zones gave results quite similar to those found when clustering was based on a matrix of Nei's (1978) unbiased genetic distances involving the same loci (Merkle and Adams 1987). In both analyses, breeding zones from the same unit did not generally cluster together, and two zones from environmentally dissimilar breeding units clustered relatively early. The Mahalanobis distance-based clustering procedure improved upon that based on Nei's distances only in the segregation of the Grants Pass zones, as a pair, from the zones of the other breeding units. Similarly, neither regression of canonical variable scores against location variables nor canonical correlation analysis related genotypic and environmental variation.

Thus, despite evidence of genotypic variation among breeding zones of southwest Oregon Douglas-fir, the results suggest that no more than a minor proportion of the variation is of adaptive significance with regard to the geographic variables tested. Furthermore, because multivariate analyses could not clearly differentiate among breeding zones, application of such techniques does little to improve the utility of allozymes for certifying seed or designating breeding zones in Douglas-fir.

Our results contrast quite strongly with those reporting multivariate analyses of allozyme variation patterns in two other conifer species. Using canonical correlation analysis on ponderosa pine from 12 seed zones in California, Conkle and Westfall (1984) found that 49% of the variation in diploid genotype scores for 30 loci could be accounted for by just two canonical variables, both of which were associated with geographic variables. These authors concluded that multivariate analysis could be a very effective tool for correlating allozyme variation in ponderosa pine with geographic origin of seed and, further, that geographic variation in multilocus genotypes made it possible to assign individual trees to correct seed zones with high precision. Similarly, Yeh et al. (1985), working with 17 populations of lodgepole pine in British Columbia and the Yukon, reported that two significant canonical discriminant functions accounted for 38% of the variation in 20 polymorphic loci. The populations' contributions to the variance accounted for by the first and second canonical variables were 28% and 27%, respectively. Although we did find some significant correlations between sets of allozyme and location variables in southwest Oregon coastal Douglas-fir, our canonical correlation and discriminant analyses could not account for such large proportions of variation with so few canonical variables as in the ponderosa and lodgepole pine studies. Nor did our canonical discriminant analysis assign such large proportions of the variation to breeding-zone differences as could be assigned to population differences in the lodgepole pine study.
Several factors could have contributed to the disparity between our results and those reported in the two pine studies. For example, our study did not cover as large a geographic area as the other investigations. The latitudinal range of our study area was only 1.5°, whereas the ranges for the ponderosa and lodgepole pine studies were 7° and 10°, respectively. Thus, our results could be interpreted as evidence that the multivariate techniques we applied are not useful on such a small geographic scale and that, therefore, no conclusions can be drawn regarding the adaptive significance of the genetic variation. However, in a companion common garden study using the same families from this small region (Loopstra 1984), seedling quantitative traits showed strong clines over the same environmental gradients as tested here.

Another difference between our study and the two pine studies was the type of isozyme data analyzed. As noted earlier, only haploid genotypes were available to us, although we did have two haploid genotypes from each mother tree. Thus we could not make use of the genetic scoring method of Smouse et al. (1982), which is based on diploid genotypes. Although we eliminated within-tree variation by averaging the two haploid genotype scores, thereby reducing the within-breeding-zone component of variance, the resulting mean haploid scores were only crude approximations of diploid scores. More precise and complete data were available in the ponderosa and lodgepole pine analyses because diploid genotypes were employed.

Finally, differences in the study results may actually reflect basic differences in the genetic structure of the species at the allozyme level. Allozyme variation may simply not be as sensitive to geographic diversity in coastal Douglas-fir as it appears to be in ponderosa and lodgepole pine. This conclusion is supported by the results of a recent study on range-wide patterns of allozyme variation in Douglas-fir (Li 1986).
Although differences between the coastal and inland (var. glauca) varieties were quite distinct at the allozyme level, patterns of allozyme variation among populations representing the entire range of the coastal variety correlated poorly with geographic variables. Thus, it is not surprising that our southwest Oregon study also failed to provide evidence of the impact of selection on allozyme variation.

Acknowledgments

The authors thank Robert Westfall for sharing his expertise, and Allan Doerksen, Carol Loopstra, Deborah Smith, and Louise Cremo for technical assistance. Financial support for this research was provided by the United States Department of Interior, Bureau of Land Management, and the United States Department of Agriculture, Forest Service, under the auspices of the Southwest Oregon Forestry Intensified Research (FIR) Program (grant No. PNW-80-85).

CAMPBELL, R.K., and SORENSEN, F.C. 1978. Effect of test environment on expression of clines and on delimitation of seed zones in Douglas-fir. Theor. Appl. Genet. 51: 233-246.
CONKLE, M.T., and WESTFALL, R.D. 1984. Evaluating breeding zones for ponderosa pine in California. In Progeny Testing: Proceedings of Servicewide Genetics Workshop, Charleston, SC, December 5-9, 1983. USDA Forest Service, Washington, DC. pp. 89-98.
DRAPER, N.R., and SMITH, H. 1966. Applied regression analysis. John Wiley & Sons, New York.
EL-KASSABY, Y.A., and SZIKLAI, O. 1982. Genetic variation of allozyme and quantitative traits in a selected Douglas-fir (Pseudotsuga menziesii var. menziesii (Mirb.) Franco) population. For. Ecol. Manage. 4: 115-126.
FRANKLIN, J.F., and DYRNESS, C.T. 1973. Natural vegetation of Oregon and Washington. U.S. Dep. Agric. For. Serv. Gen. Tech. Rep. PNW-8.
GRIFFIN, A.R., and CHING, K.K. 1977. Geographic variation in Douglas-fir from the coastal ranges of California. I. Seed, seedling growth and hardiness characteristics. Silvae Genet. 26: 149-156.
HERMANN, R.K., and LAVENDER, D.P. 1968. Early growth of Douglas-fir from various altitudes and aspects in southern Oregon. Silvae Genet. 17(4): 143-151.
IRGENS-MOLLER, H. 1967. Patterns of height-growth initiation and cessation in Douglas-fir. Silvae Genet. 16: 56-58.
LEWONTIN, R.C. 1984. Detecting population differences in quantitative characters as opposed to gene frequencies. Am. Nat. 123: 115-124.
LI, P. 1986. Range-wide patterns of allozyme variation in Douglas-fir [Pseudotsuga menziesii (Mirb.) Franco]. M.S. thesis, Oregon State University, Corvallis.
LOOPSTRA, C. 1984. Patterns of variation within and among breeding zones of Douglas-fir in southwest Oregon. M.S. thesis, Oregon State University, Corvallis.
MERKLE, S.A., and ADAMS, W.T. 1987. Patterns of allozyme variation within and among Douglas-fir breeding zones in southwest Oregon. Can. J. For. Res. 17: 402-407.
NEI, M. 1973. Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. U.S.A. 70: 3321-3323.
NEI, M. 1978. Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics. 89: 583-590.
RAO, C.R. 1973. Linear statistical inference and its applications. John Wiley & Sons, New York.
SAS INSTITUTE, INC. 1982. SAS user's guide: statistics, 1982 edition. SAS Institute, Inc., Cary, NC.
SILEN, R.R., and WHEAT, J.G. 1979. Progressive tree improvement program in Douglas-fir. J. For. 77: 78-83.
SMOUSE, P.E., SPIELMAN, R.S., and PARK, M.H. 1982. Multiple locus allocation of individuals to groups as a function of the genetic variation within and differences among human populations. Am. Nat. 119: 445-463.
YANG, J.C., CHING, T.M., and CHING, K.K. 1977. Isoenzyme variation of coastal Douglas-fir. I. A study of geographic variation in three enzyme systems. Silvae Genet. 26: 10-18.
YEH, F.C., CHELIAK, W.M., DANCIK, B.P., ILLINGWORTH, K., TRUST, D.C., and PRYHITKA, B.A. 1985. Population differentiation in lodgepole pine, Pinus contorta spp. latifolia: a discriminant analysis of allozyme variation. Can. J. Genet. Cytol. 27: 210-218.
YEH, F.C., and O'MALLEY, D.M. 1980. Enzyme variation in natural populations of Douglas-fir, Pseudotsuga menziesii (Mirb.) Franco, from British Columbia. I. Genetic variation patterns in coastal populations. Silvae Genet. 29: 83-92.
