Association mapping approach for analysis of QTL determining fatty acid
composition and oil content in sunflower seeds
Zambelli, A.1, Kaspar, M.1, Grondona, M.1, Reid, R.1 and León, A.1
1 Biotech Research Center, Advanta Semillas-Nutrisun Business Unit, Ruta 226 Km 60.5 (7620) Balcarce,
Argentina. andres.zambelli@advantasemillas.com.ar




Most traits of agricultural importance are controlled by multiple quantitative trait loci (QTL).
The aim of many genetic mapping studies is to identify the QTL responsible for phenotypic
variation, facilitating genome-aided breeding for crop improvement. Association mapping (AM)
is a powerful genetic mapping tool based on the analysis of statistical associations between
genotypes and phenotypes in a collection of individuals. Genetic mapping has usually been done
in specific populations, such as the progeny of parents that differ for the trait of interest. By
contrast, AM can be applied to sets of individuals such as natural populations or
germplasm collections.
In the present work, an AM approach was tested in a collection of sunflower inbred lines
that were genotyped with a set of about 4000 SNPs and phenotyped for seed fatty acid composition
and oil content.
This allowed the identification of two main QTL associated with stearic and oleic acid content, in
agreement with previous data, validating the use of genome-wide association (GWA) techniques
for QTL analysis of this quality trait. This mapping approach
also allowed the identification of QTL associated with oil content.
Use of AM will contribute to a better definition of QTL or even to the identification of candidate
genes associated with fatty acid composition and oil content in sunflower, leading to a better
understanding of these complex traits and to the development of new oil qualities.
Key words: genetic diversity; genome-wide association; oil quality; mapping
INTRODUCTION
Even though regular sunflower oil has traditionally been appreciated, emerging oilseed markets are
demanding new oil qualities for both food and non-food applications. Several sunflower mutants with
modified fatty acid composition have been generated by treatment with ionizing radiation or by chemical
mutagenesis. Among them, high stearic, high oleic and high stearic-high oleic sunflower mutants appear
to be the most important ones (Fernández-Martínez et al., 2007; Garcés et al., 2009).
Seed oil yield and grain yield are the two main targets of sunflower
breeding programs. Seed oil percentage has a medium to high heritability and predominantly additive
gene action, which facilitates selection in early generations.
Most traits of agricultural importance are controlled by multiple quantitative trait loci (QTL). The aim of
many genetic mapping studies is to identify the QTL responsible for phenotypic variation, facilitating
genome-aided breeding for crop improvement. A powerful new genetic mapping tool is association mapping (AM)
(Abdurakhmonov and Abdukarimov, 2008; Myles et al., 2009). AM refers to the analysis of statistical
associations between genotypes and phenotypes determined in a collection of individuals. Genetic
mapping has usually been done in specific populations, such as the progeny of parents that differ for the
trait of interest. By contrast, AM can be applied to sets of individuals such as natural populations
or germplasm collections. This is an advantage over linkage or family mapping (Myles et al.,
2009).
Two AM methodologies are in use: candidate gene and genome-wide association (GWA). The former
assumes a good understanding of the biochemistry and genetics of the trait, while GWA involves testing
most segments of the genome for association with the trait. The hypothesis under consideration is: ‘one (or
more) of the genetic loci being considered is either causal for the trait or in linkage disequilibrium with
the causal loci’. The strategy of a GWA study is to genotype enough markers across the genome so that
functional alleles will likely be in linkage disequilibrium (LD) with at least one of the genotyped markers.
The first step in this process is therefore the discovery of a large number of genetic markers, typically
single nucleotide polymorphisms (SNPs), as a reference resource.
LD is the basis of genetic mapping, as it is required to detect markers closely linked to QTL. LD is the
non-random association of alleles at different loci, which recombination breaks down over generations. The key
difference between association and family mapping is the control the experimenter has over
recombination. In AM, genotype and phenotype data are collected from a population in which relatedness
is not manipulated, so there is no control over LD (Myles et al., 2009). Investigations carried out in
different crops showed that LD decreases with the distance between loci and that the rate of decay is
slower in inbred lines than in wild populations. It was proposed that LD decay in modern sunflower is
sufficient for very high density genetic mapping and high-resolution AM, which can be achieved with
marker densities lower than those usually reported in the literature (Kolkman et al., 2007; Fusari et al.,
2008).
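To make the notion of LD concrete, the following minimal Python sketch computes the disequilibrium coefficient D and the squared correlation r² between two biallelic SNPs. It assumes fully homozygous inbred lines, so that each line contributes one haplotype. The function name `ld_r2`, the 0/1 allele coding and the genotype vectors are hypothetical illustrations, not data or code from this study.

```python
def ld_r2(locus_a, locus_b):
    """Return (D, r2) for two biallelic loci coded 0/1 on the same inbred lines."""
    if len(locus_a) != len(locus_b) or not locus_a:
        raise ValueError("loci must be scored on the same non-empty set of lines")
    n = len(locus_a)
    p_a = sum(locus_a) / n  # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n  # frequency of allele 1 at locus B
    # frequency of lines carrying allele 1 at both loci
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # deviation from independent association
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Hypothetical genotypes for eight inbred lines (not data from this study).
snp1 = [1, 1, 1, 1, 0, 0, 0, 0]
snp2 = [1, 1, 1, 0, 0, 0, 0, 1]  # partially associated with snp1
snp3 = [1, 0, 1, 0, 1, 0, 1, 0]  # independent of snp1

for name, other in (("snp2", snp2), ("snp3", snp3)):
    d, r2 = ld_r2(snp1, other)
    print(f"snp1 vs {name}: D = {d:.3f}, r2 = {r2:.3f}")
```

An r² close to 1 means one marker is a good proxy for the other, which is the property a GWA study relies on when the causal locus itself is not genotyped.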
AM exploits the recombination events that have occurred throughout evolutionary history, providing higher mapping
resolution (Myles et al., 2009). However, the uncontrolled population design can result in spurious
signals of association in downstream analysis. False positive associations between markers and traits can
arise from population structure caused by selection, plant improvement, etc. Accounting for
population structure is therefore a critical prerequisite in association analyses. Moreover, if
pedigree data from inbred lines can be reconstructed and used, type I and type II error rates are better
controlled (Yu et al., 2006). Another factor that increases the false positive rate is the
occurrence of alleles at very low frequency.
Successful application of AM also requires comprehensive phenotypic data. In fact, this is another
benefit of AM, because it can be based on historical breeding trials, which abound in companies and crop
improvement centers. The methods for marker-trait association may differ for discrete and quantitative
traits. Thus, different statistical analyses have been applied: contingency tables, ANOVA, general linear
models and mixed linear models, among others. In this way, AM is a complementary approach to linkage
analysis in terms of providing prior knowledge, cross-validation and statistical power for QTL
investigation.
The objective of the present study was to perform a whole-genome association mapping analysis for
oil quality and oil content in a set of sunflower inbred lines to identify the genomic regions associated with
these traits.
MATERIAL AND METHODS
Phenotype data
A set of 89 inbred lines (high linoleic conventional sunflower) was selected to represent the total
genetic variability present in the Advanta sunflower breeding program. The selection was based on allele
sharing distances obtained from SSR genotypes and on pedigree information. For oil quality analysis, twenty
plants from each line were sown in 2008 in Balcarce (two planting dates) and Venado Tuerto, Argentina.
Seeds from each line were harvested in bulk and the fatty acid composition was evaluated by gas
chromatography.
Out of the 89 lines sown at the second date in Balcarce, 74 produced enough seeds (50 g) for measuring
oil content in duplicate by the Soxhlet method.
Genotype data
A set of SNPs was discovered and validated by a multi-company Consortium led by The University of
Georgia (Athens, GA, US). The 89 conventional inbred lines from Advanta's germplasm were genotyped
by the Consortium with 6984 SNPs using a high-throughput genotyping system (Illumina GoldenGate).
Population structure
To control for spurious associations, population structure was investigated using the
STRUCTURE software (Pritchard et al., 2000). The analysis showed that there were four subpopulations
among the Advanta inbred lines. The proportion of membership of each line in every subpopulation was
used to classify the lines, and the resulting subpopulation factor was included as a covariate in the association
mapping model.
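The classification step can be sketched as follows. The text does not state the exact rule used to classify lines, so this Python sketch assumes that each line is assigned to the subpopulation with its largest membership proportion. The function name, line names and membership values are hypothetical.

```python
def assign_subpopulation(membership):
    """Return the 1-based index of the subpopulation with the largest membership.

    Assumption: a line is classified by its highest membership proportion;
    the original classification rule is not specified in the text.
    """
    if not membership:
        raise ValueError("membership proportions are required")
    best = max(range(len(membership)), key=lambda k: membership[k])
    return best + 1


# Hypothetical membership proportions for four subpopulations (each row sums to 1).
q_matrix = {
    "line_A": [0.85, 0.05, 0.05, 0.05],
    "line_B": [0.10, 0.20, 0.60, 0.10],
    "line_C": [0.25, 0.25, 0.10, 0.40],
}

# The resulting labels form the categorical covariate for the association model.
subpop = {name: assign_subpopulation(q) for name, q in q_matrix.items()}
print(subpop)
```

Lines with nearly equal membership in several subpopulations are admixed, and a hard assignment of this kind discards that information; using the proportions themselves as covariates is a common alternative.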
Association analysis
Associations between each trait and all SNP markers were analyzed using an additive linear regression
model fitted with the mlreg function of the GenABEL package (http://www.genabel.org/) for the R
statistical software.
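As an illustration of what an additive single-SNP test with a subpopulation covariate computes, here is a minimal pure-Python sketch. It is not the GenABEL implementation: it regresses the trait on allele dosage (0 or 2 copies in homozygous lines) after centering both within each subpopulation, which gives the same SNP slope as fitting subpopulation as a categorical covariate. The function name and all data values are hypothetical.

```python
import math
from collections import defaultdict


def additive_snp_test(trait, dosage, subpop):
    """Return (slope, t_statistic, residual_df) for one SNP.

    Centering trait and dosage within each subpopulation is equivalent to
    including subpopulation as a categorical covariate in the regression.
    """
    groups = defaultdict(list)
    for i, label in enumerate(subpop):
        groups[label].append(i)
    y_c = [0.0] * len(trait)
    x_c = [0.0] * len(trait)
    for idx in groups.values():
        y_mean = sum(trait[i] for i in idx) / len(idx)
        x_mean = sum(dosage[i] for i in idx) / len(idx)
        for i in idx:
            y_c[i] = trait[i] - y_mean
            x_c[i] = dosage[i] - x_mean
    sxx = sum(x * x for x in x_c)
    if sxx == 0:
        raise ValueError("SNP does not vary within subpopulations")
    slope = sum(x * y for x, y in zip(x_c, y_c)) / sxx
    rss = sum((y - slope * x) ** 2 for x, y in zip(x_c, y_c))
    # one parameter per subpopulation mean plus one for the SNP slope
    df = len(trait) - len(groups) - 1
    if df <= 0:
        raise ValueError("not enough lines for the number of subpopulations")
    se = math.sqrt(rss / df / sxx)
    t_stat = slope / se if se > 0 else math.inf
    return slope, t_stat, df


# Hypothetical data: eight lines in two subpopulations, trait in arbitrary units.
subpop = [1, 1, 1, 1, 2, 2, 2, 2]
dosage = [0, 0, 2, 2, 0, 0, 2, 2]
trait = [40, 42, 44, 46, 45, 47, 49, 51]

slope, t_stat, df = additive_snp_test(trait, dosage, subpop)
print(f"effect per allele copy = {slope:.2f}, t = {t_stat:.2f}, df = {df}")
```

In this toy data set the two subpopulations differ in mean trait value, but the within-subpopulation slope is unaffected by that difference, which is the purpose of the covariate. A p-value would follow from the t distribution with the returned degrees of freedom.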
For oil content in particular, once associated regions were detected, the SNP with the
strongest association with the trait was selected in each region. For every selected SNP, the effect of the
favorable and unfavorable alleles was studied, and the percentage of total variation explained by the SNP was determined with the
following model:
$$\text{OilContent}_{ij} = \mu + \text{SNP}_i + \varepsilon_{ij}$$
where $\mu$ is the mean oil content, $\text{SNP}_i$ is the effect of the ith genotype on the mean oil content, and $\varepsilon_{ij}$ is the
residual error. Additionally, models combining 2, 3 or 4 SNPs were studied.
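For a single SNP, the percentage of variation explained by this kind of one-way model is the ratio of the between-genotype sum of squares to the total sum of squares. The sketch below shows that calculation in Python. The function name, genotype labels and oil content values are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def variance_explained(oil_content, genotype):
    """Return R^2 of the one-way model: between-genotype SS / total SS."""
    n = len(oil_content)
    if n == 0 or n != len(genotype):
        raise ValueError("one genotype is required per phenotyped line")
    grand_mean = sum(oil_content) / n
    ss_total = sum((y - grand_mean) ** 2 for y in oil_content)
    if ss_total == 0:
        raise ValueError("trait does not vary")
    by_genotype = {}
    for y, g in zip(oil_content, genotype):
        by_genotype.setdefault(g, []).append(y)
    # each genotype class contributes n_i * (class mean - grand mean)^2
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in by_genotype.values()
    )
    return ss_between / ss_total


# Hypothetical oil content (%) for eight homozygous lines at one SNP.
genotype = ["AA", "AA", "AA", "AA", "GG", "GG", "GG", "GG"]
oil = [42, 48, 44, 46, 44, 50, 46, 48]

r2 = variance_explained(oil, genotype)
print(f"variation explained by the SNP: {100 * r2:.1f}%")
```

Models combining several SNPs would need a multiple-regression fit, because the contributions of correlated SNPs do not simply add up.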
RESULTS
SNP genotyping
Eighty-nine Advanta inbred lines were genotyped with all the SNPs generated by the Consortium. The
SNPs were then selected for quality based on the rate of missing data (no call) and of heterozygous genotypes
(see table below). After this filtering, 4076 SNPs were retained and used for the genotype-phenotype
association analysis (Table 1).
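A per-SNP quality filter of this kind can be sketched in Python as follows. The thresholds follow the 10% and 90% cut-offs listed in Table 1, but the function name, the genotype encoding (two-letter calls, `None` for missing) and the choice to compute all rates over the total number of lines are assumptions made for illustration.

```python
from collections import Counter


def passes_qc(calls, max_missing=0.10, max_het=0.10, max_major=0.90):
    """Return True if one SNP passes the missing-data, heterozygosity and
    major-genotype filters. `calls` holds one genotype per line, e.g. "AA",
    "AG", or None for a missing call (encoding assumed for this sketch).
    """
    n = len(calls)
    if n == 0:
        raise ValueError("no genotype calls")
    missing = sum(1 for c in calls if c is None)
    het = sum(1 for c in calls if c is not None and c[0] != c[1])
    counts = Counter(c for c in calls if c is not None)
    major = max(counts.values()) if counts else 0
    return (
        missing / n < max_missing  # fewer than 10% of lines without a call
        and het / n < max_het      # fewer than 10% heterozygous lines
        and major / n < max_major  # no genotype present in 90% or more of lines
    )


# Hypothetical calls for 20 lines at four SNPs.
good = ["AA"] * 12 + ["GG"] * 7 + [None]
too_missing = ["AA"] * 10 + ["GG"] * 7 + [None] * 3
too_het = ["AA"] * 9 + ["GG"] * 8 + ["AG"] * 3
near_monomorphic = ["AA"] * 19 + ["GG"]

for name, calls in [("good", good), ("too_missing", too_missing),
                    ("too_het", too_het), ("near_monomorphic", near_monomorphic)]:
    print(name, passes_qc(calls))
```

Removing near-monomorphic SNPs matters for the association test because rare alleles are carried by very few lines, so a single extreme phenotype can produce a spurious signal.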
Table 1. Selection of the SNPs used for AM in the 89 genotyped inbred lines. SNPs were retained if they
mapped to a single locus, had missing data in fewer than 10% of the lines, were heterozygous in fewer than
10% of the lines and had an allele frequency higher than 10%.
SNP category                                           SNPs   Change (% of preceding subtotal)
SNPs mapped in one position                            6984
SNPs not genotyped (missing data for all the lines)     163   (-2.33%)
SNPs with 10% of missing data or more                   391   (-5.6%)
SNPs with less than 10% of missing data                6430
SNPs with 50% or more in heterozygosis                  774   (-12.04%)
SNPs with 25-50% in heterozygosis                       612   (-9.52%)
SNPs with 10-25% in heterozygosis                       409   (-6.36%)
SNPs with less than 10% in heterozygosis               4635
SNPs with one genotype in 95-100% of the lines          334   (-7.2%)
SNPs with one genotype in 90-95% of the lines           225   (-4.85%)
SNPs effectively used for the AM analysis              4076
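The arithmetic of Table 1 can be checked with a short Python sketch. It uses only the counts and percentages reported in the table. The stage labels and the function name are hypothetical, and the sketch assumes that each percentage refers to the number of SNPs entering the corresponding filtering stage (6984, 6430 and 4635).

```python
# Counts removed and percentages as reported in Table 1, grouped by filter stage.
# Stage labels are illustrative names, not terms used in the table.
STAGES = [
    ("missing data", [(163, 2.33), (391, 5.6)]),
    ("heterozygosity", [(774, 12.04), (612, 9.52), (409, 6.36)]),
    ("near-monomorphic", [(334, 7.2), (225, 4.85)]),
]


def run_cascade(start, stages):
    """Return (final_count, rows); each row is
    (stage, removed, reported_pct, computed_pct)."""
    remaining = start
    rows = []
    for name, removals in stages:
        base = remaining  # percentages are relative to the count entering the stage
        for removed, reported_pct in removals:
            rows.append((name, removed, reported_pct, 100.0 * removed / base))
            remaining -= removed
    return remaining, rows


final_count, rows = run_cascade(6984, STAGES)
for name, removed, reported, computed in rows:
    print(f"{name:17s} removed {removed:4d}  reported {reported:5.2f}%  computed {computed:5.2f}%")
print("SNPs retained:", final_count)
```

Under that assumption, the computed percentages agree with the reported ones to within rounding, and the cascade ends at the 4076 SNPs used for the analysis.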
Oil quality
Statistical analyses for stearic and oleic acid content showed signals of association with markers located
on different linkage groups (LG), which were consistently the same across the environments assayed. In
particular, strong signals were found on LGs 1 and 14, in agreement with candidate genes and QTL
previously reported (Fig. 1). On LG 1, a chromosome region with a strong signal of
association with stearic acid content was identified, closely located to the mapped stearoyl-ACP desaturase
locus (Pérez-Vich et al., 2002); this enzyme is involved in the conversion of stearic acid into oleic acid. The AM
approach also allowed the identification of a chromosome region on LG 14 with a strong signal for oleic
acid content, coinciding with the mapped oleoyl-PC desaturase (Hongtrakul et al., 1998), which catalyzes
the desaturation of oleic acid into linoleic acid.
Fig. 1. Association between SNPs located on LG 1 (a) and LG 14 (b) and stearic and
oleic acid content, expressed as the negative logarithm of the p-value (-log(p-value)) of the marker
effect in individual-environment analyses. SNP chromosome position is indicated in centiMorgans (cM).
The -log(p-value) scale is indicated in the colored right column.
Oil content
Statistical model analysis for oil content identified signals of association with SNPs in regions of different
LGs and with different degrees of significance. Some of the regions identified are in agreement with
previous QTL analyses. Among all the regions identified, four were chosen because of their high significance.
For each region, the most significant SNP was selected. Each of these SNPs explained between 15% and 21% of
the total phenotypic variation. The analysis of the haplotypes of the 74 lines showed that lines with
the unfavorable haplotype (unfavorable allele at the 4 selected SNPs) had an average oil content 10%
lower than lines with the favorable haplotype (Fig. 2).
Fig. 2. Mean seed oil content increases as favorable SNP alleles accumulate up to 4 homozygous alleles.
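The kind of summary shown in Fig. 2 can be sketched in Python: for each line, count the selected SNPs at which it is homozygous for the favorable allele, then average oil content within each count. The function name, the genotypes, the favorable genotypes and the oil content values below are hypothetical and are not results from this study.

```python
from collections import defaultdict


def mean_oil_by_favorable_count(lines, favorable):
    """Return {number of favorable homozygous SNPs: mean oil content}.

    `lines` is a list of (genotypes, oil_content) pairs, with one genotype per
    selected SNP; `favorable` gives the favorable homozygous genotype per SNP.
    """
    totals = defaultdict(lambda: [0.0, 0])  # count -> [sum of oil content, n lines]
    for genotypes, oil in lines:
        if len(genotypes) != len(favorable):
            raise ValueError("one genotype is required per selected SNP")
        k = sum(1 for g, f in zip(genotypes, favorable) if g == f)
        totals[k][0] += oil
        totals[k][1] += 1
    return {k: s / n for k, (s, n) in sorted(totals.items())}


# Hypothetical favorable genotypes at four SNPs and six phenotyped lines.
favorable = ("AA", "GG", "CC", "TT")
lines = [
    (("AA", "GG", "CC", "TT"), 50.0),
    (("AA", "GG", "CC", "TT"), 52.0),
    (("AA", "GG", "AA", "TT"), 48.0),
    (("GG", "AA", "CC", "CC"), 47.0),
    (("GG", "AA", "AA", "CC"), 44.0),
    (("GG", "AA", "AA", "CC"), 46.0),
]

means = mean_oil_by_favorable_count(lines, favorable)
print(means)
```

With real data, some counts may be represented by very few lines, so the group means should be read together with the group sizes.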
DISCUSSION
The AM approach allowed the identification of two main QTL associated with stearic and oleic
acid content, in agreement with previous data, validating the use of GWA techniques for QTL analysis
of fatty acid composition in sunflower oil. Some minor QTL were also detected, which still need to be
validated. In addition, AM analysis identified QTL associated with oil content.
As large-scale genotyping becomes affordable, it is clear that the collection of high-quality
phenotype data will be the main bottleneck of a given mapping study. It is highly recommended that
experimental design begin with the selection of germplasm with appropriate levels of relatedness and the
generation of high-quality phenotype data, as these factors will be major determinants of the power to identify QTL. Use of
the AM approach will contribute to a better definition of QTL or even to the identification of candidate genes
associated with fatty acid composition and oil content in sunflower, leading to a better understanding of
these complex traits. The results of this study will be useful not only as a source of information about the
genetics of the traits but also in marker-assisted breeding programs.
REFERENCES
Abdurakhmonov, I.Y. and Abdukarimov, A. 2008. Application of association mapping to understanding
the genetic diversity of plant germplasm resources. Int. J. Plant Genomics 2008:574927.
Fernández-Martínez, J.M., Pérez-Vich, B., Velasco, L. and Domínguez, J. 2007. Breeding for specialty
oil types in sunflower. Helia 30:75-84.
Fusari, C.M., Lia, V.V., Hopp, H.E., Heinz, R.A. and Paniego, N.B. 2008. Identification of single
nucleotide polymorphisms and analysis of linkage disequilibrium in sunflower elite inbred lines using the
candidate gene approach. BMC Plant Biol. 8:7.
Garcés, R., Martínez-Force, E., Salas, J.J. and Venegas-Calerón, M. 2009. Current advances in sunflower
oil and its applications. Lipid Technol. 21:79-82.
Hongtrakul, V., Slabaugh, M.B. and Knapp, S.J. 1998. A seed specific delta-12 oleate desaturase gene is
duplicated, rearranged and weakly expressed in high oleic acid sunflower lines. Crop Sci. 38:1245-1249.
Kolkman, J.M., Berry, S.T., Leon, A.J., Slabaugh, M.B., Tang, S., Gao, W., Shintani, D.K., Burke, J.M.
and Knapp, S.J. 2007. Single nucleotide polymorphisms and linkage disequilibrium in sunflower.
Genetics 177:457-468.
Myles, S., Peiffer, J., Brown, P.J., Ersoz, E.S., Zhang, Z., Costich, D.E. and Buckler, E.S. 2009.
Association mapping: critical considerations shift from genotyping to experimental design. Plant Cell
21:2194-2202.
Pérez-Vich, B., Fernández-Martínez, J.M., Grondona, M., Knapp, S.J. and Berry, S.T. 2002. Stearoyl-ACP
and oleoyl-PC desaturase genes cosegregate with quantitative trait loci underlying high stearic and
high oleic acid mutant phenotypes in sunflower. Theor. Appl. Genet. 104:338-349.
Pritchard, J.K., Stephens, M. and Donnelly, P. 2000. Inference of population structure using multilocus
genotype data. Genetics 155:945-959.
Yu, J., Pressoir, G., Briggs, W.H., Vroh-Bi, I., Yamasaki, M., Doebley, J.F., McMullen, M.D., Gaut, B.S.,
Nielsen, D.M., Holland, J.B., Kresovich, S. and Buckler, E. 2006. A unified mixed-model method for
association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38:203-208.