QTL analysis description, CN

First report on marker/QTL analysis of MY1
Clare Nelson
5.1.06
Population characteristics
Of the original 500 lines, all but 156 were dropped after genotyping with a few SSRs,
either because they did not segregate for parental alleles or because they carried
nonparental alleles. Following genotyping with 155 SSRs, another 27 were dropped for
anomalously high heterozygosity, outcrossing, and/or extreme skewness of segregation.
These events strongly suggested that the genotype data from the remaining lines
deserved careful study.
Marker genotype tests
Segregation
Allele segregation at a marker locus in an RI population is expected to be 1:1, with a
small number of heterozygotes. The number of marker genotypes homozygous for
RT0034 was thus expected to equal that for Cypress. In fact, over all genotype calls the
ratio was 12298 : 6603 in favor of RT0034, with 646 heterozygotes and 446 missing. The
most extreme segregation in favor of RT0034 was 114:10:5 at a marker on chromosome 11.
Only two markers favored Cypress, with the most extreme segregating 59:66. The
distribution of skewed segregation across the genome was not random: neighboring
markers generally had similar segregation ratios.
This segregation distortion toward indica alleles in an indica x japonica progeny is not
new in rice genetics. In at least one other cross, Moroberekan x CO39, RI progeny
showed a strong bias toward the indica parent.
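The departure from 1:1 in the counts quoted above can be checked with a one-degree-of-freedom chi-square test on the two homozygous classes. The sketch below is a minimal illustration, not the procedure used for this report. The function name is hypothetical, heterozygous and missing calls are ignored, and the pooled count treats genotype calls as independent, which calls at linked markers are not.

```python
import math

def chi_square_1to1(n_a, n_b):
    """Chi-square test (1 df) of two homozygote counts against a 1:1 ratio.

    Returns (chi2, p). Heterozygous and missing calls are assumed to have
    been excluded beforehand.
    """
    total = n_a + n_b
    if total == 0:
        raise ValueError("no genotype calls")
    # With equal expected counts the statistic reduces to (a - b)^2 / (a + b).
    chi2 = (n_a - n_b) ** 2 / total
    # A chi-square variate with 1 df is a squared standard normal,
    # so its upper tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Counts quoted in the text (RT0034 : Cypress homozygotes).
print(chi_square_1to1(12298, 6603))  # pooled over all markers
print(chi_square_1to1(59, 66))       # most extreme marker favoring Cypress
```

Run per marker, the same test gives a quick way to flag distorted regions, since neighboring markers should show similar statistics.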
Similarity among lines
Several pairs of lines had identical genotypes at all but 2 to 6 of the ~150 marker
loci. Of these, one set of three lines differed from one another at 4, 6, and 8 loci. Such
similarity tests are almost never reported by analysts, probably because it never occurs to
them to make them. I have previously seen similar anomalies in at least one other rice RI
progeny. I don’t think it makes much difference to QTL analysis, especially since a test
of correlation between all trait values for closely related pairs of lines gives no higher r
statistic than the same test made on genotypically dissimilar lines; both values fall
between the low-to-mid 0.90s and 1.0. However, I suggest that it is worth wondering why
this nonrandom similarity occurs.
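A similarity test of this kind amounts to counting, for every pair of lines, the loci at which their genotypes differ. The following is a minimal sketch under stated assumptions: the function name, the genotype codes, and the toy data are all hypothetical, and loci with a missing call in either line are simply left out of the comparison.

```python
from itertools import combinations

def near_identical_pairs(genotypes, max_mismatches=6):
    """List pairs of lines whose genotypes differ at few loci.

    genotypes: dict mapping line name -> sequence of calls, one per marker,
               all in the same marker order; None marks a missing call.
    Returns (line1, line2, mismatches, loci_compared) tuples, fewest
    mismatches first.
    """
    hits = []
    for (name1, g1), (name2, g2) in combinations(genotypes.items(), 2):
        # Compare only loci scored in both lines.
        scored = [(a, b) for a, b in zip(g1, g2)
                  if a is not None and b is not None]
        mismatches = sum(a != b for a, b in scored)
        if scored and mismatches <= max_mismatches:
            hits.append((name1, name2, mismatches, len(scored)))
    return sorted(hits, key=lambda h: h[2])

# Hypothetical toy data: "A" = RT0034 allele, "B" = Cypress allele.
toy = {
    "line1": ["A", "A", "B", "A", "B", "A"],
    "line2": ["A", "A", "B", "A", "B", "B"],
    "line3": ["B", "B", "A", "B", None, "A"],
}
print(near_identical_pairs(toy, max_mismatches=1))
```

With about 130 lines there are only a few thousand pairs, so the exhaustive comparison is cheap.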
No QTLs for distribution shapes
The statisticians cleaning and preparing the data computed, for most traits, statistics
for each individual line describing the variation among measurements within replicated
plots for that line: standard deviations, coefficients of variation, ranges, and
minimum/maximum values. There were in fact more sets of these summary statistics than
of the mean values for the traits measured. While these statistics may be useful as reality
checks on the data, it’s unlikely that any meaningful QTL will ever be found for such
things (except “pseudo-QTLs” for min and max values, arising solely from correlation of
these with the mean). Genetically, it’s hard to imagine how a gene could govern the
scatter of the distribution of an effect, and practically it’s unclear what we would do with
such a QTL even if we found one. Would we want to select for narrow or broad
scattering, for example?
Caution on trait-data preparation
If data values are left as formulas in the Excel file, sorting the trait data can change
them to wrong numbers. Preparers should try to avoid this practice, but as insurance,
analysts should also begin by making a copy of the data in which formulas are replaced
with values.
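One way to make such a values-only copy is with the openpyxl library, assuming it is installed and the workbook is an .xlsx file. This is a sketch, not a description of how the data for this report were handled, and the function and file names are hypothetical. Loading with data_only=True returns the values cached when Excel last saved the file, so formulas that Excel never calculated come back empty and the copy should be spot-checked.

```python
import os

def save_values_only_copy(src_path, dst_path):
    """Write a copy of an .xlsx workbook with formulas replaced by cached values."""
    if os.path.abspath(src_path) == os.path.abspath(dst_path):
        # Saving in place would destroy the formulas in the original file.
        raise ValueError("destination must differ from source")
    # Imported here so the dependency is needed only when a copy is made.
    from openpyxl import load_workbook

    # data_only=True makes each formula cell report the value Excel stored
    # at the last save instead of the formula text.
    workbook = load_workbook(src_path, data_only=True)
    # A workbook loaded this way is saved with values only.
    workbook.save(dst_path)

# Hypothetical file names:
# save_values_only_copy("trait_data.xlsx", "trait_data_values.xlsx")
```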
Simple interval mapping (SIM) QTL analysis
SIM provides a fast and generally reliable overview of the QTL profile for a trait,
showing where the main QTLs are, how strong their effects are, and which parent
contributes the increasing allele. It should not be taken as a final analysis. Still,
experience has shown me that if SIM finds nothing approaching a QTL for a given trait,
no more elaborate analysis will find one either.
Full analysis should take into account the correlations among the traits analyzed, the
variation among environments, possible QTL x QTL effects, possible pleiotropy, domain
knowledge about the control and expression of the traits, including parental phenotypes
(which suggest the expected source of superior QTL alleles), and an assessment of the
threshold for declaration of a QTL, done by permutation testing or another method. I have
not done these things in the preliminary scan.
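The idea behind a permutation threshold can be sketched briefly: shuffle the trait values among lines many times, record the largest test statistic found anywhere in the genome each time, and take an upper quantile of those maxima as the threshold. The example below is an illustration only. It substitutes a single-marker Welch t statistic for the SIM test statistic, codes the two homozygous classes as 0 and 1 (any other code is ignored), and uses hypothetical function names.

```python
import random
import statistics

def marker_t(trait, marker):
    """Absolute Welch t statistic comparing trait means of genotype classes 0 and 1."""
    g0 = [y for y, g in zip(trait, marker) if g == 0]
    g1 = [y for y, g in zip(trait, marker) if g == 1]
    if len(g0) < 2 or len(g1) < 2:
        return 0.0  # too few lines in a class to estimate a variance
    se = (statistics.variance(g0) / len(g0)
          + statistics.variance(g1) / len(g1)) ** 0.5
    if se == 0:
        return 0.0
    return abs(statistics.mean(g0) - statistics.mean(g1)) / se

def permutation_threshold(trait, markers, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold: the (1 - alpha) quantile of the maximum
    statistic over all markers when trait values are shuffled among lines."""
    rng = random.Random(seed)
    shuffled = list(trait)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any trait-genotype association
        maxima.append(max(marker_t(shuffled, m) for m in markers))
    maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[index]
```

Here `trait` is a list of line means and `markers` is a list of per-marker genotype lists in the same line order. Taking the maximum over markers in each permutation is what makes the threshold genome-wide rather than per-marker.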
In general a high correlation between traits, or across environments for a given trait,
is reflected in similarity of the QTL profiles. A low correlation between replicates within
an environment shows that the variation has very little genetic component, and you
should waste no time looking for QTLs. Characteristically the correlations between reps
within an environment are higher than those across environments. Where two correlation
values are given in the table, this is what they represent.
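The replicate correlations referred to here are ordinary Pearson correlations of line means between two experiments. A minimal sketch follows, assuming the trait values are held as two lists in the same line order with None for missing values; the function name and the toy numbers are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of paired values; pairs with a missing (None) value are skipped."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a column is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical line means for one trait in two Arkansas replicates (AR1, AR2).
ar1 = [10.2, 11.5, 9.8, 12.1, None, 10.9]
ar2 = [10.0, 11.9, 9.5, 12.4, 11.0, 10.7]
print(round(pearson_r(ar1, ar2), 3))
```

Computing this for replicates within a state before any QTL scan gives an early indication of whether a trait is worth analyzing at all.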
The suffixes AR, LA, and TX represent states, and the number following represents
the replicate experiment within state. In the attached Excel file is a summary of SIM QTL
results. “Consistency” refers to the similarity of the QTL profile or “contour” across reps
or locations, whichever is being referred to.