First report on marker/QTL analysis of MY1
Clare Nelson
5.1.06

Population characteristics

Of the original 500 lines, all but 156 were dropped after genotyping with a few SSRs, either because they did not segregate for parental alleles or because they carried nonparental alleles. Following genotyping with 155 SSRs, another 27 were dropped for anomalously high heterozygosity, outcrossing, and/or extreme skewness of segregation. These events strongly suggested that the genotype data from the remaining lines deserved careful study.

Marker genotype tests

Segregation

Allele segregation at a marker locus in a RI population is expected to be 1:1, with a small number of heterozygotes. The number of marker genotypes homozygous for RT0034 was thus expected to equal that for Cypress. In fact the ratio was 12298 : 6603, with 646 heterozygotes and 446 missing. The most extreme segregation in favor of RT0034 was 114:10:5 at a marker on chromosome 11. Only two markers favored Cypress, with the most extreme segregating 59:66. The distribution of skewed segregation across the genome was not random: neighboring markers generally had similar segregation ratios. This segregation distortion toward indica alleles in an indica x japonica progeny is not new in rice genetics. In at least one other cross, Moroberekan x CO39, RI progeny showed a strong bias toward the indica parent.

Similarity among lines

Several pairs of lines had identical genotypes at all but 2 to 6 of the ~150 marker loci. Of these, one set of three lines differed from one another at 4, 6, and 8 loci. Such similarity tests are almost never reported by analysts, probably because it never occurs to them to run them. I have previously seen similar anomalies in at least one other rice RI progeny.
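
The similarity test described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used for MY1: the genotype coding (one string of "A"/"B" calls per line, with "-" for missing), the function names, and the toy lines are all assumptions made for the example.

```python
from itertools import combinations


def count_differences(a, b, missing="-"):
    """Count loci where both lines are scored and their calls differ."""
    return sum(
        1
        for x, y in zip(a, b)
        if x != missing and y != missing and x != y
    )


def near_identical_pairs(genotypes, max_diff=6, missing="-"):
    """Return (line1, line2, n_diff) for every pair of lines that
    differs at no more than max_diff scored loci."""
    pairs = []
    for (n1, g1), (n2, g2) in combinations(sorted(genotypes.items()), 2):
        d = count_differences(g1, g2, missing)
        if d <= max_diff:
            pairs.append((n1, n2, d))
    return pairs


# Hypothetical toy data: three lines scored at ten loci.
toy = {
    "line1": "AABBAABBAB",
    "line2": "AABBAABBAA",
    "line3": "BBAABBAABA",
}

print(near_identical_pairs(toy, max_diff=2))
# prints [('line1', 'line2', 1)]
```

With about 150 loci, a cutoff such as max_diff=6 would flag pairs like those noted above; the right cutoff is a judgment call and depends on marker density and the missing-data rate.
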
I don’t think this similarity makes much difference to QTL analysis, especially since a test of correlation between all trait values for closely related pairs of lines gives no higher r statistic than the same test made on genotypically dissimilar lines; both values are in the range from the low-to-mid 0.90s to 1.0. However, I suggest that it is worth wondering why this nonrandom similarity occurs.

No QTLs for distribution shapes

The statisticians cleaning and preparing the data computed, for most traits, statistics for each individual line describing the variation among measurements within replicated plots for that line: standard deviations, coefficients of variation, ranges, and minimum/maximum values. There were in fact more sets of these summary statistics than of the mean values for the traits measured. While these statistics may be useful as reality checks on the data, it’s unlikely that any meaningful QTL will ever be found for such things (except “pseudo-QTLs” for min and max values, arising solely from the correlation of these with the mean). Genetically, it’s hard to imagine how a gene could govern the scatter of the distribution of an effect, and practically it’s unclear what we would do with such a QTL even if we found one. Would we want to select for narrow or broad scattering, for example?

Caution on trait-data preparation

If data values are left as formulas in the Excel file, sorting the trait data can change them to wrong numbers. Preparers should try to avoid this practice, but as insurance, analysts should also begin by making a copy of the data in which formulas are replaced with values.

Simple interval mapping (SIM) QTL analysis

SIM provides a fast and generally reliable overview of the QTL profile for a trait, showing where the main QTLs are, how strong their effects are, and which parent contributes the increasing allele. It should not be taken as a final analysis.
Still, experience has shown me that if SIM finds nothing approaching a QTL for a given trait, no more elaborate analysis will find one either. Full analysis should take into account the correlations among the traits analyzed, the variation among environments, possible QTL x QTL effects, possible pleiotropy, domain knowledge about the control and expression of the traits, including parental phenotypes (which suggest the expected source of superior QTL alleles), and an assessment of the threshold for declaration of a QTL, done by permutation testing or another method. I have not done these things in the preliminary scan.

In general a high correlation between traits, or across environments for a given trait, is reflected in similarity of the QTL profiles. A low correlation between replicates within an environment shows that the variation has very little genetic component, and you should waste no time looking for QTLs. Characteristically the correlations between reps within an environment are higher than those across environments. Where two correlation values are given in the table, they represent these two kinds of correlation. The suffixes AR, LA, and TX represent states, and the number following represents the replicate experiment within the state.

The attached Excel file contains a summary of SIM QTL results. “Consistency” refers to the similarity of the QTL profile or “contour” across reps or locations, whichever is being referred to.
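
The permutation testing mentioned above for setting a QTL declaration threshold can be sketched as follows. This is only an illustration under stated assumptions: it swaps in a simple single-marker statistic (the absolute difference in trait mean between the two homozygous classes) in place of a true interval-mapping LOD score, and the function names, genotype coding ("A"/"B"), and simulated data are hypothetical. The idea is to shuffle trait values among lines, record the genome-wide maximum of the statistic for each shuffle, and take an upper quantile of those maxima as the threshold.

```python
import random


def marker_stat(geno, pheno):
    """Absolute difference in trait mean between the A and B classes."""
    a = [p for g, p in zip(geno, pheno) if g == "A"]
    b = [p for g, p in zip(geno, pheno) if g == "B"]
    if not a or not b:
        return 0.0
    return abs(sum(a) / len(a) - sum(b) / len(b))


def genome_max(markers, pheno):
    """Largest single-marker statistic over all markers."""
    return max(marker_stat(g, pheno) for g in markers)


def permutation_threshold(markers, pheno, n_perm=1000, alpha=0.05, seed=0):
    """Approximate empirical (1 - alpha) quantile of the genome-wide
    maximum statistic when trait values are shuffled among lines."""
    rng = random.Random(seed)
    shuffled = list(pheno)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(genome_max(markers, shuffled))
    maxima.sort()
    idx = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[idx]


# Simulated toy data (assumption): 60 lines, 20 markers, with a strong
# effect planted at the first marker.
rng = random.Random(1)
n_lines, n_markers = 60, 20
markers = [[rng.choice("AB") for _ in range(n_lines)] for _ in range(n_markers)]
pheno = [(10.0 if g == "A" else 0.0) + rng.gauss(0, 1) for g in markers[0]]

observed = genome_max(markers, pheno)
threshold = permutation_threshold(markers, pheno, n_perm=200, alpha=0.05, seed=0)
print("observed max:", observed, "threshold:", threshold)
```

Because the maximum is taken over the whole genome in each shuffle, the resulting threshold accounts for testing many markers at once. A real analysis would compute the interval-mapping statistic inside the loop instead of the simple mean difference used here.
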