A genome-wide approach to identify pharmacogenetic contributions to adverse drug reactions in real time Matthew R. Nelson, Silviu-Alin Bacanu, Li Li, Clive E. Bowman, Michael Mosteller, Allen D. Roses, Eric H. Lai, Margaret G. Ehm Supplementary methods Whole-Genome Association Quality Assurance Using Probe Intensity Data (iQC) Genetic association studies are increasingly carried out on a genome-wide scale, wherein up to a million single nucleotide polymorphisms (SNPs) may be genotyped and tested. Although the development of technology and calling algorithms have resulted in relatively high average genotyping accuracy, various sources of experimental variability can result in genotyping errors, which may lead to false positive associations. This is particularly true with small sample sizes, such as in many pharmacogenetic studies, in which many of the most significantly associated results may be enriched with genotyping errors. We developed a quality index for rapid genotype quality assessment and detection of genotype errors that likely result in false positive association results using Affymetrix 500K intensity data. For a given marker, the probe intensity data for each allele is summarized in a 2-dimensional space. Only perfect-match data on a log2 scale is used. The mean log2 normalized intensity for each subject is calculated for alleles A and B, respectively, by averaging across all probes, subtracting the chip (subject) median to account for chip (subject) variation, and adding the global median across chips (subjects) to keep it positive. The normalized intensities are used to calculate the Euclidean distance matrix over all subjects. Based on the distance matrix and individual genotypes, the Silhouette width1 is obtained for each subject to measure the quality of genotype classification. A well classified subject is the one in the appropriate genotype cluster with Silhouette width ≥ 0. A quality index for each marker could be defined as the proportion of well classified genotypes of all subjects. For the case-control design considered here, the quality index is defined as the minimum of the proportion of well classified genotypes in cases or controls. By considering cases and controls separately, it is more likely to detect genotyping errors that have the greatest impact on association results, which can be beneficial for data with disproportional samples of cases and controls, particularly in PGx studies. Based on empirical data analyses, a cut off of quality index at 90% was recommended for a case-control design. For visualization, the mean adjusted intensities for each allele are plotted against each other, where each color represented a different genotype in the cluster plots. For a high quality marker, clusters of different colors (genotypes) should be separated from each other, indicating a high confidence in genotype calling. For a low quality marker, clusters of different colors (genotypes) might not have clear boundaries or part of the subjects may be misclassified to the wrong cluster. There are instances where the clustering suggests that the genotype calls are of poor quality even though the quality index may be high. Visual inspection is recommended for all top associations. References 1. Rousseeuw, P.J Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53–65