Supplementary Text S1
i. Discriminant analysis of principal components (DAPC)
To assess how the study samples clustered into classes and to gauge underlying structures in
genotype data, we applied DAPC [1] as a means of visualizing the level of similarity within
differently assembled versions of the dataset. DAPC relies on data transformation using
principal component analysis (PCA) as a prior step to discriminant analysis (DA). The
method can group observations into homogeneous classes derived from their distances to
generate a graphical representation of the relatedness between the inferred clusters. DAPC can
use training-set data to establish classification rules and test set(s) to gauge their
efficiency.
Initial DAPC runs on the training-set samples used all 63 pigmentation-related SNPs (in order
to assess the maximum separation of clusters possible from all available genetic data).
Subsequent analyses used subsets of the most closely associated markers identified by INB
assessments in this study, applying four and eight different hair colour categories. In order to
strengthen clustering, we retained the same number of PCs as SNPs analysed and two
discriminant functions in each simulation. DAPC calculations were performed using R
statistical software [2] (R v.3.0.1, http://www.r-project.org/), together with the adegenet
package (adegenet v.1.4-2, http://adegenet.r-forge.r-project.org/) [3,4].
ii. Linkage disequilibrium and haplotype block analysis
The 63 pigmentation-related SNPs were analysed for Hardy-Weinberg equilibrium (HWE)
and plots of inter-SNP linkage disequilibrium (LD) were prepared using Haploview [5]. A
default distance of 500 kilobases between markers was selected to compute LD statistics for
each chromosome.
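The two checks described above can be illustrated with a minimal Python sketch. It is not a reproduction of Haploview's internal calculations: the HWE test here is a plain 1-d.f. chi-square test on genotype counts, and LD is approximated by the squared correlation of genotype dosages (0, 1 or 2 copies of an allele) rather than by phased haplotype frequencies. The function names and the SNP labels in the usage note are hypothetical; only the 500 kb window is taken from the text.

```python
from itertools import combinations
from math import erfc, sqrt

MAX_DISTANCE_BP = 500_000  # 500 kb window between markers, as stated in the text


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi-square, p-value) for a 1-d.f. Hardy-Weinberg test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 d.f.
    return chi2, erfc(sqrt(chi2 / 2))


def dosage_r2(x, y):
    """Squared Pearson correlation of two genotype-dosage vectors.

    This is an approximation to LD r^2 (an assumption of this sketch);
    it does not estimate haplotype frequencies.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a monomorphic marker has no defined correlation
    return sxy * sxy / (sxx * syy)


def pairs_within_window(positions, max_bp=MAX_DISTANCE_BP):
    """Return same-chromosome SNP pairs no more than max_bp apart.

    positions maps a SNP name to a (chromosome, base-pair position) tuple.
    """
    pairs = []
    for a, b in combinations(sorted(positions), 2):
        (chrom_a, bp_a), (chrom_b, bp_b) = positions[a], positions[b]
        if chrom_a == chrom_b and abs(bp_a - bp_b) <= max_bp:
            pairs.append((a, b))
    return pairs
```

In use, `pairs_within_window` would first be called on a dictionary of marker positions (for example `{"snpA": ("15", 28000000), ...}`, with made-up labels and coordinates), and `dosage_r2` would then be evaluated only for the pairs it returns, so that LD is never computed across chromosomes or beyond the window.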
iii. Analysis of epistasis with multifactor dimensionality reduction (MDR)
Multifactor dimensionality reduction (MDR) was performed in order to detect and
characterise epistasis between SNPs in the final recommended hair colour predictive set of 12
markers. MDR was performed for each pairwise phenotype differentiation in the four-category
system (e.g. blond vs. non-blond). MDR is a non-parametric approach that permits
interactions to be detected, especially in relatively small sample sizes [6]. We applied the
MDR analysis module v.2.0 from www.epistasis.org.
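The core idea of MDR for a pair of SNPs can be sketched in a few lines of Python. This is a simplified illustration, not the MDR software used in the study: it pools each two-locus genotype combination into a "high-risk" or "low-risk" group by comparing its case:control ratio with the overall ratio, then scores the resulting one-dimensional classifier by balanced accuracy on the same data. The cross-validation and permutation testing that MDR analyses typically include are omitted, and all function and variable names are hypothetical. For a comparison such as blond vs. non-blond, `is_case` would mark the blond samples.

```python
from collections import Counter
from itertools import combinations


def mdr_pair_score(geno_a, geno_b, is_case):
    """Balanced accuracy of the MDR high/low-risk rule for one SNP pair.

    Assumes at least one case and one control are present.
    """
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    threshold = n_case / n_ctrl  # overall case:control ratio
    cases, ctrls = Counter(), Counter()
    for a, b, y in zip(geno_a, geno_b, is_case):
        (cases if y else ctrls)[(a, b)] += 1
    # A genotype cell is high risk when its case:control ratio
    # is at least the overall ratio.
    high_risk = {
        cell
        for cell in set(cases) | set(ctrls)
        if cases[cell] >= threshold * ctrls[cell]
    }
    true_pos = sum(cases[cell] for cell in high_risk)
    true_neg = sum(n for cell, n in ctrls.items() if cell not in high_risk)
    return 0.5 * (true_pos / n_case + true_neg / n_ctrl)


def best_pair(genotypes, is_case):
    """Return (pair of SNP names, score) with the highest balanced accuracy.

    genotypes maps a SNP name to a list of genotype codes, one per sample.
    """
    scored = (
        ((a, b), mdr_pair_score(genotypes[a], genotypes[b], is_case))
        for a, b in combinations(sorted(genotypes), 2)
    )
    return max(scored, key=lambda item: item[1])
```

Because the score is computed on the data used to define the risk groups, it is optimistic; a fuller analysis would estimate accuracy on held-out folds before comparing SNP pairs.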
REFERENCES
1. Jombart T, Devillard S, Balloux F (2010) Discriminant analysis of principal components: a
new method for the analysis of genetically structured populations. BMC Genet 11:94.
doi:10.1186/1471-2156-11-94
2. R Core Team (2014) R: A language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria
3. Jombart T (2008) adegenet: a R package for the multivariate analysis of genetic markers.
Bioinformatics 24 (11):1403-1405. doi:10.1093/bioinformatics/btn129
4. Jombart T, Ahmed I (2011) adegenet 1.3-1: new tools for the analysis of genome-wide SNP
data. Bioinformatics 27 (21):3070-3071. doi:10.1093/bioinformatics/btr521
5. Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of LD
and haplotype maps. Bioinformatics 21 (2):263-265. doi:10.1093/bioinformatics/bth457
6. Moore JH, Gilbert JC, Tsai CT, Chiang FT, Holden T, Barney N, White BC (2006) A
flexible computational framework for detecting, characterizing, and interpreting statistical
patterns of epistasis in genetic studies of human disease susceptibility. Journal of theoretical
biology 241 (2):252-261. doi:10.1016/j.jtbi.2005.11.036