Supplemental Materials
The genetic architecture of pediatric cognitive abilities in the Philadelphia
Neurodevelopmental Cohort
Elise B. Robinson, ScD a,b,1, Andrew Kirby, BA a,b, Kosha Ruparel, MSE c, Jian Yang, PhD d,
Lauren McGrath, PhD e, Verneri Anttila, PhD a,b,f, Benjamin M. Neale, PhD a,b, Kathleen
Merikangas, PhD g, Thomas Lehner, PhD h, Patrick M.A. Sleiman, PhD i, Mark J. Daly, PhD a,b,
Ruben Gur, PhD c, Raquel Gur, MD, PhD c, Hakon Hakonarson, MD, PhD i
1 For inquiries regarding this report: erobinson@atgu.mgh.harvard.edu
a) Analytic and Translational Genetics Unit, Massachusetts General Hospital and
Department of Medicine, Harvard Medical School, Boston, MA 02114.
b) Stanley Center for Psychiatric Research and Medical and Population Genetics Program,
Broad Institute of MIT and Harvard, Cambridge, MA 02142.
c) Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania,
Philadelphia, PA, 19104.
d) Queensland Brain Institute, University of Queensland, Brisbane, Australia.
e) Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research
and Department of Psychiatry, Massachusetts General Hospital, Boston, MA 02114.
f) Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland.
g) Genetic Epidemiology Research Branch, Intramural Research Program, National
Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892.
h) Office of Genomics Research Coordination, National Institute of Mental Health, National
Institutes of Health, Bethesda, MD 20892.
i) Center for Applied Genomics, The Children’s Hospital of Philadelphia, Philadelphia, PA,
19104.
Contents
1. Summary of Computerized Neurocognitive Battery (CNB) Measures
2. Genotyping and Imputation
3. Genotypic Principal Components Analyses
4. Unrotated Principal Components Analysis
5. Common Factor and Rotated Principal Components Analyses
6. Principal Component Loadings for Accuracy Traits; Phenotypic Correlation Matrix
7. Comparison of magnitude between phenotypic and genotypic correlations
8. Phenotypic and Genetic Association between Domains and Factor Scores
9. Variation Explained by Genic and Intergenic SNPs
1) Summary of Computerized Neurocognitive Battery (CNB) Measures
The CNB was developed by Ruben Gur, Raquel Gur and colleagues at the University
of Pennsylvania. Its component measures are described in Gur et al. 2010 and 2012.
The Wide Range Achievement Test (WRAT) is described by Wilkinson and
Robertson (2006). Cronbach’s alpha was estimated based on the consistency of
correct responses on each measure.
Table S1. Reliability of CNB measures
Trait                               Measure (1, 2)                     Cronbach’s Alpha
Abstraction and Mental Flexibility  Penn Conditional Exclusion Test    NA
Attention                           Penn Continuous Performance Test   0.94
Working Memory                      Letter N-Back                      0.87
Facial Memory                       Penn Face Memory Test              0.68*
Spatial Memory                      Visual Object Learning Test        0.52
Verbal Memory                       Penn Word Memory Test              0.80
Language Reasoning                  Penn Verbal Reasoning Test         0.77*
Nonverbal Reasoning                 Penn Matrix Reasoning Test         0.86*
Spatial Reasoning                   Penn Line Orientation Test         0.93*
Age Differentiation                 Penn Age Differentiation Test      0.78*
Emotional Differentiation           Penn Emotion Differentiation Test  0.81
Emotional Identification            Penn Emotion Identification Test   0.75*
Wide Range Achievement Test         Wide Range Achievement Test (3)    NA
Note: NA=Not Applicable. Cronbach’s alpha was not calculated for abstraction and
mental flexibility because the test is designed to elicit incorrect responses at various
points, several stimuli are repeated throughout the test, and the length of the test
varies between individuals based on their performance (2). The reading items from
the WRAT were used in this analysis; see Wilkinson and Robertson (3) for details and
validation. Multiple, highly similar forms were used for several of the measures,
indicated by asterisks following the reliability estimates. This approach was used to
prevent learning effects in longitudinal (follow-up) research using the cohort. Each
value with an asterisk is the average of the form-specific Cronbach’s alpha estimates,
weighted by the number of individuals who took each form of the test. No more than
4 forms were used for any individual measure, and the range of the form-specific
alphas did not exceed 0.12 per measure.
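The reliability calculation described in the note can be illustrated with a minimal sketch. It assumes item responses are scored 1 (correct) or 0 (incorrect) and uses the standard Cronbach's alpha formula; the function names, the toy response matrix, and the form sizes are hypothetical and are not taken from the CNB data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-person item scores (1 = correct, 0 = incorrect)."""
    k = len(responses[0])                                   # number of items
    item_vars = [variance(item) for item in zip(*responses)]
    total_var = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def weighted_alpha(form_alphas, form_ns):
    """Average form-specific alphas, weighted by the number of people who took each form."""
    return sum(a * n for a, n in zip(form_alphas, form_ns)) / sum(form_ns)

# Hypothetical toy data: 5 people answering 4 items
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy_responses)

# Hypothetical measure with two forms taken by 100 and 300 people
pooled_alpha = weighted_alpha([0.70, 0.80], [100, 300])
```

In this toy example the item variances sum to 1.0 and the total-score variance is 2.5, so alpha is 0.8; the pooled value for the two hypothetical forms is 0.775.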
2) Genotyping and Imputation
The data were cleaned and prepared for ancestry analysis at Massachusetts General
Hospital. Beginning with the 5,141 self-described white non-Hispanic individuals, for
each chip array dataset, we: a) removed individuals with sex-mismatched genotype
and phenotype data (n=85); b) removed SNPs with genotype missingness>5%; c)
removed individuals with marker missingness>2% (total n=91); d) removed SNPs
with missingness>1%; e) removed SNPs whose empirical minor allele frequency
differed by more than 0.15 from their HapMap CEU value; f) removed SNPs with
Hardy-Weinberg Equilibrium test p-values<1x10^-6; and g) removed individuals with
heterozygosity values greater than 0.05 or less than -0.05 (total n=97). Using a
linkage disequilibrium pruned subset (n=39,830 SNPs) of the remaining markers
common to each of the four platforms (n=236,473 SNPs), we identified and
removed individuals with excess relatedness (pi_hat>0.1, n=309). We then
conducted a principal components analysis as described in the Methods. The
individuals included in the genetic analyses were selected through the steps above
as well as further phenotypic analyses described in the Methods.
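The order of the missingness filters in steps b) through d) matters, because each rate is recomputed on whatever samples and SNPs survive the previous step. The following is a minimal sketch of that sequence only, assuming a hypothetical in-memory genotype matrix in which `None` marks a missing call; the function names and toy data are illustrative and do not reproduce the software used for the actual cleaning.

```python
def snp_missing_rate(geno, samples, snp):
    """Fraction of the retained samples with a missing call at one SNP."""
    return sum(geno[i][snp] is None for i in samples) / len(samples)

def sample_missing_rate(geno, sample, snps):
    """Fraction of the retained SNPs with a missing call in one sample."""
    return sum(geno[sample][j] is None for j in snps) / len(snps)

def missingness_qc(geno, snp_max_first=0.05, sample_max=0.02, snp_max_second=0.01):
    """Apply steps b), c), d) in order; return retained sample and SNP indices."""
    samples = list(range(len(geno)))
    snps = list(range(len(geno[0])))
    # b) drop SNPs with missingness > 5%
    snps = [j for j in snps if snp_missing_rate(geno, samples, j) <= snp_max_first]
    # c) drop individuals with marker missingness > 2%
    samples = [i for i in samples if sample_missing_rate(geno, i, snps) <= sample_max]
    # d) drop SNPs with missingness > 1% among the remaining individuals
    snps = [j for j in snps if snp_missing_rate(geno, samples, j) <= snp_max_second]
    return samples, snps

# Hypothetical toy matrix: 100 samples x 100 SNPs, all calls present at first
geno = [[0] * 100 for _ in range(100)]
for i in range(10):
    geno[i][1] = None        # SNP 1 missing in 10% of samples (fails step b)
for i in (10, 11, 12):
    geno[i][2] = None        # SNP 2 missing in 3% of samples (passes b, fails d)
for j in range(50, 60):
    geno[99][j] = None       # sample 99 missing 10 SNPs (fails step c)

kept_samples, kept_snps = missingness_qc(geno)
```

In the toy data, SNP 2 survives the lenient first SNP filter but is removed by the stricter second one, which is the purpose of running the SNP filter twice around the sample filter.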
Imputation
The following data steps were implemented in a separate stage of the study at the
Children’s Hospital of Philadelphia.
Samples were genotyped on one of three Illumina arrays, the HumanHap550,
HumanHap610, or the OmniExpress v2. For each chip type, we excluded from
further analysis any samples that had missing genotypes for more than 2% of the
SNPs on the array; further, we only included SNPs with genotype missing rates <
5%, minor allele frequency > 1%, as well as Hardy-Weinberg Equilibrium test P
value > 0.0001.
Duplicate samples and cryptic relatedness: we generated pairwise IBD values for all
samples using the --genome command in PLINK(ref), excluding one sample from
any pair with a PI_HAT value exceeding 0.3. PI_HAT values estimate the degree of
genetic relatedness between individuals: values >0.99 indicate sample duplication
or monozygotic twins, 0.5 indicates dizygotic twins or full siblings, and 0.25
indicates half siblings.
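As a minimal sketch of the relatedness filter, PI_HAT can be written as the proportion of the genome shared identical by descent, P(IBD=2) + 0.5 x P(IBD=1). The pruning rule below (drop the second member of each flagged pair) is an assumption made for illustration, since the text does not state which member of a pair was excluded; the function names and sample IDs are hypothetical.

```python
def pi_hat(z1, z2):
    """Proportion IBD from P(IBD=1) = z1 and P(IBD=2) = z2."""
    return z2 + 0.5 * z1

def prune_related(pairs, threshold=0.3):
    """Return the set of sample IDs to exclude.

    pairs: iterable of (id_a, id_b, pi_hat) tuples.
    Assumption: when both members of a flagged pair are still retained,
    the second member is the one dropped.
    """
    excluded = set()
    for id_a, id_b, value in pairs:
        if value > threshold and id_a not in excluded and id_b not in excluded:
            excluded.add(id_b)
    return excluded

# Expected sharing for full siblings: P(IBD=1) = 0.5, P(IBD=2) = 0.25
full_sibs = pi_hat(0.5, 0.25)
# Expected sharing for half siblings: P(IBD=1) = 0.5, P(IBD=2) = 0
half_sibs = pi_hat(0.5, 0.0)

# Hypothetical pairwise estimates
to_drop = prune_related([("A", "B", 0.5), ("B", "C", 0.5), ("C", "D", 0.1)])
```

The expected values reproduce the 0.5 and 0.25 benchmarks quoted above. In the toy pairs only B is dropped: once B is excluded, the B-C pair no longer needs a second exclusion.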
Prephasing: for each of the three chip types, samples were prephased for imputation
using the SHAPEIT (http://www.shapeit.fr) package. The GWAS SNPs were lifted
over to hg19 build37 coordinates, marker strand alignments were checked against the
1000 genomes Phase I reference alleles, and any misaligned SNPs were flipped prior to
phasing. Each chromosome was prephased separately in its entirety. As prephasing
was carried out for each chip type separately, haplotypes were restricted to SNPs
common to all arrays prior to imputation.
Imputation: unobserved genotypes in each cohort were imputed using the IMPUTE2
package (http://mathgen.stats.ox.ac.uk/impute/impute_v2.html) and the reference
haplotypes in Phase I of the 1000 genomes (June 2011 release). The reference
panel included 37,138,905 variants from 1,094 individuals from Africa, Asia,
Europe and the Americas.
Genotype concordance: internal cross validation was carried out automatically by
IMPUTE2. The calculation was performed by masking one variant at a time in the
study data, imputing the masked variant and comparing the result to the original
genotype. Average concordance for all datasets was > 90%.
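The concordance figure is the share of masked genotypes that the imputation recovers. A minimal sketch of that comparison is given here, assuming genotypes are coded 0/1/2 and that the imputed best-guess calls are already available; the masking and imputation themselves are performed by IMPUTE2 and are not reproduced. The function name and toy values are hypothetical.

```python
def concordance(observed, imputed):
    """Fraction of masked genotypes whose imputed best-guess call matches the original.

    observed: original genotype calls (0, 1, 2), or None where no call was made.
    imputed:  best-guess calls obtained after masking the same positions.
    """
    pairs = [(o, i) for o, i in zip(observed, imputed) if o is not None]
    return sum(o == i for o, i in pairs) / len(pairs)

# Hypothetical toy example: four comparable genotypes, three recovered correctly
toy_concordance = concordance([0, 1, 2, 1, None], [0, 1, 2, 0, 1])
```

The toy example yields 0.75, because the position with no original call is skipped rather than counted as a mismatch.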
Post-imputation QC: imputed data were quality controlled using the qctool package
(http://www.well.ox.ac.uk/~gav/qctool). SNPs with a maximum genotype
probability of less than 0.9 were excluded from further analysis, as were SNPs
with a minor allele frequency below 1%.
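The two post-imputation filters can be sketched as follows. The text does not specify how the maximum genotype probability was summarized per SNP, so this sketch assumes the per-sample maximum of the three genotype probabilities is averaged across samples; that choice, the function names, and the toy data are illustrative assumptions and do not represent the qctool interface.

```python
def mean_max_prob(probs):
    """Average, across samples, of the largest of the three genotype probabilities."""
    return sum(max(p) for p in probs) / len(probs)

def maf_from_probs(probs):
    """Minor allele frequency from expected allele counts.

    probs: per-sample tuples (P(AA), P(AB), P(BB)).
    """
    freq_b = sum(p[1] + 2 * p[2] for p in probs) / (2 * len(probs))
    return min(freq_b, 1 - freq_b)

def passes_post_imputation_qc(probs, min_prob=0.9, min_maf=0.01):
    """Keep a SNP only if it is confidently imputed and not too rare."""
    return mean_max_prob(probs) >= min_prob and maf_from_probs(probs) >= min_maf

# Hypothetical SNPs
confident_common = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
uncertain = [(0.5, 0.5, 0.0), (0.4, 0.6, 0.0)]
monomorphic = [(1.0, 0.0, 0.0)] * 4

results = [passes_post_imputation_qc(s) for s in (confident_common, uncertain, monomorphic)]
```

Only the first toy SNP passes: the second fails the probability threshold and the third fails the frequency threshold.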
3) Genotypic Principal Component Analyses
The post-QC sample of unrelated individuals who identified themselves as white
non-Hispanic (WNH; n=4,559) is plotted in dark blue against the HapMap3
populations (northern and western Europe (CEU); Han Chinese in Beijing, China
(CHB); Japanese (JPT); Yoruba (YRI); African ancestry in the southwestern USA
(ASW); Chinese in Colorado, USA (CHD); Gujarati Indians in Texas, USA (GIH); Luhya
in Webuye, Kenya (LWK); Maasai in Kinyawa, Kenya (MKK); Mexican ancestry in
California, USA (MXL); and Tuscany, Italy (TSI)) (4). PC1, PC2, and PC3 indicate
loadings onto the first, second, and third principal components respectively.
Figure S1a. Full WNH sample PC1 v PC2
Figure S1b. Full WNH sample PC2 v PC3
After excluding outlying individuals, the reduced European-American sample below
was carried through for phenotypic exclusions and analyses (n=4,050).
These remaining individuals are plotted in dark blue against the same HapMap3
populations. The final analytic sample (n=3,689) is described in Tables 4 and 5,
below.
Figure S1c. Reduced sample PC1 v PC2
Figure S1d. Reduced sample PC2 v PC3
4) Unrotated Principal Components Analysis
An unrotated PCA was conducted to estimate the variance captured by the first
principal component. There was a dominant first principal component that
explained 27.7% of variation in the accuracy variables. Variable loadings onto the
first principal component are presented in Table S2, below. These principal
component scores were not used in the genetic analyses as the accuracy variables
are highly correlated and unrotated principal components analysis does not yield
correlated or interpretable additional components.
Table S2. Unrotated principal components analysis
PC1
Loading
Abstraction and
Mental Flexibility
Attention
0.38782
Language
Reasoning
Nonverbal
Reasoning
Working Memory
0.67553
Spatial Reasoning
0.61953
WRAT
0.56751
Age Differentiation
0.54990
Emotion
Identification
Emotion
Differentiation
Facial Memory
0.32148
Spatial Memory
0.45616
Verbal Memory
0.37138
0.50369
0.63866
0.50393
0.59899
0.51301
Note: PC=Principal Component.
5) Common Factor and Rotated Principal Components Analyses
The common factor analyses for the CNB (including the WRAT) variables showed a
single strong, dominant factor. Rotation was not possible in this case as there was
only one factor. The eigenvalue estimates reflect the variation explained by each
factor relative to any single trait used in the factor analysis. For example, an
eigenvalue of 1 indicates that the factor does not explain any more variation than a
single component trait of the 13 accuracy traits.
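The eigenvalue scale can be made concrete with a short sketch. It assumes a principal components analysis of standardized traits (a correlation matrix), in which the eigenvalues sum to the number of traits, so a component's share of total variation is its eigenvalue divided by 13. Under that assumption, the 27.7% explained by the first unrotated principal component corresponds to an eigenvalue of roughly 3.6. The function names are hypothetical, and the relationship is only approximate for common factor analysis.

```python
N_TRAITS = 13  # accuracy traits entering the analysis

def variance_share(eigenvalue, n_traits=N_TRAITS):
    """Share of total variation explained by a component of standardized traits."""
    return eigenvalue / n_traits

def implied_eigenvalue(share, n_traits=N_TRAITS):
    """Eigenvalue implied by a reported share of explained variation."""
    return share * n_traits

# An eigenvalue of 1 explains as much as one standardized trait: 1/13 of the total
single_trait_share = variance_share(1.0)

# Eigenvalue implied by the 27.7% reported for the first unrotated component
first_pc_eigenvalue = implied_eigenvalue(0.277)
```

Read this way, the eigenvalue-exceeding-one criterion asks whether a component summarizes more variation than any single trait would on its own.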
Figure S2a. Common factor analysis
Figure S2b. Principal components analysis (promax rotation)
We conducted additional genetic analyses using the principal component loadings.
While the PCA similarly yielded a dominant first principal component, two
additional principal components had eigenvalues exceeding one. Rotation allows the
additional components to be interpreted. The first PC includes the reasoning and
executive function items; the second the social cognition items; and the third the
memory items.
6) Rotated Principal Components Analysis: Trait to Component Correlations
We conducted the rotated PCA of the variables four times: first with the full
analytic sample and then in each of the three age groups below. This table shows
trait/principal component pairs with a correlation coefficient above 0.35. Each
variable correlated above 0.35 with only one principal component, except facial
memory accuracy (FMEM_A), which in addition to the correlation shown below had a
0.43 correlation with PC2.
Table S3a. Item to principal component correlations
Variable                            Full Cohort  Ages 8-11  Ages 12-16  Ages 17-21
PC1 Loadings
Abstraction and Mental Flexibility  0.492        0.465      0.461       0.548
Attention                           0.462        0.466      0.461       0.532
Language Reasoning                  0.665        0.606      0.714       0.626
Nonverbal Reasoning                 0.680        0.662      0.641       0.718
Working Memory                      0.593        0.604      0.587       0.633
Spatial Reasoning                   0.584        0.531      0.589       0.548
WRAT                                0.562        0.473      0.602       0.601
PC2 Loadings
Age Differentiation                 0.772        0.800      0.790       0.777
Emotion Identification              0.537        0.354      0.501       0.451
Emotion Differentiation             0.715        0.714      0.740       0.809
PC3 Loadings
Facial Memory                       0.689        0.725      0.652       0.743
Spatial Memory                      0.667        0.591      0.673       0.621
Verbal Memory                       0.711        0.580      0.739       0.718
The factor structure and loadings were similar across age groups, suggesting
consistency in the phenotypic correlation structure.
Table S3b. Phenotypic Correlation Matrix
(See Phen_Corr_Matrix_Revised.xlsx)
7) Comparison of magnitude between phenotypic and genotypic correlations.
Figure S3. Comparison of magnitude between phenotypic and genotypic correlations.
Note: The diameter of the circles corresponds to the magnitude of the phenotypic and
genotypic correlations between the domains (r=0-1). The magnitude of the genotypic
correlation between emotional identification and nonverbal reasoning is not presented
because it is less than 0.
8) Phenotypic and Genetic Association between Domains and Common Factor
Score
Table S4. Estimated phenotypic and genetic correlations between the common
factor score and its component variables with nominally significant univariate
heritability estimates.
Correlation                   r(p)   p(r(p)=0)  r(g) (SE)      p(r(g)=0)  p(r(g)=1)
Common Factor Score (CFS) – Domain
CFS-Emotion Identification    0.272  <0.0001    0.184 (0.178)  0.2        1e-07
CFS-Verbal Memory             0.380  <0.0001    0.749 (0.174)  0.001      0.09
CFS-Language Reasoning        0.684  <0.0001    0.972 (0.082)  0.0004     0.4
CFS-Nonverbal Reasoning       0.636  <0.0001    0.686 (0.100)  0.0005     0.0003
CFS-Spatial Reasoning         0.594  <0.0001    0.738 (0.117)  0.0008     0.01
CFS-WRAT                      0.566  <0.0001    0.743 (0.110)  0.0002     0.01
Note: r(p)=estimated phenotypic correlation; p(r(p)=0)=p value for the test that the
phenotypic correlation equals 0; r(g)=estimated genotypic correlation; p(r(g)=0)=p value
for the test that the genotypic correlation equals 0; p(r(g)=1)=p value for the test that
the genotypic correlation equals 1; SE=standard error; WRAT=Wide Range Achievement Test.
9) Variation Explained by Genic and Intergenic SNPs
Figure S5. Variation in complex cognition explained by SNPs within genic and
intergenic regions of the genome.
Note: error bars indicate one standard error; p values = probability of no joint
association between the genic and intergenic regions and the common factor score.
References
1. Gur RC, Richard J, Calkins ME, Chiavacci R, Hansen JA, Bilker WB, et al. Age
group and sex differences in performance on a computerized neurocognitive battery
in children age 8-21. Neuropsychology. 2012;26(2):251-65. Epub 2012/01/19.
2. Gur RC, Richard J, Hughett P, Calkins ME, Macy L, Bilker WB, et al. A cognitive
neuroscience-based computerized battery for efficient measurement of individual
differences: standardization and initial construct validation. Journal of neuroscience
methods. 2010;187(2):254-62. Epub 2009/12/01.
3. Wilkinson GS, Robertson GJ. Wide Range Achievement Test 4 professional
manual. Lutz, Florida: Psychological Assessment Resources; 2006.
4. Altshuler DM, Gibbs RA, Peltonen L, et al.
Integrating common and rare genetic variation in diverse human populations.
Nature. 2010;467(7311):52-8. Epub 2010/09/03.