Supplemental Material: The phenotypic manifestations of rare genic CNVs in Autism Spectrum Disorder
1 Supplemental Material
1.1 Subjects
This case-only study used a subset of data from the Autism Genome Project (AGP), a collaborative
genetics research initiative.1 The sample in these analyses comprised 2705 parent-affected child
trios (2384 of European ancestry) derived from a pooled sample; a subset of these trios was included in
Pinto et al.,2 and the full set is a subset of the sample in Pinto et al.3 The final sample included in
the analysis consisted of 1590 cases with a diagnosis of an ASD, of European ancestry, with at least one
rare CNV impacting any gene.
Due to the international nature of the AGP, there was a wide range of ethnic backgrounds included in
the sample collection. Ethnic ancestry was determined using the SpectralGEM software4 as shown in
Pinto et al.2 Supplementary figure 2. The analyses presented here were restricted to cases of European
ancestry.
1.2 Clinical Measures
Diagnostic inclusion was defined by DSM-IV5 criteria at all sites, assessed by the Autism Diagnostic
Observation Schedule (ADOS)6 and the Autism Diagnostic Interview-Revised (ADI-R).7 A combined
diagnostic classification strategy was employed, as in Risi et al.8 For these analyses, phenotypes beyond
diagnosis with a proposed neurodevelopmental origin and availability across AGP sites were derived
from the ADI, including verbal status, age at first words and phrases (also collapsed into a composite
language delay), gait disturbance, and seizures. Additionally, the ADOS severity score,9 Vineland
Adaptive Behavior Scales (VABS),10 a selected composite intelligence quotient (IQ; verbal, performance,
and full scale), maternal and paternal age at birth, and family type were analyzed when they were
available. Data completeness by variable is reported in Table S1.
Multi-level categorical ADI variables were dichotomized. The most commonly used versions of
the ADI used by the AGP sites include the Western Psychological Services (WPS) version, the preferred
version for the AGP; the 1995 long form version; and the 1995 short form version. Because all of the
versions contain the core items for the ADI diagnostic algorithm, they were combined into a single
merged ADI dataset for use in AGP analyses. The final verbal status variable was taken from the ADI’s
“Overall level of language” question, and divided into 'verbal' if the participant had functional use of
spontaneous speech with phrases of at least three words and 'non-verbal' if they did not. The final gait
variable was a composite of the ever and current gait variables using an 'or' rule, and divided into
'normal' gait or an 'unusual' gait. The current and ever “faints, fits and blackouts” variables were
combined using an 'or' rule, and the single resulting variable was coded as either no history of seizures
(excluding febrile convulsions) or a history of seizures (a diagnosis of epilepsy was not required). For
age at first words, ages less than 24 months were considered typical, and first words at 24 months or
greater were considered delayed. For age at first phrases, ages less than 33 months were considered
typical, while first phrases at 33 months or greater were considered delayed. In both instances, special
codes were used and recoded: 996 was considered typical; 993, 994 and 997 were considered delayed; and
998 and 999 were coded as missing. The dichotomous age at first words and age at first phrases variables
were combined into a single language delay variable, wherein being delayed on either item was an
endorsement for overall language delay. When age at first words and age at first phrases were analyzed
continuously, all of the meaningful missing codes were simply recoded to missing.
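The recoding rules above can be illustrated with a minimal Python sketch. The function names are hypothetical and not part of the AGP database. The treatment of a case with one typical item and one missing item is not described in the text, so returning missing in that situation is an assumption made here for illustration only.

```python
from typing import Optional

# Special codes as described in the text.
TYPICAL_CODES = {996}
DELAYED_CODES = {993, 994, 997}
MISSING_CODES = {998, 999}
ALL_SPECIAL = TYPICAL_CODES | DELAYED_CODES | MISSING_CODES


def dichotomize_age(age_months: Optional[int], cutoff: int) -> Optional[bool]:
    """Return True if delayed, False if typical, None if missing."""
    if age_months is None or age_months in MISSING_CODES:
        return None
    if age_months in TYPICAL_CODES:
        return False
    if age_months in DELAYED_CODES:
        return True
    return age_months >= cutoff


def language_delay(first_words: Optional[int],
                   first_phrases: Optional[int]) -> Optional[bool]:
    """Composite delay: delayed on either item counts as delayed."""
    words = dichotomize_age(first_words, 24)      # cutoff for first words
    phrases = dichotomize_age(first_phrases, 33)  # cutoff for first phrases
    if words is True or phrases is True:
        return True
    if words is None or phrases is None:
        return None  # assumption: not stated in the text
    return False


def continuous_age(age_months: Optional[int]) -> Optional[int]:
    """For continuous analyses every special code becomes missing."""
    if age_months is None or age_months in ALL_SPECIAL:
        return None
    return age_months
```

For example, `language_delay(12, 40)` is delayed because first phrases occurred at 33 months or later, even though first words were typical.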
Gotham and colleagues9 created a metric to measure overall autism severity, taking all
disorder-related domains into account, as the ADI does not assess numerous items in nonverbal children. This
method was designed to be independent from chronological age, IQ and verbal status, and to present a
standardized measure of severity for use in research studies. The calibrated severity scores are
calculated via an AGP database algorithm using the subject’s chronological age, Module and ADOS raw
score. In cases where the ADOS was not uploaded to the database, severity scores were generated by
Ms. Ann Thompson (McMaster University) and provided to the author for the analyses presented here.
The calibrated severity scores range from one to ten, with one through three indicating non-spectrum,
four and five indicating autism spectrum, and six through ten indicating strict autism. At this time, an
algorithm for calculating ADOS severity scores is only available for Modules 1, 2 and 3. Participants who
completed Module 4 do not have a severity score, and were coded as missing.
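The interpretation of the calibrated severity score described above can be sketched as a simple lookup. This is only an illustration of the cut-points given in the text; the function name is hypothetical, and the calibration itself (from age, Module and raw score) is performed by the AGP database algorithm and is not reproduced here.

```python
from typing import Optional


def severity_category(score: Optional[int], module: int) -> Optional[str]:
    """Map a calibrated severity score (1-10) to its descriptive band.

    Returns None (missing) when no score exists, which includes every
    participant assessed with Module 4.
    """
    if score is None or module not in (1, 2, 3):
        return None
    if not 1 <= score <= 10:
        raise ValueError("calibrated severity scores range from 1 to 10")
    if score <= 3:
        return "non-spectrum"
    if score <= 5:
        return "autism spectrum"
    return "strict autism"
```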
Given that there are numerous AGP contributing sites, a host of different tests were used to
assess IQ. In order to rationalize these 57 tests, an AGP sub-committee, led by Dr. Judith Miller from the
University of Utah, created a selected composite IQ score. Each contributing site uploaded raw verbal,
performance and full scale IQ scores, as well as the age of assessment at each test. The list of tests was
prioritized, and in the case of assessments on more than one measure, the preferred test was used for
future analyses. If a subject was administered the same test at two or more time points, the
administration at a chronological age greater than six years, with at least one complete test on all
measures, was preferred. If the subject was less than six years of age, or there were no complete tests,
individual measures were selected separately. Both categorically and continuously scored tests were
used. The continuous scores were divided
into three categories in the database: 1= 1-49 (moderate disability); 2= 50-70 (mild disability); and 3=
70+ (normal or above average ability). Additionally, various meaningful missing codes were included in
the database: 333= could not complete due to low functioning; 444= could not complete due to
behavior; 555= could not complete and reason unknown; and 999= IQ not available/never will be. As in
clinical practice and the convention for AGP analyses, on the IQ subscales, scores below 70 were
classified as 'low', indicating the presence of intellectual disability and scores 70 or greater were
classified as 'typical', which collapsed categories 1 and 2 from above into a single category. Tests that
could not be completed due to low functioning (code 333 above) were placed into the 'low'
group for the dichotomous analysis, but coded as missing for continuous variable analyses. In order to
maximize the available data, these categorical scores were used in the analyses presented here.
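A minimal sketch of the IQ coding rules described above follows. It assumes that each value is either a standard score or one of the meaningful missing codes; the function names are hypothetical and do not correspond to the AGP database implementation.

```python
from typing import Optional

LOW_FUNCTIONING = 333              # could not complete due to low functioning
OTHER_MISSING = {444, 555, 999}    # behavior, unknown reason, not available


def iq_dichotomous(value: Optional[int]) -> Optional[str]:
    """'low' for scores below 70 (and code 333), 'typical' for 70 or more."""
    if value is None or value in OTHER_MISSING:
        return None
    if value == LOW_FUNCTIONING:
        return "low"
    return "low" if value < 70 else "typical"


def iq_continuous(value: Optional[int]) -> Optional[int]:
    """For continuous analyses every special code, including 333, is missing."""
    if value is None or value == LOW_FUNCTIONING or value in OTHER_MISSING:
        return None
    return value
```

The only difference between the two codings is the handling of code 333, which carries information for the dichotomous variable but has no numeric value for a continuous one.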
Maternal and paternal ages at birth of the child were reported in months, with valid ages
ranging between 168 and 840 months. For the analyses presented here, the parental ages were treated
as continuous variables.
In the AGP collection, there are three family type classifications: simplex, multiplex and
unknown. Multiplex families have at least two first to third degree (cousins only) relatives with a
validated, clinical ASD diagnosis, and include affected dizygotic twins. Simplex families have only one
known affected individual among the first to third degree (cousins only) relatives, and include affected
monozygotic twins. All other situations are coded unknown. In all instances, a family history should be
taken; when diagnoses were not confirmed, the unknown code was used.
Because gait disturbances and seizures were not assessed in all versions of the Autism
Diagnostic Interview (ADI), there are more missing data on these variables than on the ADI diagnostic
assessment overall. Furthermore, age at first words and phrases will not reflect the meaningful missing
codes described previously. Due to the variety of IQ assessments employed, the full scale IQ score has a
higher rate of missingness than the component verbal and performance measures. Approximately 16%
of the sample have an unknown family type.
Table S1: Complete Data on Core Measures
Variable                                      N      % Complete
ADI verbal status                             1580   99.37
ADI language delay                            1559   98.05
ADI age at first words                        1300   81.76
ADI age at first phrases                      1112   69.94
ADI gait disturbance                          1430   89.94
ADI faints, fits, blackouts                   1430   89.94
ADOS Severity Score                           1054   66.29
Selected composite verbal IQ category         968    60.88
Selected composite performance IQ category    1122   70.57
Selected composite full-scale IQ category     871    54.78
VABS communication subscale score             1262   79.37
VABS socialization subscale score             1274   80.13
VABS daily living subscale score              1255   78.93
VABS composite score                          1248   78.49
Paternal age                                  1311   82.45
Maternal age                                  1314   82.64
Family Type (including unknown)               1590   100.00
*ADI = Autism Diagnostic Interview; ADOS = Autism Diagnostic Observation Schedule; IQ = intelligence quotient.
1.3 Gene Lists
1.3.1.1 ASD- and ID-implicated (ASD/ID)
The ASD- and ID-implicated gene lists were compiled for the AGP by Dr. Catalina Betancur, MD,
PhD from the Université Pierre et Marie Curie, France. The lists were ‘expert-curated’ via literature
searches and database reviews through December 2009 (see reference 2 for review). The
ASD-implicated gene list contains 36 genes strongly implicated in ASD and identified in subjects with
ASD or ASD and ID. The ID-implicated gene list contains 110 genes known to be implicated in ID but not
yet in ASD. These lists were merged and used as one list for the analyses presented here.
1.3.1.2 Differentially Brain Expressed (DBE)
The differentially brain expressed (DBE) gene list was originally compiled by Dr. Soumya
Raychaudhuri, MD, PhD (see reference 11 for review). Briefly, he identified genes with specific,
differential expression in the brain as compared to other body tissues. Dr. Raychaudhuri provided this
list using Entrez Gene IDs (http://www.ncbi.nlm.nih.gov/gene), via personal correspondence. Genes with an
expression p-value of <0.01 were defined as preferentially expressed for our analyses. Some of the
Entrez IDs he provided were for ‘discontinued’ genes, and these were removed
from subsequent analyses. Genes lacking an official HUGO gene symbol (http://www.genenames.org/)
were also removed, leaving 3268 genes in the final differentially brain expressed list, which differs from
the number of genes reported in Raychaudhuri et al.11 Forty-two of these genes overlap with the
ASD/ID-implicated candidate list described above.
1.4 Analytic Details
1.4.1 Latent Variable Analysis
Latent variable analysis is a statistical technique used to examine the association between
manifest (read: observable) variables and latent (read: inferred) variables. The latent variables are
characterized by a mathematical model from the observed variables. A main function of examining
latent variables is to reduce the number of variables under examination, combining them to create more
homogeneous groupings of data. Depending upon the type of variables under examination, different
latent variable analysis techniques are used.
In cases where the variables are mixed (continuous and categorical), the continuous approach is
used. Since our study is examining a categorical latent variable (presence or absence of a CNV impacting
a gene in one of our lists) and both categorical and continuous manifest variables, the latent profile
mixture model approach was used for this study (sometimes called a latent class cluster analysis). All
analyses were completed in the Mplus software package12 using the mixture model option.
The number of classes that best fit the data can be determined using a number of different
metrics. The most commonly used metrics are information criteria and uncertainty measures.13 Mplus
provides Akaike (AIC), Bayesian (BIC) and Sample-Size Adjusted BIC (BICSSA) information criteria where a
lower value indicates a better model fit.
For the analyses presented here, the BIC was chosen, as it performs acceptably with small sample sizes
and consistently chooses the correct model as the sample size increases.14 All of the model fit criteria
compare the expected cell counts that are derived from the latent class model with the observed
frequency counts.15 With respect to uncertainty measures, Mplus provides a measure of entropy that
demonstrates how well the variables under investigation separate into classes.13 These metrics can be
used independently or in combination.
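For readers unfamiliar with these metrics, the sketch below gives the standard textbook definitions of the three information criteria and of the relative entropy statistic commonly reported for mixture models. It is offered as an illustration under the assumption that these are the definitions in use; it is not taken from the Mplus source, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * loglik


def bic(loglik: float, n_params: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n) - 2 * loglik


def ssa_bic(loglik: float, n_params: int, n: int) -> float:
    """Sample-size adjusted BIC, replacing n with (n + 2) / 24."""
    return n_params * math.log((n + 2) / 24) - 2 * loglik


def relative_entropy(posteriors: Sequence[Sequence[float]]) -> float:
    """Relative entropy in [0, 1]; values near 1 mean clean class separation.

    `posteriors` holds one row per case with that case's posterior
    probability of membership in each latent class.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    if k < 2:
        return 1.0  # a single class is trivially separated
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:
                uncertainty -= p * math.log(p)
    return 1 - uncertainty / (n * math.log(k))
```

For all three criteria a lower value indicates better fit, whereas for entropy a higher value indicates clearer classification.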
The continuous variables used were age at first words, age at first phrases, ADOS Severity Score,
maternal age, paternal age, the Vineland subscale and composite scores, and the IQ measures: verbal,
performance, full scale. The categorical variables used in the analyses were verbal status, gait
disturbances, seizures, language delay and family type. The analyses were run on the entire sample, and
stratified by sex.
After testing information criteria fit indices for one through twelve classes, no local minimum
was found for the sample (Figure S1). From this we can conclude that there were no obvious phenotype
subgroups within this sample, for the variables that we selected.
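The search for a local minimum across candidate class counts can be sketched as follows. The function name is hypothetical, and the values used in the usage note are invented purely for illustration; they are not the fit indices from this study.

```python
from typing import Dict, List


def local_minima(fit_by_classes: Dict[int, float]) -> List[int]:
    """Return class counts whose fit index is lower than both neighbours."""
    ks = sorted(fit_by_classes)
    minima = []
    for prev, cur, nxt in zip(ks, ks[1:], ks[2:]):
        value = fit_by_classes[cur]
        if value < fit_by_classes[prev] and value < fit_by_classes[nxt]:
            minima.append(cur)
    return minima
```

A fit index that keeps decreasing as classes are added, such as `{1: 100.0, 2: 90.0, 3: 85.0, 4: 80.0}`, yields an empty list, which is the pattern described above. A series such as `{1: 100.0, 2: 80.0, 3: 90.0}` would instead point to a two-class solution.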
Figure S1: Latent Profile Fit Indices for the AGP Sample
1.4.2 Recursive Partitioning via Random Forests
Recursive partitioning is a multivariable statistical technique used to identify homogeneous
groupings of data, using a series of classification rules created from the data itself, based on an initial
classification seed, or initial splitting variable. The aim is to divide the data into "mutually exclusive and
exhaustive subsets",16 by examining the homogeneity within each split and then (if necessary) pruning
the tree to avoid over-fitting the data. Classification and regression trees (CART) can be implemented
with numerous predictor data types and can be used when there are missing data. In addition, the
predictor data need not be normally distributed, and interactions can be
detected.16 Here, a decision tree is formed, based on classification rules for each variable. In
Classification Trees the predicted outcome is the class to which the data belong, whereas with
Regression Trees the predicted outcome is a real number rather than a class.
One major strength of these methods is that they can manage both numerical and categorical data. CART is
a white-box model, as opposed to black-box methods such as artificial neural networks.
CART methods are robust and can be validated using other techniques, such as regression. However,
there are some limitations. CART methods can create over-complex trees that are hard to generalize.
Also, the splitting algorithms may be biased in favor of those attributes with more levels.17
In this study, a combination of classification and regression trees, called Random Forests, was
used. To create a forest, a pre-determined number of trees are created as described by Breiman.18 An
example tree is shown in Figure S2.
Figure S2: Example Regression Tree
For each tree, a total set of N training cases and M classifier variables is considered. A small
subset of m variables (often log2(M)) is selected from the M to create the tree. The aim of choosing m
smaller than M is to reduce the impact of multicollinearity among the variables. One can also optimize
m experimentally, by choosing different values of m and examining the outcome. Next, n cases are
chosen from all N available cases, with replacement, to create a ‘bootstrap’ sample. Typically the
bootstrap sample contains approximately two-thirds of the distinct cases in the total sample. The
remaining third is used as an out-of-bag (OOB)
sample to test classification error after the tree is completed. This bootstrap sample is then split into
two groups to measure entropy impurity, an index of the heterogeneity in the data.19
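The sampling steps and the impurity measure described above can be sketched in a few lines of Python. This is a generic illustration and not the Willows implementation; the function names are hypothetical, and drawing as many bootstrap cases as there are cases in the sample (n = N) is an assumption.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def sample_features(n_features: int, rng: random.Random) -> List[int]:
    """Pick m = floor(log2(M)) of the M predictor indices, at least one."""
    m = max(1, int(math.log2(n_features)))
    return rng.sample(range(n_features), m)


def bootstrap_split(n_cases: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n_cases indices with replacement; unused indices are out-of-bag."""
    in_bag = [rng.randrange(n_cases) for _ in range(n_cases)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_cases) if i not in chosen]
    return in_bag, out_of_bag


def entropy_impurity(labels: Sequence) -> float:
    """Shannon entropy (base 2) of the class labels in one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_impurity(left: Sequence, right: Sequence) -> float:
    """Size-weighted entropy of the two daughter nodes of a candidate split."""
    n = len(left) + len(right)
    return (len(left) / n) * entropy_impurity(left) \
        + (len(right) / n) * entropy_impurity(right)
```

A split that separates the classes perfectly has a weighted impurity of zero, while a node with two equally frequent classes has an entropy of one bit. In expectation about 37% of cases are left out of a bootstrap sample of this kind, which is the source of the "roughly one-third" out-of-bag figure.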
Once the tree is complete, the OOB sample is dropped through the tree, and each case is
classified by the tree. The proportion of cases correctly classified is the accuracy, and 1 minus the
accuracy is the error. Then, for each predictor variable in turn, the values of that variable are
permuted among the OOB cases. The original OOB samples and permuted OOB samples are classified by the
corresponding tree, and the classification accuracy of the original and permuted OOB samples is
compared. Averaging the
differences over all trees defines the variable importance. If a variable’s values are permuted and the
error rate does not go up, it is not a useful predictor. In addition, in random forests, there is no need for
cross-validation or a separate test set to get an unbiased estimate of the test set error, because the
OOB error provides this estimate. The OOB estimate has proven
to be unbiased in many tests.
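The permutation-based variable importance described above can be sketched as follows. This is a generic illustration rather than the Willows implementation: each "tree" is represented simply as a prediction function paired with its own OOB cases, and all names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Tree = Tuple[Callable[[Row], int], List[Row], List[int]]  # (predict, OOB rows, OOB labels)


def accuracy(predict: Callable[[Row], int], rows: Sequence[Row],
             labels: Sequence[int]) -> float:
    """Proportion of cases the tree classifies correctly."""
    correct = sum(predict(row) == label for row, label in zip(rows, labels))
    return correct / len(labels)


def permutation_importance(trees: Sequence[Tree], feature: int,
                           rng: random.Random) -> float:
    """Mean drop in OOB accuracy after shuffling one predictor's values."""
    drops = []
    for predict, rows, labels in trees:
        baseline = accuracy(predict, rows, labels)
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        permuted_rows = []
        for row, value in zip(rows, shuffled):
            new_row = list(row)       # copy so the original OOB data are untouched
            new_row[feature] = value
            permuted_rows.append(new_row)
        drops.append(baseline - accuracy(predict, permuted_rows, labels))
    return sum(drops) / len(drops)
```

A predictor that the trees never use has an importance of exactly zero, because shuffling it cannot change any prediction; a predictor on which the classification depends shows a positive drop in accuracy.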
The analyses presented here were implemented in Willows,20 a software program developed by
Heping Zhang and colleagues at Yale University in the USA (http://c2s2.yale.edu/software/Willows/).
The Willows software can create classification trees, random forests or deterministic forests.
Classification trees create classification rules based on a given predictor and a threshold for that
predictor aiming to have the most homogeneous groupings possible. Subsequent splits create daughter
nodes that can be split in turn until splitting is no longer possible. In addition, redundant nodes can be
pruned from the tree. A random forest differs from the tree in that many trees are created and
compared, but the rules for creating the trees are somewhat different.20 The 'out of bag' error rate is
reported.
1.4.3 Association Analyses
Nine statistical tests were carried out for each sub-phenotype under investigation. These included
any CNVs, deletions, and duplications for each of the two gene lists, as well as any de novo CNV, ASD/ID
de novo CNVs, and DBE de novo CNVs. The statistical test and relevant covariates for each
sub-phenotype are itemized in Table S2. CNV carrier status predicted the outcome phenotype, except in the
case of parental age, where parental age predicted CNV carrier status.
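The structure of the testing scheme described above can be summarized in a short sketch. It only enumerates the nine carrier definitions and the choice of regression model by variable type; the names are hypothetical, and no model fitting is shown.

```python
from itertools import product
from typing import List, Tuple

GENE_LISTS = ("ASD/ID", "DBE")
CNV_CLASSES = ("any", "deletion", "duplication")


def carrier_definitions() -> List[Tuple[str, str]]:
    """Six gene-list by CNV-class tests plus three de novo tests."""
    tests = list(product(GENE_LISTS, CNV_CLASSES))
    tests += [("any gene", "de novo"), ("ASD/ID", "de novo"), ("DBE", "de novo")]
    return tests


def choose_model(variable_type: str, is_parental_age: bool = False) -> Tuple[str, str]:
    """Return (regression type, outcome variable) for one sub-phenotype."""
    if is_parental_age:
        # Direction is reversed: parental age predicts CNV carrier status.
        return ("logistic", "CNV carrier status")
    if variable_type == "dichotomous":
        return ("logistic", "phenotype")
    if variable_type in ("continuous", "ordinal"):
        return ("linear", "phenotype")
    raise ValueError(f"unknown variable type: {variable_type}")
```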
Table S2: Analysis of ASD Variables
Variable                   Variable Type  Statistical Method   Covariates
Clinical Characteristics
Selected ADI Variables
Overall level of language  Dichotomous    Logistic Regression  Age at ADI, Stage
Gait disturbance           Dichotomous    Logistic Regression  Age at ADI, Stage
Seizures                   Dichotomous    Logistic Regression  Age at ADI, Stage
Language delay             Dichotomous    Logistic Regression  Age at ADI, Stage
Age at first words         Continuous     Linear Regression    Age at ADI, Stage
Age at first phrases       Continuous     Linear Regression    Age at ADI, Stage
Measures of Severity
ADOS Severity Score        Ordinal        Linear Regression    Age at ADOS, Stage
Intelligence Testing
Verbal IQ                  Dichotomous    Logistic Regression  Age at VIQ, Stage
Performance IQ             Dichotomous    Logistic Regression  Age at PIQ, Stage
Full Scale IQ              Dichotomous    Logistic Regression  Age at FSIQ, Stage
Adaptive Function
VABS Communication         Continuous     Linear Regression    Age at VABS, Stage
VABS Socialization         Continuous     Linear Regression    Age at VABS, Stage
VABS Daily Living Skills   Continuous     Linear Regression    Age at VABS, Stage
VABS Composite             Continuous     Linear Regression    Age at VABS, Stage
Parental/Family Factors
Maternal Age               Continuous     Logistic Regression  Age at ADI, Stage, Paternal Age
Paternal Age               Continuous     Logistic Regression  Age at ADI, Stage, Maternal Age
Family Type                Dichotomous    Logistic Regression  Age at ADI, Stage
*ADI = Autism Diagnostic Interview; ADOS = Autism Diagnostic Observation Schedule; IQ = Intelligence Quotient;
VABS = Vineland Adaptive Behavior Scales.
2 Additional Results
2.1 Additional Association Results
Table S3: Summary of Findings on Associations of CNVs impacting Autism Spectrum Disorder or Intellectual
Disability (ASD/ID) Genes with Clinical Phenotypes
ASD/ID Gene List
Variable                          Total              Deletion              Duplication           All
                                  % or mean (N)      Yes        No         Yes        No         Yes        No
Clinical Characteristics
% Non-verbal                      28.5% (451/1580)   35.3%      28.3%      21.4%      28.8%      27.9%      28.6%
% positive for Gait disturbance   48.6% (695/1430)   52.2%      48.5%      48.0%      48.6%      48.4%      48.6%
% positive for Seizures           10.4% (149/1430)   17.4%      10.2%      16.0%      10.2%      16.1%      10.0%
% positive for Language delay     76.3% (1190/1559)  58.0%^1,2  76.9%      70.2%      76.6%      63.5%^1,2  77.3%
Age at first words (mean)         27.0 (1300)        28.0       27.0       24.7       27.1       26.3       27.0
Age at first phrases (mean)       42.1 (1112)        41.8       42.1       40.5       42.2       40.8       42.2
ADOS Severity Score (mean)        7.5 (1054)         7.9        7.5        7.7        7.5        7.8        7.5
Intelligence Testing
Verbal IQ (mean)                  78.7 (938)         81.0       78.6       89.1^1,2   78.3       85.9^1     78.2
Performance IQ (mean)             83.9 (1069)        76.2       84.1       84.7       83.8       81.5       84.0
Full Scale IQ (mean)              77.7 (798)         69.5       77.9       80.6       77.5       76.4       77.8
VABS Adaptive Function
% low Communication               61.6% (777/1262)   68.4%      61.4%      44.2%^1,2  62.2%      55.7%      62.0%
% low Socialization               72.2% (920/1274)   86.8%^1    71.8%      63.6%      72.5%      73.8%      72.1%
% low Daily Living Skills         70.1% (880/1255)   83.8%      69.7%      65.1%      70.3%      74.4%      69.8%
% low Composite                   74.0% (923/1248)   89.2%^1    73.5%      60.5%^1,2  74.4%      73.1%      74.0%
Parental/Family Factors
Maternal Age (mean)               30.6 (1301)        30.4       30.6       30.7       30.6       30.7       30.6
Paternal Age (mean)               33.0 (1298)        32.4       33.0       32.5       33.0       32.7       33.0
% Simplex Family Type             64.3% (862/1340)   65.9%      64.3%      62.0%      64.4%      63.6%      64.4%
*ADOS = Autism Diagnostic Observation Schedule; IQ = Intelligence Quotient. Statistically significant findings
(α=0.05) in unadjusted models are marked ^1, and statistically significant findings in adjusted models are
marked ^2. Adjusted models included adjustment for age at assessment and genotyping stage. The parental
age analyses also included adjustment for the opposite parent’s age. The ‘all’ category includes both deletions
and/or duplications.
Table S4: Summary of Findings on Associations of CNVs impacting Differentially Brain Expressed (DBE) Genes with
Clinical Phenotypes
Brain Expressed Gene List
Variable                          Total              Deletion              Duplication           All
                                  % or mean (N)      Yes        No         Yes        No         Yes        No
Clinical Characteristics
% Non-verbal                      28.5% (451/1580)   28.5%      28.6%      25.9%      29.7%      27.3%      29.8%
% positive for Gait disturbance   48.6% (695/1430)   50.1%      48.1%      46.6%      49.5%      48.0%      49.2%
% positive for Seizures           10.4% (149/1430)   10.5%      10.4%      11.9%      9.8%       11.1%      9.7%
% positive for Language delay     76.3% (1190/1559)  74.6%      77.0%      74.2%      77.3%      74.7%      78.1%
Age at first words (mean)         27.0 (1300)        26.3       27.3       26.5       27.2       26.5       27.5
Age at first phrases (mean)       42.1 (1112)        41.6       42.3       41.2       42.5       41.7       42.6
ADOS Severity Score (mean)        7.5 (1054)         7.4        7.6        7.5        7.6        7.5        7.6
Intelligence Testing
Verbal IQ (mean)                  78.7 (938)         79.6       78.4       80.7       77.9       79.8       77.6
Performance IQ (mean)             83.9 (1069)        84.1       83.8       84.7       83.5       83.9       83.8
Full Scale IQ (mean)              77.7 (798)         77.7       77.6       84.7       77.3       77.6       77.7
VABS Adaptive Function
% low Communication               61.6% (777/1262)   60.9%      38.2%      56.7%^1,2  63.8%      59.0%      64.2%
% low Socialization               72.2% (920/1274)   72.1%      72.3%      67.0%^1,2  74.6%      70.0%      74.6%
% low Daily Living Skills         70.1% (880/1255)   70.2%      70.1%      64.1%^1,2  72.8%      67.0%^1,2  73.4%
% low Composite                   74.0% (923/1248)   72.0%      74.7%      67.4%^1,2  76.9%      69.7%^1,2  78.4%
Parental/Family Factors
Maternal Age (mean)               30.6 (1301)        30.6       30.6       30.5       30.6       30.5^2     30.7
Paternal Age (mean)               33.0 (1298)        33.4       32.8       32.8^2     30.6       33.1       32.9
% Simplex Family Type             64.3% (862/1340)   66.1%      63.8%      62.2%      65.3%      63.9%      64.8%
*ADOS = Autism Diagnostic Observation Schedule; IQ = Intelligence Quotient. Statistically significant findings
(α=0.05) in unadjusted models are marked ^1, and statistically significant findings in adjusted models are
marked ^2. Adjusted models included adjustment for age at assessment and genotyping stage. The parental
age analyses also included adjustment for the opposite parent’s age. The ‘all’ category includes both deletions
and/or duplications.
Table S5: Summary of Findings on Associations of de novo CNVs, de novo CNVs impacting Autism Spectrum Disorder or
Intellectual Disability (ASD/ID) Genes, or de novo CNVs impacting Differentially Brain Expressed (DBE) Genes with Clinical
Phenotypes
De Novo CNVs
Variable                          Total              All                   ASD/ID                DBE
                                  % or mean (N)      Yes        No         Yes        No         Yes        No
Clinical Characteristics
% Non-verbal                      28.5% (451/1580)   27.6%      29.5%      35.0%^2    29.3%      25.0%      29.6%
% positive for Gait disturbance   48.6% (695/1430)   57.5%      47.7%      55.6%      48.3%      55.4%      48.1%
% positive for Seizures           10.4% (149/1430)   20.2%^1,2  10.0%      33.3%^1,2  10.3%      19.6%^1    10.3%
% positive for Language delay     76.3% (1190/1559)  75.7%      76.6%      70.0%      76.7%      69.8%      76.9%
Age at first words (mean)         27.0 (1300)        28.5       26.9       32.1       27.0       26.6       27.1
Age at first phrases (mean)       42.1 (1112)        44.0       41.9       49.5       41.9       42.0       42.1
ADOS Severity Score (mean)        7.5 (1054)         8.0        7.5        8.3        7.5        7.7        7.6
Intelligence Testing
Verbal IQ (mean)                  78.7 (938)         76.1       77.9       83.3       77.7       80.8       77.6
Performance IQ (mean)             83.9 (1069)        78.0       83.3       85.0       82.9       79.2       83.1
Full Scale IQ (mean)              77.7 (798)         78.2       77.0       85.4       77.0       78.6       77.0
VABS Adaptive Function
% low Communication               61.6% (777/1262)   72.7%      61.0%      63.2%      61.9%      70.0%      61.6%
% low Socialization               72.2% (920/1274)   76.1%      72.4%      79.0%      72.6%      78.0%      72.4%
% low Daily Living Skills         70.1% (880/1255)   69.8%      70.0%      78.0%      69.8%      66.7%      70.1%
% low Composite                   74.0% (923/1248)   81.8%      73.8%      73.7%      74.4%      78.0%      74.3%
Parental/Family Factors
Maternal Age (mean)               30.6 (1301)        30.3       30.6       31.9       30.5       30.2       30.6
Paternal Age (mean)               33.0 (1298)        32.7       32.9       33.9       32.9       32.5       32.9
% Simplex Family Type             64.3% (862/1340)   72.0%      64.1%      84.6%      64.5%      80.0%^1,2  64.0%
*ADOS = Autism Diagnostic Observation Schedule; IQ = Intelligence Quotient. Statistically significant findings
(α=0.05) in unadjusted models are marked ^1, and statistically significant findings in adjusted models are
marked ^2. Adjusted models included adjustment for age at assessment and genotyping stage. The parental
age analyses also included adjustment for the opposite parent’s age. The ‘all’ category includes both deletions
and/or duplications.
3 References
1. Hu-Lince D, Craig DW, Huentelman MJ, Stephan DA. The Autism Genome Project: goals and
strategies. Am J Pharmacogenomics 2005; 5(4): 233-246.
2. Pinto D, Pagnamenta AT, Klei L, Anney R, Merico D, Regan R et al. Functional impact of global
rare copy number variation in autism spectrum disorders. Nature 2010; 466(7304): 368-372.
3. Pinto D, Delaby E, Merico D, Barbosa M, Merikangas A, Klei L et al. Convergence of genes and
cellular pathways dysregulated in autism spectrum disorders. Am J Hum Genet 2014; 94(5): 677-694.
4. Lee AB, Luca D, Klei L, Devlin B, Roeder K. Discovering genetic ancestry using spectral graph
theory. Genet Epidemiol 2010; 34(1): 51-59.
5. American Psychiatric Association. Diagnostic and statistical manual of mental disorders, 4th
Edition (DSM-IV). 4th edn. American Psychiatric Association: Washington, DC, 1994.
6. Lord C, Rutter M, Goode S, Heemsbergen J, Jordan H, Mawhood L et al. Autism diagnostic
observation schedule: a standardized observation of communicative and social behavior. J
Autism Dev Disord 1989; 19(2): 185-212.
7. Lord C, Rutter M, Le Couteur A. Autism Diagnostic Interview-Revised: a revised version of a
diagnostic interview for caregivers of individuals with possible pervasive developmental
disorders. J Autism Dev Disord 1994; 24(5): 659-685.
8. Risi S, Lord C, Gotham K, Corsello C, Chrysler C, Szatmari P et al. Combining information from
multiple sources in the diagnosis of autism spectrum disorders. J Am Acad Child Adolesc
Psychiatry 2006; 45(9): 1094-1103.
9. Gotham K, Pickles A, Lord C. Standardizing ADOS scores for a measure of severity in autism
spectrum disorders. J Autism Dev Disord 2009; 39(5): 693-705.
10. Sparrow SS, Cicchetti DV, Balla DA. Vineland Adaptive Behavior Scales, Second Edition
(Vineland-II). Pearson: San Antonio, TX, 2005.
11. Raychaudhuri S, Korn JM, McCarroll SA, Altshuler D, Sklar P, Purcell S et al. Accurately assessing
the risk of schizophrenia conferred by rare copy-number variation affecting genes with brain
function. PLoS Genet 2010; 6(9).
12. Muthén LK, Muthén BO. Mplus User's Guide. Sixth Edition. Muthén & Muthén: Los Angeles, CA,
1998-2011.
13. Vermunt JK, Magidson J. Latent Class Cluster Analysis. In: Hagenaars JA, McCutcheon AL (eds).
Applied Latent Class Analysis. Cambridge University Press, 2002.
14. Asparouhov T, Muthen B. Multilevel mixture models. In: Hancock GR, Samuelsen KM (eds).
Advances in latent variable mixture models. Information Age Publishing, Inc.: Charlotte, NC,
2008.
15. McCutcheon AL. Basic Concepts and Procedures in Single- and Multiple-Group Latent Class
Analysis. In: Hagenaars JA, McCutcheon AL (eds). Applied Latent Class Analysis. Cambridge
University Press, 2002.
16. Vittinghoff E, McCulloch CE, Glidden DV, Shiboski SC. 5 Linear and Non-Linear Regression
Methods in Epidemiology and Biostatistics. In: Rao CR, Rao DC, Miller JP (eds). Handbook of
Statistics: Epidemiology and Medical Statistics: 27, 2007.
17. Altmann A, Tolosi L, Sander O, Lengauer T. Permutation importance: a corrected feature
importance measure. Bioinformatics 2010; 26(10): 1340-1347.
18. Breiman L. Random forests. Machine Learning 2001; 45: 5-32.
19. Tutorial on Decision Tree. http://people.revoledu.com/kardi/tutorial/DecisionTree, 2009.
20. Zhang H, Wang M, Chen X. Willows: a memory efficient tree and forest construction package.
BMC Bioinformatics 2009; 10: 130.