Supplementary Methods for the analyses of the replication sample
1. Replication sample
In this study, we used an independent sample to verify the findings from our primary sample.
The replication sample was selected from the neuropathology collection of Stanley Medical
Research Institute. The sample contained 14 schizophrenia subjects, 14 bipolar disorder subjects,
and 15 healthy controls. The total RNA samples were isolated from the hippocampus and the
RNA samples were prepared and quality-controlled by Stanley staff. The RNA sequencing was
conducted using paired end chemistry, and standardized protocols were applied to base-calling
and removal of contaminated reads. Once the bases were called, we used the same procedures
and parameters as those used for our primary sample to map the reads to the human genome and
genes, and the expression levels of the genes were calculated as RPKM.
We recognize that the primary and replication samples came from different brain regions, and
that this difference could complicate the interpretation of the replication. We made this choice
for three reasons. A) Both the cingulate cortex and the hippocampus have been implicated in both
schizophrenia and bipolar disorder by imaging, gene expression and proteomic studies.1,2 Although
this is not a direct replication, the choice is reasonable. B) The main focus of this study was
genome-wide expression in schizophrenia and bipolar disorder, and the findings were
biological pathways and their interaction networks, not individual genes. These higher-level
interactions across different pathways are more likely to be preserved across brain regions.
Therefore, samples from a different brain region can verify the results if the findings are
true. C) We had practical difficulty finding a large transcriptome sequencing dataset from the
same brain region, and it would have been too long before we could repeat the same work with
an independent sample. When we were informed of the hippocampus dataset, we decided to use
it.
2. Gene differential expression
For the replication data set, we used RPKM as the gene expression index, as we did for the
primary sample. Prior to the differential expression analyses, we excluded genes with low
expression values. Specifically, we removed genes with RPKM = 0 in more than 20% of
individuals and genes with a median RPKM < 0.5. A total of 12,731 genes were retained for the
subsequent differential expression analysis. We then fitted a linear model to the expression data
of the schizophrenia (SCZ), bipolar disorder (BPD), and control samples to identify
differentially expressed genes (DEGs). Age, sex, cumulative antipsychotic use
(square root transformed), brain pH, and postmortem interval were included in the regression
analyses as covariates.
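The two expression thresholds above can be illustrated with a minimal Python sketch. It covers only the filtering step, not the linear model. The function names and the toy gene names and values are hypothetical and chosen purely for illustration.

```python
from statistics import median


def passes_expression_filter(rpkm_values, max_zero_fraction=0.20, min_median=0.5):
    """Return True if a gene is kept under the two thresholds described in the text."""
    n = len(rpkm_values)
    zero_fraction = sum(1 for v in rpkm_values if v == 0) / n
    # Removed if RPKM = 0 in more than 20% of individuals ...
    if zero_fraction > max_zero_fraction:
        return False
    # ... or if the median RPKM is below 0.5.
    if median(rpkm_values) < min_median:
        return False
    return True


def filter_genes(expression):
    """expression: dict mapping gene name -> list of RPKM values, one per individual."""
    return {gene: values for gene, values in expression.items()
            if passes_expression_filter(values)}


# Toy data (hypothetical genes and values, for illustration only).
toy = {
    "GENE_A": [1.2, 0.9, 2.1, 1.5, 0.8],  # no zeros, median 1.2: kept
    "GENE_B": [0.0, 0.0, 3.0, 2.0, 1.0],  # 40% zeros: removed
    "GENE_C": [0.1, 0.2, 0.4, 0.3, 0.6],  # median 0.3: removed
}
kept = filter_genes(toy)
print(sorted(kept))
```

Note that a gene with zeros in exactly 20% of individuals is kept in this sketch, because the text specifies "more than 20%".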
The differential expression analyses did not produce a clear set of DEGs: no gene was
statistically significant after multiple-testing correction (minimum q-values were 0.2291
and 0.7267 for SCZ and BPD, respectively). Because we had identified 105 DEGs for SCZ and 153
DEGs for BPD in the primary data set, we chose the top 105 and 153 genes (ordered by
increasing p-value) as DEG candidates for SCZ and BPD, respectively, for the pathway and
network analyses. Of these top-ranked DEG candidates, 98 SCZ genes and 144 BPD genes had a
valid Entrez Gene ID (according to the R package “org.Hs.eg.db” v 2.9.0). These top-ranked DEG
candidates did not overlap with the DEGs found in the primary sample for either SCZ or BPD.
We used the same rationale to select a set of 213 DCEG candidates (212 with valid Entrez Gene
IDs) based on the absolute product of the paired t-scores (from the SCZ and BPD analyses,
respectively). Only one gene (ETS2) overlapped with the DCEGs found in the primary sample.
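The candidate selection described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the "absolute product of paired t-scores" is read here as |t_SCZ × t_BPD| per gene, with larger values ranked first, and all function names, gene names and values are hypothetical.

```python
def top_n_by_pvalue(pvalues, n):
    """pvalues: dict gene -> p-value. Return the n genes with the smallest p-values."""
    ranked = sorted(pvalues, key=lambda g: (pvalues[g], g))  # gene name breaks ties
    return ranked[:n]


def top_n_by_t_product(t_scz, t_bpd, n):
    """Rank genes by |t_SCZ * t_BPD|, largest first (assumed reading), and return the top n."""
    shared = set(t_scz) & set(t_bpd)
    ranked = sorted(shared, key=lambda g: (-abs(t_scz[g] * t_bpd[g]), g))
    return ranked[:n]


# Toy inputs (hypothetical genes and statistics, for illustration only).
p_scz = {"G1": 0.04, "G2": 0.001, "G3": 0.20, "G4": 0.01}
t_scz = {"G1": 2.0, "G2": -3.0, "G3": 0.5, "G4": 1.0}
t_bpd = {"G1": -2.5, "G2": 0.2, "G3": 4.0, "G4": 1.0}

deg_candidates = top_n_by_pvalue(p_scz, 2)
dceg_candidates = top_n_by_t_product(t_scz, t_bpd, 2)

# Overlap with a (hypothetical) primary-sample gene list is a plain set intersection.
primary_degs = {"G4", "G9"}
overlap = set(deg_candidates) & primary_degs
print(deg_candidates, dceg_candidates, sorted(overlap))
```

In the analysis described in the text, n would be 105 and 153 for the SCZ and BPD DEG candidates and 213 for the DCEG candidates.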
3. Pathway enrichment analysis
We performed the pathway enrichment analyses using the same procedures as for
the primary sample. Specifically, we used the hypergeometric test implemented in the tool
WebGestalt (version 2, http://bioinfo.vanderbilt.edu/webgestalt/)3 to identify enriched pathways
from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. To avoid pathways with too
many or too few genes, we included only pathways containing between 5 and 250 genes.4
The p-values from the hypergeometric tests were then
adjusted by the Benjamini-Hochberg method.5 For the top-ranked DEG candidates in SCZ, the
regulation of actin cytoskeleton pathway was identified, directly confirming the same
pathway found for the DEGs from the primary sample. Similarly, the pathway “Metabolism of
xenobiotics by cytochrome P450” was directly confirmed for the BPD differentially expressed
genes. In the analyses of the top-ranked DCEG candidates, the small cell lung cancer pathway was
the only pathway confirmed.
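The statistics behind this step can be illustrated with a short, self-contained Python sketch. It is not WebGestalt code: it only shows an upper-tail hypergeometric test, the 5 to 250 gene size filter, and the Benjamini-Hochberg adjustment. The choice of background gene set, the counting of pathway size within that background, and all names and toy data are assumptions made for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input list."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(candidates, pathways, background, min_size=5, max_size=250):
    """Return {pathway: (p, adjusted p)} for pathways passing the size filter."""
    background = set(background)
    candidates = set(candidates) & background
    names, pvals = [], []
    for name, genes in pathways.items():
        members = set(genes) & background
        if not (min_size <= len(members) <= max_size):
            continue  # skip pathways that are too small or too large
        k = len(members & candidates)
        names.append(name)
        pvals.append(hypergeom_upper_tail(k, len(background), len(members), len(candidates)))
    return dict(zip(names, zip(pvals, benjamini_hochberg(pvals))))


# Toy example (hypothetical genes and pathways).
background = [f"g{i}" for i in range(20)]
pathways = {
    "P1": [f"g{i}" for i in range(0, 6)],
    "P2": [f"g{i}" for i in range(10, 16)],
    "P3": ["g0", "g1"],  # fewer than 5 genes: excluded by the size filter
}
candidates = ["g0", "g1", "g2", "g3", "g10"]
result = enrich(candidates, pathways, background)
for name, (p, q) in result.items():
    print(name, round(p, 4), round(q, 4))
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free; for real gene lists a statistics library would be the more usual choice.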
4. Pathway crosstalk/interaction analysis
To test whether the same pathway interactions exist among the top-ranked DCEG candidates, we
conducted an analysis using the 18 pathways enriched in the primary sample DCEGs and the DCEG
candidates selected from the replication dataset. Specifically, we applied the Characteristic
Sub-Pathway Network (CSPN) algorithm6 to identify significantly interacting pathway pairs. CSPN
was designed to prioritize pathway pairs with a large number of pathway-bridging protein-protein
interactions (PPIs) that would be unlikely to exist in randomly permuted PPI networks. We used
the human PPI data from the Protein Interaction Network Analysis (PINA) platform (September
14, 2012)7 as the reference network in this pathway crosstalk analysis. Our working PPI network
included a total of 11,318 nodes (protein-coding genes) and 67,936 interactions. When running
CSPN, we selected the “OR” mode, meaning that we considered all PPIs formed by the DCEGs
as well as their one-step extension. In the final step of this analysis, we selected as
significant the pathway interaction pairs with permutation p-values less than 0.05. We
identified five significant pathway interactions, all of which were also found in the
primary sample. In particular, the interaction between axon guidance and Fc gamma R-mediated
phagocytosis and the interaction between axon guidance and regulation of actin cytoskeleton were
verified in the replication sample.
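The general idea of testing pathway-bridging PPIs against permuted networks can be sketched in Python as below. This is not the CSPN script and does not reproduce its "OR" mode or one-step extension. The null model (random shuffling of node labels, which preserves network topology), the function names, and the toy network are all assumptions made for illustration.

```python
import random


def count_bridging(edges, set_a, set_b):
    """Number of PPIs with one endpoint in set_a and the other in set_b."""
    return sum(1 for u, v in edges
               if (u in set_a and v in set_b) or (u in set_b and v in set_a))


def permutation_pvalue(edges, set_a, set_b, n_perm=1000, seed=0):
    """Compare the observed bridging count with counts in label-shuffled networks."""
    rng = random.Random(seed)
    nodes = sorted({node for edge in edges for node in edge})
    observed = count_bridging(edges, set_a, set_b)
    hits = 0
    for _ in range(n_perm):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(nodes, shuffled))  # assumed null: shuffle node labels
        permuted = [(relabel[u], relabel[v]) for u, v in edges]
        if count_bridging(permuted, set_a, set_b) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy network (hypothetical proteins and pathway memberships).
edges = [
    ("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a1", "b2"),
    ("x1", "x2"), ("x2", "x3"), ("x3", "x4"), ("x4", "x5"),
    ("x5", "x6"), ("x1", "a1"),
]
pathway_a = {"a1", "a2", "a3"}
pathway_b = {"b1", "b2", "b3"}

observed, p_value = permutation_pvalue(edges, pathway_a, pathway_b)
print(observed, p_value, p_value < 0.05)
```

Under this sketch, a pathway pair would be called significant when its permutation p-value falls below 0.05, matching the threshold stated in the text.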
Acknowledgement
We thank Drs. Junfeng Xia and Xiaojing Wang for helpful discussions and technical support, and
Dr. Shao Li for providing the CSPN script.
References
1. Focking M, Dicker P, English JA, Schubert KO, Dunn MJ, Cotter DR. Common proteomic
changes in the hippocampus in schizophrenia and bipolar disorder and particular
evidence for involvement of cornu ammonis regions 2 and 3. Arch Gen Psychiatry
2011; 68: 477-488.
2. Sheng G, Demers M, Subburaju S, Benes FM. Differences in the circuitry-based association
of copy numbers and gene expression between the hippocampi of patients with
schizophrenia and the hippocampi of patients with bipolar disorder. Arch Gen
Psychiatry 2012; 69: 550-561.
3. Wang J, Duncan D, Shi Z, Zhang B. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt):
update 2013. Nucleic Acids Res 2013; 41: W77-W83.
4. Jia P, Liu Y, Zhao Z. Integrative pathway analysis of genome-wide association studies and
gene expression data in prostate cancer. BMC Syst Biol 2012; 6 Suppl 3: S13.
5. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful
approach to multiple testing. J R Statist Soc B 1995; 57: 289-300.
6. Huang Y, Li S. Detection of characteristic sub pathway network for angiogenesis based on
the comprehensive pathway network. BMC Bioinformatics 2010; 11 Suppl 1: S32.
7. Wu J, Vallenius T, Ovaska K, Westermarck J, Makela TP, Hautaniemi S. Integrated network
analysis platform for protein-protein interactions. Nat Methods 2009; 6: 75-77.