SUPPLEMENTARY INFORMATION

Table of Contents
1. Introduction
2. Supplementary information about samples from different databases
2.1. Study design
2.2. Brain collections
2.2.1. Stanley Medical Research Institute (SMRI): Bipolar disorder (BD), Schizophrenia (SCZ) and control (CTRL) samples
2.2.1.1. Prefrontal cortex (PFC)
2.2.1.2. Parietal cortex (PCX)
2.2.1.3. Cerebellum (CB)
2.2.2. Victorian Brain Bank Network (VBBN): SCZ and CTRL samples
3. SMRI PCX and CB sample preparation
3.1. RNA extraction
3.2. Array processing
4. Gene expression data quality assessment
4.1. Probe level assessment
4.1.1. Affymetrix Gene Array filtering
4.1.2. Affymetrix U133 Array filtering
4.2. Sample level assessment
4.3. Batch effect adjustment
4.4. Summary
5. Gene expression data analyses
5.1. Identification of differential expression genes between SCZ and CTRL samples in PCX-SMRI
5.2. Weighted gene co-expression network analysis (WGCNA)
5.2.1. Network construction
5.2.2. Module detection
5.3. Characteristics of modules
5.4. Module preservation statistics
5.5. Pathway and functional analyses
6. Genotyping data from different consortiums
6.1. Genetic Association Information Network (GAIN)-SCZ
6.2. GAIN-BD
6.3. Translational Genomics Research Institute (TGen)-BD
7. Imputation of GWAS data
8. Integration analysis of expression and GWAS data
9. Supplementary references
10. Supplementary figures
Supplementary Fig. 1 | Cross-tabulation of results displays overlaps between Oldham modules and PCX-CTRL modules
Supplementary Fig. 2 | Topological overlap matrix (TOM) plot to detect modules in PCX
Supplementary Fig. 3 | Composite preservation statistics of SMRI PCX modules in SMRI PFC SCZ and CTRL samples
Supplementary Fig. 4 | Composite preservation statistics of SMRI PCX modules in SMRI CB SCZ and CTRL samples
Supplementary Fig. 5 | Composite preservation statistics of SMRI PCX modules in VBBN PFC SCZ and CTRL samples
Supplementary Fig. 6 | Composite preservation statistics of SMRI PCX modules in SMRI PCX BD and CTRL samples
Supplementary Fig. 7 | Composite preservation statistics of SMRI PCX modules in SMRI CB BD and CTRL samples
Supplementary Fig. 8 | Manhattan Plot for M1A gene set enrichment in GAIN-BD genetic association signals
Supplementary Fig. 9 | Manhattan Plot for M1A gene set enrichment in GAIN-SCZ genetic association signals
Supplementary Fig. 10 | Manhattan Plot for M1A gene set enrichment in TGen-BD genetic association signals
11. Supplementary tables
Supplementary Table 1 | Demography of SMRI PCX samples
Supplementary Table 2 | Demography of SMRI CB samples
Supplementary Table 3 | Demography of SMRI PFC samples
Supplementary Table 4 | Demography of VBBN PFC samples
Supplementary Table 5 | M1A gene list and intramodular connectivity
Supplementary Table 6 | M3A gene list and intramodular connectivity
Supplementary Table 7 | M1A gene list and top GO functions
Supplementary Table 8 | M3A gene list and top GO functions

1. Introduction
This file contains detailed information about sample sources and preparation (sections 2, 3), data preprocessing and analysis (sections 4-8), and supplementary figures and tables. The sample sources include the online resources from which we downloaded data and the data produced in our lab. The data preprocessing and analysis include quality control assessments and data analysis. In sections 5-8, we provide detailed information on gene expression network analysis, module preservation analysis, pathway analysis, and the genetic signal enrichment test. Additional tables and figures are presented at the end.

2. Supplementary information about samples from different databases
2.1. Study design
Parietal cortex tissues from Stanley Medical Research Institute (SMRI) SCZ and CTRL samples were used as preliminary data in our analysis [49].
To validate our findings, we tested them in gene expression data sets from three brain banks, three brain regions and two psychotic diseases.

2.2. Brain collections
2.2.1. Stanley Medical Research Institute (SMRI): BD, SCZ and CTRL samples
Brains came from the SMRI’s Neuropathology Consortium and Array collections, and included 50 SCZ samples, 50 BD samples and 50 CTRL samples. Detailed information about age, sex, race, postmortem interval, pH and side of brain is provided in the demographics table (Supplementary Table 1).

2.2.1.1. Parietal cortex (PCX) and Cerebellum (CB)
Our lab obtained the PCX and CB tissues from SMRI. The RNA was prepared in our lab, then sent to the Yale core facility for the microarray experiments. The details of the sample preparation procedure are given in section 3.

2.2.1.2. Prefrontal cortex (PFC)
PFC gene expression data were downloaded from the Stanley Medical Research Institute’s online genomics database (www.stanleygenomics.org). The study ID is 5, and the study was created by Seth E. Dobrin. The brain region is frontal BA46 and the array type is Affymetrix hug133p.

2.2.2. Victorian Brain Bank Network (VBBN): SCZ and CTRL samples
VBBN expression data were downloaded from the Gene Expression Omnibus (GEO) database. The data came from postmortem brain tissue (BA46) of 30 schizophrenic patients and 29 age- and sex-matched CTRLs (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE21138) [50].

3. PCX and CB sample preparation
3.1. RNA extraction
RNA was extracted from the cerebellar cortex of 132 samples and the parietal cortex of 146 samples using the RNeasy Mini kit (Qiagen, Valencia, CA). The concentration and A260/A280 ratio were measured on the NanoDrop spectrophotometer. The 28S:18S rRNA ratio and RNA Integrity Number (RIN) were measured using an RNA LabChip kit on the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). Only RNA samples with a RIN > 7 were used for the expression profiling.

3.2.
Array processing
The Affymetrix Human Gene 1.0 ST Array was used for whole-genome transcriptome profiling at the NIH Neuroscience Microarray Consortium facility at Yale University.

4. Gene expression data quality assessment
4.1. Probe level assessment
4.1.1. Filtering of Affymetrix gene array data
Single nucleotide polymorphisms (SNPs) in probe regions can affect probe hybridization efficiency [51]. We created a list of probes containing SNPs and removed those probes before data preprocessing: 39,529 out of 805,481 probes were eliminated from the analysis. We provided customized library files (http://bioinfo.psych.uic.edu/ArrayGenes/SNPsInProbes.jsp) for the Robust Multichip Average (RMA) preprocessing steps: background correction, quantile normalization and gene-level summarization [52]. Afterwards, for convenience of comparison, only genes with Entrez IDs were kept.

4.1.2. Filtering of Affymetrix U133 Array data
To reduce noise, we filtered probe sets using the MAS 5.0 algorithm [53]; probe sets that were called present in more than 80% of the samples were retained. As for the PCX and CB data, probe sets without detailed annotation were also removed.

4.2. Sample level assessment
For all data sets, we removed non-Europeans and outliers detected by Affymetrix Expression Console, and randomly chose one sample if the data set included replicates [54].

4.3. Batch effect adjustment
ComBat, an efficient batch effect removal approach, was used to remove batch effects from these data sets [55,56].

4.4. Summary
The data collected in our lab from the SMRI PCX and CB tissues ultimately included CB data from 39 schizophrenia patients, 36 bipolar disorder patients and 44 normal samples, as well as PCX data from 45 schizophrenia patients, 42 bipolar disorder patients and 46 normal samples. For both of these data sets, 19,984 genes were retained.
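The present-call filter described in section 4.1.2 can be illustrated with a minimal Python sketch. It assumes that detection calls ('P' present, 'M' marginal, 'A' absent) have already been produced upstream; the function name, the input layout and the toy probe set IDs are hypothetical and are not taken from the original analysis.

```python
def filter_present_probesets(calls, min_fraction=0.8):
    """Return probe set IDs called present ('P') in MORE than
    min_fraction of the samples.

    calls: dict mapping a probe set ID to a list of detection calls,
    one per sample ('P' present, 'M' marginal, 'A' absent).
    """
    kept = []
    for probeset, sample_calls in calls.items():
        present = sum(1 for call in sample_calls if call == "P")
        if present / len(sample_calls) > min_fraction:
            kept.append(probeset)
    return kept


# Hypothetical toy calls for three probe sets across five samples.
toy_calls = {
    "ps_1": ["P", "P", "P", "P", "P"],  # 100% present: kept
    "ps_2": ["P", "P", "P", "P", "A"],  # exactly 80%: not "more than 80%"
    "ps_3": ["A", "M", "P", "A", "A"],  # 20% present: dropped
}
kept = filter_present_probesets(toy_calls)
```

The strict inequality mirrors the wording "more than 80%"; whether a probe set at exactly 80% was retained in the original analysis is not stated in the text.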
Downloaded SMRI PFC data came from 30 schizophrenia patients, 25 bipolar disorder patients and 29 normal samples; for this data set, 14,988 probe sets were retained for further analysis. The PFC data set from VBBN included 30 schizophrenia patients and 29 normal samples; 14,988 probe sets were retained for further analysis. The PFC data from CCHPC included 28 schizophrenia patients and 23 normal samples; 14,988 probe sets were retained for further analysis.

5. Gene expression data analyses
5.1. Identification of genes differentially expressed between SCZ and CTRL samples in PCX-SMRI
Multiple linear regression was used for each transcript, with age, sex and post-mortem interval as covariates, to remove any effects of these potentially confounding factors. Differential gene expression analysis between SCZ and CTRL samples was run after this correction.

5.2. Weighted gene co-expression network analysis (WGCNA)
We identified genes with similar expression patterns using weighted gene co-expression network analysis (WGCNA) [57,58].

5.2.1. Network construction
The absolute values of Pearson correlation coefficients were calculated for all possible gene pairs. This correlation matrix, $S = [s_{ij}]$, was transformed into a weighted adjacency matrix $A = [a_{ij}]$ by a power function, i.e. $a_{ij} = \mathrm{power}(s_{ij}, \beta) = |s_{ij}|^{\beta}$. The power parameter, $\beta$, was chosen to ensure that the adjacency matrix had an approximate scale-free topology.

5.2.2. Module detection
Gene module detection began with transforming the adjacency matrix into a topological overlap matrix (TOM) $\Omega = [\omega_{ij}]$, with $0 < a_{ij} < 1$ implying $0 < \omega_{ij} < 1$. Next, the TOM-based dissimilarity between all possible gene pairs was defined by $d_{ij} = 1 - \omega_{ij}$. The Dynamic Tree Cut algorithm was used to detect network modules. WGCNA and the Dynamic Tree Cut algorithm were implemented in R [59].

5.3.
Characteristics of modules
Association test based on singular value decomposition (SVD) on each module
We ran singular value decomposition (SVD) on each module’s expression matrix, and used the resulting eigengene to characterize the entire module in the subsequent analysis. After multiple linear regression on the eigengenes to remove the effects of sex, age, pH and PMI in the SMRI and VBBN samples, the corrected module values were used to test for disease association using Pearson’s correlation test [60].

5.4. Module preservation statistics
We used cross-tabulation and module preservation statistics to assess module preservation in different expression data sets [61]. The module preservation test has two advantages over the traditional cross-tabulation test. First, it considers not only the number of overlapping genes, but also the density and connectivity patterns of modules defined in the reference data set. Second, network-based preservation statistics do not require modules to be identified in the test set for comparison with the reference data set; this reduces the variation introduced into the analysis by the various parameter settings used to build the network. Zsummary is the statistic used to summarize evidence that a module is preserved more significantly than a random sample of all network genes. Langfelder et al. proposed the following thresholds: Zsummary < 2 implies no evidence for module preservation, 2 < Zsummary < 10 implies weak to moderate evidence, and Zsummary > 10 implies strong evidence for module preservation [62]. When module size varied across data sets, we reported the median rank statistic from our module preservation test, which is useful for comparing relative preservation among modules because it does not depend on module size. Before the preservation test, we converted the probe-level measurements into gene-level measurements to make the data from different platforms comparable.
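The Zsummary thresholds quoted above can be written as a small helper, shown here only as an illustrative Python sketch. The function name is hypothetical, and the handling of values exactly equal to 2 or 10 is an assumption, since the thresholds as stated leave those boundary cases open.

```python
def preservation_evidence(z_summary):
    """Map a Zsummary value to the evidence category described in the text.

    Assumption: values exactly equal to 2 or 10 are placed in the
    'weak to moderate' category; the text does not specify these cases.
    """
    if z_summary < 2:
        return "none"
    if z_summary <= 10:
        return "weak to moderate"
    return "strong"


# Hypothetical Zsummary values, for illustration only.
examples = {z: preservation_evidence(z) for z in (1.5, 5.0, 12.3)}
```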
For each gene, only the probe with the highest coefficient of variation was kept for further analysis. Overall, 8497 genes were retained in our preservation calculation.

5.5. Pathway and functional analyses
The genes in the identified modules were analyzed with DAVID (http://david.abcc.ncifcrf.gov/tools.jsp) [63], which provided existing knowledge about those genes: their functionality, potential relevance to neuropsychiatric diseases, and neuronal functions. We also identified specific annotation categories, including canonical pathways, functional categories, protein domains and Gene Ontology (GO) terms, that may be enriched for these genes. M3A contains 106 genes; we used the whole gene list to run the test. M1A includes 490 genes; due to its size, we selected the 200 genes most highly correlated with the module’s eigengene to run the test.

6. Genotyping data from different consortia
6.1. SCZ GWAS data (GAIN)
SCZ genome-wide association data were downloaded from dbGaP (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000021.v3.p2). The study web link is http://www.genome.gov/19518664 (Genetic Association Information Network, GAIN) [64]. The consortium has collected 4591 cases and controls (1217 European-American cases, 1442 European-American controls, 953 African-American cases, 979 African-American controls). Whole-genome genotyping was done with the Affymetrix Genome-wide Human SNP Array 6.0.

6.2. BD GWAS data (GAIN)
BD genome-wide association data were downloaded from dbGaP (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000017.v3.p1). The study web link is http://www.genome.gov/19518664 (Genetic Association Information Network, GAIN) [65]. The consortium has collected 3261 cases and controls (1079 European-American cases, 1081 European-American controls, 415 African-American cases, 686 African-American controls). Whole-genome genotyping was done with the Affymetrix Genome-wide Human SNP Array 6.0.

6.3.
BD GWAS data (TGen)
BD genome-wide association data were downloaded from the Translational Genomics Research Institute (TGen). This study includes 1,190 newly genotyped BD cases from the Bipolar Genome Study (BiGS) and 401 controls. Sample genotyping was conducted using Affymetrix GeneChip Mapping 5.0K Array.

7. Imputation of GWAS data
BRLMM-p (Affymetrix) was used as the genotype-calling algorithm. SNPs with call rates below 99% were excluded from the analysis. SNPs departing from Hardy-Weinberg equilibrium (HWE; p < 0.001) were also filtered out. Of the remaining SNPs, only those with a minor allele frequency (MAF) of at least 10% were carried forward for further analysis. We used MaCH v1.0 for SNP imputation to increase the density of interrogated SNPs [66,67]. Overall, 2,593,107 SNPs in GAIN-SCZ, 3,281,319 SNPs in GAIN-BD, and 2,543,887 SNPs in TGen-BD were included after imputation.

8. Integration analysis of gene expression and GWAS data
We used the following procedure to run the enrichment test [20]. First, the maximum −log(P-value) among the SNPs located within 20 kb upstream or downstream of a gene was assigned to represent the gene; then the M1A and M3A gene set enrichment scores (ES) were calculated based on the genes’ ranks. SNP-label-level permutation was used to generate a distribution of the ES, and the distribution was then normalized [68]. FDR q values were calculated if multiple gene sets were included in the enrichment test.

We tested the difference in gene length distribution between genes in module M1A, which was enriched with neuronal differentiation functions, and other genes in the genome. The result was significant (p=5.47e-27). To test whether gene length bias existed in our genetic signal enrichment test, we applied a permutation procedure to randomly selected genes with a similar gene length distribution for verification.
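The gene-length check introduced above, whose individual steps are described in the remainder of this section, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code used in the study: the permutation P value is assumed to be the fraction of null P values that are less than or equal to the observed M1A P value, the procedure is assumed to continue until B length-matched gene sets have been accepted, and `enrichment_p` is a hypothetical placeholder for whatever GWAS enrichment test is run on a gene set.

```python
import random


def length_matched_permutation_p(observed_p, module_lengths, background,
                                 enrichment_p, n_perm=1000,
                                 max_draws=100000, seed=0):
    """Permutation P value for a gene-length bias check (illustrative only).

    observed_p     : enrichment P value of the module (P_M1A in the text).
    module_lengths : gene lengths of the module genes.
    background     : dict mapping gene ID -> length for genes outside the module.
    enrichment_p   : hypothetical callable, list of gene IDs -> P value.
    """
    rng = random.Random(seed)
    genes = list(background)
    module_mean = sum(module_lengths) / len(module_lengths)
    null_ps = []
    draws = 0
    while len(null_ps) < n_perm and draws < max_draws:
        draws += 1
        sample = rng.sample(genes, len(module_lengths))
        sample_mean = sum(background[g] for g in sample) / len(sample)
        # Keep only random sets whose mean length exceeds the module mean.
        if sample_mean > module_mean:
            null_ps.append(enrichment_p(sample))
    if not null_ps:
        raise ValueError("no length-matched gene set was drawn")
    # Assumed formula: share of null P values at least as small as observed.
    hits = sum(1 for p in null_ps if p <= observed_p)
    return hits / len(null_ps)


# Hypothetical toy inputs: 50 long background genes, a 3-gene module,
# and a placeholder enrichment test that always returns 0.5.
toy_background = {"g%d" % i: 1000 + i for i in range(50)}
toy_module_lengths = [100, 200, 300]
p_low = length_matched_permutation_p(0.01, toy_module_lengths, toy_background,
                                     lambda genes: 0.5, n_perm=20)
p_high = length_matched_permutation_p(0.5, toy_module_lengths, toy_background,
                                      lambda genes: 0.5, n_perm=20)
```

In the setting described in the text, `n_perm` would correspond to B = 1000 and the module size to the 490 genes of M1A.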
First, we randomly selected 490 genes (the number of genes in the M1A module) not included in M1A, and compared the length distribution of those genes to that of M1A. If the mean length of the randomly selected genes was longer than the mean length of the M1A genes, we ran the GWAS enrichment test on the selected gene set and calculated the enrichment P value. Second, we repeated the above steps B times to obtain the null statistics $P_b$, which form the background distribution of P values. Finally, we computed the permutation P value for the M1A GWAS enrichment as: [equation not recoverable from the source], where B equals 1000 and $P_{M1A}$ is the M1A GWAS enrichment P value. A permutation P value below 0.05 indicates that no bias was introduced by gene length, or at least that the effect of such bias was not significant. Our results show that the permutation P values are 0.089, 0.047, and 0.034 for the GAIN-SCZ, GAIN-BD and TGen-BD GWAS signal enrichment tests, respectively. This indicates that the significant enrichments we detected are not primarily a product of gene length bias.

9. Supplementary References
49. Torrey, E.F., Webster, M., Knable, M., Johnston, N. & Yolken, R.H. The Stanley Foundation brain collection and Neuropathology Consortium. Schizophrenia Research 44, 151-155 (2000).
50. Narayan, S. et al. Molecular profiles of schizophrenia in the CNS at different stages of illness. Brain Research 1239, 235-248 (2008).
51. Benovoy, D., Kwan, T. & Majewski, J. Effect of polymorphisms within probe-target sequences on olignonucleotide microarray experiments. Nucleic Acids Res 36, 4417-23 (2008).
52. Irizarry, R.A. et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249-264 (2003).
53. Li, C. & Wong, W.H. Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proceedings of the National Academy of Sciences of the United States of America 98, 31-36 (2001).
54. Affymetrix. Affymetrix Expression Console Software website. Vol. 2011.
55. Johnson, W.E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118-127 (2007).
56. Chen, C. et al. Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods. PLoS One 6 (2011).
57. Zhang, B. & Horvath, S. A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol 4, Article17 (2005).
58. Horvath, S. & Langfelder, P. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9 (2008).
59. Langfelder, P., Zhang, B. & Horvath, S. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24, 719-720 (2008).
60. Langfelder, P. & Horvath, S. Eigengene networks for studying the relationships between coexpression modules. BMC Systems Biology 1 (2007).
61. Langfelder, P., Luo, R., Oldham, M.C. & Horvath, S. Is My Network Module Preserved and Reproducible? PLoS Computational Biology 7 (2011).
62. Langfelder, P. et al. A systems genetic analysis of high density lipoprotein metabolism and network preservation across mouse models. Biochim Biophys Acta.
63. Huang, D.W., Sherman, B.T. & Lempicki, R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols 4, 44-57 (2009).
64. Stefansson, K. et al. Common variants conferring risk of schizophrenia. Nature 460, 744-747 (2009).
65. Dick. Genomewide linkage analyses of bipolar disorder: A new sample of 250 pedigrees from the national institute of mental health genetics initiative (vol 73, pg 107, 2003). American Journal of Human Genetics 73, 979-979 (2003).
66. Li, Y., Willer, C., Sanna, S. & Abecasis, G. Genotype Imputation. Annual Review of Genomics and Human Genetics 10, 387-406 (2009).
67. Li, Y., Willer, C.J., Ding, J., Scheet, P. & Abecasis, G.R. MaCH: Using Sequence and Genotype Data to Estimate Haplotypes and Unobserved Genotypes. Genetic Epidemiology 34, 816-834 (2010).
68. Wang, J., Zhang, K.L., Cui, S.J., Chang, S.H. & Zhang, L.Y. i-GSEA4GWAS: a web server for identification of pathways/gene sets associated with traits by applying an improved gene set enrichment analysis to genome-wide association study. Nucleic Acids Research 38, W90-W95 (2010).

10. Supplementary Figures
Supplementary Fig. 1 | Cross-tabulation of results displays overlaps between Oldham modules and PCX-CTRL modules
Colors indicate the significance of the overlap in gene composition between particular modules, specifically the -log of the p-values from the hypergeometric distribution test.

Supplementary Fig. 2 | Topological overlap matrix (TOM) plot to detect modules in PCX SCZ+CTRL samples
Each line in the hierarchical clustering dendrogram indicates one gene. Modules correspond to branches of the dendrogram and are displayed as various colors in the horizontal and vertical bars. The color in the topological overlap matrix plot indicates module membership.

Supplementary Fig. 3 | Composite preservation statistics of SMRI PCX modules in SMRI PFC SCZ and CTRL samples.
The summary preservation statistic, Zsummary, was used to test whether SMRI PCX modules were preserved in SMRI PFC data. M1A is the brown module, and M3A is the green module. The X-axis represents the number of genes in each module; the Y-axis represents Zsummary in the second data set. The blue and green dashed lines in the figure indicate the thresholds Z=2 and Z=10, respectively. Zsummary < 2 implies no evidence for module preservation, 2 < Zsummary < 10 implies weak to moderate evidence, and Zsummary > 10 implies strong evidence for module preservation. M1A is well preserved and M3A is moderately preserved in SMRI PFC.

Supplementary Fig. 4 | Composite preservation statistics of SMRI PCX modules in SMRI CB SCZ and CTRL samples.
The X-axis represents the number of genes in each module; the Y-axis represents Zsummary in the second data set. The blue and green dashed lines in the figure indicate the thresholds Z=2 and Z=10, respectively. Zsummary < 2 implies no evidence for module preservation, 2 < Zsummary < 10 implies weak to moderate evidence, and Zsummary > 10 implies strong evidence for module preservation. Each point represents a module in SMRI PCX, labeled by color. M1A is the brown module, and M3A is the green module.

Supplementary Fig. 5 | Composite preservation statistics of PCX modules in VBBN PFC SCZ and CTRL samples.
The X-axis represents the number of genes in each module; the Y-axis represents Zsummary in the second data set. The blue and green dashed lines in the figure indicate the thresholds Z=2 and Z=10, respectively. Zsummary < 2 implies no evidence for module preservation, 2 < Zsummary < 10 implies weak to moderate evidence, and Zsummary > 10 implies strong evidence for module preservation. Each point represents a module in PCX, labeled by color. M1A is the brown module, and M3A is the green module.

Supplementary Fig. 6 | Composite preservation statistics of PCX modules in SMRI PCX BD and CTRL samples.
The X-axis represents the number of genes in each module; the Y-axis represents Zsummary in the second data set. The blue and green dashed lines in the figure indicate the thresholds Z=2 and Z=10, respectively. Zsummary < 2 implies no evidence for module preservation, 2 < Zsummary < 10 implies weak to moderate evidence, and Zsummary > 10 implies strong evidence for module preservation. Each point represents a module in PCX, labeled by color. M1A is the brown module, and M3A is the green module.

Supplementary Fig. 7 | Composite preservation statistics of PCX modules in SMRI CB BD and CTRL samples.
The X-axis represents the number of genes in each module; the Y-axis represents Zsummary in the second data set. The blue and green dashed lines in the figure indicate the thresholds Z=2 and Z=10, respectively. Zsummary < 2 implies no evidence for module preservation, 2 < Zsummary < 10 implies weak to moderate evidence, and Zsummary > 10 implies strong evidence for module preservation. Each point represents a module in PCX, labeled by color. M1A is the brown module, and M3A is the green module.

Supplementary Fig. 8 | Manhattan Plot for M1A gene set enrichment with GAIN BD genetic association signals.
The GWAS of BD includes 2,662,182 SNPs with corresponding P-values. The SNP with the highest −log(p) located within 20 kb upstream or downstream of a gene was used to represent the gene. In total, 1,352,922 variants were used and 18,316 genes were mapped for enrichment analysis.

Supplementary Fig. 9 | Manhattan Plot for M1A gene set enrichment with GAIN SCZ genetic association signals.
The GWAS of SCZ includes 2,593,107 SNPs with corresponding P-values. The SNP with the highest −log(p) located within 20 kb upstream or downstream of a gene was used to represent the gene. In total, 1,345,474 variants were used and 17,542 genes were mapped for enrichment analysis.

Supplementary Fig. 10 | Manhattan Plot for M1A gene set enrichment with TGen BD genetic association signals.
The GWAS of BD includes 2,542,706 SNPs with corresponding P-values. The SNP with the highest −log(p) located within 20 kb upstream or downstream of a gene was used to represent the gene. In total, 1,322,654 variants were used and 17,607 genes were mapped for enrichment analysis.

11.
Supplementary Tables

Supplementary Table 1 | Demography of SMRI PCX samples
                            Schizophrenia     Bipolar disorder    Normal Controls
Age(years)                  42.76(20-60)      44.23(20-65)        45.41(30-70)
Sex(M/F)                    35/14             25/21               34/15
Race(Euro/Non-Euro)         45/4              44/2                48/1
PMI(hours)                  32.20(9-80)       35.65(12-84)        27.22(8-58)
Brain pH                    6.39(5.8-6.93)    6.39(5.8-6.97)      6.50(5.8-7.03)
Left Brain(Fixed/Frozen)    23/26             22/24               25/24

Supplementary Table 2 | Demography of SMRI CB samples
                            Schizophrenia     Bipolar disorder    Normal Controls
Age(years)                  43.21(20-70)      44.72(20-65)        45.11(30-70)
Sex(M/F)                    28/11             19/17               31/13
Race(Euro/Non-Euro)         39/0              36/0                44/0
Brain pH                    6.43(5.80-6.93)   6.43(5.92-6.97)     6.53(5.80-7.03)
Left Brain(Fixed/Frozen)    19/20             20/16               23/21

Supplementary Table 3 | Demography of SMRI PFC samples
                            Schizophrenia     Bipolar disorder    Normal Controls
Age(years)                  43.18(19-62)      44.4(19-64)         45.3(29-68)
Sex(M/F)                    35/15             26/24               35/15
Race(Euro/Non-Euro)         46/4              47/3                49/1
PMI(hours)                  32.80(9-80)       36.6(12-84)         27.68(8-58)
Brain pH                    6.38(5.8-6.9)     6.36(5.76-6.9)      6.51(5.8-7.1)
Brain region(Left/Right)    24/26             23/27               26/24

Supplementary Table 4 | Demography of VBBN PFC samples
                Schizophrenia     Normal Controls
Age(years)      43.40(19-81)      44.72(21-80)
Sex(M/F)        24/6              24/5
PMI(hours)      39.13(17-68)      40.47(12-69)
Brain pH        6.24(5.6-6.64)    6.31(5.82-6.56)

Supplementary Table 5 | M1A gene list (top 25) and correlation with module eigengenes
Gene      Entrez ID  Chr    Start position  Stop position  Correlation  P value
NOTCH2    4853       chr1   120454176       120612276      0.966282     7.32E-52
SLC4A4    8671       chr4   72053003        72437799       0.955201     1.02E-46
SLC25A18  83733      chr22  18043183        18073651       0.951395     3.03E-45
PREX2     80243      chr8   68864353        69143897       0.949843     1.11E-44
ACBD7     414149     chr10  15117474        15130775       0.948906     2.4E-44
METTL7A   25840      chr12  51318534        51326300       0.94853      3.25E-44
SLC39A12  221074     chr10  18240821        18332218       0.947239     9.07E-44
GRAMD1C   54762      chr3   113557680       113666021      0.946852     1.23E-43
GJA1      2697       chr6   121756788       121770873      0.944748     6.12E-43
GPC5      2262       chr13  92050882        93519490       0.944504     7.34E-43
DOCK7     85440      chr1   62920397        63153969       0.944176     9.37E-43
ATP13A4   84239      chr3   193119866       193272696      0.943363     1.7E-42
ACSS3     79611      chr12  81471809        81649582       0.943127     2.02E-42
SLC1A3    6507       chr5   36606689        36688436       0.942854     2.46E-42
GPAM      57678      chr10  113909622       113943521      0.942621     2.91E-42
AMOT      154796     chrX   112017731       112084043      0.941833     5.12E-42
SLC15A2   6565       chr3   121613287       121660458      0.940898     9.89E-42
GOLIM4    27333      chr3   167727231       167813669      0.939817     2.09E-41
MSI2      124540     chr17  55333931        55757299       0.939664     2.32E-41
GPR98     84059      chr5   89854617        90460087       0.936887     1.48E-40
BMPR1B    658        chr4   95679128        96076167       0.936801     1.57E-40
SLC1A2    6506       chr11  35272753        35441524       0.935932     2.75E-40
MID1      4281       chrX   10413596        10851773       0.935091     4.7E-40
ATP1A2    477        chr1   160085548       160113381      0.934725     5.92E-40
MLC1      23209      chr22  50497820        50524358       0.934619     6.33E-40

Supplementary Table 6 | M3A gene list (top 25) and correlation with module eigengenes
Gene      Entrez ID  Chr    Start position  Stop position  Correlation  P value
MT1X      4501       chr16  56716393        56718108       0.91599      1.81E-35
TRAF3IP2  10758      chr6   111877657       111927449      0.913403     6.24E-35
MT2A      4502       chr16  56642496        56643409       0.907505     9.05E-34
RFX4      5992       chr12  106976685       107156581      0.903227     5.65E-33
MT1M      4499       chr16  56642568        56667898       0.895629     1.19E-31
MGST1     4257       chr12  16500076        16517344       0.892384     4.09E-31
PLOD2     5352       chr3   145787227       145879282      0.881285     2.09E-29
MT1L      4500       chr16  56651373        56652727       0.876324     1.07E-28
MT1DP     326343     chr16  56677599        56679162       0.875918     1.22E-28
SLC7A2    6542       chr8   17396304        17428025       0.875673     1.32E-28
GPR56     9289       chr16  57653958        57698944       0.867166     1.83E-27
CYP4F11   57834      chr19  16023181        16045676       0.867138     1.84E-27
GLIS3     169792     chr9   3824127         4300035        0.860373     1.31E-26
LRIG1     26018      chr3   66429144        66551435       0.856017     4.41E-26
YAP1      10413      chr11  101981279       102104149      0.836262     6.76E-24
FNDC3B    64778      chr3   171757418       172118487      0.836008     7.18E-24
FAM189A2  9413       chr9   71940348        72007371       0.835066     8.97E-24
IL6ST     3572       chr5   55230923        55290772       0.828274     4.29E-23
IL33      90865      chr9   6215809         6257983        0.819621     2.86E-22
PPFIA1    8500       chr11  70116815        70230502       0.81249      1.27E-21
MT1P3     140851     chr20  33805758        33806127       0.803921     7.01E-21
MT1JP     4498       chr16  56669651        56670998       0.800318     1.40E-20
ITPR2     3709       chr12  26488285        26986131       0.788552     1.23E-19
HGF       3082       chr7   81328322        81399454       0.773948     1.51E-18
LPIN1     23175      chr2   11886740        11967620       0.768599     3.60E-18

Supplementary Table 7 | M1A’s gene list and top GO functions
Category        Term                                          %      PValue     FDR
GOTERM_BP_FAT   GO:0030182~neuron differentiation             11     7.50E-08   1.27E-04
  Genes: EGFR, PARD3, TUBB2B, PTPRZ1, CLU, SOX2, EMX2, PAX6, GJA1, DOCK7, NR2E1, GLI3, GPR98, TGFB2, SLC1A3, S1PR1, CRB1, LHX2, NTRK2, OPHN1, BMP7, BMPR1B
GOTERM_BP_FAT   GO:0048666~neuron development                 8.5    3.63E-06   0.006141
  Genes: EGFR, PARD3, PTPRZ1, CLU, PAX6, GJA1, DOCK7, NR2E1, GPR98, TGFB2, SLC1A3, CRB1, LHX2, NTRK2, OPHN1, BMPR1B, BMP7
GOTERM_CC_FAT   GO:0044459~plasma membrane part               26.5   5.69E-06   0.007388
  Genes: RHOJ, CADM1, GPR125, SLC15A2, FERMT2, GJA1, AQP4, TLR4, SDC4, SDC2, IL17RB, SLC1A4, GPC5, EDNRB, SLC1A2, S1PR1, SLC1A3, APOE, SLC4A4, EGFR, GABRG1, SLC9A3R1, NTSR2, ARHGAP31, CYBRD1, OPHN1, ADD3, PARD3, FGFR3, PHKA1, CLDN10, GNG12, EZR, FAT1, P2RY1, DTNA, SLC6A11, GPR75, PTPRZ1, MAOA, AXL, GPR137B, ATP1A2, GJB6, NOTCH2, RAB31, TMEM47, KCNN3, NTRK2, AMOT, MERTK, BMPR1B, CD302
GOTERM_BP_FAT   GO:0000902~cell morphogenesis                 8.5    6.76E-06   0.011446
  Genes: EGFR, PARD3, PTPRZ1, CLU, PAX6, GJA1, DOCK7, SOX9, NR2E1, TGFB2, EZR, SLC1A3, CRB1, LHX2, OPHN1, BMPR1B, BMP7
GOTERM_BP_FAT   GO:0007423~sensory organ development          6.5    2.26E-05   0.038226
  Genes: EGFR, SOX2, PAX6, GJB6, NR2E1, GLI3, GPR98, TGFB2, EYA1, CRB1, NTRK2, BMPR1B, BMP7
GOTERM_BP_FAT   GO:0032989~cellular component morphogenesis   8.5    2.61E-05   0.044204
  Genes: EGFR, PARD3, PTPRZ1, CLU, PAX6, GJA1, DOCK7, SOX9, NR2E1, TGFB2, EZR, SLC1A3, CRB1, LHX2, OPHN1, BMPR1B, BMP7

Supplementary Table 8 | M3A’s gene list and top GO functions
Category          Term                                     %      PValue     FDR
SP_PIR_KEYWORDS   metal-thiolate cluster                   3.88   2.41E-05   0.031
  Genes: MT1X, MT2A, MT1M, MT1JP, MT1L, MT1E, MT1P0
GOTERM_BP_FAT     GO:0051270~regulation of cell motion     6.80   6.01E-04   0.942
  Genes: IGF1R, NRP1, LYN, IL6ST, F3, ABHD2, FGF2
SP_PIR_KEYWORDS   chelation                                2.91   6.49E-04   0.822
  Genes: MT1L, MT1M, MT1E, MT1JP, MT1P3, MT1X
SP_PIR_KEYWORDS   cadmium                                  2.91   6.49E-04   0.822
  Genes: MT1L, MT1M, MT1E, MT1JP, MT1P3, MT1X
GOTERM_BP_FAT     GO:0042493~response to drug              6.80   0.00108    1
  Genes: ABCC9, TNFRSF11B, LYN, IL1B, XRCC1, MGST1, ABCG2