Supplementary Material

A microarray for assessing transcription from pelagic marine microbial taxa. Shilova et al.

Methods

Design of the MicroTOOLs microarray

The seed dataset included sequences from the following strains, if present in their genomes: alpha-Proteobacterium HIMB5, Crocosphaera watsonii WH8501, gamma-Proteobacterium HTCC2207, Marinobacter sp. ELN17, Pelagibacter ubique HTCC1062, Pelagibacter ubique HTCC7211, Prochlorococcus marinus MIT9301, Prochlorococcus marinus MIT9313, Prochlorococcus marinus CCMP1986, Pseudomonas stutzeri A1501, Roseobacter GAI101, Rhodobacterales bacterium HTCC2255, Synechococcus sp. CC9311, Synechococcus sp. CC9605, Synechococcus sp. WH8102, Trichodesmium erythraeum IMS101, and others. Sequences derived from clone libraries included the genes amoA, cynA, narB, nifH, nirS, nr, ntcA, glnA, petB, phnD, rbcL, and urtA, and the viral genes g20, gp23, mcp, pol, and RdRp.

Testing probe specificity in silico: Each target sequence was trimmed at the start of the first probe and the end of the last probe. The trimmed regions (“probed regions”) were used as BLASTN queries against several datasets: all target sequences and the “Non-redundant nucleotide” and “All Prokaryotic Genomes” databases at CAMERA, and against the SILVA database of ribosomal RNA sequences (www.arb-silva.de), as of May 2010 and August 2011, respectively. If a probed region had a non-specific hit (defined as a hit to a non-target sequence with 95% nucleotide similarity over 90% of the region length), each probe in the region was analyzed with BLASTN to identify and remove the exact non-specific probe(s).

RNA extraction and processing for hybridization to the microarray

RNA was extracted using the Ambion® RiboPure™ kit (Life Technologies, Grand Island, NY) with modifications. The filter was removed from the Sterivex cartridge and placed into a 2.0 mL microcentrifuge tube with 100 µL of 0.1 mm diameter glass beads (Biospec Products, Bartlesville, OK) and 1.0 mL of Ambion® TRIzol reagent (Life Technologies).
The tubes were bead-beaten twice for 2 min and centrifuged at 12,000 × g for 10 min at 4 °C. The supernatant was transferred into a new 1.5 mL microcentrifuge tube. Chloroform was added at one fifth of the supernatant volume; the solution was vortexed for 15 sec, incubated at room temperature for 15 min, and centrifuged at 12,000 × g for 10 min at 4 °C. The subsequent RNA purification followed the RiboPure (Ambion) manufacturer’s instructions. If the concentration of RNA was less than 100 ng µL⁻¹, the extract was concentrated by precipitating with 0.1 volume of sodium acetate (0.3 M final concentration), each 1.0 mL of RNA extract, and 2.5 volumes of ice-cold 100% (v/v) ethanol. RNA samples were treated with RNase-free DNase (QIAGEN, Valencia, CA, USA) for 30 min and purified using the QIAGEN RNeasy Mini Kit.

DNA extraction

DNA was extracted from the organic phase of the nucleic acid extract after RNA separation using the RiboPure™ kit (Ambion) according to the manufacturer’s instructions. Briefly, DNA was precipitated from the organic phase with 300 µL of 100% (v/v) ethanol per 1 mL of TRIzol® reagent used in cell lysis, washed three times with ice-cold 75% (v/v) ethanol, and solubilized with 8 mM sodium hydroxide. The pH of the DNA solution was adjusted to 8.4 with 1 M HEPES buffer. The quality and quantity of DNA in the extracts were determined with a NanoDrop 1000 (Thermo Scientific) and the 2100 Bioanalyzer using the DNA 7500 kit (Agilent Technologies). For qPCR, DNA samples were diluted 100-fold, and inhibition tests were run on all samples.

Figures

[Figure S1 image; panel C shows Block_1 to Block_5.]

Figure S1. Normalized transcription in test samples obtained with a prototype high-density oligonucleotide microarray.
The prototype microarray was designed similarly to the MicroTOOLs microarray, but contained 13460 probes representative of 97 gene categories. The probes were synthesized in five replicates on a four-plex NimbleGen array (four chips on one slide, 72000 features each; 4x72K). Total RNA samples were obtained from cultures of cyanobacteria: Synechococcus sp. WH8102 (Synechococcus), Crocosphaera watsonii WH8501 (C. wat), Trichodesmium erythraeum IMS101 (T. ery), and a mix of total RNA from the three cultures. Environmental samples were obtained from South Pacific Ocean stations 9, 11, 12, and 17 during cruise KM0704 between Fiji and Hawaii in April 2007 (Hewson et al., 2009, LO). RNA extraction and processing were done as described in this study, except that environmental samples were processed as described in Hewson et al. (2009). Hybridization was done at the NimbleGen facility (Iceland). (A) Hybridization of Prochlorococcus probes to Synechococcus (top panel), Crocosphaera (second panel), and the environmental sample from Station 17 (bottom panel). Genes are located on the X axis and grouped by KEGG classes. Transcription normalized to the median in each sample is shown on the Y axis as log(2) values. Less than 2% of Prochlorococcus-specific probes yielded cross-hybridization to Synechococcus transcripts, and the corresponding genes differed by less than 5% at the nucleotide level. (B) Distribution of transcription signal for Crocosphaera probes in culture and environmental samples. Less than 1% of Trichodesmium-specific probes yielded cross-hybridization to Crocosphaera transcripts. The detection of Crocosphaera transcription at Stn. 12 was consistent with the presence of Crocosphaera cells at this station detected with qPCR (Moisander et al. 2010). (C) Correlation (Pearson 0.98±0.01) between technical replicates for environmental sample SP_35016 from the South Pacific Ocean.

Figure S2. Quality control of microarray hybridizations and normalization.
(A) Boxplot of hybridization signals for 10000 randomly sampled probes before normalization. The Y axis is the hybridization signal shown as a log(2) exponent. (B) Density distribution of hybridization signals for these probes before normalization. The X axis is the hybridization signal shown as a log(2) exponent. (C) Boxplot of transcription values for 19560 genes obtained using the RMA algorithm and Li-Wong normalization as described in Methods. The Y axis is normalized transcription shown as a log(2) exponent. (D) Density distribution of transcription values for the 19560 genes. The X axis is transcription shown as a log(2) exponent. Sample names: P_S1 stands for P-amended replicate #1 and Fe_S1 stands for Fe-amended replicate #1.

[Figure S3 image. Axes: ERCC transcription signal, log2, and ERCC transcript copy, log2; labeled values 6.9E+2 and 1.1E+7.]

Figure S3. Hybridization signal for the ERCC mRNA spike-in mix (Ambion®) in all samples. Different colors correspond to different samples. The detection range was estimated as 700 to 11,000,000 copies of mRNA. The lowest detected 700 mRNA molecules constitute 1.8E-06% of the mRNA, assuming transcripts 1000 nt long. This relative sensitivity was calculated based on the following:
A) 1 µg contains 1.88E+12 mRNA molecules of 1000 nt (average size of a bacterial mRNA)
B) 400 ng of total RNA was used for cDNA synthesis
C) Considering rRNA as 95% of total RNA, 20 ng of mRNA was used for cDNA synthesis
D) 20 ng is equivalent to 3.8E+10 mRNA molecules of 1000 nt
The relative cell sensitivity (the lowest relative abundance of cells within the community that can be detected) of 0.0025% was based on the assumption of 1380 mRNA molecules per cell (Neidhardt, 1996) and was estimated as 1.8E-06% multiplied by 1380.

Figure S4. (A) Distribution of p values (X axis) for the Wilcoxon test for 3000 randomly selected genes, where N=100 and bandwidth=0.018. The data for clustering and the W-test were centered and scaled across genes and samples. (B) SAM analysis: observed versus expected scores.
Significant genes deviate from the ‘expected=observed’ line, and genes up-regulated in the P treatment and in the Fe treatment are shown in green (lower left corner) and red (upper right corner), respectively. Delta=0.677 was selected to find significantly differential genes with FDR=0.11.

Figure S5. (A) Hierarchical clustering of 3742 genes (rows) by transcription across samples (columns). Blue and red in the heatmap represent down-regulation and up-regulation of transcription, respectively. Cluster numbers 1-9 in the all-genes heatmap are shown on the left, where clusters 1-5 and 9 contain genes up-regulated in the P amendment, and clusters 6, 7, and 8 contain genes up-regulated in the Fe amendment. Transcription of genes up-regulated in the Fe or P amendments by Phylogroup (B) and by Prochlorococcus and Synechococcus Clade (C). Transcription was normalized to the mean across samples.

Figure S6. Normalized average transcription for top differentially transcribed genes for (A) Replication and Cell cycle, (B) Iron stress response, and (C) P metabolism and stress response. Transcription was normalized to the mean transcription across samples.

TABLES

File TableS1.xls
Table S1. MicroTOOLs array content: target genes in each microbial group and viruses.

File TableS2.xls
Table S2. Top 3000 genes with detected transcription at Stn. ALOHA. Complete data are available at NCBI GEO. Column names: Control, Phosphorus, and Iron: average transcription values in control samples with no amendments, P-amended, and Fe-amended treatments, respectively; SD_Control, SD_Phosphorus, SD_Iron: standard deviation of transcription values for Control, P-amended, and Fe-amended treatments; NCBI_GI: NCBI GI for the best hit in BLASTN, if available; Organism: affiliation of the best hit; Pathway: KEGG pathway; Source: where the nucleotide sequence was obtained from; Per_Ident: percent identity of the gene nucleotide sequence to the organism by BLASTN; Clade: clades for Synechococcus and Prochlorococcus; Group: phylogroups.
File TableS3.xls
Table S3. Genes with detected transcription at Stn. ALOHA. Column names: ‘Detected transcripts’ stands for counts of detected transcripts for each gene; ‘Detected transcripts %’ is the percent of detected transcripts out of the total detected; ‘Total probes %’ is the percent of detected transcripts out of the total number of probes in the microarray for this gene.

File TableS4.xls
Table S4. Pearson correlation coefficients for all samples. Tab #1 includes all detected genes and differential genes from Eukaryota, Prochlorococcus, Synechococcus, Energy and Nitrogen metabolism as shown in Figure 4. Tab #2 shows correlation coefficients for other groups of genes: all differential genes, genes from Archaea and Viruses, and genes from phosphorus metabolism and iron metabolism.

File TableS5.xls
Table S5. Top 50 differentially transcribed genes identified with LIMMA. Column names: Fe: average transcription in Fe treatment, log2; P: average transcription in P treatment, log2; logFC: log2 fold-change; adj.P.Val: p value adjusted by the Benjamini and Hochberg method; ORGANISM: organism identified as the top hit by BLASTN; Gene: gene name; Annotation: gene annotation; Module: KEGG class; NT % Ident: BLASTN percent identity of the target sequence to the hit organism; ID: unique gene ID on the microarray; NCBI ID: NCBI ID, if known.

File TableS6.xls
Table S6. Differentially transcribed genes for Fe, N, P acquisition and metabolism, energy metabolism and carbon fixation.
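
Note on the Figure S3 sensitivity estimate. The arithmetic in steps A) to D) of the Figure S3 caption can be retraced with a short calculation. The Python sketch below is only an illustration and is not part of the original analysis: the function and variable names are hypothetical, and every input value is copied from the caption. Without intermediate rounding, the calculation gives about 1.86E-06% and 0.0026%; the caption's 1.8E-06% and 0.0025% appear to reflect rounding of intermediate values (3.8E+10 molecules, then 1.8E-06%).

```python
# Illustrative re-calculation of the relative sensitivity in the Figure S3 caption.
# All names here are hypothetical; all input values come from the caption.

MOLECULES_PER_UG = 1.88e12     # A) 1000-nt mRNA molecules in 1 ug of RNA
TOTAL_RNA_NG = 400.0           # B) total RNA used for cDNA synthesis, ng
RRNA_FRACTION = 0.95           # C) assumed share of rRNA in total RNA
LOWEST_DETECTED_COPIES = 700   # lower end of the ERCC detection range
MRNA_PER_CELL = 1380           # assumed mRNA molecules per cell (Neidhardt, 1996)


def mrna_input_ng(total_rna_ng=TOTAL_RNA_NG, rrna_fraction=RRNA_FRACTION):
    """Step C: mass of mRNA (ng) in the total RNA input."""
    return total_rna_ng * (1.0 - rrna_fraction)


def mrna_molecules(mrna_ng, molecules_per_ug=MOLECULES_PER_UG):
    """Step D: number of 1000-nt mRNA molecules in the given mass (ng)."""
    return (mrna_ng / 1000.0) * molecules_per_ug


def relative_sensitivity_percent(copies, total_molecules):
    """Lowest detected copy number as a percentage of all mRNA molecules."""
    return 100.0 * copies / total_molecules


def cell_sensitivity_percent(transcript_percent, mrna_per_cell=MRNA_PER_CELL):
    """Lowest detectable relative cell abundance, in percent."""
    return transcript_percent * mrna_per_cell


mrna_ng = mrna_input_ng()
n_molecules = mrna_molecules(mrna_ng)
transcript_pct = relative_sensitivity_percent(LOWEST_DETECTED_COPIES, n_molecules)
cell_pct = cell_sensitivity_percent(transcript_pct)

print(f"mRNA input: {mrna_ng:.1f} ng")
print(f"mRNA molecules in input: {n_molecules:.2e}")
print(f"relative transcript sensitivity: {transcript_pct:.2e} %")
print(f"relative cell sensitivity: {cell_pct:.4f} %")
```

Changing RRNA_FRACTION or MRNA_PER_CELL shows how strongly the estimate depends on these two assumptions.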