Appendix S1: Supplementary Methods

Microarray Development: Assembled sequences from the A. palmata transcriptome (Polato, Vera et al. 2011) were used to generate a NimbleGen 12-plex 135K feature Custom Gene Expression Array (Roche Diagnostics, IN). Two 60-mer probes were designed for each contig (n = 85,260), and a single probe was designed for each singleton sequence (n = 45,390). Singletons were included on the array because a large proportion (46%) had significant BLAST hits to known sequences, and evidence suggests that singletons are likely to represent low-abundance transcripts rather than artifacts or contaminants (Meyer, Aglyamova et al. 2009). Two additional probes each were developed for sequences associated with annotation information relating to calcium metabolism and stress response (n = 4,798). Replicate probes for individual sequences from the assembled transcriptome were not identical; rather, they represented multiple different 60-mer sequences from the original template. Microarray probe sequences and annotation data are available at https://homes.bio.psu.edu/people/faculty/baums/Links.htm. Information on how to order this array from Roche-NimbleGen is available from the authors.

Microarray Hybridization: To prepare samples for microarray hybridization, one cycle of amplification was performed on 1 µg of each RNA sample using the Amino Allyl MessageAmp II aRNA Amplification Kit (Ambion Life Technologies, AM1753) following the manufacturer’s protocol. Dye coupling of 15 µg of aRNA was performed with Cy3 or Cy5 (GE Health Care, RPN5661), and the labeled aRNA was subsequently purified according to the Ambion kit instructions. For each pair of samples that were to be hybridized to the same array, 2 µg each of the Cy3- and Cy5-labeled samples were combined and fragmented using RNA Fragmentation Reagents (Ambion, AM8740) according to the manufacturer’s instructions, then dried down completely in a speed-vac.
Samples were resuspended in the appropriate tracking control, and hybridization solution master mix was added following the manufacturer’s instructions (Roche NimbleGen). Following two 5-minute incubations at 95°C and 42°C, the mixer was attached to the array and hybridization solutions were added according to the manufacturer’s instructions. Hybridization was performed while mixing overnight at 42°C in a MAUI hybridization instrument (BioMicro Systems, UT). Hybridized slides were washed according to the manufacturer’s instructions (Roche NimbleGen) and spun at 1000 RPM for 3 minutes in a 50 ml conical tube filled with N2 gas. Dried slides were placed in fresh tubes with 2.5 ml of DyeSaver (Genisphere Inc.) and rotated for 45 seconds to coat the slide. A final spin at 700 RPM for 3 seconds removed excess DyeSaver, after which slides were air dried briefly and scanned with an Axon GenePix 4000B.

Functional Analysis: Analysis of the functions associated with differentially expressed probes was performed in two ways to overcome challenges inherent to functional analysis of data from non-model species obtained via electronic annotation. First, single enrichment analysis was run using the program GOEAST (Zheng and Wang 2008). This method compares the list of GO terms associated with differentially expressed genes (DEGs) to the list of all GO terms associated with the array. It then uses a hypergeometric test to determine whether any specific GO term occurs in the DEG list more often than would be expected by chance. The resulting enrichment P-values are corrected for multiple testing via the false discovery rate method of Benjamini and Yekutieli (2001). This approach has the benefit of being highly customizable, as any list of GO terms can be loaded into the program for comparison to all GO terms on the array, thus enabling analysis of the specific subsets of DEGs associated with the factors of interest described above.
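The enrichment calculation described above can be sketched in a few lines of Python. This is a minimal illustration of a hypergeometric upper-tail test followed by the Benjamini and Yekutieli (2001) adjustment, not the GOEAST implementation itself. The function names (hypergeom_upper_tail, benjamini_yekutieli, go_enrichment), the term labels, and all counts are hypothetical and chosen only for illustration; the sketch also assumes that counts are tallied per annotated probe.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which carry the GO term of interest."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted P-values, returned in the input order."""
    m = len(pvalues)
    if m == 0:
        return []
    # Harmonic-sum penalty that distinguishes BY from Benjamini-Hochberg.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        value = min(1.0, pvalues[idx] * m * c_m / rank)
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted


def go_enrichment(deg_counts, array_counts, n_deg, n_array):
    """Return {term: (raw P, adjusted P)} for every term seen in the DEG list.

    deg_counts / array_counts map a term to the number of annotated items
    carrying it in the DEG list and on the whole array (assumed layout).
    """
    terms = sorted(deg_counts)
    raw = [
        hypergeom_upper_tail(deg_counts[t], array_counts[t], n_deg, n_array)
        for t in terms
    ]
    adj = benjamini_yekutieli(raw)
    return {t: (p, q) for t, p, q in zip(terms, raw, adj)}


# Hypothetical toy data: 40 annotated DEGs out of 1000 annotated array items.
result = go_enrichment(
    deg_counts={"term_A": 8, "term_B": 2},
    array_counts={"term_A": 20, "term_B": 50},
    n_deg=40,
    n_array=1000,
)
for term, (p, q) in result.items():
    print(term, p, q)
```

In this toy input, term_A appears far more often among the DEGs than its array-wide frequency would predict, so it receives a much smaller P-value than term_B. The upper-tail hypergeometric probability used here is the same quantity as a one-sided Fisher's exact test on the corresponding 2 x 2 table.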
The lists loaded into GOEAST can include GO terms associated with genes identified in non-model organisms. Shortcomings of this approach, however, include highly redundant output (a consequence of the directed acyclic graph structure of GO) and no clear way to collapse expression values from multiple probes from the same sequence into a single expression value. The redundancy problem was addressed by processing the output lists with the program REVIGO (Supek, Bošnjak et al. 2011), which removed redundant GO terms based on semantic similarity scores. These reduced lists were then visually inspected for accuracy, and terms were replaced or further reduced as appropriate.

The second approach taken to characterize important functions in the differentially expressed genes made use of the Ingenuity Pathway Analysis (IPA) software (Ingenuity Systems, CA). IPA is a web-based application that performs functional enrichment analyses to determine the probability that a given gene set is associated with pre-defined functions beyond what would be expected by chance. Gene function data in IPA are based on information in the Ingenuity Knowledge Base, a manually reviewed database of pathways and relationships taken from primary literature and public databases including GO, KEGG and Entrez. The statistical results are based on enrichment P-values computed with a Fisher’s Exact Test corrected for multiple tests (Benjamini and Hochberg 1995). This technique is comparable to other well-known enrichment analysis methods (Huang, Sherman et al. 2009), and experimental evidence suggests that it performs well on large datasets (Hong, Pawitan et al. 2009). The benefit of this method was that redundancy in the input and output was minimized by considering expression levels from only the probe with the highest observed log fold change when multiple probes to a single sequence were present on the array.
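The probe-collapsing step described above can be illustrated with a short Python sketch. It is not the IPA procedure itself: the function name collapse_probes, the (sequence ID, probe ID, log fold change) row layout, and the example values are hypothetical, and the sketch assumes that "highest" refers to the largest absolute log fold change so that strongly down-regulated probes are retained.

```python
def collapse_probes(probe_rows):
    """Keep one probe per sequence: the one with the largest |log fold change|.

    probe_rows is an iterable of (sequence_id, probe_id, log_fc) tuples
    (an assumed layout). Returns {sequence_id: (probe_id, log_fc)}.
    """
    best = {}
    for seq_id, probe_id, log_fc in probe_rows:
        current = best.get(seq_id)
        # Replace the stored probe only if the new one changes more strongly.
        if current is None or abs(log_fc) > abs(current[1]):
            best[seq_id] = (probe_id, log_fc)
    return best


# Hypothetical input: two probes for one contig, one probe for a singleton.
rows = [
    ("contig_1", "probe_a", 0.5),
    ("contig_1", "probe_b", -1.8),
    ("singleton_1", "probe_c", 1.1),
]
collapsed = collapse_probes(rows)
for seq_id, (probe_id, log_fc) in collapsed.items():
    print(seq_id, probe_id, log_fc)
```

Collapsing to one value per sequence means each sequence contributes at most once to the enrichment counts, which is what limits redundancy in the downstream output.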
Furthermore, IPA enabled analysis of transcription factor activation states based on differential regulation of downstream target molecules. This is a useful technique because a very slight change in transcription factor expression may not pass significance thresholds in traditional microarray analysis but can be responsible for strong differential expression of downstream targets. The primary limitation of this method, however, was that the IPA database is based on findings primarily from model vertebrates. Differentially expressed genes without homologs in mice, rats, or humans are therefore not considered in the analysis, and information on the functions and interactions of some genes may not be appropriate for taxa as evolutionarily distant as the Cnidaria.

References:

Benjamini, Y. and Y. Hochberg (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing." Journal of the Royal Statistical Society. Series B (Methodological) 57(1): 289-300.

Benjamini, Y. and D. Yekutieli (2001). "The control of the false discovery rate in multiple testing under dependency." Annals of Statistics: 1165-1188.

Hong, M. G., Y. Pawitan, et al. (2009). "Strategies and issues in the detection of pathway enrichment in genome-wide association studies." Human Genetics 126(2): 289-301.

Huang, D. W., B. T. Sherman, et al. (2009). "Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists." Nucleic Acids Research 37(1): 1.

Supek, F., M. Bošnjak, et al. (2011). "REVIGO summarizes and visualizes long lists of Gene Ontology terms." PLoS ONE 6(7): e21800.

Zheng, Q. and X. J. Wang (2008). "GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis." Nucleic Acids Research 36(suppl 2): W358-W363.