Supplementary Methods

Fischer et al.

Drosophila spermatozoal mRNA transcripts in the fertilized egg

Tissue collection and RNA extraction for microarrays

Sperm were purified from adult D. melanogaster males essentially as described [1, 2]. Briefly, flies were dissected in PBS using fine forceps under a dissecting microscope. Testes were removed by excising the male terminalia with forceps and extruding the paired testes and attached seminal vesicles from the abdominal cavity. Seminal vesicles were carefully detached from the testes and moved to a clean drop of PBS. The seminal vesicles were then punctured with a fine needle, allowing sperm to flow outward. The sperm were transferred with forceps as an intact mass to a microfuge tube containing PBS, and the samples were stored on ice. Routinely, 25-50 dissections were performed per session (approximately 60 min), after which the samples were centrifuged at 13000 rpm for 2-5 minutes at room temperature. The sperm pellets were washed twice with 4°C PBS, and excess PBS was removed after the final wash before the pellets were stored frozen at -80°C. For whole testes, 10 adult males were dissected and their testes, along with the associated accessory glands, were stored in Trizol (Invitrogen) at -80°C. RNA from both sample types was extracted using the Trizol method (Invitrogen). For the purified sperm samples, RNA equivalent to the dissections from approximately 200 males was pooled to create three independent biological replicates, and the yield was estimated using an agarose gel. The testis RNA was quantified using a Nanodrop spectrophotometer (Thermo Scientific).

SMART amplification for microarrays

RNA samples were amplified using the SMART method [3]. Briefly, RNA was treated with RQ1 DNase (Promega).
Reverse transcription was performed with 1 μM 3’ SMART CDS Primer IIA (5’ AAG-CAG-TGG-TAT-CAA-CGC-AGA-GTA-CTT-TTT-TTT-TTT-TTT-TTT-TTT-TTT-TTT-TTT-VN 3’, V = G+A+C, N = A+C+G+T), 1 μM SMART IIA chimeric oligo (5’ AAG-CAG-TGG-TAT-CAA-CGC-AGA-GTA-CGC-888 3’, 8 = riboG), 1x First-Strand Buffer, 0.25 mM of each nucleotide, 2 mM DTT and 1 μl PowerScript Reverse Transcriptase (Clontech) for 1.5 hours at 42°C, followed by snap-cooling on ice. Amplification of 5 μl cDNA was performed with 1 μl 50x Advantage 2 Polymerase Mix (Clontech) in 1x Advantage 2 PCR buffer, 0.05 mM of each nucleotide and 0.8 μM 5’ PCR Primer II (5’ AAG-CAG-TGG-TAT-CAA-CGC-AGA-GT 3’) in 50 μl total volume. The cycling conditions were: initiation at 95°C for 1 min, followed by 20 cycles of 95°C for 5 sec, 65°C for 5 sec and extension at 68°C for 6 min. The optimal number of cycles was determined by taking aliquots at every second cycle between cycles 18 and 28 and visualising these by electrophoresis in a 1% agarose gel. For all samples, 20 cycles (two cycles fewer than saturation) were used to stay within the exponential phase.

Sample labelling and hybridization for microarrays

1 μg of amplified DNA was labelled as technical dye-swap replicates using the BioPrime DNA labelling kit (Invitrogen) in the presence of fluorescently labelled Cy3- or Cy5-dCTP (GE Healthcare) at 37°C for 2 hours, and the product was purified using Sephadex G50 columns (GE Healthcare). A long oligonucleotide set was used to print in-house microarrays (GEO platform accession GPL8244) using a Qarray2 (Genetix) spotter and FMB PowerMatrix slides. Labelled sperm or testis samples were co-hybridised with labelled genomic DNA (used to help identify the location of the probes on the array) for 16 hours at 51°C using a GeneTac hybridisation station (Digilab Genomic Solutions Inc). Post-hybridisation washes were performed according to the slide manufacturer’s recommendation.
Detailed protocols for array spotting, labelling, hybridisation and washes are available at http://www.flychip.org.uk/protocols/.

Microarray analysis

Arrays were scanned using the GenePix 4000B dual laser scanner (Axon Instruments) at 5 μm resolution with individually optimised PMT gain settings. Intensity values for each probe were extracted using Dapple [4]. Only the intensity values from the sperm and testis channels were retained for further analysis. The data were then split into two sets (sperm and testis), and probes with one or more replicates below the background noise level (intensity < 200) were removed from the analysis. Each set was normalised independently using a quantile method [5]. For each sample type, the median normalised intensities were ranked using the minimum ties method of the rank function in R, with the highest intensity value assigned a rank of 1 (Table 1). Microarray data are publicly available via the Gene Expression Omnibus under accession number GSE33947.

Genomic distribution and molecular function analyses

The chromosomal distribution of spermatozoal RNA genes was compared statistically to the distribution of annotated genes using a chi-square test with Yates correction. Analysis of gene clustering was conducted using an adjacent gene model in which clusters were defined as a set of 2 or more physically adjacent genes where i) greater than 2/3 of the genes encode spermatozoal RNA and ii) the cluster is bounded at each end by a spermatozoal RNA gene. The observed number of each cluster type was determined using an algorithm (E. Wilkin and S. Dorus, unpublished) that counted the unique set of largest clusters by iteratively removing the largest sequence fulfilling the criteria. Statistical assessment was conducted using a nonparametric Monte-Carlo approach in which the distribution of expected cluster types was determined by randomly rearranging the positions of genes within the genome (10,000 iterations).
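The adjacent gene model and Monte-Carlo assessment described above can be illustrated with a minimal Python sketch. It is not the unpublished algorithm of Wilkin and Dorus. It assumes that genes are supplied as one ordered list of flags (True for a spermatozoal RNA gene), that a cluster "type" is its size in genes, that removing a cluster leaves a gap later clusters cannot span, and that the p-value is the add-one proportion of shuffles with at least as many clusters of that size. The function names cluster_sizes and monte_carlo_p are hypothetical.

```python
import random
from collections import Counter


def largest_cluster(flags, lo, hi):
    """Return (start, end) of the longest qualifying window in flags[lo:hi], or None.

    A window qualifies if it spans at least 2 genes, starts and ends on a
    flagged gene, and more than 2/3 of its genes are flagged.
    """
    hits = [i for i in range(lo, hi) if flags[i]]
    best = None
    for a, start in enumerate(hits):
        # Try the most distant end first so the first match is the longest.
        for b in range(len(hits) - 1, a, -1):
            end = hits[b]
            length = end - start + 1
            if best is not None and length <= best[1] - best[0] + 1:
                break  # nothing from this start can beat the current best
            n_hits = b - a + 1  # flagged genes inside the window
            if 3 * n_hits > 2 * length:  # strictly more than 2/3
                best = (start, end)
                break
    return best


def cluster_sizes(flags):
    """Count clusters by size, repeatedly removing the largest one (assumed reading)."""
    sizes = Counter()
    segments = [(0, len(flags))]
    while segments:
        lo, hi = segments.pop()
        found = largest_cluster(flags, lo, hi)
        if found is None:
            continue
        start, end = found
        sizes[end - start + 1] += 1
        # The removed cluster splits the segment into two independent parts.
        segments.append((lo, start))
        segments.append((end + 1, hi))
    return sizes


def monte_carlo_p(flags, size, n_iter=10_000, seed=0):
    """Return (observed count, p-value) for clusters of one size under shuffling."""
    rng = random.Random(seed)
    observed = cluster_sizes(flags)[size]
    shuffled = list(flags)
    at_least = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # random rearrangement of gene positions
        if cluster_sizes(shuffled)[size] >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_iter + 1)


if __name__ == "__main__":
    # Toy gene order, invented for illustration only.
    toy = [True, True, True, False, True, False, False, False, True, True, False, False]
    print(cluster_sizes(toy))  # one cluster of 5 genes and one of 2 genes
    print(monte_carlo_p(toy, size=5, n_iter=1000))
```

A real analysis would run this per chromosome arm so that clusters and shuffles do not cross chromosome boundaries. Whether the original implementation did so is not stated.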
Statistical analysis of Gene Ontology molecular function enrichment of the 500 most abundant Drosophila and human sperm transcripts was conducted using a hypergeometric distribution and the Yekutieli (FDR under dependency) multiple-test correction as implemented in the GOEAST toolkit [6]. Average human testis expression data were obtained from GSE1133 [7] and human spermatozoal RNA data from [8].

Drosophila stocks and embryo collections for RT-PCR confirmation

Drosophila stocks were Oregon-R (OrR) and the lines CPTI000493 (RpS9-YFP), CPTI002881 (RpL41-YFP) and CPTI001654 (CG9336-YFP), each of which contains a Venus (a yellow fluorescent protein variant) exon insertion within an intron (http://www.flyprot.org [9, 10]). All stocks were maintained at 25°C on standard media. Virgin male flies from the YFP-tagged lines were crossed to virgin OrR females. Embryos were collected after 15 or 30 minute lays on yeasted apple or grape juice plates and aged for 0, 1 or 3 hours, after which single embryos were collected and stored in Trizol (Invitrogen) at -80°C until RNA extraction.

Reverse Transcription and PCR reactions for RT-PCR confirmation

RNA was extracted from single embryos using the Trizol method (Invitrogen). The RNA was dissolved in 1 μl DEPC-treated water and then treated with RQ1 DNase (Promega), with 5x First Strand Buffer (Invitrogen) used in place of the RQ1 Buffer. Reverse transcription was performed (as per the manufacturer’s recommendation) at 55°C for 1 h using Superscript III enzyme (Invitrogen) and anchored oligo dT primer (Sigma) in a total volume of 20 μl. The resulting cDNA was diluted 1:2 in water, and 1 μl of cDNA was used in each PCR reaction with 1 unit Thermostart DNA Polymerase (Thermo Scientific), 1x Thermostart PCR Buffer, 1.5 mM MgCl2, 0.05 mM of each nucleotide and 0.2 μM of each primer in a total volume of 25 μl.
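As a worked illustration of the reaction composition given above, the following Python sketch converts the stated final concentrations into pipetting volumes using V_stock = C_final x V_total / C_stock. The final concentrations, the 1 μl of cDNA, the 1 unit of polymerase and the 25 μl total volume come from the text. Every stock concentration is a hypothetical placeholder, not a value reported in this study, and should be replaced with the concentrations of the reagents actually used.

```python
TOTAL_UL = 25.0  # total reaction volume stated in the text
CDNA_UL = 1.0    # diluted cDNA per reaction, stated in the text

# name -> (final concentration from the text, HYPOTHETICAL stock concentration).
# Both values in a pair use the same unit, so their ratio is dimensionless.
COMPONENTS = {
    "PCR buffer (x)": (1.0, 10.0),
    "MgCl2 (mM)": (1.5, 25.0),
    "dNTP mix (mM each)": (0.05, 2.0),
    "forward primer (uM)": (0.2, 10.0),
    "reverse primer (uM)": (0.2, 10.0),
}
POLYMERASE_UNITS = 1.0     # units per reaction, stated in the text
POLYMERASE_U_PER_UL = 5.0  # HYPOTHETICAL stock activity


def reaction_volumes(n_reactions=1):
    """Return the volume in ul of each component for n_reactions reactions."""
    volumes = {
        name: final / stock * TOTAL_UL * n_reactions
        for name, (final, stock) in COMPONENTS.items()
    }
    volumes["polymerase"] = POLYMERASE_UNITS / POLYMERASE_U_PER_UL * n_reactions
    volumes["cDNA"] = CDNA_UL * n_reactions
    # Water makes up whatever volume remains.
    volumes["water"] = TOTAL_UL * n_reactions - sum(volumes.values())
    if volumes["water"] < 0:
        raise ValueError("stock solutions are too dilute for this reaction volume")
    return volumes


if __name__ == "__main__":
    for name, vol in reaction_volumes().items():
        print(f"{name:22s} {vol:7.3f} ul")
```

With these placeholder stocks, a single reaction needs 1.5 μl of MgCl2 stock and 0.625 μl of dNTP mix. The volumes change in direct proportion if different stocks are used.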
PCR conditions were: initiation at 95°C for 15 minutes, followed by 40 cycles of 95°C for 30 sec, 59°C (56°C for Rp49) for 30 sec and extension at 72°C for 30 sec, followed by a final extension at 72°C for 10 minutes (see below for primers). Each embryo RNA sample was assayed for all 9 transcripts. To detect the YFP-tagged genes, nested PCR was performed using 1 μl of the first-round PCR product, with 40 cycles for RpS9 and RpL41 and 60 cycles for CG9336.

PCR primers and PCR product size

Transcript        Forward                  Reverse               cDNA product size (bp)
bic               acgacgctacagatcttgga     cgatggggattatacgcttg  254
sisA              ccatggaacggagtcatctt     tctccaggagcatctggtct  222
sna               tggaaagctgtacaccacca     agcgacatcctggagaaaga  273
eve               cctcgccaaatgaatgcctatc   aaggcgggatcggagtagac  307
fas3              gccatcttaacagatgcactcac  aagtagccctcgcgatttg   389
Rp49              catacaggcccaagatcg       gcttgttcgatccgtaacc   174
RpS9YFP           tatggtctgcgcaacaagc      agatcagcttcagggtcagc  371
RpS9YFP Nested    cgccctggctaagatccgta     cgtttacgtcgccgtccag   220
RpL41YFP          gttctcaaaccgtcgtccag     agatcagcttcagggtcagc  297
RpL41YFP Nested   caaaccgtcgtccagaccag     cgtttacgtcgccgtccag   258
CG9336YFP         gaggctgatgagacgttgct     agatcagcttcagggtcagc  368
CG9336YFP Nested  cccacgctacctgcagaactt    cgtttacgtcgccgtccag   252

References

1. Dorus S., Busby S.A., Gerike U., Shabanowitz J., Hunt D.F., Karr T.L. 2006 Genomic and functional evolution of the Drosophila melanogaster sperm proteome. Nat Genet 38, 1440-1445.
2. Snook R.R., Cleland S.Y., Wolfner M.F., Karr T.L. 2000 Offsetting effects of Wolbachia infection and heat shock on sperm production in Drosophila simulans: analyses of fecundity, fertility and accessory gland proteins. Genetics 155, 167-178.
3. Seth D., Gorrell M.D., McGuinness P.H., Leo M.A., Lieber C.S., McCaughan G.W., Haber P.S. 2003 SMART amplification maintains representation of relative gene expression: quantitative validation by real time PCR and application to studies of alcoholic liver disease in primates. J Biochem Biophys Methods 55, 53-66.
4. Buhler J., Ideker T., Haynor D. 2000 Dapple: improved techniques for finding spots on DNA microarrays. UW CSE Technical Report UWTR 2000-08-05.
5. Bolstad B.M., Irizarry R.A., Astrand M., Speed T.P. 2003 A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185-193.
6. Zheng Q., Wang X.J. 2008 GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res 36, W358-363.
7. Su A.I., Wiltshire T., Batalov S., Lapp H., Ching K.A., Block D., Zhang J., Soden R., Hayakawa M., Kreiman G., et al. 2004 A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA 101, 6062-6067.
8. Platts A.E., Dix D.J., Chemes H.E., Thompson K.E., Goodrich R., Rockett J.C., Rawe V.Y., Quintana S., Diamond M.P., Strader L.F., et al. 2007 Success and failure in human spermatogenesis as revealed by teratozoospermic RNAs. Hum Mol Genet 16, 763-773.
9. Rees J.S., Lowe N., Armean I.M., Roote J., Johnson G., Drummond E., Spriggs H., Ryder E., Russell S., Johnston D.S., et al. 2011 In Vivo Analysis of Proteomes and Interactomes Using Parallel Affinity Capture (iPAC) Coupled to Mass Spectrometry. Mol Cell Proteomics 10, M110.002386.
10. Ryder E., Spriggs H., Drummond E., St Johnston D., Russell S. 2009 The Flannotator--a gene and protein expression annotation tool for Drosophila melanogaster. Bioinformatics 25, 548-549.