Appendix S6: AANAT Polymorphism and Phylogeny

Phylogenetic reconstructions

Because the only available AANAT crystal structure is from O. aries (Hickman et al. 1999a, 1999b), we chose this species as the outgroup for all phylogenies. GenBank and Ensembl were searched for all available teleost AANAT2 cDNA sequences (Table S3). Non-AANAT2 isoforms were excluded after verifying nomenclature with BLAST and preliminary phylogenetic reconstructions. Analyses were restricted to the full AANAT2 ORF, which ranges from 627 to 693 bp in length depending on the species.

Two mitochondrial genes, Cytochrome b (Cytb) and Cytochrome c oxidase subunit 1 (COX1), were retrieved from GenBank for the 15 species investigated (Table S3). Both genes were cloned for the catfish O. sifontesi because they were not available in the databases. For this purpose, total RNA was extracted from catfish skin using Trizol. cDNA library generation and Cytb and COX1 amplification were performed using the protocols described earlier for AANAT2 cloning. Most of the O. sifontesi Cytb and COX1 sequences were obtained (1,005 and 1,385 bp, respectively).

Each set of sequences was aligned using the MUSCLE codon algorithm (Edgar 2004) implemented in MEGA 5 (Tamura et al. 2011). Nucleotide or amino acid alignments were then imported into Geneious Pro 5.4.6 (Drummond et al. 2010). Amino acid matrices for the three genes are displayed in Figure S2. The nucleotide AANAT2 alignment was analyzed by maximum likelihood (ML). The model of nucleotide substitution was selected in Modeltest v 3.8 (Posada 2006) using the Akaike Information Criterion (AIC) for each codon position. The most complex model was found at the first position (GTR + G) and was applied to all three positions in the ML reconstructions.
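As a rough illustration of the AIC-based model choice described above, the following minimal sketch scores a few candidate substitution models and then keeps the most parameter-rich of the per-position winners. It is not the Modeltest procedure itself: the log-likelihood values, the winners listed for codon positions 2 and 3, and the function names are assumptions made for the example; only the GTR + G result at the first position comes from the text.

```python
# Illustrative only: the log-likelihoods and the winners shown for codon
# positions 2 and 3 are made up, not values from this study.

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model_by_aic(candidates):
    """candidates maps model name -> (log-likelihood, free parameters)."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Free substitution-model parameters (branch lengths, shared by all models,
# are left out): JC 0, HKY 4, GTR 8; +G adds one gamma shape parameter.
candidates = {
    "JC": (-2500.0, 0),
    "HKY": (-2420.0, 4),
    "GTR": (-2410.0, 8),
    "GTR+G": (-2380.0, 9),
}
best, scores = best_model_by_aic(candidates)

# Keep the most parameter-rich of the per-position winners for all positions.
# Position 1 follows the text; positions 2 and 3 are hypothetical.
per_position_best = {1: ("GTR+G", 9), 2: ("HKY+G", 5), 3: ("HKY", 4)}
shared_model = max(per_position_best.values(), key=lambda m: m[1])[0]

print(best, scores[best], shared_model)
```

Using the most complex per-position model everywhere is a conservative choice: simpler models are nested within it, so positions that would have been fitted by a simpler model can still be accommodated.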
An ML heuristic search, using a starting tree obtained by maximum parsimony, was then conducted in PAUP 4.0b10 (Swofford 2003). Node support was assessed by bootstrapping with 100 replicates. Bayesian phylogenetic analyses were also performed on nucleotide sequences using MrBayes 3.0 (Ronquist & Huelsenbeck 2003) implemented in Geneious. The Bayesian analysis was performed with the Metropolis-coupled Markov chain Monte Carlo algorithm using a codon model. Tree space was explored with four chains run for 2 million generations, saving every 100th tree. The first 25,000 trees were discarded as burn-in, based on preliminary runs that allowed empirical checking of the point at which the chains reached apparent stationarity. A consensus was built from the remaining trees. Bayesian posterior probabilities were used to evaluate branch support.

Amino acid AANAT2 sequences were also analyzed by ML. The model of amino acid substitution was selected with the ProtTest webserver (Abascal et al. 2005). The best model based on the AIC was JTT+G (alpha = 0.73). It was used in PhyML, as implemented in Geneious, for further ML reconstructions (Guindon & Gascuel 2003). Node support was assessed by bootstrapping with 1,000 replicates. Bayesian phylogenetic analyses were also performed on this amino acid matrix using MrBayes 3.0 (Ronquist & Huelsenbeck 2003) implemented in Geneious. The Bayesian analysis was performed with the Metropolis-coupled Markov chain Monte Carlo algorithm using a mixed model. Tree space was explored with four chains run for 4 million generations, saving every 100th tree. The first 25,000 trees were discarded as burn-in, following the same strategy as for the nucleotide sequences. A consensus was built from the remaining trees. Bayesian posterior probabilities were used to evaluate branch support.
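The burn-in and consensus steps used in the Bayesian analyses can be illustrated with a toy sketch. It assumes each sampled tree is reduced to the set of clades it contains (each clade a set of taxon labels), drops the first samples as burn-in, and reports the fraction of remaining trees that contain each clade. This is not how MrBayes or Geneious represent trees internally; the taxon labels A to D, the six sampled trees, and the function names are hypothetical.

```python
from collections import Counter


def clade_support(sampled_trees, burnin):
    """Fraction of post-burn-in trees that contain each clade.

    sampled_trees: list of trees, each given as a set of frozensets of taxa.
    burnin: number of initial samples to discard.
    """
    kept = sampled_trees[burnin:]
    if not kept:
        raise ValueError("burn-in removes every sampled tree")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule_clades(support, threshold=0.5):
    """Clades found in more than `threshold` of the retained trees."""
    return {clade for clade, p in support.items() if p > threshold}


# Toy chain of six sampled trees over hypothetical taxa A, B, C, D.
AB, AC = frozenset("AB"), frozenset("AC")
ABC, ABD = frozenset("ABC"), frozenset("ABD")
trees = [{AB}, {AC}, {AB, ABC}, {AB, ABC}, {AB, ABD}, {AC, ABC}]

support = clade_support(trees, burnin=2)
consensus = majority_rule_clades(support)
print(sorted((sorted(c), p) for c, p in support.items()))
```

The same counting applies to bootstrap support in the ML analyses if the list of sampled trees is replaced by the trees inferred from the bootstrap replicates and the burn-in is set to zero.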
Phylogenies based on the two mitochondrial genes were reconstructed using the same techniques as described above for the AANAT2 gene. For nucleotide and amino acid Bayesian reconstructions, the same parameters as those employed for AANAT2 were used. The most complex model (GTR + I + G) was found at the second codon position for Cytb and at the first codon position for COX1, and was applied to all three positions in the ML reconstructions performed in PAUP. Amino acid sequences of the two genes were concatenated, and this new matrix was also used as an input for further analyses. The best substitution models provided by the ProtTest webserver (Abascal et al. 2005) were MtArt+I+G+F for COX1, MtMam+I+G for Cytb, and MtArt+I+G+F for the concatenation. They were used, as described above, as inputs for the analysis in PhyML. Bayesian reconstructions were run each time for 2 million generations.

Substitution rate and natural selection

AANAT2 polymorphism was assessed using DnaSP v5 (Librado & Rozas 2009). We searched for signatures of positive or negative selection at every codon of teleost AANAT2 and O. aries AANAT using the HyPhy package, available through the Datamonkey webserver (Delport et al. 2010; Kosakovsky Pond & Frost 2005). Putative recombination within sequences was assessed with the GARD algorithm, which partitioned the sequences into four non-recombinant fragments. These fragments were analyzed separately with the SLAC codon-based maximum likelihood method. The best time-reversible model was identified and substitution rate parameters were estimated. These values, together with branch lengths (NJ tree), yielded a global nonsynonymous/synonymous substitution ratio (dN/dS). Ancestral sequences were then inferred site by site by ML, providing the level of significance of positive or negative selection at each codon position.
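To make the dN/dS reasoning concrete, here is a simplified counting-style sketch. It is not the SLAC implementation in HyPhy: it assumes integer substitution counts and precomputed numbers of synonymous and nonsynonymous sites, and it uses a plain one-sided binomial test. All function names and the numbers in the example are hypothetical.

```python
from math import comb, inf, nan


def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Ratio of nonsynonymous to synonymous substitutions per site."""
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    if ds == 0:
        # Ratio is unbounded (or undefined when dN is also zero).
        return inf if dn > 0 else nan
    return dn / ds


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_negative_selection(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """One-sided p-value for an excess of synonymous changes at one codon.

    Under neutrality a substitution is assumed to be synonymous with
    probability syn_sites / (syn_sites + nonsyn_sites).
    """
    total = nonsyn_subs + syn_subs
    p_syn = syn_sites / (syn_sites + nonsyn_sites)
    return binom_tail_ge(syn_subs, total, p_syn)


# Hypothetical codon: 1 synonymous site, 2 nonsynonymous sites,
# 5 synonymous and 0 nonsynonymous substitutions along the tree.
omega = dn_ds(0, 2.0, 5, 1.0)
p_neg = p_negative_selection(0, 2.0, 5, 1.0)
print(omega, p_neg)
```

A ratio below 1 with a small p-value is read as evidence of negative (purifying) selection at that codon; the mirror-image test, the probability of observing that many synonymous changes or fewer, would be used for positive selection.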
REFERENCES

Abascal F, Zardoya R, Posada D (2005) ProtTest: selection of best-fit models of protein evolution. Bioinformatics, 21, 2104-2105.

Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA (2001) Electrostatics of nanosystems: application to microtubules and the ribosome. Proceedings of the National Academy of Sciences of the United States of America, 98, 10037-10041.

Collaborative Computational Project Number 4 (1994) The CCP4 suite: programs for protein crystallography. Acta Crystallographica Section D: Biological Crystallography, 50, 760-763.

Delport W, Poon AF, Frost SD, Kosakovsky Pond SL (2010) Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology. Bioinformatics, 26, 2455-2457.

Drummond AJ, Ashton B, Buxton S, Cheung M, Cooper A, Duran C, Field M, Heled J, Kearse M, Markowitz S, Moir R, Stones-Havas S, Sturrock S, Thierer T, Wilson A (2010) Geneious v5.3. Available from http://www.geneious.com.

Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32, 1792-1797.

Fields PA, Somero GN (1998) Hot spots in cold adaptation: localized increases in conformational flexibility in lactate dehydrogenase A4 orthologs of Antarctic notothenioid fishes. Proceedings of the National Academy of Sciences of the United States of America, 95, 11476-11481.

Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology, 52, 696-704.

Hickman AB, Klein DC, Dyda F (1999a) Melatonin biosynthesis: the structure of serotonin N-acetyltransferase at 2.5 Å resolution suggests a catalytic mechanism. Molecular Cell, 3, 23-32.

Hickman AB, Namboodiri MA, Klein DC, Dyda F (1999b) The structural basis of ordered substrate binding by serotonin N-acetyltransferase: enzyme complex at 1.8 Å resolution with a bisubstrate analog. Cell, 97, 361-369.

Kosakovsky Pond SL, Frost SDW (2005) Datamonkey: rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics, 21, 2531-2533.

Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics, 25, 1451-1452.

Maddison WP, Maddison DR (2011) Mesquite: a modular system for evolutionary analysis. Version 2.75. http://mesquiteproject.org.

Midford PE, Garland T, Maddison WP (2005) PDAP Package of Mesquite. Available at: http://mesquiteproject.org/pdap_mesquite/.

Posada D (2006) ModelTest Server: a web-based tool for the statistical selection of models of nucleotide substitution online. Nucleic Acids Research, 34, 700-703.

Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics, 19, 1572-1574.

Swofford DL (2003) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.

Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5: Molecular Evolutionary Genetics Analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution, 28, 2731-2739.

Wolf E, De Angelis J, Khalil EM, Cole PA, Burley SK (2002) X-ray crystallographic studies of serotonin N-acetyltransferase catalysis and inhibition. Journal of Molecular Biology, 317, 215-224.