Supplementary Methods

Sampling, DNA extraction, and sequencing

Over 80 peat cores were collected in July 2012 from the Bog Lake Fen (Fen; 47° 30' 22.62", -93° 29' 20.46") and S1 bog (47° 30' 22.3914", -93° 27' 11.772") sites in the Marcell Experimental Forest (MEF) in northern Minnesota, USA. Biogeochemical characterization of these samples (including microbial community composition, functional potential, and organic matter properties) has been described elsewhere (Lin et al 2014a, Lin et al 2014b, Tfaily et al 2014). Two 75-100 cm depth intervals, from the Bog Lake Fen and S1 bog T3M (the midpoint of transect 3) sites, were selected for metagenomic sequencing because of the high proportion of Archaea (up to 60% of the total community) detected in these soils by quantitative real-time PCR and amplicon sequencing (Lin et al 2014b). The bog and fen sites differed in vegetation cover, porosity of the peat column, and organic matter properties, and thus in microbial community composition (Lin et al 2014b, Tfaily et al 2014). The bog is an acidic (pH 3.5-4.0), nutrient-deficient environment that receives water inputs primarily from precipitation. In contrast, Bog Lake Fen (pH ≈ 4.5-4.8) is a poor fen with a higher coverage of vascular plants alongside Sphagnum. Core sections were homogenized and subsampled in sterile bags. Samples for DNA extraction were frozen at -20 °C within 2 hours of sampling and then transferred to a -80 °C freezer at the end of the sampling day.

Genomic DNA was extracted from triplicate peat soil subsamples using a MoBio PowerSoil DNA extraction kit (MoBio, Carlsbad, CA) following the manufacturer's protocol, with 0.5 g of peat per extraction. Extracted DNA samples were pooled. Libraries for metagenomic sequencing were generated using the Nextera DNA sample preparation kit (Illumina, Inc., San Diego, CA). Libraries were size-selected using E-Gels (Life Technologies, Inc.)
for an insert size range of 400-800 bp. Libraries were then quantified and quality checked using the Invitrogen Qubit and Agilent Bioanalyzer. All libraries were pooled in an equimolar ratio and sequenced on an Illumina HiSeq2000 instrument, generating paired-end reads of 150 bases in length.

Metagenome assembly, binning, and annotation

Low quality reads were trimmed using Sickle (v. 1.29) with a quality score threshold of Q=3 (Joshi NA and Fass JN, unpublished, https://github.com/najoshi/sickle), and then assembled using IDBA-UD v. 1.0.9 (Peng et al 2012), with minimum and maximum k-mer sizes of 40 and 70, respectively, and a k-mer step size of 5. Contigs longer than 2 kb were retained for further analysis and taxonomic binning. The coverage of each contig was determined by mapping reads using Bowtie v. 1.0.0 with default settings (Langmead et al 2009). Contigs were clustered into genome bins based on agreement among tetranucleotide frequency composition, read coverage, G+C content, and UBLAST (usearch v. 7.0.1001, Edgar RC, unpublished, http://drive5.com/usearch/) hits of predicted proteins to the UniProt UniRef90 database, following gene prediction with Prodigal in metagenomics mode (Hyatt et al 2012). To aid binning, Emergent Self Organizing Maps (ESOM) were created using Databionics ESOM tools (http://databionic-esom.sourceforge.net/) and tetranucleotide frequencies calculated after fragmenting contigs into 5 kb lengths (Dick et al 2009). Contigs fragmented into 2 kb lengths were then projected onto these maps. Genome completeness was determined by searching for 31 bacterial or 104 archaeal single copy marker genes using the 'MarkerScanner.pl' script of AMPHORA2 (Wu and Scott 2012).

Each assembled genome was uploaded to the Integrated Microbial Genomes (IMG) system at DOE's Joint Genome Institute (JGI) for annotation (Markowitz et al 2014).
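The tetranucleotide-frequency step of the binning procedure described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names are hypothetical, reverse-complement merging and any normalization applied before ESOM training are omitted, and dropping the trailing partial window is a simplifying assumption.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides, in a fixed order so vectors are comparable.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def fragment(sequence, window):
    """Split a contig into consecutive, non-overlapping windows.

    A trailing piece shorter than `window` is dropped (an assumption of
    this sketch, not a documented choice of the original pipeline).
    """
    return [sequence[i:i + window]
            for i in range(0, len(sequence) - window + 1, window)]


def tetranucleotide_frequencies(fragment_seq):
    """Return a 256-element list of relative tetranucleotide frequencies.

    4-mers containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = fragment_seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    valid = {k: counts[k] for k in TETRAMERS if k in counts}
    total = sum(valid.values())
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [valid.get(k, 0) / total for k in TETRAMERS]


if __name__ == "__main__":
    contig = "ACGT" * 3000  # toy 12 kb contig
    windows = fragment(contig, 5000)  # 5 kb fragments, as used to build the maps
    vectors = [tetranucleotide_frequencies(w) for w in windows]
    print(len(windows), len(vectors[0]))
```

Each fragment becomes one 256-dimensional vector; vectors of this kind are what a self-organizing map would take as input.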
For comparison, genome sequences were also uploaded to the RAST server and annotated by RAST v4.0 (Overbeek et al 2014). Metabolic reconstruction and pathway mapping of each assembled genome were completed using the KEGG Automatic Annotation Server (KAAS) (Moriya et al 2007). Glycoside hydrolases (GH) were identified by searching against the Pfam database using HMMER v. 3.0 (Finn et al 2014), including targets of GH families, 66 carbohydrate active enzymes, 34 carbohydrate binding modules, 3 polysaccharide lyases, and 5 carbohydrate esterases, as listed in Tveit et al (2013). Representative genes in fermentation pathways were counted to determine the genomic potential for producing lactate, H2, ethanol, propionate, CO2, acetate, and butyrate by fermentation (Kirchman et al 2014).

Phylogenetic analysis of 16S rRNA genes and concatenated marker genes

Expectation maximization iterative reconstruction of genes from the environment (EMIRGE) was used to reconstruct full-length 16S rRNA gene sequences from the metagenomic data (Miller et al 2011), with sequence clustering at 97% identity. Taxonomy was determined by searching representative sequences against a dereplicated version of the SILVA 108 database using SINA v1.2.11 (Miller et al 2011, Pruesse et al 2012). The 16S rRNA gene fragments present in contigs were detected and retrieved using Metaxa (Bengtsson et al 2011). 16S rRNA gene sequences reconstructed by EMIRGE were matched to genomes by aligning 16S rRNA gene sequences from both EMIRGE and Metaxa using SINA v1.2.11 (Pruesse et al 2012), along with sequences from their closest relatives and sequences previously found by amplicon sequencing of the same sample (Lin et al 2014b). MEGA6 (Tamura et al 2013) was used to construct the Maximum Likelihood tree using the Kimura 2-parameter model of nucleotide evolution, with a bootstrap test of 500 replicates.
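As a brief illustration of the Kimura 2-parameter model named above, the following Python sketch computes the standard pairwise K2P distance, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions among compared sites. It shows only the substitution model's distance correction, not the maximum likelihood tree search or bootstrap procedure performed in MEGA6; the function name and the handling of gaps (skipped pairwise) are assumptions of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


if __name__ == "__main__":
    # One transition (A<->G) in ten compared sites.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

Because transitions and transversions are corrected separately, the distance for one transition in ten sites differs slightly from that for one transversion in ten sites.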
The tree branches of Crenarchaeota and Thaumarchaeota were clustered and collapsed into taxonomic groups according to the crenarchaeotal taxonomic framework used by Ochsenreiter et al (2003) and Kubo et al (2012).

To construct a concatenated core gene tree for the two near-complete genome bins, a distance tree was first created based on 16S rRNA gene sequences from genomes of all Crenarchaeota, Thaumarchaeota, and Thermoplasmata available in the IMG database (Markowitz et al 2014). This 16S tree was used to identify the closest neighbors of the two near-complete genomes. Genome sequences of the two assembled genomes and their closest neighbors were then downloaded and searched against 104 archaeal marker proteins using AMPHORA2. Marker proteins identified in each downloaded genome were aligned and concatenated after removing unaligned fragments. The aligned and concatenated protein sequences of each genome were then used to construct the concatenated core gene tree using FastTree v2.1.3 with default parameters (Price et al 2010).

Supplementary references

Bengtsson J, Eriksson KM, Hartmann M, Wang Z, Shenoy BD, Grelet GA et al (2011). Metaxa: a software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets. Antonie van Leeuwenhoek 100: 471-475.

Dick GJ, Andersson AF, Baker BJ, Simmons SL, Thomas BC, Yelton AP et al (2009). Community-wide analysis of microbial genome sequence signatures. Genome Biol 10: R85.

Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR et al (2014). Pfam: the protein families database. Nucleic Acids Res 42: D222-230.

Hyatt D, LoCascio PF, Hauser LJ, Uberbacher EC (2012). Gene and translation initiation site prediction in metagenomic sequences.
Bioinformatics 28: 2223-2230.

Kirchman DL, Hanson TE, Cottrell MT, Hamdan LJ (2014). Metagenomic analysis of organic matter degradation in methane-rich Arctic Ocean sediments. Limnol Oceanogr 59: 548-559.

Kubo K, Lloyd KG, Biddle JF, Amann R, Teske A, Knittel K (2012). Archaea of the Miscellaneous Crenarchaeotal Group are abundant, diverse and widespread in marine sediments. ISME J 6: 1949-1965.

Langmead B, Trapnell C, Pop M, Salzberg SL (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10: R25.

Lin X, Tfaily MM, Green SJ, Steinweg JM, Chanton P, Imvittaya A et al (2014a). Microbial metabolic potential for carbon degradation and nutrient (nitrogen and phosphorus) acquisition in an ombrotrophic peatland. Appl Environ Microbiol 80: 3531-3540.

Lin X, Tfaily MM, Steinweg JM, Chanton P, Esson K, Yang ZK et al (2014b). Microbial community stratification linked to utilization of carbohydrates and phosphorus limitation in a boreal peatland at Marcell Experimental Forest, Minnesota, USA. Appl Environ Microbiol 80: 3518-3530.

Markowitz VM, Chen IM, Palaniappan K, Chu K, Szeto E, Pillay M et al (2014). IMG 4 version of the integrated microbial genomes comparative analysis system. Nucleic Acids Res 42: D560-567.

Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF (2011). EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data. Genome Biol 12: R44.

Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M (2007). KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res 35: W182-185.

Ochsenreiter T, Selezi D, Quaiser A, Bonch-Osmolovskaya L, Schleper C (2003). Diversity and abundance of Crenarchaeota in terrestrial habitats studied by 16S RNA surveys and real time PCR.
Environ Microbiol 5: 787-797.

Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T et al (2014). The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res 42: D206-214.

Peng Y, Leung HCM, Yiu SM, Chin FYL (2012). IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28: 1420-1428.

Price MN, Dehal PS, Arkin AP (2010). FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One 5: e9490.

Pruesse E, Peplies J, Glockner FO (2012). SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics 28: 1823-1829.

Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013). MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30: 2725-2729.

Tfaily MM, Cooper WT, Kostka JE, Chanton PR, Schadt CW, Hanson PJ et al (2014). Organic matter transformation in the peat column at Marcell Experimental Forest: Humification and vertical stratification. J Geophys Res Biogeosci 119: 661-675.

Tveit A, Schwacke R, Svenning MM, Urich T (2013). Organic carbon transformations in high-Arctic peat soils: key functions and microorganisms. ISME J 7: 299-311.

Wu M, Scott AJ (2012). Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics 28: 1033-1034.