Supplementary material

Supplementary table 1: List of metagenomes and MG-RAST (Meyer et al. 2008) accession numbers

| # | MG-RAST metagenome name | MG-RAST accession | Size (bp) | Category in figure 1 |
|---|---|---|---|---|
| 1 | 174-1 | 4443725.3 | 5.90E+07 | algal bloom |
| 2 | 174-2 | 4443729.3 | 6.74E+07 | algal bloom |
| 3 | 179-1 | 4443731.3 | 4.39E+07 | algal bloom |
| 4 | 179-2 | 4443732.3 | 5.28E+07 | algal bloom |
| 5 | Bag1_13thMay_DNA | 4440212.3 | 4.73E+07 | algal bloom |
| 6 | Bag6_May13th_DNA | 4440213.3 | 3.10E+07 | algal bloom |
| 7 | SRS000298 | 4443707.3 | 3.14E+07 | algal bloom |
| 8 | GS006 Shotgun - Estuary - North American East Coast - Bay of Fundy, Nova Scotia - Canada | 4441582.3 | 6.46E+07 | Estuary |
| 9 | GS011 Shotgun - Estuary - North American East Coast - Delaware Bay, NJ USA | 4441658.3 | 1.33E+08 | Estuary |
| 10 | GS012 Shotgun - Estuary - North American East Coast - Chesapeake Bay, MD - USA | 4441584.3 | 1.36E+08 | Estuary |
| 11 | mb2000jd298_1 | 4443713.3 | 5.30E+07 | Bay |
| 12 | mb2000jd298_2 | 4443712.3 | 4.79E+07 | Bay |
| 13 | mb2001jd115_1 | 4443714.3 | 4.50E+07 | Bay |
| 14 | mb2001jd115_2 | 4443715.3 | 4.14E+07 | Bay |
| 15 | mb2001jd135_1 | 4443716.3 | 5.30E+07 | Bay |
| 16 | mb2001jd135_2 | 4443717.3 | 4.47E+07 | Bay |
| 17 | BBAY01 | 4443688.3 | 9.72E+07 | Bay |
| 18 | BBAY15 | 4443693.3 | 1.77E+08 | Bay |
| 19 | GS005 Shotgun - Embayment - North American East Coast - Bedford Basin, Nova Scotia - Canada | 4441581.3 | 6.60E+07 | Bay |
| 20 | GS002 Shotgun - Coastal - North American East Coast - Gulf of Maine Canada | 4441579.3 | 1.29E+08 | Coastal |
| 21 | GS003 Shotgun - Coastal - North American East Coast - Browns Bank, Gulf of Maine - Canada | 4441580.3 | 6.69E+07 | Coastal |
| 22 | GS004 Shotgun - Coastal - North American East Coast - Outside Halifax, Nova Scotia - Canada | 4441152.3 | 5.69E+07 | Coastal |
| 23 | GS007 Shotgun - Coastal - North American East Coast - Northern Gulf of Maine - Canada | 4441153.3 | 5.54E+07 | Coastal |
| 24 | GS008 Shotgun - Coastal - North American East Coast - Newport Harbor, RI - USA | 4441583.3 | 1.38E+08 | Coastal |
| 25 | GS009 Shotgun - Coastal - North American East Coast - Block Island, NY USA | 4441143.3 | 8.43E+07 | Coastal |
| 26 | GS010 Shotgun - Coastal - North American East Coast - Cape May, NJ USA | 4441144.3 | 8.24E+07 | Coastal |
| 27 | GS013 Shotgun - Coastal - North American East Coast - Off Nags Head, NC - USA | 4441585.3 | 1.49E+08 | Coastal |
| 28 | GS014 Shotgun - Coastal - North American East Coast - South of Charleston, SC - USA | 4441659.3 | 1.40E+08 | Coastal |
| 29 | GS015 Shotgun - Coastal - Caribbean Sea - Off Key West, FL - USA | 4441586.3 | 1.38E+08 | Coastal |
| 30 | GS016 Shotgun - Coastal Sea - Caribbean Sea - Gulf of Mexico - USA | 4441660.3 | 1.37E+08 | Coastal |
| 31 | GS019 Shotgun - Coastal - Caribbean Sea - Northeast of Colon - Panama | 4441589.3 | 1.46E+08 | Coastal |
| 32 | GS021 Shotgun - Coastal - Eastern Tropical Pacific - Gulf of Panama Panama | 4441591.3 | 1.43E+08 | Coastal |
| 33 | GS027 Shotgun - Coastal - Galapagos Islands - Devil's Crown, Floreana Island Ecuador | 4441595.3 | 2.37E+08 | Coastal |
| 34 | GS028 Shotgun - Coastal - Galapagos Islands - Coastal Floreana - Ecuador | 4441596.4 | 2.05E+08 | Coastal |
| 35 | GS029 Shotgun - Coastal - Galapagos Islands - North James Bay, Santigo Island - Ecuador | 4441596.3 | 1.44E+08 | Coastal |
| 36 | GS034 Shotgun - Coastal - Galapagos Islands - North Seamore Island - Ecuador | 4441600.3 | 1.42E+08 | Coastal |
| 37 | GS035 Shotgun - Coastal - Galapagos Islands - Wolf Island - Ecuador | 4441601.3 | 1.52E+08 | Coastal |
| 38 | GS036 Shotgun - Coastal - Galapagos Islands - Cabo Marshall, Isabella Island Ecuador | 4441602.3 | 8.58E+07 | Coastal |
| 39 | GS049 Shotgun - Coastal - Polynesia Archipelagos - Moorea, Outside Cooks Bay - Fr. Polynesia | 4441605.3 | 9.44E+07 | Coastal |
| 40 | GS117a Shotgun - Coastal sample Indian Ocean - St. Anne Island, Seychelles - Seychelles | 4441613.3 | 3.40E+08 | Coastal |
| 41 | GS117b Shotgun - Coastal sample Indian Ocean - St. Anne Island, Seychelles - Seychelles | 4441148.3 | 5.48E+07 | Coastal |
| 42 | GS108a Shotgun - Lagoon Reef - Indian Ocean - Coccos Keeling, Inside Lagoon Australia | 4441139.3 | 5.09E+07 | Reef |
| 43 | GS108b Shotgun - Lagoon Reef - Indian Ocean - Coccos Keeling, Inside Lagoon Australia | 4441133.3 | 5.35E+07 | Reef |
| 44 | GS025 Shotgun - Fringing Reef - Eastern Tropical Pacific - Dirty Rock, Cocos Island - Costa Rica | 4441593.3 | 1.30E+08 | Reef |
| 45 | GS048a Shotgun - Coral Reef - Polynesia Archipelagos - Moorea, Cooks Bay - Fr. Polynesia | 4441603.3 | 9.28E+07 | Reef |
| 46 | GS048b Shotgun - Coral Reef - Polynesia Archipelagos - Moorea, Cooks Bay - Fr. Polynesia | 4441167.3 | 5.10E+07 | Reef |
| 47 | GS051 Shotgun - Coral Reef Atoll Polynesia Archipelagos - Rangirora Atoll Fr. Polynesia | 4441604.3 | 1.40E+08 | Reef |
| 48 | GS148 Shotgun - Fringing Reef - Indian Ocean - East coast Zanzibar (Tanzania), offshore Paje lagoon - Tanzania | 4441617.3 | 1.08E+08 | Reef |
| 49 | GS032 Shotgun - Mangrove - Galapagos Islands - Mangrove on Isabella Island Ecuador | 4441598.3 | 1.53E+08 | Reef |
| 50 | GS031 Upwelling, Fernandina Island Ecuador | 4441597.3 | 4.62E+08 | Upwelling |
| 51 | GS149 Shotgun - Harbor - Indian Ocean West coast Zanzibar (Tanzania), harbour region - Tanzania | 4441618.3 | 1.11E+08 | Harbor |
| 52 | GS000a Shotgun - Open Ocean Sargasso Sea - Sargasso Station 13 Bermuda | 4441571.3 | 6.59E+08 | Open Ocean |
| 53 | GS000b Shotgun - Open Ocean Sargasso Sea - Sargasso Station 13 Bermuda | 4441573.3 | 3.21E+08 | Open Ocean |
| 54 | GS000c Shotgun - Open Ocean Sargasso Sea - Sargasso Stations 3 Bermuda | 4441574.3 | 3.72E+08 | Open Ocean |
| 55 | GS000d Shotgun - Open Ocean Sargasso Sea - Sargasso Station 13 Bermuda | 4441575.3 | 3.36E+08 | Open Ocean |
| 56 | GS001a Shotgun - Open Ocean Sargasso Sea - Hydrostation S - Bermuda | 4441576.3 | 1.43E+08 | Open Ocean |
| 57 | GS001b Shotgun - Open Ocean Sargasso Sea - Hydrostation S - Bermuda | 4441577.3 | 9.10E+07 | Open Ocean |
| 58 | GS001c Shotgun - Open Ocean Sargasso Sea - Hydrostation S - Bermuda | 4441578.3 | 9.27E+07 | Open Ocean |
| 59 | GS017 Shotgun - Open Ocean Caribbean Sea - Yucatan Channel Mexico | 4441587.3 | 2.81E+08 | Open Ocean |
| 60 | GS018 Shotgun - Open Ocean Caribbean Sea - Rosario Bank - Honduras | 4441588.3 | 1.56E+08 | Open Ocean |
| 61 | GS022 Shotgun - Open Ocean - Eastern Tropical Pacific - 250 miles from Panama City - Panama | 4441592.3 | 1.31E+08 | Open Ocean |
| 62 | GS023 Shotgun - Open Ocean - Eastern Tropical Pacific - 30 miles from Cocos Island - Costa Rica | 4441661.3 | 1.44E+08 | Open Ocean |
| 63 | GS026 Shotgun - Open Ocean Galapagos Islands - 134 miles NE of Galapagos - Ecuador | 4441594.3 | 1.09E+08 | Open Ocean |
| 64 | GS037 Shotgun - Open Ocean - Eastern Tropical Pacific - Equatorial Pacific TAO Buoy - International | 4441145.3 | 6.87E+07 | Open Ocean |
| 65 | GS047 Shotgun - Open Ocean - Tropical South Pacific - 201 miles from F. Polynesia - French Polynesia | 4441146.3 | 6.83E+07 | Open Ocean |
| 66 | GS109 Shotgun - Open Ocean - Indian Ocean - Indian Ocean - International | 4441155.3 | 6.28E+07 | Open Ocean |
| 67 | GS110a Shotgun - Open Ocean - Indian Ocean - Indian Ocean - International | 4441607.3 | 1.00E+08 | Open Ocean |
| 68 | GS110b Shotgun - Open Ocean - Indian Ocean - Indian Ocean - International | 4441134.3 | 5.36E+07 | Open Ocean |
| 69 | GS111 Shotgun - Open Ocean - Indian Ocean - Indian Ocean - International | 4441156.3 | 6.21E+07 | Open Ocean |
| 70 | GS112a Shotgun - Open Ocean - Indian Ocean - Indian Ocean - International | 4441609.3 | 1.02E+08 | Open Ocean |
| 71 | GS112b Shotgun - Open Ocean - Indian Ocean - Indian Ocean - International | 4441147.3 | 5.56E+07 | Open Ocean |
| 72 | GS113 Shotgun - Open Ocean - Indian Ocean - Indian Ocean - International | 4441610.3 | 1.18E+08 | Open Ocean |
| 73 | GS114 Shotgun - Open Ocean - Indian Ocean - 500 Miles west of the Seychelles in the Indian Ocean - International | 4441611.3 | 3.45E+08 | Open Ocean |
| 74 | GS115 Shotgun - Open Ocean - Indian Ocean - Indian Ocean - International | 4441150.3 | 6.42E+07 | Open Ocean |
| 75 | GS116 Shotgun - Open Ocean - Indian Ocean - Outside Seychelles, Indian Ocean - Seychelles | 4441149.3 | 6.42E+07 | Open Ocean |
| 76 | GS119 Shotgun - Open Ocean - Indian Ocean - International Water Outside of Reunion Island - International | 4441151.3 | 6.51E+07 | Open Ocean |
| 77 | GS120 Shotgun - Open Ocean - Indian Ocean - Madagascar Waters Madagascar | 4441135.3 | 4.57E+07 | Open Ocean |
| 78 | GS121 Shotgun - Open Ocean - Indian Ocean - International water between Madagascar and South Africa International | 4441614.3 | 1.19E+08 | Open Ocean |
| 79 | GS122a Shotgun - Open Ocean - Indian Ocean - International waters between Madagascar and South Africa International | 4441615.3 | 1.05E+08 | Open Ocean |
| 80 | GS122b Shotgun - Open Ocean - Indian Ocean - International waters between Madagascar and South Africa International | 4441139.4 | 5.27E+07 | Open Ocean |
| 81 | GS123 Shotgun - Open Ocean - Indian Ocean - International water between Madagascar and South Africa International | 4441616.3 | 1.16E+08 | Open Ocean |
| 82 | S_35131 | 4443766.3 | 4.36E+07 | Open Ocean |
| 83 | S_35155 | 4443697.3 | 6.81E+07 | Open Ocean |
| 84 | S_35163 | 4443699.3 | 6.48E+07 | Open Ocean |
| 85 | S_35179 | 4443701.3 | 5.19E+07 | Open Ocean |
| 86 | SRS000297 | 4443705.3 | 6.82E+07 | Open Ocean |
| 87 | 1-19-DNA-flx | 4440275.3 | 5.93E+07 | Open Ocean |
| 88 | 6-19-DNA-flx | 4440276.3 | 6.82E+07 | Open Ocean |
| 89 | Arctic Canada | 4441621.3 | 6.83E+07 | Arctic/Antarctic |
| 90 | AntarcticaAquatic_5 - MARINE DERIVED LAKE | 4443682.3 | 2.84E+08 | Arctic/Antarctic |
| 91 | AntarcticaAquatic_6 - ACE LAKE, ANTARCTICA | 4443684.3 | 2.81E+08 | Arctic/Antarctic |
| 92 | AntarcticaAquatic_8 Newcomb Bay | 4443686.3 | 1.02E+08 | Arctic/Antarctic |
| 93 | AntarcticaAquatic_9 | 4443687.3 | 9.57E+07 | Arctic/Antarctic |
| 94 | GS020 Shotgun - Fresh Water - Panama Canal - Lake Gatun - Panama | 4441590.3 | 3.15E+08 | Fresh |
| 95 | Tilapia pond microbes Duplicate | 4440423.3 | 3.88E+07 | Fresh |
| 96 | GS030 Shotgun - Warm Seep - Galapagos Islands - Warm seep, Roca Redonda Ecuador | 4441662.3 | 3.92E+08 | Warm |
| 97 | OctopusSpringsMatCoreF | 4443749.3 | 1.86E+07 | Hot |
| 98 | GS033 Shotgun - Hypersaline - Galapagos Islands - Punta Cormorant, Hypersaline Lagoon, Floreana Island - Ecuador | 4441599.3 | 7.30E+08 | Saline |
| 99 | HighSalternSDbayMicD200407 | 4440438.3 | 3.48E+07 | Saline |
| 100 | LowSalternSDbayMic200407 | 4440437.3 | 2.53E+07 | Saline |
| 101 | SaltonSeaMic20060823 | 4440329.3 | 1.89E+07 | Saline |
| 102 | Tamarix | 4448834.3 | 3.98E+08 | Tamarix |
| 103 | Arabidopsis | 4447810.3 | 2.56E+08 | Phyllosphere |
| 104 | Clover | 4447811.3 | 2.41E+08 | Phyllosphere |
| 105 | Soybean | 4447793.3 | 1.30E+08 | Phyllosphere |
| 106 | Rice phyllosphere | 4450328.3 | 8.32E+08 | Phyllosphere |
| 107 | Rice Rhizosphere | 4449956.3 | 3.96E+08 | Rhizosphere |
| 108 | NTS_creosote.amb.1 | 4445996.3 | 1.17E+08 | Creosote |
| 109 | NTS_crust.amb.1 | 4445993.3 | 1.34E+08 | Crust |
| 110 | Luquillo Experimental Forest Soil, Puerto Rico | 4446153.3 | 3.22E+08 | Soil |
| 111 | Waseca Farm Soil | 4441091.3 | 1.54E+08 | Soil |
| 112 | Whale Fall Bone | 4441619.3 | 4.13E+07 | Dark |
| 113 | Whale Fall Mat | 4441656.4 | 3.79E+07 | Dark |
| 114 | Whale Fall Rib | 4441620.3 | 3.91E+07 | Dark |
| 115 | Gut Microflora | 4440461.5 | 9.15E+07 | Dark |

Relative
abundance calculation

Various methods have been proposed for normalizing metagenomic data for comparative purposes. They attempt to control for variance caused by differences in average read length and average genome size, among other factors (Raes et al., 2007; Beszteri et al., 2010). Here, we use the simple approach of calculating the relative abundance of a gene by dividing the number of BLAST hits to this gene within a metagenome by the number of BLAST hits to a universal single-copy marker gene. This approach assumes that all noise-generating factors act in a similar manner on the abundance of the gene in question and on the abundance of the marker gene, and are thus eliminated by division. The factor that remains to be controlled for is gene length, which linearly increases the number of BLAST hits a gene receives in a dataset.

Genes used in this calculation were selected according to their occurrence profile in sequenced genomes (using Gene Search and Function Search across all sequenced bacterial and archaeal genomes on the JGI IMG server: http://img.jgi.doe.gov/cgi-bin/w/main.cgi), eliminating genes that had multiple copies within genomes and genes that appeared in only some genomes within their category (sup. Table 2).

To further control for stochastic variability in marker gene abundances in metagenomes, we normalized each gene to the average hit number of the 35 universal single-copy COGs used by Beszteri et al. (2010) (sup. Table 3):

1. The hit number ($H_m$) of each universal marker gene ($i$) is divided by the gene length ($l_m$):

   $$M(i) = \frac{H_m(i)}{l_m(i)}$$

2. The normalization denominator is calculated as the average of $M(i)$ over the 35 marker genes:

   $$\bar{M} = \frac{1}{35} \sum_{i=1}^{35} M(i)$$

3. The hit number ($H_p$) of each photic gene is divided by the gene length ($l_p$):

   $$\frac{H_p}{l_p}$$

4. The normalized abundance ($A$) is calculated by dividing the result of step 3 by the denominator from step 2:

   $$A = \frac{H_p / l_p}{\bar{M}}$$

5. For each functional category (PS I, PS II type 1 RC, type 2 RC), the average abundance was calculated over the genes within it.
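The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function names are hypothetical, the hit counts and gene lengths are invented, and only two marker genes are used instead of the 35 COGs to keep the example short.

```python
from statistics import mean


def per_base_hits(hits, length_bp):
    """BLAST hit count divided by gene length (steps 1 and 3)."""
    if length_bp <= 0:
        raise ValueError("gene length must be positive")
    return hits / length_bp


def marker_average(marker_hits, marker_lengths):
    """Average of M(i) over all marker genes (step 2)."""
    return mean(
        per_base_hits(marker_hits[g], marker_lengths[g]) for g in marker_hits
    )


def normalized_abundance(hits, length_bp, denominator):
    """Normalized abundance A of one gene (step 4)."""
    return per_base_hits(hits, length_bp) / denominator


# Hypothetical hit counts and gene lengths (bp) for a single metagenome.
# These numbers are made up for illustration; they are not from the study.
marker_hits = {"COG0048": 30, "COG0049": 40}
marker_lengths = {"COG0048": 300, "COG0049": 400}
ps1_hits = {"PsaC": 12, "PsaD": 21}
ps1_lengths = {"PsaC": 240, "PsaD": 420}

denominator = marker_average(marker_hits, marker_lengths)
abundances = {
    gene: normalized_abundance(ps1_hits[gene], ps1_lengths[gene], denominator)
    for gene in ps1_hits
}
# Step 5: average abundance over the genes of one functional category.
ps1_abundance = mean(list(abundances.values()))

print(denominator, abundances, ps1_abundance)
```

Because both numerator and denominator are expressed as hits per base pair, a value of A near 1 would indicate roughly one copy of the gene per marker-gene copy, under the assumption stated above that noise factors affect both equally.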
Genes were selected according to their occurrence profile in sequenced genomes (sup. Table 2). The normalization method was validated by measuring the cross-metagenome stoichiometry of PS I and PS II abundance, which is expected to be equal.

Supplementary figure 1: Validation of normalization method. As photosystem I and photosystem II genes appear in bacterial genomes in tandem, a normalized abundance measure is expected to yield a 1:1 ratio between the two.

Supplementary table 2: PS I and PS II genes and their occurrence profile in sequenced genomes.

| Gene | Number of occurrences in sequenced genomes | Number of sequenced genomes featuring gene | Used as marker gene? |
|---|---|---|---|
| photosystem I P700 chlorophyll a apoprotein subunit Ia (PsaA) | 84 | 72 | Yes |
| photosystem I P700 chlorophyll a apoprotein subunit Ib (PsaB) | 86 | 72 | No |
| photosystem I iron-sulfur center subunit VII (PsaC) | 70 | 67 | Yes |
| photosystem I subunit II (PsaD) | 81 | 73 | Yes |
| photosystem I subunit IV (PsaE) | 79 | 71 | Yes |
| photosystem I subunit III precursor, plastocyanin (cyt c553) docking protein (PsaF) | 78 | 74 | Yes |
| photosystem I subunit VI (PsaH) | 14 | 10 | No |
| photosystem I subunit VIII (PsaI) | 58 | 58 | No |
| photosystem I subunit IX (PsaJ) | 71 | 65 | Yes |
| photosystem I subunit X (PsaK, PsaK1) | 103 | 72 | No |
| photosystem I subunit XI (PsaL) | 84 | 74 | Yes |
| photosystem I subunit XII (PsaM) | 52 | 52 | No |
| photosystem II protein D1 (PsbA) | 232 | 75 | No |
| Photosystem II CP47 protein (PsbB) | 87 | 70 | No |
| Photosystem II CP43 protein (PsbC) | 99 | 71 | No |
| photosystem II protein D2 (PsbD) | 126 | 72 | No |
| Cytochrome b559 alpha chain (PsbE) | 73 | 70 | Yes |
| Cytochrome b559 beta chain (PsbF) | 71 | 64 | Yes |
| Photosystem II 10 kDa phosphoprotein (PsbH) | 80 | 70 | Yes |
| Photosystem II protein PsbI | 60 | 60 | Yes |
| Photosystem II protein PsbJ | 66 | 65 | Yes |
| Photosystem II protein PsbK | 72 | 68 | Yes |
| Photosystem II protein PsbL | 60 | 60 | Yes |
| Photosystem II protein PsbM | 61 | 60 | Yes |
| Photosystem II manganese-stabilizing protein (PsbO) | 83 | 73 | Yes |
| Photosystem II oxygen evolving complex protein PsbP | 92 | 74 | No |
| Photosystem II protein PsbT | 57 | 57 | No |
| Photosystem II 12 kDa extrinsic protein (PsbU) | 54 | 50 | No |
| Photosystem II protein PsbV, cytochrome c550 | 67 | 52 | No |
| Photosystem II 13 kDa protein Psb28 (PsbW) | 87 | 81 | Yes |
| Photosystem II protein PsbX | 53 | 52 | No |
| Photosystem II protein PsbY | 75 | 68 | Yes |
| Photosystem II protein PsbZ | 72 | 69 | Yes |
| Photosystem II protein Psb27 | 73 | 71 | Yes |

Supplementary figure 2: Correlation between gene abundances in respective DNA and mRNA samples from the Plymouth Marine Lab Coastal Waters project. Points represent average abundance across samples. Standard error of the mean is shown.

Proteorhodopsin

Supplementary table 3: List of clusters of orthologous groups used for normalization

| COG id | Function |
|---|---|
| COG0012 | Predicted GTPase, probable translation factor |
| COG0016 | Phenylalanyl-tRNA synthetase alpha subunit |
| COG0048 | Ribosomal protein S12 |
| COG0049 | Ribosomal protein S7 |
| COG0052 | Ribosomal protein S2 |
| COG0080 | Ribosomal protein L11 |
| COG0081 | Ribosomal protein L1 |
| COG0085 | DNA-directed RNA polymerase, beta subunit/140 kD subunit |
| COG0087 | Ribosomal protein L3 |
| COG0088 | Ribosomal protein L4 |
| COG0090 | Ribosomal protein L2 |
| COG0091 | Ribosomal protein L22 |
| COG0092 | Ribosomal protein S3 |
| COG0093 | Ribosomal protein L14 |
| COG0094 | Ribosomal protein L5 |
| COG0096 | Ribosomal protein S8 |
| COG0097 | Ribosomal protein L6P/L9E |
| COG0098 | Ribosomal protein S5 |
| COG0099 | Ribosomal protein S13 |
| COG0100 | Ribosomal protein S11 |
| COG0102 | Ribosomal protein L13 |
| COG0103 | Ribosomal protein S9 |
| COG0124 | Histidyl-tRNA synthetase |
| COG0184 | Ribosomal protein S15P/S13E |
| COG0185 | Ribosomal protein S19 |
| COG0186 | Ribosomal protein S17 |
| COG0197 | Ribosomal protein L16/L10E |
| COG0200 | Ribosomal protein L15 |
| COG0201 | Preprotein translocase subunit SecY |
| COG0256 | Ribosomal protein L18 |
| COG0495 | Leucyl-tRNA synthetase |
| COG0522 | Ribosomal protein S4 and related proteins |
| COG0525 | Valyl-tRNA synthetase |
| COG0533 | Metal-dependent proteases with possible chaperone activity |
| COG0541 | Signal recognition particle GTPase |

References

1. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM et al. (2008) The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9: 386.
2. Beszteri B, Temperton B, Frickenhaus S, Giovannoni SJ. (2010) Average genome size: a potential source of bias in comparative metagenomics. ISME J 4: 1075.
3. Raes J, Korbel JO, Lercher M, von Mering C, Bork P. (2007) Prediction of effective genome size in metagenomic samples. Genome Biol. 8: R10.