Supplementary Methods

Marker Genes for Phage Taxa

Phage Orthologous Groups (POGs) were constructed using the proteins contained in over 1000 phage genomes, including ssDNA, dsDNA, ssRNA, and dsRNA phages, and archaeal viruses (Kristensen et al. 2013). Briefly, proteins were clustered into orthologous groups with the standard COG construction algorithm, which builds groups from triangles of 3-way reciprocal best matches (Kristensen, Kannan, et al. 2010). Subsequently, a Viral Quotient (VQ) was calculated for each POG, indicating its specificity to phage genomes relative to non-prophage regions of bacterial chromosomes. Taxon-specific marker genes were then identified as those that are never found in other viral taxa (i.e. 100% precision) and are not found in non-prophage regions of bacterial chromosomes (i.e. VQ greater than 85%). These markers are therefore suitable for detecting phage taxa in mixed samples of prokaryotic cells and viruses.

Note that some marker genes have recall <100% (i.e. fewer than 100% of the genomes in that taxon contain the marker gene), so the absence of a detectable marker gene does not reliably indicate the absence of that type of phage in the sample. For this reason, markers with recall <85% were not used for the abundance calculations. The presence of a phage in a given sample was determined by the detection of at least one of these marker genes, and the abundance of that phage was calculated as the average number of matches among the set of all marker genes that are specific to that taxon. Markers with recall >=85% that are present in at most a single copy per virus genome (such that one match to the gene is approximately equal to one match to the virus) were used for abundance calculations and are labelled as "Quantitative" in the list of marker genes in Supplementary Table S1.
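The selection and abundance rules described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Marker` record, its field names, the POG identifiers, and the example counts are all hypothetical, and precision, VQ, and recall are assumed to be stored as fractions. Only the thresholds (100% precision, VQ > 85%, recall >= 85%, at most one copy per genome) come from the text. Markers with no matches are assumed to contribute zero to the average.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Marker:
    pog_id: str       # hypothetical POG identifier
    taxon: str        # phage taxon the POG is a candidate marker for
    precision: float  # fraction of occurrences that fall within the taxon
    vq: float         # Viral Quotient, as a fraction
    recall: float     # fraction of genomes in the taxon carrying the POG
    max_copies: int   # maximum copies observed per virus genome


def is_marker(m: Marker) -> bool:
    """Taxon-specific marker: 100% precision and VQ above 85%."""
    return m.precision == 1.0 and m.vq > 0.85


def is_quantitative(m: Marker) -> bool:
    """Marker usable for abundance: recall >= 85% and single copy."""
    return is_marker(m) and m.recall >= 0.85 and m.max_copies <= 1


def taxon_present(markers: List[Marker], counts: Dict[str, int], taxon: str) -> bool:
    """A taxon is called present if any of its markers has at least one match."""
    return any(
        counts.get(m.pog_id, 0) > 0
        for m in markers
        if m.taxon == taxon and is_marker(m)
    )


def taxon_abundance(
    markers: List[Marker],
    counts: Dict[str, int],
    taxon: str,
    quantitative_only: bool = True,
) -> Optional[float]:
    """Average number of matches over the selected markers of a taxon.

    Returns None when the taxon has no usable marker.
    """
    keep = is_quantitative if quantitative_only else is_marker
    values = [
        counts.get(m.pog_id, 0)  # assumption: unmatched markers count as 0
        for m in markers
        if m.taxon == taxon and keep(m)
    ]
    return mean(values) if values else None


# Hypothetical example data
markers = [
    Marker("POG_A", "P2-like", 1.0, 0.95, 0.90, 1),
    Marker("POG_B", "P2-like", 1.0, 0.90, 0.60, 1),  # recall too low for abundance
    Marker("POG_C", "P2-like", 0.9, 0.99, 1.00, 1),  # not taxon-specific
]
counts = {"POG_A": 12, "POG_B": 4, "POG_C": 100}

print(taxon_present(markers, counts, "P2-like"))
print(taxon_abundance(markers, counts, "P2-like"))
print(taxon_abundance(markers, counts, "P2-like", quantitative_only=False))
```

In this made-up example only POG_A qualifies as Quantitative, so the quantitative abundance is 12, whereas averaging over both taxon-specific markers (POG_A and POG_B) gives 8. POG_C is ignored in both cases because its precision is below 100%.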
To obtain the relative rank information in Figure 1(a), abundance was calculated similarly but using all markers (rather than only the Quantitative ones). The precise values were then discarded, keeping only the qualitative information about the rank of each phage's abundance relative to the other phages present in the same sample.

Note: The abundance calculated using the Picovirinae subfamily marker gene is much higher than the abundance of the only Picovirinae genus with an available taxonomic marker gene (Phi29-like viruses); for most samples, the Phi29-like abundance is less than 1% of the Picovirinae abundance. Most likely, the Picovirinae abundance in excess of that of Phi29-like viruses represents other Picovirinae, such as Ahjd-like viruses and unclassified or unidentified members.

Calculation of Bacterial Taxon Abundances

The MOCAT pipeline was used to calculate the abundance of each bacterial taxon. Briefly, metagenomic reads were mapped to a set of 10 universal marker genes from 3,496 reference prokaryotic genomes at 97% nucleotide identity. Initially, only uniquely mapped reads were retained. Reads that mapped to multiple marker genes were then distributed among the respective genomes in proportion to the abundances determined from the uniquely mapped reads. Abundance was calculated as the length-normalized coverage per base pair.

Identification of Prophage Host Interaction Network

To avoid detecting false relationships, the criteria for classifying the bacterial host were rather strict. Firstly, the nucleotide sequences of the scaftigs upstream and downstream of the prophage regions were extracted and classified using PhyloPythia (McHardy et al. 2007). The non-prophage bacterial chromosome was then assigned to a taxon at the level at which the upstream and downstream classifications agreed with each other.
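The agreement rule for the two flanking classifications can be illustrated with a short Python sketch. It is only an assumption-laden illustration: the classifier's actual output format is not reproduced here, lineages are represented as hypothetical root-to-tip tuples aligned to a fixed list of ranks, and the function name `agreed_taxon` is invented for this example.

```python
from typing import Optional, Sequence, Tuple

# Assumed rank order, from least to most specific
RANKS = ("superkingdom", "phylum", "class", "order", "family", "genus", "species")


def agreed_taxon(
    upstream: Sequence[Optional[str]],
    downstream: Sequence[Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Return (rank, name) for the most specific rank at which both
    flanking lineages agree, or None if they disagree at the root."""
    agreed = None
    for rank, up, down in zip(RANKS, upstream, downstream):
        # Stop at the first unassigned or conflicting rank
        if up is None or down is None or up != down:
            break
        agreed = (rank, up)
    return agreed


# Hypothetical flanking classifications that diverge at genus level
up = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
      "Enterobacteriales", "Enterobacteriaceae", "Escherichia")
down = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
        "Enterobacteriales", "Enterobacteriaceae", "Salmonella")

print(agreed_taxon(up, down))
```

Here the two flanks agree down to family but not genus, so the host would be assigned at the family level. A lineage that is shorter than the rank list, or that contains `None` at some rank, simply limits how specific the assignment can be.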
Secondly, BLASTN was performed with the upstream and downstream sequences against a database of bacterial reference genomes, requiring a bit score greater than 60 and nucleotide identity greater than 85%. The last common ancestor of all hits was determined for each flanking sequence, and the last common ancestor of the upstream and downstream sequences was then used as the identity of the bacterial host. If a scaffold was classified using both methods, the most specific classification was used. None of the scaffolds classified by both methods received conflicting classifications.

Of the 463 refG-prophages that occur in the gut, 47 (10%) contained at least one taxon-specific marker gene. From these we constructed a network of phage taxa that infect specific bacterial genomes (Table S3). To simplify the network diagram, the phage-bacterial associations derived from the reference genomes were summarized at the level of bacterial genus. For example, the 2 associations of P2-like phages with Escherichia coli HS and with Escherichia coli 53638 are summarized as 1 connection between P2-like phages and the genus Escherichia.

Of the 2518 unique scaftig-prophages, 683 (27%) contain at least one marker POG. However, only 363 scaffolds (14%) had a reliable taxonomic classification of the surrounding host genomic region, resulting in an intersection of 79 for which both the prophage region and the host chromosome had a taxonomic classification.
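The genus-level summarization of reference-genome associations described above can be sketched as follows. This is a hedged illustration rather than the procedure actually used: the function name `summarize_by_genus` is hypothetical, and the genus is assumed to be the first word of the host genome name, which holds for standard binomial names but may not hold for every entry in a real genome table.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_by_genus(
    associations: Iterable[Tuple[str, str]],
) -> Dict[Tuple[str, str], int]:
    """Collapse (phage taxon, host genome name) pairs into unique
    (phage taxon, host genus) edges, counting the genome-level
    associations that support each edge."""
    edges: Counter = Counter()
    for phage_taxon, host_name in associations:
        genus = host_name.split()[0]  # assumption: genus is the first word
        edges[(phage_taxon, genus)] += 1
    return dict(edges)


# The example given in the text: two genome-level associations, one edge
associations = [
    ("P2-like phages", "Escherichia coli HS"),
    ("P2-like phages", "Escherichia coli 53638"),
]

for (phage_taxon, genus), support in summarize_by_genus(associations).items():
    print(phage_taxon, genus, support)
```

The two Escherichia coli associations collapse into a single edge between P2-like phages and Escherichia with a support count of 2. Keeping the count alongside the edge is a design choice made for this sketch, so that the simplification does not discard how many genomes back each connection.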