Supplementary Figures

[Figure S1 plot: one point per phage taxon; x-axis: total log10 sum abundance per Gbp across the samples; y-axis: phage taxa.]
Figure S1 – Total Sum Abundance of all Phage Taxa Across 252 Samples. The asterisk beside Mu-like and Lambda-like viruses indicates that their marker genes do not detect all of the genomes in the genera (i.e., recall < 85%) and thus do not accurately reflect the abundance of those taxa.

[Figure S2 plot: one point per phage taxon; x-axis: Percentage of Samples Taxa is Present In (0 to 100); y-axis: phage taxa; points are grouped as ubiquitous, moderately prevalent, and rare taxa.]
Figure S2 – Prevalence of all Phage Taxa in the 252 Samples. The asterisk beside Mu-like and Lambda-like viruses indicates that their marker genes do not detect all of the genomes in the genera (i.e., recall < 85%) and thus do not accurately reflect the abundance of those taxa.

[Figure S3 plot: y-axis: Number of Non-overlapping Phage Taxa; x-axis groups: Denmark, Spain, USA.]
Figure S3 – Number of Non-overlapping Phage Taxa Per Sample, Grouped by Country of Residence of the Subject. Subfamilies with representative genera were not included in this analysis.
For example, Autographivirinae was not included as a taxon because it is represented by the T7-like and SP6-like genera.

[Figure S4 plot: one point per phage taxon; x-axis: median log10 abundance per Gbp in each sample; y-axis: phage taxa.]
Figure S4 – Median Abundance Per Sample of All Phage Taxa Detected. The asterisk beside Mu-like and Lambda-like viruses indicates that their marker genes do not detect all of the genomes in the genera (i.e., recall < 85%) and thus do not accurately reflect the abundance of those taxa.

[Figure S5 plots: four panels; x-axis: phage taxa (rank); y-axis: log10 abundance per Gbp.]
Figure S5 – Example Rank Abundance Curves for 4 Random Samples. The y-axis shows the log10 abundance of each taxon and the x-axis shows the rank of the taxon. Only non-overlapping taxa are shown, as explained for Figure S3.

[Figure S6 plot: x-axis: number of samples (1 to 252); y-axis: number of scaftig prophage regions (0 to 2500); legend: present, PtoH>5, PtoH>10.]
Figure S6 – Number of Scaftig-predicted Prophages Identified.
The y-axis indicates the number of predicted prophages detected, and the x-axis indicates the number of samples in which the prophage was detected. The orange and red points indicate prophages that were deemed to be active based on their PtoH ratio.

[Figure S7 plot: x-axis: number of samples the reference genome prophage is present in (1 to 252); y-axis: number of refGenome prophage regions (0 to 450); legend: present, PtoH>5, PtoH>10.]
Figure S7 – Number of refG-predicted Prophages Identified. The y-axis indicates the number of predicted prophages detected, and the x-axis indicates the number of samples in which the prophage was detected. The orange and red points indicate prophages that were deemed to be active based on their PtoH ratio.