Supplementary Information for “The effect of microbial colonization on the host proteome varies by gastrointestinal location”

Authors and Affiliations:
Joshua S. Lichtman1, Emily Alsentzer2, Mia Jaffe3, Daniel Sprockett4, Evan Masutani5, Elvis Ikwa5, Gabriela K. Fragiadakis4, David Clifford6, Bevan Emma Huang7, Justin L. Sonnenburg4,^, Kerwyn Casey Huang4,5,^, Joshua E. Elias1,^

1Department of Chemical and Systems Biology, Stanford University School of Medicine, 318 Campus Drive, Stanford, California 94305, USA
2Department of Computer Science, Stanford University, 450 Serra Mall, Stanford, California 94305, USA
3Department of Genetics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA
4Department of Microbiology and Immunology, Stanford University School of Medicine, 299 Campus Drive, Stanford, California 94305, USA
5Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, California 94305, USA
6The Climate Corporation, San Francisco, CA 64103, USA
7Digital Productivity Flagship, Commonwealth Scientific and Industrial Research Organization, Dutton Park QLD 4102, Australia

Supplemental Figure 1. Hierarchical clustering of protein abundance by GI region. Clustering of the abundance of all 853 proteins was performed using Euclidean distance and average linkage metrics. Boxes around clusters identify regions of the gut that are enriched in that particular cluster. Samples are labelled with the mouse, colonization state, and region with the labels of samples that clustered correctly based on location colored black.

Supplemental Figure 2. Protein function groups by GI region. PCA of the summed abundance of 1,520 GO terms across all 45 luminal samples.

Supplemental Figure 3. GO term abundance clustered by gut location. Hierarchical clustering of 1,520 GO terms using Euclidean distance and average linkage metrics.
Boxes outline clusters that are composed mainly of one or two GI regions. Samples are labelled with the mouse, colonization state, and region with the labels of samples that clustered correctly based on location colored black.

Supplemental Table Legends

Supplemental Table 1. List of identified proteins. The Uniprot identifiers and annotations for the 853 proteins identified in this analysis.

Supplemental Table 2. Abundance of luminal proteins. The average and standard deviation of the most abundant host proteins identified in this analysis.

Supplemental Table 3. The core mouse luminal proteome. Eighteen proteins were identified in every sample, regardless of colonization state, GI region, or biological replicate.

Supplemental Table 4. Random forest classifiers effectively predict locations. Top 10% of proteins selected for random forest based on all mice (All), the B. thetaiotaomicron and germ-free mice (BT/GF), or the conventionally raised mice (CR). Proteins are ordered by importance as measured by mean decrease in Gini score (MDG) in the random forest model, and mean relative abundance of each protein (MRA) is given within each subpopulation.

Supplemental Table 5. Top 10% of GO terms selected for random forest for four locations based on all mice (All), the B. thetaiotaomicron and germ-free mice (BT/GF), or the conventionally raised mice (CR). GO terms are ordered by importance as measured by mean decrease in Gini score (MDG) in the random forest model, and mean relative abundance of each GO term (MRA) is given within each subpopulation.

Supplemental Table 6. Proteins identified in each k-means cluster for each colonization state.

Supplemental Table 7. Proximal colon samples from germ-free mice fall within the variability of stool experiments.
Spearman correlation coefficients of protein abundance measured in proximal colon luminal contents from germ-free mice and fecal contents from two independent sets of germ-free mice.

Supplemental Table 8. Proximal colon luminal samples from conventional mice fall outside the variability of stool experiments. Spearman correlation coefficients of protein abundance measured in proximal colon luminal contents from conventional mice and fecal samples from two independent sets of conventional mice.
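
For readers unfamiliar with the statistic reported in Supplemental Tables 7 and 8, the following is a minimal sketch of how a Spearman correlation coefficient between two abundance profiles can be computed. It assumes each profile is a list of abundances for the same proteins in the same order. The function names and the example values are hypothetical and are not taken from the study, and this is not necessarily the software used to produce the tables.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical abundances for five proteins (illustrative values only).
proximal_colon = [120.0, 45.0, 45.0, 3.0, 0.5]
stool = [200.0, 30.0, 60.0, 1.0, 2.0]
rho = spearman(proximal_colon, stool)
print(f"Spearman rho = {rho:.3f}")
```

Because the coefficient depends only on ranks, it compares the ordering of protein abundances between two samples rather than their absolute values, which is one reason it suits comparisons between different sample types such as luminal contents and stool.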