Supplementary Text

Data collection and beta diversity of microbial communities from global AMD and associated environments

We searched the Web of Science for molecular inventory studies that used a 16S rRNA clone library approach to explore microbial communities in natural AMD and associated environments (such as acidic biofilms, sediments and tailings) distributed globally. Studies that retrieved sequences from artificial and semi-artificial acidic environments, such as enrichments, bioreactors, and bioleaching and wetland systems, were excluded from the meta-analysis. The 16S rRNA clone sequences were identified and retrieved from GenBank. Samples with detailed information on the OTUs and their relative recovery in the clone library, or on the overall community composition with the relative abundance of each lineage (i.e., phylum, or sub-phylum for the Proteobacteria), were retained for further analyses. Site locations and the environmental parameters that were recorded in most of the studies, including pH, temperature and the concentrations of sulfate and total iron (ferric and ferrous iron), were summarized. It should be noted that we did not aim to conduct an exhaustive literature search but rather to reveal broad community patterns in these typical acidic environments through a meta-analysis of representative studies.

We aligned representative sequences of the OTUs from each sample using NAST (DeSantis et al., 2006). Well-aligned sequences were imported into the ARB program (Ludwig et al., 2004) and added, using the parsimony insertion tool, to a tree with >200 000 sequences available at the Greengenes website (http://greengenes.lbl.gov). The reconstructed phylogenetic tree was uploaded to the Fast UniFrac web interface (Hamady et al., 2010) to implement the weighted UniFrac analysis. The community composition of each sample was determined from the taxonomic classification (RDP Classifier) of the OTUs and their relative recovery in the original clone libraries.
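To illustrate the kind of calculation that a weighted UniFrac analysis performs, the following minimal Python sketch computes the raw (non-normalized) weighted UniFrac distance between two clone libraries on a small rooted tree. It is not the Fast UniFrac implementation used here, and whether the raw or the normalized variant was applied is not stated in this text, so the raw form is an assumption. The tree, the node names (otu1, otu2, otu3, X, root), the branch lengths, and the clone counts are all hypothetical.

```python
from collections import defaultdict


def weighted_unifrac(parent, counts_a, counts_b):
    """Raw weighted UniFrac: sum over branches of
    branch_length * |fraction of A below branch - fraction of B below branch|.

    parent: dict mapping each non-root node to (parent_node, branch_length)
    counts_a, counts_b: dicts mapping tip (OTU) names to clone counts
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample needs at least one sequence")

    # Fraction of each sample's clones that descend from each branch.
    frac_a = defaultdict(float)
    frac_b = defaultdict(float)
    for counts, total, frac in ((counts_a, total_a, frac_a),
                                (counts_b, total_b, frac_b)):
        for tip, n in counts.items():
            node = tip
            while node in parent:  # walk from the tip up to the root
                frac[node] += n / total
                node = parent[node][0]

    return sum(length * abs(frac_a[node] - frac_b[node])
               for node, (_, length) in parent.items())


# Hypothetical toy tree: root -> X -> (otu1, otu2), and root -> otu3.
parent = {
    "otu1": ("X", 0.2),
    "otu2": ("X", 0.3),
    "X": ("root", 0.5),
    "otu3": ("root", 1.0),
}

# Hypothetical clone counts per OTU in two samples.
sample_a = {"otu1": 3, "otu2": 1}
sample_b = {"otu2": 2, "otu3": 2}

print(weighted_unifrac(parent, sample_a, sample_b))
```

For these toy inputs the distance is about 0.975, and the distance between a sample and itself is zero. Because each branch is weighted by the difference in relative abundance, the measure depends on the relative recovery of OTUs in the clone libraries and not only on their presence or absence.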
References

DeSantis TZ Jr, Hugenholtz P, Keller K, Brodie EL, Larsen N, Piceno YM et al. (2006). NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res 34: W394-W399.

Hamady M, Lozupone C, Knight R. (2010). Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data. ISME J 4: 17-27.

Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar et al. (2004). ARB: a software environment for sequence data. Nucleic Acids Res 32: 1363-1371.