Fig. S2. Unrooted neighbor-joining protein tree of proteolytic enzymes obtained from karst water biofilm and tufa (A). The tree includes representative members of serine protease subfamilies S1B, S1C, and S8A retrieved from the MEROPS database (Rawlings et al. 2012). Amino acid sequences of published proteases were retrieved from GenBank, and accession numbers are given in brackets. Branch lengths indicate the number of amino acid substitutions per site. Evolutionary analyses were conducted using MEGA5 (Tamura et al. 2011). Bootstrap values were calculated from 1,000 resamplings. Characteristics of protease-encoding genes pwb1 to pwb5 and the corresponding gene products (B).

Construction of fosmid libraries and screening for proteolytic enzymes

Bacterial strains and vectors used in the present study are shown in Supplementary Table S1. Escherichia coli strains were routinely grown in Luria-Bertani (LB) medium at 37°C. E. coli strain EPI300-T1R (Epicentre Biotechnologies, Madison, WI, USA) was used as a host for the cloning of metagenomic DNA. In addition, E. coli strain TOP10 (Invitrogen) was employed for subcloning of the target genes. The large-insert metagenomic fosmid libraries WB3, WB4, WB5 and WB5tufa (see Supplementary Table S2) were constructed by using the CopyControl Fosmid Library Production kit (Epicentre Biotechnologies) as described by Nacke et al. (2011b). This resulted in approximately 36,800 library-containing clones (approximately 9,200 per sample), which were arrayed and stored in 96-well microtiter plates. For activity-based screening, the arrayed library-containing E. coli clones were streaked on LB agar plates containing skim milk (2% [wt/vol]) as indicator substrate. To maintain the recombinant fosmids and to increase their copy number, the indicator agar contained 12.5 mg chloramphenicol L⁻¹ and 0.001% arabinose, respectively.
Clones exhibiting proteolytic activity were identified by the formation of halos on the indicator agar after incubation for 1 to 14 days. Initially, 37 proteolytic clones were detected and checked for unique fosmid inserts by restriction analysis with BamHI (Fermentas, St. Leon-Rot, Germany) as recommended by the manufacturer.

Sequence analysis of proteolytic activity-conferring genes

To avoid time-consuming sequencing of complete fosmid inserts and to enable rapid detection of proteolytic activity-conferring genes, the target genes were subcloned. The recombinant fosmids from positive clones were isolated and sheared by sonication for 3 s at 30% amplitude, cycle 0.5 (UP200S Sonicator, Dr. Hielscher GmbH). Subsequently, the resulting DNA fragments were separated by agarose gel electrophoresis. Appropriate fragments (1.5 to 3.5 kbp) were excised and purified from gels by using the peqGold gel extraction kit (Peqlab Biotechnologie GmbH). The purified DNA fragments were ligated into pCR-2.1-TOPO (Invitrogen), and the ligation products were used to transform E. coli TOP10 (Invitrogen) as recommended by the manufacturer. The resulting recombinant E. coli strains were re-screened on the indicator agar. The recombinant plasmids derived from positive clones were sequenced. ORFs located on the plasmid inserts were initially predicted by using the ORFfinder program (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) provided by the National Center for Biotechnology Information (NCBI) and finally annotated with Artemis (Rutherford et al. 2000). The results were verified and improved manually by using criteria such as the presence of a ribosome-binding site, GC frameplot analysis, and similarity to known genes. The identified proteases were classified by BLAST searches implemented in the MEROPS peptidase database (http://merops.sanger.ac.uk) (Rawlings et al. 2012). Similarity searches for related protein sequences were performed by employing BLAST against the GenBank database.
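The initial ORF prediction step described above can be illustrated with a minimal six-frame scan. This is only a sketch of the general idea and not the algorithm used by the NCBI ORFfinder program: the function names (`reverse_complement`, `find_orfs`), the restriction to ATG start codons, and the `min_codons` threshold of 100 are assumptions made for illustration.

```python
# Minimal six-frame ORF scan (illustrative sketch; NOT the NCBI ORFfinder algorithm).
# Assumptions: only ATG is treated as a start codon, the standard stop codons
# are used, and the default length threshold is arbitrary.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Return a list of (strand, start, end, n_codons) tuples.

    Each ORF runs from the first ATG after the previous stop codon to the
    next in-frame stop codon. Coordinates are 0-based and end-exclusive on
    the scanned strand (for '-', on the reverse complement); n_codons
    counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    # Toy insert: ATG followed by five GCT codons and a TAA stop.
    toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
    print(find_orfs(toy, min_codons=6))
```

Candidates produced by such a scan would still require the manual checks named above (ribosome-binding site, GC frameplot analysis, similarity to known genes) before being accepted as genes.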
Domain structures were analyzed by submitting protein sequences to InterProScan (Quevillon et al. 2005). Multiple alignments of deduced protein sequences and construction of a phylogenetic tree were performed by employing MEGA5 (Tamura et al. 2011).

Analyses of novel proteases

The functional analysis revealed that protein metabolism, including protein degradation, is of importance in the biofilm community. The screening of approximately 36,800 clones yielded 37 positive clones conferring a stable proteolytic phenotype. Finally, five unique proteolytic activity-conferring genes (pwb1 to pwb5) were identified. The deduced protein sequences showed 50 to 87% amino acid identity to sequences of known proteases (Figure S2B). Classification of the deduced amino acid sequences according to the MEROPS protease database (Rawlings et al. 2012) revealed that the enzymes belong to families S1 (chymotrypsin family) and S8 (subtilisin family) of the serine proteases. Proteases of family S1 are endoproteases (Rawlings et al. 2012), which are often characterized by the presence of an N-terminal signal peptide and a propeptide. Interestingly, Pwb1 belongs to subfamily S1B and the corresponding gene encodes no signal peptide, which has also been observed for a protease (CBM43238) detected in a study focused on screening for proteases in different environmental samples from Germany (Niehaus et al. 2011). Pwb2 belongs to subfamily S1C and is a putative periplasmic enzyme with a signal peptide; it showed highest identity to a hypothetical protein of eukaryotic origin. Pwb3 and Pwb5 grouped into subfamily S8A, which is represented by the endoprotease subtilisin Carlsberg. All deduced amino acid sequences except Pwb2 exhibited highest similarities to cyanobacterial proteases (Figure S2A). Pwb2 showed highest identity (50%) to a hypothetical protein of the fern Selaginella moellendorffii.
Due to the low similarity of Pwb2 to the related database entry, it is likely that this gene originated from cyanobacteria or chloroplasts. The cyanobacteria-related proteases (Pwb1, Pwb3, Pwb4 and Pwb5) were affiliated with filamentous Oscillatoriales (Leptolyngbya sp. PCC 7375 and Oscillatoria nigro-viridis) and are probably derived from members of the highly abundant TPM clade detected by 16S rRNA gene analysis of the biofilm (Figure 5A).

References

Nacke H, Will C, Herzog S, Nowka B, Engelhaupt M, Daniel R. 2011b. Identification of novel lipolytic genes and gene families by screening of metagenomic libraries derived from soil samples of the German Biodiversity Exploratories. FEMS Microbiol Ecol 78: 188-201.
Niehaus F, Gabor E, Wieland S, Siegert P, Maurer KH, Eck J. 2011. Enzymes for the laundry industries: tapping the vast metagenomic pool of alkaline proteases. Microb Biotechnol 4: 767-776.
Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R et al. 2005. InterProScan: protein domains identifier. Nucleic Acids Res 33: W116-120.
Rawlings ND, Barrett AJ, Bateman A. 2012. MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res 40: D343-350.
Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA et al. 2000. Artemis: sequence visualization and annotation. Bioinformatics 16: 944-945.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28: 2731-2739.