File S3 - Genetics

Supplemental Materials and Methods

Motif Analysis

The Mit1 motif (Figure S1a) was developed using the Motif Finder utility in MochiView v1.45 with 500 bp of sequence centered on each extracted Mit1 peak. The S288c genome was used as the Sequence Set, a location set consisting of promoters from S288c was used for the Background Model Setting, and the desired motif width was set to 12 bp. For an initial pass, only the lower 50 Mit1 binding sites (ranked by enrichment) were used to develop the motif, with the top 24 sites held back. Motifs developed using this approach were then analyzed for significance of enrichment using the Motif Enrichment Plot function, with the withheld 24 binding sites as the Query Set and a set of 1,985 500-bp regions randomly taken from non-Mit1-bound promoters as the Control Set. The 1,985-member control set was developed using a previously described approach (LOHSE et al. 2010). Motifs were compared based on their ability to explain the 24 withheld Mit1 sites with minimal false positives in the control set; the other motifs developed using this approach were less able to predict Mit1 binding in the withheld data set.

A second Mit1 motif (Figure S1b) was derived using all 74 Mit1-bound locations. This motif was then compared to the initial version using MochiView’s Motif Comparison feature (based on GUPTA et al. 2007). The two motifs match with an E-value of 0, indicating that they are very unlikely to be different (Figure S1c). The initial Mit1 motif was also compared to the previously reported Wor1 motif (LOHSE et al. 2010); the E-value of 0.0033 indicates that these two motifs are also likely to be identical.

The Mit1 motif in Figure S1a can explain 50% of the 24 withheld binding sites at a stringency resulting in a 7.2% false positive rate, and 70.8% of withheld sites with a 20.4% false positive rate (Figure S1e).
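The trade-off reported above (fraction of withheld sites explained versus false positive rate in the control set) can be illustrated with a minimal sketch. It assumes each region has already been reduced to a single best motif score; the function names, the scores, and the scoring scale are hypothetical and are not taken from MochiView.

```python
from typing import List, Sequence, Tuple


def hit_fraction(best_scores: Sequence[float], threshold: float) -> float:
    """Fraction of regions whose best motif score meets the threshold."""
    if not best_scores:
        raise ValueError("need at least one region")
    return sum(score >= threshold for score in best_scores) / len(best_scores)


def enrichment_curve(
    query: Sequence[float], control: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """For each distinct query score (strictest first), return
    (threshold, fraction of query explained, fraction of control hit)."""
    thresholds = sorted(set(query), reverse=True)
    return [(t, hit_fraction(query, t), hit_fraction(control, t)) for t in thresholds]


# Hypothetical best-scores: 6 withheld (query) regions, 10 control regions.
query_scores = [9.1, 8.4, 7.7, 6.0, 3.2, 2.5]
control_scores = [8.0, 5.5, 4.1, 3.0, 2.0, 1.5, 1.2, 1.0, 0.8, 0.5]

for threshold, explained, false_positive in enrichment_curve(query_scores, control_scores):
    print(f"score >= {threshold}: {explained:.1%} of query, {false_positive:.1%} of control")
```

Lowering the threshold explains more of the query set at the cost of more control hits, which is the relationship summarized by the two operating points quoted for each motif.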
Looking at all 74 binding sites, the motif from Figure S1a can explain 51.4% of binding sites with an 8.7% false positive rate and 70.3% of sites with a 20.3% false positive rate (Figure S1f). The motif developed using all 74 binding sites (Figure S1b) can explain 51.4% of binding sites with a 7.6% false positive rate and 70.3% of sites with a 17.7% false positive rate (data not shown).

Expression Microarrays

S. cerevisiae strains were grown in 25 mL cultures in rich medium (YEPD) to mid-log phase, as previously described (TUCH et al. 2008). Cultures were centrifuged for 5 min at 3,700 g, the supernatant was removed, and pellets were frozen in liquid nitrogen and stored at −80 °C. RNA was isolated and reverse transcribed as previously described (MITROVICH et al. 2007), except for the use of SuperScript II (Invitrogen). An equivalent volume of Cy3 (pooled reference) or Cy5 (individual sample) dye (Amersham) was added and the reaction was incubated in the dark at 65 °C for 20 min. Labelled cDNAs were purified with a Clean and Concentrator-5 kit (Zymo Research). Equal amounts of the Cy3-labelled and Cy5-labelled cDNA were hybridized overnight to Agilent microarrays containing probes from the Yeast v2 expression array 015072. After hybridization, the arrays were washed as specified by Agilent. Arrays were scanned at 5 μm, averaging two lines, with an Axon GenePix 4000A scanner. Arrays were gridded with GenePix Pro v5.1.

Global Lowess normalization was performed for each array with a Goulphar script (LEMOINE et al. 2006) for R (R Foundation for Statistical Computing). Normalized data were collapsed first by averaging the results for all duplicate probes and then by taking the median of the probes for each ORF. Data were transformed as described for each experiment. Microarray data have been uploaded to NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE32558 (transcription arrays and ChIP-chip data) and GSE32550 (transcription arrays only).
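The two-step collapse of normalized probe values described above can be sketched as follows. This is an illustration only, not the original analysis script; the row layout (ORF, probe identifier, normalized value) and the ORF and probe names are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, Tuple


def collapse_to_orfs(rows: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Collapse normalized probe values to one value per ORF.

    Step 1: average all duplicate spots of the same probe.
    Step 2: take the median of the per-probe averages for each ORF.
    """
    per_probe = defaultdict(list)
    for orf, probe, value in rows:
        per_probe[(orf, probe)].append(value)

    per_orf = defaultdict(list)
    for (orf, _probe), values in per_probe.items():
        per_orf[orf].append(mean(values))

    return {orf: median(values) for orf, values in per_orf.items()}


# Hypothetical normalized log ratios; names are placeholders.
rows = [
    ("ORF1", "probeA", 1.0),
    ("ORF1", "probeA", 2.0),   # duplicate spot of probeA
    ("ORF1", "probeB", 0.5),
    ("ORF1", "probeC", 3.0),
    ("ORF2", "probeD", -1.0),
]
collapsed = collapse_to_orfs(rows)
print(collapsed)
```

Averaging duplicates before taking the median keeps a probe that happens to be spotted many times from dominating the per-ORF value.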
Chromatin Immunoprecipitation

Normalized enrichment values were determined for every probe on the microarray by LOWESS normalization using Agilent Chip Analytics Version 1.2 software. Display, analysis, and identification of binding events were performed using MochiView Version 1.45 software (http://johnsonlab.ucsf.edu/sj/mochiview-start/) (HOMANN and JOHNSON 2010), where peaks for the GFP-tagged strain (plus GFP antibody) or the wild-type strain (plus custom antibody) were compared to peaks from an untagged reference strain (plus GFP antibody) or the Δmit1 Δyhr177w deletion strain (plus custom antibody).

Binding events were identified by smoothing the two experimental data sets together and the two control data sets together using the “Extract Peaks from Data Set(s)” utility described in detail in the MochiView manual. Briefly, a smoothing function is first applied to the Chip Analytics log2 enrichment values, followed by a peak detection algorithm in which every binding peak is assigned a P-value by permutation testing. For greater confidence, the amount of sampling was increased tenfold from the default settings, to 100,000 random samples compared against each peak and a maximum of 100 random samples passing for inclusion of a peak. For the Mit1-GFP experiments, the user-defined cut-offs for the “minimum value for peak inclusion post-smoothing” were set at 1.5 for the experimental data set and 0.5 for the untagged control data set. These values were set to 0.644 and 0.2, respectively, for the Yhr177w data set and to 0.58 and 0.36, respectively, for the Wor1 data set.

The Mit1-GFP analysis initially identified 80 binding sites. We culled regions located over open reading frames, tRNAs, or ribosomal protein genes, reducing the number to 74 peaks corresponding to 68 intergenic regions.
The Yhr177w analysis initially identified 29 binding sites, which were culled to 21 after removing sites located over open reading frames and applying a normalized enrichment requirement of at least 0.85. An independent analysis of the Yhr177w binding data using the Agilent Chip Analytics software identified several additional potential binding sites. These were hand curated, and three sites with a 1.5-fold (log2) difference in enrichment between both experimental and both control experiments were added to the binding site list. These sites are listed with a significance value of “0” in Table S6. ChIP-chip data have been uploaded to NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE32558 (ChIP-chip data and transcription arrays) and GSE32557 (ChIP-chip data only).

Binding Site Comparisons

Binding data for Sok2, Tec1, and Ste12 were taken from a previously published study (BORNEMAN et al. 2007). All data sets for binding site comparisons were processed in the following manner, using MochiView Version 1.45. Using the “Merge Location Set, Subtraction” function, sequences corresponding to ORFs were subtracted from called binding sites. An intergenic region set containing the modified binding site list was then created using the “Merge Location Set, Union” function with “only keep locations intersected by all contributing location sets” selected. The overlap of the bound intergenic sets for different regulators was then determined using the “Merge Location Set, Intersection” function and assessed for significance using a hypergeometric statistic; values were calculated using the website http://stattrek.com/tables/hypergeometric.aspx.

Hand Annotation of Orthology Mapping

Based on additional information, changes were made to the mapping for three sets of genes prior to further analysis. First, an examination of protein sequence alignments across many fungal species indicates that Mit1 and Yhr177w are clearly orthologs of Wor1. C. albicans Pth2, an ortholog of Schizosaccharomyces pombe Pac2 and many other proteins, has no counterpart in S. cerevisiae because the corresponding gene was lost prior to the whole-genome duplication (WGD). Therefore, Mit1 and Yhr177w are mapped 2:1 to C. albicans Wor1 rather than 2:2 to C. albicans Wor1 and Pth2. Second, SYNERGY mis-annotates the orthology of Tec1 as a 1:4 relationship, calling four C. albicans proteins as possible orthologs. Tec1 has a definitive TEA/ATTS DNA binding domain and has one clear ortholog in C. albicans, known as CaTec1 (SCHWEIZER et al. 2000). This has been reclassified as a 1:1 orthologous relationship. Finally, SYNERGY did not find an ortholog for the S. cerevisiae protein Rox1. C. albicans Rfg1 has previously been shown to be an ortholog of S. cerevisiae Rox1 (KADOSH and JOHNSON 2001), an analysis confirmed by examination of an orthology tree. As a result, this has been reclassified from a 0:0 to a 1:1 relationship. These three hand annotations do not substantially change the results of the bootstrap statistical analysis described in the Materials and Methods section or those described below.

Bootstrap Analysis of Target Overlap

To ensure that our hand annotations did not affect the bootstrap analysis of Mit1/Wor1 target overlap, we repeated the bootstrap analysis on the purely systematic SYNERGY orthology set. Using this orthology set, 13 of the 65 S. cerevisiae orthology-mapped Mit1 targets share orthology with one or more genes in the Wor1 target list. Sets of 65 S. cerevisiae genes were sampled from the group of all S. cerevisiae genes with ortholog mappings (4,524 genes) and tested for whether the number of genes in the set with shared orthology to genes in the Wor1 target list equaled or exceeded the observed 13-gene Mit1-to-Wor1 mapping. No sampled set met this standard, confirming a significant enrichment of Wor1 target orthologs in the Mit1 gene list (n = 1,000,000; p < 1 × 10^-6).
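The sampling test described above can be sketched as a short resampling routine. This is a minimal illustration under stated assumptions, not the original analysis code: genes are drawn without replacement within each sampled set (the text does not say which scheme was used), the gene identifiers are placeholders, and the number of genes flagged as sharing orthology with the target list (150 here) is invented for the toy run.

```python
import random
from typing import Iterable, Set, Tuple


def overlap_p_value(
    population: Iterable[str],
    has_target_ortholog: Set[str],
    set_size: int,
    observed: int,
    n_samples: int,
    seed: int = 0,
) -> Tuple[int, float]:
    """Empirical p-value for an observed overlap.

    Draws n_samples random gene sets of size set_size from population and
    counts how many contain at least `observed` genes that are members of
    has_target_ortholog. Returns (passing sets, passing sets / n_samples).
    """
    rng = random.Random(seed)
    genes = list(population)
    passing = 0
    for _ in range(n_samples):
        sample = rng.sample(genes, set_size)  # assumption: without replacement
        overlap = sum(1 for gene in sample if gene in has_target_ortholog)
        if overlap >= observed:
            passing += 1
    return passing, passing / n_samples


# Toy run: 4,524 placeholder genes, of which a hypothetical 150 are flagged.
population = [f"gene{i}" for i in range(4524)]
flagged = set(population[:150])
passing, p_value = overlap_p_value(
    population, flagged, set_size=65, observed=13, n_samples=10_000
)
print(passing, p_value)
```

When no sampled set reaches the observed overlap, the estimate is bounded by the number of samples rather than equal to zero, which is why a result of zero passing sets out of 1,000,000 is reported as p < 1 × 10^-6.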
A reciprocal test yielded similar results, comparing the benchmark of 11 of 119 Wor1 targets that share orthology with the Mit1 gene list against sets of 119 genes sampled from the 4,393 C. albicans genes with ortholog mappings (n = 1,000,000; p = 1 × 10^-5).

Because the orthogroup mappings are not always 1-to-1, we tested whether the significant overlap was caused by enrichment for targets with large orthogroups rather than by specific enrichment for orthologs in the target group. We repeated the previously described bootstrapping analyses on the SYNERGY orthology set, this time keeping only sampled sets that mapped to at least as many target orthologs as the gene list of interest (the Mit1 target genes mapped to 82 C. albicans orthologs and the Wor1 target genes mapped to 150 S. cerevisiae orthologs). The results remained the same (n = 1,000,000; S. cerevisiae-to-C. albicans p = 3 × 10^-6; C. albicans-to-S. cerevisiae p = 7.6 × 10^-5).

References

BORNEMAN, A. R., Z. D. ZHANG, J. ROZOWSKY, M. R. SERINGHAUS, M. GERSTEIN et al., 2007 Transcription factor binding site identification in yeast: a comparison of high-density oligonucleotide and PCR-based microarray platforms. Funct. Integr. Genomics 7: 335-345.
HOMANN, O. R., and A. D. JOHNSON, 2010 MochiView: versatile software for genome browsing and DNA motif analysis. BMC Biol. 8: 49.
KADOSH, D., and A. D. JOHNSON, 2001 Rfg1, a protein related to the Saccharomyces cerevisiae hypoxic regulator Rox1, controls filamentous growth and virulence in Candida albicans. Mol. Cell. Biol. 21: 2496-2505.
LEMOINE, S., F. COMBES, N. SERVANT and S. LE CROM, 2006 Goulphar: rapid access and expertise for standard two-color microarray normalization methods. BMC Bioinformatics 7: 467.
MITROVICH, Q. M., B. B. TUCH, C. GUTHRIE and A. D. JOHNSON, 2007 Computational and experimental approaches double the number of known introns in the pathogenic yeast Candida albicans. Genome Res. 17: 492-502.
SCHWEIZER, A., S. RUPP, B. N. TAYLOR, M. RÖLLINGHOFF and K. SCHRÖPPEL, 2000 The TEA/ATTS transcription factor CaTec1p regulates hyphal development and virulence in Candida albicans. Mol. Microbiol. 38: 435-445.
TUCH, B. B., D. J. GALGOCZY, A. D. HERNDAY, H. LI and A. D. JOHNSON, 2008 The evolution of combinatorial gene regulation in fungi. PLoS Biol. 6: e38.
