File S3 - Genetics

Supplemental Materials and Methods
Motif Analysis
The Mit1 motif (Figure S1a) was developed using the Motif Finder utility in MochiView
v1.45 on 500-bp regions centered on the extracted Mit1 peaks. The S288c genome was used as the
Sequence Set, a location set consisting of promoters from S288c was used for the Background
Model Setting, and the desired motif width was set to 12 bp. For an initial pass, only the 50 Mit1
binding sites with the lowest enrichment were used to develop the motif, and the 24 most enriched
sites were held back. Motifs developed using this approach were then analyzed for significance
of enrichment using the Motif Enrichment Plot function, with the 24 withheld binding sites as the
Query Set and a set of 1985 500-bp regions randomly taken from non-Mit1-bound promoters as
the Control Set. The 1985-member control set was developed using a previously described
approach (LOHSE et al. 2010). Motifs were compared based on their ability to explain the 24
withheld Mit1 sites with minimal false positive results in the control set; the other motifs
developed using this approach were less able to predict Mit1 binding to the withheld data set.
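The hold-out design above can be sketched in a few lines. This is a minimal illustration, not MochiView code: it assumes each peak has been reduced to a hypothetical (peak_id, enrichment) pair, ranks peaks by enrichment, withholds the most enriched ones for testing, and keeps the rest for motif discovery.

```python
from typing import List, Tuple

# Hypothetical peak representation: (peak_id, enrichment).
Peak = Tuple[str, float]


def split_by_enrichment(
    peaks: List[Peak], n_holdout: int = 24
) -> Tuple[List[Peak], List[Peak]]:
    """Return (training, holdout).

    The n_holdout most enriched peaks are withheld for testing; the remaining,
    less enriched peaks are used for motif discovery.
    """
    if not 0 < n_holdout < len(peaks):
        raise ValueError("n_holdout must be between 1 and len(peaks) - 1")
    # Sort from most to least enriched.
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    return ranked[n_holdout:], ranked[:n_holdout]
```

Applied to 74 peaks with the default n_holdout=24, this yields the 50/24 training/withheld split described above.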
A second Mit1 motif (Figure S1b) was derived using all 74 Mit1-bound locations. This
motif was then compared to the initial version using MochiView’s Motif Comparison feature
(based on GUPTA et al. 2007). The two motifs match with an E-value of 0, indicating that the two
motifs are very unlikely to be different (Figure S1c).
The initial Mit1 motif was also compared to
the previously reported Wor1 motif (LOHSE et al. 2010); the E-value of 0.0033 indicates that
these two motifs are also likely to be identical.
The Mit1 motif in Figure S1a can explain 50% of the 24 withheld binding sites at a
stringency resulting in a 7.2% false positive rate, and 70.8% of withheld sites with a 20.4% false
positive rate (Figure S1e). Looking at all 74 binding sites, the motif from Figure S1a can
explain 51.4% of binding sites with an 8.7% false positive rate and 70.3% of sites with a 20.3%
false positive rate (Figure S1f). The motif developed using all 74 binding sites (Figure S1b) can
explain 51.4% of binding sites with a 7.6% false positive rate and 70.3% of sites with a 17.7%
false positive rate (data not shown).
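The sensitivity and false positive rates quoted above are both fractions of regions that pass a motif-score threshold at a given stringency. The sketch below assumes, for illustration only, that each region is summarized by a single best motif score; MochiView's Motif Enrichment Plot may score regions differently, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fraction_passing(scores: Sequence[float], threshold: float) -> float:
    """Fraction of regions whose (assumed) best motif score is >= threshold."""
    if len(scores) == 0:
        raise ValueError("scores must be non-empty")
    return sum(1 for s in scores if s >= threshold) / len(scores)


def sensitivity_and_fpr(
    bound_scores: Sequence[float],
    control_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Sensitivity: share of bound sites explained by the motif.
    False positive rate: share of control regions that also pass."""
    return (
        fraction_passing(bound_scores, threshold),
        fraction_passing(control_scores, threshold),
    )


def sweep(
    bound_scores: Sequence[float],
    control_scores: Sequence[float],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """(threshold, sensitivity, FPR) at several stringencies."""
    return [(t, *sensitivity_and_fpr(bound_scores, control_scores, t)) for t in thresholds]
```

As a check on the arithmetic, 50% of the 24 withheld sites corresponds to 12 sites and 70.8% to 17 sites (17/24 ≈ 0.708), while 51.4% and 70.3% of all 74 sites correspond to 38 and 52 sites, respectively (38/74 ≈ 0.514; 52/74 ≈ 0.703).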
Expression Microarrays
S. cerevisiae strains were grown in 25-mL cultures in rich medium (YEPD) to mid-log
phase, as previously described (TUCH et al. 2008). Cultures were centrifuged for 5 min at 3,700 × g,
the supernatant was removed, and pellets were frozen in liquid nitrogen and stored at −80 °C.
RNA was isolated and reverse transcribed as previously described (MITROVICH et al.
2007), except for the use of SuperScript II (Invitrogen). An equivalent volume of Cy3 (pooled
reference) or Cy5 (individual sample) dye (Amersham) was added, and the reaction was
incubated in the dark at 65 °C for 20 min. Labelled cDNAs were purified with a Clean and
Concentrator-5 kit (Zymo Research).
Equal amounts of the Cy3-labelled and Cy5-labelled cDNA were hybridized overnight to
Agilent microarrays containing probes from the Yeast v2 expression array 015072. After
hybridization, the arrays were washed as specified by Agilent. Arrays were scanned at 5 μm,
averaging two lines, with an Axon GenePix 4000A scanner. Arrays were gridded with GenePix
Pro v5.1. Global Lowess normalization analysis was performed for each array with a Goulphar
script (LEMOINE et al. 2006) for R (R Foundation for Statistical Computing). Normalized data
were collapsed first by averaging the results for all duplicate probes and then by taking the
median of the probes for each ORF. Data were transformed as described for each experiment.
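The two-step collapse of normalized data can be written as a short function. This sketch assumes the normalized output has been reduced to hypothetical (probe_id, orf, log_ratio) rows in which duplicate spots share a probe_id and each probe maps to one ORF; it does not reproduce the Goulphar Lowess normalization itself.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (probe_id, orf, normalized_log_ratio).
# Duplicate spots of the same probe share a probe_id.
Row = Tuple[str, str, float]


def collapse_to_orfs(rows: Iterable[Row]) -> Dict[str, float]:
    """Average duplicate spots per probe, then take the median per ORF."""
    spots: Dict[str, List[float]] = defaultdict(list)
    probe_to_orf: Dict[str, str] = {}
    for probe_id, orf, value in rows:
        spots[probe_id].append(value)
        probe_to_orf[probe_id] = orf  # assumes each probe maps to one ORF

    # Step 1: mean of duplicate spots; Step 2: median across an ORF's probes.
    per_orf: Dict[str, List[float]] = defaultdict(list)
    for probe_id, values in spots.items():
        per_orf[probe_to_orf[probe_id]].append(mean(values))

    return {orf: median(values) for orf, values in per_orf.items()}
```

Using the median in the second step presumably keeps a single aberrant probe from dominating an ORF's value, whereas duplicate spots of the same probe are simply averaged.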
Microarray data have been uploaded to NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/)
under accession numbers GSE32558 (transcription arrays and ChIP-chip data) and GSE32550
(transcription arrays only).
Chromatin Immunoprecipitation
Normalized enrichment values were determined for every probe on the microarray by
LOWESS normalization using Agilent Chip Analytics Version 1.2 software. Display, analysis,
and identification of binding events were performed using MochiView Version 1.45
software (http://johnsonlab.ucsf.edu/sj/mochiview-start/) (HOMANN and JOHNSON 2010), in which
peaks for the GFP-tagged strain (plus GFP antibody) or the wild-type strain (plus custom
antibody) were compared to peaks from an untagged reference strain (plus GFP antibody) or the
Δmit1 Δyhr177w deletion strain (plus custom antibody), respectively. Binding events were
identified by smoothing the two experimental data sets together and the two control data sets
together using the “Extract Peaks from Data Set(s)” utility described in detail in the MochiView
manual. Briefly, a smoothing function is first applied to the Chip Analytics log2 enrichment
values, followed by a peak detection algorithm that assigns each binding peak a P-value by
permutation testing. For greater confidence, the sampling settings were increased tenfold from
their defaults, to 100,000 for the number of random samples compared against each peak and to
100 for the maximum number of passing random samples permitted for inclusion of a peak. For
the Mit1-GFP experiments, the user-defined cut-offs for the “minimum value for peak inclusion
post-smoothing” were set at 1.5 for the experimental data set and 0.5 for the untagged control
data set. These values were set to 0.644 and 0.2, respectively, for the Yhr177w data set and to
0.58 and 0.36, respectively, for the Wor1 data set.
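For readers unfamiliar with this style of peak calling, the sketch below illustrates the general idea: a moving-average smoother, a minimum post-smoothing value, and a permutation P-value with a cap on the number of passing random samples. It is a generic illustration under stated assumptions, not MochiView's algorithm: the window size, scoring a peak by a window mean, drawing random contiguous background windows, and keeping a peak only if at most max_passing random samples reach its score are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def smooth(values: Sequence[float], half_window: int = 2) -> List[float]:
    """Centered moving average over probes; windows shrink at the array ends."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def regions_above(values: Sequence[float], min_value: float) -> List[Tuple[int, int]]:
    """Half-open (start, end) index runs where the smoothed value is >= min_value."""
    regions, start = [], None
    for i, v in enumerate(values):
        if v >= min_value and start is None:
            start = i
        elif v < min_value and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(values)))
    return regions


def permutation_p_value(
    peak_score: float,
    background: Sequence[float],
    width: int,
    n_samples: int = 100_000,
    max_passing: int = 100,
    seed: int = 0,
) -> Tuple[float, bool]:
    """Compare a peak's score with the means of random contiguous background windows.

    Returns (p_value, kept). Assumed rule: the peak is kept only if at most
    max_passing random windows score >= peak_score; sampling stops early once
    that limit is exceeded.
    """
    pool = list(background)
    if not 0 < width <= len(pool):
        raise ValueError("width must be between 1 and len(background)")
    rng = random.Random(seed)
    passing = 0
    for drawn in range(1, n_samples + 1):
        start = rng.randrange(len(pool) - width + 1)
        if sum(pool[start:start + width]) / width >= peak_score:
            passing += 1
            if passing > max_passing:
                return passing / drawn, False
    return passing / n_samples, True
```

With the Mit1-GFP settings, min_value would be 1.5 for the experimental data set; the defaults n_samples=100_000 and max_passing=100 mirror the sampling settings described above.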
The Mit1-GFP analysis initially identified 80 binding sites. We culled regions located
over open reading frames, tRNAs, or ribosomal protein genes, reducing the number to 74 peaks
corresponding to 68 intergenic regions. The Yhr177w analysis initially identified 29 binding
sites, which were culled to 21 after removing sites located over open reading frames and applying
a normalized enrichment requirement of at least 0.85. An independent analysis of the Yhr177w
binding data using the Agilent Chip Analytics software identified several additional potential
binding sites. These were hand curated, and three sites with a 1.5-fold (log2) difference in
enrichment between both experimental data sets and both control data sets were added to the
binding site list. These sites are listed with a significance value of “0” in Table S6.
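The culling steps above can be expressed as a simple filter. The sketch assumes each peak is represented by a hypothetical (chrom, summit, enrichment) record and treats a peak as lying "over" a feature when its summit falls inside that feature's interval; both the data layout and that rule are assumptions for illustration.

```python
from typing import Iterable, List, NamedTuple, Optional


class Feature(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int  # exclusive


class Peak(NamedTuple):
    chrom: str
    summit: int
    enrichment: float


def inside(peak: Peak, feature: Feature) -> bool:
    # Assumed rule: a peak lies over a feature if its summit is inside it.
    return peak.chrom == feature.chrom and feature.start <= peak.summit < feature.end


def cull_peaks(
    peaks: Iterable[Peak],
    excluded: List[Feature],
    min_enrichment: Optional[float] = None,
) -> List[Peak]:
    kept = []
    for peak in peaks:
        if any(inside(peak, f) for f in excluded):
            continue  # drop peaks over ORFs, tRNAs, or other excluded features
        if min_enrichment is not None and peak.enrichment < min_enrichment:
            continue  # e.g., the 0.85 normalized enrichment cut-off used for Yhr177w
        kept.append(peak)
    return kept
```

The linear scan is adequate for tens of peaks; filtering against genome-wide feature sets would benefit from sorted intervals or an interval tree.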
ChIP-chip data have been uploaded to NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/)
under accession numbers GSE32558 (ChIP-chip data and transcription arrays) and GSE32557
(ChIP-chip data only).
Binding Site Comparisons
Binding data for Sok2, Tec1, and Ste12 were taken from previously published studies
(BORNEMAN et al. 2007). All data sets for binding site comparisons were processed in the
following manner, using MochiView Version 1.45. Using the “Merge Location Set,
Subtraction” function, sequences corresponding to ORFs were subtracted from called binding
sites. An intergenic region set containing the modified binding site list was then created using
the “Merge Location Set, Union” function with “only keep locations intersected by all
contributing location sets” selected. The overlap of the bound intergenic sets for different
regulators was then compared using the “Merge Location Set, Intersection” function, and
assessed for significance using a hypergeometric statistic; values were calculated using the
website http://stattrek.com/tables/hypergeometric.aspx.
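The hypergeometric statistic can also be computed directly rather than through a web calculator. The sketch below returns the upper-tail probability P(X ≥ k) of observing at least k shared intergenic regions by chance; the choice of the population size N (for example, all intergenic regions considered) is an assumption, since the text does not state it, and the function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size (e.g., all intergenic regions; an assumption)
    K: regions bound by regulator A
    n: regions bound by regulator B
    k: observed number of regions bound by both
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)
    # Exact integer arithmetic; comb() returns 0 for impossible terms.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return numerator / comb(N, n)
```

Exact integer arithmetic avoids the rounding problems that can arise when very small tail probabilities are summed in floating point.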
Hand Annotation of Orthology Mapping
Based on additional information, changes were made to the mapping for three sets of
genes prior to further analysis. First, an examination of protein sequence alignments across many
fungal species indicates that Mit1 and Yhr177w are clearly orthologs of Wor1. The gene family
containing C. albicans Pth2 (an ortholog of Schizosaccharomyces pombe Pac2 and many other
proteins) was lost from the S. cerevisiae lineage prior to the whole-genome duplication (WGD)
and thus has no representative in S. cerevisiae. Therefore, Mit1 and Yhr177w are mapped 2:1 to
C. albicans Wor1 rather than 2:2 to C. albicans Wor1 and Pth2. In the second case, SYNERGY
mis-annotates the orthology of Tec1 as a 1:4 relationship, calling four possible C. albicans
proteins orthologs. Tec1 has a definitive TEA/ATTS DNA-binding domain and has been shown to
have one clear ortholog in C. albicans, known as CaTec1 (SCHWEIZER et al. 2000). This
relationship has been reclassified as 1:1. Finally, SYNERGY did not find an ortholog for the
S. cerevisiae protein Rox1. C. albicans Rfg1 has previously been shown to be an ortholog of
S. cerevisiae Rox1 (KADOSH and JOHNSON 2001), a conclusion confirmed by examination of an
orthology tree. As a result, this relationship has been reclassified from 0:0 to 1:1. These three
hand annotations do not substantially change the results of the bootstrap statistical analysis
described in the Materials and Methods section or those described below.
Bootstrap Analysis of Target Overlap
To ensure that our hand annotations did not affect the bootstrap analysis of
Mit1/Wor1 target overlap, we repeated the bootstrap analysis on the purely systematic
SYNERGY orthology set.
Using this orthology set, 13 of the 65 S. cerevisiae orthology-mapped Mit1 targets share
orthology with one or more genes in the Wor1 target list. Sets of 65 S. cerevisiae genes were
sampled from the group of all S. cerevisiae genes with ortholog mappings (4,524 genes) and
tested for whether the number of genes in the set with shared orthology to genes in the Wor1
target list equaled or exceeded the observed 13-gene Mit1-to-Wor1 mapping. No sampled set
met this standard, confirming a significant enrichment of Wor1 target orthologs in the Mit1 gene
list (n = 1,000,000; p < 1 × 10⁻⁶). A reciprocal test, comparing the benchmark of 11 of 119 Wor1
targets that share orthology with the Mit1 gene list against sets of 119 genes sampled from the
4,393 C. albicans genes with ortholog mappings, yielded similar results (n = 1,000,000;
p = 1 × 10⁻⁵).
Because the orthogroup mappings are not always 1-to-1, we tested whether enrichment
for targets with large orthogroups, rather than specific enrichment for orthologs in the target
group, led to the significant overlap. We repeated the previously described bootstrapping
analyses on the SYNERGY orthology set, this time only keeping sampled sets that mapped to at
least as many target orthologs as the gene list of interest (the Mit1 target genes mapped to 82 C.
albicans orthologs and the Wor1 target genes mapped to 150 S. cerevisiae orthologs). The
overlap remained significant (n = 1,000,000; S. cerevisiae-to-C. albicans p = 3 × 10⁻⁶;
C. albicans-to-S. cerevisiae p = 7.6 × 10⁻⁵).
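The bootstrap procedure, including the stricter variant that keeps only sampled sets mapping to at least as many orthologs as the real target list, can be sketched as below. The data layout (a dictionary from each gene to its set of orthologs in the other species) and the reading of "mapped to 82 C. albicans orthologs" as the number of distinct orthologs reached by the gene set are assumptions; the function and parameter names are hypothetical.

```python
import random
from typing import Dict, Optional, Set


def bootstrap_overlap(
    ortholog_map: Dict[str, Set[str]],
    target_orthologs: Set[str],
    set_size: int,
    observed: int,
    n_sets: int,
    min_mapped: Optional[int] = None,
    seed: int = 0,
    max_draws: Optional[int] = None,
) -> float:
    """Fraction of random gene sets whose overlap with the target list is >= observed.

    ortholog_map: gene -> set of orthologs in the other species (hypothetical layout)
    target_orthologs: target genes of the other regulator (e.g., Wor1 targets)
    min_mapped: if given, keep only sampled sets whose genes map to at least this
        many distinct orthologs (the orthogroup-size-matched variant)
    """
    genes = sorted(ortholog_map)  # fixed order so a given seed is reproducible
    rng = random.Random(seed)
    if max_draws is None:
        max_draws = 100 * n_sets
    accepted = hits = draws = 0
    while accepted < n_sets:
        if draws >= max_draws:
            raise RuntimeError("too few sampled sets satisfied min_mapped")
        draws += 1
        sample = rng.sample(genes, set_size)
        if min_mapped is not None:
            mapped = set().union(*(ortholog_map[g] for g in sample))
            if len(mapped) < min_mapped:
                continue  # reject sets that map to too few orthologs
        accepted += 1
        # Count sampled genes sharing orthology with at least one target.
        shared = sum(1 for g in sample if ortholog_map[g] & target_orthologs)
        if shared >= observed:
            hits += 1
    return hits / n_sets
```

For the Mit1-to-Wor1 comparison this would be called with set_size=65, observed=13, and n_sets=1,000,000, adding min_mapped=82 for the stricter variant. When no sampled set reaches the observed value, the estimate is 0 and is reported as p < 1/n_sets.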
References
BORNEMAN, A. R., Z. D. ZHANG, J. ROZOWSKY, M. R. SERINGHAUS, M. GERSTEIN et al., 2007
Transcription factor binding site identification in yeast: a comparison of high-density
oligonucleotide and PCR-based microarray platforms. Funct. Integr. Genomics 7: 335-345.
HOMANN, O. R., and A. D. JOHNSON, 2010 MochiView: versatile software for genome browsing
and DNA motif analysis. BMC Biol. 8: 49.
KADOSH, D., and A. D. JOHNSON, 2001 Rfg1, a protein related to the Saccharomyces cerevisiae
hypoxic regulator Rox1, controls filamentous growth and virulence in Candida albicans.
Mol. Cell. Biol. 21: 2496-2505.
LEMOINE, S., F. COMBES, N. SERVANT and S. LE CROM, 2006 Goulphar: rapid access and
expertise for standard two-color microarray normalization methods. BMC Bioinformatics
7: 467.
MITROVICH, Q. M., B. B. TUCH, C. GUTHRIE and A. D. JOHNSON, 2007 Computational and
experimental approaches double the number of known introns in the pathogenic yeast
Candida albicans. Genome Res. 17: 492-502.
SCHWEIZER, A., S. RUPP, B. N. TAYLOR, M. RÖLLINGHOFF and K. SCHRÖPPEL, 2000 The
TEA/ATTS transcription factor CaTec1p regulates hyphal development and virulence in
Candida albicans. Mol. Microbiol. 38: 435-445.
TUCH, B. B., D. J. GALGOCZY, A. D. HERNDAY, H. LI and A. D. JOHNSON, 2008 The evolution of
combinatorial gene regulation in fungi. PLoS Biol. 6: e38.