Silicon Biology

Learn how to look at the yeast genome using the internet. Turn on a computer. Open a browser. Focus on the web pages for:

Entrez
PubMed
BLAST
SGD
UCSC Genome Browser

From here you can find out many things about yeast. If you know the name of a gene, you can find its sequence. If you know a sequence, for example, you can find out whether there is a gene encoding it or something similar. SGD contains yeast information.

Often we want to compare yeast sequences to sequences from other organisms such as humans or E. coli. For this we use BLAST, a program that finds sequence homologies. BLAST is available at SGD, but only for searching the yeast genome. BLAST at NCBI (at the NLM) can search a wider array of sequences and databases.

You may encounter a sequence and want to know what the protein does. The sequence databases are linked to the literature through Entrez, which allows simple keyword searches of all databases. Medline is also linked through PubMed, which allows literature searches. The BLAST results are often one or two clicks away from the abstract of a paper concerning the sequence your search has hit. By looking at the hits, you can learn about related sequences and their function.

BLAST comes in different flavors. BLASTN uses a nucleotide query sequence to find similar nucleotide sequences in a nucleotide database. BLASTP uses a protein query to find similar sequences in a protein database. BLAST gives a score, a probability, and percent identity (and for proteins, percent similarity). The score is computed by the program and is hard to interpret on its own. The probability represents the likelihood that a similar match could have occurred by chance. Thus, the smaller P is, the more significant the match is likely to be.
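As a rough illustration of the percent identity figure, the following Python sketch counts identical columns in a pair of sequences that have already been aligned. The function name percent_identity, the use of '-' as the gap character, and the example sequences are assumptions made for this illustration. Dividing by the full alignment length, gap columns included, is the convention assumed here. BLAST builds its own alignments and statistics, so this is not its implementation.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of alignment columns with identical residues.

    Both inputs must already be aligned (same length, '-' marks a gap).
    Columns containing a gap never count as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    # Denominator is the alignment length, gap columns included (assumed convention).
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical aligned nucleotide pair: 4 identical columns out of 6.
    print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
```

Percent similarity for proteins would additionally count columns whose residues are different but chemically related, which requires a substitution matrix and is not shown here.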
Similar sequences are expected to share origin and function. The extent and distribution of sequence similarity to other genes or proteins is therefore of special interest to the searcher, and can be used to infer structure-function and evolutionary relationships.

A principle that guides most yeast research is similar to the rationale used to study E. coli: what is fundamental to life will be conserved in living things. In the case of yeast, the eukaryotic flavor of life is of special interest. What is true of fundamental eukaryotic processes in yeast will be conserved in other eukaryotes. Therefore, upon finding a function for a gene or protein in yeast, one wants to know whether similar genes or proteins are present in other organisms. Conversely, upon finding a function for a gene in humans, one would like to know whether a homologous gene exists in yeast. If so, certain experiments may be done more easily and informatively in yeast.

The power of yeast as a system lies in the ability to perform classical and reverse genetics. Classical genetics involves hunting for mutants, as well as suppressors and enhancers of mutations, using selections and screens. The short generation time, ease of characterization and ability to screen millions of individuals make yeast the best eukaryote for applying genetics to fundamental issues of eukaryotic biology. Reverse genetics involves engineering a cell to express a desired mutant form of a protein or RNA and watching what happens. The genetic constitution of yeast can be changed virtually at will using recombinant techniques. The power of this approach is that specific hypotheses can be tested directly, and conclusions can be drawn about how the component works. Together, classical (forward) and reverse genetics can be applied iteratively.

The following questions are designed to get you to start thinking about how to use the web as a resource for molecular genetics.
Try a BLAST search using the sequence above. What protein did you identify? Can you take the protein sequence to the NCBI site and search databases there? What did you find? Are there any publications concerning this protein and its close relatives? To what family does this protein belong?

There are 90 yeast ORFs that are more than 70% identical to a known mammalian protein. What general classes of cellular functions do these genes carry out? What general classes of cellular functions seem to be missing? What do you think this means?

What is Werner’s syndrome? Determine whether you think that the yeast homolog of the Werner’s syndrome gene represents a good model for the human process affected in Werner’s syndrome. How good is the evidence that the yeast homolog functions in the same way as the human protein?

What is “human XPBC”? Is there a yeast homolog of this protein? If there is a yeast sequence with homology to the human protein, how conserved are the two proteins? How likely is it that the yeast homolog will function in the same way as the human protein?

Saccharomyces cerevisiae Nomenclature

GENES (LOCI)
Gene symbols comprise three italic letters and an Arabic number (full gene names are not controlled by the nomenclature system). Symbols are styled according to the phenotype of the identifying mutation or the function of the wild-type gene (see ‘Genes’ and ‘Alleles’ for more details): lowercase italic for recessive, uppercase italic for dominant.
ade5 cdc28 CUP1 SPC105

ALLELES
Allele designations consist of the gene symbol, a hyphen and an italic Arabic number.
act1-606 his2-1

PROTEINS
Proteins are referred to by the relevant gene symbol, non-italic, initial letter uppercase and with the suffix ‘p’ (to avoid confusion with the phenotype, see below). If unambiguous, the suffix can be omitted, e.g. ‘the Ade5 protein’.
Ade5p Cdc28p Cup1p Spc105p

PHENOTYPES
Phenotypes are designated by a non-italic three-letter abbreviation corresponding to the gene symbol, initial letter uppercase. Wild-type or mutant status is indicated by a superscript plus or minus sign, respectively, e.g. a strain requiring arginine:
Arg– (cf. wild type Arg+)

SACCHAROMYCES CEREVISIAE – DETAILS

GENES
As mentioned above, for genes defined by mutation, upper- and lowercase designations are used for dominant and recessive alleles, respectively. However, because a given allele can be dominant in one cross and recessive in another, this can lead to some difficulty. On the genetic and physical maps, the convention is to use the mapped allele to decide which form of the name is used.

Genes with related properties are usually given the same three-letter name and different numbers, e.g. the multiple genes that have functions in mating-type switching:
SWI: SWI1 SWI3 SWI5 etc.

Open reading frame (ORF) designations are not gene names but ‘location holders’ on the genetic map until a gene name is assigned. ORF names are always three non-italic uppercase letters, a number and a letter: Y (for yeast unknown sequence); A, B to P (for chromosome I, II through XVI); R or L (for right or left arm); a number corresponding to the order of the ORF (counting from the centromere); and W or C to designate the Watson or Crick strand (the Watson strand is 5′→3′, left telomere to right telomere), e.g. the 25th ORF on the left arm of chromosome XI:
YKL025C

Mitochondrial mutations should, in general, be designated following the rules outlined above, but well-known symbols, such as ρ+, ρ–, ψ+ and ψ–, have been retained. Detailed designations have been published for mitochondrial mutants (Ref. 1) and killer strains (Ref. 2).

ALLELES
Alleles created by recombinant DNA technology should be named by use of the symbol for the gene that is altered, followed by a symbol to indicate the nature of the alteration: disruption (::); deletion (-Δ); replacement (Δ::).
e.g.
(a) ade6::URA4 (disruption of the ade6 gene by integration of the functional URA4 gene)
(b) ade6-Δ1 (deletion number 1 of the ade6 gene)
(c) ade6Δ::URA4 (replacement of ade6 by the URA4 gene)

Dominant and recessive suppressors are designated by three upper- or lowercase letters, respectively, and a locus number.
SUP4 SUF1 sup35 suf11

Frameshift suppressors are normally designated in upper- or lowercase.
SUF1 or suf1

Metabolic suppressors can be designated in various ways, e.g.
(a) ssn1 (a suppressor of snf1)
(b) srn1 (a suppressor of rna1-1)
(c) suh1 (a suppressor of his2-1)

Ochre and amber suppressors are sometimes distinguished by a bold-face suffix -o or -a.
SUP4-o SUP4-a

Intragenic mutations that inactivate suppressor function are designated by the same rules as other mutant alleles.
sup4-o-1

Mating-type loci. Special rules apply:
(a) wild-type alleles of the mating-type (MAT) locus: MATa and MATα
(b) the two complementation groups of the MAT locus: MATα1 and MATα2
(c) mutations of the MAT genes are lowercase italic: mata-1 and matα1-1
(d) the two wild-type homothallic alleles at the HMR and HML loci: HMRa HMRα HMLa HMLα
(e) mutations at the HMR and HML loci: hmra-1 hmlα-1

Alleles resulting from transposon insertion are designated by the same rules as alleles created by recombinant DNA technology; the name of the transposon does not normally form part of the allele designation.
ura3::Ty2

GENOTYPES
The mating-type loci are typically listed first. If the cell is haploid, just one copy of each gene is listed.
MATα act1-1 URA3 ADE2

If the cell is diploid, then two copies of each gene are listed, separated by a slash.
MATα/MATa act1-1/ACT1 ura3Δ/URA3 ADE2/ADE2

Nonmendelian genotypes (e.g. those conferred by plasmids and mitochondrial DNA elements) can be distinguished by square brackets (Ref. 2).
[KIL-0] MATα trp1-1

CHROMOSOMES
The 16 chromosomes are designated by Roman numerals.
I to XVI

(a) Chromosome arms are designated left (L, short arm) and right (R, long arm): L and R
(b) CEN, centromeres (no specific rule for telomeres): CEN1 to CEN16

MOBILE ELEMENTS
A new genetic nomenclature (Ref. 3) for S. cerevisiae transposons, called Ty elements (originally designated Ty1, Ty2, Ty3 and Ty4), has been created. The initial letter of the designation is Y, followed by the single letter for the chromosome containing the Ty element, L or R to denote the chromosome arm, C or W for the strand (as in ORF designations, see above), Ty1 or Ty2, etc., a hyphen and a number to make it unique, e.g.
(a) YERCTy1-1 (the first Ty1 on chromosome V, right of the centromere, in the Crick strand)
(b) YCLWTy5-1 (the first Ty5 on chromosome III, left of the centromere, in the Watson strand)
The LTR sequences of Ty1 and -2, Ty3 and Ty4 are designated δ, σ and τ, respectively.

NOMENCLATURE INFORMATION
The nomenclature rules for S. cerevisiae were compiled by the Committee for Genetic Nomenclature, chaired by Robert Mortimer. Queries about S. cerevisiae nomenclature should be addressed to the SGD curators (yeastcurator@genome.stanford.edu).

WEBSITES
The Saccharomyces Genome Database (SGD) contains genetic maps, physical maps, DNA sequence data, functional analysis results, and a large collection of biological information gathered from the literature and the community. SGD also serves as the S. cerevisiae community’s repository for genetic nomenclature and maintains the Gene Name Registry. The genomic sequence and tables of useful information can also be obtained from the SGD FTP site.

The MIPS Yeast Genome Project contains yeast genomic sequence and protein information. This site also includes database search features, a catalogue of protein functions and a growing number of reviews written for the MIPS website.
Other topic areas of the MIPS site include transcription, lists of intron-containing genes, centromeres, and tables and graphics describing a large variety of results determined at MIPS.

The Yeast Protein Database (YPD™) provides a web database on the literature and characteristics of yeast proteins.

SGD: http://genome-www.stanford.edu/Saccharomyces/
SGD FTP site: ftp://genome-ftp.stanford.edu/yeast/
MIPS: http://speedy.mips.biochem.mpg.de/mips/yeast/index.html
YPD: http://www.proteome.com/YPDhome.html

GENOME PROJECT
The complete genomic sequence was released in April 1996 (Ref. 4). See also Dujon (Ref. 5). Minor updates to the sequence can be obtained from the GenBank/EMBL/DDBJ sequence databases or from the SGD and MIPS yeast databases.

STOCK CENTRE
ATCC (American Type Culture Collection), contact: help@atcc.org
(The YGSC has closed and all of its stocks will be available from the ATCC in the very near future.)

References
1 Grivell, L. (1993) Mitochondrial DNA in the yeast Saccharomyces cerevisiae, in Genetic Maps, 6th edn (O’Brien, S.J., ed.), pp. 3.57–3.65, Cold Spring Harbor Laboratory Press
2 Wickner, R.B. (1991) Yeast RNA virology: the killer systems, in Molecular and Cellular Biology of the Yeast Saccharomyces (Vol. 1) (Broach, J.R., Pringle, J.R. and Jones, E.W., eds), pp. 263–296, Cold Spring Harbor Laboratory Press
3 Kim, J.M. et al. (1998) Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res. 8, 464–478
4 The yeast genome directory, Nature 387, issue 66325S
5 Dujon, B. (1996) The yeast genome project: what did we learn? Trends Genet. 12, 263–270