Web resources

Silicon Biology
Learn how to look at the yeast genome using the internet.
Turn on a computer. Open a browser. Focus on the web pages for
Entrez
Pubmed
Blast
SGD
UCSC genome browser
From here you can find out many things about yeast. If you know the name of a
gene, you can find its sequence.
If you know its sequence, for example
You can find out if there is a gene encoding it or something similar.
SGD contains yeast information. Often we want to compare yeast sequences to sequences
from other organisms such as humans or E. coli. For this we use BLAST. This is a
program that finds sequence homologies. BLAST is available at SGD but only for
looking at the yeast genome. BLAST at NCBI at the NLM can search a wider array of
sequences and databases.
You may encounter a sequence and you will want to know what this protein does. The
sequence databases are linked to the literature through ENTREZ, which allows simple
searches of all databases for keywords. Medline is also linked through PUBMED which
allows literature searches. The BLAST results are often one or two clicks away from the
abstract of a paper concerning the sequence your search has hit. By looking at the hits,
you can learn about related sequences and their function.
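As a rough illustration of a keyword search against the literature, the sketch below only builds a search URL and does not contact any server. It assumes NCBI's E-utilities `esearch` endpoint and its `db`, `term` and `retmax` parameters; the function name `build_esearch_url` and the default of 20 results are made up for this example.

```python
from urllib.parse import urlencode

# Assumed endpoint: the NCBI E-utilities "esearch" service.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, database="pubmed", max_results=20):
    """Return a keyword-search URL for one Entrez database (hypothetical helper)."""
    params = {"db": database, "term": term, "retmax": max_results}
    # urlencode turns spaces into '+' and escapes other special characters.
    return ESEARCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_esearch_url("Werner syndrome yeast"))
```

Pasting the printed URL into a browser should return a list of matching record identifiers, which is the same kind of search the Entrez and PubMed pages run from their search boxes.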
BLAST comes in different flavors. BLASTN uses a nucleotide query sequence to find
similar nucleotide sequences in a nucleotide database. BLASTP uses a protein query to
find similar sequences in a protein database.
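The choice between the two flavors described above depends only on whether the query is nucleotide or protein. A minimal sketch of that decision follows; the 90% cutoff and both function names are assumptions made for illustration, not part of BLAST itself.

```python
# Letters that count as nucleotide codes in this sketch (N = unknown base).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(sequence):
    """Guess 'nucleotide' or 'protein' from the letters in a query sequence."""
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    fraction = sum(ch in NUCLEOTIDE_LETTERS for ch in letters) / len(letters)
    # Assumed cutoff: at least 90% nucleotide letters means a nucleotide query.
    return "nucleotide" if fraction >= 0.9 else "protein"


def choose_blast_flavor(query):
    """Pick BLASTN for nucleotide queries and BLASTP for protein queries."""
    return "BLASTN" if guess_sequence_type(query) == "nucleotide" else "BLASTP"


if __name__ == "__main__":
    print(choose_blast_flavor("ATGGCTAGCTAG"))
    print(choose_blast_flavor("MSTNPKPQRKTK"))
```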
BLAST gives a score, a probability, and percent identity (and for proteins, percent
similarity). The score comes from the program and is a number that is hard to think
about. The probability represents the likelihood that a similar match could have occurred
by chance. Thus the smaller P is, the more significant the match is likely to be. Similar
sequences are expected to share origin and function, thus the extent and distribution of
sequence similarity to other genes or proteins is of special interest to the searcher and can
be used to infer structure-function and evolutionary relationship.
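To make these numbers less abstract, here is a small sketch of how percent identity can be computed from a pairwise alignment, and how an expect value E converts to a probability through P = 1 - e^(-E). It assumes identity is counted over all alignment columns, gaps included, and that the report gives an E value; both function names are invented for this example.

```python
import math


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences have the same residue.

    Both strings must be the same length, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def probability_from_expect(e_value):
    """Chance of seeing at least one match this good at random: 1 - exp(-E)."""
    return -math.expm1(-e_value)


if __name__ == "__main__":
    print(percent_identity("ACGT-A", "ACCTTA"))
    print(probability_from_expect(0.001))
```

For small values, P and E are nearly the same number, which is why either one can be read as "how likely is this hit by chance".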
A principle that guides most yeast research is similar to the rationale used to study E. coli:
what is fundamental to life will be conserved in living things. In the case of
yeast, the eukaryotic flavor of life is of special interest. What is true of fundamental
eukaryotic processes in yeast will be conserved in other eukaryotes. Therefore upon
finding a function for a gene/protein in yeast, one wants to know if similar
genes/proteins are present in other organisms. Conversely, upon finding a function for a
gene in humans, one would like to know whether such a homologous gene exists in yeast.
If so, it may be that certain experiments can be more easily and informatively done in
yeast.
The power of yeast as a system lies in the ability to perform classical and reverse
genetics. Classical genetics involves hunting for mutants, as well as suppressors and
enhancers of mutations using selections and screens. The short generation time, ease of
characterization and ability to screen millions of individuals make yeast the best
eukaryote for the application of genetics to fundamental issues of eukaryotic biology.
Reverse genetics involves engineering a cell to express a desired mutant form of a protein
or RNA and watching what happens. The genetic constitution of yeast can be changed
virtually at will using recombinant techniques. The power of this approach is that specific
hypotheses can be tested directly, and conclusions about how the component works can
be made. Together, classical (forward) and reverse genetics can be applied iteratively.
The following questions are designed to get you to start thinking about how to use the
web as a resource for molecular genetics.
Try a BLAST search using the sequence above. What protein did you identify? Can you
take the protein sequence to the NCBI site and search databases there? What did you
find? Are there any publications concerning this protein and its close relatives? To what
family does this protein belong?
There are 90 yeast ORFs that are more than 70% identical to a known mammalian protein.
What general classes of cellular functions do these genes carry out? What general classes
of cellular functions seem to be missing? What do you think this means?
What is Werner’s syndrome? Determine whether you think that the yeast homolog of the
Werner’s syndrome gene represents a good model for the human process affected in
Werner’s syndrome. How good is the evidence that the yeast homolog functions in the
same way as the human protein?
What is “human XPBC”? Is there a yeast homolog of this protein? If there is a yeast
sequence with homology to the human protein, how conserved are the two proteins? How
likely is it that the yeast homolog will function in the same way as the human protein?
Saccharomyces cerevisiae Nomenclature
GENES (LOCI)
Gene symbols comprise three italic lowercase letters, and an Arabic number (full gene names are not
controlled by the nomenclature system).
Symbols are styled according to the phenotype of the identifying mutation or for the function of
the wild-type gene (see ‘Genes’ and ‘Alleles’ for more details): lowercase italic for recessive,
uppercase italic for dominant.
ade5 cdc28 CUP1 SPC105
ALLELES
Allele designations consist of the gene symbol, a hyphen and an italic Arabic number.
act1-606 his2-1
PROTEINS
Proteins are referred to by the relevant gene symbol, non-italic, initial letter uppercase and with the
suffix ‘p’ (to avoid confusion with the phenotype, see below). If unambiguous, the suffix can be
omitted, e.g. ‘the Ade5 protein’.
Ade5p Cdc28p Cup1p Spc105p
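The protein naming rule is mechanical enough to write down as code. The sketch below assumes a gene symbol is exactly three letters followed by a number, as described under GENES; the function name `protein_name` is hypothetical.

```python
import re

# Assumed shape of a gene symbol: three letters followed by an Arabic number.
GENE_SYMBOL = re.compile(r"^[A-Za-z]{3}\d+$")


def protein_name(gene_symbol, with_suffix=True):
    """Convert a gene symbol such as 'cdc28' or 'CUP1' to its protein name."""
    if not GENE_SYMBOL.match(gene_symbol):
        raise ValueError("not a three-letter gene symbol: " + gene_symbol)
    # Initial letter uppercase, remaining letters lowercase.
    name = gene_symbol[0].upper() + gene_symbol[1:].lower()
    return name + "p" if with_suffix else name


if __name__ == "__main__":
    for symbol in ["ade5", "cdc28", "CUP1", "SPC105"]:
        print(symbol, "->", protein_name(symbol))
```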
PHENOTYPES
Phenotypes are designated by a non-italic three-letter abbreviation corresponding to the gene symbol,
initial letter uppercase.
Wild-type or mutant status is indicated by a superscript plus or minus sign, respectively, e.g. a strain
requiring arginine.
Arg⁻ (cf. wild type Arg⁺)
SACCHAROMYCES CEREVISIAE – DETAILS
GENES
As mentioned above, for genes defined by mutation, upper- and lowercase designations are used for
dominant and recessive alleles, respectively. However, because a given allele can be dominant in one
cross and recessive in another, this can lead to some difficulty. On the genetic and physical maps,
the convention is to use the mapped allele to decide which form of the name is used.
Genes with related properties are usually given the same three-letter name and different numbers,
e.g. there are multiple genes that have functions in mating-type switching.
SWI SWI1 SWI3 SWI5 etc.
Open reading frame (ORF) designations are not gene names but ‘location holders’ on the genetic map
until a gene name is assigned. ORF names are always three non-italic uppercase letters, a number and
a letter: Y (for yeast unknown sequence); A, B to P (for chromosome I, II through XVI); R or L (for
right or left arm); a number corresponding to the order of the ORF (counting from the centromere),
and W or C to designate Watson or Crick strand (the Watson strand is 5′→3′ from left telomere to
right telomere), e.g. the 25th ORF on the left arm of chromosome XI.
YKL025C
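The ORF naming scheme can be decoded mechanically. The following sketch splits a systematic name into its parts; the regular expression, the `OrfName` tuple and the function `parse_orf` are illustrative names, not part of any SGD tool.

```python
import re
from typing import NamedTuple

# Y, chromosome letter A-P, arm L/R, ORF number, strand W/C.
ORF_PATTERN = re.compile(r"^Y([A-P])([LR])(\d+)([WC])$")

ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]


class OrfName(NamedTuple):
    chromosome: str  # Roman numeral, I to XVI
    arm: str         # "left" or "right"
    index: int       # order of the ORF, counting from the centromere
    strand: str      # "Watson" or "Crick"


def parse_orf(name):
    """Decode a systematic ORF name such as 'YKL025C'."""
    match = ORF_PATTERN.match(name.upper())
    if match is None:
        raise ValueError("not a systematic ORF name: " + name)
    letter, arm, number, strand = match.groups()
    return OrfName(
        chromosome=ROMAN[ord(letter) - ord("A")],
        arm="left" if arm == "L" else "right",
        index=int(number),
        strand="Watson" if strand == "W" else "Crick",
    )


if __name__ == "__main__":
    print(parse_orf("YKL025C"))
```

Applied to YKL025C, this gives chromosome XI, left arm, ORF number 25, Crick strand, which agrees with the description in the text.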
Mitochondrial mutations should, in general, be designated following the rules outlined above, but
well-known symbols, such as ρ+, ρ–, ψ+ and ψ–, have been retained. Detailed designations have been
published for mitochondrial mutants (Ref. 1) and killer strains (Ref. 2).
ALLELES
Alleles created by recombinant DNA technology should be named by use of the symbol for the gene
that is altered, followed by a symbol to indicate the nature of the alteration: disruption (::);
deletion (-Δ); replacement (Δ::), e.g. (a) disruption of the ade6 gene by integration of the
functional URA4 gene; (b) deletion number 1 of the ade6 gene; (c) replacement of ade6 by the URA4 gene.
(a) ade6::URA4 (b) ade6-Δ1 (c) ade6Δ::URA4
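The three symbols (::, -Δ and Δ::) can be told apart with a few string checks. This is a minimal sketch under the assumption that an allele name contains at most one of them; the function name is hypothetical.

```python
def classify_engineered_allele(allele):
    """Classify an allele name by the alteration symbol it contains."""
    # Check the replacement symbol first, because 'Δ::' also contains '::'.
    if "Δ::" in allele:
        return "replacement"
    if "::" in allele:
        return "disruption"
    if "-Δ" in allele:
        return "deletion"
    return "other"


if __name__ == "__main__":
    for name in ["ade6::URA4", "ade6-Δ1", "ade6Δ::URA4", "his2-1"]:
        print(name, "->", classify_engineered_allele(name))
```

The order of the checks matters: testing for "::" before "Δ::" would mislabel every replacement as a disruption.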
Dominant and recessive suppressors are designated by three upper- or lowercase letters, respectively,
and a locus number.
SUP4 SUF1 sup35 suf11
Frameshift suppressors are normally designated in upper- or lowercase.
SUF1 or suf1
Metabolic suppressors can be designated in various ways, e.g. (a) a suppressor of snf1; (b) a
suppressor of rna1-1; (c) a suppressor of his2-1.
(a) ssn1 (b) srn1 (c) suh1
Ochre and amber suppressors are sometimes distinguished by a bold-face suffix -o or -a.
SUP4-o SUP4-a
Intragenic mutations that inactivate suppressor function are designated by the same rules as other
mutant alleles.
sup4-o-1
Mating-type loci. Special rules apply: (a) wild-type alleles of the mating-type (MAT) locus; (b) the
two complementation groups of the MAT locus; (c) mutations of the MAT genes are lowercase italic;
(d) the two wild-type homothallic alleles at the HMR and HML loci; (e) mutations at the HMR and
HML loci.
(a) MATa and MATα (b) MATα1 and MATα2 (c) mata-1 and matα1-1 (d) HMRa HMRα HMLa HMLα
(e) hmra-1 hmlα-1
Alleles resulting from transposon insertion are designated by the same rules as alleles created by
recombinant DNA technology; the name of the transposon does not normally form part of the allele
designation.
ura3::Ty2
GENOTYPES
The mating-type loci are typically listed first. If the cell is haploid, just one copy of each gene is listed.
MATα act1-1 URA3 ADE2
If the cell is diploid then two copies of each gene are listed, separated by a slash.
MATα/MATa act1-1/ACT1 ura3Δ/URA3 ADE2/ADE2
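Because each locus in a diploid genotype is written as two alleles separated by a slash, it is easy to check which loci are heterozygous. The sketch below assumes loci are separated by spaces and compares the two allele names as plain text; the function name `zygosity` is invented for this example.

```python
def zygosity(genotype):
    """Map each locus in a diploid genotype string to its zygosity."""
    result = {}
    for locus in genotype.split():
        alleles = locus.split("/")
        if len(alleles) != 2:
            raise ValueError("expected two alleles separated by '/': " + locus)
        first, second = alleles
        # Identical allele names mean the locus is homozygous.
        result[locus] = "homozygous" if first == second else "heterozygous"
    return result


if __name__ == "__main__":
    for locus, state in zygosity("act1-1/ACT1 ura3Δ/URA3 ADE2/ADE2").items():
        print(locus, state)
```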
Nonmendelian genotypes (e.g. those conferred by plasmids and mitochondrial DNA elements) can
be distinguished by square brackets (Ref. 2).
[KIL-0] MATα trp1-1
CHROMOSOMES
The 16 chromosomes are designated by Roman numerals.
I to XVI
(a) Chromosome arms are designated left (L, short arm) and right (R, long arm); (b) CEN, centromeres
(no specific rule for telomeres).
(a) L and R (b) CEN1 to CEN16
MOBILE ELEMENTS
A new genetic nomenclature (Ref. 3) for S. cerevisiae transposons, called Ty elements (originally
designated Ty1, Ty2, Ty3 and Ty4), has been created. The initial letter of the designation is Y,
followed by the single letter for the chromosome containing the Ty element, L or R to denote which
chromosome arm, C or W for the strand (as in ORF designations, see above), Ty1 or Ty2, etc., a hyphen
and a number to make it unique, e.g. (a) the first Ty1 on chromosome V, right of the centromere, in
the Crick strand; and (b) the first Ty5 on chromosome III, left of the centromere, in the Watson strand.
The LTR sequences of Ty1 and -2, Ty3 and Ty4 are designated δ, σ and τ, respectively.
(a) YERCTy1-1 (b) YCLWTy5-1
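Ty element designations follow a fixed pattern, so they can be decoded the same way as ORF names. The sketch below is one possible reading of the rule; the regular expression, the `TyName` tuple and `parse_ty` are illustrative names only.

```python
import re
from typing import NamedTuple

# Y, chromosome letter A-P, arm L/R, strand C/W, element type, hyphen, serial number.
TY_PATTERN = re.compile(r"^Y([A-P])([LR])([CW])(Ty\d+)-(\d+)$")

ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]


class TyName(NamedTuple):
    chromosome: str  # Roman numeral, I to XVI
    arm: str         # "left" or "right" of the centromere
    strand: str      # "Watson" or "Crick"
    element: str     # e.g. "Ty1"
    serial: int      # number that makes the name unique


def parse_ty(name):
    """Decode a Ty element designation such as 'YERCTy1-1'."""
    match = TY_PATTERN.match(name)
    if match is None:
        raise ValueError("not a Ty element designation: " + name)
    letter, arm, strand, element, serial = match.groups()
    return TyName(
        chromosome=ROMAN[ord(letter) - ord("A")],
        arm="left" if arm == "L" else "right",
        strand="Watson" if strand == "W" else "Crick",
        element=element,
        serial=int(serial),
    )


if __name__ == "__main__":
    print(parse_ty("YERCTy1-1"))
    print(parse_ty("YCLWTy5-1"))
```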
NOMENCLATURE INFORMATION
The nomenclature rules for S. cerevisiae were compiled by the Committee for Genetic Nomenclature,
chaired by Robert Mortimer. Queries about S. cerevisiae nomenclature should be addressed to the SGD
curators (yeastcurator@genome.stanford.edu).
WEBSITES
The Saccharomyces Genome Database (SGD) contains genetic maps, physical maps, DNA sequence
data, functional analysis results, and a large collection of biological information gathered from the
literature and the community. SGD also serves as the S. cerevisiae community’s repository for genetic
nomenclature and maintains the Gene Name Registry. The genomic sequence and tables of useful
information can also be obtained from the SGD FTP site. The MIPS Yeast Genome Project contains
yeast genomic sequence and protein information. This site also includes database search features, a
catalogue of protein functions and a growing number of reviews written for the MIPS website. Other topic
areas of the MIPS site include transcription, lists of intron-containing genes, centromeres, and tables and
graphics describing a large variety of results determined at MIPS. The Yeast Protein Database (YPD™)
provides a web database on the literature and characteristics of yeast proteins.
SGD http://genome-www.stanford.edu/Saccharomyces/
SGD ftp site ftp://genome-ftp.stanford.edu/yeast/
MIPS http://speedy.mips.biochem.mpg.de/mips/yeast/index.html
YPD http://www.proteome.com/YPDhome.html
GENOME PROJECT
The complete genomic sequence was released in April 1996 (Ref. 4). See also Dujon (Ref. 5). Minor updates to
the sequence can be obtained from the GenBank/EMBL/DDBJ sequence databases or from the SGD and
MIPS yeast databases.
STOCK CENTRE
ATCC (American Type Culture Collection) contact: help@atcc.org (The YGSC has closed and all of its
stocks will be available from the ATCC in the very near future.)
References
1 Grivell, L. (1993) Mitochondrial DNA in the yeast Saccharomyces cerevisiae in Genetic Maps,
6th edn (O’Brien, S.J., ed.) pp. 3.57–3.65, Cold Spring Harbor Laboratory Press
2 Wickner, R.B. (1991) Yeast RNA virology: the killer systems in Molecular and Cellular
Biology of the Yeast Saccharomyces (Vol. 1), (Broach, J.R., Pringle, J.R. and Jones, E.W.,
eds), pp. 263–296, Cold Spring Harbor Laboratory Press
3 Kim, J.M. et al. (1998) Transposable elements and genome organization: a comprehensive
survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome
sequence Genome Res. 8, 464–478
4 The yeast genome directory, Nature 387, issue 66325S
5 Dujon, B. (1996) The yeast genome project: what did we learn? Trends Genet. 12, 263–270