Eukaryotic Promoter Database
User Manual

Written by:
Philipp Bucher
Biocomputing
Institut Suisse de Recherches Experimentales sur le Cancer
Ch. des Boveresses 155
CH-1066 Epalinges s/Lausanne
Switzerland
Electronic mail: Philipp.Bucher@Isrec.Arcom.CH

This manual and the database it accompanies may be copied and redistributed freely, without advance permission, provided that this statement is reproduced with each copy.

Published research assisted by the Eukaryotic Promoter Database should cite:
Philipp Bucher (1991). The Eukaryotic Promoter Database of the Weizmann Institute of Science. EMBL Nucleotide Sequence Data Library Release 29, Postfach 10.2209, D-6900 Heidelberg.
Eukaryotic Promoter Database
User Manual
Release 29, November 1991
CONTENTS
1. INTRODUCTION
2. PROMOTER SELECTION
3. ASSIGNMENT OF INITIATION SITE
4. FORMAT CONVENTIONS
   4.1. The title line
   4.2. Promoter entries
        4.2.1. The FP line
        4.2.2. Documentation
        4.2.3. Literature references
   4.3. Discarded entries
   4.4. Miscellaneous
5. CLASSIFICATION
6. HOMOLOGOUS PROMOTERS
7. PROMOTER SEQUENCE RETRIEVAL
APPENDIX A  SURVEY OF RELEASE 29
APPENDIX B  CODES AND ABBREVIATIONS
   B.1  SPECIES CODES
   B.2  JOURNAL CODES
   B.3  ABBREVIATIONS
1. INTRODUCTION
The Eukaryotic Promoter Database (EPD) is developed and maintained at the Weizmann Institute of Science in Rehovot (Israel) and distributed as a supplement to the EMBL Data Library. It provides information about eukaryotic promoters available in the EMBL Data Library and is intended to assist experimental researchers, as well as computer analysts, in the investigation of eukaryotic transcription signals. The present version originated from a previous compilation published in an article (1) and is organized as a hierarchically ordered and documented "functional position set" (2) pointing to transcription initiation sites. All information is directly abstracted from the scientific literature and is thus independent of the EMBL sequence entry descriptions. As a consequence, many of the initiation sites referred to in EPD do not appear in the corresponding EMBL feature tables.

A co-ordinated updating procedure has been set up by the two laboratories that will ensure future compatibility between the position references in EPD and the sequence data in the main data library. Investigators who access EMBL via publicly available programs should be aware that software producers occasionally modify the sequence data in ways that render position references inaccurate. EPD is generally not compatible with sequence data of another release because EMBL sequence entries are not designed as stable data units.

The completeness and accuracy of EPD benefit greatly from user feedback. Any report of mistakes or omissions would be very much appreciated. Direct communication of newly published transcript mapping or gene expression data is also welcome. Please forward all correspondence to the address given at the top of this document. Use electronic mail if possible.
2. PROMOTER SELECTION
EPD is a rigorously selected database. In order to be included in EPD, a promoter must be:
1. recognized by eukaryotic RNA POL II,
2. active in a higher eukaryote,
3. experimentally defined or sufficiently homologous to an experimentally defined promoter,
4. biologically functional,
5. available in the current EMBL release,
6. distinct from other promoters in the database.

Explanations:
1. Transcription by RNA POL II is bona fide assumed for protein coding genes but must be supported by alpha-amanitin data if the end product is an RNA.
2. All eukaryotes except phycophyta, fungi, myxomycetes, and protozoa are considered higher eukaryotes. Note that the expression "active in" does not always refer to the source organism of the promoter (e.g. in viruses).
3. A promoter is experimentally determined if a corresponding transcription initiation site is mapped with a precision of +/- 5 bp or better. Any technique that characterizes the 5' terminus of an in vivo or in vitro generated RNA is acceptable. Single nuclease-protection or primer-extension data must be accompanied by additional evidence unless the gene's intron-exon organization is well established. Homology is considered "sufficient" if similarity (see section 6) is >=60% between -79 and +20 or >=75% between -49 and +10.
4. A promoter is biologically functional if it contributes to the source organism's survival and/or reproduction. This is bona fide assumed except for promoters of pseudogenes, minor transcription initiation sites (<20% of total gene transcripts), promoters giving rise to an unstable RNA product, and mutant promoters.
5. The minimum sequence requirement is 45 bp between -49 and +10.
6. Promoters are considered distinct if they originate from different gene loci or different species. Identity is assumed if two promoters from the same species exhibit >95% similarity between -79 and +20 while their genetic relationship is unknown. Multiple isolates of viruses or transposable elements are considered distinct if at least one promoter region fails to fulfill the above similarity criterion.
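The numeric thresholds in explanations 3 to 6 can be written down as simple predicates. The Python sketch below is only an illustration: the function names are hypothetical, the similarity inputs are assumed to be percent-similarity values computed as described in section 6, and criterion 5 is read here as "at least 45 of the bases between -49 and +10 are available".

```python
# Hypothetical predicates mirroring the numeric selection criteria above.
# Similarity arguments are percent values computed as in section 6 (assumed).

def sufficiently_homologous(sim_79_to_20: float, sim_49_to_10: float) -> bool:
    """Criterion 3: >=60% similarity over -79..+20 or >=75% over -49..+10."""
    return sim_79_to_20 >= 60.0 or sim_49_to_10 >= 75.0


def is_minor_site(fraction_of_gene_transcripts: float) -> bool:
    """Criterion 4: a minor initiation site yields <20% of the gene's transcripts."""
    return fraction_of_gene_transcripts < 0.20


def meets_length_requirement(bases_available_49_to_10: int) -> bool:
    """Criterion 5, read as: at least 45 bp of the -49..+10 segment are available."""
    return bases_available_49_to_10 >= 45


def assumed_identical(sim_79_to_20: float, same_species: bool,
                      relationship_known: bool) -> bool:
    """Criterion 6: same species, >95% similarity over -79..+20, relationship unknown."""
    return same_species and not relationship_known and sim_79_to_20 > 95.0
```

Note that the homology test is an "or": a promoter qualifies if either window passes its own threshold.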
3. ASSIGNMENT OF TRANSCRIPTION INITIATION SITE
A eukaryotic promoter is defined as a DNA sequence around a transcription initiation site. The position reference to the initiation site is therefore the central part of a promoter entry. Its assignment is based directly on the experimental data shown in an article; proposed adjustments derived from consensus sequence considerations are ignored. Averaged positions are given if the results of competing groups show minor discrepancies or if the experiments suggest multiple initiation sites (see below).

Position references are subject to continual re-evaluation. A transcription initiation site may be reassigned upon publication of new data. Position references are replaced when more upstream sequences of the same promoter become available in a new EMBL sequence entry.

Multiple initiation sites preceding the same structural gene appear as alternative promoters if they are clearly separated from each other or differentially regulated. Otherwise, they are considered a single promoter region. The minimum distance required between two alternative initiation sites is 20 bp.

Three types of promoters are distinguished by one-letter codes in order to account for the variety of transcription initiation patterns in eukaryotes:
S: Single initiation site: >90% of all reported transcripts initiate within 10 bp (the experimental data usually do not allow a distinction between a single cap site and small mRNA 5' heterogeneity).
M: Multiple initiation sites: >75% of all reported transcripts initiate within 20 bp.
R: Initiation region: <75% of all reported transcripts initiate within 20 bp.

In sequence entries that contain a complete RNA or DNA genome of a retrovirus or a retrovirus-like transposable element, the position reference points to the U3/R boundary of the 3'-terminal LTR.
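A minimal sketch of the S/M/R rule, assuming the reported transcripts are summarized as a mapping from 5'-end position (in continuous sequence coordinates) to relative abundance, and reading "within N bp" as "inside a window of N consecutive positions". The function names are hypothetical. The manual does not say how a value of exactly 75% is classified, and this sketch assigns it to R.

```python
from typing import Mapping


def max_fraction_in_window(starts: Mapping[int, float], width: int) -> float:
    """Largest share of transcripts whose 5' ends fall in any window of
    `width` consecutive positions (assumed reading of "within N bp")."""
    total = sum(starts.values())
    if total <= 0:
        raise ValueError("no transcript data given")
    best = 0.0
    for left in starts:  # an optimal window can always start at a mapped 5' end
        in_window = sum(n for pos, n in starts.items() if left <= pos < left + width)
        best = max(best, in_window)
    return best / total


def initiation_type(starts: Mapping[int, float]) -> str:
    """Return the one-letter code S, M or R defined above."""
    if max_fraction_in_window(starts, 10) > 0.90:
        return "S"
    if max_fraction_in_window(starts, 20) > 0.75:
        return "M"
    return "R"
```

For example, 95 transcripts starting at position 100 and 5 at position 130 give "S", while three equally strong sites 50 bp apart give "R".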
4. FORMAT CONVENTIONS
EPD is distributed as a single file containing a title line followed by a number of promoter entries. Interspersed are group headings whose function and format are described in section 5. The title line and parts of the promoter entries are rigidly formatted so that the entire database conforms to the standards of an FPS file (functional position set) of our current signal search analysis (1,2) software.
4.1. The title line
The title line of EPD is shown below:

TI   EPD29     Eukaryotic Promoter Database / Release 29              EP

The TI line contains the following fields:

columns  data type
 1- 2    "TI"
 3- 5    (blank)
 6-15    FPS name
16-70    title
71-72    FPS code

Explanations: FPS name and FPS code are used by our data extraction software to generate default names for output files.
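Because the TI line is fixed-column, it can be split by slicing. A minimal Python sketch follows; the function name `parse_ti_line` is hypothetical and not part of the FPS software.

```python
def parse_ti_line(line: str) -> dict:
    """Split an EPD title line into fields using the column layout above."""
    line = line.rstrip("\n").ljust(72)  # pad so short lines can still be sliced
    if line[0:2] != "TI":
        raise ValueError("not a TI line")
    return {
        "fps_name": line[5:15].strip(),   # columns 6-15
        "title": line[15:70].strip(),     # columns 16-70
        "fps_code": line[70:72].strip(),  # columns 71-72
    }
```

Applied to the release 29 title line, it returns the FPS name "EPD29", the title, and the FPS code "EP".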
4.2. Promoter entries
Promoter entries are presented in a format similar to that of EMBL sequence entries. All lines start with a two-letter code. Columns 3 to 5 are blank and text does not extend beyond column 72. Each entry starts with an FP line that contains a position reference to a transcription initiation site, and ends with a terminator (//). Spacer lines (XX) are inserted in order to make the promoter database easier to read by eye.
Below is an example of a promoter entry:

FP   Hs c-myc         P2+:+S  PRI:HSMYCC     1+    2490; 11148.053 010*2
XX
DO   Experimental evidence: 4,4#,<2>
DO   Expression/Regulation: +mitogen
RF   Cell34:779 EMBOJ2:2375 MCB7:1393 MCB7:2988
//
4.2.1. The FP line
The FP line contains the following fields and subfields:

columns  data type
 1- 2    "FP"
 3- 5    (blank)
 6-30    description:
 6-25      promoter name
26-26      ":"
27-27      independent subset status (see section 6)
28-28      type of initiation site (see section 3)
29-29      (blank)
30-30      hybrid sequence warning
31-55    functional position reference:
31-45      sequence reference:
31-33        EMBL division code
34-34        ":"
35-45        entry name
46-46      sequence type (0 = circular, 1 = linear)
47-47      strand (+ or -)
48-55      position number
56-56    ";"
57-62    entry code
63-63    "."
64-66    homology group number (see section 6)
67-67    (blank)
68-72    alternative promoter identification code:
68-70      gene number
71-71      "*"
72-72      initiation site number
Explanations:
The promoter name begins with a species code, usually followed by a gene locus or gene product name. Species codes consist of the initials of the genus and species names. Occasionally, three characters are required to generate unique codes. Standard abbreviations identify viruses. The full names of the organisms are given in appendix B.1. Subspecies or strains are specified in parentheses. Chromosomal locations (genetic or cytogenetic loci, genomic map units, etc.) appear in square brackets immediately following species codes. Many gene products are referred to by abbreviations explained in appendix B.3. Alternative initiation sites are identified by right-justified P1,P2.., or E1,E2.., depending on whether the corresponding 5' exons are 3'-coterminal or not. The strongest initiation site is marked by a trailing + if known.
Hybrid sequence warning: ! indicates that the adjacent position reference points to a sequence generated by fusion of multiple DNA sequences from different sources (e.g. genomic upstream region + cDNA).

The entry code is a five-digit number which is the only part of a promoter entry that is stable from release to release. The first two digits designate the release of initial appearance.

Alternative promoter identification code: Genes represented by multiple promoter entries in EPD are assigned a gene number. The corresponding initiation sites are numbered sequentially from 5' to 3'.
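The fixed column layout makes FP lines easy to parse by slicing. The sketch below is a hypothetical helper, not EPD's own software. It assumes that blank numeric subfields (for example, a missing alternative promoter code) mean "not applicable", and it treats the column-72 initiation site number as a single digit, as the layout implies.

```python
from dataclasses import dataclass
from typing import Optional


def cols(line: str, first: int, last: int) -> str:
    """Return the text in 1-based, inclusive columns first..last."""
    return line[first - 1:last]


@dataclass
class FPLine:
    promoter_name: str
    independent: bool        # "+" in column 27
    initiation_type: str     # S, M or R in column 28
    hybrid: bool             # "!" in column 30
    division: str            # EMBL division code, columns 31-33
    entry_name: str          # EMBL entry name, columns 35-45
    circular: bool           # column 46: 0 = circular, 1 = linear
    strand: str              # column 47: + or -
    position: int            # columns 48-55
    entry_code: str          # columns 57-62
    first_release: int       # first two digits of the entry code
    homology_group: Optional[int]
    gene_number: Optional[int]
    site_number: Optional[int]


def parse_fp_line(line: str) -> FPLine:
    line = line.rstrip("\n").ljust(72)
    if cols(line, 1, 2) not in ("FP", "F-"):
        raise ValueError("not an FP (or discarded F-) line")
    entry_code = cols(line, 57, 62).strip()
    group = cols(line, 64, 66).strip()
    gene = cols(line, 68, 70).strip()
    site = cols(line, 72, 72).strip()
    return FPLine(
        promoter_name=cols(line, 6, 25).rstrip(),
        independent=cols(line, 27, 27) == "+",
        initiation_type=cols(line, 28, 28),
        hybrid=cols(line, 30, 30) == "!",
        division=cols(line, 31, 33),
        entry_name=cols(line, 35, 45).strip(),
        circular=cols(line, 46, 46) == "0",
        strand=cols(line, 47, 47),
        position=int(cols(line, 48, 55)),
        entry_code=entry_code,
        first_release=int(entry_code[:2]),
        homology_group=int(group) if group else None,
        gene_number=int(gene) if gene else None,
        site_number=int(site) if site else None,
    )
```

Applied to the c-myc example line in section 4.2, it yields division "PRI", entry name "HSMYCC", position 2490 on the + strand, entry code "11148" (first released in release 11), homology group 53, gene 10 and initiation site 2.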
4.2.2. Documentation
Documentation of promoter entries is presented on lines starting with "DO". These lines are essentially free-format and are so far not processed by any specific program. In the present release, there are two DO lines per entry, the first referring to the transcript mapping experiments that define the promoter, the second giving information about expression and regulation.

The various experimental techniques are identified by number codes:

code  experiment
1     direct RNA sequencing
2     length measurement of an RNA product
3     length measurement of a nuclease-protected complementary RNA or DNA fragment by comparison with homologous sequence ladder
4     same as 3 but with heterologous size markers
5     RNA sequencing by dideoxy-terminated primer extension
6     DNA sequencing of an in vitro generated strong-stop cDNA or a full-length cDNA clone
7     length measurement of a primer-extension product by comparison with homologous sequence ladder
8     same as 7 but with heterologous size markers
9     DNA sequencing of a full-length processed pseudogene

Special characters appended to the number codes designate the experimental gene expression system in which the RNA for the corresponding experiments was synthesized:

*     RNA POL II in vitro system
o     injected amphibian oocytes
#     transfected or transformed cells, injected neurons
!     transgenic organisms

Explanations and additional conventions:
- The full-length assumption of a cDNA clone or a processed pseudogene is based on consistency with accompanying nuclease-protection or primer-extension data or, alternatively, on the existence of multiple 5'-coterminal clones or pseudogenes.
- Codes given in parentheses refer to experiments performed with closely related genes and RNAs.
- Angle brackets enclose low-precision data (error > +/- 5 bp).
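The evidence field of the first DO line can be decoded with a short routine. The Python sketch below assumes that codes are comma-separated and that each pair of parentheses or angle brackets encloses a single code (as in `<2>`); the names `parse_evidence` and `SYSTEMS` are hypothetical.

```python
SYSTEMS = {
    "*": "RNA POL II in vitro system",
    "o": "injected amphibian oocytes",
    "#": "transfected or transformed cells, injected neurons",
    "!": "transgenic organisms",
}


def parse_evidence(field: str) -> list:
    """Decode a field such as '4,4#,<2>' into one dict per technique code."""
    items = []
    for token in field.split(","):
        token = token.strip()
        related = low_precision = False
        # Peel off enclosing brackets; assumes each pair wraps a single code.
        while len(token) >= 2 and token[0] in "(<":
            if token[0] == "(" and token[-1] == ")":
                related = True        # experiment on a closely related gene/RNA
            elif token[0] == "<" and token[-1] == ">":
                low_precision = True  # error > +/- 5 bp
            else:
                raise ValueError(f"unbalanced brackets in {token!r}")
            token = token[1:-1].strip()
        if not token or not token[0].isdigit():
            raise ValueError(f"missing technique code in {token!r}")
        technique, system = int(token[0]), token[1:]
        if system and system not in SYSTEMS:
            raise ValueError(f"unknown expression-system mark {system!r}")
        items.append({"technique": technique, "system": system or None,
                      "low_precision": low_precision, "related_gene": related})
    return items
```

For the c-myc example, the text after "Experimental evidence:" decodes to technique 4, technique 4 in transfected cells, and a low-precision technique 2 measurement.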
The information on expression/regulation may include indications of developmental stages, tissues, cell types, cell cycle stages, and various regulatory features.
Conventions:
- A semicolon delimits different types of specifications (e.g. developmental stage and tissue).
- A comma delimits alternative keywords (e.g. liver, kidney).
- "+" means "induced by" or "strongly expressed in".
- "-" means "repressed by" or "weakly expressed in".
- "~" means "modulated by".
- Cell cycle stages are given in square brackets.
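These conventions map onto a two-level split: semicolons first, then commas. A hypothetical Python sketch follows. It assumes that a +, - or ~ sign applies only to the keyword it immediately precedes and that bracketed cell cycle stages contain no commas.

```python
MODIFIERS = {
    "+": "induced by / strongly expressed in",
    "-": "repressed by / weakly expressed in",
    "~": "modulated by",
}


def parse_expression(field: str) -> list:
    """Return one list of keyword dicts per semicolon-delimited specification."""
    groups = []
    for spec in field.split(";"):
        keywords = []
        for kw in spec.split(","):
            kw = kw.strip()
            if not kw:
                continue
            modifier = kw[0] if kw[0] in MODIFIERS else None
            if modifier:
                kw = kw[1:].strip()
            cell_cycle = kw.startswith("[") and kw.endswith("]")
            if cell_cycle:
                kw = kw[1:-1].strip()
            keywords.append({"keyword": kw, "modifier": modifier,
                             "cell_cycle": cell_cycle})
        if keywords:
            groups.append(keywords)
    return groups
```

The c-myc field "+mitogen" thus becomes a single keyword "mitogen" with the "+" (induced by) modifier.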
4.2.3. Literature references
References are given on lines starting with "RF" in highly condensed form, beginning with a journal code explained in Appendix B.2. They primarily point to the articles where the experimental promoter evidence is presented. Additional potential subjects are homology to other promoters, gene expression and regulation, and nomenclature. Papers containing only sequence data are usually not referred to because they are easy to find via the corresponding EMBL sequence entry descriptions.
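A sketch of how RF references might be decoded, assuming each reference has the form journal code, volume, colon, first page (as in Cell34:779) and that references on one line are separated by blanks or semicolons. The journal table is a small excerpt of Appendix B.2 and the function name is hypothetical.

```python
import re

# Excerpt of Appendix B.2; extend as needed.
JOURNALS = {
    "Cell": "Cell",
    "EMBOJ": "EMBO Journal",
    "MCB": "Molecular and Cellular Biology",
    "NAR": "Nucleic Acids Research",
}

# journal code (letters), volume (digits), ":", first page
_REF = re.compile(r"([A-Za-z]+)(\d+):(\S+)")


def parse_rf_line(line: str) -> list:
    refs = []
    # Skip columns 1-5; accept blanks or semicolons as separators (assumption).
    for token in line[5:].replace(";", " ").split():
        m = _REF.fullmatch(token)
        if m is None:
            raise ValueError(f"unrecognised reference {token!r}")
        code, volume, page = m.groups()
        refs.append({"journal_code": code, "journal": JOURNALS.get(code),
                     "volume": int(volume), "page": page})
    return refs
```

Unknown journal codes are kept with `journal` set to None rather than rejected, so the sketch keeps working as the code list grows.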
4.3. Discarded entries
Evaluation of new data occasionally suggests that an existing promoter entry should be discarded according to the criteria laid down in section 2. This is done by changing "FP" at the beginning of an entry into "F-". An explanation is given on the second line. At present, no entry is physically deleted.
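Reading the database entry by entry only requires watching for FP (or F-) lines and the // terminator. In this hypothetical Python sketch, discarded entries are skipped unless requested; the TI line, group headings and any other lines between entries are ignored.

```python
from typing import Iterable, Iterator, List


def iter_entries(lines: Iterable[str],
                 include_discarded: bool = False) -> Iterator[List[str]]:
    """Group database lines into entries running from an FP (or F-) line to //."""
    entry: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not entry:
            # Between entries: skip the TI line, group headings and blank lines.
            if line.startswith(("FP", "F-")):
                entry = [line]
            continue
        entry.append(line)
        if line.startswith("//"):
            if include_discarded or not entry[0].startswith("F-"):
                yield entry
            entry = []
```

Passing an open file object works directly, since iterating a file yields its lines.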
4.4. Miscellaneous
- Greek letters are represented by the corresponding Latin letters followed by an apostrophe:
  a' = alpha     b' = beta      g' = gamma     d' = delta
  e' = epsilon   z' = zeta      h' = eta       th'= theta
  k' = kappa     l' = lambda    n' = nu
- Sub- and superscripts are indicated by a preceding "_" and "^", respectively.
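The Greek-letter convention can be undone mechanically. The sketch below is a hypothetical helper. It matches "th'" before single letters and, as an assumed safeguard, ignores letters that directly follow another letter, so an ordinary possessive such as "gene's" is left alone. Sub- and superscript markers are not touched.

```python
import re

GREEK = {"a": "alpha", "b": "beta", "g": "gamma", "d": "delta", "e": "epsilon",
         "z": "zeta", "h": "eta", "th": "theta", "k": "kappa", "l": "lambda",
         "n": "nu"}

# "th" is tried before the single letters, so th' becomes theta, not t + eta.
# The look-behind skips letters preceded by another letter (assumed safeguard).
_GREEK_CODE = re.compile(r"(?<![A-Za-z])(th|[abgdezhkln])'")


def decode_greek(text: str) -> str:
    return _GREEK_CODE.sub(lambda m: GREEK[m.group(1)], text)
```

For instance, the Appendix A title "Epstein-Barr virus and other g'-Herpesviruses" decodes to "...other gamma-Herpesviruses".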
5. CLASSIFICATION
The entries of the Eukaryotic Promoter Database are embedded in a hierarchical classification system. A promoter's taxonomic location is made clear by interspersed group headings. The example shown below is taken from the top of the database. A contrasting format has been chosen to emphasize the very different nature of this information.
*---------------------------------------------------------------------*
*   1. Plant promoters                                                *
*---------------------------------------------------------------------*
*   1.1. Chromosomal genes                                            *
*---------------------------------------------------------------------*
*   1.1.1. Small nuclear RNAs                                         *
*---------------------------------------------------------------------*
A group heading consists of a series of node numbers and a title. The highest classification level distinguishes between promoters active in major eukaryotic taxa (phyla). Further down, grouping considers replicon type and functional properties of gene products. On the lowest level, homology (as defined in section 6) is the criterion. A survey of the upper part of the classification pyramid is presented in appendix A.

The proposed classification system is highly tentative, as it is often unclear how a new promoter should be classified, especially if the gene product is a multifunctional protein. Users should therefore not be surprised or discouraged if they do not find a promoter at the initially expected place.
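Group headings can be recognized by their leading and trailing asterisks and a node number ending in a period. The hypothetical Python sketch below tracks the current position in the classification hierarchy and attaches the heading titles to each FP line that follows.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# "*", blanks, node numbers such as "1.1.1.", blanks, title, closing "*"
_HEADING = re.compile(r"^\*\s+((?:\d+\.)+)\s+(.+?)\s*\*\s*$")


def parse_heading(line: str) -> Optional[Tuple[Tuple[int, ...], str]]:
    m = _HEADING.match(line)
    if m is None:
        return None  # separator line, or not a group heading at all
    nodes = tuple(int(n) for n in m.group(1).rstrip(".").split("."))
    return nodes, m.group(2)


def classify(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield each FP line with the titles of the group headings above it."""
    path: dict = {}
    for line in lines:
        heading = parse_heading(line)
        if heading is not None:
            nodes, title = heading
            depth = len(nodes)
            # A heading at depth d closes every level at depth >= d.
            path = {d: t for d, t in path.items() if d < depth}
            path[depth] = title
        elif line.startswith("FP"):
            yield line, [path[d] for d in sorted(path)]
```

With the headings shown above, an FP line that follows them is reported under "Plant promoters", "Chromosomal genes", "Small nuclear RNAs".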
6. HOMOLOGOUS PROMOTERS
Homology is defined as sequence similarity due to common phylogenetic origin. In EPD, two promoters are considered homologous if they exhibit >=50% sequence similarity between -79 and +20.

Similarity is calculated from optimal alignments generated with the aid of the UWGCG subroutine ShiftAlign (3) using the following symbol comparison table:

        A     C     G     N     T
       1.0   0.0   0.0   0.5   0.0    A
             1.0   0.0   0.5   0.0    C
                   1.0   0.5   0.0    G
                         0.5   0.5    N
                               1.0    T

Gap weight and gap length weight are specified as 3 and 0, respectively. Terminal gaps are ignored. Percent similarity is understood as alignment score divided by segment length, times 100.
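The similarity measure can be approximated with a standard dynamic-programming alignment that uses the symbol comparison table above, charges each gap "gap weight + gap length weight x gap length", and leaves terminal gaps free. This is a generic three-state sketch under those assumptions, not the UWGCG ShiftAlign routine, and its results could differ from ShiftAlign in details. It also assumes both segments have the same length (100 bp for -79 to +20), which is used as the segment length; the function names are hypothetical.

```python
from typing import List

NEG_INF = float("-inf")


def symbol_score(x: str, y: str) -> float:
    """Table above: N scores 0.5 against anything, identities 1.0, mismatches 0.0."""
    if x == "N" or y == "N":
        return 0.5
    return 1.0 if x == y else 0.0


def percent_similarity(a: str, b: str, gap_weight: float = 3.0,
                       gap_length_weight: float = 0.0) -> float:
    a, b = a.upper(), b.upper()
    if len(a) != len(b) or not a:
        raise ValueError("expects two non-empty segments of equal length")
    m, n = len(a), len(b)
    open_cost = gap_weight + gap_length_weight  # first position of a gap
    extend_cost = gap_length_weight             # each further position
    # M: a[i] aligned with b[j]; X: a[i] opposite a gap; Y: b[j] opposite a gap.
    M: List[List[float]] = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    X = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    Y = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    # Leading terminal gaps are free: an alignment may start on the top row or left column.
    for i in range(m + 1):
        M[i][0] = 0.0
    for j in range(n + 1):
        M[0][j] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = symbol_score(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - open_cost, X[i - 1][j] - extend_cost,
                          Y[i - 1][j] - open_cost)
            Y[i][j] = max(M[i][j - 1] - open_cost, Y[i][j - 1] - extend_cost,
                          X[i][j - 1] - open_cost)
    # Trailing terminal gaps are free: the alignment may end on the bottom row or right column.
    best = max(max(M[m]), max(M[i][n] for i in range(m + 1)))
    return 100.0 * best / m
```

With a gap weight of 3, an internal gap only pays off when it gains more than three matches; a one-base shift at either end costs nothing because terminal gaps are ignored.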
Groups of homologous promoters are identified by homology group numbers (see 4.2.1.). Definition of these groups is based on similarity scores as defined above and a tree generation method called UPGMA (4). A few scores between 50% and 56% obtained from alignments of supposedly unrelated promoters were ignored, as were those resulting from alternative promoters spaced by <=50 bp.
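The UPGMA step can be sketched as average-linkage clustering on the similarity scores. In the hypothetical Python helper below, clusters are merged while the best average similarity between two clusters is at least 50% (an assumed reading of how the homology criterion cuts the tree), and pairs without a score, such as the ignored ones mentioned above, count as 0%.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def upgma_groups(names: Sequence[str], sim: Dict[FrozenSet[str], float],
                 threshold: float = 50.0) -> List[List[str]]:
    """Average-linkage (UPGMA) clustering on percent similarities, stopped
    when no pair of clusters reaches `threshold` on average."""
    clusters = [[name] for name in names]

    def average(c1: List[str], c2: List[str]) -> float:
        # Missing scores (e.g. deliberately ignored ones) are treated as 0%.
        total = sum(sim.get(frozenset((x, y)), 0.0) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        score, i, j = max((average(clusters[i], clusters[j]), i, j)
                          for i, j in combinations(range(len(clusters)), 2))
        if score < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Averaging over all original member pairs is what distinguishes UPGMA from single- or complete-linkage clustering.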
A subset of "independent" promoters is marked by "+" in column 27 of the FP line. This set contains only one member per homology group (usually the promoter with the longest upstream sequence available) and is intended to be used for statistical analysis of functional patterns where it is important to avoid bias by multiples of closely related sequences.
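Choosing the independent subset then reduces to picking one member per group. This sketch applies the "longest upstream sequence" rule strictly, whereas the text says it is only the usual choice; the function name and inputs are hypothetical.

```python
from typing import Dict, List, Sequence


def independent_subset(groups: Sequence[List[str]],
                       upstream_length: Dict[str, int]) -> List[str]:
    """One representative per homology group: the member with the longest
    available upstream sequence (ties resolved by list order)."""
    return [max(group, key=lambda name: upstream_length[name]) for group in groups]
```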
7. PROMOTER SEQUENCE RETRIEVAL
Promoter sequence listings have not been incorporated into EPD for two reasons: (i) to avoid duplication of data already existing elsewhere in the EMBL data library, and (ii) to encourage the use of FPS-dependent sequence retrieval programs, which enable users to specify suitable 5' and 3' boundaries of the requested sequence segments themselves. Efforts are under way to motivate producers of standard nucleotide sequence analysis packages to provide such tools in the future. In the meantime, users with some programming experience will find it easy to write their own routines. Our local sequence extraction programs run in a UWGCG environment (3) and have been implemented at several sites in Europe and the United States. They are documented and freely available on request.
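As an example of such a routine, the Python sketch below cuts a segment out of an EMBL sequence given the FP line's position number, strand and sequence type. It assumes that the position number is the 1-based coordinate of the +1 nucleotide in the entry's numbering, that a - strand promoter is read as the reverse complement, and that promoter coordinates skip 0 (so -499 to +100 is 600 bp). The function names and default boundaries are hypothetical choices.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def offset(relative: int) -> int:
    """Distance from +1 for a promoter coordinate; there is no position 0 (assumed)."""
    if relative == 0:
        raise ValueError("promoter coordinates skip 0")
    return relative - 1 if relative > 0 else relative


def extract_promoter(seq: str, position: int, strand: str, start: int = -499,
                     end: int = 100, circular: bool = False) -> str:
    """Return the segment start..end (relative to +1), 5' to 3' on `strand`."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    first, last = offset(start), offset(end)
    if first > last:
        raise ValueError("start must lie upstream of end")
    n = len(seq)
    bases = []
    for off in range(first, last + 1):
        # 1-based coordinate in the entry: upstream is lower on +, higher on -.
        coord = position + off if strand == "+" else position - off
        if circular:
            coord = (coord - 1) % n + 1
        elif not 1 <= coord <= n:
            raise ValueError("segment extends beyond the ends of a linear sequence")
        bases.append(seq[coord - 1])
    segment = "".join(bases)
    # On the - strand the bases were collected upstream to downstream, i.e. in
    # decreasing coordinate order, so complementing them gives 5'->3' order.
    return segment if strand == "+" else segment.translate(COMPLEMENT)
```

For a circular genome (sequence type 0 in the FP line) the coordinates wrap around; for a linear entry the sketch raises an error instead of silently truncating, a design choice a real tool might make differently.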
References:
1. Bucher, P. & Trifonov, E.N., "Compilation and analysis of eukaryotic POL II promoter sequences", Nucl. Acids Res. 14, 10009-10026 (1986).
2. Bucher, P. & Bryan, B., "Signal search analysis: a new method to localize and characterize functionally important DNA sequences", Nucl. Acids Res. 12, 287-305 (1984).
3. Devereux, J., Haeberli, P. & Smithies, O., "A comprehensive set of sequence analysis programs for the VAX", Nucl. Acids Res. 12, 387-395 (1984).
4. Sneath, P.H.A. & Sokal, R.R., "Numerical taxonomy", W.H. Freeman, San Francisco, London (1973).
APPENDIX A
SURVEY OF RELEASE 29

Total number of promoter entries (independent entries)         979 (651)

1. Plant promoters                                             100 ( 70)
   1.1. Chromosomal genes                                       88 ( 59)
      1.1.1. Small nuclear RNAs                                  7 (  3)
      1.1.2. Structural proteins                                 9 (  9)
      1.1.3. Storage and transport proteins, apoproteins        23 ( 13)
      1.1.4. Enzymes                                            28 ( 19)
      1.1.5. Regulatory proteins                                 1 (  1)
      1.1.6. Proteins related to stress or pathogen defense      8 (  6)
      1.1.7. Unclassified                                       12 (  8)
   1.2. Prokaryotic plasmid DNA                                  8 (  7)
      1.2.1. Enzymes                                             4 (  3)
      1.2.2. Unclassified                                        4 (  4)
   1.3. Viral genes                                              4 (  4)
      1.3.1. Geminiviruses                                       2 (  2)
      1.3.2. Cauliflower mosaic virus                            2 (  2)

2. Nematode promoters                                            8 (  7)
   2.1. Chromosomal genes                                        8 (  7)
      2.1.1. Structural proteins                                 1 (  1)
      2.1.2. Storage and transport proteins, apoproteins         4 (  3)
      2.1.3. Hormones, growth factors, regulatory proteins       1 (  1)
      2.1.4. Proteins related to stress or pathogen defense      2 (  2)

3. Arthropod promoters                                         153 (101)
   3.1. Chromosomal genes                                      147 ( 95)
      3.1.1. Small nuclear RNAs                                  1 (  1)
      3.1.2. Structural proteins                                67 ( 34)
      3.1.3. Storage and transport proteins, apoproteins         7 (  5)
      3.1.4. Enzymes                                            21 ( 10)
      3.1.5. Hormones, growth factors, regulatory proteins      24 ( 23)
      3.1.6. Proteins related to stress or pathogen defense     13 (  9)
      3.1.7. Unclassified                                       14 ( 13)
   3.2. Transposable elements and retroviruses                   2 (  2)
      3.2.1. Long terminal repeats                               2 (  2)
   3.3. Viral genes                                              4 (  4)
      3.3.1. Nuclear polyhedrosis viruses (early genes only)     4 (  4)

4. Mollusc promoters                                             3 (  3)
   4.1. Chromosomal genes                                        3 (  3)
      4.1.1. Hormones, growth factors, regulatory proteins       3 (  3)

5. Echinoderm promoters                                         27 ( 17)
   5.1. Chromosomal genes                                       27 ( 17)
      5.1.1. Small nuclear RNAs                                  2 (  1)
      5.1.2. Structural proteins                                24 ( 15)
      5.1.3. Storage and transport proteins, apoproteins         1 (  1)
6. Vertebrate promoters                                        688 (453)
   6.1. Chromosomal genes                                      530 (338)
      6.1.1. Small nuclear RNAs                                 25 (  7)
      6.1.2. Structural proteins                               108 ( 82)
      6.1.3. Storage and transport proteins, apoproteins        95 ( 47)
      6.1.4. Enzymes                                            88 ( 65)
      6.1.5. Hormones, growth factors, regulatory proteins     140 ( 86)
      6.1.6. Proteins related to stress or pathogen defense     52 ( 35)
      6.1.7. Unclassified                                       22 ( 16)
   6.2. Transposable elements and retroviruses                  30 ( 13)
      6.2.1. Long terminal repeats                              30 ( 13)
   6.3. Viral genes                                            128 (102)
      6.3.1. Herpes viruses (not EBV)                           48 ( 42)
      6.3.2. Epstein-Barr virus and other g'-Herpesviruses      23 ( 23)
      6.3.3. Adenoviruses                                       24 ( 12)
      6.3.4. Papilloma viruses                                   9 (  8)
      6.3.5. Parvoviruses                                        6 (  6)
      6.3.6. Papovaviruses (not papilloma)                       8 (  5)
      6.3.7. Hepadnaviruses                                     10 (  6)
APPENDIX B.1
SPECIES CODES

Code              Scientific name (English name)
AAV2              Adeno-associated virus 2
Ac                Aplysia californica (gastropod mollusk)
AcNPV             Autographa californica nuclear polyhedrosis virus
Ad2               Human adenovirus type 2
Ad5               Human adenovirus type 5
Ad7               Human adenovirus type 7
Ad12              Human adenovirus type 12
Ag                Ateles geoffroyi (spider monkey)
ALV               Avian leukemia virus
Am                Antirrhinum majus (snapdragon)
A-MLV             Abelson murine leukemia virus
Ap                Antheraea polyphemus (silkmoth)
At (plants)       Arabidopsis thaliana (fam. cruciferae)
At (vertebrates)  Aotus trivirgatus (owl or night monkey)
At[pTi..          Agrobacterium tumefaciens Ti plasmid
Ay                Antheraea yamamai (Japanese oak silkmoth)
B19               Human parvovirus B19
BKV               (Human) papovavirus BK
BLV               Bovine leukemia virus
Bm                Bombyx mori (silkmoth)
BPV1              Bovine papilloma virus type 1
Bt                Bos taurus (cattle)
CaMV              Cauliflower mosaic virus
Cc                Cricetus cricetus (Chinese hamster)
Cco               Coturnix coturnix (quail)
Ce                Caenorhabditis elegans (nematode)
Ch                Capra hircus (goat)
Cl                Canis lupus (dog)
Cm                Cairina moschata (duck)
Ct                Chironomus thummi (midge)
Dc                Daucus carota (carrot)
Df                Drosophila funebris (fruit fly)
Dh                Drosophila hydei (fruit fly)
DHBV              Duck hepatitis virus
Dm                Drosophila melanogaster (fruit fly)
Dma               Drosophila mauritiana (fruit fly)
Dmo               Drosophila mojavensis (fruit fly)
Dmu               Drosophila mulleri (fruit fly)
Do                Drosophila orena (fruit fly)
Dp                Drosophila pseudoobscura (fruit fly)
Ds                Drosophila simulans (fruit fly)
Dse               Drosophila sechellia (fruit fly)
Dv                Drosophila virilis (fruit fly)
EBV               (Human) Epstein-Barr virus
Ec                Equus caballus (horse)
FBJ-MSV           Finkel-Biskis-Jinkins murine osteosarcoma virus
FBR-MSV           Finkel-Biskis-Reilly murine osteosarcoma virus
F-MCF             (Murine) Friend mink cell focus-inducing virus
Fs                Felis silvestris (cat)
F-SFFV            (Murine) Friend spleen focus forming virus
GA-FeLV           Gardner-Arnstein feline leukemia virus
GALV              Gibbon ape leukemia virus
Gg                Gallus gallus (chicken)
Gg[ev1]           (Avian) endogenous virus 1
Ggo               Gorilla gorilla (gorilla)
Gm                Glycine max (soybean)
GSHV              Ground squirrel hepatitis virus
H-1               (Murine) H-1 parvovirus
HBV               Human hepatitis B virus
HCMV              Human cytomegalovirus
Hg                Halichoerus grypus (grey seal)
HIV-1             Human immunodeficiency virus type 1
HIV-2             Human immunodeficiency virus type 2
HPV16             Human papilloma virus 16
HPV18             Human papilloma virus 18
Hs                Homo sapiens (man)
HSV-1             Human herpes simplex virus type 1
HSV-2             Human herpes simplex virus type 2
HTLV-I            Human T-cell leukemia virus type I
HTLV-II           Human T-cell leukemia virus type II
Hv                Hordeum vulgare (barley)
HVS               Herpesvirus saimiri
JCV               (Human) papovavirus JC
Le (plants)       Lycopersicon esculentum
Le (vertebrates)  Lepus europaeus (hare)
Lm                Locusta migratoria
Lp                Lytechinus pictus (sea urchin)
Lv                Lytechinus variegatus (sea urchin)
Ma                Mesocricetus auratus (golden hamster)
Mc                Macaca cynomolgus (macaque)
MCF               Mink cell focus-inducing virus
MCMV              Murine cytomegalovirus
MLV               Murine leukemia virus
Mm                Mus musculus (mouse)
M-MLV             Moloney murine leukemia virus
M-MSV             Moloney murine sarcoma virus
MMTV              Mouse mammary tumor virus
Ms                Medicago sativa (alfalfa)
MSV               Maize streak virus
Np                Nicotiana plumbaginifolia
Nt                Nicotiana tabacum (tobacco)
Oa                Ovis aries (sheep)
Oc                Oryctolagus cuniculus (rabbit)
Or                Oryza sativa (rice)
Ph                Petunia hybrida (e.g. Petunia strain Mitchell)
Pa                Papio anubis (olive baboon)
Pc                Petroselinum crispum (parsley)
Pm                Psammechinus miliaris (sea urchin)
Polyoma           (Murine) polyoma virus
Pp (arthropods)   Photinus pyralis
Pp (vertebrates)  Pongo pygmaeus (orangutan)
Ps                Pisum sativum (pea)
Pt                Pan troglodytes (chimpanzee)
Pv                Phaseolus vulgaris (French bean, kidney bean)
RAV2              (Avian) Rous associated virus type 2
Rc                Ricinus communis
R-MCF             (Murine) Rauscher mink cell focus-inducing virus
Rn                Rattus norvegicus (rat)
RSV               (Avian) Rous sarcoma virus
SA7P              Simian adenovirus 7P
Sd                Strongylocentrotus droebachiensis (sea urchin)
Se                Spalax ehrenbergi (blind mole rat)
Sg                Salmo gairdneri (rainbow trout)
SIV-III           Simian immunodeficiency virus type III
SNV               (Avian) spleen necrosis virus
So                Spinacia oleracea (spinach)
Sp (arthropods)   Sarcophaga peregrina (flesh fly)
Sp (echinoderms)  Strongylocentrotus purpuratus (sea urchin)
Ss                Sus scrofa (pig)
SSV               Simian sarcoma virus
St                Solanum tuberosum (potato)
SV40              Simian virus 40
Ta                Triticum aestivum (wheat)
Visna             Visna lentivirus
Xl                Xenopus laevis (clawed frog)
Xt                Xenopus tropicalis (clawed frog)
Zm                Zea mays (maize)
APPENDIX B.2
JOURNAL CODES

Code    Journal Name
ARB     Annual Review of Biochemistry
ARP     Annual Review of Physiology
BBA     Biochimica et Biophysica Acta
BBRC    Biochemical and Biophysical Research Communications
Bch     Biochemistry
Bchi    Biochimie
BchJ    Biochemical Journal
BrJR    British Journal of Rheumatology
Btech   Biotechnology
CanR    Cancer Research
Cell    Cell
Chrom   Chromosoma
CSHS    Cold Spring Harbor Symposia on Quantitative Biology
CTMI    Current Topics in Microbiology and Immunology
CurG    Current Genetics
DNA     DNA
DevB    Developmental Biology
EJBc    European Journal of Biochemistry
EMBOJ   EMBO Journal
Evo     Evolution
FEBS    FEBS Letters
GDev    Genes and Development
Gene    Gene
Genom   Genomics
Gnts    Genetics
ImTo    Immunology Today
JBC     Journal of Biological Chemistry
JBch    Journal of Biochemistry
JCB     Journal of Cell Biology
JEM     Journal of Experimental Medicine
JGV     Journal of General Virology
JMAG    Journal of Molecular and Applied Genetics
JMB     Journal of Molecular Biology
JME     Journal of Molecular Evolution
JVir    Journal of Virology
MBE     Molecular Biology and Evolution
MBM     Molecular Biology and Medicine
MBR     Molecular Biology Reports
MCB     Molecular and Cellular Biology
MEnd    Molecular Endocrinology
MEnz    Methods in Enzymology
MGG     Molecular and General Genetics
MNeub   Molecular Neurobiology
MPMI    Molecular Plant-Microbe Interactions
NAR     Nucleic Acids Research
Nat     Nature
Pla     Planta
PMB     Plant Molecular Biology
PSL     Plant Science Letters
PNAS    Proceedings of the National Academy of Sciences of the United States of America
Sci     Science
SCMG    Somatic Cell and Molecular Genetics
TiG     Trends in Genetics
Vir     Virology
VirR    Virus Research
APPENDIX B.3
ABBREVIATIONS

20-OHE        20-Hydroxyecdysone
4CL           4-coumarate coenzyme A ligase
a1            Gene locus 1 involved in anthocyanin biosynthesis
abd-g.        Abdominal ganglion
abl           Abelson murine leukemia virus oncogene
AChR          Acetylcholine receptor
ACTH          Adrenocorticotropic hormone
ADA           Adenosine deaminase
ADH           Alcohol dehydrogenase
ADPg-s GT     ADPglucose-starch glucosyltransferase
adult-HA      Adult hermaphrodite
AFW1          Adult fast-white (myosin heavy chain) 1
(AGM)         "from African green monkey"
AGP           Acid glycoprotein
AIRS          Aminoimidazole ribonucleotide synthase
ALA-synt.     5-Aminolevulinate synthase
ALDH_2        Aldehyde dehydrogenase 2
AlkExo        Alkaline exonuclease
Amy           Amylase
antp          "antennapedia" locus
aP2           Adipocyte homologue of myelin P2
apolipop.     Apolipoprotein
apoVLDLII     Very low density apolipoprotein II
APRT          Adenine phosphoribosyltransferase
AR            Adrenergic receptor
arg           Arginine
AS            Argininosuccinate synthetase
AS-C          "achaete-scute" complex locus
AspAT         Aspartate aminotransferase
ass.          Associated
AT            Antitrypsin
ATCase        Aspartate transcarbamylase
ATP           Adenosine triphosphate
awd           "abnormal wing disk" locus
BB            Bowman-Birk (protease inhibitor)
Bcl-2         B-cell leukemia/lymphoma 2 proto-oncogene
b.p.          Binding protein
BPTI          Bovine pancreatic trypsin inhibitor
BSF           B-cell stimulating factor
bsg25D        Blastoderm specific locus 25D
c-..          Cellular protooncogene ..
c1            Regulatory locus of anthocyanin synthesis (maize)
CA            Carbonic anhydrase
cab           Chlorophyll a/b-binding protein
cAMP          Cyclic AMP (adenosine monophosphate)
cc-ind.       Cell cycle-independent
CD3           T-cell differentiation antigen CD3
CD4           T-cell differentiation antigen CD4
CD8           T-cell differentiation antigen CD8
CG            Chorionic gonadotropin
CNS           Central nervous system
cp            Cytoplasm(ic)
CPSase        Carbamyl-phosphate synthase
CRP           C-reactive protein
cs            Cytosol(ic)
CSF           Colony stimulating factor
cyt           Cytokinin gene (coding for isopentenyltransferase)
dbp           DNA binding protein
DDC           DOPA decarboxylase
dep.          Dependent
dev.          Development(ally)
DHFR          Dihydrofolate reductase
diff.         Differentiation, differentiated
DL/R          Left and right duplicated region
dUTPase       Deoxyuridine triphosphatase
E             1. Early, 2. Erythroid cell-specific
EBNA          Epstein-Barr virus nuclear antigens
EDF           Eosinophil differentiation factor
EFW1          Embryonic fast-white (myosin heavy chain) 1
EGF           Epidermal growth factor
EIa           Adenovirus early Ia region (transactivating element)
Eip           Ecdysone-induced protein
ELH           Egg-laying hormone
em            Embryo, embryonic
erbA,B        (Avian) erythroblastosis virus oncogene A,B
E-resp.       Estrogen-responsive
ERV3          Endogenous retrovirus 3
E.Tn          Early transposon
eve           "even-skipped" locus
f.            Factor
fibrob.       Fibroblasts
fos           FBJ (Finkel-Biskis-Jinkins) osteosarcoma virus oncogene
FSH           Follicle stimulating hormone
ftz           "fushi tarazu" locus
GA            Gibberellic acid
GADPH         Glyceraldehyde-3-phosphate dehydrogenase
GARS          Glycinamide ribonucleotide synthase
Gart          "Gart" locus (-> GARS, AIRS, GART)
GART          Glycinamide ribonucleotide transformylase
gC            Glycoprotein C
G-CSF         Granulocyte colony stimulating factor
gD            Glycoprotein D
GdX           X-linked gene downstream of G6PD gene
gE            Glycoprotein E
GFAP          Glial fibrillary acidic protein
gln           Glutamine
glucc         Glucocorticoid
GM-CSF        Granulocyte/Macrophage colony stimulating factor
gp            Glycoprotein
GPD           Glycerol-3-phosphate dehydrogenase
GRF           Growth hormone-releasing factor
GRP           Glycine-rich (cell wall) protein
GS17          Gastrula-specific transcript 17
GSHPx         Glutathione peroxidase
G-spec.       Gastrula-specific
GST           Glutathione S-transferase
H             1. Heavy chain, 2. Housekeeping-type promoter
Ha-ras        Rat-derived Harvey murine sarcoma virus oncogene
hb            "hunchback" locus
Hc            High-cysteine (chorion protein)
HGT           High-(glycine+tyrosine) keratin
hist.         Histone
HMG-CoA       3-Hydroxy-3-methylglutaryl coenzyme A
HPRT          Hypoxanthine phosphoribosyltransferase
hs            Heatshock
hsc           Constitutive analogue of heatshock gene/protein
HSF           Hepatocyte-stimulating factor
hsp           Heatshock protein
HTF           Restriction endonuclease HpaII tiny fragments
IAP           Intracisternal A-particles
ICP           Infected cell protein
IE            Immediate early (gene, RNA)
IF            Intermediate filament
IFI           Interferon-induced gene/protein
IFN           Interferon
Ig            Immunoglobulin
IGF           Insulin-like growth factor
IL            Interleukin
inf.          Infected
inh.          Inhibitor
ISG           Interferon-stimulated gene
kin.          Kinase
Ki-ras        Rat-derived Kirsten murine sarcoma virus oncogene
L             1. Light chain; 2. Late
larva-1,2,..  First, second, .. instar larva
LCAT          Lecithin-cholesterol acyltransferase
LDH           Lactate dehydrogenase
leghem.       Leghemoglobin
LeIF          Leukocyte interferon
LH            Luteinizing hormone
LHC           Light-harvesting complex
LMW           Low molecular weight
LPH           Lipotropic hormone
LPS           Lipopolysaccharide
MBP           Myelin basic protein
(MAC)         Macaque
MC            Methylcholanthrene
MCK           Muscle-specific creatine kinase
mGK           Submaxillary gland kallikrein
MHCI/MHCII    Class I/II transplantation antigens of major histocompatibility complex
MIF           Macrophage migration inhibitory factor
mit           Mitochondrial
mononuc-c.    Mononuclear cells
MOPC..        Mineral oil-induced plasmacytoma
mos           Moloney murine sarcoma virus oncogene
MP            Macrophage
MPC..         Mouse plasma cell tumor
MRP           MIF-related protein (see MIF)
MSF           Megakaryocyte stimulating factor
msp           Major sperm protein gene
MT            Metallothionein
mst           Male-specific transcript
MUP           Major urinary protein
myb           (Avian) myeloblastosis virus oncogene
myc           Myelocytomatosis virus 29 oncogene
neu           Ethylnitrosourea-induced rat neuroblastoma oncogene
neuropep.     Neuropeptide
NGF           Nerve growth factor
ninaE         "neither inactivation nor afterpotential" locus E
nos           Nopaline synthetase
N-ras         Neuroblastoma ras-like (-> Ha-ras) oncogene
NS            Nervous system
ocs           Octopine synthetase
Ori           Origin of replication
ovalb.        Ovalbumin
p.            Protein
P-450         Cytochrome P-450
p53           53K phosphoprotein
panc.         Pancreas, pancreatic
parath.       Parathyroid
PB            Phenobarbital
PBGD          Porphobilinogen deaminase
PDGF          Platelet-derived growth factor
PEPCK         Phosphoenolpyruvate carboxykinase
PG            Prostaglandin
PHA           Phytohemagglutinin
PK            Protein kinase
P_L           Late promoter
POL           Polymerase
POMC          Proopiomelanocortin
pp..          Phosphoprotein ..
PR1a          Pathogenesis-related protein 1a
PRL           Prolactin
prog.         Progesterone
PrP           Prion protein
PSBP          Prostatic steroid binding protein
PSP           Parotid secretory protein
pTiN          Nopaline type tumor inducing plasmid
pTiO          Octopine type tumor inducing plasmid
r             "rudimentary" locus
R             Regulatory subunit
ras           Homologue of -> Ha-ras, Ki-ras, etc.
rec.          Receptor
red.          Reductase
reg.          Regulated
rep-dep.      Replication-dependent
RNR1, RNR2    Ribonucleotide reductase large, small subunit
rp            Ribosomal protein
rTn           Retrotransposon
RuBPCss       Ribulose-1,5-bisphosphate carboxylase small subunit
s.            Small
saliv-g.      Salivary gland
SBP           Spermine-binding protein
sem-v.        Seminal vesicle
ser.          Serum
sgs           Salivary gland secretion protein
sis           Simian sarcoma virus oncogene
sk-m.         Skeletal muscle
skel-m.       Skeletal muscle
smooth-m.     Smooth muscle
snRNA         Small nuclear RNA
SOD           Superoxide dismutase
som           Somatic
spat-reg.     Spatially regulated
sry           "serendipity" locus
SV40T         Tumor antigen of simian virus 40 (SV40)
SVS           Seminal vesicle secretory protein
synt.         Synthase
T3d'          T-cell antigen receptor-associated T3-complex delta chain
TAT           Tyrosine aminotransferase
TCDD          2,3,7,8-Tetrachlorodibenzo-p-dioxin
TCGF          T-cell growth factor
TCR           T-cell receptor
TdT           Terminal deoxynucleotidyltransferase
test.         Testis
TF            Transcription factor
TH            Tyrosine hydroxylase
thyr.         Thyroxine
Thy-1.2       Thy-1 (thymocyte) antigen/glycoprotein allotype 2
TIF           Trans-inducing factor
tis.          Tissue
TM            Tropomyosin
tmr           "tumor morphology root" locus
TNF           Tumor necrosis factor
TnT           Troponin T (tropomyosin-binding subunit)
TO            Tryptophan oxygenase
TPA           12-O-tetradecanoyl-phorbol-13-acetate
TPI           Triosephosphate isomerase
tr., tr       Transcript
TRF           T-cell replacing factor
TS            Thymidylate synthetase
TSH           Thyroid stimulating hormone
T/t           Large/small T (tumor) antigen
Ubx           "ultrabithorax" locus
uPA           Urine plasminogen activator
URO-D         Uroporphyrinogen decarboxylase
Vg1           Vegetal hemisphere-specific mRNA 1
vir-inf.      Viral infection
VL30          Retrovirus-like 30s RNA
V_NP          (Immunoglobulin heavy chain) variable region specific for 4-hydroxy-3-nitrophenylacetyl
VP5           Virion protein 5 (HSV-1/2: = major capsid protein)
VSP           Virion stimulatory protein
vWf           von Willebrand factor