Eukaryotic Promoter Database

User Manual

Release 29, November 1991


Written by:

    Philipp Bucher
    Biocomputing
    Institut Suisse de Recherches Experimentales sur le Cancer
    Ch. des Boveresses 155
    CH-1066 Epalinges s/Lausanne
    Switzerland

    Electronic mail: Philipp.Bucher@Isrec.Arcom.CH

This manual and the database it accompanies may be copied and redistributed freely, without advance permission, provided that this statement is reproduced with each copy.

Published research assisted by the Eukaryotic Promoter Database should cite:

    Philipp Bucher (1991). The Eukaryotic Promoter Database of the Weizmann Institute of Science. EMBL Nucleotide Sequence Data Library Release 29, Postfach 10.2209, D-6900 Heidelberg.


CONTENTS

    1.       INTRODUCTION
    2.       PROMOTER SELECTION
    3.       ASSIGNMENT OF TRANSCRIPTION INITIATION SITE
    4.       FORMAT CONVENTIONS
    4.1.       The title line
    4.2.       Promoter entries
    4.2.1.       The FP line
    4.2.2.       Documentation
    4.2.3.       Literature references
    4.3.       Discarded entries
    4.4.       Miscellaneous
    5.       CLASSIFICATION
    6.       HOMOLOGOUS PROMOTERS
    7.       PROMOTER SEQUENCE RETRIEVAL

    APPENDIX A   SURVEY OF RELEASE 29

    APPENDIX B   CODES AND ABBREVIATIONS
    B.1        SPECIES CODES
    B.2        JOURNAL CODES
    B.3        ABBREVIATIONS


1  INTRODUCTION

The Eukaryotic Promoter Database (EPD) is developed and maintained at the Weizmann Institute of Science in Rehovot (Israel) and distributed as a supplement to the EMBL Data Library. It provides information about eukaryotic promoters available in the EMBL Data Library and is intended to assist experimental researchers, as well as computer analysts, in the investigation of eukaryotic transcription signals. The present version originated from a previous compilation published in an article (1) and is organized as a hierarchically ordered and documented "functional position set" (2) pointing to transcription initiation sites.

All information is directly abstracted from scientific literature and is thus independent of the EMBL sequence entry descriptions. As a consequence, many of the initiation sites referred to in EPD do not appear in the corresponding EMBL feature tables.

A co-ordinated updating procedure has been set up by the two laboratories that will ensure future compatibility between the position references in EPD and the sequence data in the main data library. Investigators who access EMBL via publicly available programs should be aware that software producers occasionally modify the sequence data in ways that render position references inaccurate. EPD is generally not compatible with the sequence data of another release because EMBL sequence entries are not designed as stable data units.

The completeness and accuracy of EPD benefit greatly from user feedback. Any report of mistakes or omissions would be very much appreciated. Direct communication of newly published transcript mapping or gene expression data is also welcome. Please forward all correspondence to the address given at the top of this document. Use electronic mail if possible.


2  PROMOTER SELECTION

EPD is a rigorously selected database. In order to be included in EPD, a promoter must be:

1. recognized by eukaryotic RNA POL II,

2. active in a higher eukaryote,

3. experimentally defined or sufficiently homologous to an experimentally defined promoter,

4. biologically functional,

5. available in the current EMBL release,

6. distinct from other promoters in the database.

Explanations:

1. Transcription by RNA POL II is bona fide assumed for protein coding genes but must be supported by alpha-amanitin data if the end product is an RNA.

2. All eukaryotes except phycophyta, fungi, myxomycetes, and protozoa are considered higher eukaryotes. Note that the expression "active in" does not always refer to the source organism of the promoter (e.g. in viruses).

3. A promoter is experimentally determined if a corresponding transcription initiation site is mapped with a precision of +/- 5 bp or higher. Any technique that characterizes the 5'terminus of an in vivo or in vitro generated RNA is acceptable. Single nuclease-protection or primer-extension data must be accompanied by additional evidence unless the gene's intron-exon organization is well established. Homology is considered "sufficient" if similarity (see section 6) is >=60% between -79 and +20 or >=75% between -49 and +10.

4. A promoter is biologically functional if it contributes to the source organism's survival and/or reproduction. This is bona fide assumed except for promoters of pseudogenes, minor transcription initiation sites (<20% of total gene transcripts), promoters giving rise to an unstable RNA product, and mutant promoters.

5. The minimum sequence requirement is 45 bp between -49 and +10.

6. Promoters are considered distinct if they originate from different gene loci or different species. Identity is assumed if two promoters from the same species exhibit >95% similarity between -79 and +20 while their genetic relationship is unknown. Multiple isolates of viruses or transposable elements are considered distinct if at least one promoter region fails to fulfill the above similarity criterion.
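The numeric thresholds in explanations 3, 4 and 6 can be written down as simple predicates. The Python fragment below is only an illustrative sketch and not part of the EPD software: all function names are hypothetical, and the similarity scorer is a deliberate simplification that assumes two already aligned, gap-free segments of equal length, whereas EPD derives its scores from optimal alignments as described in section 6. The symbol scores (match 1.0, mismatch 0.0, any comparison involving N 0.5) follow the comparison table of section 6.

```python
def symbol_score(x, y):
    """Score one aligned symbol pair with the section 6 comparison table."""
    if x not in "ACGNT" or y not in "ACGNT":
        raise ValueError("symbols must be one of A, C, G, N, T")
    if x == "N" or y == "N":
        return 0.5                      # N scores 0.5 against everything
    return 1.0 if x == y else 0.0


def percent_similarity_ungapped(seg_a, seg_b):
    """Alignment score divided by segment length, times 100.

    Assumption: both segments are pre-aligned without gaps, so no gap
    weights are applied here.
    """
    if len(seg_a) != len(seg_b) or not seg_a:
        raise ValueError("segments must be non-empty and of equal length")
    score = sum(symbol_score(x, y) for x, y in zip(seg_a, seg_b))
    return score / len(seg_a) * 100.0


def is_sufficiently_homologous(sim_m79_p20, sim_m49_p10):
    """Explanation 3: >=60% between -79 and +20, or >=75% between -49 and +10."""
    return sim_m79_p20 >= 60.0 or sim_m49_p10 >= 75.0


def is_minor_initiation_site(fraction_of_transcripts):
    """Explanation 4: a site used by <20% of total gene transcripts."""
    return fraction_of_transcripts < 0.20


def is_presumed_identical(sim_m79_p20, same_species, relationship_known):
    """Explanation 6: same species, unknown relationship, >95% similarity."""
    return same_species and not relationship_known and sim_m79_p20 > 95.0
```

For instance, a promoter with 58% similarity in the long window but 80% in the short window would still pass criterion 3, because either threshold is sufficient.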
3  ASSIGNMENT OF TRANSCRIPTION INITIATION SITE

A eukaryotic promoter is defined as a DNA sequence around a transcription initiation site. The position reference to the initiation site is therefore the central part of a promoter entry. Its assignment is based directly on the experimental data shown in an article; adjustments proposed on the basis of consensus sequence considerations are ignored. Averaged positions are given if the results of competing groups show minor discrepancies or if the experiments suggest multiple initiation sites (see below).

Position references are subject to permanent re-evaluation. A transcription initiation site may be reassigned upon publication of new data. Position references are replaced when more upstream sequences of the same promoter become available in a new EMBL sequence entry.

Multiple initiation sites preceding the same structural gene appear as alternative promoters if they are clearly separated from each other or differentially regulated. Otherwise, they are considered a single promoter region. The minimum distance required between two alternative initiation sites is 20 bp.

Three types of promoters are distinguished by one-letter codes in order to account for the variety of transcription initiation patterns in eukaryotes:

S:  Single initiation site: >90% of all reported transcripts initiate within 10 bp (the experimental data usually do not allow distinction between a single cap-site and small mRNA 5'heterogeneity).

M:  Multiple initiation sites: >75% of all reported transcripts initiate within 20 bp.

R:  Initiation region: <75% of all reported transcripts initiate within 20 bp.

In sequence entries that contain a complete RNA or DNA genome of a retrovirus or a retrovirus-like transposable element, the position reference points to the U3/R boundary of the 3'terminal LTR.


4  FORMAT CONVENTIONS

EPD is distributed as a single file containing a title line followed by a number of promoter entries. Interspersed are group headings whose function and format are described in the next section. The title line and parts of the promoter entries are rigidly formatted so that the entire database conforms to the standards of an FPS file (functional position set) of our current signal search analysis (1,2) software.


4.1.  The title line

The title line of EPD is shown below:

TI   EPD29     Eukaryotic Promoter Database / Release 29              EP

The TI line contains the following fields:

   columns   data type

    1- 2     "TI"
    3- 5     (blank)
    6-15     FPS name
   16-70     title
   71-72     FPS code

Explanations:

FPS name and FPS code are used by our data extraction software to generate default names for output files.


4.2.  Promoter entries

Promoter entries are presented in a format similar to that of EMBL sequence entries. All lines start with a two-letter code. Columns 3 to 5 are blank and text does not exceed column 72. Each entry starts with an FP line that contains a position reference to a transcription initiation site, and ends with a terminator (//). Spacer lines (XX) are inserted in order to make the promoter database easier to read by eye.

Below is an example of a promoter entry:

FP   Hs c-myc         P2+:+S  PRI:HSMYCC     1+    2490; 11148.053 010*2
XX
DO   Experimental evidence: 4,4#,<2>
DO   Expression/Regulation: +mitogen
RF   Cell34:779 EMBOJ2:2375 MCB7:1393 MCB7:2988
//


4.2.1.  The FP line

The FP line contains the following fields and subfields:

   columns   data type

    1- 2     "FP"
    3- 5     (blank)
    6-30     description:
    6-25       promoter name
   26-26       ":"
   27-27       independent subset status (see section 6)
   28-28       type of initiation site (see section 3)
   29-29       (blank)
   30-30       hybrid sequence warning
   31-55     functional position reference:
   31-45       sequence reference:
   31-33         EMBL division code
   34-34         ":"
   35-45         Entry name
   46-46       sequence type (0 = circular, 1 = linear)
   47-47       strand (+ or -)
   48-55       position number
   56-56     ";"
   57-62     entry code
   63-63     "."
   64-66     homology group number (see section 6)
   67-67     (blank)
   68-72     alternative promoter identification code:
   68-70       gene number
   71-71       "*"
   72-72       Initiation site number

Explanations:

The promoter name begins with a species code usually followed by a gene locus or gene product name. Species codes consist of the initials of genus and species name. Occasionally, three characters are required to generate unique codes. Standard abbreviations identify viruses. The full names of the organisms are given in appendix B.1. Subspecies or strains are specified in parentheses. Chromosomal locations (genetic or cytogenetic loci, genomic map units, etc.) appear in square brackets immediately following species codes. Many gene products are referred to by abbreviations explained in appendix B.3. Alternative initiation sites are identified by right-justified P1,P2.., or E1,E2.., depending on whether the corresponding 5'exons are 3'co-terminal or not. The strongest initiation site is marked by a trailing + if known.

Hybrid sequence warning: ! indicates that the adjacent position reference points to a sequence generated by fusion of multiple DNA sequences from different sources (e.g. genomic upstream region + cDNA).

The entry code is a five-digit number which is the only part of a promoter entry that is stable from release to release. The first two digits designate the release of initial appearance.

Alternative promoter identification code: Genes represented by multiple promoter entries in EPD are assigned a gene number. The corresponding initiation sites are numbered sequentially from 5' to 3'.


4.2.2.  Documentation

Documentation of promoter entries is presented on lines starting with "DO". They are essentially free format and so far not processed by specific programs. In the present release, there are two DO lines per entry, the first referring to the transcript mapping experiments that define the promoter, the second giving information about expression and regulation.
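Returning briefly to the FP line of section 4.2.1: because its layout is column-based, it can be read with plain string slices. The following Python fragment is a minimal sketch under stated assumptions, not the author's extraction software. The names `FPLine`, `col` and `parse_fp_line` are hypothetical, and the example line is assembled field by field from the c-myc entry of section 4.2, with the padding inside each field (left- or right-justification) assumed rather than taken from the database file.

```python
from typing import NamedTuple


class FPLine(NamedTuple):
    name: str
    independent: bool      # "+" in column 27
    site_type: str         # S, M or R (section 3)
    hybrid: bool           # "!" in column 30
    division: str
    entry_name: str
    linear: bool           # column 46: 0 = circular, 1 = linear
    strand: str
    position: int
    entry_code: str
    homology_group: str
    gene_number: str
    site_number: str


def col(line, first, last):
    """Return columns first..last (1-based, inclusive) of a line."""
    return line[first - 1:last]


def parse_fp_line(line):
    """Split one FP line into the fields listed in section 4.2.1."""
    line = line.rstrip("\n").ljust(72)
    if col(line, 1, 2) != "FP":
        # Discarded entries start with "F-" (section 4.3) and end up here.
        raise ValueError("not an FP line")
    return FPLine(
        name=col(line, 6, 25).strip(),
        independent=col(line, 27, 27) == "+",
        site_type=col(line, 28, 28),
        hybrid=col(line, 30, 30) == "!",
        division=col(line, 31, 33).strip(),
        entry_name=col(line, 35, 45).strip(),
        linear=col(line, 46, 46) == "1",
        strand=col(line, 47, 47),
        position=int(col(line, 48, 55)),
        entry_code=col(line, 57, 62).strip(),
        homology_group=col(line, 64, 66).strip(),
        gene_number=col(line, 68, 70).strip(),
        site_number=col(line, 72, 72).strip(),
    )


# Example line built column by column (field padding is an assumption).
example = (
    "FP" + " " * 3                          # columns  1- 5
    + "Hs c-myc".ljust(17) + "P2+"          # columns  6-25
    + ":" + "+" + "S" + " " + " "           # columns 26-30
    + "PRI" + ":" + "HSMYCC".ljust(11)      # columns 31-45
    + "1" + "+" + "2490".rjust(8)           # columns 46-55
    + ";" + "11148".rjust(6) + "."          # columns 56-63
    + "053" + " " + "010" + "*" + "2"       # columns 64-72
)
fp = parse_fp_line(example)
print(fp.entry_name, fp.strand, fp.position, fp.entry_code)
```

Slicing by column rather than splitting on whitespace is the safer choice here, since promoter names may themselves contain blanks.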
The various experimental techniques are identified by number codes:

   codes   experiments

     1     direct RNA sequencing
     2     length measurement of an RNA product
     3     length measurement of a nuclease-protected complementary RNA or
           DNA fragment by comparison with homologous sequence ladder
     4     same as 3 but with heterologous size markers
     5     RNA sequencing by dideoxy-terminated primer extension
     6     DNA sequencing of an in vitro generated strong-stop cDNA or a
           full-length cDNA clone
     7     length measurement of a primer-extension product by comparison
           with homologous sequence ladder
     8     same as 7 but with heterologous size markers
     9     DNA sequencing of a full-length processed pseudogene

Special characters appended to the number codes designate an experimental gene expression system where the RNA for the corresponding experiments was synthesized.

     *     RNA POL II in vitro system
     o     injected amphibian oocytes
     #     transfected or transformed cells, injected neurons
     !     transgenic organisms

Explanations and additional conventions:

- The full-length assumption of a cDNA clone or a processed pseudogene is based on consistency with accompanying nuclease-protection or primer-extension data or, alternatively, the existence of multiple 5'co-terminal clones or pseudogenes.

- Codes given in parentheses refer to experiments performed with closely related genes and RNAs.

- Angle brackets enclose low-precision data (error > +/- 5 bp).

The information on expression/regulation may include indication of developmental stages, tissues, cell types, cell cycle stages, and various regulatory features.

Conventions:

- Semicolon delimits different types of specifications (e.g. developmental stage and tissue).

- Comma delimits alternative keywords (e.g. liver, kidney).

- "+" means "induced by" or "strongly expressed in".

- "-" means "repressed by" or "weakly expressed in".

- "~" means "modulated by".

- Cell cycle stages are given in square brackets.


4.2.3.  Literature references

References are given on lines starting with "RF" in highly condensed form beginning with a journal code explained in appendix B.2. They primarily point to the articles where the experimental promoter evidence is presented. Additional potential subjects are homology to other promoters, gene expression and regulation, and nomenclature. Papers containing only sequence data are usually not referred to because they are easy to find via the corresponding EMBL sequence entry descriptions.


4.3.  Discarded entries

Evaluation of new data occasionally suggests that an existing promoter entry should be discarded according to the criteria laid down in section 2. This is done by changing "FP" at the beginning of an entry into "F-". An explanation is given on the second line. At present, no entry is physically deleted.


4.4.  Miscellaneous

- Greek letters are represented by corresponding latin letters followed by an apostrophe:

     a' = alpha     b' = beta      g' = gamma     d' = delta
     e' = epsilon   z' = zeta      h' = eta       th'= theta
     k' = kappa     l' = lambda    n' = nu

- Sub- and superscripts are indicated by preceding "_" and "^", respectively.


5  CLASSIFICATION

The entries of the Eukaryotic Promoter Database are embedded in a hierarchical classification system. A promoter's taxonomic location is made clear by interspersed group headings. The example shown below is taken from the top of the database. A contrasting format has been chosen to emphasize the very different nature of this information.

*---------------------------------------------------------------------*
* 1. Plant promoters                                                  *
*---------------------------------------------------------------------*
* 1.1. Chromosomal genes                                              *
*---------------------------------------------------------------------*
* 1.1.1. Small nuclear RNAs                                           *
*---------------------------------------------------------------------*

A group heading consists of a series of node numbers and a title. The highest classification level distinguishes between promoters active in major eukaryotic taxa (phyla). Further below, grouping considers replicon type and functional properties of gene products. On the lowest level, homology (as defined in section 6) is the criterion. A survey of the upper part of the classification pyramid is presented in appendix A.

The proposed classification system is highly tentative, as it is often unclear how a new promoter should be classified, especially if the gene product is a multifunctional protein. Users should therefore not be surprised or discouraged if they do not find a promoter at the initially expected place.


6  HOMOLOGOUS PROMOTERS

Homology is defined as sequence similarity due to common phylogenetic origin. In EPD, two promoters are considered homologous if they exhibit >=50% sequence similarity between -79 and +20. Similarity is calculated from optimal alignments generated with the aid of the UWGCG subroutine ShiftAlign (3) using the following symbol comparison table:

           A     C     G     N     T

     A    1.0
     C    0.0   1.0
     G    0.0   0.0   1.0
     N    0.5   0.5   0.5   0.5
     T    0.0   0.0   0.0   0.5   1.0

Gap weight and gap length weight are specified as 3 and 0, respectively. Terminal gaps are ignored. Percent similarity is understood as alignment score divided by segment length, times 100.

Groups of homologous promoters are identified by homology group numbers (see 4.2.1.). Definition of these groups is based on similarity scores as defined above and a tree generation method called UPGMA (4). A few scores between 50% and 56% obtained from alignments of supposedly unrelated promoters were ignored, as were those resulting from alternative promoters spaced by <=50 bp.

A subset of "independent" promoters is marked by "+" in column 27 of the FP line.
This set contains only one member per homology group (usually, the promoter with the longest upstream sequence available) and is intended for statistical analysis of functional patterns, where it is important to avoid bias from multiple closely related sequences.


7  PROMOTER SEQUENCE RETRIEVAL

Promoter sequence listings have not been incorporated into EPD for two reasons: (i) to avoid duplication of data already existing elsewhere in the EMBL data library, and (ii) to encourage usage of FPS-dependent sequence retrieval programs, which enable users to specify suitable 5'- and 3'boundaries of the requested sequence segments themselves. Effort is under way to motivate producers of standard nucleotide sequence analysis packages to provide such tools in the future. In the meantime, users with some programming experience will find it easy to write their own routines. Our local sequence extraction programs run in a UWGCG environment (3) and have been implemented at several sites in Europe and the United States. They are documented and freely available on request.


References:

1. Bucher,P. & Trifonov,E.N., "Compilation and analysis of eukaryotic POL II promoter sequences", Nucl. Acids Res. 14, 10009-10026 (1986).

2. Bucher,P. & Bryan,B., "Signal search analysis: a new method to localize and characterize functionally important DNA sequences", Nucl. Acids Res. 12, 287-305 (1984).

3. Devereux,J., Haeberli,P., & Smithies,O., "A comprehensive set of sequence analysis programs for the VAX", Nucl. Acids Res. 12, 387-395 (1984).

4. Sneath,P.H.A. & Sokal,R.R., "Numerical taxonomy", W.H. Freeman, San Francisco, London (1973).


APPENDIX A

SURVEY OF RELEASE 29

The first number is the number of promoter entries; the number in parentheses is the number of independent entries.

Total number of promoter entries (independent entries)       979 ( 651)

1.      Plant promoters                                       100 (  70)

1.1.    Chromosomal genes                                      88 (  59)
1.1.1.  Small nuclear RNAs                                      7 (   3)
1.1.2.  Structural proteins                                     9 (   9)
1.1.3.  Storage and transport proteins, apoproteins            23 (  13)
1.1.4.  Enzymes                                                28 (  19)
1.1.5.  Regulatory proteins                                     1 (   1)
1.1.6.  Proteins related to stress or pathogen defense          8 (   6)
1.1.7.  Unclassified                                           12 (   8)

1.2.    Prokaryotic plasmid DNA                                 8 (   7)
1.2.1.  Enzymes                                                 4 (   3)
1.2.2.  Unclassified                                            4 (   4)

1.3.    Viral genes                                             4 (   4)
1.3.1.  Geminiviruses                                           2 (   2)
1.3.2.  Cauliflower mosaic virus                                2 (   2)

2.      Nematode promoters                                      8 (   7)

2.1.    Chromosomal genes                                       8 (   7)
2.1.1.  Structural proteins                                     1 (   1)
2.1.2.  Storage and transport proteins, apoproteins             4 (   3)
2.1.3.  Hormones, growth factors, regulatory proteins           1 (   1)
2.1.4.  Proteins related to stress or pathogen defense          2 (   2)

3.      Arthropode promoters                                  153 ( 101)

3.1.    Chromosomal genes                                     147 (  95)
3.1.1.  Small nuclear RNAs                                      1 (   1)
3.1.2.  Structural proteins                                    67 (  34)
3.1.3.  Storage and transport proteins, apoproteins             7 (   5)
3.1.4.  Enzymes                                                21 (  10)
3.1.5.  Hormones, growth factors, regulatory proteins          24 (  23)
3.1.6.  Proteins related to stress or pathogen defense         13 (   9)
3.1.7.  Unclassified                                           14 (  13)

3.2.    Transposable elements and retroviruses                  2 (   2)
3.2.1.  Long terminal repeats                                   2 (   2)

3.3.    Viral genes                                             4 (   4)
3.3.1.  Nuclear polyhedrosis viruses (early genes only)         4 (   4)

4.      Mollusc promoters                                       3 (   3)

4.1.    Chromosomal genes                                       3 (   3)
4.1.1.  Hormones, growth factors, regulatory proteins           3 (   3)

5.      Echinoderm promoters                                   27 (  17)

5.1.    Chromosomal genes                                      27 (  17)
5.1.1.  Small nuclear RNAs                                      2 (   1)
5.1.2.  Structural proteins                                    24 (  15)
5.1.3.  Storage and transport proteins, apoproteins             1 (   1)

6.      Vertebrate promoters                                  688 ( 453)

6.1.    Chromosomal genes                                     530 ( 338)
6.1.1.  Small nuclear RNAs                                     25 (   7)
6.1.2.  Structural proteins                                   108 (  82)
6.1.3.  Storage and transport proteins, apoproteins            95 (  47)
6.1.4.  Enzymes                                                88 (  65)
6.1.5.  Hormones, growth factors, regulatory proteins         140 (  86)
6.1.6.  Proteins related to stress or pathogen defense         52 (  35)
6.1.7.  Unclassified                                           22 (  16)

6.2.    Transposable elements and retroviruses                 30 (  13)
6.2.1.  Long terminal repeats                                  30 (  13)

6.3.    Viral genes                                           128 ( 102)
6.3.1.  Herpes viruses (not EBV)                               48 (  42)
6.3.2.  Epstein-Barr virus and other g'-Herpesviruses          23 (  23)
6.3.3.  Adenoviruses                                           24 (  12)
6.3.4.  Papilloma viruses                                       9 (   8)
6.3.5.  Parvoviruses                                            6 (   6)
6.3.6.  Papovaviruses (not papilloma)                           8 (   5)
6.3.7.  Hepadnaviruses                                         10 (   6)


APPENDIX B.1

SPECIES CODES

Code                Scientific name (English name)

AAV2                Adeno-associated virus 2
Ac                  Aplysia californica (gastropod mollusk)
AcNPV               Autographa californica nuclear polyhedrosis virus
Ad2                 Human adenovirus type 2
Ad5                 Human adenovirus type 5
Ad7                 Human adenovirus type 7
Ad12                Human adenovirus type 12
Ag                  Ateles geoffroyi (spider monkey)
ALV                 Avian leukemia virus
Am                  Antirrhinum majus (snapdragon)
A-MLV               Abelson murine leukemia virus
Ap                  Antheraea polyphemus (silkmoth)
At (plants)         Arabidopsis thaliana (fam. cruciferae)
At (vertebrates)    Aotus trivirgatus (owl or night monkey)
At[pTi..            Agrobacterium tumefaciens Ti plasmid
Ay                  Antheraea yamamai (Japanese oak silkmoth)
B19                 Human parvovirus B19
BKV                 (Human) papovavirus BK
BLV                 Bovine leukemia virus
Bm                  Bombyx mori (silkmoth)
BPV1                Bovine papilloma virus type 1
Bt                  Bos taurus (cattle)
CaMV                Cauliflower mosaic virus
Cc                  Cricetus cricetus (Chinese hamster)
Cco                 Coturnix coturnix (Quail)
Ce                  Caenorhabditis elegans (nematode)
Ch                  Capra hircus (goat)
Cl                  Canis lupus (dog)
Cm                  Cairina moschata (duck)
Ct                  Chironomus thummi (midge)
Dc                  Daucus carota (carrot)
Df                  Drosophila funebris (fruit fly)
Dh                  Drosophila hydei (fruit fly)
DHBV                Duck hepatitis virus
Dm                  Drosophila melanogaster (fruit fly)
Dma                 Drosophila mauritiana (fruit fly)
Dmo                 Drosophila mojavensis (fruit fly)
Dmu                 Drosophila mulleri (fruit fly)
Do                  Drosophila orena (fruit fly)
Dp                  Drosophila pseudoobscura (fruit fly)
Ds                  Drosophila simulans (fruit fly)
Dse                 Drosophila sechellia (fruit fly)
Dv                  Drosophila virilis (fruit fly)
EBV                 (Human) Epstein-Barr virus
Ec                  Equus cavallus (horse)
FBJ-MSV             Finkel-Biskis-Jinkins murine osteosarcoma virus
FBR-MSV             Finkel-Biskis-Reilly murine osteosarcoma virus
F-MCF               (Murine) Friend mink cell focus-inducing virus
Fs                  Felis silvestris (cat)
F-SFFV              (Murine) Friend spleen focus forming virus
GA-FeLV             Gardner-Arnstein feline leukemia virus
GALV                Gibbon ape leukemia virus
Gg                  Gallus gallus (chicken)
Gg[ev1]             (Avian) endogenous virus 1
Ggo                 Gorilla gorilla (gorilla)
Gm                  Glycine max (soybean)
GSHV                Ground squirrel hepatitis virus
H-1                 (Murine) H-1 parvovirus
HBV                 Human hepatitis B virus
HCMV                Human cytomegalovirus
Hg                  Halichoerus grypus (grey seal)
HIV-1               Human immunodeficiency virus type 1
HIV-2               Human immunodeficiency virus type 2
HPV16               Human papilloma virus 16
HPV18               Human papilloma virus 18
Hs                  Homo sapiens (man)
HSV-1               Human herpes simplex virus type 1
HSV-2               Human herpes simplex virus type 2
HTLV-I              Human T-cell leukemia virus type I
HTLV-II             Human T-cell leukemia virus type II
Hv                  Hordeum vulgare (barley)
HVS                 Herpesvirus saimiri
JCV                 (Human) papovavirus JC
Le (plants)         Lycopersicon esculentum
Le (vertebrates)    Lepus europaeus (hare)
Lm                  Locusta migratoria
Lp                  Lytechinus pictus (sea urchin)
Lv                  Lytechinus variegatus (sea urchin)
Ma                  Mesocricetus aureus (golden hamster)
Mc                  Macaca cynomolgus (macaque)
MCF                 Mink cell focus-inducing virus
MCMV                Murine cytomegalovirus
MLV                 Murine leukemia virus
Mm                  Mus musculus (mouse)
M-MLV               Moloney murine leukemia virus
M-MSV               Moloney murine sarcoma virus
MMTV                Mouse mammary tumor virus
Ms                  Medicago sativa (alfalfa)
MSV                 Maize streak virus
Np                  Nicotiana plumbaginifolia
Nt                  Nicotiana tabacum (tobacco)
Oa                  Ovis aries (sheep)
Oc                  Oryctolagus cuniculus (rabbit)
Or                  Oryza sativa (rice)
Ph                  Petunia hybrida (e.g. Petunia strain Mitchell)
Pa                  Papio anubis (olive baboon)
Pc                  Petroselinum crispum (parsley)
Pm                  Psammechinus miliaris (sea urchin)
Polyoma             (Murine) polyoma virus
Pp (arthropodes)    Photinus pyralis
Pp (vertebrates)    Pongo pygmaeus (orangutan)
Ps                  Pisum sativum (pea)
Pt                  Pan troglodytes (chimpanzee)
Pv                  Phaseolus vulgaris (french bean, kidney bean)
RAV2                (Avian) Rous associated virus type 2
Rc                  Ricinus communis
R-MCF               (Murine) Rauscher mink cell focus-inducing virus
Rn                  Rattus norvegicus (rat)
RSV                 (Avian) Rous sarcoma virus
SA7P                Simian adenovirus 7P
Sd                  Strongylocentrotus drobachiensis (sea urchin)
Se                  Spalax ehrenbergi (blind mole rat)
Sg                  Salmo gairdneri (rainbow trout)
SIV-III             Simian immunodeficiency virus type III
SNV                 (Avian) spleen necrosis virus
So                  Spinacia oleracea (spinach)
Sp (arthropodes)    Sarcophaga peregrina (flesh fly)
Sp (echinoderms)    Strongylocentrotus purpuratus (sea urchin)
Ss                  Sus scrofa (pig)
SSV                 Simian sarcoma virus
St                  Solanum tuberosum (potato)
SV40                Simian virus 40
Ta                  Triticum aestivum (wheat)
Visna               Visna lentivirus
Xl                  Xenopus laevis (clawed frog)
Xt                  Xenopus tropicalis (clawed frog)
Zm                  Zea mays (maize)


APPENDIX B.2

JOURNAL CODES

Code      Journal Name

ARB       Annual Review of Biochemistry
ARP       Annual Review of Physiology
BBA       Biochimica Biophysica Acta
BBRC      Biochemical and Biophysical Research Communications
Bch       Biochemistry
Bchi      Biochimie
BchJ      Biochemical Journal
BrJR      British Journal of Rheumatology
Btech     Biotechnology
CanR      Cancer Research
Cell      Cell
Chrom     Chromosoma
CSHS      Cold Spring Harbor Symposia on Quantitative Biology
CTMI      Current Topics in Microbiology and Immunology
CurG      Current Genetics
DNA       DNA
DevB      Developmental Biology
EJBc      European Journal of Biochemistry
EMBOJ     EMBO Journal
Evo       Evolution
FEBS      FEBS Letters
GDev      Genes and Development
Gene      Gene
Genom     Genomics
Gnts      Genetics
ImTo      Immunology Today
JBC       Journal of Biological Chemistry
JBch      Journal of Biochemistry
JCB       Journal of Cell Biology
JEM       Journal of Experimental Medicine
JGV       Journal of General Virology
JMAG      Journal of Molecular and Applied Genetics
JMB       Journal of Molecular Biology
JME       Journal of Molecular Evolution
JVir      Journal of Virology
MBE       Molecular Biology and Evolution
MBM       Molecular Biology and Medicine
MBR       Molecular Biology Reports
MCB       Molecular and Cellular Biology
MEnd      Molecular Endocrinology
MEnz      Methods in Enzymology
MGG       Molecular and General Genetics
MNeub     Molecular Neurobiology
MPMI      Molecular Plant-Microbe Interactions
NAR       Nucleic Acids Research
Nat       Nature
Pla       Planta
PMB       Plant Molecular Biology
PSL       Plant Science Letters
PNAS      Proceedings of the National Academy of Sciences of the United
          States of America
Sci       Science
SCMG      Somatic Cell and Molecular Genetics
TiG       Trends in Genetics
Vir       Virology
VirR      Virus Research


APPENDIX B.3

ABBREVIATIONS

20-OHE          20-Hydroxyecdysone
4CL             4-coumarate coenzyme A ligase
a1              Gene locus 1 involved in anthocyanin biosynthesis
abd-g.          Abdominal ganglion
abl             Abelson murine leukemia virus oncogene
AChR            Acetylcholine receptor
ACTH            Adrenocorticotropic hormone
ADA             Adenosine deaminase
ADH             Alcohol dehydrogenase
ADPg-s GT       ADPglucose-starch glucosyltransferase
adult-HA        Adult hermaphrodite
AFW1            Adult fast-white (myosin heavy chain) 1
(AGM)           "from African green monkey"
AGP             Acid glycoprotein
AIRS            Aminoimidazole ribonucleotide synthase
ALA-synt.       5-Aminolevulinate synthase
ALDH_2          Aldehyde dehydrogenase 2
AlkExo          Alkaline exonuclease
Amy             Amylase
antp            "antennapedia" locus
aP2             Adipocyte homologue of myelin P2
apolipop.       Apolipoprotein
apoVLDLII       Very low density apolipoprotein II
APRT            Adenine phosphoribosyltransferase
AR              Adrenergic receptor
arg             Arginine
AS              Argininosuccinate synthetase
AS-C            "achaete-scute" complex locus
AspAT           Aspartate aminotransferase
ass.            Associated
AT              Antitrypsin
ATCase          Aspartate transcarbamylase
ATP             Adenosinetriphosphate
awd             "abnormal wing disk" locus
BB              Bowman-Birk (protease inhibitor)
Bcl-2           B-cell leukemia/lymphoma 2 proto-oncogene
b.p.            Binding protein
BPTI            Bovine pancreatic trypsin inhibitor
BSF             B-cell stimulating factor
bsg25D          Blastoderm specific locus 25D
c-              Cellular protooncogene ..
c1              Regulatory locus of anthocyanin synthesis (maize)
CA              Carbonic anhydrase
cab             Chlorophyll a/b-binding protein
cAMP            Cyclic AMP (Adenosinemonophosphate)
cc-ind.         Cell cycle-independent
CD3             T-cell differentiation antigen CD3
CD4             T-cell differentiation antigen CD4
CD8             T-cell differentiation antigen CD8
CG              Chorionic gonadotropin
CNS             Central nervous system
cp              Cytoplasm(ic)
CPSase          Carbamyl-phosphate synthase
CRP             C-reactive protein
cs              Cytosol(ic)
CSF             Colony stimulating factor
cyt             Cytokinin gene (coding for isopentenyltransferase)
dbp             DNA binding protein
DDC             DOPA decarboxylase
dep.            dependent
dev.            Development(ally)
DHFR            Dihydrofolate reductase
diff.           differentiation, differentiated
DL/R            Left and right duplicated region
dUTPase         Deoxyuridinetriphosphatase
E               1. Early, 2. Erythroid cell-specific
EBNA            Epstein-Barr virus nuclear antigens
EDF             Eosinophil differentiation factor
EFW1            Embryonic fast-white (myosin heavy chain) 1
EGF             Epidermal growth factor
EIa             Adenovirus early Ia region (transactivating element)
Eip             Ecdysone-induced protein
ELH             Egg-laying hormone
em              Embryo, embryonic
erbA,B          (Avian) erythroblastosis virus oncogene A,B
E-resp.         Estrogen-responsive
ERV3            Endogenous retrovirus 3
E.Tn            Early transposon
eve             "even-skipped" locus
f.              Factor
fibrob.         Fibroblasts
fos             FBJ (Finkel-Biskis-Jinkins) osteosarcoma virus oncogene
FSH             Follicle stimulating hormone
ftz             "fushi tarazu" locus
GA              Gibberellic acid
GADPH           Glyceraldehyde-3-phosphate dehydrogenase
GARS            Glycinamide ribonucleotide synthase
Gart            "Gart" locus (-> GARS, AIRS, GART)
GART            Glycinamide ribonucleotide transformylase
gC              Glycoprotein C
G-CSF           Granulocyte colony stimulating factor
gD              Glycoprotein D
GdX             X-linked gene downstream of G6PD gene
gE              Glycoprotein E
GFAP            Glial fibrillary acidic protein
gln             Glutamine
glucc           Glucocorticoid
GM-CSF          Granulocyte/Macrophage colony stimulating factor
gp              Glycoprotein
GPD             Glycerol-3-phosphate dehydrogenase
GRF             Growth hormone-releasing factor
GRP             Glycine-rich (cell wall) protein
GS17            Gastrula-specific transcript 17
GSHPx           Glutathione peroxidase
G-spec.         Gastrula-specific
GST             Glutathione S-transferase
H               1. Heavy chain, 2. Housekeeping-type promoter
Ha-ras          Rat-derived Harvey murine sarcoma virus oncogene
hb              "hunchback" locus
Hc              High-cysteine (chorion protein)
HGT             High-(glycine+tyrosine) keratin
hist.           Histone
HMG-CoA         3-Hydroxy-3-methylglutaryl coenzyme A
HPRT            Hypoxanthine phosphoribosyltransferase
hs              Heatshock
hsc             Constitutive analogue of heatshock gene/protein
HSF             Hepatocyte-stimulating factor
hsp             Heatshock protein
HTF             Restriction endonuclease HpaII tiny fragments
IAP             Intracisternal A-particles
ICP             Infected cell protein
IE              Immediate early (gene, RNA)
IF              Intermediate filament
IFI             Interferon-induced gene/protein
IFN             Interferon
Ig              Immunoglobulin
IGF             Insulin-like growth factor
IL              Interleukin
inf.            Infected
inh.            Inhibitor
ISG             Interferon-stimulated gene
kin.            Kinase
Ki-ras          Rat-derived Kirsten murine sarcoma virus oncogene
L               1. Light chain; 2. Late
larva-1,2,..    First, second, .. instar larva
LCAT            Lecithin-cholesterol acyltransferase
LDH             Lactate dehydrogenase
leghem.         Leghemoglobin
LeIF            Leukocyte interferon
LH              Luteinizing hormone
LHC             Light-harvesting complex
LMW             Low molecular weight
LPH             Lipotropic hormone
LPS             Lipopolysaccharide
MBP             Myelin basic protein
(MAC)           Macaque
MC              Methylcholanthrene
MCK             Muscle-specific creatine kinase
mGK             Submaxillary gland kallikrein
MHCI/MHCII      Class I/II transplantation antigens of major
                histocompatibility complex
MIF             Macrophage migration inhibitory factor
mit             Mitochondrial
mononuc-c.      Mononuclear cells
MOPC..          Mineral oil-induced plasmacytoma
mos             Moloney murine sarcoma virus oncogene
MP              Macrophage
MPC..           Mouse plasma cell tumor
MRP             MIF-related protein (see MIF)
MSF             Megakaryocyte stimulating factor
msp             Major sperm protein gene
MT              Metallothionein
mst             Male-specific transcript
MUP             Major urinary protein
myb             (Avian) myeloblastosis virus oncogene
myc             Myelocytomatosis virus 29 oncogene
neu             Ethyl-nitrosourea-induced rat neuroblastoma oncogene
neuropep.       Neuropeptide
NGF             Nerve growth factor
ninaE           "neither inactivation nor afterpotential" locus E
nos             Nopaline synthetase
N-ras           Neuroblastoma ras-like (-> Ha-ras) oncogene
NS              Nervous system
ocs             Octopine synthetase
Ori             Origin of replication
ovalb.          Ovalbumin
p.              Protein
P-450           Cytochrome P-450
p53             53K phosphoprotein
panc.           pancreas, pancreatic
parath.         Parathyroid
PB              Phenobarbital
PBGD            Porphobilinogen deaminase
PDGF            Platelet-derived growth factor
PEPCK           Phosphoenolpyruvate carboxykinase
PG              Prostaglandin
PHA             Phytohemagglutinin
PK              Protein kinase
P_L             Late promoter
POL             Polymerase
POMC            Proopiomelanocortin
pp..            Phosphoprotein ..
PR1a            Pathogenesis-related protein 1a
PRL             Prolactin
prog.           Progesterone
PrP             Prion protein
PSBP            Prostatic steroid binding protein
PSP             Parotid secretory protein
pTiN            Nopaline type tumor inducing plasmid
pTiO            Octopine type tumor inducing plasmid
r               "rudimentary" locus
R               Regulatory subunit
ras             Homologue of -> Ha-ras, Ki-ras, etc.
rec.            Receptor
red.            Reductase
reg.            Regulated
rep-dep.        Replication-dependent
RNR1, RNR2      Ribonucleotide reductase large, small subunit
rp              Ribosomal protein
rTn             Retrotransposon
RuBPCss         Ribulose-1,5-bisphosphate carboxylase small subunit
s.              Small
saliv-g.        Salivary gland
SBP             Spermine-binding protein
sem-v.          Seminal vesicle
ser.            Serum
sgs             Salivary gland secretion protein
sis             Simian sarcoma virus oncogene
sk-m.           Skeletal muscle
skel-m.         Skeletal muscle
smooth-m.       Smooth muscle
snRNA           Small nuclear RNA
SOD             Superoxide dismutase
som             Somatic
spat-reg.       Spatially regulated
sry             "serendipity" locus
SV40T           Tumor antigen of simian virus 40 (SV40)
SVS             Seminal vesicle secretory protein
synt.           Synthase
T3d'            T-cell antigen receptor-associated T3-complex delta
                chain
TAT             Tyrosine aminotransferase
TCDD            2,3,7,8-Tetrachlorodibenzo-p-dioxin
TCGF            T-cell growth factor
TCR             T-cell receptor
TdT             Terminal deoxynucleotidyltransferase
test.           testis
TF              Transcription factor
TH              Tyrosine hydroxylase
thyr.           Thyroxine
Thy-1.2         Thy-1 (thymocyte) antigen/glycoprotein allotype 2
TIF             Trans-inducing factor
tis.            Tissue
TM              Tropomyosin
tmr             "tumor morphology root" locus
TNF             Tumor necrosis factor
TnT             Troponin T (tropomyosin-binding subunit)
TO              Tryptophan oxygenase
TPA             12-O-tetradecanoyl-phorbol-13-acetate
TPI             Triosephosphate isomerase
tr.,tr-         Transcript
TRF             T-cell replacing factor
TS              Thymidylate synthetase
TSH             Thyroid stimulating hormone
T/t             Large/small T(tumor) antigen
Ubx             "ultrabithorax" locus
uPA             Urine plasminogen activator
URO-D           Uroporphyrinogen decarboxylase
Vg1             Vegetal hemisphere-specific mRNA 1
vir-inf.        Viral infection
VL30            Retrovirus-like 30s RNA
V_NP            (Immunoglobulin heavy chain) variable region specific
                for 4-hydroxyl-3-nitrophenacetyl
VP5             Virion protein 5 (HSV-1/2: =major capsid protein)
VSP             Virion stimulatory protein
vWf             von Willebrand factor