The Hansenula polymorpha (strain CBS4732) genome sequencing and analysis

FEMS Yeast Research 4 (2003) 207–215
www.fems-microbiology.org
Massoud Ramezani-Rad a,*, Cornelis P. Hollenberg a, Juergen Lauber b, Holger Wedler b, Eike Griess b, Christian Wagner c, Kaj Albermann c, Jean Hani c, Michael Piontek d, Ulrike Dahlems e, Gerd Gellissen e
a Institute for Microbiology, Heinrich-Heine University Düsseldorf, Universitätsstrasse 1, 40225 Düsseldorf, Germany
b Qiagen Genomic Services, Qiagen GmbH, Max-Volmer Strasse 4, 40724 Hilden, Germany
c Biomax Informatics AG, Lochhamer Strasse 11, 82152 Martinsried, Germany
d ARTES Biotechnology GmbH, Agnesstrasse 8, 45136 Essen, Germany
e Rhein Biotech GmbH, Eichsfelder Strasse 11, 40595 Düsseldorf, Germany
Received 28 December 2002; received in revised form 24 February 2003; accepted 13 March 2003
First published online 30 April 2003
Abstract
The methylotrophic yeast Hansenula polymorpha is a recognised model system for the investigation of peroxisomal function, of special
metabolic pathways such as methanol metabolism, and of nitrate assimilation and thermostability. Strain RB11, an odc1 derivative of the particular
H. polymorpha isolate CBS4732 (synonymous to ATCC34438, NRRL-Y-5445, CCY38-22-2) has been developed as a platform for
heterologous gene expression. The scientific and industrial significance of this organism is now being met by the characterisation of its
entire genome. The H. polymorpha RB11 genome consists of approximately 9.5 Mb and is organised as six chromosomes ranging in size
from 0.9 to 2.2 Mb. Over 90% of the genome was sequenced with concomitant high accuracy and assembled into 48 contigs organised on
eight scaffolds (supercontigs). After manual annotation 4767 out of 5933 open reading frames (ORFs) with significant homologies to a
non-redundant protein database were predicted. The remaining 1166 ORFs showed no significant similarity to known proteins. The
number of ORFs is comparable to that of other sequenced budding yeasts of similar genome size.
© 2003 Federation of European Microbiological Societies. Published by Elsevier B.V. All rights reserved.
Keywords: Yeast genomics; Sequence analysis; Hansenula polymorpha
1. Introduction
Yeasts constitute an important group of industrial microorganisms. Its long tradition of human use and the extensive knowledge of its genetics and physiology have made the baker's yeast Saccharomyces cerevisiae a eukaryotic model organism for basic research and industrial applications [1]. In 1996, it became the first eukaryotic organism for which the complete genome sequence was established [2]. The initial focus on S. cerevisiae has been extended by investigations of a range of alternative yeast species. As a consequence, the number of fully or partially sequenced budding yeast genomes has continued to grow. Among others, a comparative genomic exploration of 13 species selected from the hemiascomycetous yeasts was conducted [3].
* Corresponding author. Tel.: +49 (211) 311 3425; Fax: +49 (211) 311 5370. E-mail address: ramezani@uni-duesseldorf.de (M. Ramezani-Rad).
1567-1356/03/$22.00 © 2003 Federation of European Microbiological Societies. Published by Elsevier B.V. All rights reserved.
doi:10.1016/S1567-1356(03)00125-9
The methylotrophic yeast Hansenula polymorpha (syn. Pichia angusta) is one of the most important industrially applied non-conventional yeasts [4,5]. H. polymorpha is a ubiquitous yeast species occurring naturally in spoiled orange juice, maize meal, the gut of various insect species and soil. It grows as white to cream, butyrous colonies and does not form filaments [6]. H. polymorpha isolates are homothallic, and reproduction occurs vegetatively by budding. H. polymorpha belongs to the fungal family Saccharomycetaceae, subfamily Saccharomycetoideae [6,7]. Most research has been performed with three basic strains designated H. polymorpha DL-1, CBS4732 and NCYC495. These strains are of independent origin and unclear relationship and exhibit different features, including different chromosome numbers. Depending on strain and separation conditions, between two and seven chromosomes can be distinguished [8,9]. Strain CBS4732 (syn. ATCC34438, NRRL-Y-5445, CCY38-22-2) was originally isolated from soil irrigated with waste water from a distillery in Pernambuco, Brazil [10]. Its odc1 derivatives LR9 [11] and RB11 [12] have been developed as hosts for heterologous gene expression [12]. Recombinant compounds produced in these hosts include enzymes like the feed additive phytase [13,14], anticoagulants like hirudin and saratin [15–17] and an efficient vaccine against hepatitis B infection [18–20]. The significance of H. polymorpha in basic research stems largely from studies focussed on peroxisome homeostasis [21] and nitrate assimilation [22]. Although much is known about the physiology, biochemistry and ultrastructure of this yeast (for review see the monograph on H. polymorpha [4]), little information is available about its genomic structure and function [23]. Several groups worldwide initiated studies on its genome several years ago. Within the comparative genome analysis of 13 hemiascomycetous yeasts mentioned above, part of the H. polymorpha (P. angusta) genome sequence was established using a partial random sequencing strategy with a coverage of 0.3 genome equivalents. Using this approach, about 3 Mb of raw sequence data of the H. polymorpha genome was obtained [3]. We performed a genome analysis aimed at a higher coverage, using a BAC-to-BAC approach. This work has now culminated in the comprehensive genome analysis of this organism. A first description of the data generated is provided in this study. Access to the genome data can be granted upon request (G.G.) and after signing a Material Transfer Agreement. Access has already been granted to six academic groups working on various aspects of functional genomics of H. polymorpha.
The present paper describes the results of the sequencing
and characterisation of 8.733 Mb assembled into 48 contigs. The sequence covers over 90% of the estimated total
genome content of 9.5 Mb located on six chromosomes
ranging in size between 0.9 and 2.2 Mb [23]. The established sequence contains 5933 ORFs.
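As a quick arithmetic check on the figures quoted in this paragraph, the following minimal sketch recomputes the sequenced fraction of the genome and the mean contig size. The constant and function names are illustrative assumptions introduced here; only the numeric values come from the text.

```python
# Values quoted in the text; all names are illustrative only.
ASSEMBLED_MB = 8.733          # assembled sequence in 48 contigs
ESTIMATED_GENOME_MB = 9.5     # estimated total genome content
N_CONTIGS = 48


def sequenced_fraction(assembled_mb: float, genome_mb: float) -> float:
    """Fraction of the estimated genome represented in the assembly."""
    return assembled_mb / genome_mb


fraction = sequenced_fraction(ASSEMBLED_MB, ESTIMATED_GENOME_MB)
mean_contig_kb = ASSEMBLED_MB * 1000 / N_CONTIGS

print(f"sequenced fraction: {fraction:.1%}")
print(f"mean contig size:   {mean_contig_kb:.0f} kb")
```

The computed fraction lies slightly above 0.9, in line with the statement that the sequence covers over 90% of the estimated genome.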
2. Materials and methods
2.1. Construction of the genomic BAC library
For sequencing, H. polymorpha strain RB11, an odc1 derivative of wild-type strain CBS4732, was selected [12]. For the construction of the genomic BAC library of H. polymorpha, the vector pBACe3.6 was used and prepared according to Osoegawa et al. [24]. H. polymorpha cells from a 50 ml YPD (1% yeast extract, 2% peptone, 2% glucose) culture were washed twice with TSE buffer (25 mM Tris–HCl, 300 mM sucrose, 25 mM EDTA, pH 8) and resuspended in TSE buffer. Then, agarose plugs were prepared from these cells according to the Bio-Rad manual of the Chef DR II pulsed-field gel electrophoresis system (PFGE system) using 1.5% low melting point agarose. Pre-electrophoresis was carried out on a Bio-Rad PFGE system. Partial digestion of genomic DNA was carried out according to Osoegawa et al. [25] using Sau3AI for restriction. Gel electrophoresis was carried out on a Bio-Rad PFGE system according to conditions given at Rod Wing's homepage (Clemson University, Genomics Institute, construction of BAC libraries protocol: 6 V cm−1, 90 s pulse, 13°C, 18 h). Agarose digestion with gelase, ligation and transformation were carried out using the same protocol. Subsequent electroporation of DH10B cells (Invitrogen) was again carried out according to Osoegawa et al. [25], and bacteria were plated onto 2×YT plates supplemented with chloramphenicol as selecting agent. Clones obtained from that procedure were picked and used to inoculate 1.2 ml of 2×YT supplemented with chloramphenicol. These bacterial cultures were used to prepare glycerol stocks in 96-well microtitre plate format as a resource for all subsequent work.
2.2. Construction of shotgun libraries from BAC DNA
Large-scale preparations of BAC DNA were carried out using the Large-Construct kit from Qiagen (Qiagen GmbH, Hilden, Germany; cat. no. 12462). After sonication and enzymatic repair of the ends, fragments of the desired size (usually 1.2–1.5 kb) were isolated from a 1% preparative agarose gel using the MinElute Gel Extraction kit (Qiagen, cat. no. 28604) and inserted into a SmaI-digested and alkaline phosphatase-treated pUC19 vector [26]. Ligation was carried out with the Rapid Ligation kit (Roche) according to the manufacturer's protocol. The ligation mixture was then desalted using a QIAquick kit (Qiagen, cat. no. 28304) according to the instructions of the supplier, with the exception of the elution step, which was carried out with ddH2O. 1/10 volume of the eluted DNA was used for transformation of competent Escherichia coli DH10B cells using a Genepulser II device (Bio-Rad). 1 ml of Luria–Bertani (LB) medium [26] was added and the cells were incubated for 1 h at 37°C. 1/200 and 1/20 volumes of the transformed cells were plated onto Petri dishes containing LB agar, ampicillin, X-Gal and isopropyl thiogalactoside (IPTG) [26] and grown overnight at 37°C to determine the yield of recombinant clones. Usually the transformation rate was greater than 10^8 transformants per µg vector DNA and the white:blue ratio was approximately 10:1 or better.
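The yield determination described above amounts to scaling a plate count by the plated fraction and the amount of vector DNA. The sketch below illustrates that arithmetic only. The colony counts and the DNA amount are hypothetical values chosen for illustration and are not data from this study; the function names are likewise assumptions.

```python
def transformants_per_ug(colonies: int, plated_fraction: float, vector_dna_ug: float) -> float:
    """Scale a plate count to transformants per microgram of vector DNA."""
    return colonies / plated_fraction / vector_dna_ug


def white_to_blue_ratio(white: int, blue: int) -> float:
    """Ratio of white (insert-containing) to blue (empty vector) colonies."""
    return white / blue


# Hypothetical example: 250 colonies on the plate that received 1/200 of the
# transformation, and 0.5 ng (0.0005 ug) of vector DNA in the transformation.
rate = transformants_per_ug(colonies=250, plated_fraction=1 / 200, vector_dna_ug=0.0005)
ratio = white_to_blue_ratio(white=230, blue=20)

print(f"{rate:.1e} transformants per ug, white:blue = {ratio:.1f}:1")
```

With these made-up inputs the result sits at the 10^8 transformants per µg and roughly 10:1 thresholds mentioned in the text.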
2.3. Plasmid preparation of shotgun clones
For subsequent DNA sequencing, plasmid DNA from white colonies was isolated after growth in 1.2 ml 2×YT cultures containing ampicillin for 24 h at 37°C with shaking at 220 rpm. Plasmid purification of shotgun clones was carried out using the REAL Prep 96 kit (Qiagen, cat. no. 26173).
2.4. DNA sequencing
DNA sequencing reactions were set up using BigDye Terminator v 2.0 cycle sequencing chemistry (Applied Biosystems, cat. no. 4314416) and purified using the DyeEx 96 kit (Qiagen, cat. no. 63183). Sequencing data were generated using ABI Prism 3700 sequence analyzers.
2.5. Sequence assembly
Base calling and quality checks were carried out using Phred [27]. Sequences were assembled with Phrap, and editing was performed after import into gap4. BAC assemblies and raw data were visualised and edited using the STADEN package (version 4.5; developed by Roger Staden et al.; http://www.mrc-lmb.cam.ac.uk/pubseq/staden_home.html).
2.6. Automated bioinformatic annotation
Fully automated annotation was carried out using the ConSequence™ software system provided by Qiagen (based on Pedant-Pro™ from Biomax Informatics AG) [28].
3. Results and discussion
3.1. Genome sequencing
A BAC library with approximately >17× coverage was constructed in pBACe3.6 and characterised by end-sequencing and restriction digestion. Insert sizes of BAC clones ranged from below 50 to over 100 kb per clone. A total of 2880 BAC clones were generated with an average insert size of 65 kb. 4892 BAC-end sequences were generated with an average read length of 483 bases (phred20). The BAC-end sequencing success rate was 85.5%. In total, 213 BAC clones were selected for analysis, out of which 188 BACs representing the minimal tiling path were selected for shotgun sequencing, BAC-by-BAC. Sequencing coverage of BACs was 8.27-fold on average (Fig. 1). Of these BACs, 162 assembled into one contig only, 15 into two contigs, 9 into three contigs and 2 into four contigs.
Fig. 1. Summary of sequencing statistics.
Table 1
Overview of genome organisation and assembled sequences in supercontigs
Chromosome karyotype  Size (Mb)  Chromosome marker
I                     0.95       URA3; CPY (PRC1); GAP
II                    1.25       rDNA (5.8S, 18S, 26S)
III                   1.5        HARS1
IV                    1.7        PEP4 (PRA1); TPS1
V                     1.9        MOX
VI                    2.2        FMD
Total: 6              9.5
Sequencing supercontig  Size (bp)
5                       968 770
6                       983 699
1                       1 220 583
8                       1 290 524
4                       1 306 376
3                       1 494 936
2                       218 529
7                       1 250 065
Total: 8                8 733 482
Fig. 2. Overview of supercontigs. The framed numbers within a stretch of BACs representing the respective supercontigs indicate the approximate size of a particular gap between neighbouring ends.
3.2. Genome assembly
The BAC library constructed covers the genome 18-fold. 4892 BAC-end sequences from those clones yielded approximately 2.4 Mb of raw data, covering 25% of the genome (at 1×). On average, one BAC-end sequence is located every 2 kb on the genome, suggesting an estimated genome size of about 9.78 Mb. Pulsed-field gel electrophoresis of H. polymorpha RB11 chromosomes revealed six bands, and the sum of the molecular masses of the chromosomal DNA bands suggested a genome size of about 9–10 Mb [5] (Table 1). Mapping the end sequences onto the growing and eventually final genomic sequence showed a very even distribution of those end sequences with no local clustering, underlining the good random cloning of large genomic sub-fragments into this BAC library. The only exceptions were clones and end sequences falling into the rDNA region of the genome. No further large repetitive regions were noticed. Smaller repeat regions have all been resolved for each individual BAC. Further, no repeats were found within BAC/BAC overlapping regions that could potentially confound a correct BAC-to-BAC assembly. In addition to the BAC-to-BAC assembly based on overlapping regions, all BAC-end sequences with their forward/reverse constraints per clone as well as sizing information for individual BAC clones were used to layer a BAC map on top of the resulting assemblies. The consistency of the assembly was checked against that BAC map for each BAC/BAC overlap and assembly. No discrepancies were detected between any single BAC/BAC overlap assembly and the BAC map backbone.
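The BAC-end figures quoted above can be re-derived from the reported counts. The following is a minimal sketch of that arithmetic; the variable names are assumptions introduced here, and the relation "genome size = number of end reads × mean spacing" is a simplified reading of the estimate described in the text, not the authors' stated procedure.

```python
# Counts and averages quoted in the text; all names are illustrative.
BAC_END_READS = 4892
MEAN_READ_LENGTH_BP = 483        # phred20 average read length
MEAN_SPACING_BP = 2_000          # one BAC-end sequence roughly every 2 kb
GENOME_ESTIMATE_BP = 9_500_000   # karyotype-based estimate

# Raw data contributed by the BAC-end reads and the share of the genome
# they would cover at single (1x) depth.
raw_bp = BAC_END_READS * MEAN_READ_LENGTH_BP
fraction_covered = raw_bp / GENOME_ESTIMATE_BP

# Genome size implied by the average spacing of the end sequences.
genome_size_from_spacing = BAC_END_READS * MEAN_SPACING_BP

# Tally of shotgun-sequenced BACs by the number of contigs they assembled into.
contigs_per_bac = {1: 162, 2: 15, 3: 9, 4: 2}
total_bacs = sum(contigs_per_bac.values())

print(f"raw BAC-end data: {raw_bp / 1e6:.2f} Mb ({fraction_covered:.0%} of genome)")
print(f"genome size from spacing: {genome_size_from_spacing / 1e6:.2f} Mb")
print(f"BACs on minimal tiling path: {total_bacs}")
```

The contig tally adds up to the 188 BACs reported for the minimal tiling path.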
Table 2
Comparisons of the S. cerevisiae and H. polymorpha genomes
                                              S. cerevisiae  H. polymorpha
Genome size (Mb)                              13.5           ~9.5
Sequenced non-redundant genome length (bp)    12 156 307     8 733 442
GC content (%)                                38.1           47.9
Number of ORFs (with similarities)            6449 (5978)    5933 (4767)
Average ORF distance (bp)                     1885           1472
Average protein length (aa)                   471            437
Number of tRNAs                               278            80
Fig. 3. Functional comparison of S. cerevisiae and H. polymorpha gene content (general functional categories).
The genome was assembled into 48 contigs, which could be logically joined, using clones physically bridging known gaps, into eight supercontigs with a unique total size of 8.733 Mb from the six known chromosomes, with gene markers assigned to electrophoretically separated chromosomes [5] (Table 1 and Fig. 2). Sequence overlaps between individual BACs with a total size of 1.521 Mb (approximately 15% of the total sequence generated) were used to measure the sequencing accuracy. It was determined to be 99.998%, or fewer than 1.75 errors in 100 kb. As the same technologies, expertise and work scheme were applied for all sequencing work, we conclude from this analysis that more than 90% of the total genome was sequenced with this high accuracy of 99.998%. The estimated 10% of the genome not yet sequenced includes telomeric regions, approximately 45–50 additional rDNA repeats (with a total of approximately 0.3 Mb only), and small gaps, some of which are indicated as boxes in Fig. 2. These results indicate that using end sequencing as a way to map the BAC clones allowed for high accuracy and eventual direct alignment onto the assembled genomic contigs as well as sequence comparisons between all sequences obtained (BACs but also shotgun sequences from three different shotgun libraries with inserts in the 1, 3 and 6–8 kb range) during the course of the project.
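The accuracy figure and the overlap share quoted above are related by simple conversions, sketched below. The function names are illustrative. The overlap share assumes that "total sequence generated" means the unique 8.733 Mb plus the 1.521 Mb of BAC/BAC overlaps, which is an interpretation rather than something the text states.

```python
def errors_per_100kb(accuracy: float) -> float:
    """Expected number of erroneous bases per 100 kb at a given accuracy."""
    return (1.0 - accuracy) * 100_000


def accuracy_from_errors(errors: float, bases: int) -> float:
    """Per-base accuracy implied by an error count over a number of bases."""
    return 1.0 - errors / bases


UNIQUE_MB = 8.733    # unique assembled sequence
OVERLAP_MB = 1.521   # BAC/BAC overlaps used to measure accuracy

# Assumption: total sequence generated = unique sequence + overlaps.
overlap_fraction = OVERLAP_MB / (UNIQUE_MB + OVERLAP_MB)

print(f"99.998% accuracy -> {errors_per_100kb(0.99998):.2f} errors per 100 kb")
print(f"1.75 errors per 100 kb -> {accuracy_from_errors(1.75, 100_000):.3%}")
print(f"overlap share of generated sequence: {overlap_fraction:.1%}")
```

An error count just under 1.75 per 100 kb rounds to the 99.998% quoted, which is why the two statements are compatible even though exactly 99.998% corresponds to 2 errors per 100 kb.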
Fig. 4. Functional comparison of S. cerevisiae and H. polymorpha gene content (functional categories of metabolism).
Table 3
Synteny analysis between H. polymorpha and S. cerevisiae
H.p. BAC  H.p. ORF  BLAST E value  S.c. ORF   S.c. Chr.  S.c. Description
cqbh_00   orf129    7.00E-24       ypr185w    16         APG13 – protein required for the autophagic process
cqbh_00   orf158    4.00E-57       ypr186c    16         PZF1 – TFIIIA (transcription initiation factor)
cqbh_00   orf155    2.00E-39       ypr187w    16         RPO26 – DNA-directed RNA polymerase I, II, III 18 kDa subunit
cqbh_00   orf135    0.0            ypr189w    16         SKI3 – antiviral protein
cqbh_00   orf121    4.00E-69       ypr190c    16         RPC82 – DNA-directed RNA polymerase III, 82 kDa subunit
cqbh_00   orf117    6.00E-50       ypr191w    16         QCR2 – ubiquinol-cytochrome-c reductase 40 kDa chain II
cqgr.00   orf129    1.00E-42       ylr403w    12         SFP1 – zinc finger protein
cqgs.00   orf143    1.00E-101      ylr405w    12         similarity to Azospirillum brasilense nifR3 protein
cqag_00   orf148    3.00E-44       ylr406c    12         RPL31B – 60S large subunit ribosomal protein L31.e.c12
cqhn.00   orf161    4.00E-10       ylr407w    12         hypothetical protein
cqgr.00   orf168    0.0            ylr409c    12         strong similarity to Schizosaccharomyces pombe β-transducin
cqhm.00   orf177    0.0            ylr410w    12         VIP1 – strong similarity to S. pombe protein Asp1p
cqan_00   orf362    2.00E-12       yjr086w    10         STE18 – GTP-binding protein γ subunit of the pheromone pathway
cqan_00   orf357    5.00E-21       yjr088c    10         weak similarity to S. pombe hypothetical protein SPBC14C8.18c
cqan_00   orf324    1.00E-123      yjr090c    10         GRR1 – required for glucose repression and for glucose and cation transport
cqan_00   orf304    3.00E-77       yjr091c    10         JSN1 – suppresses the high-temperature lethality of tub2-150
cqan_00   orf248    1.00E-32       yjr092w    10         BUD4 – budding protein
cqan_00   orf231    1.00E-14       yjr093c    10         FIP1 – component of pre-mRNA polyadenylation factor PF I
cqan_00   orf230    1.00E-24       yjr094w-a  10         RPL43B – 60S large subunit ribosomal protein
cqga.00   orf27     5.00E-38       ygr091w    7          PRP31 – pre-mRNA splicing protein
cqga.00   orf19     1.00E-167      ygr092w    7          DBF2 – ser/thr protein kinase related to Dbf20p
cqga.00   orf15     4.00E-46       ygr093w    7          similarity to hypothetical S. pombe protein
cqga.00   orf30     0.0            ygr094w    7          VAS1 – valyl-tRNA synthetase
cqga.00   orf42     3.00E-28       ygr095c    7          RRP46 – involved in rRNA processing
cqga.00   orf56     7.00E-45       ygr096w    7          similarity to bovine Graves disease carrier protein
cqfq.00   orf75     4.00E-30       ygl191w    7          COX13 – cytochrome-c oxidase chain VIa
cqfq.00   orf72     1.00E-145      ygl190c    7          CDC55 – ser/thr phosphatase 2A regulatory subunit B
cqfq.00   orf70     8.00E-33       ygl189c    7          RPS26A – 40S small subunit ribosomal protein S26e.c7
cqfq.00   orf68     4.00E-40       ygl187c    7          COX4 – cytochrome-c oxidase chain IV
cqfq.00   orf64     5.00E-36       ygl185c    7          weak similarity to dehydrogenases
cqfq.00   orf60     3.00E-87       ygl184c    7          STR3 – strong similarity to Emericella nidulans and similarity to other cystathionine β-lyase and Cys3p
cqav_00   orf272    8.00E-24       ygl111w    7          weak similarity to hypothetical protein S. pombe
cqav_00   orf276    2.00E-55       ygl110c    7          similarity to hypothetical protein SPCC1906.02c S. pombe
cqav_00   orf330    1.00E-25       ygl106w    7          MLC1 – Myo2p light chain
cqav_00   orf315    8.00E-69       ygl105w    7          ARC1 – protein with specific affinity for G4 quadruplex nucleic acids
cqav_00   orf294    2.00E-62       ygl103w    7          RPL28 – 60S large subunit ribosomal protein L27a.e
cqav_00   orf292    2.00E-14       ygl102c    7          questionable ORF
cqav_00   orf263    1.00E-100      ygl100w    7          SEH1 – nuclear pore protein
cqbp_00   orf17     2.00E-46       ydr447c    4          RPS17B – ribosomal protein S17.e.B
cqbp_00   orf21     1.00E-125      ydr448w    4          ADA2 – general transcriptional adapter or co-activator
cqbp_00   orf26     5.00E-67       ydr449c    4          similarity to hypothetical protein S. pombe
cqbp_00   orf75     2.00E-63       ydr450w    4          RPS18A – ribosomal protein S18.e.c4
cqbp_00   orf68     5.00E-15       ydr451c    4          YHP1 – strong similarity to Yox1p
cqbp_00   orf189    1.00E-135      ydr452w    4          PHM5 – similarity to human sphingomyelin phosphodiesterase (PIR:S06957)
cqaq_00   orf216    2.00E-22       ydr362c    4          TFC6 – TFIIIC (transcription initiation factor) subunit, 91 kDa
cqaq_00   orf202    2.00E-75       ydr365c    4          weak similarity to Streptococcus M protein
cqaq_00   orf191    3.00E-23       ydr367w    4          similarity to hypothetical protein SPAC26H5.13c S. pombe
cqaq_00   orf165    2.00E-96       ydr372c    4          similarity to hypothetical S. pombe protein
cqaq_00   orf180    1.00E-139      ydr375c    4          BCS1 – mitochondrial protein of the CDC48/PAS1/SEC18 (AAA) family of ATPases
cqaq_00   orf251    1.00E-104      ydr380w    4          ARO10 – similarity to Pdc6p, Thi3p and to pyruvate decarboxylases
cqaq_00   orf245    1.00E-20       ydr381w    4          YRA1 – RNA annealing protein
cqaq_00   orf236    2.00E-20       ydr382w    4          RPP2B – 60S large subunit acidic ribosomal protein
cqdw.p1   orf217    7.00E-82       ydr061w    4          similarity to E. coli modF and photorepair protein phrA
cqdw.p1   orf208    0.0            ydr062w    4          LCB2 – serine C-palmitoyltransferase subunit
cqdw.p1   orf271    1.00E-28       ydr067c    4          similarity to YNL099c
cqdw.p1   orf228    4.00E-90       ydr069c    4          DOA4 – ubiquitin-specific protease
cqdw.p1   orf257    5.00E-34       ydr071c    4          similarity to Ovis aries arylalkylamine N-acetyltransferase
cqdw.p1   orf250    7.00E-57       ydr072c    4          IPT1 – mannosyl diphosphorylinositol ceramide synthase
3.3. Genome organisation
The Pedant-Pro™ Sequence Analysis Suite was used for gene identification. Out of the sequenced 8.73 Mb, 5933 ORFs have been extracted for proteins longer than 80 amino acids. ORFs whose sequence is entirely contained within another reading frame have been excluded from the analysis. 70 shorter ORFs (< 80 amino acids) with significant BLAST similarities have been extracted manually. 4767 ORFs show significant similarities to a non-redundant protein database. Out of the 4767 ORFs with similarities, 4109 showed significant similarity to ORFs from S. cerevisiae. The remaining 1166 ORFs have no significant similarities to known sequences. 410 ORFs are shorter than 100 amino acids. The numbers are not comparable due to different automatic gene-prediction methods and due to the different genomes. Only after an in-depth analysis will an evaluation of the number of questionable ORFs be possible, which may reduce the number of ORFs shorter than 100 amino acids. Calculation of the gene density and protein length, taking into account the gene numbers, showed an average ORF distance of 1472 bp and an average protein length of 437 amino acids. No experiments have been performed so far to evaluate these predicted numbers.
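The gene-density figure can be reproduced from the values in Table 2 if the average ORF distance is taken to be the sequenced non-redundant length divided by the number of ORFs. That definition is an inference from the reported numbers, not one given in the text, and the names below are illustrative.

```python
# (sequenced non-redundant length in bp, number of ORFs), as listed in Table 2.
genomes = {
    "S. cerevisiae": (12_156_307, 6449),
    "H. polymorpha": (8_733_442, 5933),
}


def mean_orf_spacing(length_bp: int, n_orfs: int) -> int:
    """Sequenced length per ORF, rounded to the nearest base pair."""
    return round(length_bp / n_orfs)


mean_spacing = {name: mean_orf_spacing(bp, orfs) for name, (bp, orfs) in genomes.items()}

for name, spacing in mean_spacing.items():
    print(f"{name}: one ORF per {spacing} bp")
```

Under this assumed definition the results agree with the 1885 bp and 1472 bp listed for the two yeasts, so the smaller value for H. polymorpha reflects a higher gene density.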
Introns have been identified by homology to known proteins and confirmed using GeneWise [29]. In a preliminary analysis, 91 intron-containing genes were identified in this way. These include all genes previously identified [3] as intron-containing genes. 80 tRNAs were identified, corresponding to all 20 amino acids. From approximately 50 rRNA clusters [5], seven clusters have been fully sequenced. All seven are completely identical and have a precise length of 5033 bp, although they represent only about 10% of the estimated total number of rDNA repeats present in H. polymorpha.
The main functional categories and their distribution in the gene set were automatically predicted as follows: transposable elements, 1%; energy, 5%; cellular communication and signal transduction mechanisms, 6%; protein synthesis, 6%; cell rescue, defense and virulence, 9%; cellular transport and transport mechanisms, 12%; cell cycle and DNA processing, 12%; protein fate (folding, modification, destination), 12%; transcription, 14%; and metabolism, 23% (Fig. 3). Localisation was assigned to 2858 ORFs.
3.4. Comparison with S. cerevisiae sequences
The comparative genomic analysis of closely related organisms allowed us to identify species-specific genes and permitted us to estimate the rates of sequence divergence of the derived proteins. Comparing the genomic organisation of S. cerevisiae to that of H. polymorpha reveals differences and similarities at different levels (Table 2 and Figs. 3 and 4). The overall H. polymorpha genome exhibits a GC content of 47.9% compared to 38.1% found for the S. cerevisiae genome. The amino acid composition properties are essentially driven by GC content. The size of the genome of S. cerevisiae is 13.5 Mb (sequenced non-redundant genome length 12 156 kb) in comparison to the 9.5 Mb (sequenced non-redundant genome length 8733 kb) of H. polymorpha. For the comparison of H. polymorpha to S. cerevisiae we have used the MIPS comprehensive yeast genome database CYGD [30]. It includes 6449 genes. Out of these, 471 genes are marked as questionable. As the exact gene number of S. cerevisiae is still under debate [3] in the literature, we have taken all MIPS genes into account for the comparisons. S. cerevisiae contains 6449 ORFs with an average distance of 1885 bp in comparison to 5933 ORFs in H. polymorpha with an average distance of 1472 bp. The gene density in H. polymorpha appears higher than that in S. cerevisiae when correlating the number of ORFs in the two organisms with the size of the respective genomes. An exhaustive synteny analysis has been performed between H. polymorpha and S. cerevisiae. It revealed clusters of up to eight syntenic proteins in both organisms. Six clusters were found to contain six syntenic proteins, two clusters seven syntenic proteins and one cluster eight syntenic proteins (Table 3).
Overall, 80 nuclear tRNA genes were identified in the H. polymorpha genome sequence (Table 4), in comparison to S. cerevisiae where 278 tRNA genes have been found. Despite these differences, both yeasts have nearly the same number of different tRNA species: 40 in H. polymorpha and 41 in S. cerevisiae. The lower number of tRNA genes in H. polymorpha is consistent with the tRNA analysis of RST sequences from Pichia sorbitophila [3], a close relative of H. polymorpha. One-third of the P. sorbitophila genome was found to contain only 23 nuclear tRNA genes. The estimated number for the complete P. sorbitophila genome (~70) is thus comparably low.
Table 4
Nuclear tRNA genes identified in the H. polymorpha genome
tRNA species      Anticodon  H. polymorpha  S. cerevisiae
tRNA-Ala          AGC        3              11
tRNA-Ala          UGC        2              5
tRNA-Arg          ACG        2              6
tRNA-Arg          CCG        1              1
tRNA-Arg          CCU        1              1
tRNA-Arg          UCU        3              11
tRNA-Asn          GUU        2              10
tRNA-Asp          GUC        3              16
tRNA-Cys          GCA        1              4
tRNA-Gln          CUG        1              1
tRNA-Gln          UUG        2              9
tRNA-Glu          CUC        3              2
tRNA-Glu          UUC        2              9
tRNA-Gly          CCC        0              2
tRNA-Gly          GCC        4              16
tRNA-Gly          UCC        2              3
tRNA-His          GUG        2              7
tRNA-Ile          AAU        4              13
tRNA-Ile          UAU        1              2
tRNA-Leu          AAG        1              13
tRNA-Leu          CAA        3              10
tRNA-Leu          CAG        1              0
tRNA-Leu          GAG        0              1
tRNA-Leu          UAA        1              7
tRNA-Leu          UAG        1              3
tRNA-Lys          CUU        3              14
tRNA-Lys          UUU        2              7
tRNA-Met          CAU        4              10
tRNA-Phe          GAA        3              10
tRNA-Pro          AGG        0              2
tRNA-Pro          UGG        3              10
tRNA-Ser          AGA        2              11
tRNA-Ser          CGA        1              1
tRNA-Ser          GCU        2              2
tRNA-Ser          UGA        1              3
tRNA-Thr          AGU        3              11
tRNA-Thr          CGU        1              1
tRNA-Thr          UGU        1              3
tRNA-Trp          CCA        2              6
tRNA-Tyr          GUA        2              8
tRNA-Val          AAC        3              14
tRNA-Val          CAC        1              2
Total                        80             278
Different tRNAs              40             41
The relevant genes of the mating system and pheromone signal transduction pathway that were identified are shown in Table 5. Data analyses indicate that H. polymorpha contains several genes attributed to the regulation of mating, such as STE3, STE6, GPA1, STE18, CDC42, STE50 and STE11. These data suggest that a conserved mitogen-activated protein kinase pathway might regulate mating in H. polymorpha. In addition, the data analyses indicate that H. polymorpha contains a gene that corresponds to the mating type regulatory protein gene at the HMR locus of Kluyveromyces lactis (HMRa1). Cryptic mating type loci like HMRa1 in S. cerevisiae and K. lactis act as reservoirs of mating type information in mating type switching in homothallic yeast strains. The function of this homologue in H. polymorpha remains unknown.
Table 5
Mating-specific genes in H. polymorpha
Hp_ORF  AA length  BLAST hit   AA length  BLASTP score  Function
BJ_37   215        Kl_YCR097w  126        154           mating-type regulatory protein, silenced copy at HMR locus
BO_26   433        Sc_STE3     470        509           pheromone a-factor receptor
CA_130  1227       Sc_STE6     1290       1167          ATP-binding cassette transporter protein
BI_65   700        Sc_STE11    738        690           pheromone response
AG_50   398        Sc_STE50    364        223           pheromone response
AN_362  127        Sc_STE18    110        126           G protein γ subunit
AY_145  295        Sc_GPA1     472        130           G protein α subunit
AL_42   197        Sc_CDC42    192        248           G protein
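The S. cerevisiae chromosome assignments in Table 3 follow directly from the systematic ORF names, which encode the chromosome as a letter (A to P for chromosomes 1 to 16), the chromosome arm (L or R), a running number and the coding strand (W or C). The sketch below decodes a few names taken from Table 3 and tallies the syntenic pairs implied by the reported cluster sizes. The function and variable names are illustrative assumptions and are not part of the analysis pipeline used in this study.

```python
import re

# Y + chromosome letter (A-P) + arm (L/R) + 3-digit number + strand (W/C),
# optionally followed by a suffix such as "-A".
_ORF_RE = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])(-[A-Z])?$")


def parse_systematic_name(name: str) -> dict:
    """Decode an S. cerevisiae systematic ORF name such as 'ypr185w'."""
    match = _ORF_RE.match(name.upper())
    if match is None:
        raise ValueError(f"not a systematic nuclear ORF name: {name!r}")
    letter, arm, number, strand, _suffix = match.groups()
    return {
        "chromosome": ord(letter) - ord("A") + 1,
        "arm": arm,
        "number": int(number),
        "strand": strand,
    }


# One example per S. cerevisiae chromosome represented in Table 3.
examples = ["ypr185w", "ylr403w", "yjr094w-a", "ygl191w", "ydr447c"]
chromosomes = [parse_systematic_name(name)["chromosome"] for name in examples]

# Cluster sizes reported in the text: six clusters of six, two of seven, one of eight.
clusters_by_size = {6: 6, 7: 2, 8: 1}
total_pairs = sum(size * count for size, count in clusters_by_size.items())

print(chromosomes)
print(total_pairs)
```

The decoded chromosome numbers match the S.c. Chr. column of Table 3, and the tally equals the number of ORF pairs listed there.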
Acknowledgements
Erika Wedler, Kathleen Balke, Nicole Lokmer, and Dörte Möstl are acknowledged for their excellent technical work during the entire DNA sequencing phase of the project.
References
[1] Joseph, R. (1999) Yeasts: production and commercial uses. In: Encyclopedia of Food Microbiology, Vol. 3 (Robinson, R.K., Batt, C.A. and Patel, P.D., Eds.), pp. 2335–2341. Academic Press, San Diego, CA.
[2] Goffeau, A., Barrell, B.G., Bussey, H., Davis, R.W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J.D., Jacq, C., Johnston, M., Louis, E.J., Mewes, H.W., Murakami, Y., Philippsen, P., Tettelin, H. and Oliver, S.G. (1996) Life with 6000 genes. Science 274, 563–567.
[3] Feldmann, H. (Ed.) (2000) Génolevures. Genomic exploration of the hemiascomycetous yeasts. FEBS Lett. 487, 1–150.
[4] Gellissen, G. (2000) Heterologous protein production in methylotrophic yeasts. Appl. Microbiol. Biotechnol. 54, 741–750.
[5] Gellissen, G. (Ed.) (2002) Hansenula polymorpha - Biology and Applications. Wiley-VCH, Weinheim.
[6] Barnett, J.A., Payne, R.W. and Yarrow, D. (2000) Yeasts: Characteristics and Identification, 3rd edn. Cambridge University Press, Cambridge.
[7] Middelhoven, W.J. (2002) History, habitat, variability, nomenclature and phylogenetic position of Hansenula polymorpha. In: Hansenula polymorpha - Biology and Applications (Gellissen, G., Ed.), pp. 1–7. Wiley-VCH, Weinheim.
[8] Marri, L., Rossolini, G.M. and Satta, G. (1993) Chromosome polymorphism among strains of Hansenula polymorpha. Appl. Environ. Microbiol. 59, 939–941.
[9] Lahtchev, K. (2002) Basic genetics of Hansenula polymorpha. In: Hansenula polymorpha - Biology and Applications (Gellissen, G., Ed.), pp. 8–20. Wiley-VCH, Weinheim.
[10] Morais, J.O.F. and Maia, M.H.D. (1959) Estudos de microorganismos encontrados em leitos de despéjos de caldas de destilarias de Pernambuco. II. Uma nova espécie de Hansenula, H. polymorpha. Anais de Escola Superior de Quimica, Universidade do Recife 1, 15–20.
[11] Roggenkamp, R., Hansen, H., Eckart, M., Janowicz, Z. and Hollenberg, C.P. (1986) Transformation of the methylotrophic yeast Hansenula polymorpha by autonomous replication and integration vectors. Mol. Gen. Genet. 202, 302–308.
[12] Suckow, M. and Gellissen, G. (2002) The expression platform based on H. polymorpha strain RB11 and its derivatives - history, status and perspectives. In: Hansenula polymorpha - Biology and Applications (Gellissen, G., Ed.), pp. 105–123. Wiley-VCH, Weinheim.
[13] Mayer, A.F., Hellmuth, K., Schlieker, H., Lopez-Ulibarri, R., Oertel, S., Dahlems, U., Strasser, A.W.M. and van Loon, A.P.G.M. (1999) An expression system matures: a highly efficient and cost-effective process for phytase production by recombinant strains of Hansenula polymorpha. Biotechnol. Bioeng. 63, 373–381.
[14] Papendieck, A., Dahlems, U. and Gellissen, G. (2002) Technical enzyme production and whole-cell biocatalysis: application of Hansenula polymorpha. In: Hansenula polymorpha - Biology and Applications (Gellissen, G., Ed.), pp. 255–271. Wiley-VCH, Weinheim.
[15] Avgerinos, G.C., Turner, B.G., Gorelick, K.J., Papendieck, A., Weydemann, U. and Gellissen, G. (2001) Production and clinical development of a Hansenula polymorpha-derived PEGylated hirudin. Sem. Thromb. Hemostas. 27, 357–371.
[16] Barnes, C.S., Krafft, B., Frech, M., Hofmann, U.R., Papendieck, A., Dahlems, U., Gellissen, G. and Hoylaerts, M.F. (2001) Production and characterization of saratin, an inhibitor of von Willebrand's factor-dependent platelet adhesion to collagen. Sem. Thromb. Hemostas. 27, 337–347.
[17] Bartelsen, O., Barnes, C.S. and Gellissen, G. (2002) Production of anticoagulants in Hansenula polymorpha. In: Hansenula polymorpha - Biology and Applications (Gellissen, G., Ed.), pp. 211–228. Wiley-VCH, Weinheim.
[18] Janowicz, Z.A., Melber, K., Merckelbach, A., Jacobs, E., Harford, N., Comberbach, M. and Hollenberg, C.P. (1991) Simultaneous expression of the S and L surface antigens of hepatitis B and formation of mixed particles in the methylotrophic yeast, Hansenula polymorpha. Yeast 7, 431–433.
[19] Schaefer, S., Piontek, M., Ahn, S.-J., Papendieck, A., Janowicz, Z.A. and Gellissen, G. (2001) Recombinant hepatitis B vaccines - characterization of the viral disease and vaccine production in the methylotrophic yeast, Hansenula polymorpha. In: Novel Therapeutic Proteins - Selected Case Studies (Dembowsky, K. and Stadler, P., Eds.), pp. 245–274. Wiley-VCH, Weinheim.
[20] Schaefer, S., Piontek, M., Ahn, S.-J., Papendieck, A., Janowicz, Z.A., Timmermans, I. and Gellissen, G. (2002) Recombinant hepatitis B vaccines - disease characterization and vaccine production. In: Hansenula polymorpha - Biology and Applications (Gellissen, G., Ed.), pp. 175–210. Wiley-VCH, Weinheim.
[21] Van der Klei, I.J. and Veenhuis, M. (2002) Hansenula polymorpha: a versatile model organism in peroxisome research. In: Hansenula polymorpha - Biology and Applications (Gellissen, G., Ed.), pp. 76–94. Wiley-VCH, Weinheim.
[22] Siverio, J.M. (2002) Biochemistry and genetics of nitrate assimilation. In: Hansenula polymorpha - Biology and Applications (Gellissen, G., Ed.), pp. 21–40. Wiley-VCH, Weinheim.
[23] Waschk, D., Klabunde, J., Suckow, M. and Hollenberg, C.P. (2002) Characteristics of the Hansenula polymorpha genome. In: Hansenula polymorpha - Biology and Applications (Gellissen, G., Ed.), pp. 95–104. Wiley-VCH, Weinheim.
[24] Osoegawa, K., de Jong, P.J., Frengen, E. and Ioannou, P.A. (1999) Construction of bacterial artificial chromosome (BAC/PAC) libraries. Current Protocols in Human Genetics. 5.15.1–5.15.33.
[25] Osoegawa, K., Woon, P.Y., Zhao, B., Frengen, E., Tateno, M., Catanese, J.J. and de Jong, P.J. (1998) An improved approach for construction of bacterial artificial chromosome libraries. Genomics 52, 1–8.
[26] Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular Cloning, A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
[27] Ewing, B., Hillier, L., Wendl, M.C. and Green, P. (1998) Base-calling of automated sequencer traces using phred. Genome Res. 8, 175–194.
[28] Frishman, D., Albermann, K., Hani, J., Heumann, K., Metanomski, A., Zollner, A. and Mewes, H.W. (2001) Functional and structural genomics using PEDANT. Bioinformatics 17, 44–57.
[29] Birney, E. and Durbin, R. (2000) Using GeneWise in the Drosophila annotation experiment. Genome Res. 10, 547–548.
[30] Mewes, H.W., Frishman, D., Güldener, U., Mannhaupt, G., Mayer, K., Mokrejs, M., Morgenstern, B., Munsterkotter, M., Rudd, S. and Weil, B. (2002) MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 30, 31–34.