The wheat PDI (Protein Disulfide Isomerase) genes

M. Ciaffi, O.A. Tanzarella and E. Porceddu*

Dipartimento di Agrobiologia e Agrochimica, Università della Tuscia, 01100 Viterbo, Italy

* Corresponding author: Prof. Enrico Porceddu, Dipartimento di Agrobiologia e Agrochimica, Via S. Camillo De Lellis, 01100 Viterbo, Italy. E-mail: porceddu@unitus.it; Tel: 0039 0761 357231; Fax: 0039 0761 357256

Keywords: Gene structure, Gene expression, Triticum, Protein folding, Wheat quality

1. INTRODUCTION

Cultivated on over 217 million hectares of land, spanning from the Scandinavian peninsula in the North to Argentina in the South, wheat provides a total annual production of over 620 million metric tons of grain and represents the staple food for over two billion people, more than one third of the world population. Wheat grain is consumed almost exclusively after processing, which gives rise to many different types of cooked food. This versatility is largely due to its storage proteins, which play an important role in determining dough properties. Wheat storage proteins consist primarily of prolamins, which are synthesised in the developing endosperm and targeted to the endoplasmic reticulum (ER) lumen, where they are folded and connected by intermolecular disulfide bonds to form large aggregates (Shewry and Tatham 1997). Generally, these protein polymers are deposited in massive protein bodies within vesicles that shear directly off the ER, although some of them (primarily the monomeric gliadins) are transported to vesicles via the Golgi apparatus (Rubin et al. 1992). Therefore, genes encoding seed storage proteins, as well as factors that affect their deposition, such as molecular chaperones and foldase enzymes, are of particular relevance to the wheat industry. Even though wheat storage proteins have been the object of a wide range of studies at both the chemical and genetic levels (reviewed by Shewry et al. 2003), knowledge of the factors affecting their folding and deposition is still extremely limited.
This paper summarises recent progress in understanding protein disulfide isomerase (PDI), an enzyme possibly involved in the folding of endosperm storage proteins, and focuses specifically on the molecular characterisation of wheat PDI genes.

2. THE PDI ENZYME

During maturation of secretory proteins, disulfide bonds cross-linking specific cysteines are added to stabilize a protein or to join different polypeptides covalently. These bonds are often crucial for the stability of the final protein structure (Tu and Weissmann 2004). The slow spontaneous folding of proteins in vitro (hours to days, or never) is incompatible with the time scale of their secretion in vivo (about 30-60 min). Eukaryotic cells have resolved this discrepancy through a specialized redox environment (Frand et al. 2000), i.e. the ER, which is equipped with enzymatic catalysts promoting disulfide bond formation and isomerization (Denecke 1996; Fassio and Sitia 2002). PDI is one of them. It is a long-lived, abundant protein able to catalyze thiol-disulfide oxidation, reduction and isomerization, the latter occurring either directly through intramolecular disulfide rearrangement or through cycles of reduction and oxidation (Schwaller et al. 2003). The complexity, structure and function of PDI have been extensively studied in mammals (for reviews see Ferrari and Soling 1999; Wilkinson and Gilbert 2004; Ellgaard and Ruddock 2005), in which it is a dimer consisting of two identical subunits of about 57 kDa and is one of the most abundant proteins in the ER (Lyles and Gilbert 1991). Analyses of sequence homology and NMR studies have shown that human PDI has a modular structure comprising five domains: a, b, b', a' and c (Fig. 1). The a and a' domains are homologous to thioredoxin, a small protein involved in many cytoplasmic redox reactions (Freedman et al.
1994); both the a and a' domains contain a catalytic site for isomerase and redox activities consisting of the amino acid sequence Cys-Gly-His-Cys (Noiva and Lennarz 1992). The middle b and b' domains have a secondary structure similar to that of the a and a' domains (Kemmink et al. 1997; Ferrari and Soling 1999), but do not show any significant homology to thioredoxin. The c domain, at the C-terminus, is rich in acidic residues typical of calcium-binding proteins (Lucero and Kaminer 1999), and has a KDEL sequence for ER protein retention (Denecke et al. 1992).

PDI was initially considered a catalyst for disulfide bond formation (Freedman et al. 1994), but its ability to bind unfolded or partly folded proteins, preventing their aggregation, also suggested a chaperone role (Hayano et al. 1995; Yao et al. 1997) as part of the quality control system for the correct folding of the proteins synthesized in the ER (Turano et al. 2002). Several additional important functions of PDI were detected later (Ferrari and Soling 1999; Pihlajanieni et al. 1987; Wetterau et al. 1990; Lucero and Kaminer 1999; Cheng et al. 1987; Bennet et al. 1988; Tsai et al. 2001; Tanaka et al. 2000). The human PDI gene is located at 17q25, is about 18 kb in size and consists of 11 exons and 10 introns, both varying in size (Tasanen et al. 1988).

Typical PDI is the most prominent member of a growing family of related proteins characterised by one, two or three thioredoxin-like active domains (Ferrari and Soling 1999). Several PDI-like genes encoding proteins with unusual primary structures, different expression patterns and a surprising range of activities have recently been identified in every extensively sequenced mammalian genome (Turano et al. 2002; Clissold and Bicknell 2003; Wilkinson and Gilbert 2004; Ellgaard and Ruddock 2005). The term PDI refers both to the family and to the first member of the family isolated in mammals, which is also the best characterized.
About twenty members of the PDI family have recently been identified in humans (Ellgaard and Ruddock 2005). All the proteins of the PDI family are located in the ER, some of them in relatively high amounts, and are often considered to perform their functions exclusively in this compartment. Turano et al. (2002), however, have indicated that at least some members of the family are also present in unusual subcellular locations (i.e. the cell surface, the extracellular space, the cytosol and the nucleus), which are reached through an export mechanism not yet identified. In a few cases their function in locations other than the ER is clearly related to their redox properties, but in most cases their mechanism of action is still unknown, although their tendency to associate with other proteins, or even with DNA, might be the main factor related to their activities (Turano et al. 2002).

3. PLANT PDI

Information on the structural and functional aspects of PDIs and PDI-like proteins in plants is still very limited. PDI cDNA sequences have been cloned and sequenced from species such as alfalfa (Shorrosh and Dixon 1991), barley (Chen and Hayes 1994), common and durum wheat (Shimoni et al. 1995a; Ciaffi et al. 2001), maize (Li and Larkins 1996) and castor bean (Coughlan et al. 1996); two distinct cDNA sequences encoding PDI-related proteins have been isolated and characterized in alfalfa (Shorrosh and Dixon 1992) and carrot (Xu et al. 2002). Recently, the available whole genome sequences of Arabidopsis and rice have allowed the complexity and diversity of PDI-like genes in plants to be investigated in detail. The genome-wide structural annotation of the PDI gene family in Arabidopsis has resulted in the identification of 13 genes distributed across all five of its chromosomes (Houston et al. 2005).
Using the Arabidopsis PDI-like sequences in iterative BLAST searches of public and proprietary sequence databases, an orthologous set of 12 PDI-like sequences was identified in both rice and maize (Houston et al. 2005). Phylogenetic analyses conducted on the Arabidopsis, rice and maize proteins indicated that the plant PDI family may include at least eight different gene subfamilies, whose members can be grouped on the basis of sequence homology, number and position of active domains and presence or absence of the KDEL signal for ER retention. Members of the first five groups or subfamilies (I-V) had two thioredoxin-like active domains and showed structural similarities to different PDI-like proteins in other higher eukaryotes. The remaining three subfamilies (VI-VIII) contained proteins with a single thioredoxin-like active domain. Proteins in phylogenetic groups I, II and III were similar in size (500-560 aa) and were predicted to be secretory proteins with putative signal peptides and C-terminal KDEL-like ER retention signals. The first group includes the typical PDIs identified in several plant species, whose members, as reported in the following sections, have previously been characterized in wheat (Ciaffi et al. 2005). Proteins in group IV were approximately 360 aa in length, but lacked a KDEL-like ER retention signal. Members of subfamily V were longer (approximately 440 aa) and had a KDEL-like ER retention signal at their C-termini. Aside from having a single thioredoxin domain, group VI, VII and VIII proteins shared few structural features. Such diversity is not surprising, given the divergent evolutionary origins indicated by phylogenetic analyses.
The close phylogenetic relationship between the nucleotide sequences encoding the single-domain group VII proteins and the N-terminal thioredoxin domain of members of groups I, II and III is consistent with the hypothesis that the group VII proteins emerged by domain loss from a two-domain PDI-like precursor. Members of subfamilies VI and VIII resembled small single-domain PDIs from lower eukaryotes, such as yeast and Giardia lamblia, indicating that they could have retained an ancestral domain structure. PDI-like proteins of group VI were the shortest members of the plant PDI family, with only approximately 150 aa, whereas proteins of groups VII and VIII were much larger (418-485 aa). PDI-like proteins from groups VI and VII were predicted to be secretory proteins with signal peptides, but none of the single thioredoxin domain proteins had KDEL-like sequences. Finally, all the members of groups VII and VIII contained a transmembrane segment and hence were predicted to be membrane proteins.

Despite the recent analysis of the complexity and diversity of the PDI gene family of plants, there are still numerous unanswered questions concerning the location and physiological function of the individual proteins. Up to now, most studies on the molecular characterization, transcriptional regulation and intracellular localization of members of the PDI family have been carried out on the typical PDIs of some cereal species (Chen and Hayes 1994; Shimoni et al. 1995b; Li and Larkins 1996). These studies indicated that the PDI enzyme may play an important role in the folding of plant secretory proteins, and particularly in the formation of endosperm protein bodies. Further evidence supporting a role for PDI in storage protein deposition in cereals derives from the analysis of some maize and rice mutants producing seeds with altered endosperm protein bodies.
PDI expression in the endosperm of a maize floury2 mutant was considered to be induced by a systemic stress signal due to the production of a defective α-zein storage protein, resulting in the abnormal association of protein bodies (Li and Larkins 1996). Two more maize mutants, mucronate (mc) and defective endosperm B30 (de*-B30), exhibit an endosperm-specific increase of PDI as a result of structural changes in storage proteins (Wrobel 1996; Kim et al. 2004). Recent investigations on a natural rice mutant with irregular protein bodies found that the main storage proteins of rice, prolamins and glutelins, which normally form discrete protein bodies each containing only one protein class, failed to segregate correctly and formed new, smaller protein bodies containing both prolamin and glutelin precursors bound by disulfide bonds (Takemoto et al. 2002). The failure to form correct protein bodies was shown to be due to the absence of PDI expression, suggesting an essential and direct role of PDI in the segregation of the two classes of polypeptides and in the formation of protein bodies.

4. THE WHEAT PDI

4.1. Intracellular localization and expression analyses

The presence of PDI in wheat endosperm was initially demonstrated by Roden et al. (1982), who showed that PDI activity was associated with ER fractions isolated by ultracentrifugation of homogenates of developing endosperms. Similarly, Livesley et al. (1992) showed that PDI was associated with the microsomal (ER) fraction from embryos and aleurone layers of dry mature and germinating grains of wheat. To study the role of PDI in the maturation of plant proteins more accurately, Shimoni et al. (1995b) purified PDI from wheat endosperm and showed that wheat PDI appears as a 60-kD glycoprotein and is among the most abundant proteins within the ER of developing grains.
Subcellular localization analysis and electron micrographs of immunogold labelling showed that PDI is not only present in the lumen of the ER, but is also co-localized with the storage proteins in protein bodies. The presence of PDI in the ER of wheat endosperm does not prove that it is necessary for the folding of storage proteins in vivo, although Bulleid and Freedman (1988a) showed that it was able to catalyse the formation of intra-molecular disulfide bonds in a γ-gliadin synthesised in vitro from a cloned cDNA. The newly synthesised polypeptide was transported into the microsomal fraction (ER) from dog pancreas. When transcription and translation were carried out under conditions favouring disulfide bond formation, the protein had a faster electrophoretic mobility than when separated after reduction, indicating the presence of intra-chain disulfide bonds. This phenomenon was not observed when the microsomes were treated to remove PDI and other soluble lumenal proteins, but was restored by the addition of purified PDI. Similar studies were carried out with genes encoding the high molecular weight glutenin subunits 1Dy10 and 1Dy12 and a low molecular weight glutenin subunit (Bulleid and Freedman 1988b; 1992); in all cases the products synthesised under conditions favouring disulfide bond formation migrated slightly faster than the fully reduced proteins. This higher mobility was ascribed to the formation of intra-chain disulfide bonds. However, the system was not able to mimic precisely the situation in the ER of wheat endosperm, since the proteins failed to form disulfide-stabilised oligomers. The authors offered different possible explanations, such as the low concentration of proteins, the short time course or the use of a heterologous (dog pancreas) system.
Analysis of PDI expression in durum wheat showed that its mRNA was constitutively present in several tissues, but was expressed at a very low level in coleoptiles, roots, leaves and florets and at a very high level in developing caryopses, where the transcript content was highest during the early stages of seed development (9 to 17 days after anthesis) (Ciaffi et al. 2001). This finding was in agreement with the results obtained in common wheat by Shimoni et al. (1995b) and Grimwade et al. (1996), who had shown, at the protein and mRNA levels respectively, that the temporal expression of PDI was not tightly co-ordinated with the expression of storage proteins, starting earlier in grain development and reaching a maximum before the period of highest gluten synthesis. Although the available data do not establish unequivocally that PDI is essential for the folding and deposition of wheat storage proteins, they indicate either its involvement at some early stage of protein processing and protein body formation, or a more general, housekeeping role in the processing of secretory proteins.

4.2. Characterization of PDI genes and of their promoters

Wheat genes coding for typical PDI were located in chromosome group 4 by Ciaffi et al. (1999). Using a probe consisting of most of the PDI coding sequence, cloned by PCR amplification with primers designed on the basis of the published cDNA sequence (Shimoni et al. 1995a), they detected by Southern analysis of DNA of Triticum aestivum cv. Chinese Spring (CS) four fragments of different length, which, in order of decreasing length, were located on chromosome arms 4AL, 4BS, 4DS and 1BS by means of CS di-telosomic lines. However, recent findings indicate that the PDI gene on chromosome 1B may correspond to a pseudogene, missing part of the 3' coding sequence (Johnson and Bhave 2004).
The location of a PDI gene sequence in the long rather than in the short arm of chromosome 4A was considered consistent with the pericentric inversion in this chromosome (Devos et al. 1995). The number of PDI gene sequences is in line with results obtained in other plant species, such as alfalfa (Shorrosh and Dixon 1991), maize (Li and Larkins 1996) and castor bean (Coughlan et al. 1996), which possess a single-copy sequence, whereas two independent PDI loci were detected in barley (Chen and Hayes 1994). More recently, two and three PDI gene sequences have been identified in the Arabidopsis and rice genomes, respectively (Houston et al. 2005).

Assessment of Restriction Fragment Length Polymorphisms (RFLPs) in different accessions and lines of hexaploid and tetraploid cultivated species of Triticum showed that the restriction fragments located on chromosomes of homoeologous group 4 were highly conserved and that polymorphism occurred only at the 1B locus. Similar analyses performed on 23 species of Triticum and Aegilops (Ciaffi et al. 2000) indicated that PDI restriction fragments were highly conserved within each species and confirmed that plant PDI is encoded by single-copy sequences per genome in diploid species and by a few copies in polyploid species. The Aegilops species of the Sitopsis section showed a rather complex pattern and a high level of intraspecific variation, with the exception of Ae. searsii, which possessed a single, conserved PDI fragment. T. urartu and Ae. tauschii showed single fragments with the same mobility as those located in the A and D genomes of polyploid species, respectively, whereas differences were observed between the hybridization patterns of T. monococcum and T. boeoticum and that of the A genome. The hybridization pattern of T. zhukovskyi was identical to that of T. timopheevi, except for the presence of an additional strong hybridization fragment having the same mobility as the one detected in T. boeoticum and T. monococcum.
The nucleotide sequences of the three genes located on genomes A, B and D (designated GPDI-4A, GPDI-4B and GPDI-4D) were 3561, 3527 and 3466 bp long, respectively (Ciaffi et al. 2005). Their alignment and comparison with the corresponding cDNA sequences (indicated as CPDI-4A, CPDI-4B and CPDI-4D) showed that they possess a conserved, complex structure consisting of ten exons (Fig. 2). The first 5' exon included a non-translated region of 32 bp and a translated region of 200 bp, with a 75 bp initial segment encoding a putative signal peptide of 25 amino acids. Codons for the disulfide isomerase catalytic sites (CGHC) were located in the second exon, starting from the first nucleotide, and in the ninth exon, starting from the 26th nucleotide. Codons for a potential N-glycosylation site (NSF) were in the sixth exon, starting from the 15th nucleotide. The tenth exon included a non-translated region (171 bp for GPDI-4B and 168 bp for GPDI-4A and GPDI-4D) and a translated region (216 bp for GPDI-4B and 225 bp for GPDI-4A and GPDI-4D). Moreover, the three genes showed a consensus sequence for the tetrapeptide KDEL, for protein retention within the ER, at the 3' translated end (Ciaffi et al. 2005). The Open Reading Frames (ORFs) of the PDI genes located on chromosomes 4A and 4D consisted of 1545 bp, corresponding to polypeptides of 515 amino acids with an estimated molecular weight of 56.6 kDa, whereas that on chromosome 4B was shorter, with a length of 1536 bp, corresponding to 512 amino acids and an estimated molecular weight of 56.3 kDa. The three deduced protein sequences were rich in acidic residues and had a pI of 4.7. The nucleotide sequences of the three ORFs showed an identity ranging between 96.0 and 97.5%. Only 14 of the 66 nucleotide substitutions in the coding regions caused amino acid changes, and six of these could modify the physico-chemical features of the mature protein (Ciaffi et al. 2005).
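The lengths quoted above can be cross-checked with simple codon arithmetic. The following Python sketch is an illustration only and is not part of the original analysis: the variable and function names are hypothetical, and it assumes that the reported ORF lengths exclude the stop codon, which is what the quoted bp/amino acid pairs imply.

```python
# Illustrative cross-check of figures quoted in the text (Ciaffi et al. 2005).
# Assumption: ORF lengths are reported without the stop codon, so bp / 3 = residues.

ORF_BP = {"4A": 1545, "4B": 1536, "4D": 1545}             # ORF lengths (bp)
EXON10_TRANSLATED_BP = {"4A": 225, "4B": 216, "4D": 225}  # translated part of exon 10 (bp)
SIGNAL_PEPTIDE_BP = 75                                    # initial segment of exon 1 (bp)


def codons(length_bp: int) -> int:
    """Return the number of whole codons in a coding segment."""
    if length_bp % 3 != 0:
        raise ValueError(f"{length_bp} bp is not a whole number of codons")
    return length_bp // 3


# Deduced polypeptide lengths for the three homoeologous genes.
protein_lengths = {gene: codons(bp) for gene, bp in ORF_BP.items()}

# Length of the putative signal peptide.
signal_peptide_aa = codons(SIGNAL_PEPTIDE_BP)

# Difference between the 4A and 4B genes, measured on the ORF and on exon 10.
orf_gap_bp = ORF_BP["4A"] - ORF_BP["4B"]
exon10_gap_bp = EXON10_TRANSLATED_BP["4A"] - EXON10_TRANSLATED_BP["4B"]

print(protein_lengths)
print(signal_peptide_aa, orf_gap_bp, exon10_gap_bp)
```

Under this assumption the figures are mutually consistent: the ORF lengths give 515, 512 and 515 residues, the 75 bp segment gives a 25-residue signal peptide, and the 9 bp (three-codon) difference between the 4B ORF and the other two equals the difference between the translated regions of their tenth exons.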
GPDI-4B and GPDI-4D showed high identity (94.5%), whereas GPDI-4A had the same identity (92.0%) with both GPDI-4B and GPDI-4D. Identity between introns was 89-91% and that between exons 94-100%. Exons showed single nucleotide substitutions only, apart from a single deletion of nine nucleotides within the tenth exon of GPDI-4B, whereas the introns had more frequent base substitutions and insertions/deletions, which caused the different lengths of the genomic sequences (Fig. 2). The genomic sequence on chromosome 4D (GPDI-4D) of CS showed a very high identity (99.7%) with that of Aegilops tauschii (Johnson and Bhave 2004). The most noteworthy differences were a 34 bp deletion at the end of the first intron and a 2 bp deletion in the eighth intron of GPDI-4D. The A genome PDI gene sequences of hexaploid and tetraploid species were highly conserved, showing 99.6% identity (Ciaffi et al. 2001). Comparisons of these genomic sequences with those of Arabidopsis and rice showed a significant conservation of the exon/intron structure and exon size across the three species, most probably due to a strong relationship between the domain organization of the encoded proteins and the genomic structure of the corresponding genes (Ciaffi et al. 2005).

Although the deduced amino acid sequences of the three wheat PDI genes exhibited an overall identity of only 31% to that of human PDI, their modular architectures, in terms of number, size, location and secondary structure propensities of the constituent domains, are remarkably similar. Sequence homologies, both internal and to human PDI, indicated that the proteins encoded by the three genes are composed of four major regions, corresponding almost exactly to the a, b, b' and a' domains of human PDI (Fig. 1). Secondary structure analysis revealed that the a and a' domains of wheat PDIs, which are homologous both to each other (43% identity) and to thioredoxin, adopted a thioredoxin-like folding.
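Identity values such as those quoted above are derived from pairwise alignments. The source does not state how gaps were treated, so the following Python sketch makes an explicit assumption: columns in which both sequences carry a gap are ignored, and every other column, including residue-versus-gap columns, counts towards the denominator. The function name and the short toy sequences are hypothetical and are not wheat PDI data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention (not taken from the source): columns that are gaps in
    both sequences are skipped; a residue facing a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # uninformative column, skip it
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


# Toy alignment: one substitution and one single-base deletion in ten columns.
toy_a = "ATGGCGATCT"
toy_b = "ATGGCAAT-T"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}%")
```

Other conventions, for example excluding all gapped columns from the denominator, would give slightly different values for sequences that differ by insertions or deletions, which is relevant when comparing introns rather than exons.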
The putative b and b' domains of the wheat proteins also had a folding pattern very similar to that of the a and a' modules, although the extent of sequence identity between the b and b' regions was not sufficient to consider them internal repeats; moreover, no sequence homology was detected between them and any thioredoxin or thioredoxin-like domains. Although specific structural studies would be necessary to recognize the domain boundaries and to define their structure unambiguously, the proposed multidomain structure of wheat PDI would suggest that in plants, as in mammals, the PDI domains may stem from partial gene duplication of a common ancestral thioredoxin gene, followed by sequence divergence. Those domains probably arose before the appearance of most eukaryotic species, because homologous PDI sequences are present in eukaryotes as diverse as fungi, insects, mammals and plants (Freedman et al. 1994; Sahrawy et al. 1996; Ferrari and Soling 1999). All eukaryotic PDIs have recognisable a and a' modules, whereas the putative b and b' modules have diverged to such an extent within and between species that their homology is doubtful; however, the corresponding segments always contain approximately the same number of amino acids and retain some elements of the thioredoxin folding.

As far as the promoter sequences of the three homoeologous genes are concerned, Ciaffi et al. (2005) cloned the region upstream of the translation start codon of each of them. Their length was 1352 bp for PromPDI-4A, 1370 bp for PromPDI-4B and 1292 bp for PromPDI-4D. The three sequences showed 89% identity, with a high degree of conservation in the proximal 700 nt (identity exceeding 93%) and lower conservation in the distal region (about 80% identity). Differences were due to both nucleotide substitutions and short insertions/deletions. Every promoter showed a TATA-box located at -79 nt from the start codon and several CAAT boxes.
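Positions such as -79 nt are counted backwards from the translation start codon. The Python sketch below illustrates one way of reporting motif positions in that coordinate system. It is a hypothetical example, not the procedure used by the authors: the function name and the toy upstream sequence are invented, the convention that the base immediately upstream of the start codon is -1 is an assumption, and TATAAA and CAAT are used only as generic consensus strings because the source does not give the exact motif sequences found in the wheat promoters.

```python
from __future__ import annotations

import re


def motif_positions(upstream: str, motif: str) -> list[int]:
    """Positions of the first base of each motif occurrence, relative to the
    start codon.

    `upstream` is the sequence ending immediately before the start codon.
    Assumed convention: the last upstream base is position -1.
    Overlapping occurrences are reported (zero-width lookahead search).
    """
    upstream = upstream.upper()
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [m.start() - len(upstream) for m in pattern.finditer(upstream)]


# Toy 28-nt upstream sequence (not a wheat promoter).
toy_upstream = "GGCAATCCTATAAATAGGCCAATGCAGC"

tata_hits = motif_positions(toy_upstream, "TATAAA")
caat_hits = motif_positions(toy_upstream, "CAAT")
print(tata_hits, caat_hits)
```

Exact string matching is the simplest possible choice; searches for regulatory elements in real promoters usually allow degenerate positions, which would require a pattern or position weight matrix rather than a fixed string.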
The promoters contained a number of different conserved cis-acting regulatory elements (Table 1), including several motifs (AACA, GCN4, prolamin box, RY element, Skn-1) involved in the regulation of endosperm-specific genes (Guilfoyle 1997; Albani et al. 1997; Wu et al. 1998).

Expression analysis of the three CS PDI genes, performed by RT-PCR on mRNAs extracted from roots, coleoptiles, spikelets, leaves and developing caryopses collected at short intervals between 6 and 34 days after anthesis (DAA), indicated that the three genes were constitutively expressed in all the tissues tested, with a very strong expression in immature caryopses, where transcription levels were quite similar (Fig. 3). The transcription levels of the three genes were higher in the early stage of seed development (6-14 DAA) and decreased during the middle to late stages of grain filling (18-34 DAA). The lowest level of transcripts was observed for all three genes in coleoptiles, whereas differences in their expression were detected in spikelets, roots and leaves (Fig. 4). CPDI-4A transcription was higher in spikelets, that of CPDI-4B was higher in roots, and the CPDI-4D transcripts were more abundant in leaves. It is noteworthy that no differences in the expression of the three genes were detected at different stages of caryopsis development (Fig. 3).

4.3. Identification of novel wheat PDI-like gene sequences

A search in the HarvEST Wheat database allowed Ciaffi et al. (in preparation) to identify several sequences containing ORFs with significant similarity to the coding regions of genes assigned to five of the eight PDI phylogenetic groups identified in the Arabidopsis genome (Houston et al. 2005).
Among them, 18 wheat ESTs corresponded to the full-length PDI sequences of Arabidopsis belonging to the fifth subfamily, and 15 ESTs to the second subfamily, both characterised by two thioredoxin-like active domains and by structural similarities to different PDI-like proteins in higher eukaryotes, whereas two distinct groups of EST sequences covered only part of the coding region of the fourth and seventh subfamilies. Finally, several ESTs showed significant homology with Arabidopsis sequences of the first phylogenetic group, whose members, as reported previously, have been cloned and extensively characterized in wheat. Full-length (II and V groups) and partial (IV and VII groups) cDNA sequences were generated by RT-PCR from mRNAs of immature caryopses of CS and cloned. Subsequently, the validity of the full-length (WHEPDI-3 and WHEPDI-4) and partial (WHEPDI-2 and WHEPDI-5) cDNA clones was checked and confirmed by sequence analysis and comparison. Full-length cDNA sequences were obtained by 5' and 3' RACE extension for the two incomplete sequences WHEPDI-2 and WHEPDI-5.

WHEPDI-2 exhibited only 19.4% identity with the typical wheat PDI of the first group (WHEPDI-1); it was shorter, had fewer amino acids separating the pair of thioredoxin active domains, and had a C-terminal α-helical domain of about 100 aa, termed the D domain, whose function is unknown (Fig. 5). In spite of the presence of a potential ER-translocation signal, this protein lacks an ER-retention signal, suggesting that it might be targeted to a different subcellular location or could be retained as part of a heteromeric complex with other subunits containing such a signal. WHEPDI-3 exhibited only 19.6 and 18.8% identity with WHEPDI-1 and WHEPDI-2, respectively. This protein includes 440 amino acid residues, the two thioredoxin active domains are separated by 32 amino acids, and the putative protein contains the C-terminal KDEL signal for ER retention.
The deduced aa sequence of WHEPDI-3, like those of the other plant members of the fifth subfamily, is closely related to the mammalian P5 PDI-like proteins in terms of sequence homology (about 40% identity), position of thioredoxin domains and size of polypeptides. The aa sequence of WHEPDI-4 showed 27.4, 18 and 14% identity with WHEPDI-1, WHEPDI-2 and WHEPDI-3, respectively; it is the largest (585 aa) among the PDI-like proteins identified in wheat. WHEPDI-4 is characterised by the presence of a domain, located at the N-terminus of the mature protein, that contains 40% acidic residues (E+D). Remarkably, this domain is reminiscent of the c domain found close to the C-terminus of typical PDI from mammals and to the N-terminus of homologs of ERP72 (Ferrari and Soling 1999). In mammals this domain is a putative low-affinity, high-capacity calcium-binding site. The deduced amino acid sequence of WHEPDI-4 is also characterized by the presence of a signal peptide of 20 aa, of two thioredoxin active domains separated by 234 aa and of the C-terminal signal KDEL for ER retention. Finally, the deduced aa sequence of WHEPDI-5 is characterised by the presence of a single thioredoxin active domain and of a transmembrane segment, and by the absence of the KDEL signal. Recently, proteins with a similar structure have been identified in humans, Drosophila and C. elegans (Clissold and Bicknell 2003; Wilkinson and Gilbert 2004; Ellgaard and Ruddock 2005).

5. CONCLUSIONS

The multigenic family comprising PDI and PDI-like proteins performs manifold metabolic functions. Their role has been shown by many studies, mostly in mammals, whereas in plants the knowledge of the structural and functional features of this group of proteins and of their encoding genes is much less extensive.
Given their possible involvement in determining the technological properties of wheat flour, the study of the gene family encoding these proteins in wheat is important from an applied viewpoint, and it would also be very important for understanding the molecular evolution of this multigenic family in a polyploid context. Up to now, all studies on molecular characterization and transcriptional regulation have focused exclusively on wheat typical PDI; the limited information available shows that in hexaploid wheat it is coded by three homoeologous genes located on the group 4 chromosomes, that the coding nucleotide sequences, exon/intron structures and regulatory sequences of these genes are well conserved, and that all three genes are functional. The high evolutionary conservation probably reflects the important functional role of their product in protein folding. The negligible differences between their cDNAs, at the level of both nucleotide and deduced amino acid sequences, do not support a functional differentiation of their gene products, as only six of the 14 replaced amino acids may modify the physico-chemical features of the mature proteins. The observation that the expression of the three homoeologous genes is similar and much higher in immature caryopses than in other wheat tissues is consistent with the assumption that the quality control system involving PDI is up-regulated in tissues in which abundant synthesis of secretory proteins takes place, such as the wheat endosperm. The higher amount of PDI transcripts of the three homoeologous genes detected in developing endosperm is consistent with the presence in their promoter regions of several conserved motifs that have been shown to be involved in the regulation of genes preferentially expressed in the endosperm. The differences observed in the level of transcripts of the three genes in spikelets, roots and leaves would suggest a differential regulation of the transcription rates of the three wheat PDI genes.
Future studies should focus on the functional analysis of the promoter regions of the three PDI genes to elucidate the mechanisms controlling their spatially and temporally specific expression and the role of the individual regulatory motifs. In particular, reporter-gene expression analysis of promoters carrying progressive deletions at their distal ends and/or base substitutions within putative consensus sequences would help to identify the cis-elements that contribute to the higher transcript levels observed in seeds and to the differential expression in other tissues.

Despite the recent data on the complexity and diversity of the PDI gene family in plants, numerous questions remain unanswered concerning the cellular location and physiological functions of the individual PDI and PDI-like proteins. For each of them it will be necessary to determine whether they have overlapping and redundant or separate and specific target substrates. Determining the enzymatic specificity of the plant PDI-like proteins, and their capacity to act independently or by interacting with other proteins in a redox chain, would be important for understanding their role in the formation and/or isomerization of disulfide bridges, as well as their physiological functions. In wheat most members of the PDI family have yet to be identified, since only five PDI-like genes have been isolated, as described previously. Further research will be needed to elucidate the complexity and diversity of the PDI-like proteins in wheat and to understand their involvement in the folding, assembly, transport and deposition of seed storage proteins.
A possible approach for exploring these aspects would be to modify the expression levels of the PDI and PDI-like genes in transgenic wheat plants and to examine the resulting phenotype, with particular attention to the processing and storage of proteins that pass through the secretory system and to their direct and/or indirect effect on technological quality.

ACKNOWLEDGMENTS

This research was supported by the MIUR (Italian Ministry of Education, University and Research), “FIRB” project (D.M. 199, 8/3/2001, Prot. RBNE01TYZF). This paper is dedicated to G. T. Scarascia Mugnozza on the occasion of his 80th birthday.

6. REFERENCES

Albani D, Hammond-Kosack MCU, Smith C, Conlan S, Colot V, Holdsworth M and Bevan MW (1997) The wheat transcriptional activator SPA: a seed-specific bZIP protein that recognizes the GCN4-like motif in the bifactorial endosperm box of prolamin genes. Plant Cell 9: 171-184.
Bennet CF, Balcarek JM, Varricchio A and Crooke ST (1988) Molecular cloning and complete amino-acid sequence of form-I phosphoinositide-specific phospholipase C. Nature 334: 268-270.
Bulleid NJ and Freedman RB (1988a) Defective co-translational formation of disulphide bonds in protein disulphide-isomerase-deficient microsomes. Nature 335: 649-651.
Bulleid NJ and Freedman RB (1988b) The transcription and translation in vitro of individual cereal storage-protein genes from wheat (Triticum aestivum cv. Chinese Spring). Biochem J 254: 805-810.
Bulleid NJ, Shewry PR and Freedman RB (1992) Exploring the structure and assembly of wheat storage proteins using an in vitro transcription/translation system. In: Shewry PR and Gutteridge S (eds) Plant Protein Engineering, Cambridge University Press, Cambridge, UK, pp 201-208.
Chen F and Hayes PM (1994) Nucleotide sequence and developmental expression of duplicated genes encoding protein disulfide isomerase in barley (Hordeum vulgare L.). Plant Physiol 106: 1705-1706.
Cheng SY, Gong QH, Parkinson C, Robinson EA, Appella E, Merlino GT and Pastan I (1987) The nucleotide sequences of a human cellular thyroid hormone binding protein present in endoplasmic reticulum. J Biol Chem 262: 11221-11227.
Ciaffi M, Dominici L, Tanzarella OA and Porceddu E (1999) Chromosomal assignment of gene sequences coding for protein disulphide isomerase (PDI) in wheat. Theor Appl Genet 98: 405-410.
Ciaffi M, Dominici L, Umana E, Tanzarella OA and Porceddu E (2000) Restriction Fragment Length Polymorphism (RFLP) for protein disulfide isomerase (PDI) gene sequences in Triticum and Aegilops species. Theor Appl Genet 101: 220-226.
Ciaffi M, Paolacci AR, Dominici L, Tanzarella OA and Porceddu E (2001) Molecular characterization of gene sequences coding for protein disulfide isomerase (PDI) in durum wheat (Triticum turgidum ssp. durum). Gene 265: 147-156.
Ciaffi M, Paolacci AR, d’Aloisio E, Tanzarella OA and Porceddu E (2005) Cloning and characterization of wheat PDI (Protein disulfide isomerase) homoeologous genes and promoter sequences. Gene (accepted).
Clissold PM and Bicknell R (2003) The thioredoxin-like fold: hidden domains in protein disulfide isomerases and other chaperone proteins. BioEssays 25: 603-611.
Coughlan SJ, Hastings C and Winfrey RJ (1996) Molecular characterization of plant endoplasmic reticulum: Identification of protein disulfide-isomerase as the major reticuloplasmin. Eur J Biochem 235: 215-224.
Deleage G, Blanchet C and Geourjon C (1997) Protein structure prediction. Implication for biologist. Biochimie 79(11): 681-686.
Denecke J, De Rycke R and Botterman J (1992) Plant and mammalian sorting signals for protein retention in the endoplasmic reticulum contain a conserved epitope. EMBO J 11: 2345-2355.
Denecke J (1996) Soluble endoplasmic reticulum resident proteins and their function in protein synthesis and transport. Plant Physiol Biochem 34: 197-205.
Devos KM, Dubcovsky J, Dvorak J, Chinoy CN and Gale MD (1995) Structural evolution of wheat chromosomes 4A, 5A and 7B and its impact on recombination. Theor Appl Genet 91: 282-288.
Ellgaard L and Ruddock LW (2005) The human protein disulphide isomerase family: substrate interactions and functional properties. EMBO reports 6: 28-32.
Fassio A and Sitia R (2002) Formation, isomerization and reduction of disulphide bonds during protein quality control in the endoplasmic reticulum. Histochem Cell Biol 117: 151-157.
Ferrari DM and Soling HD (1999) The protein disulphide-isomerase family: unravelling a string of folds. Biochem J 339: 1-10.
Frand AR, Cuozzo JW and Kaiser CA (2000) Pathways for protein disulphide bond formation. Trends Cell Biol 10(5): 203-210.
Freedman RB, Hirst TR and Tuite MF (1994) Protein disulphide isomerase: building bridges in protein folding. Trends Biochem Sci 19: 331-336.
Grimwade B, Tatham AS, Freedman RB, Shewry PR and Napier JA (1996) Comparison of the expression patterns of wheat gluten proteins and proteins involved in the secretory pathway in developing caryopses of wheat. Plant Mol Biol 30(5): 1067-1073.
Guilfoyle TJ (1997) The structure of plant gene promoters. In: Setlow JK (ed) Genetic engineering vol 19, Plenum Press, New York, pp 15-47.
Hayano T, Hirose M and Kikuchi M (1995) Protein disulfide isomerase lacking its isomerase activity accelerates folding in the cell. FEBS Letters 377(3): 505-511.
Houston NL, Fan C, Xiang QY, Schulze JM, Jung R and Boston RS (2005) Phylogenetic analyses identify 10 classes of the protein disulfide isomerase family in plants, including single-domain protein disulfide isomerase-related proteins. Plant Physiol 137: 762-778.
Johnson JC and Bhave M (2004) Molecular characterisation of the protein disulphide isomerase genes of wheat. Plant Sci 167: 397-410.
Kemmink J, Darby NJ, Dijkstra K, Nilges M and Creighton TE (1997) The folding catalyst protein disulfide isomerase is constructed of active and inactive thioredoxin modules. Curr Biol 7(4): 239-245.
Kim CS, Hunter BG, Kraft J, Boston RS, Yans S, Jung R and Larkins BA (2004) A defective signal peptide in a 19-kD alpha-zein protein causes the unfolded protein response and an opaque endosperm phenotype in the maize De*-B30 mutant. Plant Physiol 134: 380-387.
Li CP and Larkins BA (1996) Expression of protein disulfide isomerase is elevated in the endosperm of the maize floury-2 mutant. Plant Mol Biol 30: 873-882.
Livesley MA, Bulleid NJ and Bray CM (1992) Protein disulfide isomerase in germinating wheat (Triticum aestivum) seed and loss of viability. Seed Sci Res 2: 97-103.
Lucero HA and Kaminer B (1999) The role of calcium on the activity of ER calcistorin/Protein disulfide isomerase and the significance of the C-terminal and its calcium binding. A comparison with mammalian protein-disulfide isomerase. J Biol Chem 274(5): 3243-3251.
Lyles MM and Gilbert HF (1991) Catalysis of the oxidative folding of ribonuclease A by protein disulfide isomerase: dependence of the rate on the composition of the redox buffer. Biochemistry 30(3): 613-619.
Noiva R and Lennarz WJ (1992) Protein disulfide isomerase. A multifunctional protein resident in the lumen of the endoplasmic reticulum. J Biol Chem 267(6): 3553-3556.
Pihlajaniemi T, Helaakoski T, Tasanen K, Myllyla R, Huhtala ML, Koivu JG and Kivirikko KI (1987) Molecular cloning of the beta-subunit of human prolyl 4-hydroxylase. This subunit and protein disulphide isomerase are products of the same gene. EMBO J 6: 643-649.
Roden LT, Miflin BJ and Freedman RB (1982) Protein disulphide isomerase is located in the endoplasmic reticulum of developing wheat endosperm. FEBS Lett 138: 121-124.
Rubin R, Levanoy H and Galili G (1992) Evidence for the presence of two different types of protein bodies in wheat endosperm. Plant Physiol 99: 718-724.
Sahrawy M, Hecht V, Lopez-Jaramillo J, Chueca A, Chartier Y and Meyer Y (1996) Intron position as an evolutionary marker of thioredoxins and thioredoxin domains. J Mol Evol 42: 422-431.
Schwaller M, Wilkinson B and Gilbert HF (2003) Reduction-reoxidation cycles contribute to catalysis of disulfide isomerisation by protein-disulfide isomerase. J Biol Chem 278(9): 7154-7159.
Shewry PR and Tatham AS (1997) Disulphide bonds in wheat gluten proteins. J Cereal Sci 25: 207-227.
Shewry PR, Halford NG, Tatham AS, Popineau Y, Lafiandra D and Belton PS (2003) The high molecular weight subunits of wheat glutenin and their role in determining wheat processing properties. Adv Food Nutr Res 45: 219-302.
Shimoni Y, Segal G, Zhu X and Galili G (1995a) Nucleotide sequence of a wheat cDNA encoding protein disulfide isomerase. Plant Physiol 107: 281.
Shimoni Y, Zhu X, Levanoy H, Segal G and Galili G (1995b) Purification, characterization, and intracellular localization of glycosylated protein disulfide isomerase from wheat grains. Plant Physiol 108: 327-335.
Shorrosh BS and Dixon RA (1991) Molecular cloning of a putative plant endomembrane protein resembling vertebrate protein disulfide-isomerase and a phosphatidylinositol-specific phospholipase. Proc Natl Acad Sci USA 88: 10941-10945.
Shorrosh BS and Dixon RA (1992) Molecular characterization and expression of an alfalfa protein with sequence similarity to mammalian ERp72, a glucose-regulated endoplasmic reticulum protein containing active site sequences of protein disulphide isomerase. Plant J 2: 51-58.
Takemoto Y, Coughlan SJ, Okita TW, Hikaru S, Masahiro O and Tohihiro K (2002) The rice mutant esp2 greatly accumulates the glutenin precursor and deletes the protein disulfide isomerase. Plant Physiol 128: 1212-1222.
Tanaka S, Uehara T and Nomura Y (2000) Up-regulation of protein-disulfide-isomerase in response to hypoxia/brain ischemia and its protective effect against apoptotic cell death. J Biol Chem 275: 10388-10393.
Tasanen K, Parkkonen T, Chow LT, Kivirikko KI and Pihlajaniemi T (1988) Characterization of the human gene for a polypeptide that acts both as the beta subunit of prolyl 4-hydroxylase and as protein disulfide isomerase. J Biol Chem 263: 16218-16224.
Tsai B, Rodighiero C, Lencer WI and Rapoport TA (2001) Protein disulfide isomerase acts as a redox-dependent chaperone to unfold cholera toxin. Cell 104: 937-948.
Tu BP and Weissman JS (2004) Oxidative protein folding in eukaryotes: mechanisms and consequences. J Cell Biol 164(3): 341-346.
Turano C, Coppari S, Altieri F and Ferraro A (2002) Proteins of the PDI family: unpredicted non-ER locations and functions. J Cell Physiol 193: 154-163.
Wetterau JR, Combs KA, Spinner SN and Joiner BG (1990) Protein disulfide isomerase is a component of the microsomal triglyceride transfer protein complex. J Biol Chem 265(17): 9801-9807.
Wilkinson B and Gilbert HF (2004) Protein disulfide isomerase. Biochim Biophys Acta 1699: 35-44.
Wrobel R (1996) Expression of molecular chaperones in endoplasmic reticulum of maize endosperm. PhD thesis. North Carolina State University, Raleigh, NC.
Wu CY, Suzuki A, Washida H and Takaiwa F (1998) The GCN4 motif in a rice glutelin gene is essential for endosperm-specific gene expression and is activated by Opaque-2 in transgenic rice plants. Plant J 14: 673-683.
Xu ZJ, Ueda K, Masuda K, Ono M and Inoue M (2002) Molecular characterization of a novel protein disulfide isomerase in carrot. Gene 284: 225-231.
Yao Y, Zhou YC and Wang CC (1997) Both the isomerase and chaperone activities of protein disulfide isomerase are required for the reactivation of reduced and denatured acidic phospholipase A2. EMBO J 16: 651-658.

Table 1. Main regulatory motifs found within the promoter sequences of the three wheat PDI genes.

Motif         Chromosome  Strand  Distance from ATG  Sequence(a)     Function
Prolamin box  4A          +       -226               TGAAAAGT        Cis-acting element associated with GCN4 in prolamin genes
              4B          +       -227               CGAAAAGT
              4D          +       -226               CGAAAAGT
              4A          -       -1117              TGAAAAGT
              4B          -       -1135              TGAAAAGT
              4D          -       -1105              TGAAAAGT
AACA          4A          +       -1111              CAAC- CAATTTCG  Cis-acting element conserved in rice glutelin genes and involved in endosperm-specific expression
              4B          +       -1129              CAACAAACTTCG
              4D          +       -1099              CAAC- CATTTTCG
RY-element    4A          +       -815               CATGCATT        Cis-acting element involved in seed-specific expression of proteins in legumes and cereals
              4B          +       -810               CATGCATT
              4D          +       -804               CATGCATT
GCN4          4A          +       -356               CGACTCA         Cis-regulatory element involved in seed-specific expression
              4B          +       -358               CGAGTCA
              4D          +       -357               TGAGTCA
              4A          +       -492               CATGTCA
              4B          +       -493               CATGTCA
              4D          +       -491               CGTGTCA
              4A          -       -484               CGTGTCA
              4B          -       -485               CGTGTCA
              4D          -       -483               CATGTGA
              4A          -       -780               CGAGCCA
              4B          -       -775               CGAGCCA
              4D          -       -769               CGAGCCA
TATA box      4A          +       -79                TATTAAA
              4B          +       -79                TATTAAA
              4D          +       -79                TATTAAA
Skn-1         4A          -       -480               ACGAC           Cis-acting regulatory element for endosperm expression
              4B          -       -481               ACGAC
              4D          -       -479               ATGAC

Motif         Chromosome  Strand + (number)  Strand - (number)  Function
CAAT box      4A          12                 12                 Cis-acting element common in promoter and enhancer regions
              4B          14                 17
              4D          12                 15
GC-motif      4A          6                  11                 Cis-acting element common in promoter and enhancer regions
              4B          7                  11
              4D          10                 10

(a) Consensus sequences of the motifs are in bold, base substitutions are underlined.

Fig. 1. Comparison of the domain structure of human and wheat typical PDIs. The elements of secondary structure, either determined for the a and b domains of human PDI using NMR techniques (PDB Id 1BJX) or predicted for the putative a, a’, b and b’ domains of wheat CPDI-4A by the procedure of Deleage et al. (1997), are reported. Open boxes indicate residues present in helices, whereas black solid boxes indicate residues present in strands.

Fig. 2. Intron-exon structures of the three group 4 homoeologous PDI genes.
The open boxes indicate exons and the solid black boxes denote introns; numbers represent exon and intron sizes (bp). The positions of the putative N-terminal signal peptide (SP), of the two thioredoxin-like active sites (CGHC) and of the C-terminal KDEL signal sequence for ER retention are also indicated.

Fig. 3. RT-PCR of the three PDI genes in developing caryopses collected between 6 and 34 DAA (days after anthesis). RT-PCR products were sampled after 20 and 25 cycles of amplification and analysed in 1.2% agarose gels. The tubulin (TUB) constitutive gene was amplified as control. M: part of the DNA molecular weight marker XIV (Roche); the most intense band is 500 bp in length.

Fig. 4. Expression analysis by RT-PCR of the three PDI genes in different wheat tissues (1: roots; 2: seedlings; 3: spikelets; 4: leaves; 5: developing caryopses 10 DAA). a) Agarose gel electrophoresis of PCR products after 20 and 25 cycles of amplification. The tubulin constitutive gene (TUB) was amplified as control. b) Southern blots of PCR products after 18 and 23 cycles hybridised with probes represented by the cDNA sequences of the three homoeologous genes (CPDI-4A, CPDI-4B and CPDI-4D).

Fig. 5. Domain structure of the deduced amino acid sequences of wheat PDI-like genes. The position and length of the redox-active thioredoxin domains, of the acidic domain, of the D domain, of the transmembrane segment and of the putative signal peptide (SP) are indicated. Bars indicate the C-terminal KDEL signal sequence and the thioredoxin-like active sites (CGHC). The analysis of the predicted protein sequences of the wheat PDI-like genes was carried out by searching for conserved motifs in the Pfam HMM, InterPro and SMART databases.

[Fig. 1: domain diagrams of human PDI (SP; domains a, b, b’, a’, c; CGHC active sites; KDEL) and wheat PDI (SP; domains a, b, b’, a’; CGHC active sites; KDEL), with helices and strands marked.]
[Fig. 2: exon/intron diagrams of the three homoeologous PDI genes, with exon and intron sizes and the SP, CGHC and KDEL positions marked.]
[Fig. 3: gel panels for PDI-4A, PDI-4B, PDI-4D and TUB after 20 and 25 cycles; lanes M and 6, 10, 14, 18, 22, 26, 30, 34 DAA.]
[Fig. 4: gel and blot panels; lane M.]
[Fig. 5: domain diagrams of WHEPDI-1 to WHEPDI-5 showing SP, thioredoxin domains (a, a’, a°) with CGHC active sites, the acidic (E+D rich) segment, the D domain, the transmembrane segment, and the C-terminal KDEL or NDEL signals.]
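
The strand and distance-from-ATG columns of Table 1 can be illustrated with a minimal Python sketch that scans an upstream sequence for exact matches of a motif on both strands. The function names, the toy sequence and the coordinate convention (position -1 is the base immediately upstream of the ATG, and a minus-strand hit is reported at the leftmost base of its reverse complement) are assumptions made for illustration. The procedure actually used to compile Table 1 is not described here, and the table also lists variants with base substitutions, which an exact-match search like this one would miss.

```python
# Illustrative sketch only: the names and the coordinate convention are
# assumptions, not the procedure used to build Table 1.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(upstream, motif):
    """List (strand, distance_from_ATG) for every exact motif occurrence.

    `upstream` is the promoter sequence ending immediately before the ATG,
    so its last base is position -1. A motif that equals its own reverse
    complement is reported once per strand.
    """
    hits = []
    n = len(upstream)
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = upstream.find(pattern)
        while start != -1:
            hits.append((strand, start - n))
            # Advance by one base so overlapping occurrences are found too.
            start = upstream.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Toy upstream region (hypothetical): a TATA-like motif placed at -79.
toy_upstream = "G" * 20 + "TATTAAA" + "G" * 72
print(find_motif(toy_upstream, "TATTAAA"))  # [('+', -79)]
```

Exact matching keeps the sketch short; tolerating one or two substitutions, as the variants in Table 1 would require, would need an approximate comparison against the consensus instead of `str.find`.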