Site-specific Incorporation of trans-4-hydroxyproline in Recombinant Collagens and Gelatins

Doug Brownfield, Emily Perttu, Eddie Wang, and James Zhang

Introduction:
In order for synthetic biology systems to better mimic biology, methods for post-translational modification need to be developed. To reach their final form, many eukaryotic proteins are post-translationally modified at specific amino acid residues. Expressing these proteins in microbial hosts, which generally lack the machinery to make these modifications, can lead to products with diminished or no function. Furthermore, such studies provide no insight into the purpose of the modifications. It is therefore in synthetic biology’s interest to have the tools necessary for creating these modifications. The goal of our project is to mimic one such modification in E. coli: the conversion of proline residues to trans-4-hydroxyproline residues. As an initial target protein, we will attempt to express collagens, as they are the best studied and arguably most important class of proteins that contain hydroxyproline. We hope to ensure high efficiency, fidelity, and yield of our product by designing a genetic regulatory pathway.

Idea Overview:
Our objective is to site-specifically incorporate the non-canonical amino acid trans-4-hydroxyproline into collagen expressed in E. coli. We will accomplish this by engineering an amber suppressor prolyl tRNA and a cognate hydroxyprolyl-tRNA synthetase. The synthetase will specifically charge the prolyl tRNA with hydroxyproline and, in response to amber stop codons, this charged tRNA will be used by the ribosome, thereby incorporating hydroxyproline. Free trans-4-hydroxyproline will be continually supplied by introducing an enzyme that catalyzes the conversion of proline to hydroxyproline. Finally, translation will be regulated by an engineered hydroxyproline-responsive riboregulator.
The resulting E. coli will have what amounts to a 21-amino-acid genetic code with all the cellular machinery necessary for efficient collagen expression.

Background:
Collagen is the most abundant protein in mammals, constituting 30% of a human’s protein mass. Serving as a scaffold, collagen is used by cells to mold their surroundings, eventually creating an environment conducive to cellular function and tissue development. Besides mechanical support, collagen contains various ligands for growth factor receptors and integrins that can influence cellular processes such as cell adhesion, chemotaxis/migration, tissue remodeling, and wound healing. By definition, collagen molecules consist of three polypeptide chains (α chains) and have at least one domain composed of repeating Gly-X-Y sequences in each of the constituent chains (Myllyharju and Kivirikko, 2004). Vertebrates are currently known to have at least 27 collagen types with 42 distinct α chains. Some collagens form homotrimers of three identical α chains, while others contain two or even three different α chains. The X and Y positions can hold any amino acid other than glycine, but typically proline is found in the X position and 4-hydroxyproline in the Y position. While 4-hydroxyprolines are essential for the stability of the triple helix, glycines are necessary for packing the three chains into a coiled-coil structure. Each chain forms a left-handed helix, and the three chains are wound around a common axis to form a triple helix with a shallow right-handed superhelical pitch, making the final structure a rope-like rod.

Collagen Types
To avoid confusion, collagens are numbered with Roman numerals in the order of their discovery (types I-XXVII). When referring to a collagen’s composition, each of the three α chains is first given a chain number (1, 2, or 3), then the collagen type is given in parentheses.
For example, α2(I) denotes the second α chain of type I collagen, while α1(II) denotes the first α chain of type II collagen. Collagen types are divided into families mainly by the mechanism and structure of matrix assembly. The collagen families and their member types are: fibril-forming (I, II, III, V, XI, XXIV and XXVII), fibril-associated collagens with interrupted triple helices (FACITs) located on the surface of fibrils (IX, XII, XIV, XVI, XIX, XX, XXI, XXII and XXVI), hexagonal network forming (VIII and X), basement membrane forming (IV), beaded filaments (VI), anchoring fibrils for basement membranes (VII), collagens with transmembrane domains (XIII, XVII, XXIII and XXV), and the family of type XV and XVIII collagens. Certain collagens are expressed in a tissue-specific manner, such as types II, IX and XI, which are found almost exclusively in cartilage, and type XVII, which is only found in skin hemidesmosomes. On the other hand, some collagen types, such as type I, are common in most extracellular matrices. Moreover, collagen fibrils often consist of more than one collagen type; for example, type I collagen fibrils often contain small amounts of types III, V and XII. Further heterogeneity within the superfamily results from alternative splicing of the transcripts of many of the genes as well as the use of alternative promoters in some genes. The large number of structurally distinct members of the superfamily implies that they are involved in numerous biological functions (Kadler, 1995).

Collagen assembly
Most collagens share a similar biosynthetic pathway, which is typically described for type I. Starting inside the cell, three peptide chains are synthesized by ribosomes on the Rough Endoplasmic Reticulum (RER). These peptide chains are referred to as preprocollagens, and each has registration peptides (at the ends) as well as a signal peptide. These peptide chains are then sent into the lumen of the RER where they are cleaved into their procollagen forms.
While still in the RER, these peptide chains undergo a series of modifications. First, lysine and proline residues are hydroxylated, a process dependent on ascorbic acid (Vitamin C). Next, specific hydroxylated amino acids are glycosylated, allowing the three chains to associate into a triple helical structure. Finally, the procollagen is shipped to the Golgi apparatus where it is packaged and secreted by exocytosis. Once outside the cell, the collagen is again organized into a functional matrix. Registration peptides are cleaved by procollagen peptidase, forming tropocollagen, which self-aggregates into collagen fibrils, which in turn self-aggregate into collagen fibers. For non-fibrillar collagen, the N- and C-propeptides remain and may play a critical role in directing supramolecular assembly. After fiber formation, interchain crosslinking of collagen occurs between hydroxylysine and lysine residues following deamination by lysyl oxidase (Yamauchi and Shiiba 2002).

Prolyl 4-Hydroxylase (P4H)
As previously mentioned, hydroxylation of the Y-position proline residues is a critical modification for generating stable triple helical collagen. This modification is carried out in the lumen of the RER by the enzyme prolyl 4-hydroxylase (Tandon, 1998). The vertebrate forms of these P4H’s are α2β2 tetramers in which the β subunit is identical to the protein disulfide isomerase PDI (Myllyharju, 2003). Various isoforms of the catalytic α subunit have been found in organisms of varying size and complexity, from humans to Drosophila (Vuori et al., 1992; Annunen et al., 1999). Another family of P4H’s has been found in the cytoplasm and linked to the regulation of the hypoxia-inducible transcription factor HIF (Ivan, 2001). Cytoplasmic P4H’s have no PDI subunit, require different sequences flanking the prolines that are hydroxylated, and have markedly higher Km values (Kivirikko and Myllyharju, 1998).
No overall amino acid sequence homology is detected between the collagen P4H’s and the cytoplasmic HIF P4H’s, with the exception of critical catalytic residues. HIF is continuously synthesized, and under normoxic conditions a critical proline residue in a -Leu-X-X-Leu-Ala-Pro- sequence is hydroxylated by the cytoplasmic P4H’s, not by collagen P4H’s. The resulting 4-hydroxyproline residue is essential for HIFα binding to the von Hippel–Lindau (VHL) E3 ubiquitin ligase complex for subsequent proteasomal degradation. Under hypoxic conditions, however, hydroxylation ceases, allowing HIFα to escape degradation and instead form a stable dimer with HIFβ (Jaakkola, 2001). Once formed, the dimer is translocated into the nucleus and binds to the HIF-responsive elements in a number of hypoxia-inducible genes, such as those for erythropoietin, vascular endothelial growth factor, glycolytic enzymes and even for the α(I) subunit of human type I collagen (Takahashi, 2002).

Applications
Collagen has been widely used in cosmetic surgery, hemostats, device coatings, resuscitation fluids, formulation excipients, capsules, cartilage reconstruction, drug delivery, as well as skin substitutes for burn patients. However, both medical and cosmetic use is declining because most commercially available collagens are derived from bovine or porcine tissues. Mainly enriched in type I collagen, these preparations also contain small amounts of type III as well as other collagens that are difficult and expensive to remove from the desired material. Moreover, there is a high rate of allergic reactions to animal-derived collagens, causing prolonged redness. Using collagen derived from cows also poses the risk of transmitting prion diseases such as bovine spongiform encephalopathy (BSE). The scientific community also uses collagen to study its role in tissue development and disease. Extracting sufficient quantities of nontraditional or less prominent collagens is a costly and difficult task.
A processed form of collagen commonly used is gelatin. Derived from denatured collagen, gelatin is composed of a mixture of collagen chains of different length, structure, and composition. This distribution depends on what type(s) of collagens are extracted, the extraction method, as well as the pH and ionic strength of the solution used for processing. Because gelatin is a heterogeneous composition, especially in size and isoelectric point, the resulting products will inevitably have variable gelling and physical properties (Olsen, 2005). This variability presents a significant challenge for medical applications where stability, safety, and control are necessary. Cheaply produced recombinant collagens and gelatins have the potential to alleviate many of the issues associated with animal derived versions. Given the large number of aforementioned applications there is also a large market in this area. Scalable technology is needed to make microbial expression of recombinant collagens a viable alternative to tissue extraction. Using microbes to engineer collagen allows for greater control over collagen synthesis and organization, which in turn increases the quality, consistency, and safety of collagen production. It would also provide an easy platform for introducing altered primary sequences into recombinant collagens. Such genetic control over collagen structure is crucial in studying the impact of specific mutations on collagen structural hierarchical assembly and associated functions and also would allow for the creation of designer collagen-mimetic materials. Recombinant expression would also allow for the extraction of sufficient quantities of native collagen forms that are present at low levels which are otherwise mainly characterized at cDNA and genomic levels. This would allow for structural and functional analysis of these rarer collagens. 
Biomaterials applications for collagens in hemostats, as skin substitutes, in cartilage reconstruction, and for drug delivery can benefit from the improved purity of cloned sources of collagen. Purity in this case means reducing both other extracellular matrix components that may be carried through the purification process and provoke inflammatory responses, and bioburdens with potential impact on human health, particularly neurological disorders due to prion concerns. Recombinant human collagen seems to avoid the immune reactions previously described and is therefore more biocompatible. Recombinantly derived collagen was shown to have superior mechanical strength and hemostatic activity compared to animal-derived collagen when formed into a matrix. Recombinant collagens can be altered to include bioactive peptide sequences as well as to be collagenase resistant. Recombinant gelatins can be tailored to alter their gelling temperature by controlling their hydroxyproline content. Moreover, they have been shown to be less allergenic. As gelatins are widely used in the food and drug industry, recombinantly derived gelatins can be made animal-free and thus open for consumption by vegetarians (Baez, 2005).

Past Work
Besides tissue extraction, nonmicrobial systems have also been developed for producing recombinant collagens. However, the current productivity, quality, and costs of these nonmicrobial systems are not attractive for commercial applications (Baez, 2005). Mammalian cells transfected with human collagen genes were the first system used for collagen production and remain the most efficient system for expressing properly prolyl hydroxylated full-length collagen that is also secreted (Ala-Kokko et al. 1991). Expression of collagen genes in insect cells yielded unstable non-hydroxylated collagen, but yielded hydroxylated versions when coexpressed with the human P4H gene (Lamberg, 1996). However, insect cells fail to secrete the collagen and instead accumulate the product intracellularly.
Other sources have included milk from transgenic animals, secretions from transgenic silkworms, and transgenic plants. Most of these methods have not been able to obtain the same degree of prolyl hydroxylation as in native human collagen. Moreover, cost and productivity have not made them commercially viable. Two recombinant systems using lower organisms have been used in the production of stable triple-helical human collagens: yeast and E. coli. In yeast, collagen fragments are secreted as single-chain polypeptides via the yeast alpha-mating factor pre-pro sequence, but secretion of full-length triple-helical procollagen, as seen in normal mammalian collagen synthesis, has not been achieved. In contrast to mammalian expression, the trimerization of collagen polypeptides has an inhibitory effect on secretion, leading to intracellular accumulation despite the presence of the alpha-mating factor pre-pro secretory sequence. The most successful work has come from coexpression of collagen with P4H in Pichia pastoris, where scientists have generated 1-1.5 g/L of collagen types I, II, and III with hydroxylation at or near native levels. The products also showed the same thermal stability and morphology as native collagen (Baez, 2005). In Saccharomyces cerevisiae, collagen type I was generated with 82% of native hydroxylation levels (Toman, 2000). E. coli has seen limited success in collagen production. Small fragments (93-245 AA) of the bovine collagen α2(I) chain have been expressed (Hori et al. 2002). While accumulation levels and purification yields are not known, enough material was produced for identification via antibody staining. Another group used E. coli to produce a totally synthetic gelatin made solely of 32 repeats of Gly-Pro-Pro (Goldberg et al. 1989). While properly constructed, the gelatin was shown to accumulate in inclusion bodies. Furthermore, inhibition of the heat shock (HIF related) response of E.
coli significantly stabilized the expressed synthetic gelatin product. For the most part, success in expressing collagen in E. coli has been limited by low yields, mainly attributed to the apparent instability of these highly repetitive genes in E. coli (Cappello, 1990). In an attempt to circumvent this obstacle, an E. coli strain was engineered for the cotranslational incorporation of hydroxyproline into various lengths of type I collagen (Buechter, 2003). This was achieved by growing an E. coli culture engineered for increased prolyl-tRNA synthetase accumulation in a hyperosmotic medium supplemented with hydroxyproline. However, the resulting α1(I) collagen fragment differed from tissue-derived collagen in that hydroxyproline was present at both the X and Y positions of the Gly-X-Y triplets. Interestingly, collagen fragments of this variant still assembled into triple helices. Collagen and P4H coexpression in E. coli has had setbacks because a pair of essential disulfide bonds in the P4H β subunits are not formed in the cytoplasm of E. coli. However, it has also been demonstrated that E. coli can produce properly folded human collagen P4H in the periplasm (Neubauer, 2007). Also, novel mutant E. coli strains with a more oxidizing cytoplasm have recently been developed and have successfully expressed proteins with up to 17 disulfide bonds. Recent work has used these strains to produce large amounts of recombinant human collagen P4H in the cytoplasm, which resulted in higher amounts of the active tetramer. The major limitation of these previous methods is that coexpression of P4H genes with collagen genes still leaves the hydroxylation process at the whim of P4H’s sequence and structural specificity. That is, scientists have no power to specify the exact location of hydroxyproline residues.
This is a hindrance to the design of novel collagen-based materials and to any studies that might want to predictably alter the hydroxyproline content. It may also be the case that not all collagen proteins created become hydroxylated to the same degree using these methods, leading to a less homogeneous product.

Methods

Site-specific incorporation of hydroxyproline
In order to site-specifically incorporate hydroxyproline, we will use methods pioneered by Peter Schultz’s lab at Scripps. These methods provide a means to genetically encode the location of unnatural amino acids using the amber stop codon (TAG). The procedure will involve three main steps:
1. Generating an orthogonal prolyl tRNA/prolyl-tRNA synthetase pair (tRNA[pro]/aaSyn).
2. Engineering the synthetase to acylate hydroxyproline (Wang, 2006).
3. Optimization.
A successful starting point for generating tRNA/aaSyn pairs has been to look to archaeal sources. Many archaeal tRNAs are not substrates for eubacterial aminoacyl-tRNA synthetases, and archaeal tRNAs and synthetases express efficiently in E. coli. Moreover, there is an ever-increasing amount of archaeal sequence and structural data available (Santoro, 2003). Orthogonal pairs for leucine, glutamate, tyrosine, and lysine have already been created using archaeal sources (Wang, 2006). We will attempt to use the tRNA[pro]/aaSyn pair from Methanocaldococcus jannaschii because it is well studied and has available structural data (PDB: 1NJ8). The pair is very likely to be orthogonal because the prolyl aaSyn from M. jannaschii has close homology to eukaryotic prolyl aaSyns and is actually in a different class of aaSyns than E. coli’s. Also, several nucleotides important in E. coli tRNA[pro] recognition are different in tRNA[pro] from M. jannaschii (Burke, 2001). We will begin by changing the anticodon of M. jannaschii tRNA[pro] to CUA, which pairs with the amber stop codon, and expressing the resulting tRNA (tRNA[proA]) in E. coli.
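To make the encoding step concrete, the following is a minimal sketch of how Y-position proline codons in a Gly-X-Y coding sequence could be recoded to the amber codon. The function names and the example sequence are hypothetical and are not part of the proposal; the sketch assumes the coding sequence is in frame and that the Gly-X-Y register starts at the first codon. It also shows why the suppressor anticodon is CUA (written as CTA in DNA letters): it is the reverse complement of TAG.

```python
# Illustrative sketch only: names and the example sequence are hypothetical.
PROLINE_CODONS = {"CCT", "CCC", "CCA", "CCG"}
GLYCINE_CODONS = {"GGT", "GGC", "GGA", "GGG"}
AMBER = "TAG"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def recode_y_prolines(cds: str) -> str:
    """Replace proline codons in the Y position of Gly-X-Y triplets with TAG.

    Assumes the sequence is in frame and the Gly-X-Y register begins at
    the first codon. Triplets that do not start with a glycine codon are
    left unchanged.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Step through the sequence one Gly-X-Y triplet (three codons) at a time.
    for i in range(0, len(codons) - 2, 3):
        gly, _x, y = codons[i:i + 3]
        if gly in GLYCINE_CODONS and y in PROLINE_CODONS:
            codons[i + 2] = AMBER
    return "".join(codons)


if __name__ == "__main__":
    # Two Gly-Pro-Pro repeats; only the Y-position proline is recoded.
    print(recode_y_prolines("GGTCCGCCGGGCCCACCT"))
    # Anticodon that pairs with the amber codon, in DNA letters.
    print(reverse_complement(AMBER))
```

Leaving the X-position proline codons untouched is the point of the approach: only the positions recoded to TAG would receive hydroxyproline, provided the suppressor tRNA is charged exclusively with it.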
Previous orthogonal pairs have relied on the aaSyn being unaffected by changes in the anticodon region. However, in the case of prolyl aaSyn, the anticodon region of its cognate tRNA is known to be important for recognition (Burke, 2001). Therefore, we will use directed evolution on what is known to be the enzyme’s anticodon recognition region to alter its specificity so that it recognizes the amber-suppressing anticodon. A library of aaSyns will be created and assayed for activity by coexpression with a fluorescent marker bearing amber stop codons. The library members that best suppress the amber stop codon can be isolated using FACS (Fig. 1). At this point we can proceed using the same methods as the Schultz group. Briefly, this involves negative selection on a library of tRNA[proA], followed by a round of positive selection to yield a tRNA[proA] that is completely orthogonal to the rest of E. coli’s tRNAs (Wang, 2006). The next step is altering the substrate specificity of the prolyl aaSyn. This involves creating a library of aaSyn variants. Here it is helpful to have the crystal structure of the aaSyn, because this allows directed evolution to be targeted to the amino acid binding region. This library is put through a round of positive selection followed by negative selection, ultimately yielding an aaSyn that can charge its cognate tRNA with hydroxyproline (Fig. 2, 3) (Wang, 2006). The final challenge for site-specific incorporation is optimization. For the most part, studies using unnatural amino acids have involved insertion at one site in a specific product. However, recombinant collagen will require several hydroxyprolines to be inserted per chain. Inefficiency will result in poor yield and truncation products. Recently Ryu et al. have shown that optimizing their system could lead to incorporation of an unnatural amino acid at levels approaching that of natural amino acids (Ryu, 2006).
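The concern about multiple insertion sites can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that every amber codon is suppressed independently with the same efficiency and that a failed suppression always terminates the chain; neither assumption comes from the proposal, and the function names are hypothetical. Under that model the full-length fraction falls geometrically with the number of sites.

```python
# Illustrative model only: assumes independent, identical per-site suppression.
def full_length_fraction(efficiency: float, n_sites: int) -> float:
    """Fraction of chains that read through all n_sites amber codons."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    return efficiency ** n_sites


def truncation_distribution(efficiency: float, n_sites: int) -> list[float]:
    """Fraction of chains terminating at each amber codon, in order.

    Entry k is the probability of suppressing the first k sites and then
    terminating at site k + 1.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return [efficiency ** k * (1.0 - efficiency) for k in range(n_sites)]


if __name__ == "__main__":
    for eff in (0.5, 0.9, 0.99):
        print(eff, round(full_length_fraction(eff, 10), 4))
```

For example, with a hypothetical 90% per-site efficiency and ten sites, only about 35% of chains would be full length under this model, which is why per-site efficiency close to that of natural amino acids matters for a protein with many hydroxyprolines.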
We will no doubt have to perform similar work to ensure incorporation of hydroxyproline with high efficiency and fidelity.

Generation of hydroxyproline
An effective system for site-directed incorporation of hydroxyproline will require sufficiently high concentrations of hydroxyproline, a situation uncommon in natural systems. Traditionally, unnatural amino acids are supplemented to the medium; however, insufficient uptake leads to truncation products (Liu, 2006). It has already been shown that hydroxyproline is not efficiently transported into the cytosol (Buechter, 2003). Previously the Schultz group added a pathway for the production of the non-canonical amino acid p-aminophenylalanine to E. coli (Mehl, 2003). We are proposing a similar strategy for generating hydroxyproline in vivo using a non-mammalian P4H, thereby alleviating the issue of poor uptake. The majority of proline 4-hydroxylases act on peptidyl proline exclusively and in a sequence-specific manner. However, several studies have found P4H’s which also hydroxylate free L-proline (Petersen 2003, Lawrence 1996, Bontoux 2006). In fact, free hydroxyproline is used as a precursor for a variety of secondary metabolites, such as in etamycin synthesis in Streptomyces griseoviridus P8648 (Lawrence 1996). The P4H to be used in this study is derived from a sequence cloned from Dactylosporangium sp. Previous work has shown that, when cloned into E. coli, this P4H exhibits a 1600-fold increase in activity relative to its native host and environment (Shibasaki 2000). Additionally, several non-protein factors are required by P4H. P4H is a 2-oxoacid- and ferrous-iron-dependent dioxygenase (Lawrence 1996). Therefore, to promote efficient hydroxylation, 2-oxoglutarate and Fe2+ will be provided in the culture media for cellular uptake. Ascorbate will also be supplemented in the culture media. The P4H gene will be introduced on a high copy number plasmid with a pMB1 origin of replication. Shibasaki et al.
showed that the hydroxyproline output could be tuned in several ways. These include addition of L-proline, feedback-resistant mutations for increased proline biosynthesis, and mutations in proline degradation enzymes (Shibasaki 2000). Ultimately, the relative concentrations of proline and hydroxyproline will need to be controlled and optimized for efficient collagen synthesis.

Gene regulation
Typically, to express a protein, the cells are grown to a certain density and then induced to start producing the product. However, our product relies heavily on the availability of hydroxyproline. Therefore, to maximize the efficiency of collagen production, we require our pathway to be regulated such that collagen is expressed only when there are sufficient concentrations of hydroxyproline in the host. Otherwise translation of collagen will halt at the amber stop codons due to a lack of hydroxyproline-charged tRNA. This implies that we need to engineer a genetic control element that is highly sensitive to free hydroxyproline. We propose to construct a custom riboswitch that selectively binds hydroxyproline and induces genes under its control when this binding occurs. Riboswitches can be found in the 5’ untranslated region of the mRNA under regulation, just before the ribosome binding site (RBS). Most riboswitches consist of two structural domains: an aptamer and an expression platform (Tucker, 2005). The aptamer domain is highly folded and binds specifically to a target molecule. Upon binding, the RNA undergoes structural changes and the expression platform either exposes the RBS or hides it, thus facilitating or inhibiting translation. There are many classes of riboswitches, characterized by differences in the aptamer and expression platforms. One riboswitch discovered in the Breaker lab at Yale, gcvT, is particularly well suited for our purposes. The gcvT motif is found in many bacterial species, including B. subtilis and V.
cholerae, and resides upstream of genes that participate in the glycine cleavage pathway. The gcvT riboswitch is unusual because it uses ligand binding to activate gene expression, whereas most other riboswitches repress gene activity. The ability to activate the gene in the presence of ligand is exactly the feature we desire in our system. Also, gcvT selectively binds a very small molecule, glycine, which is composed of only 10 atoms. Our target molecule, hydroxyproline, is also a small molecule; therefore gcvT can serve as a very good starting template for our engineered riboswitch. Furthermore, unlike other riboswitches, gcvT has two aptamer domains, type I and type II, with a highly conserved linker sequence in between. Experiments have shown that these two domains cooperatively bind glycine with a Hill coefficient of between 1.4 and 1.6 (Mandel, 2004). This gives us a new mechanism to tune the sensitivity of this riboswitch. With only one aptamer domain, gcvT went from 10% to 90% ligand bound over a more than 100-fold increase in glycine concentration. With two aptamer domains, however, the same change in ligand binding occurs with only a 10-fold increase in glycine concentration. We propose to use the gcvT riboswitch as the starting template for our hydroxyproline gene switch. The template will be computationally redesigned to bind hydroxyproline instead of glycine. RNA structure prediction and folding are well studied, and previous work has demonstrated the feasibility of computational design of ribozymes with desired function (Penchovsky, 2005). To further enhance the specificity and binding affinity of the hydroxyproline aptamers, we will use directed evolution via SELEX (systematic evolution of ligands by exponential enrichment) (Ellington 1990) on the most promising candidates as determined from the computational step.
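The link between the Hill coefficient and the sharpness of the switch described above can be checked with the standard Hill equation. The sketch below is an idealized model, not a fit to the cited measurements, and the function names are hypothetical. For fractional occupancy θ = L^n / (K^n + L^n), the ligand concentration must rise by a factor of 81^(1/n) to move from 10% to 90% bound, so a larger Hill coefficient n gives a narrower response window.

```python
# Idealized Hill-equation sketch; not a fit to the experimental data cited.
def occupancy(ligand: float, k_half: float, hill_n: float) -> float:
    """Fraction of switch bound at a given ligand concentration."""
    return ligand ** hill_n / (k_half ** hill_n + ligand ** hill_n)


def fold_change_10_to_90(hill_n: float) -> float:
    """Fold increase in ligand needed to go from 10% to 90% bound.

    Solving the Hill equation for 10% and 90% occupancy gives
    L90 / L10 = (9 / (1/9)) ** (1 / n) = 81 ** (1 / n).
    """
    if hill_n <= 0:
        raise ValueError("Hill coefficient must be positive")
    return 81.0 ** (1.0 / hill_n)


if __name__ == "__main__":
    for n in (1.0, 1.4, 1.6, 2.0):
        print(n, round(fold_change_10_to_90(n), 1))
```

In this idealized model a non-cooperative site (n = 1) needs an 81-fold change, while n between 1.4 and 1.6 needs roughly a 16- to 23-fold change. These values agree in trend and order of magnitude with the approximate 100-fold and 10-fold figures quoted above, which are reported experimental values and need not follow the ideal curve exactly.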
We will test the function of these hydroxyproline riboswitches in vivo by inserting them at the 5’ end of GFP mRNA and inducing GFP translation by adding hydroxyproline to the medium. The hydroxyproline-responsive riboswitch will ultimately be incorporated upstream of the RBS of our engineered collagen genes. The flexibility of our riboswitches allows us to fine-tune the activity of different genes by using slightly different riboswitches with different hydroxyproline sensitivities. Our goal is to design the control system such that we can create and maintain steady-state production of collagen. This will likely involve extensive testing to determine the required concentration of hydroxyproline and methods to alter the degree of induction.

Collagen Secretion/Purification
Ideally we will be able to secrete our product in order to ease purification. It may be possible to achieve this by fusion with a signal sequence such as pelB, which has been shown to facilitate the secretion of the attached protein (Sletta 2007, Xuyang 1995). Xuyang et al. demonstrated the ability of the pelB sequence to direct peptides to the extracellular medium through the secA pathway, achieving a total protein production of 2.2 g/L, half of which was in the soluble form. A second, comparative study by Sletta et al. replaced the pelB sequence with another naturally occurring secretion signal, ompA. They observed that for certain proteins, when coupling to pelB did not result in significant transport, ompA did. Therefore, in our study, both secretory signals will be tested, as it is difficult to predict a priori which, if either, will be most effective in driving the secretion of collagen. Protein secretion in E. coli also involves the help of chaperone proteins. Because the level of secretion required in the proposed system is much greater than under normal conditions, additional chaperones will be necessary.
To account for this, sequences encoding additional SecB, DnaK, and DnaJ will also be included (Baneyx 1999) on a ColE1-based plasmid with a copy number that can be regulated depending on the initial secretion results.

Collection and purification of the collagen will begin with the isolation of the soluble collagen from the culture media. E. coli does not secrete natural proteins in high volumes, and therefore the soluble collagen should have few proteinaceous contaminants. The soluble collagen will be collected through gentle centrifugation. Next, the cells enclosing the remaining collagen will be lysed and the cell fragments will be removed by centrifugation. The ionic strength of the collagen-containing solution will then be increased, causing the collagen to precipitate. The precipitate will then be taken up into an acidic solution in which the collagen is soluble. These two steps will first remove proteins that do not precipitate at high ionic strength and then those that do not re-solubilize in acidic conditions. Finally, both the soluble collagen collected early on and the isolated cell-based collagen will be purified through ion exchange chromatography.

Analysis and System Characterization
Several aspects of the system require analysis in order to maximize overall collagen synthesis. First, the P4H concentration and enzymatic activity will have to be measured. The P4H levels in the cells can be determined by fusing GFP to the P4H coding sequence, so that the amount of P4H can be detected by fluorescence. The activity of the enzyme will be determined by the conversion of L-proline to trans-4-hydroxy-L-proline as measured by HPLC. The measurements will be made for a range of cofactor and cosubstrate concentrations (Fe2+, 2-oxoglutarate, ascorbate). Next, because this system’s claimed benefit is the production of collagen with low polydispersity and highly uniform placement of hydroxyproline, the structure of the collagen must be carefully analyzed.
SDS-PAGE will be used to ensure single product formation. GPC will be used to determine the polydispersity and molecular weight of the collagen. Individual strands can be sequenced by Edman degradation to confirm the incorporation of the hydroxyprolines. Overall yields, including the ratio of secreted to non-secreted collagen, will also be determined based on the purified masses. Finally, TEM images can be used to determine the morphology of the collagen fibrils.

Issues and Troubleshooting
There are several aspects of the proposed system which may yield insufficient results and may require alterations. Generating orthogonal tRNA[pro]/aaSyn pairs can lead to dead ends (Wang, 2006). In that case, we can attempt to use non-archaeal sources such as yeast, or attempt to use previously generated orthogonal pairs by engineering their aaSyn substrate specificities for hydroxyproline. In terms of purification, it is unlikely that a large majority of the collagen will be secreted, especially if it is of high molecular weight. Although several studies have reported successful secretion mechanisms, there is no universal pathway, and most of the reported successes were specific to protein type. We hope that one of the previously studied signal sequences will facilitate the secretion of a large portion of the collagen; however, if this is not observed, it is possible to relocate our system into a different organism such as yeast. Secretion of recombinant proteins from yeast has been more successful than from E. coli. In fact, Baez et al. have reported secretion of collagen from yeast in amounts ranging from 3-14 g/L. Therefore, yeast cells offer a realistic alternative for overcoming low secretion in E. coli. Ultimately, the most daunting task will be fine-tuning the flux through each pathway to ensure full-length collagen production.
For example, designing the concentration dependence of the hydroxyproline-responsive riboregulator will depend heavily on determining the turnover efficiency of P4H. It will also depend on determining the concentration of hydroxyproline necessary for efficient amber suppression. This interdependence means that the project will require a great deal of iterative alteration and optimization.

Timeframe

We believe that our initial goal of site-specific introduction of hydroxyproline into collagen will take two years. However, we believe it will take at least two more years to further optimize the system and possibly make it commercially viable. Figure 4 displays a proposed timeline. Given a large enough team, generation of the orthogonal pair, design of the hydroxyproline-responsive riboregulator, characterization of P4H activity, and secretion testing could be done in parallel in about 1.5 years. We believe another half year would be necessary to combine the systems for low-level, low-efficiency production of collagen. The following two years would be spent optimizing the system, mainly for increased yield. Finally, an additional year would be necessary for scaling up the production process.

Ethical/Social Impacts

We do not foresee any significant ethical issues with our project besides the usual concerns about genetic engineering and synthetic biology in general. The number one concern that we must address is that of safety. The host organism we propose to use, E. coli, is well studied, widely used, and poses minimal risk. However, use of our engineered collagen in humans for medical purposes will require the necessary FDA approvals. In fact, our bacteria-produced collagen is probably safer than the current animal-derived alternatives, because extracting and using bovine collagen poses a risk of transmitting prion diseases such as bovine spongiform encephalopathy (BSE). Using bacteria to produce quantities of human collagen has ethical advantages as well.
Our product would largely replace animal-derived collagen, the use of which is currently opposed by animal rights organizations such as PETA. The ability to cheaply produce medical-grade collagen will have a significant impact on society. It is predicted that the demand for collagen-based biomaterials will continue to rise, mainly due to the aging of the baby boomer generation. Frost & Sullivan reported that in 2001 the total US market for collagen-based biomaterials generated over $70 million in revenue. This was projected to grow beyond $91.8 million in 2008 (Murrieta, 2002). Our technology for producing cheap, high-quality collagen should capture a significant share of this growing market.

Conclusion

In summary, we believe that successful completion of our proposal will lead to facile, high-yield production of monodisperse collagens and gelatins containing site-specifically incorporated hydroxyproline. This will allow scientists to study the structure and assembly of collagen and the importance of hydroxyproline at a previously unachievable level. Additionally, this may provide a new source of animal-free collagens and gelatins.

References

Ala-Kokko L, Hyland J, Smith C, Kivirikko KI, Jimenez SA, Prockop DJ. Expression of a human cartilage procollagen gene (COL2A1) in mouse 3T3 cells. J Biol Chem. (1991) 266: 14175–14178.
Annunen P, Koivunen P, Kivirikko KI. Cloning of the α subunit of prolyl 4-hydroxylase from Drosophila and expression and characterization of the corresponding enzyme tetramer with some unique properties. J. Biol. Chem. (1999) 274: 6790–6796.
Baez et al. Recombinant microbial systems for the production of human collagen and gelatin. Applied Microbiology and Biotechnology 69, 245 (2005).
Baneyx, Francois. Current Opinion in Biotechnology (1999) Vol. 10, Issue 5, 411-421.
Bontoux, M.-C. et al. Tetrahedron Letters 47 (2006) 9073-9076.
Buechter, D. D. et al.
Co-translational Incorporation of Trans-4-Hydroxyproline into Recombinant Proteins in Bacteria. J. Biol. Chem. 278, 645-650 (2003).
Burke et al. Divergent adaptation of tRNA recognition by Methanococcus jannaschii prolyl-tRNA synthetase. The Journal of Biological Chemistry 276, 20286 (2001).
Cappello J, Crissman J, Dorman M, Mikolajczak M, Textor G, Marquet M, Ferrari F. Genetic engineering of structural protein polymers. Biotechnol Prog. (1990) 6: 198–202.
Ellington, A. D. & Szostak, J. W. Nature (1990) 346, 818-822.
Goldberg I, Salerno AJ, Patterson T, Williams JI. Cloning and expression of a collagen-analog-encoding synthetic gene in Escherichia coli. Gene. (1989) 80: 305–314.
Hori H, Hattori S, Inouye S, Kimura A, Irie S, Miyazawa H, Sakaguchi M. Analysis of the major epitope of the alpha2 chain of bovine type I collagen in children with bovine gelatin allergy. J Allergy Clin Immunol. (2002) 110: 652–657.
Ivan M, Kondo K, Yang H. HIFα targeted for VHL-mediated destruction by proline hydroxylation: implications for O2 sensing. Science. (2001) 292: 464–468.
Jaakkola P, Mole DR, Tian YM. Targeting of HIFα to the von Hippel–Lindau ubiquitylation complex by O2-regulated prolyl hydroxylation. Science. (2001) 292: 468–472.
Kadler K. Extracellular matrix 1: fibril-forming collagens. Protein Profile (1995) 2: 491–619.
Kivirikko KI, Myllyharju J. Prolyl 4-hydroxylases and their protein disulfide isomerase subunit. Matrix Biol. (1998) 16: 357–368.
Lamberg, A. et al. Characterization of Human Type III Collagen Expressed in a Baculovirus System. J. Biol. Chem. 271, 11988-11995 (1996).
Lawrence, Christopher et al. Biochem J. (1996) 313, 185-193.
Liu. Recombinant expression of selectively sulfated proteins in Escherichia coli. Nature Biotechnology 24, 1436 (2006).
Mandel, M. et al. Science 306, 275-279.
Mehl, R. A. et al. Generation of a Bacterium with a 21 Amino Acid Genetic Code. J. Am. Chem. Soc. 125, 935-939 (2003).
Murrieta, T.
Baby boomers drive demand for collagen-based biomaterials. Health & Medicine Week, July 1, 2002, p. 16.
Myllyharju J, Kivirikko K. Collagens, modifying enzymes and their mutations in humans, flies and worms. Trends in Genetics. (2004) 20: 33-43.
Myllyharju, J. Prolyl 4-hydroxylases, the key enzymes of collagen biosynthesis. Matrix Biol. (2003) 22: 15–24.
Neubauer A, Soini J, Bollok M, Zenker M, Sandqvist J, Myllyharju J, Neubauer P. Fermentation process for tetrameric human collagen prolyl 4-hydroxylase in Escherichia coli: Improvement by gene optimisation of the PDI/β subunit and repeated addition of the inducer anhydrotetracycline. Journal of Biotechnology. (2007) 128: 308–321.
Penchovsky, R. & Breaker, R.R. Computational design and experimental validation of oligonucleotide-sensing allosteric ribozymes. Nature Biotechnology (2005) Vol. 23, No. 11, pp. 1424-1433.
Petersen, L. et al. Appl. Microbiol. Biotechnol. (2003) 62: 263-267.
Ryu. Efficient incorporation of unnatural amino acids into proteins in Escherichia coli. Nature Methods 3, 263 (2006).
Santoro. An archaebacteria-derived glutamyl-tRNA synthetase and tRNA pair for unnatural amino acid mutagenesis of proteins in Escherichia coli. Nucleic Acids Research 31, 6700 (2003).
Sletta, H. et al. Applied and Environmental Microbiology Feb. 2007, Vol. 73, No. 3, pp. 906-912.
Takahashi Y, Takahashi S, Shiga Y, Yoshimi T, Miura T. Hypoxic induction of prolyl 4-hydroxylase α(I) in cultured cells. J. Biol. Chem. (2002) 275: 14139–14146.
Tandon M, Wu M, Begley TP, Myllyharju J, Pirskanen A, Kivirikko K. Substrate specificity of human prolyl-4-hydroxylase. Bioorg Med Chem Lett. (1998) 8: 1139–1144.
Toman, P. D. et al. Production of Recombinant Human Type I Procollagen Trimers Using a Four-gene Expression System in the Yeast Saccharomyces cerevisiae. J. Biol. Chem. 275, 23303-23309 (2000).
Tucker, B.J. & Breaker, R.R. Riboswitches as versatile gene control elements.
Current Opinion in Structural Biology (2005) 15: 342-348.
Vuori K, Pihlajaniemi T, Marttila M, Kivirikko KI. Characterization of the human prolyl 4-hydroxylase tetramer and its multifunctional protein disulfide-isomerase subunit synthesized in a baculovirus expression system. Proc. Natl. Acad. Sci. (1992) 89: 7467–7470.
Wang, L., Xie, J. & Schultz, P. G. Expanding the genetic code. Annu. Rev. Biophys. Biomol. Struct. 35, 225-249 (2006).
Xuyang Li et al. Applied and Environmental Microbiology July 1995, Vol. 61, No. 7, pp. 2670-2680.
Yamauchi M, Shiiba M. Lysine hydroxylation and crosslinking of collagen. Methods Mol Biol (2002) 194: 290.

Figure 1: A library of proline aaSyns will be screened for their ability to suppress the amber codons in a GFP gene in the presence of tRNA[proA]. Winners will be determined by FACS.

Figure 2: A library of tRNA[proA] is subjected to negative selection followed by positive selection to enrich for tRNA[proA] that are completely orthogonal in E. coli (Wang, 2006).

Figure 3: Several rounds of positive and negative selection on a library of aaSyns are performed in order to generate an enzyme specific for the unnatural amino acid. This figure illustrates the process for a tyrosine aaSyn (Wang, 2006).
Figure 4: Predicted timescale for achieving goals throughout the project. Time axis: 6 to 60 months in 6-month increments. Tasks shown:
Hydroxyproline tRNA construction (10 persons)
Testing and optimization of HP insertion efficiency with HP tRNA
Identification of aptamer regulation system (10 persons)
Optimization of aptamer regulation with respect to HP concentration
Testing of aptamer regulation with GFP
Implementation of secretory system (10 persons)
Testing of secretory system with GFP
Testing and optimization of secretory system with collagen
Construction of enzyme plasmid (10 persons)
Testing and optimization of the unregulated P4H efficiency with respect to cofactor, cosubstrate, free proline, etc.
Testing and optimization of regulated enzyme plasmid
Scale up and commercialization