Biological Program for Selected Senior High School Students in Academia Sinica

Cloning of a Novel Fungal Cellulase Candidate Gene

Authors: Hsuan-An Chen (96019) and Hung-Wei Liu (96104)
Taipei Municipal Jianguo High School, 2010

Primary Investigator: Dr. Wen-Hsiung Li
Biodiversity Research Center & Genomics Research Center, Academia Sinica

February, 2010

FINAL REPORT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE BIOLOGICAL PROGRAMS FOR SELECTED SENIOR HIGH SCHOOL STUDENTS IN ACADEMIA SINICA

Abstract

Because fossil fuel resources are dwindling, scientists are seeking alternative energy sources, and biofuel is one of the options. In this project we attempted to clone a fungal cellulase candidate gene, W5-CAT13, which was predicted from the whole-genome sequence of W5, a new fungal species endemic to Taiwan. The goal of the multi-component biofuel project of the Li Lab and its consortium is to engineer microorganisms to produce cellulosic biofuel for energy purposes. Cellulases are the key machinery in this process: they digest agricultural wastes, transforming cellulose into glucose, which then feeds fermentative microorganisms that produce alcohols. We used several methods to amplify, clone, screen and select W5-CAT13 clones, including polymerase chain reaction (PCR), TA cloning, blue/white selection, colony PCR, DNA sequencing and sequence analysis. The purpose of these procedures was to ensure that the clone to be expressed and assayed carried the correct sequence. We found, however, that the gene fragments in our experiment were contaminated, possibly by materials commonly used in our lab.
Introduction

To ease dependency on fossil fuels and to take advantage of the solar energy stored in plants, especially in agricultural wastes (the inedible parts that are usually thrown away), a major research project in the Wen-Hsiung Li Lab is the development of cellulosic biofuels. The goal is to transform previously discarded agricultural wastes, such as rice straw or bagasse, into biofuels, including bio-ethanol and higher alcohols, via microbial cellulose hydrolysis and fermentation. This approach will also help to reduce pollution and the amount of carbon dioxide released into the atmosphere, since incineration of these agricultural wastes is avoided. Figure 1 (US Department of Energy, 2009) illustrates the carbon and energy cycle in the environment with an emphasis on biofuel.

The basics of cellulase for bio-fuels

Cellulases are roughly categorized into three classes (Figure 2) (Wikipedia, 2009): (1) endocellulase, which breaks internal bonds to reduce the three-dimensional structure of cellulose to individual chains; (2) exocellulase, which cuts those long chains into short sugars, mainly the disaccharide cellobiose; and (3) cellobiase (beta-glucosidase), which turns cellobiose or cellotetraose into glucose. The idea is to engineer these three cellulases for efficient hydrolysis of the abundant cellulose in rice straw, bagasse, etc. into glucose, which can then serve as the carbon source for naturally fermentative microorganisms, e.g., yeast, to produce bio-alcohols for energy purposes. The issue is that no cellulases suitable for this application are presently available. Because currently available cellulases are not efficient enough, industry-scale production requires substantial pretreatment, cost and energy, which renders cellulosic biofuel impractical with current technology.

Figure 1. Carbon dioxide and biofuels in the energy cycle. (US Department of Energy, 2009)

Figure 2. Three major classes of cellulase and their substrates.
(Wikipedia, 2009)

The quest for “ideal” cellulases

In order to develop new cellulases applicable to large-scale energy production, the approach taken by the Li Lab and its consortium was first to identify new cellulase candidates from naturally competent organisms. Through collaboration with Prof. Yo-Chia Chen at National Pingtung University of Science and Technology, we obtained W5, a new fungal species endemic to Taiwan that was isolated from the rumen of Taiwan yellow cattle. Whole-genome de novo sequencing was performed on W5, and cellulase gene candidates in the W5 genome were predicted based on available cellulase gene sequences and enzyme structures. Primers against these candidates were designed for PCR amplification. Each candidate was then amplified from W5 total cDNA, TA-cloned for sequence verification, and cloned into an expression vector for functional characterization and enzyme assay in the yeast system (Figure 3).

Among all the predicted genes, the one we worked on was W5-CAT13, a candidate beta-glucosidase gene. We used CAT13 gene-specific primers to PCR-amplify the gene fragments and initially TA-cloned the PCR products into the commercially available pGEM®-T Easy Vector to facilitate sequence verification. A clone with the correct sequence will subsequently be cloned into a yeast expression vector using the method of Shih et al. (2002) for future enzyme studies in the lab (Figure 4).

Materials and Methods

This section contains detailed information on the experimental procedures performed during our training session in the laboratory for the purpose of cloning the W5 fungal cellulase candidate gene.

Figure 3. The flow chart illustrating the overall procedure for the development of novel cellulases for biofuel production.
Figure 4. The flow chart highlighting the steps implemented in this research for molecular cloning of a novel cellulase candidate, W5-CAT13. Steps shown in the chart: (1) gene-specific PCR primer design based on the results of whole-genome sequencing and gene prediction; (2) PCR amplification of the gene of the selected candidate cellulase; (3) (gel) purification of the PCR products; (4) cloning of the purified gene fragments into an appropriate vector for sequencing verification; (5) expression of the candidate cellulase in a selected host for activity assay and biochemical analysis.

Gene-specific primers, polymerase chain reaction and DNA sample preparation

Polymerase chain reaction (PCR) was first carried out using two pairs of primers (CAT13 1-1 forward and reverse, and CAT13 1-2 forward and reverse) specific to the candidate beta-glucosidase gene, W5-CAT13. The gene-specific regions were ATG AAG ACT CTT ACT TTA TTT AC for the forward primers and GTT TTG TTC AAC ATT TTC AAG G for the reverse primers. Both pairs share the same gene-specific regions; they differ in that the 1-1 forward primer carries the Watson strand of the EcoRI restriction site on its 5’-end, and the 1-2 reverse primer carries the Crick strand of the XhoI restriction site on its 5’-end (Shih et al., 2002). Additionally, the 1-1 reverse primer carries a guanosine (G) on its 5’-end and the 1-2 forward primer carries a cytidine (C) on its 5’-end, to complement the EcoRI and XhoI restriction sites. These restriction sites were included to support the molecular cloning strategy designed and described by Vice President Andrew H.-J. Wang’s Lab (Shih et al., 2002).

The previously synthesized W5 cDNA library, prepared from W5 mRNA, served as the template DNA in this experiment. No rigorous annealing-temperature profiling was performed; after a few initial attempts, the annealing temperature for both the 1-1 and 1-2 pairs of CAT13 primers was set at 35°C.
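The gap between the 35°C annealing temperature and the primers' melting temperatures can be made concrete with a rough estimate. The sketch below is an illustration added for the reader, not the procedure used in the lab: it applies the Wallace rule of thumb (2°C per A/T, 4°C per G/C), which is only approximate for primers of this length, to the two gene-specific regions quoted above. The function names are hypothetical, and the restriction-site tails are left out because the report does not give their exact sequences.

```python
# Gene-specific regions quoted in the report (spaces removed).
FORWARD_REGION = "ATGAAGACTCTTACTTTATTTAC"
REVERSE_REGION = "GTTTTGTTCAACATTTTCAAGG"

ANNEALING_TEMP_C = 35  # annealing temperature stated in the report

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C).

    This is an assumption made for illustration; the report does not say
    how the primers' melting temperatures were predicted.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


for name, region in (("forward", FORWARD_REGION), ("reverse", REVERSE_REGION)):
    tm = wallace_tm(region)
    print(f"{name}: {len(region)} nt, GC {gc_fraction(region):.0%}, "
          f"Wallace Tm ~{tm} C, {tm - ANNEALING_TEMP_C} C above annealing")

# The reverse primer anneals to the coding strand, so its reverse complement
# is how the 3' end of the gene reads on that strand.
print(reverse_complement(REVERSE_REGION))
```

Under this rule both gene-specific regions come out at about 58°C, so a 35°C annealing step sits more than 20°C below the estimate. A nearest-neighbor calculation would give different absolute values, but the direction of the gap is unlikely to change.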
PCR products were resolved by electrophoresis on 0.8-1% agarose-TAE gels and visualized under UV light after staining with either ethidium bromide or SYBR® Safe (Invitrogen, USA). DNA fragments that appeared at the expected size (2000 bp) relative to the DNA markers were considered positive PCR results. Prior to TA cloning or sequencing, the PCR products were purified using the commercially available QIAquick spin method (QIAGEN, USA). When a considerable amount of non-specific PCR product was observed, gel extraction of the 3-kb DNA fragments was performed, also using the QIAquick spin method. Spectrophotometry was used to determine the OD260, OD280 and OD230 values of the purified samples in order to estimate the quantity and quality of the products.

TA cloning and clone screening

Purified PCR products were cloned into the pGEM®-T Easy Vector via a commercially available TA cloning method (Promega, USA) and transformed into HIT-DH5-alpha E. coli competent cells (Real Biotech Corp., USA) for subsequent sequencing analysis. Positive transformants carrying the pGEM-T Easy Vector are ampicillin resistant, and clones with an inserted DNA fragment should grow into white colonies on solid media containing X-gal and IPTG. Colony PCR was performed on the white colonies to screen for clones containing the W5-CAT13 insert. Plasmid miniprep (QIAGEN, USA) was performed on the clones to harvest the vectors potentially carrying the insert, and spectrophotometry was used to estimate the quantity and quality of the plasmid samples. To further confirm the insert, the plasmids were digested at 37°C for four hours with NdeI (4 units/ug DNA) (NEB, USA) and SacII (4 units/ug DNA) (NEB, USA), then heat-inactivated at 65°C for 20 minutes. 6 uL of each digested product was loaded and resolved by electrophoresis on an agarose-TAE gel. Each clone was also submitted to a local vendor for DNA sequencing.
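As a rough illustration of how the spectrophotometric readings mentioned above translate into quantity and quality estimates, the following sketch uses the standard conversion of 50 ng/uL per A260 unit for double-stranded DNA (1 cm path length) and the two common purity ratios. The absorbance readings, the 20-fold dilution, the 50 ng of vector and the 3:1 insert:vector molar ratio are all made-up values for illustration; only the 2-kb insert size, the 3015-bp vector size and the 30 uL volume come from this report. The function names are hypothetical.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # standard factor for double-stranded DNA


def dsdna_concentration(a260, dilution_factor=1.0):
    """Estimated dsDNA concentration in ng/uL from an A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratios(a260, a280, a230):
    """Return (A260/A280, A260/A230); roughly 1.8 and 2.0 or above
    are usually taken as signs of clean DNA."""
    return a260 / a280, a260 / a230


def insert_mass_ng(vector_ng, insert_kb, vector_kb, molar_ratio):
    """Mass of insert (ng) giving the requested insert:vector molar ratio."""
    return vector_ng * insert_kb / vector_kb * molar_ratio


# Hypothetical readings for a purified PCR product measured at 1:20 dilution.
a260, a280, a230 = 0.30, 0.16, 0.14
conc = dsdna_concentration(a260, dilution_factor=20)
r280, r230 = purity_ratios(a260, a280, a230)
total_ug = conc * 30 / 1000  # 30 uL elution volume, as in the report

# Hypothetical ligation: 50 ng of vector at a 3:1 insert:vector ratio.
needed = insert_mass_ng(vector_ng=50, insert_kb=2.0, vector_kb=3.015,
                        molar_ratio=3)

print(f"concentration ~{conc:.0f} ng/uL, total ~{total_ug:.1f} ug")
print(f"A260/A280 = {r280:.2f}, A260/A230 = {r230:.2f}")
print(f"insert needed for ligation ~{needed:.0f} ng")
```

Note that A260 measures all nucleic acid in the sample, so primer dimers and other short products count toward the apparent concentration; the mass of full-length product actually available for ligation can be much lower than the reading suggests.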
DNA sequence analysis

Each candidate pGEM®-T Easy Vector clone with a W5-CAT13 insert was sequenced with the SP6 Promoter Primer, the T7 Promoter Primer, the gene-specific (1-1 or 1-2) forward and reverse primers, and a forward primer internal to the W5-CAT13 gene (CAT13_601f). All sequencing results were first visually examined on their chromatograms for overall quality control. The sequence reads were aligned, using the ClustalW multiple sequence alignment algorithm (Larkin et al., 2007), against the reference W5-CAT13 sequence previously obtained from W5 whole-genome sequencing. Regions of the reads that did not align well with the reference sequence (or the vector backbone) were identified by a BLAST (Altschul et al., 1990) search of the NCBI DNA sequence database.

Results and Discussion

This study is part of the biofuel research project carried out in the Wen-Hsiung Li Lab, the aim of which is to identify and characterize novel cellulases from W5, a fungal species endemic to Taiwan. The ultimate goal is to apply these newly found enzymes in cellulosic biofuel production, via the hydrolysis of agricultural wastes such as rice straw, followed by fermentation using yeast as the host organism. We attempted to clone a predicted beta-glucosidase gene, W5-CAT13, using the approach published by Vice President Andrew H.-J. Wang and colleagues (Shih et al., 2002). We were able to perform the procedures from PCR amplification of the gene and TA cloning through DNA sequencing for clone verification. In this section, we detail the major results of our experiments and discuss what we observed.

PCR and nucleic acid purification

Although PCR temperature profiles were not rigorously tested, an annealing temperature of 35°C was applied for both the 1-1 and 1-2 primer pairs after several attempts. Both 1-1 and 1-2 PCR products of about 2 kb were observed (Figure 5).
The initial quantity, nevertheless, was low, as only a faint band appeared on the agarose gel. We thus performed several sub-PCRs, using products from previous reactions as the template, to obtain a sufficient amount of DNA for subsequent cloning. Since the annealing temperature was significantly lower than the primers’ predicted melting temperatures, non-specific amplification was likely to occur. Any non-specific DNA fragments produced would be further enriched in the subsequent sub-PCRs. Cloning the PCR-amplified gene fragments into a sequencing vector, followed by sequence verification, is therefore highly recommended prior to molecular cloning for gene expression and enzyme assays. The PCR products for TA cloning were purified using a commercially available method (QIAGEN, USA). The concentrations of most of the products, determined by spectrophotometry, were about 300 ng/uL, usually in a volume of 30 uL.

Figure 5. An ethidium bromide-stained 1% agarose-TAE gel showing the presence of PCR products at the expected size of about 2 kb. Size labels on the gel: 3 kb and 2 kb.

TA cloning and clone screening

Only a few white colonies appeared after the W5-CAT13 1-1 TA cloning and transformation reaction, and even fewer were obtained for sample 1-2. Most of these clones actually turned out blue after patching (Figure 6). Such low cloning and transformation efficiency was likely the result of poor DNA quality. Although the apparent concentration appeared sufficient at ~300 ng/uL, the products might have included non-specific species such as primer dimers and primer extensions, as a bright band or bright smear at lower molecular weight was consistently observed. The concentration of true 1-1 or 1-2 W5-CAT13 DNA molecules could therefore have been too low for TA cloning. Colony PCR with the same W5-CAT13 1-1 or 1-2 primer pairs was used in the initial screening of the TA clones.
Surprisingly, the colony PCR products of all screened clones were about 3 kb in size, rather than the original 2 kb, after several attempts and at least one repeated cloning reaction.

Figure 6. Single-colony patches of W5-CAT13 candidate TA clones. Blue indicates the absence of an insert. White patches are clones potentially carrying an insert.

To further verify the presence of the insert, we obtained the plasmids of each of the TA clones using a commercially available miniprep method (QIAGEN, USA) and subjected these plasmid samples to both DNA sequencing and restriction enzyme digestion. The sequencing results are discussed in the next section. The banding patterns of the digested TA clones are shown in Figure 7. We used NdeI and SacII, which each have a single restriction site, at positions 97 bp and 49 bp respectively, in the multiple cloning site flanking the insert on the pGEM®-T Easy Vector (3015 bp, Figure 8), but which do not cut the W5-CAT13 insert, our candidate gene of interest. We performed one double digestion (NdeI and SacII, lane 1), two single digestions (NdeI or SacII alone, lanes 2 and 3), and a no-enzyme (water) control (lane 4). A 2-log DNA ladder was loaded on the left. As seen on the gel, both single and double digestions resulted in a single major band at ~3 kb, whereas two bands, one at ~2.5 kb (between 2 and 3 kb of the 2-log ladder) and the other at ~5 kb, were observed in the no-enzyme control. The 3-kb fragments in the singly digested samples were likely to be the linearized plasmid; those in the doubly digested reaction should also be the linearized vector backbone, which would co-migrate with the released insert if the insert had been 3 kb.
We were more inclined to conclude that this clone did not contain any insert larger than 0.5 kb, for two reasons. (1) The 3-kb band showed no significant increase in brightness (DNA quantity), as would be expected if the insert were 3 kb. (2) Relative to lanes 2 and 3 (single digestion), no band of any size other than 3 kb appeared in lane 1 (double digestion), where the insert should be cut out of the vector backbone; this indicates the absence of DNA fragments larger than 500 bp.

Figure 7. Restriction digestion profile of one CAT13 pGEM-T Easy clone (1. NdeI & SacII, 2. NdeI, 3. SacII, 4. water). Size labels on the gel: 6 kb, 3 kb and 1 kb.

Figure 8. The map and sequence reference points of the pGEM®-T Easy Vector (3015 bp) used for TA cloning in this experiment (Promega).
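The digestion argument above can be checked against the band sizes one would expect for different insert lengths. The sketch below is a simplified model added for illustration, not part of the original analysis: it treats the map positions of SacII (49) and NdeI (97) as the cut coordinates, places the insert anywhere between them, and assumes, as stated in the report, that neither enzyme cuts inside the insert. The function names are hypothetical.

```python
VECTOR_LEN = 3015  # pGEM-T Easy Vector size from the vector map
SACII_POS = 49     # map position, used here as an approximate cut site
NDEI_POS = 97      # map position, used here as an approximate cut site


def digest_circular(plasmid_len, cut_positions):
    """Fragment lengths after cutting a circular plasmid at the given positions."""
    cuts = sorted(cut_positions)
    if not cuts:
        return [plasmid_len]  # uncut plasmid stays circular
    fragments = [b - a for a, b in zip(cuts, cuts[1:])]
    # The last fragment wraps around the origin of the circle.
    fragments.append(plasmid_len - cuts[-1] + cuts[0])
    return sorted(fragments, reverse=True)


def expected_bands(insert_len):
    """Expected fragment sizes (bp) for each digest of a clone with this insert."""
    plasmid_len = VECTOR_LEN + insert_len
    ndei = NDEI_POS + insert_len  # the insert shifts the downstream NdeI site
    return {
        "NdeI + SacII": digest_circular(plasmid_len, [SACII_POS, ndei]),
        "NdeI": digest_circular(plasmid_len, [ndei]),
        "SacII": digest_circular(plasmid_len, [SACII_POS]),
    }


for insert_len in (0, 2000):
    print(f"insert of {insert_len} bp:")
    for digest, sizes in expected_bands(insert_len).items():
        print(f"  {digest}: {sizes}")
```

Under this model, a clone carrying the 2-kb gene would give a linear band near 5 kb in the single digests and two bands (about 3.0 kb and 2.0 kb) in the double digest. An empty or nearly empty vector gives a band near 3 kb in every digested lane, plus a released fragment of a few dozen base pairs that would probably be too small to see on such a gel.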
DNA sequence analysis

We used five primers to sequence the TA clone of interest. Two of the primers were the vector-based T7 and SP6 Promoter primers flanking the multiple cloning site (Figure 8). Since the W5-CAT13 gene is expected to be 2 kb in size, an internal gene-specific forward primer at gene position 601 bp was also used, in addition to the forward and reverse PCR primers. For sequencing data generated with a reverse primer, the reverse-complementary sequences were used in sequence alignment and analysis. Although the vector-based T7 and SP6 Promoter primers generated quality sequencing results, the gene-specific primers gave poor or no reads. The 5’ part of the sequence reads from the T7 and SP6 Promoter primers aligned well with the pGEM®-T Easy Vector sequence flanking the cloning site, indicating that the vector backbone was intact and the sample was free of contamination. None of the sequence data, however, could be aligned with W5-CAT13, suggesting that this candidate gene had not been successfully inserted.

In order to identify the DNA fragment inserted into the pGEM®-T Easy Vector and sequenced, we did a BLAST search (Altschul et al., 1990) against the entire available NCBI DNA database (nr). Figure 9 is an excerpt showing the top hits from the search result. In particular, our submitted query sequence (190 bp) was 100% identical to 81% of a partial 16S rRNA gene from an uncultured bacterium (accession number FM242723.1) with an Expect value (E) of 5e-40, suggesting that the short insert carried in this particular clone was likely to be the result of a random amplification.
Other top hits include partial 18S rRNA genes from uncultured fungi and some cloning vector backbones. Although most of these BLAST hits were short, so that we were unable to determine the source of this false insert, the BLAST results suggested that the inserted fragment was likely to be from an arbitrary lab contaminant.

Figure 9. An excerpt of the BLAST result, showing top hits with significant alignments.

Summary

In this experiment, we have learned not only the basic molecular biology techniques and skills for cloning and manipulating genes, but also some advanced knowledge of modern biology and bio-fuel. We have also learned to adjust our expectations and goals for our research, as most great scientific findings are built from many small parts, and our project is one such part of the work in the Li Lab. As much as we were eager to finish this experiment by ourselves, we have come to realize that research is a challenging endeavor: it not only takes hard work, but also requires considerable patience and luck. Making every effort to succeed in our experiments, we spent a lot of time working in the lab, going through trial and error, and repeating many steps whose significance did not seem obvious at the time. There were times when our motivation ran low and the idea of giving up crossed our minds. Eventually, however, we overcame the obstacles and the tiredness and found true enthusiasm for doing research. Regardless of the outcome of our experiments, we enjoyed the science through the execution of the research project. This experience may be trivial to our colleagues in the lab, but it is significant to us, who stepped into this territory for the very first time. It has been a profound experience in our senior high school life. Thank you to those who have helped us and taught us the right attitude for doing scientific research.

Acknowledgements

We would like to thank Dr.
Christine Wang for teaching us how to conduct the experiments, offering us invaluable advice and helping us with the editing of this report. We are also grateful to Dr. Tzi-Yuan Wang and other colleagues in the Li Lab for giving us support and advice. We would like to express our sincerest gratitude to Dr. Wen-Hsiung Li for having us in his lab, where we could carry out experiments for our training and research program. It has been a great pleasure to be part of the Biological Programs for Selected Senior High School Students in Academia Sinica. We would like to thank all the faculty members, program officers, and participating institutes and departments for this wonderful training program.

References

Altschul et al. (1990) Basic local alignment search tool. J. Mol. Biol. 215: 403-410.
Larkin et al. (2007) ClustalW and ClustalX version 2. Bioinformatics 23: 2947-2948.
Shih et al. (2002) High-throughput screening of soluble recombinant proteins. Protein Sci. 11: 1714-1719.
US Department of Energy (2009) http://www.jgi.doe.gov/education/bioenergy/bioenergy_1.html.
Wikipedia (2009) http://en.wikipedia.org/wiki/Cellulase.