Comparative Transcriptomics as a Gene Discovery Tool in Solanum pennellii, a Potential Source of Biogasoline
Tom McKnight, Sachi Mandal, Wang Ming Ji, Department of Biology
[Images: photo by TRGC; xkcd.com]

[Figure: fractional distillation of heated crude oil, with biofuels placed by carbon chain length. Fractions decrease in density and boiling point toward the top of the column and increase toward the bottom.]
Crude oil fractions:
• C1 to C4 gases
• C5 to C9 naphtha
• C5 to C10 gasoline
• C10 to C16 kerosene
• C14 to C20 diesel
• C20 to C50 lubricating oil
• C50 to C70 fuel oil
• > C70 residue
Biofuels:
• C1 Methane
• C2 Ethanol
• C4 Biobutanol
• C4 to C10 S. pennellii biogasoline
• C16 & C18 Biodiesel

Solanum pennellii is native to extremely arid regions of Peru

Glucolipids are secreted by trichomes onto the leaf surface
Fobes, J.F., Mudd, J.B. and Marsden, M.P.F. (1985) Plant Physiol. 77, 567-570.

Glucolipids accumulate to over 20% of dry weight of the plant!
[Figure: extractable lipid (mg/g, left axis 50 to 200; % DW, right axis 5% to 20%) versus weeks of growth (5 to 16) for S. pennellii and VF36.]
Fobes, J.F., Mudd, J.B. and Marsden, M.P.F. (1985) Plant Physiol. 77, 567-570.

The S. pennellii glucolipid has three short-chain fatty acids (C4 to C10) esterified to glucose
[Structure: glucose ring with three acyl groups, each CH3(CH2)n-C(=O)-O-.]

Transesterification of triglycerides produces biodiesel
[Reaction scheme: a triglyceride, H2C(-O-C(=O)-(CH2)n-CH3)-HC(-O-C(=O)-(CH2)n-CH3)-H2C(-O-C(=O)-(CH2)n-CH3), reacts with CH3OH in the presence of NaOH to give glycerol and three CH3-O-C(=O)-(CH2)n-CH3 esters.]
Vegetable oil + MeOH -> (NaOH) -> Glycerol + 3 long-chain fatty acid esters

Transesterification of glucolipid produces biogasoline
[Reaction scheme: 2,3,4-tri-O-acylglucose reacts with CH3OH in the presence of NaOH to give glucose and three CH3-O-C(=O)-(CH2)n-CH3 esters.]
Glucolipid + MeOH -> (NaOH) -> Glucose + 3 short-chain fatty acid esters

Predominant fatty acids in acylsugars of S. pennellii accessions

Fatty acid                  LA 0716      LA 1941      LA 1946      LA 1912
                            (n=6)        (n=6)        (n=6)        (n=6)
2-methylpropanoate (C4)     41.8 (0.4)   42.2 (1.6)   41.6 (0.8)   t
2-methylbutanoate (C5)      10.8 (0.2)   9.9 (0.7)    9.0 (0.3)    -
3-methylbutanoate (C5)      4.0 (0.2)    8.5 (2.2)    13.0 (0.5)   t
5-methylhexanoate (C7)      -            t            2.0 (0.2)    -
6-methylheptanoate (C8)     -            -            -            t
7-methyloctanoate (C9)      t            t            t            -
n-octanoate (C8)            -            -            -            -
8-methylnonanoate (C10)     26.3 (0.6)   19.7 (2.3)   9.9 (0.3)    t
n-decanoate (C10)           10.7 (0.4)   10.6 (1.1)   14.8 (0.4)   -
9-methyldecanoate (C11)     t            t            t            t
n-dodecanoate (C12)         4.9 (0.2)    5.7 (0.6)    7.1 (0.6)    t

t = Trace (<2%) measured.
Joseph A. Shapiro et al. (1993) Biochemical Systematics and Ecology 22, 545-561.

Advantages of S. pennellii
• Not a food or feed crop
• Drought tolerant and can grow on marginal land
• Lipid is on the leaf surface and can be extracted in the field with a simple ethanol rinse, rapidly yielding a high-energy, high-value liquid without transporting large amounts of low-value biomass
• Glucolipid can be converted to biogasoline with standard transesterification technology
• Resulting biogasoline should be compatible with existing fuel technology (transportation and engines)

Potential Disadvantages of S. pennellii
• Not perennial (yet)
• Yield per acre is not known (yet)

Biosynthetic Pathway for Glucolipid
• Glucolipid is made from UDP-glucose and short-chain fatty acids.
• Genes encoding the first two enzymes have been cloned and characterized.
• Step 1: Glucosyltransferase
• Step 2: Glucose acyltransferase
• There are only two or three additional steps, making this short pathway a good candidate for moving into other plants.
[Pathway diagram: UDP-glucose + fatty acid -> 1-O-acyl-β-glucose (step cloned) -> 1,2-di-O-acyl-β-glucose (step cloned; glucose is also labeled at this step) -> 2,3,4-tri-O-acyl-β-glucose (steps not cloned or characterized).]

Information Flow
DNA makes RNA makes Protein
DNA is the chemically stable genetic material.
RNA is an unstable messenger that conveys information from DNA to ribosomes.
Ribosomes read the genetic code on RNA to make proteins.
Proteins do the bulk of work in cells.
[Figure label: growing protein chain]

Generation of S. pennellii transcriptome
Total RNA isolated from different Solanum pennellii lines
-> Oligo dT selection: purified mRNA (< 1% of total RNA)
-> Reverse transcription: cDNA library
-> Next-gen DNA sequencing: ~200 million paired-end reads
-> QC and end trimming: trimmed reads (101 or 125 nt long)
-> Assembly: assembled Solanum pennellii transcriptome for further analysis

Short reads (101 nt) mapped to Gene 1 genomic DNA region
[Figure: read coverage across an intergenic region, the promoter (switch) and the protein coding region.]

Identities of Putative Transcripts
• No Match (20,547)
• Matched to GO Term (32,909)
• Matched, No Info (8,313)
CEGMA: 456 of 458 conserved genes represented (99.5%)

Comparative Transcriptomics
Four high and four low glucolipid-producing accessions
High (>20% DW)
• 0716
• 1941
• 1946
• 1302
Low (<5% DW)
• 1911
• 1912
• 1920
• 1926
~200 million reads (125 bp x 2 ends) for 8 accessions

Transcriptomes of different Solanum pennellii accessions after Trinity assembly

Solanum pennellii lines     Number of contigs
0716 Hi Reference           57242
0716 Hi                     70749
1302 Hi                     67214
1941 Hi                     61207
1946 Hi                     60790
1911 Lo                     66962
1912 Lo                     68316
1920 Lo                     62045
1926 Lo                     62908

[Figure: total reads per Solanum pennellii line (y-axis 0 to 30,000,000) for 0716, 1302, 1941, 1946, 1911, 1912, 1920 and 1926.]

Mapping RUBISCO small subunit sequence reads for QC
[Figure: read-mapping tracks for 1911_Low, 1912_Low, 1920_Low, 1926_Low, 1302_High, 1941_High, 1946_High and 0716_High.]

Expression level of control genes in different S. pennellii accessions
[Figure: RPKM (0 to 700) of phosphoglycerate kinase and ubiquitin conjugating enzyme in the high accessions (0716, 1302, 1941, 1946) and the low accessions (1911, 1912, 1920, 1926).]
RPKM = reads per kilobase of gene model per million mapped reads

Gene 1: sequence and expression levels
[Figure: read-mapping tracks for 1911_Low, 1912_Low, 1920_Low, 1926_Low, 1302_High, 1941_High, 1946_High and 0716_High.]

Expression level of Gene1 & 2 in different S. pennellii accessions
[Figure: RPKM (0 to 600) of Gene1 and Gene2 in the high accessions (0716, 1302, 1941, 1946) and the low accessions (1911, 1912, 1920, 1926).]

[Slide: list of protein family annotations]
Alpha/beta hydrolase family
Sugar (and other) transporter
Thaumatin family
Glutathione S-transferase, N-terminal domain
alpha/beta hydrolase fold
Initiation factor 2 subunit family
Cyclophilin type peptidyl-prolyl cis-trans isomerase/CLD
Fatty acid desaturase
hypothetical protein
Plant invertase/pectin methylesterase inhibitor
hypothetical protein
hypothetical protein
Xylanase inhibitor N-terminal
Prephenate dehydratase
hypothetical protein
hypothetical protein
Plant invertase/pectin methylesterase inhibitor
AP2 domain
EamA-like transporter family
Glycosyl hydrolases family 17
Chitinase class I
Chitinase class I
Dirigent-like protein
Uncharacterised protein family (UPF0041)
hypothetical protein
hypothetical protein
Cytochrome P450
Glycosyl hydrolases family 16
Light regulated protein Lir1
Potato inhibitor I family
Pathogenesis-related protein Bet v I family
Major intrinsic protein
Thaumatin family
Subtilase family
Subtilase family
Subtilase family
Protein of unknown function (DUF_B2219)
hypothetical protein
UDP-glucoronosyl and UDP-glucosyl transferase
hypothetical protein
hypothetical protein
No apical meristem (NAM) protein
Peptidase family M20/M25/M40
BURP domain
Cytochrome P450
Patatin-like phospholipase
Cytochrome P450
Serine carboxypeptidase
Transmembrane amino acid transporter protein
Leucine Rich repeats (2 copies)
Leucine Rich repeats (2 copies)
hypothetical protein
F-box associated
Cytochrome P450
Mycolic acid cyclopropane synthetase
Cytochrome P450
Reverse transcriptase (RNA-dependent DNA polymerase)
Rieske (2Fe-2S) domain
Gibberellin regulated protein
Myb-like DNA-binding domain
AP2 domain
Putative lysophospholipase
Pathogenesis-related protein Bet v I family
Glycosyl hydrolases family 32 N-terminal domain
DnaJ domain
Plastocyanin-like domain
hypothetical protein
Yippee zinc-binding/DNA-binding/Mis18, centromere assembly
Leucine rich repeat
non-haem dioxygenase in morphine synthesis N-terminal
Domain associated at C-terminal with AAA
Alpha/beta hydrolase family
K-box region
Chitinase class I
Pectinesterase
Terpene synthase family, metal binding domain
Papain family cysteine protease
hypothetical protein
non-haem dioxygenase in morphine synthesis N-terminal
hypothetical protein
EF hand
Chitinase class I
Gibberellin regulated protein
hypothetical protein
Translationally controlled tumour protein
Glutathione S-transferase, N-terminal domain
Plant mobile domain
Uncharacterised protein family (UPF0113)
Protein of unknown function (DUF1298)
Helix-loop-helix DNA-binding domain
PA domain
Cytochrome P450
hypothetical protein
Xylanase inhibitor C-terminal
MOSC N-terminal beta barrel domain
gag-polypeptide of LTR copia-type
Reverse transcriptase (RNA-dependent DNA polymerase)
hypothetical protein
Protein kinase domain
Cytochrome P450
Glycosyl hydrolases family 17
Protein kinase domain
Leucine Rich Repeat
Xylanase inhibitor C-terminal
Glutathione S-transferase, N-terminal domain
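
The expression slides above report RPKM for each accession and contrast the high and low glucolipid groups. The following is a minimal sketch of how such values could be computed and compared, assuming the standard RPKM definition (reads per kilobase of gene model per million mapped reads). The slides do not state how the high and low groups were actually compared; the log2 fold change of group means, the pseudocount, the function names and all numbers below are illustrative assumptions, not the authors' method or data.

```python
from math import log2
from statistics import mean


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene model per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    kilobases = gene_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions_mapped


def log2_fold_change(high_values, low_values, pseudocount=1.0):
    """log2 ratio of mean RPKM in the high group over the low group.

    The pseudocount (an assumption of this sketch) avoids division by
    zero when a transcript is not detected in one group.
    """
    return log2((mean(high_values) + pseudocount) / (mean(low_values) + pseudocount))


# Hypothetical example: 5,000 reads on a 2,000 bp gene model
# in a library of 25 million mapped reads.
example_rpkm = rpkm(read_count=5_000, gene_length_bp=2_000, total_mapped_reads=25_000_000)

# Hypothetical RPKM values for one transcript in the four high
# (0716, 1302, 1941, 1946) and four low (1911, 1912, 1920, 1926) accessions.
high_group = [399, 399, 399, 399]
low_group = [99, 99, 99, 99]
example_lfc = log2_fold_change(high_group, low_group)

print(f"RPKM: {example_rpkm:.1f}")
print(f"log2 fold change (high vs. low): {example_lfc:.1f}")
```

Under these assumptions, a control gene such as phosphoglycerate kinase would be expected to give a log2 fold change near zero, while a transcript expressed mainly in the high accessions would give a clearly positive value.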