METABOLIC PATHWAYS & GENOMICS

DATABASES & PATHWAYS
2012
INTRODUCTION
● Genome, transcriptome, proteome, phenome (mutant phenotype), and biochemical / metabolic pathway
databases and their associated tools offer powerful ways to investigate metabolism.
● Genomics-driven approaches (‘database mining’) complement classical biochemical approaches to the
metabolism of all organisms, including plants.
Sequence and expression information (from genomes, transcriptomes, proteomes, etc.) complements
biochemical information in several ways:
1. Identifying genes for plant enzymes. Because enzymes (and some transporters) are conserved,
homology searches (with BLAST programs) using prokaryotic, yeast, or animal sequences as query can
identify the corresponding plant proteins, and show whether they are encoded by single genes or gene
families.
Searching plant genomes in this way can show which enzymes are present and which are absent. This
in turn allows ‘metabolic reconstruction’, i.e. predicting metabolic capabilities (the metabolic pathways that
are present) from DNA sequence data alone.
- Plant sequences can be expressed heterologously (e.g., in E. coli or yeast, with a tag to facilitate
purification), and the recombinant proteins can be characterized. This is especially useful for low-abundance or unstable proteins, which are difficult or impossible to isolate from plants in sufficient
amounts for study.
- The functions encoded by plant sequences can be investigated using functional complementation in
microorganisms.
2. Predicting organellar targeting and membrane localization. Genomic sequences, cDNAs, and ESTs
can give information about the organellar targeting of enzymes, via their characteristic signal
sequences, and about whether proteins have membrane-spanning domains and hence are likely to be
located in membranes.
Organellar proteome databases can provide high-throughput experimental support for these predictions.
Knowing organellar location can rule in or out possible metabolic functions of proteins.
3. Predicting biochemical function from expression data (microarrays, RNAseq). When, where, and
at what level a gene is expressed can likewise provide clues about function. Correlated patterns of gene
expression (‘co-expression’) in relation to development, environment, or genetic changes (e.g., knocking out
or overexpressing genes) can point to related function.
4. Predicting missing enzyme or transporter genes and new gene functions by comparative
genomics. By looking for functional linkages among genes in bacteria and archaea (gene fusions,
conserved gene clusters, and co-occurrence patterns) it is possible to:
- Identify enzymes and transporters that are ‘missing’ from known pathways
- Discover new enzymes, pathways, and processes.
Once a new prokaryotic enzyme has been found by this approach, its counterpart can be sought in plants via
homology searches. Conversely, if an unknown plant enzyme has prokaryotic homologs, comparative
genomic analysis of the latter can help predict the function of the enzyme in both groups. This is a powerful
approach because prokaryotes share many pathways with plants.
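The co-occurrence idea in point 4 can be made concrete: genes that are present or absent together across many genomes (similar 'phylogenetic profiles') are candidate functional partners. The minimal Python sketch below assumes made-up gene names and presence/absence data for six genomes, and scores each gene pair with Jaccard similarity. This is only one simple scoring choice; resources such as STRING and SEED use their own, more elaborate methods, which are not reproduced here.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) profiles across six genomes.
# Gene names and data are illustrative placeholders, not real annotations.
PROFILES = {
    "enzymeA": (1, 1, 0, 1, 0, 1),
    "enzymeB": (1, 1, 0, 1, 0, 1),
    "geneX": (1, 1, 0, 1, 0, 0),
    "geneY": (0, 0, 1, 0, 1, 0),
}


def jaccard(p, q):
    """Jaccard similarity of two presence/absence profiles."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def ranked_pairs(profiles):
    """Gene pairs sorted from most to least similar co-occurrence pattern."""
    scores = [
        (jaccard(profiles[g1], profiles[g2]), g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
    ]
    return sorted(scores, reverse=True)


for score, g1, g2 in ranked_pairs(PROFILES):
    print(f"{g1}\t{g2}\t{score:.2f}")
```

Here enzymeA and enzymeB have identical profiles and rank first; a linkage of this kind is what would prompt a closer look at an uncharacterized gene as the 'missing' member of a pathway.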
********
This part of the course introduces web resources needed to extract the above types of information, and
illustrates how to use them.
BASIC RESOURCES
NCBI http://www.ncbi.nlm.nih.gov/ Entrez nucleotide and protein data bases; Blast similarity search
programs.
CD-Search http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi NCBI Conserved Domain Database.
Well-annotated models for ancient domains and full-length proteins. Example:
>Ureaplasma urealyticum ATP synthase C chain (EC 3.6.3.14)
MSSFIDITNVISSHVEANLPAVSAENVQSLANGAGIAYLGKYIGTGITMLAAGAVGLMQGFSTANAVQAVARNPEAQPKILSTMIVG
LALAEAVAIYALIVSILIIFVA
Click the ‘Search for similar domain architectures’ button to access the CDART tool.
Pfam http://pfam.janelia.org/ Pfam protein families and domains database. Click on ‘Sequence search’,
click on hit(s), then click on ‘Domain organisation’
Multalin Sequence Alignment http://bioinfo.genotoul.fr/multalin/multalin.html Aligns protein or DNA
sequences (output in color) and draws simple phylogenetic trees.
ExPASy Translate Tool http://www.expasy.ch/tools/dna.html Translates a DNA sequence in all 6 frames.
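What a six-frame translation involves can be sketched in a few lines of Python using the standard genetic code (NCBI translation table 1). This is a minimal sketch, not the ExPASy implementation: the frame labels ('+1' to '-3') are this sketch's own convention and may not match how ExPASy labels the reverse-strand frames; stop codons appear as '*' and codons containing ambiguous bases as 'X'.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement; characters other than A, C, G, T are left unchanged."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate consecutive codons; a trailing partial codon is ignored and
    codons containing non-ACGT characters become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict[str, str]:
    """Translate the three forward and three reverse-strand reading frames."""
    dna = "".join(dna.split()).upper().replace("U", "T")  # drop whitespace, RNA -> DNA
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frames("ATGGCCTAA").items():
        print(frame, protein)
```

For the toy input ATGGCCTAA, frame +1 gives MA* (Met, Ala, stop), while the reverse-strand frames are read from the reverse complement TTAGGCCAT.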
Phylogeny.fr http://www.phylogeny.fr/ Web-based, robust phylogenetic analysis for the non-specialist or specialist.
Targeting prediction (membranes, chloroplast, mitochondrion, vacuole, etc) and targeting peptide
cleavage sites:
TMHMM http://www.cbs.dtu.dk/services/TMHMM/ Prediction of transmembrane helices in proteins.
TargetP http://www.cbs.dtu.dk/services/TargetP/
Predotar http://urgi.versailles.inra.fr/predotar/predotar.html
WoLF PSORT http://wolfpsort.org/
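TMHMM uses a hidden Markov model. The much older and simpler idea behind transmembrane-helix prediction, a sliding-window hydropathy scan with the Kyte-Doolittle scale, fits in a few lines of Python. This is a swapped-in, cruder technique shown for illustration only: the 19-residue window and the 1.6 cut-off are commonly quoted defaults for this scale, not TMHMM parameters, and such scans also flag some hydrophobic segments that do not span a membrane.

```python
# Kyte-Doolittle hydropathy scale (positive values = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[tuple[int, float]]:
    """Mean hydropathy of every full window as (1-based start, score) pairs.
    Residues not in the scale (e.g. X) score 0."""
    protein = "".join(protein.split()).upper()
    profile = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        score = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in segment) / window
        profile.append((start + 1, score))
    return profile


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """Start positions of windows whose mean hydropathy exceeds the threshold."""
    return [start for start, score in hydropathy_profile(protein, window) if score > threshold]


if __name__ == "__main__":
    # Ureaplasma urealyticum ATP synthase C chain (the CD-Search example above).
    ureaplasma_c_chain = (
        "MSSFIDITNVISSHVEANLPAVSAENVQSLANGAGIAYLGKYIGTGITMLAAGAVGLMQGFSTANAVQAVARNPEAQPKILSTMIVG"
        "LALAEAVAIYALIVSILIIFVA"
    )
    print(hydrophobic_windows(ureaplasma_c_chain))
```

Comparing the flagged window positions with TMHMM's predicted helices for the same sequence is a quick way to see where the two approaches agree and differ.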
METABOLIC PATHWAY RESOURCES
Swiss-Prot Enzyme http://ca.expasy.org/enzyme/ Enzyme nomenclature data base (linked to SWISSPROT protein database, BRENDA, KEGG, etc)
BRENDA http://www.brenda-enzymes.info/ Comprehensive enzyme database.
KEGG http://www.genome.jp/kegg/ The Kyoto Encyclopedia of Genes and Genomes. Includes metabolic
pathways, and compound structures that can be captured.
BioCyc, EcoCyc, MetaCyc, YeastCyc http://BioCyc.org/ EcoCyc - Encyclopedia of E. coli Genes and
Metabolism; MetaCyc - Metabolic Encyclopedia. Also computationally-derived pathway/genome databases.
AraCyc http://www.arabidopsis.org/biocyc/index.jsp Similar to BioCyc, for Arabidopsis. Software allows
querying, graphical representation of pathways, and overlay of expression data on the biochemical pathway
overview diagram.
MaizeCyc http://www.gramene.org/pathway/maizecyc.html
KEGG and the various Cyc databases have similar aims but each has features the others lack.
Plant Gateway for PubSEED http://pubseed.theseed.org/seedviewer.cgi?page=PlantGateway Contains
interactive pathway diagrams for plant B vitamin synthesis (Arabidopsis and maize) that include information
on gene fusions. There are also deep annotations of B vitamin pathways in tabular form.
Beware! All metabolic pathway databases have weaknesses:
- They are not necessarily up-to-date and may have omissions and errors in their pathways – so they
should be checked against the literature.
- Proteins are very often (for non-model organisms, almost always) assigned functions based solely on
homology, but the database does not make clear that this is so.
- To reach firm conclusions it is therefore necessary to go to the literature to find whether a putative
function has been authenticated biochemically or genetically.
PLANT GENOME RESOURCES
JGI http://genome.jgi.doe.gov/ Joint Genome Institute genome portal (all kingdoms of life)
Gramene http://www.gramene.org/ Curated, open-source data resource for comparative genome
analysis of grasses and other plants.
TAIR http://www.arabidopsis.org/ The Arabidopsis information resource.
Maizesequence.org http://www.maizesequence.org/index.html Browser providing the latest sequence
and annotation of the maize genome.
PLANT TRANSCRIPTOME RESOURCES
Golm Transcriptome database http://csbdb.mpimp-golm.mpg.de/csbdb/dbxp/ath/ath_xpmgq.html
Microarray data. Gives overview of expression, searches for co-responses.
ATTED http://atted.jp/ Microarray data. Searches for co-expression patterns in Arabidopsis (and also
rice); shows gene networks, not just lists of correlated genes.
qteller http://qteller.com/ RNAseq data for maize, sorghum, rice. Simple tools for expression in various
organs, correlation of expression of two genes.
PLANT PROTEOME RESOURCES
PPDB http://ppdb.tc.cornell.edu/ The Plant Proteome DataBase
SUBA3 http://suba.plantenergy.uwa.edu.au/ SUB-cellular location database for Arabidopsis proteins
(includes GFP and MS-MS data)
PLANT PHENOME RESOURCES
SeedGenes http://www.seedgenes.org/ Genes that give a seed phenotype when disrupted by mutation.
Chloroplast2010 http://www.plastid.msu.edu/ Large set of phenotypic data for homozygous mutants of
chloroplast genes.
RAPID http://rarge.gsc.riken.jp/phenome/ RIKEN Arabidopsis Phenome Information Database,
phenotypic data in transposon-insertional mutants.
COMPARATIVE GENOMICS RESOURCES
STRING http://string.embl.de/ STRING is a database of known and predicted protein-protein relationships,
derived from genomic context (fusions, conserved gene clusters, co-occurrence), high throughput
experiments (co-expression), and the literature. STRING quantitatively integrates data from bacteria and
other organisms.
SEED http://www.theseed.org/wiki/Main_Page Database with 3,000+ genomes, many analysis tools.
Very useful for gene cluster analysis.
Browsers compatible with SEED: Firefox or Safari (PC or Mac).
To request a SEED account: Go to http://rast.nmpdr.org/rast.cgi * Click ‘Register a new account’,
complete the form, hit ‘Request’ button * After an automated email reply, a password will be emailed.
USING METABOLIC PATHWAY RESOURCES
• SWISS-PROT ENZYME Enzyme nomenclature database http://ca.expasy.org/enzyme/
ENZYME is a repository of information on enzyme nomenclature, with links to other databases. It describes
enzymes that have been given an EC (Enzyme Commission) number, and the reactions they catalyze. It
can be searched in various ways, e.g. by EC number, by common name, by substrate or product.
Example: alcohol dehydrogenase = EC 1.1.1.1 ENZYME entry page * Links to:
BRENDA (convenient entry point)
KEGG (Kyoto University Ligand Chemical Database; pathway maps, e.g. glycolysis)
PDB (protein structure database)
MetaCyc
Medline
Cloned enzymes in SwissProt (not exhaustive but curated, i.e. high quality)
• BRENDA Enzyme database http://www.brenda-enzymes.info/ BRENDA is an extensively
referenced enzyme data information system; it includes data on substrate specificity, physical and kinetic
characteristics, inhibitors, sources, cloning, purification etc.
Example: alcohol dehydrogenase EC 1.1.1.1
• KEGG Kyoto Encyclopedia of Genes and Genomes http://www.genome.jp/kegg/
KEGG computerizes knowledge of molecular and cell biology in terms of pathways that consist of
interacting molecules or genes and provides links from gene catalogs produced by genome sequencing. It
covers regulatory pathways and molecular assemblies as well as metabolic pathways. Its metabolic
pathway maps have links to the enzymes and compounds.
Example: KEGG PATHWAY * Metabolism of Cofactors and Vitamins - Folate biosynthesis * Note that all
enzymes (EC numbers) and intermediates are clickable, e.g. * 3.5.4.16 and its * product (structure can be
captured). Note that this is a composite metabolic scheme. It includes methanopterin biosynthesis (found
only in methane-producing microbes) and tetrahydrobiopterin synthesis (found in animals). Note the
pulldown table (top left) of folate biosynthesis enzymes in different organisms; when an organism is
selected, the enzymes putatively encoded in its genome are colored green. Compare Arabidopsis and
human.
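The green-overlay comparison amounts to checking which of a pathway's EC numbers are annotated in each genome. A minimal Python sketch, assuming hand-entered data: the six EC numbers are real enzymes on the folate biosynthesis map, but the two organisms and their annotation sets are hypothetical, not a download from KEGG.

```python
# Six EC numbers from the folate biosynthesis map, in pathway order.
FOLATE_STEPS = [
    "3.5.4.16",  # GTP cyclohydrolase I
    "4.1.2.25",  # dihydroneopterin aldolase
    "2.7.6.3",   # 6-hydroxymethyl-7,8-dihydropterin pyrophosphokinase
    "2.5.1.15",  # dihydropteroate synthase
    "6.3.2.12",  # dihydrofolate synthase
    "1.5.1.3",   # dihydrofolate reductase
]

# Hypothetical annotation sets: EC numbers assigned to genes in two genomes.
ANNOTATED_ECS = {
    "organism_A": {"3.5.4.16", "4.1.2.25", "2.7.6.3", "2.5.1.15", "6.3.2.12", "1.5.1.3"},
    "organism_B": {"3.5.4.16", "1.5.1.3", "6.3.2.17"},
}


def pathway_coverage(steps, annotated):
    """Split pathway steps into (present, missing) lists, keeping pathway order."""
    present = [ec for ec in steps if ec in annotated]
    missing = [ec for ec in steps if ec not in annotated]
    return present, missing


if __name__ == "__main__":
    for organism, ecs in ANNOTATED_ECS.items():
        present, missing = pathway_coverage(FOLATE_STEPS, ecs)
        print(f"{organism}: {len(present)}/{len(FOLATE_STEPS)} steps annotated; "
              f"missing: {missing or 'none'}")
```

As the cautions about pathway databases point out, an EC number counted as 'present' this way is often a homology-based assignment that still needs checking against the literature.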
• EcoCyc, MetaCyc http://BioCyc.org/ and AraCyc
EcoCyc - Encyclopedia of E. coli Genes and Metabolism: Describes the genome and biochemical
machinery of E. coli. Contains annotations of all E. coli genes, and their DNA sequences, and describes all
known pathways of E. coli small-molecule metabolism. Each pathway and its component reactions and
enzymes have detailed annotations, and are extensively referenced.
MetaCyc - Metabolic Encyclopedia: A metabolic-pathway database that describes pathways, reactions,
and enzymes of various organisms, especially microbes. MetaCyc contains the E. coli pathways of EcoCyc,
plus other pathways from the literature and on-line sources, with citations to the sources of pathways.
Example: MetaCyc * Search tab – Pathways * Search/Filter by ontology * Biosynthesis * Amino acid
biosynthesis * Superpathway of phenylalanine and tyrosine biosynthesis * Note that all elements in pathway
are clickable.
AraCyc (PlantCyc) at TAIR http://www.arabidopsis.org/biocyc/index.jsp * Search tab – Pathways *
Search/Filter by ontology * Biosynthesis * Amino acid biosynthesis * Superpathway of
Lysine/Threonine/Methionine biosynthesis * Click ‘More detail’ 2x to display genes corresponding to
pathway steps * Select species.
• Plant Gateway for PubSEED http://pubseed.theseed.org/seedviewer.cgi?page=PlantGateway
Contains interactive pathway diagrams for B vitamin synthesis in plants (Arabidopsis and maize), which
include information on gene fusions. There are also deep annotations of B vitamin pathways in tabular form.
Example: Click on SEED diagram for thiamin * Note pathway diagram showing compartmentation *
Hovering over enzyme boxes shows enzyme names, clicking on compound circles links to KEGG * Note
‘Gene fusion events’ box * In this box, note the fusion between a TenA protein and a HAD domain of
unknown function * Select Arabidopsis to color genome * Green overlay indicates enzymes for which genes
are known * Clicking on these genes connects to main SEED database.
ORGANELLAR TARGETING
Targeting prediction
Example: 10-Formyltetrahydrofolate deformylase (PurU) is an enzyme found in E. coli and many other
bacteria (e.g., the cyanobacterium Nostoc) that hydrolyzes 10-formyltetrahydrofolate, releasing formate.
The Arabidopsis genome encodes two homologs of E. coli PurU (At5g47435 and At4g17360).
>E_coli gi|548645|sp|P37051|PURU_ECOLI FORMYLTETRAHYDROFOLATE DEFORMYLASE (FORMYL-FH(4) HYDROLASE)
MHSLQRKVLRTICPDQKGLIARITNICYKHELNIVQNNEFVDHRTGRFFMRTELEGIFNDSTLLADLDSA
LPEGSVRELNPAGRRRIVILVTKEAHCLGDLLMKANYGGLDVEIAAVIGNHDTLRSLVERFDIPFELVSH
EGLTRNEHDQKMADAIDAYQPDYVVLAKYMRVLTPEFVARFPNKIINIHHSFLPAFIGARPYHQAYERGV
KIIGATAHYVNDNLDEGPIIMQDVIHVDHTYTAEDMMRAGRDVEKNVLSRALYKVLAQRVFVYGNRTIIL
>Nostoc gi|186681065|ref|YP_001864261.1| formyltetrahydrofolate deformylase [Nostoc punctiforme PCC 73102]
MMTNPTATLLISCPDQRGLVAKFANFIYSNGGNIIHADQHTDFAAGLFLTRIEWQLEGFNLPREFIAPAF
NAIAQPLSAKWEIRFSDTVPRIAIWVSRQDHCLFDLIWRQRAKEFVAEIPLIISNHANLKVVAEQFNIDF
QHVPITKDNKSEQEAQQLELLRQYKIDLVVLAKYMQIVSADFINQFSQIINIHHSFLPAFIGANPYHRAF
ERGVKIIGATAHYATADLDAGPIIEQDVVRVSHRDEVDDLVRKGKDLERVVLARAVRSHLQNRVLVYGNR
TVVFE
>At5g47435 gi|18422794|ref|NP_568682.1| formyltetrahydrofolate deformylase, putative [Arabidopsis thaliana]
MIRRITERASGFAKNIPILKSSRFHGESLDSSVSPVLIPGVHVFHCQDAVGIVAKLSDCIAAKGGNILGY
DVFVPENNNVFYSRSEFIFDPVKWPRSQVDEDFQTIAQRYGALNSVVRVPSIDPKYKIALLLSKQDHCLV
EMLHKWQDGKLPVDITCVISNHERASNTHVMRFLERHGIPYHYVSTTKENKREDDILELVKDTDFLVLAR
YMQILSGNFLKGYGKDVINIHHGLLPSFKGGYPAKQAFDAGVKLIGATSHFVTEELDSGPIIEQMVESVS
HRDNLRSFVQKSEDLEKKCLTRAIKSYCELRVLPYGTNKTVVF
>At4g17360 gi|15236046|ref|NP_193467.1| formyltetrahydrofolate deformylase, putative [Arabidopsis thaliana]
MIRRVSTTSCLSATAFRSFTKWSFKSSQFHGESLDSSVSPLLIPGFHVFHCPDVVGIVAKLSDCIAAKGG
NILGYDVLVPENKNVFYSRSEFIFDPVKWPRRQMDEDFQTIAQKFSALSSVVRVPSLDPKYKIALLLSKQ
DHCLVEMLHKWQDGKLPVDITCVISNHERAPNTHVMRFLQRHGISYHYLPTTDQNKIEEEILELVKGTDF
LVLARYMQLLSGNFLKGYGKDVINIHHGLLPSFKGRNPVKQAFDAGVKLIGATTHFVTEELDSGPIIEQM
VERVSHRDNLRSFVQKSEDLEKKCLMKAIKSYCELRVLPYGTQRTVVF
Targeting predictions for the At5g47435 and At4g17360 proteins using:
TargetP: http://www.cbs.dtu.dk/services/TargetP/ Paste in both Arabidopsis sequences * Check ‘Plant’,
‘Perform cleavage site predictions’
Predotar: http://urgi.versailles.inra.fr/predotar/predotar.html Paste in both Arabidopsis sequences
The prediction algorithms agree that both proteins are mitochondrial. To check this, align them with the
bacterial PurU sequences using Multalin http://bioinfo.genotoul.fr/multalin/multalin.html * Alignment
shows that both Arabidopsis proteins have N-terminal extensions of ~35 residues (a typical size for a
mitochondrial targeting peptide). * Align just the two Arabidopsis sequences – note that the N-terminal
extensions are not conserved (typical of targeting sequences).
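Reading an N-terminal extension off the alignment can be sketched as counting the plant residues that lie before the first column where the bacterial sequence begins. This minimal Python sketch assumes the two aligned rows have been copied out as equal-length strings using '-' or '.' as gap characters (an assumption about the export format). It gives only a rough estimate, because bacterial homologs can themselves differ in where they start.

```python
GAP_CHARS = frozenset("-.")


def n_terminal_extension(plant_row: str, bacterial_row: str) -> int:
    """Number of plant residues aligned before the first bacterial residue."""
    if len(plant_row) != len(bacterial_row):
        raise ValueError("aligned rows must have the same length")
    extension = 0
    for plant_aa, bacterial_aa in zip(plant_row, bacterial_row):
        if bacterial_aa not in GAP_CHARS:
            break  # the bacterial protein starts in this column
        if plant_aa not in GAP_CHARS:
            extension += 1
    return extension


if __name__ == "__main__":
    # Toy alignment (not the real PurU sequences): five extra plant residues.
    print(n_terminal_extension("MKRLSAGHVA", "-----AGHVA"))  # prints 5
```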
Targeting – proteome databases with experimental findings
PPDB http://ppdb.tc.cornell.edu/ Click on ‘Accession’ * Paste AGI number(s) in box, e.g. At1g03475
(Coproporphyrinogen III oxidase) * Click on link(s) * Displays proteomic evidence from the database and
from published studies.
SUBA3 http://suba.plantenergy.uwa.edu.au/ Click Search tab * Paste AGI number(s) in lower box, e.g.
At1g03475 * Click + ‘Arabidopsis Gene Initiative (AGI) identifier(s) is in list’, then ‘Query’ * Displays
evidence.
PHYLOGENETIC TREES
Using Phylogeny.fr: Select ‘One Click’ mode * Paste in the four PurU sequences above * Click ‘Submit’ *
Carries out sequence alignment with MUSCLE, Maximum Likelihood tree-building with PhyML, and tree-drawing with TreeDyn. The analysis runs the aLRT statistical test, which gives results similar to the
bootstrap procedure but is much faster. * Download the tree in preferred image format. ‘Advanced’ and ‘A
la carte’ modes are available for experienced users.
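Phylogeny.fr builds maximum-likelihood trees with PhyML, which is far too involved for a short example. To show the general idea of turning pairwise distances into a tree, this Python sketch instead uses UPGMA, a much simpler distance method that assumes a roughly constant rate of evolution across lineages. It is a swapped-in technique for illustration, not what Phylogeny.fr runs, and the distances in the usage example are made up.

```python
from itertools import combinations


def upgma(names: list[str], dist: dict[frozenset, float]) -> str:
    """Cluster taxa by UPGMA and return a Newick string (no branch lengths).

    dist maps frozenset({taxon1, taxon2}) to the distance between them.
    """
    # Each cluster id maps to (newick_subtree, number_of_leaves).
    clusters = {name: (name, 1) for name in names}
    d = {frozenset(pair): dist[frozenset(pair)] for pair in combinations(names, 2)}
    next_id = 0
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = sorted(min(d, key=d.get))
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        next_id += 1
        merged = f"#{next_id}"  # internal id; assumes no taxon name starts with '#'
        # Distance to the merged cluster is the leaf-count-weighted average.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        clusters[merged] = (f"({tree_a},{tree_b})", size_a + size_b)
    tree, _ = next(iter(clusters.values()))
    return tree + ";"


if __name__ == "__main__":
    # Made-up distances between four sequences labelled A-D.
    pairs = {("A", "B"): 0.1, ("A", "C"): 0.5, ("A", "D"): 0.5,
             ("B", "C"): 0.5, ("B", "D"): 0.5, ("C", "D"): 0.2}
    dist = {frozenset(p): v for p, v in pairs.items()}
    print(upgma(["A", "B", "C", "D"], dist))  # prints ((A,B),(C,D));
```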
MICROARRAY DATABASES
Microarrays: The 22K Affymetrix chip contains most Arabidopsis genes, so in principle it can be used to
monitor the expression of almost all metabolic genes. However, many metabolic genes have low expression
levels, and so cannot be monitored with confidence. Genes with low average expression levels tend to give
large numbers of spurious co-expression matches.
mRNA abundance correlates broadly with protein abundance and with in-vivo metabolic fluxes.
Therefore digital gene expression data can indicate which organs have a pathway and which do not, and
whether a pathway is likely to be a major or minor one. Note also that primary metabolic pathways are
expressed everywhere and always, and that secondary pathways by definition are not. Unexpected
differences in expression may provide clues about genetic control of pathways, e.g. an enzyme whose
transcript level varies more than that of others in the pathway (i.e. is highly regulated, not constitutive) may
be a control point in the pathway.
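The idea of a transcript that 'varies more than that of others in the pathway' can be put into numbers with the coefficient of variation (CV: standard deviation divided by mean) of each gene's expression across samples. A minimal Python sketch; the gene names and expression values are hypothetical, and CV is one reasonable measure of variability, not necessarily the one used by any of the databases described here.

```python
from statistics import mean, pstdev

# Hypothetical expression values (arbitrary units) for three genes of one
# pathway across five samples; names and numbers are placeholders.
PATHWAY_EXPRESSION = {
    "step1_gene": [100, 110, 95, 105, 100],
    "step2_gene": [200, 190, 210, 205, 195],
    "step3_gene": [50, 400, 30, 250, 20],
}


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m else float("inf")


def rank_by_variability(expression):
    """Genes sorted from most to least variable expression."""
    return sorted(expression,
                  key=lambda gene: coefficient_of_variation(expression[gene]),
                  reverse=True)


if __name__ == "__main__":
    for gene in rank_by_variability(PATHWAY_EXPRESSION):
        print(gene, round(coefficient_of_variation(PATHWAY_EXPRESSION[gene]), 2))
```

In this toy data step3_gene stands out as the candidate control point; with real data the same ranking would only suggest genes worth testing.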
Microarray-based gene expression profiling using the Golm Transcriptome database
http://csbdb.mpimp-golm.mpg.de/csbdb/dbxp/ath/ath_xpmgq.html
For an overview of expression in different organs and in different environmental conditions: on the front page,
paste in one or several AGI numbers e.g.
At3g12930
At5g47190
At2g39800
(At3g12930 is the plastid Iojap protein; At5g47190 is chloroplast ribosomal protein L19; At2g39800 is the
first enzyme of proline biosynthesis)
* Scroll down to graphs. Note positive correlation between At3g12930 and At5g47190. Note induction of
At2g39800 by stresses.
To search for positively correlated genes, go to Transcript Co-Response, Single Gene Query, paste in
At5g47190 * Select a dataset (‘Matrix’), e.g. developmental series * select an output, e.g. positive, top 100
of co-responding genes * Scroll down list of hits – note many strong correlations with other chloroplast
ribosomal proteins, which associate to form the protein complexes of the ribosome.
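A co-response search of this kind boils down to correlating one gene's expression profile with every other gene's and ranking the results. This minimal Python sketch uses Pearson correlation (one common choice; the databases' own scoring may differ) and drops genes whose mean expression falls below an arbitrary, hypothetical cut-off, following the warning above that genes with low average expression give many spurious co-expression matches.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either profile is flat."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def co_responders(query, expression, top=100, min_mean=50.0):
    """Rank genes by correlation with the query gene, best first.

    Genes whose mean expression is below min_mean (a hypothetical cut-off)
    are skipped because weakly expressed genes give noisy correlations.
    """
    q = expression[query]
    hits = [
        (pearson(q, values), gene)
        for gene, values in expression.items()
        if gene != query and mean(values) >= min_mean
    ]
    hits.sort(reverse=True)
    return hits[:top]


if __name__ == "__main__":
    # Hypothetical profiles across four samples.
    expression = {
        "query_gene": [100, 200, 300, 400],
        "partner_gene": [110, 210, 290, 420],
        "opposite_gene": [400, 300, 200, 100],
        "weak_gene": [1, 2, 3, 4],
    }
    for r, gene in co_responders("query_gene", expression, top=3):
        print(f"{gene}\t{r:+.3f}")
```

Note that weak_gene tracks the query perfectly but is excluded by the expression cut-off, which is the trade-off such a filter makes.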
Microarray-based gene expression profiling using ATTED http://atted.jp/
On the front page, in the Search box select ‘Gene ID’, paste an AGI number (e.g. At5g47190) in the box, click ‘Search’ *
Click on link (‘Target’ box summarizes targeting predictions) * Displays coexpressed gene network around
At5g47190 * Note the many proteins related to plastid ribosomes * Click on coexpressed gene list for more
coexpressed genes * Check all 4 boxes * Default ranking is by all datasets * Rankings in individual datasets
(e.g. tissue type, abiotic stress) can also be displayed * In ‘Link’ column, graph icon displays correlation
data points * Osa homolog column shows putative rice ortholog, clicking on link displays correlation list for
rice genes * Note many ribosome associations of rice homolog of best Arabidopsis hit.
RNAseq-based gene expression profiling using qteller http://qteller.com/ Select maize, paste in
GRMZM2G161299 or GRMZM2G420119 * Note expression profile by organ * Use these two genes for
correlation analysis.
PLANT PHENOME DATABASES
Although plant phenotype resources are less developed than phenotype databases for mutants of model
microorganisms, several exist and they are growing.
SeedGenes http://www.seedgenes.org/ Covers ~350 Arabidopsis genes that give a seed phenotype when
disrupted by mutation. Click ‘Enter’, click ‘Access the SeedGenes Query Page’, ‘Browse genes’, search for
AGI numbers in list.
Chloroplast2010 http://www.plastid.msu.edu/ Has morphological and metabolic phenotype data for
>5,000 mutants in genes whose products are predicted to be chloroplast-targeted. In ‘Large scale
phenomics data’ click ‘Here’ * In ‘Phenotypic Analysis Overview’ click ‘Here’ * Log in or sign up to get an
account * In Search by Query Term(s) area, search by AGI number, e.g. At4g25050, At1g10310 (be sure to
avoid blank spaces or empty lines) * Click on links to genes * See tabs for morphology, leaf amino acid
profile, etc
RAPID http://rarge.gsc.riken.jp/phenome/ Phenotypic data in transposon-insertional mutants. Click ‘Line
list’ * search for AGI number, e.g. At2g48120 * copy line code 11-2389-1 * Click on ‘Search’ * Paste line
code into search box * Click ‘Search’ * displays image of albino seedling
USING GENOME RESOURCES TO FIND PLANT ENZYME GENES
This exercise demonstrates how to find Arabidopsis and maize genes encoding an enzyme, starting from
the sequence of a bacterial enzyme, 5,10-methylenetetrahydrofolate reductase, EC 1.5.1.20 (MetF).
Go to Swiss-Prot Enzyme, enter 1.5.1.20 * Click on link to E. coli MetF * Capture FASTA sequence * Go to
NCBI Protein BLAST search * Select Arabidopsis thaliana * Hits on MTHFR1 and MTHFR2 (At3g59970
and At2g44160) Note multiple entries for each gene * Capture full-length (about 590 residues) FASTA text
sequences, save to Word file * Align in Multalin to confirm their very high similarity.
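'Very high similarity' can be quantified as percent identity over the aligned positions. A minimal Python sketch, assuming the two aligned rows are copied out of the Multalin output as equal-length strings with '-' or '.' for gaps. Percent identity has several definitions; this one divides by the number of columns where neither sequence has a gap.

```python
GAP_CHARS = frozenset("-.")


def percent_identity(row1: str, row2: str) -> float:
    """Percent identical residues over columns where neither row has a gap."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    pairs = [
        (a, b)
        for a, b in zip(row1.upper(), row2.upper())
        if a not in GAP_CHARS and b not in GAP_CHARS
    ]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy aligned rows (not the real MTHFR alignment).
    print(round(percent_identity("MKVVDKIK-S", "MKVIDKIQSL"), 1))  # prints 66.7
```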
To find maize homologs, go to Maizesequence.org, click ‘BLAST’ in the header bar * Paste either Arabidopsis
sequence in search box * Select ‘peptide queries’, ‘peptide database’, ‘Filtered gene set peptides’,
‘BLASTP’, search sensitivity ‘no optimization’, click ‘Run’ * In output, if necessary turn on all columns, select
‘E-val’ in Stats and <E-val in Sort By * To see alignments, click [A] * Very strong hit, GRMZM2G347056
(593 residues) on chromosome 1; also second hit, truncated (382 residues), GRMZM2G034278 on
chromosome 5. (Third hit is a small fragment) * To capture protein sequences, click on GRMZM identifiers,
‘Protein sequence’, save to Word file.
Sequence alignment indicates that GRMZM2G034278 differs from GRMZM2G347056 and both
Arabidopsis sequences in lacking ~200 residues at the C-terminus and in having a very different N-terminal
region of ~80 residues. GRMZM2G034278 is thus almost certainly an incorrectly called gene or a
pseudogene.
To check whether these genes are expressed, use GRMZM2G034278 and GRMZM2G347056 protein
sequences in tBLASTn against maize ESTs:
- 50 ESTs exactly match GRMZM2G082463 (allowing for imperfections characteristic of EST sequences)
- None appear to exactly match GRMZM2G034278
Therefore, since the predicted GRMZM2G034278 protein is truncated, and has no cognate ESTs (i.e. is not
transcribed), it is most probably a pseudogene. Note that ~85% of the maize genome consists of hundreds
of families of transposable elements. These are responsible for capture and amplification of many gene
fragments.
>MetF 5,10-methylenetetrahydrofolate reductase [Escherichia coli str. K-12 substr. MG1655]
MSFFHASQRDALNQSLAEVQGQINVSFEFFPPRTSEMEQTLWNSIDRLSSLKPKFVSVTYGANSGERDRT
HSIIKGIKDRTGLEAAPHLTCIDATPDELRTIARDYWNNGIRHIVALRGDLPPGSGKPEMYASDLVTLLK
EVADFDISVAAYPEVHPEAKSAQADLLNLKRKVDAGANRAITQFFFDVESYLRFRDRCVSAGIDVEIIPG
ILPVSNFKQAKKFADMTNVRIPAWMAQMFDGLDDDAETRKLVGANIAMDMVKILSREGVKDFHFYTLNRA
EMSYAICHTLGVRPGL
>MTHFR1 gi|15232215|ref|NP_191556.1| methylenetetrahydrofolate reductase 1 [Arabidopsis thaliana]
MKVVDKIKSVTEQGQTAFSFEFFPPKTEDGVENLFERMDRLVSYGPTFCDITWGAGGSTADLTLEIASRM
QNVICVETMMHLTCTNMPIEKIDHALETIRSNGIQNVLALRGDPPHGQDKFVQVEGGFACALDLVNHIRS
KYGDYFGITVAGYPEAHPDVIEADGLATPESYQSDLAYLKKKVDAGADLIVTQLFYDTDIFLKFVNDCRQ
IGINCPIVPGIMPISNYKGFLRMAGFCKTKIPAELTAALEPIKDNDEAVKAYGIHFATEMCKKILAHGIT
SLHLYTLNVDKSAIGILMNLGLIDESKISRSLPWRRPANVFRTKEDVRPIFWANRPKSYISRTKGWNDFP
HGRWGDSHSAAYSTLSDYQFARPKGRDKKLQQEWVVPLKSIEDVQEKFKELCIGNLKSSPWSELDGLQPE
TKIINEQLGKINSNGFLTINSQPSVNAAKSDSPAIGWGGPGGYVYQKAYLEFFCSKDKLDTLVEKSKAFP
SITYMAVNKSENWVSNTGESDVNAVTWGVFPAKEVIQPTIVDPASFKVWKDEAFEIWSRSWANLYPEDDP
SRKLLEEVKNSYYLVSLVDNNYINGDIFSVFA
>MTHFR2 gi|18406468|ref|NP_566011.1| methylenetetrahydrofolate reductase 2 [Arabidopsis thaliana]
MKVIDKIQSLADEGKTAFSFEFFPPKTEDGVDNLFERMDRMVAYGPTFCDITWGAGGSTADLTLDIASRM
QNVVCVESMMHLTCTNMPVEKIDHALETIRSNGIQNVLALRGDPPHGQDKFVQVEGGFDCALDLVNHIRS
KYGDYFGITVAGYPEAHPDVIGENGLASNEAYQSDLEYLKKKIDAGADLIVTQLFYDTDIFLKFVNDCRQ
IGISCPIVPGIMPINNYRGFLRMTGFCKTKIPVEVMAALEPIKDNEEAVKAYGIHLGTEMCKKMLAHGVK
SLHLYTLNMEKSALAILMNLGMIDESKISRSLPWRRPANVFRTKEDVRPIFWANRPKSYISRTKGWEDFP
QGRWGDSRSASYGALSDHQFSRPRARDKKLQQEWVVPLKSVEDIQEKFKELCLGNLKSSPWSELDGLQPE
TRIINEQLIKVNSKGFLTINSQPSVNAERSDSPTVGWGGPVGYVYQKAYLEFFCSKEKLDAVVEKCKALP
SITYMAVNKGEQWVSNTAQADVNAVTWGVFPAKEIIQPTIVDPASFNVWKDEAFETWSRSWANLYPEADP
SRNLLEEVKNSYYLVSLVENDYINGDIFAVFADL
>GRMZM2G347056
MKVIEKILEAAGDGRTAFSFEYFPPKTEEGVENLFERMDRMVAHGPSFCDITWGAGGSTA
DLTLEIANRMQNMVCVETMMHLTCTNMPVEKIDHALETIKSNGIQNVLALRGDPPHGQDK
FVQVEGGFACALDLVQHIRAKYGDYFGITVAGYPEAHPDAIQGEGGATLEAYSNDLAYLK
RKVDAGADLIVTQLFYDTDIFLKFVNDCRQIGITCPIVPGIMPINNYKGFLRMTGFCKTK
IPSEITAALDPIKDNEEAVRQYGIHLGTEMCKKILATGIKTLHLYTLNMDKSAIGILMNL
GLIEESKVSRPLPWRPATNVFRVKEDVRPIFWANRPKSYLKRTLGWDQYPHGRWGDSRNP
SYGALTDHQFTRPRGRGKKLQEEWAVPLKSVEDISERFTNFCQGKLTSSPWSELDGLQPE
TKIIDDQLVNINQKGFLTINSQPAVNGEKSDSPTVGWGGPGGYVYQKAYLEFFCAKEKLD
QLIEKIKAFPSLTYIAVNKDGETFSNISPNAVNAVTWGVFPGKEIIQPTVVDHASFMVWK
DEAFEIWTRGWGCMFPEGDSSRELLEKVQKTYYLVSLVDNDYVQGDLFAAFKI
>GRMZM2G034278
MCMLLRKDSGHYLAIVVYVKCCSLEEERRKERIPTELMRSFILTSHTAPGRAPAASSICN
DRTRRRAELLSSYIYNSSTKVCVETMMHLTCTNMPVEKIDHALETIKFNGIHNVLALRGD
PPHGQDKFVQVEGGFACALDLVQHIRSKYGDYFGITVAGYPEAHPDAIQGEGGATLEAYS
NDLAYLKRKVDAGADLIVTQLFYDTDIFLKFVNDCRQIGITCPIVPGIMPINNYKGFMRM
TGFCKTKIPSEITAALDPIKDNEEAVRAYGIHLGTEMCKKIIASGIKTLHLYTLNVDKSA
LGILMNLGLIEESKVSRSLPWRPATNVFRVKEVVRPIFWASRPKSYLKRTLGWDQYPHEG
GVILETHHMEHLGIVHKTTWTW