METABOLIC PATHWAYS & GENOMICS File: genome&p


DATABASES & PATHWAYS

File name DATABASES AND PATHWAYS

2013

INTRODUCTION

● Genome, transcriptome, proteome, phenome (mutant phenotype), and biochemistry / metabolic pathway databases and their associated tools offer powerful ways to investigate metabolism.

● Genomics-driven approaches (‘database mining’) complement classical biochemical approaches to metabolism in all organisms, including plants.

Sequence and expression information (from genomes, transcriptomes, proteomes, etc.) complements biochemical information in several ways:

1. Identifying genes for plant enzymes. Because many enzymes (and some transporters) are conserved, homology searches (with BLAST programs) using prokaryotic, yeast, or animal sequences as query can identify the corresponding plant proteins, and show whether they are encoded by single genes or gene families.

Searching plant genomes in this way can show which enzymes are present and which are absent. This in turn allows ‘metabolic reconstruction’, i.e. predicting metabolic capabilities (the metabolic pathways that are present) from DNA sequence data alone.

- Plant sequences can be expressed heterologously (e.g., in E. coli, with a tag to facilitate purification), and the recombinant proteins can be characterized. This is especially useful for low-abundance or unstable proteins, which are hard to isolate from plants in sufficient amounts for study.

- The functions encoded by plant sequences can be investigated using functional complementation in microorganisms.

2. Predicting organellar targeting, localization in membranes.

Genomic sequences, cDNAs, and ESTs can give information about the organellar targeting of enzymes, via their characteristic signal sequences, and about whether proteins have membrane-spanning domains and hence are likely to be located in membranes.

Organellar proteome databases can provide high-throughput experimental support for these predictions.

Knowing organellar location can rule in or out possible metabolic functions of proteins.

3. Predicting biochemical function from expression data (microarrays, RNAseq). When, where, and at what level a gene is expressed can likewise provide clues about function. Correlated patterns of gene expression (‘co-expression’) in relation to development, environment, or genetic changes (e.g., knocking out or overexpressing genes) can point to related function.

4. Predicting missing enzyme or transporter genes and new gene functions by comparative genomics. By looking for functional linkages among genes in bacteria and archaea (gene fusions, conserved gene clusters, and co-occurrence patterns) it is possible to:

- Identify enzymes and transporters that are ‘missing’ from known pathways

- Discover new enzymes, pathways, and processes.

Having found a new prokaryotic enzyme by this approach, its counterpart can be sought in plants via homology searches. Conversely, if an unknown plant enzyme has prokaryotic homologs, comparative genomic analysis of the latter can help predict the function of the enzyme in both groups. This is a powerful approach because prokaryotes share many pathways with plants.
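The co-occurrence idea in point 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene family names, genome names, and presence/absence data are all invented, and the Jaccard score with a 0.8 cutoff is one simple choice of similarity measure, not the method used by any particular database.

```python
from itertools import combinations

# Hypothetical presence/absence data: gene family -> genomes that contain it.
# Family and genome names are invented for illustration only.
PROFILES = {
    "famA": {"g1", "g2", "g3", "g5"},
    "famB": {"g1", "g2", "g3", "g5"},
    "famC": {"g2", "g4"},
    "famD": {"g1", "g3", "g4", "g5"},
}


def jaccard(a, b):
    """Jaccard similarity of two genome sets (1.0 = identical distribution)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_occurring_pairs(profiles, threshold=0.8):
    """Return (family1, family2, score) for pairs with closely matching profiles."""
    pairs = []
    for (name1, set1), (name2, set2) in combinations(sorted(profiles.items()), 2):
        score = jaccard(set1, set2)
        if score >= threshold:
            pairs.append((name1, name2, score))
    # Strongest candidates for a functional linkage come first.
    return sorted(pairs, key=lambda p: -p[2])


if __name__ == "__main__":
    for name1, name2, score in co_occurring_pairs(PROFILES):
        print(f"{name1} and {name2} co-occur (Jaccard = {score:.2f})")
```

Families that are always present or absent together (famA and famB in the toy data) are candidates for a shared pathway; such a hint still needs the experimental checks described above.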

********

This part of the course introduces web resources needed to extract the above types of information, and illustrates how to use them.

BASIC RESOURCES

NCBI http://www.ncbi.nlm.nih.gov/ Entrez nucleotide and protein databases; BLAST similarity search programs.

CD-Search http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi NCBI Conserved Domain Database.

Well-annotated models for ancient domains and full-length proteins. Example:

>Ureaplasma urealyticum ATP synthase C chain (EC 3.6.3.14)

MSSFIDITNVISSHVEANLPAVSAENVQSLANGAGIAYLGKYIGTGITMLAAGAVGLMQGFSTANAVQAVARNPEAQPKILSTMIVG

LALAEAVAIYALIVSILIIFVA

Click on ‘Search for similar domain architectures’ button to access CDART tool

Pfam http://pfam.janelia.org/ Pfam protein families and domains database. Click on ‘Sequence search’, click on hit(s), then click on ‘Domain organisation’

Multalin Sequence Alignment http://bioinfo.genotoul.fr/multalin/multalin.html Aligns protein or DNA sequences (output in color) and draws simple phylogenetic trees.

ExPASy Translate Tool http://www.expasy.ch/tools/dna.html Translates a DNA sequence in all 6 frames.

Phylogeny.fr http://www.phylogeny.fr/ Web-based, robust phylogenetic analysis for the non-specialist or specialist

Targeting prediction (membranes, chloroplast, mitochondrion, vacuole, etc) and targeting peptide cleavage sites:

TMHMM http://www.cbs.dtu.dk/services/TMHMM/ Prediction of transmembrane helices in proteins.

TargetP http://www.cbs.dtu.dk/services/TargetP/

Predotar http://urgi.versailles.inra.fr/predotar/predotar.html

WoLF PSORT http://wolfpsort.org/

Seq2Ref http://prodata.swmed.edu/seq2ref/ Finds literature references relating to BlastP hits of an input protein sequence
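What a six-frame translation tool such as the ExPASy Translate Tool listed above does can be sketched as follows. This is a minimal illustration using the standard genetic code only; the function names are made up here, and the sketch does not try to reproduce the tool's output format or its handling of alternative genetic codes.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto AMINO_ACIDS.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate one reading frame; unrecognized codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(dna):
    """Return translations for frames +1..+3 (forward) and -1..-3 (reverse)."""
    dna = dna.upper().replace("U", "T")
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translate("ATGGCC").items():
        print(frame, protein)
```

Incomplete codons at the end of a frame are simply dropped, which is why the shifted frames of a short input give shorter peptides.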

METABOLIC PATHWAY RESOURCES

Swiss-Prot Enzyme http://ca.expasy.org/enzyme/ Enzyme nomenclature database (linked to SWISS-PROT protein database, BRENDA, KEGG, etc)

BRENDA http://www.brenda-enzymes.info/ Comprehensive enzyme database.

KEGG http://www.genome.jp/kegg/ The Kyoto Encyclopedia of Genes and Genomes. Includes metabolic pathways, and compound structures that can be captured.

BioCyc, EcoCyc, MetaCyc, YeastCyc http://BioCyc.org/ EcoCyc - Encyclopedia of E. coli Genes and Metabolism; MetaCyc - Metabolic Encyclopedia. Also computationally-derived pathway/genome databases.

AraCyc http://www.arabidopsis.org/biocyc/index.jsp Similar to BioCyc, for Arabidopsis. Software allows querying, graphical representation of pathways, and overlay of expression data on the biochemical pathway overview diagram.

MaizeCyc http://www.gramene.org/pathway/maizecyc.html

KEGG and the various Cyc databases have similar aims but each has features the others lack.

Beware!

All metabolic pathway databases have weaknesses:

- They may not be up-to-date and may have omissions and errors in their pathways – so they should be checked against the literature.

- Proteins are very often (for non-model organisms, almost always) assigned functions based solely on homology, but the database does not make clear that this is so.

- To reach firm conclusions it is therefore necessary to go to the literature to find whether a putative function has been authenticated biochemically or genetically.

PLANT GENOME RESOURCES

JGI http://genome.jgi.doe.gov/ Joint Genome Institute genome portal (all kingdoms of life)

TAIR http://www.arabidopsis.org/ The Arabidopsis information resource.

Maizesequence.org http://www.maizesequence.org/index.html Browser providing the latest sequence and annotation of the maize genome

PLANT TRANSCRIPTOME RESOURCES

Golm Transcriptome database http://csbdb.mpimp-golm.mpg.de/csbdb/dbxp/ath/ath_xpmgq.html Microarray data. Gives overview of expression, searches for co-responses.

ATTED http://atted.jp/ Microarray data. Searches for co-expression patterns in Arabidopsis (and also rice); shows gene networks, not just lists of correlated genes.

qteller http://qteller.com/ RNAseq data for maize, sorghum, rice. Simple tools for expression in various organs, correlation of expression of two genes.

PLANT PROTEOME RESOURCES

PPDB http://ppdb.tc.cornell.edu/ The Plant Proteome DataBase

SUBA3 http://suba.plantenergy.uwa.edu.au/ SUB-cellular location database for Arabidopsis proteins (includes GFP and MS-MS data)

PLANT PHENOME RESOURCES

SeedGenes http://www.seedgenes.org/ Genes that give a seed phenotype when disrupted by mutation.

Chloroplast2010 http://www.plastid.msu.edu/ Large set of phenotypic data for homozygous mutants in genes for chloroplast-targeted proteins.

RAPID http://rarge.gsc.riken.jp/phenome/ RIKEN Arabidopsis Phenome Information Database; phenotypic data in transposon-insertional mutants.

COMPARATIVE GENOMICS RESOURCES

STRING http://string.embl.de/ STRING is a database of known and predicted protein-protein relationships, derived from genomic context (fusions, conserved gene clusters, co-occurrence), high throughput experiments (co-expression), and the literature. STRING quantitatively integrates data from bacteria and other organisms.

SEED http://www.theseed.org/wiki/Main_Page Database with 3,000+ genomes, many analysis tools.

Very useful for gene cluster analysis.

Browsers compatible with SEED: Firefox (PC or Mac) or Safari (PC or Mac)

To request a SEED account: Go to http://rast.nmpdr.org/rast.cgi

* Click ‘Register a new account’, complete the form, hit ‘Request’ button * After an automated email reply, a password will be emailed.

USING METABOLIC PATHWAY RESOURCES

• SWISS-PROT ENZYME

Enzyme nomenclature database http://ca.expasy.org/enzyme/

ENZYME is a repository of information on enzyme nomenclature, with links to other databases. It describes enzymes that have been given an EC (Enzyme Commission) number, and the reactions they catalyze. It can be searched in various ways, e.g. by EC number, by common name, by substrate or product.

Example: alcohol dehydrogenase = EC 1.1.1.1 ENZYME entry page * Links to:

BRENDA (convenient entry point)

KEGG (Kyoto University Ligand Chemical Database) (maps – glycolysis)

PDB (protein structure database)

MetaCyc

Medline

Cloned enzymes in SwissProt (not exhaustive but curated, i.e. high quality)
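Because EC numbers are the usual key for searching ENZYME and the databases it links to, it can help to see their four-level structure spelled out. The sketch below splits an EC number into its levels and names the top-level class. The function names are made up for illustration; class 7 (translocases) is a later addition to the nomenclature than the examples in this handout.

```python
# Top-level EC classes (first digit of the EC number).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec):
    """Split 'EC 1.5.1.20' or '1.1.1.1' into a 4-tuple of levels.

    Levels written as '-' (partially classified enzymes) are returned as None.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4 or not parts[0].isdigit():
        raise ValueError(f"not an EC number: {ec!r}")
    return tuple(int(p) if p.isdigit() else None for p in parts)


def ec_class_name(ec):
    """Return the name of the top-level class for an EC number."""
    return EC_CLASSES.get(parse_ec(ec)[0], "Unknown class")


if __name__ == "__main__":
    for ec in ("EC 1.1.1.1", "1.5.1.20", "3.5.4.16"):
        print(ec, parse_ec(ec), ec_class_name(ec))
```

For example, alcohol dehydrogenase (EC 1.1.1.1) falls in class 1, the oxidoreductases.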

• BRENDA

Enzyme database http://www.brenda-enzymes.info/ BRENDA is an extensively referenced enzyme data information system; it includes data on substrate specificity, physical and kinetic characteristics, inhibitors, sources, cloning, purification etc.

Example: alcohol dehydrogenase EC 1.1.1.1

• KEGG

Kyoto Encyclopedia of Genes and Genomes http://www.genome.jp/kegg/

KEGG computerizes knowledge of molecular and cell biology in terms of pathways that consist of interacting molecules or genes and provides links from gene catalogs produced by genome sequencing. It covers regulatory pathways and molecular assemblies as well as metabolic pathways. Its metabolic pathway maps have links to the enzymes and compounds.

Example: KEGG PATHWAY * Metabolism of Cofactors and Vitamins - Folate biosynthesis * Note that all enzymes (EC numbers) and intermediates are clickable, e.g. * 3.5.4.16 and its * product (structure can be captured). Note that this is a composite metabolic scheme. It includes methanopterin biosynthesis (found only in methane-producing microbes) and tetrahydrobiopterin synthesis (found in animals). Note the pulldown table (top left) of folate biosynthesis enzymes in different organisms; when an organism is selected, the enzymes putatively encoded in its genome are colored green. Compare Arabidopsis and human.
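The organism comparison suggested above amounts to set arithmetic on EC numbers. The sketch below assumes the EC numbers for a pathway map and for each organism have already been collected by hand from the colored maps. The organism sets are invented placeholders, not real KEGG data for Arabidopsis or human, and the function name is made up.

```python
def compare_organisms(pathway, org_a, org_b):
    """Classify the EC numbers of a pathway map by which organism encodes them."""
    a = pathway & org_a  # pathway steps putatively encoded by organism A
    b = pathway & org_b  # pathway steps putatively encoded by organism B
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "missing_in_both": sorted(pathway - a - b),
    }


# Hypothetical inputs: which EC numbers each organism has is invented here.
PATHWAY = {"3.5.4.16", "4.1.2.25", "2.7.6.3", "2.5.1.15", "6.3.2.12", "1.5.1.3"}
ORGANISM_A = {"3.5.4.16", "4.1.2.25", "2.7.6.3", "2.5.1.15", "1.5.1.3"}
ORGANISM_B = {"3.5.4.16", "1.5.1.3", "1.1.1.1"}

if __name__ == "__main__":
    for category, ecs in compare_organisms(PATHWAY, ORGANISM_A, ORGANISM_B).items():
        print(f"{category}: {', '.join(ecs) or 'none'}")
```

Steps in "missing_in_both" are worth a second look: they may belong to a branch of the composite map found in other organisms, or they may be genuinely missing genes of the kind discussed in the introduction.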

• EcoCyc, MetaCyc, and AraCyc

http://BioCyc.org/

EcoCyc - Encyclopedia of E. coli Genes and Metabolism: Describes the genome and biochemical machinery of E. coli. Contains annotations of all E. coli genes, and their DNA sequences, and describes all known pathways of E. coli small-molecule metabolism. Each pathway and its component reactions and enzymes have detailed annotations, and are extensively referenced.

MetaCyc - Metabolic Encyclopedia: A metabolic-pathway database that describes pathways, reactions, and enzymes of various organisms, especially microbes. MetaCyc contains the E. coli pathways of EcoCyc, plus other pathways from the literature and on-line sources, with citations to the sources of pathways.

Example: MetaCyc * Search tab – Pathways * Search/Filter by ontology * Biosynthesis * Amino acid biosynthesis * Superpathway of phenylalanine and tyrosine biosynthesis * Note that all elements in pathway are clickable.

AraCyc (PlantCyc) at TAIR http://www.arabidopsis.org/biocyc/index.jsp

* Search tab – Pathways * Search/Filter by ontology * Biosynthesis * Amino acid biosynthesis * Superpathway of Lysine/Threonine/Methionine biosynthesis * Click ‘More detail’ 2x to display genes corresponding to pathway steps * Select species.

ORGANELLAR TARGETING

Targeting prediction

Example: 10-Formyltetrahydrofolate deformylase (PurU) is an enzyme found in E. coli and many other bacteria (e.g., the cyanobacterium Nostoc) that hydrolyzes 10-formyltetrahydrofolate, releasing formate.

The Arabidopsis genome encodes two homologs of E. coli PurU (At5g47435 and At4g17360).

>E_coli gi|548645|sp|P37051|PURU_ECOLI FORMYLTETRAHYDROFOLATE DEFORMYLASE (FORMYL-FH(4) HYDROLASE)

MHSLQRKVLRTICPDQKGLIARITNICYKHELNIVQNNEFVDHRTGRFFMRTELEGIFNDSTLLADLDSA

LPEGSVRELNPAGRRRIVILVTKEAHCLGDLLMKANYGGLDVEIAAVIGNHDTLRSLVERFDIPFELVSH

EGLTRNEHDQKMADAIDAYQPDYVVLAKYMRVLTPEFVARFPNKIINIHHSFLPAFIGARPYHQAYERGV

KIIGATAHYVNDNLDEGPIIMQDVIHVDHTYTAEDMMRAGRDVEKNVLSRALYKVLAQRVFVYGNRTIIL

>Nostoc gi|186681065|ref|YP_001864261.1| formyltetrahydrofolate deformylase [Nostoc punctiforme PCC 73102]

MMTNPTATLLISCPDQRGLVAKFANFIYSNGGNIIHADQHTDFAAGLFLTRIEWQLEGFNLPREFIAPAF

NAIAQPLSAKWEIRFSDTVPRIAIWVSRQDHCLFDLIWRQRAKEFVAEIPLIISNHANLKVVAEQFNIDF

QHVPITKDNKSEQEAQQLELLRQYKIDLVVLAKYMQIVSADFINQFSQIINIHHSFLPAFIGANPYHRAF

ERGVKIIGATAHYATADLDAGPIIEQDVVRVSHRDEVDDLVRKGKDLERVVLARAVRSHLQNRVLVYGNR

TVVFE

>At5g47435 gi|18422794|ref|NP_568682.1| formyltetrahydrofolate deformylase, putative [Arabidopsis thaliana]

MIRRITERASGFAKNIPILKSSRFHGESLDSSVSPVLIPGVHVFHCQDAVGIVAKLSDCIAAKGGNILGY

DVFVPENNNVFYSRSEFIFDPVKWPRSQVDEDFQTIAQRYGALNSVVRVPSIDPKYKIALLLSKQDHCLV

EMLHKWQDGKLPVDITCVISNHERASNTHVMRFLERHGIPYHYVSTTKENKREDDILELVKDTDFLVLAR

YMQILSGNFLKGYGKDVINIHHGLLPSFKGGYPAKQAFDAGVKLIGATSHFVTEELDSGPIIEQMVESVS

HRDNLRSFVQKSEDLEKKCLTRAIKSYCELRVLPYGTNKTVVF

>At4g17360 gi|15236046|ref|NP_193467.1| formyltetrahydrofolate deformylase, putative [Arabidopsis thaliana]

MIRRVSTTSCLSATAFRSFTKWSFKSSQFHGESLDSSVSPLLIPGFHVFHCPDVVGIVAKLSDCIAAKGG

NILGYDVLVPENKNVFYSRSEFIFDPVKWPRRQMDEDFQTIAQKFSALSSVVRVPSLDPKYKIALLLSKQ

DHCLVEMLHKWQDGKLPVDITCVISNHERAPNTHVMRFLQRHGISYHYLPTTDQNKIEEEILELVKGTDF

LVLARYMQLLSGNFLKGYGKDVINIHHGLLPSFKGRNPVKQAFDAGVKLIGATTHFVTEELDSGPIIEQM

VERVSHRDNLRSFVQKSEDLEKKCLMKAIKSYCELRVLPYGTQRTVVF

Targeting predictions for the At5g47435 and At4g17360 proteins using:

TargetP: http://www.cbs.dtu.dk/services/TargetP/ Paste in both Arabidopsis sequences * Check ‘Plant’, ‘Perform cleavage site predictions’

Predotar: http://urgi.versailles.inra.fr/predotar/predotar.html Paste in both Arabidopsis sequences

The prediction algorithms agree that both proteins are mitochondrial. To check this, align them with the bacterial PurU sequences using Multalin http://bioinfo.genotoul.fr/multalin/multalin.html

* Alignment shows that both Arabidopsis proteins have N-terminal extensions of ~35 residues (a typical size for a mitochondrial targeting peptide). * Align just the two Arabidopsis sequences – note that the N-terminal extensions are not conserved (typical of targeting sequences).
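The size of the N-terminal extensions can be roughly checked without a full alignment by measuring how far a shared landmark sits from the N-terminus in each protein. The sketch below uses the first 70 residues of each PurU sequence above and takes the first C[PQ]D motif as the landmark. That motif was picked by eye from these four sequences and is an assumption of this example, not an established PurU signature; a Multalin alignment remains the proper check.

```python
import re

# First 70 residues of each PurU sequence shown above.
N_TERMINI = {
    "E_coli": "MHSLQRKVLRTICPDQKGLIARITNICYKHELNIVQNNEFVDHRTGRFFMRTELEGIFNDSTLLADLDSA",
    "Nostoc": "MMTNPTATLLISCPDQRGLVAKFANFIYSNGGNIIHADQHTDFAAGLFLTRIEWQLEGFNLPREFIAPAF",
    "At5g47435": "MIRRITERASGFAKNIPILKSSRFHGESLDSSVSPVLIPGVHVFHCQDAVGIVAKLSDCIAAKGGNILGY",
    "At4g17360": "MIRRVSTTSCLSATAFRSFTKWSFKSSQFHGESLDSSVSPLLIPGFHVFHCPDVVGIVAKLSDCIAAKGG",
}

# Assumed landmark near the start of the conserved region (chosen by eye).
ANCHOR = re.compile(r"C[PQ]D")


def anchor_offset(seq):
    """Return the 0-based position of the first landmark motif in seq."""
    match = ANCHOR.search(seq)
    if match is None:
        raise ValueError("anchor motif not found")
    return match.start()


def n_terminal_extension(plant_seq, reference_seq):
    """Extra N-terminal residues in plant_seq relative to reference_seq."""
    return anchor_offset(plant_seq) - anchor_offset(reference_seq)


if __name__ == "__main__":
    reference = N_TERMINI["E_coli"]
    for name in ("Nostoc", "At5g47435", "At4g17360"):
        extra = n_terminal_extension(N_TERMINI[name], reference)
        print(f"{name}: {extra} extra N-terminal residues vs E. coli")
```

With these sequences the landmark lies at the same position in the two bacterial proteins and 33 and 38 residues further in for At5g47435 and At4g17360, in line with the extensions of ~35 residues seen in the alignment.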

Targeting – proteome databases with experimental findings

PPDB http://ppdb.tc.cornell.edu/ Click on ‘Accession’ * Paste AGI number(s) in box, e.g. At1g03475 (Coproporphyrinogen III oxidase) * Click on link(s) * Displays the proteomic evidence held in the database and in published work.

SUBA3 http://suba.plantenergy.uwa.edu.au/ Click Search tab * Paste AGI number(s) in lower box, e.g. At1g03475 * Click + ‘Arabidopsis Gene Initiative (AGI) identifier(s) is in list’, then ‘Query’ * Displays evidence.

PHYLOGENETIC TREES

Using Phylogeny.fr: Select ‘One Click’ mode * Paste in the four PurU sequences above * Click ‘Submit’ * Carries out, in sequence, alignment with MUSCLE, Maximum Likelihood tree-building with PhyML, and tree drawing with TreeDyn. The analysis runs the aLRT statistical test, which gives results similar to bootstrap procedures but is much faster. * Download the tree in preferred image format. ‘Advanced’ and ‘A la carte’ modes are available for experienced users.

MICROARRAY DATABASES

Microarrays: The 22K Affymetrix chip contains most Arabidopsis genes, so in principle it can be used to monitor the expression of almost all metabolic genes. However, many metabolic genes have low expression levels, and so cannot be monitored with confidence. Genes with low average expression levels tend to give large numbers of spurious co-expression matches. mRNA abundance in general correlates broadly with protein abundance and with in-vivo metabolic fluxes.

Therefore digital gene expression data can indicate which organs have a pathway and which do not, and whether a pathway is likely to be a major or minor one. Note also that primary metabolic pathways are expressed everywhere and always, and that secondary pathways by definition are not. Unexpected differences in expression may provide clues about genetic control of pathways, e.g. an enzyme whose transcript level varies more than that of others in the pathway (i.e. is highly regulated, not constitutive) may be a control point in the pathway.
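The idea that a highly regulated transcript may mark a control point can be made concrete by ranking the enzymes of a pathway by how much their expression varies across conditions. The sketch below uses the coefficient of variation (standard deviation divided by mean) as the measure. The gene names and expression values are invented, and the coefficient of variation is just one reasonable choice, not a method prescribed by any of the databases here.

```python
from statistics import mean, pstdev

# Hypothetical transcript levels (arbitrary units) for three enzymes of one
# pathway across five conditions; names and values are invented.
EXPRESSION = {
    "enzyme1": [100, 110, 95, 105, 90],
    "enzyme2": [20, 200, 15, 180, 25],
    "enzyme3": [50, 55, 45, 52, 48],
}


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    avg = mean(values)
    return pstdev(values) / avg if avg else float("inf")


def rank_by_variability(expression):
    """Return (gene, CV) pairs, most variable gene first."""
    cvs = {gene: coefficient_of_variation(vals) for gene, vals in expression.items()}
    return sorted(cvs.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for gene, cv in rank_by_variability(EXPRESSION):
        print(f"{gene}: CV = {cv:.2f}")
```

In the toy data enzyme2 stands out as far more variable than the other two and would be the first candidate for a regulated step, subject to the caveat above that weakly expressed genes give unreliable measurements.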

Microarray-based gene expression profiling using the Golm Transcriptome database http://csbdb.mpimp-golm.mpg.de/csbdb/dbxp/ath/ath_xpmgq.html

For an overview of expression in different organs and in different environmental conditions: On face page, paste in one or several AGI numbers e.g.

At3g12930

At5g47190

At2g39800

(At3g12930 is the plastid Iojap protein; At5g47190 is chloroplast ribosomal protein L19; At2g39800 is the first enzyme of proline biosynthesis)

* Scroll down to graphs. Note positive correlation between At3g12930 and At5g47190. Note induction of At2g39800 by stresses.

To search for positively correlated genes, go to Transcript Co-Response, Single Gene Query, paste in At5g47190 * Select a dataset (‘Matrix’), e.g. developmental series * Select an output, e.g. positive, top 100 of co-responding genes * Scroll down list of hits – note many strong correlations with other chloroplast ribosomal proteins, which associate together to form the protein complexes of the ribosome
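A co-response search of this kind boils down to correlating the query gene's expression profile with every other profile in the dataset and ranking the results. The sketch below uses the Pearson correlation coefficient and skips genes whose mean expression is below a cutoff, reflecting the earlier warning about spurious matches from weakly expressed genes. The gene names, values, and the cutoff of 10 are invented, and the actual statistics used by the Golm database may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression information
    return cov / (sx * sy)


def co_responders(query, expression, min_mean=10.0, top=100):
    """Rank other genes by correlation with the query, best first."""
    reference = expression[query]
    hits = []
    for gene, values in expression.items():
        # Skip the query itself and genes too weakly expressed to trust.
        if gene == query or sum(values) / len(values) < min_mean:
            continue
        hits.append((gene, pearson(reference, values)))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top]


# Hypothetical expression matrix (gene -> values across four samples).
EXPRESSION = {
    "query_gene": [10, 20, 30, 40],
    "geneA": [20, 40, 60, 80],
    "geneB": [40, 30, 20, 10],
    "geneC": [1, 2, 3, 4],
}

if __name__ == "__main__":
    for gene, r in co_responders("query_gene", EXPRESSION):
        print(f"{gene}: r = {r:+.2f}")
```

Here geneC tracks the query perfectly but is excluded because its expression is too low to be measured with confidence.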

Microarray-based gene expression profiling using ATTED http://atted.jp/

On face page Search box select ‘Gene ID’, paste an AGI number, e.g. At5g47190 in box, click ‘Search’ * Click on link (‘Target’ box summarizes targeting predictions) * Displays coexpressed gene network around At5g47190 * Note the many proteins related to plastid ribosomes * Click on coexpressed gene list for more coexpressed genes * Check all 4 boxes * Default ranking is by all datasets * Rankings in individual datasets (e.g. tissue type, abiotic stress) can also be displayed * In ‘Link’ column, graph icon displays correlation data points * Osa homolog column shows putative rice ortholog, clicking on link displays correlation list for rice genes * Note many ribosome associations of rice homolog of best Arabidopsis hit.

RNAseq-based gene expression profiling using qteller http://qteller.com/ Select maize, paste in GRMZM2G161299 or GRMZM2G420119 * Note expression profile by organ * Use these two genes for correlation analysis.

PLANT PHENOME DATABASES

Although less developed than phenotype databases for mutants in model microorganisms, there are several such resources for plants, and they are growing.

SeedGenes http://www.seedgenes.org/ Covers ~350 Arabidopsis genes that give a seed phenotype when disrupted by mutation. Click ‘Enter’, click ‘Access the SeedGenes Query Page’, ‘Browse genes’, search for AGI numbers in list.

Chloroplast2010 http://www.plastid.msu.edu/ Has morphological and metabolic phenotype data for >5,000 mutants in genes whose products are predicted to be chloroplast-targeted. In ‘Large scale phenomics data’ click ‘Here’ * In ‘Phenotypic Analysis Overview’ click ‘Here’ * Log in or sign up to get an account * In Search by Query Term(s) area, search by AGI number, e.g. At4g25050, At1g10310 (be sure to avoid blank spaces or empty lines) * Click on links to genes * See tabs for morphology, leaf amino acid profile, etc

RAPID http://rarge.gsc.riken.jp/phenome/ Phenotypic data in transposon-insertional mutants. Click ‘Line list’ * search for AGI number, e.g. At2g48120 * copy line code 11-2389-1 * Click on ‘Search’ * Paste line code into search box * Click ‘Search’ * displays image of albino seedling

USING GENOME RESOURCES TO FIND PLANT ENZYME GENES

This exercise demonstrates how to find Arabidopsis and maize genes encoding an enzyme, starting from the sequence of a bacterial enzyme, 5,10-methylenetetrahydrofolate reductase, EC 1.5.1.20 (MetF).

Go to Swiss-Prot Enzyme, enter 1.5.1.20 * Click on link to E. coli MetF * Capture FASTA sequence * Go to NCBI Protein BLAST search * Select Arabidopsis thaliana * Hits on MTHFR1 and MTHFR2 (At3g59970 and At2g44160). Note multiple entries for each gene * Capture full-length (about 590 residues) FASTA text sequences, save to Word file * Align in Multalin to confirm their very high similarity.

For maize homologs, go to Maizesequence.org, click ‘BLAST’ in header bar * Paste either Arabidopsis sequence in search box * Select ‘peptide queries’, ‘peptide database’, ‘Filtered gene set peptides’, ‘BLASTP’, search sensitivity ‘no optimization’, click ‘Run’ * In output, if necessary turn on all columns, select ‘E-val’ in Stats and <E-val in Sort By * To see alignments, click [A] * Very strong hit, GRMZM2G347056 (593 residues) on chromosome 1; also second hit, truncated (382 residues), GRMZM2G034278 on chromosome 5. (Third hit is a small fragment) * To capture protein sequences, click on GRMZM identifiers, ‘Protein sequence’, save to Word file.

Sequence alignment indicates that GRMZM2G034278 is distinct from GRMZM2G347056 and both Arabidopsis sequences in lacking ~200 residues at the C-terminus and in having a very different N-terminal region of ~80 residues. GRMZM2G034278 is thus almost certainly an incorrectly-called gene or a pseudogene.
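A quick length check on the captured FASTA sequences is a useful first screen for truncated gene models before aligning them. The sketch below parses FASTA text and flags any sequence shorter than a chosen fraction of a reference sequence. The function names, the toy sequences in the demo, and the 0.8 cutoff are illustrative assumptions.

```python
def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first word of each header line; blank lines are
    ignored, so sequences saved with empty lines between rows still work.
    """
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else f"record{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {ident: "".join(parts) for ident, parts in records.items()}


def flag_truncated(records, reference_id, min_fraction=0.8):
    """Return {identifier: length} for sequences much shorter than the reference."""
    ref_len = len(records[reference_id])
    return {
        ident: len(seq)
        for ident, seq in records.items()
        if len(seq) < min_fraction * ref_len
    }


if __name__ == "__main__":
    # Toy sequences standing in for the full-length records saved earlier.
    demo = ">full_length example\nMKVIEKILEA\n\nAGDGRTAFSF\n>short_model\nMKVIEK\n"
    records = parse_fasta(demo)
    print(flag_truncated(records, "full_length"))
```

Applied to the maize hits above with GRMZM2G347056 (593 residues) as the reference, the 382-residue GRMZM2G034278 is about 64% of the reference length and would be flagged.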

To check whether these genes are expressed, use GRMZM2G034278 and GRMZM2G347056 protein sequences in tBLASTn against maize ESTs:

50 exactly match GRMZM2G082463 (allowing for imperfections characteristic of EST sequences)

None appear to exactly match GRMZM2G034278

Therefore, since the predicted GRMZM2G034278 protein is truncated, and has no cognate ESTs (i.e. is not transcribed), it is most probably a pseudogene. Note that ~85% of the maize genome consists of hundreds of families of transposable elements. These are responsible for capture and amplification of many gene fragments.

>MetF 5,10-methylenetetrahydrofolate reductase [Escherichia coli str. K-12 substr. MG1655]

MSFFHASQRDALNQSLAEVQGQINVSFEFFPPRTSEMEQTLWNSIDRLSSLKPKFVSVTYGANSGERDRT

HSIIKGIKDRTGLEAAPHLTCIDATPDELRTIARDYWNNGIRHIVALRGDLPPGSGKPEMYASDLVTLLK

EVADFDISVAAYPEVHPEAKSAQADLLNLKRKVDAGANRAITQFFFDVESYLRFRDRCVSAGIDVEIIPG

ILPVSNFKQAKKFADMTNVRIPAWMAQMFDGLDDDAETRKLVGANIAMDMVKILSREGVKDFHFYTLNRA

EMSYAICHTLGVRPGL

>MTHFR1 gi|15232215|ref|NP_191556.1| methylenetetrahydrofolate reductase 1 [Arabidopsis thaliana]

MKVVDKIKSVTEQGQTAFSFEFFPPKTEDGVENLFERMDRLVSYGPTFCDITWGAGGSTADLTLEIASRM

QNVICVETMMHLTCTNMPIEKIDHALETIRSNGIQNVLALRGDPPHGQDKFVQVEGGFACALDLVNHIRS

KYGDYFGITVAGYPEAHPDVIEADGLATPESYQSDLAYLKKKVDAGADLIVTQLFYDTDIFLKFVNDCRQ

IGINCPIVPGIMPISNYKGFLRMAGFCKTKIPAELTAALEPIKDNDEAVKAYGIHFATEMCKKILAHGIT

SLHLYTLNVDKSAIGILMNLGLIDESKISRSLPWRRPANVFRTKEDVRPIFWANRPKSYISRTKGWNDFP

HGRWGDSHSAAYSTLSDYQFARPKGRDKKLQQEWVVPLKSIEDVQEKFKELCIGNLKSSPWSELDGLQPE

TKIINEQLGKINSNGFLTINSQPSVNAAKSDSPAIGWGGPGGYVYQKAYLEFFCSKDKLDTLVEKSKAFP

SITYMAVNKSENWVSNTGESDVNAVTWGVFPAKEVIQPTIVDPASFKVWKDEAFEIWSRSWANLYPEDDP

SRKLLEEVKNSYYLVSLVDNNYINGDIFSVFA

>MTHFR2 gi|18406468|ref|NP_566011.1| methylenetetrahydrofolate reductase 2 [Arabidopsis thaliana]

MKVIDKIQSLADEGKTAFSFEFFPPKTEDGVDNLFERMDRMVAYGPTFCDITWGAGGSTADLTLDIASRM

QNVVCVESMMHLTCTNMPVEKIDHALETIRSNGIQNVLALRGDPPHGQDKFVQVEGGFDCALDLVNHIRS

KYGDYFGITVAGYPEAHPDVIGENGLASNEAYQSDLEYLKKKIDAGADLIVTQLFYDTDIFLKFVNDCRQ

IGISCPIVPGIMPINNYRGFLRMTGFCKTKIPVEVMAALEPIKDNEEAVKAYGIHLGTEMCKKMLAHGVK

SLHLYTLNMEKSALAILMNLGMIDESKISRSLPWRRPANVFRTKEDVRPIFWANRPKSYISRTKGWEDFP

QGRWGDSRSASYGALSDHQFSRPRARDKKLQQEWVVPLKSVEDIQEKFKELCLGNLKSSPWSELDGLQPE

TRIINEQLIKVNSKGFLTINSQPSVNAERSDSPTVGWGGPVGYVYQKAYLEFFCSKEKLDAVVEKCKALP

SITYMAVNKGEQWVSNTAQADVNAVTWGVFPAKEIIQPTIVDPASFNVWKDEAFETWSRSWANLYPEADP

SRNLLEEVKNSYYLVSLVENDYINGDIFAVFADL

>GRMZM2G347056

MKVIEKILEAAGDGRTAFSFEYFPPKTEEGVENLFERMDRMVAHGPSFCDITWGAGGSTA

DLTLEIANRMQNMVCVETMMHLTCTNMPVEKIDHALETIKSNGIQNVLALRGDPPHGQDK

FVQVEGGFACALDLVQHIRAKYGDYFGITVAGYPEAHPDAIQGEGGATLEAYSNDLAYLK

RKVDAGADLIVTQLFYDTDIFLKFVNDCRQIGITCPIVPGIMPINNYKGFLRMTGFCKTK

IPSEITAALDPIKDNEEAVRQYGIHLGTEMCKKILATGIKTLHLYTLNMDKSAIGILMNL

GLIEESKVSRPLPWRPATNVFRVKEDVRPIFWANRPKSYLKRTLGWDQYPHGRWGDSRNP

SYGALTDHQFTRPRGRGKKLQEEWAVPLKSVEDISERFTNFCQGKLTSSPWSELDGLQPE

TKIIDDQLVNINQKGFLTINSQPAVNGEKSDSPTVGWGGPGGYVYQKAYLEFFCAKEKLD

QLIEKIKAFPSLTYIAVNKDGETFSNISPNAVNAVTWGVFPGKEIIQPTVVDHASFMVWK

DEAFEIWTRGWGCMFPEGDSSRELLEKVQKTYYLVSLVDNDYVQGDLFAAFKI

>GRMZM2G034278

MCMLLRKDSGHYLAIVVYVKCCSLEEERRKERIPTELMRSFILTSHTAPGRAPAASSICN

DRTRRRAELLSSYIYNSSTKVCVETMMHLTCTNMPVEKIDHALETIKFNGIHNVLALRGD

PPHGQDKFVQVEGGFACALDLVQHIRSKYGDYFGITVAGYPEAHPDAIQGEGGATLEAYS

NDLAYLKRKVDAGADLIVTQLFYDTDIFLKFVNDCRQIGITCPIVPGIMPINNYKGFMRM

TGFCKTKIPSEITAALDPIKDNEEAVRAYGIHLGTEMCKKIIASGIKTLHLYTLNVDKSA

LGILMNLGLIEESKVSRSLPWRPATNVFRVKEVVRPIFWASRPKSYLKRTLGWDQYPHEG

GVILETHHMEHLGIVHKTTWTW
