Computational analysis of membrane proteins implicated in metal transport in Arabidopsis thaliana

CIAVVLCLVFMSVEVVGGIKANSLAILTDAAHLLSDVAAFAISLFSLWAAGWEATPRQTYGFFRIEILGALVSIQLIWLLT
ALFLLINTAYMVVEFVAGFMSNSLGLISDACHMLFDCAALAIGLYASYISRLPANHQYNYGRGRFEVLSGYVNAVFLVLVG
CFVVVLCLLFMSIEVVCGIKANSLAILADAAHLLTDVGAFAISMLSLWASSWEANPRQSYGFFRIEILGTLVSIQLIWLLT
LIAVLLCAIFIVVEVVGGIKANSLAILTDAAHLLSDVAAFAISLFSLWASGWKANPQQSYGFFRIEILGALVSIQMIWLLA
---IFLYLIVMSVQIVGGFKANSLAVMTDAAHLLSDVAGLCVSLLAIKVSSWEANPRNSFGFKRLEVLAAFLSVQLIWLVS

Stefanie Hartmann
Max Planck Institute for Molecular Plant Physiology
Supervisors: Joachim Selbig, Ute Krämer

12 membrane proteins involved in metal transport in Arabidopsis

Metal transporters are of great importance because…
…they provide an adequate supply of essential trace metals
…they prevent an excess of these potentially toxic ions

In silico analyses may help design further experiments for
• basic research on metal homeostasis
• development of new ways of phytoremediation

Cation Diffusion Facilitator (CDF) proteins
also referred to as cation efflux (CE) proteins
• occur in archaea, bacteria, eukaryotes
• are involved in transporting heavy metals (Co2+, Cd2+, Zn2+, Ni2+)
• the CDF family of proteins had 13 members in 1997
• the CE Pfam family had 348 members in July 2003 and 426 in January 2004

CDF signature sequence:
S X (ASG) (LIVMT)2 (SAT) (DA) (SGAL) (LIVFYA) (HDN) X3 D X2 (AS)

The Arabidopsis thaliana CDF protein family
         locus       signature region       match to signature
CDF1     At2g46800   S LAILTDAAHLLS D VAA   exact match
CDF2     At3g61940   S LAILADAAHLLT D VGA   exact match
CDF3     At3g58810   S LAILTDAAHLLS D VAA   exact match
CDF4     At2g29410   S LAVMTDAAHLLS D VAG   1 mismatch
CDF5     At2g04620   S LGLISDACHMLF D CAA   1 mismatch
CDF6     At2g47830   S TAIIADAAHSVS D VVL   1 mismatch
CDF7     At2g39450   S LAIIASTLDSLL D LLS   2 mismatches
CDF8     At1g16310   S MAVIASTLDSLL D LLS   2 mismatches
CDF9     At1g79520   S MAVIASTLDSLL D LLS   2 mismatches
CDF10    At3g58060   S IAIAASTLDSLL D LMA   3 mismatches
CDF11    At3g12100   R VGLVSDAFHLTF G CGL   3 mismatches
CDF12    At1g51610   S HVIMAEVVHSVA D FAN   4 mismatches
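
The mismatch counts against the CDF signature can be reproduced with a few lines of code. The sketch below is only an illustration: it assumes that "X" means any residue, that a parenthesized group means one of the listed residues, and that a trailing number is a repeat count. The names SIGNATURE, count_mismatches and best_window are hypothetical helpers, not part of any tool used in this work.

```python
from typing import Optional, Sequence, Tuple

# Assumed position-wise reading of the slide's signature notation:
# S X (ASG) (LIVMT)2 (SAT) (DA) (SGAL) (LIVFYA) (HDN) X3 D X2 (AS)
# None stands for "X" (any residue); a string lists the allowed residues.
SIGNATURE: Sequence[Optional[str]] = (
    "S", None, "ASG", "LIVMT", "LIVMT", "SAT", "DA", "SGAL",
    "LIVFYA", "HDN", None, None, None, "D", None, None, "AS",
)


def count_mismatches(window: str) -> int:
    """Count signature positions whose residue is not in the allowed set."""
    if len(window) != len(SIGNATURE):
        raise ValueError("window must have exactly %d residues" % len(SIGNATURE))
    return sum(
        1
        for residue, allowed in zip(window, SIGNATURE)
        if allowed is not None and residue not in allowed
    )


def best_window(sequence: str) -> Tuple[int, int]:
    """Return (start, mismatches) of the best-matching window in a longer sequence.

    Ties are resolved in favor of the leftmost window.
    """
    width = len(SIGNATURE)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the signature")
    candidates = (
        (count_mismatches(sequence[start:start + width]), start)
        for start in range(len(sequence) - width + 1)
    )
    mismatches, start = min(candidates)
    return start, mismatches


if __name__ == "__main__":
    # Signature regions as listed on the slide (spaces removed).
    for region in ("SLAILTDAAHLLSDVAA", "SLAVMTDAAHLLSDVAG", "SHVIMAEVVHSVADFAN"):
        print(region, count_mismatches(region))
```

Scanning a full-length protein with best_window locates the most signature-like stretch even when no exact match exists, which is the situation for most of the 12 sequences.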

Research questions:
• Can all 12 proteins be classified as CDF proteins, i.e., are there predicted structural and functional similarities among these 12 Arabidopsis proteins?
  secondary structure prediction, inclusion in membrane and transporter databases, evaluation of common motifs, etc.
• What are the relationships of the 12 Arabidopsis proteins to each other and to other published sequences?
  intron/exon structure, phylogenetic reconstructions
• Is it possible to predict the 3D structure of these proteins?
  fold recognition by threading

Sequence retrieval - four ambiguous sequences
TIGR Arabidopsis thaliana database
TAIR: The Arabidopsis Information Resource
MIPS Arabidopsis thaliana genome database
• different assignment of introns, use of alternative start codons

Sequence analysis - three additional ambiguous sequences
SWALL, Pfam vs.
TIGR/TAIR/MIPS
• insertions and deletions, different amino acid sequence
Cloning and RT-PCR revealed correct sequences for six of the seven ambiguous CDFs

Inclusion in membrane and transport databases
cation efflux, Pfam entry PF01545
AMPL: Arabidopsis Membrane Protein Library
         AMPL   ARAMEMNON   Transport Protein Database   PlantsT
CDF1     ✓      ✓           ✓                            ✓
CDF2     ✓      ✓           ✓                            ✓
CDF3     ✓      ✓           ✓                            ✓
CDF4     ✓      ✓           ✓                            ✓
CDF5     (✓)    (✓)         ✓                            -
CDF6     (✓)    (✓)         ✓                            ✓
CDF7     (✓)    (✓)         ✓                            -
CDF8     (✓)    (✓)         -                            -
CDF9     (✓)    (✓)         -                            -
CDF10    (✓)    ✓           -                            ✓
CDF11    (✓)    ✓           ✓                            ✓
CDF12    (✓)    ✓           -                            ✓

Hidden Markov models used for secondary structure prediction
[Figure: model layout with cytoplasmic side, membrane, non-cytoplasmic side]
• states (loops, transmembrane domains, etc.) are defined
• states are connected in a biologically reasonable way (transitions)
• each state has a specific probability distribution over the 20 amino acids
• each transition has a specific transition probability
• amino acid probabilities and transition probabilities are learned
• models are first trained on a training set; the trained model is then used for the prediction

Results of secondary structure predictions
         number of TMD   N-terminus within cytoplasm
CDF1     6               2/3
CDF2     6               3/3
CDF3     6               2/3
CDF4     5-6             2/3
CDF5     6 (14)          3/3
CDF6     0-6             1/3
CDF7     4-6             2/3
CDF8     5-6             3/3
CDF9     5-6             3/3
CDF10    4-6             2/3
CDF11    6               3/3
CDF12    4-6             3/3
Methods: TMHMM v2 (Sonnhammer et al. 1998), HMMTOP v2 (Tusnady and Simon, 1998, 2001), Memsat2 (Jones et al. 1994, McGuffin et al. 2000)
[Figure labels: CDF signature, CE signature]

Prediction of subcellular localization
mTP: mitochondrial targeting peptide
cTP: chloroplast transit peptide
SP: signal peptide (ER/secretory pathway)

Prediction of subcellular localization - methods
• N-terminal sorting signals display characteristic amino acid compositions
• sequence-based methods predicting N-terminal sorting signals are based on this observation
              predicts        method
TargetP       mTP, cTP, SP    neural network-based
iPSORT        mTP, cTP, SP    decision list
Predotar      mTP, cTP        neural network-based
SignalP NN    SP              neural network-based
SignalP HMM   SP              based on hidden Markov models

Prediction of subcellular localization - results
[Table: TargetP, iPSORT, Predotar, SignalP NN and SignalP HMM predictions for CDF1-CDF12. Recoverable entries: CDF2: 3/4; CDF6: mTP, cTP, cTP, cTP, mTP, cTP*, mTP*, 1/4; CDF8: 2/4*, Y*; unassigned: mTP, mTP]

Exon structure of the CDF proteins
         # of exons
CDF1     1
CDF2     1
CDF3     1
CDF4     1
CDF5     1
CDF11    9
CDF6     12
CDF12    13
CDF7     6
CDF8     6
CDF9     7
CDF10    5

Gene organization of the CDF proteins
[Figure: gene organization of CDF1-CDF12]

Phylogenetic Relationships within Cation Transporter Families of Arabidopsis. Plant Physiology 2001; 126 (4): 1646-1667
[Figure: published tree containing CDF6, CDF11, CDF4, CDF3, CDF2, CDF10, CDF12, CDF1; omitted: CDFs 5, 7, 8, 9]

Phylogenetic analysis of the Arabidopsis CDF proteins
[Tree figure, upper part: AtCDF4, AtCDF3, AtCDF1, AtCDF2 (group I); AtCDF12]
[Tree figure, lower part: AtCDF6; AtCDF10, AtCDF7, AtCDF9, AtCDF8 (group II); AtCDF5, AtCDF11; RmCzcD]

Phylogenetic analysis of sequences containing the CE signature
Sequences in the tree, top to bottom:
Escherichia coli ZITB; Ralstonia metallidurans CZCD; Ralstonia metallidurans CZCD; Mus musculus ZNT4; Rattus norvegicus ZNT2; Mus musculus ZNT3;
Arabidopsis thaliana CDF4; Arabidopsis thaliana CDF2; Eucalyptus grandis; Arabidopsis thaliana CDF3; Arabidopsis thaliana CDF1;
Thlaspi caerulescens ZTP1; Thlaspi goesingense MTP1; Thlaspi goesingense MTP1; Thlaspi goesingense MTP1;
Oryza sativa; Zea mays; Lotus japonicus; Medicago truncatula; Oryza sativa; Triticum aestivum;
Caenorhabditis elegans CDF1; Rattus norvegicus ZNT1; Schizosaccharomyces pombe ZHF; Saccharomyces cerevisiae COT1; Saccharomyces cerevisiae ZRC1;
Oryza sativa; Stylosanthes hamata MTP1; Oryza sativa; Arabidopsis thaliana CDF10; Arabidopsis thaliana CDF7;
Stylosanthes hamata MTP4; Stylosanthes hamata MTP2; Stylosanthes hamata MTP3; Oryza sativa; Arabidopsis thaliana CDF8; Arabidopsis thaliana CDF9;
Arabidopsis thaliana CDF6; Saccharomyces cerevisiae Mmt2; Saccharomyces cerevisiae Mmt1; T. thermophilus czrB; Oryza sativa; Arabidopsis thaliana CDF12;
Homo sapiens ZNT5; Homo sapiens ZTL1; Homo sapiens ZNT7; Homo sapiens ZNT6; Arabidopsis thaliana CDF11; S. cerevisiae MSC2; Oryza sativa; Arabidopsis thaliana CDF5;
Staphylococcus aureus; Staphylococcus aureus; Bacillus stearothermophilus; Bacillus stearothermophilus
Clade annotations:
• Arabidopsis group I sequences, monocot and dicot sequences, mammalian metal transporters
• Arabidopsis group II sequences, monocot and dicot sequences, prokaryotic and eukaryotic sequences
• several two-domain proteins
• outgroup

working model: topology of Arabidopsis CDF proteins
[Figure: membrane topology with CDF signature sequence, cell exterior/organelle, cytoplasm, N- and C-terminus]

Information derived from the 3D structure of a protein
• assignment of function
• guide mutagenesis experiments
• ligand and functional sites
• evolutionary relationships
• residue solvent exposure
• putative interaction sites

Structure determination
1. Classical approaches
• X-ray crystallography
• NMR spectroscopy
2. Computational approaches
• comparative (“homology”) modeling
• fold recognition (“threading”)
• ab initio methods

The basis of fold recognition (“threading”)
The number of folds occurring in nature is limited:
[Figure: PDB statistics, http://www.rcsb.org/pdb/holdings.html]
There are many sequences with no significant sequence identity but with the same or similar folds
…HEAIDHKPKLTGMKTGRVVSSMKSNFFADLP…
…HDGRSSMTRFSRYFRKTGRVSEYYKKQERLLE…

Fold recognition methods
aim: to find an optimal sequence-structure alignment
1. “threading” of an unknown target sequence into the backbone structure of template proteins of known structure
………CLVFMSVEVVGGIKANSLAILTD………
2. evaluation of the compatibility between target sequence and proposed 3D structure, using environment-based or knowledge-based mean force potentials
[Figure label: 4.99 Å]
3. Output: a list of folds (sorted or unsorted), their “compatibility score”, sometimes other information such as SCOP descriptors, alignment, rudimentary 3D model of the query protein, raw scores, solvation energy for the model, links

No new insights regarding the structure of CDF proteins
• Membrane proteins are significantly under-represented in structural databases, and therefore also in fold libraries
• If there is no fold similar to the native fold of the target protein, this approach cannot succeed.
• Threading methods cannot be used for modeling of transmembrane proteins

Will the 3D structure of CDFs be available soon?
• for fold recognition methods to be used successfully, significantly more 3D structures of membrane proteins are needed
• fold recognition methods specifically for integral membrane proteins may eventually be developed
• crystallization of bacterial homologs and subsequent extrapolation of structural features as an alternative?
• approach for globular proteins: predicting a protein’s solubility and propensity to crystallize, based on results from high-throughput structure determination

Can threading results be used as an independent way to verify group assignment?
Were some structural hits specific for any of the CDF groups?
1. Which hits were common to which of the CDF sequences?
2. “Phylothreading”
[Figure: schematic dot matrices, rows 1-5, one for each of the two approaches]

Which hits were common to which of the CDF sequences?
Structural hits predicted
• for most CDF sequences
• for group I sequences
• for group II sequences
• for CDF5 and CDF11
• for CDF6 and CDF12
[Figure: dot matrix of structural hits 1…170 against CDF sequences 1-12]
The results did not provide evidence to verify group assignments based on other methods

“Phylothreading”
[Figure: tree of CDF1, CDF2, CDF3, CDF4, CDF5, CDF6, CDF7, CDF8l, CDF8s, CDF9, CDF10, CDF11, CDF12; support values shown: 78, 99, 68]
Phylothreading results can neither verify nor refute group assignments based on other methods

Threading: non-transmembrane CDF fragments
[Figure: topology model with cell exterior/organelle, cytoplasm, N- and C-terminus]
• N-terminus
• histidine-rich loop between TMD 4 and 5
• C-terminus

“Phylothreading”: CDF C-terminal fragments
[Figure: tree of CDF1-CDF12 with group I and group II marked; support values shown: 67, 69, 68, 79, 85]
“Phylothreading” results confirm the assignment of CDF sequences to groups that was based on independent methods

Conclusions
• The 12 Arabidopsis protein sequences reveal structural and therefore probably functional conservation
• My results support
the classification of these proteins as CDF metal transporters
• I propose that the CDF protein family of A. thaliana contains two groups, each containing at least four proteins that are structurally and functionally closely related
• Threading methods cannot be used for transmembrane proteins or for their non-transmembrane domains (yet)
• Threading results for multiple sequences may be used to confirm (or find?) relationships among these sequences (“phylothreading”)
• I was able to evaluate and compare a number of online tools that are available for the analysis of sequence data

Conclusions
1. Sequence retrieval revealed conflicting information for 7 of the 12 proteins
2. The 12 Arabidopsis protein sequences reveal striking structural and therefore probably functional conservation
3. My results support the classification of these proteins as CDF metal transporters
4. I propose that the CDF protein family of A. thaliana contains two groups, each containing four proteins that are structurally and functionally closely related
5. I was able to evaluate and compare a variety of online tools available for the analysis of sequence data
6. Threading methods cannot be used for transmembrane proteins or for their non-transmembrane domains (yet)
7. Threading results for multiple sequences can be used to confirm (or find?)
relationships among these sequences (“phylothreading”)

METHODS

Phylogenetic analysis: tree-building methods
• distance-based methods: the overall distances between all pairs of sequences are calculated and then used to calculate a tree (Neighbor Joining)
• character-based methods: the individual substitutions among the sequences are used to determine the most likely ancestral relationships (Maximum Parsimony, Maximum Likelihood)
• Bayesian inference of phylogenies
...CLVFMSVEVVGGIKANSLAILTD...
...NTAYMVVEFVAGFMSNSLGLISD...
...CLLFMSIEVVCGIKANSLAILAD...
...CAIFIVVEVVGGIKANSLAILTD...
...YLIVMSVQIVGGFKANSLAVMTD...

Phylogenetic analysis: statistical evaluation of trees
• bootstrap analysis: how much support exists for particular branches in a phylogeny?
1. tree construction, determination of the “best” tree
2. bootstrap datasets (pseudosamples) are created from the original dataset by random sampling with replacement
3. tree construction using the bootstrap datasets
4. comparison of the bootstrap tree with the inferred tree
5. this is repeated several hundred times
6. bootstrap value: percentage of times an interior branch in the bootstrap tree was the same as the one in the inferred tree

Fold recognition methods
2. evaluation of the compatibility between target sequence and proposed 3D structure
• using environment-based mean force potentials (Bowie, Fischer, Eisenberg: 1991-1996)
  - residue positions are categorized into environment classes
  - the 3D protein structure is converted into a 1D sequence
  - an alignment of this 1D string to the target sequence is generated
• using knowledge-based mean force potentials (Sippl: 1990-1995)
  - information is automatically learned from databases of protein structures
  - pairwise interactions between structurally adjacent residues are calculated
  - transformation of mean force potentials as a function of distance
Mean force potentials: distance-dependent forces that act between atoms/residues (electrostatic and van der Waals interactions, influences of the surrounding medium on these interactions, contacts between two or three amino acids, angles between residue pairs, …)
[Figure labels: fold library, query sequence, 4.99 Å]

Threading methods used
• UCLA-DOE Fold Server: P. Mallick et al., 2002 (BLAST, PSI-BLAST, SDP, DASEY)
• Threader: D.T. Jones et al., 1992
• mGenThreader: L.J. McGuffin & D.T. Jones, 2003
• 3D-PSSM: L.A. Kelley et al., 2000
• Arby: I. Sommer et al., unpublished (PSI-BLAST, 123D, Jprop)

Selection of structural hits for further analysis
• UCLA-DOE: top 10 structural hits are returned, all were kept
• Threader: compatibility of the target sequence with all 2000 available templates is evaluated; lists were sorted by Z-value, approximately 10-20 best hits were kept
• mGenThreader: top 20 structural hits are returned, all were kept
• 3D-PSSM: top 20 structural hits are returned, all were kept
• Arby: a list of the 10-20 best scores is returned; the corresponding hits were extracted from a large table

Evaluation of the top score for each CDF sequence
[Figure: top scores per method, plotted against each method's significance scale. Panels: UCLA (scores: no guidelines), Threader, mGenThreader, 3D-PSSM. Scale labels: very poor score, poor score, borderline significant, significant, very significant; guess, low confidence, medium confidence, high confidence, certain; worthy of attention, highly confident]

There is no consensus of top fold predicted by different methods
example: top two structural hits for CDF1
Threader:      1ONE phosphopyruvate hydrolase; 1C3Q thiazole kinase
mGenThreader:  1L8M his-rich protein (model); 1QGR importin beta
UCLA-DOE:      1B8F histidine ammonia-lyase; 1HFA clathrin assembly protein
3D-PSSM:       1PW4 glycerol-3-phosphate transporter; 1KPW green cone pigment
Arby:          1HZX bovine rhodopsin; 1EZV yeast cytochrome bc1

No new insights regarding the structure of CDF proteins
• Membrane proteins are significantly under-represented in structural databases, and therefore also in fold libraries
• If there is no fold similar to the native fold of the target protein, this approach cannot succeed.
• Threading methods cannot be used for modeling approaches

Threading results: C-termini
1. Structural information
   no information on domains for metal transport available.
   BUT: several of the returned hits are proteins in which bound metals have structural or catalytic roles
2. Verification of group assignment
   i. Hits predicted for more than one C-terminus: 48 folds
      specific for group I: 3
      specific for group II: 2
      specific for CDF5 and CDF11: 2
   ii. “Phylothreading”

Positions of conserved domains and signature sequences
[Figure: positions of TMD I-VI, the CDF signature, the Pfam CE signature and BLOCKS (eMOTIF) motifs in the CDF proteins, shown in the order 1, 2, 3, 4, 5, 11, 6, 12, 7, 8, 9, 10; motif labels: 10, 11; 11, 12; 6-12]

Arabidopsis CDF proteins
[Figure: tree of AtCDF1, AtCDF2, AtCDF3, AtCDF4, AtCDF5, AtCDF11, AtCDF6, AtCDF12, AtCDF9, AtCDF8, AtCDF10, AtCDF7 and an outgroup]
group I:
- contain his-rich region between TMD 4 and 5
- one member is confirmed to transport Zn ions
- gene structure conserved (no introns)
no group assignment:
- CDF6, CDF12: possibly distant common ancestry and mitochondrial localization
- CDF5, CDF11: close relationship also in PFAM tree
group II:
- lack the his-rich region between TMD 4 and 5
- proteins may transport Mn ions
- C-terminal regions differ from group I sequences

Phylogenetic analysis of sequences containing the CE signature
[Figure: tree of the CE-signature sequences with three support values per branch and clades labeled I-V]

Phylogenetic analysis: tree-building methods
• maximum parsimony methods: the best tree topology minimizes the total amount of evolutionary change that has occurred
• distance methods: the best tree topology minimizes the total distance among taxa
• maximum likelihood methods: given a particular substitution model and a particular tree, how likely are the observed data?
...CLVFMSVEVVGGIKANSLAILTD...
...NTAYMVVEFVAGFMSNSLGLISD...
...CLLFMSIEVVCGIKANSLAILAD...
...CAIFIVVEVVGGIKANSLAILTD...
...YLIVMSVQIVGGFKANSLAVMTD...

Inclusion in membrane and transport databases
cation efflux, Pfam entry PF01545
AMPL: Arabidopsis Membrane Protein Library
         AMPL                      ARAMEMNON                     Transport Protein Database   PlantsT
CDF1     CDF                       zinc transporter              CDF                          CDF
CDF2     CDF                       putative MTP                  CDF                          CDF
CDF3     CDF                       putative MTP                  CDF                          CDF
CDF4     CDF                       putative MTP                  CDF                          CDF
CDF5     singleton (CDF related)   putative cation transporter   CDF                          -
CDF6     singleton                 unknown protein               CDF                          CDF
CDF7     family                    unknown protein               CDF                          -
CDF8     family                    hypothetical protein          -                            -
CDF9     family                    unknown protein               -                            -
CDF10    family                    putative MTP                  -                            CDF
CDF11    singleton                 putative MTP                  CDF                          CDF
CDF12    singleton                 putative MTP                  -                            CDF
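
The bootstrap procedure outlined on the METHODS slides (resample the alignment columns with replacement, rebuild, compare, repeat) can be sketched in a few lines. This is a deliberately reduced illustration, not the analysis behind the trees shown here: instead of rebuilding a full tree for every pseudosample, it only asks whether the two closest sequences of the original alignment are still the closest pair. The p-distance (fraction of differing positions) is an assumed distance measure, and all function names are hypothetical.

```python
import random
from typing import List, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(seqs: List[str]) -> Tuple[int, int]:
    """Indices (i < j) of the pair with the smallest p-distance; ties go to the lowest indices."""
    pairs = [
        (p_distance(seqs[i], seqs[j]), i, j)
        for i in range(len(seqs))
        for j in range(i + 1, len(seqs))
    ]
    _, i, j = min(pairs)
    return i, j


def bootstrap_columns(seqs: List[str], rng: random.Random) -> List[str]:
    """One pseudosample: alignment columns drawn at random with replacement."""
    length = len(seqs[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in seqs]


def bootstrap_support(seqs: List[str], replicates: int = 200, seed: int = 0) -> float:
    """Percentage of pseudosamples in which the original closest pair is recovered."""
    rng = random.Random(seed)
    reference = closest_pair(seqs)
    hits = sum(
        closest_pair(bootstrap_columns(seqs, rng)) == reference
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


if __name__ == "__main__":
    # Alignment fragments shown on the METHODS slides.
    alignment = [
        "CLVFMSVEVVGGIKANSLAILTD",
        "NTAYMVVEFVAGFMSNSLGLISD",
        "CLLFMSIEVVCGIKANSLAILAD",
        "CAIFIVVEVVGGIKANSLAILTD",
        "YLIVMSVQIVGGFKANSLAVMTD",
    ]
    print(closest_pair(alignment), bootstrap_support(alignment))
```

With fragments this short, support values fluctuate strongly between runs with different seeds, which is why several hundred replicates on the full alignment are used in practice.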