Structure Alignment
Michael Schroeder
BioTechnological Center, TU Dresden
ms@biotec.tu-dresden.de
www.biotec.tu-dresden.de

Content
- Motivation
- Some basics
- Double Dynamic Programming

PART I: Motivation

Motivation: Conformational changes
- Upon ligand binding, structures may change
- Structural alignment can highlight the changes

Conformational changes: Small GTPases
- Small GTPases act as molecular switches to control and regulate important functions and pathways within the cell
- Activated by guanine nucleotide exchange factors (GEFs)
- Inactivated by GTPase activating proteins (GAPs)

G proteins: Conformational change in GTP-bound and GDP-bound state

Open and closed conformation of citrate synthase (1cts, 5cts)
- Open: oxaloacetate; closed: oxaloacetate and coenzyme A
- Loop between two helices moves by 6 Å and rotates by 28º; some atoms move by 10 Å

Hinge motion in Lactoferrin (1lfh, 1lfg)
- Lactoferrin is an iron-binding protein found in secretions such as milk or tears
- Rotation of 54º upon iron binding

Motivation: (Distant) Relatives
- Sequence similarity may be low, but structural similarity can still be high
- Picture from www.jenner.ac.uk/YBF/DanielleTalbot.ppt

Distant relatives
- Globins occur widely
- Primary function: binding oxygen
- Assembly of helices surrounding haem group

Relatives
- Sperm whale myoglobin (1mbd) and lupin leghaemoglobin (2lh7)

Relatives
- Actinidin (2act) and Papain (9pap)
- Sequence identity 49%, rmsd 0.77 Å
- Same family: Papain-like

Relatives
- Plastocyanin (5pcy) and azurin (2aza)
- Core of structure is conserved

Relatives
- Structure classifications like CATH and FSSP use structural alignments to identify superfamilies.

Motivation: Convergent Evolution

Sequence similarity: low

>1cse Subtilisin
AQTVPYGIPLIKADKVQAQGFKGANVKVAVLDTGIQA
SHPDLNVVGGASFVAGEAYNTDGNGHGTHVAGTVAAL
DNTTGVLGVAPSVSLYAVKVLNSSGSGSYSGIVSGIE
WATTNGMDVINMSLGGASGSTAMKQAVDNAYARGVVV
VAAAGNSGNSGSTNTIGYPAKYDSVIAVGAVDSNSNR
ASFSSVGAELEVMAPGAGVYSTYPTNTYATLNGTSMA
SPHVAGAAALILSKHPNLSASQVRNRLSSTATYLGSS
FYYGKGLINVEAAAQ
>1acb Chymotrypsin
CGVPAIQPVLSGLSRIVNGEEAVPGSWPWQVSLQDKT
GFHFCGGSLINENWVVTAAHCGVTTSDVVVAGEFDQG
SSSEKIQKLKIAKVFKNSKYNSLTINNDITLLKLSTA
ASFSQTVSAVCLPSASDDFAAGTTCVTTGWGLTRYTN
ANTPDRLQQASLPLLSNTNCKKYWGTKIKDAMICAGA
SGVSSCMGDSGGPLVCKKNGAWTLVGIVSWGSSTCST
STPGVYARVTALVNWVQQTLAAN

Structural similarity: low
- Figure: 1CSE:E, 1ACB:E

Convergent Evolution
- c.41.1 and b.47.1 share interaction partners
- Families shown in the figure:
  - c.41.1 Subtilisin-like
  - b.47.1 Trypsin-like serine proteases
  - d.40.1 CI-2 family of serine protease inhibitors
  - d.58.3 Protease propeptides/inhibitors
  - d.84.1 Subtilisin inhibitor
  - g.15.1 Ovomucoid/PCI-1 like inhibitor
  - c.56.5 Zn-dependent exopeptidase

Convergent Evolution
- 1oyv: Ovomucoid/PCI-1 like inhibitor, g.15.1 (top); Subtilisin-like, c.41.1 (bottom)
- 4sgb: Ovomucoid/PCI-1 like inhibitor, g.15.1 (top); Trypsin-like serine proteases, b.47.1.2 (bottom)

Convergent Evolution: Aligned structures
- 1cse: CI-2 family of serine protease inhibitors, d.40.1 (top); Subtilisin-like, c.41.1 (bottom)
- 1acb: CI-2 family of serine protease inhibitors, d.40.1 (top); Trypsin-like serine proteases, b.47.1.2 (bottom)

Catalytic Triad
- Same two sequences as under "Sequence similarity: low": 1cse Subtilisin and 1acb Chymotrypsin

Convergent evolution
- Figure: interaction partners A, B, C
- A and B are native, C is viral
- Henschel et al., Bioinformatics 2006

HIV Nef mimics kinase in binding SH3
- Kinase (Src haematopoietic cell kinase, catalytic domain)
- Comparison of Nef-SH3 and intra-chain interaction of catalytic domain and SH3 of Hck; PDBs: 1efn and 2hck
- No evidence of homology between Nef and kinase
- Figure labels: HIV1-Nef, Fyn-SH3/Hck-SH3
- Henschel et al., Bioinformatics 2006

Automatic calculation of equivalent residues
- Figure: Nef, Kinase
- Apart from the PxxP motif, matches: Arg71/Lys249, Phe90/His289
- Residues with equivalents are strictly conserved in HIV Nef
- Henschel et al., Bioinformatics 2006

Mimicry of baculovirus p35 and human inhibitor of apoptosis
- Caspase (red), p35 (yellow), IAP (green)
- Upon infection the cell starts its apoptosis programme; p35 tries to stop it
- Henschel et al., Bioinformatics 2006

Mimicry of Capsids and Cyclophilin
- HIV capsid protein (yellow), Cyclophilin (red, green)
- Cyclophilin A restricts HIV infectivity
- Upon mutation of cyclophilin or inhibition with cyclosporin, infectivity goes up >100 (Towers, Nature Medicine, 2003)
- Henschel et al., Bioinformatics 2006

PART II: Some basics

What do we need?
- Two main operations to align structures:
  - Translation
  - Rotation
- How to evaluate a structural alignment?
  - Root mean square deviation, rmsd

Basic Operations: Translation (figure)

Basic Operations: Rotation (figure)

Root Mean Square Deviation
- What is the distance between two points a with coordinates $x_a$ and $y_a$ and b with coordinates $x_b$ and $y_b$?
- Euclidean distance: $d(a,b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}$
- And in 3D?

Root Mean Square Deviation
- In a structure alignment the score measures how far the aligned atoms are from each other on average
- Given the distances $d_i$ between n aligned atoms, the root mean square deviation is defined as
  $rmsd = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}$

Quality of Alignment and Example
- Unit of RMSD: e.g. Ångstroms
- Identical structures: RMSD = 0
- Similar structures: RMSD is small (1–3 Å)
- Distant structures: RMSD > 3 Å

PART III: Dynamic Programming

A very simple algorithm…
…to align identical structures with conformational changes
- Generate a sequence alignment (not necessary if both sequences are really 100% identical)
- Compute center of mass for both structures
- Move both structures so that the centers of mass are the origin
- Compute the angle between all aligned residues
- Rotate structure by median of all angles
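The Euclidean distance (here already in 3D) and the rmsd definition from Part II can be sketched in a few lines of Python. This is only an illustration: the function names `euclidean_distance` and `rmsd` are made up for this sketch, and the two coordinate lists are assumed to be already aligned and superposed, so that the i-th atom of one structure corresponds to the i-th atom of the other.

```python
import math


def euclidean_distance(a, b):
    """Distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def rmsd(coords_a, coords_b):
    """Root mean square deviation over n aligned atom pairs.

    Assumption: coords_a[i] is aligned with coords_b[i] and both
    structures are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equally long, non-empty coordinate lists")
    squared = [euclidean_distance(a, b) ** 2 for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))
```

Identical coordinate lists give an rmsd of 0, in line with the "Quality of Alignment" slide.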
Question: How to compute the center of mass and move it to the origin?
- Assume n atoms with coordinates $(x_1, y_1, z_1)$ to $(x_n, y_n, z_n)$ (for one structure)
- Center of mass:
  $(x_{CoM}, y_{CoM}, z_{CoM}) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i,\ \frac{1}{n}\sum_{i=1}^{n} y_i,\ \frac{1}{n}\sum_{i=1}^{n} z_i\right)$
- Move to the origin: for all i, do $x_i := x_i - x_{CoM}$, $y_i := y_i - y_{CoM}$, $z_i := z_i - z_{CoM}$

Question: Why rotate by the median and not the mean of all angles?

A refinement: Alternating alignment and superposition
1. P = initial alignment (e.g. based on sequence alignment)
2. Superpose structures A and B based on P
3. Generate distance-based scoring matrix R from superposition
4. Use dynamic programming to align A and B using scoring matrix R
5. P' = new alignment derived from dynamic programming step
6. If P' is different from P, then go to step 2 again

Distance-based scoring matrix
- Let $d(A_i, B_j)$ be the Euclidean distance between $A_i$ and $B_j$
- Let t be the upper distance limit for residues to be rewarded
- The scoring matrix R is defined as follows:
  $R(A_i, B_j) = \frac{1}{d(A_i, B_j)} - \frac{1}{t}$
  if $R(A_i, B_j)$ > max. score, then $R(A_i, B_j)$ = max. score
- The gap/mismatch penalty is set to 0
- Questions: What size does PAM have? What size does R have?

Example (figure)
- $R(A_i, B_j) = \frac{1}{d(A_i, B_j)} - \frac{1}{t}$ for t = 1/10 and max. score = 2

PART IV: Double dynamic programming (chapter 9)

Double dynamic programming
- Goal: Simultaneously align and superpose structures
- Double dynamic programming is a heuristic which tries to achieve this goal
- Implemented as part of SSAP (used e.g. by CATH)

Idea of double dynamic programming
- Use two levels of dynamic programming:
  - High level, which summarises the low-level DP
  - Low level, which generates an alignment based on the assumption that $a_i$ and $b_j$ are part of an optimal alignment

Low level matrix
- ${}^{ij}R$ is the low-level scoring matrix assuming the pair $a_i$ and $b_j$ are aligned
- ${}^{ij}R_{kl}$ is the score showing how well $a_k$ fits onto $b_l$ under the constraint that $a_i$ and $b_j$ are aligned
- Perform dynamic programming for all pairs i, j using ${}^{ij}R$ with the constraint that the optimal alignment includes (i, j)

Example (figures)
- Question: How was max. score set in this example?

Summary
- Structural alignments are useful
  - to study conformational changes,
  - to classify domains into families (DDP is used in CATH),
  - to study proteins with distant relationships and hence low sequence similarity
- Algorithms
  - Basic operations: translate and rotate
  - Simple algorithm based on dynamic programming
  - Double dynamic programming: low-level programming using a substitution matrix based on residue distance; aggregation of best paths for high-level programming
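As a recap of the translation step and of the distance-based scoring matrix R from Part III, a minimal Python sketch might look as follows. It covers only centering, building R, and one dynamic programming pass with gap penalty 0; the superposition (rotation) step and the double dynamic programming of Part IV are not implemented. All function names are hypothetical, the values t = 10 and max. score = 2 are assumptions chosen for illustration, and giving a distance of exactly 0 the maximum score is also an assumption, since the slides do not cover that case.

```python
import math


def center(coords):
    """Translate a structure so that its center of mass is the origin."""
    n = len(coords)
    com = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return [tuple(c[k] - com[k] for k in range(3)) for c in coords]


def scoring_matrix(A, B, t=10.0, max_score=2.0):
    """R(Ai, Bj) = 1/d(Ai, Bj) - 1/t, capped at max_score.

    Assumption: t and max_score are illustrative values, and d = 0
    is scored as max_score to avoid dividing by zero.
    """
    R = []
    for a in A:
        row = []
        for b in B:
            d = math.dist(a, b)
            row.append(max_score if d == 0 else min(1.0 / d - 1.0 / t, max_score))
        R.append(row)
    return R


def align(R):
    """Global dynamic programming with gap penalty 0.

    Returns the aligned index pairs (i, j) of the best-scoring path.
    """
    n, m = len(R), len(R[0])
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + R[i - 1][j - 1], S[i - 1][j], S[i][j - 1])
    # Trace back from the bottom-right corner; only rewarded pairs are kept.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        r = R[i - 1][j - 1]
        if r > 0 and S[i][j] == S[i - 1][j - 1] + r:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

In the alternating refinement, the pairs returned by `align` would play the role of P', which is then compared with P to decide whether to superpose again. Note that R has one row per residue of A and one column per residue of B, so its size depends on the two structures being compared.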