Bioinformatics Master Course: Sequence Alignment
Lecture 10: Pattern matching, part I

Sequence Patterns vs. Protein Structure
I. Protein-Protein interaction
  1. enzyme (protein) - substrate: serine protease trypsin
  2. receptor (protein) - ligand: growth hormone receptor
  3. antibody (protein) - antigen: immunoglobulin (Ig)
II. Protein-Ion and small molecule interaction
  1. protein - ion (Ca2+, Mg2+, Na+, K+, Cl–, HCO3–, SO42–): calmodulin
  2. pump - ion, coupled to enzymatic function: ATPase
  3. channel - water: aquaporin
III. Protein-DNA/RNA interaction
  1. enzyme - DNA: Eco-RI, ribozyme
  2. binder - DNA groove: leucine zipper, zinc finger
  3. regulator - RNA: KH domain

Reactions and Interactions
• What is the difference between a reaction and an interaction?
  – a reaction changes the chemical bonding; an interaction does not
• Which one of these is a chemical bond?
  1. H3C-CH2-O-H
  2. Na+ Cl–
  3. H-O-H···OH2
  4. H-O-CH2-CH3···H3C-CH2-O-H

Bond Strength
• Bond strength and lifetime are a function of temperature
  – vibration (bond stretching), thermal background
• Non-covalent interactions depend strongly on the medium
  – compare a salt crystal with a salt solution
• Interaction strength has a strong distance dependence
  – ion-ion ~ r^-2, dipole-dipole ~ r^-4, quadrupole-quadrupole ~ r^-6
  – these exponents describe the force between fixed charges or multipoles; the corresponding energies fall off as r^-1, r^-3 and r^-5

Binding: Complementary Interfaces
• Binding requires complementary interfaces
• Interfaces have characteristic and conserved residues, which form patterns or motifs

Sequence Patterns and Profiles
• Comparison between sequence pattern matching (PATTERN) and similarity scoring (SCORE), in the order shown on the slide:
  – exact word
  – identity
  – regular expression
  – weight matrix
  – Hidden Markov Model
  – profile
  – generalized profile
  – general Hidden Markov Model

Resources
• PROSITE: biologically significant sites, patterns and profiles
  – www.ebi.ac.uk/ppsearch/
• PFAM: large collection of multiple sequence alignments
  – www.sanger.ac.uk/Software/Pfam/
• DIP: interacting proteins
  – dip.doe-mbi.ucla.edu/
• Specialized Databases
  – Immunoglobulins: imgt.cines.fr/
  – Ca2+-binding proteins: structbio.vanderbilt.edu/cabp_database/
• Molecular visualisation packages
  – VMD: www.ks.uiuc.edu/Research/vmd/
  – MOLMOL: www.mol.biol.ethz.ch/wuthrich/software/molmol/
  – Rasmol: www.umass.edu/microbio/rasmol/

Protein-Protein Interactions

Protein Interaction Networks
• Most proteins are functionally linked to other proteins
H Jeong, SP Mason, A-L Barabási & ZN Oltvai, "Lethality and centrality in protein networks", Nature 2001;411(6833):41

I.1 Enzyme: Serine Protease Trypsin
• Proteases are a specific class of hydrolases
  – cleave peptide bonds at specific residue positions
    • aspartate proteases, cysteine proteases, serine proteases
[Figure: reaction scheme of peptide bond hydrolysis by trypsin (peptide + H2O)]
• Trypsin is a serine protease
  – cleaves the peptide bond on the C-terminal side of the basic residues Lys and Arg
  – one of the three principal digestive proteases
    • the other two are pepsin and chymotrypsin
  – produced in an inactive form by the pancreas
• Pattern: His57, Asp102 and Ser195 (H-D-S), the catalytic triad

Serine Protease: Trypsin
• Pattern: His57, Asp102 and Ser195 (H-D-S)

Principle of Catalysis
http://www.chemguide.co.uk/physical/basicrates/catalyst.html

Trypsin Complex with Inhibitor
1btc.pdb

I.2 Receptor: Growth Hormone Receptor
• Membrane-borne receptors:
  – extracellular domain
    • ligand-binding site
  – transmembrane domain
    • anchoring in the cell membrane
  – intracellular domain
    • typically a kinase or another signalling module
• Receptor for growth hormone
  – member of the cytokine receptor superfamily
  – dimerizes upon binding growth hormone as ligand
  – activates an intracellular kinase, which triggers a cellular signalling cascade
• Most structures contain only the extracellular or the intracellular domain
  – the transmembrane domain is difficult to crystallize
• Patterns:
  – YGEFS (growth hormone receptor)
  – WSxWS (cytokine receptor family)

Growth Hormone Receptor Complex with Growth Hormone
1a22.pdb

I.3 Immune System: Antibody
• Antibodies (immunoglobulins, or Ig)
  – immune system: bind ‘foreign’ (non-self) characteristic structures
    • e.g. protein surfaces
• Heavy Chain and Light Chain
• Constant part (Fc) and Variable part (Fv)
  – Fv: specific recognition of the target molecule (‘antigen’)
• Structure called ‘Ig fold’:
  – two β-sheets face-to-face, with ‘Greek-key’ motif
  – binding site between two Ig folds
  – hypervariable loops participate in binding:
    • H1, H2, H3 and L1, L2, L3
    • composition characteristic for the antigen

Pfam Ig Family Alignment

Patterns of Hypervariable Loops
Loop   | Before | After | Length
CDR-L1 | always Cys | always Trp | 10 to 17
CDR-L2 | generally Ile-Tyr, also Val-Tyr, Ile-Lys, Ile-Phe | - | always 7
CDR-L3 | always Cys | always Phe-Gly-xxx-Gly | 7 to 11
CDR-H1 | always Cys-xxx-xxx-xxx | always Trp | 10 to 12
CDR-H2 | typically Leu-Glu-Trp-Ile-Gly | [Lys/Arg]-[Leu/Ile/Val/Phe/Thr/Ala]-[Thr/Ser/Ile/Ala] | 16 to 19
CDR-H3 | always Cys-xxx-xxx | always Trp-Gly-xxx-Gly | 3 to 25

Antibody Structure
1F3R.pdb
Kontou et al., Eur J Biochem 2000;267:2389

Antibody Diversity
• Gene translocation
• Heavy chain
  – one of multiple VH gene segments joins with one DH and one JH segment
• Light chain
  – one of multiple VL gene segments joins with one JL segment
www.cat.cc.md.us/courses/bio141/lecguide/unit3/humoral/antibodies/abydiversity/abydiversity.html

Protein-Ion and Protein-‘small molecule’ Interactions

II.1 Ion Binding: Calmodulin
• Two domains, each with two ‘EF-hands’:
  – helix-loop-helix structure
  – loop contains the Ca2+-binding motif
• Ca2+ ion: 6-fold coordinated:
  – oxygens from residues 1, 3, 5, 7, 9, and 12 in the EF loop: D-K-D-G-D-G-T-I-T-T-K-Q
  – one water molecule
  – three are negatively charged
• Ca2+ binding changes the conformation of the entire protein from closed to open
  – the open conformation exposes hydrophobic surface area
  – binding site for calmodulin target proteins

Calmodulin Complex with Calcium Ions
1exr.pdb

II.2 Ion Pump: Calcium ATPase (ATP synthase)
• Protein complex
  – links electrical potential to ATP hydrolysis/synthesis
  – interconversion between mechanical and electrochemical energy in molecular motors
• F1Fo ATPase: reversible proton pump/motor
• P-type ATPases: transport ions across a membrane against a concentration gradient
  – Pattern: D-K-T-G-T-[LIVM]-[TIS]
  – located at the aspartate that is phosphorylated during the reaction cycle
• Na+/K+-ATPase: ubiquitous membrane transport protein in mammalian cells
  – maintains high K+ and low Na+ in the cytoplasm for normal membrane potentials and cellular activities
• Ca-ATPases: transport Ca2+ from the cytoplasm to organelles (mammalian)
  – e.g. sarcoplasmic reticulum, endoplasmic reticulum

ATPases
• F1Fo-ATPase: www.rpi.edu/dept/bcbp/molbiochem/MBWeb/mb1/part2/f1fo.htm
• Ca2+-ATPase: www.utoronto.ca/maclennan/rint1.htm

ATPase: Calcium Ions in Active Site
1eul.pdb

II.3 Membrane Channel: Aquaporin
• Conserved NPA motifs: Asn, Pro and Ala stabilise loops through multiple hydrogen bonds
Bert de Groot: www.mpibpc.mpg.de/groups/de_groot/bgroot.html

Aquaporin: Motifs
• NPA: stabilizes loops B and E
• G(a)xxxG(a)xxG(a):
  – crossing of right-hand helical bundles
Andreas Engel and Henning Stahlberg, in: Current Topics in Membranes (2001), Hohmann, Agre & Nielsen (Eds.)
Academic Press

Aquaporin Subunit
1j4n.pdb
Bert de Groot: www.mpibpc.mpg.de/groups/de_groot/bgroot.html

Protein-DNA/RNA Interactions

III.1 Enzyme: Eco-RI
• Restriction enzyme:
  – cuts palindromic sequences
  – complex of one DNA molecule with two Eco-RI molecules with inversion symmetry
www.accessexcellence.org/RC/VL/GG/restriction.html

Eco-RI
1qrh.pdb

III.2a DNA Recognition: Leucine Zipper
• Dimer
  – held together by Leu interactions
  – binds DNA with a fork-shaped structure
• ‘Coiled-coil’ structure:
  – leucines on one side of the helix
  – 7-residue repeat (positions a b c d e f g); one helix turn is 3.6 residues
  – residues 256 to 279: K V E E L L S K N Y H L E N E V A R L K K L V G

Leucine Zipper: Complex with DNA
1an2.pdb

Leucine Zipper: 7-Residue Repeat

III.2b DNA Recognition: Zinc Finger Proteins
• Zinc coordinates several side chains
  – pulls them together to form ‘finger’ loops
• Pattern: C-x(2-4)-C-x(12-15)-H-x(3-5)-H or C-x(2-4)-C-x(12-15)-C
  – recognize nucleic acids (DNA or RNA)
• Modulate genes (proteins can also be targeted)
• Modulate important functions:
  – gene expression
  – reverse transcription and virus assembly
• Drug discovery targets:
  – pathogen-specific 3D structures
  – different from endogenous (cellular) zinc finger proteins

Zinc Finger Complex with DNA
1a1h.pdb

III.3 RNA Regulation: KH Domain
• Bind to specific DNA/RNA locations
  – regulation of RNA synthesis and metabolism
  – combination with other domains
  – Pattern: G-x-x-G
• Ribonucleoprotein (RNP) domain
• Double-stranded RNA binding domain (dsRBD)
• K Homology (KH) domain
  – recognizes tetranucleotide motifs
  – high affinity/specificity:
    • RNA secondary structure
    • repeated sequence elements
• alpha/beta fold similar to ribosomal proteins

KH Domain Complex with RNA
1k1g.pdb

The Hammerhead Motif of Ribozyme (HHRz)
Przybilski, R., et al.
Plant Cell 2005;17:1877-1885. Copyright ©2005 American Society of Plant Biologists

Hammerhead Motif of Ribozyme
• Three base-paired helices (I-III)
• Core of 11 highly conserved, non-complementary nucleotides
  – necessary for catalysis
• Catalytic motif discovered by sequence comparison of plant viroids
  – site-specific, self-catalyzed cleavage (Birikh, 1997)
academic.brooklyn.cuny.edu/chem/zhuang/QD/toppage1.htm

Hammerhead Ribozyme Action
488d.pdb

Modeling of the Arabidopsis HHRz Ara2
Przybilski, R., et al. Plant Cell 2005;17:1877-1885
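
Sketch: Matching the Lecture's Patterns with Regular Expressions
• The residue patterns quoted in this lecture (WSxWS, D-K-T-G-T-[LIVM]-[TIS], the zinc finger pattern) can be read as regular expressions.
• The Python sketch below is one possible way to do this, not a reference implementation. It assumes a small subset of PROSITE-style syntax: single residues, x for any residue, [..] residue classes, and (n) or (n,m) repeat counts. The zinc finger pattern is therefore written as x(2,4) rather than x2-4.
• The function names prosite_to_regex and find_motif and all test sequences are hypothetical illustrations; none are taken from a database or library.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (assumed subset) into a Python regex."""
    element_re = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")
    parts = []
    for element in pattern.split("-"):
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        # 'x' means any residue; letters and [..] classes are already valid regex.
        token = "." if residue == "x" else residue
        if low is not None:
            # (n) becomes {n}; (n,m) becomes {n,m}.
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


def find_motif(pattern: str, sequence: str):
    """Return (1-based start position, matched substring) for every hit.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    print(prosite_to_regex("C-x(2,4)-C-x(12,15)-H-x(3,5)-H"))
    print(find_motif("W-S-x-W-S", "AAWSEWSGG"))               # [(3, 'WSEWS')]
    print(find_motif("D-K-T-G-T-[LIVM]-[TIS]", "MMDKTGTLTQQ"))  # [(3, 'DKTGTLT')]
```

• A regular expression only answers yes or no for each position. The weight matrix, profile and Hidden Markov Model approaches listed under Sequence Patterns and Profiles assign a score instead.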