Bioinformatics Master Course
Sequence Alignment
Lecture 10
Pattern matching
part I
1
Sequence Patterns vs.
Protein Structure
I. Protein-Protein interaction
1. enzyme ↔ (protein) substrate : serine protease trypsin
2. receptor ↔ (protein) ligand : growth hormone receptor
3. antibody ↔ (protein) antigen : immunoglobulin (Ig)
II. Protein-Ion and small molecule interaction
1. protein ↔ ion (Ca2+, Mg2+, Na+, K+, Cl–, HCO3–, SO42–) : calmodulin
2. pump ↔ ion, coupled to enzymatic function : ATPase
3. channel ↔ water : aquaporin
III. Protein-DNA/RNA interaction
1. enzyme ↔ DNA : Eco-RI, ribozyme
2. binder ↔ DNA groove : leucine zipper, zinc finger
3. regulator ↔ RNA : KH domain
2
Reactions and Interactions
• What is the difference between a reaction and an interaction?
→ change in chemical bonding
• Which one of these is a chemical bond?
1. H3C-CH2-O-H
2. Na+ Cl–
3. H-O-H···OH2
4. H-O-CH2-CH3···H3C-CH2-O-H
3
Bond Strength
• Bond strength and lifetime are a function of temperature
→ vibration (bond stretching), thermal background
• Non-covalent interactions depend very much on the medium
→ compare salt crystal with salt solution
• Interaction strength (force) has a strong distance dependence
→ ion-ion ~ r^-2
→ dipole-dipole ~ r^-4
→ quadrupole-quadrupole ~ r^-6
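To get a feel for how quickly these interactions fade, the sketch below evaluates the listed power laws when the separation doubles. It takes the exponents exactly as given on the slide and ignores all prefactors (charges, dipole moments, dielectric constant of the medium). The name `relative_strength` is made up for this illustration.

```python
# Exponents n of the r**(-n) distance dependence, as listed on the slide.
EXPONENTS = {"ion-ion": 2, "dipole-dipole": 4, "quadrupole-quadrupole": 6}

def relative_strength(exponent, factor):
    """Strength at distance factor*r relative to the strength at r."""
    return factor ** (-exponent)

# How much of each interaction is left when the distance doubles?
falloff = {name: relative_strength(n, 2.0) for name, n in EXPONENTS.items()}
for name, value in falloff.items():
    print(f"{name:22s} {value:.6f}")
```

The higher multipole terms drop off so fast that they only matter at close contact, which is one reason why binding interfaces have to fit tightly.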
4
Binding: Complementary Interfaces
Binding requires complementary interfaces:
Interfaces have characteristic and conserved residues
→ patterns or motifs
5
Sequence Patterns and Profiles
• Comparison between sequence pattern matching and similarity scoring

PATTERN                SCORE
exact word             identity
regular expression     weight matrix
Hidden Markov Model    profile
generalized profile    general Hidden Markov Model
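The difference between the two views can be sketched in a few lines: a regular expression gives a yes/no answer, while a weight matrix gives a graded score. The aligned motif instances below are invented for illustration, and the log-odds scheme (uniform background, pseudocount of 1) is one simple assumption among many possible ones, not the scheme of any particular database.

```python
import math
import re

# Toy alignment of 5-residue motif instances (invented for illustration).
ALIGNED = ["WSAWS", "WSEWS", "WSDWS", "WTEWS"]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# Pattern view: a regular expression either matches or it does not.
PATTERN = re.compile(r"WS.WS")

def build_weight_matrix(seqs, pseudocount=1.0):
    """Column-wise log-odds scores against a uniform background."""
    background = 1.0 / len(ALPHABET)
    matrix = []
    for column in zip(*seqs):
        total = len(column) + pseudocount * len(ALPHABET)
        matrix.append({
            aa: math.log((column.count(aa) + pseudocount) / total / background)
            for aa in ALPHABET
        })
    return matrix

def score(matrix, word):
    """Score view: sum the per-position weights of the word."""
    return sum(col[aa] for col, aa in zip(matrix, word))

matrix = build_weight_matrix(ALIGNED)
for word in ["WSEWS", "WTAWS", "AAAAA"]:
    print(word, bool(PATTERN.fullmatch(word)), round(score(matrix, word), 2))
```

The near miss WTAWS is rejected outright by the pattern but still scores well above the unrelated word, which is the practical argument for moving from patterns to profiles.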
6
Resources
• PROSITE: biologically significant sites, patterns and profiles
– www.ebi.ac.uk/ppsearch/
• PFAM: large collection of multiple sequence alignments
– www.sanger.ac.uk/Software/Pfam/
• DIP: interacting proteins
– dip.doe-mbi.ucla.edu/
• Specialized Databases
– Immunoglobulins: imgt.cines.fr/
– Ca2+-binding proteins: structbio.vanderbilt.edu/cabp_database/
• Molecular visualisation packages
– VMD: www.ks.uiuc.edu/Research/vmd/
– MOLMOL: www.mol.biol.ethz.ch/wuthrich/software/molmol/
– Rasmol: www.umass.edu/microbio/rasmol/
7
Protein-Protein
Interactions
8
Protein Interaction Networks
Most proteins are functionally linked to other proteins
H Jeong, SP Mason, A-L Barabási & ZN Oltvai "Lethality and
centrality in protein networks" Nature 2001;411(6833):41
9
I.1 Enzyme:
Serine Protease Trypsin
• Specific class of hydrolases
– cleave peptide bonds at specific residue positions.
• aspartate proteases, cysteine proteases, serine proteases
[Figure: reaction scheme of peptide bond hydrolysis by trypsin (peptide + H2O, cleaved into two fragments)]
• Trypsin is a serine protease
– cleaves peptide bonds on the C-terminal side of the basic residues Lys and Arg
– one of the three principal digestive proteases
• other two are pepsin and chymotrypsin
– produced in an inactive form by the pancreas
• Pattern: His57, Asp102 and Ser195 (H-D-S)
10
Serine Protease: Trypsin
• Pattern: His57, Asp102 and Ser195 (H-D-S)
11
Principle of Catalysis
http://www.chemguide.co.uk/physical/basicrates/catalyst.html
12
Trypsin Complex with Inhibitor
1btc.pdb
13
I.2 Receptor:
Growth Hormone Receptor
• Membrane-borne receptors:
– extra-cellular domain
• ligand-binding site
– transmembrane domain
• anchoring in the cell membrane
– intracellular domain
• kinase or another signalling module (typically)
• Receptor for growth hormone
– member of the cytokine receptor superfamily
– dimerizes upon binding growth hormone as ligand
– activates intracellular kinase, triggers cellular signalling cascade.
• Most structures only contain extra/intracellular domain
– transmembrane domain is difficult to crystallize
• Patterns:
– YGEFS (growth hormone receptor)
– WSxWS (cytokine receptor family)
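Both patterns translate directly into regular expressions, with "x" read as "any residue". The sketch below scans a sequence for them; the helper name `find_motifs` and the test fragment are made up for this example (the fragment is not a real receptor sequence).

```python
import re

# Motifs from the slide, written as regular expressions ("x" = any residue).
MOTIFS = {
    "YGEFS (growth hormone receptor)": re.compile(r"YGEFS"),
    "WSxWS (cytokine receptor family)": re.compile(r"WS.WS"),
}

def find_motifs(sequence):
    """Return {motif name: [1-based start positions]} for every motif hit."""
    return {
        name: [m.start() + 1 for m in regex.finditer(sequence)]
        for name, regex in MOTIFS.items()
    }

# Made-up fragment, not a real receptor sequence.
toy = "MKTAYGEFSELLPWSDWSAA"
hits = find_motifs(toy)
for name, positions in hits.items():
    print(name, positions)
```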
14
Growth Hormone Receptor Complex
with Growth Hormone
1a22.pdb
15
I.3 Immune System: Antibody
• Antibodies (immunoglobulins, or Ig)
– immune system: bind ‘foreign’ (non-self) characteristic structures
• e.g. protein surfaces
• Heavy Chain and Light Chain
• Constant part (Fc) and Variable part (Fv).
– Fv: specific recognition of target molecule (‘antigen’)
• structure called ‘Ig fold’:
– Two β-sheets face-to-face, with ‘Greek-key’ motif
– binding site between two Ig folds
– hypervariable loops participate in binding:
• H1, H2, H3 and L1, L2, L3
• composition characteristic for antigen
16
Pfam Ig Family Alignment
17
Patterns of Hypervariable Loops
Loop     Before                                              After                                                 Length
CDR-L1   always Cys                                          always Trp                                            10 to 17
CDR-L2   generally Ile-Tyr, also Val-Tyr, Ile-Lys, Ile-Phe   -                                                     always 7
CDR-L3   always Cys                                          always Phe-Gly-xxx-Gly                                7 to 11
CDR-H1   always Cys-xxx-xxx-xxx                              always Trp                                            10 to 12
CDR-H2   typically Leu-Glu-Trp-Ile-Gly                       [Lys/Arg]-[Leu/Ile/Val/Phe/Thr/Ala]-[Thr/Ser/Ile/Ala] 16 to 19
CDR-H3   always Cys-xxx-xxx                                  always Trp-Gly-xxx-Gly                                3 to 25
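Rules of this kind can be turned into a search pattern. As a minimal sketch, the code below locates CDR-L3 from its flanks: a Cys directly before the loop, Phe-Gly-x-Gly directly after it, and a loop length taken as 7 to 11 residues. The function name `find_cdr_l3` and the light-chain fragment are illustrative assumptions; real antibody numbering tools use more context than this.

```python
import re

# CDR-L3: preceded by Cys, followed by Phe-Gly-x-Gly, 7 to 11 residues long.
CDR_L3 = re.compile(r"(?<=C)(.{7,11})(?=FG.G)")

def find_cdr_l3(light_chain):
    """Return (1-based start, loop sequence) of the first hit, or None."""
    match = CDR_L3.search(light_chain)
    if match is None:
        return None
    return match.start() + 1, match.group(1)

# Illustrative fragment written for this example, not taken from the slides.
fragment = "DFATYYCQQYNSYPLTFGGGTKLEIK"
result = find_cdr_l3(fragment)
print(result)
```

The lookbehind and lookahead keep the flanking residues out of the reported loop, so the returned string is the loop itself.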
18
Antibody Structure
1F3R.pdb
Kontou et al. Eur J Biochem 2000;267:2389
Antibody Diversity
• Gene translocation
• heavy chain
– one of multiple VH gene segments joins with one DH and one JH segment
• light chain
– one of multiple VL gene segments joins with one JL segment
www.cat.cc.md.us/courses/bio141/lecguide/unit3/humoral/antibodies/abydiversity/abydiversity.html
20
Protein-Ion and
Protein-‘small molecule’
Interactions
21
II.1 Ion Binding: Calmodulin
• Two domains, each with two ‘EF-hands’:
– helix-loop-helix structure
– loop contains Ca2+-binding motif.
• Ca2+-ion: 6-fold coordinated:
– Oxygens from residues 1, 3, 5, 7, 9, and 12 in EF loop:
D-K-D-G-D-G-T-I-T-T-K-Q
– one water molecule
– three are negatively charged
• Ca2+-binding changes conformation of entire protein
from closed to open
– open conformation exposes hydrophobic surface area
– binding site for calmodulin target proteins
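The coordinating positions of the EF loop can be read off mechanically. The sketch below uses the 12-residue loop exactly as printed above and picks out positions 1, 3, 5, 7, 9 and 12; the variable names are chosen for this example only.

```python
# EF-loop sequence as given on the slide, one residue per position.
EF_LOOP = "D-K-D-G-D-G-T-I-T-T-K-Q".split("-")
COORDINATING_POSITIONS = (1, 3, 5, 7, 9, 12)  # 1-based loop positions

# Residue found at each coordinating position.
ligands = {pos: EF_LOOP[pos - 1] for pos in COORDINATING_POSITIONS}

# Positions whose side chain carries a negative charge (Asp or Glu).
acidic = [pos for pos, aa in ligands.items() if aa in "DE"]

print(ligands)
print("negatively charged positions:", acidic)
```

For this loop the acidic positions come out as 1, 3 and 5, in line with the statement that three of the ligands are negatively charged.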
22
Calmodulin Complex with Calcium Ions
1exr.pdb
23
II.2 Ion Pump:
2. Calcium ATPase (ATP synthase)
• protein complex
– links electrical potential to ATP hydrolysis/synthesis
– interconversion between mechanical and electrochemical energy in molecular motors.
• F1F0 ATPase: reversible proton pump/motor
• P-type ATPases: transport ions across membrane against a concentration gradient.
– Pattern: D-K-T-G-T-[LIVM]-[TIS]
– The aspartate (D) of the pattern is phosphorylated during the reaction cycle
• Na+/K+-ATPase: ubiquitous membrane transport protein in mammalian cells
– maintains high K+ and low Na+ in cytoplasm for normal membrane potentials and cellular activities
• Ca2+-ATPases: pump Ca2+ from the cytoplasm into organelles (mammalian)
– e.g. sarcoplasmic reticulum, endoplasmic reticulum
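The P-type ATPase pattern above is written in PROSITE style, which maps almost one-to-one onto regular expressions. The converter below is a minimal sketch that covers only single residues, x, [..], {..} and repeat counts such as x(2,4); it is not a full PROSITE parser. The function name `prosite_to_regex` and the test fragment are assumptions made for this example.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = m.groups()
        if core == "x":                 # x = any residue
            core = "."
        elif core.startswith("{"):      # {ABC} = any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        if low is not None:             # x(2) or x(2,4) repeat counts
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

regex = prosite_to_regex("D-K-T-G-T-[LIVM]-[TIS]")
# Illustrative fragment, used here only as test input.
hit = re.search(regex, "AAICSDKTGTLTQNKM")
print(regex, hit.group(), hit.start() + 1)
```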
24
ATPases
F1Fo-ATPase
Ca2+-ATPase
www.rpi.edu/dept/bcbp/molbiochem/MBWeb/mb1/part2/f1fo.htm
www.utoronto.ca/maclennan/rint1.htm
ATPase: Calcium Ions in Active Site
1eul.pdb
26
II.3 Membrane Channel: Aquaporin
Conserved NPA motifs:
Asn, Pro and Ala stabilise loops through multiple hydrogen bonds
Bert de Groot: www.mpibpc.mpg.de/groups/de_groot/bgroot.html
27
Aquaporin: Motifs
• NPA: stabilizes loops B and E
• G(a)xxxG(a)xxG(a):
– Crossing of right-hand helical bundles
Andreas Engel and Henning Stahlberg, in: Current Topics in Membranes (2001), Hohmann, Agre & Nielsen (Eds.) Academic Press
Aquaporin Subunit
Bert de Groot: www.mpibpc.mpg.de/groups/de_groot/bgroot.html
1j4n.pdb
29
Protein-DNA/RNA
Interactions
30
III.1 Enzyme: Eco-RI
• Restriction enzyme:
– cuts palindromic sequences
– complex of one DNA molecule with two Eco-RI molecules with inversion symmetry
www.accessexcellence.org/RC/VL/GG/restriction.html
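"Palindromic" for DNA means that a site reads the same as its reverse complement. The sketch below checks this and lists the sites in a sequence. The Eco-RI recognition site GAATTC is not given on the slide and is added here as a well-known fact; the helper names and the test sequence are made up for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Complement every base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]

def is_palindromic(site):
    """True if the site equals its own reverse complement."""
    return site == reverse_complement(site)

def find_sites(dna, site="GAATTC"):
    """0-based start positions of every occurrence of the site."""
    return [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]

# Made-up test sequence containing two sites.
toy = "TTGAATTCAAGGAATTCC"
print(is_palindromic("GAATTC"), find_sites(toy))
```

Because the site is its own reverse complement, both strands present the same sequence to the enzyme, which fits the symmetric two-subunit complex described above.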
31
Eco-RI
1qrh.pdb
32
III.2a DNA recognition:
Leucine Zipper
• Dimer
– Leu interactions
– binds DNA by a fork-shaped structure
• ‘coiled-coil’ structure:
– leucines on one side of helix
– 7-residue repeat; one helix turn is 3.6 residues
position   a b c d e f g
256                    K
257-263    V E E L L S K
264-270    N Y H L E N E
271-277    V A R L K K L
278-279    V G
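The 7-residue repeat can be checked by assigning a heptad position to every residue and looking at position d. The sketch below uses the segment 256 to 279 from the slide and assumes that residue 256 sits at position g, so that 257 starts a heptad at position a; that register is read off the slide layout and is an assumption of this example.

```python
SEGMENT = "KVEELLSKNYHLENEVARLKKLVG"  # residues 256-279 from the slide
FIRST_RESIDUE = 256
HEPTAD = "abcdefg"
REGISTER_OF_FIRST = "g"  # assumed register of residue 256

def heptad_positions(segment, first_register):
    """Heptad position (a-g) of every residue in the segment."""
    offset = HEPTAD.index(first_register)
    return [HEPTAD[(offset + i) % 7] for i in range(len(segment))]

registers = heptad_positions(SEGMENT, REGISTER_OF_FIRST)

# Residue number and amino acid at every d position.
d_residues = [
    (FIRST_RESIDUE + i, aa)
    for i, (aa, reg) in enumerate(zip(SEGMENT, registers))
    if reg == "d"
]
print(d_residues)
```

With 3.6 residues per helix turn, residues seven apart end up on nearly the same face of the helix, so the d-position leucines line up along one side.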
33
Leucine Zipper: Complex with DNA
1an2.pdb
34
Leucine Zipper: 7-Residue Repeat
35
III.2b DNA Recognition: Zinc
Finger Proteins
• zinc coordinates several side chains
– pulls them together to form ‘finger’ loops
• Pattern: C-x2-4-C-x12-15-H-x3-5-H or C-x2-4-C-x12-15-C
– recognize nucleic acids (DNA or RNA)
• modulate genes (proteins can also be targeted)
• modulate important functions:
– gene expression
– reverse transcription and virus assembly
• drug discovery targets:
– pathogen-specific 3D structures
– different from endogenous (cellular) zinc finger proteins
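The spacing ranges in the first zinc finger pattern above (two Cys separated by 2 to 4 residues, 12 to 15 residues to the first His, 3 to 5 residues to the second His) map onto regex repeat counts. The sketch below is one possible translation; the test fragment is modelled on a classical Cys2-His2 finger and should be treated as illustrative input only.

```python
import re

# C-x(2..4)-C-x(12..15)-H-x(3..5)-H written as a regular expression.
ZINC_FINGER = re.compile(r"C.{2,4}C.{12,15}H.{3,5}H")

def find_zinc_fingers(sequence):
    """Return (0-based start, matched segment) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in ZINC_FINGER.finditer(sequence)]

# Illustrative fragment, used here only as test input.
fragment = "YACPVESCDRRFSRSDELTRHIRIHTGQKP"
fingers = find_zinc_fingers(fragment)
print(fingers)
```

The four residues named in the pattern (two Cys, two His) are the ones whose side chains coordinate the zinc ion; everything in between is free to vary within the allowed lengths.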
36
Zinc Finger Complex with DNA
1a1h.pdb
37
III.3 RNA Regulation: KH Domain
• bind to specific DNA/RNA locations
– regulation of RNA synthesis and metabolism
– combination with other domains
– Pattern: G-x-x-G
• ribonucleoprotein (RNP) domain
• double stranded RNA binding domain (dsRBD)
• K Homology (KH) domain
– recognize tetranucleotide motifs
– high affinity/specificity:
• RNA secondary structure
• repeated sequence elements
• alpha/beta fold similar to ribosomal proteins
38
KH Domain Complex with RNA
1k1g.pdb
39
The Hammerhead Motif of Ribozyme (HHRz)
Przybilski, R., et al. Plant Cell 2005;17:1877-1885
Copyright ©2005 American Society of Plant Biologists
40
Hammerhead Motif of Ribozyme
• three base-paired helices (I-III)
• core of 11 highly conserved, non-complementary
nucleotides
– necessary for the catalysis.
• catalytic motif discovered by sequence comparison of plant viroids
– site-specific, self-catalyzed cleavage (Birikh, 1997)
academic.brooklyn.cuny.edu/chem/zhuang/QD/toppage1.htm
Hammerhead Ribozyme Action
488d.pdb
42
Modeling of the Arabidopsis HHRz Ara2
Przybilski, R., et al. Plant Cell 2005;17:1877-1885
Copyright ©2005 American Society of Plant Biologists
43