Workshop in Computational Structural Biology

2014
81855 & 81813, 4 points
Ora Schueler-Furman
TA: Orly Marcu
Introduction – When, Where, How?
• When & Where:
– Thursdays, Givat Ram
– Lecture: 15:00-16:45,
Sprinzak 25
NOTE: changed time!!
– Exercise: 17:00-19:45,
Sprinzak computer class #4
– Lectures & exercises available
in moodle
• How:
– Make sure you have an
account in CS ✓
• Exercises
– Submit 7/9 exercises
– Due within 2 weeks
– Submit by email to orly.marcu@gmail.com
– 30% of grade
• Contact:
Ora 87094
oraf@ekmd.huji.ac.il, or
Orly 87063
orly.marcu@gmail.com
Acknowledgements: Sources of figures and slides include slides from Branden & Tooze; some slides have been
adapted from members of the Rosetta Community, especially from Jens Meiler
Exercises in Pyrosetta have been adapted from teaching material by Jeff Gray
What will we learn:
Part I: Protein structure in the eye
of the computational biologist
1. Introduction to computational structural biology
•The basics of protein structure
•Challenges in computational biology and bioinformatics
•Protein structure prediction and design
2. Introduction to Rosetta and structural modeling
•Approaches for structural modeling of proteins
•The Rosetta framework and its prediction modes
•Cartesian and polar coordinates
•Sampling (find the structure) and
•Scoring (select the structure)
3. Optimization techniques
•Energy minimization
•Monte Carlo (MC) Sampling
•MC with minimization (MCM)
Part II: Protein modeling and design
4. Ab initio modeling: Principles and approaches
5. Full-atom refinement
• Local optimization
• Side chain modeling
– The representation of side chains as rotamers
– Rotamer and off-rotamer sampling
– Finding minimum energy rotamer combinations
6. Homology modeling
• Selection of template and alignment of query sequence to
template
• Loop modeling approaches (modeling of unaligned regions)
7. Protein design
• The theoretical basis of protein design; how different design
goals are achieved
• Success and challenge in computational design
Part III: Protein interactions
8. Protein-protein docking
• Challenges and approaches in protein docking
• The theoretical basis of low-resolution and high-resolution
docking
9. Interface analysis and design
• Determinants of binding affinity and specificity
• Identification of interface residue hotspots: Computational
alanine scanning
• Success and challenge in interface design
10. Summary
What will we learn: Exercises
Exercises will span a variety of
subjects and involve both Rosetta
and other widely-used protocols
• Basic introduction: how to
look at proteins
• Protein structure evaluation
and classification: What does
my protein do, how good is its
structure?
• Structure comparison
• Running Rosetta
• Pyrosetta and Rosettascripts:
running and programming
• Ab initio modeling
• Homology modeling
• Structure refinement
• Modeling side chains
• Loop modeling
• Protein docking
• Interface analysis – Computational alanine scanning
• Protein design and protein
interface design
1. Introduction to Computational
Structural Biology
The Basics of Protein Structure
The central dogma
The code: 4 bases, 64 triplets, 20 amino acids
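The counts on this slide follow from simple combinatorics: 4 bases taken 3 at a time give 4 × 4 × 4 = 64 triplets. A minimal Python sketch of that arithmetic (the variable names are illustrative and not part of the course material):

```python
from itertools import product

BASES = "ACGU"  # the four RNA bases

# Every ordered triplet of bases is one codon: 4 * 4 * 4 = 64.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))  # 64
print(codons[:4])   # ['AAA', 'AAC', 'AAG', 'AAU']
```

Because 64 triplets encode only 20 amino acids plus stop signals, most amino acids are specified by more than one codon.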
4 Hierarchies of protein structure
• Anfinsen: sequence determines structure
The building blocks:
20 amino acids
• Differ in size, polarity,
charge, secondary
structure propensity …
Special amino acids
Glycine
• The simplest aa
• No sc
• Very flexible bb
Proline
• Cyclic aa
• sc connects to bb N
• Very constrained bb
Aliphatic amino acids
• sc contains only carbon and hydrogen atoms
• hydrophobic
Amino acids with hydroxyl group
Negatively charged amino acids
Different size → different tendency for secondary structure
Amide amino acids
Positively charged amino acids
• pKa 11.1
• pKa 12
• large sc
Aromatic amino acids
• pKa 7
• benzene ring
• sc contains aromatic ring
Amino acids with sulfur
Cystine
Oxidation of Sulfur
atoms creates
covalent disulfide
bond (S-S bond)
between two
cysteines
S-S bonds stabilize the protein
Insulin
A chain: GIVEQCCASVCSLYQLENENYCN
B chain: FVNQHLCGSHLVEALYLVCGERGF..
[Figure: S-S bonds within the A chain and between the A and B chains; chains drawn from N- to C-terminus]
Post-translational
modifications
• Processing (pro-insulin/insulin)
– control of protein activity
• Glycosylation
– protein trafficking
• Phosphorylation (Tyr, Ser, Thr)
– regulation of signaling
• Methylation, Acetylation
– histone tagging
• ….
Metal binding proteins
• aa: HCDE
• Fe, Zn, Mg, Ca
• Fe
– blood: red hemoglobin
– electron transfer: cytochrome c
• Zn
– in DNA-binding “Zn-finger” proteins
– Alcohol dehydrogenase: oxidation of alcohol
Important bonds for protein folding and
stability
Dipole moments attract each
other by van der Waals forces
(transient and very weak: 0.1-0.2 kcal/mol)
Hydrophobic interaction –
hydrophobic groups/ molecules
tend to cluster together and
shield themselves from the
hydrophilic solvent
Hydrogen bonding potential of amino acids
Primary sequence: concatenated amino
acids
Formation of a peptide bond
[Figure: amino acid in zwitterionic form, +H3N-Cα(H)(R)-COO-]
cpk colors
O - oxygen
H - hydrogen
N - nitrogen
C - carbon
Dihedral angles
Dihedral angles χ1-χ4 define side chain geometry
• Dihedral angle: defines geometry of
4 consecutive atoms (given bond
lengths and angles)
From wikipedia
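As an illustration of how four consecutive atom positions define a dihedral angle, here is a minimal Python sketch. The function name `dihedral` and the toy coordinates are assumptions made for this example. The sign follows the usual convention: looking from atom 2 to atom 3, a clockwise rotation of the far bond relative to the near bond is positive.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p1, p2, p3, p4):
    """Dihedral angle in degrees defined by four consecutive 3D points."""
    b1 = _sub(p2, p1)          # bond 1 -> 2
    b2 = _sub(p3, p2)          # central bond 2 -> 3
    b3 = _sub(p4, p3)          # bond 3 -> 4
    n1 = _cross(b1, b2)        # normal of the plane through atoms 1, 2, 3
    n2 = _cross(b2, b3)        # normal of the plane through atoms 2, 3, 4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (not taken from a real structure)
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
print(round(cis, 1), round(trans, 1))  # 0.0 180.0
```

The same function applies to side-chain and backbone dihedrals alike; only the choice of the four atoms changes.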
Dihedral angles φ and ψ
define backbone geometry
[Figure labels: ψ, ω, φ]
The peptide bond is planar and polar:
ω = 180° (trans) or 0° (cis)
The geometry of the peptide backbone
• Peptide bond length and angles do not change
• Peptide dihedral angles define structure
Ramachandran plot
[Plot of ψ against φ]
• All except Glycine
• Glycine: flexible backbone
Secondary structure: local interactions
Secondary structure – built from
backbone hydrogen bonds
α helix
• discovered 1951 by Pauling
• 5-40 aa long
• average: 10 aa
• right handed
• 1.5 Å/res
• O(i)-NH(i+4): bb atoms satisfied
• π helix: i - i+5
• 3₁₀ helix: i - i+3
Favored: Ala, Leu, Arg, Met, Lys
Disfavored: Asn, Thr, Cys, Asp, Gly
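The i to i+4 hydrogen-bond pattern and the 1.5 Å rise per residue can be sketched in a few lines of Python. The function names are illustrative, and the length estimate simply multiplies the residue count by the rise, so it is approximate.

```python
RISE_PER_RESIDUE = 1.5  # Å along the helix axis, value from the slide

def helix_hbonds(n_residues, offset=4):
    """Pairs (i, i + offset): C=O of residue i bonded to N-H of residue i + offset."""
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]

def helix_length(n_residues):
    """Approximate helix length in Å."""
    return n_residues * RISE_PER_RESIDUE

print(helix_hbonds(10))
# [(1, 5), (2, 6), (3, 7), (4, 8), (5, 9), (6, 10)]
print(helix_length(10))  # 15.0
```

Calling `helix_hbonds(n, offset=3)` or `helix_hbonds(n, offset=5)` gives the 3₁₀ and π patterns. In every case the first N-H groups and the last C=O groups have no partner inside the helix.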
α helix: dipole
• binds negative charges at N-terminus
α helix: side chains point out
View down one helical turn
Frequent amino acids at the
N-terminus of α helices
Ncap, N1, N2, N3 …….Ccap
Pro
Blocks the continuation of the helix by its side
chain
Asn, Ser
Block the continuation of the helix by
hydrogen bonding with the donor (NH) of N3
Helices of different character
1. buried
2. partially exposed
3. exposed
Representation: helical wheel
1. buried
2. partially exposed: amphipathic
helix
3. exposed
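A helical wheel places each residue at its angular position around the helix axis. The sketch below assumes the standard α-helix geometry of 3.6 residues per turn, i.e. 100° per residue (a value not stated on the slide). The peptide sequence is a made-up example.

```python
DEGREES_PER_RESIDUE = 100.0  # assumed: 360° / 3.6 residues per turn

def helical_wheel(sequence):
    """Return (residue, angle in degrees) for each position on the wheel."""
    return [(aa, (i * DEGREES_PER_RESIDUE) % 360.0)
            for i, aa in enumerate(sequence)]

wheel = helical_wheel("LKELL")  # hypothetical peptide
print(wheel)
# [('L', 0.0), ('K', 100.0), ('E', 200.0), ('L', 300.0), ('L', 40.0)]
```

In an amphipathic helix the hydrophobic residues fall within a limited range of angles, so they form one face of the wheel.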
β-sheet
• Involves several regions in sequence
• O(i)-NH(j)
• Parallel and anti-parallel sheets
Favored: Tyr, Thr, Ile, Phe, Trp
Disfavored: Glu, Ala, Asp, Gly, Pro
Antiparallel β-sheet
• Parallel Hbonds
• Residue side chains point up/down/up ..
• Pleated
Parallel β-sheet
• less stable than antiparallel sheet
• angled hbonds
Connecting elements of secondary
structure define tertiary structure
Loops
• connect helices and strands
• at surface of molecule
• more flexible
• contain functional sites
Hairpin Loops (β turns)
• Connect strands in antiparallel sheet
[Figure labels, residues marked at the turn positions: G,N,D; G; G; S,T]
Super secondary structures –
Greek Key Motif
Most common topology for 2 hairpins
Super secondary structures – β-α-β Motif
• connects strands in parallel sheet
• always right-handed
Repeated β-α-β motif creates
β-meander: TIM barrel
Tertiary structure defines protein
function
The quaternary structure of a
protein defines its biological
functional unit
Quaternary structure: Hemoglobin
consists of 4 chains (2 α and 2 β)
Quaternary structure: assembly of
protein domains
(from two distinct protein chains, or two
domains in one protein sequence)
Glyceraldehyde phosphate
dehydrogenase:
• domain 1 binds the
substrate to be
metabolized,
• domain 2 binds a
cofactor
Experimental determination of
protein structure: X-ray diffraction
and NMR
Experimental determination of
structure
X-ray crystallography
• Determines electron density – positions of atoms in structure
• Highly accurate
• Static: depends on crystal
NMR
• Determines constraints between labeled spins
• Allows measurement of structure in solution
• Resolution not defined: more constraints – better defined structure
X-ray diffraction
If the direction satisfies the diffraction condition:
→ Constructive addition
→ Reflection spot in the diffraction pattern
• Wavelength of x-ray ~ crystal plane
separations
• Rotation of crystal relative to beam allows
recording of different diffractions
• Diffraction maps are translated to electron
density maps using Fourier Transform
Resolution measures diffraction angles
(high angle peaks – high resolution data)
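The link between diffraction angle and resolution can be illustrated with Bragg's law, n·λ = 2·d·sin θ. The slides do not state the law explicitly, so treat this as a sketch under that assumption. The wavelength used (Cu Kα, about 1.5418 Å) is an assumed example value and the function name is illustrative.

```python
import math

def bragg_spacing(wavelength, theta_deg, order=1):
    """Plane spacing d (same unit as wavelength) from n*lambda = 2*d*sin(theta)."""
    return order * wavelength / (2.0 * math.sin(math.radians(theta_deg)))

CU_K_ALPHA = 1.5418  # Å, assumed X-ray wavelength for this example

# Larger diffraction angles correspond to smaller plane spacings,
# that is, to finer detail (higher resolution).
for theta in (5, 15, 30):
    print(theta, bragg_spacing(CU_K_ALPHA, theta))
```

The spacing shrinks as θ grows, which is why high-angle reflections carry the high-resolution information.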
Iterative refinement
allows improvement of
structure
R-factor measures quality
Fo – observed
Fc - calculated
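The slide names the two quantities but the formula itself was not preserved. A minimal sketch, assuming the commonly used definition R = Σ| |Fo| - |Fc| | / Σ|Fo| summed over all reflections. The amplitudes below are made-up numbers for illustration only.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over all reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

f_obs = [100.0, 80.0, 60.0, 40.0]   # hypothetical observed amplitudes
f_calc = [90.0, 85.0, 55.0, 50.0]   # hypothetical amplitudes computed from a model
print(round(r_factor(f_obs, f_calc), 3))  # 0.107
```

A model that reproduces the observed amplitudes exactly gives R = 0; iterative refinement aims to lower R.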
1950’s first protein structure
solved by Kendrew & Perutz:
sperm whale myoglobin
Today: ~90,000 structures solved,
most by x-ray crystallography
Challenges
• Grow crystal
• Determine phase
NMR (Nuclear Magnetic Resonance)
NMR-active nuclei (possess
spins)
¹H, ¹³C
Application of magnetic field
reorients spins – measure
resonance between close
nuclei
Extract constraints &
determine structure
Challenges in Computational
Structural Biology
Protein structure prediction
and design
Protein structure prediction: protein sequence → protein structure
Protein design: protein structure → protein sequence
Protein sequence (FASTA):
>2180 hSERT
METTPLNSQKQ……
Protein structure (PDB):
ATOM    490  N   GLN A  31      52.013 -87.359  -8.797  1.00  7.06
ATOM    491  CA  GLN A  31      52.134 -87.762 -10.201  1.00  8.67
ATOM    492  C   GLN A  31      51.726 -89.222 -10.343  1.00 10.90
ATOM    493  O   GLN A  31      51.015 -89.601 -11.275  1.00  9.63
…..
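The ATOM records of a PDB file are fixed-width text, so they can be read by slicing columns. The sketch below parses the four backbone atoms of GLN 31 shown on the slide, using the standard PDB column layout. The function and dictionary key names are assumptions made for this example, not part of any library.

```python
import math

def parse_atom_line(line):
    """Parse one fixed-width PDB ATOM record into a dict (subset of fields)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
    }

PDB_TEXT = """\
ATOM    490  N   GLN A  31      52.013 -87.359  -8.797  1.00  7.06
ATOM    491  CA  GLN A  31      52.134 -87.762 -10.201  1.00  8.67
ATOM    492  C   GLN A  31      51.726 -89.222 -10.343  1.00 10.90
ATOM    493  O   GLN A  31      51.015 -89.601 -11.275  1.00  9.63
"""

atoms = [parse_atom_line(l) for l in PDB_TEXT.splitlines() if l.startswith("ATOM")]

# Sanity check: the N-CA bond length should be close to the usual ~1.46 Å.
n, ca = atoms[0], atoms[1]
n_ca = math.dist((n["x"], n["y"], n["z"]), (ca["x"], ca["y"], ca["z"]))
print(round(n_ca, 2))  # 1.47
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real PDB files can touch, for example when coordinates are large and negative.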
Additional topics in computational
structural biology
• Nucleic acids - Prediction of binding and structure
– RNA stem & loops, pseudoknots; protein-RNA binding
– DNA curvature; protein-DNA binding
• Prediction of macromolecular structures
– Reconstruction of protein assemblies from low-resolution cryo-EM maps
• Protein-ligand interactions
– Docking of small ligands
– Design of inhibitors
… and many many more!