Workshop in Computational Structural Biology 2014
81855 & 81813, 4 points
Ora Schueler-Furman
TA: Orly Marcu

Introduction: When, Where, How?
• When & Where:
  – Thursdays, Givat Ram
  – Lecture: 15:00-16:45, Sprinzak 25 (NOTE: changed time!)
  – Exercise: 17:00-19:45, Sprinzak computer class #4
  – Lectures and exercises are available in Moodle
• How:
  – Make sure you have an account in CS ✓
• Exercises:
  – Submit 7 of the 9 exercises
  – Due within 2 weeks
  – Submit by email to orly.marcu@gmail.com
  – 30% of the grade
• Contact: Ora (87094, oraf@ekmd.huji.ac.il) or Orly (87063, orly.marcu@gmail.com)

Acknowledgements: Figures and slides are drawn in part from Branden & Tooze; some slides have been adapted from members of the Rosetta Community, especially Jens Meiler. The PyRosetta exercises have been adapted from teaching material by Jeff Gray.

What will we learn:
Part I: Protein structure in the eye of the computational biologist
1. Introduction to computational structural biology
  • The basics of protein structure
  • Challenges in computational biology and bioinformatics
  • Protein structure prediction and design
2. Introduction to Rosetta and structural modeling
  • Approaches for structural modeling of proteins
  • The Rosetta framework and its prediction modes
  • Cartesian and polar coordinates
  • Sampling (find the structure)
  • Scoring (select the structure)
3. Optimization techniques
  • Energy minimization
  • Monte Carlo (MC) sampling
  • MC with minimization (MCM)
Part II: Protein modeling and design
4. Ab initio modeling: principles and approaches
5. Full-atom refinement
  • Local optimization
  • Side chain modeling
    – The representation of side chains as rotamers
    – Rotamer and off-rotamer sampling
    – Finding minimum-energy rotamer combinations
6. Homology modeling
  • Selection of a template and alignment of the query sequence to the template
  • Loop modeling approaches (modeling of unaligned regions)
7. Protein design
  • The theoretical basis of protein design; how different design goals are achieved
  • Successes and challenges in computational design
Part III: Protein interactions
8. Protein-protein docking
  • Challenges and approaches in protein docking
  • The theoretical basis of low-resolution and high-resolution docking
9. Interface analysis and design
  • Determinants of binding affinity and specificity
  • Identification of interface residue hotspots: computational alanine scanning
  • Successes and challenges in interface design
10. Summary

What will we learn: Exercises
Exercises will span a variety of subjects and involve both Rosetta and other widely used protocols:
• Basic introduction: how to look at proteins
• Protein structure evaluation and classification: What does my protein do, and how good is its structure?
• Structure comparison
• Running Rosetta
• PyRosetta and RosettaScripts: running and programming
• Ab initio modeling
• Homology modeling
• Structure refinement
• Modeling side chains
• Loop modeling
• Protein docking
• Interface analysis: computational alanine scanning
• Protein design and protein interface design

1. Introduction to Computational Structural Biology
The Basics of Protein Structure

The central dogma: DNA → RNA → protein
The genetic code: 4 bases, 4³ = 64 triplets (codons), 20 amino acids
Four hierarchies of protein structure: primary, secondary, tertiary, quaternary
• Anfinsen: sequence determines structure

The building blocks: 20 amino acids
• Differ in size, polarity, charge, secondary structure propensity, …

Special amino acids
• Glycine: the simplest amino acid; no side chain (only a hydrogen); very flexible backbone
• Proline: cyclic amino acid; its side chain connects back to the backbone N; very constrained backbone

Aliphatic amino acids
• Side chain contains only carbon and hydrogen atoms
• Hydrophobic

Amino acids with a hydroxyl group

Negatively charged amino acids
• Different size → different tendency for secondary structure

Amide amino acids

Positively charged amino acids
• pKa 11.1
• pKa 12
• Large side chain

Aromatic amino acids
• pKa 7
• Benzene ring
• Side chain contains an aromatic ring

Amino acids with sulfur
• Cystine: oxidation of the sulfur atoms creates a covalent disulfide bond (S-S bond) between two cysteines
• S-S bonds stabilize the protein
• Example (figure): insulin, whose A chain (GIVEQCCASVCSLYQLENENYCN) and B chain (FVNQHLCGSHLVEALYLVCGERGF…) are linked by S-S bonds

Post-translational modifications
• Processing (pro-insulin to insulin): control of protein activity
• Glycosylation: protein trafficking
• Phosphorylation (Tyr, Ser, Thr): regulation of signaling
• Methylation, acetylation: histone tagging
• …

Metal-binding proteins
• Metal-binding amino acids: His, Cys, Asp, Glu (H, C, D, E)
• Metals: Fe, Zn, Mg, Ca
• Fe
  – blood: red hemoglobin
  – electron transfer: cytochrome c
• Zn
  – DNA-binding "Zn-finger" proteins
  – alcohol dehydrogenase: oxidation of alcohol

Important bonds for protein folding and stability
• Van der Waals force: dipole moments attract each other (transient and very weak: 0.1 to 0.2 kcal/mol)
• Hydrophobic interaction: hydrophobic groups/molecules tend to cluster together and shield themselves from the hydrophilic solvent
Hydrogen-bonding potential of amino acids

Primary sequence: concatenated amino acids
Formation of a peptide bond: the carboxyl group of one amino acid condenses with the amino group of the next, releasing water
(Figure: an amino acid, +H3N-CαH(R)-COO-, drawn in CPK colors for oxygen, hydrogen, nitrogen and carbon)

Dihedral angles
• Dihedral angle: defines the geometry of 4 consecutive atoms (given bond lengths and angles) (figure from Wikipedia)
• Dihedral angles χ1 to χ4 define the side chain
• Dihedral angles φ and ψ define the backbone geometry
• The peptide bond is planar and polar: ω = 180° (trans) or 0° (cis)

The geometry of the peptide backbone
• Peptide bond lengths and angles do not change
• Peptide dihedral angles define the structure

Ramachandran plot (ψ plotted against φ)
• All residues except glycine
• Glycine: flexible backbone

Secondary structure: local interactions
Secondary structure is built from backbone hydrogen bonds

α helix
• Discovered in 1951 by Pauling
• 5-40 amino acids long (average: 10)
• Right-handed
• Hydrogen bonds from O(i) to N-H(i+4): backbone donors and acceptors are satisfied
• π helix: i to i+5; 3₁₀ helix: i to i+3
• Rise: 1.5 Å per residue
• Favored: Ala, Leu, Arg, Met, Lys
• Disfavored: Asn, Thr, Cys, Asp, Gly

α helix dipole
• The N-terminus carries a partial positive charge and binds negative charges

α helix: side chains point out (view down one helical turn)

Frequent amino acids at the N-terminus of α helices (positions Ncap, N1, N2, N3 … Ccap)
• Pro: blocks the continuation of the helix by its side chain
• Asn, Ser: block the continuation of the helix by hydrogen bonding with the donor (NH) of N3

Helices of different character (represented as helical wheels)
1. Buried
2. Partially exposed: amphipathic helix
3. Exposed

β-sheet
• Involves several regions in the sequence
• Hydrogen bonds from O(i) to N-H(j) between strands
• Parallel and antiparallel sheets
• Favored: Tyr, Thr, Ile, Phe, Trp
• Disfavored: Glu, Ala, Asp, Gly, Pro

Antiparallel β-sheet
• H-bonds parallel to each other (straight across the strands)
• Residue side chains point up/down/up …
• Pleated

Parallel β-sheet
• Less stable than the antiparallel sheet
• Angled H-bonds

Connecting elements of secondary structure define tertiary structure

Loops
• Connect helices and strands
• At the surface of the molecule
• More flexible
• Contain functional sites

Hairpin loops (β turns)
• Connect strands in an antiparallel sheet
• Residues marked in the turn figure: G, N, D; G; G; S, T

Super-secondary structures: Greek key motif
• Most common topology for 2 hairpins

Super-secondary structures: β-α-β motif
• Connects strands in a parallel sheet
• Always right-handed

Repeated β-α-β motifs create a parallel β-barrel: the TIM barrel

Tertiary structure defines protein function

The quaternary structure of a protein defines its biological functional unit
• Hemoglobin consists of 4 separate chains (two α and two β)

Quaternary structure: assembly of protein domains (from two distinct protein chains, or two domains in one protein sequence)
• Example: glyceraldehyde-3-phosphate dehydrogenase
  – domain 1 binds the substrate to be metabolized
  – domain 2 binds a cofactor
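The backbone dihedrals φ, ψ, ω and the side-chain dihedrals χ1 to χ4 described earlier in this section are all computed the same way from four consecutive atoms: φ from C(i-1), N(i), Cα(i), C(i); ψ from N(i), Cα(i), C(i), N(i+1); ω from Cα(i), C(i), N(i+1), Cα(i+1). Below is a minimal, dependency-free sketch, assuming the usual IUPAC sign convention (positive for a clockwise rotation when looking along the central bond). The names dihedral, _sub, _dot and so on are illustrative only; this is not the Rosetta or PyRosetta implementation.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Dihedral angle in degrees (between -180 and 180) for four consecutive atoms.

    It is the rotation about the central bond p1-p2 between the plane
    (p0, p1, p2) and the plane (p1, p2, p3).
    """
    b0 = _sub(p0, p1)                                # bond p1 -> p0
    b1 = _sub(p2, p1)                                # central bond p1 -> p2
    b2 = _sub(p3, p2)                                # bond p2 -> p3
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the axis
    # Project the outer bonds onto the plane perpendicular to the axis.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)                                   # cosine-like component
    y = _dot(_cross(b1, v), w)                       # sine-like component, sets the sign
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: the central bond p1-p2 lies along the z axis.
p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(dihedral(p0, p1, p2, (-1.0, 0.0, 1.0)))  # about 180: trans, like omega in most peptide bonds
print(dihedral(p0, p1, p2, (1.0, 0.0, 1.0)))   # about 0: cis
print(dihedral(p0, p1, p2, (0.0, 1.0, 1.0)))   # about +90
print(dihedral(p0, p1, p2, (0.0, -1.0, 1.0)))  # about -90
```

Using atan2 on the projected bonds, rather than acos of a dot product, recovers the sign of the angle, which is what distinguishes, for example, the α-helical and β regions of the Ramachandran plot.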
Experimental determination of protein structure: X-ray diffraction and NMR

Experimental determination of structure
X-ray crystallography
• Determines electron density, and thus the positions of atoms in the structure
• Highly accurate
• Static: depends on the crystal
NMR
• Determines constraints between labeled spins
• Allows measurement of the structure in solution
• Resolution is not defined: more constraints give a better-defined structure

X-ray diffraction
• Diffracted waves add constructively only in certain directions (Bragg's law: nλ = 2d sin θ); each such direction gives a reflection spot in the diffraction pattern
• The X-ray wavelength is comparable to the separation of crystal planes
• Rotating the crystal relative to the beam allows recording of different diffractions
• Diffraction maps are translated into electron density maps using the Fourier transform
• Resolution is measured by the diffraction angle: high-angle peaks give high-resolution data
• Iterative refinement improves the structure
• The R-factor measures quality: R = Σ ||Fo| - |Fc|| / Σ |Fo|, where Fo = observed and Fc = calculated structure factors (lower is better)
• 1950s: the first protein structure, sperm whale myoglobin, was solved by Kendrew & Perutz
• Today: ~90,000 structures solved, most by X-ray crystallography
• Challenges: growing a crystal; determining the phases

NMR (Nuclear Magnetic Resonance)
• NMR-active nuclei (which possess spin): ¹H, ¹³C
• Applying a magnetic field reorients the spins; resonance is measured between nearby nuclei
• Constraints are extracted and used to determine the structure

Challenges in Computational Structural Biology: protein structure prediction and design
Protein structure prediction (figure): from sequence to structure
• Protein sequence (FASTA format): >2180 hSERT METTPLNSQKQ……
• Protein structure (PDB format): ATOM records, one line per atom
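The sequence side of the prediction problem is usually supplied as FASTA text: a header line starting with ">" followed by one or more sequence lines. A minimal parsing sketch follows; parse_fasta and nonstandard_residues are hypothetical helper names, and the example uses only the 11 hSERT residues visible in the figure, not the full sequence.

```python
from typing import Dict, List, Optional

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # one-letter codes of the 20 amino acids


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary.

    A record starts with a '>' line; the header is the rest of that line.
    Sequence lines up to the next '>' are joined. A repeated header
    overwrites the earlier record (acceptable for this sketch).
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                          # skip blank lines
        if line.startswith(">"):
            if header is not None:            # close the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:              # ignore text before the first header
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def nonstandard_residues(seq: str) -> set:
    """Letters in seq that are not among the 20 standard amino acid codes."""
    return set(seq) - STANDARD_AA


# Only the part of the hSERT sequence shown on the slide is used here.
example = ">2180 hSERT\nMETTPLNSQKQ\n"
for name, seq in parse_fasta(example).items():
    print(name, len(seq), nonstandard_residues(seq))
```

Checking the letters against the 20 standard one-letter codes is a cheap sanity check before a sequence is handed to a structure prediction protocol.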
Protein design (figure): the reverse direction, from structure to sequence
Example PDB ATOM records (backbone atoms N, CA, C and O of Gln 31, chain A):
ATOM    490  N   GLN A  31      52.013 -87.359  -8.797  1.00  7.06
ATOM    491  CA  GLN A  31      52.134 -87.762 -10.201  1.00  8.67
ATOM    492  C   GLN A  31      51.726 -89.222 -10.343  1.00 10.90
ATOM    493  O   GLN A  31      51.015 -89.601 -11.275  1.00  9.63
(Columns: atom serial number, atom name, residue name, chain, residue number, x, y, z in Å, occupancy, B-factor)

Additional topics in computational structural biology
• Nucleic acids: prediction of binding and structure
  – RNA stems and loops, pseudoknots; protein-RNA binding
  – DNA curvature; protein-DNA binding
• Prediction of macromolecular structures
  – Reconstruction of protein assemblies from low-resolution cryo-EM maps
• Protein-ligand interactions
  – Docking of small ligands
  – Design of inhibitors
… and many more!
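The four ATOM records in the structure example of this section (backbone atoms of Gln 31) can be read back with a few lines of Python. This is a minimal sketch, assuming well-formed records that follow the column layout of the wwPDB coordinate format. parse_atoms and RECORDS are illustrative names, not a library API, and real files also need handling of alternate locations, insertion codes and other record types; a full parser such as Biopython's Bio.PDB is the usual choice. Measuring the backbone bonds gives about 1.466 Å (N-CA), 1.523 Å (CA-C) and 1.232 Å (C=O), consistent with the point that peptide bond lengths do not change.

```python
import math
from typing import Dict, List

# Backbone ATOM records of Gln 31 (chain A) from the slide, written in the
# fixed-column PDB layout (element and charge columns omitted).
RECORDS = """\
ATOM    490  N   GLN A  31      52.013 -87.359  -8.797  1.00  7.06
ATOM    491  CA  GLN A  31      52.134 -87.762 -10.201  1.00  8.67
ATOM    492  C   GLN A  31      51.726 -89.222 -10.343  1.00 10.90
ATOM    493  O   GLN A  31      51.015 -89.601 -11.275  1.00  9.63
"""


def parse_atoms(text: str) -> List[Dict]:
    """Read ATOM/HETATM lines by column position (PDB columns are 1-based)."""
    atoms = []
    for line in text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atoms.append({
            "serial": int(line[6:11]),             # columns 7-11
            "name": line[12:16].strip(),           # columns 13-16
            "res_name": line[17:20].strip(),       # columns 18-20
            "chain": line[21],                     # column 22
            "res_seq": int(line[22:26]),           # columns 23-26
            "xyz": (float(line[30:38]),            # x: columns 31-38
                    float(line[38:46]),            # y: columns 39-46
                    float(line[46:54])),           # z: columns 47-54
            "occupancy": float(line[54:60]),       # columns 55-60
            "b_factor": float(line[60:66]),        # columns 61-66
        })
    return atoms


coords = {a["name"]: a["xyz"] for a in parse_atoms(RECORDS)}
for first, second in [("N", "CA"), ("CA", "C"), ("C", "O")]:
    length = math.dist(coords[first], coords[second])  # Euclidean distance (Python 3.8+)
    print(f"{first}-{second}: {length:.3f} Å")
```

Fixed-column slicing is used instead of splitting on whitespace because adjacent PDB fields can run together, for example when a coordinate such as -100.123 fills its whole 8-character field.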