bio-modeling

course layout
  introduction
  molecular biology
  biotechnology
  bioMEMS
  bioinformatics
  bio-modeling
  cells and e-cells
  transcription and regulation
  cell communication
  neural networks
  dna computing
  fractals and patterns
  the birds and the bees ….. and ants

introduction

far and away in the past
  Newton’s equations of motion (17th-18th century) → Molecular dynamics (MD)
  Boltzmann’s statistics (19th century) → Monte Carlo (MC)
  Schrödinger/Heisenberg’s quantum mechanics (20th century)

birth of simulation in chemistry
  1950’s: do it by hand (or mechanical calculator)!
    Tried to solve Newton’s equations of motion for small systems (e.g. three-atom system)
    Didn’t take very long before they saw computers
  1970’s: Age of punchcards
  1980’s: Better IO devices
    Workstations dominated as research platforms

first generation (1980’s – 1990’s)
  Gas phase reaction (e.g.) H + H2 → H2 + H, MD
    [Figure: MD trajectory plotted against the distances RA-B and RB-C]
  Liquid simulation (e.g.) Lennard-Jones fluid, MD/MC
  Proteins on lattice, MC
  Quantum mechanical structure calculation (semi-empirical, ab initio, …)

revolution (~ 1995)
  Workstation-like PCs
    100 hr Cray time
    64MB / 150MHz Pentium
    “Cheap and fast”
  Impacts: two directions
    1) More accurate methods
    2) Larger systems
  Start of bio-simulations

impact on “non-bio” simulations
  Better surfaces
    Revisions on existing surfaces
    Dynamics on quantum mechanical surfaces
  Time-dependent Schrödinger equation instead of Newton’s equation
    Totally quantum (can’t be more accurate)
    Some people still do this for hydride/proton transfer in enzyme dynamics
    [Figure: quantum wavepacket dynamics plotted against the distances RA-B and RB-C]

impacts on bio-simulations
  Proteins got free from the lattice!
    Off-lattice model (still, each residue as a bead)
    United atom approach (e.g. CH3 as one atom)
    All atom approach
      With water (explicit solvent)
      Without water (implicit solvent)
  What to look at?
    Kinetics: dynamic characteristics (e.g. folding simulation)
    Thermodynamics: equilibrium characteristics (e.g. binding affinity of protein & drug)

solvent models
  Implicit solvent
    Solvent accessible surface area (SASA)
    Solvation free energy
    Cheaper than explicit
    Discrete nature of solvent not included
    Different methods for SASA/free energy calculation
      Generalized Born model (GB/SA)
      Poisson-Boltzmann model (PB/SA)
      Distance dependent dielectric (DD/SA)
  Explicit solvent
    Water as individual molecules
    Expensive calculation
    Periodic boundary conditions usually necessary
    Rigid/flexible, polarizable/non-polarizable
    SPC, TIP3P, TIP4P, TIP5P, …

impacts on bio-simulations
  Remember, proteins are still big!
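The explicit-solvent slide notes that periodic boundary conditions are usually necessary, and the first-generation slides mention the Lennard-Jones fluid. A minimal sketch of how the two fit together is given here, assuming a cubic box, reduced units (epsilon = sigma = 1) and a cutoff no larger than half the box length. The function names are illustrative and do not come from the slides.

```python
import math


def minimum_image(delta, box):
    """Wrap one coordinate difference into the nearest periodic image."""
    return delta - box * round(delta / box)


def lj_pair(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_energy(positions, box, cutoff=2.5):
    """Sum pair energies over all i<j pairs using the minimum image convention.

    Assumes a cubic box of side `box` and cutoff <= box / 2.
    """
    energy = 0.0
    n = len(positions)
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = [minimum_image(a - b, box)
                 for a, b in zip(positions[i], positions[j])]
            r = math.sqrt(sum(c * c for c in d))
            if r < cutoff:
                energy += lj_pair(r)
    return energy
```

Two particles placed near opposite faces of the box interact through the boundary: their direct separation may exceed the cutoff while their nearest-image separation does not.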
off-lattice Gō model
  Developed from lattice model: “funnel concept”
    Nature has developed proteins to fold (evolution)
    Proteins can be modeled to fold
  Native contacts energy surface
  Matches with experimental observations

united atom/implicit model folding
  “Statistical folding”
  Starts from many independent trajectories
  Lucky trajectories fold
  $N_{\text{folded}} / N_{\text{total}} = k_{\text{fold}} \times \text{time}$

all atom unfolding
  Folding inferred from unfolding
  At high T, unfolding is fast (~ 1 ns)
  Full atomistic detail from folded state to unfolded state

binding free energy: docking
  “Molecular modeling”
  Binding free energy is calculated based on the shape of ligand and protein
  Drug design

binding free energy: more accurate versions
  Free energy: potential + entropy factor
  P + L → PL, with free energy change ΔF
  Thermodynamic integration (TI)
  Free energy perturbation (FEP)
  Jarzynski’s equality
  Extremely expensive calculations

free energy landscape method
  Kinetic information is inferred from the free energy surface
  A rough free energy surface can be obtained faster by parallelization
  “Trajectory by intuition”

current limitation
  Accuracies of models
    Force field
    Solvent models
  Speed
    For small proteins (<50 amino acids): 1 ns ~ 1 day
    Biologically relevant event timescale > 1 ms
  Size
    Many proteins are not just large: they are huge!

responses to the challenges
  Accuracy: blend with quantum mechanical calculation
    QM/MM, QM-trajectory method (e.g. CPMD)
  Speed
    E.g. compute on video card
  Size
    E.g. umbrella sampling

computational biology
  Biological systems are complex, so a combination of experimental and computational approaches is needed.

computational biology
  Computational Biology, Bioinformatics
  More than sequences, database searches, statistics or image analysis.
  A part of Computational Science
    Using mathematical modeling, simulation and visualization
    Complementing theory and experiment

simplest chemical reaction
  A → B
  irreversible, one-molecule reaction
  examples: all sorts of decay processes, e.g.
    radioactive, fluorescence, activated receptor returning to inactive state
  any metabolic pathway can be described by a combination of processes of this type (including reversible reactions and, in some respects, multimolecule reactions)

simplest chemical reaction
  A → B
  various levels of description:
    homogeneous system, large numbers of molecules = ordinary differential equations, kinetics
    small numbers of molecules = probabilistic equations, stochastics
    spatial heterogeneity = partial differential equations, diffusion
    small number of heterogeneously distributed molecules = single-molecule tracking (e.g. cytoskeleton modelling)

kinetic description
  Imagine a box containing N molecules. How many will decay per unit time? k N
  Imagine two boxes containing N/2 molecules each. How many decay? k N
  Imagine two boxes containing N molecules each. How many decay? 2k N
  In general:
    $\frac{dn(t)}{dt} = -k\,n(t)$    differential equation (ordinary, linear, first-order)
    $n(t) = N_0 e^{-kt}$    exact solution (in more complex cases replaced by a numerical approximation)

what is bio-modeling?

biological building blocks
  DNA      GAA GTT GAA AAT CAG GCG AAC CCA CGA CTG
  RNA      GAA GUU GAA AAU CAG GCG AAC CCA CGA CUG
  PROTEIN  GLU VAL GLU ASN GLN ALA ASN PRO ARG LEU

protein folding
  [Figure: protein folding of the chain GLU VAL GLU ASN GLN ALA ASN PRO ARG LEU ...]

some fundamental questions
  Question #1: Given a protein or DNA molecule, what is the geometric structure of the molecule?
  Question #2: Why and how does a protein fold to a unique three-dimensional structure?
  Question #3: Given a set of distances between pairs of atoms, how can we determine the coordinates of the atoms?
  Question #4: Given the magnitudes of the structure factors of a protein, how can we determine the phases of the structure factors?
  Question #5: Given two proteins, how can we compare their geometric structures?
  Question #6: …

methods for structure prediction and determination
  Protein X-ray Crystallography
  Nuclear Magnetic Resonance
  Potential Energy Minimization
  Molecular Dynamics Simulation
  Homology Modeling
  Fold Recognition
  Inverse Protein Folding

empirical structure determination
  Two major experimental methods for determining protein structure
  X-ray Crystallography
    Requires growing a crystal of the protein (impossible for some, never easy)
    Diffraction pattern can be inverse-Fourier transformed to characterize electron densities (phase problem)
  Nuclear Magnetic Resonance (NMR) spectroscopy
    Provides distance constraints, but it can be hard to find a corresponding structure
    Works only for relatively small proteins

X-ray crystallography
  X-rays are used because their wavelength is near the distance between bonded carbon atoms
  Maps electron density, not atoms directly
  A crystal is needed to get a lot of spatially aligned atoms
  Have to invert the Fourier transform to get the structure, but only amplitudes are measured, not phases

X-ray crystallography computing
  In X-ray crystallography, the protein first needs to be purified and crystallized, which may take months or years, if it succeeds at all.
  After that, the protein crystal is placed in an X-ray instrument to record an X-ray diffraction image.
  The diffraction image can be used to determine the three-dimensional structure of the protein.
  The process is time consuming, and some proteins cannot be crystallized at all.

X-ray crystallography computing
  A mathematical problem, called the phase problem, needs to be solved before a crystal structure can be fully determined from the diffraction data.
  80% of the structures in the Protein Data Bank (PDB) were determined by X-ray crystallography.

NMR structure determination
  The NMR approach is based on the fact that nuclei spin and generate magnetic fields.
  When two nuclei are close, their spins interact.
  The intensity of the interaction depends on the distance between the nuclei.
  Therefore, the distances between certain pairs of atoms can be estimated by measuring the intensities of the nuclear spin-spin couplings.
  The distance data obtained from the NMR experiment can be used to deduce structural information for the molecule.
  One way of achieving such a goal is based on molecular distance geometry.

NMR structure determination
  Not all distances between pairs of atoms can be detected.
  In practice, only lower and upper bounds for the distances can be obtained.
  The structure can be determined by solving a distance geometry problem with the distance data from the NMR experiments.
  15% of the structures in the Protein Data Bank (PDB) were determined by NMR spectroscopy.

potential energy minimization
  Hypothesis
    The native structure of a protein has the lowest, or almost the lowest, potential energy.
    It can therefore be located at the global minimum of the protein's potential energy.

potential energy minimization
  A reasonably accurate potential energy function needs to be constructed.
  Given such a function, a local minimum is easy to find, but a global one is hard, especially if the function has many local minima.
  No completely satisfactory algorithm has yet been developed for minimizing the energy of proteins.
  Potential energy minimization has nevertheless been used successfully for structure refinement.

molecular dynamics
  Folding can be simulated by following the movement of the atoms in the protein according to Newton’s second law of motion.

molecular dynamics
  The step size has to be small, on the order of femtoseconds, to achieve accuracy.
  Current computing technology can cover only picoseconds to microseconds of simulated time, while protein folding may take seconds or even longer.
  Molecular dynamics simulation has been used successfully to study other types of dynamical behavior of proteins.

limitations of MD simulations
  Full atomic representation → noise
    difficulty in discerning the dominant mechanisms of motion
    need for methods for filtering out the noise, such as Essential Dynamics.
  Empirical force fields
    limited by the accuracy of the potentials.
  Time steps constrained by the fastest motion (vibrations in bond lengths occur in the femtosecond (fs) time range and necessitate the use of timesteps of 1-5 fs).
  Inefficient sampling of the complete space of conformations.
  Limited to small proteins (100s of residues) and/or short times (subnanoseconds).

sequence structure alignment
  [Diagram labels: Homology Modeling; Sequence to Sequence; Fold Recognition; Structure to Sequence; Known Sequences / Structures; Sequence Structure Alignment; Inverse Protein Folding; Sequence to Structure; Ranking Sequences / Structures]

sequence structure alignment
  Scoring functions may not be able to distinguish between good and bad matches.
  Computing the best alignment is NP-hard in general when gaps are allowed.
  The results are not accurate and have only a certain level of confidence.

what is biomolecular modeling?
  Application of computational models to understand the structure, dynamics, and thermodynamics of biological molecules
  The models must be tailored to the question at hand: the Schrödinger equation is not the answer to everything! A reductionist view is bound to fail!
  This implies that biomolecular modeling must be both multidisciplinary and multiscale

an odd remark
  "Every attempt to employ mathematical methods in the study of chemical questions must be considered profoundly irrational and contrary to the spirit of chemistry. If mathematical analysis should ever hold a prominent place in chemistry - an aberration which is happily almost impossible - it would occasion a rapid and widespread degeneration of that science."
  A.
  Comte (1830)

a Nobel remark
  1992 Nobel Prize in Chemistry
    Rudolph Marcus (theory of electron transfer)
  1998 Nobel Prize in Chemistry
    John Pople (ab initio)
    Walter Kohn (DFT, density functional theory)

growth of biological databases
  3D structures growth
  http://www.rcsb.org/pdb/holdings.html

molecular modeling: structure-property relationships
  Molecular model → Mathematical model → Predictions: structure, properties
  “First Principles”
    $H\psi = E\psi$ (QM)
    $-\,dE/dr_i = m_i\, d^2 r_i / dt^2$ (MD)
    Folding simulations
  Empirical correlations: $\{\text{property}\} = \hat{k}\,\{\text{descriptors}\}$
    $E = E_{bonded} + E_{nonbonded}$ (MM)
    $\log(1/C) = -k\pi^2 + k'\pi + \rho\sigma + k''$ (QSAR)
    Fold recognition

molecular geometry and molecular properties
  Conformational energy (potential energy)
    $E_{total} = E_{valence} + E_{nonbond}$
    $E_{valence} = E_{bond} + E_{angle} + E_{torsion} + E_{oop}$
      bond stretching ($E_{bond}$)
      valence angle bending ($E_{angle}$)
      dihedral angle torsion ($E_{torsion}$)
      out-of-plane interactions ($E_{oop}$)
    $E_{nonbond} = E_{vdW} + E_{Coulomb} + E_{hbond}$
      van der Waals ($E_{vdW}$)
      electrostatic ($E_{Coulomb}$)
      hydrogen bond ($E_{hbond}$)
  F. Melani, Molecular Modeling in Chimica Farmaceutica

molecular geometry and molecular properties
  Force field: conformational energy (potential energy)
    definition by atom types
    atomic charges
    force constants, equilibrium values
    energy equations

molecular geometry and molecular properties
  [Figure: standard force field]

molecular geometry and molecular properties
  bond stretching ($E_{bond}$)
    quadratic: $k\,(r - r_0)^2$
    quartic: $k_2 (r - r_0)^2 + k_3 (r - r_0)^3 + k_4 (r - r_0)^4$
    Morse: $k\,[1 - e^{-\alpha (r - r_0)}]^2$
    [Figure: Morse and quadratic curves]
  valence angle bending ($E_{angle}$)
    quadratic: $k\,(\theta - \theta_0)^2$
    quartic: $k_2 (\theta - \theta_0)^2 + k_3 (\theta - \theta_0)^3 + k_4 (\theta - \theta_0)^4$
  dihedral angle torsion ($E_{torsion}$)
    $k\,[1 + \cos(n\phi - \phi_0)]$

molecular geometry and molecular properties
  out-of-plane interactions ($E_{oop}$)
    $k\,\chi^2$
    [Figure: planar centre with substituents H, R', O, R]

molecular geometry and molecular properties
  nonbond term ($E_{nonbond}$)
    van der Waals ($E_{vdW}$)
      $\sum_{i<j} \left( \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{6}} \right) = \sum_{i<j} E^0_{ij} \left[ \left( \frac{r^0_{ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{r^0_{ij}}{r_{ij}} \right)^{6} \right]$
    hydrogen bond ($E_{hbond}$)
      $\sum_{i<j} \left( \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right) = \sum_{i<j} E^0_{ij} \left[ 5 \left( \frac{r^0_{ij}}{r_{ij}} \right)^{12} - 6 \left( \frac{r^0_{ij}}{r_{ij}} \right)^{10} \right]$
    electrostatic ($E_{Coulomb}$)
      $\sum_{i<j} \frac{q_i q_j}{\varepsilon\, r_{ij}}$

molecular geometry and molecular properties
  Example: H2O (potential energy)
    $E = K_{OH} \left[ (b - b^0_{OH})^2 + (b' - b^0_{OH})^2 \right] + K_{HOH} (\theta - \theta^0_{HOH})^2$
    $K_{OH}$, $b^0_{OH}$, $K_{HOH}$, and $\theta^0_{HOH}$ are parameters of the force field
    $b$ is the current length of one O-H bond
    $b'$ is the length of the other O-H bond
    $\theta$ is the H-O-H angle.

molecular geometry and molecular properties
  DOCKING
    The objective: searching for the orientations with low interaction energies.
    $E_{int} = \sum_{i} \sum_{j} \left( \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon\, r_{ij}} \right)$

molecular geometry and molecular properties
  MEP (molecular electrostatic potential)
    $V(p) = \sum_{A}^{\text{nuclei}} \frac{Z_A}{|R_A - p|} - \int \frac{\rho(r)}{|r - p|}\, dr$
    $V(p) = \sum_{i} \frac{q_i}{|r_i - p|}$
    electronic density: $\rho(r) = \sum_{\mu\nu}^{\text{basis functions}} P_{\mu\nu}\, \phi_\mu(r)\, \phi_\nu(r)$

molecular vibration
  [Figure]

protein structure
  Most proteins will fold spontaneously in water, so the amino acid sequence alone should be enough to determine protein structure
  However, the physics are daunting:
    20,000+ protein atoms, plus equal amounts of water
    Many non-local interactions
    Folding can take seconds (most chemical reactions take place ~10^12, i.e. 1,000,000,000,000x, faster)
  Empirical determinations of protein structure are advancing rapidly.

protein structure
  Proteins are polymers of amino acids linked by peptide bonds.
  Properties of proteins are determined by both the particular sequence of amino acids and by the conformation (fold) of the protein.
  Flexibility in the bonds around Cα: φ (phi), ψ (psi), sidechain

protein structure
  Protein structure is described in four levels
    Primary structure: amino acid sequence
    Secondary structure: local (in sequence) ordering into
      (α) Helices: compressed, corkscrew structures
      (β) Strands: extended, nearly straight structures
      (β) Sheets: paired strands, reinforced by hydrogen bonds; parallel (same direction) or antiparallel sheets
      Coils, Turns & Loops: changes in direction
    Tertiary structure: global ordering (all angles/atoms)
    Quaternary structure: multiple, disconnected amino acid chains interacting to form a larger structure

helices
  [Figure]

2 types of sheets
  anti-parallel
  parallel

turns
  [Figure]

combining secondary structures to make motifs
  DNA-binding helix-turn-helix
  Calcium-binding motif

24 ways to arrange adjacent hairpins
  [Figure]

alpha/beta domains
  Triosephosphate isomerase
  Dehydrogenase

Ramachandran plot
  [Figure]

Ramachandran plot
  [Figure label: always glycine]

protein structure cartoons
  [Figure]

protein structure representations
  [Figures]

protein structure
  Proteins are created linearly and then assume their tertiary structure by “folding.”
    Exact mechanism is still unknown
    Proteins assume the lowest energy structure
      Or sometimes an ensemble of low energy structures.
    Hydrophobic collapse drives the process
    Local (secondary) structure proclivities
    Internal stabilizers: hydrogen bonds, disulphide bonds, salt bridges.
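The backbone flexibility around Cα described above is expressed through the dihedral angles φ and ψ, which are also the two axes of a Ramachandran plot. A minimal sketch of computing a dihedral angle from four atom positions follows. The function name and the plain-tuple coordinate format are assumptions made for illustration, and the sign of the result follows the right-handed convention coded here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, about the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For φ the four points would be C(i-1), N(i), Cα(i), C(i); for ψ they would be N(i), Cα(i), C(i), N(i+1). Plotting the (φ, ψ) pair for every residue gives the Ramachandran plot.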
CaM Kinase II structure
  serine-threonine protein kinase
  calmodulin regulation
  multimer formation
    12 subunits with the catalytic domains facing out

sequence comparison
  unc-43   --------------------MQLQQINSGAFSVVRRCVHKTTGLEFAAKIINTKKLSARD
  rCaMKII  -------MATITCTRFTEEYQLFEELGKGAFSVVRRCVKVLAGQEYPAKIINTKKLSARD
  hCaMKI   MLGAVEGPRWKQAEDIRDIYDFRDVLGTGAFSEVILAEDKRTQKLVAIKCIAKEALEGKE
  rCaMKI   MPGAVEGPRWKQAEDIRDIYDFRDVLGTGAFSEVILAEDKRTQKLVAIKCIAKKALEGKE
           .. **** * . . * * * ..

  unc-43   FQKLEREARICRKLQHPNIVRLHDSIQEESFHYLVFDLVTGGELFEDIVAREFYSEADAS
  rCaMKII  HQKLEREARICRLLKHPNIVRLHDSISEEGHHYLIFDLVTGGELFEDIVAREYYSEADAS
  hCaMKI   GS-MENEIAVLHKIKHPNIVALDDIYESGGHLYLIMQLVSGGELFDRIVEKGFYTERDAS
  rCaMKI   GS-MENEIAVLHKIKHPNIVALDDIYESGGHLYLIMQLVSGGELFDRIVEKGFYTERDAS
           .* * . . ..***** * * **. **.*****. ** . .*.* ***

  unc-43   HCIQQILESIAYCHSNGIVHRDLKPENLLLASKAKGAAVKLADFGLAIEVN-DSEAWHGF
  rCaMKII  HCIQQILEAVLHCHQMGVVHRDLKPENLLLASKLKGAAVKLADFGLAIEVEGEQQRWFGF
  hCaMKI   RLIFQVLDAVKYLHDLGIVHRDLKPENLLYYSLDEDSKIMISDFGLSKMED-PGSVLSTA
  rCaMKI   RLIFQVLDAVKYLHDLGIVHRDLKPENLLYYSLDEDSKIMISDFGLSKMED-PGSVLSTA
           . * *.*... * *.*********** * . . ..****.

  unc-43   AGTPGYLSPEVLKKDPYSKPVDIWACGVILYILLVGYPPFWDEDQHRLYAQIKAGAYDYP
  rCaMKII  AGTPGYLSPEVLRKDPYGKPVDLWACGVILYILLVGYPPFWDEDQHRLYQQIKARAYDFP
  hCaMKI   CGTPGYVAPEVLAQKPYSKAVDCWSIGVIAYILLCGYPPFYDENDAKLFEQILKAEYEFD
  rCaMKI   CGTPGYVAPEVLAQKPYSKAVDCWSIGVIAYILLCGYPPFYDENDAKLFEQILKAEYEFD
           .*****..**** . ** * ** *. *** **** ***** ** .*. ** *..

  unc-43   SPEWDTVTPEAKSLIDSMLTVNPKKRITADQALKVPWICNRERVASAIHRQDTVDCLKKF
  rCaMKII  SPEWDTVTPEAKDLINKMLTINPSKRITAAEALKHPWISHRSTVASCMHRQETVDCLKKF
  hCaMKI   SPYWDDISDSAKDFIRHLMEKDPEKRFTCEQALQHPWIAGDTALDKNIH-QSVSEQIKKN
  rCaMKI   SPYWDDISDSAKDFIRHLMEKDPEKRFTCEQALQHPWIAGDTALDKNIH-QSVSEQIKKN
           ** ** .. ** * .. * ** *. .**. ***. . .* * . .**

  unc-43   NARRKLKGAILTTMIATRNLSSKRSYRLTLGAEKLVISMKNIEYWQVLLNKIFATYKIKM
  rCaMKII  NARRKLKGAILTTMLATRNFSGG-----------------------------------KS
  hCaMKI   FAKSKWKQAFNATAVVRHMR----------------------------------------
  rCaMKI   FAKSKWKQAFNATAVVRHMR----------------------------------------
           *. * * * .* . .

…continued sequence comparison
  unc-43   KQCRNLLNKKEQGPPSTIKESSESS-QTIDDNDSEKGGGQLKHENTVVRADGATGIVSSS
  rCaMKII  G--G---NKKNDG----VKESSESTNTTIEDED---------------------------
           ***. * .******. **.*.*

  unc-43   NSSTASKSSSTNLSAQKQDIVRVTQTLLDAISCKDFETYTRLCDTSMTCFEPEALGNLIE
  rCaMKII  ------------TKVRKQEIIKVTEQLIEAISNGDFESYTKMCDPGMTAFEPEALGNLVE
           **.*..**. *..*** ***.**..** **.*********.*

  unc-43   GIEFHRFYFD--GNRKNQ-VHTTMLNPNVHIIGEDAACVAYVKLTQFLDRNGEAHTRQSQ
  rCaMKII  GLDFHRFYFENLWSRNSKPVHTTILNPHIHLMGDESACIAYIRITQYLDAGGIPRTAQSE
           *..******. * ****.*** .*..*.. **.**...**.** * * **.

  unc-43   ESRVWSKKQGRWVCVHVHRSTQPSTNTTVSEF
  rCaMKII  ETRVWHRRDGKWQIVHFHRSGAPSVLPH----
           *.*** .. *.* **.*** **

protein structure basics
  proteins consist mostly of α-helices, β-sheets, and turns.
  the α-helices and β-sheets typically form the framework of the protein.
  the turns and other atypical structures often play important binding and catalytic roles.
  the core of the protein is hydrophobic, whereas the surface is usually polar or charged.
  most turns and kinks have glycines and prolines

protein structure
  [Figure: alpha helix]

protein structure
  [Figure: three-stranded antiparallel β-sheet]

protein structure
  [Figure: three-stranded antiparallel β-sheet, space filled]

protein structure
  substrate binding cleft
  rCaMKII  SPEWDTVTPEAKDLINKMLTINPSKRITAAEALKHPWISHRSTVASCMHRQETVDCLKKF
  rCaMKI   SPYWDDISDSAKDFIRHLMEKDPEKRFTCEQALQHPWIAGDTALDKNIH-QSVSEQIKKN
           ** ** .. *** * .. .* ** *. .**.****. . . .* * . .**

  rCaMKII  NARRKLKGAILTTMLATRN
  rCaMKI   FAKSKWKQAFNATAVVRHM
           *. * * *. .* . .
           316 297

sliced protein
  red - charged
  blue - polar
  green - hydrophobic

protein structure
  rCaMKII  HQKLEREARICRLLKHPNIVRLHDSISEEGHHYLIFDLVTGGELFEDIVAREYYSEADAS
  rCaMKI   GS-MENEIAVLHKIKHPNIVALDDIYESGGHLYLIMQLVSGGELFDRIVEKGFYTERDAS
           .* * . . .****** * * ** *** .**.*****. ** . .*.* *** 119

  rCaMKII  HCIQQILEAVLHCHQMGVVHRDLKPENLLLASKLKGAAVKLADFGLAIEVEGEQQRWFGF
  rCaMKI   RLIFQVLDAVKYLHDLGIVHRDLKPENLLYYSLDEDSKIMISDFGLSKMED-PGSVLSTA
           . * *.*.** * *.*********** * . . ..****. . 178

protein structure
  rCaMKII  HCIQQILEAVLHCHQMGVVHRDLKPENLLLASKLKGAAVKLADFGLAIEVEGEQQRWFGF
  rCaMKI   RLIFQVLDAVKYLHDLGIVHRDLKPENLLYYSLDEDSKIMISDFGLSKMED-PGSVLSTA
           . * *.*.** * *.*********** * . . ..****. . 178

  rCaMKII  AGTPGYLSPEVLRKDPYGKPVDLWACGVILYILLVGYPPFWDEDQHRLYQQIKARAYDFP
  rCaMKI   CGTPGYVAPEVLAQKPYSKAVDCWSIGVIAYILLCGYPPFYDENDAKLFEQILKAEYEFD
           .*****..**** . ** * ** *. *** **** ***** **..
           .*..** *.* 238

protein structure prediction
  [Figure: protein and model; credit: Goodsell, PDB]

protein structure prediction
  the 3-D structure of proteins is used to understand protein function and design new drugs

protein structure prediction
  Structural predictions just from raw protein sequence?
  1. ggcacgaggc acggctgtgc aggcacgcat gcaggccagc ….
  2. atctgcacgt ggttatgctg ccggagtttg ggccgccact….

protein structure prediction
  [Figure: panels 1 and 2]

protein structure prediction
  [Figure: sequence-based profiles along residue positions (50, 100): KD Hydrophobicity (5.0 to -5.0), Surface Prob. (10 to 0.0), Flexibility (1.2 to 0.8), Antigenic Index (1.7 to -1.7), CF Turns, CF Alpha Helices, CF Beta Sheets, GOR Turns, GOR Alpha Helices, GOR Beta Sheets, Glycosylation Sites]
  Particular structural features can be recognised in protein sequences

structure prediction
  Comparative modeling
    Modeling the structure of a protein that has a high degree of sequence identity with a protein of known structure
    Must be >30% identity to have a reliable structure

statistical methods
  Residue conformational preferences:
    helix:  Glu, Ala, Leu, Met, Gln, Lys, Arg
    strand: Val, Ile, Tyr, Cys, Trp, Phe, Thr
    turn:   Gly, Asn, Pro, Ser, Asp
  Chou-Fasman algorithm:
    Identification of helix and sheet "nuclei"
      helix - 4 out of 6 residues with high helix propensity
      sheet - 3 out of 5 residues with high sheet propensity
    Propagation until termination criteria met

structure prediction
  Threading/fold recognition
    Uses known fold structures to predict folds in primary sequence.

inverse protein folding
  based on the assumption that there is a limited number of structural protein classes (folds).
  One attempts to assign a new protein sequence to one of these classes.

fold recognition/threading
  ...MLDTNMKTQL KAYLEKLT KPVELIATL DDSAKSAEIKELL...
  structure library

fold recognition/threading
  ...MLDTNMKTQL KAYLEKLT KPVELIATL DDSAKSAEIKELL...
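The nucleation step of the Chou-Fasman algorithm listed under statistical methods (4 of 6 residues for a helix nucleus, 3 of 5 for a sheet nucleus) can be sketched as a sliding window. This is a simplified illustration only: it treats the residue preference lists from the slide as a yes/no classification, whereas the published method works with numeric propensity values, and it leaves out the propagation and termination steps. All function names are hypothetical.

```python
# One-letter codes for the residue preferences listed on the slide.
HELIX_FORMERS = set("EALMQKR")   # Glu, Ala, Leu, Met, Gln, Lys, Arg
STRAND_FORMERS = set("VIYCWFT")  # Val, Ile, Tyr, Cys, Trp, Phe, Thr


def find_nuclei(sequence, formers, window, required):
    """Return start indices of windows with at least `required` formers."""
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        count = sum(1 for residue in segment if residue in formers)
        if count >= required:
            hits.append(start)
    return hits


def helix_nuclei(sequence):
    """Windows of 6 residues with at least 4 helix formers."""
    return find_nuclei(sequence, HELIX_FORMERS, window=6, required=4)


def strand_nuclei(sequence):
    """Windows of 5 residues with at least 3 strand formers."""
    return find_nuclei(sequence, STRAND_FORMERS, window=5, required=3)
```

Calling `helix_nuclei` on a one-letter sequence returns the zero-based window starts from which a helix would then be propagated in both directions.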
structure prediction
  Ab initio
    Predicting structure from primary sequence data
    Generate as many conformations as possible, and assign an energy score to each one
    When the search terminates (usually when resources run out), the one with the lowest energy score is selected
    Usually neither as robust nor as practical; computationally intensive

function prediction
  Key problem: predict the function of protein structures based on sequence and structure information
  Function is loosely defined, and can be thought of at many levels
    Atomic or molecular level
    Pathways level
    Network level
    Etc.
  Currently, relatively little progress has been made in function prediction, particularly for higher order processes

function prediction: current methods
  Experimentation
    Experimentally determine the function of proteins and other structures
    The “gold standard” of function determination
    Expensive in terms of time and money

function prediction: current methods
  Annotation transfer
    When sequence or structure analysis yields correspondences between structures, the known properties and function of one are used to extrapolate the properties and function of the other
    This method has been extremely successful, but its drawbacks include [Bork et al., 1998]:
      Similar sequence or structure does not always imply similar function
      The annotated information about the “known” protein or its sequence or structure information in the database may be incomplete or incorrect
      Generally, only molecular functions of a protein can be inferred by analogy (i.e. not higher level functions)
      From a formal point of view, properties derived in this manner must be verified through experimentation

simulation-based analysis
  Simulation-based analysis tests hypotheses with in silico experiments, providing predictions to be tested by in vitro and in vivo studies.
  faster and more economical.
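The ab initio strategy outlined above (generate conformations, score each with an energy function, keep the lowest when the budget runs out) can be illustrated with a toy random search. Everything here is an assumption made for illustration: a conformation is just a list of torsion angles, and the scoring function is a sum of torsion-like terms 1 + cos(3φ) rather than a real force field.

```python
import math
import random


def toy_energy(angles):
    """Toy score: sum of torsion-like terms 1 + cos(3 * phi), each in [0, 2]."""
    return sum(1.0 + math.cos(3.0 * phi) for phi in angles)


def random_search(n_angles, budget, seed=0):
    """Sample `budget` random conformations and keep the lowest-energy one."""
    rng = random.Random(seed)
    best_angles, best_energy = None, float("inf")
    for _ in range(budget):
        candidate = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
        energy = toy_energy(candidate)
        if energy < best_energy:
            best_angles, best_energy = candidate, energy
    return best_angles, best_energy
```

With a fixed seed, a larger budget can only match or improve the best score, which mirrors the slide's point that the search stops when resources run out rather than when the global minimum is known to have been found.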
Example: Folding@Home

Folding@Home
  Simulates protein folding
  Folds dictate the function of the protein
    Reversible unfolding was demonstrated by Christian Anfinsen
    When proteins do not fold properly, it can lead to diseases such as Alzheimer’s disease, mad cow disease, and Parkinson’s disease
    If the fold of the protein is known then it can also be unfolded

Folding@Home
  Runs on a distributed system
  Runs as a screensaver
  Downloadable at: http://folding.stanford.edu

drug design
  structure-based drug design

structure-based drug design
  [Flowchart labels: Compound databases, microbial broths, plant extracts, combinatorial libraries; Random screening; Synthesis; 3-D ligand databases; Docking; Linking or binding; Receptor-ligand complex; Lead molecule; 3-D QSAR; Target enzyme OR receptor; 3-D structure by crystallography, NMR, electron microscopy OR homology modeling; Testing; Redesign to improve affinity, specificity etc.]

3D QSAR
  quantitative structure activity relationships
  to calculate and predict charge distribution, solubility, hydrophobicity, lipophilicity

active sites
  [Figure]

drug target site
  [Figure: Glutathione-GR]

drug target site
  [Figure: DHFR]

multiple alignments of DHFR
  CLUSTAL W (1.81) multiple sequence alignment

  chabaudi    -----------------------E--KAGCFSNKTFKGLGNEGGLPWKCNSVDMKHFSSV 35
  vinckei     -----------AICACCKVLNSNE--KASCFSNKTFKGLGNAGGLPWKCNSVDMKHFVSV 47
  berghei     MEDLSETFDIYAICACCKVLNDDE--KVRCFNNKTFKGIGNAGVLPWKCNLIDMKYFSSV 58
  yoelii      -----------AICACCKVINNNE--KSGSFNNKTFNGLGNAGMLPWKYNLVDMNYFSSV 47
  vivax       MEDLSDVFDIYAICACCKVAPTSEGTKNEPFSPRTFRGLGNKGTLPWKCNSVDMKYFSSV 60
  falciparum  -------------------------KKNEVFNNYTFRGLGNKGVLPWKCNSLDMKYFCAV 35
              * *. **.*:** * **** * :**::* :*

  chabaudi    TSYVNETNYMRLKWKRDRYMEK---------NNVKLNTDGIPSVDKLQNIVVMGKASWES 86
  vinckei     TSYVNENNYIRLKWKRDKYIKE---------NNVKVNTDGIPSIDKLQNIVVMGKTSWES 98
  berghei     TSYINENNYIRLKWKRDKYMEKHNLK-----NNVELNTNIISSTNNLQNIVVMGKKSWES 113
  yoelii      TSYVNENNYIRLQWKRDKYMGKNNLK-----NNAELNNGELN--NNLQNVVVMGKRNWDS 100
  vivax       TTYVDESKYEKLKWKRERYLRMEASQGGGDNTSGGDNTHGGDNADKLQNVVVMGRSSWES 120
  falciparum  TTYVNESKYEKLKYKRCKYLNKET----------VDNVNDMPNSKKLQNVVVMGRTNWES 85
              *:*::*.:* :*::** :*: * .:***:****: .*:*

  chabaudi    IPSKFKPLQNRINIILSRTLKKEDLAKEYN------NVIIINSVDDLFPILKCIKYYKCF 140
  vinckei     IPSKFKPLENRINIILSRTLKKENLAKEYS------NVIIIKSVDELFPILKCIKYYKCF 152
  berghei     IPKKFKPLQNRINIILSRTLKKEDIVNENN--NENNNVIIIKSVDDLFPILKCTKYYKCF 171
  yoelii      IPPKFKPLQNRINIILSRTLKKEDIANEDNKNNENGTVMIIKSVDDLFPILKAIKYYKCF 160
  vivax       IPKQYKPLPNRINVVLSKTLTKEDVK---------EKVFIIDSIDDLLLLLKKLKYYKCF 171
  falciparum  IPKKFKPLSNRINVILSRTLKKEDFD---------EDVYIINKVEDLIVLLGKLNYYKCF 136
              ** ::*** ****::**:**.**:. * **..:::*: :* :*****

  chabaudi    I----------------------------------------------------------- 141
  vinckei     IIGGASVYKEFLDRNLIKKIYFTRINNAYT------------------------------ 182
  berghei     IIGGSSVYKEFLDRNLIKKIYFTRINNSYNCDVLFPEINENLFKITSISDVYYSNNTTLD 231
  yoelii      IIGGSYVYKEFLDRNLIKKIYFTRINNSYN------------------------------ 190
  vivax       IIGGAQVYRECLSRNLIKQIYFTRINGAYPCDVFFPEFDESQFRVTSVSEVYNSKGTTLD 231
  falciparum  I----------------------------------------------------------- 137
              *

  chabaudi    ---------
  vinckei     ---------
  berghei     FIIYSKTKE 240
  yoelii      ---------
  vivax       FLVYSKVGG 240
  falciparum  ---------

binding site analysis
  In the absence of a structure of the target-ligand complex, it is not a trivial exercise to locate the binding site!
  This is followed by lead optimization.
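Two quantities that can be read off a multiple alignment like the DHFR one above are the pairwise percent identity (relevant to the >30% rule of thumb for comparative modeling) and the fully conserved columns that CLUSTAL marks with `*`. A minimal sketch follows, assuming equal-length aligned rows with `-` for gaps and counting identity over gap-free columns only. Other identity definitions use a different denominator, and the `:` and `.` similarity marks are not reproduced. The function names are illustrative.

```python
def percent_identity(row_a, row_b):
    """Percent identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def conserved_columns(rows):
    """Return a marker line with '*' under columns identical in every row."""
    marks = []
    for column in zip(*rows):
        fully_conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if fully_conserved else " ")
    return "".join(marks)
```

Applied to one block of the alignment, `conserved_columns` reproduces only the `*` positions of the consensus line.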
lead optimisation
  [Figure labels: Active site; Lead; Lead optimization]

drug design
  factors affecting the affinity of a small molecule for a target protein
    LIGAND·wat_n + PROTEIN·wat_m → LIGAND·PROTEIN·wat_p + (n+m-p) wat
    HYDROGEN BONDING
    HYDROPHOBIC EFFECT
    ELECTROSTATIC INTERACTIONS
    VAN DER WAALS INTERACTIONS
    STRAIN IN THE LIGAND (BOUND)
    STRAIN IN THE PROTEIN

difference between inhibitor and drug
  Extra requirements of a drug compared to an inhibitor
    Selectivity
    Less toxicity
    Bioavailability
    Slow clearance
    Reach the target
    Ease of synthesis
    Low price
    Slow or no development of resistance
    Stability upon storage as tablet or solution
    Pharmacokinetic parameters
    No allergies

thermodynamics of receptor-ligand binding
  Proteins that interact with drugs are typically enzymes or receptors.
  Drugs may be classified as:
    substrates/inhibitors (for enzymes)
    agonists/antagonists (for receptors)
  Ligands for receptors normally bind via non-covalent reversible binding.
  Enzyme inhibitors have a wide range of modes: non-covalent reversible, covalent reversible/irreversible, or suicide inhibition.
  Enzymes prefer to bind transition states (reaction intermediates) and may not optimally bind substrates, since part of the binding energy is used for catalysis.
  In contrast, inhibitors are designed to bind with higher affinity: their affinities often exceed the corresponding substrate affinities by several orders of magnitude!
  Agonists are analogous to enzyme substrates: part of the binding energy may be used for signal transduction, inducing a conformation or aggregation shift.

thermodynamics of receptor-ligand binding
  To understand what forces are responsible for ligands binding to receptors/enzymes, it is worthwhile considering what forces drive protein folding: they share many common features.
  The observed structure of a protein is generally a consequence of the hydrophobic effect!
  Secondary amides form much stronger H-bonds to water than to other sec.
amides.

hydrophobic collapse
Proteins generally bury hydrophobic residues inside the core, while exposing hydrophilic residues to the exterior.
Salt bridges inside.
Ligand-binding clefts in proteins often expose hydrophobic residues to solvent and may contain partially desolvated hydrophilic groups that are not paired: the desolvation penalty is paid for by favourable (hydrophobic) interactions elsewhere in the structure.

docking methods
Docking of ligands to proteins is a formidable problem since it entails optimization of the 6 positional degrees of freedom (three translational, three rotational).
Rigid vs flexible
Speed vs reliability
Manual interactive docking

GRID based docking methods
Grid based methods:
GRID (Goodford, 1985, J. Med. Chem. 28:849)
GREEN (Tomioka & Itai, 1994, J. Comp. Aided Mol. Des. 8:347)
MCSS (Miranker & Karplus, 1991, Proteins 11:29)
Functional groups are placed at regularly spaced (0.3-0.5 Å) lattice points in the active site and their interaction energies are evaluated.

automated docking methods
The basic idea is to fill the active site of the target protein with a set of spheres.
Match the centres of these spheres as well as possible with the atoms in a database of small molecules with known 3-D structures.
Examples: DOCK, CAVEAT, AUTODOCK, LEGEND, ADAM, LINKOR, LUDI.

drug binding pocket of L. casei DHFR

prediction & design of new drugs
Prediction of the 3-D structure of PfDHFR from bacterial DHFR using a homology modeling approach.
Search for compounds using bifunctional basic groups that could form stable H-bonds in a plane with the carboxyl group.
Optimize the structures of the small molecules and then dock them on the PfDHFR model.
Toyoda et al. (1997), BBRC 235:515-519, could identify two compounds.

identifying new leads
These two compounds, a triazinobenzimidazole and a pyridoindole, were found to be active, with high Ki, against recombinant wild-type DHFR.
This demonstrates the use of molecular modeling in malarial drug design.
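The grid based docking idea from the slides above (place a probe at regularly spaced lattice points in the active site and evaluate its interaction energy) can be sketched in a few lines. This is only an illustration, not the GRID, GREEN or MCSS algorithm: the probe is a single sphere, the energy is a plain 12-6 Lennard-Jones term, and the atom coordinates, epsilon and sigma values are made up for the example.

```python
import itertools
import math


def lj_energy(r, epsilon=0.2, sigma=3.4):
    """12-6 Lennard-Jones energy at distance r (Å); epsilon and sigma are illustrative."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def probe_grid(site_atoms, lo, hi, spacing=0.5, min_dist=0.5):
    """Sum probe-atom energies at every lattice point of the cube [lo, hi]^3."""
    n = int(round((hi - lo) / spacing)) + 1
    axis = [lo + i * spacing for i in range(n)]
    energies = {}
    for point in itertools.product(axis, repeat=3):
        total = 0.0
        for atom in site_atoms:
            # Clamp very short distances so a point on top of an atom stays finite.
            r = max(math.dist(point, atom), min_dist)
            total += lj_energy(r)
        energies[point] = total
    return energies


# Hypothetical "active site": three atoms, coordinates in Å.
site = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
grid = probe_grid(site, lo=-2.0, hi=6.0, spacing=0.5)
best_point, best_energy = min(grid.items(), key=lambda kv: kv[1])
print("lattice points:", len(grid))
print("most favourable point:", best_point, "energy:", round(best_energy, 3))
```

Real grid programs use several probe types (water, methyl, carbonyl oxygen and so on) and add electrostatic and hydrogen-bond terms. The spacing of 0.5 Å matches the upper end of the range quoted on the slide.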

physiome project
virtual human
Simulation of complex models of cells, tissues and organs
http://www.physiome.org/

physiome project
"A worldwide effort to define the physiome by developing databases and models which will facilitate the understanding of the integrative functions of cells, organs and organisms."

definition
The physiome is the quantitative and integrated description of the functional behavior of the physiological state of an individual or species.

physiome project
Main objective: "… to understand and describe the human organism, its physiology and pathophysiology quantitatively, and to use this understanding to improve human health."

physiome project
Specific objectives:
1. To develop a database with observations of physiological phenomena and interpret these in terms of mechanism (reductionism).
2. To integrate experimental information into quantitative descriptions of the functioning of humans and other organisms (modern integrative biology glued together via modeling).
3. To disseminate experimental data and integrative models for teaching and research.

physiome project
Specific objectives:
4. To foster collaboration amongst investigators worldwide, in an effort to speed up the discovery of how biological systems work.
5. To determine the most effective targets (molecules or systems) for therapy, either pharmaceutical or genomic.
6. To provide information for the design of tissue-engineered, biocompatible implants.

physiome project
Issues being addressed:
1. Markup language -- development of SBML (at Caltech) for representing biochemical networks, and CellML for electrophysiology, mechanics, energetics and general pathways.
2. Mathematical models -- development of models that are "anatomically based" and "biophysically based" to link gene, protein, cell, tissue, organ and whole-body systems physiology.

physiome project
Issues being addressed:
3.
Web-accessible databases -- groups at MIT and UCSD are developing standards for easy data exchange. Example databases: genomic databases, protein databases, material property databases, anatomical model databases, clinical databases.
4. Development of new instrumentation.
5. Development of modeling tools, GUIs and web-accessible tools for visualization of complex models.

physiome project
1. Microcirculation
A functional system common to all organs; it provides an important coupling between cells, tissues and organs.
http://www.bme.jhu.edu/news/microphys

physiome project
2. Musculo-skeletal system
Continues to extend the database of parameterised bone geometry to individual muscles, ligaments and tendons.
Anatomically detailed model of the skeleton. Rendered finite element mesh for the bones and a subset of the muscles (panels a, b).
http://www.bioeng.auckland.ac.nz/projects/nerf/skeletal.php

physiome project
Computational model of the skull and torso.
The layer of skeletal muscle is highlighted (a). The heart and lungs are shown within the torso (b).

physiome project
3. Cardiome Project
An attempt to provide an integrated model of the heart, incorporating electrical activation, mechanical contraction, energy supply and utilization, cell signaling and many other biochemical processes.
Heart model with a textured epidermal surface

physiome project
Fibrous-sheet architecture of the heart. Ribbons are drawn in the plane of the myocardial sheets (a) on the epicardial surface of the heart, (b) at midwall, and (c) on the endocardial surface. Note the large fibre angle changes. These fibre-sheet material axes are needed for computation of both myocardial activation and ventricular mechanics.
heart structure

physiome project
The finite element model of the right and left ventricles of the heart, showing various anatomical structures. Geometric information is carried at the nodes of the finite element mesh and interpolated with cubic Hermite basis functions.
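To make "interpolated with cubic Hermite basis functions" concrete, here is a one-dimensional sketch. Each node carries a value and a derivative, and four cubic basis functions blend them across the element. The heart model uses the multi-dimensional version of this idea on its mesh; the function names below are illustrative and are not taken from the project's software.

```python
def hermite_basis(t):
    """The four cubic Hermite basis functions on the unit interval, 0 <= t <= 1."""
    h00 = 2 * t**3 - 3 * t**2 + 1   # weights the value at node 0
    h10 = t**3 - 2 * t**2 + t       # weights the derivative at node 0
    h01 = -2 * t**3 + 3 * t**2      # weights the value at node 1
    h11 = t**3 - t**2               # weights the derivative at node 1
    return h00, h10, h01, h11


def hermite_interpolate(x, x0, x1, u0, u1, du0, du1):
    """Interpolate inside one element [x0, x1] from nodal values and derivatives."""
    length = x1 - x0
    t = (x - x0) / length
    h00, h10, h01, h11 = hermite_basis(t)
    # Derivatives are scaled by the element length to map them to the unit interval.
    return h00 * u0 + h10 * length * du0 + h01 * u1 + h11 * length * du1


# Check on f(x) = x**3 over the element [0, 2]: f(0)=0, f(2)=8, f'(0)=0, f'(2)=12.
value = hermite_interpolate(1.0, 0.0, 2.0, 0.0, 8.0, 0.0, 12.0)
print(value)
```

Because both values and derivatives are shared at the nodes, neighbouring elements join with a continuous slope, which is why this basis suits smooth anatomical surfaces.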
heart structure

physiome project
Mechanics of the cardiac cycle, computed by large deformation finite element analysis, at (a) zero-pressure state, (b) end-diastole, (c) mid-systole, (d) end-systole. Note the apex-to-base shortening and the twisting about the long axis. Also note the six generations of discretely modeled coronary vessels embedded within the myocardial elements, which are used to compute coronary flow throughout the cardiac cycle.
ventricular mechanics

physiome project
The collagenous structure of the extra-cellular myocardial tissue matrix, as revealed by confocal microscopy. The material axes used for defining mechanical and electrical constitutive laws in the continuum modeling of the myocardium are based on these microstructurally defined axes.
ventricular mechanics

physiome project
Activation wave front computed on the finite element model using finite difference techniques based on grid points which move with the deforming myocardium. Bi-domain current conservation equations are solved with trans-membrane ionic currents. The stimulus in this case is a point on the left ventricular endocardial surface near the apex. The activation sequence is heavily influenced by the fibrous-sheet architecture of the myocardium.
myocardial activation

physiome project
Computed flow in the coronary vasculature
coronary perfusion

physiome project
Epicardial Fibers – FEM Model
Endocardial Fibers – FEM Model
www.ccmb.jhu.edu
ventricular fluid flow

physiome project
A human torso model has been developed which includes the heart, lungs and the layers of skeletal muscle, fat and skin. Current flow from the heart into the torso is computed in order to predict the body surface potentials arising from activation of the myocardium.

physiome project
4. Lungs
Development of models of the integrated function of various physical processes operating in the lung.
5. Bladder and Prostate
An anatomically detailed model of the bladder and prostate is being developed.
6.
Circulation System
A model of the circulation system is being developed based on the Visible Human Project dataset (http://www.nlm.nih.gov/research/visible)

future
Development of Precision Models
Simulation requires the integration of multiple hierarchies of models that have different scales and qualitative properties.
Some biological processes take place within milliseconds while others may take hours or days.
Example: protein folding vs. cell mitosis

future
Development of Precision Models
Biological processes can involve the interaction of different types of processes (e.g. biochemical networks coupled to protein transport, chromosome dynamics, cell migration or morphological changes in tissues).

future
Development of Precision Models
Types of modeling:
Using differential equations and stochastic simulation
Many cell biological phenomena require calculation of structural dynamics
Deformation of elastic bodies
Spring-mass models and other physical processes

the end