Boundary-element methods in biological science and engineering Jaydeep P. Bardhan Dept. of Electrical and Computer Engineering Northeastern University, Boston MA Four Points For Today 1. Cellular and molecular biomedical problems also need efficient simulation methods. 2. Fast BEM solvers represent an appealing approach even at the molecular scale! 1. Challenge: Persuading community to abandon beloved ad hoc fast methods for systematic ones. 2. Strategy: Systematic methods are more flexible as we add new physics and address inverse problems. Point 1: Efficient solvers are needed not only for macroscopic biomedical problems… 100 m Human body 10-1 m 10-2 m Organs/ tissues Electrical Impedance Tomography (EIT) Electrocardiography (ECG) 10-5 m 10-9 m Human cells Molecules Tumor growth Transport through blood vessel walls Human eye for keratoplasty Balsim et al. ‘10 Lowengrub et al. ‘09 Electroencephalography (EEG) Brain-computer interfaces Cochlea (ear) Peratta et al. ‘08 Hip prostheses ... And MEG Prof. Bin He, UMN Briare et al. ‘00 Sfantos et al ‘07 … but for microscopic ones as well! 100 m Human body 10-1 m 10-2 m Organs/ tissues Blood flow Quantum mechanics 10-5 m 10-9 m Human cells Molecules Biomolecule electrostatics and hydrodynamics Nanotechnology (quantum dots) Rahimian, Biros et al (2010) • Drug binding • Protein folding • Cell physiology • Molecular design Cell locomotion using flagella (sperm, bacteria) Gelbard ‘01 Cell “rolling” along tissue surface Molecular flexibility Ramia ‘91 Cell adhesion to surfaces under shear flow Gaver and Kute ‘98 King and Hammer ‘01 M. Bathe ‘08 A central molecular-scale modeling problem: water. Biology uses water to control molecular binding, protein folding, etc. Binding example is simple: Protein d=0 d=1 Basic Continuum Electrostatic Theory 100-1000 times faster than MD Protein model: o Shape: “union of spheres” (atoms) o Point charges at atom centers o Not very polarizable: = 2-4 Water model: no fixed charges Linearized PoissonBoltzmann equation o Single water: sphere of radius 1.4 Angstrom Modeling ions in solution is o Highly polarizable: = 80 In total: mixed-dielectric Poisson critical! But today’s focus is on the simpler math of “pure” water. A Boundary Integral Method For the Poisson Biomolecule Problem + + + + + +- - - Conservation law Constitutive relation + + + - - - - 1. Boundary conditions handled exactly 2. Point charges are treated exactly 3. Meshing emphasis can be placed directly on the interface Fast BEM Solvers are Essential 1. Solve Ax=b approximately using Krylov-subspace iterative methods such as GMRES: Memory growth is QUADRATIC Time is CUBIC!! 2. Compute dense matrix-vector product using O(N) method (fast multipole; tree code; precorrected FFT; FFTSVD) 3. Improve iterative convergence with preconditioning 4. For many problems, use diagonal entries! Iteration converges faster if matrix eigenvalues are “well clustered” PReplace “looks like” A-1 quadratic memory and cubic time requirements with LINEAR requirements! Application-Specific Challenges 1. “Continuum-solvent dynamics” o o 2. o o Replace water molecules with dielectric Calculate forces at each time step and integrate These lead to thousands, or even millions, of electrostatic Continuum post-processing of molecular dynamics simulations... Sample structures from explicit-water MD Compute average continuum energy from samples Some with identical dielectric 3. Electrostatic component boundaries, someanalysis with changing o Compute each atom’s interaction with every other boundaries! o Useful in drug design and protein engineering! Community’s Solution: Fast Ad Hoc Models Up to 100X faster than solving Poisson Define effective (nonphysical) parameters A given charge q in complex molecule gives rise to an energy E Find the radius R of a sphere that would have the same energy given a central charge Plug in to ad hoc (nonphysical) formula Distance between charges Effective radii Generalized Born theory Can give blatantly unphysical results … … exhibits incorrect dependence on dielectric constants… … needs all manner of handwaving justifications for improvements … … is VERY, VERY popular. BIBEE: A New, Rigorous Model of Continuum Electrostatics for Proteins “Boundary Integral Based Electrostatics Estimation” • Idea: Use preconditioner to approximate inverse No need to compute sparsified operator (saves time and memory) No need for Krylov solve • Test of elementary charges in a 20-Angstrom sphere: Single +1 charge +1, -1 charges 3 A apart BIBEE: Introducing Different Variants The preconditioning approximation takes into account the singular character of the electric-field kernel: The Coulomb-field approximation ignores the operator entirely: CFA seems better here… …and worse here. BIBEE: Natural, Rigorous Generalized Born + + BIBEE approx. charge includes all contributions R1 “Effective Born radius” - the radius of a sphere with the same solvation R2energy R3 Coulomb-field approximation: corresponds exactly to ignoring Stillthe equation: theoperator. basis of totally integral nonphysical Generalized Born (GB) models BIBEE/CFA is the extension of CFA to multiple charges! No ad hoc parameters, no heuristic interpolation Same approach taken by Borgis et al. in variational CFA Complementary Regimes of Accuracy • Small molecule’s reaction potential matrix and eigendecomposition (not the integral operator) • Top right: the electric fields induced by several eigenvectors of L, at the dielectric boundary • Charge distributions that generate uniform displacement fields are “like” low-order multipoles: CFA does well here and P does poorly • Small eigenvalues are associated with charge distributions that generate rapidly varying displacement fields; these are approximated well by P, not CFA V1 V2 V20 Goal: Make Fast Models More Rigorous Many advantages for chemists/biophysicists: 1. Enables systematic model improvement 2. Prove approximation properties 3. Leverage existing fast, scalable algorithms 4. Can add better physics as we learn them 5. Natural coupling to inverse problems We really want to approximate the dominant modes of the integral operator. The integral operator has to be split into two terms A hundred years of analysis • Eigenvalues are real in [-1/2,+1/2) • -1/2 is always an EV • Left, right eigenvectors of -1/2 are constants -1/10 -1/6 -1/2 BIBEE approximates E’s eigenvalues o P uses 0 (limit for sphere, prolate spheroid) o CFA uses -1/2 (known extremal) Sphere: analytical i Mathematical Rigor Enables Systematic Improvements Snapshots from MD Many parameters and ad hoc correction terms -1/10 -1/6 -1/2 i Dominant energies come from dominant modes: try to capture dipole/quadrupole modes approximately! BIBEE fluctuations track actual ones very closely – possible applications in uncertainty quantification Mean absolute error: 4% ! • This effective parameter is expected to be rigorously determined by approximating protein as ellipsoid (Onufriev+Sigalov, ‘06) Bardhan+Knepley, J. Chem. Phys. (in press) Goal: Make Fast Models More Rigorous Many advantages for chemists/biophysicists: Enables systematic model improvement 2. Prove approximation properties 3. Leverage existing fast, scalable algorithms 4. Can add better physics as we learn them 5. Natural coupling to inverse problems BIBEE/CFA Energy Is a Provable Upper Bound Feig et al. test set, > 600 proteins BIBEE/P is an effective lower bound, provable in some cases but not all Another variant (BIBEE/LB) is a provable LB but too loose to be useful Bardhan, Knepley, Anitescu (2009) The Reaction-Potential Matrix A weighted combination of charge distributions in the solute molecule produces a weighted combination of the individual responses: The “canonical” basis is the natural, atom-based point of view We can also use the eigenvector basis for analysis! In comparing models we don’t just have to use the total electrostatic solvation free energy Reaction-Potential Operator Eigenvectors Have Physical Meaning • Eigenvectors from distinct eigenvalues are orthogonal • Eigenvectors correspond to charge distributions that do not interact via solvent polarization (this confuses chemists) • If an approximate method generates a solvation matrix its eigenvectors should “line up” well with the actual eigenvectors, i.e. i=j , BIBEE in Separable Geometries For half-spaces, spheres, ellipsoids, BIBEE exactly reproduces actual eigenvectors. Proof for spheres, ellipsoids: use appropriate harmonics Question for future: What about near separable geometries? Bardhan and Knepley, 2011 BIBEE Is An Accurate, Parameter-Free Model Snapshots from MD Peptide example Met-enkephalin BIBEE’s stronger “diagonal” appearance indicates superior reproduction of the All models essentially the same here. eigenvectors of the look operator. SGB/CFA GBMV BIBEE/CFA Goal: Make Fast Models More Rigorous Many advantages for chemists/biophysicists: Enables systematic model improvement Prove approximation properties 3. Leverage existing fast, scalable algorithms 4. Can add better physics as we learn them 5. Natural coupling to inverse problems Pre-corrected FFT Algorithm • Potential calculation is a convolution. • Convolutions are “cheap” in frequency space • Green’s function independent! (Laplace, Helmholtz, Stokes, etc.) 1. 2. 3. 4. Project charges to grid Point-wise multiplication in frequency space Interpolate grid potentials “Pre-correct” so that local interactions are accurate Proteins Kuo, Altman, Bardhan, Tidor, White (2002) Circuit Simulation Aerodynamics Bioelectromagnetics Willis, Peraire, White Cadence Design Systems Phillips and White (1997) Higher-order Protein BEM A geometry representative of a protein: The barnase-barstar protein complex: 27 Bardhan + Altman et al., 2007 Altman + Bardhan, White, Tidor 2009 Develop scalable protein simulations with leaders in parallel computing + FMM 760-node GPU cluster Degima Parallel GPU FMM code Picture courtesy T. Hamada Cost of cluster: ~ US $420,000 Sustained: 34.6 Tflops Application to proteins with PetFMM code of Yokota, Cruz, Barba, Knepley, Hamada Performance/price: 80 Mflops/$ Scalable algorithms enable bigger science Lysozyme: ~2K atom charges, ~15K surface charges 1000 copies 800 Å 1000 lysozyme molecules: model of a concentrated protein solution 1 copy 100 copies 10 copies “How do proteins work in the crowded environment of the cell?” Yokota, Bardhan, et al. 2009 Goal: Make Fast Models More Rigorous Many advantages for chemists/biophysicists: Enables systematic model improvement Prove approximation properties Leverage existing fast, scalable algorithms 4. Can add better physics as we learn them 5. Natural coupling to inverse problems We are still adding physics to our models. “Classical” modeling: one can assume the model is right! Circuit simulation: Maxwell equations Solid mechanics: elasticity Airplane simulation: Navier-Stokes These are just the models associated with the molecular scale!! Speed CAD tools All simulate same thing! Accuracy Bio-modeling: “All models are wrong, some are useful”* Diverse set of flawed models. To avoid flaws, use expert insight. New models are always evolving! We have to connect multiple models (uncertainty quantification). --George Box Adding physics to the continuum model using nonlocal dielectric theory KNOWN weaknesses of Poisson model: 1. Linear response assumption Nonlinear dielectrics ARE important for some molecules! Test Caveat: with allatom molecular 2. Violates continuum-length-scale assumption dynamics Oxygen Relatively small deviation! Lone pair electrons Hydrogen bonds y=x denotes exactly Water molecules have finite size linear response Hydrogens Water molecules form semi-structured networks Nina, Beglov, Roux ‘97 Nonlocal Continuum Electrostatics: Lorentzian Model and Promising Tests Nonlocal response: Now Integrodifferential Poisson equation A. Hildebrandt et al. 2004 Green’s function for Single parameter fit for gives much better agreement with experiment!! Nonlocal Continuum Electrostatics: Reformulation for Fast Simulations Integrodifferential equations in complex geometries? Result: No progress on nonlocal model for DECADES Spherical ions, charges near planar half-spaces… nothing else. Breakthrough in 2004 (Hildebrandt et al.): 1. 2. 3. Define an auxiliary field: the displacement potential Molecular surface “Licorice” “Cartoon” Approximate the nonlocal boundary condition Double reciprocity leads to a boundary-integral method Nonlocal Continuum Electrostatics: Purely BIE Formulation Three surface variables, two types of Green’s functions, and a mixed first-second kind problem The derivation uses double reciprocity theory, which can be applied to nonlinear problems as well! Have derived exact solution for charges in a sphere Hildebrandt et al. 2005, 2007 Just as fast, but now with better physics! Unoptimized code still allows a laptop to solve 10X larger problems than is possible on a cluster with dense methods Current work: comparing to molecular dynamics simulations Required accuracy Dense methods used previously could not achieve useful accuracy! Local model Nonlocal model Bardhan and Hildebrandt, DAC ‘11 Nonlocal Continuum Electrostatics: Charge Burial and the pKa Problem Understanding charge burial energetics is important! o For protein folding, misfolding (Alzheimer’s), etc. o For two molecules binding (drug-protein, protein-protein, etc.) o For change in environment (pH, temperature, concentration, etc.) Ion or charged chemical group, alone in water Local theory needs unrealistically large dielectric constants to match experiment! 3 2 Error in pKa value (RMSD) 1 Ion or charged chemical group, buried in protein 0 Measured protein dielectric constants suggest = 2-5 5 20 Demchuk+Wade, 1996 40 60 80 Nonlocal Continuum Electrostatics: Charge Burial and the pKa Problem Nonlocal theory with realistic dielectric constant predicts similar energies as (widely successful) local theories with unrealistic dielectric constants! Bardhan, J. Chem. Phys. (in press) A Common Framework for Multiple Models GB-like fast nonlocal BIBEE provides a unifying, approximateapproach model scalable to testing and Fast GB-like nonlinear extending new of physics. Analytical solution approximations nonlocal model for sphere Improved GB models Explain Coulombfield approx. Linearized PB models Biomolecular complexes Full nonlinear PB via boundary-integrals Coupling to fast, scalable algorithms Advanced PB models (Bikerman, etc.) Dynamics: hybrid explicit/implicit, and fully implicit Popular quantum methods couple to exactly our Poisson problem (“polarizable continuum model”) Protein 1 Protein 2 Goal: Make Fast Models More Rigorous Many advantages for chemists/biophysicists: Enables systematic model improvement Prove approximation properties Leverage existing fast, scalable algorithms Can add better physics as we learn them 5. Natural coupling to inverse problems The Value of Systematic Approximations in Inverse Problems: Biomolecule Design The electrostatic contribution to binding is A total of three simulations is needed. Electrostatic Optimization of Biomolecules: Applications in Analysis and Design E. coli chorismate mutase inhibitors: o Analyzed by Kangas and Tidor o Suggested substitution experimentally verified: result is the tightest-binding inhibitor yet known Barnase/barstar protein complex: o Tight-binding complex o Optimal charge distribution closely matches “wild-type” charge distribution Mandal and Hilvert, 2003 Lee and Tidor, 2001 Challenge: Optimization is SLOW. 10 min/simulation * 2000 simulations (protein) = 2 CPU A Novel Method: The Reverse-Schur Approach For these PDE constraints, we really only need to solve multiple systems simultaneously: The unconstrained problem is therefore Constraints can be handled using standard methods (Lagrange multipliers, etc.) New Approach is Dozens to Hundreds of Times Faster, but Formally Exact Formally exact calculation Method scales comparably with normal PDE-constrained approaches 10 min/simulation = 20 min/optimization (no matter how many charges!) Bardhan et al., 2004; Bardhan et al., 2005; Bardhan et al., 2007; Bardhan et al., 2009 BIBEE as Inverse Problem Regularizer Regularization can be performed using “approximate” penalty functions: Approximated eigenvectors closely match actual ones No linear solve: Accurate but 10-20X faster than simulation!! BIBEE/P captures small eigenvalues very accurately identify number of directions to penalize Single +1 charge +1, -1 charges 3 A apart Application: Cyclin-Dependent Kinase 2 and Inhibitor PDE-constrained optimization is almost 200 times faster for this small molecule Red: Optimized charge values Blue: “Wild-type” charges (from 6-31G*/RESP) Anderson, et al. 2003 (not exactly the optimized ligand) Bardhan et al., J. Chem. Theory Comput. (2009) Summary: Pushing On All Dimensions 1. Fast, Scalable Numerical Methods 2. Add Realism But Preserve Speed 3. Solve Inverse Problems in Design 4. Unify Theories For New Science Four Points For Today 1. Cellular and molecular biomedical problems also need efficient simulation methods. 2. Fast BEM solvers represent an appealing approach even at the molecular scale! 1. Challenge: Persuading community to abandon beloved ad hoc fast methods for systematic ones. 2. Strategy: Systematic methods are more flexible as we add new physics and address inverse problems. Collaborators and Acknowledgments Fast methods: Michael Altman (Merck), Matt Knepley (U. Chicago), Rio Yokota (King Abdullah University of Science and Technology), Lorena Barba (Boston U.), Tsuyoshi Hamada (Nagasaki U.) Nonlocal continuum theory: Andreas Hildebrandt (Johannes Gutenberg U., Mainz), Peter Brune, David Green (SUNY Stony Brook) Fast optimization: Michael Altman, Bruce Tidor (MIT), Jacob White (MIT), Jung Hoon Lee (Merck), Sven Leyffer (Argonne) , Steve Benson (Argonne), David Green, Mala Radhakrishnan (Wellesley) Approximation method: Matt Knepley, Mihai Anitescu (Argonne), Mala Radhakrishnan Support from: 1. 2. 3. 4. Department of Energy (DOE) Computational Science Graduate Fellowship (CSGF) Wilkinson Fellowship in Math and Computer Science Division of Argonne National Lab NIH Technology Development (EUREKA) Rush New Investigator Award