Atomistic Protein Folding Simulations on the Submillisecond Timescale Using Worldwide Distributed Computing

Qing Lu
CMSC 838 Presentation

Overview
- Overview of talk
  - Motivation
  - Challenge
  - Methods
    - Ensemble Dynamics
    - Folding@Home
  - Evaluation
  - Observations

CMSC 838T – Presentation

Motivation
- Atomistic simulation of protein folding
  - understand the dynamics of folding
  - real-time folding in full atomic detail
  - large-scale parallelization methods
- Benefits
  - protein folding and disease
    - proteins self-assemble to function
    - protein misfolding leads to diseases
  - nanotechnology
    - nanomachines self-assemble on the nanoscale

Challenge
- Difficulties
  - limited by current computational techniques
    - fastest proteins fold in microseconds
    - one CPU: 1 ns/day, 30 years
    - 10,000-fold computational gap
    - 1,000 CPUs: 1 microsecond/day
  - traditional parallelization schemes
    - hard to scale to a large number of processors
    - require extremely fast communication
    - complexity of coordination
  - expensive supercomputers
    - cost
    - time-sharing

Method
- Ensemble dynamics
  - a new simulation algorithm
  - parallel simulation
- Folding@Home
  - heterogeneous network, Internet
  - large-scale distributed platform

Simulation of Dynamics
- Free energy barrier
  - progress from one state to another: a transition
  - thermal fluctuations push the system over the free energy barrier
- Previous approaches: sampling
  - may get stuck in metastable free energy minima
  - sampling is computationally expensive

Ensemble Dynamics
- Application scenario
  - waiting time for transitions dominates total time
  - protein folding
    - transition: free energy barrier crossing
- Algorithm
  - coupled simulations: transition coupling
  - M independent simulations from an initial condition
  - first simulation to cross the free energy barrier
    - takes M times less time to cross the barrier than the average time
  - restart M simulations from the new location after the transition
- Near-linear speedup in the number of processors
  - exponential kinetics: f(t) = 1 - exp(-k t)
  - if k t is small, f(t) ≈ k t
  - M simulations: M f(t) ≈ M k t folding events

Limitations
- Barrier crossing probability
  - exponential assumptions
- Correct transition detection
  - transition: free energy barrier crossing
  - a large variance in energy: threshold
  - correct detection is not guaranteed
- Multiple possible transitions
  - not addressed
  - selection of the first transition

Distributed Computing
- Distributed simulations
  - M processors for each run
  - simulate folding in atomic detail on each processor
  - restart once a barrier-crossing event occurs
- Implementation: Folding@Home
  - worldwide distributed computing: Internet
  - started in October 2000
  - more than 200,000 participants
  - 10,000 CPU-years in the first 12 months

Folding@Home
[slide content not preserved]

Folding@Home
- Client-server architecture
  - server assigns jobs (work units) to clients
  - client sends back results after computation
  - ~100K of data transferred between client and server
- Why is ensemble dynamics good for Folding@Home?
  - CPU-intensive jobs: a few hours, often days
  - connection speed: a modem is good enough
  - suitable for Folding@Home

Other@Home Work
- SETI@Home
  - search for intelligent life outside Earth
  - data analysis of signals
- FightAIDS@Home
  - find drug therapies for HIV
  - how drugs interact with various HIV mutations
- Distributed projects
  - divide and conquer
  - CPU-intensive jobs
  - small pieces of data (kilobytes) transferred
  - communication not a major concern

Evaluation
- Folding@Home
  - based on the Tinker molecular dynamics code
  - volunteer participants worldwide, over 400,000 CPUs
- Simulate folding and unfolding
  - folding rates
  - simulations on small proteins

Folding Rates
[slide content not preserved]

Folding & Unfolding
[slide content not preserved]

Observations
- Sampling
  - too expensive to run for long timescales
  - wastes too much time lingering in local energy minima
- Ensemble dynamics
  - speeds up simulations of dynamics
  - biological meaning of simulation results?
  - results on large protein folding?
  - limitations: correct transition detection, transition probability
- Folding@Home
  - cheap way to achieve supercomputer-level computing power
  - huge distributed computing platform: over 400,000 CPUs
  - an efficient approach for CPU-intensive jobs
- Complexity of problems and size of data increase rapidly
  - finding better algorithms is preferable to buying supercomputers
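The near-linear speedup claimed on the Ensemble Dynamics slide can be checked numerically. The following is a minimal sketch, not the Folding@Home implementation: it assumes barrier crossing is a pure exponential (single-rate) process, as the slides do, and replaces each molecular dynamics run with a random waiting time. The rate K, the ensemble size M, and all function names are hypothetical values chosen for illustration.

```python
import math
import random


def first_crossing_time(rng, k, m):
    """Waiting time until the first of m independent runs crosses the barrier.

    Each run's crossing time is drawn from an exponential distribution
    with rate k (the exponential-kinetics assumption from the slides).
    """
    return min(rng.expovariate(k) for _ in range(m))


def mean_first_crossing_time(k, m, trials=10000, seed=0):
    """Monte Carlo estimate of the mean first-crossing time for m runs."""
    rng = random.Random(seed)
    total = sum(first_crossing_time(rng, k, m) for _ in range(trials))
    return total / trials


def expected_folding_events(k, t, m):
    """Expected number of crossings among m runs by time t: m * f(t)."""
    return m * (1.0 - math.exp(-k * t))


K = 0.01  # hypothetical crossing rate, in events per unit time
M = 100   # hypothetical number of simulations in the ensemble

single = mean_first_crossing_time(K, 1)    # close to 1 / K
ensemble = mean_first_crossing_time(K, M)  # close to 1 / (M * K)
speedup = single / ensemble                # close to M

print(f"mean wait, 1 run:    {single:.2f}")
print(f"mean wait, {M} runs: {ensemble:.2f}")
print(f"estimated speedup:   {speedup:.1f}")
```

Under these assumptions the estimated speedup should come out close to M, and for small k t the value of expected_folding_events(K, t, M) stays close to M * K * t. If the kinetics are not single-exponential, which the Limitations slide flags as a concern, the speedup need not be linear.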
