Atomistic Protein Folding Simulations on the Submillisecond Timescale Using Worldwide Distributed Computing

Qing Lu
CMSC 838 Presentation
Overview

Overview of talk

Motivation

Challenge

Methods

Ensemble Dynamics
Folding@Home
Evaluation

Observations

CMSC 838T – Presentation
Motivation


Atomistic simulation of protein folding

understand dynamics of folding

real-time folding in full atomic detail

large-scale parallelization methods
Benefits

protein folding & disease

proteins self-assemble to function
proteins misfold → diseases
nanotechnology



nanomachines
self-assemble on the nanoscale
Challenge

Difficulties

limited by current computational techniques



fastest proteins fold in microseconds
one CPU: 1 ns/day, so about 30 years to simulate one folding event
10,000-fold computational gap

1,000 CPUs: 1 microsecond/day
traditional parallelization scheme




hard to scale to a large number of processors
requires extremely fast communication
complexity of coordination
expensive supercomputers


cost
time-sharing
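
The gap quoted above can be checked with a few lines of arithmetic. This is only a sketch: the slide says the fastest proteins fold "in microseconds", and the 10 microsecond target used here is an assumption chosen because it reproduces the quoted figures of roughly 30 years and a 10,000-fold gap.

```python
NS_PER_US = 1_000  # nanoseconds per microsecond


def days_to_simulate(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to simulate target_ns at a given throughput."""
    return target_ns / ns_per_day


# Assumption: a fast-folding protein needs about 10 microseconds of simulated time.
target_ns = 10 * NS_PER_US

single_cpu_days = days_to_simulate(target_ns, 1.0)    # one CPU: 1 ns/day
single_cpu_years = single_cpu_days / 365.25
cluster_days = days_to_simulate(target_ns, 1_000.0)   # 1,000 CPUs: 1 microsecond/day

print(f"one CPU: {single_cpu_days:.0f} days (about {single_cpu_years:.0f} years)")
print(f"1,000 CPUs with ideal scaling: {cluster_days:.0f} days")
```

The 1,000-CPU figure assumes perfect scaling, which is exactly what the traditional parallelization scheme struggles to deliver.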
Method


ensemble dynamics

a new simulation algorithm

parallel simulation
Folding@Home

heterogeneous network, Internet

large-scale distributed platform
Simulation of Dynamics


free energy barrier

progress from one state to another: transition

thermal fluctuations push the system over the free energy barrier
previous approaches: sampling

may get stuck in metastable free energy minima

expensive computational cost of sampling
Ensemble Dynamics

application scenario

waiting time of transitions dominates total time
protein folding
transition: free energy barrier crossing
Algorithm

coupled simulations: transition coupling
M independent simulations from an initial condition
first simulation to cross the free energy barrier
→ takes M times less time to cross the barrier than the average
restart M simulations from the new location after the transition
near-linear speedup in number of processors

exponential kinetics: f(t) = 1 - exp(-k t)
if k * t is small, f(t) ≈ k * t
M simulations → M * f(t) ≈ M * k * t folding events
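
The kinetic argument above can be checked numerically. The sketch below draws folding times from an exponential distribution and counts how many of M simulations fold within a short time t. The rate k, the time t, and M are hypothetical values chosen so that k * t is small; they are not taken from the paper.

```python
import math
import random


def fraction_folded(k: float, t: float) -> float:
    """Exact exponential kinetics: f(t) = 1 - exp(-k t)."""
    return 1.0 - math.exp(-k * t)


def mean_folding_events(m: int, k: float, t: float, repeats: int, seed: int = 0) -> float:
    """Average number of the m simulations that fold within time t."""
    rng = random.Random(seed)
    total = 0
    for _ in range(repeats):
        # Each simulation's folding time is drawn from an exponential distribution.
        total += sum(1 for _ in range(m) if rng.expovariate(k) <= t)
    return total / repeats


# Hypothetical values: k in 1/ns and t in ns, so k * t = 0.01 is small.
M, K, T = 10_000, 0.001, 10.0
linear_estimate = M * K * T                 # M * k * t
exact_expectation = M * fraction_folded(K, T)  # M * f(t)
simulated = mean_folding_events(M, K, T, repeats=50)

print(f"M*k*t = {linear_estimate:.1f}")
print(f"M*f(t) = {exact_expectation:.1f}")
print(f"simulated mean = {simulated:.1f}")
```

The linear estimate and the exact expectation agree closely only while k * t stays small, which is the regime the slide assumes.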
Limitations

barrier crossing probability



exponential assumptions
correct transition detection

transition: free energy barrier crossing

detected by a large variance in energy (above a threshold)

correct detection is not guaranteed
multiple possible transitions

not addressed

selection of the first transition
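
To make the detection criterion concrete, here is a minimal sketch of a variance-threshold detector over an energy trace. It is not the detector used in the paper: the sliding window, the threshold value, and the sample trace are all hypothetical, and the function name `detect_transition` is introduced only for illustration.

```python
from statistics import pvariance
from typing import Optional, Sequence


def detect_transition(energies: Sequence[float], window: int, threshold: float) -> Optional[int]:
    """Return the index of the last frame of the first window whose energy
    variance exceeds the threshold, or None if no window does."""
    for end in range(window, len(energies) + 1):
        if pvariance(energies[end - window:end]) > threshold:
            return end - 1
    return None


# Hypothetical energy trace: a plateau, a jump, then a new plateau.
trace = [-100.0, -100.5, -99.8, -100.2, -100.1, -92.0, -85.0, -84.6, -85.2, -84.9]
hit = detect_transition(trace, window=4, threshold=5.0)
print("transition flagged at frame:", hit)
```

A threshold that is too low flags ordinary thermal noise and one that is too high misses real crossings, which is why the slide notes that correct detection is not guaranteed.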
Distributed Computing


Distributed simulations

M processors for each run

simulate folding in atomic detail on each processor

restart once a barrier-crossing event occurs
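
The run-and-restart loop described above can be sketched as a toy model. Real trajectories are replaced here by exponentially distributed waiting times, and the per-barrier rates are hypothetical. The point is only to show how restarting all M processors from the first crossing shortens the wall-clock time across a sequence of barriers.

```python
import random
from typing import Sequence


def ensemble_wall_clock(m: int, barrier_rates: Sequence[float], rng: random.Random) -> float:
    """Wall-clock time for m processors to cross each barrier in turn."""
    wall_clock = 0.0
    for k in barrier_rates:
        # All m processors start from the same state and run independently.
        crossing_times = [rng.expovariate(k) for _ in range(m)]
        # The first crossing ends the round; every processor restarts from the new state.
        wall_clock += min(crossing_times)
    return wall_clock


def mean_wall_clock(m: int, barrier_rates: Sequence[float], runs: int = 2000, seed: int = 0) -> float:
    """Average wall-clock time over many repetitions of the whole run."""
    rng = random.Random(seed)
    return sum(ensemble_wall_clock(m, barrier_rates, rng) for _ in range(runs)) / runs


rates = [0.01, 0.02, 0.005]  # hypothetical crossing rates, one per barrier
serial = mean_wall_clock(1, rates)
parallel = mean_wall_clock(100, rates)
print(f"1 processor: {serial:.1f}, 100 processors: {parallel:.2f}")
print(f"speedup: about {serial / parallel:.0f}x")
```

The speedup is close to M only because the toy model assumes exponential waiting times and perfect transition detection, the two limitations listed earlier.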
Implementation: Folding@Home

worldwide distributed computing: Internet

started in October 2000


more than 200,000 participants
10,000 CPU-years in the first 12 months
Folding@Home


client-server architecture

server assigns jobs (work units) to clients

client sends back results after computation

~100 KB data transfer between client and server
why is ensemble dynamics good for Folding@Home?

CPU-intensive jobs: a few hours, often days

connection speed: a modem is good enough

suitable for Folding@Home
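
A rough estimate shows why a modem suffices. The slide gives only "~100K" per transfer and "a few hours" per job. The 56 kbit/s nominal modem speed, the reading of 100K as about 100 kB, and the 4-hour job length below are all assumptions made for illustration.

```python
def transfer_seconds(payload_bytes: int, link_bits_per_second: float) -> float:
    """Time to move a payload over a link, ignoring latency and protocol overhead."""
    return payload_bytes * 8 / link_bits_per_second


WORK_UNIT_BYTES = 100_000   # "~100K" from the slide, read here as about 100 kB
MODEM_BPS = 56_000          # assumed nominal 56 kbit/s dial-up modem
COMPUTE_SECONDS = 4 * 3600  # assumed 4-hour work unit ("a few hours")

comm = transfer_seconds(WORK_UNIT_BYTES, MODEM_BPS)
overhead = comm / (comm + COMPUTE_SECONDS)

print(f"transfer time: {comm:.1f} s")
print(f"communication share of total time: {overhead:.2%}")
```

Even if real modem throughput were several times lower than the nominal rate, communication would remain a small fraction of each work unit.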
Other@Home Work

SETI@Home

search for intelligent life outside Earth
data analysis of signals
FightAIDS@Home

find drug therapy for HIV
how drugs interact with various HIV virus mutations
distributed projects




Divide-and-Conquer
CPU-intensive jobs
small pieces of data (kilobytes) transferred
communication not a major concern
Evaluation


Folding@Home

based on Tinker molecular dynamics code

voluntary participants worldwide, over 400,000 CPUs
simulate folding and unfolding

folding rates

simulations on small proteins
Folding Rates
Folding & Unfolding
Observations




Sampling

too expensive to run for long timescales

wastes too much time lingering in local energy minima
Ensemble dynamics

speed up simulations of dynamics

biological meaning of simulation results?

results on large protein folding?

limitations: correct transition detection, transition probability
Folding@Home

cheap way to achieve supercomputer-level computing power

huge distributed computing platform: over 400,000 CPUs

an efficient approach for CPU-intensive jobs
Complexity of problems and size of data increase rapidly

finding better algorithms is preferable to buying supercomputers