Volunteer computing - Boinc - University of California, Berkeley

Scientific Computing in the
Consumer Digital Infrastructure
David P. Anderson
Space Sciences Lab
University of California, Berkeley
The Austin Forum
November 7, 2013
Science needs computing power
● High-performance computing
● High-throughput computing
  – Thousands or millions of independent jobs
  – What matters is the rate of job completion, not the turnaround time of individual jobs
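A minimal sketch of the throughput-versus-turnaround distinction, assuming every worker runs one job at a time. The worker counts and job times are hypothetical, chosen only for illustration.

```python
def throughput_jobs_per_hour(workers: int, hours_per_job: float) -> float:
    """Steady-state completion rate when each worker runs one job at a time."""
    return workers / hours_per_job

# Hypothetical comparison: a few fast nodes vs. many slow consumer PCs.
fast = throughput_jobs_per_hour(workers=100, hours_per_job=1.0)
slow = throughput_jobs_per_hour(workers=10_000, hours_per_job=10.0)

print(f"fast nodes: {fast:.0f} jobs/hour, turnaround 1 hour")
print(f"slow nodes: {slow:.0f} jobs/hour, turnaround 10 hours")
```

In this made-up case the slow pool has ten times the turnaround but also ten times the completion rate, which is the quantity that matters for high-throughput work.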
High-throughput computing applications
● Physical simulation
  – particle collision
  – atomic/molecular (bio, nano)
  – Earth climate system
● Compute-intensive data analysis
  – particle physics (LHC)
  – astrophysics (radio, gravitational)
  – genomics
● Bio-inspired optimization
  – genetic algorithms, flocking, ant colony, etc.
Approaches to HTC
● Cluster computing
  – lots of commodity or rack-mounted PCs in a room
● Grid computing
  – share clusters between organizations
● Cloud computing
  – rent cluster nodes, e.g. Amazon EC2
● Volunteer computing
  – use computers owned by consumers
The Consumer Digital Infrastructure
● Computing devices
  – Desktop and laptop computers
  – Mobile devices: tablets, smartphones
  – Game consoles
  – Set-top boxes, DVRs
  – Appliances
● Commodity Internet
  – Cable, DSL, fiber to the home, cell networks
Measures of computing speed
● Floating-point operation (FLOP)
● GigaFLOPS (10^9/sec): 1 Central Processing Unit (CPU)
● TeraFLOPS (10^12/sec): 1 Graphics Processing Unit (GPU)
● PetaFLOPS (10^15/sec): 1 supercomputer
● ExaFLOPS (10^18/sec): current Holy Grail
CDI performance potential
● 1 billion desktop/laptop PCs
  – CPUs: 10 ExaFLOPS
  – GPUs: 1,000 ExaFLOPS
● 2.5 billion smartphones
  – CPUs: 10 ExaFLOPS
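The totals above can be reproduced with simple multiplication. The per-device speeds in this sketch are assumptions back-derived from the slide totals (about 10 GigaFLOPS per PC CPU, 1 TeraFLOPS per GPU, 4 GigaFLOPS per smartphone); the talk itself states only the device counts and the aggregates.

```python
GIGA, TERA, EXA = 1e9, 1e12, 1e18

def aggregate_exaflops(devices: float, flops_per_device: float) -> float:
    """Total peak speed of a device population, in ExaFLOPS."""
    return devices * flops_per_device / EXA

# Per-device speeds are assumed values, not figures from the talk.
pc_cpu = aggregate_exaflops(1e9, 10 * GIGA)
pc_gpu = aggregate_exaflops(1e9, 1 * TERA)
phone_cpu = aggregate_exaflops(2.5e9, 4 * GIGA)

print(f"PC CPUs:    {pc_cpu:,.0f} ExaFLOPS")
print(f"PC GPUs:    {pc_gpu:,.0f} ExaFLOPS")
print(f"Phone CPUs: {phone_cpu:,.0f} ExaFLOPS")
```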
Volunteer computing
● Consumers donate computing capacity to
  – support science
  – be in a community
  – compete
● History
  – 1997: GIMPS, distributed.net
  – 1999: SETI@home, Folding@home
  – 2003: BOINC
Limiting factors
● Volunteership
  – Study of college students [Toth 2006]
    ● 5% would “definitely participate”
    ● 10% would “possibly participate”
● PC availability
  – 65% average availability [Kondo 2008]
  – 35% of PCs are available 24/7
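A rough sketch of how the two limiting factors combine. Applying the college-student participation rate to a population of 1 billion PCs is an assumption made here for illustration; the cited studies do not make that extrapolation.

```python
def effective_hosts(population: float, participation: float, availability: float) -> float:
    """Number of full-time-equivalent hosts after both limiting factors."""
    return population * participation * availability

# Assumed inputs: 1 billion PCs, 5% "definitely participate", 65% availability.
hosts = effective_hosts(1e9, 0.05, 0.65)
print(f"about {hosts / 1e6:.1f} million full-time-equivalent PCs")
```

Under these assumptions, roughly 3% of the nominal PC capacity would actually be usable.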
Other limiting factors
● Network bandwidth (client, server)
  – Commodity Internet
● Memory, disk usage
  – new PCs average 6 GB RAM
BOINC: middleware for volunteer computing
● Supported by NSF since 2002
● Open source (LGPL)
● Based at University of California, Berkeley
● http://boinc.berkeley.edu
Volunteer computing with BOINC
[Diagram: volunteers linked by attachments to projects such as LHC@home, CPDN, and WCG]
How to volunteer
Choose projects
Configure
Community
Creating a BOINC project
● Install BOINC server software on a Linux box
● Compile apps for Windows/Mac/Linux
● Attract volunteers
  – develop web site
  – generate publicity
  – communicate with volunteers
Volunteer computing today
● 500,000 active computers
● 50 projects
● 15 PetaFLOPS average
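Dividing the two figures above gives the average contribution per active computer. This is a back-of-the-envelope check, not a number stated in the talk.

```python
PETA, GIGA = 1e15, 1e9

total_flops = 15 * PETA      # 15 PetaFLOPS average
active_computers = 500_000

per_computer_gflops = total_flops / active_computers / GIGA
print(f"about {per_computer_gflops:.0f} GigaFLOPS per active computer")
```

An average in this range, well above a single CPU's GigaFLOPS scale, would be consistent with some of the work running on GPUs.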
Some BOINC-based projects
● IBM World Community Grid
● Einstein@home
● Climateprediction.net
● LHC@home
● Rosetta@home
Cost
The cost of 10 TeraFLOPS for 1 year:
● CPU cluster: $1.5M
● Amazon EC2: $4M
  – 5,000 small instances
● Volunteer: ~$0.1M
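The figures above can be normalized to cost per TeraFLOPS-year, and the EC2 figure implies an hourly rate per small instance. The hourly rate is derived here from the slide's numbers and assumes the instances run all year; it is not a quoted Amazon price.

```python
HOURS_PER_YEAR = 365 * 24
TFLOPS = 10  # capacity being priced

cost_per_year = {"CPU cluster": 1.5e6, "Amazon EC2": 4e6, "Volunteer": 0.1e6}

# Cost of one TeraFLOPS for one year under each approach.
per_tflops_year = {name: cost / TFLOPS for name, cost in cost_per_year.items()}

# Implied price of one small instance for one hour (5,000 instances, full year).
ec2_instance_hour = cost_per_year["Amazon EC2"] / 5000 / HOURS_PER_YEAR

for name, cost in per_tflops_year.items():
    ratio = cost_per_year[name] / cost_per_year["Volunteer"]
    print(f"{name}: ${cost:,.0f} per TFLOPS-year ({ratio:.0f}x volunteer)")
print(f"implied EC2 rate: ${ec2_instance_hour:.3f} per instance-hour")
```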
How BOINC works
[Diagram: the BOINC client on a home PC talks over HTTP to the project's BOINC server: get jobs; download data, executables; compute; upload outputs]
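The cycle in the diagram (get jobs, download, compute, upload) can be sketched as a loop. This is an in-memory simulation only: `FakeServer`, `Job`, and `run_client` are hypothetical names, there is no real HTTP traffic, and none of this is the BOINC API.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    job_id: int
    input_data: list[int]

@dataclass
class FakeServer:
    """In-memory stand-in for a project server (hypothetical, not BOINC code)."""
    queue: list[Job]
    results: dict[int, int] = field(default_factory=dict)

    def get_jobs(self, max_jobs: int) -> list[Job]:
        # Hand out up to max_jobs jobs and remove them from the queue.
        batch, self.queue = self.queue[:max_jobs], self.queue[max_jobs:]
        return batch

    def upload(self, job_id: int, output: int) -> None:
        self.results[job_id] = output

def compute(job: Job) -> int:
    # Placeholder science: sum of squares of the input values.
    return sum(x * x for x in job.input_data)

def run_client(server: FakeServer, batch_size: int = 2) -> None:
    """Repeat the get jobs / compute / upload cycle until no work remains."""
    while True:
        jobs = server.get_jobs(batch_size)
        if not jobs:
            break
        for job in jobs:
            server.upload(job.job_id, compute(job))

server = FakeServer(queue=[Job(i, [i, i + 1]) for i in range(5)])
run_client(server)
print(server.results)
```

The client initiates every exchange, which is why plain HTTP suffices and home PCs behind firewalls can take part.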
Issues handled by BOINC
● Heterogeneous computers
● Untrusted, anonymous computers
  – Result validation
    ● replication, adaptive replication
● Credit: amount of work done
● Consumer-friendly client
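Replication-based validation can be sketched as a quorum check: the same job is sent to several computers, and a result is accepted once enough copies agree. The function name and the exact-match comparison are assumptions for illustration; a real project may need a looser comparison for floating-point outputs, and adaptive replication is not modeled here.

```python
from collections import Counter
from typing import Optional

def validate(results: list[str], quorum: int) -> Optional[str]:
    """Return the agreed result once `quorum` replicas match, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None

print(validate(["3.14", "3.14"], quorum=2))          # two replicas agree
print(validate(["3.14", "2.72"], quorum=2))          # disagreement: no result yet
print(validate(["3.14", "2.72", "3.14"], quorum=2))  # a third replica settles it
```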
Using GPUs
● BOINC detects and schedules GPUs
  – NVIDIA, AMD, Intel
  – multiple/mixed GPUs
  – various language systems (CUDA, OpenCL, CAL)
● Issues
  – non-preemptive GPU scheduling
  – no paging of GPU memory
Multicore apps
● Next-generation PCs may have 100 cores
● BOINC supports multi-core apps
  – OpenMP, MPI
  – OpenCL CPU apps
Using VM technology
● CDI platforms:
  – 85% Windows
  – 7% Linux
  – 7% Mac OS X
● Developing and maintaining versions for different platforms is hard
● Even making a portable Linux executable is hard
Virtual machines
[Diagram, layered stack from top to bottom: application; guest operating system; host operating system]
Virtual machines
[Diagram, example stack from top to bottom: application; Debian Linux 2.6; Windows 7]
BOINC VM support
● Create a VM image for your favorite environment
● Create executables for that environment
[Diagram labels: BOINC client; Vbox wrapper; VirtualBox executive; VM instance; shared directory holding the executable and the input and output files]
VM advantages
● Develop in your favorite environment
  – No need for multiple versions
● A VM is a strong “sandbox”
  – Can run untrusted applications
● Free “checkpointing”
BOINC on Android
● New GUI
● Battery-related issues
● Released July 2013
  – Google, Amazon App Stores
  – ~50K active devices
Why hasn’t volunteer computing gained traction?
● “Ecosystem of projects” model
  – Lots of competing projects
● Problems with this model
  – Creating/operating a project is too hard and risky
  – Volunteers need simplicity
  – No coherent PR; too many brands
Umbrella projects
● One project serves many scientists
● Examples
  – CAS@home (Chinese Academy of Sciences)
  – World Community Grid (IBM)
  – U. of Westminster (desktop grid)
  – Ibercivis (Spanish consortium)
Integrating BOINC
● HTCondor (U. of Wisconsin)
  – Goal: BOINC-based back end for Open Science Grid or any Condor pool
[Diagram labels: job submission; HTCondor node; grid manager; BOINC GAHP; BOINC server]
Integrating BOINC
● HUBzero (Purdue)
  – Goal: BOINC-based back end for science portals such as nanoHUB
[Diagram labels: Hub; BOINC server; PCs; projects]
Proposal: Science@home
● Single “brand” for volunteer computing
● Volunteers register for science areas rather than projects
● How to allocate computing power?
  – Involve the HPC, scientific funding communities
Implementing Science@home
● BOINC “account manager” architecture
[Diagram labels: BOINC client; Science@home; projects]
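One way to picture the proposal: the volunteer picks science areas, and an intermediary translates that choice into a list of projects for the client to attach to. The sketch below is hypothetical throughout. The area names, the assignment of projects to areas, and the function name are illustrative assumptions, not part of BOINC or of the proposal.

```python
# Illustrative mapping only; how areas and projects would really be matched
# (and how computing power would be allocated) is left open in the talk.
AREA_TO_PROJECTS = {
    "climate": ["Climateprediction.net"],
    "physics": ["LHC@home", "Einstein@home"],
    "biology": ["Rosetta@home"],
}

def projects_for(areas: list[str]) -> list[str]:
    """Return the projects to attach to for the chosen areas, without duplicates."""
    selected: list[str] = []
    for area in areas:
        for project in AREA_TO_PROJECTS.get(area, []):
            if project not in selected:
                selected.append(project)
    return selected

print(projects_for(["physics", "climate"]))
```

The volunteer never deals with individual project names, which is the simplification the single-brand proposal aims for.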
Summary
● Volunteer computing is
  – Usable for most HTC applications
  – A path to ExaFLOPS computing
  – A way to popularize science
● BOINC provides the software infrastructure
● Barriers are largely organizational
Contacts
● http://boinc.berkeley.edu
● davea@ssl.berkeley.edu