BOINC: A System for Public-Resource Computing and Storage
David P. Anderson
University of California, Berkeley

Paper overview
- Defines public-resource computing
- Introduces the BOINC platform
- Lists public-resource computing projects using the BOINC platform
- Discusses the BOINC implementation

Public-resource computing
- Also known as global computing or peer-to-peer (P2P) computing
- Combines the resources of personal computers and game consoles belonging to the general public to perform scientific computations
- Started with the Great Internet Mersenne Prime Search (GIMPS, 1996) and Distributed.net (1997)

Contrast with grid computing
- Grid computing involves "organizationally owned resources" that are:
  - Centrally managed by IT professionals
  - Powered on most of the time
  - Connected by high-speed links
- Malicious behavior is handled by the organization
- None of this is true for public-resource computing.

Specific requirements of public-resource computing systems
- Redundant computing, to guard against result falsification/fabrication
- Cheat-resistant accounting
- Support for user-configurable application graphics

BOINC
- Berkeley Open Infrastructure for Network Computing
- Developed at the U.C. Berkeley Space Sciences Laboratory by the SETI@home group
- SETI@home started in 1999 and still runs today

Goals of BOINC (I)
- Reduce the barriers to entry to public-resource computing:
  - A project can be run from a single computer running standard open-source software
- Share resources among autonomous projects:
  - Each PC owner can join multiple projects
  - This results in better resource utilization

Goals of BOINC (II)
- Support diverse applications:
  - Offer various data distribution mechanisms
  - Support various programming languages …
- Reward participants:
  - Mostly by giving them credits; the system must be cheat-resistant
  - Also by offering nice graphics: great screensavers!

Projects using BOINC (I)
- SETI@home: search for intelligent extraterrestrial life
- Predictor@home: protein behavior
- Folding@home: protein folding, misfolding, aggregation and related diseases
- Climateprediction.net: long-term climate prediction

Projects using BOINC (II)
- Climate@home.net: long-term climate prediction
- CERN projects: planned to use in-house PCs
- Einstein@home: gravitational waves
- UCB/Intel study of Internet resources

BOINC overview (I)
- Projects:
  - Each has a single master URL
  - Can involve one or more applications, which may change over time
- Server complex of a BOINC project:
  - Centered around a relational database containing most project data
  - Scheduling server daemons handle RPCs from clients
  - Data server daemons manage uploads

BOINC overview (II)
- BOINC offers tools for:
  - Creating, starting, stopping and querying projects
  - Adding new applications, new platforms, …
  - Creating workunits
  - Monitoring server performance
- Designed to be used by scientists, not IT professionals

BOINC overview (III)
- Participants join by registering with the project's web site and downloading the BOINC client
- The client can run as:
  - A screensaver (with fancy graphics)
  - A Windows service (running in the background)
  - An application (displaying results in tabular form)

Describing computations and data
- Applications:
  - Can have different versions for different participant platforms
  - Consist of one or more files
- Workunits:
  - Represent the inputs to a computation
  - Include parameters specifying the computation's resource requirements
- Results: the outputs of computations on workunits

More details
- Files associated with application versions, workunits, and results are immutable
- Files have numerous BOINC-specific attributes

Client/scheduler interactions
- When a client contacts a scheduling server, it:
  - Reports completed work
  - Gets a collection of workunits to process

Redundant computing (I)
- BOINC supports redundant computing
- A project can specify that each workunit be executed N times
- If M ≤ N of these executions agree on a particular result, that result becomes canonical

Redundant computing (II)
- Issues:
  - Must
    prevent cheaters from creating quorums of fabricated results
    - Each user can provide at most one result for each workunit
  - Must distinguish between erroneous results and mere numerical variations
    - Homogeneous redundancy: a given workunit is sent only to clients with identical platforms

Failure and backoff
- Must prevent server overload after a failure, when every client tries to reconnect at once
- All client/server communication uses exponential backoff after a failure
- This prevents avalanches of requests

Participant preferences
- General preferences let participants specify numerous parameters
- These include transfer limits (for participants with monthly transfer limits) and CPU duty cycles (for participants with overclocked CPUs)

Credit and accounting
- To reward participants

User community features
- To encourage emulation (friendly competition) among participants

Platform diversity issues
- BOINC maintains multiple versions of application executables
- Participants may also compile applications themselves before running them
- Anonymous platform mechanism, useful for:
  - Non-standard platforms
  - Security-conscious participants

Graphics and screensavers
- The client software appears monolithic to BOINC participants but comprises four components:
  - Core client: does the work
  - Client GUI: provides a user interface that shows computation progress and lets participants join and quit projects
  - API: reports CPU usage and fraction done, handles requests to provide graphics, …
  - Screensaver

Local scheduling
- The core client decides which projects to run
- Goals include:
  - Maximizing resource usage
  - Meeting result deadlines
  - Respecting each participant's resource share allocation among projects
  - Ensuring some "variety" among projects: participants want to see progress in all the projects they have joined

Conclusions
- It works
- More work still needs to be done

If you want to hear more
- David Anderson speaks about SETI@home and BOINC: http://www.youtube.com/watch?v=8iSRLIKx6A
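
The redundant-computing slides describe an M-of-N quorum in which each user may contribute at most one result per workunit, and agreement must tolerate small numerical variations. The following is a minimal Python sketch of that idea. It assumes that each result is a single hypothetical floating-point value and that "agreement" means equality within a relative tolerance. Real projects supply their own application-specific comparison. The names Result, results_agree, and select_canonical are illustrative only and are not BOINC's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    user_id: int   # participant who returned this result
    value: float   # hypothetical scalar output, for illustration only


def results_agree(a: Result, b: Result, rel_tol: float = 1e-6) -> bool:
    """Assumed comparison: equal up to a small relative numerical difference.

    Real projects supply their own application-specific comparison function.
    """
    scale = max(abs(a.value), abs(b.value), 1.0)
    return abs(a.value - b.value) <= rel_tol * scale


def select_canonical(results: list[Result], quorum: int) -> Optional[Result]:
    """Return a canonical result once `quorum` (M) results agree, else None.

    Only the first result from each user is counted, so one participant
    cannot build a quorum out of several fabricated results.
    """
    seen_users: set[int] = set()
    distinct: list[Result] = []
    for r in results:
        if r.user_id not in seen_users:
            seen_users.add(r.user_id)
            distinct.append(r)

    # Try each result as the candidate and count how many results agree with it.
    for candidate in distinct:
        agreeing = [r for r in distinct if results_agree(candidate, r)]
        if len(agreeing) >= quorum:
            return candidate
    return None


# N = 3 results for one workunit, quorum M = 2.
results = [Result(1, 10.0), Result(2, 10.0000001), Result(3, 42.0)]
print(select_canonical(results, quorum=2))
# Result(user_id=1, value=10.0)
```

In the example, the two results from users 1 and 2 differ only by numerical noise, so they form a quorum, and the outlier from user 3 is ignored. Two identical results submitted by the same user would count only once.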
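
The failure-and-backoff slide says that all client/server communication backs off exponentially after a failure, so that reconnecting clients do not cause an avalanche of requests. The following is a minimal Python sketch of randomized exponential backoff. The base delay (60 s), the cap (4 h), and the use of uniform "full jitter" are assumptions chosen for illustration. They are not BOINC's actual parameters or code.

```python
import random
from typing import Optional

# Assumed parameters for illustration; not BOINC's actual values.
BASE_SECONDS = 60.0        # bound on the first retry delay
MAX_SECONDS = 4 * 3600.0   # upper cap on any single delay


def backoff_delay(attempt: int, rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0 = first retry).

    The upper bound doubles with each consecutive failure, up to MAX_SECONDS.
    The actual delay is drawn uniformly below that bound ("full jitter"), so
    clients that all failed at the same moment spread out their reconnections.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    if rng is None:
        rng = random.Random()
    # Clamp the exponent so very long outages cannot overflow the float math.
    upper = min(MAX_SECONDS, BASE_SECONDS * 2 ** min(attempt, 32))
    return rng.uniform(0.0, upper)


rng = random.Random(42)
for attempt in range(6):
    print(f"retry {attempt}: wait {backoff_delay(attempt, rng):.0f} s")
```

The randomization is what breaks up the avalanche. With a purely deterministic doubling, every client that lost contact at the same moment would retry at the same moment again.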
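
The local-scheduling slide lists two goals: respecting resource shares and keeping some variety among projects. One simple technique that meets both is debt-based weighted round-robin. In this approach, each project accrues "debt" in proportion to its share, and the project owed the most CPU time runs next. The following Python sketch is offered under assumptions. The names Project and schedule and the fixed one-hour time slice are hypothetical. The sketch ignores result deadlines and is not presented as BOINC's actual scheduler.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Project:
    name: str
    resource_share: float  # participant-assigned share (hypothetical units)
    debt: float = 0.0      # CPU seconds the project is owed vs. its fair share


def schedule(projects: list[Project], slices: int,
             slice_seconds: float = 3600.0) -> list[str]:
    """Choose which project runs in each time slice (debt-based round-robin).

    Each slice, every project is credited its share of the slice; the project
    owed the most CPU time runs and is charged for the whole slice. Result
    deadlines are ignored in this sketch.
    """
    total_share = sum(p.resource_share for p in projects)
    timeline: list[str] = []
    for _ in range(slices):
        for p in projects:
            p.debt += slice_seconds * p.resource_share / total_share
        # Ties go to the project listed first.
        chosen = max(projects, key=lambda proj: proj.debt)
        chosen.debt -= slice_seconds
        timeline.append(chosen.name)
    return timeline


print(schedule([Project("A", 2.0), Project("B", 1.0)], slices=6))
# ['A', 'B', 'A', 'A', 'B', 'A']
```

With shares of 2 and 1, the six slices split 4:2, matching the allocation. Project B still runs every few slices instead of waiting for A to finish, which reflects the "variety" goal.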