Secure Location-Independent Autonomic Storage Architectures
GR/S44501/01
February 2004 - January 2007
Graham Kirby, Alan Dearle, Ron Morrison & Stuart Norcross
School of Computer Science, University of St Andrews
{graham, al, ron, stuart}@dcs.st-and.ac.uk
EPSRC e-Science 26/3/04

Project Aims
- Desirable features of a data storage system
    - unbounded capacity
    - zero latency & cost
    - total reliability
    - location independence
    - simple interface
    - complete security
    - complete historical archive
- Aim: a storage architecture approximating above, focusing on:
    - simple interface for end user (file system)
    - abstracting over:
        - user location
        - physical devices
    - provision of significant benefits with acceptable cost

Potential Benefits
- Simplify user experience
    - ‘home directory’ ubiquitously available, irrespective of:
        - machines and disks
        - physical location
        - firewalls
    - data highly durable
        - no need for backup
    - simple data sharing
        - uniform global name space
- Historical views
    - data never over-written

Potential Hurdles to User Adoption
- Speed and convenience
    - must be close enough to that of a local disk
- Users must be able to trust system
    - not to allow inappropriate access to data by other users
    - to be sufficiently reliable for serious evaluation
- Need viable exit strategy
    - may require that system can reproduce effects of user’s existing backup regime
    - e.g. by maintaining a local copy of all data
- Financial cost
- Critical mass of nodes and users required
    - envisaged architecture relies on autonomic management of large numbers of nodes
- Storage overhead must be low enough
    - incurred through replication of data

User Control
- End users should deal only with very high-level configuration
    - set broad goals regarding trade-offs (or ignore completely)
    - task of autonomic management system to try to achieve these goals
- Examples of trade-offs
    - speed of reads and writes
    - durability
        - related to number and placement of replicas
    - consistency
        - both absolute & time to converge
        - how long before updates to shared data are visible to others?
    - resource consumption
        - storage, bandwidth, computation

Control Example
(diagram only)

Control and Feedback Example
(diagram only)

Implementation Approach
- File system interface
    - SMB or NFS
- Replication of files or fragments
    - erasure-resilient encoding
- Routing to data
    - abstracted by peer-to-peer overlay, e.g. Tapestry
- Placement of data
    - controlled explicitly
- Probes & gauges to monitor state of system
    - publish/subscribe infrastructure, e.g. Siena
- Autonomic management elements
    - attempt to map user goals and probe events into suitable low-level actions

Challenges
- Core distributed storage infrastructure
    - appropriate replication mechanisms
- Autonomic management
    - low-level policies
    - probe & gauge infrastructure
    - high-level views for users
    - synthesising views from low-level events
    - heuristics for adapting low-level policies to achieve high-level goals
- Evaluation
    - simulation, local cluster, PlanetLab
    - end-user adoption

Conclusions
- Aim to design, implement and evaluate distributed storage system targeted at benefits to end-user
    - very simple interface
    - ubiquitously available
    - highly durable
    - append-only: historical views
- Project details
    - http://www-systems.dcs.st-and.ac.uk/asa/
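
The slides name storage overhead from replication as an adoption hurdle and list erasure-resilient encoding as the replication mechanism. The following is a minimal sketch of why that pairing can matter. It is not taken from the project. It assumes every node is reachable independently with the same probability `p`, which real deployments will not satisfy exactly. All function names and the example parameters are hypothetical.

```python
from math import comb


def storage_overhead_replication(replicas: int) -> float:
    """Bytes stored per byte of user data when keeping whole copies."""
    return float(replicas)


def storage_overhead_erasure(m: int, n: int) -> float:
    """Bytes stored per byte of user data for an m-of-n erasure code.

    The data is split into m fragments and encoded into n fragments,
    any m of which are enough to rebuild it.
    """
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    return n / m


def replication_availability(p: float, replicas: int) -> float:
    """Probability that at least one whole copy is reachable.

    Assumes each replica sits on a node that is up independently
    with probability p (an illustrative simplification).
    """
    return 1.0 - (1.0 - p) ** replicas


def erasure_availability(p: float, m: int, n: int) -> float:
    """Probability that at least m of the n fragments are reachable.

    Same independence assumption as above: a binomial tail sum.
    """
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(m, n + 1)
    )


if __name__ == "__main__":
    p = 0.9  # hypothetical per-node availability
    print("2 replicas:  overhead", storage_overhead_replication(2),
          "availability", replication_availability(p, 2))
    print("4-of-8 code: overhead", storage_overhead_erasure(4, 8),
          "availability", erasure_availability(p, 4, 8))
```

With these illustrative inputs both layouts store twice the original data, yet under the independence assumption the 4-of-8 layout is more likely to be readable than two whole replicas. An autonomic manager of the kind described under User Control could, in principle, use a calculation like this when mapping a user's durability goal to a choice of `m` and `n`, although the slides do not say how that mapping is done.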