Helix Nebula – Big Science in the Cloud
Micheál Higgins, Enterprise Solutions Architecture, CloudSigma

A Collaboration Initiative
• Demand side: user organisations
• Supply side: commercial service providers
• Framed by the European Commission's European Cloud Computing Strategy and relevant projects

Helix Nebula Flagship Use Cases
• ATLAS High Energy Physics Cloud Use – to support the computing capacity needs of the ATLAS experiment
• Genomic Assembly in the Cloud – a new service to simplify large-scale genome analysis, for a deeper insight into evolution and biodiversity
• SuperSites Exploitation Platform – to create an Earth Observation platform, focusing on earthquake and volcano research
• Common traits: scientific challenges with societal impact, sponsored by user organisations, stretching what is possible with the cloud today

Helix Nebula – The European Science Cloud
• Concept
  • Big Science needs to begin to use public clouds
  • The European GRID is aging out
  • Cloud-burst has a much better TCO
  • Avoid lock-in and procurement issues
  • Federation and identity management
  • Disintermediation of cloud vendor solutions
    • APIs
    • Drive image formats (KVM, ESXi, Xen, etc.)
    • Cost/billing models
    • Etc.
• Initial membership
  • Demand side: CERN, ESA and EMBL
  • Supply side: CloudSigma, Atos, T-Systems, Logica, Interoute and 4 non-provider SMEs

Helix Nebula – The European Science Cloud
• Phase I – Flagship Proof-of-Concepts
  • Q4 2011 to Q4 2012
  • CERN – LHC ATLAS jobs with PanDA/Condor
  • EMBL – de novo (non-human) genome assembly with StarCluster
  • ESA – SSEP Earth Observation site
• Environment
  • One-to-one tests: vendor API, vendor drive format, etc.
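The Phase I environment ran each proof-of-concept against a single vendor's API and native drive format, so moving an image between providers (say, from an ESXi cloud to a KVM cloud) means converting between formats. A minimal sketch using the standard `qemu-img` tool — an assumption, since the slides name no specific converter; the file names are hypothetical, and the command is returned rather than executed by default so it can be inspected as a dry run.

```python
import subprocess

def convert_image(src, dst, src_fmt="vmdk", dst_fmt="qcow2", execute=False):
    """Build (and optionally run) a qemu-img command that converts a disk
    image between hypervisor formats, e.g. ESXi's VMDK to KVM's qcow2."""
    cmd = ["qemu-img", "convert", "-f", src_fmt, "-O", dst_fmt, src, dst]
    if execute:  # requires qemu-img on PATH and a real source image
        subprocess.run(cmd, check=True)
    return cmd

# Dry run: show the conversion for a hypothetical ATLAS worker image.
print(" ".join(convert_image("atlas-worker.vmdk", "atlas-worker.qcow2")))
```

In a federation broker such as the Blue Box, a wrapper like this would sit behind the image-management layer rather than be called by end users directly.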
  • Success criteria were binary; no performance or cost data
  • CloudSigma successful in all 3 and continuing to run ATLAS live
• Phase II – PoCs with some Disintermediation
  • Q1 2013
  • Many-to-many
  • Blue Box 0.9 – remove complexity for the customer
  • enStratus + integration
• Phase III – Expanded Membership and Disintermediation
  • More demand- and supply-side partners
  • Increased Blue Box functionality – federated clouds

Blue Box Maturity Model
• Release 0.9 – enStratus – January 2013
  • Basic services catalog
  • Federation and identity management, token pass-through
  • Web portal and API translations
  • Some image management and cost reporting (not billing) – as time allows
• Release 1.0
  • Service catalog and cube filtering
  • Image factory and transport
  • Billing / payment module
  • Embedded monitoring
  • On-screen provisioning
  • Contextualization
• Post 1.0
  • Cluster management plug-ins (StarCluster, SGE, G-POD, etc.)
  • Payment gateways
  • PaaS for Science
  • Recipes and golden image management
  • SLA / OLA reporting
  • Data movement and open networking
  • Ecosystems

The Blue Box 0.9 (EC1/EC2)

The Blue Box – Production

Helix Nebula – Learnings
• Not all things are public-cloud suitable
  • This is not the GRID
  • Some science middleware is not cloud friendly
  • IPs and UUIDs change at re-boot
• The public cloud is commodity hardware
  • 2.2 to 2.3 GHz CPUs are common, less in older clouds
  • 32, 64 and 96 GB maximum RAM sizing – no 1 TB servers
  • Caution: you can't eat the whole physical server
• Parts of De Novo are single core, massive map-reduce
• Many science applications do not scale horizontally, yet
  • Software innovation by the vendors is required
• Putting data in the public cloud is still perceived as a risk
  • My firewall is better than your firewall – or is it?
• Burst utilization must be somewhat predictable
  • No, you may not have 65,000 large VMs for 2 hours later today
  • What do you mean I have to pay for it?!
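The "parts of De Novo are single core, massive map-reduce" learning can be illustrated with a toy k-mer count: the map step parallelizes across workers (and hence across cloud VMs), while the final merge is a sequential reduce that caps horizontal scaling. This is an illustrative pattern only, not EMBL's actual pipeline; threads are used for brevity, where a real CPU-bound job would use processes or separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def count_kmers(read, k=3):
    """Map step: count length-k substrings (k-mers) in one read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))

def count_all(reads, k=3, workers=4):
    """Map in parallel, then merge the partial counts sequentially --
    the merge is the single-core part that limits horizontal scaling."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda r: count_kmers(r, k), reads)
        return reduce(lambda a, b: a + b, partials, Counter())

reads = ["ACGTAC", "GTACGT"]  # hypothetical toy reads
print(count_all(reads).most_common(2))
```

The vendor-side software innovation the slide calls for is largely about making the reduce/merge stages distributable too, so the whole job can burst rather than just the map phase.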
CloudSigma
• Public IaaS provider since 2008
• Locations (Tier 4+ carrier-neutral datacenters)
  • Zurich, Switzerland (headquarters)
  • Las Vegas, USA
  • Amsterdam (in work)
  • São Paulo (planned 2012)
• Key values
  • Open platform and networking
  • Constant innovation (big secret: all-SSD storage!)
  • High availability and up-time SLAs
  • Customer relationships / Enterprise Architecture team
  • Standards adherence (OpenNebula, jclouds, etc.)
  • We only sell IaaS; we partner for SaaS and PaaS
  • The Ecosystem Concept

CloudSigma Features
• Granular resources, not bundles
  • CPU, cores, RAM, disk, SSD, GPUs, etc. – all virtualized
  • "Graphic equalizer" tuning – reboot required
  • Allows tuning the server to the workload
  • E.g. Oracle requires 1.5x to 2x the memory of a standard config
  • HEP/HPC applications are also not typical configurations
• Open architecture
  • KVM hypervisor with full virtualization; no sniffing, no root
  • Any x86 OS and application (no, you can't run Mac)
  • Public drives library – pay-per-use and bring-your-own
  • Open networking – 2x 10 Gb/s NICs
  • SigmaConnect and IX – private back-haul lines
  • Peering (e.g. SWITCH, GÉANT, etc.)
  • No customer lock-in – upload and download easily

CloudSigma Features
• In work for the future
  • All-SSD storage – only S3 is magnetic
  • JSON API
  • Virtualized GPUs
  • Virtualized H/W for transcoding and rendering
  • Virtual desktop – command and control
  • Additional science applications: PanDA/Condor, etc.
  • PaaS for Media and PaaS for Science

The Ecosystem Concept
• Hold the data and the world will come to the cloud
  • Low- to zero-cost data hosting
  • More margin in CPU and RAM than in storage
• Meta-metadata – joining the databases
  • Known point + vector + distance vs. lat/lon
  • ESA EO + WHO = mosquito outbreak predictions

The Easiest Way to Understand the CS Ecosystem Concept

Questions / Discussion
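As a worked footnote to the "known point + vector + distance vs. lat/lon" bullet above: joining datasets that encode location differently requires converting one representation to the other, and the standard great-circle destination formula does exactly that. This is a generic spherical-earth calculation, not code from ESA or CloudSigma; the coordinates used are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius, spherical model

def destination(lat_deg, lon_deg, bearing_deg, distance_km):
    """Convert (known point + bearing vector + distance) to a lat/lon
    pair using the great-circle destination formula."""
    phi1, lam1 = math.radians(lat_deg), math.radians(lon_deg)
    theta = math.radians(bearing_deg)      # bearing, clockwise from north
    delta = distance_km / EARTH_RADIUS_KM  # angular distance

    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    return math.degrees(phi2), math.degrees(lam2)

# Heading due north from the equator by ~one degree of latitude (~111.2 km).
print(destination(0.0, 0.0, 0.0, 111.195))
```

With a helper like this, an Earth Observation record stored as point-plus-vector can be resolved to latitude/longitude and joined against a lat/lon-keyed dataset such as epidemiological records.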