Managing Cloud Resources for Medical Applications
P. Nowakowski, T. Bartyński, T. Gubała, D. Harężlak, M. Kasztelnik, J. Meizner, M. Bubak
ACC CYFRONET AGH, Krakow, Poland
12-Oct-12 CGW'12, Cracow, October 22-24, 2012

Core concept: a cloud platform for medical application services and data
[Diagram: the developer installs any scientific application in the cloud (managed application), the administrator manages cloud computing and storage resources, and the end user accesses the available applications and data in a secure manner: a cloud infrastructure for e-science.]
• Install/configure each application service (which we call an Atomic Service) once, then use it multiple times in different workflows;
• Developers get direct access to raw virtual machines, with a wide range of operating systems to choose from (IaaS solution);
• Install whatever you want (root access to cloud Virtual Machines);
• The cloud platform takes over management and instantiation of Atomic Services;
• Many instances of an Atomic Service can be spawned simultaneously;
• Large-scale computations can be delegated from the PC to the cloud/HPC via a dedicated interface;
• Smart deployment: computations can be executed close to data (or the other way round).

A brief glossary
• Virtual Machine: a self-contained operating system image, registered in the Cloud framework and capable of being managed by VPH-Share mechanisms.
• Atomic Service: a VPH-Share application (or a component thereof) installed on a Virtual Machine and registered with the cloud management tools for deployment.
• Atomic Service Instance: a running instance of an Atomic Service, hosted in the Cloud and capable of being directly interfaced, e.g. by the workflow management tools or VPH-Share GUIs.
[Diagram: a raw OS on a cloud host becomes an Atomic Service once a VPH-Share application (or component) is installed; a running instance exposes the application through external APIs.]
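The glossary terms above can be sketched as a minimal data model. This is purely illustrative; the class and field names are assumptions, not the platform's actual API.

```python
from dataclasses import dataclass

# Illustrative sketch of the VM / Atomic Service / Atomic Service Instance
# relationship from the glossary; names are invented for the example.

@dataclass
class VirtualMachine:
    """A self-contained OS image registered in the cloud framework."""
    image_id: str
    os_name: str

@dataclass
class AtomicService:
    """A VPH-Share application (or component) installed on a VM."""
    name: str
    vm: VirtualMachine

@dataclass
class AtomicServiceInstance:
    """A running instance of an Atomic Service, hosted in the cloud."""
    service: AtomicService
    endpoint: str  # external API address used by workflow tools or GUIs

def instantiate(service: AtomicService, host: str, port: int) -> AtomicServiceInstance:
    # In the real platform, Atmosphere selects resources and boots the VM;
    # here we merely record the resulting endpoint.
    return AtomicServiceInstance(service, f"http://{host}:{port}/")

ubuntu_vm = VirtualMachine("img-0042", "Ubuntu")
segmenter = AtomicService("image-segmentation", ubuntu_vm)
asi = instantiate(segmenter, "10.100.0.7", 8080)
print(asi.endpoint)  # → http://10.100.0.7:8080/
```

The point of the split is reuse: one registered Atomic Service (the image) can back many simultaneously running instances, each with its own endpoint.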
Platform for three user groups
The goal of the platform is to manage cloud/HPC resources in support of VPH-Share applications by:
• Providing a mechanism for application developers to install their applications/tools/services on the available resources
• Providing a mechanism for end users (domain scientists) to execute workflows and/or standalone applications on the available resources with minimum fuss
• Providing a mechanism for end users (domain scientists) to securely manage their binary data in a hybrid cloud environment
• Providing administrative tools facilitating configuration and monitoring of the platform
[Diagram: end user support (easy access to applications and binary data), developer support (tools for deploying applications and registering datasets) and admin support (management of VPH-Share hardware resources) all go through the Cloud Platform Interface, which manages hardware resources, heuristically deploys services, ensures access to applications, keeps track of binary data and enforces common security over a hybrid cloud environment (public and private resources) hosting applications, generic services and data.]

Cloud Platform Architecture
[Diagram: modules of the Data and Compute Cloud Platform available in the first prototype. Admins, developers and scientists work through the VPH-Share Master UI (AS management interface, security management interface, computation and data management UI extensions, workflow description and execution, remote access to Atomic Service UIs). The AM Service and the generic AS invoker deploy Atomic Service Instances on the available resources, as required by workflow management or the generic AS invoker, using VM templates and AS images from the managed cloud infrastructure; the DRI Service watches the available datasets; the Atmosphere persistence layer (internal registry) backs the whole platform.] Each Atomic Service Instance is built from a raw OS (a Linux variant) extended with LOB federated storage access, a Web Service cmd.
wrapper, a Web Service security agent, a generic VNC server, generic data retrieval tools, the security framework, cloud stack clients, an HPC resource client/backend and a custom AS client, all hosted on physical resources.

End user's view of the cloud platform
Log into the Master Interface, select an Atomic Service, instantiate it, then access and use the application.
• Atomic Services can be instantiated on demand
• Once instantiated, the service can be accessed by the end user
• Unused instances can be shut down by Atmosphere

The Atmosphere Management Service
The core component of the VPH-Share cloud platform, responsible for managing cloud resources and deploying Atomic Services accordingly. It:
• receives requests from the Workflow Execution environment stating that a set of Atomic Services is required to process/produce certain data;
• queries the Component Registry to determine the relevant AS and data characteristics;
• collects infrastructure metrics;
• analyzes the available data and prepares an optimal deployment plan.
1. An application, the workflow environment or any other authorized entity requests access to an Atomic Service.
2. Atmosphere polls AIR (the Atmosphere Internal Registry, which stores all data on cloud resources, Atomic Services and their instances) for data regarding this AS and the available computing resources.
3. Atmosphere heuristically determines whether to recycle an existing instance or spawn a new one, and which computing resources to use when instantiating additional instances (based on cost information and performance metrics obtained from monitoring data). As an asynchronous process, it also collects monitoring data and analyzes the health of the cloud infrastructure to ensure optimal deployment of application services.
4.
Atmosphere calls the cloud middleware services (a selection of low-level middleware libraries used to manage specific types of cloud sites) to enforce the deployment plan on the computing infrastructure (a hybrid public/private cloud).
5. Atomic Service Instances are deployed as directed by Atmosphere.

Deployment planning
Applications are heuristically deployed on the available computing resources, with regard to the following considerations:
• where to deploy Atomic Services (a partner's private cloud site, public cloud infrastructure or a hybrid installation),
• whether the data should be transferred to the site where the Atomic Service is deployed or the other way around,
• how many instances should be started,
• whether it is possible to reuse predeployed AS instances (shared among workflows).
The deployment plan is based on an analysis of:
• workflow and Atomic Service resource demands,
• the volume and location of input and output data,
• the load of the available resources,
• the cost of acquiring resources on private and public cloud sites,
• the cost of using cheaper instances (whenever possible and sufficient; e.g.
EC2 Spot Instances, or S3 Reduced Redundancy Storage for some noncritical, temporary data),
• the public cloud provider's billing model.

High Performance Execution Environment
• Provides virtualized access to high performance execution environments
• Seamlessly provides access to high performance computing for workflows that require more computational power than clouds can provide
• Deploys and extends the Application Hosting Environment (AHE), which provides a set of web services to start and control applications on HPC resources
The Application Hosting Environment is an auxiliary component of the cloud platform, responsible for managing access to traditional (grid-based) high performance computing environments. It provides a Web Service interface for clients.
[Diagram: an end user, application or workflow environment presents a security token (obtained from the authentication service) and invokes the Web Service API of AHE to delegate computation to the grid. The user access layer (AHE Web Services, WSRF::Lite, in a Tomcat container, plus GridFTP and WebDAV) drives the resource client layer (HARC, RealityGrid SWS, job submission via OGSA BES / Globus GRAM), which delegates credentials, instantiates computing tasks, polls for execution status and retrieves results on behalf of the client against grid resources running a Local Resource Manager (PBS, SGE, LoadLeveler etc.).]

Service-based access to high-performance computational resources
[Diagram: developers and scientists reach HPC resources (National Grid Service) through the AHE service host (ozone.chem.ucl.ac.uk). The AHE service interface provides RESTful access to AHE applications and enables data staging and delegation of security credentials; the AHE service backend provides credential delegation, data staging and execution monitoring.]
Accessing grid resources through the AHE service frontend:
1. prepare: the end user selects a grid application for an appropriate computational resource registered with AHE, and starts an AHE Application Instance (job)
2. SetDataStaging: sets up data staging information between the grid infrastructure and the user resource
3. setProperty: sets up a job property
4. start: initiates the data transfer, executes the job, checks the job status and fetches the result once completed
5. status: polls the underlying grid infrastructure for the job status
The AHE service interface:
• Simplifies grid security (the end user does not have to handle grid security and MyProxy configuration and generation)
• Simplifies application setup on the grid (the end user does not have to compile, optimize, install and configure applications)
• Simplifies basic grid workflows (AHE stages the data, runs and polls the job and fetches the results automatically)
• Simplifies grid access through RESTful web services (AHE provides a RESTful interface allowing clients and other web services to access the computational infrastructure and applications in a Software as a Service (SaaS) manner).

Data Access for Large Binary Objects
The LOBCDER host (149.156.10.143) runs a WebDAV servlet in front of the LOBCDER service backend: a resource factory, a resource catalogue and pluggable storage drivers, including a driver for the SWIFT storage backend. Clients include any generic WebDAV client, the Data Manager Portlet (a VPH-Share Master Interface component on the core component host, vph.cyfronet.pl) and service payloads (VPH-Share application components) on Atomic Service Instances (10.100.x.x), where the store can be mounted on the local FS (e.g.
via davfs2).
• LOBCDER (the VPH-Share federated data storage component) enables data sharing in the context of VPH-Share applications
• The system is capable of interfacing various types of storage resources; it supports SWIFT cloud storage, with support for Amazon S3 under development
• LOBCDER exposes a WebDAV interface and can be accessed by any DAV-compliant client; it can also be mounted as a component of the local client filesystem using any DAV-to-FS driver (such as davfs2)

Data Reliability and Integrity
• Provides a mechanism which keeps track of binary data stored in the cloud infrastructure
• Monitors data availability
• Advises the cloud platform when instantiating Atomic Services
• Shifts/replicates data between cloud sites, as required
The DRI Service is a standalone application service, capable of autonomous operation. It periodically verifies access to any datasets submitted for validation and is capable of issuing alerts to dataset owners and system administrators in case of irregularities.
[Diagram: a configurable, registry-driven validation runtime (register files, get metadata, migrate LOBs, get usage stats etc.) consults AIR (binary data registry, validation policy) and an extensible resource client layer (Amazon S3, OpenStack Swift, Cumulus) to store and marshal data on distributed cloud storage; end-user features (browsing, querying, direct access to data) are exposed through the data management portlet of the VPH Master Interface, with DRI management extensions.]

Security Framework
• Provides a policy-driven access control system
• Provides an open-source access control solution based on fine-grained authorization policies.
• Implements Policy Enforcement, Policy Decision and Policy Management
• Ensures privacy and confidentiality of eHealthcare data
• Capable of expressing eHealth requirements and constraints in security policies (compliance)
• Tailored to the requirements of public clouds
[Diagram: VPH clients (applications, the workflow management service, developers, end users and administrators, or any authorized user capable of presenting a valid security token) reach the VPH Atomic Service Instances over the public internet through the VPH Security Framework.]

Authentication and authorization
• Developers, admins and scientists obtain access to the cloud platform via the Master Interface UI.
• The OpenID architecture enables the Master Interface to delegate authentication to any public identity provider (e.g. BiomedTown).
• Following authentication, the MI obtains a secure user token containing the current user's roles. This token is then used to authorize access to Atomic Service Instances, in accordance with their security policies.
1. The user selects "Log in with BiomedTown" in the VPH-Share Master Interface login portlet.
2. The authentication widget opens a login window and delegates the credentials to the BiomedTown Identity Provider (authentication service with users and roles).
3. The credentials are validated and a session cookie containing the user token (created by the Master Interface) is spawned.
4. When invoking an AS, the user token is passed along with the request header to the VPH-Share Atomic Service Instance.
5. The Security Proxy parses the user token, retrieves the roles and allows/denies access to the ASI according to the security policy.
6'. The request is relayed to the service payload (the VPH-Share application component) if authorized; an error (HTTP/401) is reported otherwise.

Handling security on the ASI level
Within a VPH-Share Atomic Service Instance:
1.
An incoming request arrives at the public AS API (SOAP/REST), exposed externally by the local web server (apache2/tomcat).
2. The Security Proxy intercepts the request.
3. The Security Proxy decrypts and validates the digital signature of the user token with the Master Interface's secret key. The user token carries a digital signature, a timestamp, a unique username, the assigned role(s) and additional info, e.g. a6b72bfb5f2466512ab2700cd27ed5f84f991422rdiaz!developer!rdiaz,Rodrigo Diaz,rodrigo.diaz@atosresearch.eu,,SPAIN,08018.
4. If the digital signature checks out, the Security Proxy consults the security policy to determine whether the user should be granted access on the basis of his/her assigned roles.
3', 4'. If the digital signature is invalid, or if the security policy prevents access given the user's existing roles, the Security Proxy returns an HTTP/401 (Unauthorized) error to the client.
5. Otherwise, the original request is relayed to the service payload: the actual application API, accessible from localhost only. The user token is included for potential use by the service itself.
6, 7. The service response is intercepted and relayed to the original client. This mechanism is entirely transparent from the point of view of the person/application invoking the Atomic Service.
In summary:
• The application API is only exposed to localhost clients
• Calls to Atomic Services are intercepted by the Security Proxy
• Each call carries a user token (passed in the request header)
• The user token is digitally signed to prevent forgery.
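The Security Proxy's token check can be sketched as follows. The actual VPH-Share token format and signing scheme are not given here, so this sketch assumes an HMAC-SHA256 signature over a `username!roles!timestamp` payload with a secret shared with the Master Interface; the secret, the policy table and all names are illustrative.

```python
import hashlib
import hmac
import time

SECRET = b"master-interface-shared-secret"  # hypothetical shared secret
POLICY = {"developer": {"deploy", "invoke"}, "scientist": {"invoke"}}  # toy role policy

def sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()

def make_token(username: str, roles) -> str:
    payload = f"{username}!{','.join(roles)}!{int(time.time())}"
    return f"{sign(payload)}!{payload}"

def check(token: str, action: str):
    """Return (status, payload): 401 on a bad signature or a policy denial."""
    signature, payload = token.split("!", 1)
    if not hmac.compare_digest(signature, sign(payload)):  # step 3': forged token
        return 401, None
    username, roles, _timestamp = payload.split("!")
    if not any(action in POLICY.get(r, set()) for r in roles.split(",")):
        return 401, None                                   # step 4': role denied
    return 200, payload                                    # step 5: relay request

token = make_token("rdiaz", ["developer"])
print(check(token, "deploy")[0])        # → 200 (cleared request)
print(check(token + "x", "deploy")[0])  # → 401 (tampered token rejected)
```

Note that the proxy never inspects the application payload itself; it only authenticates the token and matches roles against its policy, which is what keeps the mechanism transparent to the caller.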
This signature is validated by the Security Proxy
• The Security Proxy decides whether to allow or disallow the request on the basis of its internal security policy
• Cleared requests are forwarded to the local service instance

Platform Modules and Technologies
WP2 Component/Module: Technologies applied
• Cloud Resource Allocation Management: Java application with Web Service (REST) interfaces; OSGi bundle hosted in a Karaf container; Camel integration framework
• Cloud Execution Environment: Java application with Web Service (REST) interfaces; OSGi bundle hosted in a Karaf container; Nagios monitoring framework; OpenStack and Amazon EC2 cloud platforms
• High Performance Execution Environment: Application Hosting Environment with Web Service (REST/SOAP) interfaces
• Data Access for Large Binary Objects: standalone application preinstalled on VPH-Share Virtual Machines; connectors for OpenStack ObjectStore and Amazon S3; GridFTP for file transfer
• Data Reliability and Integrity: standalone application wrapped as a VPH-Share Atomic Service, with Web Service (REST) interfaces; uses LOB tools for access to binary data
• Security Framework: uniform security mechanism for SOAP/REST services; Master Interface SSO enabling shell access to virtual machines

Behind the scenes: Instantiating an Atomic Service Template (1/2)
• The Cloud Manager portlet enables developers to create, deploy, save and instantiate Atomic Service Instances on cloud resources.
1. The developer starts an Atomic Service from the Cloud Manager portlet (Development Mode) in the VPH-Share Master Interface.
2. The Cloud Facade (API) on the core component host (149.156.10.143) requests instantiation of the Atomic Service from the Atmosphere AMS.
3.
The AS VM details are retrieved from the Atmosphere Internal Registry (MongoDB; computational and storage models).
4. AMS calls Nova (head node, 149.156.10.131) to instantiate the selected VM.
5. The AS image is staged from the Glance image store onto the worker node (WN).
6. The VM image is uploaded to the per-WN storage.
7. The WN hypervisor (KVM) boots the VM: the Atomic Service Instance comes up on the OpenStack WN (10.100.x.x), with assigned local storage and mounted network storage.

Behind the scenes: Instantiating an Atomic Service Template (2/2)
• Atmosphere takes care of interpreting user requests and managing the underlying cloud platform.
• CYFRONET contributes a private cloud site for development purposes.
8. The WN reports that the VM is booting.
9. The WN reports that the VM is running.
10. Atmosphere polls Nova for the VM status through the OpenStack API.
11. Nova delegates the query to the WN and relays the reply.
12. The ASI is registered in AIR as booting/running.
13. Atmosphere configures the IP Wrangler (host 149.156.10.131) to enable port forwarding; the port mapping table is updated.
14. The port mappings for this ASI are registered in AIR.
15. The Cloud Manager polls for the ASI status and updates its view.
16. The ASI status, port mappings and access credentials are retrieved.

Behind the scenes: Communicating with an Atomic Service Instance
1. The Cloud Manager looks up the ASI details in the ASI metadata, including the IP Wrangler IP, port mappings and access credentials, if needed.
2. The developer initiates the interaction.
3. The IP Wrangler (a standard IP stack, accessible via a public IP) relays the traffic according to its port mapping table.
4. The ASI is called on the OpenStack WN (10.100.x.x).
• Note: Atomic Service Instances typically do not have public IPs
• The role of the IP Wrangler is to facilitate user interaction on arbitrary ports (e.g. SSH, VNC etc.)
with VMs deployed on a computing cluster (as is the case at CYFRONET)
• The IP Wrangler bridges communication on predetermined ports, according to the ASI configuration stored in AIR
• Web Service calls do not require nonstandard ports and are instead handled by appending data to the endpoint path

Behind the scenes: Saving the Instance as a new Atomic Service
• Developers are able to save existing instances as new Atomic Services.
• Once saved, an Atomic Service can be instantiated by clients.
1. The developer creates an AS from the ASI ("Save Atomic Service" in the Cloud Manager's Development Mode), supplying the AS metadata.
2. The Cloud Facade requests storage of the Atomic Service from the Atmosphere AMS.
3. AMS calls Nova to persist the ASI.
3'. The AS is registered in AIR as being saved.
4. The VM image is stored in Glance.
5. The WN hypervisor (KVM) images the selected VM (including the user space).
6. The VM image is uploaded to Glance.
7.
Nova reports success through the OpenStack API.
8. The AS is registered in AIR as available.

More information on accessing the VPH-Share Infrastructure
• The Master Interface is deployed at new.physiomespace.com
– Provides access to all VPH-Share cloud platform features
– Tailored for domain experts (no in-depth technical knowledge necessary)
– Uses OpenID authentication provided by BiomedTown
– Contact Piotr Nowakowski (CYF) for details regarding access and account provisioning
• Further information about the project can be found at www.vphshare.eu
• Make sure to check out the DICE team website at CYF (dice.cyfronet.pl/projects/VPH-Share) for further information regarding the cloud platform and practical usage examples
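As a closing illustration, the deployment-planning considerations listed earlier (compute cost, data volume and location, "computations close to data or the other way round") can be captured in a toy cost model. All site attributes, prices and names below are invented for the example and are not taken from Atmosphere.

```python
from dataclasses import dataclass

# Toy sketch of Atmosphere-style deployment planning: pick the cloud site
# that minimizes the combined cost of compute time and data transfer.

@dataclass
class Site:
    name: str
    price_per_hour: float        # cost of acquiring compute resources
    transfer_cost_per_gb: float  # cost of moving input data to this site
    holds_input_data: bool       # data locality: no transfer needed on-site

def plan(sites, hours_needed: float, input_gb: float) -> Site:
    def cost(site: Site) -> float:
        transfer = 0.0 if site.holds_input_data else input_gb * site.transfer_cost_per_gb
        return hours_needed * site.price_per_hour + transfer
    # Move computations to the data, or the data to the computations:
    # whichever total comes out cheaper wins.
    return min(sites, key=cost)

sites = [
    Site("private-cyfronet", price_per_hour=0.05, transfer_cost_per_gb=0.0, holds_input_data=True),
    Site("ec2-spot", price_per_hour=0.01, transfer_cost_per_gb=0.09, holds_input_data=False),
]
print(plan(sites, hours_needed=10, input_gb=500).name)   # → private-cyfronet
print(plan(sites, hours_needed=1000, input_gb=1).name)   # → ec2-spot
```

The two calls show both regimes: with a large input dataset the private site holding the data wins despite its higher hourly price, while a long computation on a small dataset is cheaper on discounted public instances.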