Director, NSF Planning I/UCRC for Spatiotemporal Thinking, Computing and Applications
Co-Director, Center of Intelligent Spatial Computing for Water/Energy Sciences
Associate Professor, Geography and GeoInformation Science
George Mason Univ., Fairfax, VA, 22030-4444
http://cisc.gmu.edu/
http://cpgis.gmu.edu/homepage/

Outline
• Background
• What is Cloud Computing
• Why Cloud Computing
• What are the Issues
• Cloud Computing Research
• Cloud Computing Future

Background
What if we can
• Integrate all geospatial data, information, knowledge, and processing in a few minutes
• Generate and send the right information in real time to the people who need it, including decision makers, first responders, and victims

This dream requires a computing platform that
• can be ready in a few minutes
• can reach all the people needed
• costs only for the amount of computing used
• costs nothing to maintain after the emergency response

This requires spatiotemporal thinking and computing, and is to some extent what cloud computing envisions.

Cloud Computing
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics. (NIST 2010)

Cloud Computing
Five essential characteristics differentiate cloud computing from grid computing and other distributed computing paradigms:
o On-demand self-service. Computing capabilities are provisioned automatically as needed.
o Broad network access. Capabilities are available over the network and accessed through standard mechanisms.
o Resource pooling. Computing resources are pooled with location independence.
o Rapid elasticity. Capabilities can be rapidly and elastically provisioned.
o Measured service. Resource use is automatically controlled and optimized.
(NIST 2010)

Cloud Computing Service Model
• IaaS (Infrastructure as a Service): on-demand sharing of physical infrastructure. Users: system administrators
• PaaS (Platform as a Service): a platform for developing and delivering applications, abstracted from the infrastructure. Users: developers
• SaaS (Software as a Service): almost any IT service. Users: end users

Cloud Types
• Commercial clouds
• Private/community clouds: built with commercial or open-source solutions
• Hybrid clouds
• Commercial clouds and private clouds: EC2 vs. Eucalyptus, EC2 vs. OpenNebula

Framework (figure)

Why Cloud Computing
User Perspective
Economics
  • Flexible price model: pay-as-you-go
  • No ongoing operational expenses
  • No upfront capital
Self-Service
  • Simpler and faster to use cloud services
  • Minimum interaction with the service provider
Elasticity
  • On-demand scale up and down
Accessibility
  • Accessed from anywhere, at any time, with any device

Why Cloud Computing
Economics
  • Easier for application vendors to reach new customers
  • Lowest-cost way of delivering and supporting applications
  • Ability to use commodity server and storage hardware
  • Ability to drive down data center operational cost
Improved Utilization
  • Server and storage utilization increased from 10-20% to 70-80%

What are the issues
• Many customers don't wish to trust their data to "the cloud"
• Data must be retained locally for regulatory reasons
• Virtualized computing power and network
• Not suitable for real-time applications
• Cannot easily switch from existing legacy applications
• Equivalent cloud applications do not exist

What are the issues
• What if something goes wrong?
• What is the true cost of providing SLAs?
• SaaS/PaaS models are challenging
• Much lower upfront revenue
• Customers want an intuitive GUI and open, standardized, interoperable APIs
• Need to continuously add value

Cloud Research
• General issues: cloud definition, services
• Management: cloud technologies, solutions, issues, cost model
• Cloud migration: Web applications, big data, HPC applications
• Cloud optimization

Future Direction
Across-Cloud implementations
  • Tools and middleware will be available to enable interoperability and portability across different clouds
IaaS, PaaS
  • Become standardized and commoditized
  • Add new utilities and PaaS capabilities
  • Battleground for determining the future of cloud computing
SaaS
  • Integrate with applications utilizing mobile devices and sensors

Enabling Technology
• Virtualization
• World-wide distributed storage & file system
• Web service & SOA, APIs
• Parallel & distributed programming model

Architecture
Layers (figure), top to bottom:
• Virtual machines
• VIM (OpenNebula, Eucalyptus, CloudStack)
• Hypervisors
• Physical infrastructure
Virtual Infrastructure Middleware (VIM)
• VM lifecycle management
• Scheduling & monitoring
• Networking

Cloud Computing for GIScience

Outline
1. Background
2. Case Study 1: Web application
3. Case Study 2: Big data application
4. Conclusion

Background
Many scientific problems are concurrent, data-intensive, and computation-intensive.

Case 1: Web application (GEOSS Clearinghouse)
GeoCloud I
• Governmental cloud initiative
• Common operating system and software suites
• Deployment and management strategies
• Usage and costing of cloud services
• Security (certification and accreditation)
GEOSS Clearinghouse
• Metadata catalogue search facility for the intergovernmental Group on Earth Observations (GEO)
• EO data, services, and related resources can be discovered and accessed
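
The performance slides for this case report the average GetCapabilities response time against the number of concurrent requests (1 to 120). A minimal sketch of how such a measurement might be taken follows; it is not the benchmark used in the study. The names `average_response_time` and `fake_request` are hypothetical, and `fake_request` merely sleeps in place of a real HTTP GetCapabilities call to the Clearinghouse.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def average_response_time(send_request, concurrency):
    """Issue `concurrency` requests in parallel threads and return the
    mean time (in seconds) that a single request took to complete."""

    def timed_call(_):
        start = time.perf_counter()
        send_request()
        return time.perf_counter() - start

    # One worker per request so that all requests are in flight together.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(concurrency)))
    return mean(latencies)


def fake_request():
    # Hypothetical stand-in for an HTTP GetCapabilities request (assumption):
    # it only sleeps, so the numbers say nothing about a real server.
    time.sleep(0.01)


if __name__ == "__main__":
    # Concurrency levels taken from the horizontal axis of the response-time chart.
    for n in (1, 20, 40, 60, 80, 100, 120):
        print(n, round(average_response_time(fake_request, n), 4))
```

Replacing `fake_request` with a function that issues a real request would turn this into a simple load probe. Threads are a reasonable choice here because the client side of such a test is I/O-bound.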
Amazon EC2 Cloud
A "Web service that provides resizable compute capacity in the cloud"
Components (figure):
• EC2 instances
• Elastic Block Storage (EBS)
• Simple Storage Service (S3)
• Hosting of virtual machine images (AMI)
• Xen virtualization
• Physical server

Deployment of GEOSS Clearinghouse on EC2 Cloud (figure)

Performance in the EC2 Cloud
Figure: one panel per instance type (m1.small, m1.large, m1.xlarge, m2.xlarge, m2.2xlarge, m2.4xlarge, c1.medium, c1.xlarge), each covering 8/2 17:00 to 8/2 17:30 on a vertical scale of 0 to 100. Annotation: only one core of the VM is utilized.
Figure: GetCapabilities average response time (s) versus concurrent request number (1 to 120), one series per instance type listed above.
• Lucene (used for indexing while searching) might be the reason the virtual CPUs are underutilized.
• Response time: 0.38 s for 0 records; 3 s for 26,130 records
• MapReduce for indexing
• Spatiotemporal indexing

Table 6. Monthly Costs of AWS services

Usage/Costs in EC2 Clouds
Usage chart from July to Nov. 2011 (figure)
Monthly cost from July to Oct. 2011:

Month (2011)   Total (USD)   EC2 hours   EC2 cost   Amazon EBS   AWS data transfer
July           113.73        320         108.80     4.64         0.01
August         278.74        758         257.72     20.99        0.03
September      267.25        720         244.80     22.40        0.06
October        276.82        744         252.96     22.21        1.64

Case 2: Big data -> Climate@Home
• 1 year, 1 scenario: input 150 MB; output 2 GB; computing time per scenario 45 minutes
• 10 years, 100 scenarios: input 15 GB; output 750 GB; computing time per scenario 4 days and 16 hours
• 100 years, 1000 scenarios

Run on Community Clouds (NASA Eucalyptus)
Model Simulation Information
• Scenarios: 300 model configurations
• VMs: 4 to 8 (20 CPU cores, 64 GB memory)
• Start date: Dec 1949
• End date: Jan 1961
Cloud Computing Information
• Platform: Eucalyptus
• VMs: 4 to 8 (20 CPU cores, 64 GB memory)
• Task scheduler: Condor
System CPU Utilization (figure)

Conclusion
• Cloud computing provides high-capacity, scalable computing, storage, and network connectivity for GIScience applications
• It creates new opportunities for national, international, state, and local partners to leverage research easily

Acknowledgements
Collaborators: Doug Nebert, Myra Bambacus, Yan Xu, Daniel Fay, Karl Benedict, Songqing Chen
Team: Qunying Huang, Kai Liu, Jizhe Xia, Zhipeng Gui, Chen Xu, and all CISC members

I/UCRC for Spatiotemporal Thinking, Computing, and Applications (STC)
Chaowei Yang, Director, GMU Site
Keith Clarke, Co-Director, UCSB Site
Peter Bol, Co-Director, Harvard Site

Industry/University Cooperative Research Centers: National Scope, Impact
• Academic-industry partnerships meeting industry sector research needs
• ENG, CISE
• 59 centers
• 172 I/UCRC sites, plus participating international sites
• Over 760 member organizations (2010)

Purpose: Maximize the potential for a successful Center Proposal.
I/UCRC Planning Process
Figure: steps from LOI to Planning Grant to Proposal.
Planning Grant Meeting with university partners, students, center evaluator, prospective members, and NSF I/UCRC program directors:
• Events pre-meeting
• Events occurring at the meeting (Day 1, Day 2)
• Events post-meeting
Stages shown:
• LOI, planning grant pending or awarded, what now?
• Planning meeting approaching…
• Getting the proposal ready to go!
• Successful proposal & 1st IAB meeting

Objective
1. Capture and advance human intelligence
2. Enable and improve machine processing and applications
3. Start from geographic science and technologies for spatiotemporal issues and solutions
4. Expand to other domains in the future, such as Earth science, political science, economics, biology, public health, energy and environment, K-16 education, and others, if things go well

Target
1. Improve the US and international spatiotemporal research infrastructure base;
2. Advance the intellectual capacity of the future science and engineering workforce;
3. Establish national and international leadership in spatiotemporal thinking, computing, and applications.

Approaches
1. Explore new solutions to our 21st century challenges, such as natural disasters, by investigating the spatiotemporal principles within the challenges with national and international leaders.
2. Advance human knowledge and intelligence by combining spatiotemporal principles and computational thinking to form spatiotemporal thinking, a new methodology and innovative thinking process that enables physical and social science discoveries and supports next-generation computing.
3. Improve interoperability and infrastructure building by using the resulting spatiotemporal methods to enable the discoverability, accessibility, and usability of big data.
4. Facilitate better understanding of the physical and social sciences through phenomena simulation and visualization improved by spatiotemporal thinking.
5. Develop new spatiotemporal computing products in collaboration among the center's members to establish national and international leadership in the field, and transfer the new technologies to companies to improve center members' efficiency and competitiveness.

NSF I/UCRC Typical Organization
To ensure the success and sustainability of the center:
• University management includes the VP for Research, the Dean of COS, and the GGS Chair
• The Science Advisory Committee includes internationally renowned scientists from industry, agencies, and academia
• The Industry Advisory Board (IAB) comprises sponsor representatives
• Research programs will be dynamic according to progress in the center life cycle
• Each project will include a PI, an IAB/sponsor member, and participating students
• A center director assistant or operational director will be assigned at each site
(Gray 1998)

Membership and Benefits
1. Free access to R&D results worth 10+ times the investment, for $50k+ invested each year.
2. Increased competitiveness for companies and agencies through deliverable-oriented partnerships with academia and agencies.
3. Access to student talent cultivated through the collaborative research and development projects.
4. Collaboration in an academic, government, and industry environment.

The IUCRC Research Portfolio Cycle
L.I.F.E.: Level of Interest and Feedback Evaluation Form
Figure: a cycle linking center site strengths and new proposals, Industry/Agency Advisory Board needs, the L.I.F.E. form, the biannual IAB meeting (review, discuss, adapt, select), and IAB portfolio engagement.
The cooperative process rapidly aligns the Center's portfolio with member needs and university strengths.

Sample Projects
Advancing spatiotemporal computing to enable 21st century geospatial sciences and applications
Experimental Plan, Industrial Relevance and Appropriateness for the center: With the massive amount of spatiotemporal data now available, novel, more efficient approaches to data modeling and management are needed to enable 21st century geospatial sciences and applications. This project aims to develop the theoretical and technical foundations for spatiotemporal computing, with a focus on exploiting spatiotemporal principles to build new approaches for data and scientific modeling, indexing, search, and retrieval.
Objectives: Develop a novel approach to spatiotemporal computing in four steps: 1) design and implementation of data structures; 2) algorithms (e.g., indexing methods); 3) spatiotemporally enabled, optimized ontology and reasoning methods; and 4) search strategy.
Team: PIs: Dr. Yang, Dr. Clarke, Dr. Bol, and interested members from agencies and industry; one graduate student at each site. Dr. Rezgui will work as the manager and integrator at the GMU site.

Sample Projects
Four-dimensional space-time visualization of tracked movement
Objective: Better visualize the enormous quantities of tracking data collected through innovative geospatial technology by developing and using the host of new display techniques that have emerged from computer vision, graphics, and information visualization and that show promise for space-time data.
Approaches: For this research project, visualization environments (software programs, tools, code libraries, and standards) will be combined with display environments (flat, stereo, augmented virtual, and immersive virtual) so that moving objects and fields can be explored.
Team: PI: Keith Clarke and Michael Goodchild at UCSB, Phil Yang at GMU; two students, one from each site. Prof. Janowitz will coordinate the research and development from the UCSB site.

Sample Projects
Temporal Gazetteer and Place Name Resolution Service with Temporal Awareness
Objective: Develop a new temporal gazetteer and place name resolution service with temporal awareness that (1) compiles and integrates data stored within existing gazetteer systems; (2) enables new crowd-sourced gazetteer entries through a standardized schema; and (3) provides an Application Programming Interface (API).
Approaches:
1) Design and implement a comprehensive gazetteer structure;
2) Integrate information from multiple existing gazetteers;
3) Build a web-based entry system to allow crowd-sourced contributions;
4) Design and implement a conflation rule base to resolve duplicated entries;
5) Publish a user interface for crowd-sourced quality assessment, authorized adjustment of gazetteer entries, and iterative improvement of conflation rules;
6) Build a temporal place name resolution service accessible through an API and an online user interface.
Team: PI: Peter K. Bol; two technical staff. Dr. Wendy Guan will coordinate the research and development at Harvard University.

Sample Projects
Spatial Cloud Computing (SCC) Middleware
Project objectives: Develop middleware that can best arrange and optimize computing resources and task scheduling by fully considering the spatiotemporal patterns of data, users, cloud computing resources, and geospatial science phenomena. Such an effort would greatly help to construct a better spatial cloud computing (SCC) platform (Yang et al., 2011b) and geospatial cyberinfrastructure (Yang et al., 2010a). We will conduct extensive experiments to explore the spatiotemporal patterns involved in forecasting land and atmospheric phenomena, e.g., air quality. We will also experiment with the spatiotemporal patterns of users and computing resources, including computing nodes, network, and storage. These experiments would provide basic guidelines on how to design the computing platform architecture, how to select and arrange geographically distributed computing resources to handle the computations, and how to organize and store the data for fast model initialization and output delivery.
Team: PI: Drs. Yang, Houser, and two students

Discussion
• Relevance
• Potential projects
• Collaboration for customized projects