Cloud Computing and an Intro to the Industry/University

Director, NSF Planning I/UCRC for Spatiotemporal Thinking, Computing and
Applications
Co-Director, Center of Intelligent Spatial Computing for Water/Energy
Sciences
Associate Professor, Geography and GeoInformation Science
George Mason Univ., Fairfax, VA, 22030-4444
http://cisc.gmu.edu/
http://cpgis.gmu.edu/homepage/
Outline
Background
What is Cloud Computing
Why Cloud Computing
What are the Issues
Cloud Computing Research
Cloud Computing Future
Background I
Background II
Background III
What if we can
• Integrate all geospatial data, information,
knowledge, processing in a few minutes
• Generate and send the right information in real time
to the people including decision makers, first
responders, victims
This dream requires a computing platform that
• can be ready in a few minutes
• can reach all the people needed
• costs only for the amount of computing used
• costs nothing to maintain after the emergency
response
This requires spatiotemporal thinking and computing,
and is to some extent envisioned by cloud computing
Cloud Computing
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable
computing resources (e.g., networks, servers, storage,
applications, and services) that can be rapidly provisioned
and released with minimal management effort or service
provider interaction. This cloud model promotes availability
and is composed of five essential characteristics.
NIST 2010
Cloud Computing
Five essential characteristics, which differentiate cloud computing
from grid computing and other distributed computing paradigms:
o On-demand self-service. Computing capabilities are provisioned
automatically as needed.
o Broad network access. Capabilities are available over the network
and accessed through standard mechanisms.
o Resource pooling. Computing resources are pooled with
location independence.
o Rapid elasticity. Capabilities can be rapidly and elastically
provisioned.
o Measured service. Resource use is automatically controlled and
optimized.
NIST 2010
Cloud Computing Service Model
IaaS
• On-demand sharing of physical infrastructure
• Users: System Administrator
PaaS
• Platform for developing and delivering applications,
abstracted from infrastructure
• Users: Developer
SaaS
• Almost any IT services
• Users: End-user
Cloud Types
Commercial Clouds
Private/Community Clouds
Built with commercial or open-source
solutions
Hybrid Clouds
Commercial clouds and private
clouds: EC2 vs. Eucalyptus, EC2
vs. OpenNebula
Framework
Why Cloud Computing
User Perspective
Economics
• Flexible price model: pay-as-you-go
• No ongoing operational expenses
• No upfront capital
Self-Service
• Simpler and faster to use cloud service
• Minimum interaction with the service provider
Elasticity
• On-demand scale up and down
Accessibility
• Accessed from anywhere and anytime with any device
Why Cloud Computing
Economics
Improved Utilization
Easier for application vendors to reach new customers
Lowest cost way of delivering and supporting applications
Ability to use commodity server and storage hardware
Ability to drive down data center operational cost
Server and storage utilization increased from 10-20% to 70-80%
What are the issues
• Many customers don’t wish to trust their data to be in “the cloud”
• Data must be locally retained for regulatory reasons
• Virtualized computing power and network
Not suitable for real-time applications
Cannot easily switch from existing legacy applications
Equivalent cloud applications do not exist
What are the issues
• What if something goes wrong?
• What is the true cost of providing SLAs?
• SaaS/PaaS models are challenging
• Much lower upfront revenue
• Customers want an intuitive GUI and open, standardized, interoperable APIs
• Need to continuously add value
Cloud Research
General issues: cloud definition, services
Management: cloud technologies, solutions, issues, cost model
Cloud migration: web application, big data, HPC applications
Cloud optimization
Future Direction
Across-Cloud implementations
Tools and middleware will be available to enable
interoperability and portability across different clouds
IaaS
PaaS
• Become standardized and commoditized
• Add new utilities and PaaS capabilities
• Battleground for determining the future of Cloud Computing
SaaS
• Integrate with applications utilizing mobile devices and sensors
Enabling Technology
Virtualization
World-wide distributed storage & file system
Web service & SOA, APIs
Parallel & distributed programming model
Architecture
Virtual Machine
VIM (OpenNebula, Eucalyptus, CloudStack)
Hypervisor Hypervisor Hypervisor Hypervisor
Physical Infrastructure
Virtual Infrastructure Middleware (VIM)
• VM lifecycle management
• Scheduling & monitoring
• Networking
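As a rough illustration of the scheduling role listed above, the sketch below places VM requests on hypervisor hosts with a first-fit policy. The `Host` and `VMRequest` classes and the first-fit rule are assumptions made for illustration only; OpenNebula, Eucalyptus, and CloudStack each ship their own schedulers with their own policies and APIs, which this does not reproduce.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    """A physical server running a hypervisor (hypothetical model)."""
    name: str
    free_cores: int
    free_mem_gb: int
    vms: List[str] = field(default_factory=list)


@dataclass
class VMRequest:
    """A request to start one virtual machine (hypothetical model)."""
    name: str
    cores: int
    mem_gb: int


def first_fit(hosts: List[Host], request: VMRequest) -> Optional[Host]:
    """Place the VM on the first host with enough free cores and memory."""
    for host in hosts:
        if host.free_cores >= request.cores and host.free_mem_gb >= request.mem_gb:
            host.free_cores -= request.cores
            host.free_mem_gb -= request.mem_gb
            host.vms.append(request.name)
            return host
    return None  # no capacity left; a real VIM might queue or reject the request


def schedule(hosts: List[Host], requests: List[VMRequest]) -> Dict[str, Optional[str]]:
    """Map each VM name to the chosen host name (None if it did not fit)."""
    placement: Dict[str, Optional[str]] = {}
    for req in requests:
        host = first_fit(hosts, req)
        placement[req.name] = host.name if host else None
    return placement
```

First-fit is used here only because it is the simplest placement rule to read; load-balancing or packing policies would change the body of `first_fit` but not the overall shape.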
Cloud Computing for GIScience
Outline
1. Background
2. Case Study 1: Web application
3. Case Study 2: Big data application
4. Conclusion
Background
Many scientific problems are concurrent, data-intensive, and
computationally intensive
Case 1: Web application (GEOSS
Clearinghouse)

GeoCloud I
• Governmental cloud initiative
• Common operating system and software suites
• Deployment and management strategies
• Usage and costing of Cloud services
• Security (certification and accreditation)
GEOSS Clearinghouse
• Metadata catalogue search facility for the
intergovernmental Group on Earth Observations
(GEO).
• EO data, services, and related resources can be
discovered and accessed.
Amazon EC2 Cloud
A “Web service that provides resizable compute capacity in the cloud”
Elastic Block Store (EBS)
Simple Storage Service (S3)
Hosting of virtual machine images (AMI)
EC2 Instances
XEN Virtualization
Physical Server
Deployment of GEOSS Clearinghouse on
EC2 Cloud
Performance in the EC2 Cloud
[Figure: average response time (s) of GetCapabilities requests versus concurrent request number (1 to 120), with panels per instance type (m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge, m2.xlarge, m2.2xlarge, m2.4xlarge) covering 8/2 17:00 to 17:30]
Only one core of the VM is utilized
Lucene (used for indexing while
searching) might be the reason
behind the underutilization of the virtual CPUs.
0.38 s: 0 records
3 s: 26,130 records
• MapReduce for indexing
• Spatiotemporal indexing
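To make the "MapReduce for indexing" idea concrete, here is a minimal single-process sketch of building an inverted index in map, shuffle, and reduce steps. It is not Lucene's API and not the Clearinghouse implementation; the function names and the whitespace tokenizer are assumptions. In an actual MapReduce framework the `map_phase` calls would run in parallel across cores or nodes, which is what would address the single-core bottleneck.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(doc_id: str, text: str) -> List[Tuple[str, str]]:
    """Emit one (term, doc_id) pair per distinct term in a metadata record."""
    return [(term, doc_id) for term in set(text.lower().split())]


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group the emitted pairs by term, as the framework would between phases."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups


def reduce_phase(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Collapse the grouped ids into a sorted, de-duplicated posting list."""
    return term, sorted(set(doc_ids))


def build_index(records: Dict[str, str]) -> Dict[str, List[str]]:
    """Run map, shuffle, and reduce sequentially over all records."""
    pairs: List[Tuple[str, str]] = []
    for doc_id, text in records.items():
        pairs.extend(map_phase(doc_id, text))  # parallelizable per record
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


# Hypothetical records, for illustration only.
index = build_index({"r1": "sea surface temperature", "r2": "land surface model"})
print(index["surface"])
```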
Usage/Costs in EC2 Clouds
Usage chart from July to Nov. 2011
Table 6. Monthly costs of AWS services, July to Oct. 2011

Month (2011)   Total ($)   EC2 hours   EC2 cost ($)   EBS ($)   Data transfer ($)
July           113.73      320         108.80         4.64      0.01
August         278.74      758         257.72         20.99     0.03
September      267.25      720         244.80         22.40     0.06
October        276.82      744         252.96         22.21     1.64
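The figures in Table 6 can be checked with a little arithmetic. The sketch below copies the numbers from the table and derives the effective EC2 price per instance-hour and the share of each monthly bill due to EC2; the dictionary layout and function names are illustrative choices. Dividing EC2 cost by EC2 hours gives about $0.34 per hour in every month, which is consistent with pay-as-you-go billing at a fixed hourly rate.

```python
# Monthly figures copied from Table 6 (2011, US dollars).
costs = {
    "July":      {"total": 113.73, "ec2_hours": 320, "ec2": 108.80, "ebs": 4.64,  "transfer": 0.01},
    "August":    {"total": 278.74, "ec2_hours": 758, "ec2": 257.72, "ebs": 20.99, "transfer": 0.03},
    "September": {"total": 267.25, "ec2_hours": 720, "ec2": 244.80, "ebs": 22.40, "transfer": 0.06},
    "October":   {"total": 276.82, "ec2_hours": 744, "ec2": 252.96, "ebs": 22.21, "transfer": 1.64},
}


def hourly_rate(row: dict) -> float:
    """Effective EC2 price per instance-hour for one month."""
    return row["ec2"] / row["ec2_hours"]


def ec2_share(row: dict) -> float:
    """Fraction of the monthly total that is EC2 compute cost."""
    return row["ec2"] / row["total"]


for month, row in costs.items():
    print(f"{month}: {hourly_rate(row):.2f} $/hour, EC2 share of bill {ec2_share(row):.0%}")
```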
Case 2: Big data -> Climate@Home
1 year, 1 scenario
Input: 150 MB
Output: 2 GB
Computing time per scenario: 45 minutes
10 years, 100 scenarios
Input: 15 GB
Output: 750 GB
Computing time per scenario: 4 days and 16 hours
100 years, 1000 scenarios
Run on Community Clouds(NASA Eucalyptus)
Model Simulation Information
• Scenario: 300 model configurations
• VM: 4 – 8 (20 CPU Cores, 64 GB memory)
• Start date: Dec 1949
• End date: Jan 1961
Cloud Computing Information
• Platform: Eucalyptus
• VMs: 4 – 8 (20 CPU Cores, 64 GB memory)
• Task scheduler: Condor
System CPU Utilization
Conclusion
Provides high-capacity and scalable computing,
storage and network connectivity for GIScience
applications
Creates new opportunities for national,
international, state, and local partners to
leverage research easily
Acknowledgements
• Collaborators: Doug Nebert, Myra Bambacus, Yan Xu,
Daniel Fay, Karl Benedict, Songqing Chen
• Team: Qunying Huang, Kai Liu, Jizhe Xia, Zhipeng Gui,
Chen Xu, and all CISC members
I/UCRC for Spatiotemporal Thinking,
Computing, and Applications (STC)
Chaowei Yang, Director, GMU Site
Keith Clarke, Co-Director, UCSB Site
Peter Bol, Co-Director, Harvard Site
Industry/University Cooperative Research
Centers: National Scope, Impact
Academic-Industry partnerships meeting industry sector research needs
ENG CISE
59 Centers
172 I/UCRC Sites
Plus Participating
International Sites
Over 760 Member
Organizations (2010)
Purpose: Maximize the potential for a successful Center
Proposal.
[Figure: planning process steps, including LOI, Planning Grant, and Proposal]
Planning Grant Meeting with University Partners,
Students, Center Evaluator, Prospective Members
and NSF I/UCRC Program Directors
Events Pre-Meeting
Events Occurring at the Meeting
Events Post-Meeting
Day 1
Day 2
LOI, Planning Grant Pending or Awarded, what now?
Planning Meeting Approaching…
Getting the proposal ready to go!
Successful Proposal & 1st IAB Meeting
I/UCRC Planning Process
Objective
1. Capture and advance human intelligence
2. Enable and improve machine processing and
applications
3. Start from geographic science and technologies
for spatiotemporal issues and solutions
4. Expand to other domains, such as Earth
science, political science, economics, biology,
public health, energy and environment, K-16
education, and others in the future if things
go well
Target
1. Improve the US and international spatiotemporal research
infrastructure base;
2. Advance the intellectual capacity of the future science and
engineering workforce;
3. Establish national and international leadership in
spatiotemporal thinking, computing, and applications.
Approaches
1. Explore new solutions to our 21st century challenges, such as
natural disasters, by investigating the spatiotemporal principles
within the challenges with national and international leaders.
2. Advance human knowledge and intelligence by combining
spatiotemporal principles and computational thinking to form
spatiotemporal thinking as a new methodology and innovative
thinking process to enable physical and social science discoveries,
and to conduct the next generation of computing.
3. Improve interoperability and infrastructure building using the
spatiotemporal methods formed to enable the discoverability,
accessibility, and usability of big data.
4. Facilitate better understanding of physical and social sciences
through phenomena simulation and visualization improved by
spatiotemporal thinking.
5. Develop new spatiotemporal computing products in
collaboration among the center’s members to establish national
and international leadership in the field, and transfer the new
technologies to companies to improve center members’ efficiency
and competitiveness.
NSF I/UCRC Typical Organization
To ensure the success and
sustainability of the center.
• University Management includes VP
for Research, Dean for COS, and GGS
Chair
• Science Advisory Committee includes
internationally renowned scientists from
industry, agencies, and academia
• Industry advisory board comprises
sponsor representatives
• Research programs will be dynamic
according to progress in the center life
cycle
• Each project will include a PI,
IAB/sponsor member, and students
participating in projects
• A center director assistant or
operational director will be assigned at
each site
Gray 1998
Membership and Benefits
1. Free access to R&D results worth 10+ times
the investment, by investing $50k+ each year.
2. Increased company and agency
competitiveness through deliverable-oriented
partnership with academia and
agencies.
3. Access to student talent cultivated through
the collaborative research and development
projects.
4. Collaboration in an academia, government,
and industry environment.
The IUCRC Research Portfolio Cycle
L.I.F.E.: Level of Interest and Feedback Evaluation Form
[Figure: research portfolio cycle. Labels: Biannual IAB Meeting; L.I.F.E.; Review, Discuss, Adapt, Select; IAB Portfolio Engagement; Center Site Strengths; New Proposals; Industry/Agency Advisory Board Needs]
The cooperative process rapidly aligns the Center’s Portfolio
with Member Needs and University strengths
Sample Projects
Advancing spatiotemporal computing to enable 21st
century geospatial sciences and applications
Experimental Plan, Industrial Relevance and Appropriateness
for the center: With the massive amount of spatiotemporal data
now available, novel, more efficient approaches for data modeling
and management are needed to enable 21st century geospatial
sciences and applications. This project aims at developing the
theoretical and technical foundations for spatiotemporal
computing with a focus on exploiting spatiotemporal principles to
build new approaches for data and scientific modeling, indexing,
search, and retrieval.
Objectives: Develop a novel approach for spatiotemporal
computing. This is a four-step approach covering 1) design and
implementation of data structures; 2) algorithms (e.g., indexing
methods); 3) spatiotemporally enabled, optimized ontology and
reasoning methods; and 4) search strategy.
Team: PIs: Dr. Yang, Dr. Clarke, Dr. Bol, and interested
members from agencies and industry, one graduate student at
each site. Dr. Rezgui will work as the manager and integrator at
the GMU site.
Sample Projects
Four-dimensional space-time visualization
of tracked movement
Objective: Better visualize the enormous quantities of tracking data
collected through innovative geospatial technology by developing and
using the host of new display techniques that have emerged from
computer vision, graphics and information visualization and that
show promise for space-time data.
Approaches: For this research project, visualization environments
(software programs, tools, code libraries and standards) will be
combined with display environments (flat, stereo, augmented virtual
and immersive virtual) such that moving objects and fields can be
explored.
Team: PIs: Keith Clarke and Michael Goodchild at UCSB, Phil Yang
at GMU; two students, one from each site; Prof. Janowitz will
coordinate the research and development from the UCSB site.
Sample Projects
Temporal Gazetteer and Place Name
Resolution Service with Temporal
Awareness
Objective: Develop a new temporal gazetteer and place name
resolution service with temporal awareness that (1) compiles and
integrates data stored within existing gazetteer systems; (2) enables
new crowd-sourced gazetteer entries through a standardized
schema; and (3) provides an Application Programming Interface
(API).
Approaches: 1) Design and implement a comprehensive gazetteer
structure; 2) Integrate information from multiple existing gazetteers;
3) Build a web-based entry system to allow crowd-sourced
contributions; 4) Design and implement a conflation rule-base to
resolve duplicated entries; 5) Publish a user interface for
crowd-sourced quality assessment, authorized adjustment of gazetteer
entries, and iterative improvement of conflation rules; 6) Build a
temporal place name resolution service accessible through API and
an online user interface.
Team: PI: Peter K. Bol; two technical staff; Dr. Wendy Guan will
coordinate the research and development at Harvard University.
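A temporal place name resolution service of the kind described above can be pictured as a lookup keyed on both a name and a date. The following is a minimal sketch under stated assumptions: the `GazetteerEntry` fields, the year-level granularity, and the `resolve` method are hypothetical and do not describe the project's actual schema or API, and the sample entries are invented placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GazetteerEntry:
    """One place name with the period during which it applies (hypothetical schema)."""
    name: str
    lat: float
    lon: float
    valid_from: int            # first year the name applies (inclusive)
    valid_to: Optional[int]    # last year the name applies (inclusive); None = still in use


class TemporalGazetteer:
    """In-memory gazetteer that resolves a name as of a given year."""

    def __init__(self) -> None:
        self._entries: List[GazetteerEntry] = []

    def add(self, entry: GazetteerEntry) -> None:
        self._entries.append(entry)

    def resolve(self, name: str, year: int) -> List[GazetteerEntry]:
        """Return every entry whose name matches and whose period covers the year."""
        key = name.casefold()
        return [
            e for e in self._entries
            if e.name.casefold() == key
            and e.valid_from <= year
            and (e.valid_to is None or year <= e.valid_to)
        ]


# Invented placeholder entries: the same name refers to two locations over time.
gazetteer = TemporalGazetteer()
gazetteer.add(GazetteerEntry("Exampleton", 10.0, 20.0, 1800, 1899))
gazetteer.add(GazetteerEntry("Exampleton", 11.0, 21.0, 1900, None))
print(gazetteer.resolve("exampleton", 1850))
```

Returning a list rather than a single entry leaves room for the conflation step in approach 4, since duplicated or conflicting entries may both match a query.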
Sample Projects
Spatial Cloud Computing (SCC) Middleware
Project objectives: Develop middleware that can
best arrange and optimize the computing resources and task
scheduling by fully considering the spatiotemporal patterns
of data, users, cloud computing resources, and geospatial
science phenomena. Such an effort would greatly help to
construct a better spatial cloud computing (SCC) platform (Yang
et al. 2011b) and geospatial cyberinfrastructure (Yang et al.
2010a).
We will conduct extensive experiments to explore the
spatiotemporal patterns involved in the forecasting of land and
atmospheric phenomena, e.g., air quality. We will also
experiment with spatiotemporal patterns of users and computing
resources, including computing nodes, network and storage.
These experiments would provide basic guidelines on how to
design the computing platform architecture, how to select and
arrange the geographically distributed computing resources to
handle the computations, and how to organize and store the data
for fast model initialization and output delivery.
Team: PIs: Drs. Yang and Houser, and two students
Discussion
Relevance
Potential Projects
Collaboration for customized project