grid & performability Aad van Moorsel

aadvanmoorsel.com
outline
to set the stage:
• what is grid?
• what is performability?
three perspectives on grid performability:
• ‘customer’ requirements
• system implementation
– utility computing
• associated research challenges
– focus on stochastic modeling
April 2003
Copyright Aad van Moorsel, HP Labs
what is grid?
what is performability?
grid
for me, and in this talk:
• middleware layer, Globus-like
• shares resources
• crosses boundaries
– administrative domains, user domains, enterprise domains, …
• software-implemented boundaries
– flexibility in who uses what when
– flexibility in what is secured against whom when
– flexibility in who charges for what when
– …
• makes resources manageable
– grades of QoS
– dynamic management of QoS
– service level agreements, business metrics and penalties
performability
for me, and in this talk:
• quality of service (QoS)
context:
• Meyer: metric P(T<t) where T was some random variable
• my thesis: meaningful quantitative evaluation of a system (definition 2 out of 3)
• others: performance and reliability
• SPN models for system state, rewards or queuing networks for performance/metric
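A minimal sketch of a performability-style metric, in the spirit of the P(T<t) bullet above. It assumes a hypothetical pool of identical servers that fail independently, and a reward (for example throughput) proportional to the number of working servers; the function names and numbers are illustrative assumptions, not taken from the talk.

```python
from math import comb


def capacity_pmf(n_servers, availability):
    """pmf of the number of working servers (assumes independent failures)."""
    return {
        k: comb(n_servers, k) * availability**k * (1 - availability) ** (n_servers - k)
        for k in range(n_servers + 1)
    }


def performability_cdf(pmf, reward_per_server, y):
    """P(Y <= y), where Y = reward_per_server * (number of working servers)."""
    return sum(p for k, p in pmf.items() if reward_per_server * k <= y)


def expected_reward(pmf, reward_per_server):
    """Mean reward, combining reliability (the pmf) with performance (the reward)."""
    return sum(p * reward_per_server * k for k, p in pmf.items())


if __name__ == "__main__":
    pmf = capacity_pmf(n_servers=4, availability=0.9)
    print("expected reward:", expected_reward(pmf, 100.0))
    print("P(reward <= 200):", performability_cdf(pmf, 100.0, 200.0))
```

The same structure carries over to richer state models (for example an SPN for the system state) by replacing `capacity_pmf` with the state distribution of that model.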
grid & performability
we accept the claim that grid is software that will facilitate
flexible performability management
• the software design still leaves much to be desired
– automation? autonomous? autonomic?
– scaling? inter-business? security?
• but the applications will drive it in the right direction
– utility computing
– service-centric outsourcing
grid & performability
‘customer’ perspective
business costs of owning and operating IT have
gone through the roof
business cost of IT failures
downtime costs per hour:
brokerage operations           $6,450,000
credit card authorization      $2,600,000
e-bay (1 outage 22 hours)        $225,000
amazon.com                       $180,000
package shipping services        $150,000
home shopping channel            $113,000
catalog sales center              $90,000
airline reservation center        $89,000
cellular service activation       $41,000
on-line network fees              $25,000
ATM service fees                  $14,000
survey of computer damages in France, 2000
source: Dave Patterson keynote at FAST ’02
operational complexity: scale
courtesy of Lisa Spainhower, IBM
operator faces heterogeneity
[figure: management tasks across the business (content, logic, processes), software (databases, app servers, web servers) and hardware (servers, network, storage) layers, each annotated with a different technology; labels: CDN, BPR, dynamic composition, utility, ZLE, DBMS, load balancing, RSVP, UDC/QM/SF, VMs, storage management, QoS-based routing]
operation faces federation needs
[figure: management tasks per layer]
business (content, logic, processes):
• place content closer to where it is needed
• reengineer business process
• select services for each activity in the process dynamically
software (databases, app servers, web servers):
• share a database vs. create a new database
• re-index tables to optimize queries
• number of app servers needed
• number of web servers needed
• start and stop new app servers
• load balance transactions across servers
hardware (servers, network, storage):
• allocate machines to applications
• replace a failed machine transparently by migrating its applications
• reserve network bandwidth prior to use
• QoS-based routing decisions
• assign storage devices to workloads
• configure buffer sizes in device drivers to maximize performance
customer needs
business-driven, automated operator tools
for systems with increasing
scale, heterogeneity and federation challenges
grid & performability
system perspective (utility computing)
twin UDCs in HP Labs
• built the first large utility data center in Palo Alto (US) and Bristol (UK)
– learn what it takes to build a solution
– move HPL IT services to the UDC
• the first virtualized data center
– from servers, storage and networks to energy management
– dynamically assigns applications to resources
– customer sees resources as ‘utility’
– operator sees resources as ‘utility’
utility computing from usage perspective
reserving resources
getting resources
flexing resources
[figure labels: UDC1, UDC2, server cluster]
utility computing from operator perspective
(prototype developed at HP Labs,
initially gtk2, currently migrated to gtk3)
Utility Data Center = programmable pool of data center resources
UDC GRAM = Globus Gatekeeper + UDC Adapter
[figure labels: Grid interface, UDC GRAM, UDC/XML interface]
[screenshot: configure properties]
[screenshot: generate RSL]
utility computing for operators
utility computing has great potential to improve operations:
• better utilization of resources
• better tools for setting up applications
• new business models, better accountability
but UDC is just one, high-end solution
need something that is open, extensible, uniform, …
grid-based management backplane
utility computing grid middleware
[figure labels: OpenView orchestrates IT management; HP value-add; leverage Grid; everything is a Grid service; SLA]
OpenView command and control
management backplane:
monitoring, rich discovery, life-cycle, coordinated ‘act’, policy,
biz-impact driven adaptation, flexible secure mgmt domains
base Grid:
uniform interface, single sign-on, federation, stateful services
more automation: flexing resources
objective: increase asset utilization via resource sharing while
providing a desired quality of service for applications
approach: a statistical multiplexing technique for resource utilities
that host business applications
characteristics of business applications:
• require resources continuously
• changes in number of users and workload mix may result in:
– time varying demands
– large peak to mean ratios for demand
– future demands that are difficult to predict precisely
• customers want assurances they will get resources when needed
– for example, resource request will be satisfied with a prob. p=0.999
– i.e. 999 times out of 1000
– customers don’t always need an assurance of p=1.0
statistical demand profiles
to guide the development of our techniques we rely on gathered data:
– 48 servers in an HP data center
– hosting business applications
– each with 2 to 8 CPUs
create a statistical demand profile for each application
– compact representation of pattern for demand
– characterize weekdays and weekend days separately
• ignore weekends for the purpose of the study
– characterize a “weekday” by 24 60-minute time slots
• probability mass function (pmf) gives the observed distribution for
the number of CPUs needed per slot
the profiles populate a calendar of “expected demand” for the utility
– enables admission control
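A minimal sketch of how such a statistical demand profile could be built from measurements. It assumes the trace is available as (slot, CPUs needed) pairs for weekdays only; the function name `build_profile` and the data layout are illustrative assumptions, not the tooling used in the study.

```python
from collections import Counter, defaultdict


def build_profile(observations, slots_per_day=24):
    """Return {slot: {cpus: probability}} from weekday (slot, cpus) observations.

    Each slot's pmf is the observed relative frequency of the number of CPUs
    needed in that 60-minute slot; slots without observations get an empty pmf.
    """
    counts = defaultdict(Counter)
    for slot, cpus in observations:
        counts[slot][cpus] += 1

    profile = {}
    for slot in range(slots_per_day):
        total = sum(counts[slot].values())
        if total:
            profile[slot] = {c: n / total for c, n in sorted(counts[slot].items())}
        else:
            profile[slot] = {}
    return profile


if __name__ == "__main__":
    # Hypothetical weekday measurements: (hour of day, CPUs needed).
    trace = [(9, 2), (9, 2), (9, 4), (10, 1)]
    print(build_profile(trace)[9])
```

The resulting per-slot pmfs are the compact representation that would populate the calendar of expected demand.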
admission control approach
• a new application requests admission to the utility
• assume we admit the new application
• unfold its profile onto the utility’s calendar for a capacity planning horizon
– for example, several months into the future
• characterize the calendar’s new per-slot distributions of aggregate demand
• use distributions to estimate required size of resource pool
• admit application if there are sufficient resources
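The admission steps above can be sketched as follows. This is a simplified illustration under two loudly stated assumptions: application demands are treated as independent (so per-slot pmfs can be convolved), and the required pool size is the smallest CPU count that covers demand with probability p in every slot. All function names are hypothetical.

```python
def unfold(profile, days):
    """Repeat a per-slot weekday profile {slot: pmf} over a planning horizon."""
    return [profile[s] for _ in range(days) for s in sorted(profile)]


def convolve(pmf_a, pmf_b):
    """pmf of the sum of two demands, ASSUMING they are independent."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


def cpus_for_assurance(pmf, p):
    """Smallest c such that P(demand <= c) >= p."""
    cumulative = 0.0
    for c in sorted(pmf):
        cumulative += pmf[c]
        if cumulative >= p - 1e-12:  # tolerate floating-point rounding
            return c
    return max(pmf)


def admit(calendar, new_profile, pool_size, p):
    """Tentatively add the application; keep it only if the pool suffices.

    calendar and new_profile are equally long lists of per-slot pmfs.
    Returns (admitted, calendar_to_use).
    """
    candidate = [convolve(agg, new) for agg, new in zip(calendar, new_profile)]
    required = max(cpus_for_assurance(slot_pmf, p) for slot_pmf in candidate)
    if required <= pool_size:
        return True, candidate
    return False, calendar


if __name__ == "__main__":
    empty_calendar = [{0: 1.0}, {0: 1.0}]
    app = [{1: 0.5, 3: 0.5}, {2: 1.0}]
    print(admit(empty_calendar, app, pool_size=3, p=0.999)[0])
```

Ignoring correlations between applications is the main simplification here; correlated demands would need a different aggregation step than plain convolution.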
demands for a time slot t
applications
utility:
- distribution of aggregate demand is approximated by the joint pmf
- however, we must also consider correlations between application demands
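A minimal sketch of why correlations matter when sizing for aggregate demand. It assumes a normal approximation of the aggregate demand in one slot, with per-application means, standard deviations and a pairwise correlation matrix; this approximation is an assumption made here for illustration and is not necessarily the technique used in the study.

```python
from math import sqrt
from statistics import NormalDist


def aggregate_mean_std(means, stds, corr):
    """Mean and standard deviation of the summed demand.

    Var(sum) = sum over i, j of corr[i][j] * stds[i] * stds[j],
    where corr[i][i] == 1.
    """
    n = len(means)
    variance = 0.0
    for i in range(n):
        for j in range(n):
            variance += corr[i][j] * stds[i] * stds[j]
    return sum(means), sqrt(variance)


def capacity_normal(means, stds, corr, p):
    """CPUs needed so that demand fits with probability p (0 < p < 1)."""
    mu, sigma = aggregate_mean_std(means, stds, corr)
    return mu + NormalDist().inv_cdf(p) * sigma


if __name__ == "__main__":
    means, stds = [4.0, 4.0], [1.0, 1.0]
    independent = [[1.0, 0.0], [0.0, 1.0]]
    correlated = [[1.0, 0.8], [0.8, 1.0]]
    print(capacity_normal(means, stds, independent, 0.999))
    print(capacity_normal(means, stds, correlated, 0.999))
```

Positively correlated applications peak together, so the same assurance level requires a larger pool than the independence assumption would suggest.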
experimental design and results
• how many CPUs are needed if applications:
– are statically assigned their peak numbers of CPUs?
– are assigned the peak number of CPUs needed on per-slot basis?
– are offered assurance p that resource requests will be satisfied?
• about the experiments:
– include application demand correlations as measured
– include 60 minute warm-up/warm-down application migration overheads
– reported estimates verified using trace driven simulation

resource access mechanism            number of CPUs required
static                               309
peak per slot (p=1.0)                275
statistical multiplexing p=0.999     179 (estimate)
statistical multiplexing p=0.99      163 (estimate)
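A minimal sketch of how the first two sizing rules and a trace-driven check could be computed. It assumes traces are stored as, per application, a list of days with one CPU demand per slot; the assurance measured here is the fraction of (day, slot) pairs in which total demand fits the pool, which is a simplifying assumption rather than the exact definition used in the experiments. Migration overheads are ignored.

```python
def static_cpus(traces):
    """Sum of each application's overall peak demand."""
    return sum(max(max(day) for day in days) for days in traces.values())


def peak_per_slot_cpus(traces):
    """Largest per-slot sum of each application's peak for that slot."""
    slots = len(next(iter(traces.values()))[0])
    per_slot = [
        sum(max(day[s] for day in days) for days in traces.values())
        for s in range(slots)
    ]
    return max(per_slot)


def achieved_assurance(traces, pool_size):
    """Replay the traces and return the fraction of slots that fit the pool."""
    apps = list(traces.values())
    days, slots = len(apps[0]), len(apps[0][0])
    satisfied = sum(
        1
        for d in range(days)
        for s in range(slots)
        if sum(app[d][s] for app in apps) <= pool_size
    )
    return satisfied / (days * slots)


if __name__ == "__main__":
    # Hypothetical two-application, two-day, two-slot trace.
    traces = {"a": [[1, 4], [2, 3]], "b": [[4, 1], [3, 1]]}
    print(static_cpus(traces), peak_per_slot_cpus(traces))
    print(achieved_assurance(traces, pool_size=5))
```

Because applications rarely peak in the same slot, the per-slot rule never needs more CPUs than the static rule, and replaying the trace shows how much further the pool can shrink for a given assurance level.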
grid & performability
modeling research perspective
modeling issue I
the many perspectives of virtualization
virtualization enables flexibility in UDC:
1. storage area networks let applications use any
storage device
2. computing virtualization allows CPUs to be assigned
dynamically to customers
3. virtual LAN creates a secure private network
virtualization gives the illusion of some traditional
functionality (‘boundaries’), but implements it ‘soft’
modeling challenges: different views for different users,
dynamically changing boundaries (performability!),
how to utilize the models contained in the software
modeling issue II
on-line algorithms
on-line algorithms are key to conquering complexity:
• automated adaptation needs on-line algorithms
on-line algorithms come in many shapes and forms:
• days: resource scheduling
• seconds: load balancing, admission control, retries
• milliseconds: memory optimization, real-time scheduling
typical issues:
• speed of the model solution
• choosing between statistical and structural models
• obtaining the right on-line data
• plug-in algorithm modules need a data model that fits the
operational model
modeling issue III
how to validate large scale systems
many facets to scale:
• more and more devices
• more and more interconnected (even globally)
• increasing number of users
• multi-party and multi-ownership
• greater differences in scale: smaller devices, bigger
data centers
• amount of data collected and analysis done increases
with the scale of the systems
we have no good ways of analyzing large-scale systems:
no test beds, no reliable data, no widely accepted
modeling approaches
modeling issue IV
how to evaluate for business metrics
the real metric of interest is euros:
• how much is the total cost of ownership
• how much am I as customer willing to pay for a service
• what penalties do I as provider accept in an SLA
• if I invest x, what is the return on IT investment
how do we model the money/QoS correlation?
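One very simple way to tie money to QoS, offered only as an illustrative sketch: charge a fixed cost per provisioned CPU and a fixed SLA penalty whenever demand exceeds the pool, then pick the pool size with the lowest expected cost. The cost model, the penalty structure and all names are assumptions, not a model proposed in the talk.

```python
def expected_cost(pool_size, demand_pmf, cpu_cost, penalty):
    """Provisioning cost plus expected SLA penalty for one time slot."""
    p_violation = sum(p for demand, p in demand_pmf.items() if demand > pool_size)
    return pool_size * cpu_cost + penalty * p_violation


def cheapest_pool(demand_pmf, cpu_cost, penalty):
    """Pool size (0 .. peak demand) that minimizes the expected cost."""
    candidates = range(max(demand_pmf) + 1)
    return min(candidates, key=lambda c: expected_cost(c, demand_pmf, cpu_cost, penalty))


if __name__ == "__main__":
    # Hypothetical per-slot demand distribution and prices.
    demand = {2: 0.9, 4: 0.09, 8: 0.01}
    print(cheapest_pool(demand, cpu_cost=1.0, penalty=100.0))
    print(cheapest_pool(demand, cpu_cost=1.0, penalty=1000.0))
```

Raising the penalty pushes the optimum toward provisioning for the peak, which is the kind of trade-off an SLA negotiation would have to price.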
conclusion
• adaptive/utility/autonomic computing has intrinsic need
for QoS (performability) modeling and analysis
• the grid is believed to be the platform of choice
– applications are more interesting than the middleware
• challenges for stochastic modeling larger than ever in
this setting:
– virtualization
– on-line algorithms
– large-scale systems
– business metrics