VMware vCenter Operations Standard
Real-time Performance Management for VMware Administrators
Technical Presentation
© 2010 VMware Inc. All rights reserved
Why vCenter Operations Standard?
 80% of VMware admin time spent
isolating performance problems
• “1st generation” green-yellow-red static
threshold reporting insufficient and too
complex to use
• Point solutions only address a subset of issues
 VMware administrators have two
conflicting goals
• Maximize ROI by increasing VM density
• Ensure required capacity for business growth
and other changes in real-time
• Ensure that virtual component performance
supports required application performance
2
VMware vCenter Operations Standard Basics
 Clear and quick way to identify VMware performance problems
 Easy to use for VMware Administrators
• Deeply integrated as a vCenter pane
• Intuitive screens guide users to issues needing attention
• Automatically collects data from vCenter
• Time-series performance data, topological relationships and configuration change
events
 VMware vCenter Operations Standard business benefits
• Increased performance for end users of business applications and services
• Reduced infrastructure costs through increased VM to ESX density
• Reduced VM administration costs and optimized VMware admin productivity
3
VMware Named 2011 Best of Interop Grand Prize Winner
vCenter Operations Standard 1.0 named Best
of Interop 2011 out of 135 award entries
vCenter Operations Standard 1.0 honored as
the winner in Cloud Computing and
Virtualization
What the judges said…
• 16 editors and analysts from InformationWeek
Analytics
“VMware's vCenter Operations is another one of those
products where you can see an ambitious company testing the
limits… It is a bold effort to combine the data center disciplines
of system configuration, performance management, and
capacity management into one management tool and apply
them to what in the future will be referred to as the private
cloud.”
Charlie Babcock, Editor at Large
"VMware and each of the category winners represent market
innovation and deserve recognition for helping build the
energy and growth of today's IT marketplace."
Art Wittmann, Vice President & Director
4
"We are pleased to have Interop be the platform for
bringing together the latest advancements in business
technology and we congratulate VMware for their
innovation in the industry.”
Lenny Heymann, Interop General Manager
Understanding your Virtual Environment - Workload
 Workload Measures
• Demand for resources vs. Resources currently used
• Result is a percentage of Workload
• Low number is Good – Object has the resources it needs
• Can go above 100% - Object is “Starving”
 Workload summarized across critical resources
• CPU
• Network I/O
• Storage I/O
• Memory (VM and ESX Allocation)
 Workload Details View
• Detailed understanding of the lacking resource and associated metrics
• View the state of the Peer and Parent Objects and troubleshoot
• Am I a victim or a villain?
• A Configuration issue?
• Is this a population problem?
• Lack of resources?
• Should we move the VM?
• Virtual infrastructure is fine. OS or application
issue?
5
Understanding your Virtual Environment - Health
 Health Measures
• How normal is this object behaving: 0-100 (Higher is Healthier)
• Learns dynamic ranges of “Normal” for each metric
• Learns patterns of behavior and identifies metric abnormalities
• Lower the health the more abnormalities
 Once a virtual element Health problem is identified
• Single screen provides details on problem based on behavioral
understanding of the element
• Points to the Root Cause metrics to help you troubleshoot
• Eliminates 100s of clicks and memorization of many metric
behaviors that 1st generation monitoring tools require
 Health and Workload together tell you a lot
• Workload High & Health High – Normal Behavior for this timeframe
• Workload High & Health Low – Something is amiss!
6
Important Note
Low Health does not
imply a problem. It
tells you that the
object is acting
differently than
normal.
Understanding your Virtual Environment - Capacity
 Capacity Measures
• How much time do you have left before a object runs out of resources?
• Based on a 0-100 scale – Higher the number the longer you have
• Thresholds User Configurable
• 30 Days Left = RED
• 60 Days Left = Orange
• Etc.
 Capacity measured for critical resources
• CPU
• Network I/O
• Storage I/O
• Memory
 Capacity Details View
• Shows the chart and trend for each of the above resources
• Denotes current state
• Projected breach point and days left
7
Business Benefits
8
Increased Visibility
• Lack of holistic VC environment view
• Single pane of glass
• Can’t determine state of all elements
• All VC data contextually consolidated
(clusters, hosts, guests) at once
• Overwhelming details obscure valuable
One click to any detail
• Filters on “all” “normal” and “problem”
Searches on any string.
information.
BEFORE
AFTER
• Visibility, comprehension of virtualized environment in one screen
• Better product usability
• Visually isolate problems via a “HUD” for vCenter
• Unnecessary details hidden until necessary.
9
Reduced Complexity
• Administrators blind to brewing
• Provide a single measure of
normality across all virtualized
elements – Health
problems
• Too much data, too many clicks
• Preset thresholds, many details
• Automatically aggregate, correlate
• Impossible to understand health of
elements
states of 100s of metrics into two
scores for each element – Health
and Workload
BEFORE
• Reduce complexity of usage
• Remove guesswork, provide clarity into the environment
• Speed up MTTR
• Enable administrators to do more with less.
10
AFTER
Slide 10
Understand Normal Metric Behavior
• Unable to understand normal range of
metrics
• Is 65% usage normal for an hour,
day, week or month?
• Visibility into normal operation of
every metric in VC
Slide 11
• Continuous, automatic learning of
normal behavior
• Or, is it the beginning of a problem?
BEFORE
AFTER
• Understand metric behavior based on history
• Project forward future behavior hours or days in advance
• Remove guess work and confusion, clarify expectations
• Equivalent of 10 people watching, measuring and adjusting system constantly.
11
Workload Optimization
• VC unaware of affinities and workload
profiles of all VMs
• Only understands raw resource
• Calculates and stores workload
profile of each ESX
• Increase density by matching
opposite VM behaviors on an ESX
consumption
• Ensure smooth, consistent use of
resources
BEFORE
• Increase density of VMs per ESX
• Optimize use of resources
• Consistent and maximized ESX workloads
12
AFTER
Slide 12
Understand Impact of Change
• Change is common and necessary in
VM environments
• Change can lead to degradation in
• Changes and events mashed on
health chart for every element
• Easier to see impact of change and
before and after performance
performance
BEFORE
AFTER
• Immediate visibility into impact of change
• Visual correlation to component's health
• Admin can immediately determine if change had positive (expected) or
negative (unexpected) effect on the element
13
Slide 13
Multidimensional Analysis
• Which of my many Hosts have high
levels of CPU Ready contention but low
memory usage?
BEFORE
• Slice, dice, visualize entire
environment by any of 100s of VCcollected metrics
AFTER
• Full Business Intelligence like capabilities
• Slice and dice historical collected data across any dimension
• Visualize results in heat maps, single click drill down to resource details.
14
Slide 14
Screenshots
15
Performance dashboard based on self-learning analytics
Visualize environment
performance in three
unique dimensions
Simple, actionable
scores that indicate
overall performance
Highlights resources that
are deviating from
“normal” behaviour
16
Get “At-a-glance” insights into performance issues
Performance
scores
“Details” for further
analysis
Visualize impact
17
Drill down into problem source
Key metrics of interest based
on continuous learning of
“normal” behavior
Stress caused by net I/O
Quickly identify
problem source
18
Correlate cause-and-effect of the problem
Check health of
related objects
in the hierarchy
Correlate events
that occurred at
the same time
19
Deep Dive into Disk and Network IO performance
Disk subsystem
performance details by
datastores and LUNs
20
Network
statistics for
every NIC
Identify and isolate KPI metrics
Quickly identify
“suspect”
performance
metric
KPI history with
timestamp to
indicate root
cause
21
Anticipate Capacity Issues Before They Happen
Proactive
warning related
to capacity
shortfall
Correlated
workload metrics
forecast a
potential breach
22
Project forward
future issues
hours or days in
advance
Opportunities to remediate
Move VMs to
another host?
This host looks
healthy…
23
This host seems
to be overloaded!
Individual performance metric details
Single view that
correlates
multiple metrics
Detailed list of
all metrics
indicating smart
alerts
24
vCenter Operations Architecture,
Process and Deployment
25
vCenter Operations Standard Architecture
 Four Main Services:
Collector, Analytics,
Web, ActiveMQ
 Architecture includes
 PostgresSQL DB
 File-based DB
(FSDB) for raw
metric storage
 Single Collector for
vCenter embedded in
appliance
26
vCenter Operations Standard Processing
1a: vCenter Collector collects
metrics, topology & change
events from vCenter
3: Incoming data points are
tested against Dynamic
Threshold bands and used
2a:toAnalytics
daily to
calculateruns
Health,
determineand
hour-by-hour
Workload
Capacity
Dynamic Thresholds for
next 24 hours
4: Results provided
to UI: Update
“Badges”, provide
Root Cause for
Health scores, etc.
- Ongoing -
27
1b:isData
2b: Full FSDB
scanned
stored
in
by the analytic
algorithms
to determineFSDB
per metric
best match the next 24
hour period
2c: Store metric
Dynamic
Thresholds data in
PostgresSQL DB
VMware vCenter Operations Standard - Deployment
 One vCenter Operations Standard per vCenter instance
 For VMware environments of 1500 or fewer Virtual Machines
 vCenter Operations Standard is a virtual appliance (.ova)
• SUSE Linux Enterprise Server 11 SP1
• 8GB RAM
• 2 vCPUs
• 124 GB Disk (4 GB system disk + 120 GB data disk)
 Supported Systems
• ESX host where the appliances is deployed to must be 4.0 U2 and above
• 4.1 is recommended
• vCenter
• vCenter 4.0U2
• vCenter 4.1 – Preferred as more data is available
28
VMware vCenter Operations Standard - Deployment
 Simplified implementation – 15 mins
• Deploy the appliance – Deploy OVF Template
• Change passwords and set Timezone
• Set up network configurations (Optional)
• Connect to vCenter Server
• IP, Admin User Name, Admin Password, Collector User Name, Collector Password
• Apply your license
 Polling and analytics start automatically
• Polling set to every 5 mins
 Accessing the UI
• Supported browsers include: Internet Explorer 7 or 8, or Firefox 3.6.x
• Internet Explorer 7 is required on the machine where vSphere Client runs
29
vCenter Operations Editions
30
VMware vCenter Operations Editions
vCenter Operations Enterprise
+ Full Configuration & Compliance
Management
vCenter Operations Advanced
vCenter Operations Standard
+ Capacity
Planning
+ Other VMware & 3rd Party Integrations
(View, management, servers, storage)
Performance
Real-time
Capacity
Configuration
Change
vSphere
VMware Cloud / vCenter
31
Non-VMware (incl. physical) environments
Understanding the vCenter Operations Editions
Function
Scope
vCenter Operations Standard
Edition
32
vCenter Operations Enterprise
- Stand-Alone
Data Sources
vCenter x 1
• Any 3rd party monitoring tools’
time series data
• Change events
• Multiple vCenter Servers
Objects
vCenter Objects (i.e.)
• Data Centers
• Clusters
• ESX Hosts
• Datastores
• VMs x 1500
Unlimited Scope (i.e.)
• Applications
• Network Infrastructure
• Storage
• Hosts (ESX, Win, Linux, etc)
• VMs
Users
Infrastructure (e.g. VI Admins)
Operations, Infrastructure,
Application Teams, Business
Owners, CxOs
Dynamic Thresholds
Yes
Yes
Performance Root Cause
Yes
Yes
Proactive Alerting
No
Yes
Customizable Dashboards
No
Yes
Notifications
No
Yes
Demo
33
Questions
34