Advanced Management Technologies for Exchange 5.5
Greg Todd
Program Manager
NT Solutions Group
BMC Software, Inc.
Agenda
Current issues with problem diagnosis
Theory of root cause analysis (RCA)
  Primer on RCA
  How RCA can help you today
  Application availability timeline
  Demos of RCA on Exchange 5.5
Systems management vision
  Management maturity curve
  The future of Exchange management
The Business Problem
Event automation is the #1 priority of IT executives
Problem diagnosis is a critical area that needs attention
Wasted time
  80% of downtime is spent diagnosing
  20% of downtime is spent fixing
Wasted resources
  Diagnosis is often a finger-pointing exercise
Frustrated users
  Users have no idea what to expect
Gartner, 1998
Application Availability Timeline
[Figure: a timeline marking the Point of Failure (PoF), Point of Notification (PoN), Point of Diagnosis (PoD), Point of Recovery (PoR), and Point of Postmortem (PoP), with the phases Monitoring, Analysis, Recovery, and Evolution]
Application Availability Timeline
[Figure: the same timeline with Root Cause Analysis as the analysis phase; a bar marks the period during which the application violates its service level]
Application Availability Timeline
[Figure: with RCA, diagnosis time is reduced, giving a significant decrease in the service-level violation and faster service restoration]
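The 80/20 split from the business-problem slide can be read against this timeline. The sketch below is a minimal illustration, assuming that diagnosis runs from PoN to PoD and fixing from PoD to PoR; the `Outage` class, its field names, and the minute values are hypothetical, not measurements from the talk.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """Timeline points in minutes since the failure (hypothetical values)."""
    pof: float  # Point of Failure
    pon: float  # Point of Notification
    pod: float  # Point of Diagnosis
    por: float  # Point of Recovery

    def diagnosis_minutes(self) -> float:
        return self.pod - self.pon

    def fix_minutes(self) -> float:
        return self.por - self.pod

    def downtime_minutes(self) -> float:
        return self.por - self.pof

    def diagnosis_share(self) -> float:
        """Fraction of the diagnose-plus-fix window spent diagnosing."""
        return self.diagnosis_minutes() / (self.diagnosis_minutes() + self.fix_minutes())


def with_faster_diagnosis(o: Outage, factor: float) -> Outage:
    """Shrink only the PoN-to-PoD interval; the recovery work itself is unchanged."""
    new_pod = o.pon + o.diagnosis_minutes() * factor
    return Outage(o.pof, o.pon, new_pod, new_pod + o.fix_minutes())


before = Outage(pof=0, pon=5, pod=85, por=105)
after = with_faster_diagnosis(before, factor=0.25)
print(before.diagnosis_share(), before.downtime_minutes())  # 0.8 105
print(after.downtime_minutes())  # 45.0
```

In this made-up case, cutting diagnosis from 80 to 20 minutes shortens the whole outage from 105 to 45 minutes while the fix itself takes just as long, which is the "diagnosis time reduced" effect the timeline shows.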
Benefits Of RCA
Based on well-established theories
Quicker problem resolution
  Problem isolation saves resources for addressing the real problem
  Symptom filtering lets the administrator ignore sympathetic events (secondary alarms caused by the same fault)
  Performs tests to find the root cause
Far superior to the rules-based approach, because it tests suspected causes instead of only listing them
Key enabler for making systems self-sufficient
Provides impact analysis capability
RCA Key Concepts
  Symptoms are problems to be investigated
  Faults are the root causes of these symptoms
  Tests are active tasks that gather information
RCA is a problem analysis methodology geared toward finding the real cause of a problem and preventing it from happening again.
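The three concepts map naturally onto three small types. This is a sketch only; the class names `Symptom`, `Fault`, and `FaultTest`, the CPU reading, and the 90% threshold are illustrative assumptions, not the PATROL RCA object model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Symptom:
    """A problem to be investigated, such as a queue-growth alarm."""
    name: str
    host: str


@dataclass(frozen=True)
class Fault:
    """A possible root cause of one or more symptoms."""
    name: str
    host: str


@dataclass
class FaultTest:
    """An active task that gathers information to confirm or rule out a fault."""
    fault: Fault
    probe: Callable[[], bool]  # hypothetical check; returns True if the fault is present

    def run(self) -> bool:
        return self.probe()


# Illustrative use: one symptom, one suspected fault, one test against a made-up reading.
symptom = Symptom("queue growth", "Server A")
suspect = Fault("CPU usage high", "Server A")
cpu_percent = 97.0  # hypothetical measurement
check = FaultTest(suspect, probe=lambda: cpu_percent > 90.0)
print(check.run())  # True
```

Keeping the test separate from the fault reflects the slide's point that tests are active: the same fault definition can be checked again later with fresh data.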
Rules-Based Approach Vs. RCA
Rules-based:
  1. Symptom received
  2. Possible causes looked up in a fixed table of rules
  3. Set of possible causes presented to the user
  4. Only suggested actions can be provided to the user
Root Cause Analysis:
  1. Symptom received
  2. Possible causes determined from a generic fault model
  3. Each cause is tested against the suspects
  4. Actual root cause presented to the user after suspects are eliminated
  5. Specific actions can be provided to the user
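The two columns differ mainly in what happens after the lookup. In the sketch below, both approaches start from the same suspect list to isolate that difference (a real generic fault model would be instantiated for each environment, which is not modeled here): the rules-based function can only hand the list back, while the RCA function runs one test per suspect and returns only confirmed causes, each with a specific action. The cause names, test outcomes, and action strings are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Suspects for a symptom; both approaches start from the same list here.
SUSPECTS: Dict[str, List[str]] = {
    "queue_growth": ["cpu_high", "memory_bottleneck", "mta_down", "network_problem"],
}

# Hypothetical remedies; a specific one can only be chosen once a cause is confirmed.
ACTIONS: Dict[str, str] = {
    "cpu_high": "reduce the CPU load on the affected server",
    "memory_bottleneck": "relieve memory pressure on the affected server",
    "mta_down": "restart the MTA on the target machine",
    "network_problem": "check the network path to the target machine",
}


def rules_based(symptom: str) -> List[str]:
    """Fixed-table lookup: every listed cause comes back and the user must choose."""
    return list(SUSPECTS.get(symptom, []))


def root_cause_analysis(symptom: str, tests: Dict[str, Callable[[], bool]]) -> List[Tuple[str, str]]:
    """Run one test per suspect, drop suspects whose test is negative, and attach an action to each survivor."""
    confirmed = [cause for cause in SUSPECTS.get(symptom, []) if tests[cause]()]
    return [(cause, ACTIONS[cause]) for cause in confirmed]


# Hypothetical test outcomes: only the CPU check finds a problem.
tests = {
    "cpu_high": lambda: True,
    "memory_bottleneck": lambda: False,
    "mta_down": lambda: False,
    "network_problem": lambda: False,
}
print(rules_based("queue_growth"))
# ['cpu_high', 'memory_bottleneck', 'mta_down', 'network_problem']
print(root_cause_analysis("queue_growth", tests))
# [('cpu_high', 'reduce the CPU load on the affected server')]
```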
Root Cause Analysis For Exchange Server
Three components that work synergistically:
  Exchange Server
  Windows NT
  IP Network
High Level RCA Architecture
[Figure: an Enterprise Console above a Mid-Level Manager, which in turn manages several Managed Nodes]
RCA Architecture
[Figure: components by tier. Managed nodes: BMC PATROL with the Exchange Server and OS KMs (Knowledge Modules), plus ARB and RTEP KMs; a further managed node shows a custom ARB, another monitor, and a Diagnostic KM. Mid-level agent (Mid-Level Manager): Agent Request Broker (ARB), Realtime Event Proxy (RTEP), RCA Engine, and a Javalink Bridge protocol layer. Top tier: Enterprise Console.]
Root Cause Analysis
Sample problem
[Figure: sample topology. Exchange Server A is the bridgehead server, linked to a remote-office Exchange server over a T1 link. Exchange Server C is the outbound server, sending messages to the Internet through a firewall; Exchange Server D is the inbound server, receiving messages from the Internet. Exchange Server B is another internal server. Legend: internal mail, internal and Internet mail, Internet mail.]
PATROL RCA
Sample problem
Symptom received by the model: queue growth alarms from multiple Exchange servers
  Queue growth on Server A
  Queue growth on Server B
  Queue growth on Server C
  Queue growth on Server D
Suspected root causes found in the model:
  CPU usage high
  Memory bottlenecks
  MTA down on target machine
  Network problem
PATROL RCA
Sample problem
Suspected root causes tested:
  CPU usage high?
  Memory bottlenecks?
  MTA down on target machine?
  Network problem?
Root cause isolated: CPU usage high on the bridgehead (Exchange Server A)
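The sample problem can be walked through in code: instantiate the four suspected faults on every server a queued message passes through, test each one, and group the queue-growth symptoms under whichever fault explains them. This is a minimal sketch; the readings, thresholds, and the routing (Servers B, C, and D forwarding through the bridgehead, Server A) are assumptions for illustration, not the PATROL RCA model.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical readings the tests would gather; values are illustrative only.
CPU_PERCENT = {"A": 98, "B": 35, "C": 40, "D": 30}
FREE_MEMORY_MB = {"A": 900, "B": 1200, "C": 1100, "D": 1000}
MTA_RUNNING = {"A": True, "B": True, "C": True, "D": True}
LINK_UP = {"A": True, "B": True, "C": True, "D": True}

# Assumed routing: B, C, and D forward queued mail through the bridgehead, A.
NEXT_HOP: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "A", "D": "A"}

# One test per suspected fault type from the model; each takes a host name.
TESTS: Dict[str, Callable[[str], bool]] = {
    "cpu_high": lambda h: CPU_PERCENT[h] > 90,
    "memory_bottleneck": lambda h: FREE_MEMORY_MB[h] < 100,
    "mta_down": lambda h: not MTA_RUNNING[h],
    "network_problem": lambda h: not LINK_UP[h],
}


def hosts_on_path(host: Optional[str]) -> List[str]:
    """The host that raised the symptom plus every hop its mail passes through."""
    path: List[str] = []
    while host is not None:
        path.append(host)
        host = NEXT_HOP[host]
    return path


def diagnose(symptom_hosts: List[str]) -> Dict[Tuple[str, str], List[str]]:
    """Test every suspect on each symptom's path; map each confirmed fault to the symptoms it explains."""
    explained: Dict[Tuple[str, str], List[str]] = {}
    for symptom_host in symptom_hosts:
        for host in hosts_on_path(symptom_host):
            for fault, test in TESTS.items():
                if test(host):
                    explained.setdefault((fault, host), []).append(symptom_host)
    return explained


result = diagnose(["A", "B", "C", "D"])
print(result)
# {('cpu_high', 'A'): ['A', 'B', 'C', 'D']}
```

One confirmed fault explains all four alarms, so the queue-growth symptoms on Servers B, C, and D are sympathetic events that can be filtered rather than investigated separately.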
Demo: Simple RCA Scenario
Sample Generic Fault Model
Sample Specific Fault Model
Sample Specific Fault Model (Close-up)
Demo: RCA Engine
Causal Directed Graphs
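A causal directed graph points from each cause to the effects it can produce, so the candidate root causes for a symptom are the nodes reached by walking the edges backwards until no further cause exists. The graph below is a hypothetical fragment of the sample problem, not the RCA engine's internal model; the node names are illustrative.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical causal graph: each edge points from a cause to an effect it can produce.
CAUSES: Dict[str, List[str]] = {
    "cpu_high@A": ["mta_slow@A"],
    "memory_bottleneck@A": ["mta_slow@A"],
    "mta_slow@A": ["queue_growth@A", "queue_growth@B"],
    "link_down@B-A": ["queue_growth@B"],
}


def reverse_edges(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each effect to the list of causes that can produce it."""
    parents: Dict[str, List[str]] = {}
    for cause, effects in graph.items():
        for effect in effects:
            parents.setdefault(effect, []).append(cause)
    return parents


def candidate_root_causes(graph: Dict[str, List[str]], symptom: str) -> Set[str]:
    """Breadth-first walk backwards from the symptom; ancestors with no causes of their own are candidates."""
    parents = reverse_edges(graph)
    seen = {symptom}
    queue = deque([symptom])
    roots: Set[str] = set()
    while queue:
        node = queue.popleft()
        causes = parents.get(node, [])
        if not causes and node != symptom:
            roots.add(node)
        for cause in causes:
            if cause not in seen:
                seen.add(cause)
                queue.append(cause)
    return roots


print(sorted(candidate_root_causes(CAUSES, "queue_growth@B")))
# ['cpu_high@A', 'link_down@B-A', 'memory_bottleneck@A']
```

The candidates returned here are only suspects; the tests described earlier still decide which one is the actual root cause.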
Demo: Root Cause Analysis
Exchange, NT, IP Network
Demo: Impact Analysis
Exchange, NT, IP Network
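Impact analysis can run the same kind of graph in the other direction: starting from a fault (or a planned change), follow the edges forward to every dependent component and report the affected services. The dependency graph and node names below are hypothetical and only loosely based on the sample topology.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency graph: each edge points from a component to what depends on it.
DEPENDENTS: Dict[str, List[str]] = {
    "cpu@A": ["mta@A"],
    "mta@A": ["service:remote_office_mail", "service:internet_mail"],
    "t1_link": ["service:remote_office_mail"],
}


def impacted(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search forward from start; returns every reachable node except start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependent in graph.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen - {start}


def impacted_services(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Keep only the user-facing services among the impacted nodes."""
    return sorted(n for n in impacted(graph, start) if n.startswith("service:"))


print(impacted_services(DEPENDENTS, "cpu@A"))
# ['service:internet_mail', 'service:remote_office_mail']
print(impacted_services(DEPENDENTS, "t1_link"))
# ['service:remote_office_mail']
```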
Benefits Of RCA
Based on well-researched theories
Quicker problem resolution
  Problem isolation saves resources for addressing the real problem
  Symptom filtering lets the administrator ignore sympathetic events
  Performs tests to find the root cause
Far superior to the rules-based approach
Key enabler for making systems self-sufficient
Provides impact analysis capability
Systems Management Vision
Where’s all this stuff going?
Phases Of Management Maturity
Based on commonly known process control theory; applies directly to the management of complex software systems
[Figure: maturity staircase, from lowest to highest: MONITOR, MANAGE, CONTROL, STABILIZE, VIRTUALIZE]
Maturity Phases: MONITOR
Monitoring is plumbing
  Included with Windows 2000 and Exchange 2000
  Server-centric data and event collection
Monitors component and system data
No awareness of other systems or apps
Basic alerting, scripting, and actions
Examples: WMI, PerfMon, HealthMon, Exchange 2000 monitoring
Maturity Phases: MANAGE
Application-specific and server-centric
  View and take action on components
  Availability and performance monitoring
  Rich reporting
  Application SLA definition
ASAP resolution when out of compliance
Most correlation done in your head
Some tools have reached this level
Key enabler of the Control phase
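An application SLA definition can be as simple as a target availability over a measurement window; checking compliance is then arithmetic over recorded outages. This is a minimal sketch; the `Sla` type, the 99.9% target, the 30-day window, and the outage intervals are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sla:
    """Hypothetical SLA definition: a target availability over a measurement window."""
    service: str
    target: float          # e.g. 0.999 for 99.9%
    window_minutes: float  # length of the measurement window


def availability(outages: List[Tuple[float, float]], window_minutes: float) -> float:
    """Fraction of the window not covered by outages; (start, end) minutes, assumed non-overlapping."""
    downtime = sum(end - start for start, end in outages)
    return 1.0 - downtime / window_minutes


def in_compliance(sla: Sla, outages: List[Tuple[float, float]]) -> bool:
    """True while measured availability meets the SLA target."""
    return availability(outages, sla.window_minutes) >= sla.target


thirty_days = 30 * 24 * 60  # 43,200 minutes; 99.9% allows 43.2 minutes of downtime
sla = Sla("Internet mail", target=0.999, window_minutes=thirty_days)
print(in_compliance(sla, [(100, 130)]))              # 30 minutes down: True
print(in_compliance(sla, [(100, 130), (500, 520)]))  # 50 minutes down: False
```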
Maturity Phases: CONTROL
Places system automation in control
  Provides a holistic view of systems
  Enables a high level of SLA compliance
Quick problem diagnosis
Action <--> Reaction
Proactive correction before users feel the impact
Management automation is maturing
Maturity Phases: STABILIZE
Provides utility-level service
  As reliable as electricity, telephone, and water
  Assures continuous application service
  Clusters
Built-in fault tolerance, re-routing, and workload management
Failure does not impact service
Prediction / impact analysis
  Awareness of the impact that planned changes will have on SLAs
Maturity Phases: VIRTUALIZE
The system learns how to deal intelligently with various issues
Automatic everything
  Actions and responses for the IT group
  Alerts and communications
Acquires and stores knowledge for future reference
Uses policy engines to control actions
Systems become truly self-sufficient
Users can serve themselves
Virtualization Example
Problem Research Assistant
Correlates diagnosed root causes of problems with:
  Previous resolutions: presents the user with previous remedies based on exact matches or a best guess
  Online technical documentation: integrates with vendor-supplied support documentation (e.g., Microsoft Knowledge Base articles)
  Technical Support Request Generator: formats the required user information and the diagnosed fault into a support request, according to vendor-specific templates
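The previous-resolutions lookup ("exact matches or best guess") can be sketched as an exact dictionary hit with a similarity fallback. Here a diagnosed fault is described by a set of tags and the best guess uses Jaccard overlap between tag sets; the tags, the 0.5 threshold, the history entries, and the choice of Jaccard similarity are all assumptions for illustration, not how the Problem Research Assistant matches problems.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Hypothetical problem-response history: fault signature (set of tags) -> remedy that worked.
HISTORY: Dict[FrozenSet[str], str] = {
    frozenset({"cpu_high", "bridgehead", "mta"}): "remedy recorded for incident 1",
    frozenset({"mta_down", "remote_office"}): "remedy recorded for incident 2",
}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap between two tag sets: size of intersection divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def previous_resolution(tags: Iterable[str], min_score: float = 0.5) -> Optional[Tuple[str, str]]:
    """Return ("exact", remedy), ("best guess", remedy), or None if nothing is similar enough."""
    signature = frozenset(tags)
    if signature in HISTORY:
        return ("exact", HISTORY[signature])
    best = max(HISTORY, key=lambda past: jaccard(signature, past), default=None)
    if best is not None and jaccard(signature, best) >= min_score:
        return ("best guess", HISTORY[best])
    return None


print(previous_resolution({"cpu_high", "bridgehead", "mta"}))
# ('exact', 'remedy recorded for incident 1')
print(previous_resolution({"cpu_high", "bridgehead"}))
# ('best guess', 'remedy recorded for incident 1')
print(previous_resolution({"disk_full"}))
# None
```

Returning the match type alongside the remedy lets the user see whether a suggestion is a known fix or only a best guess.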
Virtualization Example
Problem Research Assistant
[Figure: the RCA Server (domain models and an IP Reachability Analyzer) passes diagnosed faults through a bridge to the Problem Research Assistant, whose correlation back end draws on previous resolutions from a problem response history repository, online technical articles, help, and support requests]
RCA Takes Management To The Next Level
[Figure: the maturity staircase (MONITOR, MANAGE, CONTROL, STABILIZE, VIRTUALIZE) annotated "Many players, many choices", with Root Cause Analysis shown as the step to the higher phases]
Summary
GOAL: No interruptions in service
RCA is key to Exchange availability
  Accelerates the diagnosis process
  Can assess the impact of failures beforehand
  Not unreasonable to achieve “five 9’s” (99.999% availability)
RCA paves the way to virtualization
  Managed systems that learn and adapt
  You never have to intervene
  Free to invest more time in proactive work
RCA is in beta now!
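The “five 9’s” target is easier to judge as a downtime budget. This is plain arithmetic, assuming a 365-day year: at 99.999% availability, the PoF-to-PoR time of every incident in a year has to fit in roughly five minutes, which is why shrinking diagnosis time matters so much.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Yearly downtime budget for a given availability fraction."""
    return MINUTES_PER_YEAR * (1.0 - availability)


for label, target in [("three 9's", 0.999), ("four 9's", 0.9999), ("five 9's", 0.99999)]:
    print(f"{label}: {allowed_downtime_minutes(target):.1f} minutes/year")
# three 9's: 525.6 minutes/year
# four 9's: 52.6 minutes/year
# five 9's: 5.3 minutes/year
```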
Call To Action
Demand sophistication and simplicity in Exchange management solutions
  Solutions that learn
  Solutions that are easy to use
Start thinking of Exchange availability in terms of utility-level service
Consider where to implement RCA in your current environment
Bring along the people you serve
  Take care of your users
  Communicate with them as you progress