ITIL® v3 Event Management —
A Look at the Theory (from the Real World)
Brenda L. Peery, 14th September 2009
BCS Specialist Group Session,
All copyrights acknowledged.
ITIL ® is a Registered Trade Mark of the Office of Government Commerce, and is Registered in the U.S. Patent and Trademark Office
An ‘event’
Not here for the tents and soundstage?
What is worth taking from that as we go forward
to look at our idea of Event Management?
• It looks like it could be a bit muddy
– Very broad definition
– Obscure language
• But there is an idea of purpose …
– [from ‘Event Management’, Wikipedia]
“to market themselves, build business
relationships, raise money or celebrate”
Speaker’s Background
• 15+ years' experience in IT service-related projects and roles, on both the vendor and user sides, including Event & Systems Management work
• ITIL v2 Manager, v3 Expert, MSP and PRINCE2 Practitioner, ITIL instructor, APMG committee member developing new ITIL credentials
• As an independent consultant for the last 5 years,
“IT Service Management Architect” is my favourite title
thus far …
Main Topics / Goals
• Event Management – you may already know it and have it
– Monitoring and Event Management (key relationship)
• Event Management – the Basics according to ITIL®
• Where EM fits & What to consider in doing it
– First ask why – strategy
– Planning and managing
• Evaluation of the need
• What are you trying to solve / what need are you trying to serve
• Define a model and develop a strategy
Initial Context – Familiarity?
• Event Management (EM) as a core process is new in v3 ITIL, with some roots in v2
• What elements are familiar?
© Crown copyright. Reproduced from the OGC's
ITIL® version 2 volume: ICT Infrastructure
Management and version 3 Core volume Service
Operation. All rights acknowledged.
Initial Context – Monitoring?
• Almost everyone has some familiarity with “Monitoring”
• Consider monitoring and management over the last decade:
– Systems Management software tools: IBM Tivoli (particularly
TEC), CA NSM, BMC Patrol
– the reporting capability of underlying Operating Systems: log files
and system utilities, Task Manager in Windows, the “top”
command in Unix
– And never underestimate the diagnostic scripts that your
SysAdmins have written or inherited
(Illus.) Ops Bridge Monitoring
Monitoring
Other kinds of monitoring?
• Other IT?
• Other sector?
• Inventory?
• Business monitoring?
• Projects to bring in and manage that monitoring
• Why do we do it?
Initial Context – History
So even though Event Management is ‘new’ there are
some challenges – in creating a process model – from the
back history that comes along with your infrastructure:
• There may already be strategies in place and benefits being
realised from monitoring programmes
• There are likely to be ‘competing understandings’ of:
– what events are
– what you are or are not doing about them and
– at what levels you are engaging to monitor and utilise them
• Stakeholders may range from in-depth technical specialists all the way up to non-technical consumers of the information EM can produce
Your back history, embedded in your kit, will shape or
constrain your EM possibilities
Best Practice Benefits
Develop a shared understanding and common
language based on best practice recommendations,
at least as your starting point …
EM Basics 1 – EM Process
“Event Management is the process that monitors all events
that occur through the IT infrastructure to allow for
normal operation and also to detect and escalate exception
conditions” (SO p.35).
So it is about:
– Detecting events
– Making sense of them
– Determining appropriate control actions in response to them
But also:
– Acting as a basis for automating routine Operations Management, and
– Because it provides data for comparison, supporting
• Service Assurance and Reporting
• Service Improvement
Event Management - Value
“Generally indirect” (SO p.39)
• EM provides mechanisms for early detection of incidents
(possibly action before any impact felt)
• EM provides a basis for automated operations
• EM provides a basis for monitoring automated activity by
exception
– Reducing the need for “expensive and resource intensive real-time
monitoring while reducing downtime”
• Improves performance of other major Processes (early
responses, more business benefit from more effective and
efficient ITSM)
EM Basics 2 – Event Definition
What is an Event?
Any detectable or discernible occurrence that has
significance for the management of the IT infrastructure or
the delivery of IT service and the evaluation of the impact
a deviation may cause to a service.
Events are typically notifications created by an IT service,
Configuration Item (CI) or monitoring tool.
(SO, p.35-36)
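Since events are "typically notifications created by an IT service, Configuration Item (CI) or monitoring tool", it can help to pin down what such a notification carries. The record below is a minimal sketch under assumptions: the class name `EventNotification` and its fields are hypothetical illustrations, not an ITIL-defined or tool-defined format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class EventNotification:
    """Hypothetical event record; field names are illustrative, not from ITIL."""
    source: str                    # the IT service, CI or monitoring tool that raised it
    ci_id: str                     # the Configuration Item the event concerns
    message: str                   # human-readable description of the occurrence
    metric: Optional[str] = None   # e.g. "cpu_pct", if the event carries a measurement
    value: Optional[float] = None  # the measured value, if any
    # when the notification was created, recorded in UTC
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


evt = EventNotification(source="monitoring tool", ci_id="unix-db-01",
                        message="CPU at 91%", metric="cpu_pct", value=91.0)
```

Making the record immutable (`frozen=True`) is a design choice: once raised, a notification is a fact about the past, and later processing (filtering, correlation) produces new records rather than editing old ones.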
EM Basics 3 – Event Definition (Breadth)
Checking the official scope doesn’t narrow it
down much:
“Event Management can be applied to any aspect of
Service Management that needs to be controlled and which
can be automated” (SO p.36).
EM Basics 4 – Event Type
But there is more detail: the guidance suggests that you subdivide Events, and “that at least these three broad categories be represented” in your Event Types:
1. Informational
• There is no action required
• Signifies regular operation (not an exception).
2. Warning
• Approaching a threshold.
• Signifies unusual, but not exceptional, operation
3. Exception
• Abnormal operation. Breach of parameters.
Note also: Alert
(to trigger human attention or intervention)
[SO, p.40]
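The three broad categories map naturally onto a pair of thresholds on a measured value. The sketch below assumes a single numeric metric with hypothetical `warn_at` and `breach_at` limits, and assumes a policy in which only Exceptions raise an Alert; real policies often alert on some Warnings too.

```python
from enum import Enum


class EventType(Enum):
    INFORMATIONAL = "I"  # regular operation, no action required
    WARNING = "W"        # approaching a threshold: unusual, not exceptional
    EXCEPTION = "E"      # abnormal operation: a parameter has been breached


def categorise(value: float, warn_at: float, breach_at: float) -> EventType:
    """Classify one reading against two hypothetical thresholds (warn_at < breach_at)."""
    if warn_at >= breach_at:
        raise ValueError("warn_at must be below breach_at")
    if value >= breach_at:
        return EventType.EXCEPTION
    if value >= warn_at:
        return EventType.WARNING
    return EventType.INFORMATIONAL


def needs_alert(event_type: EventType) -> bool:
    """Assumed policy: only Exceptions trigger an Alert for human attention."""
    return event_type is EventType.EXCEPTION
```

For example, with `warn_at=80` and `breach_at=95`, a CPU reading of 91 is categorised as a Warning and does not raise an Alert under this assumed policy.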
EM Basics 5 – Process Flowchart
[Figure: Event Management process flowchart. Steps and decisions shown: Event; Event Notification Generated; Event Detected; Event Filtered; Significance? (Informational / Warning / Exception); Event Logged; Event Correlation; Trigger; Auto Response; Alert; Type? (Incident / Problem / Change); Human Intervention; Incident, Problem and Change Management; Review Actions; Effective? (Yes: Close Event, End; No)]
© Crown copyright 2008. Reproduced from the OGC's ITIL® core volume: Service Operation. All rights acknowledged.
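The routing part of the flowchart (significance, correlation, trigger) can be sketched as a single function. This is a simplified illustration, not the ITIL process itself: the rule table `RULES`, the CI names and the response strings are hypothetical, and the default responses for unmatched events are assumptions.

```python
from enum import Enum


class EventType(Enum):
    INFORMATIONAL = "I"
    WARNING = "W"
    EXCEPTION = "E"


# Hypothetical correlation rules keyed on (CI, event type); a real rule base
# would live in your event management tool, not in a dict.
RULES = {
    ("web-01", EventType.WARNING): "auto_response",  # e.g. a scripted restart
    ("db-01", EventType.EXCEPTION): "incident",      # hand off to Incident Management
}


def handle(ci_id: str, event_type: EventType, event_log: list) -> str:
    """Route one filtered event: log it, then pick a response by significance."""
    event_log.append((ci_id, event_type.value))  # every filtered event is logged
    if event_type is EventType.INFORMATIONAL:
        return "close"                            # no action required
    # Correlation: look for a specific rule for this CI and event type
    response = RULES.get((ci_id, event_type))
    if response is None:
        # Assumed defaults: warnings raise an Alert, exceptions open an Incident
        response = "alert" if event_type is EventType.WARNING else "incident"
    return response
```

Keeping the rules as data rather than code mirrors how most event tools separate the correlation rule base from the engine that applies it.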
EM Basics 6 – Process Activities Summary
• Event occurs – Notification / Detection
• Filtering (Categorisation)
• Correlation (Logic/rules) – Note: load
• Trigger / Response Selection – Note: human perception
• Review / Close
(I, C, P: hand-offs to Incident, Change and Problem Management)
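The "load" note on Correlation is a reminder that correlation exists partly to cut volume: many raw events should become one meaningful event. Below is a minimal sketch of one common correlation technique, counting repeats within a sliding time window. The tuple layout, the 300-second window and the threshold of 3 are illustrative assumptions.

```python
from collections import defaultdict


def correlate(events, window_s: float = 300.0, threshold: int = 3):
    """Collapse repeated (ci_id, message) events into one correlated event.

    `events` is an iterable of (timestamp_s, ci_id, message) tuples in time
    order. One event is emitted when `threshold` repeats fall inside
    `window_s` seconds; both values are illustrative assumptions.
    """
    recent = defaultdict(list)  # (ci_id, message) -> timestamps inside the window
    out = []
    for ts, ci_id, message in events:
        key = (ci_id, message)
        # drop timestamps that have slid out of the window, then add this one
        recent[key] = [t for t in recent[key] if ts - t < window_s] + [ts]
        if len(recent[key]) == threshold:
            out.append((ts, ci_id, f"{message} (x{threshold} in {window_s:.0f}s)"))
    return out
```

Four "disk full" events from the same CI within a minute produce a single correlated event, which is the kind of reduction that keeps downstream human perception manageable.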
EM Basics 7 – Events and Infrastructure
Consider the extent to which your process design is and
must be connected to your installed architecture
– Notification/Detection: How are you detecting and how are
notifications sent or collected (and what impact does this have)?
– Filtering/Categorising: sort events into I, W and E streams, or ignore the event (or log/record it locally)
– Triggering an Alert, Auto Response, or related Process (does your
architecture allow this?)
EM – Lifecycle & Summary
[Figure: the ITIL v3 service lifecycle: Service Strategy at the centre, surrounded by Service Design, Service Transition and Service Operation, within Continual Service Improvement (CSI)]
In the Lifecycle concept at the heart of v3 ITIL, the Event Management process is seated in Service Operation, with the full set of SO processes including:
– Event Management
– Incident Management
– Request Fulfilment
– Problem Management
– Access Management
– Operational aspects of other Processes
The EM & Monitoring Relationship
If we revisit the basic definition:
“Event Management is the process that monitors all
events that occur through the IT infrastructure to allow for
normal operation and also to detect and escalate exception
conditions” (SO p.35).
While the Service Operation book provides a high-level model of a ‘sample’ EM process, have we really looked closely enough at its key activity, monitoring?
Designing EM – Alternate Lifecycle
[Figure: the ITIL v3 service lifecycle, as above]
“In an ideal world, the Service Design process should define which events need to be generated and then specify how this can be done for each type of CI. During Service Transition, the event generation options would be set and tested”. (SO p.39)
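One way to make the "ideal world" concrete is to record the Service Design decision as data (which events each CI type must generate) and have Service Transition check each CI against it. The specification below is entirely hypothetical: the CI types and event names are invented for illustration.

```python
# Hypothetical event-generation specification, as Service Design might record it:
# CI type -> events a CI of that type must be able to generate.
EVENT_SPEC = {
    "unix_server": {"host_down", "cpu_high", "disk_full", "syslog_error"},
    "oracle_db": {"instance_down", "tablespace_full", "listener_down"},
    "app_server": {"process_missing", "app_log_error"},
}


def transition_gaps(ci_type: str, configured: set) -> set:
    """Service Transition check: required events this CI does not yet generate.

    An unknown CI type has no specification, so it reports no gaps.
    """
    return EVENT_SPEC.get(ci_type, set()) - configured
```

An empty result means the CI's event generation options have been set as designed; anything left over is work for testing before the service goes live.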
Monitoring and Infrastructure
The base monitoring architecture:
• Agent-based
• Agentless
• A sample of an evolved monitoring architecture
Agent-based
Advantages:
* Technically more efficient
* Possible offline operation
* Often richer in functionality
Disadvantages:
* More complicated to install
* Agent disk footprint
[Figure: agent-based monitoring. A monitoring server with a GUI console, optionally via a hub/gateway monitoring server, sends configuration to the agent once; the agent on each Windows/Unix server checks availability (UP?), memory, CPU, disks, processes, the system error log, application logs and script/command output, keeps its stored config, and returns alerts and metrics, with config, alert and metric history retained.]
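The "possible offline operation" advantage comes from the agent doing its checks locally and queuing results until the monitoring server can be reached. The class below is a minimal sketch of that pattern, not any vendor's agent: `Agent`, `sample`, `flush` and the `send` callback are hypothetical names.

```python
from collections import deque


class Agent:
    """Sketch of an agent: checks run locally, results queue while the server is unreachable."""

    def __init__(self, thresholds: dict, max_buffer: int = 1000):
        self.thresholds = thresholds              # pushed once from the monitoring server
        self.buffer = deque(maxlen=max_buffer)    # local history kept during outages

    def sample(self, metrics: dict) -> None:
        """Compare local readings with thresholds; queue an alert or a plain metric."""
        for name, value in metrics.items():
            limit = self.thresholds.get(name)
            kind = "alert" if limit is not None and value >= limit else "metric"
            self.buffer.append((kind, name, value))

    def flush(self, send) -> int:
        """Forward queued records via `send`; stop (keeping the rest) if a send fails."""
        sent = 0
        while self.buffer:
            if not send(self.buffer[0]):
                break
            self.buffer.popleft()
            sent += 1
        return sent
```

The bounded `deque` also illustrates the footprint trade-off: the longer the agent must survive offline, the more local storage it needs.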
Agentless
Advantages:
* No agent to install, so easier to install
* No agent footprint
Disadvantages:
* More load on the monitored machine
* Less resilient to network problems
[Figure: agentless monitoring. A monitoring server with a web console runs a cross-machine scheduling loop, opening a new connection to each Windows/Unix server every cycle to relist and filter processes, rescan the system error log and application log files, check disks, and run scripts/commands remotely, collecting alerts and metrics along with config history.]
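The agentless trade-offs follow from the shape of the scheduling loop: every cycle opens a fresh connection to each host, and with no agent there is nothing to buffer results if the host cannot be reached. The sketch below shows one such cycle. `connect` is a hypothetical stand-in for whatever remote mechanism is used; here it returns the host's readings, or `None` when the host is unreachable.

```python
from typing import Callable, Optional


def poll_cycle(hosts: list,
               connect: Callable[[str], Optional[dict]],
               thresholds: dict) -> list:
    """One pass of an agentless scheduling loop across several machines."""
    events = []
    for host in hosts:
        readings = connect(host)  # a new connection every cycle: extra load on the target
        if readings is None:
            # no agent, so nothing was buffered: the gap itself becomes the event
            events.append((host, "unreachable", None))
            continue
        for name, value in readings.items():
            if name in thresholds and value >= thresholds[name]:
                events.append((host, name, value))
    return events
```

For a quick test, a plain dict's `get` method can stand in for `connect`, since it returns `None` for hosts it does not know.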
Design Considerations – Starting Systems
[Figure: the starting estate. Unix and Windows database servers host Oracle 1 and 2, Sybase 1 and 2 and MSS 1 and 2; Unix and Windows application servers run App 1 (Procs 1 to 3) and App 2 (Proc 1); CRON jobs are in use, and each server has its own CPU, disk, memory and log checks.]
Design Considerations – System Capacity
[Figure: the same estate with a capacity ("Cap") collector added on each server, alongside an open-source capacity tool and an in-house GUI.]
Design Considerations – DB Mon. Capacity
[Figure: the same estate with a separate database monitoring tool (DbMon) on the database servers, producing database capacity-planning web reports.]
Design Considerations – App. Log Check
[Figure: the same estate with a monitoring agent on each server, reporting to a monitoring server with thresholds and app-specific monitoring configuration.]
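An application log check of the kind the agents here perform usually means matching new log lines against app-specific patterns. Below is a minimal sketch under assumptions: the two rules in `LOG_RULES` are invented examples, and remembering an offset so each scan reads only newly appended lines is one possible approach, not a description of any particular tool.

```python
import re

# Hypothetical app-specific configuration: pattern -> event type (first match wins).
LOG_RULES = [
    (re.compile(r"\bORA-\d{5}\b"), "E"),          # Oracle error codes treated as exceptions
    (re.compile(r"\btimeout\b", re.IGNORECASE), "W"),  # timeouts treated as warnings
]


def check_log(lines, start: int = 0):
    """Scan log lines from offset `start`; return (events, next_offset).

    Passing next_offset back in on the following cycle means only lines
    appended since the last scan are examined.
    """
    events = []
    for lineno, line in enumerate(lines[start:], start=start):
        for pattern, event_type in LOG_RULES:
            if pattern.search(line):
                events.append((lineno, event_type, line.strip()))
                break  # first matching rule wins
    return events, len(lines)
```

Keeping the patterns on the monitoring server and pushing them to agents is what allows one central, app-specific configuration to drive checks on many machines.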
ESM Arch (Generic)
[Figure: a generic ESM architecture combining the layers above. Network monitoring, additional departmental (application-specific) monitoring, database monitoring (DbMon) and the agent-based monitoring server (with thresholds and app-specific monitoring configuration) all send events to a central event server, which applies rules, produces a live outage report, and raises tickets with event details in the Incident Management system.]
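The central event server's job in this picture is to merge events from several monitoring sources and turn them into a small number of tickets that carry their supporting events. The function below is a minimal sketch of that consolidation step. The `(source, ci_id, event_type)` tuple layout and the `outage_rule` predicate are hypothetical stand-ins for a real rule base.

```python
from collections import defaultdict


def central_event_server(events, outage_rule):
    """Merge events from several sources and raise one ticket per affected CI.

    `events`: iterable of (source, ci_id, event_type) from network, database,
    departmental and agent-based monitoring. `outage_rule` is a hypothetical
    predicate deciding whether a CI's collected events warrant a ticket.
    """
    by_ci = defaultdict(list)
    for source, ci_id, event_type in events:
        by_ci[ci_id].append((source, event_type))
    tickets = []
    for ci_id, details in by_ci.items():
        if outage_rule(details):
            # the ticket carries its underlying events ("Ticket w/ Events")
            tickets.append({"ci": ci_id, "events": details})
    return tickets
```

Here two exceptions about `db-01`, one from network monitoring and one from DbMon, would become a single ticket listing both, while a lone warning elsewhere would raise nothing under a rule that only tickets on exceptions.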
The Two Perspectives
Operations-led and Design-led
– Operations-led delivers the everyday working process
– The Operations-led vision is really pre-incident Incident Management
– Design-led establishes a conduit between IT Service Management and the underlying technology
– Design-led has the potential to be a very effective front end and interface for traditionally less visible processes:
• Performance & Management Information (dashboards)
• Capacity
• Availability
Strategy – What it Takes to Do EM
[Figure: the ITIL v3 service lifecycle, with Service Strategy at the centre]
Start in the centre ...
First ask “Why?”