Impact Alternatives
Impact without Impact
Daniel L. Needles
January 21, 2011
Rev 2.0
Modification History
AUTHOR             DATE        COMMENTS
Daniel L. Needles  05/01/2003  Initial draft.
Daniel L. Needles  1/1/2010    Added Omnibus 7.X case studies: XinY, Event Audit, Impact Replacement.
Daniel L. Needles  1/21/2011   Additional clean up.
COPYRIGHT © 2010, NMS Guru
Any copying, distribution, or use of any of the information contained within this document
in any way or form is not permitted without the written consent of NMS Guru.
This is an unpublished work protected under the copyright laws. All rights reserved.
Contents
MODIFICATION HISTORY
CONTENTS
INTRODUCTION
  WHAT DOES THE TIVOLI NETCOOL SUITE DO?
  WHAT IS OMNIBUS?
  WHAT IS IMPACT?
  WEIGHING THE ALTERNATIVE OPTIONS?
    INTRODUCTION
    COMPLEXITY
    COST
    FLEXIBILITY
    LOAD
    ROBUSTNESS (HA/DR)
    SUMMARY
  DOCUMENT PURPOSE
  INTENDED AUDIENCE
POLICIES WITHOUT IMPACT
  INTRODUCTION
  POLICY DEVELOPMENT
  PROBE, AUTOMATION, AND DATABASE FIELD DESIGN
    AUTOMATION AND DATABASE FIELD DESIGN
    DATABASE FIELD DESIGN
    AUTOMATION DESIGN
  SOLUTION IMPLEMENTATION AND TESTING
  DOCUMENTATION AND TRAINING
  SUMMARY
CASE STUDY: LIGHTS OUT NOC (OMNIBUS 3.X)
  INTRODUCTION
  BACKGROUND
  THE SOLUTION
    PAGING POLICY DEVELOPMENT
    PAGING SOLUTION - FIRST ATTEMPT WITH DELAY EDGE TRIGGERS
    PAGING SOLUTION - SECOND ATTEMPT WITH AUTOMATIONS AND DB FIELDS
  TRANSLATION
  AUTOMATIONS
CASE STUDY: XINY (OMNIBUS 7.X)
  INTRODUCTION
  REQUIREMENTS
    PROVIDED SOLUTION FRAGMENT
    ENVIRONMENT FEATURES
  RECOMMENDATIONS
    INTRODUCTION
    METHOD 1: PROBE UNIVERSAL INCLUDE FILE 1
    METHOD 2: PROBE UNIVERSAL INCLUDE FILE 2
    METHOD 3: AUTOMATIONS IN COLLECTION (EVENT DEPENDENT)
    METHOD 4: AUTOMATIONS IN COLLECTION (EVENT INDEPENDENT)
    METHOD 5: COLLECTION IMPACT
    METHOD 6: AGGREGATION IMPACT
    SUMMARY
  XINY SOLUTION DESIGN
    INTRODUCTION
    NEW_ROW AND DEDUPLICATION:
    XINY_NEW_ROW AND XINY_DEDUPLICATION
    XINY_EXPIRE
    XINYWINDOW_EXPIRE
    XINY_CLEANUP
  TEST SCENARIOS
    INTRODUCTION
    SCENARIO 1: XINY WITHIN EVENT LIFESPAN
    SCENARIO 2: XINY LIFESPAN SPANNING TWO RELATED EVENTS
    SCENARIO 3: XINY LIFESPAN NOT QUITE SPANNING TWO RELATED EVENTS
    SCENARIO 4: CLEAN HA/DR FAILOVER (BETWEEN XINY EVENTS)
    SCENARIO 5: DIRTY HA/DR FAILOVER (DURING AN XINY EVENT)
    SCENARIO 6: AN XINY FAILURE TO THRIVE
    SCENARIO 7: AN INTERRUPTED XINY LIFESPAN (BY GENERICCLEAR)
  SUMMARY
CASE STUDY: EVENT AUDIT (OMNIBUS 7.X)
  INTRODUCTION
  AUTOMATION AUDIT ARCHITECTURE
  AUTOMATION AUDIT REPORT
  INSTALLATION
    STEP 1: CREATE THE TABLE ALERTS.ALARMAUDIT
    STEP 2: ADD THE AUDIT FIELD (VARCHAR(64)) TO ALERTS.STATUS
    STEP 3: INSERT AN INITIAL RECORD INTO THE ALERTS.ALARMAUDIT TABLE
    STEP 4: CREATE THE CLEAN_ALARMAUDIT_TABLE AUTOMATION
    STEP 5: OPTIONAL: MAKE A PLACE HOLDER FOR TRACKING PROBE EVENTS
    STEP 6: OPTIONAL: MAKE A PLACE HOLDER FOR TRACKING PROBE EVENTS
    STEP 7: UPDATE STATE_CHANGE TO TRACK AUTOMATION UPDATES
    STEP 8: UPDATE GENERICCLEAR, EXPIRE, AND OTHER AUTOMATIONS TO ENABLE EVENT AUDITING
    STEP 9-N: UPDATE OTHER EVENT SPECIFIC AUTOMATIONS (FUTURE EXPANSION)
  SUMMARY
CASE STUDY: IMPACT REPLACEMENT
  INTRODUCTION
  COMPATIBLE FUNCTIONALITY
  FLEXIBILITY
  SPEED
  ROBUSTNESS
  MAINTAINABILITY
SUMMARY
APPENDIX A: XINY NEW TABLES AND TABLE UPDATES
  ALERTS.XINY
  ALERTS.XINYWINDOW
  ALERTS.STATUS
APPENDIX B: ALERTS.STATUS XINY CENTRIC AUTOMATIONS
  NEW_ROW AUTOMATION:
    SETTINGS
    ACTION
  DEDUPLICATION AUTOMATION
    SETTINGS
    ACTION
APPENDIX C: ALERTS.XINY CENTRIC AUTOMATIONS
  XINY_NEW_ROW AUTOMATION
    SETTINGS
    ACTION
  XINY_DEDUPLICATION AUTOMATION
    SETTINGS
    ACTION
APPENDIX D: ALERTS.XINY CENTRIC AUTOMATIONS
  XINY_EXPIRE (TEMPORAL TRIGGER)
    SETTINGS
    EVALUATE
    ACTION
APPENDIX E: XINYWINDOW CENTRIC AUTOMATIONS
  XINYWINDOW_EXPIRE (DATABASE TRIGGER)
    SETTINGS
    ACTION
  XINY_CLEANUP (TEMPORAL TRIGGER)
    SETTINGS
    ACTION
APPENDIX F: EVENTSTREAM.PL
APPENDIX G: EVENT AUDIT REPORT SCRIPT
APPENDIX H: EVENT AUDIT REPORT (EXAMPLE 1)
APPENDIX I: EVENT AUDIT REPORT (EXAMPLE 2)
APPENDIX I: IMPACT-LIKE PERL SHELL
IMPORTANT NOTICE
ABOUT NMS GURU
AUTHOR
Figures
FIGURE 1: EXAMPLE OF AN EVENT LIST
FIGURE 2: EXAMPLE OF TYPICAL SINGLE TIER NETCOOL DEPLOYMENT
FIGURE 3: POLICY FLOW DIAGRAM
FIGURE 4: LIGHTS OUT NOC - ENTERPRISE MONITORING ARCHITECTURE
FIGURE 5: LIGHTS OUT NOC - NOTIFICATION SUPPRESSION GUI
FIGURE 6: LIGHTS OUT NOC - DELAYED EDGE TRIGGER EXAMPLE
FIGURE 7: LIGHTS OUT NOC - POLICY FOR A DOWN EVENT
FIGURE 8: LIGHTS OUT NOC - POLICY FOR A UP EVENT
TABLE 9: LIGHTS OUT NOC - FIELD ADDITIONS
FIGURE 10: LIGHTS OUT NOC - TRIGGER NEODOWNPAGE
FIGURE 11: LIGHTS OUT NOC - TRIGGER NEOCORRELATEUP
FIGURE 12: LIGHTS OUT NOC - TRIGGER NEOFIREUPPAGE
FIGURE 13: XINY - SIX POSSIBLE APPROACHES
FIGURE 14: XINY - AUTOMATION FLOW DIAGRAM
FIGURE 15: XINY - TEST SCENARIO 1
FIGURE 16: XINY - TEST SCENARIO 2
FIGURE 17: XINY - TEST SCENARIO 3
FIGURE 18: XINY - TEST SCENARIO 4
FIGURE 19: XINY - TEST SCENARIO 5
FIGURE 20: XINY - TEST SCENARIO 6
FIGURE 21: XINY - TEST SCENARIO 7
FIGURE 22: ADMIN GUI - TABLE CREATION
FIGURE 23: ADMIN GUI - AUTOMATION CLEAN_ALARMAUDIT_TABLE
FIGURE 24: ADMIN GUI - AUTOMATION CLEAN_ALARMAUDIT_TABLE
FIGURE 25: ADMIN GUI - AUTOMATION NEW_ROW
FIGURE 26: ADMIN GUI - AUTOMATION DEDUPLICATION
FIGURE 27: ADMIN GUI - AUTOMATION STATE_CHANGE
FIGURE 28: ADMIN GUI - AUTOMATION GENERICCLEAR
FIGURE 29: ADMIN GUI - AUTOMATION EXPIRE
FIGURE 30: SCHEMA: ALERTS.XINY
FIGURE 31: SCHEMA: ALERTS.XINYWINDOW
FIGURE 32: SCHEMA CHANGES: ALERTS.STATUS
Introduction
What does the Tivoli Netcool Suite do?
IBM's flagship fault management product is the Tivoli Netcool suite. It consists of several
products working together to shuttle events from disparate sources into a common
database for display, reporting, and analysis.
Figure 1: Example of an Event List
Events flow through the system as follows. Each probe receives events from a
particular source, the most common being syslog and SNMP traps. Using detailed
instructions contained in rules files, the probes convert the information about each alarm
into a common format and insert it into the object server database. Many
applications then pull the events from the object server for reports, display, data
redundancy, or further analysis.
Figure 2: Example of Typical Single Tier Netcool Deployment
What is Omnibus?
Together, the probes, gateway, and object server are called Omnibus. Omnibus
collectively acts as an event processor by:
• Consolidating events from multiple Network Management Systems (NMSs) and
Element Management Systems (EMSs) into a single display screen.
• Deduplicating multiple events into a single device-centric and/or service-centric event,
in the process drastically reducing the number of events.
• Normalizing the disparate events into a consistent presentation while preserving
the events' meaning.
• Correctly stressing each event’s importance and/or urgency relative to the
other events displayed, regardless of the disparate event sources.
What is Impact?
Impact is a separate product that can be integrated into Omnibus to increase the
functionality of the fault management solution. Impact provides two additional features
not readily available in Omnibus:
1. Integration with outside data sources (e.g., web services, Oracle, Sybase)
2. Complex event enrichment, correlation, and/or augmentation.
Impact performs these two functions by periodically executing three steps:
1. Pull events from the object server.
2. Perform work: programmatically read and write data to and from:
   a. The object server
   b. Web services (XML, HTML)
   c. Databases (Oracle, Sybase, MySQL, etc.)
3. Update the events in the object server.
The result is an extension of fault management functionality.
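
The three-step cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `object_server` list stands in for the real object server, the `SITE_BY_NODE` dictionary stands in for an outside data source, and every field and function name is hypothetical rather than part of Impact or Omnibus.

```python
# Hypothetical stand-ins: a list of dicts plays the object server and a
# dict plays the outside data source. Nothing here is a real Netcool API.
object_server = [
    {"Serial": 1, "Node": "router1", "Site": ""},
    {"Serial": 2, "Node": "switch9", "Site": ""},
]
SITE_BY_NODE = {"router1": "Denver", "switch9": "Austin"}


def pull_events(server):
    """Step 1: pull copies of the events that still need enrichment."""
    return [dict(e) for e in server if not e["Site"]]


def do_work(events, lookup):
    """Step 2: enrich each pulled event from the outside data source."""
    for event in events:
        event["Site"] = lookup.get(event["Node"], "unknown")
    return events


def update_events(server, enriched):
    """Step 3: write the enriched field back to the object server."""
    by_serial = {e["Serial"]: e for e in enriched}
    for event in server:
        if event["Serial"] in by_serial:
            event["Site"] = by_serial[event["Serial"]]["Site"]


def run_cycle(server, lookup):
    """One polling cycle; a real deployment would repeat this on a timer."""
    update_events(server, do_work(pull_events(server), lookup))


run_cycle(object_server, SITE_BY_NODE)
```

Between two cycles the events in the object server are not updated, which is why the length of the polling interval matters in the comparison that follows.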
Weighing the Alternative Options?
Introduction
At first glance Impact appears to be the most logical, if not the only, method to provide
outside data source integration and/or complex event enrichment and correlation.
However, one of the chief strengths of Netcool is its ability to interoperate with other
applications and programs. As a result, Impact is not always the best solution for the
problem at hand.
The following chart scores four common approaches with regard to the complexity,
cost, flexibility, load, and robustness (high availability (HA) and disaster recovery (DR))
of each approach. Judging from the discussion that follows, a higher score is more
favorable; for example, a 5 under Cost marks the least expensive options.
Option       Complex  Cost  Flexible  Load  HA/DR
Impact       4        1     4         2     2
Automations  3        5     3         1     4
Probes       5        5     1         5     4
Scripts      1        2     5         4     5
Complexity
Impact uses a proprietary fourth-generation language. As a result, specialists are required
to administer, program, and maintain the policies in the application. Further, Impact
lacks many of the programming constructs of third-generation languages, yet it does not
abstract those constructs away as most fourth-generation languages do. For example, in order
to scale multiple policies, the policies need to be cascaded from a single data source
reader. This construct is not prebuilt and requires considerable configuration and coding.
The result is neither intuitive nor self-documenting.
Despite these issues, Impact is designed to extend Netcool and as a result is simpler than
most other approaches. Impact is not simpler than the probe approach only because
probes are limited and offer fewer options, which makes them simpler to work with.
Cost
The complexity of Impact requires high-end consultants to maintain it effectively. This,
on top of a high purchase price and yearly maintenance, makes Impact very costly in
comparison to the alternatives: automations, probes, and scripts. Licensing and
maintenance for automations and probes are already paid for, since both come with the core
Netcool product, Omnibus. Scripts are slightly cheaper than Impact because they do not
carry the hefty maintenance cost associated with it.
Flexibility
Impact is the most flexible approach besides scripting. Both automations and probes are
limited in language and cannot work directly with outside data sources.
However, automations and probes can both act on an event immediately, so there is no
latency in event updates, unlike Impact or scripting. Scripting provides all the flexibility
of Impact and more, at the cost of building the framework functionality that Impact provides.
Load
Load is not one of Impact's strongest suits. Once Impact is installed and configured, in
most cases the product can handle at most 6-10 mildly complex algorithms, and only if
the polling interval is backed off from 3-4 seconds to 30-90 seconds. This of course
means that the object server events go without updates for 30-90 seconds at a time, which
leads to other issues. Automations do not perform much better and can place a heavy load on
the system. Probes are the least "heavy" but are very limited in application. Finally, scripts
often provide the best functionality and can scale considerably better than Impact.
Robustness (HA/DR)
As the Java-based Impact program consumes system resources, the program as a whole
begins to behave badly. Part of the cause is poor product architecture, in that Impact does
not take the architecture of Java into account. In particular, each JVM handles resource
allocation for the Java programs it runs. When the JVM is overloaded or is context
switching, garbage collection falls behind. When that happens, the JVM blocks all processing
until the garbage is collected, that is, until it has cleaned up the objects that have been
used and abandoned. If the code spawns many objects, the garbage collector
consumes the CPU as it traverses memory to reclaim space. As a result, when the
primary needs to fail over to the secondary in the cluster, there is not enough memory or
CPU to handle the failover correctly. Unfortunately, this is exactly the scenario
clustering is expected to handle. Worse, the workaround is to shut down the primary and
secondary Impact servers together before restarting the primary, which results in
an outage of Impact functionality anyway.
Summary
When extending the functionality of Netcool into event manipulation and integration
with outside data sources, there are many options, and Impact is only one of them. A variety
of factors need to be taken into consideration when selecting an approach. This
section has discussed five of them: complexity, cost, flexibility, load, and
robustness (high availability (HA) and disaster recovery (DR)).
Document Purpose
There are alternatives to purchasing Impact or building an in-house solution. Many
NOCs require only simple automated diagnostics, notifications, and/or escalation
processes. The purpose of this document is to explore Impact alternatives for these cases
through any of the following:
• Omnibus Automations and Database Fields
• Probe Rules and Database Fields
• Customized Scripts and Database Fields
Several case studies within this document show in full how this is done:
• Lights Out NOC: Automated notification and escalation.
• XinY: Flagging X events occurring in Y period of time.
• Event Audit: Which policy, automation, or probe touched which events, and when.
• Impact Replacement: An Impact-like Perl script that emulates Impact behavior.
Procedures that emulate policies can be written and embedded in the script to
replace Impact functionality.
Intended Audience
This document is directed toward managers and engineers tasked with creating,
maintaining, and/or using a distributed NMS architecture.
Policies without Impact
Introduction
Through the addition of database fields and the careful use of Omnibus automations,
Omnibus can implement complex and effective state machines. However, since this is
not a designed feature of the product, three caveats complicate the
implementation of these solutions:
1. ALL state transitions are either timed or based on database actions: insert, reinsert,
update, and delete. Only Object Server triggers or probe processing can provide the
action to transition between states.
2. Unlike a normal policy model, events in every state are visible to the automation
performing a transition. In the case of Object Server automations, the SQL WHERE
clause must therefore be used to exclude events that are not in the intended state.
Otherwise those events will also be transitioned, breaking the model.
3. The presentation of automations and event records is not engineered toward
displaying policy models. Thus, other facilities need to be used to describe and
document the policy model. This is critical since natural entropy, more than any other
factor, eventually leads to the demise of any monitoring solution.
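
The second caveat can be illustrated with a small sketch. It assumes an in-memory SQLite database as a stand-in for the Object Server; the `status` table, its columns, the state numbers, and the 15 minute threshold are all illustrative and are not the real alerts.status schema or ObjectServer SQL.

```python
import sqlite3

# In-memory SQLite stands in for the object server; the table and column
# names are illustrative and are not the real alerts.status schema.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE status (Serial INTEGER PRIMARY KEY, AgeMin INTEGER, State INTEGER)"
)
db.executemany(
    "INSERT INTO status VALUES (?, ?, ?)",
    [(1, 20, 0),   # in state 0 and old enough to move on
     (2, 5, 0),    # in state 0 but still too young
     (3, 90, 2),   # already two states further along
     (4, 45, 3)],  # no longer participating in the model
)

# Guarded transition: the WHERE clause names the source state (0), so only
# events in that state and past the threshold move to state 1.
guarded = db.execute(
    "UPDATE status SET State = 1 WHERE State = 0 AND AgeMin >= 15"
).rowcount

# For contrast, count the rows that an unguarded transition (age test only)
# would touch: it would also drag events 3 and 4 back to state 1.
unguarded = db.execute(
    "SELECT COUNT(*) FROM status WHERE AgeMin >= 15"
).fetchone()[0]

states = dict(db.execute("SELECT Serial, State FROM status"))
```

Here the guarded update changes one row, while the age-only condition matches three, two of which belong to other states of the model.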
With these limitations in mind, there is a straightforward four-step process to create and
implement models within the probes and/or Omnibus’s Object Server:
1. Policy Development.
2. Probe, Automation, and Database Field Design.
3. Solution Implementation and Testing.
4. Documentation and Training.
Policy Development
The first step is to fully delineate the 'policy' to be modeled. That is, the business process
and its impact should be fully delineated and vetted with the customer base. This is best
explained through an example.
Figure 3: Policy Flow Diagram (Event Received; Tier-1 Operator; Event Resolved?; Older than 15 minutes?; Tier-2 Engineer; Event Resolved?; Older than 60 minutes?; Tier-3 Manager; Event Removed From Escalation)
Imagine a small company that wants to implement a lights out NOC to give its workers
the flexibility to work from home and to perform other duties.
In this case, once all the stakeholders were consulted and the requirements identified, a
simple escalation policy was drafted:
1. When an event for a particular set of devices is received by Omnibus, a 24x7 NOC
tier-1 operator is notified and troubleshoots the problem.
2. If the operator cannot solve (clear) the issue within 15 minutes, the system notifies
tier-2 by paging an engineer (whether or not the event has been acknowledged).
3. If no resolution is found after an hour, the system escalates to tier-3 by paging the
manager of the engineering group (whether or not the event has been acknowledged).
In each case the event must be resolved in order to remove it from the escalation
process. Once the problem is resolved, the escalation process can begin anew if the issue
recurs.
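
Written as a decision function, the drafted policy is small. The sketch below is only an illustration: the function name and the tier labels are hypothetical, and treating the thresholds as inclusive (escalating at exactly 15 and 60 minutes) is an assumption the policy text does not settle.

```python
def tier_for(age_minutes, resolved):
    """Return who owns an event under the drafted policy, or None once resolved.

    Acknowledgement is deliberately not an input: the policy escalates
    whether or not the event has been acknowledged.
    """
    if resolved:
        return None          # resolved events leave the escalation process
    if age_minutes >= 60:
        return "tier-3"      # page the manager of the engineering group
    if age_minutes >= 15:
        return "tier-2"      # page an engineer
    return "tier-1"          # the 24x7 operator keeps the event
```

For example, `tier_for(20, False)` yields `"tier-2"`, while any resolved event yields `None` regardless of its age.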
During the vetting process one area of concern was discovered: what if tier-1 or tier-2
staff delete the event? This would bypass the escalation solution. The decision was made to
alter the delete event tool in Webtop and in the thick Windows-based Omnibus client to
prevent these users from deleting these events. These users should not have
access to nco_sql either, but this also needed to be verified.
Probe, Automation, and Database Field Design
Once the policy has been identified and described, the solution can be designed through
modifications and additions to probe rules, automations, and the alerts.status table.
In our lights out NOC example, implementing the solution within the probe rules can be
ruled out. The issue is that the escalation is time based rather than event-instance based.
Since the probe rules are only activated when an event hits the probe, they are not
the best location to model the bulk of the policy.
This leaves the Omnibus automations to perform the state transitions and the
alerts.status fields to record the various states in the policy.
Automation and Database Field Design
Once the models have been designed, they must be translated into Omnibus database
fields and automations. The database fields keep track of an event's flow
through the model, while the automations transition the event from one stage of the
model to the next by changing the values of the database fields.
Database Field Design
Database fields contain information regarding the event’s participation in the policy
model. Fields can be used to indicate the position (state) within the model. They can also
be used to provide pertinent information for the model such as paging addresses.
Which database fields to use depends on the application. The cleanest method is
to add custom fields with names that clearly describe how they are used. However,
depending on change control, memory limitations, and other constraints, this is
sometimes not possible. Alternatively, existing fields can be used by adding additional
possible values, appending values to the end of the field, or taking over their function
entirely. In these cases documentation and testing are critical to ensure business
functions are not disrupted.
In this example, a custom integer field called “State” is created to track the status of the
escalation. For instance, the following states could be set:
State  Status
0      Event Received by Tier-1 (default)
1      Escalated to Tier-2
2      Escalated to Tier-3
3      Event Removed or NOT Participating in the Escalation Process
The State field not only allows for status tracking, but when used in conjunction with the
automations it ensures that events aren’t processed multiple times.
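
A minimal sketch of how the State field keeps an event from being processed twice follows. The enumeration mirrors the four values listed above; the event dictionary, the `AgeMin` field, and the `escalate_to_tier2` helper are hypothetical names used only for illustration.

```python
from enum import IntEnum


class State(IntEnum):
    TIER1 = 0      # event received by tier-1 (default)
    TIER2 = 1      # escalated to tier-2
    TIER3 = 2      # escalated to tier-3
    REMOVED = 3    # removed from, or not participating in, escalation


def escalate_to_tier2(events, threshold_min=15):
    """Return the serials paged on this pass; each event is paged at most once."""
    paged = []
    for event in events:
        if event["State"] == State.TIER1 and event["AgeMin"] >= threshold_min:
            event["State"] = State.TIER2    # record the new state first ...
            paged.append(event["Serial"])   # ... so a later pass skips the event
    return paged


events = [{"Serial": 7, "AgeMin": 20, "State": State.TIER1}]
first_pass = escalate_to_tier2(events)
second_pass = escalate_to_tier2(events)
```

The first pass pages for event 7 and moves it to `State.TIER2`; the second pass finds nothing left in `State.TIER1` and pages nobody.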
Once the fields and field values are selected, three additional constraints have to be
considered: model granularity, model crosstalk, and failover constraints.
Model Granularity
As stated above, the Omnibus Object Server will use a field in each record to track an
event through the model. The implication is that without building extra complexity into a
model, the lowest granularity that can be obtained by a model is the database record. For
example, if the mttrapd rules file is configured to use the same identifier for any interface
error (buffer overruns, CRC, etc), no simple model using automations and database fields
can be built to track these events separately.
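The granularity limit can be seen in a small Python sketch of deduplication. The deduplicate function and the sample identifier are assumptions made for illustration; the point is only that two different errors sharing one identifier collapse into one record with one State value.

```python
def deduplicate(raw_events):
    """Collapse events that share an Identifier into one record each."""
    rows = {}
    for ev in raw_events:
        # One record (and therefore one State field) per Identifier.
        row = rows.setdefault(ev["Identifier"], {"Tally": 0, "State": 0})
        row["Summary"] = ev["Summary"]  # the latest event overwrites the summary
        row["Tally"] += 1
    return rows


raw = [
    {"Identifier": "router1 Gi0/1 interface error", "Summary": "CRC errors"},
    {"Identifier": "router1 Gi0/1 interface error", "Summary": "Buffer overrun"},
]
rows = deduplicate(raw)
```

Both errors end up in the same record, so a model driven by that record's State field cannot follow them separately.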
Model Crosstalk and Failover Constraints
In other cases, processes might disrupt the model unexpectedly. For example, a
misconfigured probe or the generic clear automation could reset the database field, and
thus the model, in seemingly random cases. Similarly, if the bidirectional gateway is not set up
correctly, the model’s database fields could fail to copy across or could be reset, depending on
which automation fires when on each server.
Automation Design
As stated earlier, automations provide the transport mechanism between states within the
policy by changing the values of the database fields. Where the database fields remember
the current state of the model, the automations provide the logic, such as when and if the
event should be escalated. Details such as how often the events should be checked, what
conditions should be considered and what actions need to be performed are all specified
within the automations.
Correctly understanding how each type of trigger works is essential to the design of the
policy. In general, there are three types of automations: temporal, database, and signal.
Temporal automations can be used for transitions based on a timer as well as transitions
that can place a load on the system. Using a temporal automation guarantees the
regularity at which the automation will execute.
Database automations can be used to provide an interrupt-driven rather than polled
transition.
Signal automations have less relevance in policy building. However, when augmenting an
existing administration function, they do have their uses.
The Lights Out NOC automation design is discussed in detail in the first case study.
Solution Implementation and Testing
Once implemented, each transition in the model should be unit tested. This
can be accomplished by monitoring the data stream or by simulating the actions through
database insertions using nco_sql. If possible, every known event occurrence should be
tested to provide the most comprehensive testing. In addition, failover should be forced
for every state of the model to verify the model is correctly designed.
After the solution is rolled into production, it should still be monitored. It is
unlikely that every possible case has been caught in the development, test, and stage
environments. As a result, expecting and planning for issues is a more cost-effective,
customer-centric approach.
Documentation and Training
Summary
Through a simple scenario, this section described the four-step approach to creating
policy models within an Omnibus Object Server. However, models can be created and
used for numerous complex scenarios. The following sections describe multiple solutions
that were incorporated into several Fortune 500 businesses.
Case Study: Lights Out NOC (Omnibus 3.X)
Introduction
This case study reviews a simple escalation and paging notification policy built for a Fortune
500 company with a preexisting Netcool architecture. Their Omnibus solution remained on
version 3.6 due to preexisting integrations and lack of budget. Since several companies
remain in this earlier state, this case study presents a 3.X-based solution.
The company's architecture consisted of a Netcool virtual server pair collecting events from
five probes. Initially a simple paging procedure was developed to send pages to the
responsible engineer, but because an excessive number of pages were sent, the engineers
began to ignore them. As such, the system was ineffective. The client required a solution
that capitalized on their existing investment and would increase the effectiveness of their
system.
Background
A NOC staff monitored the network using the Netcool architecture illustrated in the
following figure. The system also sent out pages for specific events. To issue the
actual pages, Netcool called the nco_page utility, which relayed the requests to the
paging application.
The probes collect network information and translate it into events. Using a lookup file,
the engineer responsible for the event is added into the custom PageDestination field during
the probe processing. If the device creating the event is unknown or no engineer is assigned
to the event, then the field is left blank. The event is then passed onto the Object Server.
The Object Server’s automations call the nco_page script to page the engineer if the
PageDestination field was populated.
The nco_page script calls the paging application. This application uses the parameter
originally set by the probe in the PageDestination field to determine which people or groups
need to be paged. This is accomplished by having the paging application look up the value
of the PageDestination field in its configuration file. The page is then sent.
[Figure 4 diagram: a Primary Server and a Secondary Server each run a Paging Application
and an nco_page script. Primary_Object_Server and Secondary_Object_Server are joined by a
bidirectional gateway (Bi Gate) to form the Virtual Object Server. Systems 1, 2, and 3 host
the five probes (two TrapD, Ping, Syslog, and HPOV NNM5) together with their Paging.lookup files.]
Figure 4: Lights Out NOC - Enterprise Monitoring Architecture
Figure 5: Lights Out NOC - Notification Suppression GUI
The final aspect of the paging process is the paging/notification blackout periods. Through
WebTop, a blackout web page (a CGI form) was created to enable users to create blackout
periods. The CGI form created a special event in the Object Server that indicated the blackout
start and stop time. Within the Object Server, an automation suppressed any events
corresponding to the blackout event, thus preventing notifications from being sent out for the
specific events.
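The blackout check can be sketched in Python as follows. This is an illustration only: the source does not say how events were matched to a blackout event, so matching on the node name is an assumption, and the Blackout class and is_suppressed function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Blackout:
    node: str   # device covered by the blackout (assumed matching key)
    start: int  # blackout start, in epoch seconds
    stop: int   # blackout stop, in epoch seconds


def is_suppressed(node: str, occurred_at: int, blackouts: List[Blackout]) -> bool:
    """Return True when the event falls inside a blackout period for its node."""
    return any(
        b.node == node and b.start <= occurred_at <= b.stop for b in blackouts
    )


blackouts = [Blackout(node="router1", start=1000, stop=2000)]
inside = is_suppressed("router1", 1500, blackouts)
after = is_suppressed("router1", 2500, blackouts)
other_node = is_suppressed("router2", 1500, blackouts)
```

Only the event that falls inside the window for the blacked-out node is suppressed; the other two would still generate notifications.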
Although this process correctly sent pages to the engineers during the correct time periods, it
had one major flaw. So many pages were sent that engineers began ignoring their pagers.
This rendered the mechanism ineffective. More discretion was necessary.
The Solution
The best solution available was to use a policy to develop an escalation system to reduce the
number of pages. However, the client could not afford the overhead of a policy. It was
decided that a combination of database fields and automations would be used to create the
necessary policy model.
Architecting this solution was a four-step process.
• An effective paging policy had to be agreed upon.
• The model needed to be drawn out as a flow diagram and visually tested to ensure
that the policy was implemented effectively.
• The model had to be implemented using automations and database fields.
• The end result needed to be unit tested to ensure effectiveness.
Paging Policy Development
The first step to rectifying the problem was to develop an effective paging policy. After
some discussion among the managers, engineers, and NOC staff, the following policy was
agreed upon:
1. The engineer that is responsible for a device will receive the page. If the engineer
responsible for the device is unknown, no page is sent.
2. Pages do NOT go out for down events resolved within 5 minutes by a corresponding up
event.
3. Pages MUST go out for down events not resolved after 5 minutes.
4. Pages do NOT go out for up events that do NOT match a "paged" down event.
5. Pages MUST go out for up events that correspond to "paged" down events.
6. Severity MUST be set to 0 for any matching up and down events. (This enabled a
preexisting cleaning automation to delete the records after a period of time.)
Paging Solution - First Attempt with Delay Edge Triggers
Once the paging policy was decided, a delay edge trigger solution was attempted. The trigger
was configured to activate on a per row basis after a one-interval time delay. The time
interval for the trigger was 3 minutes. Any event requiring a page outstanding after 3 minutes
would cause the ascend action to generate a problem page, while the disappearance of the
event would cause the descend action to generate a resolved page for the event.
At first this seemed like a reasonable approach. However, a review of how edge
triggers are designed reveals a flaw in the logic.
Even though the delay edge trigger is configured to perform its ascend and descend actions
on each record, the activation of the trigger itself occurs on the entire group of records.
This results in 4 distinct types of behaviors depending on how the lifespan of an event falls
relative to the lifespan of the delay edge trigger.
[Figure 6 diagram: timeline of Events A, B, C, and E plotted against the trigger's
Condition Met and Condition Not Met periods, showing where the Ascend Action and the
Descend Action fire.]
Figure 6: Lights Out NOC - Delayed Edge Trigger Example
Event A
An ascend action is executed by the edge trigger for the event. This does not meet the logic’s
requirements because no descend action is executed for the event.
Event B
An ascend action and a descend action are executed. This meets the logic’s requirements.
Event C
No action is executed. This does not meet the logic’s requirements.
Event E
A descend action is executed. This does not meet the logic’s requirements because no ascend
action is executed for the event.
Events A and B are present when the ascend action is executed, so the trigger executes an
ascend action for these two events. Events C and E occur after the trigger has executed, therefore no
ascend actions are executed for them. Events A and C resolve, but since events B and E
still meet the trigger’s condition, no descend action is executed. Finally, events B and E resolve
and the descend action is executed for them.
To meet the logic’s requirements an ascend and a descend action would need to be executed
for each event. A delay edge trigger at the row level therefore will not work.
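The four behaviors can be reproduced with a short Python simulation. It is a sketch under stated assumptions, not a description of the Omnibus trigger engine: the event lifespans are invented, the condition is evaluated once per minute for the whole group of rows, the ascend action runs for the rows present when the delay expires, and the descend action runs for the rows that were present at the last sample before the condition stopped being met.

```python
# Lifespans (start, end) in minutes; the values are illustrative only.
EVENTS = {"A": (0, 6), "B": (0, 10), "C": (4, 6), "E": (4, 10)}
DELAY = 3  # minutes the group condition must hold before the ascend action fires


def simulate(events, delay):
    actions = {name: [] for name in events}
    end_time = max(end for _, end in events.values())
    condition_since = None  # when the group-level condition became true
    ascended = False
    previous = set()
    for t in range(end_time + 1):
        present = {n for n, (start, end) in events.items() if start <= t < end}
        if present:
            if condition_since is None:
                condition_since = t
            if not ascended and t - condition_since >= delay:
                for name in sorted(present):
                    actions[name].append("ascend")
                ascended = True
        else:
            if ascended:
                # Assumed: descend applies to the rows seen at the last sample.
                for name in sorted(previous):
                    actions[name].append("descend")
            condition_since, ascended = None, False
        previous = present
    return actions


result = simulate(EVENTS, DELAY)
```

Under these assumptions Event A receives only an ascend action, Event B receives both, Event C receives none, and Event E receives only a descend action, matching the four cases described above.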
Paging Solution - Second Attempt with automations and db fields
Since the use of an edge trigger would not work, it was decided to design a model based
solution. The paging policy was converted into two related models shown below - one
model for the down event and another model for the up event.
[Figure 7 diagram: Down Event Initial State -> NeoPageDown (trigger): Meet Paging Criteria
-> NeoPageDown (action): Send Problem Page.]
Figure 7: Lights Out NOC - Policy for a Down Event
[Figure 8 diagram: Up Event Initial State -> NeoCorrelateUp (trigger): Matching Down Event?
If yes: Down Page Sent? If yes, NeoCorrelateUp (action): Mark Down Event for Page and Clear
Up Event; if no, Clear Down Event. NeoFireUpPage (trigger): Event Marked for Page? If yes,
NeoFireUpPage (action): Send Page and Clear Down Event.]
Figure 8: Lights Out NOC - Policy for an Up Event
When Netcool receives a down event, it begins the Down Event model. This model waits in
the Down Event Initial State until either a corresponding up event occurs or
the down event ages more than 5 minutes. If a corresponding up event does occur within 5
minutes, the NeoCorrelateUp automation clears the down and up events. If no
corresponding up event is detected within 5 minutes, the NeoPageDown automation
executes, sending a page and updating the event to indicate that a page has been sent.
If the corresponding up event occurs 5 minutes or more after the down event the Up Event
policy begins. The NeoCorrelateUp automation determines if there is a down event
corresponding to the up event and if so determines if a page has been sent out for the down
event. If a (problem) page has been sent out, the NeoCorrelateUp automation updates the
down event indicating that another (resolution) page needs to be sent. The NeoFireUpPage
automation finds the updated down event requiring a resolution page to be sent and sends
out the page. If no (problem) page has been sent out for the corresponding down event, the
NeoCorrelateUp automation clears the up and down events. If no down event is found the
automation clears the up event.
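The two models can be walked through with a small Python sketch. It follows the flow described in the prose above rather than the exact trigger SQL, and every name in it (DownEvent, page_down, correlate_up, fire_up_page, the pages list) is hypothetical. Paging is reduced to appending a string to a list.

```python
from dataclasses import dataclass
from typing import List

PAGE_DELAY = 300  # seconds a down event must stay unresolved before a page is sent


@dataclass
class DownEvent:
    key: str
    received: int
    destination: str
    severity: int = 5
    page: int = 0        # 1 once a problem page has gone out
    cleared_by: int = 0  # 1 once a matching up event asks for a resolution page


pages: List[str] = []


def page_down(down: DownEvent, now: int) -> None:
    """Down Event model: page once the event is older than PAGE_DELAY."""
    unresolved = down.severity > 0 and down.cleared_by == 0 and down.page == 0
    if unresolved and down.destination and now - down.received > PAGE_DELAY:
        down.page = 1
        pages.append("problem:" + down.key)


def correlate_up(key: str, downs: List[DownEvent]) -> None:
    """Up Event model: mark paged down events, clear unpaged ones."""
    for down in downs:
        if down.key == key and down.severity > 0 and down.cleared_by == 0:
            if down.page == 1:
                down.cleared_by = 1  # a resolution page is still owed
            else:
                down.severity = 0    # resolved before any page went out


def fire_up_page(down: DownEvent) -> None:
    """Send the resolution page for a marked down event, then clear it."""
    if down.severity > 0 and down.cleared_by == 1 and down.page == 1:
        pages.append("resolved:" + down.key)
        down.severity = 0


quick = DownEvent("switch1", received=0, destination="net-oncall")
slow = DownEvent("switch2", received=0, destination="net-oncall")
downs = [quick, slow]

correlate_up("switch1", downs)   # up event after 2 minutes: no page at all
for d in downs:
    page_down(d, now=301)        # only switch2 is still down after 5 minutes
correlate_up("switch2", downs)   # late up event marks the paged down event
for d in downs:
    fire_up_page(d)
```

The quickly resolved event never produces a page, while the long-running one produces exactly one problem page and one resolution page.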
Translation
The implementation required the addition of database fields and the creation of three
automations, each with a trigger.
Four fields were added to the Netcool database. They are as follows.
Field             Description
PageDestination   The Probe populates this field. It specifies a Telamon paging
                  group representing an engineer who is responsible for the device.
USummary          The Deleted event's Summary field is copied into the Up event's
                  USummary field.
ClearedBy         Either 0 or 1. Used to determine if an Up event has been
                  correlated to a down event.
Page              This field is set to 1 if a page is to be sent. It simplifies the policy
                  by offloading the responsibility to determine if an Up Event
                  requires a page to the nco_page script.
Table 9: Lights Out NOC - Field Additions
Automations
In addition to the field additions the following triggers and automations were used to
create the policies.
Trigger Name    NeoDownPage
Sample Rate     57
Type            Level
Threshold       0
Row Count       Greater than 0
Execution       For each matched row
Condition       select * from alerts.status where Page = 0 and Type = 1 and ClearedBy = 0
                and Severity > 0 and PageDestination <> '' and ServerName = 'NCOMS'
                and (StateChange < getdate - 300);
Ascend Action   NeoDownPage
SQL Action      update alerts.status via '@Identifier' set Page = 1;
Executable      /opt/Omnibus/utils/nco_page**
Host
Parameters      -d @PageDestination -n @Node -u @Mastername -m @Summary
Run As
** This PERL script was augmented such that if the Page field is set to 0, no page will go out.
If the Page field is set to any other value, a page will go out.
Figure 10: Lights Out NOC - Trigger NeoDownPage
Trigger Name    NeoCorrelateUp
Sample Rate     58
Type            Level
Threshold       0
Row Count       Greater than 0
Execution       For each matched row
Condition       select * from alerts.status where Type = 2 and ClearedBy = 0 and Severity > 0
                and ServerName = 'NCOMS';
Ascend Action   NeoCorrelateUp
SQL Action      update alerts.status set USummary = @Summary, ClearedBy = 1 where Type = 1
                and ClearedBy = 0 and Manager = '@Manager' and AlertGroup = '@AlertGroup'
                and AlertKey = '@AlertKey' and Node = '@Node' and ServerName = 'NCOMS'
                and Severity > 0;
                update alerts.status via '@Identifier' set Severity=0;
Executable
Parameters
Host
Run As
Figure 11: Lights Out NOC - Trigger NeoCorrelateUp
Trigger Name    NeoFireUpPage
Sample Rate     57
Type            Level
Threshold       0
Row Count       Greater than 0
Execution       For each matched row
Condition       select * from alerts.status where Type=1 and ClearedBy>0 and Page = 1
                and Severity >0 and ServerName = 'NCOMS';
Ascend Action   NeoFireUpPage
SQL Action      update alerts.status via '@Identifier' set Severity=0;
Executable      /opt/Omnibus/utils/nco_page
Parameters      -d @PageDest -n @Node -u @Mastername -q @Page -m @USummary
Host            localhost
Run As          0
Figure 12: Lights Out NOC - Trigger NeoFireUpPage
The behavioral model highlights three points.
1. All up events can immediately be set to severity 0 (clear) regardless of whether a matching down
event is discovered.
2. Down events that do not have their PageDestination field populated do not issue a page.
3. Repetitive (flapping) events that clear in less than 5 minutes are ignored by
the model.
Case Study: XinY (Omnibus 7.X)
Introduction
Often when detecting network, system, and other problems, the individual events cannot be
taken out of context of a larger picture. For example, sometimes the number of occurrences
of a particular event within a window of time determines if there is an issue or not. This type
of detection is called XinY functionality because X number of instances within a window of
Y seconds flags a problem.
This section describes a case study where a Fortune 500 company requested that the XinY
functionality be built into the existing Netcool fault management solution such that upon
detection of an XinY event either:
• The Severity is incremented, unless it is already set at Critical (5), in which case the
AlarmPriority is incremented.
• The Summary is prefixed with the text: "XinY Policy X events in Y seconds Met:"
where X and Y are populated with the correct number of instances and seconds.
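The two effects can be expressed as a short Python sketch. The function name apply_xiny and the effect labels "escalate" and "prefix" are hypothetical; only the Severity, AlarmPriority, and Summary fields and the prefix text come from the requirement above.

```python
CRITICAL = 5


def apply_xiny(event: dict, effect: str, x: int, y: int) -> dict:
    """Return a copy of the event with the requested XinY effect applied."""
    updated = dict(event)
    if effect == "escalate":
        if updated["Severity"] < CRITICAL:
            updated["Severity"] += 1
        else:
            # Already Critical: raise AlarmPriority instead.
            updated["AlarmPriority"] += 1
    elif effect == "prefix":
        prefix = f"XinY Policy {x} events in {y} seconds Met:"
        updated["Summary"] = f"{prefix} {updated['Summary']}"
    return updated


major = {"Severity": 4, "AlarmPriority": 1, "Summary": "Link down"}
critical = {"Severity": 5, "AlarmPriority": 1, "Summary": "Link down"}

raised = apply_xiny(major, "escalate", 10, 60)
prioritized = apply_xiny(critical, "escalate", 10, 60)
prefixed = apply_xiny(major, "prefix", 10, 60)
```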
In addition the XinY solution needed to provide the following abilities:
• Allow for individually specified X and Y values for different types of events.
• Unset the XinY condition after a period of time. (The time period is globally set.)
• Track XinY state even if an event is deleted and reinserted into the event list.
• Ignore GenericClear associated with a problem event.
There were many possible XinY implementations that met these criteria. This case study
explores an implementation process that:
• Reviewed the requirements of the customer and the demands of the preexisting
Netcool architecture.
• Reviewed and ranked six possible XinY solutions, recommending one solution for
implementation.
• Implemented one of the XinY solutions with a detailed design.
• Fully tested the implementation.
At the end of this case study the reader should have enough information to evaluate their
own customer needs and environmental demands to determine the correct XinY approach
for their customer environment.
Requirements
The requirements for a XinY solution for the customer depended on:
• Provided solution fragment
• Environment features
These requirements in turn pointed to four possible strategies to address these aspects.
Provided Solution Fragment
Very few projects are set up perfectly. Usually the customer defines not only what they need,
but also aspects of the solution. Sometimes consulting with the customer can free up the
solution constraints. However, this was not the case with this customer.
In particular, as part of the initial requirements, a fragment of the XinY solution was
provided. The way the rules files populate the events was specified through
four additional fields within the event:
1. XinY*        Current X count against XinY state.
2. XinYXValue   Event count threshold for event.
3. XinYYValue   Event time threshold for event.
4. XinYEffect   The action taken when the XinY threshold is breached.
* Since the event will likely be deleted and possibly reinserted during the course of XinY
calculations, this field could not be used to track XinY X count unless the field was back
populated by an outside source. However, the field was leveraged as a semaphore to prevent
corrupted updates to alerts.status due to simultaneous updates by the primary and secondary
object servers.
Environment Features
In addition to the solution fragment, five key environment features dominated the potential
architecture for the XinY logic and helped determine the final design:
1. The customer had a two tier Netcool architecture: collection and aggregation/display.
2. "Blocked" events are pooled for five minutes inside collection and are not visible in
aggregation. (NOTE: This "Hold Down" logic was created in a previous project. As
part of the analysis for the XinY project, it was noted that marking rather than
restraining the events in collection would have been a much less complex and more
maintainable approach. However, the scope of the XinY project prevented revisiting
and correcting this "hold down" logic.)
3. Non-blocked events are deleted every minute after forwarding to aggregation. Thus
XinY "state" information cannot be stored within the events.
4. Without an added stream of data between collection and aggregation, the sliding
window information is lost when events are pushed from collection to aggregation
since that information cannot be contained efficiently within the events (alerts.status.)
5. Like XinY, the Generic Clear automation updates the Severity field. Thus, the
conflict between these two solutions needed to be addressed.
Four possible strategies were devised to counter these issues:
Two were efficient:
1. Apply XinY logic BEFORE the collection layer
2. Apply XinY logic completely within the collection layer
One was inefficient and complex:
3. Shuttle sliding window information from the collection layer to aggregation.
One was inaccurate:
4. Use fixed windows instead of sliding windows between layers. (Thus all necessary
XinY information is contained in the event's alerts.status fields.)
Based on these strategies in response to the requirements, six possible approaches were
discussed. These are described in the next section.
Recommendations
Introduction
There are many possibilities for implementing XinY logic. A virtual meeting (conference
call) was held to brainstorm options among the engineers, the customer, and other
stakeholders. As with most solutions, the biggest hurdles were political, not technical. In particular,
the natural bias of a good engineer came into play.
The customer had many good and great engineers. Good engineers are more created than
born. They are forged from life experiences working among many companies coupled with a
strong personality and mindset. A lot of bad comes with the good. Inventing options within
a group is not a strong point of the good engineer. The pressures of business, an artistic
personality, and the politics of past relationships often inhibit the openness required
to build a full set of options within a team. As a result, most participants came with their
minds already made up with regard to the correct solution.
Having an open mind is not easy for engineers, especially good ones, who naturally over time
tend to drink more deeply of their own Kool-Aid. Though such an approach serves the
individual well, it often hinders the outcome of a team effort.
Much of the problem comes from the fact that engineers are creators. Like all artists, they
tend to build from the inside out, turning natural creativity and inspiration into
working solutions. Unfortunately, like artists, their egos become inseparable from their
solutions. Any criticism, justified or not, is perceived as a personal attack. Unlike salesmen,
who create by adapting to the customer and the outside environment, the engineer when
pressured in a team will tend to become more rigid in their beliefs and focus on their
position. The result is a battle over position rather than a brainstorm to generate possible
solutions. To combat these issues, five ground rules were given for the discussion in advance:
• Only one stakeholder to represent each area, voted in by the respective group.
• Strict Agenda
o Brainstorm ideas and defer all judgment until later
o Select potential ideas to develop
o Brainstorm development of the selected solutions
o Solution selection
• Separate the person from the position/problem. Be hard on the mutually shared
issues, not people.
• Only one person could be irrational/emotional at a time. (i.e. no emotional reaction to
someone's emotions.)
• Focus on interests, not positions.
As a result, the mindset of the group was opened and peer pressure and moderation by the
consultant prevented escalation.
The next issue became the single-mindedness of the group. As plausible solutions emerged
from the group, the tendency was to argue from these positions rather than brainstorm more
approaches. This naturally leads to all-or-nothing thinking, where only compromising
one approach can lead to the success of another. This rarely reflects reality. So to keep out of this
rut, focus was constantly returned to a discussion of issues rather than positions.
For example, at one point the brainstorming broke down into a struggle between using
Impact versus using Omnibus automations. People engaged in narrowing the gap between
these two positions rather than broadening the discussion to other options. Further, the
thought was that only compromising one approach would lead to success in the other. The focus
became these two positions rather than inventing alternatives to the core issue that everyone
shared.
The moderator was able to expand the discussion by throwing in a third option of using
PERL with the DBI and DBD::Sybase modules instead of either Impact or Omnibus. In the end,
the PERL option was not even among the developed approaches. However, introducing the
alternative at this stage broke apart the Omnibus and Impact camps. As a result, two probe
options arose as possibilities.
The result of this session was six developed approaches to the XinY solution. The six
approaches took into account the four strategies listed above and are listed in the table
below.
#  Method                                XinY Set   Max        Skill     System     HA/DR  Event    Rank
                                         Accuracy   Xvalue     Req.      Load              Reliant
1  Probe Universal Include File 1        Low        N/A        Lowest    Low        Low    *No      #4
   (Strategies 1 and 4)
2  Probe Universal Include File 2        Very High  <20+       Low       Low        Low    *No      #3
   (Strategy 1)
3  Collection Automations                High       Unlimited  Med       Med        Med    +Yes     No
   (Strategy 2)
4  Collection Automations with           High       Unlimited  Med       Med        Med    *+No     #1
   XinY Lifespan tracking (Strategy 2)
5  Collection Impact                     High       Unlimited  Med-High  High       Med    *+No     #2
   (Strategy 2)
6  Aggregation Impact + Automations      Med        Unlimited  Highest   Very High  Med    +Yes     No
   (Strategy 3 and maybe 4)
* Special logic needed to handle Generic clear since it too updates Severity
+ Applied at collection layer
Figure 13: XinY - Six possible Approaches
The table's fields are described as follows:
• XinY Set Accuracy: How quickly the XinY condition is detected and set.
• Max Xvalue: If a sliding window is used, the maximum X that can be used.
• Skill Req.: The staff skill level required to maintain the solution.
• System Load: The memory and CPU load of the solution.
• HA/DR: How accurate and resilient high availability and disaster recovery are
with the solution.
• Event Reliant: Whether the XinY state depends on the event not being deleted from
alerts.status.
• Rank: The relative recommendation.
Method 1: Probe Universal Include File 1
This was by far the simplest and most maintainable approach. Basically an include rules file
used arrays indexed by the Identifier to track the threshold conditions. The following is the
proof of concept implemented for this method:
## NOTE: The arrays X, Y, CurrX and IsXinY must be declared with "array"
## statements at the top of the main rules file.
## IF XinY IS SET, NOTHING TO DO. (NOTE: THERE IS NO XinY UNSET IN THIS LOGIC.)
@XinYXValue = 10    ## XinY watermark: event count threshold (X)
@XinYYValue = 10    ## XinY watermark: time window in seconds (Y)
if (!(match(IsXinY[@Identifier], "1"))) {
    ## INITIALIZE ARRAYS (NOTE: Taking the MAX in the dedup automation can
    ## prevent a restart from clearing the XinY state.)
    if (match(X[@Identifier], "")) {
        $tempdate = getdate
        Y[@Identifier] = timetodate($tempdate, "%D %T")
        X[@Identifier] = "1"
        CurrX[@Identifier] = "1"
        @XinY = 0                     ## OBJECT SERVER MEMORY
        IsXinY[@Identifier] = "0"     ## PROBE MEMORY
        $State = "XinY Initialize."
    ## UPDATE X TALLY AND CHECK WHETHER XinY IS ASSERTED
    ## (NOTE: NO XinY RESET IN THIS LOGIC.)
    } else {
        CurrX[@Identifier] = int(CurrX[@Identifier]) + 1
        $dx = int(CurrX[@Identifier]) - int(X[@Identifier])
        $dy = getdate - datetotime(Y[@Identifier], "%D %T")
        ## INSIDE TIME WINDOW
        if (int($dy) <= @XinYYValue) {
            ## AND THRESHOLD CROSSED
            if (int($dx) >= @XinYXValue) {
                $State = "XinY Tripped."
                IsXinY[@Identifier] = "1"   ## So Probe as well as Object Server knows
                @XinY = 1
            ## AND THRESHOLD NOT CROSSED
            } else {
                $State = "XinY: Not tripped."
            }
        ## TIME WINDOW EXPIRED
        } else {
            Y[@Identifier] = timetodate(getdate, "%D %T")
            ## New baseline, so that $dx counts only events seen since this reset
            X[@Identifier] = CurrX[@Identifier]
            $State = "XinY: Window expired, reset"
        }
    }
    # DEBUGGING
    $Y = Y[@Identifier]
    $X = X[@Identifier]
    $CurrX = CurrX[@Identifier]
    details($Y,$X,$CurrX,$dx,$dy,$State)
    @XinYDebug = $State
    update(@XinYDebug)
}
Probe rules' arrays persist from event to event, which made this method a viable
approach. However, the solution was inaccurate since the logic did not use a sliding window.
Further, there were HA/DR issues since all XinY state information was lost if the probe
died (though this information was preserved through IHUPs). This solution preserved the
XinY state when the event was deleted from the object server since the information was
stored within the probe. Further, this form of the solution was in production at more than two
Fortune 500 companies. Finally, clearing the XinY condition was implemented through a
simple automation. The maximum delay of clearing the XinY condition was the poll period
of the automation. The HA/DR, probe system limitations, and lack of a sliding window
made this a less desirable solution for the customer in this case study.
Method 2: Probe Universal Include File 2
This solution mirrors Method 1 with the addition of a sliding window. The sliding window
bumped up the event accuracy to the highest value. Only the delay in clearing the XinY
condition via the automation compromised its accuracy. Still this was the highest event
accuracy that could be expected from any of the solutions. The HA/DR and probe system
limitations made this a less desirable solution for the customer in this case study.
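The difference between the fixed window of Method 1 and the sliding window of Method 2 can be illustrated with a small Python sketch. This is not the probe rules implementation; the class SlidingXinY, the function fixed_window_trips, and the sample timestamps are assumptions chosen to show the behavior.

```python
from collections import deque


class SlidingXinY:
    """Trip when at least x events fall within the last y seconds."""

    def __init__(self, x: int, y: int):
        self.x, self.y = x, y
        self.times = deque()

    def observe(self, t: float) -> bool:
        self.times.append(t)
        # Drop occurrences that have slid out of the window.
        while self.times and t - self.times[0] > self.y:
            self.times.popleft()
        return len(self.times) >= self.x


def fixed_window_trips(times, x, y):
    """Fixed-window variant: the count restarts whenever the window expires."""
    start, count = None, 0
    for t in times:
        if start is None or t - start > y:
            start, count = t, 0
        count += 1
        if count >= x:
            return True
    return False


times = (0, 8, 12, 15)  # three events (8, 12, 15) fall within 10 seconds
detector = SlidingXinY(x=3, y=10)
sliding_results = [detector.observe(t) for t in times]
fixed_result = fixed_window_trips(times, x=3, y=10)
```

With these timestamps the sliding window trips on the last event, while the fixed window misses the burst because it straddles a window boundary.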
Method 3: Automations in Collection (Event Dependent)
This solution performed all its work in the collection tier, versus the aggregation tier. The
automation that shuttled events from collection to aggregation was changed to enable XinY
events to persist in collection until their XinY status cleared. This greatly simplified
determining the sliding window and removed the need, when going to
aggregation, to either discard the sliding window or shuttle all the associated
information back and forth.
This solution had four drawbacks. First, the solution was more complex than the probe
solution - the sliding window information must be shuttled between the two object servers.
Second, the deletion of aged events (i.e. old X) was performed by a temporal trigger, so delays led to
inaccuracy. That is, in some cases an XinY event was asserted when the real cause was
that the temporal trigger had not fired yet. By subtracting the temporal trigger period from the Y
value, the nature of the error could be controlled: a false XinY assertion, a missed
XinY assertion, or somewhere in between. Third, the load of the automation
solution was placed on the object server rather than the probe. Fourth, the combination of
Probe HA and Object Server HA could potentially cause an outage. Specifically, if the
primary object server went down hard or for an extended period of time, there would be an
outage for all events up to the duration of the bidirectional gateway’s poll cycle.
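The effect of the temporal trigger period on accuracy can be worked through with a short Python sketch. It assumes a simplified model in which the trigger purges occurrences older than the configured Y once per period, so an occurrence may linger for up to one extra period before removal. The function names and the example values (a required Y of 60 seconds and a 20 second trigger period) are illustrative only.

```python
def counted_age_range(configured_y: int, trigger_period: int) -> tuple:
    """Oldest occurrence age that may still be counted: just after a purge
    it is configured_y, just before the next purge it is one period more."""
    return configured_y, configured_y + trigger_period


def error_type(required_y: int, configured_y: int, trigger_period: int) -> str:
    shortest, longest = counted_age_range(configured_y, trigger_period)
    if shortest >= required_y:
        return "false assertions only"   # window is never shorter than required
    if longest <= required_y:
        return "missed assertions only"  # window is never longer than required
    return "both possible"


no_adjustment = error_type(required_y=60, configured_y=60, trigger_period=20)
full_adjustment = error_type(required_y=60, configured_y=40, trigger_period=20)
half_adjustment = error_type(required_y=60, configured_y=50, trigger_period=20)
```

Under this model, leaving Y unadjusted risks only false assertions, subtracting the full period risks only missed assertions, and a partial subtraction lands in between.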
Method 4: Automations in Collection (Event Independent)
This method performed all its work in the collection tier. Unlike Method 3, this approach
tracked the XinY state in a persistent table separate from the alerts.status table. Though the
extra table increased the complexity over Method 3, it gave the XinY logic independence
from the event shuttling logic between the tiers. The customer implemented this approach
due to its logic independence and automation-centric approach.
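The benefit of a separate state table can be demonstrated with a minimal Python sketch that uses an in-memory SQLite database as a stand-in for the Object Server. The table names status and xiny_state, their columns, and the helper functions are all hypothetical; they only show that occurrence history survives the deletion of the event itself.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Stand-in for alerts.status: one row per live event.
db.execute("CREATE TABLE status (identifier TEXT PRIMARY KEY, summary TEXT)")
# Separate persistent table holding one row per occurrence.
db.execute("CREATE TABLE xiny_state (identifier TEXT, occurred INTEGER)")


def record(identifier: str, summary: str, t: int) -> None:
    """Insert or refresh the event and log the occurrence separately."""
    db.execute("INSERT OR REPLACE INTO status VALUES (?, ?)", (identifier, summary))
    db.execute("INSERT INTO xiny_state VALUES (?, ?)", (identifier, t))


def forward_and_delete(identifier: str) -> None:
    """Mimic the shuttle to aggregation: the event leaves the collection tier."""
    db.execute("DELETE FROM status WHERE identifier = ?", (identifier,))


def count_in_window(identifier: str, now: int, y: int) -> int:
    row = db.execute(
        "SELECT COUNT(*) FROM xiny_state WHERE identifier = ? AND occurred >= ?",
        (identifier, now - y),
    ).fetchone()
    return row[0]


record("node1:linkDown", "Link down", 0)
forward_and_delete("node1:linkDown")
record("node1:linkDown", "Link down", 5)
forward_and_delete("node1:linkDown")

live_events = db.execute("SELECT COUNT(*) FROM status").fetchone()[0]
occurrences = count_in_window("node1:linkDown", now=6, y=10)
```

Even though no event row remains, both occurrences are still counted inside the window.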
Method 5: Collection Impact
Like method 3, this method performed all its work in collection. To do so, Impact was
moved from the aggregation tier to the collection tier. This allowed the normal XinY
algorithm to have full visibility of the blocked events, which remain stuck in collection for
five minutes. However, Impact’s difficulties with failover and proper function under high
loads made it less HA/DR safe. Also, this method inherited the HA/DR issue caused by the
incompatibility between the probe HA and the Object Server HA. Finally, the added
complexity of Impact made the solution less maintainable. Despite these drawbacks, the
customer bias, system resource availability, and flexibility of the solution made this the
second most desirable approach.
Method 6: Aggregation Impact
This method had all the complexity of method 3 and method 4 combined. The solution
needed to operate independently within collection and aggregation. Further, any sliding
window information in collection needed to be either discarded or shuttled from collection
to aggregation. In short, this method accounted for the many nuances arising from splitting
the solution across the two tiers. This was the least desirable solution out of the six.
Summary
Six potential solutions were developed with the engineers, customers, and other
stakeholders. As a result, the biases and politics of the group were accounted for and
expectations were set. The group settled on method 4 due to its flexibility, event independence,
HA/DR ability, and its decent accuracy, maintenance, and load requirements. Though the
probe alternative appeared a better fit, the probe's system resources and political mindset
prevented this from being a viable option.
XinY Solution Design
Introduction
Once the automation-based XinY solution was selected, the detailed design began. The
XinY solution was designed to:
• Keep the XinY state independent of event existence.
• Co-exist with GenericClear.
• Provide the ability to unset as well as set XinY.
Three tables were required instead of the standard two. In addition to alerts.status
tracking the event state and alerts.XinYWindow tracking the sliding window,
alerts.XinY was required to preserve the XinY state even when
events were deleted from collection after being forwarded to aggregation.
Along with the three tables, five new automations and modifications to two existing
automations were required. The figure below describes the relationship between the tables
and the automations. These are discussed in detail in the subsections below.
[Figure: flow diagram linking the three tables, Alerts.status (event state), Alerts.XinY (XinY
state), and Alerts.XinYWindow (XinY sliding window state), to the automations new_row,
deduplication, XinY_new_row, XinY_deduplication, XinY_Expire, XinYWindow_Expire, and
XinY_CleanUp through the numbered actions described below.]
Figure 14: XinY - Automation Flow Diagram
New_row and Deduplication:
The modifications to these two existing automations update the XinY state and keep the
alerts.status table in sync with the alerts.XinY table. The numbers below refer to the numbers
in the figure above and specify the actions taken:
1. Initialize alerts.XinY with the XinY state and record the "before XinY state" values of
the Summary, Severity, and AlarmPriority fields.
2. If alerts.XinY is already tracking a XinY state associated with the event, then:
   a. Update the XinY state in alerts.XinY.
   b. If XinY is set, populate alerts.status from alerts.XinY with the current Severity
   (unless the event is cleared), AlarmPriority, or Summary according to the XinYEffect
   value. (This is needed because event deletion or deduplication erases the current XinY
   state in alerts.status.)
   c. Update alerts.status XinY to 2 to indicate XinY is in progress. (This field is also
   used to keep the primary and secondary Object Servers from stepping on each other as
   they independently update the events.)
XinY_new_row and XinY_deduplication
These automations update the sliding window and assert and reassert the XinY condition. The
numbers below refer to the numbers in the figure above and specify the actions taken:
3. Keep the XinY state current for the affected Summary, Severity, and AlarmPriority fields
and update the sliding window state.
4. If the XinY state is triggered or retriggered:
   a. If this is the first time XinY is tripped:
      1. Update the affected alerts.status fields: Summary, Severity, and
      AlarmPriority.
      2. Update XinYEffect to enable the correct back out procedures.
   b. Reset LastY to the value of LastOccurrence. This resets the fixed window for
   unsetting the XinY condition.
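The interplay between the sliding window and the fixed unset window in steps 3 and 4 can be sketched in a few lines of Python. This is an in-memory illustration, not the ObjectServer automations themselves. The class and method names are hypothetical, and the unset rule (clear XinY once Y seconds pass after LastY) is an assumption about how the fixed window is evaluated.

```python
from collections import deque


class XinYState:
    """In-memory stand-in for one alerts.XinY row plus its alerts.XinYWindow rows."""

    def __init__(self, x_value, y_value):
        self.x_value = x_value      # instances required (XinYXValue)
        self.y_value = y_value      # window length in seconds (XinYYValue)
        self.window = deque()       # occurrence times (alerts.XinYWindow)
        self.tripped = False
        self.last_y = 0             # LastY: start of the fixed unset window

    def occurrence(self, when):
        """Steps 3 and 4: record an instance and (re)assert XinY if X is reached."""
        self.window.append(when)
        self._age_out(when)
        if len(self.window) >= self.x_value:
            self.tripped = True
            self.last_y = when      # step 4b: restart the unset window
        return self.tripped

    def expire(self, now):
        """Assumed unset rule: clear XinY once Y seconds pass without a retrigger."""
        self._age_out(now)
        if self.tripped and now - self.last_y >= self.y_value:
            self.tripped = False
        return self.tripped

    def _age_out(self, now):
        # Drop instances that have aged outside the Y-second sliding window.
        while self.window and now - self.window[0] >= self.y_value:
            self.window.popleft()


state = XinYState(x_value=3, y_value=300)
print([state.occurrence(t) for t in (0, 100, 200)])  # [False, False, True]
print(state.expire(400))                             # True (unset window still open)
print(state.expire(500))                             # False
```

Three instances within 300 seconds trip the condition, and it clears 300 seconds after the last retrigger.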
XinY_Expire
This automation times out the XinY state and sliding window events. In addition, the
automation detects and unsets the XinY state when it is time. The numbers below refer to
the numbers in the chart above and specify the actions taken:
5. Delete from alerts.XinYWindow the sliding window events that have aged outside the 'Y'
parameter. If no window events remain, delete the XinY state from alerts.XinY table.
6. Detect if it is time and then un-assert a XinY event by:
   a. Reverting to the previous Summary, Severity (unless currently cleared), and
   AlarmPriority alerts.status field values.
   b. Clearing the XinY state information for the event from the alerts.XinY and
   alerts.XinYWindow tables.
XinYWindow_Expire
This automation provides a shortcut to counting the sliding window events since this
operation is expensive in SQL. Instead, this automation decrements the alerts.XinY X field
every time a record is deleted from the sliding window table. The numbers below refer to the
numbers in the chart above and specify the actions taken:
8. The alerts.XinY X field is decremented every time a sliding window record expires.
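The counter shortcut can be illustrated with a small sketch that keeps a running count beside the window rows instead of recounting them. The dictionaries and function names below are hypothetical stand-ins for alerts.XinY and alerts.XinYWindow, and the identifier is invented for the example.

```python
xiny_x = {}        # Identifier -> running count (stand-in for alerts.XinY.X)
xiny_window = []   # (Identifier, Occurred) rows (stand-in for alerts.XinYWindow)


def add_instance(identifier, occurred):
    """Add a window row and bump the running count."""
    xiny_window.append((identifier, occurred))
    xiny_x[identifier] = xiny_x.get(identifier, 0) + 1


def expire_window(now, y_seconds):
    """Delete aged rows and decrement the counter once per deleted row (step 8)."""
    kept = []
    for identifier, occurred in xiny_window:
        if now - occurred >= y_seconds:
            xiny_x[identifier] -= 1
        else:
            kept.append((identifier, occurred))
    xiny_window[:] = kept


for t in (0, 50, 250):
    add_instance("linkDown:router1", t)
expire_window(now=320, y_seconds=300)

# The running counter matches a full recount without scanning the window table.
recount = sum(1 for ident, _ in xiny_window if ident == "linkDown:router1")
print(xiny_x["linkDown:router1"], recount)  # 2 2
```

The design choice is the usual one for maintained aggregates: a small amount of work on every delete replaces a potentially expensive count on every evaluation.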
XinY_CleanUp
This automation is purely precautionary. If things are not working well, this automation
ensures that the alerts.XinY and alerts.XinYWindow tables do not grow without bounds.
The numbers below refer to the numbers in the chart above and specify the actions taken:
9. Any records older than a day are trimmed from the alerts.XinYWindow table. Any events
older than 4 weeks are trimmed from the alerts.XinY table.
Test Scenarios
Introduction
The solution was put through a battery of tests to verify expected behavior. Each of the
scenarios is described in the subsections below.
Scenario 1: XinY within Event Lifespan
[Figure: timeline showing the XinY lifespan relative to the event lifespan.]
Figure 15: XinY - Test Scenario 1
This is the simplest scenario. The counting, setting, and unsetting of the XinY event occur
within the lifespan of an event. In reality this will be rare except for blocked events, since
non-blocked events are deleted from collection about every minute.
Scenario 2: XinY Lifespan Spanning Two Related Events
[Figure: timeline showing one XinY lifespan relative to two successive event lifespans.]
Figure 16: XinY - Test Scenario 2
This is the second most common scenario. The counting, setting, and unsetting of the XinY
event occur over the lifespan of multiple inserts and deletes of an event (using the same
Identifier). This means that when the event returns while XinY is set, the affected fields
must be restored correctly in alerts.status: XinY and one of Summary, Severity, or
AlarmPriority. In addition, the XinY window expires when no event is in alerts.status.
(Thus the XinY state needs to be cleared independently of the event state.)
Scenario 3: XinY Lifespan not quite Spanning Two Related Events
[Figure: timeline showing one XinY lifespan relative to two successive event lifespans.]
Figure 17: XinY - Test Scenario 3
This is the most common scenario. The counting, setting, and unsetting of the XinY
event occur over the lifespan of multiple inserts and deletes of an event (using the same
Identifier.) The XinY state expires while an event is still in the alerts.status table. Thus, both
the XinY and event state need to be cleared.
Scenario 4: Clean HA/DR Failover (between XinY events)
[Figure: timeline showing the XinY lifespan, the event lifespan, and the HA/DR event.]
Figure 18: XinY - Test Scenario 4
This is a special case where a failover occurs outside a XinY event.
Scenario 5: Dirty HA/DR Failover (during an XinY event)
[Figure: timeline showing the XinY lifespan, the event lifespan, and the HA/DR event.]
Figure 19: XinY - Test Scenario 5
This is a special case where a failover occurs within a XinY event.
Scenario 6: An XinY Failure to Thrive
[Figure: timeline showing the XinY lifespan relative to the event lifespan.]
Figure 20: XinY - Test Scenario 6
This is a case where the XinY assertion never happens because not enough instances occur
within the window.
Scenario 7: An interrupted XinY Lifespan (by GenericClear)
[Figure: timeline showing the XinY lifespan relative to the event lifespan.]
Figure 21: XinY - Test Scenario 7
This is a case where a Generic Clear causes the problem event to clear out. In this case the
XinY state should persist and ignore the Generic Clear but allow the Generic Clear to delete
the event as it normally would. If a new instance of the event occurs, the XinY state will be
used to back fill the correct Summary, Severity, or AlarmPriority according to the XinYEffect
set for that event.
Summary
After two relatively short weeks, the XinY solution was created and rolled into production.
Case Study: Event Audit
(Omnibus 7.X)
Introduction
One of the largest weaknesses of the Netcool suite is the inability to determine what
automation, impact policy, probe rule or other entity updated an event and when. The Event
Audit Automations combined with the Audit Report Script were designed to solve this
problem. This case study describes the steps to implement an event audit solution through
automation changes and additions along with additional alerts tables and alerts.status fields.
Automation Audit Architecture
The Event Audit Solution is a fully extensible and scalable way of tracking which
automations touch which events and when. It can be further extended to track probe and
other program updates to the events by applying the changes made in the state_change
automation to the new_row and deduplication automations as well.
If a new automation or program is added to the Netcool solution, it too can be audited. The
new automation or program simply must update the event's Audit field with its name. The
state_change automation will grab this information and create an audit trail of the change. If
there is more than one insert and/or update statement in the new automation or program,
appending an index to the program name in the Audit field provides further granularity.
The Event Audit Solution consists of four parts:
1. Audit field in alerts.status
2. State_change automation (plus new_row and deduplication if probe tracking is desired
as well)
3. Event inserting/updating automations and/or tools
4. clean_alarmaudit_table automation
The audit process occurs in a four step life cycle:
1. The event inserting and/or updating automation populates the Audit field in
alerts.status with its name (and an index if there is more than one insert and/or update
statement).
2. Before the insertion occurs, the state_change automation takes the Audit field and
populates the alerts.alarmaudit table through a simple SQL statement:
insert into alerts.alarmaudit VALUES (0,getdate(),new.Audit,new.Identifier);
3. The audit trail remains until the event in alerts.status is deleted from the system.
4. Within a minute after the event is deleted from the alerts.status table, the
clean_alarmaudit_table automation runs and clears out the audit trail.
The result of this solution is a fully extensible audit capability that tracks which automation
or program touched which events and when.
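The four step life cycle can be tried out away from a live ObjectServer. The sketch below uses Python's sqlite3 module as a stand-in, so the SQL is SQLite syntax rather than ObjectServer SQL. The table named status, the AFTER UPDATE trigger, and the sample event are assumptions made for the illustration. Only the column layout of the audit table follows the document.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE status (Identifier TEXT PRIMARY KEY, Severity INTEGER, Audit TEXT);
CREATE TABLE alarmaudit (
    "Key" INTEGER PRIMARY KEY AUTOINCREMENT,
    Occurred INTEGER, Entity TEXT, Identifier TEXT);
-- Stand-in for the state_change automation: log who touched the event.
CREATE TRIGGER state_change AFTER UPDATE ON status
BEGIN
    INSERT INTO alarmaudit (Occurred, Entity, Identifier)
    VALUES (strftime('%s', 'now'), NEW.Audit, NEW.Identifier);
END;
""")

db.execute("INSERT INTO status VALUES ('evt1', 5, 'probe')")
# Step 1: each updating automation stamps its own name (plus an index) in Audit.
db.execute("UPDATE status SET Severity = 0, Audit = 'GenericClear-1' "
           "WHERE Identifier = 'evt1'")
db.execute("UPDATE status SET Severity = 1, Audit = 'Expire' "
           "WHERE Identifier = 'evt1'")
# Steps 2 and 3: the trigger has built the audit trail for the event.
trail = db.execute(
    'SELECT Entity, Identifier FROM alarmaudit ORDER BY "Key"').fetchall()
print(trail)  # [('GenericClear-1', 'evt1'), ('Expire', 'evt1')]

# Step 4: stand-in for clean_alarmaudit_table, dropping trails whose event is gone.
db.execute("DELETE FROM status WHERE Identifier = 'evt1'")
db.execute("DELETE FROM alarmaudit "
           "WHERE Identifier NOT IN (SELECT Identifier FROM status)")
remaining = db.execute("SELECT COUNT(*) FROM alarmaudit").fetchone()[0]
print(remaining)  # 0
```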
Automation Audit Report
The Automation Audit Report (AuditTable.cgi) is a PERL program that provides statistics
regarding which automations have touched which events and when. The program has four
command line options:
• Identifier: Provides statistics on a particular event as identified by its identifier.
• Node: Provides statistics on a particular node.
• Start: The start of the reporting period, relative to the present. Audit records from
this point onward are included in the report.
• End: The end of the reporting period, relative to the present. Audit records up to
this point are included in the report.
Example runs and results of this report program are included in the appendices at the end of
this whitepaper.
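The kind of filtering and counting such a report performs can be sketched as follows. This is not the AuditTable.cgi source. The function name, the record layout, and the use of seconds for the relative Start and End values are assumptions, and the Node option is omitted because the audit table shown in this document carries no Node column.

```python
from collections import Counter


def audit_report(records, now, identifier=None, start=None, end=0):
    """Count audit records per entity.

    records: iterable of (occurred, entity, identifier) tuples.
    start/end: seconds before `now` bounding the reporting period
               (start=None means no lower bound).
    """
    newest = now - end
    oldest = None if start is None else now - start
    tally = Counter()
    for occurred, entity, ident in records:
        if identifier is not None and ident != identifier:
            continue
        if occurred > newest or (oldest is not None and occurred < oldest):
            continue
        tally[entity] += 1
    return tally.most_common()


records = [
    (1000, "GenericClear-1", "evt1"),
    (1010, "GenericClear-1", "evt2"),
    (1020, "Expire", "evt1"),
    (1500, "GenericClear-1", "evt1"),
]
print(audit_report(records, now=1600, start=700, end=50))
# [('GenericClear-1', 3), ('Expire', 1)]
print(audit_report(records, now=1600, identifier="evt1", start=700, end=500))
# [('GenericClear-1', 1), ('Expire', 1)]
```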
Installation
Before beginning installation, it is important that this entire section is read and understood.
In particular, know exactly what is needed for the tables. The reason is a bug in
Omnibus whereby the deletion of a column, and other edits to a table, cause the deletion of
all automations that touch this table. Needless to say, this can take a few minutes to resolve
and results in an unacceptable outage in production.
Step 1: Create the table alerts.alarmaudit
This can be performed either with nco_sql at the command line or through the Admin GUI.
The command line version is as follows:
-- CREATE THE TABLE
CREATE TABLE alerts.alarmaudit PERSISTENT (
Key INCR PRIMARY KEY,
Occurred TIME,
Entity VARCHAR(64),
Identifier VARCHAR(255)
);
go
In the GUI you will see the following if the table is created correctly:
Figure 22: Admin GUI - Table Creation
Step 2: Add the Audit field (varchar(64)) to alerts.status
The solution also requires adding the Audit field (varchar(64)) to alerts.status. This can be
performed either with nco_sql at the command line or through the Admin GUI.
Step 3: Insert an initial record into the alerts.alarmaudit table
This step can be performed from the command line using the nco_sql utility. Within nco_sql
enter the following command:
INSERT INTO alerts.alarmaudit VALUES (0,getdate,'TestAutomation','TestEvent');
go
Step 4: Create the Clean_Alarmaudit_Table automation
Figure 23: Admin GUI - Automation Clean_alarmaudit_table
This automation deletes any audit entry not associated with an event in the alerts.status table.
This step can be performed through the Admin GUI. The automation needs specific values
set as shown in these figures.
Figure 24: Admin GUI - Automation Clean_alarmaudit_table
After creating this automation, use the nco_sql command at the command prompt to verify
that it deletes the entry added to the audit table in step 3.
Step 5: OPTIONAL: Make a placeholder for tracking probe events
When probes and other tools insert new events into the alerts.status table, they invoke the
new_row automation. Through the Admin GUI insert the following lines in GREEN to
stage tracking these updates at a later date:
Figure 25: Admin GUI - Automation new_row
Step 6: OPTIONAL: Make a placeholder for tracking probe events
When probes and other tools insert on top of existing events in the alerts.status table, they
invoke the deduplication automation. Through the Admin GUI insert the following lines in
GREEN to stage tracking these updates at a later date:
Figure 26: Admin GUI - Automation deduplication
Step 7: Update state_change to track automation updates
This automation is the workhorse of the audit automation solution. Specifically, when
automations or other tools update existing events (rather than insert new events or insert on
top of existing events) in the alerts.status table, they invoke the state_change automation.
Through the Admin GUI insert the following line in the automation:
insert into alerts.alarmaudit VALUES (0,getdate(),new.Audit,new.Identifier);
This populates the audit table each time an automation touches an event. The Audit field
contains the automation that last touched the event. Since this is a pre-insert automation, it
guarantees consistency and no trashing of updates:
Figure 27: Admin GUI - Automation state_change
Step 8: Update GenericClear, Expire, and other automations to enable
Event Auditing
Setting the Audit field sets the stage for the state_change automation to populate the
audit table for every event that an automation touches. Thus, any automation with an
insert, reinsert, or update statement must be updated to assign the automation name, and a
sub-index if required, to the Audit field. Below is an example of an updated GenericClear
(rudimentary) automation.
Figure 28: Admin GUI - Automation GenericClear
Similar updates can be performed on the Expire automation:
Figure 29: Admin GUI - Automation expire
Step 9-N: Update Other Event Specific automations (Future Expansion)
Any automation that updates an event via the insert or update command needs to be
modified to set the Audit field with the name of the automation. If there is more than one
insert or update command, then numeric tags should be used at the end to indicate which
command updated what event (e.g. Automation-1). This simple step sets the
stage for the state_change automation to populate the audit table for every event that
the automation touches.
Summary
The previous sections described the step by step installation of the event auditing solution.
Though solution design and implementation should be shared by the same responsible
parties, this ideal was not possible. The testing and deployment of the solution were
performed by another group due to budget shortfalls.
During the testing by that group, the Generic Clear automation generated over 98% of the
audit records. The solution was modified to consolidate the statistics much in the same way
deduplication summarizes multiple alerts into a single event.
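The document does not give the consolidated schema, so the following is only a guess at its shape: one row per event and entity pair with a tally and first and last timestamps, mirroring how deduplication keeps Tally and LastOccurrence on an event. All names in the sketch are hypothetical.

```python
def record_touch(audit, identifier, entity, occurred):
    """Fold repeated touches into one row per (identifier, entity) pair."""
    key = (identifier, entity)
    row = audit.get(key)
    if row is None:
        audit[key] = {"first": occurred, "last": occurred, "tally": 1}
    else:
        row["last"] = occurred
        row["tally"] += 1


audit = {}
for t in (10, 20, 30):
    record_touch(audit, "evt1", "GenericClear-1", t)
record_touch(audit, "evt1", "Expire", 40)

print(len(audit))                         # 2
print(audit[("evt1", "GenericClear-1")])  # {'first': 10, 'last': 30, 'tally': 3}
```

Four touches collapse into two rows, which is the effect needed when a single automation generates the bulk of the audit traffic.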
Based on the results of the ongoing audit, future plans were made to migrate the Generic
Clear logic from the Object Server automation into the probe mttrapd and syslog rules files.
Although the Generic Clear automation was still needed to resolve polled events, the
majority of the traffic was trap and syslog based.
In addition, the solution proved extremely effective at reducing troubleshooting time by
narrowing down the culprit automation, rule, and/or script that erroneously updated various
events. Time will tell if additional modifications to the logic are required.
Case Study: Impact
Replacement
Introduction
In some cases for a variety of technical and budgetary reasons a custom script is the best
solution. The most readily available methods to do this are via PERL and JAVA. In this case
study the client had a 24 page Impact policy that had over the course of 6 years migrated
from Impact 2.3 all the way to Impact 4.01. As a result, the policy was very complex and
very customer-specific.
This case study will discuss the generic part of the project which was to create a PERL script
to simulate Impact's generic functionality. During the course of the 4 week project the
generic aspects of the project included:
1. Write the PERL based Impact-like shell.
2. Test and improve the Impact-like shell efficiency and robustness
3. Integrate the policy specific PERL
4. Deploy the PERL program into production
The result was a PERL based replacement for an Impact implementation of a policy.
Appendix I contains the source code of the Impact-like PERL shell. The program was
written with five goals in mind.
1. Compatible Functionality
2. Flexibility
3. Speed
4. Robustness
5. Maintainability
The following sections describe how the program was architected to satisfy these
requirements.
Compatible Functionality
The most important goal of the project was to keep the same functionality. To ensure this
was the case during the testing and initial production phases the events were fed to both a
Netcool stack using Impact as well as a Netcool stack using the Impact-like PERL script.
Another script periodically scraped the events from both solutions and highlighted the
differences. Manual inspection of the differences determined whether the discrepancies were
caused by true script functionality gaps, Impact bugs, or simple timing. After a couple weeks
the pseudo code of the Impact-like PERL shell logic coalesced to:
1. Declare packages, environment variables, database security variables, command line
parameter defaults, and other database assignments.
2. Declare and prepare Oracle SQL statements for the tables equipment, site, card, path,
port, and channel.
3. Load procedures: LogMsg, dumpnetcool, usage, and commandline. Each is
documented in pseudo code.
4. Run main program:
   a. Prepare Netcool SQL for alerts.status and open the logfile and the change summary
   file.
   b. Loop forever (unless DEBUG & 16, in which case loop only once):
      i. Pull events from Netcool alerts.status.
      ii. For each event:
         1. Clear data structures.
         2. Perform Impact policy work.
         3. Build and execute an UPDATE against Netcool alerts.status to update the
         event.
      iii. Sleep the remaining time in the cycle and log if the cycle has overrun its
      length. Make sure to sleep a minimum time between cycles
      ($CYCLEMIN).
   c. Disconnect from Netcool and Oracle.
Flexibility
From the outset it was predicted that the manner in which the script was used would evolve as
the project progressed. This was certainly the case. Midway through the project the script
was used to audit the event stream, which required different functions. Further,
benchmarking and debugging became stronger requirements as dictated by the end customer
that used the system. Most of this flexibility was managed by three aspects: an open
architecture, global configuration variables, and extensible command line options. The
command line options became extensive. As a result a 'help' option was added to document
the various ways the program could be run:
USAGE: psuedoimpact
[-debug <debug number 1-255>]
      1 - Log any warning or errors.
      2 - Function Entry and exit logging.
      4 - Inner function verbose logging
      8 - Dump before and after Netcool alert.status fields.
     16 - Query Netcool alerts.status only once.
     32 - Clear old log.
     64 - Show initial alert.status field values.
    128 - Do not perform update, instead document what would be done.
[-node] (Node to perform extra logging on.)
[-logfile] (full path and file to log file.)
*[-sumfile] (full path and file to the change summary file.)
[-cycle] (Cycle period in seconds.)
[-cyclemin] (minimum delay between cycles, even if the cycle
overruns.)
[-delay] (Ignore events with Populated field younger than)
[-stdout] --Log to standard out
[-help]
NOTE: Program must run as the user root
DEBUGING **************************
There are 8 flags available for debugging.
When diagnosing a specific problem in development -255 or -127 is used as it will:
Clear the old log (32)
Query the events only once and exit the program (16)
Show any warning or errors (1)
Show entry and exit from functions (2)
Provide verbose logging of the inner workings (4)
Dump the initial values of the event's fields that can be updated (64)
Dump the before and after values (if there is a change) for each event (8)
Write a summary of the updates for each event in the sumfile (8)
Normal operation of the program can used the default DEBUG value of 1, which
will log WARNING and ERRORS to an event's journal as well as to the
logfile.
SUMFILE ***************************
* When debug flag 8 is set, the summary file contains a record of all
proposed (debug & 128) or actual changes made in a ~ delimited file that
can easily be viewed with Excel. It also includes any journal entries or
detected errors. Each row represents one event. The following is a column
description:
-- Fields used to identify and select the event...
Node       - Node referenced in the event.
DESCR      - Description of the Node (From Oracle.)
IP_ADDRESS - IP Address of the Node (From Oracle.)
MODEL      - Model of the event (From Oracle.)
AlertGroup - Netcool event field
AlertKey   - Netcool event field
IsJournal  - Journal exist?
-- Fields show the before and after values of netcool fields for the
event. If there is no change, null is shown in the before and after
values to make the changes stand out. Each field is repeated twice
to show the before and after values respectively. The columns include:
Customer Node Site Summary JournalEntry
-- Error messages against the event from processing.
ERRORS
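The debug option in the help text is a bit mask, so values such as 127 and 255 are simply combinations of the eight flags. The sketch below decodes a value into its flags. The dictionary wording paraphrases the help text, and the function name is hypothetical.

```python
DEBUG_FLAGS = {
    1: "log warnings and errors",
    2: "log function entry and exit",
    4: "verbose inner-function logging",
    8: "dump before/after field values and write the summary file",
    16: "query alerts.status only once",
    32: "clear old log",
    64: "show initial field values",
    128: "dry run: document updates instead of performing them",
}


def decode_debug(value):
    """Return the descriptions of every flag bit set in `value`."""
    return [text for bit, text in DEBUG_FLAGS.items() if value & bit]


print(len(decode_debug(255)))  # 8
print(bool(127 & 128))         # False
print(decode_debug(1))         # ['log warnings and errors']
```

The difference between 127 and 255 is therefore only the dry run bit: 127 performs the updates while 255 documents what would have been done.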
Speed
The initial tests of the program showed the script could only handle 1-2 events per second.
Benchmark analysis revealed that the majority of the time was spent in Oracle SQL queries
and in the creation and destruction of hashes, which were caused by calls to the
Perl DBI module's $sth->fetchrow_hashref.
Two changes to the code improved the script to handle 15-20 events per second.
1. Database select return function changes. The fetchrow_hashref() calls were converted to
fetchrow_arrayref() calls. In addition, the bind_columns() function was used so that fields
could be referenced directly by name through bound variables, rather than by mapping array
indexes to field names.
2. Database select prepare function changes. All the various forms of the select
statement were prepared ahead of the main loop. As a result, the initial parsing by the
database and binding of search variables was done in advance.
These changes enabled the script to handle 15-20 events per second rather than the initial 1-2 events per second.
Robustness
The robustness of the solution was improved in two ways:
Scheduler: The scheduler was written to enforce a minimum idle time between cycles. This
ensured that other components would not be starved of access to the object server by the
script's persistence. Additionally, checks were built into the scheduler to detect whether a
cycle overran its scheduled run time. If this occurred, a message was logged to that effect.
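The scheduling rule reduces to a small calculation, sketched here as a pure function so it can be checked in isolation. The function and parameter names are hypothetical. Only the behavior (sleep the remainder of the cycle, never less than the minimum, and flag overruns) comes from the description above.

```python
def next_sleep(cycle, cycle_min, elapsed):
    """Return (seconds_to_sleep, overran) for one pass of the main loop."""
    overran = elapsed > cycle
    remaining = cycle - elapsed
    # Never sleep less than the minimum, even when the cycle has overrun.
    return max(remaining, cycle_min), overran


print(next_sleep(cycle=60, cycle_min=5, elapsed=20))  # (40, False)
print(next_sleep(cycle=60, cycle_min=5, elapsed=58))  # (5, False)
print(next_sleep(cycle=60, cycle_min=5, elapsed=75))  # (5, True)
```

A caller would log a message whenever the second value is true and then sleep for the first.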
High availability/Disaster recovery: The program was written to have a primary and a
secondary mode. The secondary gathers only older events to update and otherwise performs
the same functionality as the primary. By changing the settings, any number of scripts could
run to provide redundancy, or the algorithm could be changed to provide load balancing.
Maintainability
A significant percentage of the script is directed toward maintainability. These features are
self-evident:
1. Debug option
2. Inline comments
3. Use of bind_columns() in combination with fetchrow_arrayref()
Together these features help to self-document the code.
Summary
In large and small enterprises alike, the out of the box functionality of IBM Tivoli
Netcool Omnibus is often not enough. Additional event enrichment and correlation, with
or without outside data sources, may be required. Traditionally, Impact is selected as
the only alternative. However, in many cases, through judicious use of probe rules,
automations, and alerts.status database fields, an Impact-like policy can be created
through these facilities alone. Further, in some cases replacing Impact with home grown
scripts makes sense. This document described generally how to implement these
alternatives and provided several case studies that are currently in production at several
Fortune 500 companies.
Appendix A: XinY New Tables
and Table Updates
Alerts.XinY
The XinY table is used to track the current state of the XinY property of an event.
Order  Name                  Datatype  Length  Primary Key  Description
2      Identifier            VarChar   255     Yes          Unique index for XinY state
3      LastOccurrence        UTC       4       No           Used for the time window calc
4      X                     Integer   N/A     No           Used for the instance calc
5      XinYXValue            Integer   N/A     No           Instance count required for XinY
6      XinYYValue            Integer   N/A     No           Time window limit for XinY
9      XinYEffect            Integer   N/A     No           What to do once XinY tripped
7      LastY                 UTC       N/A     No           Fixed window for XinY set/unset calculation
13     NowXinYSeverity       Integer   N/A     No           For backfilling events with current XinY state.
12     NowXinYAlarmPriority  Integer   N/A     No           For backfilling events with current XinY state.
14     NowXinYSummary        VarChar   255     No           For backfilling events with current XinY state.
10     PreXinYSeverity       Integer   N/A     No           For restoration if XinY unset
11     PreXinYAlarmPriority  Integer   N/A     No           For restoration if XinY unset
8      PreXinYSummary        VarChar   255     No           For restoration if XinY unset
Figure 30: Schema: Alerts.XinY
NOTE: The order field indicates the order of the columns. This is only important for the
INSERT statements that appear in the automation new_row and deduplication. Otherwise
the order of the columns can be changed.
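Why the column order matters can be checked by lining up the schema with the positional VALUES list of the INSERT statement in the new_row automation (Appendix B). The sketch below is only a cross-check written for this purpose. The list and variable names are hypothetical, while the order numbers and values are copied from this document.

```python
# (order, column) pairs as listed in the alerts.XinY schema table.
SCHEMA = [
    (2, "Identifier"), (3, "LastOccurrence"), (4, "X"), (5, "XinYXValue"),
    (6, "XinYYValue"), (9, "XinYEffect"), (7, "LastY"),
    (13, "NowXinYSeverity"), (12, "NowXinYAlarmPriority"),
    (14, "NowXinYSummary"), (10, "PreXinYSeverity"),
    (11, "PreXinYAlarmPriority"), (8, "PreXinYSummary"),
]

# Positional VALUES list from the new_row automation's INSERT statement.
VALUES = [
    "new.Identifier", "new.LastOccurrence", "1", "new.XinYXValue",
    "new.XinYYValue", "0", "new.Summary", "new.XinYEffect", "new.Severity",
    "new.AlarmPriority", "new.AlarmPriority", "new.Severity", "new.Summary",
]

# Sorting by the order number recovers the physical column order.
columns = [name for _, name in sorted(SCHEMA)]
mapping = dict(zip(columns, VALUES))

print(len(columns) == len(VALUES))  # True
print(mapping["LastY"])             # 0
print(mapping["PreXinYSummary"])    # new.Summary
print(mapping["NowXinYSeverity"])   # new.Severity
```

If the columns were created in a different order, the same VALUES list would silently load, for example, a Severity into an AlarmPriority column.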
Alerts.XinYWindow
The XinYWindow table is used to track the various instances of an event to calculate the X
value (number of instances) in XinY condition. As the events age outside the Y time period
(in seconds), they are deleted from this table. NOTE: The order field indicates the order of
the columns. This is only important for the INSERT statements that appear in the
automations XinY_new_row and XinY_deduplication. Otherwise the order of the columns
can be changed.
Order  Name        Datatype  Length  Primary Key  Description
4      Idx         INCR      N/A     Yes          Unique index for XinYWindow event
2      Identifier  VarChar   255     No           Unique index for XinY state
3      Occurred    UTC       4       No           When it occurred (for sliding window calculations)
Figure 31: Schema: Alerts.XinYWindow
Alerts.Status
These are the additional fields added to the alerts.status. They are the same as what was
specified as part of the requirements for the solution.
Name        Datatype  Length  Primary Key  Description
XinY        Integer   N/A     Yes          1=new, 2=in XinY calc, 3=XinY set
XinYXValue  Integer   255     No           Instance count to hit.
XinYYValue  UTC       4       No           Time period limit
XinYEffect  Integer   N/A     No           What to do once XinY is tripped.
Figure 32: Schema Changes: Alerts.Status
Appendix B: Alerts.Status XinY
Centric Automations
new_row automation:
SETTINGS
On alerts.status
Pre database action on insert
Apply to row
Enabled
ACTION
begin
    if ( %user.is_gateway = false )
    then
        set new.Tally = 1;
        set new.ServerName = getservername();
    end if;
    set new.StateChange = getdate();
    set new.InternalLast = getdate();
    if( new.ServerSerial = 0 )
    then
        set new.ServerSerial = new.Serial;
    end if;
    -- Company Customer XinY Logic
    if (new.XinYEffect > 0) then
        -- Back fill the new event from any XinY state already being tracked.
        for each row XinYrow in alerts.XinY where XinYrow.Identifier = new.Identifier
        begin
            if (new.XinYEffect = 1) then
                set new.Summary=XinYrow.NowXinYSummary;
            else
                if (new.Severity > 0) then
                    set new.Severity=XinYrow.NowXinYSeverity;
                end if;
                set new.AlarmPriority=XinYrow.NowXinYAlarmPriority;
            end if;
        end;
        -- Values are positional; see the Order column in Appendix A.
        INSERT INTO alerts.XinY VALUES(
            new.Identifier,new.LastOccurrence,1,new.XinYXValue,new.XinYYValue,0,
            new.Summary,new.XinYEffect,new.Severity,new.AlarmPriority,
            new.AlarmPriority,new.Severity,new.Summary);
        set new.XinY = 2; -- XinY activated.
    end if;
    -- END Company Customer XinY Logic
end
deduplication automation
SETTINGS
On alerts.status
Pre database action on Reinsert
Apply to row
Enabled
ACTION
declare
    gw_dedup char( 255 );
    time_now utc;
begin
    -- Get the date once
    set time_now = getdate();
    if( %user.is_gateway = false ) then
        -- Deduplication for non-gateway clients
        set old.Tally = old.Tally + 1;
        set old.LastOccurrence = new.LastOccurrence;
        set old.StateChange = time_now;
        set old.InternalLast = time_now;
        set old.Summary = new.Summary;
        set old.AlertKey = new.AlertKey;
        if ( (new.Severity = 0) and (old.Severity > 0) )
        then
            set old.ClearTime = time_now;
            set old.InitialSeverity = old.Severity;
        end if;
        set old.Severity = new.Severity;
    else
        -- Deduplication for gateway clients.
        -- This section of the trigger emulates
        -- the gateway deduplication in v3.6 ObjectServer.
        set gw_dedup = get_prop_value( 'GWDeduplication' );
        case
            -- Do not increment Tally
            when( gw_dedup = '0' )
            then
                set old.LastOccurrence = new.LastOccurrence;
                set old.StateChange = time_now;
                set old.InternalLast = time_now;
                set old.Summary = new.Summary;
                set old.AlertKey = new.AlertKey;
                set old.Severity = new.Severity;
            -- Replace the 'old' row with the 'new' row
            when( gw_dedup = '1' )
            then
                set row old = new;
            -- Drop the reinsert
            when( gw_dedup = '2' )
            then
                cancel;
Impact without Impact
Appendix B: Alerts.Status XinY Centric Automations  54
-- Identical to non-gateway deduplication
when( gw_dedup = '3' )
then
set old.Tally = old.Tally + 1;
set old.LastOccurrence = new.LastOccurrence;
set old.StateChange = time_now;
set old.InternalLast = time_now;
set old.Summary = new.Summary;
set old.AlertKey = new.AlertKey;
set old.Severity = new.Severity;
-- Any other value is taken to be a drop
else
cancel;
end case;
end if;
-- Company Customer XinY Logic
if (new.XinYEffect > 0) then
for each row XinYrow in alerts.XinY where XinYrow.Identifier =
new.Identifier
begin
if (new.XinYEffect = 1) then
set old.Summary=XinYrow.NowXinYSummary;
else
if (new.Severity > 0) then
set old.Severity=XinYrow.NowXinYSeverity;
end if;
set old.AlarmPriority=XinYrow.NowXinYAlarmPriority;
end if;
end;
INSERT INTO alerts.XinY
VALUES(new.Identifier,new.LastOccurrence,1,new.XinYXValue,new.XinYYValue,0,ol
d.Summary,new.XinYEffect,old.Severity,old.AlarmPriority,old.AlarmPriority,old
.Severity,old.Summary);
Set old.XinY = 2; -- XinY reactivated.
end if;
-- ENDCompany Customer XinY Logic
end
Impact without Impact
Appendix B: Alerts.Status XinY Centric Automations  55
Appendix C: Alerts.XinY Centric Automations
XinY_new_row automation
SETTINGS
On alerts.XinY
Pre database action on Insert
Apply to row
Enabled
ACTION
begin
    -- NOTE: Do not increment X, so that when X=0 => delete XinY state.
    INSERT INTO alerts.XinYWindow VALUES (new.Identifier, new.LastOccurrence, 0);
    -- THIS APPEARS UNNEEDED
    --SET old.NowXinYSummary = new.NowXinYSummary;
    --SET old.NowXinYSeverity = new.NowXinYSeverity;
    --SET old.NowXinYAlarmPriority = new.NowXinYAlarmPriority;
end
XinY_deduplication automation
SETTINGS
On alerts.XinY
Pre database action on Reinsert
Apply to row
Enabled
ACTION
begin
    SET old.X=old.X+1;
    IF (((old.X >= old.XinYXValue) and (old.X < 32000)) or ((old.X >= old.XinYXValue-32000) and (old.X > 32000))) THEN
        -- XINY FIRST TIME, SO DO ACTION
        IF (old.LastY = 0) THEN
            SET old.X = 31999 + old.XinYXValue; -- Offset for decrementing from delete
            DELETE FROM alerts.XinYWindow WHERE Identifier=new.Identifier; -- New slate.
            IF (old.XinYEffect = 1) THEN
                UPDATE alerts.status SET Summary = 'XinY Policy (' + to_char(old.XinYXValue) + ' event in ' + to_char(old.XinYYValue) + ' seconds Met:' + old.PreXinYSummary, XinY=3 where Identifier=new.Identifier and XinY!=3;
                SET old.NowXinYSummary = 'XinY Policy (' + to_char(old.XinYXValue) + ' event in ' + to_char(old.XinYYValue) + ' seconds Met:' + old.PreXinYSummary;
            ELSEIF (old.XinYEffect = 2) THEN
                IF (new.PreXinYSeverity < 5) THEN
                    SET old.NowXinYSeverity = new.PreXinYSeverity+1;
                    UPDATE alerts.status SET Severity = old.NowXinYSeverity, XinY=3 where Identifier=old.Identifier and XinY!=3;
                ELSEIF (old.PreXinYAlarmPriority < 3) THEN
                    SET old.NowXinYAlarmPriority = new.NowXinYAlarmPriority+1;
                    UPDATE alerts.status SET AlarmPriority = old.NowXinYAlarmPriority, XinY=3 where Identifier=old.Identifier and XinY!=3;
                    SET old.XinYEffect=3; -- Alternate undo behavior when unsetting XinY
                ELSE
                    -- Exception code if Severity and AlarmPriority maxed out?
                    SET old.XinYEffect=4; -- Alternate undo behavior when unsetting XinY
                END IF;
            END IF;
        END IF;
        -- UPDATE START OF UNSET FIXED WINDOW (NOT SLIDING WINDOW ON UNSET)
        SET old.LastY=new.LastOccurrence;
        -- NEW ALERTS.STATUS ON OLD XINY; REPOPULATE ALERTS.STATUS
    END IF;
    INSERT INTO alerts.XinYWindow VALUES (new.Identifier, new.LastOccurrence, 0);
end
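The core idea behind these automations is "X occurrences within Y seconds". The Python sketch below models only that idea, for a single Identifier, using an in-memory queue. It is a simplification and not a translation of the triggers: the triggers keep a running counter X in alerts.XinY, store occurrences as rows in alerts.XinYWindow, and rely on a temporal trigger to age rows out. The class and method names are hypothetical.

```python
from collections import deque


class XinYCounter:
    """Count occurrences of one Identifier inside a sliding window of y_seconds."""

    def __init__(self, x_value: int, y_seconds: int):
        self.x_value = x_value      # occurrences needed to trip (XinYXValue)
        self.y_seconds = y_seconds  # window length in seconds (XinYYValue)
        self.window = deque()       # timestamps still inside the window

    def record(self, timestamp: int) -> bool:
        """Add one occurrence; return True when X occurrences fall within Y seconds."""
        self.window.append(timestamp)
        # Drop occurrences older than Y seconds, oldest first
        while self.window and self.window[0] < timestamp - self.y_seconds:
            self.window.popleft()
        return len(self.window) >= self.x_value
```

With x_value=3 and y_seconds=20, occurrences at 0, 5 and 10 seconds trip the threshold on the third call, while occurrences at 0, 15 and 30 seconds do not, because the first has aged out by the time the third arrives.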
Appendix D: Alerts.XinY Centric Automations
XinY_Expire (Temporal Trigger)
SETTINGS
Every: 15 seconds
Priority: 1
Enabled
EVALUATE
Bind As: XinYbind
-- CLEAR XinY STATE REGARDLESS IF EVENT EXISTS CURRENTLY.
select Identifier, X, XinYYValue, LastY, XinYEffect, PreXinYSeverity, PreXinYAlarmPriority, PreXinYSummary from alerts.XinY
ACTION
begin
    if %rowcount > 0 then
        for each row XinYrow in XinYbind
        begin
            -- UNSET XINY AFTER LONG ENOUGH PERIOD OF NO XINY: NOTE ASSUMES SYSTEM CLOCK MATCHES EVENT CLOCK
            IF ((XinYrow.LastY > 0) AND ((getdate()-XinYrow.LastY) > XinYrow.XinYYValue*2)) THEN
                IF (XinYrow.XinYEffect = 1) THEN -- Cut new text from Summary
                    UPDATE alerts.status SET Summary=XinYrow.PreXinYSummary, XinY=1 WHERE Identifier = XinYrow.Identifier and XinY!=1;
                ELSEIF (XinYrow.XinYEffect = 2) THEN -- Unset Severity
                    UPDATE alerts.status SET Severity=XinYrow.PreXinYSeverity, XinY=1 WHERE Identifier = XinYrow.Identifier and XinY!=1 and Severity>0;
                ELSEIF (XinYrow.XinYEffect = 3) THEN -- Unset AlarmPriority
                    UPDATE alerts.status SET AlarmPriority=XinYrow.PreXinYAlarmPriority, XinY=1 WHERE Identifier = XinYrow.Identifier and XinY!=1;
                -- ELSEIF (XinYrow.XinYEffect = 4) THEN -- Unset Nothing
                END IF;
                DELETE FROM alerts.XinY WHERE (Identifier = XinYrow.Identifier); -- Delete XinY lifespan record
                DELETE FROM alerts.XinYWindow WHERE (Identifier = XinYrow.Identifier); -- Delete XinY window records
            -- OTHERWISE AGE OUT OLDER XINY WINDOW EVENTS
            ELSE
                DELETE FROM alerts.XinYWindow WHERE (Identifier = XinYrow.Identifier) and (Occurred < (getdate()-XinYrow.XinYYValue));
            end if;
        end;
    end if;
    DELETE FROM alerts.XinY WHERE (X=0); -- IF XinYWindow_Expire aged all events.
end
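The decision XinY_Expire makes for each alerts.XinY row can be summarized as two branches: undo the XinY effect once more than twice the Y period has passed since LastY, otherwise age old occurrences out of the window. The Python sketch below models just that decision. The function names are hypothetical, and the database updates themselves are left out.

```python
from typing import List


def expire_action(now: int, last_y: int, y_seconds: int) -> str:
    """Return 'unset' or 'age_window', mirroring the two branches of XinY_Expire."""
    # LastY stays 0 until the XinY threshold has been met at least once
    if last_y > 0 and (now - last_y) > 2 * y_seconds:
        return "unset"
    return "age_window"


def aged_window(occurrences: List[int], now: int, y_seconds: int) -> List[int]:
    """Keep only the occurrences that are no older than y_seconds."""
    cutoff = now - y_seconds
    return [t for t in occurrences if t >= cutoff]
```

For a 20-second Y value, a row whose LastY is 50 seconds old is unset, while a row whose LastY is 30 seconds old only has its window aged.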
Appendix E: XinYWindow Centric Automations
XinYWindow_Expire (Database Trigger)
SETTINGS
On alerts.XinYWindow
Pre database action on Delete
Apply to row
Enabled
ACTION
begin
    -- Short cut for impossible SQL: update set X=(select count() from alerts.XinYWindow)...
    UPDATE alerts.XinY SET X=X-1 WHERE ((Identifier=old.Identifier) AND (X != 32000));
end
XinY_CleanUp (Temporal Trigger)
SETTINGS
Every: 1 hour
Priority: 1
Enabled
ACTION
begin
    -- Stopgap measure to ensure stale events don't hang around.
    DELETE FROM alerts.XinYWindow WHERE (Occurred < (getdate()-134400)); -- 134400 seconds (about 37 hours; one day would be 86400)
    -- DELETE FROM alerts.XinY WHERE (Occurred < (LastOccurrence-2419200)); -- 4 weeks
end
Appendix F: Eventstream.pl
The following script was used to inject events into the alerts.status table to verify
that the XinY solution functioned as expected.
#!/usr/bin/perl
#############################################################################
# PROGRAM: eventstream.pl          Daniel L. Needles          2009-10-06
# PURPOSE: Simulate events by directly inserting into alerts.status
#############################################################################
#use strict;
use IPC::Open2;
my $user='root';        # USER NAME
my $name='DEV_COL01';   # DATABASE NAME (OMNIBUS)
## GENERIC CLEAR TEST
my @cmds = (
# Node/Identifier,WaitTime,Summary,Severity,Type,XinY,XinYXValue,XinYYValue,XinYEffect
"'Node 0',5,'Bad Event: Sev 4',4,1,0,2,20,2",
"'Node 1',5,'Bad Event: Sev 4',4,1,0,2,20,2",
"'Node 1',5,'Bad Event: Sev 4',4,1,0,2,20,2",
"'Node 1',5,'Bad Event: Sev 4',4,1,0,2,20,2",
"'Node 1',5,'Bad Event: Sev 4',4,1,0,2,20,2",
"'Node 1',5,'Bad Event: Sev 4',4,1,0,2,20,2",
"'Node 1',5,'Bad Event: Sev 4',4,1,0,2,20,2",
"'Node 1b',5,'Good Event',0,2,0,0,0,0",
"'Node 1b',5,'Good Event',0,2,0,0,0,0",
"'Node 1b',5,'Good Event',0,2,0,0,0,0",
"'Node 1b',5,'Good Event',0,2,0,0,0,0",
"'Node 1',5,'Bad Event: Sev 4',4,1,0,2,20,2",
"'Node 1',5,'Bad Event: Sev 4',4,1,0,2,20,2",
"'Node 1',5,'Bad Event: Sev 4',4,1,0,2,20,2",
"'Node 1',5,'Bad Event: Sev 4',4,1,0,2,20,2",
"'Node 1',5,'Bad Event: Sev 4',4,1,0,2,20,2",
"'Node 1',5,'Bad Event: Sev 4',4,1,0,2,20,2",
# "'Node 1','Bad Event: Sev 4',4,1,2,10,1",
# "'Node 1','Bad Event: Sev 4',4,1,2,10,1",
# "'Node 2','Bad Event: Sev 4',4,1,2,10,1",
# "'Node 3','Bad Event: Sev 4',4,1,2,10,1",
# "'Node 4','Bad Event: Sev 4',4,1,2,10,1",
# "'Node 5','Bad Event: Sev 4',4,1,2,10,1",
# "'Node 6','Bad Event: Sev 4',4,1,2,10,1"
);
########################################################################
########################### MAIN #####################################
########################################################################
if ( $#cmds > 0) {
    #my $pid = open2(FIN,FOUT,"nco_sql -server $name -user $user -password $pass");
    my $pid = open2(FIN,FOUT,"nco_sql -server $name -user $user");
    for ($i=0; $i<=$#cmds; $i++) {
        my $tm = time();
        # Split off the node and the wait time; the remainder is inserted as-is
        my ($Node,$waittm,$dmy) = split /,/,$cmds[$i],3;
        $cmds[$i]="$Node,$dmy";
        my $ins="insert into alerts.status (Serial,Identifier,Tally,FirstOccurrence,StateChange,InternalLast,LastOccurrence,Node,Summary,Severity,Type,XinY,XinYXValue,XinYYValue,XinYEffect) values ($i, $Node,1,$tm,$tm,$tm,$tm,$cmds[$i]);\n";
        print FOUT "$ins";
        print "$ins";
        print FOUT "go\n";
        sleep $waittm;
    }
    # print FOUT "select rtrim(Node),rtrim(Summary),XinYXValue,XinYYValue,XinY,XinYEffect,Tally,FirstOccurrence,LastOccurrence from alerts.status;\n";
    # sleep 1;
    # print FOUT "go\n";
    # sleep 4;
    print FOUT "quit\n";
}
## DETERMINE SCHEMA AND BUFFER ALL INPUT INTO ARRAY
$d=1;
my $allline;
while ( my $line=<FIN> ) {
    # chomp($line);
    $line=~s:\0:,:g;
    $line=~s:\s+,:,:g;
    $line=~s:,\s+:,:g;
    $line=~s:[ \t]+: :g;
    if (!($line =~ /^\s*$/)) {
        $allline.=$line;
    }
}
#$allline =~s:,\n::smg;
print $allline;
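Each test entry in eventstream.pl packs the node, a wait time, and the remaining column values into one comma-separated string. The Python sketch below mirrors how the script splits an entry and builds the INSERT statement, so the mapping from entry to columns is easier to follow. It only builds the string; it does not talk to nco_sql. The function name and the COLUMNS constant are hypothetical.

```python
from typing import Tuple

# Column list used by the INSERT in eventstream.pl
COLUMNS = ("Serial,Identifier,Tally,FirstOccurrence,StateChange,InternalLast,"
           "LastOccurrence,Node,Summary,Severity,Type,XinY,XinYXValue,"
           "XinYYValue,XinYEffect")


def build_insert(serial: int, entry: str, now: int) -> Tuple[str, int]:
    """Return the INSERT statement for one entry and the seconds to wait after it."""
    # Same as Perl's split /,/, $entry, 3: node, wait time, everything else
    node, wait, rest = entry.split(",", 2)
    # The node value is used twice: once as Identifier and once as Node
    sql = (f"insert into alerts.status ({COLUMNS}) values "
           f"({serial}, {node},1,{now},{now},{now},{now},{node},{rest});")
    return sql, int(wait)
```

For example, `build_insert(0, "'Node 0',5,'Bad Event: Sev 4',4,1,0,2,20,2", 1000)` yields a statement with 15 values, one per column, and a wait of 5 seconds.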
APPENDIX G: Event Audit Report Script
This program reports statistics on which automations, probes, and/or Impact policies
have touched which events, and when.
#!/usr/bin/perl
########################################################################
# PROGRAM: AuditTable.cgi
# PURPOSE: To create an HTML table of Netcool Automations
# ISSUES:  There is a known limit of 255 char retrieval via this
#          version of DBI.
########################################################################
use CGI;
use Time::localtime;
use Getopt::Long;   ## HANDLES COMMAND LINE OPTIONS
use DBI;
my $Id;
my $Node;
my $Start;
my $End;
GetOptions("Identifier=s" => \$Id,
           "Node=s"       => \$Node,
           "Start=i"      => \$Start,
           "End=i"        => \$End);
my $q = new CGI;
$q->import_names("X");
# Read in site data inputs
my $filter = ($q->param('filter'));
my $custom_vis = "hidden";
#### OPEN Netcool DATABASE CONNECTION
$ENV{"ORACLE_HOME"}="/opt/oracle/product/11.1.0/db_1";
$ENV{"LD_LIBRARY_PATH"}="/usr/local/lib:/usr/lib:/opt/oracle/product/11.1.0/db_1/lib";
$ENV{"SYBASE"}="/usr/local";
#print "Content-TYPE: text/html","\n\n";
$|++; # Unbuffer output
my $dbh = DBI->connect( 'dbi:Sybase:NETCOM', "user", "password", { RaiseError=> 0, PrintError => 0, AutoCommit => 0 } );
## OUTPUT HTML FILE HEADER
print <<HTML;
<html>
<head>
<style>
a.menu:link { color: #FFFFFF; size: -3; text-decoration: none; }
a.menu:visited { color: #FFFFFF; size: -3; text-decoration: none; }
a.menu:hover { color: #99CCFF; size: -3 }
.fttext1 {
font-size: 12px;
font-family: Arial, Helvetica, sans-serif;
color: #000000;
font-weight: bold;
}
.Texth {
text-align: center;
color: black;
background-color: yellow;
font-family: Arial, Helvetica, sans-serif;
font-size: 12pt;
font-weight: bold;
text-decoration: underline;
}
.Textu {
color: black;
background-color: yellow;
font-family: Arial, Helvetica, sans-serif;
font-size: 12pt;
font-weight: bold;
text-decoration: underline;
}
.Text2 {
color: black;
font-family: Arial, Helvetica, sans-serif;
font-size: 10pt;
font-weight: normal;
}
.Text1 {
vertical-align:top;
color: black;
font-family: Arial, Helvetica, sans-serif;
font-size: 10pt;
font-weight: bold;
}
</style>
</head>
<body>
<table width=90\% border=1 cellspacing =1 cellpadding =1>
HTML
## BUILD THE QUERY FOR THE REQUESTED SCOPE
my $select_sql = "select Occurred, Entity, Identifier from alerts.alarmaudit";
if ($Id) {
    $select_sql .= " where Identifier = '$Id'";
} elsif ($Node) {
    ## Limit to events on the requested node
    $select_sql .= " where Identifier in (select Identifier from alerts.status where Node = '$Node')";
}
## Time limits are given as seconds prior to now
my $joiner = ($Id || $Node) ? " and" : " where";
if ($Start) {
    my $tm=time()-$Start;
    $select_sql.="$joiner Occurred > $tm";
    $joiner=" and";
}
if ($End) {
    my $tm=time()-$End;
    $select_sql.="$joiner Occurred < $tm";
}
#print "'$select_sql'\n";
my $sth = $dbh->prepare("$select_sql");
$sth->execute;
$min=99999999999999;
while(my ($Occurred,$Entity,$Identifier)=$sth->fetchrow_array ) {
    $Occurred=~s:\0::;
    $Entity=~s:\0::;
    $Identifier=~s:\0::;
    $Event{$Identifier}++; ## Group by Event
    $Automation{$Entity}++; ## Group by Automation
    my $t=$Occurred;
    $Occurred=int($t/3600)*3600; $Hourly{$Occurred}++; ## Group by the hour
    $t%=3600; $t=int($t/300); $Periods{$t}++; ## Group by 5 min on hour
    $min=($min<$Occurred)?$min:$Occurred;
    $max=($max>$Occurred)?$max:$Occurred;
}
## GET EVENT DISTRIBUTION
foreach $event (sort keys %Event) {
    $Distribution{$Event{$event}}++;
}
## HEADER
print "<tr><td><table width=100% border=1 cellspacing =2 cellpadding =2>\n";
print " <tr><td width=100% class=Texth>AUTOMATION AUDIT REPORT</td></tr>\n";
my $tm = localtime($min);
my $buf1=sprintf("%02d/%02d/%04d %02d:%02d:%02d",
    $tm->mon+1, $tm->mday, $tm->year+1900,
    $tm->hour, $tm->min, $tm->sec);
$tm = localtime($max);
my $buf2=sprintf("%02d/%02d/%04d %02d:%02d:%02d ",
    $tm->mon+1, $tm->mday, $tm->year+1900,
    $tm->hour, $tm->min, $tm->sec);
## REPORT SCOPE
my $name=($Id)?"All Events Where Identifier = $Id":($Node)?"All Events Where Node = $Node":'All Events In The Audit Table';
print " <tr><td width=100% class=Texth>Against $name</td></tr>\n";
print " <tr><td width=100% class=Texth>For period from $buf1 to $buf2</td></tr>\n";
print "</table></td></tr>\n";
## AUTOMATION HITS
print "<tr><td><table width=100% border=1 cellspacing =2 cellpadding =2>\n";
print " <tr><td width=250 class=Textu>Automation</td><td class=Textu>Hit Count</td></tr>\n";
foreach $automation (sort keys %Automation) {
    ## An empty Entity means the update did not come from an automation
    my $label=($automation)?$automation:'Non Automation Update';
    print " <tr><td width=250 class=Text1>$label:</td><td class=Text2>$Automation{$automation}</td></tr>\n";
}
print "</table></td></tr>\n";
## PERIODS IN THE HOUR HITS
print "<tr><td><table width=100% border=1 cellspacing =2 cellpadding =2>\n";
print " <tr><td width=250 class=Textu>Time Periods</td><td class=Textu>Number of Events during period</td></tr>\n";
foreach $event (sort { $a <=> $b } keys %Periods) {
    my $from = 5 * $event;
    my $to = $from+5;
    print " <tr><td width=250 class=Text1>Period $from-$to Min:</td><td class=Text2>$Periods{$event}</td></tr>\n";
}
print "</table></td></tr>\n";
## HOURLY HISTORY
print "<tr><td><table width=100% border=1 cellspacing =2 cellpadding =2>\n";
print " <tr><td width=250 class=Textu>Hourly History</td><td class=Textu>Number of Hits</td></tr>\n";
foreach $event (sort { $a <=> $b } keys %Hourly) {
    my $tm = localtime($event);
    my $buf=sprintf("%02d/%02d/%04d %02d hr",
        $tm->mon+1, $tm->mday, $tm->year+1900,
        $tm->hour);
    print " <tr><td width=250 class=Text1>$buf</td><td class=Text2>$Hourly{$event}</td></tr>\n";
}
print "</table></td></tr>\n";
## HIT BY EVENT BY COUNT
print "<tr><td><table width=100% border=1 cellspacing =2 cellpadding =2>\n";
print " <tr><td width=250 class=Textu>Events Hit X Times by Automations</td><td class=Textu>Number of Events with X Hits</td></tr>\n";
foreach $event (sort { $a <=> $b } keys %Distribution) {
    print " <tr><td width=250 class=Text1>Events hit $event times:</td><td class=Text2>$Distribution{$event}</td></tr>\n";
}
print "</table></td></tr>\n";
## END OF OUTPUT
print "</table></body></html>\n";
$dbh->disconnect;
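The report is built from four groupings of the audit rows: hits per automation, hits per five-minute slot within the hour, hits per hour, and the distribution of hits per event. The Python sketch below reproduces only that bucketing arithmetic on in-memory rows, as a way to check the logic without a database. The function name and the row layout are assumptions made for this illustration.

```python
from collections import Counter


def summarize(rows):
    """rows: iterable of (occurred_epoch_seconds, entity, identifier) tuples."""
    per_event, per_automation = Counter(), Counter()
    hourly, periods = Counter(), Counter()
    for occurred, entity, identifier in rows:
        per_event[identifier] += 1
        # An empty entity means the update did not come from an automation
        per_automation[entity or "Non Automation Update"] += 1
        hourly[occurred - occurred % 3600] += 1   # epoch second at the start of the hour
        periods[(occurred % 3600) // 300] += 1    # five-minute slot within the hour, 0..11
    # Map "number of hits" to "number of events with that many hits"
    distribution = Counter(per_event.values())
    return per_automation, periods, hourly, distribution
```

A slot number n corresponds to the report label "Period 5n-5(n+1) Min", so slot 0 is "Period 0-5 Min" and slot 11 is "Period 55-60 Min".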
APPENDIX H: Event Audit Report (Example 1)
This is the first example using the Automation Audit Report script. In this case we are
interested in the audit trail for a particular Identifier. We accomplish this by specifying the
identifier through the following command:
perl AuditTable.cgi -Identifier 'den0118-admw103-2003OMX DS1VME CLI LOST1E1LOS0303-2003-202' > q.html
This produces the following report:
AUTOMATION AUDIT REPORT
Against All Events Where Identifier =
For period from 09/21/2009 20:00:00 to 09/26/2009 15:00:00
Automation
Hit Count
GenericClear Automation-1:
625
Time Periods
Number of Events during period
Period 0-5 Min:
52
Period 5-10 Min:
53
Period 10-15 Min:
51
Period 15-20 Min:
52
Period 20-25 Min:
55
Period 25-30 Min:
53
Period 30-35 Min:
53
Period 35-40 Min:
50
Period 40-45 Min:
52
Period 45-50 Min:
52
Period 50-55 Min:
51
Period 55-60 Min:
51
Hourly History
Number of Hits
09/21/2009 20 hr
5
09/21/2009 21 hr
5
09/21/2009 22 hr
6
09/21/2009 23 hr
5
09/22/2009 00 hr
6
09/22/2009 01 hr
5
09/22/2009 02 hr
6
09/22/2009 03 hr
5
09/22/2009 04 hr
5
09/22/2009 05 hr
6
09/22/2009 06 hr
5
09/22/2009 07 hr
6
09/22/2009 08 hr
5
09/22/2009 09 hr
6
09/22/2009 10 hr
5
09/22/2009 11 hr
6
09/22/2009 12 hr
5
09/22/2009 13 hr
5
09/22/2009 14 hr
6
09/22/2009 15 hr
5
09/22/2009 16 hr
5
09/22/2009 17 hr
6
09/22/2009 18 hr
5
09/22/2009 19 hr
5
09/22/2009 20 hr
6
09/22/2009 21 hr
5
09/22/2009 22 hr
6
09/22/2009 23 hr
5
09/23/2009 00 hr
6
09/23/2009 01 hr
5
09/23/2009 02 hr
6
09/23/2009 03 hr
5
09/23/2009 04 hr
5
09/23/2009 05 hr
6
09/23/2009 06 hr
5
09/23/2009 07 hr
6
09/23/2009 08 hr
5
09/23/2009 09 hr
6
09/23/2009 10 hr
5
09/23/2009 11 hr
5
09/23/2009 12 hr
6
09/23/2009 13 hr
5
09/23/2009 14 hr
6
09/23/2009 15 hr
4
09/23/2009 16 hr
6
09/23/2009 17 hr
5
09/23/2009 18 hr
5
09/23/2009 19 hr
6
09/23/2009 20 hr
5
09/23/2009 21 hr
6
09/23/2009 22 hr
5
09/23/2009 23 hr
6
09/24/2009 00 hr
5
09/24/2009 01 hr
6
09/24/2009 02 hr
5
09/24/2009 03 hr
5
09/24/2009 04 hr
6
09/24/2009 05 hr
5
09/24/2009 06 hr
6
09/24/2009 07 hr
5
09/24/2009 08 hr
6
09/24/2009 09 hr
5
09/24/2009 10 hr
5
09/24/2009 11 hr
5
09/24/2009 12 hr
5
09/24/2009 13 hr
6
09/24/2009 14 hr
5
09/24/2009 15 hr
6
09/24/2009 16 hr
5
09/24/2009 17 hr
6
09/24/2009 18 hr
5
09/24/2009 19 hr
6
09/24/2009 20 hr
5
09/24/2009 21 hr
5
09/24/2009 22 hr
6
09/24/2009 23 hr
5
09/25/2009 00 hr
6
09/25/2009 01 hr
5
09/25/2009 02 hr
6
09/25/2009 03 hr
5
09/25/2009 04 hr
5
09/25/2009 05 hr
6
09/25/2009 06 hr
5
09/25/2009 07 hr
6
09/25/2009 08 hr
5
09/25/2009 09 hr
6
09/25/2009 10 hr
5
09/25/2009 11 hr
5
09/25/2009 12 hr
6
09/25/2009 13 hr
5
09/25/2009 14 hr
6
09/25/2009 15 hr
5
09/25/2009 16 hr
6
09/25/2009 17 hr
5
09/25/2009 18 hr
6
09/25/2009 19 hr
5
09/25/2009 20 hr
5
09/25/2009 21 hr
6
09/25/2009 22 hr
5
09/25/2009 23 hr
6
09/26/2009 00 hr
5
09/26/2009 01 hr
6
09/26/2009 02 hr
5
09/26/2009 03 hr
5
09/26/2009 04 hr
6
09/26/2009 05 hr
5
09/26/2009 06 hr
6
09/26/2009 07 hr
5
09/26/2009 08 hr
6
09/26/2009 09 hr
5
09/26/2009 10 hr
6
09/26/2009 11 hr
5
09/26/2009 12 hr
5
09/26/2009 13 hr
6
09/26/2009 14 hr
5
09/26/2009 15 hr
3
Events Hit X Times by
Automations
Number of Events with X Hits
Events hit 625 times:
1
APPENDIX I: Event Audit Report (Example 2)
This is the second example using the Automation Audit Report script. In this case we are
interested in all events for a node within a time window. We accomplish this by specifying:
• Node
• Start time, as seconds prior to now
• End time for the period, as seconds prior to now
This translates to the following command:
perl AuditTable.cgi -Node 'wdc1749-rocm1' -start 6220800 -end 0 > q.html
AUTOMATION AUDIT REPORT
Against All Events Where Node = wdc1749-rocm1
For period from 09/21/2009 19:00:00 to 09/26/2009 15:00:00
Automation
Hit Count
Non Automation Update:
EscalateAlarms Automation -1:
2
EscalateAlarms Automation-1:
22
GenericClear Automation-1:
69491
GenericClear Automation-2:
58919
Level1Page Automation-1:
181
Level1Page Automation-3:
11
Level1Page Automation-5:
3
Time Periods
Number of Events during period
Period 0-5 Min:
11375
Period 5-10 Min:
12114
Period 10-15 Min:
10671
Period 15-20 Min:
11495
Period 20-25 Min:
10392
Period 25-30 Min:
10638
Period 30-35 Min:
11969
Period 35-40 Min:
11043
Period 40-45 Min:
12175
Period 45-50 Min:
10442
Period 50-55 Min:
11780
Period 55-60 Min:
12088
Hourly History
Number of Hits
09/21/2009 19 hr
3
09/21/2009 20 hr
855
09/21/2009 21 hr
1060
09/21/2009 22 hr
1123
09/21/2009 23 hr
891
09/22/2009 00 hr
1131
09/22/2009 01 hr
1099
09/22/2009 02 hr
1053
09/22/2009 03 hr
1109
09/22/2009 04 hr
914
09/22/2009 05 hr
1052
09/22/2009 06 hr
950
09/22/2009 07 hr
998
09/22/2009 08 hr
933
09/22/2009 09 hr
819
09/22/2009 10 hr
944
09/22/2009 11 hr
738
09/22/2009 12 hr
913
09/22/2009 13 hr
814
09/22/2009 14 hr
991
09/22/2009 15 hr
1071
09/22/2009 16 hr
1136
09/22/2009 17 hr
1089
09/22/2009 18 hr
1081
09/22/2009 19 hr
1038
09/22/2009 20 hr
1090
09/22/2009 21 hr
1003
09/22/2009 22 hr
1106
09/22/2009 23 hr
1087
09/23/2009 00 hr
892
09/23/2009 01 hr
1068
09/23/2009 02 hr
1084
09/23/2009 03 hr
949
09/23/2009 04 hr
1182
09/23/2009 05 hr
1070
09/23/2009 06 hr
983
09/23/2009 07 hr
993
09/23/2009 08 hr
891
09/23/2009 09 hr
1135
09/23/2009 10 hr
969
09/23/2009 11 hr
916
09/23/2009 12 hr
1034
09/23/2009 13 hr
815
09/23/2009 14 hr
2092
09/23/2009 15 hr
1387
09/23/2009 16 hr
1290
09/23/2009 17 hr
1019
09/23/2009 18 hr
991
09/23/2009 19 hr
858
09/23/2009 20 hr
1019
09/23/2009 21 hr
961
09/23/2009 22 hr
1070
09/23/2009 23 hr
1062
09/24/2009 00 hr
1006
09/24/2009 01 hr
1195
09/24/2009 02 hr
1160
09/24/2009 03 hr
1090
09/24/2009 04 hr
1123
09/24/2009 05 hr
1055
09/24/2009 06 hr
1021
09/24/2009 07 hr
1020
09/24/2009 08 hr
2324
09/24/2009 09 hr
1322
09/24/2009 10 hr
1037
09/24/2009 11 hr
1736
09/24/2009 12 hr
2774
09/24/2009 13 hr
1356
09/24/2009 14 hr
2472
09/24/2009 15 hr
1116
09/24/2009 16 hr
1187
09/24/2009 17 hr
1221
09/24/2009 18 hr
1361
09/24/2009 19 hr
1233
09/24/2009 20 hr
1472
09/24/2009 21 hr
1539
09/24/2009 22 hr
1348
09/24/2009 23 hr
1230
09/25/2009 00 hr
1206
09/25/2009 01 hr
1292
09/25/2009 02 hr
1195
09/25/2009 03 hr
1115
09/25/2009 04 hr
1199
09/25/2009 05 hr
994
09/25/2009 06 hr
1027
09/25/2009 07 hr
1089
09/25/2009 08 hr
965
09/25/2009 09 hr
945
09/25/2009 10 hr
1424
09/25/2009 11 hr
1188
09/25/2009 12 hr
977
09/25/2009 13 hr
1000
09/25/2009 14 hr
1057
09/25/2009 15 hr
965
09/25/2009 16 hr
1196
09/25/2009 17 hr
1089
09/25/2009 18 hr
1250
09/25/2009 19 hr
1091
09/25/2009 20 hr
1468
09/25/2009 21 hr
1138
09/25/2009 22 hr
1340
09/25/2009 23 hr
2162
09/26/2009 00 hr
1100
09/26/2009 01 hr
1455
09/26/2009 02 hr
1076
09/26/2009 03 hr
1041
09/26/2009 04 hr
1136
09/26/2009 05 hr
984
09/26/2009 06 hr
1129
09/26/2009 07 hr
1688
09/26/2009 08 hr
1467
09/26/2009 09 hr
1671
09/26/2009 10 hr
1980
09/26/2009 11 hr
1606
09/26/2009 12 hr
1421
09/26/2009 13 hr
1152
09/26/2009 14 hr
1187
09/26/2009 15 hr
498
Events Hit X Times by
Automations
Number of Events with X Hits
Events hit 1 times:
2086
Events hit 2 times:
520
Events hit 3 times:
622
Events hit 4 times:
226
Events hit 5 times:
185
Events hit 6 times:
135
Events hit 7 times:
110
Events hit 8 times:
52
Events hit 9 times:
25
Events hit 10 times:
24
Events hit 11 times:
31
Events hit 12 times:
10
Events hit 13 times:
8
Events hit 14 times:
44
Events hit 15 times:
121
Events hit 16 times:
217
Events hit 17 times:
24
Events hit 18 times:
4
Events hit 19 times:
7
Events hit 20 times:
4
Events hit 21 times:
5
Events hit 22 times:
5
Events hit 23 times:
8
Events hit 24 times:
5
Events hit 25 times:
1
Events hit 26 times:
5
Events hit 27 times:
4
Events hit 28 times:
3
Events hit 29 times:
11
Events hit 30 times:
6
Events hit 31 times:
6
Events hit 32 times:
6
Events hit 33 times:
1
Events hit 34 times:
2
Events hit 35 times:
2
Events hit 37 times:
1
Events hit 38 times:
2
Events hit 39 times:
1
Events hit 40 times:
1
Events hit 42 times:
3
Events hit 43 times:
1
Events hit 44 times:
2
Events hit 47 times:
2
Events hit 50 times:
3
Events hit 51 times:
3
Events hit 52 times:
1
Events hit 53 times:
4
Events hit 55 times:
2
Events hit 57 times:
1
Events hit 62 times:
2
Events hit 63 times:
2
Events hit 64 times:
1
Events hit 67 times:
1
Events hit 72 times:
1
Events hit 73 times:
1
Events hit 74 times:
2
Events hit 76 times:
1
Events hit 84 times:
1
Events hit 85 times:
2
Events hit 89 times:
1
Events hit 91 times:
1
Events hit 93 times:
4
Events hit 94 times:
2
Events hit 95 times:
1
Events hit 97 times:
4
Events hit 101 times:
1
Events hit 105 times:
1
Events hit 111 times:
1
Events hit 112 times:
1
Events hit 117 times:
2
Events hit 118 times:
1
Events hit 123 times:
1
Events hit 128 times:
1
Events hit 129 times:
2
Events hit 130 times:
1
Events hit 131 times:
2
Events hit 133 times:
1
Events hit 134 times:
1
Events hit 138 times:
1
Events hit 139 times:
1
Events hit 140 times:
1
Events hit 141 times:
1
Events hit 142 times:
1
Events hit 145 times:
1
Events hit 146 times:
1
Events hit 150 times:
1
Events hit 153 times:
2
Events hit 154 times:
1
Events hit 155 times:
2
Events hit 158 times:
1
Events hit 159 times:
2
Events hit 162 times:
1
Events hit 165 times:
1
Events hit 170 times:
1
Events hit 171 times:
1
Events hit 172 times:
2
Events hit 174 times:
1
Events hit 175 times:
1
Events hit 177 times:
1
Events hit 178 times:
3
Events hit 179 times:
1
Events hit 182 times:
2
Events hit 184 times:
1
Events hit 188 times:
1
Events hit 194 times:
1
Events hit 205 times:
1
Events hit 207 times:
1
Events hit 208 times:
1
Events hit 219 times:
1
Events hit 220 times:
2
Events hit 228 times:
1
Events hit 236 times:
1
Events hit 239 times:
1
Events hit 253 times:
1
Events hit 278 times:
2
Events hit 299 times:
2
Events hit 301 times:
1
Events hit 302 times:
1
Events hit 305 times:
1
Events hit 391 times:
1
Events hit 437 times:
1
Events hit 438 times:
1
Events hit 450 times:
1
Events hit 454 times:
1
Events hit 457 times:
1
Events hit 538 times:
1
Events hit 571 times:
2
Events hit 573 times:
2
Events hit 623 times:
1
Events hit 624 times:
4
Events hit 625 times:
2
Events hit 627 times:
1
Events hit 634 times:
1
Events hit 665 times:
1
Events hit 809 times:
1
Events hit 812 times:
2
Events hit 814 times:
2
Events hit 816 times:
1
Events hit 817 times:
1
Events hit 818 times:
1
Events hit 819 times:
3
Events hit 820 times:
1
Events hit 822 times:
3
Events hit 823 times:
2
Events hit 824 times:
1
Events hit 876 times:
1
Events hit 1440 times:
1
Events hit 1442 times:
1
Events hit 1443 times:
2
Events hit 1806 times:
2
Events hit 1854 times:
1
Events hit 2155 times:
1
Events hit 2160 times:
1
Events hit 2164 times:
1
Events hit 2165 times:
1
Events hit 3139 times:
1
Events hit 3189 times:
1
Events hit 3198 times:
1
Events hit 3257 times:
1
Events hit 3268 times:
1
Events hit 3915 times:
1
Events hit 3924 times:
1
Events hit 4255 times:
1
Events hit 4346 times:
1
Events hit 4362 times:
1
Events hit 5030 times:
1
Events hit 6549 times:
1
Events hit 6848 times:
1
APPENDIX J: Impact-like PERL Shell
The following is a skeleton that provides a basic example of an Impact-like Perl program.
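Two mechanisms described in the header of the listing that follows are easy to misread: the -debug value is a bitmask whose flags are tested with a bitwise AND, and each cycle sleeps for whatever remains of the cycle period but never less than the minimum rest. The Python sketch below illustrates both under those stated assumptions. It is not taken from the Perl main loop, and the constant and function names are hypothetical.

```python
# Debug flags as listed in the script header; they combine by addition or bitwise OR
DEBUG_ERRORS, DEBUG_FUNCS, DEBUG_NODE, DEBUG_DUMP = 1, 2, 4, 8
DEBUG_ONCE, DEBUG_CLEARLOG, DEBUG_INITIAL, DEBUG_DRYRUN = 16, 32, 64, 128


def debug_enabled(debug: int, flag: int) -> bool:
    """True when the given flag bit is set in the -debug value."""
    return bool(debug & flag)


def rest_seconds(cycle: float, cycle_min: float, elapsed: float) -> float:
    """Sleep out the remainder of the cycle, but never less than cycle_min."""
    return max(cycle - elapsed, cycle_min)
```

With the defaults shown in the listing (a 60-second cycle and a 10-second minimum), a pass that takes 20 seconds is followed by a 40-second sleep, and a pass that overruns to 75 seconds is still followed by a 10-second sleep. A -debug value of 17 enables error logging (1) and the single-pass mode (16).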
#!/usr/bin/perl
################################################################################
# 09/24/2009
Daniel L. Needles
Version 0.1
#
# PROGRAM: psuedoimpact
#
# USAGE: psuedoimpact
#
#
[-debug <debug number 1-255>]
#
#
1
- Log any warning or errors.
#
#
2
- Function Entry and exit logging.
#
#
4
- Inner function verbose logging
#
#
8
- Dump before and after Netcool alert.status fields.
#
#
16 - Query Netcool alerts.status only once.
#
#
32 - Clear old log.
#
#
64 - Show initial alert.status field values.
#
#
128 - Do not perform update, instead document
#
#
what would be done.
#
#
[-node] (Node to perform extra logging on.)
#
#
[-logfile] (full path and file to log file.)
#
#
*[-sumfile] (full path and file to the change summary file.)
#
#
[-cycle] (Cycle period in seconds.)
#
#
[-cyclemin] (minimum delay between cycles, even if the cycle
#
#
overruns.)
#
#
[-delay] (Ignore events with Populated field younger than)
#
#
[-stdout] --Log to standard out
#
#
[-help]
#
# DESCRIPTION: Program enriches Netcool based on Oracle info:
#
#
@Node
: With HOST name, if Node=IP and HOST in Oracle.
#
#
CIRC_PATH_HUM_ID.
#
#
@Customer
: With Customer information found ($customer).
#
#
Journal
: With warning, errors found by Populate.pl.
#
# PURPOSE: Program replaces Impact's enrichment functionality and
#
#
updates Netcool with information pulled from Oracle.
#
# PSUDO CODE:
#
# 1. Declare packages,environment vars,database security vars, command line
#
#
parmaeter defaults, other database assignments.
#
# 2. Declare and prepare Oracle SQL statements for tables: equipment, site,
#
#
card, path, port, and channel.
#
# 3. Load procedures: LogMsg, dumpnetcool, usage, commandline,
#
#
Each is documented regarding PSUDO code.
#
# 4. Run main program:
#
#
a. Prepare Netcool SQL for alert.status and open logfile and change
#
#
summary file.
#
#
b. Loop forever (unless DEBUG & 16 in which case only loop once.)
#
#
1. Pull events from Netcool alerts.status
#
#
2. For each event
#
#
a. Clear data structures.
#
#
b. Perform Impact Policy work
#
#
i. Build and execute UPDATE against Netcool alerts.status to update #
#
the event.
#
#
3. Sleep the remaining time in the cycle and log if the cyle has
#
#
overran its length. Make sure to sleep a minimum time between
#
#
cycles ($CYCLEMIN)
#
#
c. Disconnect from Netcool and Oracle.
#
Impact without Impact
APPENDIX I: Impact-like PERL Shell  86
################################################################################
# ERRORS AND WARNINGS:
#     There are several errors and warnings logged by the system. These are
#     stored in the Journal entries for the events and the log file IF DEBUG
#     IS SET TO 1. (The default value of DEBUG is 1 if not specified.)
#     The errors occur within the Policy code's exceptions and are coded as
#     follows:
#
#     WARN: <ERROR NUMBER>: <ERROR MESSAGE>
#
################################################################################
# DEBUG RULES:
#    0 - No logging.
#    1 - Errors and warning logging.
#    2 - Function entry and exit logging.
#    4 - Node based logging.
#    8 - Dump before and after Netcool alerts.status fields.
#   16 - Query Netcool alerts.status only once.
#   32 - Clear old log.
#   64 - Show initial alerts.status field values.
#  128 - Do not perform update, instead document what would be done.
#
################################################################################
use strict;           ## TRAINING WHEELS ON
###################################### PACKAGES ################################
use Getopt::Long;     ## HANDLES COMMAND LINE OPTIONS
use DBI;              ## ACCESS TO NETCOOL AND ORACLE VIA DATABASES
use Time::localtime;  ## Convert raw seconds to date
############################# ENVIRONMENT VARIABLES ############################
$ENV{SYBASE}='/usr/local'; ## Location of SYBASE
$ENV{LD_LIBRARY_PATH}='/usr/local/lib:/usr/lib:/opt/oracle/orahome/lib';
$ENV{'ORACLE_HOME'}='/opt/oracle/orahome';
########################## DATABASE PERMISSIONS AND IDENT ######################
my $ORA_USER='USER';      ## Oracle USER
my $ORA_PASS='PASSWORD';  ## Oracle PASSWORD
my $NETCOOL_USER='root';  ## Netcool USER
my $NETCOOL_PASS='';      ## Netcool PASSWORD
my $NETCOOL_UID=500;      ## Netcool USERID (Needed to insert journals)
################### COMMAND LINE OPTION DEFAULTS ###############################
my $STDOUT=0;         ## DEFAULT TO OUTPUT TO LOG FILE RATHER THAN STDOUT
my $NODE;             ## Used for Debug & 4.
my $DEBUG=1;          ## DEFAULT TO PRINT ERRORS AND WARNS TO THE LOG
my $CYCLESPEED=60;    ## CYCLE PERIOD IN SECONDS
my $CYCLEMIN=10;      ## MIN REST PERIOD BETWEEN CYCLES IN SECONDS
my $ISPRIMARY=1;      ## IS THIS THE PRIMARY PROGRAM?
my $SECONDARYDELAY=$CYCLESPEED*2; ## IF SECONDARY WHAT ADDITIONAL DELAY IS USED
my $EVENTDELAY=($ISPRIMARY)?259200:345600; ## REPOPULATE EVENTS 3 DAYS OLD (259200 SECS)
                      ## ON PRIMARY AND 4 DAYS OLD (345600 SECS) ON SECONDARY
my $POPULATESERVER=($ISPRIMARY)?'Primary':'Secondary'; ## WILL BE CALCULATED AGAIN
                      ## AFTER COMMANDLINE ARGS
my $LOGFILE='';       ## ASSUME NO LOG FILE NAME.
my $SUMMARYFILE='';   ## ASSUME NO 'SUMMARY OF CHANGES' FILE
##################### OTHER DATABASE ASSIGNMENTS ###############################
my ($NC,$XNG_CARD,$XNG_DS1CHANNEL,$XNG_EQUIPMENT,$XNG_PATH,$XNG_PORT,$XNG_SITE,
    $XNG_STSPORT);    ## DATABASE TABLE HANDLES.
my $NCJournalEntry;   ## JOURNAL UPDATES
my %NCDMP;            ## DETECTS UPDATES TO NETCOOL (BEFORE AND AFTER)
my $oracle_dbh;       ## ORACLE DATABASE HANDLE
my $netcool_dbh;      ## NETCOOL DATABASE HANDLE
my $UPDATESET='';     ## USE TO BUILD UPDATE SET COMMAND CLAUSE
my @UPDATECOL= qw/Customer Node Site Summary/;
                      ## FULL INVENTORY OF FIELDS THAT CAN BE SET
########################### DATABASE SET UP ##################################
## OPEN CONNECTION TO ORACLE
$oracle_dbh = DBI->connect( 'dbi:Oracle:oracle', $ORA_USER, $ORA_PASS,
    { RaiseError => 1, PrintError => 1, AutoCommit => 1 } )
    or die "Connection failed to Oracle (oracle): " . $DBI::errstr;
## OPEN CONNECTION TO NETCOOL
$netcool_dbh = DBI->connect( 'dbi:Sybase:NCOMS', $NETCOOL_USER, $NETCOOL_PASS,
    { RaiseError => 1, PrintError => 0, AutoCommit => 1 } )
    or die "Connection failed: " . $DBI::errstr;
## BIND VARIABLES FOR THE NETCOOL EVENT FIELDS IN FORM: NC<FIELD NAME>
my ($NCSummary,$NCSite,$NCSerial,$NCTally,$NCSeverity,$NCNode,$NCIdentifier,
    $NCCustomer,$NCAlertKey,$NCAlertGroup);
my $NCERROR; ## ERROR TRACKING...
## SQL PREP: DEFINE THE COLUMNS SELECTED FROM NETCOOL
## (MUST MATCH THE ORDER OF THE bind_columns CALL IN THE MAIN LOOP)
my $IMPACTSELECT="Summary,Site,Serial,Tally,Severity,Node,Identifier,Customer,AlertKey,AlertGroup";
my $IMPACTFILTER; ## NETCOOL FILTER HAS TO BE SET AFTER COMMANDLINE IS CALLED.
################################################################################
##################### SQL PREPARE STATEMENTS ###################################
####### THIS, ALONG WITH BIND_COLUMNS IS USED TO SPEED UP THINGS 10X ###########
################################################################################
# HERE ARE SOME EXAMPLES OF ORACLE SQL PREP STATEMENTS
#
####################### SQLPREP: ORACLE EQUIPMENT TABLE #######################
my ($EQEQUIP,$EQDESCR,$EQIP_ADDRESS,$EQMODEL);
## VERSION 1: LOOKUP ON IP ADDRESS
my $xng_equipment1 = $oracle_dbh->prepare(<<SQL);
SELECT
EQUIP,
DESCR,
IP_ADDRESS,
MODEL
FROM webuser.IMP_OSS_EQUIP_INST
WHERE
IP_ADDRESS = ?
SQL
#$xng_equipment1->bind_columns(\$EQEQUIP,\$EQDESCR,\$EQIP_ADDRESS,\$EQMODEL);
## VERSION 2: LOOKUP ON HOSTNAME
my $xng_equipment2 = $oracle_dbh->prepare(<<SQL);
SELECT
EQUIP,
DESCR,
IP_ADDRESS,
MODEL
FROM webuser.IMP_OSS_EQUIP_INST
WHERE
DESCR = ?
SQL
#$xng_equipment2->bind_columns(\$EQEQUIP,\$EQDESCR,\$EQIP_ADDRESS,\$EQMODEL);
################################################################################
####################### END OF SQL PREPARE SECTION #############################
################################################################################
################################################################################
# PROCEDURE: LogMsg
# PURPOSE: Write messages to logfiles or whatever else is needed.
################################################################################
sub LogMsg {
# NOTE: DO NOT CALL LogMsg to declare Function ENTRY,EXIT FOR LOGMSG - INFINITE LOOP
my $msg = shift;
my $tm = localtime(time());
my $buf=sprintf("%02d/%02d/%04d %02d:%02d:%02d:",
$tm->mon+1, $tm->mday, $tm->year+1900,
$tm->hour, $tm->min, $tm->sec);
print LOG "$buf $msg\n";
## LOG ERROR AND WARNINGS ADDED TO JOURNAL
if ($msg =~ /^ERROR:/) {
$NCJournalEntry .= " ERROR: contact Netcool admin. $msg";
} elsif ($msg =~ /^WARN:/) {
$NCJournalEntry.=" $msg";
}
return;
}
################################################################################
# PROCEDURE: dumpprint
# PURPOSE: If debug & 8 is set, then this procedure is called. The procedure
#          documents the changes done to the particular event.
################################################################################
sub dumpprint {
    ## LOG THE BEFORE AND AFTER IMAGE OF EACH POTENTIALLY UPDATABLE NETCOOL
    # EVENT FIELD TO THE LOG. (NOTE EACH FIELD WILL BE OF THE FORM:
    #     <BEFORE VALUE>~<AFTER VALUE>
    # IF THERE HAS BEEN NO CHANGE TO THE FIELD THEN THE UPDATE WILL SHOW:
    #     ~
    # )
    LogMsg("\n************ EVENT DUMP ******************");
    LogMsg("Customer: $NCDMP{Customer}");
    LogMsg("Node:     $NCDMP{Node}");
    LogMsg("Site:     $NCDMP{Site}");
    LogMsg("Summary:  $NCDMP{Summary}");
    LogMsg("JOURNAL:  $NCDMP{JournalEntry}");
    LogMsg("Error:    $NCERROR");
    LogMsg("******************************************\n");
    ## SPECIAL FILTERS TO MAKE IT EASIER TO READ THE CHANGE SUMMARY FILE
    my $journalit=($NCTally < 30)?'Yes':'No';
    ## DUMP A ROW TO THE CHANGE SUMMARY FILE THAT REPRESENTS THE EVENT
    print SUM join('~', $NCNode, $EQDESCR, $EQIP_ADDRESS, $EQMODEL,
                   $NCAlertGroup, $NCAlertKey, $journalit,
                   $NCDMP{Customer}, $NCDMP{Node}, $NCDMP{Site},
                   $NCDMP{Summary}, $NCDMP{JournalEntry}, $NCERROR), "\n";
    return;
}
################################################################################
# PROCEDURE: usage
# PURPOSE: Reports available usage parameters for the program to
#          standard out.
################################################################################
sub usage {
print <<EOF;
USAGE: psuedoimpact
  [-debug <debug number 1-255>]
        1   - Log any warning or errors.
        2   - Function Entry and exit logging.
        4   - Inner function verbose logging
        8   - Dump before and after Netcool alerts.status fields.
        16  - Query Netcool alerts.status only once.
        32  - Clear old log.
        64  - Show initial alerts.status field values.
        128 - Do not perform update, instead document
              what would be done.
  [-node]     (Node to perform extra logging on.)
  [-logfile]  (full path and file to log file.)
 *[-sumfile]  (full path and file to the change summary file.)
  [-cycle]    (Cycle period in seconds.)
  [-cyclemin] (minimum delay between cycles, even if the cycle
              overruns.)
  [-delay]    (Ignore events with Populated field younger than
              this many seconds.)
  [-stdout]   --Log to standard out
  [-help]

NOTE: Program must run as the user root

DEBUGGING *************************
There are 8 flags available for debugging.
When diagnosing a specific problem in development, -debug 255 or -debug 127
is used as it will:
    Clear the old log (32)
    Query the events only once and exit the program (16)
    Show any warning or errors (1)
    Show entry and exit from functions (2)
    Provide verbose logging of the inner workings (4)
    Dump the initial values of the event's fields that can be updated (64)
    Dump the before and after values (if there is a change) for each event (8)
    Write a summary of the updates for each event in the sumfile (8)
Normal operation of the program can use the default DEBUG value of 1, which
will log WARNING and ERRORS to an event's journal as well as to the
logfile.

SUMFILE ***************************
* When debug flag 8 is set, the summary file contains a record of all
  proposed (debug & 128) or actual changes made in a ~ delimited file that
  can easily be viewed with Excel. It also includes any journal entries or
  detected errors. Each row represents one event. The following is a column
  description:
  -- Fields used to identify and select the event...
     Node       - Node referenced in the event.
     DESCR      - Description of the Node (From Oracle.)
     IP_ADDRESS - IP Address of the Node (From Oracle.)
     MODEL      - Model of the Node (From Oracle.)
     AlertGroup - Netcool event field
     AlertKey   - Netcool event field
     IsJournal  - Journal exists?
  -- Fields show the before and after values of Netcool fields for the
     event. If there is no change, null is shown in the before and after
     values to make the changes stand out. Each field is repeated twice
     to show the before and after values respectively. The columns include:
     Customer Node Site Summary JournalEntry
  -- Error messages against the event from processing.
     ERRORS
EOF
exit;
}
################################################################################
# PROCEDURE: commandline
# PURPOSE: Parses commandline parameters and sets their values
################################################################################
sub commandline {
my $HELP;
    GetOptions("debug=i"    => \$DEBUG,
               "stdout"     => \$STDOUT,
               "node=s"     => \$NODE,
               "cycle=i"    => \$CYCLESPEED,
               "cyclemin=i" => \$CYCLEMIN,
               "logfile=s"  => \$LOGFILE,
               "sumfile=s"  => \$SUMMARYFILE,
               "delay=i"    => \$EVENTDELAY,
               'h|help|?'   => \$HELP);
    if ( $HELP ) {
        usage();   ## PRINTS THE USAGE TEXT AND EXITS
    }
    return;
}
###############################################################################
# PROCEDURE: POLICY-SPECIFIC-CODE
# PURPOSE: Code specific to the policy shows up as procedures.
# NETCOOL: The procedure assigns the following fields
#     @Site : Oracle <Database>.<Field>
# PSEUDO CODE:
#
###############################################################################
sub PolicySpecificCode {
    ($DEBUG & 2) && LogMsg('Entering PolicySpecificCode');
    ## ::
    ## :: ::   (POLICY-SPECIFIC CODE GOES HERE)
    ($DEBUG & 4) && LogMsg("PolicySpecificCode: AlertGroup: $NCAlertGroup\n Site: $NCSite");
    ## ::
    ## :: ::   (POLICY-SPECIFIC CODE GOES HERE)
    ($DEBUG & 2) && LogMsg('Exiting PolicySpecificCode');
    return;
}
################################################################################
################################## MAIN ######################################
################################################################################
$|++; # Unbuffer output
commandline(); ## PARSE PARAMETERS
## NOTE: " ...IMPACTFILTER... " IS A PLACEHOLDER FOR THE SITE-SPECIFIC FILTER
$IMPACTFILTER=($ISPRIMARY)
    ? "(Populated < getdate-$EVENTDELAY) and (Severity > 0) and (Type = 1) and ( ...IMPACTFILTER... )"
    : "(Populated < getdate-$EVENTDELAY) and (Severity > 0) and (LastOccurrence<getdate-$SECONDARYDELAY) and (Type = 1) and ( ...IMPACTFILTER... )";
my $nc = $netcool_dbh->prepare("SELECT $IMPACTSELECT FROM status WHERE $IMPACTFILTER");
$POPULATESERVER=($ISPRIMARY)?'Primary':'Secondary';
($DEBUG & 2) && LogMsg("Starting Main Program (after comandline parsing)");
## OPEN LOG
$LOGFILE=($LOGFILE)?$LOGFILE:'psuedoimpact.log'; ## DEFAULT FILE.
if ($DEBUG & 32) {
open(LOG, "> $LOGFILE") || die("Can't open '$LOGFILE'");
} else {
open(LOG, ">> $LOGFILE") || die("Can't open '$LOGFILE'");
}
select(LOG); $|=1; select(STDOUT); ## Unbuffer logging
## OPEN SUMMARY OF CHANGES TO EVENTS LOG
$SUMMARYFILE=($SUMMARYFILE)?$SUMMARYFILE:'psuedoimpact.sum'; ## DEFAULT FILE.
if ($DEBUG & 32) {
open(SUM, "> $SUMMARYFILE") || die("Can't open '$SUMMARYFILE'");
} else {
open(SUM, ">> $SUMMARYFILE") || die("Can't open '$SUMMARYFILE'");
}
select(SUM); $|=1; select(STDOUT); ## Unbuffer logging
my $loopstart; ## TIME THE START OF THE LOOP
my $istrue=1; ## FLAG TO ENABLE EXITING OF LOOP IF DEBUG 16 SET
## (ONLY LOOP ONCE)
## PRINT TABLE HEADER WITH ~ DELIMITED FIELDS TO CHANGE SUMMARY FILE
($DEBUG & 8) && print SUM "Node~DESCR~IP_ADDRESS~MODEL~AlertGroup~AlertKey~IsJournal~Customer~Customer~Node~Node~Site~Site~Summary~Summary~JournalEntry~JournalEntry~ERRORS\n";
############################# MAIN LOOP ########################################
while ( $istrue ) {
##########################################################################
########################## PREPARE NETCOOL QUERY #########################
##########################################################################
$loopstart=time();
## START TIMER
$istrue=($DEBUG & 16)?0:1; ## LOOP FOREVER, OR ONCE?
($DEBUG & 4) && LogMsg("netcool_dbh->prepare(\"SELECT $IMPACTSELECT FROM status
WHERE $IMPACTFILTER\")");
$nc->execute();
## GRAB UPDATED EVENTS FROM NETCOOL
$nc->bind_columns(\$NCSummary,\$NCSite,\$NCSerial,\$NCTally,\$NCSeverity,\$NCNode,
\$NCIdentifier,\$NCCustomer,\$NCAlertKey,\$NCAlertGroup);
my $ncidx=0;
## COUNT NETCOOL EVENTS. RESET AFTER EVERY LOOP
########################## PREPARE NETCOOL QUERY #########################
################################### END ################################
##########################################################################
######################## ITERATE OVER ALL NETCOOL EVENTS #################
##########################################################################
while ($NC = $nc->fetchrow_arrayref) {
######################################################################
####################### INITIALIZE DATA #############################
######################################################################
## REMOVE NULLS
for (my $i=0; $i<=$#$NC; $i++) {
$$NC[$i]=~s:\0::g;
}
$ncidx++;
## COUNT NETCOOL EVENTS. START AT 1.
## DEBUG AND DETECT CHANGES PREP
foreach my $item (keys %NCDMP) {   ## INITIALIZE HASH TO NULL
    $NCDMP{$item} = '';
}
$NCERROR='';          ## INITIALIZE ERROR DETECTION
$NCJournalEntry='';   ## CLEAR JOURNAL CACHE
## TRANSCRIBE BIND VARIABLES TO HASH
## BEFORE I UPDATE ANY FIELDS WHAT ARE THEIR INITIAL VALUES
$NCDMP{Customer} = $NCCustomer;
$NCDMP{Node} = $NCNode;
$NCDMP{Site} = $NCSite;
$NCDMP{Summary} = $NCSummary;
## $NCDMP{Class} = $NCClass; ## DISABLED: Class IS NOT IN $IMPACTSELECT AND $NCClass IS NOT DECLARED
## LOG INITIAL VALUES OF THE FIELDS THAT MAY BE UPDATED
if ($DEBUG & 64) {
LogMsg("BEFORE UPDATE");
foreach my $fld (sort keys %NCDMP) {
LogMsg(" $fld = $NCDMP{$fld}");
}
}
## LOG IF DEBUG ON THE NODE
if ( $DEBUG & 4) {
LogMsg("Processing " . uc($NCNode) . " - " . $NCSummary);
}
####################### INITIALIZE DATA #############################
############################### END ################################
######################################################################
#################### DETERMINE VALUES FROM ORACLE ####################
######################################################################
# The code that represents the work the policy does goes here
#
#################### DETERMINE VALUES FROM ORACLE ####################
############################### END ################################
######################################################################
####################### UPDATE EVENT LOGIC ##########################
######################################################################
## MARK EVENT AS TOUCHED IF JOURNAL UPDATED, ERROR LOGGED, OR PROPER UPDATE TIME
if ($NCJournalEntry || $NCERROR || (($NCSeverity > 3) and ($NCTally < 30))) {
my $tm = localtime(time());
my $buf=sprintf("%02d/%02d/%04d %02d:%02d:%02d: ",
$tm->mon+1, $tm->mday, $tm->year+1900,
$tm->hour, $tm->min, $tm->sec);
$NCJournalEntry = "POPULATE.PL: $buf $NCJournalEntry ";
}
## DETECT IF ANY CHANGE WAS MADE
$NCDMP{Customer}=($NCCustomer ne $NCDMP{Customer})?
"$NCDMP{Customer}~$NCCustomer":'~';
$NCDMP{Node}=($NCNode ne $NCDMP{Node})?"$NCDMP{Node}~$NCNode":'~';
$NCDMP{Site}=($NCSite ne $NCDMP{Site})?"$NCDMP{Site}~$NCSite":'~';
$NCDMP{Summary}=($NCSummary ne $NCDMP{Summary})?
"$NCDMP{Summary}~$NCSummary":'~';
$NCDMP{JournalEntry}=($NCJournalEntry ne $NCDMP{JournalEntry})?
"$NCDMP{JournalEntry}~$NCJournalEntry":'~';
## BUILD UPDATE STATEMENT (NOTE ALL FIELDS ARE STRINGS)
$UPDATESET='';
foreach my $fld (@UPDATECOL) {
if ($NCDMP{$fld} ne '~') {
my ($dmy,$val) = split /~/,$NCDMP{$fld};
## DO INTEGERS DIFFERENT FROM STRINGS
if ($fld eq 'Class' || $fld eq 'Severity') {
$UPDATESET.=",$fld = $val";
} else {
$UPDATESET.=",$fld = '$val'";
}
}
}
## UPDATE OR LOG NO UPDATE
if ($UPDATESET && !($DEBUG & 128)) {
$UPDATESET="PopulatedBy='$POPULATESERVER',Populated=$loopstart" . $UPDATESET;
if ($DEBUG & 8) {
dumpprint();
}
my $sth=$netcool_dbh->prepare(
qq{update status set $UPDATESET where Identifier = '$NCIdentifier'});
$sth->execute;
($DEBUG & 8) && LogMsg("netcool_dbh->do(UPDATE status SET $UPDATESET where
Identifier = '$NCIdentifier')");
if ($NCDMP{JournalEntry} ne '~') {   ## '~' MEANS THE JOURNAL DID NOT CHANGE
## DO JOURNAL ENTRY
my $sth2=$netcool_dbh->prepare(<<SQL);
INSERT INTO alerts.journal VALUES (
'$NCSerial:$NETCOOL_UID:$loopstart',
$NCSerial, $NETCOOL_UID, $loopstart, '$NCJournalEntry',
'', '', '', '', '', '', '', '', '', '', '', '', '', '', '')
SQL
($DEBUG & 8) && LogMsg("INSERT INTO alerts.journal VALUES ('$NCSerial:$NETCOOL_UID:$loopstart',$NCSerial,$NETCOOL_UID,$loopstart,'$NCJournalEntry','', '', '', '', '', '', '', '', '', '', '', '', '', '', '')");
$sth2->execute;
}
}
} else {
if (($DEBUG & 128) && $UPDATESET) {
$UPDATESET="PopulatedBy='$POPULATESERVER',Populated=$loopstart" . $UPDATESET;
if ($DEBUG & 8) {
dumpprint();
}
LogMsg("netcool_dbh->do(UPDATE status SET $UPDATESET where Identifier =
'$NCIdentifier')");
} elsif (($DEBUG & 8) || ($DEBUG & 128)) {
LogMsg("No update.");
## IF NO DEBUG THEN SAY NOTHING SO DONT NEED AN ELSE CLAUSE HERE...
}
}
####################### UPDATE EVENT LOGIC ##########################
################################ END #################################
#
}
######################## ITERATE OVER ALL NETCOOL EVENTS #################
################################### END ################################
## LOOP TIMING CALCULATIONS AND LOGGING
my $loopend=time();
my $loopperiod=$loopend-$loopstart;
my $SLEEP=(($loopperiod+$CYCLEMIN)<$CYCLESPEED)?$CYCLESPEED-$loopperiod:$CYCLEMIN;
if ($SLEEP == $CYCLEMIN) {
$loopperiod+=$SLEEP;
## THIS IS THE ONLY ERROR THAT DOESNT GET LOGGED TO AN EVENT BECAUSE IT IS GLOBAL
## IN NATURE
($DEBUG & 1) && LogMsg("ERROR: Poll cycle overrun. Cycle set at $CYCLESPEED secs but will be set to $loopperiod secs to allow for a minimum $CYCLEMIN secs break between cycles.");
}
($DEBUG & 4) && LogMsg("$ncidx Netcool events processed in $loopperiod seconds.
Sleeping $SLEEP seconds for loop period of $loopperiod (should be $CYCLESPEED. Minimum
rest allowed: $CYCLEMIN)");
if ( $istrue ) { ## IF DEBUG & 16 AND GOING TO EXIT ANYWAY, DON'T WAIT.
sleep($SLEEP);
}
}
$netcool_dbh->disconnect;
$oracle_dbh->disconnect;
($DEBUG & 2) && LogMsg("Exiting Main Program");
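
The cycle-timing rule at the end of the main loop (sleep out the remainder of the cycle, but never rest less than $CYCLEMIN seconds) is easier to check in isolation. The following is a minimal sketch and not part of the shell above: the helper name cycle_sleep and its three arguments are hypothetical, and the sample values assume the default 60 second cycle and 10 second minimum rest.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the $SLEEP calculation in the main loop.
# Returns (seconds to sleep, overrun flag) for one pass of the loop.
sub cycle_sleep {
    my ($elapsed, $cycle, $min_rest) = @_;

    # Enough time left in the cycle: sleep out the remainder.
    return ($cycle - $elapsed, 0) if $elapsed + $min_rest < $cycle;

    # Otherwise the pass ran long: rest the minimum and flag the overrun.
    return ($min_rest, 1);
}

# Walk a few elapsed times through a 60 second cycle with a 10 second minimum.
for my $elapsed (5, 49, 50, 75) {
    my ($sleep, $overrun) = cycle_sleep($elapsed, 60, 10);
    printf "elapsed=%2ds sleep=%2ds overrun=%s\n",
        $elapsed, $sleep, $overrun ? 'yes' : 'no';
}
```

Returning the overrun flag explicitly avoids inferring it from $SLEEP == $CYCLEMIN, as the shell does. The two are equivalent here because the first branch always yields a value greater than the minimum rest.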
IMPORTANT NOTICE
ALL INFORMATION PROVIDED IN THIS PAPER IS PROVIDED "AS IS" WITH
ALL FAULTS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR
IMPLIED. NMS GURU DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED
INCLUDING, WITHOUT LIMITATION, THOSE OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
NMS GURU SHALL NOT BE LIABLE FOR ANY INDIRECT, SPECIAL,
CONSEQUENTIAL, EXEMPLARY, PUNITIVE OR INCIDENTAL DAMAGES
INCLUDING, WITHOUT LIMITATION, LOST PROFITS OR REVENUES, COSTS
OF REPLACEMENT GOODS, LOSS OR DAMAGE TO DATA ARISING OUT OF
THE USE OR INABILITY TO USE ANY PRODUCT MENTIONED IN THIS PAPER,
DAMAGES RESULTING FROM USE OF OR RELIANCE ON THE INFORMATION
PRESENTED, EVEN IF NMS GURU HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
NMS GURU IS NOT LIABLE FOR THE ACCURACY OR UTILITY OF THE
INFORMATION CONTAINED IN THIS WHITE PAPER. NMS GURU'S
DISCUSSION OF ANOTHER COMPANY'S PRODUCTS AND/OR SERVICES DOES
NOT CONSTITUTE EITHER AN ENDORSEMENT OR A RECOMMENDATION.
THE CONTENTS OF THIS PAPER ARE FOR INFORMATION PURPOSES ONLY.
About NMS Guru
NMS Guru architects and manages comprehensive enterprise management solutions
through principal consultants with decades of experience and deep roots in the industry.
Specialties include: monitoring, performance, configuration, provisioning, change, and
security solutions for networks, systems, applications, and business processes. These
practices are integrated holistically by weaving together the strategic initiatives from above
(OSS/BSS, BPM, ITIL, FCAPS, TMN, etc) with the tactical realities from below (tools,
people, knowledge and processes.) The result is increased operational awareness and
extended useful lifespan of the enterprise management solution.
NMS Guru is headquartered in Austin, TX. (Along with NMS tools: IBM Tivoli, CA
NetQOS, SolarWinds, and many others.) For more information, visit the website at
http://nmsguru.com or call 1.512.617.6694.
Author
Dan Needles is the founder of NMS Guru. Over the past 20 years as an Enterprise IT
Operations Architect, he has designed and implemented fault, performance, and
configuration management solutions and products for over 50 Fortune 500 companies and
government entities. He can be contacted via guru@nmsguru.com or by calling 1.512.627.6694.