Implementing a Monitoring Solution for SCM EAI
centeractive AG

Overview
- Maximize application availability
- Accelerate event resolution
- Support root-cause analysis
- Support business-process remediation
- Reduce reliance on single-person knowledge
- Maintain extensibility
- Maintain control over

Implementation Strategy
- Select monitoring solution
- Start with simple requirements
- Extend functionality over time
- Plan for the future from the start

Select a Monitoring Solution
- In-house
- Open source
- Commercial

The Present Situation
- Enterprise monitoring of EAI components at SCM is limited
- Production: network, process, and server level via Hawk and OpenView
- Martin Green: component sanity checks via daily cron jobs
- No or limited support for root-cause analysis
- Causes wasted time, e.g. investigating log files

Martin Green's EAI Sanity Check Tool
- Text-only output
- Sanity checks of EAI components only
- No business intelligence
- Not real-time: currently run daily
- Alert sent by email
- Not in production?

Candidate Alternatives
- TIBCO products (Com)
  - Hawk
  - EMA
  - Business Factors
  - Business Events
- MicroMuse NetCool (Com)
- Big Sister (OSS)
- Nagios (OSS)
- In-house

Tibco Hawk
- Process-level monitoring
- Limited to up/down status alerts
- No business-process awareness
- Built into BW
- Hawk console only used by a limited group
- Production Support uses Hawk's certified messaging ledger files

Tibco EMA
- A bridge between Hawk and HP OpenView

Tibco EMA Features
- Single-vendor solution
- Embeds into Tibco Administrator
- Enterprise-grade

Tibco EMA Drawbacks
- High server load
- No business-process awareness
- Not stand-alone
- In-house trials not encouraging
- Vendor tie-in

Tibco Business Factors
- Evaluated by SVC and centeractive
- centeractive conclusion: not up to task
  - High server load
  - Technical perspective of GUI
- SCM conclusion pending

Tibco Business Events
- Business-event-level monitoring
- Define, map, measure, relate events
- No detailed evaluation available
- Tibco equivalent to Micromuse NetCool
- Promoted as a data-mining tool

Micromuse NetCool
- Owned by IBM
- Domain-specific IT event management
- Enterprise-grade monitoring
- Major clients in banking, cable ISP, and telecoms
- Real-time
- Several components
- Netcool/Omnibus monitoring dashboard

Micromuse NetCool Advantages
- Enterprise-grade monitoring
- Proven in banks and telecoms
- High-quality functionality
- Intuitive graphical user interface
  - Tabular view of alert incidence
  - Graphical views (floor plans)

Micromuse NetCool Drawbacks
- Risk of vendor lock-in
- Incompatible with competing monitoring systems
- High server load, requires several servers, reliability issues
- Inflexible, parts are archaic
  - Sybase DB, applets in browser

BigSister
- Mature open-source network monitoring solution
- Supports grouping of events into a single meta-element for business-process association

BigSister Main Features
- Highly configurable
- Browser-based GUI
  - Instant status and drill-down
- Active alerting
  - SMS, email, IM
- Extensive monitoring support
  - Oracle, SQL, log watchers

BigSister Advantages
- Easy to extend
- Scalable; new hosts and tests are trivial to integrate
- Swiss-based project
- Up-to-date implementation
- Open standards
- No risk of vendor lock-in
- Extensive in-house experience

BigSister Drawbacks
- Standard installation requires root
- Extension requires Perl knowledge
- No support for acknowledging alerts

BigSister Architecture
- uxmon: data collectors supporting a wide range of protocols
- bsmon: manages HTML generation and alarming
- bbd: listening server

Nagios - Features
- Monitoring of network services and host resources
- Web interface, alarming, and extensible monitoring plug-ins
- Ability to acknowledge problems via the web interface

Nagios - Issues
- Geared more toward network monitoring than application monitoring
- Few sensors available

Recommendation: Monitoring Tool
- Use BigSister
  - No vendor lock-in, no license costs
  - Responsive development community
  - Easy to extend
  - Fastest time to production
  - Easiest to implement
  - Modular design facilitates incremental installation

Proposed Monitoring Implementation Strategy

Recommendation: Development Strategy
- Identify meaningful monitoring points for each process and its interface
  - What do we really need to know?
- Develop a desirable alerting strategy
  - Who does this help?
- Implement BigSister in a 2-phase approach

Identify Interfaces and IO in Floor Plan
- EMS queues and web services
- Log files
  - Errors, warnings (recoveries)
- DB connection
- Disk access
- FTP, HTTP, SSH events
- Process invocation

Beppi2 Floor Plan
- Arrows identify potential monitoring points

[Figure: OM Detailed Floor Plan (target state). Areas shown: OM Layer, OX Layer, cross-cutting concerns, backends. Legend: new components, changed components. Labelled monitoring points: Webservices, EMS/RV Messages, OM Logs, OX Logs, File IO, Logs, DBs, Adapter logs.]

Identify Alerting Strategy
- When is passive monitoring enough?
  - When trends can be visualized
- When are alerts really needed?
  - When something can be prevented
  - When immediate resolution is necessary

Implement BigSister Pilot
- Stick to built-in functionality
- Implement spreadsheet-based monitoring configuration
  - Enable non-technical users to update monitoring details
  - Minimize total cost of ownership
  - Simplify definition of business processes

Implement Monitoring of Business Processes
- Map EAI components to sub-processes
  - Mapping at process level
  - Creating dependency relationship mappings
- Define individual business-process IDs
  - KB, Self-care commissioning
- Mappings are configurable

Visualisation example: alarms for an arbitrary business process

Implementation Concept
- Start simple
  - Alert on errors in log entries
  - Alert on non-responsive services
  - Alert on JMS queues and depth
  - Alert on machine resources
- Facilitate administration
- Get system to users
  - A growing system is better than a finished rollout

Extend Monitoring Capability: Acquisition
- Add any functionality requested by SVC
- Implement spreadsheet configuration
  - Single-document monitoring configuration
- Add advanced monitoring
  - Throughput statistics
  - Trends
  - Process loads

Extend Monitoring Capability: Visualisation
- Add any functionality requested by SVC
- Implement business-process groups
- Add visual dependency data
- SVG + XHTML smart graphics
- Dynamic graphical dashboards

Plan for the Future
- Organic monitoring strategy
- Error remediation
- Adhere to open standards
- Stay with proven, transparent tools

Centeractive Know-How
- 2 senior developers with BigSister experience
  - Deployment, usage
- 2 senior developers with Netcool experience
  - Development, implementation
- 1 administrator

Thank you
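
Appendix: sketch of the spreadsheet-based business-process mapping

The pilot slides propose a spreadsheet-based configuration that maps EAI components to business processes, so that component alarms can be grouped per process. The following is a minimal Python sketch of that idea, not BigSister's actual configuration format or API. The CSV column names, the component-to-process assignments, the check names, and the green/yellow/red status scale are all assumptions made for illustration; only the process names "KB" and "Self-care commissioning" appear in the slides.

```python
import csv
import io

# Hypothetical spreadsheet export (CSV). Columns and rows are illustrative
# assumptions, not taken from an actual SCM or BigSister configuration.
CONFIG_CSV = """component,check,business_process
EMS Server A,jms_queue_depth,KB
OM Engine,log_errors,KB
Siebel Adapter,service_response,Self-care commissioning
OM Engine,log_errors,Self-care commissioning
"""

# Assumed severity scale: a higher number means a worse status.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def load_mapping(csv_text):
    """Return {business_process: [(component, check), ...]} from CSV text."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["component"], row["check"])
        mapping.setdefault(row["business_process"], []).append(key)
    return mapping


def rollup(mapping, statuses, default="yellow"):
    """Return the worst status per business process.

    statuses maps (component, check) to "green", "yellow" or "red".
    A check with no reported status counts as `default` (an assumption).
    """
    result = {}
    for process, checks in mapping.items():
        observed = [statuses.get(key, default) for key in checks]
        result[process] = max(observed, key=SEVERITY.__getitem__)
    return result


if __name__ == "__main__":
    current = {
        ("EMS Server A", "jms_queue_depth"): "green",
        ("OM Engine", "log_errors"): "red",
        ("Siebel Adapter", "service_response"): "green",
    }
    for process, status in rollup(load_mapping(CONFIG_CSV), current).items():
        print(f"{process}: {status}")
```

Because one component can appear in several rows, a single failing check (here the assumed OM Engine log check) marks every business process that depends on it, which is the grouping behaviour the mapping slides aim for. Non-technical users would only edit the spreadsheet rows; the rollup logic stays unchanged.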