JMP404 Master Class: Advanced Techniques for Domino Server Monitoring and Alerting
Andy Pedisich | President – Technotics, Inc.
Rob Axelrod | Vice-president – Technotics, Inc.
© 2013 IBM Corporation

Your presenters
They are two hard-working IBM® Notes® administrators/developers who have worked with IBM® Notes® and IBM Domino® since version 2.1
– From Technotics, Inc. in Philadelphia, Pennsylvania, USA
Andy Pedisich
– 28 years in IT
– 19 years with Lotus Notes
Rob Axelrod
– 23 years in IT
– 19 years with Lotus Notes

About Technotics, Inc.
Technotics was founded in 1998 as a consultancy focused on collaboration in the enterprise. Since then we have provided strategic advice, project management and technical support to organizations worldwide, with an emphasis on high levels of customer engagement and long-term relationships.
Our services include environmental audits, premium support, executive briefings on cloud-based collaboration, and migrations between messaging and collaboration systems.
Contact Andy at andyp@technotics.com or Rob at rob@technotics.com.

What we are working with during this session
Our host laptop is a Dell Studio™ 1555
– Intel® Core™ 2 Duo CPU T9600 2.80 GHz
– 8 GB memory
The operating system is Microsoft Windows 7 Ultimate
– 64 bit
– Copyright © 2009 Microsoft Corporation. All rights reserved
– Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.
IBM® Domino® Server Release 8.5.3 FP3 64 bit
– Running on the host system, using the IP address of the laptop's network interface card
– Running IBM Traveler
IBM Notes® Release 8.5.3 FP3

Agenda
Setting up the foundation for guarding your domain
Working with event generators and event handlers
Selecting a notification method
Customizing recommended actions in Domino Domain Monitoring
Tracking problem servers
Finding and tracking events that show on the console but not in the log
Using LotusScript to access server statistics

The humble yet mighty Monitoring Configuration application
This is also known as EVENTS4.NSF
– It controls all the monitoring and notification in your domain
• Unless you use third-party software
• Unless someone like me gets involved, because I custom-develop Domino domain monitoring applications
Yes, I am both a full-time administrator and, when required, a developer

Requirements for efficient and accurate statistics collection
Two things are required for statistics collection:
– The Collect task must be running on every server that is designated to collect statistics
• Not all servers should run the Collect task
• Only servers designated as collecting servers should run it
– The EVENTS4 database must have at least one Statistics Collection document

A few other important items are needed
Statistics should be collected centrally on one or two servers so that the data is easy to get to
– If you have offices in Europe and in Australia, you should probably have at least two servers collecting stats, one in each location
• EUStats.nsf
• AUStats.nsf
– Replicate them to a central location
– Stats should be collected at least every hour to be effective
EVENTS4 should be the same replica on all servers in the domain
– That’s right!
You should be able to put all the EVENTS4.NSF icons stacked up together on your desktop
– If you can’t, maybe you have around 200 servers and stacking that many is impossible
• OR maybe your EVENTS4.NSF databases do not all have the same replica ID

There is a special replica ID for your EVENTS4.NSF
The replica ID of system databases, such as EVENTS4, is derived from the replica ID of the Domino Directory

Database       Replica ID
NAMES.NSF      852564AC:004EBCCF
CATALOG.NSF    852564AC:014EBCCF
EVENTS4.NSF    852564AC:024EBCCF
ADMIN4.NSF     852564AC:034EBCCF

– Notice that the first two digits after the colon in the EVENTS4.NSF replica ID are 02
– Make sure that EVENTS4.NSF has the same replica ID throughout the domain by opening a copy from every server and putting it on your desktop
• Here’s some code to help you do that

Add a button to your toolbar
Add this code to a button on your toolbar
– This is courtesy of Thomas Bahn
– He’s a smart guy, a nice guy, and sometimes brings chocolates from Europe for his friends
• http://www.assono.de/blog

_names := @Subset(@MailDbName; 1) : "names.nsf";
_servers := @PickList([Custom]; _names; "Servers"; "Select servers"; "Select servers to add database from"; 3);
_db := @Prompt([OkCancelEdit]; "Enter database"; "Enter the file name and path of the database to add."; "log.nsf");
@For(n := 1; n <= @Elements(_servers); n := n + 1;
    @Command([AddDatabase]; _servers[n] : _db)
)

Add a database icon from all servers to the desktop
This code prompts you to pick the servers that have the database you want on your desktop
– Then it prompts for the name of the database
• And opens it from all the servers you’ve selected
Use it to make sure all the EVENTS4.NSF databases in your domain are the same replica

A single collection document looks like many in the view
A single document will look like multiple documents in the EVENTS4 database
– It’s one document with a multi-value field containing all the server names
– Make sure administrators know this, or they might
delete everything by mistake
• Guess how I know this?

Agenda: Working with event generators and event handlers

Event monitoring details
Enough setting up already!
Event monitors of all types are configured in the EVENTS4 database
There are two broad categories of event documents:
– Event handlers
• Specify the action that Domino takes when a specific event occurs
– Event generators
• Each type of event generator has a view that lists all event generators of that type, plus additional configuration information

Event generators
Event generators deal with specific Notes/Domino issues
There are six types of event generators:
– Database Event Generator
– Domino Server Response Event Generator
– Mail Routing Event Generator
– Statistic Event Generator
– Task Status Event Generator
– TCP Server Event Generator
• Some are used more than others
For starters, we’ll stick to the more popular ones that every administrator should use

Database Event Generator
Use Database Event Generators to monitor:
– Database activity
– Free space
– Frequency and success of database replication
– ACLs
• And get reports on ACL changes
• Including those made by replication or by an API program
Monitor specific servers or every server in the domain

Here’s one that everyone should use
The ACL of Names.nsf should be monitored for changes in every Notes domain
– Once properly set, the ACL of Names.nsf should rarely change!
• Alarms should go off when it does change
Select Names.nsf
– You can choose either a single server, such as the administration server for the address book, OR
– All servers in the domain
I like to pick all servers in the domain
– Admins won’t get away with anything!
– But I do get a storm of messages when an ACL change occurs
• Every server tells me about the change

Unused Space event generator
This one is an interesting example of the events system actually doing something automatically when a certain condition exists
– It’s questionable in that it runs the Compact task immediately when it detects that the free space threshold has been exceeded
• I could see this event being used on archive servers
• And I wish there were a way to run it only during specific hours

Server Response event generator
The Domino Server Response Event Generator
– Checks connectivity and port status on the server’s network
One server checks others by sending a probe
– It’s a good idea to have the probe try opening Names.nsf
• If you can’t open Names.nsf, then something is wrong!

What’s your response tolerance?
You set the interval for checking Names.nsf
– The default is every three minutes
And you set your response time tolerance
– The default is 1,000 msec (one second)
• Both will depend on your own environment

More about probes
The default response time tolerance is a bit on the harsh side
– If you leave it at 1,000 msec (one second), you will receive a lot of notifications
• Make it ten seconds, or whatever the metrics in your Service Level Agreement (SLA) require
Also, be careful which servers you choose to probe other servers
– Try to pick probing servers that are on the same LAN as the probed servers
• Otherwise, your probes will actually be testing network latency rather than the servers themselves
However, I have used these probes as a way of testing exactly that
– Network latency

Statistic event generators
Statistic Event Generators monitor a specific Domino or platform statistic
– They can let you know when a stat goes over a particular threshold
• These stat event generators are extremely valuable
Smart administrators use them every day!
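To make the threshold idea concrete, here is a minimal Python sketch of what a Statistic Event Generator does conceptually: compare a statistic reading with a limit and raise an alert when the limit is crossed. It does not read EVENTS4.NSF and is not how Domino implements the check. The StatThreshold class, the evaluate function, the sample readings and limits, and the Disk.D.Free statistic name are assumptions for illustration; Mail.Dead is the dead mail count, and Platform.Process.Traveler.1.PctCpuUtil is the Traveler statistic used later in this session.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StatThreshold:
    """Hypothetical threshold rule, loosely modeled on a Statistic Event Generator."""
    stat_name: str
    limit: float
    alert_when_above: bool = True  # False means alert when the value drops below the limit


def evaluate(stats: Dict[str, float], rules: List[StatThreshold]) -> List[str]:
    """Return one message per rule whose statistic has crossed its limit."""
    alerts = []
    for rule in rules:
        value = stats.get(rule.stat_name)
        if value is None:
            continue  # the server did not report this statistic
        if rule.alert_when_above and value > rule.limit:
            alerts.append(f"{rule.stat_name} = {value} (above {rule.limit})")
        elif not rule.alert_when_above and value < rule.limit:
            alerts.append(f"{rule.stat_name} = {value} (below {rule.limit})")
    return alerts


if __name__ == "__main__":
    # Sample readings and limits are illustrative only, not IBM defaults.
    readings = {
        "Mail.Dead": 12,
        "Platform.Process.Traveler.1.PctCpuUtil": 35,
        "Disk.D.Free": 5_000_000_000,
    }
    rules = [
        StatThreshold("Mail.Dead", 10),
        StatThreshold("Platform.Process.Traveler.1.PctCpuUtil", 80),
        StatThreshold("Disk.D.Free", 10_000_000_000, alert_when_above=False),
    ]
    for message in evaluate(readings, rules):
        print(message)
```

Letting a rule alert on a value that drops below its limit covers statistics such as free disk space, where low values are the problem.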
The complete listing of all statistics is in EVENTS4.NSF
The complete listing is in the view Statistics by Name
The default statistics thresholds view shows only documents where the field “useful” is equal to the word “Yes”

The EVENTS4.NSF database has all the stats and thresholds
The Monitoring Configuration database (EVENTS4.NSF) supplies a document detailing the threshold for each statistic
– There are 1,193 statistic documents available
• But only 166 of them are considered useful for setting thresholds, and only those appear in the default statistics thresholds view
Each document contains other information about what kind of stat it is
Plus information about the default threshold
– And yes, you can change these settings

Why are most stats considered “not useful” for thresholds?
One setting on the Advanced tab controls whether a statistic appears in the dropdown list when you’re setting up an event generator
– Note that there are no Agent statistics in this list

Why no Agent stats?
It’s not that the Agent stats aren’t useful
– They just might not be valuable for threshold tracking
In some releases, Agent.Hourly.UsedRunTime has a data type of text
– We can’t set a threshold on text values

We do have a nice way of seeing that stat, though
Technotics has created a highly customized version of the Monitoring Results database, STATREP.NSF
It’s called the Technotics R8.5.3 statrep; it is the stock IBM statrep with added views
One of these valuable views is the Agent Stats view
You can download it from http://www.andypedisich.com
– Look for the Connect2013 link

Static statistics are not useful for thresholds
Statistics that don’t change usually describe the operating environment of the server
– Server.Version.Notes = Release 8.5.3
– Server.Version.OS = Windows NT 5.0
– Server.CPU.Type = Intel Pentium
– Disk.D.Size = 71,847,784,448
– Mem.PhysicalRAM = 527,433,728
– Platform.Network.1.AdapterName = Intel[R] PRO_1000 MT Server Adapter

Show me the stats
When you issue a SHOW STAT command at the console, Domino dumps
every statistic it is tracking
Every one of these statistics is in every single document in the STATREP.NSF database
– All you need is a view to see them

What good are these?
Think these stats aren’t helpful? They are!
If you are collecting stats correctly from all your servers, you can take a pretty detailed server inventory
– Without leaving your desk
• Covering servers all around the world, just by looking at the data collected in the Monitoring Results database
This database is also known by its file name: Statrep.nsf

Finding the “not useful” stats
You might find that a statistic you need has been marked as not useful
To see which statistics are marked as not useful, full-text index EVENTS4.NSF
Then create an advanced query that checks for the field useful = “No” to find them
– You might discover a statistic whose threshold would be right for you to use

The wizard follows up an event generator with an event handler
As you complete the form for an event generator, you’ll see a button to create a new event handler
When you click it, you are walked through the process of creating an event handler

The wizard lets you choose the method of handling the event
There are lots of methods of event handling
– Which one you choose depends a lot on your infrastructure
– We’ll talk more about the notification methods in the next section of the presentation
For now, just remember that an event generator is fairly worthless by itself
– Unless you have an effective event handler that tells you, in its own way, what is going on with your servers

Event handlers are an exquisite gift
They give you a heads-up about issues detected by event generators
They also give you a free-form way of being alerted to anything that happens in the Domino server log and to most of what happens on the Domino server console
You can use event handlers to respond to generators and certain add-in tasks
– They are most valuable for picking out text on the console that will mean trouble if ignored
We’re going to focus on this type of event handling, since it is less intuitive than responding to generators or add-ins

Basics of the event handler configuration
There are three screens to deal with
Decide whether you want to track an event on just a few servers or on all servers
– You might want to track a particular event on mail servers only
Decide what triggers a notification
– We’re going free-form, so we select “any event that matches a criteria”

Second set of choices for event handling
When working with console events, select “events can be of any type”
And “events can be of any severity”
Then look for a particular string of text in the event message
– This can be absolutely any text that appears on the console
• We will explain why we are picking the text “full administrator access” in a moment

Final setup tab for event handling
Lastly, we define what action will take place when the text appears
We’ve selected email notification as our method
– But there are over a dozen others, which we will discuss in a few moments
Note that you can control the time of day that the event handler is on the job
– I wish they did that for event generators

Why did we choose the text “full administrator access”?
Full access administrator is the highest level of administrative access to the server
– Here are just some of the rights it grants:
• Manager access, with all access privileges enabled, to all databases on the server, regardless of the ACL settings
• Access to all documents in all databases, regardless of Readers fields
• The ability to create agents that run in unrestricted mode with full administration rights
• Access to any unencrypted data on the server
Turning on full access administrator should not be taken lightly
– Your security model should make it almost never necessary to turn it on
Therefore, when a privileged user activates full access administrator, you want to know about it, to prevent some hooligan from doing
shenanigans

When the privilege is turned on, it’s logged
When an admin turns on Full Administrator Access (FAA), it appears on the server console
– It is grabbed by the event handler and I get an email
Each time the admin moves to another server, I get another email
– Until eventually I call the admin and ask why they need FAA power!
– Usually the admin has forgotten it was turned on and stops using it

Other words you should track with event handlers
“deleted by”
– This generally means someone has deleted a database
– Usually their own mail file, if they have Manager access
– You’ll be getting out the backup tapes in a minute

01/05/2013 04:02:17 PM Opened live remote console session for Andrew M Pedisich/DomLab
01/05/2013 04:04:50 PM Database ArchiveOfIncriminatingPhotos.nsf deleted by Andrew M Pedisich/DomLab

Other bad words to watch for
Here are some other words and expressions to watch for:

Expression | Issue
An exception occurred while writing data into database | Bad news all round. You’re going to have to get to the database and run some maintenance.
Replication cannot proceed | Replication cannot proceed because cannot maintain uniform access control list on replicas. This is a result of “Enforce Consistent ACL”.
RRV bucket is corrupt | RRV stands for Record Relocation Vector. It is a pointer that tells Notes where to find a specific NoteID, and it is bad if it’s corrupted. You can try a fixup, but the database might be borked and need a new replica.
truncated | Try fixup. Maybe it will work. Maybe not.
Device error | Uh oh.
Database is corrupt; cannot allocate space | This one is bad too.
B-tree structure is invalid | You never want to see a B-tree error. It usually means you have to replace the database.
extremely inefficient | Agent Manager: Full text operations on database 'xyz.nsf' which is not full text indexed.
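An event handler set to “any event that matches a criteria” is, in effect, a substring match against console text. If you also want to run the same check over a saved copy of the server log or over CONSOLE.LOG, a minimal Python sketch might look like this. The expression list is taken from the slides above; the function names are hypothetical, and the case-insensitive matching is a choice made for this sketch, not a statement about how Domino compares event text.

```python
from typing import Iterable, List, Tuple

# Trouble expressions taken from the slides; extend this tuple for your environment.
WATCH_EXPRESSIONS = (
    "full administrator access",
    "deleted by",
    "An exception occurred while writing data into database",
    "Replication cannot proceed",
    "RRV bucket is corrupt",
    "truncated",
    "Device error",
    "Database is corrupt; cannot allocate space",
    "B-tree structure is invalid",
    "extremely inefficient",
)


def find_trouble(lines: Iterable[str], expressions=WATCH_EXPRESSIONS) -> List[Tuple[str, str]]:
    """Return (expression, line) pairs for every line that contains a watch expression.

    Matching is a case-insensitive substring test, chosen for this sketch.
    """
    needles = [(expr, expr.lower()) for expr in expressions]
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        lowered = text.lower()
        for expr, needle in needles:
            if needle in lowered:
                hits.append((expr, text))
    return hits


def scan_file(path: str) -> List[Tuple[str, str]]:
    """Scan a saved log file; undecodable bytes are replaced instead of raising an error."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_trouble(handle)


if __name__ == "__main__":
    sample = [
        "01/05/2013 04:02:17 PM Opened live remote console session for Andrew M Pedisich/DomLab",
        "01/05/2013 04:04:50 PM Database ArchiveOfIncriminatingPhotos.nsf deleted by Andrew M Pedisich/DomLab",
    ]
    for expr, line in find_trouble(sample):
        print(f"[{expr}] {line}")
```

Short expressions such as “truncated” will also match harmless lines, so expect some noise and tune the list for your own servers.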
Agenda: Selecting a notification method

We’re circling back to notification methods
Here is the panoply of notification methods
The most widely used notification method is to send an email to an admin group when a problem occurs
– And yet that is also very risky, since the email system itself might be the problem

The most important notification options
There are 14 ways to be notified – these are the best

Method | Result | Comments
Log to Database | Logs the event to a database, typically STATREP.NSF, on a local server | Always record every event in STATREP.NSF for historical purposes, regardless of what else you do
Mail | Mails the event to a person or to a mail-in database | Good for most events in multi-protocol environments, but as mentioned, it’s bad if the mail system goes down
Pager | Uses the mail address of an alphanumeric pager | OK, but of limited value because it uses the mail system; if mail itself is down, there are issues

Paging Dr. Howard, Dr. Fine, Dr. Howard …
Paging notification is a good choice
– But not if you are paging through a third-party phone system like Verizon or AT&T
• They generally require an email to be sent
• They have no Service Level Agreement – NONE!
Sadly, due to budget and resource constraints, these two methods, mail and paging, are the ones we generally see used most in production environments

The most important notification options (cont.)
These two are the best, and there’s one more that’s not listed

Method | Result | Comments
SNMP Trap | Sends the event as an SNMP trap. Select this method only if the specified server is running the Event Interceptor task and the Domino SNMP Agent. | This is truly an ideal notification method, because it does not depend on Notes protocols actually working
Forward event to Tivoli Event Console | Allows the Tivoli Enterprise Console (TEC) to receive IBM Domino events and reformat them as TEC events. The reformatted TEC event is then sent to the TEC server that you specify in the Configuration Settings document. | Check with the Tivoli team to see if it’s possible to use this in your environment

Customized Tivoli package
As someone who develops a lot of monitoring solutions, I often have to bend the rules and do some development (Ugh!)
– I was given an executable called postemsg.exe, which was placed on the C: drive of a Windows-based server that was the central hub for monitoring servers
With some knowledge of LotusScript, I was able to craft a system to monitor servers and send the results back to the Tivoli Event Console

' vLongMessage and vReportServerName are set earlier by the monitoring agent
' postemsg arguments: config file (-f), severity (-r), message text (-m), then slot="value" pairs, then the event class and source
vMess1 = {C:\Windows\System32\postemsg.exe -f F:\TECAlerts\tecserver.cfg -r CRITICAL -m "} + vLongMessage + {" }
vMess2 = {hostname="} + vReportServerName + {" }
vMess3 = {sub_source="MESSAGINGLOTUS" MyNotify_supportfilter="1" MyNotify_severity="2" }
vMess4 = {MyNotify_tin="0066" MyNotify_atin="0066" MyNotify_msg="Domino mail server outage" }
vMess5 = {MyNotify_srcplatform="W" MyNotify_processreturncode="0" MyNotify_correlation="0" }
' The last two tokens are the TEC event class and source
vMess6 = {MyNotify_app="DominoMail" MyNotify_env="Production" MESSAGING_LOTUS MESSAGING}
vMess = vMess1 + vMess2 + vMess3 + vMess4 + vMess5 + vMess6
' Run postemsg.exe; window style 6 = minimized without focus
result = Shell(vMess, 6)

Customized Tivoli package
In this case, I developed a custom monitoring solution that fed trouble tickets into a version of the Tivoli Event Console that was not supported by the Tivoli event handler system
– When you need extreme monitoring capability with high reliability, you sometimes have to get in deep
– This is very effective because it uses the postemsg.exe executable at the OS level to send the message to the TEC
– Note that the message is carefully crafted to form a large
command string that sends the ticket to Tivoli
• Check with your Tivoli team to see if you can take advantage of this method

Agenda: Customizing recommended actions in Domino Domain Monitoring

If you’re not using DDM, you see this with each server start
01/22/2013 11:49:08 AM Warning: All Domino Domain Monitoring probes are disabled resulting in the loss of valuable diagnostic information. Please configure DDM probes in events4.nsf. Assess DDM reports in ddm.nsf.
DDM is an advanced topic and is best used by new admins
Domino Domain Monitoring (DDM) is a powerful yet complex tool that is often overlooked by administrators
If you are using Domino 7 or 8, you are already the proud owner of the Domino Domain Monitoring database (DDM.NSF) and could already be using its powerful functionality

DDM backs up its discoveries with explanations
DDM explains the probable cause, possible solution, and sometimes corrective actions
– That’s right, actions that will actually correct the problem you’re experiencing
These are stored in EVENTS4.NSF and are configurable by you
– Let’s look for the error “ATTEMPT TO ACCESS DATABASE BY”

Looking in the view “Event Messages by Text”
We can find that error message in EVENTS4.NSF
– And discover how we might change the report DDM produces

The cause, solution, and corrective action are listed
This document has the probable cause, possible solution and corrective action
– These are supplied by Lotus and include the code for the corrective action

Click the link to the modular corrective action
Clicking the link takes you to the code
– This could be in formula language or LotusScript

The modular corrective action is reusable
At the bottom of the modular action there is a list of other error messages that also use this action
– An action written only once can be used as the corrective action for many events

Modular documents for cause, solution and corrective actions
Domino 8 comes with over 1,000 modular documents
– Chances are that solutions for most issues are already there
– You can add new ones

Modular documents were new in Domino 8
Modular documents are a welcome addition to DDM
– But to appreciate why they are so cool, we must first go back in time to see how similar functionality was accomplished in Release 7
In R7 DDM, some events could have automated solutions
– These automated solutions were hard-coded into the Events documents in the
Monitoring Configuration application (EVENTS4.NSF)

Modular documents let you describe issues
Modular documents let you add your own probable cause and possible solution text
– And create corrective actions using formula code, LotusScript, or agents

The reusable modular solution saves you time and work
You can use any of the solutions provided by IBM in your own custom solution

You can add to the solutions that display with the error
A custom solution that composes an email to the target user can be inserted

This changes the DDM report
The modular document now offers the “compose an e-mail” choice

It starts the email for you
The code plugs in the user’s name and the database that was being accessed
– And it’s all done with modular documents in EVENTS4.NSF

Remember, actions are matched with events
Match up the modular document with the event in the Monitoring Configuration application

Changes might take time
Events and modular documents are cached
– You might find that updates to events and modular documents are not reflected in DDM.NSF right away
– Be patient!
• If you’re not a patient person, restart the Event task to make sure updates to events and modular documents are reflected in DDM.NSF immediately

Don’t touch the IBM entries
Event documents have three categories of Probable Cause/Possible Solution and Corrective Actions
The first tab contains the IBM Entries
– These are, of course, provided by Lotus
• Do not modify or delete these entries
• If you want to disassociate an entry from the event, simply edit the document and uncheck the Enabled box

Your custom entries
Add your own references to PC, PS, and CA on the Custom Entries tab
– The interface looks similar to the Lotus Entries tab, but you can add only up to two Probable Cause/Possible Solution actions
• And there is no Enabled setting as there is for the Lotus entries
These settings will be retained as you move forward, upgrading from Domino 8.x

A role in the DDM ACL restricts who can use actions
Many events have corrective actions associated with them
– Only users with the Execute CA role in the DDM ACL can access the command actions and the corrective action text and links
• This ensures that only qualified team members are able to make the changes

Agenda: Tracking problem servers

Dealing with problematic servers
Sometimes servers develop issues
– We would like to collect statistics for analysis from these systems more often than the standard statistics collection interval allows
• If you try to add a second collection interval for a server, you’ll get an error

Each server is allowed to collect stats at only one interval
It makes sense that a server can only have one collection
interval
– You must create a second collection document for another server
– And don’t forget to add the Collect task to the ServerTasks= parameter in that server’s NOTES.INI
Let’s look at a server that has CPU spikes
– We want to determine exactly when they are happening by creating a chart
First, we create a statistics collection document for a second server, which will collect statistics from our problem server

Set the collection interval to five minutes
In the statistics collection document, set the collection interval to 5 minutes
By the way, do not check any filters
– They tell the collector to ignore the statistics you have checked
Note that statistics are being logged to a database called ProblemServer.NSF
– This database will be used exclusively to track CPU utilization of the Traveler task
Please note that the data in this example has been fictionalized for effect
– This is not actual data from a real server
– It is being used as an example of capturing and analyzing data on a problematic server

Create a special view that tracks CPU utilization for Traveler
In this case, it’s the Traveler CPU usage we want to track
We create a custom view in the collecting database that shows only the server name, the time of collection, and the statistic called Platform.Process.Traveler.1.PctCpuUtil
– This makes it easy to create a graph of the CPU activity

Collect the data, then copy it as a table from the custom view
After collecting a week’s worth of data, we examine the CPU utilization
All the data in the view is selected using Ctrl-A
– It is copied as a table
• Copying views as a table is one of my favorite features in Notes
A Monitoring Results template is posted on my web site
– A URL for this template is included at the end of the presentation

The data has been copied to a spreadsheet
A simple paste puts the data into a spreadsheet, where it is ready to be turned into a chart

Use the tools in your spreadsheet to create a graph
Select the columns Collection
Time and Traveler CPU
Create a graph from the data
– In this example, a scatter chart with smooth lines is used

The resulting graph
This produces an excellent graph of CPU utilization over a ten-day period, with samples taken at 5-minute intervals
– And it took less than 5 minutes to make this chart
• One adjustment was made to the x-axis formatting, and the legend was removed

Demonstration
Creating a graph of results from a custom view of collected data

Agenda: Finding and tracking events that show on the console but not in the log

Some events occur on the console, but not in the log
Note that in this example the server stops reporting at 11:04 PM
Then at 11:27 PM it is back online
What happened in the interim?

Name: Mail1/domlab
Time: 01/04 11:02:05 PM
Miscellaneous Events:
01/04/2013 11:04:17 PM Pulling icl.ntf from Maill2/domlab icl.ntf
01/04/2013 11:04:31 PM Access control is set in catalog.nsf to not allow replication from BES02/domlab catalog.nsf
01/04/2013 11:04:31 PM Access control is set in mail2/domlab catalog.nsf to not allow replication from catalog.nsf
01/04/2013 11:04:33 PM Pulling ddm.nsf from Mail2/domlab ddm.nsf
01/04/2013 11:04:35 PM Pushing ddm.nsf to Mail2/domlab ddm.nsf
01/04/2013 11:04:38 PM Finished replication with server Mail2/domlab
01/04/2013 11:04:43 PM Router: Transferred 1 messages to MAIL2.domlab.COM (host MAIL02.domlabUSA.COM) via SMTP
01/04/2013 11:04:51 PM Opened session for Mail2/domlab (Release 8.5.2FP1)

Name: Mail1/domlab
Time: 01/04 11:27:11 PM - 01/04 11:27:47 PM
Miscellaneous Events:
01/04/2013 11:27:11 PM Recovery Manager: Restart Recovery complete.
(196/1686 databases needed full/partial recovery)
01/04/2013 11:27:11 PM Informational - The DAOS catalog is not synchronized. Deletions will be postponed. Please run 'tell daosmgr resync' at the next convenient opportunity to re-synchronize.
01/04/2013 11:27:12 PM Event Monitor started
01/04/2013 11:27:12 PM Warning: All Domino Domain Monitoring probes are disabled res

There is action in the CONSOLE.LOG
CONSOLE.LOG and other logs are in the folder called IBM_TECHNICAL_SUPPORT under the data folder, on servers and on clients
The CONSOLE.LOG on a server often contains data that has been seen on the Domino server console, but not in the Domino server log
– It shows there was a Long Held Lock Dump and then a panic!

Lock(Mode=SIX* LockID(DB DB=G:\Lotus\Domino\Data\mail\web\Complaints.nsf)) Waiters countNonIntentLocks = 1 countIntentLocks = 1, queuLength = 95
[Req(Status=Granted Mode=IS Class=Manual Nest=0 Cnt=1 Tran=0 Func=N/A m\lkmgr.cpp:159 [0D64:0002-0D60]) rm_lkmgr_cpp:2070 rm_lkmgr_cpp:1306 nsfsem1_c:169 nsfsem1_c:1020 nsfsem6_c:503
Req(Status=Granted Mode=SIX Class=Manual Nest=0 Cnt=1 Tran=0 Func=N/A inplace.c:153 [099C:0165-12FC])
LkMgr END Long Held Lock Dump -----------------
01/04/2013 11:04:51 PM Opened session for Terry Mallory/domlab (Release 8.5.2FP2)
01/04/2013 11:04:51 PM Closed session for Terry Mallory/domlab Databases accessed: 1 Documents read: 0 Documents written: 0
The server process terminated abnormally with the exit status = 1. Please send this information and the collected nsd log to IBM Support. This process will now Panic in order to start fault recovery operations.

Why did this happen?
In this case there was a large number of email messages with big attachments waiting to be processed in the MAIL.BOX databases. The server was relatively underpowered. Plus, I think the messages were part of a mass mailing sent by a CEO – And we all know that the most visible executives have the worst time with any piece of messaging software
Here’s another example of helpful Console logging
I entered the following into the Domino server console: tell traveler stat show
That command generates hundreds of lines of statistics and other information. Clearly it shows on the server console.
Here’s another reason for Console logging
Here’s the Domino server log, showing me making several furious requests to the Traveler task with tell traveler stat show. I get nothing:
> tell traveler stat show
01/06/2013 12:24:49 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
> tell traveler stat show
01/06/2013 12:24:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
01/06/2013 12:24:55 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
01/06/2013 12:24:55 PM Directory Cataloger finished processing names.nsf: Directory Catalog has no Configuration record
01/06/2013 12:25:43 PM AMgr: Start executing agent 'PullFromAdmin4' in 'certreq.nsf' by Executive '1'
01/06/2013 12:25:43 PM AMgr: 'Admin/Servers/DomLab' is the agent signer of agent 'PullFromAdmin4' in 'certreq.nsf'
01/06/2013 12:25:43 PM AMgr: 'Agent 'PullFromAdmin4' in 'certreq.nsf' will run on behalf of 'Andrew M Pedisich/DomLab'
01/06/2013 12:25:44 PM AMgr: Start executing agent 'SubmitToAdmin4' in 'certreq.nsf' by Executive '1'
01/06/2013 12:25:44 PM AMgr: 'Admin/Servers/DomLab' is the agent signer of agent 'SubmitToAdmin4' in 'certreq.nsf'
01/06/2013 12:25:44 PM AMgr: 'Agent 'SubmitToAdmin4' in 'certreq.nsf' will run on behalf of 'Andrew M Pedisich/DomLab'
01/06/2013 12:25:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
Check the IBM_TECHNICAL_SUPPORT folder
CONSOLE.LOG from the IBM_TECHNICAL_SUPPORT folder on the server. Whenever there are server issues, don’t forget to check CONSOLE.LOG for evidence:
01/06/2013 12:25:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
tell traveler stat show
CPU.Pct.000-010 = 7
ClusterCache.Access = 1
Constrained.count = 0
Constrained.state = false
DB.Connections = 1
DB.Connections.Idle = 1
DB.Connections.Max = 7000
DCA.C.CheckAccessRights = 2
DCA.C.Count.NSFDbClose = 3
DCA.C.Count.NSFDbOpen = 3
DCA.C.Count.NSFNoteClose = 2
DCA.C.Count.NSFNoteOpen = 2
DCA.C.HTMLCreateConverter = 1
DCA.C.HTMLDestroyConverter = 1
DCA.C.ModDoc.RunCount = 1
DCA.C.ModDoc.SyncableDocs = 1
Console logging configuration
To enable the console log permanently on your servers, add this to the NOTES.INI – Console_Log_Enabled=1
Use the following values – 0 - Disable Console Log file logging – 1 - Enable Console Log file logging
You can also toggle logging to the Console Log file from the server console – Use the start consolelog and stop consolelog commands
This is an important feature, and you’ll want it enabled all the time
Set a maximum size of about 100 MB for the console log using the following parameter – Console_Log_Max_Kbytes=100000
Console Mirroring
You can also use Console Mirroring, which is slightly different from normal console logging. Console log mirroring causes a new server thread to be created – It monitors all messages written to the Console Log file and duplicates these messages into another file – When this new file is filled, the thread closes the mirrored file and creates a new file into which subsequent messages are written. You can delete the closed mirrored files at your discretion.
Console log mirroring has three related NOTES.INI settings: – Console_Log_Mirror=1 -- Enables the mirroring feature – Retain_Mirror_Logs=1 -- Prevents deletion of previous mirrors when Domino starts – Console_Log_Max_Kbytes= -- Sets the maximum size of the Console Log/mirror files
A little more about mirroring
If the NOTES.INI setting Retain_Mirror_Logs=1 is not present, the new task begins by deleting the previous mirror files when it starts. Then a new file is created and given the name of the log with a number in it – For example, CONSOLE1.LOG is created. When the log fills to the configured capacity, the task closes the current log and starts a new one with the next number
Can you be an Admin/Dev person?
When you’re an admin there are a lot of reasons to learn LotusScript. You can write your own agents that gather statistics and monitor servers. LotusScript lets you ask for a statistic on all of your servers, one by one, then store it in a database and produce alerts and notifications that are more sophisticated than native Notes monitoring. The following are two examples of code that you might find helpful – If you have buddies on the Dev side of the house, they might find this interesting • Generally, dev people don’t build applications that help administrators • Their focus is on user applications. These two snippets can give you an idea of the potential you have when dealing with statistics and LotusScript
Gathering statistics using LotusScript is easy
Here’s an agent that simply issues a Domino server console command – Then shows you the returned value in a message box. It’s pretty cool for 11 lines of code:
Sub Initialize
	Dim session As New NotesSession
	Dim vServerName As String
	Dim vConsoleCommand As String
	Dim vConsoleReturn As String
	vConsoleCommand = "sho stat server.trans.total"	' any console command works here
	vServerName = "admin/domlab"	' the agent signer needs remote console access to this server
	vConsoleReturn = session.SendConsoleCommand(vServerName, vConsoleCommand)
	MessageBox(vConsoleReturn)	' display the raw console output
	Exit Sub
End Sub
The Mail.TotalPending statistic
This stat was introduced in Release 5, and I use it all the time when monitoring servers for mail backups. From SPR# BSAW4HFMPY – https://www304.ibm.com/support/docview.wss?uid=sim43d86a0d3e79e0e6785256a8500737f2b
Added a new Mail.TotalPending statistic that shows the count of messages pending in mail.box. This statistic is updated once every 5 minutes by the Server task, and therefore does not depend on the Router task for updates. This provides information about total backlog of mail in the event that the router is hung or not started, and may be useful to indicate that a mail routing problem needs further investigation.
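Because Mail.TotalPending is maintained by the Server task rather than the Router, it is a good candidate for the "ask every server, one by one" approach described above. The following is a minimal sketch, not the authors' production agent: the server names, the THRESHOLD value of 100 and the ParseStatValue helper are hypothetical. It assumes the agent signer has remote console access to each server, and that show stat replies look like "Mail.TotalPending = 130" followed by a line break. SendConsoleCommand raises an error when a server cannot be reached, so a real agent would add an On Error handler and would write a document or send mail instead of using Print.

```lotusscript
Option Declare

' Hypothetical helper: extracts the number from "show stat" output such as
' "Mail.TotalPending = 130" followed by "1 statistics found".
' Returns -1 when the reply contains no "=" (for example, an unknown stat).
Function ParseStatValue(consoleReturn As String) As Long
	Dim posEq As Long
	Dim posEnd As Long
	posEq = InStr(1, consoleReturn, "=", 5)
	If posEq = 0 Then
		ParseStatValue = -1
		Exit Function
	End If
	' The value ends at the first carriage return; fall back to a line feed,
	' then to the end of the string
	posEnd = InStr(posEq, consoleReturn, Chr(13))
	If posEnd = 0 Then posEnd = InStr(posEq, consoleReturn, Chr(10))
	If posEnd = 0 Then posEnd = Len(consoleReturn) + 1
	' Trim$ removes the spaces around the number before Val converts it
	ParseStatValue = CLng(Val(Trim$(Mid$(consoleReturn, posEq + 1, posEnd - posEq - 1))))
End Function

Sub Initialize
	Dim session As New NotesSession
	Dim servers(1) As String
	Dim reply As String
	Dim pending As Long
	Dim i As Integer
	Const THRESHOLD = 100	' hypothetical alert level
	servers(0) = "Mail1/domlab"	' hypothetical server names
	servers(1) = "Mail2/domlab"
	For i = LBound(servers) To UBound(servers)
		reply = session.SendConsoleCommand(servers(i), "show stat Mail.TotalPending")
		pending = ParseStatValue(reply)
		If pending < 0 Then
			Print servers(i) & ": Mail.TotalPending was not returned"
		ElseIf pending > THRESHOLD Then
			Print servers(i) & ": ALERT, " & pending & " messages pending"
		End If
	Next
End Sub
```

Unlike the InStr/Mid arithmetic that skips a fixed two characters past the equal sign, this version trims whatever whitespace surrounds the value and returns -1 when the statistic is missing, so a server that answers "0 statistics found" is not mistaken for one with an empty queue.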
Here’s a similar code snippet that gets total pending mail
This is from a much larger agent that runs every 5 minutes on 70 servers. Remember, LotusScript lets you issue console commands – Then take the results of the command and take other actions. Our job is to parse out the number 130 from the output of the show stat command – show stat mail.totalpending. We’re grabbing the stat Mail.TotalPending, which looks like this on the console:
Mail.TotalPending = 130
1 statistics found
Here’s the meat and potatoes
Mail.TotalPending = 130
1 statistics found
Then the output is parsed so that only the number is grabbed
– vLocStart = InStr(1,vConsoleReturn,"=",5)+2 • Gives the position 2 characters past the equal sign, where the number starts
– vLocEnd = InStr(1,vConsoleReturn,Chr(13),5) - vLocStart • Despite its name, this is the length of the number: the position of the carriage return (Chr(13)) that ends the line, minus the start position
– vStatStr = Mid(vConsoleReturn,vLocStart,vLocEnd) • That’s the number as a string, which Val then converts to a number
vConsoleCommandPending = "sh stat mail.totalpending" 'let's ask the console how many messages are pending
vConsoleReturn = session.SendConsoleCommand(vServerName, vConsoleCommandPending)
vLocStart = InStr(1,vConsoleReturn,"=",5)+2	' first character of the number
vLocEnd = InStr(1,vConsoleReturn,Chr(13),5) - vLocStart	' length of the number
vStatStr = Mid(vConsoleReturn,vLocStart,vLocEnd)
'Print "Pending: " + Str(vMailTotalPending) + " Pending: " + vStatStr
vMailPending = Val(vStatStr)
LotusScript and monitoring/alerting – a great pair of tools
You get the advantage of automation with the power of monitoring and alerting. Stop issues before they become problems. Don’t forget to download the custom statrep, Technotics Statrep 8.5.3, from – Http://www.andypedisich.com
Thank you for attending our session!
Please don’t forget to fill out your evaluations. We read them all! Please feel free to stop us and ask questions or just have pleasant conversations. Contact us!
Andyp@technotics.com Rob@Technotics.com http://www.technotics.com http://www.andypedisich.com
Legal disclaimer
© IBM Corporation 2013. All Rights Reserved. The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed.
Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel Centrino, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries.