Advanced Server Monitoring and Alert Notifications
Andy Pedisich, Technotics
© 2013 Wellesley Information Services. All rights reserved.

Your Presenter
• One half of a pair of two hard-working IBM® Notes® Administrators/Developers who have worked with IBM® Notes® and IBM Domino® since version 2.1
• From Technotics, Inc. in Philadelphia, Pennsylvania – USA
• Andy Pedisich: 28 years in IT, 19 years with Lotus Notes
• Rob Axelrod: 23 years in IT, 19 years with Lotus Notes

What We’ll Cover …
• Setting up the foundation for guarding your domain
• Working with event generators and event handlers
• Selecting a notification method
• Customizing recommended actions in Domino Domain Monitoring
• Tracking problem servers
• Finding and tracking events that show on the console, but not in the log
• Using LotusScript to access server statistics
• Wrap-up

Requirements for Efficient and Accurate Statistics Collection
• Two things are required for statistics collection:
  • The Collect task must be running on any server that is designated to collect the statistics
    • Not all servers should run the Collect task
    • Only servers designated as collecting servers
  • And the EVENTS4 Monitoring Configuration database must have at least one Statistics Collection document
    • Minimum collection time should be an hour

There Is a Special Replica ID for Your EVENTS4.NSF
• The replica ID of system databases, such as EVENTS4, is derived from the replica ID of the Domino directory

  Database      Replica ID
  NAMES.NSF     852564AC:004EBCCF
  CATALOG.NSF   852564AC:014EBCCF
  EVENTS4.NSF   852564AC:024EBCCF
  ADMIN4.NSF    852564AC:034EBCCF

• Notice that the first
two numbers after the colon for the EVENTS4.NSF replica are 02
• Make sure EVENTS4.NSF has the same replica ID on every server
  • Check by opening a copy from every server and putting it on your desktop
  • There’s some code on the next slide to help you do that

Add a Button to Your Toolbar
• Add this code to a button on your toolbar
  • This is courtesy of Thomas Bahn
  • He’s a smart guy, nice guy, and sometimes brings chocolates to his friends from Europe
  • www.assono.de/blog

    _names := @Subset(@MailDbName; 1) : "names.nsf";
    _servers := @PickList([Custom]; _names; "Servers"; "Select servers"; "Select servers to add database from"; 3);
    _db := @Prompt([OkCancelEdit]; "Enter database"; "Enter the file name and path of the database to add."; "log.nsf");
    @For(n := 1; n <= @Elements(_servers); n := n + 1;
        @Command([AddDatabase]; _servers[n] : _db)
    )

Add a Database Icon from All Servers to the Desktop
• This code will prompt you to pick the servers that have the database you want on your desktop
  • Then it will prompt for the name of the database
  • And open it on all the servers you’ve selected
• Use it to make sure all the EVENTS4.NSF databases in your domain are the same replica

Event Monitoring Details
• Enough setting up already!
• Event monitors of all types are set in the EVENTS4 database
• Two broad categories of events:
  • Event handlers
    • Specify the action that Domino takes when a specific event occurs
  • Event generators
    • Each type of event generator has a view that provides a list of all event generators, plus additional configuration information

Event Generators
• Event generators deal with specific Notes/Domino issues
• There are six types of event generators:
  • Database Event Generator
  • Domino Server Response Event Generator
  • Mail-Routing Event Generator
  • Statistic Event Generator
  • Task Status Event Generator
  • TCP Server Event Generator
• Some are used more than others
  • We’ll stick to the more popular ones that every administrator should use, for starters

Here’s One That Everyone Should Use
• The ACL of Names.nsf should rarely change, so monitor it!
  • Alarms should go off if it changes
• Select Names.nsf
• Choose either a single server or all servers in the domain
  • I like to pick all servers in the domain
  • Admins won’t get away with anything!
  • But I do get a storm of messages when an ACL change occurs
  • Every server tells me about the change

Unused Space Event Generator
• This is an example of the Events system actually doing something automatically when a certain condition exists
  • It’s questionable – it is going to execute the Compact task immediately upon detection of the free space threshold being exceeded
  • I could see this event being used on archive servers
  • And I wish there was a way to run it during specific hours

Domino Server Response Generator
• One server checks others by sending a probe
• It’s a good idea to try opening Names.nsf
  • If you can’t open Names.nsf, then something is wrong!
• Default is every three minutes
  • Default response time tolerance is 1,000 Msecs (one second)
  • Your settings will depend on your own environment

More About Probes
• The response time is a bit on the harsh side
  • If you leave it at 1,000 Msecs (one second), you will receive a lot of notifications
  • You should make it ten seconds, or whatever the metrics in your Service Level Agreement (SLA) require
• Also, be careful which servers you choose to probe other servers
  • Try to pick probing servers that are in the same LAN as the probed servers
  • Otherwise, your probing will actually be testing network latency, rather than the servers themselves
  • I have used these probes as a method of testing exactly that: network latency

Statistic Event Generators
• Statistic Event Generators monitor a specific Domino or platform statistic
  • They can let you know when a stat goes over a particular threshold
  • These stat event generators are extremely valuable
  • Smart administrators use them every day!

Complete Listing of All Statistics Is in EVENTS4.NSF
• The Monitoring Configuration (EVENTS4.NSF) supplies documents detailing thresholds for each statistic
  • 1,193 statistic documents available
  • The complete listing is in the view Statistics by Name
• But only 166 of them are considered useful for setting thresholds and are found in the default statistics view
  • The default statistics thresholds view only shows documents where the field “useful” is equal to the word “Yes”

Finding the “Not Useful” Stats
• You might find that a statistic you need has been marked as not useful
• To see which are marked as not useful, full text index the EVENTS4.nsf
  • Create an advanced query checking the field useful = “No”
• You might discover a statistic whose threshold would be just right for using

Why Are Most Stats Considered “Not Useful” for Thresholds?
• One setting on the advanced query that controls whether it will appear in the drop-down list when you’re setting an event generator
  • Note that there are no Agent statistics in this list

Why No Agent Stats
• It’s not that the Agent stats aren’t useful
  • They might not be valuable for threshold tracking
• In some releases, Agent.Hourly.UsedRunTime has a data type of text
  • We can’t set a threshold with text values

We Do Have a Nice Way of Seeing That Stat, Though
• Technotics has created a super-customized version of the Monitoring Results database, STATREP.NSF
  • Technotics R8.5.3 statrep
  • It’s the stock statrep with added views
• One of these valuable views is the Agent Stats view
• You can download this from: www.andypedisich.com
  • Look for the Admin2013 link

Show Me the Stats
• When you issue a SHOW STAT command at the console, Domino dumps every statistic it is tracking
• Every one of these statistics is in every single one of the documents in the STATREP.NSF database
  • All you need is a view to see them

Static Statistics Are Not Useful for Thresholds
• Statistics that don’t change usually represent the operating environment of the server
    Server.Version.Notes = Release 8.5.3
    Server.Version.OS = Windows NT 5.0
    Server.CPU.Type = Intel Pentium
    Disk.D.Size = 71,847,784,448
    Mem.PhysicalRAM = 527,433,728
    Platform.Network.1.AdapterName = Intel[R] PRO_1000 MT Server Adapter
• Think these stats aren’t helpful? They are!
  • You can take a pretty detailed worldwide server inventory
  • Just by looking at the fields in STATREP.NSF

Wizard Lets You Choose the Method of Handling the Event
• There are lots of methods of event handling
  • Which one you choose depends a lot on your infrastructure
  • We’re going to talk more about the notification methods in the next section of the presentation
• For now, just remember that an event generator is fairly worthless by itself
  • Unless you have an effective event handler that tells you, in its own way, what is going on with your servers

Event Handlers Are an Exquisite Gift
• They can give you a heads-up about issues reported by event generators
• They also give you a free-form way of being alerted to anything that happens in the Domino server log and most of what happens on the Domino server console
• You can use event handlers to respond to generators and certain add-in tasks
• They are most valuable for picking out text on the console that will mean trouble if ignored
  • We’re going to focus on this type of event handling, since it is less intuitive than responding to generators or add-ins

Basics of the Event Handler Configuration
• 3 screens to deal with
• Decide whether you want to track an event on just a few servers or all servers
  • You might want to track a particular event on mail servers only
• Decide what triggers a notification
  • We’re going for free-form, so we will select “any event that matches a criteria”

Second Set of Choices for Event Handling
• When working with console events, select:
  • “Events can be of any type”
  • “Events can be of any severity”
• Then look for a particular string of text in the event message
  • This can be absolutely any text that appears on the console
  • We will explain why we are picking the text “full administrator access” in a moment

Final Set-Up Tab for Event Handling
• Define action to occur when the text appears
• We’ve selected email notification
  • But there are over a dozen others that we will discuss in a few
moments
• Note: You can control the time of day the event handler is on the job
  • I wish they did that for event generators

Why Did We Monitor the Text Full Access Administrator?
• It is the highest level of administrative access to the server
  • Manager access with all access privileges enabled to all databases on the server, regardless of the ACL settings or readername settings
  • Access to any unencrypted data on the server
• Your security model should make FAA almost unnecessary
  • When FAA is turned on, you want to know about it to prevent some hooligan from doing shenanigans

Other Words You Should Track with Event Handlers
• “Deleted by”
  • This generally means someone has deleted a database
  • Usually their mail file if they have manager access
  • You’ll be getting out the back-up tapes in a minute

    01/05/2013 04:02:17 PM Opened live remote console session for Andrew M Pedisich/DomLab
    01/05/2013 04:04:50 PM Database ArchiveOfIncriminatingPhotos.nsf deleted by Andrew M Pedisich/DomLab

Other Bad Words to Watch for Extremely Inefficient
• Here are some other words and expressions to watch for:

Expression: An exception occurred while writing data into database
Issue: Bad news all around. You’re going to have to get to the database and run some maintenance.

Expression: Replication cannot proceed
Issue: Replication cannot proceed because it cannot maintain uniform access control list on replicas. This is a result of “Enforce Consistent ACL.”

Expression: RRV bucket is corrupt
Issue: RRV stands for Record Relocation Vector. It is a pointer that tells Notes where to find a specific NoteID, and it is bad if it’s corrupted. You can try a fixup, but it might be borked and need a new replica.

Expression: Truncated
Issue: Try fixup. Maybe. Maybe not.

Expression: Device error
Issue: Uh oh

Expression: Database is corrupt; cannot allocate space
Issue: This one is bad, too

Expression: B-tree structure is invalid
Issue: You never want to see a b-tree error. It usually means you have to replace the database.
Expression: Extremely inefficient
Issue: Agent Manager: Full text operations on database “xyz.nsf” which is not full-text indexed

We’re Circling Back to Notification Methods
• Here is the panoply of notification methods
• The most widely-used notification method is to send an email to an admin group when a problem occurs
  • And yet, that is also very risky, since the email system itself might be the problem

Paging Dr. Howard, Dr. Fine, Dr. Howard …
• 14 ways to be notified – these 2 are the most widely used
  • But not necessarily the best to use
• Paging notification is a good choice, but not if you are paging through a third-party phone system, like Verizon or AT&T
  • They generally require an email to be sent
  • They have no Service Level Agreement – NONE!
• Sadly, due to budget and resource constraints, we generally see these two mail or paging methods used the most in production environments

Method: Mail
Result: Mails the event to a person or to a mail-in database
Comments: Good for most events in multi-protocol environments, but as mentioned, it’s bad if the mail system goes down

Method: Pager
Result: Uses the mail address of an alphanumeric pager
Comments: OK, but limited value because it uses the mail system; if mail itself is down, there are issues

The Most Important Notification Options
• These two are the best, and there’s one more that’s not listed

Method: SNMP Trap
Result: Sends the event as an SNMP trap. Select this method only if the specified server is running the Event Interceptor task and the Domino SNMP Agent.
Comments: This is truly an ideal notification method because it does not depend on Notes protocols actually working

Method: Forward event to Tivoli Event Console
Result: Allows the Tivoli Enterprise Console (TEC) to receive IBM Domino events and reformat them as TEC events. The reformatted TEC event is then sent to the TEC server that you specify in the Configuration Settings document.
Comments: Check with the Tivoli team to see if it’s possible to use this in your environment

Customized Tivoli Package
• In one case, I developed a custom monitoring solution that fed trouble tickets into a version of the Tivoli Event Console that was not supported by the Domino Tivoli event handler system
  • When you have to deal with extreme monitoring capability with high reliability, you sometimes need to get in deep
  • This is very effective because it uses that postemsg.exe executable on the OS level to send the message to the TEC
  • Note that the message is carefully crafted to form a large command string which sends the ticket to Tivoli
  • Check with your Tivoli team to see if you can take advantage of this method

Customized Tivoli Package (cont.)
• As someone who creates a lot of Domino monitoring solutions, I often have to bend the rules and do some development (Ugh!)
• An executable called postemsg.exe was placed on the C: drive of a Windows server that was the central Domino monitoring hub
  • This is very effective because it uses that postemsg.exe executable on the OS level to send the message to the TEC
  • With some knowledge of LotusScript, I crafted a system to monitor servers and send results back to the Tivoli event console

    vMess1 = {C:\Windows\System32\postemsg.exe -f F:\TECAlerts\tecserver.cfg -r CRITICAL -m "} + vLongMessage + {" }
    vMess2 = {hostname="} + vReportServerName + {" }
    vMess3 = {sub_source="MESSAGINGLOTUS" Mynotify_supportfilter="1" MyNotify_severity="2" }
    vMess4 = {MyNotify_tin="0066" MyNotify_atin="0066" MyNotify_msg="Domino mail server outage" }
    vMess5 = {MyNotify_srcplatform="W" MyNotify_processreturncode="0" MyNotify_correlation="0" }
    vMess6 = {MyNotify_app="DominoMail" MyNotify_env="Production" MESSAGING_LOTUS MESSAGING}
    vMess = vMess1 + vMess2 + vMess3 + vMess4 + vMess5 + vMess6
    result = Shell(vMess, 6)

DDM Is an Advanced Topic and Is Best Used by New Admins
• Domino Domain Monitoring (DDM) is a powerful yet complex tool that is often overlooked by administrators
• If you are using Domino 6, 7, or 8, you are already a proud owner of the Domino Domain Monitoring database, and could already be using its powerful functionality
• If you’re not using DDM, you see this with each server start

    01/22/2013 11:49:08 AM Warning: All Domino Domain Monitoring probes are disabled resulting in the loss of valuable diagnostic information. Please configure DDM probes in events4.nsf. Assess DDM reports in ddm.nsf.
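Looking back at the postemsg command string from the Customized Tivoli Package slides: concatenating one long string makes the quoting easy to get wrong, as the mismatched quotation mark in the original snippet suggests. As a rough sketch of an alternative, and not the author's implementation, the same command can be assembled as a list of arguments in Python. The function name `build_postemsg_args`, the sample message, and the sample host name are hypothetical. Reading the two trailing words `MESSAGING_LOTUS MESSAGING` as the event class and source is also an assumption; check it against your Tivoli documentation.

```python
from typing import Dict, List


def build_postemsg_args(exe: str, config_file: str, severity: str, message: str,
                        slots: Dict[str, str], event_class: str, source: str) -> List[str]:
    """Build the argument list: options, slot=value pairs, then class and source."""
    args = [exe, "-f", config_file, "-r", severity, "-m", message]
    # Each slot becomes one "name=value" argument, so no manual quoting is needed.
    args.extend(f"{name}={value}" for name, value in slots.items())
    args.extend([event_class, source])
    return args


args = build_postemsg_args(
    exe=r"C:\Windows\System32\postemsg.exe",
    config_file=r"F:\TECAlerts\tecserver.cfg",
    severity="CRITICAL",
    message="Mail1/domlab is not responding",  # hypothetical stand-in for vLongMessage
    slots={
        "hostname": "Mail1/domlab",  # hypothetical stand-in for vReportServerName
        "sub_source": "MESSAGINGLOTUS",
        "MyNotify_msg": "Domino mail server outage",
    },
    event_class="MESSAGING_LOTUS",  # assumed to be the event class
    source="MESSAGING",             # assumed to be the event source
)
print(args)
```

The sketch only builds and prints the list. On a server where postemsg.exe is actually installed, the list could be handed to `subprocess.run(args, check=True)`, which passes each element as a separate argument instead of relying on hand-written quotation marks.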

DDM Backs Up Its Discoveries with Explanations
• DDM explains the probable cause, possible solution, and sometimes corrective actions
  • That’s right; actions that will actually correct the problem you’re experiencing
  • These are stored in the EVENTS4.NSF and are configurable by you
• Let’s look for the error “ATTEMPT TO ACCESS DATABASE BY”

Looking in the View, “Event Messages by Text”
• We can find that error message in the EVENTS4.NSF
  • And discover how we might change the report DDM produces

The Cause, Solution, and Corrective Action Are Listed
• This document has all the probable cause, possible solution, and corrective action
  • These are supplied by Lotus and include the code in the corrective action

Click the Link to the Modular Corrective Action
• Clicking the link will take you to the code
  • This could be in formula language or LotusScript

The Modular Corrective Action Is Re-Usable
• At the bottom of the modular action, there is a list of other error text messages that also use this action
  • That same action that was written only a single time can be used as a corrective action multiple times

Modular Documents – Cause, Solution, and Corrective Actions
• Domino 8 comes with over 1,000 modular documents
  • Chances are your solutions are already there for most issues
  • You can use any of the same solutions provided by IBM for your custom solution
  • Or you can add brand new ones

Modular Documents Let You Create Describe Issues
• Modular documents let you add your own probable cause and possible solution text
  • And create corrective actions that are created with formula code and LotusScript agents

You Can Add to the Solutions That Will Display with the Error
• Select the custom entries tab and add the description
• A custom solution of composing an email to the target user can be inserted

Changes the DDM Report
• The modular document now has the “compose an email” choice

It Starts the Email for You
• The code plugs in the user’s name and the database that was
being accessed
  • And it’s all done with modular documents in EVENTS4.NSF

Role in DDM ACL That Will Restrict Who Can Use Actions
• Many events have corrective actions associated with them
  • Only users with the Execute CA role in the DDM ACL are able to access the command actions and the corrective action text and links
  • This ensures that only qualified team members will be able to make the changes

Dealing with Problematic Servers
• Sometimes there are servers with issues that crop up
  • We would like to collect statistics for analysis from these systems more frequently than we do from the standard statistics collection interval
  • If you try to add a second collection interval on a server, you’ll get this:

Each Server Is Allowed to Collect Stats with Only One Interval
• A server can only have one collection interval
  • You must create a second collection document for another server
  • Don’t forget to add the “collect” task to servertasks= in NOTES.INI
• Let’s look at a server that has CPU spikes
• First, we create a statistics collection document for a second server to take statistics from our problem server

Set the Collection Interval for Five Minutes
• Set collection interval for 5 minutes
• Do not check any filters!!!
  • They tell the collector to ignore the statistics you checked
  • Note that stats are being logged to a database called ProblemServer.NSF
  • Used exclusively to track CPU util of Traveler task
  • Note that the data in this example has been fictionalized for effect

Create a Special View That Tracks CPU Utilization for Traveler
• In this case, it’s the Traveler CPU we want to track
• We create a custom view for the collecting database that only has the server name, the time of collection, and the statistic called Platform.Process.Traveler.1.PctCpuUtil
  • This will be used to easily create a graph of the CPU activity

Collect the Data, Copy It as a Table from the Custom View
• After collecting a week’s worth of data, we experience the CPU utilization
• All the data in the view is selected using Ctrl-A
• It is copied as a table
  • Copying views as a table is my favorite feature in Notes
  • A Monitoring Results template is posted on my Web site
  • A URL to this template is included at the end of the presentation

Data Has Been Copied to a Spreadsheet
• A simple paste of the data puts it into a spreadsheet where we are ready to turn it into a chart

Use the Tools in Your Spreadsheet to Create a Graph
• Select the columns Collection Time and Traveler CPU
• Create a graph from the data
  • In this example, a scatter chart type with smooth lines is being used

The Resulting Graph
• This produces an excellent graph of the CPU utilization over a ten-day period with samples being taken at intervals of 5 minutes
  • And it took less than 5 minutes to make this chart
  • One adjustment was made to the x-axis formatting and the legend was removed

Demonstration
• Creating a graph of results from a custom view of collected data

Some Events Occur on the Console, but Not in the Log
• Note: In this example, the server stops reporting at 11:04 pm
• Then, at 11:27 pm, it is back on line
• What happened in the interim?

    Name: Mail1/domlab
    Time: 01/04 11:02:05 PM
    Miscellaneous Events:
    01/04/2013 11:04:17 PM Pulling icl.ntf from Maill2/domlab icl.ntf
    01/04/2013 11:04:31 PM Access control is set in catalog.nsf to not allow replication from BES02/domlab catalog.nsf
    01/04/2013 11:04:31 PM Access control is set in mail2/domlab catalog.nsf to not allow replication from catalog.nsf
    01/04/2013 11:04:33 PM Pulling ddm.nsf from Mail2/domlab ddm.nsf
    01/04/2013 11:04:35 PM Pushing ddm.nsf to Mail2/domlab ddm.nsf
    01/04/2013 11:04:38 PM Finished replication with server Mail2/domlab
    01/04/2013 11:04:43 PM Router: Transferred 1 messages to MAIL2.domlab.COM (host MAIL02.domlabUSA.COM) via SMTP
    01/04/2013 11:04:51 PM Opened session for Mail2/domlab (Release 8.5.2FP1)

    Name: Mail1/domlab
    Time: 01/04 11:27:11 PM - 01/04 11:27:47 PM
    Miscellaneous Events:
    01/04/2013 11:27:11 PM Recovery Manager: Restart Recovery complete. (196/1686 databases needed full/partial recovery)
    01/04/2013 11:27:11 PM Informational - The DAOS catalog is not synchronized. Deletions will be postponed. Please run 'tell daosmgr resync' at the next convenient opportunity to re-synchronize.
    01/04/2013 11:27:12 PM Event Monitor started
    01/04/2013 11:27:12 PM Warning: All Domino Domain Monitoring probes are disabled res

There Is Action in the CONSOLE.LOG
• CONSOLE.LOG and other logs are in the folder called IBM_TECHNICAL_SUPPORT under the data folder
• The CONSOLE.LOG on a server often contains data that has been seen on the Domino server console, but not in the server log
  • It shows there was a Long Held Lock Dump and then a panic!

    Lock(Mode=SIX* LockID(DB DB=G:\Lotus\Domino\Data\mail\web\Complaints.nsf)) Waiters countNonIntentLocks = 1 countIntentLocks = 1, queuLength = 95
    [Req(Status=Granted Mode=IS Class=Manual Nest=0 Cnt=1 Tran=0 Func=N/A m\lkmgr.cpp:159 [0D64:0002-0D60]) rm_lkmgr_cpp:2070 rm_lkmgr_cpp:1306 nsfsem1_c:169 nsfsem1_c:1020 nsfsem6_c:503
    Req(Status=Granted Mode=SIX Class=Manual Nest=0 Cnt=1 Tran=0 Func=N/A inplace.c:153 [099C:0165-12FC])
    LkMgr END Long Held Lock Dump -----------------
    01/04/2013 11:04:51 PM Opened session for Terry Mallory/domlab (Release 8.5.2FP2)
    01/04/2013 11:04:51 PM Closed session for Terry Mallory/domlab Databases accessed: 1 Documents read: 0 Documents written: 0
    The server process terminated abnormally with the exit status = 1. Please send this information and the collected nsd log to IBM Support. This process will now Panic in order to start fault recovery operations.

Why Did This Happen?
• In this case, there was a large number of email messages with big attachments waiting to be processed in the MAIL.BOXES
• The server was relatively underpowered
• Plus, I think the messages were part of an emailing made by a CEO
  • And we all know, the most visible executives have the worst time with any piece of messaging software

Here’s Another Example of Helpful Console Logging
• I entered the following into the Domino server console
    Tell traveler stat show
• That command generates hundreds of lines of statistics and other information
  • It shows clearly on the server console

Here’s Another Reason for Console Logging
• Here’s the Domino server log showing me doing several furious requests to the Traveler task to Tell traveler stat show
• I get nothing

    > tell traveler stat show
    01/06/2013 12:24:49 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
    > tell traveler stat show
    > tell traveler stat show
    01/06/2013 12:24:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
    01/06/2013 12:24:55 PM
    Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
    > tell traveler stat show
    01/06/2013 12:24:55 PM Directory Cataloger finished processing names.nsf: Directory Catalog has no Configuration record
    01/06/2013 12:25:43 PM AMgr: Start executing agent 'PullFromAdmin4' in 'certreq.nsf' by Executive '1'
    01/06/2013 12:25:43 PM AMgr: 'Admin/Servers/DomLab' is the agent signer of agent 'PullFromAdmin4' in 'certreq.nsf'
    01/06/2013 12:25:43 PM AMgr: 'Agent 'PullFromAdmin4' in 'certreq.nsf' will run on behalf of 'Andrew M Pedisich/DomLab'
    01/06/2013 12:25:44 PM AMgr: Start executing agent 'SubmitToAdmin4' in 'certreq.nsf' by Executive '1'
    01/06/2013 12:25:44 PM AMgr: 'Admin/Servers/DomLab' is the agent signer of agent 'SubmitToAdmin4' in 'certreq.nsf'
    01/06/2013 12:25:44 PM AMgr: 'Agent 'SubmitToAdmin4' in 'certreq.nsf' will run on behalf of 'Andrew M Pedisich/DomLab'
    01/06/2013 12:25:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
    > tell traveler stat show

Check the IBM_TECHNICAL_SUPPORT Folder
• CONSOLE.LOG from the IBM_TECHNICAL_SUPPORT folder on the server
• Whenever there are server issues, don’t forget to check the console.log for evidence

    01/06/2013 12:25:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
    tell traveler stat show
    CPU.Pct.000-010 = 7
    ClusterCache.Access = 1
    Constrained.count = 0
    Constrained.state = false
    DB.Connections = 1
    DB.Connections.Idle = 1
    DB.Connections.Max = 7000
    DCA.C.CheckAccessRights = 2
    DCA.C.Count.NSFDbClose = 3
    DCA.C.Count.NSFDbOpen = 3
    DCA.C.Count.NSFNoteClose = 2
    DCA.C.Count.NSFNoteOpen = 2
    DCA.C.HTMLCreateConverter = 1
    DCA.C.HTMLDestroyConverter = 1
    DCA.C.ModDoc.RunCount = 1
    DCA.C.ModDoc.SyncableDocs = 1

Console Logging Configuration
• To start a console log permanently on your servers, add this to the NOTES.INI
    Console_Log_Enabled=1
• Use the following values
    0 – Disable Console Log file logging
    1 – Enable Console
    Log file logging
• You can also toggle logging to the Console Log file from the server console
  • Use the start consolelog and stop consolelog commands
• Obviously, this is an important feature and you’d want it to be enabled all the time
• Set a maximum size of almost 100MB for the console log using the following parameter
    Console_Log_Max_Kbytes=100000

Console Mirroring
• You can also use Console Mirroring, which is slightly different from normal console logging
• Console log mirroring causes a new server thread to be created
  • It monitors all messages written to the Console Log file and duplicates these messages into another file
  • When this file is filled, the thread closes the mirrored file and creates a new file into which subsequent messages are written
• Console log mirroring has three related NOTES.INI settings:
    Console_Log_Mirror=1 – Enables the mirroring feature
    Retain_Mirror_Logs=1 – Prevents deletion of previous mirrors when Domino starts
    Console_Log_Max_Kbytes= – Sets the max size of the Console Log/mirror files

Can You Be an Admin/Dev Person?
• When you’re an admin, there are a lot of reasons to learn LotusScript
  • Write your own agents that gather statistics and monitor servers
• LotusScript lets you ask for a statistic on all of your servers, one by one, then store it in a database and produce alerts and notifications
  • These can be more sophisticated than native Notes monitoring
• The following are two examples of coding that you might find helpful
  • If you have buddies in the Dev side of the house, they might find this interesting
  • Generally, Dev people don’t do applications that help administrators
  • Their focus is on user applications
• These two snippets can give you an idea of the potential you have when dealing with statistics and LotusScript

Gathering Script Using LotusScript Is Easy
• Here’s an agent that simply issues a Domino server console command
  • Then shows you the value in a MessageBox
• It’s pretty cool for 10 lines of code

    Sub Initialize
        Dim session As New NotesSession
        Dim vServerName As String
        Dim vConsoleCommand As String
        Dim vConsoleReturn As String
        vConsoleCommand = "sho stat server.trans.total"
        vServerName = "admin/domlab"
        vConsoleReturn = session.SendConsoleCommand(vServerName, vConsoleCommand)
        MessageBox(vConsoleReturn)
    End Sub

The Mail.TotalPending Statistic
• This stat was introduced in Release 5, and I use it all the time in monitoring servers for mail backing up
• From SPR# BSAW4HFMPY
  • www-304.ibm.com/support/docview.wss?uid=sim43d86a0d3e79e0e6785256a8500737f2b
• Added a new Mail.TotalPending statistic that shows the count of messages pending in mail.box
  • This statistic is updated once every 5 minutes by the Server task
  • Does not depend on the Router task for updates
  • Provides information about total backlog of mail in the event that the router is hung or not started
• High value indicates that a mail routing problem needs further investigation

Here’s a Similar Code Snippet That Gets Total Pending Mail
• This is from a much larger agent that runs every 5 minutes on 70
  servers
• Remember, LotusScript lets you issue console commands
  - Then take the results of the command and take other actions
• Our job is to parse out the number 130 from the show stat command
  - show stat mail.totalpending
• We’re grabbing the stat Mail.TotalPending, which looks like this on the console

    Mail.TotalPending = 130
    1 statistics found

Here’s the Meat and Potatoes
• The console returns

    Mail.TotalPending = 130
    1 statistics found

• The reply is then parsed so that only the number is grabbed

    vLocStart = InStr(1, vConsoleReturn, "=", 5) + 2
  - Gives the position 2 characters past the = sign, where the number starts

    vLocEnd = InStr(1, vConsoleReturn, Chr(13), 5) - vLocStart
  - Despite its name, this is the length of the number: the position of the first carriage return, Chr(13), minus vLocStart

    vStatStr = Mid(vConsoleReturn, vLocStart, vLocEnd)
  - That’s the number as a string, which is then converted to a number

Here’s the Meat and Potatoes (cont.)
• The console returns

    Mail.TotalPending = 130
    1 statistics found

• Here’s a snippet of code that gets you the mail.totalpending statistic

    vConsoleCommandPending = "sh stat mail.totalpending"
    ' Let's ask the console how many messages are pending
    vConsoleReturn = session.SendConsoleCommand(vServerName, vConsoleCommandPending)
    vLocStart = InStr(1, vConsoleReturn, "=", 5) + 2
    vLocEnd = InStr(1, vConsoleReturn, Chr(13), 5) - vLocStart
    vStatStr = Mid(vConsoleReturn, vLocStart, vLocEnd)
    'Print "Pending: " + Str(vMailTotalPending) + " Pending: " + vStatStr
    vMailPending = Val(vStatStr)

LotusScript and Monitoring/Alerting – A Great Pair of Tools
• You get the advantage of automation with the power of monitoring and alerting
• Stop issues before they become problems
• Don’t forget to download the custom statrep
  - Technotics Statrep 8.5.3 from: www.andypedisich.com

Where to Find More Information
• www-01.ibm.com/support/docview.wss?uid=swg27008849
  - Notes/Domino Best Practices: Performance (IBM, 2010).
• www-10.lotus.com/ldd/__00256C3E0030650D.nsf/0/1F2EBFCA1F35CA71852571DB00618159?Open
  - Harry Peebles, “Domino Domain Monitoring (DDM) Educational Resources” (IBM, 2006).
• www-01.ibm.com/support/docview.wss?uid=swg21293213
  - How Does the notes.ini File Parameter ‘server_session_timeout’ Affect Server Performance? (IBM, 2010).
• www.ibm.com/developerworks/lotus/library/domino-servercrashes/
  - Kiran Bellari, “Troubleshooting Lotus Domino Hangs and Crashes” (developerWorks, 2006).

7 Key Points to Take Home
• Write your own program in LotusScript or formula language and add it to DDM’s corrective actions
• Collect statistics from problem servers by creating a second collecting server in your domain
• Console logs collect everything that happens on the console, including messages from tasks and from NOTES.INI debug parameters
• Check the replica ID for the EVENTS4.NSF in your domain to ensure it is the same on all servers

7 Key Points to Take Home (cont.)
• Full Administrator Access is a powerful tool that should be monitored for proper usage
• Event handlers can notify you about any message that appears on the console
• Email is the most widely used notification system, but it is also the riskiest

Thank You for Attending Our Session!
• Please don’t forget to fill out your evaluations. We read them all!
• Please feel free to stop us to ask questions or just to have a pleasant conversation
• Contact us!
  - Andyp@technotics.com
  - www.technotics.com
  - www.andypedisich.com
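
Backup: Wrapping the Stat Parsing in One Function
The parsing steps from the Mail.TotalPending slides can be collected into a single reusable function. What follows is a minimal sketch under stated assumptions, not the agent used in the session: the function name GetStatValue, the -1 "not found" return value, and the threshold of 100 in the usage example are all hypothetical and chosen only for illustration. It relies only on NotesSession.SendConsoleCommand and the standard LotusScript string functions, and it assumes the first "=" in the console reply belongs to the requested statistic.

```lotusscript
' Hypothetical helper (not from the original agent): returns the numeric value
' of one statistic on one server, or -1 if the console reply contains no "=".
Function GetStatValue(session As NotesSession, serverName As String, statName As String) As Double
	Dim reply As String
	Dim valueText As String
	Dim eqPos As Long
	Dim eolPos As Long

	GetStatValue = -1
	reply = session.SendConsoleCommand(serverName, "show stat " & statName)

	eqPos = InStr(reply, "=")
	If eqPos = 0 Then Exit Function          ' statistic not found or unexpected reply

	valueText = Mid$(reply, eqPos + 1)       ' everything after the equals sign
	eolPos = InStr(valueText, Chr$(13))      ' cut at the carriage return if there is one
	If eolPos = 0 Then eolPos = InStr(valueText, Chr$(10))   ' otherwise at the line feed
	If eolPos > 0 Then valueText = Left$(valueText, eolPos - 1)

	GetStatValue = Val(Trim$(valueText))     ' text-valued statistics come back as 0
End Function

Sub Initialize
	Dim session As New NotesSession
	Dim pending As Double

	' Server name taken from the earlier slide; the threshold is an assumed example value
	pending = GetStatValue(session, "admin/domlab", "mail.totalpending")
	If pending > 100 Then
		Print "Mail may be backing up, pending messages: " & CStr(pending)
	End If
End Sub
```

Searching for the end of the line only in the text after the equals sign avoids a wrong result when the reply starts with a blank line, and checking for a missing "=" keeps the agent from failing on a server that does not return the statistic.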