Best Practices for Domino Server and Application Tuning
Andy Pedisich
Technotics
© 2012 Wellesley Information Services. All rights reserved.

What We’ll Cover …
• Tuning hardware and OS
• Optimizing Domino server performance
• Examining opportunities in on-disk structure (ODS)
• Keeping applications under control
• Mastering cluster replication
• Dealing with database corruption
• Resolving specific problems with databases
• Wrap-up

Keep Up with Domino Fixpacks and Releases
• Use this link to find out what’s new
  • www-10.lotus.com/ldd/r5fixlist.nsf/WhatsNew
  • In some cases, this will take you to a “Top 20 Fixes” for a new release
  • Granted, reading all this material can cure anyone’s insomnia, but someone has to do it and it might as well be you
• Lots of Domino shops like to lag a bit when it comes to fixpacks
  • Why do I need to keep up?
  • “I didn’t see anything that might affect us”
• Here’s a good example of why you might want to keep up with the fixpacks, even if you didn’t see a problem in your environment

Running Domino on Windows 2008 64-Bit
• Windows 64-bit introduced a new problem with Domino
  • Microsoft Windows 2008 64-bit servers sometimes have significantly increased CPU usage and I/O degradation when Lotus Domino opens or backs up large numbers of databases
  • www-01.ibm.com/support/docview.wss?uid=swg21449825
• I personally saw one case where we couldn’t seem to put enough RAM into the system
  • Server started running at 100% RAM utilization, and stayed that way
  • It wasn’t until we were in the 16GB range that the utilization dropped down to 85%
  • No users were on the Domino server at the time
  • Not everyone would even see this problem

Virtual Address Space Becomes Exhausted
• The Virtual Address Space cache may be completely used up
  • Successive calls to the OS cache manager to get memory from the OS system cache result in mapping/un-mapping of views from the system cache
  • These operations take a lot of CPU time and, as a result, show as high OS CPU usage
  • In addition, the large OS
system cache may now reside on the disk
  • RAM is not large enough to hold the OS system cache
• The result is significant I/O on the system
  • This occurs with Domino 8.5.2

You Might Need a Hotfix and a Domino Parameter
• Domino opens databases with a RANDOM flag
  • FILE_FLAG_RANDOM_ACCESS
  • In Windows 2008 64-bit, this flag causes file blocks that are read to stay in the cache until the file is closed
  • Domino keeps files open in the Database Cache (dbcache) for performance reasons
  • It takes quite a long time until the cache is released

Parameter Needed for Release 8.5.2 FP2 and a Hotfix
• SPR #KBRN899NF6 and a hotfix provide a notes.ini variable to disable FILE_FLAG_RANDOM_ACCESS
  • Once you have installed the hotfix, use this parameter
  • Disable_Random_RW_File_ATTR=1
• It is fixed in Domino 8.5.2 FP3 and 8.5.3
  • It’s another great reason to keep up with fixpacks and new releases
  • But you’re still going to need a lot more memory running on Windows 2008 (R2 also)
• SPR# KBRN8AKKA9 – Fix to better improve performance when opening files on Windows 64-bit platform

Keep Disks Unfragmented
• Many administrators falsely believe that Domino does not suffer from fragmented files on disk
  • Fun fact: Domino uses smaller allocations for new documents
  • This can cause files to be spread out across the disk, which can cause performance issues, especially during backups
  • The system has to hunt for all the sectors spread everywhere on the disk
• Defragment once per week when the server is not busy
• There are several Windows tools, such as:
  • Contig V 1.6
  • It’s a free tool from Microsoft
  • http://technet.microsoft.com/en-us/sysinternals/bb897428

A Free Defrag Tool for Domino that Uses Contig 1.6
• Domino Defrag 3.2 OpenNTF Project
  • www.openntf.org/internal/home.nsf/project.xsp?action=openDocument&name=DominoDefrag
• An open source solution consisting of an R853+ C API Lotus Domino server task (DominoDefrag.exe) and an R853+ Lotus Domino server XPages database called the DominoDefrag Administrator
  • DominoDefragAdmin.nsf – relies on http://extlib.openntf.org/
  • Server task uses “contig.exe” (v1.6) to defrag Domino databases on all Windows Server 2003-2008 versions (32-bit and 64-bit)
  • It will also defrag a full-text index associated with a Notes database, the Domino server’s transaction log, and DAOS files

A NOTES.INI Parameter Improves the Product
• DominoDefrag_EnterpriseSupport=1 (on)
  • Output is recorded to CSV files, and sent to the DominoDefrag Administrator for processing attached to a summary email
• Has the added functionalities:
  • Being able to compact a database prior to defragging
  • Supports multi-processing (can load multiple times to run concurrently) and use of an indirect file (.ind) for compact batch functionality
  • Performance checks can also be tested using generated document collections
  • This will help to determine the “before and after” defrag millisecond read performance of databases and their associated full-text indexes

A General File System Recommendation for All OS
• Keep at least 30% free space available on all drives
  • This allows the file system to optimize where to write data
  • Helps to reduce file fragmentation
• Keep file systems below 1GB on all platforms
  • This helps performance, and makes disaster recovery faster and simpler
  • You might have to split your data up to fit the smaller volumes
  • The payback will come from better performance for mail and applications
• Admittedly, it is harder to have smaller volumes with mail files than with applications
  • We like keeping all mail in one folder, don’t we?

Working With the Server Availability Index (SAI)
• Did you ever track an SAI and notice that a server never really seemed to be available?
  • Or maybe you never tracked an SAI before
  • You can, with our special Statrep database
  • TechnoticsR85Statrep.ntf
  • Download free from www.andypedisich.com

The Stats Are There, Now You Can See Them
• It has all the views that are on the original Statrep
  • Plus over a dozen additional views to help you analyze the stats your servers generate

The SAI Is Fixed in R8.5
• It was broken for many years
• SAI calculation on fast servers still might not work for you
  • There is a routine called LOADMON that runs on Domino and stores values in a LOADMON.NCF file on the server
  • It compares access times using micro-seconds
  • On a fast server, at off-peak times, transactions can take just a few micro-seconds
  • For normal servers, the SAI can sometimes look low

The Expansion Factor
• Servers determine their workload based on the expansion factor
  • This is calculated based on response times for recent requests
  • Server compares recent response time to the minimum response time that the server has completed
  • Example: Server currently averages 12ms for DBOpen requests; minimum time was 4ms
  • Expansion factor = 3 (current time/fastest time)
  • This is averaged over different types of transactions
  • Fastest time is stored in memory and in LOADMON.NCF
  • LOADMON.NCF is read each time server starts

Delete LOADMON.NCF When the Server Starts
• Delete LOADMON.NCF when server is down to delete old minimum values
  • Do this with a scripted start under the Windows platform
  • Delete LOADMON.NCF before Domino starts
• You can still do it on the Linux platform for free
  • Nash!Com has a start script for free
  • www.nashcom.de/nshweb/pages/startscript.htm
  • The link has a list of all changes
  • Plus a link where you can request the script from Daniel
  • Daniel is one of the smartest Domino administrators I have met in my entire career
  • Linux/Unix start script can delete LOADMON.NCF automatically

The Expansion Factor
• But sometimes, Domino has a difficult time calculating the expansion factor
• The result is that the
Server_AvailabilityIndex is not a reliable measure of how busy the server is
  • This can happen with extremely high-performing servers
  • If you see a very low Server_AvailabilityIndex at a time you know servers are supposed to be idle and you are trying to load balance, there is something you can do to correct it
  • And Domino can help!

Changing Expansion Factor Calculation
• Use this parameter to change how the Expansion Factor is calculated
  • SERVER_TRANSINFO_RANGE=n
• To determine the optimal value for this variable:
  • After the server has experienced heavy usage, use this console command: Show AI
  • This means, show the availability index calculation
  • It has nothing to do with that 2001 Steven Spielberg movie, about the robot that looks like a child and tries to become a real boy

An Easy Way to Find the Parameter Value
• Show AI is a console command that has been around since Domino Release 6
  • It runs some computations on the server
  • And suggests a SERVER_TRANSINFO_RANGE for you

Platform Disk Statistics
• The disk specification will vary by server
• Platform.LogicalDisk.1.AvgQueueLen
  • AvgQueueLen: The average number of both read and write requests that were queued for all logical disks on all physical disks during the sample interval
  • Should not consistently rise above 2
• Platform.LogicalDisk.1.PctUtil
  • PctUtil: Percent of time the drives are busy reading or writing
  • Watch for disks constantly hitting above 80%
• Track both of these statistics in Notes with the new Statrep
  • Follow up with performance monitoring on the OS level

Change the View Temp File Default Folder
• By default, Domino generates temp files in the server’s temporary folder when it rebuilds a view
  • Directory used by update/updall tasks for rebuilding indexes
  • The default is usually somewhere on the system drive C: when using Windows servers
• If the system doesn’t have a temp folder, Domino puts the temp files in the Domino data folder
• Because of the disk I/O and disk space required, you should change
the location to a different drive
  • Not your Domino data drive, or your transaction log drive, or your OS drive, or your DAOS file system
  • For maximum performance, it should be on its own drive

Make Sure There Is Plenty of Space Available
• Use this parameter:
  • VIEW_REBUILD_DIR=(drive and folder location)
• Make sure you have plenty of space available
  • The performance increase is worth the trouble
  • If Domino calculates that there isn’t enough space on the temporary folder’s drive, it uses a slower method to rebuild the view
  • You’ll see the message below in the log and console
  • It’s best to remedy this with more disk space, or performance will actually drop

Warning: Unable to use optimized view rebuild for view due to insufficient disk space at directory. Estimate may need x million bytes for this view. Using standard rebuild instead.

Anti-Virus Software on Domino Servers
• I hate running AV software as a Domino task
  • Many shops have stopped using it because malicious software is caught with perimeter software or desktop software
• If you must run OS platform AV software, remember to exclude:
  • Domino data directory
  • Transaction log drive
  • TMP directory
  • DAOS drive
  • View rebuild directory

Use Transaction Logging
• Transaction logging can increase performance significantly
• Enable transaction logging in the server document
  • T-Logs might already be in use in Archive logging style if servers are backed up incrementally
  • Otherwise, use the Circular logging style so that transaction logging reuses space
  • But be careful where you put the logs

Choices to be Made by Administrators
• You’ll need to decide whether to configure the transaction logs to create more or fewer checkpoints
  • To
record a recovery checkpoint, Domino evaluates each active logged database to determine how many transactions would be necessary to recover each database after a system failure
  • Then, it creates a recovery checkpoint record in the transaction log that lists each open database and the starting point transaction needed for recovery

Runtime/Restart Performance
• Your choices are:
  • Standard (default and recommended)
    • To record checkpoints regularly
  • Favor runtime
    • To record fewer checkpoints
    • Requires fewer system resources and improves server runtime performance, but causes more of the log to be applied during restart
  • Favor restart recovery time
    • To record more checkpoints
    • This option improves restart recovery time because fewer transactions are required for recovery

Location of Transaction Logs
• Transaction logs work best if placed on RAID 1 disks
  • These are mirrored drives
  • And should be local to the server
• These logs should not be placed:
  • On the Wintel system drive C:
  • On the same drive as the Domino data
  • On a SAN drive

Disconnect Idle Users
• An idle user stays connected to a server for 4 hours
  • This takes up valuable server resources
• Use this parameter to drop idle users faster
  • SERVER_SESSION_TIMEOUT=(number of minutes)
  • Users will not have to re-enter a password if they become active after the time limit
  • The minimum recommended setting is 30-45 minutes
  • A lower setting may negatively impact server performance
• IBM/Lotus says it’s not needed in R8
  • But I like to use the parameter regardless
  • It gives you more realistic user concurrency stats

1,000 Users – Server_session_timeout=60
• Comparison of memory usage on a Domino server

650 Users – Server_session_timeout=30
• Domino server memory comparison with and without the parameter set to 30

650 Users – Server_session_timeout=30 (cont.)
• CPU utilization comparison with and without the parameter

Disable HTTP Server Logging
• We’ve found many instances where DOMLOG.NSF was well over 2GB
  • And it was nearly impossible to wait for it to open
  • Because it had never actually been opened before
• If you don’t look at the logs, improve performance by disabling the HTTP server logging
  • It’s in the HTTP section of the server document
  • Disable both the Enable Logging and Domlog.nsf settings

Don’t Maintain Read Marks on All Databases
• Replication of unread marks was primarily designed for mail databases
• If you don’t need them, don’t replicate them, because it can significantly slow database performance
  • For example, keep them switched off in Help, LOG.NSF, NAMES.NSF, and any reference application
  • Work with your developers to develop standards for enabling or disabling the feature

Plan on a Monthly Restart for Domino Servers
• Consider regular monthly restarts of Domino servers
  • Not just Wintel-based servers, all servers
• Server memory allocation and shared memory fragmentation can occur over time
  • Plus, there could be undocumented memory leaks
• Regular restarts will help ensure your Domino servers are running as efficiently as possible

Keep as Few Documents in Inbox as Possible
• We all know large mail files are a problem, right?
  • This is true, if only from the perspective of disk space
  • But the issue is bigger than just disk space
  • And here’s the proof you can take back to your domain
• IBM/Lotus did a study using Domino on the iSeries called:
  • Sizing Large-Scale Domino Workloads on iSeries
• They found that reducing the number of documents kept in the inbox:
  • Reduces overall CPU usage
  • Improves response time
  • And can dramatically improve startup/recovery performance

It’s Very Logical When You Think About It
• In terms of performance, the Inbox is the most “expensive” container in a mail file
  • The Inbox folder contains all new messages a mail file receives
  • It must be updated each time a user opens the file
  • Or clicks Refresh to see new mail
• The more documents kept in the Inbox folder, the more expensive it is to refresh the view of it
  • Reducing the number of documents in the folder reduces the CPU and main storage required to update the view of it

What Can You Do About It?
• Two things you can do about this problem
  • First, when a user calls and says that Notes is slow, ask this question:
  • How many messages are in your inbox?
  • This should be a standard part of your help desk response
  • Urge them to keep no more than 90 days of mail in the inbox
  • Use NOTES.INI parameters on the Notes client to demonstrate how indexing the inbox is a major problem
    • CLIENT_CLOCK=1
    • Debug_Console=1

Use Release 8.x Inbox Manager
• Second, control the number of messages in the inbox using settings in the AdminP section of the server document
  • AdminP can start an agent in the user’s mail file to remove messages from the Inbox
• This can also be controlled from policies
• The messages are not deleted
  • They are still in the All Documents view
  • Users need to know where the messages can be found

Control User Polling for New Mail
• Some users want to know if they have new mail
  • They configure a user preference to check for new mail every couple of minutes
• If there are a lot of users on a server, a setting like this can really hurt performance

Override the User Configuration for New Mail Polling
• Add this parameter to the mail server’s NOTES.INI to control how often a client can check for new mail
  • MinNewMailPoll=(number of minutes)
  • Experiment with this number, but 15 is safe
• This parameter overrides the user’s selection in the Mail Setup dialog box
  • This can prevent frequent polling from affecting server performance
• Parameters like this one should be in every server’s NOTES.INI
  • That’s why they belong in a server configuration document

Port Compression
• Enable network port compression!
  • This is especially good for server-to-server communication
• Must be enabled on server
  • Client should be enabled using policies
• Up to 60% compression of data

There Is a New On-Disk Structure for Domino 8
• The term On-Disk Structure (ODS) describes the internal architecture of Notes databases
  • Each new release, except ND7, has included an update to the ODS to accommodate new features and functions
• Domino 8 includes a new On-Disk Structure, ODS48

Design Compression Saves Space
• Design compression reduces the size of databases by compressing design elements by up to 60%
  • It will shrink the standard Notes 8 mail template MAIL8.NTF from 25MB to 11MB
  • The compression percentage achieved will vary from database to database
  • This is based on the compression ratio achieved for each design element in each application

Enabling Design Note Compression
• The design compression switch is available on the Advanced tab of the properties of applications with ODS43 and ODS48
  • You must be using the Notes 8 client to see the option
  • However, the compression will not occur unless the application is subsequently upgraded to ODS48
• Once enabled, the Design Compression setting replicates to other replicas of the application
  • Keep in mind that the ODS itself does not replicate

Your ODS By Default Is 43
• When a new application is created in a Lotus Notes 7, 8, or 8.5 client or on a Lotus Domino 7, 8, or 8.5 server, the on-disk structure (ODS) remains at 43
• The on-disk structure has been upgraded in Notes/Domino 8.5 to the new ODS version of 51
  • Add the following parameter to the NOTES.INI on the server or client to use ODS 51:
  • CREATE_R85_DATABASES=1

Use Compact –C to Upgrade to New ODS
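A minimal sketch of the two steps behind the ODS upgrade, assuming NOTES.INI is available as plain text: check whether CREATE_R85_DATABASES=1 is present, then build the copy-style compact console command. The helper names and the sample database path are hypothetical, and treating the parameter name as case-insensitive is an assumption.

```python
def new_database_ods(ini_text: str) -> int:
    """Return the ODS that new databases get on 8.5, per the slides:
    51 when CREATE_R85_DATABASES=1 is set, otherwise 43."""
    for line in ini_text.splitlines():
        key, sep, value = line.partition("=")
        # Assumption: parameter names are matched case-insensitively.
        if sep and key.strip().upper() == "CREATE_R85_DATABASES":
            return 51 if value.strip() == "1" else 43
    return 43


def compact_upgrade_command(db_path: str) -> str:
    """Build the console command for a copy-style compact (-C)."""
    return f"load compact {db_path} -C"


# Hypothetical sample values for illustration only.
sample_ini = "[Notes]\nCREATE_R85_DATABASES=1\n"
print(new_database_ods(sample_ini))
print(compact_upgrade_command("mail\\jdoe.nsf"))
```

The command string is only built here, not sent to a server; how it reaches the Domino console is left to whatever tooling you already use.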
• Yes, it must be a compact –C, –B will not work
  • Makes it easy to plan the ODS upgrade
  • Low risk, no problems have been seen
• Besides the “compress database design” option from ODS 48 in advanced properties, it gives you options to turn on:
  • Compression of non-summary data
  • Use Domino Attachment and Object Service (DAOS)

Making Applications Behave
• You’re not a developer, you’re an administrator
  • What can you do to help applications stay under control?
• The biggest complaints about agents that run applications are:
  • The agents run too long
  • The agents consume vast amounts of memory
  • The agents utilize too much CPU on the server
  • And all of these complaints are usually made anecdotally
  • They are in conversations heard in elevators or around water coolers
  • Are there still water coolers for people to stand around, gossiping?

Domino Domain Monitor Probes
• One way to scientifically prove when agents consume extraordinary resources is to use application probes in DDM
  • These are set up in the Monitoring Configuration Database
  • That’s EVENTS4.NSF
• Note that you can track agents by how long they run, by how far behind schedule they are, by CPU utilization, and by memory usage

Long Running Agents
• Every administrator knows that you can set a maximum agent execution time in server documents
  • You could just set it for 1,440 minutes and allow agents to run all day long
• How do you know how long agents really run?
  • Just ask the developer!
  • They are very honest, hardworking people, for the most part

Find the Truth
• You can set up a probe to monitor agents and report back to DDM if an agent ran longer than a time you think is reasonable
  • For example, 4 hours or 240 minutes
  • You can monitor agent manager or the HTTP process
  • DDM will report to the DDM database when an agent runs longer, and will report it as an event of Fatal severity
• Or you can set up a probe that monitors memory utilization

De-Mystify the Situation
• The probe will report back to the DDM database
• You will have actual data rather than water cooler data
• You can make an intelligent choice about agents and resources

Full-Text Indexing for Searches
• Should all servers be able to update full-text indexes? NO!
  • FTI uses disk resources – adds 25%-45% to DB size
  • FTI requires CPU and memory resources
• Only enable FTI where it is absolutely necessary
  • Such as mail and application servers where users require it
• Disable full-text index building on hubs, gateways, and any other server that does not have the requirement
• Use Notes.ini parameter Update_No_Fulltext
  • Set to 1 to prevent FTI builds
  • Set to 0 to allow FTI builds

Simple Search Is Simply Awful Sometimes
• Simple search is the type of processing used when a user searches a non-full-text indexed application
  • The simple search algorithm does the job, but is not very efficient
  • It can significantly impact performance on a Domino server
  • For some applications, the ability to search documents may not really be necessary
  • However, the default functionality still allows users to do simple searches on applications that are non-full-text indexed

Preventing Simple Searches
• Administrators can now prevent simple searches if an application is not full-text indexed
  • Enable this by selecting “Don’t allow simple search” on the Advanced tab of Database Properties

Preventing Simple Searches (cont.)
• If users attempt to simple search a database with this option enabled, they will receive an error message
  • This will probably generate a few help desk calls
  • Be prepared by providing info about this feature, if you’re going to deploy it

Property Doesn’t Replicate
• Keep in mind that the “Don’t allow simple search” property does not replicate for existing database replicas
  • This lets you decide selectively whether each replica should have the setting enabled
  • The setting is carried over to new replicas and copies

Properties and How They Affect the Environment
• Database properties that impact performance and that should NOT be set by the developer (these are up to you)
• They in no way impact the behavior of the app, but they do impact the behavior of the server or client

Database Settings for Optimal Performance
Property | Tab | To optimize performance/size | Improves database performance? | Reduces database size? | Set By Administrator or Developer
Document table bitmap optimization | Advanced | Select option | Yes | No | Admin
Don't overwrite free space | Advanced | Select option | Yes | No | Admin
Disable Transaction Logging | Advanced | Depends on type of application | | | Admin
Maintain LastAccessed property | Advanced | Deselect option | Yes | No | Admin
Use LZ1 Compression for Attachments | Advanced | Select the option only if ALL elements of environment are ND6 | | Yes | Admin
*Original Table from Domino Administrator Help – Modified by Technotics

Database Settings for Optimal Performance (cont.)
Property | Tab | To optimize performance/size | Improves database performance? | Reduces database size? | Set By Administrator or Developer
Allow use of stored forms in this database | Basics | Deselect option | Yes | Yes | Developer
Display images after loading | Basics | Select option | Yes | No | Developer
Don't maintain unread marks | Advanced | Select option | Yes | Yes | Developer
Don't support specialized response hierarchy | Advanced | Select the option | Yes | Slightly | Developer
Don't allow headline monitoring | Advanced | Select the option | Prevents performance degradation | No | Developer
*Original Table from Domino Administrator Help – Modified by Technotics

Design Elements That Adversely Impact Performance
• @DbLookup/@DbColumn
  • Excessive numbers of these will degrade your server’s performance, as well as that of the client
  • This especially applies to applications that will be accessed from a browser
• WebQueryOpen/WebQuerySave Agents
  • These are agents that are triggered anytime a form is opened or saved from the Web
  • They execute on the server and can crush your performance
  • Make sure you do performance testing WITH LOAD before deploying

Understanding Cluster Replication
• Cluster replication is event driven
  • It doesn’t run on a schedule
  • The cluster replicator detects a change in a database and immediately pushes the change to other replicas in the cluster
• If a server is down or there is significant network latency, the cluster replicator stores changes in memory, so it can push them out when it can
  • If a change to the same application happens before a previous change has been sent, the CLREPL gathers them and sends them all together

Only One Cluster Replicator by Default
• When a cluster is created, each server has only a single cluster replicator instance
• If there have been a significant number of
changes to many applications, a single cluster replicator can fall behind
  • Database synchronization won’t be up to date
  • If a server fails when database synch has fallen behind, users will think their mail file or app is “missing data”
  • They won’t understand why all the meetings they made this morning are not there
  • They think their information is gone forever!
  • Users need their cluster insurance!

Condition Is Completely Manageable
• Adding a cluster replicator will help fix this problem
• You can load cluster replicators manually, using the following console command:
  • Load CLREPL
  • Note that a manually loaded cluster replicator will not be there after the server is restarted
• Add cluster replicators permanently to a server
  • Use this parameter in the NOTES.INI:
  • CLUSTER_REPLICATORS=#
• I always use at least two cluster replicators

When to Add Cluster Replicators
• But how do you tell if there’s a potential problem?
  • Do you let it fail and then wait for the phone to ring? No!
  • You look at the cluster stats and get the data you need to make an intelligent decision
• Adding too many will have a negative effect on server performance
• Here are some important statistics to watch

Key Stats for Vital Information About Cluster Replication
Statistic | What It Tells You | Acceptable values
Replica.Cluster.SecondsOnQueue | Total seconds that last DB replicated spent on work queue | < 15 sec – light load; < 30 sec – heavy
Replica.Cluster.SecondsOnQueue.Avg | Average seconds a DB spent on work queue | Use for trending
Replica.Cluster.SecondsOnQueue.Max | Maximum seconds a DB spent on work queue | Use for trending
Replica.Cluster.WorkQueueDepth | Current number of databases awaiting cluster replication | Usually zero
Replica.Cluster.WorkQueueDepth.Avg | Average work queue depth since the server started | Use for trending
Replica.Cluster.WorkQueueDepth.Max | Maximum work queue depth since the server started | Use for trending

What to Do About Stats Over the Limit
• Acceptable Replica.Cluster.SecondsOnQueue
  • Queue is checked every 15 seconds, so under light load, it should be less than 15
  • Under heavy load, if the number is larger than 30, another cluster replicator should be added
• If the above statistic is low and Replica.Cluster.WorkQueueDepth is constantly higher than 10 …
  • Perhaps your network bandwidth is too low
  • Consider setting up a private LAN for cluster replication traffic

Stats That Have Meaning but Have Gone Missing
• There aren’t any views in the Lotus version of Statrep that let you see these important statistics
  • Matter of fact, the Cluster view is pretty worthless
  • It lacks the key cluster statistics you need to make decisions

Stats That Have Meaning but Have Gone Missing (cont.)
• But there is a view like that in the Technotics R8.5 Statrep.NTF
  • It shows the key stats you need
  • To help track and adjust your clusters
  • Download from my blog www.andypedisich.com

Use a Scheduled Connection Document, Also
• Back up your clustered replication with a scheduled connection document between servers
  • Have it replicate at least once per hour
  • You’ll always be assured to have your servers in sync, even if one has been down for a few days
  • And it replicates deletion stubs, too!

Don’t Forget About Silent Failover
• Was a parameter you could set in R8.5.2
  • FailoverSilent = 1
• Now available in a desktop policy settings document
  • Client will silently fail over to a different server if the current server is no longer operational
  • No confusing prompts
  • Best practices = set to 1

What Causes Corruption?
• Lots of changes to a database
  • The more changes to a database, the greater your chance for corruption
  • Until 7.0.2, a database could deal with no more than 30 million Notes IDs in its lifetime – think a high-volume mail.box
• Frequent view and Full-Text Index (FTI) refreshing or rebuilding
  • Consider a 20 Gig mail file with FTI update frequency of “immediate”
• Insufficient hard drive space
• Third-party apps improperly set up to “lock” open DBs

What Causes Corruption? (cont.)
• Read-only databases or views
• Partially-written transactions or changes
• Agents running against non-existent views
• Running defrag on the OS with Domino running
• The servers/agents’ lack of access to design elements
• And many more …

How Do I Find Corruption?
• Domino log (Log.nsf) review
• Server console
• Domino Domain Monitoring Database (DDM.NSF)
• Ad hoc – your phone rings/you get a ticket
  • Most likely way to find corruption, if you don’t have proactive monitoring set up for keywords, such as:
    • “corrupt”
    • “RRV-bucket”
    • “b-tree”
  • Corruption may prevent access to applications, cause phantom data to “appear” in a view, or generate error messages for end users

The Three Commands That Fix Most Issues – Fixup
• Fixup does a great job, but be careful
• Resolves inconsistencies resulting from partially-written operations, including improperly closed databases
  • “This database cannot be opened because a consistency check of it is in progress”
  • Is not needed when transaction logging is enabled
  • Takes open databases offline for duration of task
• Is the only “destructive” maintenance task
  • “Removes” corrupt data elements!
  • Does not leave a deletion stub
  • Requires a replica somewhere to replace removed corrupted documents

Fixup Options
• Fixup -L
  • Logs every database that fixup opens and checks
  • Without this, only encountered problems are logged
• Fixup -N
  • Prevents fixup from “removing” corrupted documents
  • Use this to salvage data if there are no other replicas
• Fixup -V
  • Prevents fixup from running on views
  • Reduces fixup runtime
• Fixup -C
  • Verifies the integrity of the database and reports errors
  • Does not purge corrupted documents
• For more on fixup switches, see Administrator Help

The Three Commands That Fix Most Issues – Compact
• Compact
  • Upgrades the On-Disk Structure (ODS) of a database
  • Allows disk space to be re-used after documents and attachments are deleted from a database
  • Removes documents from a database if archiving to a server is set up via policies
  • Comes in three styles
    • In-place with space recovery
    • In-place with space recovery and reduction in file size
    • Copy-style compacting

The Three Compact Styles
• In-place with white space recovery but no file size reduction
  • Retains Database Instance ID (DBIID)
  • Important for transaction logging
  • The database can be accessed while this runs
  • Default if no switch is used
  • This is the same as compact -b (lowercase; the switch is case-sensitive)
• In-place with space recovery and reduction in file size
  • Assigns a new DBIID
  • Only appropriate for transaction-logged servers if incremental differential back-up software is used
  • Also known as compact -B (uppercase)
  • More resource-intensive and slower than “compact”

The Three Compact Styles (cont.)
• Copy-style compacting
  • Meaning compact -C
  • Creates a copy and then deletes the original
  • Requires sufficient disk space
  • Assigns new DBIID
  • Does not allow access to the DB while this runs
  • DB access can be granted by adding -L
  • BUT if DB changes, compact is cancelled

Compact Options
• Compact -S 15
  • Compacts DBs with 15% or more unused space
• Compact -R
  • Compacts without conversion to current Domino release
  • Uses copy style
• Compact -D
  • Discards built view indexes and runs a copy-style compact
• Compact -A
  • Archives and deletes documents, then compacts DB
• For more compact switches, see Administrator Help

The Three Commands That Fix Most Issues – Updall
• Updall
  • Updates or rebuilds view indexes or Full-Text Indexes (FTIs), including corrupt ones
  • Purges deletion stubs from DBs and discards view indexes
    • By default, view indexes remain for 45 days
    • Use the Notes.ini setting Default_Index_Lifetime_Days to change when updall discards unused view indexes
  • Is the “as needed” version of UPDATE
    • Does not run continuously
  • Included in the Notes.ini setting ServerTasksAt2
    • More on why you may not want this setting later

Updall Options
• Updall database.nsf -T $Servers
  • Updates a specific view
• Updall -X
  • Rebuilds full-text indexes, but not views
  • Use to fix FTI corruption
• Updall -R
  • Rebuilds all used views
  • Use to fix corruption
• For more updall switches, see Administrator Help

How Do I Prevent Corruption?
• Avoid conflicting/overlapping maintenance tasks
• Implement maintenance program docs
• Set up third-party apps appropriately
• Set database quotas
• Control attachment sizes
• Monitor disk space availability
• Don’t allow immediate full-text index updates

Avoid Conflicting/Overlapping Maintenance Tasks
• Running more than one maintenance task at the same time can cause corruption instead of solving it
  • When running maintenance tasks manually via the server console, always wait until they’re done before starting a new one
• Remove ServerTasksAt2 from your Notes.ini
  • Most admins don’t know it’s there, and schedule compact or other conflicting server tasks at 2:00 am, causing corruption
• Avoid editing the Notes.ini directly via the operating system
  • Direct edits are impossible to track and troubleshoot in case of issues

Avoid Conflicting/Overlapping Maintenance Tasks (cont.)
• Use these server console commands to capture and edit the Notes.ini:
  • Show config ServerTasksAt2
    • To see present settings
  • Set config ServerTasksAt2=
    • To set settings
    • With nothing after the equals sign, removes the setting entirely from the Notes.ini

What We’ll Cover …
• Tuning hardware and OS
• Optimizing Domino server performance
• Examining opportunities in on-disk structure (ODS)
• Keeping applications under control
• Mastering cluster replication
• Dealing with database corruption
• Resolving specific problems with databases
• Wrap-up

If You Are Dealing With …
• “Invalid or nonexistent document”
• “file.nsf is damaged, field length stored is incorrect”
• “Database.nsf is CORRUPT – Now Read-Only!”
• Cause
  • A document or a view index has become corrupted, in some cases due to replication or save conflicts
• Solution
  • Run “standard database maintenance”

    Transaction Logging        No Transaction Logging
    Compact -B                 Fixup -F -L
    Updall -R -X               Updall -R -X
    Or create a new replica    Compact -B
                               Or create a new replica

If You Are Dealing With … (cont.)
• “RRV Bucket is corrupt”
• Cause
  • A Record Relocation Vector (RRV) table is an index mapping to the actual data’s location on the hard disk
  • RRV buckets don’t replicate
  • Improper Domino server shutdown can cause this
  • Third-party app altered the physical location of the database on disk
  • Disk defrag utility running with Domino server up
• Solution
  • Run standard maintenance, but with compact -C
  • Make a new replica or copy

If You Are Dealing With … (cont.)
• “Detected Storage Corruption”
• “Attempt to use an invalid database pointer”
• “B-tree structure is invalid”
  • B-tree structure creates efficient lookups, fast access to data
• “This database cannot be read due to an invalid ODS”
• Cause
  • ODS problem, incomplete or corrupt index, soft deletes being turned on in pre-ND6 versions of Domino, or in-place compaction moving non-summary data to another location in the database
• Solution
  • Run standard maintenance

If You Are Dealing With … (cont.)
• “corrupt desktop.ndk or corrupt local names.nsf”
  • B-tree structure defines the way a view index is encoded for efficient lookups and fast access to data
• Cause
  • ODS problem, incomplete or corrupt index
  • View or full-text index corruption
  • Full-text index is set to “immediate” and the view index gets discarded while agents are accessing it
• Solution
  • Run standard maintenance

If You Are Dealing With … (cont.)
• “Extendible Hash Index is Corrupt and Can’t be Used”
• Cause
  • The Extendible Hash Index (EHI) is a list of design element names converted into unique values, and does not replicate
  • Corruption occurs when the EHI gets too large or partially overwritten
• Solution
  • Do a copy-style compaction of the database to force a rebuild
  • Fixup and updall will not repair the Extendible Hash Index
  • Refresh the design in another replica, then replicate, forcing the corrupt EHI to rebuild
  • Pull a new replica or database copy

If You Are Dealing With … (cont.)
• Corrupt mail.box
  • Online maintenance does not usually work on mail.box
• Solution
  • Rebuild the affected mail.box
    • Stop the router and issue “dbcache flush”
    • Rename the mail.box file from the OS
    • Be sure to copy all valid mail out of the old mail.box
    • Restart the router
  • This procedure may not always work, and a Domino server shutdown may be required
  • Domino 7.0.2 is capable of routing “around” corrupt mail.boxes

If You Are Dealing With … (cont.)
• “Cannot Write to log file: Database is corrupt – Cannot allocate space – Now Read-Only!”
• Cause
  • Insufficient hard drive space
  • Back-up or anti-virus software running on the server is locking open databases
• Solution
  • Check hard disk space
  1. Shut down the Domino server
  2. Rename Log.nsf
  3. Restart the Domino server

If You Are Dealing With … (cont.)
• Corrupt transaction logs
• Cause
  • Domino is not reusing archive-style transaction logs after severe server crash
  • Hard disk problems and crashes
  • Copying transaction logs (*.txn files) over the network
• Solution
  • Make a full backup of the server
  • Disable transaction logging in the Server doc
  • Stop the Domino server
  • Delete the transaction log directory on the OS
  • Restart the Domino server
  • Re-enable transaction logging

If You Are Dealing With … (cont.)
• B-tree, RRV bucket error messages on Names.nsf, and you have tried using online database maintenance
• Solution
  1. Shut down your Domino server
  2. Open a DOS prompt
  3. Navigate to the Domino Data directory (D:\domino\data)
  4.
Enter the following commands:
     C:\Domino\nfixup.exe names.nsf -F
       (If you are transaction logging, use fixup -F -J)
     C:\Domino\ncompact.exe names.nsf -C
     C:\Domino\nupdall.exe names.nsf -R -X

What We’ll Cover …
• Tuning hardware and OS
• Optimizing Domino server performance
• Examining opportunities in on-disk structure (ODS)
• Keeping applications under control
• Mastering cluster replication
• Dealing with database corruption
• Resolving specific problems with databases
• Wrap-up

Additional Resources
• Domino Defrag 3.2 OpenNTF Project
  www.openntf.org/internal/home.nsf/project.xsp?action=openDocument&name=DominoDefrag
• Nash!Com’s free Linux start script that deletes LOADMON.NCF
  www.nashcom.de/nshweb/pages/startscript.htm
• Download new Technotics Monitoring Results STATREP.NSF template
  www.andypedisich.com/blogs/andysblog.nsf/dx/admin2011.htm
• How does the notes.ini file parameter “server_session_timeout” affect server performance
  www-01.ibm.com/support/docview.wss?uid=swg21293213

7 Key Points to Take Home
• Consider transaction logging, not only for incremental backups, but also for faster restarts
• Eliminate tasks you don’t need from the ServerTasksAt parameters, especially ones that interfere with program documents
• Turn off full-text indexing on hubs, gateways, and other servers that don’t absolutely require it
• Make it a habit to check cluster statistics to determine if you need more cluster replicators

7 Key Points to Take Home (cont.)
• Use DDM probes to ensure agents aren’t consuming unreasonable amounts of resources
• Implement silent cluster failover using policies and make your users happier
• Prevent simple searches of databases that are not full-text indexed

Your Turn!
How to contact me:
Andy Pedisich
andyp@technotics.com
www.andypedisich.com
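A footnote on the offline Names.nsf procedure: the order of the three commands, and the extra -J switch on transaction-logged servers, is easy to get wrong under pressure. The sketch below only builds the command lines as text so they can be reviewed before anyone types them. It is a minimal illustration in Python, not a supported tool: the function name and its parameters are hypothetical, and the program directory is whatever your installation uses (C:\Domino is simply the path shown on the slide).

```python
from typing import List


def offline_maintenance_commands(database: str,
                                 program_dir: str,
                                 transaction_logging: bool) -> List[str]:
    """Return the fixup, compact, updall command lines in the order given on the slide.

    Hypothetical helper: it builds strings only and never runs anything.
    Run the resulting commands from the Domino data directory with the
    Domino server shut down.
    """
    # The slide says to use "fixup -F -J" when transaction logging is enabled.
    fixup_switches = "-F -J" if transaction_logging else "-F"
    return [
        f"{program_dir}\\nfixup.exe {database} {fixup_switches}",
        f"{program_dir}\\ncompact.exe {database} -C",   # copy-style compact
        f"{program_dir}\\nupdall.exe {database} -R -X",  # rebuild views and FTIs
    ]


if __name__ == "__main__":
    for line in offline_maintenance_commands("names.nsf", r"C:\Domino", False):
        print(line)
```

Printing the list for a logged and an unlogged server side by side is a quick way to confirm that the only difference is the -J on the fixup line.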