Progress Database Repair & Recovery
Dan Foreman
BravePoint, Inc.
Email: danf@prodb.com

Introduction - Dan Foreman
- Progress User since 1984 (V2.1)
- Speaker at many Progress User Conferences from 1990 to 2012
PUG Challenge 2012

Introduction - Dan Foreman
- Author of:
  - Progress Performance Tuning Guide
  - Progress Database Administration Guide
  - Progress System Tables Guide
  - Progress V10 Database Admin Jumpstart
- Book purchase allows free online access
- ProMonitor - Database Monitoring Tool
- ProD&L - Accelerated Dump/Load Utility
- Balanced Benchmark - Load testing tool

Introduction - Who Are You
- Progress Version: V6, V7, V8, V9, V10.0*, V10.1*, V10.2*
- DB OS: Unix? Windows? Linux? Is there anything else?
- Largest Single Database?
- Highest Concurrent User Count?

Special Request
- Mobile Phones on Mute Please!

Goals
- Can I teach you database brain surgery in an hour?
- Note that DBAs have one big advantage that human brain surgeons don’t…do you know what that is?

Real Horror Stories
- Fortune 500 Company (sorry but they would not appreciate us sharing their name)
- DB Corruption February 23
- Last Good Backup: January 11
- Last Good AI Files: January 17
- We facilitated a special version of rfutil that would ignore errors during the Roll Forward process

Real Horror Stories
- Customer running SCO OpenServer
- We had told the customer to move to a more “modern” OS, i.e.
Linux
- OS Problem - Can’t mount the Disks
- Discovered that backups to tape were not occurring (backup to disk was OK but couldn’t see the disks)
- Had to boot using Knoppix to repair things - took forever to reinstall SCO

Real Horror Stories
- Fortune 1000 Company
- HP Server
- Admin outsourced to IBM
- Backups outsourced to 3rd party
- 3rd party stopped doing backups, unannounced, due to non-payment
- DB Corrupted
- Restoration impossible

Preventive Maintenance
- Backups (yes, I know you think you have backups but have you tested one recently?)
- Test your Entire Recovery Plan

Preventive Maintenance
- Warm Standby Database - A database on another machine with a recent copy of the production DB
- Also called a D/R Database
- This is easy to do in Progress...covered soon

Preventive Maintenance
- Unix: don’t log on as root unless you really need to
- Use O/S security to protect the DB, BI, and AI files from accidental, casual, or intentional deletion

Preventive Maintenance
- Unix: Don’t use kill -9 to terminate a Self Service Progress session. You might bring the database DOWN!
if you happen to kill a session that is holding a Latch
- ALWAYS have an up-to-date Structure (.st) file available - we will see why later

Preventive Maintenance
- Monitor the BI file High Water Mark
- Monitor 'Delinquent' Transactions (Active Transactions longer than 30-60 minutes)
- Monitor Large Transactions (A Client with a Large Number of concurrent Record Locks)
- longtrx*.p - Progress program on the BravePoint Website to detect Delinquent Transactions

Preventive Maintenance
- Use the -bithold parameter as an extra safeguard; set to 50% of available BI Disk Space; even in V9
- V9/V10 supports Terabyte sized BI Files but extent sizes are still limited to 2GB unless you use the EnableLargeFiles option on proutil, and the file system must be enabled for files larger than 2GB too

Preventive Maintenance
- Monitor the Area High Water Marks to avoid growing into the Variable Length Extent
- There is a Performance Hit, usually insignificant but sometimes not, when growing the Variable Extent
- A Single Variable Extent can limit some of the recovery options discussed later

Quiz Question
- Who are the Smartest DBAs in the Room?

Answer
- DBAs who enabled After Imaging on their Mission Critical Databases
- If you’re not using AI, you probably shouldn’t be responsible for your company’s databases
- Management need convincing? Play Chicken with them

After Imaging
- Who is currently not using AI?
- If not, why not?
- Is public flogging or humiliation required?

After Imaging
- PSC docs say that AI offers protection against Disk failure
- Disk fails 5 minutes before the backup starts on the final day of your year end close
- No paper trail
- Ouch! Time to work on your resume (C.V.)
- After Image File(s) + Last (Good) Backup = State of DB at time of crash

After Imaging - Why Use It?
- But you say…“I have disk mirroring (also known as RAID 1) so I’m protected against a disk failure”
- BUT Mirroring does NOT protect against all database evils

After Imaging - Why Use It?
- True Horror Story #1
- A DBA (logged on as root) FTP’d a test database into the directory where the production database resided... unfortunately they had the same name
- Disk Mirroring worked just fine…
- After Imaging would have probably saved the day

After Imaging - Why Use It?
- True Horror Story #2
- A user ran an archiving program on live data that wasn’t ready to be archived
- Once again the mirroring performed perfectly
- AI might have improved the situation as it is possible to Roll Forward to a specific point in time

After Imaging - Why Use It?
- True Horror Story #3 - Part 1
- BI file hit the V8 2GB limit @ 1600 on a busy day (300+ users)
- Large Production Database was corrupted
- Progress Support Recommendation: dump & load or restore from backup, which meant substantial down time or data loss

After Imaging - Why Use It?
- True Horror Story #3 - Part 2
- Fortunately the customer called me and I was able to temporarily patch the database until a D&L could be performed
- Irony: I had recommended AI to this customer over one year prior to this event

After Imaging - Why Use It?
- Avoid probkup online issues
  - Transaction Activity is Frozen while the BI File is Backed Up
  - The I/O Overhead of disk/tape backup
- Possible Solution
  - Use AI to maintain a Warm Spare DB
  - Backup the Replicated Database

After Imaging - Why Use It?
- Warm Standby (D/R) Database
- A Warm Standby DB is:
  - A replicated Database on another Server
  - DB can be brought online quickly in case of catastrophic failure to the production system
  - It’s ‘warm’ because it is not 100% current…usually 2-15 minutes behind

After Imaging - Why Use It?
- A HOT spare database is not possible using AI except with:
  - Fathom Replication (oops…OpenEdge Replication)
  - Replication Triggers
  - SAN Mirroring
- Even these options don’t guarantee zero loss of data

After Imaging - Why Use It?
- Easy refreshing of a Report Server DB
- A Report Server DB is:
  - A database on another server
  - Used for reporting only
  - To relieve the production system of the overhead imposed by reporting
  - Doesn’t require same level of hardware or Progress license

Essential DB Monitoring
- Performed periodically to make sure you don’t have hidden or unreported corruption

Essential DB Monitoring
- Corruption Checks
  - proutil dbanalys
  - probkup/procopy
  - proutil dbrpr
  - proutil dbscan (non-interactive dbrpr)
  - proutil idxfix
  - dbtool

Essential DB Monitoring
- -MemCheck AND ALL THE OTHER SIMILAR OPTIONS

Database Log File Monitoring
- Check the Database log (.lg) file for errors DAILY. Look for words such as:
  kill* drastic warn* error system dead fatal abnormal exceed* fail* wrong
  unexpected* invalid died damage* overflow* violation insufficient missing
  disappear* corrupt* allow* attempt* cannot enough illegal beyond impossible
  increase unknown unable stop*
- ProMonitor supports automated log file monitoring

1124 Errors
- SYSTEM ERROR: wrong dbkey in block
- 99.9999% probability of H/W problem
- Don’t limit search to disks; consider: Disk Controllers, RAM (parity errors), Firmware, etc.
- Don’t let the Hardware Technician blame Progress or the Application
- Don’t let the Hardware Technician escape without fixing the problem

1124 Stories
- Seagate Firmware on 2GB drives (mid-90’s)
- HP Server/EMC SAN administered by HP
  - HP/UX diagnostics showed no problems
  - EMC diagnostics showed no problems
  - Cause: Bad SAN Fabric Switch

1124 Stories
- Sometimes a Server reboot can (temporarily) fix a 1124 situation
- However this might be a situation where the hardware is in a bad-good-bad-good… cycle

Database File System Full
- Use prostrct repair to relocate Extents to a place with more space
  - Copy the Extent to the new location
  - Update the Structure File (.st) to reflect the current location of the Extents (one good reason to have a current one)
  - Run prostrct repair new.st
- Alternatively use prostrct add to add new Extents

No Space for Before Image File
- Do Not run out of BI disk space if:
  - More space isn’t available elsewhere
  - The BI extent(s) can’t be relocated
- To perform Crash Recovery (part of the proutil truncate bi process), the BI file will grow; sometimes 2X or more

No Space for Before Image File
- If there is no space for the BI file to grow, there is no Crash Recovery
- If Crash Recovery partially completes but then crashes, the next Crash Recovery will create an even larger BI file!!!

No Space for Before Image File
- Force Access (-F) is the only option (if you don’t have AI enabled)
- Even having AI enabled is problematic
  - Crash Recovery Notes are also written to the AI logs
  - You can’t do rfutil aimage empty during Crash Recovery!!
- This is why the -bithold parameter is so important

Corrupt Database Blocks
- What Kind of Block?
- Index or Data
  - Block Type 2 (Index = IX)
  - Block Type 3 (Record Manager = RM)
- Use proutil dbrpr to get the Block Type
- Or if the Block is in a Storage Area dedicated to Indexes or Tables, you automatically know

Corrupt IX Block Options
- Try rebuilding the Index that the IX Block belongs to
- Try to Truncate the Area
- Reformat the block as a Free Block (proutil dbrpr)
- If it is an RM block, see the next set of slides

Corrupt RM Blocks
- Reformat the block as a Free Block (proutil dbrpr)
- Replace the Block with the same Block from another DB (restored from a backup)
  - proutil dbrpr
  - 5. Dump Block (from the good DB)
  - 4. Load Block (into the bad DB)
- Don’t forget to backup the DB first

RM Block Transplant
- How can I tell if the block has changed since the last probkup?
- The Block Update Counter stored in the Block Header
- If not using probkup, the DBKEY from the AI Logs can be obtained with aimage scan verbose but you REALLY need to be motivated

Emergency Dump
- ‘Front and Back’ 4GL Dump
  - for each customer by custnum (until you hit the bad spot)
  - for each customer by custnum descending
- Assumes the damage is limited in scope

Emergency Dump
- If the Primary Index is damaged, try dumping using a non-Primary Index
- Or try fixing the index with proutil idxfix
- Indexless Binary Dump
  - proutil <db> -C dump <table> -index 0
  - Only works for Type 2 Areas

Emergency Dump
- RECID Dump
  - Doesn’t require an Index
  - Very Slow on a Large Database
  - If one table per Area, perhaps not so bad
  - Usually the Last, Last, Last Resort

Deleted/Damaged Extents
- First, backup the “remnant” of the DB
- This may seem like a useless step but if the last backup is defective you may need to repair the broken DB and that’s difficult to do if it’s deleted
- The backup gives you time to:
  - Prepare a plan of action
  - Call outside resources (like me) for help
  - Calm down, take a Xanax & lock the door
  - Prepare a new Resume (C.V.)
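
A minimal sketch of the “backup the remnant” step, done as a plain OS-level copy. It assumes the surviving files all sit in one directory and share the database name. The function name backup_remnant, the REMNANT_DIR variable, and the example paths are hypothetical, and extents that the Structure File places on other file systems would have to be copied separately.

```shell
#!/bin/sh
# Sketch only: copy whatever is left of a damaged database to a safe
# directory before attempting any repair. All names here are hypothetical.

backup_remnant() {
    db_path=$1      # database path without extension, e.g. /db/prod/sports
    dest_root=$2    # directory with enough free space, e.g. /safe/remnants

    # Timestamped target so repeated attempts do not overwrite each other.
    dest="$dest_root/$(basename "$db_path").remnant.$(date +%Y%m%d%H%M%S)"
    mkdir -p "$dest" || return 1

    # Copy every surviving file that carries the database name:
    # sports.db, sports.st, sports.lg, sports.b1, sports.a1, sports.d1,
    # and area-numbered extents such as sports_7.d1.
    for f in "$db_path".* "$db_path"_*.*; do
        [ -f "$f" ] || continue          # skip patterns that matched nothing
        cp -p "$f" "$dest/" || return 1  # -p keeps the original time stamps
    done

    # Show what was saved so that missing extents are obvious at a glance.
    ls -l "$dest"
    REMNANT_DIR=$dest
}
```

Usage would look like `backup_remnant /db/prod/sports /safe/remnants`, after which `$REMNANT_DIR` names the directory holding the copy.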

Deleted/Damaged AI Extents
- If an AI Extent is deleted, simply disable AI and... What? Not running AI?
- Disable AI (rfutil aimage end)
- Fix the issue that caused the lost Extent
- Recreate the Extent with prostrct add
- Restart AI (rfutil aimage begin)
- If this doesn’t work, next slide

Deleted/Damaged AI Extents
- Method #2
- Disable AI with rfutil aimage end. You may get an error message regarding the missing AI Extent but typically AI is still disabled
- Truncate the BI file with proutil truncate bi. You may get an error regarding the missing AI Extent but typically the BI file is still truncated

Deleted/Damaged AI Extents
- Method #2 continued
- Remove all AI Extents with prostrct remove
- Recreate the original AI Extents with prostrct add
- Restart After Imaging with rfutil aimage begin
- Reformat the truncated BI file with proutil bigrow

Deleted/Damaged BI Extents
- Force access with -F
- V8.2 and later: -F only works on proutil truncate bi (and promon and proshut)
- If you Force access, consider the DB damaged!
- Forcing access THROWS AWAY the BI file
- Unfortunately, sometimes -F is the only option

Deleted/Damaged BI Extents
- Force Access with -F continued
- Forcing Access sets the ‘Tainted Flag’ in the Master Block
- Even if you fix the Tainted Flag (idxbuild), consider the DB damaged!
- Dump & Load (this is if there is no AI recovery option)
- If you can’t get into the database with -F or any other way, try the Read Only (-RO) option

Deleted/Damaged Database Extents
[Diagram: the .db file and extents .d1 .d2 .d3 .d4 .d5, with the High Water Mark (HWM) indicated]

Deleted/Damaged DB Extents
- Restore the DB and BI from Backup
- Apply the AI files
- Re-enable AI
- BI Grow
- Done!
- That didn’t work? Next slide

Deleted/Damaged DB Extents
- Use prostrct unlock if the deleted Extent was Empty (above the High Water Mark)
- prostrct unlock -extents will recreate missing Extents (and set the Tainted flag)
- However unlock also changes the time stamps on the AI files and they can’t be used any longer

Deleted DB Extents - Extent Transplant
- This technique is for Extents that contain Data, except the Schema Area .d1 Extent (contains Master Block)
- Restore a copy of the deleted Extent from a Backup or other source
- The Extent’s ‘Last Opened’ time stamps won’t match
- Use prostrct unlock to sync the time stamps (broken in some versions)

Deleted DB Extents - Extent Transplant
- The data in the Extent might not match but…
- Use the -miracle option to re-create the Data
- Loss of the Schema Area .d1 extent is usually not recoverable
- Loss of a High Water Mark extent is also usually not recoverable

Deleted DB Extents - Extent Transplant
- This is why small fixed extents may still be a good idea

Deleted DB Extents
- If the DB Broker is still running #1
- DON’T Shutdown the Database
- That ‘closes’ the database extents and you won’t be able to re-open them
- If a Client that is still connected to the DB can access the Progress Editor, just dump the Database from the Dictionary
- Even if they can’t access the Editor, put dict.p (renamed as a menu item) into their PROPATH

Deleted DB Extents - Unix
- If the DB Broker is still running #2
- Warm Boot the System ASAP
- Don’t Shut Down the DB First
- Run fsck (*ix only) and it can probably recover the deleted Extent
- Why? On Unix a file is not absolutely deleted until every process that has it open is gone (the Broker still has it open)

Lost .db File
- In V9 and later it is relatively easy to restore the .db file
- prostrct builddb
- Requires an up-to-date Structure File (remember that point from the Preventive Maintenance list?)
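
The Structure File dependency can be sketched as two small shell functions: one refreshes the .st file while the database is healthy (prostrct list) and one rebuilds the .db file from it (prostrct builddb). The function names, the DRY_RUN switch, and the assumption that the Structure File sits next to the database as <db>.st are illustrative only; check the prostrct syntax for your Progress version before relying on this.

```shell
#!/bin/sh
# Sketch only: keep a current Structure File and use it to rebuild a lost
# .db file. Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Preventive: write the current extent layout to <db>.st while all is well.
save_structure() {
    db=$1
    run prostrct list "$db" "$db.st"
}

# Recovery: recreate the .db file from the Structure File.
rebuild_db_file() {
    db=$1
    if [ ! -f "$db.st" ]; then
        echo "no structure file $db.st; cannot run builddb" >&2
        return 1
    fi
    run prostrct builddb "$db" "$db.st"
}
```

Running `DRY_RUN=1; rebuild_db_file /db/prod/sports` (a hypothetical path) shows the command that would be issued without touching the database.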

Sources of Help
- Progress Documentation
- Progress DB Administration Guide
- dba@peg.com
- PSC Kbase (i.e. Krapbase)
- PSC Tech Support
- My mobile phone: +1 541-908-3437
  - For those weekend emergencies when you need expert assistance
  - This is not a free call

Conclusion
- Thank you for coming
- More details can be found in my Progress Database Administration Guide
- Publications are available at www.BravePoint.com
- danf@prodb.com
- Do we have time for Questions?