Database Repair and Recovery - Dan Foreman

Progress Database
Repair & Recovery
Dan Foreman
BravePoint, Inc.
Email: danf@prodb.com
Introduction - Dan Foreman
Progress User since 1984 (V2.1)
Speaker at many Progress User
Conferences from 1990 to 2012
PUG Challenge 2012
Introduction - Dan Foreman
Author of:
Progress Performance Tuning Guide
Progress Database Administration Guide
Progress System Tables Guide
Progress V10 Database Admin Jumpstart
Book purchase allows free online access
ProMonitor – Database Monitoring Tool
ProD&L - Accelerated Dump/Load Utility
Balanced Benchmark – Load testing tool
Introduction - Who Are You
Progress Version: V6, V7, V8, V9,
V10.0*, V10.1*, V10.2*
DB OS: Unix? Windows? Linux?
Is there anything else?
Largest Single Database?
Highest Concurrent User Count?
Special Request
Mobile Phones on Mute Please!
Goals
Can I teach you database brain
surgery in an hour?
Note that DBAs have one big
advantage that human brain
surgeons don’t…do you know what
that is?
Real Horror Stories
Fortune 500 Company (sorry but they
would not appreciate us sharing their
name)
DB Corruption February 23
Last Good Backup: January 11
Last Good AI Files: January 17
We facilitated a special version of rfutil
that would ignore errors during the Roll
Forward process
Real Horror Stories
Customer running SCO OpenServer
We had told the customer to move to a
more “modern” OS, i.e. Linux
OS Problem – Can’t mount the Disks;
Discovered that backups to tape were
not occurring (backup to disk was OK
but couldn’t see the disks)
Had to boot using Knoppix to repair
things – took forever to reinstall SCO
Real Horror Stories
Fortune 1000 Company
HP Server
Admin outsourced to IBM
Backups outsourced to 3rd party
3rd party stopped doing backups,
unannounced, due to non-payment
DB Corrupted
Restoration impossible
BravePoint 2012
Preventive Maintenance
Backups (yes, I know you think you
have backups but have you tested
one recently?)
Test your Entire Recovery Plan
Preventive Maintenance
Warm Standby Database - A
database on another machine with
a recent copy of the production DB
Also called a D/R Database
This is easy to do in
Progress...covered soon
Preventive Maintenance
Unix: don’t log on as root unless you
really need to
Use O/S security to protect the DB,
BI, and AI files from accidental or
casual or intentional deletion
Preventive Maintenance
Unix: Don’t use kill -9 to terminate
a Self Service Progress session;
You might bring the database
DOWN! if you happen to kill a
session that is holding a Latch
ALWAYS have an up-to-date
Structure (.st) file available - we
will see why later
Preventive Maintenance
Monitor the BI file High Water Mark
Monitor 'Delinquent' Transactions
(Active Transactions longer than 30-60
minutes)
Monitor Large Transactions (A Client
with a Large Number of concurrent
Record Locks)
longtrx*.p Progress program on the
BravePoint Website to detect
Delinquent Transactions
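The 30-60 minute rule above amounts to a simple age filter. The following is a minimal Python sketch of that check, not the longtrx*.p program itself. It assumes the transaction start times have already been collected by whatever means you use; the Txn record and its field names are hypothetical and are not a Progress API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Txn:
    # Hypothetical record; field names are illustrative only.
    user: str
    started: datetime


def delinquent(txns, now, limit=timedelta(minutes=30)):
    """Return transactions active longer than `limit`, oldest first."""
    old = [t for t in txns if now - t.started > limit]
    return sorted(old, key=lambda t: t.started)


# Made-up sample data: one long-running batch job, one fresh session.
now = datetime(2012, 6, 1, 12, 0)
sample = [
    Txn("clerk7", datetime(2012, 6, 1, 11, 55)),
    Txn("batch1", datetime(2012, 6, 1, 10, 15)),
]
flagged = delinquent(sample, now)
```

Sorting oldest first puts the transaction most likely to be pinning the BI file at the top of the list.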
Preventive Maintenance
Use the -bithold parameter as an
extra safeguard; Set to 50% of
available BI Disk Space; even in V9
V9/V10 support Terabyte sized BI
Files but extent sizes are still
limited to 2 GB unless you use the
EnableLargeFiles option on proutil;
the file system must also be enabled
for files larger than 2 GB
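The "50% of available BI Disk Space" rule can be worked out with a few lines of Python. This is a rough sketch under two assumptions that should be verified for your release: that "available" means the free space on the file system holding the BI extents, and that the -bithold value is expressed in megabytes. The function names are made up for illustration.

```python
import shutil


def bithold_mb(free_bytes, fraction=0.5):
    """Convert free space in bytes to a threshold in whole MB.

    Assumption: the -bithold value is given in MB.
    """
    return int(free_bytes * fraction) // (1024 * 1024)


def suggested_bithold_mb(bi_dir, fraction=0.5):
    """Suggest a threshold from the free space on the file system holding bi_dir."""
    return bithold_mb(shutil.disk_usage(bi_dir).free, fraction)


# Example: 10 GB free gives a 5120 MB threshold at 50%.
example = bithold_mb(10 * 1024**3)
```

Call suggested_bithold_mb() with the directory that holds the BI extents and review the result before putting it into a startup parameter file.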
Preventive Maintenance
Monitor the Area High Water Marks
to avoid growing into the Variable
Length Extent
There is a Performance Hit, usually
insignificant but sometimes not, when
growing the Variable Extent
A Single Variable Extent can limit some
of the recovery options discussed later
Quiz Question
Who are the Smartest DBAs in the
Room?
Answer
DBAs who enabled After Imaging
on their Mission Critical Databases
If you’re not using AI, you
probably shouldn’t be responsible
for your company’s databases
Management need convincing?
Play Chicken with them
After Imaging
Who is currently not using AI?
If not, why not?
Is public flogging or humiliation
required?
After Imaging
PSC docs say that AI offers protection
against Disk failure
Disk fails 5 minutes before the backup starts
on the final day of your year end close
No paper trail
Ouch! Time to work on your resume (C.V.)
After Image File(s) + Last (Good)
Backup = State of DB at time of crash
After Imaging - Why Use It?
But you say… “I have disk mirroring
(also known as RAID 1) so I’m
protected against a disk failure”
BUT Mirroring does NOT protect
against all database evils
After Imaging - Why Use It?
True Horror Story #1
A DBA (logged on as root) FTP’d a test
database into the directory where the
production database resided...
unfortunately they had the same name
Disk Mirroring worked just fine…
After Imaging would have probably
saved the day
After Imaging - Why Use It?
True Horror Story #2
A user ran an archiving program on live
data that wasn’t ready to be archived
Once again the mirroring performed
perfectly
AI might have improved the
situation as it is possible to Roll
Forward to a specific point in time
After Imaging - Why Use It?
True Horror Story #3 – Part 1
BI file hit the V8 2GB limit @ 1600 on
a busy day (300+ users)
Large Production Database was
corrupted
Progress Support Recommendation:
dump & load or restore from backup
which meant substantial down time or
data loss
After Imaging - Why Use It?
True Horror Story #3 – Part 2
Fortunately the customer called me
and I was able to temporarily patch
the database until a D&L could be
performed
Irony: I had recommended AI to this
customer over one year prior to this
event
After Imaging - Why Use It?
Avoid probkup online issues
Transaction Activity is Frozen while the
BI File is Backed Up
The I/O Overhead of disk/tape backup
Possible Solution
Use AI to maintain a Warm Spare DB
Backup the Replicated Database
After Imaging - Why Use It?
Warm Standby (D/R) Database
A Warm Standby DB is:
A replicated Database on another Server
DB can be brought online quickly in
case of catastrophic failure of the
production system
It’s ‘warm’ because it is not 100%
current…usually 2-15 minutes
behind
After Imaging - Why Use It?
A HOT spare database is not
possible using AI except with:
Fathom Replication
(oops…OpenEdge Replication)
Replication Triggers
SAN Mirroring
Even these options don’t guarantee
zero loss of data
After Imaging - Why Use It?
Easy refreshing of a Report
Server DB
A Report Server DB is:
A database on another server
Used for reporting only
To relieve the production system of
the overhead imposed by reporting
Doesn’t require same level of
hardware or Progress license
Essential DB Monitoring
Performed periodically to make
sure you don’t have hidden or
unreported corruption
Essential DB Monitoring
Corruption Checks
proutil dbanalys
probkup/procopy
proutil dbrpr
proutil dbscan (non-interactive dbrpr)
proutil idxfix
dbtool
Essential DB Monitoring
-MemCheck
AND ALL THE OTHER SIMILAR
OPTIONS
Database Log File Monitoring
Check the Database log (.lg) file for
errors DAILY. Look for words such as:
kill* drastic warn* error system dead fatal
abnormal exceed* fail* wrong unexpected*
invalid died damage* overflow* violation
insufficient missing disappear* corrupt*
allow* attempt* cannot enough illegal beyond
impossible increase unknown unable stop*
ProMonitor supports automated log file
monitoring
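For sites without a monitoring tool, the daily keyword check can be scripted. The following is a minimal Python sketch, not how ProMonitor works. It assumes a trailing * in the word list means "any word starting with this prefix" and that words without * should match as whole words only. The second sample line is invented for illustration.

```python
import re

# Word list copied from the slide; a trailing * is treated as a prefix wildcard.
KEYWORDS = """kill* drastic warn* error system dead fatal
abnormal exceed* fail* wrong unexpected*
invalid died damage* overflow* violation
insufficient missing disappear* corrupt*
allow* attempt* cannot enough illegal beyond
impossible increase unknown unable stop*""".split()


def build_pattern(words):
    """Build one case-insensitive regex covering every keyword."""
    parts = []
    for w in words:
        if w.endswith("*"):
            parts.append(re.escape(w[:-1]) + r"\w*")  # prefix match
        else:
            parts.append(re.escape(w))  # whole word only
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


PATTERN = build_pattern(KEYWORDS)


def suspicious_lines(lines):
    """Return the lines that contain at least one keyword."""
    return [ln.rstrip("\n") for ln in lines if PATTERN.search(ln)]


sample = [
    "SYSTEM ERROR: wrong dbkey in block\n",
    "Login by dan on batch.\n",  # invented, harmless line
]
hits = suspicious_lines(sample)
```

To scan a real log, pass an open file object for the .lg file to suspicious_lines(). Expect false positives: the word list is deliberately broad.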
1124 Errors
SYSTEM ERROR: wrong dbkey in block
99.9999% probability of H/W problem
Don’t limit search to disks; consider:
Disk Controllers, RAM (parity errors),
Firmware, etc.
Don’t let the Hardware Technician
blame Progress or the Application
Don’t let the Hardware Technician
escape without fixing the problem
1124 Stories
Seagate Firmware on 2gb drives
(mid-90’s)
HP Server/EMC SAN administered
by HP
HP/UX diagnostics showed no problems
EMC diagnostics showed no problems
Cause: Bad SAN Fabric Switch
1124 Stories
Sometimes a Server reboot can
(temporarily) fix a 1124 situation
However this might be a situation
where the hardware is in a bad-good-bad-good… cycle
Database File System Full
Use prostrct repair to relocate
Extents to a place with more space
Copy the Extent to the new location
Update the Structure File (.st) to reflect
the current location of the Extents (one
good reason to have a current one)
Run prostrct repair new.st
Alternatively use prostrct add to add
new Extents
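The "Update the Structure File" step is plain text editing, and it can be scripted to avoid typos under pressure. The following is a minimal Python sketch. It assumes each extent line in the .st file carries the extent path as a whitespace-delimited token. The sample lines and paths are illustrative only, so compare them with your own structure file before relying on this.

```python
import re


def relocate_extent(st_text, old_path, new_path):
    """Replace one extent path in structure-file text.

    The path must match as a whole whitespace-delimited token, so that
    moving sports_7.d1 does not also rewrite sports_7.d10.
    """
    pattern = re.compile(r"(?<!\S)" + re.escape(old_path) + r"(?!\S)")
    return pattern.sub(lambda m: new_path, st_text)


# Illustrative structure-file content; area names and paths are made up.
old_st = (
    'b /db/prod/sports.b1\n'
    'd "Schema Area":6,32 /db/prod/sports.d1\n'
    'd "Data":7,64 /db/prod/sports_7.d1 f 1024\n'
    'd "Data":7,64 /db/prod/sports_7.d10\n'
)
new_st = relocate_extent(
    old_st, "/db/prod/sports_7.d1", "/bigdisk/sports_7.d1"
)
```

Write the result to a new file (new.st in the slide) rather than overwriting the old one, so the previous layout stays available for comparison.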
No Space for Before Image File
Do Not run out of BI disk space if:
More space isn’t available elsewhere
The BI extent(s) can’t be relocated
To perform Crash Recovery (part of
the proutil truncate bi process), the
BI file will grow; sometimes 2X or
more
No Space for Before Image File
If there is no space for the BI file to
grow, there is no Crash Recovery
If Crash Recovery partially completes
but then crashes, the next Crash
Recovery will create an even larger BI
file!!!
No Space for Before Image File
Force Access (-F) is the only option (if
you don’t have AI enabled)
Even having AI enabled is problematic
Crash Recovery Notes are also written to
the AI logs
You can’t do rfutil aimage empty during
Crash Recovery!!
This is why the -bithold parameter is
so important
Corrupt Database Blocks
What Kind of Block? Index or Data
Block Type 2 (Index = IX)
Block Type 3 (Record Manager = RM)
Use proutil dbrpr to get the Block Type
Or if the Block is in a Storage Area
dedicated to Indexes or Tables, you
automatically know
Corrupt IX Block Options
Try rebuilding the Index that the IX
Block belongs to
Try to Truncate the Area
Reformat the block as a Free
Block (proutil dbrpr)
If it is an RM block, see the next set
of slides
Corrupt RM Blocks
Reformat the block as a Free Block
(proutil dbrpr)
Replace the Block with the same
Block from another DB (restored
from a backup)
proutil dbrpr
5. Dump Block (from the good DB)
4. Load Block (into the bad DB)
Don’t forget to back up the DB first
RM Block Transplant
How can I tell if the block has
changed since the last probkup?
The Block Update Counter stored in
the Block Header
If not using probkup, the DBKEY
from the AI Logs can be obtained
with aimage scan verbose but you
REALLY need to be motivated
Emergency Dump
‘Front and Back’ 4GL Dump
for each customer by custnum (until you hit
the bad spot)
for each customer by custnum descending
Assumes the damage is limited in scope
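The logic of the 'Front and Back' dump is easier to see outside the 4GL. The following Python sketch models the idea only: walk the index order from the front until a read fails, then from the back until a read fails, and accept that whatever lies between the two stopping points is lost. The read callback and the BadBlock exception are hypothetical stand-ins for a record fetch that hits damage.

```python
class BadBlock(Exception):
    """Hypothetical error raised when a record cannot be read."""


def front_and_back(keys, read):
    """Save records from both ends of an ordered key list.

    Each pass stops at the first unreadable record on its side.
    """
    saved = {}
    for k in keys:  # front pass, ascending order
        try:
            saved[k] = read(k)
        except BadBlock:
            break
    for k in reversed(keys):  # back pass, descending order
        if k in saved:
            break  # the passes met, so nothing was skipped
        try:
            saved[k] = read(k)
        except BadBlock:
            break
    return saved


def make_reader(bad):
    """Build a fake reader that fails on the keys in `bad`."""
    def read(k):
        if k in bad:
            raise BadBlock(k)
        return "record %d" % k
    return read


# Simulated table with customer numbers 1-10 and damage at 5 and 6.
rescued = front_and_back(list(range(1, 11)), make_reader({5, 6}))
```

In this simulation, records 1-4 come from the front pass and 7-10 from the back pass. This mirrors the slide's caveat that the approach assumes the damage is limited in scope.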
Emergency Dump
If the Primary Index is damaged, try
dumping using a non-Primary Index
Or try fixing the index with proutil
idxfix
Indexless Binary Dump
proutil <db> -C dump <table> -index 0
Only works for Type 2 Areas
Emergency Dump
RECID Dump
Doesn’t require an Index
Very Slow on a Large Database
If one table per Area, perhaps not so bad
Usually the Last, Last, Last Resort
Deleted/Damaged Extents
First, back up the “remnant” of the DB
This may seem like a useless step but
if the last backup is defective you may
need to repair the broken DB and
that’s difficult to do if it’s deleted
The backup gives you time to:
Prepare a plan of action
Call outside resources (like me) for help
Calm down, take a Xanax & lock the door
Prepare a new Resume (C.V.)
Deleted/Damaged AI Extents
If an AI Extent is deleted, simply
disable AI and...
What? Not running AI?
Disable AI (rfutil aimage end)
Fix the issue that caused the lost Extent
Recreate the Extent with prostrct add
Restart AI (rfutil aimage begin)
If this doesn’t work, next slide
Deleted/Damaged AI Extents
Method #2
Disable AI with rfutil aimage end. You
may get an error message regarding
the missing AI Extent but typically AI is
still disabled
Truncate the BI file with proutil truncate
bi. You may get an error regarding the
missing AI Extent but typically the BI
file is still truncated
Deleted/Damaged AI Extents
Method #2 continued
Remove all AI Extents with prostrct
remove
Recreate the original AI Extents with
prostrct add
Restart After Imaging with rfutil
aimage begin
Reformat the truncated BI file with
proutil bigrow
Deleted/Damaged BI Extents
Force access with -F
In V8.2 and later, -F only works on proutil
truncate bi (and promon and proshut)
If you Force access, consider the DB
damaged!
Forcing access THROWS AWAY the BI
file
Unfortunately, sometimes -F is the only
option
Deleted/Damaged BI Extents
Force Access with -F continued
Forcing Access sets the ‘Tainted Flag’
in the Master Block
Even if you fix the Tainted Flag
(idxbuild), consider the DB damaged!
Dump & Load (this is if there is no AI
recovery option)
If you can’t get into the database with
-F or any other way, try the Read Only
(-RO) option
Deleted/Damaged Database Extents
[Diagram: the .db file and data extents .d1 through .d5, with the High Water Mark (HWM) marked among the extents]
Deleted/Damaged DB Extents
Restore the DB and BI from Backup
Apply the AI files
Re-enable AI
BI Grow
Done!
That didn’t work? Next slide
Deleted/Damaged DB Extents
Use prostrct unlock if the deleted
Extent was Empty (above the High
Water Mark)
prostrct unlock -extents will
recreate missing Extents (and set
the Tainted flag)
However unlock also changes the
time stamps on the AI files and
they can’t be used any longer
Deleted DB Extents – Extent Transplant
This technique is for Extents that
contain Data except the Schema Area
.d1 Extent (contains Master Block)
Restore a copy of the deleted Extent
from a Backup or other source
The Extent’s ‘Last Opened’ time
stamps won’t match
Use prostrct unlock to sync the time
stamps (broken in some versions)
Deleted DB Extents – Extent Transplant
The data in the Extent might not
match but…
Use the -miracle option to re-create
the Data 
Loss of the Schema Area .d1 extent
is usually not recoverable
Loss of a High Water Mark extent is
also usually not recoverable
Deleted DB Extents – Extent Transplant
This is why small fixed extents may
still be a good idea
Deleted DB Extents
If the DB Broker is still running #1
DON’T Shutdown the Database
That ‘closes’ the database extents and
you won’t be able to re-open them
If a Client that is still connected to the DB
can access the Progress Editor, just
dump the Database from the Dictionary
Even if they can’t access the Editor, put
dict.p (renamed as a menu item) into
their PROPATH
Deleted DB Extents - Unix
If the DB Broker is still running #2
Warm Boot the System ASAP
Don’t Shut Down the DB First
Run fsck (*ix only) and it can probably
recover the deleted Extent
Why?
On Unix a file is not absolutely deleted
until every process that has it open is
gone (the Broker still has it open)
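The file-system behavior behind this advice can be demonstrated safely on a scratch file. The following Python sketch applies to Unix-like systems only and shows just the "deleted but still open" state. It does not perform or simulate the fsck recovery, and the temporary file stands in for a database extent.

```python
import os
import tempfile


def read_after_unlink(payload):
    """Delete a file while holding it open, then read it through the open descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)  # the name is gone from the directory...
        still_listed = os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))  # ...but the data is still readable
    finally:
        os.close(fd)  # only now can the file system release the blocks
    return still_listed, data


still_listed, data = read_after_unlink(b"extent data")
```

In this sketch the process holding the descriptor plays the role of the Broker. Once it closes the file, the data is no longer reachable through that descriptor, which is the situation the slide warns against creating by shutting down the DB.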
Lost .db File
In V9 and later it is relatively easy
to restore the .db file
prostrct builddb
Requires an up-to-date Structure
File (remember that point from the
Preventive Maintenance list?)
Sources of Help
Progress Documentation
Progress DB Administration Guide
dba@peg.com
PSC Kbase (i.e. Krapbase)
PSC Tech Support
My mobile phone: +1 541-908-3437
For those weekend emergencies when
you need expert assistance
This is not a free call
Conclusion
Thank you for coming
More details can be found in my
Progress Database Administration
Guide
Publications are available at
www.BravePoint.com
danf@prodb.com
Do we have time for Questions?