After Imaging

The DBA’s Best Friend
A Few Words About The Speaker
• Tom Bascom
• Progress® User since 1987
• White Star Software, LLC
• DBAppraise®, LLC
• Consulting Services related to Progress Databases and Application Architecture.
tom@wss.com
tom@dbappraise.com
What is it? And Why Do I Need it?
What is After-Imaging?
• A journal of transaction “notes” that can be
replayed against a baseline backup to restore a
database to the last completed transaction or a
point in time or a specific transaction number.
• This is the same concept that some other
databases refer to as the “redo log”.
• Differs from the before image file (undo log) as
space is not reused without interaction or
scripting.*
* 10.1B AI Archiver improves this.
Why do I need after-imaging?
• Protection from media loss -- such as bad
tapes, a crashed disk, a destroyed data center
or stolen servers…
I have backups.
Do I still need after-imaging?
• With a backup your potential exposure to data
loss is the entire time period between backups.
• For example -- if you do nightly backups and your
disk crashes at 4:45pm you restore from backup
and lose an entire day of work. If you have one
or more bad tapes your data loss could be much
worse.
• With after-imaging you restore the same backup,
roll-forward your archived ai files and lose only
uncommitted transactions.
Why else do I need after-imaging?
• Protection from human errors:
$ cd /db
$ rm *
for each customer:
delete customer.
end.
for each order:
delivered = yes.
end.
$ vi dbname.db
…
:x
• Human error is at least as big a risk as hardware
problems.
Isn’t AI the same as disk mirroring?
• No, disk mirrors will happily delete both
copies of your deleted database.
• Or delete all of your customers on both
mirrors.
Or an Audit Log?
• No, an audit log cannot be replayed to
reconstruct the missing data.
I have OpenEdge Replication.
Do I still need after-imaging?
• OE Replication is a super-set of after-imaging.
You still must configure and manage after-imaging.
• After-imaging still provides an additional layer
of protection – even with OE Replication in
place.
• OE Replication is aggressively real-time. You
cannot build in a time delay like you can with
after-imaging.
Are there downsides to after-imaging?
• It is not automatically enabled.
• You must manage archived logs.
• Recovery is not automated.
What about performance?
• There might be a very small penalty.
• But you can usually only measure it under
extremely high loads.
Loss Prevention Strategies
(SLA / Data Loss Strategy / Hardware Loss Strategy)
• SLA: Days
– Data Loss Strategy: Nightly Backups
• Simple & Inexpensive
– Hardware Loss Strategy: Service contract
• Relatively low % of system cost
• SLA: Hours
– Data Loss Strategy: Multiple online backups during day
• More files to keep track of
– Hardware Loss Strategy: Contract with same-day, on-site repair
• More expensive, a long time to wait
• SLA: Many Minutes
– Data Loss Strategy: After Imaging
• Moderately complex scripting
• Monitoring becomes more critical
• Skilled DBA is helpful
– Hardware Loss Strategy: Some redundant HW
• SAN with RAID
• Spare parts kept onsite
• SLA: A few Minutes
– Data Loss Strategy: After Imaging
• Complex scripting
• Monitoring becomes more critical
• Skilled DBA is important
– Hardware Loss Strategy: Warm spare server
• Twice the cost of production HW
• Ideally in a remote facility
• Additional DB licensing costs
• SLA: Seconds
– Data Loss Strategy: OpenEdge Replication
• Much more complex
• Skilled DBA is critical
• Monitoring extremely critical
– Hardware Loss Strategy: Hot spare server & automated fail-over
• Twice the cost of production HW
• Ideally in a remote facility
• Additional DB licensing costs
• Additional OS & 3rd party SW costs
Balancing Cost vs Lost Data
[Chart: Hypothetical Relative Costs of Different SLAs. Vertical axis: cost, $0 to $1,000. Horizontal axis: SLA (Days, Hours, Many Minutes, Few Minutes, Seconds).]
How Does After-Imaging Work?
How does after-imaging work?
First, make a backup!
probkup dbname dbname.pbk
[Diagram: the database (DB) and its BI file, with AI log extents .a1, .a2, .a3 and .a4.]
Then, enable after-imaging, start the database and start an AI Writer. Extent .a1 will be “busy”.
rfutil dbname -C aimage begin
[Diagram: shared memory, with the BIW writing to the BI file and the AIW writing to the AI logs. Extents: .a1 busy, .a2 empty, .a3 empty, .a4 empty.]
Switch extents. Extent .a1 will be marked “full” and extent .a2 will become “busy”.
rfutil dbname -C aimage new
[Diagram: extents: .a1 full, .a2 busy, .a3 empty, .a4 empty.]
Switch extents again. Extent .a2 will be marked “full” and extent .a3 will become “busy”.
rfutil dbname -C aimage new
[Diagram: extents: .a1 full, .a2 full, .a3 busy, .a4 empty.]
Once more, switch extents. Extent .a3 will be marked “full” and extent .a4 will become “busy”.
rfutil dbname -C aimage new
[Diagram: extents: .a1 full, .a2 full, .a3 full, .a4 busy.]
Switch… Oops! There are no “empty” extents! All after-image extents are either “full” or “busy”!
rfutil dbname -C aimage new
[Diagram: extents: .a1 full, .a2 full, .a3 full, .a4 busy.]
Copy full extents…
Use the extent sequence number to name them.
[Diagram: extents: .a1 full, .a2 full, .a3 full, .a4 busy. Archived logs: .001, .002, .003.]
Mark the full extents as “empty”.
rfutil dbname -C aimage extent empty
[Diagram: extents: .a1 empty, .a2 empty, .a3 empty, .a4 busy. Archived logs: .001, .002, .003.]
rfutil dbname -C aimage new
[Diagram: extents: .a1 busy, .a2 empty, .a3 empty, .a4 full. Archived logs: .001, .002, .003.]
ai.sweep
[Diagram: extents: .a1 busy, .a2 empty, .a3 empty, .a4 full. Archived logs: .001, .002, .003, .004.]
ai.new
ai.sweep
[Diagram: extents: .a1 full, .a2 busy, .a3 empty, .a4 empty. Archived logs: .001, .002, .003, .004, .005.]
ai.new
ai.sweep
[Diagram: extents: .a1 empty, .a2 full, .a3 busy, .a4 empty. Archived logs: .001, .002, .003, .004, .005, .006, …]
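The copy-then-empty cycle walked through above can be sketched as a small dry-run helper. This is a minimal sketch, not the ai.sweep script named on the slides: `archive_name` and `sweep_plan` are hypothetical names, the three-digit zero padding is inferred from the .001, .002 names in the diagrams, and the function only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helpers; not part of Progress or of the scripts named in the slides.

# Build an archive file name from the extent sequence number,
# e.g. /ailogs/dbname.001. Pass the sequence as a plain decimal
# number (no leading zeros) so printf does not read it as octal.
archive_name() {
    printf '%s/%s.%03d\n' "$1" "$2" "$3"
}

# Print (do not run) the two steps for one full extent:
# copy it to the archive, then mark it empty for reuse.
sweep_plan() {
    arcdir=$1
    db=$2
    extent=$3
    seq=$4
    echo "cp $extent $(archive_name "$arcdir" "$db" "$seq")"
    echo "rfutil $db -C aimage extent empty"
}

# Dry run for extent .a1 holding sequence number 1.
sweep_plan /ailogs dbname /ai/dbname.a1 1
```

Printing the commands first makes it easy to review a plan before letting anything mark an extent empty; a real script would also want to confirm the copy succeeded before emptying the extent.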
How do I use after-imaging to recover?
• Restore from backup. The preferred method is to
restore to a dedicated recovery area. DO NOT DESTROY
a damaged database without first backing it up.
• Determine where to recover to (point in time,
transaction id, last archived ai extent...)
• Obtain the archived ai extents from the backup point
through to the recovery point.
• Roll forward the archived extents:
rfutil dbname -C roll forward [endtime yyyy:mm:dd:hh:mm:ss] -a archiveExtent
ai.roll dbname startExtent [endExtent]
How do I recover using AI?
prorest dbname dbname.pbk < backup.list
rfutil dbname -C roll forward -a /ailogs/dbname.001
[Diagram: the restored database (DB, BI file, AI extents .a1 to .a4) and the archived logs in /ailogs: .001, .002, .003, .004, .005, .006, …]
rfutil dbname -C roll forward -a /ailogs/dbname.002
[Diagram: the next archived log in /ailogs, .002, is applied to the restored database.]
rfutil dbname -C roll forward -a /ailogs/dbname.003
…
[Diagram: the archived logs in /ailogs continue to be applied in sequence: .003, .004, .005, .006, …]
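Applying the archived logs one sequence number at a time, as shown above, can be sketched as a loop that prints the roll-forward commands. This is a minimal dry-run sketch and not the ai.roll script mentioned earlier: `roll_plan` is a hypothetical name, and the `<dir>/<db>.NNN` naming with three zero-padded digits is assumed from the /ailogs example.

```shell
#!/bin/sh
# Hypothetical dry-run helper; it prints commands rather than running them.
# Assumes archived logs are named <dir>/<db>.NNN with a three-digit,
# zero-padded sequence number.
roll_plan() {
    db=$1
    dir=$2
    seq=$3     # first sequence number to apply (plain decimal)
    last=$4    # last sequence number to apply (plain decimal)
    while [ "$seq" -le "$last" ]; do
        printf 'rfutil %s -C roll forward -a %s/%s.%03d\n' "$db" "$dir" "$db" "$seq"
        seq=$((seq + 1))
    done
}

# Plan the roll forward of logs .001 through .003.
roll_plan dbname /ailogs 1 3
```

The logs must be applied in ascending sequence order, which is why the loop counts upward rather than relying on directory listing order.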
Post-recovery…
• Remember to enable after-imaging. It is
disabled on the roll-forward target!
What is “Log Based Replication”?
• Log Based Replication is a fancy name for using
after-image files (“logs”) to maintain a copy of
your database.
• Uses for Log Based Replication:
– Verified Backup – make sure that your archived AI files
are valid.
– Reporting Database – use “norecover” to create a
reporting database.
– Warm Spare – keep a copy of your database (almost)
ready to go in failover mode.
How does Log Based Replication work?
rfutil dbname -C roll forward -a /stg/dbname.001
mv /stg/dbname.001 /arc/dbname.001
[Diagram: log .001 sits in /stg, is rolled forward against the target database (DB, BI file, AI extents .a1 to .a4), and is then moved to /arc.]
rfutil dbname -C roll forward -a /stg/dbname.002
mv /stg/dbname.002 /arc/dbname.002
[Diagram: log .002 sits in /stg; /arc already holds .001 and receives .002 after the roll forward.]
rfutil dbname -C roll forward -a /stg/dbname.seq#
mv /stg/dbname.seq# /arc/dbname.seq#
[Diagram: log .006 sits in /stg; /arc holds .001, .002, .003, .004, .005, .006, …]
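The roll-forward-then-move cycle above can be sketched as a loop over the staging directory. This is a minimal sketch under stated assumptions, not the ai.warm script named later: `apply_staged` and the `RFUTIL` override variable are hypothetical, and the loop relies on three-digit zero-padded names so that the shell's sorted glob matches sequence order.

```shell
#!/bin/sh
# Hypothetical helper. Applies every staged log to the target database in
# name order, then moves each applied log to the archive directory.
# Set RFUTIL=echo for a dry run that prints the rfutil arguments instead
# of executing rfutil.
apply_staged() {
    stg=$1
    arc=$2
    db=$3
    for f in "$stg/$db".[0-9]*; do
        # An unmatched glob stays literal; skip it when nothing is staged.
        [ -e "$f" ] || continue
        # Stop at the first failure so later logs are not applied out of order.
        ${RFUTIL:-rfutil} "$db" -C roll forward -a "$f" || return 1
        mv "$f" "$arc/"
    done
}
```

A call such as `apply_staged /stg /arc dbname` would fit a cron entry on the target server. Once sequence numbers pass 999 the name order no longer matches numeric order, so a longer-lived setup would need wider padding or an explicit numeric sort.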
What about the New! AI Archiver?
• The ai archiver is a daemon that automates
extent switching and archiving.
• New startup parameters allow you to start, stop
and configure the ai archiver.
• Does not handle off-site archiving, redundant
archiving, compression or purging of archived
logs.
• Uses a hideous file naming convention.
• Does not handle recovery.
• Does not handle monitoring or alerting.
AI Archiver (and some other loosely related features)
(Command, then Purpose)
• proutil dbname -C enableaiarchiver
– Enable ai archiver (offline).
• probkup online dbname -enableaiarchiver -aiarcdir dir -aiarcinterval n [-aiarcdircreate]
– Enable ai archiver (online).
• rfutil dbname -C aiarchiver setarcdir <dir-list>
– Set or change archive directory(s).
• rfutil dbname -C aiarchiver setinterval #
– Set or change archive interval (seconds; 120 to 86400).
• prostrct addonline dbname [st-file-name]
– Add extents online.
• probkup online dbname backupFile -enableai
– Enable after-imaging online.
Practical Matters
How often should I switch extents?
• How much data can you afford to lose?
– Can users re-enter 5 minutes of data? 15? 60?
– Can you “replay” external transactions? (EDI interfaces
and so forth…)
• Is your workload the same 24x7?
– Do the answers above vary between a “batch window”
and “online activity”?
– How about weekends and holidays?
• I often find hourly switches at night and every 15
minutes during the day to be a good starting point.
How should I setup after-imaging?
• Add ai extents:
prostrct add dbname ai.st
-or-
prostrct addonline dbname ai.st
# ai.st
a /ai
a /ai
a /ai
a /ai
• How many extents?
– 4 is the absolute minimum:
• 1 busy, 1 full, 1 empty (plus 1 “locked” if using OE
Replication).
– 8 is my recommended default:
• The “extras” give you time to react to issues.
– 16 is my suggested maximum – more is just awkward.
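A structure file with the recommended number of extents can be generated rather than typed. This is a minimal sketch: `make_ai_st` is a hypothetical helper name, and it simply repeats the `a /ai` line format shown in the ai.st example above.

```shell
#!/bin/sh
# Hypothetical helper: print one "a <dir>" line per after-image extent,
# matching the ai.st format shown above (variable-length extents).
make_ai_st() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "a $dir"
        i=$((i + 1))
    done
}

# Eight extents in /ai, the recommended default.
make_ai_st /ai 8
```

Redirecting the output, for example `make_ai_st /ai 8 > ai.st`, gives a file that can be passed to `prostrct add dbname ai.st`.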
Should I use fixed or variable extents?
• Variable Length
– More flexible.
– Simpler scripting.
– Easier monitoring.
– More time to correct problems.
• Fixed Length
– Many legacy implementations still use them.
– Fixed might be appropriate for very high volume sites.
• Recommendation: Use variable length extents.
How much disk space do I need?
• How much BI space do you use? (How many bi
clusters do you close in a period of time?)
• How many archived logs should you keep online?
• Do you keep disk images of backups online?
• What about off-site copies of backups and
archived logs?
• Do you plan to recover to dedicated recovery disk
space or “on top of” the existing database?
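As a rough first estimate, the archived-log space can be worked out from the BI cluster question above. This is a minimal sketch that assumes after-image volume roughly tracks the volume of closed BI clusters, which is a rule of thumb rather than a guarantee; `ai_space_mb` is a hypothetical name, and the sample figures (50 clusters per day, 16MB clusters, 40 days kept) are the ones used in the implementation worksheet in this deck.

```shell
#!/bin/sh
# Hypothetical estimator: MB of archived AI logs to keep online,
# assuming AI volume is about the same as closed BI cluster volume.
ai_space_mb() {
    clusters_per_day=$1
    cluster_mb=$2
    days=$3
    echo $((clusters_per_day * cluster_mb * days))
}

# 50 clusters/day x 16MB x 40 days
ai_space_mb 50 16 40
```

This prints 32000, that is about 32GB for 40 days of logs at 800MB per day. Space for zipped copies, staging, verification and recovery areas has to be added on top of that.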
What sort of disks should I use for AI?
• Dedicated disks.
– The primary job of after-imaging is to protect against
media failure.
– Storing after-image files on the same disks as the data
extents nullifies that protection!
• RAID5 (parity) is probably not your best option:
– After-Imaging is, essentially, write-only.
– RAID5 disks are performance-challenged when writing.
• RAID10 (mirrored stripes) is probably not beneficial:
– After-Imaging writes are sequential.
• RAID1 (mirroring) is the best choice.
AI Implementation Worksheet
(Item, FileSystem: Description)
• Extent Switching Schedule: M-F, 9-5 Every 15 minutes; hourly otherwise
• Number & Type of Extents: 8, Variable, Dedicated RAID 1 disks
• AI Extents, /ai: 8GB (~50 16MB bi clusters per day = 800MB)
• Archived Logs:
– /ailog: 32GB (40 days)
– /aizip: 16GB, Zipped logs
– /aistg: 8GB, staging area for logs to be verified from
– /aiver: 32GB, archive of verified logs
• Verified Backup, /aitest: 125GB
• Backup Strategy, /backup: 250GB, Backup -norecover from /aitest to disk, then tape
• Offsite Archives, /ailog: scp logs to remote server X, 32GB (40 days)
• Recovery Strategy, /recover: 250GB (current production db size x 2.5)
• Warm Spare Strategy: X is an offsite mirror of prod, apply offsite logs continuously
• Report Server, /reports: 125GB, Restored from /backup nightly
How do I start after-imaging?
• Backup:
– probkup is simpler because it marks the db as
“backed up”.
– OS backups require an extra manual step:
rfutil dbname -C mark backedup
• Enable After Imaging:
rfutil dbname -C aimage begin
• Start an AI Writer (AIW):
proaiw dbname
How do I manage after-imaging?
(Script, whether the AI Archiver covers it: Description)
• ai.new (AI Archiver: Yes): Switches to the next available empty extent.
• ai.sweep (AI Archiver: Partial): Copies full extents to (multiple, redundant and possibly remote) archive locations. (The AI Archiver only copies archived extents to a single location on the same server.)
• ai.roll (AI Archiver: No): Rolls forward a set of AI logs against a database. Simplifies roll-forward by grouping files and ignoring “wrong extent” warnings.
• ai.purge (AI Archiver: No): Purges old archived extents.
• ai.warm (AI Archiver: No): Applies AI logs that appear in a staging directory to a target database. Used to maintain warm spares and verified backup databases.
• ai.ready (AI Archiver: No): Checks a warm spare or verified backup database to ensure that AI logs are being properly applied.
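Purging old archived extents is the easiest of these jobs to get wrong, so a list-first approach is worth sketching. This is a minimal sketch and not the ai.purge script itself: `purge_candidates` is a hypothetical name, the `<db>.*` naming is assumed from the archive examples in this deck, and the function only lists files without deleting anything.

```shell
#!/bin/sh
# Hypothetical helper: list archived logs for one database that were last
# modified more than <days> days ago. It only prints paths; nothing is removed.
purge_candidates() {
    dir=$1
    db=$2
    days=$3
    find "$dir" -type f -name "$db.*" -mtime +"$days" -print
}
```

For example, `purge_candidates /ailog dbname 30` would list logs older than the 30-day retention period; the list can be reviewed, or checked against the offsite copies, before anything is removed.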
After-Imaging on UNIX
# crontab (source server)
#
1,16,31,46 * * * * ai.new cs608 base callb callr invpr >> /logs/ai.log 2>&1
#
2,17,32,47 * * * * ai.sweep cs608 base callb callr invpr >> /logs/ai.log 2>&1
#
0 20 * * * ai.purge cs608
# crontab (target server)
#
10,25,40,55 * * * * ai.warm cs608 base > /dev/null
#
0 * * * * ai.ready cs608 base callb callr invpr > /tmp/ai.ready.log
#
0 20 * * * ai.purge cs608
How should I monitor after-imaging?
• After-imaging should be enabled.
• Busy extents should be 1.
• Full extents should be less than or equal to 2.
• Empty extents should be “most of them”.
• The last messages in the .lg file of a replicated database should be:
(662)Roll forward completed.
(334) rfutil -C roll forward session end.
(with appropriately recent date and time stamps.)
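The busy and full rules above lend themselves to a scripted check. This is a minimal sketch under a loud assumption: it expects extent status text on standard input with one `Status:` line per extent containing the word Busy, Full or Empty, which is how the output of `rfutil dbname -C aimage extent list` is assumed to look here; verify the format on your release. `check_extents` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical checker. Reads extent status text on stdin, counts the
# extents in each state, prints the counts, and exits non-zero unless
# exactly one extent is busy and at most two are full.
# Assumed input: one "Status:" line per extent naming Busy, Full or Empty.
check_extents() {
    awk '
        /Status:/ && /Busy/  { busy++ }
        /Status:/ && /Full/  { full++ }
        /Status:/ && /Empty/ { empty++ }
        END {
            printf "busy=%d full=%d empty=%d\n", busy, full, empty
            if (busy != 1 || full > 2) exit 1
        }'
}
```

It could be used as `rfutil dbname -C aimage extent list | check_extents || echo "AI extents need attention"`, with the alert replaced by whatever paging or mail mechanism the site already uses.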
Troubleshooting
Extents Stop Switching
• You may have disabled cron, the cron job or the ai
archiver (if you are using it).
• Or you may have introduced a scripting error.
• You may have run out of disk space somewhere.
• With variable extents in use and “large files”
enabled disk space becomes the limiting factor.
You have more time to detect, respond to and fix
the problem.
• With fixed extents the database may stall or crash
much sooner.
• If you are out of ideas try a manual extent switch.
Roll Forward Fails
• You may have guessed the wrong extent – this is
harmless. Try another. The message in the .lg file
tells you which sequence# you need.
• An archived extent might be missing or damaged –
find a valid copy and try again. This is a good
reason to make redundant copies of ai logs.
• A more serious error may have occurred. Read the
.lg file and check out the error on PSDN if
necessary. Use “roll forward retry” after correcting
the error.
Opening a Replication Target 
• Once you start a server or open a single-user
session against a replication target you cannot
roll-forward any more logs.
• Even if you change no data.
• You can, however, safely start a -RO session.
• If someone opens the database you will need
to re-initialize the replication target.
Forgetting to Enable After-Imaging.
• Usually happens after a conversion or a
recovery/fail-over.
• Add extents online (if necessary).
• probkup and enable ai online.
• Re-initialize your replication targets.
(Re-)Initializing a Replication Target
• Move any accumulated staged ai logs to a
temporary directory.
• Obtain a backup of the source database.
• Restore the backup on the target server.
• Transfer the 1st needed ai log and all
subsequent logs to the staging directory.
– An incorrect log will result in a message in the .lg
file that identifies the needed sequence#.
Why re-initialize?
• Failing back from fail-over recovery to your
warm spare.
• Someone accidentally opened your replication
target.
• After-imaging was deliberately disabled for
some reason.
• Dump and load.
Disabling After-Imaging
• There are not many good reasons to disable after-imaging. This should be very rare.
• Among the possible reasons:
– Dumping and loading.
– Large, write-intensive processes that can be restarted.
• If you must disable after-imaging:
– Backup and be prepared to restore.
• Allowing users to have access in this period is often not
compatible with being able to restore from backup.
– Do what needs to be done.
– Re-enable after-imaging.
– Re-initialize any replication targets.
• The actual commands are in the documentation.
Tricks!
• Getting the next “full” extent:
EXTENT=`$DLC/bin/_rfutil ${DB} -C aimage extent full`
• Getting an extent’s sequence number:
SEQ=`rfutil ${DUMMY} -C aimage scan -a ${EXTENT} | grep number | tail -1 | awk '{print $6}'`
• Using the verification database for backups:
probkup dbname dbname.pbk -com -norecover < backup.list
• Using the backed up verification database for
reporting:
prorest dbname dbname.pbk < backup.list
Conclusion
After-Imaging Best Practices
• Enable after-imaging on all updateable databases.
• Place after-image extents on separate disks from data
extents.
• Use 8 to 16 variable extents with “large files” enabled.
• Run an AIW.
• Switch extents as often as the business needs you to.
• Use the sequence number when naming archived logs.
• Copy archived logs to a remote location ASAP.
• Verify your process by continuously rolling forward.
• Monitor your “empty” and “full” extents.
• Keep at least 30 days of archived after-image logs.
• Establish a dedicated backup and recovery directory.