Data Recovery Best Practices
Building a responsible backup and recovery system for your databases
When people think about Data Recovery, they think largely about backups and the actual act of both
backing up the database and associated files and the process of restoring those files to the server.
Without a solid plan in place that reviews the best approaches for setting up a plan, testing the plan
and executing on that plan, you can quickly get into trouble.
Planning for data recovery is more than just making sure your database is backed up. You need to
understand how the process works, you need to have the right tools in place, and you need to have
practice in using those tools. When the time comes to restore information to your production systems, you
won’t want to be learning about how things work; you’ll want to get the job done as quickly as possible.
There are many different components to a competent backup and recovery plan. In addition, there are
many types of recovery plans available. Each of these different approaches may suit what you need for
different types of issues that arise. You need to understand and plan for the differences between a full
system restore and a point-in-time recovery. At the most precise level, you may even need to recover
a specific transaction or data element. As you can imagine, understanding each of these, and how to
execute on them, is critical to managing your data resources.
In this white paper, we’ll explain each of these items, talk about what they mean and how they
apply. We’ll also provide key planning points, and investigate how some different tools can help you
accomplish these tasks.
by Stephen Wynkoop, Microsoft SQL Server MVP
Founder, The SQL Server Worldwide User’s Group
swynk@bitsonthewire.com
Why Backup is Necessary
Backup provides you a recovery avenue when things go wrong. Hard drives fail, connections between systems
fail and have to be restored, people make mistakes, all causing the need to recover at different levels.
Note: Backup processes and planning often revolve around the unsettling question of “how much
can you afford to lose.” This is because you need to determine how often you back up the
transaction logs and databases, while at the same time paying attention to disk and/or tape space
constraints. In addition, you’ll need to decide how you store backups, how many days of backups you
retain and lastly, whether you want to maintain a sub-set of your backups off-site.
Remember, in the worst possible scenario, if your backups are stored right next to your computer and
there is a fire, the backups will go up in smoke too, right along with your computer. It’s important to
have at least a skeleton off-site storage plan.
Keep in mind that responsible planning and management of your systems
includes more than just backing up to a device and then restoring the
database should systems fail. There are really three different types of
recoveries you may be faced with, and several shades of gray between each
of these. The major restore options are explained in the next three sections.
Full Database Recovery and Restore
Full database backup and restore is what many people think of when they
consider their backup strategy, and it’s the most drastic recovery path. This
requires that you restore the most recent full database backup, and then
apply all transaction logs that were backed up after that backup was taken.
At the end of the process, your database will be in the same state it was as
of the time of the last transaction log backup. Your data loss in this scenario
will amount to that information that was not in the most recent transaction
log backup.
Point-in-Time Recovery
Point-in-Time Recovery lets you recover, typically using transaction logs, to
a specific time when you know the data was valid. This typically means
you’ve discovered data issues after some time has passed. This usually
means restoring the most recent backup, then applying transaction logs to
the system up to just before the time when you know the data began to have
issues. This lets you restore to a known good point in time. You can also
perform differential database backups – these allow you to back up just the
changes since the last full backup was performed.
Specific Transaction Recovery
Transaction-based recovery is typically done in one of two different ways.
First, your application can be managing transactions in the code by starting
transactions, doing a bit of work, and then committing the work to the
database with an end transaction call. If the transaction fails, it can be rolled
back, putting the information in the tables into the same state that it was in
when the transaction was started. In addition, if the server were forced to
restart during the transaction, SQL Server would roll back the transaction,
putting the database into a known state – the values representing the
values in the database at the time that the transaction was started.
Second, it’s possible to roll back specific transactions (either literal transactions
or merely changes to the data in the database) using third party tools.
Lumigent’s Log Explorer product will let you peruse data changes, along
with a whole host of information about those changes. This includes who
made the change, what was the value before the change, etc. From this
information, the tool will allow you to restore specific values, in essence
rolling back data modifications, even without the benefit of transactions.
Disaster Planning and Recovery
Disaster planning must take into account the types of recovery you want and
need to support. You need to have a written plan, and you need to test the
plan to make sure it addresses the different facets of any restore process.
Remember, you won’t control when the process is needed. You want to be
able to provide for how the process is done, what the expected outcome will
be, and how to provide for support for these processes up to the time you
need the recovery efforts to begin.
What follows are some guidelines to thinking through your plan.
How Much Data Can You Afford To Lose?
As mentioned above, this is perhaps the most telling question you need to
be sure you can answer. If you can’t lose a single transaction or a single
change, your disaster planning and recovery efforts will need to include
fail-over systems. This means you’ll be looking into clustering solutions,
and you’ll be working with hot stand-by systems and real-time replication
and archival solutions. These tend to lead to rather large budgets, so
depending on your budget, “no data loss whatsoever” may not be a reality.
That said, and assuming that you’re not looking into a clustered solution,
you’ll need to know how much data you have in the actual database(s) you’re
backing up, and you’ll need to know what size the transaction logs get to as
the database is used.
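Once those two measurements are in hand, the space a retention policy needs is simple arithmetic. The following is a minimal sketch only: the function name, its parameters and the example figures are all illustrative assumptions, not values taken from any particular system.

```python
def estimate_backup_space_gb(db_backup_gb, retained_full_backups,
                             avg_log_backup_gb, retained_log_backups):
    """Rough disk (or tape) space needed to hold a retained backup set."""
    full_space = db_backup_gb * retained_full_backups
    log_space = avg_log_backup_gb * retained_log_backups
    return full_space + log_space


# Illustrative numbers only (assumptions): a 20 GB database backup kept for
# 7 nights, plus 0.5 GB log backups taken 23 times a day over those 7 days.
needed = estimate_backup_space_gb(20, 7, 0.5, 7 * 23)
print(f"Plan for at least {needed:.1f} GB of backup storage")
```

Because transaction log backup sizes vary with workload, it is safer to feed this kind of estimate with the largest log backups you have observed rather than the average.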
One of the most common approaches to backups, and one which allows
for only a maximum one hour data loss window, is to backup the database
nightly and the transaction logs hourly. Typically, you’ll set up SQL Server to
keep a specific number of days worth of backup as archive. When you set up
this type of backup structure, you’ll tell SQL Server “Keep 14 days of backups,
backup the database each morning at 3AM and the transaction logs every
hour for all other times.”
Keep in mind that, if you’re using this approach, you need to have disk (or
tape, if you’re backing up directly to tape) space equal to more than 14 times
the size of your database since you’ll be keeping 14 archival copies in the
queue. In addition, you need to plan for enough space to support the 13
transaction log dumps. The size of transaction log dumps varies wildly and is
entirely dependent on the volume of information processed by SQL Server.
About Transaction Logs and Keeping Historical Backups
Many people make the mistake of thinking that as long as they have several
days of backups, they can restore to any point in time during those several
days. It can be a painful lesson to learn that this may not be the case,
depending on your archive solution. Consider the following backup policy:
• Nightly backups
• Hourly transaction log dumps
• Database backups are kept online for five days, then archived
to a secondary source
• Transaction logs are rotated to keep the most recent
24 hours available
At first glance, this is great. You can recover to the last database backup,
then apply the transaction logs to recover beyond that to the current state, or
any time in between. If your system fails, and you recognize the failure within
24 hours of the last database backup, you’re correct in saying that you’re
covered. Keep in mind, though, that if you have the possibility of needing to
restore further back than that last database backup, you will be faced with
data loss.
This situation comes from the fact that you’ll restore the database from
three days ago (as an example), which would be available online. But if you
follow the history configuration for the transaction logs, you’ll find that the
transaction logs are only available for the last 24 hours. This would mean
you wouldn’t be able to move forward beyond that three-day old backup.
You’d be restoring to that point and no further in the database.
Keep this in mind as you architect your recovery solution. You need to
consider your transaction log rotation schedule in addition to your backup
rotation schedule. It all goes back to “how much data can you lose” and how
far back are you willing to support in the need to recover that data? If the
answer is that you need to be able to restore to a point in time during that
five day window (from our example of five days online backup storage), you’ll
need to also be storing five days of hourly transaction logs.
Optimize availability
When you’re building out your plan, be sure to consider the impact on your
users and those dependent on access to the database. If you’re in a situation
that requires access at all times (financial applications are an example of
this), you’ll want to look not only at a recovery plan, but also a failover plan.
Failover will protect you in cases where a hard drive fails, or other instances
where the server goes offline, taking your database systems with it. Failover
typically includes clustered server capabilities, where you have more than
one server working against a given set of data. If one server does fail, the
other server is able to pick up where the failing server left off and the user
experience is largely unaffected by the downtime.
Note
In a clustered environment, if a failover situation does occur, the application
working against the database may need to be restarted to “see” the
recovery server. Typically this is merely a restart of the application, or
a reconnection to the web site or other resource working with your SQL
Server. The important point here is that your recovery plan in a clustered
environment should include several phases:
• Bring the applications back online against the recovery server(s)
• Take the server offline that is down and/or experiencing trouble
• Correct the issue with the original server
• Bring the original server back into the cluster to begin supporting the
cluster again
On the other hand, if you don’t need to make sure you have full access, all
the time to the server, you can work out your plan so you know exactly what
you need to do to recover your system, get people back working again in the
shortest period of time, and how to address problems that may arise during
that process.
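The retention pitfall described under About Transaction Logs and Keeping Historical Backups (five days of database backups, but only 24 hours of transaction logs) can be checked before you need it. The sketch below is a simplified model, not SQL Server's own logic: it assumes each log backup covers the interval since the previous one, and `plan_restore`, its parameters and the sample dates are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta


def plan_restore(full_backups, log_backups, target,
                 log_interval=timedelta(hours=1)):
    """Return (full_backup_time, [log_backup_times]) needed to reach target.

    Raises ValueError when the retained backups cannot reach the target.
    """
    candidates = [t for t in full_backups if t <= target]
    if not candidates:
        raise ValueError("no retained full backup at or before the target time")
    base = max(candidates)  # most recent full backup before the target

    chain = []
    covered_until = base
    for log_time in sorted(t for t in log_backups if t > base):
        if covered_until >= target:
            break
        if log_time - covered_until > log_interval:
            break  # a log backup is missing, so the chain stops here
        chain.append(log_time)
        covered_until = log_time

    if covered_until < target:
        raise ValueError(
            f"can only restore to {covered_until}; "
            "the log chain is incomplete after that point"
        )
    return base, chain


# Sample policy (assumed dates): full backups at 03:00 kept for five days,
# hourly log backups kept for the most recent 24 hours only.
now = datetime(2024, 3, 5, 15, 0)
full_backups = [datetime(2024, 3, day, 3, 0) for day in range(1, 6)]
log_backups = [t for t in (now - timedelta(hours=h) for h in range(24))
               if t.hour != 3]

base, chain = plan_restore(full_backups, log_backups, datetime(2024, 3, 5, 10, 30))
print("restore full backup from", base, "then", len(chain), "log backups")

try:
    plan_restore(full_backups, log_backups, datetime(2024, 3, 3, 12, 0))
except ValueError as problem:
    print("three days back:", problem)
```

Running a check like this against your real retention settings shows how far back a point-in-time restore actually reaches, which is often less than the number of days of database backups suggests.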
Plan for the Failure, Don’t Fail to Plan
Executing on your plans will be key – below you’ll find different things you’ll
need to consider and work through as you design your recovery plans.
Backup Procedure Checks –
• Are they working?
  • Check your scheduled task’s history entries
  • Check the backup directory for the related database and transaction
  log dump files
• Are they archiving appropriate numbers of past copies of the backups?
  • Check the directory for past copies of the database and transaction
  log dump files – if you’re expecting a rotation of files, perhaps several
  days worth or more of these files, make sure they’re in the directory.
• Are the transaction logs backing up on time?
  • Check the job history
  • Check the directory that is used for the backups; make sure the
  transaction log dumps are there.
When you review the backup file sizes, if you see that your transaction
log dump files are rather large, you may want to consider making the time
between transaction log backups smaller. Remember, in the case of a
restore, you’ll be restoring the database, then the transaction logs to get
caught up. If the transaction logs are large, this can mean that you are
running a large number of transactions, which translates into losing a large
number of transactions (since the last transaction log backup) between
backup processes.
If you’re using SQL LiteSpeed, try running LiteSpeed with the debug option
turned on. This will enable you to see the various messages as the backups
are performed. You’ll need to manually run the backups to be able to
review/see these messages. Alternatively, you can have the output of
the backup operations directed to a log file, external to SQL Server. You
can then review this log file for any issues that may arise. For more
information, read about the @logfile option with LiteSpeed.
Perhaps the most important check is whether your backup files can be restored.
It sounds silly, but there are a large number of people that can attest to the fact
that they thought they were successfully backing up and were protected from
disaster. When it came time to recover and restore their files from backup,
they found that they didn’t know how (didn’t know the commands), the backup
files were either missing or corrupt, or they couldn’t find the correct hardware/
software combination to get the files back onto the server for restoration. (This
last point is one that pertains largely to tape backup systems.)
Once you have your backup files, you need to make absolutely certain they are
valid, that you know how to restore them, and that the restoration process
is documented. Remember, if you’re encrypting or password protecting your
backups, the password should be stored somewhere safe, but somewhere
where the right person knows how to get to it. If you’re away on vacation
and the system must be restored, there should be a procedure that can be
followed to complete the restoration, complete with passwords.
Keep in mind that just because you may not be taking vacations, this doesn’t
mean you don’t need a plan. When things go wrong, the last thing you
want to be doing is trying to remember the steps you need to follow to get
your systems back online. Take the time now to write out the steps… then
practice them.
Tip
Here are some pointers to keep in mind
for the restoration process planning:
• Have a written plan with steps to follow for the restoration and recovery
process. One very important thought on this topic has surfaced given
the recent mass power outage in New York City and the surrounding
areas. If you consider that the phones and many transportation systems
were out of commission, you quickly see that, as the DBA, you
can’t count on getting back to the office to address issues.
While this is extreme, it does point out that it’s possible that whoever
happens to be in the office at the time a critical issue arises needs to
be able to address that issue. You need to have a written plan.
• Try performing your restores against a second server. Make sure you
know the process and that you’ve gone through the steps of restoring
the database, checking user permissions, applying transaction logs.
• If you’re working in a clustered environment, run through a test with
a failed node. Note of course that unless you have an extra clustered
environment this can be tricky relative to downtime. Make sure you
have a planned maintenance window and that you’re prepared for
issues that may arise. While this will take some meticulous planning
to avoid complications, all the planning and studying to understand the
failover technologies will pay off – not just in the dry run, but in the real
thing when the knowledge is needed most.
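Some of the backup procedure checks above, such as whether transaction log dump files are arriving on time, can be partly automated by looking at file timestamps in the backup directory. This is a rough sketch under stated assumptions: `find_late_backups` is a hypothetical helper, the file suffix you pass in depends on how your backup jobs name their files, and it only inspects modification times. It does not prove a file can be restored.

```python
import os
import time


def find_late_backups(backup_dir, suffix, max_gap_seconds, now=None):
    """List timing problems among backup files ending in `suffix`."""
    now = time.time() if now is None else now
    # Modification times of the matching files, oldest first.
    mtimes = sorted(
        entry.stat().st_mtime
        for entry in os.scandir(backup_dir)
        if entry.is_file() and entry.name.endswith(suffix)
    )
    if not mtimes:
        return ["no backup files found"]

    problems = []
    # Compare each file with the next one to spot skipped backups.
    for earlier, later in zip(mtimes, mtimes[1:]):
        gap = later - earlier
        if gap > max_gap_seconds:
            problems.append(f"gap of {gap / 3600:.1f} hours between backups")
    # Check that the newest file is recent enough.
    age = now - mtimes[-1]
    if age > max_gap_seconds:
        problems.append(f"newest backup is {age / 3600:.1f} hours old")
    return problems
```

For an hourly log backup schedule you might call it with a directory of your own and a tolerance slightly over an hour, for example `find_late_backups(r"D:\Backups\Sales", ".trn", 75 * 60)`, where the path and suffix are placeholders. An empty list means no timing problems were found.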
Disk to Disk = Best Practices
You have several options when considering the actual approach to backing
up your system, especially as it relates to how you’ll store the backups, how
you make them available for restores, and how you archive those backups.
Typically, you can expect your backups to be needed for a restoration process
within a reasonably short time. This is because backups are used to recover
a system after a system failure – not to “go back in time” to see data. This is
an important distinction because you’ll want to make sure your most recent
backups are both the most protected and the most readily available.
As a general rule of thumb, you’ll find that disk-to-disk backup is a much
better solution than tape-based alternatives when it comes to recovery
options and processes. Some of the benefits of this approach include:
• Speed – with no tape transfer process to work with, you can access your
database and transaction log backups immediately, providing a much
faster path to recovery.
• Additional recovery options – you can use products like Lumigent’s
Log Explorer to work with the transaction logs, making transaction and
specific data element recovery possible. This may be possible with tape
backup, but would force a restore to your server or other location.
• More reliable data storage medium – since you’re backing up to disk,
you stand a better chance of not having the media go “bad” for your
backups. That said, of course, make sure you’re backing up your backup
devices, just in case. Keep in mind too that the “Acts of God” issues
still remain – if you’re backing up to the disk on the same server that
has your SQL Server, or you’re backing up to another server physically
located near your SQL Server, you can still be in danger of not being able
to recover from fire or other catastrophic disaster. For this reason, it’s
good to keep archive copies (perhaps weekly, for example) off-site as a
last-step recovery mechanism.
By backing up to disk, and keeping those backups online and available, you
are able to use world class tools to quickly provide recovery options. Time is
of the essence when you’re working to bring systems or data elements back
online. Backing up to tape requires locating the tape, restoring to your server
– both of which require time and introduce variables that can stand in the
way of your recovery process’ success.
If given a choice, it’s always a better solution to back up to disk.
The table below shows some examples and recovery approaches you
can employ with this type of system in place, based on the scenario
you’re facing.
Scenario: Recover a database
Approach: Restore the database; restore the logs, in order, from the point in time
of the last backup. The resulting system will include all updates up to
the time of the more recent transaction log backup.
If you only want to recover to a specific point in time, determine which
log file occurs closest to the point in time before your target time
period. Restore the database, restore the log files up to that point.
Scenario: Recover a specific data element change
Approach: Using Log Explorer, you can review the transaction logs, locate the change
that was in error and restore the data to the value prior to the change.
Scenario: Recover a dropped table
Approach: Restore your database and log files to a new, temporary database.
From this database, you can copy the lost table back to the production
database.
Alternate solution, use Lumigent’s Log Explorer product to recover the
lost table – recovery is possible for DROPped or TRUNCATEd tables,
depending on your transaction logs.
Being Prepared for Recovery – the Backup Process
By utilizing disk-based backup procedures, you can optimize your
responsiveness and available up time to support the recovery methods you’ll
need. By using the right tools, you will have a full circle of options when it
comes to restoring and recovering from system and database issues.
How you back up your information is just as important as having
the tools and knowledge available to you to recover your data. Backing up
your data with tools or technologies that can become faulty or cause time
delays in your recovery cycles is simply not good practice.
A very significant tool you can use to optimize your system – both on the backup and
recovery sides of the equation – is the SQL LiteSpeed product from DBassociatesIT.
The product offers fast, non-CPU-intensive, encrypted and compressed backups. One
objection to backing up to disk has been the amount of disk space required to support
a solid recovery model. With LiteSpeed’s compression technologies, you’ll not have
to use third-party archive and compression utilities, and you can save drastically
on the disk space you need to store and manage your database and transaction
log backups.
If you’re interested in more information on
either of the products mentioned, you can
visit their sites here:
DBassociatesIT, SQL LiteSpeed
http://www.dbassociatesit.com
Lumigent Technologies, Log Explorer
http://www.lumigent.com
LiteSpeed runs just like the native backup routines in SQL Server and syntax is nearly
identical to native backup options in all but just a few new commands. In addition, you
can address the security issues associated with traditional backups by encrypting your
database and transaction log backups with true encryption that protects the whole of
your backup set.
To be best prepared, set up a backup server – the destination for your backups. Install a
good amount of disk space and use this as the destination for your backups. Don’t store
the backups on the same drive as your databases. This is a solution that would provide
no recovery path when the disk fails.
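The advice about not storing backups alongside the databases can be sanity-checked with a few lines of script. This is a minimal sketch that assumes both paths exist and that the operating system reports a device identifier through `os.stat`; the helper name `on_same_volume` is hypothetical. Note its limit: two different volumes can still sit on the same physical disk, so a result of False is necessary but not sufficient.

```python
import os


def on_same_volume(data_path, backup_path):
    """True when both existing paths report the same device/volume id."""
    return os.stat(data_path).st_dev == os.stat(backup_path).st_dev


# Trivial demonstration: any path is on the same volume as itself.
print(on_same_volume(os.getcwd(), os.getcwd()))
```

In practice you would pass the directory holding your database files and the backup destination, and treat a True result as a warning that a single disk failure could take both.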
Summary/Conclusion
There is much to consider as you build out your backup, restore and recovery plans. It’s
more than the ability to simply restore your database; you need to manage the recovery
options and make sure every one of them is available to you.
Be sure to write out your plan. Test the plan, practice the plan, and make sure others
that may be in contact with the servers in your absence are also aware of and familiar
with your plans. While restoration of a single point in time transaction isn’t something
you need to train everyone on, you should consider training on full system restores,
transaction log restores and how to work with the backup media you use.
Use 3rd party tools as appropriate to make sure your systems are both optimized and
providing the highest level of functionality you need. Having too many options is just not
possible when the users are screaming, the boss is sweating and you’re in the hot seat
to get things right again with your database server.
About Stephen Wynkoop
Stephen Wynkoop is the founder of
The SQL Server Worldwide User’s Group
(www.sswug.org) where he writes a daily
database column and newsletter, and a
Microsoft SQL Server MVP. Stephen is a bestselling SQL Server author and a well-known
speaker at technical conferences. Stephen
first started working with SQL Server when it
was first introduced in 1993 and has worked
with SQL Server ever since.
In addition, Stephen has authored online and
offline columns, books, and other references
on Office Development Technologies, web
site design and deployment technologies and
Microsoft Access.