I Never Backup Production So Why Would You?

Why Backup Your Production Database? I Never Do.
Adam Backman adam@wss.com
Partner, White Star Software
Backups are for Sissies
 No reason to back up – Our stuff never fails
 Just takes up resources
 We already have redundancy
 I hate changing the tapes
 I’m tired
 I’m hungry
 I don’t feel good
Portions of an OpenEdge Database
 Database table of contents (.db)
 The data files (.d*)
 Before image journals (.b*)
 After image journals (.a*)
 Modified buffers in memory
Other Important Stuff
 Application
 External data (GIS, photos, …)
 User files
 External systems (EDI, Data warehouse, …)
Reliability is Important
 Loss of data is expensive
 Many businesses now lack a paper trail
 Redundancy does not equal reliability
− Rogue program
− 2 copies of bad data
What is High Availability?
 Classic definition equals 24x7 operation
− Examples: manufacturing/e-commerce/follow the sun
− Little or no downtime
− Maintenance is done in very specific windows
 More common definition
− Traditional business: 8a-6p, or 9-5 across a single country with 3 time zones
− Operational hours are critical
− Maintenance windows on a regular basis
 Unconventional definition
− Is performance good enough to run the business
Impact of the Backup Process on Production
 Backing up production
− Pause during backup of before image journal
− Uses I/O capacity of production
− Impacts the effectiveness of the buffer pool
 Split mirror backup
− Use of quiet point keeps pause to a minimum
− Pause is non-zero
 After Image file backup
− Still needs a backup to begin the process
− Very little impact for backup process
− Long recovery time
Cover all sides
 Everyone should be running after image journaling
 Need removable backup periodically
− Wide scale events (fire, flood, …)
− To recover from after image journals
 Replication is becoming the new default
− OE Replication
− Log-based replication
− Hardware replication
OpenEdge Replication
 OpenEdge Replication is the only replication method
supported by Progress
 OpenEdge Replication is the only method that
allows you to use the target database(s) for
reporting
 OpenEdge Replication requires that you have
after image journaling enabled
 Do not attempt to implement OpenEdge
Replication until after you have a good AI
management plan implemented
OpenEdge Replication
[Diagram: Production (source) with Shared Memory, Source DB, and Replication Server; Reporting (target) with Replication Agent, Target DB, and Shared Memory]
Log-Based Replication
 Log-based replication has been used for years, since
OE Replication is a fairly new product
 Log-based replication provides a vehicle for
replication without the licensing costs of OE
Replication
 Not real-time
 Code for this type of replication must be
maintained by the user and there is no official
support from the vendor
Hardware-Based Replication
 Hardware-based replication is a function of the
hardware vendors and thus supported directly by
them
 This method is NOT supported by Progress
 ALL write operations must be guaranteed across
the source and target disk systems
Archiving
 Who does your archiving (Iron Mountain, third party, someone’s house, …)
 What do you keep
− Two weeks of dailies
− 5 weeks of weeklies
− 1 year of monthlies
 How to label your backups
− Who did the backup
− Command to restore
− Date and Time
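The retention schedule above (two weeks of dailies, 5 weeks of weeklies, 1 year of monthlies) can be sketched as a simple keep-or-discard rule. This is only an illustration: the slide does not say which backups count as the weeklies or monthlies, so treating Sunday backups as weeklies and first-of-month backups as monthlies is an assumption, and `should_keep` is a hypothetical helper name.

```python
from datetime import date

# Retention windows taken from the slide; which backup counts as a
# "weekly" or a "monthly" is an assumption made for this sketch.
DAILY_DAYS = 14        # two weeks of dailies
WEEKLY_DAYS = 5 * 7    # 5 weeks of weeklies
MONTHLY_DAYS = 365     # 1 year of monthlies


def should_keep(backup_day: date, today: date) -> bool:
    """Return True if a backup taken on backup_day is still inside a retention window."""
    age = (today - backup_day).days
    if age < 0:
        raise ValueError("backup date is in the future")
    if age <= DAILY_DAYS:
        return True   # every daily is kept for two weeks
    if age <= WEEKLY_DAYS and backup_day.weekday() == 6:
        return True   # assumed: Sunday backups serve as the weeklies
    if age <= MONTHLY_DAYS and backup_day.day == 1:
        return True   # assumed: first-of-month backups serve as the monthlies
    return False
```

A rule like this is best run in report-only mode first, so that someone reviews the list of media it would release before anything is overwritten.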
Archiving (continued)
 Data Archiving
− Archive/Delete?
− Archive/Save historical
− Archive/Save aggregates
 After Image file archiving
− At least 2 backups’ worth
− I recommend a week or more if possible
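One way to read the "2 backups" rule for after image files is: keep every archived AI file written since the second most recent full backup, so that either of the last two backups can still be rolled forward. The sketch below assumes you already track backup times and AI archive times yourself; the function name and the file names in the usage example are hypothetical, not part of any OpenEdge tool.

```python
from datetime import datetime


def ai_files_to_keep(backup_times, ai_archives, backups_worth=2):
    """Pick the archived AI files that must be retained.

    backup_times: datetimes of completed full backups.
    ai_archives: list of (name, archived_at) tuples.
    Keeps every AI file archived at or after the Nth most recent backup.
    """
    if len(backup_times) < backups_worth:
        # Too few backups on record: keep everything rather than guess.
        return [name for name, _ in ai_archives]
    cutoff = sorted(backup_times)[-backups_worth]
    return [name for name, archived_at in ai_archives if archived_at >= cutoff]


# Usage with made-up data: three nightly backups, one AI archive per day.
backups = [datetime(2024, 6, 1), datetime(2024, 6, 2), datetime(2024, 6, 3)]
archives = [
    ("ai.001", datetime(2024, 6, 1, 12)),
    ("ai.002", datetime(2024, 6, 2, 12)),
    ("ai.003", datetime(2024, 6, 3, 12)),
]
kept = ai_files_to_keep(backups, archives)
```

Raising `backups_worth` to cover a week of backups matches the longer retention recommended above.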
Building a Good Recovery Strategy
 Know your business
− Components of business: how people do business with you
− Components of systems: tools (applications and physical)
 Know your risks (fire, flood, hurricane, …)
 Be inclusive
− Technical people (network, phones, facilities, …)
− Business people (people who own the data)
 Build an execution plan with contingencies
Creating a plan
 Goals (Event-based goals)
− If we lose a disk (DB gone)
− If we have a fire (Machine Gone)
− If we have a natural disaster (Facility Gone)
 Hardware
 Software
 Data
 Other stuff
Creating a plan - Goals
 Acceptable downtime (Generally cost based)
− Everyone wants zero, but it is generally cost prohibitive
 Planned outages
− Hardware install and maintenance
− Software upgrade
− O/S upgrade or patch
 Notifications (Both before and during outage)
− Who
− When
− What do they do?
Creating a Plan – Other Stuff
 What makes your business run?
− Phones
− Faxes
− Business to Business (EDI, XML Feed, …)
 Can people work from home?
 Do you have/need another location?
 Contact lists in case of major catastrophe
− Kept up-to-date
− Kept online and printed in an accessible location
What if I am a SaaS user?
 Who is your provider?
 Verify their recovery plan
 Run a dry run of at least one recovery scenario
 Have specific service level agreements
− Time to recover
− Maximum loss of data
− Penalties for missing times
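The results of a dry run can be compared against the agreed service levels mechanically. The following is a minimal sketch, assuming the SLA expresses both time to recover and maximum loss of data as durations; the function name and the numbers in the usage example are illustrative only.

```python
from datetime import timedelta


def sla_breaches(max_time_to_recover, max_data_loss,
                 observed_recovery, observed_data_loss):
    """Return the SLA terms that a dry run or real outage failed to meet."""
    breaches = []
    if observed_recovery > max_time_to_recover:
        breaches.append("time to recover")
    if observed_data_loss > max_data_loss:
        breaches.append("maximum loss of data")
    return breaches


# Usage with made-up targets: recover within 4 hours, lose at most 15 minutes of data.
result = sla_breaches(
    max_time_to_recover=timedelta(hours=4),
    max_data_loss=timedelta(minutes=15),
    observed_recovery=timedelta(hours=5),
    observed_data_loss=timedelta(minutes=5),
)
```

Any term the function returns is one where the penalty clause would apply, which makes it a useful item to record in the dry-run report.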
What if I am a SaaS provider?
 Build regular recovery plan
 Unique concerns
− Security
− Compliance (HIPAA, SOX, …)
 Build achievable SLAs for your users
Implementing Your Plan
 First implementation should be a totally manual
process to ensure the steps work and allow for
documentation
 Document the process as you go
− Who are you logged in as?
− Exactly what you typed
− Where you were (console, remote, …)
− Can things be done in parallel or sequentially?
− Where are the logs and what to look for in the logs
Documentation
 All recovery documentation should be VERY
specific
 Create documents for normal maintenance
− Backups
− Database growth
− Modification of OS, Application, printers, …
 Create scenario based recovery plans
− Lose a disk (or disk pair)
− Fire
− Flood
Testing Your Plan
 Who does the test?
− Not the person who wrote it
− The backup person for the implementation
− Someone who is “always” there regardless of technical
ability
 How often to test?
− Material data change (10% increase is a good target)
− Any change in database configuration
− Do you have a second site or redundant hardware?
− Do you have enough disk capacity (space and
throughput)?
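The first two retest triggers above lend themselves to an automated reminder. Below is a rough sketch, assuming you record the database size at the time of the last recovery test; the function name and the use of gigabytes are illustrative choices, and the 10% default comes from the slide.

```python
def retest_due(size_at_last_test_gb, current_size_gb, config_changed,
               growth_threshold=0.10):
    """Return True if the recovery plan should be retested.

    A retest is due after any database configuration change, or once the
    data has grown by at least growth_threshold since the last test.
    """
    if size_at_last_test_gb <= 0:
        raise ValueError("size at last test must be positive")
    growth = (current_size_gb - size_at_last_test_gb) / size_at_last_test_gb
    return config_changed or growth >= growth_threshold
```

For example, a database that has grown from 100 GB to 115 GB since the last test is due for a retest even if its configuration is unchanged.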
How to test your plan
 Fail over to your backup system
 Fail back to your primary system
 Contingency planning for personnel, physical plant
and equipment (Lead time for resources)
Summary: Recovery Planning
 Get over it. You still need to back up.
 Back up your backup, not production, if possible
 Be inclusive when building your team
 Always back up what you have now, however little,
before starting to recover
 Create and maintain a comprehensive plan
− Include everything needed to use the application:
Hardware, applications, and data
 Create and maintain physical and online contact
lists
 Test your plan periodically (At least annually)
Still have questions?
Please feel free to contact me directly.
Adam Backman
White Star Software
(603)897-1010
adam@wss.com
Thank you for your time