Why Back Up Your Production Database? I Never Do.
Adam Backman
adam@wss.com
Partner, White Star Software

Backups Are for Sissies
No reason to back up
− Our stuff never fails
− It just takes up resources
− We already have redundancy
− I hate changing the tapes
− I’m tired
− I’m hungry
− I don’t feel good

Portions of an OpenEdge Database
Database table of contents (.db)
The data files (.d*)
Before image journals (.b*)
After image journals (.a*)
Modified buffers in memory

Other Important Stuff
Application
External data (GIS, photos, …)
User files
External systems (EDI, data warehouse, …)

Reliability Is Important
Loss of data is expensive
Many businesses now lack a paper trail
Redundancy does not equal reliability
− A rogue program writes bad data
− Redundancy then just gives you 2 copies of bad data

What Is High Availability?
Classic definition: 24x7 operation
− Examples: manufacturing, e-commerce, follow the sun
− Little or no downtime
− Maintenance is done in very specific windows
More common definition
− Traditional business hours (8a-6p), or 9-5 across a single country with 3 time zones
− Operational hours are critical
− Maintenance windows on a regular basis
Unconventional definition
− Is performance good enough to run the business?

Impact of the Backup Process on Production
Backing up production
− Pause while the before image journal is backed up
− Uses the I/O capacity of production
− Reduces the effectiveness of the buffer pool
Split mirror backup
− Using a quiet point keeps the pause to a minimum
− The pause is still non-zero
After image file backup
− Still needs a full backup to begin the process
− Backing up the after image files has very little impact
− Long recovery time

Cover All Sides
Everyone should be running after image journaling
You still need a removable backup periodically
− For wide-scale events (fire, flood, …)
− As the starting point for recovering from after image journals
Replication is becoming the new default
− OE Replication
− Log-based replication
− Hardware replication

OpenEdge Replication
OpenEdge Replication is the only replication method supported by Progress
OpenEdge Replication is the only method that allows you to use the target database(s) for reporting
OpenEdge Replication requires that you have after image journaling enabled
Do not attempt to implement OpenEdge Replication until you have a good AI management plan in place

OpenEdge Replication
Diagram: Production (source) with Source DB, Shared Memory and Replication Server, feeding Reporting (target) with Replication Agent, Target DB and Shared Memory

Log-Based Replication
Log-based replication has been used for years; OE Replication is a fairly new product
Log-based replication provides a vehicle for replication without the licensing costs of OE Replication
It is not real-time
The user must maintain the code for this type of replication, and there is no official support from the vendor

Hardware-Based Replication
Hardware-based replication is provided by the hardware vendors and is supported directly by them
This method is NOT supported by Progress
ALL write operations must be guaranteed across the source and target disk systems

Archiving
Who does your archiving? (Iron Mountain, a third party, someone’s house, …)
What do you keep?
− Two weeks of dailies
− 5 weeks of weeklies
− 1 year of monthlies
How to label your backups
− Who did the backup
− Command to restore
− Date and time

Archiving (continued)
Data archiving
− Archive/Delete?
− Archive/Save historical
− Archive/Save aggregates
After image file archiving
− Keep at least 2 backups' worth
− I recommend a week or more if possible

Building a Good Recovery Strategy
Know your business
− Components of the business: how people do business with you
− Components of the systems: tools (applications and physical)
Know your risks (fire, flood, hurricane, …)
Be inclusive
− Technical people (network, phones, facilities, …)
− Business people (the people who own the data)
Build an execution plan with contingencies

Creating a Plan
Goals (event-based goals)
− If we lose a disk (DB gone)
− If we have a fire (machine gone)
− If we have a natural disaster (facility gone)
Hardware
Software
Data
Other stuff

Creating a Plan – Goals
Acceptable downtime (generally cost-based)
− Everyone wants zero, but it is generally cost-prohibitive
Planned outages
− Hardware install and maintenance
− Software upgrade
− O/S upgrade or patch
Notifications (both before and during the outage)
− Who
− When
− What do they do?

Creating a Plan – Other Stuff
What makes your business run?
− Phones
− Faxes
− Business to business (EDI, XML feed, …)
Can people work from home?
Do you have/need another location?
Contact lists in case of a major catastrophe
− Kept up-to-date
− Kept online and printed in an accessible location

How About If I Am a SaaS User?
Who is your provider?
Verify their recovery plan
Do a dry run of at least one recovery scenario
Have specific service level agreements
− Time to recover
− Maximum loss of data
− Penalties for missing those targets

How About If I Am a SaaS Provider?
Build a regular recovery plan
Unique concerns
− Security
− Compliance (HIPAA, SOX, …)
Build achievable SLAs for your users

Implementing Your Plan
The first implementation should be a totally manual process, to ensure the steps work and to allow for documentation
Document the process as you go
− Who are you logged in as?
− Exactly what you typed
− Where you were (console, remote, …)
− Can things be done in parallel, or must they be sequential?
− Where are the logs, and what should you look for in them?

Documentation
All recovery documentation should be VERY specific
Create documents for normal maintenance
− Backups
− Database growth
− Modification of OS, application, printers, …
Create scenario-based recovery plans
− Lose a disk (or disk pair)
− Fire
− Flood

Testing Your Plan
Who does the test?
− Not the person who wrote it
− The backup person for the implementation
− Someone who is “always” there, regardless of technical ability
How often to test?
− After a material data change (a 10% increase is a good target)
− After any change in database configuration
− Do you have a second site or redundant hardware?
− Do you have enough disk capacity (space and throughput)?

How to Test Your Plan
Fail over to your backup system
Fail back to your primary system
Contingency planning for personnel, physical plant and equipment (lead time for resources)

Summary: Recovery Planning
Get over it. You still need to back up.
Back up your backup copy, not production, if possible
Be inclusive when building your team
Always back up what you have now, however little, before starting to recover
Create and maintain a comprehensive plan
− Include everything needed to use the application: hardware, applications, and data
Create and maintain physical and online contact lists
Test your plan periodically (at least annually)

Still have questions? Please feel free to contact me directly.
Adam Backman
White Star Software
(603)897-1010
adam@wss.com
Thank you for your time
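Appendix: the retention schedule on the Archiving slide (two weeks of dailies, 5 weeks of weeklies, 1 year of monthlies) can be written as a simple keep/expire rule. The Python sketch below is one way to do that. It makes several assumptions that do not come from the slides: one backup is taken per day, the Sunday backup counts as the "weekly", the backup taken on the 1st counts as the "monthly", and "1 year" means 365 days. The names `keep_backup` and `expired` are also illustrative only.

```python
from datetime import date, timedelta

# Retention windows taken from the Archiving slide.
DAILY_WINDOW = timedelta(weeks=2)     # two weeks of dailies
WEEKLY_WINDOW = timedelta(weeks=5)    # 5 weeks of weeklies
MONTHLY_WINDOW = timedelta(days=365)  # 1 year of monthlies (assumed 365 days)


def is_weekly(d: date) -> bool:
    """Assumption: the Sunday backup is the one kept as the weekly."""
    return d.weekday() == 6  # Monday == 0 ... Sunday == 6


def is_monthly(d: date) -> bool:
    """Assumption: the backup taken on the 1st is kept as the monthly."""
    return d.day == 1


def keep_backup(backup_day: date, today: date) -> bool:
    """Return True if the backup taken on backup_day should still be kept."""
    age = today - backup_day
    if age < timedelta(0):
        raise ValueError("backup date is in the future")
    if age <= DAILY_WINDOW:
        return True  # every backup is kept for two weeks
    if is_weekly(backup_day) and age <= WEEKLY_WINDOW:
        return True  # weeklies are kept for 5 weeks
    if is_monthly(backup_day) and age <= MONTHLY_WINDOW:
        return True  # monthlies are kept for a year
    return False


def expired(backup_days: list[date], today: date) -> list[date]:
    """List the backups that fall outside every retention window."""
    return [d for d in backup_days if not keep_backup(d, today)]


if __name__ == "__main__":
    today = date(2024, 6, 15)
    history = [today - timedelta(days=n) for n in range(400)]
    old = expired(history, today)
    print(f"{len(history) - len(old)} kept, {len(old)} eligible to expire")
```

A rule like this only decides which media can be recycled. It does not replace the labeling advice on the same slide (who did the backup, the command to restore, date and time). The after image files also need to be kept for at least as long as the oldest backup you would roll forward from.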