Technical Notes
PostgreSQL backups with NetWorker
Release number 1.0
302-001-174
REV 01
June 30, 2014

• Audience
• Requirements
• Terminology
• PostgreSQL backup methodologies
• PostgreSQL dump backup
• Configuring NetWorker for PostgreSQL dump backups
• Recovering backup issued by pg_dump command
• PostgreSQL WAL backup
• Configuring NetWorker for WAL backups
• Recovering WAL backup
• Known limitations
• Conclusion

Audience
This document is intended for use by NetWorker system administrators. Readers are expected to have DBA-level knowledge of PostgreSQL as well as NetWorker administration skills to successfully implement backups and recoveries.

Requirements
Be aware of the following requirements before attempting any backups:
• You should know the specific folder or file system in which to store the archive logs.
• PostgreSQL configuration files must have been updated with the relevant attributes for backup to succeed.
• The NetWorker client must be installed and running on the PostgreSQL host. No NetWorker module (for example, NMDA) needs to be installed on the PostgreSQL host.

Terminology
You should be familiar with the following terms and their definitions.
• PostgreSQL: an open source object-relational database management system (ORDBMS) with an emphasis on extensibility and compliance with standards.
• Write Ahead Log (WAL): a standard method for ensuring data integrity. The central concept of WAL is that changes to data files, where tables and indexes reside, must be written only after those changes have been logged; that is, after log records that describe the changes have been flushed to permanent storage. If we follow this procedure, we do not need to flush data pages to disk on every transaction commit, because we know that in the event of a crash we will be able to recover the database using the log: any change that has not been applied to the data pages can be redone from the log records. This is roll-forward recovery, also known as REDO.

PostgreSQL backup methodologies
You can perform PostgreSQL backups in two different ways. One way, as with many other databases, is to dump the database, schema, and content to a file at a certain point in time and back up that file. The other way is to interface the backup software with the database to take consistent online backups that allow Point-in-Time (PiT) recoveries. PostgreSQL supports both methods: its dump process can be combined with the backup software, and it also provides an online backup functionality based on Write Ahead Logging (WAL). WAL creates redo log files that enable incremental backups and PiT recoveries. Both the dump backup and the WAL backup methodologies are discussed in this document.
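The write-ahead rule and roll-forward (REDO) recovery described under Terminology can be illustrated with a toy model. This is a minimal Python sketch for intuition only: the ToyDatabase class and its methods are hypothetical and bear no relation to PostgreSQL's real WAL format or replay code. It shows that a committed change whose data page was never flushed is still recovered, because its log record was made durable first.

```python
"""Toy model of the write-ahead rule and roll-forward (REDO) recovery.

Illustration only: PostgreSQL's real WAL records, pages, and replay
logic are far more involved than this hypothetical class.
"""
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    wal: list = field(default_factory=list)     # durable log records
    pages: dict = field(default_factory=dict)   # data pages on disk
    buffer: dict = field(default_factory=dict)  # changed pages not yet flushed

    def commit(self, key, value):
        # Write-ahead rule: make the log record durable first...
        self.wal.append((key, value))
        # ...while the data page change may stay in memory for now.
        self.buffer[key] = value

    def checkpoint(self):
        # Flush all changed pages to disk.
        self.pages.update(self.buffer)
        self.buffer.clear()

    def crash(self):
        # Unflushed pages are lost; the log and flushed pages survive.
        self.buffer.clear()

    def redo(self):
        # Roll forward: reapply every logged change to the data pages.
        # For simplicity the whole log is replayed; PostgreSQL starts
        # replay from the last checkpoint instead.
        for key, value in self.wal:
            self.pages[key] = value


db = ToyDatabase()
db.commit("a", 1)
db.checkpoint()
db.commit("b", 2)   # committed, but its data page was never flushed
db.crash()
print(db.pages)     # {'a': 1}
db.redo()
print(db.pages)     # {'a': 1, 'b': 2}
```

Because each log record carries the full new value in this model, replaying a record twice is harmless, which is why the sketch can simply replay the entire log.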
Dump backups are easy and fast to set up, but they let you recover only to the time of the dump itself. WAL backups are a bit more complex to set up but provide full flexibility over recovery, because you can recover your database up to a specific second.

PostgreSQL dump backup
PostgreSQL dumps are text files containing all the SQL commands that recreate the database in the exact same state as it was at the time of the dump. These dumps provide the minimum security needed for a backup administrator to guarantee the backup and recovery of a database. PostgreSQL dump backup is the most commonly used backup method.

pg_dump command usage
The easiest way to back up a database is to dump it to a file and back up this dump with NetWorker. There are two different commands to create PostgreSQL dumps:
• pg_dump to dump a specific database
• pg_dumpall to dump all databases of a specific PostgreSQL server
Both create a full backup. Full documentation on the pg_dump command is available at http://www.postgresql.org/docs under Dump.

Command usage example
These commands can be used in a script, either directly on the client definition as pre commands (starting with NetWorker 8.1) or within a savepnpc script. An example of a pre command script is:

    "C:\Program Files\PostgresPlus\9.3AS\bin\pg_dump.exe" -p 5432 -d "testDB" -U postgres -w > "C:\Program Files\PostgresPlus\Backups\dump_testDB.dmp"

where:
• -p 5432 specifies the port number 5432 that is being used to connect to the PostgreSQL database. The port number is configurable.
• -d "testDB" specifies the name of the database to dump, testDB.
• -U postgres specifies that the username is postgres.
• -w specifies that the application does not prompt for a password. The password must therefore be available by other means, such as a .pgpass file; otherwise, the connection fails.
• > "C:\Program Files\PostgresPlus\Backups\dump_testDB.dmp" redirects the dump output to the file dump_testDB.dmp. For ease of configuration and recovery, the folder location should be the same as the one specified to store archive logs when you enable WAL.
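The configuration procedure recommends giving each dump a name that contains the date and time of the dump. A minimal Python sketch of how a pre command script could build the pg_dump invocation from the example above with such a name follows. The helper names dump_file_name and build_pg_dump_command, and the naming pattern dump_<database>_<YYYYmmdd>_<HHMMSS>.dmp, are hypothetical choices made for this sketch. It uses pg_dump's -f (output file) option in place of the shell redirection shown above.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def dump_file_name(dbname: str, when: datetime) -> str:
    """Hypothetical naming scheme: dump_<db>_<YYYYmmdd>_<HHMMSS>.dmp."""
    return f"dump_{dbname}_{when:%Y%m%d_%H%M%S}.dmp"


def build_pg_dump_command(pg_dump: str, dbname: str, backup_dir: str,
                          when: datetime, port: int = 5432,
                          user: str = "postgres") -> list:
    """Return the pg_dump argument list for one timestamped dump."""
    # PureWindowsPath builds a Windows path on any OS; use PurePosixPath on UNIX.
    target = PureWindowsPath(backup_dir) / dump_file_name(dbname, when)
    return [
        pg_dump,
        "-p", str(port),    # port used to connect to the server
        "-d", dbname,       # database to dump
        "-U", user,         # user name to connect as
        "-w",               # never prompt for a password
        "-f", str(target),  # write the dump to this file instead of stdout
    ]


cmd = build_pg_dump_command(
    r"C:\Program Files\PostgresPlus\9.3AS\bin\pg_dump.exe",
    "testDB",
    r"C:\Program Files\PostgresPlus\Backups",
    datetime(2014, 6, 30, 22, 15, 0),
)
print(cmd[-1])
# C:\Program Files\PostgresPlus\Backups\dump_testDB_20140630_221500.dmp
```

On the PostgreSQL host, the list could then be passed to subprocess.run(cmd, check=True). Passing an argument list rather than a single shell string avoids quoting problems with paths such as C:\Program Files, and check=True makes the script fail visibly if pg_dump fails.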
Configuring NetWorker for PostgreSQL dump backups
Configuring NetWorker for PostgreSQL dump backups involves using the pg_dump commands. These can be used in a script that is triggered either by the pre command attribute in the NetWorker client or by a savepnpc script.

Before you begin
• The pre command attribute in the NetWorker client is supported on NetWorker version 8.1 and later.
• Ensure that:
  - the pre command script name is prefixed with nsr
  - you save the pre command script in the NetWorker client /bin folder.

Below is an example of the NetWorker client pre command setting:
[Figure: NetWorker client pre command setting]

Procedure
1. Create the script. Ensure that the script contains the pg_dump or pg_dumpall command, with the appropriate arguments to create the dump in a specific location that can be shared with the archive logs generated by WAL. "pg_dump command usage" provides more information.
2. Generate a specific dump name that contains the date and time of the dump. Doing this prevents issues during recovery.
3. Create a specific post command, either in the NetWorker client attribute or in the savepnpc script, that removes the dump after it has been backed up. If dumps are stored in the same folder as the WAL archive logs, they are otherwise backed up again along with the logs, which increases the incremental backup time and could affect your SLAs.

Recovering backup issued by pg_dump command
This procedure describes PostgreSQL in-place recovery using the psql program.
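The createdb and psql commands used by this procedure can also be driven from a script. The following is a minimal Python sketch, assuming both programs are on the PATH or passed as full paths. The helper name restore_dump is hypothetical, and adding psql's -v ON_ERROR_STOP=1 setting is a choice made for this sketch rather than part of the procedure. The run parameter is injectable so that the command sequence can be checked without a running server.

```python
import subprocess


def restore_dump(dbname, dumpfile, createdb="createdb", psql="psql",
                 port=5432, user="postgres", run=subprocess.run):
    """Recreate dbname from template0, then replay dumpfile through psql."""
    conn = ["-p", str(port), "-U", user, "-w"]
    # Step 1: createdb -T template0 dbname
    run([createdb, *conn, "-T", "template0", dbname], check=True)
    # Step 2: psql dbname < dumpfile
    # ON_ERROR_STOP makes psql stop and exit with an error at the first
    # failing statement instead of continuing with the rest of the dump.
    with open(dumpfile, "rb") as sql:
        run([psql, *conn, "-v", "ON_ERROR_STOP=1", dbname],
            stdin=sql, check=True)
```

On a Windows PostgresPlus installation, createdb and psql would typically be given as full paths to createdb.exe and psql.exe in the installation's bin folder, in the same way as pg_dump.exe in the dump example.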
PostgreSQL dumps are recovered using the psql program, which reads back in and executes the SQL commands contained in the dump file, such as the following:

    --
    -- EnterpriseDB database dump
    --

    SET statement_timeout = 0;
    SET lock_timeout = 0;
    SET client_encoding = 'UTF8';
    SET standard_conforming_strings = on;
    SET check_function_bodies = false;
    SET client_min_messages = warning;

    --
    -- Name: Countries; Type: SCHEMA; Schema: -; Owner: postgres
    --

    CREATE SCHEMA "Countries";

    ALTER SCHEMA "Countries" OWNER TO postgres;

Executing these commands recreates the schemas and inserts the data, so that the database matches its state at the exact point in time when the dump was taken.

Procedure
1. Create the database dbname to be used by the psql dbname command. You can create it graphically or from the command line. For example:
    createdb -T template0 dbname
2. Run the following command to restore the dump:
    psql dbname < dumpfile
   where dumpfile is the file output by the pg_dump command.

PostgreSQL WAL backup
Integrating WAL with NetWorker enables incremental backups of PostgreSQL databases as well as Point-in-Time recoveries, by applying the logs to the recovered database up to a certain point. The high-level process for a WAL PostgreSQL backup is:
1. Configure the postgresql.conf file to copy the PostgreSQL logs to a specific location at defined intervals, so that they are available for backup.
2. Configure a base backup in NetWorker to back up those logs. A base backup is a specific backup created by using either the pg_basebackup command or low-level API calls. Only the pg_basebackup command is discussed in this document.
3. Carry out an initial full backup of the database.
4. Subsequent incremental backups are then enough to recover the database entirely. Best practice is to run one full backup a day to speed up incremental recoveries, because the transaction log can be large.
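To see how large the transaction log can become between full backups, the following back-of-the-envelope Python sketch estimates the archive volume produced by forced WAL segment switches alone. It assumes the 16 MB segment size and the 600-second archive_timeout used in the archive_timeout row of the configuration table later in this note, and that there is some database activity in every interval so that every timeout forces a switch. A busy system fills segments faster and archives more. The function name forced_switch_volume_mb is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def forced_switch_volume_mb(seconds, archive_timeout=600, segment_mb=16):
    """Archive volume (MB) produced by archive_timeout switches alone.

    Assumes activity in every interval, so each timeout forces a switch.
    Files closed early by a forced switch still have the full segment size.
    """
    switches = seconds // archive_timeout
    return switches * segment_mb


daily = forced_switch_volume_mb(SECONDS_PER_DAY)
weekly = forced_switch_volume_mb(7 * SECONDS_PER_DAY)
print(daily, weekly)  # 2304 16128
```

About 2.3 GB accumulates per day (144 files of 16 MB), so a weekly full backup schedule can leave roughly seven times that amount of log to restore and replay, compared with a daily full backup.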
Setting up PostgreSQL WAL backups
The WAL feature is configured directly in the postgresql.conf file located in the \data folder of your PostgreSQL installation.

Before you begin
• Create a folder or file system to save the archive log files.
• Create a script using the pg_archivecleanup command to delete the archive log files. "After you finish" provides more information.
• Create a specific user to run the PostgreSQL service, on both Windows and UNIX, and ensure that this user has the same privileges as the PostgreSQL admin user. Doing this eases recoveries.
• For Windows: Ensure that the PostgreSQL user is granted permissions on the /backup folder; otherwise, the archive logs are not written to it.
• For Linux/UNIX: Use the cp command, or another applicable Linux/UNIX command, to copy the files when archiving.

Procedure
1. Go to the postgresql.conf file located in the \data folder of your PostgreSQL installation.
2. Edit the postgresql.conf file attributes according to the following table to enable consistent backups:

Attribute | Value to set | Details
wal_level | archive | Writes the information required for WAL archiving to the WAL
fsync | on | Ensures that updates are physically written to disk
synchronous_commit | on | Ensures that transaction commit waits for WAL records to be written to disk before command success is indicated
wal_sync_method | fsync | Calls fsync() at each commit to ensure consistency
full_page_writes | on | Ensures that PostgreSQL writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint
archive_mode | on | Enables archiving of completed WAL files through archive_command
archive_command (Windows) | copy "%p" "D:\\backup\\%f" | Copies each completed WAL file (%p is its path, %f its file name) to the backup folder
archive_command (UNIX) | cp "%p" "/backup/%f" | Same as above; the archive command can call cp or any other suitable UNIX/Linux OS command
archive_timeout | Your RPO value in seconds, for example 600 | Specifies after how many seconds the current WAL file is switched, and thus copied to the /backup folder. This value must match your Recovery Point Objective (RPO). The PostgreSQL WAL segment (archive log) size is 16 MB by default, and files closed early by a forced switch still have the full size, so a switch every 600 seconds produces 144 files, about 2.3 GB, per day. "After you finish" provides additional information.

3. Save and close the file, then restart PostgreSQL: changes to wal_level and archive_mode take effect only after a server restart.

After you finish
• Archiving errors are logged to the pg_logs folder (on Windows, the /pg_logs folder). For example:
    2013-12-30 15:54:42 PST LOG: archive command failed with exit code 1
    2013-12-30 15:54:42 PST DETAIL: The failed archive command was: copy "pg_xlog\000000010000000000000001" "C:\Program Files\PostgresPlus\Backups\000000010000000000000001"
    Access is denied
  Here, "Access is denied" indicates that the PostgreSQL user does not have permissions on the backup folder.
• To delete the archive log files, use the pg_archivecleanup command, which removes all WAL files in archivelocation that logically precede restartwalfile. For example:
    pg_archivecleanup archivelocation restartwalfile
• Recommended best practices for deleting the archive log files:
  - Run the pg_archivecleanup command daily.
  - Delete the previous day's log files each time you run the pg_archivecleanup command.
  - You can use a script to launch the pg_archivecleanup command after the backup or at any other point in time.

pg_basebackup command usage
You use the native pg_basebackup command to create a base backup of the PostgreSQL databases. The base backup is mandatory for point-in-time recoveries because it informs the database of the backup and creates the checkpoint from which WAL replay starts. Full documentation on the pg_basebackup command is available at http://www.postgresql.org/docs under Backup.

Command usage example

    pg_basebackup.exe -p 5432 -U postgres -w -D D:\backups\unique_folder_name -X stream -F plain

where:
• -p 5432 specifies the port number 5432 that is being used to connect to the PostgreSQL database. The port number is configurable.
• -U postgres specifies that the username is postgres.
• -w specifies that the application does not prompt for a password.
• -D specifies the folder location where the base backup is to be stored. For ease of configuration and recovery, the folder location should be the same as the one specified to store archive logs when you enable WAL.
• -X stream streams the WAL generated while the base backup runs, so that no transactions are lost during the base backup.
• -F plain writes the backup as plain files, with the same layout as the data directory, for ease of recovery. The backup could instead be stored as a tar file (-F tar).

Configuring NetWorker for WAL backups
To carry out a complete PostgreSQL WAL backup, you must perform a base backup by using the pg_basebackup command, and you must back up the transaction logs. Both the base backup and the backup of the transaction logs can be performed as part of an incremental backup. The pg_basebackup command creates a copy of the files needed in a specific folder or tar file. In NetWorker, the backups need a specific pre command to create the base backup. "Configuring NetWorker for PostgreSQL dump backups" provides information on pre commands. The backup must include the backup folder location that you specify in the PostgreSQL configuration, as in the following procedure:

Procedure
1. In Client Properties, General tab:
   a. Ensure that the following options are selected: in Backup, Scheduled backup and Client direct; in Group, the PostgreSQL backup group (named Posgresql in this example); and Backup renamed directories.
   b. Set the following file paths in the Save set field:
      C:\Program Files\PostgresPlus\9.3AS
      C:\Program Files\PostgresPlus\Backups
   The image below provides an example of the save set files and other option settings:
   [Figure: save set files and other option settings in Client Properties]
2. In Group Properties, Advanced tab, set the backup interval. A special configuration might be needed in the Group Properties; for example, the backup interval could be set according to your SLAs or to any other plan you deem convenient.
   The image below provides an example of a backup interval set to every 30 minutes, with one full backup every Friday.
   [Figure: backup interval set to every 30 minutes, with one full backup every Friday]

Recovering WAL backup
This procedure describes PostgreSQL in-place recovery for a WAL backup. It also applies to directed recoveries and disaster recoveries.

Procedure
1. Stop PostgreSQL, go to the /postgresql/data folder, and delete its contents.
2. Go to the backup directory where the archive logs are stored and delete its contents.
3. Launch NetWorker.
4. Recover the backup directory where the archive logs are stored, at the closest point in time to which you want to recover. The images below provide an example of the backup directory recoveries:
   [Figure: backup directory recoveries in NetWorker]
5. Log in to the PostgreSQL server and move the data contained in the base backup folder, where the transaction logs are stored, to the data directory.
6. Create a recovery.conf file in the /postgresql/data folder.
7. Open the recovery.conf file in the /postgresql/data folder and edit the following attributes:

Attribute | Value to set | Details
restore_command | 'copy "C:\\Program Files\\PostgresPlus\\Backups\\%f" "%p"' | Copies each archived WAL file (%f) from the PostgreSQL backups folder to the location where the server expects it (%p) during recovery
recovery_target_time | '[timestamp]' | Ensures that recovery roll-forward stops at a specific time. For example: '2013-12-31 10:30:00 PST'
recovery_target_inclusive | true | Ensures that recovery roll-forward stops just after the target time, so that the given target time is included

8. Start PostgreSQL and check the PostgreSQL logs to verify the recovery.

After you finish
After the PostgreSQL database is recovered, the recovery.conf file is automatically renamed to recovery.done. Restarts do not affect the file. Depending on your backup policy, you may need to reinstall PostgreSQL.

Known limitations
You should be aware of the following limitations for WAL backups. The PostgreSQL documentation
provides additional information on known limitations.
• Operations on hash indexes are not WAL-logged in PostgreSQL releases earlier than 10, so replay does not update these indexes. This means that any new inserts are ignored by the index, updated rows apparently disappear, and deleted rows still retain pointers. In other words, if you modify a table with a hash index on it, you will get incorrect query results on a standby server. After recovery completes, it is recommended that you manually REINDEX each such index.
• If a CREATE DATABASE command is executed while a base backup is being taken, and then the template database that the CREATE DATABASE copied is modified while the base backup is still in progress, it is possible that recovery will cause those modifications to be propagated into the created database as well. This is of course undesirable. To avoid this risk, it is best not to modify any template databases while taking a base backup.
• CREATE TABLESPACE commands are WAL-logged with the literal absolute path, and will therefore be replayed as tablespace creations with the same absolute path. This might be undesirable if the log is being replayed on a different machine. It can be dangerous even if the log is being replayed on the same machine but into a new data directory: the replay will still overwrite the contents of the original tablespace. To avoid potential "gotchas" of this sort, the best practice is to take a new base backup after creating or dropping tablespaces.

Conclusion
PostgreSQL WAL functionality, when configured with NetWorker, enables you to run successful backups and recoveries, so that you can meet company RPOs and RTOs with a minimum amount of work. This integration gives you the ability to recover data up to a specific second or to just before specific actions, so that you can restore a complete system as it was before a crash or corruption.
Copyright © 2014 EMC Corporation. All rights reserved. Published in USA. Published June 30, 2014

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

EMC², EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.

For the most up-to-date regulatory document for your product line, go to EMC Online Support (https://support.emc.com).