PostgreSQL backups with NetWorker 1.0 Technical Notes

Technical Notes
PostgreSQL backups with NetWorker
Release number 1.0
302-001-174
REV 01
June 30, 2014
• Audience
• Requirements
• Terminology
• PostgreSQL backup methodologies
• PostgreSQL dump backup
• Configuring NetWorker for PostgreSQL dump backups
• Recovering backup issued by pg_dump command
• PostgreSQL WAL backup
• Configuring NetWorker for WAL backups
• Recovering WAL backup
• Known limitations
• Conclusion
Audience
The document is intended for use by system administrators of NetWorker. Readers of this
document are expected to have DBA-level knowledge of PostgreSQL as well as NetWorker
Administration skills to successfully implement backups and recoveries.
Requirements
Be aware of the following requirements before attempting any backups:
• You must know the specific folder or file system in which to store the archive logs.
• The PostgreSQL configuration files must have been updated with the relevant attributes
  for backups to succeed.
• The NetWorker client must be installed and running on the PostgreSQL host. No
  NetWorker Module (for example, NMDA) needs to be installed on the PostgreSQL host.
Terminology
You should be familiar with the following terms and their definitions.
• PostgreSQL: an open source object-relational database management system
  (ORDBMS) with an emphasis on extensibility and compliance to standards.
• Write Ahead Log (WAL): a standard method for ensuring data integrity.
  The WAL central concept is that changes to data files, where tables and indexes
  reside, must be written only after those changes have been logged; that is, after log
  records which describe the changes have been flushed to permanent storage. If we
  follow this procedure, we do not need to flush data pages to disk on every transaction
  commit because we know that in the event of a crash we will be able to recover the
  database using the log: Any change that has not been applied to the data pages can
  be redone from the log records. This is roll-forward recovery, also known as REDO.
PostgreSQL backup methodologies
You can perform PostgreSQL backups in two different ways. One way, as with many other
databases, is to dump the database, schema, and content to a file at a certain point in time
and back up that file. Another way is to interface the backup software with the database
in order to have consistent online backups that allow Point in Time (PiT) recoveries.
PostgreSQL supports both backup methods: its dump process can be interfaced with the
backup software, and it provides an online backup functionality called Write Ahead
Logging (WAL). WAL creates redo log files that enable incremental backups and PiT
recoveries.
Both the dump backup and the WAL backup methodologies are discussed in this
document. Dump backups are easy and fast to set up, but you can recover only to the
time of the dump itself. WAL backups are a bit more complex to set up but provide full
flexibility over recovery, because you can recover your database up to a specific second.
PostgreSQL dump backup
PostgreSQL dumps are text files containing all the SQL commands that recreate the
database in the exact same state as it was at the time of the dump.
Those dumps provide the minimum security needed for a backup administrator to
guarantee the backup and recovery of a database. PostgreSQL dump backup is the most
commonly used backup method.
pg_dump command usage
The easiest way to back up a database is to dump it to a file and back up this dump with
NetWorker. There are two different commands to create PostgreSQL dumps:
• pg_dump to dump a specific database
• pg_dumpall to dump all databases of a specific PostgreSQL server
Both commands create a full backup.
Full documentation on the pg_dump command is available at
http://www.postgresql.org/docs under Dump.
Command usage example
These commands can be used with a script, either directly on the client definition as pre
commands starting with NetWorker 8.1, or within a savepnpc script. An example of a pre
command script is:
"C:\Program Files\PostgresPlus\9.3AS\bin\pg_dump.exe" -p 5432 -d "testDB" -U postgres -w > "C:\Program Files\PostgresPlus\Backups\dump_testDB.dmp"
where:
• -p 5432 specifies the port number 5432 that is being used to connect to the
  PostgreSQL database. The port number is configurable.
• -d "testDB" specifies the name of the database to connect to and dump.
• -U postgres specifies that the username is postgres.
• -w specifies that the application does not prompt for a password.
Configuring NetWorker for PostgreSQL dump backups
Configuring NetWorker for PostgreSQL dump backups involves using the pg_dump or
pg_dumpall command in a script that is triggered either by the pre command attribute in
the NetWorker client or by a savepnpc script.
Before you begin
• The pre command attribute in the NetWorker client is supported on NetWorker
  version 8.1 and later.
• Ensure that:
  • the pre command script is prefixed with nsr
  • you save the pre command script in the NetWorker client /bin folder.
Below is an example of the NetWorker client pre command setting:
Procedure
1. Create the script.
Ensure that the script contains the pg_dump or pg_dumpall command, with the
appropriate arguments to create the dump in a specific location, which can be shared
with the archive logs generated by WAL. The "pg_dump command usage" section
provides more information.
2. Generate a specific dump name that contains the date and time of the dump. Doing
this prevents issues during recovery.
3. Create a specific post command to remove the dump created from the NetWorker
client attribute or from the savepnpc script.
This is because if dumps are stored in the same folder as WAL archive logs, the
dumps are backed up as well. This increases the incremental backup time and
potentially could affect your SLAs.
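Steps 1 and 2 of this procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a NetWorker-supplied script: the function name build_pg_dump_command, the dump_<database>_<timestamp>.dmp naming pattern, and the use of the pg_dump -f option in place of the > redirection shown earlier are choices made for this example only.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def build_pg_dump_command(db_name, backup_dir, when,
                          pg_dump="pg_dump", port=5432, user="postgres"):
    """Return (argv, dump_path) for a pg_dump run with a timestamped file name.

    The name pattern dump_<db>_<YYYYmmdd_HHMMSS>.dmp is hypothetical; any
    pattern that embeds the date and time of the dump serves the same purpose.
    """
    stamp = when.strftime("%Y%m%d_%H%M%S")
    # PureWindowsPath keeps the Windows-style path of the earlier example
    # regardless of the platform on which this sketch is run.
    dump_path = PureWindowsPath(backup_dir) / f"dump_{db_name}_{stamp}.dmp"
    argv = [
        pg_dump,
        "-p", str(port),         # port used to connect to PostgreSQL
        "-d", db_name,           # database to dump
        "-U", user,              # user name
        "-w",                    # never prompt for a password
        "-f", str(dump_path),    # write the dump to this file
    ]
    return argv, dump_path
```

A pre command script could pass the returned list to subprocess.run(argv, check=True), and the matching post command could delete dump_path after the backup completes.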
Recovering backup issued by pg_dump command
This procedure describes PostgreSQL in-place recovery using the psql program.
PostgreSQL dumps are recovered using the psql program, reading back in and executing
the SQL commands contained in the dump file, such as the following:
--
-- EnterpriseDB database dump
--
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
--
-- Name: Countries; Type: SCHEMA; Schema: -; Owner: postgres
--
CREATE SCHEMA "Countries";
ALTER SCHEMA "Countries" OWNER TO postgres;
This recreates all the databases and schemas and inserts data in the databases to match
the exact point in time of when the dump was taken.
Procedure
1. Create the database dbname to be used by the psql dbname command.
You can create it graphically or by command line.
For example: createdb -T template0 dbname
2. Run the following command to restore the dump: psql dbname < dumpfile
where dumpfile is the file output by the pg_dump command.
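The two steps above can be scripted. The following Python sketch only assembles the command lines; the function name build_restore_commands is hypothetical, and the psql options -f (read the dump from a file instead of standard input) and -v ON_ERROR_STOP=1 (stop at the first SQL error) are additions for this example that the procedure above does not require.

```python
def build_restore_commands(db_name, dump_file):
    """Return the two command lines that restore a pg_dump text dump."""
    return [
        # Step 1: create an empty database from template0.
        ["createdb", "-T", "template0", db_name],
        # Step 2: replay the SQL commands contained in the dump file.
        ["psql", "-d", db_name, "-v", "ON_ERROR_STOP=1", "-f", dump_file],
    ]
```

Each list can be run in order with subprocess.run(cmd, check=True), so that a failure in createdb prevents psql from running against a missing database.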
PostgreSQL WAL backup
Integrating WAL with NetWorker enables incremental backups of PostgreSQL databases
as well as Point-in-Time recoveries, by applying the logs to the recovered database up to
a certain point.
The high-level process for a WAL PostgreSQL backup is:
1. Configure the postgresql.conf file to copy the PostgreSQL logs to a specific location,
at a specific point in time, so that they are available for backup.
2. Configure a base backup in NetWorker to back up those logs. A base backup is a
specific backup created by using either the pg_basebackup command or low-level
API calls. Only the pg_basebackup command is discussed in this document.
3. Carry out an initial full backup of the database.
4. After that, incremental backups are enough to recover the database entirely. Best
practice is to run one full backup a day in order to speed up incremental recoveries,
because the transaction log can be large.
Setting up PostgreSQL WAL backups
The WAL feature is configured directly in the postgresql.conf file located in the \data
folder of your PostgreSQL installation.
Before you begin
• Create a folder or file system to save the archive log files.
• Create a script that uses the pg_archivecleanup command to delete the archive log
  files. "After you finish" provides more information.
• On both Windows and UNIX, create a specific user to run the PostgreSQL service, and
  ensure that this user has the same privileges as the PostgreSQL admin. Doing this
  eases recoveries.
• For Windows: Ensure that the PostgreSQL user is granted permissions on the /backup
  folder; otherwise, the logs are not populated.
• For Linux/UNIX: Use the cp command, or another applicable Linux/UNIX command, to
  copy the files when you archive.
Procedure
1. Go to the postgresql.conf file located in the \data folder of your PostgreSQL
installation.
2. Edit the postgresql.conf file attributes according to the following table to enable
consistent backups:
Attribute           Value to set                  Details
------------------  ----------------------------  ------------------------------------------------
wal_level           archive                       Writes the information required for WAL
                                                  archiving to the Write-Ahead Log.
fsync               on                            Ensures that updates are physically written to
                                                  disk.
synchronous_commit  on                            Ensures that a transaction commit waits for its
                                                  WAL records to be written to disk before
                                                  success is indicated.
wal_sync_method     fsync                         Calls fsync() at each commit to ensure
                                                  consistency.
full_page_writes    on                            Ensures that PostgreSQL writes the entire
                                                  content of each disk page to WAL during the
                                                  first modification of that page after a
                                                  checkpoint.
archive_mode        on                            Enables archiving of completed WAL segments by
                                                  using archive_command.
archive_command     Windows:                      Secures the system archive logs. On UNIX, the
                    copy "%p" "D:\\backup\\%f"    archive command can call cp or any other
                    UNIX:                         UNIX/Linux command that copies the files.
                    cp "%p" "/backup/%f"
archive_timeout     Your RPO value in seconds,    Specifies when the archive log files are
                    for example, 600.             switched and thus copied to the /backup folder.
                                                  This value must match your Recovery Point
                                                  Objective (RPO).
                                                  The PostgreSQL archive log (WAL segment) size
                                                  is 16 MB by default. With archive_timeout set
                                                  to 600 seconds, timeout switches alone generate
                                                  up to 144 log files, or about 2.3 GB, each day.
                                                  "After you finish" provides additional
                                                  information.
3. Save and close the file.
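The daily archive volume quoted for archive_timeout can be checked with a short calculation. The sketch below assumes that every log file is switched by the timeout rather than by filling up, so that each switch archives one full 16 MB file; a busy server that fills log files faster than the timeout produces more. The function name daily_archive_volume_mb is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_archive_volume_mb(archive_timeout_s, segment_mb=16):
    """Archive volume per day, in MB, when every switch is forced by the timeout."""
    segments_per_day = SECONDS_PER_DAY // archive_timeout_s
    return segments_per_day * segment_mb


# 600-second timeout: 144 log files of 16 MB each, that is 2304 MB (about 2.3 GB).
volume_600 = daily_archive_volume_mb(600)
```

Halving the timeout to 300 seconds doubles the figure, which is worth weighing against the RPO when sizing the archive log folder and the incremental backup window.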
After you finish
• If archiving fails, errors can be seen in the pg_logs folder. For example:
  2013-12-30 15:54:42 PST LOG: archive command failed with exit code 1
  2013-12-30 15:54:42 PST DETAIL: The failed archive command was: copy "pg_xlog\000000010000000000000001" "C:\Program Files\PostgresPlus\Backups\000000010000000000000001"
  Access is denied
• On Windows, errors are logged to the /pg_logs folder.
• To delete the archive log files, use the pg_archivecleanup command. For
  example: pg_archivecleanup archivelocation restartwalfile
• Recommended best practices for deleting the archive log files:
  • Run the pg_archivecleanup command daily.
  • Delete the previous day's log files each time you run the pg_archivecleanup
    command.
  • You can use a script to launch the pg_archivecleanup command after the backup or
    at any point in time.
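The daily cleanup can be sketched in Python as follows. pg_archivecleanup removes every log file that precedes the restartwalfile argument, so the script only has to choose that file. The sketch assumes that all log files belong to the same timeline (so that sorting the 24-character file names gives their order) and that the file modification time is a good enough indication of age. The function names and the one-day retention are hypothetical. Log files that the most recent base backup still needs must be kept, so adjust the retention to your backup schedule.

```python
import re
from datetime import datetime, timedelta

# WAL segment file names are 24 uppercase hexadecimal characters.
WAL_SEGMENT = re.compile(r"[0-9A-F]{24}")


def pick_restart_wal_file(files, now, keep=timedelta(days=1)):
    """Return the oldest WAL segment modified within the retention period.

    files is an iterable of (file name, modification datetime) pairs.
    Returns None when no segment is recent enough; in that case nothing
    should be deleted.
    """
    cutoff = now - keep
    recent = sorted(
        name for name, modified in files
        if WAL_SEGMENT.fullmatch(name) and modified >= cutoff
    )
    return recent[0] if recent else None


def build_cleanup_command(archive_location, restart_wal_file):
    """Command line that deletes every log file older than restart_wal_file."""
    return ["pg_archivecleanup", archive_location, restart_wal_file]
```

The file list can be built from os.scandir() on the archive log folder, and the resulting command can be run from the NetWorker post command once the backup has completed.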
pg_basebackup command usage
You use the native command pg_basebackup to create a base backup of the
PostgreSQL databases. The base backup is mandatory for point-in-time recoveries
because it informs the database of the backup and creates a checkpoint that transaction
logging refers to.
Full documentation on the pg_basebackup command is available at
http://www.postgresql.org/docs under Backup.
Command usage example
pg_basebackup.exe -p 5432 -U postgres -w -D D:\backups\unique_folder_name -X stream -F plain
where:
• -p 5432 specifies the port number 5432 that is being used to connect to the
  PostgreSQL database. The port number is configurable.
• -U postgres specifies that the username is postgres.
• -w specifies that the application does not prompt for a password.
• -D specifies the folder location where the base backup is to be stored. For ease of
  configuration and recovery, the folder location should be the same as the one
  specified to store archive logs when you enable WAL.
• -X stream specifies that the transaction log is streamed while the base backup is
  taken, so that the base backup includes the transactions that occur during the
  backup and no data is lost.
• -F plain specifies that the backup is written as plain files, with the same layout as
  the data directory, for ease of recovery. It could instead be stored as a tarball.
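The unique_folder_name in the example can be generated from the date and time of the backup. The following Python sketch builds the same command line with such a folder name; the function name build_pg_basebackup_command and the base_<timestamp> naming pattern are assumptions made for this illustration.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def build_pg_basebackup_command(backup_root, when,
                                pg_basebackup="pg_basebackup",
                                port=5432, user="postgres"):
    """Return (argv, target_dir) for a base backup in a timestamped folder."""
    # Hypothetical naming pattern: base_<YYYYmmdd_HHMMSS>.
    target_dir = PureWindowsPath(backup_root) / when.strftime("base_%Y%m%d_%H%M%S")
    argv = [
        pg_basebackup,
        "-p", str(port),        # port used to connect to PostgreSQL
        "-U", user,             # user name
        "-w",                   # never prompt for a password
        "-D", str(target_dir),  # folder that receives the base backup
        "-X", "stream",         # stream the transaction log during the backup
        "-F", "plain",          # plain files rather than a tarball
    ]
    return argv, target_dir
```

Because pg_basebackup expects the target folder to be empty or absent, a new folder name per run avoids a failure when an earlier base backup is still on disk.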
Configuring NetWorker for WAL backups
To carry out a complete PostgreSQL WAL backup, you must perform a base backup by
using the pg_basebackup command and you must back up the transaction logs.
Both the base backup, created with the pg_basebackup command, and the backup of the
transaction logs can be performed as part of an incremental backup. The
pg_basebackup command creates a copy of the files needed in a specific folder or
tarball.
On NetWorker, backups need a specific pre command to create the base backup.
"Configuring NetWorker for PostgreSQL dump backups" provides information on
pre commands. The backup must include the backup folder location that you specify in
the PostgreSQL configuration, as in the procedure below:
Procedure
1. In Client Properties, General tab:
a. Ensure that the following options are selected: In Backup: Scheduled backup and
Client direct; In Group: Posgresql; and Backup renamed directories
b. Set the following file paths in the Save set field:
C:\Program Files\PostgresPlus\9.3AS and C:\Program Files\PostgresPlus\Backups
The image below provides an example of the save set files and other option
settings:
2. In Group Properties, Advanced tab, set the backup interval.
In the Group Properties setting a special configuration might be needed. For example,
the backup interval could be set according to your SLAs or to whatever plan you deem
convenient.
The image below provides an example of a backup interval set to every 30 minutes, with
one full backup every Friday.
Recovering WAL backup
This procedure describes PostgreSQL in-place recovery for a WAL backup. This is also
applicable to directed recoveries and disaster recoveries.
Procedure
1. Go to the /postgresql/data folder and delete the contents.
2. Go to the backup directory where the archive logs are stored and delete contents.
3. Launch NetWorker.
4. Run the recovery of the backup directory where the archive logs are stored, at the
closest Point in Time to which you want to recover.
The images below provide an example of the backup directory recoveries:
5. Log in to the PostgreSQL server and move the data contained in the base backup folder
where the transaction logs are stored to the data directory.
6. Create a recovery.conf file under the /postgresql/data folder.
7. Open the recovery.conf file in the /postgresql/data folder and edit the following
attributes:
Attribute = Value to set
    Details

restore_command = 'copy "C:\\Program Files\\PostgresPlus\\Backups\\%f" %p'
    Copies the archived log files from the PostgreSQL backups folder back to the
    server during recovery.
recovery_target_time = '[timestamp]'
    Ensures that recovery roll-forward stops at a specific time. For example:
    '2013-12-31 10:30:00 PST'
recovery_target_inclusive = true
    Ensures that recovery roll-forward stops just after the given target time, so that
    transactions committed at exactly that time are included.
8. Restart PostgreSQL and verify the backup logs.
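The file edited in step 7 can also be generated by a script. The following Python sketch renders the three attributes for either Windows or UNIX; the function name render_recovery_conf is hypothetical, and quoting %p is an assumption taken from the Windows example in the PostgreSQL documentation rather than from the table above.

```python
def render_recovery_conf(archive_dir, target_time, inclusive=True, windows=True):
    """Return the text of a recovery configuration file for a PiT recovery."""
    if windows:
        # Backslashes are doubled inside the quoted configuration value.
        escaped = archive_dir.replace("\\", "\\\\")
        restore = f'copy "{escaped}\\\\%f" "%p"'
    else:
        restore = f'cp "{archive_dir}/%f" "%p"'
    lines = [
        f"restore_command = '{restore}'",
        f"recovery_target_time = '{target_time}'",
        f"recovery_target_inclusive = {'true' if inclusive else 'false'}",
    ]
    return "\n".join(lines) + "\n"
```

Write the returned text to the recovery configuration file in the data folder before you restart PostgreSQL.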
After you finish
After the PostgreSQL database is recovered, the recovery.conf file is automatically
renamed to recovery.done. Restarts do not affect the file.
Depending on your backup policy, you may need to reinstall PostgreSQL.
Known limitations
You should be aware of the following limitations for WAL backups. The "Continuous
Archiving and Point-in-Time Recovery (PITR)" chapter of the PostgreSQL documentation
provides additional information on known limitations.
• Operations on hash indexes are not presently WAL-logged, so replay will not update
  these indexes. This will mean that any new inserts will be ignored by the index,
  updated rows will apparently disappear and deleted rows will still retain pointers. In
  other words, if you modify a table with a hash index on it then you will get incorrect
  query results on a standby server.
  When recovery completes, it is recommended that you manually REINDEX each such
  index.
• If a CREATE DATABASE command is executed while a base backup is being taken,
  and then the template database that the CREATE DATABASE copied is modified
  while the base backup is still in progress, it is possible that recovery will cause those
  modifications to be propagated into the created database as well. This is of course
  undesirable. To avoid this risk, it is best not to modify any template databases while
  taking a base backup.
• CREATE TABLESPACE commands are WAL-logged with the literal absolute path,
  and will therefore be replayed as tablespace creations with the same absolute path.
  This might be undesirable if the log is being replayed on a different machine. It can
  be dangerous even if the log is being replayed on the same machine, but into a new
  data directory: The replay will still overwrite the contents of the original tablespace.
  To avoid potential "gotchas" of this sort, the best practice is to take a new base
  backup after creating or dropping tablespaces.
Conclusion
PostgreSQL WAL functionality, when configured with NetWorker, enables you to run
successful backups and recoveries, so that you can meet company RPOs and RTOs with a
minimum amount of work.
This interaction gives you the ability to recover data up to a specific second or to a point
before specific actions, so that you can restore a complete system as it was before a
crash or corruption.
Copyright © 2014 EMC Corporation. All rights reserved. Published in USA.
Published June 30, 2014
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without
notice.
The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with
respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a
particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software
license.
EMC², EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries.
All other trademarks used herein are the property of their respective owners.
For the most up-to-date regulatory document for your product line, go to EMC Online Support (https://support.emc.com).