Provisioning Storage for Oracle Database
with ZFS and NetApp
Mike Carew
Oracle University
UK
Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
Content
In this presentation:
• Background: Some interesting things about disks.
• Key Features of ZFS-managed storage
• Key Features of NetApp-managed storage
• Provisioning storage for Oracle DB using ZFS
• Provisioning storage for Oracle DB using NetApp
A few interesting things about disks …
Two categories cover all disks (including fc, sas, sata, pata, scsi, ssd):
• Failed
• … and Failing
Always a disappointment in terms of:
• size
• speed
• reliability
… we know this already; this is why we have RAID systems.
Trends in Storage
As disk capacity increases:
• MTBF decreases
• the disk becomes more of an I/O bottleneck (capacity grows faster than throughput)
Uncorrectable bit error rates have stayed roughly constant:
• 1 in 10^14 bits (~12 TB) for desktop-class drives
• 1 in 10^15 bits (~120 TB) for enterprise-class drives (allegedly)
• In practice, expect a bad sector every 8–20 TB, desktop and enterprise alike
Some facts: Measurements at CERN
How valuable is my data? How secure is my data on disk?
They wrote a simple application to write and verify a 1 GB file:
• Write 1 MB, sleep 1 second, and so on until 1 GB has been written
• Read 1 MB, verify, sleep 1 second, and so on
It ran continuously on servers with traditional HW RAID.
• After 3 weeks they found 152 instances of silent data corruption
• Previously they thought "everything was fine"
• Traditional HW RAID only detected "noisy" data errors
End-to-end verification is needed to catch silent data corruption.
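As an illustration only (this is not CERN's actual tool, and the file names are hypothetical), a probe of this kind can be sketched in a few lines of shell:

    # Write a known 1 MB pattern 1024 times (1 GB total), pausing between writes,
    # then read each block back and compare it against the pattern.
    dd if=/dev/urandom of=pattern.1m bs=1024k count=1
    i=0
    while [ $i -lt 1024 ]; do
        dd if=pattern.1m of=testfile bs=1024k seek=$i count=1 conv=notrunc 2>/dev/null
        sleep 1
        i=`expr $i + 1`
    done
    i=0
    while [ $i -lt 1024 ]; do
        dd if=testfile bs=1024k skip=$i count=1 2>/dev/null | cmp -s - pattern.1m \
            || echo "silent corruption in block $i"
        sleep 1
        i=`expr $i + 1`
    done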
ZFS Key Features
• Pooled storage – defines the physical aspects of capacity and redundancy
• Transactional object store – the file system is always consistent
  – The application still has to deal with file content consistency, but ZFS manages the file system consistency.
• End-to-end data integrity – recognition of, and recovery from, bit rot, lost writes, misdirected writes and phantom writes
• Snapshot backup through copy-on-write – lightweight, fast, low cost
• Unparalleled scalability
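A minimal sketch of these ideas from the command line (pool, device and dataset names are hypothetical):

    # Pooled storage: capacity and redundancy are defined once, at the pool.
    zpool create dbpool mirror c1t0d0 c1t1d0
    # File systems draw space from the pool; no slicing or pre-sizing needed.
    zfs create dbpool/oradata
    # Copy-on-write snapshot: effectively instantaneous, initially consumes no space.
    zfs snapshot dbpool/oradata@before_upgrade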
ZFS Data Authentication
• The checksum of each block is stored with its parent data structure
• Isolating the checksum from the data means the data itself can be validated
• Safeguards against: bit rot, phantom writes, misdirected reads and writes, DMA parity errors, driver bugs, accidental overwrite
ZFS Self Healing
With redundant storage, ZFS detects a bad block via the checksum stored in its parent structure, reconstructs the data from an alternative copy, and rewrites the defective block to heal the data.
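Detection and repair can also be driven on demand with a scrub; for example (pool name hypothetical):

    # Walk every block in the pool, verify checksums, and repair from redundancy.
    zpool scrub dbpool
    # Report checksum errors found and any devices being repaired.
    zpool status -v dbpool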
Virtual Devices and Dynamic Striping
ZFS dynamically stripes data across all of the top-level virtual
devices.
[Diagram: data stripes 1–3 written dynamically across the pool's top-level virtual devices – two mirrored devices and a set of stand-alone 36 GB devices]
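For example (hypothetical devices), adding a second top-level mirror to the pool means new writes are spread across both vdevs, which can be observed per vdev:

    # Add a second top-level mirror vdev; ZFS now stripes writes across both mirrors.
    zpool add dbpool mirror c1t2d0 c1t3d0
    # Watch I/O being distributed across the top-level virtual devices.
    zpool iostat -v dbpool 5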
RAID-Z Dynamic Stripe Width
• All writes are full-stripe writes
• The stripe width is adjusted to the size of the I/O
• Each logical block is its own stripe
• Stripes are written across the vdev's disks
• This avoids the read-modify-write penalty
• Record size / block size / stripe size needs consideration for database use
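For reference, a double-parity RAID-Z vdev is created as below (hypothetical devices); the database-oriented caveats in the later slides still apply:

    # RAID-Z2 vdev: each logical block becomes its own variable-width stripe.
    zpool create ztank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0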
NetApp Key Features
• Write Anywhere File Layout – WAFL
  – Coalesces otherwise random writes into contiguous sequential I/O
• Snapshots by reference – lightweight, low cost, fast
• Write optimized (and correspondingly not read optimized)
  – NVRAM write cache for write performance and commitment
• Mature data management applications: data backup, DR replication, application integration (e.g. SnapManager for Oracle) – all based around the snapshot
• ONTAP 8 Cluster-Mode has scale-out capabilities which offer very high scalability
NetApp Data Authentication
• Block checksums are co-located with the data
• Not as extensive as the ZFS measures
• Safeguards against:
  – Bit rot
• Other measures (RAID scrubbing) are needed to safeguard against:
  – Phantom writes
  – Misdirected writes
NetApp Disk Aggregation Technology
• Dual-parity RAID groups – RAID-DP
• The RAID group is the protection boundary
• RAID-DP protects against dual concurrent disk failure within the RAID group – given the bit error rates above, DP is the only practical choice
• Everything operates on a fixed-size file system block of 4 KB (the 4 KB WAFL block size is not configurable or negotiable)
[Diagram: a RAID-DP group – data disks plus one parity and one double-parity disk]
NetApp Disk Aggregation Concept
The Aggregate
Aggregates are constructed from one or more RAID groups.
[Diagram: aggregate aggr0 spanning RAID groups rg0, rg1 and rg2, each made up of data (D), parity (P) and double-parity (DP) disks]
NetApp Disk Space Allocation: Flexible Volumes
Aggregates
The NetApp aggregate is equivalent to the ZFS pool: it represents the usable capacity of the disks.
Flexible Volumes
The means of using space. They contain NAS file systems or SAN LUNs (you choose) and can be resized easily.
Snapshot Reserve
Snapshot backup space is managed through the snapshot reserve.
[Diagram: WAFL overhead (~10%) and a small, adjustable aggregate snapshot reserve (~5%) are taken off the raw space; the remaining WAFL aggregate space is divided into FlexVols, each split between active data (e.g. 80%) and its own .snapshot reserve (e.g. 20%)]
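As a rough sketch using classic 7-mode commands (names and sizes are hypothetical, and exact syntax varies by Data ONTAP release, so check the documentation for your version):

    # Build an aggregate of 32 spindles in RAID-DP groups of 16 disks.
    aggr create aggr1 -t raid_dp -r 16 32
    # Carve a flexible volume for data files out of the aggregate; it can be resized later.
    vol create oradata aggr1 500g
    # Reserve 20% of the volume for snapshot copies.
    snap reserve oradata 20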
Disk and Data Protection
Data ONTAP protects against media flaws, misdirected writes
and lost writes in several ways:
• RAID-4 and RAID-DP protect against disk failure
• Media scrubbing – periodic checking of block data against checksums
  – Bit rot
• RAID scrubbing – periodic checking of parity in two dimensions
  – Lost writes
  – Misdirected writes
  – Phantom writes
Provisioning Storage for Oracle with ZFS
Array Considerations
• ZFS is designed to work with JBOD and disk-level caches
  – NVRAM write-cache based arrays should ignore ZFS cache-flush requests (see the sketch below)
• The general ZFS rules on using whole disks apply
• If a hardware RAID array is used to present LUNs, the number of LUNs should equal the number of physical disks
• Avoid dynamic space provisioning arrays when allocating LUNs for ZFS: ZFS uses the whole LUN space quickly, negating the benefits of thin and dynamic provisioning
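If the array cannot be configured to ignore cache-flush (SYNCHRONIZE CACHE) requests itself, Solaris exposes a system tunable; treat the following as a hedged sketch and verify it against the tuning guidance for your platform before use:

    # Append to /etc/system and reboot; only safe when every device sits behind
    # battery- or flash-backed NVRAM cache.
    echo "set zfs:zfs_nocacheflush = 1" >> /etc/system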
Provisioning Storage for Oracle with ZFS
Pool Considerations
• If the array technology already gives enough redundancy, use it; duplicating the protection may work against you
  – ZFS may offer higher protection and recovery, but your array may give enough
• RAID-Z is not recommended where IOPS performance is important; use mirrors instead
Provisioning Storage for Oracle with ZFS
ZFS Record Size Considerations
• Match the ZFS record size to the Oracle database block size. The general rule is to set recordsize = db_block_size for the file system that contains the Oracle data files. This sets the maximum ZFS block size equal to DB_BLOCK_SIZE, with resulting efficiencies in read performance and buffer cache occupancy.
• When db_block_size is less than the OS memory page size (8 KB on SPARC systems, 4 KB on x86 systems), set the record size to the page size.
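For example (hypothetical dataset name), for a database created with an 8 KB db_block_size:

    # Data-file file system: match the ZFS record size to the 8 KB database block size.
    zfs set recordsize=8k dbpool/oradata
    zfs get recordsize dbpool/oradata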
Provisioning Storage for Oracle with ZFS
ZFS Record Size Considerations contd.
• Modifying the record size is not retrospective
  – Files must be copied after a record size change for the change to take effect
• Performance may be optimized with different block sizes for different DB components
  – Set appropriate record sizes on the file systems that contain the respective files using different block sizes
Provisioning Storage for Oracle with ZFS
Improving Writing and Caching Performance
• The logbias ZFS property: Latency or Throughput
  – Redo – latency
  – Data – throughput
  – Unless storage throughput is saturated; then set redo logbias to throughput as well, avoiding the double I/O of writing first to the ZIL and then to the file system, and improving overall performance as a consequence
• The primarycache ZFS property controls what is cached in main memory (the primary ARC – Adaptive Replacement Cache)
Provisioning Storage for Oracle with ZFS
Use secondarycache (L2ARC)
• Available since Solaris 10 10/09
• Stores a cached copy of data for fast access
• SSD devices recommended
• Use the secondarycache ZFS property to determine which file systems will use the secondary cache and what the cache contents should be
• For read-latency-sensitive workloads
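For example (hypothetical SSD device), adding an L2ARC device and steering a file system's use of it:

    # Add an SSD as a level-2 ARC (cache) device to the pool.
    zpool add dbpool cache c3t0d0
    # Decide what this file system may place in the secondary cache (all | none | metadata).
    zfs set secondarycache=all dbpool/oradata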
Provisioning Storage for Oracle with ZFS
Separation of Data from Redo logs
• Consider physically separating data files from redo logs by placing them in separate pools
  – Reduces conflict between sometimes opposite storage needs
  – Large data-file storage requires an emphasis on throughput
  – Small redo-log storage requires an emphasis on latency
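Putting the last few slides together, a hedged sketch of such a two-pool layout (all names and devices are hypothetical):

    # Throughput-oriented pool for data files ...
    zpool create datapool mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0
    zfs create -o recordsize=8k -o logbias=throughput datapool/oradata
    # ... and a small, latency-oriented pool for redo logs.
    zpool create redopool mirror c4t0d0 c4t1d0
    zfs create -o logbias=latency redopool/redo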
Provisioning Storage for Oracle with NetApp
Write Performance
• Write performance is primarily achieved with NVRAM
  – Remember: NetApp is write-optimized storage
  – However, the physical disks must be able to keep up, otherwise we lose the benefit of NVRAM and fall back from memory performance to disk performance (see the note below)
  – Aggregated write throughput is achieved with a single large aggregate for all volumes of all types: data, redo, control files
Provisioning Storage for Oracle with NetApp
Read Performance
• Read performance is achieved through a large aggregate with as many disks as possible/necessary
  – Many small disks are better than a few large disks
• ONTAP 7.x is a 32-bit system and suffers a limit on aggregate size (16 TB)
  – Large databases may need to span aggregates
  – Use ONTAP 8.x (64-bit aggregates)
Provisioning Storage for Oracle with NetApp
SAN or NAS?
• SAN and NAS are both supported
  – SAN: FC, iSCSI, FCoE
  – NAS: NFS
• SAN implies some need for RAID management
  – ZFS-managed LUNs provisioned from NetApp?
    – Suggest not doing this; ZFS is suited to JBOD
    – Or, if you must, focus on one or the other; do not try to use all the features of both – it is unnecessarily complicated
  – Oracle ASM-managed LUNs are a good solution, using ASM external redundancy: it is not necessary to mirror storage that is already highly redundant
• NFS on NetApp is a perfect solution: thin-provisioned file system space, good performance and easy management (example mount below)
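As an illustration of the NFS route, the options below are in the style historically recommended for Oracle data files on Solaris over NetApp NFS; treat them as a starting point and confirm against current Oracle and NetApp guidance for your versions (host, export and mount-point names are hypothetical):

    # Solaris NFSv3 mount for an Oracle data-file area on a NetApp filer.
    mount -F nfs -o rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,vers=3,forcedirectio \
        filer1:/vol/oradata /u02/oradata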
Provisioning Storage for Oracle using ZFS
Backup & Recovery Integration
• Home-grown, self-engineered solutions
  – Can use snapshots and clones
  – Replication of snapshots to a secondary DR/backup location
• The zfs send operation
  – Fast and efficient
  – Recommended: use granular objects for ease of management, i.e. snapshot and send several small objects rather than one single massive file system, which may never succeed
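A minimal sketch of the snapshot-and-replicate approach (hosts, pools and snapshot names are hypothetical):

    # Recursive snapshot of the database file systems.
    zfs snapshot -r dbpool/ora@sun_2300
    # Full send of the initial snapshot to the DR host ...
    zfs send -R dbpool/ora@sun_2300 | ssh drhost zfs receive -d drpool
    # ... then cheap incremental sends between snapshots from that point on.
    zfs snapshot -r dbpool/ora@mon_2300
    zfs send -R -i @sun_2300 dbpool/ora@mon_2300 | ssh drhost zfs receive -d drpool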
Provisioning Storage for Oracle using NetApp
Backup & Recovery Integration
• Mature data management tools
  – Application-layer integration with backup snapshots
• SnapManager for Oracle (SMO) offers hot backup integration for OS image-copy backups. SMO:
  – Offers some integration with RMAN (cataloging of snapshot image copies)
  – Supports DB cloning
  – Supports snapshot management of ASM disk groups built upon NAS files or SAN devices
• Storage-layer replication with mature tools
  – SnapMirror (async/sync/semi-sync)
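A very rough 7-mode sketch of the SnapMirror path (filer and volume names are hypothetical; syntax differs on clustered ONTAP):

    # On the destination filer: create and restrict the mirror target volume,
    # then pull the initial baseline transfer from the source.
    vol create oradata_mir aggr1 500g
    vol restrict oradata_mir
    snapmirror initialize -S srcfiler:oradata oradata_mir
    # Check replication state and lag.
    snapmirror status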
Summary
• ZFS is a very powerful file system with unsurpassed scalability and many very interesting features.
• Deploying Oracle on ZFS requires detailed knowledge of the demands Oracle places on the storage system, and of how to configure ZFS to meet them.
• NetApp is not as fully featured, but is a more mature environment.
• NetApp's aggregation approach is simpler, although there are some severe size limits if you are restricted to modern large disks on ONTAP 7.3.
Thank You
I hope you've found the subject of interest.
Thank you for listening.
Any questions?