Provisioning Storage for Oracle Database with ZFS and NetApp
Mike Carew
Oracle University UK
Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

Content
In this presentation:
• Background: Some interesting things about disks
• Key features of ZFS managed storage
• Key features of NetApp managed storage
• Provisioning storage for Oracle DB using ZFS
• Provisioning storage for Oracle DB using NetApp

A few interesting things about disks …
• Two categories for all disks (including FC, SAS, SATA, PATA, SCSI, SSD): Failed … and Failing
• Always a disappointment in:
  – size
  – speed
  – reliability
• … we know this already; this is why we have RAID systems.

Trends in Storage
• As disk capacity increases:
  – MTBF decreases
  – the disk bottleneck increases
• Uncorrectable bit error rates have stayed roughly constant:
  – 1 in 10^14 bits (~12 TB) for desktop-class drives
  – 1 in 10^15 bits (~120 TB) for enterprise-class drives (allegedly)
  – a bad sector every 8-20 TB in practice (desktop and enterprise)

Some facts: Measurements at CERN
• How valuable is my data? How secure is my data on disk?
• They wrote a simple application to write and verify a 1 GB file:
  – Write 1 MB, sleep 1 second, and so on until 1 GB has been written
  – Read 1 MB, verify, sleep 1 second, and so on
• Ran continuously on servers with traditional HW RAID
• After 3 weeks, found 152 instances of silent data corruption
  – Previously thought “everything was fine”
• Traditional HW RAID only detected “noisy” data errors
• Need end-to-end verification to catch silent data corruption
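The write/verify loop described on the CERN slide can be sketched roughly as follows. This is not the CERN program: the function names, the SHA-256-derived fill pattern and the parameters are all assumptions made for illustration. Note also that reading a file back straight after writing it may be served from the OS page cache rather than from disk, so a real test would need to defeat caching.

```python
import hashlib
import os
import tempfile
import time
from typing import List


def pattern_chunk(index: int, chunk_bytes: int) -> bytes:
    """Return a deterministic chunk derived from its index (illustrative pattern)."""
    seed = hashlib.sha256(index.to_bytes(8, "big")).digest()
    repeats = chunk_bytes // len(seed) + 1
    return (seed * repeats)[:chunk_bytes]


def write_file(path: str, chunks: int, chunk_bytes: int, pause_s: float) -> None:
    """Write the pattern one chunk at a time, sleeping between chunks."""
    with open(path, "wb") as f:
        for i in range(chunks):
            f.write(pattern_chunk(i, chunk_bytes))
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the chunk to storage
            time.sleep(pause_s)


def verify_file(path: str, chunks: int, chunk_bytes: int, pause_s: float) -> List[int]:
    """Read the file back chunk by chunk; return the indexes of chunks that differ."""
    bad = []
    with open(path, "rb") as f:
        for i in range(chunks):
            if f.read(chunk_bytes) != pattern_chunk(i, chunk_bytes):
                bad.append(i)
            time.sleep(pause_s)
    return bad
```

With the slide's values this would be called with chunks=1024, chunk_bytes=1024 * 1024 and pause_s=1.0, repeated continuously; any index returned by verify_file marks a chunk whose content changed without the storage stack reporting an error.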
ZFS Key Features
• Pooled storage: defines the physical aspects of capacity and redundancy
• Transactional object store: the file system is always consistent
  – The application still has to deal with file content consistency, but ZFS manages file system consistency.
• End-to-end data integrity authentication
  – Recognition of, and recovery from: bit rot, lost writes, misdirected writes, phantom writes
• Snapshot backup through copy-on-write
  – Lightweight, fast, low cost
• Unparalleled scalability

ZFS Data Authentication
• The checksum of the data is stored with the parent data structure
• This isolates the checksum from the data, so the checksum can validate the data
• Safeguards against:
  – Bit rot
  – Phantom writes
  – Misdirected reads and writes
  – DMA parity errors
  – Driver bugs
  – Accidental overwrite

ZFS Self Healing
• With redundant storage, ZFS detects the bad block using the checksum stored in the parent structure, then reconstructs it from an alternative copy and rewrites the defective block to heal the data.

Virtual Devices and Dynamic Striping
• ZFS dynamically stripes data across all of the top-level virtual devices.
[Diagram: data stripes 1 to 3 spread across stand-alone 36 GB devices, and data stripes 1 and 2 spread across two mirrored devices, each built from 36 GB disks.]

RAID-Z Dynamic Stripe Width
• All writes are full-stripe writes
• Adjusted to the size of the I/O
• Each logical block is its own stripe
• Stripes are written to vdevs
• Avoids read-modify-write cycles
• Record size/block size/stripe size needs consideration for database use.
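The idea from the Data Authentication and Self Healing slides, a checksum kept in the parent and used to pick a good mirror copy and repair a bad one, can be illustrated with a toy model. This is only a sketch and not ZFS code: the MirroredBlock class is hypothetical, and SHA-256 stands in for whichever checksum the pool is configured to use.

```python
import hashlib
from typing import List


def checksum(data: bytes) -> bytes:
    # SHA-256 is a stand-in checksum chosen for this sketch.
    return hashlib.sha256(data).digest()


class MirroredBlock:
    """Toy model of one logical block stored on several mirror sides.

    The checksum is held by the parent (this object), not next to the data,
    so a corrupted copy cannot vouch for itself.
    """

    def __init__(self, data: bytes, copies: int = 2) -> None:
        self.parent_checksum = checksum(data)
        self.copies: List[bytearray] = [bytearray(data) for _ in range(copies)]
        self.repairs = 0

    def read(self) -> bytes:
        """Return the first copy that matches the parent checksum.

        Copies that failed verification before a good one was found are
        rewritten from the good copy (the "self-healing" step).
        """
        bad_sides = []
        for side, copy in enumerate(self.copies):
            if checksum(bytes(copy)) == self.parent_checksum:
                for bad in bad_sides:
                    self.copies[bad] = bytearray(copy)
                    self.repairs += 1
                return bytes(copy)
            bad_sides.append(side)
        raise IOError("no copy matches the parent checksum")
```

Flipping a bit in one side and then calling read() returns the original data and leaves both sides identical again; if every side is damaged the read fails loudly instead of returning bad data.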
NetApp Key Features
• Write Anywhere File Layout (WAFL)
  – Coalesces otherwise random writes into contiguous sequential I/O
  – Snapshots by reference: lightweight, low cost, fast
  – Write optimized (and correspondingly not read optimized)
• NVRAM write cache for write performance and commitment
• Mature data management applications: data backup, DR replication, application integration (e.g. SnapManager for Oracle), all based around the snapshot
• ONTAP 8 Cluster-Mode has scale-out capabilities which offer very high scalability options.

NetApp Data Authentication
• Block checksums are co-located with the data
• Not as extensive as the ZFS measures
• Safeguards against:
  – Bit rot
• Other measures (RAID scrubbing) are needed to safeguard against:
  – Phantom writes
  – Misdirected writes

NetApp Disk Aggregation Technology
• Dual-parity RAID groups: RAID-DP
• The RAID group is the protection boundary.
• RAID-DP protects against dual concurrent disk failure within the RAID group.
• DP is the only practical choice!
• Everything operates against a fixed-size file system block of 4 KB (the 4 KB WAFL block size is not configurable or negotiable)
[Diagram: a RAID group made up of data disks plus one parity disk and one double-parity disk.]

NetApp Disk Aggregation Concept
The Aggregate
• Aggregates are constructed from one or more RAID groups
[Diagram: aggregate aggr0 built from RAID groups rg0, rg1 and rg2, each containing data (D), parity (P) and double-parity (DP) disks.]

NetApp Disk Space Allocation: Flexible Volumes
• Aggregates
  – The NetApp aggregate is equivalent to the ZFS pool. It represents the usable capacity of the disks.
• Flexible Volumes
  – Are the means of using space. They contain NAS file systems or SAN LUNs (you choose) and can be resized easily.
• Snapshot Reserve
  – Management of snapshot backup space is through the snapshot reserve.
[Diagram: aggregate space breakdown. 10% WAFL overhead leaves 90% as WAFL aggregate space; of that, 95% is FlexVol space and 5% (adjustable) is the aggregate snapshot reserve; within each FlexVol (FlexVol1 … FlexVol#n), 80% is data space and 20% is the .snapshot reserve.]

Disk and Data Protection
Data ONTAP protects against media flaws, misdirected writes and lost writes in several ways:
• RAID-4 and RAID-DP protect against disk failure
• Media scrubbing: periodic checking of block data against checksums
  – Bit rot
• RAID scrubbing: periodic checking that parity in two dimensions is good
  – Lost writes
  – Misdirected writes
  – Phantom writes

Provisioning Storage for Oracle with ZFS
Array Considerations
• ZFS is designed to work with JBOD and disk-level caches
• Arrays with an NVRAM-based write cache should ignore ZFS cache flush requests
• General ZFS rules apply regarding the use of whole disks
• If using a HW RAID storage array to present LUNs, then the number of LUNs should equal the number of physical disks
• Avoid dynamic space provisioning arrays for allocating LUNs for ZFS. ZFS uses the whole LUN space quickly, negating the benefits of thin and dynamic provisioning.

Provisioning Storage for Oracle with ZFS
Pool Considerations
• If the array technology gives enough redundancy, then use it. Duplicating the protection may work against you.
• ZFS may offer higher protection and recovery, but your array may give enough
• RAID-Z is not recommended where IOPS performance is important; use mirrors instead.

Provisioning Storage for Oracle with ZFS
ZFS Record Size Considerations
• Match the ZFS record size to the Oracle database block size. The general rule is to set recordsize = db_block_size for the file system that contains the Oracle data files.
• This sets the maximum ZFS block size equal to DB_BLOCK_SIZE.
• The result is more efficient read performance and buffer cache occupancy.
• When the db_block_size is less than the OS memory page size (8 KB on SPARC systems and 4 KB on x86 systems), set the record size to the page size.

Provisioning Storage for Oracle with ZFS
ZFS Record Size Considerations (contd.)
• Modifying the record size is not retroactive
  – Files must be copied after a record size change for the change to take effect.
• Performance may be optimized with different block sizes for different DB components
  – Set appropriate record sizes for the file systems that contain the respective files using different block sizes.

Provisioning Storage for Oracle with ZFS
Improving Writing and Caching Performance
• logbias ZFS property: latency or throughput
  – Redo: latency
  – Data: throughput
  – Unless storage throughput is saturated; then set the redo logbias to throughput. This avoids the double I/O of writing first to the ZIL and subsequently to the file system, and overall performance improves as a consequence.
• primarycache ZFS property: controls what is cached in main memory (the primary ARC, Adaptive Replacement Cache)

Provisioning Storage for Oracle with ZFS
Use secondarycache (L2ARC)
• Available since Solaris 10 10/09
• Stores a cached copy of data for fast access
• SSD devices recommended
• Use the secondarycache ZFS property to determine which file systems will use the secondary cache and what the cache contents should be.
• For read-latency-sensitive workloads

Provisioning Storage for Oracle with ZFS
Separation of Data from Redo Logs
• Consider physical separation of data files from redo logs by placement in separate pools.
• Reduces conflict between sometimes opposing storage needs
  – Large storage for data files requires emphasis on throughput
  – Small storage for redo logs requires emphasis on latency

Provisioning Storage for Oracle with NetApp
Write Performance
• Write performance is primarily achieved with NVRAM
  – Remember, NetApp is write-optimized storage
• However, the physical disks must be able to keep up, otherwise we lose the benefits of NVRAM and fall back from memory performance to disk performance.
• Aggregated write throughput is achieved with a single large aggregate for all volumes of all types: data, redo, control files.

Provisioning Storage for Oracle with NetApp
Read Performance
• Read performance is achieved through a large aggregate with as many disks as possible/necessary
  – Many small disks are better than a few large disks
• ONTAP 7.x is a 32-bit system and suffers limits on aggregate size (16 TB)
  – Large databases may need to span aggregates
  – Use ONTAP 8.x (64-bit aggregates)

Provisioning Storage for Oracle with NetApp
SAN or NAS?
• SAN and NAS are both supported
  – SAN: FC, iSCSI, FCoE
  – NAS: NFS
• SAN implies some need for RAID management
  – ZFS-managed LUNs provisioned from NetApp? Suggest not to do this; ZFS is suited to JBOD.
  – If you must, then focus on one or the other; do not try to use all the features of both. It is unnecessarily complicated.
  – Oracle ASM-managed LUNs are a good solution, using ASM external redundancy. It is not necessary to mirror when the storage is already highly redundant.
• NFS on NetApp is a perfect solution: thin-provisioned file system space, good performance, and easy management
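As a rough sizing aid, the percentages from the Flexible Volumes space diagram (10% WAFL overhead, 5% aggregate snapshot reserve, 20% volume snapshot reserve) and the 16 TB aggregate limit quoted for ONTAP 7.x can be combined in a few lines of arithmetic. This is a sketch under assumptions: the function names are hypothetical, the reserves are adjustable on a real system, and treating the 16 TB limit as applying before the overhead and reserves are deducted is a simplification made here, not something the slides state.

```python
import math


def usable_fraction(wafl_overhead: float = 0.10,
                    aggr_snap_reserve: float = 0.05,
                    vol_snap_reserve: float = 0.20) -> float:
    """Fraction of aggregate capacity left for data inside a FlexVol."""
    return (1 - wafl_overhead) * (1 - aggr_snap_reserve) * (1 - vol_snap_reserve)


def aggregates_needed(db_tb: float, aggr_limit_tb: float = 16.0) -> int:
    """Number of size-limited aggregates needed to hold db_tb of data.

    Assumption: the limit is applied to capacity before overhead and reserves.
    """
    usable_per_aggregate_tb = aggr_limit_tb * usable_fraction()
    return math.ceil(db_tb / usable_per_aggregate_tb)
```

With the default percentages, about 68% of aggregate capacity ends up as FlexVol data space (0.90 × 0.95 × 0.80 = 0.684), which is one reason large databases on 32-bit aggregates may have to span several of them.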
Provisioning Storage for Oracle using ZFS
Backup & Recovery Integration
• Home-grown, self-engineered solutions
• Can use snapshots and clones
• Replication of snapshots to a secondary DR/backup location
  – zfs send operation
  – Fast and efficient
• Recommend using granular objects for ease of management, i.e. snap and send several small objects rather than a single massive file system, which may never succeed.

Provisioning Storage for Oracle using NetApp
Backup & Recovery Integration
• Mature data management tools
• Application-layer integration with backup snapshots
  – SnapManager for Oracle (SMO) offers hot backup integration for OS image copy backup.
  – SMO offers some integration with RMAN (snapshot image copy cataloging)
  – Supports DB cloning
  – Supports snapshot management of ASM disk groups built upon NAS files or SAN devices.
• Storage-layer replication with mature tools
  – SnapMirror (async/sync/semi-sync)

Summary
• ZFS is a very powerful file system with unsurpassed scalability and many very interesting features.
• Deploying Oracle on ZFS requires detailed knowledge of the demands Oracle places on the storage system, and of how to configure ZFS to meet them.
• NetApp is not so fully featured, but is a more mature environment.
• It has a simpler aggregation approach, although there are some severe size limits if restricted to modern large disks on ONTAP 7.3.

Thank You
I hope you’ve found the subject of interest. Thank you for listening.
Any questions?
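Backup sketch: the ZFS record size and logbias rules from the provisioning slides can be collected into a small helper that only builds the zfs set command lines and does not run them. The helper names and the dataset names used in the usage note (dbpool/data, redopool/redo) are hypothetical; the page sizes are the ones quoted in the slides.

```python
from typing import List

# OS memory page sizes quoted in the slides, in bytes.
PAGE_SIZE_BYTES = {"sparc": 8192, "x86": 4096}


def recordsize_for(db_block_size: int, arch: str) -> int:
    """recordsize = db_block_size, but never smaller than the OS page size."""
    return max(db_block_size, PAGE_SIZE_BYTES[arch])


def data_file_commands(dataset: str, db_block_size: int, arch: str) -> List[List[str]]:
    """Commands for a file system that holds Oracle data files."""
    size = recordsize_for(db_block_size, arch)
    return [
        ["zfs", "set", f"recordsize={size}", dataset],
        ["zfs", "set", "logbias=throughput", dataset],
    ]


def redo_commands(dataset: str, storage_saturated: bool = False) -> List[List[str]]:
    """Redo logs favour latency unless storage throughput is saturated."""
    bias = "throughput" if storage_saturated else "latency"
    return [["zfs", "set", f"logbias={bias}", dataset]]
```

For example, data_file_commands("dbpool/data", 8192, "x86") yields the two zfs set lines for an 8 KB block database. Because a record size change is not retroactive, existing data files would still need to be copied after the property is set.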