RAID ARRAYS
Redundant Array of Inexpensive Disks

WHAT IS A RAID ARRAY?
RAID is an acronym for Redundant Array of Independent Drives (or Disks), also known as Redundant Array of Inexpensive Drives (or Disks).
The various types of RAID are data storage schemes that divide and/or replicate data among multiple hard drives.

WHY USE RAID?
Improved reliability
Improved performance
Fault tolerance
Improved availability
Higher data security

KEY TERMS
Mirroring - copying data to more than one disk
Striping - splitting data across more than one disk
Parity - redundancy information computed from the data, which protects it without requiring a full set of duplicate drives
Duplexing - an extension of mirroring that goes one step further by also duplicating the hardware that controls the two hard drives (or sets of hard drives)

RAID - REDUNDANT ARRAY OF INDEPENDENT DISKS
[Figure: a host connected to a RAID array through a RAID controller]

RAID COMPONENTS
Physical array
RAID controller
Logical array
[Figure: the host connects to the RAID controller, which manages the physical array of disks and presents it to the host as a logical array]

DATA ORGANIZATION: STRIPS AND STRIPES
[Figure: Stripes 1 to 3 running across the disks; each stripe is made up of one strip on each disk]

RAID LEVELS
0 - Striped array with no fault tolerance
1 - Disk mirroring
3 - Parallel access array with a dedicated parity disk
4 - Striped array with independent disks and a dedicated parity disk
5 - Striped array with independent disks and distributed parity
6 - Striped array with independent disks and dual distributed parity
Combinations of levels (e.g., 1+0, 0+1)
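The strip and stripe layout can be sketched as a small Python function that finds where a logical block lives. This is a hedged illustration, assuming a RAID 0 array whose strips each hold a fixed number of blocks (the hypothetical parameter strip_blocks) and are filled round-robin across the member disks; locate_block and Location are names invented for this example, and real controllers may lay data out differently.

```python
from typing import NamedTuple


class Location(NamedTuple):
    disk: int    # member disk holding the block (0-based)
    stripe: int  # stripe (row of strips) the block falls in
    offset: int  # block offset inside that disk's strip


def locate_block(logical_block: int, num_disks: int, strip_blocks: int) -> Location:
    """Map a logical block to its place in a hypothetical round-robin RAID 0 layout."""
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least two disks")
    if strip_blocks < 1 or logical_block < 0:
        raise ValueError("strip size must be positive and block number non-negative")
    # Which strip (counting across the whole array) holds the block, and where in it.
    strip_number, offset = divmod(logical_block, strip_blocks)
    # Strips fill one stripe at a time, one strip per disk.
    stripe, disk = divmod(strip_number, num_disks)
    return Location(disk=disk, stripe=stripe, offset=offset)


if __name__ == "__main__":
    # Four disks, one block per strip.
    for block in range(12):
        print(block, locate_block(block, num_disks=4, strip_blocks=1))
```

With four disks and one block per strip, blocks 0, 4 and 8 land on disk 0, blocks 1, 5 and 9 on disk 1, and so on; each stripe is one row of strips across all the disks.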
RAID 0
A striped set of at least two disks without parity
The data is broken down into blocks, and each block is written to a separate disk drive
Best performance is achieved when data is striped across multiple controllers with only one drive per controller

RAID 0 – STRIPED ARRAY WITH NO FAULT TOLERANCE
[Figure: the host sends blocks 0 to 4 to the RAID controller, which writes them across the disks in turn]

ADVANTAGES OF RAID 0
I/O performance is greatly improved by spreading the I/O load across many channels and drives
No parity calculation overhead is involved
Very simple design
Easy to implement

DISADVANTAGES OF RAID 0
Not a "true" RAID because it is NOT fault-tolerant
The failure of just one drive results in the loss of all data in the array
Should never be used in mission-critical environments

RAID 1 – DISK MIRRORING
[Figure: the host writes blocks 0 and 1 through the RAID controller, and each block is written to both disks of the mirrored pair]

RAID 1 ADVANTAGES
High data availability and high I/O rate (small block size)
Improved read performance: up to twice the read transaction rate of a single disk, with the same write transaction rate as a single disk
100% data redundancy means no rebuild calculation is needed after a disk failure, just a copy of the surviving disk to the replacement
Simplest RAID storage subsystem design, easy to maintain

RAID 1 DISADVANTAGES
Expensive due to the extra capacity required to duplicate data
Overhead cost equals 100%, while usable storage capacity is 50%
May not support hot swap of a failed disk when implemented in software; use a hardware implementation

RAID 0+1 – STRIPING AND MIRRORING
[Figure: the controller stripes blocks 0 to 3 across one set of disks and mirrors the whole stripe set to a second set]

RAID 1+0 – MIRRORING AND STRIPING
[Figure: the controller stripes blocks 0 to 3 across mirrored pairs of disks]

RAID 0+1 VS. RAID 1+0
Benefits are identical under normal operations
Rebuild operations are very different
RAID 0+1 is the poorer solution and is less common
RAID 1+0 uses mirrored pairs: if a disk fails, only that one disk is rebuilt
RAID 0+1: if a single drive fails, the entire stripe set it belongs to is faulted

RAID REDUNDANCY: PARITY
[Figure: the RAID controller stripes blocks 0 to 11 across four data disks (blocks 0, 4, 8 on the first; 1, 5, 9 on the second; 2, 6, 10 on the third; 3, 7, 11 on the fourth), and a parity disk stores one parity strip per stripe (for blocks 0-3, 4-7 and 8-11)]

PARITY CALCULATION
The data disks hold 5, 3, 4 and 2; the parity disk holds 5 + 3 + 4 + 2 = 14
The middle drive (holding 4) fails: 5 + 3 + ? + 2 = 14, so ? = 14 - 5 - 3 - 2 = 4
Addition keeps the example simple; RAID controllers actually compute parity with bitwise XOR, and XORing the surviving strips with the parity strip regenerates the missing strip in the same way

RAID 3 – PARALLEL TRANSFER WITH DEDICATED PARITY DISK
[Figure: the controller splits blocks 0 to 3 across the data disks for parallel transfer, generates parity P0123, and writes it to the dedicated parity disk]

RAID 4 – STRIPING WITH DEDICATED PARITY DISK
[Figure: blocks 0 to 7 are striped across the data disks as independent strips; the controller generates parity P0123 and P4567 and stores both on the dedicated parity disk]

RAID 5 – INDEPENDENT DISKS WITH DISTRIBUTED PARITY
[Figure: blocks 0 to 7 are striped across the disks; the controller generates parity P0123 and P4567 and stores them on different disks instead of on one dedicated parity disk]

RAID 6 – DUAL PARITY RAID
Two disk failures in a RAID set lead to data unavailability and data loss in single-parity schemes, such as RAID 3, 4 and 5
More drives per array and larger drive capacities raise the probability of two disks failing in the same RAID set
RAID 6 protects against two disk failures by maintaining two parities
EVENODD and Reed-Solomon are two commonly used algorithms for calculating parity in RAID 6
Horizontal parity: the same as RAID 5 parity
Diagonal parity: calculated from diagonal sets of data blocks taken across the RAID set members

RAID IMPLEMENTATIONS
Hardware (usually a specialized disk controller card)
Software

Software RAID:
Generally runs as part of the operating system
Volume management is performed by the server
Offers more flexibility in hardware choice, which can reduce cost
Performance is dependent on CPU load
Has limited functionality

Hardware RAID:
Controls all drives attached to it
Performs all RAID-related functions, including volume management
Array(s) appear to the host operating system as a regular disk drive
Dedicated cache to improve performance
Generally provides some type of administrative software

HOT SPARES
A hot spare is an idle standby drive that the RAID controller uses to replace a failed drive and rebuild its data automatically
[Figure: a RAID controller with a hot spare drive in the array]

HOT SWAP
A failed drive can be removed and replaced while the system keeps running
[Figure: a failed drive being replaced in an array attached to a running RAID controller]

CHECK YOUR KNOWLEDGE
What is a RAID array?
What benefits do RAID arrays provide?
What methods can be used to provide higher data availability in a RAID array?
What is the primary difference between RAID 3 and RAID 5?
What is a hot spare?
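The parity calculation and RAID 5 slides above can be made concrete with a short Python sketch. The parity slide uses addition to keep the arithmetic simple, whereas real arrays use bitwise XOR; this sketch computes a parity strip by XOR, regenerates one lost strip from the survivors, and rotates the parity disk from stripe to stripe as RAID 5 does. It assumes equal-length byte strips held in memory, and xor_strips, rebuild_strip and parity_disk_for_stripe are hypothetical names; the rotation order shown is one common choice, not the only one controllers use.

```python
from functools import reduce


def xor_strips(strips: list[bytes]) -> bytes:
    """XOR equal-length strips together, byte position by byte position."""
    if not strips:
        raise ValueError("need at least one strip")
    length = len(strips[0])
    if any(len(s) != length for s in strips):
        raise ValueError("all strips must be the same length")
    # zip(*strips) yields one tuple of byte values per position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def rebuild_strip(surviving: list[bytes], parity: bytes) -> bytes:
    """Regenerate the single missing data strip: XOR of the survivors and the parity."""
    return xor_strips(surviving + [parity])


def parity_disk_for_stripe(stripe: int, num_disks: int) -> int:
    """Hypothetical RAID 5 rotation: parity starts on the last disk and
    moves one disk to the left on each following stripe."""
    return (num_disks - 1 - stripe) % num_disks


if __name__ == "__main__":
    data = [b"RAID", b"disk", b"data", b"here"]
    parity = xor_strips(data)
    survivors = data[:2] + data[3:]  # the third data drive fails
    print(rebuild_strip(survivors, parity))  # prints b'data'
    print([parity_disk_for_stripe(s, 4) for s in range(4)])  # prints [3, 2, 1, 0]
```

XOR is used instead of addition because it needs no extra bits (the parity strip is the same size as a data strip) and it is its own inverse, so the same operation both generates parity and rebuilds a lost strip. Rotating the parity location spreads parity writes over every disk, which is what separates RAID 5 from the dedicated parity disk of RAID 3 and RAID 4.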