Chapter 3
Presented by:
Anupam Mittal

Data protection: Concept of RAID and its
Components
Data Protection: RAID
-2
After completing this chapter, you will be able to:
 Describe what RAID is and the needs it addresses
 Describe the concepts upon which RAID is built
 Define and compare RAID levels
 Recommend the use of the common RAID levels
based on performance and availability
considerations
 Explain factors impacting disk drive performance

Limitations of a single disk drive
◦ Limited Capacity
◦ Limited access speed

An individual drive has a certain life expectancy
◦ Measured in MTBF
◦ Example - If the MTBF of a drive is 750,000 hours, and
there are 100 drives in the array, then the MTBF of the
array becomes 750,000 / 100, or 7,500 hours
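
The MTBF arithmetic above can be sketched in a few lines. This is only an illustration of the slide's rule of thumb (drive MTBF divided by the number of drives), assuming identical and independent drives; the function name `array_mtbf_hours` is hypothetical.

```python
def array_mtbf_hours(drive_mtbf_hours: float, drive_count: int) -> float:
    """Approximate array MTBF as drive MTBF / number of drives.

    Assumption: all drives are identical and fail independently.
    """
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    return drive_mtbf_hours / drive_count


# The example from the slide: 100 drives, each rated at 750,000 hours.
mtbf = array_mtbf_hours(750_000, 100)
print(mtbf)        # 7500.0 (hours)
print(mtbf / 24)   # 312.5 (days)
```

Under this approximation, an array of 100 such drives can expect a drive failure roughly every 312 days, which is the motivation for adding redundancy.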


RAID was introduced to mitigate this problem
RAID provides:
◦ Increased capacity
◦ Higher availability
◦ Increased performance
[Figure: Host, RAID Controller, RAID Array]
RAID Arrays
-5
[Figure: components of a RAID array. Labels: Host, RAID Controller, Hard Disks, Physical Array, Logical Array, RAID Array]

Hardware (usually a specialized disk
controller card)
◦ Controls all drives attached to it
◦ Array(s) appear to the host operating system as a
regular disk drive
◦ Provided with administrative software

Software
◦ Runs as part of the operating system
◦ Performance is dependent on CPU workload
◦ Does not support all RAID levels







RAID 0: Striped array with no fault tolerance
RAID 1: Disk mirroring
RAID 3: Parallel access array with dedicated parity disk
RAID 4: Striped array with independent disks and a dedicated parity disk
RAID 5: Striped array with independent disks and distributed parity
RAID 6: Striped array with independent disks and dual distributed parity
Nested RAID (e.g., 1 + 0, 0 + 1, etc.)
RAID Redundancy: Parity
[Figure: Host, RAID Controller, data blocks 0 to 11 spread across four disks, and a Parity Disk holding the parity of blocks 0123, 4567, and 8 9 10 11]
© 2008 EMC Corporation. All rights reserved.
Parity Calculation
Data disks in the RAID array hold: 5, 3, 4, 2
Parity: 5 + 3 + 4 + 2 = 14
The middle drive fails:
5 + 3 + ? + 2 = 14
? = 14 - 5 - 3 - 2
? = 4
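
The parity calculation above can be sketched in code. The slide uses ordinary addition for illustration; the XOR variant is included because the write-penalty slide later in this chapter describes parity as XOR operations. All function names here are hypothetical and chosen only for this sketch.

```python
from functools import reduce
from operator import xor


def sum_parity(values):
    """Parity as a plain sum, as in the slide's illustration."""
    return sum(values)


def rebuild_from_sum(surviving, parity):
    """Recover the single missing value by subtracting the survivors."""
    return parity - sum(surviving)


def xor_parity(values):
    """Parity as the XOR of all values."""
    return reduce(xor, values, 0)


def rebuild_from_xor(surviving, parity):
    """Recover the single missing value by XOR-ing parity with the survivors."""
    return reduce(xor, surviving, parity)


data = [5, 3, 4, 2]           # values on the four data disks
surviving = [5, 3, 2]         # the disk holding 4 has failed

print(sum_parity(data))                                 # 14
print(rebuild_from_sum(surviving, sum_parity(data)))    # 4
print(rebuild_from_xor(surviving, xor_parity(data)))    # 4
```

Both variants recover exactly one missing value; neither can recover from two simultaneous failures, since one equation cannot determine two unknowns.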
Lecture 8, 9, 10
 Different RAID levels and their suitability for different
application environments: RAID 0, RAID 1
[Figure: Strips and stripes. Labels: Strip 1, Strip 2, Strip 3 (strips on the individual disks); Stripe 1, Stripe 2 (stripes spanning the disks)]
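
As a rough illustration of how strips and stripes relate, the sketch below maps a logical block number to a disk and a position on that disk. It assumes a simple round-robin layout with no parity (as in RAID 0); the function name `locate_block` and its parameters are hypothetical, and real controllers may lay data out differently.

```python
def locate_block(logical_block: int, disks: int, strip_blocks: int):
    """Map a logical block to (disk, stripe, offset_on_disk).

    Assumptions: strips are assigned to disks round-robin, every strip
    holds `strip_blocks` blocks, and there is no parity strip.
    """
    if logical_block < 0 or disks < 1 or strip_blocks < 1:
        raise ValueError("invalid arguments")
    strip_index = logical_block // strip_blocks   # strip number across the whole set
    disk = strip_index % disks                    # disk that holds this strip
    stripe = strip_index // disks                 # stripe (row) the strip belongs to
    offset = stripe * strip_blocks + logical_block % strip_blocks
    return disk, stripe, offset


# Four disks, one block per strip: blocks 1, 5, and 9 land on the same disk.
print(locate_block(1, disks=4, strip_blocks=1))   # (1, 0, 0)
print(locate_block(5, disks=4, strip_blocks=1))   # (1, 1, 1)
print(locate_block(9, disks=4, strip_blocks=1))   # (1, 2, 2)
```

Because consecutive strips land on different disks, a large sequential request can be serviced by several disks at once.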
[Figure: Host, RAID Controller, and data blocks numbered up to 11 distributed across the disks of a striped set]
[Figure: Host, RAID Controller, and two disks that each hold Block 0 and Block 1 (mirroring)]
[Figure: nested RAID combining RAID 1 and RAID 0. Block 0 and Block 2 each appear twice in one disk group; Block 1 and Block 3 each appear twice in the other. Labels: Host, RAID Controller]
[Figure: nested RAID combining RAID 0 and RAID 1. Blocks 0 to 3 appear in one disk group and are repeated in the other. Labels: Host, RAID Controller]


RAID 1+0 and RAID 0+1 compared:
 Benefits are identical under normal operations
 Rebuild operations are very different
◦ RAID 1+0 uses mirrored pairs, so only 1 disk is rebuilt if a disk fails
◦ RAID 0+1: if a single drive fails, the entire stripe is faulted
◦ RAID 0+1 is therefore a poorer solution and is less common
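
The rebuild difference can be made concrete with a toy model. The sketch below only encodes the two statements on this slide (one disk rebuilt for RAID 1+0, the whole stripe faulted for RAID 0+1); the disk numbering, the `group_size` parameter, and the function name `disks_to_rebuild` are assumptions made for illustration.

```python
def disks_to_rebuild(layout: str, failed_disk: int, group_size: int = 2) -> list:
    """Return the disks that must be rebuilt after a single disk failure.

    Assumption: disks are numbered from 0 and grouped consecutively.
    In "1+0" each group is a mirrored pair; in "0+1" each group is a
    stripe set that is mirrored by another stripe set.
    """
    start = (failed_disk // group_size) * group_size
    group = list(range(start, start + group_size))
    if layout == "1+0":
        # Only the failed disk is rebuilt, from its mirror partner.
        return [failed_disk]
    if layout == "0+1":
        # The whole stripe set is faulted, so every disk in it is rebuilt.
        return group
    raise ValueError("layout must be '1+0' or '0+1'")


print(disks_to_rebuild("1+0", failed_disk=2))   # [2]
print(disks_to_rebuild("0+1", failed_disk=2))   # [2, 3]
```

With larger stripe sets the gap widens: the RAID 0+1 rebuild grows with the number of disks per stripe, while RAID 1+0 always rebuilds one disk.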
[Figure: Host, RAID Controller, data blocks 0 to 11 spread across four disks, and a Parity Disk holding the parity of blocks 0123, 4567, and 8 9 10 11]
[Figure: Host, RAID Controller, data disks, and a Parity Disk, with one data disk failed]
Parity calculation: 4 + 6 + 1 + 7 = 18
The middle drive fails:
4 + 6 + ? + 7 = 18
? = 18 - 4 - 6 - 7
? = 1
[Figure: Host, RAID Controller, Parity Generated. Block 0, Block 1, Block 2, Block 3, and parity P0123 are written to the disks]
RAID 4 – Striping with Dedicated Parity Disk
[Figure: Host, RAID Controller, Parity Generated. Blocks 0 to 7 are striped across the data disks, with parity P0123 and P4567 on the dedicated parity disk]
[Figure: Host, RAID Controller, Parity Generated. Blocks 0 to 7 are written to the disks together with parity blocks P0123 and P4567]



 Two disk failures in a RAID set lead to data
unavailability and data loss in single-parity
schemes, such as RAID-3, 4, and 5
 An increasing number of drives in an array and
increasing drive capacity lead to a higher
probability of two disks failing in a RAID set
 RAID-6 protects against two disk failures by
maintaining two parities
◦ Horizontal parity, which is the same as RAID-5
parity
◦ Diagonal parity, which is calculated by taking diagonal sets
of data blocks from the RAID set members
 Even-Odd and Reed-Solomon are two
commonly used algorithms for calculating RAID-6 parity

Hardware (usually a specialized disk
controller card)
◦ Controls all drives attached to it
◦ Performs all RAID-related functions, including
volume management
◦ Array(s) appear to the host operating system as a
regular disk drive
◦ Dedicated cache to improve performance
◦ Generally provides some type of administrative
software

Software
◦ Generally runs as part of the operating system
◦ Volume management performed by the server

Comparison of RAID Levels
RAID Comparison
RAID | Min Disks | Storage Efficiency % | Cost | Read Performance | Write Performance
0 | 2 | 100 | Low | Very good for both random and sequential read | Very good
1 | 2 | 50 | High | Good. Better than a single disk | Good. Slower than a single disk, as every write must be committed to two disks
3 | 3 | (n-1)*100/n, where n = number of disks | Moderate | Good for random reads and very good for sequential reads | Poor to fair for small random writes. Good for large, sequential writes
5 | 3 | (n-1)*100/n, where n = number of disks | Moderate | Very good for random reads. Good for sequential reads | Fair for random write. Slower due to parity overhead. Fair to good for sequential writes
6 | 4 | (n-2)*100/n, where n = number of disks | Moderate but more than RAID 5 | Very good for random reads. Good for sequential reads | Good for small, random writes (has write penalty)
1+0 and 0+1 | 4 | 50 | High | Very good | Good
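
The storage efficiency column can be checked with a short calculation. This sketch simply encodes the formulas and minimum disk counts from the comparison above; the function name `storage_efficiency` and the level strings are hypothetical.

```python
# Minimum disk counts taken from the comparison table.
MIN_DISKS = {"0": 2, "1": 2, "3": 3, "5": 3, "6": 4, "1+0": 4, "0+1": 4}


def storage_efficiency(level: str, n: int) -> float:
    """Return usable capacity as a percentage of raw capacity for n disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unknown RAID level: {level}")
    if n < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == "0":
        return 100.0                  # no redundancy
    if level in ("1", "1+0", "0+1"):
        return 50.0                   # every block is stored twice
    if level in ("3", "5"):
        return (n - 1) * 100 / n      # one disk's worth of parity
    return (n - 2) * 100 / n          # RAID 6: two disks' worth of parity


print(storage_efficiency("5", 5))     # 80.0
print(storage_efficiency("6", 4))     # 50.0
print(storage_efficiency("1+0", 8))   # 50.0
```

For the parity levels, efficiency improves as disks are added, whereas mirrored layouts stay at 50 percent regardless of array size.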
[Figure: RAID Controller performing 2 XOR operations across parity disk P0 and data disks D1 to D4: Ep new = Ep old - E4 old + E4 new]
Small (less than element size) write on RAID 3 & 5
Ep = E1 + E2 + E3 + E4 (XOR operations)
If parity is valid, then: Ep new = Ep old - E4 old + E4 new (XOR operations)
◦ 2 disk reads and 2 disk writes
Parity Vs Mirroring
◦ Reading, calculating and writing parity segment introduces penalty to every write operation
◦ Parity RAID penalty manifests due to slower cache flushes
◦ Increased load in writes can cause contention and can cause slower read response times
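
The small-write parity update can be sketched as follows. Since the slide states that the operations are XOR, both the "-" and the "+" in the update formula become XOR here. The element values and the function names are made up for illustration.

```python
from functools import reduce
from operator import xor


def full_parity(elements):
    """Ep = E1 XOR E2 XOR E3 XOR E4 (reads every data element)."""
    return reduce(xor, elements, 0)


def updated_parity(ep_old: int, e_old: int, e_new: int) -> int:
    """Ep new = Ep old XOR E old XOR E new (the 2 XOR operations)."""
    return ep_old ^ e_old ^ e_new


# Hypothetical element contents for a four-element stripe.
elements = [0b1010, 0b0110, 0b1111, 0b0001]
ep_old = full_parity(elements)

# Small write that changes only E4:
e4_new = 0b1000
ep_new = updated_parity(ep_old, elements[3], e4_new)   # read E4 old and Ep old
elements[3] = e4_new                                   # write E4 new and Ep new

# The shortcut gives the same parity as recomputing over all elements.
assert ep_new == full_parity(elements)
```

The shortcut touches only the changed element and the parity element, which accounts for the 2 disk reads and 2 disk writes, instead of reading the entire stripe.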







What is a RAID array?
What benefits do RAID arrays provide?
What methods can be used to provide higher
data availability in a RAID array?
What is the primary difference between RAID
3 and RAID 5?
What is the advantage of using RAID 6?
What is a hot spare?