Disk Arrays Presentation

Disk Arrays
COEN 180

Large Storage Systems
 Collection of disks to store large amounts of data.
 Performance advantage:
   Each drive can satisfy only so many I/Os per second.
   Data spread across more drives is more accessible.
 JBOD: Just a Bunch Of Disks
Large Storage Systems
 Principal difficulty: Reliability
 Data needs to be stored redundantly:
   Mirroring, replication
     Simple
     Expensive (double, triple, … storage costs)
     Good performance
   Erasure-correcting codes
     Complex
     Save storage
     Moderate performance
Large Storage Systems
 Mirrored Disks
   Used by Tandem
     1970 – 1997, bought by Compaq
     NonStop architecture
     Used redundancy (CPU, storage) for fail-over capacity
   Data is replicated on both drives
   Performance:
     Writes: as fast as in the single-disk model
     Reads: slightly faster, since we can serve the read from the drive with the best expected service time.
Disk Performance Modeling Basics
 Service time:
   Time to satisfy a request if the system is otherwise idle.
 Response time:
   Time to satisfy a request at a given system load.
   Response time = service time + waiting time
 Utilization:
   Fraction of time the system is busy.
Disk Performance Modeling Basics
 M/M/1 queue: single server
   Assume Poisson arrivals and exponentially distributed service times.
 Arrival rate λ
 Service time S
 Utilization U = λS (Little's law)
 Response time R
 Determine R by:
   R = S + U·R
   hence R = S/(1 − U) = S/(1 − λS)

[Figure: response time R vs. utilization for S = 1; R grows without bound as U approaches 1.]
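The response-time formula can be evaluated directly. A minimal sketch in Python, assuming an illustrative 10 ms disk service time (the numbers are not from the slides):

```python
# M/M/1 mean response time: R = S / (1 - U), with utilization U = arrival_rate * S.

def response_time(arrival_rate, service_time):
    """Mean response time of an M/M/1 queue; needs U = arrival_rate * service_time < 1."""
    utilization = arrival_rate * service_time
    if utilization >= 1:
        raise ValueError("queue is unstable: utilization >= 1")
    return service_time / (1 - utilization)

S = 0.010  # assumed service time: 10 ms
for rate in (20, 50, 80):  # assumed arrival rates, requests per second
    U = rate * S
    print(f"load {rate}/s: U = {U:.2f}, R = {response_time(rate, S) * 1000:.1f} ms")
```

At U = 0.8 the response time is already five times the service time, matching the shape of the plotted curve.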
Disk Performance Modeling Basics
 Need to determine the service time of a disk request.
 Service time = seek time + latency + transfer time
 Industrial (but wrong) determination:
   Seek time = time to travel one third of the disk.
   Why?
Disk Performance Modeling Basics
 Assume that the head is positioned on a random track.
 Assume that the target track is another random track.
 Given x ∈ [0,1], calculate:
   D(x) = distance of a random point in [0,1] from x.
Disk Performance Modeling Basics
 Given x ∈ [0,1], calculate:
   D(x) = distance of a random point in [0,1] from x.

   D(x) = ∫₀¹ |y − x| dy
        = ∫₀ˣ (x − y) dy + ∫ₓ¹ (y − x) dy
        = x²/2 + (1 − x)²/2
        = x² − x + 1/2

[Figure: plot of D(x) on [0,1]; minimum 1/4 at x = 1/2, maximum 1/2 at the endpoints.]
Disk Performance Modeling Basics
 Now calculate the average distance from a random point to a random point in [0,1]:

   D̄ = ∫₀¹ D(x) dx
      = [x³/3 − x²/2 + x/2]₀¹
      = 1/3
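The average-distance result is easy to sanity-check numerically. A quick Monte Carlo sketch (sample size chosen arbitrarily):

```python
# Monte Carlo check: the mean distance between two uniformly random
# points in [0, 1] should come out close to the derived value 1/3.
import random

random.seed(0)
n = 100_000
mean = sum(abs(random.random() - random.random()) for _ in range(n)) / n
print(mean)  # close to 0.333
```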
Disk Performance Modeling Basics
 Is Average Seek Time = Seek Time for Average Distance?
 NO:
   Seek time does not depend linearly on seek distance.
   A seek consists of:
     acceleration
     cruising (if the seek distance is long)
     braking
     exact positioning
Disk Performance Modeling Basics
 Is Average Seek Time = Seek Time for Average Distance?
 Practical measurements suggest:
   Seek time depends on seek distance roughly as the square root of the distance.

[Figure: seek time vs. seek distance, growing like √distance.]
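Because of this nonlinearity, averaging seek times is not the same as taking the seek time of the average distance. A sketch, assuming seek time = √distance in arbitrary units:

```python
# With a square-root seek profile, E[sqrt(d)] (the true average seek time)
# differs from sqrt(E[d]) (the seek time at the average distance).
import random

random.seed(1)
n = 200_000
dists = [abs(random.random() - random.random()) for _ in range(n)]
avg_seek_time = sum(d ** 0.5 for d in dists) / n   # E[sqrt(d)], about 8/15
seek_at_avg_dist = (sum(dists) / n) ** 0.5         # sqrt(E[d]), about sqrt(1/3)
print(avg_seek_time, seek_at_avg_dist)
```

The concave seek profile makes the true average seek time smaller than the seek time at the average distance (Jensen's inequality), which is one reason the one-third-stroke rule overestimates.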
Disk Performance Modeling Basics
 Rules of Thumb:
   Keep the utilization of disks between 50% and 80%.
Disk Arrays
 Dealing with reliability
   RAID: Redundant Array of Inexpensive (Independent) Disks
   RAID Levels:
     RAID Level 0: JBOD (striping)
     RAID Level 1: Mirroring
     RAID Level 2:
       Encodes symbols (bytes) with a Hamming code.
       Stores each bit of a symbol on a different disk.
       Not used in practice.
Disk Arrays
 Dealing with reliability
   RAID Levels:
     RAID Level 3:
       Encodes symbols (bytes) with the simple parity code.
       Breaks a file up into n stripes and calculates a parity stripe.
       Stores all n + 1 stripes on n + 1 disks.
Disk Arrays
 Dealing with Reliability
   RAID Levels:
     RAID Level 4:
       Maintains n data drives.
       Files are stored completely on one drive,
       or perhaps in stripes if files become very large.
       An additional drive stores the byte-wise parity of the data drives.

[Diagram: Data | Data | Data | Parity]
Disk Arrays
 Level 4 RAID
   Uneven load of parity drive and data drives
Disk Arrays
 Dealing with Reliability
   RAID Level 5:
     No dedicated parity disk
     Data in blocks
       Blocks in parallel positions on the disks form a reliability stripe.
       One block in each reliability stripe is the parity of the others.
     No performance bottleneck
Disk Arrays
 Dealing with Reliability
   RAID Level 6:
     Like RAID Level 5, but every stripe has two parity blocks
     Lower write performance
     2-failure resilience
   RAID Level 7:
     Proprietary name for a RAID Level 3 with lots of caching. (Marketing bogus)
Disk Arrays
 Disk Array Operations
   Reads:
     Directly from the data in RAID Levels 3–6
   Writes:
     Large writes:
       Write all blocks in a single reliability stripe.
       Calculate the parity from the data and write it.
     Small writes:
       Need to maintain parity.
       Option 1: Write the data, then read all other blocks in the stripe and recalculate the parity.
       Option 2: Read the old data, then overwrite it. Calculate the difference (XOR) between old and new data. Then read the old parity, XOR it with the result of the previous operation, and overwrite the parity block with it.
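Option 2 can be sketched with byte strings standing in for disk blocks (illustrative only, not a real RAID driver):

```python
# Option 2 small write: new_parity = old_parity XOR old_data XOR new_data.
# Only the target data block and the parity block are touched.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# A stripe with three data blocks and its parity.
d0, d1, d2 = b"\x0f\x0f", b"\xf0\xf0", b"\xaa\xaa"
parity = xor(xor(d0, d1), d2)

# Small write replacing d1: read old data and old parity, touch no other disk.
new_d1 = b"\x55\x55"
new_parity = xor(xor(parity, d1), new_d1)

assert new_parity == xor(xor(d0, new_d1), d2)  # same as recomputing from scratch
```

Option 2 costs four disk accesses per small write regardless of stripe width, while Option 1 must read every other block in the stripe.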
Disk Arrays
 Disk Array Operations
   Reconstruction (RAID Levels 4–5):
     Systematically:
       Reconstruct only lost data.
       Read all surviving blocks in the reliability stripe.
       Calculate their parity. This is the lost data block.
       Write the data block in place of the parity.
     Out-of-order reconstruction for data that is being read.
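The systematic step relies on the same parity identity: the lost block is the XOR of all survivors. A minimal sketch:

```python
# Reconstruction: XOR together all surviving blocks of the reliability
# stripe (remaining data blocks plus parity) to recover the lost block.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"abc", b"def", b"ghi"
parity = xor_blocks([d0, d1, d2])

# The disk holding d1 fails; rebuild d1 from the survivors.
rebuilt = xor_blocks([d0, d2, parity])
assert rebuilt == d1
```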
Disk Arrays
 Performance Analysis
   Assume that read and write service times are the same:
     seek
     latency
     (transfer)
   A write operation involves a read-modify-write operation:
     About twice as long as a plain read/write service time:
       seek
       latency
       transfer
       two latencies (the block must rotate back under the head)
       transfer
Disk Arrays
 Performance Analysis
   Level 4 RAID
     Offered read load r
     Offered write load w
     n disks
     Utilization at a data disk:
       r·S/(n − 1) + w·2S/(n − 1)
     Utilization at the parity disk:
       w·2S
     Equal utilization only if:
       r = 2(n − 2)·w
Disk Arrays
 Performance Analysis
   Level 4 RAID
     Offered load λ.
     Assume only small writes.
     Assume a read ratio of ρ.
     Utilization at a data disk:
       ρλS/n + (1 − ρ)·2λS/n
     Utilization at the parity disk:
       (1 − ρ)·2λS

[Figure: utilization vs. offered load (100–500 IO/sec); the parity disk saturates well before the data disks.
Parameters: 4+1 layout, 70% reads, service time 10 msec.]
Disk Arrays
 Performance Analysis: RAID Level 5
   Offered load λ
   Read ratio ρ
   n disks
   Read load per disk:
     ρλS/n
   Write load per disk:
     (1 − ρ)·4λS/n
   Every write leads to two read-modify-write ops (data block and parity block).
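Plugging in the parameters used throughout these slides (4+1 layout, 70% reads, 10 ms service time) gives a sketch like:

```python
# Per-disk utilization: RAID Level 5 spreads parity work over all n disks,
# while RAID Level 4 concentrates it on one parity disk.

S = 0.010    # service time (10 ms)
rho = 0.7    # read ratio (70% reads)
n = 5        # 4+1 layout

def raid5_disk_util(load):
    return rho * load * S / n + (1 - rho) * load * 4 * S / n

def raid4_parity_util(load):
    return (1 - rho) * load * 2 * S

for load in (100, 150, 200):  # offered load in IO/sec
    print(load, raid5_disk_util(load), raid4_parity_util(load))
# The dedicated parity disk saturates long before any RAID Level 5 disk.
```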
Disk Arrays
 Level 4 RAID vs Level 5 RAID

[Figure: two plots of utilization vs. offered load (100–500 IO/sec).
Left: RAID Level 4; the parity drive saturates long before the data drives.
Right: RAID Level 5, compared with a JBOD without a parity disk.
Parameters: 4+1 layout, 70% reads, service time 10 msec.]
Disk Arrays
 Performance
   Small writes are expensive.
   Parity logging (Daniel Stodolsky, Garth Gibson, Mark Holland):
     Write operation:
       Read the old data,
       write the new data,
       send the XOR to a parity log file.
     Whenever the parity log file becomes too big, process it by updating the parity information.
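The idea can be sketched in a few lines; the in-memory data layout here is invented for illustration and is not the authors' implementation:

```python
# Parity-logging sketch: small writes append an XOR delta to a log
# instead of updating the parity immediately; the log is folded into
# the parity in bulk later.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

parity = b"\x00\x00"
data = b"\x12\x34"
log = []

# Small write: read old data, write new data, log the XOR delta.
new_data = b"\xff\x00"
log.append(xor(data, new_data))
data = new_data

# Later, when the log has grown too big: apply all deltas at once.
for delta in log:
    parity = xor(parity, delta)
log.clear()

assert parity == xor(b"\x00\x00", xor(b"\x12\x34", b"\xff\x00"))
```

The payoff is that the expensive read-old-parity/write-new-parity pair is replaced by a sequential log append, amortized over many small writes.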
Disk Arrays
 Reliability
   Accurately given by the probability of failure at every moment in time.

[Figure: failure probability as a function of time (5–30 years).]
Disk Arrays
 Reliability
   Often given by the Mean Time To Data Loss (MTTDL)
   Warning: MTTDL numbers can be deceiving.

[Figure: two reliability curves; the red line is more reliable during the design life, but has a lower MTTDL.]
Disk Arrays
 Use a Markov model to model the system in various states.
   States describe the system.
   Assumes constant rates of transitions.
   Transitions correspond to:
     component failure
     component repair
Disk Arrays
 One-component system

[Diagram: Initial State → (λ) → Failure State (absorbing)]

   MTTDL = MTTF = 1/λ
Disk Arrays
 Two-component system without repair

[Diagram: Initial State (2 components working) → (2λ) → one component working, one failed → (λ) → Failure State (absorbing)]
Disk Arrays
 Two-component system with repair

[Diagram: Initial State (2 components working) → (2λ) → one component working, one failed → (λ) → Failure State (absorbing); repair transition (μ) from the one-failed state back to the initial state]
Disk Arrays
 How to calculate the MTTF:
   Start with the original Markov model.
   Remove the failure state.
   Replace the transition(s) to the failure state with failure transitions to the initial state.
     This models a meta-system where a failed system is immediately replaced with a new one.
   Now calculate the steady-state solution of the Markov model.
     It typically has become ergodic.
   Use this to calculate the average rate at which a failure transition is taken. This gives the MTTF.
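This recipe can be mechanized for small models. A sketch of a steady-state solver under the assumptions above (the rate-matrix convention and the helper name `mttf` are mine, not from the slides):

```python
# MTTF recipe: reroute failure transitions to the initial state, solve the
# steady state (balance + normalization), then MTTF = 1 / loss rate.
# Q[i][j] = transition rate from state i to state j (failure state removed);
# fail_rates[i] = rerouted failure rate out of state i.

def mttf(Q, fail_rates):
    n = len(Q)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                A[j][i] += Q[i][j]     # inflow into j from i
                A[i][i] -= Q[i][j]     # outflow from i
        A[0][i] += fail_rates[i]       # failures restart the meta-system
        A[i][i] -= fail_rates[i]
    # Replace one (redundant) balance equation with sum(pi) = 1.
    A[n - 1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):               # Gaussian elimination with pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - s) / A[r][r]
    loss_rate = sum(p * f for p, f in zip(pi, fail_rates))
    return 1.0 / loss_rate

# Two-component system without repair: states {2 working, 1 working}.
lam = 0.001
print(mttf([[0.0, 2 * lam], [0.0, 0.0]], [0.0, lam]))  # 1.5 / lam = 1500.0
```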
Disk Arrays
 One-component system
   The system is in the initial state all the time.
   The failure transition is taken at rate λ.

[Diagram: Initial State, with the failure transition (λ) rerouted back to itself]

   "Loss rate" L = λ.
   MTTDL = 1/L = 1/λ
Disk Arrays
 Two-component system without repair
   Steady-state solution:

[Diagram: Initial State (2 components working) → (2λ) → one component working, one failed; failure transition (λ) rerouted back to the initial state]

   Let x be the probability of being in state 2, and y the probability of being in state 1.
   Then:
     Inflow into state 2 = outflow from state 2:
       λy = 2λx, i.e. 2x = y
     The total sum of probabilities is 1:
       x + y = 1
Disk Arrays
 Two-component system without repair
   Steady-state solution:
     2x = y,  x + y = 1
   Solution:
     x = 1/3,  y = 2/3
   Loss rate: L = (2/3)λ
   MTTF = 1/L = 1.5·(1/λ)
   (1.5 times better than the single-component system.)
Disk Arrays
 Two-component system with repair

[Diagram: Initial State (2 components working) ⇄ one component working, one failed; failure rate 2λ forward, repair rate μ back, failure transition (λ) rerouted back to the initial state]

   Balance and normalization:
     2λx = (λ + μ)y,  x + y = 1
   Solution:
     x = (λ + μ)/(3λ + μ),  y = 2λ/(3λ + μ)
   Loss rate:
     L = λy = 2λ²/(3λ + μ)
   MTTF:
     MTTF = 1/L = (3λ + μ)/(2λ²) = 3/(2λ) + μ/(2λ²)
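Plugging in illustrative rates (assumed, not from the slides) shows how strongly repair helps:

```python
# Two-component system with repair: MTTF = 3/(2 lam) + mu/(2 lam**2).
# Example rates are assumptions: component MTTF 100,000 hours, 10-hour repair.

lam = 1 / 100_000   # failure rate per hour
mu = 1 / 10         # repair rate per hour

y = 2 * lam / (3 * lam + mu)   # steady-state probability of the one-failed state
loss_rate = lam * y            # rate at which the rerouted failure transition fires
mttf = 1 / loss_rate

assert abs(mttf / (3 / (2 * lam) + mu / (2 * lam ** 2)) - 1) < 1e-9
print(f"MTTF = {mttf:.3e} hours")  # dominated by mu/(2 lam^2), about 5e8 hours
```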
Disk Arrays
 RAID Level 4/5 Reliability

[Diagram: Initial State (n disks) → (nλ) → n − 1 disks → ((n − 1)λ) → Failure State (absorbing); repair transition (μ) from the n − 1 state back to the initial state]
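Applying the MTTF recipe (reroute the failure transition, solve the two-state steady state) to this model yields a closed form; the numeric rates below are assumptions for illustration:

```python
# MTTDL of the RAID Level 4/5 model:
# MTTDL = ((2n - 1) lam + mu) / (n (n - 1) lam**2),
# which for fast repair is roughly mu / (n (n - 1) lam**2).

def raid_mttdl(n, lam, mu):
    return ((2 * n - 1) * lam + mu) / (n * (n - 1) * lam ** 2)

n = 5                # assumed 4+1 array
lam = 1 / 100_000    # assumed per-disk failure rate (per hour)
mu = 1 / 10          # assumed repair rate (10-hour rebuild)
print(f"MTTDL = {raid_mttdl(n, lam, mu):.3e} hours")
```

Derivation note: with the failure transition rerouted, balance gives nλ·x = (μ + (n − 1)λ)·y, and the loss rate is (n − 1)λ·y; inverting the loss rate gives the formula above.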
Disk Arrays
 RAID Level 6 Reliability

[Diagram: Initial State (n disks) → (nλ) → n − 1 disks → ((n − 1)λ) → n − 2 disks → ((n − 2)λ) → Failure State (absorbing); repair transitions (μ and 2μ) from the degraded states back toward the initial state]
Disk Arrays
 Sparing
   Create more resilience by adding a hot spare.
   Failover to the hot spare reconstructs the contents of the lost disk and places them on the spare disk.
   Distributed sparing (Menon et al.):
     Distribute the spare space throughout the disk array.