Lecture 29




Reminder: Homework 7 is due on Monday at class time (Exam 2 review); no late work accepted.
Reminder: Exam 2 is on Wednesday. The Exam 2 review sheet is posted.
Questions?
Friday, March 23
CS 470 Operating Systems - Lecture 29
1
Outline
- Disk systems
- Disk scheduling
- Disk management
- RAID
Disk Drives
A disk is viewed logically as a linear array of blocks. How is this array mapped onto a circular disk drive?

A disk drive is one or more platters rotating on a spindle. Each side of a platter has a head that reads the data off that side of the platter. Each platter side holds concentric rings of data called tracks. The set of tracks at the same position on every platter is a cylinder. Each track/cylinder is divided into sectors.
[Figure: disk drive geometry - platters on a spindle, with heads, tracks, cylinders, and sectors]
Generally, block numbers are mapped with block 0 at cylinder/track 0 (the outermost track), head 0, sector 0. The next block is sector 1, and so on until the track is full; then the next block is head 1, sector 0, and so on until the cylinder is full; then the next block is cylinder/track 1, head 0, sector 0, and so forth.

Conceptually, it is possible for an OS to map logical block numbers to <cyl, head, sector> addresses itself, but this no longer happens; the mapping is handled by the disk controller.
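Assuming a fixed, hypothetical geometry (16 heads and 63 sectors per track; real controllers hide the true geometry), the mapping just described can be sketched as:

```python
HEADS = 16              # illustrative geometry, not from the slides
SECTORS_PER_TRACK = 63

def chs_to_block(cyl, head, sector):
    """Map a <cyl, head, sector> address to a logical block number:
    fill a track, then the next head in the cylinder, then the next
    cylinder, as described above."""
    return (cyl * HEADS + head) * SECTORS_PER_TRACK + sector

def block_to_chs(block):
    """Inverse mapping: logical block number back to <cyl, head, sector>."""
    cyl, rem = divmod(block, HEADS * SECTORS_PER_TRACK)
    head, sector = divmod(rem, SECTORS_PER_TRACK)
    return cyl, head, sector
```

For example, block 0 is cylinder 0, head 0, sector 0, and block 63 rolls over to head 1, sector 0.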
One reason the mapping is done in the disk controller is that disks have been getting larger. Density has increased in three dimensions:

- sectors per track (higher rotation speed)
- tracks per platter (shorter seek separation)
- bits per unit area (perpendicular/vertical recording within a track)

The components of disk performance are:

- seek time: disk arm movement to the correct cylinder
- rotational delay (latency): waiting for the correct sector to rotate under the head
Taken together, data access time is determined by:

- bandwidth (bytes transferred per unit time): buffer-to-disk and buffer-to-host
- buffer size

Disk drives come in various speeds and sizes, optimized for different applications.
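Combining the components above (seek time, rotational latency, and transfer time), a back-of-the-envelope access-time estimate can be sketched as follows; the 9 ms seek and 100 MB/s bandwidth in the example are illustrative, not from the slides:

```python
def access_time_ms(seek_ms, rpm, transfer_bytes, bandwidth_bytes_per_s):
    """Rough data access time: seek + average rotational latency + transfer.
    Average rotational latency is half a revolution."""
    latency_ms = (60_000.0 / rpm) / 2
    transfer_ms = 1000.0 * transfer_bytes / bandwidth_bytes_per_s
    return seek_ms + latency_ms + transfer_ms

# A 7200 RPM drive has ~4.17 ms average rotational latency, so a 4KB read
# with a 9 ms seek at 100 MB/s costs roughly 13.2 ms in total.
t = access_time_ms(9, 7200, 4096, 100e6)
```

Note that for small requests the seek and rotation dominate; the transfer itself is a tiny fraction of the total.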
Using Western Digital as a prototypical line:

Disk drive       | Application                                   | Sizes     | RPM      | Cache | Buffer to host | Notes, street price
WD Caviar Blue   | Standard, internal desktop                    | 80GB-1TB  | 7200     | 32MB  | SATA 6Gb/s     | 1TB ~$105
WD Caviar Black  | Maximum speed, internal desktop               | 500GB-2TB | 7200     | 64MB  | SATA 6Gb/s     | 2TB ~$210
WD Caviar Green  | Maximum capacity, low power, internal desktop | 320GB-3TB | variable | 64MB  | SATA 3Gb/s     | 3TB ~$200
WD VelociRaptor  | Internal, enterprise server                   | 150-600GB | 10000    | 32MB  | SATA 6Gb/s     | 600GB ~$270
WD Scorpio Blue  | Standard, internal laptop                     | 80GB-1TB  | 5200     | 8MB   | SATA 3Gb/s     | 1TB ~$135
WD Scorpio Black | Maximum speed, internal laptop                | 160-750GB | 7200     | 16MB  | SATA 3Gb/s     | 750GB ~$165
Disk drive               | Application                      | Sizes     | RPM  | Cache | Buffer to host | Notes, street price
WD AV-25                 | 24/7 surveillance                | 160-500GB | 5400 | 32MB  | SATA 3Gb/s     | MTBF 1 million hours, 500GB ~$90
WD My Book Essential     | External desktop                 | 1-3TB     | -    | -     | USB 3.0 5Gb/s  | 3TB ~$170
WD My Passport Essential | External portable                | 500GB-2TB | -    | -     | USB 3.0 5Gb/s  | 1TB ~$130
WD My Book Live Duo      | Networked personal cloud storage | 4-6TB     | -    | -     | Ethernet       | RAID 1/0 (2 drives in box), 6TB ~$480

Toshiba makes a 240GB, 4200 RPM, 8MB cache disk drive. Why would anyone want to buy this small, slow drive?
What is the limit on the capacity of a disk drive using conventional magnetic media?

- Typical drives are ~250Gb/sq.in.
- The Toshiba drive is ~344Gb/sq.in.
- The current limit is ~500Gb/sq.in.
- The theoretical limit is ~1Tb/sq.in.; with any smaller magnetic grains, heat alone will flip the magnetization of the bits.
- Seagate is researching ways of packing more bits, theoretically up to 50Tb/sq.in.
Disk Scheduling
As with all resources, we can extract the best performance by scheduling disk accesses. This is now mostly done in the disk controller because:

- The original IDE interface reports at most 16383 cylinders x 16 heads x 63 sectors = 8.4GB. All disks now report this geometry; the EIDE interface added LBA (logical block addressing) to reach the actual capacity.
- Most disks map out defective sectors to spare ones.
- The number of sectors per track is not constant: there are about 40% more sectors on outer tracks than on inner tracks.
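The IDE geometry limit quoted above can be checked directly, assuming the conventional 512-byte sector size:

```python
# Maximum geometry reportable by the original IDE interface.
cylinders, heads, sectors = 16383, 16, 63
capacity_bytes = cylinders * heads * sectors * 512
# 8,455,200,768 bytes -- the commonly quoted "8.4GB" IDE limit.
print(capacity_bytes)
```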
The OS generally just makes requests to the controller. The controller has a queue and a scheduling algorithm to choose which request is serviced next.

The algorithms are straightforward and have properties similar to the other scheduling algorithms we have studied.
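The scheduling algorithms alluded to here (FCFS, SSTF, SCAN, C-SCAN) are simple orderings of the request queue. A minimal sketch of the elevator idea, using the classic textbook request sequence as an example:

```python
def elevator_order(requests, head, direction="up"):
    """SCAN-style (elevator) ordering of pending cylinder requests:
    service requests in the current direction of head travel, then
    reverse. This sketch reverses at the last pending request (the
    LOOK variant) rather than at the physical end of the disk."""
    up = sorted(c for c in requests if c >= head)
    down = sorted((c for c in requests if c < head), reverse=True)
    return up + down if direction == "up" else down + up

# Head at cylinder 53, pending: 98, 183, 37, 122, 14, 124, 65, 67.
# Moving "up" services 65, 67, 98, 122, 124, 183, then 37, 14.
order = elevator_order([98, 183, 37, 122, 14, 124, 65, 67], head=53)
```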
OS's are now more concerned with disk management, i.e., how to make a disk usable to users.
Formatting
Low-level (physical) formatting is done at the factory, but the OS can do this, too.

File system formatting:

- Create a partition table that groups cylinders into a virtual disk. Tools include fdisk, sfdisk, and PartitionMagic.
- Create the file system. In Unix, mkfs allocates inodes (index blocks).
- Create swap space.
Boot Block
How does a computer find the OS to boot? We cannot require that it be in a particular location on a particular disk, since we can choose among more than one OS.

A bootstrap loader is a program that loads OS's. It could be stored in ROM, but then it would be hard to change. Usually a very small loader is stored in ROM that knows where the full loader program is in the boot block (aka the MBR, master boot record). Example loaders include grub, lilo, the Windows loader, etc.
Boot loaders know how to initialize the CPU and bring up the file system. They are configured to know where the OS program code resides; e.g., grub knows it is in the file system, usually under /boot.

The boot loader loads the kernel into memory, then jumps to the first instruction of the OS. The OS then takes over.
Bad Blocks
All disks have bad areas. The factory initially maps out the blocks that would have been allocated to these areas. (Too many of them causes the disk to be rejected.)

Some disk controllers are "smart" (e.g., SCSI) and automatically remap bad blocks when they are encountered. Spare sectors are reserved on each cylinder for this purpose.

Other controllers rely on the OS to inform them; e.g., Windows marks FAT entries after a chkdsk scan.
Swap Space
Usage of swap space depends on the memory management algorithm and the OS. Some systems store the entire program and its data in swap space for the duration of execution; others store only the pages being used.
Swap space issues include:

- file vs. disk partition: usually a raw partition with a dedicated manager, for speed
- single vs. multiple spaces
- location: if single, usually in the center of the disk; multiple spaces only make sense with multiple disks
- size: running out means aborting processes, but more real memory means less need to swap
RAID
Disks have gotten physically smaller and much cheaper. We want to combine multiple disks into one system to increase read/write performance and to improve reliability.

Initially, RAID stood for Redundant Arrays of Inexpensive Disks, focusing on providing large amounts of storage cheaply. The focus now is on reliability, so RAID is Redundant Arrays of Independent Disks.
Reliability is characterized by mean time to failure (MTTF), e.g., 100,000 hours for a disk. For an array of 100 disks, the mean time until some disk fails is 100,000/100 = 1,000 hours = 41.7 days(!). If only one copy of each piece of data is stored, each failure is costly.

To solve this problem, we introduce redundancy, i.e., store extra information that can be used to rebuild lost information.
The simplest redundancy is to mirror a disk, i.e., create a duplicate. Every write goes to both disks, and a read can go to either one. The only way to lose data is if the second disk fails during the time it takes to repair the first.

The MTTF of the system depends on the MTTF of the disks and the mean time to repair (MTTR).
If disk failures are independent and the MTTR is 10 hours, the mean time to data loss is

100,000^2 / (2 x 10) hours = 500 x 10^6 hours = ~57,000 years(!)
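This arithmetic can be checked directly; the formula assumes failures are independent:

```python
def mirrored_mttf_hours(disk_mttf_hours, mttr_hours):
    """Mean time to data loss for a mirrored pair with independent
    failures: MTTF^2 / (2 * MTTR)."""
    return disk_mttf_hours ** 2 / (2 * mttr_hours)

hours = mirrored_mttf_hours(100_000, 10)  # 5.0e8 hours
years = hours / (24 * 365)                # about 57,000 years
```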
Of course, many failures are not independent: e.g., power failures, natural disasters, manufacturing defects, etc.
Performance is increased through parallelism. E.g., for a mirrored disk, the transfer rate is the same as for a single disk, but the overall read rate doubles.

The transfer rate can be improved by striping data across multiple disks. E.g., with 8 disks, we can write one bit of each byte to each disk simultaneously. The number of accesses per unit time is the same, but each access reads 8 times as much data. Striping in larger units, such as blocks, is common.
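Block striping maps logical blocks round-robin across the array; a minimal sketch:

```python
def block_stripe(logical_block, n_disks):
    """Round-robin block striping: consecutive logical blocks land on
    consecutive disks, so n_disks sequential blocks can be read in
    parallel. Returns (disk index, block number on that disk)."""
    disk = logical_block % n_disks
    block_on_disk = logical_block // n_disks
    return disk, block_on_disk
```

For example, with 8 disks, logical blocks 0-7 each go to a different disk, and block 8 wraps back to disk 0.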
RAID Levels
Striping does not help with reliability, and mirroring is expensive. Various schemes, called RAID levels, provide both with different tradeoffs.

RAID 0 is simple striping. RAID 1 is simple mirroring. Higher levels are more complicated.
RAID 0+1 and RAID 1+0
We can also combine schemes. RAID 0+1 is a mirrored RAID 0 system: stripe the data, then mirror the entire stripe set. RAID 1+0 is a set of RAID 1 mirrored pairs that are then striped.