The Memory Hierarchy

MEMORY HIERARCHY & EXTERNAL MEMORY
By Noordiana Kassim
The Memory Hierarchy
Topics
• Storage technologies and trends
• Locality of reference
• Caching in the memory hierarchy
Random-Access Memory (RAM)
• Key features
  • RAM is packaged as a chip
  • Basic storage unit is a cell (one bit per cell)
  • Multiple RAM chips form a memory
• Static RAM (SRAM)
  • Each cell stores a bit with a six-transistor circuit
  • Retains value indefinitely, as long as it is kept powered
  • Relatively insensitive to disturbances such as electrical noise
  • Faster and more expensive than DRAM
• Dynamic RAM (DRAM)
  • Each cell stores a bit with a capacitor and a transistor
  • Value must be refreshed every 10-100 ms
  • Sensitive to disturbances
  • Slower and cheaper than SRAM
Non-Volatile RAM (NVRAM)
• Key feature: keeps data when power is lost
  • Several types
  • Most important is NAND flash
  • Ongoing R&D
• NAND flash
  • Reading is similar to DRAM (though somewhat slower)
  • Writing is packed with restrictions:
    • Can’t change existing data
    • Must erase in large blocks (e.g., 64K)
    • Block dies after about 100K erases
  • Writing is slower than reading (mostly due to erase cost)
  • Chips are often packaged with a Flash Translation Layer (FTL)
    • Spreads out writes (“wear leveling”)
    • Makes the chip appear like a disk drive
Typical Bus Structure Connecting CPU and Memory
• A bus is a collection of parallel wires that carry address, data, and control signals
• Buses are typically shared by multiple devices
[Figure: CPU chip (register file, ALU, bus interface) connected by the system bus to the I/O bridge, which is connected by the memory bus to main memory]
Memory Read Transaction (1)
• CPU places address A on the memory bus
[Figure: load operation movl A, %eax; address A travels from the bus interface through the I/O bridge to main memory, which holds word x at address A]
Memory Read Transaction (2)
• Main memory reads A from the memory bus, retrieves word x, and places it on the bus
[Figure: load operation movl A, %eax; word x travels from main memory through the I/O bridge to the bus interface]
Memory Read Transaction (3)
• CPU reads word x from the bus and copies it into register %eax
[Figure: load operation movl A, %eax; register %eax now holds x]
Memory Write Transaction (1)
• CPU places address A on the bus; main memory reads it and waits for the corresponding data word to arrive
[Figure: store operation movl %eax, A; register %eax holds y; address A travels from the bus interface through the I/O bridge to main memory]
Memory Write Transaction (2)
• CPU places data word y on the bus
[Figure: store operation movl %eax, A; word y travels from the bus interface through the I/O bridge toward main memory]
Memory Write Transaction (3)
• Main memory reads data word y from the bus and stores it at address A
[Figure: store operation movl %eax, A; main memory now holds y at address A]
Disk Access Time
• Average time to access some target sector is approximated by:
  • Taccess = Tavg seek + Tavg rotation + Tavg transfer
• Seek time (Tavg seek)
  • Time to position heads over the cylinder containing the target sector
  • Typical Tavg seek = 9 ms
• Rotational latency (Tavg rotation)
  • Time waiting for the first bit of the target sector to pass under the r/w head
  • Tavg rotation = 1/2 x 1/RPM x 60 sec/1 min
• Transfer time (Tavg transfer)
  • Time to read the bits in the target sector
  • Tavg transfer = 1/RPM x 1/(avg # sectors/track) x 60 sec/1 min
Disk Access Time Example
• Given:
  • Rotational rate = 7,200 RPM
  • Average seek time = 9 ms
  • Avg # sectors/track = 400
• Derived:
  • Tavg rotation = 1/2 x (60 secs/7200 RPM) x 1000 ms/sec ≈ 4 ms
  • Tavg transfer = (60 secs/7200 RPM) x 1/(400 sectors/track) x 1000 ms/sec ≈ 0.02 ms
  • Taccess = 9 ms + 4 ms + 0.02 ms = 13.02 ms
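The arithmetic in this example can be checked with a short script. This is a minimal sketch; the function name disk_access_time_ms is a hypothetical helper, not part of the slides. It evaluates the three terms without rounding, so the total comes out slightly above the rounded 13.02 ms.

```python
def disk_access_time_ms(rpm, avg_seek_ms, sectors_per_track):
    """Return (total, rotation, transfer) access-time estimates in milliseconds."""
    ms_per_revolution = 60_000 / rpm                   # one full rotation, in ms
    t_rotation = 0.5 * ms_per_revolution               # on average, wait half a rotation
    t_transfer = ms_per_revolution / sectors_per_track # time for one sector to pass the head
    return avg_seek_ms + t_rotation + t_transfer, t_rotation, t_transfer

total, rot, xfer = disk_access_time_ms(rpm=7200, avg_seek_ms=9, sectors_per_track=400)
print(f"rotation {rot:.2f} ms, transfer {xfer:.3f} ms, total {total:.2f} ms")
# prints: rotation 4.17 ms, transfer 0.021 ms, total 13.19 ms
```

Seek time and rotational latency together account for nearly all of the total, which is the point the next slide makes.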
Important points:
• Access time is dominated by seek time and rotational latency
• First bit in a sector is the most expensive, the rest are free
• SRAM access time is about 4 ns/doubleword, DRAM about 60 ns
  • Disk is about 40,000 times slower than SRAM, and
  • 2,500 times slower than DRAM
Logical Disk Blocks
• Modern disks present a simpler abstract view of the complex sector geometry:
  • The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...)
• Mapping between logical blocks and actual (physical) sectors
  • Maintained by a hardware/firmware device called the disk controller
  • Converts requests for logical blocks into (surface, track, sector) triples
• Allows the controller to set aside spare cylinders for each zone
  • Accounts for the difference between “formatted capacity” and “maximum capacity”
KEY CHARACTERISTICS OF COMPUTER MEMORY SYSTEMS
• Location
  • Internal (e.g., registers, cache, main memory)
  • External (e.g., disks, tapes)
• Capacity
  • Number of words
  • Number of bytes
• Unit of Transfer
  • Word
  • Block
• Access Method
  • Sequential
  • Direct
  • Random
  • Associative
• Performance
  • Access Time
  • Cycle Time
  • Transfer Rate
• Physical Type
  • Semiconductor
  • Magnetic
  • Optical
  • Magneto-Optical
• Physical Characteristic
  • Volatile/nonvolatile
  • Erasable/nonerasable
• Organization
  • Memory modules
MEMORY HIERARCHY
• Design constraints on a computer’s memory can be summed up by three (3) questions:
  • How much? (CAPACITY)
  • How fast? (ACCESS TIME)
  • How expensive? (COST)
MEMORY HIERARCHY
• There is a trade-off among the three (3) characteristics of memory (capacity, access time, and cost), for which the following relationships hold:
  • Faster access time, greater cost per bit.
  • Greater capacity, smaller cost per bit.
  • Greater capacity, slower access time.
MEMORY HIERARCHY DIAGRAM
Memory Hierarchies
• Some fundamental and enduring properties of hardware and software:
  • Fast storage technologies cost more per byte and have less capacity
  • Gap between CPU and main memory speed is widening
  • Well-written programs tend to exhibit good locality
• These fundamental properties complement each other beautifully
• They suggest an approach for organizing memory and storage systems known as a memory hierarchy
An Example Memory Hierarchy
Smaller, faster, and costlier (per byte) storage devices are at the top; larger, slower, and cheaper (per byte) storage devices are at the bottom.
• L0: registers. CPU registers hold words retrieved from L1 cache
• L1: on-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache memory
• L2: off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory
• L3: main memory (DRAM). Main memory holds disk blocks retrieved from local disks
• L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers
• L5: remote secondary storage (distributed file systems, Web servers)
Caches
• Cache: Smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device
• Fundamental idea of a memory hierarchy:
  • For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1
• Why do memory hierarchies work?
  • Programs tend to access data at level k more often than they access data at level k+1
  • Thus, storage at level k+1 can be slower, and thus larger and cheaper per bit
  • Net effect: A large pool of memory that costs as little as the cheap storage near the bottom, but that serves data to programs at approximately the rate of the fast storage near the top
Cache memory
• If the active portions of the program and data are placed in a fast, small memory, the average memory access time can be reduced, thus reducing the total execution time of the program
• Such a fast, small memory is referred to as cache memory
• Apart from the CPU registers, the cache is the fastest component in the memory hierarchy and approaches the speed of the CPU components
Cache memory
• When the CPU needs to access memory, the cache is examined
• If the word is found in the cache, it is read from the fast memory
• If the word addressed by the CPU is not found in the cache, the main memory is accessed to read the word
Cache memory
• When the CPU refers to memory and finds the word in cache, it is said to produce a hit
• Otherwise, it is a miss
• The performance of cache memory is frequently measured in terms of a quantity called the hit ratio
• Hit ratio = hits / (hits + misses)
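The hit ratio formula can be written as a one-line function. This is a minimal sketch; the counts used in the example (970 hits, 30 misses) are hypothetical numbers chosen for illustration, not measurements from the slides.

```python
def hit_ratio(hits, misses):
    """Fraction of memory references that were found in the cache."""
    total = hits + misses
    if total == 0:
        raise ValueError("no memory references recorded")
    return hits / total

# Hypothetical counts: 970 references found in cache, 30 that went to main memory
print(hit_ratio(970, 30))  # prints 0.97
```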
Cache memory
• The basic characteristic of cache memory is its fast access time
• Therefore, very little or no time must be wasted when searching for words in the cache
• The transformation of data from main memory to cache memory is referred to as a mapping process. There are three types of mapping:
  • Associative mapping
  • Direct mapping
  • Set-associative mapping
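Cache memory
• To help understand the mapping procedure, we have the following example:
[Figure: example memory organization. Going by the values used on the following slides, main memory has 2^15 = 32K words of 12 bits each and the cache has 2^9 = 512 words]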
Associative mapping
• The fastest and most flexible cache organization uses an associative memory
• The associative memory stores both the address and data of the memory word
• This permits any location in cache to store any word from main memory
• The address value of 15 bits is shown as a five-digit octal number and its corresponding 12-bit word is shown as a four-digit octal number
Associative mapping
• A CPU address of 15 bits is placed in the argument register and the associative memory is searched for a matching address
• If the address is found, the corresponding 12-bit data is read and sent to the CPU
• If not, the main memory is accessed for the word
• If the cache is full, an address-data pair must be displaced to make room for a pair that is needed and not presently in the cache
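The lookup and displacement steps above can be sketched in a few lines. This is an illustrative model only: the class name AssociativeCache, the sample octal addresses and words, and the choice of first-in, first-out displacement are all assumptions, since the slides do not name a replacement policy. A dictionary lookup stands in for the parallel address match done by the associative hardware.

```python
from collections import OrderedDict

class AssociativeCache:
    """Fully associative cache sketch: each entry is an (address, data) pair."""

    def __init__(self, capacity, main_memory):
        self.capacity = capacity
        self.main_memory = main_memory   # dict mapping address -> word
        self.entries = OrderedDict()     # insertion order doubles as FIFO order
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.entries:      # stands in for the associative search
            self.hits += 1
            return self.entries[address]
        self.misses += 1
        word = self.main_memory[address]  # miss: fetch the word from main memory
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # displace the oldest pair (FIFO, assumed)
        self.entries[address] = word
        return word

# Hypothetical 15-bit addresses (five octal digits) and 12-bit words (four octal digits)
memory = {0o01000: 0o3450, 0o02777: 0o6710, 0o22345: 0o1234}
cache = AssociativeCache(capacity=2, main_memory=memory)
for addr in (0o01000, 0o02777, 0o01000, 0o22345, 0o01000):
    cache.read(addr)
print(cache.hits, cache.misses)  # prints: 1 4
```

With only two entries, the fourth read displaces address 01000, so the fifth read misses even though that word was cached earlier.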
Direct Mapping
• Associative memory is expensive compared to RAM
• In the general case, there are 2^k words in cache memory and 2^n words in main memory (in our case, k=9, n=15)
• The n-bit memory address is divided into two fields: k bits for the index and n-k bits for the tag field
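The split of an address into tag and index fields can be sketched as follows, using the values k=9 and n=15 from this example. The function name split_address and the sample address are hypothetical, and the sketch assumes the conventional layout in which the index is taken from the low-order k bits and the tag from the remaining high-order n-k bits.

```python
K_INDEX_BITS = 9      # cache holds 2**9 = 512 words
N_ADDRESS_BITS = 15   # main memory holds 2**15 = 32K words

def split_address(address, n=N_ADDRESS_BITS, k=K_INDEX_BITS):
    """Split an n-bit address into (tag, index); index = low k bits (assumed layout)."""
    if not 0 <= address < (1 << n):
        raise ValueError("address does not fit in n bits")
    index = address & ((1 << k) - 1)   # low-order k bits select the cache word
    tag = address >> k                 # remaining n-k bits are stored as the tag
    return tag, index

tag, index = split_address(0o02777)
print(f"tag={tag:02o} index={index:03o}")  # prints: tag=02 index=777
```

With these field widths, the first two octal digits of a five-digit address form the tag and the last three form the index.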
Set-Associative Mapping
• The disadvantage of direct mapping is that two words with the same index in their address but with different tag values cannot reside in cache memory at the same time
• Set-associative mapping is an improvement over direct mapping in that each word of cache can store two or more words of memory under the same index address
Set-Associative Mapping
• In the slide, each index address refers to two data words and their associated tags
• Each tag requires six bits and each data word has 12 bits, so the word length is 2*(6+12) = 36 bits
General Caching Concepts
• Types of cache misses:
  • Cold (compulsory) miss
    • Cold misses occur because the cache is empty
  • Conflict miss
    • Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k
    • E.g., block i at level k+1 must be placed in block (i mod 4) at level k
    • Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block
    • E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time
  • Capacity miss
    • Occurs when the set of active cache blocks (working set) is larger than the cache
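The conflict-miss example can be reproduced with a small simulation. This is a sketch under stated assumptions: the function simulate is hypothetical, and least-recently-used replacement within a set is an assumed policy. The first call models the (i mod 4) placement rule; the second keeps the same four blocks but groups them into two sets of two, in the style of set-associative mapping.

```python
def simulate(trace, num_sets, ways):
    """Count misses for a cache of num_sets sets with `ways` blocks each (LRU per set)."""
    sets = [[] for _ in range(num_sets)]   # each list runs from least to most recently used
    misses = 0
    for block in trace:
        lines = sets[block % num_sets]     # placement rule: block i goes to set (i mod num_sets)
        if block in lines:
            lines.remove(block)            # hit: re-appended below as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)               # set is full: evict the least recently used block
        lines.append(block)
    return misses

trace = [0, 8, 0, 8, 0, 8]
print(simulate(trace, num_sets=4, ways=1))  # direct-mapped: prints 6
print(simulate(trace, num_sets=2, ways=2))  # 2-way set-associative: prints 2
```

In the direct-mapped case blocks 0 and 8 keep evicting each other, so every reference misses. With two blocks per set, only the first two references miss, and those are cold misses.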
EXTERNAL MEMORY
Types of External Memory
• Magnetic Disk
  • RAID (Redundant Array of Independent Disks)
  • Removable
• Optical
  • CD-ROM
  • CD-Recordable (CD-R)
  • CD-R/W
  • DVD
  • DVD-R
  • DVD-RW
• Magnetic Tape
Magnetic Disk
• Coated with magnetizable material for read and write purposes
• The substrate used to be aluminum
• More recently, glass is used:
  • Better stiffness
  • Greater shock/damage resistance
  • Lower fly height
  • Improved uniformity of surface helps to reduce read-write errors
Magnetic Write and Read Mechanism
• Head:
  • Fixed head
    • One read-write head per track
    • Heads built into a fixed rigid arm
  • Movable head
    • One read-write head per surface
    • Built into a movable arm
• When the track passes under the head, it generates a current of the same polarity as the one already recorded.
Disk Data Layout
• Contains:
  1. Tracks
     • Same width as the head
  2. Intertrack gaps
  3. Sectors
     • Fixed-length (512 bytes) sectors are commonly used in industry
  4. Intersector gaps
• Gaps are there to minimize errors due to misalignment of the head or interference of magnetic fields
Disk Layout Methods
• CAV – Constant Angular Velocity
• Multiple Zone Recording: to enhance density (capacity)
Characteristics
• Movable head or not
• Removability
  • Provides unlimited storage capacity
  • Easy data transfer between systems
• Multiple platter
  • Single or double sided
Disk Performance Parameters
• Seek time: time to position the head at the track
• Rotational delay: the time it takes for the beginning of the sector to reach the head
• Transfer time: time required for the transfer
  T = b/(rN)
  T = transfer time
  b = number of bytes to be transferred
  N = number of bytes on a track
  r = rotation speed in rev/sec
• Times are usually given in ms and considered for the average case
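The transfer-time formula T = b/(rN) can be evaluated directly. This is a minimal sketch; the function name transfer_time_ms is hypothetical, and the example disk (7,200 RPM, 400 sectors of 512 bytes per track) is an assumed combination of figures quoted elsewhere in these slides. With r in rev/sec the formula gives seconds, so the result is multiplied by 1000 to report milliseconds.

```python
def transfer_time_ms(num_bytes, bytes_per_track, rev_per_sec):
    """T = b / (r * N), converted from seconds to milliseconds."""
    return num_bytes / (rev_per_sec * bytes_per_track) * 1000

# Assumed disk: one 512-byte sector, 400 sectors per track, 7,200 RPM = 120 rev/sec
t = transfer_time_ms(num_bytes=512, bytes_per_track=400 * 512, rev_per_sec=7200 / 60)
print(round(t, 4))  # prints 0.0208
```

Reading one sector out of 400 on a track takes 1/400 of a revolution, which matches the roughly 0.02 ms transfer time in the earlier access-time example.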
RAID
• Stands for Redundant Array of Independent Disks
  • RAID is a set of physical disk drives viewed by the operating system as a single logical drive
  • Data are distributed across the physical drives of an array in a scheme known as striping, described subsequently
  • Redundant disk capacity is used to store parity information, which guarantees data recoverability in case of a disk failure
• Uses array management software
• Levels 0 to 6 and more, such as RAID 10 (a combination of RAID 0 and RAID 1)
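How parity makes data recoverable can be illustrated with byte-wise XOR, the parity function used by parity-based levels such as RAID 5. This is a sketch only: the helper xor_blocks and the three tiny data strips are hypothetical, and real arrays operate on much larger strips and rotate or dedicate the parity disk depending on the level.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data_disks = [b"AAAA", b"BBBB", b"CCCC"]   # one strip per data disk (hypothetical contents)
parity = xor_blocks(data_disks)            # stored on the redundant disk

# Simulate losing disk 1, then rebuild it from the surviving disks plus parity
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_disks[1])  # prints True
```

XOR-ing the parity with the surviving strips cancels their contributions and leaves the missing strip, which is why a single disk failure is recoverable.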
RAID Level 0
• Not a true member of the RAID family
• No redundancy or fault tolerance
• High transfer capacity for large and small I/O data
• It is included because it distributes data across multiple disks
• No parity calculation is needed
• Easy to implement
RAID Level 0
• In a transaction environment, there may be hundreds of I/O requests per second. A disk array can provide high I/O execution rates by balancing the I/O load across multiple disks.
• Parallel processing
• Any error is uncorrectable
• One disk's failure will result in all data in an array being lost
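The way striping spreads data across the disks can be sketched with a simple mapping from a logical strip number to a physical location. Round-robin placement is assumed here as the usual RAID 0 layout; the function name locate_strip and the four-disk array are hypothetical.

```python
def locate_strip(logical_strip, num_disks):
    """Round-robin striping: return (disk index, strip index on that disk)."""
    return logical_strip % num_disks, logical_strip // num_disks

for strip in range(6):
    print(strip, locate_strip(strip, num_disks=4))
# prints:
# 0 (0, 0)
# 1 (1, 0)
# 2 (2, 0)
# 3 (3, 0)
# 4 (0, 1)
# 5 (1, 1)
```

Consecutive strips land on different disks, so a large request can be served by several disks in parallel. Because no strip is stored twice, losing one disk loses part of every large file.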
RAID Level 1
• Redundancy is achieved by having a mirror disk
• Inefficient use of space
• Read requests are efficient: a request can be served by whichever disk involves the minimum seek time plus rotational latency
• Write requests can be done in parallel (T = the larger of the two write times)
• Recovery is simple: just replace the broken disk with a new one
Optical Storage CD-ROM
• Originally for audio
• 650 Mbytes giving over 70 minutes of audio
• Polycarbonate coated with highly reflective coat, usually aluminium
• Data stored as pits
• Read by reflecting laser
• Constant packing density
• Constant linear velocity
Random Access on CD-ROM
• Difficult
• Move head to rough position
• Set correct speed
• Read address
• Adjust to required location
CD-ROM for & against
• For:
  • Large capacity (?)
  • Easy to mass produce
  • Removable
  • Robust
• Against:
  • Expensive for small runs
  • Slow
  • Read only
Other Optical Storage
• CD-Recordable (CD-R)
  • WORM (Write once, read many)
  • Now affordable
  • Compatible with CD-ROM drives
• CD-RW
  • Erasable
  • Getting cheaper
  • Mostly CD-ROM drive compatible
  • Phase change
    • Material has two different reflectivities in different phase states
DVD - what’s in a name?
• Digital Video Disk?
  • Used to indicate a player for movies
  • Only plays video disks
• Digital Versatile Disk?
  • Used to indicate a computer drive
  • Will read computer disks and play video disks
DVD - technology
• Multi-layer
• Very high capacity (4.7G per layer)
• Full-length movie on single disk
  • Using MPEG compression
CD vs DVD
DVD+R
• The +R format pre-groove also uses a wobble frequency, but at a much higher frequency of 817 kHz. Instead of pre-pits, the +R formats convey the sector addressing information by frequency modulation of the wobble frequency.
Magnetic Tape
• Serial access
• Slow
• Very cheap
• Backup and archive