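Chapter 5
Memory system
[Figure 5.1: Connection of the memory to the processor. The processor's MAR drives a k-bit address bus and its MDR connects to an n-bit data bus. Main memory has up to 2^k addressable locations with a word length of n bits. Control lines carry R/W, MFC, etc.]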
Basic Computer Organization Revisited
[Figure: block diagram. Memory (holding Program and Data) and I/O connect to the Processor, which contains General-Purpose Registers, MAR, MDR, PC, ALUs, and Control Logic.]
Access time vs cycle time
• Memory access time
   - A measurement of a single access
• Memory cycle time
   - A measurement of how quickly two back-to-back accesses of a memory chip can be made
   - Cycle time > access time due to latency between successive memory accesses
• DRAM (used to build main memory)
   - Access time: 50 to 150 nanoseconds
   - Requires a pause (refresh) between back-to-back accesses
• SRAM (used to build cache memory)
   - Access time: 10 nanoseconds
   - No pause between back-to-back accesses
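[Figure 5.2: Organization of bit cells in a memory chip. Address lines A0 to A3 feed an address decoder that drives word lines W0 to W15. Each row holds 8 flip-flop (FF) memory cells, and each column of cells shares a pair of bit lines connected to a Sense/Write circuit with one data input/output line (b7 ... b0). Control inputs are R/W and CS. The chip needs 4 + 2 + 8 = 14 external connections.]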
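[Figure 5.3: Organization of a 1K x 1 memory chip. The 10-bit address is split into a 5-bit row address (RAS: row address strobe), which a 5-bit decoder turns into word lines W0 to W31 of a 32 x 32 memory cell array, and a 5-bit column address (CAS: column address strobe), which controls a 32-to-1 output multiplexer and input demultiplexer behind the Sense/Write circuitry. Control inputs are R/W and CS, with a single data input/output line.]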
SRAM
• Static Random Access Memory
• Very fast reads and writes
• Needs 6 transistors per cell, hence higher cost and more area
• Does not need to be refreshed
• Low power consumption
• Implementation technology: CMOS
• Used to build cache memory
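[Figure 5.5: An example of a CMOS memory cell. Six transistors T1 to T6 between Vsupply and ground hold complementary states at nodes X and Y; the word line connects the cell to the bit lines b and b'.]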
DRAM
• Dynamic Random Access Memory
• Needs 1 transistor and 1 capacitor per cell
• Lower cost and more compact
• Each bit must be refreshed periodically
• Implementation technology: CMOS
• Used to build main memory
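[Figure 5.6: A single-transistor dynamic memory (RAM) cell. Transistor T, controlled by the word line, connects capacitor C to the bit line.]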
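Asynchronous DRAM
[Figure 5.7: Internal organization of a 2M x 8 dynamic memory chip. The multiplexed address lines carry A20-9 (row) and A8-0 (column). RAS loads the row address latch, whose output goes through the row decoder into a 4096 x (512 x 8) cell array with Sense/Write circuits. CAS loads the column address latch, whose output goes through the column decoder to select one byte on data lines D7 to D0. Control inputs are CS and R/W.]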
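The row/column addressing in Figures 5.3 and 5.7 amounts to splitting the cell address into a high-order row field and a low-order column field. The following is a minimal sketch of that split, assuming the row field occupies the high-order bits (as the A20-9 / A8-0 labels suggest); the function name `split_row_col` is hypothetical.

```python
def split_row_col(address, row_bits, col_bits):
    """Split a cell address into (row, column); the row is the high-order field."""
    if not 0 <= address < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range for this chip")
    row = address >> col_bits                 # sent first, latched by RAS
    col = address & ((1 << col_bits) - 1)     # sent second, latched by CAS
    return row, col


# 2M x 8 chip of Figure 5.7: 12-bit row (A20-9), 9-bit column (A8-0)
last_byte = split_row_col(0x1FFFFF, row_bits=12, col_bits=9)

# 1K x 1 chip of Figure 5.3: 5-bit row, 5-bit column
cell_33 = split_row_col(33, row_bits=5, col_bits=5)
```

Because the two fields are sent one after the other over the same pins, the 2M x 8 chip needs only 12 address pins rather than 21.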
Fast Page Mode
• Conventional DRAM requires that a row address and a column address be sent for each access
• FPM sends the row address just once for many accesses to memory locations near each other, improving access time. That is, the row address is decoded once, and only the column address varies to access different bytes on the same row (see Example 5.1 on page 5-54)
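To make the saving concrete, the sketch below counts how many times a row address has to be sent for a sequence of accesses, with and without fast page mode. It is only an illustration under stated assumptions: the 9-bit column field matches the 2M x 8 chip of Figure 5.7, and the function name `row_activations` is hypothetical.

```python
def row_activations(addresses, col_bits=9, fast_page_mode=True):
    """Count how many times a row address must be sent for these accesses."""
    count = 0
    open_row = None
    for addr in addresses:
        row = addr >> col_bits
        # Conventional DRAM resends the row every time;
        # FPM resends it only when the access leaves the open row.
        if not fast_page_mode or row != open_row:
            count += 1
            open_row = row
    return count


accesses = [0, 1, 2, 3, 512, 513]   # four bytes in row 0, then two in row 1
with_fpm = row_activations(accesses, fast_page_mode=True)
without_fpm = row_activations(accesses, fast_page_mode=False)
```

For this access pattern the row address is sent twice with FPM and six times without it.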
Extended Data Out (EDO) DRAM
• Also called hyper page mode DRAM
• EDO memory has modified timing circuits, so one access to the memory can begin before the previous one has finished (note: conventional DRAM needs some delay between two consecutive accesses)
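Synchronous DRAM (SDRAM)
[Figure 5.8: Synchronous DRAM. The row/column address input feeds a row address latch with row decoder and a column address latch with column decoder around the cell array; a refresh counter supplies row addresses for refresh. Read/Write circuits and latches connect the array to a data input register and a data output register on the data lines. A mode register and timing control block is driven by Clock, RAS, CAS, R/W, and CS.]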
SDRAM
• Supports burst operation
   - Automatic column address increment: no external CAS cycle is needed to select each successive column address
• Interleaved memory
   - Contains two banks of memory internally instead of one
   - This allows the second bank to be "precharging" (RAS and CAS activation) while the first bank is transferring data
• Will replace older DRAM technologies
DDR SDRAM
• Double data rate SDRAM
   - Standard SDRAM takes action only at the rising edge of the clock
   - DDR accesses data at both the rising and the falling edge of the clock
   - This doubles the bandwidth of the memory by transferring data twice per clock
• DDR II
   - The memory core runs at 1/2 the clock frequency of the I/O buffers
   - DDR: 100 MHz driven clock -> 100 MHz data buffers -> DDR applied -> 200 MHz final data frequency
   - DDR-II: 100 MHz driven clock -> 200 MHz data buffers -> DDR applied -> 400 MHz final data frequency
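The two frequency chains above follow one rule: final data frequency = driven clock x buffer multiplier x transfers per clock. A small sketch of that arithmetic follows; the function and parameter names are hypothetical, and the numbers are the 100 MHz examples from the slide.

```python
def final_data_frequency_mhz(driven_clock_mhz, buffer_multiplier, transfers_per_clock=2):
    """Final data frequency for a double-data-rate memory.

    buffer_multiplier: ratio of I/O buffer clock to driven (core) clock
                       (1 for DDR, 2 for DDR-II in the slide's example).
    transfers_per_clock: 2 because data moves on both clock edges.
    """
    return driven_clock_mhz * buffer_multiplier * transfers_per_clock


ddr = final_data_frequency_mhz(100, buffer_multiplier=1)
ddr2 = final_data_frequency_mhz(100, buffer_multiplier=2)
sdr = final_data_frequency_mhz(100, buffer_multiplier=1, transfers_per_clock=1)
```

Setting `transfers_per_clock=1` models standard SDRAM, which transfers data only on the rising edge.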
SIMM vs DIMM
• SIMM
   - Single In-line Memory Module
   - 30 pins (8-bit bus version)
   - 72 pins (wider bus, more address lines)
• DIMM
   - Dual In-line Memory Module
   - 168 pins
RAMBUS
• Developed by the Rambus company
• Makes a single chip act more like a memory system than a memory component
• Each chip has interleaved memory and a high-speed interface
• RDRAM (1st generation)
   - Drops RAS/CAS, replacing them with a bus that allows other accesses over the bus between the sending of the address and the return of the data
   - Runs at a 300 MHz clock
• DRDRAM (2nd generation)
   - Direct RDRAM
   - Separate row- and column-command buses instead of the conventional multiplexing
   - Runs at a 400 MHz clock
• RIMM
   - 16 RDRAM
Other memory
• ROM
   - PROM
   - EPROM
   - EEPROM
• Flash
   - Low power consumption
   - Used in portable systems such as PDAs, mobile phones, digital cameras, and MP3 players
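[Figure 5.13: Memory hierarchy. From top to bottom: processor registers, primary cache (L1), secondary cache (L2), main memory, magnetic disk secondary memory. Size increases toward the bottom; speed and cost per bit increase toward the top.]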
Memory hierarchy

Level  Type          Capacity  Access time          Bandwidth               Managed by
1      Registers     < 1 KB    0.25-0.5 ns          20,000-100,000 MB/sec   Compiler
2      Cache         < 16 MB   0.5-25 ns            5,000-10,000 MB/sec     Hardware
3      Main memory   < 16 GB   80-250 ns            1,000-5,000 MB/sec      OS
4      Disk storage  > 100 GB  5,000,000 ns (5 ms)  20-150 MB/sec           OS/operator
Cache Terms
• Locality of reference
   - Temporal
   - Spatial
• Cache block (cache line)
• Replacement algorithm
• Read/write hit/miss
• Write-through
• Write-back (copy-back)
• Dirty bit / modified bit
• Valid bit
   - The valid bit is set every time a row is loaded into the cache by a cache miss, and can only be reset by the flush line
Cache mapping functions
• Direct mapping
• Fully associative mapping
• Set associative mapping
   - N-way associative mapping
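Direct mapping
[Figure 5.15: Direct-mapped cache. Main memory holds blocks 0 to 4095; the cache holds blocks 0 to 127, each with a tag. Memory blocks 0, 128, 256, ... map to cache block 0; blocks 1, 129, 257, ... map to cache block 1; and so on up to blocks 127, 255, ... mapping to cache block 127. Main memory address fields: tag (5 bits), block (7 bits), word (4 bits).]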
Fully associative mapping
[Figure 5.16: Associative-mapped cache. Any main memory block (0 to 4095) can be placed in any of the cache blocks 0 to 127, each of which stores a tag. Main memory address fields: tag (12 bits), word (4 bits).]
Set associative mapping
[Figure 5.17: Set-associative-mapped cache with two blocks per set. The 128 cache blocks are grouped into sets 0 to 63, each block with its own tag. Memory blocks 0, 64, 128, ... map to set 0; blocks 1, 65, 129, ... map to set 1; and so on up to blocks 63, 127, ..., 4095 mapping to set 63. Main memory address fields: tag (6 bits), set (6 bits), word (4 bits).]
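The three mappings of Figures 5.15 to 5.17 differ only in how the 16-bit main memory address is cut into fields. The sketch below splits one address each way; the helper `split_fields` and the constant names are hypothetical, while the field widths are the ones given in the figures.

```python
def split_fields(address, widths):
    """Split an address into fields; widths are listed from most to least significant."""
    fields = []
    shift = sum(widths)
    for width in widths:
        shift -= width
        fields.append((address >> shift) & ((1 << width) - 1))
    return tuple(fields)


DIRECT = (5, 7, 4)           # tag, block, word   (Figure 5.15)
ASSOCIATIVE = (12, 4)        # tag, word          (Figure 5.16)
SET_ASSOC_2WAY = (6, 6, 4)   # tag, set, word     (Figure 5.17)

# Word 3 of main memory block 129 (16 words per block)
address = 129 * 16 + 3

direct = split_fields(address, DIRECT)
associative = split_fields(address, ASSOCIATIVE)
set_assoc = split_fields(address, SET_ASSOC_2WAY)
```

Block 129 lands in cache block 1 under direct mapping (129 mod 128) and in set 1 under two-way set-associative mapping (129 mod 64), matching the block lists in the figures.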
Replacement algorithm
• LRU
   - Least recently used: replace the block that has gone unused the longest
• Random
   - Replace a randomly chosen block
• First in, first out (FIFO)
   - Replace the oldest block
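The difference between LRU and FIFO shows up only when a resident block is referenced again: LRU treats it as fresh, FIFO does not. The following is a minimal sketch that counts misses for one fully associative group of blocks under either policy; the function name `count_misses` and the reference string are made up for illustration.

```python
from collections import OrderedDict


def count_misses(references, capacity, policy="lru"):
    """Count misses for a sequence of block references under LRU or FIFO."""
    resident = OrderedDict()   # first entry is the next replacement victim
    misses = 0
    for block in references:
        if block in resident:
            if policy == "lru":
                resident.move_to_end(block)   # a hit refreshes the block under LRU only
            continue
        misses += 1
        if len(resident) >= capacity:
            resident.popitem(last=False)      # evict the victim at the front
        resident[block] = True
    return misses


refs = [1, 2, 3, 1, 4, 1]
lru_misses = count_misses(refs, capacity=3, policy="lru")
fifo_misses = count_misses(refs, capacity=3, policy="fifo")
```

With this reference string LRU keeps block 1 resident because it was just used, while FIFO evicts it as the oldest arrival and then misses on it again.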
68040 cache
• 4K data cache
• 4K instruction cache
• Each cache contains 64 sets
• Every set contains 4 blocks
   - 4-way associative mapping
• 1 cache block contains 4 long words
• 1 valid bit per cache block
• 1 dirty bit per long word
• Write-back or write-through
• Random replacement
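[Figure 5.23: Data cache organization in the 68040 microprocessor. The address is divided into a 22-bit tag, a 6-bit set number, and a 4-bit byte field; the example address has tag 000BF2, set 0, and byte 8. The cache holds sets 0 to 63, each with four blocks (block 0 to block 3); every block stores a tag, a valid bit (v), and dirty bits (d). The address tag is compared (=?) with the stored tags of the selected set: a stored tag of 0CA020 does not match (hit = 0, miss = 1), while a stored tag of 000BF2 matches (hit = 1, miss = 0).]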
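The lookup in Figure 5.23 can be sketched as a tag comparison across the four blocks of the selected set. This is an illustration, not the 68040's actual logic: the cache is modeled as a list of (valid, tag) pairs, dirty bits are left out, and placing the two example tags in blocks 0 and 2 of set 0 is an assumption.

```python
TAG_BITS, SET_BITS, BYTE_BITS = 22, 6, 4   # field widths from Figure 5.23


def lookup(cache, address):
    """Return the block number that hits within the selected set, or None on a miss."""
    set_index = (address >> BYTE_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (SET_BITS + BYTE_BITS)
    for block, (valid, stored_tag) in enumerate(cache[set_index]):
        if valid and stored_tag == tag:    # the "=?" comparators in the figure
            return block
    return None


# 64 sets of 4 blocks, all invalid at first
cache = [[(False, 0)] * 4 for _ in range(64)]
cache[0][0] = (True, 0x0CA020)   # assumed placement of the non-matching tag
cache[0][2] = (True, 0x000BF2)   # assumed placement of the matching tag

address = (0x000BF2 << 10) | (0 << 4) | 8   # tag 000BF2, set 0, byte 8
hit_block = lookup(cache, address)
miss = lookup(cache, (0x000BF3 << 10) | 8)  # no block in set 0 holds this tag
```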
ARM710T cache
• Only one cache for both data and instructions
• 4 KB cache
• 64 sets
• 1 set contains 4 blocks
   - 4-way associative mapping
• 1 cache block contains 4 words (32 bits each) = 16 bytes
• Write-through
• Random replacement
Pentium III cache
• L1 cache
   - 16KB data cache: 4-way, write-back or write-through
   - 16KB instruction cache: 2-way, no write strategy needed because it holds only code
• L2 cache
   - 512KB
   - 4-way
   - Write-back or write-through
• Coppermine
   - L2 cache built into the CPU: 256KB, 8-way
Pentium 4 cache
• L1 cache
   - 8KB data cache
   - 4-way
   - Block contains 64 bytes
   - Write-through
• L2 cache
   - Within the CPU
   - 256KB
   - 8-way
   - Block contains 128 bytes
   - Write-back
• L3 cache
   - Server-based CPUs
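[Figure 5.24: Caches and external connections in the Pentium III processor. The processing unit contains the L1 instruction cache, the L1 data cache, and a bus interface unit, which connects over the cache bus to the L2 cache and over the system bus to main memory and input/output.]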
Memory Performance
• Every memory module has an address buffer register (ABR) and a data buffer register (DBR)
• Consecutive words placed in a single module
• Consecutive words placed in consecutive modules
   - Interleaved memory
   - CPU references to consecutive memory addresses access multiple modules concurrently (the low-order address bits select the module)
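The two layouts differ in which address bits select the module. A minimal sketch follows, assuming 4 modules of 1024 words each (both numbers are made up for illustration, as are the function names).

```python
def single_module_layout(address, words_per_module):
    """Consecutive words in one module: high-order bits select the module."""
    return address // words_per_module, address % words_per_module


def interleaved_layout(address, num_modules):
    """Interleaved memory: low-order bits select the module."""
    return address % num_modules, address // num_modules


NUM_MODULES, WORDS_PER_MODULE = 4, 1024   # hypothetical sizes

block = range(8)  # eight consecutive word addresses
modules_single = [single_module_layout(a, WORDS_PER_MODULE)[0] for a in block]
modules_interleaved = [interleaved_layout(a, NUM_MODULES)[0] for a in block]
```

In the first layout all eight accesses queue up at module 0; in the interleaved layout they rotate through modules 0 to 3, so up to four accesses can be in progress at once.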
Calculate miss penalty
• See the examples on p. 5-54 to p. 5-59
• Tave = hC + (1 - h)M,
  where h: hit rate, M: miss penalty, C: access time for cache
• Tave = h1 C1 + (1 - h1) h2 C2 + (1 - h1)(1 - h2) M
   - h1: hit rate for L1 cache
   - C1: access time for L1 cache
   - h2: hit rate for L2 cache
   - C2: access time for L2 cache
   - M: access time for main memory
• Note: if h1 = h2 = 0.9, the fraction of accesses that miss in both caches is (1 - 0.9)(1 - 0.9) = 1%
   - This means that with a two-level cache and a 0.9 hit rate at each level, only about 1% of memory accesses pay the main memory penalty
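The two formulas above can be evaluated directly. The sketch below does so with made-up timing values (1, 10, and 100 cycles), which are assumptions for illustration and not figures from the textbook examples.

```python
def t_ave_one_level(h, C, M):
    """Tave = hC + (1 - h)M for a single cache."""
    return h * C + (1 - h) * M


def t_ave_two_level(h1, C1, h2, C2, M):
    """Tave = h1*C1 + (1 - h1)*h2*C2 + (1 - h1)*(1 - h2)*M for L1 and L2 caches."""
    return h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M


# Hypothetical access times in clock cycles
one_level = t_ave_one_level(h=0.9, C=1, M=100)
two_level = t_ave_two_level(h1=0.9, C1=1, h2=0.9, C2=10, M=100)
main_memory_fraction = (1 - 0.9) * (1 - 0.9)   # accesses that miss in both caches
```

With these numbers the average access time drops from about 10.9 cycles with one cache to about 2.8 cycles with two, because only about 1% of accesses reach main memory.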
Other methods to reduce miss penalty
• Write buffer (improvement for write-through)
   - Built into the CPU
   - Writes go to the write buffer rather than to memory, so the CPU does not need to wait for the memory write
• Prefetch
   - The compiler inserts prefetch instructions (by analyzing the code)
• Lockup-free cache
   - Allows the data cache to continue to supply cache hits during a miss
   - Helpful for processors that support out-of-order completion (e.g., via Tomasulo's algorithm)
Virtual memory
• Virtual address (logical address)
• MMU (built into the CPU)
• Physical address
• Page table (in main memory)
• Page frame
• Address translation
• TLB
   - Cache built into the CPU that holds recently used address translations
• Page fault
• Replacement algorithm
   - LRU
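[Figure 5.26: Virtual memory organization. The processor sends virtual addresses to the MMU, which sends physical addresses to the cache and to main memory. Data passes between the processor, the cache, and main memory; DMA transfers move data between main memory and disk storage.]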
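[Figure 5.27: Virtual-memory address translation. The virtual address from the processor consists of a virtual page number and an offset. The page table base register points to the start of the page table; adding the virtual page number to it gives the address of one entry in the page table. The entry holds control bits (valid bit, dirty bit, access rights of the program to the page) and the page frame in memory, which points to the start of the physical page. The page frame combined with the offset forms the physical address in main memory, which points to one byte within the physical page.]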
Intel IA-32 Processor’s Memory management
Intel IA-32 Page Translation
The entries in the page directory point to page tables, and the entries in a page table point to pages in physical memory. This paging method can be used to address up to 2^20 pages, which spans a linear address space of 2^32 bytes (4 GBytes).
To select the various table entries, the linear address is divided into three sections:
• Page-directory entry: bits 22 through 31 provide an offset to an entry in the page directory. The selected entry provides the base physical address of a page table.
• Page-table entry: bits 12 through 21 of the linear address provide an offset to an entry in the selected page table. This entry provides the base physical address of a page in physical memory.
• Page offset: bits 0 through 11 provide an offset to a physical address in the page.
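The three-way division of the linear address can be sketched with shifts and masks. The bit positions are the ones listed above; the function name `split_linear_address` and the sample address are made up for illustration.

```python
def split_linear_address(linear):
    """Split a 32-bit IA-32 linear address into (directory index, table index, offset)."""
    directory = (linear >> 22) & 0x3FF   # bits 22-31: entry in the page directory
    table = (linear >> 12) & 0x3FF       # bits 12-21: entry in the selected page table
    offset = linear & 0xFFF              # bits 0-11: byte within the 4 KB page
    return directory, table, offset


fields = split_linear_address(0x00403025)
top = split_linear_address(0xFFFFFFFF)
```

The two 10-bit indexes together select one of 2^10 x 2^10 = 2^20 pages, and the 12-bit offset covers the 4 KB within each page, which gives the 2^32-byte linear address space.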
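[Figure 5.28: Use of an associative-mapped TLB. The virtual page number of the virtual address from the processor is compared (=?) with the virtual page numbers stored in the TLB, which holds the virtual-to-physical address mappings the CPU has just used, together with control bits and the page frame in memory. On a hit, the page frame is combined with the offset to form the physical address in main memory; otherwise a miss is signaled.]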
[Figure 5.30: Organization of one surface of a disk, showing sector 0 of track 0, sector 0 of track 1, and sector 3 of track n.]
Disk Access Time
• Seek time
• Rotation time (latency time)
• Transfer time
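[Figure 5.31: Disks connected to the system bus. The processor and main memory share the system bus with a disk controller, which controls two disk drives.]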
RAID
Redundant Array of Inexpensive Disks
• RAID0: data striping, no redundancy. Level 0 stripes data at the block level
• RAID1: mirroring (shadowing)
• RAID01 (RAID0+1): mirrored stripes
• RAID2: error-correcting coding with Hamming code
   - Not a typical implementation and rarely used. Level 2 stripes data at the bit level rather than the block level
• RAID3: bit-interleaved parity
   - Provides byte-level striping with a dedicated parity disk. Level 3, which cannot service multiple requests simultaneously, is also rarely used
• RAID4: dedicated parity drive
   - A commonly used implementation of RAID. Level 4 provides block-level striping (like Level 0) with a parity disk. If a data disk fails, the parity data is used to create a replacement disk. A disadvantage of Level 4 is that the parity disk can create write bottlenecks
• RAID5: block-interleaved distributed parity
   - Provides data striping at the block level and also stripes the parity information across the disks. This results in excellent performance and good fault tolerance. Level 5 is one of the most popular implementations of RAID
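The parity used by Levels 3, 4, and 5 is the bitwise XOR of the data blocks in a stripe, which is why any single lost block can be rebuilt from the survivors. A minimal sketch with three tiny made-up data blocks follows; the function name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks (made-up contents) plus their parity block
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(data)

# The disk holding data[1] fails: rebuild it from the other blocks and the parity
rebuilt = xor_blocks([data[0], data[2], parity])
```

Level 4 keeps every parity block on one disk, so every write touches that disk; Level 5 places the parity block of each stripe on a different disk, which spreads that load.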
Compact Disc (CD)
• CD-ROM 1X: 150 KB/sec
• CD-ROM 40X: 150 x 40 = 6 MB/sec
DVD (Digital Versatile Disk)
• DVD+R
   - A non-rewritable format, compatible with about 89% of all DVD players and most DVD-ROMs
• DVD+R/W
   - Has some "better" features than DVD-R/W, such as lossless linking and both CAV and CLV writing
• DVD-R
   - A non-rewritable format, compatible with about 93% of all DVD players and most DVD-ROMs
• DVD-R/W
   - Was the first DVD recording format released that was compatible with standalone DVD players
DVD Sizes
• DVD-5 holds around 4 700 000 000 bytes, which is 4.37 computer GB (where 1 kbyte is 1024 bytes). DVD+R/W and DVD-R/W support this format. Also called Single Sided Single Layered. This is the most common DVD media, often called 4.7 GB media.
• DVD-10 holds around 9 400 000 000 bytes, which is 8.75 computer GB. DVD+R/W and DVD-R/W support this format. Also called Double Sided Single Layered.
• DVD-9 holds around 8 540 000 000 bytes, which is 7.95 computer GB. DVD+R supports this format. Also called Single Sided Dual Layered. This media is called DVD+R9, DVD+R DL, or 8.5 GB media.
• DVD-18 holds around 17 080 000 000 bytes, which is 15.9 computer GB. DVD+R supports this format. Also called Double Sided Dual Layered.
DVD write and read speeds
• Single Layer (4.7 GB) write speeds
   - 1x (CLV) = about 58 minutes
   - 2x (CLV) = about 29 minutes
   - 2.4x (CLV) = about 24 minutes
   - 4x (CLV) = about 14.5 minutes
   - 6x (CLV/ZCLV) = about 10-12 minutes
   - 8x (PCAV/ZCLV) = about 8-10 minutes
   - 12x (PCAV/ZCLV) = about 6.5-7.5 minutes
   - 16x (CAV/ZCLV) = about 6-7 minutes
• Dual/Double Layer (8.5 GB) write speeds
   - 1x CLV = about 105 minutes
   - 2.4x CLV = about 44 minutes
   - 4x CLV = about 27 minutes
• Single Layer (4.7 GB) read speeds
   - 6x CAV (avg. ~4x): max read speed 7.93 MB/s = ~14 minutes
   - 8x CAV (avg. ~6x): max read speed 10.57 MB/s = ~10 minutes
   - 12x CAV (avg. ~8x): max read speed 15.85 MB/s = ~7 minutes
   - 16x CAV (avg. ~12x): max read speed 21.13 MB/s = ~5 minutes
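[Figure 5.33: Organization of data on magnetic tape. Files are delimited by file marks and separated by file gaps; each file consists of records separated by record gaps; 7 or 9 bits are recorded in parallel across the tape.]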