Datorarkitektur och operativsystem (Computer Architecture and Operating Systems)
Lecture 7
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy

Memory
 It is ‘impossible’ to have memory that is both
 Unlimited (large in capacity)
 And fast
 We create an illusion for the programmer
 Before that, let us look at the way programs access memory
Principle of Locality (§5.1 Introduction)
 Programs access a small proportion of their address space at any time
 Temporal locality
 Items accessed recently are likely to be accessed again soon
 e.g., instructions in a loop
 Spatial locality
 Items near those accessed recently are likely to be accessed soon
 e.g., sequential instruction access, array data (see the C sketch below)

To Take Advantage of Locality
 Employ memory hierarchy
 Use multiple levels of memories
 ‘Larger’ distance from processor =>
• larger size
• larger access time
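As a minimal illustration (my addition, not from the slides), the following C fragment exhibits both kinds of locality:

```c
/* Sketch: both kinds of locality in one pair of loops.
   'sum' and 'i' are reused on every iteration (temporal locality);
   a[0], a[1], ... are touched in sequence (spatial locality). */
#include <stdio.h>

int main(void) {
    int a[1024];
    for (int i = 0; i < 1024; i++)
        a[i] = i;                /* sequential writes: spatial locality */

    int sum = 0;
    for (int i = 0; i < 1024; i++)
        sum += a[i];             /* 'sum' reused each iteration: temporal locality */

    printf("%d\n", sum);
    return 0;
}
```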
Memory Hierarchy (§5.1 Introduction)
Memory Technology
 Static RAM (SRAM)
 0.5ns – 2.5ns, $2000 – $5000 per GB
 Dynamic RAM (DRAM)
 50ns – 70ns, $20 – $75 per GB
 Magnetic disk
 5ms – 20ms, $0.20 – $2 per GB
 Ideal memory
 Access time of SRAM
 Capacity and cost/GB of disk
Taking Advantage of Locality
[FIGURE 5.1 The basic structure of a memory hierarchy. By implementing the memory system as a hierarchy, the user has the illusion of a memory that is as large as the largest level of the hierarchy, but can be accessed as if it were all built from the fastest memory. Flash memory has replaced disks in many embedded devices, and may lead to a new level in the storage hierarchy for desktop and server computers; see Section 6.4. Copyright © 2009 Elsevier, Inc. All rights reserved.]
Memory Hierarchy Levels
 Block (aka line): unit of copying
 May be multiple words
 Memory hierarchy
 Store everything on disk
 Copy recently accessed (and nearby) items from disk to smaller DRAM memory
• Main memory
 Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory
• Cache memory attached to CPU
 If accessed data is present in upper level
 Hit: access satisfied by upper level
• Hit ratio: hits/accesses
 If accessed data is absent
 Miss: block copied from lower level
• Time taken: miss penalty
• Miss ratio: misses/accesses = 1 – hit ratio
 Then accessed data supplied from upper level
Cache Memory (§5.2 The Basics of Caches)
 Cache memory
 The level of the memory hierarchy closest to the CPU
 Given accesses X1, …, Xn–1, Xn
 How do we know if the data is present?
 Where do we look?
[Figure: This structure, with the appropriate operating mechanisms, allows the processor to have an access time that is determined primarily by level 1 of the hierarchy and yet have a memory as large as level n. Although the local disk is normally the bottom of the hierarchy, some systems use tape or a file server over a local area network as the next levels of the hierarchy.]

Direct Mapped Cache
 Location determined by address
 Direct mapped: only one choice
 (Block address) modulo (#Blocks in cache)
 If #Blocks is a power of 2
 Use low-order address bits to compute the cache index (see the C sketch below)
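As a small sketch (my addition; the constants match the 8-block example that follows), this is how the index and tag fall out of a block address when #Blocks is a power of 2:

```c
/* Direct-mapped index/tag computation. With NUM_BLOCKS a power of 2,
   the modulo keeps just the low-order bits; the tag is the rest. */
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 8           /* 8-block cache, as in the example below */
#define INDEX_BITS 3           /* log2(NUM_BLOCKS) */

int main(void) {
    uint32_t block_addr = 22;                   /* 10110 in binary */
    uint32_t index = block_addr % NUM_BLOCKS;   /* == block_addr & (NUM_BLOCKS - 1) */
    uint32_t tag   = block_addr >> INDEX_BITS;  /* remaining high-order bits */
    printf("index = %u (binary 110), tag = %u (binary 10)\n", index, tag);
    return 0;
}
```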
Tag Bits
 Each cache location can store the contents of more than one memory location
 How do we know which particular block is stored in a cache location?
 Add a set of tag bits to the cache
 The tag only needs the high-order bits of the address
Valid Bits
 What if there is no data in a location?
 Valid bit: 1 = present, 0 = not present
 Initially 0, because when the processor starts up the cache does not hold any valid data
Cache Example
 8 blocks, 1 word/block, direct mapped
 Initial state

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    N
111    N
Cache Example

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Miss      110

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Cache Example

Word addr  Binary addr  Hit/miss  Cache block
26         11 010       Miss      010

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Hit       110
26         11 010       Hit       010

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Cache Example

Word addr  Binary addr  Hit/miss  Cache block
16         10 000       Miss      000
3          00 011       Miss      011
16         10 000       Hit       000

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example

Word addr  Binary addr  Hit/miss  Cache block
18         10 010       Miss      010

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Cache Misses
 On cache hit, CPU proceeds normally
 On cache miss
 Stall the CPU pipeline
 Fetch block from next level of hierarchy
 Instruction cache miss
• Restart instruction fetch
 Data cache miss
• Complete data access
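As a sanity check (a sketch of mine, not part of the slides), this small C program simulates the 8-block direct-mapped cache and replays the example's access sequence; its output matches the hit/miss column of the tables above:

```c
/* Minimal direct-mapped cache simulator (1 word/block, no data stored).
   Replays the word addresses from the example: 22, 26, 22, 26, 16, 3, 16, 18. */
#include <stdio.h>

#define NUM_BLOCKS 8

int main(void) {
    int valid[NUM_BLOCKS] = {0};
    unsigned tag[NUM_BLOCKS] = {0};
    unsigned addrs[] = {22, 26, 22, 26, 16, 3, 16, 18};

    for (int i = 0; i < 8; i++) {
        unsigned index = addrs[i] % NUM_BLOCKS;   /* low-order 3 bits */
        unsigned t     = addrs[i] / NUM_BLOCKS;   /* high-order bits  */
        if (valid[index] && tag[index] == t) {
            printf("addr %2u -> block %u: hit\n", addrs[i], index);
        } else {
            printf("addr %2u -> block %u: miss\n", addrs[i], index);
            valid[index] = 1;                     /* fetch block, set tag */
            tag[index]   = t;
        }
    }
    return 0;
}
```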
Write-Through
 On each data-write hit, we could just update the block in the cache
 But then cache and memory would be inconsistent
 Write through: also update memory
 But this makes writes take longer
Write Buffer
 Solution: write buffer
 Holds data waiting to be written to memory
 CPU continues immediately after writing to the write buffer
• Write buffer entry is freed later, when the memory write is completed
 But CPU stalls on a write if the write buffer is already full
Write Buffer
 Write buffer can become full and the processor will stall if
 The rate at which memory completes the write operations is less than the rate at which write instructions are generated
 Or if there is a burst of writes

Write-Back
 Alternative: On data-write hit, just update the block in cache
 Keep track of whether each block is dirty (see the sketch below)
 When a dirty block is replaced
 Write it back to memory
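A hedged sketch (my structure, not the textbook's) of the write-back bookkeeping: a write hit only touches the cache and sets the dirty bit, and a dirty victim is written back before being replaced. The `memory` array stands in for the next level of the hierarchy:

```c
/* Write-back bookkeeping sketch (assumed structure, 1 word/block). */
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 8

struct cache_block { int valid, dirty; uint32_t tag, data; };
static struct cache_block cache[NUM_BLOCKS];

static uint32_t memory[1 << 16];                  /* stand-in next level */
static uint32_t mem_read(uint32_t a)              { return memory[a]; }
static void     mem_write(uint32_t a, uint32_t d) { memory[a] = d; }

static void cache_write(uint32_t addr, uint32_t value) {
    struct cache_block *b = &cache[addr % NUM_BLOCKS];
    uint32_t tag = addr / NUM_BLOCKS;
    if (!(b->valid && b->tag == tag)) {           /* write miss: replace block */
        if (b->valid && b->dirty)                 /* dirty victim: write back first */
            mem_write(b->tag * NUM_BLOCKS + (addr % NUM_BLOCKS), b->data);
        b->data = mem_read(addr);                 /* fetch the new block */
        b->tag = tag;
        b->valid = 1;
    }
    b->data = value;                              /* update the cache only... */
    b->dirty = 1;                                 /* ...and mark the block dirty */
}

int main(void) {
    cache_write(22, 7);          /* miss: block fetched, then updated in cache */
    cache_write(22, 9);          /* hit: memory still untouched */
    printf("memory[22] = %u (stale until replacement)\n", memory[22]);
    cache_write(30, 1);          /* maps to the same block: 22 is written back */
    printf("memory[22] = %u (written back)\n", memory[22]);
    return 0;
}
```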
Measuring Cache Performance (§5.3 Measuring and Improving Cache Performance)
 Components of CPU time
 Program execution cycles
• Includes cache hit time
 Memory stall cycles
• Mainly from cache misses
 With simplifying assumptions:

Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                    = (Instructions / Program) × (Misses / Instruction) × Miss penalty
Cache Performance Example
 Given
 I-cache miss rate = 2%
 D-cache miss rate = 4%
 Miss penalty = 100 cycles
 Base CPI (ideal cache) = 2
 Loads & stores are 36% of instructions
 Miss cycles for all instructions, if I is the instruction count
 I-cache: I × 0.02 × 100 = 2 I
 D-cache: I × 0.36 × 0.04 × 100 = 1.44 I
 Miss cycles per instruction: 2 + 1.44
 Total cycles per instruction: 2 + 2 + 1.44
 So, actual CPI = 2 + 2 + 1.44 = 5.44
 How much faster would a processor with a perfect cache, one that never misses, be?
 Ideal CPU is 5.44/2 = 2.72 times faster
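A quick numeric check of this example (my sketch; the variable names are mine):

```c
/* Recomputes the CPI figures from the example above. */
#include <stdio.h>

int main(void) {
    double icache_miss_rate = 0.02;   /* 2% of all instructions miss in I-cache */
    double dcache_miss_rate = 0.04;   /* 4% of loads/stores miss in D-cache */
    double miss_penalty     = 100.0;  /* cycles */
    double base_cpi         = 2.0;    /* CPI with an ideal cache */
    double mem_instr_frac   = 0.36;   /* loads & stores per instruction */

    double icache_stalls = icache_miss_rate * miss_penalty;                  /* 2.00 */
    double dcache_stalls = mem_instr_frac * dcache_miss_rate * miss_penalty; /* 1.44 */
    double cpi = base_cpi + icache_stalls + dcache_stalls;                   /* 5.44 */

    printf("actual CPI = %.2f, speedup with perfect cache = %.2f\n",
           cpi, cpi / base_cpi);
    return 0;
}
```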
Virtual Memory (§5.4 Virtual Memory)
 VM is the technique of using main memory as a “cache” for secondary (disk) storage
 Managed jointly by CPU hardware and the operating system (OS)

VM Terms
 Same underlying concept as in caches, but different terminology
 Virtual address is the address produced by the program
 Physical address is an address in the main memory
 CPU and OS translate virtual addresses to physical addresses
 VM “block” is called a page
 VM translation “miss” is called a page fault
Motivation 1
 Multiple programs share main memory, and they can change dynamically
 To avoid writing into each other’s data, we would like a separate address space for each program
 With VM, each program gets a private virtual address space holding its frequently used code and data
 VM translates the virtual address into a physical address, allowing protection from other programs

Page Fault Penalty
 On a page fault, the page must be fetched from disk
 Takes millions of clock cycles: main memory latency is around 100,000 times better than the disk latency
 Try to minimize page fault rate
 Smart replacement algorithms implemented in software in the OS
• Reading from disk is slow enough that the software overhead is negligible
Motivation II
 A large program cannot fit into the main memory
 VM automatically maps addresses into disk space if the main memory is not sufficient

Address Translation
 Address translation: the process by which a virtual address is mapped to a physical address
 Two components of a virtual address: the virtual page number and the page offset
 The page offset does not change during translation, and its number of bits determines the size of the page (see the C sketch below)
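A sketch of the split (my addition; 4 KiB pages are an assumption, the slides do not fix a page size):

```c
/* Splitting a 32-bit virtual address, assuming 4 KiB pages (12 offset bits). */
#include <stdint.h>
#include <stdio.h>

#define PAGE_OFFSET_BITS 12    /* page size = 2^12 = 4096 bytes (assumed) */

int main(void) {
    uint32_t vaddr  = 0x00403A10;
    uint32_t vpn    = vaddr >> PAGE_OFFSET_BITS;              /* virtual page number */
    uint32_t offset = vaddr & ((1u << PAGE_OFFSET_BITS) - 1); /* unchanged by translation */
    printf("vpn = 0x%X, offset = 0x%X\n", vpn, offset);
    return 0;
}
```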
Translation Using a Page Table
 A page table, which resides in memory, is used for address translation
 Each program has its own page table
 To indicate the location of the page table in memory, a hardware register points to the start of the page table
 The number of pages addressable with the virtual address might be larger than the number of pages addressable with the physical address, which gives the illusion of an unbounded amount of virtual memory

Page Tables
 Store placement information
 Array of page table entries, indexed by virtual page number
 Page table register in CPU points to page table in physical memory
 If page is present in memory
 The entry stores the physical page number
 Plus other status bits (note the use of a valid bit)
 If page is not present
 A page fault occurs and the OS is given control
 Next few slides, we recap some OS concepts
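A hedged C sketch of the lookup this slide describes (the PTE layout, sizes, and names are my assumptions):

```c
/* Page-table walk sketch (assumed PTE layout, 4 KiB pages): index the
   table by virtual page number, check the valid bit, and rebuild the
   physical address; a cleared valid bit hands control to the OS. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_OFFSET_BITS 12
#define NUM_PAGES 16                  /* toy address space: 16 virtual pages */

struct pte { unsigned valid : 1; unsigned ppn : 20; };   /* one entry */
static struct pte page_table[NUM_PAGES];   /* located via the page table register */

static uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_OFFSET_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_OFFSET_BITS) - 1);
    if (!page_table[vpn].valid) {
        printf("page fault on vpn %u: OS takes over\n", vpn);
        return 0;                     /* real hardware would trap to the OS here */
    }
    return ((uint32_t)page_table[vpn].ppn << PAGE_OFFSET_BITS) | offset;
}

int main(void) {
    page_table[3].valid = 1;
    page_table[3].ppn   = 7;               /* virtual page 3 -> physical page 7 */
    printf("0x%X\n", translate(0x3A10));   /* vpn 3, offset 0xA10 -> 0x7A10 */
    printf("0x%X\n", translate(0x5000));   /* vpn 5 is not present: page fault */
    return 0;
}
```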
Recap: Process
 A process (a program in execution) has a context defined by the values in its program counter, registers, and page table
 If another process preempts a running process, this context must be saved

The 5-State Process Model
[Figure: state diagram with states New, Ready, Running, Blocked, and Terminated; transitions shown: create/admit (New → Ready), switch: schedule new job (Ready → Running), switch: preempt (Running → Ready), exit (Running → Terminated)]
Role of the OS
 A process (a program in execution) has a context defined by the values in its program counter, registers, and page table
 If another process preempts this process, this context must be saved
 Rather than save the entire page table, only the page table register is saved
 To restart the process in the ‘running’ state, the operating system reloads the context

Role of the OS
 The OS is responsible for allocating the physical memory and updating the page tables
 It ensures that the virtual addresses of different processes do not collide, thus providing protection
 Page fault is handled by the OS
Mapping Pages to Storage
 The OS creates a space on the disk (not main memory) for all pages of a process when the process is created
 Called swap space
 The OS also creates a record of where each virtual page of the process is located on the disk
Page Fault
 Handled by the OS
 If all pages in main memory are in use (memory is full), the OS must choose a page to replace
 The replaced page must be written to the swap space on the disk
 To reduce page fault rate
 Prefer least-recently used (LRU) replacement (see the sketch below)
 Predict that a page that was NOT used recently will NOT be used in the near future

Writes
 Disk writes take millions of cycles
 Write-through is impractical, because it takes millions of cycles to write to the disk
• Even building a write buffer is impractical
 VM uses write-back
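A minimal sketch of the LRU idea (my code; real operating systems only approximate LRU, e.g., with reference bits, since exact per-access timestamps are too expensive):

```c
/* LRU victim selection sketch: evict the frame whose last use is oldest. */
#include <stdint.h>
#include <stdio.h>

#define NUM_FRAMES 4

static uint64_t last_used[NUM_FRAMES];
static uint64_t now;                         /* logical clock */

static void touch(int frame) { last_used[frame] = ++now; }

static int choose_victim(void) {
    int victim = 0;
    for (int f = 1; f < NUM_FRAMES; f++)
        if (last_used[f] < last_used[victim])
            victim = f;                      /* least recently used so far */
    return victim;
}

int main(void) {
    touch(0); touch(1); touch(2); touch(3);
    touch(0);                                /* frame 1 is now the oldest */
    printf("evict frame %d\n", choose_victim());
    return 0;
}
```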
Fast Translation Using a TLB
 Address translation would appear to require extra memory references
 One to access the page table itself
 Then the actual memory access
 But access to page tables has good locality
 So use a fast cache of recently used translations
 Called a Translation Look-aside Buffer (TLB)

Memory Protection
 VM allows different processes to share the same main memory
 Need to protect against errant access
 Requires OS assistance
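A sketch of the TLB lookup described above (sizes, layout, and the refill policy are my assumptions; the page-table walk is stubbed out):

```c
/* TLB sketch: a small fully associative cache of recent translations is
   checked before the page-table walk. */
#include <stdint.h>
#include <stdio.h>

#define TLB_ENTRIES 16
#define PAGE_OFFSET_BITS 12

struct tlb_entry { int valid; uint32_t vpn, ppn; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* Stand-in for the full page-table walk (identity mapping for the demo). */
static uint32_t walk_page_table(uint32_t vpn) { return vpn; }

static uint32_t translate_with_tlb(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_OFFSET_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_OFFSET_BITS) - 1);
    for (int i = 0; i < TLB_ENTRIES; i++)            /* associative search */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            printf("TLB hit on vpn %u\n", vpn);
            return (tlb[i].ppn << PAGE_OFFSET_BITS) | offset;
        }
    printf("TLB miss on vpn %u: walking page table\n", vpn);
    uint32_t ppn = walk_page_table(vpn);
    tlb[vpn % TLB_ENTRIES] = (struct tlb_entry){1, vpn, ppn};  /* simple refill */
    return (ppn << PAGE_OFFSET_BITS) | offset;
}

int main(void) {
    translate_with_tlb(0x3A10);   /* miss, refills the TLB */
    translate_with_tlb(0x3B20);   /* same page: hit (locality) */
    return 0;
}
```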
Memory Protection
 Hardware support for OS protection
 Support two modes
• Privileged supervisor mode (aka kernel mode), meaning that the OS is running
• User mode
 Privileged instructions that only the OS can use
• Allow it to write to the supervisor bit and the page table pointer
 Mechanisms (e.g., special instructions) to switch between supervisor mode and user mode (e.g., syscall in MIPS)
 These features allow the OS to change page tables while preventing a user process from changing them

Concluding Remarks (§5.12 Concluding Remarks)
 Fast memories are small, large memories are slow
 We really want fast, large memories
 Caching gives this illusion
 Principle of locality
 Programs use a small part of their memory space frequently
 Memory hierarchy
 L1 cache ↔ L2 cache ↔ … ↔ DRAM memory ↔ disk
Exam Questions
 Questions will be in English
 You may answer in Swedish or English
 Dictionary is allowed
 Swedish to English
 English to Swedish
 But, an electronic dictionary is NOT allowed
 Most likely a mixture of problem-oriented questions and subjective questions
 Problem-oriented questions
 Like questions in homework
 Subjective questions …
Subjective questions like …
• What does the control unit do?
• What is RISC? CISC?
• Why are fixed-length instructions good?
• What is the principle of locality?

Exam Questions
 Questions from the past year are on the course website
 You may look at them for ‘some’ inspiration
 Note: The past year had a different instructor, a different textbook, and different homework, even if there are some common topics
From the Textbook
 Parts from: 5.1, 5.2, 5.3, 5.4