Datorarkitektur och operativsystem
Lecture 7
§5.1 Introduction

Memory
- It is 'impossible' to have memory that is both
  • unlimited (large in capacity)
  • and fast
- We create an illusion for the programmer
- Before that, let us look at the way programs access memory
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 2
Principle of Locality
- Programs access a small proportion of their address space at any time
- Temporal locality
  • Items accessed recently are likely to be accessed again soon
  • e.g., instructions in a loop
- Spatial locality
  • Items near those accessed recently are likely to be accessed soon
  • e.g., sequential instruction access, array data
To Take Advantage of Locality
- Employ a memory hierarchy
- Use multiple levels of memories
- 'Larger' distance from processor =>
  • larger size
  • larger access time
Memory Technology
 Static
S
RAM (SRAM)
 0.5ns – 2.5ns, $2000 – $5000 per GB
 Dynamic RAM (DRAM)
 50ns – 70ns,
70ns $20 – $75 per GB
 Magnetic disk
 5ms
5 – 20ms,
20
$0
$0.20
20 – $2 per GB
 Ideal memory
 Access time of SRAM
 Capacity
p y and cost/GB of disk
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy —5
Memory Hierarchy
FIGURE 5.1 The basic structure of a memory hierarchy. By implementing the memory system as a hierarchy, the user has the illusion of a memory that is as large as the largest level of the hierarchy, but can be accessed as if it were all built from the fastest memory. Flash memory has replaced disks in many embedded devices, and may lead to a new level in the storage hierarchy for desktop and server computers; see Section 6.4. Copyright © 2009 Elsevier, Inc. All rights reserved.
Taking Advantage of Locality
- Memory hierarchy
- Store everything on disk
- Copy recently accessed (and nearby) items from disk to smaller DRAM memory
  • main memory
- Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory
  • cache memory attached to CPU
Memory Hierarchy Levels
- Block (aka line): unit of copying
  • may be multiple words
- If accessed data is present in the upper level
  • Hit: access satisfied by upper level
  • Hit ratio: hits/accesses
- If accessed data is absent
  • Miss: block copied from lower level
  • Time taken: miss penalty
  • Miss ratio: misses/accesses = 1 – hit ratio
  • Then accessed data is supplied from the upper level
This structure, with the appropriate operating mechanisms, allows the processor to have an access time that is determined primarily by level 1 of the hierarchy and yet have a memory as large as level n. Although the local disk is normally the bottom of the hierarchy, some systems use tape or a file server over a local area network as the next levels of the hierarchy.

§5.2 The Basics of Caches

Cache Memory
- The level of the memory hierarchy closest to the CPU
- Given accesses X1, …, Xn–1, Xn
  • How do we know if the data is present?
  • Where do we look?
Direct Mapped Cache
- Location determined by address
- Direct mapped: only one choice
  • (Block address) modulo (#Blocks in cache)
- If #Blocks is a power of 2
  • use the low-order address bits to compute the cache index
Tag Bits
- Each cache location can store the contents of more than one memory location
- How do we know which particular block is stored in a cache location?
  • Add a set of tag bits to the cache
  • The tag needs only the high-order bits of the address
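The index/tag split described above can be sketched in a few lines of Python. The parameters (8 blocks, 1 word per block, 5-bit word addresses) are assumptions chosen to match the worked example on the following slides, not fixed properties of caches.

```python
# Sketch: splitting a block address into cache index and tag for a
# direct-mapped cache. NUM_BLOCKS must be a power of 2 so that the
# index is simply the low-order address bits.

NUM_BLOCKS = 8
INDEX_BITS = NUM_BLOCKS.bit_length() - 1   # log2(8) = 3

def split_address(block_addr):
    index = block_addr % NUM_BLOCKS        # low-order bits select the slot
    tag = block_addr >> INDEX_BITS         # remaining high-order bits
    return index, tag

# Word address 22 = 0b10110 -> index 0b110 (6), tag 0b10 (2)
print(split_address(22))   # (6, 2)
```

Because 8 = 2³, the modulo and the shift together partition the address bits exactly, which is why direct-mapped caches use power-of-two block counts.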
Valid Bits
- What if there is no data in a location?
  • Valid bit: 1 = present, 0 = not present
  • Initially 0, because when the processor starts up the cache does not hold any valid data
Cache Example
- 8 blocks, 1 word/block, direct mapped
- Initial state

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    N
111    N
Cache Example

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Miss      110

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example

Word addr  Binary addr  Hit/miss  Cache block
26         11 010       Miss      010

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Hit       110
26         11 010       Hit       010

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example

Word addr  Binary addr  Hit/miss  Cache block
16         10 000       Miss      000
3          00 011       Miss      011
16         10 000       Hit       000

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example

Word addr  Binary addr  Hit/miss  Cache block
18         10 010       Miss      010

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
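The access sequence in the example above can be replayed with a minimal direct-mapped cache simulator. This is a sketch under the example's assumptions (8 blocks, 1 word per block, word addresses); the names are illustrative.

```python
# Minimal direct-mapped cache: one valid bit and one tag per block.
NUM_BLOCKS = 8
valid = [False] * NUM_BLOCKS
tags = [None] * NUM_BLOCKS

def access(word_addr):
    index = word_addr % NUM_BLOCKS     # which slot
    tag = word_addr // NUM_BLOCKS      # which block maps there
    if valid[index] and tags[index] == tag:
        return "Hit"
    valid[index], tags[index] = True, tag   # copy block from lower level
    return "Miss"

for addr in [22, 26, 22, 26, 16, 3, 16, 18]:
    print(addr, access(addr))
# 22 Miss, 26 Miss, 22 Hit, 26 Hit, 16 Miss, 3 Miss, 16 Hit, 18 Miss
```

Note the final access: 18 maps to index 010, which already holds address 26, so the block is replaced, exactly as in the last table.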
Cache Misses
- On cache hit, CPU proceeds normally
- On cache miss
  • Stall the CPU pipeline
  • Fetch block from next level of hierarchy
  • Instruction cache miss: restart instruction fetch
  • Data cache miss: complete data access
Write-Through
- On each data-write hit, we could just update the block in cache
  • But then cache and memory would be inconsistent
- Write through: also update memory
  • But this makes writes take longer
Write Buffer
- Solution: write buffer
  • Holds data waiting to be written to memory
  • CPU continues immediately after writing to the write buffer
  • Write buffer entry is freed later, when the memory write completes
  • But the CPU stalls on a write if the write buffer is already full
Write Buffer
- The write buffer can become full and the processor will stall if
  • the rate at which memory completes writes is less than the rate at which write instructions are generated
  • or if there is a burst of writes
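The buffering behaviour above can be sketched as a fixed-capacity FIFO between the CPU and memory. The capacity and function names are illustrative assumptions, not part of any real design.

```python
# Sketch of a write buffer: stores retire as soon as they are queued;
# the CPU stalls only when the buffer is full.
from collections import deque

BUFFER_CAPACITY = 4
write_buffer = deque()

def cpu_store(addr, data):
    if len(write_buffer) >= BUFFER_CAPACITY:
        return "stall"                    # CPU must wait for memory
    write_buffer.append((addr, data))     # store completes immediately
    return "ok"

def memory_drain_one():
    if write_buffer:
        write_buffer.popleft()            # memory finishes one write

for i in range(5):
    print(cpu_store(i, i))                # fifth store stalls
```

A burst of five stores fills the four-entry buffer, so the fifth stalls until `memory_drain_one` frees an entry, which is the stall condition described on the slide.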
Write-Back
- Alternative: on data-write hit, just update the block in cache
- Keep track of whether each block is dirty
- When a dirty block is replaced
  • write it back to memory
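The dirty-bit bookkeeping can be sketched for a single direct-mapped cache. This is a hypothetical illustration (8 blocks, word addresses, a `memory_writes` log standing in for memory), not a full simulator.

```python
# Write-back: writes only set a dirty bit; memory is updated when a
# dirty block is evicted.
NUM_BLOCKS = 8
cache = [{"valid": False, "tag": None, "dirty": False, "data": None}
         for _ in range(NUM_BLOCKS)]
memory_writes = []                        # log of write-backs to memory

def write(block_addr, data):
    index, tag = block_addr % NUM_BLOCKS, block_addr // NUM_BLOCKS
    line = cache[index]
    if line["valid"] and line["tag"] != tag and line["dirty"]:
        # evicting a dirty block: write it back to memory first
        memory_writes.append((line["tag"] * NUM_BLOCKS + index, line["data"]))
    line.update(valid=True, tag=tag, dirty=True, data=data)

write(22, "A")        # miss: install block, mark dirty
write(22, "B")        # hit: update in cache only, no memory traffic
write(30, "C")        # same index (110): dirty block 22 is written back
print(memory_writes)  # [(22, 'B')]
```

Note that memory sees only the final value "B", never the intermediate "A": repeated writes to a cached block cost no memory traffic until eviction.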
§5.3 Measuring and Improving Cache Performance

Measuring Cache Performance
- Components of CPU time
  • Program execution cycles (includes cache hit time)
  • Memory stall cycles (mainly from cache misses)
- With simplifying assumptions:

  Memory stall cycles
    = (Memory accesses / Program) × Miss rate × Miss penalty
    = (Instructions / Program) × (Misses / Instruction) × Miss penalty
Cache Performance Example
- Given
  • I-cache miss rate = 2%
  • D-cache miss rate = 4%
  • Miss penalty = 100 cycles
  • Base CPI (ideal cache) = 2
  • Loads & stores are 36% of instructions
- How much faster would a processor with a perfect cache that never misses be?
Cache Performance Example
- Miss cycles for all instructions, if I is the instruction count
  • I-cache: I × 0.02 × 100 = 2 I
  • D-cache: I × 0.36 × 0.04 × 100 = 1.44 I
- Miss cycles per instruction: 2 + 1.44
- Total cycles per instruction: 2 + 2 + 1.44
- So, actual CPI = 2 + 2 + 1.44 = 5.44
- Ideal CPU is 5.44/2 = 2.72 times faster
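The arithmetic above, spelled out. All numbers come from the example itself (2% I-miss, 4% D-miss, 100-cycle penalty, base CPI 2, 36% loads/stores); only the variable names are added.

```python
# Miss-cycle arithmetic for the cache performance example.
base_cpi = 2.0
i_miss_cycles = 0.02 * 100            # I-cache miss cycles/instruction = 2.0
d_miss_cycles = 0.36 * 0.04 * 100     # D-cache miss cycles/instruction = 1.44

actual_cpi = base_cpi + i_miss_cycles + d_miss_cycles
speedup = actual_cpi / base_cpi       # perfect cache runs at base CPI

print(round(actual_cpi, 2), round(speedup, 2))   # 5.44 2.72
```

The D-cache term is scaled by 0.36 because only loads and stores reference the data cache, whereas every instruction is fetched through the I-cache.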
§5.4 Virtual Memory

Virtual Memory
- VM is the technique of using main memory as a "cache" for secondary (disk) storage
- Managed jointly by CPU hardware and the operating system (OS)
- Same underlying concept as in caches, but with different terminology
VM Terms
- Virtual address: the address produced by the program
- Physical address: an address in the main memory
- CPU and OS translate virtual addresses to physical addresses
- A VM "block" is called a page
- A VM translation "miss" is called a page fault
Page Fault Penalty
- On a page fault, the page must be fetched from disk
  • Takes millions of clock cycles; main memory latency is around 100,000 times better than disk latency
- Try to minimize the page fault rate
  • Smart replacement algorithms, implemented in software in the OS
  • Reading from disk is slow enough that the software overhead is negligible
Motivation I
- Multiple programs share main memory, and they can change dynamically
  • To avoid writing into each other's data, we would like a separate address space for each program
  • With VM, each program gets a private virtual address space holding its frequently used code and data
  • VM translates the virtual address into a physical address, allowing protection from other programs
Motivation II
- A large program may not fit into main memory
  • VM automatically maps addresses into disk space if main memory is not sufficient
Address Translation
- Address translation: the process by which a virtual address is mapped to a physical address
Address Translation
- Two components of a virtual address: virtual page number and page offset
- The page offset does not change under translation, and its number of bits determines the size of the page
- The number of pages addressable with a virtual address may be larger than the number addressable with a physical address, which gives the illusion of an unbounded amount of virtual memory
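The two-component split can be sketched directly in Python. The 4 KiB page size (12 offset bits) is an assumption for illustration; the slides do not fix a page size.

```python
# Splitting a virtual address into virtual page number and page offset.
PAGE_OFFSET_BITS = 12
PAGE_SIZE = 1 << PAGE_OFFSET_BITS       # 4096 bytes

def split_virtual_address(va):
    vpn = va >> PAGE_OFFSET_BITS        # translated via the page table
    offset = va & (PAGE_SIZE - 1)       # unchanged by translation
    return vpn, offset

print(split_virtual_address(0x12345))   # (18, 837), i.e. (0x12, 0x345)
```

Only the VPN is looked up in the page table; the offset is carried over unchanged into the physical address, which is why the offset width determines the page size.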
Translation Using a Page Table
- A page table, residing in memory, is used for address translation
- Each program has its own page table
- A hardware register points to the start of the page table in memory
- Note the use of a valid bit
Page Tables
- Store placement information
  • Array of page table entries, indexed by virtual page number
  • Page table register in CPU points to the page table in physical memory
- If the page is present in memory
  • the entry stores the physical page number
  • plus other status bits
- If the page is not present
  • a page fault occurs and the OS is given control
- Next few slides, we recap some OS concepts
Recap: Process
- A process (a program in execution) has a context defined by the values in its program counter, registers, and page table
- If another process preempts a running process, this context must be saved
The 5-State Process Model
[Diagram: states New, Ready, Running, Blocked, Terminated; transitions include create → New, admit → Ready, switch: schedule new job → Running, switch: preempt → Ready, exit → Terminated]
Role of the OS
- A process (a program in execution) has a context defined by the values in its program counter, registers, and page table
- If another process preempts this process, this context must be saved
  • Rather than saving the entire page table, only the page table register is saved
- To restart the process in the 'running' state, the operating system reloads the context
Role of the OS
- The OS is responsible for allocating physical memory and updating the page tables
- It ensures that the virtual addresses of different processes do not collide, thus providing protection
- Page faults are handled by the OS
Role of the OS
- The OS creates a space on the disk (not main memory) for all pages of a process when the process is created
  • called the swap space
- The OS also creates a record of where each virtual address of the process is located on the disk
Mapping Pages to Storage
Page Fault
- Handled by the OS
- If all pages in main memory are in use (it is full), the OS must choose a page to replace
  • The replaced page must be written to the swap space on the disk
- To reduce the page fault rate,
  • prefer least-recently used (LRU) replacement
  • predict that a page that was NOT used recently will NOT be used in the near future
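LRU replacement as described above can be sketched with an ordered map that keeps pages in recency order. The frame count and names are illustrative assumptions; real operating systems use approximations of LRU rather than this exact bookkeeping.

```python
# Sketch of LRU page replacement over a fixed number of page frames.
from collections import OrderedDict

NUM_FRAMES = 3
frames = OrderedDict()                    # page -> contents, oldest first

def touch(page):
    if page in frames:
        frames.move_to_end(page)          # mark as most recently used
        return "hit"
    if len(frames) >= NUM_FRAMES:
        frames.popitem(last=False)        # evict least recently used page
    frames[page] = "data"
    return "page fault"

for p in [1, 2, 3, 1, 4, 2]:
    print(p, touch(p))
# 1 fault, 2 fault, 3 fault, 1 hit, 4 fault (evicts 2), 2 fault (evicts 3)
```

The reference to page 1 just before page 4 is what saves page 1 from eviction: LRU rewards recent use, implementing the prediction stated on the slide.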
Writes
- Disk writes take millions of cycles
  • Write-through is impractical, because it takes millions of cycles to write to the disk
  • Building a write buffer for this is impractical
- VM uses write-back
Fast Translation Using a TLB
- Address translation would appear to require extra memory references
  • One to access the page table itself
  • Then the actual memory access
- But access to page tables has good locality
  • So use a fast cache of recently used translations
  • Called a Translation Look-aside Buffer (TLB)
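A TLB is just a small cache of recent VPN → PPN translations sitting in front of the page table. The sketch below makes that concrete; the page table contents, TLB size, and eviction policy (simple FIFO-style, not true LRU) are all hypothetical.

```python
# Sketch of a TLB in front of a page table.
page_table = {0: 7, 1: 3, 2: 9}      # virtual page -> physical page (made up)
tlb = {}                             # cache of recently used translations
TLB_SIZE = 2

def translate(vpn):
    if vpn in tlb:
        return tlb[vpn], "TLB hit"   # no extra memory reference
    ppn = page_table[vpn]            # extra memory reference on a miss
    if len(tlb) >= TLB_SIZE:
        tlb.pop(next(iter(tlb)))     # evict oldest entry (not LRU)
    tlb[vpn] = ppn
    return ppn, "TLB miss"

print(translate(0))   # (7, 'TLB miss')
print(translate(0))   # (7, 'TLB hit')
```

The second lookup of the same page skips the page-table reference entirely, which is the whole point: page-table accesses have good locality, so a small cache of translations absorbs most of them.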
Memory Protection
- VM allows different processes to share the same main memory
- Need to protect against errant access
- Requires OS assistance
Memory Protection
- Hardware support for OS protection
  • Support two modes: privileged supervisor mode (aka kernel mode), meaning that the OS is running, and user mode
  • Privileged instructions that only the OS can use, e.g., to write the supervisor bit and the page table pointer
  • Mechanisms (e.g., special instructions) to switch between supervisor mode and user mode (e.g., syscall in MIPS)
- These features allow the OS to change page tables while preventing a user process from changing them
§5.12 Concluding Remarks

Concluding Remarks
- Fast memories are small, large memories are slow
  • We really want fast, large memories
  • Caching gives this illusion
- Principle of locality
  • Programs use a small part of their memory space frequently
- Memory hierarchy
  • L1 cache ↔ L2 cache ↔ … ↔ DRAM memory ↔ disk
Exam
- Questions will be in English
- You may answer in Swedish or English
- A dictionary is allowed
  • Swedish to English
  • English to Swedish
- But an electronic dictionary is NOT allowed
Exam Questions
- Most likely a mixture of problem-oriented questions and subjective questions
- Problem-oriented questions
  • like the questions in homework
- Subjective questions …
Subjective questions like …
- What does the control unit do?
- What is RISC? CISC?
- Why are fixed-length instructions good?
- What is the principle of locality?
Exam Questions
- Questions from past years are on the course website
  • You may look at them for 'some' inspiration
- Note: past years had a different instructor, a different textbook, and different homework, even if there are some common topics
From the Textbook
- Parts from: 5.1, 5.2, 5.3, 5.4