Construction of Guaranteed Error-Free Operating Systems

advertisement
Theory of Memory
W. Paul
Saarland University and DFKI
bmb+f project Verisoft-XT
joint work with
Ulan Degebaev and Norbert Schirmer
Saarland University
Why might this be important?
• Unites theories of
– store buffers
– interlocking
– caches
– cache coherence
– out of order execution
– X64 instruction set
– address translation
– optimized compilation
– structured parallel C semantics
• Explains why hypervisor might run structured parallel C
• VCC is supposed to mirror structured parallel C semantics
• thus VCC might be(come) sound
Specifying Memory
x
M(x)
Store Buffer
memory M
sbuf(y)
r(j)
w(i)
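The store buffer picture can be read as a small state machine. The following is a minimal sketch, not the formal model of the talk: it assumes a single processor, a FIFO buffer, and read forwarding from the newest buffered write. The class name `StoreBufferedMemory` and its methods are hypothetical names chosen for illustration.

```python
from collections import deque


class StoreBufferedMemory:
    """Toy model (assumption): one processor, one FIFO store buffer in front of memory M."""

    def __init__(self):
        self.M = {}          # memory: address -> value
        self.sbuf = deque()  # pending writes (address, value), oldest first

    def write(self, addr, value):
        # A write only enters the store buffer; memory M is unchanged for now.
        self.sbuf.append((addr, value))

    def read(self, addr):
        # Forward from the newest buffered write to addr, if there is one.
        for a, v in reversed(self.sbuf):
            if a == addr:
                return v
        # Otherwise read memory (unwritten cells default to 0 in this toy).
        return self.M.get(addr, 0)

    def drain_one(self):
        # Commit the oldest buffered write to memory.
        if self.sbuf:
            a, v = self.sbuf.popleft()
            self.M[a] = v


mem = StoreBufferedMemory()
mem.write(0x10, 7)
local_view = mem.read(0x10)       # forwarded from the buffer
memory_view = mem.M.get(0x10, 0)  # what an observer of M alone would see
mem.drain_one()
after_drain = mem.M.get(0x10, 0)
```

Between the write and the drain, the writing processor and an observer of M disagree about the value at the address. That gap is what a statement such as "sbufs invisible" has to rule out.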
Caches
M
ca
Many Caches: Snooping
M
ca(1)
ca(p)
Many Caches
x.la
M
ca(1)
ca(p)
x.off
Many Caches
x.off
M
ca(1)
ca(p)
Overlapping Transactions
c
public (a)
b
a
c
c
Sequentially Consistent Memory
lemma 5
c
public (a)
b
a
c
c
Tomasulo Schedulers for OOO
[Figure: IF, issue, reservation stations, functional units, CDB, ROB, WB]
Two Memory Units
[Figure: m, RS, MMU, RS, sbuf, functional units, LS, CDB, ROB]
Single Processor OOO correctness
lemma 6
[Figure: m, RS, MMU, RS, sbuf, functional units, LS, CDB, ROB]
Multi Processor OOO implementation
[Figure: m, RS, MMU, RS, sbuf, functional units, LS, CDB, data(i,j), ROB]
Multi Processor OOO correctness
lemma 7
[Figure: m, RS, MMU, RS, sbuf, functional units, LS, CDB, data(i,j), ROB]
X64 architecture
[Figure: mm, ca, sbuf, acc, mmu, tlb, acc, core, CR3, segmentation, R]
• CPU core
– R: user registers
– SR: system registers
• CR3
– acc: access
– segmentation
• mmu: memory management unit
– tlb: translation lookaside buffer
• memory system
– mm: main memory
– ca: cache
– sbuf: store buffer
segmentation off
lemma 8
[Figure: mm, ca, sbuf, acc, mmu, tlb, acc, core, CR3, segmentation, R]
• 1 segment
• large as entire address space
• segmentation invisible
Bad news: cache state is visible
[Figure: mm or devices, ca, sbuf, acc, mmu, tlb, acc, core, CR3, R]
• CPU core
– acc: access
• acc.adr: address
• acc.r: rights (user, write, exe)
• acc.data
• acc.mmode: memory mode
– WB: write back
– WT: write through ...
– NC: no cache
Good News: no device, no NC mode
[Figure: mm, ca, sbuf, acc, mmu, tlb, acc, core, CR3, R]
• acc.mmode: memory mode
– WB: write back
– WT: write through ...
– NC: no cache (not used)
Sequentially Consistent Physical Memory
lemma 9
[Figure: PM, sbuf, acc, mmu, tlb, acc, core, CR3, R]
• acc.mmode: memory mode
– WB: write back
– WT: write through ...
– mix on same address
• PM: sequentially consistent physical memory abstraction
– Proof: MOESI invariants are maintained
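The slide does not spell out which MOESI invariants are meant. As a hedged illustration, the sketch below checks the standard pairwise compatibility of MOESI states for one cache line across several caches. The table `COMPATIBLE` and the function `moesi_invariant_holds` are names introduced here, not taken from the talk.

```python
from itertools import combinations

# For each state of a line in one cache, the states that another cache
# may hold for the same line (standard MOESI compatibility table).
COMPATIBLE = {
    "M": {"I"},
    "O": {"S", "I"},
    "E": {"I"},
    "S": {"S", "O", "I"},
    "I": {"M", "O", "E", "S", "I"},
}


def moesi_invariant_holds(states):
    """Check one cache line: every pair of caches must hold compatible states."""
    return all(b in COMPATIBLE[a] for a, b in combinations(states, 2))


one_modified = moesi_invariant_holds(["M", "I", "I"])      # single dirty copy
owner_and_sharers = moesi_invariant_holds(["O", "S", "S"])  # owner plus shared copies
two_owners = moesi_invariant_holds(["O", "O"])              # two owners: violation
modified_and_shared = moesi_invariant_holds(["M", "S"])     # dirty copy plus sharer: violation
```

A proof along the lines of the slide would show that every protocol step preserves such a predicate for every line, so that all valid copies agree and PM can be treated as one sequentially consistent memory.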
Initialize page tables
[Figure: page tables, PM, sbuf, acc, mmu, tlb, acc, core, CR3, R]
• 1 processor
– sbuf invisible
• operating mode: paging disabled
– mmu invisible
• set up page table tree in PM
Translated Linear Memory
[Figure: page tables, PM, sbuf, acc, mmu, tlb, acc, core, CR3, R]
• many processors
• operating mode: paging enabled
• keep tlb consistent
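To make "keep tlb consistent" concrete, here is a minimal sketch of a page table walk with a translation cache. It assumes the usual x64 layout of four levels with 9 index bits each and a 12-bit page offset, and it ignores rights, memory modes, and large pages. The dictionary representation of PM and the names `translate`, `MMU`, `lookup`, and `invalidate` are hypothetical.

```python
PAGE_BITS = 12   # 4 KiB pages
INDEX_BITS = 9   # 512 entries per table
LEVELS = 4       # four-level page table tree


def split_linear_address(la):
    """Split a linear address into per-level table indices (root first) and page offset."""
    offset = la & ((1 << PAGE_BITS) - 1)
    indices = []
    for level in reversed(range(LEVELS)):
        shift = PAGE_BITS + INDEX_BITS * level
        indices.append((la >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


def translate(pm, cr3, la):
    """Walk the tree in pm (table base -> {index: next base}); None on a missing entry."""
    indices, offset = split_linear_address(la)
    base = cr3
    for idx in indices:
        table = pm.get(base, {})
        if idx not in table:
            return None
        base = table[idx]
    return base + offset


class MMU:
    """Toy MMU (assumption): caches page translations in a dict used as tlb."""

    def __init__(self, pm, cr3):
        self.pm, self.cr3, self.tlb = pm, cr3, {}

    def lookup(self, la):
        page = la >> PAGE_BITS
        if page not in self.tlb:
            frame = translate(self.pm, self.cr3, page << PAGE_BITS)
            if frame is None:
                return None
            self.tlb[page] = frame
        return self.tlb[page] + (la & ((1 << PAGE_BITS) - 1))

    def invalidate(self, la):
        # Drop the cached translation for the page containing la.
        self.tlb.pop(la >> PAGE_BITS, None)


pm = {0x1000: {0: 0x2000}, 0x2000: {0: 0x3000}, 0x3000: {0: 0x4000}, 0x4000: {1: 0x9000}}
mmu = MMU(pm, cr3=0x1000)
first = mmu.lookup(0x102A)
pm[0x4000][1] = 0xA000      # software changes the page table in PM
stale = mmu.lookup(0x102A)  # tlb still holds the old translation
mmu.invalidate(0x102A)
fresh = mmu.lookup(0x102A)
```

The `stale` lookup shows why software must invalidate tlb entries after editing page tables. On real x64 hardware this is the role of instructions such as INVLPG; with many processors, each tlb has to be brought up to date.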
Translated Consistent Linear Memory + sbufs
lemma 10
[Figure: LM, page tables, sbuf, acc, core, CR3, R]
• many processors
• operating mode: paging enabled
• keep tlb consistent
C0: Pascal with C syntax
configurations
[Figure: memory m, va(c,(m,i)), ba(m,i), size(m,i)]
• c = (pr, rd, lms, hm, gm)
– pr: program rest
– rd: recursion depth
– lms: [0 : recursion depth] → {local memories}
– hm: heap memory
– gm: global memory
• subvariables
– (m,i)[17].gpr[3]
• value of pointers: subvariables !
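The configuration tuple and the notion of a subvariable can be sketched as a small data structure. This is an illustration under assumptions, not the C0 definition itself: memories are modelled as Python dictionaries, a subvariable is a variable (m, i) plus a path of selectors, and the names `C0Config`, `memory`, and the `path` parameter of `va` are introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class C0Config:
    pr: list                                 # program rest: statements still to execute
    rd: int                                  # recursion depth
    lms: list                                # local memories, one dict per depth 0..rd
    hm: dict = field(default_factory=dict)   # heap memory
    gm: dict = field(default_factory=dict)   # global memory

    def memory(self, m):
        """Select a memory: "gm", "hm", or ("lm", depth) for a local memory."""
        if m == "gm":
            return self.gm
        if m == "hm":
            return self.hm
        _, depth = m
        return self.lms[depth]


def va(c, var, path=()):
    """Value of variable (m, i) in configuration c, descending along a subvariable path."""
    m, i = var
    value = c.memory(m)[i]
    for selector in path:
        value = value[selector]
    return value


# A global array of 32 records, each with 8 general purpose registers.
procs = [{"gpr": [0] * 8} for _ in range(32)]
procs[17]["gpr"][3] = 99
c = C0Config(pr=[], rd=0, lms=[{}], gm={0: procs})

# The subvariable written (m,i)[17].gpr[3] on the slide, with m = gm and i = 0.
example_value = va(c, ("gm", 0), path=(17, "gpr", 3))
```

Because a subvariable is identified by a variable and a selector path rather than by a number, a pointer value in this style names an object of the program and says nothing yet about machine addresses.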
Parallel C
[Figure: memory m, va(c,(m,i)), ba(m,i), size(m,i)]
• c = (pr, rd, lms, hm, gm)
– pr: program rest
– rd: recursion depth
– lms: [0 : recursion depth] → {local memories}
– hm: heap memory
– gm: global memory
• Share
– gm
– hm
• Interleave at steps of the small-step semantics
• Problem:
– Processor interleaves instructions of compiled programs code(p)
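The problem named on this slide can be seen in a tiny experiment. The sketch below enumerates all interleavings of two threads that each increment a shared global once. At the C level the increment is one step; at the machine level it is assumed here to compile to a load followed by a store. The operation names and the helpers `interleavings` and `run` are illustrative only.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves the order within each."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute a schedule of (thread, op) steps on a shared global x; return final x."""
    gm = {"x": 0}
    regs = {}
    for thread, op in schedule:
        if op == "incr":       # one C-level step: x = x + 1
            gm["x"] += 1
        elif op == "load":     # machine level: register := x
            regs[thread] = gm["x"]
        elif op == "store":    # machine level: x := register + 1
            gm["x"] = regs[thread] + 1
    return gm["x"]


# Interleaving at C statements: the result is always 2.
c_level = {run(s) for s in interleavings([(0, "incr")], [(1, "incr")])}

# Interleaving at machine instructions: an update can be lost.
machine_level = {
    run(s)
    for s in interleavings(
        [(0, "load"), (0, "store")],
        [(1, "load"), (1, "store")],
    )
}
```

Here `c_level` contains only 2, while `machine_level` also contains 1. The machine can reach results that the statement-level interleaving semantics cannot explain, which is the gap a correctness argument has to close.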
simulation relation consis(c, alloc, d)
[Figure: LM, alloc, (c,y), y, alloc, (c,p), p]
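One possible reading of the figure is that consis holds when linear memory stores each C variable's value at its allocated address, with pointer values translated through alloc. The sketch below assumes that reading. It takes the linear memory `lm` directly in place of the machine configuration d, and the encoding of pointers as `("ptr", name)` and the helper `encode` are hypothetical.

```python
def encode(value, alloc):
    """Machine representation of a C value: pointers become allocated addresses."""
    if isinstance(value, tuple) and value[0] == "ptr":
        return alloc[value[1]]
    return value


def consis(c_vars, alloc, lm):
    """True if every C variable is stored in lm at alloc[name] with the encoded value."""
    return all(lm.get(alloc[y]) == encode(v, alloc) for y, v in c_vars.items())


# C level: integer y and pointer p, where p points to y.
c_vars = {"y": 5, "p": ("ptr", "y")}
alloc = {"y": 0x100, "p": 0x108}

good_lm = {0x100: 5, 0x108: 0x100}  # the cell of p holds the address of y
bad_lm = {0x100: 6, 0x108: 0x100}   # y differs from the C configuration

holds = consis(c_vars, alloc, good_lm)
fails = consis(c_vars, alloc, bad_lm)
```

A step by step simulation would require such a relation after every C step; for an optimizing compiler it would only be required at the points the slides call IO-steps.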
Non optimizing compiler: step by step simulation
Optimizing compiler: simulation between IO-steps
IO-steps (1): volatile accesses
Volatiles Sequentially Consistent
lemma 11
Structured Parallel C
• Implement Locks using Volatiles
• IO-steps (2): lock release
• Run Processors alone on locked portions of linear memory
• Lemma 1: sbufs invisible
• Lemma 10: Ordinary C code in linear memory
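The lock discipline can be illustrated with ordinary threads. This is only an analogy under stated assumptions: Python's `threading.Lock` stands in for a lock built from volatile accesses, and the shared dictionary stands in for a locked portion of linear memory. It says nothing about how the locks of the talk are implemented.

```python
import threading

lock = threading.Lock()
shared = {"counter": 0}  # stands in for a lock-protected portion of memory


def worker(rounds):
    for _ in range(rounds):
        # Acquire and release are the only points where threads interact;
        # inside the block this thread runs alone on the protected data.
        with lock:
            shared["counter"] += 1


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_count = shared["counter"]
```

Because every access to the counter happens while the lock is held, each critical section behaves like ordinary sequential code, and the final count is the same for every schedule.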
Summary
• Implement Locks using Volatiles
• IO-steps (2): lock release
• Run Processors alone on locked portions of linear memory
• Lemma 1: sbufs invisible
• Lemma 10: Ordinary C code in linear memory
• Outlined correctness proof for implementation of structured parallel C
– Initialisation
– Compilation