
Efficient Virtual Memory for Big Memory Servers
U Wisc and HP Labs
ISCA’13
Key points
• Big memory workloads
 Memcached, databases, graph analysis
• Analysis shows
 TLB misses can account for up to 51% of execution time
 The rich features of paged VM are not needed by most of these applications
• Proposal: Direct Segments
 Paged VM as usual where needed
 Segmentation where possible
• For big memory workloads, this eliminates 99% of data TLB misses!
Main Memory Mgmt Trends
• Physical memory capacity has grown from a few MBs to a few GBs, and now to several TBs
• Over the same period the size of the DTLB has remained fairly unchanged
 DTLB entries: Pentium III – 72, Pentium 4 – 64, Nehalem – 96, Ivy Bridge – 100
• Workloads of days gone by were also better behaved (higher locality)
• So higher memory capacity + constant TLB size + poorer-locality apps = more TLB misses
So how bad is it really?
[Bar chart, y-axis: percentage of execution cycles wasted on DTLB misses, for graph500, memcached, MySQL, NPB:BT, NPB:CG, and GUPS with 4KB, 2MB, and 1GB pages (the legend also lists Direct Segment). Bars that exceed the 35% axis limit are labeled 51.1%, 83.1%, and 51.3%.]
Main Features of Paged VM
Feature | Analysis | Verdict
Swapping | No swapping observed | Not required
Per-page access permissions | 99% of pages are read-write | Overkill
Fragmentation management | Very little OS-visible fragmentation | Per-page reallocation is not important
Main Memory Allocation
Paged VM – why is it needed?
• Shared memory regions for inter-process communication
• Code regions protected by per-page R/W permissions
• Copy-on-write uses per-page R/W permissions for a lazy implementation of fork
• Guard pages at the end of thread stacks (see the sketch after this slide)
[Virtual address space layout: paging remains valuable for code, constants, shared memory, mapped files, the stack, and guard pages; paging is not needed for the dynamically allocated heap region.]
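A minimal sketch of the guard-page case above, assuming a POSIX system; the region size and layout are illustrative, not taken from the paper:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t stack_size = 64 * page;                 /* illustrative stack size */

    /* Allocate the stack plus one extra page that will serve as the guard. */
    char *region = mmap(NULL, stack_size + page, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    /* Revoke all permissions on the lowest page: a stack overflow that runs
       into it now faults instead of silently corrupting neighbouring memory.
       This per-page permission is exactly what paged VM provides and what a
       single flat segment cannot. */
    if (mprotect(region, page, PROT_NONE) != 0) { perror("mprotect"); return EXIT_FAILURE; }

    printf("stack: [%p, %p), guard page at %p\n",
           (void *)(region + page), (void *)(region + page + stack_size), (void *)region);

    munmap(region, stack_size + page);
    return EXIT_SUCCESS;
}
```

Regions like this stay under conventional paging; only the large dynamically allocated heap moves out of it.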
Direct Segments
• Hybrid paged + segmented memory (the two are used side by side, not layered one on top of the other)
Address Translation
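The paper's translation rule: a virtual address V with BASE ≤ V < LIMIT bypasses the TLB entirely and maps to V + OFFSET; every other address uses conventional paging. A minimal software model for illustration only (the real check happens in hardware alongside the D-TLB lookup, and the paged path here is a stub):

```c
#include <stdint.h>
#include <stdio.h>

/* Per-process direct-segment registers (software model). */
typedef struct {
    uint64_t base;    /* start VA of the direct segment            */
    uint64_t limit;   /* end VA (exclusive) of the direct segment  */
    uint64_t offset;  /* start PA of the backing memory minus base */
} direct_segment;

/* Stub standing in for the normal TLB lookup / hardware page walk. */
static uint64_t translate_via_paging(uint64_t va) { return va; /* identity stub */ }

/* Addresses inside [base, limit) are translated with a single addition and can
   never miss in the TLB; every other address takes the ordinary paged path. */
static uint64_t translate(const direct_segment *ds, uint64_t va) {
    if (va >= ds->base && va < ds->limit)
        return va + ds->offset;          /* direct-segment hit */
    return translate_via_paging(va);     /* 4KB/2MB/1GB paged translation */
}

int main(void) {
    direct_segment ds = { .base   = 0x100000000ULL,    /* VA 4 GB            */
                          .limit  = 0x500000000ULL,    /* VA 20 GB (16 GB)   */
                          .offset = 0x100000000ULL };  /* PA 8 GB - VA 4 GB  */
    uint64_t va = 0x123456789ULL;
    printf("0x%llx -> 0x%llx\n",
           (unsigned long long)va, (unsigned long long)translate(&ds, va));
    return 0;
}
```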
OS Support: Handling Physical Memory
• Setup Direct Segment registers (sketched after this slide)
 BASE = start VA of the Direct Segment
 LIMIT = end VA of the Direct Segment
 OFFSET = start PA of the Direct Segment – BASE (so PA = VA + OFFSET)
 Register values are saved and restored as part of process metadata on context switch
• Create a contiguous physical memory region
 Reserve at startup – big-memory apps know their memory requirements at startup
 Memory compaction – latency is insignificant for long-running jobs
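A sketch of the OS-side bookkeeping implied by this slide. The structure and register-write helpers are hypothetical (the paper specifies the three registers and their save/restore with the process, not a kernel API); the stubs exist only so the fragment compiles:

```c
#include <stdint.h>

/* Hypothetical per-process direct-segment state, kept in the process control
   block and loaded into the BASE/LIMIT/OFFSET registers on context switch. */
struct ds_state {
    uint64_t base;    /* start VA of the primary region's direct segment */
    uint64_t limit;   /* end VA (exclusive)                               */
    uint64_t offset;  /* start PA of the backing memory minus base        */
};

/* Hypothetical register accessors; trivial stubs for illustration. */
static uint64_t ds_regs[3];
static void write_ds_base(uint64_t v)   { ds_regs[0] = v; }
static void write_ds_limit(uint64_t v)  { ds_regs[1] = v; }
static void write_ds_offset(uint64_t v) { ds_regs[2] = v; }

/* Called when the OS backs a process's primary region starting at va_start
   with a contiguous physical range of len bytes starting at pa_start. */
void ds_setup(struct ds_state *ds, uint64_t va_start, uint64_t pa_start, uint64_t len) {
    ds->base   = va_start;
    ds->limit  = va_start + len;
    ds->offset = pa_start - va_start;   /* so that PA = VA + OFFSET */
}

/* Called on each context switch, along with the rest of the process state. */
void ds_restore(const struct ds_state *ds) {
    write_ds_base(ds->base);
    write_ds_limit(ds->limit);
    write_ds_offset(ds->offset);
}
```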
OS Support: Handling Virtual Memory
• Primary regions
 Abstraction presented to application
 Contiguous Virtual address space backed by Direct Segment
• What goes in the primary region
 Dynamically allocated R/W memory
 Application can indicate what to place in the primary region (a hypothetical sketch follows this slide)
• The primary region is sized very generously, large enough to cover all of physical memory if need be
 64-bit x86 supports 128TB of user virtual address space, so the process effectively never runs out of VA space
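A purely hypothetical sketch of how an application might indicate that an allocation belongs in the primary region. MAP_PRIMARY is not a real flag; it stands in for whatever interface the modified kernel would expose, which the slide does not name, and it is defined as 0 here so the sketch still runs on a stock kernel:

```c
#include <stdio.h>
#include <sys/mman.h>

/* NOT a real Linux flag: a hypothetical stand-in for the mechanism by which
   the application tells the modified kernel "back this anonymous read-write
   allocation with the primary region / direct segment". */
#define MAP_PRIMARY 0x0

int main(void) {
    size_t big = (size_t)1 << 34;   /* 16 GB of dynamically allocated R/W memory */

    void *heap = mmap(NULL, big, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_PRIMARY, -1, 0);
    if (heap == MAP_FAILED) { perror("mmap"); return 1; }

    /* ... the application places its large, long-lived data here ... */
    munmap(heap, big);
    return 0;
}
```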
Evaluation
• Methodology
 Implement primary regions in the kernel
 Count the TLB misses that would be served by the (not yet existing) direct-segment hardware
• Trapping TLB misses (sketched after this slide)
 x86 uses a hardware page-table walker, so misses are normally invisible to software
 They trap every TLB miss by making the system believe that the in-memory PTE is invalid
 In the fault handler, they touch the page containing the faulting address, then mark the PTE invalid again
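A self-contained sketch of the trapping trick described above. The PTE type and helper functions are trivial stand-ins so the fragment compiles; in the real experiment they are the kernel's own page-table and primary-region routines:

```c
#include <stdint.h>

typedef uint64_t pte_t;                        /* stand-in for the kernel PTE type */

/* Trivial stand-ins so the sketch is self-contained. */
static pte_t fake_pte;
static pte_t *pte_of(uintptr_t va)             { (void)va; return &fake_pte; }
static void  set_valid(pte_t *p)               { *p |=  (pte_t)1; }
static void  set_invalid(pte_t *p)             { *p &= ~(pte_t)1; }
static int   in_primary_region(uintptr_t va)   { (void)va; return 1; }
static void  touch(uintptr_t va)               { (void)va; }

static unsigned long total_misses;
static unsigned long misses_covered_by_direct_segment;

/* All in-memory PTEs are kept marked invalid, so every TLB miss ends in a page
   fault. The handler counts the miss, notes whether the faulting address lies
   in the primary region (i.e. a direct segment would have absorbed it), lets
   the access complete once, and then re-poisons the PTE. */
void fake_tlb_miss_handler(uintptr_t faulting_va) {
    total_misses++;
    if (in_primary_region(faulting_va))
        misses_covered_by_direct_segment++;

    pte_t *pte = pte_of(faulting_va);
    set_valid(pte);        /* let the hardware page walk succeed this once     */
    touch(faulting_va);    /* touch the page so the translation fills the TLB  */
    set_invalid(pte);      /* invalid again, so the next miss traps as well    */
}
```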
Results
[Same bar chart, now including the Direct Segment bars: the percentage of execution cycles wasted on DTLB misses drops to roughly 0–0.5% (labeled ~0, 0.01, 0.48, and 0.49) across graph500, memcached, MySQL, NPB:BT, NPB:CG, and GUPS.]
Why not large pages?
• Huge pages do not automatically scale
 Require new page sizes and/or more TLB entries
• TLB effectiveness still depends on access locality
• Page sizes are fixed, sparse, and ISA-defined
 e.g., 4KB, 2MB, 1GB
 Allocations must be aligned to page-size boundaries
• Multiple page sizes introduce TLB design tradeoffs
 Fully associative vs. set-associative designs
Virtual Memory Basics
[Backup slide diagram: the virtual address spaces of Process 1 and Process 2 map onto physical memory through per-process page tables; the core's TLB (Translation Lookaside Buffer) caches these translations, with the page table walked on a TLB miss.]