Introduction to Computer Systems Today Process Memory Image

advertisement
Carnegie Mellon
Carnegie Mellon
Today
„
Dynamic memory allocation
Introduction to Computer Systems
15‐213/18‐243, fall 2009
16th Lecture, Oct. 22th
Instructors:
Gregory Kesden and Markus Püschel
Carnegie Mellon
Process Memory Image
Carnegie Mellon
Why Dynamic Memory Allocation?
memory protected
from user code
kernel virtual memory
„
stack
%esp
Allocators request
additional heap memory
from the kernel using the sbrk() function:
Sizes of needed data structures may only be known at runtime
the “brk” ptr
run‐time heap (via malloc)
error = sbrk(amt_more)
uninitialized data (.bss)
initialized data (.data)
program text (.text)
0
Carnegie Mellon
Malloc Package
Dynamic Memory Allocation
„
„
Memory allocator?
ƒ VM hardware and kernel allocate pages
ƒ Application objects are typically smaller
ƒ Allocator manages objects within pages Application
#include <stdlib.h>
„
void *malloc(size_t size)
ƒ Successful:
ƒ Returns a pointer to a memory block of at least size bytes
(typically) aligned to 8‐byte boundary
ƒ If size == 0, returns NULL
ƒ Unsuccessful: returns NULL (0) and sets errno
„
void free(void *p)
ƒ Returns the block pointed at by p to pool of available memory
ƒ p must come from a previous call to malloc() or realloc()
„
void *realloc(void *p, size_t size)
ƒ Changes size of block p and returns pointer to new block
ƒ Contents of new block unchanged up to min of old and new size
ƒ Old block has been free()'d (logically, if new != old)
Heap Memory
Explicit vs. Implicit Memory Allocator
ƒ Explicit: application allocates and frees space ƒ
ƒ
In C: malloc() and free()
In Java, ML, Lisp: garbage collection
Allocation
ƒ A memory allocator doles out memory blocks to application
ƒ A “block” is a contiguous range of bytes
ƒ
„
„
Dynamic Memory Allocator
ƒ Implicit: application allocates, but does not free space
„
Carnegie Mellon
of any size, in this context
Today: simple explicit memory allocation
Carnegie Mellon
Malloc Example
Carnegie Mellon
Assumptions Made in This Lecture
void foo(int n, int m) {
int i, *p;
„
Memory is word addressed (each word can hold a pointer)
/* allocate a block of n ints */
p = (int *)malloc(n * sizeof(int));
if (p == NULL) {
perror("malloc");
exit(0);
}
for (i=0; i<n; i++) p[i] = i;
Allocated block
(4 words)
/* add m bytes to end of p block */
if ((p = (int *)realloc(p, (n+m) * sizeof(int))) == NULL) {
perror("realloc");
exit(0);
}
for (i=n; i < n+m; i++) p[i] = i;
Free block
(3 words)
Free word
Allocated word
/* print new array */
for (i=0; i<n+m; i++)
printf("%d\n", p[i]);
free(p); /* return p to available memory pool */
}
Carnegie Mellon
Carnegie Mellon
Constraints
Allocation Example
„
Applications
ƒ Can issue arbitrary sequence of malloc() and free() requests
ƒ free() requests must be to a malloc()’d block
p1 = malloc(4)
p2 = malloc(5)
„
Allocators
p3 = malloc(6)
ƒ Can’t control number or size of allocated blocks
ƒ Must respond immediately to malloc() requests
free(p2)
ƒ Must allocate blocks from free memory
ƒ
ƒ
i.e., can’t reorder or buffer requests
i.e., can only place allocated blocks in free memory
ƒ Must align blocks so they satisfy all alignment requirements
p4 = malloc(2)
ƒ
8 byte alignment for GNU malloc (libc malloc) on Linux boxes
ƒ Can manipulate and modify only free memory
ƒ Can’t move the allocated blocks once they are malloc()’d
ƒ
i.e., compaction is not allowed
Carnegie Mellon
Performance Goal: Throughput
„
Given some sequence of malloc and free requests:
Carnegie Mellon
Performance Goal: Peak Memory Utilization
„
Given some sequence of malloc and free requests:
„
Def: Aggregate payload Pk
ƒ R0, R1, ..., Rk, ... , Rn‐1
ƒ R0, R1, ..., Rk, ... , Rn‐1
„
ƒ malloc(p) results in a block with a payload of p bytes
ƒ After request Rk has completed, the aggregate payload Pk is the sum of Goals: maximize throughput and peak memory utilization
ƒ These goals are often conflicting
„
Throughput:
currently allocated payloads
ƒ all malloc()’d stuff minus all free()’d stuff
„
5,000 malloc() calls and 5,000 free() calls in 10 seconds ƒ Throughput is 1,000 operations/second
ƒ How to do malloc() and free() in O(1)? What’s the problem?
ƒ
Def: Current heap size = Hk
ƒ Assume Hk is monotonically nondecreasing
ƒ Number of completed requests per unit time
ƒ Example:
ƒ
„
reminder: it grows when allocator uses sbrk()
Def: Peak memory utilization after k requests ƒ Uk = ( maxi<k Pi ) / Hk
Carnegie Mellon
Internal Fragmentation
Fragmentation
„
Carnegie Mellon
„
Poor memory utilization caused by fragmentation
ƒ internal fragmentation
ƒ external fragmentation
For a given block, internal fragmentation occurs if payload is smaller than block size
block
Internal fragmentation
Internal fragmentation
payload
„
Caused by ƒ overhead of maintaining heap data structures
ƒ padding for alignment purposes
ƒ explicit policy decisions „
Depends only on the pattern of previous requests
ƒ thus, easy to measure
(e.g., to return a big block to satisfy a small request)
Carnegie Mellon
External Fragmentation
„
Carnegie Mellon
Implementation Issues
Occurs when there is enough aggregate heap memory, but no single free block is large enough
„
How to know how much memory is being free()’d when it is given only a pointer (and no length)?
„
How to keep track of the free blocks?
„
What to do with extra space when allocating a block that is smaller than the free block it is placed in?
„
How to pick a block to use for allocation—many might fit?
„
How to reinsert a freed block into the heap?
p1 = malloc(4)
p2 = malloc(5)
p3 = malloc(6)
free(p2)
p4 = malloc(6)
„
Oops! (what would happen now?)
Depends on the pattern of future requests
ƒ Thus, difficult to measure
Carnegie Mellon
Knowing How Much to Free
„
Carnegie Mellon
Keeping Track of Free Blocks
„
Standard method
Method 1: Implicit list using length—links all blocks
ƒ Keep the length of a block in the word preceding the block
ƒ
This word is often called the header field or header
5
4
6
2
ƒ Requires an extra word for every allocated block
„
5
p0
p0 = malloc(4)
Method 2: Explicit list among the free blocks using pointers
5
„
4
6
2
Method 3: Segregated free list
ƒ Different free lists for different size classes
block size
free(p0)
data
„
Method 4: Blocks sorted by size
ƒ Can use a balanced tree (e.g. Red‐Black tree) with pointers within each free block, and the length used as a key
Carnegie Mellon
Implicit List: Finding a Free Block
Method 1: Implicit List
„
Carnegie Mellon
„
For each block we need: length, is‐allocated?
ƒ Could store this information in two words: wasteful!
„
p = start;
while ((p < end) &&
((*p & 1) ||
(*p <= len)))
p = p + (*p & -2);
Standard trick
ƒ If blocks are aligned, some low‐order address bits are always 0
ƒ Instead of storing an always‐0 bit, use it as a allocated/free flag
ƒ When reading size word, must mask out this bit
Format of
allocated and
free blocks
Next fit:
ƒ Like first‐fit, but search list starting where previous search finished
ƒ Should often be faster than first‐fit: avoids re‐scanning unhelpful blocks
ƒ Some research suggests that fragmentation is worse
„
Best fit:
ƒ Search the list, choose the best free block: fits, with fewest bytes left over
ƒ Keeps fragments small—usually helps fragmentation
ƒ Will typically run slower than first‐fit
size: block size
payload
not passed end
already allocated
too small
goto next block (word addressed)
„
a = 1: allocated block a = 0: free block
a
\\
\\
\\
\\
ƒ Can take linear time in total number of blocks (allocated and free)
ƒ In practice it can cause “splinters” at beginning of list
1 word
size
First fit:
ƒ Search list from beginning, choose first free block that fits: (Cost?)
payload: application data
(allocated blocks only)
optional
padding
Carnegie Mellon
Implicit List: Allocating in Free Block
Bit Fields
„
„
Carnegie Mellon
„
How to represent the Header: Masks and bitwise operators
Allocating in a free block: splitting
ƒ Since allocated space might be smaller than free space, we might want #define SIZEMASK
(~0x7)
#define PACK(size, alloc)
((size) | (alloc))
#define GET_SIZE(p)
((p)->size & SIZEMASK)
to split the block
4
4
6
2
p
Bit Fields
addblock(p, 4)
struct {
4
unsigned allocated:1;
4
2
4
2
unsigned size:31;
void addblock(ptr p, int len) {
int newsize = ((len + 1) >> 1) << 1;
int oldsize = *p & -2;
*p = newsize | 1;
if (newsize < oldsize)
*(p+newsize) = oldsize - newsize;
}
} Header;
Check your K&R: structures are not necessarily packed
// round up to even
// mask out low bit
// set new length
// set length in remaining
//
part of block
Carnegie Mellon
Implicit List: Coalescing
Implicit List: Freeing a Block
„
Carnegie Mellon
Simplest implementation:
„
Join (coalesce) with next/previous blocks, if they are free
ƒ Coalescing with next block
ƒ Need only clear the “allocated” flag
void free_block(ptr p) { *p = *p & -2 }
ƒ But can lead to “false fragmentation”
4
4
4
4
2
4
4
4
4
2
2
2
2
2
2
p
free(p)
malloc(5)
4
4
p
free(p)
2
Oops!
There is enough free space, but the allocator won’t be able to find it
4
void free_block(ptr p) {
*p = *p & -2;
next = p + *p;
if ((*next & 1) == 0)
*p = *p + *next;
}
6
// clear allocated flag
// find next block
// add to this block if
//
not allocated
ƒ But how do we coalesce with previous block?
logically
gone
Carnegie Mellon
Implicit List: Bidirectional Coalescing „
Carnegie Mellon
Constant Time Coalescing
Boundary tags [Knuth73]
ƒ Replicate size/allocated word at “bottom” (end) of free blocks
ƒ Allows us to traverse the “list” backwards, but requires extra space
ƒ Important and general technique!
4
4 4
4 6
Header
size
Format of
allocated and
free blocks
6 4
a
size
Case 2
Case 3
Case 4
allocated
free
free
allocated
free
allocated
free
a = 1: allocated block a = 0: free block
payload and
padding
Boundary tag
(footer)
block being
freed
4
Case 1
allocated
size: total block size
payload: application data
(allocated blocks only)
a
Carnegie Mellon
Constant Time Coalescing (Case 1)
Carnegie Mellon
Constant Time Coalescing (Case 2)
m1
1
m1
1
m1
1
m1
1
m1
1
m1
1
m1
1
m1
1
n
1
n
0
n
1
n+m2
0
n+m2
0
n
1
n
0
n
1
m2
1
m2
1
m2
0
m2
1
m2
1
m2
0
Carnegie Mellon
Constant Time Coalescing (Case 3)
m1
0
n+m1
m1
n
0
1
n
m2
1
1
n+m1
m2
m2
1
m2
0
Carnegie Mellon
Constant Time Coalescing (Case 4)
m1
0
m1
n
0
1
0
1
n
m2
1
0
1
m2
0
n+m1+m2
0
n+m1+m2
0
Carnegie Mellon
Summary of Key Allocator Policies
Disadvantages of Boundary Tags
„
Internal fragmentation
„
Can it be optimized?
Carnegie Mellon
„
Placement policy:
ƒ First‐fit, next‐fit, best‐fit, etc.
ƒ Trades off lower throughput for less fragmentation
ƒ Interesting observation: segregated free lists (next lecture) approximate a best fit placement policy without having to search
entire free list
ƒ Which blocks need the footer tag?
ƒ What does that mean?
„
Splitting policy:
ƒ When do we go ahead and split free blocks?
ƒ How much internal fragmentation are we willing to tolerate?
„
Coalescing policy:
ƒ Immediate coalescing: coalesce each time free() is called ƒ Deferred coalescing: try to improve performance of free() by deferring coalescing until needed. Examples:
ƒ Coalesce as you scan the free list for malloc()
ƒ Coalesce when the amount of external fragmentation reaches some threshold
Carnegie Mellon
Implicit Lists: Summary
„
Implementation: very simple
Allocate cost: „
Free cost: „
ƒ linear time worst case
ƒ constant time worst case
ƒ even with coalescing
„
Memory usage: ƒ will depend on placement policy
ƒ First‐fit, next‐fit or best‐fit
„
Not used in practice for malloc()/free() because of linear‐time allocation
ƒ used in many special purpose applications
„
However, the concepts of splitting and boundary tag coalescing are general to all allocators
Download