L6: Malloc Lab Writing a Dynamic Storage Allocator October 30, 2006 15-213

15-213
“The course that gives CMU its Zip!”
L6: Malloc Lab
Writing a Dynamic Storage Allocator
October 30, 2006
Topics


Memory Allocator (Heap)
L6: Malloc Lab
Reminders

L6: Malloc Lab Due Nov 10, 2006
Section A (Donnie H Kim) recitation8.ppt (some slides from lecture notes)
L6: Malloc Lab
Things that matter in this lab:

Performance goal
 Maximizing throughput
 Maximizing memory utilization

Implementation Issues (Design Space)
 Free Block Organization
 Placement Policy
 Splitting
 Coalescing

And some advice
15-213, F’06
Some useful background
So what is memory allocation?
Process address space (figure), from high to low addresses:
kernel virtual memory (memory invisible to user code)
stack (%esp points to the top of the stack)
memory-mapped region for shared libraries
run-time heap (via malloc); the “brk” ptr marks the top of the heap
uninitialized data (.bss)
initialized data (.data)
program text (.text)
address 0
Allocators request additional heap memory from the operating system using the sbrk function.
Malloc Package
#include <stdlib.h>
void *malloc(size_t size)

If successful:
 Returns a pointer to a memory block of at least size bytes, (typically) aligned to an 8-byte boundary.
 If size == 0, returns NULL (the C standard also allows returning a unique pointer that can be passed to free)

If unsuccessful: returns NULL (0) and sets errno.
void free(void *p)


Returns the block pointed at by p to the pool of available memory.
p must come from a previous call to malloc or realloc.
void *realloc(void *p, size_t size)


Changes size of block p and returns pointer to new block.
Contents of new block unchanged up to min of old and new size.
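The three calls can be seen working together in a minimal sketch. The helper name `grow_copy` is hypothetical (not part of the lab); it only illustrates that realloc preserves the old contents and that the old block stays valid if realloc fails.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `src` into a fresh heap block, then grow the
 * block by `extra` bytes with realloc. Returns NULL on allocation failure.
 * The caller owns the result and must free() it. */
char *grow_copy(const char *src, size_t extra)
{
    size_t len = strlen(src) + 1;      /* include the terminating NUL */
    char *p = malloc(len);
    if (p == NULL)
        return NULL;
    memcpy(p, src, len);

    /* realloc may move the block; the first `len` bytes are preserved. */
    char *q = realloc(p, len + extra);
    if (q == NULL) {
        free(p);                       /* old block is still valid on failure */
        return NULL;
    }
    return q;
}
```

A caller would use the returned pointer like any malloc'd block and release it with free when done.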
Allocation Examples
p1 = malloc(4)
p2 = malloc(5)
p3 = malloc(6)
free(p2)
p4 = malloc(2)
Performance goals
Maximizing throughput (Temporal)

Defined as the number of requests that it completes per unit
time
Maximizing Memory Utilization (Spatial)

Defined as the ratio of the requested memory size and the
actual memory size used
There is a tension between maximizing throughput and utilization!
Find an appropriate balance between two goals!
Keep this in mind, we will come back to these issues
Implementation Issues
 Free Block Organization


How do we keep track of the free blocks?
How do we know how much memory to free just given a pointer?
 Placement Policy

How do we choose an appropriate free block?
 Splitting

What do we do with the extra space when allocating a structure that
is smaller than the free block it is placed in?
 Coalescing

How do we reinsert a freed block?
p0
free(p0)
p1 = malloc(1)
Implementation Issues 1:
Free Block Organization
 Identifying which block is free or allocated

Available design choices of how to manage free blocks
 Implicit List
 Explicit List
 Segregated List

Header, Footer organization
 storing information about the block (size, allocated, freed)
Keeping Track of Free Blocks
Method 1: Implicit list using lengths -- links all blocks
(figure: blocks of sizes 5, 4, 6, 2 chained by their lengths)
Method 2: Explicit list among the free blocks using pointers within the free blocks
(figure: the same blocks, 5, 4, 6, 2, with pointers linking the free ones)
Method 3: Segregated free list

Different free lists for different size classes
Method 4: Blocks sorted by size

Can use a balanced tree (e.g. Red-Black tree) with pointers within each free block, and the length used as a key
Free Block Organization
 Free Block with header
Format of allocated and free blocks (figure): a 1-word header holding size and the allocated bit a, followed by the payload and optional padding
a = 1: allocated block
a = 0: free block
size: block size
payload: application data (allocated blocks only)
Free Block Organization
 Free Block with Header and Footer
Format of allocated and free blocks (figure): a header holding size and a, then payload and padding, then a boundary tag (footer) repeating size and a
a = 1: allocated block
a = 0: free block
size: total block size
payload: application data (allocated blocks only)
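The header and footer format can be sketched in a few lines of C. This is only an illustration under stated assumptions: 4-byte words, block sizes that are multiples of 8 (so the low bits of the size word are free for the allocated bit), and hypothetical helper names `pack`, `get_size`, `get_alloc` and `write_tags`.

```c
#include <stdint.h>

/* Assumed layout: 4-byte header and footer; block sizes are multiples of 8,
 * so the low 3 bits of the size word can hold the allocated flag. */
#define WSIZE 4u

static inline uint32_t pack(uint32_t size, uint32_t alloc) { return size | alloc; }
static inline uint32_t get_size(uint32_t word)  { return word & ~0x7u; }
static inline uint32_t get_alloc(uint32_t word) { return word & 0x1u; }

/* Write matching header and footer for a block that starts at `hdr`
 * and spans `size` bytes in total (header + payload + footer). */
void write_tags(uint32_t *hdr, uint32_t size, uint32_t alloc)
{
    uint32_t *ftr = hdr + size / WSIZE - 1;   /* last word of the block */
    *hdr = pack(size, alloc);
    *ftr = pack(size, alloc);
}
```

Because the footer repeats the header, the allocator can read the size of the previous block from the word just before the current header, which is what makes constant-time coalescing possible.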
Implementation Issues 2:
Placement Policy
 “Placement Policy” choices

First Fit
 Search the free list from the beginning and choose the first free block that fits

Next Fit
 Start the search where the previous search left off

Best Fit
 Examine every free block and choose the smallest one that fits
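The difference between first fit and best fit shows up clearly on a toy example. The sketch below assumes the free list is just an array of block sizes (a stand-in for a real list of free blocks); the function names are illustrative only.

```c
#include <stddef.h>

/* Both functions scan an array of free-block sizes (a stand-in for a real
 * free list) and return the index of the chosen block, or -1 if none fits. */

int first_fit(const size_t *free_sizes, int n, size_t request)
{
    for (int i = 0; i < n; i++)
        if (free_sizes[i] >= request)
            return i;                 /* stop at the first block that fits */
    return -1;
}

int best_fit(const size_t *free_sizes, int n, size_t request)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (free_sizes[i] < request)
            continue;                 /* too small, skip */
        if (best < 0 || free_sizes[i] < free_sizes[best])
            best = i;                 /* smallest fitting block seen so far */
    }
    return best;
}
```

With free blocks of 32, 64, 16 and 24 bytes and a 20-byte request, first fit picks the 32-byte block while best fit scans the whole list and picks the 24-byte one: less wasted space, more search time.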
Implementation Issues 3:
Splitting
 “Splitting” Design choices

Using the entire free block
 Simple, fast
 Introduces internal fragmentation (a good placement policy might reduce this)

Splitting
 Split the free block into two parts when the second part is large enough to serve other requests (reduces internal fragmentation)
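The split decision comes down to one comparison. The sketch below assumes a minimum block size of 16 bytes (header, footer, and an 8-byte-aligned payload); both that constant and the name `split_remainder` are assumptions for illustration.

```c
#include <stddef.h>

#define MIN_BLOCK 16   /* assumed: header + footer + smallest aligned payload */

/* Decide how to place a request of `asize` bytes (already adjusted for
 * overhead and alignment) in a free block of `bsize` bytes.
 * Returns the size of the leftover free block, or 0 if the whole block
 * is handed out because the remainder would be too small to be a block. */
size_t split_remainder(size_t bsize, size_t asize)
{
    size_t rest = bsize - asize;      /* caller guarantees bsize >= asize */
    return (rest >= MIN_BLOCK) ? rest : 0;
}
```

Placing a 24-byte block in a 64-byte free block leaves a usable 40-byte remainder; placing it in a 32-byte block would leave only 8 bytes, so the whole block is used and the 8 bytes become internal fragmentation.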
p1 = malloc(1)
Implementation Issues 4:
Coalescing
 False Fragmentations
Free memory is chopped into small free blocks that cannot satisfy a request individually, even though they are adjacent
Coalesce adjacent free blocks to get a bigger free block

 Coalescing - Policy decision of when to perform coalescing

Immediate coalescing
 Merging any adjacent blocks each time a block is freed

Deferred coalescing
 Merging free blocks some time later
» Ex) when allocation request fails.

Trying “Bidirectional Immediate Coalescing” proposed by Donald
Knuth would be good enough for this lab
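Immediate coalescing in both directions can be sketched on a toy heap. The code below is not the textbook implementation; it assumes the heap is an array of 4-byte words, every block has a header and a matching footer holding the size in bytes with the allocated bit in bit 0, and allocated sentinel words sit before the first block and after the last one.

```c
#include <stdint.h>

#define SIZE(w)  ((w) & ~0x7u)   /* block size in bytes */
#define ALLOC(w) ((w) & 0x1u)    /* 1 = allocated, 0 = free */

/* Coalesce the free block whose header is heap[i] with its free neighbors.
 * Assumes allocated sentinel words just before the first block and just
 * after the last one, so neighbor reads never leave the array.
 * Returns the header index of the merged block. */
int coalesce(uint32_t *heap, int i)
{
    uint32_t size = SIZE(heap[i]);
    int next = i + (int)(size / 4);            /* header of the next block */

    if (!ALLOC(heap[next]))                    /* absorb the next block */
        size += SIZE(heap[next]);

    if (!ALLOC(heap[i - 1])) {                 /* footer of the previous block */
        uint32_t prev_size = SIZE(heap[i - 1]);
        i -= (int)(prev_size / 4);             /* move back to its header */
        size += prev_size;
    }

    heap[i] = size;                            /* new header, alloc bit 0 */
    heap[i + (int)(size / 4) - 1] = size;      /* new footer */
    return i;
}
```

All four cases (neither, next only, previous only, both neighbors free) fall out of the two independent checks, and each runs in constant time because the footer of the previous block is one word before the current header.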
Performance goal (1) - Throughput
Throughput is mostly determined by the time spent searching for a free block
How you keep track of your free blocks affects search time

Naïve allocator
 Never reuses freed blocks, just extends the heap whenever a new block is needed: throughput is extremely high, but memory utilization is extremely poor

Implicit Free List
 The allocator traverses the free blocks indirectly, by walking every block in the heap (allocated or free), so search time grows with the total number of blocks

Explicit Free List
 The allocator traverses the free blocks directly, by following pointers stored in the free blocks, so search time grows only with the number of free blocks

Segregated Free List
 The allocator searches only the free list for the appropriate size class to find a suitable free block
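An explicit free list is essentially a doubly linked list threaded through the free blocks. The sketch below uses a hypothetical `free_node` struct and a LIFO insertion policy; in a real allocator the two pointers would live inside the payload area of each free block.

```c
#include <stddef.h>

/* Hypothetical free-list node. In a real allocator these two pointers
 * would be stored in the payload area of the free block itself. */
typedef struct free_node {
    struct free_node *prev;
    struct free_node *next;
} free_node;

/* Insert at the head (LIFO policy): constant time. */
void list_push(free_node **head, free_node *n)
{
    n->prev = NULL;
    n->next = *head;
    if (*head != NULL)
        (*head)->prev = n;
    *head = n;
}

/* Unlink an arbitrary node: constant time thanks to the prev pointer. */
void list_remove(free_node **head, free_node *n)
{
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        *head = n->next;              /* n was the first node */
    if (n->next != NULL)
        n->next->prev = n->prev;
}
```

A segregated free list can reuse the same two operations with an array of heads, one per size class.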
Performance goal (2) –
Memory Utilization
Poor memory utilization is caused by fragmentation

Comes in two forms: internal and external fragmentation
Internal Fragmentation
 Depends only on the pattern of previous requests
 Causes
» The allocator imposes a minimum block size (depending on the allocator’s choice of block format)
» Satisfying alignment requirements

External Fragmentation
 Depends on the pattern of future requests
 Aggregate free memory is enough, but no single free block is large enough to handle the request
Internal Fragmentation
Internal fragmentation

For some block, internal fragmentation is the difference between the block size and the payload size.
(figure: a block whose payload is surrounded by internal fragmentation on both sides)

Caused by overhead of maintaining heap data structures, padding for alignment purposes, or explicit policy decisions (e.g., not to split the block).
Depends only on the pattern of previous requests, and thus is easy to measure.
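Internal fragmentation can be computed directly once the size adjustment rule is fixed. The sketch below assumes 8 bytes of overhead (a 4-byte header plus a 4-byte footer), 8-byte alignment, and a 16-byte minimum block; these constants and the function names are assumptions, not requirements of the lab.

```c
#include <stddef.h>

#define ALIGNMENT 8
#define OVERHEAD  8    /* assumed: 4-byte header + 4-byte footer */
#define MIN_BLOCK 16   /* assumed minimum block size */

/* Round a payload request up to a legal block size. */
size_t adjust_size(size_t payload)
{
    size_t total = payload + OVERHEAD;
    total = (total + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
    return total < MIN_BLOCK ? MIN_BLOCK : total;
}

/* Internal fragmentation of one block: block size minus payload size. */
size_t internal_frag(size_t payload)
{
    return adjust_size(payload) - payload;
}
```

Under these assumptions a 1-byte request occupies a 16-byte block (15 bytes lost), while a 24-byte request occupies 32 bytes (8 bytes lost), so small requests pay proportionally far more.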
External Fragmentation
Occurs when there is enough aggregate heap memory, but no single
free block is large enough
p1 = malloc(4)
p2 = malloc(5)
p3 = malloc(6)
free(p2)
p4 = malloc(6)
oops!
External fragmentation depends on the pattern of future requests, and
thus is difficult to measure.
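The definition can be turned into a simple test on the current free blocks. This is a sketch only: the array of free-block sizes stands in for a real free list, and the function name is hypothetical.

```c
#include <stddef.h>

/* Returns 1 when a request cannot be placed even though the free blocks
 * together hold enough space, i.e. the failure is due to external
 * fragmentation. `free_sizes` is a stand-in for walking a real free list. */
int externally_fragmented(const size_t *free_sizes, int n, size_t request)
{
    size_t total = 0, largest = 0;
    for (int i = 0; i < n; i++) {
        total += free_sizes[i];
        if (free_sizes[i] > largest)
            largest = free_sizes[i];
    }
    return total >= request && largest < request;
}
```

With two free blocks of 5 and 2 words (illustrative numbers), a request for 6 words fails although 7 words are free in total. Whether that ever hurts depends on which requests arrive next, which is why this form of fragmentation is hard to measure.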
The Malloc Lab
Assumptions
Assumptions made in Malloc Lab


The standard C library malloc always returns a payload pointer that is aligned to 8 bytes, and so should yours
64-bit Architecture
 pointers are 8 bytes long!
 size_t is now 8 bytes (unsigned long)

But the requested size will always fit in 4 bytes (less than 2^32 bytes)
 You may use 4-byte headers and footers and get away with it
(figure legend: an allocated block of 4 words and a free block of 2 words; shading distinguishes free words from allocated words)
Porting to 64-bit Machine
Porting the code in your CS:APP text book to 64-bit

sizeof(long) == 4 // 32-bit
sizeof(long) == 8 // 64-bit
The only significant difference is in the definitions of the GET and PUT macros.
Changes (to keep our 32-bit headers and footers)
 #define GET(p) (*(size_t *)(p)) // 32-bit version
» #define GET(p) (*(unsigned int *)(p)) // 64-bit version
 #define PUT(p, val) (*(size_t *)(p) = (val)) // 32-bit version
» #define PUT(p, val) (*(unsigned int *)(p) = (val)) // 64-bit version
 if ((int)(bp = mem_sbrk(size)) < 0) // 32-bit version
» if ((long)(bp = mem_sbrk(size)) < 0) // 64-bit version: an int cast would truncate the 8-byte pointer
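One way to make the word width independent of the machine is to use a fixed-width type. The sketch below is a variation on the macros above, not the textbook code: it uses `uint32_t` instead of `unsigned int`, and the helper `sbrk_failed` is hypothetical. It assumes, as in the lab's memlib.c, that a failed mem_sbrk call returns `(void *)-1`.

```c
#include <stdint.h>

/* GET/PUT that always touch exactly 4 bytes, whatever sizeof(long) is. */
#define GET(p)      (*(uint32_t *)(p))
#define PUT(p, val) (*(uint32_t *)(p) = (uint32_t)(val))

/* Hypothetical helper: assuming failure is signaled by (void *)-1,
 * comparing against that value avoids casting a pointer to an integer
 * type of the wrong width. */
int sbrk_failed(void *bp)
{
    return bp == (void *)-1;
}
```

Writing a header with PUT then changes exactly one 4-byte word and leaves its neighbor untouched, on both 32-bit and 64-bit machines.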
Using MACROS – why?
#include <stdio.h>

/* Read or write an 8-byte word at address p */
#define GET8(p)      (*(unsigned long *)(p))
#define PUT8(p, val) (*(unsigned long *)(p) = (unsigned long)(val))

void test(void *p, void *pval) {
    unsigned long *newpval;

    /* Reading and writing pointers the hard way */
    *(unsigned long *)p = (unsigned long) pval;
    newpval = (unsigned long *)(*(unsigned long *)p);
    printf("pval=%p newpval=%p\n", pval, (void *)newpval);

    /* Reading and writing pointers the easy way */
    PUT8(p, pval);
    newpval = (unsigned long *) GET8(p);
    printf("pval=%p newpval=%p\n", pval, (void *)newpval);
}

int main(void) {
    char *pval = (char *)0x99;
    unsigned long buf[16]; /* 128 bytes, aligned for unsigned long access */
    test(&buf[0], pval);
    return 0;
}
Approach Advice
1. Start with the implicit list implementation in your textbook, and understand every detail of it
2. When you finish your implicit list, start thinking about your heap checker
 The more time you spend on this, the more time you will save later
3. Go on and start implementing an explicit list with several placement policies
 Modularize, and save each of your placement policies for comparison
4. When you finish your explicit list, you will want to add more checks to your heap checker; do this right away
5. When you feel your explicit list is robust, move on to the segregated free list
 We are looking for a good segregated free list implementation. You can go further by trying other schemes such as balanced trees, but a solid segregated free list implementation is good enough for full credit
6. You can also try some tweaks on the given trace files
Heap Checker (10 pts)
Basic Checks Guidelines (5/10 pts)

Check Heap (while working on implicit list)
1. Check epilogue and prologue blocks
2. Block’s address alignment (8 bytes)
3. Heap boundaries
4. Check your blocks’ header and footer
 Size (minimum size , alignment)
 prev/next allocate/free bit consistency (explicit list)
 header and footer matching each other
5. Check your coalescing
» All blocks are coalesced correctly (no two consecutive
free blocks in the heap)
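A few of these checks can be sketched on a toy heap. The code below is not the required checker; it assumes an array of 4-byte words in which each block has a header and a matching footer (size in bytes, allocated bit in bit 0) and the heap ends with an allocated epilogue word of size 0. The name `check_heap` and its parameters are illustrative.

```c
#include <stdint.h>
#include <stdio.h>

#define SIZE(w)  ((w) & ~0x7u)   /* block size in bytes */
#define ALLOC(w) ((w) & 0x1u)    /* 1 = allocated, 0 = free */

/* Walk the blocks from heap[first] up to the epilogue and count errors.
 * `nwords` is the length of the heap array in 4-byte words. */
int check_heap(const uint32_t *heap, int first, int nwords)
{
    int errors = 0, prev_free = 0, i = first;

    while (i < nwords && SIZE(heap[i]) != 0) {
        uint32_t hdr = heap[i];
        int words = (int)(SIZE(hdr) / 4);

        if (i + words > nwords) {             /* block runs past the heap */
            fprintf(stderr, "block at word %d overruns the heap\n", i);
            return errors + 1;
        }
        if (heap[i + words - 1] != hdr) {     /* footer must mirror header */
            fprintf(stderr, "block at word %d: header/footer mismatch\n", i);
            errors++;
        }
        if (!ALLOC(hdr) && prev_free) {       /* two free blocks in a row */
            fprintf(stderr, "block at word %d: uncoalesced free blocks\n", i);
            errors++;
        }
        prev_free = !ALLOC(hdr);
        i += words;
    }
    if (i >= nwords || !ALLOC(heap[i])) {     /* missing or bad epilogue */
        fprintf(stderr, "bad epilogue\n");
        errors++;
    }
    return errors;
}
```

Returning an error count rather than aborting makes it easy to call the checker after every malloc and free while debugging and to print only when something is wrong.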
Heap Checker (10 pts)
Free List Checks Guidelines (5/10 pts)

Check Free List (while working on explicit free list)
1. All next/prev pointers are consistent (if A’s next pointer points to B, B’s prev pointer should point to A)
2. All free list pointers point between mem_heap_lo() and mem_heap_hi()
3. Count the free blocks by iterating over every block, count them again by traversing the free list by pointers, and see if the two counts match
4. Add more checks as you see fit

Check Segregated Free List (segregated free list)
 All blocks in each list bucket fall within bucket size range
 Be creative
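The pointer-consistency and bounds checks for an explicit free list can be sketched as follows. The `free_node` struct is hypothetical, and the heap bounds are passed in as parameters (in the lab they would come from mem_heap_lo() and mem_heap_hi()) so that the sketch does not depend on memlib. It does not detect cycles.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-list node stored inside each free block. */
typedef struct free_node {
    struct free_node *prev;
    struct free_node *next;
} free_node;

/* Returns the number of inconsistencies found in the list.
 * `lo` and `hi` are the lowest and highest valid heap addresses. */
int check_free_list(const free_node *head, const void *lo, const void *hi)
{
    int errors = 0;
    const free_node *prev = NULL;

    for (const free_node *n = head; n != NULL; n = n->next) {
        if ((uintptr_t)n < (uintptr_t)lo || (uintptr_t)n > (uintptr_t)hi)
            return errors + 1;        /* outside the heap: do not follow it */
        if (n->prev != prev)
            errors++;                 /* A->next == B must imply B->prev == A */
        prev = n;
    }
    return errors;
}
```

The bounds test comes before the node is dereferenced, so a wild pointer is reported instead of being followed.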
Style (10 pts)

It will be some of the most difficult and sophisticated code you have written so far in your career.

Things we are looking for:

Explain your high-level design at the front of your code (2 pts)

Each function should be preceded by a header comment (2 pts)

Comment properly inside each function (2 pts)

Decompose into functions and use as few global variables as possible (2 pts)

Use macros, inline functions, and the C preprocessor wisely (2 pts)
Please try to write clean code that is readable and self-explanatory!

For you
For your Teaching Staff
And for world peace
Debugging Techniques
Guidelines for Debugging


Intensively testing your code even though it seems to work is good programming practice; try to learn the process from this lab
You can print out all the information and monitor it
 Do this when you have just started
 When the trace file is small

You can also print out error messages only when something is wrong
 Printing and monitoring becomes painful when trace files are huge
 Just print errors
Debugging Tips
Guidelines for using mdriver’s options

Use the ./mdriver -c <file> option to run a particular trace file just once, which only checks correctness
 By default, ./mdriver runs your allocator multiple times to estimate its throughput, using the k-best measurement scheme (if you are interested, refer to ch 9 and the mdriver source code)

Use the ./mdriver -v <level> option to set the verbosity level
 It is sometimes useful to have layers of debugging depth
 Can also use #define, #ifdef, #if

Make sure to turn all checking routines off completely when measuring performance, because they do affect it
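One common way to switch checking on and off is a pair of preprocessor macros. This is a sketch under assumptions: the flag name `MM_DEBUG`, the macros `DBG_PRINTF` and `CHECKHEAP`, and the stand-in function `fake_checkheap` are all hypothetical names chosen for illustration.

```c
#include <stdio.h>

/* Compile with -DMM_DEBUG to enable; otherwise both macros expand to
 * nothing, so the checks cost no time in performance runs. */
#ifdef MM_DEBUG
#define DBG_PRINTF(...) fprintf(stderr, __VA_ARGS__)
#define CHECKHEAP(expr) ((void)(expr))
#else
#define DBG_PRINTF(...) ((void)0)
#define CHECKHEAP(expr) ((void)0)
#endif

/* Stand-in for an expensive heap check; counts how often it really runs. */
int checks_run = 0;
int fake_checkheap(void) { return ++checks_run; }

/* Example call site: the check disappears unless MM_DEBUG is defined. */
int allocate_step(void)
{
    CHECKHEAP(fake_checkheap());
    DBG_PRINTF("heap checks so far: %d\n", checks_run);
    return checks_run;
}
```

Because the argument of CHECKHEAP is dropped by the preprocessor in non-debug builds, the checker is never even called, which is what "turned off completely" requires.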
More Hints?
Going further (beyond solid segregated list)

Before trying this, make sure your allocator is doing what you intended, using heap/free list checkers
If you think you have implemented a solid segregated free list, try focusing on the trace files that give you the poorest performance results
More Hints?
Some possible points to tackle

In malloc(), you have to adjust the requested size to meet alignment requirements or minimum block size requirements
 It turns out that how you adjust the size affects the performance on some trace files
 And sometimes it is better to force your allocator to avoid splitting the free block by using a larger block than the requested size
» It will obviously increase internal fragmentation, but can also increase throughput by avoiding repeated splitting and coalescing

By how much will you extend your heap, when you have to extend it?
How do you classify each free list?
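One possible answer to the classification question is power-of-two size classes. The sketch below is only one scheme among many: the number of classes (10), the first upper bound (16 bytes), and the name `size_class` are all assumptions to be tuned against the trace files.

```c
#include <stddef.h>

#define NUM_CLASSES 10   /* assumed number of free lists */

/* Map a block size to a free-list index. Class 0 holds sizes up to
 * 16 bytes, each further class doubles the upper bound, and the last
 * class holds everything larger. */
int size_class(size_t size)
{
    int idx = 0;
    size_t limit = 16;

    while (idx < NUM_CLASSES - 1 && size > limit) {
        limit <<= 1;                  /* next class covers twice the range */
        idx++;
    }
    return idx;
}
```

With these constants, sizes 17 to 32 go to class 1, sizes 2049 to 4096 go to class 8, and anything above 4096 bytes lands in the last class, which then behaves like a plain explicit list.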
Questions ?