Programming Languages

Memory Management
Chapter 11
1


Memory management: the process of
binding values to (logical) memory
locations.
The memory accessible to a program is its
address space, represented as a set of
values {0, 1, …, n}.
◦ The numbers represent memory locations.
◦ These are logical addresses – do not usually
correspond to physical addresses at runtime.

The exact organization of the address space
depends on the operating system and the
programming language being used.
2

Runtime memory management is an
important part of program meaning.
◦ The language run-time system creates & deletes
stack frames, creates & deletes dynamically
allocated heap objects – in cooperation with the
operating system

Whether done automatically (as in Java or
Python), or partially by the programmer (as in
C/C++), dynamic memory management is an
important part of programming language
design.
3


Static: storage requirements are known prior
to run time; lifetime is the entire program
execution
Run-time stack: memory associated with
active functions
◦ Structured as stack frames (activation records)

Heap: dynamically allocated storage; the least
organized and most dynamic storage area
4



Simplest type of memory to manage.
Consists of anything that can be completely
determined at compile time; e.g., global
variables, constants (perhaps), code.
Characteristics:
◦ Storage requirements known prior to execution
◦ Size of static storage area is constant throughout
execution
5




The stack is a contiguous memory region that
grows and shrinks as a program runs.
Its purpose: to support method calls
It grows (storage is allocated) when the
activation record (or stack frame) is pushed
on the stack at the time a method is called
(activated).
It shrinks when the method completes and
storage is de-allocated.
6


In block structured languages such as C or
C++, a new activation record will be created
if variables are declared in an enclosed
block
The activation record at the top of the stack
represents the current local scope.
◦ The static pointer points to the enclosing block,
which represents the non-local scope.
7


Non-local data is usually global data but if
there are nested blocks the static pointer of
an inner block would point to the outer
enclosing block or method.
Static pointers in the activation record are
the run-time equivalent to the stack of
symbol tables.
◦ If a variable is non-local, trace the static pointer
chain to find its location on the stack.
8
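The static-pointer-chain lookup described above can be sketched with a small hypothetical Python model (class and field names are illustrative, not from any real runtime):

```python
# Hypothetical model of non-local variable lookup via static links.
# Each activation record holds its local variables and a static link
# to the record for the lexically enclosing scope.

class ActivationRecord:
    def __init__(self, locals_, static_link=None):
        self.locals = locals_           # name -> value for this scope
        self.static_link = static_link  # enclosing scope's record, or None

def lookup(record, name):
    """Follow the static-pointer chain until the name is found."""
    while record is not None:
        if name in record.locals:
            return record.locals[name]
        record = record.static_link
    raise NameError(name)

globals_ = ActivationRecord({"g": 10})
outer = ActivationRecord({"x": 1}, static_link=globals_)
inner = ActivationRecord({"y": 2}, static_link=outer)

print(lookup(inner, "y"))  # found locally
print(lookup(inner, "x"))  # found one static link up
print(lookup(inner, "g"))  # found in the global record
```

This mirrors the "stack of symbol tables" idea: the chain is traversed at run time exactly as the compiler traverses nested scopes.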



The stack frame has storage for local
variables, parameters, and return linkage.
The size and structure of a stack frame is
known at compile time, but actual contents
and time of allocation is unknown until
runtime.
How is variable lifetime connected to the
structure of the stack?
9


Heap objects are allocated/deallocated
dynamically as the program runs (not
associated with specific event such as
function entry/exit).
The kind of data found on the heap
depends on the language
◦ Strings, dynamic arrays, objects, and linked
structures are typically located here.
◦ Java and C/C++ have different policies.

Heap data is accessed from a variable on
the stack – either a pointer or a reference
variable.
10



Special operations (e.g., malloc, new) may
be needed to allocate heap storage.
When a program deallocates storage (free,
delete) the space is returned to the heap to
be re-used.
Space is allocated in variable sized blocks, so
deallocation may leave “holes” in the heap
(fragmentation).
◦ Compare to deallocation of stack storage
11

Some languages (e.g. C, C++) leave heap
storage deallocation to the programmer
◦ delete

Others (e.g., Java, Perl, Python, list-processing languages) employ garbage
collection to reclaim unused heap space.
12
The Structure of Run-Time Memory
Figure 11.1
These two areas grow
towards each other as
program events
require.
13


The following relation must hold:
0 ≤ a ≤ h ≤ n
where a is the top of the stack, h is the beginning of the heap, and n is the last address in memory.
In other words, if the stack top bumps into
the heap, or if the beginning of the heap is
greater than the end of memory, there are
problems!
14

For simplicity, we assume that memory words
in the heap have one of three states:
◦ Unused: not allocated to a program yet
◦ Undef: allocated, but not yet assigned a value by
the program
◦ Contains some actual value
15

new returns the start address of a block of k
words of unused heap storage and changes the
state of the words from unused to undef.
◦ where k is the number of words of storage needed;
e.g., suppose a Java class Point has data members
x,y,z which are floats.
◦ If floats require 4 bytes of storage, then
Point firstCoord = new Point();
calls for 3 × 4 bytes (at least) to be allocated and
initialized to some predetermined state.
16


Heap overflow occurs when a call to new
occurs and the heap does not have a
contiguous block of k unused words
So new either fails, in the case of heap
overflow, or returns a pointer to the new
block
17



delete returns a block of storage to the heap
The status of the returned words are returned
to unused, and are available to be allocated in
response to a future new call.
One cause of heap overflow is a failure on the
part of the program to return unused storage.
18
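The behavior of new and delete over a heap of words can be sketched with a hypothetical word-addressed model, using the three word states from the slides (sizes and helper names are illustrative):

```python
# Sketch of new/delete over a word-addressed heap model. Each word is
# "unused", "undef" (allocated, no value yet), or holds a value.

heap = ["unused"] * 16

def new(k):
    """Return the start address of k contiguous unused words."""
    run = 0
    for addr in range(len(heap)):
        run = run + 1 if heap[addr] == "unused" else 0
        if run == k:
            start = addr - k + 1
            for a in range(start, start + k):
                heap[a] = "undef"       # allocated but not yet assigned
            return start
    raise MemoryError("heap overflow")  # no contiguous block of k words

def delete(start, k):
    for a in range(start, start + k):
        heap[a] = "unused"              # available for a future new call

p = new(5)
print(p, heap[p])   # 0 undef
delete(p, 5)
print(heap[p])      # unused
```

Note that new fails (heap overflow) only when no *contiguous* run of k unused words exists, even if enough unused words exist in total.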
The new(5) Heap Allocation Function Call: Before and After
Figure 11.2
A before and after view of the heap. The “after” shows the
effect of an operation requesting a size-5 block. (Note
difference between “undef” and “unused”.) Deallocation
reverses the process.
19


Heap space isn’t necessarily allocated and
deallocated from one end (like the stack)
because the memory is not allocated and
deallocated in a predictable (first-in, first-out or last-in, first-out) order.
As a result, the location of the specific
memory cells depends on what is available at
the time of the request.
20



The memory manager can adopt either a
first-fit or best-fit policy.
Free list = a list of all the free space on the
heap: 4 bytes, 32 bytes, 1024 bytes, 16
bytes, …
A request for 14 bytes could be satisfied
◦ First-fit: from the 32-byte block
◦ Best-fit: from the 16 byte block
21
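The two policies can be compared with a small sketch (hypothetical helper functions, using the free list from the slide):

```python
# Sketch of first-fit vs. best-fit selection from a free list.
# free_list entries are block sizes; both policies return the index
# of the block chosen, or None on failure.

def first_fit(free_list, request):
    for i, size in enumerate(free_list):
        if size >= request:
            return i          # first block big enough
    return None               # heap overflow

def best_fit(free_list, request):
    candidates = [i for i, size in enumerate(free_list) if size >= request]
    if not candidates:
        return None
    return min(candidates, key=lambda i: free_list[i])  # tightest fit

free_list = [4, 32, 1024, 16]    # block sizes from the slide
print(first_fit(free_list, 14))  # index 1: the 32-byte block
print(best_fit(free_list, 14))   # index 3: the 16-byte block
```

First-fit is faster per request; best-fit leaves smaller leftover fragments but must scan the whole list.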


The view of a process address space as a
contiguous set of bytes consisting of static,
stack, and heap storage, is a view of the
logical (virtual) address space.
The physical address space is managed by
the operating system, and may not
resemble this view at all.
22



The language is responsible for assigning
logical memory
OS is responsible for mapping memory to
physical memory and deciding how much
physical memory a program can have at a
time.
A compiler’s logical addresses are relative to
the start of the program.
◦ Logical addresses = virtual addresses
23




The value of a pointer variable is an address.
Memory that is accessed through a pointer is
dynamically allocated in the heap
The pointer variable is on the stack and holds
the address of the object pointed to, which is
in the heap.
Java doesn’t have explicit pointers, but
reference types are represented by their
addresses and their storage is also allocated
on the heap
◦ the reference is on the stack
24

In addition to simple variables (ints, floats,
etc.) most imperative languages support
structured data types.
◦ Arrays: “[finite] ordered sequences of values that all
share the same type”
◦ Records (structs): “finite collections of values that
have different types”
◦ Lists: A list data structure has different features
depending on the language but in general it is an
ordered sequence of values, possibly of different
types, possibly accessible through indexes or in the
case of linked lists only by list traversals using
pointers or references.
25


In Java, arrays are always allocated dynamically
from heap memory.
In some other languages (e.g., C++, C)
◦ Globally defined arrays - static memory.
◦ Local (to a function) arrays - stack storage.
◦ Dynamically allocated arrays - heap storage.

Dynamically allocated arrays also have storage
on the stack – a reference (pointer) to the heap
block that holds the array.
26

Typical Java array declarations:
◦ int[] arr = new int[5];
◦ float[][] arr1 = new float [10][5];
◦ Object[] arr2 = new Object[100];

Typical C/C++ array declarations
◦ int arr[5];
◦ float arr1[10][15];
◦ int *intPtr;
  intPtr = new int[5]; // C++ only; C would use malloc
27



Consider the declaration
int A[n];
Since array size isn’t known at compile
time, storage for the array can’t be
allocated in static storage or on the runtime stack.
The stack contains the dope vector for the
array, including a pointer to its base
address, and the heap holds the array
values, in contiguous locations. See Figure
11.3, page 266
28

The dope vector has information needed to
interpret array references:
◦ Array base address (a pointer to heap data)
◦ Array size (number of elements)
for multi-dimensioned arrays, size of each
dimension
◦ Element type (which indicates the amount of
storage required for each element)

For dynamically allocated arrays, this
information must be available at runtime.
29
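A dope vector might be modeled as below. This is a hypothetical sketch using 0-based, row-major indexing with assumed element sizes; it differs from the 1-based arithmetic of Meaning Rule 11.3, and real layouts vary by language:

```python
# Hypothetical dope vector for a dynamically allocated array; fields
# follow the slide: base address, size of each dimension, element type.

ELEM_BYTES = {"int": 4, "float": 4, "double": 8}  # assumed sizes

def make_dope_vector(base, sizes, elem_type):
    return {"base": base, "sizes": sizes, "type": elem_type}

def elem_addr(dv, indexes):
    """Row-major heap address of a (0-based) multi-dimensional index."""
    offset = 0
    for size, idx in zip(dv["sizes"], indexes):
        assert 0 <= idx < size, "index out of range"
        offset = offset * size + idx
    return dv["base"] + offset * ELEM_BYTES[dv["type"]]

dv = make_dope_vector(base=1000, sizes=[10, 5], elem_type="float")
print(elem_addr(dv, [2, 3]))   # 1000 + (2*5 + 3) * 4 = 1052
```

Everything a reference like arr1[2][3] needs at run time (base pointer, dimension sizes, element size) comes from the dope vector on the stack.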


The Meaning Rule on page 266 describes
the semantics of a 1-d array declaration ad
in Clite:
There are 4 parts to the rule:
◦ Compute addr(ad[0]) = new(ad.size)
◦ Push addr(ad[0]) onto the stack
◦ Push ad.size onto the stack (for bounds checking)
◦ Push ad.type onto the stack (for type checking)
30
Allocation of Stack and Heap
Space for Array A Figure 11.3
31
Meaning Rule 11.3 The meaning of an array
Assignment as is (assumes element size = 1):
1. Compute addr(ad[ar.index]) = addr(ad[0]) + (ar.index − 1)
2. If addr(ad[0]) ≤ addr(ad[ar.index]) < addr(ad[0]) + ad.size,
   then assign the value of as.source to addr(ad[ar.index]) (the target)
3. Otherwise, signal an index-out-of-range error.
32
The assignment A[5]=3 changes the value at
heap address addr(A[0])+4 to 3, since
ar.index=5 and addr(A[5])=addr(A[0])+4.
This assumes that the size of an int is one
word.
33
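The rule's arithmetic can be checked with a small sketch (assumed addresses; element size of one word, and 1-based indexing as in the slide's example):

```python
# Worked sketch of Meaning Rule 11.3 with element size = 1 word.
# addr0 is addr(A[0]); index i maps to addr0 + (i - 1), so A[5]
# lands at addr(A[0]) + 4, as in the slide.

def element_addr(addr0, size, index):
    target = addr0 + (index - 1)
    if addr0 <= target < addr0 + size:   # bounds check from the rule
        return target
    raise IndexError("index out of range")

addr0 = 100   # assumed heap address of A[0]
size = 10     # A has 10 one-word elements
print(element_addr(addr0, size, 5))   # 104 = addr(A[0]) + 4
```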

In addition to dynamically allocated
arrays, C/C++ support
◦ static (globally defined) arrays
◦ fixed stack-dynamic arrays
 Arrays declared in functions are allocated storage on
the stack, just like other local variables.
 Index range and element type are fixed at compile
time
34




The increasing popularity of OO programming
has meant more emphasis on heap storage
management.
Active objects: can be accessed through a
pointer or reference located on the stack.
Inactive objects: blocks that cannot be
accessed; no reference exists.
(Accessible and inaccessible may be more
descriptive.)
35


Garbage: any block of heap memory that
cannot be accessed by the program; i.e., there
is no stack pointer to the block; but the
runtime system thinks it is still in use.
Garbage is created in several ways:
◦ A function ends without returning the space
allocated to a local array or other dynamic variable.
The pointer (dope vector) is gone.
◦ A node is deleted from a linked data structure, but
isn’t freed
◦ …
36



A second type of problem can occur when a
program assigns more than one pointer to a
block of heap memory
The block may be deleted and one of the
pointers set to null, but the other pointers still
exist.
If the runtime system reassigns the memory to
another object, the original pointers pose a
danger.
37



A dangling pointer (or dangling reference,
or widow) is a pointer (reference) that still
contains the address of heap space that has
been deallocated (returned to the free list).
An orphan (garbage) is a block of allocated
heap memory that is no longer accessible
through any pointer.
A memory leak is a gradual loss of available
memory due to the creation of garbage.
38
Consider this code:
class node {
  int value;
  node next;
}
. . .
node p, q;        // p & q are on the
p = new node();   // stack but point to
q = new node();   // the heap
. . .
q = p;
delete(p);




The statement q = p;
creates a memory leak.
The node originally
pointed to by q is no
longer accessible – it’s
an orphan (garbage).
Now, add the
statement delete(p);
The pointer p is
correctly set to null,
but q is now a
dangling pointer (or
widow)
39
Creating Widows and Orphans: A Simple Example
Figure 11.4
(a): after new(p); new(q); (b): after q = p;
(c): after delete(p); q still points to a location in the heap,
which could be allocated to another request in the future.
The node originally pointed to by q is now garbage.
40
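The orphan/widow sequence of Figure 11.4 can be simulated with a toy heap model (addresses and helper names are hypothetical; Python has no real pointers):

```python
# Toy simulation of orphans and widows. The heap is a dict from
# addresses to values; "pointers" are just stored addresses.

heap = {}
next_addr = [0]

def new(value):
    addr = next_addr[0]; next_addr[0] += 1
    heap[addr] = value
    return addr

def delete(addr):
    heap.pop(addr, None)   # return the block; pointers are untouched

p = new("node-p")          # address 0
q = new("node-q")          # address 1

q = p                      # q's old block (address 1) is now an orphan:
assert 1 in heap           # still allocated, but nothing refers to it

delete(p)
p = None                   # p is reset, but q still holds the freed
print(q in heap)           # address: False -> q is a dangling pointer
```

The orphan is a memory leak (space held but unreachable); the widow is the opposite hazard (a live name for dead space).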
Variables contain references to data values
[Figure: A references the value 5; after A = A * 2.5, A references a new float object and the 5 becomes unreferenced; after A = "cat", A references the string "cat".]
Python may allocate new storage with each assignment, so it
handles memory management automatically. It will create
new objects and store them in memory; it will also execute
garbage collection algorithms to reclaim any inaccessible
memory locations.
41



Memory management in programming languages
binds (logical) addresses to instructions and data.
◦ The memory accessible to a program is its address
space, represented as a set of values {0, 1, …, n}.
Three types of storage
◦ Static
◦ Stack
◦ Heap
Stack and heap are managed by the runtime system;
◦ Stack: structured, contains activation records
◦ Heap: less structured: holds dynamically allocated
variables
42


Problems with heap storage:
◦ Memory leaks (garbage): failure to free storage when
pointers (references) are reassigned
◦ Dangling pointers: when storage is freed, but references to
the storage still exist.
Two schools of thought
◦ Programmer takes care of memory management
◦ The language’s runtime system takes care of it
◦ Memory management = allocating & deallocating memory
when the program no longer needs it.
 C/C++: programmer allocates, deallocates
 Java: language handles deallocation, some allocation
 Python: language does it all
43


All inaccessible blocks of storage are
identified and returned to the free list.
The heap may also be compacted at this time:
allocated space is compressed into one end
of the heap, leaving all free space in a large
block at the other end.
44

C & C++ leave it to the programmer – if an unused
block of storage isn’t explicitly freed by the
program, it becomes garbage.
◦ You can get C++ garbage collectors, but they aren’t
standard

Java, Python, Perl, (and other scripting languages)
are examples of languages with garbage collection
◦ Python, etc. also automatic allocation: no need for “new”
statements

Garbage collection was pioneered by languages like
Lisp, which constantly creates and destroys linked
lists.
45

There are three major approaches to
automating the process:
◦ Reference counting, Mark-sweep, Copy collection

All have the same basic format: determine
the heap nodes that are accessible (directly
or indirectly) and get rid of everything else.
◦ A node is directly accessible if a global or stack
variable points to it (has a reference to it)
◦ A node is indirectly accessible if it can be reached
through a chain of pointers that originates on the
stack or in global memory
46



Initially, the heap is structured as a linked list
(free list) of nodes.
Each node has a reference count field; initially
0.
When a block is allocated it’s removed from
the free list and its reference count is set to 1.
◦ The address of the block is assigned to a pointer or
reference variable on the stack.
47



When another pointer is assigned the
reference count is incremented.
When a pointer is freed (or re-assigned) the
reference count of the block it points to is
decremented.
When a block’s count goes back to zero,
return it to the free list.
◦ If it points to any other nodes, reduce their
reference by one.
48
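The incref/decref behavior described above can be sketched over a simulated heap (all names hypothetical):

```python
# Sketch of reference counting over a tiny simulated heap.
# Each block has a reference count and a list of blocks it points to.

class Block:
    def __init__(self):
        self.count = 0
        self.children = []    # heap blocks this block points to

free_list = []

def incref(b):
    b.count += 1

def decref(b):
    b.count -= 1
    if b.count == 0:
        free_list.append(b)        # returned to the free list
        for child in b.children:   # reduce each child's count by one
            decref(child)

# Build the slides' chain: P and Q -> n1 -> n2 -> n3 -> null
n1, n2, n3 = Block(), Block(), Block()
n1.children = [n2]; n2.children = [n3]
incref(n1); incref(n1)   # P and Q both reference n1
incref(n2); incref(n3)   # heap-internal references

decref(n1)               # P = null
decref(n1)               # Q = null: frees n1, then n2, then n3
print(len(free_list))    # 3
```

Note how freeing one block can cascade: decrementing n1 to zero triggers decrements of everything it points to.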
[Figure annotations: the reference count for each node would be 1; the reference count for the top node would be 2, for the bottom node, 0.]
49
Simple Example using Ref. Count
[Figure: P and Q both point to the first node (reference count 2); it points to a second node (count 1), which points to a third node (count 1), whose next field is null.]
“P = null” reduces the reference count of the first node to 1
“Q = null” reduces the reference count of the first node to 0, which
triggers the reduction of the reference count in node 2 to 0,
recursively reduces the ref. count in node 3 to 0, and then returns
all three nodes to the free list.
50
Node Structure and Example Heap for Reference Counting
Figure 11.5
•There’s a block at the bottom with ref. count = 0. What does
this represent?
•What would happen if delete is performed on p and q?
51
Node Structure and Example Heap for Reference Counting
Figure 11.5
• Suppose the instruction p→next = null is executed?
• the nodes on the right form an unreachable circular list
52


Advantage: the algorithm is performed
whenever there is an assignment or other
heap action. Overhead is distributed over
program lifetime
Disadvantages are:
◦ Can’t detect inaccessible circular lists.
◦ Some extra overhead due to reference counts
(storage and time).
53


Runs when the heap is full (free list is empty
or cannot satisfy a request).
Two-pass process:
◦ Pass 1: All active references on the stack are
followed and the blocks they point to are marked
(using a special mark bit set to 1).
◦ Pass 2: The entire heap is swept, looking for
unmarked blocks, which are then returned to the
free list. At the same time, the mark bits are turned
off (set to 0).
54
Mark(R): // R is a stack reference
  if (R.MB == 0) {
    R.MB = 1;
    if (R.next != null)
      Mark(R.next);
  }
All reachable nodes are marked (the MB test also stops the recursion on cycles).
Starts in the stack, moves to the heap.
55
Sweep():
  i = h; // h = first heap address
  while (i <= n) {
    if (i.MB == 0)
      free(i); // add node i to the free list
    else
      i.MB = 0;
    i++;
  }
Operates only on the heap.
56
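The two passes can be combined into a small simulation (a hypothetical model; field names follow the pseudocode):

```python
# Sketch of the two-pass mark-sweep algorithm over a simulated heap.

class Node:
    def __init__(self):
        self.mb = 0        # mark bit
        self.next = None   # single outgoing pointer, as in the slides

def mark(r):
    """Pass 1: follow references from the stack, marking reachable nodes."""
    if r is not None and r.mb == 0:
        r.mb = 1
        mark(r.next)

def sweep(heap):
    """Pass 2: free unmarked nodes; reset mark bits on the rest."""
    free = []
    for node in heap:
        if node.mb == 0:
            free.append(node)   # unreachable: add to the free list
        else:
            node.mb = 0         # reachable: clear for the next collection
    return free

a, b, c = Node(), Node(), Node()
a.next = b                  # a -> b is reachable; c is garbage
stack_refs = [a]

for r in stack_refs:
    mark(r)
free = sweep([a, b, c])
print(len(free))            # 1: only c was unreachable
```

Unlike reference counting, this correctly reclaims unreachable cycles, since marking starts only from the stack.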
Node Structure and Example for Mark-Sweep Algorithm
Figure 11.6
Before the mark-sweep algorithm begins
57
Heap after Pass I of Mark-Sweep
Figure 11.7
After the first (mark) pass, accessible nodes are
marked, others aren’t
58
Heap after Pass II of Mark-Sweep
Figure 11.8
After the 2nd (sweep) pass: All inaccessible nodes are
linked into a free list; all accessible nodes have their mark
bits returned to 0
59

Advantages:
◦ It may never run (it only runs when the heap is full).
◦ It finds and frees all unused memory blocks.

Disadvantage: It is very intensive when it does
run. Long, unpredictable delays are
unacceptable for some applications.
60




Like Mark & Sweep (M&S), it runs when the
heap is full.
Possibly faster than M&S because it only
makes one pass through the heap, but the
Copying part slows it down.
No reference count or mark bit needed.
The heap is divided into two halves,
from_space and to_space.
61

While garbage collection isn’t needed,
◦ From_space contains allocated nodes and nodes
on the free list.
◦ To_space is not used.

When from_space is full, “flip” the two
spaces, and pack all accessible nodes in the
old from_space into the new from_space.
Any left-over space is the free space.
◦ (A node is accessible if it can be reached from the
stack, or from another node in the heap)
62
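The flip-and-copy step can be sketched as below. This is a recursive variant for brevity; real collectors often use Cheney's breadth-first scan instead. All names are illustrative:

```python
# Sketch of copy collection: reachable objects are copied from
# from_space into to_space, which compacts them; anything left
# behind (no forwarding address) is garbage.

class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)   # other heap objects this one references
        self.forward = None      # forwarding address once copied

def copy(obj, to_space):
    if obj.forward is None:      # not copied yet
        clone = Obj(obj.name)
        obj.forward = clone
        to_space.append(clone)   # packed at the end of to_space
        clone.refs = [copy(r, to_space) for r in obj.refs]
    return obj.forward           # the updated reference

x = Obj("x"); y = Obj("y", [x]); garbage = Obj("dead")
from_space = [x, y, garbage]
stack_roots = [y]

to_space = []
new_roots = [copy(r, to_space) for r in stack_roots]
print([o.name for o in to_space])   # ['y', 'x'] -- packed, no 'dead'
```

The forwarding address handles shared objects: a block reached twice is copied only once, and both references end up pointing to the same clone.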
Initial Heap Organization for Copy Collection
Figure 11.9
63
Result of a Copy Collection Activation
Figure 11.10
After “flipping” and repacking into
the former to_space.
(The accessible nodes are packed, orphans
are returned to the free_list, and the two
halves reverse roles.)
64



When an active object is copied to the
to_space, update any references contained in
the objects (addresses have changed)
When copying is completed, the new to_space
contains only active objects, and they are
tightly packed into the space.
Consequently, the heap is automatically
compacted (defragmented).
65


Automatic compaction is the main advantage of this method when compared to mark-and-sweep.
Disadvantages:
◦ All active objects must be copied: may take a lot of
time (not necessarily as much as the two-pass
algorithm).
◦ Requires twice as much space for the heap
66

If r, the ratio of active heap blocks to total heap size, is significantly less than 1/2, copy collection is more efficient
◦ Efficiency = amount of memory reclaimed per unit of time
As r approaches 1/2, mark-sweep becomes more efficient.
Based on a study reported by Jones and Lins (1996).
67





Different languages and implementations will probably use some variation or combination of the above strategies.
Java runs garbage collection as a background
process when demand on the system is low,
hoping that the heap will never be full.
Java also allows programmers to explicitly
request garbage collection, without waiting for
the system to do it automatically.
Functional languages (Lisp, Scheme, …) also
have built-in garbage collectors
C/C++ do not.
68

Some commercial applications divide nodes
into categories according to how long they’ve
been in memory
◦ The assumption is that long-resident nodes are
likely to be permanent – don’t examine them
◦ New nodes are less likely to be permanent –
consider them first
◦ There may be several “aging” levels
69