Garbage Collection Techniques to Enhance Memory Locality in Java Programs

James Smith and Min Zhong
Java is a popular new object oriented programming language. Dynamic memory allocation is natural in object oriented programming languages: objects are created (have memory space allocated for them) at run time, are used, and then die (are no longer referenced, so their space is no longer in use). A feature of Java is that memory is periodically "Garbage Collected" (GC) -- that is, memory space holding dead objects is reclaimed for use by newly created objects. High performance computers use memory hierarchies that perform best when objects accessed close together in time are also placed close together in memory. This property is a form of "locality", and good memory locality is critical for high performance. Dynamic memory allocation and GC often have negative effects on the placement of data in memory and, therefore, on its memory locality properties. We plan to study how conventional memory allocation and GC schemes affect system performance using a tool from Sun Microsystems (the original Java developers) that provides a dynamic trace of object activity as a program runs, together with software we will develop that models memory allocation/GC algorithms and the memory hierarchy. By observing the performance trends and the objects' characteristics, we can characterize the optimum performance of an ideal algorithm and estimate how much performance improvement can be achieved. We will then be able to propose new memory allocation/GC algorithms that could enhance memory locality and improve system performance.
Java is a new object oriented programming language that is coming into widespread use because it
simplifies software development, is platform-independent, and contains built-in security features. Memory
allocation in Java is dynamic in nature -- a large number of data objects are created, used, then die (are no
longer referenced) throughout the execution of a Java program.
One attractive feature of Java is that the run time system (not the programmer) is responsible for all
memory allocation and management. A key part of memory management is the use of automatic "Garbage
Collection" (GC). That is, memory space holding dead objects is re-claimed for use by newly created objects
without the need for any programmer intervention.
Dynamic memory allocation and garbage collection can have detrimental performance effects, however, because the placement of objects in memory can occur in a rather non-systematic, unpredictable way. At the same time, most computers depend on good memory data placement for good performance. In particular, high performance computers use memory hierarchies which perform better when data (objects) accessed close together in time are also placed close together in memory -- this is a form of data "locality" that is exploited by cache memories and paged virtual memory systems. On modern systems, memories are organized in a hierarchy, from the small, fast memories at the top of the pyramid to the larger, slower ones at the bottom, i.e. from caches, to main memory, to the even slower hard disks. The small, expensive memory at the top gives the processor fast data access, while the bottom of the pyramid provides cheap, large capacity at the cost of speed. Because the physical resources at the higher levels of the hierarchy are limited, organizing the placement of data in the fast memory -- that is, preserving memory locality -- is essential to high performance.
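To make the effect of locality concrete, the small Java program below (our own illustration, not part of any existing tool) sums the same two-dimensional array twice: the row-order traversal visits consecutive memory locations and exploits the cache hierarchy, while the column-order traversal strides through memory and does not. On typical hardware the second loop is noticeably slower even though it performs the same arithmetic.

public class LocalityDemo {
    static final int N = 2048;
    static long[][] a = new long[N][N];

    static long rowMajorSum() {            // consecutive addresses: good locality
        long sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

    static long columnMajorSum() {         // strided addresses: poor locality
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];
        return sum;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        rowMajorSum();
        long t1 = System.nanoTime();
        columnMajorSum();
        long t2 = System.nanoTime();
        System.out.printf("row-major: %d ms, column-major: %d ms%n",
                          (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}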
We plan to study the effects of memory allocation and GC on memory locality in Java programs and
to propose new allocation/GC algorithms that will enhance data locality and increase overall system
performance.
A number of GC algorithms have been proposed in the literature. A classical algorithm is "Reference Counting" [Collins, 1960], which maintains in each allocated block of memory a count of the pointers pointing to that block. The block is freed when the count drops to zero. This algorithm degrades the user program's performance since the count must be updated on every pointer assignment, and it takes up space in each block to hold the count. Its biggest flaw is that it does not detect cyclic garbage.
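As a concrete illustration of the bookkeeping involved, the sketch below shows a minimal reference-counted cell. The class and method names are ours, and Java's own collectors do not use this scheme; it is only meant to make the cost and the cyclic-garbage weakness visible.

// Illustrative reference-counting cell: the count is adjusted on every pointer
// write, and the payload is dropped when the count reaches zero. Two cells that
// point at each other keep their counts above zero forever, so cyclic garbage
// is never reclaimed -- the flaw noted above.
class RefCountedCell {
    private Object value;
    private int refCount = 0;

    RefCountedCell(Object value) { this.value = value; }

    void addRef() { refCount++; }          // a new pointer now refers to this cell

    void release() {                       // a pointer to this cell was overwritten or dropped
        if (--refCount == 0) {
            value = null;                  // "free" the cell: discard its payload
        }
    }

    int count() { return refCount; }
}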
An often implemented classical algorithm is the "Mark and Sweep" algorithm [McCarthy, 1960]. The algorithm traverses the whole dynamic memory space, or heap, marks all the live objects, and then "sweeps" all the unmarked "garbage" back onto the free list. The algorithm handles cyclic pointer structures nicely. But once GC is started, it has to trace all the live objects non-stop in order to complete the marking phase. This can slow down other processes significantly and is probably not desirable in real-time systems or wherever response time is important (this includes many Java applications). It also bears a high cost because every object has to be examined during the "sweep" phase -- the workload is proportional to the heap size rather than to the number of live objects. Finally, data in a mark-swept heap tend to be more fragmented, which may enlarge the program's working set, hurt locality, and trigger more cache misses and page faults in the virtual memory system.
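The sketch below illustrates the two phases on a toy object graph (our own simplification, not the JVM's implementation); note that the sweep loop visits every object in the heap, which is why its cost is proportional to heap size.

import java.util.*;

// Minimal mark-and-sweep over a toy object graph: mark everything reachable
// from the roots, then sweep the entire heap, returning unmarked nodes to a free list.
class MarkSweepHeap {
    static class Node {
        boolean marked;
        List<Node> children = new ArrayList<>();
    }

    List<Node> heap = new ArrayList<>();      // every allocated node
    List<Node> roots = new ArrayList<>();     // program-visible references
    Deque<Node> freeList = new ArrayDeque<>();

    void mark(Node n) {
        if (n == null || n.marked) return;
        n.marked = true;
        for (Node child : n.children) mark(child);
    }

    void collect() {
        for (Node root : roots) mark(root);   // mark phase: trace live objects only
        Iterator<Node> it = heap.iterator();
        while (it.hasNext()) {                // sweep phase: examines the whole heap
            Node n = it.next();
            if (n.marked) {
                n.marked = false;             // clear the mark for the next cycle
            } else {
                it.remove();
                freeList.push(n);             // reclaim the dead node
            }
        }
    }
}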
2
Another classic GC method is the copying algorithm [Minsky, 1963]. The heap (storage where
dynamic objects are kept) is divided into two sub-spaces: one containing active objects and the other
"garbage". The collector copies all the data from one space to another and then reverses the roles of the
two spaces. In the process of copying, all live objects can be compacted into the bottom of the new space,
mitigating the fragmentation problem. The immediate cost, however, is that it requires twice the space of a non-copying collector. Unless physical memory is large enough, this algorithm suffers more page faults than its non-copying counterparts. And after all data have been moved to the new space, the data cache will have been swept clean and will suffer misses when computation resumes.
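The following is a minimal sketch of a semispace copying collector over a toy object graph (our own illustration, using a simple recursive copy): the forwarding pointer prevents a live object from being copied twice, and the live objects end up packed together in the new space.

import java.util.*;

// Toy semispace copying collector: copy everything reachable from the roots into
// to-space, then flip the roles of the two spaces. Dead objects are simply left behind.
class CopyingHeap {
    static class Obj {
        Obj[] fields;
        Obj forwarded;                                 // set once this object has been copied
        Obj(int nFields) { fields = new Obj[nFields]; }
    }

    List<Obj> fromSpace = new ArrayList<>();
    List<Obj> toSpace   = new ArrayList<>();
    List<Obj> roots     = new ArrayList<>();

    Obj copy(Obj o) {
        if (o == null) return null;
        if (o.forwarded != null) return o.forwarded;   // already moved: reuse forwarding pointer
        Obj newObj = new Obj(o.fields.length);
        o.forwarded = newObj;
        toSpace.add(newObj);                           // live objects end up compacted in to-space
        for (int i = 0; i < o.fields.length; i++)
            newObj.fields[i] = copy(o.fields[i]);      // copy reachable children, fixing up pointers
        return newObj;
    }

    void collect() {
        toSpace.clear();
        for (int i = 0; i < roots.size(); i++)
            roots.set(i, copy(roots.get(i)));          // update root pointers to the new copies
        List<Obj> tmp = fromSpace;                     // flip the roles of the two spaces
        fromSpace = toSpace;
        toSpace = tmp;
    }
}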
A final method, and the one we will use as a starting point is "generational" garbage collection
[Appel, 1989]. A generational garbage collector not only reclaims memory efficiently but also compacts
memory in a way that enhances locality. This algorithm divides memory into spaces. When one space is
garbage collected, the live objects are moved to another space. It has been observed that some objects are
long-lived and some are short-lived. By carefully arranging the objects, objects of similar ages can be kept in
the same space, allowing more frequent, smaller-scale GCs on certain spaces. This can preserve cache locality better than the plain copying algorithm. Its workload is proportional to the number of live objects in a subspace rather than to the entire memory space, as is the case for "Mark and Sweep".
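The toy sketch below illustrates the idea with two generations (our own simplification; remembered sets for old-to-young pointers are omitted): a minor collection traces only the objects reachable in the nursery and promotes the survivors to the old generation, so its cost tracks the live data in the nursery rather than the whole heap.

import java.util.*;

// Minimal two-generation heap in the spirit of [Appel, 1989]: allocate in the
// nursery, promote survivors on a minor collection, then reuse the nursery.
class GenerationalHeap {
    static class Obj {
        List<Obj> fields = new ArrayList<>();
        boolean promoted;                    // true once the object lives in the old generation
    }

    List<Obj> nursery = new ArrayList<>();   // young generation: all allocation happens here
    List<Obj> oldGen  = new ArrayList<>();   // tenured survivors
    List<Obj> roots   = new ArrayList<>();

    Obj allocate() {
        Obj o = new Obj();
        nursery.add(o);
        return o;
    }

    private void promote(Obj o) {
        if (o == null || o.promoted) return;
        o.promoted = true;
        oldGen.add(o);                       // survivor moves to the old generation
        for (Obj f : o.fields) promote(f);
    }

    void minorCollect() {
        for (Obj r : roots) promote(r);      // trace only live objects; dead nursery objects are never touched
        nursery.clear();                     // reuse the nursery for new allocation
    }
}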
To support our proposed research, we will be given access to a proprietary tool developed at Sun
Microsystems (the original Java developers). This tool provides a dynamic trace of object activity as a Java
program runs. We will develop software that models memory allocation/GC algorithms and memory
hierarchies. We will then combine the Sun tracing tool and our memory model with a performance analyzing
tool that will process the traces to provide data on locality and system performance. The figure below shows the proposed structure of our study. With the help of the tool and our model, we will first be able to
study the impact of the conventional GC schemes on memory locality and on actual system performance
when running Java programs.
[Figure: proposed framework for the study -- an Object Allocator and Garbage Collector acting on the modeled Memory, driven by the Object Tracer, with a Performance Analyzer processing the results.]
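The sketch below is a hypothetical illustration of how these components might fit together; the trace-record fields and the interfaces are our own assumptions for modeling purposes, not the actual format produced by Sun's tool.

import java.util.*;

// Assumed trace record and component interfaces for the trace-driven simulation.
class TraceEvent {
    enum Kind { ALLOC, ACCESS, DEATH }
    Kind kind;
    long objectId;
    int sizeBytes;                            // meaningful for ALLOC events
    long timestamp;
}

interface AllocatorModel { long place(TraceEvent alloc); }         // returns a simulated address
interface CollectorModel { void maybeCollect(long now); }          // may reclaim or move objects
interface HierarchyModel { void access(long address); long misses(); }

class Simulator {
    AllocatorModel allocator;
    CollectorModel collector;
    HierarchyModel hierarchy;
    Map<Long, Long> addressOf = new HashMap<>();                   // object id -> simulated address

    void run(Iterable<TraceEvent> trace) {
        for (TraceEvent e : trace) {
            switch (e.kind) {
                case ALLOC:
                    addressOf.put(e.objectId, allocator.place(e));
                    break;
                case ACCESS:
                    Long addr = addressOf.get(e.objectId);
                    if (addr != null) hierarchy.access(addr);      // drive the cache/VM model
                    break;
                case DEATH:
                    addressOf.remove(e.objectId);
                    break;
            }
            collector.maybeCollect(e.timestamp);
        }
        System.out.println("simulated misses: " + hierarchy.misses());
    }
}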
We will gain insight into object referencing behavior and characteristics in relation to memory
allocation and GC through this initial study. Using these data, we will next characterize the optimum
performance of an ideal GC algorithm, e.g. a generational GC that arranges objects perfectly by their life
spans, or via other criteria, thus enhancing memory locality. Such an ideal algorithm will be based on an
"oracle" and will therefore be unimplementable, but it will give a limit on how much performance improvement
can be achieved. This study will be similar to the way that “Belady's Min” algorithm [Belady, 1966] has been
used for studying replacement algorithms in conventional memory hierarchies.
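For reference, the sketch below shows Belady's MIN on a simple block-reference trace (a standard textbook formulation, written by us only for illustration): with full knowledge of future accesses, the block whose next use is farthest in the future is evicted. Our oracle GC would play the analogous role for object placement and collection decisions.

import java.util.*;

// Belady's MIN replacement: an unimplementable optimum, useful as a lower bound on misses.
class BeladyMin {
    static int misses(int[] trace, int capacity) {
        Set<Integer> cache = new HashSet<>();
        int misses = 0;
        for (int i = 0; i < trace.length; i++) {
            int block = trace[i];
            if (cache.contains(block)) continue;       // hit
            misses++;
            if (cache.size() == capacity) {            // must evict: pick the farthest next use
                int victim = -1, farthest = -1;
                for (int candidate : cache) {
                    int next = Integer.MAX_VALUE;      // never used again = best victim
                    for (int j = i + 1; j < trace.length; j++)
                        if (trace[j] == candidate) { next = j; break; }
                    if (next > farthest) { farthest = next; victim = candidate; }
                }
                cache.remove(victim);
            }
            cache.add(block);
        }
        return misses;
    }

    public static void main(String[] args) {
        int[] trace = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        System.out.println("MIN misses with 3 blocks: " + misses(trace, 3));
    }
}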
Through our study of the characteristics of optimal GC performance, we hope to gain insight that will lead to new implementable algorithms with similar characteristics. Our goal is to define one such algorithm, compare it with the conventional and ideal GC schemes, and measure its performance in terms of locality enhancement and system performance improvement.
As a starting point, we have some ideas that might improve the generational GC algorithm. We
hypothesize that a majority of the objects that are allocated around the same time tend to have similar life
spans. For instance, most objects in an array are likely to be allocated together upon initialization and
discarded together when the array is no longer used. As another example, most local objects declared at the
beginning of a block of statements, or a scope, are likely to die together when they fall out of the scope. If
this hypothesis holds, then we could assume most of the objects in a procedure have a similar life span, and
have generation spaces indexed by stack frames. We also propose another correlation: different types of
objects tend to have different life spans. How accurate these predictions are, and how much this information would actually facilitate effective GC, are among the questions we plan to examine. If these hypotheses are valid, then we could divide the generation spaces more accurately, facilitating a more efficient GC and improving memory locality and system performance.
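One way we might test the first hypothesis against the trace data is sketched below; the record fields and the bucket size are illustrative assumptions on our part. A small spread of lifespans within each allocation-time bucket would support the hypothesis that objects allocated together tend to die together.

import java.util.*;

// Group objects by allocation-time bucket and report how tightly their lifespans cluster.
class LifetimeCorrelation {
    static class ObjectRecord { long allocTime; long deathTime; }

    static void report(List<ObjectRecord> records, long bucketSize) {
        Map<Long, List<Long>> lifespansByBucket = new TreeMap<>();
        for (ObjectRecord r : records) {
            long bucket = r.allocTime / bucketSize;                // objects allocated around the same time
            lifespansByBucket.computeIfAbsent(bucket, b -> new ArrayList<>())
                             .add(r.deathTime - r.allocTime);
        }
        for (Map.Entry<Long, List<Long>> e : lifespansByBucket.entrySet()) {
            List<Long> spans = e.getValue();
            double mean = spans.stream().mapToLong(Long::longValue).average().orElse(0);
            double var = spans.stream()
                              .mapToDouble(s -> (s - mean) * (s - mean))
                              .average().orElse(0);
            // A small standard deviation relative to the mean supports the hypothesis.
            System.out.printf("bucket %d: %d objects, mean lifespan %.0f, stddev %.0f%n",
                              e.getKey(), spans.size(), mean, Math.sqrt(var));
        }
    }
}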
As memory allocation and GC are critical to memory management and locality, we believe this research can make a valuable contribution to improving system performance for Java programs (and possibly other object oriented languages). Equipped with these ideas and supporting tools, we are confident that our research goals of investigating automatic dynamic memory management will be fruitful and achievable within the next year.
[Collins, 1960] George E. Collins. "A Method for Overlapping and Erasure of Lists,"
Communications of the ACM, vol. 3, no. 12, pp. 655-657, December 1960.
[McCarthy, 1960] John McCarthy. "Recursive Functions of Symbolic Expressions and Their Computation by
Machine," Communications of the ACM, vol. 3, pp. 184-195, 1960.
[Minsky, 1963] Marvin L. Minsky. "A Lisp Garbage Collector Algorithm Using Serial Secondary Storage,"
Technical Report Memo 58 (rev.), Project MAC, MIT, Cambridge, MA, December 1963.
[Appel, 1989] Andrew W. Appel. "Simple Generational Garbage Collection and Fast Allocation,"
Software Practice and Experience, vol. 19, no. 2, pp. 171-183, 1989.
[Belady, 1966] L. A. Belady. "A Study of Replacement Algorithms for a Virtual Storage Computer,"
IBM Systems Journal, vol. 5, no. 2, pp. 78-101, 1966.