Garbage Collection Techniques to Enhance Memory Locality in Java Programs

James Smith and Min Zhong

Java is a popular new object-oriented programming language. Dynamic memory allocation is natural in object-oriented programming languages: objects are created (have memory space allocated for them) at run time, are used, and then die (are no longer referenced, so their space is no longer in use). A feature of Java is that memory is periodically "Garbage Collected" (GC) -- that is, memory space holding dead objects is reclaimed for use by newly created objects. High-performance computers use memory hierarchies that perform best when objects accessed close together in time are also placed close together in memory. This property is a form of "locality", and good memory locality is critical for high performance. Dynamic memory allocation and GC often have negative effects on the placement of data in memory and, therefore, on memory locality. We plan to study how conventional memory allocation and GC schemes affect system performance, using a tool from Sun Microsystems (the original Java developers) that provides a dynamic trace of object activity as a program runs, together with software we will develop that models memory allocation/GC algorithms and the memory hierarchy. By observing performance trends and object characteristics, we can characterize the optimum performance of an ideal algorithm and estimate how much performance improvement can be achieved. We will then be able to devise new memory allocation/GC algorithms that enhance memory locality and improve system performance.

Java is a new object-oriented programming language that is coming into widespread use because it simplifies software development, is platform-independent, and contains built-in security features. Memory allocation in Java is dynamic in nature -- a large number of data objects are created, used, and then die (are no longer referenced) throughout the execution of a Java program.
One attractive feature of Java is that the run-time system (not the programmer) is responsible for all memory allocation and management. A key part of memory management is automatic "Garbage Collection" (GC): memory space holding dead objects is reclaimed for use by newly created objects without any programmer intervention. Dynamic memory allocation and garbage collection can have detrimental performance effects, however, because objects can be placed in memory in a rather non-systematic, unpredictable way. In contrast, most computers depend on the placement of data in memory for good performance. In particular, high-performance computers use memory hierarchies that perform better when data (objects) accessed close together in time are also placed close together in memory -- this is a form of data "locality" that is exploited by cache memories and paged virtual memory systems.

On modern systems, memories are organized in a hierarchy, from the small, fast memories at the top of the pyramid to the larger, slower memories at the bottom, i.e., from caches, to main memory, to the even slower hard disks. The small, expensive memory at the top gives the processor fast data access, while the bottom of the pyramid provides cheap, large capacity at the cost of speed. Because capacity at the upper levels of the hierarchy is limited, organizing the placement of data in the fast memory, that is, preserving memory locality, is essential to high performance.

We plan to study the effects of memory allocation and GC on memory locality in Java programs and to propose new allocation/GC algorithms that will enhance data locality and increase overall system performance.

A number of GC algorithms have been proposed in the literature. A classical algorithm is "Reference Counting" [Collins, 1960], which maintains in each allocated block of memory the number of pointers pointing to that block.
The block is freed when the count drops to zero. This algorithm degrades the user program's performance, since the count has to be updated whenever a pointer to the block is created or destroyed, and it takes up space in each block to hold the count. Its biggest flaw is that it does not detect cyclic garbage.

An often implemented classical algorithm is "Mark and Sweep" [McCarthy, 1960]. The algorithm traces the dynamic memory space, or heap, marking all the live objects, and then "sweeps" all the unmarked "garbage" back into the pool of free memory. The algorithm handles cyclic pointer structures nicely. But once GC is started, it has to go through all the live objects non-stop in order to complete the marking phase. This could slow down other processes significantly, and is probably not desirable in real-time systems or wherever response time is important (this includes many Java applications). It also bears a high cost, since all objects have to be examined during the "sweep" phase -- the workload is proportional to the heap size instead of the number of live objects. In addition, data in a mark-swept heap tend to be more fragmented, which may increase the size of the program's working set, degrade locality, and trigger more frequent cache misses and virtual memory page faults.

Another classic GC method is the copying algorithm [Minsky, 1963]. The heap (the storage where dynamic objects are kept) is divided into two sub-spaces: one holds the objects currently in use, and the other is held in reserve. The collector copies all the live data from the first space to the second, leaving the garbage behind, and then reverses the roles of the two spaces. In the process of copying, all live objects can be compacted into the bottom of the new space, mitigating the fragmentation problem. But the immediate cost is that twice as much space is required as for non-copying collectors. Unless the physical memory is large enough, this algorithm suffers more page faults than its non-copying counterparts.
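The copying scheme described above can be illustrated with a minimal sketch. It assumes a toy object model (the hypothetical `Obj` class, with a Python list standing in for the new space) and uses a Cheney-style breadth-first scan, which is one common way to implement copying collection rather than something this proposal specifies. Live objects end up packed together in the new space, and cyclic structures are handled through a forwarding table.

```python
class Obj:
    """Toy heap object (hypothetical): a name plus outgoing pointers."""

    def __init__(self, name, refs=None):
        self.name = name
        self.refs = list(refs or [])


def copy_collect(roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    Returns (new_roots, to_space). Objects not reachable from the roots
    are simply never copied, which is how garbage is reclaimed.
    """
    to_space = []      # live objects, packed contiguously in copy order
    forwarding = {}    # id(old object) -> its copy in to-space

    def forward(obj):
        # Copy an object the first time it is seen; reuse the copy afterwards.
        if id(obj) not in forwarding:
            clone = Obj(obj.name, obj.refs)  # refs still point into from-space
            forwarding[id(obj)] = clone
            to_space.append(clone)
        return forwarding[id(obj)]

    new_roots = [forward(r) for r in roots]

    # Cheney-style scan: walk to-space, redirecting each pointer to the
    # copy of its target (copying the target first if necessary).
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        obj.refs = [forward(child) for child in obj.refs]
        scan += 1
    return new_roots, to_space


# A two-object cycle reachable from the root, plus one unreachable object.
a = Obj("a")
b = Obj("b", [a])
a.refs.append(b)
c = Obj("c", [a])   # garbage: nothing points to c

new_roots, to_space = copy_collect([a])
live_names = [o.name for o in to_space]
```

In this example `live_names` holds only `a` and `b`: the cycle survives intact in the new space, while `c` is left behind in the old one.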
After a copying collector moves all live data to the new space, the data cache will have been swept clean, and the program will suffer cache misses when computation resumes.

A final method, and the one we will use as a starting point, is "generational" garbage collection [Appel, 1989]. A generational garbage collector not only reclaims memory efficiently but also compacts memory in a way that enhances locality. This algorithm divides memory into spaces. When one space is garbage collected, the live objects are moved to another space. It has been observed that some objects are long-lived and some are short-lived. By carefully arranging the objects, objects of similar ages can be kept in the same space, allowing more frequent, small-scale GCs on certain spaces. This could preserve cache locality better than the plain copying algorithm. Its workload is proportional to the number of live objects in a subspace, rather than to the size of the entire memory space as is the case for "Mark and Sweep".

To support our proposed research, we will be given access to a proprietary tool developed at Sun Microsystems (the original Java developers). This tool provides a dynamic trace of object activity as a Java program runs. We will develop software that models memory allocation/GC algorithms and memory hierarchies. We will then combine the Sun tracing tool and our memory model with a performance analysis tool that will process the traces to provide data on locality and system performance. The following figure shows the proposed structure of our study.

[Figure: proposed experimental setup, with components Object Tracer, Object Allocator, Garbage Collector, Memory, and Performance Analyzer]

With the help of the tool and our model, we will first be able to study the impact of conventional GC schemes on memory locality and on actual system performance when running Java programs. Through this initial study, we will gain some insight into object referencing behavior and characteristics in relation to memory allocation and GC.
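The kind of trace-driven locality analysis described above can be sketched in a few lines. The following is a minimal model only, not the Sun tool or the simulator proposed here: it assumes a fully associative LRU cache, and the function name, the cache parameters, and the two object layouts are all hypothetical. It replays the same sequence of object accesses against a compacted layout and a scattered layout and computes the miss rate of each.

```python
from collections import OrderedDict


def miss_rate(addresses, line_size=64, num_lines=4):
    """Replay an address trace through a fully associative LRU cache."""
    cache = OrderedDict()   # cache line number -> True, oldest first
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(addresses) if addresses else 0.0


# Hypothetical trace: eight 16-byte objects, each touched in turn, ten times.
object_trace = list(range(8)) * 10

compact = {obj: obj * 16 for obj in range(8)}      # packed into two lines
scattered = {obj: obj * 4096 for obj in range(8)}  # one object per line

compact_rate = miss_rate([compact[o] for o in object_trace])
scattered_rate = miss_rate([scattered[o] for o in object_trace])
```

Under these assumed parameters the compacted layout misses only on the first touch of each of its two cache lines, while the scattered layout cycles through eight lines in a four-line cache and misses on every access.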
Using these data, we will next characterize the optimum performance of an ideal GC algorithm, e.g., a generational GC that arranges objects perfectly by their life spans or by other criteria, thus enhancing memory locality. Such an ideal algorithm will be based on an "oracle" and will therefore be unimplementable, but it will give a limit on how much performance improvement can be achieved. This study will be similar to the way that "Belady's MIN" algorithm [Belady, 1966] has been used for studying replacement algorithms in conventional memory hierarchies.

Through our study of the characteristics of optimal GC performance, we hope to gain insight that will lead to new implementable algorithms with similar characteristics. Our goal is to define one such algorithm, compare it with the conventional and ideal GC schemes, and measure its performance in terms of locality enhancement and system performance improvement.

As a starting point, we have some ideas that might improve the generational GC algorithm. We hypothesize that a majority of the objects that are allocated around the same time tend to have similar life spans. For instance, most objects in an array are likely to be allocated together upon initialization and discarded together when the array is no longer used. As another example, most local objects declared at the beginning of a block of statements, or scope, are likely to die together when they fall out of scope. If this hypothesis holds, then we could assume that most of the objects in a procedure have a similar life span, and have generation spaces indexed by stack frames. We also propose another correlation: different types of objects tend to have different life spans. How accurate these predictions are, and how much this information would actually facilitate effective GC, are typical of the questions we plan to examine.
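One simple way to test the first hypothesis against a trace is sketched below. It assumes a hypothetical record format of (allocation time, death time) pairs, which is not necessarily what the Sun tool emits, and an illustrative metric chosen for this sketch: the fraction of consecutively allocated objects whose death times fall within a chosen window of each other.

```python
def co_death_fraction(records, window):
    """Fraction of allocation-order neighbors that die within `window`.

    `records` is an iterable of (alloc_time, death_time) pairs
    (a hypothetical trace format).
    """
    ordered = sorted(records)                 # order by allocation time
    pairs = list(zip(ordered, ordered[1:]))   # consecutive allocations
    if not pairs:
        return 0.0
    close = sum(1 for (_, d1), (_, d2) in pairs if abs(d1 - d2) <= window)
    return close / len(pairs)


# Made-up records: three objects that die together and one long-lived object.
records = [(0, 100), (1, 101), (2, 102), (3, 500)]
fraction = co_death_fraction(records, window=5)
```

A value near 1 on real traces would support grouping objects into generation spaces by allocation time; the same measurement could be repeated separately for each object type to examine the second hypothesis.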
If these hypotheses are valid, then we could divide up the generation spaces more accurately, thus facilitating more efficient GC and improving memory locality and system performance.

Because memory allocation and GC are critical to memory management and locality, we think this research would be valuable in improving system performance for Java programs (and possibly for other object-oriented languages). Equipped with ideas and supporting tools, we are confident that our research goals in investigating automatic dynamic memory management will be fruitful and achievable within the next year.

[Collins, 1960] George E. Collins. "A Method for Overlapping and Erasure of Lists," Communications of the ACM, vol.3, no.12, pp.655-657, December 1960.
[McCarthy, 1960] John McCarthy. "Recursive Functions of Symbolic Expressions and Their Computation by Machine," Communications of the ACM, vol.3, pp.184-195, 1960.
[Minsky, 1963] Marvin L. Minsky. "A Lisp Garbage Collector Algorithm Using Serial Secondary Storage," Technical Report Memo 58 (rev.), Project MAC, MIT, Cambridge, MA, December 1963.
[Appel, 1989] Andrew W. Appel. "Simple Generational Garbage Collection and Fast Allocation," Software Practice and Experience, vol.19, no.2, pp.171-183, 1989.
[Belady, 1966] L. A. Belady. "A Study of Replacement Algorithms for a Virtual Storage Computer," IBM Systems Journal, vol.5, no.2, pp.78-101, 1966.