Research Statement
Adam Wick
January 2006

With production servers using Java Virtual Machines to run hundreds of servlets, and Microsoft’s Common Language Runtime providing the basis for much of its and others’ future programming, the line between operating system and language runtime has blurred in recent years. This blurring creates new opportunities in both operating systems and language runtimes, as concepts from one can be applied to the other in order to make programming simpler and systems more reliable.

Programming Languages to Operating Systems

With the advent and popularity of Java, garbage collection has become mainstream. However, garbage collection for C programs has been limited to conservative garbage collection, almost entirely via the Boehm-Demers-Weiser collector. Linking conservative garbage collection to C programs is simple, but conservative garbage collectors lack precise information about the types of data in local variables and in the heap. This imprecision can lead to large and unacceptable memory leaks, as the collector mistakes a pointer for a system root, or mistakes a number for a pointer. [2]

Precise garbage collection, also referred to as accurate or non-conservative garbage collection, has precise information about the types of data in local variables and in the heap. Thus, precise collectors will never leak memory due to mistaken roots or mistaken pointers. Further, the infrastructure for precise collection opens many possibilities for different garbage collection styles, including simple generational collectors, incremental collectors, real-time collectors, and parallel or distributed collectors. To date, however, providing this infrastructure from C or C++ has required intense and tedious effort on the part of C and C++ programmers.
[3]

I have created the Magpie toolset for semiautomatically converting existing C and C++ programs from manual memory allocation (malloc() and free()) or conservative garbage collection to precise garbage collection. Magpie supports a large subset of C programs without modification, and most C programs after a small amount of modification. While the C++ engine is a proof of concept, it supports all C++ grammatical extensions except templates, namespaces and exceptions. This includes arbitrary C++ class hierarchies and overloading. I do not believe that adding templates, namespaces or exceptions would require any further theoretical work, merely some time engineering the software.

I have applied Magpie to several C applications, and it required minimal input from the programmer in those cases. While Magpie does ask the programmer to confirm some of its analysis results, in my experience this confirmation takes very little of the programmer’s time. In many cases, the programmer may simply instruct Magpie to skip the confirmation steps.

Further, Magpie is now being applied to the Linux kernel, to investigate the actual impact of garbage collection on kernel code. While arguments for and against garbage collection in the kernel have been around for a long time, to my knowledge no one has actually converted a C kernel to use precise collection (changing as little beyond that as possible) and tried it. I look forward to these results.

Operating Systems to Programming Languages

Modern applications frequently allow arbitrary user plug-ins or present scripting hooks to users. Mail programs, for example, may allow plug-ins for opening particular types of attachments, decoding encrypted mail, or automatically detecting spam. They may also allow scripting, to let users filter mail into different folders, for example.
Similar plug-ins and scripting abilities can now be found in web browsers, calendar programs, spreadsheets, music players, development environments and many other popular programs.

Applications currently restrict memory use by partitioning data and then limiting the memory use of the partitions, typically by invoking operating system primitives. Traditional operating systems partition memory into completely separate heaps for each process, disallowing references between them. This strict partitioning makes interprocess communication difficult, requiring the marshaling and demarshaling of data through pipes, sockets, channels or similar structures. In some cases, marshaling important data proves infeasible, leading to the use of complicated protocol programming.

More recent work provides hard resource boundaries within a single virtual machine. Systems in this class, such as the KaffeOS virtual machine [1], JSR-121 [6] or .NET application domains [5], still partition data, but without separate address spaces. Generally, the programmer explicitly creates a shared partition and may freely allocate and reference objects in it. However, these systems place restrictions on inter-partition references. For example, KaffeOS disallows references from its shared heap to private heap space.

This less strict form of partitioning only partially alleviates the burden on the programmer. While the program may now pass shared values via simple references, the programmer must explicitly manage the shared region. In the case where one process wants to transfer data completely to another process, the transfer may require two deep copies: one to transfer the data into shared space, and the other to transfer it out. In short, the programmer must manually manage accounting in much the same way a C programmer manages memory with malloc() and free().

I have developed a system for partition-free memory accounting [7], leveraging an existing garbage collector and process hierarchy.
Partition-free memory accounting provides mechanisms to restrict the memory use of subprocesses, but it does not impose any artificial partitions to provide this support. Thus, programmers may simply allocate objects and transfer them between processes as they please. At garbage collection points, the memory accounting system tracks which objects are in use by which processes, allowing the language runtime to restrict the memory usage of processes.

Future Research Directions

In the future, I would like to explore further the territory between language runtimes and operating systems. I would also like to investigate software tools for program understanding, in particular tools for understanding the complex, and often questionably documented, core system APIs of an operating system.

While memory usage is an important concern in creating applications that allow or rely on dynamic plug-ins or scripts, it is not the only area of concern. Managing other resources in the application, without requiring the creation of operating system level processes, seems a fertile area of research. For example, I can imagine systems that allow fine- or coarse-grained throttling of a subprocess’s CPU usage, in order to give higher-priority processes precedence or to reduce the heat generation or power consumption of the CPU.

On another topic, fewer and fewer programmers are asked to write entire programs from scratch. Instead, the main tasks for programmers often involve maintaining large existing code bases or porting previously written programs to new languages and systems. I am interested in exploring how tools may make some of these problems less difficult. For example, I can imagine a tool that hooks into the language runtime and observes how the program makes use of system libraries.
This would include not only which functions are called, but also any ordering constraints on calls observed by the tool, and ideally the side effects of those calls. The ability to observe this behavior would be extremely helpful, for example, when updating operating system drivers to a newer API. [4]

References

[1] G. Back, W. C. Hsieh, and J. Lepreau. Processes in KaffeOS: Isolation, resource management, and sharing in Java. In Proceedings of the 4th Symposium on Operating Systems Design and Implementation, San Diego, CA, Oct. 2000. USENIX.
[2] H.-J. Boehm. Bounding space usage of conservative garbage collectors. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 93–100. ACM Press, 2002.
[3] R. L. Hudson, J. E. B. Moss, A. Diwan, and C. F. Weight. A language-independent garbage collector toolkit. Technical Report 91-47, Object Oriented Systems Laboratory, Department of Comp. and Info. Science, Amherst, MA, 01003, 1991.
[4] G. Hunt. Microsoft Research. Personal conversation, January 2006.
[5] E. Meijer and J. Gough. Technical overview of the common language runtime.
[6] P. Soper, specification lead. JSR 121: Application isolation API specification, 2003. http://www.jcp.org/.
[7] A. Wick and M. Flatt. Memory accounting without partitions. In Proceedings of the 2004 International Symposium on Memory Management, Vancouver, B.C., Canada, October 2004.