Adam Wick

Research Statement
Adam Wick
January 2006
With production servers using Java Virtual Machines running hundreds of servlets, and
Microsoft’s Common Language Runtime providing the basis for much of its own and others’
future programming, the line between operating system and language runtime has blurred
in recent years. This blurring leads to new opportunities in both operating systems and
language runtimes, as concepts from one can be applied to the other in order to make
programming simpler and systems more reliable.
Programming Languages to Operating Systems
With the advent and popularity of Java, garbage collection has become mainstream.
However, garbage collection for C programs has been limited to conservative garbage collection, almost entirely via the Boehm-Demers-Weiser collector. Linking conservative garbage
collection to C programs is simple, but conservative garbage collectors lack precise information about the types of data in local variables and in the heap. This imprecision can lead to
large and unacceptable memory leaks, as the collector mistakes a pointer for a system root,
or mistakes a number for a pointer. [2]
Precise garbage collection – also referred to as accurate or non-conservative garbage
collection – has precise information about the types of data in local variables and in the heap.
Thus, precise collectors will never leak memory due to mistaken roots or mistaken pointers.
Further, the infrastructure for precise collection opens many possibilities for different garbage
collection styles, including simple generational collectors, incremental collectors, real time
collectors and parallel or distributed collectors. To date, however, providing this infrastructure from C or C++ has required intense and tedious effort on the part of C and C++
programmers. [3]
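The difference between the two approaches can be illustrated with a toy model. The following is a minimal sketch, not the behavior of any particular collector: the heap layout, the "ptr"/"int" tags, and the names HEAP, ROOTS and mark are all assumptions made for illustration. A conservative marker follows any word that happens to look like a heap address, while a precise marker follows only words whose type says they are pointers.

```python
# Toy model: the heap maps addresses to objects, and each field is a
# (kind, word) pair. All names here are hypothetical.
HEAP = {
    0x1000: {"name": "live_list", "fields": [("ptr", 0x2000), ("int", 7)]},
    # This object stores an integer whose value happens to equal an address.
    0x2000: {"name": "live_node", "fields": [("int", 0x3000)]},
    0x3000: {"name": "dead_buffer", "fields": []},
}

ROOTS = [("ptr", 0x1000)]


def mark(roots, precise):
    """Return the set of addresses considered reachable from roots."""
    marked = set()
    worklist = list(roots)
    while worklist:
        kind, word = worklist.pop()
        # Precise: follow a word only if its type says it is a pointer.
        # Conservative: follow any word that looks like a heap address.
        is_pointer = (kind == "ptr") if precise else (word in HEAP)
        if not is_pointer or word in marked:
            continue
        marked.add(word)
        worklist.extend(HEAP[word]["fields"])
    return marked


conservative_live = mark(ROOTS, precise=False)
precise_live = mark(ROOTS, precise=True)
# Objects retained only because a number was mistaken for a pointer.
leaked = conservative_live - precise_live
```

In this model the conservative marker retains dead_buffer solely because the integer 0x3000 matches its address, whereas the precise marker reclaims it.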
I have created the Magpie toolset for semiautomatically converting existing C and C++
programs from manual memory allocation (malloc() and free()) or conservative garbage
collection to precise garbage collection. Magpie supports a large subset of C programs
without modification, and most C programs after a small amount of modification. While
the C++ engine is a proof-of-concept, it supports all C++ grammatical extensions except
templates, namespaces and exceptions. This includes arbitrary C++ class hierarchies and
overloading. I do not believe that adding templates, namespaces or exceptions would require
any further theoretical work, merely some time engineering the software.
I have applied Magpie to several C applications, and it required minimal input from the
programmer in those cases. While Magpie does ask the programmer for confirmation of
some of its analysis results, in my experience this confirmation takes very little time from
the programmer. In many cases, the programmer may simply tell Magpie to skip the
confirmation steps.
Further, Magpie is now being applied to the Linux kernel, to investigate the actual impact
of garbage collection on kernel code. While arguments for and against garbage collection in
the kernel have been around for a long time, to my knowledge, no one has actually converted
a C kernel to use precise collection (changing as little beyond that as possible) and tried it.
I look forward to these results.
Operating Systems to Programming Languages
Modern applications frequently allow arbitrary user plug-ins or present scripting hooks
to users. Mail programs, for example, may allow plug-ins for opening particular types of
attachments, decoding encrypted mail, or automatically detecting spam. They may also
allow scripting, to let users filter mail into different folders, for example. Similar plug-ins
and scripting abilities can now be found in web browsers, calendar programs, spreadsheets,
music players, development environments and many other popular programs.
Applications currently restrict memory use by partitioning data and then limiting the
memory use of the partitions, typically by invoking operating system primitives. Traditional
operating systems partition memory into completely separate heaps for each process, disallowing references between them. This strict partitioning makes interprocess communication
difficult, requiring the marshaling and demarshaling of data through pipes, sockets, channels
or similar structures. In some cases, marshaling important data proves infeasible, leading to
the use of complicated protocol programming.
More recent work provides hard resource boundaries within a single virtual machine.
Systems in this class, such as the KaffeOS virtual machine [1], JSR-121 [6] or .NET application domains [5], still partition data, but without the separate address space. Generally,
the programmer explicitly creates a shared partition and may freely allocate and reference
objects in it. However, these systems place restrictions on inter-partition references. For
example, KaffeOS disallows references from its shared heap to private heap space. This less
strict form of partitioning only partially alleviates the burden on the programmer. While the
program may now pass shared values via simple references, the programmer must explicitly
manage the shared region. In the case where one process wants to transfer data completely
to another process, the transfer may require two deep copies: one to transfer the data into
shared space, and the other to transfer it out. In short, the programmer must manually
manage accounting in much the same way a C programmer manages memory with malloc()
and free().
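The cost of transferring data through a shared partition can be sketched as follows. This is an illustrative model only, not the API of KaffeOS, JSR-121 or .NET application domains: the Partition class and the transfer_via_shared function are hypothetical names, and deep copying stands in for whatever mechanism a real system uses to move data across a partition boundary.

```python
import copy


class Partition:
    """A hypothetical heap partition that tracks the objects it owns."""

    def __init__(self, name):
        self.name = name
        self.objects = []

    def adopt_copy(self, obj):
        """Deep-copy obj into this partition and return the copy."""
        clone = copy.deepcopy(obj)
        self.objects.append(clone)
        return clone


def transfer_via_shared(message, shared, receiver):
    """Move message to receiver through the shared partition.

    Returns the receiver's private copy and the number of deep copies made.
    """
    in_shared = shared.adopt_copy(message)        # copy 1: private -> shared
    in_receiver = receiver.adopt_copy(in_shared)  # copy 2: shared -> private
    # The programmer must release the shared slot by hand.
    shared.objects = [o for o in shared.objects if o is not in_shared]
    return in_receiver, 2


sender_msg = {"subject": "hello", "attachments": [[1, 2, 3]]}
shared = Partition("shared")
receiver = Partition("receiver")
received, copies = transfer_via_shared(sender_msg, shared, receiver)
```

The explicit release of the shared slot is the kind of manual bookkeeping that the comparison to malloc() and free() refers to.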
I have developed a system for partition-free memory accounting [7], leveraging an existing garbage collector and process hierarchy. Partition-free memory accounting provides
mechanisms to restrict the memory use of subprocesses, but it does not place any artificial
partitions to provide this support. Thus, programmers may simply allocate objects and
transfer them between processes as they please. At garbage collection points, the memory
accounting system tracks what objects are in use by which processes, allowing the language
runtime to restrict memory usage of processes.
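The general idea of accounting at collection time can be sketched as a reachability traversal per process. This is a simplified illustration under stated assumptions, not the algorithm of [7]: the Obj class, the account and over_limit functions, and in particular the policy of charging a shared object to the first process that reaches it are all choices made here for brevity.

```python
class Obj:
    """A heap object with a size in bytes and outgoing references."""

    def __init__(self, size, refs=()):
        self.size = size
        self.refs = list(refs)


def account(process_roots):
    """Charge every reachable object to exactly one process.

    process_roots maps a process name to its list of root objects.
    Assumption: processes are visited in dictionary order, and an object
    shared by several processes is charged to the first one that reaches it.
    """
    charged = set()  # ids of objects already charged to some process
    usage = {}
    for process, roots in process_roots.items():
        total = 0
        worklist = list(roots)
        while worklist:
            obj = worklist.pop()
            if id(obj) in charged:
                continue
            charged.add(id(obj))
            total += obj.size
            worklist.extend(obj.refs)
        usage[process] = total
    return usage


def over_limit(usage, limits):
    """Return the processes whose charged memory exceeds their limit."""
    return [p for p, used in usage.items() if used > limits.get(p, float("inf"))]


shared_buf = Obj(100)
mailer_state = Obj(40, [shared_buf])
plugin_state = Obj(500, [shared_buf])
usage = account({"mailer": [mailer_state], "plugin": [plugin_state]})
offenders = over_limit(usage, {"plugin": 256})
```

Note that both processes reference shared_buf directly, with no shared partition and no copying; the accounting pass alone decides who is charged for it.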
Future Research Directions
In the future, I would like to further investigate this space between language runtimes and
operating systems. I would also like to investigate software tools for program understanding.
In particular, I am interested in tools for understanding the complex, and often questionably
documented, core system APIs of an operating system.
While memory usage is an important concern in creating applications that allow or rely
on dynamic plug-ins or scripts, it is not the only area of concern. Managing other resources
in the application, without requiring the creation of operating system level processes, seems
a fertile area of research. For example, I can imagine systems that allow fine- or coarse-grained throttling of a subprocess’s CPU usage, in order to allow higher-priority processes
precedence or to reduce the heat generation or the power consumption of the CPU.
On another topic, fewer and fewer programmers are asked to write entire programs from
scratch. Instead, the main tasks for programmers often involve maintaining large existing
code bases or porting previously written programs to new languages and systems. I am
interested in exploring how tools may make some of these problems less difficult. For example,
I can imagine a tool that hooks into the language runtime, and observes how the program
makes use of system libraries. This would include not only what functions are called, but any
ordering constraints on calls observed by the tool, and ideally also what side effects these
calls had. The ability to observe this behavior would be extremely helpful, for example,
when updating operating system drivers to a newer API. [4]
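A very small sketch of such an observer, assuming the hook can be expressed as a wrapper around each library function: the CallObserver class and the open_device, read_device and close_device stand-ins are hypothetical, and a real tool would hook the runtime itself rather than rely on decorators. The sketch records which functions are called and checks one kind of ordering constraint against the observed trace; it does not attempt to capture side effects.

```python
import functools


class CallObserver:
    """Records calls made through wrapped functions (hypothetical helper)."""

    def __init__(self):
        self.trace = []  # names of calls, in the order they happened

    def watch(self, func):
        """Wrap func so that each call is appended to the trace."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            self.trace.append(func.__name__)
            return func(*args, **kwargs)
        return wrapper

    def always_before(self, first, second):
        """True if every observed call to second follows a call to first."""
        seen_first = False
        for name in self.trace:
            if name == first:
                seen_first = True
            elif name == second and not seen_first:
                return False
        return True


observer = CallObserver()


# Stand-ins for a system library; they are not real driver APIs.
@observer.watch
def open_device():
    return "handle"


@observer.watch
def read_device(handle):
    return b"data"


@observer.watch
def close_device(handle):
    return None


handle = open_device()
read_device(handle)
close_device(handle)
opens_before_reads = observer.always_before("open_device", "read_device")
```

A constraint observed this way holds only for the runs that were traced, so it is evidence of how the API is used rather than a specification of it.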
References
[1] G. Back, W. C. Hsieh, and J. Lepreau. Processes in KaffeOS: Isolation, resource management, and sharing in Java. In Proceedings of the 4th Symposium on Operating Systems
Design and Implementation, San Diego, CA, Oct. 2000. USENIX.
[2] H.-J. Boehm. Bounding space usage of conservative garbage collectors. In Proceedings of
the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
pages 93–100. ACM Press, 2002.
[3] R. L. Hudson, J. E. B. Moss, A. Diwan, and C. F. Weight. A language-independent
garbage collector toolkit. Technical Report 91-47, Object Oriented Systems Laboratory,
Department of Comp. and Info. Science, Amherst, MA, 01003, 1991.
[4] G. Hunt. Microsoft Research. Personal conversation, January 2006.
[5] E. Meijer and J. Gough. Technical overview of the common language runtime.
[6] P. Soper, specification lead. JSR 121: Application isolation API specification, 2003.
http://www.jcp.org/.
[7] A. Wick and M. Flatt. Memory accounting without partitions. In Proceedings of the 2004
International Symposium on Memory Management, Vancouver, B.C., Canada, October
2004.