Debloating Software
Guoqing (Harry) Xu
UC Irvine
Overview
Performance and scalability issues are becoming increasingly critical due to the pervasive use
of object-oriented programming languages. The inefficiencies inherent in the implementation of an
object-oriented language as well as the commonly adopted design and implementation principles in the
object-oriented community often combine to hurt performance. The community-wide recognition of the
importance of abstraction and reuse results in increased emphasis on modular design, declaration of
general interfaces, and use of models and patterns. Programmers are taught to focus first and foremost
on these principles, taking it for granted that compilers and run-time systems can remove all the
inefficiencies. In a large program that is typically built on top of many layers of frameworks and
libraries, a small set of inefficiencies can multiply and quickly be magnified enough to slow down the
system. When the call stack grows deep, the dataflow analyses in a dynamic compiler become less useful
and the optimizer can no longer remove these inefficiencies. As a result, many applications suffer from
chronic run-time performance problems that significantly limit their scalability. This is a serious
problem for real-world software systems used every day by thousands of businesses.
The pressing need for new optimization techniques is especially apparent as object orientation spreads
to systems of every size. The extensive use of object-oriented languages in the development
of memory-constrained applications such as smartphone apps (e.g., Java used in Android and C# used in
Windows phones) and data-intensive systems (e.g., Hadoop, Giraph, and Hyracks) introduces numerous
research challenges: these systems have limited memory but large amounts of data to process,
so their inefficiencies can be significantly exacerbated. The burden of reducing unnecessary
work should not rest only on the shoulders of hardware designers, especially in the modern era when
Moore’s dividend is becoming less obvious. This strongly calls for higher-level performance optimization
techniques that can detect and remove inefficiencies for all categories of object-oriented
applications. We envision the following categories of techniques that need to be developed to
improve the performance of new-generation object-oriented applications.
1. Effective testing techniques that can find performance problems. Performance problems are
notoriously difficult to find during development and in-house testing; many such problems in modern
applications are scalability issues that manifest only when the input data is sufficiently large.
Traditional testing focuses on detecting functional bugs, and developers often do not have large,
real-world input data with which to test a program. Very often, developers are not aware of the problems until software
is released and users observe that its performance cannot meet their expectations. Novel run-time
techniques need to be developed to amplify performance problems so that they can manifest even
when the program is exercised with small inputs.
2. Semantics-aware adaptive optimizations. Adaptive optimization (such as feedback-directed
optimization in a JIT compiler) has been extensively researched during the past decade. However, recent
studies show that most of the severe performance problems in a modern application are caused by
developers’ mistakes (e.g., inappropriate choices of algorithms, data structures, etc.) closely related to
the semantics of the application. Traditional (dataflow-based) optimizations are semantics-agnostic and
thus cannot effectively remove today’s semantic redundancies. New optimization techniques should be
developed to complement the existing dataflow analyses being performed in the JIT compiler. For
example, it would be interesting to develop an automated tuning framework that can select and switch
data structure implementations in object-oriented programs.
3. Optimization of Big Data applications. Modern computing has entered the era of Big Data. Analyzing
information from Twitter, Google, Facebook, Wikipedia, or the Human Genome Project requires the
development of scalable platforms that can quickly process massive-scale data. Such frameworks often
utilize large numbers of machines in a cluster or in the cloud to process data in a scalable manner. An
object-oriented programming language such as Java is often the developer’s choice for implementing
data-processing frameworks. In fact, the Java community is already home to many data-intensive
computing infrastructures, such as Hadoop, Hyracks, Storm, and Giraph. Despite the many
development benefits provided by Java, these applications commonly suffer from severe memory bloat,
which stems primarily from the combination of the inefficient memory usage inherent in the runtime of a
managed language and the processing of huge volumes of data, which can exacerbate the already-existing
inefficiencies by orders of magnitude. Novel optimization techniques (based either on human
effort or on compiler and run-time system support) should be developed to optimize bloat away in the
presence of massive amounts of data, so that Big Data developers can enjoy both the many benefits of
object-oriented programming and high performance.
4. Novel program analysis techniques to interpret performance problems. Once a performance
problem is observed, developers often have to perform manual tuning in order to understand its root
cause. This is a daunting task, as modern software often has an extremely large code base, creates many
millions of objects, and runs for a long period of time. Manual tuning is very difficult because human
experts have to find useful information in an ocean of objects and other executed program entities.
There is thus a strong demand for automated program analysis techniques that can
pinpoint the problematic areas and help developers fix the problems to improve performance.
5. Optimizing memory-constrained systems. Memory-constrained systems, typified by smartphone
apps, are much more vulnerable to inefficiencies than regular server/desktop applications. Identifying
common inefficiency patterns in smartphone apps and developing techniques to optimize them away is
a highly interesting future research direction that may lead to fundamental changes in
the way such applications are developed.
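
To make the second direction (selecting and switching data structure implementations) more concrete, the following is a minimal Java sketch of the idea and not a description of any existing framework. The class name AdaptiveSet, the method isHashed, and the constant SWITCH_THRESHOLD are hypothetical names introduced only for illustration. The sketch assumes that a compact array-backed representation is preferable for small sets and that hashing pays off once the set grows; a real tuning framework would presumably base this decision on profiling data, not on a fixed threshold.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

/** Hypothetical sketch: a set that switches its backing structure as it grows. */
public class AdaptiveSet<T> {
    // Assumed tuning knob; a real framework would derive it from profiling.
    private static final int SWITCH_THRESHOLD = 16;

    // Starts as a compact ArrayList; replaced by a HashSet once it grows.
    private Collection<T> backing = new ArrayList<>();

    /** Adds an element if absent; returns true when the set changed. */
    public boolean add(T element) {
        if (backing.contains(element)) {
            return false;
        }
        backing.add(element);
        if (backing instanceof ArrayList && backing.size() > SWITCH_THRESHOLD) {
            // Linear scans would now dominate lookups: migrate to hashing.
            backing = new HashSet<>(backing);
        }
        return true;
    }

    public boolean contains(T element) {
        return backing.contains(element);
    }

    public int size() {
        return backing.size();
    }

    /** Exposed only so that the switch can be observed in this example. */
    public boolean isHashed() {
        return backing instanceof HashSet;
    }

    public static void main(String[] args) {
        AdaptiveSet<Integer> set = new AdaptiveSet<>();
        for (int i = 0; i < 10; i++) {
            set.add(i);
        }
        System.out.println(set.isHashed());                    // false
        for (int i = 0; i < 100; i++) {
            set.add(i);
        }
        System.out.println(set.isHashed() + " " + set.size()); // true 100
    }
}
```

Client code sees the same set semantics before and after the switch; only the cost profile changes. This is the property that would allow such a decision to be made automatically by a tool instead of by the developer.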