
ONR meeting on Automating Software Complexity Reduction For Reclaiming Software
Execution Efficiency and Increasing Security
Meeting location: The MITRE Corporation, McLean, VA
Meeting date: 4 June 2013
Meeting Summary
The purpose of this one-day meeting was to gather together researchers and program managers in
the topic area to assess the state of the art and suggest important open questions to be answered
to make additional progress. The invitation-only meeting was attended by 44 people drawn from
government (6), industry (19), and academia (19).
The agenda included overview briefings, a panel-led question and answer session, and four
breakout groups addressing key sub-areas of interest (see attached agenda). The objective was to
encourage wide-ranging discussion by providing a variety of structures for the day’s sessions.
Discussions were passionate and courteous. It is clear that research in the area is vibrant but that
there is significant disagreement about how to prioritize the objectives and, in some cases, even
about how to define them.
The majority of attendees took the goal of debloating to be increasing the runtime performance
of a program (or system), as measured by either instructions executed or clock cycles consumed
during execution. However, there was also significant discussion of how debloating could
improve the security of a program (e.g., by reducing its attack surface) and of how adding
instructions to simplify the program logic could make formal methods easier to apply. Such
additions are perhaps not bloat at all: the effect is usually quite localized, and in some cases
it can make the program run faster as well. There was also extensive discussion of the tension
between making program development simpler through abstractions (which increase bloat) and
improving runtime performance and security.
As debloating is adopted over time, it must address three standard situations during the
transition to a point at which all programs are developed with debloating in mind:
- Legacy: existing applications, usually binary-only, with no annotations to assist in debloating.
- Transition: applications that combine newly developed components that can take debloating into account with older components (e.g., libraries) that do not.
- New: completely new applications developed with debloating and enhanced performance in mind.
Most software today exists only in binary (or bytecode) form and was not developed with
debloating concerns in mind, so any debloating would have to be performed automatically.
The legacy situation is therefore the most critical and challenging.
Discussions at the meeting were wide-ranging and also considered the other situations.
Combatting bloat can be subdivided according to the phase of the software lifecycle in which it
is addressed and whether annotations are available:
                    | With source code / at            | With bytecode or binary code
                    | development time / before or     | only / after compilation
                    | during compilation               |
--------------------+----------------------------------+----------------------------------
Without annotations | Typical development              | Very difficult to correctly
                    | environments do not address      | identify and safely remove
                    | extra code; compilation can do   | extra code
                    | some streamlining of code but    |
                    | only in the most extreme cases   |
                    | (e.g., dead code)                |
--------------------+----------------------------------+----------------------------------
With annotations    | A bloat-aware/performance-aware  | Post-processing (after
                    | IDE could help a developer write | compilation) could use
                    | streamlined code given his       | annotations to effectively
                    | particular application and       | debloat software, customized for
                    | performance goals                | a particular use.
There is also the difference between applying debloating once for all uses of a program (or
system) versus a mass customization approach that allows debloating to be tailored to specific
uses. The latter would allow better performance enhancement when considering all uses of a
program across all users. But no one yet knows how to instrument an environment to
automatically discover the right customizations and then automatically apply the right debloating
operations.
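The difference between the two approaches can be illustrated with a small sketch. It assumes, hypothetically, that each use of a program has already been profiled into the set of features it exercises. The feature names, the profiles, and both helper functions are invented for illustration; discovering such profiles automatically is exactly the open problem noted above.

```python
# Hypothetical feature inventory of one program (names are illustrative only).
ALL_FEATURES = {"parse", "render", "print", "export_pdf", "spellcheck", "macros"}

# Hypothetical usage profiles: the features each kind of use was observed to exercise.
PROFILES = {
    "viewer": {"parse", "render"},
    "author": {"parse", "render", "spellcheck"},
    "publisher": {"parse", "render", "export_pdf", "print"},
}


def debloat_once(all_features, profiles):
    """One debloated build for all uses: keep every feature that any profile needs."""
    keep = set().union(*profiles.values())
    return keep & all_features


def debloat_per_use(all_features, profiles):
    """Mass customization: one tailored build per profile."""
    return {name: used & all_features for name, used in profiles.items()}


if __name__ == "__main__":
    shared = debloat_once(ALL_FEATURES, PROFILES)
    tailored = debloat_per_use(ALL_FEATURES, PROFILES)
    print("single build keeps", len(shared), "of", len(ALL_FEATURES), "features")
    for name, kept in sorted(tailored.items()):
        print(name, "build keeps", len(kept), "of", len(ALL_FEATURES), "features")
```

In this toy setting the single build can drop only the feature that no profile uses, while each tailored build drops everything its own profile never touches.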
A few of the positions discussed were:

- Should the focus be on shared libraries alone or on the entire system’s environment (application plus libraries plus operating system/hypervisor)? Whole-program optimization can eliminate a lot of dead code: 70% of programs use less than 20% of system calls, 22% of the remainder use 20-40%, and only 8% use more than 40%.
- How important are interpreted languages and languages with large runtime environments? Should uses of an “eval” function be limited or eliminated to decrease bloat? How should just-in-time compilation be handled?
- How can one apply formal methods in multi-language environments (that is, programs that use code developed in several different source languages)?
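The whole-program view raised in the first position above can be sketched as a reachability analysis over a call graph: anything that cannot be reached from the program's entry points is a candidate for removal. The call graph, the function names, and both helpers below are hypothetical. The sketch also assumes the graph is complete; in practice, indirect calls, dynamic loading, and constructs such as “eval” make a complete graph hard to obtain, which is part of why the questions above are open.

```python
from collections import deque

# Hypothetical whole-program call graph: caller -> list of callees.
CALL_GRAPH = {
    "main": ["read_config", "run"],
    "run": ["lib_open", "lib_read"],
    "read_config": ["lib_open"],
    "lib_open": ["sys_open"],
    "lib_read": ["sys_read"],
    "lib_write": ["sys_write"],    # library code the application never calls
    "lib_format": ["lib_write"],   # library code the application never calls
    "sys_open": [],
    "sys_read": [],
    "sys_write": [],
}


def reachable(graph, entry_points):
    """Breadth-first search: every function transitively callable from the entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in graph.get(caller, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def dead_functions(graph, entry_points):
    """Functions present in the graph but unreachable from any entry point."""
    every = set(graph) | {callee for callees in graph.values() for callee in callees}
    return every - reachable(graph, entry_points)


if __name__ == "__main__":
    print(sorted(dead_functions(CALL_GRAPH, ["main"])))
    # prints ['lib_format', 'lib_write', 'sys_write']
```

Note that the unused library routines also take an otherwise unused system call with them, which is the kind of cross-layer saving a library-only analysis would miss.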
Possible solutions discussed lie along several paths:

- Runtime customizations: dead code removal specific to usage, whole-program (including OS) analysis, pre-initialization, and pre-virtualization (all supported by improvements in static analysis).
- Languages and libraries whose abstractions are built knowing ahead of time that debloating needs to be done automatically; apply iterative refinement with provably correct transformations.
- Formal methods, concolic execution, and directed random testing to guarantee the correctness of transformations.
- Use or construct a software stack that preserves the metadata needed to improve efficiency.
- Provide integrated development environment (IDE) support for developers to understand the performance implications of their choices. A lot of bloat is incurred when programmers include a large library and then use only a small portion of it. It would be valuable if the IDE gave programmers feedback on their decisions, and also if the tool chain allowed programmers to be very selective in what code was included from external libraries.
- Provide better programming languages with more constructs for managing complexity. The view was expressed that today’s relatively impoverished languages force programmers to overload the few available mechanisms, and that this leads to inefficiency and bloat.
- Provide object-oriented abstractions for programmers to use on big data applications that don’t incur memory bloat. Implementing each data element as an object causes a large amount of memory to be used for storage and a large number of instructions to be used for retrieval.
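The memory cost of one object per data element, mentioned in the last item above, can be made concrete with a rough measurement. This is a minimal sketch that assumes CPython, where sys.getsizeof reports only the shallow size of each object; the Reading class and both helper functions are hypothetical, and the object count is a lower bound because per-instance attribute storage is not included.

```python
import sys
from array import array


class Reading:
    """One data element modeled as a full object (hypothetical example type)."""

    def __init__(self, value):
        self.value = value


def object_bytes(values):
    """Approximate bytes used when every element is its own Python object."""
    items = [Reading(v) for v in values]
    total = sys.getsizeof(items)  # the list of references
    for item in items:
        # The instance itself plus the boxed float it refers to.
        total += sys.getsizeof(item) + sys.getsizeof(item.value)
    return total


def packed_bytes(values):
    """Bytes used when the same values are stored in one contiguous array of C doubles."""
    return sys.getsizeof(array("d", values))


if __name__ == "__main__":
    data = [float(i) for i in range(100_000)]
    print("one object per element:", object_bytes(data), "bytes")
    print("packed array:          ", packed_bytes(data), "bytes")
```

The exact numbers depend on the interpreter version and platform, so none are quoted here; the point is only that the packed layout stores 8 bytes per value while the object layout pays for a reference, an instance header, and a boxed number per value.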
At the end of the day, there was a general consensus that this is a difficult problem that
deserves attention. Additional research in static and dynamic analysis, using both formal and
robust-but-informal methods, is needed to reduce software bloat, especially to address the
near-term need of reducing bloat in legacy applications where only the binary (or bytecode) is
available. Longer-term research topics include how to support understanding and managing
software bloat during development, and how valuable the additional information available in
that scenario is for improving debloating.
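As one illustration of a robust-but-informal method, the sketch below uses differential random testing to compare a function before and after a debloating transformation. Both versions of the function and the test harness are hypothetical examples, not taken from any tool discussed at the meeting. Agreement on random inputs is evidence rather than proof of equivalence; a guarantee would require the formal methods mentioned above.

```python
import random


def original_sum_of_squares(values):
    """Reference version with a redundant intermediate copy (the 'bloated' form)."""
    squares = [v * v for v in values]
    copied = list(squares)  # unnecessary copy that a debloater might remove
    return sum(copied)


def debloated_sum_of_squares(values):
    """Streamlined version that is intended to compute the same result."""
    return sum(v * v for v in values)


def differential_test(reference, candidate, trials=1000, seed=0):
    """Return the first random input on which the two versions disagree, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        size = rng.randint(0, 20)
        values = [rng.randint(-1000, 1000) for _ in range(size)]
        if reference(values) != candidate(values):
            return values
    return None


if __name__ == "__main__":
    counterexample = differential_test(original_sum_of_squares, debloated_sum_of_squares)
    print("no disagreement found" if counterexample is None else counterexample)
```

A returned input is a concrete counterexample showing that a transformation changed behavior, which is often more useful to a tool developer than a bare pass/fail result.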
List of Related Materials
The following documents are also available from the meeting:

- Sukarno Mertoguno, “Improving Software Robustness and Efficiency,” Towards62Tech-II-FutureSw.pdf.
- Michael May, “View from the Office of the Secretary of Defense,” 01_ONR_MITRE_Meeting_4_june_2013.pptx.

- Alexey Loginov, “Robustness Challenges in Improving Efficiency and Security,” 02_Grammatech2013-06-04.pdf.
- Jan Vitek, “Dynamic Languages,” 03_Vitek.pdf.
- Milo Martin, “Mitigating Software Bloat at the Compiler/Binary Interface,” 04_martintalk-Software-Bloat-Meeting-ONR-June-2013.pdf.
- Alessandro Coglio, “Software Efficiency Workshop Position Statement,” coglio_kestrel.pdf.
- David Cok and Alexey Loginov, “Robustness Challenges in Improving Efficiency,” cok_grammatech.pdf.
- Tielei Wang and Wenke Lee, “Unnecessary Shared Library Elimination,” lee_gatech.pdf.
- Milo Martin, “Position Statement,” martin-position-statement-workshop-on-softwareefficiency.pdf.
- Alastair Murray and Binoy Ravindran, “Requirements for Effective ‘Debloating’ Tools,” murray_vatech.pdf.
- Jens Palsberg, “Increasing Software Efficiency by Removing Layering and Bloat,” palsberg_ucla.pdf.
- R. Sekar, “Static Analysis and Rewriting Techniques for Cross-Component Hardening and Optimization of Low-Level Code,” sekar_stonybrook.pdf.
- Natarajan Shankar and Ashish Gehani, “Static Previrtualization,” shankar_sri.docx.
- Scott Stoller and Yanhong Liu, “Towards Practical Software Bloat Removal with Assurance,” stoller_stonybrook.pdf.
- Xiangyu Zhang and Dongyan Xu, “System Software ‘Weight Loss’ via Binary Distillation,” xu_purdue.pdf.
- Guoqing (Harry) Xu, “Debloating Software,” xu_ucirvine.docx.