ONR meeting on Automating Software Complexity Reduction For Reclaiming Software Execution Efficiency and Increasing Security

Meeting location: The MITRE Corporation, McLean, VA
Meeting date: 4 June 2013

Meeting Summary

The purpose of this one-day meeting was to bring together researchers and program managers working in the topic area to assess the state of the art and to identify the important open questions that must be answered to make further progress. The invitation-only meeting was attended by 44 people drawn from government (6), industry (19), and academia (19). The agenda included overview briefings, a panel-led question-and-answer session, and four breakout groups addressing key sub-areas of interest (see attached agenda). The objective was to encourage wide-ranging discussion by providing a variety of structures for the day’s sessions.

Discussions were passionate and courteous. It is clear that research in the area is vibrant, but there is significant disagreement about how to prioritize the objectives and, in some cases, even about how to define them. The majority of attendees took the goal of debloating to be increasing the runtime performance of a program (or system), as measured by either instructions executed or clock cycles consumed during execution. However, there was also significant discussion of how debloating could improve the security of a program (e.g., by reducing its attack surface), and of how adding instructions that simplify the program logic could make formal methods easier to apply. In that case the added code is perhaps not bloat: the effect is usually quite localized, and in some cases it can make the program run faster as well. There was also a lot of discussion of the tension between simplifying program development through abstractions (which increase bloat) and improving runtime performance and security.
As debloating is adopted over time, it must address three standard situations on the way to a point where all programs are developed with debloating in mind:

- Legacy: existing applications, usually binary-only, with no annotations to assist in debloating.
- Transition: applications that combine newly developed components that can take debloating into account with older components (e.g., libraries) that do not.
- New: completely new applications developed with debloating and enhanced performance in mind.

Most software today exists only in binary (or bytecode) format and was not developed with debloating concerns in mind, so any debloating would have to be performed automatically. The legacy situation is therefore the most critical and the most challenging. Discussions at the meeting were wide-ranging and also considered the other situations.

Combatting bloat can be subdivided by addressing it separately in the canonical phases of the software lifecycle, with or without annotations:

- Without annotations
  - With source code (at development time, before or during compilation): typical development environments do not address extra code; compilation can do some streamlining of code, but only in the most extreme cases (e.g., dead code).
  - With bytecode or binary code only (after compilation): it is very difficult to correctly identify and safely remove extra code.
- With annotations
  - With source code (at development time, before or during compilation): a bloat-aware/performance-aware IDE could help developers write streamlined code for their particular application and performance goals.
  - With bytecode or binary code only (after compilation): post-processing could use annotations to effectively debloat software, customized for a particular use.

There is also a difference between applying debloating once for all uses of a program (or system) and a mass-customization approach that allows debloating to be tailored to specific uses. The latter would allow better performance enhancement when considering all uses of a program across all users.
However, no one yet knows how to instrument an environment to automatically discover the right customizations and then automatically apply the right debloating operations.

A few of the questions and positions discussed were:

- Should the focus be on shared libraries alone or on the entire system environment (application plus libraries plus operating system/hypervisor)? Whole-program optimization can eliminate a lot of dead code: 70% of programs use less than 20% of system calls, a further 22% use 20-40%, and only 8% use more than 40%.
- How important are interpreted languages and languages with large runtime environments?
- Should uses of an “eval” function be limited or eliminated to decrease bloat?
- How should just-in-time compilation be handled?
- How can one apply formal methods in multi-language environments (that is, programs that use code developed in several different source languages)?

Possible solutions discussed lie along several paths:

- Runtime customizations; dead code removal specific to usage, whole-program (including the OS) analysis, pre-initialization, and pre-virtualization (all supported by improvements in static analysis).
- Languages and libraries whose abstractions are built with the knowledge that debloating will need to be done automatically; apply iterative refinement with provably correct transformations.
- Formal methods and concolic execution, as well as directed random testing, to guarantee the correctness of transformations.
- Use or construct a software stack that preserves the metadata needed to improve efficiency.
- Provide integrated development environment (IDE) support to help developers understand the performance implications of their choices. A lot of bloat is incurred when programmers include a large library and then use only a small portion of it. It would be valuable if the IDE gave programmers feedback on their decisions, and also if the tool chain allowed programmers to be very selective about what code is included from external libraries.
- Provide better programming languages with more constructs for managing complexity. The view was expressed that today’s relatively impoverished languages force programmers to overload the few available mechanisms, which leads to inefficiency and bloat.
- Provide object-oriented abstractions for programmers to use in big data applications that do not incur memory bloat. Implementing each data element as an object causes a large amount of memory to be used for storage and a large number of instructions to be used for retrieval.

By the end of the day, there was general consensus that this is a difficult problem that deserves attention. Additional research in static and dynamic analysis, using both formal and robust-but-informal methods, is needed to reduce software bloat, especially to address the near-term need of reducing bloat in legacy applications where only the binary (or bytecode) is available. Longer-term research topics include how to support understanding and managing software bloat during development, and how much the additional information available in that setting can improve debloating.

List of Related Materials

The following documents are also available from the meeting:

- Sukarno Mertoguno, “Improving Software Robustness and Efficiency,” Towards62Tech-II-FutureSw.pdf.
- Michael May, “View from the Office of the Secretary of Defense,” 01_ONR_MITRE_Meeting_4_june_2013.pptx.
- Alexey Loginov, “Robustness Challenges in Improving Efficiency and Security,” 02_Grammatech2013-06-04.pdf.
- Jan Vitek, “Dynamic Languages,” 03_Vitek.pdf.
- Milo Martin, “Mitigating Software Bloat at the Compiler/Binary Interface,” 04_martintalk-Software-Bloat-Meeting-ONR-June-2013.pdf.
- Alessandro Coglio, “Software Efficiency Workshop Position Statement,” coglio_kestrel.pdf.
- David Cok and Alexey Loginov, “Robustness Challenges in Improving Efficiency,” cok_grammatech.pdf.
- Tielei Wang and Wenke Lee, “Unnecessary Shared Library Elimination,” lee_gatech.pdf.
- Milo Martin, “Position Statement,” martin-position-statement-workshop-on-softwareefficiency.pdf.
- Alastair Murray and Binoy Ravindran, “Requirements for Effective ‘Debloating’ Tools,” murray_vatech.pdf.
- Jens Palsberg, “Increasing Software Efficiency by Removing Layering and Bloat,” palsberg_ucla.pdf.
- R. Sekar, “Static Analysis and Rewriting Techniques for Cross-Component Hardening and Optimization of Low-Level Code,” sekar_stonybrook.pdf.
- Natarajan Shankar and Ashish Gehani, “Static Previrtualization,” shankar_sri.docx.
- Scott Stoller and Yanhong Liu, “Towards Practical Software Bloat Removal with Assurance,” stoller_stonybrook.pdf.
- Xiangyu Zhang and Dongyan Xu, “System Software ‘Weight Loss’ via Binary Distillation,” xu_purdue.pdf.
- Guoqing (Harry) Xu, “Debloating Software,” xu_ucirvine.docx.