Towards Practical Software Bloat Removal with Assurance

Scott D. Stoller and Yanhong Liu
Stony Brook University
Software bloat removal can significantly reduce software inefficiencies and vulnerabilities, and
is becoming increasingly necessary for applications that are resource constrained and mission critical. For effective bloat removal in practical applications, bloat removal tools must satisfy three
challenging requirements:
• Correctness. Bloat removal tools must correctly manipulate many program components—
often written in different languages or available only as binaries—that interact with each
other in complicated ways. In fact, a key reason that software bloat exists is the difficulty of
removing it correctly.
• Productivity. Bloat removal tools need to be developed and maintained with limited time
and expense, and must support reuse, improvements, modifications, and extensions. This
is necessary because the software technologies that must be handled by bloat removal tools
constantly evolve.
• Efficiency. Bloat removal tools need to be efficient, especially for manipulating large applications, or if used at runtime. Because bloat removal is nontrivial, repeated applications of
bloat removal, testing, and problem fixing should be expected, and thus each iteration must
be efficient.
These requirements apply to all critical applications, but especially to bloat removal tools, because
they are aimed at manipulating all applications in nontrivial ways.
We advocate a logic-based method for building bloat removal tools, where all program information and analysis results are expressed directly as logic facts, and all program analysis and
transformations are expressed declaratively as logic rules. Logic inference is used to produce the
analysis results as well as transformed programs. This method provides significant advantages in
addressing the challenging requirements:
• High assurance of correctness. Logic rules and facts are the most fundamental, direct, semantic forms for expressing complex relationships and reasoning about them, for different
languages at all levels (source, bytecode, binary). They make correctness of the analyses and
transformations drastically easier to attain and prove than if these analyses and transformations were written as imperative code or using different frameworks.
• Significantly increased productivity. Expressing the analysis and transformations at the very
high level of facts and rules is significantly easier and faster than writing low-level code or
using different frameworks, and similarly better supports maintenance tasks.
• Guarantee of sufficient efficiency. The biggest challenge in the past has been efficient implementation of logic inference. However, significant progress in recent years has made such
an approach feasible, e.g., [3, 1]. In fact, for Datalog rules, which are particularly suitable
for complex program analysis and transformations, efficient implementations can be generated with better complexity guarantees than previously manually developed and implemented
algorithms [1, 2].
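As a rough illustration of the facts-and-rules style described above, the following minimal sketch encodes a toy call graph as relations and computes unreachable functions as candidate bloat. It is not taken from any tool by the authors: the relation names (defined, entry, calls), the function names, and the rules are all hypothetical, and a hand-written worklist loop in Python stands in for a real Datalog engine.

```python
# Facts about a toy program, written as relations (sets of tuples).
# All relation and function names are hypothetical, for illustration only.
defined = {"main", "parse", "render", "log", "legacy_export", "legacy_helper"}
entry = {"main"}
calls = {
    ("main", "parse"),
    ("main", "render"),
    ("render", "log"),
    ("legacy_export", "legacy_helper"),
    ("legacy_export", "log"),
}


def reachable_functions(entry, calls):
    """Compute the least fixed point of the two rules

        reachable(F) :- entry(F).
        reachable(G) :- reachable(F), calls(F, G).

    using a simple worklist; a Datalog engine would derive this automatically.
    """
    # Index the calls relation by caller so each fact is examined once.
    callees = {}
    for caller, callee in calls:
        callees.setdefault(caller, set()).add(callee)

    reachable = set(entry)
    worklist = list(entry)
    while worklist:
        f = worklist.pop()
        for g in callees.get(f, ()):
            if g not in reachable:
                reachable.add(g)
                worklist.append(g)
    return reachable


def bloat(defined, entry, calls):
    """bloat(F) :- defined(F), not reachable(F)."""
    return defined - reachable_functions(entry, calls)


print(sorted(bloat(defined, entry, calls)))
# prints ['legacy_export', 'legacy_helper']
```

In this sketch the analysis is fully specified by the three rules in the docstrings; the loop is only one possible evaluation strategy. Real programs would also need facts for indirect calls, reflection, and dynamic loading, which are omitted here.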
Advancing the state of the art in software bloat removal using a logic-based method would also
provide other significant benefits:
• Besides program information and analysis results about components written in different languages, logic facts and rules can also easily express any additional knowledge from external
sources and use it for bloat removal. Also, expressing all information relationally, as facts,
allows easy interfacing with many other program analysis and program verification tools, such
as SMT solvers, and other data analysis and data mining tools, including tools based
on big data techniques.
• Logic-based methods and tools for analysis and transformation studied for bloat removal,
especially if designed with appropriate abstractions, can provide a solid infrastructure for
other program analysis and manipulations. This is because effective bloat removal requires
deep and sophisticated program flow and dependence analysis and manipulation.
References
[1] Y. A. Liu and S. D. Stoller. From Datalog rules to efficient programs with time and space
guarantees. ACM Transactions on Programming Languages and Systems, 31(6):1–38, 2009.
[2] K. T. Tekle and Y. A. Liu. More efficient Datalog queries: Subsumptive tabling beats magic
sets. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of
Data, pages 661–672, 2011.
[3] J. Whaley and M. S. Lam. Cloning-based context-sensitive pointer alias analysis using binary
decision diagrams. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming
Language Design and Implementation, pages 131–144, 2004.