10. The PyPy translation tool chain

Toon Verwaest

Thanks to Carl Friedrich Bolz for his kind permission to reuse and adapt his notes.

The PyPy tool chain

Roadmap
> What is PyPy?
> The PyPy interpreter
> The PyPy translation tool chain

© Toon Verwaest

What is PyPy?
> Reimplementation of Python in Python
> Framework for building interpreters and VMs
> L * O * P configurations
  - L dynamic languages
  - O optimizations
  - P platforms

The PyPy Interpreter
> Python: an imperative, object-oriented, dynamic language
> Stack-based bytecode interpreter (like the JVM or Smalltalk)

    def f(x):
        return x + 1

    >>> import dis
    >>> dis.dis(f)
      2           0 LOAD_FAST                0 (x)
                  3 LOAD_CONST               1 (1)
                  6 BINARY_ADD
                  7 RETURN_VALUE

The PyPy Bytecode Compiler
> Written in Python
> Compiles .py source files to .pyc bytecode
> Standard, flexible compiler:
  - Lexer
  - Parser
  - AST builder
  - Bytecode generator
> You only have to build this once

Bytecode Interpreter
> Focuses on language semantics; no low-level details!
> Written in RPython
  - Run untranslated on top of CPython, it is very slow: about 2000x slower than CPython
> PyPy's Python bytecode compiler and interpreter are not the hot topic of the PyPy project!
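To make "stack-based bytecode interpreter" concrete, here is a minimal sketch of an evaluation loop that executes the four instructions from the dis.dis(f) listing. It is an illustration only, not PyPy's or CPython's interpreter loop: the (opname, argument) tuple encoding and the names run, F_CODE and F_CONSTS are hypothetical.

```python
from typing import Any, List, Optional, Tuple

# Hypothetical instruction format: (opname, argument) pairs.
Instruction = Tuple[str, Optional[int]]

def run(code: List[Instruction], consts: List[Any], fast_locals: List[Any]) -> Any:
    """Evaluate a tiny subset of Python bytecode using an explicit value stack."""
    stack: List[Any] = []
    pc = 0
    while True:
        opname, arg = code[pc]
        pc += 1
        if opname == "LOAD_FAST":
            # Push the value of a local variable.
            stack.append(fast_locals[arg])
        elif opname == "LOAD_CONST":
            # Push an entry of the constant table.
            stack.append(consts[arg])
        elif opname == "BINARY_ADD":
            # Pop two operands (right one is on top) and push their sum.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            # The result is whatever is on top of the stack.
            return stack.pop()
        else:
            raise NotImplementedError(opname)

# Bytecode for: def f(x): return x + 1
F_CODE: List[Instruction] = [
    ("LOAD_FAST", 0),
    ("LOAD_CONST", 1),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
# Index 0 is unused here; the listing loads the constant 1 from index 1.
F_CONSTS: List[Any] = [None, 1]

print(run(F_CODE, F_CONSTS, [41]))  # 42
```

The interpreter only describes what each instruction means; nothing in it deals with memory layout or machine code, which is exactly the kind of detail the translation tool chain is meant to add later.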
The PyPy Translation Tool Chain
> Model-driven interpreter (VM) development
  - Focus on the language model rather than on implementation details
  - Executable models (meta-circular Python)
> Translate models to low-level (LL) back-ends
  - Considerably lower-level than Python
  - Weave in implementation details (GC, JIT)
  - Allow compilation to different back-ends (OO, procedural)

Inside the Translation Tool Chain

PyPy "Parser"
> The tool chain starts from loaded Python bytecode
> The translator shares the Python environment with the target (the program being translated)
> Relies on Python's reflective capabilities
> Allows meta-programming (runtime initialization)

    def a_decorator(an_f):
        def g(b):
            an_f(b + 10)
        return g

    @a_decorator
    def f(a):
        print(a)

    f(4)  # prints 14

> The decorator runs when the module is loaded, so the translator only ever sees the resulting function g, never the decorator itself

PyPy Control-Flow Graph
> Consists of Blocks and Links
> Starts from entry_point
> "Static Single Information" (SSI) form

    def f(n):
        return 3*n + 2

    Block(v1):  # input argument
        v2 = mul(Constant(3), v1)
        v3 = add(v2, Constant(2))

PyPy CFG: "Static Single Information"
> Recall SSA: PHI functions are placed at dominance frontiers
> SSI: "PHIs" for all used variables
  - Every variable a block uses arrives as one of its input arguments, passed along the incoming Link
  - Blocks act as "functions without branches"

    def test(a):
        if a > 0:
            if a > 5:
                return 10
            return 4
        if a < -10:
            return 3
        return 10

Type Inference

Why type inference?
> Python is dynamically typed
> We want to translate to statically typed code
  - For efficiency reasons

What do we need to infer?
> A type for every variable
> Messages sent to an object must be defined in its compile-time type or a supertype

How to infer types?
> Start from entry_point
  - From there, the whole program can be reached
  - We know the types of its arguments and of its return value
> Forward propagation
  - Iterate until a fixpoint is reached: every link in the CFG has been followed at least once and no type changes anymore
  - The result is a large dictionary mapping variables to types

Implications of applying type inference
> Applying type inference restricts the types of input programs

RPython: Demo

    def plus(a, b):
        return a + b

    def entry_point(argv=None):
        print(plus(20, 22))
        print(plus("4", "2"))
        return 0

> This version cannot be translated: plus is called with both integers and strings, so a and b would need one static type that is both int and str

RPython: Demo (with specialization)

    from pypy.rlib import objectmodel  # in current RPython: from rpython.rlib import objectmodel

    @objectmodel.specialize.argtype(0)
    def plus(a, b):
        return a + b

    def entry_point(argv=None):
        print(plus(20, 22))
        print(plus("4", "2"))
        return 0

> specialize.argtype(0) tells the annotator to create a separate copy of plus for each type of the first argument, so every copy is consistently typed

RPython is Zen
> A subset of Python
> Informally: the subset of Python that is type-inferable
> More precisely: type-inferable, stabilized bytecode
  - Allows load-time meta-programming (see the "Parser" slide)
  - Messages sent to an object must be defined in its compile-time type or a supertype

RTyper
> Bridge between the annotator and the low-level code generators
> Different low-level models for different target groups
  - LLTypeSystem: C-style (structures, pointers and arrays)
  - OOTypeSystem: JVM, CLI, Squeak (trade-off: single inheritance)
> Does not need to iterate until a fixpoint is reached
> Replaces all operations with low-level ones

Back-end Optimizations
> Some general optimizations
  - Inlining
  - Constant folding
  - Escape analysis (allocating objects on the stack)
> Partly assume that the generated code is compiled by an optimizing back-end

Back-end Optimizations: "Object Explosion"
> OO code creates lots of helper objects
> Allocating objects is expensive
> Replace unneeded objects with direct calls

Preparation for Source Generation

Exception Handling and Memory Management
> C has no support for:
  - automatic memory management
  - exception handling
> Translate explicit exception handling into flags and if/else
> Memory management in the PyPy spirit:
  - not language-specific
  - weave the garbage collector in during translation

JIT Compiler
> Makes VMs fast
  - Dynamic information is key
> Is an implementation detail
  - Weave it in while translating to low-level code!
> Still under development
> "As you surely know, the key idea of PyPy is that we are too lazy to write a JIT of our own: so, instead of passing nights writing a JIT, we pass years coding a JIT generator that writes the JIT for us :-)"

Code Generation
> One C function per control-flow graph
> All low-level statements can be translated directly
> The result is compiled to a binary with a C compiler

Translation Demo

PyPy Performance
> Translator
  - Slow
  - Uses quite a lot of memory
  - Produces lots of source code (200 kloc for 5 kloc of source)
  - But: our models are executable (2000x slower than CPython)
> Resulting interpreter
  - Currently: between two times slower and two times faster than CPython
  - First experiments with the JIT: up to 500x faster for special cases
  - But most importantly: very adaptable!
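The forward propagation from the "How to infer types?" slide can be sketched as a worklist algorithm over blocks and links. This is a minimal illustration under stated assumptions, not PyPy's annotator: the Block, Link and Const classes, the flat type lattice and the rule that unlike types can never be merged are simplifications chosen for the example. The second graph models the two calls of the RPython demo as two links into one shared graph for plus, which is why propagation fails there.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple, Union

# Illustrative flow-graph classes (assumptions, not PyPy's real ones).
@dataclass(frozen=True)
class Const:
    value: object

Arg = Union[str, Const]  # a variable name or a constant

@dataclass(eq=False)
class Link:
    target: "Block"
    args: List[Arg]  # passed position by position to target.inputargs

@dataclass(eq=False)
class Block:
    inputargs: List[str]
    operations: List[Tuple[str, str, List[Arg]]] = field(default_factory=list)
    exits: List[Link] = field(default_factory=list)

# Result type of each (operation, argument types) combination this sketch knows.
OP_RESULT = {
    ("add", "int", "int"): "int",
    ("add", "str", "str"): "str",
    ("mul", "int", "int"): "int",
    ("lt", "int", "int"): "bool",
}

def union(old: Optional[str], new: str) -> str:
    """Merge annotations in a flat lattice; unlike types cannot be merged."""
    if old is None or old == new:
        return new
    raise TypeError(f"cannot unify {old} and {new}")

def annotate(entry: Block, arg_types: List[str]) -> Dict[str, str]:
    """Propagate types forward from the entry block until nothing changes."""
    types: Dict[str, str] = {}

    def type_of(arg: Arg) -> str:
        return type(arg.value).__name__ if isinstance(arg, Const) else types[arg]

    def bind(block: Block, incoming: List[str]) -> bool:
        # Merge the types flowing along a link into the target's input arguments.
        changed = False
        for name, t in zip(block.inputargs, incoming):
            merged = union(types.get(name), t)
            if merged != types.get(name):
                types[name] = merged
                changed = True
        return changed

    bind(entry, arg_types)
    seen: Set[int] = {id(entry)}
    pending = [entry]
    while pending:  # worklist: re-process a block whenever its inputs change
        block = pending.pop()
        for result, opname, args in block.operations:
            key = (opname, *(type_of(a) for a in args))
            if key not in OP_RESULT:
                raise TypeError(f"unsupported operation {key}")
            types[result] = union(types.get(result), OP_RESULT[key])
        for link in block.exits:
            changed = bind(link.target, [type_of(a) for a in link.args])
            if changed or id(link.target) not in seen:
                seen.add(id(link.target))
                pending.append(link.target)
    return types

# def g(n):
#     i = 0
#     while i < n:
#         i = i + 1
#     return i
ret = Block(["r"])
head = Block(["n1", "i1"], [("c", "lt", ["i1", "n1"])])
body = Block(["n2", "i2"], [("i3", "add", ["i2", Const(1)])])
start = Block(["n0"], exits=[Link(head, ["n0", Const(0)])])
head.exits = [Link(body, ["n1", "i1"]), Link(ret, ["i1"])]
body.exits = [Link(head, ["n2", "i3"])]
print(annotate(start, ["int"])["r"])  # int

# The RPython demo: calls modeled as links into one shared graph for plus.
plus_ret = Block(["res"])
plus = Block(["a", "b"], [("s", "add", ["a", "b"])], [Link(plus_ret, ["s"])])
demo = Block(["argv"], exits=[Link(plus, [Const(20), Const(22)]),
                              Link(plus, [Const("4"), Const("2")])])
try:
    annotate(demo, ["list"])
except TypeError as err:
    print(err)  # cannot unify int and str
```

In this sketch the loop back-edge from body to head is followed once more and changes nothing, which is the fixpoint. Specializing plus per argument type would amount to building two separate plus graphs, one per call site type.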
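The exception-handling slide says that, because C has no exceptions, explicit exception handling is translated into flags and if/else. A minimal Python sketch of that idea follows, assuming a single global "pending exception" slot; the names ExcState, exc_state, raise_exc, divide and caller are hypothetical and are not the names PyPy's C back-end uses. A raise becomes "set the flag and return a dummy value", and every call that may raise is followed by a check of the flag.

```python
from typing import Optional

class ExcState:
    """Hypothetical global slot standing in for the C-level 'exception pending' flag."""
    def __init__(self) -> None:
        self.pending: Optional[str] = None  # exception type name, or None

exc_state = ExcState()

def raise_exc(name: str) -> int:
    """Replacement for `raise`: record the exception and return a dummy value."""
    exc_state.pending = name
    return -1  # dummy result; callers must check the flag before using it

# Original code:
#     def divide(a, b):
#         if b == 0:
#             raise ZeroDivisionError
#         return a // b
def divide(a: int, b: int) -> int:
    if b == 0:
        return raise_exc("ZeroDivisionError")
    return a // b

# Original code:
#     def caller(a, b):
#         try:
#             return divide(a, b) + 1
#         except ZeroDivisionError:
#             return 0
def caller(a: int, b: int) -> int:
    result = divide(a, b)
    if exc_state.pending is not None:      # did the callee "raise"?
        if exc_state.pending == "ZeroDivisionError":
            exc_state.pending = None       # handled here: clear the flag
            return 0
        return -1                          # not handled: propagate with the flag still set
    return result + 1

print(caller(7, 2))  # 4
print(caller(7, 0))  # 0
```

The cost of this scheme is one flag test after each call that can raise; the benefit is that the output only needs plain C control flow.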
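The "Code Generation" slide says each control-flow graph becomes one C function and each low-level operation maps directly to a C statement. The sketch below turns the Block(v1) listing for f(n) = 3*n + 2 into C source text. It assumes a single-block graph and uses a hypothetical emit_c helper that types every variable as long; PyPy's real C back-end also has to handle multiple blocks, exceptions and the garbage collector, none of which appear here.

```python
from typing import List, Tuple, Union

Operand = Union[str, int]  # variable name or integer constant
Operation = Tuple[str, str, List[Operand]]  # (result, opname, [left, right])

# Mapping from low-level operation names to C operators (illustrative subset).
C_OPERATORS = {"mul": "*", "add": "+", "sub": "-"}

def emit_c(name: str, inputargs: List[str], operations: List[Operation], result: str) -> str:
    """Emit one C function for a single-block graph: one statement per operation."""
    params = ", ".join("long " + arg for arg in inputargs)
    lines = [f"long {name}({params}) {{"]
    for target, opname, args in operations:
        left, right = (str(a) for a in args)
        lines.append(f"    long {target} = {left} {C_OPERATORS[opname]} {right};")
    lines.append(f"    return {result};")
    lines.append("}")
    return "\n".join(lines)

# Block(v1): v2 = mul(Constant(3), v1); v3 = add(v2, Constant(2))
print(emit_c("f", ["v1"], [("v2", "mul", [3, "v1"]), ("v3", "add", ["v2", 2])], "v3"))
```

Because the RTyper has already replaced every operation with a low-level one, the generator needs no knowledge of Python semantics; it only spells out each operation in C syntax.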
More PyPy & Getting Involved
> http://codespeak.net/pypy
> http://morepypy.blogspot.com
> irc://irc.freenode.org/pypy
> PyPy sprints

Summary
> The PyPy project has two main parts
  - Language interpreter models
  - The PyPy translation tool chain
> The PyPy translation tool chain
  - Has no typical parser
  - Uses SSI
  - Applies type inference, which limits its input from Python to RPython
  - Compiles to low-level and object-oriented back-ends
  - Weaves in implementation details

What you should know!
> What is the goal of the PyPy project?
> What are the main steps of the PyPy tool chain?
> When is a program RPython?

Can you answer these questions?
> Why do we want to keep the language model separated from implementation details?
> Why wouldn't we want to keep those details separated?
> Why is it not really a problem that the tool chain can only compile RPython code?

License
> http://creativecommons.org/licenses/by-sa/2.5/

Attribution-ShareAlike 2.5

You are free:
• to copy, distribute, display, and perform the work
• to make derivative works
• to make commercial use of the work

Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or licensor.
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.
• For any reuse or distribution, you must make clear to others the license terms of this work.
• Any of these conditions can be waived if you get permission from the copyright holder.

Your fair use and other rights are in no way affected by the above.