Where's My Compiler? - Columbia University

Where’s My Compiler?
Developer tools: past, present, and future
Jim Miller
Software Architect, Developer Frameworks
Microsoft Corporation
(with help from Carol Eidt, Phoenix Project, Microsoft Corporation)
Outline




What Is A Compiler?
A Brief History of Developer Tools
My First Compiler
Compilers, compilers, everywhere
23-Mar-16
Where's My Compiler?
2
What Is A Compiler?

A converter from one representation
(source code) to another (executable
code)


Preserves (most of) the meaning of the
source
One part of a modern “tool chain” used
to produce executable artifacts
(applications)
A Compiler
Source Code (describes desired behavior) → Compiler → Executable Code
Executable code has the desired behavior, but
• May have different internal structure
• May execute in different (unobservable) order
Figures of Merit

Code Quality: how efficient is the
generated code?



Speed and Space: these aren’t
independent, but they aren’t the same
either
Throughput: how fast is the code
generated?
Footprint: how large is the compiler?
A Brief History of Developer Tools
1950s: Just a Compiler, Please

The compiler references a runtime, but the runtime is
supplied by the OS at a fixed location in memory





FORTRAN runtime: input/output formatting
COBOL runtime: also search and sort
OS loader loads the compiler output into memory,
transfers control
Address space is small (< 8K words), CPU is slow (<
1,000 instructions/sec.)
Figure of merit: Code Quality


Compiler must optimize code for space
Compiler must optimize code for speed
Inside the Compiler (in concept)
Source Code → Compiler (Front End → Back End) → Executable Code
Inside the Compiler (in concept)
Source Code → Compiler (Front End → Back End) → Executable Code
Front End
• Parse source code
• Produce abstract syntax tree (AST)
• Produce symbol table
• Generate errors
  • Syntax errors
  • Type errors
  • Unbound references
Inside the Compiler (in concept)
Source Code → Compiler (Front End → Back End) → Executable Code
Back End
• Linearize parse tree
• Code Analysis
  • Basic block analysis
  • Control- and data-flow graph analysis
• Optimize (machine-independent)
  • Redundant and dead code elimination
  • Code restructuring
• Convert to executable code
  • Register allocation
  • Peephole optimization
  • Branch prediction and tensioning
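The front end / back end split can be sketched in a few lines. This is a minimal illustration, not a real compiler: it borrows Python's own `ast` module as the front end, and the stack-machine instruction names (`PUSH`, `ADD`, `SUB`, `MUL`) and the `run` interpreter are assumptions made up for the example.

```python
import ast
import operator

def front_end(source):
    """Front end: parse source text into an AST (raises SyntaxError on bad input)."""
    return ast.parse(source, mode="eval").body

# Hypothetical instruction set for a tiny stack machine.
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
SEMANTICS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def back_end(node):
    """Back end: linearize the tree into postfix stack-machine instructions."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return [("PUSH", node.value)]
    if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
        return (back_end(node.left) + back_end(node.right)
                + [(OPCODES[type(node.op)], None)])
    raise ValueError("unsupported construct: " + ast.dump(node))

def run(code):
    """Execute the linear code so the result can be compared with the source."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(SEMANTICS[op](left, right))
    return stack.pop()

code = back_end(front_end("2 + 3 * 4"))
result = run(code)
print(code)
print(result)
```

The tree encodes operator precedence, so the linear code multiplies before it adds; the output has the same meaning as the source but a different internal structure.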
1960s: Linkers
• Programs are growing in size
• Programs are built with libraries
  • Libraries provide reusable code fragments
• Virtual memory systems are invented
• Tool chain is in two stages
  • Compile independent modules
  • Combine the modules using a linker
• Figure of merit: Code quality (speed)
Tools: Compiler + Linker
[Diagram: three Source Code files each pass through the Compiler (Front End → Back End) to produce Object Code, which includes external references. The Linker combines the object code into Executable Code.]
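What the linker does with those external references can be sketched as a two-pass procedure. The object-module layout below (a `code` list plus an `exports` table) and the instruction names are hypothetical, chosen only to keep the example small; real object formats are far richer.

```python
def link(modules):
    """Combine object modules into one image, resolving external references.

    Each module is a dict with:
      "code":    list of instruction tuples; ("CALL", name) is an external reference
      "exports": symbol name -> offset within this module's code
    """
    # Pass 1: give each module a base address and build a global symbol table.
    symbols, base = {}, 0
    for module in modules:
        for name, offset in module["exports"].items():
            if name in symbols:
                raise ValueError("duplicate symbol: " + name)
            symbols[name] = base + offset
        base += len(module["code"])

    # Pass 2: copy the code, patching symbolic targets with absolute addresses.
    image = []
    for module in modules:
        for instr in module["code"]:
            if instr[0] == "CALL":
                if instr[1] not in symbols:
                    raise ValueError("unresolved external: " + instr[1])
                image.append(("CALL", symbols[instr[1]]))
            else:
                image.append(instr)
    return image, symbols

main_obj = {"code": [("CALL", "square"), ("HALT",)], "exports": {"main": 0}}
lib_obj = {"code": [("MUL",), ("RET",)], "exports": {"square": 0}}
image, symbols = link([main_obj, lib_obj])
print(image)
```

Each module was compiled without knowing where `square` would end up; only the linker, which sees all modules, can replace the name with an address.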
1970s: Symbolic Debugger
• OS written in high-level language
  • Compilers provide sufficient code performance and low-level access
• High-level languages provide large runtime libraries in multiple units
  • Static linker pulls only required units into a given program image
• Compiler exports symbol table for use by debugger, not just internal to front-/back-end
• Figure of merit: Code quality (speed)
Compiler, Linker, Debugger
[Diagram: three Source Code files each pass through the Compiler (Front End → Back End) to produce Object Code and Symbol table(s). The Linker combines the object code into the Running Program; the Debugger uses the symbol tables to inspect it.]
1980s: Dynamic Loading, Threading
• To improve OS performance by reducing physical memory pressure, read-only parts of libraries are shared between applications
  • OS loader fixes up references to shared libraries, just like the static linkers
  • Not all libraries are loaded into the same virtual address
  • Loaded on first reference
• Concurrency issues addressed in programming languages
  • Locks, monitors, events, polling
  • Order of operations visible across thread boundaries
  • Memory model semantics become an issue
  • Ada™ introduces rendez-vous, other languages have other constructs
• Tool chain
  • Compiler(s)
  • Linker
  • Loader
  • Symbolic debugger
• Figure of merit: Code quality (speed, but this is related to space)
OS Dynamic Loader
[Diagram: three Source Code files each pass through the Compiler (Front End → Back End) to produce Object Code (includes fixups for shared code) and Symbol table(s). The Static Linker produces Image Files, the OS Loader loads them into the Running Program, and the Debugger attaches to the running program.]
1990s: JITs and Managed Runtimes
• Garbage Collection goes mainstream
  • Previously: LISP, APL, Smalltalk
  • 1990s: Java, JScript, C#, VB
• Verification requires runtime to analyze code
  • Verification is similar to front-end compiler work
  • Can be done to native code, but much simpler with an intermediate language
• Just-in-time (JIT) compilation increases performance over pure interpretation
  • Typically by a factor of 5 to 15
• Tool chain: split the compiler in two!
  • Linearize the AST to create Intermediate Language (IL)
  • Save symbol table as “metadata”
  • Reorder the chain
• Figures of merit: Throughput first, code quality second
OS Dynamic Loader (repeat)
[Diagram, repeated and then simplified to a single chain: Source Code → Compiler (Front End → Back End) → Object Code → Static Linker → Image File → OS Loader → Running Program, with the Debugger attached.]
Managed Runtime
[Diagram, built up over two slides: the chain is reordered. The Compiler keeps the Front End and writes an Image File containing Metadata + Intermediate Language. The OS Loader loads the image, and the Runtime supplies the Dynamic Linker and the Back End that produce the Running Program, with the Debugger attached.]
2000s: Reflection-based Computation
• Reflection: ability of a program to observe and possibly modify its structure and behavior
  • Compilers “preserve meaning” but runtime reflection makes more information visible, so optimizations are more limited
  • Metadata (symbol table) or equivalent needed at runtime, not just compile/link time
• Interactive Development Environments (IDEs)
  • IntelliSense™
  • Refactoring
  • Interactive syntax analysis
• Query Integration
  • Builds expression trees (ASTs) at compile time
  • Runtime operations to combine and manipulate them
• Figures of merit:
  • “Compiler” and “JIT compiler”: throughput
  • “Pre-JIT” compiler: balance of throughput and code quality
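The query-integration idea of building expression trees and then combining them at runtime can be sketched with Python's `ast` module standing in for a managed runtime's expression-tree API. The helper names (`parse_predicate`, `both`, `compile_predicate`) and the sample data are assumptions for illustration only.

```python
import ast

def parse_predicate(text):
    """Build an expression tree (AST) from source text."""
    return ast.parse(text, mode="eval").body

def both(left, right):
    """Runtime operation: combine two trees into `left and right`."""
    return ast.BoolOp(op=ast.And(), values=[left, right])

def compile_predicate(tree, param):
    """Wrap the tree as `lambda <param>: <tree>` and compile it to a callable."""
    wrapper = ast.parse("lambda " + param + ": None", mode="eval")
    wrapper.body.body = tree              # replace the placeholder body
    ast.fix_missing_locations(wrapper)    # give the new nodes line/column info
    return eval(compile(wrapper, "<expression tree>", "eval"))

adult = parse_predicate("p['age'] >= 18")
local = parse_predicate("p['city'] == 'Seattle'")
query = compile_predicate(both(adult, local), "p")

people = [
    {"name": "Ann", "age": 30, "city": "Seattle"},
    {"name": "Bob", "age": 12, "city": "Seattle"},
    {"name": "Cy", "age": 45, "city": "Boston"},
]
matches = [p["name"] for p in people if query(p)]
print(matches)
```

Because the predicate is still a tree when it is combined, a query provider could equally translate it to another language instead of compiling it locally.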
Runtime Reflection
[Diagram: Source Code → Front End (shared with the Development Environment) → Image File (Metadata + Intermediate Language) → OS Loader → Dynamic Linker → Back End → Running Program. The Metadata (symbol table) stays available at runtime to the running program and the Debugger.]
My First Compiler
1970: Numbles




“Number puzzles for Nimble minds”
Column in “Computers and Automation”
Numble verifier written by Stuart Nelson
Input language:
SEND
+ MORE
======
MONEY



Output: a program to try all possible values for letter
assignments to digits
Handled +, -, *, and =
Hand coded in PDP-9 assembly language
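A rough modern analogue of that design, compiling a puzzle into a program that tries every assignment, might look like the sketch below. It is not the original verifier: it emits Python source rather than PDP-9 assembly, handles only addition, and the function name `compile_numble` is made up for the example.

```python
from itertools import permutations  # used by the generated program

def compile_numble(addends, total):
    """Translate a puzzle such as SEND + MORE = MONEY into Python source text."""
    words = list(addends) + [total]
    letters = sorted(set("".join(words)))
    leading = sorted({word[0] for word in words})

    def value(word):
        # Emit positional notation, e.g. "(((S)*10+E)*10+N)*10+D".
        expr = word[0]
        for ch in word[1:]:
            expr = "(" + expr + ")*10+" + ch
        return expr

    targets = "".join(c + ", " for c in letters)
    lhs = " + ".join(value(w) for w in addends)
    result = ", ".join(c + "=" + c for c in letters)
    return "\n".join([
        "def solve():",
        "    for " + targets + "in permutations(range(10), " + str(len(letters)) + "):",
        "        if 0 in (" + ", ".join(leading) + ",):",
        "            continue  # reject leading zeros",
        "        if " + lhs + " == " + value(total) + ":",
        "            return dict(" + result + ")",
        "    return None",
    ])

source = compile_numble(["SEND", "MORE"], "MONEY")
print(source)                      # the generated program
namespace = {"permutations": permutations}
exec(source, namespace)            # "load" the generated program
solution = namespace["solve"]()    # and run it
print(solution)
```

The input language is the puzzle and the output is executable code specialized to it, which is the sense in which the verifier counts as a compiler.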
Outline




What Is A Compiler?
A Brief History of Developer Tools
My First Compiler
Compilers, compilers, everywhere





Free-standing compilers
Under the hood
Inside applications
In the tool chain
Inside libraries
Special-Purpose Compilers


Compile-to-hardware
Aspect-Oriented Programming (AOP) weaver




Parser finds new syntax to mark insertion points
Back-end inserts code snippets for different
aspects
More generally: “assembly rewriting”
Work-flow and object design languages


Input may be textual or graphic layouts
Output may be code or graphic designs
Mark-up Compilers

XML schema (or DTD)



Web-services Description (WSDL)



Output: proxy that parses input and dispatches
Output: code to convert data structure to XML (“serializer”)
XAML (Windows Presentation Foundation)



Output: parser
Output: deserializer
Output: parser
Output: executable code
XSL
Compilers, compilers, everywhere: Under the hood
Modern Hardware: CPU
• Compile “machine code” to “micro code”
  • CPU Architecture is the abstraction boundary
  • RISC vs CISC is an old debate
  • x86 and x64 are CISC on the outside, RISC on the inside
• Part of the instruction cache
  • Engineering note: an icache miss now often means a pause to compile in addition to a memory fetch!
• Allows innovation in actual hardware while still running existing code
  • Chips optimized for specific usage scenarios
  • Chips take advantage of materials science advances
  • Chips take advantage of new internal architectures (multicore)
Modern Hardware: Graphics




Graphics memory isn’t just for data
Very sophisticated compilation steps
Parallel execution with CPU
Adapts to changing hardware organization



Raster scan vs vector
Resolution, speed, synchronization
Adapts to predominant usage pattern



Animation
3D
Shading
Compilers, compilers, everywhere: Inside applications
Databases
• SQL is a full programming language
  • Compiled to intermediate form on client
  • Intermediate form is passed to server for execution
  • Server optimizes the intermediate form to produce an “execution plan”
• Query optimization
  • Additional inputs include
    • Size of tables
    • Frequency of query types
    • Indexing information
  • Outputs include
    • Executable code
    • Temporary indexes
    • Background indexing requests
    • Updated frequency information
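How indexing information feeds the execution plan can be observed with SQLite's `EXPLAIN QUERY PLAN`, used here through Python's standard `sqlite3` module. SQLite is an embedded engine, so it only stands in for the client/server pipeline described above; the table, index, and data are invented for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("c%d" % (i % 50), i * 1.5) for i in range(1000)],
)

def plan(sql):
    """Return the optimizer's plan as a list of human-readable steps."""
    # The fourth column of each EXPLAIN QUERY PLAN row is the description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM orders WHERE customer = 'c7'"
before = plan(query)   # no index available: expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(query)    # same SQL text, different plan once the index exists

print(before)
print(after)
```

The source query never changes; only the optimizer's extra inputs do, and the compiled plan changes with them.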
Hardware Emulators
• Object code translation at runtime
  • HP3000 to PA-RISC in 1983
  • VAX to Alpha in 1990s
  • 32-bit programs on 64-bit hardware
• Alternate hardware emulation
  • Device emulators for everything from smart cards to cell phones to iPod to pocket PCs
• JIT compilation trades start-up time for high performance execution
  • Often, but not always, a good trade-off
Code Analysis Tools
• Analyzing API surface
  • Simple to do with front end ASTs
• “Remodularizing” implementation
  • Requires static and dynamic dependency analysis (normal compiler back end work)
  • Requires rebuilding the program, easily done using front end ASTs
• Race detection
  • Instrument code at compile time
  • Gather data as it runs under high stress
“Tree Shakers”




Start with the AST and an appropriate
dependency graph
Pull AST nodes found starting at a given
graph node, recursively
Convert resulting set of AST nodes to
appropriate output format
Example uses:


Subset library based on initial set of types
Statically link subset of library for a given
application
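The core of a tree shaker is a reachability walk over the dependency graph. A minimal sketch, using an explicit worklist instead of recursion and a made-up library whose type names are purely illustrative:

```python
def shake(dependencies, roots):
    """Return every node reachable from `roots` in the dependency graph."""
    keep, pending = set(), list(roots)
    while pending:
        node = pending.pop()
        if node in keep:
            continue                      # already pulled in
        keep.add(node)
        pending.extend(dependencies.get(node, ()))
    return keep

# Hypothetical library: each type maps to the types it depends on.
library = {
    "List": ["Array", "Iterator"],
    "Array": [],
    "Iterator": [],
    "Dictionary": ["Array", "Hash"],
    "Hash": [],
    "Regex": ["List"],
}

subset = shake(library, ["List"])
print(sorted(subset))
```

Everything outside `subset` (here `Dictionary`, `Hash`, and `Regex`) can be left out of the subsetted library or the statically linked image.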
Compilers, compilers, everywhere: In the tool chain
A Modern Interactive Development Environment (IDE)
• Code editor
  • Knows the programming language, provides syntax support and context-sensitive name lookup
• Project system
  • Tracks the public shape of components
  • Tracks dependencies between components
• Build system
  • Orders clean-up, compile, and link operations
• Debugger
  • Allows inspection and modification of values at runtime
  • Allows control operations (e.g., breakpoint, continue, restart)
• Dynamic Support
  • Allows program modification interwoven with execution (“edit and continue”)
  • Global interaction space (“read-eval-print loop”)
Compilers in the IDE (I)

In the code editor




Incrementally parses the code as it is being
entered. Note: must deal with incorrect syntax
and partial programs.
Suggests possible completions based on a symbol
table. Note: symbol table must include external
references maintained by the project system.
Refactoring operations require both syntactic and
semantic analysis. Note: refactoring requires
information maintained by the project system.
In the debugger

Expression evaluation
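The requirement that an editor's parser cope with partial programs and incorrect syntax can be illustrated with Python's standard `codeop` module, which interactive shells use to tell "not finished yet" apart from "wrong". The `classify` helper and its three labels are assumptions for this sketch; a production editor would use an error-recovering incremental parser rather than recompiling the whole input.

```python
import codeop

def classify(source):
    """Classify editor input as 'complete', 'incomplete', or 'error'."""
    try:
        code = codeop.compile_command(source, "<editor>")
    except (SyntaxError, ValueError, OverflowError):
        return "error"
    # compile_command returns None when the input could still become valid.
    return "complete" if code is not None else "incomplete"

for text in ["total = 1 + 2", "total = (1 +", "total = )"]:
    print(repr(text), "->", classify(text))
```

An editor can hold back error squiggles for the "incomplete" case and show them only for the "error" case.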
Compilers in the IDE (II)
Dynamic support
• Edit-and-continue
  • Requires a full, incremental compiler
  • For efficiency, it also requires the ability to compress the output as a “diff” between the original and the new code
• Interactive workspace
  • Like LISP, APL, Smalltalk, Python, etc.
  • Requires
    • a compiler or
    • an interpreter: really, a compiler front end to generate an AST combined with a tree walker to execute the tree.
  • The compiler must be capable of generating code that uses code and objects resident in the evaluation environment, which generally means a reliance on reflection.
Compilers in the Linker
• The linker sees “the whole program”, so it’s better positioned to do global analysis
• Solution: write a compiler
  • Input language is object file format (native code or IL)
  • Output language is OS image file format
• Optimizations:
  • Aggressive in-lining across module boundaries
  • Code motion across module boundaries
  • Full type system analysis (treat leaf types as sealed)
• Issues:
  • These flow graphs are *big*
  • The linker doesn’t see the whole program (dynamic linking)
  • Reflection and dynamic linking reduce permitted optimizations
    • Or require the ability to back out or recompute optimizations at runtime
Profile-Guided Optimization


Idea: Instrument the program, run it with
typical loads, then re-optimize using this
profiling data. (Similar to “Hotspot”)
Optimizations:

Optimize only “hot” code fragments




So you can spend more time on them
Method and basic block reordering to increase
code density
Code reordering to optimize branch prediction and
minimize “long” references
Cache locality optimizations for data and code
Compilers, compilers, everywhere: Inside libraries
For the Developer

“Regular expression” parsing


Grammar is usually more powerful than
regular expressions
Serialization and Deserialization


Reflects on data type to be marshalled
Generates specialized code to convert to
stream format (serialization) or parse into
in-memory format (deserialization)
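The reflect-then-generate pattern for serialization can be sketched as follows. Python dataclasses stand in for the reflected type system, and a plain dict passed to `json.dumps` stands in for the stream format; `make_serializer` and `Point` are hypothetical names for the example.

```python
import json
from dataclasses import dataclass, fields

def make_serializer(cls):
    """Reflect on a dataclass and generate a specialized to-dict function."""
    names = [f.name for f in fields(cls)]           # reflection step
    body = ", ".join("%r: obj.%s" % (n, n) for n in names)
    source = "def serialize(obj):\n    return {" + body + "}\n"
    namespace = {}
    exec(source, namespace)                          # compile the generated code
    return namespace["serialize"], source

@dataclass
class Point:
    x: int
    y: int

serialize_point, generated = make_serializer(Point)
print(generated)
encoded = json.dumps(serialize_point(Point(1, 2)))
print(encoded)
```

The reflection cost is paid once per type; the generated function then accesses each field directly, with no per-call lookup of field names.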
For the Compiler Writer
• Parser-generators
  • lex
  • yacc
• AST tool kits
  • Microsoft is investing in this area
  • Provides integration into many aspects of the IDE
• Executable file format tool kits
  • Queensland University of Technology PERWAPI
• Optimization tool kits
  • Microsoft’s Phoenix project
Questions?