Frameworks for Research in Code Generation and Execution
Mark Lewin
Program Manager, External Research & Programs
Microsoft Research

Agenda
- Phoenix
- SSCLI: The Shared Source Common Language Infrastructure
- Operating Systems

PHOENIX

What Is Phoenix?
- Phoenix is a framework providing a unique and rich infrastructure for writing compilers, development and analysis tools, and plug-ins
- Foundation of the next generation of Microsoft code generators, optimizers and analysis tools
- Platform for third-party plug-ins
- Platform for research and teaching: Phoenix Academic Program

Phoenix Goals
- Provide industry-leading compilation and tools infrastructure: "VC++ and .NET compilers and tools"
- Build a research/development community around the infrastructure: "the Phoenix Platform"
- Make the infrastructure scalable, configurable, and extensible: "JIT to WPO, compilation and analysis"
- Make the infrastructure quick to retarget and rehost

Key Features
- Written in C++, usable by any .NET language
- Dual-mode: entire platform compiles to run native or on top of .NET
- Phase and plug-in model for third-party extensions to: VC++ compiler, binary reader/writer, analysis tools, …
- Support for multi-threaded clients
- Support for code and data extensibility
- A single, strongly typed, explicit dataflow/control-flow IR used throughout the framework
- IR and type system capable of processing native and/or managed code
- Strong inter-phase consistency checking

Compilation and analysis framework for 10+ years
Many diverse compilers and tools reuse the common core.
[Diagram: "Phoenix Building Blocks" at the center, surrounded by its consumers. Labels grouped as in the figure:]
- .NET CodeGen: JITs, PreJITs, OO and .NET optimizations
- Native CodeGen: advanced C++/OO optimizations, FP optimizations, OpenMP
- MSR Adv. Lang: language research, direct transfer to Phoenix, research insulated from code generation
- MSR Tools: built on Phoenix APIs, both HL and LL APIs, managed APIs, program analysis
- Academic DevKit: full sources, managed APIs, IP as DLLs, docs
- Chip Vendor DevKit: ~6-month ports, sample port + docs, key ports (ARM) done at Microsoft
- Retargetable "machine models": ~3 months for -Od, ~3 months for -O2

[Diagram: Phoenix architecture. Labels in the figure:]
- Inputs and front ends: C#, VB, C++, Delphi, Cobol, Eiffel, Tiger, Lex/Yacc, C++ PREfast, AST, C++ AST, Phx AST, C++ IR, assembly, native image, profile
- Phoenix Core (behind the Phx APIs): IR, Syms, Types, CFG, SSA
- Layers built on the core: HL Opts, LL Opts, Code Gen
- Tools: compilers, browser, visualizer, lint, formatter, obfuscator, refactor, Xlator, profiler, security checker

[Diagram: compiler pipeline]
- Front-end (language-specific): Scan, Parse, and CSA turn source text into an AST and then a decorated AST
- Middle-end (optimizations on linear IR): CSE, Const Prop, Dead Code, OSR, Alg Idents, Inline, Loop Unroll, Hoist Invars
- Back-end (codegen): Instr Selection, RegAlloc, Schedule, Assem, producing the image

Status
- Phoenix VC++ compiler backend building/running Windows XP (40+ million lines of code)
- Phoenix .NET JIT compiler passing 95+% of production JIT compiler tests
- Phoenix static analysis being incorporated into current enterprise development tools; some internal tools have already been deployed
- 600+ RDKs being used by researchers on a wide range of projects; active community feedback
- Working on improving optimizations to surpass current VC++ compiler performance
- Working on productization

Phoenix Academic Program
- Phoenix RDK
  - Conventional EULA for non-commercial use
  - Free to use for research and education
  - Free download; depends on Visual Studio 2005 technology
  - Support forum for faculty and teaching assistants
  - Latest version, May 2006 RDK, in your conference materials
- Research Support
  - 17 research projects funded through Rotor research grants in 2006, 12 funded in 2005, 5 funded in 2004
  - Many Phoenix RFP project PIs are here at the Summit; ask them about their experiences

Academic Users of Phoenix
- Constructing Compact Debugging Traces with Binary Code Analysis and Instrumentation
- Phoenix-Based Compiler Course Development
- Compiler Backend Experimentation and Extensibility Using Phoenix
- Adaptive Inline Substitution in Phoenix
- Domain-Specific Language for Efficient Design-Rule Checking
- Setpoint: An Aspect Oriented Framework Based on Semantic Pointcuts
- Phase Aware Profiling with Phoenix
- Using Call Graph Analyses to Guide Selective Specialization in Phoenix
- Program Visualization with Fulcra and Phoenix
- Navel: Automating Software Support Using Traces of Software Behavior
- Techniques and Tools for Software Assurance
- Type-Checking the Intermediate Languages in the Phoenix JIT Compiler

Introducing IMPACT Parallelism Discovery and Visualization Capabilities into Phoenix
Wen-mei Hwu
Sanders-AMD Endowed Chair
University of Illinois, Urbana-Champaign

Fulcra Integration
[Diagram: the Phoenix pipeline (HIR, MIR, Type Check, SSA, LIR, Addr Mode, Scheduling, Executable) beside the IMPACT pipeline (C Source Code, Profile, Inline, Pcode (AST), Lcode (3 addr), IMPACT Opti, Scheduling, Executable). The original Fulcra works on Pcode; the ported Fulcra works on Lcode, is fed by the Phoenix to IMPACT bridge, and drives the Visualizer.]
- Pointer analysis performed in Pcode is based on source code semantics. The result of the analysis is annotated onto Pcode expressions, which are translated to Lcode operations during PtoL.
- We have ported our pointer analysis to work on our lower-level IR (Lcode).
- A bridge from Phoenix to Lcode feeds MIR into Fulcra.
- Pointer analysis information is annotated directly onto Lcode operations for later visualization.

Pcode vs. Lcode
- The interface between Lcode and Fulcra is adjusted so that the initial constraint graph, which contains the intra-procedural pointer information, is correctly generated from the Lcode IR.
- Building the initial constraint graph from Pcode requires breaking complex expressions down into a sequence of simpler expressions.
- The FSM required to keep track of the pointer assignment status of a Pcode expression is complicated.
- In Lcode, each operation is itself a very simple expression.
- Only Lcode operations that can change pointer assignments are important: ld, st, mov, add, jsr and ret.
- Having type information annotated from Pcode onto the operands of each Lcode operation makes the analysis more efficient.
- Only Lcode operations that include operands of pointer type are considered when building the initial constraint graph.

Code Example: Image Interpolation

    /* Frame is used here through its image, width, and height fields. */
    void InterpolateImage (Frame * small, Frame * large)
    {
        short *short_row, *large_row;
        unsigned int i, j;
        int height, width;

        for (j = 0; j < small->height - 1; j++) {
            /* Row pointers into the source and destination images. */
            short_row = small->image + j * small->width;
            large_row = large->image + 2 * j * small->width;
            for (i = 0; i < small->width - 1; i++) {
                /* Four stores per iteration: one copy and three averages. */
                large_row[2*i] = short_row[i];
                large_row[2*i+1] = ((short_row[i] + short_row[i+2]) >> 1);
                large_row[2*i+large->width] = ((short_row[i] + short_row[i+small->width]) >> 1);
                large_row[2*i+large->width+1] = ((short_row[i] + short_row[i+small->width+1]) >> 1);
            }
        }
    }

Constraint Graph in Pcode
[Figures: intra-procedural constraint graph; fully solved constraint graph]
- The FSM that derives intra-procedural pointer relationships in Pcode can lead to a more conservative initial graph.

Constraint Graph in Lcode
[Figures: intra-procedural constraint graph; fully solved constraint graph]
- The simpler FSM that derives intra-procedural pointer relationships in Lcode provides more accurate results.

Code Example: Image Interpolation (same InterpolateImage code, with callouts)
- Callout next to the pointer declarations at the top of the function: if these are different objects, there are not many false dependences left.
- Remaining dependences should be resolved through array disambiguation.

Visualizer Mockup
[Figure: three panels showing where the stores of loop iterations 1 and 2 land in the large image]
- (a) Conceptual access pattern of stores, with rows large->width apart.
- (b) Actual access pattern (flattened array): large->width is 2 * small->width, and stores do not overlap due to the loop bounds.
- (c) Compiler view: because large->width is an unconstrained value, stores may overlap (marked POTENTIAL CONFLICT); loop iterations will not be executed in parallel.

Phoenix RDK Difficulties
- Memory management can be tricky when connecting a managed Phoenix plugin to legacy C research infrastructure
  - Handoff of strings from Phoenix to IMPACT must be handled carefully
- Updating between RDK releases
  - Feb 2005 to Nov 2005: memory management, plugin interface changes
  - Nov 2005 to May 2006: minor changes to available functions
  - Documentation does not seem to have kept up with updates
- Type information available to a Phoenix phase is not complete
  - Structure types
  - Void pointers

Phoenix RDK Positives
- Windows POSIX API makes it simple to migrate Unix-based research infrastructure
  - Minor changes to a few function names
  - MSDN docs contained all the information necessary to do the port
  - Fewer than 50 lines (out of ~200,000) needed to be updated
- Phoenix documentation and sample code are quite helpful
- Phoenix Academic Forum is very responsive

Phoenix downloads and info: http://research.microsoft.com/phoenix

SSCLI (Rotor)

What is SSCLI?
- SSCLI is a shared source implementation of CORE TECHNOLOGIES that underlie Microsoft's .NET architecture.
- SSCLI is current with advances in the commercial Common Language Runtime.
- SSCLI is designed and documented for ACADEMIC RESEARCH and TEACHING.

SSCLI Motivations & Milestones
Motivations:
- Support ECMA/ISO standards efforts
- Validate OS platform neutrality of .NET
- Create an open laboratory for academic research
- Track evolution of the commercial CLR
- Share!
Milestones:
- SSCLI 1.0 released at OOPSLA 2002
- Over 250,000 downloads through June 2006
- SSCLI 2.0 released March 2006!

Managed Code Execution
[Diagram: flow of managed code through three stages, DEVELOPMENT, DEPLOYMENT, and EXECUTION. Labels in the figure:]
- Source code, Compiler, Assembly (PE header + MSIL + Metadata + EH Table)
- Evidence, Host, Policy, Policy Manager, Granted permissions, Permission request (class) (method)
- Assembly Loader (GAC, app. directory, download cache), Assembly info (Module + Class list), NGEN
- Class Loader, Vtable + Class info, PEVerify, JIT + verification, Native code, Class init + GC table
- CLR Services: GC, Exception, Security

Source code shown in the figure:

    using System;
    using System.IO;

    class Program   // class name is illustrative; the slide shows only Main
    {
        public static void Main(String[] args)
        {
            String usr; FileStream f; StreamWriter w;
            try {
                // Write the current user name to a file.
                usr = Environment.GetEnvironmentVariable("USERNAME");
                f = new FileStream("C:\\test.txt", FileMode.Create);
                w = new StreamWriter(f);
                w.WriteLine(usr);
                w.Close();
            } catch (Exception e) {
                Console.WriteLine("Exception:" + e.ToString());
            }
        }
    }

Policy file shown in the figure (the excerpt is cut off on the slide):

    <?xml version="1.0" encoding="utf-8" ?>
    <configuration>
      <mscorlib>
        <security>
          <policy>
            <PolicyLevel version="1">
              <CodeGroup class="UnionCodeGroup" version="1" PermissionSetName="Nothing" Name="All_Code" Description="Code group grants no permissions and forms the root of the code group tree.">
                <IMembershipCondition class="AllMembershipCondition" version="1"/>
                <CodeGroup class="UnionCodeGroup" version="1" PermissionSetName="FullTrust"

The Shared Source CLI (SSCLI)
[Diagram: layered view of the .NET stack. Labels in the figure:]
- Languages and tools: VS.NET, C#, JScript, VB, VC/MC++, Debugger, Designers
- SDK Tools: ILDAsm, ILAsm, ILDbDump, SN, MetaInfo, PEVerify, CorDBG
- System.Web (ASP.NET): UI, SessionState, HtmlControls, Caching, Security, WebControls, Configuration
- Simple Web Services: Protocols, Discovery, Description
- System.WinForms: Design, ComponentModel
- System.Drawing: Drawing2D, Printing, Imaging, Text
- System.Data (ADO.NET): ADO, SQL, Design, Adapters
- System.Xml: XSLT, Serialization, XPath
- System: Collections, IO, Security, Configuration, Net, ServiceProcess, Diagnostics, Reflection, Text, Remoting, Globalization, Resources, Threading, Serialization, Runtime, InteropServices
- Common Language Runtime: Common Type System, Class Loader, App Domain Loader, Boot Loader, JIT, MSIL, GC, Threads
- Platform Adaptation Layer: Sync, Timers, Networking, Filesystem
- ECMA CLI

SSCLI: For Teaching and Research
- Complete example to enable research in, and support teaching of, modern programming languages, compiler design, and runtime infrastructure
- SSCLI supports the ECMA standardization process with a real implementation
- Commercial-grade code (but documented for academia)
- SSCLI license allows for "safe" examination of code
- For compiler writers who want to target the CLI:
  - JScript compiler shows dynamic techniques (in C#)
  - C# compiler shows nearly all runtime features
  - IL Assembler demonstrates low-level API implementation and use

How SSCLI Is Organized
Four major areas in the source code:
1. Runtime "execution engine"
2. Frameworks
3. Compilers and tools
4. Portability layer, tests, and build infrastructure
Other important points of interest:
- License
- Documentation
- Samples

Research Support
- 40 research projects funded through Rotor research grants in 2002, 40 funded in 2004
- SSCLI RFP Capstone Workshop II held Fall 2005
  - Researchers from 18 countries
  - 27 research and teaching projects presented
  - http://research.microsoft.com/workshops/SSCLI2005/
- SSCLI RFP Capstone Workshop I held Fall 2003
  - Researchers from over 20 countries
  - 16 refereed papers presented
  - IEE Journal: special Rotor research issue

Contracts for CLI with Rotor
Nam Tran
Program Manager, Phoenix
Microsoft Corporation

Research Interests
- Programming languages and tools
- Project on component contracts
  - At Monash University, Melbourne, Australia
  - Design-by-Contract and interaction constraints
  - Apply to binary components
  - Independent of source languages and notations
- Proof-of-concept system on the CLI
  - Extended the CLI platform
  - Implemented by extending Rotor

The Model
- Contracts include
  - Preconditions, postconditions, invariants
  - Protocols, mandatory calls, forbidden calls
- Binary components include
  - Contract representation as first-class entities
  - Sufficient for run-time monitoring
- Execution environment
  - Aware of contracts
  - Provides built-in run-time monitoring

The Implementation
- Extend the CLI specification
  - New metadata table to represent contracts
  - Auxiliary methods to evaluate assertions
  - Extended ilasm for source-neutral contracts
  - Execution engine provides monitoring
- Extend Rotor to implement the extended CLI
  - Dynamic instrumentation of IL before JIT
  - Built-in capability to monitor
    - Protocols as state machines
    - Mandatory/forbidden calls by walking call stacks

CLI/Rotor Benefits
- Open, extensible component platform
  - Components with contracts runnable on .NET
- Full implementation with source
- A great research platform
  - Only need to implement innovative features
  - E.g. fewer than 10K lines of code for contracts

Thank you!
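
The "protocols as state machines" monitoring idea from the contracts talk above can be illustrated with a small table-driven monitor. This is a minimal sketch under stated assumptions, not the Rotor implementation: the file-like protocol (open, then any number of reads, then close), the state and event names, and the functions `monitor_step` and `trace_conforms` are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical protocol for a file-like component: open, read*, close. */
typedef enum { ST_CLOSED, ST_OPEN, ST_VIOLATED, ST_COUNT } State;
typedef enum { EV_OPEN, EV_READ, EV_CLOSE, EV_COUNT } Event;

/* Transition table: next state for each (state, event) pair.
   Any call the protocol does not allow in a state leads to ST_VIOLATED,
   which is a sink state. */
static const State transitions[ST_COUNT][EV_COUNT] = {
    /* ST_CLOSED   */ { ST_OPEN,     ST_VIOLATED, ST_VIOLATED },
    /* ST_OPEN     */ { ST_VIOLATED, ST_OPEN,     ST_CLOSED   },
    /* ST_VIOLATED */ { ST_VIOLATED, ST_VIOLATED, ST_VIOLATED },
};

/* Advance the monitor by one observed call.
   Returns false as soon as the protocol is violated. */
static bool monitor_step(State *s, Event e)
{
    assert((int)*s >= 0 && *s < ST_COUNT);
    assert((int)e >= 0 && e < EV_COUNT);
    *s = transitions[*s][e];
    return *s != ST_VIOLATED;
}

/* Check a whole call trace: it conforms if no step violates the
   protocol and the component ends in the closed state. */
static bool trace_conforms(const Event *trace, size_t n)
{
    State s = ST_CLOSED;
    for (size_t i = 0; i < n; i++) {
        if (!monitor_step(&s, trace[i]))
            return false;
    }
    return s == ST_CLOSED;
}
```

In a runtime-integrated monitor, a step like `monitor_step` would be driven by instrumented method calls rather than by a prebuilt trace; the trace form is used here only to keep the sketch self-contained.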
Books
- Shared Source CLI Essentials. Dave Stutz, Geoff Shilling, Ted Neward; O'Reilly (2003)
- Compiling for the .NET Common Language Runtime (CLR). John Gough; Prentice Hall (2002)
- Inside Microsoft .NET IL Assembler. Serge Lidin; Microsoft Press (2002, 2006)
- The Annotated CLI Standard. Jim Miller, Susann Ragsdale; Addison Wesley (2003)
- Distributed Virtual Machines: Inside the Rotor CLI. Gary Nutt; Addison Wesley (2005)

SSCLI downloads and info: http://research.microsoft.com/sscli

Operating Systems
- Windows Academic Program
- Windows CE
- Singularity

Windows Academic Program
- Windows Operating System Internals Curriculum Resource Kit (CRK): presentation slides, experiments, labs, quizzes and assignments for introducing case studies from the Windows kernel into operating system courses.
- Windows Research Kernel (WRK): the core kernel sources and binaries, integrated with an environment for building and testing experimental versions of the Windows kernel for use in teaching and research.
- ProjectOZ: an operating systems project environment that uses the native kernel interfaces of Windows to provide simple, clean, user-mode abstractions of the CPU, MMU, trap mechanism, and physical memory that can be used to perform experiments in operating systems principles.

WRK (Windows Research Kernel)
- Source from the latest shipping Windows (NTOS) kernel
- Version: Windows Server 2003 (x86/x64) and Windows XP x64
- Included sources: nearly everything in NTOS, including processes, threads, LPC, VM, scheduler, object manager, I/O manager, synchronization, worker threads, kernel memory manager, …
- Excluded sources: plug-and-play, power management, and specialized code such as the driver verifier, splash screen, branding, timebomb, etc.
- Build environment: makefile-based, with an object library for the excluded sources
- Kernels boot on native hardware or using Virtual PC
Windows CE
- For mobile and embedded devices
- All the goodies: Kernel Library, File System, Device Manager, Storage Manager, HTTP Web Server, Explorer Shell, SOAP Implementations, UPnP AV toolkit, Infrared Data Association, Microsoft Message Queuing, C run-time, Binary ROM Image file system, Windows Sockets Interface, Point to Point Protocol

Singularity
- A multidisciplinary research project from MSR spanning OS, languages, and tools
- Key approaches:
  - Pervasive use of safe and analyzable programming languages
  - Improve system resilience despite software errors
  - Design for verifiability
  - Microkernel architecture
- Written in Spec# (talk later today)
- Not a Windows/CLR replacement

Downloads and Info
- Windows Academic Program (Probert/Retik Brownbag at noon today!): http://www.microsoft.com/resources/sharedsource/Licensing/WindowsAcademic.mspx
- Windows CE: http://www.microsoft.com/resources/sharedsource/Licensing/WindowsCE_Academic.mspx
- Singularity (Larus/Hunt talk at 2:45 today!): http://research.microsoft.com/os/singularity/

Thank you!

© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.