DCSPM: Develop and Compile Subset of PASCAL Language to MSIL
Master Proposal by Abdullah Sheneamer
Master of Computer Science
University of Colorado, Colorado Springs

1. Committee Members and Signatures:
Approved by                                  Date
Advisor: Dr. Albert Glock
Committee member: Dr. Edward Chow
Committee member: Albert Brouillette

2. Introduction
In the computer world, techniques evolve rapidly, from theories and algorithms to programming languages, software systems, and software engineering. Fortunately, compilers allow programmers to write at a high level while automated processing takes care of creating the machine-specific instructions. My project will design and create a compiler that translates PASCAL source code into Microsoft Intermediate Language (MSIL). MSIL includes instructions for loading, storing, initializing, and calling methods on objects, as well as instructions for arithmetic and logical operations. Few PASCAL compilers target MSIL. At run time, the just-in-time (JIT) compiler converts the MSIL to CPU-specific code [1]. The advantages of compiling to MSIL are that 1) legacy PASCAL programs can run on modern machines, 2) MSIL is platform independent, and 3) JIT compilers can be optimized for specific machines and architectures. The JIT compiler can also perform aggressive optimizations specifically for the machine on which the code is running.

Figure 1: The compilation and execution process of PASCAL programs.
PASCAL source -> [Compilation: PASCAL compiler] -> MSIL -> [Execution: JIT compiler] -> Native code

PASCAL source:
Program HelloWorld;
Begin
  Writeln('Hello World');
End.

MSIL:
.method public static void Main() cil managed
{
  .entrypoint
  .maxstack 1
  IL_00: ldstr "Hello World"
  IL_05: call void [mscorlib]System.Console::WriteLine(string)
  IL_10: ret
} // end of method HelloWorld::Main

-Compilation process: takes PASCAL source code and produces MSIL. The PASCAL compiler performs lexical and syntax analysis and creates the symbol table. When compiling to managed code, the compiler translates the source code into MSIL.
MSIL is a CPU-independent set of instructions that can be efficiently converted to native code. Figure 2 shows the phases of the compilation process.
-Execution process: MSIL must be converted to CPU-specific code, usually by a just-in-time (JIT) compiler. Native code is code compiled to run on a particular processor (such as an Intel x86-class processor) using that processor's instruction set.

Figure 2: Compilation process.
Components: Source code of PASCAL, Lexical Analysis, Parser, Symbol Table, Error Handler, and the MSIL output (the same Main method listing as in Figure 1).

3. Project Plan
As part of this project, I will design the translation of the PASCAL language into the intermediate language (IL, or MSIL). I plan to design a compiler that compiles a subset of PASCAL to MSIL, including the assignment statement, the Writeln instruction, the if statement, the if/else statement, the while statement, the for statement, and the switch (PASCAL case) statement. I will also design one algorithm for lexical analysis and another for syntax and semantic analysis, evaluate the performance of these algorithms, and use those measurements to improve the compiler.

ILAsm (the IL Assembler) is a textual assembly language for MSIL, much as a native assembler is for machine code. You can write ILAsm code in any text editor, such as Notepad, and then compile it with the command-line assembler ILAsm.exe, which ships with the .NET Framework and is located in the <windowsfolder>\Microsoft.NET\Framework\<version> folder. You can add this folder to your PATH environment variable. By default, ILAsm produces an .exe file with the same base name as the .il file. You can specify a different output file name with the /output=<filename> switch, for example: ILAsm Test.il /output=MyFile.exe.
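The assembly step described above can be scripted. The following Python sketch is only an illustration and not part of the proposed C# compiler; the helper names wrap_module, ilasm_command, and CONSOLE_WRITELINE are hypothetical. It wraps a list of MSIL instructions in a minimal ILAsm source file whose static Main is the entry point (modeled on the HelloWorld listing, but without the metadata header and constructor), and it builds the ILAsm command line using the /output= switch.

```python
from typing import List, Sequence

# Hypothetical helpers: only the ILAsm directives and the /output= switch come from real tooling.
CONSOLE_WRITELINE = "call void [mscorlib]System.Console::WriteLine(string)"


def wrap_module(name: str, body: Sequence[str], int_locals: Sequence[str] = ()) -> str:
    """Return minimal ILAsm source: one class whose static Main runs the given MSIL lines."""
    lines = [
        ".assembly extern mscorlib {}",
        f".assembly {name} {{}}",
        f".class public auto ansi {name} extends [mscorlib]System.Object",
        "{",
        "  .method public static void Main() cil managed",
        "  {",
        "    .entrypoint",
        "    .maxstack 8",
    ]
    if int_locals:  # one named int32 local per PASCAL INTEGER variable
        lines.append("    .locals init (" + ", ".join(f"int32 {v}" for v in int_locals) + ")")
    lines += [f"    {instr}" for instr in body]
    lines += ["  }", "}"]
    return "\n".join(lines) + "\n"


def ilasm_command(il_path: str, exe_name: str) -> List[str]:
    """Argument list for ILAsm.exe; /output= overrides the default <base name>.exe."""
    return ["ilasm", il_path, f"/output={exe_name}"]
```

For example, saving wrap_module("HelloWorld", ['ldstr "Hello World"', CONSOLE_WRITELINE, "ret"]) as HelloWorld.il and running the command from ilasm_command("HelloWorld.il", "HelloWorld.exe") (for instance through subprocess.run) on a machine with the .NET Framework should produce a program that prints Hello World. The .locals init line is where the PASCAL INTEGER variables would become int32 locals.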
To run the resulting executable, type its name at the command prompt and press Enter; the program's output appears on the screen [11].

So, my project will include the following items. In the statement templates of items 4 to 8, <relop> stands for one of the relational operators >, <, =, >=, <=, and <op> stands for one of the arithmetic operators +, -, *, /.

1- Design and implement lexical analysis, syntax analysis, and semantic analysis for a subset of the PASCAL language.
2- Design the assignment statement and compile it to the intermediate language. An assignment statement gives a value to a variable, such as x := 5;. The value may be an arithmetic expression that uses addition (+), subtraction (-), multiplication (*), and division (/), such as a := b + c / d - e. A single-mode arithmetic expression is one whose operands are all of the same type (e.g., INTEGER, REAL, or COMPLEX). Only INTEGER and REAL are covered in this project, so the values and variables in a single-mode arithmetic expression are either all integers or all real numbers.
3- Design and compile the "Writeln" instruction to MSIL. The PASCAL compiler is structured so that a write or writeln statement containing more than one argument is compiled into several write statements with one argument each; for writeln, these statements are followed by a statement that writes the end-of-line. A single-argument example is the writeln statement in the following program:
   Program Write;
   Begin
     writeln('This writeln is compiled into MSIL ');
   End.
4- Design and compile the "if" statement to MSIL:
   a- If variable1 <relop> variable2 Then Begin variable3 := variable1 * variable2; End;
   b- If variable1 <relop> variable2 Then Begin Writeln('Conditional statement'); End;
   c- If variable1 <relop> Number Then Begin Writeln('Conditional statement'); End;
5- Design and compile the "if/else" statement to MSIL:
   a- If variable1 <relop> variable2 Then Begin variable3 := variable1 <op> variable2; End Else Begin variable2 := variable1 <op> variable3 End
   b- If variable1 <relop> variable2 Then Begin variable3 := variable1 <op> variable2; End Else Begin Writeln('Conditional statement'); End
   c- If variable1 <relop> Number Then Begin variable3 := variable1 <op> variable2; End Else Begin Writeln('Conditional statement'); End
6- Design and compile the "While" statement to MSIL:
   a- While Variable1 <relop> Variable2 Do Begin Writeln('While Statement'); Variable1 := Variable1 + 1; End;
   b- While Variable1 <relop> Number Do Begin Writeln('While Statement'); Variable1 := Variable1 + 1; End;
   c- While Variable1 <relop> Variable2 Do Begin Variable3 := Variable1 <op> Variable2 …; Variable1 := Variable1 + 1; End;
   d- While Variable1 <relop> Number Do Begin Variable3 := Variable1 <op> Variable2 …; Variable1 := Variable1 + 1; End;
7- Design and compile the "For" statement to MSIL:
   a- For I := Number To Number Do Begin Writeln('For Statement'); End;
   b- For I := Number To Number Do Begin Variable3 := Variable1 <op> Variable2 …; End;
   c- For I := Number To Number Do Begin If variable1 <relop> variable2 Then Begin Writeln('Conditional statement') End; End;
8- Design and compile the "Switch" (PASCAL case) statement to MSIL:
   a- Case Variable Of Value1: Writeln('A'); Value2: Writeln('B'); Value3: Writeln('C'); Else Writeln('D'); End
9- Evaluate the algorithms.

For example, the following PASCAL program is compiled to MSIL:
Program HelloWorld;
Begin
  Writeln('Hello World');
End.

The resulting MSIL:
// Metadata version: v4.0.30319
.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 ) // .z\V.4..
  .ver 2:0:0:0
}
.assembly HelloWorld
{
  .hash algorithm 0x00008004
  .ver 0:0:0:0
}
.module expression.dll
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003       // WINDOWS_CUI
.corflags 0x00000001    // ILONLY
// Image base: 0x00820000

// =============== CLASS MEMBERS DECLARATION ===================

.class public auto ansi HelloWorld
       extends [mscorlib]System.Object
{
  .method public static void Main() cil managed
  {
    .entrypoint
    .maxstack 1
    IL_00: ldstr "Hello World"
    IL_05: call void [mscorlib]System.Console::WriteLine(string)
    IL_10: ret
  } // end of method HelloWorld::Main

  .method public specialname rtspecialname
          instance void .ctor() cil managed
  {
    .maxstack 2
    IL_00: ldarg.0
    IL_01: call instance void [mscorlib]System.Object::.ctor()
    IL_06: ret
  } // end of method HelloWorld::.ctor

} // end of class HelloWorld

3.1 Tasks:
3.1.1 Already Complete - done during fall 2011 to present
- Designed and implemented lexical analysis [2], [4], [5], [6]
- Built the assembly and compiled the "Writeln" instruction to MSIL [7], [8], [9]
3.1.2 In Progress - should finish in spring 2012
- Design and implement the syntax analyzer ("parser") [4], [7], [8], [9]
- Design and compile the assignment statement [7], [8], [9]
- Design and compile a subset of the "if" statement to MSIL [7], [8], [9]
- Design and compile a subset of the "if/else" statement to MSIL [7], [8], [9]
3.1.3 Future - to complete during summer 2012/fall 2012 (listed from highest to lowest priority)
- Design and compile a subset of the "For" statement to MSIL [7], [8], [9]
- Design and compile a subset of the "While" statement to MSIL [7], [8], [9]
- Design and compile a subset of the "Switch" statement to MSIL [7], [8], [9]
- Evaluate the algorithms.

3.2 Deliverables:
1- A working C#-based PASCAL compiler.
2- A master's report documenting the design and implementation of the PASCAL subset compiler. Improvements in the compilation process will also be demonstrated and documented.

4.0 References
1. http://msdn.microsoft.com/en-us/library/c5tkafs1(v=vs.71).aspx
2. H. M. Deitel, P. J. Deitel, J. Listfield, T. R. Nieto, C. Yaeger, and M. Zlatkina. C# How to Program.
3. Kenneth C. Louden. Compiler Construction: Principles and Practice.
4. D. S. Malik and P. S. Nair. Data Structures Using Java.
5. Peter Linz. An Introduction to Formal Languages and Automata, Fourth Edition.
6. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. 1985.
7. Abdul Sattar and Torben Lorenzen. Develop a Compiler in Java for a Compiler Design Course.
8. James T. Streib. Guide to Assembly Language: A Concise Introduction [electronic resource]. London; New York: Springer, 2011.
9. Gerald Wildenberg. Using a Stack Assembler Language in a Compiler Course. St. John Fisher College, Rochester, NY; Bristol Polytechnic, England (1989-1990).
10. Serge Lidin. Expert .NET 2.0 IL Assembler. Berkeley, CA.
11. http://www.codeproject.com/Articles/3778/Introduction-to-IL-Assembly-Language

Appendix A:
<program> ::= Program <identifier> ; <block> .
<block> ::= <variable declaration part> <procedure declaration part> <statement part>
<variable declaration part> ::= <empty> | var <variable declaration> ; { <variable declaration> ; }
<variable declaration> ::= <identifier> { , <identifier> } : <type>
<type> ::= <simple type>
<simple type> ::= <type identifier>
<type identifier> ::= <identifier>
<statement part> ::= <compound statement>
<compound statement> ::= begin <statement> { ; <statement> } end
<statement> ::= <simple statement> | <structured statement>
<simple statement> ::= <assignment statement> | <read statement> | <write statement> | <if statement> | <for statement>
<assignment statement> ::= <variable> := <expression>
<read statement> ::= read ( <input variable> { , <input variable> } )
<input variable> ::= <variable>
<write statement> ::= write ( <output value> { , <output value> } )
<output value> ::= <expression>
<structured statement> ::= <compound statement> | <if statement> | <while statement>
<if statement> ::= if <expression> then <statement> | if <expression> then <statement> else <statement>
<while statement> ::= while <expression> do <statement>
<for statement> ::= for <variable identifier> := <expression> to <expression> do <statement>
<expression> ::= <simple expression> | <simple expression> <relational operator> <simple expression>
<simple expression> ::= <term> { <adding operator> <term> } | <sign> <term> { <adding operator> <term> }
<sign> ::= + | -
<term> ::= <factor> { <multiplying operator> <factor> }
<factor> ::= <variable> | <integer constant> | ( <expression> )
<relational operator> ::= = | <> | < | <= | >= | >
<adding operator> ::= + | -
<multiplying operator> ::= * | /
<variable> ::= <entire variable>
<entire variable> ::= <variable identifier>
<variable identifier> ::= <identifier>
<identifier> ::= <letter> { <letter or digit> }
<letter or digit> ::= <letter> | <digit>
<integer constant> ::= <digit> { <digit> }
<character constant> ::= '<any character other than '>' | ''''
<letter> ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<special symbol> ::= + | - | * | = | <> | < | > | <= | >= | ( | ) | := | . | , | ; | : | if | then | else | of | while | do | begin | end | read | write | var | program | switch | for | to
<predefined identifier> ::= integer | Boolean
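The expression, assignment, if, and while productions of this grammar map naturally onto recursive descent. The following Python sketch shows one possible way, not necessarily the way the proposed C# compiler will work, to emit MSIL from them in a single pass: each nonterminal becomes a method, operands are pushed onto the evaluation stack, and conditions compile to brfalse jumps over labels. The names tokenize, Compiler, and the label prefixes ELSE_, ENDIF_, LOOP_, DONE_ are hypothetical. Assumptions: only declared INTEGER variables (mapped to named int32 locals whose names do not clash with ILAsm keywords) and integer constants, no unary sign, and / mapped to the IL div instruction (truncating integer division). In standard PASCAL, / always yields a REAL result and integer division is written div. Each label takes its number from the token position of its if or while keyword, so nested statements get distinct labels.

```python
import re

# Hypothetical lexer pattern: integer | identifier | two-char symbol | one-char symbol.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z][A-Za-z0-9]*|:=|<=|>=|<>|[-+*/=<>();.])")
ARITH = {"+": "add", "-": "sub", "*": "mul", "/": "div"}  # INTEGER-only: / is truncating div
REL = {"=": ["ceq"], "<>": ["ceq", "ldc.i4.0", "ceq"], "<": ["clt"], ">": ["cgt"],
       "<=": ["cgt", "ldc.i4.0", "ceq"], ">=": ["clt", "ldc.i4.0", "ceq"]}  # push 1 or 0


def tokenize(src):
    """Lexical analysis: return lowercase tokens followed by an '<eof>' marker."""
    toks, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        toks.append(m.group(1).lower())  # PASCAL is case-insensitive
        pos = m.end()
    return toks + ["<eof>"]


class Compiler:  # one-pass recursive-descent parser that appends MSIL lines to self.code
    def __init__(self, src, variables):
        self.toks, self.i, self.code = tokenize(src), 0, []
        self.vars = {v.lower() for v in variables}  # symbol table: declared INTEGER names

    def take(self, expected=None):
        tok = self.toks[self.i]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def statement(self):  # any other token is left in place (the empty statement)
        tok, at = self.toks[self.i], self.i  # keyword position makes labels unique
        if tok == "begin":  # begin <statement> { ; <statement> } end
            self.take()
            self.statement()
            while self.toks[self.i] == ";":
                self.take()
                self.statement()
            self.take("end")
        elif tok == "if":  # if <expression> then <statement> [ else <statement> ]
            self.take()
            self.expression()
            self.take("then")
            self.code.append(f"brfalse ELSE_{at}")
            self.statement()
            if self.toks[self.i] == "else":
                self.take()
                self.code += [f"br ENDIF_{at}", f"ELSE_{at}:"]
                self.statement()
                self.code.append(f"ENDIF_{at}:")
            else:
                self.code.append(f"ELSE_{at}:")
        elif tok == "while":  # while <expression> do <statement>
            self.take()
            self.code.append(f"LOOP_{at}:")
            self.expression()
            self.take("do")
            self.code.append(f"brfalse DONE_{at}")
            self.statement()
            self.code += [f"br LOOP_{at}", f"DONE_{at}:"]
        elif tok in self.vars:  # <variable> := <expression>
            self.take()
            self.take(":=")
            self.expression()
            self.code.append(f"stloc {tok}")

    def expression(self):  # <simple expression> [ <relational operator> <simple expression> ]
        self.binary(self.term, ("+", "-"))
        op = self.toks[self.i]
        if op in REL:
            self.take()
            self.binary(self.term, ("+", "-"))
            self.code += REL[op]

    def term(self):  # <factor> { <multiplying operator> <factor> }
        self.binary(self.factor, ("*", "/"))

    def binary(self, operand, ops):  # operand { op operand }, left-associative
        operand()
        while self.toks[self.i] in ops:
            op = self.take()
            operand()
            self.code.append(ARITH[op])

    def factor(self):  # <variable> | <integer constant> | ( <expression> )
        tok = self.take()
        if tok.isdigit():
            self.code.append(f"ldc.i4 {int(tok)}")
        elif tok in self.vars:
            self.code.append(f"ldloc {tok}")
        elif tok == "(":
            self.expression()
            self.take(")")
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
```

For instance, calling statement() on Compiler("a := b + c / d - e", ["a", "b", "c", "d", "e"]) leaves ldloc b, ldloc c, ldloc d, div, add, ldloc e, sub, stloc a in code: operator precedence follows from the term/factor recursion, and left-to-right evaluation from the loop in binary. The emitted lines would still need to be placed in a Main method body with a matching .locals init declaration and a final ret. The for and case statements could reuse the same label-and-branch pattern.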