Appendix A - College of Engineering and Applied Science

Abdullah Sheneamer 2012
DCSPM
Develop and Compile Subset of
PASCAL Language to MSIL
By
Abdullah Sheneamer
A project submitted to the Faculty of Graduate School of the
University of Colorado at Colorado Springs
in Partial Fulfillment of the Requirements
for the Degree of
Master of Science in Computer Science
Department of Computer Science
Fall 2012
© Copyright by Abdullah Sheneamer 2012
All Rights Reserved
This project for the Master of Science degree by
Abdullah Sheneamer
has been approved for the
Department of Computer Science
By
_______________________________________________________
Dr. Albert Glock, Advisor
_______________________________________________________
Dr. C. Edward Chow, Committee member
_______________________________________________________
Albert Brouillette, Committee member
_______________________________
Date
DCSPM
Develop and Compile Subset of
PASCAL Language to MSIL
Abstract
The focus of this project is to design the translation of the PASCAL language into Microsoft
Intermediate Language (IL or MSIL). The project designs a compiler, called DCSPM, that compiles a
program written in a subset of PASCAL to MSIL. The subset includes the assignment statement,
write-line instructions, the if statement, the if/else statement, the while statement, the for
statement, the switch statement, the if logic statement, and one-dimensional arrays. Because
compilation time is important, we evaluated different implementations of the lexical analysis and
parser phases, which can become a bottleneck, for their speed. The results show that the DCSPM
implementation is fast and that the generated code is reliable and efficient. First, the lexical
analyzer is built; it reads the Pascal source code and produces tokens to be passed to the parser.
The MSIL for the PASCAL program is generated as the output of the parser. One of the main
difficulties in this research is verifying the correctness of the MSIL code generated by DCSPM. To
do so, I compare it with the MSIL generated from similar C# code and verify that both produce the
same execution results. DCSPM supports simple nested if statements, if statements with a complex
condition at a single level, and simple one-dimensional arrays with limited operations, such as use
inside a print statement. DCSPM produces efficient MSIL intermediate code, which can then be
assembled into a .NET managed executable. DCSPM can serve as an educational tool for students
studying PASCAL, compiler technology, and MSIL. The lessons learned can be applied to other
programming languages.
Acknowledgements
I would never have been able to finish my dissertation without the guidance of my
committee members and the support of my family and wife.
I would like to express my deepest gratitude to my advisor, Dr. Albert Glock, for his
excellent guidance, care, and patience, and for providing me with an excellent atmosphere for
doing research.
I offer my sincerest gratitude to Dr. Edward Chow, who let me experience research on
practical issues beyond the textbooks, patiently corrected my writing, and raised important
questions and ideas through his comments on my proposal.
I would also like to thank Albert Brouillette for his interest in the success of my proposal
and for raising important questions and ideas through his comments on it.
Contents
1 Introduction
1.1 Motivation
2 Background
2.1 Overview of Compilation Process
2.2 History
3 Design
3.1 Introduction to Symbol Table and Lexical Analysis
3.1.1 Symbol Table Design
3.1.2 Lexical Analysis Design
3.2 Parser and MSIL (Microsoft Intermediate Language) of PASCAL Language Design
3.2.1 Introduction to Parser (Syntax Analysis)
3.2.2 Parser (Syntax Analysis) Design
3.2.3 Introduction to MSIL (Microsoft Intermediate Language)
3.2.4 Intermediate language Instructions
3.2.5 MSIL (Microsoft Intermediate Language) Design
3.2.6 Design Common Syntax Errors Table
4 Implementation
5 Improvements and Evaluations
5.1 Improvements
5.1.1 Lexical Analysis Improvement
5.1.2 Microsoft Intermediate Language (MSIL) of If Statement Improvement
5.2 Evaluations and performance
6 Lessons Learned
7 Future Works
8 Conclusion
9 References
Appendix A: PASCAL Grammar BNF
Appendix B: Installing Visual C# 2010 Express Edition
Appendix C:
- How to use DCSPM Compiler
- How to Compile Your Source Code
- How to Test Pascal Code
List of Figures
Figure 1: The compilation and execution process of PASCAL programs
Figure 2: Compilation process
Figure 3: A Compiler
Figure 4: Class Token in Lexical Analysis
Figure 5: State diagram for the lexical Analyzer (states 0,1,2)
Figure 6: State diagram for the lexical Analyzer (states 3, 4, 5)
Figure 7: Syntax Tree
Figure 8: Typical Data Structure for the given Syntax Tree
Figure 9: Steps in the top-down construction of Parse Tree
Figure 10: Method memory categories
Figure 19: MSIL of One dimensional Array has one element
Figure 20: MSIL of One dimensional Array has four elements
Figure 11: Application Code using .NET
Figure 12: JIT Compilation
Figure 13: NET CLR
Figure 14: Array list data structure vs. Dictionary data structure
Figure 15: Parser phase results
Figure 16: initial and improved IF/Else MSIL results
Figure 17: Benchmark between size files of initial and improved IF/Else.il
Figure 18: How Branches of If/else statements logic works
List of Tables
Table 1: Part of Symbol Table
Table 2: Two Characters Tokens
Table 3: Array list data structure vs. Dictionary data structure
Table 4: Complexity of ArrayList vs. Dictionary
Table 5: Parser phase results
Table 6: benchmark between unimproved and improved IF/Else MSIL
Table 7: benchmark between initial and improved IF/Else.il files
1 Introduction
In the computer world, techniques evolve rapidly, from theories and algorithms to programming
languages, software systems, and software engineering.
“Programming languages are notations for describing computations to people and to
machines. The world as we know it depends on programming languages, because all the software
running on all the computers was written in some programming language. But, before a program
can be run, it first must be translated into a form in which it can be executed by a computer. The
software systems that do this translation are called compilers.” [6]
Fortunately, compilers allow programmers to write at a high level, while automated
processing takes care of creating the machine-specific instructions. My project designs and creates
a compiler that translates PASCAL source code into Microsoft Intermediate Language (MSIL).
When source code is compiled to managed code in the .NET environment, the compiler translates
the source into MSIL. MSIL includes instructions for loading, storing, initializing, and calling
methods on objects, as well as instructions for arithmetic and logical operations. There is
currently no PASCAL compiler that compiles to MSIL. The just-in-time (JIT) compiler then converts
the MSIL to CPU-specific code [1].
Compiling to MSIL has three advantages: 1) legacy PASCAL programs can now be run on modern
machines, 2) MSIL is platform independent, and 3) JIT compilers can be optimized for specific
machines and architectures. The JIT compiler can also perform aggressive optimizations
specifically for the machine on which the code is running.
“Before you can run Microsoft intermediate language (MSIL), it must be converted by a
.NET Framework just-in-time (JIT) compiler to native code, which is CPU-specific code that runs
on the same computer architecture as the JIT compiler. Because the common language runtime
supplies a JIT compiler for each supported CPU architecture, developers can write a set of MSIL
that can be JIT-compiled and run on computers with different architectures. However, your
managed code will run only on a specific operating system if it calls platform-specific native APIs,
or a platform-specific class library.
JIT compilation takes into account the fact that some code might never get called during
execution. Rather than using time and memory to convert all the MSIL in a portable executable
(PE) file to native code, it converts the MSIL as needed during execution and stores the resulting
native code so that it is accessible for subsequent calls. The loader creates and attaches a stub to
each of a type's methods when the type is loaded. On the initial call to the method, the stub passes
control to the JIT compiler, which converts the MSIL for that method into native code and
modifies the stub to direct execution to the location of the native code. Subsequent calls of the JIT-compiled method proceed directly to the native code that was previously generated, reducing the
time it takes to JIT-compile and run the code.
The runtime supplies another mode of compilation called install-time code generation. The
install-time code generation mode converts MSIL to native code just as the regular JIT compiler
does, but it converts larger units of code at a time, storing the resulting native code for use when
the assembly is subsequently loaded and run. When using install-time code generation, the entire
assembly that is being installed is converted into native code, taking into account what is known
about other assemblies that are already installed. The resulting file loads and starts more quickly
than it would have if it were being converted to native code by the standard JIT option.
As part of compiling MSIL to native code, code must pass a verification process unless an
administrator has established a security policy that allows code to bypass verification. Verification
examines MSIL and metadata to find out whether the code is type safe, which means that it only
accesses the memory locations it is authorized to access. Type safety helps isolate objects from
each other and therefore helps protect them from inadvertent or malicious corruption. It also
provides assurance that security restrictions on code can be reliably enforced.
The runtime relies on the fact that the following statements are true for code that is verifiably type
safe:
- A reference to a type is strictly compatible with the type being referenced.
- Only appropriately defined operations are invoked on an object.
- Identities are what they claim to be.
During the verification process, MSIL code is examined in an attempt to confirm that the code
can access memory locations and call methods only through properly defined types. For example,
code cannot allow an object's fields to be accessed in a manner that allows memory locations to be
overrun. Additionally, verification inspects code to determine whether the MSIL has been
correctly generated, because incorrect MSIL can lead to a violation of the type safety rules. The
verification process passes a well-defined set of type-safe code, and it passes only code that is type
safe. However, some type-safe code might not pass verification because of limitations of the
verification process, and some languages, by design, do not produce verifiably type-safe code. If
type-safe code is required by security policy and the code does not pass verification, an exception
is thrown when the code is run.” [12]
Program HelloWorld;
Begin
    Writeln('Hello World');
End.

Compilation: PASCAL source -> PASCAL Compiler -> MSIL
Execution: MSIL -> JIT Compiler -> Native Code

.method public static void Main() cil managed
{
    .entrypoint
    .maxstack 1
    IL_00: ldstr "Hello World"
    IL_05: call void [mscorlib]System.Console::WriteLine(string)
    IL_10: ret
} // end of method HelloWorld::Main
Figure 1: The compilation and execution process of PASCAL programs
- Compilation process: takes PASCAL source code and produces MSIL. The PASCAL compiler
includes lexical analysis, syntax analysis, and the creation of the symbol table. MSIL is created
when compiling to managed code. MSIL is a CPU-independent set of instructions that can be
efficiently converted to native code, as shown in Figure 2.
- Execution process: MSIL must be converted to CPU-specific code, usually by a just-in-time
(JIT) compiler. Native code is code that is compiled to run on a particular processor (such as an
Intel x86-class processor) and its instruction set.
[Diagram: Source code of PASCAL -> Lexical Analysis -> Parser & MSIL -> generated MSIL, with Symbol Table and Error Handler components]

.method public static void Main() cil managed
{
    .entrypoint
    .maxstack 1
    IL_00: ldstr "Hello World"
    IL_05: call void [mscorlib]System.Console::WriteLine(string)
    IL_10: ret
} // end of method HelloWorld::Main
Figure 2: Compilation process
1.1 Motivation:
“During compilation of MSIL, the source code is translated into MSIL code rather than
platform or processor-specific object code. MSIL is a CPU- and platform-independent instruction
set that can be executed in any environment supporting the Common Language Infrastructure, such
as the .NET runtime on Windows, or the cross-platform Mono runtime. In theory, this eliminates
the need to distribute different executable files for different platforms and CPU types. MSIL code
is verified for safety during runtime, providing better security and reliability than natively
compiled executable files. ” [13]
Since there is currently no PASCAL compiler that compiles to MSIL, I designed the MSIL
generation for a subset of the PASCAL language. Compiling to MSIL has three advantages: 1)
legacy PASCAL programs can now be run on modern machines, 2) MSIL is platform independent,
and 3) JIT compilers can be optimized for specific machines and architectures.
2 Background
2.1 Overview of Compilation Process
A compiler is a program that can read a program in one language, the source language,
and translate it into an equivalent program in another language, as shown in Figure 3. An
important role of the compiler is to report any errors in the source program that it detects during
the translation process [6].
Source Program -> Compiler -> Target Program
Figure 3: A Compiler
“Microsoft Intermediate Language (MSIL) is a language used as the output of a number of
compilers (C#, VB, .NET, and so forth). The ILDasm (Intermediate Language Disassembler) tool
that ships with the .NET Framework SDK (FrameworkSDK\Bin\ildasm.exe) allows the user to see
MSIL code in human-readable format. By using this utility, we can open any .NET executable file
(EXE or DLL) and see MSIL code.
The ILAsm (Intermediate Language Assembler) tool generates an executable file from the
MSIL language. We can find this program in the WINNT\Microsoft.NET\Framework\vn.nn.nn
directory. Any PASCAL programmer starting with .NET development is interested in what
happens in the low level of the .NET Framework. Learning MSIL gives a user the chance to
understand some things that are hidden from a programmer working with PASCAL or another
language. Knowing MSIL gives more power to a .NET programmer. We never need to write
programs in MSIL directly, but in some difficult cases it is very useful to open the MSIL code in
ILDasm and see how things are done” [14].
2.2 History
“Pascal is an influential imperative and procedural programming language, designed in
1968–1969 and published in 1970 by Niklaus Wirth as a small and efficient language intended to
encourage good programming practices using structured programming and data structuring. A
derivative known as Object Pascal designed for object-oriented programming was developed in
1985.
Pascal, named in honor of the French mathematician and philosopher Blaise Pascal, was
developed by Niklaus Wirth and based on the ALGOL programming language.
Prior to his work on Pascal, Wirth had developed Euler and ALGOL W and later went on to
develop the Pascal-like languages Modula-2 and Oberon.
Initially, Pascal was largely, but not exclusively, intended to teach students structured
programming. A generation of students used Pascal as an introductory language in undergraduate
courses. Variants of Pascal have also frequently been used for everything from research projects to
PC games and embedded systems. Newer Pascal compilers exist which are widely used” [15].
Grace Murray Hopper coined the term compiler in the early 1950s. Translation was viewed
as the “compilation” of a sequence of machine language subprograms selected from a library. One
of the first real compilers was the FORTRAN compiler of the late 1950s. It allowed a programmer
to use a problem-oriented source language. Ambitious “optimizations” were used to produce
efficient machine code, which was vital for early computers with quite limited capabilities.
Efficient use of machine resources is still an essential requirement for modern compilers [16].
3 Design
3.1 Introduction to Symbol Table and Lexical Analysis
A symbol table is a data structure containing a record for each identifier, with fields for the
attributes of the identifier (information about storage allocation, type, etc.). When the lexical
analyzer detects an identifier in the source, the identifier is entered into the symbol table. Its
attributes, however, are entered in the following phases and are also used by later phases.
Lexical analysis, also called scanning, is the first phase of a compiler. The lexical analyzer
reads the stream of characters making up the source program and groups the characters into
meaningful sequences called lexemes.
3.1.1 Symbol Table Design
Every keyword is a token and has a unique integer code, as shown in Table 1:
Keyword     Token Code
Begin       300
If          323
For         302
Switch      305
While       376
Table 1: Part of Symbol Table
The identifier token has code 256, the number token has code 257, and every special
character is a token whose integer code equals its ASCII value. Two-character tokens have
unique token codes, as shown in Table 2:
Two-Character Token     Token Code
!=                      406
==                      407
<=                      408
>=                      409
Table 2: Two Characters Tokens
A token is an instance of the class shown in Figure 4:
Figure 4: Class Token in Lexical Analysis
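The token-code scheme described above can be sketched in a few lines. This is only an illustrative Python sketch, not the project's Token class from Figure 4: the names `Token`, `KEYWORD_CODES`, `TWO_CHAR_CODES`, and `token_code` are hypothetical, the dictionaries contain only the entries listed in Table 1 and Table 2, and case-insensitive keyword matching is an assumption.

```python
from dataclasses import dataclass

# Partial tables: only the codes listed in Table 1 and Table 2.
KEYWORD_CODES = {"begin": 300, "if": 323, "for": 302, "switch": 305, "while": 376}
TWO_CHAR_CODES = {"!=": 406, "==": 407, "<=": 408, ">=": 409}
IDENTIFIER_CODE = 256
NUMBER_CODE = 257


@dataclass
class Token:
    """Hypothetical token record: an integer code plus the matched text."""
    code: int
    lexeme: str


def token_code(lexeme: str) -> int:
    """Return the integer token code for a lexeme."""
    lowered = lexeme.lower()          # assumption: keywords are case-insensitive
    if lowered in KEYWORD_CODES:
        return KEYWORD_CODES[lowered]
    if lexeme in TWO_CHAR_CODES:
        return TWO_CHAR_CODES[lexeme]
    if lexeme.isdigit():
        return NUMBER_CODE
    if len(lexeme) == 1 and not lexeme.isalnum():
        return ord(lexeme)            # special character: its ASCII value
    return IDENTIFIER_CODE


def make_token(lexeme: str) -> Token:
    return Token(token_code(lexeme), lexeme)
```

For instance, `make_token(";")` yields a token whose code is the ASCII value of the semicolon, while any name that is not a keyword receives the identifier code 256.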
3.1.2 Lexical Analysis Design
After reading the next character from the input stream, the lexical analyzer works through the
following states:
State 0: Identify the current token and decide the next state.
State 1: Handle identifiers and keywords.
State 2: Handle numbers.
State 3: Handle one-character or two-character tokens.
States 4, 5: Handle comments “\\” or “\*”: skip the line that starts with “\\”, or skip the data
between “\*” and “*\”.
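The six states above can be sketched as a small state machine. The following Python sketch is an illustration under stated assumptions, not the DCSPM implementation: the function name `tokenize` and the token kinds `"word"`, `"number"`, and `"symbol"` are hypothetical, the comment delimiters are taken literally from the text above, and `:=` is added to the two-character tokens as an assumption.

```python
LINE_COMMENT = "\\\\"   # the two characters \\ as written in the text above
BLOCK_OPEN = "\\*"      # the two characters \*
BLOCK_CLOSE = "*\\"     # the two characters *\
# Table 2 tokens plus ':=' (the latter is an assumption for this sketch).
TWO_CHAR_TOKENS = {"!=", "==", "<=", ">=", ":="}


def tokenize(source):
    """Split source text into (kind, lexeme) pairs using states 0 to 5."""
    tokens = []
    state, lexbuf, i = 0, "", 0
    while i < len(source):
        ch = source[i]
        if state == 0:                      # identify the token, pick next state
            if ch.isspace():
                i += 1
            elif source.startswith(LINE_COMMENT, i):
                state, i = 4, i + 2
            elif source.startswith(BLOCK_OPEN, i):
                state, i = 5, i + 2
            elif ch.isalpha():
                state = 1
            elif ch.isdigit():
                state = 2
            else:
                state = 3
        elif state == 1:                    # identifiers and keywords
            if ch.isalnum():
                lexbuf += ch
                i += 1
            else:
                tokens.append(("word", lexbuf))
                state, lexbuf = 0, ""
        elif state == 2:                    # numbers
            if ch.isdigit():
                lexbuf += ch
                i += 1
            else:
                tokens.append(("number", lexbuf))
                state, lexbuf = 0, ""
        elif state == 3:                    # one- or two-character tokens
            pair = source[i:i + 2]
            lexeme = pair if pair in TWO_CHAR_TOKENS else ch
            tokens.append(("symbol", lexeme))
            i += len(lexeme)
            state = 0
        elif state == 4:                    # line comment: skip to end of line
            if ch == "\n":
                state = 0
            i += 1
        else:                               # state 5: skip to the block closer
            if source.startswith(BLOCK_CLOSE, i):
                state, i = 0, i + 2
            else:
                i += 1
    if lexbuf:                              # flush a token that ends the input
        tokens.append(("word" if state == 1 else "number", lexbuf))
    return tokens
```

A keyword lookup such as the one in the symbol table would then be applied to each `"word"` lexeme to distinguish keywords from identifiers.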
[State diagram; initial actions on Begin: 1- lexbuf = “”, 2- state = 0]
Figure 5: State diagram for the lexical Analyzer (states 0,1,2)
[State diagram; initial actions on Begin: 1- lexbuf = “”, 2- state = 0]
Figure 6: State diagram for the lexical Analyzer (states 3, 4, 5)
3.2 Parser and MSIL (Microsoft Intermediate Language) of PASCAL Language Design
3.2.1 Introduction to Parser (Syntax Analysis)
The parser transforms the stream of tokens into a hierarchical structure represented by a
syntax tree. For the example token stream “position := initial + rate * 60”, the syntax tree and a
typical data structure for it are shown below:
          :=
         /  \
 position    +
            / \
     initial   *
              / \
          rate   60
Figure 7: Syntax Tree
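:= node: left -> id1 (symbol table entry 1), right -> + node
+ node:  left -> id2 (symbol table entry 2), right -> * node
* node:  left -> id3 (symbol table entry 3), right -> num 60
Figure 8: Typical Data Structure for the given Syntax Tree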
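A data structure of the kind shown in Figure 8 can be sketched as linked node records. The Python sketch below is only an illustration; the class `Node`, the `symbol_table` dictionary, and the helper `to_text` are hypothetical names, and the entry numbers 1 to 3 for position, initial, and rate follow the figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str                       # an operator, "id", or "num"
    value: Optional[object] = None   # symbol-table index for ids, literal for nums
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# Symbol table entries 1..3 for the identifiers of the example.
symbol_table = {1: "position", 2: "initial", 3: "rate"}

# position := initial + rate * 60
tree = Node(":=",
            left=Node("id", 1),
            right=Node("+",
                       left=Node("id", 2),
                       right=Node("*",
                                  left=Node("id", 3),
                                  right=Node("num", 60))))


def to_text(node: Node) -> str:
    """Render the tree back into fully parenthesised infix text."""
    if node.label == "id":
        return symbol_table[node.value]
    if node.label == "num":
        return str(node.value)
    return f"({to_text(node.left)} {node.label} {to_text(node.right)})"
```

Calling `to_text(tree)` walks the structure and shows that the multiplication is nested below the addition, which reflects the operator precedence captured by the tree.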
“Grammar is used throughout the parser to organize compiler front ends. A grammar naturally
describes the hierarchical structure of most programming language constructs such as the Pascal
Language. For example, an if- else statement in Pascal language can have the form
If (expression) statement else statement.
That is, an if-else statement is the concatenation of the keyword if , an opening parenthesis, an
expression, a closing parenthesis, a statement, the keyword else, and another statement. Using the
variable expr to denote an expression and variable stmt to denote a statement, this structuring rule
can be expressed as:
stmt → if (expr) stmt else stmt
in which the arrow may be read as “ can have the form” Such a rule is called a production. In
production, lexical elements like the keyword if and the parentheses are called terminals. Variables
like expr and stmt represent sequence of terminals and called non-terminals” [6].
3.2.2 Parser (Syntax Analysis) Design
Parsing is the process of determining whether a string of tokens can be generated by a grammar.
To parse Pascal, it is sufficient to make a single left-to-right scan over the input, looking ahead
one token at a time. Top-down parsing constructs the nodes of a parse tree starting at the root and
proceeding towards the leaves, as in the simple example in Figure 9. To construct the parse tree,
start at the root and repeatedly do the following two steps:
1- At node n, labeled with a nonterminal such as OneDimArray, select a production for that
nonterminal and construct children at n for the symbols on the right side of the production.
2- Find the next node at which a subtree is to be constructed.
“Note: When starting with nonterminal OneDimArray at the root, we should use a production for
OneDimArray that starts with lookahead symbol array. The lookahead symbol always contains the
next token to be parsed in the input stream.” [6]
<OneDimArray>
    array
    [
    num
    dot dot
    num
    ]
    of
    <Standard Type>
        integer
Figure 9: Steps in the top-down construction of Parse Tree
“When the node being considered in the parse tree is for a terminal, and the terminal
matches the look ahead symbol, then we advance in both the parse tree and the input. The next
token in the input becomes the new look ahead symbol, and the next child in the parse tree is
considered. When a node labeled with a nonterminal is considered, we repeat the process of
selecting a production for the nonterminal. In general, the selection of a production for a
nonterminal may involve trial-and-error. However, a method called “predictive parsing” is simple
and free from trial-and-error.” [6]
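Predictive parsing with a single lookahead token can be illustrated on the one-dimensional array declaration. The Python sketch below is a minimal example under stated assumptions, not the parser used in DCSPM: the class `Parser`, its methods, and `ParseError` are hypothetical names, tokens are plain strings, and the two dots are assumed to arrive as a single `..` token.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive-descent parser: one method per nonterminal, one lookahead token."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]    # "$" marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # A terminal must equal the lookahead; then both tree and input advance.
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def number(self):
        tok = self.lookahead
        if not tok.isdigit():
            raise ParseError(f"expected a number, found {tok!r}")
        self.pos += 1
        return int(tok)

    def one_dim_array(self):
        # <OneDimArray> := array [ num .. num ] of <standard_type>
        self.match("array")
        self.match("[")
        low = self.number()
        self.match("..")
        high = self.number()
        self.match("]")
        self.match("of")
        return (low, high, self.standard_type())

    def standard_type(self):
        # <standard_type> := integer | real, chosen by the lookahead alone
        tok = self.lookahead
        if tok in ("integer", "real"):
            self.pos += 1
            return tok
        raise ParseError(f"expected a standard type, found {tok!r}")
```

Because every production is selected by inspecting only the lookahead token, the parser never backtracks; `Parser("array [ 1 .. 4 ] of integer".split()).one_dim_array()` returns the bounds and the element type.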
The grammar for the statements and the one-dimensional array covered by my project is as follows:
1- Assignment statement. An arithmetic expression is an expression using addition (+),
subtraction (-), multiplication (*), and division (div). A single-mode arithmetic expression is
an expression all of whose operands are of the same type (i.e. INTEGER, REAL or
COMPLEX). However, only INTEGER and REAL are covered in this project. Therefore, the
values or variables in a single-mode arithmetic expression are all integers or all real
numbers, as in a := b + c div d - e. An assignment statement gives a value to a variable, as
in x := 5;, and is compiled to the intermediate language.
<assignment statement> ::= <variable> := <expression>
2- The PASCAL compiler is structured in such a way that a write or writeln statement
containing more than one argument is compiled into several write statements with only one
argument each. For writeln, these statements are followed by a statement that writes the
end-of-line. For example, consider the writeln statement in the following program:
“Program Write; Begin writeln('This writeln is compiled into MSIL'); End.”
3- “if” Statement grammar:
<if statement> ::= if <expression> then <statement>
4- “if/Else” Statement grammar:
<if statement> ::= if <expression> then <statement> |
if <expression> then <statement> else <statement>
5- “While” statement grammar:
<while statement> ::= while <expression> do <statement>
6- “For” statement grammar:
<for statement> ::= for <variable identifier> := <expression> to
<expression> do <statement>
7- “Case” Statement grammar:
<Case> := Case id Of <case_element> End ‘;’ | empty
<case_element> := ‘’’ <case_label_list> ‘:’ <statement>’;’ <statement>
<case_element> | empty
< case_Label_list> := < Constant> ‘{‘ <case_label_list> | ‘,’ <constant>
<case_label_list> |’{‘
<constant> := ‘’’ | ’+’ | ’-‘ | id | num
8- “Array” structure grammar:
<OneDimArray> := array [ num .. num ] of <standard_type>
<standard_type> := integer | real
9- If logic statement grammar:
<IFLogic> := <ANDLOGIC> Or < expression_list> <IFLogic> | empty
<ANDLOGIC> := < expression_list> And <expression_list> <ANDLOGIC> | empty
<expression_list> := < expression> | ‘,’ < expression >
<expression> :=…….
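To make the link between the assignment grammar in item 1 and stack-based intermediate code concrete, the sketch below translates an assignment such as a := b + c div d - e into a list of MSIL-style instructions. It is a simplified Python illustration, not the code generator of DCSPM: the function name `compile_assignment` is hypothetical, tokens are assumed to be pre-split strings, variables are assumed to be named locals, and only integer operands are handled. The mnemonics `ldloc`, `ldc.i4`, `add`, `sub`, `mul`, `div`, and `stloc` are standard MSIL instructions.

```python
def compile_assignment(tokens):
    """Translate ['a', ':=', 'b', '+', ...] into a list of stack instructions."""
    target, assign, *expr = tokens
    if assign != ":=":
        raise ValueError("expected ':=' after the variable")
    code = []
    pos = 0

    def factor():
        nonlocal pos
        if pos >= len(expr):
            raise ValueError("unexpected end of expression")
        tok = expr[pos]
        pos += 1
        # Integer literal -> ldc.i4, anything else is treated as a local variable.
        code.append(f"ldc.i4 {tok}" if tok.isdigit() else f"ldloc {tok}")

    def term():
        nonlocal pos
        factor()
        while pos < len(expr) and expr[pos] in ("*", "div"):
            op = expr[pos]
            pos += 1
            factor()
            code.append("mul" if op == "*" else "div")

    def expression():
        nonlocal pos
        term()
        while pos < len(expr) and expr[pos] in ("+", "-"):
            op = expr[pos]
            pos += 1
            term()
            code.append("add" if op == "+" else "sub")

    expression()
    if pos != len(expr):
        raise ValueError(f"unexpected token {expr[pos]!r}")
    code.append(f"stloc {target}")    # pop the result into the target variable
    return code
```

The operands are emitted before their operator, so the instruction order already matches what a stack machine expects: for the example above, `c` and `d` are loaded and divided before the addition with `b` is performed.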
3.2.3 Introduction to MSIL (Microsoft Intermediate Language)
MSIL is the Microsoft Intermediate Language. All .NET-compatible languages are compiled to
MSIL. MSIL also allows the .NET Framework to JIT-compile the assembly on the computer where
it is installed. The main purpose of this intermediate code is to be platform independent: once
the MSIL is available, it can run on any platform, provided that an appropriate runtime
environment, such as the CLR in the case of .NET, is installed on that platform.
IL is what your Pascal code is compiled into, and it is what is sent to the JIT compiler when
.NET programs are run. MSIL is a low-level language, and working with it gives you exceptional
control over your programs.
“All operations in MSIL are executed on the stack. When a function is called, its
parameters and local variables are allocated on the stack. Function code starting from this stack
state may push some values onto the stack, make operations with these values, and pop values
from the stack.
Execution of both MSIL commands and functions is done in three steps:
1. Push command operands or function parameters onto the stack.
2. Execute the MSIL command or call function. The command or function pops their
operands (parameters) from the stack and pushes onto the stack result (return value).
3. Read result from the stack”[14].
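The three steps above can be imitated with a tiny evaluation-stack interpreter. The Python sketch below only illustrates the push, execute, and read-result cycle; it is not how the CLR executes MSIL, the function names `run` and `trunc_div` are hypothetical, and only a handful of integer instructions are supported.

```python
def trunc_div(left, right):
    """Integer division that truncates toward zero, as MSIL div does for int32."""
    quotient = abs(left) // abs(right)
    return quotient if (left < 0) == (right < 0) else -quotient


def run(instructions):
    """Execute a list of instruction strings and return the value left on the stack."""
    stack = []
    for instr in instructions:
        op, *args = instr.split()
        if op == "ldc.i4":                  # step 1: push an operand
            stack.append(int(args[0]))
        elif op in ("add", "sub", "mul", "div"):
            right = stack.pop()             # step 2: the command pops its operands
            left = stack.pop()
            if op == "add":
                stack.append(left + right)
            elif op == "sub":
                stack.append(left - right)
            elif op == "mul":
                stack.append(left * right)
            else:
                stack.append(trunc_div(left, right))
        else:
            raise ValueError(f"unsupported instruction {op!r}")
    return stack.pop()                      # step 3: read the result
```

For example, the sequence `ldc.i4 2`, `ldc.i4 3`, `ldc.i4 4`, `mul`, `add` pushes three operands, multiplies the top two, and then adds, leaving the value of 2 + 3 * 4 on the stack.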
The Pascal source code from our previous example is:
Program HelloWorld;
Begin
    Writeln('Hello World');
End.
The output MSIL:
// Metadata version: v4.0.30319
.assembly extern mscorlib
{
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )    // .z\V.4..
    .ver 2:0:0:0
}
.assembly HelloWorld
{
    .hash algorithm 0x00008004
    .ver 0:0:0:0
}
.module expression.dll
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003       // WINDOWS_CUI
.corflags 0x00000001    // ILONLY
// Image base: 0x00820000
// =============== CLASS MEMBERS DECLARATION ===================
.class public auto ansi HelloWorld
    extends [mscorlib]System.Object
{
    .method public static void Main() cil managed
    {
        .entrypoint
        .maxstack 1
        IL_00: ldstr "Hello World"
        IL_05: call void [mscorlib]System.Console::WriteLine(string)
        IL_10: ret
    } // end of method HelloWorld::Main
    .method public specialname rtspecialname
        instance void .ctor() cil managed
    {
        .maxstack 2
        IL_00: ldarg.0
        IL_01: call instance void [mscorlib]System.Object::.ctor()
        IL_06: ret
    } // end of method HelloWorld::.ctor
} // end of class HelloWorld
What is inside the Class Members Declaration:
.method : A method definition begins with the .method directive and can be defined at global
scope or within a class. The application entry point must be static, meaning that no instance is
required to call the method; this is indicated by the static keyword. Declaring a global method
static seems redundant, but the ILASM compiler complains in some cases if you omit the static
keyword. ‘void Main()’ is the signature of the method; as you would expect, it indicates that the
method does not return a value and takes no arguments.
.entrypoint : The .entrypoint directive signals to the runtime that this method is the entry point for
the application. Only one method in the application can have this directive.
.maxstack : The .maxstack directive indicates the maximum number of stack slots the method
expects to use. For example, adding two numbers together involves pushing both numbers onto the
stack and then calling the add instruction, which pops both numbers off the stack and pushes the
result onto the stack. In that example you need two stack slots.
Ldstr : The ldstr instruction pushes the string that is passed to the WriteLine method onto the
stack.
Call : The call instruction invokes the static WriteLine method of the System.Console class from
the mscorlib assembly. The instruction carries the full signature of the WriteLine method
(including the string argument) so that the runtime can determine which overload of the WriteLine
method to call.
Ret : The ret instruction returns execution to the caller. In the case of the entry point method, this
brings your application to an end.
Also, some methods declare local variables with the .locals directive, such as:
.locals ( int32 a, int32 b,…..)
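The .maxstack value can be derived from the instructions themselves. The sketch below is one possible way to do this for straight-line code; it is not taken from DCSPM. The class name MaxStackSketch and the small effect table are assumptions, branches are not handled, and call instructions are left out because their stack effect depends on the signature of the called method.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of DCSPM.
public static class MaxStackSketch
{
    // For each opcode: { slots popped, slots pushed }.
    static readonly Dictionary<string, int[]> Effect = new Dictionary<string, int[]>
    {
        { "ldstr",  new[] { 0, 1 } },
        { "ldc.i4", new[] { 0, 1 } },
        { "ldloc",  new[] { 0, 1 } },
        { "stloc",  new[] { 1, 0 } },
        { "add",    new[] { 2, 1 } },
        { "ret",    new[] { 0, 0 } },   // ret of a void method
    };

    // Returns the deepest evaluation stack reached by a straight-line sequence.
    public static int Compute(IEnumerable<string> opcodes)
    {
        int depth = 0, max = 0;
        foreach (string op in opcodes)
        {
            int[] e = Effect[op];
            depth -= e[0];
            if (depth < 0) throw new InvalidOperationException("Stack underflow at " + op);
            depth += e[1];
            max = Math.Max(max, depth);
        }
        return max;
    }
}
```

For the addition example in the text, the sequence ldc.i4, ldc.i4, add, stloc, ret reaches a depth of two slots, which matches the two stack slots mentioned above.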
3.2.4 Intermediate language Instructions
“When a method is executed, three categories of memory local to the method plus one
category of external memory are involved. All these categories represent typed data slots, not
simply an address interval as is the case in the unmanaged world. The external memory
manipulated from the method is the community of the fields the method accesses (except the fields
of value types belonging to the local categories). The local memory categories include an argument
table, a local variable table, and an evaluation stack. Figure 10 describes data transitions between
these categories. As you can see, all IL instructions resulting in data transfer have the evaluation
stack as a source or a destination, or both.
Figure 10: Method memory categories
The argument and local variable tables have a static type which can be any of the types defined in
the .NET Framework and the application. The evaluation stack table holds different types at
different times during the course of the method execution. So, the same stack could be used for
different variables.
IL instructions consist of an operation code (opcode), which for some instructions is
followed by an instruction parameter. Opcodes are either 1 byte or 2 bytes long.
Some of the IL instructions that I used in my project are the following:
3.2.4.1 Unconditional branching
Instructions take nothing from the evaluation stack and put nothing on it.
• br <int32> (0x38). Branch <int32> bytes from the current position.
• br.s <int8> (0x2B). The short-parameter form of br.
By default, the IL assembler does not automatically choose between long-parameter and
short-parameter forms. Thus, if you specify a short-parameter instruction and put the target
label farther away than the short parameter permits, the calculated offset is truncated to 1
byte, and the IL assembler issues an error message.
3.2.4.2 Conditional Branching Instructions
• brfalse (brnull, brzero) <int32> (0x39). Branch if <value> is 0.*
• brfalse.s (brnull.s, brzero.s) <int8> (0x2C). The short-parameter form of brfalse. I used
brfalse.s in my project as an improvement to the MSIL of the If/Else statement; this is
discussed in Section 5.1.2.
• brtrue (brinst) <int32> (0x3A). Branch if <value> is nonzero.
• brtrue.s (brinst.s) <int8> (0x2D). The short-parameter form of brtrue.
* <value> is taken from the top of the stack.
3.2.4.3 Comparative Branching Instructions
Comparative branching instructions take two values (<value1>, <value2>) from the
evaluation stack and compare them according to the <condition> specified by the opcode.
Not all combinations of types of <value1> and <value2> are valid. These are the ones I
used in my project:
• bgt.s <int8> (0x30). The short-parameter form of bgt.
• blt.s <int8> (0x32). The short-parameter form of blt.
• beq.s <int8> (0x2E). The short-parameter form of beq.
• bne.un.s <int8> (0x33). The short-parameter form of bne.un.
• ble.s <int8> (0x31). The short-parameter form of ble.
• bge.un.s <int8> (0x34). The short-parameter form of bge.un.
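Because the IL assembler does not pick between the long and short forms on its own, a code generator has to decide which one to emit. The following minimal sketch shows the range check involved; it is not DCSPM code. BranchForm and Choose are hypothetical names, and the offset is assumed to have been computed already, relative to the end of the branch instruction.

```csharp
// Hypothetical helper, not part of DCSPM.
public static class BranchForm
{
    // Returns the short form (for example "br.s") when the offset fits in a
    // signed 8-bit parameter, and the long form (for example "br") otherwise.
    public static string Choose(string opcode, int offset)
    {
        bool fitsInInt8 = offset >= sbyte.MinValue && offset <= sbyte.MaxValue;
        return fitsInInt8 ? opcode + ".s" : opcode;
    }
}
```

An offset of 127 still fits the short form, while 128 or -129 needs the long form.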
3.2.4.4 Constant Loading
Constant loading instructions take at most one parameter (the constant to load) and load it on
the evaluation stack. The ILAsm syntax requires explicit specification of the constants (in other
words, you cannot use a variable or argument name), in decimal or hexadecimal form.
Some instructions have no parameters because the value to be loaded is specified by the opcode
itself.
Note that for integer and floating-point values, the slots of the evaluation stack are either 4 or 8
bytes wide, so the constants being loaded are converted to the suitable size.

• ldc.i4 <int32> (0x20). Load <int32> on the stack.
• ldc.i4.s <int8> (0x1F). Load <int8> on the stack.
• ldc.i4.m1 (ldc.i4.M1) (0x15). Load –1 on the stack.
• ldc.i4.0 (0x16). Load 0.
• ldc.i4.1 (0x17). Load 1.
• ldc.i4.2 (0x18). Load 2.
• ldc.i4.3 (0x19). Load 3.
• ldc.i4.4 (0x1A). Load 4.
• ldc.i4.5 (0x1B). Load 5.
• ldc.i4.6 (0x1C). Load 6.
• ldc.i4.7 (0x1D). Load 7.
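A code generator can use this list to pick the most compact instruction for a given integer constant. The sketch below illustrates the idea; ConstantLoad and For are hypothetical names and this is not DCSPM code. It also uses ldc.i4.8, which belongs to the same instruction family although it is not listed above.

```csharp
// Hypothetical helper, not part of DCSPM.
public static class ConstantLoad
{
    // Returns the shortest ldc.i4 variant that loads the given 32-bit constant.
    public static string For(int value)
    {
        if (value == -1) return "ldc.i4.m1";                 // value encoded in the opcode
        if (value >= 0 && value <= 8) return "ldc.i4." + value;
        if (value >= sbyte.MinValue && value <= sbyte.MaxValue)
            return "ldc.i4.s " + value;                      // 1-byte parameter
        return "ldc.i4 " + value;                            // 4-byte parameter
    }
}
```

With this selection, 7 becomes ldc.i4.7, 100 becomes ldc.i4.s 100, and 1000 becomes ldc.i4 1000.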
3.2.4.5 Logical Condition Check Instructions
Logical condition check operations are similar to comparative branching instructions except
that they result not in branching but in putting the condition check result on the stack. The result
type is int32, and its value is equal to 1 if the condition checks and 0 otherwise; in other words,
logically the result is a Boolean value. The two operands being compared are taken from the stack,
and since no branching is performed, the condition check instructions have no parameters.
The logical condition check instructions are useful when you want to store the result of the
condition check for multiple use or for later use. If you need the condition check to decide only
once and on the spot whether you need to branch, you would be better off using a comparative
branching instruction.”[10]

• ceq (0xFE 0x01). Check whether the two values on the stack are equal.
• cgt (0xFE 0x02). Check whether the first value is greater than the second value. It’s the
stack we are working with, so the “second” value is the one on the top of the stack.
• clt (0xFE 0x04). Check whether the first value is less than the second value.
3.2.4.6 Local Variable Loading
Local variable loading instructions are similar to argument loading instructions except that no
“invisible” items appear among the local variables, so local variable number 0 is always the first
one specified in the local variable signature.

• ldloc <unsigned int16> (0xFE 0x0C). Load the value of local variable
number <unsigned int16> on the stack. Like the argument numbers, local variable numbers
can range from 0 to 65534 (0xFFFE). The value 65535, also admissible for unsigned 2-byte
integers, is excluded because otherwise the counter of local variables would have to be 4
bytes wide. Limiting the number of the local variables, however standardized, seems
arbitrary and implementation specific, because the number of local variables of a method
is not stored in the metadata or in the method header, so this limitation comes purely from
one particular implementation of the JIT compiler.
• ldloc.s <unsigned int8> (0x11). The short-parameter form of ldloc.
• ldloc.0 (0x06). Load the value of local variable number 0 on the stack.
• ldloc.1 (0x07). Load the value of local variable number 1 on the stack.
• ldloc.2 (0x08). Load the value of local variable number 2 on the stack.
• ldloc.3 (0x09). Load the value of local variable number 3 on the stack.”[10]
3.2.5 MSIL (Microsoft Intermediate Language) Design
After the source code has been tokenized, the parsing phase begins. At the end of this stage, if
the source code is syntactically valid, the compiler has generated (1) an abstract syntax tree
(AST) of the source code and (2) Microsoft Intermediate Language (MSIL). The parser phase
starts with the Program() function, which matches the “program” keyword, an identifier, and “;”,
and then calls the declaration() function and the compoundStatement() function. Matching is done
by the match() function, which checks every element of the source code for syntax errors and
verifies that the current token is the expected one. The parser then calls the nextToken() function
to read the next token. To generate MSIL, the parser calls the newlabel() function, which sets up a
new label, and then calls the emit() function to combine the new label with the opcode. The same
procedure applies to the rest of the functions: the parser matches the valid tokens, and the MSIL
generator sets up the new labels and combines them with the opcodes.
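The interplay of newlabel() and emit() can be sketched as follows. This is an illustration under stated assumptions and not the DCSPM source: the class name IlEmitter, the four-digit hexadecimal label format, and the use of a simple instruction counter (rather than real byte offsets) are all choices made for the example.

```csharp
using System.Collections.Generic;

// Hypothetical emitter, not the DCSPM implementation.
public class IlEmitter
{
    private int count;                                   // number of labels handed out so far
    private readonly List<string> lines = new List<string>();

    // Sets up a new label such as IL_0000, IL_0001, ...
    public string NewLabel()
    {
        return "IL_" + (count++).ToString("X4");
    }

    // Combines a fresh label with the opcode and an optional operand.
    public void Emit(string opcode, string operand = "")
    {
        string label = NewLabel();
        lines.Add(operand.Length == 0
            ? label + ": " + opcode
            : label + ": " + opcode + " " + operand);
    }

    public IReadOnlyList<string> Lines { get { return lines; } }
}
```

Calling Emit("ldstr", "\"Hello World\"") followed by Emit("ret") on a new IlEmitter yields the lines IL_0000: ldstr "Hello World" and IL_0001: ret.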
3.2.5.1 Constructors:
Constructors are class methods that are executed when an object of a given type is created.
Constructors have the same name as the class, and usually initialize the data members of the new
object.
1. Ldloc table: stores each variable together with its load-local (ldloc) location.
2. Stloc table: stores each variable together with its store-local (stloc) location.
3.2.5.2 Functions:
<program> ::= Program <identifier> ; <block> .
1. Program function
[Parse tree diagram: <Program> → program ID ; <declaration> <CompoundStatements>]
<program> ::= Program <identifier> ; <declaration> <compoundStatement> .
2. Declaration function
[Parse tree diagram: <declaration> → Var <Identifier list> : <type>]
<declaration> ::= <empty> | var <Identifier list> : <type> .
3. Identifier List function
[Parse tree diagram: <Identifier list> → ID <Identifier list> | , <Identifier list>. When ID is
matched, the following actions are executed:
    Ldloc.Add( lookahead.attr, ldloc.jj);
    Stloc.Add( lookahead.attr, stloc.jj); jj++;]
<Identifier list> ::= ID <Identifier list> | , <Identifier list>
When the IdentifierList() function matches an identifier, it retrieves the attribute of the
identifier from the symbol table and records it, together with its ldloc and stloc locations, in
two collections. These are used later, during MSIL generation, to retrieve the load and store
instructions of each identifier. For example:
    program Sum();
    var a, b, c : integer;
    begin
      c := a + b;
    end; end.
MSIL:
    IL_00: ldloc.0
    IL_01: ldloc.1
    IL_02: add
    IL_03: stloc.2
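The slot bookkeeping behind this example can be sketched as below. The sketch is not the DCSPM code: it uses a single dictionary instead of the two collections described above, and LocalSlots, Declare, Load, Store, and AssignSum are hypothetical names. It also assumes that slots above 3 use the ldloc.s and stloc.s forms.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of DCSPM.
public class LocalSlots
{
    private readonly Dictionary<string, int> slots = new Dictionary<string, int>();

    // Assigns the next free local slot to a newly declared identifier.
    public void Declare(string name) { slots.Add(name, slots.Count); }

    // Slots 0..3 have dedicated opcodes; higher slots use the short-parameter form.
    private static string Form(string op, int slot)
    {
        return slot <= 3 ? op + "." + slot : op + ".s " + slot;
    }

    public string Load(string name)  { return Form("ldloc", slots[name]); }
    public string Store(string name) { return Form("stloc", slots[name]); }

    // Instructions for: target := left + right
    public List<string> AssignSum(string target, string left, string right)
    {
        return new List<string> { Load(left), Load(right), "add", Store(target) };
    }
}
```

After declaring a, b, and c in that order, AssignSum("c", "a", "b") produces ldloc.0, ldloc.1, add, stloc.2, the same sequence as the MSIL listing.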
4. Type function
[Parse tree diagram: <type> → <Standard type> (Integer | real) ; <type> | ID <Identifier list> : <type> | <OneDimArray> <type>]
5. Standard Type function
[Parse tree diagram: <Standard type> → Integer | real, with the action
Inqueue(“.locals init ( [1] int32, [2] int32,…)”)]
6. Compound Statements function
[Parse tree diagram: <Program> → <Compound_Statements> → Begin <Statement List> End ; with the
action emit(IL_####, " ", "ret", "", "\n")]
After the Program function calls the Compound Statements function, the latter matches the
“Begin” keyword and then calls the Statements List function to handle the other statements, such
as the Writeline statement, the if statement, the if/else statement, etc. The MSIL of these
statements is discussed below. Next, it matches the End that closes the Begin of the program and
emits “IL_####:” followed by the “ret” instruction, which returns from the method, possibly with a
value.
7. Statements List function
[Parse tree diagram: <Compound List> → <Statements List> → <Statement> ; <Statements List>]
After the Compound Statements function calls the Statements List function, the Statements List
function calls the Statement function, which parses a statement and compiles it to MSIL. It then
matches the semicolon that ends the statement and calls Statements List again for the next
statement. The function is therefore recursive, because it calls itself.
8. Statement function
[Parse tree diagram: <Statements List> → <Statement> → <expression> | Begin | If | While | For | Case | Writeline]
expression
[Parse tree diagram: <expression> → <simple expression> <Newlabel>, with the actions:]
    “<”  : emit(“IL_##”,”clt”)
    “>”  : emit(“IL_##”,”cgt”)
    “<=” : emit(“IL_##”,”cgt”)
    “>=” : emit(“IL_##”,”clt”)
    “==” : emit(“IL_##”,”ceq”)
    “<>” : emit(“IL_##”,”ceq”)
simple expression
[Parse tree diagram: <simple expression> → <term> <Newlabel> <simple expression>, with the actions:]
    “+” : emit(“IL_##”,”add”)
    “-” : emit(“IL_##”,”sub”)
term
[Parse tree diagram: <term> → <factor> <Newlabel> <term>, with the actions:]
    “*” : emit(“IL_##”,”mul”)
    “/” : emit(“IL_##”,”div”)
factor
[Parse tree diagram: <factor> → ID | NUM | ( <IFlogic> ), with the actions:]
    ID  : emit(“IL_##”,”ldloc.##”)
    NUM : emit(“IL_##”,”ldc.i4.Num”)
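The expression, term, and factor functions follow the usual recursive-descent pattern: operands are emitted first and the operator last, which matches the stack-based evaluation of MSIL. The sketch below shows this pattern in a simplified form and is not the DCSPM parser. ExprCompiler is a hypothetical class, tokens are assumed to be separated by spaces, loops replace the right recursion of the diagrams so that operators associate to the left, and a parenthesized factor contains a simple expression rather than <IFlogic>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the DCSPM parser.
public class ExprCompiler
{
    private readonly string[] tokens;
    private readonly Dictionary<string, int> slots;   // identifier -> local slot
    private int pos;
    public readonly List<string> Code = new List<string>();

    public ExprCompiler(string source, Dictionary<string, int> slots)
    {
        tokens = source.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        this.slots = slots;
    }

    private string Peek { get { return pos < tokens.Length ? tokens[pos] : ""; } }

    // <simple expression> ::= <term> { (+|-) <term> }
    public void SimpleExpression()
    {
        Term();
        while (Peek == "+" || Peek == "-")
        {
            string op = tokens[pos++];
            Term();
            Code.Add(op == "+" ? "add" : "sub");
        }
    }

    // <term> ::= <factor> { (*|/) <factor> }
    private void Term()
    {
        Factor();
        while (Peek == "*" || Peek == "/")
        {
            string op = tokens[pos++];
            Factor();
            Code.Add(op == "*" ? "mul" : "div");
        }
    }

    // <factor> ::= ID | NUM | ( <simple expression> )
    private void Factor()
    {
        string t = Peek;
        if (t == "(")
        {
            pos++;
            SimpleExpression();
            if (Peek != ")") throw new FormatException("Expected ')'");
            pos++;
        }
        else if (t.Length > 0 && char.IsDigit(t[0]))
        {
            pos++;
            Code.Add("ldc.i4 " + t);
        }
        else if (slots.ContainsKey(t))
        {
            pos++;
            Code.Add("ldloc " + slots[t]);
        }
        else throw new FormatException("Unexpected token '" + t + "'");
    }
}
```

With a in slot 0 and b in slot 1, compiling "a + b * 2" gives ldloc 0, ldloc 1, ldc.i4 2, mul, add, so the multiplication is emitted before the addition.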
IFlogic
[Parse tree diagram: <IFlogic> → <ANDlogic> OR <IFlogic>]
OR : <NewLabel>
    if “bge.s”       : emit(IL_##,”blt.s IL_##”)
    else “ble.s”     : emit(IL_##,”bgt.s IL_##”)
    else “blt.s”     : emit(IL_##,”bge.s IL_##”)
    else “bgt.s”     : emit(IL_##,”ble.s IL_##”)
    else “bne.un.s”  : emit(IL_##,”bne.un.s IL_##”)
    else “beq.s”     : emit(IL_##,”beq.s IL_##”)
    “brtrue.s”       : emit(IL_##,”brtrue.s IL_##”)
ANDlogic
[Parse tree diagram: <ANDlogic> → <expression list> AND <ANDlogic>]
AND : <NewLabel>
    if “bge.s”       : emit(IL_##,”bge.s IL_##”)
    else “ble.s”     : emit(IL_##,”ble.s IL_##”)
    else “blt.s”     : emit(IL_##,”blt.s IL_##”)
    else “bgt.s”     : emit(IL_##,”bgt.s IL_##”)
    else “bne.un.s”  : emit(IL_##,”beq.s IL_##”)
    else “beq.s”     : emit(IL_##,”bne.un.s IL_##”)
    “brtrue.s”       : emit(IL_##,”brtrue.s IL_##”)
expression list
    ID | NUM : <expression>
    ‘,’      : <expression>
If statement
[Parse tree diagram: <Statement> → If <expression> Then <statement> ; else <statement>]
After <expression>:
    If LOGIC==false   : <NewLabel> emit(IL_##,”ldc.i4.0”);
                        emit(IL_##,”ceq”);
    If LOGIC==true    : <NewLabel> emit(IL_##,”br.s IL_”count+3”);
    AND               : <NewLabel> emit(IL_##,”ldc.i4.0”);
    OR                : <NewLabel> emit(IL_##,”ldc.i4.1”);
    If IFLOGIC==true  : <NewLabel> emit(IL_##,”br.s IL_count+3”);
                        <NewLabel> emit(IL_##,”ceq”);
                        <NewLabel> emit(IL_##,”ldc.i4.0”);
    <NewLabel> emit(IL_##,”stloc.ii.ToString”);
    <NewLabel> emit(IL_##,”ldloc.ii.ToString”);
    ii++;
    <NewLabel> emit(IL_##,”brtrue.s IL_”BIF” ”);
Then <statement> ;
else : BrIF = count; <NewLabel> emit(IL_##,”br.s IL_”BrIF” ”);
       BIF = count;
       <statement>
While statement
[Parse tree diagram: <Statement> → While ( <expression> ) Do Begin <statement list> End ;]
While ( : <Newlabel> emit(“IL_##”, “br.s IL_”ForwardLabel””);
          Brtrue = count;
          f = lookahead.code; d = lookahead.attr;
)
Do Begin <statement list>
    ForwardLabel = count;
    <Newlabel> emit(“IL_##”,”ldloc.”+d.ToString())
    If f==NUM; <Newlabel> emit(“IL_##”,”ldc.i4.”+f.ToString())
    If f==ID;  <Newlabel> emit(“IL_##”,”ldc.i4.”+f.ToString())
    <Newlabel>:
        “<”  : emit(“IL_##”,”clt”);
        “>”  : emit(“IL_##”,”cgt”);
        “<=” : {emit(“IL_##”,”cgt”);
                emit(“IL_##”,”ldc.i4.0”);
                emit(“IL_##”,”ceq”);}
        “>=” : {emit(“IL_##”,”clt”);
                emit(“IL_##”,”ldc.i4.0”);
                emit(“IL_##”,”ceq”);}
        “==” : emit(“IL_##”,”ceq”);
        “<>” : emit(“IL_##”,”ceq”);
    <NewLabel> emit(IL_##,”stloc.ii.ToString”);
    <NewLabel> emit(IL_##,”ldloc.ii.ToString”);
    ii++;
    <NewLabel> emit(IL_##,”brtrue.s IL_”Brtrue” ”);
End ;
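The operator table of the while statement negates a comparison by comparing its result with 0. The sketch below shows such a mapping together with a tiny evaluator that checks it on integers. It is an illustration and not DCSPM code: RelOp, Emit, and Eval are hypothetical names, and the sketch assumes that “<>” is also negated (ceq, ldc.i4.0, ceq), whereas the diagram lists ceq alone for that operator.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of DCSPM.
public static class RelOp
{
    // Instructions that leave 1 on the stack when "left op right" holds, else 0.
    public static string[] Emit(string op)
    {
        switch (op)
        {
            case "<":  return new[] { "clt" };
            case ">":  return new[] { "cgt" };
            case "==": return new[] { "ceq" };
            case "<=": return new[] { "cgt", "ldc.i4.0", "ceq" };   // not (left > right)
            case ">=": return new[] { "clt", "ldc.i4.0", "ceq" };   // not (left < right)
            case "<>": return new[] { "ceq", "ldc.i4.0", "ceq" };   // not (left == right)
            default: throw new ArgumentException("Unknown operator " + op);
        }
    }

    // Runs the emitted instructions on two integers to check the mapping.
    public static int Eval(string op, int left, int right)
    {
        var stack = new Stack<int>();
        stack.Push(left);
        stack.Push(right);
        foreach (string instr in Emit(op))
        {
            if (instr == "ldc.i4.0") { stack.Push(0); continue; }
            int b = stack.Pop();
            int a = stack.Pop();
            stack.Push(instr == "clt" ? (a < b ? 1 : 0)
                     : instr == "cgt" ? (a > b ? 1 : 0)
                     : (a == b ? 1 : 0));
        }
        return stack.Pop();
    }
}
```

For instance, RelOp.Eval("<=", 3, 3) returns 1 and RelOp.Eval("<=", 4, 3) returns 0.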
3.2.5.3 One dimensional array and ILDASM Tool
The Intermediate Language Disassembler (ILDASM) was very useful in this project.
ILDASM lets you see the pseudo-assembly language for .NET, and it shows exactly what a
compiler has generated for a .NET program. While I will probably never write major programs
in Microsoft intermediate language (MSIL), knowing your way around the assembly language
certainly helps. I faced many problems in this project. One of the most difficult was dealing
with the one dimensional array. A one dimensional array has two cases when it is compiled to
MSIL. In the first case, when the array has one or two elements, its MSIL looks like the MSIL of
the other statements (if/else/while, etc.), as in Figure 11.
Figure 11: MSIL of One dimensional Array has one element
I designed this case as I did the other statements, by adding a function called OneDimArray() to
the parser phase and emitting its MSIL with the emit() function. The MSIL of a one dimensional
array with two elements looks like the code below:
Pascal code : var a: array[1..2] of integer = (1,2);
.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  // Code size 19 (0x13)
  .maxstack 3
  .locals init ([0] int32[] a,
                [1] int32[] CS$0$0000)
  IL_0000: nop
  IL_0001: ldc.i4.2
  IL_0002: newarr [mscorlib]System.Int32
  IL_0007: stloc.1
  IL_0008: ldloc.1
  IL_0009: ldc.i4.0
  IL_000a: ldc.i4.1
  IL_000b: stelem.i4
  IL_000c: ldloc.1
  IL_000d: ldc.i4.1
  IL_000e: ldc.i4.2
  IL_000f: stelem.i4
  IL_0010: ldloc.1
  IL_0011: stloc.0
  IL_0012: ret
} // end of method Program::Main
But when I compiled a one dimensional array with three or more elements and inspected it in
ILDASM, I got MSIL that differs from the one-element and two-element cases, as in Figure 12.
The Pascal code of a one dimensional array with four elements is shown below:
program ArrayOneDim (input,output);
var a: array[1..4] of integer = (1,2,3,4);
begin
writeln(a[2]);
end;
end.
Figure 12: MSIL of One dimensional Array has four elements
So, the output for a one dimensional array with three or more elements has a MANIFEST, a Test
namespace, a Test.Program class, a <PrivateImplementationDetails> namespace, and a
__StaticArrayInitTypeSize=16 value class. The code below shows how I designed the MSIL of a one
dimensional array with three or more elements:
// =============== CLASS MEMBERS DECLARATION =============
// ArrayOneDim is the name of the class; it changes depending on the name of the program.
.class public auto ansi ArrayOneDim
       extends [mscorlib]System.Object
{
  .method private hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 10
    .locals init([0] int32 a,[1] bool CS$4$0000,[2] int32 CS$4$0001)
    IL_0:  ldc.i4.4
    IL_1:  newarr [mscorlib]System.Int32
    // Presumably to preserve stack usage.
    IL_5:  dup
    // Here the size is 16 because we have 4 elements, so 4*4=16.
    IL_6:  ldtoken field valuetype
           '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=16'
           '<PrivateImplementationDetails>'::'$$method0x6000001-1'
    IL_B:  call void
           [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, valuetype [mscorlib]System.RuntimeFieldHandle)
    IL_10: stloc.0
    IL_11: ldloc.0
    IL_12: ldc.i4.2
    IL_13: ldelem.i4
    IL_14: call void [mscorlib]System.Console::WriteLine(int32)
    IL_19: ret
  }
  .method public specialname rtspecialname
          instance void .ctor() cil managed
  {
    .maxstack 8
    IL_0000: ldarg.0
    IL_0001: call instance void [mscorlib]System.Object::.ctor()
    IL_0006: ret
  } // end of method ArrayOneDim::.ctor
  // This part is usually the same whatever the one dimensional array looks like.
} // end of class
.class private auto ansi '<PrivateImplementationDetails>'
       extends [mscorlib]System.Object
{
  .custom instance void
          [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
  .field static assembly valuetype '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=16'
         '$$method0x6000001-1' at I_00002050
  .class explicit ansi sealed nested private '__StaticArrayInitTypeSize=16'
         extends [mscorlib]System.ValueType
  {
    .pack 1
    .size 16
  } // end of class '__StaticArrayInitTypeSize=16'
} // end of class '<PrivateImplementationDetails>'
// Elements of the one dimensional array in hexadecimal.
.data I_00002050 = bytearray (
    01 00 00 00
    02 00 00 00
    03 00 00 00
    04 00 00 00 )
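The .data block stores each 32-bit element in little-endian byte order, and the size in __StaticArrayInitTypeSize=16 is the element count multiplied by 4 bytes. A minimal C# sketch of that layout follows; StaticArrayData is a hypothetical helper written for this illustration and is not part of DCSPM.

```csharp
using System;

// Hypothetical helper, not part of DCSPM.
public static class StaticArrayData
{
    // Lays out 32-bit integers as little-endian bytes, 4 bytes per element.
    public static byte[] ToBytes(int[] values)
    {
        var bytes = new byte[values.Length * 4];
        for (int i = 0; i < values.Length; i++)
        {
            uint v = unchecked((uint)values[i]);
            bytes[4 * i]     = (byte)(v & 0xFF);           // least significant byte first
            bytes[4 * i + 1] = (byte)((v >> 8) & 0xFF);
            bytes[4 * i + 2] = (byte)((v >> 16) & 0xFF);
            bytes[4 * i + 3] = (byte)((v >> 24) & 0xFF);
        }
        return bytes;
    }

    // Formats the bytes the way they appear in a bytearray listing.
    public static string ToHex(int[] values)
    {
        return BitConverter.ToString(ToBytes(values)).Replace("-", " ");
    }
}
```

For the four-element array (1,2,3,4), ToBytes returns 16 bytes, and ToHex gives 01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00.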
3.2.6 Design Common Syntax Errors Table
There are several types of error, with consequences ranging from deficiencies in the
formatting of the output to the calculation of wrong results. A compilation error (which prevents
the compiler from compiling the source code) is usually a syntax error but could be an error in the
compiler itself. A syntax error results when the source code does not obey the rules of the
language. The compiler generates error messages to help the programmer to fix the code. The
source code may compile to machine code which then fails upon execution. A run-time
error causes this situation. Potentially the most serious type of error occurs when the program
appears to be working but is performing faulty processing due to logic errors in the source code. I
classify the common errors as syntax errors.
I defined a set of errors that the DCSPM compiler recognizes. Every single-character token, such
as ‘(’ and ‘)’, is identified by its ASCII code, and every keyword is identified by its token code
in the symbol table; for example, the ‘program’ keyword has code 323 and the ‘do’ keyword has
code 305. The match method in the DCSPM compiler looks like this:
static void match(int t)
{
    // The expected token was found: advance to the next token.
    if (lookahead.code == t) { lookahead = nextToken(); }
    else
        // Otherwise record which token was expected.
        switch (t)
        {
            case 40:
                Err += "\n" + "\n" + "Expected Missing '('";
                break;
            case 41:
                Err += "\n" + "\n" + "Expected Missing ')'";
                break;
            case 44:
                Err += "\n" + "\n" + "Expected Missing ','";
                break;
            case 46:
                Err += "\n" + "\n" + "Expected Missing '.'";
                break;
            case 58:
                Err += "\n" + "\n" + "Expected Missing ':'";
                break;
            case 59:
                Err += "\n" + "\n" + "Expected Missing ';'";
                break;
            case 60:
                Err += "\n" + "\n" + "Expected Missing 'operator'";
                break;
            case 407:
                Err += "\n" + "\n" + "Expected Missing ':='";
                break;
            case 91:
                Err += "\n" + "\n" + "Expected Missing '['";
                break;
            case 92:   // note: in ASCII, 92 is '\' and 93 is ']'
                Err += "\n" + "\n" + "Expected Missing ']'";
                break;
            case 93:
                Err += "\n" + "\n" + "Expected Missing ':='";
                break;
            case 256:
                Err += "\n" + "\n" + "Expected Missing 'identifier' ";
                break;
            case 357:
                Err += "\n" + "\n" + "Expected Missing 'Number' ";
                break;
            case 300:
                Err += "\n" + "\n" + "Expected Missing 'begin' ";
                break;
            case 305:
                Err += "\n" + "\n" + "Expected Missing 'do' ";
                break;
            case 307:
                Err += "\n" + "\n" + "Expected Missing 'then' ";
                break;
            case 308:
                Err += "\n" + "\n" + "Expected Missing 'end' ";
                break;
            case 319:
                Err += "\n" + "\n" + "Expected Missing 'of' ";
                break;
            case 323:
                Err += "\n" + "\n" + "Expected Missing 'program'";
                break;
            case 328:
                Err += "\n" + "\n" + "Expected Missing 'to' ";
                break;
            case 331:
                Err += "\n" + "\n" + "Expected Missing 'var' ";
                break;
            case 336:
                Err += "\n" + "\n" + "Expected Missing 'integer'";
                break;
            case 342:
                Err += "\n" + "\n" + "Expected Missing 'writeln' ";
                break;
            default:
                break;
        }
}
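A table-driven variant of the same idea keeps the token codes and their messages in one place. The sketch below is an alternative design and not the DCSPM implementation: SyntaxErrors and Describe are hypothetical names, and only a few of the token codes from the switch statement are included.

```csharp
using System.Collections.Generic;

// Hypothetical alternative to the switch statement, not part of DCSPM.
public static class SyntaxErrors
{
    // Token code -> text of the expected token (a subset of the cases in match()).
    static readonly Dictionary<int, string> Expected = new Dictionary<int, string>
    {
        { 40, "(" }, { 41, ")" }, { 59, ";" }, { 407, ":=" },
        { 300, "begin" }, { 305, "do" }, { 323, "program" },
    };

    // Returns the error text for a missing token, or "" for codes without a message.
    public static string Describe(int tokenCode)
    {
        string text;
        return Expected.TryGetValue(tokenCode, out text)
            ? "\n\nExpected Missing '" + text + "'"
            : "";
    }
}
```

Adding a new error then only requires a new dictionary entry; codes that are not in the table produce an empty string, which mirrors the default branch of the switch.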
4 Implementation
DCSPM is programmed in Microsoft Visual C# Express 2010. The documentation for Visual C# is
contained in the MSDN Library, which you can install locally on your own computer or network,
and which is also available on the internet at http://msdn.microsoft.com/library.
“C# (pronounced "C sharp") is a programming language that is designed for building a variety of
applications that run on the .NET Framework. C# is simple, powerful, type-safe, and
object-oriented. The many innovations in C# enable rapid application development while retaining
the expressiveness and elegance of C-style languages.
Visual C# is an implementation of the C# language by Microsoft. Visual Studio supports Visual
C# with a full-featured code editor, compiler, project templates, designers, code wizards, a
powerful and easy-to-use debugger, and other tools. The .NET Framework class library provides
access to many operating system services and other useful, well-designed classes that speed up the
development cycle significantly” [18].
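Visual C# Express omits some features of the full Visual Studio editions, such as most of the
refactoring support: “This effectively reduces the refactoring capabilities of Visual C# Express to Renaming and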
Extracting Methods. Developers state the reason of this removal as "to simplify the C# Express
user experience". However this created a controversy as some end users claim it is an important
feature, and instead of simplifying it cripples the user experience.
The ability to attach the debugger to an already-running process has also been removed, hindering
scenarios such as writing Windows services and re-attaching a debugger under ASP.NET when
errors under the original debugging session cause breakpoints to be ignored.
Additionally it has been observed that the express version requires that the time between builds be
greater than approximately 20 seconds. If a project is rapidly modified and rebuilt the target will
not be updated even though the source has been modified and saved.”[19]
The steps required to create a .NET application:
1. Application code is written using a .NET-compatible language such as C#.
2. That code is compiled into CIL, which is stored in an assembly, as shown in Figure 13.
Figure 13: Application Code using .NET
3. When this code is executed (either in its own right if it is an executable or when it is
used from other code), it must first be compiled into native code using a JIT compiler, as
shown in Figure 14.
Figure 14: JIT Compilation
4. The native code is executed in the context of the managed CLR, along with any other
running applications or processes, as shown in Figure 15.
Figure 15: .NET CLR
Microsoft Visual C# is a programming environment used to create computer
applications for the Microsoft Windows family of operating systems. It combines the C#
language and the .NET Framework.
To test the MSIL, I generally used the MSIL Disassembler (Ildasm.exe) tool that is
included with the .NET Framework SDK. Ildasm.exe parses any .NET Framework
.exe or .dll assembly and shows the information in human-readable format. Ildasm.exe
shows more than just the Microsoft intermediate language (MSIL) code; it also displays
namespaces and types, including their interfaces. You can use Ildasm.exe to examine
native .NET Framework assemblies, such as Mscorlib.dll, as well as .NET Framework
assemblies provided by others or created yourself. Most .NET Framework developers will
find Ildasm.exe indispensable. You can find this tool at FrameworkSDK\Bin\ildasm.exe on
your computer, as explained in the Background section (Section 2).
ILAsm has an instruction set comparable to that of a native assembly language. You can write
ILAsm code in any text editor, such as Notepad, and then compile it with the command-line
compiler (ILAsm.exe) provided by the .NET Framework. ILAsm.exe is a command-line tool
shipped with the .NET Framework and is located in the
<windowsfolder>\Microsoft.NET\Framework\<version> folder. You can add this path
to your PATH environment variable. When ILAsm.exe has finished compiling your .IL file, it
outputs an .exe file with the same name as the .IL file. You can specify the output file
name with the /output=<filename> switch, for example: ILAsm Test.il /output=MyFile.exe. To
run the output .exe file, just type its name and press Return; the output appears on the
screen. [11]
When the .il file is compiled it needs the Fusion.dll file. “Fusion.dll is an assembly
manager module used with the .net framework of Microsoft. The Common Language
Runtime (CLR) contains a system component called the assembly manager that takes on
the responsibilities of storing assembly files in the Global Assembly Cache (GAC) and
loading them at run time when they are first used by an application. The Global Assembly
Cache is the central repository for assemblies installed on a Windows machine. It provides
a uniform, versioned and safe access of assemblies by their strong assembly name. The
assembly manager is loaded from the system component fusion.dll.”[20]
5 Improvements and Evaluations
5.1 Improvements
5.1.1 Lexical Analysis Improvement
In the lexical analysis, the symbol table initially used the array list data structure, which
does not always offer the best performance for a given task. The symbol table is a data structure
in which each keyword and identifier in a program's source code is associated with information
relating to its declaration or appearance in the source, such as its type, scope level, and
sometimes its location.
public static ArrayList a = new ArrayList();
public static void insertKeyword()
{
    a.Add(new SymbolTable("Begin", 300));
    a.Add(new SymbolTable("And", 301));
    a.Add(new SymbolTable("Case", 302));
    a.Add(new SymbolTable("Const", 303));
    a.Add(new SymbolTable("Div", 304));
    .
    .
    .
}
Arrays provide random access to a sequential set of data. Dictionaries (or associative
arrays) provide a map from a set of keys to a set of values. Most of the time a dictionary-like
type is built as a hash table; this type is very useful because it provides very fast lookups on
average (depending on the quality of the hashing algorithm). I found the dictionary data structure
faster than the array list data structure when looking up keywords and identifiers in the symbol
table. Array lists just store a set of objects (that can be accessed randomly), whereas
dictionaries store pairs of objects. This makes arrays and lists more suitable when you have a
group of objects in a set (prime numbers, colors, students, etc.), while dictionaries are better
suited for showing relationships between a pair of objects. The Dictionary class takes two generic
type parameters: the first is the type of the key and the second is the type of the value. The
following code snippet creates a Dictionary whose keys are strings and whose values are integers.
public static Dictionary<string, int> a = new Dictionary<string, int>();
public static void insertKeyword()
{
a.Add("Begin", 300);
a.Add("And", 301);
a.Add("Case", 302);
a.Add("Const", 303);
a.Add("Div", 304);
.
.
.
.
}
So, with the dictionary data structure, no separate SymbolTable type is needed to represent the
entries of the symbol table.
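A keyword lookup against such a dictionary can be sketched as follows. The sketch is illustrative and not the DCSPM lexer: KeywordTable and Lookup are hypothetical names, the case-insensitive comparer is an assumption based on Pascal keywords not being case-sensitive, and 256 is used as the identifier code because match() reports code 256 as 'identifier'.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of DCSPM.
public static class KeywordTable
{
    // Assumption: keywords are compared without regard to case.
    static readonly Dictionary<string, int> Codes =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            { "Begin", 300 }, { "And", 301 }, { "Case", 302 },
            { "Const", 303 }, { "Div", 304 },
        };

    public const int Identifier = 256;   // token code reported as 'identifier' by match()

    // Returns the keyword code, or the identifier code when the lexeme is not a keyword.
    public static int Lookup(string lexeme)
    {
        int code;
        return Codes.TryGetValue(lexeme, out code) ? code : Identifier;
    }
}
```

TryGetValue performs a single hash lookup, whereas an ArrayList of SymbolTable objects has to be scanned entry by entry until the keyword is found.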
5.1.2 Microsoft Intermediate Language (MSIL) of If Statement Improvement
We can optimize the MSIL of the “if statement” by removing the ldc.i4.0 and ceq instructions and
replacing brtrue.s with brfalse.s; the result is the same as before the optimization. This can be
seen in the sample code below.
Sample code:
int a = 0, b = 1, c=2;
if (a == 1)
{
a = b + c;
}
MSIL of the code, with the improvement marked:
IL_0000: nop
IL_0001: ldc.i4.0
IL_0002: stloc.0
IL_0003: ldc.i4.1
IL_0004: stloc.1
IL_0005: ldc.i4.2
IL_0006: stloc.2
IL_0007: ldloc.0
IL_0008: ldc.i4.1
IL_0009: ceq
IL_000b: ldc.i4.0            // removed by the improvement
IL_000c: ceq                 // removed by the improvement
IL_000e: stloc.3
IL_000f: ldloc.3
IL_0010: brtrue.s IL_0018    // replaced by: brfalse.s IL_0018
IL_0012: nop
IL_0013: ldloc.1
IL_0014: ldloc.2
IL_0015: add
IL_0016: stloc.0
IL_0017: nop
IL_0018: ret
We can remove the two marked instructions to improve both space and time; after removing them,
we have to replace the “brtrue.s IL_18” instruction with the “brfalse.s IL_18” instruction.
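This improvement can be expressed as a small peephole pass over the instruction list. The sketch below is one way to do it and is not taken from DCSPM. IfPeephole and Optimize are hypothetical names, the instructions are given without their IL_ labels (which would otherwise have to be renumbered), and only the exact pattern ldc.i4.0, ceq, stloc.N, ldloc.N, brtrue.s is rewritten. It further assumes that the temporary local is read only by the branch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical peephole pass, not part of DCSPM.
public static class IfPeephole
{
    public static List<string> Optimize(IList<string> code)
    {
        var result = new List<string>();
        int i = 0;
        while (i < code.Count)
        {
            // Match: ldc.i4.0 ; ceq ; stloc.N ; ldloc.N ; brtrue.s <target>
            if (i + 4 < code.Count
                && code[i] == "ldc.i4.0"
                && code[i + 1] == "ceq"
                && code[i + 2].StartsWith("stloc.", StringComparison.Ordinal)
                && code[i + 3] == "ldloc." + code[i + 2].Substring(6)
                && code[i + 4].StartsWith("brtrue.s ", StringComparison.Ordinal))
            {
                // Drop the negation and invert the branch instead.
                result.Add(code[i + 2]);
                result.Add(code[i + 3]);
                result.Add("brfalse.s " + code[i + 4].Substring(9));
                i += 5;
            }
            else
            {
                result.Add(code[i]);
                i++;
            }
        }
        return result;
    }
}
```

Applied to the condition of the sample code (ldloc.0, ldc.i4.1, ceq, ldc.i4.0, ceq, stloc.3, ldloc.3, brtrue.s IL_0018), the pass keeps the first ceq, drops the ldc.i4.0 and the second ceq, and ends with brfalse.s IL_0018.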
5.2 Evaluations and performance
This section describes the evaluation and performance of the DCSPM compiler in
two stages. The first stage is the symbol table of the lexical analysis; the second stage is the
parser phase, together with the initial and improved if/else MSIL results. For the lexical
analysis phase, I tested programs of 11, 22, 33, 44, 55, 66, 77, 88, and 99 lines, once
with the array list data structure and once with the dictionary data structure. Table 3
shows the results for the array list and the dictionary.
#Lines   Array List   Dictionary
11       7.7702 ms    6.0066 ms
22       7.8529 ms    6.5299 ms
33       7.9264 ms    6.6787 ms
44       8.0363 ms    6.9415 ms
55       8.4518 ms    7.1428 ms
66       8.4946 ms    7.2742 ms
77       8.6187 ms    7.2959 ms
88       8.9369 ms    7.4568 ms
99       9.2126 ms    7.5075 ms
Table 3: Array list data structure vs. Dictionary data structure
Lexical analysis with the dictionary data structure is faster than lexical analysis with the
array list data structure at every program size tested, by roughly 1.1 to 1.8 ms, as the chart
in Figure 16 shows.
[Chart: time in ms (0 to 10) against lines of program (11 to 99), with one series for Array List and one for Dictionary]
Figure 16: Array list data structure vs. Dictionary data structure
“The Dictionary<TKey,TValue> is probably the most used associative container class.
The Dictionary<TKey,TValue> is the fastest class for associative lookups/inserts/deletes
because it uses a hash table under the covers. Because the keys are hashed, the key type
should correctly implement GetHashCode() and Equals() appropriately or you should
provide an external IEqualityComparer to the dictionary on construction. The
insert/delete/lookup time of items in the dictionary is amortized constant time - O(1) which means no matter how big the dictionary gets, the time it takes to find something
remains relatively constant. This is highly desirable for high-speed lookups. The only
downside is that the dictionary, by nature of using a hash table, is unordered, so you cannot
easily traverse the items in a Dictionary in order.” [23]
Table 4 summarizes the differences between Dictionary and ArrayList as a quick
reference. [23]
Collection    Ordering                                          Contiguous Storage?    Direct Access?    Lookup Efficiency    Manipulate Efficiency    Notes
Dictionary    Unordered                                         Yes                    Via Key           Key: O(1)            O(1)                     Best for high performance lookups.
ArrayList     User has precise control over element ordering    Yes                    Via Index         O(n)                 O(n)                     Best for smaller lists
Table 4: Complexity of ArrayList vs. Dictionary
“ArrayList resizes dynamically. As elements are added, it grows in capacity to accommodate
them. It is most often used in older C# programs. It stores a collection of elements of type object.
This makes casting necessary.” [24]
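The difference in lookup cost can be illustrated with a short C# sketch. It assumes a symbol table that maps an identifier name to a slot number; this layout and the names SymbolTableSketch, FindInArrayList, and FindInDictionary are hypothetical and are not taken from the DCSPM source.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class SymbolTableSketch
{
    // Linear scan: each lookup walks the list until the name matches, O(n).
    // The slot number is simply the position of the name in the list.
    public static int FindInArrayList(ArrayList symbols, string name)
    {
        for (int i = 0; i < symbols.Count; i++)
        {
            // ArrayList stores object, so a cast back to string is needed.
            if ((string)symbols[i] == name) return i;
        }
        return -1; // not declared
    }

    // Hash lookup: amortized O(1), independent of the table size.
    public static int FindInDictionary(Dictionary<string, int> symbols, string name)
    {
        int slot;
        return symbols.TryGetValue(name, out slot) ? slot : -1;
    }
}
```

Both functions return the same slot for the same name; only the amount of work per lookup differs, which is consistent with the gap between the two columns of Table 3.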
The second stage is the parser phase, which receives the tokens produced by lexical
analysis through the nextToken() function. After completing its implementation, I tested
the parser phase on Pascal programs of 11, 22, 33, 44, 55, 66, 77, 88, and 99 lines. Table 5
shows the results of testing.
# of code lines    Parser Phase (ms)
11                 0.41876
22                 1.1104
33                 2.1496
44                 3.4499
55                 5.1268
66                 6.719
77                 8.8899
88                 10.2701
99                 13.3532
Table 5: Parser phase results
As the table shows, the parsing time goes up as the code gets more lines. I measured the
parser phase with the Stopwatch class, as in the code below:
System.Diagnostics.Stopwatch watch = new System.Diagnostics.Stopwatch();
watch.Start();
Parser();
watch.Stop();
double elapsedMS = watch.Elapsed.TotalMilliseconds; // keeps fractions of a millisecond
[Chart: parser phase time in ms (0 to 16) against # lines of Pascal code (11 to 99)]
Figure 17: Parser phase results
I also timed the initial and improved if/else MSIL results that the DCSPM compiler
writes to the .il file. I created a batch file, timer.cmd, to measure the run time of the
assembled MSIL, as in the code below:
@echo off
echo %time% < nul
cmd /c %1
echo %time% < nul
When ILAsm finishes assembling my .il file, it outputs an .exe with the same name
as the .il file. I then ran the command timer myfile.exe in cmd. I tested Pascal
programs of 11, 22, 33, 44, 55, 66, 77, 88, and 99 lines. Table 6 shows the
benchmark between the unimproved and improved if/else MSIL results.
Lines of Pascal Code    Unimproved MSIL code    Improved MSIL code
11                      12.6 ms                 13.6 ms
22                      13.4 ms                 11.6 ms
33                      12.8 ms                 12.8 ms
44                      13.4 ms                 10.4 ms
55                      14.4 ms                 12.2 ms
66                      14.8 ms                 12.8 ms
77                      15 ms                   13.2 ms
88                      15.2 ms                 13.4 ms
99                      15.6 ms                 13.9 ms
Table 6: Benchmark between unimproved and improved if/else MSIL
[Chart: if/else MSIL results; time in ms (0 to 20) against # lines of Pascal code (11 to 99); series: unimproved MSIL code, improved MSIL code]
Figure 18: Initial and improved if/else MSIL results
The improved if/else MSIL results are faster than the initial ones in seven of the nine
tests shown in Figure 18 (the 11-line program was slower and the 33-line program took the
same time), because the improved if/else MSIL code is smaller than the unimproved code.
Also, the improved if/else .il file that DCSPM generates is never larger than the initial
if/else .il file and is smaller in four of the nine cases, as shown in Table 7 and Figure 19.
Lines of Pascal code    Initial MSIL Size    Improved MSIL Size
11                      2 KB                 2 KB
22                      3 KB                 3 KB
33                      5 KB                 5 KB
44                      7 KB                 6 KB
55                      8 KB                 8 KB
66                      10 KB                9 KB
77                      11 KB                11 KB
88                      12 KB                11 KB
99                      14 KB                13 KB
Table 7: Benchmark between initial and improved if/else .il files
[Chart: size of initial and improved if/else MSIL; size in KB (0 to 16) against # lines of Pascal code (11 to 99); series: Unimproved Size, Improved Size]
Figure 19: Benchmark between file sizes of initial and improved if/else .il
6 Lessons Learned
I started my research by reading books, papers, and e-books. I found tools that I can use to
verify my compiled MSIL results. One is ildasm.exe, which converts compiled IL to human-readable
code and is located at C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin. Another is
ilasm.exe, which converts human-readable IL code to a binary executable; its instruction set
resembles that of a native assembly language. I write my code for ilasm in any text editor, such as
Notepad, and then use the command-line assembler (ilasm.exe) provided by the .NET Framework,
which is located at C:\Windows\Microsoft.NET\Framework\v1.1.4322 or
C:\Windows\Microsoft.NET\Framework\v2.0.50727.
While programming the parser phase of the DCSPM compiler, I faced some issues. First, in
MSIL code every instruction has a label, and the label depends on the sizes of the
preceding instructions, which vary (for example one byte, two bytes, or five bytes). I
addressed this issue with the function below:
public static string newLabel()
{
string s = count1.ToString();
str6 = "IL_" + s + ":";
return str6;
}
This function generates the label numbers, but it produces decimal numbers, whereas the
labels in MSIL code have to be hexadecimal. I changed the function to generate hexadecimal
numbers, as in the code below:
public static string newLabel()
{
string hexValue = count1.ToString("X");
str6 = "IL_" + hexValue + ":";
return str6;
}
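The dependence of each label on the sizes of the preceding instructions can be sketched as follows. This is an illustration, not the DCSPM implementation: the class LabelSketch, its Size table, and the four-digit lowercase label format are assumptions, and the table covers only a handful of instructions whose encoded sizes are well known.

```csharp
using System;
using System.Collections.Generic;

public static class LabelSketch
{
    // Encoded size in bytes (opcode plus operand) of a few instructions.
    static readonly Dictionary<string, int> Size = new Dictionary<string, int>
    {
        { "nop", 1 }, { "ldc.i4.0", 1 }, { "ldc.i4.1", 1 }, { "ldloc.0", 1 },
        { "stloc.0", 1 }, { "add", 1 }, { "ret", 1 },
        { "ceq", 2 }, { "clt", 2 },          // two-byte opcodes (0xFE prefix)
        { "brtrue.s", 2 }, { "br.s", 2 },    // one-byte opcode + one-byte offset
        { "ldc.i4", 5 }, { "br", 5 }         // one-byte opcode + four-byte operand
    };

    // Prefixes every instruction with a label equal to its byte offset.
    public static List<string> Label(IEnumerable<string> instructions)
    {
        var labeled = new List<string>();
        int offset = 0;
        foreach (string ins in instructions)
        {
            string opcode = ins.Split(' ')[0];
            labeled.Add("IL_" + offset.ToString("x4") + ": " + ins);
            offset += Size[opcode]; // the next label moves by this instruction's size
        }
        return labeled;
    }
}
```

Because ceq occupies two bytes, the instruction after it receives a label two higher, which matches the jumps in numbering visible in the listings in this report (for example IL_0009 followed by IL_000b).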
When I compile Pascal code such as an if/else statement to MSIL, the result contains branches
that jump forward to later instructions, and it is difficult to know the label of the target
instruction before it has been generated. For example:
Pascal Code
program Example(input,output);
var a,b: integer; begin a:=2; b:= 3;
if(a>=2 or b<3) then begin a:=b+b; end
else begin a:=b-b; end;
end;
end.
MSIL Code
IL_0: ldc.i4.2
IL_1: stloc.0
IL_2: ldc.i4.3
IL_3: stloc.1
IL_4: ldloc.0
IL_6: ldc.i4.2
IL_7: bge.s IL_10
IL_9: ldloc.1
IL_B: ldc.i4.3
IL_C: clt
IL_E: br.s IL_11
IL_10: ldc.i4.0
IL_11: stloc.2
IL_12: ldloc.2
IL_13: brtrue.s IL_20
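Notes on the branches:
IL_7: bge.s IL_10 goes to IL_10 if the value loaded by ldloc.0 is greater than or equal to 2.
IL_E: br.s IL_11 just branches to IL_11.
IL_13: brtrue.s IL_20 branches to IL_20 if the value is non-zero ("true").
A later unconditional branch (beyond this excerpt) just branches to IL_26.
Figure 20: How branches of if/else statement logic work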
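The forward branches in the listing above (to IL_10, IL_11, and IL_20) share one difficulty: the target label is not yet known when the branch is emitted. A common general remedy is backpatching, sketched below in C#. This is a different technique from the two-queue approach used in DCSPM, and the class BranchEmitter with its methods and the "?" placeholder are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

public class BranchEmitter
{
    private readonly List<string> code = new List<string>();
    // Positions of emitted branches that still wait for a label,
    // grouped by a symbolic target name such as "else" or "endif".
    private readonly Dictionary<string, List<int>> pending =
        new Dictionary<string, List<int>>();

    public IReadOnlyList<string> Code { get { return code; } }

    public void Emit(string instruction) { code.Add(instruction); }

    // Emit a branch whose target label is not known yet.
    public void EmitBranch(string opcode, string target)
    {
        if (!pending.ContainsKey(target)) pending[target] = new List<int>();
        pending[target].Add(code.Count);
        code.Add(opcode + " ?"); // "?" is a placeholder for the label
    }

    // Called once the target is reached: fill in every waiting branch.
    public void Resolve(string target, string label)
    {
        List<int> sites;
        if (!pending.TryGetValue(target, out sites)) return;
        foreach (int site in sites)
            code[site] = code[site].Replace("?", label);
        pending.Remove(target);
    }
}
```

A parser using this sketch would call EmitBranch("brtrue.s", "else") when it sees the condition and Resolve("else", "IL_20") when it reaches the first instruction of the else part.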
This if/else logic shows that, while generating the statement, I had to know the label of a
later instruction before parsing it. To handle this, I let the parser phase finish scanning all of
the Pascal code first. While the parser phase reads the Pascal code, it generates MSIL code and
saves it in a queue, and it saves in variables I created for this purpose the labels of the
instructions that forward branches refer to, such as 10, 11, 20, and 26 in Figure 20. Once the
parsing of the Pascal code and the saving of MSIL in the queue are complete, the parser phase
reads the MSIL code back from the queue. I created a second queue to hold the MSIL code as it is
dequeued: the parser dequeues the MSIL code instruction by instruction and enqueues each
instruction into the new queue until the old queue is empty. I applied the same approach to the
branches in all the other statements.
I programmed a case statement that looks like this:
program SwitchStatement(input,output);
var a,b:integer;
begin
a:=1;
b:= 4;
case a of 1 : a:=a div b; 2 : a:= b+a; 3 : a:= a - b; 4 : a:=b*a;
end;
writeln(a);
end;
end.
Then I changed the case labels so that they are no longer consecutive, as in this program:
program SwitchStatement(input,output);
var a,b:integer;
begin
a:=5;
b:= 4;
case a of 1 : a:=a div b; 2 : a:= b+a; 3 : a:= a - b; 5 : a:=b*a;
end;
writeln(a);
end;
end.
When this code is compiled to MSIL, the result should be:
IL_000a: switch (
         IL_0025,
         IL_002b,
         IL_0031,
         IL_003d,    // this label goes to the ret label
         IL_0037)
.
.
.
IL_003d: ret
For now, DCSPM supports only case statements whose case labels are consecutive and in order,
such as case a of 1: ... 2: ... 3: ... 4: ... and so on, because I need more time to solve this problem.
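The MSIL switch instruction is a zero-based jump table: it pops an integer and jumps to the target at that index, so a missing case value needs a filler entry. A minimal C# sketch of building such a table from sparse case labels follows. It assumes the selector is first reduced by the lowest case value, and the names SwitchTableSketch and Build are hypothetical; DCSPM does not currently do this.

```csharp
using System;
using System.Collections.Generic;

public static class SwitchTableSketch
{
    // Builds the target list of an MSIL "switch" from sparse case values.
    // Slot k corresponds to the case value (lowest + k). Values that have
    // no case arm receive defaultLabel, so the jump skips the statement.
    public static string[] Build(IDictionary<int, string> arms,
                                 string defaultLabel, out int lowest)
    {
        if (arms.Count == 0) throw new ArgumentException("no case arms");

        lowest = int.MaxValue;
        int highest = int.MinValue;
        foreach (int value in arms.Keys)
        {
            if (value < lowest) lowest = value;
            if (value > highest) highest = value;
        }

        var table = new string[highest - lowest + 1];
        for (int k = 0; k < table.Length; k++)
        {
            string label;
            table[k] = arms.TryGetValue(lowest + k, out label) ? label : defaultLabel;
        }
        return table;
    }
}
```

For the case labels 1, 2, 3, and 5 with the end label IL_003d as the default, the sketch yields the five targets shown in the listing above, with IL_003d in the slot of the missing value 4.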
To time the lexical analysis, I first used the DateTime and TimeSpan classes to measure the
speed of my code, like this:
DateTime Start = DateTime.Now;
lex();
TimeSpan Elapsed = DateTime.Now- Start;
speed = "Time Elapsed of Lexical Analysis: " + Elapsed.TotalMilliseconds + "ms";
“ where you take the current DateTime, run the method whose performance you want to measure,
take the current time again, and subtract the initial time to get a TimeSpan object representing the
length of time your function took to execute.
Unfortunately, this method only gives a good measure of performance when the method you're
measuring has a long run time (a second or longer), since DateTime.Now uses the system timer,
which only has a resolution of about 10 milliseconds, meaning that if your method completes in
less than 10 milliseconds, the elapsedMS variable above might return 0, telling you nothing about
how long your method actually took to complete.
Luckily since .Net 2.0, there is a better alternative to DateTime.Now: For example, the stopwatch
class in the System.Diagnostics namespace. This class was, as the name implies, designed for
performance measuring, and uses your computer's high-resolution performance counter, which
usually has a resolution of less than one microsecond.”[25]
Rewriting the above code to use the Stopwatch class is easy:
System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
stopwatch.Start();
lex();
stopwatch.Stop();
speed = "Time Elapsed of Lexical Analysis: " + stopwatch.Elapsed.TotalMilliseconds + "ms";
7 Future Works
Because the DCSPM compiler is relatively new, there is an enormous amount of work still
to be done. Many statements and data structures of the Pascal language are yet to be supported
and to have their MSIL generated. They include complicated case statements, complicated nested
if/else logic, the assert, exit, goto, repeat, and next statements, two-dimensional arrays,
queues, and stacks. Also, this project implements only the integer and real types; many other
types need to be supported. The pre-declared procedures are not implemented in this project.
Other features that DCSPM does not yet support are complicated one-dimensional arrays and
if logic with a complex condition that has multiple levels.
8 Conclusion
In conclusion, since there is currently no Pascal compiler that compiles to MSIL, this
project focuses on generating MSIL code for the Pascal language. The DCSPM compiler is useful
for running legacy Pascal on modern machines, and its MSIL is platform independent. MSIL
code is verified for safety at run time and can be executed in any environment supporting the
CLI (Common Language Infrastructure). Working with MSIL also helps in understanding that
the CLR is a stack-based machine, and other virtual machines (e.g. the JVM) are similar at their
core. It really helps to understand what is going on, how the runtime handles memory, metadata,
and so on, and why some things work and others do not.
The DCSPM compiler reads the Pascal code, scans the code token by token, passes the tokens
to the parser and MSIL phase, and generates MSIL code for the Pascal program. For a nested if
statement, the compiler generates a conditional branch and creates a block of instructions for
the structure. The block is delimited by the branch instruction (like brtrue) and the branch label
(like IL_##, where the code jumps), and the condition is on the stack. If we find a true condition
branch, we negate it and put it as an if statement condition. The block is initially a block of
MSIL that we will convert to Pascal code later. Note that the labels are not stored in MSIL; a
label is just the byte offset of the instruction within a method. A one-dimensional array has two
cases when compiling to MSIL. First, when the array has one or two elements, its MSIL looks
like the MSIL of the other statements (if/else/while, etc.), as I explained in section 3.5.2.3.
DCSPM has two improvements. The first is in lexical analysis, which I explained in section
5.1.1. The initial lexical analysis uses an array list data structure for the symbol table, and the
improved lexical analysis uses a dictionary data structure for the symbol table. When I tested
the two versions with the Stopwatch class, I found that lexical analysis using the dictionary data
structure is faster and more efficient than lexical analysis using the array list data structure. The
second improvement is in the MSIL results of the nested if statement. I tested the initial and
improved if/else MSIL results that the DCSPM compiler writes to the .il file, using a batch file,
timer.cmd, to measure the run time of the MSIL results. I found the improved nested if/else
statement to be faster than the initial nested if/else statement, although both produce the same
results. Since the improved nested if/else statement has fewer instructions than the initial one,
the improved nested if/else .il file is smaller than the initial nested if/else .il file.
I hope DCSPM can be further developed and used in real-life projects. The experience gained
in this project can serve as a foundation for developing a new programming language.
9 References
1. http://msdn.microsoft.com/en-us/library/c5tkafs1(v=vs.71).aspx
2. H. M. Deitel, P. J. Deitel, J. Listfield, T. R. Nieto, C. Yaeger, and M. Zlatkina. C# How to Program.
3. Kenneth C. Louden. Compiler Construction: Principles and Practice.
4. D. S. Malik and P. S. Nair. Data Structures Using Java.
5. Peter Linz. An Introduction to Formal Languages and Automata. Fourth Edition.
6. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. 1985.
7. Abdul Sattar and Torben Lorenzen. Develop a Compiler in Java for a Compiler Design Course.
8. James T. Streib. Guide to Assembly Language: A Concise Introduction [electronic resource]. London; New York: Springer, c2011.
9. Gerald Wildenberg. Using a Stack Assembler Language in a Compiler Course. St. John Fisher College, Rochester, NY; Bristol Polytechnic, England (1989-1990).
10. Serge Lidin. Expert .NET 2.0 IL Assembler. Berkeley, CA.
11. http://www.codeproject.com/Articles/3778/Introduction-to-IL-Assembly-Language
12. http://msdn.microsoft.com/en-us/library/ht8ecch6(v=vs.71)
13. Pro C# 2008 and the .NET 3.5 Platform, Fourth Edition
14. http://www.codeguru.com/csharp/.net/net_general/il/article.php/c4635/MSIL-Tutorial.htm
15. http://en.wikipedia.org/wiki/Pascal_(programming_language)
16. http://pages.cs.wisc.edu/~fischer/cs536.s08/lectures/Lecture02.4up.pdf
17. http://msdn.microsoft.com/en-us/library/system.collections.arraylist.aspx
18. http://msdn.microsoft.com/en-us/library/kx37x362.aspx
19. http://en.wikipedia.org/wiki/Microsoft_Visual_Studio_Express#Visual_C.23_Express
20. http://dll-repair-tools.com/dll-files/fusiondll-the-assembly-manager
21. http://www.learnvisualstudio.net/start-here/lesson-1-1-installing-visual-c-2010-expressedition/)
22. http://www.seas.gwu.edu/~hchoi/teaching/cs160d/pascal.pdf
23. http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentalschoosing-the-right-collection-class.aspx
24. http://www.dotnetperls.com/arraylist
Appendix A:
PASCAL Grammar BNF. [22]
<program> ::= Program <identifier> ; <block> .
<block> ::= <variable declaration part>
<procedure declaration part>
<statement part>
<variable declaration part> ::= <empty> |
var <variable declaration> ;
{ <variable declaration> ; }
<variable declaration> ::= <identifier > { , <identifier> } : <type>
<type> ::= <simple type>
<simple type> ::= <type identifier>
<type identifier> ::= <identifier>
<statement part> ::= <compound statement>
<compound statement> ::= begin <statement>{ ; <statement> } end
<statement> ::= <simple statement> | <structured statement>
<simple statement> ::= <assignment statement> |
<read statement> | <write statement>| <if statement> | <for statement>
<assignment statement> ::= <variable> := <expression>
<read statement> ::= read ( <input variable> { , <input variable> } )
<input variable> ::= <variable>
<write statement> ::= write ( <output value> { , <output value> } )
<output value> ::= <expression>
<structured statement> ::= <compound statement> | <if statement> |
<while statement>
<if statement> ::= if <expression> then <statement> |
if <expression> then <statement> else <statement>
<while statement> ::= while <expression> do <statement>
<for statement> ::= for <variable identifier> := <expression> to <expression> do <statement>
<expression> ::= <simple expression> |
<simple expression> <relational operator> <simple expression>
<simple expression> ::= <sign> <term> { <adding operator> <term> }
<term> ::= <factor> { <multiplying operator> <factor> }
<factor> ::= <variable> | ( <expression> )
<relational operator> ::= = | <> | < | <= | >= | >
<adding operator> ::= + | -
<multiplying operator> ::= * | /
<variable> ::= <entire variable>
<entire variable> ::= <variable identifier>
<variable identifier> ::= <identifier>
<identifier> ::= <letter> { <letter or digit> }
<letter or digit> ::= <letter> | <digit>
<integer constant> ::= <digit> { <digit> }
<character constant> ::= '< any character other than ' >' | ''''
<letter> ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
p|q|r|s|t|u|v|w|x|y|z|A|B|C|
D|E|F|G|H|I|J|K|L|M|N|O|P
|Q|R|S|T|U|V|W|X|Y|Z
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<special symbol> ::= + | - | * | = | <> | < | > | <= | >= |
( | ) | := | . | , | ; | : | if | then | else | of | while | do |
begin | end | read | write | var | program | switch | for | to
<predefined identifier> ::= integer | Boolean
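Each rule of this grammar maps naturally onto one function of a recursive-descent parser. The C# sketch below recognizes only the <simple expression>, <term>, and <factor> rules over a list of already scanned tokens. It is an illustration under assumptions, not the DCSPM parser: the class ExpressionRecognizer is hypothetical, the <sign> is treated as optional, and a parenthesized <expression> is simplified to a <simple expression>.

```csharp
using System;
using System.Collections.Generic;

public class ExpressionRecognizer
{
    private readonly List<string> tokens;
    private int pos;

    public ExpressionRecognizer(IEnumerable<string> tokens)
    {
        this.tokens = new List<string>(tokens);
    }

    private string Peek() { return pos < tokens.Count ? tokens[pos] : null; }

    private bool Accept(string t)
    {
        if (Peek() != t) return false;
        pos++;
        return true;
    }

    // <simple expression> ::= <sign> <term> { <adding operator> <term> }
    private bool SimpleExpression()
    {
        if (!Accept("+")) Accept("-"); // optional sign (assumption)
        if (!Term()) return false;
        while (Peek() == "+" || Peek() == "-")
        {
            pos++;
            if (!Term()) return false;
        }
        return true;
    }

    // <term> ::= <factor> { <multiplying operator> <factor> }
    private bool Term()
    {
        if (!Factor()) return false;
        while (Peek() == "*" || Peek() == "/")
        {
            pos++;
            if (!Factor()) return false;
        }
        return true;
    }

    // <factor> ::= <variable> | ( <expression> )
    private bool Factor()
    {
        if (Accept("(")) return SimpleExpression() && Accept(")");
        string t = Peek();
        if (t != null && t.Length > 0 && char.IsLetter(t[0])) { pos++; return true; }
        return false;
    }

    // True when the whole token list forms one simple expression.
    public bool Matches()
    {
        pos = 0;
        return SimpleExpression() && pos == tokens.Count;
    }
}
```

The repetition braces { ... } of the grammar become while loops, and the alternatives of <factor> become an if/else on the next token.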
Appendix B:
Installing Visual C# 2010 Express Edition. [21]
Step1 – Navigating to the Microsoft website.
Launch your favorite web browser (IE, Firefox etc).
(1) Navigate to www.Microsoft.com/express/express-2010
(2) Select the Visual C# Express link. Scroll about halfway down the page and click the Microsoft
C# Express link; this will take you to the download page.
NOTE:
Microsoft is not going to distribute a release candidate for the Express Edition tools, but when
Visual Studio 2010 is officially released we’ll update any lessons, like this one, that would be
affected by the change of location or any UI changes. We’ve been assured by our friends at
Microsoft that the Beta 2 that we’re about to work with and the final release will be nearly
identical as far as the Express Editions are concerned.
Step 2: Download the appropriate software.
(1) Select Visual C# to expand the options.
(2) Select the language of your choice and click the purple "Free Download" button and
save it to your downloads folder on Windows 7.
Step 3: Locating the downloaded software on your computer.
Once that’s finished, close everything down.
(1) Open Windows Explorer. Go to the downloads directory.
(2) Double-click on the vcs_web.exe file.
(3) When you get the user account control questions, select "yes".
Step 4: Installing the software.
After a moment or two we’ll see the usual installation screens for an application. After clicking
"next" and agreeing to the license terms, then we want to make sure to install the optional
Microsoft SQL Server express edition service pack one. And a few things to note on this particular
screen: First of all, since this is a Beta we’ll leave everything set to the default option. There aren’t
any known issues regarding the installer for Visual C# 2010 Express, but experience has shown
this to be a best practice. Also, be forewarned that what you downloaded a few moments ago was
just a kind of a bootstrap. What it’ll go out and do now is download all of the bits that you need
based on the selections that you made. So the complete download is 238 MBs, and it’s a very long
download. Click "install". About halfway through we’ll be prompted to restart our computer; we’ll
talk about that in just a moment.
Step 5: The computer restart.
About midway into the install we’re prompted to reboot; and this is because the .net framework run
time 4.0 beta-2 needs to be installed. This is a piece of software that’s foundational. Everything
that we’ll write as an application will build off of this runtime; we’ll discuss this more later on in
the series. For right now, click the ‘restart now’ and as our machine reboots it will look similar to
a Windows update.
Step 6: Post-restart software installation.
Once the computer restarts the installation will continue automatically. We needed to reboot
because the .net runtime touches so many parts of the operating system and because Visual C#
2010 Express Edition is built on that framework itself, so it needed to have that in place before it
could be installed. We’re about halfway through the install at this point.
Step 7: Software installation confirmation.
At this point we have a successful install. If you didn’t see this screen and you encountered some
sort of error message, in a separate lesson we’ll demonstrate how to submit a bug report to
Microsoft as well as how to find solutions to your installation problems. Microsoft wants to hear
about any installation issues that you may have had, so it’s important that you contribute in that
way.
Step 8: Navigating to the newly-installed software.
That completes the installation. To verify:
(1) Go to the start menu on Windows
(2) Click the new icon "Microsoft Visual C# 2010 Express".
Step 9: The Visual Studio IDE.
The first time it runs it needs to set up a few things with regards to the environment that it’s
running in; screen resolution, keyboard settings, things of that nature. This interface will become
ingrained within your mind over the course of the lessons – but this is a great start.