Lecture № 5
Syntax of Assembly

1. Format of instructions and macroinstructions.
2. Syntax of operands in Assembly.
3. Syntax of operators in Assembly.

Literature.
1. Jurov V. Assembler. SPb.: Piter, 2001. 624 p.
2. Pustovarov V. I. Assembler. Programming and analysis of machine programs correctness. Kiev: "Irina", 2000. 476 p.
3. Tanenbaum A. S. Structured Computer Organization, 4th ed. Upper Saddle River, NJ: Prentice Hall, 2002.

Syntax of Assembly

The sentences that make up a program are syntax constructions of one of four types: instruction, macroinstruction, directive or comment. Each type of construction is formed according to definite rules. The formats below illustrate these rules (parts in square brackets are optional).

Format of instructions and macroinstructions.

    [Name of Label:] OC [Operand 1 [, Operand 2]] [; Comments]

Here:
- Name of Label (Label) is an identifier whose value is the address of the first byte of the sentence;
- OC (operation code) is the mnemonic designation of the corresponding machine instruction or macroinstruction;
- Operands are the parts of an instruction or macroinstruction on which the actions are performed.

Format of Directives.

    [Name] Directive [Operand 1, ..., Operand n] [; Comments]

Here:
- Directive is the mnemonic designation of a translator directive;
- Name is an identifier by which the translator distinguishes between directives of the same kind.

The following symbols are permitted in the text of an assembly program:
- all Latin letters: A-Z, a-z (capital and small letters are considered equivalent);
- digits from 0 to 9;
- the signs ?, @, $, _, &;
- delimiters (separators): , . [ ] ( ) < > { } + / * % ! ' " ? \ = # ^

Assembly sentences are formed from lexical units (tokens), which are indivisible sequences of permitted language symbols that have meaning for the translator.
The lexical units are:
- Identifiers: sequences of permitted symbols used to designate program objects such as operation codes, variable names and label names. The rule for writing identifiers is as follows: an identifier may consist of one or more symbols, but not more than 255. The symbols may be Latin letters, digits and some special signs: _, ?, $, @. An identifier cannot (!) begin with a digit.
- Character strings: sequences of symbols enclosed in quotation marks (single or double).
- Integers: written in the binary, decimal or hexadecimal number system.

Operands.

Let's consider the classification of operands supported by the assembly translator.

Constant or immediate (direct) operands: a number, string, name or expression that has a fixed value. The name must not be relocatable, i.e. it must not depend on the address at which the program is loaded into memory (for example, it may be defined by the operators equ or =).

Address operands. These operands specify the physical location of an operand in memory by indicating the components of its address: segment and offset.

Syntax of Address Operands description.

    segment : offset

    segment: CS, DS, SS, ES, FS, GS, Segment's Name or Name of Group
    offset:  Integer, Absolute Name or Absolute Expression

Relocatable operands are any symbolic names that represent memory addresses. These addresses may designate the location in memory of an instruction (if the operand is a label) or of data (if the operand is the name of a memory area inside a data segment). Such operands are not bound to a concrete address of physical memory. The segment component of a relocatable operand's address is not known in advance and is determined only after the program is loaded into memory for execution. For example:

    data segment
    mas_w dw 25 dup (0)
    …..
    code segment
    …..
    lea SI, mas_w ; mas_w is a relocatable operand

In this fragment mas_w is a relocatable operand whose value is the starting address of a memory area 25 words long.
The full physical address of this memory area becomes known only after the program is loaded.

Address counter is a specific type of operand. It is designated as $. Its peculiarity is as follows: when the translator encounters this symbol in a program, it substitutes the current value of the address counter for it.

Register operand. This is simply the name of one of the processor's registers.

In general, operands may be components of more complex constructions called expressions. Expressions are combinations of operands and operators. As in high-level languages, assembly operators are applied during expression evaluation according to their priorities. Operators of equal priority are applied sequentially from left to right; the order can be changed with round brackets (parentheses), which have the highest priority.

Example of operators and their priorities.

    Operators                                      Priority
    length, size, width, mask, (, ), [, ], <, >    1
    .                                              2
    . . .
    ptr, offset, seg, type, this                   4
    high, low                                      5
    +, - (unary)                                   6
    *, /, mod, shl, shr                            7
    +, - (binary)                                  8
    eq, ne, lt, le, gt, ge                         9

Let's give a short characteristic of the operators.

Arithmetic operators. The following operators belong to this type: "+", "-" (unary and binary), "*", "/", "mod".

Syntax of Arithmetic Operators

    (+ | -) Expression
    Expression_1 (+ | - | * | / | mod) Expression_2

Example:

    tab_size equ 68 ; size of the array in bytes
    size_el equ 4 ; size of one element
    …..
    ; the number of elements in the array is computed
    ; and loaded into the CX register
    mov CX,tab_size/size_el ; operator "/"

Shift operators shift the value of an expression by the specified number of bit positions. Example:

    mask_b equ 10011000b ; the suffix b marks a binary constant
    …..
    mov AL,mask_b shr 3 ; operator "shr"

Syntax of Shift Operators

    Expression (shr | shl) Number of shifted positions

Comparison operators return the value "true" or "false" and are intended for forming logical expressions. "False" corresponds to 0 and "true" to a nonzero value (MASM, for example, returns a value with all bits set, i.e. -1).
Example:

    tab_size equ 30 ; size of the table
    ….
    mov AL,tab_size gt 50 ; load the result of the comparison
                          ; into register AL
    cmp AL,0 ; if tab_size <= 50, then
    je m4 ; jump to m4
    ….
    m4: …………………………

Index operator [ ]. The translator treats this operator as an instruction to add the value of the first expression (the one outside [ ]) to the value of the second expression (the one inside [ ]). Example:

    mov AX,mas[SI] ; transfer of the word at address
                   ; mas+(SI) into register AX

Type override operator ptr. It is used to redefine or to refine the type of a label or variable given by an expression. The type may be one of the following: byte, word, dword, qword, tbyte, near, far. Example:

    d_wrd dd 0
    ......
    mov AL,byte ptr d_wrd+1 ; transfer of the second byte
                            ; of the double word

Segment override operator ":" (colon). The translator treats it as an instruction to calculate the physical address relative to the given segment component: the name of a segment register, the name of a segment from the directive SEGMENT, or a group name.

It is important to keep in mind that the code segment cannot (!) be overridden when instructions are fetched. This is explained by the role of the code segment in sequential program execution: to execute the next instruction, the microprocessor must first of all consult the code segment register CS, which holds the base (starting) address of the code segment. To calculate the address of the required instruction, the microprocessor multiplies the contents of CS by 16 (that is, shifts it 4 bit positions to the left) and then adds the 16-bit contents of IP to the resulting 20-bit product.
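As a worked illustration of this calculation, assume hypothetical register contents CS = 1234h and IP = 0010h. These values are chosen only for the example and do not come from the lecture:

```latex
\begin{aligned}
\text{physical address} &= \mathrm{CS} \times 16 + \mathrm{IP} \\
                        &= 1234\mathrm{h} \times 10\mathrm{h} + 0010\mathrm{h} \\
                        &= 12340\mathrm{h} + 0010\mathrm{h} \\
                        &= 12350\mathrm{h}
\end{aligned}
```

Multiplying by 16 (10h) appends one hexadecimal zero to the segment value, which is the same as the 4-bit shift to the left.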
Operands are processed in much the same way: if the microprocessor determines that an operand is an address (whose effective address is only a part of the physical address), it knows in which segment the operand is expected to be located (as a rule, this segment is given by the register DS). If data addresses (or data) are stored in the stack segment, the registers SP and BP are involved (as a rule, the necessary addresses are stored there). Since addresses of these kinds are stored in these segments only "as a rule", they may also be stored in other segments, and one can choose where it is more convenient to place them. The segment override operator serves this purpose. It acts as a prefix that slightly modifies the operation of an instruction. The prefix occupies an optional field of the machine instruction and is a one-byte value whose numerical value determines its purpose. Let's consider an example:

    code segment
    ……
    jmp metka ; jump over the data field sdq
              ; (mandatory!)
    sdq db 4 ; description of data field
    metka:
    ……..
    mov AL, CS:sdq ; this override allows working
                   ; with data inside the code segment

Operator seg, which obtains the segment component of the address of an expression. It returns the physical address of the segment for an expression (the expression may be a label, a variable, a group name or any symbolic name). The syntax of this operator:

    seg Expression

Operator offset, which obtains the offset of an expression. It returns the offset of the expression in bytes, relative to the beginning of the segment in which the expression is located. Example of using these operators:

    data segment
    smth dw 8
    ……
    code segment
    …….
    mov AX,seg smth
    mov DS,AX
    mov DX, offset smth ; now the pair DS:DX
                        ; holds the full physical address of smth

Problems.
1. Which types of sentences may be included in an assembly program?
2. Describe the general structure of an EXE-format program.
3. Why is it necessary to include int 21h in an assembly program?