CS241 Notes on Chapter 3 of Sargent and Shoemaker

p. 58 Fig. 3-1.

fill:   movl $51, %ecx          # Initialize counter (51 is decimal)
num1 = 0x18                     # (num1 is a symbolic constant "sym val")
var2:   .byte 0,0               # (2 bytes of 0 in data area)

Note that fill and var2 look more alike here than in the book: they both are followed by a colon, and both define memory addresses, one in the code area and one in the data area. They are both called "tags" or "labels". var2 can also be called a variable, a spot in memory that at least potentially changes while the program runs. num1 is not a tag or label, as it is not associated with a memory address, but rather just holds that constant value 0x18. All three of fill, num1, and var2 are "symbols", and show up in the "symbol table" of the object file.

p. 59

"title test program"            (no gas equivalent)
CR = 0xd                        # carriage return
dsbseg: .word 0
dascnt: .byte 0
nhelp:  .byte 0
xadr:   .long 0xffff0000
count:  .byte 0                 (or use .bss, described below)
end                             (no gas equivalent)

p. 61 minimal assembler program

# minimal assembly program (it needs C lib, though. Later we will be
# able to write one that doesn't)
        .data
msg:    .asciz "Hello, World!\n"
stackstart:
        .fill 2560,1,0          # printf needs much more than 256 bytes
stackend:
        .text
        movl $stackend, %esp    # set up own stack
        call _init_devio        # init C library i/o
        pushl $msg              # push address of msg
        call _printf            # call printf
        addl $4,%esp            # adjust stack (could drop this)
        int $3                  # return to Tutor

Note there is no "model" directive or segment setup. The model is effectively the "tiny" model with huge segments and identical code and data segments (but don't worry about this, just FYI). The memory segments are set up once and for all by the boot-up code.

The assembler has two logical memory areas, also called segments, the text and data segments. The .text directive makes the following assembly language lines assemble into the text seg, until a .data directive switches it over to start putting lines into the data seg.
This can switch back and forth several times in a .s file. All the .text lines are gathered together, in order, into one area and all the .data lines into another. The loader, i386-ld, puts the text area first in memory, followed by the data area.

In fact there are three logical assembler segments: text, data, and bss, the uninitialized data segment. We could put the stack there since it has no valid values until it is used:

# minimal assembly program (still needs C lib, though)
        .data
msg:    .asciz "Hello, World!\n"
        .text
_start:                         # (this tag not actually needed)
        movl $stack1end, %esp   # set up own stack
        call _init_devio        # init C library i/o
        pushl $msg              # print message
        call _printf            # call printf
        addl $4,%esp            # adjust stack (could drop this)
        int $3                  # return to Tutor
        .bss
stack1: .=.+2560                # same stack size, in bss
stack1end:

The loader will put the text, data, and bss segments in memory in that order. The executable file will have space for the initial values for the data segment but no space used for the bss segment. The bss segment is allocated at runtime and cleared by the code responsible for setting up the program image.

A special makefile is needed for building the above program. Since each program needs to set up a stack and initialize the C library, that code is provided in startup0.s and startup.c, which the regular makefile uses, as is the int $3 needed to get back to Tutor. Thus with the usual makefile, all you need to have the same effect as the above program is:

# minimal assembly program (needs startup0.s, startup.c and C lib,
# but doesn't need a C driver)
        .globl _main            # to fit with startup.c setup
        .data
msg:    .asciz "Hello, World!\n"
        .text
_main:  pushl $msg              # print message
        call _printf
        addl $4,%esp
        ret                     # return to startup.c

p. 64 Fig. 3-1. Just reverse the order of operands here, including in and out, wherever there are two operands.

p. 64 xlat instruction. Here is a C-callable function to convert one byte (value 0 to 15) to a hex digit: convb(12) = 'c' for example.

        .globl _convb
        .text
_convb: movb 4(%esp), %al       # put byte in al
        movl $table, %ebx       # and table addr in ebx
        xlat                    # replace al with conv'd byte
        ret
        .data
table:  .ascii "0123456789abcdef"

p. 67

won:    .byte 0                 # counter for games won
lost:   .byte 0                 # counter for games lost
board:  .fill 9, 1, 0           # 3x3 array for Tic-Tac-Toe board
msg1:   .asciz "You win"
msg2:   .asciz "You lose"

But note that a byte counter, read as a signed value, can only reach 127 before it "goes negative", so it's usually better to use a doubleword.

p. 68 Direct addressing

"movb %al, won" for a byte counter, or "movl %eax, won" for a doubleword counter.
"movb $3, won" puts a 3 in won.
"movb board+3, %al" moves the contents of loc. 1023 into al. When the assembler sees "board+3", it looks up the 32-bit address of board and then does the indicated addition to yield a 32-bit address for the operand.
"mov dx, offset won" is simply "movl $won, %edx", same syntax as constant data.

So you have to be sure to discriminate between "movl won, %edx" (assuming a doubleword count), and "movl $won, %edx". The first copies the contents of variable won to edx, the second puts the address of the variable won into edx.

Indirect Addressing

Examples:
"movl (%esi),%ecx" uses the number in esi as a pointer and copies the pointed-to doubleword to ecx.
"movl %ebx,(%edi)" uses the contents of edi as a pointer in the destination of the doubleword copy.
"movb %al,(%ebx)" uses the contents of ebx as a pointer in the destination of the byte-wide copy.

Intel "mov al, [bx]+5" is written in gas as "movb 5(%ebx),%al". This copies the byte located 5 bytes down from where ebx points to, into al. Note this is close to an alternative Intel expression for the same address mode: "mov al, 5[bx]", although still another alternative seems more common: "mov al, [bx + 5]".

This example is close to that of Fig. 3-2. There, the Intel "mov ax,[bx+3]" corresponds to gas "movl 3(%ebx),%eax", which copies the doubleword from 8003 (0x00008000 from %ebx, plus 3) to eax.
We should show 2 more bytes being copied, from 8005 and 8006 (aka 00008005 and 00008006, showing all 32 bits).

"mov [bx]+board, al" AKA "mov board[bx], al" AKA "mov [bx + board], al" is written in gas as "movb %al, board(%ebx)". Here the address of board (the symbol value of board) is added to the contents of ebx to get a pointer value to use in the destination.

"mov [si + 3 + won], cl", an improbable instruction in the given setup, would be "movb %cl,(3+won)(%esi)" or "movb %cl, 3+won(%esi)". Parentheses can be used in symbol expressions. When they enclose a register, that makes it indirect.

Base Plus Index Addressing

The kinds of addressing that combine addresses from two registers are called "base plus index" addressing. Intel "mov ax,[bx+si]" corresponds to gas "movl (%ebx,%esi),%eax". Here ebx is the "base" reg and esi is the "index" reg. The CPU just sums the two reg vals and uses that as a pointer. Using 32-bit registers, any reg can be used for the base or index reg, except that %esp cannot be used as an index reg (Table 3-2).

The index reg can be "scaled" by 1, 2, 4, or 8, i.e. that reg's contribution is multiplied by this amount. This is good for array referencing. For example, if %esi contains the array index in a doubleword array starting at location table, then we put table's address in %ebx and then Intel "mov eax,[ebx+4*esi]" or gas "movl (%ebx,%esi,4),%eax" will load eax with that table longword element.

p. 73 Structures

Gas does not have structures. We can give symbol names to field offsets and have some of the advantages of this facility. For RECT, we have (sticking with word-wide quantities here, maybe we have millions of these in memory...)

left = 0
top = 2
right = 4
bottom = 6
sizeofrect = 8
        .data
rect1:  .word 0,0,20,20
        .text
        movw rect1+right, %ax   # use sym addition
        subw rect1+left, %ax

        movl $rect1, %ebx       # set up pointer in ebx
        movw right(%ebx), %ax   # use displacement off of ptr
        subw left(%ebx), %ax

p. 75 Combos of addressing modes

        movl srecord(%ebx,%ecx,4), %eax
        movl srecord(%ebx,%ecx), %eax
        movl srecord(,%ecx,4), %eax
        movl srecord(%ebx), %eax
        movl srecord, %eax
        movl (%esp), %eax
        lea srecord(%ebx, %ecx, 4), %eax

p. 77 jmptable.asm: We would do this kind of work in C. We could do it in C and disassemble it if we wanted.

p. 79 C tends to push whole doublewords on the stack. It "widens" arguments to ints, except possibly structs. Thus chars, shorts and ints all take up 4 bytes each on the stack.

p. 80 printw: nothing really new here for us.

Section 3-6: Skip for now.

Section 3-7
p. 91 Conditional Jumps: these are available in gas.
p. 95 Large Conditional Jumps. We are not worrying about pre-386 (that's pre proper-32-bit!!). All jumps can go anywhere in 32-bit space.

Section 3-8 Arithmetic Instructions. Let's do any complicated arithmetic in C. All we might need in gas is adjusting the stack pointer by adding 4, 8, or whatever to it.

Section 3-9 Logic Instructions. These might come in handy, but let's wait until we need them.

Section 3-10 String Primitive Instructions. Let the C library use these for us.

Rest of chapter: skip.
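
The p. 75 combinations all come down to one address calculation: displacement + base + index*scale, with any missing piece counted as 0 (or as 1 for the scale). Here is a rough C model of that arithmetic. The function names and the byte array standing in for memory are illustrative assumptions, not anything from the book or from gas, and the sketch does not model the rule that %esp cannot be an index reg.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the address arithmetic behind a gas operand of the
   form disp(base,index,scale). Pass 0 for a missing base or index and 1 for
   a missing scale. */
uint32_t effective_address(uint32_t disp, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    /* The only scale factors the notes mention are 1, 2, 4 and 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return disp + base + index * scale;
}

/* Hypothetical model of "movl addr, %eax": fetch the 4-byte doubleword that
   starts at byte offset addr of a simulated memory array. */
uint32_t load_long(const unsigned char *mem, uint32_t addr)
{
    uint32_t value;
    memcpy(&value, mem + addr, sizeof value);   /* works for unaligned addr */
    return value;
}
```

For instance, if srecord sits at offset 16, %ebx holds 0 and %ecx holds 2, then effective_address(16, 0, 2, 4) is 24, the address of the third doubleword of srecord. In this model "lea srecord(%ebx,%ecx,4), %eax" stops at that number, while the matching "movl" goes on to the load_long step.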