CS241 Notes on Chapter 3 of Sargent and Shoemaker
p. 58 Fig. 3-1.
fill: movl $51, %ecx # Initialize counter (51 is decimal)
num1 = 0x18 # (num1 is a symbolic constant "sym val")
var2: .byte 0,0 # (2 bytes of 0 in data area)
Note that fill and var2 look more alike here than in the book: they both are followed by a colon, and both define memory addresses,
one in the code area and one in the data area. They are both
called "tags" or "labels". var2 can also be called a variable,
a spot in memory that at least potentially changes while the
program runs. num1 is not a tag or label, as it is not associated
with a memory address, but rather just holds that constant value
0x18. All three of fill, num1, and var2 are "symbols", and
show up in the "symbol table" of the object file.
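A loose C analogy may help place the three kinds of symbol. This is a sketch only: `NUM1`, `var2_address`, and `var2_byte` are hypothetical names chosen to mirror the example, and unlike a gas `=` symbol, a C `#define` never reaches the object file's symbol table.

```c
#include <stddef.h>

/* Like "num1 = 0x18": a named constant with no storage of its own. */
#define NUM1 0x18

/* Like "var2: .byte 0,0": two bytes of storage, initialized to 0,
   living at some address in the data area. */
static unsigned char var2[2] = {0, 0};

/* The address of var2: the C counterpart of the value of the label. */
unsigned char *var2_address(void) { return var2; }

/* The contents of one byte of var2. */
unsigned char var2_byte(size_t i) { return var2[i]; }
```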
p. 59
"title test program" (no gas equivalent)
CR = 0xd # carriage return
dsbseg: .word 0
dascnt: .byte 0
nhelp: .byte 0
xadr: .long 0xffff0000
count: .byte 0 (or use .bss, described below)
end (no gas equivalent)
p. 61 minimal assembler program
# minimal assembly program (it needs C lib, though.
# Later we will be able to write one that doesn't)
.data
msg: .asciz "Hello, World!\n"
stackstart: .fill 2560,1,0 # printf needs much more than 256 bytes
stackend:
.text
movl $stackend, %esp # set up own stack
call _init_devio # init C library i/o
pushl $msg # push address of msg
call _printf # call printf
addl $4,%esp # adjust stack (could drop this)
int $3 # return to Tutor
Note there is no "model" directive or segment setup. The model
is effectively the "tiny" model with huge segments and identical
code and data segments (but don't worry about this, just FYI).
The memory segments are set up once and for all by the boot-up
code. The assembler has two logical memory areas, also called
segments, the text and data segments. The .text directive makes
the following assembly language lines assemble into the text seg,
until a .data directive switches it over to start putting lines
into the data seg. This can switch back and forth several times
in a .s file. All the .text lines are gathered together, in order,
into one area and all the .data lines into another. The loader,
i386-ld, puts the text area first in memory, followed by the data area.
In fact there are three logical assembler segments, text, data, and
bss, the uninitialized data segment. We could put the stack there
since it has no valid values until it is used:
# minimal assembly program (still needs C lib, though)
.data
msg: .asciz "Hello, World!\n"
.text
_start: # (this tag not actually needed)
movl $stack1end, %esp # set up own stack
call _init_devio # init C library i/o
pushl $msg # print message
call _printf # call printf
addl $4,%esp # adjust stack (could drop this)
int $3 # return to Tutor
.bss
stack1: .=.+2560 # same stack size, in bss
stack1end:
The loader will put the text, data, and bss segments in memory in
that order. The executable file will have space for the initial
values for the data segment but no space used for the bss segment.
The bss segment is allocated at runtime and cleared by the
code responsible for setting up the program image.
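The same data/bss split usually happens automatically in C. In this sketch (the names `msg`, `stack1`, `bss_nonzero_count`, and `msg_length` are illustrative), the initialized array would typically be placed in .data and the uninitialized one in .bss. Exact placement is up to the compiler and linker, but the C standard does guarantee that static storage without an initializer starts out zero, just as the bss segment is cleared.

```c
#include <stddef.h>
#include <string.h>

/* Initialized: normally placed in .data, with its bytes stored in the executable. */
static char msg[] = "Hello, World!\n";

/* Uninitialized static storage: normally placed in .bss, takes no space
   in the executable file, and C guarantees it starts out all zero. */
static unsigned char stack1[2560];

/* Count nonzero bytes in stack1; 0 until something writes to it. */
size_t bss_nonzero_count(void) {
    size_t n = 0;
    for (size_t i = 0; i < sizeof stack1; i++)
        if (stack1[i] != 0)
            n++;
    return n;
}

/* Length of the initialized message, not counting the terminating NUL. */
size_t msg_length(void) { return strlen(msg); }
```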
A special makefile is needed for building the above program. Since every
program needs to set up a stack and initialize the C library, the regular
makefile provides this in startup0.s and startup.c, along with the int $3
needed to get back to Tutor. Thus with the usual makefile, all you need
to get the same effect as the above program is:
# minimal assembly program (needs startup0.s, startup.c and C lib,
# but doesn't need a C driver)
.globl _main # to fit with startup.c setup
.data
msg: .asciz "Hello, World!\n"
.text
_main: pushl $msg # print message
call _printf
addl $4,%esp
ret # return to startup.c
p. 64 Fig. 3-1. Just reverse the order of operands here, including
in and out, wherever there are two operands.
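p. 64 xlat instruction. Here is a C-callable function to convert the
low 4 bits of a byte to a hex digit: convb(12) = 'c', for example.
It saves and restores %ebx, which a C caller expects a function to
preserve, and masks the argument to 4 bits so xlat cannot index past
the 16-byte table.
.globl _convb
.text
_convb: pushl %ebx # ebx must be preserved for the C caller
movzbl 8(%esp), %eax # put byte in al, zeroing rest of eax (arg at 8 after push)
andl $0xf, %eax # keep low 4 bits so the index stays inside table
movl $table, %ebx # and table addr in ebx
xlat # replace al with conv'd byte: al = table[al]
popl %ebx # restore caller's ebx
ret
.data
table: .ascii "0123456789abcdef"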
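For comparison, the same lookup in C, which is the kind of thing the notes suggest doing in C anyway. This is a sketch; the masking to 4 bits is an assumption about how out-of-range bytes should be handled.

```c
/* C version of convb: map the low 4 bits of b to a lowercase hex digit.
   Like xlat, it is just a table lookup: result = table[index]. */
char convb(unsigned char b) {
    static const char table[] = "0123456789abcdef";
    return table[b & 0xf]; /* mask keeps the index inside the 16-entry table */
}
```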
p. 67
won: .byte 0 # counter for games won
lost: .byte 0 # counter for games lost
board: .fill 9, 1, 0 # 3x3 array for Tic-Tac-Toe board
msg1: .asciz "You win"
msg2: .asciz "You lose"
But note that a byte counter can only count to 127 before it "goes
negative" (when treated as signed), or to 255 before wrapping around
to 0, so it's usually better to use a doubleword.
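A small C sketch of the problem (the helper names are hypothetical): counting in an 8-bit cell, the 128th increment produces a byte that reads as -128 when treated as signed, and the 256th wraps back to 0.

```c
#include <stdint.h>

/* Increment an 8-bit counter n times, starting from 0, and return the raw byte. */
uint8_t count_byte(int n) {
    uint8_t c = 0;
    for (int i = 0; i < n; i++)
        c++; /* unsigned 8-bit arithmetic wraps from 255 to 0 */
    return c;
}

/* Read a raw byte as a two's-complement signed value, which is how a signed
   compare or a sign-extending load would see it. Done arithmetically to avoid
   relying on implementation-defined conversion to int8_t. */
int as_signed_byte(uint8_t b) {
    return b < 128 ? (int)b : (int)b - 256;
}
```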
p. 68
Direct addressing
"movb %al, won" for a byte counter, or "movl %eax, won"
for a doubleword counter. "movb $3, won" puts a 3 in won.
"movb board+3, %al" moves the contents of loc. 1023 into al. When
the assembler sees "board+3", it looks up the 32-bit address of board
and then does the indicated addition to yield a 32-bit address
for the operand.
Intel "mov dx, offset won" is simply "movl $won, %edx" in gas, the same
syntax as constant data. So you have to be sure to discriminate between
"movl won, %edx" (assuming a doubleword count) and "movl $won, %edx".
The first copies the contents of variable won to edx; the second
puts the address of the variable won into edx.
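The same distinction in C terms, as a sketch with hypothetical values (won starting at 7, and board[3] set to 5 so there is something to see): the contents of a variable versus its address, and a constant offset added to an address.

```c
#include <stdint.h>

/* Hypothetical contents so the difference is visible. */
static int32_t won = 7;                       /* like a doubleword "won" */
static unsigned char board[9] = {0, 0, 0, 5}; /* 3x3 board, board[3] = 5 */

/* "movl won, %edx": copy the contents of won. */
int32_t load_won(void) { return won; }

/* "movl $won, %edx": take the address of won, not its value. */
int32_t *addr_won(void) { return &won; }

/* "movb board+3, %al": address of board plus 3, then load the byte there. */
unsigned char load_board3(void) { return *(board + 3); }
```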
Indirect Addressing
Examples: "movl (%esi),%ecx" uses the number in esi as a pointer
and copies the pointed-to doubleword to ecx.
"movl %ebx,(%edi)" uses the contents of edi as a pointer in the
destination of the doubleword copy.
"movb %al,(%ebx)" uses the contents of ebx as a pointer in the
destination of the byte-wide copy.
Intel "mov al, [bx]+5" is written in gas as "movb 5(%ebx),%al".
This copies into al the byte located 5 bytes beyond where ebx points.
Note this is close to an alternative Intel expression
for the same address mode: "mov al, 5[bx]", although still another
alternative seems more common: "mov al, [bx + 5]".
This example is close to that of Fig. 3-2. There, the Intel
"mov ax,[bx+3]" corresponds to gas "movl 3(%ebx),%eax", which
copies the doubleword starting at 8003 (0x00008000 from %ebx, plus 3)
into eax. A doubleword is 4 bytes, so the figure should show 2 more
bytes being copied, from 8005 and 8006 (aka 00008005 and 00008006,
showing all 32 bits). The byte at the lowest address, 8003, lands in
the low byte of eax, since x86 is little-endian.
"mov [bx]+board, al" AKA "mov board[bx], al" AKA "mov [bx + board], al"
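To make the byte order concrete, here is a sketch that assembles a doubleword from 4 bytes of memory the way movl does. The memory is modeled as a small array in which index 0 stands for address 0x8000; the helper name `load32` and the byte values used with it are hypothetical.

```c
#include <stdint.h>

/* Read the little-endian doubleword at mem[addr], as movl does on x86:
   the byte at addr becomes the low byte, the byte at addr+3 the high byte. */
uint32_t load32(const uint8_t *mem, uint32_t addr) {
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}
```

With %ebx standing for 0x8000 (index 0), "movl 3(%ebx),%eax" corresponds to load32(mem, 0 + 3), which touches the bytes at 8003, 8004, 8005, and 8006.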
is written in gas as "movb %al, board(%ebx)". Here the address of
board (the symbol value of board) is added to the contents of ebx
to get a pointer value to use in the destination.
"mov [si + 3 + won], cl", an improbable instruction in the given
setup, would be "movb %cl,(3+won)(%esi)" or "movb %cl, 3+won(%esi)".
Parentheses can be used in symbol expressions. When they enclose
a register, that makes it indirect.
Base Plus Index Addressing
The kinds of addressing that combine addresses from two registers
are called "base plus index" addressing.
Intel "mov ax,[bx+si]" corresponds to gas "movl (%ebx,%esi),%eax".
Here ebx is the "base" reg and esi is the "index" reg. The CPU
just sums the two reg vals and uses that as a pointer. Using
32-bit registers, any reg can be used for the base or index reg,
except that %esp cannot be used as an index reg (Table 3-2).
The index reg can be "scaled" by 1, 2, 4, or 8, i.e. that reg's
contribution is multiplied by this amount. This is good for
array referencing. For example, if %esi contains the array
index in a doubleword array starting at location table, then
we put table's address in %ebx and then Intel "mov eax,[ebx+4*esi]"
or gas "movl (%ebx,%esi,4),%eax" will load eax with that
table doubleword element.
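As a sketch of what the scaled form computes (the function name `load_scaled` is hypothetical): the scale is the element size, so base + 4*index lands on element number index of a doubleword array, which is exactly what C's `table[i]` means.

```c
#include <stdint.h>

/* movl (%ebx,%esi,4),%eax with ebx = &table[0] and esi = i:
   the CPU computes ebx + 4*esi and loads the doubleword found there. */
int32_t load_scaled(const int32_t *table, uint32_t i) {
    const char *base = (const char *)table;               /* %ebx as a byte address */
    return *(const int32_t *)(base + 4 * (uintptr_t)i);   /* scale 4 = sizeof(int32_t) */
}
```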
p. 73 Structures
Gas does not have structures. We can give symbol names to
field offsets and have some of the advantages of this facility.
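For comparison, C does have structures, and the compiler works out the field offsets that the symbols below assign by hand. A sketch, assuming 2-byte fields as in the notes; `offsetof` from `<stddef.h>` reports each offset, and on common ABIs there is no padding between 2-byte fields, so the numbers match. The name `rect_width` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Word-wide RECT: each field is 2 bytes, at offsets 0, 2, 4, 6. */
struct rect {
    int16_t left, top, right, bottom;
};

/* Width computed through a pointer, like
   movl $rect1,%ebx; movw right(%ebx),%ax; subw left(%ebx),%ax */
int rect_width(const struct rect *r) { return r->right - r->left; }
```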
For RECT, we have (sticking with word-wide quantities here, maybe
we have millions of these in memory...)
left = 0
top = 2
right = 4
bottom = 6
sizeofrect = 8
.data
rect1: .word 0,0,20,20
.text
movw rect1+right, %ax # use sym addition
subw rect1+left, %ax
movl $rect1, %ebx # set up pointer in ebx
movw right(%ebx), %ax # use displacement off of ptr
subw left(%ebx), %ax
p. 75
Combos of addressing modes
movl srecord(%ebx,%ecx,4), %eax
movl srecord(%ebx,%ecx), %eax
movl srecord(,%ecx,4), %eax
movl srecord(%ebx), %eax
movl srecord, %eax
movl (%esp), %eax
lea srecord(%ebx, %ecx, 4), %eax
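Every form above computes one effective address as disp + base + index*scale, with missing parts taken as 0 (and a missing scale as 1); lea stores that address in the destination instead of loading from it, and "movl (%esp), %eax" is simply base %esp with no displacement. A sketch of the arithmetic in C (`effective_address` is a hypothetical helper):

```c
#include <stdint.h>

/* Effective address for gas "disp(base,index,scale)": disp + base + index*scale.
   Missing parts count as 0 (a missing scale as 1); valid scales are 1, 2, 4, 8.
   uint32_t arithmetic wraps at 32 bits, as address arithmetic does on the 386. */
uint32_t effective_address(uint32_t disp, uint32_t base,
                           uint32_t index, uint32_t scale) {
    return disp + base + index * scale;
}
```

For example, if srecord were at 0x1000, %ebx held 0x20, and %ecx held 3, then srecord(%ebx,%ecx,4) would address effective_address(0x1000, 0x20, 3, 4) = 0x102c, and the lea would put 0x102c itself into %eax.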
p. 77 jmptable.asm: We would do this kind of work in C.
We could do it in C and disassemble it if we wanted.
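As a generic illustration (not a translation of the book's jmptable.asm), a jump table in C is an array of code addresses indexed by a number. The handler names are hypothetical. Compilers often turn a dense `switch` into a similar table, which is one thing worth looking for when disassembling.

```c
#include <stddef.h>

/* Hypothetical handlers; the table below holds their addresses. */
static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

static int (*const jump_table[])(int, int) = { op_add, op_sub, op_mul };

/* Dispatch on index k, with a bounds check so a bad index cannot jump to garbage. */
int dispatch(size_t k, int a, int b) {
    if (k >= sizeof jump_table / sizeof jump_table[0])
        return 0; /* hypothetical "default" case */
    return jump_table[k](a, b);
}
```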
p. 79 C tends to push whole doublewords on the stack. It "widens"
small arguments to ints, so chars, shorts and ints all take up 4 bytes
each on the stack; doubles (8 bytes) and structs can take more.
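The widening is easy to see with a variadic function, where C's default argument promotions always apply. In this sketch (`sum_promoted` is a hypothetical name), the caller passes a char and a short, but each arrives as an int, so va_arg must fetch them as int.

```c
#include <stdarg.h>

/* Sum n variadic arguments. Callers may pass chars or shorts, but the
   default argument promotions widen them to int, so each one is fetched
   with va_arg(ap, int); va_arg(ap, char) would be undefined behavior. */
int sum_promoted(int n, ...) {
    va_list ap;
    va_start(ap, n);
    int total = 0;
    for (int i = 0; i < n; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```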
p. 80 printw--nothing really new here for us.
Section 3-6: Skip for now.
Section 3-7
p. 91 Conditional Jumps: these are available in gas
p. 95 Large Conditional Jumps. We are not worrying about pre-386
processors (that's pre-proper-32-bit!!). From the 386 on, conditional
jumps have a 32-bit displacement form, so all jumps can go anywhere in
32-bit space.
Section 3-8 Arithmetic Instructions.
Let's do any complicated arithmetic in C. All we might need in gas
is adjusting the stack pointer by adding 4, 8, or whatever to it.
Section 3-9 Logic Instructions.
These might come in handy, but let's wait until we need them.
Section 3-10 String Primitive Instructions
Let the C library use these for us.
Rest of chapter--skip