LN 11: Microsoft Assembler, masm

CS 201
Computer Systems Programming
Chapter 11
“x86 Microsoft Assembler”
Herbert G. Mayer, PSU CS
Status 11/5/2012
Introductory Notes
CS 200 has been eliminated from the PSU CS
curriculum, thus assembly language programming is
de-emphasized. Some Microsoft assembly language
programming will be covered in CS 201, but the focus
will be on reading and understanding .asm source programs.
The main assembler used here is the Microsoft macro
assembler, commonly known as masm. A version can
be downloaded from Microsoft, but it requires that
Visual C++ 2005 be installed.
Assembler mnemonics and symbols in masm differ
somewhat from the asm source emitted by the gcc
compiler; the latter is reminiscent of SPARC asm, with %
identifying registers, and is also the syntax used in
our CS 201 textbook.
1. Find a downloadable masm version 8.0 here:
http://www.microsoft.com/en-us/download/details.aspx?id=12654
2. Or find Microsoft's masm reference here:
http://msdn.microsoft.com/en-us/library/afzk3475.aspx
Assembly Language programs bridge the gap
between low level machine binary instructions and
higher level interface with human programmers. The
former are required to accomplish execution on a
digital computer; the latter are convenient tools of
expression for programmers. Assembly language is a
low-level, target machine specific interface.
But the assembler does present a level of abstraction:
users do not deal with the target in terms of the bits
that represent binary machine instructions. The
assembler elevates the user to the level of a textual
language, up from the level of binary object code.
Common to many architectures is the separation of data
space, instruction space, and perhaps other areas of
program logic. The x86 architecture embodies so-called
data segments, code segments, and stack segments,
with several of each if needed. Each segment is
identified at run time by a segment register.
For example, the code label next: is interpreted by the
HW as seg:offset, where seg is the segment register cs,
and offset is the offset of next from the start of the
code segment. If the offset of next is 248h and the
value in the cs register is 2003h, the resulting run-time
(code) address is 20030h + 248h = 20278h; note the
left shift of the segment value by 4 bits.
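The seg:offset computation can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not of any masm feature; the function name linear_address and the sample values (cs = 2003h, offset = 248h) are assumptions chosen for the example.

```python
def linear_address(segment, offset):
    """Real-mode address: 16-bit segment shifted left by 4 bits, plus 16-bit offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # The 8086 has 20 address lines, so the sum is truncated to 20 bits.
    return ((segment << 4) + offset) & 0xFFFFF


# Hypothetical values: cs = 2003h, offset of the label = 248h.
print(hex(linear_address(0x2003, 0x248)))  # prints 0x20278
```

Shifting by 4 bits is the same as multiplying the segment value by 16, which is why every segment starts on a 16-byte boundary.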
This chapter introduces complete masm assembler
source programs. Starting with the smallest possible
but complete assembly program, doing nothing but
asking DOS for its assisted suicide, we progress to
more sophisticated programs.
One example emits a single character, the next prints
a complete string onto the standard screen, followed
by conventions that allow us to communicate with the
assembler in an abbreviated way. We also discuss
macros and simple procedures.
The “Definitions” below are in alphabetical order; we
cover them in logical order, to minimize forward
referencing.
Syllabus
• Motivation
• Definitions
• Null Program
• Print Single Character
• Print Character String
• Assembler Abbreviations
• Assembler Macros
• Assembler Procs
• Assemble and Link
• References
Motivation
• It is almost impossible to communicate with a machine on the binary level
• The assembler offers a significant level of abstraction from the machine bits, plus relocatability, symbolic names and addresses, and limited program reuse
• Symbols permit easy definition and reference of data and code objects
• Microsoft's masm even offers high-level constructs, similar to high-level language statements
• Assembler programming allows the highest level of control over the target machine
• And it permits achieving the highest performance, at least for short code sections
Definitions
• Address: identifying attribute of any distinguishable memory unit. On the old x86 architecture a logical address is a pair seg:offset, translated by hardware into a so-called linear address. Segment and offset are 16 bits long each in real mode. The machine address, called a linear address, is 20 bits long, with the rightmost (low-order) 4 bits of a segment address implied to be 0, as a segment must be 16-byte-aligned
• Alignment: Attribute of an address a, requiring that a must lie on a specified boundary; for example, the address a must be even, or must be evenly divisible by 4 or by 512. The first case is also called modulo-2 alignment, the last modulo-512 alignment. Note that aligned addresses have some of their low-order address bits set to 0. Hence, if these addresses are stored in hardware, these 0s can be omitted, i.e. they are implied and are supplied again whenever the complete address is needed
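A small Python sketch may make the alignment rule concrete. It assumes power-of-two boundaries (2, 4, 16, 512, ...), and the helper names is_aligned and implied_zero_bits are made up for this illustration.

```python
def is_aligned(address, boundary):
    """True if address lies on the given power-of-two boundary."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a positive power of two")
    # An aligned address has all low-order bits below the boundary set to 0.
    return (address & (boundary - 1)) == 0


def implied_zero_bits(boundary):
    """Number of low-order address bits that are always 0 for this boundary."""
    return boundary.bit_length() - 1


print(is_aligned(0x20030, 16), implied_zero_bits(16))  # prints True 4
```

For a 16-byte-aligned segment start, 4 bits are implied, which is exactly why a 16-bit segment register can describe a 20-bit address.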

Assembler: source to object translator, reading relocatable,
abstract, machine specific source programs, translating
them into binary object code. After linking, the binary code
is executable

Binary Object: strings of bits which, when interpreted by the
target machine, are legal machine operations plus
associated memory references. Jointly, these represent
executable programs

Code Segment: Subsection of an architecture’s memory
which holds executable instructions with possibly
embedded, immediate operands

Data Segment: Subsection of an architecture's memory
which holds data being referenced or manipulated. Like any
segment, a data segment is identified by a segment register,
holding its start address. Such an address must be evenly
divisible by 16 on the x86 family processors; the 16-byte
units that start at such addresses are called paragraphs.

Offset: Distance of a named object (addressable unit) from
the beginning of an area encompassing the name

Paragraph: Range of contiguous memory addresses that is
16 bytes long, and whose first byte address is evenly
divisible by 16

Relocation: Ability of digital computer information to be
placed in any location of memory. For example, referring to
data (or object code) by offsets relative to some start
address allows the code to be placed anywhere, as long as
the respective start address is always added at execution
time

Segment: A subsection of memory. It is identified by a
segment register and holds either code, data, or stack
space; usually has alignment constraint.
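Relocation by offsets can be sketched as follows. The symbol names, their offsets, and the two start addresses are hypothetical; the point is only that the same table of offsets works for any start address, as long as that start address is added at execution time.

```python
def relocate(offsets, start_address):
    """Turn offsets relative to a start address into absolute addresses."""
    return {name: start_address + off for name, off in offsets.items()}


# Hypothetical offsets of two code labels within their segment.
symbols = {"start": 0x0000, "next": 0x0248}

# The same offsets remain valid wherever the segment is placed.
print(hex(relocate(symbols, 0x20030)["next"]))  # prints 0x20278
print(hex(relocate(symbols, 0x50000)["next"]))  # prints 0x50248
```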

Stack: Data structure holding data that are accessed only in
a particular way, named LIFO (last in, first out). The amount
of data varies over time. Data are added through an operation
called pushing and removed via popping. On the x86
architecture the stack segment register ss identifies the
stack segment, the stack pointer sp points to the current,
varying top, and the base pointer bp is typically used to
address data within the stack

Top of Stack: The one element on the stack that is accessible
(visible). There may be other elements in the stack, hidden
by the top element. Additional elements are created by
pushing, and elements are removed by popping
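The push and pop behavior can be modeled with a short Python sketch. It assumes 16-bit words and a stack that grows toward lower addresses, as on the x86; the class name WordStack and the 16-byte stack size are made up for the illustration.

```python
class WordStack:
    """Toy model of a 16-bit stack that grows toward lower addresses."""

    def __init__(self, size_in_bytes):
        self.memory = bytearray(size_in_bytes)
        self.sp = size_in_bytes  # empty stack: sp is just past the last byte

    def push(self, word):
        if self.sp < 2:
            raise OverflowError("stack segment is full")
        self.sp -= 2  # pushing moves the top toward lower addresses
        self.memory[self.sp:self.sp + 2] = (word & 0xFFFF).to_bytes(2, "little")

    def pop(self):
        if self.sp > len(self.memory) - 2:
            raise IndexError("stack is empty")
        word = int.from_bytes(self.memory[self.sp:self.sp + 2], "little")
        self.sp += 2  # popping moves the top back up
        return word


stack = WordStack(16)
stack.push(0x1234)
stack.push(0xABCD)
print(hex(stack.pop()), hex(stack.pop()))  # prints 0xabcd 0x1234
```

Only the element at sp is visible at any time; the word pushed first is hidden until the later one has been popped.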
Null Program
• Set up the program's segments: code, data, and stack
• In the sample below there is only a code segment
• Note the 'code' string that identifies the code segment
• Communicate the implied segment portion of seg:offset in the assume pseudo-instruction
• Define the start address (actually an offset) via a label; here the label is named start:
• A label is a user-defined identifier followed by a colon, in the code segment
• Use DOS services: here 4ch to terminate
• DOS services are requested via INT 21h
• The specific service is defined in register ah and possibly other registers
• The return code is zero, meaning: no errors occurred
• Note comments, introduced by ;
• A comment ends at the end of a line
; Source file: out1.asm
; Author:      Herb Mayer
; Purpose:     simple, meaningless program, no data seg, no stack
; Assembler:   Microsoft assembler, command "masm"; 16-bit version

code_s  segment 'code'
        assume  cs:code_s       ; communicate implied seg register
start:  mov     al, 00          ; termination code for DOS 21, 4ch
        mov     ah, 4ch         ; tell DOS to terminate, 4ch in ah
        int     21h             ; call DOS routine 21h for help
code_s  ends                    ; end of [code] segment
        end     start           ; end's argument defines start
                                ; sounds like Microsoft, say start to stop
Print Single Character

• Also define a data and a stack segment, though both are unused
• dw 999 reserves (defines) an integer word, initialized to 999; not used
• dw 100 dup( 0 ) defines 100 words, all initially 0; not used
• DOS routine 21h is called for help: INT 21h
• Specify to DOS, via the value in ah, which type of help is needed
• E.g. value 2 in ah means: output 1 character, the one in dl
• So DOS routine 2 prints the character found in register dl
• Moving 4c00h into register ax is the same as moving 4ch into register ah and 00 into al
• ah and al are just the two byte registers that, concatenated, form ax; this call will terminate the program
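The relation between ax, ah, and al is plain bit arithmetic, shown here as a Python sketch. The helper names combine and split are made up for the illustration.

```python
def combine(ah, al):
    """Form the 16-bit ax value from its high byte ah and low byte al."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)


def split(ax):
    """Return (ah, al), the high and low bytes of the 16-bit ax value."""
    return (ax >> 8) & 0xFF, ax & 0xFF


# DOS service 4ch in ah, return code 00 in al.
print(hex(combine(0x4C, 0x00)))  # prints 0x4c00
```

So a single mov ax, 4c00h loads both the service number and the return code.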
; Purpose:   simple program to output one character
; Assembler: Microsoft assembler, command "masm"; 16-bit

data_s  segment                 ; unused data segment
        dw      999             ; define a word, init to 999
data_s  ends

stack_s segment                 ; unused stack segment
        dw      100 dup( 0 )    ; reserve 100 words, init to 0
stack_s ends

code_s  segment 'code'          ; THE Code Segment
        assume  cs:code_s, ds:data_s
start:  mov     ax, seg data_s  ; initialize ds, indirectly
        mov     ds, ax
        mov     dl, '$'         ; char literal to be output by DOS
        mov     ah, 2h          ; DOS call 2h emits char in dl
        int     21h             ; call DOS routine 21h
        mov     ax, 4c00h       ; we wanna terminate, ah + al
        int     21h             ; terminate finally via DOS call
code_s  ends                    ; repeat segment name at ends
        end     start           ; end says: Where to start
Print Character String

• The data segment defines a string of bytes, initialized to a string literal and identified by msg
• Note the $ character at the end of the string literal
• It is used as the end criterion by DOS output routine 9
• The stack segment is still a dummy; it holds 10 strings, each of length 16, also unused, just to show a stack segment to students
• DOS routine 9 emits a character string terminated by '$'
• It finds the string's start address in ds:dx, i.e. segment in ds and offset communicated in register dx
• Note the built-in function offset applied to a data label
• Masm also provides the built-in seg function to generate the other part of the address
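The '$' convention of DOS routine 9 can be modeled in Python as follows. This is a sketch of the termination rule only, not of DOS itself: the function name dos_print_string is made up, the bytes object stands in for the data segment, and dx is the offset into it.

```python
def dos_print_string(data_segment, dx):
    """Return the characters from offset dx up to, but excluding, the next '$'."""
    end = data_segment.index(b"$", dx)  # raises ValueError if no '$' follows
    return data_segment[dx:end].decode("ascii")


segment = b"Hello class$"
print(dos_print_string(segment, 0))  # prints Hello class
```

If the '$' is forgotten, the real routine keeps emitting whatever bytes follow in memory; in this model the missing terminator shows up as a ValueError instead.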
; Purpose: simple program to output character string

data_s  segment
msg     db      "Hello class$"  ; note '$' termination
data_s  ends

stack_s segment                 ; unused
        db      10 dup( "---S t a c k----" )
stack_s ends                    ; repeat the name

code_s  segment 'code'
        assume  cs:code_s, ds:data_s
start:  mov     ax, seg data_s
        mov     ds, ax
        mov     dx, offset msg  ; string 2 b output by DOS
        mov     ah, 9h          ; DOS call 9h emits string
        int     21h             ; call DOS
        mov     ax, 4c00h       ; we wanna terminate, ah + al
        int     21h             ; terminate finally via DOS
code_s  ends                    ; end code seg
        end     start           ; start execution here: at 'start'
Assembler Abbreviations
• The directive .model small allows for default abbreviations and assumptions
• For example .data, .code, .stack, and @data are predefined, as are the assume statements
• Here another string is printed, "Hello"; note again the $ terminator
• The macro @data is predefined by masm; it plays the role that seg data_s played in the earlier programs
• Note again the offset function
• Note again DOS routine 9, which outputs the string of characters at the address found in register dx
• A program using the .model small abbreviation is smaller, more compact
• .code ends the previous segment, if any (here data), and starts the code segment
• .data ends the previous segment, if any, and starts the data segment, etc.
; Source file: out4.asm
; note: 16-bit assembler
; Purpose: simpler program to output character string

        .model  small           ; assumes stack data code
        .stack  10h             ; assumes name: stack
        .data                   ; assumes name: data
hi      db      "Hello$"
        .code                   ; assumes name: code
start:  mov     ax, @data       ; @data predefined macro
        mov     ds, ax          ; now data segment reg set
        mov     dx, offset hi   ; string 2 b output by DOS
        mov     ah, 9h          ; DOS call 9h emits string
        int     21h             ; call DOS
        mov     ax, 4c00h       ; we wanna terminate, ah + al
        int     21h             ; terminate finally
        end     start           ; start here, at "start"!
Assembler Macros
• Tired of writing segment and ends? The .model small directive allows defaults and abbreviations
• Macros make program source more readable and easier to maintain
• A macro can be defined anywhere in the assembler source
• It is introduced by a user-defined name and the macro keyword
• It is terminated by the endm keyword
• Macros may have 0 or more parameters, to be used in the macro body
• When a macro name is used, its body is expanded in-line at that place
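In-line expansion with parameter substitution can be mimicked by a few lines of Python. This is a toy text substitution, not how masm is implemented; the table MACROS and the function expand are made up for the illustration, and the two macro bodies mirror the kind of macros used in this chapter.

```python
import re

# Hypothetical macro table: name -> (formal parameters, body lines).
MACROS = {
    "Put_Str": (["Str"], ["mov dx, offset Str", "mov ah, 9h", "int 21h"]),
    "Done": (["ret_code"], ["mov ah, 4ch", "mov al, ret_code", "int 21h"]),
}


def expand(line):
    """Expand one source line; lines that are not macro calls pass through."""
    parts = line.split(None, 1)
    if not parts or parts[0] not in MACROS:
        return [line]
    formals, body = MACROS[parts[0]]
    actuals = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
    if len(actuals) != len(formals):
        raise ValueError("wrong number of macro arguments")
    expanded = []
    for text in body:
        for formal, actual in zip(formals, actuals):
            # Replace whole-word occurrences of the formal by the actual.
            text = re.sub(r"\b%s\b" % re.escape(formal), lambda m: actual, text)
        expanded.append(text)
    return expanded


for out in expand("Put_Str hi"):
    print(out)
```

Each use of a macro name costs its full body in code size, which is the main difference from a procedure call.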
start   macro                   ; no parameters
        mov     ax, @data       ; @data predefined macro
        mov     ds, ax          ; now data segment reg set
        endm                    ; end of start macro

Put_Str macro   Str             ; one formal parameter, "Str"
        mov     dx, offset Str  ; string 2 b output by DOS
        mov     ah, 9h          ; DOS call 9h emits string
        int     21h             ; call DOS
        endm                    ; end of Put_Str macro

Done    macro   ret_code        ; formal parameter "ret_code"
        mov     ah, 4ch         ; we wanna terminate, ah = 4c
        mov     al, ret_code    ; communicate return code
        int     21h             ; terminate finally via DOS
        endm                    ; end of macro body of Done

        .model  small           ; predefined assumptions
        .stack  10h             ; assumes segment name: stack
        .data                   ; assumes segment name: data
hi      db      "Hello$"        ; terminate string with $
        .code                   ; assumes segment name: code
main:   start
        Put_Str hi              ; invoke macro Put_Str, w. hi
        Done    0
        end     main            ; start at: main!
Assembler Procs
• An assembler procedure is delimited by proc and endp
• A procedure can be called; it provides a syntactic grouping mechanism to form logical modules
• Syntax rule for a procedure: the name is not followed by ':', unlike the labels you saw earlier
• The return instruction ret ends the procedure body and returns to the place of call
• Reminiscent of a high-level construct
; Purpose: modular macro program to output string

start   macro                   ; no parameters
        mov     ax, @data       ; @data predefined macro
        mov     ds, ax          ; now data segment reg set
        endm                    ; end of "start" macro body

Put_Str macro   Str             ; "Str" must be data label
        . . .                   other macros as before
        endm                    ; see earlier def of Put_Str macro

        .data                   ; assumes name: data
hi      db      "Hello$"        ; terminate string with $
        .code                   ; assumes name: code
main    proc                    ; begin of procedure body
        start                   ; invoke "start" macro
        Put_Str hi              ; invoke "Put_Str" w. actual
        Done    0               ; invoke "Done" with actual 0
        ret                     ; unnecessary, unreachable
main    endp
        end     main            ; entry point is "main"
Assemble and Link
• Microsoft's old Macro Assembler masm 5.10 to 8.0
• Borland Macro Assembler tasm
• Microsoft's newer Macro Assembler ml 6.22
• Again: Microsoft masm assembler 8.0 for 32-bit processors here: http://www.microsoft.com/en-us/download/details.aspx?id=12654
• Microsoft masm for x64 here: http://msdn.microsoft.com/en-us/library/hb5z4sxd.aspx
• Microsoft Linker link
• Borland Linker tlink
The old version of the Microsoft macro assembler (up to
about 2003 with .NET 2003) is named masm. The newer
assembler product from Microsoft is named ml. This
section explains the masm command briefly.
Users should consult the on-line help by typing masm /h
to get more detailed information. The masm
command version 5.10 and older has 4 arguments,
separated from one another by commas. These
arguments are file names. An argument is considered
omitted if no comma (and thus no file name) is given.
The assembler prompts you for each omitted one, so
it is generally better to provide them, or at least the
commas; otherwise the assembler repeatedly asks for
file names and you have to answer each prompt, if only
with a carriage return.
Assemble Command
If commas without file names are given, then default file names are
assumed. The four file names, which are the arguments of the masm
command, are, left to right:
• assembly source program, say source.asm
• object program generated by the assembler, say source.obj
• the listing, generated by the assembler, say source.lst
• the cross-reference file, named source.crf
The suffixes .obj, .lst, and .crf are supplied automatically by the assembler if
no other names are provided.
Some complete masm commands for the assembler file src1.asm would be:
masm src1.asm, src.obj, src.lst, src.crf ;  no prompting
masm src1,src1,src1,src1                 ;  no prompting
masm src1,src1.obj,src1,src1.crf         ;  no prompting
masm src1,,,;                            ;  no prompting
In the above cases the masm assembler will not prompt you, because every
file name is either given or can be defaulted. It is smart enough to supply
suffixes (like .lst and .obj) according to the respective argument positions.
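The defaulting rule just described can be written down as a small Python sketch. It models only the rule as stated in this section (an empty position takes the source's base name, a missing suffix is chosen by position) and is not a description of masm's internals; the function name resolve_masm_args is made up.

```python
import os

# Default suffix for each of the four argument positions.
DEFAULT_SUFFIXES = (".asm", ".obj", ".lst", ".crf")


def resolve_masm_args(arg_string):
    """Return the four file names implied by a comma-separated masm argument list."""
    fields = [f.strip() for f in arg_string.rstrip("; ").split(",")]
    if len(fields) != 4:
        raise ValueError("expected four comma-separated positions")
    base = os.path.splitext(fields[0])[0]  # source name without suffix
    resolved = []
    for field, suffix in zip(fields, DEFAULT_SUFFIXES):
        name = field or base  # empty position: fall back to the source base name
        has_suffix = bool(os.path.splitext(name)[1])
        resolved.append(name if has_suffix else name + suffix)
    return resolved


print(resolve_masm_args("src1,,,;"))
# prints ['src1.asm', 'src1.obj', 'src1.lst', 'src1.crf']
```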
Link Command
Link also has 4 arguments: the object input, 2 output files, and the
library. The input is the object to be linked. It may be a concatenation
of multiple object files (typically ending in the .obj suffix), strung
together by the + operator. For example:
link mem0 + putdec,,,
creates an executable mem0.exe. The file name mem0 is derived
from the first part of the first argument; the suffix .exe is
assumed. Also, the object file putdec.obj is used as input, to
resolve some of the external names used in mem0.obj. The
arguments of the link command, i.e. the 4 file names, are:
• object file or object files, concatenated by +, with default suffix .obj
• the linked executable, with suffix .exe
• the load map file, whose name ends in .map
• the library
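As a rough model of these four positions, the Python sketch below splits a link argument string and applies the default suffixes. It follows the description in this chapter only (including the nul map file when none is named), ignores prompting entirely, and is not a specification of the linker; the function names with_suffix and resolve_link_args are made up.

```python
import os


def with_suffix(name, suffix):
    """Append the default suffix unless the name already has one."""
    return name if os.path.splitext(name)[1] else name + suffix


def resolve_link_args(arg_string):
    """Return the files implied by 'objects,executable,map,library'."""
    fields = [f.strip() for f in arg_string.split(",")]
    fields += [""] * (4 - len(fields))  # treat missing positions as empty
    objects = [with_suffix(o.strip(), ".obj") for o in fields[0].split("+")]
    base = os.path.splitext(objects[0])[0]  # the first object names the executable
    return {
        "objects": objects,
        "executable": with_suffix(fields[1] or base, ".exe"),
        "map": with_suffix(fields[2], ".map") if fields[2] else "nul",
        "library": with_suffix(fields[3], ".lib") if fields[3] else None,
    }


print(resolve_link_args("mem0 + putdec,,,")["executable"])  # prints mem0.exe
```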
If the input file is provided without a suffix, then the suffix .obj is
assumed. If the executable file is specified without a suffix, then
.exe is assumed; any other file name and explicit suffix is allowed
too.
The file for the load map should be specified; if none is provided,
the linker uses the file name nul, i.e. no map is written. If a name
without suffix is provided, then the .map suffix is assumed. Similarly,
the position for the library must be supplied; its default suffix is .lib.
The commands below do not cause the linker to prompt you for
additional file name inputs, because enough information is given
or can be assumed:
link mem0 + putdec,,,,           ; mem0.exe, no map, no library
link mem0+putdec,foo.bar,,,      ; generate executable foo.bar
link putdec+mem0,mem0.exe,,,     ; mem0.exe
Note that the concatenation operator + may be surrounded by any
number of blanks. The commas may also be surrounded by
blanks. The order of specifying the object files is immaterial,
provided that the main entry point is unambiguous.
The commands below cause the linker to prompt for some
additional information:
link mem0 + putdec        ; ask for executable, map, and library
link mem0+putdec,x.y      ; ask for map and lib
link putdec+mem0,,        ; gen putdec.exe, ask for map and lib
Main Entry Point
Each assembly unit concludes with an end directive (end
statement). This end statement may have an argument, naming one
of the labels or proc names of the program. Such a label
specifies the entry point, i.e. the initial value of ip, set by the
loader.
However, if an executable is composed of multiple objects, there
may be only a single entry point. All other source modules
should not specify an argument after their end statement. If,
however, two or more object modules to be linked into an
executable do have an entry point specified, masm does not
complain. Instead, the entry point of the first such object listed in
the first argument of the link command is taken. And if this is not
the intended entry point, program execution will bring surprises.
References
1. Free masm download:
http://cvrce.blog.com/2009/08/28/masm-v611-freedownload/
2. http://www.emsps.com/oldtools/msasmv.htm
3. ML 64-bit: http://msdn.microsoft.com/en-us/library/s0ksfwcf(v=vs.80).aspx