Encoding x86 Instructions

THE PC MACHINE INSTRUCTION SET
Many instructions have a specific purpose, so that a 1-byte machine language instruction
code is adequate. The following are examples:
MACHINE   SYMBOLIC
CODE      INSTRUCTION   COMMENT
40        INC AX        ;Increment AX
50        PUSH AX       ;Push AX
C3        RET (near)    ;Near return from procedure
CB        RET (far)     ;Far return from procedure
FD        STD           ;Set Direction Flag
None of these instructions makes a direct reference to memory. Instructions that specify an
immediate operand, two registers, or a reference to memory are more complex and require two
or more bytes of machine code.
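As a minimal sketch of how the one-byte codes listed above might be looked up, the following Python fragment maps each code to its symbolic instruction. The names ONE_BYTE_OPCODES and disassemble_one_byte are hypothetical, and the table covers only the five codes shown here, not the full instruction set.

```python
# Hypothetical lookup table holding only the five one-byte codes listed above.
ONE_BYTE_OPCODES = {
    0x40: "INC AX",
    0x50: "PUSH AX",
    0xC3: "RET",        # return within the current code segment
    0xCB: "RET (far)",  # return to a different code segment
    0xFD: "STD",
}


def disassemble_one_byte(code):
    """Translate each byte with the table; raises KeyError for any other code."""
    return [ONE_BYTE_OPCODES[byte] for byte in code]


print(disassemble_one_byte(bytes.fromhex("4050FDC3")))
```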
Machine code has a special provision for indicating a particular register and another
provision for referencing memory by means of an addressing mode byte.
REGISTER NOTATION
Instructions that reference a register may contain three bits that indicate the particular register
and a w-bit that indicates whether the width is a byte (0) or a word (1). Also, only certain
instructions may access the segment registers. Figure 1 shows the register notations. For
example, bit value 000 means AL if the w bit is 0 and AX if it is 1.
Here’s the symbolic and machine code for a MOV instruction with a one-byte immediate
operand:
MOV AH,00     10110 100  00000000
                  | |||
                  w reg = AH
In this case, the first byte of machine code indicates a width of one byte (w = 0) and refers to
AH (100). Here’s a MOV instruction that contains a one-word immediate operand, along with
its generated machine code:
MOV AX,00     10111 000  00000000 00000000
                  | |||
                  w reg = AX
The first byte of machine code indicates a width of one word (w = 1) and refers to AX (000).
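For other instructions, w and reg may occupy different positions. Also, the first byte of machine
code may contain a d-bit that indicates the direction of data flow between the operands: d = 1
means that the reg field names the destination, and d = 0 means that it names the source.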
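The two MOV examples can be checked with a short Python sketch that pulls the w and reg fields out of the first byte. It assumes the 1011 w reg layout shown above and 16-bit registers only; the function name decode_mov_imm is hypothetical, and the register lists follow Figure 1.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]   # w = 0
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]  # w = 1


def decode_mov_imm(code):
    """Decode MOV reg,immediate, whose first byte has the form 1011 w reg."""
    first = code[0]
    if first >> 4 != 0b1011:
        raise ValueError("not a MOV reg,immediate operation code")
    w = (first >> 3) & 1      # bit 3: 0 = byte, 1 = word
    reg = first & 0b111       # low three bits select the register
    if w == 0:
        return f"MOV {REG8[reg]},{code[1]:02X}"
    immediate = code[1] | (code[2] << 8)  # a word immediate is stored low byte first
    return f"MOV {REG16[reg]},{immediate:04X}"


print(decode_mov_imm(bytes([0b10110100, 0x00])))        # MOV AH,00
print(decode_mov_imm(bytes([0b10111000, 0x00, 0x00])))  # MOV AX,0000
```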
Bits for                General, Base, and Index Registers
Segment Registers       Bits    w=0     w=1
000 ES                  000     AL      AX/EAX
001 CS                  001     CL      CX/ECX
010 SS                  010     DL      DX/EDX
011 DS                  011     BL      BX/EBX
100 FS                  100     AH      SP
101 GS                  101     CH      BP
                        110     DH      SI
                        111     BH      DI
Figure 1 Register Notation
THE ADDRESSING MODE BYTE
The mode byte, when present, occupies the second byte of machine code and consists of the
following three elements:
mod   A 2-bit mode, where the values 00, 01, and 10 refer to memory locations and 11 refers to
      a register
reg   A 3-bit reference to a register
r/m   A 3-bit reference to a register or memory, where r specifies which register and m
      indicates a memory address
The following example adds AX to BX:
ADD BX,AX     00000011  11  011 000
                    ||  ||  ||| |||
                    dw  mod reg r/m
d = 1 means that reg (011) describes the first operand and that mod (11) and r/m (000) describe
the second operand. Since w = 1, the width is a word. Therefore, the instruction is to add AX
(000) to BX (011).
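A small Python sketch can make the field boundaries concrete by splitting the mode byte with shifts and masks. It is an illustration only: it handles just the register-to-register form of ADD (mod = 11), and the names split_mode_byte and decode_add_reg_reg are hypothetical.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]   # w = 0
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]  # w = 1


def split_mode_byte(mode):
    """Return the (mod, reg, r/m) fields of an addressing mode byte."""
    return (mode >> 6) & 0b11, (mode >> 3) & 0b111, mode & 0b111


def decode_add_reg_reg(code):
    """Decode ADD with the operation code 000000dw and a mode byte with mod = 11."""
    opcode, mode = code[0], code[1]
    if opcode & 0b11111100 != 0:
        raise ValueError("not an ADD of the form 000000dw")
    d = (opcode >> 1) & 1
    w = opcode & 1
    mod, reg, rm = split_mode_byte(mode)
    if mod != 0b11:
        raise ValueError("this sketch handles register operands only")
    names = REG16 if w else REG8
    # d = 1: reg is the first (destination) operand; d = 0: r/m is.
    first, second = (reg, rm) if d else (rm, reg)
    return f"ADD {names[first]},{names[second]}"


print(split_mode_byte(0b11011000))                            # (3, 3, 0)
print(decode_add_reg_reg(bytes([0b00000011, 0b11011000])))    # ADD BX,AX
```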
Mod Bits. The two mod bits distinguish between addressing of registers and memory.
00   r/m bits give the exact addressing option; no offset byte (unless r/m = 110).
01   r/m bits give the exact addressing option; one offset byte.
10   r/m bits give the exact addressing option; two offset bytes.
11   r/m specifies a register. The w-bit (in the operation code byte) determines whether a
     reference is to an 8-, 16-, or 32-bit register.
Reg Bits. The three reg bits, in association with the w-bit, determine the actual register and its width.
R/M Bits. The three r/m (register/memory) bits, in association with the mod bits, determine
the addressing mode, as shown in Figure 2.
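The lookup that Figure 2 tabulates can be sketched in Python as follows. This is an illustrative helper for 16-bit addressing only; the name addressing_mode is hypothetical, the offset is printed as raw hexadecimal (one byte for mod = 01, two bytes for mod = 10), and default segment registers are left out for brevity.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]   # mod = 11, w = 0
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]  # mod = 11, w = 1
RM_BASE = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def addressing_mode(mod, rm, w=1, disp=0):
    """Describe the operand selected by the mod and r/m fields."""
    if mod == 0b11:                      # register operand
        return (REG16 if w else REG8)[rm]
    if mod == 0b00 and rm == 0b110:      # special case: direct address
        return f"[{disp:04X}]"
    if mod == 0b00:                      # no offset bytes
        return f"[{RM_BASE[rm]}]"
    digits = 2 if mod == 0b01 else 4     # one or two offset bytes
    return f"[{RM_BASE[rm]}+{disp:0{digits}X}]"


print(addressing_mode(0b00, 0b111))                # [BX]
print(addressing_mode(0b00, 0b110, disp=0x1234))   # [1234]
print(addressing_mode(0b01, 0b110, disp=0x08))     # [BP+08]
print(addressing_mode(0b11, 0b011, w=0))           # BL
```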
r/m   mod=00     mod=01 or 10        mod=11   mod=11
                                     w=0      w=1
000   [BX+SI]    DS:[BX+SI+disp]     AL       AX
001   [BX+DI]    DS:[BX+DI+disp]     CL       CX
010   [BP+SI]    SS:[BP+SI+disp]     DL       DX
011   [BP+DI]    SS:[BP+DI+disp]     BL       BX
100   [SI]       DS:[SI+disp]        AH       SP
101   [DI]       DS:[DI+disp]        CH       BP
110   Direct     SS:[BP+disp]        DH       SI
111   [BX]       DS:[BX+disp]        BH       DI
Figure 2 The r/m Bits
Two-Byte Instructions
The following two-byte instruction adds BX to AX:
ADD AX,BX     00000011  11  000 011
                    ||  ||  ||| |||
                    dw  mod reg r/m
d = 1      reg plus w describes the first operand (AX), and mod plus r/m plus w describes the
           second operand (BX).
w = 1 The width is a word.
mod = 11 The second operand is a register.
reg = 000 The first operand is AX.
r/m = 011 The second operand is BX.
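Going the other way, the same fields can be packed to produce the two bytes. The Python sketch below assumes the ADD operation code 00000011 (d = 1, w = 1) used in this example and 16-bit registers; the function name encode_add_r16 is hypothetical.

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def encode_add_r16(first, second):
    """Encode ADD first,second for two 16-bit registers, with d = 1 and w = 1."""
    opcode = 0b00000011                  # ADD, d = 1, w = 1
    reg = REG16.index(first)             # reg field: first operand
    rm = REG16.index(second)             # r/m field: second operand
    mode = (0b11 << 6) | (reg << 3) | rm # mod = 11 marks a register operand
    return bytes([opcode, mode])


print(encode_add_r16("AX", "BX").hex())  # 03c3
print(encode_add_r16("BX", "AX").hex())  # 03d8
```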
The next example multiplies AL by BL:
MUL BL        11110110  11  100 011
                     |  ||  ||| |||
                     w  mod reg r/m
The width (w = 0) is a byte, mod (11) references a register, and the register (r/m = 011) is BL.
Reg = 100 does not name a register here; it is part of the operation code and selects MUL among
the instructions that share the first byte 11110110. The processor assumes that the multiplicand
is in AL if the multiplier is a byte (as in this example), AX if a word, and EAX if a doubleword.
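The implied accumulator can be made explicit in a short Python sketch. It is a simplification that accepts only the register form of MUL (mod = 11, reg = 100) with byte or word operands; the function name decode_mul_reg is hypothetical.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]   # w = 0
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]  # w = 1


def decode_mul_reg(code):
    """Return (instruction text, implied multiplicand register) for MUL reg."""
    opcode, mode = code[0], code[1]
    if opcode & 0b11111110 != 0b11110110:
        raise ValueError("first byte is not 1111011w")
    w = opcode & 1
    mod, reg, rm = (mode >> 6) & 0b11, (mode >> 3) & 0b111, mode & 0b111
    if mod != 0b11 or reg != 0b100:
        raise ValueError("this sketch handles only MUL with a register operand")
    multiplier = (REG16 if w else REG8)[rm]
    multiplicand = "AX" if w else "AL"   # implied by the width, not encoded
    return f"MUL {multiplier}", multiplicand


print(decode_mul_reg(bytes([0b11110110, 0b11100011])))  # ('MUL BL', 'AL')
```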
Three-Byte Instructions
The following MOV generates three bytes of machine code:
MOV mem-word,AX   10100011  mmmmmmmm mmmmmmmm
                        ||
                        dw
A move from the accumulator (AX or AL) needs to know only whether the operation is byte or word. In
this example, w = 1 means a word, and the 16-bit AX is understood. (AL coded in the second operand
would cause the w bit to be zero.) Bytes 2 and 3 contain the offset to the memory location. Using the
accumulator register often generates a shorter instruction length and faster execution than the use of
other registers.
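To illustrate the length difference, the following Python sketch encodes a store of AX with the accumulator form and a store of another register with the general form that needs a mode byte. The function names are hypothetical, and the general form is assumed to be the MOV operation code 10001001 (d = 0, w = 1) with mod = 00 and r/m = 110 for a direct address; the offset is stored low byte first.

```python
def mov_mem_acc(offset, w=1):
    """MOV mem,AL/AX: operation code 1010001w followed by a two-byte offset."""
    return bytes([0b10100010 | w, offset & 0xFF, (offset >> 8) & 0xFF])


def mov_mem_reg16(offset, reg):
    """MOV mem,reg16 in the general form: operation code, mode byte, offset."""
    mode = (0b00 << 6) | (reg << 3) | 0b110   # mod = 00, r/m = 110: direct address
    return bytes([0b10001001, mode, offset & 0xFF, (offset >> 8) & 0xFF])


short_form = mov_mem_acc(0x1234)          # MOV [1234],AX
general_form = mov_mem_reg16(0x1234, 3)   # MOV [1234],BX (reg = 011)
print(short_form.hex(), len(short_form))      # a33412 3
print(general_form.hex(), len(general_form))  # 891e3412 4
```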
Four-Byte Instructions
The following four-byte instruction multiplies AL by a memory location:
MUL mem-byte  11110110  00  100 110  mmmmmmmm mmmmmmmm
                     |  ||  ||| |||
                     w  mod reg r/m
For this instruction, although reg is 100, the multiplicand is assumed to be AL (one byte, because w =
0). Mod = 00 indicates a memory reference, and r/m = 110 means a direct reference to memory; the
two subsequent bytes provide the offset to the memory location.
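The byte counts in these examples follow from the mod and r/m fields, which the Python sketch below computes. It assumes 16-bit addressing and covers only instructions made of an operation code byte, a mode byte, and an optional offset (no immediate operand); both function names are hypothetical.

```python
def offset_bytes(mode):
    """Number of offset bytes that follow the given addressing mode byte."""
    mod, rm = (mode >> 6) & 0b11, mode & 0b111
    if mod == 0b00:
        return 2 if rm == 0b110 else 0   # r/m = 110 is a direct address
    if mod == 0b01:
        return 1
    if mod == 0b10:
        return 2
    return 0                             # mod = 11: register operand


def instruction_length(code):
    """Operation code byte + mode byte + offset bytes."""
    return 2 + offset_bytes(code[1])


print(instruction_length(bytes([0b11110110, 0b11100011])))  # MUL BL: 2
print(instruction_length(bytes([0b11110110, 0b00100110])))  # MUL mem-byte: 4
```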
The next example illustrates the LEA instruction, which specifies a word address:
LEA DX,memory   10001101  00  010 110  mmmmmmmm mmmmmmmm
                          ||  ||| |||
                          mod reg r/m
Reg = 010 designates DX; mod = 00 and r/m = 110 indicate a direct reference to a memory
address; the next two bytes provide the offset to this location.
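As a closing sketch, the four bytes of this LEA can be assembled in Python from the fields just described. The function name encode_lea_direct is hypothetical, and the offset value 0100 in the example is an arbitrary placeholder for the address of the memory operand.

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def encode_lea_direct(register, offset):
    """Encode LEA reg16,memory with a direct address (mod = 00, r/m = 110)."""
    mode = (0b00 << 6) | (REG16.index(register) << 3) | 0b110
    # The offset follows the mode byte, low byte first.
    return bytes([0b10001101, mode, offset & 0xFF, (offset >> 8) & 0xFF])


print(encode_lea_direct("DX", 0x0100).hex())  # 8d160001
```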