INTEL 64 ARCHITECTURE

INTEL 64 ARCHITECTURE - I
Assembler Language
H. Wiklicky (with thanks to A. Gopalan, R. Hayden and N Dulay)
h.wiklicky@imperial.ac.uk
Books
• Guide to assembly language programming in Linux
Sivarama P. Dandamudi, Springer 2005
Good introduction to Linux assembly programming
Available as an e-book in the Imperial Library
• Computer systems: A programmer’s perspective
Randal E. Bryant and David O’Hallaron, Prentice-Hall 2003
Aimed at Linux/BSD. Uses GNU assembler (gas) and C.
• Intel 64 and IA-32 manuals
http://www.intel.co.uk/content/www/us/en/processors/architectures-software-developer-manuals.html
Internet resources
• PC assembly language
Paul Carter
http://www.drpaulcarter.com/pcasm
• The art of assembly language programming
Randall Hyde
http://www.plantation-productions.com/Webster
• The netwide assembler nasm
http://www.nasm.us
Where does Assembly Language Fit?
• Compilation stages during a single invocation of gcc:
  • Preprocessing (to expand macros)
  • Compilation
    • source code → assembly language
  • Assembly
    • assembly language → machine code using an assembler
    • Specific to a particular architecture
    • Each statement corresponds to a single machine code instruction
    • Uses a mnemonic to represent each low-level machine operation (or opcode)
    • E.g. 10110000 01100001 → B0 61 → mov AL, 61h
  • Linking (to create the final executable)
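To make the mnemonic-to-opcode mapping concrete, here is a minimal sketch in nasm syntax. The byte encodings in the comments are the standard ones for these instructions; the choice of extra registers and the `bits 64` setting are assumptions for illustration, not taken from the slides.

```nasm
; Sketch only: each line is one mnemonic and assembles to the bytes in its comment.
bits 64                         ; assumption: assemble for 64-bit mode

        mov     al, 61h         ; B0 61  (B0 = "mov al, imm8", 61 = the immediate)
        mov     cl, 61h         ; B1 61  (opcode is B0 + register number, cl = 1)
        mov     bl, 61h         ; B3 61  (bl = 3)
        nop                     ; 90
        ret                     ; C3
```

Assembling this with `nasm -f bin` and adding `-l` to request a listing file should show the same bytes next to each source line.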
Compilation Hierarchy
High-Level Language (C/C++/Pascal)
Assembly Language
Machine Language
Hardware
Intel architecture family
CPU          Cores  Year  Data bus (bits)  Max. mem.  Transistors  Clock speed      Av. MIPS  Level-1 caches
8086         1      1978  16               1 MB       29K          5 – 10 MHz       0.8       -
80286        1      1982  16               16 MB      134K         8 – 12 MHz       2.7       -
80386        1      1985  32               4 GB       275K         16 – 33 MHz      6         -
80486        1      1989  32               4 GB       1.2M         25 – 100 MHz     20        8 KB
Pentium      1      1993  64               4 GB       3.1M         60 – 233 MHz     100       8 KB + 8 KB
Pentium Pro  1      1995  64               64 GB      5.5M         150 – 200 MHz    440       8 KB + 8 KB + L2
Pentium II   1      1997  64               64 GB      7M           266 – 450 MHz    466+      16 KB + 16 KB + L2
Pentium III  1      1999  64               64 GB      8.2M         0.5 – 1 GHz      1000+     16 KB + 16 KB + L2
Pentium 4    1      2001  64               64 GB      42M          1.3 – 3.8 GHz    9000+     12 KB + 8 KB + L2
Core 2       1–4    2006  64               256 TB     291M         1.06 – 3.33 GHz  20000+    32 KB + 32 KB + L2
Core i7      2–6    2008  64               256 TB     781M         1.6 – 3.4 GHz    50000+    32 KB + 32 KB + L2
Confusing Nomenclature (Intel 64 vs. IA-64)
• Two different instruction sets and architectures
• Intel 64
  • Formerly known as EM64T or IA-32e; generically called x86-64
  • 64-bit extended instruction set based on the x86 processor architecture
  • Originally designed by AMD (as AMD64)
  • Can also run 32-bit applications and 32-bit operating systems
  • Backward compatibility, which is key to the success of Intel x86 processors
• IA-64
  • Based on an entirely different architecture
  • Used only by Intel Itanium processors
  • No backward compatibility with IA-32 software
  • Originally included hardware emulation for 32-bit applications, but now relies on software emulation
Intel 64
• Advantages
  • The 64-bit address space (long mode)
  • An extended register set
    • Extends the general registers to 64 bits
    • Adds 8 additional integer registers (r8 … r15)
  • An instruction set familiar to developers
  • The capability to run legacy 32-bit applications in a 64-bit operating system
  • The capability to use 32-bit operating systems
  • Supports RIP-relative addressing
    • Easier to write position-independent code
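As an illustration of the extended register set and RIP-relative addressing, the following is a minimal sketch. It assumes Linux on x86-64, ELF64 output and a `_start` entry point; the variable name `counter` is hypothetical, and `default rel` is the nasm directive that makes plain `[label]` references RIP-relative.

```nasm
; Sketch only. Assumptions: Linux x86-64, ELF64 output, entry point _start.
default rel                     ; encode [label] operands relative to rip

global _start

section .data
counter dq 41                   ; hypothetical 64-bit variable

section .text
_start:
        mov     r8, [counter]   ; RIP-relative load into one of the new registers
        inc     r8              ; r8 = 42
        mov     [counter], r8   ; RIP-relative store
        lea     r9, [counter]   ; address of counter, computed relative to rip

        mov     eax, 60         ; Linux exit system call number (assumption)
        xor     edi, edi        ; exit status 0
        syscall
```

Because no absolute address is encoded in these instructions, the code works unchanged wherever it is loaded, which is what makes position-independent code easier to write.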
Registers (64-bit)
Bits 63..0   Role
rax          ‘A’ register
rbx          ‘B’ register
rcx          ‘C’ register
rdx          ‘D’ register
rsi          source index
rdi          destination index
rsp          stack pointer
rbp          base pointer
Registers (32-bit)
Bits 63..0   Bits 31..0   Role
rax          eax          ‘A’ register
rbx          ebx          ‘B’ register
rcx          ecx          ‘C’ register
rdx          edx          ‘D’ register
rsi          esi          source index
rdi          edi          destination index
rsp          esp          stack pointer
rbp          ebp          base pointer
Registers (32-bit)
Bits 31..0   Role
eax          ‘A’ register
ebx          ‘B’ register
ecx          ‘C’ register
edx          ‘D’ register
esi          source index
edi          destination index
esp          stack pointer
ebp          base pointer
Registers (16-bit)
Bits 31..0   Bits 15..0   Role
eax          ax           ‘A’ register
ebx          bx           ‘B’ register
ecx          cx           ‘C’ register
edx          dx           ‘D’ register
esi          si           source index
edi          di           destination index
esp          sp           stack pointer
ebp          bp           base pointer
Registers (8-bit)
Bits 31..0   Bits 15..8   Bits 7..0   Role
eax          ah           al          ‘A’ register
ebx          bh           bl          ‘B’ register
ecx          ch           cl          ‘C’ register
edx          dh           dl          ‘D’ register
The two least significant bytes of rax, rbx, rcx and rdx also have their own
register names (e.g. al and ah for rax), which can be used to access those
bytes individually.
The least significant bytes of rsi, rdi, rsp and rbp are called sil, dil, spl
and bpl respectively.
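The register naming scheme can be tried out with a short program. This is a sketch under assumptions: Linux on x86-64, ELF64 output, a `_start` entry point, and an arbitrary test value chosen so that every byte is recognisable.

```nasm
; Sketch only. Assumptions: Linux x86-64, ELF64 output, entry point _start.
global _start

section .text
_start:
        mov     rax, 0x1122334455667788 ; fill all 64 bits of rax
        ; the same register seen through its narrower names:
        ;   eax = 0x55667788   ax = 0x7788   ah = 0x77   al = 0x88
        mov     bl, al                  ; bl = 0x88
        mov     bh, ah                  ; bh = 0x77, so bx = 0x7788
        mov     sil, bl                 ; low byte of rsi = 0x88
        movzx   edi, sil                ; edi = 0x88, zero-extended; used as exit status

        mov     eax, 60                 ; Linux exit system call number (assumption)
        syscall
```

Built with `nasm -f elf64` and linked with `ld`, the program exits with status 136 (0x88), which can be checked with `echo $?`.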
Instruction pointer register
Bits 63..0: rip
• Holds the address of the next instruction to be executed; also known as
the “program counter” register
• Rarely manipulated directly by programs
• Updated implicitly by control-flow instructions such as
call, jmp and ret
• Used to implement if and while statements, and method
calls
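A while loop and a function call can be built from nothing more than instructions that change rip. The sketch below assumes Linux on x86-64, ELF64 output and a `_start` entry point; the function name `sum_to` is hypothetical.

```nasm
; Sketch only. Assumptions: Linux x86-64, ELF64 output, entry point _start.
global _start

section .text
_start:
        mov     edi, 5          ; argument n = 5
        call    sum_to          ; pushes the return address, then rip = sum_to
        mov     edi, eax        ; exit status = 1+2+3+4+5 = 15
        mov     eax, 60         ; Linux exit system call number (assumption)
        syscall

sum_to:                         ; hypothetical helper: returns 1 + 2 + ... + edi in eax
        xor     eax, eax        ; sum = 0
        mov     ecx, 1          ; i = 1
.while:
        cmp     ecx, edi        ; while (i <= n)
        jg      .done           ;   conditional jump: rip = .done when i > n
        add     eax, ecx        ;   sum += i
        inc     ecx             ;   i++
        jmp     .while          ;   unconditional jump: rip = .while
.done:
        ret                     ; pops the return address back into rip
```

None of these instructions names rip as an operand; the register changes only as a side effect of call, jmp, jg and ret.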
Flags register
Bits 63..0: rflags
Bits hold various items of CPU state. Several are also set or cleared by
arithmetic instructions:
• Zero flag (bit 6): 1 if the result is zero, 0 otherwise
• Sign flag (bit 7): most significant bit of the result, i.e. the sign bit if it is a signed integer
• Overflow flag (bit 11): 1 if a signed result overflows, 0 otherwise
• Carry flag (bit 0): 1 if an unsigned result overflows, 0 otherwise
• Parity flag (bit 2): 1 if the least significant byte of the result contains an even number of
1 bits, 0 otherwise
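Two 8-bit additions are enough to exercise all five flags. The sketch below assumes Linux on x86-64, ELF64 output and a `_start` entry point. The operand values are chosen for illustration, and the flag values in the comments follow from the definitions above.

```nasm
; Sketch only. Assumptions: Linux x86-64, ELF64 output, entry point _start.
global _start

section .text
_start:
        mov     al, 0x7F
        add     al, 1           ; al = 0x80: ZF=0 SF=1 OF=1 CF=0 PF=0
        jno     .fail           ; OF must be set: 127 + 1 does not fit in a signed byte
        jc      .fail           ; CF must be clear: 128 still fits in an unsigned byte

        mov     al, 0xFF
        add     al, 1           ; al = 0x00: ZF=1 SF=0 OF=0 CF=1 PF=1
        jnz     .fail           ; ZF must be set: the result is zero
        jnc     .fail           ; CF must be set: 255 + 1 does not fit in an unsigned byte
        jnp     .fail           ; PF must be set: zero 1 bits is an even number

        xor     edi, edi        ; every check passed: exit status 0
        jmp     .exit
.fail:
        mov     edi, 1          ; a flag had an unexpected value: exit status 1
.exit:
        mov     eax, 60         ; Linux exit system call number (assumption)
        syscall
```

The conditional jumps (jno, jc, jnz, jnc, jnp) read the flags without changing them, so several can test the result of a single add.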
Main memory
Byte addressable, little endian, non-aligned accesses allowed
[Figure: memory cells at addresses 0H to FH. Cell values, in the order recovered from the figure: 12 74 F0 0B 23 1F 36 31 CB EF A4 06 FE 7A FF 45]
• Byte at address 9H?
• Byte at address BH?
• Word at address DH?
• Word at address 6H?
• Doubleword at address AH?
• Quadword at address 6H?
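The little-endian rule can be checked with a short program. The sketch below deliberately uses its own hypothetical byte values, not the ones from the slide's figure, and assumes Linux on x86-64, ELF64 output and a `_start` entry point.

```nasm
; Sketch only. Assumptions: Linux x86-64, ELF64 output, entry point _start.
default rel

global _start

section .data
; hypothetical memory contents at offsets 0..7 (not the values from the figure)
mem     db 0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0

section .text
_start:
        mov     al,  [mem + 1]  ; byte at offset 1:          al  = 0x34
        mov     bx,  [mem]      ; word at offset 0:          bx  = 0x3412
        mov     cx,  [mem + 1]  ; non-aligned word, offset 1: cx = 0x5634
        mov     edx, [mem + 4]  ; doubleword at offset 4:    edx = 0xF0DEBC9A
        mov     rsi, [mem]      ; quadword at offset 0:      rsi = 0xF0DEBC9A78563412

        mov     eax, 60         ; Linux exit system call number (assumption)
        xor     edi, edi        ; exit status 0
        syscall
```

In every multi-byte load the byte at the lowest address ends up in the least significant position of the register.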
Instruction format
• Most Intel instructions have 2, 1 or 0 operands and take
one of the forms:
    label: opcode Destination, Source   ;comments
    label: opcode Operand               ;comments
    label: opcode                       ;comments
• label is an optional user-defined identifier whose value
is the address of the instruction or data item that follows
• We’re using the Netwide Assembler (nasm), which follows Intel
syntax. Other syntaxes exist (e.g. the AT&T syntax used by the GNU assembler)!
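The three instruction forms look as follows in an actual nasm source file. This is a minimal sketch assuming Linux on x86-64, ELF64 output and a `_start` entry point; the label `done` is hypothetical and only there to show a label on a zero-operand instruction.

```nasm
; Sketch only. Assumptions: Linux x86-64, ELF64 output, entry point _start.
global _start

section .text
_start: mov     edi, 41         ; label: opcode Destination, Source
        inc     edi             ; opcode Operand (the label is optional)
        mov     eax, 60         ; Linux exit system call number (assumption)
done:   syscall                 ; label: opcode (no operands)
```

The destination comes first in Intel syntax, so `mov edi, 41` copies 41 into edi, and the program exits with status 42.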
Compatibility with IA-32
• Instructions default to a 32-bit operand size unless specified otherwise
  • Instructions referring to 64-bit registers are automatically performed
    with 64-bit precision
• Instructions that default to a 64-bit operand size in long
mode are:
    CALL (near)  JMP (near)    RET (near)  Jcc         JrCXZ     LOOP      LOOPcc
    PUSH reg     PUSH reg/mem  PUSH imm8   PUSH imm32  PUSH FS   PUSH GS   PUSHFQ
    POP reg      POP reg/mem   POP FS      POP GS      POPFQ     ENTER     LEAVE
    LGDT         LIDT          LLDT        LTR         MOV CR(n) MOV DR(n)
• ah, bh, ch and dh cannot be used in an instruction that needs a REX prefix,
e.g. one that also uses r8 … r15, sil, dil, spl or bpl
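The operand-size rules can be sketched with a few instructions. This assumes Linux on x86-64, ELF64 output and a `_start` entry point. One fact beyond the slide is used: writing a 32-bit register clears the upper 32 bits of the corresponding 64-bit register.

```nasm
; Sketch only. Assumptions: Linux x86-64, ELF64 output, entry point _start.
global _start

section .text
_start:
        mov     rcx, -1         ; rcx = 0xFFFFFFFFFFFFFFFF
        add     ecx, 1          ; 32-bit operation: ecx wraps to 0, upper half of rcx is cleared
        add     rcx, 1          ; a 64-bit register is named, so this is a 64-bit add: rcx = 1
        push    rcx             ; push defaults to 64 bits in long mode: rsp drops by 8
        pop     rdi             ; rdi = 1
        mov     ah, bl          ; allowed: neither operand needs a REX prefix
        ; mov   ah, r8b         ; not encodable: ah cannot appear together with r8b

        mov     eax, 60         ; Linux exit system call number (assumption)
        syscall                 ; exit status 1
```

The commented-out line is left in as a reminder of the restriction; nasm should reject it if it is uncommented.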
Basic data types
Type         Bits    Parts
byte         7..0
word         15..0
doubleword   31..0   high word (31..16), low word (15..0)
quadword     63..0   high doubleword (63..32), low doubleword (31..0)
Directives for “global” variables (1)
• Data declaration directives are special assembler
commands allowing “global” data (variables) to be
declared
• Such data is mapped to fixed memory locations and can
be accessed using the name of the variable
• The address of the global variable is then automatically
encoded into instructions at assembly time
    users     db  3               ; byte with value 3
    age       dw  21              ; word with value 21
    total     dd  999             ; doubleword with value 999
    message   db  "hello"         ; 5-byte string hello
    sequence  dw  1,2,3           ; 3 words with values 1, 2 and 3
    array     times 100 dw 33     ; 100 words each with value 33
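Declared variables are accessed by putting their name in square brackets. The sketch below reuses some of the declarations from the slide and assumes Linux on x86-64, ELF64 output and a `_start` entry point; the particular loads and stores are illustrative only.

```nasm
; Sketch only. Assumptions: Linux x86-64, ELF64 output, entry point _start.
default rel

global _start

section .data
users     db  3
age       dw  21
total     dd  999
sequence  dw  1,2,3

section .text
_start:
        movzx   edi, byte [users]        ; byte load widened to 32 bits: edi = 3
        movzx   eax, word [sequence + 2] ; second word of sequence: eax = 2
        add     edi, eax                 ; edi = 5
        mov     eax, [total]             ; doubleword load: eax = 999
        mov     word [age], 22           ; storing an immediate needs an explicit size

        mov     eax, 60                  ; Linux exit system call number (assumption)
        syscall                          ; exit status 5
```

The assembler does not remember whether a name was declared with db, dw or dd, so the size has to come from a register operand or from a keyword such as `byte` or `word`.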
Directives for “global” variables (2)
• Uninitialised data can be reserved with the following
directives
    tiny    resb  10      ; reserve 10 bytes
    little  resw  100     ; reserve 100 words (200 bytes)
    big     resd  1000    ; reserve 1000 dwords (4000 bytes)
• Why are these useful when you can just use the ones on
the previous slide?
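One common answer is that reserved space can live in the .bss section, which takes up no room in the object file and is zero-filled when the program is loaded. The sketch below illustrates this under assumptions: Linux on x86-64, ELF64 output and a `_start` entry point.

```nasm
; Sketch only. Assumptions: Linux x86-64, ELF64 output, entry point _start.
default rel

global _start

section .bss
big     resd 1000                       ; 4000 bytes reserved, none stored in the file

section .text
_start:
        mov     dword [big], 7          ; first dword of the block
        mov     dword [big + 4*999], 9  ; last dword of the block
        mov     edi, [big]              ; edi = 7
        add     edi, [big + 4*999]      ; edi = 16

        mov     eax, 60                 ; Linux exit system call number (assumption)
        syscall                         ; exit status 16
```

Declaring the same block with `times 1000 dd 0` in the data section would behave the same at run time, but would add 4000 bytes of zeros to the executable.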
Constants
• Named constants can also be defined using the equ directive
    dozen    equ  12
    Century  equ  100
• Note: When assembler programs are translated into a
particular object file format (e.g. win64, elf), they are often
organised into sections or segments such as
text, data etc.
• A common requirement is that “global” variables can only
be declared in the data segment.
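Constants and sections come together in a complete program. The following is a minimal sketch, assuming Linux on x86-64, ELF64 output and a `_start` entry point. The constant names and the system call numbers are assumptions for that platform, and `$` is nasm's symbol for the current assembly position.

```nasm
; Sketch only. Assumptions: Linux x86-64, ELF64 output, entry point _start.
default rel

global _start

SYS_WRITE   equ 1                       ; hypothetical names for Linux system call numbers
SYS_EXIT    equ 60
STDOUT      equ 1                       ; file descriptor of standard output

section .data                           ; initialised "global" variables go here
message     db  "hello", 10             ; the string followed by a newline
message_len equ $ - message             ; constant computed by the assembler: 6

section .text                           ; instructions go here
_start:
        mov     eax, SYS_WRITE
        mov     edi, STDOUT
        lea     rsi, [message]          ; address of the string
        mov     edx, message_len        ; number of bytes to write
        syscall

        mov     eax, SYS_EXIT
        xor     edi, edi                ; exit status 0
        syscall
```

An equ constant occupies no memory: the assembler substitutes its value wherever the name is used, so `mov edx, message_len` assembles exactly like `mov edx, 6`.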