Assembly Language Part 1 (overview)

Brief History of Computers

Each entry lists: year, processor, transistors, max address space, caches, speed, and notes. A ditto mark (") repeats the value from the entry above.

1972  8008
      Transistors: 3500
      Max address space: 16 KB
      Speed: 0.05 MIPS
      Notes: Instr set designed by Datapoint in San Antonio; TX Instruments Computers
1978  8086
      Transistors: 29 K
      Max address space: 1 MB
      Speed: 0.33 - 0.75 MIPS
      Notes: IBM PC; MS DOS
1982  80286
      Transistors: 134 K
      Max address space: 16 MB
      Speed: 0.9 - 2.66 MIPS
      Notes: IBM PC-AT; MS Windows, SCO Xenix, IBM OS/2
1985  i386
      Transistors: 275 K
      Max address space: 4 GB
      Speed: 2.5 - 9.9 MIPS
      Notes: Compaq suitcase computers; MS Windows 2.0; MS Windows 3.0 (1990). With the speed of the i386 and the convenience of MS Windows 3.0, Microsoft grew significantly.
1989  i486
      Transistors: 1.2 M
      Max address space: 4 GB, virtual 1 TB
      Caches: 8 KB cache on chip
      Speed: 25-50 MHz; 20-41 MIPS
      Notes: Floating Pt coprocessor on chip
1993  Pentium
      Transistors: 3.1 M
      Max address space: 4 GB, virtual 64 TB
      Caches: 8 KB instruction cache; 8 KB data cache
      Speed: 60-100 MHz; 100-150 MIPS
1995  Pentium Pro
      Transistors: 5.5 M
      Max address space: 4 GB; virtual 64 TB
      Caches: 16 KB L1 cache; 256 KB L2 cache
      Speed: 166-200 MHz
1997  Pentium II
      Transistors: 7.5 M
      Max address space: 5 GB; virtual 64 TB
      Caches: 32 KB L1 cache; 512 KB ext L2 cache
      Speed: 233-450 MHz
1999  Pentium III
      Transistors: 9.5 M
      Max address space: "
      Caches: 256KB-2MB L2 cache
      Speed: 450-600 MHz
2000  Pentium 4
      Transistors: 42 M
      Max address space: "
      Speed: 1.6 - 1.8 GHz
2006  Core 2
      Transistors: 291 M
      Max address space: "
      Caches: 64 KB L1 cache per core; 4MB L2 cache
      Speed: 1.86 - 3.0 GHz; 2 cores
2008  Core i7
      Transistors: 781 M
      Caches: 64 KB L1 cache per core; 256 KB L2 cache per core; 8MB L3 cache
      Speed: 2.8-3.5 GHz; 4 cores; hyper threading
2010  Itanium Tukwila
      Transistors: 2B
      Caches: 24 MB L3 cache
      Speed: 1.6-1.73 GHz; up to 6 instr per clock cycle; 2-4 cores; instr-level parallelism
      Notes: HP Servers
2012  Xeon Phi
      Transistors: 5B
      Speed: 240-320 GHz; 1-1.2 teraflops double precision; 62 cores
      Notes: HP & Cray Servers
Moore's Law (revised, 1975): The complexity for minimum component costs has increased at a rate of roughly a factor of two every two years.
Note: from 1978 to 2012 is 34 years, or 17 doublings. Starting from the 29 K of the 8086, Moore's Law would predict roughly 3.8B (29 K x 2^17) by 2012.
We will focus on the Intel Architecture 32 bit machines (IA32). There are two different syntaxes for IA32 Assembly Language:
    Intel (used by Microsoft)
    AT&T (used by GNU; therefore we use this)
The underlying machine code is fundamentally the same.

Suppose we need to add two integer variables (which are externals) and store the result in another integer variable (which is an external).

Intel Asm Syntax    AT&T Asm Syntax     Meaning
mov eax, valx       movl valx, %eax     Load register eax with valx
add eax, valy       addl valy, %eax     Add valy to register eax
mov valz, eax       movl %eax, valz     Store the result in valz

AT&T Assembly language syntax uses Source-Destination operand order, whereas Intel uses Destination-Source. AT&T also places % in front of register names.
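For orientation, here is a minimal C sketch of source code that a compiler could translate into the load / add / store sequence described above. The variable names valx, valy and valz come from the notes; the wrapper function addExternals is hypothetical and only exists so the statement can be compiled on its own. The exact instructions a compiler emits depend on the compiler and its options.

```c
/* External (global) integer variables, named as in the notes. */
int valx = 0;
int valy = 0;
int valz = 0;

/* Hypothetical wrapper function (not from the notes). */
void addExternals(void)
{
    /* Roughly: movl valx,%eax ; addl valy,%eax ; movl %eax,valz */
    valz = valx + valy;
}
```

Compiling a file like this with gcc -S is one way to see the AT&T syntax version for yourself.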
Machine Instructions
The actual machine code (which executes) is binary and is
interpreted by the CPU. Assembly Language is a lot easier to
read than machine code. IA32 machine instructions vary in
size from 1 to 15 bytes. Some other machine architectures
use fixed length instructions.
We will discuss the actual IA32 machine instruction format
later in the semester.
Most instruction formats include:
Op Code - tells the CPU what needs to be done
Operand Type (if needed) - may include data type, whether the operand is a
register, immediate (constant) or memory reference
Operand 1 (if needed) reg value, immediate operand, memory reference info
Operand 2 (if needed) reg value, immediate operand, memory reference info
Note: both operands usually will not be memory references
IA32 Hardware Architecture Overview
• Program Counter is called %eip (extended instruction pointer).
• 8 Integer Registers, each storing 32 bit values.
  o These are named: %eax, %ebx, %ecx, %edx, %esi, %edi, %esp, %ebp
  o The lower 2 bytes in the first 4 integer registers can be referenced as: %ax, %bx, %cx, %dx
  o We can further divide those 2 lower byte names into two single bytes: %al, %ah, %bl, %bh, %cl, %ch, %dl, %dh
  o Registers %esp and %ebp are for runtime stack manipulation
• Based on the history of Intel chips, backward compatibility forced the inclusion of these 2-byte and 1-byte registers
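The relationship between a 32 bit register and its 2-byte and 1-byte names can be sketched in C with masks and shifts. This is only a model: the function names lowWord, lowByte and highByte are made up for illustration, and the parameter stands in for the value held in %eax.

```c
#include <stdint.h>

/* %ax: the lower 2 bytes of %eax */
uint16_t lowWord(uint32_t eax)
{
    return (uint16_t)(eax & 0xFFFFu);
}

/* %al: the lowest byte of %eax */
uint8_t lowByte(uint32_t eax)
{
    return (uint8_t)(eax & 0xFFu);
}

/* %ah: the second-lowest byte of %eax (the high byte of %ax) */
uint8_t highByte(uint32_t eax)
{
    return (uint8_t)((eax >> 8) & 0xFFu);
}
```

For example, if %eax holds 0x12345678, then %ax is 0x5678, %ah is 0x56 and %al is 0x78.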
IA32 Hardware Architecture Overview Continued
• Condition Code Registers are single bit flags which are set based on the outcome of the most recent arithmetic or logical instruction.
  o OF - overflow flag; set when the result of a signed arithmetic operation is either too large or too small to fit in the destination
  o CF - carry flag; set when the result of an unsigned arithmetic operation is too large to fit in the destination
  o ZF - zero flag; set when the result is zero; it is ON if a comparison shows values are equal
  o SF - sign flag; set when the result is a negative value
  o PF - parity flag; its parity is even (PE) when there is an even number of 1 bits in the 8 low order bits.

Example:
    short ix = 21234;
    short iy = 20841;
    short iresult;
    iresult = ix + iy;
    printf("Result is %d\n", iresult);
output:
    Result is -23461
Since the sum is greater than 32767 (the largest value a 2-byte short can hold), it overflows. Some languages would generate an error, but C assumes overflows are expected and does not generate a runtime error.
Note:
    21234 + 20841 = 42075
    2^16 = 65536
    42075 - 65536 = -23461
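To make the flag definitions concrete, the following C sketch models how the five condition codes would come out for a 2-byte addition. It is a software model under stated assumptions, not how the hardware is implemented: the Flags struct and the function addFlags16 are hypothetical names introduced here.

```c
#include <stdint.h>

/* Hypothetical container for the five flags described above. */
typedef struct {
    int of, cf, zf, sf, pf;
} Flags;

/* Model a 16 bit add and report the flags it would produce. */
Flags addFlags16(uint16_t a, uint16_t b, uint16_t *sum)
{
    uint32_t wide = (uint32_t)a + (uint32_t)b;  /* sum with room for a carry */
    uint16_t r = (uint16_t)wide;                /* what fits in 16 bits */
    unsigned low = r & 0xFFu;                   /* 8 low order bits for PF */
    int ones = 0;
    Flags f;

    while (low != 0) {                          /* count the 1 bits */
        ones += (int)(low & 1u);
        low >>= 1;
    }

    f.cf = wide > 0xFFFFu;                      /* unsigned result too large */
    f.of = ((a ^ r) & (b ^ r) & 0x8000) != 0;   /* both inputs differ in sign from result */
    f.zf = (r == 0);
    f.sf = (r & 0x8000) != 0;                   /* sign bit of the result */
    f.pf = (ones % 2 == 0);                     /* even number of 1 bits */
    *sum = r;
    return f;
}
```

With the values from the example (21234 and 20841), this model reports OF and SF on and CF and ZF off: the signed sum does not fit in 16 bits, but the unsigned sum 42075 does.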
IA32 Hardware Architecture Overview Continued
• Floating Point Registers are used for floating point arithmetic. There are 8 floating point registers, each having 80 bits. These are stack based.
  o Top of the stack: register ST0
  o Next: register ST1
  o Bottom: register ST7
IA32 Assembly Language AT&T Syntax
Comments begin with #.
Labels begin in column 1 and end with a colon. They are
used to reference an instruction address for JMP and
CALL instructions. They are also used to reference
external variables (basis), static variables, and string
constants.
Dot directives begin with a dot and tell the assembler things
like name of your source code, variables which are
external global basis (.globl), data types, lengths, and
other assembler information.
Instruction Operators should not begin in column 1 for
readability.
Instruction Operands might reference constants, symbol
labels, registers or memory references, but are all
dependent on the instruction operations.
Operands
Operands have several different forms to help reference registers, constants, symbols, and memory.
%reg        Register references begin with a %. They reference the 4 byte registers (begin with "e" for extended), 2 byte registers or 1 byte registers.
$constant   Numeric constants can be base-10 or hexadecimal (begin with 0x).
symbol      A symbol can be an external variable, an external function, a static variable or a .label.
$symbol     The address of the specified symbol is used as the operand value.
Note (condition codes): In addition to arithmetic operations, comparisons set those condition codes. (See the notes on Flow Instructions.)
Note (floating point): We will discuss floating point in detail after the midterm exam.
Note (AT&T syntax): See sample code below.
Some examples:
%ax         register %ax
%eax        register %eax
$150        integer constant 150
$0xAFF3     hexadecimal constant AFF3
valx        a symbol for an external or static variable
.L5         a label to an address of an instruction which could be used in a jump instruction. It can also be a label for a character string literal.
8(%ebp)     the memory address which is 8 + the value of register %ebp

memRef      typically the address of a variable

Memory references can take on many forms:
symbol              Memory address based on the symbol's address
off(%reg)           Memory address is an offset from the value of %reg
(%reg1,%reg2)       Memory address is the sum of the values of %reg1 and %reg2
off(%reg1,%reg2)    Memory address is an offset from the sum of the values of %reg1 and %reg2
The offsets can be positive integers, negative integers, symbolics, or symbolics with a positive or negative offset.
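The address arithmetic behind these memory reference forms can be sketched in C. This is a simplified model: effectiveAddress is a hypothetical helper, the register values are passed in as plain numbers, and a missing offset or register is represented by 0.

```c
#include <stdint.h>

/* Compute off(%reg1,%reg2) as off + value of reg1 + value of reg2.
   Arithmetic wraps around at 32 bits, as addresses do on IA32. */
uint32_t effectiveAddress(int32_t off, uint32_t reg1, uint32_t reg2)
{
    return (uint32_t)off + reg1 + reg2;
}
```

Under this model, 8(%ebp) with %ebp holding 0x1000 is effectiveAddress(8, 0x1000, 0), and a negative offset such as -4(%ebp) is effectiveAddress(-4, 0x1000, 0).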
More memory reference examples:
(%ebp,%ebx)     the memory address which is the value of register %ebp + value of register %ebx.
studentData+4   the memory address which is 4 + the address of studentData.
There is another form of memory references using a scale which we will discuss later.

Overview of the Machine instruction categories
Move - move from source to destination
    movS source, dest
Load Effective Address - load the address instead of the value from an address
    leaS source, dest
Arithmetic - 2 byte and 4 byte
    addS operand1, operand2     add op1 to op2
    subS operand1, operand2     subtract op1 from op2
    imulS operand               multiply by operand
    idivS operand               divide by operand
    incS operand                increment the operand
    decS operand                decrement the operand
    negS operand                negate the operand
Shift
    salS k,reg                  shift arithmetic left
    sarS k,reg                  shift arithmetic right
    shlS k,reg                  shift logical left
    shrS k,reg                  shift logical right
Note: S is the size and must be one of
    b   byte (1 byte)
    w   word (2 bytes)
    l   long (4 bytes)
    q   quad word (8 bytes)
W for word is based on the old machines where a word was 2 bytes.

Examples:
    movl iValA,%edx       # Moves the long value of iValA to %edx
    movl $iValA,%edx      # Moves the address of iValA to %edx
    movl %edx,lresult     # Moves the long value in %edx to lresult
    addl 4(%ebp),%edx     # The value at the address computed
                          # by an offset of 4 plus the value of ebp
                          # is added to the value of %edx.
                          # The result is stored in %edx.
    incw %dx              # Increment the 2 byte value in %dx by 1.
    movl lresult,%edx     # Moves the long value found at lresult
                          # to %edx
    leal lresult,%edx     # Move the address of lresult to %edx
    sarl $3, %edx         # Arithmetic shift of the long value in
                          # %edx 3 bits to the right
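The difference between the arithmetic and logical right shifts, and the effect of a 2-byte instruction on a 4-byte register, can be modeled in C. This is a sketch under assumptions: the function names mirror the mnemonics but are ordinary C helpers written for these notes, k is assumed to be between 0 and 31, the final conversion back to int32_t assumes the usual two's complement behavior (as with gcc), and the condition codes are not modeled.

```c
#include <stdint.h>

/* sarl $k, reg: shift right, filling with copies of the sign bit. */
int32_t sarl(int32_t value, unsigned k)
{
    uint32_t u = (uint32_t)value >> k;      /* logical shift first */
    if (value < 0 && k > 0) {
        u |= ~(UINT32_MAX >> k);            /* refill the top k bits with 1s */
    }
    return (int32_t)u;
}

/* shrl $k, reg: shift right, filling with zeros. */
uint32_t shrl(uint32_t value, unsigned k)
{
    return value >> k;
}

/* incw %dx: increment only the low 2 bytes; the upper 2 bytes are untouched. */
uint32_t incw(uint32_t edx)
{
    return (edx & 0xFFFF0000u) | ((edx + 1u) & 0x0000FFFFu);
}
```

For instance, sarl applied to -64 with k = 3 yields -8, while shrl on the same bit pattern yields a large positive value. incw on 0x1234FFFF wraps the low word to 0x0000 and leaves 0x1234 in place.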
Overview of the Machine instruction categories
Flow
    jmp label                   unconditional jump to label
    cmpS operand1, operand2     compare setting condition code flags
    jle label                   jump less than or equal
    jl label                    jump less than
    je label                    jump equal
    jne label                   jump not equal
    jge label                   jump greater than or equal
    jg label                    jump greater than
    call dest                   using calling convention to invoke the function at dest
    ret                         return to the caller based on the calling convention
Stack - these manipulate the runtime memory stack
    pushS operand               pushes the operand onto the runtime memory stack
    popS operand                pops the top of the stack and stores it in operand
    leave                       prepare to leave the subroutine based on calling convention
Note that call and ret also manipulate the stack.
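As a rough picture of what pushl and popl do to the runtime stack, here is a small C model. On IA32 the stack grows toward lower addresses, so a 4-byte push first subtracts 4 from %esp and then stores the value, and a pop does the reverse. The StackModel type, the 64-byte size and the function names are assumptions made for this sketch, and there is no overflow or underflow checking.

```c
#include <stdint.h>

#define STACK_BYTES 64u   /* arbitrary small stack for the model */

typedef struct {
    uint32_t mem[STACK_BYTES / 4];  /* stack memory, 4 bytes per slot */
    uint32_t esp;                   /* byte offset of the top of the stack */
} StackModel;

/* An empty stack has esp just past the end of the memory. */
void stackInit(StackModel *s)
{
    s->esp = STACK_BYTES;
}

/* pushl: make room (esp -= 4), then store the operand at the new top. */
void pushl(StackModel *s, uint32_t value)
{
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* popl: read the value at the top, then release the room (esp += 4). */
uint32_t popl(StackModel *s)
{
    uint32_t value = s->mem[s->esp / 4];
    s->esp += 4;
    return value;
}
```

Pushing 10 and then 20 moves esp down by 8 bytes; the pops then return 20 and 10 in that order and restore esp.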
C code for calculating the average using the final exam and the higher of the first two exams.

    int calculateAverage(int iExam1, int iExam2,
        int iFinalExam)
    {
        int iSum;
        if (iExam1 > iExam2)
            iSum = iExam1 + iFinalExam;
        else
            iSum = iExam2 + iFinalExam;
        return iSum / 2;
    }

C code for averageDriver.c

    #include <stdio.h>
    /* The Student and StudentData typedefs are shown under Slack Bytes */
    StudentData studentData;
    void readStudents();
    int calculateAverage(int iExam1, int iExam2, int iFinalExam);
    int main(int argc, char *argv[])
    {
        int i;
        studentData.iStudentCnt = 0;
        readStudents();
        for (i = 0; i < studentData.iStudentCnt; i++)
            printf("%s %d\n"
                , studentData.studentM[i].szStudentId
                , calculateAverage
                    (studentData.studentM[i].iExam1
                    , studentData.studentM[i].iExam2
                    , studentData.studentM[i].iFinalExam)
                );
    }

Consider the following C statement snippet:

    if (iX > iY)
        true part
    else
        false part

In Assembly Language:

        movl    iX, %edx        # load reg edx with the iX variable
        cmpl    iY, %edx        # compare iX:iY (we are comparing the
                                # second operand (edx) with the first)
        jle     .L3             # if <=, jump to .L3
        …                       # code for the true part
        jmp     .L4             # jump over the false part
    .L3:
        …                       # code for the false part
    .L4:
        …                       # code following the entire if

Corresponding assembly language code for calculateAverage generated by gcc -O1 -S (comments added by me)

        .file   "calculateAverage.c"
        .text
    .globl calculateAverage         # The Linker will need to know
                                    # this.
        .type   calculateAverage, @function
    calculateAverage:
        pushl   %ebp                # tbd
        movl    %esp, %ebp          # tbd
        movl    8(%ebp), %edx       # load iExam1 in %edx
        movl    12(%ebp), %eax      # load iExam2 in %eax
        cmpl    %eax, %edx          # compare iExam1:iExam2
        jle     .L2                 # if <=, jump to .L2
        addl    16(%ebp), %edx      # add iFinal to %edx (iExam1)
        jmp     .L3                 # jump over false part
    .L2:
        movl    16(%ebp), %edx      # move iFinal to %edx
        addl    %eax, %edx          # add iExam2 to %edx (iFinal)
    .L3:
        movl    %edx, %eax          # move sum to %eax for sign
        shrl    $31, %eax           # shift makes this 0 or 1
        addl    %edx, %eax          # increase by 0 or 1
        sarl    %eax                # divide by 2 via shifting
        popl    %ebp                # tbd
        ret                         # tbd
        .size   calculateAverage, .-calculateAverage
        .ident  "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
        .section        .note.GNU-stack,"",@progbits

Why the last four instructions before popl are needed: C integer division truncates toward zero, but an arithmetic right shift rounds toward negative infinity. To get the same answer as iSum / 2, gcc adds the sign bit (0 for a non-negative sum, 1 for a negative sum) before shifting.
    3/2 = 1.5, truncating gives 1
    -3/2 = -1.5, truncating gives -1
    -3 + 1 = -2, if we divide by 2 (shift right by 1) we get -1
    -4 + 1 = -3, divide by 2 (shift right by 1) = -2
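The shrl / addl / sarl sequence that gcc generated for iSum / 2 can be written out in C to check that it really matches truncating division. This is a sketch: halfViaShift is a hypothetical name, and the code assumes that >> on a negative signed value is an arithmetic shift, which is what gcc does but which the C standard leaves implementation-defined.

```c
#include <stdint.h>

/* Divide by 2 the way the generated code does: add the sign bit, then shift. */
int32_t halfViaShift(int32_t sum)
{
    /* shrl $31: logical shift leaves only the sign bit, giving 0 or 1 */
    int32_t bias = (int32_t)((uint32_t)sum >> 31);

    /* addl then sarl: increase by 0 or 1, then arithmetic shift right by 1
       (assumes >> on a negative int32_t is arithmetic, as with gcc) */
    return (sum + bias) >> 1;
}
```

Without the bias, shifting -3 right by 1 would give -2 rather than the -1 that C's -3 / 2 requires.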
Assembly Language for averageDriver.c using gcc -O1 -S

        .file   "averageDriver.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
    .LC0:
        .string "%s %d\n"
        .text
    .globl main                     # linker will need to know about this
        .type   main, @function
    main:
        leal    4(%esp), %ecx       # tbd
        andl    $-16, %esp          # tbd
        pushl   -4(%ecx)            # tbd
        pushl   %ebp                # tbd
        movl    %esp, %ebp          # tbd
        pushl   %edi                # tbd
        pushl   %esi                # tbd
        pushl   %ebx                # tbd
        pushl   %ecx                # tbd
        subl    $24, %esp           # reserve 24 bytes on the stack
        movl    $studentData, %ebx  # address of studentData -> %ebx
        #
        # What is at studentData vs. studentData+4 ?
        #
        movl    $0, (%ebx)          # set iStudentCnt to 0
        call    readStudents        # call readStudents
        # gcc decided to just compare iStudentCnt:0 instead of using i
        cmpl    $0, (%ebx)          # compare iStudentCnt:0
        jle     .L5                 # if iStudentCnt <= 0, jump to .L5
        movl    %ebx, %esi          # save addr of studentData
        movl    $0, %ebx            # move 0 to %ebx (gcc using %ebx for i)
        movl    %esi, %edi          # addr of iStudentCnt
    .L3:
        # offset of iExam1 is after iStudentCnt,
        # szStudentId, and a slack byte. 7+4+1 = 12
        # offset of iExam2 is 4 past iExam1. 12+4 = 16
        # offset of iFinalExam is 4 past iExam2. 16+4 = 20
        # pass the parameters to calculateAverage by loading the stack
        movl    20(%esi), %eax      # iFinalExam -> %eax
        movl    %eax, 8(%esp)       # load it onto the stack as a parm
        movl    16(%esi), %eax      # iExam2 -> %eax
        movl    %eax, 4(%esp)       # load it onto the stack as a parm
        movl    12(%esi), %eax      # iExam1 -> %eax
        movl    %eax, (%esp)        # load it onto the stack as a parm
        call    calculateAverage    # call calculateAverage
        # result of calculateAverage was returned in %eax
        # prepare the parameters for printf call
        movl    %eax, 12(%esp)      # move calculateAverage result to stack
        # determine the address of the szStudentId[i]
        # since each element is 20 bytes long, we need to multiply
        # the subscript by 20.
        # using leal (x,x,4) will multiply register x by 5
        # using leal (,x,4) will multiply register x by 4
        # Doing both of those leal instructions is x*20
        #
        # First multiply i (which is in %ebx) by 5
        leal    (%ebx,%ebx,4), %eax
        # Now multiply that by 4 which effectively mult by 20
        # and add the address of the beginning of the array
        leal    studentData+4(,%eax,4), %eax
        movl    %eax, 8(%esp)       # load szStudentId[i] on the stack
        movl    $.LC0, 4(%esp)      # load the address of the format string
                                    # onto the stack
        movl    $1, (%esp)          # move 1 onto the stack. This is an error
                                    # checking/optimization flag
        call    __printf_chk        # call printf
        addl    $1, %ebx            # increment i
        # Increment ptr into array by size of one element
        addl    $20, %esi           # add element size to ptr
        cmpl    %ebx, (%edi)        # compare iStudentCnt:i
        jg      .L3                 # if iStudentCnt > i, loop back to .L3
    .L5:
        addl    $24, %esp           # tbd
        popl    %ecx                # tbd
        popl    %ebx                # tbd
        popl    %esi                # tbd
        popl    %edi                # tbd
        popl    %ebp                # tbd
        leal    -4(%ecx), %esp      # tbd
        ret                         # tbd
        .size   main, .-main
        .comm   studentData,404,32
        .ident  "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
        .section        .note.GNU-stack,"",@progbits
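The two leal instructions that compute the address of szStudentId for element i can be mirrored in C to confirm that they amount to studentData + 4 + 20 * i. This is a sketch for checking the arithmetic only: studentIdAddress is a hypothetical helper, and the base address is passed in as a plain number.

```c
#include <stdint.h>

/* Address of studentM[i].szStudentId, computed the way the leal pair does. */
uint32_t studentIdAddress(uint32_t studentDataAddr, uint32_t i)
{
    /* leal (%ebx,%ebx,4), %eax : i + i*4 = i*5 */
    uint32_t times5 = i + i * 4u;

    /* leal studentData+4(,%eax,4), %eax : studentData + 4 + (i*5)*4 */
    return studentDataAddr + 4u + times5 * 4u;
}
```

The +4 skips iStudentCnt at the start of studentData, and the factor of 20 is the size of one array element.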
Slack Bytes (Alignment)
Why did it reserve 404 bytes for studentData?

    typedef struct
    {
        char szStudentId[7];
        int iExam1;
        int iExam2;
        int iFinalExam;
    } Student;
    typedef struct
    {
        int iStudentCnt;
        Student studentM[20];
    } StudentData;

If each Student is 19 bytes (7+4+4+4), 19 bytes * 20 students is 380 bytes. Adding 4 for iStudentCnt would be 384. Why is it 404 bytes? It is because of slack bytes. It can be easier to read and write 4 byte numeric values if their addresses are a multiple of 4.
If the compiler inserts a slack byte between szStudentId and iExam1, iExam1 would be aligned on an address which is a multiple of 4.
If each student is 20 bytes, 20*20 is 400 bytes. Adding 4 bytes for iStudentCnt gives 404.
If a student ID is only 4 characters and we had declared it szStudentId[5], how many slack bytes would have been included?
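One way to check slack bytes directly is to ask the compiler with sizeof and offsetof. The sketch below repeats the two typedefs from the notes and adds a hypothetical helper, slackBeforeExam1. The numbers in the comments assume a 4-byte int aligned on a multiple of 4, which holds for gcc on IA32 but is not guaranteed by the C standard.

```c
#include <stddef.h>

typedef struct
{
    char szStudentId[7];
    int iExam1;
    int iExam2;
    int iFinalExam;
} Student;

typedef struct
{
    int iStudentCnt;
    Student studentM[20];
} StudentData;

/* Slack bytes between szStudentId and iExam1:
   where iExam1 actually starts minus where the ID ends. */
size_t slackBeforeExam1(void)
{
    return offsetof(Student, iExam1) - sizeof(((Student *)0)->szStudentId);
}
```

Under the stated assumption, sizeof(Student) is 20, sizeof(StudentData) is 404, and slackBeforeExam1() is 1. Changing the array length in the typedef and recompiling is a quick way to test other cases.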