x86 Assembly Language

We will be using the nasm assembler (other assemblers: MASM, as, gas)

Outline
• x86 registers
• x86 memory basics
• Introduction to x86 assembly

Why Assembly?
• Assembly lets you write fast programs
  – You can write programs that execute calculations at the maximum hardware speed…
  – …assuming that you know what you're doing…
• Why not always use assembly?
• Assembly is necessary for writing certain portions of system software
  – Compilers
  – Operating systems
• Assembly is used to program embedded devices and DSPs

x86
• Intel/Microsoft's monopoly: cost-effective microprocessors for the mass market
• x86 refers to an instruction set rather than a specific processor design; its 32-bit form is also known as IA-32
• The processor core that implements the x86 instruction set has gone through substantial modifications and additions over the last 20 years
• x86 is a Complex Instruction Set Computer (CISC)
  – Hundreds of instruction mnemonics, and thousands of forms once operand variants are counted
  – This course will take you through some of them!
• The amd64 (x86-64) instruction set, a 64-bit extension of x86, has replaced 32-bit x86 for most new code

Registers
• Registers are storage locations
• The first level of a computer's memory hierarchy
• The fastest-to-access storage in your system
• Purposes
  – Data used in arithmetic/logical operations
  – Pointers to memory locations containing data or instructions
  – Control information (e.g.
outcome of arithmetic instructions, outcome of instructions that change the control flow of a program)

x86 Registers at a Glance

General Purpose (sort of)
  Accumulator    AH  AL  AX  EAX
  Base           BH  BL  BX  EBX
  Count          CH  CL  CX  ECX
  Data           DH  DL  DX  EDX

Index Registers
  Stack Pointer  SP  ESP
  Base Pointer   BP  EBP
  Dest Index     DI  EDI
  Source Index   SI  ESI

Special Registers
  Instr Pointer  IP     EIP
  Flags          FLAGS  EFLAGS

Segment Registers
  CS  Code Segment
  DS  Data Segment
  ES  Extra Segment
  SS  Stack Segment
  FS
  GS

General Purpose Registers
• Accumulator (AH, AL, AX, EAX)
  – Accumulates results from mathematical calculations
• Base (BH, BL, BX, EBX)
  – Points to memory locations
• Count (CL, CH, CX, ECX)
  – Counter typically used for loops
  – Can be automatically decremented by some instructions (e.g. loop)
• Data (DL, DH, DX, EDX)
  – Data used in calculations
  – Holds the most significant bits of the product (mul) or dividend (div) in a 32-bit mul/div operation

A note on GP registers
• In 80386 and newer processors, GP registers can be used with a great deal of flexibility…
• But you should remember that each GP register is meant to be used for specific purposes…
• Memorizing the names of the registers will help you understand how to use them
• Learning how to manage your registers will help you develop good programming practices
• You will find that you are generally short of registers

Index Registers
• SP, ESP
  – Stack pointer (more on that in upcoming lectures…)
• BP, EBP
  – Base pointer: addresses stack memory; used to access subroutine arguments through a stack frame
• SI, ESI, DI, EDI
  – Source/Destination index registers
  – Point to the starting address of a string/array
  – Used to manipulate strings and similar data types

Segment Registers
• CS
  – Points to the memory area where your program's instructions are stored
• DS
  – Points to the memory area where your program's data is stored
• SS
  – Points to the memory area where your stack is stored
• ES, FS, GS
  – Can be used to point to additional data segments, if necessary

Special Registers
• IP, EIP
  – Instruction pointer, always points to the next instruction
that the processor is going to execute
  – Cannot be written directly; changed only indirectly, by branching instructions
• FLAGS, EFLAGS
  – Flags register, contains individual bits set by different operations (e.g. carry, overflow, zero)
  – Used heavily by conditional branch instructions

x86 memory addressing modes
• The width of the address bus determines the amount of addressable memory
• The amount of addressable memory is NOT the amount of physical memory available in your system
• Real mode addressing
  – A throwback to the age of the 8086: 20-bit address bus, 16-bit data bus
  – In real mode we can only address memory locations 0 through 0FFFFFh. Used only with 16-bit registers
  – We will not be using real mode!
• Protected mode addressing
  – 32-bit address bus, 32-bit data bus, 32-bit registers
  – Up to 4 Gigabytes of addressable memory
  – 80386 and higher operate in either real or protected mode

Real-mode addressing on the x86
• Memory address format: Segment:Offset
• Linear address obtained by:
  – Shifting the segment left by 4 bits
  – Adding the offset
• Example: 2222h:3333h   Linear address: 22220h + 3333h = 25553h
• Example: 2000h:5553h   Linear address: 20000h + 5553h = 25553h
  – Different segment:offset pairs can name the same linear address
• THIS WILL NOT APPLY TO US IN 32-bit PROTECTED MODE!

Assembly files
• Five simple things:
• Labels
  – Variables are declared as labels pointing to specific memory locations
  – Labels mark the start of subroutines or locations to jump to in your code
• Instructions – cause machine code to be generated
• Directives – affect the operation of the assembler
• Comments
• Data

Comments, comments, comments!

; Comments are denoted by semi-colons.
; Please comment your code thoroughly.
; It helps me figure out what you were doing.
; It also helps you figure out what you were
; doing when you look back at code you wrote
; more than two minutes ago.

; Everything from the semi-colon to the end
; of the line is ignored.

Labels

; Labels are local to your file/module
; unless you direct otherwise. The colon
; identifies a label (an address!)
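MyLabel:
; To make it global we say
global MyLabel
; And now the linker will see it

Example with simple instructions

var1:   dd      0FFh                    ; 32-bit variable initialized to 0FFh
str1:   db      "my dog has fleas",10   ; string bytes followed by a newline (10)
var2:   dd      0                       ; 32-bit variable initialized to 0

; Here are some simple instructions
        mov     eax, [var1]     ; notice the brackets: load the value stored at var1
        mov     edx, str1       ; notice lack of brackets: load the address of str1
        call    dspmsg
        jmp     done
        mov     ebx, [var2]     ; this will never happen (the jmp skips it)
done:   nop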
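
The fragment above is not a complete program: it has no section directives and no entry point, and dspmsg is defined elsewhere. As a minimal sketch, the same brackets/no-brackets distinction can be tried in a standalone file. Everything here beyond var1 and str1 is an assumption for illustration: a 32-bit Linux target, the _start entry label, the str1len constant, the Linux int 80h system calls standing in for dspmsg, and the build commands in the first comment.

```nasm
; Assumed build on 32-bit-capable Linux (not from the slides):
;   nasm -f elf32 demo.asm && ld -m elf_i386 -o demo demo.o

        global  _start                  ; make the entry label visible to the linker

        section .data
var1:   dd      0FFh                    ; 32-bit variable holding 0FFh
str1:   db      "my dog has fleas",10   ; string bytes followed by a newline
str1len equ     $ - str1                ; hypothetical name: length of str1 in bytes

        section .text
_start:
        mov     eax, [var1]             ; brackets: eax = VALUE stored at var1 (0FFh)
        mov     edx, str1               ; no brackets: edx = ADDRESS of str1

        ; Stand-in for "call dspmsg": Linux write(1, str1, str1len)
        mov     ecx, edx                ; ecx = address of the buffer to print
        mov     edx, str1len            ; edx = number of bytes to write
        mov     ebx, 1                  ; ebx = file descriptor 1 (stdout)
        mov     eax, 4                  ; eax = system call number for write
        int     80h

        ; Linux exit(0)
        mov     eax, 1                  ; eax = system call number for exit
        xor     ebx, ebx                ; ebx = exit status 0
        int     80h
```

Writing mov edx, [str1] instead would load the first four bytes of the string ("my d") into edx rather than its address, which is the usual mistake the brackets rule is meant to prevent.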