Chapter 1: Introduction

- What is Assembly Language?
- Data Representation

Table 1. Software Hierarchy Levels

  Application Program
      Software designed for a particular class of applications.
  High-Level Language (HLL)
      Programs are compiled into either assembly language or machine language. E.g. C++, Pascal, Java, Visual Basic.
  Operating System
      Contains procedures that can be called from programs written in either a high-level language or assembly language. The system may also contain an application programming interface (API).
  Assembly Language (ASM)
      Uses instruction mnemonics that have a one-to-one correspondence with machine language.
  Machine Language (ML)
      Numeric instructions and operands that can be stored in memory and directly executed by the computer processor.

What is Assembly Language?
- A low-level, processor-specific programming language
  - designed to match the processor's machine instruction set
  - each assembly language instruction matches exactly one machine language instruction
- We study here Intel's 80x86 (and Pentiums)

Why learn Assembly Language?
- To learn how high-level language code gets translated into machine language
  - i.e. to learn the details hidden in HLL code
- To learn the computer's hardware
  - by direct access to memory, video controller, sound card, keyboard…
- To speed up applications
  - direct access to hardware (ex: writing directly to I/O ports instead of doing a system call)
  - good ASM code is faster and smaller: rewrite in ASM the critical areas of code

Assembly Language Applications
- Application programs are rarely written completely in assembly language
  - only time-critical parts are written in ASM
  - Ex: an interface subroutine (called from HLL programs) is written in ASM for direct hardware access
  - Ex2: device drivers (called from the OS)
- ASM is often used for embedded systems (programs stored in PROM chips)
  - computer cartridge games, microcontrollers (automobiles, industrial plants...), telecommunication equipment…
  - Very fast and compact, but processor-specific

Table 2. Comparison of Assembly Language and High-Level Languages

  Business application software for a single platform
      High-Level Language: Formal structures make it easy to organize and maintain.
      Assembly Language:   No formal structure.
  Hardware device driver
      High-Level Language: Awkward coding techniques required.
      Assembly Language:   Hardware access is straightforward and simple.
  Business application for multiple platforms
      High-Level Language: Portable.
      Assembly Language:   Difficult to maintain.
  Embedded systems and computer games requiring direct hardware access
      High-Level Language: Produces too much executable code, and may not run efficiently.
      Assembly Language:   Ideal, because the executable code is small and runs quickly.

Machine Language
- An assembler is a program that converts ASM code into machine language code:
    mov al,5              (Assembly Language)
    1011000000000101      (Machine Language)
  - the most significant byte is the opcode for "move into register AL"
  - the least significant byte is the operand "5"
- Directly programming in machine language offers no advantage (over Assembly)...
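The split of the machine code for `mov al,5` into an opcode byte and an operand byte can be checked with a short sketch. Python is used here only as a calculator; the variable names `encoding`, `opcode` and `operand` are illustrative and do not come from the slides.

```python
# The 16-bit pattern shown on the slide for "mov al,5".
encoding = 0b1011000000000101

# Most significant byte: the opcode for "move into register AL".
opcode = encoding >> 8

# Least significant byte: the immediate operand.
operand = encoding & 0xFF

print(f"opcode  = {opcode:08b}b = {opcode:02X}h")
print(f"operand = {operand:08b}b = {operand:02X}h = {operand}")
```

In hexadecimal the instruction reads B0h followed by 05h, which is why machine code is usually listed in hexadecimal and not in binary.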
Binary Numbers / Storage Size
- Binary numbers are used to store both code and data
- On Intel's x86:
  - byte = 8 bits (smallest addressable unit)
  - word = 2 bytes
  - doubleword = 2 words
  - quadword = 2 doublewords

Data Representation
- Even if we know that a block of memory contains data, to obtain its value we need to choose an interpretation
- Ex: memory content "0100 0001" can represent either:
  - the number 2^6 + 1 = 65
  - or the ASCII code of the character "A"

Data Representation
- Number Systems
  - Binary/Octal/Decimal/Hexadecimal
- Converting between various number systems
- Signed/Unsigned Interpretation
- Two's Complement
- Addition/Subtraction
- Character Storage

Number Systems
- A written number is meaningful only with respect to a base
- To tell the assembler which base we use:
  - Hexadecimal 25 is written as 25h
  - Octal 25 is written as 25o or 25q
  - Binary 1010 is written as 1010b
  - Decimal 1010 is written as 1010 or 1010d
- You are supposed to know how to convert from one base to another (see appendix A)

Binary Numbers
- Digits are 1 and 0
  - 1 = true
  - 0 = false
- MSB: most significant bit
- LSB: least significant bit
- Bit numbering:
    MSB          LSB
    1011001010011100
    15             0

Converting between various number systems
- Converting Binary to Decimal
- Converting Decimal to Binary
- Converting Binary to Hexadecimal
- Converting Hexadecimal to Decimal

Signed and Unsigned Interpretation
- When a memory block contains a number, to obtain its value we must choose either:
  - the signed interpretation: the most significant bit (msb) represents the sign
    - Positive number (or zero) if msb = 0
    - Negative number if msb = 1
  - the unsigned interpretation: all the bits are used to represent a magnitude (i.e. a positive number, or zero)

Signed Integers
- The highest bit indicates the sign: 1 = negative, 0 = positive
    1 1 1 1 0 1 1 0    Negative (sign bit = 1)
    0 0 0 0 1 0 1 0    Positive (sign bit = 0)
- If the highest digit of a hexadecimal integer is > 7, the value is negative.
  Examples: 8A, C5, A2, 9D

Two's Complement Notation
- Used to represent negative numbers
- The two's complement of a positive number X, denoted by NEG(X), is obtained by complementing all its bits and adding 1:
    NEG(X) = NOT(X) + 1
- Ex: NEG(10) = NOT(10) + 1 = NOT(0000 1010b) + 1 = 1111 0101b + 1 = 1111 0110b = -10
- It follows that X + NEG(X) = 0

Forming the Two's Complement
- Negative numbers are stored in two's complement notation
- The two's complement represents the additive inverse
- Note that 00000001 + 11111111 = 00000000 (the carry out of the msb is discarded)

Binary Subtraction
- To compute the difference X - Y, the machine executes the addition X + NEG(Y):

      00001100            00001100
    - 00000011    =>    + 11111101
                          --------
                          00001001

- Practice: Subtract 0101 from 1001.

Maximum and Minimum Values
- The msb of a signed number is used for its sign
  - fewer bits are left for its magnitude
- Ex: for a signed byte
  - smallest positive = 0000 0000b = 0
  - largest positive = 0111 1111b = 127
  - largest negative = 1111 1111b = -1
  - smallest negative = 1000 0000b = -128

Ranges of Unsigned Integers
- Standard sizes:
  - byte: 8 bits
  - word: 16 bits
  - doubleword: 32 bits
  - quadword: 64 bits
- Practice: What is the largest unsigned integer that may be stored in 20 bits?

Ranges of Signed Integers
- The highest bit is reserved for the sign. This limits the range: with n bits, values run from -2^(n-1) to 2^(n-1) - 1 (for a byte, -128 to 127).
- Practice: What is the largest positive value that may be stored in 20 bits?

Signed/Unsigned Interpretation (again)
- To obtain the value of a number we need to choose an interpretation
- Ex: memory content 1111 1111 can represent either:
  - -1 if a signed interpretation is used
  - 255 if an unsigned interpretation is used
- Only the programmer can provide an interpretation of the content of memory

Character Storage
- Character sets
  - Standard ASCII (0 – 127)
  - Extended ASCII (0 – 255)
  - ANSI (0 – 255)
  - Unicode (0 – 65,535)
- Null-terminated string
  - Array of characters followed by a null byte

ASCII vs Extended ASCII
- The ASCII code (from 00h to 7Fh)
  - Only codes from 20h to 7Eh represent printable characters. The rest are control codes (used for printing, transmission…).
- Extended ASCII character set (codes 80h to FFh)
  - Varies from one system to another
  - MS-DOS usage: accented characters, Greek symbols and some graphic characters

The ASCII character set
- CR = "carriage return" (MS-DOS: move to beginning of line)
- LF = "line feed" (MS-DOS: move directly one line below)
- SPC = "blank space"

Text Files
- These are files containing only ASCII characters
- But different conventions are used for indicating an "end-of-line"
  - MS-DOS: <CR>+<LF>
  - UNIX: <LF>
  - MAC: <CR>
- This is the source of many problems encountered when transferring text files from one system to another

Strings and numbers
- A string is stored as an array of characters
  - A 1-byte ASCII code is stored for each character
- Hence, we can store the number 123 either in numerical form or as the string "123"
  - The string form is best for display
  - The numerical form is best for computations
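The difference between the two storage forms of 123 can be made concrete with a small sketch. It is written in Python only for illustration; the helper `hex_bytes` and the variable names are hypothetical and not part of the course material. The conversion loop assumes the string holds only the ASCII digits "0" to "9" (codes 30h to 39h).

```python
def hex_bytes(data):
    """Format a byte sequence in the slides' notation, e.g. '31h 32h 33h'."""
    return " ".join(f"{b:02X}h" for b in data)

# Numerical form: the value 123 fits in a single byte.
as_number = (123).to_bytes(1, "little")

# String form: one ASCII code per character.
as_string = "123".encode("ascii")

print("numerical form:", hex_bytes(as_number))
print("string form:   ", hex_bytes(as_string))

# Converting the string form back to a number before computing with it:
# each digit's value is its ASCII code minus 30h (the code of "0").
value = 0
for code in as_string:
    value = value * 10 + (code - 0x30)
print("recovered value:", value)
```

The string form takes three bytes and can be sent to the display as is, while the numerical form takes one byte and can be used directly in arithmetic.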