Lesson 04: Main Memory Mechanics

Objectives:
(a) Describe the organization and contents of main memory when a program is being executed.
(b) Demonstrate the ability to analyze a C program and identify the corresponding assembly language instructions generated.
(c) Explain and demonstrate how data is stored in memory for integers, floats and addresses (e.g. little endian).

1. Introduction

In the previous lessons, we introduced some of the basic concepts of a C program at the high level. In this lesson, we will look closely at the low-level mechanics of main memory. Specifically, we will:
- Introduce the memory partition known as the Text Segment, where instructions are stored.
- Introduce the memory partition known as the Stack, where variables are stored.
- Introduce the special processor registers known as eip, ebp and esp, which are used to access the text segment and the stack.

2. Overview of Main Memory and CPU

Recall from a previous lesson that a high-level C program must be converted to binary machine code in order to execute on a CPU. This machine code is stored in a portion of memory called the Text Segment. The variables in the program are stored on the Stack. The x86 processor/CPU (covered in this class) has a set of registers that it uses to support data movement to and from main memory. The three main registers are:

eip: This is the most important register. It is known as the Instruction Pointer or the Program Counter, and it holds the address of the next instruction the CPU intends to execute.

esp: The CPU reserves a section of memory, called the stack, to store values that the CPU might want to retrieve later. The esp register is used to store the address of the "top" of the stack. The name esp stands for extended stack pointer, but it is usually just called the stack pointer.

ebp: This register is called the base pointer.
This CPU register is used to point to the "bottom" of the stack. (To be more precise, we will see later that ebp actually points to the very first address after the bottom of the stack.)

Fig 1. Overview of CPU and memory. Adapted from Patterson and Hennessy, Computer Organization and Design: The Hardware/Software Interface, Elsevier, 5th ed., 2014.

The Central Processing Unit (CPU) of a computer uses a basic three-step cycle:
- Fetch a program instruction from main memory.
- Decode the instruction to determine what actions to take.
- Execute the required actions for the instruction.

3. x86 Assembly Language

Now that you have a general idea of the relationship between C language, assembly language, and machine language, let's explore the actual hardware and software that we will use. In this class, we mainly focus on hardware that runs the x86 instruction set, the so-called x86 chip. This is by far the most common hardware implementation in PCs and servers.

Here is a cheat sheet of common assembly language instructions. You should refer back to it when you later encounter an assembly language instruction that is unfamiliar.

Instruction: mov
Meaning: move
Example: mov DWORD PTR [esp],0x804848a
Explanation: Place the value 0x804848a in the location specified by the address in the esp register.

Instruction: cmp
Meaning: compare
Example: cmp DWORD PTR [ebp],0x4
Explanation: Compare the value 4 to the value stored at the address contained within the ebp register.

Instruction: jne
Meaning: jump if not equal
Example: jne 0x804839f
Explanation: This instruction will always follow a comparison (cmp). If the two items in the prior comparison were not equal, then jump to the instruction stored at address 0x804839f.

Instructions: jle (jump if less than or equal), jl (jump if less than), jge (jump if greater than or equal), jg (jump if greater than)
Example: jle 0x804839f
Explanation: This instruction will always follow a comparison (cmp). If the first item in the prior comparison satisfies the named condition relative to the second item (less than or equal for jle, less than for jl, greater than or equal for jge, greater than for jg), then jump to the instruction stored at address 0x804839f. For example, if the prior comparison was cmp DWORD PTR [ebp],0x4, then jle 0x804839f jumps to the instruction stored at address 0x804839f when the value stored at the address held in the ebp register is less than or equal to 4.

Instruction: jmp
Meaning: jump
Example: jmp 0x804839f
Explanation: Jump to the instruction located at address 0x804839f.

Instruction: inc
Meaning: increment
Example: inc DWORD PTR [eax]
Explanation: Increment by one the value stored at the memory location contained within the eax register.

4. Main Memory

We will briefly discuss the two main portions of main memory: the text segment and the stack.

The Text Segment: Let's start the discussion with the simple C program below.

    #include <stdio.h>

    int main()
    {
        int x = 7;
        x = 2001;
    }

When this C program is compiled, machine code is generated. The machine code is then loaded into memory, specifically into the text segment. Machine language instructions can vary in length. For example, the instruction at address 0x08048345 (0x89 0xe5) is two bytes long, and we know that the size of each memory location is one byte. So, this instruction uses addresses 0x08048345 and 0x08048346. Similarly, the instruction at address 0x08048354 is 7 bytes long; therefore it occupies addresses 0x08048354 to 0x0804835a.

[Figure: text segment listing, with the instructions for int x = 7; and x = 2001; marked. Note 0x7d1 = 2001.]

The Stack: The program's variables are stored on the stack. When an int or a float variable is declared in a C program, four bytes are reserved on the stack between ebp and esp. Note that these variables are stored in little endian order. The little endian approach stores the least significant byte in the first (lowest) address slot, the second-least-significant byte in the next address, and so on.
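The little endian rule above can be sketched in C. This is a minimal illustration, not part of the lesson: the helper names to_little_endian and host_is_little_endian are hypothetical, and the first helper computes the byte order with shifts so that it gives the same answer on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the lesson): write the four bytes of
 * value into out[0..3] in little endian order, i.e. least significant
 * byte first, exactly as they would appear at increasing addresses. */
void to_little_endian(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        /* Shift the wanted byte down to the low 8 bits, then mask it. */
        out[i] = (unsigned char)((value >> (8 * i)) & 0xFF);
    }
}

/* Hypothetical helper: returns 1 if the machine running this code stores
 * integers in little endian order (as x86 does), and 0 otherwise. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);  /* read the byte at the lowest address */
    return first == 1;
}
```

Calling to_little_endian(0x12345678, bytes) from main fills bytes with 0x78, 0x56, 0x34, 0x12, and calling it with 2001 (0x7d1) gives 0xd1, 0x07, 0x00, 0x00.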
For example, suppose we declare an integer variable var as

    int var = 0x12345678;

and assume that var is stored in memory starting at address 0xbffff818, as shown below.

    Address       Content
    0xbffff816
    0xbffff817
    0xbffff818    0x78    <- LSB (Least Significant Byte) of var
    0xbffff819    0x56
    0xbffff81a    0x34
    0xbffff81b    0x12    <- MSB (Most Significant Byte) of var
    0xbffff81c

Now, consider the assembly language instruction below.

    mov DWORD PTR [ebp-4],0x00001234

This assembly language instruction means (in plain English): Move the value 0x1234 into the address pointed to by ebp-4 (the base pointer address, minus 4 bytes). The value will occupy 4 bytes (i.e. DWORD). The four bytes land in little endian order within the stack region 0xBFFFF806 to 0xBFFFF80B:

    Location    Content
    ebp-4       0x34
    ebp-3       0x12
    ebp-2       0x00
    ebp-1       0x00
    ebp

Important notes:
- In this course, storing values in memory in little-endian format ONLY applies to int and float values, and addresses. It does NOT apply to strings, which are composed of ASCII characters that occupy only one byte each.
- For arrays of int or float values, the individual int or float values are stored in little-endian format, but the array elements are stored in order of index: 0, 1, 2, etc.
- Memory can be shown as one byte per row, as in the previous example, or as one word per row, as shown below. The column headings +0 to +3 are address offsets from the address at the start of the row.

    Address       +0      +1      +2      +3
    0xbffff7e8    0xf0    0xf7    0xff    0xbf
    0xbffff7ec    0x39    0x84    0x04    0x08
    0xbffff7f0    0x4f    0x63    0x74    0x6f
    0xbffff7f4    0x62    0x65    0x72    0x00
    0xbffff7f8    0x18    0xf8    0xff    0xbf

What is the address of the byte 0xff in the last row (row 0xbffff7f8, offset +2)? 0xbffff7f8 + 2 = 0xbffff7fa

Example 1: Suppose that the following variables are declared in a C program:

    char initial = 'A';          // = 0x41
    int alpha = 291;             // = 0x123 = 0x00000123
    int grades[2] = {80, 96};    // = {0x50, 0x60}
    char school[5] = "Navy";     // = {0x4E, 0x61, 0x76, 0x79, 0x00}

1. How many total bytes are used to store all of these variables in memory?
(1 + 4 + 2*4 + 5*1) = 18 bytes

2. Once these variables are stored, the stack looks as follows. Complete the memory table below.
    Memory Address    Data at that Memory Address (Hex)    Variable Name
    0xbffff806        'N' = 0x4E                           school
    0xbffff807        'a' = 0x61
    0xbffff808        'v' = 0x76
    0xbffff809        'y' = 0x79
    0xbffff80a        0x00 = NULL
    0xbffff80b        gar                                  gar
    0xbffff80c        0x50                                 grades[0]
    0xbffff80d        0x00
    0xbffff80e        0x00
    0xbffff80f        0x00
    0xbffff810        0x60                                 grades[1]
    0xbffff811        0x00
    0xbffff812        0x00
    0xbffff813        0x00
    0xbffff814        0x23                                 alpha
    0xbffff815        0x01
    0xbffff816        0x00
    0xbffff817        0x00
    0xbffff818        'A' = 0x41                           initial
    0xbffff819        gar                                  gar

Example 2: The register ebp points to the "bottom" of the stack (see the table below). Upon further review of the assembly code you determine that two strings are stored in memory, one at address ebp-40 and the other at ebp-24. (Note that the numbers 40 and 24 are ordinary base-10 numbers, not base-16.)

    Address       +0            +1            +2            +3
    0xbffff7e0    0x02          0x85          0x04          0x08
    0xbffff7e4    0x00          0xf8          0xff          0xbf
    0xbffff7e8    0xf0          0xf7          0xff          0xbf
    0xbffff7ec    0x39          0x84          0x04          0x08
    0xbffff7f0    0x4f = 'O'    0x63 = 'c'    0x74 = 't'    0x6f = 'o'    <- ebp-40
    0xbffff7f4    0x62 = 'b'    0x65 = 'e'    0x72 = 'r'    0x00
    0xbffff7f8    0x18          0xf8          0xff          0xbf
    0xbffff7fc    0xf4          0x5f          0xfd          0xb7
    0xbffff800    0x54 = 'T'    0x65 = 'e'    0x6e = 'n'    0x74 = 't'    <- ebp-24
    0xbffff804    0x68 = 'h'    0x00          0x04          0x08
    0xbffff808    0x35          0x07          0x00          0x00          <- ebp-16
    0xbffff80c    0xf4          0x5f          0xfd          0xb7
    0xbffff810    0xe0          0x0c          0x00          0xb8
    0xbffff814    0x35          0x07          0x00          0x00
    0xbffff818    0x78                                                    <- ebp

a. Determine the string stored at address ebp-40.
"October"

b. Determine the string stored at address ebp-24.
"Tenth"

c. Assume that an integer is stored at address ebp-16. What is the decimal value of this integer?
0x00000735 = 1845
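The arithmetic in part (c) of Example 2 can be checked with a short C sketch. The function name from_little_endian is hypothetical (it does not appear in the lesson). It rebuilds a 32-bit value from four bytes listed in the order they appear in memory, lowest address first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the lesson): combine four bytes, given
 * in memory order (lowest address first), into the 32-bit value that a
 * little endian machine would read from them. */
uint32_t from_little_endian(const unsigned char bytes[4])
{
    assert(bytes != NULL);
    uint32_t value = 0;
    /* Start from the byte at the highest address, which is the most
     * significant one, and shift earlier results left as we go. */
    for (int i = 3; i >= 0; i--) {
        value = (value << 8) | bytes[i];
    }
    return value;
}
```

Passing the bytes 0x35, 0x07, 0x00, 0x00 found at ebp-16 returns 0x00000735, which is 1845 in decimal. Passing 0x23, 0x01, 0x00, 0x00 (the bytes of alpha in Example 1) returns 291.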