TTIT61 Lab lesson: Memory Issues in Pintos
Sergiu Rafiliu
serra@ida.liu.se
phone: 282281, room: B 328:228


Outline

Memory Issues in Pintos
  - Concept of Virtual Memory
  - User Space vs. Kernel Space in the Virtual Memory
  - Virtual Memory vs. Physical Memory
  - Page Table
Validating user-space pointers
  - The Size of Pointed Data
  - Validating strings
  - Validating buffers
Working with the stack
  - Putting syscall arguments on the stack
  - Reading syscall arguments from the stack


The Concept of Virtual Memory

Virtual memory: a process is not aware of the main memory of the system and does not need to manage it; this is done by the kernel. Each process "thinks" that it has a very large amount of memory (4 GB in Pintos) all to itself. It does not see the memory used by other processes.

This concept is used to:
  - ease the job of user-process programmers
  - ensure correct and safe use of the system's resources
  - ensure a fair distribution of resources among processes.

In reality, many processes have to share a much smaller main memory.


User Space vs. Kernel Space in the Virtual Memory

Each user process has a virtual memory of 4 GB. The memory is split into two parts:
  - Kernel Space: where the kernel stack and the thread structure are stored.
  - User Space: where the user program and the user stack are stored.

    0xFFFFFFFF
                  Kernel Space
    0xC0000000    PHYS_BASE
                  User Space
    0x00000000


User Space Layout

  - Code Segment: the part of the virtual memory where the program's object code is stored.
  - Initialized Data Segment: the part of the virtual memory where the initialized global variables of the program are stored.
  - Uninitialized Data Segment (BSS): the part of the virtual memory where the uninitialized global variables and the dynamically allocated variables (the ones allocated with malloc) are stored. This part of the memory grows upwards.
  - User Stack: the part of the virtual memory where the process's stack is kept. The stack is used to pass arguments to functions and to hold local variables. The user stack grows downwards.

    PHYS_BASE
                  User stack                          <- user stack pointer (f->esp)
                  ...
                  Uninitialized data segment (BSS)
                  Initialized data segment
                  Code segment
    0x08048000
    0x00000000


Kernel Space Layout

The Kernel Space in the virtual memory of a user process is placed above the PHYS_BASE address, and it contains:
  - The thread structure for the current thread.
  - The kernel stack. Every time an interrupt occurs, the state of the CPU before the jump is pushed onto the kernel stack. The CPU state is stored in an intr_frame structure.

The kernel stack and the thread structure together use 4 kB of memory, which is the size of a memory page. Only the first page in the Kernel Space is used.

    0xFFFFFFFF
                  (not used)
    0xC0000FFF
                  Kernel stack                        <- kernel stack pointer (f)
                  intr_frame structure                   (4 kB in total)
                  thread structure
    PHYS_BASE


Virtual Memory (VM) vs. Physical Memory (PM)

PROBLEM: Each user has its own virtual memory, but all the users have to run on the same machine, at the same time, using the machine's physical memory.

[Figure: the virtual memories of User 1 ... User n, the CPU, and the single PM.]

SOLUTION: Split the virtual memory of each process into pages and load into the physical memory only the pages that are needed.

[Figure: the virtual memories of User 1 ... User n, each with an unused region, Kernel Space and User Space, with individual pages mapped into the PM.]


Virtual Memory vs. Physical Memory

When the kernel switches control to a user process, the process's virtual memory is loaded into the physical memory in the following way:
  - The Kernel Space is mapped into the physical memory starting at address 0x00000000.
  - The pages in the User Space are mapped as required into the rest of the memory.
  - The link between an address in the virtual memory and the address in the physical memory is made by the page table.


Page Table

[Figure: User VM (0xFFFFFFFF, NOT USED, PHYS_BASE, ..., 0x00000000) connected to the PM through the page table. A virtual page maps to a physical page if it is allocated, otherwise to NULL.]


Validating User-Space Pointers

Whenever a system call is issued, the user thread stops and passes the call to the kernel thread to be served. The kernel takes several pointer arguments (pointing into user-space memory) in order to serve the call, and these pointers must be validated before they can be used.

When validating pointers, three things must be checked:
  - The pointer is not NULL:
        ptr != NULL
  - The pointer is in the user-space part of the virtual memory, and not in the kernel space:
        is_user_vaddr( ptr )
  - The virtual memory page containing the pointer is loaded in the physical memory:
        pagedir_get_page( curr_thread_pgdir , ptr ) != NULL

[Figure: pointers a, b, c and d in the User VM, looked up through the page table (mapped to the PM if allocated, otherwise NULL). a, b, d: invalid. c: valid.]


The Size of Pointed Data

Every pointer points to a data type that has a certain size in memory. Pintos is made to run on x86 machines, which are 32-bit machines. Memory is read in 4 B words, even though every byte is independently addressable. The size of a memory page is therefore a multiple of 4 B.

If the data type a pointer points to is 4 B or less in size (and lies within a single 4 B block), a simple pointer validation is enough to ensure that the data is loaded in memory.

If we have pointers to large data types (like structures, arrays, buffers or strings), additional checks must be performed to ensure the validity of the data. The problem is that such a large type can spread over several pages in memory.
  - By checking that a pointer is valid, we check that the page it points into is loaded in memory.
  - If one pointer into a page is valid, all other possible pointers into that page are valid.
  - For a large type, pointers to the parts of the data outside the starting page must also be checked for validity.


The Size of Pointed Data: a memory page

  - A memory page has 4 kB of memory, split into 4 B blocks.
  - When reading the data a pointer points to, a 4 B block of memory is read.
  - If a pointer points to the beginning of a large data type (like structures, arrays, buffers or strings) spanning several blocks, the pointer is valid if all the blocks are readable (true if the blocks are in the same page as the first one).
  - If the large data type spreads over several pages, all pages must be checked for validity.

    Memory page in User VM (each line is one 4-byte block):

    0xB3A7DFFC
    ...
    0xB3A7D300    <- ptr
    0xB3A7D2FC
    0xB3A7D2F8
    ...
    0xB3A7D004
    0xB3A7D000

    Address 0xB3A7D300 = page number 0xB3A7D, offset 0x300.


Validating Strings

A string is a char array that ends with the special character '\0'. A string can span several pages, and all of them must be loaded in the physical memory for the string to be valid.

DO NOT use string functions (e.g. strlen) on a string before it has been validated.

Check a pointer in each page of the string in order to validate it.

[Figure: str starting in Page N of the User VM and running through Pages N+1, N+2 and N+3. Check one pointer in each of these pages.]


Validating Buffers

A buffer is a memory zone of a given size. A buffer can also span several pages, and all of them must be loaded in the physical memory for the buffer to be valid.

Based on the size of the buffer, determine the number of pages it spans, and check a pointer in each page in order to validate it.

[Figure: buff starting in Page N of the User VM and running through Pages N+1, N+2 and N+3. Check one pointer in each of these pages.]


Putting syscall arguments on the stack

This is already implemented in the wrapper file lib/user/syscall.c, in the macros syscall0(), syscall1(), syscall2() and syscall3(). They are called by the wrapper functions for the syscalls and put the syscall arguments on the stack in the following way:
  - The arguments (which are 4-byte integers or pointers) are placed on the stack in reverse order.
  - The syscall number is also pushed onto the stack.

    PHYS_BASE
                  User stack
                  ...
                  second argument
                  first argument
                  syscall number                      <- f->esp
    0x00000000


Reading syscall arguments from the stack

When implementing the system calls, that is, when implementing the syscall_handler function, the syscall arguments must be read from the stack.
  - f->esp represents the stack pointer.
  - Read the syscall number from the stack, and then each argument.

NOTE:
  - You don't know the number of arguments in advance, so read the syscall number and then treat each case differently.
  - You must validate the pointers that you read.
  - You only read 4 B integers (syscall number, file size, file descriptor, exit status ...) and pointers, which are also 4 B in size (char * for filenames and void * for read/write buffers). Therefore you always advance 4 bytes to reach the next argument.
  - THE ACTUAL FILENAMES AND BUFFERS ARE NOT PLACED ON THE STACK.
  - DON'T MODIFY THE STACK POINTER. Use (int32_t *)f->esp + n to get a pointer to the n'th argument of the system call.


Stack example

Consider this line of code:

    create("file.txt", 1000);

It creates the file "file.txt" with a size of 1000 bytes.

NOTE: create has the syscall number 9.

The stack for this call will look like this:

    PHYS_BASE
                  User stack
                      1000
                      0xC086703D
                      9                               <- f->esp
                  Uninitialized data segment (BSS)
                  Initialized data segment
                      "file.txt"                      (at address 0xC086703D)
                  Code segment
    0x08048000
    0x00000000
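

Sketch: the three pointer checks

The three checks listed under Validating User-Space Pointers can be combined into one helper. The following is a minimal sketch, not Pintos code. So that it compiles outside Pintos, is_user_vaddr() is re-implemented as a stand-in, the page-table lookup (pagedir_get_page(...) != NULL) is replaced by a toy table, and addresses are held in uint32_t. The names loaded_pages, page_is_loaded and valid_user_ptr are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PHYS_BASE 0xC0000000u /* first kernel-space address, as on the slides */
#define PGSIZE 4096u          /* one 4 kB page */

/* ASSUMPTION: a toy "page table" that lists the page numbers currently loaded. */
static const uint32_t loaded_pages[] = { 0x08048u, 0xB3A7Du };

/* Stand-in for Pintos' is_user_vaddr(): true for addresses below PHYS_BASE. */
static bool is_user_vaddr(uint32_t vaddr) {
  return vaddr < PHYS_BASE;
}

/* Stand-in for pagedir_get_page(...) != NULL: is the page holding vaddr loaded? */
static bool page_is_loaded(uint32_t vaddr) {
  uint32_t page_number = vaddr / PGSIZE; /* drop the 12-bit offset */
  for (size_t i = 0; i < sizeof loaded_pages / sizeof loaded_pages[0]; i++)
    if (loaded_pages[i] == page_number)
      return true;
  return false;
}

/* The three checks, in the order given on the slide. */
static bool valid_user_ptr(uint32_t ptr) {
  return ptr != 0 && is_user_vaddr(ptr) && page_is_loaded(ptr);
}
```

With this toy table, 0xB3A7D300 is accepted, while NULL, the kernel address 0xC0000010 and the unmapped address 0xB3A7E000 are rejected.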
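

Sketch: validating a buffer page by page

The rule from Validating Buffers (work out which pages the buffer spans, then check one pointer in each) might look like the sketch below. It is an illustration under assumptions. valid_user_ptr is a hypothetical stand-in for the three pointer checks, with only pages 0xB3A7D and 0xB3A7E treated as loaded, and valid_user_buffer is a hypothetical name. Treating an empty buffer as valid is a design choice of this sketch, not something the slides state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PHYS_BASE 0xC0000000u /* first kernel-space address */
#define PGSIZE 4096u          /* one 4 kB page */

/* ASSUMPTION: stand-in for the three pointer checks; only pages
   0xB3A7D and 0xB3A7E count as loaded. */
static bool valid_user_ptr(uint32_t ptr) {
  uint32_t page_number = ptr / PGSIZE;
  bool loaded = page_number == 0xB3A7Du || page_number == 0xB3A7Eu;
  return ptr != 0 && ptr < PHYS_BASE && loaded;
}

/* Check one address in every page touched by [buf, buf + size). */
static bool valid_user_buffer(uint32_t buf, uint32_t size) {
  if (size == 0)
    return true;                   /* assumption: an empty buffer touches no memory */
  uint32_t last = buf + (size - 1); /* address of the last byte */
  if (last < buf)
    return false;                  /* the range wraps around the 4 GB space */
  if (!valid_user_ptr(buf))
    return false;                  /* first page, plus the NULL check */
  for (uint32_t page = buf / PGSIZE + 1; page <= last / PGSIZE; page++)
    if (!valid_user_ptr(page * PGSIZE))
      return false;                /* a later page is missing or in kernel space */
  return true;
}
```

A buffer of 0x100 bytes starting at 0xB3A7DFF0 spans two loaded pages and passes. A buffer of 0x20 bytes starting at 0xB3A7EFF0 runs into the unloaded page 0xB3A7F and fails.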
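

Sketch: validating a string without strlen

Because the length of a string is unknown until its '\0' is found, the check from Validating Strings has to walk the string and validate each new page before reading from it. The sketch below simulates two loaded user pages in an ordinary array so that it runs outside Pintos. sim_memory, SIM_BASE, read_user_byte, put_string and valid_user_string are all hypothetical names, and valid_user_ptr is a stand-in for the three pointer checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PHYS_BASE 0xC0000000u /* first kernel-space address */
#define PGSIZE 4096u          /* one 4 kB page */
#define SIM_BASE 0xB3A7D000u  /* ASSUMPTION: the loaded pages start here */
#define SIM_PAGES 2u          /* ASSUMPTION: exactly two pages are loaded */

static char sim_memory[SIM_PAGES * PGSIZE]; /* simulated user memory, all '\0' at start */

/* Stand-in for the three pointer checks against the simulated pages. */
static bool valid_user_ptr(uint32_t ptr) {
  return ptr != 0 && ptr < PHYS_BASE
         && ptr >= SIM_BASE && ptr - SIM_BASE < SIM_PAGES * PGSIZE;
}

/* Read one byte; only call this on an address that has been validated. */
static char read_user_byte(uint32_t ptr) {
  return sim_memory[ptr - SIM_BASE];
}

/* Helper for experiments: copy a C string, terminator included, to addr. */
static void put_string(uint32_t addr, const char *s) {
  memcpy(&sim_memory[addr - SIM_BASE], s, strlen(s) + 1);
}

/* Walk the string; validate every page before its first byte is read. */
static bool valid_user_string(uint32_t str) {
  if (!valid_user_ptr(str))
    return false;
  for (uint32_t p = str;; p++) {
    if (p % PGSIZE == 0 && !valid_user_ptr(p))
      return false; /* crossed into a page that is not usable */
    if (read_user_byte(p) == '\0')
      return true;  /* terminator found inside validated memory */
  }
}
```

The walk stops either at the terminator or at the first page boundary that fails validation, so an unterminated string that runs off the end of the loaded pages is rejected without reading unmapped memory.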
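

Sketch: reading the syscall number and arguments

The expression (int32_t *)f->esp + n from Reading syscall arguments from the stack can be wrapped in a small accessor. The sketch below uses a reduced stand-in for struct intr_frame with only the esp field, so it compiles outside Pintos. syscall_arg, struct create_args and decode_create are hypothetical names, and the value 9 for create is taken from the stack example. In a real handler each stack slot would also be validated as a user pointer before it is dereferenced; that step is left out here.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: reduced stand-in for Pintos' struct intr_frame; only esp is modelled. */
struct intr_frame {
  void *esp; /* saved user stack pointer */
};

#define SYS_CREATE 9 /* syscall number used in the slide example */

/* Slot 0 is the syscall number, slot n (n >= 1) is the n'th argument.
   The stack pointer itself is never modified. */
static int32_t syscall_arg(const struct intr_frame *f, int n) {
  const int32_t *slot = (const int32_t *) f->esp + n; /* advances 4 bytes per slot */
  return *slot;
}

/* Hypothetical decoded form of a create(name, size) call. */
struct create_args {
  uint32_t name; /* user-space address of the file name, not the name itself */
  int32_t size;  /* initial file size in bytes */
};

/* Returns 1 and fills *out if the frame holds a create call, else returns 0. */
static int decode_create(const struct intr_frame *f, struct create_args *out) {
  if (syscall_arg(f, 0) != SYS_CREATE)
    return 0;
  out->name = (uint32_t) syscall_arg(f, 1);
  out->size = syscall_arg(f, 2);
  return 1;
}
```

For the stack from the example (9, 0xC086703D, 1000, read upwards from f->esp), decode_create yields name = 0xC086703D and size = 1000. The name is still only an address and has to be validated as a string before use.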