Introduction to Windows Memory Management and Pointers in C

Overview:
The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their request, and to free that memory for reuse when it is no longer needed. Modern protected-mode operating systems such as Windows and Linux use what is called a flat memory model. In a 32-bit OS (such as 32-bit Windows 7), every application process has access to 2^32 address locations of 8 bits (1 byte) each, which is 4 GB. Each running application in an OS like Windows 7 is a process; the processes can be seen in the Processes tab of the Windows Task Manager.

Description:
Some of the reasons why pointers seem difficult are:
1. Lack of understanding of memory management in the operating system.
2. Lack of understanding of the assembly code that the C compiler generates for the C source.
3. Non-intuitive syntactic relationships with arrays, structures, and other data types.
4. Non-intuitive pointer arithmetic.
5. Lack of debugging (not examining what the program actually does in a debugger).

5.1 Computer Memory:
Let us examine a few basic points regarding a computer's memory:
1. It is normally an electronic chip, called RAM, connected to the computer's motherboard.
2. It is very fast at reading and writing data.
3. Most importantly, the processor can read data from and write data to memory.
4. Memory can “remember” stored 1s and 0s.
5. Every 8 bits (one byte) of memory can be addressed, which means each byte of memory has its own address.
6. RAM is also called physical memory, and an address of physical memory is called a physical address.

Note: All the discussion refers to a modern 32-bit OS such as 32-bit Windows 7 or a recent 32-bit Linux distribution.
SourceLens.org Copyright. All rights reserved. Content Owner - Meera R (meera at sourcelens.org)
Also assume that the underlying CPU is Intel x86 without Physical Address Extension (PAE), and that the page file (not paging itself) is turned off, so the memory considered here is the storage on the RAM chips.
The fact that part of the address space is reserved for the OS kernel is not taken into account, as it is not relevant to this discussion.

5.2 Windows Memory Management:
Each process can potentially access 4 GB of memory. This does not mean that every process has 4 GB of physical memory; it only means that a process can access at most 4 GB. The 4 GB of one process is completely separate from the 4 GB of another process. Any given address in the 4 GB may or may not be allocated, which is why the 4 GB is called an address SPACE. An application can access or use only memory that has been ALLOCATED in its 4 GB address space. Allocation is done by the OS, either at the application's request (say, a malloc call) or indirectly (on the stack, as with int a[100];).

A particular application process cannot access any memory outside its own 4 GB address space, and it “thinks” that this 4 GB address space is the entire system, which explains the name VIRTUAL address space.

A (virtual) address space contains addresses, so an address is just an integer between 0 and 2^32 - 1. In other words, an address points to a location inside the 4 GB. If a variable (which is itself a memory location) contains an address, we normally call it a pointer in the C programming language. Addresses are normally shown in hexadecimal, so a 32-bit address is any number between 0x00000000 and 0xFFFFFFFF, for example 0x1234ABCD.

5.2.1 Process:
Each process has its own private 4 GB address space. The purpose of this is to isolate one program from another, so that every program behaves as if it were the only one running; the address space is what makes this possible. In short, a process:
1. Has its own 4 GB address space.
2. Is isolated from other programs, which is the whole purpose of that private address space.
3. Behaves as if it were the only program running, because of that address space.

Figure 5.1
The above figure shows the processes that are currently running and being handled by the CPU.
This window can be opened from the Task Manager by selecting its Processes tab. Each of these processes has its own 4 GB address space and can access at most 4 GB.

Two factors mainly complicate both the implementation and the learning of memory management in a modern OS:
1. Protection via segmentation
2. Paging

5.3 Protection using Segmentation and Paging:
In a multitasking operating system, each task needs its own memory region that other tasks cannot access. Memory protection is a way to control memory access rights on a computer, and it is part of most modern operating systems. Its main purpose is to prevent a process from accessing memory that has not been allocated to it.

Protection using segmentation: Segmentation refers to dividing a computer's memory into segments. A reference to a memory location includes a value that identifies a segment and an offset within that segment. The x86 architecture has multiple segmentation features, which help implement protected memory on this architecture.

Protection using paging: It is impossible for an application to access a page that has not been explicitly allocated to it, because every memory address either points to a page allocated to that application or generates an interrupt (strictly, a CPU exception) called a page fault.

Note: The reason why the same address can hold different values in two different processes is that each process's page table contains different entries for the same virtual addresses. So when the CPU translates a virtual address to a physical address, it obtains different physical addresses for the two processes.

5.4 Allocation:
In the OS, memory management involves the allocation (and constant reallocation) of specific memory blocks to individual programs as user demands change.
The 4 GB address space is divided as follows:
1. Allocated/committed regions: consist of the working set (pages currently in RAM) and paged-out regions.
2. Free regions: contain no allocation and are just 'space'. No bookkeeping or counters of any kind are kept for them by the OS, the CPU, or any other program. Any attempt to access a free region results in a CPU exception, which the OS handles.
3. Reserved regions: used to avoid fragmentation. Reserving a region just adds a VAD (Virtual Address Descriptor) to the process; once a region is reserved, the OS will not give out any address inside it for any other allocation in that address space. A reserved region has to be committed (allocated) before use; until then it behaves the same as a free region.
Note: A region means a contiguous group of addresses.

5.5 What is a pointer?
A pointer is a variable that contains the address in memory of another variable. We can have a pointer to any variable type. The unary (monadic) operator & gives the address of a variable. The indirection or dereference operator * gives the contents of the object pointed to by a pointer.

A pointer is a variable of size 32 bits (in a 32-bit OS) that normally contains the address of another variable or block of memory. In other words, we (programmers) interpret the 32-bit value in a pointer variable as an address. In practice a pointer can contain any number up to 2^32 - 1. It is like a visiting card, which normally contains the name and address of a person: one could print a visiting card with a list of one's favourite TV shows or something very different, but normally we don't.

Note: A pointer differs from an ordinary number in two ways:
1. Its size is always equal to the bitness of the OS: 32 bits in a 32-bit OS and 64 bits in a 64-bit OS (more precisely, the bitness of the process; a 32-bit program running on 64-bit Windows still uses 32-bit pointers).
2. The language/compiler supports certain special operations on it, such as differencing and pointer arithmetic.
Syntax to declare a pointer:
type *var_name;
Here, type is the pointer's base type; it must be a valid C data type, and var_name is the name of the pointer variable. The asterisk * used to declare a pointer is the same asterisk used for multiplication; however, in this statement the asterisk designates the variable as a pointer. For example:
int *ptr;
char *name;

5.5.1 What does the pointer point to?
Anything in the 4 GB address space described above: invalid (unallocated) locations, allocated locations, or random locations. An uninitialized int *ptr; can contain anything. A pointer:
1. Can be 0/NULL, e.g. int *ptr = 0;
2. Can point to an allocated chunk of uninitialized memory on the stack, e.g. int a[100]; int *ptr = a;
3. Can point to an allocated chunk of uninitialized memory on the heap, e.g. int *ptr = malloc(100);
4. Can point to another set of pointers in allocated memory, like a visiting card that has the address of a place where another set of visiting cards is kept.

Note: Pointers are mainly used to share data (mostly large chunks of data) between different parts of an application. Most languages have pointers but call them something else; for example, a reference in Java or C# is essentially a pointer. C and C++ are notorious for pointers because they allow many counter-intuitive syntaxes for manipulating and accessing the data a pointer points to, while languages like Java or C# normally provide well-defined, intuitive ways to do the same tasks.

Diagram: Pointer in the address space (a logical view of the entire system, not necessarily of physical memory).
int* ptr = malloc(100); makes the pointer variable ptr point to a 100-byte chunk allocated in the address space of the process. The number inside ptr is 0x789AC000, which is the starting address of the 100-byte chunk. Notice that ptr itself has an address (0x1234ABCD), which is in the same address space, near HelloWorld.exe. The diagram also distinguishes parts of the process's address space that have RAM allocated from parts that have no RAM allocated.

Now consider a program in which an integer pointer stores the address of an integer variable, and the assembly code generated for it.

Figure 5.2
The above figure shows the Memory window, which is part of the Visual Studio debugger. It lets you view any address in the process's address space, both allocated regions and regions that are free to be allocated. The address space ranges from 0x00000000 to 0xFFFFFFFF, which is 4 GB. The frame shows memory one byte at a time.

Figure 5.3
The above figure shows a simple program in which an integer pointer ptr stores the address of the integer variable a. The & operator gives the address of a. The value in the Address box of the Memory window is the address of a.

Figure 5.4
The above figure shows the assembly code generated for the pointer program, in the Disassembly window. The value 20 is first assigned to a, and then the address of a is stored into the pointer variable ptr itself. In the operand dword ptr [ptr], "dword ptr" means a 32-bit memory operand and [ptr] is the memory location of the variable ptr; it does not refer to the location that ptr points to.

5.5.2 Relation between pointers and protection techniques like paging and segmentation:

Diagram: Relation between pointers and segmentation/paging. When the CPU executes a program statement such as *ptr = 100, the address in ptr is a virtual address. The segmentation unit is practically turned off, so the same virtual address is passed on to the paging system, which translates the virtual address (VA) into a physical address (PA) in RAM (physical memory).

Paging: In paging, the memory address space is divided into equal-sized blocks called pages.
The program being executed is stored in physical memory. Using virtual memory hardware, each page can reside in any location of the computer's physical memory, or be flagged as protected. Virtual memory makes it possible to have a linear virtual address space and to use it to access blocks that are fragmented over the physical address space.

Most computer architectures that support paging also use pages as the basis for memory protection. A page table maps virtual memory to physical memory, and it is usually invisible to the process. Page tables make it easier to allocate additional memory, as each new page can be allocated from anywhere in physical memory. As described in Section 5.3, accessing a page that has not been allocated to the application generates a page fault: unallocated pages, and pages allocated to any other application, do not have any addresses from the application's point of view.

5.6 Division of an allocated/committed region:
In the general technique called region-based memory management, each allocated object is assigned to a region. A region, also called a zone, arena, area, or memory context, is a collection of allocated objects that can be efficiently deallocated all at once. An allocated/committed region of the process address space can be divided in two ways.

Based on backing store:
Working set: backed by RAM.
Paged out: backed by the page file.

Based on the nature of use (programming aspects):
Stack: the stack of a thread. For example, a local variable in C uses the stack of the thread in which the code runs.
Heap: memory returned by the malloc() function in C is allocated from the heap.
Mapped: an exe, dll, or sys file is mapped into the 4 GB address space when the program starts.
So global and static variables in C are allocated from the mapped space, which is part of the exe or dll.
Shared: any region can be shared between processes, or between a process and the kernel, using the Windows API.

Diagram: Division of a committed/allocated region. The address space contains committed (allocated) regions and free or reserved regions. A committed region can be a mapped region, an NT heap region, a thread stack, or a shared region, and its pages are either in the working set (RAM) or paged out (page file).