By Anand George
SourceLens.org Copyright. All rights reserved. Content Owner - Meera R (meera at sourcelens.org)

Agenda
Look at the different factors that make pointers in C difficult to learn.
Address them one by one to build a clear and in-depth understanding of the concepts.

Why do pointers look difficult?
Lack of understanding of memory management in the operating system.
Lack of understanding of the assembly generated by the C compiler for the C source code.
Non-intuitive syntax relations with arrays, structures and other data types.
Non-intuitive pointer arithmetic.
Lack of debugging experience.

Note
All of the discussion refers to a modern 32-bit OS such as Windows 7 32-bit or a recent 32-bit Linux variant.
Also assume that the underlying CPU is an Intel x86 with no Physical Address Extension (PAE).
Also assume that the page file (not paging itself) is turned off, so when I say memory I mean RAM chip storage.
The fact that some portion of the address space is reserved for the OS kernel is left out of the picture, as it is not relevant to the discussion.
Don't worry if this does not make much sense to you now.

What is computer memory?
Normally an electronic chip connected to the motherboard of the computer, called RAM.
Very fast at reading and writing data.
Most importantly, the processor can read data from and write data to memory.
Memory can "remember" stored 1s and 0s.
Every 8 bits (one byte) of memory can be addressed, which means each byte of memory has an address.
RAM is also called physical memory.
An address in physical memory is also called a physical address.
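As a small illustration of byte addressing, the following C sketch walks over the bytes of a value and prints the address of each one. The function names byte_address_step and print_bytes are made up for this example and are not part of the original slides. Keep in mind that the addresses a program prints are the addresses it sees from inside its own process, which are not necessarily the physical addresses of the RAM chip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: distance, in address units, between byte i and
   byte i + 1 of an object. In C this is always 1, because consecutive
   bytes have consecutive addresses. */
ptrdiff_t byte_address_step(const void *object, size_t i)
{
    const unsigned char *bytes = (const unsigned char *)object;
    return &bytes[i + 1] - &bytes[i];
}

/* Hypothetical helper: print the address and the content of every byte
   that makes up an object of the given size. */
void print_bytes(const void *object, size_t size)
{
    const unsigned char *bytes = (const unsigned char *)object;

    assert(object != NULL);
    for (size_t i = 0; i < size; i++) {
        printf("address %p holds 0x%02X\n",
               (const void *)&bytes[i], (unsigned)bytes[i]);
    }
}
```

Calling print_bytes(&value, sizeof value) on a uint32_t from main prints four lines whose addresses differ by exactly one. The order in which the byte values appear depends on the CPU's byte order, so no particular output is claimed here.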
Windows Memory Management
Modern protected-mode operating systems like Windows or Linux use something called the flat memory model.
In a 32-bit OS (like Windows 7 32-bit), every application process has access to 2^32 address locations of 8 bits (1 byte) each, which is 4 GB.
Each running application in an OS like Windows 7 is a process.
You can see the processes in the Processes tab of the Task Manager in Windows.

Demo
Task Manager processes.
Attach Visual Studio to different processes to see the address space of each process.
Check a particular address in each process and see whether the value is the same or different. If different, why?

Process
4 GB address space.
The whole purpose is isolation of one program from another.
Every program feels like it is the only one running. The address space gives it that feeling.

Windows Memory Management
Mainly 2 factors complicate memory management implementation and learning in a modern OS:
1. Protection via segmentation
2. Paging
These 2 features of the CPU help the OS to implement the process, or the 4 GB virtual address space.

Segmentation and paging
Segmentation helps mainly with protection. It makes sure that OS memory is not accessed by application programs.
Paging also has similar features and is mostly responsible for the implementation of the virtual address space.
Not going into details now.

All we need to know is
Every application has a potential 4 GB address space.
Each address space is different and maps to different physical memory.
When we say memory, address, etc. in the context of a program running in Windows, we are mostly referring to some portion of the virtual address space of that program.
All the code, data and any other information related to that application is inside the 4 GB address space.
From an application's standpoint, the address space is the 'universe'.

Protection
In a multitasking operating system each task needs its own memory region which other tasks cannot access.
Just like in a town, each family needs a house of its own to live in.
Think about everyone in a town living in the same one big house.
So, like every individual, every task running in an OS needs its own space.

Protection (cont)
It is practically impossible for an OS to implement protection without assistance from the CPU.
Modern OSes (like Windows or Linux) use the CPU features called "paging" and "segmentation" to implement protection.

A glimpse into paging and segmentation
Features of modern CPUs like Intel Pentium, AMD64 or ARM (slightly different terminology in ARM, but the same concept).
They help the OS to implement mainly 2 things:
Protection, mostly via segmentation.
Virtual memory (to extend RAM to disk in a way that is transparent to programs and programmers).
We are not going into too much detail.

Windows Memory Management (cont)
Each process can potentially access 4 GB of memory.
This does NOT mean that every process has 4 GB of physical memory.
It just means that a process can access at most 4 GB.
The 4 GB of one process is completely different from the 4 GB of another process.
Any address in the 4 GB may or may not be allocated, which is why the 4 GB is called an address SPACE.

Windows Memory Management (cont)
An application can access or use only memory which is ALLOCATED in the 4 GB address space.
Allocation is done by the OS at the request of the application (say, a malloc) or indirectly (on the stack, like int a[100];).
An application process cannot access any memory outside its 4 GB address space, and it "thinks" that the 4 GB address space is the entire system, which explains the name VIRTUAL address space.

Windows Memory Management (cont)
The (virtual) address space contains addresses.
So an address is just an integer between 0 and 2^32 - 1.
In other words, "an address points to a location inside the 4 GB".
If a variable (which is a memory location) contains an address, we normally call it a pointer in the C programming language.
Addresses are normally shown in hexadecimal format.
So a 32-bit address is any number between 0x00000000 and 0xFFFFFFFF, for example 0x1234ABCD.

Now, why did we get different values at the same address in two different processes?
The page tables of the two processes contain different entries for the virtual address we looked at.
So when the CPU did the virtual-to-physical address translation, we got different physical addresses.
Details coming up.

Mapping of addresses to physical memory
[Diagram: the address spaces of myapp.exe, notepad.exe and word.exe are mapped onto RAM through the paging system of the CPU, which uses page tables.]

What is allocation?
More or less, adding a page table entry to the page table.
A page table entry maps a virtual address either to a physical address or to virtual memory, which is nothing but a file on the disk.

Divisions in 4 GB
Allocated / committed regions
○ Working set
○ Paged out
Free regions
○ No allocation
○ Just "space"
○ No counters or anything else kept by the OS, the CPU or any other program.
○ Any attempt to access one will result in a CPU interrupt, which the OS can handle.
○ Don't confuse these with free physical page frames in Windows.
Reserved regions
○ Used to avoid fragmentation.
○ Have to be allocated before use; otherwise the same as free regions.
○ Reserving just adds a VAD (virtual address descriptor) to the process.
○ Once a region is reserved, the OS won't give any address inside it to any other allocation that happens in that address space.
Note: a region means a contiguous group of addresses, say from 1000 to 2000. It is not a very standard term; I have heard the word segment used as well. Don't confuse it with ARM regions.

Address spaces in the system
[Diagram: the address spaces of Notepad.exe and HelloWorld.exe, each running from 0x00000000 to 0xFFFFFFFF, drawn inside the entire system (a logical view, not necessarily memory). Address labels, top to bottom: 0xFFFFFFFF, 0xD0000000, 0xC0000000, 0x789AC000, 0x789AB123, 0x789AB000, 0x1234ABCD, 0x00000000, with several ranges marked as allocated chunks. Legend: address space of a process with no RAM allocated to it; address space of a process which has RAM allocated.]

Demo
Process Explorer: see the different processes and the amount of RAM each uses, and look at the different memory counters.
VMMap: look at the address space.
RAMMap: see how the RAM is being used in the system.
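The two kinds of allocation mentioned in the earlier slides (a malloc request, and an indirect allocation on the stack such as int a[100];) can be observed from inside a program. The following is only a minimal sketch: the function names address_as_number and show_allocations are hypothetical, and the addresses printed will differ between systems and between runs.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: view an address as the plain integer it is. */
uintptr_t address_as_number(const void *address)
{
    return (uintptr_t)address;
}

/* Hypothetical demo: make one indirect (stack) allocation and one
   requested (heap) allocation, then print where each one landed.
   Returns 0 on success, -1 if malloc could not allocate. */
int show_allocations(void)
{
    int a[100];                                /* allocated indirectly, on the stack */
    int *block = malloc(100 * sizeof *block);  /* allocated on request */

    if (block == NULL) {
        return -1;
    }

    a[0] = 1;       /* both regions are allocated, so both can be used */
    block[0] = 2;
    assert(a[0] + block[0] == 3);

    printf("stack array starts at 0x%" PRIxPTR "\n", address_as_number(a));
    printf("heap block starts at  0x%" PRIxPTR "\n", address_as_number(block));

    free(block);    /* hand the heap allocation back */
    return 0;
}
```

Calling show_allocations() from main typically shows two addresses far apart from each other, both of them simply numbers inside the process's own address space.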
So what is a pointer after all?
It is an allocated memory location, 32 bits in size, in the 4 GB address space which an application process has in Windows/Linux (just like an int).
A pointer contains a number less than 2^32, which a C programmer normally interprets as the address of some other allocation in the same 4 GB address space.

Pointer
A pointer is a variable of size 32 bits (in a 32-bit OS) which normally contains the address of another variable or block of memory.
In other words, we (programmers) interpret the 32-bit value in a pointer variable as an address.
In practice a pointer can contain any number up to 2^32 - 1.
It is like a visiting card, which normally contains the name and address of a person. One could print a visiting card with a list of favorite TV shows or something very different, but normally we don't.

How do I declare a pointer variable in C?
int* iptr;
char* cptr;
In general, xxx* xptr; where xxx stands for a type of data.

Difference between a pointer and a number
Pointer size: matches the bitness the program is built for, 32 bits on a 32-bit OS and 64 bits for a 64-bit program on a 64-bit OS.
Pointers support certain special operations such as dereferencing and pointer arithmetic, provided by the language/compiler.

Demo
Pointers in a Visual Studio C program.

Understanding pointers
A pointer behaves almost like a card or paper which contains the address of a place.
In this case the addresses are numbers, not text.
Although it sounds weird, a pointer can also contain the address of another set of cards.
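The declarations and the "card" analogy above can be tried out with a short C sketch. The function names below are invented for illustration and do not come from the slides. The sizes printed depend on the build: the slides assume a 32-bit target where a pointer is 4 bytes, while a 64-bit build will report 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: number of bytes the address moves when a pointer
   to int is incremented by one (pointer arithmetic scales by the type). */
size_t int_pointer_step(void)
{
    int values[2] = {10, 20};
    int *iptr = values;
    const char *before = (const char *)iptr;
    const char *after = (const char *)(iptr + 1);

    return (size_t)(after - before);
}

/* Hypothetical helper: follow a card that holds the address of another
   card, which in turn holds the address of an int. */
int read_through_two_levels(int **card_of_cards)
{
    assert(card_of_cards != NULL && *card_of_cards != NULL);
    return **card_of_cards;
}

/* Hypothetical demo: declare pointers and show their size and content. */
void show_pointer_basics(void)
{
    int number = 42;
    char letter = 'A';
    int *iptr = &number;    /* holds the address of number */
    char *cptr = &letter;   /* holds the address of letter */
    int **pptr = &iptr;     /* holds the address of another pointer */

    printf("sizeof(int *) = %zu, sizeof(char *) = %zu\n",
           sizeof iptr, sizeof cptr);
    printf("iptr holds %p and *iptr is %d\n", (void *)iptr, *iptr);
    printf("cptr holds %p and *cptr is %c\n", (void *)cptr, *cptr);
    printf("pptr holds %p and **pptr is %d\n", (void *)pptr, **pptr);
}
```

Dereferencing (*iptr) and pointer arithmetic (iptr + 1) are the special operations that separate a pointer from a plain number: adding 1 to an int pointer moves the address by sizeof(int) bytes, not by 1.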
What is a pointer pointing to?
Anything in the 4 GB address space we saw.
It can point to invalid, unallocated, allocated or random locations.
int *ptr; in this case ptr is uninitialized and can contain anything.
It can be 0/NULL: int *ptr = 0;
It can point to an allocated chunk of uninitialized memory on the stack, for example int a[100];

What is a pointer pointing to? (cont)
It can point to an allocated chunk of uninitialized memory on the heap, for example int *ptr = malloc(100);
It can point to another set of pointers in allocated memory, like a visiting card that has the address of a place where another set of visiting cards is available.

What are the main uses of pointers?
Sharing data between different parts of an application (mostly huge chunks of data).
It is like giving out the address of a house or business location printed on a business card (the pointer).

Sharing data between different parts of an application is a common task which is required in other languages as well, like Java and C#. Then why don't they have pointers?
Most languages have pointers, but they call them something else. For example, a reference in Java or C# is essentially a pointer.
The reason C and C++ are notorious for pointers is that they allow a lot of counter-intuitive syntax to manipulate and access the data pointed to by the pointer, while languages like Java or C# normally provide well-defined, intuitive ways to do the same task.

Pointer in the address space
int* ptr = malloc(100);
The pointer variable ptr points to a chunk of 100 bytes.
The number inside ptr is 0x789AC000, which is the starting address of the allocated 100-byte chunk.
Notice that ptr itself has an address (0x1234ABCD), which is in the same address space.
[Diagram: the address space of HelloWorld.exe inside the entire system (a logical view, not necessarily memory). The variable int* ptr sits at address 0x1234ABCD and points to the allocated chunk starting at 0x789AC000. Legend: address space of a process with no RAM allocated to it; address space of a process which has RAM allocated.]

What is the relation between pointers and the protection techniques we discussed earlier, like paging and segmentation?
[Diagram: the program executes *ptr = 100 on the CPU using a virtual address (VA). The segmentation unit is practically turned off, so the same virtual address is given to the paging system. Paging translates it to a physical address (PA), which selects a location in RAM (physical memory).]

Demo
Visual Studio Memory window: see the 4 GB address space, allocated and unallocated.
Scrolling the Memory window is not a good idea; type the address into the Memory window instead.
A simple pointer program: look at the memory and understand the allocation and the address assignment.

Access violation
Any access (read or write) to an address outside a committed (allocated) region of memory generates an access violation, which has to be handled by the OS or the application.
Internally this is the OS handling a CPU interrupt called a page fault. (We will get to the details later.)

Allocated / committed: further division
Based on backing store:
Working set: backed by RAM.
Paged out: backed by the page file.
Based on programming aspects / nature of use:
Stack: the stack of a thread. For example, a local variable in C uses the stack of the thread in which the code runs.
Heap: a malloc in C is allocated from the heap.
Mapped: an exe, dll or sys file is mapped into the 4 GB address space when the program starts. So global and static variables in C are allocated from the mapped space which is part of the exe or dll.
Shared: any region can be shared between processes or with the kernel using Windows APIs (details later). Not too different from the ones already discussed.

Division of a committed / allocated region
[Diagram: a committed or allocated region is divided into mapped regions, NT heap regions, thread stacks and shared regions, shown alongside free or reserved regions. Committed memory is either in the working set (RAM) or paged out (page file).]

Summary
A 32-bit application has a 4 GB address space.
All the memory it accesses, and anything else that matters to the program, is inside that address space.
Each application running on the system has a different 4 GB address space.
A pointer is a variable (a memory location) inside that 4 GB which holds a number that the programmer interprets as the address of some location in the same 4 GB address space.

Thank you