Name: Rutuja Ingale    MIS No: 142203008    Div: Comp - 2

CO ASSIGNMENT – 2

Part A: Download and install an open-source cache simulator/demonstrator.

T4: Associativity demonstration and results – Direct-Mapped Caches

Using the 351 cache simulator:

A cache simulator provides a user interface with options for configuring and running simulations. Users can set cache parameters such as the address width, cache size, block size, associativity, and replacement policy, and then execute the simulation to see how the cache behaves for the specified parameters and memory access pattern. The current contents of the cache can be inspected to understand which blocks are stored where, and statistics such as the hit rate, miss rate, and other performance metrics are displayed after the simulation runs.

System Parameters

1. Address Width: The address width is the number of bits in a memory address, which determines the total number of unique addresses in the system. A larger address width allows more unique memory addresses and therefore a larger addressable memory space. For example, with an 8-bit address width there are 2^8 = 256 unique addresses; equivalently, if the main memory size is 256 bytes, then 8 bits are needed to address it, so the address width is 8.

2. Cache Size: The cache size is the total amount of data the cache can store, typically measured in bytes. A larger cache can hold more data and can improve performance, but cache size is a trade-off between speed and hardware cost. Example: a 32-byte cache can store 32 bytes of data.

3. Block Size: The block size is the amount of data transferred between main memory and the cache in a single block, typically measured in bytes. Larger block sizes can reduce the number of misses for spatially contiguous data. Example: if the block size is 8 bytes, each cache block holds 8 bytes of data from main memory. In direct mapping, the cache line size is always the same as the block size.

4. Associativity: Associativity is the number of cache lines within a set to which a particular block can map; it defines how many locations within a set can hold data. In a direct-mapped cache the associativity is 1, meaning each set contains only one line. As associativity increases, the cache becomes more flexible in where it can place data.

Write Policies

Write policies dictate how write operations are handled by the cache. Two cases must be distinguished, write hits and write misses, with two common policies for each.

5. Write Hit: A write hit occurs when a write operation targets a memory location that is already present in the cache. Handling the write in the cache reduces the need to update main memory immediately and improves overall performance. Example: writing to address 0x100 when 0x100 is already in the cache. There are two policies for a write hit:

1) Write Back: On a write hit, data is written only to the cache, and the corresponding block in main memory is updated only when the block is evicted from the cache. This reduces the number of main-memory writes, since updates are delayed until necessary, but it can lead to temporary inconsistency between the cache and main memory.

2) Write Through: On a write hit, data is written simultaneously to both the cache and main memory. This ensures data consistency between the cache and main memory, but may introduce additional latency because every write updates both locations.
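To make the difference between the two write-hit policies concrete, here is a minimal Python sketch. It is not taken from the 351 simulator; the class name TinyCache and its methods are illustrative, and the model keeps only what matters here: a set of dirty addresses for deferred writes and a plain dictionary standing in for main memory.

```python
# Minimal sketch of write-hit handling (illustrative names, not the 351 simulator's API).
class TinyCache:
    def __init__(self, policy):
        self.policy = policy          # "write-back" or "write-through"
        self.line = {}                # address -> value currently held in the cache
        self.dirty = set()            # addresses modified but not yet written to memory
        self.memory = {}              # simplified main memory

    def write_hit(self, addr, value):
        self.line[addr] = value       # the cache itself is always updated on a hit
        if self.policy == "write-through":
            self.memory[addr] = value # memory is updated immediately
        else:                         # write-back: defer the memory update
            self.dirty.add(addr)

    def evict(self, addr):
        # On eviction, a write-back cache must flush dirty data to memory.
        if self.policy == "write-back" and addr in self.dirty:
            self.memory[addr] = self.line[addr]
            self.dirty.discard(addr)
        self.line.pop(addr, None)

wb = TinyCache("write-back")
wb.line[0x100] = 0            # pretend 0x100 is already cached, so the next write is a hit
wb.write_hit(0x100, 42)
print(wb.memory)              # {}  -> main memory not yet updated
wb.evict(0x100)
print(wb.memory)              # {256: 42} -> updated only when the block is evicted
```

Switching the policy string to "write-through" would make the first print already show the updated memory, at the cost of a memory write on every hit.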
6. Write Miss: A write miss occurs when a write operation targets a memory location that is not present in the cache. Depending on the policy, the data may have to be fetched from main memory before the write completes. Example: writing to address 0x200 when 0x200 is not in the cache. There are two policies for a write miss:

1) Write Allocate: On a write miss, the entire block containing the target address is first loaded into the cache, and the write is then performed on the cached copy. This policy is used when the block being written is not yet present in the cache.

2) No-Write Allocate (Write Around or Write Bypass): On a write miss, the data is written directly to main memory without loading the block into the cache. This avoids polluting the cache with write data and reduces the chance of loading data that will never be read.

7. Replacement Policy: A replacement policy determines which cache line is evicted when a new block must be loaded into a cache (or set) that is already full. Three common replacement policies are described below; a short sketch of all three follows this list.

1) Least Recently Used (LRU): The least recently used cache line is chosen for eviction when a replacement is necessary, on the principle of replacing the data that was accessed longest ago. LRU requires tracking the order in which cache lines are accessed. Example: if the cache has three lines (A, B, C) and A was accessed most recently, a new access replaces the least recently used line (e.g., C).

2) Random Replacement: A cache line is chosen at random for eviction when a new block needs to be loaded. It is simple to implement and does not consider past access patterns; it can perform reasonably in some cases but is not optimized for usage behaviour. Example: with three lines (A, B, C), a random choice (e.g., B) is selected for replacement.

3) Round Robin Replacement: Cache lines are selected for replacement in a cyclic order; the next line to be replaced is the one following the line replaced last. It evicts lines in a circular fashion, is simple to implement, and does not consider access history. Example: with three lines (A, B, C), the replacement order is A, B, C, A, B, C, and so on.
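The victim-selection step of the three policies can be sketched in a few lines of Python. This is illustrative code, not the simulator's implementation; the three helper functions are hypothetical names and the list of lines A, B, C stands for one cache set.

```python
import random
from collections import OrderedDict

# Illustrative victim-selection routines for one cache set with three lines (A, B, C).

def lru_victim(access_order):
    """access_order: OrderedDict whose first key is the least recently used line."""
    return next(iter(access_order))

def random_victim(lines):
    return random.choice(lines)

def round_robin_victim(lines, last_replaced_index):
    return lines[(last_replaced_index + 1) % len(lines)]

# LRU example: A was touched most recently, C least recently -> C is evicted.
order = OrderedDict.fromkeys(["C", "B", "A"])   # oldest first
print(lru_victim(order))                        # C

# Round-robin example: if B was replaced last time, C is replaced next.
print(round_robin_victim(["A", "B", "C"], last_replaced_index=1))  # C

# Random example: any of A, B, C may be chosen.
print(random_victim(["A", "B", "C"]))
```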
Manual Memory Access

1. Read Operation Button: Triggers a simulated read from the specified memory address. Example usage: clicking "Read" with a given hexadecimal address runs the simulation to check whether the data at that address is in the cache (cache hit) or must be fetched from main memory (cache miss).

2. Write Operation Button: Initiates a simulated write to the specified memory address. Example usage: clicking "Write" with a given hexadecimal address updates the cache with new data, following the configured write policy such as write-through or write-back.

3. Cache Flush Button: Clears the entire cache contents, simulating a cache flush. Example usage: clicking "Flush" removes all data from the cache, so subsequent memory accesses result in cache misses.

4. Input Hexadecimal Address Field: Lets the user enter a specific memory address in hexadecimal format. Example usage: enter a hexadecimal address before clicking "Read" or "Write" to simulate an access to that location.

5. Simulation Output Message: Displays the contents currently present in the cache. If no data is shown for the accessed address, a cache miss has occurred.

Offset Bits: The number of bits needed to address individual bytes within a cache line, calculated as log2(block size). For example, if the block size is 8 bytes, 3 bits are needed, so the offset field is 3 bits wide.

Index Bits: The number of bits needed to select a line (or set) in the cache; the number of lines is typically a power of 2, and the index field is log2(number of cache lines) bits wide. For example, with a 32-byte cache and an 8-byte line size there are 4 cache lines, which require 2 index bits.

Tag Bits: The remaining bits of the address. If the total address width is 8 bits and the index and offset fields use 2 and 3 bits respectively, the tag field is 3 bits wide.

Hit Rate: The proportion of memory accesses that result in a cache hit, i.e. the requested data is found in the cache. A higher hit rate signifies better cache performance. Calculated as:
Hit Rate = (Number of cache hits / Total number of memory accesses) × 100

Miss Rate: The proportion of memory accesses that result in a cache miss, requiring the data to be fetched from main memory. A lower miss rate is preferable for good cache performance. Calculated as:
Miss Rate = (Number of cache misses / Total number of memory accesses) × 100, or equivalently Miss Rate = 100% − Hit Rate.

Parameter notation used by the simulator: m = address width, C = cache size, K = block size, E = associativity.

VDT Cache Data: This is the display of the cache contents (the valid bit, dirty bit and tag for each line, along with its data), reflecting the effect of the operations described above. If a read is issued with the given inputs while the cache is empty, it results in a cache miss; if the data is already present in the cache, a cache hit occurs.

Cache Hit or Miss Determination: The simulator uses the given hexadecimal address and the cache configuration (e.g., direct-mapped or set-associative) to determine whether the data is present in the cache. If the data is found, it is a cache hit, and the simulator can report details such as the set and line that were hit. If the data is not found, it is a cache miss, and the simulator then simulates fetching the data from main memory into the cache. The output explicitly indicates whether each operation resulted in a hit or a miss, giving insight into how effectively the cache improves memory access time.
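The offset/index/tag breakdown and the hit-rate formula above can be checked with a short sketch. The numbers follow the running example (8-bit addresses, 32-byte direct-mapped cache, 8-byte blocks); the access sequence and the decode function are made up for illustration and are not part of the 351 simulator.

```python
import math

# Running example: m = 8-bit addresses, C = 32-byte cache, K = 8-byte blocks, E = 1 (direct-mapped).
m, C, K, E = 8, 32, 8, 1
sets = C // (K * E)                       # 4 lines
offset_bits = int(math.log2(K))           # 3
index_bits = int(math.log2(sets))         # 2
tag_bits = m - index_bits - offset_bits   # 3

def decode(addr):
    offset = addr & (K - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# Direct-mapped lookup: one (valid, tag) pair per line.
cache = [(False, None)] * sets
hits = misses = 0
# 0x05 hits (same block as 0x04); 0x24 conflicts with 0x04 in set 0.
for addr in [0x04, 0x05, 0x24, 0x04]:
    tag, index, _ = decode(addr)
    valid, stored_tag = cache[index]
    if valid and stored_tag == tag:
        hits += 1
    else:
        misses += 1
        cache[index] = (True, tag)        # fill the line on a miss

total = hits + misses
print(f"tag/index/offset bits: {tag_bits}/{index_bits}/{offset_bits}")
print(f"hit rate = {hits / total * 100:.0f}%, miss rate = {misses / total * 100:.0f}%")
```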
Physical Memory: This view shows the various memory addresses and their contents. On a cache miss, the cache retrieves the contents of the specified location from physical memory and copies them into the cache; on a cache hit, no memory access is needed.

Associativity demonstration and results – Direct-Mapped Caches

1. Direct-Mapped (1-way): Each set contains only one cache line, so each block of memory can map to exactly one cache line.

2. Set-Associative (2-way, 4-way, etc.): Each set contains multiple cache lines, so several blocks of memory can map to the same set. The number of lines per set is the associativity.

3. Fully Associative: Any block of memory can map to any cache line, providing maximum flexibility at a higher hardware cost.

Direct-Mapped Caches: In a direct-mapped cache, each block of main memory can be placed in only one specific cache line, determined by indexing. The cache is divided into sets, each containing a single line, so the number of sets equals the number of cache lines. The line where a block is placed is determined solely by the index bits of the memory address.

Address Mapping in a Direct-Mapped Cache: Given an address of n bits,
Address = Tag bits | Index bits | Offset bits

Results: As associativity increases, the chance of a cache hit improves because there are more locations within a set where a block can be placed; however, higher associativity also increases hardware complexity and access latency. A direct-mapped cache simulation shows the impact of associativity on hit and miss rates: higher associativity tends to reduce conflict misses, while compulsory and capacity misses are largely unaffected. The exact results depend on the cache size, block size, and the characteristics of the memory access pattern (a comparison sketch follows below). The choice of associativity is therefore a trade-off between performance and hardware complexity.
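As a rough illustration of the trade-off just described, the sketch below replays one access pattern against a direct-mapped (E = 1) and a 2-way set-associative (E = 2) cache of the same total size and counts misses. It is illustrative code under the same parameter assumptions as before (32-byte cache, 8-byte blocks), not output from the 351 simulator; the ping-pong pattern is chosen specifically to provoke conflict misses in the direct-mapped case.

```python
import math
from collections import deque

def simulate(accesses, C=32, K=8, E=1):
    """Count misses for a C-byte cache with K-byte blocks and E lines per set (LRU)."""
    sets = C // (K * E)
    offset_bits = int(math.log2(K))
    index_bits = int(math.log2(sets))
    cache = [deque(maxlen=E) for _ in range(sets)]   # each set holds up to E tags, LRU order
    misses = 0
    for addr in accesses:
        index = (addr >> offset_bits) & (sets - 1)
        tag = addr >> (offset_bits + index_bits)
        if tag in cache[index]:
            cache[index].remove(tag)   # refresh its LRU position
        else:
            misses += 1
        cache[index].append(tag)       # most recently used tag goes to the right
    return misses

# Two blocks that map to the same set in the direct-mapped configuration.
pattern = [0x04, 0x24] * 4
print("direct-mapped (E=1):", simulate(pattern, E=1), "misses")   # every access misses
print("2-way set-assoc (E=2):", simulate(pattern, E=2), "misses") # only the 2 compulsory misses
```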
PART B: Identifying and understanding dependencies in a set of instructions with data forwarding

Understanding the dependencies in a set of instructions, especially in the context of data forwarding, is crucial for optimizing pipelined execution in a processor.

Definition: Data Dependencies
A data dependency occurs when an instruction depends on a value produced by another instruction. Data dependencies are relationships between instructions that determine the order in which they must be executed to ensure correct program behavior; they affect the pipeline stages of a processor, and understanding them is essential for efficient instruction execution.

Key Points

1. Types of Data Dependencies:

a. Read After Write (RAW) Dependency: Occurs when an instruction reads a value produced by a previous instruction (a true dependency). Example: if Instruction B reads a register that was written by Instruction A, there is a RAW dependency between A and B.

b. Write After Read (WAR) Dependency: Occurs when an instruction writes to a location that an earlier instruction reads (an antidependency). Example: if Instruction A reads a register and Instruction B later writes to the same register, there is a WAR dependency.

c. Write After Write (WAW) Dependency: Occurs when two instructions write to the same location (an output dependency). Example: if Instruction A writes to a register and Instruction B later writes to the same register, there is a WAW dependency.

2. Implications of a RAW Dependency:

a. Pipeline Stalls: When the processor encounters a RAW dependency, it may need to stall the pipeline, introducing idle cycles while waiting for the required data to become available.

b. Impact on Performance: RAW dependencies can reduce pipeline throughput and slow program execution, because subsequent instructions may have to wait for the instruction producing the required data to complete.

3. Data Forwarding:

Data forwarding, also known as bypassing or data-hazard forwarding, minimizes stalls and improves pipeline efficiency by transmitting data directly from the output of one instruction to the input of another.

a) The main purposes are: reducing stalls, so instructions can proceed through the pipeline without waiting for dependent instructions to complete; optimizing pipeline throughput, since pipeline stages are used more effectively; and avoiding idle cycles, so that no-operation (NOP) or idle cycles do not have to be inserted because of dependencies.

b) Implementation of Data Forwarding: Involves detecting data dependencies and rerouting data to avoid pipeline stalls.

c) Techniques: operand forwarding, result forwarding, bypassing.

4. Handling Dependencies with Data Forwarding:

Operand Forwarding (register forwarding): pass the result of an instruction directly to another instruction that needs that result as an operand.

Result Forwarding (memory forwarding): forward the result of an instruction to another instruction that needs that result.
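The RAW analysis carried out for the two instruction sets below can be mechanized. The following Python sketch is illustrative (the tuple encoding of instructions and the function name forwarding_paths are assumptions, not any real tool's API): for each source register it finds the most recent earlier instruction that wrote that register, which is exactly the producer a forwarding path would bypass from.

```python
# Illustrative RAW-dependency / forwarding-path finder.
# Each instruction is (destination_register, [source_registers]);
# the loads' address (integer) registers are ignored for simplicity.
def forwarding_paths(instructions):
    last_writer = {}          # register -> index of the most recent instruction writing it
    paths = []
    for i, (dest, sources) in enumerate(instructions, start=1):
        for src in sources:
            if src in last_writer:
                # RAW dependency: instruction i reads what last_writer[src] produced,
                # so a forwarding path from that producer to i avoids a stall.
                paths.append((last_writer[src], i, src))
        last_writer[dest] = i
    return paths

set1 = [
    ("F6", []),               # I1: L.D   F6, 32(R2)
    ("F2", []),               # I2: L.D   F2, 44(R3)
    ("F0", ["F2", "F4"]),     # I3: MUL.D F0, F2, F4
    ("F8", ["F2", "F6"]),     # I4: SUB.D F8, F2, F6
    ("F10", ["F0", "F6"]),    # I5: DIV.D F10, F0, F6
    ("F6", ["F8", "F2"]),     # I6: ADD.D F6, F8, F2
]

for producer, consumer, reg in forwarding_paths(set1):
    print(f"forward {reg} from I{producer} to I{consumer}")
```

Its output for Set 1 matches the forwarding paths listed in the analysis below.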
Set 1 Instructions

The following sequence of instructions is analysed (numbered I1–I6):

I1: L.D   F6, 32(R2)
I2: L.D   F2, 44(R3)
I3: MUL.D F0, F2, F4
I4: SUB.D F8, F2, F6
I5: DIV.D F10, F0, F6
I6: ADD.D F6, F8, F2

Let us identify the dependencies by type. The first two instructions are loads and have no data dependencies on earlier instructions.

RAW Dependencies:

1. MUL.D F0, F2, F4 (I3) depends on the result of L.D F2, 44(R3) (I2), as it uses the value in register F2.

2. SUB.D F8, F2, F6 (I4) depends on the result of L.D F2, 44(R3) (I2), as it uses the value in register F2, and on the result of L.D F6, 32(R2) (I1), as it uses the value in register F6.

3. DIV.D F10, F0, F6 (I5) depends on the result of MUL.D F0, F2, F4 (I3), as it uses the value in register F0, and on the result of L.D F6, 32(R2) (I1), as it uses the value in register F6.

4. ADD.D F6, F8, F2 (I6) depends on the result of SUB.D F8, F2, F6 (I4), as it uses the value in register F8, and on the result of L.D F2, 44(R3) (I2), as it uses the value in register F2.

Strictly speaking, there are also name dependencies in this set: ADD.D F6, F8, F2 (I6) writes F6, which I1 also writes (WAW on F6) and which I4 and I5 read earlier (WAR on F6). These cause no hazard in an in-order pipeline, but they would matter if the instructions were reordered.

Now consider data forwarding.

Data Forwarding: For the RAW dependencies, operand forwarding passes the results directly:

1. MUL.D F0, F2, F4 (I3): forward F2 from I2, since F2 is used in the MUL.D operation.

2. SUB.D F8, F2, F6 (I4): forward F2 from I2 and F6 from I1, since both are used in the SUB.D operation.

3. DIV.D F10, F0, F6 (I5): forward F0 from I3 and F6 from I1, since both are used in the DIV.D operation.

4. ADD.D F6, F8, F2 (I6): forward F8 from I4 and F2 from I2, since both are used in the ADD.D operation.

With these forwarding paths in place, the sequence can proceed through the pipeline without waiting for each result to be written back to the register file.

Set 2 Instructions

The following sequence of instructions is analysed (numbered I1–I5):

I1: ADD R1, R2, R3
I2: MUL R7, R1, R3
I3: SUB R4, R1, R5
I4: ADD R3, R2, R4
I5: MUL R7, R8, R9

Now let us identify the dependencies.

RAW Dependencies:

1. ADD R1, R2, R3 (I1): no data dependencies, as it is the first instruction.

2. MUL R7, R1, R3 (I2) depends on the result of I1 (ADD). Dependency type: Read After Write (RAW). Explanation: R1 is written by I1 and read by I2.

3. SUB R4, R1, R5 (I3) depends on the result of I1 (ADD). Dependency type: Read After Write (RAW). Explanation: R1 is written by I1 and read by I3.

4. ADD R3, R2, R4 (I4) depends on the result of I3 (SUB). Dependency type: Read After Write (RAW). Explanation: R4 is written by I3 and read by I4.

5. MUL R7, R8, R9 (I5): no RAW dependencies, since R8 and R9 are not written by any earlier instruction.

There are also name dependencies in this set: I4 writes R3, which I1 and I2 read earlier (WAR on R3), and I2 and I5 both write R7 (WAW on R7). These do not require forwarding, but they would matter if the instructions were reordered; a small checking sketch is given at the end of this part.

Now consider data forwarding.

Data Forwarding: For the RAW dependencies, operand forwarding passes the results directly:

Forward R1 from I1 to I2.
Forward R1 from I1 to I3.
Forward R4 from I3 to I4.

With these forwarding paths, the sequence avoids the stalls that the dependencies would otherwise cause and improves the efficiency of pipeline execution. This emphasizes the use of operand forwarding to resolve RAW dependencies by passing the required operand directly from the producing instruction to the consuming instruction.
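As a closing check on the name-dependency (WAR/WAW) observations above, here is a companion sketch in the same illustrative encoding; the function name name_dependencies and the expected-output comment are assumptions for this example, not part of any simulator.

```python
# Illustrative WAR/WAW (name-dependency) finder; instructions use the same
# (destination, [sources]) encoding as the earlier forwarding sketch.
def name_dependencies(instructions):
    deps = []
    for j, (dest_j, _) in enumerate(instructions, start=1):
        for i, (dest_i, sources_i) in enumerate(instructions[: j - 1], start=1):
            if dest_j == dest_i:
                deps.append(("WAW", i, j, dest_j))   # both instructions write the same register
            if dest_j in sources_i:
                deps.append(("WAR", i, j, dest_j))   # i reads the register, j later writes it
    return deps

set2 = [
    ("R1", ["R2", "R3"]),     # I1: ADD R1, R2, R3
    ("R7", ["R1", "R3"]),     # I2: MUL R7, R1, R3
    ("R4", ["R1", "R5"]),     # I3: SUB R4, R1, R5
    ("R3", ["R2", "R4"]),     # I4: ADD R3, R2, R4
    ("R7", ["R8", "R9"]),     # I5: MUL R7, R8, R9
]

for kind, earlier, later, reg in name_dependencies(set2):
    print(f"{kind} on {reg}: I{earlier} -> I{later}")
# Expected: WAR on R3 (I1->I4 and I2->I4) and WAW on R7 (I2->I5)
```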