Cache Replacement in Modern Processors
Prepared By: Paul Kosinski and Bridget Johnston

Introduction
  - The replacement policy specifies which block should be removed when a new block must be entered into an already full cache.
  - It should be chosen so that blocks likely to be referenced in the near future are retained in the cache.

Common Cache Replacement Policies
  - Least Recently Used (LRU)
  - First In First Out (FIFO)
  - Last In First Out (LIFO)
  - Random (Rand)

Least Recently Used
  - Replaces the block in the cache that has not been used for the longest period of time
  - Advantage: simplicity
  - Drawback: does not consider file sizes or latency

First In First Out
  - Replaces the oldest block rather than the least recently used one
  - Advantage: easier to calculate

Last In First Out
  - Replaces the newest block

Random
  - Candidate blocks for replacement are selected at random
  - Assumes all blocks are accessed with equal probability
  - Used as a benchmark

Problem Statement
  - We will show the cache replacement policy used by the Pentium III and Pentium IV processors and propose why each policy was used.
  - We will use a simple matrix multiplication program and VTune to determine our results.
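The matrix multiplication program itself is not reproduced in these slides. As a rough illustration of the kind of workload the problem statement refers to, here is a minimal sketch of a naive n x n multiply. The language (Python), the function name `matmul`, and the flat row-major layout are all assumptions made for illustration, not the authors' actual program.

```python
def matmul(a, b, n):
    """Naive triple-loop product of two n x n matrices.

    Matrices are flat lists in row-major order (an assumed layout):
    element (i, j) is stored at index i * n + j.
    """
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                # Index pattern: a is read with stride 1 (along row i),
                # b is read with stride n (down column j).
                total += a[i * n + k] * b[k * n + j]
            c[i * n + j] = total
    return c
```

Calling `matmul(a, b, n)` with two flat lists of length n * n returns the product in the same layout. In a compiled language with contiguous arrays, the stride-n reads of `b` are what would put pressure on the cache once a matrix no longer fits in it; the Python version only mirrors that index pattern.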
Experimental Setup
  - Matrix multiplication program
  - VTune
  - Intel Pentium III machine
  - Intel Pentium IV machine

Experimental Process
  - Ran the matrix multiplication program on the Pentium III and Pentium IV with the following values of n as the dimensions of an n x n matrix:
      Series 1: 128, 256, 512, 1024, 2048, 4096, 8192
      Series 2: 160, 320, 640, 1280, 2560, 5120, 10240

Intel Pentium III results
  [Chart: SB to DM Access and L2CM to DM Access versus n, for n = 128, 256, 512, 1024, 2048, 4096, 8192]

Intel Pentium III results
  [Chart: SB to DM Access and L2CM to DM Access versus n, for n = 160, 320, 640, 1280, 2560, 5120, 10240]

Intel Pentium III results
  [Chart: L1CM to DM Access versus n, for n = 128, 256, 512, 1024, 2048, 4096, 8192]

Intel Pentium III results
  [Chart: L1CM to DM Access versus n, for n = 160, 320, 640, 1280, 2560, 5120, 10240]

Intel Pentium IV results
  [Chart: L1CM to SR and L2CM to SR versus n, for n = 128, 256, 512, 1024, 2048, 4096, 8192]

Intel Pentium IV results
  [Chart: L1CM to SR and L2CM to SR versus n, for n = 160, 320, 640, 1280, 2560, 5120, 10240]

Conclusion
  - The results are inconclusive
  - Simple hardware is required due to timing constraints
  - This means only simple algorithms should be used:
      LRU
      FIFO
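To make the policies named in this deck concrete, the following is a minimal software sketch that counts misses for a small fully associative cache under LRU, FIFO, LIFO, and random replacement. It is only an illustration: the function name `count_misses`, the policy labels, and the fully associative model are assumptions, and it says nothing about how the Pentium III or Pentium IV implement replacement in hardware.

```python
import random
from collections import OrderedDict

POLICIES = ("LRU", "FIFO", "LIFO", "RAND")


def count_misses(trace, capacity, policy, seed=0):
    """Count misses for a fully associative cache holding `capacity` blocks.

    `trace` is a sequence of block identifiers in reference order.
    `seed` makes the random policy reproducible.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    rng = random.Random(seed)
    # Keys are resident blocks. Insertion order tracks age; under LRU the
    # order is also updated on every hit, so the front is least recently used.
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            if policy == "LRU":
                cache.move_to_end(block)  # a hit makes the block most recent
            continue
        misses += 1
        if len(cache) >= capacity:
            if policy in ("LRU", "FIFO"):
                cache.popitem(last=False)  # evict the block at the front
            elif policy == "LIFO":
                cache.popitem(last=True)   # evict the newest block
            else:  # RAND
                victim = rng.choice(list(cache))
                del cache[victim]
        cache[block] = True
    return misses
```

For example, `count_misses([1, 2, 3, 1, 4, 2, 5, 1], 3, "LRU")` replays a short hypothetical trace against a three-block cache. Note that LRU and FIFO differ only in the one line that reorders a block on a hit, which is consistent with the deck's point that both are simple to implement.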