CSCE614 Computer Architecture (Spring 2015)
Eun Jung Kim
Homework #4 (Pseudo-Associative Cache)
(Due: Beginning of class on 4/17/2015)
Objective
This project is to help you understand how a pseudo-associative (column-associative) cache works. You will first analyze the sensitivity of L1 caches to changes in parameters. Then you will implement the L1 data cache as a pseudo-associative cache in SimpleScalar and compare its performance to a normal direct-mapped L1 data cache.
System Requirement
A Linux operating system is needed in order to use the pre-compiled little-endian Alpha ISA SPEC2000 binaries. Do not use Cygwin. If you don't have a Linux machine, please use linux.cs.tamu.edu with your CS account. If you don't have a CS account, contact the HelpDesk located on the first floor.
Setting up the environment and installing SimpleScalar
1. Download and Install SimpleScalar 3.0.
(1) Download simplesim-3v0e.tgz from http://www.simplescalar.com/.
(2) Untar the downloaded file.
$ tar xzvf simplesim-3v0e.tgz
(3) Read the README file under the simplesim-3.0 directory you have just untarred.
(4) Compile the simulator according to the instructions.
$ make config-alpha
$ make
Note: Some versions of GCC may generate compilation errors. In this case, use the version of GCC already installed on the department Linux machine, linux.cs.tamu.edu.
(5) After you get the simulator, execute 'sim-outorder', and you will get all the configurable
parameters in the out-of-order simulator and their default values.
2. Get the benchmark.
(1) Download alpha binaries of SPECcpu 2000 benchmark from the following link.
http://students.cse.tamu.edu/rahulboyapati/spec2000binary.tgz
(2) Untar the downloaded file.
$ tar xzvf spec2000binary.tgz
3. Get run scripts and argument files.
(1) Download files from the following links.
http://students.cse.tamu.edu/khkim/teaching/csce614/spec2000args.tgz
http://students.cse.tamu.edu/khkim/teaching/csce614/runscripts.tgz
(2) Untar the files using tar command.
(3) Each run script is an executable script that runs one benchmark.
(4) Each benchmark needs its own arguments, which are stored in the argument files.
(5) Select two integer and two floating-point benchmarks according to the last digit of your UIN, as listed in the table below.
Last digit of UIN | Integer        | Floating Point
0                 | bzip2, crafty  | ammp, applu
1                 | crafty, gap    | applu, apsi
2                 | gap, gcc       | apsi, art
3                 | gcc, gzip      | art, equake
4                 | gzip, mcf      | equake, fma3d
5                 | mcf, parser    | fma3d, galgel
6                 | parser, twolf  | galgel, lucas
7                 | twolf, vortex  | lucas, mesa
8                 | vortex, vpr    | mesa, mgrid
9                 | vpr, bzip2     | mgrid, swim
4. Run benchmarks using compiled SimpleScalar binary.
(1) Copy the script to the directory where the argument files are stored.
Note: The script file and argument files must be in the same directory.
$ cp (script dir)/RUN(benchmark) (spec2000args dir)/(benchmark)
Ex) Assuming tar files are extracted in the current directory
$ cp runscripts/RUNequake spec2000args/equake
(2) Run the script
$ cd (spec2000args dir)/(benchmark)
$ ./RUN(benchmark) (simplescalar dir)/sim-outorder (spec2000bin dir)/(benchmark)00.peak.ev6 (simplescalar options)
Ex) Assuming tar files are extracted in the current directory
$ cd spec2000args/equake
$ ./RUNequake ../../simplesim-3.0/sim-outorder ../../spec2000binaries/equake00.peak.ev6 -max:inst 50000000 -fastfwd 20000000 -redir:sim output1.txt -bpred bimod -bpred:bimod 256 -bpred:ras 8 -bpred:btb 64 2
Procedure
Evaluate the sensitivity of L1 caches to the given cache parameters in order to get a better understanding of the cache mechanism. Then, implement a pseudo-associative L1 data cache in SimpleScalar. Run sim-outorder to compare its performance to the normal direct-mapped L1 data cache using SPEC2000 benchmarks. Use the integer and floating-point benchmarks assigned by the last digit of your UIN.
When running sim-outorder, use the following options as default.
-max:inst 50000000 -fastfwd 20000000 -redir:sim sim_output_file
-bpred 2lev -bpred:2lev 1 256 4 0 -bpred:ras 8 -bpred:btb 64 2
Since the assignment requires you to modify the L1 cache configurations, use a unified 64 KB L2 cache with a 64B cache block and 2-way associativity.
If you are running SimpleScalar on linux.cse.tamu.edu, be sure you are not monopolizing computational resources on the machine. Do not run more than one instance at a time on linux.cse.tamu.edu. It is a violation of section 3.3 of the Appropriate Use of Computer Science Computing Resources Policy, located here:
http://www.cse.tamu.edu/department/policies/resources
Do not run more than one instance of any benchmark simultaneously on the same machine; it may cause errors. Run one instance of each benchmark at a time.
Homework Description
1. Part A
In the first part of the homework you will be evaluating the sensitivity of L1 caches to changes in various
parameters like cache size, block size, associativity and replacement policy. You will need to run the
simulations on all the different configurations and analyze the effects of changing cache parameters on
the performance.
Configuration | Size  | Associativity | Cache block size | Replacement policy
1 (baseline)  | 4 KB  | Direct mapped | 64B              | -
2             | 4 KB  | {4, 8, fully} | 64B              | {LRU, random}
3             | 4 KB  | Direct mapped | 128B             | -
4             | 16 KB | Direct mapped | 64B              | -
You need to report the appropriate cache performance results along with an analysis of why you see the particular behavioral patterns in each of the configurations. Also explain how the change in L1 cache performance affects the performance of the L2 cache.
You need to read up on the options you need to use to simulate the cache configurations. They will show
up in the configurable parameters when you execute sim-outorder as in step 1.(5) in setting up the
simulator.
Specifying Cache Configurations
• all cache and TLB configurations are specified with the same format:
  <name>:<nsets>:<bsize>:<assoc>:<repl>
• where:
  <name>  - cache name (make this unique)
  <nsets> - number of sets
  <bsize> - block size in bytes
  <assoc> - associativity (number of "ways")
  <repl>  - set replacement policy (l = LRU, f = FIFO, r = random)
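Note that <nsets> is not the cache size: the total cache size equals <nsets> x <bsize> x <assoc>, so <nsets> = cache size / (block size x associativity). For example, a 4 KB direct-mapped cache with 32B blocks has 4096 / (32 x 1) = 128 sets, which is why the first example below uses dl1:128:32:1:l.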
Examples of L1 data cache parameter setting
These examples show the configuration of a direct-mapped cache, a set-associative cache, and a fully-associative cache when the cache block size is 32B and the cache size is 4KB. For Part A, you need to change those parameters according to the given cache block size, 64B or 128B.
Options                    | Size | Associativity              | Cache block size | Replacement policy
-cache:dl1 dl1:128:32:1:l  | 4KB  | Direct mapped (i.e. 1 way) | 32B              | -
-cache:dl1 dl1:32:32:4:l   | 4KB  | 4 ways                     | 32B              | LRU
-cache:dl1 dl1:1:32:128:l  | 4KB  | Fully associative          | 32B              | LRU
Example of unified L2 cache setting
This example configures a 128KB L2 cache with a 64B cache block and 2-way associativity as a unified cache, combining the instruction and data caches into one cache. To unify the L2 cache, the instruction cache is pointed to the data cache. Then, the unified cache named "ul2" is set with specific parameters.
-cache:il2 dl2 -cache:dl2 ul2:1024:64:2:l
For Part A, you are required to use a 64 KB L2 cache with a 64B cache block and 2-way associativity. Thus, you need to change those parameters accordingly.
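For instance, following the same sizing rule as above, a 64 KB unified L2 cache with 64B blocks and 2 ways has 65536 / (64 x 2) = 512 sets, which corresponds to -cache:il2 dl2 -cache:dl2 ul2:512:64:2:l (verify this arithmetic against your own configuration before running).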
In the end, when you run a benchmark, you need to give both L1 data cache parameters and
unified L2 cache parameters. Here is an example.
-cache:dl1 dl1:128:32:1:l -cache:il2 dl2 -cache:dl2 ul2:1024:64:2:l
2. Part B
2.1. Reading
(1) Anant Agarwal and Steven D. Pudar, “Column-Associative Caches: A Technique for Reducing the
Miss Rate of Direct-Mapped Caches,” ISCA 1993
2.2. Guideline
Direct-mapped caches are simple and easy to design, and they have a short hit access time.
However, the biggest drawback of direct-mapped caches is the large number of conflict misses.
Pseudo-associative caches resolve conflicts by allowing alternate hashing functions and achieve a much higher hit rate than normal direct-mapped caches while maintaining almost the same hit access time.
Basically, a pseudo-associative cache is organized the same as a direct-mapped cache. The fundamental idea is to resolve conflicts by dynamically choosing different locations, which are accessed by different hashing functions. When a conflict miss happens, the pseudo-associative cache tries to avoid it by relocating the cache block using another (rehashing) function. The simplest rehashing function is bit selection with the highest-order index bit inverted, which is called bit flipping. For example, with 128 sets, a block that maps to set 0010011 rehashes to set 1010011.
In order to avoid the secondary thrashing effect, which is explained in detail in the reference paper, each cache block is extended with one extra bit of information, called the rehash bit, that indicates whether the block resides in a rehashed location or not.
2.3. Design
Add a new CACHE_TAG_PSEUDOASSOC macro in cache.c to get a tag value with the high-order bit of
the index appended at the end.
#define CACHE_TAG_PSEUDOASSOC(cp, addr) …
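One possible reading of this macro is sketched below. It is only a sketch: it assumes the existing CACHE_TAG and CACHE_SET macros and the set_mask field in cache.c are used to extract the normal tag and the high-order index bit, and the chosen bit layout (tag shifted left by one, index bit in the least-significant position) is an assumption, not the required answer.

/* Sketch only: append the high-order bit of the set index to the normal tag,
   so a block can still be identified when it sits in its bit-flipped set.
   CACHE_TAG, CACHE_SET and set_mask already exist in cache.c; the exact
   layout here is an assumption. */
#define CACHE_TAG_PSEUDOASSOC(cp, addr) \
  ((CACHE_TAG(cp, addr) << 1) | \
   ((CACHE_SET(cp, addr) & (((cp)->set_mask >> 1) + 1)) != 0))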
Add one more field to struct cache_blk_t for the rehash bit, as follows. The rehash bit must be initialized to 1 when the pseudo-associative cache is first created, in the cache_create() function in cache.c.
int rehash_bit;
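A minimal sketch of the initialization, assuming it is placed where cache_create() already invalidates each block (blk->status = 0 is existing cache.c code; the exact placement is up to you):

  /* in cache_create(), inside the loop that initializes every cache block */
  blk->status = 0;        /* existing invalidation code */
  blk->rehash_bit = 1;    /* new: empty blocks start out marked as rehashed */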
You must modify the cache_access() function in cache.c to implement the pseudo-associative cache for the L1D. Since cache_access() is a general function used by all caches in the system and the pseudo-associative behavior applies only to the L1D, you need to write new code for the pseudo-associative cache that is specific to the L1D.
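The sketch below is a small, self-contained toy model in plain C (not SimpleScalar code) of the column-associative access flow you will need to reproduce inside cache_access(): probe the primary set, consult the rehash bit, probe the bit-flipped set, swap on a second-probe hit, and prefer rehashed blocks as victims. All names (set_of, alt_of, access_addr, and so on) are invented for illustration, the cache is reduced to 8 one-word blocks, and the replacement decisions are simplified; follow the reference paper for the authoritative algorithm.

#include <stdio.h>

#define NSETS 8   /* toy direct-mapped cache: 8 sets, one word per block */

struct blk {
  int      valid;
  unsigned tag;        /* extended tag: normal tag plus the high index bit */
  int      rehash_bit; /* 1 = this block currently sits in its rehashed set */
};

static struct blk cache[NSETS];
static int hits, misses;

static unsigned set_of(unsigned a) { return a & (NSETS - 1); }
static unsigned alt_of(unsigned s) { return s ^ (NSETS >> 1); }  /* bit flipping */
static unsigned tag_of(unsigned a)
{
  /* mirrors the idea of CACHE_TAG_PSEUDOASSOC: append the high index bit so a
     block is never confused with one that natively maps to the flipped set */
  return ((a / NSETS) << 1) | ((set_of(a) & (NSETS >> 1)) != 0);
}

static void access_addr(unsigned a)
{
  unsigned s = set_of(a), f = alt_of(s), t = tag_of(a);
  struct blk tmp;

  if (cache[s].valid && cache[s].tag == t) {      /* first-probe hit */
    hits++;
    return;
  }
  if (cache[s].rehash_bit) {                      /* primary block is itself a rehashed
                                                     block: replace it directly and skip
                                                     the second probe (this is what avoids
                                                     secondary thrashing) */
    cache[s].valid = 1; cache[s].tag = t; cache[s].rehash_bit = 0;
    misses++;
    return;
  }
  if (cache[f].valid && cache[f].tag == t) {      /* second-probe hit: swap so the MRU
                                                     block ends up in the primary set */
    tmp = cache[s];
    cache[s] = cache[f]; cache[s].rehash_bit = 0;
    cache[f] = tmp;      cache[f].rehash_bit = 1;
    hits++;
    return;
  }
  /* miss in both locations: new block goes to the primary set and the old
     primary block is rehashed into the alternate set (simplified policy;
     check the paper for the authoritative replacement decisions) */
  cache[f] = cache[s]; cache[f].rehash_bit = 1;
  cache[s].valid = 1; cache[s].tag = t; cache[s].rehash_bit = 0;
  misses++;
}

int main(void)
{
  /* two addresses that conflict in a plain direct-mapped cache (both map to set 0) */
  unsigned trace[] = { 0x10, 0x30, 0x10, 0x30, 0x10, 0x30 };
  unsigned i;

  for (i = 0; i < NSETS; i++) { cache[i].valid = 0; cache[i].rehash_bit = 1; }
  for (i = 0; i < sizeof trace / sizeof trace[0]; i++)
    access_addr(trace[i]);

  printf("hits=%d misses=%d\n", hits, misses);   /* prints 4 hits, 2 misses */
  return 0;
}

On this short conflicting trace the toy cache reports 4 hits and 2 misses, whereas a plain direct-mapped cache of the same size would miss on every access; that difference is exactly what Part B asks you to measure with SimpleScalar.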
2.4. Implementation
Add the following option for the pseudo-associative cache.
-pseudoassoc <true/false> # false # use pseudo-associative cache in L1D
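One way to wire up this option is with SimpleScalar's opt_reg_flag(), following the pattern of the existing flag options registered in sim_reg_options() in sim-outorder.c. The global variable name pseudoassoc below is an assumption; you will also need to make the flag visible to the cache code so that cache_access() can act on it.

  /* sketch: in sim-outorder.c, next to the other option variables */
  static int pseudoassoc;   /* TRUE = use pseudo-associative L1D (assumed name) */

  /* sketch: in sim_reg_options(), next to the other opt_reg_flag() calls */
  opt_reg_flag(odb, "-pseudoassoc",
               "use pseudo-associative cache in L1D",
               &pseudoassoc, /* default */FALSE, /* print */TRUE, NULL);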
2.5. Comparison
Compare the performance of the two L1D cache configurations assuming the same size (128 sets * 64-byte block size = 8KB).
(1) Normal direct-mapped L1D : -cache:dl1 dl1:128:64:1:l -pseudoassoc false
(2) Pseudo-associative L1D   : -cache:dl1 dl1:128:64:1:l -pseudoassoc true
You do not need to consider the different hit access times in the pseudo-associative cache. Focus only on the hit/miss rates (dl1.hits/misses/miss_rate in the SimpleScalar results).
Submission Instruction
1. You have to submit a hard-copy report in class. The report should explain the code you modified or added and show the simulation results and analysis.
2. You also have to turn in your report, code, and log files through the turn-in system on CSNET by the time announced. The submission should be a single file named hw4_UIN.zip (where UIN is your UIN). The log files are the output files you obtain after running the benchmarks.
3. Penalty for late submission: 5% deduction per day.