CSCE614 Computer Architecture (Spring 2015)
Eun Jung Kim
Homework #4 (Pseudo-Associative Cache)
(Due: Beginning of class on 4/17/2015)
Objective
This project is to help you understand how a pseudo-associative (column-associative) cache works. You will first analyze the sensitivity of L1 caches to changes in parameters. Then you will implement the L1 data cache as a pseudo-associative cache in SimpleScalar and compare its performance to a normal direct-mapped L1 data cache.
System Requirement
A Linux operating system is needed in order to use the pre-compiled little-endian Alpha ISA SPEC2000 binaries. Do not use Cygwin. If you don't have a Linux machine, please use linux.cs.tamu.edu with your CS account. If you don't have a CS account, contact the HelpDesk located on the first floor.
Setting up the environment and installing SimpleScalar
1. Download and Install SimpleScalar 3.0.
(1) Download simplesim-3v0e.tgz from http://www.simplescalar.com/.
(2) Untar the downloaded file.
$ tar xzvf simplesim-3v0e.tgz
(3) Read the README file under the simplesim-3.0 directory you have just untarred.
(4) Compile the simulator according to the instructions.
$ make config-alpha
$ make
Note: Some versions of GCC may generate compilation errors. In this case, use the version of GCC already installed on the department Linux machine, linux.cs.tamu.edu.
(5) After you get the simulator, execute 'sim-outorder', and you will get all the configurable
parameters in the out-of-order simulator and their default values.
2. Get the benchmark.
(1) Download alpha binaries of SPECcpu 2000 benchmark from the following link.
http://students.cse.tamu.edu/rahulboyapati/spec2000binary.tgz
(2) Untar the downloaded file.
$ tar xzvf spec2000binary.tgz
3. Get run scripts and argument files.
(1) Download files from the following links.
http://students.cse.tamu.edu/khkim/teaching/csce614/spec2000args.tgz
http://students.cse.tamu.edu/khkim/teaching/csce614/runscripts.tgz
(2) Untar the files using tar command.
(3) Each run script is an executable script that runs one benchmark.
(4) Each benchmark needs its own arguments, which are stored in the argument files.
(5) Select two integer and two floating-point benchmarks according to the last digit of your UIN, as listed in the table below.
Last digit of UIN | Integer        | Floating Point
0                 | bzip2, crafty  | ammp, applu
1                 | crafty, gap    | applu, apsi
2                 | gap, gcc       | apsi, art
3                 | gcc, gzip      | art, equake
4                 | gzip, mcf      | equake, fma3d
5                 | mcf, parser    | fma3d, galgel
6                 | parser, twolf  | galgel, lucas
7                 | twolf, vortex  | lucas, mesa
8                 | vortex, vpr    | mesa, mgrid
9                 | vpr, bzip2     | mgrid, swim
4. Run benchmarks using compiled SimpleScalar binary.
(1) Copy the script to the directory where the argument files are stored.
Note: The script file and argument files must be in the same directory.
$ cp (script dir)/RUN(benchmark) (spec2000args dir)/(benchmark)
Ex) Assuming tar files are extracted in the current directory
$ cp runscripts/RUNequake spec2000args/equake
(2) Run the script
$ cd (spec2000args dir)/(benchmark)
$ ./RUN(benchmark) (simplescalar dir)/sim-outorder (spec2000bin dir)/(benchmark)00.peak.ev6 (simplescalar options)
Ex) Assuming tar files are extracted in the current directory
$ cd spec2000args/equake
$ ./RUNequake ../../simplesim-3.0/sim-outorder ../../spec2000binaries/equake00.peak.ev6 -max:inst 50000000 -fastfwd 20000000 -redir:sim output1.txt -bpred bimod -bpred:bimod 256 -bpred:ras 8 -bpred:btb 64 2
Procedure
Evaluate the sensitivity of L1 caches to the given cache parameters in order to get a better understanding of the cache mechanism. Then, implement a pseudo-associative L1 data cache in SimpleScalar. Run sim-outorder to compare its performance to the normal direct-mapped L1 data cache using SPEC2000 benchmarks. Use the integer and floating-point benchmarks assigned by the last digit of your UIN.
When running sim-outorder, use the following options as default.
-max:inst 50000000 -fastfwd 20000000 -redir:sim sim_output_file
-bpred 2lev -bpred:2lev 1 256 4 0 -bpred:ras 8 -bpred:btb 64 2
Since the assignment requires you to modify the L1 cache configurations, use a unified 64 KB L2 cache with a 64B cache block and 2-way associativity.
If you are running SimpleScalar on linux.cse.tamu.edu, be sure you are not monopolizing computational resources on the machine. Do not run more than one instance at a time on linux.cse.tamu.edu. It is a violation of section 3.3 of the Appropriate Use of Computer Science Computing Resources Policy, located here:
http://www.cse.tamu.edu/department/policies/resources
Do not run more than one instance of any benchmark simultaneously on the same machine; it may cause errors. Run one instance of each benchmark at a time.
Homework Description
1. Part A
In the first part of the homework you will be evaluating the sensitivity of L1 caches to changes in various
parameters like cache size, block size, associativity and replacement policy. You will need to run the
simulations on all the different configurations and analyze the effects of changing cache parameters on
the performance.
Configuration | Size  | Associativity | Cache block size | Replacement policy
1 (baseline)  | 4 KB  | Direct mapped | 64B              | -
2             | 4 KB  | {4, 8, fully} | 64B              | {LRU, random}
3             | 4 KB  | Direct mapped | 128B             | -
4             | 16 KB | Direct mapped | 64B              | -
You need to report the appropriate cache performance results along with an analysis of why you see the particular behavioral patterns in each of the configurations. Also explain how the change in L1 cache performance affects the performance of the L2 cache.
You need to read up on the options you need to use to simulate the cache configurations. They will show
up in the configurable parameters when you execute sim-outorder as in step 1.(5) in setting up the
simulator.
Specifying Cache Configurations
• all cache and TLB configurations are specified with the same format:
  <name>:<nsets>:<bsize>:<assoc>:<repl>
• where:
  <name>  - cache name (make this unique)
  <nsets> - number of sets
  <bsize> - block size in bytes
  <assoc> - associativity (number of "ways")
  <repl>  - set replacement policy (l = LRU, f = FIFO, r = random)
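Note that <nsets> is not the cache size: the total cache size equals <nsets> x <bsize> x <assoc>, so <nsets> = cache size / (block size x associativity). For example, a 4 KB direct-mapped cache with 32B blocks has 4096 / (32 x 1) = 128 sets, which is why the first example below uses dl1:128:32:1:l.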
Examples of L1 data cache parameter setting
These examples show the configuration of a direct-mapped cache, a set-associative cache, and a fully-associative cache when the cache block size is 32B and the cache size is 4KB. For Part A, you need to change those parameters according to the given cache block size, 64B or 128B.
Options                    | Size | Associativity              | Cache block size | Replacement policy
-cache:dl1 dl1:128:32:1:l  | 4KB  | Direct mapped (i.e. 1 way) | 32B              | -
-cache:dl1 dl1:32:32:4:l   | 4KB  | 4 ways                     | 32B              | LRU
-cache:dl1 dl1:1:32:128:l  | 4KB  | Fully associative          | 32B              | LRU
Example of unified L2 cache setting
This example configures a 128KB L2 cache with a 64B cache block and 2-way associativity as a unified cache, combining the instruction and data caches into one cache. To unify the L2 cache, the instruction cache is pointed to the data cache. Then, the unified cache named "ul2" is set with specific parameters.
-cache:il2 dl2 -cache:dl2 ul2:1024:64:2:l
For Part A, you are required to use a 64 KB L2 cache with a 64B cache block and 2-way associativity. Thus, you need to change those parameters accordingly.
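For instance, following the same sizing rule as above, a 64 KB unified L2 cache with 64B blocks and 2 ways has 65536 / (64 x 2) = 512 sets, which corresponds to -cache:il2 dl2 -cache:dl2 ul2:512:64:2:l (verify this arithmetic against your own configuration before running).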
In the end, when you run a benchmark, you need to give both L1 data cache parameters and
unified L2 cache parameters. Here is an example.
-cache:dl1 dl1:128:32:1:l -cache:il2 dl2 -cache:dl2 ul2:1024:64:2:l
2. Part B
2.1. Reading
(1) Anant Agarwal and Steven D. Pudar, “Column-Associative Caches: A Technique for Reducing the
Miss Rate of Direct-Mapped Caches,” ISCA 1993
2.2. Guideline
Direct-mapped caches are simple and easy to design, and they have a short hit access time.
However, the biggest drawback of direct-mapped caches is the large number of conflict misses.
Pseudo-associative caches resolve conflicts by allowing alternate hashing functions and achieve a much higher hit rate than normal direct-mapped caches while maintaining almost the same hit access time.
Basically, a pseudo-associative cache is organized the same as a direct-mapped cache. The fundamental idea is to resolve conflicts by dynamically choosing different locations, which are accessed by different hashing functions. When a conflict miss happens, the pseudo-associative cache tries to avoid it by relocating the cache block using another (rehashing) function. The simplest rehashing function is bit selection with the highest-order index bit inverted, which is called bit flipping. For example, with 128 sets, a block that maps to set 0010011 rehashes to set 1010011.
In order to avoid the secondary thrashing effect, which is explained in detail in the reference paper, each cache block is extended with one extra bit of information, called the rehash bit, that indicates whether the block resides in a rehashed location or not.
2.3. Design
Add a new CACHE_TAG_PSEUDOASSOC macro in cache.c to get a tag value with the high-order bit of
the index appended at the end.
#define CACHE_TAG_PSEUDOASSOC(cp, addr) …
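One possible reading of this macro is sketched below. It is only a sketch: it assumes the existing CACHE_TAG and CACHE_SET macros and the set_mask field in cache.c are used to extract the normal tag and the high-order index bit, and the chosen bit layout (tag shifted left by one, index bit in the least-significant position) is an assumption, not the required answer.

/* Sketch only: append the high-order bit of the set index to the normal tag,
   so a block can still be identified when it sits in its bit-flipped set.
   CACHE_TAG, CACHE_SET and set_mask already exist in cache.c; the exact
   layout here is an assumption. */
#define CACHE_TAG_PSEUDOASSOC(cp, addr) \
  ((CACHE_TAG(cp, addr) << 1) | \
   ((CACHE_SET(cp, addr) & (((cp)->set_mask >> 1) + 1)) != 0))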
Add one more field to struct cache_blk_t for the rehash bit, as follows. The rehash bit must be initialized to 1 when the pseudo-associative cache is first created, in the cache_create() function in cache.c.
int rehash_bit;
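A minimal sketch of the initialization, assuming it is placed where cache_create() already invalidates each block (blk->status = 0 is existing cache.c code; the exact placement is up to you):

  /* in cache_create(), inside the loop that initializes every cache block */
  blk->status = 0;        /* existing invalidation code */
  blk->rehash_bit = 1;    /* new: empty blocks start out marked as rehashed */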
You must modify the cache_access() function in cache.c to implement the pseudo-associative cache for the L1D. Since cache_access() is a general function used by all caches in the system and the pseudo-associative behavior applies only to the L1D, you need to write new code for the pseudo-associative cache that is specific to the L1D.
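The sketch below is a small, self-contained toy model in plain C (not SimpleScalar code) of the column-associative access flow you will need to reproduce inside cache_access(): probe the primary set, consult the rehash bit, probe the bit-flipped set, swap on a second-probe hit, and prefer rehashed blocks as victims. All names (set_of, alt_of, access_addr, and so on) are invented for illustration, the cache is reduced to 8 one-word blocks, and the replacement decisions are simplified; follow the reference paper for the authoritative algorithm.

#include <stdio.h>

#define NSETS 8   /* toy direct-mapped cache: 8 sets, one word per block */

struct blk {
  int      valid;
  unsigned tag;        /* extended tag: normal tag plus the high index bit */
  int      rehash_bit; /* 1 = this block currently sits in its rehashed set */
};

static struct blk cache[NSETS];
static int hits, misses;

static unsigned set_of(unsigned a) { return a & (NSETS - 1); }
static unsigned alt_of(unsigned s) { return s ^ (NSETS >> 1); }  /* bit flipping */
static unsigned tag_of(unsigned a)
{
  /* mirrors the idea of CACHE_TAG_PSEUDOASSOC: append the high index bit so a
     block is never confused with one that natively maps to the flipped set */
  return ((a / NSETS) << 1) | ((set_of(a) & (NSETS >> 1)) != 0);
}

static void access_addr(unsigned a)
{
  unsigned s = set_of(a), f = alt_of(s), t = tag_of(a);
  struct blk tmp;

  if (cache[s].valid && cache[s].tag == t) {      /* first-probe hit */
    hits++;
    return;
  }
  if (cache[s].rehash_bit) {                      /* primary block is itself a rehashed
                                                     block: replace it directly and skip
                                                     the second probe (this is what avoids
                                                     secondary thrashing) */
    cache[s].valid = 1; cache[s].tag = t; cache[s].rehash_bit = 0;
    misses++;
    return;
  }
  if (cache[f].valid && cache[f].tag == t) {      /* second-probe hit: swap so the MRU
                                                     block ends up in the primary set */
    tmp = cache[s];
    cache[s] = cache[f]; cache[s].rehash_bit = 0;
    cache[f] = tmp;      cache[f].rehash_bit = 1;
    hits++;
    return;
  }
  /* miss in both locations: new block goes to the primary set and the old
     primary block is rehashed into the alternate set (simplified policy;
     check the paper for the authoritative replacement decisions) */
  cache[f] = cache[s]; cache[f].rehash_bit = 1;
  cache[s].valid = 1; cache[s].tag = t; cache[s].rehash_bit = 0;
  misses++;
}

int main(void)
{
  /* two addresses that conflict in a plain direct-mapped cache (both map to set 0) */
  unsigned trace[] = { 0x10, 0x30, 0x10, 0x30, 0x10, 0x30 };
  unsigned i;

  for (i = 0; i < NSETS; i++) { cache[i].valid = 0; cache[i].rehash_bit = 1; }
  for (i = 0; i < sizeof trace / sizeof trace[0]; i++)
    access_addr(trace[i]);

  printf("hits=%d misses=%d\n", hits, misses);   /* prints 4 hits, 2 misses */
  return 0;
}

On this short conflicting trace the toy cache reports 4 hits and 2 misses, whereas a plain direct-mapped cache of the same size would miss on every access; that difference is exactly what Part B asks you to measure with SimpleScalar.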
2.4. Implementation
Add the following option for the pseudo-associative cache.
-pseudoassoc <true/false> # false # use pseudo-associative cache in L1D
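One way to wire up this option is with SimpleScalar's opt_reg_flag(), following the pattern of the existing flag options registered in sim_reg_options() in sim-outorder.c. The global variable name pseudoassoc below is an assumption; you will also need to make the flag visible to the cache code so that cache_access() can act on it.

  /* sketch: in sim-outorder.c, next to the other option variables */
  static int pseudoassoc;   /* TRUE = use pseudo-associative L1D (assumed name) */

  /* sketch: in sim_reg_options(), next to the other opt_reg_flag() calls */
  opt_reg_flag(odb, "-pseudoassoc",
               "use pseudo-associative cache in L1D",
               &pseudoassoc, /* default */FALSE, /* print */TRUE, NULL);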
2.5. Comparison
Compare the performance of the two L1D cache configurations assuming the same size (128 sets * 64-byte block size = 8KB).
(1) Normal direct-mapped L1D : -cache:dl1 dl1:128:64:1:l -pseudoassoc false
(2) Pseudo-associative L1D   : -cache:dl1 dl1:128:64:1:l -pseudoassoc true
You do not need to consider the different hit access times in the pseudo-associative cache. Focus only on the hit/miss rates (dl1.hits/misses/miss_rate in the SimpleScalar results).
Submission Instruction
1. You have to submit a hard-copy report in class. The report should explain the code you modified or added and show the simulation results and analysis.
2. You also have to turn in your report, code, and log files through the turn-in system on CSNET by the time announced. The submission should be a single file named hw4_UIN.zip (where UIN is your UIN). The log files are the output files you obtain after running the benchmarks.
3. Penalty for late submission: 5% deduction per day.