Cache Analysis
Professor Cheryl Resch
Micah Gant
December 4, 2022
Caches are structures that store data in an organized way so that it can be accessed quickly without having to search main memory. Fundamentally, they temporarily hold data in order to reduce the time it takes to access those locations in memory. This simulation uses three forms of cache organization: Direct Mapped, Set Associative, and Fully Associative.
Direct Mapped has a one-to-one relationship between the line number used to access the data and the line the data is stored on, giving it O(1) time for both finding and replacing if a hash-based data structure is used; the tradeoff is that each line can hold only one block at a time, identified by its tag. I used an unordered map with the line number as its key and the tag as its value.
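A minimal sketch of this lookup, assuming the structure described above (the function and variable names are my own, not taken from the assignment code):

    #include <cstdint>
    #include <unordered_map>

    // Direct mapped: each line index holds exactly one tag. A lookup is a
    // hit only when the stored tag matches; otherwise the line is overwritten.
    bool directMappedAccess(std::unordered_map<uint32_t, uint32_t>& cache,
                            uint32_t line, uint32_t tag) {
        auto it = cache.find(line);
        if (it != cache.end() && it->second == tag)
            return true;   // hit
        cache[line] = tag; // miss: replace whatever occupied this line
        return false;
    }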
Fully Associative is the opposite extreme: a block may be stored on any line until the cache is full, and each value is identified only by its tag. I used an unordered map with the tag as its key and the time it was inserted as its value. Finding data is still O(1) thanks to the hash structure, but choosing which value to replace under either First In First Out (FIFO) or Least Recently Used (LRU) replacement is O(n), where n is the number of cached blocks, since every entry in the cache must be examined to find the lowest insertion time. This makes me think something like a map backed by a balanced binary tree might be better for Fully Associative caches, since balanced binary trees take worst-case O(log n) time to find, insert, and replace/delete, where n is the number of cached blocks.
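A hedged sketch of that structure as described, with my own names and a timestamp that is assumed to be a simple monotonically increasing counter:

    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>

    // Fully associative: maps tag -> timestamp (insertion time for FIFO,
    // last-use time for LRU). Eviction scans every entry, hence O(n).
    bool fullyAssociativeAccess(std::unordered_map<uint32_t, uint64_t>& cache,
                                std::size_t capacity, uint32_t tag,
                                uint64_t now, bool lru) {
        auto it = cache.find(tag);
        if (it != cache.end()) {
            if (lru) it->second = now; // LRU refreshes the counter on a hit
            return true;               // hit
        }
        if (cache.size() >= capacity) {
            auto victim = cache.begin(); // O(n) scan for the oldest timestamp
            for (auto e = cache.begin(); e != cache.end(); ++e)
                if (e->second < victim->second) victim = e;
            cache.erase(victim);
        }
        cache.emplace(tag, now); // miss: insert the new tag
        return false;
    }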
The Set Associative cache is a mixture of the two types: it borrows Direct Mapping's line value idea but calls it a set value, and each set may store multiple blocks of data identified by their tags. I used an unordered_map where the key was the set value and the value was a vector of pairs of tags and entry times; the size of this vector for each set matched the user-entered blocks-per-set value. Finding a value is O(m) in this structure, where m is the number of blocks per set: the set value gives O(1) access to the corresponding vector, but the vector itself must be scanned to find the matching tag, if present. Replacing a value likewise takes O(m) time, since the set (a vector of blocks) must be scanned once to find the lowest entry time to replace.
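A sketch of the set associative access under the same assumptions (names again my own):

    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    using Block = std::pair<uint32_t, uint64_t>; // (tag, timestamp)

    // Set associative: O(1) hop to the set, then an O(m) scan of its blocks.
    bool setAssociativeAccess(
            std::unordered_map<uint32_t, std::vector<Block>>& cache,
            std::size_t blocksPerSet, uint32_t set, uint32_t tag,
            uint64_t now, bool lru) {
        std::vector<Block>& blocks = cache[set];
        for (Block& b : blocks) {
            if (b.first == tag) {
                if (lru) b.second = now; // refresh on hit under LRU
                return true;             // hit
            }
        }
        if (blocks.size() < blocksPerSet) {
            blocks.emplace_back(tag, now); // set not yet full
        } else {
            auto victim = blocks.begin();  // O(m) scan for the oldest block
            for (auto b = blocks.begin(); b != blocks.end(); ++b)
                if (b->second < victim->second) victim = b;
            *victim = {tag, now};
        }
        return false; // miss
    }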
It is worth noting that under LRU, the Fully Associative and Set Associative caches update a block's counter to a new highest value whenever a hit is registered on it. A hit means the data being searched for was found in the cache, while a miss means the opposite; in the case of a miss, the data being searched for is added to the cache after the miss is recorded.
In order to test hit rates for the various types of caches, a variety of input sizes were used. Each type of cache was tested with both LRU and FIFO replacement (except Direct Mapped, which only ever has one replacement option) at cache sizes from 1024 bytes to 16384 bytes, incrementing by powers of 2. The Set Associative cache was tested as a 2-, 4-, and 8-way Set Associative cache (meaning 2, 4, and 8 blocks stored per set, respectively). All of this was done with both a 32-byte and a 64-byte line/block size; a sketch of the full sweep appears below.
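Roughly, the tested configurations amount to the following loop, where runConfig is a hypothetical stand-in for the actual simulation driver:

    #include <cstddef>
    #include <initializer_list>

    // Hypothetical driver: builds one cache and replays the trace through it.
    void runConfig(std::size_t cacheSize, std::size_t blockSize, std::size_t ways);

    void sweepSetAssociative() {
        // Direct Mapped and Fully Associative are swept the same way,
        // just without the ways loop.
        for (std::size_t blockSize : {32, 64})
            for (std::size_t cacheSize = 1024; cacheSize <= 16384; cacheSize *= 2)
                for (std::size_t ways : {2, 4, 8})
                    runConfig(cacheSize, blockSize, ways);
    }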
[Table: hit rates (%) using 32-byte lines]
[Table: hit rates (%) using 64-byte lines]
In the tables, the numbers used as headers represent the cache size in bytes, “FIFO” and “LRU” refer to the replacement algorithm used, and “_way” indicates how many blocks were stored per set for a Set Associative cache, with “_” filled in by the blocks-per-set count. “Full” represents Fully Associative, and “Direct” represents Direct Mapped. The numbers stored as results are hit rates expressed as percentages, as will also be seen in the graphs in the next section of this analysis. The data came from a file named “gcc.trace” given to the students, which contained 515,683 lines of 32-bit values; the tag and set values were calculated for each value before searching the cache and either hitting the stored value or missing, inserting it, and running a replacement algorithm if necessary. This range of inputs was used to ensure some variety within the results, changing each independent variable’s value at least once during testing. Incrementing each input by a power of two also created wide gaps between configurations, which made trends easier to identify within each cache type’s graph.
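As a sketch of how the tag and set fields can be pulled out of a 32-bit value (assuming power-of-two block sizes and set counts, with field names of my own choosing):

    #include <cstdint>

    // offsetBits = log2(block size); indexBits = log2(number of sets),
    // which is 0 for Fully Associative and log2(number of lines) for
    // Direct Mapped. Assumes offsetBits + indexBits < 32.
    void splitAddress(uint32_t addr, uint32_t offsetBits, uint32_t indexBits,
                      uint32_t& setIndex, uint32_t& tag) {
        setIndex = (addr >> offsetBits) & ((1u << indexBits) - 1u);
        tag      = addr >> (offsetBits + indexBits);
    }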
Seen below are the graphed results taken from the data stored in each table, with the tables posted again for the convenience of comparing them to the graphs.
[Graphs and repeated tables: hit rates using 32-byte lines and using 64-byte lines]
From the tables and graphs above, it is clear that LRU replacement achieves a consistently (if marginally) higher hit rate than FIFO replacement in every circumstance for this dataset. Unless the data were completely random in value and access pattern, it is reasonable to expect LRU to hit more often, since the most frequently accessed data keeps escaping replacement while the least recently used data gets overwritten. In theory this prevents, for example, an important variable declared at the beginning of a program from being evicted while it is still actively in use. Fully Associative caches also maintain a higher hit rate than any Set Associative cache, and Direct Mapping comes in last in all circumstances. This makes sense: Fully Associative is essentially Set Associative with all blocks stored in one set, which minimizes overlapping of data because insertion can go anywhere. Direct Mapped is one-to-one, so any access whose line matches but whose tag does not is a miss, hence the lower hit rate. It is appropriate to think of Direct Mapped as a Set Associative cache with one block per set; since each set holds only one block, there is no choice of replacement policy, as the single block in the set is always the one replaced. The results therefore make logical sense and confirm the relationship between Direct Mapped, Set Associative, and Fully Associative caches.
It is worth mentioning that despite offering more lines of storage, 32-byte lines resulted in overall lower hit rates at larger cache sizes. I theorize this is because the offset field is smaller when the block size is smaller, which makes the tag, set, or line values larger as a result. Longer tags/set values/line values mean more bits have to match when searching for a hit. This relationship can be compared to a password and its length: adding just one character greatly increases the time it takes to randomly match the password, and similarly, adding just one bit to a tag/set value/line value greatly reduces the odds that another random value matches. A worked example of the field widths appears below.
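As a worked example of that bit arithmetic, using the Fully Associative case on 32-bit values, where the breakdown is simplest:

    offset bits = log2(block size)  ->  5 bits for 32-byte blocks, 6 bits for 64-byte blocks
    tag bits    = 32 - offset bits  -> 27 bits for 32-byte blocks, 26 bits for 64-byte blocks

So every lookup with 32-byte blocks must match one more tag bit than with 64-byte blocks.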
It is also noticeable that larger cache sizes produce higher hit rates, which makes sense because a larger cache allows more lines/blocks to exist without affecting the offset. More blocks means, by a pigeonhole-style argument, lower odds of insertions overlapping, and since the data is still indexed in essentially the same way, there is no negative statistical effect on the hit rate compared to a cache with the same block size but a smaller overall memory. Thus, it seems reasonable to conclude that a Fully Associative cache using LRU replacement, with a large cache memory and a large block size, would prove the most effective at deterring misses. The best ratio between cache size and block size is worth looking into, as making the block size equal to the cache size would leave only one block available, guaranteeing overlap on every subsequent value after the first whose identifier does not match, which is obviously inefficient. So clearly, simply making the block size larger is not enough on its own; it must maintain this “best ratio” with the cache size as well.