Implementation of Low Power and Delay Efficient L2 Cache Design
N.Munikumar, Kuppam N Chandra Sekhar, R.Mallikarjuna Reddy

Abstract— The power consumption of a system is a crucial parameter in modern VLSI circuits, especially for low-power applications. Many high-performance microprocessors employ the cache write-through policy to improve performance while at the same time achieving good tolerance to soft errors in on-chip caches. However, the write-through policy also incurs a large energy overhead due to the increased accesses to caches at the lower level (e.g., L2 caches) during write operations. In this paper, we propose a new cache architecture, referred to as way-tagged cache, to improve the energy efficiency of write-through caches. By maintaining the way tags of the L2 cache in the L1 cache during read operations, the proposed technique enables the L2 cache to work in an equivalent direct-mapping manner during write hits, which account for the majority of L2 cache accesses. This leads to significant energy reduction without performance degradation. In this paper we also introduce the AXI protocol, a FIFO architecture, and a modified way-tagged cache. Simulation results based on AXI protocol data transactions (write-through policy) demonstrate that the proposed technique achieves 70% energy savings and time savings of up to 30–40% in L2 caches. Similar results are also obtained under different L1 and L2 cache configurations. Furthermore, the idea of way tagging can be applied to existing low-power cache design techniques to further improve energy efficiency.
Index Terms— Cache, low power, write-through policy.
I INTRODUCTION
MULTI-LEVEL on-chip cache systems have been widely adopted in high-performance microprocessors. To keep data consistent throughout the memory hierarchy, write-through and write-back policies are commonly
employed. Under the write-back policy, a modified cache
block is copied back to its corresponding lower level cache
only when the block is about to be replaced. It has been
reported that single-event multi-bit upsets are getting worse
in on-chip memories.
N. Munikumar, ECE Department, Jawaharlal Nehru Technological University, GVIC College, Madanapalle, Andhra Pradesh, INDIA. Mobile: 9000226722. e-mail: munikumar431@gmail.com
Kuppam N Chandra Sekhar, ECE Department, Jawaharlal Nehru Technological University, GVIC College, Madanapalle, Andhra Pradesh, INDIA. Mobile: 9885229501. e-mail: chandrasekhar501@gmail.com
R. Mallikarjuna Reddy, ECE Department, Jawaharlal Nehru Technological University, GVIC College, Madanapalle, Andhra Pradesh, INDIA. Mobile: 8096330172. e-mail: mallikarjunareddy.r416@gmail.com
Currently, this problem has been addressed at different levels of the design abstraction. At the architecture level, an effective solution is to keep data consistent among different levels of the memory hierarchy to prevent the system from collapsing due to soft errors. While enabling
better tolerance to soft errors, the write-through policy also
incurs large energy overhead. This is because under the
write-through policy, caches at the lower level experience
more accesses during write operations. Consider a two-level
(i.e., Level-1 and Level-2) cache system for example. If the
L1 data cache implements the write-back policy, a write hit
in the L1 cache does not need to access the L2 cache. In
contrast, if the L1 cache is write-through, then both L1 and
L2 caches need to be accessed for every write operation.
Obviously, the write-through policy incurs more write
accesses in the L2 cache, which in turn increases the energy
consumption of the cache system. Power dissipation is now
considered as one of the critical issues in cache design.
Studies have shown that on-chip caches can consume about
50% of the total power in high-performance
microprocessors.
In this paper, we propose a new cache architecture, referred to as way-tagged cache, to improve the energy efficiency of write-through cache systems with minimal area overhead and no performance degradation. Consider a two-level cache hierarchy, where the L1 data cache is write-through and the L2 cache is inclusive for high performance.
It is observed that all the data residing in the L1 cache will
have copies in the L2 cache. In addition, the locations of
these copies in the L2 cache will not change until they are
evicted from the L2 cache. Thus, we can attach a tag to each
way in the L2 cache and send this tag information to the L1
cache when the data is loaded to the L1 cache. By doing so,
for all the data in the L1 cache, we will know exactly the
locations of their copies in the L2 cache. During the
subsequent accesses when there is a write hit in the L1 cache
(which also initiates a write access to the L2 cache under the
write-through policy), we can access the L2 cache in an
equivalent direct-mapping manner because the way tag of
the data copy in the L2 cache is available. As this operation
accounts for the majority of L2 cache accesses in most
applications, the energy consumption of L2 cache can be
reduced significantly.
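A minimal behavioral sketch of this idea is given below (C, illustrative only; the structure, field names, and 4-way associativity are assumptions, not the paper's RTL). Each L1 line remembers the L2 way that holds its copy, so a write hit can probe exactly one L2 way instead of all of them.

```c
/* Illustrative model of the way-tagging idea (assumed names and sizes). */
#include <stdio.h>
#include <stdbool.h>

#define L2_WAYS 4

typedef struct {
    bool     valid;
    unsigned tag;      /* L1 address tag                                */
    unsigned l2_way;   /* way tag: which L2 way holds the copy          */
} L1Line;

static long ways_activated = 0;  /* crude energy proxy: L2 ways probed  */

/* Write-through update of the L2 copy. If the L2 way is known (write
 * hit in L1), only that way is activated; otherwise all ways are probed. */
static void l2_write(unsigned addr, bool way_known, unsigned way)
{
    (void)addr;
    ways_activated += way_known ? 1 : L2_WAYS;
    if (way_known)
        printf("L2 write: direct access to way %u\n", way);
    else
        printf("L2 write: all %d ways activated\n", L2_WAYS);
}

int main(void)
{
    L1Line line = { .valid = true, .tag = 0x12, .l2_way = 2 };

    /* L1 write hit: the stored way tag lets the L2 cache behave like a
       direct-mapped cache for this access. */
    l2_write(0x1234, line.valid, line.l2_way);

    /* L1 write miss: no way tag available, conventional access. */
    l2_write(0x5678, false, 0);

    printf("ways activated in total: %ld (vs %d conventionally)\n",
           ways_activated, 2 * L2_WAYS);
    return 0;
}
```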
The basic idea of way-tagged cache was initially
proposed in our past work with some preliminary results. In
this paper, we extend this work by making the following
contributions. First, a detailed VLSI architecture of the
proposed way-tagged cache is developed, where various
design issues regarding timing, control logic, operating
mechanisms, and area overhead have been studied. Second,
we demonstrate that the idea of way tagging can be
extended to many existing low-power cache design
techniques so that better tradeoffs of performance and
energy efficiency can be achieved. Third, a detailed energy
model is developed to quantify the effectiveness of the
proposed technique. Finally, a comprehensive suite of
simulations is performed with new results covering the
effectiveness of the proposed technique under different
cache configurations. It is also shown that the proposed
technique can be integrated with existing low-power cache
design techniques to further improve energy efficiency.





II LITERATURE SURVEY
A. CACHE:
Cache is the name given to the highest or first level of the memory hierarchy encountered once the address leaves the processor. A cache is a high-speed access area that can be either a reserved section of main memory or a storage device. Cache is a high-speed SRAM memory used to bridge the speed gap between the processor and the slower Lower Level Memories (LLM). For accessing LLM, one pays the price of longer access time and larger power consumption. By reducing the number of accesses to LLM, the cache decreases the access time and improves the overall dynamic power consumption. When the processor finds a requested data item in the cache, it is called a cache hit. When the processor does not find a data item it needs in the cache, a cache miss occurs. A fixed-size collection of data containing the requested word is called a block.
The two main cache types are:
 Memory Cache
 Disk Cache
Memory Cache:
It is a portion of memory made of high-speed static RAM and is effective because most programs access the same data or instructions over and over. Most computers today come with L3 Cache or L2 Cache, while older computers included only L1 Cache.
Disk Cache:
It is a mechanism for improving the time it takes to read from or write to a hard disk. The disk cache is usually included as part of the hard disk; it may also be a reserved section of conventional main memory (640 KB). The disk cache holds data that has recently been read and, in some cases, adjacent data areas that are likely to be accessed next. Write caching is also provided with some disk caches. It can dramatically improve the performance of applications, because accessing a byte of data in RAM can be thousands of times faster than accessing a byte on a hard disk. The purpose of cache memory is to speed up data accesses for the processor by storing information in faster memory made of SRAM. SRAM access time is 3 ns to 10 ns, whereas DRAM access time is 30 ns to 90 ns.


 A cache is a small area of fast RAM and associated control logic which is placed between the processor and main memory.
 The area of RAM is typically organized as a number of cache lines, each of which comprises a number of sub-words that are used to store contiguous addresses from main memory.
 Since the cache is smaller than main memory, it stores a sub-set of the memory content.
 Each cache line is composed of a number of sub-words that are each one byte (8 bits) in size, and the memory system is byte addressed.
 The idea of cache memories is similar to virtual memory in that some active portion of a low-speed memory is stored in duplicate in a higher-speed cache memory.
 When a memory request is generated, the request is first presented to the cache memory, and if the cache cannot respond, the request is then presented to main memory.
 The time required for a cache miss depends on both the latency and bandwidth of the memory. Latency determines the time to retrieve the first word of the block, and bandwidth determines the time to retrieve the rest of the block. A cache miss is handled by hardware and causes processors using in-order execution to pause, or stall, until the data are available.
 The data stored in the cache is data that the processor is likely to use in the very near future. SRAM is fast but has less capacity, so it stores only a subset of the data held in main memory.
Table 2.1
L1 Cache: Also referred to as primary cache, internal cache, or system cache.
When referring to computer processors, the L1 Cache is cache that is built into the processor and is the fastest and most expensive cache in the computer. The L1 Cache stores the most critical data to be executed and is the first place the processor looks when performing an instruction.
L2 Cache: Also referred to as secondary or external cache.
The L2 Cache was first introduced with the Intel Pentium and Pentium Pro computers and has been included with every processor since. This cache is not as fast as the L1 Cache, but is only slightly slower since it is still located on the same processor chip, and it is still faster than the computer's memory.
L3 Cache:
The Intel Core i7-3960X processor die is an example of a processor chip containing six cores and a shared L3 Cache. The L3 Cache is shared between all cores and is very large in comparison to what an L1 or L2 Cache would be on the same chip, because it is cheaper although slower.
Figure 2.1 L3 Cache
MAPPING SCHEME 1:
DIRECT MAPPED CACHE:
The cache consists of normal high-speed random access memory, and each location in the cache holds the data at an address in the cache given by the lower significant bits of the main memory address. This enables the block to be selected directly from the lower significant bits of the memory address. The remaining higher significant bits of the address are stored in the cache with the data to complete the identification of the cached data.
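As a concrete illustration of this split (a hedged sketch; the bit widths below are assumed for illustration, not taken from the paper), a 32-bit byte address can be decomposed into tag, index, and word-offset fields:

```c
/* Direct-mapped address decomposition (illustrative widths assumed:
 * 32-byte lines, 1024 cache lines, 32-bit byte addresses). */
#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 5                 /* 2^5  = 32-byte line */
#define INDEX_BITS  10                /* 2^10 = 1024 lines   */

int main(void)
{
    uint32_t addr   = 0x0012ABCDu;
    uint32_t offset =  addr        & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    =  addr >> (OFFSET_BITS + INDEX_BITS);

    /* The index selects the cache line directly; the stored tag is then
       compared against the address tag to confirm a hit. */
    printf("addr=0x%08X  tag=0x%X  index=%u  offset=%u\n",
           addr, tag, index, offset);
    return 0;
}
```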
Basic Terminology:
Memory is divided into blocks. Each block contains a fixed number of words. Word = size of the data stored in one location, e.g., 8 bits, 16 bits, etc. One block is used as the minimum unit of transfer between main memory and the cache. Hence, each location in the cache stores one block.
Figure 2.2 Main memory
Figure 2.3 Cache with Direct mapping
 A processor might have 512 KB of cache and 512 MB of RAM.
 There may be 1000 times more RAM than cache.
 The cache algorithms have to carefully select the 0.1% of the memory that is likely to be most accessed.
CACHE MAPPING SCHEME:
If the CPU generates an address for a particular word in main memory, and that data happens to be in the cache, the same main memory address cannot be used to access the cache. Hence a mapping scheme is required that converts the generated main memory address into a cache location. The mapping scheme also determines where the block will be placed when it is originally copied into the cache.
The address from the processor is divided into two fields, a tag and an index. The tag consists of the higher significant bits of the address, which are stored with the data. The index is the lower significant bits of the address used to address the cache. When the memory is referenced, the index is first used to access a word in the cache. Then the tag stored in the accessed word is read and compared with the tag in the address. If the two tags are the same, indicating that the word is the one required, access is made to the addressed cache word. However, if the tags are not the same, indicating that the required word is not in the cache, reference is made to the main memory to find it. For a memory read operation, the word is then transferred into the cache where it is accessed. It is possible to pass the information to the cache and the processor simultaneously, i.e., to read-through the cache, on a miss. The cache location is altered for a write operation. The main memory may be altered at the same time (write-through) or later. The main memory address is composed of a tag, an index, and a word within a line. All the words within a line in the cache have the same stored tag. The index part of the address is used to access the cache, and the stored tag is compared with the required tag address. For a read operation, if the tags are the same, the word within the block is selected for transfer to the processor. If the tags are not the same, the block containing the required word is first transferred to the cache.
CACHE OPERATION:
Most of the time the cache is busy filling cache lines (reading from memory), but the processor does not write a whole cache line, which can be up to 128 bytes; it only writes between 1 and 8 bytes. Therefore it must perform a read-modify-write sequence on the cache line. The cache also uses one of two write operations: write-through, where data is updated both in the cache and in the main memory, and write-back, where data is written to the cache and updated in the main memory only when the cache line is replaced.


 CPU requests the contents of a memory location.
 Check the cache for this data.
 If present, get it from the cache (fast).
 If not present, read the required block from main memory into the cache.
 Then deliver it from the cache to the CPU.
 The cache includes tags to identify which block of main memory is in each cache slot.
 If the memory location is found, it is read from the cache (hit).
 If not, the memory location is searched in main memory, the block containing the referred memory location is transferred into the cache, and it is then read by the CPU (miss).
 Cache access time (c): time between the requested address arriving and the requested data being placed on the bus.
 Main memory access time (m): time it takes to transfer data from main memory.
 Hit ratio hr (0 ≤ hr ≤ 1): hr = cache hits / (cache hits + cache misses), i.e., hits divided by the total requests made by the CPU.
 Miss ratio (mr): mr = 1 − hr.
 Mean/average access time: m.a.t. = c + (1 − hr)·m; if hr → 1 then m.a.t. = c, and if hr → 0 then m.a.t. = c + m.
 Efficiency of cache (%) = cache access time × 100 / mean access time.
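A small worked example of these formulas is shown below (the access times and hit ratio are assumed values, chosen only for illustration):

```c
/* Mean access time and cache efficiency from the formulas above
 * (c, m, and the hit ratio are assumed example values). */
#include <stdio.h>

int main(void)
{
    double c  = 5.0;    /* cache access time, ns                 */
    double m  = 60.0;   /* main memory access time, ns           */
    double hr = 0.95;   /* hit ratio = hits / total CPU requests */

    double mat        = c + (1.0 - hr) * m;   /* mean access time */
    double efficiency = c * 100.0 / mat;      /* percent          */

    printf("mean access time = %.2f ns\n", mat);   /* 5 + 0.05*60 = 8 ns */
    printf("cache efficiency = %.1f %%\n", efficiency);
    return 0;
}
```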
Cache Read Operation Data Flow:
Figure 2.4 Cache Read Operation Data Flow
III EXISTING TECHNOLOGY
A. SPEC2000 BENCHMARK
SPEC:
SPEC is an acronym for the Standard Performance Evaluation Corporation. SPEC is a non-profit organization composed of computer vendors, systems integrators, universities, research organizations, publishers and consultants whose goal is to establish, maintain and endorse a standardized set of relevant benchmarks for computer systems. Although no one set of tests can fully characterize overall system performance, SPEC believes that the user community will benefit from an objective series of tests which can serve as a common reference point.
BENCHMARK: A STANDARD OF MEASUREMENT
Computer benchmark metrics usually measure speed (how fast the workload was completed) or throughput (how many workloads were completed per unit time). Running the same computer benchmark on multiple computers allows a comparison to be made. We evaluate our results using benchmarks from SPEC CPU2000. The benchmarks are compiled for the Alpha instruction set using the Compaq Alpha compiler with SPEC peak settings. For each program, we follow the recommendation but skip a minimum of 1 billion instructions.
USE A BENCHMARK:
 Ideally, the best comparison test for systems would be your own application with your own workload.
 Unfortunately, it is often very difficult to get a wide base of reliable, repeatable and comparable measurements for comparisons of different systems on your own application with your own workload.
 This might be due to time, money, confidentiality, or other constraints.
OPTIONS AVAILABLE IN BENCHMARK:
 At this point, you can consider using standardized benchmarks as a reference point.
 Ideally, a standardized benchmark will be portable and may already run on the platforms that you are interested in.
SPEC CPU 2000 MEASURE:
 SPEC CPU2000 focuses on compute-intensive performance, which means these benchmarks emphasize the performance of the computer's processor (CPU), memory architecture, and compilers.
 It is important to remember the contribution of the latter two components; performance is more than just the processor.
 SPEC CPU2000 is made up of two subcomponents that focus on two different types of compute-intensive performance: CINT2000 for measuring and comparing compute-intensive integer performance, and CFP2000 for measuring and comparing compute-intensive floating-point performance.
Table 3.1
B. WAY-TAGGED CACHE:
In this section, we propose a way-tagged cache that exploits the way information in the L2 cache to improve energy efficiency. We consider a conventional set-associative cache system: when the L1 data cache loads/writes data from/into the L2 cache, all ways in the L2 cache are activated simultaneously for performance consideration, at the cost of energy overhead. In Section V, we will extend this technique to L2 caches with phased tag-data accesses. Fig. 1 illustrates the architecture of the two-level cache. Only the L1 data cache and the L2 unified cache are shown, as the L1 instruction cache only reads from the L2 cache. Under the write-through policy, the L2 cache always maintains the most recent copy of the data. Thus, whenever data is updated in the L1 cache, the L2 cache is updated with the same data as well. This results in an increase in the write accesses to the L2 cache and consequently more energy consumption. Here we examine some important properties of write-through caches through statistical characterization of cache accesses.
Fig. 2 shows the simulation results of L2 cache accesses based on the SPEC CPU2000 benchmarks [30]. These results are obtained from SimpleScalar for the cache configuration given in Section VII-A. Unlike the L1 cache
where read operations account for a large portion of total
memory accesses, write operations are dominant in the L2
cache for all but three benchmarks (galgel, ammp, and art).
This is because read accesses in the L2 cache are initiated by
the read misses in the L1 cache, which typically occur much
less frequently (the miss rate is less than 5% on average
[27]). For galgel, ammp, and art, L1 read miss rates are high
resulting in more read accesses than write accesses.
Nevertheless, write accesses still account for about 20%–
40% of the total accesses in the L2 cache. From the results
in Section VII, each L2 read or write access consumes
roughly the same amount of energy on average. Thus,
reducing the energy consumption of L2 write accesses is an
effective way for memory power management.
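A rough back-of-envelope estimate of this effect is sketched below (illustrative only; this is not the paper's detailed energy model, and both the associativity and the write-hit fraction are assumed numbers): if a fraction of L2 accesses are write hits served by a single known way, the total number of way activations drops sharply.

```c
/* Back-of-envelope estimate (illustrative only, not the paper's energy
 * model): fraction of L2 way activations saved by way-tagging. */
#include <stdio.h>

int main(void)
{
    int    n_ways     = 4;     /* assumed L2 associativity               */
    double write_hits = 0.70;  /* assumed fraction of L2 accesses that
                                  are write hits with a known way tag    */

    /* Conventionally every access activates n_ways; with way-tagging,
       write hits activate a single way. */
    double conventional = 1.0 * n_ways;
    double way_tagged   = write_hits * 1.0 + (1.0 - write_hits) * n_ways;

    printf("relative way activations: %.2f of conventional\n",
           way_tagged / conventional);          /* 0.475 for these inputs */
    printf("savings: %.1f %%\n", 100.0 * (1.0 - way_tagged / conventional));
    return 0;
}
```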
As explained in the introduction, the locations (i.e., way tags) of L1 data copies in the L2 cache will not change until the data are evicted from the L2 cache. The proposed way-tagged cache exploits this fact to reduce the number of ways accessed during L2 cache accesses. When the L1 data cache loads data from the L2 cache, the way tag of the data in the L2 cache is also sent to the L1 cache and stored in a new set of way-tag arrays. These way tags provide the key information for the subsequent write accesses to the L2 cache.
In general, both write and read accesses in the L1 cache may need to access the L2 cache. These accesses lead to different operations in the proposed way-tagged cache. Under the write-through policy, all write operations of the L1 cache need to access the L2 cache. In the case of a write hit in the L1 cache, only one way in the L2 cache will be activated because the way-tag information of the L2 cache is available, i.e., from the way-tag arrays we can obtain the L2 way of the accessed data. For a write miss in the L1 cache, on the other hand, the requested data is not stored in the L1 cache. As a result, its corresponding L2 way information is not available in the way-tag arrays. Therefore, all ways in the L2 cache need to be activated simultaneously. Since write hit/miss is not known a priori, the way-tag arrays need to be accessed simultaneously with all L1 write operations in order to avoid performance degradation. Note that the way-tag arrays are very small and the involved energy overhead can be easily compensated. For L1 read operations, neither read hits nor misses need to access the way-tag arrays. This is because read hits do not need to access the L2 cache, while for read misses the corresponding way-tag information is not available in the way-tag arrays. As a result, all ways in the L2 cache are activated simultaneously under read misses.
Figure 3.1 L2 Cache
Figure 3.2 Read Write graph
From Fig. 3.2, write accesses account for the majority of L2 cache accesses in most applications. In addition, write hits are dominant among all write operations. Therefore, by activating fewer ways in most of the L2 write accesses, the proposed way-tagged cache is very effective in reducing memory energy consumption. Fig. 3 shows the system diagram of the proposed way-tagged cache. We introduce several new components: way-tag arrays, way-tag buffer, way decoder, and way register, all shown in the dotted line.
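The access cases just described can be summarized in a small behavioral sketch (C, illustrative only; the 4-way associativity, type, and function names are assumptions, not the paper's design):

```c
/* Summary of which L2 ways are activated for each L1 access case,
 * per the description above (names and values are illustrative). */
#include <stdio.h>

#define L2_WAYS 4

typedef enum { WRITE_HIT, WRITE_MISS, READ_HIT, READ_MISS } AccessCase;

/* Returns the number of L2 ways activated for the given L1 access case. */
static int l2_ways_activated(AccessCase c)
{
    switch (c) {
    case WRITE_HIT:  return 1;        /* way tag known: single way       */
    case WRITE_MISS: return L2_WAYS;  /* way tag unknown: all ways       */
    case READ_HIT:   return 0;        /* L2 not accessed at all          */
    case READ_MISS:  return L2_WAYS;  /* conventional set-assoc access   */
    }
    return L2_WAYS;
}

int main(void)
{
    const char *name[] = { "write hit", "write miss", "read hit", "read miss" };
    for (int c = WRITE_HIT; c <= READ_MISS; c++)
        printf("%-10s -> %d way(s) activated in L2\n",
               name[c], l2_ways_activated((AccessCase)c));
    return 0;
}
```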
The way tags of each cache line in the L2 cache are
maintained in the way-tag arrays, located with the L1 data
cache. Note that write buffers are commonly employed in
write-through caches (and even in many write-back caches) to improve performance. With a write buffer, the data to be written into the L1 cache is also sent to the write buffer. The operations stored in the write buffer are then sent to the
L2 cache in sequence. This avoids write stalls when the
processor waits for write operations to be completed in the
L2 cache. In the proposed technique, we also need to send
the way tags stored in the way-tag arrays to the L2 cache
along with the operations in the write buffer. Thus, a small
way-tag buffer is introduced to buffer the way tags read
from the way-tag arrays. A way decoder is employed to
decode way tags and generate the enable signals for the L2
cache, which activate only the desired ways in the L2 cache.
Each way in the L2 cache is encoded into a way tag. A way
register stores way tags and provides this information to the
way-tag arrays.
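A minimal sketch of the way-decoder function described above is shown below (a behavioral C model under assumed parameters; the real design is hardware, and the 4-way L2 and function names are assumptions):

```c
/* Behavioral model of the way decoder: turn a way tag into one-hot
 * enable signals for the L2 ways (4-way L2 assumed for illustration). */
#include <stdio.h>
#include <stdbool.h>

#define L2_WAYS 4

/* If the way tag is valid, enable only that way; otherwise enable all
   ways (write-miss / read-miss case). */
static unsigned way_decode(unsigned way_tag, bool tag_valid)
{
    if (tag_valid)
        return 1u << way_tag;            /* one-hot enable   */
    return (1u << L2_WAYS) - 1u;         /* all ways enabled */
}

int main(void)
{
    printf("enable (tag=2, valid)   = 0x%X\n", way_decode(2, true));  /* 0x4 */
    printf("enable (tag=-, invalid) = 0x%X\n", way_decode(0, false)); /* 0xF */
    return 0;
}
```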
Figure 3.3 Way tag arrays
IV PROPOSED TECHNOLOGY
A. AXI PROTOCOL
AXI:
AXI is part of ARM AMBA, a family of microcontroller buses first introduced in 1996. The first version of AXI was included in AMBA 3.0, released in 2003. AMBA 4.0, released in 2010, includes the second version of AXI, AXI4. The AMBA AXI protocol is targeted at high-performance, high-frequency system designs and includes a number of features that make it suitable for a high-speed submicron interconnect.
THE OBJECTIVES OF THE LATEST GENERATION AMBA INTERFACE ARE TO:
 enable high-frequency operation without using complex bridges;
 be suitable for high-bandwidth and low-latency designs;
 meet the interface requirements of a wide range of components;
 be suitable for memory controllers with high initial access latency;
 provide flexibility in the implementation of interconnect architectures;
 be backward-compatible with existing AHB and APB interfaces.
THE KEY FEATURES:
 separate address/control and data phases;
 support for unaligned data transfers using byte strobes;
 burst-based transactions with only the start address issued;
 separate read and write data channels to enable low-cost Direct Memory Access (DMA);
 ability to issue multiple outstanding addresses;
 out-of-order transaction completion;
 easy addition of register stages to provide timing closure.
As well as the data transfer protocol, the AXI protocol includes optional extensions that cover signaling for low-power operation.
There are three types of AXI4 interfaces:
 AXI4—for high-performance memory-mapped requirements.
 AXI4-Lite—for simple, low-throughput memory-mapped communication (for example, to and from control and status registers).
 AXI4-Stream—for high-speed streaming data.
AXI WORK:
The AXI specifications describe an interface between a single AXI master and a single AXI slave, representing IP cores that exchange information with each other. Memory-mapped AXI masters and slaves can be connected together using a structure called an Interconnect block. The Xilinx AXI Interconnect IP contains AXI-compliant master and slave interfaces, and can be used to route transactions between one or more AXI masters and slaves.
READ AND WRITE ADDRESS CHANNELS:
Read and write transactions each have their own address channel. The appropriate address channel carries all of the required address and control information for a transaction.
READ DATA CHANNEL:
The read data channel conveys both the read data and any read response information from the slave back to the master. The read data channel includes:
 The data bus, which can be 8, 16, 32, 64, 128, 256, 512, or 1024 bits wide.
 A read response indicating the completion status of the read transaction.
Figure 4.1 Channel Architecture of Reads
WRITE DATA CHANNEL:
The write data channel conveys the write data from the
master to the slave and includes:
 The data bus, which can be 8, 16, 32, 64, 128, 256,
512, or 1024 bits wide
 One byte lane strobe for every eight data bits,
indicating which bytes of the data bus are valid.
Write data channel information is always treated as
buffered, so that the master can perform write transactions
without slave acknowledgement of previous write
transactions.
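As an illustration of the byte-lane strobes (a hedged sketch; the bus width and the particular transfer below are assumed examples, not taken from the text), one strobe bit is set for each byte lane that carries valid data:

```c
/* Illustrative WSTRB computation for a 32-bit (4-byte) AXI write data
 * bus: one strobe bit per byte lane that carries valid data. */
#include <stdio.h>
#include <stdint.h>

#define BUS_BYTES 4   /* assumed 32-bit data bus */

/* start_lane: first valid byte lane; nbytes: number of valid bytes */
static uint8_t wstrb(unsigned start_lane, unsigned nbytes)
{
    uint8_t strb = 0;
    for (unsigned i = 0; i < nbytes && start_lane + i < BUS_BYTES; i++)
        strb |= (uint8_t)(1u << (start_lane + i));
    return strb;
}

int main(void)
{
    /* 2-byte write starting at byte lane 1 -> lanes 1 and 2 are valid */
    printf("WSTRB = 0x%X\n", wstrb(1, 2));   /* prints 0x6 */
    return 0;
}
```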
WRITE RESPONSE CHANNEL:
The write response channel provides a way for the
slave to respond to write transactions. All write transactions
use completion signaling. The completion signal occurs
once for each burst, not for each individual data transfer
within the burst.
Figure 4.2 Channel Architecture of Writes
AXI4-STREAM PROTOCOL:
The AXI4-Stream protocol is used for applications
that typically focus on a data-centric and data-flow
paradigm where the concept of an address is not present or
not required. Each AXI4-Stream acts as a single
unidirectional channel for a handshake data flow. At this
lower level of operation (compared to the memory mapped
AXI protocol types), the mechanism to move data between
IP is defined and efficient, but there is no unifying address
context between IP. The AXI4-Stream IP can be better
optimized for performance in data flow applications, but
also tends to be more specialized around a given application
space.
The AXI4-Stream protocol defines a single channel
for transmission of streaming data. The AXI4-Stream
channel is modeled after the Write Data channel of the
AXI4. Unlike AXI4, AXI4-Stream interfaces can burst an
unlimited amount of data. There are additional, optional
capabilities described in the AXI4-Stream Protocol
Specification. The specification describes how AXI4-Stream-compliant interfaces can be split, merged, interleaved, upsized, and downsized. Unlike AXI4, AXI4-Stream transfers cannot be reordered. With regard to AXI4-Stream, it should be noted that even if two pieces of IP are
designed in accordance with the AXI4-Stream specification,
and are compatible at a signaling level, it does not guarantee
that two components will function correctly together due to
higher level system considerations.
READ BURST EXAMPLE:
Consider a read burst of four transfers. In this example, the master drives the address, and the slave accepts it one cycle
transfer occurs on the read data channel. The slave keeps the
VALID signal LOW until the read data is available. For the
final data transfer of the burst, the slave asserts the RLAST
signal to show that the last data item is being transferred.
The master also drives a set of control signals
showing the length and type of the burst, but these signals
are omitted from the figure for clarity.
Figure 4.3 Read burst
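A simple behavioral sketch of this burst is given below (illustrative only, not RTL; the burst length matches the example, but the data values and signal handling are made up): the slave returns one beat at a time and asserts RLAST on the final beat.

```c
/* Behavioral sketch of an AXI read burst of four transfers: the slave
 * drives RVALID with each data beat and RLAST on the final one. */
#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    const int burst_len = 4;                   /* four transfers    */
    const int data[]    = { 10, 11, 12, 13 };  /* made-up read data */

    for (int beat = 0; beat < burst_len; beat++) {
        bool rvalid = true;                    /* data available        */
        bool rlast  = (beat == burst_len - 1); /* final beat of burst   */
        printf("beat %d: RVALID=%d RDATA=%d RLAST=%d\n",
               beat, rvalid, data[beat], rlast);
    }
    return 0;
}
```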
WRITE BURST EXAMPLE:
The process starts when the master sends an
address and control information on the write address
channel. The master then sends each item of write data over
the write data channel. When the master sends the last data
item, the WLAST signal goes HIGH.
Figure 4.4 Write burst
TRANSACTION ORDERING:
The AXI protocol enables out-of-order transaction
completion. It gives an ID tag to every transaction across the
interface. The protocol requires that transactions with the
same ID tag are completed in order, but transactions with
different ID tags can be completed out of order.
If a master requires that transactions are completed
in the same order that they are issued, then they must all
have the same ID tag. If, however, a master does not require
in-order transaction completion, it can supply the
transactions with different ID tags, enabling them to be
completed in any order. In a multi-master system, the
interconnect is responsible for appending additional
information to the ID tag to ensure that ID tags from all
masters are unique.
The ID tag is similar to a master number, but with
the extension that each master can implement multiple
virtual masters within the same port by supplying an ID tag
to indicate the virtual master number. Although complex
devices can make use of the out-of-order facility, simple
devices are not required to use it. Simple masters can issue
every transaction with the same ID tag, and simple slaves
can respond to every transaction in order, irrespective of the
ID tag.
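The ordering rule can be illustrated with a small check (a sketch, not taken from the specification text; the IDs and completion orders below are made up): completions may interleave across IDs, but for any single ID they must follow issue order.

```c
/* Sketch of the AXI ordering rule: transactions with the same ID must
 * complete in issue order; different IDs may complete out of order. */
#include <stdio.h>
#include <stdbool.h>

typedef struct { int id; int seq; } Txn;   /* seq = issue order per ID */

/* Returns true if the completion sequence respects per-ID ordering. */
static bool ordering_ok(const Txn *done, int n)
{
    int last_seq[16];                       /* last completed seq per ID */
    for (int i = 0; i < 16; i++) last_seq[i] = -1;
    for (int i = 0; i < n; i++) {
        if (done[i].seq <= last_seq[done[i].id])
            return false;                   /* same-ID reordering: illegal */
        last_seq[done[i].id] = done[i].seq;
    }
    return true;
}

int main(void)
{
    /* Issued: A0, A1 (ID 0) and B0 (ID 1). Completing B0, A0, A1 is
       legal; completing A1 before A0 would not be. */
    Txn legal[]   = { {1, 0}, {0, 0}, {0, 1} };
    Txn illegal[] = { {0, 1}, {1, 0}, {0, 0} };

    printf("interleaved across IDs: %s\n", ordering_ok(legal, 3)   ? "ok" : "violation");
    printf("reordered within an ID: %s\n", ordering_ok(illegal, 3) ? "ok" : "violation");
    return 0;
}
```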
V SIMULATION RESULTS
Figure 5.1 Cache Output
Figure 5.2 Cache waveforms
CONCLUSION
This paper presents a new energy-efficient cache
technique for high-performance microprocessors employing
the write-through policy. The proposed technique attaches a
tag to each way in the L2 cache. This way tag is sent to the
way-tag arrays in the L1 cache when the data is loaded from
the L2 cache to the L1 cache. Utilizing the way tags stored
in the way-tag arrays, the L2 cache can be accessed as a
direct-mapping cache during the subsequent write hits,
thereby reducing cache energy consumption.
In the proposed system, the AXI protocol is used for data transmission, and a FIFO architecture is developed. The main advantages of this modification are a reduction in power consumption of up to 30%, reduced execution time, and no data loss whether the cache is enabled or disabled.
REFERENCES
[1] G. Konstadinidis, K. Normoyle, S. Wong, S. Bhutani, H. Stuimer, T. Johnson, A. Smith, D. Cheung, F. Romano, S. Yu, S. Oh, V. Melamed, S. Narayanan, D. Bunsey, C. Khieu, K. J. Wu, R. Schmitt, A. Dumlao, M. Sutera, J. Chau, and K. J. Lin, "Implementation of a third-generation 1.1-GHz 64-bit microprocessor," IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1461–1469, Nov. 2002.
[2] S. Rusu, J. Stinson, S. Tam, J. Leung, H. Muljono, and B. Cherkauer, "A 1.5-GHz 130-nm Itanium 2 processor with 6-MB on-die L3 cache," IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1887–1895, Nov. 2003.
[3] D. Wendell, J. Lin, P. Kaushik, S. Seshadri, A. Wang, V. Sundararaman, P. Wang, H. McIntyre, S. Kim, W. Hsu, H. Park, G. Levinsky, J. Lu, M. Chirania, R. Heald, and P. Lazar, "A 4 MB on-chip L2 cache for a 90 nm 1.6 GHz 64 bit SPARC microprocessor," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, 2004, pp. 66–67.
[4] S. Segars, "Low power design techniques for microprocessors," in Proc. Int. Solid-State Circuits Conf. Tutorial, 2001, pp. 268–273.
[5] A. Malik, B. Moyer, and D. Cermak, "A low power unified cache architecture providing power and performance flexibility," in Proc. Int. Symp. Low Power Electron. Design, 2000, pp. 241–243.
[6] D. Brooks, V. Tiwari, and M. Martonosi, "Wattch: A framework for architectural-level power analysis and optimizations," in Proc. Int. Symp. Comput. Arch., 2000, pp. 83–94.
[7] J. Maiz, S. Hareland, K. Zhang, and P. Armstrong, "Characterization of multi-bit soft error events in advanced SRAMs," in Proc. Int. Electron Devices Meeting, 2003, pp. 21.4.1–21.4.4.
[8] K. Osada, K. Yamaguchi, and Y. Saitoh, "SRAM immunity to cosmic-ray-induced multierrors based on analysis of an induced parasitic bipolar effect," IEEE J. Solid-State Circuits, pp. 827–833, 2004.
[9] F. X. Ruckerbauer and G. Georgakos, "Soft error rates in 65 nm SRAMs: Analysis of new phenomena," in Proc. IEEE Int. On-Line Test. Symp., 2007, pp. 203–204.
[10] G. H. Asadi, V. Sridharan, M. B. Tahoori, and D. Kaeli, "Balancing performance and reliability in the memory hierarchy," in Proc. Int. Symp. Perform. Anal. Syst. Softw., 2005, pp. 269–279.
[11] L. Li, V. Degalahal, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, "Soft error and energy consumption interactions: A data cache perspective," in Proc. Int. Symp. Low Power Electron. Design, 2004, pp. 132–137.
[12] http://www.cs.ucr.edu/~junyang/teach/161_W06/slides/L15-cache2.pdf
[13] http://www.authorstream.com/Presentation/rajendra.raju1199306-cache-memory/
[14] http://news.softpedia.com/newsPDF/How-CacheMemory-Works-83803.pdf
[15] http://homepage.cs.uiowa.edu/~ghosh/4-8-08.pdf
[16] http://www.spec.org/cpu2000/research/umn/a
BIOGRAPHIES
N. Munikumar received his Bachelor's Degree in Electronics and Communication Engineering from Jawaharlal Nehru Technological University Anantapur in the year 2012. Currently he is pursuing his Master's Degree in VLSI and ES System Design at Jawaharlal Nehru Technological University Anantapur. His areas of interest are Low Power VLSI Design, Digital Circuit Design and VLSI Technology.
Kuppam N Chandra Sekhar received his Bachelor's Degree in Electronics & Communication Engineering from Jawaharlal Nehru Technological University Hyderabad in the year 2010 and his Master's Degree in VLSI System Design from Jawaharlal Nehru Technological University Hyderabad in the year 2013. He is presently working as an Assistant Professor in the ECE Department at GVIC, Madanapalle. His areas of research interest are Communication, VLSI System Design and Low Power VLSI Design.
R. Mallikarjuna Reddy received his Bachelor's Degree in Electronics & Communication Engineering from Jawaharlal Nehru Technological University Hyderabad in the year 2004 and his Master's Degree in VLSI System Design from Jawaharlal Nehru Technological University Hyderabad in the year 2008. He is presently working as an Assistant Professor in the ECE Department at GVIC, Madanapalle. His areas of research interest are Nano Electronics, VLSI System Design and Low Power VLSI Design.