Implementation of Low Power and Delay Efficient L2 Cache Design

N. Munikumar, Kuppam N Chandra Sekhar, R. Mallikarjuna Reddy

Abstract— The power consumption of a system is a crucial parameter in modern VLSI circuits, especially for low-power applications. Many high-performance microprocessors employ the cache write-through policy to improve performance while achieving good tolerance to soft errors in on-chip caches. However, the write-through policy also incurs a large energy overhead due to the increased accesses to caches at the lower level (e.g., L2 caches) during write operations. In this paper, we propose a new cache architecture, referred to as way-tagged cache, to improve the energy efficiency of write-through caches. By maintaining the way tags of the L2 cache in the L1 cache during read operations, the proposed technique enables the L2 cache to work in an equivalent direct-mapping manner during write hits, which account for the majority of L2 cache accesses. This leads to significant energy reduction without performance degradation. We also introduce the AXI protocol, a FIFO architecture, and the modified way-tagged cache. Simulation results of write-through data transactions over the AXI protocol demonstrate that the proposed technique achieves 70% energy savings and time savings of up to 30-40% in L2 caches. Similar results are also obtained under different L1 and L2 cache configurations. Furthermore, the idea of way tagging can be applied to existing low-power cache design techniques to further improve energy efficiency.

Index Terms— Cache, low power, write-through policy.

N. Munikumar, ECE Department, Jawaharlal Nehru Technology University, GVIC College, Madanapalle, Andhra Pradesh, INDIA. Mobile: 9000226722, e-mail: munikumar431@gmail.com
Kuppam N Chandra Sekhar, ECE Department, Jawaharlal Nehru Technology University, GVIC College, Madanapalle, Andhra Pradesh, INDIA. Mobile: 9885229501, e-mail: chandrasekhar501@gmail.com
R. Mallikarjuna Reddy, ECE Department, Jawaharlal Nehru Technology University, GVIC College, Madanapalle, Andhra Pradesh, INDIA. Mobile: 8096330172, e-mail: mallikarjunareddy.r416@gmail.com

I. INTRODUCTION

MULTI-LEVEL on-chip cache systems have been widely adopted in high-performance microprocessors. To keep data consistent throughout the memory hierarchy, write-through and write-back policies are commonly employed. Under the write-back policy, a modified cache block is copied back to its corresponding lower-level cache only when the block is about to be replaced. It has been reported that single-event multi-bit upsets are getting worse in on-chip memories. Currently, this problem has been addressed at different levels of the design abstraction. At the architecture level, an effective solution is to keep data consistent among different levels of the memory hierarchy to prevent the system from collapsing due to soft errors. While enabling better tolerance to soft errors, the write-through policy also incurs a large energy overhead, because under the write-through policy caches at the lower level experience more accesses during write operations. Consider a two-level (i.e., Level-1 and Level-2) cache system as an example. If the L1 data cache implements the write-back policy, a write hit in the L1 cache does not need to access the L2 cache. In contrast, if the L1 cache is write-through, then both the L1 and L2 caches need to be accessed for every write operation.
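An illustrative sketch (not from the paper) of the point above: under write-through every L1 write hit also accesses the L2 cache, while under write-back the L2 cache is touched only when a dirty L1 block is evicted. The workload figures and function name are hypothetical.

```python
def l2_accesses_from_writes(policy, l1_write_hits, dirty_evictions=0):
    """Return the number of L2 accesses caused by L1 write activity."""
    if policy == "write-through":
        return l1_write_hits          # every write hit propagates to L2
    if policy == "write-back":
        return dirty_evictions        # only evicted dirty blocks reach L2
    raise ValueError("unknown policy")

if __name__ == "__main__":
    hits, dirty = 1_000_000, 40_000   # hypothetical counts for one program run
    print("write-through:", l2_accesses_from_writes("write-through", hits))
    print("write-back:   ", l2_accesses_from_writes("write-back", hits, dirty))
```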
Obviously, the write-through policy incurs more write accesses in the L2 cache, which in turn increases the energy consumption of the cache system. Power dissipation is now considered one of the critical issues in cache design. Studies have shown that on-chip caches can consume about 50% of the total power in high-performance microprocessors.

In this paper, we propose a new cache architecture, referred to as way-tagged cache, to improve the energy efficiency of write-through cache systems with minimal area overhead and no performance degradation. Consider a two-level cache hierarchy, where the L1 data cache is write-through and the L2 cache is inclusive for high performance. It is observed that all the data residing in the L1 cache will have copies in the L2 cache. In addition, the locations of these copies in the L2 cache will not change until they are evicted from the L2 cache. Thus, we can attach a tag to each way in the L2 cache and send this tag information to the L1 cache when the data is loaded into the L1 cache. By doing so, for all the data in the L1 cache, we know exactly the locations of their copies in the L2 cache. During subsequent accesses, when there is a write hit in the L1 cache (which also initiates a write access to the L2 cache under the write-through policy), we can access the L2 cache in an equivalent direct-mapping manner because the way tag of the data copy in the L2 cache is available. As this operation accounts for the majority of L2 cache accesses in most applications, the energy consumption of the L2 cache can be reduced significantly.

The basic idea of the way-tagged cache was initially proposed in our past work with some preliminary results. In this paper, we extend this work by making the following contributions. First, a detailed VLSI architecture of the proposed way-tagged cache is developed, where various design issues regarding timing, control logic, operating mechanisms, and area overhead have been studied. Second, we demonstrate that the idea of way tagging can be extended to many existing low-power cache design techniques so that better tradeoffs between performance and energy efficiency can be achieved. Third, a detailed energy model is developed to quantify the effectiveness of the proposed technique. Finally, a comprehensive suite of simulations is performed, with new results covering the effectiveness of the proposed technique under different cache configurations. It is also shown that the proposed technique can be integrated with existing low-power cache design techniques to further improve energy efficiency.

II. LITERATURE SURVEY

A. CACHE

Cache is the name given to the highest or first level of the memory hierarchy encountered once the address leaves the processor. A cache is a high-speed access area that can be either a reserved section of main memory or a separate storage device. In practice it is a high-speed SRAM memory used to bridge the speed gap between the processor and slower Lower Level Memories (LLM). Accessing LLM costs longer access time and larger power consumption; by reducing the number of accesses to LLM, the cache decreases the access time and improves the overall dynamic power consumption. When the processor finds a requested data item in the cache, it is called a cache hit. When the processor does not find a data item it needs in the cache, a cache miss occurs. The fixed-size collection of data containing the requested word is called a block.
The two main cache types are memory cache and disk cache.

Memory Cache: A portion of high-speed static RAM (SRAM); it is effective because most programs access the same data or instructions over and over. Most computers today come with L3 or L2 cache, while older computers included only L1 cache.

Disk Cache: A mechanism for improving the time it takes to read from or write to a hard disk. The disk cache is usually included as part of the hard disk, although it can also use a portion of conventional main memory (the 640 KB base memory on older PCs). The disk cache holds data that has recently been read and, in some cases, adjacent data areas that are likely to be accessed next. Write caching is also provided with some disk caches. Disk caching can dramatically improve the performance of applications, because accessing a byte of data in RAM can be thousands of times faster than accessing a byte on a hard disk.

The purpose of cache memory is to speed up data accesses for the processor by storing information in faster memory made of SRAM. SRAM access time is 3 ns to 10 ns, whereas DRAM access time is 30 ns to 90 ns. A cache is a small area of fast RAM and associated control logic placed between the processor and main memory. The RAM is typically organized as a number of cache lines, each of which comprises a number of sub-words used to store contiguous addresses from main memory. Since the cache is smaller than main memory, it stores a subset of the memory content. Each cache line is composed of sub-words that are each one byte (8 bits) in size, and the memory system is byte addressed.

The idea of cache memories is similar to virtual memory in that some active portion of a low-speed memory is stored in duplicate in a higher-speed cache memory. When a memory request is generated, the request is first presented to the cache memory, and if the cache cannot respond, the request is then presented to main memory. The time required for a cache miss depends on both the latency and the bandwidth of the memory: latency determines the time to retrieve the first word of the block, and bandwidth determines the time to retrieve the rest of the block (a short miss-penalty sketch appears at the end of this subsection). A cache miss is handled by hardware and causes processors using in-order execution to pause, or stall, until the data are available. The data stored in the cache is data that the processor is likely to use in the very near future. SRAM is fast but small, so the cache stores only a subset of the data held in main memory.

L1 Cache: Also referred to as primary cache, internal cache, or system cache. When referring to computer processors, the L1 cache is built into the processor and is the fastest and most expensive cache in the computer. The L1 cache stores the most critical data to be executed and is the first place the processor looks when performing an instruction.

L2 Cache: Also referred to as secondary or external cache. The L2 cache was first introduced with the Intel Pentium and Pentium Pro computers and has been included with every processor since. This cache is not as fast as the L1 cache, but is only slightly slower since it is still located on the same processor chip, and it is still faster than the computer's main memory.

L3 Cache: The Intel Core i7-3960X processor die is an example of a processor chip containing six cores and a shared L3 cache. The L3 cache is shared between all cores and is very large in comparison to the L1 or L2 cache on the same chip, because it is cheaper, although slower.
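A small sketch of the miss-penalty relationship noted above: latency fetches the first word, and bandwidth governs the time to stream the rest of the block. The numbers are illustrative assumptions, not figures from this paper.

```python
def miss_penalty_ns(latency_ns, block_bytes, word_bytes, bw_bytes_per_ns):
    """Time to service a miss = first-word latency + remaining bytes / bandwidth."""
    remaining = block_bytes - word_bytes
    return latency_ns + remaining / bw_bytes_per_ns

if __name__ == "__main__":
    # Hypothetical DRAM: 60 ns to the first 8-byte word, then 8 bytes per ns.
    print(miss_penalty_ns(latency_ns=60, block_bytes=64, word_bytes=8, bw_bytes_per_ns=8))  # 67.0 ns
```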
Figure 2.1 L3 Cache (Intel Core i7 die: cores, shared L3 cache, queue/un-core and I/O, memory controller)

Basic Terminology: Memory is divided into blocks. Each block contains a fixed number of words, where a word is the size of data stored in one location (e.g., 8 bits, 16 bits). One block is used as the minimum unit of transfer between main memory and the cache; hence, each location in the cache stores one block.

Figure 2.2 Main memory

A processor might have 512 KB of cache and 512 MB of RAM, so there may be 1000 times more RAM than cache. The cache algorithms have to carefully select the 0.1% of the memory that is likely to be most accessed.

CACHE MAPPING SCHEME: If the CPU generates an address for a particular word in main memory, and that data happens to be in the cache, the same main memory address cannot be used to access the cache. Hence a mapping scheme is required that converts the generated main memory address into a cache location. The mapping scheme also determines where a block will be placed when it is originally copied into the cache.

MAPPING SCHEME 1: DIRECT MAPPED CACHE: The cache consists of normal high-speed random access memory, and each location in the cache holds the data at an address in the cache given by the lower significant bits of the main memory address. This enables the block to be selected directly from the lower significant bits of the memory address. The remaining higher significant bits of the address are stored in the cache with the data to complete the identification of the cached data.

Figure 2.3 Cache with Direct mapping

The address from the processor is divided into two fields, a tag and an index. The tag consists of the higher significant bits of the address, which are stored with the data. The index is the lower significant bits of the address used to address the cache. When the memory is referenced, the index is first used to access a word in the cache. Then the tag stored in the accessed word is read and compared with the tag in the address. If the two tags are the same, indicating that the word is the one required, access is made to the addressed cache word. However, if the tags are not the same, indicating that the required word is not in the cache, reference is made to main memory to find it. For a memory read operation, the word is then transferred into the cache where it is accessed. It is possible to pass the information to the cache and the processor simultaneously, i.e., to read through the cache on a miss. The cache location is altered for a write operation; the main memory may be altered at the same time (write-through) or later. The main memory address is composed of a tag, an index, and a word within a line. All the words within a line in the cache have the same stored tag. The index part of the address is used to access the cache, and the stored tag is compared with the required tag address. For a read operation, if the tags are the same, the word within the block is selected for transfer to the processor. If the tags are not the same, the block containing the required word is first transferred to the cache.
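A small sketch of the tag/index split described above for a direct-mapped cache. The concrete geometry (byte-addressed memory, 64-byte lines, 256 lines) is an assumption chosen for illustration, not a configuration from the paper.

```python
LINE_BYTES = 64          # assumed bytes per cache line
NUM_LINES = 256          # assumed number of lines in the cache

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 6
INDEX_BITS = NUM_LINES.bit_length() - 1     # 8

def split_address(addr):
    """Return (tag, index, offset) for a byte address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def lookup(cache, addr):
    """cache maps index -> stored tag; a hit requires matching tags, as in Fig. 2.3."""
    tag, index, _ = split_address(addr)
    return cache.get(index) == tag

if __name__ == "__main__":
    cache = {}
    tag, index, _ = split_address(0x12345678)
    cache[index] = tag                      # fill the line
    print(lookup(cache, 0x12345678))        # True: same tag and index -> hit
    print(lookup(cache, 0x92345678))        # False: same index, different tag -> miss
```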
CACHE OPERATION: Most of the time the cache is busy filling cache lines (reading from memory), but the processor does not write a whole cache line, which can be up to 128 bytes; it writes only between 1 and 8 bytes, so it must perform a read-modify-write sequence on the cache line. The cache also uses one of two write policies: write-through, where data is updated in both the cache and main memory; and write-back, where data is written to the cache and updated in main memory only when the cache line is replaced.

A read proceeds as follows: the CPU requests the contents of a memory location, and the cache is checked for this data. If present, the data is delivered from the cache (fast); if not present, the required block is read from main memory into the cache and then delivered from the cache to the CPU. The cache includes tags to identify which block of main memory is held in each cache slot. If the memory location is found in the cache, the access is a read hit; if not, the location is searched for in main memory, the block containing it is transferred into the cache, and it is then read by the CPU (a miss).

Cache access time (c): the time between the requested address arriving and the requested data being placed on the bus. Main memory access time (m): the time it takes to transfer data from main memory. Hit ratio hr (0 ≤ hr ≤ 1): hr = cache hits / (cache hits + cache misses), i.e., hits over the total requests made by the CPU. Miss ratio: mr = 1 − hr. Mean (average) access time: t_avg = c + (1 − hr)·m; if hr → 1 then t_avg = c, and if hr → 0 then t_avg = c + m. Cache efficiency (%) = (c / t_avg) × 100. A short worked sketch of these definitions follows below.

Cache Read Operation Data Flow:

Figure 2.4 Cache Read Operation Data Flow
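A worked sketch of the timing definitions above. The numeric values (5 ns cache, 60 ns main memory) are illustrative assumptions, not measurements from this design.

```python
def mean_access_time(c, m, hr):
    """t_avg = c + (1 - hr) * m, following the formula above."""
    return c + (1.0 - hr) * m

def cache_efficiency(c, m, hr):
    """Efficiency (%) = cache access time * 100 / mean access time."""
    return 100.0 * c / mean_access_time(c, m, hr)

if __name__ == "__main__":
    c, m = 5.0, 60.0   # ns (hypothetical SRAM and DRAM access times)
    for hr in (0.0, 0.90, 0.99, 1.0):
        print(f"hr={hr:.2f}  t_avg={mean_access_time(c, m, hr):5.1f} ns"
              f"  efficiency={cache_efficiency(c, m, hr):5.1f}%")
```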
III. EXISTING TECHNOLOGY

A. SPEC2000 Benchmark

SPEC: SPEC is an acronym for the Standard Performance Evaluation Corporation. SPEC is a non-profit organization composed of computer vendors, systems integrators, universities, research organizations, publishers, and consultants whose goal is to establish, maintain, and endorse a standardized set of relevant benchmarks for computer systems. Although no one set of tests can fully characterize overall system performance, SPEC believes that the user community will benefit from an objective series of tests which can serve as a common reference point.

BENCHMARK, A STANDARD OF MEASUREMENT: Computer benchmark metrics usually measure speed (how fast the workload was completed) or throughput (how many workloads per unit time were completed). Running the same computer benchmark on multiple computers allows a comparison to be made. We evaluate our results using benchmarks from SPEC CPU2000. The benchmarks are compiled for the Alpha instruction set using the Compaq Alpha compiler with SPEC peak settings. For each program, we follow the recommended simulation procedure but skip a minimum of 1 billion instructions.

USE A BENCHMARK: Ideally, the best comparison test for systems would be your own application with your own workload. Unfortunately, it is often very difficult to obtain a wide base of reliable, repeatable, and comparable measurements for comparisons of different systems on your own application with your own workload. This might be due to time, money, confidentiality, or other constraints.

OPTIONS AVAILABLE IN BENCHMARK: At this point, you can consider using standardized benchmarks as a reference point. Ideally, a standardized benchmark will be portable and may already have been run on the platforms that you are interested in.

SPEC CPU2000 MEASURE: SPEC CPU2000 focuses on compute-intensive performance, which means these benchmarks emphasize the performance of the computer's processor (CPU), memory architecture, and compilers. It is important to remember the contribution of the latter two components; performance is more than just the processor. SPEC CPU2000 is made up of two subcomponents that focus on two different types of compute-intensive performance: CINT2000 for measuring and comparing compute-intensive integer performance, and CFP2000 for measuring and comparing compute-intensive floating-point performance.

B. WAY-TAGGED CACHE

In this section, we propose a way-tagged cache that exploits the way information in the L2 cache to improve energy efficiency. We consider a conventional set-associative cache system: when the L1 data cache loads/writes data from/into the L2 cache, all ways in the L2 cache are activated simultaneously for performance consideration, at the cost of energy overhead. In Section V, we extend this technique to L2 caches with phased tag-data accesses. Fig. 1 illustrates the architecture of the two-level cache. Only the L1 data cache and the L2 unified cache are shown, as the L1 instruction cache only reads from the L2 cache. Under the write-through policy, the L2 cache always maintains the most recent copy of the data. Thus, whenever a data item is updated in the L1 cache, the L2 cache is updated with the same data as well. This results in an increase in the write accesses to the L2 cache and consequently more energy consumption.

Here we examine some important properties of write-through caches through statistical characterization of cache accesses. Fig. 2 shows the simulation results of L2 cache accesses based on the SPEC CPU2000 benchmarks [30]. These results are obtained from SimpleScalar for the cache configuration given in Section VII-A. Unlike the L1 cache, where read operations account for a large portion of total memory accesses, write operations are dominant in the L2 cache for all but three benchmarks (galgel, ammp, and art). This is because read accesses in the L2 cache are initiated by the read misses in the L1 cache, which typically occur much less frequently (the miss rate is less than 5% on average [27]). For galgel, ammp, and art, the L1 read miss rates are high, resulting in more read accesses than write accesses. Nevertheless, write accesses still account for about 20%–40% of the total accesses in the L2 cache. From the results in Section VII, each L2 read or write access consumes roughly the same amount of energy on average. Thus, reducing the energy consumption of L2 write accesses is an effective way for memory power management. As explained in the introduction, the locations (i.e., way tags) of L1 data copies in the L2 cache will not change until the data are evicted from the L2 cache.

The proposed way-tagged cache exploits this fact to reduce the number of ways accessed during L2 cache accesses. When the L1 data cache loads data from the L2 cache, the way tag of the data in the L2 cache is also sent to the L1 cache and stored in a new set of way-tag arrays. These way tags provide the key information for the subsequent write accesses to the L2 cache. In general, both write and read accesses in the L1 cache may need to access the L2 cache, and these accesses lead to different operations in the proposed way-tagged cache. Under the write-through policy, all write operations of the L1 cache need to access the L2 cache. In the case of a write hit in the L1 cache, only one way in the L2 cache will be activated, because the way-tag information of the L2 cache is available; i.e., from the way-tag arrays we can obtain the L2 way of the accessed data. For a write miss in the L1 cache, the requested data is not stored in the L1 cache, so its corresponding L2 way information is not available in the way-tag arrays; therefore, all ways in the L2 cache need to be activated simultaneously. Since a write hit/miss is not known a priori, the way-tag arrays need to be accessed simultaneously with all L1 write operations in order to avoid performance degradation.
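A behavioral sketch of the access cases just described, not the paper's RTL; the class and variable names (WayTaggedL1, way_tags, L2_WAYS) are assumptions for illustration. The point being modeled is that an L1 write hit knows the L2 way of its copy and therefore activates exactly one L2 way, while write misses (and read misses) must still activate all ways.

```python
L2_WAYS = 4  # assumed associativity of the L2 cache

class WayTaggedL1:
    def __init__(self):
        self.lines = {}      # L1 index -> tag currently cached
        self.way_tags = {}   # L1 index -> L2 way holding the copy (way-tag array)
        self.l2_ways_activated = 0

    def fill(self, index, tag, l2_way):
        """On an L1 fill from L2, record which L2 way supplied the line."""
        self.lines[index] = tag
        self.way_tags[index] = l2_way

    def write(self, index, tag):
        """Write-through: every L1 write is also sent to the L2 cache."""
        if self.lines.get(index) == tag:
            ways = [self.way_tags[index]]    # write hit: only the tagged way
        else:
            ways = list(range(L2_WAYS))      # write miss: way unknown, all ways
        self.l2_ways_activated += len(ways)
        return ways

if __name__ == "__main__":
    l1 = WayTaggedL1()
    l1.fill(index=3, tag=0x1A, l2_way=2)
    l1.write(index=3, tag=0x1A)   # hit  -> 1 way
    l1.write(index=7, tag=0x2B)   # miss -> 4 ways
    print("L2 ways activated:", l1.l2_ways_activated)   # 5
```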
Note that the way-tag arrays are very small, and the involved energy overhead can be easily compensated for. For L1 read operations, neither read hits nor misses need to access the way-tag arrays. This is because read hits do not need to access the L2 cache, while for read misses the corresponding way-tag information is not available in the way-tag arrays. As a result, all ways in the L2 cache are activated simultaneously under read misses.

Figure 3.1 L2 Cache
Figure 3.2 Read Write graph

As the read/write statistics show (Fig. 3.2), write accesses account for the majority of L2 cache accesses in most applications. In addition, write hits are dominant among all write operations. Therefore, by activating fewer ways in most of the L2 write accesses, the proposed way-tagged cache is very effective in reducing memory energy consumption.

Fig. 3 shows the system diagram of the proposed way-tagged cache. We introduce several new components: way-tag arrays, way-tag buffer, way decoder, and way register, all shown in the dotted line. The way tags of each cache line in the L2 cache are maintained in the way-tag arrays, located with the L1 data cache. Note that write buffers are commonly employed in write-through caches (and even in many write-back caches) to improve performance. With a write buffer, the data to be written into the L1 cache is also sent to the write buffer. The operations stored in the write buffer are then sent to the L2 cache in sequence. This avoids write stalls when the processor waits for write operations to be completed in the L2 cache. In the proposed technique, we also need to send the way tags stored in the way-tag arrays to the L2 cache along with the operations in the write buffer. Thus, a small way-tag buffer is introduced to buffer the way tags read from the way-tag arrays. A way decoder is employed to decode way tags and generate the enable signals for the L2 cache, which activate only the desired ways in the L2 cache. Each way in the L2 cache is encoded into a way tag, and a way register stores these way tags and provides this information to the way-tag arrays.

Figure 3.3 Way tag arrays
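A minimal sketch (assumed, not the paper's circuit) of what the way decoder does: turn a binary way tag into one-hot enable signals for the L2 ways, or enable every way when no tag is available (write misses and read misses). A deque stands in for the small way-tag buffer that travels alongside the write buffer.

```python
from collections import deque

L2_WAYS = 4  # assumed associativity

def way_decoder(way_tag):
    """Return a one-hot enable list; None means 'enable all ways'."""
    if way_tag is None:
        return [1] * L2_WAYS
    return [1 if w == way_tag else 0 for w in range(L2_WAYS)]

if __name__ == "__main__":
    way_tag_buffer = deque([2, None, 0])   # way tags queued with buffered writes
    while way_tag_buffer:
        print(way_decoder(way_tag_buffer.popleft()))
    # [0, 0, 1, 0], [1, 1, 1, 1], [1, 0, 0, 0]
```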
IV. PROPOSED TECHNOLOGY

A. AXI PROTOCOL

AXI: AXI is part of ARM AMBA, a family of microcontroller buses first introduced in 1996. The first version of AXI was included in AMBA 3.0, released in 2003. AMBA 4.0, released in 2010, includes the second version of AXI, AXI4. The AMBA AXI protocol is targeted at high-performance, high-frequency system designs and includes a number of features that make it suitable for a high-speed submicron interconnect.

THE OBJECTIVES OF THE LATEST GENERATION AMBA INTERFACE ARE TO: enable high-frequency operation without using complex bridges; be suitable for high-bandwidth and low-latency designs; meet the interface requirements of a wide range of components; be suitable for memory controllers with high initial access latency; provide flexibility in the implementation of interconnect architectures; and be backward compatible with existing AHB and APB interfaces.

THE KEY FEATURES: separate address/control and data phases; support for unaligned data transfers using byte strobes; burst-based transactions with only the start address issued; separate read and write data channels to enable low-cost Direct Memory Access (DMA); ability to issue multiple outstanding addresses; out-of-order transaction completion; and easy addition of register stages to provide timing closure.

As well as the data transfer protocol, the AXI protocol includes optional extensions that cover signaling for low-power operation. There are three types of AXI4 interfaces: AXI4, for high-performance memory-mapped requirements; AXI4-Lite, for simple, low-throughput memory-mapped communication (for example, to and from control and status registers); and AXI4-Stream, for high-speed streaming data.

AXI WORK: The AXI specifications describe an interface between a single AXI master and a single AXI slave, representing IP cores that exchange information with each other. Memory-mapped AXI masters and slaves can be connected together using a structure called an Interconnect block. The Xilinx AXI Interconnect IP contains AXI-compliant master and slave interfaces, and can be used to route transactions between one or more AXI masters and slaves.

READ AND WRITE ADDRESS CHANNELS: Read and write transactions each have their own address channel. The appropriate address channel carries all of the required address and control information for a transaction.

READ DATA CHANNEL: The read data channel conveys both the read data and any read response information from the slave back to the master. The read data channel includes the data bus, which can be 8, 16, 32, 64, 128, 256, 512, or 1024 bits wide, and a read response indicating the completion status of the read transaction.

Figure 4.1 Channel Architecture of Reads

WRITE DATA CHANNEL: The write data channel conveys the write data from the master to the slave and includes the data bus, which can be 8, 16, 32, 64, 128, 256, 512, or 1024 bits wide, and one byte lane strobe for every eight data bits, indicating which bytes of the data bus are valid. Write data channel information is always treated as buffered, so that the master can perform write transactions without slave acknowledgement of previous write transactions.

WRITE RESPONSE CHANNEL: The write response channel provides a way for the slave to respond to write transactions. All write transactions use completion signaling. The completion signal occurs once for each burst, not for each individual data transfer within the burst.

Figure 4.2 Channel Architecture of Writes

AXI4-STREAM PROTOCOL: The AXI4-Stream protocol is used for applications that typically focus on a data-centric and data-flow paradigm where the concept of an address is not present or not required. Each AXI4-Stream acts as a single unidirectional channel for a handshake data flow. At this lower level of operation (compared to the memory-mapped AXI protocol types), the mechanism to move data between IP is defined and efficient, but there is no unifying address context between IP. AXI4-Stream IP can be better optimized for performance in data-flow applications, but it also tends to be more specialized around a given application space. The AXI4-Stream protocol defines a single channel for transmission of streaming data. The AXI4-Stream channel is modeled after the Write Data channel of AXI4. Unlike AXI4, AXI4-Stream interfaces can burst an unlimited amount of data. There are additional, optional capabilities described in the AXI4-Stream Protocol Specification. The specification describes how AXI4-Stream-compliant interfaces can be split, merged, interleaved, upsized, and downsized. Unlike AXI4, AXI4-Stream transfers cannot be reordered. Note that even if two pieces of IP are designed in accordance with the AXI4-Stream specification and are compatible at the signaling level, this does not guarantee that the two components will function correctly together, due to higher-level system considerations.
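A toy behavioral model, not AXI signal-level code: it only mirrors the shape of the read burst walked through below (one address, several data beats, the last beat flagged), so the channel roles are easier to follow. The function name, memory layout, and beat size are assumptions for illustration.

```python
def read_burst(memory, start_addr, beats, beat_bytes=4):
    """Yield (data, last) pairs for a burst of `beats` transfers from `start_addr`."""
    for i in range(beats):
        addr = start_addr + i * beat_bytes
        data = memory.get(addr, 0)
        yield data, (i == beats - 1)   # `last` plays the role of RLAST on the read data channel

if __name__ == "__main__":
    mem = {0x100: 0xA, 0x104: 0xB, 0x108: 0xC, 0x10C: 0xD}
    for data, last in read_burst(mem, start_addr=0x100, beats=4):
        print(hex(data), "(RLAST)" if last else "")
```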
READ BURST EXAMPLE: Consider a read burst of four transfers. In this example, the master drives the address, and the slave accepts it one cycle later. After the address appears on the address bus, the data transfer occurs on the read data channel. The slave keeps the VALID signal LOW until the read data is available. For the final data transfer of the burst, the slave asserts the RLAST signal to show that the last data item is being transferred. The master also drives a set of control signals showing the length and type of the burst, but these signals are omitted from the figure for clarity.

Figure 4.3 Read burst

WRITE BURST EXAMPLE: The process starts when the master sends an address and control information on the write address channel. The master then sends each item of write data over the write data channel. When the master sends the last data item, the WLAST signal goes HIGH.

Figure 4.4 Write burst

TRANSACTION ORDERING: The AXI protocol enables out-of-order transaction completion. It gives an ID tag to every transaction across the interface. The protocol requires that transactions with the same ID tag are completed in order, but transactions with different ID tags can be completed out of order. If a master requires that transactions are completed in the same order that they are issued, then they must all have the same ID tag. If, however, a master does not require in-order transaction completion, it can supply the transactions with different ID tags, enabling them to be completed in any order. In a multi-master system, the interconnect is responsible for appending additional information to the ID tag to ensure that ID tags from all masters are unique. The ID tag is similar to a master number, but with the extension that each master can implement multiple virtual masters within the same port by supplying an ID tag to indicate the virtual master number. Although complex devices can make use of the out-of-order facility, simple devices are not required to use it. Simple masters can issue every transaction with the same ID tag, and simple slaves can respond to every transaction in order, irrespective of the ID tag.

V. SIMULATION RESULTS

Figure 5.1 Cache Output
Figure 5.2 Cache waveforms

CONCLUSION

This paper presents a new energy-efficient cache technique for high-performance microprocessors employing the write-through policy. The proposed technique attaches a tag to each way in the L2 cache. This way tag is sent to the way-tag arrays in the L1 cache when the data is loaded from the L2 cache into the L1 cache. Utilizing the way tags stored in the way-tag arrays, the L2 cache can be accessed as a direct-mapping cache during the subsequent write hits, thereby reducing cache energy consumption. The proposed system uses the AXI protocol for data transmission, and a FIFO architecture has been developed. The main advantages of this modification are a reduction in power consumption of up to 30%, a reduction in execution time, and no data loss whether the cache is enabled or disabled.

REFERENCES
[1] G. Konstadinidis, K. Normoyle, S. Wong, S. Bhutani, H. Stuimer, T. Johnson, A. Smith, D. Cheung, F. Romano, S. Yu, S. Oh, V. Melamed, S. Narayanan, D. Bunsey, C. Khieu, K. J. Wu, R. Schmitt, A. Dumlao, M. Sutera, J. Chau, and K. J. Lin, "Implementation of a third-generation 1.1-GHz 64-bit microprocessor," IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1461–1469, Nov. 2002.
[2] S. Rusu, J. Stinson, S. Tam, J. Leung, H. Muljono, and B. Cherkauer, "A 1.5-GHz 130-nm Itanium 2 processor with 6-MB on-die L3 cache," IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1887–1895, Nov. 2003.
[3] D. Wendell, J. Lin, P. Kaushik, S. Seshadri, A. Wang, V. Sundararaman, P. Wang, H. McIntyre, S. Kim, W. Hsu, H. Park, G. Levinsky, J. Lu, M. Chirania, R. Heald, and P. Lazar, "A 4 MB on-chip L2 cache for a 90 nm 1.6 GHz 64 bit SPARC microprocessor," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, 2004, pp. 66–67.
[4] S. Segars, "Low power design techniques for microprocessors," in Proc. Int. Solid-State Circuits Conf. Tutorial, 2001, pp. 268–273.
[5] A. Malik, B. Moyer, and D. Cermak, "A low power unified cache architecture providing power and performance flexibility," in Proc. Int. Symp. Low Power Electron. Design, 2000, pp. 241–243.
[6] D. Brooks, V. Tiwari, and M. Martonosi, "Wattch: A framework for architectural-level power analysis and optimizations," in Proc. Int. Symp. Comput. Arch., 2000, pp. 83–94.
[7] J. Maiz, S. Hareland, K. Zhang, and P. Armstrong, "Characterization of multi-bit soft error events in advanced SRAMs," in Proc. Int. Electron Devices Meeting, 2003, pp. 21.4.1–21.4.4.
[8] K. Osada, K. Yamaguchi, and Y. Saitoh, "SRAM immunity to cosmic-ray-induced multierrors based on analysis of an induced parasitic bipolar effect," IEEE J. Solid-State Circuits, pp. 827–833, 2004.
[9] F. X. Ruckerbauer and G. Georgakos, "Soft error rates in 65 nm SRAMs: Analysis of new phenomena," in Proc. IEEE Int. On-Line Test. Symp., 2007, pp. 203–204.
[10] G. H. Asadi, V. Sridharan, M. B. Tahoori, and D. Kaeli, "Balancing performance and reliability in the memory hierarchy," in Proc. Int. Symp. Perform. Anal. Syst. Softw., 2005, pp. 269–279.
[11] L. Li, V. Degalahal, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, "Soft error and energy consumption interactions: A data cache perspective," in Proc. Int. Symp. Low Power Electron. Design, 2004, pp. 132–137.
[12] http://www.cs.ucr.edu/~junyang/teach/161_W06/slides/L15-cache2.pdf
[13] http://www.authorstream.com/Presentation/rajendra.raju1199306-cache-memory/
[14] http://news.softpedia.com/newsPDF/How-CacheMemory-Works-83803.pdf
[15] http://homepage.cs.uiowa.edu/~ghosh/4-8-08.pdf
[16] http://www.spec.org/cpu2000/research/umn/a

BIOGRAPHIES

N. Munikumar received his Bachelor's Degree in Electronics and Communication Engineering from Jawaharlal Nehru Technological University Anantapur in 2012. Currently he is pursuing his Master's Degree in VLSI and Embedded System Design at Jawaharlal Nehru Technological University Anantapur. His areas of interest are Low Power VLSI Design, Digital Circuit Design, and VLSI Technology.

Kuppam N Chandra Sekhar received his Bachelor's Degree in Electronics & Communication Engineering from Jawaharlal Nehru Technological University Hyderabad in 2010 and his Master's Degree in VLSI System Design from Jawaharlal Nehru Technological University Hyderabad in 2013. He is presently working as an Assistant Professor in the ECE Department at GVIC, Madanapalle. His areas of research interest are Communication, VLSI System Design, and Low Power VLSI Design.
R. Mallikarjuna Reddy received his Bachelor's Degree in Electronics & Communication Engineering from Jawaharlal Nehru Technological University Hyderabad in 2004 and his Master's Degree in VLSI System Design from Jawaharlal Nehru Technological University Hyderabad in 2008. He is presently working as an Assistant Professor in the ECE Department at GVIC, Madanapalle. His areas of research interest are Nano Electronics, VLSI System Design, and Low Power VLSI Design.