Assignment 1: Review of two research papers (done individually)
Read the papers carefully and write a personally written review of each paper based on the issues below (about one A4 page of normal-sized text per paper, 300 - 700 words). The analysis must be expressed in your own words.

PAPER 1
Title: Worst-Case Execution Time Analysis for Dynamic Branch Predictors
Authors: Iain Bate and Ralf Reutemann, Department of Computer Science, University of York, York, United Kingdom. E-mail: {ijb,ralf}@cs.york.ac.uk

Is the paper well organized?
The paper is well organized. The abstract clearly states that branch prediction is commonly used in modern microprocessors and that it causes predictability problems for timing analysis. The introduction is also clear, briefly describing worst-case execution time (WCET) analysis and how branch prediction occurs. The paper then describes the branch prediction techniques in detail and explains how they work. The facts and figures are understandable and clear, and the mathematical notation and algorithms show the efficiency of each prediction scheme. The conclusion covers the related work and comparisons, and tells which prediction algorithm is suitable under which conditions. The references, drawn from different scientific papers, books and seminars, are well presented.

Comment on the following sections (if present):

Title
As far as the title of this scientific paper is concerned, the paper presents a general comparison of branch prediction methods, not only dynamic branch prediction. A more accurate title would therefore be "Worst-Case Execution Time Analysis for Branch Predictors".

Abstract
The abstract is well written. It states that the main problem with branch predictors is their predictability, and that the authors will compare bimodal and global-history branch prediction. They also mention that whether a branch is easy or hard to predict depends on the semantic context of the branch in the source code.
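The bimodal scheme that the abstract compares against global-history prediction keeps a small table of 2-bit saturating counters. A minimal sketch of the general idea follows; the table size, branch address and outcome trace here are made up for illustration, and this is not the exact model analysed by Bate and Reutemann.

```python
# Sketch of a bimodal branch predictor: one 2-bit saturating counter
# per table entry, indexed by the branch address. Illustrative only;
# table size and addresses are assumptions, not taken from the paper.

class BimodalPredictor:
    def __init__(self, entries=16):
        self.entries = entries
        self.table = [1] * entries       # start "weakly not taken" (0..3)

    def predict(self, addr):
        # Counter values 2 and 3 mean "predict taken".
        return self.table[addr % self.entries] >= 2

    def update(self, addr, taken):
        i = addr % self.entries
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

# A loop-closing branch (taken nine times, then not taken) is
# mispredicted only on the first iteration and on loop exit.
p = BimodalPredictor()
correct = 0
for taken in [True] * 9 + [False]:      # one run of a 10-iteration loop
    if p.predict(0x40) == taken:
        correct += 1
    p.update(0x40, taken)
# correct is 8 of 10 for this trace
```

The saturating counter is exactly what makes WCET analysis of such predictors hard: the prediction for a branch depends on the execution history, not just on the program text.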
Introduction
The introduction is well written and clear. It describes the main requirement that branch prediction threatens, namely that all tasks must be able to meet their respective deadlines. It notes that many commercially available microprocessors, which are not designed for real-time systems, include branch predictors that complicate schedulability analysis. It also describes the procedure a branch predictor follows.

Main section(s)
The main sections cover related work, comparisons, mathematical analysis, techniques and statistics for all the algorithms and the different types of branch prediction. They also describe the statistical behaviour of the different branch prediction techniques.

Summary
The paper concludes that static WCET analysis of dynamic branch predictors is feasible, and tells which branch predictors are better under which conditions for achieving good performance.

Conclusions
The paper lays the foundation for static analysis of branch predictors. It also refers to other authors' work in which models of different branch prediction techniques are described.

References
The references are helpful and informative for finding other papers and their ideas. There are also references to workshops and books, which are useful for finding more related topics and perspectives.

Comment on the language used in the paper.
The language is fine and clear to understand; especially the facts and figures are informative and easy to follow.

General comments on the paper.
Overall, the paper is informative and each section links well to the next. The language is technical and mathematical, but since this is a scientific paper it should have that technicality. The paper is intended for readers who already have sufficient knowledge of the field.

PAPER 2
Title: Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers
Author: Norman P. Jouppi, Digital Equipment Corporation, Western Research Lab, 100 Hamilton Ave., Palo Alto, CA 94301

• Is the paper well organized?
Having studied the whole paper, I find it well organized, although some descriptions and explanations could be more detailed. The work lives up to its title, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers", and the topic is interesting.

• Title
The title of this IEEE paper is good because it explains the whole phenomenon of the paper: when a person reads the title, he can understand the idea, the purpose and the technology behind it. It could perhaps be extended with a little more information, e.g. "Improving Direct-Mapped Cache Performance Controlled by the Addition of a Small Fully-Associative Cache and Prefetch Buffers".

• Abstract
In the abstract the author introduces miss caching, victim caching, stream buffers and multi-way stream buffers. He also describes how stream buffers fit into a pipelined memory system, and explains how these techniques reduce cache conflicts. By using multi-way stream buffers to decrease the rate of cache conflicts, better processing speed is obtained.

• Introduction
In the introduction the author compares different machines, such as the VAX 11/780 and the WRL Titan. According to Table 1-1, as the number of cycles per instruction decreases and cycle times shrink relative to memory access time, the cost of a cache miss, measured in instructions, grows; on older, slower machines with many cycles per instruction, the miss cost in instructions is correspondingly low.

• Main section(s)
2. Baseline Design
In the baseline design the author explains how to increase the speed of data processing, taking CMOS and GaAs technologies under consideration to discuss large and small caches, and describes the baseline design in Fig. 2-1.
For further elaboration, a test program is included in Table 2-1.

3. Reducing Conflict Misses: Miss Caching and Victim Caching
Cache misses fall into four categories: conflict, compulsory, capacity and coherence. More detail is given below.

3.1. Miss Caching
In this section the author discusses the worst, average and best cases for avoiding cache conflicts. For example, assuming at least 60 different instructions are executed in each procedure, the conflict misses would span more than the 15 lines of the maximum-size miss cache tested. In other words, a small miss cache could not contain the entire overlap and so would be reloaded repeatedly before it could be used. This type of reference pattern exhibits the worst miss cache performance, as shown in Fig. 3-3.

3.2. Victim Caching
The victim cache replaces the miss cache: instead of the target of the miss, it holds the victim evicted from the direct-mapped cache. We can therefore say that the victim cache is the main step towards reducing cache conflicts.

3.3. The Effect of Direct-Mapped Cache Size on Victim Cache Performance
As the number of victim cache entries increases, conflict misses are reduced; this section examines how the size of the direct-mapped cache affects victim cache performance.

3.4. The Effect of Line Size on Victim Cache Performance
As the line size increases, conflict misses become more frequent, and the victim cache becomes correspondingly more effective at removing them, so victim cache performance still improves.

3.5. Victim Caches and Second-Level Caches
A victim cache is normally used with the first-level cache, but the technique can also be applied when a second-level cache is used for a larger memory system.

4. Reducing Capacity and Compulsory Misses
There are three kinds of prefetch algorithms for reducing capacity and compulsory misses: prefetch on miss, tagged prefetch, and prefetch always. With a second-level cache system, tagged prefetch may give many instructions of head start.

4.1. Stream Buffers
A stream buffer is mainly used to start the prefetch before a tagged transition can take place. The sequential stream buffer is shown in Fig. 4-2; every line has a tag and an address, which are stored in the buffer. Because it makes use of the second-level bandwidth, it reduces conflicts.

4.2. Multi-Way Stream Buffers
A multi-way stream buffer is a technique that further reduces the chance of conflict (the reduction changes from 7% to 60%); the mechanism is introduced in Fig. 4-4.

4.3. Stream Buffer Performance vs. Cache Size
The instruction stream buffers have remarkably constant performance over a wide range of cache sizes, while data stream buffer performance generally improves as the cache size increases. This is especially true for the single data stream buffer.

4.4. Stream Buffer Performance vs. Line Size
The single data stream buffer's performance is hit especially hard compared to the multi-way stream buffer, because of the increase in conflict misses at large line sizes.

• Summary
Miss caching targets the conflicts in the cache. In the baseline design the author explains how to increase the speed of data processing. Misses fall into four categories: conflict, compulsory, capacity and coherence. A small miss cache could not contain the entire overlap and so would be reloaded repeatedly before it could be used. The victim cache replaces the miss cache and holds the victims of cache misses. A stream buffer is mainly used to start the prefetch before a tagged transition can take place. A multi-way stream buffer further reduces the chance of conflict (the reduction changes from 7% to 60%).

• Conclusions
Victim caches are an improvement to miss caching that saves the victim of the cache miss, instead of the target, in a small associative cache.
Victim caches are even more effective at removing conflict misses than miss caches. Multi-way stream buffers are a set of stream buffers that can prefetch down several streams concurrently. They are useful for data references that contain interleaved accesses to several different large data structures, such as in array operations. This study concentrated on applying victim caches and stream buffers to first-level caches; an interesting area for future work is the application of these techniques to second-level caches. Also, the numeric programs used in the study had unit-stride access patterns; numeric programs with non-unit-stride and mixed-stride access patterns still need to be simulated. Finally, the performance of victim caching and stream buffers needs to be investigated for operating system execution and for multiprogramming workloads.

• References
The references are good and precise, and they fully meet the requirements.

• Comment on the language used in the paper.
The language used in the paper is technical but still reader-friendly; it could be improved by elaborating further and giving more information about each process.
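The victim caching idea reviewed above, i.e. keeping evicted lines from a direct-mapped cache in a small fully-associative buffer so that conflicting addresses stop thrashing, can be sketched in a few lines. This is a toy model: the set count, victim cache size and access trace are made up for illustration, and it is not Jouppi's simulator (which also modelled timing, line sizes and second-level effects).

```python
# Toy model of a direct-mapped cache backed by a small fully-associative
# victim cache. Sizes and the access trace are illustrative assumptions.
from collections import OrderedDict

class VictimCached:
    def __init__(self, sets=4, victim_entries=2):
        self.sets = sets
        self.main = [None] * sets          # direct-mapped: one line per set
        self.victim = OrderedDict()        # small fully-associative, FIFO
        self.victim_entries = victim_entries
        self.misses = 0

    def access(self, addr):
        s = addr % self.sets
        if self.main[s] == addr:
            return "hit"
        if addr in self.victim:            # victim hit: swap the lines back
            del self.victim[addr]
            evicted = self.main[s]
            self.main[s] = addr
            if evicted is not None:
                self._push_victim(evicted)
            return "victim-hit"
        self.misses += 1                   # true miss: fetch from memory
        evicted = self.main[s]
        self.main[s] = addr
        if evicted is not None:
            self._push_victim(evicted)
        return "miss"

    def _push_victim(self, line):
        if len(self.victim) >= self.victim_entries:
            self.victim.popitem(last=False)   # FIFO eviction
        self.victim[line] = True

# Addresses 0 and 4 map to the same set (4 sets), so a plain
# direct-mapped cache would miss on every access; the victim cache
# absorbs the conflict after the two compulsory misses.
c = VictimCached()
results = [c.access(a) for a in [0, 4, 0, 4, 0, 4]]
# results: two "miss" entries, then four "victim-hit" entries
```

The swap on a victim hit is the key design point the paper's conclusions highlight: the conflicting pair ends up ping-ponging between the direct-mapped cache and the victim cache instead of going back to memory.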