Seoul National University
Computer Architecture
Project #2: Cache Simulator

Objectives
• To understand cache memory
  – Organization: set associativity
  – Operation: cache read & write, hit & miss, LRU replacement policy
  – Performance: hit/miss ratio, miss penalty
• To develop your own cache simulator
  – Inputs: memory access pattern, cache organization, display option
  – Outputs: hit/miss, performance

General Cache Organization (S, E, B)
• S = 2^s sets
• E = 2^e lines per set
• B = 2^b bytes per cache block (the data)
• Each line holds a valid bit (v), a tag, and B data bytes (0, 1, 2, …, B-1)
• Cache size: C = S × E × B data bytes
• If E = 1 (e = 0): "Direct Mapped Cache"
  else if S = 1 (s = 0): "Fully Associative Cache"
  else: "E-Way Set Associative Cache"

E-way Set Associative Cache (Here: E = 2)
• E = 2: two lines per set
• Assume that the cache block size is 8 bytes
• Address of short int: [ t tag bits | set index 0…01 | block offset 100 ]
• Step 1, find set: the set index bits (0…01) select the set
• Step 2, compare both: check both lines of that set; valid? + tag match: yes = hit
• Step 3, block offset: the offset bits (100) give the position within the 8-byte block (bytes 0 to 7)
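The split of an address into tag, set index, and block offset used in the lookup steps can be written out in code. The following is a minimal sketch in C, assuming 64-bit addresses; `addr_fields` and `split_address` are hypothetical names chosen for illustration, not part of the project hand-out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the three fields of an address for a cache
 * with S = 2^s sets and B = 2^b bytes per block. */
typedef struct {
    uint64_t tag;     /* remaining high-order bits           */
    uint64_t set;     /* s bits that select one of the sets  */
    uint64_t offset;  /* b low-order bits within the block   */
} addr_fields;

/* Split addr into tag / set index / block offset. */
static addr_fields split_address(uint64_t addr, unsigned s, unsigned b)
{
    addr_fields f;

    assert(s + b < 64);  /* keeps every shift below well defined */
    f.offset = addr & ((UINT64_C(1) << b) - 1);
    f.set    = (addr >> b) & ((UINT64_C(1) << s) - 1);
    f.tag    = addr >> (s + b);
    return f;
}
```

For the address on the slide (…01 100 with 8-byte blocks, so b = 3) and assuming four sets (s = 2, an assumption made only for this illustration), `split_address(0xC, 2, 3)` yields set 1 and block offset 4. With s = 0 the set mask is zero, so every address maps to the single set of a fully associative cache.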
Outcome of the comparison:
• Valid + match: hit. The short int (2 bytes) is here, at the block offset within the matching line.
• No match: one line in the set is selected for eviction and replacement.
• Replacement policies: random, least recently used (LRU), …

LRU Replacement Policy
Theoretically… (lines of one set, listed from most to least recently used)

  Address  1 2 3 4 1 2 3 1 2 3 4 5
  Set      1 2 3 4 1 2 3 1 2 3 4 5
             1 2 3 4 1 2 3 1 2 3 4
               1 2 3 4 1 2 3 1 2 3

Practically…

Performance
(Average Access Time) = (Hit Time) + (Miss Rate) × (Miss Penalty)
                      = (Hit Time) + [1 - (Hit Rate)] × (Miss Penalty)
Example
• Suppose the cache hit time is 1 cycle, the miss penalty is 100 cycles, and the hit rate is 97%.
• Then the average access time is: 1 cycle + (1 - 0.97) × 100 cycles = 1 + 0.03 × 100 = 4 cycles.

Requirements of the cache simulator (1)
• The cache simulator (hereinafter referred to as CSIM) shall implement arbitrary numbers of sets and lines, and an arbitrary block size.
  – You should implement a way to provide the numbers of sets and lines, and the block size, as inputs to CSIM.
• CSIM shall read a trace file line by line and process it.
  – You should determine whether each memory operation is a cache hit or a miss.
  – You should implement the LRU replacement policy.
• CSIM shall report the result of the cache simulation.
  – You should report these three basic results: the numbers of hits, misses, and evictions.
  – You should be able to report the average access time of the cache simulation.
  – You should be able to report whether each memory access in the trace file results in a cache hit or a miss.

Restrictions & Advice
• Implement a method for input parameters.
  – You should implement it by argument passing (full credit).
  – If you can't, you can use standard input such as scanf() (low credit).
• Evaluate only data cache performance. Therefore, you should ignore instruction loads.
• You should assume that memory accesses are aligned properly. Therefore, you can ignore the requested size in the trace file.
• You should evaluate your CSIM with at least 3 different traces. You can use the one provided with this project.
• Calculate the average access time using the following assumption: hit time = 1 cycle, miss penalty = 100 cycles.
• Compile your CSIM without warnings.

How to trace memory accesses
• "valgrind": GPL licensed programming tool for memory debugging, memory leak detection, and profiling (from http://en.wikipedia.org/wiki/Valgrind)
• Usage:
  >> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l
  – Valgrind prints the memory accesses of "ls -l" on stdout, so you need to capture them with:
  >> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l > ls.trace
• Output format: [space]operation address,size

  Output          Type              Example             Naccess  [space]
  I 0400d7d4,8    Instruction load  All instructions    1        X
   L 04f6b868,8   Data Load         movl (%eax), %ebx   1        O
   S 7ff0005c8,8  Data Store        movl %eax, (%ebx)   1        O
   M 0421c7f0,4   Data Modify       incl (%ecx)         2        O

  ([space] column: O = the line starts with a space, X = it does not.)

Reference Cache Simulator
• Usage:
  >> ./csim [-v] -s <s> -E <E> -b <b> -t <trace file>
     -v: Optional verbose flag that displays trace info
     -s <s>: Number of set index bits (S = 2^s is the number of sets)
     -E <E>: Associativity (number of lines per set)
     -b <b>: Number of block bits (B = 2^b is the block size)
     -t <trace file>: Name of the valgrind trace to replay
• Cache size: C = S × E × B data bytes

Cache Simulation Example (1)
• Usage:
  >> ./csim [-v] -s <s> -E <E> -b <b> -t <trace file>
• Example:
  >> ./csim -v -s 4 -E 1 -b 4 -t ./traces/yi.trace
  – Number of set index bits = 4 (16 sets)
  – Associativity = 1 (Direct Mapped Cache)
  – Number of block bits = 4 (16 bytes per cache block)
• Output:
  L 10,1 miss
  M 20,1 miss hit
  …
  hits: 4 misses: 5 evictions: 3

Cache Simulation Example (2)
Example memory access pattern

  Oper.   Address  Byte
  Load    0x10     1
  Modify  0x20     1
  Load    0x22     1
  Store   0x18     1
  Load    0x110    1
  Load    0x210    1
  Modify  0x12     1

Cache state: 16 sets (S = 0 to F), each with one line consisting of a valid flag (V), a tag, and data bytes 0 to F. Initially every set is invalid (I).

Cache Simulation Example (3)
Load 0x10: set 1 becomes valid (V) with tag 0x0. Hit 0, Miss 1, Evict 0.

Cache Simulation Example (4)
Modify 0x20: set 2 becomes valid (V) with tag 0x0. Hit 1, Miss 2, Evict 0.

Cache Simulation Example (5)
Load 0x22: the cache state is unchanged (sets 1 and 2 valid, tag 0x0). Hit 2, Miss 2, Evict 0.

Cache Simulation Example (6)
Store 0x18: the cache state is unchanged. Hit 3, Miss 2, Evict 0.

Cache Simulation Example (7)
Load 0x110: the tag of set 1 changes to 0x1. Hit 3, Miss 3, Evict 1.

Cache Simulation Example (8)
Load 0x210: the tag of set 1 changes to 0x2. Hit 3, Miss 4, Evict 2.
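The bookkeeping traced in these slides (valid bit, tag, and the hit/miss/eviction counters) can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the reference simulator: the names `line_t`, `cache_t`, `cache_new`, `cache_free`, `cache_access`, `cache_op`, and `average_access_time` are hypothetical, LRU is implemented with a per-line timestamp, and a Modify is counted as a load followed by a store (Naccess = 2 in the trace format table).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int      valid;      /* valid bit */
    uint64_t tag;
    uint64_t last_used;  /* time of the most recent access, for LRU */
} line_t;

typedef struct {
    unsigned s, E, b;    /* set index bits, lines per set, block bits */
    line_t  *lines;      /* 2^s * E lines, stored set after set */
    uint64_t clock;      /* incremented on every access */
    unsigned hits, misses, evictions;
} cache_t;

/* Allocate an empty cache (all lines invalid); returns NULL on failure. */
static cache_t *cache_new(unsigned s, unsigned E, unsigned b)
{
    assert(E > 0 && s + b < 64);
    cache_t *c = calloc(1, sizeof *c);
    if (c == NULL)
        return NULL;
    c->s = s; c->E = E; c->b = b;
    c->lines = calloc(((size_t)1 << s) * E, sizeof *c->lines);
    if (c->lines == NULL) {
        free(c);
        return NULL;
    }
    return c;
}

static void cache_free(cache_t *c)
{
    if (c != NULL) {
        free(c->lines);
        free(c);
    }
}

/* Simulate one access and update the hit/miss/eviction counters. */
static void cache_access(cache_t *c, uint64_t addr)
{
    uint64_t set = (addr >> c->b) & ((UINT64_C(1) << c->s) - 1);
    uint64_t tag = addr >> (c->s + c->b);
    line_t  *base = c->lines + set * c->E;
    line_t  *victim = NULL;

    c->clock++;
    for (unsigned i = 0; i < c->E; i++) {
        line_t *ln = &base[i];
        if (ln->valid && ln->tag == tag) {      /* hit */
            ln->last_used = c->clock;
            c->hits++;
            return;
        }
        /* Track the line to fill on a miss: an invalid line if there is
         * one, otherwise the least recently used valid line. */
        if (!ln->valid) {
            if (victim == NULL || victim->valid)
                victim = ln;
        } else if (victim == NULL ||
                   (victim->valid && ln->last_used < victim->last_used)) {
            victim = ln;
        }
    }
    c->misses++;
    if (victim->valid)
        c->evictions++;
    victim->valid = 1;
    victim->tag = tag;
    victim->last_used = c->clock;
}

/* Apply one trace operation: L and S access once, M (modify) is a load
 * followed by a store, and anything else (such as I) is ignored. */
static void cache_op(cache_t *c, char op, uint64_t addr)
{
    if (op == 'L' || op == 'S' || op == 'M')
        cache_access(c, addr);
    if (op == 'M')
        cache_access(c, addr);
}

/* Average access time = hit time + miss rate * miss penalty. */
static double average_access_time(const cache_t *c, double hit_time,
                                  double miss_penalty)
{
    unsigned total = c->hits + c->misses;
    if (total == 0)
        return 0.0;
    return hit_time + (double)c->misses / total * miss_penalty;
}
```

Replaying the seven accesses of the example through `cache_op` on `cache_new(4, 1, 4)` ends with the totals printed by the `-s 4 -E 1 -b 4` run in Example (1): 4 hits, 5 misses, 3 evictions. Keeping a timestamp per line is only one way to realize LRU; a recency-ordered list per set would serve equally well.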
Cache Simulation Example (9)
Modify 0x12: the tag of set 1 returns to 0x0. Hit 4, Miss 5, Evict 3.
Average Access Time = 1 + (5 / 9) × 100 ≈ 56.6 cycles

Report Writing Guidelines (1)
Include the following:
• Design requirements
  – Restate the given CSIM design requirements so that they fit your own CSIM.
• Implementation
  – Describe how your CSIM works and how it reflects the design requirements.
  – Describe how to use your CSIM and how it outputs the simulation results.
• Testing
  – Describe how you verified the requirements of CSIM.
  – Perform the verification with at least 3 sets of trace data.
  – In addition, describe how you obtained the trace data.
  – Attach captured images (screenshots) that show what your CSIM implementation does.

Report Writing Guidelines (2)
Include the following:
• Performance evaluation
  – Measure the performance of each cache organization (direct mapped, E-way set associative, and fully associative cache) and compare the results.

Grading Criteria

  Title            Pts.  Details                 Pts.  Description
  CSIM             70    Submission              10    Warning: -0.5 pt. each / Error: -1 pt. each
                         Parameter Input         10    Argument passing: 10 pts.; other methods: 5 pts.
                         Cache Organization      5     Dynamic allocation: 5 pts.; arrays: 2 pts.
                         Cache Operation         20    Correct handling of hits/misses: 10 pts.
                                                       Replacement policy (LRU): 5 pts. (random replacement instead: 3 pts.)
                                                       Showing the result (hit/miss) of each memory access: 4 pts.
                                                       Option to choose whether the results are shown: 1 pt.
                         Performance evaluation  5     Reporting the correct average access time
                         Comments                10    Comment each line as far as possible
  Report           30    Design requirements     7
                         Implementation          7
                         Testing                 8
                         Performance evaluation  8
  Late submission  -5 per day; submissions are accepted up to one week after the deadline

Submission
• Submit the deliverables listed below by e-mail.
  – E-mail address: yonghunlee@archi.snu.ac.kr
  – E-mail subject: "[CSIM]StudentID_Name"
  – Compress the deliverables into "StudentID_Name.zip" or "StudentID_Name.tar" before submitting.
• Deliverables
  – CSIM source code
  – Project report
  – Trace files used to verify CSIM
• Deadline: Wednesday, Dec. 18, 2013, 23:59
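Since the trace files used for verification are part of the submission, it may help to see how one line of such a file can be read. The sketch below parses the valgrind format described earlier (`[space]operation address,size`) with standard `sscanf`. It skips instruction loads and discards the size field, as the restrictions allow. `parse_trace_line` is a hypothetical helper name, and treating every non-matching line (for example valgrind's own `==pid==` messages) as "not a data access" is an assumption of this sketch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parse one trace line of the form
 * "[space]operation address,size". Returns 1 and fills *op and *addr for
 * a data access (L, S, M); returns 0 for instruction loads and for any
 * line that does not match the format. The size field is read only to
 * validate the line and is then discarded. */
static int parse_trace_line(const char *line, char *op, uint64_t *addr)
{
    char     c;
    uint64_t a;
    unsigned size;

    assert(line != NULL && op != NULL && addr != NULL);
    /* The leading blank in the format skips the optional [space]. */
    if (sscanf(line, " %c %" SCNx64 ",%u", &c, &a, &size) != 3)
        return 0;
    if (c != 'L' && c != 'S' && c != 'M')
        return 0;
    *op = c;
    *addr = a;
    return 1;
}
```

A simulator loop could read the trace with `fgets`, call this helper on each line, and pass the operation and address on to the cache model only when it returns 1.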