Linköping University
Department of Computer and Information Science

Assignment Report № 2
Course: TDDI08 Embedded Systems Design

Simulation-Based Design-Space Exploration for the Deployment of Embedded Software on Multiprocessor Platforms

Solved by: /Deyan Levski Dimitrov/
p-number: 880718-P196
group:
22.02.2010, Linköping
Approved by: /Adrian Alin Lifa/

1. Assignment 1: Simulation-Based Design Space Exploration for Energy Minimization

In this assignment I used the MPARM platform to optimize a multimedia application running on a single processor. I ran a series of simulations with different hardware configurations. The configurations tried and the results obtained are listed below.

Default simulator settings:
  Instruction cache: 4096 bytes; direct-mapped
  Data cache: 4096 bytes; direct-mapped
  Processor frequency: maximal
  Execution time: 3,43 ms
  Energy consumption: 34,369 µJ

Simulation No. 1:
  Instruction cache: 128 bytes; direct-mapped; 0 wait states
  Data cache: 128 bytes; direct-mapped; 0 wait states
  Processor frequency: 1/2 Fmax
  Execution time: 10,22 ms
  Energy consumption: 48,48 µJ

Simulation No. 2:
  Instruction cache: 1024 bytes; direct-mapped; cache miss rate: 8,5 %
  Data cache: 1024 bytes; direct-mapped; cache miss rate: 0,81 %
  Processor frequency: 1/2 Fmax
  Execution time: 3,7 ms
  Energy consumption: 20,169 µJ

Simulation No. 3:
  Instruction cache: 1024 bytes; 8-way associative; cache miss rate: 25,2 %
  Data cache: 1024 bytes; 8-way associative; cache miss rate: 2,24 %
  Processor frequency: 1/2 Fmax
  Execution time: 3,53 ms
  Energy consumption: 26,86 µJ

Simulation No. 4:
  Instruction cache: 1024 bytes; direct-mapped; cache miss rate: 1,86 %
  Data cache: 4096 bytes; direct-mapped; cache miss rate: 0,81 %
  Processor frequency: 1/2 Fmax
  Execution time: 3,51 ms
  Energy consumption: 19,83 µJ

Simulation No. 5:
  Instruction cache: 512 bytes; direct-mapped; cache miss rate: 0,92 %
  Data cache: 4096 bytes; direct-mapped; cache miss rate: 1,86 %
  Processor frequency: 1/2 Fmax
  Execution time: 3,56 ms
  Energy consumption: 19,11 µJ

Simulation No. 6:
  Instruction cache: 512 bytes; direct-mapped; cache miss rate: 0,92 %
  Data cache: 8192 bytes; direct-mapped; cache miss rate: 1,83 %
  Processor frequency: 1/2 Fmax
  Execution time: 3,56 ms
  Energy consumption: 19,8 µJ

Simulation No. 7 (cache configuration of Simulation No. 5, varying processor frequency):
  Instruction cache: 512 bytes; direct-mapped; cache miss rate: 0,92 %
  Data cache: 4096 bytes; direct-mapped; cache miss rate: 1,86 %
  Processor frequency: 1/2 Fmax; Execution time: 3,56 ms; Energy consumption: 19,11 µJ
  Processor frequency: 1/3 Fmax; Execution time: 5,06 ms; Energy consumption: 18,22 µJ
  Processor frequency: 1/4 Fmax; Execution time: 6,57 ms; Energy consumption: 18,13 µJ
  Processor frequency: 1/5 Fmax; Execution time: 8,22 ms; Energy consumption: 18,13 µJ

Conclusion:

The best configuration found uses a 512-byte direct-mapped instruction cache, a 4096-byte direct-mapped data cache, and a processor frequency of 1/4 Fmax (50 MHz). It gives an execution time of 6,57 ms and an energy consumption of 18,13 µJ. Lowering the frequency further to 1/5 Fmax saves no additional energy (18,13 µJ) and lengthens the execution time to 8,22 ms.

The parameters with the greatest impact on energy consumption are the cache size and the processor frequency. A cache that is too small causes cache misses, so the same tasks consume more energy. A cache that is too large consumes more power by itself, because memory cells that are never used still have to be supplied. In the design of low-power embedded systems it is essential to strike a balance between execution time and power consumption.

2. Assignment 2.2: Shared Memory vs. Distributed Message Passing

This assignment compares the efficiency of two communication approaches: shared memory and distributed message passing.
The comparison is based on simulation results of a GSM voice codec implemented with each of the two alternatives.

Distributed Message Passing Simulation Results:
  Energy consumption: 605,463 µJ
  Execution time: 9,97 ms
  Total bus accesses: 313766
  Processor 0 bus activity cycles: 1911923
  Processor 1 bus activity cycles: 3094263
  Processor 2 bus activity cycles: 3031546

Shared Memory Simulation Results:
  Energy consumption: 894,58 µJ
  Execution time: 14,12 ms
  Total bus accesses: 383962
  Processor 0 bus activity cycles: 3473420
  Processor 1 bus activity cycles: 2513179
  Processor 2 bus activity cycles: 3462305

The shared memory approach consumes more energy (894,58 µJ against 605,463 µJ) and runs slower (14,12 ms against 9,97 ms). It makes more bus accesses to the memory (383962 against 313766), which increases both the processor load and the bus load. The distributed message passing approach does not use the bus for communication between the processors, which leads to a shorter execution time and lower energy consumption.

Synchronization in the shared memory implementation by frequency selection

Lowering the processor frequency reduces the number of bus accesses and, to a certain extent, the current drawn from the voltage source. The results of simulations with reduced processor frequency are shown below. Energy consumption drops markedly at 1/2 Fmax and changes little at lower frequencies, while the execution time keeps increasing.

  1/2 Fmax: Execution time: 26,23 ms; Energy consumption: 589,2 µJ; Total bus accesses: 350572
  1/3 Fmax: Execution time: 37,69 ms; Energy consumption: 578,65,2µJ; Total bus accesses: 353962
  1/4 Fmax: Execution time: 49,22 ms; Energy consumption: 580,73 µJ; Total bus accesses: 356011

3. Assignment 3.1: Scheduling Exercise

Initial task graph and schedule:

Optimized task graph and schedule:

4. Assignment 3.2: Extracting the Execution Times for the GSM Voice Codec with MPARM

GSM Encoding Process:

  Task                       Execution cycles   Execution time /ns/
  Init                                     67                  13,4
  GetInputAudio                          6486                1297,2
  Preprocess                            14479                2895,8
  LPC_Analysis                          51702               10340,4
  ShortTermAnalysisFilter               92058               18411,6
  LongTermPredictor1                    64821               12964
  RPE_Encoding1                         11579                2315
  Add                                    1416                 283
  LongTermPredictor2                    66947               13289,4
  RPE_Encoding2                         10732                2146
  Add                                    1318                 263
  LongTermPredictor3                    66263               13252
  RPE_Encoding3                         10694                2138
  Add                                    1318                 263,6
  LongTermPredictor4                    66086               13217
  RPE_Encoding4                         10716                2148
  Add                                    1318                 263,6
  Encode                                67094               13418
  Output                                 3549                 709

GSM Decoding Process:

  Task                       Execution cycles   Execution time /ns/
  Init                                     81                  16,2
  GetInputAudio                          1956                 391,2
  RPE_Decoding1                          6529                1305,8
  LongTermSynthesis1                     5715                1143
  RPE_Decoding2                          2158                 431,6
  LongTermSynthesis2                     5491                1098,2
  RPE_Decoding3                          2116                 423,2
  LongTermSynthesis3                     5491                1098,2
  RPE_Decoding4                          2094                 418,8
  LongTermSynthesis4                     5463                1092,6
  ShortTermSynthesisFilter             108969               21793,8
  Postprocessing                         7498                1499,6
  Output                                 1606                 321,2
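
As a rough aid for reading the decoding figures above, the following Python sketch totals the measured cycle counts, reports each task's share, and converts cycles to time. The names DECODER_CYCLES, cycles_to_time, cycle_shares and dominant_task are hypothetical helpers, not part of MPARM. The default factor of 0.2 time units per cycle is an assumption inferred from the table itself (81 cycles are listed as 16,2), not a documented simulator setting.

```python
# Cycle counts copied from the GSM decoding figures in this section.
DECODER_CYCLES = {
    "Init": 81,
    "GetInputAudio": 1956,
    "RPE_Decoding1": 6529,
    "LongTermSynthesis1": 5715,
    "RPE_Decoding2": 2158,
    "LongTermSynthesis2": 5491,
    "RPE_Decoding3": 2116,
    "LongTermSynthesis3": 5491,
    "RPE_Decoding4": 2094,
    "LongTermSynthesis4": 5463,
    "ShortTermSynthesisFilter": 108969,
    "Postprocessing": 7498,
    "Output": 1606,
}


def cycles_to_time(cycles, time_per_cycle=0.2):
    """Convert a cycle count to time.

    time_per_cycle is an assumed factor (0.2 reproduces the tabulated
    values); pass a different value for another clock setting.
    """
    return cycles * time_per_cycle


def cycle_shares(cycles_by_task):
    """Return each task's fraction of the total cycle count."""
    total = sum(cycles_by_task.values())
    return {task: count / total for task, count in cycles_by_task.items()}


def dominant_task(cycles_by_task):
    """Return the name of the task with the highest cycle count."""
    return max(cycles_by_task, key=cycles_by_task.get)


if __name__ == "__main__":
    shares = cycle_shares(DECODER_CYCLES)
    print("Total cycles:", sum(DECODER_CYCLES.values()))
    for task, cycles in DECODER_CYCLES.items():
        print(f"{task:26s} {cycles:7d} {cycles_to_time(cycles):9.1f} {shares[task]:6.1%}")
    print("Dominant task:", dominant_task(DECODER_CYCLES))
```

With these figures, ShortTermSynthesisFilter accounts for roughly 70 % of the decoder's cycles. The encoding figures could be handled the same way, but a list of (task, cycles) pairs would be needed there because the task name Add occurs several times.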