Linköping University
Department of Computer and Information Science
Assignment Report № 2
Course: TDDI08 Embedded Systems Design
Simulation-Based Design-Space Exploration for the
Deployment of Embedded Software on Multiprocessor
Platforms
Solved by:
/Deyan Levski Dimitrov/
p-number: 880718-P196
group:
22.02.2010, Linköping
Approved by:
/Adrian Alin Lifa/
1. Assignment 1: Simulation-Based Design-Space Exploration for Energy Minimization
In this assignment I used the MPARM platform to minimize the energy consumption of a
multimedia application running on a single processor.
I ran a series of simulations with different hardware configurations. The table below lists
the configurations tried and the results obtained:
Sim.     I-cache               D-cache               Frequency Exec. time  Energy
Default  4096 B, DM            4096 B, DM            Fmax      3,43 ms     34,369 µJ
1        128 B, DM, 0 WS       128 B, DM, 0 WS       1/2 Fmax  10,22 ms    48,48 µJ
2        1024 B, DM, 8,5%      1024 B, DM, 0,81%     1/2 Fmax  3,7 ms      20,169 µJ
3        1024 B, 8-way, 25,2%  1024 B, 8-way, 2,24%  1/2 Fmax  3,53 ms     26,86 µJ
4        1024 B, DM, 1,86%     4096 B, DM, 0,81%     1/2 Fmax  3,51 ms     19,83 µJ
5        512 B, DM, 0,92%      4096 B, DM, 1,86%     1/2 Fmax  3,56 ms     19,11 µJ
6        512 B, DM, 0,92%      8192 B, DM, 1,83%     1/2 Fmax  3,56 ms     19,8 µJ
7        512 B, DM, 0,92%      4096 B, DM, 1,86%     1/2 Fmax  3,56 ms     19,11 µJ
7        512 B, DM, 0,92%      4096 B, DM, 1,86%     1/3 Fmax  5,06 ms     18,22 µJ
7        512 B, DM, 0,92%      4096 B, DM, 1,86%     1/4 Fmax  6,57 ms     18,13 µJ
7        512 B, DM, 0,92%      4096 B, DM, 1,86%     1/5 Fmax  8,22 ms     18,13 µJ
DM = direct-mapped, 8-way = 8-way set-associative, WS = wait states; percentages are cache
miss rates. Simulation 7 repeats the cache configuration of Simulation 5 at decreasing
processor frequencies.
Conclusion:
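The best configuration found uses a 512-byte direct-mapped instruction cache, a 4096-byte
direct-mapped data cache and a processor frequency of 1/4 Fmax (50 MHz).
It gives an execution time of 6,57 ms and an energy consumption of 18,13 µJ. Lowering the
frequency further to 1/5 Fmax gives no additional energy saving (18,13 µJ) but increases the
execution time to 8,22 ms, so 1/4 Fmax is preferred.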
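To make the selection reproducible, the choice can be expressed as a simple rule over the measured values: take the run with the lowest energy and, on a tie, the one with the shorter execution time. The Python sketch below applies this rule to the numbers reported in the table; the run labels and the tie-breaking rule are my own illustration, not part of the MPARM tool flow.

```python
# Measured results of the single-processor exploration:
# (run label, execution time in ms, energy in uJ). Labels are illustrative.
RUNS = [
    ("Default", 3.43, 34.369),
    ("1", 10.22, 48.48),
    ("2", 3.70, 20.169),
    ("3", 3.53, 26.86),
    ("4", 3.51, 19.83),
    ("5", 3.56, 19.11),
    ("6", 3.56, 19.80),
    ("7 @ 1/2 Fmax", 3.56, 19.11),
    ("7 @ 1/3 Fmax", 5.06, 18.22),
    ("7 @ 1/4 Fmax", 6.57, 18.13),
    ("7 @ 1/5 Fmax", 8.22, 18.13),
]


def best_configuration(runs):
    """Lowest energy wins; ties are broken by the shorter execution time."""
    return min(runs, key=lambda run: (run[2], run[1]))


if __name__ == "__main__":
    label, time_ms, energy_uj = best_configuration(RUNS)
    print(f"{label}: {energy_uj} uJ in {time_ms} ms")
```

Because 1/4 Fmax and 1/5 Fmax tie on energy, the tie-breaker is what selects 1/4 Fmax.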
The parameters with the greatest impact on energy consumption are the cache size and the
processor frequency. A small cache causes more cache misses, and every miss requires an
access to main memory, so more energy is spent for the same work. A large cache, on the
other hand, consumes more power itself, because all of its memory cells must be supplied,
including those that are not needed. For the design of low-power embedded systems it is
therefore essential to find a balance between execution time and power consumption.
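The trade-off between execution time and power can be made concrete by computing the average power P = E / t for the frequency sweep of Simulation 7 (µJ divided by ms gives mW). This minimal Python sketch assumes that the reported execution time covers the whole interval over which the energy was measured.

```python
# Frequency sweep of Simulation 7: (frequency, execution time in ms, energy in uJ).
SWEEP = [
    ("1/2 Fmax", 3.56, 19.11),
    ("1/3 Fmax", 5.06, 18.22),
    ("1/4 Fmax", 6.57, 18.13),
    ("1/5 Fmax", 8.22, 18.13),
]


def average_power_mw(energy_uj, time_ms):
    """Average power P = E / t; microjoules divided by milliseconds gives milliwatts."""
    return energy_uj / time_ms


if __name__ == "__main__":
    for freq, time_ms, energy_uj in SWEEP:
        print(f"{freq}: {average_power_mw(energy_uj, time_ms):.2f} mW")
```

The average power keeps falling as the clock is lowered (from about 5,37 mW to 2,21 mW), while the energy per run levels off at 18,13 µJ from 1/4 Fmax on, so lowering the frequency further only makes the application slower.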
2. Assignment 2.2: Shared Memory vs. Distributed Message Passing
This assignment compares the efficiency of two communication approaches, shared memory and
distributed message passing. The comparison is based on simulation results of a GSM voice
codec implemented with each of the two alternatives:
Metric                           Distributed message passing   Shared memory
Energy consumption               605,463 µJ                    894,58 µJ
Execution time                   9,97 ms                       14,12 ms
Total bus accesses               313766                        383962
Processor 0 bus activity cycles  1911923                       3473420
Processor 1 bus activity cycles  3094263                       2513179
Processor 2 bus activity cycles  3031546                       3462305
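The shared-memory implementation consumes more energy (894,58 µJ vs. 605,463 µJ) and runs
slower (14,12 ms vs. 9,97 ms). In the shared-memory version, all inter-processor communication
passes through the shared memory over the bus, which increases both the bus load and the
processors' total bus activity. The distributed message-passing version relies much less on
the shared bus for communication between the processors (313766 vs. 383962 total bus
accesses), which leads to a shorter execution time and lower energy consumption.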
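The relative savings of message passing over shared memory follow directly from the measured values. This Python sketch computes them; the metric names are illustrative.

```python
# Measured values from the shared-memory vs. message-passing comparison.
SHARED_MEMORY = {"energy_uJ": 894.58, "time_ms": 14.12, "bus_accesses": 383962}
MESSAGE_PASSING = {"energy_uJ": 605.463, "time_ms": 9.97, "bus_accesses": 313766}


def reduction_percent(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    for metric, base in SHARED_MEMORY.items():
        r = reduction_percent(base, MESSAGE_PASSING[metric])
        print(f"{metric}: {r:.1f}% lower with message passing")
```

It reports reductions of about 32% in energy, 29% in execution time and 18% in total bus accesses.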
Synchronization in the shared-memory implementation by frequency selection:
By lowering the processor frequency, the number of bus accesses and the current drawn from
the supply can be reduced to a certain extent, at the cost of a longer execution time. The
results of simulations with reduced processor frequency are shown below:
Frequency  Execution time   Energy consumption   Total bus accesses
1/2 Fmax   26,23 ms         589,2 µJ             350572
1/3 Fmax   37,69 ms         578,65 µJ            353962
1/4 Fmax   49,22 ms         580,73 µJ            356011
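Relating the reduced-frequency runs to the full-speed shared-memory run (14,12 ms, 894,58 µJ) shows what this technique costs. The Python sketch below computes the energy saving and the slowdown for each frequency, taking the 1/3 Fmax energy as 578,65 µJ; the helper name is hypothetical.

```python
# Shared-memory implementation: (frequency, execution time in ms, energy in uJ).
BASELINE = ("Fmax", 14.12, 894.58)
REDUCED = [
    ("1/2 Fmax", 26.23, 589.2),
    ("1/3 Fmax", 37.69, 578.65),
    ("1/4 Fmax", 49.22, 580.73),
]


def compare_to_baseline(baseline, runs):
    """Yield (frequency, energy saving in percent, slowdown factor) for each run."""
    _, base_time, base_energy = baseline
    for freq, time_ms, energy_uj in runs:
        saving = 100.0 * (base_energy - energy_uj) / base_energy
        yield freq, saving, time_ms / base_time


if __name__ == "__main__":
    for freq, saving, slowdown in compare_to_baseline(BASELINE, REDUCED):
        print(f"{freq}: {saving:.1f}% less energy, {slowdown:.2f}x longer")
```

All three frequencies save about 34 to 35% energy, with the minimum at 1/3 Fmax, while the execution time grows by a factor of roughly 1,9 to 3,5.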
3. Assignment 3.1: Scheduling Exercise
Initial task graph and schedule:
Optimized task graph and schedule:
4. Assignment 3.2: Extracting the Execution Times for the GSM Voice Codec with MPARM
GSM Encoding Process:
Task                      Execution cycles   Execution time (ns)
Init                      67                 13,4
GetInputAudio             6486               1297,2
Preprocess                14479              2895,8
LPC_Analysis              51702              10340,4
ShortTermAnalysisFilter   92058              18411,6
LongTermPredictor1        64821              12964
RPE_Encoding1             11579              2315
Add                       1416               283
LongTermPredictor2        66947              13289,4
RPE_Encoding2             10732              2146
Add                       1318               263
LongTermPredictor3        66263              13252
RPE_Encoding3             10694              2138
Add                       1318               263,6
LongTermPredictor4        66086              13217
RPE_Encoding4             10716              2148
Add                       1318               263,6
Encode                    67094              13418
Output                    3549               709
GSM Decoding Process:
Task                      Execution cycles   Execution time (ns)
Init                      81                 16,2
GetInputAudio             1956               391,2
RPE_Decoding1             6529               1305,8
LongTermSynthesis1        5715               1143
RPE_Decoding2             2158               431,6
LongTermSynthesis2        5491               1098,2
RPE_Decoding3             2116               423,2
LongTermSynthesis3        5491               1098,2
RPE_Decoding4             2094               418,8
LongTermSynthesis4        5463               1092,6
ShortTermSynthesisFilter  108969             21793,8
Postprocessing            7498               1499,6
Output                    1606               321,2
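For the scheduling exercise it is useful to know each process's total workload and which task dominates it. This Python sketch sums the measured cycle counts from the two tables and reports the largest task's share; the `summarize` helper is hypothetical, and only cycle counts are used, so no assumption about the clock frequency is needed.

```python
# Measured execution cycles per task: (task name, cycles).
ENCODER = [
    ("Init", 67), ("GetInputAudio", 6486), ("Preprocess", 14479),
    ("LPC_Analysis", 51702), ("ShortTermAnalysisFilter", 92058),
    ("LongTermPredictor1", 64821), ("RPE_Encoding1", 11579), ("Add", 1416),
    ("LongTermPredictor2", 66947), ("RPE_Encoding2", 10732), ("Add", 1318),
    ("LongTermPredictor3", 66263), ("RPE_Encoding3", 10694), ("Add", 1318),
    ("LongTermPredictor4", 66086), ("RPE_Encoding4", 10716), ("Add", 1318),
    ("Encode", 67094), ("Output", 3549),
]
DECODER = [
    ("Init", 81), ("GetInputAudio", 1956),
    ("RPE_Decoding1", 6529), ("LongTermSynthesis1", 5715),
    ("RPE_Decoding2", 2158), ("LongTermSynthesis2", 5491),
    ("RPE_Decoding3", 2116), ("LongTermSynthesis3", 5491),
    ("RPE_Decoding4", 2094), ("LongTermSynthesis4", 5463),
    ("ShortTermSynthesisFilter", 108969), ("Postprocessing", 7498),
    ("Output", 1606),
]


def summarize(tasks):
    """Return (total cycles, name of the largest task, its share of the total in percent)."""
    total = sum(cycles for _, cycles in tasks)
    name, cycles = max(tasks, key=lambda task: task[1])
    return total, name, 100.0 * cycles / total


if __name__ == "__main__":
    for process, tasks in (("Encoder", ENCODER), ("Decoder", DECODER)):
        total, name, share = summarize(tasks)
        print(f"{process}: {total} cycles, largest task {name} ({share:.1f}%)")
```

The encoder adds up to 548643 cycles, with ShortTermAnalysisFilter as the largest task at about 16,8%. In the decoder, ShortTermSynthesisFilter alone accounts for about 70% of the 155167 cycles.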