Voltage Assignment with Guaranteed Probability Satisfying Timing Constraint for Real-time Multiproceesor DSP

advertisement
1
Voltage Assignment with Guaranteed Probability
Satisfying Timing Constraint for Real-time
Multiproceesor DSP
Meikang Qiu, Zhiping Jia, Chun Xue, Zili Shao, Edwin H.-M. Sha
Abstract
Dynamic Voltage Scaling (DVS) is one of the techniques used to obtain energy-saving in real-time DSP
systems. In many DSP systems, some tasks contain conditional instructions that have different execution times for
different inputs. Due to the uncertainties in execution time of these tasks, this paper models each varied execution
time as a probabilistic random variable and solves the Voltage Assignment with Probability (VAP) Problem. VAP
problem involves finding a voltage level to be used for each node of an date flow graph (DFG) in uniprocessor
and multiprocessor DSP systems. This paper proposes two optimal algorithms, one for uniprocessor and one
for multiprocessor DSP systems, to minimize the expected total energy consumption while satisfying the timing
constraint with a guaranteed confidence probability. The experimental results show that our approach achieves
significant energy saving than previous work. For example, our algorithm for multiprocessor achieves an average
improvement of 56.1% on total energy-saving with 0.80 probability satisfying timing constraint.
Index Terms
Probability, assignment, DVS, real-time, DSP
I. I NTRODUCTION
The increasingly ubiquitous DSP systems pose great challenges different from those faced by generalpurpose computers. DSP systems are more application specific and more constrained in terms of power,
timing, and other resources. Energy-saving is a critical issue and performance metric in DSP systems
design due to wide use of portable devices, especially those powered by batteries [1]–[6]. The systems
become more and more complicate and some tasks may not have fixed execution time. Such tasks usually
contain conditional instructions and/or operations that could have different execution times for different
M. Qiu, C. Xue, and E. H.-M. Sha are with the Department of Computer Science, University of Texas at Dallas, Richardson, Texas
75083, USA. Email: mxq012100, cxx016000, edsha @utdallas.edu. Z. Jia is with School of Computer Science and Technology, Shangdong
University, Jinan, Shangdong, China. Email:zhipingj@sdu.edu.cn. Z. Shao is with the Department of Computing, Hong Kong Polytechnic
University, Hung Hom, Kowloon, Hongkong. Email:cszshao@comp.polyu.edu.hk.
2
inputs [7]–[11]. It is possible to obtain the execution time distribution for each task by sampling and
knowing detailed timing information about the system or by profiling the target hardware [12]. Also
some multimedia applications, such as image, audio, and video data streams, often tolerate occasional
deadline misses without being noticed by human visual and auditory systems. For example, in packet
audio applications, loss rates between 1% - 10% can be tolerated [13].
Prior design space exploration methods for hardware/software codesign of DSP systems [14]–[17]
guarantee no deadline missing by considering worst-case execution time of each task. Many design
methods have been developed based on worst-case execution time to meet the timing constraints without
any deadline misses. These methods are pessimistic and are suitable for developing systems in a hard
real-time environment, where any deadline miss will be catastrophic. However, there are also many soft
real-time systems, such as heterogeneous systems, which can tolerate occasional violations of timing
constraints. The above pessimistic design methods can’t take advantage of this feature and will often
lead to over-designed systems that deliver higher performance than necessary at the cost of expensive
hardware, higher energy consumption, and other system resources.
There are several papers on the probabilistic timing performance estimation for soft real-time systems
design [7]–[12], [18], [19]. The general assumption is that each task’s execution time can be described
by a discrete probability density function that can be obtained by applying path analysis and system
utilization analysis techniques. Hu et al. [8] propose a state-based probability metric to evaluate the
overall probabilistic timing performance of the entire task set. However, their evaluation method becomes
very time consuming when task has many different execution time variations. Hua et al. [9], [10] propose
the concept of probabilistic design where they design the system to meet the timing constraints of periodic
applications statistically. But their algorithm is not optimal and only suitable to uniprocessor executing
tasks according to a fixed order, that is, a simple path. In this paper, we will propose a optimal algorithm
for the uniprocessor situation. Also, we will give a optimal algorithm for multiprocessor executing tasks
according to an executing order in DAG (Direct Acyclic Graph).
Low power and low energy consumptions are extremely important for real-time DSP systems. Dynamic
voltage scaling (DVS) is one of the most effective techniques to reduce energy consumption [20]–[23].
In many microprocessor systems, the supply voltage can be changed by mode-set instructions according
to the workload at run-time. With the trend of multiple cores being widely used in DSP systems, it is
important to study DVS techniques for multiprocessor DSP systems. This paper focuses on minimizing
expected energy consumption with guaranteed probability satisfying timing constraints via DVS for realtime multiprocessor DSP systems.
In this paper, we use probabilistic design space exploration and DVS to avoid over-design systems.
3
We propose two novel optimal algorithms, one for uniprocessor and one for multiprocessor DSP systems,
to minimize the expected value of total energy consumption while satisfying timing constraints with
guaranteed probabilities for real-time applications. Our work is related to the work in [9], [10]. In [9],
[10], Hua et al. proposed an heuristic algorithm for uniprocessor and the Data Flow Graph (DFG) is a
simple path. We call the offline part of it as HUA algorithm for convenience. We also apply the greedy
method of HUA algorithm to multiprocessor and call the new algorithm as Heu.
Our contributions are listed as the following:
When there is a uniprocessor, the results of our algorithm, VAP S, gives the optimal solution and
achieves significant energy saving than HUA algorithm.
For the general problem, that is, when there are multiple processors and the DFG is a DAG, our
algorithm, VAP M, gives the optimal solution and achieves significant average energy reduction than
Heu algorithm.
Our algorithms not only are optimal, but also provide more choices of smaller expected value of
total energy consumption with guaranteed confidence probabilities satisfying timing constraints. In
many situations, algorithms HUA and Heu cannot find a solution, yet ours can find satisfied results.
Our algorithms are practical and quick. In practice, when the number of multi-parent nodes and multichild nodes in the given DFG graph is small, and the timing constraint is polynomial to the size of
DFG, the running times of these algorithms are very small and our experiments always finished in
very short time.
We conduct experiments on a set of benchmarks, and compare our algorithms with HUA and Heu algorithms Experiments show that our algorithm for uniprocessor, VAP S, has an average 58.0% energy-saving
improvement with probability 0.8 satisfying timig constraint compared with the greedy algorithm HUA.
Our algorithm for multiprocessor and DAG, VAP M, has an average 56.1% energy-saving improvement
compared with the results of the heuristic algorithm Heu.
The rest of the paper is organized as following: The models and basic concepts are introduced in
Section II. In SectionIII, we give motivational examples. In Section IV, we propose our algorithms. The
experimental results are shown in Section V, and the conclusion is shown in Section VI.
II. M ODELS
AND
C ONCEPTS
In this section, we introduce the system model, the energy model, and VAP problem that will be used
in the later sections.
System Model:
We focus on real time applications on single-processor and multiprocessor DSP systems. Probabilistic Data-Flow Graph (PDFG) is used to model a embedded systems application. A PDFG G
4
is a directed acyclic graph (DAG), where V
!#"$
is the set of nodes; R
is a voltage set; the execution time %'&)(*,+.- is a random variable; E /
edge set that defines the precedence relations among nodes in
102
is the
. There is a timing constraint 3 and it must
be satisfied for executing the whole PDFG. In the multiprocessor system, each processor has multiple
discrete levels of voltages and its voltage level can be changed independently by voltage-level-setting
instructions without the influence for other processors.
Energy Model:
Dynamic power, which is the dominant source of power dissipation in CMOS circuit, is proportional
to
4
065!7809:
;<;
, where
=
represent the number of computation cycles for a node,
>@?A
is the effective
switched capacitance, and B6CC is the supply voltage. Reducing the supply voltage can result in substantial
power and energy saving. Roughly speaking, system’s power dissipation is halved if we reduce
;<;
by
30% without changing any other system parameters. However, this saving comes at the cost of reduced
throughput, slower system clock frequency, or higher cycle period time (gate delay).
We use the similar energy model as in [21]–[23]. Let
5
stand for energy consumption (since
proportional to
H
, where
EFGF
E FGFI EJLKMON
QPOR
its computation time
*
-
has been used to represent edge). The cycle period time
is the threshold voltage and
constant. Given the number of cycles
5
*
-
for node
0c
a`
b
; ;ed
* <
*
SUTV*XWZY\[
of node , the supply voltage
4
and the energy
represent the execution time of a node and
-
0ga`
4
5
0
*
-
4
;<;
QPOR
]
;<;
Y\[^
is
%D
is a technology dependent
and the threshold voltage
_POR
,
are calculated as follows:
-<f
(1)
0h
b
4
; ;d
* X
0h5!78i0c :
;X;
;<;
)POR
- f
(2)
(3)
In Equation (1), k is a device related parameter. From Equations (2) and (3), we can see that the lower
voltage will prolong the execution time of a node but reduces its energy consumption. In this paper, we
assume that there is no energy or delay penalty associated with voltage switching and the energy leakage
is very small.
DVS (Dynamic voltage scaling) is a technique that varies system’s operating voltages and clock
frequencies based on the computation load to provide desired performance with the minimum energy
consumption. It has been demonstrated as one of the most effective low power system design techniques
and has been supported by many modern microprocessors. Examples include Transmeta’s Crusoe, AMD’s
K-6, Intel’s XScale and Pentium III and IV, and some DSPs developed in Bell Labs [9].
5
Low power design is of particular interest for the soft real time multimedia systems and we assume
that our system has multiple voltages available on the chip such that the system can switch from one level
to another. For the energy and delay overhead associated with multiple-voltage systems, we assume that
voltage scaling occurs simultaneously without any energy and delay overhead.
VAP problem:
For multiple-voltage systems, assume there are maximum
:
2"@
. For each voltage, there are maximum
different voltages in a voltage set
execution time variations, although each node
may have different execution time variations. An assignment for a PDFG G is to assign a processor to
each node. Define an assignment A to be a function from domain V to range R, where V is the node set
and R is the voltage set. For a node
T
,
$* -
gives selected voltage level of node . In a PDFG G,
;
& ( *,+.- , W
each varied execution time is modeled as a probabilistic random variable,
the execution times of each node
T
when running at voltage level
the corresponding probability function. For each voltage
* %
&)(*,+.-
,
, represents
W
, represents
with respect to node , there is a set of pairs
- , W , where is the probability and is the energy consumption of each node in PDFG.
>i& ( *,+.- , W
, is used to represent the expected value of energy consumption of each node T
!
5 , which is a fixed value. Because of the linearity of expected values, we
at voltage ,
* can define the total energy consumption for a system. Given an assignment A of a PDFG G, we define the
* - , to be the summation
#" E 5$ H% M * - . In this paper we
system expected total energy consumption under assignment A, denoted as
H * - T , of all nodes, that is, 5 !* iof energy consumptions,
M
5$
*&'- total energy consumption in brief.
call
5
For the input PDFG
, given an assignment
* !* i- ,+-/.10 8 H% *
M
, assume that
' (* %
-
stands for the execution time of
) *( - can be obtained from the longest path in . The new variable
H% * - , where
#" 8 H% * - , is also a random variable. The minimum
graph G under assignment A.
>
%
M
M
expected total energy consumption C with confidence probability P under timing constraint L is
2+354 5$ *!i- , where probability of ( 6 !* i-
798 . For each timing constraint 3 ,
5
our algorithm will output a serial of (Probability, Consumption) pairs ( 8 , ).
:; * - is either a discrete random variable or a continuous random variable. We define <*>=- to be
? the cumulative distribution function (abbreviated as CDF) of the random variable
* - , where @ *>A :; :; 8 * * -$BA - . When * - is a discrete random variable, the CDF @ *CA - is the sum of all the probabilities
D * - is a continuous random
associating with the execution times that are less than or equal to A . If
P
G
F
H E *!I-KJLI .
variable, then it has a probability density function (PDF) . If assume the PDF is E , then @ *>A [ , @ * M
W .
Function @ *>A - is nondecreasing, and @ * dNM defined as
5
3
)
We define the VAP (voltage assignment with probability) problem as follows: Given
different
6
voltage levels:
i ,
:
,
#"
,
executed on each voltage
probability
8
each node in assignment
$
, a PDFG
with
, a timing constraint
6
*
-
,
8 *
-
, and
5
*
and a confidence probability
3
8
-
for each node
T
, find the voltage for
that gives the minimum expected total energy consumption
5
with confidence
under timing constraint 3 .
III. M OTIVATIONAL E XAMPLE
A. Multiprocessor Systems
Nodes
5
5
4
3
4
1
3
1
2
3
2
1
T
2
3
1
2
1
1
R1
P C
0.9
4
0.1
1.0 20
0.8
24
0.2
0.9
16
0.1
1.0 12
C
1
PR1
5
PR2
3
4
1
2
5
6
4
3
(b)
(a)
Fig. 1.
R2
T P
4 0.9
6 0.1
2 1.0
2 0.8
6 0.2
2 0.9
4 0.1
2 1.0
(c)
(a) A given PDFG. (b) The times, probabilities, and energy consumption of its nodes for different voltage levels. (c) The schedule
graph of two processors.
First we give an example for multiprocessor embedded systems, which is shown in Figure 2. In this
paper, we assume the tasks have already been preprocessed. We already know which task will be processed
by which processor, and the scheduling graph is given. For example, the task graph is shown in Figure 1(a).
After preprocessing, we get the scheduling graph in Figure 1(c). This is a same graph to the input DFG
that is shown in Figure 2(b).
Nodes
5
4
3
2
1
T
2
3
1
R1
P C
0.9 4
0.1 4
1.0 20
1
3
1
2
0.8 24
0.2 24
0.9 16
0.1 16
2
6
2
4
R2
P
C
0.9 1
0.1 1
1.0 5
0.8 6
0.2 6
0.9 4
0.1 4
1
1.0
2
1.0
(a)
Fig. 2.
12
T
4
6
2
3
Nodes
5
5
4
4
3
3
2
1
2
1
T
2
3
1
1
3
1
2
1
R1
P C
0.9 4
0.1 4
1.0 20
0.8 24
24
16
16
2
6
2
4
R2
P
C
0.9 1
0.1 1
1.0 5
0.8 6
0.2 6
0.9 4
0.1 4
12
2
1.0
0.2
0.9
0.1
1.0
(b)
T
4
6
2
3
(c)
(a) The times, probabilities, and energy consumption of nodes for different voltage levels. (b) A DAG. (c) The time cumulative
distribution functions (CDFs) and energy consumption of its nodes for different voltage levels
In Figure 2, each node has two different voltage levels to choose from, and is executed on them
with probabilistic execution times. The input DFG has five nodes. The times, energy consumption, and
probabilities of each node is shown in Figure 2(a). Node 5 is a multi-child node, which has two children: 3
and 4. Node 2 is a multi-parent node, and has two parents: 3 and 4. Figure 2(c) shows the time cumulative
7
distribution functions (CDFs) and energy consumption of each node for different voltage levels. In DSP
applications, a real time system does not always has hard deadline time. The execution time can be smaller
than the hard deadline time with certain probabilities. So, the hard deadline time is the worst-case of the
varied smaller time cases. If we consider these time variations, we can achieve a better minimum energy
consumption with satisfying confidence probabilities under timing constraints.
The number of computation cycles ( 4 ) for a task is proportional to the execution time. The energy
5
consumption ( ) depends on not only the voltage level
We use the expected value of energy consumption ( voltage level
, but also the number of computation cycles
5
*
-
) as the energy consumption
5
4
.
under a certain
. Under different voltage levels, a task has different expected energy consumptions. The
higher the voltage level is, the faster the execution time is, and the more expected energy is consumed.
According to the energy model of DVS [21], [22], [24], the computation time is proportional to
)POR
to
:
- , where ;<;
i:
;X; . So here
is the supply voltage,
_POR
as it is under the high voltage (
;<;
*
;X; d
is the threshold voltage; the energy consumption is proportional
we assume the computation time of a node under the low voltage (
'
:
) is twice as much
); the energy consumption of a node under the high voltage (
four times as much as it is under the low voltage (
:
$
) is
).
T
(P , C)
(P , C)
(P , C)
4
0.65, 76
5
0.65, 43
0.72, 61
6
0.72, 34
0.81, 61
7
0.80, 34
0.90, 52
8
0.72, 22
0.80, 34
1.00, 52
9
0.80, 22
0.90, 40
1.00, 52
10
0.72, 19
0.80, 22
0.90, 34
11
0.72, 19
0.80, 22
1.00, 34
12
0.80, 19
0.90, 22
1.00, 34
13
0.80, 19
1.00, 22
14
0.90, 19
1.00, 22
15
0.90, 19
1.00, 22
16
1.00, 19
(P , C)
1.00, 40
TABLE I
M INIMUM EXPECTED TOTAL ENERGY CONSUMPTION WITH COMPUTED CONFIDENCE PROBABILITIES UNDER VARIOUS TIMING
CONSTRAINTS FOR A
DAG.
B. Our Solution
For Figure 2, the minimum total energy consumptions with computed confidence probabilities under
the timing constraint are shown in Table I. For each row of the table, the
5
in each ( 8 ,
5
) pair gives the
8
minimum total energy consumption with confidence probability
8
under timing constraint
. Using our
algorithm, VAP M, at timing constraint 11, we can get (0.80, 22) pair. Table II shows the assignments
of our algorithm. Assignment
$* -
represents the voltage selection of each node . Using our algorithm,
we achieve minimum total energy consumption 22 with probability 0.80 satisfying timing constraint 11.
While using the heuristic algorithm HUA, the total energy consumption obtained is 61, because HUA still
need to use voltage level
and cannot change all node’s voltage level to
:
under timing constraint 11.
The energy saving improvement of our algorithm is 59.1%. This case shows that in many situations, the
solutions obtained by our algorithms have significant improvement compared with the results gotten by
Heu.
Node id
T
Type
Prob.
Consumption
5
3
R1
1.00
4
4
2
R2
1.00
5
3
4
R2
0.80
6
2
2
R2
1.00
4
1
4
R2
1.00
3
0.80
22
Ours
Total
11
HUA
5
3
R1
1.00
4
4
2
R1
1.00
5
3
1
R1
0.80
24
2
1
R1
1.00
12
1
2
R1
1.00
16
0.80
61
Total
6
TABLE II
U NDER TIMING CONSTRAINT 11,
THE DIFFERENT ASSIGNMENTS BETWEEN VAP
M AND Heu.
Given the requirement of probability, we can get a minimum total energy consumption under every
timing constraint. For example, if the required probability is 0.80, then the total energy consumption varies
from 61 to 19 at different timing constraints. The energy consumptions under different timing constraints
are shown in Table III.
T
6
7,8
9 - 11
12 - 16
(P , C)
(0.80, 61)
(0.80, 34)
(0.80, 22)
(0.80, 19)
TABLE III
G IVEN AN
REQUIREMNT OF PROBABILITY, THE
(P ROBABILITY, C ONSUMPTION ) PAIRS UNDER DIFFERENT TIMING CONSTRAINTS .
9
IV. T HE A LGORITHMS
A. Definitions and Lemma
To solve the VAP problem, we use dynamic programming method traveling the graph in bottom up
fashion. For the easiness of explanation, we will index the nodes based on bottom up sequence. For
Q
example, Figure 2 (a) shows a tree indexed by bottom up sequence. The sequence is:
:
.
Define a root node to be a node without any parent and a leaf node to be a node without any child. A
multi-child node is a node with more than one child. For example, in Figure 2 (a), node 3 and 5 are
multi-child nodes. Similarly, a multi-parent node is a node with more than one parent.
Given the timing constraint
, a DFG
3
, and an assignment
, we first give several definitions as
follows:
: The sub-graph rooted at node
Z
, containing all the nodes reached by node
. In our algorithm,
each
step will add one node which becomes the root of its sub-graph. For example, in Figure 2 (a),
is the tree containing nodes 1, 2, and 3.
(* >
(* and
-
%
: The total energy consumption and total execution time of
-
under the
assignment A. In our algorithm, each step will achieve the minimum total energy consumption of
with computed confidence probabilities under various timing constraints.
In our algorithm, table
will be built. Each entry of table
will store a link list of (Probability,
Consumption) pairs sorted by probability in ascending order. Here we define the (Probability,
, )
Consumption) pair ( >
computed by all assignments
(8
,
5
where
8
), and
=
8 :
*
is ( 8
8 In our algorithm,
:
:
and
,
5 :
5
is the minimum energy consumption of
!* -$
with probability
7 8
), then,
after the
=
5
+
5
:
operation between
. We denote this operation as
!*
5
-
.
and
:
, if
: , we get pair (8
.
5 and
is the table in which each entry has a link list that store pair ( 8
represents a node number, and
6)
satisfying
5!
We
introduce
the operator “ ” in this paper. For two (Probability, Cost) pairs
as follows:
,
represents time. For example, a link list can be (0.1, 2)
,
5
). Here,
(0.3, 3)
is
),
(0.8,
(1.0, 12). Usually, there are redundant pairs in a link list. We give the redundant-pair removal algorithm
as following:
For example, we have a list with pairs (0.1, 2)
(0.3, 3)
pair removal as following: First, sort the list according
(0.1, 2)
(0.3, 3)
(0.3, 4)
8
(0.5, 3)
(0.3, 4), we do the redundant-
in an ascending order. This list becomes to
(0.5, 3). Second, cancel redundant pairs. Comparing (0.1, 2) and (0.3,
3), we keep both. For the two pairs (0.3, 3) and (0.3, 4), we cancel pair (0.3, 4) since the cost 4 is bigger
than 3 in pair (0.3, 3). Comparing (0,3, 3) and (0.5, 3), we cancel (0.3, 3) since
There is no information lost in redundant-pair removal
[ Y
[ Y
while
7 .
10
Algorithm IV.1 redundant-pair removal algorithm
Require: A list of ( ,
)
Ensure: A redundant-pair free list
1: Sort the list by
in an ascending order such that
.
2: From the beginning to the end of the list,
3: for each two neighboring pairs (
6:
then
if then
cancel the pair 7:
else
if
4:
5:
cancel the pair
8:
, ) and ( , ) do
end if
9:
10:
else
11:
if
then
, )
cancel the pair (
12:
end if
13:
end if
14:
15: end for
In summary, we can use
the following Lemma to cancel redundant pairs.
Lemma 4.1: Given ( 8
8 8 : 1) If
8 8 : 2) If
5
,
) and ( 8
:
:
5
,
) in the same list:
, then the
pair with minimum
7
5
and
:
5
:
, then
5
5!
is selected to be kept.
is selected to be kept.
Using Lemma 4.1, we can cancel many redundant-pair ( 8
in a list during a computation. After the
has the following properties:
Lemma 4.2: For any ( 8
8 8 8 8 : 1)
2)
:
and
,
5
5
5 :
if and only if
.
5
5
,
:
:
,
5 :
has already covered ( 8
,
5
) whenever we find conflicting pairs
.
8
8 ,
5
)
) in the same list:
Since the link list is in an ascending order
by probabilities, if
the definitions, we can guarantee
with
5
operation and redundant pair removal, the list of (
) and ( 8
8
8
:
, while
to find smaller
energy
consumption
). We can cancel the pair ( 8
,
5
5
5a :
7
5 :
, based on
. Hence ( 8
:
,
5 :
)
). For example, we have two pairs: (0.1,
3) and (0.6, 2). Since (0.6, 2) is better than (0.1, 3), in other words, (0.6, 2) covers (0.1, 3), we can
cancel
(0.1, 3) and will not lose useful information. We can prove the vice versa is true similarly. When
8 8 : , smaller
5
is selected.
In every step in our algorithm, one more node will be included for consideration. The information of
11
this node is stored in local table
, which is similar to table
such as probabilities and costs, of a node itself. Table
of node
an ascending order;
is the cost only for node
The building procedures of
. A local table only store information,
is the local table only storing the information
is a local table of link lists that store pair (
. In more detail,
at time , and
, insert
3
into
3
a
,
) sorted by
4), (4: 0.3, 4) for type
:
in
is the corresponding probability.
3
be the link list in each entry of
while redundant pairs canceled out based on Lemma 4.1. For example, For
example, if a node has the following (T: P, C) pairs: (1: 0.9, 10), (3: 0.1, 10) for type
are as follows. First, sort the execution time variations in an ascending
order. Then, accumulate the probabilities of same type. Finally, let
, and (2: 0.7,
. After sorting and accumulating, we get (1: 0.9, 10), (2: 0.7, 4), (3: 1.0, 10),
and (4: 1.0, 4). We obtain Table IV after the insertion.
Similarly, for two link lists
implement
3
and
3 :
, the operation
is implemented as follows: First,
operation on all possible combinations of two pairs from different link lists. Then insert the
new pairs into a new link list and remove redundant-pair using Lemma 4.1.
1
2
3
4
(0.9, 10)
(0.7, 4)
(0.7, 4)
(1.0, 4)
(1.0, 10)
TABLE IV
A N EXAMPLE OF
LOCAL TABLE ,
For an input PDFG with multiple processors, given the order of nodes and expected energy consumption
of each node, the basic steps of our schedule are shown in algorithm VAP SG.
Algorithm IV.2 algorithm to get scheduling graph (VAP SG)
Require: a task graph PDFG
Ensure: a scheduling graph
1: build a graph to show the order using list scheduling.
2: show the dependency in the graph.
3: remove all redundant edges according Lemma 4.3.
In the graph built in step 1 of VAP SG, If there is a edge from
before
in the same processor or
depends on
to
, this means that
is scheduled
in the original PDFG. The new graph is a DAG that
represents the order of nodes and dependencies.
Lemma 4.3: For two nodes
then the edge
and
, there is an edge
, if we can find another separate path
,
can be deleted.
For example, Figure 3 shows the input PDFG. There are two processors
nodes are given. In Figure 3(b), there are paths
5
amd
8
@
and
8
:
. The order of
. Also, there is a separate path
12
5
. Then we can cancel the path
5
.
PR2
PR1
A
A
C
B
E
D
G
F
B
C
D
E
F
G
(a)
Fig. 3.
(b)
(a) A task graph. (b) The schedule graph for two processors using list scheduling.
B. The VAP S Algorithm
The Algorithm for uniprocessor system is shown in VAP S. It can give the optimal solution for the
VAP problem when there is only one processor.
The VAP S Algorithm
In algorithm VAP S, first build a local table
when
b
for each node. Next, in step 2 of the algorithm,
, there is only one node. We set the initial value, and let
W
programming method, build the table
( W
) in table
time of nodes from
. For each node
. We use “ ” on the two tables
to
#
. Then using dynamic
under each time , we try all the times
and
I
I
. Since
b
*
d
b -
b
, the total
is . The “ ” operation add the energy consumptions of two tables together
and multiply the probabilities of two tables with each other. Finally, we use Lemma 4.1 to cancel the
conflicting (Probability, Consumption) pairs. The new energy consumption in each pair obtained in table
obtained in
I
of each pair in
8
Z
is the energy consumption of current node
I at time
b
plus the energy consumption in each pair
. Since we have used Lemma 4.1 canceling redundant pairs,
the energy consumption
is the minimum total energy consumption for graph
under timing constraint .
The energy consumption in
with confidence probability
is the minimum total energy consumption with computed confidence
probability under timing constraint j. Given the timing constraint 3 , the minimum total energy consumption
for the graph G is the energy consumption in
Theorem 4.1: For each pair ( 8
,
5
) in
minimum total energy consumption for graph
.
. In the following, we show Theorem 4.1 about this.
( W
4
) obtained by algorithm VAP S,
with confidence probability
8
5
is the
under timing constraint
13
Algorithm IV.3 optimal algorithm for the VAP problem when there is a single processor (VAP S)
Require:
different levels of voltages, a DAG, and the timing constraint
.
Ensure: An optimal voltage level assignment
1) Construct schedule graph using algorithm VAP SG.
2) Build a local table
1: let 3)
for each node of PDFG.
2: for each node do
for each time
3:
do
for each time
4:
if
5:
do
then
6:
in
else
7:
continue
8:
end if
9:
10:
end for
11:
insert
12:
end for
to
and remove redundant pairs using Lemma 4.1 and redundant-pair remove algorithm.
13: end for
4) return
Proof: By induction. Basic Step: When
, there is only one node and
W
#
.
*) '& !( + . Thus, Thus, when W , Theorem 4.1 is true. Induction Step: We need to
5 5 7
W , if for each pair ( 8
show that for
,
)
in
,
is the minimum total energy consumption
a 5 a for graph
with confidence probability 8
under timing constraint , then
for
each
pair
(
,
)
8
a
a
!
5
a
in
,
is the minimum total energy consumption for graph
with confidence probability
a 2a d
!#"
8
$!%
'& !(
if
under timing constraint . In step 2 of the algorithm, since
we try all the possibilities to obtain . Then we use
b
*
b -
for each
b
in
,
operator to add the energy consumptions of two
tables and multiply the probabilities of two tables. Finally, we use Lemma 4.1 to cancel the conflicting
(Probability, Consumption) pairs. The new energy consumption in each pair obtained in table
energy consumption of current node
I
a
W
at time
b
a
is the
plus the energy consumption in each pair obtained in
. Since we have used Lemma 4.1 to cancel redundant pairs,
the energy consumption of each pair in
a
is the minimum total energy consumption for graph
timing constraint . Thus, Theorem 4.1 is true for any
From Theorem 4.1, we know
( W
with confidence probability
4
8 a under
).
records the minimum total energy consumption of the whole path
with corresponding confidence probabilities under the timing constraint 3 . We can record the corresponding
FU type assignment of each node when computing the minimum total energy consumption in step 2 in
14
the algorithm VAP S. Using these information, we can get an optimal assignment by tracing how to reach
.
!
It takes
and
c-
*
to compute one value of
, where
is the maximum number of FU types,
is the maximum number of execution time variations for each node. Thus, the complexity of the
*
algorithm VAP S is
3
c-
, where is the number of nodes and
is the given timing
`
3
constraint. Usually, the execution time of each node is upper bounded by a constant. So
(
*
equals
3
-
is a constant). In this case, VAP S is polynomial.
C. The VAP M Algorithm
In this subsection, we give a heuristic algorithm Heu first, then we propose our novel and optimal
algorithm, VAP M, for multiprocessor DSP systems. We will compare them in the experiments section.
The Heu Algorithm
We first design an heuristic algorithm for multiprocessor systems according to the HUA algorithm in
[9], [10], we call this algorithm as Heu. The DFG now is a DAG and no longer limited to a simple path.
The authors of [9], [10] did not give the algorithm for multiple processors situation, and their data flow
graph is a simple path. Heu is an algorithm that use the idea of the algorithm of [9], [10] and using for
multiple processors situation, the data flow graph is a DAG.
In Heu algorithm,
,
is the largest possilbe time variation for node
b
is the scaled time slot for node
Let
*
-
the entire assigned slot , where
workload
:
A
, and
A
*
;
7
-
, is the time variation of node
is the j time variation of node
be the minimum energy to complete the workload A
ideal variable voltage processor,
Z
.
required by vertex
is the energy consumed by running at voltage
in time . On an
;
7
throughout
is the voltage level that enables the processor to accumulate the
at the end of the slot. On a multiple voltage processor with only a finite set of voltage levels
,
*
-
is the energy consumed by running at
switching to the next higher level
a
to complete
point can be conveniently calculated from and
A
A
, where
for a certain amount of time and then
;
7
a
and the switching
[9], [10].
We propose our algorithm, VAP M, for multiprocessor DSP systems, which shown as follows. In VAP M,
we exhaust all the possible assignments of multi-parent or multi-child nodes. Without loss of generality,
assume we using bottom up approach. If the total number of nodes with multi-parent is A , and there are
maximum
variations for the execution times of all nodes, then we will give each of these t nodes a
fixed assignment.
The VAP M Algorithm
Require:
different voltage levels, a DAG, and the timing constraint
3
.
15
Algorithm IV.4 Heuristic algorithm for the VAP problem when there are multiple processors and the
DFG is DAG (Heu)
Require:
different levels of voltages, a DAG, and the timing constraint
.
Ensure: a voltage assignment to minimize energy consumption with a guaranteed probability
1: Schedue graph construction.
2: /* assign worst-case time to 3:
, let ;
for each vertex
satisfying
*/
;
4: while (
)
$
5:
7:
that has the maximum
9:
)
( ;
)
&
)
;
if (
8:
pick
6:
;
10: +
11: /* calculate the total execution time
)
12: /* if
if (
*/
;
cannot be met */
) exit;
13: for each vertex , let
)
;
Ensure: An optimal voltage assignment for the DAG
1) Topological sort all the nodes, and get a sequence A.
2) Count the number of multi-parent nodes
A
8
and the number of multi-child nodes
A
`
. If
A
8
A
`
,
use bottom up approach; Otherwise, use top down approach.
3) For bottom up approach, use the following algorithm. For top down approach, just reverse the
sequence.
4) If the total number of nodes with multi-parent is A , and there are maximum
variations for the
execution times of P all nodes, then we will give each of these t nodes a fixed assignment.
5) For each of the
:
possible fixed assignments, Assume the sequence after topological sorting is
, in bottom up fashion. Let
#
. Assume
is the table that
stored minimum total energy consumption with computed confidence probabilities under the timing
constraint
for the sub-graph rooted on
Z
except
. Nodes
! #" #$
are all child nodes of
16
node
and
*G[
6) For
!
#"
,
, then
!
!
if
[ -
Z
is the number of child nodes of node
#$
if
[
7
if
(4)
W
W
is the union of all nodes in the graphs rooted at nodes
the graphs rooted at nodes
!
and
"
Z
!
#"
and
. Travel all
. If a node is a common node, then use a selection function
to choose the type of a node.
7) Then, for each
in
b
.
8) For each possible fixed assignment, we get a
all the possible
I
(5)
. Merge the (Probability, Consumption) pairs in
together, and sort them in ascending sequence according probability.
9) Then use the Lemma 4.1 to remove redundant pairs. Finally get
.
Now we explain our optimal algorithm VAP M in details. In VAP M, we exhaust all the possible
assignments of multi-parent or multi-child nodes. Without loss of generality, assume we using bottom up
approach. If the total number of nodes with multi-parent is A , and there are maximum
A
the execution times of P all nodes, then we will give each of these
exhausted all of the
variations for
nodes a fixed assignment. We will
possible fixed assignments.
Algorithm VAP M gives the optimal solution when the given DFG is a DAG. In the following, we give
the Theorem 4.2 and Theorem 4.3 about this.
Theorem 4.2: In each possible fixed assignment, for each pair ( 8
by algorithm VAP M,
probability
8
5!
Proof: By induction. Basic Step: When
W
,
5
W
in
,
) in
, There is only one node and
, Theorem 4.2 is true. Induction Step: We need to show that for
5 in
,
a 5
( W
is the minimum total energy consumption for the graph
under timing constraint .
7
W
with confidence
. Thus, when
, if for each pair ( 8
is the minimum total energy consumption of the grapha , then for each pair (8
a
is the total energy consumption of the graph
) obtained
4
a
a
5
,
5
,
8
with confidence probability
)
a
)
under timing constraint . According to the bottom up approach (for top down approach, just reverse the
sequence), the execution of
From equation (4),
nodes of
a
a for each child node of
a
has been finished before executing
a
.
gets the summation of the minimum total energy consumption of all child
because they can be executed simultaneously within time . We avoid the repeat counting
of the common nodes. Hence, each nodes in the graph rooted by node
a
was counted only once. From
equation (5), the minimum total energy consumption is selected from all possible energy consumptions
caused by adding
a
into the sub-graph rooted on
a
. So for each pair ( 8
a
,
5! a
) in
a
,
17
5
a
a
is the total energy consumption of the graph
with confidence probability
constraint . Therefore, Theorem 4.2 is true for any
Theorem 4.3: For each pair (8
5
,
) in
( W
( W
4
8 a ).
) obtained by algorithm VAP M,
3
with confidence probability
minimum total energy consumption for the given DAG
constraint .
8
a
we obtained,
5!a
is the total energy consumption of the graph
5!
is the
under timing
,
5
) in
with confidence probability
under timing constraint . In algorithm VAP M, we try all the possible fixed assignments, combine
Hence, for each pair ( 8
) in
,
5
in dynamic table, and remove redundant pairs using the Lemma 4.1.
(W
.
In algorithm VAP M, there are
) obtained by algorithm VAP M,
3
total energy consumption for the given DAG
loops
and each loop needs
P a
&
*
:
3
-
*
, where
A
:
3
8
with confidence probability
P
complexity of Algorithm VAP M is
is the given timing constraint,
5
is the minimum
under timing constraint
c-
running time. The
is the total number of nodes with
multi-parent (or multi-child) in bottom up approach (or top down approach), 3
8
Proof: According to Theorem 4.2, in each possible fixed assignment,
for each pair ( 8
a
them together into a new row
under timing
is the number of nodes,
is the maximum number of FU types for each node, and
is the
maximum number of execution time variation for each node. Algorithm VAP M is exponential, hence it
can not be applied to a graph with large amounts of multi-parent and multi-child nodes.
V. E XPERIMENTS
This section presents the experimental results of our algorithms. We conduct experiments on a set of
benchmarks including 4-stage lattice filter, 8-stage lattice filter, voltera filter, differential equation solver,
RLS-languerre lattice filter, and elliptic filter. Among them, the PDFG for first three filters are trees and
those for the others are DAGs.
Three different voltage levels,
'
:
, and
, are used in the system, in which a processor under
is the quickest with the highest energy consumption and a processor under
'
is the slowest with the
lowest energy consumption. The execution times, probabilities, and energy consumptions for each node
are randomly assigned. For each benchmark, the first timing constraint we use is the minimum execution
time. The experiments are performed on a Dell PC with a P4 2.1 G processor and 512 MB memory
running Red Hat Linux 7.3.
We compare our uniprocessor algorithm with the heuristic algorithm HUA in [9], [10] on all of the
six benchmarks. There is only one processor:
8
W
. These experiments are finished in less than one
second. The experimental results on voltera filter, 4-stage lattice filter, and 8-stage lattice filter are shown
in Table V-VII. In each table, column “TC” represents the given timing constraint, “HUA” represents the
18
Voltera Filter (27 nodes)
TC
0.8
0.9
1.0
HUA Ours
HUA Ours
HUA Ours
103
708
701
1.0
108
708
647
8.6
708
701
1.0
110
708
623
12.1
708
671
5.2
708
701
1.0
150
708
310
56.2
708
338
52.3
708
347
51.0
200
708
186
73.7
708
204
71.2
708
206
70.9
206
177
175
1.2
708
196
72.3
708
198
72.0
216
177
162
8.5
177
175
1.2
708
182
74.3
220
177
156
11.9
177
172
2.8
177
175
1.2
250
177
118
33.4
177
130
26.6
177
132
25.6
300
177
73
59.8
177
76
57.1
177
82
54.7
350
177
45
74.6
177
45
74.6
177
45
74.6
Ave. Redu.( )
58.6
61.7
64.8
TABLE V
T HE MINIMUM EXPECTED TOTAL ENERGY CONSUMPTION WITH COMPUTED CONFIDENCE PROBABILITIES UNDER VARIOUS TIMING
CONSTRAINTS FOR VOLTERA FILTER .
heuristic algorithm HUA in [9], [10], and “Ours” represents our optimal algorithm VAP S. The minimum
total energy consumption obtained from different algorithms: VAP S and the optimal HUA [9], [10], are
presented in each entry. Columns “1.0”, “0.9”, and “0.8”, represent that the confidence probability is 1.0,
0.9, and 0.8, respectively.
Column “%” shows the percentage of reduction on the total energy consumption, compared the results
of algorithm HUA [9], [10]. The average percentage reduction is shown in the last row “Ave. Redu(%)” of
all Tables V-VII, which is computed by averaging energy-savings at all different timing constraints. The
0
entry with “ ” means no solution available. For the timing constraint below certain value, we can not
find a solution using HUA algorithm, while we can find solutions using our algorithm. For example, in
Table V, under timing constraint 103, we can not find solution for probability 0.9 using HUA algorithm,
but we can find solution 705 using our optimal algorithm.
From the Table V-VII, we found in many situations, algorithm VAP S has significant energy consumption
reduction than algorithm HUA [9], [10]. For example, in Table V, under the timing constraint 200, for
probability 0.8, the entry under “HUA” is 708, which is the minimum total energy consumption using
algorithm HUA. The entry under “Ours” is 186, which means by using VAP S algorithm, we can achieve
minimum total energy consumption 186 with confidence probability 0.8 under timing constraint 200. The
energy reduction is 73.7%.
The experimental results show that our algorithm can greatly reduce the total energy consumption while
19
4-stage Lattice IIR Filter (26 nodes)
TC
0.8
0.9
1.0
HUA Ours
HUA Ours
HUA Ours
96
684
676
1.0
101
684
616
9.8
684
676
1.0
107
684
592
13.5
684
671
6.7
684
676
1.0
150
684
309
54.8
684
355
48.2
684
369
45.8
180
684
212
69.0
684
227
66.8
684
238
65.2
192
171
168
1.8
684
186
62.9
684
190
62.2
202
171
154
10.0
171
168
1.8
684
174
64.6
214
171
150
12.1
171
152
11.8
171
168
1.8
240
171
108
26.9
171
120
29.9
171
124
27.5
280
171
64
62.6
171
68
60.2
171
74
56.7
320
171
41
76.0
171
41
76.0
171
41
76.0
Ave. Redu.( )
55.9
59.7
62.4
TABLE VI
T HE MINIMUM EXPECTED TOTAL ENERGY CONSUMPTION WITH COMPUTED CONFIDENCE PROBABILITIES UNDER VARIOUS TIMING
CONSTRAINTS FOR
4- STAGE LATTICE FILTER .
have a guaranteed confidence probability. On average, algorithm VAP S gives an energy consumption
reduction of 58.0% with 0.8 confidence probability satisfying timing constraints, and energy consumption
reductions of 61.4% and 64.1% with 0.9 and 1.0 confidence probabilities satisfying timing constraints,
respectively. The experiments using VAP S on these benchmarks are finished within several minutes.
For the multiprocessor systems, we compare our VAP M algorithm with the Heu algorithm. We also
conduct experiments on all of the six benchmarks. There are two processors:
8
W
and
8
]
. The
experimental results for different equation solver, RLS-Laguerre filter, and elliptic filter are shown in
Table VIII-X. In each table, column “TC” represents the given timing constraint, “Heu” represents the
heuristic algorithm Heu in, and “Ours” represents our optimal algorithm VAP M. The minimum total
energy consumption obtained from different algorithms: VAP M and the optimal Heu, are presented in
each entry. Columns “1.0”, “0.9”, and “0.8”, represent that the confidence probability is 1.0, 0.9, and 0.8,
respectively.
Column “%” shows the percentage of reduction on the total energy consumption, compared the results
of algorithm Heu. The average percentage reduction is shown in the last row “Ave. Redu(%)” of all
0
Tables VIII-X. The entry with “ ” means no solution available. Under timing constraint 63 in Table VIII,
there is no solution for probability 0.9 using algorithm Heu. However, we can find solution 426 with
0.9 probability that guarantees the total execution time of the DFG are less than or equal to the timing
constraint 63.
20
8-stage Lattice IIR Filter (42 nodes)
TC
0.8
0.9
1.0
HUA Ours
HUA Ours
HUA Ours
166 1100 1090 0.9
172 1100 1003 8.2
1100 1090 0.9
175 1100
960 12.8 1100 1032 6.2
200 1100
548 50.2 1100
597 45.7 1100
626 43.1
180 1100
404 63.2 1100
421 61.7 1100
442 59.8
332
275
272
1.1
1100
305 72.3 1100
308 72.0
344
275
250
9.0
275
272
1.1
1100
282 74.4
350
275
239 13.2
275
242 12.1
275
272
400
275
183 33.5
275
200 27.3
275
205 25.5
420
275
112 59.3
275
118 57.1
275
127 53.9
450
275
68
75.3
275
68
275
68
Ave. Redu.( )
59.5
75.3
1100 1090 0.9
62.8
1.1
75.3
65.2
TABLE VII
T HE MINIMUM EXPECTED TOTAL ENERGY CONSUMPTION WITH COMPUTED CONFIDENCE PROBABILITIES UNDER VARIOUS TIMING
CONSTRAINTS FOR
8- STAGE LATTICE FILTER .
From the Table VIII-X, we found in many situations, algorithm VAP M has significant energy consumption reduction than algorithm Heu. For example, in Table VIII, under the timing constraint 80, for
probability 0.8, the entry under “Heu” is 432, which is the minimum total energy consumption using
algorithm Heu. The entry under “Ours” is 162, which means by using VAP M algorithm, we can achieve
minimum total energy consumption 186 with confidence probability 0.8 under timing constraint 80. The
energy reduction is 72.6%, which is very impressive.
The experimental results show that our algorithm can greatly reduce the total energy consumption while
have a guaranteed confidence probability. On average, algorithm VAP M gives an energy consumption
reduction of 56.1% with 0.8 confidence probability satisfying timing constraints, and energy consumption
reductions of 59.3% and 61.7% with 0.9 and 1.0 confidence probabilities satisfying timing constraints,
respectively. The experiments using VAP M on these benchmarks are finished within several minutes.
We also give the impact of guaranteed probability on energy consumption in Figure 4. The experiment
was implemented on Voltera Filter. The timing constraints are fixed. We selected three timing constraints:
150, 206, and 250. Figure 4 shows the energy consumption values from algorithm HUA and VAP S. The
three lines with square box stands for the results from algorithm HUA, and the three lines with “ ” stands
for the results from algorithm VAP S. We can find that the energy consumption from algorithm VAP S is
significant lower than the results from HUA. In curve HUA206, there is a jump or energy consumption
from 708 to 177 at probability 0.9. The reason is: at this time, we can switch the voltage level from high
21
Different Equation Solver (11 nodes)
TC
0.8
0.9
Heu
1.0
Heu
Ours
Ours
63
432
426
1.5
66
432
394
8.8
432
426
1.5
68
432
380 12.1
432
397
8.2
432
426
60
432
196 54.6
432
208 51.8
432
214 50.4
80
432
162 72.6
432
126 70.8
432
133 69.2
126
108
104
3.7
432
118 62.7
432
123 71.6
132
108
98
9.2
108
104
3.7
432
112 74.1
138
108
95
12.5
108
103
4.2
108
104
3.7
150
108
72
33.4
108
80
26.0
108
84
22.3
180
108
46
57.4
108
48
55.6
108
58
46.3
200
108
32
70.4
108
32
70.4
108
32
70.4
Ave. Redu.( )
52.6
Heu
Ours
55.3
1.5
57.4
TABLE VIII
T HE MINIMUM EXPECTED TOTAL ENERGY CONSUMPTION WITH COMPUTED CONFIDENCE PROBABILITIES UNDER VARIOUS TIMING
CONSTRAINTS FOR DIFFERENT EQUATION SOLVER .
level
W
to
]
. While at the probability below 0.9, we cannot switch according to algorithm HUA. The
results from algorithm VAP S can freely choose voltage level for each node and hence achieve impressive
energy saving.
Figure 5 depicts timing constraints’ impact on energy consumption. We fixed the guaranteed probability
as 0.80. Figure 4 shows the energy consumption values from algorithm HUA and VAP S. The solid line
with “+” stands for the results from algorithm VAP S. It decreases quickly from 701 to 45 as the timing
constraint increases from 103 to 350. The dashed line with square box stands for the results from algorithm
HUA. It keeps 708 unchanged from timing constraint 103 to 205; Then at 206, it drops sharply to 177;
After that it keeps 177 from timing constraint 206 to 350. The reason of the drop from 708 to 177 is
because the voltage level switched from high voltage
timing constraint 206, we can switch the voltage from
W
W
to low voltage
to low voltage
]
]
for all the nodes. Before
for all nodes since we cannot
find way to guarantee the probability of satisfying timing constraint to reach 0.80 under low voltage
]
.
The dot dashed line with “ ” is the result curve of energy saving percentage of algorithm VAP S to HUA.
The percentage first increases gradually; Then at the middle, it droped to nearly 0; After that, it increases
gradually again. We can find that the energy consumption from algorithm VAP S is significant lower that
the results from HUA. The larger the timing constraints, the lower the energy needed. This matches our
expectation.
22
RLS-Laguerre Filter (19 nodes)
TC
0.8
0.9
Heu
1.0
Heu
Ours
130
900
892
136
900
804 10.6
900
892
0.9
140
900
756 16.0
900
727
9.2
900
892
180
900
408 54.6
900
430 52.2
900
444 50.7
220
900
268 70.2
900
277 69.2
900
261 70.0
260
225
218
3.2
900
248 72.4
900
251 72.1
272
225
202 10.2
225
218
3.2
900
232 74.2
280
225
186 17.2
225
212
6.0
225
218
320
225
152 32.4
225
164 27.2
225
172 23.6
360
225
94
58.2
225
98
56.4
225
106 52.9
420
225
58
74.2
225
58
74.2
225
58
Ave. Redu.( )
54.3
0.9
Ours
Heu
Ours
57.6
0.9
3.2
74.2
59.2
TABLE IX
T HE MINIMUM EXPECTED TOTAL ENERGY CONSUMPTION WITH COMPUTED CONFIDENCE PROBABILITIES UNDER VARIOUS TIMING
CONSTRAINTS FOR
RLS-L AGUERRE FILTER .
VI. C ONCLUSION
This paper proposed two optimal algorithms to minimize energy in real-time DSP systems. By taking
advantage of the uncertainties in execution time of tasks, our approach relaxes the rigid hardware requirements for software implementation and eventually avoids over-designing the system. For the voltage
assignment with probability (VAP) problem, by using Dynamic Voltage Scaling (DVS), we proposed
two algorithms, VAP S and VAP M, to give the optimal solutions for uniprocessor or multiprocessor
DSP systems, respectively. Experimental results show that our proposed algorithms achieved significant
improvement on energy consumption savings than previous work. Our approach has great potential for
further exploration.
VII. ACKNOWLEDGEMENT
This work is partially supported by TI University Program, NSF EIA-0103709, Texas ARP 0097410028-2001, NSF CCR-0309461, NSF IIS-0513669, and Microsoft, USA.
R EFERENCES
[1] I. Foster, Designing and Building Parallel Program: Concepts and Tools for Parallel Software Engineering, Addison-Wesley, 1994.
[2] B. P. Lester, The Art of Parallel Programming, Englewood Cliffs, N.J.: Prentice Hall, 1993.
[3] M. E. Wolfe, High Performance Compilers for Parallel Computing, Redwood City, Calif.: Addison-Wesley, 1996.
23
Elliptic Filter (34 nodes)
TC
0.8
0.9
1.0
Heu Ours
Heu Ours
Heu Ours
208 1424 1406 2.3
218 1424 1284 9.8
1424 1406 2.3
220 1424 1236 13.2 1424 1307 8.2
1424 1406 2.3
300 1424 595 58.2 1424 618 56.6 1424 678 52.4
350 1424 435 69.4 1424 452 68.2 1424 481 66.2
416
356
348
2.3
1424 430 69.0 1424 438 69.2
436
356
316 11.2
356
348
2.3
1424 424 70.2
440
356
303 14.8
356
335
6.8
356
348
500
356
236 33.7
356
262 26.4
356
266 25.3
550
356
148 58.4
356
154 56.7
356
164 54.0
600
356
98
72.5
356
98
356
98
Ave. Redu.( )
55.5
72.5
58.7
2.3
72.5
61.3
TABLE X
T HE MINIMUM EXPECTED TOTAL ENERGY CONSUMPTION WITH COMPUTED CONFIDENCE PROBABILITIES UNDER VARIOUS TIMING
CONSTRAINTS FOR ELLIPTIC FILTER .
[4] P. Gl Paulin and J. P. Knight, “Force-directed scheduling for the behavioral synthesis of asic’s,” IEEE Trans. on Computer-Aided
Design of Integrated Circuits and Systems, vol. 8, pp. 661–679, Jun. 1989.
[5] L.-F. Chao and Edwin H.-M. Sha, “Static scheduling for synthesis of dsp algorithms on various models,” Journal of VLSI Signal
Processing Systems, vol. 10, pp. 207–223, 1995.
[6] L.-F. Chao and Edwin H.-M. Sha, “Scheduling data-flow graphs via retiming and unfolding,” IEEE Trans. on Parallel and Distributed
Systems, vol. 8, pp. 1259–1267, Dec. 1997.
[7] S. Tongsima, Edwin H.-M. Sha, C. Chantrapornchai, D. Surma, and N. Passose, “Probabilistic loop scheduling for applications with
uncertain execution time,” IEEE Trans. on Computers, vol. 49, pp. 65–80, Jan. 2000.
[8] T. Zhou, X. Hu, and Edwin H.-M. Sha, “Estimating probabilistic timing performance for real-time embedded systems,” IEEE
Transactions on Very Large Scale Integration(VLSI) Systems, vol. 9, no. 6, pp. 833–844, Dec. 2001.
[9] Shaoxiong Hua, Gang Qu, and Shuvra S. Bhattacharyya, “Exploring the probabilistic design space of multimedia systems,” in IEEE
International Workshop on Rapid System Prototyping, 2003, pp. 233–240.
[10] Shaoxiong Hua, Gang Qu, and Shuvra S. Bhattacharyya, “Energy reduction techniques for multimedia applications with tolerance to
deadline misses,” in ACM/IEEE Design Automation Conference (DAC), 2003, pp. 131–136.
[11] Shaoxiong Hua and Gang Qu, “Approaching the maximum energy saving on embedded systems with multiple voltages,” in International
Conference on Computer Aid Design (ICCAD), 2003, pp. 26–29.
[12] T. Tia, Z. Deng, M. Shankar, M. Storch, J. Sun, L. Wu, and J. Liu, “Probabilistic performance guarantee for real-time tasks with
varying computation times,” in Proceedings of Real-Time Technology and Applications Symposium, 1995, pp. 164 – 173.
[13] J. Bolot and A. Vega-Garcia, “Control mechanisms for packet audio in the internet,” in Proceedings of IEEE Infocom, 1996.
[14] C.-Y. Wang and K. K. Parhi, “High-level synthesis using concurrent transformations, scheduling, and allocation,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 14, pp. 274–295, Mar. 1995.
[15] K. Ito, L. Lucke, and K. Parhi, “Ilp-based cost-optimal dsp synthesis with module selection and data format conversion,” IEEE Trans.
on VLSI Systems, vol. 6, pp. 582–594, Dec. 1998.
24
800
700
Ours150
HUA150
Ours206
HUA206
Ours250
HUA250
Energy consumption
600
500
400
300
200
100
0
0.2
Fig. 4.
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Required probability satisfying timing constraint
1
Guaranteed probability’s impact on energy consumption.
800
Ours
HUA
saving %
700
energy consumption
600
500
400
300
200
100
0
100
Fig. 5.
150
200
250
timing constraints
300
350
Timing constraints’ impact on energy consumption.
[16] K. Ito and K. Parhi, “Register minimization in cost-optimal synthesis of dsp architecture,” in Proc. of the IEEE VLSI Signal Processing
Workshop, Oct. 1995.
[17] Z. Shao, Q. Zhuge, C. Xue, and Edwin H.-M. Sha, “Efficient assignment and scheduling for heterogeneous dsp systems,” IEEE Trans.
on Parallel and Distributed Systems, vol. 16, pp. 516–525, Jun. 2005.
[18] M. Qiu, M. Liu, C. Xue, Z. Shao, Q. Zhuge, and E. H.-M. Sha, “Optimal assignment with guaranteed confidence probability for trees
on heterogeneous dsp systems,” in Proceedings The 17th IASTED International Conference on Parallel and Distributed Computing
Systems (PDCS 2005), Phoenix, Arizona, 14-16 Nov. 2005.
[19] A. Kalavade and P. Moghe, “A tool for performance estimation of networked embedded end-systems,” in Proceedings of Design
Automation Conference, Jun. 1998, pp. 257 – 262.
[20] Y. Chen, Z. Shao, Q. Zhuge, C. Xue, B. Xiao, and E. H.-M. Sha, “Minimizing energy via loop scheduling and dvs for multi-core
embedded systems,” in Proceedings the 11th International Conference on Parallel and Distributed Systems (ICPADS’05) Volume II,
The IEEE/IFIP International Workshop on Parallel and Distributed EMbedded Systems (PDES 2005), Fukuoka, Japan, 20-22 Jul. 2005,
pp. 2 – 6.
25
[21] D. Shin, J. Kim, and S. Lee, “Low-energy intra-task voltage scheduling using static timing analysis,” in DAC, 2001, pp. 438–443.
[22] Y. Zhang, X. Hu, and D. Z. Chen, “Task scheduling and voltage selection for energy minimization,” in DAC, 2002, pp. 183–188.
[23] H. Saputra, M. Kandemir, N. Vijaykrishnan, M. J. Irwin, J. S. Hu, C-H. Hsu, and U. Kremer, “Energy-conscious compilation based
on voltage scaling,” in LCTES’02, June 2002.
[24] T. Ishihara and H. Yasuura, “Voltage scheduling problem for dynamically variable voltage processor,” in ISLPED, 1998, pp. 197 –202.
Download