IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 55, NO. 1, JANUARY 2008
A Timing-Driven Approach to Synthesize
Fast Barrel Shifters
Sabyasachi Das and Sunil P. Khatri
Abstract—In modern digital signal processing and graphics
applications, the shifter is an important module, consuming a
significant amount of delay. This brief presents an architectural
optimization approach to synthesize a faster barrel shifter block,
which can be useful to reduce the delay of the design without
significantly increasing the area. We have divided the problem of
generating the shifter into two steps: i) timing-driven selection
of multiple stages for merging, and ii) the design of the merged
stage. In our proposed method, we define the notion of dual
merged stage, where two stages are merged and the triple merged
stage, where three stages are merged into a single composite
stage. These merged stages are identified by using a timing-driven
algorithm and are used in conjunction with some single stages
of the traditional barrel shifter. The use of these merged stages
helps reduce the depth of the proposed barrel shifter architecture,
thereby improving the delay. The timing-driven nature of our
algorithm helps produce a faster implementation for the overall
shifter block. We have evaluated the performance of our design
by using a number of technology libraries, timing constraints and
shifter bit-widths. Our experimental data shows that the shifter
block generated by our algorithm is significantly faster (10.19%
on average) than the shifter block generated by a commercially
available datapath synthesis tool. These improvements were
verified on placed-and-routed designs as well.
I. INTRODUCTION
AS WE MIGRATE toward ultra deep sub-micron feature
sizes, digital designs are becoming increasingly complex,
with very aggressive performance goals. Arithmetic components are typically highly computation-intensive, and are
widely used in modern integrated circuits (ICs). The shifter is
an integral part of many digital designs. A barrel shifter is a
combinational logic block that can shift data by any given
number of bits, in a single operation. There are many applications that require shift operations, including CPUs, floating
point operations (like normalization), variable length coding,
word packing/unpacking, bit indexing, address generation,
field extraction etc. Shifters are essential in the digital signal
processing field.
The barrel shifter is a commonly used shifter architecture.
One of the important reasons behind the widespread usage of
this architecture is the fact that it can perform multi-bit shifts in
a single operation (within one clock cycle). In addition, the area
of the barrel shifter is also reasonably small, which helps keep
the area of the design under control.
Manuscript received May 22, 2007, revised July 11, 2007. This paper was
recommended by Associate Editor L. Lavagno.
S. Das is with Synplicity Inc, Sunnyvale, CA 94087 USA (e-mail:
sabya@synplicity.com).
S. P. Khatri is with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 30332 USA (e-mail:
sunilkhatri@tamu.edu).
Digital Object Identifier 10.1109/TCSII.2007.908951
Several techniques have been proposed to design efficient
barrel shifters in different contexts. The basic architecture of a barrel
shifter was introduced in [1]. High-speed pipelined architectures using TSPC were discussed in [2] and [3]. A high-performance and area-efficient CMOS 32-bit barrel switch and its
physical design were presented in [4]. In [5], the number of stages in
a shifter was reduced, resulting in significantly faster speed. A
multilevel barrel shifter structure in the context of the CORDIC
design was introduced in [6]. In [7], different design tradeoffs
in the context of barrel shifters were analyzed. Timing-driven
layout techniques for cyclic shifters were proposed in [8] and [9].
In [10], data-driven dynamic logic is used to generate a faster
and more power-efficient barrel shifter than a domino-logic-based
design. A 4-bit barrel shifter in the QCA computing paradigm
was introduced in [11]. A mixed-signal 32-bit rotator/shifter circuit design with short latency was discussed in [12]. Several
low-power architectures for barrel shifters have been presented
in [13]–[16]. An energy-delay evaluation of a low-power
barrel switch is discussed in [17].
In this brief, we propose a timing-driven technique to synthesize a faster barrel shifter block. In our approach, we merge two
(or three) stages of the shift operation into a single stage, leading
to a reduction in the total number of stages. These stages are referred to as dual merged and triple merged stages. The decision
to merge stages is made in a timing-driven fashion, so that the
overall delay of the shifter is minimized. The optimizations involved in our approach are orthogonal to the ideas previously
presented in this section.
We have organized the rest of the brief as follows: In
Section II, we present some background information about the
barrel shifter architecture. In Section III, we discuss our proposed approach in detail. Section IV presents the experimental
results. Conclusions are drawn in Section V.
II. PRELIMINARIES
In this section, we briefly explain the concept of a barrel
shifter and discuss how it is typically synthesized [18]. In a
barrel shifter, if the data input signal is $n$ bits wide, then the shift
signal is typically $\lceil \log_2 n \rceil$ bits wide. The width of the output
of the shifter is typically the same as the input width ($n$). The shifter
is divided into $\lceil \log_2 n \rceil$ stages, where each stage ($i$) performs a
single shift of 0 or $2^i$ bits, depending on the value of the $i$th bit of
the shift signal. Each bit of the shift signal controls exactly one
barrel shifter stage. The input data is shifted (or not shifted) by
each of the stages in sequence. To implement this, multiplexers
(or an equivalent logic circuit constituted using technology library cells) are used in each stage. Fig. 1 shows the block-level
diagram of a 3-stage barrel shifter (left shifter), where each row
represents a stage. In this figure, the logic-0 input signal is denoted by 1'b0
(Verilog notation). In this diagram, the data input
signal is 8 bits wide and the output signal is also 8 bits
wide. The shift signal has 3 bits ($s_2$, $s_1$, $s_0$) and hence the
shifter consists of 3 stages. Similarly, to implement a shifter with
a 128-bit input data signal, the shifter would require 7 stages in
the shifter architecture.
1549-7747/$25.00 © 2007 IEEE
Authorized licensed use limited to: Texas A M University. Downloaded on May 20, 2009 at 03:44 from IEEE Xplore. Restrictions apply.
Fig. 1. Traditional left barrel shifter with 3 stages.
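The stage-by-stage behavior of the traditional architecture can be illustrated with a small behavioral model. This is only a sketch: the function names and the LSB-first list representation are assumptions made for illustration, and the model captures the logical function of each stage, not its gate-level implementation or timing.

```python
def to_bits(value, width):
    """Return `value` as a list of bits, index 0 = LSB (illustrative helper)."""
    return [(value >> x) & 1 for x in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << x for x, b in enumerate(bits))


def barrel_shift_left(data_bits, shift_bits):
    """Traditional left barrel shifter: one stage per shift bit.

    Stage i passes the data through unchanged when shift_bits[i] is 0 and
    shifts it left by 2**i positions (filling with logic-0) when it is 1.
    """
    n = len(data_bits)
    current = list(data_bits)
    for i, s_i in enumerate(shift_bits):
        amount = 1 << i
        if s_i:
            # Each output bit is a 2-to-1 MUX choice between the unshifted
            # bit and the bit `amount` positions below it.
            current = [current[x - amount] if x >= amount else 0
                       for x in range(n)]
    return current
```

For example, `from_bits(barrel_shift_left(to_bits(0b00010011, 8), to_bits(3, 3)))` models the 8-bit, 3-stage shifter of Fig. 1 with a shift amount of 3.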
III. OUR APPROACH
Throughout the rest of the brief, we will assume the data input
to the barrel shifter is $n$ bits wide and the shift input signal $s$ is
$\lceil \log_2 n \rceil$ bits wide. The output is also $n$ bits wide. We
denote $\lceil \log_2 n \rceil$ by $m$. Let $t_{s_i}$ be the arrival time of the $i$th bit
of the shift signal $s$.
In the traditional barrel shifter architecture, this block has $m$
stages and each stage consists of $n$ 2-to-1 Multiplexers (MUXes).
The timing-critical path of the shifter traverses through $m$ 2-to-1
MUXes. To estimate the delay of a traditional barrel shifter stage,
we identify the fastest 2-to-1 MUX cell from the provided technology library. The functionality of a 2-to-1 MUX cell with data inputs $d_0$ and $d_1$, select input $sel$ and output $out$ can equivalently be implemented by one of the following two logic expressions:
$out = (d_0 \cdot \overline{sel}) + (d_1 \cdot sel)$ (1)
$out = \overline{\overline{(d_0 \cdot \overline{sel})} \cdot \overline{(d_1 \cdot sel)}}$ (2)
In some technology libraries, the built-in 2-to-1 MUX cell
delay is larger than the MUX cells generated from the basic gates
by using the functionality presented in (1) (AND-OR operation)
or (2) (NAND-NAND operation). We consider the smallest of these
three delays as the delay of a single stage of the traditional barrel
shifter. We denote this delay as $Del_1$.
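The equivalence of the AND-OR and NAND-NAND forms of a 2-to-1 MUX can be checked exhaustively. The sketch below is illustrative only; the names `d0`, `d1` and `sel` for the two data inputs and the select input are assumptions introduced here.

```python
from itertools import product


def nand(*inputs):
    """N-input NAND gate; with a single input it behaves as an inverter."""
    return 0 if all(inputs) else 1


def mux_and_or(d0, d1, sel):
    """AND-OR form: (d0 AND NOT sel) OR (d1 AND sel)."""
    return (d0 & nand(sel)) | (d1 & sel)


def mux_nand_nand(d0, d1, sel):
    """NAND-NAND form: NAND(NAND(d0, NOT sel), NAND(d1, sel))."""
    return nand(nand(d0, nand(sel)), nand(d1, sel))


def forms_agree():
    """True if both forms select d0 when sel = 0 and d1 when sel = 1."""
    return all(
        mux_and_or(d0, d1, sel) == mux_nand_nand(d0, d1, sel) == (d1 if sel else d0)
        for d0, d1, sel in product((0, 1), repeat=3)
    )
```

Both forms compute the same function, so the choice between them (or a built-in MUX cell) is purely a matter of which has the smallest delay in the given library.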
In this brief, we introduce a technique to implement a faster
barrel shifter. The key idea is to merge multiple (two or three)
stages of the barrel shifter into one stage. We define mergeable
stages as those which can be merged to create a hybrid stage
(leading to faster performance of the shifter block). To identify
mergeable stages, we design a timing-driven algorithm, so that
the overall delay of the shifter block is minimized. In the following two subsections, we discuss each of the two steps (the
design and identification of merged stages) in detail.
A. Design of the Merged Stages
To facilitate the explanation, we will discuss the left shifter only. Note that a similar concept applies to the right shifter
as well. In our approach, we attempt to merge two or three stages
of the shifter into one single stage. If two stages are merged, we
call the newly created stage a dual merged stage. On the other
hand, if three stages are merged, then we call the new stage a
triple merged stage. Note that the stages to be merged are not
necessarily consecutive.
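In the case of dual merged stages, let us assume that we merge
the stages corresponding to the $i$th bit and the $j$th bit of the shift
signal $s$, where $0 \le i < m$, $0 \le j < m$ and $i \ne j$. Note
that $i$ and $j$ are not required to be two consecutive bits of the shift
signal. Our newly created dual merged stage will perform one
of the following four operations:
1) no shifting operation (if $s_i = 0$ and $s_j = 0$);
2) shift by $2^i$ bits (if $s_i = 1$ and $s_j = 0$);
3) shift by $2^j$ bits (if $s_i = 0$ and $s_j = 1$);
4) shift by ($2^i + 2^j$) bits (if $s_i = 1$ and $s_j = 1$).
The functionality of each bit-slice of our dual merged stage
for a left shifter is as follows, with $a_x$ and $z_x$ denoting bit $x$ of the stage input and output, respectively:
$z_x = \overline{\overline{p_0} \cdot \overline{p_1} \cdot \overline{p_2} \cdot \overline{p_3}}$ for $0 \le x < n$,
where $p_0 = \overline{s_i} \cdot \overline{s_j} \cdot a_x$,
$p_1 = s_i \cdot \overline{s_j} \cdot a_{x-2^i}$,
$p_2 = \overline{s_i} \cdot s_j \cdot a_{x-2^j}$,
and $p_3 = s_i \cdot s_j \cdot a_{x-2^i-2^j}$.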
Even if no merging is performed, for the left shifter, the functionality of a few bitslices near the least significant bits (LSB)
of the shifter gets simplified, because some of the values (in
the above expression) become logic-0 (1'b0). For example, in
the middle stage ($s_1$) of Fig. 1, two bitslices near the LSB
have simplified functionality. In case merging is performed, this
simplification can be exploited more aggressively.
The above expressions indicate that the timing-critical path
of each dual merged stage consists of a single inverter,
a single 3-input NAND gate and a single 4-input NAND gate. We
denote this delay as $Del_2$. The functionality of the dual merged
stage can also be implemented by two individual stages of the
barrel shifter placed one after the other. In all the technology
libraries that we have explored, the delay of the dual merged
stage ($Del_2$) is less than the delay of two cascaded stages of the
traditional barrel shifter ($2 \times Del_1$).
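A dual merged stage can be modeled at the bit-slice level as one inverter level, four 3-input NAND gates and one 4-input NAND gate, and compared against two cascaded single stages. The following is a behavioral sketch under stated assumptions: the function names, argument order and LSB-first bit lists are illustrative, and bits shifted in from below the LSB are taken as logic-0.

```python
def nand(*inputs):
    """N-input NAND gate; with a single input it behaves as an inverter."""
    return 0 if all(inputs) else 1


def single_stage(a, s, i):
    """Traditional stage: shift left by 2**i when select bit s is 1."""
    amount = 1 << i
    if not s:
        return list(a)
    return [a[x - amount] if x >= amount else 0 for x in range(len(a))]


def dual_merged_stage(a, s_i, s_j, i, j):
    """Merged stage for shift bits i and j (inverter -> NAND3 -> NAND4)."""
    def bit(x):
        # Positions below the LSB contribute logic-0 in a left shifter.
        return a[x] if x >= 0 else 0

    not_si, not_sj = nand(s_i), nand(s_j)
    out = []
    for x in range(len(a)):
        out.append(nand(
            nand(not_si, not_sj, bit(x)),                        # no shift
            nand(s_i, not_sj, bit(x - (1 << i))),                # shift by 2**i
            nand(not_si, s_j, bit(x - (1 << j))),                # shift by 2**j
            nand(s_i, s_j, bit(x - (1 << i) - (1 << j))),        # shift by 2**i + 2**j
        ))
    return out
```

For any input word and any pair of distinct shift bits, `dual_merged_stage(a, s_i, s_j, i, j)` is expected to equal `single_stage(single_stage(a, s_i, i), s_j, j)`, which is the functional equivalence the merged stage relies on.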
In a similar manner, we can formulate the output equations
of each bitslice of a triple merged stage. Let us assume that we
merge the stages corresponding to the $i$th bit, $j$th bit and the
$k$th bit of the shift signal $s$, where $0 \le i < m$, $0 \le j < m$,
$0 \le k < m$, $i \ne j$, $j \ne k$ and $i \ne k$. Note that $i$, $j$ and $k$
are not required to be three consecutive bits of the shift signal.
Our newly created triple merged stage will perform one of the
following eight operations:
1) no shifting operation (if $s_i = 0$, $s_j = 0$ and $s_k = 0$);
2) shift by $2^i$ bits (if $s_i = 1$, $s_j = 0$ and $s_k = 0$);
3) shift by $2^j$ bits (if $s_i = 0$, $s_j = 1$ and $s_k = 0$);
4) shift by $2^k$ bits (if $s_i = 0$, $s_j = 0$ and $s_k = 1$);
5) shift by ($2^i + 2^j$) bits (if $s_i = 1$, $s_j = 1$ and $s_k = 0$);
6) shift by ($2^i + 2^k$) bits (if $s_i = 1$, $s_j = 0$ and $s_k = 1$);
7) shift by ($2^j + 2^k$) bits (if $s_i = 0$, $s_j = 1$ and $s_k = 1$);
8) shift by ($2^i + 2^j + 2^k$) bits (if $s_i = 1$, $s_j = 1$ and $s_k = 1$).
The functionality of each bit-slice of our triple merged stage
for a left shifter is as follows, with $a_x$ and $z_x$ denoting bit $x$ of the stage input and output, respectively:
$z_x = q_0 + q_1 + q_2 + q_3 + q_4 + q_5 + q_6 + q_7$ for $0 \le x < n$,
where $q_0 = \overline{s_i} \cdot \overline{s_j} \cdot \overline{s_k} \cdot a_x$,
$q_1 = s_i \cdot \overline{s_j} \cdot \overline{s_k} \cdot a_{x-2^i}$,
$q_2 = \overline{s_i} \cdot s_j \cdot \overline{s_k} \cdot a_{x-2^j}$,
$q_3 = \overline{s_i} \cdot \overline{s_j} \cdot s_k \cdot a_{x-2^k}$,
$q_4 = s_i \cdot s_j \cdot \overline{s_k} \cdot a_{x-2^i-2^j}$,
$q_5 = s_i \cdot \overline{s_j} \cdot s_k \cdot a_{x-2^i-2^k}$,
$q_6 = \overline{s_i} \cdot s_j \cdot s_k \cdot a_{x-2^j-2^k}$,
and $q_7 = s_i \cdot s_j \cdot s_k \cdot a_{x-2^i-2^j-2^k}$.
Similar to the dual merged stages, for the triple merged stage,
the functionality of a few bitslices near the LSB (for a left shifter)
or MSB (for a right shifter) gets simplified, because some of
the values in the above expressions become logic-0. This fact is
aggressively exploited while merging stages.
By decomposing the functionality of each bitslice, we find
that the timing-critical path of each triple merged stage consists of a single inverter, a single 4-input NAND gate, a single
3-input NAND gate and a single 3-input OR gate. Based on the
available cells in a technology library, there may be other more
efficient ways of implementing the functionality of each bitslice
as well. A general-purpose technology mapper is able to identify
the most efficient implementation of the triple merged stage of
a shifter. We denote the best possible delay of the triple merged
stage as $Del_3$.
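One possible decomposition of a triple merged bit-slice into an inverter level, 4-input NAND gates, NAND gates of up to three inputs and a 3-input OR gate can be sketched as follows. The grouping of the eight product terms into sets of three, three and two is an assumption chosen to match the gate types named above; the function names and LSB-first bit lists are likewise illustrative.

```python
from itertools import product


def nand(*inputs):
    """N-input NAND gate; with a single input it behaves as an inverter."""
    return 0 if all(inputs) else 1


def reference_left_shift(a, amount):
    """Plain left shift with logic-0 fill, used only as a reference."""
    return [a[x - amount] if x >= amount else 0 for x in range(len(a))]


def triple_merged_stage(a, sel, idx):
    """Merged stage for three shift bits.

    a   : stage input bits, index 0 = LSB
    sel : (s_i, s_j, s_k) select values
    idx : (i, j, k) bit positions of those selects in the shift signal
    """
    inv = [nand(s) for s in sel]                      # inverter level
    out = []
    for x in range(len(a)):
        terms = []
        for combo in product((0, 1), repeat=3):       # eight select patterns
            shift = sum(c << b for c, b in zip(combo, idx))
            literals = [sel[t] if combo[t] else inv[t] for t in range(3)]
            src = a[x - shift] if x >= shift else 0   # logic-0 below the LSB
            terms.append(nand(*literals, src))        # 4-input NAND
        # NAND of NAND-terms is 1 when any product term in the group is 1.
        groups = (nand(*terms[0:3]), nand(*terms[3:6]), nand(*terms[6:8]))
        out.append(groups[0] | groups[1] | groups[2]) # 3-input OR
    return out
```

With `idx = (0, 2, 3)`, for instance, the stage shifts by `s_i + 4*s_j + 8*s_k` positions, and a technology mapper would be free to pick a different but functionally equivalent gate structure.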
B. Identification of the Mergeable Stages
In addition to the design of the merged stages, the technique
to identify the mergeable stages plays a key role in determining
the performance of our proposed shifter architecture.
Algorithm 1: Identification of the Mergeable Stages
MergeableStageList = NULL
SelPriorityQueue = Store $s_0, s_1, \ldots, s_{m-1}$ in ascending order of arrival time
while SelPriorityQueue is not empty do
    (i, j, k) = Select first (earliest-arriving) three elements of the shift signal from SelPriorityQueue
    // If 3 stages are not remaining, then the algorithm takes a simpler route
    $T_{single1} = t_{s_i} + Del_1$
    $T_{single2} = Max((t_{s_i} + Del_1), t_{s_j}) + Del_1$
    $T_{single3} = Max(T_{single2}, t_{s_k}) + Del_1$
    $T_{dual} = t_{s_j} + Del_2$
    $T_{triple} = t_{s_k} + Del_3$
    if ($T_{triple} < T_{single3}$) and ($T_{triple} < (T_{dual} + Del_2/2)$) then
        Create a new node (triplestage) with three elements i, j and k // Select three stages for triple merging
        triplestage.element0 = i; triplestage.element1 = j
        triplestage.element2 = k
        Add triplestage into MergeableStageList
        Remove (Deque) i, j and k from SelPriorityQueue
    else
        if ($T_{dual} < T_{single2}$) then
            Create a new node (dualstage) with two elements i and j // Select two stages for dual merging
            dualstage.element0 = i; dualstage.element1 = j
            dualstage.element2 = -1
            Add dualstage into MergeableStageList
            Remove (Deque) i and j from SelPriorityQueue
        else
            Create a new node (singlestage) with only one element i // Not suitable for any merging
            singlestage.element0 = i
            singlestage.element1 = -1
            singlestage.element2 = -1
            Add singlestage into MergeableStageList
            Remove (Deque) i from SelPriorityQueue
        end if
    end if
end while
return MergeableStageList // The list of all stages
The algorithm to identify the mergeable stages is presented
in Algorithm 1. A detailed explanation is provided below. Our
algorithm uses the following timing-driven analysis to find two
or three stages for merging: we store all the bits of the shift
signal in the ascending order of the arrival time. To perform
this operation in an efficient way, we use a priority queue data
structure. Let us assume that the six earliest arriving signals are
$s_i$, $s_j$, $s_k$, $s_p$, $s_q$ and $s_r$. For the signals $s_i$ and $s_j$, if we construct
a dual merged stage, then the output of the dual merged stage
will be available at time
$T_{dual} = t_{s_j} + Del_2$.
On the other hand, if we construct two individual stages, then
the output of the second stage will be available at time
$T_{single2} = \max((t_{s_i} + Del_1), t_{s_j}) + Del_1$.
Similarly, for the signals $s_i$, $s_j$ and $s_k$, if we construct a triple
merged stage, then the output of the triple merged stage will be
available at time
$T_{triple} = t_{s_k} + Del_3$.
On the other hand, if we construct three individual stages and
cascade them, then the output of the third stage will be available
at time
$T_{single3} = \max(T_{single2}, t_{s_k}) + Del_1$.
Now, if ($T_{triple} < T_{single3}$) and ($T_{triple} < (T_{dual} + Del_2/2)$),
then we designate the three stages ($i$, $j$ and $k$) of the shifter as
mergeable stages. If the two conditions above are not both true and
if ($T_{dual} < T_{single2}$), then we designate the two stages ($i$ and
$j$) as mergeable stages. If both the above tests fail
(in other words, if neither the triple-merging test nor the dual-merging test holds),
then we do not select stage $i$ for merging and implement a single
stage for the stage $i$. Next, we perform the same analysis with
the three stages corresponding to the next three earliest arriving
shift bits. For example, if we implemented a single stage for
stage $i$ (in the previous analysis), then we would select stages
$j$, $k$ and $p$ in this step of the algorithm. On the other hand, if
we implemented a dual merged stage for stages $i$ and $j$ (in the
previous analysis), then we would select stages $k$, $p$ and $q$ in
this step of the algorithm. This analysis and identification of
mergeable stages continues until all the stages are analyzed. At
the end of this algorithm, our approach produces a list of all the
mergeable stages.
TABLE I
AREA AND DELAY COMPARISON OF SHIFTER BLOCKS GENERATED BY A COMMERCIAL SYNTHESIS TOOL AND BY OUR APPROACH
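The selection procedure of Algorithm 1 can be sketched in Python using the standard `heapq` module as the priority queue. This is a hedged illustration, not the authors' C++ implementation: the function name, the tuple-based return format and the numeric delay values in the usage note are assumptions, and the "simpler route" for fewer than three remaining shift bits is assumed here to mean falling back to the dual-merge test and then to a single stage.

```python
import heapq


def identify_mergeable_stages(arrival, del1, del2, del3):
    """Group shift bits into single, dual merged or triple merged stages.

    arrival[b] is the arrival time of shift bit s_b; del1, del2 and del3
    are the delays of a single, dual merged and triple merged stage.
    Returns a list of tuples of shift-bit indices, one tuple per stage.
    """
    queue = [(t, b) for b, t in enumerate(arrival)]
    heapq.heapify(queue)
    stages = []
    while queue:
        first = heapq.nsmallest(3, queue)      # up to three earliest bits
        t_i, _ = first[0]
        take = 1                               # default: single stage
        if len(first) >= 2:
            t_j, _ = first[1]
            t_single2 = max(t_i + del1, t_j) + del1
            t_dual = t_j + del2
            if len(first) == 3:
                t_k, _ = first[2]
                t_single3 = max(t_single2, t_k) + del1
                t_triple = t_k + del3
                if t_triple < t_single3 and t_triple < t_dual + del2 / 2:
                    take = 3                   # triple merging
            if take == 1 and t_dual < t_single2:
                take = 2                       # dual merging
        stages.append(tuple(b for _, b in first[:take]))
        for _ in range(take):
            heapq.heappop(queue)               # remove the selected bits
    return stages
```

As a usage example with hypothetical delays `del1 = 1.0`, `del2 = 1.5` and `del3 = 2.2`, calling `identify_mergeable_stages([0.0] * 7, 1.0, 1.5, 2.2)` groups seven simultaneously arriving shift bits into two triple merged stages and one single stage, whereas widely staggered arrival times such as `[0.0, 5.0, 10.0]` leave all three stages unmerged.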
Note that during technology mapping in our approach, the
mapper sizes the output of each node based on its load capacitance. Also, the delay analysis for each configuration considers
the actual capacitance of the output node, using a load-dependent
delay model. Also, note that our merged nodes do not have high
fanouts (a maximum of 4 for dual merged shifter stages and a maximum of 8 for triple merged shifter stages).
In terms of the execution of the flow, we first execute the
API in Algorithm 1 to identify the mergeable stages. Once the
configurations of all the stages (dual merged, triple merged and
unmerged) are identified, we execute the second API to
implement the merged stages (as well as single stages) in the
netlist with proper connectivity between the stages (as described
in Section III-A).
By using the dual merged, triple merged and unmerged
stages, there can be several ways to design a barrel shifter.
For example, a barrel shifter having 4 stages (with 16-bit wide
data input) can be designed in 14 different configurations. As
the bit-width of the shift input of the shifter increases, the
number of possible configurations also increases in a non-linear
fashion. The timing-driven analysis in our algorithm enables us
to identify a timing-efficient merging configuration.
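The count of 14 configurations for a 4-stage shifter can be reproduced under one natural interpretation, namely that a configuration is an unordered grouping of the stages into groups of one, two or three stages, with non-consecutive stages allowed in a group. That interpretation is an assumption made here for illustration, as is the function name.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_configurations(m):
    """Number of ways to split m stages into groups of size 1, 2 or 3."""
    if m == 0:
        return 1
    total = 0
    for size in (1, 2, 3):
        if size <= m:
            # Fix one particular stage, choose its (size - 1) partners from
            # the remaining m - 1 stages, then group whatever is left.
            total += comb(m - 1, size - 1) * count_configurations(m - size)
    return total
```

Under this interpretation `count_configurations(4)` evaluates to 14, matching the figure quoted above, and the count grows quickly with the number of stages (for example, 652 for 7 stages), which is why an exhaustive comparison of configurations becomes unattractive.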
In summary, our approach can potentially reduce the number
of stages of the shifter module to as few as one-third (33.33%) of
the original number of stages, by selecting different groups of
three or two stages and merging each group into a single stage.
IV. EXPERIMENTAL RESULTS
We have implemented our proposed algorithm in the C++
programming language. For all our experiments, we used a
Linux workstation running RedHat 7.1 with dual 2.2-GHz
processors and 4 GB of memory.
To test the effectiveness of our approach under varying design
conditions, we used the following design constraints.
• Shifter designs of different input widths:
We used the Sh-8 block, where the input data and output signals are 8 bits wide and the shift signal is 3 bits wide. By
following a similar naming convention, we used Sh-16,
Sh-32, Sh-64 and Sh-128 blocks.
• Different technologies and libraries:
— Two commercial libraries for 0.13 µm.
— Two commercial libraries for 0.09 µm.
— Two commercial libraries for 0.065 µm.
• Different input arrival time constraints:
In many real-life designs, we have noticed that the shift
input of a shifter comes from either the register banks
or the outputs of multiple datapath blocks (Multiplier,
Squarer, Multiply-Accumulator or Sum-of-Products etc.).
To test the effectiveness of our algorithm in different realistic situations, we used the following scenarios, which
generate different arrival time constraints for the shift
input signal of the shifter block.
1) A timing constraint where the shift input ($s$) of
the shifter design comes from the output (or a selected
set of bits of the output) of a multiplier.
2) A timing constraint where the shift input of the
shifter design comes from the output of an adder.
3) A timing constraint where the shift input of the shifter
comes from the output of bus-based multiplexers.
4) A timing constraint where the shift input of the
shifter design comes from the output of register banks.
Let us denote $t_{s_i}$ as the arrival time of the signal
$s_i$. Assuming that $c$ is a constant, the register-bank
timing constraint can be represented as
$t_{s_0} = t_{s_1} = \cdots = t_{s_{m-1}} = c$.
We have compared our algorithm against a well-known commercially available datapath synthesis tool. The synthesis tool
generates arithmetic-optimized architectures for all the arithmetic blocks (like shifters) and then performs general-purpose
operations like technology-independent optimizations, constant
propagation, redundancy removal, technology mapping, timing-driven optimization, area-driven optimization etc. While running the synthesis tool, we turned on all the above-mentioned
optimizations. Due to the licensing agreements, we are unable
to mention the name of the commercial tool we used. In Table I,
we report the worst-case delay and the total area results obtained
for the shifter block from the commercial synthesis tool and
from our algorithm. In this table, we report 28 sets of data-points
involving different combinations of shifter blocks, timing constraints and technology libraries.
If we compute the average of all the 28 data-points presented
in the Table I, then our algorithm results in about 10.19% faster
implementation of the shifter block, with a 4.11% area penalty.
State-of-the-art designs have very strict timing goals, hence
most designers would be willing to accept a 10.19% delay
improvement at the expense of a 4.11% area penalty of the
shifter block only.
We also note that the most frequently occurring timing situation is represented by the register-bank timing constraint. In that case,
our approach produces the largest improvement in speed (11.23%
on average) with an average area overhead of 4.47%. This is in
accordance with our expectation, because the availability of all
the bits of the shift signal at the same time enables our approach
to perform maximal merging of stages, which results in the largest
performance improvement.
For reference purposes, we implemented the traditional barrel
shifter and measured its delay and area numbers across all our
shifter blocks, technology libraries and timing constraints. The
experimental data showed that, on an average, our proposed
shifter is about 13.84% faster than the traditional barrel shifter
with a 7.29% area-penalty.
To verify the correlation of post-synthesis experimental data
with post place-and-route data, we performed placement and
routing on 12 shifter blocks. For these testcases, the average
post-routing worst-case delay of the shifter generated by our
proposed approach is 0.91 (normalized to the worst delay of
the shifter generated by the commercial synthesis tool). Similarly, the post-routing total area of the shifter generated by our
proposed approach is 1.03 (normalized to the total area of the
shifter generated by the commercial synthesis tool). These results after place and route confirm our conclusion about the efficient characteristics of our approach.
In addition to stand-alone shifter blocks, we applied our algorithm on several large industrial designs, where the critical path
traverses a shifter module. In such designs as well, our architecture reduces the delay of the shifter by 8% to 14%.
Our delay improvement is consistent across multiple sizes of
shifters, timing constraints and technology libraries. This underscores the strength of our algorithm. Since the shifter is one
of the key datapath operations in modern digital design, we believe that the timing-critical portions of many real-life designs
can significantly benefit from our algorithm. Since our proposed
approach works on general-purpose shifter blocks, it can also be
used for the rotators and shift-rotate blocks.
V. CONCLUSION
In this brief, we have presented a new approach to implement a faster shifter block, which is very useful when the critical path of the design traverses the shifter block. Our
timing-driven algorithm to identify mergeable shifter stages,
coupled with our architecture based on the merging of these
stages, works seamlessly with different types of shifter blocks,
arrival timing constraints and technology domains (0.13 µm, 0.09 µm, 0.065 µm). Our experimental results indicate that our implementation of the shifter is significantly faster
(with a modest area penalty) than shifters generated by a commercially available datapath synthesis tool.
REFERENCES
[1] R. S. Lim, “A barrel switch design,” Computer Design, pp. 76–78,
1972.
[2] R. Pereira, J. A. Michell, and J. M. Solana, “Pipelined TSPC barrel
shifter with scan test facilities for VLSI implementation of high speed
DSP applications,” in Proc. Euro ASIC ’92, 1992, p. 405.
[3] R. Pereira, J. A. Michell, and J. M. Solana, “Fully pipelined TSPC
barrel shifter for high-speed applications,” IEEE J. Solid-State Circuits,
vol. 30, no. 3, pp. 686–690, Jun. 1995.
[4] S. M. Kang, “Domino-CMOS barrel switch for 32-bit VLSI processors,” IEEE Circuits Devices Mag., vol. 3, no. 3, pp. 3–8, Mar. 1987.
[5] G. M. Tharakan and S. M. Kang, “A new design of a fast barrel switch
network,” IEEE J. Solid-State Circuits, pp. 217–221, 1992.
[6] S.-J. Yih, M. Cheng, and W. S. Feng, “Multilevel barrel shifter for
CORDIC design,” Electron. Lett., vol. 32, no. 13, pp. 1178–79, 1996.
[7] V. Milutinovic, M. Bettinger, and W. Helbig, “Multiplier/ shifter design tradeoffs in a 32-bit microprocessor,” IEEE Trans. Comput., vol.
38, no. 8, pp. 874–880, Aug. 1989.
[8] P. M. Seidel and K. Fazel, “Two dimensional folding strategies for improved layouts of cyclic shifters,” in Proc. IEEE Comput. Soc. Ann.
Symp. VLSI, 2004, pp. 277–278.
[9] M. A. Hillebrand, T. Schurger, and P. M. Seidel, “How to half wire
lengths in the layout of cyclic shifters,” in Proc. IEEE Int. Conf. VLSI
Design, 2001, pp. 339–344.
[10] R. Rafati, S. M. Fakhraie, and K. C. Smith, “A 16-bit barrel-shifter implemented in data-driven dynamic logic (D$^3$L),” IEEE Trans. Circuits
Syst. I, Reg. Papers, vol. 53, no. 10, pp. 2194–2202, Oct. 2006.
[11] A. Vetteth, K. Walus, V. S. Dimitrov, and G. A. Jullien, “Quantum dot
cellular automata carry-look-ahead adder and barrel shifter,” in Proc.
IEEE Emerging Telecommunications Technologies Conf., Dallas, TX,
Sep. 2002.
[12] A. P. Singh, M. Barany, and D. J. Deleganes, “A mixed signal rotator/
shifter for 8 GHz Intel® Pentium® 4 integer core,” in Proc.
Symp. VLSI Circuits, 2004, pp. 394–397.
[13] K. P. Acken, M. J. Irwin, and R. M. Owens, “Power comparisons for
barrel shifters,” in Proc. Int. Symp. Low Power Electron. Design, 1996,
pp. 209–212.
[14] P. A. Beerel, S. Kim, P.-C. Yeh, and K. Kim, “Statistically optimized
asynchronous barrel shifters for variable length codecs,” in Proc. Int.
Symp. Low Power Electronics and Design, 1999, pp. 261–263.
[15] R. Ramadoss, “A new breed of power-aware hybrid shifters,” in Proc.
IEEE Int. SOC Conf., 2005, pp. 143–146.
[16] K. H. Abed and R. E. Siferd, “CMOS VLSI implementation of a
low-power logarithmic converter,” IEEE Trans. Comput., vol. 52, pp.
1421–1433, 2003.
[17] R. V. K. Pillai, D. Al-Khalili, and A. J. Al-Khalili, “Energy delay measures of barrel switch architectures for pre-alignment of floating point
operands for addition,” in Proc. Int. Symp. Low Power Electronics and
Design, 1997, pp. 235–238.
[18] M. D. Ercegovac and T. Lang, Digital Arithmetic, ser. Computer Architecture and Design. New York: Morgan Kaufmann, 2003.