A Column-Row-Parallel ASIC Architecture for 3D
Wearable / Portable Medical Ultrasonic Imaging
by
Kailiang Chen
B.E., Tsinghua University (2007)
S.M., Massachusetts Institute of Technology (2009)
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
February 2014
© Massachusetts Institute of Technology 2014. All rights reserved.
Author:
Department of Electrical Engineering and Computer Science
January 31, 2014
Certified by:
Charles G. Sodini
LeBel Professor of Electrical Engineering
Thesis Supervisor
Certified by:
Anantha P. Chandrakasan
Joseph F. and Nancy P. Keithley Professor of Electrical Engineering
Thesis Supervisor
Accepted by:
Leslie A. Kolodziejski
Chair, Department Committee on Graduate Students
A Column-Row-Parallel ASIC Architecture for 3D Wearable / Portable Medical Ultrasonic Imaging
by
Kailiang Chen
Submitted to the Department of Electrical Engineering and Computer Science
on January 31, 2014, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
Abstract
This work presents a scalable Column-Row-Parallel ASIC architecture for 3D wearable / portable medical ultrasound. It leverages programmable electronic addressing to achieve linear scaling for both hardware interconnection and software data acquisition. A 16x16 transceiver ASIC is fabricated and flip-chip bonded to a 16x16 capacitive micromachined ultrasonic transducer (CMUT) to demonstrate the compact, low-power front-end assembly. A 3D plane-wave coherent compounding algorithm is designed for fast volume rate (62.5 volumes/s), high-quality 3D ultrasonic imaging. An interleaved checker board pattern with I&Q excitations is also proposed for ultrasonic harmonic imaging; it reduces transmitted second harmonic distortion by over 20 dB and is applicable to nonlinear transducers and circuits with arbitrary pulse shapes.
Each transceiver circuit is element-matched to its CMUT element. The high-voltage transmitter employs a 3-level pulse-shaping technique with charge recycling to enhance the power efficiency, requiring minimal off-chip components. Compared to traditional 2-level pulsers, 50% more acoustic power delivery is obtained with the same total power dissipation. The receiver is implemented with a transimpedance amplifier topology and achieves the lowest noise efficiency factor in the literature (2.1 compared to a previously reported lowest of 3.6, in units of mPa·√(mW/Hz)). A source follower stage is specially designed to combine the analog outputs of receivers in parallel, improving output SNR as parallelization increases and offering flexibility for imaging algorithm design. Lastly, fault tolerance is incorporated into the transceiver to deal with faulty elements within the 2D MEMS transducer array, increasing yield for the system assembly.
Thesis Supervisor: Charles G. Sodini
Title: LeBel Professor of Electrical Engineering
Thesis Supervisor: Anantha P. Chandrakasan
Title: Joseph F. and Nancy P. Keithley Professor of Electrical Engineering
Acknowledgments
Finishing my Ph.D. would not have been possible without the enduring love of my parents and
wife. I would like to thank them for all their support. Recently we have been through
difficult moments together, but I look forward to the good days to come.
I feel extremely fortunate to work under the joint supervision of Prof. Charlie
Sodini and Prof. Anantha Chandrakasan. I am grateful to Charlie, who is a great
teacher for me inside and outside of school. I learned from him to always seek
the insight and intuition behind a problem. I also learned from him to be down-to-earth, yet persistent, both in research and in life. I enjoyed our conversations, softball
games played together for MTL, Red Sox games, and of course, the Hong Kong trip.
All of them are unforgettable.
I would like to express my gratitude to Anantha. Even as the Department Head
with an incredibly busy schedule, he always gave me ample guidance. He
is always resourceful and creative, which sets the standard of a good researcher for me.
I would like to thank Prof. Greg Wornell for serving on my thesis committee and
providing insights about imaging system trade-offs; Prof. Harry Lee for providing
many clever circuit design ideas; Dr. Kai Thomenius for teaching me a lot of ultrasonics know-how; Dr. Brian Brandt for continued support for my test setup and
career development; Prof. Thomas Heldt, Tom O’Dwyer, Dr. Dennis Buss, Dr. Peter
Holloway, and Mr. Haiyang Zhu for many useful technical discussions. I am thankful
for all their help with my project.
I am grateful to the people who helped me with the hardware system assembly, which
was the key to the successful project demonstration. The ASIC fabrication was generously made possible through the TSMC University Shuttle Program. The CMUT
samples were obtained from Prof. Butrus (Pierre) Khuri-Yakub’s research group at
Stanford University; students Byung Chul Lee, Anshuman Bhuyan, and Jung Woo
Choe offered me many handy tips for working with the device. The CMUT-PCB-ASIC
flip-chip bonding assembly was done with the help of Dr. Helen Kim and MIT Lincoln Laboratory. The acrylic oil tank and the 3D translation stage were designed and
built with the assistance of the MIT Central Machine Shop.
It has been a pleasant journey because of my colleagues in the Sodini/Lee lab
and the Anantha group. In particular, I would like to thank Bonnie Lam, Sabino
Pietrangelo, Joohyun Seo, and Katherine Smyth for a lot of intriguing discussions
about ultrasonics. Also, I would like to thank Sunghyuk Lee, SungWon Chung, Wei
Li, and Marcus Yip for the tremendous help during my tape-outs. Daniel Piedra,
Allen Hsu, Bin Lu, and Jerome Lin taught me how to operate a probe station to take
accurate measurements on a bare silicon die. Moreover, I would like to thank David
He, Amanda Gaudreau, Philip Godoy, Jack Chu, Grant Anderson, Doyeon Yoon, Xi
Yang, Eric Winokur, Maggie Delano, Daniel Kumar, Bruno Do Valle, and many more
for being great labmates with whom I could hang out and have fun. Last but not
least, Coleen Milley and Margaret Flaherty have been very supportive with logistics
and always make sure everything in the lab runs smoothly.
This project is funded by the C2S2 Focus Center, one of six research centers
funded under the Focus Center Research Program (FCRP), a Semiconductor Research
Corporation entity; Texas Instruments; and the MIT Center for Integrated Circuits
and Systems (CICS).
Contents
1 Introduction
1.1 Motivation
1.2 The Challenge for Implementing a 3D Wearable / Portable Ultrasonic Imaging Device
1.3 Contribution
1.4 Thesis Organization
2 Background Information
2.1 Ultrasonic Imaging Modes
2.2 The Beam-formation Principle
2.3 Ultrasonic Transducers
2.4 Field II Simulation Program
3 The Column-Row-Parallel Architecture for 3D Ultrasonic Imaging
3.1 The Prior Art of Architectures for 3D Ultrasonic Imaging
3.2 The Motivation of the Column-Row-Parallel ASIC Architecture
3.3 The Column-Row-Parallel ASIC Architecture
3.4 The Functionality of the Column-Row-Parallel Architecture
3.5 Summary
4 3D Ultrasonic Imaging System Experiments
4.1 The Hardware System Assembly
4.1.1 The PCB-CMUT Connection
4.1.2 The PCB-ASIC Connection
4.1.3 The Flip-Chip Bonding Assembly Process
4.1.4 Mounting onto the Oil Tank
4.2 Plane-wave Coherent Compounding for Fast Volume Rate 3D Ultrasonic Imaging
4.2.1 PWCC for 2D Imaging
4.2.2 Extending PWCC to 3D Imaging on the Column-Row-Parallel Architecture
4.2.3 PWCC3D Results: Simulations and Measurements
4.2.4 Discussion
4.3 Interleaved Checker Board Tx Apertures with I&Q Excitations for HD2 Reduction in Ultrasonic Harmonic Imaging
4.3.1 THI Principle and Previous Methods
4.3.2 Tx HD2 Suppression on the Column-Row-Parallel Architecture
4.3.3 Experimental Results
4.4 Annular Ring Apertures for Forward-looking Imaging Applications
4.4.1 Annular Ring Apertures on Column-Row-Parallel Architecture
4.4.2 Annular Ring Imaging Results
4.5 Summary
5 Design of the 16x16 Ultrasonic Transceiver Array ASIC with Column-Row-Parallel Architecture
5.1 High-Level Description of the Ultrasonic Imaging Transceiver Circuits and the Architecture Logic Implementation
5.2 Tx Circuit Design
5.2.1 Multi-Level Pulsing for Efficient CMUT Driver
5.2.2 3-Level Pulser Circuit Design
5.2.3 Tx Path Design for 2D Ultrasonic Transducer Arrays
5.3 Rx Circuit Design
5.3.1 LNA Optimization Methodology for CMUT
5.3.2 LNA Transistor-Level Implementation
5.3.3 Rx Path Design for 2D Ultrasonic Transducer Arrays
5.4 Biasing
5.5 The Fault-Tolerant ASIC Design for Faulty MEMS Devices
6 ASIC Characterization
6.1 Tx Ultrasonic Power and Efficiency Measurement
6.1.1 Measuring Acoustic Output Power
6.1.2 Measuring Tx Efficiency
6.2 LNA Characterization
6.3 The Tx Beam-Steering Experiment
6.4 The Pulse-Echo Experiment
7 Conclusion
7.1 Summary of Contributions
7.2 Future Work
List of Figures
2-1 The typical signals and the operation for B-mode ultrasound.
2-2 Simplified block diagram of an ultrasound BF system, figure courtesy of [27].
2-3 A typical Field II flow diagram for ultrasonic system behavioral simulation.
3-1 Column-parallel architecture implementations in the literature: (a) a 1D transducer array mechanically translated to scan the 3D space, elevation beam-formation is done by a synthetic virtual source technique, figure courtesy of [3]; (b) a 2D array operated to receive row-by-row, elevation beam-formation is done by sub-array delay-and-sum across the column using analog delay lines, figure courtesy of [55].
3-2 The column-row addressing scheme implemented on a 256x256 2D transducer array: (a) row-by-row transmit addressing; (b) column-by-column receive addressing; (c) the “Maltese cross” beam-pattern. Figure courtesy of [38].
3-3 A column-row addressing architecture implemented at the circuit-level, with column and row interconnections that reduce the system channel count and provide maximum flexibility for algorithms.
3-4 Column-Row-Parallel architecture block diagram, the CMUT and ASIC chips are stacked vertically.
3-5 (a) The block-level implementation of one transceiver channel and (b) the per-element logic implementation. Column and row select logic is implemented with shift registers that can be reprogrammed in “N” time (implementation detail will be shown in Figure 5-2).
3-6 (a) Tx input port multiplexing, implemented with digital logic; (b) Rx output port multiplexing, implemented with analog pass-gates.
3-7 The architecture configured in a column-parallel mode for the Tx aperture. The configuration is broken down and illustrated in steps (a) through (d) to aid understanding. Two rows are activated as the Tx aperture and beam-formation along azimuth (X) direction is achieved.
3-8 The architecture configured in a row-parallel mode for the Rx aperture. Five columns are activated as the Rx aperture and beam-formation along elevation (Y) direction is achieved.
3-9 More use examples of the proposed architecture: (a) a diagonal Rx aperture; (b) a checker board Tx aperture for ultrasonic harmonic imaging; (c) & (d) annular ring Tx and Rx apertures for forward-looking ultrasonic imaging applications.
4-1 System integration diagram showing the flip-chip bonding connection between CMUT and ASIC through a PCB interposer. The figure also shows the mechanical setup for imaging experiments, including an oil tank and a 3D translation stage.
4-2 The picture of the hardware system setup.
4-3 The block diagram of the hardware system setup.
4-4 The 16x16 CMUT die drawings: (a) the footprint of the CMUT; (b) the CMUT flip-chip bonding pad metal structure drawing, courtesy of [40].
4-5 The two different PCB designs made to fit CMUT footprints: (a) the PCB version A’s footprint for CMUT with a gap distance of 250 µm; (b) the PCB version B’s footprint for CMUT with a gap distance of 373.75 µm, only 1x16 pads are made on the PCB side due to space limitations.
4-6 The drawing of a PCB pad defined with a solder mask, and bumped with a solder ball. The PCB pad is used to do flip-chip bonding to the CMUT die.
4-7 The ASIC die drawings: (a) the footprint of the ASIC, containing the center 18x16 pads to be element-matched and connected to CMUT through the PCB interposer, and the surrounding I/O pads; (b) the PCB interposer layout design that allows the ASIC I/O pads to be routed out to the PCB edges.
4-8 The ASIC flip-chip bonding pad metal structure drawings: (a) the horizontal view of a flip-chip bonding pad in ASIC; (b) the cross-sectional view of the ASIC flip-chip bonding pad.
4-9 The CMUT-PCB-ASIC two-step flip-chip bonding process: (a) first step, the bonding between PCB and ASIC; (b) second step, the bonding between PCB and CMUT, with ASIC already bonded to PCB.
4-10 The CMUT-ASIC connection result pictures: (a) the bonded PCB-ASIC assembly shows good connectivity; (b) the solder bumps on the PCB’s CMUT side are reflowed after PCB-ASIC bonding, so that any deformation is restored.
4-11 The PCB-CMUT bonding connection is verified by pulling off the test CMUT die from the PCB after bonding and reflow. (a) & (b) show the CMUT connection posts remain on the PCB after the pull, indicating good connectivity.
4-12 The finished CMUT-PCB-ASIC assembly: (a) cross-sectional view of the sandwich stack; (b) CMUT side assembly picture; (c) ASIC side assembly picture.
4-13 The acrylic tank drawings: (a) the tank dimension drawing; (b) the mounting between the oil tank and the CMUT-PCB-ASIC assembly.
4-14 The illustration of how PWCC works for 2D ultrasonic imaging, courtesy of [68].
4-15 The principle of coherent compounding used in PWCC, courtesy of [68]: (a) the imaging space; (b) the beam-formation delay calculation when the transmitted plane-wave is normal to the transducer surface (α = 0°); (c) the beam-formation delay calculation when the transmitted plane-wave is steered to an angle of α.
4-16 The signal processing flow for PWCC3D on the Column-Row-Parallel architecture.
4-17 The PWCC3D implementation on the Column-Row-Parallel architecture: (a) Tx beam-steering along azimuth (X) direction using column-parallel mode; (b) Tx beam-steering along elevation (Y) direction using row-parallel mode; (c)-(e) Rx signal acquisition, sweeping through 16 rows for each transmit angle.
4-18 The sequence of operation to implement PWCC3D on the Column-Row-Parallel architecture.
4-19 The setup of the wire phantom imaging experiment using PWCC3D algorithm: (a) a single plane-wave is transmitted to image the wire phantom; (b) five different Tx angles are used along the azimuth direction for PWCC3D.
4-20 Simulation results of a wire phantom: (a) vertical cross-sectional image produced from single angle plane-wave insonification; (b) vertical cross-sectional image produced from 5-angle coherent compounded plane-wave insonification; (c) lateral resolution plot from single plane-wave; (d) lateral resolution plot from 5-angle plane-waves; (e) horizontal cross-sectional image from single plane-wave; (f) horizontal cross-sectional image from 5-angle plane-waves.
4-21 Measurement results of a wire phantom: (a) vertical cross-sectional image produced from single angle plane-wave insonification; (b) vertical cross-sectional image produced from 5-angle coherent compounded plane-wave insonification; (c) lateral resolution plot from single plane-wave; (d) lateral resolution plot from 5-angle plane-waves; (e) horizontal cross-sectional image from single plane-wave; (f) horizontal cross-sectional image from 5-angle plane-waves.
4-22 The setup of the ring phantom imaging experiment using PWCC3D algorithm: (a) a single plane-wave is transmitted to image the phantom; (b) five different Tx angles are used along the azimuth direction and another five Tx angles along the elevation direction to image the phantom with PWCC3D.
4-23 Measured horizontal cross-sectional images of a ring phantom: (a) single-angle Tx plane-wave; (b) 5-angle Tx plane-wave compounding along azimuth direction; (c) 5-angle Tx plane-wave compounding along elevation direction; (d) compounding across all 5-angle azimuth and 5-angle elevation directions.
4-24 Measured vertical cross-sectional images of a ring phantom: (a) single-angle Tx plane-wave; (b) compounding across all 5-angle azimuth and 5-angle elevation directions; (c) lateral resolution plot of ring image from single-angle Tx plane-wave; (d) lateral resolution plot of ring image from 5-angle X and 5-angle Y plane-waves.
4-25 Simulated XZ cross-sectional images showing the three cysts in one slice image: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.
4-26 Simulated YZ cross-sectional images showing the cyst at (−3, 0, 25) mm: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.
4-27 Simulated YZ cross-sectional images showing the cyst at (0, 0, 35) mm: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.
4-28 Simulated YZ cross-sectional images showing the cyst at (3, 0, 45) mm: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.
4-29 Implementation of checker board Tx aperture on the proposed architecture.
4-30 Simulation comparison between the conventional and I&Q methods: (a) fundamental component spatial intensity for conventional; (b) fundamental component spatial intensity for I&Q; (c) HD2 spatial intensity for conventional; (d) HD2 spatial intensity for I&Q.
4-31 Annular ring mode imaging implemented in Column-Row-Parallel architecture: (a) Tx and Rx aperture setup; (b) Tx aperture implemented in the proposed architecture, all active elements are driven in-phase; (c) Rx aperture with the biggest ring shape, all active elements’ analog outputs are combined; (d) Rx aperture with the 2nd ring shape; (e) Rx aperture with the 3rd ring shape; (f) Rx aperture with the smallest ring shape.
4-32 Annular ring mode dynamic beam-formation scheme.
4-33 Annular ring configuration example, off-center: (a) Tx and Rx aperture setup; (b) Tx aperture implemented in the proposed architecture; (c) Rx aperture with the biggest ring shape; (d) Rx aperture with the 2nd ring shape; (e) Rx aperture with the 3rd ring shape; (f) Rx aperture with the smallest ring shape.
4-34 Cross-section slices of the wire phantom 3D images from simulation and measurement: (a) simulated XZ slice; (b) measured XZ slice; (c) simulated YZ slice; (d) measured YZ slice; (e) simulated XY slice; (f) measured XY slice.
5-1 A re-plot of Figure 3-5 in Section 3.3. (a) The block-level implementation of one transceiver channel and (b) the per-element logic implementation. Column and row select logic is implemented with shift registers that can be reprogrammed in “N” time (implementation detail will be shown in Figure 5-2).
5-2 Circuit implementation for the logic control: (a) multiplexing for per-element enable bits; (b) Tx row / column selection logic; (c) Rx row / column selection logic.
5-3 (a) The transmitter load model of a CMUT element used in this work. (b) An exemplary 2-level square wave pulse applied onto CMUT. (c) An exemplary 3-level pulse applied onto CMUT.
5-4 Circuit schematic of the four-channel 3-level pulsers with the middle-voltage generation (all transistors are high voltage devices).
5-5 The digital control circuits for the pulser: (a) the signal flow and block diagrams; (b) the non-overlapping signal generator; (c) the level shifter implementation; (d) the control signal timing diagram.
5-6 Tx design for the 2D array: (a) 2D pulser schematic; (b) MUX implementation.
5-7 Small signal model and noise sources of the CMUT element and the LNA.
5-8 Transfer functions when the LNA optimality condition is reached.
5-9 Transfer function examples when the LNA optimality condition of f_i ≈ f_p is not reached: (a) f_i < f_p, (b) f_p < f_i.
5-10 Transfer function examples: (a) f_i < f_p, (b) f_i ≈ f_p, (c) f_i > f_p.
5-11 The LNA schematic, implemented in the TIA topology. All transistors are low voltage devices except the HV Rx Switch M10.
5-12 Design optimization for input stage transistors: (a) transistors are sized at the boundary of strong and weak inversion; (b) transistor width is optimized for the lowest noise figure.
5-13 The signal and noise combining with two Rx channels in parallel: (a) two channels on the same line, shown in Thevenin’s equivalent circuit at LNA outputs; (b) two channels on the same line, shown in Norton’s equivalent circuit at LNA outputs; (c) two channels combined, showing the resultant signal and noise amplitudes.
5-14 The LNA schematic, implemented in the TIA topology. All transistors are low voltage devices except the HV Rx Switch M10. “vip” node is also buffered with a source follower to output (not shown).
5-15 Parallelism with even more Rx channels by utilizing intermediate line buffers to preserve the circuit performance.
5-16 The biasing circuit for the 2D array.
5-17 The technique used for detecting and isolating the short CMUT elements: (a) front-end transistors in each channel and their control voltages; (b) the effective circuit connection of all 256 channels with CMUT elements.
5-18 Two successful 16x16 CMUT-ASIC assemblies with short CMUT elements (marked in red) isolated by the ASIC. The rest of the elements are functional and their sensitivity performance is expressed by the brightness of the elements, which will be described in detail in Section 6.4.
6-1 The photo of the lab setup for measuring the acoustic output power and the Tx efficiency.
6-2 Acoustic output power and Tx efficiency measurement setup.
6-3 Normalized RMS pressure along the transducer axial axis, measurement vs. simulation. The measurement deviates from the simulation in the near field because the hydrophone tip is too close to the transducer surface, distorting the pressure field.
6-4 (a) Tx efficiency measurement setup and pulse shape definition. (b) Measured time-domain waveform of the optimal 3-level 3.3 MHz pulses, ∆ = 20 ns, ∆/T = 0.067.
6-5 Tx efficiency measurement results using different 3-level pulse shapes by varying the ∆/T ratio and at different frequencies.
6-6 The die photo of the four-channel ultrasonic imaging transceiver test chip.
6-7 The die photo of the 256-channel 16x16 2D ultrasonic imaging transceiver test chip.
6-8 (a) Measured ultrasonic lateral beam profile, steered to the center (broadside). (b) Measured beam profile, with 30 ns delay between channels.
6-9 The setup of the pulse-echo experiment for characterizing the complete ultrasound channel.
6-10 The key waveforms from the pulse-echo experiment, showing the ultrasound channel characteristics. (a) The transmitted pulse waveform. (b) The received echo waveform. (c) The spectrum of the received echo waveform.
6-11 A re-plot of Figure 5-18 in Section 5.5. Two successful 16x16 CMUT-ASIC assemblies with short CMUT elements (marked in red) isolated by the ASIC. The rest of the elements are functional and their sensitivity performance is expressed by the brightness of the elements.
7-1 Four 16x16 ASICs tiled together for a 32x32 imaging front-end.
7-2 CMUT-ASIC assembly alternatives to eliminate the interposer PCB: (a) TSV technology for interconnecting ASIC I/Os to the main testing PCB; (b) Applying flip-chip bonding technology for CMUT-ASIC interconnection and wire-bonding for ASIC I/Os.
List of Tables
4.1 Simulated HD2 improvement of the I&Q method.
4.2 Measured HD2 improvement of the I&Q method.
5.1 SNR improvement from Rx channel parallelism, theory prediction and measurement.
6.1 Measured Power and Efficiency Comparison at 3.3 MHz for the 1D ASIC and CMUT (40 pF capacitance per element)
6.2 Measured Optimal 3-level Pulser Performance Summary for the 1D ASIC and CMUT (40 pF capacitance per element)
6.3 Measured Optimal 3-level Pulser Performance Summary for the 2D ASIC and CMUT (2 pF capacitance per element)
6.4 CMUT Pulser Performance Comparison
6.5 Measured LNA Performance Summary for the 1D ASIC [5]
6.6 Measured LNA Performance Summary for the 2D ASIC
6.7 CMUT LNA Performance Comparison
Chapter 1
Introduction
1.1 Motivation
Ultrasonic imaging is an important modality for medical diagnosis. Compared to
other imaging modalities, ultrasound is relatively low cost, harmless to human health,
and has decent resolution. Modern ultrasonic imaging systems are becoming increasingly complex and powerful, yet compact, benefiting from Moore’s law [1]. Laptop-size
ultrasound systems have gained comparable performance to the traditional cart-size
machines; hand-held devices, such as the GE Vscan [2], indicate the trend toward
highly integrated ultrasonic imaging solutions to enable portable or even wearable
ultrasound applications in hospitals and at home.
Traditional 2D medical ultrasonic imaging systems have been in wide use for
decades. A 2D imaging system uses a 1D ultrasonic transducer probe and generates rectangular or sector-shape 2D cross-sectional images of human tissue or organs.
These systems exist predominantly in hospital settings where professional sonographers are available to operate the system. They would carefully angle and position the
probe against the human body, so as to produce satisfactory 2D medical images for
diagnosis. This process is manual and requires extensive training for the operators,
adding complexity and extra cost to the diagnostic procedure.
On the other hand, 3D medical ultrasonic imaging systems provide a full view
of human tissue or organs in space, rather than cross-sectional views in 2D imaging
systems. The 3D volumetric image data form a more comprehensive data set,
which can be more easily interpreted to help locate targets of medical interest. As a
result, the manual search for the “best” 2D slice image, performed by a sonographer
holding a 1D probe, could be replaced by an automated search algorithm
in a 3D imaging system. Furthermore, by leveraging advanced microelectronics technology, a compact and low-power ultrasonic hardware system can be built to enable
wearable / portable self-monitoring ultrasonic imaging devices at home. Therefore,
one could imagine an automated imaging system that continuously tracks human tissue or organs of interest and produces long-term medical information with minimal
reliance on experienced sonographers.
1.2 The Challenge for Implementing a 3D Wearable / Portable Ultrasonic Imaging Device
A typical 1D array for a 2D imaging system can have as many as
one thousand elements. The transducer elements are connected to the interfacing
electronics through coaxial cables. For 3D imaging systems, 1D ultrasonic
transducer arrays have historically been used to acquire the 3D volumetric data by
being mechanically translated [3] or rotated [4] to cover the whole 3D space. A 2D slice
image is formed at each physical position of the 1D array, and multiple 2D slice
images are stitched together to form the 3D volumetric image. These mechanical
approaches have several disadvantages: the image resolution tends to be
poor because of the relatively large incremental step size of the mechanical movement;
the image frame rate (or volume rate) is limited by the speed of the mechanical
movement; and the system tends to be bulky and power hungry
because a mechanical motor is needed.
More recently, 2D ultrasonic transducer arrays made with micromachining processes have become more widely available and have proven more suitable for 3D ultrasonic
imaging. With a 2D array, the mechanical movement is replaced by electrical addressing;
the coarse motor stepping is replaced by the much finer element-to-element spacing;
the image frame rate or volume rate is no longer limited by the speed of mechanical
movement; and system size and power are reduced to allow long-term wearable /
portable hardware solutions.
However, an electronic system working with a 2D array is much harder to
build. Most notably, the interconnection between a 2D transducer array and its supporting electronics is a bottleneck. Because an NxN 2D transducer array contains N^2
transducer elements, if a dedicated electronic channel is provided for each transducer
element to control the transmit and receive operation, the active channel count of
the electronic integrated circuits is also N^2. Therefore, as the transducer array size
grows, it is very difficult to keep up with the N^2 growth of active channels. The
hardware complexity, instantaneous power dissipation, and interconnect count
quickly become unmanageable.
1.3 Contribution
To overcome the interconnect problem in interfacing to a 2D ultrasonic transducer
array for 3D ultrasonic imaging, this thesis proposes new solutions at the circuit,
architecture and algorithm levels.
At the circuit-level, the analog front-end (AFE) transmitter (Tx) and receiver (Rx)
circuits need to be optimized for power efficiency, performance and size, in order to
work optimally with the ultrasonic transducer elements [5, 6]. For the transmitter, a
3-level pulse-shaping high voltage pulser is designed to drive the transducer elements
with improved power efficiency and minimum off-chip components. For the receiver,
a low-noise amplifier (LNA) is implemented with a transimpedance amplifier (TIA)
topology to achieve excellent noise, power and bandwidth trade-offs, offering a low
power, high efficiency receiver solution. The transceiver front-end circuit is designed
to be element-matched to the transducer, replacing traditional cable connections with
flip-chip bonding assembly between the 2D transducer die and the 2D electronics
ASIC die. The compact, cable-less assembly avoids excessive parasitic capacitance
from the cable and leads to an integrated, low-power solution for wearable / portable
applications.
At the architecture-level, the addressing and control mechanism for the 2D array
of elements needs to be designed carefully, not only to reduce hardware and interconnect complexity but also to maintain enough support for software flexibility. A
Column-Row-Parallel architecture is proposed to reduce the AFE interconnect requirement from N^2 to N. At the same time, the highly programmable architecture
design guarantees strong support for system-level algorithm needs. It is compatible
with existing widely used beam-formation algorithms, and opens the possibility of using
the 2D array differently for new applications.
At the algorithm-level, beam-formation algorithms are also indispensable to compress and generate beamformed ultrasonic data to form the 3D volumetric images.
The algorithm design is tightly connected with architecture design and we propose
new ways of using the 2D array to achieve fast volume rate imaging with adequate
image quality, as well as a new way of reducing transmitter second harmonic distortion (HD2). Extensive in-vitro experiments have been carried out to validate and
evaluate the beam-formation algorithms and hardware system performance, including various 3D imaging algorithms, ultrasonic harmonic imaging mode, Tx efficiency
characterization, and pulse-echo characterization [5–7].
1.4 Thesis Organization
This thesis is organized into the following chapters:
Chapter 2 introduces the needed background information for the discussion of 3D
ultrasonic imaging systems in this thesis. This includes a brief description of various
ultrasonic imaging modes, the beam-formation principle, and the transducer types.
Chapter 3 first lists previous solutions to 3D ultrasonic imaging. A different architecture that offers better system trade-offs is motivated. The overview of the
proposed Column-Row-Parallel architecture is then described, which shows the potential to reduce hardware interconnection complexity while maintaining software
flexibility. Several examples of operation illustrate the architecture functionality to
perform column-parallel addressing, row-parallel addressing, or special patterns.
Chapter 4 presents ultrasonic imaging applications that show what the Column-Row-Parallel architecture is capable of, without yet going into circuit details. It starts
with a description of the hardware system assembly. The CMUT-PCB-ASIC flip-chip
bonding assembly process is discussed in detail and the complete electrical and mechanical
test setup is shown. Three Column-Row-Parallel application examples are then given. A 3D plane-wave coherent compounding (PWCC3D) algorithm is proposed
and demonstrated as a fast volume rate, high quality 3D imaging solution. An annular
ring aperture mode is presented for forward-looking intravascular ultrasound (IVUS)
and intracardiac echocardiography (ICE) applications. Finally, a checkerboard pattern
is used for second harmonic suppression in the ultrasonic harmonic imaging mode.
Chapter 5 provides circuit design details for a 16x16 Column-Row-Parallel test
chip working with a 16x16 CMUT. The implementations of the architecture control logic,
transmitter, receiver, and biasing circuits are described. The transmitter and receiver circuit designs reflect the optimization considerations for the specific target
transducer, in which the sensory interface for a capacitive source / load is used. On the
other hand, the control logic and the biasing circuits reflect the architecture implementation, which is general to different transducer types. The last section explains
the fault tolerance against transducer defects built into the transceiver circuit
implementation, which is critical for front-end electronics working with MEMS devices with large element counts.
Chapter 6 shows various circuit characterizations, which are complementary to
the system experiments described in Chapter 4. The transmitter and the receiver are
characterized as individual blocks; their circuit performance is summarized. Several
acoustic / electrical characterizations are also carried out, including a Tx beamsteering demonstration and a pulse-echo experiment.
Finally, Chapter 7 concludes the work with a summary of contributions and lists
directions for future work.
Chapter 2
Background Information
This chapter provides the needed background information about ultrasonics, in preparation for the discussion of 3D ultrasonic imaging systems.
2.1 Ultrasonic Imaging Modes
Ultrasonic imaging systems are generally active imaging systems. The system stimulates the transducers to transmit ultrasonic waves into the medium (human body);
the reflected ultrasonic echoes are then received and processed to generate images,
which visualize the medium [8–10] or provide flow information through Doppler processing [11–15].
Medical ultrasound systems use different “imaging modes” to assist various diagnoses [8,9]. For visualization of the tissue anatomy, the most common imaging modes
include A, B, C and M modes [9, 10]. The B-mode is the most common mode and its
typical operation is shown in Figure 2-1. The imaging system uses a 1D transducer
array and pulsed ultrasonic waves to probe the tissue medium, in order to acquire
a 2D grayscale image of the tissue. At time 0, the transmitter circuit drives the
transducer to emit the ultrasonic pulse, shown as the red pulse. The pulse travels
through the tissue at the sound speed c, typically 1540 m/s in human soft tissue [16].
When it encounters interfaces in the medium, the mechanical impedance mismatch at each
interface generates reflected ultrasonic waves. An interface at depth Z leads to a
Figure 2-1: The typical signals and the operation for B-mode ultrasound. (Recoverable figure labels: transmit pulse with cycle period Tp at time 0; a medium interface (mechanical impedance mismatch) at depth Z; echo delay td; repetition period T; the B-mode image at t=0 and t=td.)
received ultrasonic echo at time td = 2Z/c, shown as the blue pulse. Because
the echo amplitude is proportional to the size of the mechanical impedance mismatch,
the amplitude information is translated to the grayscale intensity of pixels in the
image. Meanwhile, the time delay between the transmit instant and the received echo (td)
translates to depth, indicating the interface location in the image. A simplified
grayscale image is also shown in the figure.
The transmit-receive action is repeated after time T, so that the B-mode image
can be continuously updated in time. The period T is called the pulse repetition
period (PRP), and it needs to be long enough to ensure that all ultrasonic echoes
from the previous transmission have returned. Given that the ultrasonic wave travels at a
sound speed of about 1540 m/s and a typical image depth is 7.5 cm, one transmit-receive repetition takes approximately 100 µs (2 × 7.5 cm ÷ 1540 m/s = 97 µs).
The reciprocal of the PRP is called the pulse repetition frequency (PRF), which is the
number of pulses per second. It is a term frequently used in active imaging systems
such as ultrasound, sonar, or radar systems. A typical PRF in ultrasound is 10 kHz,
corresponding to the 100 µs PRP. Depending on the application, commonly used PRFs
range from 5 to 20 kHz.
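The timing relations above can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic in the text (td = 2Z/c and PRF = 1/PRP); the function names are hypothetical and chosen for illustration, and the 1540 m/s sound speed and 7.5 cm depth are the typical values quoted above.

```python
# Illustrative arithmetic only; function names are hypothetical.
SOUND_SPEED_M_S = 1540.0  # typical speed of sound in human soft tissue


def echo_delay_s(depth_m, c=SOUND_SPEED_M_S):
    """Round-trip delay td = 2Z/c for an interface at depth Z."""
    return 2.0 * depth_m / c


def max_prf_hz(image_depth_m, c=SOUND_SPEED_M_S):
    """Highest PRF that lets the deepest echo return before the next pulse."""
    return 1.0 / echo_delay_s(image_depth_m, c)


if __name__ == "__main__":
    td = echo_delay_s(0.075)  # 7.5 cm image depth
    print(f"round-trip time: {td * 1e6:.1f} us")
    print(f"upper bound on PRF: {max_prf_hz(0.075) / 1e3:.2f} kHz")
```

The upper bound computed this way is slightly above the 10 kHz figure because the text rounds the 97 µs round-trip time up to a 100 µs PRP.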
The red transmit pulse shown in Figure 2-1 is composed of 2 bursts of sinusoids
with a cycle period of Tp. While this is a typical case, the sinusoidal pulse shape
can be replaced by other pulse shapes, such as the discrete-level pulses
discussed in this thesis. The number of bursts in one transmission can also vary
depending on the application. Generally speaking, more bursts lead to stronger reflected
echoes, while fewer bursts lead to better axial image resolution because of the shorter
pulse duration. B-mode imaging commonly employs 2-5 bursts per transmission, and
PW Doppler imaging (see next paragraph) employs as many as 20 bursts to improve
signal strength in the received echoes.
Besides direct visualization of tissue anatomy, the Doppler effect is used to obtain blood flow velocity information inside the human body [17]. There are three main
Doppler modes: Continuous Wave (CW), Pulsed Wave (PW) and Color Flow Mode
(CFM) Doppler [11–15]. CW Doppler is the earliest mode; it transmits continuous ultrasonic waves into the human body and detects the Doppler frequency shift in
the echo waves [13]. It is simple and reliable, but lacks range information. PW
Doppler improves upon the CW mode by repeatedly sending pulsed ultrasonic waves
into the medium [14]. The time of flight of the received echoes contains the range
information, and the slight timing difference between consecutive echo pulses reflects
the object movement (see footnote 1). Sub-sampling at the PRF is usually carried out before the
spectrum analysis for the PW Doppler frequency shift [11]. CFM Doppler is used
to present velocity information as a color-coded image, which is often overlaid on
a B-mode image. Time-domain autocorrelation-based signal processing techniques
are often used to speed up the CFM processing [15]; the resulting velocity estimation accuracy
is good enough for color-coded visualization.
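To give a feel for the quantities involved in CW and PW Doppler, the sketch below evaluates the standard textbook Doppler relations f_d = 2 v f0 cos(θ) / c and Δt = 2 v cos(θ) T / c. These formulas are general physics rather than expressions quoted from this chapter, and the 5 MHz carrier, 0.5 m/s velocity and function names are assumptions chosen purely for illustration.

```python
import math

SOUND_SPEED_M_S = 1540.0  # typical speed of sound in human soft tissue


def doppler_shift_hz(velocity_m_s, carrier_hz, angle_rad=0.0, c=SOUND_SPEED_M_S):
    """Classical Doppler shift f_d = 2 v f0 cos(angle) / c."""
    return 2.0 * velocity_m_s * carrier_hz * math.cos(angle_rad) / c


def interpulse_delay_change_s(velocity_m_s, prp_s, angle_rad=0.0, c=SOUND_SPEED_M_S):
    """Change in round-trip delay between two pulses fired one PRP apart."""
    return 2.0 * velocity_m_s * math.cos(angle_rad) * prp_s / c


if __name__ == "__main__":
    # Assumed example: 0.5 m/s flow, 5 MHz carrier, 100 us PRP, beam along the flow.
    print(f"Doppler shift: {doppler_shift_hz(0.5, 5e6):.0f} Hz")
    print(f"delay change per PRP: {interpulse_delay_change_s(0.5, 100e-6) * 1e9:.1f} ns")
```

The second function corresponds to the PW view of the problem: velocity is read from the small change in echo arrival time from one pulse to the next, not from the spectrum of a single short pulse.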
Many more imaging modes exist. For example, the Harmonic Imaging mode uses
the second harmonic of the pulse to provide high resolution images [18–22]; the Power
Mode Doppler visualizes the magnitude of Doppler signal, rather than the frequency
Footnote 1: It is important to point out that in the PW mode, the Doppler information does not come from the
frequency shift of a single received echo pulse: a short pulse is broadband, so it
is difficult to detect the small Doppler frequency shift (typically less than 100 kHz). In addition,
frequency-dependent attenuation through the tissue complicates the task even more. Instead, it is
the velocity-dependent time delay across several pulses that carries the velocity information.
Figure 2-2: Simplified block diagram of an ultrasound BF system (blocks: focal point, array, variable delays, analog adder, output signal, ADC); figure courtesy of [27].
shift, to help identify the existence of low flows and velocities [23]. Furthermore,
many imaging modes are used together as Duplex or Triplex modes for the best
visualization [24, 25].
2.2 The Beam-formation Principle
Beam-formation (BF) is heavily involved in ultrasonic imaging: it increases the signal-to-noise ratio (SNR), focuses the ultrasound beam to deliver more power, and steers
the beam to scan the imaging space [8, 9, 12, 26, 27]. The beamforming algorithms are
based on the delay-and-sum principle, which is shown in Figure 2-2. When a focus is
specified, a delay is calculated for each ultrasound channel to compensate for the
different distances between the corresponding transducer elements and the focus, so
that the pulses from all channels are aligned in time at the focus.
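The delay-and-sum principle can be illustrated with a short NumPy sketch. It is a simplified model under stated assumptions, not the beamformer used in this work: delays are rounded to whole samples, the medium has a uniform sound speed, and the function names are hypothetical.

```python
import numpy as np

SOUND_SPEED_M_S = 1540.0  # typical speed of sound in human soft tissue


def focus_delays_s(element_xyz, focus_xyz, c=SOUND_SPEED_M_S):
    """Per-element delays that align all channels at the focus.

    The element farthest from the focus gets zero delay; nearer elements
    are delayed by the difference in travel time.
    """
    element_xyz = np.asarray(element_xyz, dtype=float)
    focus_xyz = np.asarray(focus_xyz, dtype=float)
    distances = np.linalg.norm(element_xyz - focus_xyz, axis=1)
    return (distances.max() - distances) / c


def delay_and_sum(channel_data, delays_s, sample_rate_hz):
    """Delay each channel by a non-negative whole number of samples, then sum."""
    channel_data = np.asarray(channel_data, dtype=float)
    shifts = np.rint(np.asarray(delays_s) * sample_rate_hz).astype(int)
    n_samples = channel_data.shape[1]
    out = np.zeros(n_samples)
    for trace, shift in zip(channel_data, shifts):
        out[shift:] += trace[: n_samples - shift]
    return out
```

With three elements spaced 1 mm apart and a focus 10 mm in front of the center element, the center channel receives the echo first and is therefore the one that gets delayed; after alignment the three echoes add coherently in a single sample.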
The implementation of beam-formation can be either analog or digital, and
beam-formation can be applied in both the transmit and receive paths. Because of its denser integration, higher flexibility, and lower power consumption, digital beamforming is favored in modern systems.
Ultrasonic imaging systems often operate in both the near field (or Fresnel
zone) and the far field (or Fraunhofer zone) regions [28–30]. For a round, non-focused, single-element transducer, the boundary between the near field and the far
field regions is usually defined at (see footnote 2):
\[ L = \frac{D^2}{4\lambda}, \tag{2.1} \]
in which D is the diameter of the transducer surface and λ is the ultrasound
wavelength.
In the near field, the pressure amplitude varies drastically, with many local maxima and minima. This complex behavior is caused by the constructive
and destructive interference patterns of the ultrasound beam. In the far field, the
pressure amplitude decreases monotonically with distance and the ultrasound beam
diverges at the angle θ defined by sin(θ) = 1.22 λ/D.
At the boundary between the near and far fields, where the distance is roughly given
by Equation (2.1), the maximum pressure amplitude, or equivalently the maximum
ultrasound intensity, is reached, and the beamwidth is minimized at the same time.
According to [28–30], the effective beamwidth is approximately equal to half of the
transducer diameter D; the pressure amplitude is therefore about twice the
pressure amplitude at the transducer surface.
Because of this property, it is advantageous for ultrasonic imaging to operate close to the near-field / far-field boundary, for the best SNR and lateral resolution. As
a simple numerical example, a typical single-element transducer for
intracranial pressure (ICP) measurement has a diameter of about 1.5 cm [32, 33]. The typical operating frequency is 2 MHz and the typical ultrasound speed in human soft tissue is
1540 m/s [16], giving a wavelength of 0.77 mm. The boundary distance calculated from
Equation (2.1) is therefore 7.3 cm, which is about the same as the distance from the target
brain blood vessel to the transducer (see footnote 3).
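The numerical example can be reproduced with the short sketch below, which evaluates Equation (2.1) and the far-field divergence relation sin(θ) = 1.22 λ/D for the 1.5 cm, 2 MHz transducer. The function names are hypothetical, and the formulas apply to the round, non-focused, single-element case described above.

```python
import math


def wavelength_m(frequency_hz, c=1540.0):
    """Wavelength = c / f, with c the sound speed in the medium."""
    return c / frequency_hz


def near_field_boundary_m(diameter_m, wavelength):
    """Near/far field boundary L = D^2 / (4 * wavelength), Equation (2.1)."""
    return diameter_m ** 2 / (4.0 * wavelength)


def far_field_divergence_rad(diameter_m, wavelength):
    """Divergence angle from sin(theta) = 1.22 * wavelength / D."""
    return math.asin(1.22 * wavelength / diameter_m)


if __name__ == "__main__":
    lam = wavelength_m(2e6)  # 2 MHz in soft tissue
    print(f"wavelength: {lam * 1e3:.2f} mm")
    print(f"boundary distance: {near_field_boundary_m(0.015, lam) * 1e2:.1f} cm")
    print(f"divergence: {math.degrees(far_field_divergence_rad(0.015, lam)):.1f} deg")
```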
Because the system operates heavily in the near-field region, time-domain techniques
for beamforming and processing are common in ultrasonic imaging. Consequently,
Footnote 2: Depending on the application, there are many different definitions [31]. The one used in this thesis
is the most widely used in the medical ultrasound field.
Footnote 3: For transducers with more complex shapes and structures, the equations presented above
differ slightly by some factors. But the effective aperture size D can be used to approximate
the element diameter, and the conclusions about near field and far field remain largely the same.
the ultrasound pulses are short-duration, wideband signals to facilitate time based
algorithms.
In addition to the basic delay-and-sum beam-formation principle, several techniques are often used to improve the visualization, creating a more homogeneous
image quality throughout the full depth [8, 9, 12]. They have been applied in the imaging
experiments of this work.
• Dynamic focusing: Instead of a fixed array delay pattern for a fixed focal
point in space, the dynamic focusing technique implements a continuously
moving focal point across different imaging depths. The array elements are controlled to focus signals at a shallow depth at the beginning; as time progresses
(corresponding to increasing depth), the array delay pattern is gradually modified
to move the focal point deeper until the end of the imaging depth is reached.
Compared to a single focal point, dynamic focusing provides high detail resolution and high contrast resolution at all depths. It can be implemented relatively easily
by a digital beamformer on the receive side.
• Constant F-number imaging: The F-number (F#) is the ratio of the focal length
(f) to the imaging aperture diameter (D), as in (2.2).
\[ F\# = \frac{f}{D}. \tag{2.2} \]
It is an important concept in optics, photography, and ultrasound. In ultrasound, the constant F-number imaging technique keeps a constant F# by gradually enlarging the active aperture (D) as the focused imaging depth (f) grows
larger. The result is a constant lateral resolution, and the technique is
often used in conjunction with dynamic focusing.
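The two techniques above can be sketched together: the receive focus follows the echo arrival time, and the active aperture grows with depth to hold F# constant. This is a minimal illustration, assuming a linear array with uniform pitch; the F-number of 2, the 250 µm pitch, the 64-element array and the function names are hypothetical values, not parameters of the system in this thesis.

```python
SOUND_SPEED_M_S = 1540.0  # typical speed of sound in human soft tissue


def focal_depth_at_time_m(t_s, c=SOUND_SPEED_M_S):
    """Depth whose echo arrives at time t (round trip): the depth to focus on."""
    return c * t_s / 2.0


def active_elements(depth_m, f_number, pitch_m, n_elements):
    """Number of elements giving aperture D = f / F#, clipped to the array size."""
    wanted = int(round(depth_m / f_number / pitch_m))
    return max(1, min(n_elements, wanted))


if __name__ == "__main__":
    for t_us in (13.0, 26.0, 52.0, 97.4):
        depth = focal_depth_at_time_m(t_us * 1e-6)
        n = active_elements(depth, f_number=2.0, pitch_m=250e-6, n_elements=64)
        print(f"t = {t_us:5.1f} us -> focus {depth * 1e3:5.1f} mm, {n} active elements")
```

Once the requested aperture exceeds the physical array, the element count saturates and the F-number can no longer be held constant, which is why lateral resolution eventually degrades at large depths.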
2.3 Ultrasonic Transducers
Currently, using 1D ultrasonic transducer arrays to form 2D medical ultrasound images is
common practice [8, 12, 34–36]. The transducer arrays are usually built with piezoelectric materials. The element count of an array can be as high as one thousand. The
elements are connected to the electronics through coaxial cables.
3D ultrasonic imaging can be achieved by translating or rotating a 1D transducer
array over the space [3, 4], but the accuracy and speed are limited by the mechanical
movement. As a result, 2D transducer arrays and the supporting 2D electronics are
more desirable for 3D ultrasonic imaging. There are commercial 3D imaging systems
that use 2D transducer arrays. For example, the Philips Matrix X6-1 is a 2D array that
contains 9,212 elements [37]. However, cables are still needed for the interconnections
between the transducer probe and the data acquisition system, which might not be
the best solution for 3D imaging because of the high channel count. Additionally, these 2D
transducers have been built from piezoelectric materials [37, 38], where manual dicing
is often needed to separate individual array elements. The interconnection and yield
problems become more challenging as the array gets larger and the element size gets smaller.
The capacitive micromachined ultrasonic transducer (CMUT) [39–41] is an alternative to the traditional piezoelectric transducers (PZTs). CMUT technology offers advantages such as improved bandwidth, ease of fabricating large arrays, and the potential for integration with electronics using through-silicon vias (TSVs) [40, 42, 43]
or monolithic CMUT-CMOS integration [44–46].
But there are also challenges for CMUTs. Most importantly, the output power and
efficiency are still relatively low, partly due to the large parasitic device capacitance.
The primary reason for the large parasitic capacitance is the physical structure of
the CMUT element, which forms a parallel-plate capacitor [41]. As a result, the
transmitter and receiver circuitry that interfaces to a CMUT is different from that
for a PZT. The circuits need to be designed appropriately to prevent excessive performance
degradation caused by a load that is much more capacitive and has a higher impedance.
Piezoelectric micromachined ultrasonic transducers (PMUTs) are also emerging as
another possible 2D transducer solution for 3D imaging [47–51]. The PMUT combines
piezoelectric material with micromachining techniques, aiming to exploit the benefits
of both worlds. The piezoelectric material tends to provide transduction with
relatively high efficiency and good linearity, while the micromachining process helps
create fine-pitched 2D arrays with higher yield and reliability. As a technology in its
early research phase, it has shown initial success with a working 5x5 array [47]. More
work is being done to address problems with this technology, including how to
enhance the device bandwidth to generate images with better axial resolution, and
how to reduce the intrinsic device parasitic capacitance caused by the high permittivity of
the piezoelectric material [48, 49, 51].
In this thesis, we design block-level circuits for CMUT, but our architecture and
system innovations are not limited to a particular transducer type, as will be discussed
in succeeding chapters.
2.4 Field II Simulation Program
In our work, we make heavy use of the Field II Simulation Program [52, 53] to model
the complete hardware and software setup. Field II is a behavioral simulation package
running in the MATLAB (The MathWorks, Natick, MA) environment. Figure 2-3
shows a typical Field II simulation flow diagram. Users are free to
define the ultrasonic phantom (i.e., the medium being imaged by the system),
transducer properties, pulsing / receiving methods, beam-formation algorithms, and
image processing / display methods. Based on the user definitions, Field II simulates
the ultrasound transducer fields and ultrasonic imaging using linear acoustics.
The phantom definition is realized by specifying point scatterers in space with
different reflecting amplitudes. It can be a simple single scatterer phantom that
characterizes the point spread function of an imaging system; or complex shapes
defined by a set of scatterers. Moving structures can also be instantiated by a sequence
of phantoms with slight position changes over time, which is useful in simulations for
ultrasonic Doppler systems.
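The point-scatterer description of a phantom maps naturally onto two arrays: positions and reflection amplitudes. The sketch below shows that data structure in NumPy, including a moving phantom built as a sequence of slightly displaced copies. It is not Field II code (Field II runs in MATLAB) and does not use its API; the function names and example values are hypothetical.

```python
import numpy as np


def point_phantom(positions_m, amplitudes):
    """Return (positions, amplitudes) arrays for a set of point scatterers."""
    positions = np.asarray(positions_m, dtype=float).reshape(-1, 3)
    amps = np.asarray(amplitudes, dtype=float).reshape(-1)
    if len(positions) != len(amps):
        raise ValueError("one amplitude is required per scatterer")
    return positions, amps


def moving_phantom_sequence(positions_m, amplitudes, velocity_m_s, prp_s, n_frames):
    """One phantom per transmit event, each displaced by velocity * PRP."""
    positions, amps = point_phantom(positions_m, amplitudes)
    step = np.asarray(velocity_m_s, dtype=float) * prp_s
    return [(positions + k * step, amps) for k in range(n_frames)]


if __name__ == "__main__":
    # A single scatterer at 30 mm depth, moving away at 0.5 m/s (assumed values).
    frames = moving_phantom_sequence([[0.0, 0.0, 0.03]], [1.0],
                                     velocity_m_s=[0.0, 0.0, 0.5],
                                     prp_s=100e-6, n_frames=3)
    for k, (pos, _) in enumerate(frames):
        print(f"frame {k}: depth = {pos[0, 2] * 1e3:.2f} mm")
```

A single scatterer corresponds to the point spread function case mentioned above, while a larger set of positions and amplitudes describes more complex shapes.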
The transducers are defined with the type, frequency response and active aperture.
The transducer types include 1D, 1.5D, 2D arrays, as well as curved arrays with
concave or convex shapes. The transducer element dimensions can be freely specified
and the element frequency response is described by its impulse response. Transmit
Figure 2-3: A typical Field II flow diagram for ultrasonic system behavioral simulation.
and receive apertures are defined separately, while the active elements are selectable
within the array. Two other properties associated with the active apertures are the
focus and apodization. Through the focus specification, the beam-formation delays
can be automatically calculated for each element in an aperture. The apodization
gives amplitude weights for signals at different transducer elements. Both focus and
apodization can be functions of time, which is how dynamic focusing / apodization is
realized.
The pulsing excitation for the transducer is supplied to the array by a time-domain
pulse waveform. Based on the pulsation, phantom definition and transducer property,
the received echo waveforms from every element in the Rx aperture are produced by
the Field II simulator. Beam-formation is performed on the collected echo waveforms;
and the beamformed waveforms can then be used to construct a 2D or 3D image, or
further processed for Doppler information.
With the ultrasonic field simulation, Field II helps verify the acoustical physics
and visually show the ultrasonic pressure field generated by the transducer. With
the capability of incorporating different beam-formation algorithms, it allows the
development and validation of new architecture-level and system-level ideas. It could
also be used to model non-ideality from circuits and transducers, so that a practical
understanding of the real imaging system can be achieved. As will be seen in the
following chapters, Field II simulation plays an important role in the thesis work.
Chapter 3
The Column-Row-Parallel Architecture for 3D Ultrasonic Imaging
This chapter describes our approach to solving the challenges in realizing a 3D medical
ultrasonic imaging system. The analog front-end architectural trade-offs are discussed first, and the design process of the Column-Row-Parallel architecture is presented.
The implementation of the proposed architecture is then shown; it is both scalable
for hardware realization and flexible for software algorithm support. Finally, the functionality
of the implemented architecture is described.
3.1 The Prior Art of Architectures for 3D Ultrasonic Imaging
A 2D NxN transducer array is often used to acquire 3D volumetric data, where the
architecture of the front-end circuit interfacing to the transducer array is an important
design consideration.
The most straightforward way to interconnect to a 2D transducer array is to use
a fully-parallel architecture, but it is not very scalable for hardware implementation.
A fully-parallel architecture requires N^2 active transceivers operating at
the same time. As a result, it requires N^2 independent input control lines for the
transmitter array and N^2 output data lines for the receiver array. As the array size
grows, the required channel count grows correspondingly, and this is
difficult to scale up economically.
At the other extreme, a serialized system could be used to reduce channel count,
but it is usually too slow for data acquisition. One could serialize the input control
lines and/or the output data lines of the aforementioned fully-parallel system, so
that the number of interconnect lines needed is reduced. However, because of the large number
of channels to be serialized, the data rate requirement would become too high to be
practical, following a similar N^2 scaling trend. Alternatively, one could use a single-channel transceiver to sweep the 2D array, one element at a time. The transceiver is
connected to each element by multiplexing, and it repeatedly transmits and receives
ultrasound with different elements in the array to acquire a full data set [40]. Given
that one transmit-receive repetition could take as long as 100 µs (Section 2.1), and
that the total time needed to gather one full data set grows as N^2, the
image frame rate would suffer greatly as the array size grows.
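The two extremes can be compared numerically. The sketch below is a back-of-the-envelope estimate only: it assumes one transmit-receive event per element for the single-channel sweep and a 100 µs PRP, as in the text, and the function names are hypothetical.

```python
PRP_S = 100e-6  # one transmit-receive repetition, from Section 2.1


def fully_parallel_channels(n):
    """Active channels for an n x n array with one channel per element."""
    return n * n


def element_sweep_time_s(n, prp_s=PRP_S):
    """Time for a single-channel transceiver to visit every element once."""
    return n * n * prp_s


def max_volume_rate_hz(n, prp_s=PRP_S):
    """Upper bound on volume rate when each volume needs a full element sweep."""
    return 1.0 / element_sweep_time_s(n, prp_s)


if __name__ == "__main__":
    for n in (16, 32, 64):
        print(f"{n}x{n}: {fully_parallel_channels(n)} channels if fully parallel, "
              f"{element_sweep_time_s(n) * 1e3:.1f} ms per sweep if serialized, "
              f"at most {max_volume_rate_hz(n):.1f} volumes/s")
```

Both costs grow as N^2: the fully-parallel design pays in channel count, and the single-channel sweep pays in acquisition time.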
Therefore, to alleviate the conflict between hardware complexity and data acquisition speed in 3D ultrasonic imaging systems, there has been considerable research on various
sub-array architectures that lie between the fully-parallel architecture and the serialized single-channel architecture. In [43], the diagonal elements in a full 2D array
are used to form the receive aperture, while the rest of the 2D elements are used
to form the transmit aperture. On the transmit side, it is close to a fully-parallel
architecture because almost all elements are used. To provide the transmit
beam-formation delay pattern for all transmitters, the digital delay values are serially streamed in to program each transmitter. This saves interconnections but slows
down the programming speed. On the receive side, the output channel count is reduced from N^2 to N because only the diagonal sub-array elements are used. This
diagonal sub-array approach leads to an elevated side-lobe level that degrades the
image contrast. Similarly, [54] investigated the possibilities of various sparsely sampled
Figure 3-1: Column-parallel architecture implementations in the literature: (a) a
1D transducer array mechanically translated to scan the 3D space, with elevation beam-formation done by a synthetic virtual source technique, figure courtesy of [3]; (b)
a 2D array operated to receive row-by-row, with elevation beam-formation done by
sub-array delay-and-sum across the column using analog delay lines, figure courtesy
of [55].
2D aperture patterns. But because the sub-array is fixed once the pattern is chosen,
the reduction of active elements generally leads to higher side-lobes and worse image
resolution.
To avoid a fixed sub-array pattern selection, another sub-array approach using either
3x3 or 5x5 elements is described in [37]. The sub-arrays are programmable and each
sub-array performs beam-formation to compress the received data into one channel,
reducing the overall channel count by a factor of 9 or 25. To maintain the image
quality and avoid introducing artifacts, programmable delay patterns for the sub-array are required. This requirement translates directly into analog delay lines in a
hardware implementation, which tend to be bulky and power hungry.
In [3, 4], a conventional 1D transducer array is used as a sub-array and is mechanically translated or rotated to achieve synthetic 3D imaging, as shown in Figure
3-1(a). The active channel count is reduced to N and the synthetic beam-formation
technique can produce good image quality, as long as the object being imaged is
static or moving much more slowly than the image frame rate, to avoid motion artifacts. The major drawback of this solution is the mechanical implementation,
which is both a bottleneck for frame rate due to the slow movement speed, and a bottleneck for power saving due to the large amount of power needed to drive a motor.
More recently, to replace the mechanical translation, an electrical scanning front-end
architecture has been implemented as shown in Figure 3-1(b) [55–57]. The receiver channels
are turned on row-by-row to collect reflected ultrasound echoes. By activating different rows of transducer elements over consecutive ultrasound transmits, it effectively
mimics the translation of a 1D transducer array, but much faster and at lower power.
3.2 The Motivation of the Column-Row-Parallel ASIC Architecture
The works in [3] and [56, 57] both employ row-by-row (i.e., column-parallel) operation
to reduce the number of active channels from N^2 to N. The 3D image quality of
the column-parallel architecture is very good in the azimuth (X) direction because
each row can perform full beam-formation along the azimuth direction. However,
the beam-formation along the elevation (Y) direction is poor. Techniques such as
the synthetic virtual source [3] are used to enhance the focusing in elevation, with limited
success, in Figure 3-1(a). Analog delay lines have also been used to realize elevational
beam-focusing and achieve good imaging performance in Figure 3-1(b) [55]. But for the
same reason mentioned in the previous section, the analog delay lines lead to large
power and silicon area overhead, making system integration difficult.
To cover both azimuth and elevation directions for 3D volumetric imaging, a
column-row addressing scheme has been implemented for a 2D transducer design as
shown in Figure 3-2 [38, 58–60]. By dicing the transducer top plate row-by-row and
dicing the bottom plate column-by-column, the transducer can be driven row-by-row
in transmit (Figure 3-2(a)) and column-by-column in receive (Figure 3-2(b)). The
combined “Maltese cross” shaped beam-pattern (Figure 3-2(c)) makes it suitable to
carry out beam-formation both in azimuth and elevation directions. At the same
time, the interconnection complexity for the array is still kept at a linear growth
(2N).

Figure 3-2: The column-row addressing scheme implemented on a 256x256 2D transducer array: (a) row-by-row transmit addressing; (b) column-by-column receive addressing; (c) the “Maltese cross” beam-pattern. Figure courtesy of [38].
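As a rough illustration of the scaling argument above, the following sketch tabulates the interconnect count of an N x N array under the three schemes discussed so far. The function name and dictionary keys are hypothetical labels introduced only for this example.

```python
def channel_counts(n: int) -> dict:
    """Interconnect counts for an n x n 2D transducer array."""
    return {
        # One dedicated channel per element (fully sampled 2D array).
        "fully_sampled": n * n,
        # Row-by-row (column-parallel) operation: N active channels.
        "column_parallel": n,
        # Transducer-level column-row addressing: N row lines plus N column lines.
        "column_row_addressed": 2 * n,
    }


for n in (16, 64, 256):
    print(n, channel_counts(n))
```

The quadratic term is what makes a fully sampled array impractical at large N, while both addressing schemes grow linearly.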
The column-row addressing implemented at the transducer level has shown potential to be a balanced architecture solution for both good image performance and
hardware scalability. However, it still suffers from a lack of flexibility, because the
transducer array is hard-wired to be divided into rows and columns. Being able to
address the elements only one row or one column at a time leaves limited
freedom for the supporting algorithm design. On the other hand, if one could implement a similar column-row addressing architecture at the circuit-level instead of
at the transducer-level, as depicted in Figure 3-3, the element addressing mechanism
could be much more flexible. With the highly programmable control support from the
electronics, various sub-array patterns could be possible on the same system, allowing
more versatile functionality and more design freedom at the system-level.
Figure 3-3: A column-row addressing architecture implemented at the circuit-level,
with column and row interconnections that reduce the system channel count and
provide maximum flexibility for algorithms.
3.3 The Column-Row-Parallel ASIC Architecture
In our work, a Column-Row-Parallel architecture is implemented at the circuit-level
with much more diverse functionality and a better trade-off between complexity and
speed. Figure 3-3 in the previous section is a conceptual drawing of the proposed
architecture, while Figure 3-4 shows a detailed picture. 2D CMUTs are chosen as
the target transducer arrays for this work, because of their ease of integration and
scalability [39, 40, 43]. But the same architecture design can easily be applied to other types
of 2D ultrasonic transducers.
As shown in Figure 3-4, a 2D CMUT (16x16 transducer arrays are used in this
work) is DC biased at 30-50V from the common top membrane and each CMUT
element’s bottom pad is connected to its corresponding ASIC channel. The DC bias
network is provided off-chip with the resistor and the capacitor being shared across
all CMUT elements in the array [40, 41]. As indicated by both Figure 3-3 and Figure
3-4, there is a transmitter (Tx) pulser, a receiver (Rx) low noise amplifier (LNA),
and a receiver high voltage (HV) protection switch per electronic channel, under each
CMUT element. The total silicon layout area of a transceiver is designed to be the
same as a CMUT element’s area, which is 250µm × 250µm in this work, so that the
ASIC channels can be element-matched to the CMUT pitch. The Tx pulser gate
drivers and Rx buffer amplifiers are placed at the ASIC perimeter to interface to the
transceiver array. There are 16 copies of Tx drivers and Rx buffers at the column
side and another 16 copies at the row side, reducing the ASIC I/Os down to “N” (see footnote 1).

Figure 3-4: Column-Row-Parallel architecture block diagram. The CMUT and ASIC
chips are stacked vertically. [Block labels in the figure: Shared External Biasing, CMUT, ASIC, Column Select Logic, Rx, BUF, Delay, Gate Dr, Column Circuitry.]
Zooming into one transceiver channel located at the i-th column and the j-th row, Figure
3-5 shows that Tx and Rx operations are independent and time-multiplexed. The
control inputs of the transceiver channel include: the i-th column select signals (Tc[i],
Rc[i]) supplied from the column side, the j-th row select signals (Tr[j], Rr[j]) from the
row side, and the local per-element enable bits (T_en, R_en). The column and row
select signals are designed to be active at only one side at a time; they cannot be asserted at
the same time. The signals are input to the per-element logic unit, shown in Figure
3-5(b), to generate the corresponding internal switch controls: Tr, Tc, Rr, Rc,
and RxSw.
Tr and Tc determine whether the Tx pulser is driven by the column side or the
row side, or none, in which case the pulser is turned off. When the Tx element [i, j]
is enabled (T_en = 1) and the j-th Tx row is selected (Tr[j] = 1), the internal switch
control signal Tr becomes high and the Tx pulser gate drive signals are supplied
from the Column Gate Driver[i]. The array’s Tx path is in column-parallel mode.
When the Tx element [i, j] is enabled (T_en = 1) and the i-th Tx column is selected
(Tc[i] = 1), the internal switch control signal Tc becomes high and the Tx pulser
gate drive signals are supplied from the Row Gate Driver[j]. The array’s Tx path
is in row-parallel mode. When the Tx element [i, j] is disabled (T_en = 0), or when
neither the Tx row nor the Tx column is selected (Tr[j] = Tc[i] = 0), both Tr and Tc are low
and the Tx pulser is turned off, ignoring gate drive signals from both column and row
gate drivers.
Similarly, Rr and Rc determine whether the Rx LNA outputs its analog signal
to the column side or the row side, or none, in which case the LNA is turned off.
When the Rx element [i, j] is enabled (R_en = 1) and the j-th Rx row is selected
(Rr[j] = 1), the internal switch control signal Rr becomes high and the Rx LNA
output is connected to the Column Buffer[i]. The array’s Rx path is in column-parallel mode. When the Rx element [i, j] is enabled (R_en = 1) and the i-th Rx
column is selected (Rc[i] = 1), the internal switch control signal Rc becomes high
and the Rx LNA output is connected to the Row Buffer[j]. The array’s Rx path
is in row-parallel mode. When the Rx element [i, j] is disabled (R_en = 0), or when
neither the Rx row nor the Rx column is selected (Rr[j] = Rc[i] = 0), both Rr and Rc are low
and the Rx LNA is turned off, presenting a high output impedance to both column
and row buffers.

Footnote 1: Figure 3-6 shows the I/Os are N instead of 2N.

Figure 3-5: (a) The block-level implementation of one transceiver channel and (b) the
per-element logic implementation. Column and row select logic is implemented with
shift registers that can be reprogrammed in “N” time (implementation detail will be
shown in Figure 5-2). [Figure labels in (a): Transceiver [i, j] with inputs Tc[i], Rc[i], Tr[j], Rr[j], T_en and R_en; Row Gate Driver[j]; Row BUF[j]; Column Gate Driver[i]; Column BUF[i]. Logic input pairs shown in (b): Tc[i] with T_en; Tr[j] with T_en; Rc[i]+Rr[j] with R_en; Rc[i] with R_en; Rr[j] with R_en.]
The Rx HV protection switch protects low voltage Rx electronics from high voltage
Tx transients. An additional internal control signal, RxSw, is generated to control
the gate of the protection switch. Whenever the Rx LNA is activated and connected
to either the column or the row buffer, the HV switch is turned on (RxSw = 1) to allow
the CMUT signal to reach the LNA for amplification. The HV switch is off when the LNA
is not activated, and it also remains off during Tx pulsing to isolate the high voltage
pulsing transients.
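The per-element logic described above can be summarized in a short behavioral sketch. It assumes that each internal control is the AND of a select signal with the matching enable bit, which is how the text and the input pairs of Figure 3-5(b) read; the function and type names are hypothetical, and the Tx/Rx time-multiplexing that keeps RxSw off during pulsing is assumed to be handled by the timing of the select signals rather than modeled here.

```python
from typing import NamedTuple


class SwitchControls(NamedTuple):
    Tc: bool    # Tx pulser driven from Row Gate Driver[j] (row-parallel)
    Tr: bool    # Tx pulser driven from Column Gate Driver[i] (column-parallel)
    Rc: bool    # LNA output connected to Row Buffer[j] (row-parallel)
    Rr: bool    # LNA output connected to Column Buffer[i] (column-parallel)
    RxSw: bool  # HV protection switch closed


def per_element_logic(tc_i, tr_j, rc_i, rr_j, t_en, r_en) -> SwitchControls:
    """Behavioral model of the logic under element [i, j] (assumed AND gating)."""
    # Column and row selects are active at only one side at a time.
    if tc_i and tr_j:
        raise ValueError("Tc[i] and Tr[j] cannot both be asserted")
    if rc_i and rr_j:
        raise ValueError("Rc[i] and Rr[j] cannot both be asserted")
    return SwitchControls(
        Tc=bool(tc_i and t_en),
        Tr=bool(tr_j and t_en),
        Rc=bool(rc_i and r_en),
        Rr=bool(rr_j and r_en),
        RxSw=bool((rc_i or rr_j) and r_en),
    )


# Column-parallel transmit: row j selected, element enabled for Tx only.
print(per_element_logic(tc_i=0, tr_j=1, rc_i=0, rr_j=0, t_en=1, r_en=0))
```

With all enables low the function returns all-false controls, which corresponds to the pulser and the LNA both being turned off.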
The detailed circuit implementation for generating column / row select signals as
well as the per-element enable bits will be the topic of Chapter 5. But as a high-level
description, these selection and enable bits are stored in shift registers (SR’s) which
can be programmed serially. The column and row select signals are 16-bit long for the
16 columns and rows, while the per-element enable bits are 512-bit long, accounting
for 1-bit Tx enabling and 1-bit Rx enabling for each CMUT element in the 16x16
array. Furthermore, two multiplexed banks for each control set are implemented.
For example, there are two multiplexed 512-bit SR banks for per-element enable bit
programming. One SR bank can be used in normal operation while the other bank
is being reprogrammed. Alternatively, both SR banks can be programmed in advance so that
one could quickly alternate between the two banks to achieve fast aperture switching
between two pre-defined aperture patterns.
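The double-banked shift register scheme can be pictured with a small behavioral model. This is only a sketch under simplifying assumptions: the class and method names are hypothetical, each bank is modeled as a single serial chain, and the circuit-level implementation is the subject of Chapter 5.

```python
class BankedShiftRegister:
    """Two serially programmed banks; only one drives the array at a time."""

    def __init__(self, length: int):
        self.length = length
        self.banks = [[0] * length, [0] * length]
        self.active = 0  # index of the bank currently driving the array

    def shift_in(self, bank: int, bit: int) -> None:
        """One clock cycle: shift a single bit into the chosen (idle) bank."""
        if bank == self.active:
            raise ValueError("this sketch only reprograms the idle bank")
        self.banks[bank] = [bit & 1] + self.banks[bank][:-1]

    def program(self, bank: int, bits) -> int:
        """Serially load a full pattern; returns the clock cycles used."""
        if len(bits) != self.length:
            raise ValueError("pattern length must match register length")
        for bit in reversed(bits):  # last bit first, so bits[0] ends at index 0
            self.shift_in(bank, bit)
        return len(bits)

    def swap(self) -> None:
        """Switch the array to the other bank (fast aperture switching)."""
        self.active ^= 1

    def outputs(self):
        return list(self.banks[self.active])


row_select = BankedShiftRegister(16)    # 16-bit select register
enable_bits = BankedShiftRegister(512)  # 2 bits x 256 elements
cycles = row_select.program(1, [1, 1] + [0] * 14)  # two rows active
row_select.swap()
print(cycles, row_select.outputs())
```

In this model, loading the 16-bit select register costs 16 clock cycles, matching the N-cycle reprogramming described in the text, and `swap` stands in for the bank multiplexer.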
Lastly, because only the column side or the row side is activated at any one time, the
column and row circuits share I/O ports by multiplexing, as shown in Figure 3-6 (see footnote 2).
For Tx, the multiplexing switches are implemented with digital logic gates; for Rx,
the multiplexing switches are implemented with analog pass-gates for analog signal
outputs. In this way, the input ports for Tx beamforming control and output ports
for Rx received waveforms are both 16 instead of 32 for a 16x16 array, reducing the chip
I/O count considerably. The chip’s interface scaling trend becomes N (rather
than 2N), which is the same trend as a 1D array for 2D imaging.

Figure 3-6: (a) Tx input port multiplexing, implemented with digital logic; (b) Rx
output port multiplexing, implemented with analog pass-gates. [Figure labels: Tx_IN[0..15], Row Gate Driver[0..15], Column Gate Driver[0..15]; Row BUF[0..15], Column BUF[0..15], Rx_OUT[0..15].]
3.4 The Functionality of the Column-Row-Parallel Architecture
In this section, a few examples will be utilized to help understand how the proposed
Column-Row-Parallel ASIC architecture could be used for 3D ultrasonic imaging.
Figure 3-7 shows an exemplary configuration of a column-parallel mode Tx aperture on the 16x16 CMUT-ASIC system. Note that the exemplary configuration is
broken down and illustrated in steps to help understanding, but the actual ASIC
configuration is carried out as a whole in one step. In this example, two of the 16
row select signals are turned on so that the two rows of Tx elements are activated, as
shown by the red squares in Figure 3-7(a). Because the array is operating in column-parallel mode, all elements along the same column are in parallel as shown by the
red column connection lines in Figure 3-7(b). The elements on the same column are
driven by a shared Tx column gate driver as in Figure 3-7(c). Because the 16 column
gate drivers can be controlled independently, by supplying the driver signals with
different delay timings, the 16 Tx columns emit ultrasonic waves at slightly different
timing with respect to each other. This delay pattern could be configured to perform
ultrasonic beam-focusing or beam-steering along the azimuth direction, as shown in
Figure 3-7(d).

Footnote 2: This implementation detail is not shown in most other block diagram figures to avoid complication.
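The delay pattern for the 16 column gate drivers can be sketched with simple geometry. The example below computes transmit focusing delays for a line focus in the azimuth plane; the function name, the 20 mm focal depth and the 1500 m/s speed of sound are assumptions made for illustration, and only the 16 columns and the 250µm pitch come from the text.

```python
import math


def tx_focus_delays(n_cols=16, pitch_m=250e-6, focus_x_m=0.0,
                    focus_z_m=20e-3, c_m_per_s=1500.0):
    """Per-column firing delays (s) so all wavefronts reach the focus together."""
    center = (n_cols - 1) / 2.0
    # Azimuth position of each column, centered on the array.
    xs = [(i - center) * pitch_m for i in range(n_cols)]
    # Path length from each column to the focal point.
    path = [math.hypot(x - focus_x_m, focus_z_m) for x in xs]
    longest = max(path)
    # The farthest column fires first (zero delay); nearer columns wait.
    return [(longest - p) / c_m_per_s for p in path]


delays = tx_focus_delays()
for i, d in enumerate(delays):
    print(f"D{i}: {d * 1e9:.1f} ns")
```

Setting `focus_x_m` to a non-zero value steers the focus off-axis; the same function then yields an asymmetric delay profile.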
Figure 3-8 shows another exemplary configuration, in which a row-parallel mode
Rx aperture is programmed on the 16x16 CMUT-ASIC system. Five columns are
activated in this example by the column select signals, so the five active Rx elements
on each row are in parallel. Their outputs are combined in the analog domain,
and the combined signal is buffered by a shared Rx row buffer. The 16 analog outputs are digitized by
off-chip ADCs. Afterwards, the digitized channel data can be processed digitally to
perform beam-formation along the elevation direction.
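The row-parallel receive combination can be modeled behaviorally as follows. This sketch idealizes the analog combination as a plain sum of the active elements on each row; the actual analog combining behavior (gain, averaging, loading) is a circuit-level matter not specified here, and the function and variable names are hypothetical.

```python
def row_parallel_rx(samples, rc_select, r_en):
    """Combine element outputs per row for one time instant.

    samples[j][i]: LNA output of the element at row j, column i.
    rc_select[i]:  Rx column select bit Rc[i].
    r_en[j][i]:    per-element Rx enable bit.
    Returns one value per row buffer.
    """
    outputs = []
    for j, row in enumerate(samples):
        total = 0.0
        for i, s in enumerate(row):
            if rc_select[i] and r_en[j][i]:
                total += s  # idealized analog summation on the shared row line
        outputs.append(total)
    return outputs


N = 16
samples = [[1.0] * N for _ in range(N)]
rc_select = [1 if 5 <= i < 10 else 0 for i in range(N)]  # five active columns
r_en = [[1] * N for _ in range(N)]
row_outputs = row_parallel_rx(samples, rc_select, r_en)
print(row_outputs)
```

The 16 row outputs are what the off-chip ADCs would digitize before elevation beam-formation is applied in software.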
As mentioned in the previous section, the Tx and Rx paths are completely
independent. Therefore, the Tx path can also be configured in row-parallel mode
and the Rx path in column-parallel mode. The number of active rows or
columns is also programmable depending on the need. Reprogramming
the active rows and columns is fast, because the row and column select signals are
generated at the side of the array and it only takes N clock cycles, making it scalable
as the array size grows.
When multiple rows (the case for columns is similar) are activated, they operate
in parallel and effectively behave as a “thicker” row compared to when only one row
is activated. The azimuth beam-focusing is the same while the additional elevation
thickness could provide larger signal strength. This feature offers freedom at the
system-level. As will be seen in Chapter 4, different numbers of rows or columns
can be selected for transmit or receive, to achieve the desired imaging requirements
(volume rate, resolution, etc.). The innovative circuit structures realizing the feature
are discussed in Chapter 5.

Figure 3-7: The architecture configured in a column-parallel mode for the Tx aperture.
The configuration is broken down and illustrated in steps (a) through (d) to help
understanding. Two rows are activated as the Tx aperture and beam-formation along
azimuth (X) direction is achieved. [Figure labels: Row Select Logic; Column Select Logic; Column Tx Drivers with beamform delays D0, D1, ..., D15; Tx beamform in X (azimuth).]

Figure 3-8: The architecture configured in a row-parallel mode for the Rx aperture.
Five columns are activated as the Rx aperture and beam-formation along elevation
(Y) direction is achieved.
In addition to row-by-row or column-by-column operations, the array can also
be programmed into more complex aperture patterns for specific ultrasonic imaging
applications. This programming is accomplished through the proper use of the per-element enable bits under each element for both Tx and Rx paths.
For example, in Figure 3-9(a), only the diagonal elements are configured with a
Rx per-element enable bit of 1 (R_en = 1) while all other Rx elements’ enable bits
are 0. The system is in row-parallel mode and all 16 column select signals are on,
so that the 16 diagonal Rx elements receive ultrasound echoes and output to the
16 row buffers. In this way, a diagonal Rx aperture is formed, achieving the same
functionality as described in [43].
Figure 3-9(b) shows another example, where a checker board pattern is activated
for the Tx path. All 16 Tx row select signals are activated while Tx per-element enable
bits define the checker board pattern inside the array. The column Tx gate drivers
supply the same delay profile for the 16 columns so that effectively all activated Tx
elements emit ultrasound pulses in-phase. This checker board Tx aperture could help
reduce second harmonic generation for the emitted ultrasound pressure field, which
is useful in ultrasonic harmonic imaging applications. It will be discussed in more
detail in Chapter 4.

Figure 3-9: More use examples of the proposed architecture: (a) a diagonal Rx
aperture; (b) a checker board Tx aperture for ultrasonic harmonic imaging; (c) & (d)
annular ring Tx and Rx apertures for forward-looking ultrasonic imaging applications.
The annular ring apertures are shown in Figure 3-9(c) and (d) for Tx and Rx
paths respectively. The ring shapes are adjustable in either column-parallel or row-parallel mode. The Tx elements activated for the annular ring are driven in-phase
as indicated by the same delay values supplied by the row gate drivers in Figure 3-9(c). Similarly, the Rx annular ring outputs the received ultrasound echoes through
the column buffers. The digitized waveforms from different channels can be summed
in-phase to form a single annular ring Rx waveform. The application of annular ring
apertures for forward-looking 3D ultrasonic imaging will be presented in Chapter 4.
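The aperture patterns described above amount to different 16x16 maps of per-element enable bits. The following sketch generates such maps; the function names are hypothetical, and the ring radii (expressed in element pitches from the array center) are illustrative values rather than the ones used in the experiments.

```python
import math

N = 16


def diagonal_mask(n=N):
    """Enable only the elements on the main diagonal."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]


def checkerboard_mask(n=N):
    """Enable every other element in a checker board pattern."""
    return [[1 if (i + j) % 2 == 0 else 0 for i in range(n)] for j in range(n)]


def annular_ring_mask(r_inner, r_outer, n=N):
    """Enable elements whose distance from the array center lies in the ring."""
    c = (n - 1) / 2.0
    return [[1 if r_inner <= math.hypot(i - c, j - c) <= r_outer else 0
             for i in range(n)] for j in range(n)]


def show(mask):
    for row in mask:
        print("".join("#" if bit else "." for bit in row))


show(annular_ring_mask(5.0, 7.0))
```

Each mask would be loaded into the Tx or Rx per-element enable bits, with all row (or column) select signals asserted so that the mask alone defines the active aperture.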
3.5 Summary
The Column-Row-Parallel architecture provides both scalability and flexibility. First,
the column and row select signals are fast to reprogram, with a reprogramming time that scales linearly
as the 2D array size grows (“N” scaling trend). They can activate rows or
columns for beam-formation in azimuth (X) or elevation (Y) directions. Second,
per-element enable bits offer fine granularity to form application-specific patterns,
such as the diagonal, checker board and the annular ring apertures. Moreover, each
control set has two multiplexed SR banks, which allow normal operation based on
one bank while reprogramming the other, or fast aperture switching between two
pre-programmed banks. Lastly, the architecture is compatible with many existing
beam-formation schemes [38, 40, 43, 55, 58], while offering new possibilities as will be
shown later.
Chapter 4
3D Ultrasonic Imaging System Experiments
In this chapter, the system-level 3D ultrasonic imaging experiments are described.
The imaging system is assembled based on our custom designed prototype analog
front-end chip implementing the proposed Column-Row-Parallel architecture, interfacing to a 16x16 2D CMUT transducer array. The detailed design, implementation
and characterization of the AFE chip will be described in Chapters 5 and 6, but here
we will focus on the system-level capability of the proposed architecture and various
beam-formation algorithms suitable for the architecture.
4.1 The Hardware System Assembly
The experiments are conducted on the real integrated hardware system, in
which a 16x16 CMUT chip and a 16x16 AFE custom chip are integrated as a complete
3D ultrasonic imaging front-end. As mentioned in Chapter 3, the layout area of
each AFE transceiver channel is element-matched to each CMUT element with a
size of 250µm × 250µm, so that the 16x16 array area of the CMUT and the ASIC
is matched and can be vertically integrated. For integration, each CMUT element
provides the electrical interconnection using a through silicon via (TSV) to a bonding
pad at the bottom side of the die, as has been described by many papers from CMUT
literature [39–43, 61]. Each AFE channel of the ASIC also provides a flip-chip bonding
pad. Solder balls have been placed onto all ASIC pads with a solder bumping process
as one of the final steps in ASIC fabrication by the foundry. Figure 4-1 shows how the
CMUT and ASIC are integrated together to form a 3D ultrasonic imaging front-end
system. To interconnect to both the CMUT die and the ASIC die while providing
footprint flexibility, a PCB interposer is fabricated and flip-chip bonded to the
CMUT and the ASIC on its two sides respectively [62, 63]. The PCB vias directly connect
an individual CMUT element to its ASIC transceiver channel. The CMUT-PCB-ASIC assembly is then plugged into the main testing PCB for measurements. The
oil tank contains vegetable oil as an in-vitro approximation to human fat and a 3D
translation stage is made to help hold various measurement tools or imaging phantoms
for experiments.

Figure 4-1: System integration diagram showing the flip-chip bonding connection
between CMUT and ASIC through a PCB interposer. The figure also shows the
mechanical setup for imaging experiments, including an oil tank and a 3D translation
stage.
The actual test setup picture is shown in Figure 4-2, and the corresponding block
diagram is shown in Figure 4-3 to give an abstract view.

Figure 4-2: The picture of the hardware system setup. [Photo labels: 16-channel Data Acquisition System; A Metal Ring Phantom on top of CMUT; 3D Translation Stage; Holder; Main Testing PCB; Tank with vegetable oil.]

Figure 4-3: The block diagram of the hardware system setup. [Blocks: Main Testing PCB; FPGA Control (ASIC Initialization, DC-DC Converter Control, Tx / Rx Switching, Tx Beamforming, Rx Gain Control, Column / Row Mode Select, Column / Row Select); PC (Rx Beamforming, 3D Image Display); Power Supplies (HV, analog, digital, etc.); Phantom & Measurement Setup; 16-ch Data Acquisition.]

Figure 4-4: The 16x16 CMUT die drawings: (a) the footprint of the CMUT; (b) the
CMUT flip-chip bonding pad metal structure drawing, courtesy of [40]. [Drawing labels: dimensions 4mm and 4.5mm; 2x16 pads to CMUT common top membrane; 16x16 pads to individual CMUT elements’ bottom plate; gap is 250um or 373.75um; pitch is 250um; CMUT height ~0.5μm.]
4.1.1 The PCB-CMUT Connection
At the PCB-CMUT connection side, the footprint of the CMUT samples comes with
two different possible configurations. As shown by Figure 4-4(a), the CMUT main
elements’ pad array is 16x16 with the pitch of 250µm, and there is an additional 2x16
pad array used for connection to the common top membrane to provide the DC bias
voltage, or the CMUT’s “ground”. The gap between the “ground” and the main array
can be either 250µm or 373.75µm, depending on the specific CMUT batches made
at the supplier. This necessitates the PCB interposer, so that different
PCBs can be designed to fit different footprints. The CMUT flip-chip bonding pad
metal structure is also shown in Figure 4-4(b). The pad metal stack is composed of
Ti-Cu-Au and the pad diameter is 50µm at a pitch of 250µm.
Two PCB designs are correspondingly made to adapt to the two different CMUT
footprints, as shown in Figure 4-5(a) and (b). In version A, the gap between the
“ground” and the main array is 250µm, and all 18x16 pads are made at a pitch of
250µm. In version B, because the gap is 373.75µm, only 1x16 pads for “ground”,
instead of 2x16 pads, are laid out on the PCB due to space limitations. But because
the 2x16 pads are redundant, the omission still allows correct electrical connection
between PCB and CMUT. All PCB pads’ pitch is also 250µm.

Figure 4-5: The two different PCB designs made to fit CMUT footprints: (a) the
PCB version A’s footprint for CMUT with a gap distance of 250µm; (b) the PCB
version B’s footprint for CMUT with a gap distance of 373.75µm, where only 1x16 pads are
made on the PCB side due to space limitations.
Because the CMUT pads are not solder bumped at initial fabrication, and
it is difficult to do solder bumping on an individual die, we need to perform solder
bumping for the pads on the PCB side, so that the flip-chip bonding can still be made
between PCB and CMUT. To accommodate PCB solder bumping, the PCB pads are
made with electroless nickel immersion gold (ENIG) with a metal stack of Ni-Cu-Au.
The pad diameter is 190µm. Because the pad pitch is 250µm, it leaves a clearance
of 60µm between two pads. The pads are drilled into vias of 150µm diameter with
a mechanical drill (see footnote 1). The vias are filled and plated with ENIG at both sides of the
PCB. The solder mask is then applied over the pad with laser direct imaging (LDI)
technology, to define a solder mask thickness of roughly 13µm and a pad opening
diameter of 100µm. The solder mask thickness and the pad opening size are defined
such that a solder ball with a diameter of 100µm can be placed onto the PCB pad. The
drawing of a PCB pad bumped with a solder ball is shown in Figure 4-6. The PCB
interposer is fabricated with FR4 material with a thickness of 0.76mm. The solder
balls have a commonly used composition of 63% Sn and 37% Pb, with a diameter of
100µm. Both versions of the PCB are fabricated by Sierra Circuits, Inc., Sunnyvale,
CA; and the PCB solder bumping is performed by Pac Tech - Packaging Technologies,
Santa Clara, CA.

Footnote 1: A laser drill could produce even smaller holes, but the smaller holes cannot be epoxy filled and
plated over. As a result, 150µm mechanical drills are used.

Figure 4-6: The drawing of a PCB pad defined with a solder mask, and bumped with
a solder ball. The PCB pad is used to do flip-chip bonding to the CMUT die. [Drawing labels: CMUT die 4.5mm x 4mm; all pitch = 250um; pad open = 4mil (100um); solder mask thickness = 0.5mil; pad finish: ENIG (Ni-Cu-Au); pad size = 7.5mil (190um); PCB size ~2inch x 2inch, thickness ~30mil, FR4.]
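The PCB pad dimensions are quoted in micrometers in the text and in mils on the Figure 4-6 drawing. A quick consistency check, written as a small sketch with hypothetical variable names, confirms that the two sets of numbers agree to within rounding and reproduces the 60µm pad clearance.

```python
MIL_TO_UM = 25.4  # 1 mil = 0.001 inch = 25.4 micrometers

# Values quoted in the text (micrometers).
pitch_um = 250.0
pad_diameter_um = 190.0
clearance_um = pitch_um - pad_diameter_um  # spacing left between adjacent pads

# Values labeled on the drawing (mils), converted to micrometers.
drawing_mil = {
    "pad size": 7.5,
    "pad opening": 4.0,
    "solder mask thickness": 0.5,
    "PCB thickness": 30.0,
}
drawing_um = {name: mil * MIL_TO_UM for name, mil in drawing_mil.items()}

print(f"pad clearance: {clearance_um:.0f} um")
for name, um in drawing_um.items():
    print(f"{name}: {um:.1f} um")
```

The converted values (190.5µm, 101.6µm, 12.7µm and 762µm) round to the 190µm, 100µm, 13µm and 0.76mm figures used in the text.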
4.1.2 The PCB-ASIC Connection
At the PCB-ASIC connection side, the ASIC die is already solder bumped. Therefore,
the PCB pads at the ASIC side are without solder bumps and are used to do flip-chip
bonding directly. The ASIC footprint is shown in Figure 4-7(a). The center area of
the ASIC is occupied by a grid of 18x16 pads, which are the 16x16 AFE channels and the
2x16 CMUT biasing pads. They are element-matched to the CMUT die’s connecting
pads through the PCB interposer. The perimeter of the ASIC is a ring of pads
with 2-pad width, which are used as the I/O pads, providing the ASIC’s power supplies,
ground, input controls and output signals. These ASIC I/O pads are also flip-chip
bonded to the PCB interposer, and are further routed to the four edges of the PCB,
for interconnection to the main testing PCB, as shown in Figure 4-7(b). The fact that
the I/O pad ring is of 2-pad width ensures that only a 2-layer PCB design is needed.
Since the PCB interposer has a fine pitch of 250µm and a wire spacing as
tight as 60µm, keeping the PCB layer requirement to the minimum can help reduce
the manufacturing cost greatly.

Figure 4-7: The ASIC die drawings: (a) the footprint of the ASIC, containing the
center 18x16 pads to be element-matched and connected to CMUT through the PCB
interposer, and the surrounding I/O pads; (b) the PCB interposer layout design that
allows the ASIC I/O pads to be routed out to the PCB edges. [Drawing labels: dimensions 5.5mm and 6mm; 2x16 pads providing CMUT bias; 16x16 AFE channels; 18x16 pads connecting to CMUT through PCB; surrounding 2x pad rings are for ASIC I/Os; pitch is 250um.]
In Figure 4-8, the ASIC flip-chip bonding pad’s metal structure is depicted. The
structure is made using the dedicated metal layers for flip-chip bonding pads provided
by the silicon process, in which MD is the redistribution metal layer for routing
from the ASIC’s top metal (M6) to the flip-chip bonding pads, and the Under
Bump Metallurgy (UBM) is the material forming the pad structure under the solder
bump.

Figure 4-8: The ASIC flip-chip bonding pad metal structure drawings: (a) the horizontal view of a flip-chip bonding pad in ASIC; (b) the cross-sectional view of the
ASIC flip-chip bonding pad. [Drawing labels: solder ball height after bumping ≈ 80um; solder ball diameter ≈ 100um; pad size (UBM) = “C” = 80um; pad open size = “A” = 50um.]
4.1.3 The Flip-Chip Bonding Assembly Process
The bonding process is performed on a FC150 flip-chip bonder (SET Corporation SA,
Smart Equipment Technology, France). The process contains the flip-chip bonding
steps between PCB-CMUT and PCB-ASIC respectively. Each side’s assembly is first
tested and verified to be working with spare CMUT and ASIC chips, in which different
process parameters, such as the tacky flux, the bonding force, reflow temperature
profile, etc., are tweaked for an optimal result. Afterwards, a two-step bonding process
is performed to obtain the full assembly.
First Step: Bonding between PCB-ASIC
The first step is the flip-chip bonding between PCB and ASIC. As shown in Figure
4-9(a), the PCB is picked up by the arm (chip holder) of the flip-chip bonder, with
ASIC-side PCB pads facing down; and the ASIC is fixed horizontally by the chuck
(substrate holder) of the flip-chip bonder, with the solder-bumped ASIC pads facing
up. Tacky flux is applied onto the ASIC chip. 3000 grams of bonding force is applied
and the bonded assembly is reflowed in a Centrotherm Reflow Oven, with a peak
temperature of 215°C and a dwell time of 12 seconds. The reflow is done in N2
atmosphere. After that, the half-finished assembly is cleaned in propanol overnight.

Figure 4-9: The CMUT-PCB-ASIC two-step flip-chip bonding process: (a) first step,
the bonding between PCB and ASIC; (b) second step, the bonding between PCB and
CMUT, with ASIC already bonded to PCB. [Figure labels: Arm (chip holder); Chuck (substrate holder); CMUT; PCB; ASIC.]
The PCB-ASIC connection shows a success rate of close to 100%, and a picture
of the connection is shown in Figure 4-10(a). Optionally, PCB-ASIC connection can
be verified by doing electrical tests on ASIC through the PCB interconnections. If
the ASIC operates as expected, the connections are very likely to be good since all
perimeter I/O pad connections are verified to be normal. During the flip-chip bonding,
because the arm vacuum holder holds the PCB by its CMUT-side, the solder bumps
at the PCB’s CMUT side are slightly deformed. However, since the bonded assembly
goes through a reflow process, any solder ball deformation is restored after the reflow.
Figure 4-10(b) shows the solder balls at PCB’s CMUT side after the reflow, and it
shows good uniformity in shape.
Second Step: Bonding between PCB-CMUT
The second step is the flip-chip bonding between PCB and CMUT. As shown in
Figure 4-9(b), the CMUT die is picked up by the arm, with its pads facing down
and its membrane surface touching the vacuum holder. The PCB-ASIC assembly
is fixed horizontally by the chuck, with the solder-bumped PCB’s CMUT-side pads
facing up. Tacky flux is applied onto the PCB’s CMUT side. 2000 grams of bonding
force is applied and the bonded assembly is reflowed in N2 atmosphere with a peak
temperature of 215°C and a dwell time of 12 seconds. The complete assembly is
thus finished. Underfill is not applied in either step, and no significant mechanical
degradation has been observed over the testing period.

Figure 4-10: The CMUT-ASIC connection result pictures: (a) the bonded PCB-ASIC
assembly shows good connectivity; (b) the solder bumps at the PCB’s CMUT side are
reflowed after PCB-ASIC bonding, so any deformation is restored.

Figure 4-11: The PCB-CMUT bonding connection is verified by pulling off the test
CMUT die from the PCB after bonding and reflow. (a) & (b) show the CMUT
connection posts remain on the PCB after the pull, indicating good connectivity.

Figure 4-12: The finished CMUT-PCB-ASIC assembly: (a) cross-sectional view of the
sandwich stack; (b) CMUT side assembly picture; (c) ASIC side assembly picture.
The PCB-CMUT connection took us a few trials before reaching a fully-functional
16x16 array. Electrical characterization and fault-tolerant design techniques are also
key factors leading to the fully-functional array, which will be discussed in more detail
in Section 5.5. Mechanically, flip-chip bonding trials had been performed on spare
dummy CMUT chips to obtain a correct bonding force. When bonding forces of
2000-3000 grams were applied, the PCB-CMUT bonding produced the best results. This
was verified by pulling the test CMUT die off the PCB after bonding and reflow,
as shown in Figure 4-11(a) and (b). The CMUT was removed with great force, and
after its removal, the majority of CMUT TSV posts remained connected to the PCB
pads, indicating a strong bonding result.

Figure 4-13: The acrylic tank drawings: (a) the tank dimension drawing; (b) the
mounting between the oil tank and the CMUT-PCB-ASIC assembly.
Figure 4-12(a) shows a complete sandwich stack of the CMUT-PCB-ASIC assembly. Also in Figure 4-12(b) and (c), the CMUT side and ASIC side assembly pictures
are shown. It has also been proven over time that although the arm vacuum holder is
holding the CMUT by its membrane surface during the PCB-CMUT bonding process,
it does not break the CMUT device or affect its operation afterwards.
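To put the bonding forces quoted in this section into SI units, the sketch below converts gram-force to newtons and estimates the average load per solder bump for the PCB-CMUT step. It assumes the version A footprint with 18x16 = 288 pads and an even load distribution across all bumps, which is an idealization; the function names are hypothetical.

```python
G0 = 9.80665  # standard gravity in m/s^2, used to convert gram-force to newtons


def gram_force_to_newton(grams: float) -> float:
    return grams * 1e-3 * G0


def force_per_bump_mN(grams: float, n_bumps: int) -> float:
    """Average force per bump in millinewtons, assuming an even distribution."""
    return gram_force_to_newton(grams) / n_bumps * 1e3


pcb_cmut_pads = 18 * 16  # version A footprint: 16x16 element pads plus 2x16 bias pads
print(f"2000 g -> {gram_force_to_newton(2000):.2f} N")
print(f"3000 g -> {gram_force_to_newton(3000):.2f} N")
print(f"PCB-CMUT, 2000 g over {pcb_cmut_pads} bumps: "
      f"{force_per_bump_mN(2000, pcb_cmut_pads):.1f} mN per bump")
```

The PCB-ASIC step is not broken down per bump here because the number of perimeter I/O pads is not stated in this section.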
4.1.4 Mounting onto the Oil Tank
The CMUT-PCB-ASIC assembly is closely mounted onto an acrylic oil tank, so that
the assembly can be directly used to perform in-vitro imaging experiments. As shown
66
in Figure 4-13(a), the tank is a cube with a side length of approximately 3 inches.
The tank is designed to be mounted on top of the CMUT-PCB-ASIC assembly and
its bottom has a hole in the center to expose the CMUT chip. There are threaded
screw holes in the tank’s bottom plane so that the PCB can be screw-mounted under the
tank. To help improve sealing, a rubber gasket is inserted between the tank and the PCB,
and silicone industrial sealant (General Electric RTV 110 Series) is applied to both
sides of the gasket.
To help hold imaging phantoms or measurement tools, a 3D translation stage is
also added to the hardware setup, as already shown in Figure 4-1 and Figure
4-2. The translation stage is fixed with respect to the oil tank to avoid any relative
movement.
As a final comment for this section, the 16x16 CMUT device still contains a few
defective, non-functional elements. The ASIC is designed to be fault-tolerant to the
CMUT defects, as will be discussed in Section 5.5, so that the defective elements
are disabled while the remaining functional elements operate normally. This fault-tolerant design strategy has been a key factor in ensuring successful assemblies.
Meanwhile, because the number of non-functional elements is limited (fewer than 10
in some of the best assemblies), their effect on the imaging quality is not severe. The
imaging experiments presented in Sections 4.2 and 4.4 are carried out on the full
16x16 assemblies with defects. To mitigate the loss of elements, digital interpolation
has been implemented on the received signals, although the transmitter side does not
have the interpolation capability². When the loss of elements is not acceptable, as
is the case in Section 4.3, a 10x10 sub-array in which all elements are functional is used for
the experimental demonstration.
² If transmitter-side interpolation is also desired, a pulser implemented as a linear amplifier
can be used. The pulse amplitude and phase in the channels neighboring the missing one can be
adjusted to implement the interpolation. More discussion is in Section 7.2.
4.2 Plane-wave Coherent Compounding for Fast Volume Rate 3D Ultrasonic Imaging
For 3D ultrasonic imaging, we face not only the challenge of massive channel count as a
hardware limitation (as already discussed in Chapter 3), but also the challenge of
finding a proper imaging scheme so that the 3D space can be imaged with satisfactory quality
and volume rate. A real-time imaging system needs good image quality (resolution,
contrast, etc.) for visualization and a high volume rate to avoid severe motion blurring,
but these two requirements conflict and are especially hard
to reconcile in 3D imaging. Compared with a 2D imaging system using a 1D array, a
3D imaging system using a 2D array has significantly more transceiver channels and
a significantly larger image spatial span. More channels translate to more
data to collect and process, and a larger image spatial span requires
transmitting and receiving more ultrasonic beams to cover the whole volumetric
space.
Previously, efforts have been made to achieve fast volume rate in 3D imaging
systems by transmitting a “fat” ultrasonic beam at one time and doing parallel processing of 8 ultrasonic beams in the receive mode [64, 65]. This parallel beamforming
technique is called Explososcan and it achieves a volume rate of 8 volumes/s, with 32
transmit channels and 32 receive channels on a 289-element 2D array (17x17). More
recent studies have pushed this concept to the extreme, by transmitting a plane-wave ultrasonic beam and doing massively parallel beam-formation at the receive
end. The method is called plane-wave coherent compounding (PWCC) [66–69]. The
plane-wave emission illuminates a large space with one transmission, decreasing the
data acquisition time and greatly increasing the volume rate. The plane-wave can
also be steered to multiple different angles, and the received data from different angles
can be coherently compounded to yield ultrasonic images with better contrast and
less speckle. Moreover, because the data processing is done after all channel data is
collected, with synthetic beam-formation techniques, PWCC is in essence a software
beam-formation process. It is highly flexible and scalable, with its computational
Figure 4-14: The illustration of how PWCC works for 2D ultrasonic imaging, courtesy
of [68].
complexity proportional to the number of pixels / voxels to be displayed in the final
image.
4.2.1 PWCC for 2D Imaging
The PWCC method has been demonstrated on a 1D array for 2D imaging in previous
literature [66–69]. An intuitive illustration is shown in Figure 4-14. The 1D array
emits plane-waves, which have different wavefront angles. Under each transmitted
plane-wave angle, the received waveforms from all channels are collected and stored.
Normal delay-and-sum beam-formation is then carried out on each angle’s data set
to obtain the coarse 2D image of lower contrast and resolution. Finally, coherent
compounding is performed across images obtained from different transmit angles. As
a result, a higher quality image is produced.
The principle of coherent compounding is illustrated in Figure 4-15. The receive
side beamforming delays are calculated as in focused imaging. It is based on the
time-of-flight from the center of the transducer array (0, 0) to a spatial point in the
2D image with the coordinates of (x, z), then back to the receiving element at (x1 , 0),
Figure 4-15: The principle of coherent compounding used in PWCC, courtesy of [68]:
(a) the imaging space; (b) the beam-formation delay calculation when the transmitted
plane-wave is normal to the transducer surface (α = 0°); (c) the beam-formation delay
calculation when the transmitted plane-wave is steered to an angle of α.
as in Equation (4.1) (c is sound speed):
$$\tau_{RX}(x_1, x, z) = \sqrt{z^2 + (x - x_1)^2} \, / \, c. \tag{4.1}$$
However, the transmit side beamforming delays need to take into account the
steering angle of the plane-wave. This is done by adding a constant time offset
to the original delays used to generate the plane-wave, which effectively rotates the
plane wavefronts about a point “behind” the transducer by an angle of α. The delay
for a spatial point at (x, z) in the 2D image is given in Equation (4.2):
$$\tau_{TX}(\alpha, x, z) = (z \cos\alpha + x \sin\alpha) \, / \, c. \tag{4.2}$$
Combining both the transmit and the receive side delays, the propagation time
from the center of the transducer array to (x, z) is expressed in Equation (4.3):
$$\tau(\alpha, x_1, x, z) = \tau_{TX}(\alpha, x, z) + \tau_{RX}(x_1, x, z). \tag{4.3}$$
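As a minimal sketch, Equations (4.1)-(4.3) can be written as three small functions. The function names and the speed of sound value are assumptions made for illustration (1540 m/s is a common soft-tissue figure, not a value taken from this work); angles are in radians and lengths in meters.

```python
import numpy as np

C = 1540.0  # assumed speed of sound in m/s (illustrative placeholder)

def tau_tx(alpha, x, z, c=C):
    """Transmit delay of Equation (4.2) for a plane-wave steered by alpha."""
    return (z * np.cos(alpha) + x * np.sin(alpha)) / c

def tau_rx(x1, x, z, c=C):
    """Receive delay of Equation (4.1) from pixel (x, z) to the element at (x1, 0)."""
    return np.sqrt(z**2 + (x - x1)**2) / c

def tau_total(alpha, x1, x, z, c=C):
    """Combined delay of Equation (4.3)."""
    return tau_tx(alpha, x, z, c) + tau_rx(x1, x, z, c)

# Sanity check: for an unsteered wave, a pixel directly above the element at
# the origin has equal transmit and receive legs, so the total is 2 * z / c.
depth = 7.5e-3
round_trip = tau_total(0.0, 0.0, 0.0, depth)
```

Because the functions use NumPy operations only, `x` and `z` can also be arrays, which gives the delay map of a whole image for one element and one transmit angle in a single call.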
Additional techniques such as the constant F-number aperture scaling and apodization, as mentioned in Section 2.2, can also be applied. Investigations in [68, 70]
have shown that approximately 7 to 9 plane-wave acquisitions are both adequate
and practical for coherent compounding. Plane-wave acquisition therefore achieves a
10x reduction in the number of transmissions compared with traditional focused emission, while
producing images of comparable quality. This conclusion was reached using an extensive set of image quality
metrics, including -10dB lateral resolution,
contrast, side-lobe amplitude, and image SNR. The reduction in the number of transmissions could translate to lower system power consumption or a higher image frame rate,
with image quality similar to that of conventional methods.
4.2.2 Extending PWCC to 3D Imaging on the Column-Row-Parallel Architecture
The previous PWCC implements plane-wave steering along the azimuth (X) direction
only, so that the 2D images can be coherently compounded. It is quite natural to
extend the plane-wave insonification to be steered in both azimuth (X) and elevation
(Y) directions, so that the whole 3D space can be illuminated and the compounding
can be performed over the volumetric images. This possibility has been briefly mentioned in [71], in which a 32x32 2D transducer array is built and a 3D imaging system
is proposed. However, no detailed algorithm explanations or hardware measurement
results are exhibited.
In contrast, our proposed 3D imaging architecture is a suitable hardware platform to support plane-wave coherent compounding in 3D (PWCC3D).
The algorithm and the hardware realization are described in this section.
PWCC3D Signal Processing
The beam-formation and coherent compounding procedure can be easily extended to
3D imaging, as shown in Figure 4-16. On our 16x16 imaging system assembly, each
data set of 256 received echo waveforms is associated with one transmit angle. In total,
p transmit angles “α_X1” to “α_Xp” are steered along the azimuth direction and
q transmit angles “β_Y1” to “β_Yq” are steered along the elevation direction.
Delay-and-sum beam-formation is applied to each data set, yielding a 3D volumetric
image for each transmit angle. Finally, the volumetric images for all angles are
coherently compounded to produce a high quality 3D image.
Each voxel in the volumetric image is beam-formed from the 256-channel data.
The equations for calculating the delay values for each channel can be adapted
from Equations (4.1)-(4.3) to 3D imaging.
The receive side beamforming delays are calculated based on the time-of-flight, but
the coordinates are extended to 3D. The distance is from the center of the transducer
array (0, 0, 0) to a spatial point (i.e. the voxel) in the 3D image with the coordinates
of (x, y, z), then back to the receiving element at (x1 , y1 , 0), as in Equation (4.4):
$$\tau_{RX}(x_1, y_1, x, y, z) = \sqrt{z^2 + (x - x_1)^2 + (y - y_1)^2} \, / \, c. \tag{4.4}$$
The transmit side beamforming delays are used to account for the propagation
of the plane-wave angle. Depending on whether the plane-wave is steered across the
azimuth or elevation direction, the delays are calculated differently. Equation (4.5) is
used when the column-parallel mode is active and the plane-waves are steered along
the azimuth direction, with a transmit angle of α. The delay for a voxel at (x, y, z)
in the 3D image is:
$$\tau_{TX\_azimuth}(\alpha, x, y, z) = (z \cos\alpha + x \sin\alpha) \, / \, c. \tag{4.5}$$
Equation (4.6) is used when the row-parallel mode is active and the plane-waves
are steered along the elevation direction, with a transmit angle of β. The delay for
a voxel at (x, y, z) in the 3D image is:
$$\tau_{TX\_elevation}(\beta, x, y, z) = (z \cos\beta + y \sin\beta) \, / \, c. \tag{4.6}$$
Combining both the transmit and the receive side delays, the delay value from
the center of the transducer array (0, 0, 0) to voxel (x, y, z) can be summarized with
Equation (4.7) for azimuth and elevation steering:
$$\begin{cases}
\tau_{azimuth}(\alpha, x_1, y_1, x, y, z) = \left( z \cos\alpha + x \sin\alpha + \sqrt{z^2 + (x - x_1)^2 + (y - y_1)^2} \right) / c, \\
\tau_{elevation}(\beta, x_1, y_1, x, y, z) = \left( z \cos\beta + y \sin\beta + \sqrt{z^2 + (x - x_1)^2 + (y - y_1)^2} \right) / c.
\end{cases} \tag{4.7}$$
Figure 4-16: The signal processing flow for PWCC3D on the Column-Row-Parallel
architecture. [Flow shown: for each Tx angle (α_X1 to α_Xp, β_Y1 to β_Yq), the 16x16 Rx waveforms pass through a Hilbert transform and delay-and-sum BF to give one 3D image per angle; these images are coherently compounded in the complex domain, followed by envelope detection (absolute value), to give the final 3D image.]
Under each transmit angle, a coarse 3D image is formed by applying the delay-and-sum beam-formation algorithm to the 256-channel data, with the delay values calculated
from Equation (4.7). Figure 4-16 indicates that the Hilbert transformation is first
performed to convert the original channel data into the “in-phase” signal I(t) and
“quadrature” signal Q(t) to preserve the phase information. When compounding is
performed across different transmit angles, the voxel values are added in both I(t) and
Q(t), which maintains the data coherency, hence the name coherent compounding.
The final compounded 3D image is obtained by envelope detection, i.e. by taking the amplitude
$\sqrt{I(t)^2 + Q(t)^2}$ of each voxel.
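The processing chain described above (Hilbert transform, per-angle delay-and-sum, coherent compounding, envelope detection) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the implementation used in this work: the function names are hypothetical, delays are rounded to the nearest sample instead of interpolated, and apodization and the constant F-number aperture are left out.

```python
import numpy as np

def analytic_signal(rf):
    """FFT-based Hilbert transform along the last axis: returns I(t) + j*Q(t)."""
    n = rf.shape[-1]
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(rf, axis=-1) * weights, axis=-1)

def das_volume(iq, elem_xy, voxels, angle, axis, fs, c):
    """Delay-and-sum one data set (one Tx angle) into complex voxel values.

    iq: (n_elem, n_samples) complex data; elem_xy: (n_elem, 2) positions;
    voxels: (n_vox, 3) coordinates; axis: 0 = azimuth (X), 1 = elevation (Y).
    """
    x, y, z = voxels[:, 0], voxels[:, 1], voxels[:, 2]
    t_tx = (z * np.cos(angle) + voxels[:, axis] * np.sin(angle)) / c  # Eq. (4.5)/(4.6)
    out = np.zeros(len(voxels), dtype=complex)
    for ch, (x1, y1) in enumerate(elem_xy):
        t_rx = np.sqrt(z**2 + (x - x1)**2 + (y - y1)**2) / c          # Eq. (4.4)
        idx = np.rint((t_tx + t_rx) * fs).astype(int)                 # nearest sample
        valid = (idx >= 0) & (idx < iq.shape[1])
        out[valid] += iq[ch, idx[valid]]
    return out

def pwcc3d(datasets, elem_xy, voxels, fs, c):
    """datasets: iterable of (rf, angle, axis). Returns the envelope per voxel."""
    total = np.zeros(len(voxels), dtype=complex)
    for rf, angle, axis in datasets:
        # Summing complex values keeps I and Q separate, i.e. coherent compounding.
        total += das_volume(analytic_signal(rf), elem_xy, voxels, angle, axis, fs, c)
    return np.abs(total)  # envelope detection: sqrt(I^2 + Q^2)
```

Because `voxels` is an arbitrary list of coordinates, the same stored data can be beam-formed again on a different grid without a new acquisition.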
Because the beam-formation is performed on each voxel while utilizing the same
set of data, the beamformer is a software beamformer and the processing is very
scalable and flexible. The data acquisition is only done once so that the data under
every Tx angle is stored. The beam-formation can be done independently over the
voxels of interest in the space. One could first perform beam-formation and image
display over a large space with large voxel spacing for a coarse volumetric image; after
spotting a feature of interest, one could perform second-pass processing using the same
collected data, over a smaller space with finer voxel spacing, which would generate
higher definition volumetric images. In this way, a flexible, low-power, software beamformer can be designed to adapt to different user scenarios for optimal trade-offs
between power consumption, processing speed and image quality.
In addition, the constant F-number technique is applied during the delay-and-sum
beam-formation (see Section 2.2). Voxels closer to the transducer surface have a
smaller active aperture contributing to their beam-formation, while voxels farther away
exploit a bigger active aperture.
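One possible way to select the active aperture is sketched below. It assumes the usual definition F-number = depth / aperture width and a square aperture centered above the voxel; both are assumptions here, since the exact rule of Section 2.2 is not repeated in this section. The 250µm pitch in the example is borrowed from the 64x64 simulation and is only a placeholder for the 16x16 array.

```python
import numpy as np

def active_aperture(elem_xy, voxel, f_number=1.75):
    """Boolean mask of the elements inside a square aperture of width z / f_number.

    elem_xy: (n_elem, 2) element positions in meters; voxel: (x, y, z) in meters.
    """
    x, y, z = voxel
    half_width = z / (2.0 * f_number)
    in_x = np.abs(elem_xy[:, 0] - x) <= half_width
    in_y = np.abs(elem_xy[:, 1] - y) <= half_width
    return in_x & in_y

# Hypothetical 16x16 grid with 250 um pitch, centered on the origin.
grid = (np.arange(16) - 7.5) * 250e-6
elements = np.array([(ex, ey) for ex in grid for ey in grid])

shallow = active_aperture(elements, (0.0, 0.0, 2.0e-3))  # small aperture near the surface
deep = active_aperture(elements, (0.0, 0.0, 7.5e-3))     # aperture covers the whole array
```

The mask would be applied inside the per-channel summation loop, so that only the selected elements contribute to a given voxel.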
Implementing PWCC3D on the Column-Row-Parallel Architecture
The implementation of PWCC3D on the proposed Column-Row-Parallel architecture
is shown in Figure 4-17. All elements are turned on during the transmit phase, so
that a steered plane-wave can be emitted. In Figure 4-17(a), the array is configured
in the column-parallel mode for its Tx path. Since each of the 16 Tx pulser drivers at
the column side is supplied with an independent delay to drive the 16 elements along
the same column, the 16 columns can be delayed with respect to each other, thus
implementing beam-steering along the azimuth (X) direction. Similarly, to achieve
beam-steering along the elevation (Y) direction, as shown in Figure 4-17(b), the
array’s Tx path is arranged in the row-parallel mode, and 16 elements along the same
row are driven by the shared Tx pulser driver at the row side.
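The per-column (or per-row) transmit delays can be illustrated with the standard linear steering law, which is consistent with Equation (4.5) evaluated on the transducer surface (z = 0): a line of elements at lateral position x fires at time x·sin α / c. The sketch below is an assumption-laden illustration; the function name, the 250µm pitch and the 1540 m/s sound speed are placeholders rather than parameters of the fabricated system.

```python
import numpy as np

def steering_delays(n_lines, pitch, angle_deg, c=1540.0):
    """Firing delay (seconds) of each column or row for a plane-wave steered by angle_deg.

    The delays are shifted so that the earliest line fires at t = 0.
    """
    positions = (np.arange(n_lines) - (n_lines - 1) / 2.0) * pitch
    delays = positions * np.sin(np.deg2rad(angle_deg)) / c
    return delays - delays.min()

# 16 column drivers (column-parallel mode) steering along azimuth;
# the same call serves the 16 row drivers in row-parallel mode.
column_delays = steering_delays(16, 250e-6, 6.7)
```

Only 16 delay values are needed per transmit event, one per shared pulser driver, regardless of which of the two modes is active.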
During the receive phase, the receive channels are turned on row-by-row, as can
be seen from Figures 4-17(c)-(e). For each row, 16 ultrasonic echo waveforms are
sensed by the activated CMUT elements and amplified by the receiver AFE. The
waveforms are then buffered on-chip by the column buffers and digitized by external
ADCs, before being stored digitally in a PC. To collect all 256 elements’ echo waveforms, 16
consecutive ultrasonic insonifications of the same transmit angle are generated, while
the 16 rows are activated serially, such that the whole 16x16 aperture is swept.
This operation sequence is also illustrated in Figure 4-18. There are p angles along
azimuth and q angles along elevation used to generate the final compounded 3D image.
Each angle is transmitted and the echo waveforms are collected for all 256 channels.
Under each angle, 16 transmit-receive repetitions are needed to acquire all channel
data as shown in the inset of Figure 4-18. As a result, a total of 16 × (p + q) transmit-receive repetitions are needed for the processing of a final compounded image.
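The operation sequence can be written as a simple generator that enumerates every transmit-receive repetition. The function name and the mode labels are hypothetical; the ordering (all azimuth angles first, then all elevation angles, with the rows stepped under each angle) follows the description above.

```python
def pwcc3d_schedule(azimuth_angles, elevation_angles, n_rows=16):
    """Yield (tx_mode, angle, rx_row) for every transmit-receive repetition."""
    modes = (("column-parallel", azimuth_angles), ("row-parallel", elevation_angles))
    for tx_mode, angles in modes:
        for angle in angles:
            # The same Tx angle is fired once per row while the Rx row is stepped.
            for rx_row in range(n_rows):
                yield (tx_mode, angle, rx_row)

angles = [-6.7, -3.3, 0.0, 3.3, 6.7]
schedule = list(pwcc3d_schedule(angles, angles))
repetitions = len(schedule)  # equals 16 * (p + q) with p = q = 5
```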
For the general case of an NxN array, the number of transmit-receive repetitions needed for
acquiring one volumetric image becomes N × (p + q). For an imaging system running
at a certain PRF, the time for one transmit-receive repetition is the PRP (PRF and
PRP are defined in Section 2.1). Therefore, as shown in (4.8) and (4.9), the acquisition
time increases linearly with respect to array size growth (“N” scaling trend); and the
Figure 4-17: The PWCC3D implementation on the Column-Row-Parallel architecture: (a) Tx beam-steering along azimuth (X) direction using column-parallel mode;
(b) Tx beam-steering along elevation (Y) direction using row-parallel mode; (c)-(e)
Rx signal acquisition, sweeping through 16 rows for each transmit angle (16 waveforms are acquired in each repetition).
Figure 4-18: The sequence of operation to implement PWCC3D on the Column-Row-Parallel architecture. [Timeline shown: Tx angles α_X1 to α_Xp followed by β_Y1 to β_Yq, each followed by Rx of all echo waveforms; inset: 16 transmit-receive repetitions (Rx Row1 to Row16) acquire the full 16x16 waveforms under one Tx angle.]
volume rate of a PWCC3D imaging system is inversely proportional to N. This is a
benign scaling trend for 3D imaging systems, made possible by the row-by-row or column-by-column data reception capability provided by the architecture.
$$\text{Acquisition Time} = N \times (p + q) \times PRP = \frac{N \times (p + q)}{PRF} \Leftrightarrow O(N), \tag{4.8}$$
$$\text{Volume Rate} = \frac{1}{\text{Acquisition Time}} = \frac{PRF}{N \times (p + q)} \propto N^{-1}. \tag{4.9}$$
4.2.3 PWCC3D Results: Simulations and Measurements
To evaluate the performance of PWCC3D, both Field II simulations and real measurements are carried out. Simulations are compared against the measurements, and
various Tx angles are used to demonstrate the PWCC3D algorithm.
Figure 4-19: The setup of the wire phantom imaging experiment using PWCC3D
algorithm: (a) a single plane-wave is transmitted to image the wire phantom; (b) five
different Tx angles are used along the azimuth direction for PWCC3D.
[Panel labels: (a) single plane-wave, avg 5x; (b) 5 plane-wave angles (−6.7°, −3.3°, 0°, 3.3°, 6.7°). Axes: X/azimuth, Y/elevation, Z/depth.]
Wire Phantom
A wire phantom is first imaged by the 16x16 array setup in simulation and measurements, so that the spatial impulse response can be recorded for the imaging
system. The physical setup is shown in Figure 4-19. The wire phantom is placed
7.5mm away from the transducer surface and parallel to it. The transmit
pulsation is 2 bursts of 8.33MHz pulses³. A constant F-number of 1.75 is used for
beam-formation and the rectangular window is used for both Tx and Rx apodization. Single Tx plane-wave angle insonification is compared against five Tx angles
(−6.7°, −3.3°, 0°, 3.3°, 6.7°) compounded along the azimuth direction in this experiment.
Compounding along the elevation direction is not performed for the wire phantom because its benefit would not be evident for a wire spanning along the elevation direction;
compounding along the azimuth direction, however, yields a large improvement, as revealed
by the simulated and measured images. The volumetric images are displayed with a 20dB
dynamic range.
The simulation results are shown in Figure 4-20. They are obtained by simulating a line
of ideal point scatterers in space to mimic the metal wire in the real experiment. The
vertical cross-sectional images of the wire phantom (point spread function) obtained
with a single plane-wave and with compounding are visually compared in Figure 4-20(a) and
(b). It can be seen that the 5-angle compounded image is of higher contrast and
³ The choice of 2 bursts of pulses is for good axial image resolution, as discussed in Section 2.1.
better resolution. This is confirmed by the quantitative comparison in Figure 4-20(c)
and (d), where the lateral point scatterer’s amplitudes are plotted. The compounded
image has a finer -10dB lateral resolution (0.50mm compared to 0.58mm) and a
lower side-lobe amplitude (less than -30dB compared to -12dB) than the single plane-wave image. The side-lobes can be more readily seen in Figure 4-20(e) and (f), where the
horizontal cross-sectional images of the wire are shown. While single plane-wave
transmit generates visible “fake” wires (i.e. the side-lobes) at the two sides of the
main wire location, the compounded image has no side wires visible.
Real imaging experiments on a metal wire phantom are also performed. The metal
wire has a diameter of 0.48mm and is placed 7.5mm away from the transducer. The
same pulsation and PWCC3D beam-formation are used to form the images. Similarly,
single-angle vs. 5-angle compounding results are compared in Figure 4-21. The measured wire images show quality degradation due to the wire thickness and to array element
and circuit mismatches. PWCC3D nevertheless demonstrates a significant improvement
in image resolution: the -10dB lateral resolution is improved by over 46% in this case
(from 1.32mm to 0.71mm). The axial resolution is determined by the pulse frequency
and number of bursts. Therefore, the -10dB axial resolution is measured to be similar in
the two cases (0.39mm for single-angle vs. 0.36mm for 5-angle).
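For readers who want to reproduce a -10dB resolution figure from a beam-formed profile, one plausible measurement is the width of the main lobe at -10dB below its peak. The sketch below is an assumption about how such a width can be extracted (linear interpolation in dB between neighboring samples); it is not claimed to be the exact procedure behind the numbers above, and the function name is hypothetical.

```python
import numpy as np

def width_at_db(positions, envelope, level_db=-10.0):
    """Main-lobe width where the profile drops to level_db relative to its peak."""
    env = np.asarray(envelope, dtype=float)
    pos = np.asarray(positions, dtype=float)
    env_db = 20.0 * np.log10(np.maximum(env / env.max(), 1e-12))
    peak = int(np.argmax(env_db))

    # Walk outwards from the peak until the profile falls to level_db or below.
    left = peak
    while left > 0 and env_db[left] > level_db:
        left -= 1
    right = peak
    while right < len(env_db) - 1 and env_db[right] > level_db:
        right += 1
    if env_db[left] > level_db or env_db[right] > level_db:
        raise ValueError("profile does not fall below level_db on both sides")

    # Interpolate between the two samples that bracket each crossing.
    x_left = np.interp(level_db, [env_db[left], env_db[left + 1]],
                       [pos[left], pos[left + 1]])
    x_right = np.interp(level_db, [env_db[right], env_db[right - 1]],
                        [pos[right], pos[right - 1]])
    return x_right - x_left

# Synthetic Gaussian profile (sigma = 0.2 mm) used only to exercise the function.
x_mm = np.linspace(-2.0, 2.0, 401)
profile = np.exp(-x_mm**2 / (2 * 0.2**2))
width_mm = width_at_db(x_mm, profile)
```

With widths measured this way, the relative improvement quoted above is simply (1.32 − 0.71) / 1.32, which is slightly above 46%.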
Ring Phantom
The wire phantom displays the benefit of PWCC3D only from the azimuth direction.
Here a metal ring phantom is used to fully demonstrate the benefit of PWCC3D for
a volumetric image. As shown in Figure 4-22, the ring is placed horizontally above
the transducer surface with a vertical distance of 7.5mm. The transmit pulsation is 2
bursts of 8.33MHz pulses, the constant F-number is 1.75, and the rectangular window
is used for both Tx and Rx apodization. The compounding employs 5 different
Tx plane-wave steering angles (−6.7°, −3.3°, 0°, 3.3°, 6.7°) in each of the azimuth and elevation
directions, so that the ring image can be enhanced in all directions.
In order to investigate closely the effect of coherent compounding in both azimuth and elevation directions, Figure 4-23 shows a comparison between different
Figure 4-20: Simulation results of a wire phantom: (a) vertical cross-sectional image produced from single-angle plane-wave insonification; (b) vertical cross-sectional
image produced from 5-angle coherently compounded plane-wave insonification; (c) lateral resolution plot from single plane-wave (lateral -10dB resolution: 0.58mm; side-lobe: -12dB); (d) lateral resolution plot from 5-angle
plane-waves (lateral -10dB resolution: 0.50mm; side-lobe: < -30dB); (e) horizontal cross-sectional image from single plane-wave; (f) horizontal cross-sectional image from 5-angle plane-waves.
Figure 4-21: Measurement results of a wire phantom: (a) vertical cross-sectional image produced from single-angle plane-wave insonification; (b) vertical cross-sectional
image produced from 5-angle coherently compounded plane-wave insonification; (c) lateral resolution plot from single plane-wave (lateral -10dB resolution: 1.32mm); (d) lateral resolution plot from 5-angle
plane-waves (lateral -10dB resolution: 0.71mm); (e) horizontal cross-sectional image from single plane-wave; (f) horizontal cross-sectional image from 5-angle plane-waves.
Figure 4-22: The setup of the ring phantom imaging experiment using PWCC3D
algorithm: (a) a single plane-wave is transmitted to image the phantom; (b) five
different Tx angles are used along the azimuth direction and another five Tx angles
along the elevation direction to image the phantom with PWCC3D.
[Panel labels: (a) single plane-wave, avg 10x; (b) 10 Tx angles in X & Y (−6.7°, −3.3°, 0°, 3.3°, 6.7°). Axes: X/azimuth, Y/elevation, Z/depth.]
compounding schemes. Comparing Figure 4-23(a) and (b), the 5-angle compounding
in the X direction suppresses the side-lobes along the azimuth much more than
the single-angle plane-wave. The most noticeable difference is that the artifact in the
blue-circled region in Figure 4-23(b) is much less evident than in Figure 4-23(a). However, the side-lobes along the elevation are not suppressed, as can be seen from the
red-circled region in Figure 4-23(b), which looks almost the same as in Figure 4-23(a).
Similarly, comparing Figure 4-23(a) and (c), the 5-angle compounding in the Y direction suppresses the side-lobes along the elevation much more than the
single-angle plane-wave. The artifact along elevation in the blue-circled region in Figure 4-23(c) is suppressed, but the side-lobes along the azimuth in the red-circled region
remain and look almost the same as in Figure 4-23(a). When compounding along
both the azimuth and elevation directions is combined, as in Figure 4-23(d), the artifacts
along both directions are suppressed, and the image quality is enhanced the most compared
to Figure 4-23(a).
Figure 4-24 quantifies the performance improvement of PWCC3D for the ring
images. The vertical cross-sectional images are used to show the side-lobe amplitudes
of the ring images from the single-angle plane-wave insonification and the 10-angle
X & Y steered plane-waves. As can be seen, the side-lobe level at the center of the ring
improves from -7.3dB to -13.3dB, a 6dB improvement with 10-angle
Figure 4-23: Measured horizontal cross-sectional images of a ring phantom: (a) single-angle Tx plane-wave; (b) 5-angle Tx plane-wave compounding along azimuth direction; (c) 5-angle Tx plane-wave compounding along elevation direction; (d) compounding across all 5-angle azimuth and 5-angle elevation directions. [Annotations in panels (b) and (c) mark regions where the side-lobe is suppressed and where it remains.]
coherent compounding. A 10kHz PRF is used for the 10-angle compounding scheme
in our experiments. According to (4.8) and (4.9), where (p + q) = 10, N = 16, the
acquisition time for one volumetric image is 16ms and the volume rate reaches 62.5
volumes/s. As mentioned in Section 4.2.2, the volume rate is inversely proportional to
the array size N and to the number of plane-wave angles, so it can be traded off
for better image quality.
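The arithmetic behind these numbers follows directly from (4.8) and (4.9) and can be checked with a short sketch. The function names are hypothetical; the parameter values are the ones quoted above.

```python
def acquisition_time(n, p, q, prf):
    """Equation (4.8): time in seconds to acquire one volumetric image on an NxN array."""
    return n * (p + q) / prf

def volume_rate(n, p, q, prf):
    """Equation (4.9): volumes per second, the reciprocal of the acquisition time."""
    return prf / (n * (p + q))

# 16x16 array, 5 azimuth + 5 elevation angles, 10 kHz PRF.
t_acq = acquisition_time(16, 5, 5, 10e3)  # seconds
rate = volume_rate(16, 5, 5, 10e3)        # volumes per second
```

Doubling either N or (p + q) halves the volume rate, which is the trade-off between volume rate and image quality described above.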
Cyst Phantom Simulation on a 64x64 Array
As an extrapolation of our current hardware setup, a more complex setup is simulated
in Field II to investigate how technology scaling can push PWCC3D performance
further.
A hypothetical 64x64 2D array with an element pitch of 250µm is used in the
Figure 4-24: Measured vertical cross-sectional images of a ring phantom: (a) single-angle Tx plane-wave; (b) compounding across all 5-angle azimuth and 5-angle elevation directions; (c) lateral resolution plot of ring image from single-angle Tx plane-wave (side-lobe at center: -7.3dB); (d) lateral resolution plot of ring image from 5-angle X and 5-angle Y plane-waves (side-lobe at center: -13.3dB).
simulation to provide a bigger aperture. A cyst phantom spanning depths
from 20mm to 50mm is created as the imaging target, which serves as a benchmark
for evaluating the speckle reduction performance of PWCC3D. There are three cysts
located at (−3, 0, 25)mm, (0, 0, 35)mm, (3, 0, 45)mm, respectively. Each cyst is
6mm in diameter. The surroundings of the cysts are randomly spaced point scatterers
mimicking normal tissue. The transmit pulsation is 2 bursts of 5MHz pulses, the
constant F-number is 1.75, and the rectangular window is used for both Tx and Rx
apodization.
The XZ cross-sectional images are shown. Figure 4-25(a) shows the single-angle
plane-wave image while Figure 4-25(b) shows a compounded one, in which 5 angles
along azimuth (−4°, −2°, 0°, 2°, 4°) and 5 angles along elevation (−4°, −2°, 0°, 2°, 4°)
Figure 4-25: Simulated XZ cross-sectional images showing the three cysts in one slice
image: (a) image generated from single-angle plane-wave; (b) image generated from 5
azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional
image location in 3D space.
are used. The associated simulation setup is illustrated in Figure 4-25(c). The comparison shows a much improved image contrast by utilizing PWCC3D.
The YZ cross-sectional images show the individual cysts at different depths. Figures
4-26, 4-27 and 4-28 show slice image comparisons of the three cysts.
Finally, the volume rate of 10-angle compounding PWCC3D implemented on a
64x64 array with the Column-Row-Parallel architecture would be 15.6 volumes/s, assuming a 10kHz PRF, according to (4.9). Compared to the 10-angle compounding
on a 16x16 array in Section 4.2.3, the 64x64 system volume rate is decreased
by exactly 4x, but the image resolution and contrast improve with the bigger array.
4.2.4 Discussion
The proposed PWCC3D algorithm on the Column-Row-Parallel architecture is a
suitable solution for high volume rate 3D ultrasonic imaging applications. The volume
rate can be traded off with image quality easily. More Tx angles lead to better image
Figure 4-26: Simulated YZ cross-sectional images showing the cyst at (−3, 0, 25)mm:
(a) image generated from single-angle plane-wave; (b) image generated from 5
azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional
image location in 3D space.
Figure 4-27: Simulated YZ cross-sectional images showing the cyst at (0, 0, 35)mm:
(a) image generated from single-angle plane-wave; (b) image generated from 5
azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional
image location in 3D space.
Figure 4-28: Simulated YZ cross-sectional images showing the cyst at (3, 0, 45)mm:
(a) image generated from single-angle plane-wave; (b) image generated from 5
azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional
image location in 3D space.
resolution and contrast, while the acquisition time would increase linearly and the
volume rate would be reduced. This is a flexible feature that allows PWCC3D to be
adaptive to a wide range of ultrasonic applications.
PWCC3D is also flexible for data processing as a software beamformer, where
volumetric images of different spatial resolution and/or at different regions can be
generated with the same acquired data. The software beamformer capability, together with the flexibility of choosing different plane-wave angles, provides a rich set of knobs
that can enable autonomous ultrasonic imaging devices. Such an imaging device could
dynamically reconfigure the AFE and the beamformer, so that the data acquisition
and processing are performed with a complexity that is suitable for the target scene.
System-level power saving and performance improvement can be optimized under this
framework [7].
The cyst phantom simulation in Section 4.2.3 has shown how scaling brings improved image resolution and contrast performance. Thanks to the Column-Row-Parallel architecture, the 3D imaging system’s volume rate decreases only in inverse
proportion to N rather than N^2, and the interconnection complexity of the front-end
is only as high as that of a 1D array in a 2D imaging front-end.
The Column-Row-Parallel architecture and PWCC algorithm can also be applied
onto a 2D array with the size of NxM, in which M is smaller than N (for example,
64x4). This type of “narrow” 2D array is sometimes called a 1.5D array, in that the
size usually scales along the N (azimuth) dimension while the M (elevation) dimension
is somewhat fixed. By operating the array row-by-row (each row has N elements), it
only takes M transmit-receive repetitions to collect data for one plane-wave angle.
Equations (4.8) and (4.9) can be revised to (4.10) and (4.11).
$$\text{Acquisition Time} = \frac{M \times (p + q)}{PRF} \propto \text{Constant}, \tag{4.10}$$
$$\text{Volume Rate} = \frac{PRF}{M \times (p + q)} \propto \text{Constant}. \tag{4.11}$$
Because M is a relatively fixed number, the volume rate stays
constant as the array size increases in the N dimension. This is the same scaling
trend as for a 2D imaging system with a 1D array. The added array elements along
the N (azimuth) dimension contribute to the improved lateral resolution without
degrading the frame rate. Furthermore, the plane-wave coherent compounding along
the M (elevation) dimension effectively realizes the elevational beam-focusing, which
is traditionally implemented with a physical acoustic lens or electrical analog delay
lines on a 1D array.
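The scaling argument of Equations (4.10) and (4.11) can be checked with a few lines of Python. This is only a sketch: p and q are passed through as they appear in Equations (4.8) and (4.9), and the numeric values (M = 4 rows, p = q = 5, a 10kHz PRF) are hypothetical choices for illustration, not figures from this work.

```python
def row_by_row_acquisition_time_s(m_rows, p, q, prf_hz):
    """Eq. (4.10): M x (p + q) transmit-receive repetitions at the given PRF."""
    return m_rows * (p + q) / prf_hz


def row_by_row_volume_rate_hz(m_rows, p, q, prf_hz):
    """Eq. (4.11): the volume rate is the reciprocal of the acquisition time."""
    return prf_hz / (m_rows * (p + q))


# Hypothetical operating point (not taken from the thesis).
M_ROWS, P, Q, PRF_HZ = 4, 5, 5, 10_000

# N does not appear in either formula, so growing the azimuth dimension
# leaves the volume rate unchanged.
for n_cols in (64, 128, 256):
    rate = row_by_row_volume_rate_hz(M_ROWS, P, Q, PRF_HZ)
    print(f"{n_cols}x{M_ROWS} array: {rate:.0f} volumes/s")
```

The loop variable is deliberately unused inside the formula, which is the point of the 1.5D argument: only M, p, q and the PRF set the rate.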
4.3 Interleaved Checker Board Tx Apertures with I&Q Excitations for HD2 Reduction in Ultrasonic Harmonic Imaging
Using the Column-Row-Parallel architecture, a new way to reduce Tx second harmonic distortion (HD2) for ultrasonic tissue harmonic imaging (THI) is proposed.
It utilizes simultaneous I&Q excitations on two interleaved checker board Tx apertures, in order to mitigate HD2 from both transducers and circuits with any arbitrary
pulse shapes. In particular, CMUT nonlinearity due to its electrostatic mechanism is
suppressed.
4.3.1 THI Principle and Previous Methods
Tissue harmonic imaging is a widely used imaging mode [18–22]. The ultrasound system
sends out bursts of ultrasound at the fundamental frequency. Human tissue or
contrast agents (micro-bubbles injected into the human body) can react nonlinearly
to the ultrasonic wave. Specifically, when a sinusoidal pressure wave propagates
through the medium, the tissue or bubble contracts under positive pressure
(the first half of the sine wave) and expands under negative pressure (the second half
of the sine wave). The contraction and expansion cause slightly different propagation
speeds for the ultrasonic wave, thus distorting the wave asymmetrically, which
generates a weak second harmonic component.
Instead of tuning to the fundamental reflected ultrasonic echoes as in
conventional ultrasound, THI mode looks for that weak second harmonic echo signal
while filtering out the fundamental component. The benefit is that ultrasonic
beamformation using the harmonic signal has a narrower beamwidth and lower side-lobes,
so THI gains improved spatial resolution for better visualization and improved contrast
resolution for better demonstration of subtle differences.
However, the harmonic signal also tends to be weak, for two main reasons.
First, the nonlinear generation of the second harmonic from the tissue is not strong
to begin with. Contrast agents such as micro-bubbles can be injected into the human
body to increase the harmonic generation, but it is still weak compared to the
fundamental component. Second, the tissue medium presents a frequency-dependent
attenuation for ultrasound propagation: an ultrasonic wave at a higher frequency sees
more attenuation during propagation [12]. Empirically, the attenuation coefficient
is about 1-2dB/MHz/cm. If a 5MHz fundamental signal is used and a 10MHz second
harmonic component is generated, the propagation attenuation per centimeter for the
second harmonic is 5-10dB more than for the fundamental. As a result, the fundamental
signal needs to be filtered or suppressed at the receive side, while at the transmit
side, the second harmonic generation from the transducer needs to be kept to a
minimum (< −30dBc), so that only the harmonic signal produced by the human body is
received in the end.
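The 5-10dB figure quoted above follows directly from the empirical attenuation coefficient. A minimal sketch of the arithmetic is given below; the function name and the default 1cm path length are illustrative choices, and the model assumes attenuation that is strictly linear in frequency.

```python
def excess_harmonic_attenuation_db(alpha_db_per_mhz_cm, f_fund_mhz,
                                   harmonic=2, path_cm=1.0):
    """Extra attenuation seen by a harmonic relative to the fundamental.

    Assumes attenuation grows linearly with frequency:
    loss_dB = alpha * f * path, so the excess is alpha * (f_h - f_0) * path.
    """
    f_harm_mhz = harmonic * f_fund_mhz
    return alpha_db_per_mhz_cm * (f_harm_mhz - f_fund_mhz) * path_cm


# 5MHz fundamental, 10MHz second harmonic, per centimeter of propagation.
for alpha in (1.0, 2.0):
    extra = excess_harmonic_attenuation_db(alpha, 5.0)
    print(f"alpha = {alpha} dB/MHz/cm -> {extra} dB/cm extra at the 2nd harmonic")
```

With the two ends of the 1-2dB/MHz/cm range this gives 5dB and 10dB per centimeter, matching the numbers in the text.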
Compared to traditional PZT transducers, the CMUT is at a disadvantage in THI
mode because of the nonlinear transmit property of its electrostatic actuation
mechanism [19,20], where the actuation force (hence the generated acoustic pressure)
is proportional to the square of the electrical pulse excitation V(t). Excessive HD2 is
generated during transmit, making the CMUT difficult to use for harmonic imaging.
Previously, methods to reduce the transmit HD2 generation in CMUTs have been
explored. For example, the work in [20] focused on pre-distorting the electrical excitation
signal’s pulse shape, such that the frequency content of the actual transmitted acoustic
pulse is HD2-free. The method is heavily dependent on detailed CMUT transmit
properties and the bias voltage, requiring complicated and frequent calibration.
Sub-harmonic driving was also tried in [19], but because the CMUT has a DC bias voltage,
the emitted acoustic pulse still contains the sub-harmonic frequency content, which
becomes an additional interference.
Instead of working on individual elements, [21, 22, 72] try to cancel the harmonics
at the transducer level. In [21, 22], a technique called second harmonic inversion is
used. A pulse of shape I(t) is transmitted first. On the next repetition, a delayed pulse
shape Q(t) is transmitted; Q(t) is delayed by a quarter cycle with respect to
I(t). At the fundamental frequency, I(t) and Q(t) are out of phase by π/2, while at the
second harmonic frequency, the components from I(t) and Q(t) have a phase difference of
π. As a result, the HD2 from the transmitter can be cancelled by synthetically adding
two consecutive received echoes. The scheme is clever, but its drawbacks are that the
synthetic combining halves the effective PRF of the system, and that motion
artifacts in the system could lead to leakage in the cancellation.
The work in [72] tries to cancel the Tx HD2 in one shot on a 1D array for 2D
imaging. Simulations have been performed, but no real measurements. The elements in
a 1D array are arranged in two groups. Each group contains every other element of
the array, so that elements from the two groups interleave with each other. The two groups
are driven by I(t) and Q(t) pulses, respectively. Because the I(t) and Q(t) pulse emissions
happen at the same time, the resulting acoustic pressure field is a linear superposition
of the fields from the two groups, in which the second harmonic component is suppressed.
This method is not subject to motion from either the transducer or the scene. However,
care needs to be taken with grating lobes: because two neighboring
elements are driven as a pair with the correlated I(t) and Q(t) pulses, the effective
pitch of the 1D array becomes twice as big as its physical element pitch.

Figure 4-29: Implementation of checker board Tx aperture on the proposed architecture.
(Diagram: interleaved checker board Tx aperture; the per-element bit Bank1 and Bank2
elements are driven with I and Q pulses through the column and row select logic.)
4.3.2 Tx HD2 Suppression on the Column-Row-Parallel Architecture
Extending the interleaved configuration to a 2D array, the Tx HD2 cancellation can
be done for 3D imaging. In Figure 4-29, two banks of Tx per-element enable bits⁴,
Bank1 in red and Bank2 in yellow, are pre-programmed into checker board patterns.
The elements of the two banks interleave with each other. The pulser gate drivers at
the column side are time-multiplexed to drive both Bank1 and Bank2 with I(t) and
Q(t) simultaneously, which are out of phase by a quarter pulse cycle (see Equation
(4.12)). In the mid- to far-field region, the second harmonic components of the
ultrasound pressure from the two banks cancel under the I(t) and Q(t) driving scheme.

\[
Q(t) = I\left(t - \frac{T}{4}\right). \tag{4.12}
\]

⁴ The functionality of the per-element enable bits is mentioned in Section 3.3 and will be
described in detail in Section 5.1.
This I&Q combination on the two interleaved checker board Tx apertures for
HD2 reduction is a broadband technique that works for any arbitrary pulse shape. A
brief mathematical explanation can show the reason. The arbitrary pulse shape I(t)
with a period of T can be represented by its Fourier series in Equation (4.13), where
V_0, V_1, V_2, ... are its Fourier coefficients and ω = 2π/T:

\[
I(t) = V_0 + V_1 e^{j\omega t} + V_2 e^{j2\omega t} + V_3 e^{j3\omega t} + \dots \tag{4.13}
\]

The delayed version pulse shape Q(t) is represented by:

\[
Q(t) = I\left(t - \frac{T}{4}\right)
     = V_0 + V_1 e^{j\omega t - j\pi/2} + V_2 e^{j2\omega t - j\pi} + V_3 e^{j3\omega t - j3\pi/2} + \dots \tag{4.14}
\]

The pulse shape is provided electrically, and goes through an electrical to mechanical
transduction. The process is modelled as a combination of both linear and
nonlinear processes in Equation (4.15). Because only second harmonic is of concern in
ultrasound systems, up to second-order nonlinearity is modelled for the investigation.

\[
\begin{cases}
p_I(t) = a + b \cdot I(t) + c \cdot I(t)^2 \\
p_Q(t) = a + b \cdot Q(t) + c \cdot Q(t)^2
\end{cases} \tag{4.15}
\]

Looking at the emitted pressure signals p_I(t) and p_Q(t), Equation (4.16) shows
only their second harmonic component:

\[
\begin{cases}
p_I(t)\,|_{HD2} = \left(b \cdot V_2 + c \cdot V_1^2 + 2c \cdot V_0 V_2\right) \cdot e^{j2\omega t} \\
p_Q(t)\,|_{HD2} = \left(b \cdot V_2 + c \cdot V_1^2 + 2c \cdot V_0 V_2\right) \cdot e^{j2\omega t - j\pi} = -p_I(t)\,|_{HD2}
\end{cases} \tag{4.16}
\]
Equation (4.16) indicates that the second harmonic components generated from the
I(t) and Q(t) excitations are out of phase by π, and this holds for any pulse shape⁵.
The fundamental components of p_I(t) and p_Q(t) are out of phase by π/2; therefore,
the combined fundamental intensity is 3dB lower than that of a single full-aperture
excitation. Furthermore, because the nonlinear model is general, not only
CMUT nonlinearity but also other sources of nonlinearity can be cancelled using this
method. For example, circuit mismatches tend to introduce asymmetry in the pulse
shape between the rising and falling edges. The simultaneous I&Q excitations on the
interleaved checker board apertures can still be effective in reducing the HD2 caused
by such circuit non-idealities.
Finally, the checker board patterns require that the element pitch be smaller than
or approximately equal to the ultrasound wavelength (λ = c/f), so that the grating
lobes are kept to a minimum and the HD2 cancellation in space is close to perfect.
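The cancellation argument of Equations (4.13) to (4.16) can be reproduced numerically. The sketch below is not the Field II simulation used in this work: it treats the far-field on-axis pressure as a plain sum of the two half-apertures, uses a periodic steady-state pulse train, and the pulse shape and the coefficients a, b, c are made-up values chosen only to produce a visible second harmonic.

```python
import numpy as np

SAMPLES_PER_CYCLE = 40   # divisible by 4, so T/4 is a whole number of samples
CYCLES = 20              # number of periods in the analysed record


def three_level_cycle(n):
    """One period of a deliberately asymmetric 3-level pulse: +1, 0, -1, 0."""
    cycle = np.zeros(n)
    cycle[: n * 2 // 5] = 1.0
    cycle[n // 2: n * 4 // 5] = -1.0
    return cycle


def transduce(v, a=0.0, b=1.0, c=0.2):
    """Second-order model of Eq. (4.15); a, b, c are illustrative values."""
    return a + b * v + c * v ** 2


def harmonic_amplitude(signal, harmonic):
    """Magnitude of the FFT bin that sits exactly on the given harmonic."""
    spectrum = np.fft.rfft(signal) / len(signal)
    return abs(spectrum[harmonic * CYCLES])


i_pulse = np.tile(three_level_cycle(SAMPLES_PER_CYCLE), CYCLES)
q_pulse = np.roll(i_pulse, SAMPLES_PER_CYCLE // 4)   # Q(t) = I(t - T/4)

p_i, p_q = transduce(i_pulse), transduce(q_pulse)
conventional = 2.0 * p_i   # both half-apertures driven with I(t)
iq = p_i + p_q             # one half driven with I(t), the other with Q(t)

fund_loss_db = 20 * np.log10(harmonic_amplitude(iq, 1)
                             / harmonic_amplitude(conventional, 1))
print(f"fundamental change with I&Q: {fund_loss_db:.2f} dB")
for h in (2, 6, 10):
    print(f"harmonic {h}: conventional {harmonic_amplitude(conventional, h):.3e}, "
          f"I&Q {harmonic_amplitude(iq, h):.3e}")
```

Because the quarter-period delay is an exact circular shift of a periodic record, the (2 + 4k)th harmonics of the summed signal vanish to within floating-point error for any pulse shape substituted into `three_level_cycle`, while the fundamental drops by about 3dB.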
4.3.3 Experimental Results
Both simulation and measurement are carried out to verify that the combination of
I&Q excitations cancels acoustic HD2 while the “useful” fundamental intensity is only
3dB less than with conventional full-aperture excitation. The simulation assumes a 10x10
array with a pitch of 250µm. 20 cycles of 4.2MHz pulses are used as the stimulating
pulse shape, which goes through a nonlinear transform modelled by Equation (4.15).
The pulse shape is 3-level with a peak-to-peak amplitude of 30Vpp, in order to mimic
the real measurement. Other pulse shapes, such as 2-level pulses or a sinusoid with
a Gaussian envelope, and different numbers of pulse cycles (between 2 and 20), are also
tried to verify that the cancellation works for arbitrary pulse shapes. For
conventional excitation, all elements are driven with the same pulse shape I(t). For the I&Q
method, the two interleaved banks of elements are driven by I(t) and the delayed
Q(t), respectively.
Figure 4-30 shows the Field II simulation of the I&Q method compared to
conventional excitation. The top two sub-figures (a) and (b) compare the fundamental
component of the emitted pressure field, in which the conventional method produces
a field with 3dB higher intensity than I&Q. The bottom two sub-figures (c) and (d)
compare the HD2 component of the emitted pressure field, which clearly shows a
large suppression from the I&Q method. The results at two spatial locations are listed in
Table 4.1, indicating an approximately 20dB reduction in HD2 from the I&Q method.

Figure 4-30: Simulation comparison between the conventional and I&Q methods (spatial
pressure fields over X, Y and Z in mm): (a) fundamental component spatial intensity for
conventional; (b) fundamental component spatial intensity for I&Q; (c) HD2 spatial
intensity for conventional; (d) HD2 spatial intensity for I&Q.

⁵ It is interesting to mention that not only the second harmonic, but also the 6th, 10th,
14th, etc. ((2 + 4·k)th, k = 0, 1, 2, ...) harmonics are out of phase by odd multiples of π
between the I(t) and Q(t) excitations.
Acoustic measurements are also performed to verify the proposed method. Because
there are a few non-functional CMUT elements in the 16x16 array, a
10x10 sub-array is chosen to carry out the comparison⁶. The ASIC Tx channels are
programmed to excite the CMUT with either the I&Q or the conventional full-aperture scheme,
using 3-level 30Vpp pulses⁷. With a hydrophone mounted on the 3D translation stage,
the emitted ultrasonic pressure wave is detected and shown on an oscilloscope. An
FFT shows the frequency content. At a given far-field spatial location (i.e. where
the hydrophone tip is located), the pressure intensities generated by the I&Q and the
conventional excitations are compared. The measured results at the same spatial
locations as in simulation (Table 4.1) are summarized in Table 4.2, which confirms
that the I&Q method has about 3dB less fundamental component and over 20dB less second
harmonic component, similar to the simulation results shown in Figure 4-30 and Table
4.1. Moreover, theory predicts cancellation of all (2 + 4·k)th, k = 0, 1, 2, ... harmonics.
In our measurement, reductions in the 6th and 10th components are observed on
the oscilloscope, while the 14th harmonic is too weak to see.

I&Q vs. Conventional (Simulation)    “A” (0, 0, 30.3)mm    “B” (0, 0, 10.2)mm
HD2 Reduction                        -19.7dB               -19.7dB
Fundamental Loss                     -3.0dB (the whole space)
Table 4.1: Simulated HD2 improvement of the I&Q method.

I&Q vs. Conventional (Measurement)   “A” (0, 0, 30.3)mm    “B” (0, 0, 10.2)mm
HD2 Reduction                        -21.7dB               -22.1dB
Fundamental Loss                     -3.4dB                -3.2dB
Table 4.2: Measured HD2 improvement of the I&Q method.
To sum up, the I&Q method could be used to reduce the second harmonic
generation in Tx for a 3D imaging system. The method works for arbitrary pulse shapes
and works equally well for nonlinearity generated by the transducer and by the circuit.
In particular, it mitigates the nonlinearity of the CMUT’s electrostatic
actuation, and it could suppress the harmonic from the pulser’s rising and falling edge
asymmetry.
⁶ More details about the non-functional elements and the fault-tolerant circuit design can be
found in Section 5.5.
⁷ Different pulse shapes are also tried to verify that the method works for arbitrary pulse shapes.
4.4 Annular Ring Apertures for Forward-looking Imaging Applications
Forward-looking ultrasonic imaging systems can be used for intravascular (within the
blood vessel) and intracardiac (within the heart) visualizations. The miniaturized
imaging system is mounted onto the tip of a catheter, which provides minimally
invasive diagnosis, interventions or treatments in medical procedures [73–76]. Currently,
the more commonly used ultrasound systems for intravascular ultrasound (IVUS) and
intracardiac echocardiography (ICE) are side-looking ones, while forward-looking ones
are gaining popularity because they offer complementary information.
Annular ring apertures are suitable for realizing forward-looking imaging. Although
dedicated annular ring arrays are available by custom fabrication [75, 76], a
general-purpose 2D array with the proposed Column-Row-Parallel architecture can achieve
similar results [77, 78]. The full 2D array provides even more flexibility, since more
rings can be formed within the regular 2D aperture.
4.4.1 Annular Ring Apertures on Column-Row-Parallel Architecture
As already shown in Chapter 3, the 2D array with the Column-Row-Parallel
architecture can form a circular aperture or an annular ring aperture by programming
the per-element bits under each element. For annular ring imaging, a circular Tx
aperture is used for transmit and four concentric annular rings with different diameters
can be activated as Rx apertures, as shown in Figure 4-31(a). The Tx elements are
supplied with the same delay value “D” as in Figure 4-31(b), so that the whole
circular aperture is driven in-phase and emits a broad ultrasound beam. The Rx
elements’ analog outputs are also combined in parallel along the column, and by
digitally summing the weighted waveforms from all column buffers, one echo waveform
is collected for each annular ring (Figure 4-31(c)-(f)). The weight for each
column is the number of active elements along the column. Equation (4.17) describes
the function of the weighted digital summation block:

\[
S_m(t) = \sum_{k=0}^{15} n_k \cdot s_{m,k}(t), \quad m = 1, 2, 3, 4. \tag{4.17}
\]

Figure 4-31: Annular ring mode imaging implemented in Column-Row-Parallel architecture:
(a) Tx and Rx aperture setup; (b) Tx aperture implemented in the proposed
architecture, all active elements are driven in-phase; (c) Rx aperture with the biggest
ring shape, all active elements’ analog outputs are combined; (d) Rx aperture with
the 2nd ring shape; (e) Rx aperture with the 3rd ring shape; (f) Rx aperture with
the smallest ring shape.

Figure 4-32: Annular ring mode dynamic beam-formation scheme.
Take the smallest ring in Figure 4-31(f) as an example: the numbers of active elements
in columns 6 to 9 (outputs s_{4,6}(t) to s_{4,9}(t)) are 4, 2, 2, 4, respectively.
Therefore the weighted summation should be:

\[
S_4(t) = 4 \times s_{4,6}(t) + 2 \times s_{4,7}(t) + 2 \times s_{4,8}(t) + 4 \times s_{4,9}(t). \tag{4.18}
\]
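The weighted summation of Equations (4.17) and (4.18) can be sketched as follows. The ring mask is a hypothetical layout (a 4x4 block on rows and columns 6 to 9 with its 2x2 centre removed) chosen only because it reproduces the per-column counts 4, 2, 2, 4 quoted above; the random column waveforms stand in for real column-buffer outputs.

```python
import numpy as np

N = 16  # array is N x N; ring_mask[row, col] == 1 marks an active Rx element

# Hypothetical mask for the smallest ring (assumption, see lead-in).
ring_mask = np.zeros((N, N), dtype=int)
ring_mask[6:10, 6:10] = 1
ring_mask[7:9, 7:9] = 0


def column_weights(mask):
    """n_k in Eq. (4.17): number of active elements along each column."""
    return mask.sum(axis=0)


def weighted_digital_sum(column_waveforms, weights):
    """Eq. (4.17): S_m(t) = sum_k n_k * s_{m,k}(t).

    column_waveforms has shape (N, samples), one row per column buffer.
    """
    return weights @ column_waveforms


weights = column_weights(ring_mask)
rng = np.random.default_rng(0)
s4 = rng.standard_normal((N, 8))      # placeholder column-buffer waveforms
S4 = weighted_digital_sum(s4, weights)
print("weights for columns 6..9:", weights[6:10].tolist())
```

Columns with no active element get a weight of zero, so they drop out of the sum without any special handling.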
The four Rx annular rings are activated over four consecutive Tx transmits, as
shown in Figure 4-32. The digital waveforms from the four Rx rings can then be
dynamically beamformed to generate a synthetic A-scan line along the axial axis of
the rings. Because all elements on the same ring have the same time-of-flight to a
point on the axial axis, each ring has a natural focusing effect along the axial axis. The
delay value for a spatial point located at depth z away from the transducer surface,
for the ring with a radius of R_m, is calculated as:

\[
\tau_m(z) = \sqrt{z^2 + R_m^2}\,/\,c, \quad m = 1, 2, 3, 4. \tag{4.19}
\]

The beamformed image line along the axial axis thus becomes:

\[
S_{BF}(z, t) = \sum_{m=1}^{4} S_m\left(t - \tau_m(z)\right). \tag{4.20}
\]
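To give a feel for the delays in Equation (4.19), the sketch below evaluates τ_m(z) for four rings. The ring radii are rough estimates that assume the rings span 16, 12, 8 and 4 columns of a 250µm-pitch array, and the sound speed of 1540m/s is a typical soft-tissue value; neither is stated for this calculation in the text. Only the delay computation is shown, not the full beamformer.

```python
import math

SOUND_SPEED_M_PER_S = 1540.0   # assumed, typical soft-tissue value
PITCH_M = 250e-6               # element pitch of the array

# Assumed ring radii: half the column span of each ring, in metres.
RING_RADII_M = [7.5 * PITCH_M, 5.5 * PITCH_M, 3.5 * PITCH_M, 1.5 * PITCH_M]


def ring_delay_s(z_m, ring_radius_m, c_m_per_s=SOUND_SPEED_M_PER_S):
    """Eq. (4.19): time of flight between a ring and an on-axis point at depth z."""
    return math.hypot(z_m, ring_radius_m) / c_m_per_s


def delay_spread_s(z_m):
    """Difference between the largest and smallest ring delay at depth z."""
    delays = [ring_delay_s(z_m, r) for r in RING_RADII_M]
    return max(delays) - min(delays)


for depth_mm in (5.0, 10.5, 20.0):
    z = depth_mm * 1e-3
    delays_us = [ring_delay_s(z, r) * 1e6 for r in RING_RADII_M]
    print(f"z = {depth_mm} mm:", ", ".join(f"{d:.3f} us" for d in delays_us))
```

Under these assumptions the spread between the outer and inner ring delays shrinks as the depth grows, which is why the delays have to be updated with depth (dynamic beamforming) rather than fixed once.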
The circular and ring apertures are translated horizontally, so that different axial
A-scan lines can be collected to form volumetric images. Examples of the translated
Tx and Rx apertures are shown in Figure 4-33. Edge effects affect the scan
line intensity slightly, but not significantly.
4.4.2 Annular Ring Imaging Results
The forward-looking programmable annular ring array can form volumetric images
by moving the circular Tx and annular Rx apertures in the 2D array, so that multiple
axial lines can be acquired. Both simulation and measurement of a wire phantom
are performed, similar to the PWCC3D experiments. The wire phantom is 0.48mm
in diameter and is placed 10.5mm away from the transducer surface, parallel
to the surface. The transmit excitation is 2 bursts of 8.33MHz pulses. The volumetric
images are displayed with a 20dB dynamic range. A total of 81 circular Tx apertures are
swept through the 2D array, acquiring data for 81 axial lines. With 4 beamforming
annular rings at each Tx aperture location, a total of 324 transmit-receive repetitions
are needed to acquire a full set of volumetric data. Similar to the PWCC3D equations
(4.8) and (4.9), the acquisition time and volume rate for the annular ring imaging
system can be calculated with (4.21) and (4.22).
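\[
\text{Acquisition Time} = \frac{(\#\,\text{Axial Lines}) \times (\#\,\text{Annular Rings})}{\mathrm{PRF}}, \tag{4.21}
\]
\[
\text{Volume Rate} = \frac{\mathrm{PRF}}{(\#\,\text{Axial Lines}) \times (\#\,\text{Annular Rings})}. \tag{4.22}
\]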
The acquisition time again scales linearly with the number of axial lines in
the volumetric image and with the number of annular rings used for beam-formation.
Figure 4-33: Annular ring configuration example, off-center: (a) Tx and Rx aperture
setup; (b) Tx aperture implemented in the proposed architecture; (c) Rx aperture
with the biggest ring shape; (d) Rx aperture with the 2nd ring shape; (e) Rx aperture
with the 3rd ring shape; (f) Rx aperture with the smallest ring shape.

The volumetric images from simulation and measurement are shown in Figure
4-34. The cross-sectional images display a clear wire in the space and, at the same
time, allow the imaging performance to be evaluated. The measured -10dB lateral
resolution (from the XZ slice, Figure 4-34(b)) is 1.19mm and the -10dB axial resolution
(from the YZ slice, Figure 4-34(d)) is 0.32mm. Both numbers are close to the
performance measured from PWCC3D images with single-angle plane-wave insonification
in Section 4.2. Using a 10kHz PRF, the volume rate is 30.9 volumes/s, according to
(4.22).
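The quoted numbers can be reproduced from Equations (4.21) and (4.22) with the values given in this section (81 axial lines, 4 annular rings, 10kHz PRF). The function names below are illustrative only.

```python
def transmit_receive_repetitions(axial_lines, annular_rings):
    """One Tx/Rx repetition per ring per axial line."""
    return axial_lines * annular_rings


def annular_acquisition_time_s(axial_lines, annular_rings, prf_hz):
    """Eq. (4.21): repetitions divided by the pulse repetition frequency."""
    return transmit_receive_repetitions(axial_lines, annular_rings) / prf_hz


def annular_volume_rate_hz(axial_lines, annular_rings, prf_hz):
    """Eq. (4.22): reciprocal of the acquisition time."""
    return prf_hz / transmit_receive_repetitions(axial_lines, annular_rings)


reps = transmit_receive_repetitions(81, 4)
rate = annular_volume_rate_hz(81, 4, 10_000)
print(f"{reps} repetitions, {rate:.1f} volumes/s")   # 324 repetitions, 30.9 volumes/s
```

Doubling the number of rings in the same call halves the volume rate, which is the linear trade-off discussed in the text.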
Lastly, this section aims to demonstrate the capability of the Column-Row-Parallel
architecture in forward-looking ultrasonic imaging applications. The 16x16 array size
limits the spatial range that can be covered. As the manufacturing technology improves, a
bigger array could lead to much better image quality. Furthermore, the number of
annular rings can also be increased, but at the cost of a proportionally lower volume
rate (the volume rate is inversely proportional to the number
of annular rings used).
4.5 Summary
This chapter has presented several 3D medical ultrasonic imaging applications for
the Column-Row-Parallel ASIC architecture. The 16x16 CMUT-PCB-ASIC imaging front-end is assembled to demonstrate the 3D imaging algorithms such as the
plane-wave coherent compounding and the annular ring aperture imaging. The same
architecture can be programmed to implement different imaging algorithms. Both
schemes are suitable for high volume rate imaging with decent quality. Moreover,
the architecture enables an interleaved checker board pattern with I&Q excitations
for Tx HD2 reduction. The scheme is a promising way to mitigate the intrinsic
nonlinearity of the CMUT, facilitating the ultrasonic harmonic imaging mode.
Figure 4-34: Cross-section slices of the wire phantom 3D images from simulation and
measurement: (a) simulated XZ slice; (b) measured XZ slice; (c) simulated YZ slice;
(d) measured YZ slice; (e) simulated XY slice; (f) measured XY slice.
Chapter 5
Design of the 16x16 Ultrasonic Transceiver Array ASIC with Column-Row-Parallel Architecture
The transistor-level design of the 16x16 ultrasonic transceiver ASIC is described in
this chapter. It follows the high-level description in Chapter 3, and adds implementation details to Chapter 4.
The block-level circuit design [5, 6] is optimized to interface to CMUT transducers. However, the architecture-level design is a flexible and scalable analog front-end
solution for 2D ultrasonic arrays in general, applicable to different technologies, such
as CMUTs, PMUTs, and bulk PZTs. This chapter will cover both architecture-level
and block-level circuit designs.
5.1 High-Level Description of the Ultrasonic Imaging Transceiver Circuits and the Architecture Logic Implementation
This section describes the digital implementation of the Column-Row-Parallel
architecture. The design aims to realize the rich functionality presented in Chapters 3
and 4, while achieving linear scaling for the programming time. The control logic
attains a proper separation of functionality, such that the control from the sides is
used more often to take advantage of its fast programming time, while the control
within each element provides more diverse system functionality.
The overview of the proposed Column-Row-Parallel architecture has been given
in Section 3.3. Figure 3-4 shows the array structure and Figure 3-5 shows the
per-element circuit block diagram. In addition to these main blocks, more circuit details
are discussed here. For convenience, the block diagram of one transceiver channel
in the 2D array presented in Figure 3-5 is shown again in Figure 5-1.
Each CMUT element in the 2D array is DC-biased through a shared RC network
provided off-chip. Resistor Rb and capacitor Cb filter out noise from the high voltage
supply and provide an AC ground for the transducer. The DC bias voltage V_BIAS
applied to the CMUT is between 20 and 50V. The transceiver channel includes a 30Vpp
high voltage pulser in the transmit (Tx) path, which drives the ultrasonic transducer
to emit acoustic energy. The emitted ultrasonic wave travels through the medium
and is reflected whenever it hits a medium boundary with a mechanical impedance
mismatch. The reflected echoes are transformed by the CMUT element into a weak
electrical signal. A low noise amplifier (LNA) in the receive (Rx) path amplifies
the weak signal to the output. During transmission, the Rx switch is turned off
to prevent the high voltage transients from damaging the LNA, which is implemented with
low voltage transistors. The CMUT device used in this work has a pass-band of 3-10MHz.
The frequency range, power consumption, noise, and linearity performance of the Tx
and Rx circuits are designed and optimized for this CMUT device’s parameters.
After collecting multiple channels’ outputs from one or several transmissions,
ultrasonic images can be generated, as shown in Chapter 4. Medical ultrasound
systems use beam-formation to improve the image quality. Tx beam-formation is
realized by controlling and applying different delays across the Tx channels. Similarly,
the received signals are digitized and processed by an off-chip Rx beamformer.
The digital control inside each transceiver channel has been described in Section
3.3. The combination of Row Select Signals, Column Select Signals and Per-element
Enable Signals determines whether the channel is connected to the column side, the
row side, or turned off. The per-element logic implementation is shown in
Figure 5-1(b). The transmitter and receiver are controlled independently and they
are time-multiplexed during normal operation.

Figure 5-1: A re-plot of Figure 3-5 in Section 3.3. (a) The block-level implementation
of one transceiver channel and (b) the per-element logic implementation. Column
and row select logic is implemented with shift registers that can be reprogrammed in
“N” time (implementation detail is shown in Figure 5-2).

Figure 5-2: Circuit implementation for the logic control: (a) multiplexing for
per-element enable bits; (b) Tx row / column selection logic; (c) Rx row / column selection
logic.
Both the control signals on the sides and the per-element controls are implemented
with shift registers (SRs), which can be programmed serially. Each set of controls
is realized with two multiplexed SR banks, as shown in Figure 5-2. P_bankSel
(Figure 5-2(a)), T_bankSel (Figure 5-2(b)), and R_bankSel (Figure 5-2(c)) control
the SR bank selection for the per-element enable bits (T_en and R_en), the Tx row / column
selection (T[0:15]) and the Rx row / column selection (R[0:15]), respectively. There are
several benefits associated with this implementation. First, while one bank is being
programmed, the other bank is in use, which avoids interrupting operation. Second, the
two banks can be pre-programmed, and by quickly switching between them
using the bankSel signal, they can take turns controlling the ASIC.
Additionally, the Tx and Rx side controls in Figure 5-2(b) and (c) are further
divided into Row Select Signals (Tr[0:15] and Rr[0:15]) and Column Select Signals
(Tc[0:15] and Rc[0:15]). Only one side can be active at any given time. Taking the
Tx control as an example, the multiplexed SR outputs T[0:15] are forked and gated
by a pair of multiplexers controlled by Tmode to generate the Tx Row Select Signals
(Tr[0:15]) and the Tx Column Select Signals (Tc[0:15]). The logic ensures that while
one side is activated, the other remains all “0”.
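The bank multiplexing and the Tmode gating described above can be modelled behaviourally in a few lines. This is a sketch of the described behaviour, not the circuit itself: the function names are hypothetical, and the polarities (bank select 0 picks bank 1, Tmode 0 routes T[0:15] to the row side) are assumptions made for illustration.

```python
def select_bank(bank1_bits, bank2_bits, bank_sel):
    """2:1 multiplexer between two shift-register banks.

    Assumed polarity: bank_sel == 0 selects bank 1, otherwise bank 2.
    """
    return list(bank2_bits) if bank_sel else list(bank1_bits)


def split_side_select(t_bits, t_mode):
    """Fork T[0:15] into (Tr, Tc) so that only one side is ever active.

    Assumed polarity: t_mode == 0 drives the row side, t_mode == 1 the column side.
    The inactive side is forced to all zeros.
    """
    idle = [0] * len(t_bits)
    if t_mode == 0:
        return list(t_bits), idle
    return idle, list(t_bits)


# Two pre-programmed banks: a single selected line, and all lines selected.
bank1 = [1 if k == 3 else 0 for k in range(16)]
bank2 = [1] * 16

t_bits = select_bank(bank1, bank2, bank_sel=0)
tr, tc = split_side_select(t_bits, t_mode=0)
print("Tr:", tr)
print("Tc:", tc)
```

Swapping `bank_sel` changes the active pattern without reloading any bits, which mirrors the fast bank switching described in the text.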
As briefly mentioned previously, this partition of controls provides flexibility
and scalability. The side controls are programmed within “N” time, so they can
easily be reprogrammed between consecutive ultrasound transmits. An example use
of this control style is the row-by-row receive implemented in the PWCC3D algorithm
in Section 4.2. The per-element controls provide maximum flexibility, although
they take longer to program (“N²” time), as they are snake-chained
through all 2D array elements. They can be changed less often to switch
modes. Examples include the annular ring imaging experiments in Section 4.4, and
the selective disabling of non-functional elements as will be described in Section 5.5,
which is a critical fault-tolerant feature for analog front-end circuits working with
MEMS devices. Lastly, alternating between two pre-programmed SR banks realizes
fast swapping between two ultrasound aperture patterns. The simultaneous I&Q
excitations on the interleaved checker board patterns in Section 4.3, implemented by
switching between two per-element SR banks, are a perfect example.
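The difference between “N” and “N²” programming time, and the two checker board banks used in Section 4.3, can be illustrated with a small sketch. The snake ordering used here (row by row, reversing every other row) is an assumption about how the chain is routed, and the helper names are hypothetical.

```python
N = 16  # array is N x N


def checkerboard_banks(n):
    """Two complementary per-element enable patterns (Bank1 and Bank2)."""
    bank1 = [[1 if (r + c) % 2 == 0 else 0 for c in range(n)] for r in range(n)]
    bank2 = [[1 - bit for bit in row] for row in bank1]
    return bank1, bank2


def snake_serialize(bits_2d):
    """Flatten a 2D pattern into one serial stream for a snake-chained register.

    Assumed order: left-to-right on even rows, right-to-left on odd rows.
    """
    stream = []
    for r, row in enumerate(bits_2d):
        stream.extend(row if r % 2 == 0 else row[::-1])
    return stream


bank1, bank2 = checkerboard_banks(N)
per_element_stream = snake_serialize(bank1)
side_stream = [1] * N   # e.g. all rows (or all columns) selected

print("side control bits to shift in:  ", len(side_stream))          # N
print("per-element bits to shift in:   ", len(per_element_stream))   # N * N
```

For the 16x16 array this is 16 bits against 256 bits per reload, which is why the per-element banks are pre-programmed and swapped rather than rewritten between transmits.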
5.2 Tx Circuit Design
This section describes the design of the transmitter path. The block-level
transmitter circuit design, which is optimized to drive CMUT elements, is introduced
first [5, 6]. A multi-level pulse shaping technique with charge recycling is proposed to
boost the efficiency of the transmitter with a CMUT element load. The design is
highly scalable and compact, requiring minimal off-chip components. Next, design
issues for making a 2D array of transmitters are described; these are more general
and apply to many other types of ultrasonic transducers. High voltage pass-gate
transistors implement multiplexers, which realize the programmable column / row
addressing and handle the parallelism of multiple transmit elements.
5.2.1 Multi-Level Pulsing for Efficient CMUT Driver
For the transmitter, high voltage linear amplifiers are commonly used to drive
PZT loads to achieve good linearity and acceptable efficiency [79, 80]. To drive a
CMUT load, however, linear amplifiers are not optimal. In addition to the
amplifier power consumption, a considerable power loss is associated with charging
and discharging the parasitic capacitance of the CMUT element [41], degrading the
overall power efficiency of the transmitter stage. Furthermore, the linearity of the
amplifier does not translate into good linearity of the transmitter stage,
because the CMUT element distorts the amplifier’s output waveform through the
nonlinear relationship between the electrical input signal and the electrostatic force
acting on the element’s membrane [20]. Resonant transmitters with inductors to cancel
out the loading capacitance could boost the power efficiency [27]. However, bulky
off-chip inductors of several micro-Henries are needed for every transmitting
channel to work with typical loads of 10-200pF per channel in the ultrasound operating
frequency range of 1-20MHz [81–83], which is undesirable for compact integration.
Alternatively, the multi-level pulsing technique, which was initially introduced for
chip-to-chip interconnects [84], can be applied to reduce the power consumption on
the capacitive load. Multi-level techniques have been used in PZT ultrasound drivers
for pulse-shaping and harmonic suppression [81–83,85]. However, the power efficiency
was not improved because charge recycling was not implemented between the multiple voltage levels. This section presents the advantage of the multi-level pulsing with
charge recycling to improve the combined power efficiency of the CMUT transducer
and transmitter. It also requires the fewest off-chip components, as will be seen in
Section 5.2.2.
The transmitter load model of a CMUT element is represented by a capacitor
and resistor in parallel, as shown in Figure 5-3(a). The capacitor C is the parallel-plate capacitance between the CMUT element’s membrane and the common node.
Figure 5-3: (a) The transmitter load model of a CMUT element used in this work.
(b) An exemplary 2-level square wave pulse applied onto CMUT. (c) An exemplary
3-level pulse applied onto CMUT.
The resistor R is the medium’s mechanical load at the CMUT surface, transformed
to the electrical port [41]. The power dissipated by R, due to the electrical pulse’s
fundamental frequency component, models the useful acoustic power delivered into
the medium. The power dissipated while charging and discharging C (dynamic power)
does not contribute to the acoustic output and thus is wasted.
The CMUT transducer used in this work is a 16x16 2D array. Each CMUT element
has a size of 250µm × 250µm and is modelled as 2pF || 1MΩ [41]. The Tx efficiency is
defined as the ratio between the useful acoustic power and the total power dissipated.
It models the combined efficiency of CMUT and the ultrasonic pulser together, by
capturing both the power loss in the pulser circuitry and the dynamic power dissipated
by charging and discharging the CMUT parasitic capacitance.
To show how multi-level pulse-shaping increases Tx efficiency, first assume the
conventional 2-level square wave pulses are used to drive a 2pF || 1MΩ load, as shown
in Figure 5-3(b). The pulse magnitude is 30Vpp at a frequency of 3.3MHz. The
amplitude of the fundamental frequency component is the corresponding Fourier series coefficient of the periodic
pulse shape, as described in (5.1):
$$ V_1 = \frac{2}{T_p} \int_0^{T_p} v(t) \cdot \sin\left(\frac{2\pi}{T_p} \cdot t\right) dt. \tag{5.1} $$
The amplitude V1 is calculated to be 19.1V, or 13.5Vrms. Therefore, the power dissipated on the 1MΩ resistor, i.e., the transmitted ultrasonic power at the fundamental
frequency, is 0.182mW. Meanwhile, the dynamic power wasted on charging and discharging the capacitor C is calculated to be CV²f = 6mW.
An N-level pulser, using (N − 1) regulated voltage sources to charge and discharge
the capacitor in a stepwise fashion, reduces the wasted dynamic power to CV²f/(N − 1) [84]. The power saving comes from the charge recycling mechanism during the
discharge operation, which is enabled by the regulated voltage supplies¹. Instead of
discarding all the capacitor charge CV to ground as in the square wave case, a charge
packet of CV /(N − 1) is recycled back to the power supply when the capacitor is
switched from one voltage level to the next lower one. As many as (N − 2) charge
packets of CV /(N − 1) are recycled until the last packet is dumped to ground. As a
result, the dynamic power is reduced by a factor of (N − 1). At the same time, the
magnitude of the fundamental component is only decreased slightly following (5.1),
leading to overall efficiency improvement. For example, Figure 5-3(c) shows 3-level
pulses with 20ns middle voltage level steps, out of a 300ns period. Its fundamental
frequency component amplitude is 18.7V, or 13.2Vrms. The useful power delivered
is 0.174mW and the dynamic power is CV²f/2 = 3mW. A comparison to the square
wave example reveals theoretically a 49% total power saving with only a 4.4% acoustic
power reduction, or equivalently 88% more acoustic output power given the same total
power dissipation.
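The numbers in this comparison can be reproduced with a short numerical sketch. The code below evaluates (5.1) with a midpoint rule and applies the CV²f/(N − 1) dynamic power expression to the 2pF || 1MΩ load. It assumes that each 20ns middle-voltage step is centred on an edge of the original square wave; that placement, and all function and constant names, are illustrative assumptions rather than details taken from the design.

```python
import math

C_CMUT = 2e-12   # CMUT capacitance from the text (F)
R_CMUT = 1e6     # equivalent acoustic load from the text (ohm)
VPP = 30.0       # pulse swing (V)
TP = 300e-9      # pulse period (s), about 3.3 MHz
STEP = 20e-9     # duration of each 15 V step (s)

def fundamental_amplitude(v, period, steps=30000):
    """Midpoint-rule evaluation of (5.1) over one period."""
    dt = period / steps
    acc = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        acc += v(t) * math.sin(2 * math.pi * t / period) * dt
    return 2.0 / period * acc

def two_level(t):
    return VPP if t < TP / 2 else 0.0

def three_level(t):
    # Assumption: each 15 V step is centred on an edge of the square wave.
    if t < STEP / 2 or t >= TP - STEP / 2:
        return VPP / 2
    if t < TP / 2 - STEP / 2:
        return VPP
    if t < TP / 2 + STEP / 2:
        return VPP / 2
    return 0.0

def tx_power(v, levels):
    """Return (acoustic power, dynamic power) in watts for an N-level pulse."""
    v1 = fundamental_amplitude(v, TP)
    p_acoustic = (v1 / math.sqrt(2)) ** 2 / R_CMUT
    p_dynamic = C_CMUT * VPP ** 2 / TP / (levels - 1)
    return p_acoustic, p_dynamic

pa2, pd2 = tx_power(two_level, 2)
pa3, pd3 = tx_power(three_level, 3)
saving = 1 - (pa3 + pd3) / (pa2 + pd2)
print(f"2-level: {pa2 * 1e3:.3f} mW acoustic, {pd2 * 1e3:.1f} mW dynamic")
print(f"3-level: {pa3 * 1e3:.3f} mW acoustic, {pd3 * 1e3:.1f} mW dynamic")
print(f"total power saving: {saving:.1%}")
```

The sketch ignores the pulser circuitry's own losses, so it only covers the load-side part of the Tx efficiency defined above.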
¹ Without regulated supplies that recycle charge, the dynamic power cannot be reduced even
with multi-level pulsing, as is the case in [81–83].
Figure 5-4: Circuit schematic of the four-channel 3-level pulsers with the middle-voltage generation (all transistors are high voltage devices).
5.2.2 3-Level Pulser Circuit Design
The 3-level pulser is implemented as shown in Figure 5-4. The three pulse voltage
levels are 30V (HVDD), 15V and 0V (GND). The 15V middle voltage is generated
from a 2:1 parallel-series switched-capacitor DC-DC converter (M5-M8), which is
shared between channels. The only off-chip components are two 0.1µF capacitors.
Because of the charge recycling nature of the proposed 3-level pulser, and because the
CMUT load (roughly 2pF per channel) is much smaller than 0.1µF, the converter
can operate at a very low frequency (10-100Hz) to save power, consuming less than
1% of the total 256-channel pulsing power.
3-level pulse-shaping is implemented with four high voltage switches (M1-M4) in
each channel. NMOS M1 and M2 are used for the transitions of 15→0V and 0→15V
respectively, while PMOS M3 and M4 are used for the transitions of 30→15V and
15→30V respectively. The on-resistance of each transistor and the CMUT capacitance
form an RC time constant that determines pulse voltage level settling. The transistors
are sized wide enough to keep the RC time constant at around 3ns, so that the 10%
to 90% rise / fall time is 6.6ns. This is close to 1/20 of the pulse cycle typically
used (3-10MHz pulses with pulse cycles of 100-333ns) to make sure the 3-level pulse
shape is not excessively compromised by the settling edges. The relative timing
differences between the channels’ gate control signals are digitally adjustable and
effectively implement the Tx beamforming.
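As a quick check of the settling numbers, the sketch below relates the RC time constant to the 10%-90% rise time of a single-pole step response and to the switch on-resistance implied by a 2pF load. It assumes the CMUT capacitance is the only load on the switch (the drain capacitance of the high voltage devices is ignored), and the function names are illustrative.

```python
import math

def rise_time_10_90(tau):
    """10%-90% rise time of a single-pole RC step response: tau * ln(9)."""
    return tau * math.log(9.0)

def on_resistance_for_tau(tau, c_load):
    """Switch on-resistance that gives time constant tau into c_load."""
    return tau / c_load

tau = 3e-9       # target RC time constant from the text (s)
c_cmut = 2e-12   # CMUT capacitance (F); switch parasitics ignored (assumption)

t_rise = rise_time_10_90(tau)
r_on = on_resistance_for_tau(tau, c_cmut)
print(f"rise time: {t_rise * 1e9:.1f} ns, on-resistance: {r_on:.0f} ohm")
for cycle in (100e-9, 333e-9):
    print(f"rise time / {cycle * 1e9:.0f} ns cycle = {t_rise / cycle:.3f}")
```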
To reduce the number of I/O ports needed for pulser gate control, a non-overlapping
2-to-4 line decoder is used for each channel with 2 lines of low voltage control inputs (Ain and Bin) supplied off-chip from an FPGA running at 100MHz. As shown
in Figure 5-5(a), the inputs first go through non-overlapping signal generation blocks
(implementation shown in Figure 5-5(b)) before being fed into the 2-to-4 decoder.
The non-overlapping block ensures that the generated low-voltage gate control signals (φ1(LV) to φ4(LV)) have dead time between each other, such that the pulser
transistors (M1-M4 in Figure 5-4) are never on at the same time, which would dissipate unnecessary crowbar current. The non-overlapping dead time is 2-bit adjustable through the
variable length delay lines controlled by Delay[0:1] to provide enough adjustment
margin.
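The behaviour of the non-overlapping decoder can be illustrated with a small discrete-time model. This is a behavioural sketch only: the break-before-make rule (an output rises only after its input has been stable for `dead` samples) and the assignment of input codes to φ1-φ4 are assumptions read from Figure 5-5(a), and the function names are hypothetical.

```python
def nonoverlap(x, dead):
    """Return (x_true, x_comp): versions of x and its complement that are never
    high together. Each rises only after x has been stable for `dead` samples."""
    x_true, x_comp = [], []
    for n in range(len(x)):
        window = x[max(0, n - dead): n + 1]
        x_true.append(int(all(window)))
        x_comp.append(int(not any(window)))
    return x_true, x_comp

def decode(a, b, dead=1):
    """2-to-4 one-hot decode of the non-overlapped inputs, per time sample."""
    a_t, a_c = nonoverlap(a, dead)
    b_t, b_c = nonoverlap(b, dead)
    phases = []
    for n in range(len(a)):
        phases.append((
            a_c[n] & b_c[n],  # phi1 = Ab & Bb (assumed mapping)
            a_c[n] & b_t[n],  # phi2 = Ab & B
            a_t[n] & b_c[n],  # phi3 = A & Bb
            a_t[n] & b_t[n],  # phi4 = A & B
        ))
    return phases

# Gray-coded input sequence so that only one control line changes at a time.
a = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
b = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
phases = decode(a, b, dead=1)
for n, p in enumerate(phases):
    print(n, p)
```

In this model any two phases differ in at least one input literal, so the guarantee that a signal and its complement are never high together is enough to rule out two gate controls being active at once.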
The low-voltage gate controls are further level-shifted by the cross-coupled level
shifters in Figure 5-5(c), which translate the low-swing signals into high-swing signals
that drive gates of the high voltage transistors in the pulser and the DC-DC converter.
The threshold voltage of M1 and M2 is low enough such that they can be completely
turned on by the 3.3V inverters. The level-shifted gate drive signals have a 30V voltage
swing, which is under the rated operation conditions of high voltage transistors in
this process. The typical set of 3-level pulser control signals and the resultant 3-level
pulse shape at the output Vo are shown in Figure 5-5(d). Because the low-voltage
signal swing is small, the digital control power is negligible compared to the pulser
power.
This design of multi-channel pulsers with a shared voltage converter can be extended easily to more Tx channels, without additional off-chip components. It could
also be revised to implement more voltage levels to achieve more dynamic power reduction. However, this requires the addition of more switches connected between the
Figure 5-5: The digital control circuits for the pulser: (a) the signal flow and block diagrams; (b) the non-overlapping signal generator; (c) the level shifter implementation;
(d) the control signal timing diagram.
CMUT and the voltage levels. Due to the large drain capacitance of high voltage
switches, the self-loading effect takes away much of the power savings from introducing additional voltage levels. According to simulation results of the 0.18µm CMOS
process used in this work, a 3-level pulser dissipates 16% of total power to drive the
gate and drain capacitance of M5-M8 in Figure 5-4. For a 4-level pulser, the dynamic
power reduction is counteracted by the power increase to drive more and bigger transistors, leaving the overall efficiency roughly the same as a 3-level pulser. A 5-level
pulser incurs even more power penalty on driving the high voltage transistors and the
efficiency is lower than a 3-level pulser. Therefore, a 3-level pulser design is used in
this work.
5.2.3 Tx Path Design for 2D Ultrasonic Transducer Arrays
For the 2D ASIC implementation, a 2D grid of per-element 3-level 30Vpp pulse-shaping pulsers is connected by column and row lines, and additional circuitry is added
to support column-parallel and row-parallel modes.
Figure 5-6(a) shows the complete schematic of a pulser at the j-th row and i-th
column and its corresponding row and column gate drivers. Except for M2 and M3, all
transistors’ bulks are connected to their sources. The bulks of M2 and M3 are connected to
0V and 30V respectively. The pass-gate multiplexers², implemented in high voltage
transistors, are added into the per-element pulser as shown in Figure 5-6(b). They
implement the functionality of the Tr and Tc switches in Figure 5-1(a), so that the
pulser gates can be driven by the row driver, by the column driver, or by neither, in
which case the gate is held at 0V for M1-M2 and at 30V for M3-M4.
An important issue in the 2D array design is the line parasitics. To account for the
line parasitics accurately, the line metal layout is extracted to obtain the estimated
lumped circuit model (Rp, Cp) as shown by the red circle in Figure 5-6(a). The pulser
is placed under each element to avoid the parasitics affecting the pulsing performance
as much as possible. This is because the line parasitics are only present as a load for
² All four pulser gates, M1-M4, have their own MUX, but only that of M3 is shown in Figure 5-6(a) as an
example.
Figure 5-6: Tx design for the 2D array: (a) 2D pulser schematic; (b) MUX implementation.
the gate drivers, rather than the pulser itself. In this way, when different
numbers of elements are active along a column or a row line, the gate driver sees different
loads while each pulser always sees a constant, local CMUT load. Therefore, the
design makes sure that the pulse’s shape and amplitude are preserved and invariant to
the number of active elements. Meanwhile, the gate driver transistor sizing is optimized
to drive pulsers on the same column or row line in the presence of parasitic line
capacitance.
In the current design, the gate drivers are sized for the heaviest driving load, which
corresponds to all 16 active pulser gates (about 90-100fF per gate capacitance), and
the line parasitics across the length of 4mm (16 × 250µm). The column / row line
layout is implemented with minimum width metal wire layout, giving an estimated
per-element (250µm length) line parasitic model: Rp = 62Ω, Cp = 25fF. The gate
driver power consumption takes up about 35% of the total power consumption in Tx.
However, in the future, the gate drivers can also be made programmable in driving
strength, so that they can adapt to the number of active elements to save power. With
the adaptive driving strength, the self-loading of the gate driver at light load can
be reduced, and a constant pulser efficiency can be maintained.
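To give a feel for how much the gate driver load varies with the number of active elements, the following sketch lumps the per-element parasitics into a single line resistance and capacitance. It is a first-order estimate only: the 95fF gate capacitance is an assumed midpoint of the 90-100fF range quoted above, the distributed RC nature of the line is ignored, and the function name is hypothetical.

```python
def line_load(n_active, c_gate=95e-15, n_segments=16, r_seg=62.0, c_seg=25e-15):
    """Lumped (R, C) load seen by one row / column gate driver.

    c_gate: assumed pulser gate capacitance (midpoint of 90-100 fF).
    r_seg, c_seg: per-element line parasitics at minimum wire width.
    """
    c_total = n_active * c_gate + n_segments * c_seg
    r_total = n_segments * r_seg
    return r_total, c_total

r_line, c_heavy = line_load(16)  # all 16 pulser gates active
_, c_light = line_load(1)        # a single active pulser gate
print(f"line resistance: {r_line:.0f} ohm")
print(f"heaviest load: {c_heavy * 1e12:.2f} pF, lightest load: {c_light * 1e12:.3f} pF")
print(f"load ratio: {c_heavy / c_light:.2f}")
```

Under these assumptions the capacitive load changes by roughly a factor of four between one and sixteen active elements, which is the spread a programmable driving strength would have to cover.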
5.3 Rx Circuit Design
This section describes the design of the receiver path. The block-level circuit optimization to interface to CMUT elements is presented first [5, 6]. A transimpedance
amplifier (TIA) topology is utilized to improve the trade-offs between noise, bandwidth, and power dissipation. Design optimizations for the 2D array will be described
next, which can be applied to general 2D ultrasonic transducers. A specially sized
source follower output stage is added to the LNA to implement receiver parallelization
for improved SNR.
5.3.1 LNA Optimization Methodology for CMUT
For the receiver, large input capacitance limits the bandwidth and tends to increase
the noise contribution from the input stage transistors, degrading the noise figure
(NF). Bulky off-chip inductors are needed to impedance match the source to a traditional PZT pre-amplifier that assumes a low-impedance source [27]. Charge-based
amplifiers were attempted for CMUTs. The continuous-time charge amplifier achieved
low noise and low power performance for CMUT working at kHz range [86], but the
large impedance from the DC-setting network limits the bandwidth for a CMUT array working at MHz range for medical imaging applications. The switched-capacitor
charge integrating amplifier in [87] could provide enough bandwidth, but issues such
as clock feed-through and charge injection are difficult to mitigate for the inherently
Figure 5-7: Small signal model and noise sources of the CMUT element and the LNA.
single-ended CMUT signal path. Moreover, because the sampling clock switches at a
higher frequency than input signal bandwidth, the settling requirement for the op-amp
demands a higher bandwidth than what is needed in op-amps used as continuous-time
buffers, leading to much more power consumption. In this section, the TIA topology
is described to improve the trade-off between gain, bandwidth and noise, with an
inductor-less design in the presence of high input capacitance [40, 45, 88].
Figure 5-7 shows the small signal model of the CMUT and LNA. Figures 5-8 and
5-9 plot various circuit transfer functions to help analyze the optimization process for
the LNA. The closed-loop TIA gain is expressed as:
$$ Z_{CL} = R_f \cdot \left(\frac{1}{1 + s R_f C_f}\right) \cdot \frac{F \cdot A_{OL}}{1 + F \cdot A_{OL}}, \tag{5.2} $$
where F = Zi/(Zi + Zf) is the feedback factor, and AOL is the op-amp open-loop gain.
From (5.2), the LNA DC transimpedance gain is Rf and its bandwidth is determined
by the smaller of the following two poles:
$$ f_p = \frac{1}{2\pi R_f C_f}, \tag{5.3} $$
Figure 5-8: Transfer functions when the LNA optimality condition is reached.
Figure 5-9: Transfer function examples when the LNA optimality condition of fi ≈ fp
is not reached: (a) fi < fp , (b) fp < fi .
$$ f_i \approx \sqrt{f_c \cdot f_z} = \sqrt{f_c \cdot \frac{1}{2\pi (R_f \| R_i)(C_i + C_f)}}. \tag{5.4} $$
fp in (5.3) is due to the RC time constant of the second multiplying term in (5.2).
fi in (5.4) comes from the third multiplying term in (5.2), which reaches -3dB when
F · AOL = 1. Graphically, as can be seen in Figures 5-8 and 5-9, fi is the intersection
between the 1/F and AOL curves, which is approximately the geometric mean of the zero of 1/F
(fz) and the op-amp’s unity-gain frequency (fc), assuming a 20dB/dec slope in
both the 1/F and AOL curves.
When fi < fp, as shown in Figure 5-9(a), an increase in Rf always improves the
LNA’s gain-bandwidth product (GBP). This is because gain = Rf, while bandwidth
= fi, which is approximately proportional to 1/√Rf as indicated by (5.4). GBP
therefore improves roughly proportionally with √Rf. However, because fp is proportional to
1/Rf as indicated by (5.3), the increase in Rf leads to a faster decrease in fp
than in fi. When fi ≈ fp, the LNA achieves the maximum GBP available from the op-amp.
The phase margin is roughly 45°. Further increase in Rf no longer improves GBP,
because the bandwidth becomes limited by fp and is proportional to 1/Rf (Figure
5-9(b)), holding the GBP constant. But as Rf increases, the phase margin continues
to improve at the expense of a reduced bandwidth [88].
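The trend described above can be checked numerically from (5.3) and (5.4). The sketch below holds Cf fixed, sweeps Rf, and takes the bandwidth as the smaller of fp and fi. The component values (fc = 300MHz, Cf = 120fF, Ci = 2pF, Ri = 1MΩ) are illustrative assumptions chosen for the example, not the final design values, and the function names are hypothetical.

```python
import math

def f_p(rf, cf):
    """Feedback pole of (5.3)."""
    return 1.0 / (2 * math.pi * rf * cf)

def f_i(fc, rf, ri, ci, cf):
    """Intersection frequency of (5.4): geometric mean of fc and fz."""
    r_par = rf * ri / (rf + ri)
    fz = 1.0 / (2 * math.pi * r_par * (ci + cf))
    return math.sqrt(fc * fz)

def gbp(rf, fc=300e6, ri=1e6, ci=2e-12, cf=120e-15):
    """Gain-bandwidth product: gain = Rf, bandwidth = min(fp, fi)."""
    return rf * min(f_p(rf, cf), f_i(fc, rf, ri, ci, cf))

for rf in (10e3, 40e3, 200e3, 400e3):
    limited_by = "fi" if f_i(300e6, rf, 1e6, 2e-12, 120e-15) < f_p(rf, 120e-15) else "fp"
    print(f"Rf = {rf / 1e3:5.0f} kohm: GBP = {gbp(rf):.3e} ohm*Hz (limited by {limited_by})")
```

With these values the GBP roughly doubles when Rf is quadrupled in the fi-limited region and stops increasing once fp becomes the limiting pole, which is the behaviour that motivates the condition fi ≈ fp.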
The optimality condition, fi ≈ fp , also minimizes noise contribution from the
op-amp input-referred voltage noise. Figure 5-7 shows all noise sources in the circuit.
The noise figure is expressed as:
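$$ NF = 1 + \frac{R_i}{R_f} + \frac{\tilde{V}_{op}^2}{\tilde{I}_{in}^2 \cdot |Z_i \| Z_f|^2} + \frac{\tilde{I}_{op}^2}{\tilde{I}_{in}^2} + \frac{2 \cdot \tilde{V}_{op} \cdot \tilde{I}_{op}}{\tilde{I}_{in}^2 \cdot |Z_i \| Z_f|}. \tag{5.5} $$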
From (5.5), a large Rf is desired to reduce its thermal noise contribution. Moreover, the op-amp’s input-referred voltage noise (Ṽop) has a peaking effect due to the
impedance drop in |Zi| at higher frequencies. This can be seen mathematically from the
following noise gain expression (Gn), defined as the transfer function from LNA input
Figure 5-10: Transfer function examples: (a) fi < fp , (b) fi ≈ fp , (c) fi > fp .
(Ṽop) to the output (Ṽout):
$$ G_n = \frac{A_{OL}}{1 + F \cdot A_{OL}} = \frac{1}{F} \,\|\, A_{OL} \approx \min\left(\frac{1}{F}, |A_{OL}|\right). \tag{5.6} $$
The dashed red curves in Figure 5-8 and Figure 5-10 show the graphical interpretation of (5.6): Gn follows the lower of the 1/F and AOL curves, and has a considerable peaking effect within the LNA bandwidth. By comparing the optimal and
non-optimal conditions in Figure 5-10, one can see that the condition fi ≈ fp minimizes the noise peaking effect while exploiting the maximum possible GBP from the
op-amp design.
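The approximation in (5.6) can be checked with a simple complex-valued model. The sketch assumes a single-pole op-amp (DC gain 80dB, unity-gain frequency 300MHz) and the parallel RC models for Zi and Zf; all component values and function names are illustrative assumptions. It compares the exact |Gn| with min(|1/F|, |AOL|) well below and well above the crossover, where the two should agree closely.

```python
import math

def par(z1, z2):
    """Impedance of z1 in parallel with z2."""
    return z1 * z2 / (z1 + z2)

def noise_gain(f, a0=1e4, fc=300e6, rf=68e3, cf=120e-15, ri=1e6, ci=2e-12):
    """Return (exact |Gn|, min(|1/F|, |AOL|)) at frequency f in Hz."""
    s = 2j * math.pi * f
    a_ol = a0 / (1 + s * a0 / (2 * math.pi * fc))  # single-pole op-amp model
    z_i = par(ri, 1 / (s * ci))                    # source impedance
    z_f = par(rf, 1 / (s * cf))                    # feedback impedance
    fb = z_i / (z_i + z_f)                         # feedback factor F
    exact = abs(a_ol / (1 + fb * a_ol))
    approx = min(abs(1 / fb), abs(a_ol))
    return exact, approx

for f in (1e3, 1e5, 1e7, 1e9):
    exact, approx = noise_gain(f)
    print(f"f = {f:.0e} Hz: |Gn| = {exact:.3f}, min(1/F, |AOL|) = {approx:.3f}")
```

Near the intersection of the two curves the exact value departs from the minimum, and the size of that departure depends on the phase margin.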
5.3.2 LNA Transistor-Level Implementation
Following the guidelines discussed in Section 5.3.1, the LNA optimization starts with
a 10MHz bandwidth target and the optimal condition: fi ≈ fp ≈ BW . Rf is maximized while keeping the corresponding Cf , estimated from (5.3), larger than parasitic
capacitances to maintain control over circuit stability. The unity-gain frequency fc
is estimated from (5.4) to set the op-amp design target. Further design adjustments
keep phase margin above 60°.
Figure 5-11 shows the LNA schematic. The input stage devices (M1, M2) are
biased at the boundary of strong and weak inversion, as shown in Figure 5-12(a),
to achieve high transconductance per unit current and low noise while minimizing
Figure 5-11: The LNA schematic, implemented in the TIA topology. All transistors
are low voltage devices except the HV Rx Switch M10.
size and parasitic capacitance. The differential pair suppresses interference from the
power supplies, which is not possible with single-ended topologies [40, 45]. Circuit
simulation result in Figure 5-12(b) shows that the sizing of M1 and M2 is optimized
for the target CMUT parameter and that the noise figure is minimized to be below
10dB. The Miller compensation leg (M9, Cc) keeps the op-amp second pole well
beyond the closed-loop bandwidth for good phase margin. The source follower (M7,
M8) lowers the op-amp output impedance to enforce accurate feedback.
During high voltage transmissions, the high voltage Rx switch (M10) is opened
and the low voltage switches (s1-s6) are closed. The on-resistance of M10 directly
impacts LNA noise performance. Its size is chosen such that its noise contribution
is only a small portion of the input stage, and its parasitic drain and source capacitance do not degrade phase margin and bandwidth. Switches s1-s6 put the op-amp
into sleep mode when they are closed, during which only the reference current remains conducting for fast wake-up within 1µs. The sleep mode enables system-level
power saving opportunities. In addition, 4-step programmable transimpedance gain
Figure 5-12: Design optimization for input stage transistors: (a) transistors are sized
at the boundary of strong and weak inversion; (b) transistor width is optimized for
the lowest noise figure.
is implemented to provide system-level flexibility.
5.3.3 Rx Path Design for 2D Ultrasonic Transducer Arrays
For the Rx path in a 2D array, we want to achieve the same parallelism effect as in
the Tx path. Therefore, the LNA is modified such that when multiple Rx channels
are activated on the same column or row line, their analog outputs combine for an
increased SNR, where signals are averaged and noise is reduced. In this way, CMUT
elements are effectively parallelized to receive acoustic echoes to satisfy system-level
requirements. One example of its use is already presented in Section 4.4, where the
active Rx elements in the annular ring aperture are in parallel and the analog signals
are added along the columns.
To illustrate the principle of analog signal combining, Figure 5-13 shows the process of combining two Rx channel outputs. In Figure 5-13(a), the input current signals
(is1, is2) and the input-referred current noise (in1, in2) from two CMUT elements are
amplified by the two TIAs. The outputs of the LNAs (implemented with TIA) are
modelled as Thevenin equivalent circuits. Both the current signal and the current noise are amplified by the transimpedance gain Z into voltage sources, in series
with an output resistance Ro. The output configuration is then converted to Norton
equivalent circuits as shown in Figure 5-13(b), to indicate that the combination is done
in the current domain³. The current gain from the input to the output is expressed
as K = Z/Ro. Assuming the two channel parameters are perfectly matched and
ignoring the line parasitics for now, the combined LNA circuits are equivalent to the
circuit shown in Figure 5-13(c). The two output resistors are in parallel to form an
output resistance of Ro/2. The two current signals add up directly as in Equation
(5.7), while the two noise sources add up in power as in Equation (5.8), since they are
uncorrelated noise sources.
$$ i_{s,output} = K \cdot (i_{s1} + i_{s2}). \tag{5.7} $$
$$ i_{n,output} = K \cdot \sqrt{i_{n1}^2 + i_{n2}^2}. \tag{5.8} $$
Because the CMUT element size is the same and the LNAs are designed to be
matched, the input-referred noise power should be roughly equal (in1² = in2²). Moreover, if the two receiving CMUT elements are close to each other in space, the two
CMUTs would see ultrasound echoes similar in amplitude and phase, leading to similar input signals (is1 ≈ is2). The above assumptions lead to the output signal and
noise expressions in Equations (5.9) and (5.10). This translates to a 3dB improvement
in the output SNR when two Rx channels are in parallel compared to a single channel
output, as indicated in Equation (5.11). Naturally, more parallelism would lead to
further SNR improvement, and the SNR improvement follows the trend of 10 log(N) dB,
in which N is the number of channels in parallel. It is also summarized in the “Theory”
row in Table 5.1.
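$$ i_{s,output} = 2K \cdot i_{s1}. \tag{5.9} $$
$$ i_{n,output} = \sqrt{2} K \cdot i_{n1}. \tag{5.10} $$
$$ SNR_{2x} = 20 \log\left(\frac{i_{s,output}}{i_{n,output}}\right) = 20 \log\left(\frac{2K \cdot i_{s1}}{\sqrt{2} K \cdot i_{n1}}\right) = 3 + 20 \log\left(\frac{i_{s1}}{i_{n1}}\right) = 3\,\mathrm{dB} + SNR_{1x}. \tag{5.11} $$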
³ Thevenin’s equivalent circuit in the voltage domain will yield the same conclusion, but Norton’s
equivalent is easier for explanation.
Figure 5-13: The signal and noise combining with two Rx channels in parallel: (a) two
channels on the same line, shown in Thevenin’s equivalent circuit at LNA outputs; (b)
two channels on the same line, shown in Norton’s equivalent circuit at LNA outputs;
(c) two channels combined, showing the resultant signal and noise amplitudes.
In the implementation, line parasitics and component mismatches need to be taken
into consideration. The RC model of the line parasitics is shown in Figure 5-13(b),
and the LNA output stage must be specially designed to achieve the proper analog
signal combination. First of all, current mode combining should be used because it is
intrinsically robust against the parasitics and mismatch. The LNA output impedance
(Ro) must maintain a relatively large value compared to the line parasitic resistance (Rp),
i.e., Rp << Ro, so that the circuit DC condition is less susceptible to mismatch and
parasitics, and the signal combining has less distortion. On the other hand, Ro must
not be too high either, because the line capacitance would limit bandwidth, due to
the time constant formed by Ro and the line capacitor (Cp).
As a result, the output resistance and the line parasitics need to be co-designed
to work together optimally. The line parasitics can be adjusted during design by
changing the metal layout wire width, and a source follower stage (M11-M12 in Figure
5-14) is proposed to provide a constant output impedance. First, because the source
follower stage is the last stage of the LNA, the linearity requirement determines its
biasing current. An estimated biasing current of 34µA is calculated based on the
needed worst case slew rate for a full-swing 10MHz output signal as in (5.12).
$$ I_D \approx I_{slew} = C_{load} \cdot V_{linear} \cdot (2\pi f) = 0.9\,\mathrm{pF} \times 0.6\,\mathrm{V} \times (2\pi \times 10\,\mathrm{MHz}) = 34\,\mu\mathrm{A}. \tag{5.12} $$
The loading capacitance is estimated from the input stage capacitance of the succeeding row / column BUF amplifier (0.5pF) plus the 4mm line capacitance assuming
minimum layout width (0.4pF); the linear range of the output signal swing amplitude is estimated based on the maximum possible voltage headroom; and the signal
frequency is the maximum 10MHz supported in the ASIC design. The initial biasing current leads to roughly an output resistance of 2.2kΩ as in (5.13), assuming an
estimated 0.15V transistor over-drive voltage.
$$ R_o \approx \frac{1}{g_m} = \frac{V_{GS} - V_{TH}}{2 I_D} = \frac{0.15\,\mathrm{V}}{2 \times 34\,\mu\mathrm{A}} = 2.2\,\mathrm{k\Omega}. \tag{5.13} $$
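The two estimates in (5.12) and (5.13) are easy to reproduce. The sketch below uses the square-law relation gm = 2·ID/(VGS − VTH) that (5.13) relies on. The last line evaluates the same expression at a 45µA bias while keeping the 0.15V over-drive voltage unchanged; holding the over-drive constant is an assumption made here for illustration, and the function names are hypothetical.

```python
import math

def slew_current(c_load, v_linear, f_max):
    """Bias current needed to slew a sinusoid of amplitude v_linear at f_max, per (5.12)."""
    return c_load * v_linear * 2 * math.pi * f_max

def source_follower_ro(v_ov, i_d):
    """Ro ~ 1/gm using the square-law estimate gm = 2*ID/(VGS - VTH), per (5.13)."""
    return v_ov / (2 * i_d)

i_init = slew_current(0.9e-12, 0.6, 10e6)
ro_init = source_follower_ro(0.15, i_init)
ro_45ua = source_follower_ro(0.15, 45e-6)  # assumes the over-drive stays at 0.15 V

print(f"initial bias current: {i_init * 1e6:.1f} uA")
print(f"initial Ro: {ro_init / 1e3:.2f} kohm")
print(f"Ro at 45 uA: {ro_45ua / 1e3:.2f} kohm")
```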
Starting with this initial design, the row / column line width is swept to find a
solution that not only maintains the SNR improvement with Rx channel parallelism,
but also preserves a 10MHz bandwidth. At the same time, the output stage linearity
performance numbers, such as HD2, IMD3, and Po1dB, are re-examined as the parasitic loading from the line changes. If the linearity specs are not met, the transistor
sizing or the biasing current of the source follower stage is tweaked to satisfy the
design target. After several iterations of changes in the line width and output stage
design, the final optimal design has a 45µA biasing current and a 1.7kΩ LNA Ro. The
corresponding line width is chosen to be 10x minimum metal wire width, as shown
in Figure 5-14. The estimated per-element (250µm length) line parasitic model is:
Rp = 6.2Ω, Cp = 250fF. According to circuit simulation, the circuit maintains a
worst case 9.2MHz bandwidth when only the channel at the end of the line is activated, driving the whole 4mm line to reach the BUF amplifier. Except for the worst
case, most other configurations⁴ provide a bandwidth over 10MHz. Meanwhile, the
SNR improvement with 16x channel parallelism is 11.97dB, which is very close to the
ideal target of 12dB when there are no parasitics.
As a sanity check, the row / column line width is modified to see its effect on circuit
performance. When the line width is decreased by 5x (Rp = 31Ω and Cp = 50fF
per element), the larger line resistance reduces noise averaging of 16x parallelism to
11dB. When the line width is increased by 3x (Rp = 2.07Ω and Cp = 750fF per
element), the worst case channel bandwidth drops to 6.2MHz due to the increased
line capacitance.
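The per-element parasitic values quoted for the different line widths follow from a simple scaling of the minimum-width model (Rp = 62Ω, Cp = 25fF). The sketch below assumes resistance scales inversely with width and capacitance scales linearly with width, and adds two crude first-order indicators of the trade-off: the total line resistance relative to Ro, and the time constant of Ro with the total line plus buffer capacitance. These indicators are illustrative only and are not the simulated SNR or bandwidth figures; the function names are hypothetical.

```python
def line_segment(width_x, r_min=62.0, c_min=25e-15):
    """Per-element (250 um) parasitics for a wire width_x times the minimum width.
    Assumes R scales as 1/width and C scales with width."""
    return r_min / width_x, c_min * width_x

def line_tradeoff(width_x, ro=1.7e3, n=16, c_buf=0.5e-12):
    """Return (total line R / Ro, Ro * total C) for an n-element line."""
    rp, cp = line_segment(width_x)
    return n * rp / ro, ro * (n * cp + c_buf)

for width_x in (2, 10, 30):
    rp, cp = line_segment(width_x)
    ratio, tau = line_tradeoff(width_x)
    print(f"{width_x:>2}x: Rp = {rp:.2f} ohm, Cp = {cp * 1e15:.0f} fF, "
          f"R_line/Ro = {ratio:.3f}, Ro*C = {tau * 1e9:.1f} ns")
```

Narrower lines push the resistance ratio up, which works against the Rp << Ro condition, while wider lines increase the capacitive time constant, in line with the two sanity-check cases above.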
The last step in design is to carry out the Monte Carlo simulation for verification
in the presence of device mismatches. Less than 2% DC disturbance is observed in
Monte Carlo simulations that include line parasitics, global process variations, and
local transistor mismatches. Finally, the ASIC measurement (more details are in
Chapter 6) verified the design functionality. The measured SNR improvement with
parallel channels is close to theory, as listed in Table 5.1. The discrepancy between
the measurement and the ideal case most likely comes from the fact that there exist
⁴ Other configurations include: a single channel closer to the BUF, driving a shorter line with less
parasitics; several channels parallelized; etc.
Figure 5-14: The LNA schematic, implemented in the TIA topology. All transistors
are low voltage devices except the HV Rx Switch M10. “vip” node is also buffered
with a source follower to output (not shown).
SNR improvement with parallelism   2x     4x     8x     16x
Theory (dB)                        3      6      9      12
Measured (dB)                      2.41   5.41   8.20   10.86
Table 5.1: SNR improvement from Rx channel parallelism, theory prediction and
measurement.
correlated noise sources, preventing the noise power from being averaged out.
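The effect of correlated noise on the achievable improvement can be illustrated with a standard result for N equal noise sources with a common pairwise correlation coefficient ρ: the summed noise power is N + N(N − 1)ρ times the single-channel power, while the signal power grows as N². The sketch below evaluates the resulting SNR gain next to the measured values. The value ρ = 0.02 is a hypothetical number chosen only to show the trend; it is not extracted from the measurements, and the function name is illustrative.

```python
import math

def snr_gain_db(n, rho=0.0):
    """SNR gain of n summed channels with equal signals and equal noise power.
    rho is the pairwise noise correlation coefficient (0 = uncorrelated)."""
    signal_power = n ** 2
    noise_power = n + n * (n - 1) * rho
    return 10 * math.log10(signal_power / noise_power)

measured = {2: 2.41, 4: 5.41, 8: 8.20, 16: 10.86}  # Table 5.1
for n, meas in measured.items():
    print(f"{n:>2}x: ideal {snr_gain_db(n):.2f} dB, "
          f"rho=0.02 {snr_gain_db(n, rho=0.02):.2f} dB, measured {meas:.2f} dB")
```

With ρ = 0 the function reproduces the 10 log(N) trend of the "Theory" row, and any positive correlation makes the shortfall grow with N, which is the qualitative pattern seen in the measured row.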
Discussion on Scaling
As can be seen from Table 5.1, the measured channel SNR improvement deviates
from the theoretical expectation more as the channel parallelism increases. The
performance degradation is the result of the line parasitics and indicates that the
parallelism cannot be scaled up to an arbitrarily large number of channels. In particular, it is
impossible to maintain a satisfactory bandwidth performance for the channel located
at the farthest end of the line when the line is excessively long.
However, several techniques can be proposed to mitigate the negative effect from
the line parasitics and improve the scaling to an even larger array, as described below.
• Increasing the source follower stage bias current and transistor sizing further
could lead to more than 16x parallel channels with the same performance. The
corresponding line width needs to be increased approximately proportionally
to keep Rp << Ro for current summing. Channel count increase in this way
will stop when self-loading condition for circuit bandwidth is reached. At that
point, Cp becomes the dominant load at the output, and the increase of Cp
completely offsets the reduction of Ro. Circuit simulation shows that at around
64x parallelism with a 40x minimum metal line width, self-loading is reached;
increasing output stage sizing and power consumption does not extend parallel
channels any more.
• The metal wire layout in the current design uses only one layer of metal. Several metal layers can be connected in parallel to yield a better line parasitics
model. For example, by using two metal layers in parallel to implement the
interconnecting column and row lines, Rp is reduced by 2x while Cp is increased
by a factor that is much less than 2x, because there is no coupling capacitance
between the two metal layers at the same potential. As a result, the channel
parallelism can be extended further by close to 2x.
• The column or row lines can be interconnected from both ends to the column
or row buffers, effectively reducing the line parasitics. The worst case channel
in this scenario becomes the one at the center of a line, rather than the ones
at the two ends. Therefore, approximately another 2x more channels can be
placed in parallel with the same performance.
• Lastly, inserting intermediate buffering stages in the middle of interconnection
lines could extend the number of parallel channels even further, as shown in
Figure 5-15. Within each intermediate block, 16-64x channel outputs can be
combined in parallel by each channel’s source follower stage. The additional
line buffers inserted could attain parallelism with even more channels without
excessive bandwidth / linearity performance degradation.
Figure 5-15: Parallelism with even more Rx channels by utilizing intermediate line
buffers to preserve the circuit performance.
5.4 Biasing
The current biasing for the 2D ASIC is carefully designed to provide good matching
for channels across the array. Figure 5-16 shows the biasing scheme. An 8-bit DAC
is used to generate a gate voltage, which is applied to a tunable PMOS transistor.
The PMOS Md0 is implemented by binary-weighted PMOS transistors in parallel
to provide 8-bit tunable widths. The 8-bit DAC produces a nominal seed current of
25µA and the 8-bit tunable PMOS width provides 0.2µA steps over the adjustable
range of 0 to 50µA. The seed current generated by Md0 is fed into a current mirror
with 16 branches implemented by NMOS transistors Md1 and Mn0 to Mn15. These
16 branches provide seed currents for the 16 rows in the 2D array.
Transistors Mn0 to Mn15 are physically placed next to each other in the layout for good matching.
Each of the 16 row currents is then routed to its corresponding row,
where it goes through another set of current mirrors with 16 branches. For example,
the row current generated by Mn0 is mirrored by PMOS transistors Mp0 and M0 to M15.
Figure 5-16: The biasing circuit for the 2D array. (Figure labels: channels 0 to 255; MOS bypass capacitors Cd1 and Cp0 to Cp15.)
Similarly, for matching purposes, transistors M0 to M15 are placed next to each other,
before their generated biasing currents are routed into the corresponding circuit channels.
The current into each channel is nominally 25µA.
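The bias-current arithmetic above can be checked with a short Python sketch. It assumes an ideal, linear mapping from the 8-bit width code to the seed current, which the text does not state explicitly; the function name seed_current_ua is hypothetical and used only for illustration.

```python
# Bias-current arithmetic for the scheme in Figure 5-16 (illustrative sketch).
BITS = 8                 # 8-bit tunable PMOS width (from the text)
FULL_SCALE_UA = 50.0     # adjustable range of 0 to 50 uA (from the text)
ROWS, COLS = 16, 16      # 16 row branches, 16 channel branches per row


def seed_current_ua(code: int) -> float:
    """Seed current for a width code, ASSUMING an ideal linear mapping."""
    if not 0 <= code < 2 ** BITS:
        raise ValueError("code must fit in 8 bits")
    return FULL_SCALE_UA * code / (2 ** BITS - 1)


step_ua = seed_current_ua(1)        # one LSB, roughly 0.2 uA
nominal_ua = seed_current_ua(128)   # mid-scale code, roughly 25 uA

# Current drawn by the 256 channel branches alone, using the nominal
# 25 uA per channel quoted in the text (mirror overhead is not included).
channel_total_ma = ROWS * COLS * 25.0 / 1000.0

print(f"LSB step      : {step_ua:.3f} uA")
print(f"mid-scale code: {nominal_ua:.2f} uA")
print(f"256 channels  : {channel_total_ma:.1f} mA")
```

Under this assumption, one LSB of the width code corresponds to 50µA/255, which is consistent with the 0.2µA step quoted above.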
It is important to design the current mirror to be robust against mismatches across
the array. The transistor mismatch model in strong inversion is expressed in (5.14).
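\[
\frac{\Delta I}{I}
= \sqrt{\left(\frac{\Delta (W/L)}{W/L}\right)^{2} + \left(\frac{2\,\Delta V_{TH}}{V_{GS}-V_{TH}}\right)^{2}}
= \sqrt{\left(\frac{\Delta W}{W}\right)^{2} + \left(\frac{\Delta L}{L}\right)^{2} + \left(\frac{2\,\Delta V_{TH}}{V_{GS}-V_{TH}}\right)^{2}}.
\qquad (5.14)
\]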
The transistor L is chosen to be long to provide both large output impedance and small
sensitivity to channel length mismatches. At the same time, the transistor W is chosen
to keep the transistor well in the saturation region, with Vdsat = |VGS − VTH| ≈ 0.3V.
The large over-drive voltage helps keep the contribution of VTH mismatch relatively small.
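To make the effect of the over-drive voltage concrete, the following Python sketch evaluates (5.14) for two over-drive voltages. The mismatch values (0.5% in W and L, 5mV in VTH) are hypothetical numbers chosen only to show the trend; they are not measured data from this work.

```python
import math


def current_mismatch(dw_over_w: float, dl_over_l: float,
                     dvth_v: float, vov_v: float) -> float:
    """Relative current mismatch dI/I from (5.14), strong inversion."""
    vth_term = 2.0 * dvth_v / vov_v   # threshold term, scaled by 1/Vov
    return math.sqrt(dw_over_w ** 2 + dl_over_l ** 2 + vth_term ** 2)


# Hypothetical mismatch values, chosen only to show the trend.
DW, DL, DVTH = 0.005, 0.005, 0.005   # 0.5% in W, 0.5% in L, 5 mV in VTH

for vov in (0.15, 0.30):
    mismatch = current_mismatch(DW, DL, DVTH, vov)
    print(f"Vov = {vov:.2f} V -> dI/I = {100 * mismatch:.2f} %")
```

Doubling the over-drive voltage halves the threshold term, which is why a Vdsat of about 0.3V is preferred over a smaller value in this sketch.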
To reduce the noise contribution from current mirror transistors to LNA circuits,
MOS capacitors Cd1 and Cp0 to Cp15 are instantiated as bypass capacitors. They take
up as much free layout area as possible, such that the noise generated by the current
mirrors is negligible according to circuit simulation.
5.5 The Fault-Tolerant ASIC Design for Faulty MEMS Devices
This section discusses the practical issues in the CMUT-ASIC assembly process. The
fault-tolerant transceiver front-end design, in conjunction with per-element
enable bits, provides an elegant way to cope with defective transducer elements.
The method increases assembly yield and allows successful system demonstration.
A 2D CMUT array contains a large number of elements, so some defective elements are
almost inevitable. Currently, we obtain 16x16 2D CMUT transducer samples externally to work with our 2D ASICs for experiments. Some of
these MEMS research prototypes suffer from failure mechanisms including individual
shorted elements and individual open elements. The problematic elements are randomly distributed in the array, and their positions vary from device to device. For
short elements, the short behavior is also observed to be related to the bias voltage. A
higher VBIAS tends to create more short elements; when VBIAS is reduced, some
elements that were shorted might return to normal.
For the non-functional elements in the array, the open elements do not require
special treatment. The transceiver channel with an open element is not useful, since
no ultrasonic signal can be emitted or received. But that element does not affect the
transceiver circuit, nor prevent other elements from working properly. On the other
hand, the short elements cause more problems. Because the whole 2D array is biased
with a shared high voltage supply VBIAS, a short element could propagate the high
voltage to the side that is connected to the circuit, exposing the transceiver circuitry
to VBIAS and potentially damaging the circuit. Furthermore, if the transceiver
circuit provides a relatively low impedance path to ground, VBIAS could be pulled
down to close to 0V, sinking current through the low-impedance path from VBIAS
to ground. Since VBIAS is shared across the array, the whole array would barely be
biased in this situation and would become useless.
While extensive research is ongoing to make the device more reliable with a lower
defective element percentage, it is worthwhile to investigate methods to cope with
the existing defects. In particular, given the fact that even one short element could
render the whole array useless, and that achieving 100% functional element percentage
is difficult for 2D arrays with ever-growing sizes, fault-tolerance is indispensable to
work with 2D CMUT arrays in the future.
Previously, a largely manual process has been used to overcome the problem caused
by the short elements [40, 43]. The elements in a 2D CMUT array are first tested
with a probe station to identify all the short elements under a certain VBIAS. The
solder bumps at the positions corresponding to the short elements are then manually
removed, to prevent the electrical contact between the short CMUT element and the
interposer PCB. In this way, the short CMUT elements are physically isolated from
the transceiver circuitry and the rest of the array can operate normally.
There are several drawbacks with this “selective bumping” approach. First, using
a probe station to sweep through all 256 elements to find shorts is a very slow and
manual process which is prone to errors. Second, because each CMUT device has a
unique pattern of short elements, it is not an easily automated process to remove the
detected shorts. Lastly, this manual approach might not solve the problem completely.
It has been observed that new CMUT short elements might emerge when a different
VBIAS voltage is applied. Therefore, a fixed solder ball removal pattern might work
at the beginning, but as soon as a single new short element emerges, the
assembly becomes unusable.
In contrast, our 2D ASIC takes advantage of circuit techniques to implement
fault-tolerant transceivers, in order to eliminate the need for “PCB selective bumping”. The ASIC and CMUT are flip-chip bonded together in the usual way without
selective solder ball removal, as already described in Section 4.1. Afterwards,
the ASIC performs a programmable “channel removal” process electrically, acting both
as a scanner to detect short elements and as a selector to isolate the detected shorts.
Our solution does not require additional circuitry, but only small changes in controlling the existing front-end HV transistors in the Tx pulser and the RxSw, as shown
in Figure 5-17. In each channel, a total of five front-end HV transistors are directly
connected to the CMUT element as shown in Figure 5-17(a). M1-M4 are pulser
Figure 5-17: The technique used for detecting and isolating the short CMUT elements:
(a) front-end transistors in each channel and their control voltages; (b) the effective
circuit connection of all 256 channels with CMUT elements.
(Figure labels: transistors M1-M4 and M10; M1 gate sequence 0->30->0V; VBIAS with current monitor; 10kΩ, 1MΩ and 0.1uF components; channels [0] to [255].)
transistors and M10 is the Rx protection switch (RxSw). Their gate voltages can be
controlled independently. When all transistors are switched off, the CMUT element
is effectively disconnected and “selectively removed” from the array. To detect short
elements, M1 is used to provide a ground path to the CMUT, while the other four transistors
are kept off. Focusing on M1, the 256-channel electrical connections between ASIC
and CMUT are reduced to Figure 5-17(b). M1 from each channel is sequentially
turned on and off by applying a voltage sequence of 0→30→0V to M1’s gate. For example, when M1 from channel [0] is on with its gate voltage G1.[0] at 30V, the CMUT
in channel [0] is connected between ground and VBIAS. Normally, the CMUT is
a capacitor at DC and the current monitored by the voltage meter is zero. But if the
CMUT is shorted, the 10kΩ probing resistor exposes a leakage current through
the abnormal CMUT, indicating a short element.
The per-element enable bits in the Column-Row-Parallel architecture are the key
factor that ensures transceiver channels are selectively enabled to make electrical
connections only to normal elements. It is the independent control over each channel that
Figure 5-18: Two successful 16x16 CMUT-ASIC assemblies (panels: board7-B-SOICMUT and board8-B-SOICMUT, both at VBIAS=30V; elements are numbered 0 to 255 row by row, with elements 0 to 15 in the bottom row) with short CMUT elements (marked in red) isolated by the ASIC. The rest of the elements are functional
and their sensitivity performance is expressed by the brightness of the elements, which
will be described in detail in Section 6.4.
allows us to identify individual short elements. By iterating through each of the
256 channels, all short CMUT elements are identified. The ASIC is then programmed
so that only the channels with normal CMUTs are enabled and contribute to the
imaging operations. All transceiver channels facing shorted elements keep their
front-end HV transistors cut off during all operations. If new short elements emerge
in the future, the ASIC can easily be programmed again to account for the changes.
Figure 5-18(a) and (b) show two example assemblies with the short elements marked
in red.
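The scan-and-isolate procedure can be sketched in Python as follows. This is a minimal illustration and not the control software used in this work: the measurement callback, the detection threshold and the simulated array are all hypothetical, and the simulated short is modeled simply as VBIAS across the 10kΩ probing resistor.

```python
from typing import Callable, List, Set

NUM_CHANNELS = 256  # 16x16 array, one transceiver channel per element


def scan_for_shorts(measure_leakage_a: Callable[[int], float],
                    threshold_a: float = 1e-6) -> Set[int]:
    """Turn on M1 of one channel at a time and flag channels drawing DC current.

    measure_leakage_a(ch) is a hypothetical callback that pulses the M1 gate
    of channel ch (0 -> 30 -> 0 V) and returns the monitored VBIAS current.
    """
    shorted = set()
    for ch in range(NUM_CHANNELS):
        if measure_leakage_a(ch) > threshold_a:   # a normal CMUT is open at DC
            shorted.add(ch)
    return shorted


def enable_bits(shorted: Set[int]) -> List[int]:
    """Per-element enable bits: 0 keeps all HV transistors of a channel off."""
    return [0 if ch in shorted else 1 for ch in range(NUM_CHANNELS)]


def make_simulated_array(shorted_elements: Set[int], vbias_v: float = 30.0,
                         r_probe_ohm: float = 10e3) -> Callable[[int], float]:
    """Stand-in for the hardware: a hard short draws VBIAS / R_probe (assumed)."""
    def measure(ch: int) -> float:
        return vbias_v / r_probe_ohm if ch in shorted_elements else 0.0
    return measure


found = scan_for_shorts(make_simulated_array({3, 200}))
bits = enable_bits(found)
print(sorted(found), sum(bits))
```

Because the scan is purely electrical, it can be repeated whenever VBIAS changes, and the enable bits can be regenerated from the new result.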
However, the electrical isolation does have one limitation. The maximum acceptable VBIAS is limited to the maximum rated voltage that the HV transistors can
withstand. This is because the short elements conduct VBIAS to the circuit
side, so the HV transistors are stressed by the voltage difference between
the drain and source. A VBIAS as high as 40V has been applied without breaking the
ASIC in our experiments. Normally a VBIAS of 30V is used since it already offers
enough acoustic pressure and sensitivity to perform the imaging experiments.
Overall, our approach has been successful. It leverages the powerful functionality
provided by the electronics implementing the Column-Row-Parallel architecture, and
makes the short detection problem a fast and automated electrical process. It does
not require repetitive manual device characterization, and it could easily adapt to element property changes over time. Imaging experiments in Chapter 4 are all carried
out with the short elements disabled in this way. As also discussed in Sections 4.1
and 7.2, with several non-functional elements inside the 16x16 2D aperture, the imaging
experiment results in Chapter 4 are not severely affected. Interpolation is used to
make up for the missing elements’ signals in digital post-processing in receive. Transmit interpolation is also possible with pulsers that provide programmable amplitude
and phase generation (the 3-level pulser design in this work has a fixed pulse amplitude generation), such that the neighboring channels of a missing channel can adjust
their pulse shapes to compensate for the missing signal in transmit. Furthermore, as
can be seen from Figure 5-18, the responses of different channels across the
array are mismatched because of device and circuit component mismatches and
variations in the flip-chip bonding assembly process. This response difference in the receive path
is already corrected by digital post-processing, where a weight is applied onto each
channel’s waveform to account for the response amplitude difference. In the transmit
path, similar correction can be applied with pulse shape pre-distortion. Because the
assembly property does not change much, this correction / trimming process is static
and only needs to be performed infrequently.
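As an illustration of the receive-side post-processing described above, the following Python sketch applies a per-channel weight and then fills in disabled elements from their neighbors. The text does not specify the interpolation method, so the nearest-neighbor averaging used here is an assumption, as are the function name and the row-major element indexing (index = row x 16 + column).

```python
from typing import List

N = 16  # 16x16 array; element index = row * N + col (assumed row-major)


def equalize_and_fill(samples: List[float], gains: List[float],
                      enabled: List[bool]) -> List[float]:
    """Weight each channel by 1/gain, then interpolate disabled elements.

    samples: one value per channel at a given time instant
    gains:   relative response amplitude of each channel (calibration data)
    enabled: False for channels isolated because of a short element
    """
    # Step 1: static gain correction on the enabled channels.
    corrected = [s / g if en else 0.0
                 for s, g, en in zip(samples, gains, enabled)]

    # Step 2: replace each disabled element by the mean of its enabled
    # 4-neighbours (ASSUMED scheme; the source does not specify one).
    filled = list(corrected)
    for idx, en in enumerate(enabled):
        if en:
            continue
        row, col = divmod(idx, N)
        neighbours = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
        values = [corrected[r * N + c] for r, c in neighbours
                  if 0 <= r < N and 0 <= c < N and enabled[r * N + c]]
        if values:
            filled[idx] = sum(values) / len(values)
    return filled


# Usage with synthetic data: uniform signal, uniform gain, one disabled element.
samples = [2.0] * (N * N)
gains = [2.0] * (N * N)
enabled = [True] * (N * N)
enabled[17] = False
result = equalize_and_fill(samples, gains, enabled)
print(result[16], result[17], result[18])
```

Since the weights and the enable pattern are static, they only need to be recomputed when the assembly is re-characterized.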
Lastly, the per-element addressing capability combined with the highly flexible
front-end circuit design has its application beyond implementing fault-tolerance. It
is a general testing and calibration infrastructure that enables programmable access
to individual ultrasonic channels. In fact, the block-level circuit performance characterization is carried out by only enabling a single channel for measurement; the
channel performance mismatches are evaluated by turning on different channels; and
the LNA parallelism measurement is obtained by activating a different number of Rx
channels for each parallelism configuration. For an even larger 2D transducer array in
the future, random access to channels in the array is critical for device
characterization, performance evaluation, and calibration.
Chapter 6
ASIC Characterization
The ASIC circuit block characterization is presented in this chapter. We have taped
out two ultrasonic transceiver ASIC test chips. Before the 2D 16x16 ASIC was made,
a 1D 4-channel ASIC was first fabricated and tested [5, 6]. It is designed for
a 1D CMUT array with a pitch of 300µm and an element height of 3mm [41].
The 1D chip allowed us to become familiar with the CMOS high voltage process and the
CMUT device properties. The 2D chip re-uses circuit blocks from the 1D chip, with
innovation at the architecture level. While the 2D chip testing focuses heavily on
system-level demonstrations, which have been covered in Chapter 4, the 1D chip testing
focuses on acoustic experiments and device characterization. In
this chapter, both chips’ test results are presented, with different emphases.
6.1 Tx Ultrasonic Power and Efficiency Measurement
The most important performance specification for the ultrasonic transmitter is the
power efficiency. The Tx efficiency is defined as the ratio between the transmitter’s
acoustic output power and the total consumed electrical power. To obtain Tx efficiency, the total pulsing power can be measured electrically. However, the ultrasonic
power transmitted into the medium requires acoustic measurements.
This section shows the way of characterizing the transmitter performance with a
combination of acoustic and electrical measurements. The 1D chip is used to show the
characterization process, but both chips’ results are listed at the end of the section.
6.1.1 Measuring Acoustic Output Power
From (6.1), acoustic power is the product of the acoustic intensity (I) at the transducer
surface and the transducer surface area (A). I is calculated from the RMS fundamental frequency component of the acoustic pressure at the transducer surface (prms)
and the acoustic impedance of the medium (Zm).
\[
P_{acoustic} = I \cdot A = \frac{p_{rms}^{2}}{Z_m} \cdot A \qquad (6.1)
\]
In practice, the acoustic pressure at the transducer surface cannot be directly
measured. Instead, it can be reliably back-calculated from a pressure measurement
at another location. According to [28–30], when the transducer aperture is close to a
square or a circle, the pressure magnitude profile along the axial direction reaches its
maximum at the boundary between the near and far field. The maximum magnitude
is roughly twice the pressure magnitude at the transducer surface.
For back-calculation, an acoustic pressure measurement system¹ is established in
the lab, as shown in Figure 6-1. The 1D CMUT array is submerged in vegetable oil at the
bottom of the oil tank. The test chip circuitry is connected to the CMUT from under the
oil tank. A hydrophone (ONDA HNC0400) is mounted on the 3D translation stage
to probe the acoustic pressure magnitude generated by the CMUT in the oil medium.
Figure 6-2 shows the detailed configuration to measure the acoustic output power.
The four-channel pulser circuitry is parallelized and connected to eight CMUT elements in parallel, in order to form an aperture of 2.4mm x 3mm (roughly a square).
The solid curve in Figure 6-3 is the corresponding acoustic simulation of the pressure
field using the Field II software, verifying that surface pressure (z=0mm) is about
¹ This measurement setup is for 1D chip testing, which is similar to the 2D chip test setup
described in Chapter 4.
Figure 6-1: The photo of the lab setup for measuring the acoustic output power and
the Tx efficiency.
Figure 6-2: Acoustic output power and Tx efficiency measurement setup.
Figure 6-3: Normalized RMS pressure along the transducer axial axis, measurement
vs. simulation. The measurement deviates from the simulation in the near field
because the hydrophone tip is too close to the transducer surface, distorting the
pressure field.
half maximum pressure (z=5.9mm). Furthermore, the hydrophone is used to probe
the acoustic pressure magnitude along the axial direction (z-axis). The measured
result, shown as dots in Figure 6-3, agrees well with both theory and simulation.
In near field, the measured data do not exhibit amplitude fluctuations as predicted
by simulation. This is likely caused by the hydrophone tip distorting the pressure
field as it approaches the transducer. However, it does not affect the accuracy of the
maximum pressure measurement and the surface pressure back-calculation.
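The back-calculation can be summarized in a short Python sketch that combines (6.1) with the factor-of-two relation between the maximum axial pressure and the surface pressure. The hydrophone reading and the oil density used below are hypothetical placeholders (the acoustic impedance of the oil is not given in this section), so the printed power is illustrative only.

```python
def acoustic_power_w(p_max_rms_pa: float, z_medium_rayl: float,
                     area_m2: float) -> float:
    """Acoustic output power from the RMS pressure at the near/far-field boundary.

    The surface pressure is taken as half of the maximum axial pressure,
    then (6.1) is applied: P = (p_rms^2 / Z_m) * A.
    """
    p_surface_rms = p_max_rms_pa / 2.0
    intensity_w_m2 = p_surface_rms ** 2 / z_medium_rayl
    return intensity_w_m2 * area_m2


AREA_M2 = 2.4e-3 * 3.0e-3        # 2.4 mm x 3 mm aperture (from the text)
Z_OIL_RAYL = 920.0 * 1460.0      # ASSUMED density 920 kg/m^3 times 1460 m/s
P_MAX_RMS_PA = 20e3              # hypothetical hydrophone reading at z = 5.9 mm

power_w = acoustic_power_w(P_MAX_RMS_PA, Z_OIL_RAYL, AREA_M2)
print(f"Acoustic output power: {power_w * 1e3:.3f} mW")
```

Because the power scales with the square of the pressure, an error in the hydrophone reading is doubled in relative terms in the resulting power estimate.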
6.1.2 Measuring Tx Efficiency
Fixing the hydrophone at the near and far field boundary (5.9mm away), acoustic
output power is obtained with the aforementioned method. Tx efficiency is thus
acquired after dividing the acoustic output power by the total power consumption.
Different pulse shapes are generated to evaluate the efficiency improvement. The
pulse shape is defined by the ∆/T ratio as shown in Figure 6-4(a), where ∆ is the
step duration of the middle voltage level and T is the pulse period. When ∆/T = 0,
2-level pulses are generated. As ∆/T increases, the pulses become 3-level, reducing
the dynamic power from CV²f to CV²f/2 and increasing the efficiency. But as
∆/T increases further, the acoustic power starts to decrease because less energy is
contained within the pulse shape. Since the dynamic power stays at CV²f/2,
efficiency decreases. Therefore, there is an optimal pulse shape that maximizes the Tx
efficiency. For example, Figure 6-4(b) is a time-domain waveform of optimal 3-level
pulses at 3.3MHz.
Figure 6-5 shows the measurement results. As an example, Table 6.1 compares
the optimal 3-level pulser against the 2-level pulser operating at 3.3MHz: the optimal
3-level pulser dissipates 38% less total power at the cost of delivering 7% less acoustic
power. In other words, the 3-level pulser outputs 50% more acoustic power at the
same power dissipation. The measured improvement is not as big as the theoretical
calculation in Section 5.2.1 (50% rather than 88%), mainly for two reasons. First,
the RC settling transition distorts pulse shapes, with 3-level pulses being distorted
more severely than 2-level pulses, which leads to more acoustic power reduction in a
Figure 6-4: (a) Tx efficiency measurement setup and pulse shape definition. (b)
Measured time-domain waveform of the optimal 3-level 3.3MHz pulses (30Vpp), ∆=20ns,
∆/T=0.067
real-world 3-level pulser (7% rather than 4.4%). Second, a 3-level pulser uses more
high voltage transistors than a 2-level pulser, dissipating more power for driving the
transistors’ gate and drain capacitance, which leads to less total power reduction
(38% rather than 49%).
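The percentages quoted above follow directly from the measured values in Table 6.1 and from the theoretical reductions cited from Section 5.2.1. The short Python sketch below reproduces the arithmetic; the variable names are illustrative.

```python
# Measured values from Table 6.1 (1D ASIC, 3.3 MHz), in mW.
acoustic_2l, total_2l = 0.56, 84.5   # 2-level pulser
acoustic_3l, total_3l = 0.52, 52.4   # optimal 3-level pulser

eff_2l = acoustic_2l / total_2l      # Tx efficiency = acoustic / total power
eff_3l = acoustic_3l / total_3l

acoustic_change = acoustic_3l / acoustic_2l - 1.0   # about -7%
power_change = total_3l / total_2l - 1.0            # about -38%
measured_gain = eff_3l / eff_2l - 1.0               # about +50%

# Theoretical figures quoted in the text: 4.4% less acoustic power and
# 49% less total power for an ideal 3-level pulser.
theory_gain = (1.0 - 0.044) / (1.0 - 0.49) - 1.0    # about +88%

print(f"efficiency: {100 * eff_2l:.2f}% -> {100 * eff_3l:.2f}%")
print(f"acoustic {100 * acoustic_change:+.0f}%, total power {100 * power_change:+.0f}%")
print(f"improvement: measured {100 * measured_gain:+.0f}%, theory {100 * theory_gain:+.0f}%")
```

The relative efficiency improvement is also the extra acoustic power delivered at equal total power, which is how the 50% figure is read in the text.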
The relative efficiency improvements of a 3-level pulser over a traditional 2-level
pulser at 2.5, 3.3 and 5.0MHz pulses are measured to be 56%, 50% and 43%, respectively. Table 6.2 lists the optimal 30Vpp 3-level pulser power dissipation and
efficiency at all three measured frequencies. Efficiency improvement is less for pulses
with a shorter period, because the same RC settling transition distorts shorter pulse
shape more severely, reducing useful acoustic output power. Moreover, higher frequency pulses dissipate proportionally more dynamic power while acoustic output
power is kept roughly the same, thus the overall efficiency curve shifts down. Lastly,
the optimal ∆ value for the three frequencies is approximately the same (20ns), which
is slightly more than the RC settling time. This is because the optimal pulses use
just enough time to settle to the middle level to achieve CV²f/2 dynamic power,
Figure 6-5: Tx efficiency measurement results using different 3-level pulse shapes by
varying the ∆/T ratio and at different frequencies.
Table 6.1: Measured Power and Efficiency Comparison at 3.3MHz for the 1D ASIC
and CMUT (40pF capacitance per element)
| | 2-level | Optimal 3-level | Change |
|---|---|---|---|
| Acoustic Power | 0.56mW | 0.52mW | -7% |
| Total Power | 84.5mW | 52.4mW | -38% |
| Efficiency | 0.66% | 1.0% | +50% |
while keeping the middle level as narrow as possible to maintain large fundamental
frequency pulse energy delivery. When normalized over pulse period T in Figure 6-5,
the optimal ∆/T ratios become different for different pulse frequencies.
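A short Python sketch makes the normalization explicit: with the optimal step duration fixed at roughly 20ns, the ∆/T ratio grows in proportion to the pulse frequency. The three frequencies are the ones measured on the 1D chip.

```python
DELTA_S = 20e-9                       # optimal middle-level step, about 20 ns
FREQS_HZ = (2.5e6, 3.3e6, 5.0e6)      # measured pulse frequencies (1D chip)

# Delta/T = Delta * f, because the pulse period is T = 1/f.
ratios = {f: DELTA_S * f for f in FREQS_HZ}

for f, ratio in ratios.items():
    print(f"f = {f / 1e6:.1f} MHz, T = {1e9 / f:.0f} ns, Delta/T = {ratio:.3f}")
```

The 3.3MHz value agrees with the ∆/T of about 0.067 reported for the optimal pulse in Figure 6-4(b).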
Similarly, the 2D chip is designed to generate pulses at frequencies between 2 and 10MHz for the CMUT element size of 250µm × 250µm. The capacitance is roughly
2pF per element. Its performance is summarized in Table 6.3.
By comparing the optimal 3-level pulser against the 2-level pulser, this work is
effectively compared against a range of traditional pulsers. The reason is that not
only for 2-level pulsers [40, 41, 89], but also for multi-level pulsers without charge
Table 6.2: Measured Optimal 3-level Pulser Performance Summary for the 1D ASIC
and CMUT (40pF capacitance per element)
| | 2.5MHz | 3.3MHz | 5.0MHz |
|---|---|---|---|
| Total Power (mW) | 39.4 | 52.4 | 77.6 |
| Relative Efficiency Improvement Against a 2-level Pulser | 56% | 50% | 43% |
Table 6.3: Measured Optimal 3-level Pulser Performance Summary for the 2D ASIC
and CMUT (2pF capacitance per element)
| | 4.2MHz | 5.6MHz | 8.3MHz |
|---|---|---|---|
| Total Power (mW) | 7.1 | 9.6 | 14.3 |
| Relative Efficiency Improvement Against a 2-level Pulser | 46% | 38% | 18% |
recycling [81–83] or pulsers implemented as linear amplifiers [79, 80], the dynamic
power dissipation is always CV²f. Therefore these traditional pulsers have similar
(if not worse, considering the quiescent power dissipation in linear amplifiers) Tx
efficiency performance compared to the 2-level pulser used in this work.
Table 6.4 gives a comparison between different types of pulsers for ultrasonic
imaging. Because the multi-level pulser in [82] (STHV748 datasheet) does not implement
charge recycling, it would consume the same amount of CV²f dynamic power as the
2-level pulser in [40] (2008) & [43] (2013), if the load is the same. The linear amplifier
approach in [79] (2012), on the other hand, is more suitable for resistive transducers.
Because 2D transducers typically have capacitive elements, its efficiency would be low
due to quiescent power dissipation. Lastly, the discrete-level pulsers tend to generate
harmonics. This work attempts to improve the pulser’s HD2 performance at the
system level, employing the I&Q excitation method presented in Section 4.3.
6.2 LNA Characterization
The LNAs from the 1D and the 2D ASICs are tested as single amplifier blocks in this
section. Table 6.5 and Table 6.6 summarize the measured performance numbers from
the two ASICs respectively. In Table 6.7, selected performance specifications of the
LNAs are compared against other CMUT LNAs in the literature.
Table 6.4: CMUT Pulser Performance Comparison
| Pulser Specs | Our 2D ASIC | Our 1D ASIC [5] | [40] (2008) & [43] (2013) | [82] (STHV748 datasheet) | [79] (2012) |
|---|---|---|---|---|---|
| CMUT Element Size (µm × µm) | 250 x 250 (2pF) | 300 x 3000 (40pF) | 250 x 250 (2pF) | General Purpose (200Ω ∥ 50pF) | General Purpose (100Ω ∥ 150pF) |
| Pulser Type | Discrete 3-Levels | Discrete 3-Levels | Discrete 2-Levels | 3- / 5- Levels, without charge recycling | Linear Amplifier |
| Active Power | 7.1mW @4.2MHz | 77.6mW @5MHz | N/A | N/A | 20W |
| Quiescent Power | 0 | 0 | 0 | N/A | 37mW |
| Dominant Power Dissipation | “CV²f/2” Dynamic Power | “CV²f/2” Dynamic Power | “CV²f” Dynamic Power | “CV²f” Dynamic Power | “V²/R” Resistive Power |
| Pulse Amplitude | 30 Vpp | 30 Vpp | 25 Vpp | ± 90 V | 90 Vpp |
| Minimum Pulse Width/Bandwidth | 20 ns | 20 ns | 100 ns | 20 MHz | 6.5 MHz |
| Linearity | N/A | N/A | N/A | N/A | HD2<-43dBc |
Table 6.5: Measured LNA Performance Summary for the 1D ASIC [5]
| LNA Specs | Measured Result |
|---|---|
| Process | 0.18µm CMOS |
| Target CMUT Element Size | 300µm × 3000µm |
| Active Power Consumption | 14.3 mW |
| Sleep Power Consumption | 1.5 mW |
| Bandwidth | 5.2 MHz |
| Transimpedance Gain | 96.6 dBΩ |
| Receive Sensitivity | 1.2 Pa(rms) |
| Receive Responsivity | 162 mV/kPa |
| Input-referred Pressure Noise | 0.56 mPa/√Hz @3MHz |
| Output-referred Voltage Noise | 91 nV/√Hz @3MHz |
| Noise Figure | 10.3 dB @3MHz |
| Output P1dB | 618 mVpp |
| 4-Ch Gain Mismatch | <0.11 dBΩ |
| Crosstalk | <-47 dBc @3MHz; <-35 dBc @10MHz |
| Wake-up / Sleep Time | <1µs |
Table 6.6: Measured LNA Performance Summary for the 2D ASIC
| LNA Specs | Measured Result |
|---|---|
| Process | 0.18µm CMOS |
| Target CMUT Element Size | 250µm × 250µm |
| Active Power Consumption | 1.4 mW |
| Sleep Power Consumption | 0.054 mW |
| Bandwidth | 10.2 MHz |
| Transimpedance Gain | 116/113.5/110/104 dBΩ |
| Receive Sensitivity | 7.3 Pa(rms) |
| Receive Responsivity | 123 mV/kPa |
| Input-referred Pressure Noise | 2.3 mPa/√Hz @5MHz† |
| Input-referred Current Noise | 0.41 pA/√Hz @5MHz |
| Output-referred Voltage Noise | 289 nV/√Hz @5MHz |
| Noise Figure | 13 dB @5MHz |
| Output P1dB | 946 mVpp† |
| HD2 | −46dBc @330mVpp, 2MHz tone† |
| HD3 | −46dBc @330mVpp, 2MHz tone† |
| IMD3 | −72dBc @324mVpp, 2MHz & 2.01MHz (-25dBc) tones† |
| 256-Ch Gain Mismatch | <2.0 dBΩ |
| Crosstalk | <-50 dBc @3MHz; <-22 dBc @15MHz |
| Wake-up / Sleep Time | <1µs |
†: These results are measured at the maximum LNA gain setting.
Being used for different medical ultrasound applications, the CMUT arrays are
very different in size, impedance and operating frequency. Thus, the corresponding
LNA specs are also vastly different and difficult to compare. For example, the 1D
CMUT used in this work is designed as an alternative to 1D PZT linear arrays
operating up to 5MHz; the 2D CMUT in this work, however, has a smaller per-element size (thus smaller element capacitance) while its bandwidth is larger (up to
10MHz). To establish a figure of merit for fair comparison and to be able to apply the
data available in CMUT LNA literature, the noise efficiency factor (NEF) commonly
used for instrumentation amplifiers [90] is revised for use here. The original NEF and
the revised NEF’ are expressed in (6.2) and (6.3) respectively:
\[
\mathrm{NEF} = V_{rms,in} \cdot \sqrt{\frac{2 \cdot I_{tot}}{\pi \cdot U_T \cdot 4kT \cdot BW}}, \qquad (6.2)
\]
Table 6.7: CMUT LNA Performance Comparison
| LNA Specs | Our 2D ASIC | Our 1D ASIC [5] | [40] (2008) | [43] (2013) | [73] (2010) | [45] (2011) |
|---|---|---|---|---|---|---|
| CMUT Element Size (µm × µm) | 250 x 250 | 300 x 3000 | 250 x 250 | 250 x 250 | 63 x 1037 | 70 x 70 |
| Active Power (mW) [Ptot] | 1.4 | 14.3 | 4.0 | 9.4 | 3.8 | 6.6 |
| Sleep Power (mW) | 0.054 | 1.5 | N/A | N/A | N/A | N/A |
| Bandwidth (MHz) | 10.2 | 5.2 | 10 | 25 | 20 | 10-20 |
| Transimpedance Gain (dBΩ) | 116/113.5/110/104 | 96.6 | 112.7 | 106.6 | 94.0 | 129.5 |
| Input-referred Pressure Noise Density (mPa/√Hz) [pn,in] | 2.3 @5MHz | 0.56 @3MHz | 1.8 @5MHz | N/A | 2.18 @10MHz | 3.0 @15MHz |
| Noise Figure (dB) | 13 @5MHz | 10.3 @3MHz | N/A | N/A | 10.5 @10MHz | 1.8 @10-20MHz |
| Output P1dB (mVpp) | 946 | 618 | N/A | N/A | 84.2 | N/A |
| NEF’ (mPa · √(mW/Hz)) [pn,in · √Ptot] | 2.7 | 2.1 | 3.6 | N/A | 4.2 | 7.7 |
\[
\mathrm{NEF'} = \frac{p_{rms,in}}{\sqrt{BW}} \cdot \sqrt{P_{tot}} = p_{n,in} \cdot \sqrt{P_{tot}}. \qquad (6.3)
\]
The constant factors in the original NEF are ignored, and Vrms,in is replaced by
prms,in or pn,in. Here prms,in is the in-band input-referred RMS noise amplitude and pn,in
is the input-referred noise spectral density averaged inside the passband. Note that
both prms,in and pn,in are acoustic pressure noise, input-referred all the way to the
mechanical side at the CMUT element surface, in units of Pa and Pa/√Hz respectively. This input-referred method normalizes the effect of CMUT receive sensitivity
and LNA gain. Moreover, the input-referred noise spectral density at the center frequency of the passband is used to approximate pn,in
for the actual NEF’ calculation, because it is
the more accessible measurement result in the literature.
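As a worked instance of (6.3), the Python sketch below recomputes NEF’ from the noise density and active power of several designs in Table 6.7. The pairing of values with references follows our reading of that table; the function name nef_prime is illustrative.

```python
import math


def nef_prime(pn_in_mpa_rthz: float, ptot_mw: float) -> float:
    """Revised noise efficiency factor (6.3): NEF' = p_n,in * sqrt(P_tot)."""
    return pn_in_mpa_rthz * math.sqrt(ptot_mw)


# (input-referred pressure noise density in mPa/sqrt(Hz), active power in mW)
designs = {
    "Our 2D ASIC": (2.3, 1.4),
    "Our 1D ASIC": (0.56, 14.3),
    "[40] (2008)": (1.8, 4.0),
    "[45] (2011)": (3.0, 6.6),
}

results = {name: nef_prime(pn, ptot) for name, (pn, ptot) in designs.items()}
for name, value in results.items():
    print(f"{name:12s} NEF' = {value:.1f}")
```

The recomputed values round to the NEF’ entries listed for these designs in Table 6.7.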
Figure 6-6: The die photo of the four-channel ultrasonic imaging transceiver test chip.
The NEF’ in (6.3) handles CMUT element size scaling correctly. For example, a
CMUT element with 2x bigger surface area presents approximately 2x bigger input
capacitance to the LNA. If two of the same LNAs are parallelized to buffer the 2x
CMUT element, the same bandwidth and noise figure targets are achieved. Although
the parallelization reduces the input-referred noise amplitude by √2x and increases
the power consumption by 2x, the NEF’ is unchanged. This is expected since
the same LNA design is used in both cases. Another example that shows the usefulness
of NEF’ is [45] in Table 6.7. It achieves very low noise, as indicated
by the noise figure. On the other hand, excessive power is dissipated on a very small
CMUT element, which leads to a relatively high NEF’.
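The scaling argument can be verified numerically with a minimal Python sketch. It assumes ideal parallelization of n identical LNAs, in which the input-referred noise falls by √n and the power rises by n; the helper names are hypothetical.

```python
import math


def nef_prime(pn_in: float, ptot: float) -> float:
    """NEF' from (6.3)."""
    return pn_in * math.sqrt(ptot)


def parallelize(pn_in: float, ptot: float, n: int):
    """ASSUMED ideal parallelization of n identical LNAs on an n-times
    larger element: noise amplitude / sqrt(n), power consumption * n."""
    return pn_in / math.sqrt(n), ptot * n


single = (2.3, 1.4)                   # 2D ASIC LNA values from Table 6.6
doubled = parallelize(*single, n=2)

print(f"single : NEF' = {nef_prime(*single):.3f}")
print(f"doubled: NEF' = {nef_prime(*doubled):.3f}")
```

The two printed values coincide, which is the invariance that makes NEF’ suitable for comparing LNAs designed for different element sizes.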
Table 6.7 suggests that our LNA design for the 1D CMUT achieves the lowest NEF’, indicating the best power efficiency for noise and bandwidth performance.
The NEF’ of our 2D LNA is slightly worse than that of our 1D LNA due to the overhead needed
to drive extra line capacitance and to combine analog outputs. In addition, both
designs achieve good linearity, as shown by the P1dB, harmonic
and intermodulation results.
Finally, Figure 6-6 shows a die photo of the 1D test chip fabricated in TSMC
0.18µm high voltage CMOS process. The chip occupies a total area of 3mm × 3mm
Figure 6-7: The die photo of the 256-channel 16x16 2D ultrasonic imaging transceiver
test chip.
and each channel occupies an area of 300µm × 1100µm. The shared middle voltage
generation circuit occupies an area of 300µm × 600µm. Figure 6-7 shows a die photo
of the 2D test chip fabricated in the same CMOS process. The chip occupies a total
area of 6mm × 5.5mm and each channel is element-matched to the CMUT element
area of 250µm × 250µm.
6.3 The Tx Beam-Steering Experiment
Although Tx beam-steering or beam-focusing is already used on the 2D CMUT-ASIC system for real imaging experiments in Chapter 4, a tangible Tx beam-pattern
demonstration aids understanding. Therefore, a simple Tx beam-steering experiment is conducted on the 1D ASIC, in which the ultrasonic lateral beam-pattern
is measured. In this experiment, each of the four-channel pulsers is connected to one
Figure 6-8: (a) Measured ultrasonic lateral beam profile, steered to the center (broadside). (b) Measured beam profile, with 30ns delay between channels.
of four consecutive CMUT elements in the experimental setup in Figure 6-1; each
pulser drives its CMUT with the 3.3MHz optimal 3-level pulses. The hydrophone
is placed at a fixed depth in the transducer’s far field (z=7.4mm). By moving the
hydrophone along the lateral direction (x-axis) and collecting the acoustic pressure
readings, the lateral beam profile can be plotted. Furthermore, ultrasonic Tx beam-steering is demonstrated on the four-channel transmitter system by varying the
relative pulsing delays across the four channels.
Figure 6-8(a) shows the measured beam profile in dots with zero delay between
channels. The beam is steered to the center, i.e., broadside. Figure 6-8(b) shows
the profile when 30ns delay is applied between channels. The figures also show the
Field II simulation results of the same experimental configurations for each case.
The simulation and measured data match well. Hand calculation based on classical
wave propagation provides another verification for Figure 6-8(b). The beam lateral
displacement ∆x and the channel delay, td = 30ns, are related to each other by (6.4):
\[
\frac{\Delta x}{z} \approx \frac{t_d \cdot c}{d}, \tag{6.4}
\]
where depth z = 7.4mm, sound speed c in vegetable oil is measured to be 1460m/s,
and CMUT element pitch d = 300µm. The calculated beam lateral displacement
∆x = 1.08mm, which is consistent with the measured result.
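The hand calculation can be reproduced with a few lines of Python. This is a minimal sketch using only the values quoted in the text. The second half assumes the standard linear-array delay law sin(θ) = c·td/d, which is not stated in the text, to indicate how close the small-angle form of (6.4) is to the exact geometry.

```python
import math

# Values quoted in the text for the Tx beam-steering experiment.
t_d = 30e-9   # inter-channel pulsing delay, s
c = 1460.0    # measured sound speed in vegetable oil, m/s
d = 300e-6    # CMUT element pitch, m
z = 7.4e-3    # hydrophone depth, m

# Equation (6.4): delta_x / z ~= t_d * c / d
ratio = t_d * c / d
delta_x = z * ratio

# Assumption (not from the text): exact steering angle from the
# linear-array delay law sin(theta) = c * t_d / d.
theta = math.asin(ratio)
delta_x_exact = z * math.tan(theta)

print(f"delta_x from (6.4):     {delta_x * 1e3:.2f} mm")
print(f"steering angle:         {math.degrees(theta):.1f} deg")
print(f"delta_x from tan(theta): {delta_x_exact * 1e3:.2f} mm")
```

The steering angle is small enough that the two displacement values differ by only about one percent, so the approximation in (6.4) is adequate at this depth and delay.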
Figure 6-9: The setup of the pulse-echo experiment for characterizing the complete
ultrasound channel.
6.4 The Pulse-Echo Experiment
The pulse-echo experiment characterizes the complete ultrasound signal chain. The
1D ASIC test setup in Figure 6-1 is revised to perform the experiment. As shown
in Figure 6-9, the pulser drives a single CMUT element with a wideband pulse as an
approximation to the ideal impulse excitation. The narrowest pulse that can be generated from the pulser is a 2-level 30Vpp pulse with 20ns pulse width (Figure 6-10(a)).
The excited ultrasonic wave then propagates through the oil medium and is reflected
back at the oil-air boundary 26mm away from the transducer (the hydrophone is not
needed for this experiment). The reflected echo is received by the same CMUT element and amplified by the LNA (Figure 6-10(b)). Because the CMUT blocks the DC
component and acts as a differentiator, the received echo looks similar to the derivative of the transmitted pulse, with a positive peak and a negative peak corresponding
to the rising edge and the falling edge of the transmitted pulse. The echo duration
is about 0.3µs, corresponding to the dominant frequencies (3-5MHz) that go through
the ultrasound signal chain. The echo’s FFT in Figure 6-10(c) confirms the intuition.
It shows the total channel impulse response, including CMUT, the oil medium and
LNA. It mainly reflects the band-pass characteristic and the wide bandwidth of the
[Figure 6-10 plots: (a) Transmitted Pulse Waveform, Voltage (V) vs. Time (us); (b) Received Echo Waveform, Voltage (V) vs. Time (us); (c) Spectrum of Received Echo Waveform, Amplitude (dB) vs. Freq (MHz), annotated with f0=4.5MHz, BW=5.2MHz, and markers at 2.3MHz and 7.5MHz.]
Figure 6-10: The key waveforms from the pulse-echo experiment, showing the ultrasound channel characteristics. (a) The transmitted pulse waveform. (b) The received
echo waveform. (c) The spectrum of the received echo waveform.
CMUT device, with a center frequency of 4.5MHz and a -6dB fractional bandwidth
of 116%.²
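Two of the numbers in this experiment can be cross-checked with simple arithmetic. The sketch below uses the 26mm reflector distance from the text and the band markers annotated in Figure 6-10(c). It assumes that the 1460m/s sound speed measured for vegetable oil in Section 6.3 also applies to the oil medium here.

```python
# Pulse-echo sanity checks using values quoted in the text and Figure 6-10.
c = 1460.0          # sound speed in oil, m/s (assumed equal to the Section 6.3 value)
distance = 26e-3    # transducer to oil-air boundary, m

# The echo travels to the boundary and back.
round_trip_us = 2 * distance / c * 1e6

f_low, f_high = 2.3e6, 7.5e6   # band markers annotated in Figure 6-10(c), Hz
f0 = 4.5e6                      # center frequency quoted in the text, Hz
bandwidth = f_high - f_low
fractional_bw = bandwidth / f0

print(f"round-trip time:      {round_trip_us:.1f} us")
print(f"bandwidth:            {bandwidth / 1e6:.1f} MHz")
print(f"fractional bandwidth: {fractional_bw * 100:.0f} %")
```

The computed round-trip time falls inside the 32us to 37us window plotted for the received echo, and the bandwidth ratio reproduces the quoted 116%.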
Similarly, the 2D ASIC performs the pulse-echo experiment on all of its 16x16
transceiver channels, which shows (on average) a center frequency of 6.25MHz and a
-6dB fractional bandwidth of 75% of the CMUT-ASIC total channel response. The
reflected echo amplitude also shows the channel sensitivity. By collecting all echoes’
amplitudes, the sensitivity map of the array can be obtained as shown in Figure 6-11
(a re-plot of Figure 5-18). Except for the short elements in red, the working elements
are drawn in grayscale. The brightness encodes the normalized sensitivity of each
² The -6dB bandwidth is used instead of -3dB because the spectrum shows the combined two-way (transmit and receive) CMUT characteristic.
[Figure 6-11 plots: two 16x16 element maps with elements indexed 0 to 255, labeled "board7-B-SOICMUT, VBIAS=30V" and "board8-B-SOICMUT, VBIAS=30V".]
Figure 6-11: A re-plot of Figure 5-18 in Section 5.5. Two successful 16x16 CMUT-ASIC assemblies with short CMUT elements (marked in red) isolated by the ASIC.
The rest of the elements are functional and their sensitivity performance is expressed
by the brightness of the elements.
CMUT element. The sensitivity map can be used for digital calibration of channel
gain mismatch for the 2D array.
Chapter 7
Conclusion
7.1 Summary of Contributions
In summary, this thesis presents a Column-Row-Parallel ASIC architecture as a scalable and flexible hardware solution for 3D wearable / portable medical ultrasound
applications.
The architecture provides an “N” interconnection complexity and an “N” acquisition time complexity for an NxN 2D ultrasonic transducer array, which remains scalable as
the array size grows. The architecture offers column-parallel and row-parallel
operations, and fine-granularity per-element selection control, which makes the hardware system flexible for different ultrasonic imaging algorithms.
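The difference between “N” and “N²” scaling is easy to tabulate. The sketch below is illustrative only: the function names are hypothetical, and it counts one readout connection per element for a fully parallel array against one shared line per column (or row) for the Column-Row-Parallel case, ignoring control and supply lines.

```python
def fully_parallel_channels(n: int) -> int:
    """One readout connection per element of an n x n array."""
    return n * n


def column_row_parallel_channels(n: int) -> int:
    """One shared readout line per column (or row) of an n x n array."""
    return n


for n in (16, 32, 64):
    full = fully_parallel_channels(n)
    crp = column_row_parallel_channels(n)
    print(f"{n}x{n} array: fully parallel = {full}, column-row-parallel = {crp}")
```

For the 16x16 array this gives 16 shared lines against 256 per-element connections, and the gap widens linearly with N as the array grows.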
A 16x16 ASIC ultrasonic transceiver test chip interfacing to a 16x16 CMUT array
is designed and fabricated to demonstrate the proposed architecture. The 3D plane-wave
coherent compounding (PWCC3D) algorithm is implemented on the system
assembly as a fast-volume-rate, high-quality volumetric imaging algorithm. The
architecture also enables a technique for HD2 reduction in the transmitters used in
ultrasonic harmonic imaging mode. The interleaved checkerboard patterns with
I&Q excitations reduce Tx HD2 by over 20dB compared to conventional
methods. This technique is applicable to nonlinearity from both CMUT transducers
and circuits, and it applies to arbitrary pulse shapes.
The circuit design of the 16x16 ASIC transceiver is optimized to the target CMUT
transducer. The high-voltage transmitter uses a 3-level pulse-shaping technique with
charge recycling to improve the power efficiency. The design requires minimal off-chip components and is scalable for more channels. The receiver is implemented with
a transimpedance amplifier topology and is optimized for trade-offs between noise,
bandwidth, and power dissipation. The test chip is characterized with both acoustic
and electrical measurements. Comparing the 3-level pulser against traditional 2-level
pulsers, the measured Tx efficiency shows 50% more acoustic power delivery with the
same total power dissipation. The CMUT receiver achieves the lowest noise efficiency
factor reported in the literature (2.1 compared to a previously reported
lowest of 3.6, in units of mPa · √(mW/Hz)).
In addition, both transmitters and receivers can be parallelized to efficiently implement the Column-Row-Parallel architecture. Particularly for the receivers, a special
output stage is implemented for the receiver LNA, such that the analog outputs are
combined for a higher SNR, which scales with the number of LNAs as 10 log(N) dB.
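The 10 log(N) dB figure follows from summing N equal signals coherently while the noise adds in power. A minimal sketch, assuming identical LNAs with uncorrelated noise and a base-10 logarithm (the function name is hypothetical):

```python
import math


def combining_snr_gain_db(num_lnas: int) -> float:
    """SNR improvement from summing the outputs of num_lnas identical LNAs.

    The signal amplitude grows by num_lnas, while uncorrelated noise grows
    by sqrt(num_lnas) in rms, so the power SNR improves by num_lnas.
    """
    signal_power_gain = num_lnas ** 2
    noise_power_gain = num_lnas
    return 10.0 * math.log10(signal_power_gain / noise_power_gain)


for n in (1, 2, 4, 16):
    print(f"{n:2d} LNAs combined: +{combining_snr_gain_db(n):.2f} dB")
```

Combining a full line of 16 LNAs therefore gives roughly 12dB of SNR improvement over a single LNA under these assumptions; correlated noise sources would reduce the benefit.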
The 2D transceiver chip is also designed to be fault-tolerant against defects existing in CMUT arrays. The transceiver channels can be used for detecting CMUT short
elements and then disconnecting the non-functional elements from the array. This
selective element disabling capability is realized electrically and can be automated.
This design strategy has proven to be critical for working with faulty MEMS devices.
It is especially beneficial for the 2D arrays with large element count, and it reflects
a highly desirable feature for front-end sensor interface circuit design. The random
access to the array elements serves as a flexible testing infrastructure in general. Not
only can faulty channels be detected and isolated, but performance characterizations of
functional channels can also be obtained with this programmable interface.
Several low-power circuit design techniques are used in this work to make
the system suitable for wearable / portable applications. They are summarized as
follows:
• The multi-level pulse shaping technique is combined with a charge-recycling,
regulated power supply to implement the transmit pulser in Section 5.2.1. The
dynamic power consumed by the capacitive load of the CMUT element is reduced by half with a 3-level pulsing scheme. The regulated power supply that
recycles the charge is implemented with a shared DC-DC power converter, which
requires only two off-chip capacitors. The circuit is scalable to more channels,
and is easily integrated.
• The transimpedance amplifier topology for the receiver LNA is optimized for
the best power efficiency, given the bandwidth, gain and noise performance
requirements. The optimization procedure is described in Section 5.3.1. The
revised noise efficiency factor (NEF’) shows that the amplifier design achieves
the lowest power consumption while meeting all design targets. In addition
to the low active power dissipation, the amplifier also implements a very low
power sleep mode. Both the main amplifier stages and the biasing currents are
turned off in sleep mode by auxiliary switches, which help attain less than 1µs
amplifier recover time during wake up. With some prior information about the
scene (e.g. from a coarse image of the space in the first pass), the receiver signal
chain can be put into sleep mode for power savings, when it is not needed to
perform imaging at certain regions. Therefore, the sleep mode offers flexibility
for system-level power scheduling.
• The receiver output stage is designed to facilitate the analog signal combining. A source follower stage is specially sized to provide proper signal current
summing, while overcoming the parasitics from the 2D interconnect lines. The
optimization procedure is shown in Section 5.3.3. Similar to the sleep mode,
the receiver output stage could benefit the system flexibility by providing programmable receiver parallelism. More receivers are parallelized for a better
signal quality (i.e. SNR) when it is necessary for the imaging algorithm.
• At the algorithm level, flexible beam-formation schemes are proposed, such that
power consumption can be tuned according to the image quality requirement. In
Sections 4.2 and 4.4, the 3D plane-wave coherent compounding (PWCC3D) algorithm and the annular ring aperture imaging method are presented as scalable
imaging algorithms that could adjust image volume rate, contrast and resolution performance for variable system power dissipation. For example, with fewer
transmit plane-wave angles in PWCC3D, or fewer annular rings formed in the
ring apertures, the image contrast and resolution are degraded, but the energy
consumed for data acquisition of one volumetric image is decreased. Moreover,
after data acquisition, PWCC3D beamforming processing is also scalable. A
low resolution volumetric image should first be computed with relatively low
power consumption; a higher resolution image can then be computed based on
the region of interest. The latter consumes more digital computation power,
but offers proportionally higher image resolution performance.
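The halving of dynamic power stated for the 3-level pulser in the first item above can be checked with an idealized energy balance. This is a sketch under stated assumptions, not the circuit analysis of Section 5.2.1: the load is a pure capacitor, the middle level sits exactly at half the peak voltage, and the charge returned to the middle supply is recycled without loss. The 10pF load value is hypothetical; only the 30V swing comes from the text.

```python
def two_level_energy(c_load: float, v_pp: float) -> float:
    """Supply energy per pulse when charging straight from 0 to v_pp."""
    charge = c_load * v_pp
    return charge * v_pp


def three_level_energy(c_load: float, v_pp: float) -> float:
    """Net supply energy per pulse with an ideal, recycling mid-level supply."""
    v_mid = v_pp / 2.0
    q_step = c_load * v_mid          # charge moved in each half-step
    drawn_mid = q_step * v_mid       # 0 -> v_mid, drawn from the mid supply
    drawn_top = q_step * v_pp        # v_mid -> v_pp, drawn from the top supply
    returned_mid = q_step * v_mid    # v_pp -> v_mid, returned to the mid supply
    return drawn_mid + drawn_top - returned_mid


c_load = 10e-12   # hypothetical CMUT load capacitance, F
v_pp = 30.0       # pulse swing, V

e2 = two_level_energy(c_load, v_pp)
e3 = three_level_energy(c_load, v_pp)
print(f"2-level: {e2 * 1e9:.2f} nJ, 3-level: {e3 * 1e9:.2f} nJ, ratio = {e3 / e2:.2f}")
```

The ratio is independent of the assumed capacitance, since both energies scale with C·V². Losses in a real converter would move the ratio above the ideal one half.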
7.2 Future Work
Several possible improvements can be made for the 16x16 Column-Row-Parallel ASIC:
• A more complete analog front-end design would require on-chip ADCs. Because
the scalable Column-Row-Parallel architecture offers “N” I/O complexity, only
16 (rather than 256) ADCs are required for the 16x16 ASIC, which is practical to
implement. In fact, there are many octal analog front-end ASICs commercially
available for conventional 1D ultrasonic arrays [91–93], where each ADC channel
occupies 2 LVDS output pins to output serialized digital data. The Column-Row-Parallel ASIC with on-chip ADCs could take the same strategy to deal
with the massive amount of data and save pin count.
• When the 2D array size grows beyond 16x16, if a single ASIC with excessive
silicon area is to be avoided for yield and reliability reasons, multiple Column-Row-Parallel ASICs could be tiled together for expansion. For example, four
16x16 ASICs make a 32x32 front-end system. To simplify the tiling assembly,
the ASIC layout could be re-arranged, such that only two sides, instead of all
four sides of the chip have extra area for peripheral I/Os. In this way, the 16x16
transceiver array is exposed to two sides, to which four of the same ASIC chips
can be simply abutted for a 32x32 transceiver array, as shown in Figure 7-1.
Figure 7-1: Four 16x16 ASICs tiled together for a 32x32 imaging front-end.
Figure 7-2: CMUT-ASIC assembly alternatives to eliminate the interposer PCB: (a)
TSV technology for interconnecting ASIC I/Os to the main testing PCB; (b) Applying
flip-chip bonding technology for CMUT-ASIC interconnection and wire-bonding for
ASIC I/Os.
• The current system assembly is accomplished by using an interposer PCB as
shown in Section 4.1. It helps adapt to different CMUT footprints and it serves
as an intermediate substrate, such that the ASIC I/Os can be connected to the
main testing board. But it increases the assembly complexity and adds parasitic capacitance and resistance to the CMUT-ASIC interconnection.
In the future, the interposer PCB can be eliminated by adopting new process
and assembly technologies for the interconnection, such as the through silicon
via (TSV), or the co-assembly of wire-bonding and flip-chip bonding. Their corresponding ASIC I/O connection methods to the main testing PCB are shown
in Figure 7-2(a) and (b).
• As briefly mentioned in Section 5.2.3, the future ASIC could make the
pulser gate driver programmable in driving strength, to dynamically adapt to
different numbers of active pulsers on a column / row line, leading to a power
saving. Similarly for the Rx path, if the ADCs are implemented, ADCs with
configurable accuracy (i.e. number of bits) can be designed to adapt to different
numbers of active LNAs (different SNR) along the line to save power.
• The programmable gain Rx LNAs in the array are currently controlled globally. It would be more flexible to have control over each individual LNA, by adding
configuration bits per element. Moreover, programmable Tx pulse amplitude
control can be realized with a multi-level pulser design with per-element controls. Different voltage levels can be used to realize pulses with different amplitudes. This programmable functionality could enable fine-granularity, flexible
apodization for both Tx and Rx apertures. In particular, it can be used to perform signal strength compensation against the channel mismatches as seen in
Figure 5-18; and to compensate for the missing channels by adjusting neighboring
channels’ pulse shapes (amplitude and phase) as mentioned in Sections 4.1
and 5.5.
At the system-level, work is being done by Bonnie Lam, under the guidance of
Prof. Anantha Chandrakasan and Prof. Charles Sodini, to design a custom digital
test chip that performs 3D beam-formation based on the Column-Row-Parallel analog
front-end ASIC made in this thesis. This thesis aims to demonstrate the Column-Row-Parallel architecture as a promising hardware system framework for efficient,
low-power, and scalable 3D ultrasonic imaging for wearable / portable applications.
The analog front-end circuit implementation is the focus, while the system-level digital
beam-formation processing is performed on a PC. To demonstrate a complete wearable / portable ultrasonic imaging device, a real-time low-power digital beam-former
that is optimized for the Column-Row-Parallel analog front-end is indispensable.
Furthermore, intelligence can be embedded in the beam-former chip, such that
the ultrasound device becomes adaptive and autonomous. The beam-former could
understand the scene based on its beam-formed data, and improve its imaging strategy
correspondingly. One example of intelligence has been described in Section 4.2. One
could control the PWCC3D algorithm to either obtain volumetric images of a large space
with coarse spatial resolution, or to “zoom into” a smaller region with finer resolution,
under certain volume rate or power constraints. The beam-former chip could exploit
such algorithm features and provide feedback controls to the analog front-end to
realize a closed-loop adaptive imaging system.
On another thread, PMUT is currently being investigated as an alternative transducer technology to CMUT by Katherine Smyth, under the guidance of Prof. Sang-Gook Kim at MIT. Because the Column-Row-Parallel architecture is independent of
transducer technology, implementation of a Column-Row-Parallel analog front-end for
PMUT would be interesting. Block-level circuit optimization is different for PMUT
due to its different device characteristics as compared to CMUT. To understand the
performance differences between PMUT and CMUT, and the impact on circuit topology, a detailed comparison study is currently ongoing.
Bibliography
[1] G. E. Moore, “Cramming more components onto integrated circuits,” Proc.
IEEE, vol. 86, no. 1, pp. 82–85, Jan 1998.
[2] C. Prinz and J. Voigt, “Diagnostic accuracy of a hand-held ultrasound scanner in
routine patients referred for echocardiography,” Journal of the American Society
of Echocardiography, 2010.
[3] S. Nikolov and J. Jensen, “3d synthetic aperture imaging using a virtual source
element in the elevation plane,” in Ultrasonics Symposium, 2000 IEEE, vol. 2,
oct 2000, pp. 1743 –1747 vol.2.
[4] S. Nikolov, J. Jensen, R. Dufait, and A. Schoisswohl, “Three-dimensional real-time synthetic aperture imaging using a rotating phased array transducer,” in
Ultrasonics Symposium, 2002. Proceedings. 2002 IEEE, vol. 2, oct. 2002, pp.
1585 – 1588 vol.2.
[5] K. Chen, H.-S. Lee, A. Chandrakasan, and C. Sodini, “Ultrasonic imaging
transceiver design for cmut: A three-level 30-vpp pulse-shaping pulser with improved efficiency and a noise-optimized receiver,” Solid-State Circuits, IEEE
Journal of, vol. 48, no. 11, pp. 2734–2745, 2013.
[6] K. Chen, A. Chandrakasan, and C. Sodini, “Ultrasonic imaging front-end design
for cmut: A 3-level 30vpp pulse-shaping pulser with improved efficiency and a
noise-optimized receiver,” in Solid State Circuits Conference (A-SSCC), 2012
IEEE Asian, 2012, pp. 173–176.
[7] K. Chen, B. Lam, C. Sodini, and A. Chandrakasan, “System energy model for a
digital ultrasound beamformer with image quality control,” in Ultrasonics Symposium (IUS), 2012 IEEE International, 2012, pp. 615–618.
[8] G. S. Kino, Acoustic Waves: Devices, Imaging, and Analog Signal Processing.
Prentice Hall, 1987.
[9] R. S. C. Cobbold, Foundations of Biomedical Ultrasound. Oxford University
Press, 2006.
[10] D. Olendorf, C. Jeryan, and K. Boyden, The Gale encyclopedia of medicine.
Gale Research (Detroit, MI), 1999.
[11] J. A. Jensen, Estimation of Blood Velocities Using Ultrasound, A Signal Processing Approach. Cambridge University Press, 1996.
[12] T. Szabo, Diagnostic Ultrasound Imaging: Inside Out. Elsevier, 2004.
[13] S. Satomura, “Study of the flow patterns in peripheral arteries by ultrasonics,”
J. Acoust. Soc. Japan, vol. 15, pp. 151–158, 1959.
[14] D. Baker, “Pulsed ultrasonic doppler blood-flow sensing,” Sonics and Ultrasonics, IEEE Transactions on, vol. 17, no. 3, pp. 170 – 184, jul 1970.
[15] C. Kasai, K. Namekawa, A. Koyano, and R. Omoto, “Real-time two-dimensional
blood flow imaging using an autocorrelation technique,” IEEE Transactions on
Sonics and Ultrasonics, vol. SU-32, no. 3, pp. 458–463, May 1985.
[16] M. Anderson, M. McKeag, and G. Trahey, “The impact of sound speed errors on
medical ultrasound imaging,” The Journal of the Acoustical Society of America,
vol. 107, p. 3540, 2000.
[17] D. H. Evans and W. N. McDicken, Doppler Ultrasound (Second ed.). John
Wiley and Sons, 2000.
[18] F. Tranquart, N. Grenier, V. Eder, and L. Pourcelot, “Clinical use of ultrasound
tissue harmonic imaging,” Ultrasound in medicine & biology, vol. 25, no. 6, pp.
889–894, 1999.
[19] A. Novell, M. Legros, N. Felix, and A. Bouakaz, “Exploitation of capacitive
micromachined transducers for nonlinear ultrasound imaging,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 56, no. 12, pp.
2733–2743, 2009.
[20] S. Satir and F. L. Degertekin, “Harmonic reduction in capacitive micromachined
ultrasonic transducers by gap feedback linearization,” IEEE Transactions on
Ultrasonics Ferroelectrics and Frequency Control, vol. 59, no. 1, pp. 50–59, Jan
2012.
[21] F. Lin, C. Cachard, R. Mori, J. Viti, F. Varray, F. Guidi, and O. Basset, “Influences of bubble motion to second-harmonic inversion imaging,” in Ultrasonics
Symposium (IUS), 2012 IEEE International, 2012, pp. 675–678.
[22] M. Pasovic, M. Danilouchkine, T. Faez, P. L. van Neer, C. Cachard, A. F. van der
Steen, O. Basset, and N. de Jong, “Second harmonic inversion for ultrasound
contrast harmonic imaging,” Physics in Medicine and Biology, vol. 56, no. 11, p.
3163, 2011.
[23] J. Rubin, R. Bude, P. Carson, R. Bree, and R. Adler, “Power doppler us: a potentially useful alternative to mean frequency-based color doppler us.” Radiology,
vol. 190, no. 3, p. 853, 1994.
[24] J. Platt, J. Rubin, J. Ellis, and M. DiPietro, “Duplex doppler us of the kidney:
differentiation of obstructive from nonobstructive dilatation.” Radiology, vol. 171,
no. 2, p. 515, 1989.
[25] A. Yuan, P. Yang, D. Chang, C. Yu, S. Kuo, and K. Luh, “Lung sequestration.
diagnosis with ultrasound and triplex doppler technique in an adult.” Chest, vol.
102, no. 6, p. 1880, 1992.
[26] K. Thomenius, “Evolution of ultrasound beamformers,” in Ultrasonics Symposium, 1996. Proceedings., 1996 IEEE, vol. 2, nov 1996, pp. 1615 –1622 vol.2.
[27] E. Brunner, “Ultrasound system considerations and their impact on front-end
components,” Analog Devices, 2002.
[28] J. Bushberg, The essential physics of medical imaging. Williams & Wilkins,
2002.
[29] H. Pettersson, The Encyclopaedia of Medical Imaging: Physics, Techniques and
Procedures vol. 1. Taylor & Francis Ltd, 1998.
[30] X. Zeng and R. J. McGough, “Evaluation of the angular spectrum approach
for simulations of near-field pressures,” The Journal of the Acoustical Society of
America, vol. 123, no. 1, p. 68, 2008.
[31] C. Capps, “Near field or far field,” EDN, August, vol. 16, pp. 95–102, 2001.
[32] L. Steiner and P. Andrews, “Monitoring the injured brain: Icp and cbf,” British
journal of anaesthesia, vol. 97, no. 1, p. 26, 2006.
[33] F. M. Kashif, T. Heldt, and G. C. Verghese, “Model-based estimation of intracranial
pressure and cerebrovascular autoregulation,” Comput Cardiol, vol. 35, pp. 369–372,
2008.
[34] W. Mason, Electromechanical transducers and wave filters. Van Nostrand Reinhold, 1946.
[35] F. V. Hunt and D. T. Blackstock, Electroacoustics: the analysis of transduction,
and its historical background. American Institute of Physics for the Acoustical
Society of America, 1982.
[36] C. H. Sherman and J. L. Butler, Transducers and arrays for underwater sound.
Springer, 2007.
[37] B. Savord and R. Solomon, “Fully sampled matrix transducer for real time 3d
ultrasonic imaging,” in Ultrasonics, 2003 IEEE Symposium on, vol. 1, 2003, pp.
945–953 Vol.1.
[38] C. H. Seo and J. Yen, “A 256 x 256 2-d array transducer with row-column
addressing for 3-d rectilinear imaging,” Ultrasonics, Ferroelectrics and Frequency
Control, IEEE Transactions on, vol. 56, no. 4, pp. 837–847, 2009.
[39] O. Oralkan, A. Ergun, J. Johnson, M. Karaman, U. Demirci, K. Kaviani, T. Lee,
and B. Khuri-Yakub, “Capacitive micromachined ultrasonic transducers: next-generation arrays for acoustic imaging?” IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control, vol. 49, no. 11, pp. 1596–1610, Nov 2002.
[40] I. Wygant, X. Zhuang, D. Yeh, O. Oralkan, A. Ergun, M. Karaman, and
B. Khuri-Yakub, “Integration of 2d cmut arrays with front-end electronics for volumetric ultrasound imaging,” IEEE Transactions on Ultrasonics Ferroelectrics
and Frequency Control, vol. 55, no. 2, pp. 327–342, Feb 2008.
[41] O. Oralkan, “Acoustic imaging using capacitive micromachined ultrasonic transducer arrays: devices, circuits, and systems,” Ph.D. dissertation, Stanford University, 2004.
[42] I. Wygant, N. Jamal, H. Lee, A. Nikoozadeh, O. Oralkan, M. Karaman, and
B. Khuri-yakub, “An integrated circuit with transmit beamforming flip-chip
bonded to a 2-d cmut array for 3-d ultrasound imaging,” IEEE Transactions
on Ultrasonics Ferroelectrics and Frequency Control, vol. 56, no. 10, pp. 2145–
2156, Oct 2009.
[43] A. Bhuyan, J. W. Choe, B. C. Lee, I. Wygant, A. Nikoozadeh, O. Oralkan, and
B. T. Khuri-Yakub, “3d volumetric ultrasound imaging with a 32x32 cmut array
integrated with front-end ICs using flip-chip bonding technology.” IEEE International Solid-State Circuits Conference (ISSCC), Digest of Technical Papers,
Feb 2013, pp. 396–397.
[44] J. Zahorian, M. Hochman, T. Xu, S. Satir, G. Gurun, M. Karaman, and
F. Degertekin, “Monolithic cmut-on-cmos integration for intravascular ultrasound applications,” IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control, vol. 58, no. 12, pp. 2659–2667, Dec 2011.
[45] G. Gurun, P. Hasler, and F. L. Degertekin, “Front-end receiver electronics for
high-frequency monolithic cmut-on-cmos imaging arrays,” IEEE Transactions on
Ultrasonics Ferroelectrics and Frequency Control, vol. 58, no. 8, pp. 1658–1668,
Aug 2011.
[46] P. Helin, P. Czarnecki, A. Verbist, G. Bryce, X. Rottenberg, and S. Severi,
“Poly-SiGe-based cmut array with high acoustical pressure.” IEEE International
Conference on Micro Electro Mechanical Systems (MEMS), Jan 2012, pp. 305–
308.
[47] D. Dausch, J. Castellucci, D. Chou, and O. Von Ramm, “Theory and operation
of 2-d array piezoelectric micromachined ultrasound transducers,” Ultrasonics,
Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 55, no. 11, pp.
2484–2492, 2008.
[48] A. Hajati, D. Latev, D. Gardner, A. Hajati, D. Imai, M. Torrey, and M. Schoeppler, “Three-dimensional micro electromechanical system piezoelectric ultrasound transducer,” Applied Physics Letters, vol. 101, no. 25, pp. 253 101–253 101–
5, 2012.
[49] K. Smyth, S. Bathurst, F. Sammoura, and S.-G. Kim, “Analytic solution for n-electrode actuated piezoelectric disk with application to piezoelectric micromachined ultrasonic transducers,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 60, no. 8, pp. 1756–1767, 2013.
[50] P. Muralt, N. Ledermann, J. Paborowski, A. Barzegar, S. Gentil, B. Belgacem,
S. Petitgrand, A. Bosseboeuf, and N. Setter, “Piezoelectric micromachined ultrasonic transducers based on pzt thin films,” Ultrasonics, Ferroelectrics and
Frequency Control, IEEE Transactions on, vol. 52, no. 12, pp. 2276–2288, 2005.
[51] I. Wygant, “A comparison of cmuts and piezoelectric transducer elements for
2d medical imaging based on conventional simulation models,” in Ultrasonics
Symposium (IUS), 2011 IEEE International, 2011, pp. 100–103.
[52] J. Jensen, “Field: A program for simulating ultrasound systems,” in Nordic-Baltic
Conference on Biomedical Imaging, 1996.
[53] J. A. Jensen and N. B. Svendsen, “Calculation of pressure fields from arbitrarily
shaped, apodized, and excited ultrasound transducers,” IEEE Transactions on
Ultrasonics Ferroelectrics and Frequency Control, vol. 39, no. 2, pp. 262–267,
Mar 1992.
[54] M. Karaman, I. Wygant, O. Oralkan, and B. Khuri-Yakub, “Minimally redundant 2-d array designs for 3-d medical ultrasound imaging,” Medical Imaging,
IEEE Transactions on, vol. 28, no. 7, pp. 1051–1061, 2009.
[55] B.-H. Kim, T.-K. Song, Y. Yoo, J. H. Chang, S. Lee, Y. Kim, K. Cho, and
J. Song, “Hybrid volume beamforming for 3-d ultrasound imaging using 2-d
cmut arrays,” in Ultrasonics Symposium (IUS), 2012 IEEE International, 2012,
pp. 2246–2249.
[56] J. Song, S. Jung, Y. Kim, K. Cho, B. Kim, S. Lee, J. Na, I. Yang, O.-k. Kwon,
and D. Kim, “Reconfigurable 2d cmut-asic arrays for 3d ultrasound image,” in
SPIE Medical Imaging. International Society for Optics and Photonics, 2012,
pp. 83 201A–83 201A.
[57] B.-H. Kim, Y. Kim, S. Lee, K. Cho, and J. Song, “Design and test of a fully
controllable 64x128 2-d cmut array integrated with reconfigurable frontend asics
for volumetric ultrasound imaging,” in Ultrasonics Symposium (IUS), 2012 IEEE
International, 2012, pp. 77–80.
[58] M. Rasmussen and J. Jensen, “3-d ultrasound imaging performance of a row-column addressed 2-d array transducer: A measurement study,” in Ultrasonics
Symposium (IUS), 2013 IEEE International, 2013.
[59] T. Christiansen, C. Dahl-Petersen, J. Jensen, and E. Thomsen, “2-d row-column
cmut arrays with an open-grid support structure,” in Ultrasonics Symposium
(IUS), 2013 IEEE International, 2013.
[60] M. Rasmussen and J. Jensen, “2-d row-column cmut arrays with an open-grid
support structure,” in Proceedings of SPIE, vol. 8675. SPIE - International
Society for Optical Engineering, 2013.
[61] X. Zhuang, D.-S. Lin, A. Ergun, O. Oralkan, and B. Khuri-Yakub, “P2p-6 trench-isolated cmut arrays with a supporting frame,” in Ultrasonics Symposium, 2006.
IEEE, 2006, pp. 1955–1958.
[62] D.-S. Lin, R. Wodnicki, X. Zhuang, C. Woychik, K. Thomenius, R. Fisher,
D. Mills, A. Byun, W. Burdick, P. Khuri-Yakub, B. Bonitz, T. Davies,
G. Thomas, B. Otto, M. Topper, T. Fritzsch, and O. Ehrmann, “Packaging
and modular assembly of large-area and fine-pitch 2-d ultrasonic transducer arrays,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on,
vol. 60, no. 7, pp. 1356–1375, 2013.
[63] D.-S. Lin, X. Zhuang, R. Wodnicki, C. Woychik, O. Omer, M. Kupnik, and
B. Khuri-Yakub, “Packaging of large and low-pitch size 2d ultrasonic transducer
arrays,” in Micro Electro Mechanical Systems (MEMS), 2010 IEEE 23rd International Conference on, 2010, pp. 508–511.
[64] S. Smith, H. Pavy Jr, and O. von Ramm, “High-speed ultrasound volumetric
imaging system. i. transducer design and beam steering,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 38, no. 2, pp. 100–
108, 1991.
[65] O. von Ramm, S. Smith, and H. Pavy Jr, “High-speed ultrasound volumetric
imaging system. ii. parallel processing and image display,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 38, no. 2, pp. 109–
115, 1991.
[66] J. Bercoff, “Ultrafast ultrasound imaging,” Ultrasound Imaging - Medical
Applications, Prof. Oleg Minin (Ed.), InTech, Available from:
http://www.intechopen.com/books/ultrasoundimaging-medical-applications/ultrafast-ultrasound-imaging, pp. 3–24, 2011.
[67] O. Couture, M. Fink, and M. Tanter, “Ultrasound contrast plane wave imaging,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on,
vol. 59, no. 12, pp. –, 2012.
[68] G. Montaldo, M. Tanter, J. Bercoff, N. Benech, and M. Fink, “Coherent plane-wave compounding for very high frame rate ultrasonography and transient elastography,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions
on, vol. 56, no. 3, pp. 489–506, 2009.
[69] J. Bercoff, M. Tanter, and M. Fink, “Supersonic shear imaging: a new technique for soft tissue elasticity mapping,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 51, no. 4, pp. 396–409, 2004.
[70] S. Nikolov, J. Kortbek, and J. Jensen, “Practical applications of synthetic aperture imaging,” in Ultrasonics Symposium (IUS), 2010 IEEE, 2010, pp. 350–358.
[71] M. Docter, R. Beurskens, G. Ferin, P. Brands, J. Bosch, and N. de Jong, “A matrix phased array system for 3d high frame-rate imaging of the carotid arteries,”
in Ultrasonics Symposium (IUS), 2010 IEEE, 2010, pp. 318–321.
[72] S. Krishnan and M. O’Donnell, “Transmit aperture processing for nonlinear contrast agent imaging,” Ultrasonic imaging, vol. 18, no. 2, pp. 77–105, 1996.
[73] A. Nikoozadeh, “Intracardiac ultrasound imaging using capacitive micromachined ultrasonic transducer (cmut) arrays,” Ph.D. dissertation, Stanford University, 2010.
[74] A. Nikoozadeh, I. Wygant, D.-S. Lin, O. Oralkan, A. Ergun, D. Stephens,
K. Thomenius, A. Dentinger, D. Wildes, G. Akopyan, K. Shivkumar, A. Mahajan, D. Sahn, and B. Khuri-Yakub, “Forward-looking intracardiac ultrasound
imaging using a 1-d cmut array integrated with custom front-end electronics,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 55,
no. 12, pp. 2651–2660, 2008.
[75] D. Yeh, O. Oralkan, I. Wygant, M. O’Donnell, and B. Khuri-Yakub, “3-d
ultrasound imaging using a forward-looking cmut ring array for intravascular/intracardiac applications,” Ultrasonics, Ferroelectrics and Frequency Control,
IEEE Transactions on, vol. 53, no. 6, pp. 1202–1211, 2006.
[76] C. Tekes, M. Karaman, and F. Degertekin, “Optimizing circular ring arrays
for forward-looking ivus imaging,” Ultrasonics, Ferroelectrics and Frequency
Control, IEEE Transactions on, vol. 58, no. 12, pp. –, 2011.
[77] R. Fisher, K. Thomenius, R. Wodnicki, R. Thomas, S. Cogan, C. Hazard, W. Lee,
D. Mills, B. Khuri-Yakub, A. Ergun, and G. Yaralioglu, “Reconfigurable arrays
for portable ultrasound,” in Ultrasonics Symposium, 2005 IEEE, vol. 1, Sept
2005, pp. 495–499.
[78] R. Fisher, R. Wodnicki, S. Cogan, R. Thomas, D. Mills, C. Woychik,
R. Lewandowski, and K. Thomenius, “Packaging and design of reconfigurable
arrays for volumetric imaging,” in Ultrasonics Symposium, 2007. IEEE, Oct
2007, pp. 407–410.
[79] D. Bianchi, F. Quaglia, A. Mazzanti, and F. Svelto, “A 90Vpp 720MHz GBW
linear power amplifier for ultrasound imaging transmitters in BCD6-SOI.” IEEE
International Solid-State Circuits Conference (ISSCC), Digest of Technical Papers, Feb 2012, pp. 370–372.
[80] B. Haider, “Power drive circuits for diagnostic medical ultrasound.” IEEE
International Symposium on Power Semiconductor Devices and IC’s, 2006, pp. 1–8.
[81] “MD1712 data sheet: High speed, integrated ultrasound driver IC,” Supertex,
Sunnyvale, CA, USA.
[82] “STHV748 data sheet: Quad +/-90V, +/-2A, 3/5 levels, high speed ultrasound
pulser,” STMicroelectronics, Geneva, Switzerland.
[83] “TX734 data sheet: Quad channel, 3-level RTZ, +/-75V, 2A integrated ultrasound pulser,” Texas Instruments, Dallas, TX, USA.
[84] L. Svensson and J. Koller, “Driving a capacitive load without dissipating fCV².”
IEEE Symposium on Low Power Electronics, Digest of Technical Papers, 1994,
pp. 100–101.
[85] K. Kristoffersen and H. Torp, “Method and apparatus for generating a multi-level
ultrasound pulse,” Apr. 4 2006, U.S. Patent 7,022,074.
[86] S.-Y. Peng, M. Qureshi, P. Hasler, A. Basu, and F. Degertekin, “A charge-based
low-power high-snr capacitive sensing interface circuit,” IEEE Transactions on
Circuits and Systems I: Regular Papers, vol. 55, no. 7, pp. 1863–1872, Aug 2008.
[87] S. Berg, T. Ytterdal, and A. Ronnekleiv, “Co-optimization of cmut and receive amplifiers to suppress effects of neighbor coupling between cmut elements.”
IEEE Ultrasonics Symposium, Nov 2008, pp. 2103–2106.
[88] J. Graeme, Photodiode Amplifiers: Op Amp Solutions. McGraw-Hill, 1995.
[89] I. Cicek, A. Bozkurt, and M. Karaman, “Design of a front-end integrated circuit
for 3d acoustic imaging using 2d cmut arrays,” IEEE Transactions on Ultrasonics
Ferroelectrics and Frequency Control, vol. 52, no. 12, pp. 2235–2241, Dec 2005.
[90] M. S. J. Steyaert, W. M. C. Sansen, and C. Zhongyuan, “A micropower low-noise monolithic instrumentation amplifier for medical purposes,” IEEE Journal
of Solid-State Circuits, vol. SC-22, pp. 1163–1168, Dec 1987.
[91] “AFE5808 data sheet: Fully integrated, 8-channel ultrasound analog front end
with passive CW mixer,” Texas Instruments, Dallas, TX, USA.
[92] “AD9277 data sheet: Octal LNA/VGA/AAF/14-bit ADC and CW I/Q demodulator,” Analog Devices, Inc., Norwood, MA, USA.
[93] “MAX2082 data sheet: Octal ultrasound transceiver with integrated AFE,
pulser, T/R switch, and coupling capacitors,” Maxim Integrated, San Jose, CA,
USA.