A Column-Row-Parallel ASIC Architecture for 3D Wearable / Portable Medical Ultrasonic Imaging

by

Kailiang Chen

B.E., Tsinghua University (2007)
S.M., Massachusetts Institute of Technology (2009)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 2014

© Massachusetts Institute of Technology 2014. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, January 31, 2014

Certified by: Charles G. Sodini, LeBel Professor of Electrical Engineering, Thesis Supervisor

Certified by: Anantha P. Chandrakasan, Joseph F. and Nancy P. Keithley Professor of Electrical Engineering, Thesis Supervisor

Accepted by: Leslie A. Kolodziejski, Chair, Department Committee on Graduate Students

Submitted to the Department of Electrical Engineering and Computer Science on January 31, 2014, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

This work presents a scalable Column-Row-Parallel ASIC architecture for 3D wearable / portable medical ultrasound. It leverages programmable electronic addressing to achieve linear scaling for both hardware interconnection and software data acquisition.
A 16x16 transceiver ASIC is fabricated and flip-chip bonded to a 16x16 capacitive micromachined ultrasonic transducer (CMUT) to demonstrate the compact, low-power front-end assembly. A 3D plane-wave coherent compounding algorithm is designed for fast-volume-rate (62.5 volumes/s), high-quality 3D ultrasonic imaging. An interleaved checker board pattern with I&Q excitations is also proposed for ultrasonic harmonic imaging; it reduces transmitted second harmonic distortion by over 20 dB and is applicable to nonlinear transducers and circuits with arbitrary pulse shapes. Each transceiver circuit is element-matched to its CMUT element. The high-voltage transmitter employs a 3-level pulse-shaping technique with charge recycling to enhance power efficiency while requiring minimal off-chip components. Compared to traditional 2-level pulsers, 50% more acoustic power is delivered for the same total power dissipation. The receiver is implemented with a transimpedance amplifier topology and achieves the lowest noise efficiency factor in the literature (2.1, compared to a previously reported lowest of 3.6, in units of mPa·√(mW/Hz)). A source follower stage is specially designed to combine the analog outputs of receivers in parallel, improving output SNR as parallelization increases and offering flexibility for imaging algorithm design. Lastly, fault tolerance is incorporated into the transceiver to deal with faulty elements within the 2D MEMS transducer array, increasing the yield of the system assembly.

Thesis Supervisor: Charles G. Sodini
Title: LeBel Professor of Electrical Engineering

Thesis Supervisor: Anantha P. Chandrakasan
Title: Joseph F. and Nancy P. Keithley Professor of Electrical Engineering

Acknowledgments

Finishing my Ph.D. would not have been possible without the enduring love of my parents and my wife. I would like to thank them for all their support. Recently we have been through difficult moments together, but I look forward to the good days to come.
I feel extremely fortunate to have worked under the joint supervision of Prof. Charlie Sodini and Prof. Anantha Chandrakasan. I am grateful to Charlie, who has been a great teacher for me inside and outside of school. I learned from him to always seek the insight and intuition behind a problem. I also learned from him to be down-to-earth, yet persistent, both in research and in life. I enjoyed our conversations, the softball games we played together for MTL, the Red Sox games, and of course, the Hong Kong trip. All of them are unforgettable. I would like to express my gratitude to Anantha. Even though he is the Department Head with an incredibly busy schedule, I received ample guidance from him. He is always resourceful and creative, which sets a standard for me of what a good researcher should be.

I would like to thank Prof. Greg Wornell for being on my thesis committee and providing insights about imaging system trade-offs; Prof. Harry Lee for providing many clever circuit design ideas; Dr. Kai Thomenius for teaching me a lot of ultrasonics know-how; Dr. Brian Brandt for continued support for my test setup and career development; Prof. Thomas Heldt, Tom O’Dwyer, Dr. Dennis Buss, Dr. Peter Holloway, and Mr. Haiyang Zhu for many useful technical discussions. I am thankful for all their help with my project.

I am grateful to the people who helped me with the hardware system assembly, which was the key to the successful project demonstration. The ASIC fabrication was generously made possible through the TSMC University Shuttle Program. The CMUT samples were obtained from Prof. Butrus (Pierre) Khuri-Yakub’s research group at Stanford University; students Byung Chul Lee, Anshuman Bhuyan, and Jung Woo Choe offered me many handy tips for working with the device. The CMUT-PCB-ASIC flip-chip bonding assembly was done with the help of Dr. Helen Kim and MIT Lincoln Laboratory. The acrylic oil tank and the 3D translation stage were designed and built with the assistance of the MIT Central Machine Shop.
It has been a pleasant journey because of my colleagues in the Sodini/Lee lab and the Anantha group. In particular, I would like to thank Bonnie Lam, Sabino Pietrangelo, Joohyun Seo, and Katherine Smyth for many intriguing discussions about ultrasonics. Also, I would like to thank Sunghyuk Lee, SungWon Chung, Wei Li, and Marcus Yip for their tremendous help during my tape-outs. Daniel Piedra, Allen Hsu, Bin Lu, and Jerome Lin taught me how to operate a probe station to take accurate measurements on a bare silicon die. Moreover, I would like to thank David He, Amanda Gaudreau, Philip Godoy, Jack Chu, Grant Anderson, Doyeon Yoon, Xi Yang, Eric Winokur, Maggie Delano, Daniel Kumar, Bruno Do Valle, and many more for being great labmates with whom I could hang out and have fun. Last but not least, Coleen Milley and Margaret Flaherty have been very supportive with logistics and always make sure everything in the lab runs smoothly.

This project is funded by the C2S2 Focus Center, one of six research centers funded under the Focus Center Research Program (FCRP), a Semiconductor Research Corporation entity; Texas Instruments; and the MIT Center for Integrated Circuits and Systems (CICS).

Contents

1 Introduction
  1.1 Motivation
  1.2 The Challenge for Implementing a 3D Wearable / Portable Ultrasonic Imaging Device
  1.3 Contribution
  1.4 Thesis Organization
2 Background Information
  2.1 Ultrasonic Imaging Modes
  2.2 The Beam-formation Principle
  2.3 Ultrasonic Transducers
  2.4 Field II Simulation Program
3 The Column-Row-Parallel Architecture for 3D Ultrasonic Imaging
  3.1 The Prior Art of Architectures for 3D Ultrasonic Imaging
  3.2 The Motivation of the Column-Row-Parallel ASIC Architecture
  3.3 The Column-Row-Parallel ASIC Architecture
  3.4 The Functionality of the Column-Row-Parallel Architecture
  3.5 Summary
4 3D Ultrasonic Imaging System Experiments
  4.1 The Hardware System Assembly
    4.1.1 The PCB-CMUT Connection
    4.1.2 The PCB-ASIC Connection
    4.1.3 The Flip-Chip Bonding Assembly Process
    4.1.4 Mounting onto the Oil Tank
  4.2 Plane-wave Coherent Compounding for Fast Volume Rate 3D Ultrasonic Imaging
    4.2.1 PWCC for 2D Imaging
    4.2.2 Extending PWCC to 3D Imaging on the Column-Row-Parallel Architecture
    4.2.3 PWCC3D Results: Simulations and Measurements
    4.2.4 Discussion
  4.3 Interleaved Checker Board Tx Apertures with I&Q Excitations for HD2 Reduction in Ultrasonic Harmonic Imaging
    4.3.1 THI Principle and Previous Methods
    4.3.2 Tx HD2 Suppression on the Column-Row-Parallel Architecture
    4.3.3 Experimental Results
  4.4 Annular Ring Apertures for Forward-looking Imaging Applications
    4.4.1 Annular Ring Apertures on Column-Row-Parallel Architecture
    4.4.2 Annular Ring Imaging Results
  4.5 Summary
5 Design of the 16x16 Ultrasonic Transceiver Array ASIC with Column-Row-Parallel Architecture
  5.1 High-Level Description of the Ultrasonic Imaging Transceiver Circuits and the Architecture Logic Implementation
  5.2 Tx Circuit Design
    5.2.1 Multi-Level Pulsing for Efficient CMUT Driver
    5.2.2 3-Level Pulser Circuit Design
    5.2.3 Tx Path Design for 2D Ultrasonic Transducer Arrays
  5.3 Rx Circuit Design
    5.3.1 LNA Optimization Methodology for CMUT
    5.3.2 LNA Transistor-Level Implementation
    5.3.3 Rx Path Design for 2D Ultrasonic Transducer Arrays
  5.4 Biasing
  5.5 The Fault-Tolerant ASIC Design for Faulty MEMS Devices
6 ASIC Characterization
  6.1 Tx Ultrasonic Power and Efficiency Measurement
    6.1.1 Measuring Acoustic Output Power
    6.1.2 Measuring Tx Efficiency
  6.2 LNA Characterization
  6.3 The Tx Beam-Steering Experiment
  6.4 The Pulse-Echo Experiment
7 Conclusion
  7.1 Summary of Contributions
  7.2 Future Work

List of Figures

2-1 The typical signals and the operation for B-mode ultrasound.
2-2 Simplified block diagram of an ultrasound BF system, figure courtesy of [27].
2-3 A typical Field II flow diagram for ultrasonic system behavioral simulation.
3-1 Column-parallel architecture implementations in the literature: (a) a 1D transducer array mechanically translated to scan the 3D space, elevation beam-formation is done by a synthetic virtual source technique, figure courtesy of [3]; (b) a 2D array operated to receive row-by-row, elevation beam-formation is done by sub-array delay-and-sum across the column using analog delay lines, figure courtesy of [55].
3-2 The column-row addressing scheme implemented on a 256x256 2D transducer array: (a) row-by-row transmit addressing; (b) column-by-column receive addressing; (c) the “Maltese cross” beam-pattern. Figure courtesy of [38].
3-3 A column-row addressing architecture implemented at the circuit-level, with column and row interconnections that reduce the system channel count and provide maximum flexibility for algorithms.
3-4 Column-Row-Parallel architecture block diagram, the CMUT and ASIC chips are stacked vertically.
3-5 (a) The block-level implementation of one transceiver channel and (b) the per-element logic implementation. Column and row select logic is implemented with shift registers that can be reprogrammed in “N” time (implementation detail will be shown in Figure 5-2).
3-6 (a) Tx input port multiplexing, implemented with digital logic; (b) Rx output port multiplexing, implemented with analog pass-gates.
3-7 The architecture configured in a column-parallel mode for the Tx aperture. The configuration is broken down and illustrated in steps (a) through (d) to help understanding. Two rows are activated as the Tx aperture and beam-formation along azimuth (X) direction is achieved.
3-8 The architecture configured in a row-parallel mode for the Rx aperture. Five columns are activated as the Rx aperture and beam-formation along elevation (Y) direction is achieved.
3-9 More use examples of the proposed architecture: (a) a diagonal Rx aperture; (b) a checker board Tx aperture for ultrasonic harmonic imaging; (c) & (d) annular ring Tx and Rx apertures for forward-looking ultrasonic imaging applications.
4-1 System integration diagram showing the flip-chip bonding connection between CMUT and ASIC through a PCB interposer. The figure also shows the mechanical setup for imaging experiments, including an oil tank and a 3D translation stage.
4-2 The picture of the hardware system setup.
4-3 The block diagram of the hardware system setup.
4-4 The 16x16 CMUT die drawings: (a) the footprint of the CMUT; (b) the CMUT flip-chip bonding pad metal structure drawing, courtesy of [40].
4-5 The two different PCB designs made to fit CMUT footprints: (a) the PCB version A’s footprint for CMUT with a gap distance of 250 µm; (b) the PCB version B’s footprint for CMUT with a gap distance of 373.75 µm, only 1x16 pads are made on the PCB side due to space limitations.
4-6 The drawing of a PCB pad defined with a solder mask, and bumped with a solder ball. The PCB pad is used to do flip-chip bonding to the CMUT die.
4-7 The ASIC die drawings: (a) the footprint of the ASIC, containing the center 18x16 pads to be element-matched and connected to CMUT through the PCB interposer, and the surrounding I/O pads; (b) the PCB interposer layout design that allows the ASIC I/O pads to be routed out to the PCB edges.
4-8 The ASIC flip-chip bonding pad metal structure drawings: (a) the horizontal view of a flip-chip bonding pad in ASIC; (b) the cross-sectional view of the ASIC flip-chip bonding pad.
4-9 The CMUT-PCB-ASIC two-step flip-chip bonding process: (a) first step, the bonding between PCB and ASIC; (b) second step, the bonding between PCB and CMUT, with ASIC already bonded to PCB.
4-10 The CMUT-ASIC connection result pictures: (a) the bonded PCB-ASIC assembly shows good connectivity; (b) the solder bumps at the PCB’s CMUT side is reflowed after PCB-ASIC bonding, any deformation would be restored.
4-11 The PCB-CMUT bonding connection is verified by pulling off the test CMUT die from the PCB after bonding and reflow. (a) & (b) show the CMUT connection posts remain on the PCB after the pull, indicating good connectivity.
4-12 The finished CMUT-PCB-ASIC assembly: (a) cross-sectional view of the sandwich stack; (b) CMUT side assembly picture; (c) ASIC side assembly picture.
4-13 The acrylic tank drawings: (a) the tank dimension drawing; (b) the mounting between the oil tank and the CMUT-PCB-ASIC assembly.
4-14 The illustration of how PWCC works for 2D ultrasonic imaging, courtesy of [68].
4-15 The principle of coherent compounding used in PWCC, courtesy of [68]: (a) the imaging space; (b) the beam-formation delay calculation when the transmitted plane-wave is normal to the transducer surface (α = 0°); (c) the beam-formation delay calculation when the transmitted plane-wave is steered to an angle of α.
4-16 The signal processing flow for PWCC3D on the Column-Row-Parallel architecture.
4-17 The PWCC3D implementation on the Column-Row-Parallel architecture: (a) Tx beam-steering along azimuth (X) direction using column-parallel mode; (b) Tx beam-steering along elevation (Y) direction using row-parallel mode; (c)-(e) Rx signal acquisition, sweeping through 16 rows for each transmit angle.
4-18 The sequence of operation to implement PWCC3D on the Column-Row-Parallel architecture.
4-19 The setup of the wire phantom imaging experiment using PWCC3D algorithm: (a) a single plane-wave is transmitted to image the wire phantom; (b) five different Tx angles are used along the azimuth direction for PWCC3D.
4-20 Simulation results of a wire phantom: (a) vertical cross-sectional image produced from single angle plane-wave insonification; (b) vertical cross-sectional image produced from 5-angle coherent compounded plane-wave insonification; (c) lateral resolution plot from single plane-wave; (d) lateral resolution plot from 5-angle plane-waves; (e) horizontal cross-sectional image from single plane-wave; (f) horizontal cross-sectional image from 5-angle plane-waves.
4-21 Measurement results of a wire phantom: (a) vertical cross-sectional image produced from single angle plane-wave insonification; (b) vertical cross-sectional image produced from 5-angle coherent compounded plane-wave insonification; (c) lateral resolution plot from single plane-wave; (d) lateral resolution plot from 5-angle plane-waves; (e) horizontal cross-sectional image from single plane-wave; (f) horizontal cross-sectional image from 5-angle plane-waves.
4-22 The setup of the ring phantom imaging experiment using PWCC3D algorithm: (a) a single plane-wave is transmitted to image the phantom; (b) five different Tx angles are used along the azimuth direction and another five Tx angles along the elevation direction to image the phantom with PWCC3D.
4-23 Measured horizontal cross-sectional images of a ring phantom: (a) single-angle Tx plane-wave; (b) 5-angle Tx plane-wave compounding along azimuth direction; (c) 5-angle Tx plane-wave compounding along elevation direction; (d) compounding across all 5-angle azimuth and 5-angle elevation directions.
4-24 Measured vertical cross-sectional images of a ring phantom: (a) single-angle Tx plane-wave; (b) compounding across all 5-angle azimuth and 5-angle elevation directions; (c) lateral resolution plot of ring image from single-angle Tx plane-wave; (d) lateral resolution plot of ring image from 5-angle X and 5-angle Y plane-waves.
4-25 Simulated XZ cross-sectional images showing the three cysts in one slice image: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.
4-26 Simulated YZ cross-sectional images showing the cyst at (−3, 0, 25) mm: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.
4-27 Simulated YZ cross-sectional images showing the cyst at (0, 0, 35) mm: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.
4-28 Simulated YZ cross-sectional images showing the cyst at (3, 0, 45) mm: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.
4-29 Implementation of checker board Tx aperture on the proposed architecture.
4-30 Simulation comparison between the conventional and I&Q methods: (a) fundamental component spatial intensity for conventional; (b) fundamental component spatial intensity for I&Q; (c) HD2 spatial intensity for conventional; (d) HD2 spatial intensity for I&Q.
4-31 Annular ring mode imaging implemented in Column-Row-Parallel architecture: (a) Tx and Rx aperture setup; (b) Tx aperture implemented in the proposed architecture, all active elements are driven in-phase; (c) Rx aperture with the biggest ring shape, all active elements’ analog outputs are combined; (d) Rx aperture with the 2nd ring shape; (e) Rx aperture with the 3rd ring shape; (f) Rx aperture with the smallest ring shape.
4-32 Annular ring mode dynamic beam-formation scheme.
4-33 Annular ring configuration example, off-center: (a) Tx and Rx aperture setup; (b) Tx aperture implemented in the proposed architecture; (c) Rx aperture with the biggest ring shape; (d) Rx aperture with the 2nd ring shape; (e) Rx aperture with the 3rd ring shape; (f) Rx aperture with the smallest ring shape.
4-34 Cross-section slices of the wire phantom 3D images from simulation and measurement: (a) simulated XZ slice; (b) measured XZ slice; (c) simulated YZ slice; (d) measured YZ slice; (e) simulated XY slice; (f) measured XY slice.
5-1 A re-plot of Figure 3-5 in Section 3.3. (a) The block-level implementation of one transceiver channel and (b) the per-element logic implementation. Column and row select logic is implemented with shift registers that can be reprogrammed in “N” time (implementation detail will be shown in Figure 5-2).
5-2 Circuit implementation for the logic control: (a) multiplexing for per-element enable bits; (b) Tx row / column selection logic; (c) Rx row / column selection logic.
5-3 (a) The transmitter load model of a CMUT element used in this work. (b) An exemplary 2-level square wave pulse applied onto CMUT. (c) An exemplary 3-level pulse applied onto CMUT.
5-4 Circuit schematic of the four-channel 3-level pulsers with the middle-voltage generation (all transistors are high voltage devices).
5-5 The digital control circuits for the pulser: (a) the signal flow and block diagrams; (b) the non-overlapping signal generator; (c) the level shifter implementation; (d) the control signal timing diagram.
5-6 Tx design for the 2D array: (a) 2D pulser schematic; (b) MUX implementation.
5-7 Small signal model and noise sources of the CMUT element and the LNA.
5-8 Transfer functions when the LNA optimality condition is reached.
5-9 Transfer function examples when the LNA optimality condition of f_i ≈ f_p is not reached: (a) f_i < f_p, (b) f_p < f_i.
5-10 Transfer function examples: (a) f_i < f_p, (b) f_i ≈ f_p, (c) f_i > f_p.
5-11 The LNA schematic, implemented in the TIA topology. All transistors are low voltage devices except the HV Rx Switch M10.
5-12 Design optimization for input stage transistors: (a) transistors are sized at the boundary of strong and weak inversion; (b) transistor width is optimized for the lowest noise figure.
5-13 The signal and noise combining with two Rx channels in parallel: (a) two channels on the same line, shown in Thevenin’s equivalent circuit at LNA outputs; (b) two channels on the same line, shown in Norton’s equivalent circuit at LNA outputs; (c) two channels combined, showing the resultant signal and noise amplitudes.
5-14 The LNA schematic, implemented in the TIA topology. All transistors are low voltage devices except the HV Rx Switch M10. “vip” node is also buffered with a source follower to output (not shown).
5-15 Parallelism with even more Rx channels by utilizing intermediate line buffers to preserve the circuit performance.
5-16 The biasing circuit for the 2D array.
5-17 The technique used for detecting and isolating the short CMUT elements: (a) front-end transistors in each channel and their control voltages; (b) the effective circuit connection of all 256 channels with CMUT elements.
5-18 Two successful 16x16 CMUT-ASIC assemblies with short CMUT elements (marked in red) isolated by the ASIC. The rest of the elements are functional and their sensitivity performance is expressed by the brightness of the elements, which will be described in detail in Section 6.4.
6-1 The photo of the lab setup for measuring the acoustic output power and the Tx efficiency.
6-2 Acoustic output power and Tx efficiency measurement setup.
6-3 Normalized RMS pressure along the transducer axial axis, measurement vs. simulation. The measurement deviates from the simulation in the near field because the hydrophone tip is too close to the transducer surface, distorting the pressure field.
6-4 (a) Tx efficiency measurement setup and pulse shape definition. (b) Measured time-domain waveform of the optimal 3-level 3.3 MHz pulses, Δ = 20 ns, Δ/T = 0.067.
6-5 Tx efficiency measurement results using different 3-level pulse shapes by varying the Δ/T ratio and at different frequencies.
6-6 The die photo of the four-channel ultrasonic imaging transceiver test chip.
6-7 The die photo of the 256-channel 16x16 2D ultrasonic imaging transceiver test chip.
6-8 (a) Measured ultrasonic lateral beam profile, steered to the center (broadside). (b) Measured beam profile, with 30 ns delay between channels.
6-9 The setup of the pulse-echo experiment for characterizing the complete ultrasound channel.
6-10 The key waveforms from the pulse-echo experiment, showing the ultrasound channel characteristics. (a) The transmitted pulse waveform. (b) The received echo waveform. (c) The spectrum of the received echo waveform.
6-11 A re-plot of Figure 5-18 in Section 5.5. Two successful 16x16 CMUT-ASIC assemblies with short CMUT elements (marked in red) isolated by the ASIC. The rest of the elements are functional and their sensitivity performance is expressed by the brightness of the elements.
7-1 Four 16x16 ASICs tiled together for a 32x32 imaging front-end.
7-2 CMUT-ASIC assembly alternatives to eliminate the interposer PCB: (a) TSV technology for interconnecting ASIC I/Os to the main testing PCB; (b) applying flip-chip bonding technology for CMUT-ASIC interconnection and wire-bonding for ASIC I/Os.

List of Tables

4.1 Simulated HD2 improvement of the I&Q method.
4.2 Measured HD2 improvement of the I&Q method.
5.1 SNR improvement from Rx channel parallelism, theory prediction and measurement.
6.1 Measured Power and Efficiency Comparison at 3.3 MHz for the 1D ASIC and CMUT (40 pF capacitance per element)
6.2 Measured Optimal 3-level Pulser Performance Summary for the 1D ASIC and CMUT (40 pF capacitance per element)
6.3 Measured Optimal 3-level Pulser Performance Summary for the 2D ASIC and CMUT (2 pF capacitance per element)
6.4 CMUT Pulser Performance Comparison
6.5 Measured LNA Performance Summary for the 1D ASIC [5]
6.6 Measured LNA Performance Summary for the 2D ASIC
6.7 CMUT LNA Performance Comparison
Chapter 1

Introduction

1.1 Motivation

Ultrasonic imaging is an important modality for medical diagnosis. Compared to other imaging modalities, ultrasound is relatively low cost, harmless to human health, and offers decent resolution. Modern ultrasonic imaging systems are becoming increasingly complex and powerful, yet compact, benefiting from Moore’s law [1]. Laptop-size ultrasound systems have achieved performance comparable to traditional cart-size machines; hand-held devices, such as the GE Vscan [2], indicate the trend toward highly integrated ultrasonic imaging solutions that enable portable or even wearable ultrasound applications in hospitals and at home.

Traditional 2D medical ultrasonic imaging systems have been in wide use for decades. A 2D imaging system uses a 1D ultrasonic transducer probe and generates rectangular or sector-shaped 2D cross-sectional images of human tissue or organs. These systems exist predominantly in hospital settings where professional sonographers are available to operate them. The sonographers carefully angle and position the probe against the human body so as to produce satisfactory 2D medical images for diagnosis. This process is manual and requires extensive operator training, adding complexity and cost to the diagnostic procedure.

On the other hand, 3D medical ultrasonic imaging systems provide a full view of human tissue or organs in space, rather than the cross-sectional views of 2D imaging systems. The 3D volumetric image data represent a more comprehensive data set that can be more easily interpreted to help locate targets of medical interest. As a result, the manual search for the “best” 2D slice image, performed by sonographers holding a 1D probe, can potentially be replaced by an automated search algorithm in a 3D imaging system.
Furthermore, by leveraging advanced microelectronics technology, a compact and low-power ultrasonic hardware system can be built to enable wearable / portable self-monitoring ultrasonic imaging devices at home. Therefore, one could imagine an automated imaging system that continuously tracks human tissue or organs of interest and produces long-term medical information with minimum reliance on experienced sonographers.

1.2 The Challenge for Implementing a 3D Wearable / Portable Ultrasonic Imaging Device

A typical 1D array for a 2D imaging system can have as many as one thousand elements. The transducer elements are connected to the interfacing electronics through co-axial cables. When it comes to 3D imaging systems, 1D ultrasonic transducer arrays have historically been used to acquire the 3D volumetric data, by being mechanically translated [3] or rotated [4] to cover the whole 3D space. A slice of 2D image is formed at each physical position of the 1D array. Multiple 2D slice images are stitched together to form the 3D volumetric image. These mechanical approaches have many disadvantages. For example, the image resolution tends to be poor due to the relatively large incremental step size of the mechanical movement; the image frame rate or volume rate could be limited by the mechanical movement speed; the system integration tends to be bulky and system power consumption is high because a mechanical motor is needed.

More recently, 2D ultrasonic transducer arrays made from a micromachining process have become more available and have proven to be more suitable for 3D ultrasonic imaging. As a result, the mechanical movement is replaced by electrical addressing; the coarse motor stepping is replaced by the much finer element-to-element spacing; the image frame rate or volume rate is no longer limited by the speed of mechanical movement; and system size and power are reduced to allow long-term wearable / portable hardware solutions.
However, an electronic system working with a 2D array is much harder to build. Most notably, the interconnection between a 2D transducer array and its supporting electronics is a bottleneck. Because an NxN 2D transducer array contains $N^2$ transducer elements, if a dedicated electronic channel is provided for each transducer element to control the transmit and receive operation, the active channel count of the electronic integrated circuits is also $N^2$. Therefore, as the transducer array size grows, it is very difficult to keep up with the $N^2$ growth of active channels. The hardware complexity, instantaneous power dissipation, and interconnect count would quickly become unmanageable.

1.3 Contribution

To overcome the interconnect problem in interfacing to a 2D ultrasonic transducer array for 3D ultrasonic imaging, this thesis proposes new solutions at the circuit, architecture and algorithm levels.

At the circuit-level, the analog front-end (AFE) transmitter (Tx) and receiver (Rx) circuits need to be optimized for power efficiency, performance and size, in order to work optimally with the ultrasonic transducer elements [5, 6]. For the transmitter, a 3-level pulse-shaping high voltage pulser is designed to drive the transducer elements with improved power efficiency and minimum off-chip components. For the receiver, a low-noise amplifier (LNA) is implemented with a transimpedance amplifier (TIA) topology to achieve excellent noise, power and bandwidth trade-offs, offering a low power, high efficiency receiver solution. The transceiver front-end circuit is designed to be element-matched to the transducer, replacing traditional cable connections with flip-chip bonding assembly between the 2D transducer die and the 2D electronics ASIC die. The compact, cable-less assembly avoids excessive parasitic capacitance from the cable and leads to an integrated, low-power solution for wearable / portable applications.
At the architecture-level, the addressing and control mechanism for the 2D array of elements needs to be designed carefully, not only to reduce hardware and interconnect complexity, but also to maintain enough support for software flexibility. A Column-Row-Parallel architecture is proposed to reduce the AFE interconnect requirement from $N^2$ to $N$. At the same time, the highly programmable architecture design guarantees strong support for system-level algorithm needs. It is compatible with existing widely used beam-formation algorithms, and provides possibilities of using the 2D array differently for new applications.

At the algorithm-level, beam-formation algorithms are also indispensable to compress and generate beamformed ultrasonic data to form the 3D volumetric images. The algorithm design is tightly connected with the architecture design, and we propose new ways of using the 2D array to achieve fast volume rate imaging with adequate image quality, as well as a new way of reducing transmitter second harmonic distortion (HD2).

Extensive in-vitro experiments have been carried out to validate and evaluate the beam-formation algorithms and hardware system performance, including various 3D imaging algorithms, ultrasonic harmonic imaging mode, Tx efficiency characterization, and pulse-echo characterization [5–7].

1.4 Thesis Organization

This thesis is organized into the following chapters:

Chapter 2 introduces the needed background information for the discussion of 3D ultrasonic imaging systems in this thesis. This includes a brief description of various ultrasonic imaging modes, the beam-formation principle, and the transducer types.

Chapter 3 first lists previous solutions to 3D ultrasonic imaging. A different architecture that offers better system trade-offs is motivated. The overview of the proposed Column-Row-Parallel architecture is then described, which shows the potential to reduce hardware interconnection complexity while maintaining software flexibility.
Several examples of operation illustrate the architecture functionality to perform column-parallel addressing, row-parallel addressing, or special patterns.

Chapter 4 presents ultrasonic imaging applications that show what the Column-Row-Parallel architecture is capable of, without going into circuit details yet. It starts with the hardware system assembly description. The CMUT-PCB-ASIC flip-chip bonding assembly process is discussed in detail and the whole electrical + mechanical test setup is shown. Three Column-Row-Parallel application examples are given afterwards. A 3D plane-wave coherent compounding (PWCC3D) algorithm is proposed and demonstrated as a fast volume rate, high quality 3D imaging solution. An annular ring aperture mode is presented for forward-looking intravascular ultrasound (IVUS) and intracardiac echocardiography (ICE) applications. Finally, a checker board pattern is used for second harmonic suppression in the ultrasonic harmonic imaging mode.

Chapter 5 provides circuit design details for a 16x16 Column-Row-Parallel test chip working with a 16x16 CMUT. The implementation of the architecture control logic, transmitter, receiver, and biasing circuits is described. The transmitter and receiver circuit design reflects the optimization considerations for the specific target transducer, in which the sensory interface for a capacitive source / load is used. On the other hand, the control logic and the biasing circuits reflect the architecture implementation, which is general to different transducer types. The last section explains the fault-tolerance against transducer defects incorporated by the transceiver circuit implementation, which is critical for front-end electronics working with MEMS devices with large element count.

Chapter 6 shows various circuit characterizations, which are complementary to the system experiments described in Chapter 4. The transmitter and the receiver are characterized as individual blocks; their circuit performance is summarized.
Several acoustic / electrical characterizations are also carried out, including the Tx beamsteering demonstration and the pulse-echo experiment.

Finally, Chapter 7 concludes the work with a summary of contributions and lists directions for future work.

Chapter 2 Background Information

This chapter provides the needed background information about ultrasonics, in preparation for the discussion of 3D ultrasonic imaging systems.

2.1 Ultrasonic Imaging Modes

Ultrasonic imaging systems are generally active imaging systems. The system stimulates the transducers to transmit ultrasonic waves into the medium (human body); the reflected ultrasonic echoes are then received and processed to generate images, which visualize the medium [8–10] or provide flow information through Doppler processing [11–15].

Medical ultrasound systems use different “imaging modes” to assist various diagnoses [8, 9]. For visualization of the tissue anatomy, the most common imaging modes include A, B, C and M modes [9, 10]. The B-mode is the most common mode and its typical operation is shown in Figure 2-1. The imaging system uses a 1D transducer array and pulsed ultrasonic waves to probe the tissue medium, in order to acquire a 2D grayscale image of the tissue.

Figure 2-1: The typical signals and the operation for B-mode ultrasound.

At time 0, the transmitter circuit drives the transducer to emit the ultrasonic pulse as shown by the red pulse. The pulse travels through the tissue at the sound speed $c$, typically 1540m/s in human soft tissue [16]. When it hits a medium interface, the mechanical impedance mismatch at the interface generates reflected ultrasonic waves. An interface at depth $Z$ leads to a received ultrasonic echo at time $t_d = 2Z/c$, as shown by the blue pulse.
Because the echo amplitude is proportional to the size of the mechanical impedance mismatch, the amplitude information is translated to the grayscale intensity of pixels in the image. Meanwhile, the time delay between the transmit instant and the received echo ($t_d$) translates to the depth, indicating the interface location in the image. A simplified grayscale image is also shown in the figure.

The transmit-receive action is repeated after time $T$, such that the B-mode image can be continuously updated in time. The period $T$ is called the pulse repetition period (PRP), and it needs to be long enough to ensure that all ultrasonic echoes from the previous transmission have returned. Given that the ultrasonic wave travels at the sound speed of about 1540m/s and a typical image depth of 7.5cm, one transmit-receive repetition will take approximately 100µs (2 × 7.5cm ÷ 1540m/s = 97µs). The reciprocal of the PRP is called the pulse repetition frequency (PRF), which is the number of pulses per second. It is a term frequently used in active imaging systems such as ultrasound, sonar or radar systems. A typical PRF in ultrasound is 10kHz, corresponding to the 100µs PRP. Depending on the application, commonly used PRFs range from 5 to 20kHz.

The red transmit pulse shown in Figure 2-1 is composed of 2 bursts of sinusoids with a cycle period of $T_p$. While it shows a typical case, the sinusoidal pulse shape can be replaced by other pulse shapes, such as discrete level pulses, which will be discussed in this thesis. The number of bursts in one transmission can also vary depending on the application. Generally speaking, more bursts lead to stronger reflected echoes, while fewer bursts lead to better image axial resolution because of the shorter pulse duration. B-mode imaging commonly employs 2-5 bursts per transmission, and PW Doppler imaging (see next paragraph) employs as many as 20 bursts to improve signal strength in the received echoes.
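The timing relations above ($t_d = 2Z/c$, and a PRP long enough for the deepest echo to return) can be checked with a minimal Python sketch. The function names are hypothetical and chosen for illustration only; the sketch assumes a single constant sound speed and ignores any additional dead time a real system may insert between transmissions.

```python
# Illustrative sketch of B-mode pulse-echo timing (helper names are assumptions).
SOUND_SPEED_M_PER_S = 1540.0  # typical sound speed in human soft tissue


def echo_delay_s(depth_m, c=SOUND_SPEED_M_PER_S):
    """Round-trip delay t_d = 2Z/c for an interface at depth Z."""
    return 2.0 * depth_m / c


def depth_from_delay_m(delay_s, c=SOUND_SPEED_M_PER_S):
    """Invert t_d = 2Z/c to map an echo arrival time back to depth."""
    return c * delay_s / 2.0


def max_prf_hz(max_depth_m, c=SOUND_SPEED_M_PER_S):
    """Upper bound on the PRF: the PRP must cover the deepest round trip."""
    return 1.0 / echo_delay_s(max_depth_m, c)


min_prp = echo_delay_s(0.075)  # 7.5 cm imaging depth
print(f"minimum PRP for 7.5 cm depth: {min_prp * 1e6:.1f} us")
print(f"corresponding maximum PRF:    {max_prf_hz(0.075) / 1e3:.2f} kHz")
```

Under these assumptions the minimum PRP comes out slightly below 100µs, which is consistent with the 10kHz PRF quoted above.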
Besides direct visualization of tissue anatomy, the Doppler effect is used to obtain blood flow velocity information inside the human body [17]. There are mainly three Doppler modes: Continuous Wave (CW), Pulsed Wave (PW) and Color Flow Mode (CFM) Doppler [11–15]. The CW Doppler is the earliest mode, which transmits continuous ultrasonic waves into the human body and detects the Doppler frequency shift from the echo waves [13]. It is simple and reliable, but lacks range information. The PW Doppler improves upon the CW mode by repeatedly sending pulsed ultrasonic waves into the medium [14]. The time of flight of the received echoes contains the range information, and the slight timing difference between consecutive echo pulses reflects the object movement (see footnote 1). Sub-sampling at the PRF is usually carried out before the spectrum analysis for the PW Doppler frequency shift [11]. The CFM Doppler is used to present velocity information as a color-coded image, which is often overlaid on top of a B-mode image. Time-domain autocorrelation based signal processing techniques are often used to speed up the CFM processing [15]. The velocity estimation accuracy is good enough for color-coded visualization.

Footnote 1: It is important to point out that in the PW mode, the Doppler effect does not come from the frequency shift of a single received echo pulse, since a short pulse is broadband, and therefore it is difficult to detect the small Doppler frequency shift (typically less than 100kHz). Besides, the frequency-dependent attenuation through the tissue complicates the task even more. Instead, it is the velocity-dependent time delay across several pulses that carries the velocity information.

Many more imaging modes exist. For example, the Harmonic Imaging mode uses the second harmonic of the pulse to provide high resolution images [18–22]; the Power Mode Doppler visualizes the magnitude of the Doppler signal, rather than the frequency shift, to help identify the existence of low flows and velocities [23]. Furthermore, many imaging modes are used together as Duplex or Triplex modes for the best visualization [24, 25].

2.2 The Beam-formation Principle

Beam-formation (BF) is heavily involved in ultrasonic imaging, to increase the signal-to-noise ratio (SNR), to focus the ultrasound beam to deliver more power, and to steer the beam to scan the imaging space [8, 9, 12, 26, 27]. The beamforming algorithms are based on the delay-and-sum principle, which is shown in Figure 2-2. When a focus is specified, delays are calculated for each ultrasound channel to compensate for the different distances between the corresponding transducer elements and the focus, so that the pulses from all channels arrive at the focus at the same time. The implementation of beam-formation can be either analog or digital, and beam-formation can be applied in both the transmitting and receiving paths. Because of the denser integration, higher flexibility, and lower power consumption, digital beamforming is favored in modern systems.

Figure 2-2: Simplified block diagram of an ultrasound BF system, figure courtesy of [27]. (Block labels: focal point, array, variable delays, analog adder, ADC, output signal.)

Ultrasonic imaging systems often operate in both the near field (or Fresnel zone) and the far field (or Fraunhofer zone) regions [28–30]. For a round, non-focused, single element transducer, the boundary between the near field and the far field regions is usually defined at (see footnote 2):

$$L = \frac{D^2}{4\lambda}, \quad (2.1)$$

in which $D$ is the diameter of the transducer surface and $\lambda$ is the ultrasound wavelength. In the near field, the pressure amplitude varies drastically, with many local maxima and minima. This complex characteristic is caused by the constructive and destructive interference patterns of the ultrasound beam. In the far field, the pressure amplitude decreases monotonically with distance and the ultrasound beam diverges at the angle $\theta$ defined as: $\sin(\theta) = 1.22\,\lambda / D$.
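As a quick numerical check of Equation (2.1) and the divergence relation, the short Python sketch below evaluates both for a round, non-focused, single element transducer. The function names are hypothetical, and the input values (a 1.5cm aperture driven at 2MHz in soft tissue) are illustrative assumptions rather than a prescribed design point.

```python
import math


def wavelength_m(sound_speed_m_per_s, frequency_hz):
    """Ultrasound wavelength, lambda = c / f."""
    return sound_speed_m_per_s / frequency_hz


def near_field_boundary_m(diameter_m, lam_m):
    """Near/far field boundary L = D^2 / (4 * lambda), Equation (2.1)."""
    return diameter_m ** 2 / (4.0 * lam_m)


def divergence_angle_rad(diameter_m, lam_m):
    """Far-field divergence angle from sin(theta) = 1.22 * lambda / D."""
    s = 1.22 * lam_m / diameter_m
    if s > 1.0:
        raise ValueError("aperture too small relative to wavelength for this relation")
    return math.asin(s)


lam = wavelength_m(1540.0, 2.0e6)            # about 0.77 mm
boundary = near_field_boundary_m(0.015, lam)
theta = divergence_angle_rad(0.015, lam)
print(f"wavelength:          {lam * 1e3:.2f} mm")
print(f"near-field boundary: {boundary * 1e2:.1f} cm")
print(f"divergence angle:    {math.degrees(theta):.2f} deg")
```

Because $L$ scales with $D^2$, doubling the aperture diameter moves the boundary four times deeper, while the divergence angle shrinks roughly in proportion to $1/D$.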
At the boundary of the near and far field, where the distance is roughly given by Equation (2.1), the maximum pressure amplitude, or equivalently the maximum ultrasound intensity, is reached; and the beamwidth is minimized at the same time. According to [28–30], the effective beamwidth is approximately equal to half of the transducer diameter $D$; the pressure amplitude is therefore about 2 times the pressure amplitude at the transducer surface. Because of this unique property, it is advantageous for ultrasonic imaging to operate close to the near and far field interface, for best SNR and lateral resolution.

As a simple numerical example, a typical single element transducer for an intracranial pressure (ICP) measurement has a diameter of about 1.5cm [32, 33]. The typical operating frequency is 2MHz and the typical ultrasound speed in human soft tissue is 1540m/s [16], giving a wavelength of 0.77mm. The interface distance calculated from Equation (2.1) is therefore 7.3cm, which is about the same as the distance from the target brain blood vessel to the transducer (see footnote 3).

Because the system operates heavily in the near field region, time-domain techniques for beamforming and processing are common in ultrasonic imaging. Consequently, the ultrasound pulses are short-duration, wideband signals to facilitate time based algorithms.

Footnote 2: Depending on applications, there are many different definitions [31]. The one used here is the most widely used in the medical ultrasound area.

Footnote 3: For transducers with more complex shapes and structures, the equations presented above will differ slightly by some factors. But the effective aperture size $D$ can be used to approximate the element diameter, and the conclusions about near field and far field more or less stay the same.

In addition to the basic delay-and-sum beam-formation principle, several techniques are often used to improve the visualization, creating a more homogeneous image quality throughout the full depth [8, 9, 12].
They have been applied to the imaging experiments in our work.

• Dynamic focusing: Instead of a fixed array delay pattern for a fixed focal point in space, the dynamic focusing technique implements a continuously moving focal point across different imaging depths. The array elements are controlled to focus signals at a shallow depth at the beginning; as time progresses (corresponding to depth increase), the array delay pattern is gradually modified to move the focal point deeper until the end of the imaging depth. Compared to a single focal point, dynamic focusing generates high detail resolution and high contrast resolution for all depths. It can be implemented relatively easily by a digital beamformer at the receive side.

• Constant F-number imaging: F-number ($F\#$) is the ratio of focal length ($f$) to the imaging aperture diameter ($D$), as in (2.2).

$$F\# = \frac{f}{D}. \quad (2.2)$$

It is an important concept in optics, photography, and ultrasound. In ultrasound, the constant F-number imaging technique keeps a constant $F\#$ by gradually enlarging the active aperture ($D$) as the focused imaging depth ($f$) grows larger. The result of this technique is a constant lateral resolution and it is often used in conjunction with the dynamic focusing technique.

2.3 Ultrasonic Transducers

Currently, using 1D ultrasonic transducer arrays for 2D medical ultrasound images is the common practice [8, 12, 34–36]. The transducer arrays are usually built with piezoelectric materials. The element count of an array can be as high as one thousand. The elements are connected to the electronics through co-axial cables. 3D ultrasonic imaging can be achieved by translating or rotating a 1D transducer array over the space [3, 4], but the accuracy and speed are limited by the mechanical movements. As a result, 2D transducer arrays and the supporting 2D electronics are more desirable for 3D ultrasonic imaging.

There are commercial 3D imaging systems utilizing 2D transducer arrays.
For example, Philips Matrix X6-1 is a 2D array that contains 9,212 elements [37]. However, cables are still needed for the interconnections between the transducer probe and the data acquisition system, which might not be the best solution for 3D imaging, due to the high channel count. Additionally, the 2D transducers have been built from piezoelectric materials [37, 38], where manual dicing is often needed to separate individual array elements. The interconnection and yield problems are challenging as the array gets larger and the element size gets smaller.

The capacitive micromachined ultrasonic transducer (CMUT) [39–41] is an alternative to the traditional piezoelectric transducers (PZTs). The CMUT technology offers advantages such as improved bandwidth, ease of fabricating large arrays, and potential for integration with electronics through through-silicon vias (TSVs) [40, 42, 43] or monolithic CMUT-CMOS integration [44–46]. But there are also challenges for CMUT. Most importantly, the output power and efficiency are still relatively low, partly due to the large parasitic device capacitance. The primary reason for the large parasitic capacitance is the physical structure of the CMUT element, which forms a parallel-plate capacitor [41]. As a result, the transmitter and receiver circuitry that interfaces to CMUT is different from that for PZT. The circuits need to be designed appropriately to prevent excessive performance degradation caused by a load that is much more capacitive and has a higher impedance.

The piezoelectric micromachined ultrasonic transducer (PMUT) has also emerged as another possible 2D transducer solution for 3D imaging [47–51]. It combines the piezoelectric material with micromachining techniques, trying to exploit the benefits from both worlds. The piezoelectric material tends to provide transduction with relatively high efficiency and good linearity, while the micromachining process helps create fine-pitched 2D arrays with higher yield and reliability.
As a technology in its early research phase, it has shown initial success with a 5x5 working array [47]. More work is being done to address problems with this technology, including how to enhance the device bandwidth to generate images with better axial resolution, and how to reduce the intrinsic device parasitic capacitance from the high permittivity of the piezoelectric material [48, 49, 51].

In this thesis, we design block-level circuits for CMUT, but our architecture and system innovations are not limited to a particular transducer type, as will be discussed in succeeding chapters.

2.4 Field II Simulation Program

In our work, we make heavy use of the Field II Simulation Program [52, 53] to model the complete hardware and software setup. Field II is a behavioral simulation package running under the MATLAB (The MathWorks, Natick, MA) environment. Figure 2-3 shows a typical Field II simulation flow diagram. The users have the freedom of defining the ultrasonic phantom (i.e. the medium being imaged by the system), transducer property, pulsing / receiving methods, beam-formation algorithms, and image processing / display methods. Based on the user definition, Field II simulates the ultrasound transducer fields and ultrasonic imaging using linear acoustics.

The phantom definition is realized by specifying point scatterers in space with different reflecting amplitudes. It can be a simple single scatterer phantom that characterizes the point spread function of an imaging system, or complex shapes defined by a set of scatterers. Moving structures can also be instantiated by a sequence of phantoms with slight position changes over time, which is useful in simulations for ultrasonic Doppler systems.

The transducers are defined with the type, frequency response and active aperture. The transducer types include 1D, 1.5D, 2D arrays, as well as curved arrays with concave or convex shapes.
The transducer element dimensions can be freely specified and the element frequency response is described by its impulse response. Transmit and receive apertures are defined separately, while the active elements are selectable within the array. Two other properties associated with the active apertures are the focus and apodization. Through the focus specification, the beam-formation delays can be automatically calculated for each element in an aperture. The apodization gives amplitude weights for signals at different transducer elements. Both focus and apodization can be a function of time, in which case dynamic focusing / apodization is realized.

Figure 2-3: A typical Field II flow diagram for ultrasonic system behavioral simulation.

The pulsing excitation for the transducer is supplied to the array by a time-domain pulse waveform. Based on the pulsation, phantom definition and transducer property, the received echo waveforms from every element in the Rx aperture are produced by the Field II simulator. Beam-formation is performed on the collected echo waveforms; and the beamformed waveforms can then be used to construct a 2D or 3D image, or further processed for Doppler information.

With the ultrasonic field simulation, Field II helps verify the acoustical physics and visually show the ultrasonic pressure field generated by the transducer. With the capability of incorporating different beam-formation algorithms, it allows the development and validation of new architecture-level and system-level ideas. It could also be used to model non-idealities from circuits and transducers, so that a practical understanding of the real imaging system can be achieved. As will be seen in the following chapters, Field II simulation plays an important role in the thesis work.

Chapter 3 The Column-Row-Parallel Architecture for 3D Ultrasonic Imaging

This chapter describes our approach to solving the challenges in realizing a 3D medical ultrasonic imaging system.
The analog front-end architectural trade-offs are first discussed and the design process of the Column-Row-Parallel architecture is presented. The implementation of the proposed architecture is then shown, which is both scalable for hardware realization and flexible for software algorithm support. The functionality of the implemented architecture is then described.

3.1 The Prior Art of Architectures for 3D Ultrasonic Imaging

A 2D NxN transducer array is often used to acquire 3D volumetric data, where the architecture of the front-end circuit interfacing to the transducer array is an important design consideration.

The most straightforward way to interconnect to a 2D transducer array is to use a fully-parallel architecture, but it is not very scalable for hardware implementation. A fully-parallel architecture requires $N^2$ active transceivers that operate at the same time. As a result, it requires $N^2$ independent input control lines for the transmitter array and $N^2$ output data lines for the receiver array. As the array size grows, the required channel count grows correspondingly, and this is difficult to scale up economically.

At the other extreme, a serialized system could be used to save channel count, but it is usually too slow for data acquisition. One could serialize the input control lines and/or the output data lines of the aforementioned fully-parallel system, so that the number of interconnect lines needed is reduced. Due to the large number of channels to be serialized, the data rate requirement would become too high to be practical, following a similar $N^2$ scaling trend. Alternatively, one could use a single-channel transceiver to sweep the 2D array, one element at a time. The transceiver is connected to each element by multiplexing and it repeatedly transmits and receives ultrasound with different elements in the array to acquire a full data set [40].
Given that one transmit-receive repetition could take as long as 100µs (Section 2.1), and that the total time needed to gather one full data set increases with $N^2$, the image frame rate would suffer greatly as the array size continues to grow.

Therefore, to alleviate the conflict between hardware complexity and data acquisition speed in 3D ultrasonic imaging systems, there is a lot of research on various sub-array architectures that lie in between the fully-parallel architecture and the serialized single-channel architecture. In [43], the diagonal elements in a full 2D array are used to form the receive aperture, while the rest of the 2D elements are used to form the transmit aperture. At the transmitter side, it is close to a fully-parallel architecture because almost all elements are being used. To provide the transmit beam-formation delay pattern for all transmitters, the digital delay values are serially streamed in to program each transmitter. This saves interconnections but slows down the programming speed. At the receive side, the output channel count is reduced from $N^2$ to $N$ because only the diagonal sub-array elements are used. This diagonal sub-array approach leads to an elevated side-lobe level that degrades the image contrast. Similarly, [54] investigated possibilities of various sparsely sampled 2D aperture patterns. But because the sub-array is fixed once the pattern is chosen, the reduction of active elements generally leads to higher side-lobes and worse image resolution performance.

Figure 3-1: Column-parallel architecture implementations in the literature: (a) a 1D transducer array mechanically translated to scan the 3D space, elevation beam-formation is done by a synthetic virtual source technique, figure courtesy of [3]; (b) a 2D array operated to receive row-by-row, elevation beam-formation is done by sub-array delay-and-sum across the column using analog delay lines, figure courtesy of [55].
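The two extremes discussed in this section can be compared with a small Python sketch. It is a deliberately simplified model with hypothetical function names: it assumes the fully-parallel system needs $N^2$ channels, and that the single-channel sweep needs $N^2$ transmit-receive events of about 100µs each per data set, as stated above. It ignores data-rate limits, programming overhead, and any compounding or averaging.

```python
# Simplified scaling model for the two architectural extremes (illustrative only).
PRP_S = 100e-6  # one transmit-receive repetition, see Section 2.1


def fully_parallel_channels(n):
    """Active channel count of a fully-parallel NxN front-end."""
    return n * n


def single_channel_sweep_time_s(n, prp_s=PRP_S):
    """Time for a single-channel transceiver to visit all N^2 elements once."""
    return n * n * prp_s


for n in (16, 32, 64):
    sweep = single_channel_sweep_time_s(n)
    print(
        f"N={n:3d}: fully-parallel channels = {fully_parallel_channels(n):5d}, "
        f"single-channel sweep = {sweep * 1e3:7.1f} ms "
        f"({1.0 / sweep:6.2f} data sets/s)"
    )
```

Both quantities grow as $N^2$, which is why the sub-array approaches surveyed here aim for something between the two.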
To avoid a fixed sub-array pattern selection, another sub-array idea of using either 3x3 or 5x5 elements is described in [37]. The sub-arrays are programmable and each sub-array performs beam-formation to compress the received data into one channel, reducing the overall channel count by a factor of 9 or 25. To maintain the image quality and avoid introducing artifacts, programmable delay patterns for the sub-array are required. This requirement directly translates into analog delay lines in a hardware implementation, which tend to be bulky and power hungry.

In [3, 4], a conventional 1D transducer array is used as a sub-array and is mechanically translated or rotated to achieve synthetic 3D imaging, as shown in Figure 3-1(a). The active channel count is reduced to $N$ and the synthetic beam-formation technique could produce good image quality, as long as the object being imaged is static or moving at a much slower speed than the image frame rate, to avoid motion artifacts. The major drawback of this solution is the mechanical implementation, which is both a bottleneck for frame rate due to the slow movement speed, and a bottleneck for power saving due to the large amount of power needed to drive a motor. More recently, to replace the mechanical translation, an electrical scanning front-end architecture has been implemented as shown in Figure 3-1(b) [55–57]. The receiver channels are turned on row-by-row to collect reflected ultrasound echoes. By activating different rows of transducer elements over consecutive ultrasound transmits, it effectively mimics the translation of a 1D transducer array, but much faster and at lower power.

3.2 The Motivation of the Column-Row-Parallel ASIC Architecture

The work in [3] and [56, 57] both employ row-by-row (i.e. column-parallel) operation to reduce the number of active channels from $N^2$ to $N$.
The 3D image quality from the column-parallel architecture is very good in the azimuth (X) direction because each row can perform full beam-formation along the azimuth direction. However, the beam-formation along the elevation (Y) direction is poor. Techniques such as the synthetic virtual source [3] are used to enhance the focusing in elevation, with limited success, in Figure 3-1(a). Analog delay lines are also used to realize elevational beam-focusing and achieve good imaging performance in Figure 3-1(b) [55]. But for the same reason mentioned in the previous section, the analog delay lines lead to large power and silicon area overhead, making system integration difficult.

To cover both azimuth and elevation directions for 3D volumetric imaging, a column-row addressing scheme has been implemented for a 2D transducer design as shown in Figure 3-2 [38, 58–60]. By dicing the transducer top plate row-by-row and dicing the bottom plate column-by-column, the transducer can be driven row-by-row in transmit (Figure 3-2(a)) and column-by-column in receive (Figure 3-2(b)). The combined “Maltese cross” shaped beam-pattern (Figure 3-2(c)) makes it suitable to carry out beam-formation in both azimuth and elevation directions. At the same time, the interconnection complexity for the array is still kept at a linear growth ($2N$).

Figure 3-2: The column-row addressing scheme implemented on a 256x256 2D transducer array: (a) row-by-row transmit addressing; (b) column-by-column receive addressing; (c) the “Maltese cross” beam-pattern. Figure courtesy of [38].

The column-row addressing implemented at the transducer-level has shown potential to be a balanced architecture solution for both good image performance and hardware scalability. However, it still suffers from a lack of flexibility, because the transducer array is hard-wired to be divided into rows and columns.
The limitation of only addressing the elements by one row or one column at a time leaves limited freedom for the supporting algorithm design. On the other hand, if one could implement a similar column-row addressing architecture at the circuit-level instead of at the transducer-level, as depicted in Figure 3-3, the element addressing mechanism could be much more flexible. With highly programmable control support from the electronics, various sub-array patterns become possible on the same system, allowing more versatile functionality and more design freedom at the system-level.

Figure 3-3: A column-row addressing architecture implemented at the circuit-level, with column and row interconnections that reduce the system channel count and provide maximum flexibility for algorithms.

3.3 The Column-Row-Parallel ASIC Architecture

In our work, a Column-Row-Parallel architecture is implemented at the circuit-level with much more diverse functionality and a better trade-off between complexity and speed. Figure 3-3 in the previous section is a conceptual drawing of the proposed architecture, while Figure 3-4 shows a detailed picture. 2D CMUTs are chosen as the target transducer arrays for this work, because of their ease of integration and scalability [39, 40, 43]. But the same architecture design can easily be applied to other types of 2D ultrasonic transducers.

As shown in Figure 3-4, a 2D CMUT (16x16 transducer arrays are used in this work) is DC biased at 30-50V from the common top membrane and each CMUT element’s bottom pad is connected to its corresponding ASIC channel. The DC bias network is provided off-chip with the resistor and the capacitor being shared across all CMUT elements in the array [40, 41].
As indicated by both Figure 3-3 and Figure 3-4, there is a transmitter (Tx) pulser, a receiver (Rx) low noise amplifier (LNA), and a receiver high voltage (HV) protection switch per electronic channel, under each CMUT element. The total silicon layout area of a transceiver is designed to be the same as a CMUT element’s area, which is 250µm × 250µm in this work, so that the ASIC channels can be element-matched to the CMUT pitch. The Tx pulser gate drivers and Rx buffer amplifiers are placed at the ASIC perimeter to interface to the transceiver array. There are 16 copies of Tx drivers and Rx buffers at the column side and another 16 copies at the row side, reducing the ASIC I/Os down to “N” (see footnote 1).

Figure 3-4: Column-Row-Parallel architecture block diagram; the CMUT and ASIC chips are stacked vertically. [Block labels: shared external biasing, CMUT, ASIC, column select logic, per-element Rx, and column circuitry containing delay, gate driver (Gate Dr) and buffer (BUF) blocks.]

Zooming into one transceiver channel located at the ith column and the jth row, Figure 3-5 shows that Tx and Rx operations are independent and time-multiplexed. The control inputs of the transceiver channel include: the ith column select signals (Tc[i], Rc[i]) supplied from the column side, the jth row select signals (Tr[j], Rr[j]) from the row side, and the local per-element enable bits (T_en, R_en). The column and row select signals are designed to be active at only one side at a time; they cannot be asserted simultaneously. The signals are input to the per-element logic unit, shown in Figure 3-5(b), to generate the corresponding internal switch controls: Tr, Tc, Rr, Rc, and RxSw. Tr and Tc determine whether the Tx pulser is driven by the column side or the row side, or none, in which case the pulser is turned off.
When the Tx element [i, j] is enabled (T_en = 1) and the jth Tx row is selected (Tr[j] = 1), the internal switch control signal Tr becomes high and the Tx pulser gate drive signals are supplied from the Column Gate Driver[i]. The array’s Tx path is in column-parallel mode. When the Tx element [i, j] is enabled (T_en = 1) and the ith Tx column is selected (Tc[i] = 1), the internal switch control signal Tc becomes high and the Tx pulser gate drive signals are supplied from the Row Gate Driver[j]. The array’s Tx path is in row-parallel mode. When the Tx element [i, j] is disabled (T_en = 0), or when neither the Tx row nor the Tx column is selected (Tr[j] = Tc[i] = 0), both Tr and Tc are low and the Tx pulser is turned off, ignoring gate drive signals from both column and row gate drivers. Similarly, Rr and Rc determine whether the Rx LNA outputs its analog signal to the column side or the row side, or none, in which case the LNA is turned off.

Footnote 1: Figure 3-6 shows the I/Os are N instead of 2N.

Figure 3-5: (a) The block-level implementation of one transceiver channel and (b) the per-element logic implementation. Column and row select logic is implemented with shift registers that can be reprogrammed in “N” time (implementation detail will be shown in Figure 5-2). [Panel (a) labels: Transceiver[i, j] with inputs Tc[i], Rc[i], Tr[j], Rr[j], T_en, R_en, connected to Row Gate Driver[j], Row BUF[j], Column Gate Driver[i] and Column BUF[i]. Panel (b) gate input labels: Tc[i], T_en; Tr[j], T_en; Rc[i]+Rr[j], R_en; Rc[i], R_en; Rr[j], R_en.]

When the Rx element [i, j] is enabled (R_en = 1) and the jth Rx row is selected (Rr[j] = 1), the internal switch control signal Rr becomes high and the Rx LNA output is connected to the Column Buffer[i]. The array’s Rx path is in column-parallel mode. When the Rx element [i, j] is enabled (R_en = 1) and the ith Rx column is selected (Rc[i] = 1), the internal switch control signal Rc becomes high and the Rx LNA output is connected to the Row Buffer[j].
The array’s Rx path is in row-parallel mode. When the Rx element [i, j] is disabled (R_en = 0), or when neither the Rx row nor the Rx column is selected (Rr[j] = Rc[i] = 0), both Rr and Rc are low and the Rx LNA is turned off, presenting a high output impedance to both column and row buffers.

The Rx HV protection switch protects the low voltage Rx electronics from high voltage Tx transients. An additional internal control signal, RxSw, is generated to control the gate of the protection switch. Whenever the Rx LNA is activated and connected to either the column or the row buffer, the HV switch is turned on (RxSw = 1) to allow the CMUT signal to reach the LNA for amplification. The HV switch is off when the LNA is not activated, and it also remains off during Tx pulsing to isolate the high voltage pulsing transients.

The detailed circuit implementation for generating the column / row select signals as well as the per-element enable bits will be the topic of Chapter 5. But as a high-level description, these selection and enable bits are stored in shift registers (SR’s) which can be programmed serially. The column and row select signals are 16-bit long for the 16 columns and rows, while the per-element enable bits are 512-bit long, accounting for 1-bit Tx enabling and 1-bit Rx enabling for each CMUT element in the 16x16 array. Furthermore, two multiplexed banks for each control set are implemented. For example, there are two multiplexed 512-bit SR banks for per-element enable bit programming. One SR bank can be used in normal operation while the other bank is being reprogrammed. Alternatively, both SR banks can be programmed in advance so that one can quickly alternate between them to achieve fast aperture switching between two pre-defined aperture patterns.
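The per-element control logic described in this section can be summarized as a small behavioral model. The Python sketch below is only an illustration under stated assumptions: the names `SwitchControls`, `element_controls` and `tx_pulsing` are hypothetical, the rule that column-side and row-side selects are never asserted together is modeled as an error check, and gating RxSw with `not tx_pulsing` is an assumed stand-in for whatever mechanism keeps the HV switch off during Tx pulsing. The actual circuits are the topic of Chapter 5.

```python
from typing import NamedTuple


class SwitchControls(NamedTuple):
    """Internal switch controls of one transceiver channel [i, j]."""
    Tr: bool    # Tx pulser driven by Column Gate Driver[i] (column-parallel mode)
    Tc: bool    # Tx pulser driven by Row Gate Driver[j] (row-parallel mode)
    Rr: bool    # Rx LNA output connected to Column Buffer[i] (column-parallel mode)
    Rc: bool    # Rx LNA output connected to Row Buffer[j] (row-parallel mode)
    RxSw: bool  # HV protection switch turned on


def element_controls(tc_i, tr_j, rc_i, rr_j, t_en, r_en, tx_pulsing=False):
    """Behavioral model of the per-element logic (hypothetical helper).

    tc_i, rc_i: column select signals for column i
    tr_j, rr_j: row select signals for row j
    t_en, r_en: local per-element enable bits
    tx_pulsing: assumed flag, True while the Tx pulser is firing
    """
    # Column-side and row-side selects are only active at one side at a time.
    if (tc_i and tr_j) or (rc_i and rr_j):
        raise ValueError("column and row selects must not be asserted together")

    Tr = bool(t_en and tr_j)
    Tc = bool(t_en and tc_i)
    Rr = bool(r_en and rr_j)
    Rc = bool(r_en and rc_i)
    # The HV switch is on whenever the LNA is connected to a buffer,
    # and (assumption) forced off while Tx is pulsing.
    RxSw = bool(r_en and (rc_i or rr_j) and not tx_pulsing)
    return SwitchControls(Tr, Tc, Rr, Rc, RxSw)


# Example: an enabled Tx element whose row is selected is driven by the column side.
example = element_controls(tc_i=0, tr_j=1, rc_i=0, rr_j=0, t_en=1, r_en=0)
```

With both select signals low, or with the enable bit cleared, every output is low, which corresponds to the pulser and LNA being turned off.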
Lastly, because either the column side or the row side will be activated at one time, the column and row circuits share I/O ports by multiplexing, as shown in Figure 3-6 (see footnote 2). For Tx, the multiplexing switches are implemented with digital logic gates; for Rx, the multiplexing switches are implemented with analog pass-gates for analog signal outputs. In this way, the input ports for Tx beamforming control and the output ports for Rx received waveforms both number 16 instead of 32 for a 16x16 array, saving the chip I/O count considerably. The chip’s interface scaling trend becomes N (rather than 2N), which is the same trend as a 1D array for 2D imaging.

Figure 3-6: (a) Tx input port multiplexing, implemented with digital logic; (b) Rx output port multiplexing, implemented with analog pass-gates. [Labels: Tx_IN[0] to Tx_IN[15] feed Row Gate Driver[0..15] and Column Gate Driver[0..15]; Row BUF[0..15] and Column BUF[0..15] feed Rx_OUT[0] to Rx_OUT[15].]

Footnote 2: This implementation detail is not shown in most other block diagram figures to avoid complication.

3.4 The Functionality of the Column-Row-Parallel Architecture

In this section, a few examples are used to show how the proposed Column-Row-Parallel ASIC architecture could be used for 3D ultrasonic imaging.

Figure 3-7 shows an exemplary configuration of a column-parallel mode Tx aperture on the 16x16 CMUT-ASIC system. Note that the exemplary configuration is broken down and illustrated in steps to help understanding, but the actual ASIC configuration is carried out as a whole in one step. In this example, two of the 16 row select signals are turned on so that the two rows of Tx elements are activated, as shown by the red squares in Figure 3-7(a). Because the array is operating in column-parallel mode, all elements along the same column are in parallel as shown by the red column connection lines in Figure 3-7(b).
The elements on the same column are driven by a shared Tx column gate driver as in Figure 3-7(c). Because the 16 column gate drivers can be controlled independently, by supplying the driver signals with different delay timings, the 16 Tx columns emit ultrasonic waves at slightly different times with respect to each other. This delay pattern could be configured to perform ultrasonic beam-focusing or beam-steering along the azimuth direction, as shown in Figure 3-7(d).

Figure 3-8 shows another exemplary configuration, in which a row-parallel mode Rx aperture is programmed on the 16x16 CMUT-ASIC system. Five columns are activated in this example by the column select signals, and the five active Rx elements on each row are in parallel. Their outputs are combined in the analog domain, and the combined signal is buffered by a shared Rx row buffer. The 16 analog outputs are digitized by off-chip ADCs. Afterwards, the digitized channel data can be processed digitally to perform beam-formation along the elevation direction.

As mentioned in the previous section, the Tx and Rx paths are completely independent. Therefore, the Tx path can also be configured into row-parallel mode and the Rx path into column-parallel mode. The number of active rows or columns is also programmable depending on the need. Reprogramming the active rows and columns is fast, because the row and column select signals are generated at the side of the array and reprogramming takes only N clock cycles, which keeps it scalable as the array size grows. When multiple rows (the case for columns is similar) are activated, they operate in parallel and effectively behave as a “thicker” row compared to when only one row is activated. The azimuth beam-focusing is the same while the additional elevation thickness could provide larger signal strength. This feature offers freedom at the system-level.
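To make the column-parallel Tx delay pattern of Figure 3-7(d) concrete, the following Python sketch computes one delay per column gate driver for either steering or focusing along azimuth. It is a minimal illustration, not the thesis implementation: the function names are hypothetical, the sound speed of 1540 m/s is an assumed typical value that does not come from the text, and only the 16 columns and the 250µm pitch are taken from the description above.

```python
import math

N_COLS = 16               # number of column gate drivers (from the text)
PITCH_M = 250e-6          # element pitch in metres (from the text)
SOUND_SPEED_M_S = 1540.0  # assumed sound speed, not given in the text


def column_positions(n=N_COLS, pitch=PITCH_M):
    """Azimuth (X) coordinate of each column, centred on the array."""
    return [(i - (n - 1) / 2) * pitch for i in range(n)]


def steering_delays(angle_deg, c=SOUND_SPEED_M_S):
    """Delays (s) that tilt the transmitted wavefront by angle_deg in azimuth."""
    s = math.sin(math.radians(angle_deg))
    raw = [x * s / c for x in column_positions()]
    earliest = min(raw)
    # Shift so that the first column to fire has zero delay.
    return [t - earliest for t in raw]


def focusing_delays(focus_x, focus_z, c=SOUND_SPEED_M_S):
    """Delays (s) that make all column wavefronts arrive at (focus_x, focus_z) together."""
    dist = [math.hypot(focus_x - x, focus_z) for x in column_positions()]
    farthest = max(dist)
    # The farthest column fires first (zero delay); closer columns wait.
    return [(farthest - d) / c for d in dist]


steer_10deg = steering_delays(10.0)
focus_10mm = focusing_delays(0.0, 10e-3)
```

For steering, the delay increases linearly across the columns with a step of pitch · sin(angle) / c. For an on-axis focus, the edge columns fire first and the centre columns last.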
As will be seen in Chapter 4, different numbers of rows or columns can be selected for transmit or receive, to achieve the desired imaging requirements (volume rate, resolution, etc.). The innovative circuit structures realizing the feature are discussed in Chapter 5.

Figure 3-7: The architecture configured in a column-parallel mode for the Tx aperture. The configuration is broken down and illustrated in steps (a) through (d) to help understanding. Two rows are activated as the Tx aperture and beam-formation along the azimuth (X) direction is achieved. [Labels: row select logic, column select logic, column Tx drivers with beamform delays D0, D1, ..., D15, Tx beamform in X (azimuth).]

Figure 3-8: The architecture configured in a row-parallel mode for the Rx aperture. Five columns are activated as the Rx aperture and beam-formation along the elevation (Y) direction is achieved.

In addition to row-by-row or column-by-column operations, the array can also be programmed into more complex aperture patterns for specific ultrasonic imaging applications. This programming is accomplished through the proper use of the per-element enable bits under each element for both Tx and Rx paths. For example, in Figure 3-9(a), only the diagonal elements are configured with an Rx per-element enable bit of 1 (R_en = 1) while all other Rx elements’ enable bits are 0. The system is in row-parallel mode and all 16 column select signals are on, so that the 16 diagonal Rx elements receive ultrasound echoes and output to the 16 row buffers. In this way, a diagonal Rx aperture is formed, achieving the same functionality as described in [43].
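The way column select signals and per-element enable bits combine into an Rx aperture can be sketched in a few lines of Python. This is an illustrative model only: the helper names are hypothetical, the enable bits are held in a plain nested list indexed as `r_en[j][i]` (row j, column i), and the choice of columns 5 to 9 for the five-column case is an assumption, since the text does not say which five columns are active. The serial bit order of the real shift registers is not modeled.

```python
N = 16  # 16x16 array, as used in this work


def all_enabled(n=N):
    """R_en = 1 for every element."""
    return [[1] * n for _ in range(n)]


def diagonal_enable(n=N):
    """R_en = 1 only on the diagonal elements, 0 elsewhere."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]


def row_parallel_rx(r_en, rc_select):
    """For each row buffer j, list the columns i whose LNA drives it.

    In row-parallel mode an element outputs to Row Buffer[j] when its
    column is selected (Rc[i] = 1) and its enable bit is set (R_en = 1).
    """
    return [
        [i for i, selected in enumerate(rc_select) if selected and r_en[j][i]]
        for j in range(len(r_en))
    ]


# Five active columns with every element enabled: each row buffer combines
# five elements (columns 5..9 are a hypothetical choice).
five_columns = [1 if 5 <= i <= 9 else 0 for i in range(N)]
thick_column_aperture = row_parallel_rx(all_enabled(), five_columns)

# Diagonal Rx aperture: all 16 column selects on, diagonal enable bits only,
# so each row buffer receives exactly one element.
diagonal_aperture = row_parallel_rx(diagonal_enable(), [1] * N)
```

Other patterns follow the same recipe by changing only the enable-bit matrix, which is what makes the per-element bits the fine-grained control in this architecture.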
Figure 3-9(b) shows another example, where a checker board pattern is activated for the Tx path. All 16 Tx row select signals are activated while the Tx per-element enable bits define the checker board pattern inside the array. The column Tx gate drivers supply the same delay profile for the 16 columns so that effectively all activated Tx elements emit ultrasound pulses in-phase. This checker board Tx aperture could help reduce second harmonic generation for the emitted ultrasound pressure field, which is useful in ultrasonic harmonic imaging applications. It will be discussed in more detail in Chapter 4.

The annular ring apertures are shown in Figure 3-9(c) and (d) for the Tx and Rx paths respectively. The ring shapes are adjustable in either column-parallel or row-parallel mode. The Tx elements activated for the annular ring are driven in-phase, as indicated by the same delay values supplied by the row gate drivers in Figure 3-9(c). Similarly, the Rx annular ring outputs the received ultrasound echoes through the column buffers. The digitized waveforms from different channels can be summed in-phase to form a single annular ring Rx waveform. The application of annular ring apertures for forward-looking 3D ultrasonic imaging will be presented in Chapter 4.

Figure 3-9: More use examples of the proposed architecture: (a) a diagonal Rx aperture; (b) a checker board Tx aperture for ultrasonic harmonic imaging; (c) & (d) annular ring Tx and Rx apertures for forward-looking ultrasonic imaging applications.

3.5 Summary

The Column-Row-Parallel architecture provides both scalability and flexibility. First, the column and row select signals are fast to reprogram, with a reprogramming time that scales linearly as the 2D array size grows (“N” scaling trend).
They can activate rows or columns for beam-formation in azimuth (X) or elevation (Y) directions. Second, per-element enable bits offer fine granularity to form application-specific patterns, such as the diagonal, checker board and annular ring apertures. Moreover, each control set has two multiplexed SR banks, which allow normal operation based on one bank while reprogramming the other, or fast aperture switching between two pre-programmed banks. Lastly, the architecture is compatible with many existing beam-formation schemes [38, 40, 43, 55, 58], while offering new possibilities as will be shown later.

Chapter 4

3D Ultrasonic Imaging System Experiments

In this chapter, the system-level 3D ultrasonic imaging experiments are described. The imaging system is assembled around our custom designed prototype analog front-end (AFE) chip implementing the proposed Column-Row-Parallel architecture, interfacing to a 16x16 2D CMUT transducer array. The detailed design, implementation and characterization of the AFE chip will be described in Chapters 5 and 6, but here we will focus on the system-level capability of the proposed architecture and various beam-formation algorithms suitable for the architecture.

4.1 The Hardware System Assembly

The experiments are conducted on the real integrated hardware system, in which a 16x16 CMUT chip and a 16x16 AFE custom chip are integrated as a complete 3D ultrasonic imaging front-end. As mentioned in Chapter 3, the layout area of each AFE transceiver channel is element-matched to each CMUT element with a size of 250µm × 250µm, so that the 16x16 array areas of the CMUT and the ASIC are matched and can be vertically integrated.
For integration, each CMUT element provides the electrical interconnection using a through silicon via (TSV) to a bonding pad at the bottom side of the die, as has been described in many papers from the CMUT literature [39–43, 61]. Each AFE channel of the ASIC also provides a flip-chip bonding pad. Solder balls have been placed onto all ASIC pads with a solder bumping process as one of the final steps in ASIC fabrication by the foundry.

Figure 4-1 shows how the CMUT and ASIC are integrated together to form a 3D ultrasonic imaging front-end system. To interconnect to both the CMUT die and the ASIC die while providing footprint flexibility, a PCB interposer is fabricated and flip-chip bonded to the CMUT on one side and the ASIC on the other [62, 63]. The PCB vias directly connect an individual CMUT element to its ASIC transceiver channel. The CMUT-PCB-ASIC assembly is then plugged into the main testing PCB for measurements. The oil tank contains vegetable oil as an in-vitro approximation to human fat and a 3D translation stage is made to help hold various measurement tools or imaging phantoms for experiments.

Figure 4-1: System integration diagram showing the flip-chip bonding connection between CMUT and ASIC through a PCB interposer. The figure also shows the mechanical setup for imaging experiments, including an oil tank and a 3D translation stage.

The actual test setup picture is shown in Figure 4-2, and the corresponding block diagram is shown in Figure 4-3 to give an abstract view.

Figure 4-2: The picture of the hardware system setup. [Labels: 16-channel data acquisition system; a metal ring phantom on top of the CMUT; 3D translation stage holder; main testing PCB; tank with vegetable oil.]

Figure 4-3: The block diagram of the hardware system setup. [Blocks: main testing PCB; FPGA control (ASIC initialization, DC-DC converter control, Tx / Rx switching, Tx beamforming, Rx gain control, column / row mode select, column / row select); PC (Rx beamforming, 3D image display); power supplies (HV, analog, digital, etc.); phantom & measurement setup; 16-channel data acquisition.]

4.1.1 The PCB-CMUT Connection

At the PCB-CMUT connection side, the footprint of the CMUT samples comes in two possible configurations. As shown by Figure 4-4(a), the CMUT main elements’ pad array is 16x16 with a pitch of 250µm, and there is an additional 2x16 pad array used for connection to the common top membrane to provide the DC bias voltage, or the CMUT’s “ground”. The gap between the “ground” and the main array can be either 250µm or 373.75µm, depending on the specific CMUT batches made at the supplier. This necessitates the PCB interposer, so that different PCBs can be designed to fit different footprints. The CMUT flip-chip bonding pad metal structure is also shown in Figure 4-4(b). The pad metal stack is composed of Ti-Cu-Au and the pad diameter is 50µm at a pitch of 250µm.

Figure 4-4: The 16x16 CMUT die drawings: (a) the footprint of the CMUT; (b) the CMUT flip-chip bonding pad metal structure drawing, courtesy of [40]. [Drawing labels: 4mm and 4.5mm die dimensions; 2x16 pads to the CMUT common top membrane; 16x16 pads to the individual CMUT elements’ bottom plates; gap of 250µm or 373.75µm; pitch of 250µm; “CMUT Height ~0.5µm”.]

Two PCB designs are correspondingly made to adapt to the two different CMUT footprints, as shown in Figure 4-5(a) and (b). In version A, the gap between the “ground” and the main array is 250µm, and all 18x16 pads are made at a pitch of 250µm.

Figure 4-5: The two different PCB designs made to fit CMUT footprints: (a) the PCB version A’s footprint for CMUT with a gap distance of 250µm; (b) the PCB version B’s footprint for CMUT with a gap distance of 373.75µm, where only 1x16 pads are made on the PCB side due to space limitations.
In version B, because the gap is 373.75µm, only 1x16 pads for “ground”, instead of 2x16 pads, are laid out on the PCB due to space limitations. But because the 2x16 pads are redundant, the omission still allows correct electrical connection between PCB and CMUT. All PCB pads’ pitch is also 250µm.

Because the CMUT pads are not solder bumped at initial fabrication, and it is difficult to do solder bumping on an individual die, we need to perform solder bumping for the pads on the PCB side, so that the flip-chip bonding can still be made between PCB and CMUT. To accommodate PCB solder bumping, the PCB pads are made with electroless nickel immersion gold (ENIG) with a metal stack of Ni-Cu-Au. The pad diameter is 190µm. Because the pad pitch is 250µm, it leaves a clearance of 60µm between two pads. The pads are drilled into vias of 150µm diameter with a mechanical drill (see footnote 1). The vias are filled and plated with ENIG at both sides of the PCB.

Footnote 1: A laser drill could produce even smaller holes, but the smaller holes cannot be epoxy filled and plated over. As a result, 150µm mechanical drills are used.

Figure 4-6: The drawing of a PCB pad defined with a solder mask, and bumped with a solder ball. The PCB pad is used to do flip-chip bonding to the CMUT die. [Drawing labels: CMUT die 4.5mm × 4mm; all pitch = 250µm; pad opening = 4mil (100µm); solder mask thickness = 0.5mil; pad finish ENIG (Ni-Cu-Au); pad size = 7.5mil (190µm); PCB size ~2inch × 2inch, thickness ~30mil, FR4.]

The solder mask is then applied over the pad with laser direct imaging (LDI) technology, to define a solder mask thickness of roughly 13µm and a pad opening diameter of 100µm. The solder mask thickness and the pad opening size are defined such that a solder ball with a diameter of 100µm can be placed onto the PCB pad. The drawing of a PCB pad bumped with a solder ball is shown in Figure 4-6. The PCB interposer is fabricated with FR4 material with a thickness of 0.76mm.
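The pad geometry quoted above mixes metric values with the mil values labelled in Figure 4-6, so a short consistency check may help. The Python sketch below only redoes the arithmetic with the numbers given in the text and figure; the variable names are hypothetical, and the 20µm annular ring around each via is a derived value that the text does not state explicitly.

```python
MIL_TO_UM = 25.4  # 1 mil = 0.001 inch = 25.4 micrometres


def mil_to_um(mil):
    """Convert a length in mil to micrometres."""
    return mil * MIL_TO_UM


pitch_um = 250.0  # pad pitch (from the text)
pad_um = 190.0    # pad diameter (from the text)
via_um = 150.0    # mechanically drilled via diameter (from the text)

# Clearance between neighbouring pads, quoted as 60 µm in the text.
clearance_um = pitch_um - pad_um

# Derived, not stated in the text: metal ring left around each drilled via.
annular_ring_um = (pad_um - via_um) / 2

# Mil values as labelled in Figure 4-6, converted to micrometres.
figure_labels_mil = {
    "pad size": 7.5,               # text rounds this to 190 µm
    "pad opening": 4.0,            # text rounds this to 100 µm
    "solder mask thickness": 0.5,  # text rounds this to roughly 13 µm
    "board thickness": 30.0,       # text rounds this to 0.76 mm
}
converted_um = {name: mil_to_um(v) for name, v in figure_labels_mil.items()}
```

The conversions give 190.5µm, 101.6µm, 12.7µm and 762µm, which agree with the rounded metric values used in the text.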
The solder balls have a commonly used composition of 63% Sn and 37% Pb, with a diameter of 100µm. Both versions of the PCB are fabricated by Sierra Circuits, Inc., Sunnyvale, CA; and the PCB solder bumping is performed by Pac Tech - Packaging Technologies, Santa Clara, CA.

4.1.2 The PCB-ASIC Connection

At the PCB-ASIC connection side, the ASIC die is already solder bumped. Therefore, the PCB pads at the ASIC side are without solder bumps and are used to do flip-chip bonding directly. The ASIC footprint is shown in Figure 4-7(a). The center area of the ASIC is occupied by a grid of 18x16 pads, which comprise the 16x16 AFE channels and the 2x16 CMUT biasing pads. They are element-matched to the CMUT die’s connecting pads through the PCB interposer. The perimeter of the ASIC is a ring of pads with 2-pad width, which are used as the I/O pads, providing the ASIC’s power supplies, ground, input controls and output signals. These ASIC I/O pads are also flip-chip bonded to the PCB interposer, and are further routed to the four edges of the PCB, for interconnection to the main testing PCB, as shown in Figure 4-7(b). The fact that the I/O pad ring is of 2-pad width ensures that only a 2-layer PCB design is needed. Since the PCB interposer has a fine pitch of 250µm, and the wire spacing is as tight as 60µm, keeping the PCB layer count to the minimum helps reduce the manufacturing cost greatly.

Figure 4-7: The ASIC die drawings: (a) the footprint of the ASIC, containing the center 18x16 pads to be element-matched and connected to CMUT through the PCB interposer, and the surrounding I/O pads; (b) the PCB interposer layout design that allows the ASIC I/O pads to be routed out to the PCB edges. [Drawing labels: 5.5mm and 6mm die dimensions; 2x16 pads providing CMUT bias; 16x16 AFE channels; surrounding 2x pad rings for ASIC I/Os; pitch of 250µm.]

In Figure 4-8, the ASIC flip-chip bonding pad’s metal structure is depicted.
The structure is made using the dedicated metal layers for flip-chip bonding pads provided by the silicon process, in which MD is the redistribution metal layer for routing between the ASIC’s top metal (M6) and the flip-chip bonding pads, and the Under Bump Metallurgy (UBM) is the material forming the pad structure under the solder bump.

Figure 4-8: The ASIC flip-chip bonding pad metal structure drawings: (a) the horizontal view of a flip-chip bonding pad in ASIC; (b) the cross-sectional view of the ASIC flip-chip bonding pad. [Drawing labels: solder ball height after bumping ≈ 80µm; solder ball diameter ≈ 100µm; pad size (UBM) “C” = 80µm; pad opening size “A” = 50µm.]

4.1.3 The Flip-Chip Bonding Assembly Process

The bonding process is performed on a FC150 flip-chip bonder (SET Corporation SA, Smart Equipment Technology, France). The process contains the flip-chip bonding steps between PCB-CMUT and PCB-ASIC respectively. Each side’s assembly is first tested and verified to be working with spare CMUT and ASIC chips, during which different process parameters, such as the tacky flux, the bonding force, the reflow temperature profile, etc., are tuned for an optimal result. Afterwards, a two-step bonding process is performed to obtain the full assembly.

First Step: Bonding between PCB-ASIC

The first step is the flip-chip bonding between PCB and ASIC. As shown in Figure 4-9(a), the PCB is picked up by the arm (chip holder) of the flip-chip bonder, with the ASIC-side PCB pads facing down; and the ASIC is fixed horizontally by the chuck (substrate holder) of the flip-chip bonder, with the solder-bumped ASIC pads facing up. Tacky flux is applied onto the ASIC chip.
3000 grams of bonding force is applied and the bonded assembly is reflowed in a Centrotherm Reflow Oven, with a peak temperature of 215°C and a dwell time of 12 seconds. The reflow is done in N2 atmosphere. After that, the half-finished assembly is cleaned in propanol overnight. The PCB-ASIC connection shows a success rate of close to 100%, and a picture of the connection is shown in Figure 4-10(a). Optionally, the PCB-ASIC connection can be verified by doing electrical tests on the ASIC through the PCB interconnections. If the ASIC operates as expected, the connections are very likely to be good since all perimeter I/O pad connections are verified to be normal.

Figure 4-9: The CMUT-PCB-ASIC two-step flip-chip bonding process: (a) first step, the bonding between PCB and ASIC; (b) second step, the bonding between PCB and CMUT, with ASIC already bonded to PCB.

During the flip-chip bonding, because the arm vacuum holder holds the PCB by its CMUT side, the solder bumps at the PCB’s CMUT side are slightly deformed. However, since the bonded assembly goes through a reflow process, any solder ball deformation is restored after the reflow. Figure 4-10(b) shows the solder balls at the PCB’s CMUT side after the reflow, which have good uniformity in shape.

Figure 4-10: The CMUT-ASIC connection result pictures: (a) the bonded PCB-ASIC assembly shows good connectivity; (b) the solder bumps at the PCB’s CMUT side are reflowed after PCB-ASIC bonding, so any deformation is restored.

Second Step: Bonding between PCB-CMUT

The second step is the flip-chip bonding between PCB and CMUT. As shown in Figure 4-9(b), the CMUT die is picked up by the arm, with its pads facing down and its membrane surface touching the vacuum holder. The PCB-ASIC assembly is fixed horizontally by the chuck, with the solder-bumped PCB’s CMUT-side pads facing up. Tacky flux is applied onto the PCB’s CMUT side. 2000 grams of bonding force is applied and the bonded assembly is reflowed in N2 atmosphere with a peak temperature of 215°C and a dwell time of 12 seconds. The complete assembly is thus finished. Underfill is not applied in either step, and no significant mechanical degradation has been observed over the testing period.

The PCB-CMUT connection took us a few trials before reaching a fully-functional 16x16 array. Electrical characterization and fault-tolerant design techniques are also key factors leading to the fully-functional array, which will be discussed in more detail in Section 5.5. Mechanically, flip-chip bonding trials had been performed on spare dummy CMUT chips to obtain a correct bonding force. When bonding forces of 2000-3000 grams were applied, the PCB-CMUT bonding produced the best results. This was verified by pulling a test CMUT die off the PCB after bonding and reflow, as shown in Figure 4-11(a) and (b). The CMUT was removed with great force, and after its removal, the majority of the CMUT TSV posts remained connected to the PCB pads, indicating a strong bonding result.

Figure 4-11: The PCB-CMUT bonding connection is verified by pulling off the test CMUT die from the PCB after bonding and reflow. (a) & (b) show the CMUT connection posts remain on the PCB after the pull, indicating good connectivity.

Figure 4-12: The finished CMUT-PCB-ASIC assembly: (a) cross-sectional view of the sandwich stack; (b) CMUT side assembly picture; (c) ASIC side assembly picture.

Figure 4-13: The acrylic tank drawings: (a) the tank dimension drawing; (b) the mounting between the oil tank and the CMUT-PCB-ASIC assembly.

Figure 4-12(a) shows a complete sandwich stack of the CMUT-PCB-ASIC assembly.
The CMUT side and ASIC side assembly pictures are shown in Figure 4-12(b) and (c). It has also been proven over time that although the arm vacuum holder holds the CMUT by its membrane surface during the PCB-CMUT bonding process, this does not break the CMUT device or affect its operation afterwards.

4.1.4 Mounting onto the Oil Tank

The CMUT-PCB-ASIC assembly is closely mounted onto an acrylic oil tank, so that the assembly can be directly used to perform in-vitro imaging experiments. As shown in Figure 4-13(a), the tank is a cube with a side length of approximately 3 inches. The tank is designed to be mounted on top of the CMUT-PCB-ASIC assembly and its bottom has a hole in the center to expose the CMUT chip. There are threaded screw holes at the tank’s bottom plane so that the PCB can be screw mounted under the tank. To help improve sealing, a rubber gasket is inserted between the tank and the PCB, and silicone industrial sealant (General Electric RTV 110 Series) is applied to both sides of the gasket.

To help hold imaging phantoms or measurement tools, a 3D translation stage is also added to the hardware setup, as already shown in Figure 4-1 and Figure 4-2. The translation stage is fixed with respect to the oil tank to avoid any relative movement.

As a final comment for this section, the 16x16 CMUT device still contains a few defective, non-functional elements. The ASIC is designed to be fault-tolerant to the CMUT defects, as will be discussed in Section 5.5, so that the defective elements are disabled while the remaining functional elements operate normally. This fault-tolerant design strategy has been a key factor in ensuring successful assemblies. Meanwhile, because the number of non-functional elements is limited (fewer than 10 in some of the best assemblies), their effect on the imaging quality is not severe. The imaging experiments presented in Sections 4.2 and 4.4 are carried out on the full 16x16 assemblies with defects.
To minimize the impact of lost elements, digital interpolation has been implemented on the received signals, although the transmitter side does not have the interpolation capability (see footnote 2). When the loss of elements is not acceptable, as is the case in Section 4.3, a 10x10 sub-array in which all elements are functional is used for the experimental demonstration.

Footnote 2: If transmitter side interpolation is also desired, a pulser design implemented as a linear amplifier can be used. The pulse amplitude and phase in the channels neighboring the missing one can be adjusted to implement the interpolation. More discussions are in Section 7.2.

4.2 Plane-wave Coherent Compounding for Fast Volume Rate 3D Ultrasonic Imaging

For 3D ultrasonic imaging, we face not only the challenge of massive channel count as a hardware limitation (as already discussed in Chapter 3), but also the challenge of a proper imaging scheme so that the 3D space can be imaged with satisfactory quality and volume rate. A real-time imaging system needs good image quality (resolution, contrast, etc.) for visualization and high volume rate to avoid severe motion blurring, but these two requirements conflict and are especially hard to reconcile in 3D imaging. Comparing a 3D imaging system with a 2D array to a 2D imaging system with a 1D array, both the number of transceiver channels and the image spatial span are significantly increased. More channels translate to more data to collect and process, and a larger image spatial span makes it necessary to transmit and receive more ultrasonic beams to cover the whole volumetric space.

Previously, efforts have been made to achieve fast volume rate in 3D imaging systems by transmitting a “fat” ultrasonic beam at one time and doing parallel processing of 8 ultrasonic beams in the receive mode [64, 65].
This parallel beamforming technique is called Explososcan and it achieves a volume rate of 8 volume/s, with 32 transmit channels and 32 receive channels on a 289-element 2D array (17x17). More recent studies have pushed this concept to the extreme, by transmitting a plane-wave ultrasonic beam and doing massively parallel beam-formation at the receive end. The method is called plane-wave coherent compounding (PWCC) [66–69]. The plane-wave emission illuminates a large space with one transmission, decreasing the data acquisition time and greatly increasing the volume rate. The plane-wave can also be steered to multiple different angles, and the received data from different angles can be coherently compounded to yield ultrasonic images with better contrast and less speckle. Moreover, because the data processing is done after all channel data is collected, with synthetic beam-formation techniques, PWCC is in essence a software beam-formation process. It is highly flexible and scalable, with its computational complexity proportional to the number of pixels / voxels to be displayed in the final image.

Figure 4-14: The illustration of how PWCC works for 2D ultrasonic imaging, courtesy of [68].

4.2.1 PWCC for 2D Imaging

The PWCC method was demonstrated on a 1D array for 2D imaging in previous literature [66–69]. An intuitive illustration is shown in Figure 4-14. The 1D array emits plane-waves with different wavefront angles. For each transmitted plane-wave angle, the received waveforms from all channels are collected and stored. Normal delay-and-sum beam-formation is then carried out on each angle's data set to obtain a coarse 2D image of lower contrast and resolution. Finally, coherent compounding is performed across the images obtained from different transmit angles. As a result, a higher quality image is produced.

The principle of coherent compounding is illustrated in Figure 4-15. The receive side beamforming delays are calculated as in focused imaging.
They are based on the time-of-flight from the center of the transducer array (0, 0) to a spatial point in the 2D image with the coordinates of (x, z), then back to the receiving element at (x1, 0), as in Equation (4.1) (c is sound speed):

$$\tau_{RX}(x_1, x, z) = \sqrt{z^2 + (x - x_1)^2}\,/\,c. \tag{4.1}$$

Figure 4-15: The principle of coherent compounding used in PWCC, courtesy of [68]: (a) the imaging space; (b) the beam-formation delay calculation when the transmitted plane-wave is normal to the transducer surface (α = 0°); (c) the beam-formation delay calculation when the transmitted plane-wave is steered to an angle of α.

However, the transmit side beamforming delays need to take into account the propagation of the plane-wave angle. This is done by adding a constant time offset to the original delays used to generate the plane-wave, which effectively rotates the plane wavefronts about a point “behind” the transducer by an angle of α. The delay for a spatial point at (x, z) in the 2D image is given in Equation (4.2):

$$\tau_{TX}(\alpha, x, z) = (z \cos\alpha + x \sin\alpha)\,/\,c. \tag{4.2}$$

Combining both the transmit and the receive side delays, the propagation time from the center of the transducer array to (x, z) and back to the receiving element is expressed in Equation (4.3):

$$\tau(\alpha, x_1, x, z) = \tau_{TX} + \tau_{RX}. \tag{4.3}$$

Additional techniques such as constant F-number aperture scaling and apodization, as mentioned in Section 2.2, can also be applied. Investigations in [68, 70] have shown that approximately 7 to 9 plane-wave acquisitions are both adequate and practical for coherent compounding. Plane-wave acquisition therefore requires 10x fewer transmissions than traditional focused emission, while producing images of comparable quality. Extensive image quality metrics have been used to reach this conclusion, including -10dB lateral resolution, contrast, side-lobe amplitude, and image SNR.
The reduction in the number of transmissions could translate to lower system power consumption, or a higher image frame rate, with image quality similar to conventional methods.

4.2.2 Extending PWCC to 3D Imaging on the Column-Row-Parallel Architecture

The previous PWCC implements plane-wave steering along the azimuth (X) direction only, so that the 2D images can be coherently compounded. It is quite natural to extend the plane-wave insonification to be steered in both azimuth (X) and elevation (Y) directions, so that the whole 3D space can be illuminated and the compounding can be performed over the volumetric images. This possibility has been briefly mentioned in [71], in which a 32x32 2D transducer array is built and a 3D imaging system is proposed. However, no detailed algorithm explanations or hardware measurement results are presented there. In contrast, our proposed 3D imaging architecture could be a suitable hardware platform to support plane-wave coherent compounding in 3D (PWCC3D). The algorithm and the hardware realization are described in this section.

PWCC3D Signal Processing

The beam-formation and coherent compounding procedure can be easily extended to 3D imaging, as shown in Figure 4-16. On our 16x16 imaging system assembly, each data set of 256 received echo waveforms is associated with one transmit angle. In total, p transmit angles “α_X1” to “α_Xp” can be steered along the azimuth direction and q transmit angles “β_Y1” to “β_Yq” are steered along the elevation direction. Delay-and-sum beam-formation is applied to each data set, yielding a 3D volumetric image for each transmit angle. Finally, the volumetric images for all angles are coherently compounded to produce a high quality 3D image. Each voxel in the volumetric image is beam-formed from the 256-channel data. The equations for calculating the delay values for each channel can be revised from Equations (4.1)-(4.3) to adapt to 3D imaging.
The receive side beamforming delays are calculated based on the time-of-flight, but the coordinates are extended to 3D. The distance is from the center of the transducer array (0, 0, 0) to a spatial point (i.e. the voxel) in the 3D image with the coordinates of (x, y, z), then back to the receiving element at (x1, y1, 0), as in Equation (4.4):

$$\tau_{RX}(x_1, y_1, x, y, z) = \sqrt{z^2 + (x - x_1)^2 + (y - y_1)^2}\,/\,c. \tag{4.4}$$

The transmit side beamforming delays account for the propagation of the plane-wave angle. Depending on whether the plane-wave is steered across the azimuth or elevation direction, the delays are calculated differently. Equation (4.5) is used when the column-parallel mode is active and the plane-waves are steered along the azimuth direction, with a transmit angle of α. The delay for a voxel at (x, y, z) in the 3D image is:

$$\tau_{TX,azimuth}(\alpha, x, y, z) = (z \cos\alpha + x \sin\alpha)\,/\,c. \tag{4.5}$$

Equation (4.6) is used when the row-parallel mode is active and the plane-waves are steered along the elevation direction, with a transmit angle of β. The delay for a voxel at (x, y, z) in the 3D image is:

$$\tau_{TX,elevation}(\beta, x, y, z) = (z \cos\beta + y \sin\beta)\,/\,c. \tag{4.6}$$

Combining both the transmit and the receive side delays, the delay value from the center of the transducer array (0, 0, 0) to voxel (x, y, z) can be summarized with Equation (4.7) for azimuth and elevation steering:

$$\begin{aligned}
\tau_{azimuth}(\alpha, x_1, y_1, x, y, z) &= \left( z \cos\alpha + x \sin\alpha + \sqrt{z^2 + (x - x_1)^2 + (y - y_1)^2} \right) / c, \\
\tau_{elevation}(\beta, x_1, y_1, x, y, z) &= \left( z \cos\beta + y \sin\beta + \sqrt{z^2 + (x - x_1)^2 + (y - y_1)^2} \right) / c.
\end{aligned} \tag{4.7}$$

Figure 4-16: The signal processing flow for PWCC3D on the Column-Row-Parallel architecture. (For each Tx angle α_X1 ... α_Xp and β_Y1 ... β_Yq: 16x16 Rx waveforms, Hilbert transform into the complex domain, delay-and-sum beam-formation, and a 3D image per angle; the per-angle images then go through coherent compounding and envelope detection (absolute value) to give the final 3D image.)

Under each transmit angle, a coarse 3D image is formed by applying the delay-and-sum beam-formation algorithm to the 256-channel data, with the delay values calculated from Equation (4.7). Figure 4-16 indicates that the Hilbert transformation is first performed to convert the original channel data into the “in-phase” signal I(t) and the “quadrature” signal Q(t) to preserve the phase information. When compounding is performed across different transmit angles, the voxel values are added in both I(t) and Q(t), which maintains the data coherency, hence the name coherent compounding. The final compounded 3D image is obtained by taking the amplitude $\sqrt{I(t)^2 + Q(t)^2}$ of the voxels, i.e. by envelope detection.

Because the beam-formation is performed on each voxel while utilizing the same set of data, the beamformer is a software beamformer and the processing is very scalable and flexible. The data acquisition is done only once, and the data under every Tx angle is stored. The beam-formation can then be done independently over the voxels of interest in the space. One could first perform beam-formation and image display over a large space with large voxel spacing for a coarse volumetric image; after spotting a feature of interest, one could perform a second-pass processing using the same collected data, over a smaller space with finer voxel spacing, which would generate higher definition volumetric images. In this way, a flexible, low-power software beamformer can be designed to adapt to different user scenarios for optimal trade-offs between power consumption, processing speed and image quality.

In addition, the constant F-number technique is applied during the delay-and-sum beam-formation (see Section 2.2).
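The delay calculation of Equations (4.4)-(4.7) and the compounding step can be sketched in a few lines of Python. This is a minimal illustration rather than the processing code used for the experiments: the function names and the default sound speed of 1540 m/s (a typical soft-tissue value) are assumptions, and the per-angle voxel values are assumed to be already beam-formed complex numbers I + jQ.

```python
import math

# Assumed default sound speed in m/s (typical soft tissue); not taken from the text.
SOUND_SPEED = 1540.0


def rx_delay(x1, y1, x, y, z, c=SOUND_SPEED):
    """Equation (4.4): time from voxel (x, y, z) back to element (x1, y1, 0)."""
    return math.sqrt(z**2 + (x - x1) ** 2 + (y - y1) ** 2) / c


def tx_delay(angle_rad, x, y, z, steer="azimuth", c=SOUND_SPEED):
    """Equations (4.5)/(4.6): plane-wave arrival time at voxel (x, y, z)."""
    if steer == "azimuth":        # column-parallel mode, steering in X
        lateral = x
    elif steer == "elevation":    # row-parallel mode, steering in Y
        lateral = y
    else:
        raise ValueError("steer must be 'azimuth' or 'elevation'")
    return (z * math.cos(angle_rad) + lateral * math.sin(angle_rad)) / c


def total_delay(angle_rad, x1, y1, x, y, z, steer="azimuth", c=SOUND_SPEED):
    """Equation (4.7): transmit delay plus receive delay for one channel."""
    return tx_delay(angle_rad, x, y, z, steer, c) + rx_delay(x1, y1, x, y, z, c)


def compound_envelope(voxel_iq_per_angle):
    """Coherently sum complex (I + jQ) voxel values over all Tx angles,
    then take the magnitude sqrt(I^2 + Q^2) as the envelope."""
    return abs(sum(voxel_iq_per_angle, 0j))
```

For a voxel directly above the array center and an unsteered plane-wave, `total_delay` reduces to the round-trip time 2z/c. Summing complex values before taking the magnitude is what distinguishes coherent from incoherent compounding: contributions with opposite phase cancel instead of adding.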
Voxels closer to the transducer surface have a smaller active aperture contributing to their beam-formation, while voxels farther away exploit a bigger active aperture.

Implementing PWCC3D on the Column-Row-Parallel Architecture

The implementation of PWCC3D on the proposed Column-Row-Parallel architecture is shown in Figure 4-17. All elements are turned on during the transmit phase, so that a steered plane-wave can be emitted. In Figure 4-17(a), the array is configured in the column-parallel mode for its Tx path. Since each of the 16 Tx pulser drivers at the column side is supplied with an independent delay to drive the 16 elements along the same column, the 16 columns can be delayed with respect to each other, thus implementing beam-steering along the azimuth (X) direction. Similarly, to achieve beam-steering along the elevation (Y) direction, as shown in Figure 4-17(b), the array's Tx path is arranged in the row-parallel mode, and the 16 elements along the same row are driven by the shared Tx pulser driver at the row side.

During the receive phase, the receive channels are turned on row-by-row, as can be seen from Figures 4-17(c)-(e). For each row, 16 ultrasonic echo waveforms are sensed by the activated CMUT elements and amplified by the receiver AFE. The waveforms are then buffered on-chip by the column buffers and digitized by external ADCs, before being stored digitally in a PC. To collect all 256 elements' echo waveforms, 16 consecutive ultrasonic insonifications of the same transmit angle are generated, while the 16 rows are activated serially, such that the whole 16x16 aperture is swept. This operation sequence is also illustrated in Figure 4-18. There are p angles along azimuth and q angles along elevation used to generate the final compounded 3D image. Each angle is transmitted and the echo waveforms are collected for all 256 channels.
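The per-column (or per-row) transmit delays that steer the plane-wave can be illustrated with a short sketch. It assumes the standard linear delay profile for plane-wave steering, which is consistent with the x · sin α term of Equation (4.5). The function name, the 250 µm pitch and the 1540 m/s sound speed are placeholder assumptions, not values reported for the 16x16 assembly.

```python
import math


def steering_delays(angle_deg, n_lines=16, pitch_m=250e-6, c=1540.0):
    """Relative firing delays (seconds) for the n_lines column (or row) drivers.

    A line at lateral position k * pitch_m fires k * pitch_m * sin(angle) / c
    later than line 0. The result is shifted so the earliest line fires at
    t = 0. pitch_m and c are hypothetical placeholder values.
    """
    s = math.sin(math.radians(angle_deg))
    raw = [k * pitch_m * s / c for k in range(n_lines)]
    earliest = min(raw)
    return [d - earliest for d in raw]
```

In column-parallel mode the 16 values would be applied to the 16 column drivers (azimuth steering); in row-parallel mode the same profile would be applied to the 16 row drivers (elevation steering). A negative angle simply reverses the firing order.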
Under each angle, 16 transmit-receive repetitions are needed to acquire all channel data, as shown in the inset of Figure 4-18. As a result, a total of 16 × (p + q) transmit-receive repetitions are needed for the processing of a final compounded image.

Figure 4-17: The PWCC3D implementation on the Column-Row-Parallel architecture: (a) Tx beam-steering along azimuth (X) direction using column-parallel mode; (b) Tx beam-steering along elevation (Y) direction using row-parallel mode; (c)-(e) Rx signal acquisition, sweeping through 16 rows for each transmit angle.

Figure 4-18: The sequence of operation to implement PWCC3D on the Column-Row-Parallel architecture.

For the general case of an NxN array, the number of transmit-receive repetitions needed for acquiring one volumetric image becomes N × (p + q). For an imaging system running at a certain PRF, the time for one transmit-receive repetition is the PRP (PRF and PRP are defined in Section 2.1). Therefore, as shown in (4.8) and (4.9), the acquisition time increases linearly with array size (an “N” scaling trend), and the volume rate of a PWCC3D imaging system is inversely proportional to N.
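The operation sequence of Figure 4-18 can be written out as an explicit schedule. The sketch below is only an illustration of the counting argument; the function name and the tuple layout are hypothetical.

```python
def pwcc3d_schedule(azimuth_angles, elevation_angles, n_rows=16):
    """List every transmit-receive repetition as (tx_mode, angle, rx_row).

    Azimuth angles use the column-parallel Tx mode and elevation angles use
    the row-parallel Tx mode. Each angle is fired once per Rx row, so the
    list has n_rows * (p + q) entries.
    """
    schedule = []
    for tx_mode, angles in (("column-parallel", azimuth_angles),
                            ("row-parallel", elevation_angles)):
        for angle in angles:
            for rx_row in range(n_rows):
                schedule.append((tx_mode, angle, rx_row))
    return schedule
```

With p = 5 azimuth angles and q = 5 elevation angles on a 16-row array, the schedule contains 16 × (5 + 5) = 160 repetitions.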
This is a benign scaling trend for 3D imaging systems, made possible by the row-by-row or column-by-column data reception capability provided by the architecture.

$$\text{Acquisition Time} = N \times (p + q) \times PRP = \frac{N \times (p + q)}{PRF} \Leftrightarrow O(N), \tag{4.8}$$

$$\text{Volume Rate} = \frac{1}{\text{Acquisition Time}} = \frac{PRF}{N \times (p + q)} \propto N^{-1}. \tag{4.9}$$

4.2.3 PWCC3D Results: Simulations and Measurements

To evaluate the performance of PWCC3D, both Field II simulations and real measurements are carried out. Simulations are compared against the measurements, and various Tx angles are used to demonstrate the PWCC3D algorithm.

Wire Phantom

A wire phantom is first imaged by the 16x16 array setup in simulation and measurements, so that the spatial impulse response of the imaging system can be recorded. The physical setup is shown in Figure 4-19. The wire phantom is placed 7.5mm away from the transducer surface, parallel to the surface. The transmit pulsation is 2 bursts of 8.33MHz pulses (see footnote 3). A constant F-number of 1.75 is used for beam-formation and the rectangular window is used for both Tx and Rx apodization. Single Tx plane-wave angle insonification is compared against five Tx angles (−6.7°, −3.3°, 0°, 3.3°, 6.7°) compounded along the azimuth direction in this experiment. Compounding along the elevation direction is not performed for the wire phantom because its benefit would not be evident for a wire spanning the elevation direction; compounding along the azimuth direction, however, yields a large improvement, as revealed by the simulated and measured images.

Figure 4-19: The setup of the wire phantom imaging experiment using PWCC3D algorithm: (a) a single plane-wave is transmitted to image the wire phantom; (b) five different Tx angles are used along the azimuth direction for PWCC3D.
The volumetric images are displayed at 20dB dynamic range. The simulation results are shown in Figure 4-20. The simulation uses a line of ideal point scatterers in space to mimic the metal wire in the real experiment. The vertical cross-sectional images of the wire phantom (point spread function) imaged by a single plane-wave and by compounding are visually compared in Figure 4-20(a) and (b). It can be seen that the 5-angle compounded image has higher contrast and better resolution. This is confirmed by the quantitative comparison in Figure 4-20(c) and (d), where the lateral point scatterer's amplitudes are plotted. The compounded image has a finer -10dB lateral resolution (0.50mm compared to 0.58mm) and a lower side-lobe amplitude (less than -30dB compared to -12dB) than the single plane-wave image. The side-lobes can be more readily seen in Figure 4-20(e) and (f), where the horizontal cross-sectional images of the wire are shown. While single plane-wave transmit generates visible “fake” wires (i.e. the side-lobes) at the two sides of the main wire location, the compounded image has no side wires visible.

Footnote 3: The choice of 2 bursts of pulses is for good image axial resolution, as discussed in Section 2.1.

Real imaging experiments on a metal wire phantom are also performed. The metal wire has a diameter of 0.48mm and is placed 7.5mm away from the transducer. The same pulsation and PWCC3D beam-formation are used to form the images. Similarly, single-angle vs. 5-angle compounding results are compared in Figure 4-21. The measured wire images show quality degradation due to the wire thickness and to array element and circuit mismatches. PWCC3D nevertheless demonstrates a significant improvement in image resolution: the -10dB lateral resolution is improved by over 46% in this case (from 1.32mm to 0.71mm). The axial resolution is determined by the pulse frequency and number of bursts. Therefore, the -10dB axial resolution is measured to be similar in the two cases (0.39mm for single-angle vs.
0.36mm for 5-angle).

Figure 4-20: Simulation results of a wire phantom: (a) vertical cross-sectional image produced from single angle plane-wave insonification; (b) vertical cross-sectional image produced from 5-angle coherent compounded plane-wave insonification; (c) lateral resolution plot from single plane-wave; (d) lateral resolution plot from 5-angle plane-waves; (e) horizontal cross-sectional image from single plane-wave; (f) horizontal cross-sectional image from 5-angle plane-waves.

Figure 4-21: Measurement results of a wire phantom: (a) vertical cross-sectional image produced from single angle plane-wave insonification; (b) vertical cross-sectional image produced from 5-angle coherent compounded plane-wave insonification; (c) lateral resolution plot from single plane-wave; (d) lateral resolution plot from 5-angle plane-waves; (e) horizontal cross-sectional image from single plane-wave; (f) horizontal cross-sectional image from 5-angle plane-waves.

Ring Phantom

The wire phantom displays the benefit of PWCC3D only in the azimuth direction. Here a metal ring phantom is used to fully demonstrate the benefit of PWCC3D for a volumetric image. As shown in Figure 4-22, the ring is placed horizontally above the transducer surface at a vertical distance of 7.5mm. The transmit pulsation is 2 bursts of 8.33MHz pulses, the constant F-number is 1.75, and the rectangular window is used for both Tx and Rx apodization. The compounding employs 5 different Tx plane-wave steering angles (−6.7°, −3.3°, 0°, 3.3°, 6.7°) in each of the azimuth and elevation directions, so that the ring image can be enhanced in all directions.

Figure 4-22: The setup of the ring phantom imaging experiment using PWCC3D algorithm: (a) a single plane-wave is transmitted to image the phantom; (b) five different Tx angles are used along the azimuth direction and another five Tx angles along the elevation direction to image the phantom with PWCC3D.

In order to investigate closely the effect of coherent compounding in both azimuth and elevation directions, Figure 4-23 shows a comparison between different compounding schemes. Comparing Figure 4-23(a) and (b), the 5-angle compounding in the X direction suppresses the side-lobes along the azimuth much more than the single-angle plane-wave. The most noticeable difference is that the artifact in the blue-circled region in Figure 4-23(b) is much less evident than in Figure 4-23(a). However, the side-lobes along the elevation are not suppressed, as can be seen from the red-circled region in Figure 4-23(b), which looks almost the same as in Figure 4-23(a). Similarly, comparing Figure 4-23(a) and (c), the 5-angle compounding in the Y direction suppresses the side-lobes along the elevation much more than the single-angle plane-wave.
The artifact along elevation in the blue-circled region in Figure 4-23(c) is suppressed, but the side-lobes along the azimuth in the red-circled region remain and look almost the same as in Figure 4-23(a). When the compounding in both azimuth and elevation directions is combined, as in Figure 4-23(d), the artifacts along both directions are suppressed. The image quality is most enhanced compared to Figure 4-23(a).

Figure 4-23: Measured horizontal cross-sectional images of a ring phantom: (a) single-angle Tx plane-wave; (b) 5-angle Tx plane-wave compounding along azimuth direction; (c) 5-angle Tx plane-wave compounding along elevation direction; (d) compounding across all 5-angle azimuth and 5-angle elevation directions.

Figure 4-24 quantifies the performance improvement of PWCC3D for the ring images. The vertical cross-sectional images are used to show the side-lobe amplitudes of the ring images from the single-angle plane-wave insonification and the 10-angle X & Y steered plane-waves. As can be seen, the side-lobe at the center of the ring is reduced from -7.3dB to -13.3dB, a 6dB improvement with 10-angle coherent compounding.

A 10kHz PRF is used for the 10-angle compounding scheme in our experiments. According to (4.8) and (4.9), where (p + q) = 10 and N = 16, the acquisition time for one volumetric image is 16ms and the volume rate reaches 62.5 volume/s. As mentioned in Section 4.2.2, the volume rate decreases in inverse proportion to the array size N or to the number of plane-wave angles, which can be increased in exchange for better image quality.
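The arithmetic behind these numbers follows directly from (4.8) and (4.9) and can be checked with a small helper. The function names below are illustrative only.

```python
def acquisition_time_s(n, p, q, prf_hz):
    """Equation (4.8): N * (p + q) transmit-receive repetitions at the given PRF."""
    return n * (p + q) / prf_hz


def volume_rate_per_s(n, p, q, prf_hz):
    """Equation (4.9): volumes per second, the inverse of the acquisition time."""
    return prf_hz / (n * (p + q))


# The 16x16 experiment: 5 azimuth + 5 elevation angles at a 10 kHz PRF.
t_acq = acquisition_time_s(16, 5, 5, 10e3)   # 0.016 s, i.e. 16 ms
rate = volume_rate_per_s(16, 5, 5, 10e3)     # 62.5 volume/s
```

Doubling either N or (p + q) at a fixed PRF halves the volume rate, which is the trade-off described above.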
Cyst Phantom Simulation on a 64x64 Array

As an extrapolation of our current hardware setup, a more complex setup is simulated in Field II to investigate how technology scaling can push PWCC3D performance further. A hypothetical 64x64 2D array with an element pitch of 250µm is used in the simulation to provide a bigger aperture. A cyst phantom spanning depths from 20mm to 50mm is used as the imaging target, which serves as a benchmark for evaluating the speckle reduction performance of PWCC3D. There are three cysts located at (−3, 0, 25)mm, (0, 0, 35)mm and (3, 0, 45)mm, respectively. Each cyst is 6mm in diameter. The surroundings of the cysts are randomly spaced point scatterers mimicking normal tissue. The transmit pulsation is 2 bursts of 5MHz pulses, the constant F-number is 1.75, and the rectangular window is used for both Tx and Rx apodization.

Figure 4-24: Measured vertical cross-sectional images of a ring phantom: (a) single-angle Tx plane-wave; (b) compounding across all 5-angle azimuth and 5-angle elevation directions; (c) lateral resolution plot of ring image from single-angle Tx plane-wave (side-lobe at center: -7.3dB); (d) lateral resolution plot of ring image from 5-angle X and 5-angle Y plane-waves (side-lobe at center: -13.3dB).

The XZ cross-sectional images are shown. Figure 4-25(a) shows the single-angle plane-wave image while Figure 4-25(b) shows a compounded one, in which 5 angles along azimuth (−4°, −2°, 0°, 2°, 4°) and 5 angles along elevation (−4°, −2°, 0°, 2°, 4°) are used.

Figure 4-25: Simulated XZ cross-sectional images showing the three cysts in one slice image: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.
The associated simulation setup is illustrated in Figure 4-25(c). The comparison shows a much improved image contrast when PWCC3D is used. The YZ cross-sectional images show the individual cysts at different depths. Figures 4-26, 4-27 and 4-28 show slice image comparisons of the three cysts.

Figure 4-26: Simulated YZ cross-sectional images showing the cyst at (−3, 0, 25)mm: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.

Figure 4-27: Simulated YZ cross-sectional images showing the cyst at (0, 0, 35)mm: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.

Figure 4-28: Simulated YZ cross-sectional images showing the cyst at (3, 0, 45)mm: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.

Finally, the volume rate of 10-angle compounding PWCC3D implemented on a 64x64 array with the Column-Row-Parallel architecture would be 15.6 volume/s, assuming a 10kHz PRF, according to (4.9). Compared to the 10-angle compounding on a 16x16 array in Section 4.2.3, the volume rate of the 64x64 system is lower by exactly 4x, but the image resolution and contrast improve with the bigger array.

4.2.4 Discussion

The proposed PWCC3D algorithm on the Column-Row-Parallel architecture is a suitable solution for high volume rate 3D ultrasonic imaging applications. The volume rate can easily be traded off against image quality. More Tx angles lead to better image resolution and contrast, while the acquisition time increases linearly and the volume rate is reduced. This flexibility allows PWCC3D to adapt to a wide range of ultrasonic applications.
PWCC3D is also flexible for data processing as a software beamformer, where volumetric images of different spatial resolution and/or at different regions can be generated from the same acquired data. The software beamformer capability, together with the flexibility of choosing different plane-wave angles, provides rich knobs that can enable autonomous ultrasonic imaging devices. Such an imaging device could dynamically reconfigure the AFE and the beamformer, so that the data acquisition and processing are performed with a complexity that is suitable for the target scene. System-level power saving and performance improvement can be optimized under this framework [7].

The cyst phantom simulation in Section 4.2.3 has shown how scaling brings improved image resolution and contrast performance. Thanks to the Column-Row-Parallel architecture, the 3D imaging system's volume rate decreases in inverse proportion to N rather than N², and the interconnection complexity of the front-end is only as high as that of a 1D array in a 2D imaging front-end.

The Column-Row-Parallel architecture and PWCC algorithm can also be applied to a 2D array with the size of NxM, in which M is smaller than N (for example, 64x4). This type of “narrow” 2D array is sometimes called a 1.5D array, in that the size usually scales in the N (azimuth) dimension while the M (elevation) dimension is more or less fixed. By operating the array row-by-row (each row is N elements), it only takes M transmit-receive repetitions to collect data for one plane-wave angle. Equations (4.8) and (4.9) can be revised to (4.10) and (4.11):

$$\text{Acquisition Time} = \frac{M \times (p + q)}{PRF} \propto \text{Constant}, \tag{4.10}$$

$$\text{Volume Rate} = \frac{PRF}{M \times (p + q)} \propto \text{Constant}. \tag{4.11}$$

Because M is a relatively fixed number, the volume rate stays constant as the array size increases in the N dimension. This is the same scaling trend as for a 2D imaging system with a 1D array.
The added array elements along the N (azimuth) dimension contribute to the improved lateral resolution without degrading the frame rate. Furthermore, the plane-wave coherent compounding along the M (elevation) dimension effectively realizes elevational beam-focusing, which is traditionally implemented with a physical acoustic lens or electrical analog delay lines on a 1D array.

4.3 Interleaved Checker Board Tx Apertures with I&Q Excitations for HD2 Reduction in Ultrasonic Harmonic Imaging

Using the Column-Row-Parallel architecture, a new way to reduce Tx second harmonic distortion (HD2) for ultrasonic tissue harmonic imaging (THI) is proposed. It utilizes simultaneous I&Q excitations on two interleaved checker board Tx apertures, in order to mitigate HD2 from both transducers and circuits with arbitrary pulse shapes. In particular, the CMUT nonlinearity due to its electrostatic mechanism is suppressed.

4.3.1 THI Principle and Previous Methods

Tissue harmonic imaging is a widely used imaging mode [18–22]. The ultrasound system sends out bursts of ultrasound at the fundamental frequency. Human tissue or contrast agents (micro-bubbles injected into the human body) can react nonlinearly to the ultrasonic wave. Specifically, when a sinusoidal pressure wave propagates through the medium, the tissue or bubble contracts at the positive pressure (the first half of the sine wave) and expands at the negative pressure (the second half of the sine wave). The contraction and expansion cause slightly different propagation speeds for the ultrasonic wave, thus distorting the wave in an asymmetric way, which generates a weak second harmonic component. Instead of tuning to the fundamental reflected ultrasonic echoes as in conventional ultrasound, THI mode looks for that weak second harmonic echo signal, while filtering out the fundamental component.
The benefit is that ultrasonic beam-formation using the harmonic signal has a narrower beamwidth and lower side-lobes, so THI gains improved spatial resolution for better visualization, and improved contrast resolution for better demonstration of subtle differences.

However, the harmonic signal also tends to be weaker, for two main reasons. First, the nonlinear generation of the second harmonic from the tissue is not strong to begin with. Contrast agents such as micro-bubbles can be injected into the human body to increase the harmonic generation, but it is still weak compared to the fundamental component. Second, the tissue medium presents a frequency-dependent attenuation for ultrasound propagation: an ultrasonic wave at a higher frequency sees more attenuation during propagation [12]. Empirically the attenuation coefficient is about 1-2dB/MHz/cm. If a 5MHz fundamental signal is used and a 10MHz second harmonic component is generated, the propagation attenuation per centimeter for the second harmonic is 5-10dB more than for the fundamental. As a result, the fundamental signal needs to be filtered or suppressed at the receive side, while at the transmit side, the second harmonic generation from the transducer needs to be kept to a minimum (< −30dBc), so that only the harmonic signal produced by the human body is received in the end.

Compared to traditional PZT transducers, the CMUT is at a disadvantage in THI mode because of the nonlinear transmit property of its electrostatic actuation mechanism [19, 20], where the actuation force (hence the generated acoustic pressure) is proportional to the square of the electrical pulse excitation V(t). Excessive HD2 is generated during transmit, making the CMUT difficult to use for harmonic imaging. Previously, methods to reduce the transmit HD2 generation in CMUT have been explored.
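The square-law behaviour described above can be made concrete with a small numerical check. The sketch assumes an idealized excitation V(t) = V_dc + V_ac · cos(wt) and a force exactly proportional to V(t)²; the membrane dynamics of a real CMUT are not modeled, and the voltages used are hypothetical rather than values from this work. Under these assumptions the second harmonic of the force, relative to the fundamental, is V_ac / (4 · V_dc).

```python
import cmath
import math


def square_law_hd2_dbc(v_dc, v_ac, n_samples=64):
    """HD2 (in dBc) of a force proportional to (v_dc + v_ac*cos(theta))**2.

    One period is sampled at n_samples points and the fundamental and
    second-harmonic magnitudes are read from a direct DFT.
    """
    force = [(v_dc + v_ac * math.cos(2 * math.pi * i / n_samples)) ** 2
             for i in range(n_samples)]

    def bin_magnitude(k):
        acc = sum(f * cmath.exp(-2j * math.pi * k * i / n_samples)
                  for i, f in enumerate(force))
        return abs(acc) / n_samples

    return 20 * math.log10(bin_magnitude(2) / bin_magnitude(1))


# Hypothetical example: 30 V bias with a 15 V sinusoidal swing.
hd2 = square_law_hd2_dbc(30.0, 15.0)   # about -18 dBc under this idealized model
```

In this idealized model, an excitation amplitude comparable to the bias places the HD2 well above the −30dBc target mentioned above, which illustrates why the square law is a concern for harmonic imaging.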
For example, work in [20] focused on pre-distorting the electrical excitation signal's pulse shape, such that the frequency content of the actual transmitted acoustic pulse is HD2 free. The method is heavily dependent on detailed CMUT transmit properties and its bias voltage, requiring complicated and frequent calibration. Subharmonic driving was also tried in [19], but because the CMUT has a DC bias voltage, the emitted acoustic pulse still contains the sub-harmonic frequency content, which becomes an additional interference.

Instead of working on individual elements, [21, 22, 72] try to cancel the harmonics at the transducer level. In [21, 22], a technique called second harmonic inversion is used. A pulse shape I(t) is first transmitted. On the next repetition, a delayed pulse shape Q(t) is used to transmit again; Q(t) is delayed by a quarter cycle with respect to I(t). At the fundamental frequency, I(t) and Q(t) are out of phase by π/2, while at the second harmonic frequency, the components from I(t) and Q(t) have a phase difference of π. As a result, the HD2 from the transmitter can be cancelled by synthetically adding two consecutive received echoes. The scheme is clever, but its drawbacks are that the synthetic combining halves the effective PRF of the system, and that motion artifacts in the system could lead to leakage in the cancellation.

Figure 4-29: Implementation of checker board Tx aperture on the proposed architecture.

The work in [72] tries to cancel the Tx HD2 in one shot on a 1D array for 2D imaging. Simulation has been performed, but not real measurements. The elements in a 1D array are arranged in two groups. Each group contains every other element from the array, and elements from the two groups interleave with each other.
The two groups are driven by I(t) and Q(t) pulses respectively. Because the I(t) and Q(t) pulse emissions happen at the same time, the resulting acoustic pressure field is a linear superposition of the fields from the two groups, in which the second harmonic component is suppressed. This method is not subject to motion from either the transducer or the scene. However, care needs to be taken with grating lobes: because two neighboring elements have to be driven with the I and Q pulses as a pair, the effective pitch of the 1D array becomes twice its physical element pitch.

4.3.2 Tx HD2 Suppression on the Column-Row-Parallel Architecture

Extending the interleaved configuration into a 2D array, the Tx HD2 cancellation can be done for 3D imaging. In Figure 4-29, two banks of Tx per-element enable bits (footnote 4), Bank1 in red and Bank2 in yellow, are pre-programmed into checker board patterns. The elements of the two banks interleave with each other. The pulser gate drivers at the column side are time-multiplexed to drive both Bank1 and Bank2 with I(t) and Q(t) simultaneously, which are out of phase by a quarter pulse cycle (see Equation (4.12)). In the mid- to far-field region, the ultrasound pressure from the two banks can cancel in second harmonic using the I(t) and Q(t) driving scheme.

$$Q(t) = I\left(t - \frac{T}{4}\right) \tag{4.12}$$

Footnote 4: The functionality of the per-element enable bits is mentioned in Section 3.3 and will be described in detail in Section 5.1.

This I&Q combination on the two interleaved checker board Tx apertures for HD2 reduction is a broadband technique that works for any arbitrary pulse shape. A brief mathematical explanation shows the reason. The arbitrary pulse shape I(t) with a period of T can be represented by its Fourier series in Equation (4.13), where V0, V1, V2, ... are its Fourier coefficients and ω = 2π/T:

$$I(t) = V_0 + V_1 e^{j\omega t} + V_2 e^{j2\omega t} + V_3 e^{j3\omega t} + \dots \tag{4.13}$$

The delayed pulse shape Q(t) is represented by:

$$Q(t) = I\left(t - \frac{T}{4}\right) = V_0 + V_1 e^{j\omega t - j\pi/2} + V_2 e^{j2\omega t - j\pi} + V_3 e^{j3\omega t - j3\pi/2} + \dots \tag{4.14}$$

The pulse shape is provided electrically, and goes through an electrical to mechanical transduction. The process is modelled as a combination of both linear and nonlinear processes in Equation (4.15). Because only the second harmonic is of concern in ultrasound systems, nonlinearity up to second order is modelled for the investigation.

$$p_I(t) = a + b \cdot I(t) + c \cdot I(t)^2, \qquad p_Q(t) = a + b \cdot Q(t) + c \cdot Q(t)^2 \tag{4.15}$$

Looking at the emitted pressure signals pI(t) and pQ(t), Equation (4.16) shows only their second harmonic component:

$$p_I(t)\big|_{HD2} = \left(b \cdot V_2 + c \cdot V_1^2 + 2c \cdot V_0 V_2\right) e^{j2\omega t}, \qquad p_Q(t)\big|_{HD2} = \left(b \cdot V_2 + c \cdot V_1^2 + 2c \cdot V_0 V_2\right) e^{j2\omega t - j\pi} = -p_I(t)\big|_{HD2} \tag{4.16}$$

Equation (4.16) indicates that the second harmonic components generated from the I(t) and Q(t) excitations are out of phase by π, and it holds for any pulse shape (footnote 5). The fundamental components of pI(t) and pQ(t) are out of phase by π/2; therefore the combined fundamental intensity is 3dB lower compared to a single full-aperture excitation. Furthermore, because the nonlinear model is a general model, not only CMUT nonlinearity but also other sources of nonlinearity can be cancelled using this method. For example, circuit mismatches tend to introduce asymmetry in pulse shape between the rising and falling edges. The simultaneous I&Q excitations on the interleaved checker board apertures can still be effective in improving the HD2 caused by the circuit non-ideality. Finally, the checker board patterns require that the element pitch be smaller than or approximately equal to the ultrasound wavelength (λ = c/f), so that the grating lobes are kept to a minimum and the HD2 cancellation in space is close to perfect.
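The cancellation argument of Equations (4.13) to (4.16) can be checked numerically. The following is a minimal sketch, not the simulation used in this work: the pulse shape, the coefficients a, b, c, and the helper names are illustrative assumptions, and the two banks are simply averaged, which assumes an on-axis far-field point where both banks contribute equally.

```python
import numpy as np

SAMPLES_PER_PERIOD = 400   # must be divisible by 4 so that T/4 is a whole number of samples
N_PERIODS = 8


def harmonic_amplitude(signal, harmonic):
    """Amplitude of the given harmonic of the pulse repetition frequency."""
    spectrum = np.fft.rfft(signal) / len(signal)
    return 2.0 * np.abs(spectrum[harmonic * N_PERIODS])


def transduce(v, a=0.1, b=1.0, c=0.25):
    """Second-order model of Equation (4.15); coefficient values are illustrative."""
    return a + b * v + c * v ** 2


theta = 2 * np.pi * np.arange(SAMPLES_PER_PERIOD * N_PERIODS) / SAMPLES_PER_PERIOD
# An arbitrary periodic pulse shape with DC, fundamental, 2nd and 3rd harmonic content.
i_pulse = 0.5 + np.cos(theta) + 0.3 * np.cos(2 * theta + 0.7) + 0.2 * np.cos(3 * theta - 0.4)
q_pulse = np.roll(i_pulse, SAMPLES_PER_PERIOD // 4)      # Q(t) = I(t - T/4)

p_conventional = transduce(i_pulse)                      # both banks driven by I(t)
p_iq = 0.5 * (transduce(i_pulse) + transduce(q_pulse))   # each bank is half the aperture

hd2_conventional = harmonic_amplitude(p_conventional, 2)
hd2_iq = harmonic_amplitude(p_iq, 2)
fundamental_loss_db = 20 * np.log10(
    harmonic_amplitude(p_iq, 1) / harmonic_amplitude(p_conventional, 1)
)
print(hd2_conventional, hd2_iq, fundamental_loss_db)
```

Under these assumptions the second harmonic of the combined I&Q field vanishes to numerical precision, while the fundamental drops by about 3dB. Changing the pulse shape or the coefficients should not change either result, since Q(t) is only a delayed copy of I(t) and the model is memoryless.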
4.3.3 Experimental Results

Both simulation and measurement are carried out to verify that the combination of I&Q excitations cancels acoustic HD2 while the "useful" fundamental intensity is only 3dB less than with conventional full-aperture excitation. The simulation assumes a 10x10 array with a pitch of 250µm. 20 cycles of 4.2MHz pulses are used as the stimulating pulse shape, which go through a nonlinear transform modelled by Equation (4.15). The pulse shape is 3-level with a peak-to-peak amplitude of 30Vpp, in order to mimic the real measurement. Other pulse shapes, such as 2-level pulses or a sinusoid with a Gaussian envelope, and different numbers of pulse cycles (between 2 and 20), are also tried to verify that the cancellation works for arbitrary pulse shapes. For conventional excitation, all elements are driven with the same pulse shape I(t). For the I&Q method, the two interleaved banks of elements are driven by I(t) and the delayed Q(t), respectively.

Footnote 5: It is interesting to mention that not only the second harmonic, but also the 6th, 10th, 14th, etc. harmonics (the (2 + 4·k)th, k = 0, 1, 2, ...) are out of phase by odd multiples of π in the I(t) and Q(t) excitations.

Figure 4-30: Simulation comparison between the conventional and I&Q methods: (a) fundamental component spatial intensity for conventional; (b) fundamental component spatial intensity for I&Q; (c) HD2 spatial intensity for conventional; (d) HD2 spatial intensity for I&Q.

Figure 4-30 shows the Field II simulation of the I&Q method compared to conventional excitation. The top two sub-figures (a) and (b) compare the fundamental component of the emitted pressure field, in which the conventional method produces a field with 3dB higher intensity than I&Q.
The bottom two sub-figures (c) and (d) compare the HD2 component of the emitted pressure field, which clearly shows a large suppression from the I&Q method. The results at two spatial locations are listed in Table 4.1, indicating a 20dB reduction in HD2 from the I&Q method.

I&Q vs. Conventional (Simulation)    "A" (0, 0, 30.3)mm    "B" (0, 0, 10.2)mm
HD2 Reduction                        -19.7dB               -19.7dB
Fundamental Loss                     -3.0dB (the whole space)

Table 4.1: Simulated HD2 improvement of the I&Q method.

I&Q vs. Conventional (Measurement)   "A" (0, 0, 30.3)mm    "B" (0, 0, 10.2)mm
HD2 Reduction                        -21.7dB               -22.1dB
Fundamental Loss                     -3.4dB                -3.2dB

Table 4.2: Measured HD2 improvement of the I&Q method.

Acoustic measurements are also performed to verify the proposed method. Because there are a few non-functional CMUT elements in the 16x16 array, a 10x10 sub-array is chosen to carry out the comparison (footnote 6). The ASIC Tx channels are programmed to excite the CMUT with either the I&Q or the conventional full-aperture scheme, using 3-level 30Vpp pulses (footnote 7). With a hydrophone mounted on the 3D translation stage, the emitted ultrasonic pressure wave is detected and shown on an oscilloscope. An FFT shows the frequency content. At a given far-field spatial location (i.e. where the hydrophone tip is located), the pressure intensities generated from the I&Q and the conventional excitations are compared. The measured results at the same spatial locations as in simulation (Table 4.1) are summarized in Table 4.2, which confirms that the I&Q method has 3dB less fundamental component and over 20dB less second harmonic component, similar to the simulation results shown in Figure 4-30 and Table 4.1. Moreover, theory predicts cancellation of all (2 + 4·k)th, k = 0, 1, 2, ... harmonics. In our measurement, the reductions in the 6th and 10th components are observed on the oscilloscope, while the 14th harmonic is too weak to see.

To sum up, the I&Q method could be used to reduce the second harmonic generation in Tx for a 3D imaging system.
The method works for arbitrary pulse shapes and works equally well for nonlinearity generated from both transducer and circuit. In particular, it mitigates the nonlinear problem in CMUT with its electrostatic actuation, and it could suppress the harmonic from the pulser's rising and falling edge asymmetry.

Footnote 6: More details about the non-functional elements and the fault-tolerant circuit design can be found in Section 5.5.

Footnote 7: Different pulse shapes are also tried to verify that the method works for arbitrary pulse shapes.

4.4 Annular Ring Apertures for Forward-looking Imaging Applications

Forward-looking ultrasonic imaging systems can be used for intravascular (within the blood vessel) and intracardiac (within the heart) visualizations. The miniaturized imaging system is mounted onto the tip of a catheter, which provides minimally invasive diagnosis, interventions or treatments in medical procedures [73–76]. Currently the more commonly used ultrasound systems for intravascular ultrasound (IVUS) and intracardiac echocardiography (ICE) are side-looking ones, while forward-looking ones are gaining popularity because they offer complementary information.

Annular ring apertures are suitable to realize forward-looking imaging. Although dedicated annular ring arrays are available by custom fabrication [75, 76], a general-purpose 2D array with the proposed Column-Row-Parallel architecture can achieve similar results [77, 78]. The full 2D array provides even more flexibility, since more rings can be formed within the regular 2D aperture.

4.4.1 Annular Ring Apertures on Column-Row-Parallel Architecture

As already shown in Chapter 3, the 2D array with the Column-Row-Parallel architecture can form a circular aperture or an annular ring aperture by programming the per-element bits under each element.
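As an illustration of how such per-element bit patterns might be generated, the sketch below builds a circular or annular enable mask on a 16x16 grid. The function name and the radius thresholds (in units of element pitch) are hypothetical; the thresholds are chosen only so that the smallest ring reproduces the per-column active-element counts of 4, 2, 2, 4 quoted for columns 6 to 9 in this section.

```python
import numpy as np


def annular_mask(n=16, r_inner=0.0, r_outer=8.0):
    """Per-element enable bits (1 = active) for elements whose centre lies in (r_inner, r_outer].

    Radii are measured from the array centre in units of element pitch.
    """
    idx = np.arange(n) - (n - 1) / 2.0          # element-centre coordinates
    xx, yy = np.meshgrid(idx, idx)              # xx varies along columns, yy along rows
    r = np.hypot(xx, yy)
    return ((r > r_inner) & (r <= r_outer)).astype(int)


# Hypothetical thresholds for the smallest Rx ring.
smallest_ring = annular_mask(16, r_inner=1.0, r_outer=2.5)
column_counts = smallest_ring.sum(axis=0)       # active elements per column
print(column_counts[6:10])
```

A filled circular Tx aperture follows from the same helper by leaving r_inner at zero.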
For annular ring imaging, a circular Tx aperture is used for transmit and four concentric annular rings with different diameters can be activated as Rx apertures, as shown in Figure 4-31(a). The Tx elements are supplied with the same delay value "D" as in Figure 4-31(b), so that the whole circular aperture is driven in-phase and emits a broad ultrasound beam. The Rx elements' analog outputs are also combined in parallel along the column, and by digitally summing the weighted waveforms from all column buffers, one echo waveform is collected for each annular ring (Figure 4-31(c)-(f)). The weight for each column is the number of active elements along the column. Equation (4.17) describes the function of the weighted digital summation block:

$$S_m(t) = \sum_{k=0}^{15} n_k \cdot s_{m,k}(t), \quad m = 1, 2, 3, 4. \tag{4.17}$$

Take the smallest ring in Figure 4-31(f) as an example: the numbers of active elements for columns 6 to 9 (waveforms s4,6(t) to s4,9(t)) are 4, 2, 2, 4, respectively. Therefore the weighted summation should be:

$$S_4(t) = 4 \times s_{4,6}(t) + 2 \times s_{4,7}(t) + 2 \times s_{4,8}(t) + 4 \times s_{4,9}(t). \tag{4.18}$$

Figure 4-31: Annular ring mode imaging implemented in Column-Row-Parallel architecture: (a) Tx and Rx aperture setup; (b) Tx aperture implemented in the proposed architecture, all active elements are driven in-phase; (c) Rx aperture with the biggest ring shape, all active elements' analog outputs are combined; (d) Rx aperture with the 2nd ring shape; (e) Rx aperture with the 3rd ring shape; (f) Rx aperture with the smallest ring shape.

The four Rx annular rings are activated over four consecutive Tx transmits as shown in Figure 4-32. The digital waveforms from the four Rx rings can then be dynamically beamformed to generate a synthetic A-scan line along the axial axis of the rings. Because all elements on the same ring have the same time-of-flight to a point on the axial axis, each ring has a natural focus effect along the axial axis. The delay value for a spatial point located at depth z away from the transducer surface, for the ring with a radius of Rm, is calculated as:

$$\tau_m(z) = \frac{\sqrt{z^2 + R_m^2}}{c}, \quad m = 1, 2, 3, 4. \tag{4.19}$$

The beamformed image line along the axial axis thus becomes:

$$S_{BF}(z, t) = \sum_{m=1}^{4} S_m\left(t - \tau_m(z)\right). \tag{4.20}$$

Figure 4-32: Annular ring mode dynamic beam-formation scheme.

The circular and ring apertures are translated horizontally, so that different axial A-scan lines can be collected to form volumetric images. Examples of the translated Tx and Rx apertures are shown in Figure 4-33. Some edge effect will affect the scan line intensity slightly, but not significantly.

4.4.2 Annular Ring Imaging Results

The forward-looking programmable annular ring array can form volumetric images by moving the circular Tx and annular Rx apertures in the 2D array, so that multiple axial lines can be acquired. Both simulation and measurement of a wire phantom are performed, similar to the PWCC3D experiments. The wire phantom is 0.48mm in diameter and is placed 10.5mm away from the transducer surface, parallel to the surface. Transmit pulsation is 2 bursts of 8.33MHz pulses. The volumetric images are displayed at 20dB dynamic range. In total, 81 circular Tx apertures are swept through the 2D array, acquiring data for 81 axial lines.
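The weighted summation of Equation (4.17) and the ring delay of Equation (4.19) can be sketched as below. This is only an illustration: the speed of sound, the ring radius and the placeholder waveforms are assumptions not taken from the text, and the function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 1540.0  # m/s; assumed value, not stated in the text


def weighted_ring_waveform(column_waveforms, active_counts):
    """S_m(t) = sum_k n_k * s_{m,k}(t), Equation (4.17)."""
    waveforms = np.asarray(column_waveforms, dtype=float)   # shape (columns, samples)
    counts = np.asarray(active_counts, dtype=float)         # shape (columns,)
    return counts @ waveforms


def ring_delay(depth_m, ring_radius_m, c=SPEED_OF_SOUND):
    """tau_m(z) = sqrt(z^2 + R_m^2) / c, Equation (4.19)."""
    return np.sqrt(depth_m ** 2 + ring_radius_m ** 2) / c


# Smallest ring: columns 6..9 hold 4, 2, 2, 4 active elements (Equation (4.18)).
counts = np.zeros(16)
counts[6:10] = [4, 2, 2, 4]
columns = np.ones((16, 5))            # placeholder column-buffer outputs, 5 samples each
s4 = weighted_ring_waveform(columns, counts)

# Delay to an axial point 10.5 mm deep for a hypothetical 0.5 mm ring radius.
tau = ring_delay(10.5e-3, 0.5e-3)
print(s4, tau)
```

With unit placeholder waveforms every sample of the ring output equals the total number of active elements, which is a quick sanity check on the weights.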
With 4 beamforming annular rings at each Tx aperture location, a total of 324 transmit-receive repetitions are needed to acquire a full set of volumetric data. Similar to PWCC3D equations (4.8) and (4.9), the acquisition time and volume rate for the annular ring imaging system can be calculated in (4.21) and (4.22).

$$\text{Acquisition Time} = \frac{(\#\,\text{Axial Lines}) \times (\#\,\text{Annular Rings})}{PRF}, \tag{4.21}$$

$$\text{Volume Rate} = \frac{PRF}{(\#\,\text{Axial Lines}) \times (\#\,\text{Annular Rings})}. \tag{4.22}$$

The acquisition time again scales linearly with respect to the number of axial lines in the volumetric image, or the number of annular rings used for beam-formation.

Figure 4-33: Annular ring configuration example, off-center: (a) Tx and Rx aperture setup; (b) Tx aperture implemented in the proposed architecture; (c) Rx aperture with the biggest ring shape; (d) Rx aperture with the 2nd ring shape; (e) Rx aperture with the 3rd ring shape; (f) Rx aperture with the smallest ring shape.

The volumetric images from simulation and measurement are shown in Figure 4-34. The cross-sectional images display a clear wire in the space and at the same time they provide the evaluation for the performance. The measured -10dB lateral resolution (from XZ slice, Figure 4-34(b)) is 1.19mm and the -10dB axial resolution (from YZ slice, Figure 4-34(d)) is 0.32mm. Both numbers are close to the performance measured from PWCC3D images with single-angle plane-wave insonification in Section 4.2. Using a 10kHz PRF, the volume rate is 30.9 volume/s, according to (4.22).
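The numbers above follow directly from (4.21) and (4.22); a short check is given here, with hypothetical function names.

```python
def acquisition_time_s(n_axial_lines, n_rings, prf_hz):
    """Equation (4.21): time to acquire one volume."""
    return n_axial_lines * n_rings / prf_hz


def volume_rate_per_s(n_axial_lines, n_rings, prf_hz):
    """Equation (4.22): volumes per second."""
    return prf_hz / (n_axial_lines * n_rings)


repetitions = 81 * 4                                  # transmit-receive events per volume
acq_time = acquisition_time_s(81, 4, 10e3)            # seconds
rate = volume_rate_per_s(81, 4, 10e3)                 # volumes per second
print(repetitions, acq_time, round(rate, 1))
```

Doubling the number of rings to 8 at the same PRF would halve the volume rate, which is the linear trade-off discussed in this section.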
Lastly, this section aims to demonstrate the capability of the Column-Row-Parallel architecture in forward-looking ultrasonic imaging applications. The 16x16 array size limits the range covered in the space. As the manufacturing technology improves, a bigger array could lead to a much better image quality. Furthermore, the number of annular rings can also be increased, but at the cost of a proportionally lower volume rate (the volume rate is inversely proportional to the number of annular rings used).

4.5 Summary

This chapter has presented several 3D medical ultrasonic imaging applications for the Column-Row-Parallel ASIC architecture. The 16x16 CMUT-PCB-ASIC imaging front-end is assembled to demonstrate 3D imaging algorithms such as plane-wave coherent compounding and annular ring aperture imaging. The same architecture can be programmed to implement different imaging algorithms. Both schemes are suitable for high volume rate imaging with decent quality. Moreover, the architecture enables an interleaved checker board pattern with I&Q excitations for Tx HD2 reduction. The scheme is a promising way to mitigate the intrinsic nonlinearity of the CMUT, facilitating the ultrasonic harmonic imaging mode.

Figure 4-34: Cross-section slices of the wire phantom 3D images from simulation and measurement: (a) simulated XZ slice; (b) measured XZ slice; (c) simulated YZ slice; (d) measured YZ slice; (e) simulated XY slice; (f) measured XY slice.

Chapter 5

Design of the 16x16 Ultrasonic Transceiver Array ASIC with Column-Row-Parallel Architecture

The transistor-level design of the 16x16 ultrasonic transceiver ASIC is described in this chapter.
It follows the high-level description in Chapter 3, and adds implementation details to Chapter 4. The block-level circuit design [5, 6] is optimized to interface to CMUT transducers. However, the architecture-level design is a flexible and scalable analog front-end solution for 2D ultrasonic arrays in general, applicable to different technologies, such as CMUTs, PMUTs, and bulk PZTs. This chapter will cover both architecture-level and block-level circuit designs.

5.1 High-Level Description of the Ultrasonic Imaging Transceiver Circuits and the Architecture Logic Implementation

This section describes the digital implementation of the Column-Row-Parallel architecture. The design aims to realize the rich functionality presented in Chapters 3 and 4, while achieving linear scaling for the programming time. The control logic attains a proper separation of functionality, such that the control from the sides is used more often to take advantage of its fast programming time, while the control within each element provides more diverse system functionality.

The overview of the proposed Column-Row-Parallel architecture has been given in Section 3.3. Figure 3-4 shows the array structure and Figure 3-5 shows the per-element circuit block diagram. In addition to these main blocks, more circuit details will be discussed here. For convenience, the block diagram of one transceiver channel in the 2D array presented in Figure 3-5 is shown again in Figure 5-1. Each CMUT element in the 2D array is DC-biased with the shared RC network provided off-chip. Resistor Rb and capacitor Cb filter out noise from the high voltage supply and provide an AC ground for the transducer. The DC bias voltage VBIAS applied on the CMUT is between 20-50V. The transceiver channel includes a 30Vpp high voltage pulser in the transmit (Tx) path, which drives the ultrasonic transducer to emit acoustic energy.
The emitted ultrasonic wave travels through the medium and is reflected whenever it hits medium boundaries with mechanical impedance mismatch. The reflected echoes are transformed by the CMUT element into a weak electrical signal. A low noise amplifier (LNA) in the receive (Rx) path amplifies the weak signal to the output. During transmission, the Rx switch is turned off to prevent the high voltage transients from breaking the LNA, which is implemented with low voltage transistors. The CMUT device used in this work has a pass-band of 3-10MHz. The frequency range, power consumption, noise, and linearity performance of the Tx and Rx circuits are designed and optimized for this CMUT device's parameters. After collecting multiple channels' outputs from one or several transmissions, ultrasonic images can be generated, as has been shown in Chapter 4.

Medical ultrasound systems use beam-formation to improve the image quality. Tx beam-formation is realized by controlling and applying different delays across the Tx channels. Similarly, the received signals are digitized and processed by an off-chip Rx beamformer.

Figure 5-1: A re-plot of Figure 3-5 in Section 3.3. (a) The block-level implementation of one transceiver channel and (b) the per-element logic implementation. Column and row select logic is implemented with shift registers that can be reprogrammed in "N" time (implementation detail will be shown in Figure 5-2).

The digital control inside each transceiver channel has been described in Section 3.3. The combination of Row Select Signals, Column Select Signals and Per-element Enable Signals determines whether the channel is connected to the column side, the row side, or turned off. The per-element logic implementation has been shown in Figure 5-1(b). The transmitter and receiver are controlled independently and they are time-multiplexed during normal operations.

Figure 5-2: Circuit implementation for the logic control: (a) multiplexing for per-element enable bits; (b) Tx row / column selection logic; (c) Rx row / column selection logic.

Both the control signals on the sides and the per-element controls are implemented with shift registers (SR's), which can be programmed serially. Each set of controls is realized with two multiplexed SR banks, as shown in Figure 5-2. The P_bankSel (Figure 5-2(a)), T_bankSel (Figure 5-2(b)), and R_bankSel (Figure 5-2(c)) signals control the SR bank selection for the per-element enable bits (T_en and R_en), Tx row / column selection (T[0:15]) and Rx row / column selection (R[0:15]) respectively. There are several benefits associated with this implementation. First, while one bank is being programmed, the other bank is in use, which avoids interrupting operation. Second, the two banks can be pre-programmed, and by quickly switching between the two banks using the bankSel signal, they can take turns to control the ASIC. Additionally, the Tx and Rx side controls in Figure 5-2(b) and (c) are further divided into Row Select Signals (Tr[0:15] and Rr[0:15]) and Column Select Signals (Tc[0:15] and Rc[0:15]). Only one side can be active at any given time.
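The bank multiplexing and the row / column split described here can be modelled behaviourally as follows. This is a sketch rather than the circuit itself: the function names are hypothetical, the bankSel polarity is assumed, the mode encoding (0 for row-parallel, 1 for column-parallel) follows the labels in Figure 5-2, and the per-element gating is one reading of Figure 5-1(b).

```python
def select_bank(bank1, bank2, bank_sel):
    """Return the active shift-register bank (0 -> bank1, 1 -> bank2; polarity assumed)."""
    return list(bank2) if bank_sel else list(bank1)


def split_row_column(select_bits, mode):
    """Route the selected bits to one side only.

    mode 0 -> row-parallel, mode 1 -> column-parallel; the inactive side is all zeros.
    """
    zeros = [0] * len(select_bits)
    if mode == 0:
        return list(select_bits), zeros      # (row select, column select)
    return zeros, list(select_bits)


def tx_connection(t_en, tc_i, tr_j):
    """Per-element Tx gating: the enable bit ANDed with the side select signals."""
    if t_en and tc_i:
        return "column"
    if t_en and tr_j:
        return "row"
    return "off"


# Example: bank 2 holds the pattern in use, column-parallel Tx mode.
t_bank1 = [0] * 16
t_bank2 = [1] * 8 + [0] * 8
tr, tc = split_row_column(select_bank(t_bank1, t_bank2, bank_sel=1), mode=1)
print(tx_connection(t_en=1, tc_i=tc[3], tr_j=tr[5]))   # element in column 3, row 5
```

Because the inactive side is forced to zero, an enabled element can only ever be attached to one side at a time in this model.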
Taking Tx control as an example, the multiplexed SR outputs T[0:15] are forked and gated by a pair of multiplexers controlled by Tmode to generate the Tx Row Select Signals (Tr[0:15]) and Tx Column Select Signals (Tc[0:15]). The logic ensures that while one side is activated, the other remains all "0".

As briefly mentioned previously, this partition of controls provides flexibility and scalability. The side controls are programmed within "N" time, so they can easily be reprogrammed between consecutive ultrasound transmits. An example use of this control style is the row-by-row receive implemented in the PWCC3D algorithms in Section 4.2. The per-element controls provide maximum flexibility despite the fact that they take longer to program ("N²" time), as they are snake-chained through all 2D array elements. They can be changed less often to provide mode switching. Examples include the annular ring imaging experiments in Section 4.4, and the selective disabling of non-functional elements as will be described in Section 5.5, which is a critical fault-tolerant feature for analog front-end circuits working with MEMS devices. Lastly, alternating between two pre-programmed SR banks realizes fast swapping between two ultrasound aperture patterns. The simultaneous I&Q excitation of the interleaved checker board patterns in Section 4.3, implemented by switching between two per-element SR banks, is a good example.

5.2 Tx Circuit Design

This section describes the design of the transmitter path. The block-level transmitter circuit design will be introduced first [5, 6], which is optimized to drive CMUT elements. A multi-level pulse shaping technique with charge recycling is proposed to boost the efficiency of the transmitter with a CMUT element load. The design is highly scalable and compact, requiring minimum off-chip components.
Next, design issues for making a 2D array of transmitters will be described, which are more general and apply to many other types of ultrasonic transducers. High voltage pass-gate transistors implement multiplexers, which realize the programmable column / row addressing and handle the parallelism of multiple transmit elements.

5.2.1 Multi-Level Pulsing for Efficient CMUT Driver

For the transmitter, high voltage linear amplifiers are commonly used to drive PZT loads to achieve good linearity and acceptable efficiency [79, 80]. To drive a CMUT load, however, linear amplifiers are not optimum. In addition to the amplifier power consumption, a considerable power loss becomes associated with charging and discharging the parasitic capacitance of the CMUT element [41], degrading the overall power efficiency of the transmitter stage. Furthermore, the linearity of the amplifier does not translate to good linearity performance of the transmitter stage, because the CMUT element distorts the amplifier's output waveform through the nonlinear relationship between the electrical input signal and the electrostatic force acting on the element's membrane [20]. Resonant transmitters with inductors to cancel out the loading capacitance could boost the power efficiency [27]. However, bulky off-chip inductors of several micro-Henries are needed for every transmitting channel, to work with typical loads of 10-200pF per channel at the ultrasound operating frequency range of 1-20MHz [81–83], which is undesirable for compact integration.

Alternatively, the multi-level pulsing technique, which was initially introduced for chip-to-chip interconnects [84], can be applied to reduce the power consumption on the capacitive load. Multi-level techniques have been used in PZT ultrasound drivers for pulse-shaping and harmonic suppression [81–83,85]. However, the power efficiency was not improved because charge recycling was not implemented between the multiple voltage levels.
This section presents the advantage of multi-level pulsing with charge recycling in improving the combined power efficiency of the CMUT transducer and transmitter. It also requires the fewest off-chip components, as will be seen in Section 5.2.2.

The transmitter load model of a CMUT element is represented by a capacitor and resistor in parallel, as shown in Figure 5-3(a). The capacitor C is the parallel-plate capacitance between the CMUT element's membrane and the common node. The resistor R is the medium's mechanical load at the CMUT surface, transformed to the electrical port [41]. The power dissipated by R, due to the electrical pulse's fundamental frequency component, models the useful acoustic power delivered into the medium. The power dissipated while charging and discharging C (dynamic power) does not contribute to the acoustic output and thus is wasted. The CMUT transducer used in this work is a 16x16 2D array. Each CMUT element has a size of 250µm × 250µm and is modelled as 2pF || 1MΩ [41].

Figure 5-3: (a) The transmitter load model of a CMUT element used in this work. (b) An exemplary 2-level square wave pulse applied onto CMUT. (c) An exemplary 3-level pulse applied onto CMUT.

The Tx efficiency is defined as the ratio between the useful acoustic power and the total power dissipated. It models the combined efficiency of the CMUT and the ultrasonic pulser together, by capturing both the power loss in the pulser circuitry and the dynamic power dissipated by charging and discharging the CMUT parasitic capacitance.

To show how multi-level pulse-shaping increases Tx efficiency, first assume that conventional 2-level square wave pulses are used to drive a 2pF || 1MΩ load, as shown in Figure 5-3(b). The pulse magnitude is 30Vpp at a frequency of 3.3MHz. The amplitude of the fundamental frequency component is given by the Fourier series coefficient of the periodic pulse shape, as described in (5.1):

$$V_1 = \frac{2}{T_p} \int_0^{T_p} v(t) \cdot \sin\left(\frac{2\pi}{T_p} \cdot t\right) dt. \tag{5.1}$$

The amplitude V1 is calculated to be 19.1V, or 13.5Vrms. Therefore, the power dissipated on the 1MΩ resistor, i.e., the transmitted ultrasonic power at the fundamental frequency, is 0.182mW. Meanwhile, the dynamic power wasted on charging and discharging the capacitor C is calculated to be CV²f = 6mW.

An N-level pulser, using (N − 1) regulated voltage sources to charge and discharge the capacitor in a stepwise fashion, reduces the wasted dynamic power to CV²f/(N − 1) [84]. The power saving comes from the charge recycling mechanism during the discharge operation, which is enabled by the regulated voltage supplies (footnote 1). Instead of discarding all the capacitor charge CV to ground as in the square wave case, a charge packet of CV/(N − 1) is recycled back to the power supply when the capacitor is switched from one voltage level to the next lower one. As many as (N − 2) charge packets of CV/(N − 1) are recycled until the last packet is dumped to ground. As a result, the dynamic power is reduced by a factor of (N − 1). At the same time, the magnitude of the fundamental component is only decreased slightly following (5.1), leading to an overall efficiency improvement.

For example, Figure 5-3(c) shows 3-level pulses with 20ns middle voltage level steps, out of a 300ns period. Its fundamental frequency component amplitude is 18.7V, or 13.2Vrms. The useful power delivered is 0.174mW and the dynamic power is CV²f/2 = 3mW. A comparison to the square wave example reveals theoretically a 49% total power saving with only a 4.4% acoustic power reduction, or equivalently 88% more acoustic output power given the same total power dissipation.

Footnote 1: Without regulated supplies which recycle charge, the dynamic power cannot be reduced even with multi-level pulsing, as is the case in [81–83].
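The 2-level and 3-level figures quoted in Section 5.2.1 can be reproduced approximately with the sketch below, which evaluates Equation (5.1) numerically. Several choices are assumptions made for illustration: the middle-level step is centred on each edge, both cases use the 300ns period (about 3.33MHz), and pulser circuit losses are ignored. All names are hypothetical.

```python
import numpy as np

C_LOAD = 2e-12       # F, CMUT element capacitance from the load model
R_LOAD = 1e6         # ohm, resistance from the load model
V_PP = 30.0          # V, peak-to-peak pulse amplitude
PERIOD = 300e-9      # s, pulse period used in the 3-level example
MID_STEP = 20e-9     # s, dwell at the middle level on each edge


def pulse_waveform(t, v_pp, period, mid_step):
    """Unipolar pulse with an optional middle-level step centred on each edge (placement assumed)."""
    phase = np.mod(t, period)
    half = mid_step / 2.0
    v = np.full_like(phase, v_pp / 2.0)                                # middle level
    v[(phase >= half) & (phase < period / 2 - half)] = v_pp             # high level
    v[(phase >= period / 2 + half) & (phase < period - half)] = 0.0     # low level
    return v


def fundamental_amplitude(v_pp, period, mid_step, n=300_000):
    """Midpoint-rule evaluation of Equation (5.1)."""
    dt = period / n
    t = (np.arange(n) + 0.5) * dt
    v = pulse_waveform(t, v_pp, period, mid_step)
    return (2.0 / period) * np.sum(v * np.sin(2 * np.pi * t / period)) * dt


def power_budget(levels, mid_step):
    v1 = fundamental_amplitude(V_PP, PERIOD, mid_step)
    useful = v1 ** 2 / (2.0 * R_LOAD)                       # W dissipated in R at the fundamental
    dynamic = C_LOAD * V_PP ** 2 / PERIOD / (levels - 1)    # CV^2 f / (N - 1)
    return v1, useful, dynamic


v1_2, useful_2, dyn_2 = power_budget(levels=2, mid_step=0.0)
v1_3, useful_3, dyn_3 = power_budget(levels=3, mid_step=MID_STEP)

total_saving = 1.0 - (useful_3 + dyn_3) / (useful_2 + dyn_2)
efficiency_gain = (useful_3 / (useful_3 + dyn_3)) / (useful_2 / (useful_2 + dyn_2))
print(f"V1: {v1_2:.1f} V (2-level), {v1_3:.1f} V (3-level)")
print(f"total power saving: {total_saving:.1%}, efficiency ratio: {efficiency_gain:.2f}")
```

Under these assumptions the fundamental amplitudes and the roughly 49% total power saving match the text. The efficiency ratio comes out at about 1.86, close to the 88% figure quoted above; the small difference presumably reflects rounding or accounting details not modelled here.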
5.2.2 3-Level Pulser Circuit Design

The 3-level pulser is implemented as shown in Figure 5-4. The three pulse voltage levels are 30V (HVDD), 15V and 0V (GND). The 15V middle voltage is generated from a 2:1 parallel-series switched-capacitor DC-DC converter (M5-M8), which is shared between channels. The only off-chip components are two 0.1µF capacitors. Because of the charge recycling nature of the proposed 3-level pulser, and because the CMUT load (roughly 2pF per channel) is much smaller than 0.1µF, the converter can operate at a very low frequency (10-100Hz) to save power, consuming less than 1% of the total 256-channel pulsing power.

Figure 5-4: Circuit schematic of the four-channel 3-level pulsers with the middle-voltage generation (all transistors are high voltage devices).

3-level pulse-shaping is implemented with four high voltage switches (M1-M4) in each channel. NMOS M1 and M2 are used for the transitions of 15→0V and 0→15V respectively, while PMOS M3 and M4 are used for the transitions of 30→15V and 15→30V respectively. The on-resistance of each transistor and the CMUT capacitance form an RC time constant that determines pulse voltage level settling. The transistors are sized wide enough to keep the RC time constant at around 3ns, so that the 10% to 90% rise / fall time is 6.6ns. This is close to 1/20 of the pulse cycle typically used (3-10MHz pulses with pulse cycles of 100-333ns), which ensures that the 3-level pulse shape is not excessively compromised by the settling edges. The relative timing differences between the channels' gate control signals are digitally adjustable and effectively implement the Tx beamforming.
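The settling numbers above follow from the single-pole RC step response, where the 10% to 90% rise time equals RC·ln(9). A minimal sketch, assuming the 2pF CMUT capacitance is the only load on the switch (the implied on-resistance is therefore an illustrative estimate, not a value stated in the text):

```python
import math

TAU = 3e-9        # second, RC time constant targeted in the text
C_CMUT = 2e-12    # farad, CMUT load; assumed here to be the only capacitance


def rise_time_10_90(tau):
    """10%-90% rise time of a first-order RC step response."""
    return tau * math.log(9.0)


t_rise = rise_time_10_90(TAU)
r_on_implied = TAU / C_CMUT   # on-resistance that would give TAU under the assumption above

print(f"10-90% rise time: {t_rise * 1e9:.2f} ns")
print(f"implied switch on-resistance: {r_on_implied:.0f} ohm")
for f_mhz in (3, 5, 10):
    period = 1.0 / (f_mhz * 1e6)
    print(f"{f_mhz} MHz pulse: edge occupies {t_rise / period * 100:.1f} % of the cycle")
```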
To reduce the number of I/O ports needed for pulser gate control, a non-overlapping 2-to-4 line decoder is used for each channel, with 2 lines of low voltage control inputs (Ain and Bin) supplied off-chip from an FPGA running at 100MHz. As shown in Figure 5-5(a), the inputs first go through non-overlapping signal generation blocks (implementation shown in Figure 5-5(b)) before being fed into the 2-to-4 decoder. The non-overlapping block ensures that the generated low-voltage gate control signals (φ1(LV) to φ4(LV)) have dead time between each other, so that the pulser transistors (M1-M4 in Figure 5-4) are never on at the same time, which would dissipate unnecessary crowbar current. The non-overlapping dead time is 2-bit adjustable through the variable length delay lines controlled by Delay[0:1] to provide enough adjustment margin.

The low-voltage gate controls are further level-shifted by the cross-coupled level shifters in Figure 5-5(c), which translate the low-swing signals into high-swing signals that drive the gates of the high voltage transistors in the pulser and the DC-DC converter. The threshold voltage of M1 and M2 is low enough that they can be completely turned on by the 3.3V inverters. The level-shifted gate drive signals have a 30V voltage swing, which is within the rated operating conditions of the high voltage transistors in this process. The typical set of 3-level pulser control signals and the resultant 3-level pulse shape at the output Vo are shown in Figure 5-5(d). Because the low-voltage signal swing is small, the digital control power is negligible compared to the pulser power.

This design of multi-channel pulsers with a shared voltage converter can be extended easily to more Tx channels, without additional off-chip components. It could also be revised to implement more voltage levels to achieve more dynamic power reduction.
However, this requires the addition of more switches connected between the CMUT and the voltage levels. Due to the large drain capacitance of high voltage switches, the self-loading effect takes away much of the power savings from introducing additional voltage levels. According to simulation results of the 0.18µm CMOS process used in this work, a 3-level pulser dissipates 16% of total power to drive the gate and drain capacitance of M5-M8 in Figure 5-4. For a 4-level pulser, the dynamic power reduction is counteracted by the power increase to drive more and bigger transistors, leaving the overall efficiency roughly the same as a 3-level pulser. A 5-level pulser incurs even more power penalty on driving the high voltage transistors and the efficiency is lower than a 3-level pulser. Therefore, a 3-level pulser design is used in this work.

Figure 5-5: The digital control circuits for the pulser: (a) the signal flow and block diagrams; (b) the non-overlapping signal generator; (c) the level shifter implementation; (d) the control signal timing diagram.

5.2.3 Tx Path Design for 2D Ultrasonic Transducer Arrays

For the 2D ASIC implementation, a 2D grid of per-element 3-level 30Vpp pulse-shaping pulsers is connected by column and row lines, and additional circuitry is added to support column-parallel and row-parallel modes. Figure 5-6(a) shows the complete schematic of a pulser at the j-th row and i-th column and its corresponding row and column gate drivers. Except for M2 and M3, the bulk of every transistor is connected to its source.
The bulks of M2 and M3 are connected to 0V and 30V respectively. The pass-gate multiplexers², implemented with high voltage transistors, are added into the per-element pulser as shown in Figure 5-6(b). They implement the functionality of the Tr and Tc switches in Figure 5-1(a), so that the pulser gates can be driven by the row driver, by the column driver, or by neither, in which case the gate is held at 0V for M1-M2 and at 30V for M3-M4.

Figure 5-6: Tx design for the 2D array: (a) 2D pulser schematic; (b) MUX implementation.

An important issue in the 2D array design is the line parasitics. To account for the line parasitics accurately, the line metal layout is extracted to obtain the estimated lumped circuit model (Rp, Cp), as shown by the red circle in Figure 5-6(a). The pulser is placed under each element to keep the parasitics from affecting the pulsing performance as much as possible. With this placement, the line parasitics are only present as a load for the gate drivers, rather than for the pulser itself. When a different number of elements is active along a column or a row line, the gate driver sees a different load, while each pulser always sees a constant, local CMUT load. Therefore, the design ensures that the pulse shape and amplitude are preserved and invariant to the number of active elements. Meanwhile, the gate driver transistor sizing is optimized to drive pulsers on the same column or row line in the presence of parasitic line capacitance.

² All four pulser gates, M1-M4, have their own MUX, but only the one for M3 is shown in Figure 5-6(a) as an example.
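To get a feel for the load a gate driver sees, the line can be treated as an RC ladder with one (Rp, Cp) segment per element. The sketch below is a first-order illustration only: it uses the per-element values annotated in Figure 5-6(a) (Rp = 62Ω, Cp = 25fF), assumes a gate capacitance of 95fF per active pulser (inside the 90-100fF range quoted for this design), ignores the driver's own output resistance and the MUX pass gates, and uses the Elmore delay as the delay metric. None of these modelling choices are taken from the thesis.

```python
N_ELEMENTS = 16
R_P = 62.0        # ohm per element, from Figure 5-6(a)
C_P = 25e-15      # farad per element, from Figure 5-6(a)
C_GATE = 95e-15   # farad per active pulser gate (assumed mid-range value)


def elmore_delay_far_end(n, r_seg, c_node):
    """Elmore delay to the last node of an n-segment RC ladder driven from one end.

    The capacitor at node k shares k series segments with the path to the
    far end, so it contributes k * r_seg * c_node.
    """
    return sum(k * r_seg * c_node for k in range(1, n + 1))


line_resistance = N_ELEMENTS * R_P
line_capacitance = N_ELEMENTS * C_P
total_load = N_ELEMENTS * (C_P + C_GATE)

tau_wire_only = elmore_delay_far_end(N_ELEMENTS, R_P, C_P)
tau_all_active = elmore_delay_far_end(N_ELEMENTS, R_P, C_P + C_GATE)

print(f"line resistance:   {line_resistance:.0f} ohm")
print(f"line capacitance:  {line_capacitance * 1e15:.0f} fF")
print(f"load, 16 active:   {total_load * 1e12:.2f} pF")
print(f"Elmore delay, wire only:      {tau_wire_only * 1e12:.0f} ps")
print(f"Elmore delay, 16 gates active: {tau_all_active * 1e9:.2f} ns")
```

The difference between the two delay figures illustrates why the driver load depends on how many elements are active, while each pulser output still sees only its local CMUT.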
In the current design, the gate drivers are sized for the heaviest driving load, which corresponds to all 16 active pulser gates (about 90-100fF per gate capacitance) and the line parasitics across the length of 4mm (16 × 250µm). The column / row line layout is implemented with minimum width metal wire, giving an estimated per-element (250µm length) line parasitic model of Rp = 62Ω, Cp = 25fF. The gate driver power consumption takes up about 35% of the total power consumption in Tx. In the future, however, the gate drivers can also be made programmable in driving strength, so that they can adapt to the number of active elements to save power. With adaptive driving strength, the self-loading of the gate driver at light load can be reduced, and a constant pulser efficiency can be maintained.

5.3 Rx Circuit Design

This section describes the design of the receiver path. The block-level circuit optimization to interface to CMUT elements is presented first [5, 6]. A transimpedance amplifier (TIA) topology is utilized to improve the trade-offs between noise, bandwidth, and power dissipation. Design optimizations for the 2D array are described next; they can be applied to general 2D ultrasonic transducers. A specially sized source follower output stage is added to the LNA to implement receiver parallelization for improved SNR.

5.3.1 LNA Optimization Methodology for CMUT

For the receiver, large input capacitance limits the bandwidth and tends to increase the noise contribution from the input stage transistors, degrading the noise figure (NF). Bulky off-chip inductors are needed to impedance match the source to a traditional PZT pre-amplifier that assumes a low-impedance source [27]. Charge-based amplifiers were attempted for CMUTs.
The continuous-time charge amplifier achieved low noise and low power performance for CMUTs working in the kHz range [86], but the large impedance from the DC-setting network limits the bandwidth for a CMUT array working in the MHz range for medical imaging applications. The switched-capacitor charge integrating amplifier in [87] could provide enough bandwidth, but issues such as clock feed-through and charge injection are difficult to mitigate for the inherently single-ended CMUT signal path. Moreover, because the sampling clock switches at a higher frequency than the input signal bandwidth, the settling requirement for the op-amp demands a higher bandwidth than what is needed in op-amps used as continuous-time buffers, leading to much more power consumption.

In this section, the TIA topology is described to improve the trade-off between gain, bandwidth and noise, with an inductor-less design in the presence of high input capacitance [40, 45, 88]. Figure 5-7 shows the small signal model of the CMUT and LNA. Figures 5-8 and 5-9 plot various circuit transfer functions to help analyze the optimization process for the LNA.

Figure 5-7: Small signal model and noise sources of the CMUT element and the LNA.

The closed-loop TIA gain is expressed as:

\[ Z_{CL} = R_f \cdot \left( \frac{1}{1 + sR_fC_f} \right) \cdot \frac{F \cdot A_{OL}}{1 + F \cdot A_{OL}}, \tag{5.2} \]

where F = Zi/(Zi + Zf) is the feedback factor, and AOL is the op-amp open-loop gain. From (5.2), the LNA DC transimpedance gain is Rf and its bandwidth is determined by the smaller of the following two poles:

\[ f_p = \frac{1}{2\pi R_f C_f}, \tag{5.3} \]

\[ f_i \approx \sqrt{f_c \cdot f_z} = \sqrt{f_c \cdot \frac{1}{2\pi \left( R_f \| R_i \right) \left( C_i + C_f \right)}}. \tag{5.4} \]

Figure 5-8: Transfer functions when the LNA optimality condition is reached. (Plot of gain in dB versus frequency in Hz, showing AOL, 1/F, ZCL and Gn.)

Figure 5-9: Transfer function examples when the LNA optimality condition of fi ≈ fp is not reached: (a) fi < fp, (b) fp < fi.

fp in (5.3) is due to the RC time constant of the second multiplying term in (5.2).
fi in (5.4) comes from the third multiplying term in (5.2), which reaches -3dB when F · AOL = 1. Graphically, as can be seen in Figures 5-8 and 5-9, fi is the intersection between the 1/F and AOL curves, which is approximately the geometric mean of the zero of 1/F (fz) and the op-amp's unity-gain frequency (fc), assuming a 20dB/dec slope in both the 1/F and AOL curves.

When fi < fp, as shown in Figure 5-9(a), an increase in Rf always improves the LNA's gain-bandwidth product (GBP). This is because gain = Rf, while bandwidth = fi, which is approximately proportional to $1/\sqrt{R_f}$ as indicated by (5.4). GBP therefore improves roughly in proportion to $\sqrt{R_f}$. However, because fp is proportional to 1/Rf as indicated by (5.3), the increase in Rf makes fp decrease faster than fi. When fi ≈ fp, the LNA achieves the maximum GBP available from the op-amp. The phase margin is roughly 45°. A further increase in Rf no longer improves GBP, because the bandwidth becomes limited by fp and is proportional to 1/Rf (Figure 5-9(b)), holding the GBP constant. But as Rf increases, the phase margin continues to improve at the expense of a reduced bandwidth [88].

The optimality condition, fi ≈ fp, also minimizes the noise contribution from the op-amp input-referred voltage noise. Figure 5-7 shows all noise sources in the circuit. The noise figure is expressed as:

\[ NF = 1 + \frac{R_i}{R_f} + \frac{\tilde{V}_{op}^2}{\tilde{I}_{in}^2 \cdot \left| Z_i \| Z_f \right|^2} + \frac{\tilde{I}_{op}^2}{\tilde{I}_{in}^2} + \frac{2 \cdot \tilde{V}_{op} \cdot \tilde{I}_{op}}{\tilde{I}_{in}^2 \cdot \left| Z_i \| Z_f \right|}. \tag{5.5} \]

From (5.5), a large Rf is desired to reduce its thermal noise contribution. Moreover, the op-amp's input-referred voltage noise (Ṽop) has a peaking effect due to the impedance drop in |Zi| at higher frequencies.
This can be seen mathematically from the following noise gain expression (Gn), defined as the transfer function from the LNA input (Ṽop) to the output (Ṽout):

\[ G_n = \frac{A_{OL}}{1 + F \cdot A_{OL}} = \frac{1}{F} \, \Big\| \, A_{OL} \approx \min\left( \left| \frac{1}{F} \right|, \left| A_{OL} \right| \right). \tag{5.6} \]

Figure 5-10: Transfer function examples: (a) fi < fp, (b) fi ≈ fp, (c) fi > fp. (Plots of gain in dB versus frequency in Hz, showing AOL, 1/F and Gn.)

The dashed red curves in Figure 5-8 and Figure 5-10 show the graphical interpretation of (5.6): Gn follows the lower of the 1/F and AOL curves, which has a considerable peaking effect within the LNA bandwidth. By comparing the optimal and non-optimal conditions in Figure 5-10, one can see that the condition fi ≈ fp minimizes the noise peaking effect while exploiting the maximum possible GBP from the op-amp design.

5.3.2 LNA Transistor-Level Implementation

Following the guidelines discussed in Section 5.3.1, the LNA optimization starts with a 10MHz bandwidth target and the optimal condition fi ≈ fp ≈ BW. Rf is maximized while keeping the corresponding Cf, estimated from (5.3), larger than the parasitic capacitances to maintain control over circuit stability. The unity-gain frequency fc is estimated from (5.4) to set the op-amp design target. Further design adjustments keep the phase margin above 60°.

Figure 5-11 shows the LNA schematic. The input stage devices (M1, M2) are biased at the boundary of strong and weak inversion, as shown in Figure 5-12(a), to achieve high transconductance per unit current and low noise while minimizing size and parasitic capacitance. The differential pair suppresses interference from the power supplies, which is not possible with single-ended topologies [40, 45]. The circuit simulation result in Figure 5-12(b) shows that the sizing of M1 and M2 is optimized for the target CMUT parameters and that the noise figure is minimized to below 10dB. The Miller compensation leg (M9, Cc) keeps the op-amp second pole well beyond the closed-loop bandwidth for good phase margin. The source follower (M7, M8) lowers the op-amp output impedance to enforce accurate feedback.

Figure 5-11: The LNA schematic, implemented in the TIA topology. All transistors are low voltage devices except the HV Rx Switch M10.

During high voltage transmissions, the high voltage Rx switch (M10) is opened and the low voltage switches (s1-s6) are closed. The on-resistance of M10 directly impacts LNA noise performance. Its size is chosen such that its noise contribution is only a small portion of that of the input stage, and its parasitic drain and source capacitances do not degrade phase margin and bandwidth. Switches s1-s6 put the op-amp into sleep mode when they are closed, during which only the reference current remains conducting for fast wake-up within 1µs. The sleep mode enables system-level power saving opportunities. In addition, a 4-step programmable transimpedance gain is implemented to provide system-level flexibility.

Figure 5-12: Design optimization for input stage transistors: (a) transistors are sized at the boundary of strong and weak inversion; (b) transistor width is optimized for the lowest noise figure.

5.3.3 Rx Path Design for 2D Ultrasonic Transducer Arrays

For the Rx path in a 2D array, we want to achieve the same parallelism effect as in the Tx path.
Therefore, the LNA is modified such that when multiple Rx channels are activated on the same column or row line, their analog outputs combine for an increased SNR, where signals are averaged and noise is reduced. In this way, CMUT elements are effectively parallelized to receive acoustic echoes to satisfy system-level requirements. One example of its use is already presented in Section 4.4, where the active Rx elements in the annular ring aperture are in parallel and the analog signals are added along the columns.

To illustrate the principle of analog signal combining, Figure 5-13 shows the process of combining two Rx channel outputs. In Figure 5-13(a), the input current signals (is1, is2) and the input-referred current noise (in1, in2) from two CMUT elements are amplified by the two TIAs. The outputs of the LNAs (implemented as TIAs) are modelled as Thevenin equivalent circuits. Both the current signal and the current noise are amplified by the transimpedance gain Z into voltage sources, in series with an output resistance Ro. The output configuration is then converted to Norton equivalent circuits as shown in Figure 5-13(b), to indicate that the combination is done in the current domain³. The current gain from the input to the output is expressed as K = Z/Ro. Assuming the two channel parameters are perfectly matched and ignoring the line parasitics for now, the combined LNA circuits are equivalent to the circuit shown in Figure 5-13(c). The two output resistors are in parallel to form an output resistance of Ro/2. The two current signals add up directly as in Equation (5.7), while the two noise sources add up in power as in Equation (5.8), since they are uncorrelated noise sources.

\[ i_{s,output} = K \cdot \left( i_{s1} + i_{s2} \right). \tag{5.7} \]

\[ i_{n,output} = K \cdot \sqrt{i_{n1}^2 + i_{n2}^2}. \tag{5.8} \]

Because the CMUT element size is the same and the LNAs are designed to be matched, the input-referred noise power should be roughly equal ($i_{n1}^2 = i_{n2}^2$).
Moreover, if the two receiving CMUT elements are close to each other in space, the two CMUTs would see ultrasound echoes similar in amplitude and phase, leading to similar input signals (is1 ≈ is2). The above assumptions lead to the output signal and noise expressions in Equations (5.9) and (5.10). This translates to a 3dB improvement in the output SNR when two Rx channels are in parallel compared to a single channel output, as indicated in Equation (5.11). Naturally, more parallelism leads to further SNR improvement, and the SNR improvement follows the trend of 10 log(N) dB, where N is the number of channels in parallel. It is also summarized in the "Theory" row in Table 5.1.

\[ i_{s,output} = 2K \cdot i_{s1}. \tag{5.9} \]

\[ i_{n,output} = \sqrt{2} K \cdot i_{n1}. \tag{5.10} \]

\[ SNR_{2x} = 20 \log\left( \frac{i_{s,output}}{i_{n,output}} \right) = 20 \log\left( \frac{2K \cdot i_{s1}}{\sqrt{2} K \cdot i_{n1}} \right) = 3 + 20 \log\left( \frac{i_{s1}}{i_{n1}} \right) = 3\,\mathrm{dB} + SNR_{1x}. \tag{5.11} \]

³ A Thevenin equivalent circuit in the voltage domain yields the same conclusion, but the Norton equivalent is easier for explanation.

Figure 5-13: The signal and noise combining with two Rx channels in parallel: (a) two channels on the same line, shown in Thevenin's equivalent circuit at LNA outputs; (b) two channels on the same line, shown in Norton's equivalent circuit at LNA outputs; (c) two channels combined, showing the resultant signal and noise amplitudes.

In the implementation, line parasitics and component mismatches need to be taken into consideration. The RC model of the line parasitics is shown in Figure 5-13(b), and the LNA output stage must be specially designed to achieve the proper analog signal combination. First of all, current mode combining should be used because it is intrinsically robust against parasitics and mismatch. The LNA output impedance (Ro) must maintain a relatively large value compared to the line parasitic resistance (Rp), i.e. Rp << Ro, so that the circuit DC condition is less susceptible to mismatch and parasitics, and the signal combining has less distortion. On the other hand, Ro must not be too high either, because the line capacitance would limit the bandwidth through the time constant formed by Ro and the line capacitance (Cp). As a result, the output resistance and the line parasitics need to be co-designed to work together optimally. The line parasitics can be adjusted during design by changing the metal wire width in layout, and a source follower stage (M11-M12 in Figure 5-14) is proposed to provide a constant output impedance.

First, because the source follower stage is the last stage of the LNA, the linearity requirement determines its biasing current. An estimated biasing current of 34µA is calculated from the worst case slew rate needed for a full-swing 10MHz output signal, as in (5.12).

\[ I_D \approx I_{slew} = C_{load} \cdot V_{linear} \cdot (2\pi f) = 0.9\,\mathrm{pF} \times 0.6\,\mathrm{V} \times (2\pi \times 10\,\mathrm{MHz}) = 34\,\mu\mathrm{A}. \tag{5.12} \]

The loading capacitance is estimated from the input stage capacitance of the succeeding row / column BUF amplifier (0.5pF) plus the 4mm line capacitance assuming minimum layout width (0.4pF); the linear range of the output signal swing amplitude is estimated based on the maximum possible voltage headroom; and the signal frequency is the maximum 10MHz supported in the ASIC design. The initial biasing current leads to an output resistance of roughly 2.2kΩ as in (5.13), assuming an estimated 0.15V transistor over-drive voltage.

\[ R_o \approx \frac{1}{g_m} = \frac{V_{GS} - V_{TH}}{2 I_D} = \frac{0.15\,\mathrm{V}}{2 \times 34\,\mu\mathrm{A}} = 2.2\,\mathrm{k\Omega}. \tag{5.13} \]

Starting with this initial design, the row / column line width is swept to find a solution that not only maintains the SNR improvement with Rx channel parallelism, but also preserves a 10MHz bandwidth. At the same time, the output stage linearity performance numbers, such as HD2, IMD3, and Po1dB, are re-examined as the parasitic loading from the line changes.
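The two sizing estimates in (5.12) and (5.13) are simple enough to check by hand; the sketch below does so in code. It is only a worked example under the same square-law assumption as (5.13). The helper names are hypothetical, and the last line reuses the 0.15V over-drive for a 45µA bias purely as an illustration, since the over-drive of the final sized device is not stated in this part of the text.

```python
import math


def slew_limited_current(c_load, v_swing, freq):
    """Bias current needed to slew a sine of amplitude v_swing at freq into c_load."""
    return c_load * v_swing * 2 * math.pi * freq


def source_follower_rout(v_overdrive, i_d):
    """Approximate output resistance 1/gm for a square-law device, gm = 2*ID/Vov."""
    return v_overdrive / (2 * i_d)


C_LOAD = 0.5e-12 + 0.4e-12   # BUF input capacitance plus 4 mm minimum-width line
V_LINEAR = 0.6               # volt, linear output swing amplitude
F_MAX = 10e6                 # hertz, maximum supported signal frequency
V_OV = 0.15                  # volt, estimated over-drive voltage

i_slew = slew_limited_current(C_LOAD, V_LINEAR, F_MAX)
r_out_initial = source_follower_rout(V_OV, 34e-6)
r_out_at_45ua = source_follower_rout(V_OV, 45e-6)   # illustrative, same over-drive assumed

print(f"slew-limited bias current: {i_slew * 1e6:.1f} uA")
print(f"Ro at 34 uA: {r_out_initial / 1e3:.2f} kohm")
print(f"Ro at 45 uA: {r_out_at_45ua / 1e3:.2f} kohm")
```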
If the linearity specs are not met, the transistor sizing or the biasing current of the source follower stage is adjusted to satisfy the design target. After several iterations of changes in the line width and output stage design, the final optimal design has a 45µA biasing current and a 1.7kΩ LNA Ro. The corresponding line width is chosen to be 10x the minimum metal wire width, as shown in Figure 5-14. The estimated per-element (250µm length) line parasitic model is Rp = 6.2Ω, Cp = 250fF. According to circuit simulation, the circuit maintains a worst case 9.2MHz bandwidth when only the channel at the end of the line is activated, driving the whole 4mm line to reach the BUF amplifier. Except for the worst case, most other configurations⁴ provide a bandwidth over 10MHz. Meanwhile, the SNR improvement with 16x channel parallelism is 11.97dB, which is very close to the ideal target of 12dB when there are no parasitics.

Figure 5-14: The LNA schematic, implemented in the TIA topology. All transistors are low voltage devices except the HV Rx Switch M10. The "vip" node is also buffered with a source follower to output (not shown).

As a sanity check, the row / column line width is modified to see its effect on circuit performance. When the line width is decreased by 5x (Rp = 31Ω and Cp = 50fF per element), the larger line resistance reduces the noise averaging of 16x parallelism to 11dB. When the line width is increased by 3x (Rp = 2.07Ω and Cp = 750fF per element), the worst case channel bandwidth drops to 6.2MHz due to the increased line capacitance.

The last step in the design is to carry out Monte Carlo simulation for verification in the presence of device mismatches. Less than 2% DC disturbance is observed in Monte Carlo simulations that include line parasitics, global process variations, and local transistor mismatches. Finally, the ASIC measurement (more details are in Chapter 6) verified the design functionality. The measured SNR improvement with parallel channels is close to theory, as listed in Table 5.1. The discrepancy between the measurement and the ideal case most likely comes from the fact that there exist correlated noise sources, preventing the noise power from being averaged out.

⁴ Other configurations include: a single channel closer to the BUF, driving a shorter line with less parasitics; several channels parallelized; etc.

Table 5.1: SNR improvement from Rx channel parallelism, theory prediction and measurement.

SNR improvement with parallelism    2x      4x      8x      16x
Theory (dB)                         3       6       9       12
Measured (dB)                       2.41    5.41    8.20    10.86

Discussion on Scaling

As can be seen from Table 5.1, the measured channel SNR improvement deviates more from the theoretical expectation as the channel parallelism increases. The performance degradation is the result of the line parasitics and indicates that the parallelism cannot be scaled up to an infinite number of channels. In particular, it is impossible to maintain a satisfactory bandwidth for the channel located at the farthest end of the line when the line is excessively long. However, several techniques can be proposed to mitigate the negative effect of the line parasitics and improve the scaling to an even larger array, as described below.
• Increasing the source follower stage bias current and transistor sizing further could lead to more than 16x parallel channels with the same performance. The corresponding line width needs to be increased approximately proportionally to keep Rp << Ro for current summing. The channel count increase obtained in this way stops when the self-loading condition for circuit bandwidth is reached. At that point, Cp becomes the dominant load at the output, and the increase of Cp completely offsets the reduction of Ro. Circuit simulation shows that self-loading is reached at around 64x parallelism with a 40x minimum metal line width; increasing output stage sizing and power consumption does not extend the number of parallel channels any further.

• The metal wire layout in the current design uses only one layer of metal. Several metal layers can be connected in parallel to yield a better line parasitics model. For example, by using two metal layers in parallel to implement the interconnecting column and row lines, Rp is reduced by 2x while Cp is increased by a factor that is much less than 2x, because there is no coupling capacitance between two metal layers at the same potential. As a result, the channel parallelism can be extended further by close to 2x.

• The column or row lines can be connected at both ends to the column or row buffers, effectively reducing the line parasitics. The worst case channel in this scenario becomes the one at the center of a line, rather than the ones at the two ends. Therefore, approximately another 2x more channels can be placed in parallel with the same performance.

• Lastly, inserting intermediate buffering stages in the middle of interconnection lines could extend the number of parallel channels even further, as shown in Figure 5-15. Within each intermediate block, 16-64x channel outputs can be combined in parallel by each channel's source follower stage.
The additional line buffers inserted could attain parallelism with even more channels without excessive bandwidth / linearity performance degradation.

Figure 5-15: Parallelism with even more Rx channels by utilizing intermediate line buffers to preserve the circuit performance.

5.4 Biasing

The current biasing for the 2D ASIC is carefully designed to provide good matching for channels across the array. Figure 5-16 shows the biasing scheme. An 8-bit DAC is used to generate a gate voltage, which is applied onto a tunable PMOS transistor. The PMOS Md0 is implemented by binary weighted PMOS transistors in parallel to provide 8-bit tunable widths. The 8-bit DAC produces a nominal seed current of 25µA and the 8-bit tunable PMOS width provides 0.2µA steps over the adjustable range of 0-50µA. The seed current generated by Md0 is fed into a current mirror with 16 branches implemented by NMOS transistors Md1 and Mn0-Mn15. These 16 branches provide seed currents for the 16 rows in the 2D array. Transistors Mn0-Mn15 are physically placed next to each other in the layout for good matching. Each of the 16 row currents is then routed and distributed to its corresponding row, where it goes through another set of current mirrors with 16 branches. For example, the row current generated by Mn0 is mirrored by PMOS transistors Mp0 and M0-M15. Similarly, for matching purposes, transistors M0-M15 are placed next to each other, before their generated biasing currents are routed into the corresponding circuit channels. The current into each channel is nominally 25µA.

Figure 5-16: The biasing circuit for the 2D array.

It is important to design the current mirror to be robust against mismatches across the array. The transistor mismatch model in strong inversion is expressed in (5.14).

\[ \frac{\Delta I}{I} = \sqrt{ \left[ \frac{\Delta (W/L)}{(W/L)} \right]^2 + \left[ \frac{2 \Delta V_{TH}}{V_{GS} - V_{TH}} \right]^2 } = \sqrt{ \left( \frac{\Delta W}{W} \right)^2 + \left( \frac{\Delta L}{L} \right)^2 + \left[ \frac{2 \Delta V_{TH}}{V_{GS} - V_{TH}} \right]^2 }. \tag{5.14} \]

The transistor L is chosen to be long to provide both large output impedance and small sensitivity to channel length mismatches. At the same time, the transistor W is chosen to keep the transistor well in the saturation region, with Vdsat = |VGS − VTH| ≈ 0.3V. The large over-drive voltage helps keep the contribution of VTH mismatch relatively small. To reduce the noise contribution from the current mirror transistors to the LNA circuits, MOS capacitors Cd1 and Cp0-Cp15 are instantiated as bypass capacitors. They take up as much free layout area as possible, such that the noise generated by the current mirrors is negligible according to circuit simulation.

5.5 The Fault-Tolerant ASIC Design for Faulty MEMS Devices

This section discusses the practical issues in the CMUT-ASIC assembly process. The fault-tolerant transceiver front-end design, in conjunction with the use of per-element enable bits, provides an elegant solution for overcoming defective transducer elements. The method increases assembly yield and allows successful system demonstration.

A 2D CMUT array contains a large number of elements, and some of them may inevitably be defective. Currently, we obtain 16x16 2D CMUT transducer samples externally to work with our 2D ASICs for experiments. Some of these MEMS research prototypes suffer from failure mechanisms including individual shorted elements and individual open elements. The problematic elements are randomly distributed in the array, and their positions vary from device to device. For short elements, the short behavior is also observed to be related to the bias voltage. A higher VBIAS tends to create more short elements; when VBIAS is reduced, some elements that were shorted might turn back into normal elements.

Among the non-functional elements in the array, the open elements do not require special treatment. The transceiver channel with an open element is not useful, since no ultrasonic signal can be emitted or received.
But that element does not affect the transceiver circuit, nor does it prevent other elements from working properly. On the other hand, the short elements cause more problems. Because the whole 2D array is biased with a shared high voltage supply VBIAS, a short element could propagate the high voltage to the side that is connected to the circuit, exposing the transceiver circuitry to VBIAS and potentially damaging the circuit. Furthermore, if the transceiver circuit provides a relatively low impedance path to ground, VBIAS could be pulled down to close to 0V, sinking current through the low-impedance path from VBIAS to ground. Since VBIAS is shared across the array, the whole array would hardly be biased in this situation and would become useless.

While extensive research is ongoing to make the device more reliable with a lower defective element percentage, it is worthwhile to investigate methods to cope with the existing defects. In particular, given the fact that even one short element could render the whole array useless, and that achieving a 100% functional element percentage is difficult for 2D arrays with ever-growing sizes, fault-tolerance is indispensable for working with 2D CMUT arrays in the future.

Previously, a very manual process has been used to overcome the problem caused by the short elements [40, 43]. The elements in a 2D CMUT array are first tested with a probe station to identify all the short elements under a certain VBIAS. The solder bumps at the positions corresponding to the short elements are then manually removed, to prevent electrical contact between the short CMUT element and the interposer PCB. In this way, the short CMUT elements are physically isolated from the transceiver circuitry and the rest of the array can operate normally.

There are several drawbacks with this “selective bumping” approach. First, using a probe station to sweep through all 256 elements to find shorts is a very slow and manual process which is prone to errors.
Second, because each CMUT device has a unique pattern of short elements, removing the detected shorts is not an easily automated process. Lastly, this manual approach might not solve the problem completely. It has been observed that new CMUT short elements might emerge when a different VBIAS voltage is applied. Therefore, a fixed solder ball removal pattern might work at the beginning, but as soon as a single additional short element emerges, the assembly becomes unusable.

In contrast, our 2D ASIC takes advantage of circuit techniques to implement fault-tolerant transceivers, in order to eliminate the need for “PCB selective bumping”. The ASIC and CMUT are flip-chip bonded together in the usual way without selective solder ball removal, as already described in Section 4.1. Afterwards, the ASIC performs a programmable “channel removal” process electrically, used both as a scanner to detect short elements and as a selector to isolate the detected shorts.

Our solution does not require additional circuitry, but only small changes in controlling the existing front-end HV transistors in the Tx pulser and the RxSw, as shown in Figure 5-17. In each channel, a total of five front-end HV transistors are directly connected to the CMUT element, as shown in Figure 5-17(a). M1-M4 are pulser transistors and M10 is the Rx protection switch (RxSw). Their gate voltages can be controlled independently.

Figure 5-17: The technique used for detecting and isolating the short CMUT elements: (a) front-end transistors in each channel and their control voltages; (b) the effective circuit connection of all 256 channels with CMUT elements.
When all transistors are switched off, the CMUT element is effectively disconnected and “selectively removed” from the array. To detect short elements, M1 is used to provide a ground path to the CMUT, while the other four transistors are kept off. Focusing on M1, the 256-channel electrical connections between ASIC and CMUT are reduced to Figure 5-17(b). M1 from each channel is sequentially turned on and off, applying a voltage sequence of 0→30→0V to M1’s gate. For example, when M1 from channel [0] is on with its gate voltage G1.[0] at 30V, the CMUT in channel [0] is connected between ground and VBIAS. Normally, the CMUT is a capacitor at DC and the current monitored by the voltage meter is zero. But if the CMUT is shorted, the 10kΩ probing resistor would expose a leakage current through the abnormal CMUT, indicating a short element. The per-element enable bits in the Column-Row-Parallel architecture are the key factor that ensures transceiver channels are selectively enabled, so that electrical connections are made only to normal elements.
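The scan described above can be sketched in a few lines. This is a minimal illustration, not the thesis's actual test software: the callbacks `set_m1_gate` and `read_probe_voltage`, the `FakeBench` stand-in, and the 1µA leakage threshold are all hypothetical names and values introduced here; only the 256-channel count and the 10kΩ probing resistor come from the text.

```python
from typing import Callable, List, Optional, Set

N_CHANNELS = 256          # 16x16 array, from the text
R_PROBE_OHM = 10e3        # probing resistor value, from the text
LEAK_THRESHOLD_A = 1e-6   # hypothetical decision threshold, not from the source


def scan_for_shorts(set_m1_gate: Callable[[int, bool], None],
                    read_probe_voltage: Callable[[], float]) -> List[int]:
    """Turn on M1 one channel at a time and flag channels that leak DC current."""
    shorted = []
    for ch in range(N_CHANNELS):
        set_m1_gate(ch, True)                       # gate 0 -> 30 V
        leakage_a = abs(read_probe_voltage()) / R_PROBE_OHM
        set_m1_gate(ch, False)                      # gate back to 0 V
        if leakage_a > LEAK_THRESHOLD_A:
            shorted.append(ch)
    return shorted


def enable_bits(shorted: List[int]) -> List[bool]:
    """Per-element enable bits: True only for channels facing normal elements."""
    bad = set(shorted)
    return [ch not in bad for ch in range(N_CHANNELS)]


class FakeBench:
    """Hypothetical stand-in for the real bench, used only to exercise the scan."""

    def __init__(self, shorted: Set[int]):
        self.shorted = shorted
        self.active: Optional[int] = None

    def set_m1_gate(self, ch: int, on: bool) -> None:
        self.active = ch if on else None

    def read_probe_voltage(self) -> float:
        # A shorted element produces a nonzero DC voltage on the probing resistor.
        return 0.3 if self.active in self.shorted else 0.0


bench = FakeBench({5, 200})
found = scan_for_shorts(bench.set_m1_gate, bench.read_probe_voltage)
mask = enable_bits(found)
```

Because the scan is purely electrical, it can simply be re-run whenever VBIAS changes, and the resulting mask reloaded into the enable bits.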
It is the independent control over each channel that allows us to identify individual short elements. By iterating through each of the 256 channels, all short CMUT elements are identified. The ASIC is then programmed such that only the channels with normal CMUTs are enabled and contribute to the imaging operations. All transceiver channels facing shorted elements keep their front-end HV transistors cut off during all operations. If new short elements emerge in the future, the ASIC can be programmed again to easily account for the changes. Figure 5-18(a) and (b) show two example assemblies with the short elements marked in red.

[Figure 5-18 panels: (a) board7-B-SOICMUT, VBIAS=30V; (b) board8-B-SOICMUT, VBIAS=30V. Each panel is a 16x16 element map indexed 0-255, with elements 0-15 in the bottom row and 240-255 in the top row.]

Figure 5-18: Two successful 16x16 CMUT-ASIC assemblies with short CMUT elements (marked in red) isolated by the ASIC. The rest of the elements are functional and their sensitivity performance is expressed by the brightness of the elements, which will be described in detail in Section 6.4.

However, the electrical isolation does have one limitation. The maximum acceptable VBIAS is limited to the maximum rated voltage that the HV transistors can withstand. This is because the short elements would conduct VBIAS to the circuit side, and the HV transistors are therefore stressed by the voltage difference between the drain and source. A VBIAS as high as 40V has been applied without breaking the ASIC in our experiments. Normally a VBIAS of 30V is used since it already offers enough acoustic pressure and sensitivity to perform the imaging experiments.

Overall, our approach has been successful. It leverages the powerful functionality provided by the electronics implementing the Column-Row-Parallel architecture, and makes short detection a fast and automated electrical process. It does not require repetitive manual device characterization, and it can easily adapt to element property changes over time. Imaging experiments in Chapter 4 are all carried out with the short elements disabled in this way.
As also discussed in Sections 4.1 and 7.2, with several non-functional elements inside the 16x16 2D aperture, the imaging experiment results in Chapter 4 are not severely affected. In receive, interpolation is used in digital post-processing to make up for the missing elements’ signals. Transmit interpolation is also possible with pulsers that provide programmable amplitude and phase generation (the 3-level pulser design in this work has a fixed pulse amplitude), such that the neighboring channels of a missing channel can adjust their pulse shapes to compensate for the missing signal in transmit.

Furthermore, as can be seen from Figure 5-18, the responses of different channels across the array are mismatched due to device and circuit component mismatches, and due to the flip-chip bonding assembly process. This response difference in the receive path is already corrected by digital post-processing, where a weight is applied to each channel’s waveform to account for the response amplitude difference. In the transmit path, a similar correction can be applied with pulse shape pre-distortion. Because the assembly property does not change much, this correction / trimming process is static and only needs to be performed infrequently.

Lastly, the per-element addressing capability combined with the highly flexible front-end circuit design has applications beyond implementing fault-tolerance. It is a general testing and calibration infrastructure that enables programmable access to individual ultrasonic channels. In fact, the block-level circuit performance characterization is carried out by enabling only a single channel for measurement; the channel performance mismatches are evaluated by turning on different channels; and the LNA parallelism measurement is obtained by activating a different number of Rx channels for each parallelism configuration.
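The two receive-side corrections mentioned above (a per-channel weight, and interpolation over missing elements) could look roughly like the sketch below. The text does not specify the interpolation kernel or the weight normalization, so the four-neighbor average and the "scale to the most sensitive channel" rule used here are assumptions made purely for illustration, as are the function names.

```python
from typing import List, Set


def calibration_weights(sensitivity: List[float], disabled: Set[int]) -> List[float]:
    """Per-channel gain weights that equalize channels to the most sensitive one.

    Disabled (shorted) or non-responding channels get weight 0.
    """
    usable = [s for i, s in enumerate(sensitivity) if i not in disabled and s > 0]
    ref = max(usable)
    return [ref / s if (i not in disabled and s > 0) else 0.0
            for i, s in enumerate(sensitivity)]


def interpolate_missing(samples: List[float], disabled: Set[int], n: int = 16) -> List[float]:
    """Replace each disabled channel's sample by the mean of its valid 4-neighbors.

    Channels are indexed row-major on an n x n grid (index = row * n + col).
    """
    out = list(samples)
    for idx in disabled:
        row, col = divmod(idx, n)
        neighbors = []
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = row + d_row, col + d_col
            j = r * n + c
            if 0 <= r < n and 0 <= c < n and j not in disabled:
                neighbors.append(samples[j])
        out[idx] = sum(neighbors) / len(neighbors) if neighbors else 0.0
    return out


# Small 3x3 demonstration with the center element disabled.
demo_samples = [1.0, 2.0, 3.0, 4.0, 0.0, 6.0, 7.0, 8.0, 9.0]
demo_filled = interpolate_missing(demo_samples, {4}, n=3)
demo_weights = calibration_weights([1.0, 0.5, 0.8, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0], {4})
```

In this sketch the weights would be computed once from a sensitivity map and reused, which matches the observation that the correction is static.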
For an even larger 2D transducer array in the future, the ability to randomly access channels in the array is critical for device characterization, performance evaluation, and calibration.

Chapter 6

ASIC Characterization

The ASIC circuit block characterization is presented in this chapter. We have taped out two ultrasonic transceiver ASIC test chips. Before the 2D 16x16 ASIC was made, a 1D 4-channel ASIC was first fabricated and tested [5, 6]. It is designed for a 1D CMUT array with a pitch of 300µm and an element height of 3mm [41]. The 1D chip allowed us to familiarize ourselves with the CMOS high voltage process and the CMUT device properties. The 2D chip re-uses circuit blocks from the 1D chip, with innovation at the architecture level. While the 2D chip testing focuses heavily on system-level demonstrations, which have been covered in Chapter 4, the 1D chip testing focuses heavily on acoustic experiments and device characterization. In this chapter, both chips’ test results are presented, with different emphases.

6.1 Tx Ultrasonic Power and Efficiency Measurement

The most important performance specification for the ultrasonic transmitter is the power efficiency. The Tx efficiency is defined as the ratio between the transmitter’s acoustic output power and the total consumed electrical power. To obtain the Tx efficiency, the total pulsing power can be measured electrically. However, the ultrasonic power transmitted into the medium requires acoustic measurements.

This section shows how the transmitter performance is characterized with a combination of acoustic and electrical measurements. The 1D chip is used to show the characterization process, but both chips’ results are listed at the end of the section.

6.1.1 Measuring Acoustic Output Power

From (6.1), acoustic power is the product of the acoustic intensity (I) at the transducer surface and the transducer surface area (A).
I is calculated from the RMS fundamental frequency component of the acoustic pressure at the transducer surface (p_rms) and the acoustic impedance of the medium (Z_m).

\[
P_{acoustic} = I \cdot A = \frac{p_{rms}^{2}}{Z_m} \cdot A
\tag{6.1}
\]

In practice, the acoustic pressure at the transducer surface cannot be directly measured. Instead, it can be reliably back-calculated from a pressure measurement at another location. According to [28–30], when the transducer aperture is close to a square or a circle, the pressure magnitude profile along the axial direction reaches its maximum at the boundary between the near and far field. The maximum magnitude is roughly twice the pressure magnitude at the transducer surface.

For back-calculation, an acoustic pressure measurement system (see footnote 1) is established in the lab, as shown in Figure 6-1. The 1D CMUT array is submerged in vegetable oil at the bottom of the oil tank. The test chip circuitry is connected to the CMUT from under the oil tank. A hydrophone (ONDA HNC0400) is mounted on the 3D translation stage to probe the acoustic pressure magnitude generated by the CMUT in the oil medium.

Footnote 1: This measurement setup is for 1D chip testing, which is similar to the 2D chip test setup described in Chapter 4.

Figure 6-1: The photo of the lab setup for measuring the acoustic output power and the Tx efficiency.

Figure 6-2: Acoustic output power and Tx efficiency measurement setup.

Figure 6-3: Normalized RMS pressure along the transducer axial axis, measurement vs. simulation. The measurement deviates from the simulation in the near field because the hydrophone tip is too close to the transducer surface, distorting the pressure field.

Figure 6-2 shows the detailed configuration to measure the acoustic output power. The four-channel pulser circuitry is parallelized and connected to eight CMUT elements in parallel, in order to form an aperture of 2.4mm x 3mm (roughly a square). The solid curve in Figure 6-3 is the corresponding acoustic simulation of the pressure field using the Field II software, verifying that the surface pressure (z=0mm) is about half the maximum pressure (z=5.9mm). Furthermore, the hydrophone is used to probe the acoustic pressure magnitude along the axial direction (z-axis). The measured result, shown as dots in Figure 6-3, is in good agreement with both theory and simulation. In the near field, the measured data do not exhibit the amplitude fluctuations predicted by simulation. This is likely caused by the hydrophone tip distorting the pressure field as it approaches the transducer. However, it does not affect the accuracy of the maximum pressure measurement and the surface pressure back-calculation.

6.1.2 Measuring Tx Efficiency

With the hydrophone fixed at the near and far field boundary (5.9mm away), the acoustic output power is obtained with the aforementioned method. The Tx efficiency is then obtained by dividing the acoustic output power by the total power consumption.

Different pulse shapes are generated to evaluate the efficiency improvement. The pulse shape is defined by the ∆/T ratio as shown in Figure 6-4(a), where ∆ is the step duration of the middle voltage level and T is the pulse period. When ∆/T = 0, 2-level pulses are generated. As ∆/T increases, the pulses turn into 3-level pulses, reducing the dynamic power from CV²f to CV²f/2 and increasing the efficiency. But as ∆/T increases further, the acoustic power starts to decrease because less energy is contained within the pulse shape. Since the dynamic power stays at CV²f/2, the efficiency decreases. Therefore, there is an optimal pulse shape that maximizes the Tx efficiency. For example, Figure 6-4(b) is a time-domain waveform of optimal 3-level pulses at 3.3MHz. Figure 6-5 shows the measurement results.
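The trade-off described above can be illustrated with an idealized waveform model. The sketch below assumes a pulse that sits at +V/2 for T/2 − ∆, at the middle level for ∆, at −V/2 for T/2 − ∆, and at the middle level for ∆ again; this waveform and the function names are assumptions made here, not the derivation of Section 5.2.1. Under that assumption the fundamental amplitude scales as cos(π∆/T) relative to a 2-level square wave, so the fundamental power scales as cos²(π∆/T).

```python
import math


def fundamental_power_ratio(delta_over_t: float) -> float:
    """Fundamental-frequency power of the assumed 3-level pulse, relative to a
    2-level square wave of the same peak-to-peak amplitude."""
    return math.cos(math.pi * delta_over_t) ** 2


def relative_efficiency_gain(delta_over_t: float, power_ratio: float = 0.5) -> float:
    """Efficiency gain over the 2-level pulser.

    power_ratio is the 3-level total power divided by the 2-level total power;
    0.5 corresponds to the ideal CV^2f/2 versus CV^2f case.
    """
    return fundamental_power_ratio(delta_over_t) / power_ratio - 1.0


acoustic_loss = 1.0 - fundamental_power_ratio(0.067)      # Delta/T of the optimal 3.3MHz pulse
ideal_gain = relative_efficiency_gain(0.067)               # ideal CV^2f/2 power
gain_at_51_percent = relative_efficiency_gain(0.067, 0.51)
```

With ∆/T = 0.067 this model gives an acoustic power loss of about 4.4%, and an efficiency gain of about 88% when the total power ratio is 0.51, which lines up with the theoretical figures this section attributes to Section 5.2.1. The model ignores RC settling, which is one reason measured results fall short of it.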
As an example, Table 6.1 compares the optimal 3-level pulser against the 2-level pulser operating at 3.3MHz: the optimal 3-level pulser dissipates 38% less total power at the cost of delivering 7% less acoustic power. In other words, the 3-level pulser outputs 50% more acoustic power at the same power dissipation.

The measured improvement is not as big as the theoretical calculation in Section 5.2.1 (50% rather than 88%), mainly for two reasons. First, the RC settling transition distorts pulse shapes, with 3-level pulses being distorted more severely than 2-level pulses, which leads to more acoustic power reduction in a real-world 3-level pulser (7% rather than 4.4%). Second, a 3-level pulser uses more high voltage transistors than a 2-level pulser, dissipating more power for driving the transistors’ gate and drain capacitance, which leads to less total power reduction (38% rather than 49%).

Figure 6-4: (a) Tx efficiency measurement setup and pulse shape definition (30Vpp pulses). (b) Measured time-domain waveform of the optimal 3-level 3.3MHz pulses, ∆=20ns, ∆/T=0.067.

The relative efficiency improvements of a 3-level pulser over a traditional 2-level pulser at 2.5, 3.3 and 5.0MHz pulses are measured to be 56%, 50% and 43%, respectively. Table 6.2 lists the optimal 30Vpp 3-level pulser power dissipation and efficiency at all three measured frequencies. Efficiency improvement is less for pulses with a shorter period, because the same RC settling transition distorts shorter pulse shape more severely, reducing useful acoustic output power. Moreover, higher frequency pulses dissipate proportionally more dynamic power while acoustic output power is kept roughly the same, thus the overall efficiency curve shifts down. Lastly, the optimal ∆ value for the three frequencies is approximately the same (20ns), which is slightly more than the RC settling time.
This is because the optimal pulses use just enough time to settle to the middle level to achieve CV²f/2 dynamic power, while keeping the middle level as narrow as possible to maintain large fundamental frequency pulse energy delivery. When normalized over the pulse period T in Figure 6-5, the optimal ∆/T ratios become different for different pulse frequencies.

Figure 6-5: Tx efficiency measurement results using different 3-level pulse shapes by varying the ∆/T ratio and at different frequencies.

Table 6.1: Measured Power and Efficiency Comparison at 3.3MHz for the 1D ASIC and CMUT (40pF capacitance per element)
                | 2-level | Optimal 3-level | Change
Acoustic Power  | 0.56mW  | 0.52mW          | -7%
Total Power     | 84.5mW  | 52.4mW          | -38%
Efficiency      | 0.66%   | 1.0%            | 50%

Similarly, the 2D chip is designed to generate pulses at frequencies of 2-10MHz for the CMUT element size of 250µm × 250µm. The capacitance is roughly 2pF per element. Its performance is summarized in Table 6.3.

Table 6.2: Measured Optimal 3-level Pulser Performance Summary for the 1D ASIC and CMUT (40pF capacitance per element)
                                                          | 2.5MHz | 3.3MHz | 5.0MHz
Total Power (mW)                                          | 39.4   | 52.4   | 77.6
Relative Efficiency Improvement Against a 2-level Pulser  | 56%    | 50%    | 43%

Table 6.3: Measured Optimal 3-level Pulser Performance Summary for the 2D ASIC and CMUT (2pF capacitance per element)
                                                          | 4.2MHz | 5.6MHz | 8.3MHz
Total Power (mW)                                          | 7.1    | 9.6    | 14.3
Relative Efficiency Improvement Against a 2-level Pulser  | 46%    | 38%    | 18%

By comparing the optimal 3-level pulser against the 2-level pulser, this work is effectively compared against a range of traditional pulsers. The reason is that not only for 2-level pulsers [40, 41, 89], but also for multi-level pulsers without charge recycling [81–83] or pulsers implemented as linear amplifiers [79, 80], the dynamic power dissipation is always CV²f.
Therefore, these traditional pulsers have similar (if not worse, considering the quiescent power dissipation in linear amplifiers) Tx efficiency performance compared to the 2-level pulser used in this work.

Table 6.4 gives a comparison between different types of pulsers for ultrasonic imaging. The multi-level pulser in [82] (STHV748 datasheet) does not implement charge recycling, so it would consume the same amount of CV²f dynamic power as the 2-level pulser in [40] (2008) & [43] (2013) if the load is the same. The linear amplifier approach in [79] (2012), on the other hand, is more suitable for resistive transducers. Because 2D transducers typically have capacitive elements, its efficiency would be low due to quiescent power dissipation. Lastly, the discrete-level pulsers tend to generate harmonics. This work attempts to improve the pulser’s HD2 performance at the system level, employing the I&Q excitation method presented in Section 4.3.

6.2 LNA Characterization

The LNAs from the 1D and the 2D ASICs are tested as single amplifier blocks in this section. Table 6.5 and Table 6.6 summarize the measured performance numbers from the two ASICs respectively. In Table 6.7, selected performance specifications of the LNAs are compared against other CMUT LNAs in the literature.
Table 6.4: CMUT Pulser Performance Comparison
Pulser Specs                  | Our 2D ASIC             | Our 1D ASIC [5]         | [40] (2008) & [43] (2013) | [82] (STHV748 datasheet)                | [79] (2012)
CMUT Element Size (µm × µm)   | 250 x 250 (2pF)         | 300 x 3000 (40pF)       | 250 x 250 (2pF)           | General Purpose (200Ω||50pF)            | General Purpose (100Ω||150pF)
Pulser Type                   | Discrete 3-Levels       | Discrete 3-Levels       | Discrete 2-Levels         | 3- / 5-Levels, without charge recycling | Linear Amplifier
Active Power                  | 7.1mW @4.2MHz           | 77.6mW @5MHz            | N/A                       | N/A                                     | 20W
Quiescent Power               | 0                       | 0                       | 0                         | N/A                                     | 37mW
Dominant Power Dissipation    | “CV²f/2” Dynamic Power  | “CV²f/2” Dynamic Power  | “CV²f” Dynamic Power      | “CV²f” Dynamic Power                    | “V²/R” Resistive Power
Pulse Amplitude               | 30 Vpp                  | 30 Vpp                  | 25 Vpp                    | ±90 V                                   | 90 Vpp
Minimum Pulse Width/Bandwidth | 20 ns                   | 20 ns                   | 100 ns                    | 20 MHz                                  | 6.5 MHz
Linearity                     | N/A                     | N/A                     | N/A                       | N/A                                     | HD2<-43dBc

Table 6.5: Measured LNA Performance Summary for the 1D ASIC [5]
LNA Specs                      | Measured Result
Process                        | 0.18µm CMOS
Target CMUT Element Size       | 300µm × 3000µm
Active Power Consumption       | 14.3 mW
Sleep Power Consumption        | 1.5 mW
Bandwidth                      | 5.2 MHz
Transimpedance Gain            | 96.6 dBΩ
Receive Sensitivity            | 1.2 Pa(rms)
Receive Responsivity           | 162 mV/kPa
Input-referred Pressure Noise  | 0.56 mPa/√Hz @3MHz
Output-referred Voltage Noise  | 91 nV/√Hz @3MHz
Noise Figure                   | 10.3 dB @3MHz
Output P1dB                    | 618 mVpp
4-Ch Gain Mismatch             | <0.11 dBΩ
Crosstalk                      | <-47 dBc @3MHz; <-35 dBc @10MHz
Wake-up / Sleep Time           | <1µs

Table 6.6: Measured LNA Performance Summary for the 2D ASIC
LNA Specs                      | Measured Result
Process                        | 0.18µm CMOS
Target CMUT Element Size       | 250µm × 250µm
Active Power Consumption       | 1.4 mW
Sleep Power Consumption        | 0.054 mW
Bandwidth                      | 10.2 MHz
Transimpedance Gain            | 116/113.5/110/104 dBΩ
Receive Sensitivity            | 7.3 Pa(rms)
Receive Responsivity           | 123 mV/kPa
Input-referred Pressure Noise  | 2.3 mPa/√Hz @5MHz†
Input-referred Current Noise   | 0.41 pA/√Hz @5MHz
Output-referred Voltage Noise  | 289 nV/√Hz @5MHz
Noise Figure                   | 13 dB @5MHz
Output P1dB                    | 946 mVpp†
HD2                            | −46dBc @330mVpp, 2MHz tone†
HD3                            | −46dBc @330mVpp, 2MHz tone†
IMD3                           | −72dBc @324mVpp, 2MHz & 2.01MHz (-25dBc) tones†
256-Ch Gain Mismatch           | <2.0 dBΩ
Crosstalk                      | <-50 dBc @3MHz; <-22 dBc @15MHz
Wake-up / Sleep Time           | <1µs
†: These results are measured at the maximum LNA gain setting.

Table 6.7: CMUT LNA Performance Comparison
LNA Specs                                                | Our 2D ASIC       | Our 1D ASIC [5] | [40] (2008) | [43] (2013) | [73] (2010)  | [45] (2011)
CMUT Element Size (µm × µm)                              | 250 x 250         | 300 x 3000      | 250 x 250   | 250 x 250   | 63 x 1037    | 70 x 70
Active Power (mW) [Ptot]                                 | 1.4               | 14.3            | 4.0         | 9.4         | 3.8          | 6.6
Sleep Power (mW)                                         | 0.054             | 1.5             | N/A         | N/A         | N/A          | N/A
Bandwidth (MHz)                                          | 10.2              | 5.2             | 10          | 25          | 20           | 10-20
Transimpedance Gain (dBΩ)                                | 116/113.5/110/104 | 96.6            | 112.7       | 106.6       | 94.0         | 129.5
Input-referred Pressure Noise Density (mPa/√Hz) [pn,in]  | 2.3 @5MHz         | 0.56 @3MHz      | 1.8 @5MHz   | N/A         | 2.18 @10MHz  | 3.0 @15MHz
Noise Figure (dB)                                        | 13 @5MHz          | 10.3 @3MHz      | N/A         | N/A         | 10.5 @10MHz  | 1.8 @10-20MHz
Output P1dB (mVpp)                                       | 946               | 618             | N/A         | N/A         | 84.2         | N/A
NEF' (mPa·√(mW/Hz)) [pn,in·√Ptot]                        | 2.7               | 2.1             | 3.6         | N/A         | 4.2          | 7.7

Because they are used for different medical ultrasound applications, the CMUT arrays are very different in size, impedance and operating frequency. Thus, the corresponding LNA specs are also vastly different and difficult to compare. For example, the 1D CMUT used in this work is designed as an alternative to 1D PZT linear arrays operating up to 5MHz; the 2D CMUT in this work, however, has a smaller per-element size (thus smaller element capacitance) while its bandwidth is larger (up to 10MHz).

To establish a figure of merit for fair comparison and to be able to apply the data available in the CMUT LNA literature, the noise efficiency factor (NEF) commonly used for instrumentation amplifiers [90] is revised for use here. The original NEF and the revised NEF' are expressed in (6.2) and (6.3) respectively:

\[
NEF = V_{rms,in} \cdot \sqrt{\frac{2 \cdot I_{tot}}{\pi \cdot U_T \cdot 4kT \cdot BW}},
\tag{6.2}
\]

\[
NEF' = \frac{p_{rms,in}}{\sqrt{BW}} \cdot \sqrt{P_{tot}} = p_{n,in} \cdot \sqrt{P_{tot}}.
\tag{6.3}
\]

The constant factors in the original NEF are ignored, and V_rms,in is replaced by p_rms,in or p_n,in.
p_rms,in is the input-referred RMS noise amplitude in-band and p_n,in is the input-referred noise spectral density averaged inside the passband. Note that both p_rms,in and p_n,in are acoustic pressure noise, input-referred all the way to the mechanical side at the CMUT element surface, in units of Pa and Pa/√Hz respectively. This input-referred method normalizes the effect of CMUT receive sensitivity and LNA gain. Moreover, the input-referred noise spectral density at the center frequency of the passband is used to approximate p_n,in (the input-referred noise spectral density averaged inside the passband) for the actual NEF' calculation, because it is the more accessible measurement result in the literature.

The NEF' in (6.3) handles CMUT element size scaling correctly. For example, a CMUT element with 2x bigger surface area presents approximately 2x bigger input capacitance to the LNA. If two of the same LNAs are parallelized to buffer the 2x CMUT element, the same bandwidth and noise figure targets are achieved. Although the parallelization reduces the input-referred noise amplitude by √2x and increases the power consumption by 2x, the NEF' is held unchanged. This is expected since the same LNA design is used in both cases.

Another example that shows the usefulness of NEF' is [45] in Table 6.7. It achieves very low noise, as indicated by the noise figure. On the other hand, excessive power is dissipated on a very small CMUT element, which leads to a relatively high NEF'.

Figure 6-6: The die photo of the four-channel ultrasonic imaging transceiver test chip.

Table 6.7 suggests that our LNA design for the 1D CMUT achieves the lowest NEF', indicating the best power efficiency for noise and bandwidth performance. The NEF' of our 2D LNA is slightly worse than that of our 1D LNA due to the overhead needed to drive extra line capacitance and to combine analog outputs.
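As a quick numerical check of (6.3), the sketch below recomputes NEF' from the noise density and active power values listed for four of the designs in Table 6.7, and verifies the scaling argument for parallelized LNAs. The helper names are introduced here for illustration only; the input numbers are the ones reported in this chapter.

```python
import math


def nef_prime(pn_in_mpa_rthz: float, p_tot_mw: float) -> float:
    """NEF' = p_n,in * sqrt(P_tot), in mPa * sqrt(mW/Hz), following (6.3)."""
    return pn_in_mpa_rthz * math.sqrt(p_tot_mw)


def parallelize(pn_in_mpa_rthz: float, p_tot_mw: float, k: int):
    """k identical LNAs in parallel: noise density drops by sqrt(k), power grows by k."""
    return pn_in_mpa_rthz / math.sqrt(k), p_tot_mw * k


# (input-referred pressure noise density in mPa/sqrt(Hz), active power in mW)
designs = {
    "Our 2D ASIC": (2.3, 1.4),
    "Our 1D ASIC": (0.56, 14.3),
    "[40]": (1.8, 4.0),
    "[45]": (3.0, 6.6),
}

nef_values = {name: nef_prime(pn, p) for name, (pn, p) in designs.items()}

# Scaling check: two parallel copies of the 1D LNA keep the same NEF'.
pn2, p2 = parallelize(0.56, 14.3, 2)
nef_doubled = nef_prime(pn2, p2)
```

Rounded to one decimal place, the computed values agree with the NEF' row of Table 6.7 for these four designs, and the doubled configuration returns the same NEF' as the single 1D LNA.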
In addition, both designs achieve good linearity performance, as shown by the P1dB, harmonic distortion and intermodulation results.

Finally, Figure 6-6 shows a die photo of the 1D test chip fabricated in the TSMC 0.18µm high voltage CMOS process. The chip occupies a total area of 3mm × 3mm and each channel occupies an area of 300µm × 1100µm. The shared middle voltage generation circuit occupies an area of 300µm × 600µm. Figure 6-7 shows a die photo of the 2D test chip fabricated in the same CMOS process. The chip occupies a total area of 6mm × 5.5mm and each channel is element-matched to the CMUT element area of 250µm × 250µm.

Figure 6-7: The die photo of the 256-channel 16x16 2D ultrasonic imaging transceiver test chip.

6.3 The Tx Beam-Steering Experiment

Although Tx beam-steering or beam-focusing is already used on the 2D CMUT-ASIC system for real imaging experiments in Chapter 4, a tangible Tx beam-pattern demonstration would help understanding. Therefore, a simple Tx beam-steering experiment is conducted on the 1D ASIC, in which the ultrasonic lateral beam-pattern is measured. In this experiment, each of the four-channel pulsers is connected to one of four consecutive CMUT elements in the experimental setup in Figure 6-1; each pulser drives its CMUT with the 3.3MHz optimal 3-level pulses. The hydrophone is placed at a fixed depth in the transducer’s far field (z=7.4mm). By moving the hydrophone along the lateral direction (x-axis) and collecting the acoustic pressure readings, the lateral beam profile can be plotted. Furthermore, ultrasonic Tx beam-steering is demonstrated on the four-channel transmitter system by varying the relative pulsing delays across the four channels.

Figure 6-8: (a) Measured ultrasonic lateral beam profile at z=7.4mm, steered to the center (broadside). (b) Measured beam profile at z=7.4mm, with 30ns delay between channels.

Figure 6-8(a) shows the measured beam profile in dots with zero delay between channels.
The beam is steered to the center, i.e., broadside. Figure 6-8(b) shows the profile when a 30ns delay is applied between channels. The figures also show the Field II simulation results of the same experimental configurations for each case. The simulation and measured data match well. Hand calculation based on classical wave propagation provides another verification for Figure 6-8(b). The beam lateral displacement ∆x and the channel delay, t_d = 30ns, are related to each other by (6.4):

\[
\frac{\Delta x}{z} \approx \frac{t_d \cdot c}{d},
\tag{6.4}
\]

where the depth z = 7.4mm, the sound speed c in vegetable oil is measured to be 1460m/s, and the CMUT element pitch d = 300µm. Substituting these values gives ∆x ≈ 7.4mm × (30ns × 1460m/s) / 300µm ≈ 1.08mm, which is consistent with the measured result.

6.4 The Pulse-Echo Experiment

The pulse-echo experiment characterizes the complete ultrasound signal chain. The 1D ASIC test setup in Figure 6-1 is revised to perform the experiment. As shown in Figure 6-9, the pulser drives a single CMUT element with a wideband pulse as an approximation to the ideal impulse excitation. The narrowest pulse that can be generated from the pulser is a 2-level 30Vpp pulse with 20ns pulse width (Figure 6-10(a)). The excited ultrasonic wave then propagates through the oil medium and is reflected back at the oil-air boundary 26mm away from the transducer (the hydrophone is not needed for this experiment). The reflected echo is received by the same CMUT element and amplified by the LNA (Figure 6-10(b)).

Figure 6-9: The setup of the pulse-echo experiment for characterizing the complete ultrasound channel.

Because the CMUT blocks the DC component and acts as a differentiator, the received echo looks similar to the derivative of the transmitted pulse, with a positive peak and a negative peak corresponding to the rising edge and the falling edge of the transmitted pulse. The echo duration is about 0.3µs, corresponding to the dominant frequencies (3-5MHz) that go through the ultrasound signal chain.
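As a sanity check on the pulse-echo geometry, the expected arrival time of the echo follows from the round-trip path to the oil-air boundary. The short sketch below uses the 26mm distance and the 1460m/s sound speed quoted in this chapter; the function name is introduced here for illustration.

```python
def round_trip_time_us(distance_mm: float, sound_speed_m_per_s: float) -> float:
    """Time for a pulse to reach a reflector and return, in microseconds."""
    path_m = 2.0 * distance_mm * 1e-3
    return path_m / sound_speed_m_per_s * 1e6


echo_arrival_us = round_trip_time_us(26.0, 1460.0)

# An echo lasting about 0.3 us corresponds to roughly one cycle at this frequency (MHz).
approx_cycle_freq_mhz = 1.0 / 0.3
```

The round trip comes out at roughly 35.6µs, and one cycle of 0.3µs corresponds to about 3.3MHz, which falls inside the 3-5MHz dominant band mentioned above.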
The echo’s FFT in Figure 6-10(c) confirms the intuition. It shows the spectrum of the total channel impulse response, including the CMUT, the oil medium and the LNA. It mainly reflects the band-pass characteristic and the wide bandwidth of the CMUT device, with a center frequency of 4.5MHz and a -6dB fractional bandwidth of 116% (the -6dB bandwidth is used instead of -3dB because the spectrum shows the combined two-way CMUT characteristic).

[Figure 6-10 panels: (a) Transmitted Pulse Waveform, voltage (V) vs. time (µs); (b) Received Echo Waveform, voltage (V) vs. time (µs); (c) Spectrum of Received Echo Waveform, amplitude (dB) vs. frequency (MHz), annotated with BW = 5.2MHz, f0 = 4.5MHz, and band edges at 2.3MHz and 7.5MHz.]

Figure 6-10: The key waveforms from the pulse-echo experiment, showing the ultrasound channel characteristics. (a) The transmitted pulse waveform. (b) The received echo waveform. (c) The spectrum of the received echo waveform.

Similarly, the 2D ASIC performs the pulse-echo experiment on all of its 16x16 transceiver channels, which show (on average) a center frequency of 6.25MHz and a -6dB fractional bandwidth of 75% for the CMUT-ASIC total channel response. The reflected echo amplitude also indicates the channel sensitivity. By collecting the amplitudes of all echoes, the sensitivity map of the array can be obtained, as shown in Figure 6-11 (a re-plot of Figure 5-18). Except for the shorted elements in red, the working elements are drawn in grayscale. The brightness encodes the normalized sensitivity of each CMUT element. The sensitivity map can be used for digital calibration of channel gain mismatch for the 2D array.

[Figure 6-11 panels: board7-B-SOICMUT, VBIAS=30V; board8-B-SOICMUT, VBIAS=30V; each panel is a 16x16 element map with elements indexed 0 to 255.]

Figure 6-11: A re-plot of Figure 5-18 in Section 5.5. Two successful 16x16 CMUT-ASIC assemblies with shorted CMUT elements (marked in red) isolated by the ASIC. The rest of the elements are functional and their sensitivity is expressed by the brightness of the elements.

Chapter 7 Conclusion

7.1 Summary of Contributions

In summary, this thesis presents a Column-Row-Parallel ASIC architecture as a scalable and flexible hardware solution for 3D wearable / portable medical ultrasound applications. The architecture provides an “N” interconnection complexity and an “N” acquisition time complexity for an NxN 2D ultrasonic transducer array, so it remains scalable as the array size grows. The architecture offers column-parallel and row-parallel operations, and fine-granularity per-element selection control, which makes the hardware system flexible for different ultrasonic imaging algorithms. A 16x16 ASIC ultrasonic transceiver test chip interfacing to a 16x16 CMUT array is designed and fabricated to demonstrate the proposed architecture. A plane-wave coherent compounding algorithm in 3D (PWCC3D) is implemented on the system assembly as a fast volume rate, high quality, volumetric imaging algorithm.

The architecture also enables a technique for HD2 reduction in the transmitters used in ultrasonic harmonic imaging mode. The interleaved checker board patterns with I&Q excitations reduce Tx HD2 by over 20dB compared to conventional methods. This technique is applicable to nonlinearity from both CMUT transducers and circuits, and it works for arbitrary pulse shapes.

The circuit design of the 16x16 ASIC transceiver is optimized for the target CMUT transducer. The high-voltage transmitter uses a 3-level pulse-shaping technique with charge recycling to improve the power efficiency. The design requires minimal off-chip components and is scalable to more channels.
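A first-order way to see why a 3-level pulse-shaping technique with charge recycling improves efficiency is to compare the energy dissipated when a capacitive load is driven in one voltage step versus two equal steps. The sketch below is a textbook step-wise charging estimate, not the circuit model used in this thesis: it assumes a purely capacitive load, an ideal intermediate rail whose charge is fully recycled, and a hypothetical 10pF load capacitance. The 30V swing corresponds to the 30Vpp pulser amplitude quoted in Section 6.4.

```python
def cycle_dissipation(c_load, v_swing, n_steps):
    """Energy (J) dissipated per full charge/discharge cycle of a capacitor
    driven through n_steps equal voltage steps from ideal rails.
    Each step of size dV dissipates 0.5 * C * dV**2 in the switch resistance."""
    dv = v_swing / n_steps
    per_edge = n_steps * 0.5 * c_load * dv ** 2
    return 2.0 * per_edge  # one rising edge plus one falling edge


C_LOAD = 10e-12  # F, hypothetical CMUT element capacitance (assumption)
V_SWING = 30.0   # V, peak-to-peak pulse amplitude

# A 2-level pulser swings the load in one step; a 3-level pulser
# (low, mid, high) swings it in two equal steps.
e_two_level = cycle_dissipation(C_LOAD, V_SWING, n_steps=1)
e_three_level = cycle_dissipation(C_LOAD, V_SWING, n_steps=2)
ratio = e_three_level / e_two_level

print(f"2-level: {e_two_level * 1e9:.2f} nJ per cycle")
print(f"3-level: {e_three_level * 1e9:.2f} nJ per cycle")
print(f"ratio:   {ratio:.2f}")
```

Under these idealized assumptions the ratio is 0.5 regardless of the chosen capacitance, which is in line with the halving of CMUT dynamic power stated for the 3-level scheme in the list of low-power techniques in this chapter.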
The receiver is implemented with a transimpedance amplifier topology and is optimized for trade-offs between noise, bandwidth, and power dissipation.

The test chip is characterized with both acoustic and electrical measurements. Comparing the 3-level pulser against traditional 2-level pulsers, the measured Tx efficiency shows 50% more acoustic power delivery with the same total power dissipation. The CMUT receiver achieves the lowest noise efficiency factor reported in the literature (2.1 compared to a previously reported lowest of 3.6, in units of $\mathrm{mPa} \cdot \sqrt{\mathrm{mW/Hz}}$).

In addition, both transmitters and receivers can be parallelized to efficiently implement the Column-Row-Parallel architecture. In particular, a special output stage is implemented for the receiver LNA, such that the analog outputs are combined for a higher SNR, which scales with the number of LNAs as 10 log(N) dB.

The 2D transceiver chip is also designed to be fault-tolerant against defects in CMUT arrays. The transceiver channels can be used to detect shorted CMUT elements and then disconnect the non-functional elements from the array. This selective element disabling capability is realized electrically and can be automated. This design strategy has proven to be critical for working with faulty MEMS devices. It is especially beneficial for 2D arrays with large element counts, and it is a highly desirable feature for front-end sensor interface circuit design. The random access to the array elements serves as a flexible testing infrastructure in general. Not only can faulty channels be detected and isolated, but performance characterizations of functional channels can also be obtained with this programmable interface.

Several low-power circuit design techniques are used in this work to make the system suitable for wearable / portable applications.
They are summarized as follows:

• The multi-level pulse shaping technique is combined with a charge-recycling, regulated power supply to implement the transmit pulser in Section 5.2.1. The dynamic power consumed by the capacitive load of the CMUT element is reduced by half with a 3-level pulsing scheme. The regulated power supply that recycles the charge is implemented with a shared DC-DC power converter, which requires only two off-chip capacitors. The circuit is scalable to more channels, and is easily integrated.

• The transimpedance amplifier topology for the receiver LNA is optimized for the best power efficiency, given the bandwidth, gain and noise performance requirements. The optimization procedure is described in Section 5.3.1. The revised noise efficiency factor (NEF’) shows that the amplifier design achieves the lowest power consumption while meeting all design targets. In addition to the low active power dissipation, the amplifier also implements a very low power sleep mode. Both the main amplifier stages and the biasing currents are turned off in sleep mode by auxiliary switches, which help attain an amplifier recovery time of less than 1µs during wake-up. With some prior information about the scene (e.g., from a coarse image of the space in the first pass), the receiver signal chain can be put into sleep mode for power savings when imaging of certain regions is not needed. Therefore, the sleep mode offers flexibility for system-level power scheduling.

• The receiver output stage is designed to facilitate the analog signal combining. A source follower stage is specially sized to provide proper signal current summing, while overcoming the parasitics from the 2D interconnect lines. The optimization procedure is shown in Section 5.3.3. Similar to the sleep mode, the receiver output stage could benefit the system flexibility by providing programmable receiver parallelism. More receivers can be parallelized for a better signal quality (i.e.,
SNR) when the imaging algorithm requires it.

• At the algorithm level, flexible beam-formation schemes are proposed, such that power consumption can be tuned according to the image quality requirement. In Sections 4.2 and 4.4, the 3D plane-wave coherent compounding (PWCC3D) algorithm and the annular ring aperture imaging method are presented as scalable imaging algorithms that can adjust image volume rate, contrast and resolution performance for variable system power dissipation. For example, with fewer transmit plane-wave angles in PWCC3D, or fewer annular rings formed in the ring apertures, the image contrast and resolution are degraded, but the energy consumed for data acquisition of one volumetric image is decreased. Moreover, after data acquisition, PWCC3D beamforming processing is also scalable. A low resolution volumetric image should first be computed with relatively low power consumption; a higher resolution image can then be computed based on the region of interest. The latter consumes more digital computation power, but offers proportionally higher image resolution performance.

7.2 Future Work

Several possible improvements can be made for the 16x16 Column-Row-Parallel ASIC:

• A more complete analog front-end design would require on-chip ADCs. Because the scalable Column-Row-Parallel architecture offers “N” I/O complexity, only 16 (rather than 256) ADCs are required for the 16x16 ASIC, which is practical to implement. In fact, there are many octal analog front-end ASICs commercially available for conventional 1D ultrasonic arrays [91–93], where each ADC channel occupies 2 LVDS output pins to output serialized digital data. The Column-Row-Parallel ASIC with on-chip ADCs could take the same strategy to deal with the massive amount of data and save pin count.
• When the 2D array size grows beyond 16x16, if a single ASIC with excessive silicon area is to be avoided for yield and reliability reasons, multiple Column-Row-Parallel ASICs could be tiled together for expansion. For example, four 16x16 ASICs make a 32x32 front-end system. To simplify the tiling assembly, the ASIC layout could be re-arranged, such that only two sides, instead of all four sides, of the chip have extra area for peripheral I/Os. In this way, the 16x16 transceiver array is exposed on two sides, so that four of the same ASIC chips can simply be abutted to form a 32x32 transceiver array, as shown in Figure 7-1.

Figure 7-1: Four 16x16 ASICs tiled together for a 32x32 imaging front-end.

• The current system assembly is accomplished by using an interposer PCB as shown in Section 4.1. It helps adapt to different CMUT footprints and it serves as an intermediate substrate, such that the ASIC I/Os can be connected to the main testing board. But it increases the assembly complexity and adds parasitic capacitance and resistance to the CMUT-ASIC interconnection. In the future, the interposer PCB can be eliminated by adopting new process and assembly technologies for the interconnection, such as through-silicon vias (TSVs), or the co-assembly of wire-bonding and flip-chip bonding. Their corresponding ASIC I/O connection methods to the main testing PCB are shown in Figure 7-2(a) and (b).

Figure 7-2: CMUT-ASIC assembly alternatives to eliminate the interposer PCB: (a) TSV technology for interconnecting ASIC I/Os to the main testing PCB; (b) Applying flip-chip bonding technology for CMUT-ASIC interconnection and wire-bonding for ASIC I/Os.

• As briefly mentioned in Section 5.2.3, the future ASIC could make the pulser gate driver programmable in driving strength, to dynamically adapt to different numbers of active pulsers on a column / row line, leading to power savings.
Similarly for the Rx path, if the ADCs are implemented, ADCs with configurable accuracy (i.e., number of bits) can be designed to adapt to different numbers of active LNAs (different SNR) along the line to save power.

• The programmable gain Rx LNAs in the array are currently controlled globally. It would be more flexible to have control over individual LNAs, by adding configuration bits per element. Moreover, programmable Tx pulse amplitude control can be realized with a multi-level pulser design with per-element controls. Different voltage levels can be used to realize pulses with different amplitudes. This programmable functionality could enable fine-granularity, flexible apodization for both Tx and Rx apertures. In particular, it can be used to perform signal strength compensation against the channel mismatches seen in Figure 5-18, and to compensate for the missing channels by adjusting neighboring channels’ pulse shapes (amplitude and phase), as mentioned in Sections 4.1 and 5.5.

At the system level, work is being done by Bonnie Lam, under the guidance of Prof. Anantha Chandrakasan and Prof. Charles Sodini, to design a custom digital test chip that performs 3D beam-formation based on the Column-Row-Parallel analog front-end ASIC made in this thesis. This thesis aims to demonstrate the Column-Row-Parallel architecture as a promising hardware system framework for efficient, low-power, and scalable 3D ultrasonic imaging for wearable / portable applications. The analog front-end circuit implementation is the focus, while the system-level digital beam-formation processing is performed on a PC. To demonstrate a complete wearable / portable ultrasonic imaging device, a real-time low-power digital beam-former that is optimized for the Column-Row-Parallel analog front-end is indispensable. Furthermore, intelligence can be embedded in the beam-former chip, such that the ultrasound device becomes adaptive and autonomous.
The beam-former could understand the scene based on its beam-formed data, and improve its imaging strategy correspondingly. One example of intelligence has been described in Section 4.2. One could control the PWCC3D algorithm to either obtain volumetric images of a large space with coarse spatial resolution, or to “zoom into” a smaller region with finer resolution, under certain volume rate or power constraints. The beam-former chip could exploit such algorithm features and provide feedback controls to the analog front-end to realize a closed-loop adaptive imaging system.

On another thread, PMUT is currently being investigated as an alternative transducer technology to CMUT by Katherine Smyth, under the guidance of Prof. Sang-Gook Kim at MIT. Because the Column-Row-Parallel architecture is independent of transducer technology, implementation of a Column-Row-Parallel analog front-end for PMUT would be interesting. Block-level circuit optimization is different for PMUT due to its different device characteristics as compared to CMUT. To understand the performance differences between PMUT and CMUT, and the impact on circuit topology, a detailed comparison study is currently ongoing.

Bibliography

[1] G. E. Moore, “Cramming more components onto integrated circuits,” Proc. IEEE, vol. 86, no. 1, pp. 82–85, Jan 1998.
[2] C. Prinz and J. Voigt, “Diagnostic accuracy of a hand-held ultrasound scanner in routine patients referred for echocardiography,” Journal of the American Society of Echocardiography, 2010.
[3] S. Nikolov and J. Jensen, “3d synthetic aperture imaging using a virtual source element in the elevation plane,” in Ultrasonics Symposium, 2000 IEEE, vol. 2, Oct 2000, pp. 1743–1747.
[4] S. Nikolov, J. Jensen, R. Dufait, and A. Schoisswohl, “Three-dimensional real-time synthetic aperture imaging using a rotating phased array transducer,” in Ultrasonics Symposium, 2002. Proceedings. 2002 IEEE, vol. 2, Oct 2002, pp. 1585–1588.
[5] K. Chen, H.-S.
Lee, A. Chandrakasan, and C. Sodini, “Ultrasonic imaging transceiver design for cmut: A three-level 30-vpp pulse-shaping pulser with improved efficiency and a noise-optimized receiver,” Solid-State Circuits, IEEE Journal of, vol. 48, no. 11, pp. 2734–2745, 2013.
[6] K. Chen, A. Chandrakasan, and C. Sodini, “Ultrasonic imaging front-end design for cmut: A 3-level 30vpp pulse-shaping pulser with improved efficiency and a noise-optimized receiver,” in Solid State Circuits Conference (A-SSCC), 2012 IEEE Asian, 2012, pp. 173–176.
[7] K. Chen, B. Lam, C. Sodini, and A. Chandrakasan, “System energy model for a digital ultrasound beamformer with image quality control,” in Ultrasonics Symposium (IUS), 2012 IEEE International, 2012, pp. 615–618.
[8] G. S. Kino, Acoustic Waves: Devices, Imaging, and Analog Signal Processing. Prentice Hall, 1987.
[9] R. S. C. Cobbold, Foundations of Biomedical Ultrasound. Oxford University Press, 2006.
[10] D. Olendorf, C. Jeryan, and K. Boyden, The Gale encyclopedia of medicine. Gale Research (Detroit, MI), 1999.
[11] J. A. Jensen, Estimation of Blood Velocities Using Ultrasound, A Signal Processing Approach. Cambridge University Press, 1996.
[12] T. Szabo, Diagnostic Ultrasound Imaging: Inside Out. Elsevier, 2004.
[13] P. Satamura, “Study of the flow patterns in peripheral arteries by ultrasonics,” J. Acoust. Soc. Japan, vol. 15, pp. 151–158, 1959.
[14] D. Baker, “Pulsed ultrasonic doppler blood-flow sensing,” Sonics and Ultrasonics, IEEE Transactions on, vol. 17, no. 3, pp. 170–184, Jul 1970.
[15] C. Kasai, K. Namekawa, A. Koyano, and R. Omoto, “Real-time two-dimensional blood flow imaging using an autocorrelation technique,” IEEE Transactions on Sonics and Ultrasonics, vol. SU-32, no. 3, pp. 458–463, May 1985.
[16] M. Anderson, M. McKeag, and G. Trahey, “The impact of sound speed errors on medical ultrasound imaging,” The Journal of the Acoustical Society of America, vol. 107, p. 3540, 2000.
[17] D. H. Evans and W. N.
McDicken, Doppler Ultrasound (Second ed.). John Wiley and Sons, 2000.
[18] F. Tranquart, N. Grenier, V. Eder, and L. Pourcelot, “Clinical use of ultrasound tissue harmonic imaging,” Ultrasound in medicine & biology, vol. 25, no. 6, pp. 889–894, 1999.
[19] A. Novell, M. Legros, N. Felix, and A. Bouakaz, “Exploitation of capacitive micromachined transducers for nonlinear ultrasound imaging,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 56, no. 12, pp. 2733–2743, 2009.
[20] S. Satir and F. L. Degertekin, “Harmonic reduction in capacitive micromachined ultrasonic transducers by gap feedback linearization,” IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control, vol. 59, no. 1, pp. 50–59, Jan 2012.
[21] F. Lin, C. Cachard, R. Mori, J. Viti, F. Varray, F. Guidi, and O. Basset, “Influences of bubble motion to second-harmonic inversion imaging,” in Ultrasonics Symposium (IUS), 2012 IEEE International, 2012, pp. 675–678.
[22] M. Pasovic, M. Danilouchkine, T. Faez, P. L. van Neer, C. Cachard, A. F. van der Steen, O. Basset, and N. de Jong, “Second harmonic inversion for ultrasound contrast harmonic imaging,” Physics in Medicine and Biology, vol. 56, no. 11, p. 3163, 2011.
[23] J. Rubin, R. Bude, P. Carson, R. Bree, and R. Adler, “Power doppler us: a potentially useful alternative to mean frequency-based color doppler us.” Radiology, vol. 190, no. 3, p. 853, 1994.
[24] J. Platt, J. Rubin, J. Ellis, and M. DiPietro, “Duplex doppler us of the kidney: differentiation of obstructive from nonobstructive dilatation.” Radiology, vol. 171, no. 2, p. 515, 1989.
[25] A. Yuan, P. Yang, D. Chang, C. Yu, S. Kuo, and K. Luh, “Lung sequestration. diagnosis with ultrasound and triplex doppler technique in an adult.” Chest, vol. 102, no. 6, p. 1880, 1992.
[26] K. Thomenius, “Evolution of ultrasound beamformers,” in Ultrasonics Symposium, 1996. Proceedings., 1996 IEEE, vol. 2, Nov 1996, pp. 1615–1622.
[27] E.
Brunner, “Ultrasound system considerations and their impact on front-end components,” Analog Devices, 2002.
[28] J. Bushberg, The essential physics of medical imaging. Williams & Wilkins, 2002.
[29] H. Pettersson, The Encyclopaedia of Medical Imaging: Physics, Techniques and Procedures vol. 1. Taylor & Francis Ltd, 1998.
[30] X. Zeng and R. J. McGough, “Evaluation of the angular spectrum approach for simulations of near-field pressures,” The Journal of the Acoustical Society of America, vol. 123, no. 1, p. 68, 2008.
[31] C. Capps, “Near field or far field,” EDN, August, vol. 16, pp. 95–102, 2001.
[32] L. Steiner and P. Andrews, “Monitoring the injured brain: Icp and cbf,” British journal of anaesthesia, vol. 97, no. 1, p. 26, 2006.
[33] F. M. Kashif, T. Heldt, and V. G. C., “Model-based estimation of intracranial pressure and cerebrovascular autoregulation,” Comput Cardiol, pp. 35: 369–372, 2008.
[34] W. Mason, Electromechanical transducers and wave filters. Van Nostrand Reinhold, 1946.
[35] F. V. Hunt and D. T. Blackstock, Electroacoustics: the analysis of transduction, and its historical background. American Institute of Physics for the Acoustical Society of America, 1982.
[36] C. H. Sherman and J. L. Butler, Transducers and arrays for underwater sound. Springer, 2007.
[37] B. Savord and R. Solomon, “Fully sampled matrix transducer for real time 3d ultrasonic imaging,” in Ultrasonics, 2003 IEEE Symposium on, vol. 1, 2003, pp. 945–953.
[38] C. H. Seo and J. Yen, “A 256 x 256 2-d array transducer with row-column addressing for 3-d rectilinear imaging,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 56, no. 4, pp. 837–847, 2009.
[39] O. Oralkan, A. Ergun, J. Johnson, M. Karaman, U. Demirci, K. Kaviani, T. Lee, and B. Khuri-Yakub, “Capacitive micromachined ultrasonic transducers: next-generation arrays for acoustic imaging?” IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control, vol. 49, no. 11, pp.
1596–1610, Nov 2002.
[40] I. Wygant, X. Zhuang, D. Yeh, O. Oralkan, A. Ergun, M. Karaman, and B. Khuri-Yakub, “Integration of 2d cmut arrays with front-end electronics for volumetric ultrasound imaging,” IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control, vol. 55, no. 2, pp. 327–342, Feb 2008.
[41] O. Oralkan, “Acoustic imaging using capacitive micromachined ultrasonic transducer arrays: devices, circuits, and systems,” Ph.D. dissertation, Stanford University, 2004.
[42] I. Wygant, N. Jamal, H. Lee, A. Nikoozadeh, O. Oralkan, M. Karaman, and B. Khuri-yakub, “An integrated circuit with transmit beamforming flip-chip bonded to a 2-d cmut array for 3-d ultrasound imaging,” IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control, vol. 56, no. 10, pp. 2145–2156, Oct 2009.
[43] A. Bhuyan, J. W. Choe, B. C. Lee, I. Wygant, A. Nikoozadeh, O. Oralkan, and B. T. Khuri-Yakub, “3d volumetric ultrasound imaging with a 32x32 cmut array integrated with front-end ICs using flip-chip bonding technology.” IEEE International Solid-State Circuits Conference (ISSCC), Digest of Technical Papers, Feb 2013, pp. 396–397.
[44] J. Zahorian, M. Hochman, T. Xu, S. Satir, G. Gurun, M. Karaman, and F. Degertekin, “Monolithic cmut-on-cmos integration for intravascular ultrasound applications,” IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control, vol. 58, no. 12, pp. 2659–2667, Dec 2011.
[45] G. Gurun, P. Hasler, and F. L. Degertekin, “Front-end receiver electronics for high-frequency monolithic cmut-on-cmos imaging arrays,” IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control, vol. 58, no. 8, pp. 1658–1668, Aug 2011.
[46] P. Helin, P. Czarnecki, A. Verbist, G. Bryce, X. Rottenberg, and S. Severi, “Poly-SiGe-based cmut array with high acoustical pressure.” IEEE International Conference on Micro Electro Mechanical Systems (MEMS), Jan 2012, pp. 305–308.
[47] D. Dausch, J. Castellucci, D. Chou, and O.
Von Ramm, “Theory and operation of 2-d array piezoelectric micromachined ultrasound transducers,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 55, no. 11, pp. 2484–2492, 2008.
[48] A. Hajati, D. Latev, D. Gardner, A. Hajati, D. Imai, M. Torrey, and M. Schoeppler, “Three-dimensional micro electromechanical system piezoelectric ultrasound transducer,” Applied Physics Letters, vol. 101, no. 25, pp. 253101–253101-5, 2012.
[49] K. Smyth, S. Bathurst, F. Sammoura, and S.-G. Kim, “Analytic solution for n-electrode actuated piezoelectric disk with application to piezoelectric micromachined ultrasonic transducers,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 60, no. 8, pp. 1756–1767, 2013.
[50] P. Muralt, N. Ledermann, J. Paborowski, A. Barzegar, S. Gentil, B. Belgacem, S. Petitgrand, A. Bosseboeuf, and N. Setter, “Piezoelectric micromachined ultrasonic transducers based on pzt thin films,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 52, no. 12, pp. 2276–2288, 2005.
[51] I. Wygant, “A comparison of cmuts and piezoelectric transducer elements for 2d medical imaging based on conventional simulation models,” in Ultrasonics Symposium (IUS), 2011 IEEE International, 2011, pp. 100–103.
[52] J. Jensen, “Field: A program for simulating ultrasound systems,” in Nordic-Baltic Conference on Biomedical Imaging, 1996.
[53] J. A. Jensen and N. B. Svendsen, “Calculation of pressure fields from arbitrarily shaped, apodized, and excited ultrasound transducers,” IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control, vol. 39, no. 2, pp. 262–267, Mar 1992.
[54] M. Karaman, I. Wygant, O. Oralkan, and B. Khuri-Yakub, “Minimally redundant 2-d array designs for 3-d medical ultrasound imaging,” Medical Imaging, IEEE Transactions on, vol. 28, no. 7, pp. 1051–1061, 2009.
[55] B.-H. Kim, T.-K. Song, Y. Yoo, J. H. Chang, S. Lee, Y. Kim, K. Cho, and J.
Song, “Hybrid volume beamforming for 3-d ultrasound imaging using 2-d cmut arrays,” in Ultrasonics Symposium (IUS), 2012 IEEE International, 2012, pp. 2246–2249.
[56] J. Song, S. Jung, Y. Kim, K. Cho, B. Kim, S. Lee, J. Na, I. Yang, O.-k. Kwon, and D. Kim, “Reconfigurable 2d cmut-asic arrays for 3d ultrasound image,” in SPIE Medical Imaging. International Society for Optics and Photonics, 2012, pp. 83201A–83201A.
[57] B.-H. Kim, Y. Kim, S. Lee, K. Cho, and J. Song, “Design and test of a fully controllable 64x128 2-d cmut array integrated with reconfigurable frontend asics for volumetric ultrasound imaging,” in Ultrasonics Symposium (IUS), 2012 IEEE International, 2012, pp. 77–80.
[58] M. Rasmussen and J. Jensen, “3-d ultrasound imaging performance of a row-column addressed 2-d array transducer: A measurement study,” in Ultrasonics Symposium (IUS), 2013 IEEE International, 2013.
[59] T. Christiansen, C. Dahl-Petersen, J. Jensen, and E. Thomsen, “2-d row-column cmut arrays with an open-grid support structure,” in Ultrasonics Symposium (IUS), 2013 IEEE International, 2013.
[60] M. Rasmussen and J. Jensen, “2-d row-column cmut arrays with an open-grid support structure,” in Proceedings of SPIE, vol. 8675. SPIE - International Society for Optical Engineering, 2013.
[61] X. Zhuang, D.-S. Lin, A. Ergun, O. Oralkan, and B. Khuri-Yakub, “P2p-6 trench-isolated cmut arrays with a supporting frame,” in Ultrasonics Symposium, 2006. IEEE, 2006, pp. 1955–1958.
[62] D.-S. Lin, R. Wodnicki, X. Zhuang, C. Woychik, K. Thomenius, R. Fisher, D. Mills, A. Byun, W. Burdick, P. Khuri-Yakub, B. Bonitz, T. Davies, G. Thomas, B. Otto, M. Topper, T. Fritzsch, and O. Ehrmann, “Packaging and modular assembly of large-area and fine-pitch 2-d ultrasonic transducer arrays,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 60, no. 7, pp. 1356–1375, 2013.
[63] D.-S. Lin, X. Zhuang, R. Wodnicki, C. Woychik, O. Omer, M. Kupnik, and B.
Khuri-Yakub, “Packaging of large and low-pitch size 2d ultrasonic transducer arrays,” in Micro Electro Mechanical Systems (MEMS), 2010 IEEE 23rd International Conference on, 2010, pp. 508–511.
[64] S. Smith, H. Pavy Jr, and O. von Ramm, “High-speed ultrasound volumetric imaging system. i. transducer design and beam steering,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 38, no. 2, pp. 100–108, 1991.
[65] O. von Ramm, S. Smith, and H. Pavy Jr, “High-speed ultrasound volumetric imaging system. ii. parallel processing and image display,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 38, no. 2, pp. 109–115, 1991.
[66] J. Bercoff, “Ultrafast ultrasound imaging,” Ultrasound Imaging - Medical Applications, Prof. Oleg Minin (Ed.), InTech, Available from: http://www.intechopen.com/books/ultrasound-imaging-medical-applications/ultrafast-ultrasound-imaging, pp. 3–24, 2011.
[67] O. Couture, M. Fink, and M. Tanter, “Ultrasound contrast plane wave imaging,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 59, no. 12, pp. –, 2012.
[68] G. Montaldo, M. Tanter, J. Bercoff, N. Benech, and M. Fink, “Coherent plane-wave compounding for very high frame rate ultrasonography and transient elastography,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 56, no. 3, pp. 489–506, 2009.
[69] J. Bercoff, M. Tanter, and M. Fink, “Supersonic shear imaging: a new technique for soft tissue elasticity mapping,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 51, no. 4, pp. 396–409, 2004.
[70] S. Nikolov, J. Kortbek, and J. Jensen, “Practical applications of synthetic aperture imaging,” in Ultrasonics Symposium (IUS), 2010 IEEE, 2010, pp. 350–358.
[71] M. Docter, R. Beurskens, G. Ferin, P. Brands, J. Bosch, and N.
de Jong, “A matrix phased array system for 3d high frame-rate imaging of the carotid arteries,” in Ultrasonics Symposium (IUS), 2010 IEEE, 2010, pp. 318–321.
[72] S. Krishnan and M. O’Donnell, “Transmit aperture processing for nonlinear contrast agent imaging,” Ultrasonic imaging, vol. 18, no. 2, pp. 77–105, 1996.
[73] A. Nikoozadeh, “Intracardiac ultrasound imaging using capacitive micromachined ultrasonic transducer (cmut) arrays,” Ph.D. dissertation, Stanford University, 2010.
[74] A. Nikoozadeh, I. Wygant, D.-S. Lin, O. Oralkan, A. Ergun, D. Stephens, K. Thomenius, A. Dentinger, D. Wildes, G. Akopyan, K. Shivkumar, A. Mahajan, D. Sahn, and B. Khuri-Yakub, “Forward-looking intracardiac ultrasound imaging using a 1-d cmut array integrated with custom front-end electronics,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 55, no. 12, pp. 2651–2660, 2008.
[75] D. Yeh, O. Oralkan, I. Wygant, M. O’Donnell, and B. Khuri-Yakub, “3-d ultrasound imaging using a forward-looking cmut ring array for intravascular/intracardiac applications,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 53, no. 6, pp. 1202–1211, 2006.
[76] C. Tekes, M. Karaman, and F. Degertekin, “Optimizing circular ring arrays for forward-looking ivus imaging,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 58, no. 12, pp. –, 2011.
[77] R. Fisher, K. Thomenius, R. Wodnicki, R. Thomas, S. Cogan, C. Hazard, W. Lee, D. Mills, B. Khuri-Yakub, A. Ergun, and G. Yaralioglu, “Reconfigurable arrays for portable ultrasound,” in Ultrasonics Symposium, 2005 IEEE, vol. 1, Sept 2005, pp. 495–499.
[78] R. Fisher, R. Wodnicki, S. Cogan, R. Thomas, D. Mills, C. Woychik, R. Lewandowski, and K. Thomenius, “Packaging and design of reconfigurable arrays for volumetric imaging,” in Ultrasonics Symposium, 2007. IEEE, Oct 2007, pp. 407–410.
[79] D. Bianchi, F. Quaglia, A. Mazzanti, and F.
Svelto, “A 90Vpp 720MHz GBW linear power amplifier for ultrasound imaging transmitters in BCD6-SOI.” IEEE International Solid-State Circuits Conference (ISSCC), Digest of Technical Papers, Feb 2012, pp. 370–372.
[80] B. Haider, “Power drive circuits for diagnostic medical ultrasound.” IEEE International Symposium on Power Semiconductor Devices and IC’s, 2006, pp. 1–8.
[81] “MD1712 data sheet: High speed, integrated ultrasound driver IC,” Supertex, Sunnyvale, CA, USA.
[82] “STHV748 data sheet: Quad +/-90V, +/-2A, 3/5 levels, high speed ultrasound pulser,” STMicroelectronics, Geneva, Switzerland.
[83] “TX734 data sheet: Quad channel, 3-level RTZ, +/-75V, 2A integrated ultrasound pulser,” Texas Instruments, Dallas, TX, USA.
[84] L. Svensson and J. Koller, “Driving a capacitive load without dissipating fCV2.” IEEE Symposium on Low Power Electronics, Digest of Technical Papers, 1994, pp. 100–101.
[85] K. Kristoffersen and H. Torp, “Method and apparatus for generating a multi-level ultrasound pulse,” Apr. 4 2006, U.S. Patent 7,022,074.
[86] S.-Y. Peng, M. Qureshi, P. Hasler, A. Basu, and F. Degertekin, “A charge-based low-power high-snr capacitive sensing interface circuit,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 55, no. 7, pp. 1863–1872, Aug 2008.
[87] S. Berg, T. Ytterdal, and A. Ronnekleiv, “Co-optimization of cmut and receive amplifiers to suppress effects of neighbor coupling between cmut elements.” IEEE Ultrasonics Symposium, Nov 2008, pp. 2103–2106.
[88] J. Graeme, Photodiode Amplifiers: Op Amp Solutions. McGraw-Hill, 1995.
[89] I. Cicek, A. Bozkurt, and M. Karaman, “Design of a front-end integrated circuit for 3d acoustic imaging using 2d cmut arrays,” IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control, vol. 52, no. 12, pp. 2235–2241, Dec 2005.
[90] M. S. J. Steyaert, W. M. C. Sansen, and C.
Zhongyuan, “A micropower low-noise monolithic instrumentation amplifier for medical purposes,” IEEE Journal of Solid-State Circuits, vol. SC-22, pp. 1163–1168, Dec 1987.
[91] “AFE5808 data sheet: Fully integrated, 8-channel ultrasound analog front end with passive CW mixer,” Texas Instruments, Dallas, TX, USA.
[92] “AD9277 data sheet: Octal LNA/VGA/AAF/14-bit ADC and CW I/Q demodulator,” Analog Devices, Inc., Norwood, MA, USA.
[93] “MAX2082 data sheet: Octal ultrasound transceiver with integrated AFE, pulser, T/R switch, and coupling capacitors,” Maxim Integrated, San Jose, CA, USA.