Surface Detection and Object Recognition in a Real-Time Three-Dimensional Ultrasonic Imaging System

By Daniel Charles Letzler

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

July 20, 1999

© 1999 Daniel C. Letzler. All rights reserved.

The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis and to grant others the right to do so.

Author: Department of Electrical Engineering and Computer Science, July 20, 1999

Certified by: Dan Dudgeon, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Theses

Surface Detection and Object Recognition in a Real-Time Three-Dimensional Ultrasonic Imaging System

By Daniel Charles Letzler

Submitted to the Department of Electrical Engineering and Computer Science on July 20, 1999, in Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Electrical Engineering and Computer Science

Abstract

A real-time three-dimensional acoustical imager is currently under development at Lockheed Martin IR Imaging Systems. This thesis first presents a brief overview of the hardware and software components of this system. Second, an algorithm capable of performing real-time, software-based surface detection from the acoustical data on a set of digital signal processing chips is presented. Third, the similarities and differences between the not-yet-operational acoustical imager and an experimental system at Lockheed Martin are explored. Fourth, an object recognition prototype is developed using data from this experimental system and found to have a great deal of discriminatory power. Based upon the comparison and contrast of the imager and the experimental system, it is then asserted that the acoustical imager should be capable of similar performance. Finally, suggestions for the implementation of such an object recognition system on an acoustical imager are presented.
Thesis Supervisor: Dan Dudgeon
Title: Senior Staff Member, Massachusetts Institute of Technology Lincoln Laboratory

Table of Contents

1 Introduction
  1.1 Acoustical imaging
  1.2 Thesis overview
2 System components
  2.1 Acoustical lens
  2.2 Transducer Hybrid Assembly
  2.3 Acoustical Imaging Module
  2.4 Image Interface Board
  2.5 Digital Signal Processor Image Processing Board
  2.6 Liquid crystal display screen
  2.7 Host computer
3 System software description
  3.1 Common framework
  3.2 Communications
  3.3 Data acquisition processor
  3.4 Peak detection processors
  3.5 Video display processor
4 Surface detection algorithm
  4.1 Introduction
    4.1.1 Brief algorithm description
    4.1.2 Algorithm justification
  4.2 Background
    4.2.1 Terminology and notational conventions
    4.2.2 Definition of a peak
    4.2.3 Expected form of data for peak detecting DSP chips
    4.2.4 Memory requirements for each of the peak detecting DSP chips
    4.2.5 Potential first peak data structure
  4.3 ADSP-21060 SHARC hardware considerations
  4.4 Required preprocessing
    4.4.1 Requirements
    4.4.2 Expected method of preprocessing
    4.4.3 Alternate preprocessing by the input DSP chip
  4.5 The peak detection algorithm
    4.5.1 English description of the algorithm
    4.5.2 Block diagram of the algorithm
    4.5.3 Illustration of the algorithm
  4.6 Proof of algorithm correctness
  4.7 Analysis of computational burden of algorithm
  4.8 Analysis of communications burden of algorithm
  4.9 Conclusion
5 Analysis of data gathered by the Acoustical Imaging System
  5.1 Introduction
  5.2 Assumption of linear time invariant system performance
  5.3 System dynamic range analysis
  5.4 Experimental set-up: explanation, justification, and relation to AIS
    5.4.1 Data acquisition
    5.4.2 Target selection
    5.4.3 Target presentation
    5.4.4 Experimental set-up justification
    5.4.5 Relation of experimental system to Acoustical Imaging System
  5.5 Conclusion
6 Object recognition feasibility study
  6.1 Introduction
  6.2 Dependence of acquired data on viewpoint
  6.3 Object recognition feature selection
    6.3.1 Material-based classification features
      6.3.1.1 Thick-shelled targets
      6.3.1.2 Thin-shelled targets
      6.3.1.3 Cylindrical targets
    6.3.2 Structure-based classification features
      6.3.2.1 Thick-shelled targets
      6.3.2.2 Thin-shelled targets
      6.3.2.3 Cylindrical targets
    6.3.3 Frequency domain-based classification features
      6.3.3.1 Thick-shelled targets
      6.3.3.2 Thin-shelled targets
      6.3.3.3 Cylindrical targets
    6.3.4 Summary of target classification criteria
  6.4 Object recognition prototype presentation
  6.5 Object recognition prototype performance
    6.5.1 OR performance: closed problem space, highly controlled orientation, no noise added
    6.5.2 OR performance: closed problem space, highly controlled orientation, Gaussian noise added
    6.5.3 OR performance: closed problem space, loosely controlled orientation, no noise added
    6.5.4 OR performance: open problem space, highly controlled orientation, no noise added
  6.6 Conclusion
7 Object recognition implementation suggestions
  7.1 Introduction
  7.2 Sample images
  7.3 Target identification
  7.4 Use of an ensemble of time series to improve object recognition
  7.5 Image-level object recognition
  7.6 Weighting of classification features
  7.7 Acoustical Imaging System design suggestions
  7.8 Conclusion
8 Conclusion
Appendix
  A. List of acronyms
  B. Source code of prototype peak detection algorithm
  C. Peak detection algorithm computational burden determination
  D. Table of test objects used
  E. Equation-based statement of each of the discriminatory tests used
  F. Matlab source code for object recognition prototype
References

List of Figures

Figure 2-1: System electronics block diagram
Figure 2-2: DSP Image Processing Board
Figure 4-1: Impact of noise on the surface detection algorithm performance
Figure 4-2: Illustration of acoustical time series
Figure 4-3: Hypothetical acoustical time series #1
Figure 4-4: Hypothetical acoustical time series #2
Figure 4-5: Hypothetical acoustical time series #3
Figure 4-6: Block diagram of the peak detection algorithm
Figure 4-7: Illustration of the peak detection algorithm
Figure 4-8: Worst case computational burden of peak detection v. processing block size
Figure 4-9: Computational burden of peak detection v. maximum examination window
Figure 5-1: Signal path of acoustical energy
Figure 5-2: Typical bistatic acoustical imaging signal levels
Figure 5-3: Drawing of target suspension apparatus
Figure 5-4: The dependence of scattered intensity on the wavenumber
Figure 5-5: Frequency content of recorded piston transducer acoustical pulse
Figure 5-6: Receive transfer function of AIS from PiezoCAD Transducer Design Report
Figure 6-1: Acoustical backscatter recordings from object #9 with vertical translations
Figure 6-2: Acoustical backscatter recordings from object #9 with horizontal rotations
Figure 6-3: Acoustical backscatter recordings from object #19 with vertical translations
Figure 6-4: Acoustical backscatter recordings from object #19 with horizontal rotations
Figure 6-5: Illustration of changing appearance of target with vertical translations
Figure 6-6: Example acoustical returns from each of the three classes of objects
Figure 6-7: Example recordings from thick-shelled PVC, aluminum, and brass pipes
Figure 6-8: Schematic illustration of the material determination statistic computation
Figure 6-9: Structure-based features in an example thick-shelled acoustical recording
Figure 6-10: Structure-based features in an example thin-shelled acoustical recording
Figure 6-11: Structure-based features in an example cylindrical acoustical recording
Figure 6-12: A typical transfer function estimate
Figure 6-13: Flow diagram of the object recognition prototype program
Figure 6-14: Histograms showing typical best and second best scoring template matches
Figure 7-1: Sample images taken from a precursor of the Acoustical Imaging System

List of Tables

Table 6-1: Objects used for orientation-dependence determination
Table 6-2: Pipe wall thickness required to be thick-shelled
Table 6-3: Third to second reflection ratio statistics for several materials
Table 6-4: Summary of class-specific object recognition criteria
Table 6-5: Confusion matrix for the OR prototype
Table 6-6: Accuracy of the object recognition prototype at varying levels of added noise
Table 6-7: Confusion matrix for the OR prototype with 0.01 std noise added to data
Table 6-8: Confusion matrix for the OR prototype with 0.02 std noise added to data
Table 6-9: Confusion matrix for the OR prototype for loosely controlled orientation data
Table 6-10: Confusion matrix for the OR prototype in an open problem space

1 Introduction

1.1 Acoustical imaging
A great deal of recent work has been devoted to the development of high-resolution three-dimensional ultrasonic systems for short-range imaging applications. Despite the limited resolution of high-frequency (in the low MHz range) ultrasonic systems as compared to optical systems, ultrasonic systems possess the advantage of having significant penetrative range through many translucent and opaque media that render optical systems useless. Medical ultrasound is a well-known application of ultrasonic imaging that takes advantage of this penetrative capability into human tissue. Further, ultrasonic imagers are relatively uninhibited by murky waters, which severely limit the performance of their optical counterparts [1]. Therefore, diver-held high-resolution underwater ultrasonic imagers have been developed for such applications as underwater search and rescue, pipe and ship hull examination, and mine detection [1]. Lockheed Martin IR Imaging Systems in Lexington, MA is currently developing such an imager. It is referred to as the Acoustical Imaging System (AIS).¹

¹ Throughout the course of this thesis, many acronyms will be used. For an alphabetized listing of these acronyms, see appendix A.

1.2 Thesis overview

The body of this thesis is divided into six chapters. Chapters 2 and 3 are primarily background material. Chapter 2 provides an overview of the hardware used by Lockheed Martin's AIS. Similarly, chapter 3 is devoted to the AIS software, particularly that which runs on a set of digital signal processing chips.

Chapter 4 discusses the surface detection algorithm used by the AIS. This algorithm, which is performed in software by a set of digital signal processing chips, sorts through the acoustical return data to extract the surfaces of imaged objects. A brief rationale for the algorithm is presented. Next, features of the digital signal processor hardware that impacted the algorithmic implementation are discussed. The algorithm is then presented in detail, and a proof is presented regarding its correctness. Finally, the computational and communications burdens created by the algorithm are analyzed and found to be well within the performance capabilities of the digital signal processors used.

An analysis of the data that will be gathered by the Acoustical Imaging System and a comparison of this data to data available through an experimental system at Lockheed Martin is presented in chapter 5. Although the AIS and realistic targets could not be used for the experimental work regarding object recognition presented in chapter 6, this chapter is intended to show that the conclusions reached with the experimental system and the targets which were available can be applied to the AIS. An explanation of why the AIS can be considered to be a linear, time-invariant system is presented first. Next, the dynamic range of the AIS is shown to exceed that used by the experimental system. The methods of data acquisition and target presentation in the experimental system are then presented, along with the targets selected. Finally, reasons for the applicability of the experimental system and the targets used to the AIS imaging real-world targets are offered, and specific differences are discussed.

Chapter 6 presents the results of an object recognition feasibility study performed with the experimental system and test targets.
Because the AIS was unavailable for this work and the experimental system is composed of only a single piston transducer, all of the object recognition work deals with only acoustical time-series data, and not image data. The experimental data is shown to have some degree of viewpoint dependence. Due to differences in the transducers, this viewpoint dependence may or may not hold for data gathered by the AIS. Features that allow discrimination amongst the test targets are then presented. These fall into three basic categories: material-based, structure-based, and frequency domain-based features. Further, different features are found to be important for different types of targets, and categories of targets are created, each of which possesses category-specific features. The prototype object recognition system is then presented. Training data of known identity is used to build a series of templates. The prototype object recognition system then uses these templates to determine the identity of an unknown target by gauging the similarity of the target to each template. Next, the performance of the object recognition prototype on a set of test data is assessed. Four cases are examined: a closed problem space with highly controlled viewpoint and no noise added, a closed problem space with highly controlled viewpoint and noise added, a closed problem space with less highly controlled viewpoint and no noise added, and finally an open problem space with highly controlled viewpoint and no noise added. The performance of the prototype in all cases is shown to be excellent. Finally, it is asserted that given the success of the object recognition prototype, implementation of an object recognition system on the AIS appears viable.

Chapter 7 briefly presents suggestions for the implementation of an object recognition system on the AIS. As a first step, two sample images from a precursor of the AIS are presented. These images are used to motivate a discussion of how regions of interest may be identified in an acoustical image. These regions of interest could possibly be a known object, and thus should undergo the object recognition algorithm. The use of an ensemble of time series from the region of interest to improve recognition power is then discussed. Next, incorporation of image-level characteristics into a recognition algorithm is presented. Finally, different classification features were found during the creation of the object recognition prototype to have varying amounts of discriminatory power across the set of known objects. It is argued that by adding a step to the training process in which object-specific weights are assigned to each classification feature, overall recognition performance can be improved.

2 System components

The Acoustical Imaging System can be thought of as consisting of seven subsystems, each of which will be described briefly during this section. There are six subsystems in the signal flow path of the system. The first of these is an acoustical lens, which eliminates the need for digital beam-forming electronics. The second is the Transducer Hybrid Assembly (THA), which provides the system's transmit and receive capabilities. Following the THA is the Acoustical Imaging Module (AIM), which controls the transmit and receive timing, and performs the digitization of the received signals. Next is the Image Interface Board (IIB).
The IIB receives a serial stream of digital data from the AIM, formats this data for presentation to the signal processing electronics, and performs some data preprocessing. Located fifth in the signal pathway is the Digital Signal Processor (DSP) Image Processing Board, which detects the surfaces in the received data to create a range map of the imaged environment and creates the display for the user. Finally, the last component in the signal pathway is a liquid crystal display (LCD) from which the user views the image of the surrounding environment. Figure 2-1 below shows the interconnections of these subsystems. Additionally, there is a host computer associated with the system that coordinates the actions and sets the operating modes of each of the other subsystems.

[Figure 2-1: System electronics block diagram (adapted from [2])]

2.1 Acoustical lens

While there has been some interest from the medical community in real-time three-dimensional ultrasound (3DUS), it is particularly important that an ultrasonic imager for diver-held underwater applications be capable of providing real-time range-based representations of the objects in the surrounding environment. The requirement of real-time three-dimensional (3D) imaging, however, is at odds with another requirement on any diver-held device - that the device be of a manageable size and shape. Traditional ultrasonic imagers require extensive signal processing electronics to perform digital beamforming [3]. Signal processing electronics imply power consumption, and this power must be provided by a battery. With today's battery technology, the amount of signal processing that would be necessary to perform the digital beamforming for a real-time three-dimensional ultrasonic imager with a reasonable field of view (at least on the order of 1,000,000 volume elements) would require a prohibitively large battery [3]. An alternate approach that has been developed to avoid an exceptionally cumbersome battery is an acoustical lens. An acoustical lens eliminates the need for digital beamforming and requires no power, thus allowing a real-time 3D ultrasonic imager to be compactly packaged for use by an underwater diver [3]. The acoustical lens of the AIS was designed and fabricated at Lockheed Martin IR Imaging Systems.

2.2 Transducer Hybrid Assembly

The Transducer Hybrid Assembly (THA) is composed of a piezoelectric composite transducer array of 128 by 128 elements and a transmit/receive integrated circuit (TRIC). When pressure is incident upon an element of the piezoelectric composite transducer array, a voltage is produced that is proportional to the level of the incident pressure. Associated with each element of the transducer array is a miniature circuit containing capacitors that can be used to capture samples of the voltage level at different times. Further, there is circuitry in the TRIC that allows the values in the capacitors associated with each transducer element to be read out and transmitted to the Acoustical Imaging Module (AIM). Likewise, the piezoelectric transducer elements are capable of transmitting acoustical energy when driven with a voltage.

2.3 Acoustical Imaging Module

The Acoustical Imaging Module (AIM) is responsible for digitizing the signals it receives from the THA. The digitized signals are then transmitted over a high-speed serial link to the Image Interface Board.
Further, the AIM is responsible for providing the TRIC with its controls from the host computer.

2.4 Image Interface Board

The Image Interface Board (IIB) collects the digitized samples from the AIM and reorders them to be in a traditional raster scan line format. Further, it performs gain and offset correction on each of the samples that is recorded, based on tables that have been downloaded from the host computer. Additionally, the THA performs quadrature sampling on the acoustical data. In this mode of sampling, four samples are collected in a coherent fashion with respect to the central transmit frequency. The IIB transforms these four quadrature samples into a single magnitude value. Finally, the IIB preprocesses the data in a way that allows the DSP Image Processing Board to ignore data in which there is no useful information.

2.5 Digital Signal Processor (DSP) Image Processing Board

The Digital Signal Processor (DSP) Image Processing Board performs the surface detection, image formation, and image processing tasks for the Acoustical Imaging System. As mentioned before, the DSP Image Processing Board accepts raster scan data from the IIB. It then processes this data to detect the range of the first reflection at every pixel. Alternatively, if there is no reflection present at a particular pixel, the DSP Image Processing Board notes this occurrence. The range associated with each pixel then serves as the range map of the imaged environment. The 128 x 128 element acoustical image is expanded via bilinear interpolation to form a 256 x 256 image. Overlay and formatting information provided by the host computer are then incorporated to form a 640 x 480 display. This 640 x 480 display is then transmitted to the LCD in RS-170 format.

Six Analog Devices 21060 SHARC digital signal processors are used by the DSP Image Processing Board sub-system. Each of these six DSP chips belongs to one of three classes. While the framework of the software running on each class is the same, the functions performed differ greatly. First, there is a single data acquisition (DAQ) DSP chip. Second, there are four peak detection (PKD) DSP chips. Each one of these four chips will perform the surface extraction algorithm for the data from one quarter of the pixel array. Finally, there is a single display (DIS) DSP chip. The chips are connected in a ring-like manner, with two parallel data paths through the PKD chips. The network topology, along with its interrelation to the other system components, is shown below in figure 2-2.

[Figure 2-2: DSP Image Processing Board network topology and interrelation to other system components.]

2.6 Liquid crystal display (LCD) screen

Planar Systems' LC640.480.21-065 is the liquid crystal display which will be used in the Acoustical Imaging System. It is a 6.5" high performance color LCD.

2.7 Host computer

The host computer used in the Acoustical Imaging System is Ampro Computers' CoreModule/4Dxe single board computer. This single board computer is a 486DX-based PC/AT compatible system.

3 System software description

Because of the reliance that has been placed upon the host computer to perform the subsystem coordination and a set of digital signal processing chips to perform much of the data analysis, the Acoustical Imaging System developed by Lockheed Martin IR Imaging Systems is very software intensive.
Particularly, the functionality of the entire Acoustical Imaging System depends heavily upon the software running on the set of DSP chips on the DSP Image Processing Board. For this reason, the software running on the DSP chips will be outlined in this section.

3.1 Common framework

As alluded to previously, the six DSP chips in the system will run three different program sets. The software running on the data acquisition, peak detection, and video display chips, however, shares a common framework. Data is passed through the system in the form of messages. Messages are routed and processed based upon two tables: a message routing table and a system event table. The entries in the message routing table are destinations for a given message in the current processor. A destination can be either a queue associated with a communications link, which will export the message to another processor for handling, or a queue associated with a function, which will process the contents of the message. The entries in the system event table are all of the functions that the processor can perform in its current operating mode. There are three main operating modes: initialize, operate, and terminate. The background task of each DSP chip cycles through the functions listed in the system event table for the current operating mode, calling each one sequentially. In general, the functions listed in the system event table for the initialize and terminate operating modes each execute one time before, respectively, the operating mode is updated to operate or the Acoustical Imaging System ceases operation. The functions listed in the system event table for the operate operating mode generally have a queue associated with them. As each of the operate system event table functions is called sequentially, it checks its queue to see if any messages await processing. If so, the message in the queue is examined, and its data is processed as necessary. After examination and processing, the function alters the message to indicate what has occurred, and then routes the message to its next destination based upon the information in the message routing table. If no messages need processing, the next function in the system event table is called.

3.2 Communications

Communication among the various chips in the network occurs via interrupt-driven direct memory access (DMA) data transfers. The use of DMA to perform the data transfers means that very few core processor cycles are consumed by communication. For a data export, the core processor must specify only what block of data is to be transmitted and out of which communications link the transfer is to occur. Similarly, for a data import, the core processor specifies how much data is to be received and where to store that data. Following this specification, the core processor returns to normal operation while the DMA controller carries out the transfer invisibly to the core. Upon completion of the transfer, the DMA controller generates an interrupt to inform the core processor that the transfer is complete.

Because the size of a message varies depending upon the amount of data contained in it, the transfer of a message between DSP chips occurs in two stages. First, a message header is transferred. This header specifies the type of message being transferred and the amount of data in the message. Next, the message data is transferred.
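As an illustration of this two-stage scheme, a message and its header might be declared along the following lines in C. This is only a sketch: the field names, widths, and the MAX_MSG_WORDS bound are assumptions made here for illustration, not the definitions used in the AIS source.

    #define MAX_MSG_WORDS 8192  /* assumed upper bound on payload size */

    /* Sketch of the two-part message layout described above. The header
     * is imported first; its length field then tells the input DMA
     * interrupt handler how large a second transfer to program. */
    typedef struct {
        unsigned int type;   /* message type; used to index the routing table */
        unsigned int length; /* number of data words that follow the header   */
    } MsgHeader;

    typedef struct {
        MsgHeader    header;
        unsigned int data[MAX_MSG_WORDS]; /* payload imported in stage two */
    } Message;

Splitting the transfer this way lets a fixed-size DMA import run continuously for headers, while the variable-size payload transfer is programmed only once its length is known.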
To take advantage of the DMA capability of the system and make the communications invisible to the background task running on each processor, interrupt handlers control the communication over each port. Upon completion of the import of a message header, the input DMA interrupt handler sets up the second stage of the message transfer - the DMA import of the message data based upon the information in the header. When the message data transfer completes, the input DMA interrupt handler routes the message just received using the message routing table and then sets up the import of the next message header.

The data export communications links operate somewhat differently. Each export link has a first-in first-out (FIFO) queue associated with it. Messages routed for export to the DSP to which the link connects are placed in the FIFO. The function that places them in the FIFO checks to see if there is currently an export occurring. If not, it starts the transfer of the message header automatically. If so, it simply leaves the message in the FIFO to be exported automatically by the output DMA interrupt handler. Upon completion of the export of a message header, the output DMA interrupt handler starts the export of the data associated with that header. Next, upon completion of the transfer of message data, the output DMA interrupt handler looks at the export FIFO. If there are any messages in it, the output handler grabs the first one and starts to export its header. If there are no messages in the FIFO, the handler simply returns.

To summarize, the use of interrupt-driven DMA message transfers allows the data to flow through the system without burdening the core processor. With interrupt handlers driving the communication process, the background process can be greatly simplified. To the background process, messages appear on the queues for system event table functions with no effort. Further, when a message is ready to be transferred to another DSP, the background process starts the export if no export is currently occurring over the necessary communications link, or places it in a queue if the link is currently busy. No further effort is required by the background process.

3.3 Data acquisition (DAQ) processor

The data acquisition processor provides the interface of the DSP Image Processing Board to both the Image Interface Board and the host computer. The interface of the DAQ to the IIB is very similar to the interface used between the different DSPs, in that data is transferred over the same type of communications link. However, the IIB has no knowledge of the messages used within the DSP network. Luckily, the data transmissions from the IIB are always of the same size, and the data is always ultrasonic image data. Therefore, the DAQ always DMAs the IIB data into the data area of a message and then applies a standard message header to that message. The message containing the IIB data is then routed to a function which divides up the IIB data and places it into messages which will be routed to the proper peak detection processor.

The interface of the DAQ DSP to the host computer occurs through the use of a shared memory space. Essentially, the host computer can write to and read from several of the registers on the DAQ. One of the functions in the DAQ's system event table for the operate operating mode polls this shared memory space and determines if the host has placed an instruction there. If so, the DAQ acts upon the instruction, and then places a return message in the shared memory space.
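A sketch of how such a polling function might look in C follows. The mailbox fields, the command encoding, and the execute_host_command() helper are all hypothetical; only the polling pattern itself reflects the description above.

    #define HOST_CMD_NONE 0u  /* assumed "no instruction pending" value */

    /* Hypothetical layout of the shared memory space; on the real DAQ
     * DSP this would be a fixed set of registers visible to the host. */
    typedef struct {
        volatile unsigned int command;  /* written by the host computer  */
        volatile unsigned int argument; /* optional command parameter    */
        volatile unsigned int reply;    /* return message for the host   */
    } HostMailbox;

    extern unsigned int execute_host_command(unsigned int cmd,
                                             unsigned int arg);

    /* One entry in the operate-mode system event table: check the
     * mailbox, act on any pending instruction, and post the reply. */
    void poll_host_mailbox(HostMailbox *box)
    {
        if (box->command == HOST_CMD_NONE)
            return;                          /* nothing pending this pass */
        box->reply   = execute_host_command(box->command, box->argument);
        box->command = HOST_CMD_NONE;        /* mark instruction handled  */
    }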
3.4 Peak detection (PKD) processors

The peak detection processors perform the peak detection algorithm described in chapter 4 of this document. A discussion of this algorithm, which extracts the front surfaces of the viewed objects from the ultrasonic data, will be deferred to chapter 4.

3.5 Video display (DIS) processor

The video display processor accepts surface range data from the PKD processors. The DIS processor performs several image processing/enhancement operations on the data before shipping it out over an RS-170 communications link to the LCD display. First, the video display processor replaces the range data of any known defective pixels with the median of the range data in the surrounding pixels. Second, the DIS processor attempts to reduce image speckle by applying a median filter to the range data. At each pixel, the median filter first computes the median range value of the surrounding pixels. Then, if the current pixel's range value differs from this median value by more than some user-prescribed threshold, the current pixel's range value is replaced by the median value. Third, the DIS processor applies a grayscale transformation to the range data at each pixel. This step maps each one of the 80 range values to one of the 256 graylevel values available in a manner which will allow the user to get some sense of the three-dimensional shape of the imaged objects. Finally, a 2X interpolation is applied to the data in both the horizontal and vertical directions to expand the image from 128x128 to 256x256 for display.

4 Surface detection algorithm

4.1 Introduction

This section contains an explanation of a DSP-based software algorithm used to detect object surfaces in the Acoustical Imaging System's ultrasonic data. This algorithm will identify the range at which the first object surface occurs for each location in the 128x128 detection array and record both this range and the magnitude of the ultrasonic reflection for each location. The processing will be split among four Analog Devices 21060 SHARC DSP chips that will be operating concurrently.

4.1.1 Brief algorithm description

The algorithm selected to detect surfaces in the ultrasonic data searches for local maxima in ultrasonic return magnitude. In order to be considered a possible surface, a maximum must be above a certain threshold, called the signal threshold. To be classified as a surface, that local maximum must then be followed by a local minimum at least some other threshold lower than the maximum. This second threshold is called the noise threshold. At present, the implementation used by the AIS searches only for the surface nearest to the ultrasonic imager at each position in the two-dimensional detector array.

4.1.2 Algorithm justification

The algorithm briefly presented above was selected because of its robustness in the presence of background noise. Neither low-level, highly-varying nor high-level, slightly-varying noise sources will interfere with proper surface detection under the implementation selected. The signal threshold allows all noise below a certain level of magnitude to be ignored. This eliminates the detection of low-level noise as surfaces. Noise with a mean amplitude on the order of the signals to be detected by the imager is more troublesome. The influence of noise that varies about this mean by only a small amount compared to the reflected signal strengths can be eliminated using an appropriate noise threshold.
High-level noise with a high degree of variation, however, cannot be eliminated by any means. Figure 4-1 illustrates this point graphically.

[Figure 4-1 shows four panels of sample series: a) no noise, b) low-level, highly-varying noise, c) high-level, slightly-varying noise, and d) high-level, highly-varying noise, with the signal threshold, noise threshold, and detected surface indicated in each.]

Figure 4-1: Impact of noise on surface detection algorithm. In part a, a series of samples is shown with no noise added. In all subsequent parts, noise with differing properties has been added. In all parts, the horizontal dashed line represents the signal threshold, and the dashed double-headed arrow represents the noise threshold. Furthermore, the surface that is found by the surface detection algorithm is indicated. With low-level, highly-varying noise (as in part b), the signal threshold protects against incorrectly labeling a sample consisting of just noise as the surface. With high-level, slightly-varying noise (as in part c), the noise threshold prevents a sample that is just noise from being named the surface. With high-level, highly-varying noise (as in part d), however, neither the signal threshold nor the noise threshold can protect against incorrect surface detection. The noise is at high enough levels that it surpasses the signal threshold, yet varies more than can be protected against by the noise threshold.

Performing the noise threshold check in addition to just the signal threshold check for each pixel's data series places a large computational cost on the system. Naturally occurring background noises provide a justification for incurring this high computational cost. For example, the snapping shrimp Synalpheus parneomeris produces background noise of just the type that can be eliminated through use of the noise threshold. In waters of less than 60 m depth at latitudes beneath 40°, the noise produced by these snapping shrimp is essentially omnipresent [4]. The ambient noise created by a large bed of snapping shrimp has been described as being similar to fat sizzling in a frying pan [5]. Measurements of the acoustical signature of a single snapping shrimp were performed by Au and Banks [5]. These measurements indicate that the noise produced by the snapping shrimp is very broad in its spectral content, with energy across the entire 0-200 kHz range and beyond. Further, the variation in spectral density over this range is only about 20 dB [5].

4.2 Background

4.2.1 Terminology and notational conventions

A pixel (short for picture element) refers to one of the locations on the 128x128 detection array. An image is a 128x128x80 array of data. A plane is a 128x128 array of data, all of which occurred at the same distance from the acoustical imager. Therefore, there are 80 planes to an image. Frames are made up of some number of consecutive planes. Currently, there are 8 planes to each frame, which implies that there are 10 frames associated with an image, with planes 0-7 making up frame 0, planes 8-15 making up frame 1, and so on. If both the pixel and plane are specified, then a single data point in the 128x128x80 array of acoustical data has been indicated. The data points are magnitude values that represent the strength of the acoustical return at that pixel for a depth corresponding to the current plane. For a (pixel, plane) pair called P, mag(P) will be used as a shorthand form to indicate the magnitude associated with a specific plane for a specific pixel.
A (pixel, plane) pair could also be said to refer to a specific voxel (the volume element analog to a pixel), so mag(P) will also be referred to as the magnitude of a voxel. Peaks are associated with a particular pixel. The magnitude values for the planes of a neighboring pixel have no influence on where the peaks of a particular pixel are located. Therefore, in the remainder of this document, the magnitude associated with a plane A, written mag(A), indicates the magnitude at plane A of a particular pixel.

4.2.2 Definition of a peak

As stated above, peaks are associated with a particular pixel. There may be many peaks associated with a pixel, or there may be none. A plane, A, is referred to as a peak if it has a magnitude value associated with it, mag(A), that is greater than a prescribed minimum value, called the signal threshold, and it is followed by a plane, B, with a magnitude value associated with it, mag(B), that is at least some other specified value, called the noise threshold, less than mag(A). There may be no planes with magnitude greater than mag(A) between A and B if A is a peak. To summarize symbolically:

Plane A is a peak iff:
1) mag(A) > signal threshold
2) ∃ a plane B occurring after plane A such that mag(A) - mag(B) > noise threshold
3) ¬∃ a plane C such that C is between A and B and mag(C) > mag(A)

The algorithm used by the Acoustical Imaging System is interested in the detection of the first peak for every location on the detection array. There will be at most one first peak associated with each pixel. That first peak will lie at some plane number. The algorithm records this plane number and the magnitude associated with it. The following four figures should help to make the above points clear.

[Figure 4-2 depicts a single pixel at position (x,y) in the 128x128 detection array, with plane number increasing through the 3-D ultrasound image.]

Figure 4-2: Illustration of the type of data to be shown in all of the figures in this document. In each of the next three figures, the return value from a single location will be shown. Figures will appear as one-dimensional series because the return value for many different planes will be shown. Within each figure, the plane number will be increasing from left to right.

Figure 4-3: Example returns from a single (x,y) position in the 128x128 detector array. No values below the minimum threshold line will be considered as potential peaks, but they will be considered for peak confirmation when applicable. Because the local minimum following the first maximum fails to drop the required distance and the second local maximum is of greater value than the first, the second local maximum replaces the first local maximum as the candidate peak when it is encountered. The second local maximum moves from being the candidate first peak to the selected first peak when it is shown to be followed by a plane with a value low enough to meet the noise threshold criteria.

Figure 4-4: Example returns from a single (x,y) position in the 128x128 detector array. Note that the first local maximum is considered to be the peak in this example. Even though the local minimum following the first local maximum fails to drop the required distance, the first local maximum is larger than the second local maximum. Therefore, the first local maximum maintains its position as the potential first peak. It is selected as the first peak when it is shown to be followed by a plane with magnitude low enough to satisfy the noise threshold criteria.

Figure 4-5: Example returns from a single (x,y) position in the 128x128 detector array. Note that despite the local minimum following the first local maximum being below the signal threshold, it is still used for the noise threshold check. Also note that despite the second local maximum being greater than the first, the first local maximum is the first peak. Because the first local maximum meets the peak criteria, the second local maximum is never considered.
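For concreteness, the definition above can be restated as a straight-line scan over one pixel's complete series of plane magnitudes. The C function below is a reference version written for this chapter only; it ignores the framing, masking, and per-frame state described in the following sections, and its names are assumptions rather than the declarations of the actual implementation (see appendix B).

    /* Reference scan for the first peak in one pixel's plane magnitudes,
     * per the definition of section 4.2.2. Returns the plane number of
     * the first peak, or -1 if no peak is confirmed (the ending boundary
     * condition discussed in section 4.5.1). */
    int find_first_peak(const unsigned short *mag, int num_planes,
                        unsigned short signal_thresh,
                        unsigned short noise_thresh)
    {
        int candidate = -1;                   /* potential first peak plane */
        unsigned short best = signal_thresh;  /* criterion 1: must exceed   */

        for (int p = 0; p < num_planes; p++) {
            if (mag[p] > best) {              /* larger value seen first:   */
                best = mag[p];                /* criterion 3 forces a new   */
                candidate = p;                /* candidate peak             */
            } else if (candidate >= 0 && best - mag[p] > noise_thresh) {
                return candidate;             /* criterion 2: confirmed     */
            }
        }
        return -1;
    }

Run against the series of figures 4-3 through 4-5, this scan reproduces the behavior described in their captions: a larger later maximum displaces an unconfirmed candidate, while a confirmed candidate ends the search before any later maximum is considered.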
It is selected as the first peak when it is shown to be followed by a plane with magnitude low enough to satisfy the noise threshold criteria. Figure 4-5: Example returns from a single (x,y) position in the 128x128 detector array. Note that despite the local minima following the first local maxima being below the signal threshold, it is still used for the noise threshold check. Also note that despite the second local maxima being greater than the first, the first local maxima is the first peak. Because the first local maxima meets the peak criteria, the second local maxima is never considered. 26 4.2.3 Expected form of data for peak detecting DSP chips The data will be shipped into the peak detecting DSP chips one frame at a time, with each DSP receiving a specific fourth of the frame. For example, the first DSP may receive an 8 plane deep block corresponding to the pixel data for the top quarter of the detection array, the second DSP may receive a block corresponding to the top-middle quarter of pixels, and so on. Within each frame of data, the data for a pixel will be located in contiguous locations, with the data from the lower plane number being located at the lower memory location. Further, each data point will have a 1 located in its most significant bit (MSB) if there has not been a magnitude in that frame at that pixel including and before the current plane that is above the signal threshold. The peak detection algorithm presented below is based upon these assumptions. 4.2.4 Memory requirements for each of peak detecting DSP chips The peak detection algorithm will require the following three memory elements on each of the four DSP chips: *A 32,768 element array of 16-bit values for storing the frames of magnitude data. *A 4,096 element array of "Potential First Peak" data structures; a more elaborate description of this data structure will be given shortly. e A counter to store the current plane number 4.2.5 Potential first peaks data structure The "Potential First Peak", or PFP, data structure will consist of the following four fields: 1) an 8-bit value to indicate whether a supra-threshold magnitude value has been encountered for the current pixel before the beginning of the current frame, 2) an 8-bit value to store an indicator as to whether the first peak has been found at a pixel, 3) an 8bit value to store the plane number of a potential first peak, and 4) a 16-bit value to store the magnitude at that plane number. 27 4.3 ADSP-21060 SHARC hardware considerations An integral part of designing an efficient algorithm is optimizing the algorithm for the hardware which will perform it. It is not enough simply to minimize the number of C code instructions that must be performed by the program. The number of delays, or lost clock cycles, should also be considered to minimize the total operation time of the algorithm. The main sources of delays are interrupt handling, memory access conflicts, and program control branches. Interrupts occur relatively infrequently during the image processing. As described previously in section 3.2, two interrupts are generated for every message that is either transmitted or received from a DSP. Each of the PKD DSP chips will receive 10 frames/image * 15 images/second = 150 frames/second. However, because the network of DSP chips as shown in figure 2-2 is not fully interconnected, the PKD processors are required to pass through messages for each other. 
Thus, the front PKD DSP chips will receive 300 messages/second with a frame of data and will transmit 150 of these messages unaltered. Further, each PKD DSP will send peak detected data for each of the 15 images/second. Thus the rear PKD processors will also be required to receive 15 messages/second containing peak detected data from the front PKD processors. This arrangement corresponds to a maximum load on any one processor of 465 ultrasonic data messages that are either transmitted or received per second. This load corresponds to a maximum of 930 interrupts per second for data. Therefore, a reasonable upper limit on the number of interrupts that occur each second is 1050. Currently, interrupts are handled with interrupt nesting disabled. While this strategy may increase the delay in the handling of an interrupt, it decreases the number of overhead processor cycles required per interrupt. This decrease is the result of the smaller number of registers whose states must be pushed onto the stack prior to handling an interrupt if nesting is disabled. Under this implementation, the maximum time to handle any of the interrupts by a DSP is 150 cycles. Therefore, an upper bound of approximately 160,000 clock cycles/second will be occupied by interrupt processing. This number represents a loss of approximately 0.4% of the processing capacity of each DSP chip. 28 Alternately, memory access conflicts and program control branches are likely to be associated with the processing of each voxel. (128 x 32) pixels/plane * 80 planes/image * 15 images/second = 4,915,200 voxels/second must be processed by each of the PKD processors. Therefore any unnecessary delays incurred in this area will substantially affect performance. For the purposes of discussing the memory access conflicts that will occur in the ADSP21060 SHARC during the execution of the peak detection algorithm, the memory of the DSP can be thought of as being composed of three separate components: program memory, data memory, and the instruction cache. The program memory and the data memory are both 256 kilobytes in size. The instruction cache is 32 entries in size. The ADSP-21060 maintains separate on-chip buses for both program memory and data memory. Therefore, the processor core may simultaneously fetch the next instruction for the pipeline and grab a piece of data from memory. Further, the modified Harvard architecture of the ADSP-21060 allows data storage in program memory. This modification, however, raises the possibility of data access conflicts if the core processor is asked to both fetch an instruction and grab a piece of data from the program memory. On such occasions, the instruction cache is checked to see if the instruction to be fetched is stored there. If so, the two fetches may occur concurrently. If not, the data access conflict will cause a delay. On the current cycle, the data in the program memory will be fetched, and a no-operation (NOP) will be inserted into the instruction pipeline. On the following cycle, the next instruction will be fetched from program memory and fed into the instruction pipeline. Therefore, it is desirable to minimize the number of data access conflicts that will cause cache misses. This end can be achieved in one of two ways. First, the number of data accesses to program memory can be reduced. Second, the instructions should be placed 29 in memory such that they reduce the number of cache misses when a data access conflict does occur. 
Minimizing the number of data accesses to program memory is the easier, and thus preferable, method of preventing memory access conflicts. Minimization of the number of memory access conflicts was a driving factor in the selection of a frame size of eight planes. Because the acoustical return magnitude data and the potential first peak data structures must be accessed during the processing of each pixel, it was decided that these elements should be placed in data memory if possible. Each of the four peak detecting processors is responsible for the data originating from a fourth of the detection array. This corresponds to a 128 x 32 section, or 4,096 pixels. The PFP data structure associated with a pixel requires 1 byte to store the supra-threshold data indicator, 1 byte to store the char indicating if a peak has been found at that pixel, 1 byte to store the char indicating the plane number of the potential peak, and 2 bytes for the short int that stores the magnitude of the potential peak. Therefore, overall, the array of PFP data structures requires 20,480 bytes. This leaves 236 kilobytes available for the data in a frame, which corresponds to 14 planes worth of magnitude data. As shown in figure 4-8 of section 4.7, a frame size of 16 planes is the most efficient of the whole divisors of 80. However, the above analysis indicates that this block size is prohibitively large and would generate memory access conflicts. Therefore, a frame size of 8, which is slightly less efficient than 16, was selected.² With a frame size of 8, just over half of the data memory is used by the array of PFP data structures and the frame of magnitude data. This arrangement leaves plenty of space available in data memory for all of the other memory requirements of the system.

² Note that the computational efficiencies of 8 planes and 10 planes to a frame are roughly equivalent. 8 planes was selected instead of 10 simply because it is a power of 2. Further, note that the use of 10 planes per frame would also reduce the number of messages to be passed per frame, and thus the interrupt processing overhead, compared to 8 planes per frame. However, as mentioned earlier in section 4.3, interrupt processing overhead consumes only a tiny portion of the processors' available cycles, thus this concern is not of central importance.
Therefore, there will be no memory access conflicts, and it is not necessary to attempt to place instructions in memory to avoid cache misses.

To understand how an algorithm should be implemented to reduce the delays caused by program control branching statements, one must have knowledge of the instruction pipeline used by the ADSP-21060 SHARC processor. The instruction pipeline contains three stages: fetch, decode, and execute. These three stages of the pipeline allow a higher instruction throughput for the system. Any non-sequential program operation can potentially decrease the processor's throughput. In addition to the interrupts mentioned previously, these operations are: jumps, subroutine calls and returns, and loops. The reason for the decreased throughput of these operations is that they may dictate that NOP instructions be placed into the instruction pipeline behind them, generally for two cycles until any ambiguity of program flow has passed. However, if it is possible to use a delayed branch for a jump, call, or return, the throughput loss can be eliminated. In a delayed branch, the two instructions following the non-sequential instruction are executed.

As previously mentioned, loops are non-sequential operations. As such, they may decrease the throughput of the processor. Thankfully, however, the ADSP-21060 SHARC processor supports loops with no overhead, given that they meet the following criteria:

- Nested loops cannot terminate on the same instruction.
- There may not be a jump, call, or return operation in the last three instructions of a loop.
- Loops of three instructions and below must be treated specially because of the three-instruction-long pipeline. There are no loops of size three or below used in the peak detection process; therefore, the reader is referred to the ADSP-2106x SHARC User's Manual for details on the special treatment of this case [6].

By avoiding the cases outlined above, loop overhead was eliminated from the peak detection process. Further, overhead incurred by program control statements was iteratively reduced by modifying the program arrangement based upon the assembly code produced.

4.4 Required preprocessing

4.4.1 Requirements

As stated above in section 4.2.3, it is assumed that the data reaching the peak detection DSP chips will have the most significant bit (MSB) set to 1 if no magnitude values within the previous and current planes at that pixel for the current frame have been greater than the signal threshold. It is assumed that the data for the first plane to be above the signal threshold and all subsequent planes within a frame will have their MSB set to 0. This preprocessing is essential to the operation of the rest of the algorithm.

4.4.2 Expected method of preprocessing

It is currently expected that the required preprocessing will be handled by the Image Interface Board (IIB).

4.4.3 Alternate preprocessing by the input DSP chip

The preprocessing cannot be handled by the single DAQ SHARC DSP which is currently asked to accept the data from the IIB and then distribute it to the proper peak detection DSP chips. Each program control branching statement on the DSP requires at least three clock cycles. Any preprocessing algorithm must include a program control branching statement, because it must be able to change whether it sets the MSB to 1 or 0. Therefore, a lower bound on any preprocessing method of 3 operations per data point can be established. Multiplying this lower bound by 128x128x80 = 1,310,720 data points per image tells us that preprocessing each image will take at least 3,932,160 operations. At the current acoustical rate of 15 images/sec, the preprocessing would then place a computational burden of at least 15 images/sec * 3,932,160 ops/image = 58,982,400 ops/sec on the input DSP chip. The Analog Devices 21060 SHARC DSP chips that we are using have a clock rate of 40 MHz. Thus, the input DSP chip would not be able to keep up even with our optimistic lower bound of the preprocessing computational burden. Therefore, because the DAQ DSP is not capable of performing the required preprocessing, it must be handled by the IIB.

4.5 The peak detection algorithm

4.5.1 English description of the algorithm

Before each new image starts being pumped through the image detection algorithm, the "Potential First Peak" data structure for each location will be reset to indicate that a new image is just starting. This will be done by setting the supra-threshold value indicator for each pixel to "No", the peak found indicator for each pixel to "No", the plane field associated with each pixel to "No Return", and the magnitude value for each pixel to the signal threshold.

When each new frame becomes available for processing, the input DSP will pass the appropriate blocks of data to the peak detecting DSPs. Each pixel will then be examined individually. For each pixel, it will first be determined if the data has passed the signal threshold at that location before the beginning of the frame by looking at the supra-threshold indicator in the PFP for that pixel. For each pixel that has had data above the signal threshold before the beginning of the current frame, it will be determined if a first peak has already been found at its location. If one has, no further processing is necessary at that pixel for this frame. If one has not, however, the block must be processed. If the data has not passed the signal threshold by the beginning of the frame, then it will be determined if the signal threshold has been passed by the last plane of the frame. This task will be accomplished by examining the MSB of the last plane in the frame. If the magnitude data for that pixel has not passed the signal threshold by the end of the frame, then the pixel data for that frame should be ignored.

If the data for a pixel had already gone supra-threshold before the current frame but a peak has not been found yet, the processing of that frame for that pixel should begin at the beginning of the frame. However, if the plane at which the data first passed the signal threshold is in the current frame, then that first supra-threshold plane should be the
When each new frame becomes available for processing, the input DSP will pass the appropriate blocks of data to the peak detecting DSPs. Each pixel will then be examined individually. For each pixel, it will first be determined if the data has passed the signal threshold at that location before the beginning of the frame by looking at the suprathreshold indicator in the PFP for that pixel. For each pixel that has had data above the signal threshold before the beginning of the current frame, it will be determined if a first peak has already been found at its location. If one has, no further processing is necessary at that pixel for this frame. If one has not, however, the block must be processed. If the data has not passed the signal threshold by the beginning of the frame, then it will be determined if the signal threshold has been passed by the last plane of the frame. This task will be accomplished by examining the MSB of the last plane in the frame. If the magnitude data for that pixel has not passed the signal threshold by the end of the frame, then the pixel data for that frame should be ignored. If the data for a pixel had already gone supra-threshold before the current frame but a peak has not been found yet, the processing of that frame for that pixel should begin at the beginning of the frame. However, if the plane at which the data first passed the signal threshold is in the current frame, then that first supra-threshold plane should be the 33 starting position. To start processing at the plane for which the data first passes the signal threshold, the program should start at the beginning of the current frame and simply advance to the next plane while the MSB of the plane magnitude is 1. When a plane magnitude with MSB of 0 is encountered, the program should then start fully processing each plane. To minimize the number of program control branching statements that the program must run, the above two cases should be handled separately. To fully process the current plane, its magnitude should be compared to the magnitude of the pixel's PFP magnitude. If the current plane's magnitude is greater, then the PFP array entry for that location should be updated to reflect the current plane and magnitude. If the current plane is less than or equal to the PFP magnitude entry, then the magnitude of the current plane's return should be subtracted from the PFP magnitude entry. If this difference is greater than the noise threshold, then the potential first peak is a confirmed peak. The potential peak should be marked as a peak and the processing of the frame should stop for that pixel. If the difference is not greater than the noise threshold, however, then the next plane of data should be examined in a similar manner. For each pixel, this process should continue until either the end of the frame is reached or a peak is confirmed. After every pixel in every frame of data for an image has been examined in the above manner, the PFP array will contain the plane number and the magnitude of every valid first peak found. This data must then be transferred on to its next destination in the processing pathway. For now, the algorithm does not deal with the boundary condition of the last plane. At present, if the algorithm has a recorded potential peak but this potential peak is not confirmed by the time the last plane of data is processed, then the potential peak is ignored. The most obvious approach for handling this boundary condition is to consider the plane of this potential peak to be the peak. 
However, this approach eliminates the benefits gained through use of the noise threshold. By automatically promoting any remaining potential peak to a confirmed peak after all of the data has been processed, noise with a high offset value but low variation, which the noise threshold was designed to eliminate, would be reintroduced. The exact method of dealing with the boundary condition will therefore be deferred until a later time when more actual data is available.

Figure 4-6 in section 4.5.2 provides a block diagram of the peak detection algorithm. Additionally, figure 4-7 in section 4.5.3 offers an illustration of the peak detection algorithm's performance. Finally, for a listing of the source code that was used to implement a prototype of the peak detection algorithm, see appendix B.

4.5.2 Block diagram of the algorithm

Figure 4-6: Block diagram of the peak detection algorithm (flowchart not reproduced).

The block diagram of the peak detection algorithm shows the control flow from the point of view of one of the four peak detection DSP chips. Note that the examination of planes of data is shown as being part of the same strand regardless of whether data was available before the beginning of the frame or not. In the actual implementation, examination of the data should be handled separately for these two cases to minimize the number of program control flow branching statements necessary.

4.5.3 Illustration of peak detection algorithm operation

Figure 4-7: Illustration of the peak detection algorithm for a single pixel, showing the detected peak, the signal threshold, the first and last magnitudes processed, and the pixel's planes of data divided into four blocks.

The figure illustrates the operation of the peak detection algorithm for a single pixel within the image. The planes of magnitude data associated with the pixel are broken up into four blocks. The algorithm processes the first block for every pixel within an image, then the second block for every pixel, and so on until it is done with the entire image. At a particular location, the algorithm determines whether the data has passed the signal threshold by the end of the current block at that location. If it has not, then the block is ignored. If it has, the algorithm then determines whether a peak has already been found for the pixel. If one has, then the block is ignored. If neither of these conditions holds, the algorithm determines the first location that it should examine within the block and examines every data point from that position onward until either the end of the block is reached or a peak is found.
Note that only 40 planes of data are shown in the illustration and that the block size is 10. In the real implementation, there are 80 planes of data and the block size will be 8. The figure was scaled down to ease the illustration process.

4.6 Proof of Algorithm Correctness

To prove that the algorithm developed above is correct, two points will be demonstrated. First, the peak detection algorithm will find a valid peak if one exists. Second, the peak detection algorithm will find the first valid peak if a peak is found. Recall that the algorithm statement ignores the ending boundary condition. Thus, an analysis of these conditions will be omitted from this discussion.

Proposition #1: If the peak detection algorithm finds a peak, then the magnitude associated with that peak will be greater than the signal threshold.

Proof: At the beginning of each new image to be processed, each element in the PFP array is initialized to hold the signal threshold in its magnitude field and the no-return value in its plane field. The peak detection algorithm will not start examining the magnitudes of planes in a pixel until a value greater than the signal threshold is encountered, and this is the location from which the algorithm will start looking. Further, the peak detection algorithm will only replace the contents of a PFP array entry when the magnitude for the current plane in the current pixel is greater than that in the entry. So the first magnitude value examined will be greater than the signal threshold, and that magnitude and the plane associated with it will be written into the PFP array for the current location. Moreover, since the magnitude associated with an entry in the PFP array is non-decreasing, the value at this location will remain above the signal threshold throughout the processing of the current image. In summary: for a peak to be found, the magnitude values associated with the planes at a location must start being examined; the beginning of examination implies that a value greater than the signal threshold must have been written into the magnitude associated with the potential peak location; and this value is non-decreasing. These three points imply that any peak detected must have a magnitude value greater than the signal threshold.

Proposition #2: Any peak detected by the algorithm will be followed by a plane at least the noise threshold beneath it in magnitude and will be the plane of greatest magnitude before the confirming plane.

Proof: When the peak detection algorithm sees the first plane with a magnitude above the signal threshold for a pixel, it examines every plane from that point until either a peak is confirmed or the end of the image data is reached. Because all of the values ignored were less than the signal threshold, and because examining the magnitudes associated with planes of data implies that the PFP entry contains a value greater than the signal threshold, all of the values ignored will be less than the PFP entry. When each plane is examined, its magnitude is compared to the value stored in the PFP entry for that pixel. If it is greater, then the current plane and magnitude will replace those in the PFP entry for the current pixel. Therefore, the magnitude stored in the PFP entry for the current pixel is greater than or equal to the magnitude of every plane examined or ignored so far.
If the magnitude for an examined plane is not greater than the value stored in the PFP array entry for the pixel, then the magnitude of the current plane is subtracted from the PFP array entry magnitude. If this difference is greater than the noise threshold, then the PFP array entry is marked as a confirmed peak, and the pixel is no longer processed. Otherwise the next plane of data is examined. So, the algorithm is guaranteed to have stored in the PFP array entry for the current pixel the greatest value ignored or examined and the plane associated with it, and the PFP array entry will be marked as a confirmed peak, stopping processing for the pixel, when a plane with a magnitude value below the PFP entry by at least the noise threshold is found.

Proposition #3: If a valid peak exists, then a valid peak will be found.

Proof: Assume for the purposes of contradiction that a valid peak exists but that no peak is found by the algorithm. This implies that the valid peak occurred either before the peak detection algorithm started examining plane magnitude data or at or after the plane at which the peak detection algorithm started examining plane magnitude data. If it occurred before the plane magnitude data started being examined, then the peak must have had a value beneath the signal threshold. But this is a contradiction, because a valid peak must have a value above the signal threshold. Therefore, the valid peak must have occurred after the plane magnitudes started being examined. Since no peak was detected, this implies that there exists no plane magnitude which is followed by a plane magnitude at least the noise threshold below it. This conclusion, however, also leads to a contradiction, because a valid peak must be followed by a plane magnitude at least the noise threshold below it. Therefore, to avoid contradiction, a peak must be found if a valid peak exists. Proposition #1 and proposition #2 then imply that the peak that was found must be above the signal threshold and must be followed by a plane with magnitude at least the noise threshold less than the peak's magnitude, with no intervening planes of magnitude greater than the peak magnitude. Therefore, if a valid peak exists, a peak will be found and that peak will be valid.

Proposition #4: If at least one valid peak exists, then the peak detection algorithm will detect the first valid peak.

Proof: By proposition #3 we know that if a valid peak exists, then a valid peak will be found. Assume for the purposes of contradiction that more than one peak exists and that the first peak is not the detected peak. This supposition implies that the peak detection algorithm must have encountered a plane that was at least the noise threshold lower than the value stored in the PFP array entry for the current pixel without marking the current potential peak as a confirmed peak and stopping execution. This conclusion, however, contradicts the fact that the peak detection algorithm always marks the current potential peak as a confirmed peak and stops processing at a pixel when a plane magnitude at least the noise threshold less than the PFP array magnitude entry for the current pixel is encountered. Therefore, if a valid peak exists, then the peak detection algorithm will detect the first valid peak.
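To make the control flow argued over in these propositions concrete, the following Python sketch applies the logic of section 4.5.1 to a single pixel's magnitude series. It is an expository model only, not the SHARC implementation of appendix B: the MSB masking and the block structure are omitted, and the unconfirmed-peak boundary condition is handled as the current algorithm handles it, by ignoring the potential peak.

def find_first_peak(magnitudes, signal_threshold, noise_threshold):
    """Return (plane, magnitude) of the first valid peak, or None.

    A valid peak is a supra-threshold magnitude later followed by a
    magnitude at least noise_threshold below it, with no intervening
    magnitude greater than the peak.
    """
    pfp_plane, pfp_mag = None, signal_threshold   # PFP entry reset values
    examining = False
    for plane, mag in enumerate(magnitudes):
        if not examining:
            if mag <= signal_threshold:
                continue              # advance to first supra-threshold plane
            examining = True
        if mag > pfp_mag:
            pfp_plane, pfp_mag = plane, mag    # update potential first peak
        elif pfp_mag - mag > noise_threshold:
            return pfp_plane, pfp_mag          # confirmed first peak
    return None                       # end of data: potential peak ignored

# Example: the peak of magnitude 9 at plane 2 is confirmed by the drop to 2,
# and the later, larger value of 11 is correctly never reached.
assert find_first_peak([0, 5, 9, 7, 2, 11], 3, 4) == (2, 9)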
4.7 Analysis of Computational Burden of the Algorithm

In all of the computational burden analysis that follows, it is assumed that the number of planes from the signal first passing above the signal threshold to a peak being confirmed (by the signal dropping at least the noise threshold below the current potential peak) can be upper bounded by some parameter that is significantly less than the total number of planes in an image, 80. This parameter will be referred to as the maximum examination window. In the ideal case, the maximum examination window would be two, as the first plane above the threshold would be the peak and it would be confirmed by the next plane's magnitude. As the maximum examination window rises, the performance of the peak detection algorithm gets steadily worse, as shown by the following analysis. When the maximum examination window reaches approximately 25, the computational burden of the peak detection algorithm on each of the four peak detecting DSPs can no longer be guaranteed to be less than the computational capacity of the DSPs. While neither the existence of a maximum examination window nor its length can be demonstrated until more ultrasonic data is available to examine, it is felt that the value of this parameter will most likely be in the five to eight range.

In addition to the existence of a maximum examination window, the computational burden analysis makes the following assumptions: 1) valid peaks are detected at every pixel in the detection array, and 2) each valid peak that is detected requires maximum-examination-window planes to be examined. Both of these assumptions represent a worst case as far as their contributions to the computational burden; that is, removing these restrictions would reduce the expected computational burden of the peak detection algorithm on each DSP.

In all worst case data, the following restriction is added to the above constraints: the first plane with a magnitude above the signal threshold is always the last plane in its respective processing block. This constraint ensures that the maximum number of while-loop iterations is incurred before the data is processed and that the processing burden is spread over at least two blocks. In all expected data, this restriction is lifted, and the first plane above the threshold is assumed to occur with equal likelihood at any of the locations within a block. Finally, for the expected data analysis below, the location of the peak at a particular pixel is assumed to be independent of all other pixels. While this assumption is certainly not valid in reality, it reduces complexity significantly and almost certainly has a negligible impact on the burdens calculated. A sensitivity analysis of this assumption was not performed because the results of the worst case data analysis indicated it was unnecessary.

Figure 4-8 presents data on the worst case computational burden of the peak detection algorithm as a function of the processing block size. Only block sizes that divide 80 evenly are considered. The data in this figure was computed for an expected examination window length of five. As can readily be seen, increasing the block size generates a more efficient algorithm up to a point, after which efficiency decreases as the processing block size is increased further. Less obvious from this figure is exactly which block size produces the minimum worst case computational burden.
A block size of 16 produced the minimum computational burden of all block sizes examined, with a worst case value of approximately 1.7 x 10^7 operations/DSP/second. However, as mentioned previously, a block size of 8 has been selected for the system. Because the performances of block sizes 8 and 10 (approximately 1.97 x 10^7 and 1.82 x 10^7 operations/DSP/second, respectively) are not substantially worse than that of a block size of 16, and these smaller block sizes avoid the memory access conflicts mentioned earlier, they were preferred over 16. A block size of 8 was then selected because of a bias towards powers of two and because it eased the design of the read-out integrated circuit (ROIC) to be part of the THA.

Figure 4-8: Worst case computational burden of peak detection v. processing block size (planes/block, for block sizes 2 through 80), with the capacity of a SHARC DSP chip marked for reference.

Figure 4-9 presents worst case and expected computational burdens of the peak detection algorithm on a single DSP chip as a function of the examination window length. In all cases, the processing block size is assumed to be equal to 8 planes. The data in figures 4-8 and 4-9 demonstrate that the peak detection algorithm should easily fit onto the four peak detection chips as long as the assumption of a relatively short examination window length is valid. The excess processing power that is available may be used for advanced image processing, classification, or display algorithms. The assembly code listing from which the computational burden estimates were derived and the specific method of calculating the computational burden are shown in appendix C.

Figure 4-9: Computational burden of peak detection v. maximum examination window (2 to 12 planes), showing worst case and expected burdens with the capacity of the SHARC DSP chips marked for reference.

4.8 Analysis of the communications burden of the peak detection algorithm

Because the ultrasonic data communications that occur are the same for every image, the analysis required to estimate the communications burden that the peak detection algorithm places upon the DSP network is much less complicated. The maximum amount of data that must be passed over any link is half of an image 15 times per second. This amount of data must be passed over the link connecting the IIB to the DAQ and the links connecting the DAQ to the front two PKDs, as shown in figure 2-2. This corresponds to 128 x 64 pixels/half-plane * 2 bytes/pixel * 80 half-planes/image * 15 images/second = 19,660,800 bytes/second. The maximum amount of data that can be passed over a DSP-to-DSP link by DMA transfer is 1 byte/clock cycle * 40 MHz = 40 megabytes/second. Therefore, the communications links connecting the DAQ and the PKDs are less than half burdened by the ultrasonic data transfers and have plenty of room left for other messages. The maximum amount of data that can be transferred over the IIB-to-DAQ link, however, is limited by the transfer speed of the IIB. The IIB is also capable of transferring 1 byte/clock cycle; however, its internal clock operates at 33 MHz, not 40 MHz like the DSPs. Therefore, the IIB-to-DAQ transfers can occur at a rate of 33 Mbytes/sec. Again, this transfer rate is sufficiently high, especially since no other messages are ever asked to pass over this link.
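The link-budget bookkeeping above is simple enough to restate in a few lines of Python. The figures below merely reproduce the arithmetic of sections 4.4.3 and 4.8; they are not new measurements, and the constant names are ours.

# Link-budget arithmetic from sections 4.4.3 and 4.8 (values from the text).
PIXELS_PER_HALF_PLANE = 128 * 64
BYTES_PER_PIXEL = 2
PLANES_PER_IMAGE = 80
IMAGES_PER_SEC = 15

# Ultrasonic data rate over the busiest links (half an image, 15 times/second).
data_rate = (PIXELS_PER_HALF_PLANE * BYTES_PER_PIXEL
             * PLANES_PER_IMAGE * IMAGES_PER_SEC)      # 19,660,800 bytes/s

DSP_LINK_RATE = 40_000_000   # 1 byte/cycle at 40 MHz (DSP-to-DSP DMA)
IIB_LINK_RATE = 33_000_000   # 1 byte/cycle at 33 MHz (IIB internal clock)

print(f"ultrasonic data rate: {data_rate:,} bytes/s")
print(f"DAQ-to-PKD link utilization: {data_rate / DSP_LINK_RATE:.0%}")   # ~49%
print(f"IIB-to-DAQ link utilization: {data_rate / IIB_LINK_RATE:.0%}")   # ~60%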
4.9 Conclusion

This chapter has shown that the peak detection algorithm can viably be performed in software. Further, by moving the peak detection from hardware in the IIB to software in the DSP Image Processing Board, the DSP chips have been given access to the full set of acoustical image data. Both the computational and communications burdens created by the algorithm are under half the capacity of the DSP chips currently slated to be in the AIS. Thus, a considerable amount of processing power is still available, and new features may be added to the system's software which make use of this power. An object recognition algorithm, like that presented in chapters 5 through 7, is an example of exactly this type of feature.

5 Analysis of data gathered by the Acoustical Imaging System

5.1 Introduction

Before an object recognition (OR) system may be built, a device capable of gathering data which may be used in the object recognition process must exist. Lockheed Martin IR Imaging Systems' Acoustical Imaging System is not currently developed enough to attack the OR problem. An experimental system available at LMIRIS, however, is capable of gathering finely sampled acoustic time series. It is desired to determine whether the experimental system is a good tool for assessing the ability of the AIS to perform object recognition.

To accomplish this task, the degree to which the Acoustical Imaging System may be modeled as linear and time invariant (LTI) will be explored. If the system is found to be LTI, the ability to use LTI methods will greatly simplify the data analysis process. Next, the dynamic range of the AIS will be discussed. An explanation of and justification for the experimental set-up at LMIRIS will then be presented, followed by a discussion of the relation of this experimental set-up to the Acoustical Imaging System. Based upon the preceding analysis, it will be concluded that data from the experimental system can be used to help establish the feasibility of using Lockheed Martin's AIS for object recognition.

5.2 Assumption of linear time invariant (LTI) system performance

The system electronics diagram of figure 2-1 shows the signal flow path through the electronics of the AIS. As can be seen in the figure, all of the components after the analog-to-digital (A/D) converter in the AIM are digital. Further, commercially available A/D converters can be considered LTI devices to within a high degree of accuracy. Therefore, all of the system behind the THA can be considered to be LTI. The task of assessing the degree to which the AIS performs as an LTI system is thus reduced to assessing the linearity and time invariance of the THA, the acoustical lens, and water. Figure 5-1 below illustrates the signal path that must be evaluated.

Figure 5-1: Block diagram of the signal path of interest in the evaluation of the LTI assumption.

Previous analysis by Lohmeyer in [2] has already addressed the degree to which the assumptions of linearity and time invariance apply to the AIS for the transmit transducer, water, and receive transducer. In short, Lohmeyer found that over the majority of the output range of the Transducer Hybrid Assembly, the receive performance of the THA is linear to within experimental measurement limitations [2].
Further, Lohmeyer makes a strong argument that despite the non-linearity of acoustical transmission in water over the 1-3 atmosphere pressure range the system will encounter, these non-linear effects can largely be ignored. The non-linearities in water produce harmonics of the fundamental frequency of the propagating wave [2, referencing 8]. Water, however, behaves much like a lowpass filter. Above 1 MHz, the attenuation caused by acoustical propagation through water increases substantially with frequency. Thus, the higher harmonics will have been attenuated dramatically more than the original frequency during propagation. Therefore, although the water produces non-linearities, this effect may be ignored [2].

Although beyond the scope of this thesis, a few brief comments can be made to justify neglecting the non-linear aspects of sound propagation through the lens in the AIS. The lens is composed of polystyrene. The tensile strength of the polystyrene in the lens can conservatively be estimated to be 35 MPa. Further, the AIS transmits at approximately 60 dB re 1 Pa/V, meaning that an input of 1 V to the transducer will produce an output pressure of 1000 Pa. The input voltage is always kept well below 1000 V; therefore, an upper bound of 1 MPa can safely be applied to the transmission strength. Because this upper bound is less than 3% of the tensile strength of polystyrene, it can be assumed that the acoustic pulse does not induce the polystyrene to move out of the linear region of its stress-strain curve. Therefore, the propagation of sound through the acoustical lens in the AIS can be assumed to be approximately linear [8].

Because each of the components of the AIS has been shown to behave linearly or nearly linearly, the assumption of a linear time invariant (LTI) system is fairly accurate. With this assumption, the well-developed tools of LTI analysis may be used. The use of LTI analysis will significantly simplify the mathematical and conceptual details of the investigation in this thesis.

5.3 System dynamic range analysis

The dynamic range of the Acoustical Imaging System is limited on the low end by the system noise level and on the high end by the non-linear effects of acoustical pulse transmission in water, which place a limit on the amount of power that may be used for a pulse. Further, as the AIS is developed further and monostatic operation is incorporated into the THA, the THA's power transmission capabilities may impose an even tighter limit on the high end of the dynamic range. The signal levels of the AIS in bistatic operating mode are shown below in figure 5-2. As can be seen, the instantaneous dynamic range of the system is approximately 60 dB. Note that over much of the range of interest, planar reflectors and specular targets produce return levels above the system maximum signal level. This situation can easily be remedied by reducing the transmit power [3]. Furthermore, recent calculations indicate that the instantaneous dynamic range of the bistatic AIS could well rise to approximately 75 dB [8].

Figure 5-2: Typical bistatic Acoustical Imaging System signal levels versus range in meters, showing curves for a planar perfect reflector, a specular target (-20 dB), and a "point" target (-30 dB), along with the system maximum signal level and the system noise level (taken from [3]).

5.4 Experimental set-up: explanation, justification, and relation to AIS

For several reasons, it was impossible to use the AIS itself for the examinations performed in the second half of this thesis.
First, the Image Interface Board mentioned during the first half of the thesis is a second-generation board which had not been completed as of the writing of this thesis. Therefore, only the first-generation IIB was available. This version of the IIB performs the peak detection in hardware before the data is transmitted to the DSP Image Processing Board. Therefore, a user of the AIS does not have access to the full time series of acoustical returns, which is essential for the type of object recognition analysis intended. Second, the technology for the Transducer Hybrid Assembly is still in development. While there has been success in this area, the availability of THAs is limited. Further, it is desired not to expose any of the few THAs to the extensive use necessary for all of the object recognition (OR) data gathering. For these reasons, an alternate experimental set-up had to be developed for the OR work. This section of the thesis will first describe the experimental set-up that was developed, including both the data gathering system and the test objects selected. Next, a brief justification of this experimental set-up will be presented. Finally, the relation of this set-up to the actual AIS will be discussed.

5.4.1 Data acquisition

A Panametrics V380 piston transducer was operated in transmit/receive mode in conjunction with a Panametrics Pulser/Receiver Model 5072PR. Data was collected by a Gage 6012PCI data acquisition card, which was set to sample the received data at 30 MHz using an external clock. Data was gathered in a water tank of length 100 cm, width 50 cm, and depth 40 cm. During data acquisition, the water depth in the tank was approximately 25 cm.

5.4.2 Target selection

The chief underwater imaging task currently foreseen for the Acoustical Imaging System is underwater mine detection. There is great variability in the size of underwater mines. They range from the extreme small of the German DM 221, a cylindrically shaped mine with a diameter of 65 mm and a length of 145 mm, to the extreme large of the Iraqi Al Kaakaa/16, a roughly box-shaped mine composed of stacked cylinders and plates with a length of 3.4 m, a width of 3.4 m, and a height of 3.0 m, and the Russian SMDM-2, a cylindrically shaped mine of diameter 0.7 m and length 11 m. Based upon a survey of the more than 50 mines listed in Jane's Underwater Warfare Systems [9], approximately 70% of the mines listed had a roughly cylindrical shape. Further, a typical size was approximately 1.0 m in diameter and 2.5 m in length. While objects at the lower end of this size spectrum are of reasonable dimensions for experimental work, the typical and large objects are prohibitively large. Because of the size of the tank available for this work, objects were restricted to be smaller than approximately 100 mm in diameter and 400 mm in length. It will be argued later in section 5.4.4 that the results obtained with these smaller objects are generalizable to the larger targets that may be encountered in actual mine hunting operations.

Because of the high prevalence of cylindrical shapes in underwater mines and the geometrical simplifications possible due to cylindrical symmetry, pipes and rods were selected as the primary test targets. Plates were also purchased when available for a particular material. Copper, brass, aluminum, stainless steel, polyvinylchloride (PVC), and pine were selected as the materials. Further, object diameters spanning the range of 5 to 90 mm were selected.
Additionally, for the pipes, the pipe wall thickness varied over the range 0.5 to 25 mm. A full listing of the 41 test objects used is provided in appendix D.

5.4.3 Target presentation

The targets were suspended from a horizontal crossbar using two loops of equally long fishing leader line. The horizontal crossbar was clamped to a vertical support bar. This vertical support bar was then attached to a cranking system that allowed the horizontal and vertical position of the target to be precisely adjusted in increments as small as 0.015 mm. This arrangement is shown pictorially below in figure 5-3. Targets were suspended with their long axis parallel to the bottom surface of the water tank. Further, they were presented with their long axis perpendicular to the surface normal of the face of the piston transducer.

Figure 5-3: Drawing of the target suspension apparatus, showing the suspended target and the point of attachment to the vertical and horizontal positioning system.

The target was positioned so that it was approximately halfway between the surface of the water and the tank bottom. Further, the target was centered horizontally within the tank. Due to the experimental set-up conditions available, there may have been some deviation from the parallel and perpendicular positions described above. In particular, the pipe was placed parallel to the tank floor by measuring the height of each side of the overhanging crossbar to ensure that both sides were the same. During the course of experiments, there may have been some variation from horizontal as the set-up was changed and the bar was bumped. Because this type of variation was minor and maintained the normal angle of incidence between the piston transducer and the target, great pains were not taken to eliminate it. Additionally, the normal relationship between the long axis of the target and the surface normal of the face of the piston transducer was established in a similarly ad hoc manner. Using a measurement strategy similar to that mentioned above, the horizontal crossbar was positioned so as to be parallel to the back wall of the water tank. Further, before each data acquisition the angle of the transducer could be adjusted to maximize the strength of the received signal. This approach would ensure that, despite limitations in set-up accuracy and disturbances after set-up, the same relationship would exist between the transducer and the target surface on every data acquisition. In practice, however, the angle of the transducer was not monitored very closely. As will be shown in section 6.2, data measurements turned out to be relatively invariant to minute variations in the angle.

5.4.4 Experimental set-up justification

The decision to use cylindrical test objects has already been justified in section 5.4.2 based upon the representative nature of that shape. However, as noted at that time, the size of the test objects used for this work is significantly smaller than that of a typical sea mine. Therefore, it is also necessary to show that the results for these smaller test objects are generalizable to sea mines. While in general the scattering of sound by targets is very complex, for the relatively simple geometry of a cylinder and for wavelengths that are very small compared to the circumference of the cylinder, the scattering analysis can be greatly simplified through use of "geometrical acoustics". In this short-wavelength limit, the scattered wave can be thought of as splitting into two parts: the reflected and the shadow-forming waves.
Further, the scattering behavior of cylinders as a function of the wavelength of the acoustical energy incident upon them reaches an asymptote in both the total intensity of sound scattered and the directionality of the scattered sound. For wavelengths that are very long compared to the cylinder circumference, relatively little sound is scattered, and that which is scattered is directed primarily backward. As the incident wavelength is shortened, the scattering pattern becomes much more complex, with more of the scattered energy traveling in the forward direction. For very short wavelengths, the directionality of the scattered pulse is very nearly constant, with approximately half of the scattered energy directed forward in a very sharp beam and the rest of the energy approximately uniformly distributed. While the total intensity scattered very quickly reaches its asymptotic behavior, the directionality of the scattered pulse is somewhat slower to reach its short-wavelength limit. Figure 5-4 shows the total intensity of scattered sound from a cylinder as a function of the ratio between the cylinder circumference and the sound wavelength. The evolution of the scattering directionality from a cylinder at increasing circumference-to-wavelength ratios shows similar asymptotic behavior [10].

Figure 5-4: The dependence of total scattered intensity on the ratio 2πa/λ for sound with wavelength λ incident upon a rigid cylinder of radius a; the horizontal axis spans 0 to 5 (adapted from [10]).

Of the 39 cylindrical test objects selected for the object recognition feasibility study, all possess a radius (a) greater than 3.175 mm. This implies that all possess a wave number, ka = (2π/λ)a, of at least (2π/0.5 mm) x 3.175 mm ≈ 40. Further, all but two of the test objects possess a radius of at least 6.35 mm, indicating a wave number greater than approximately 80. Based upon information from Morse and Ingard [10], these wave numbers indicate that the scattering behavior of the cylinders is safely into the asymptotic regions of both scattering intensity and directionality. Therefore, for the test objects selected, the short-wavelength limit applies, and it can safely be stated that the results obtained will be generalizable to larger objects such as sea mines.

5.4.5 Relation of experimental system to Acoustical Imaging System

The use of the piston transducer-based experimental system instead of the Acoustical Imaging System for this work introduces some discrepancies between the problem space being explored in the object recognition feasibility study and the problem space in which an object recognition system would be applied. It is important that these discrepancies be understood so that the knowledge gained in this work may be correctly interpreted. There are two main differences between the experimental system used in the object recognition feasibility study in the second half of this thesis and the Acoustical Imaging System. First, the data acquisition of the two systems varies in terms of the spatial resolution of the acoustical pulses, the signal-to-noise ratio, the length and frequency content of the transmitted pulse, and the sampling period of the received signal. Second, the experimental system is manually set up so that the spatially and temporally interesting regions of the imaging environment are recorded; no such assurance may be made for the AIS.
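Before detailing these differences, note that the short-wavelength criterion of section 5.4.4 reduces to one line of arithmetic. The following sketch merely reproduces the wave numbers quoted there; the wavelength and radii are the values from the text, and the function name is ours.

import math

WAVELENGTH_MM = 0.5   # acoustic wavelength in water near 3 MHz (from the text)

def wave_number(radius_mm, wavelength_mm=WAVELENGTH_MM):
    """ka = (2*pi/lambda) * a for a cylinder of radius a."""
    return 2 * math.pi * radius_mm / wavelength_mm

print(wave_number(3.175))   # smallest test-object radius  -> ka ~ 40
print(wave_number(6.35))    # next-smallest radius class   -> ka ~ 80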
The first difference mentioned above is caused by the use of the piston transducer instead of the Acoustical Imaging System's focused transducer array elements for data acquisition. The piston transducer produces an acoustical pulse that projects directly forward for a distance approximately equal to the surface area of the piston transducer divided by the wavelength of the pulse (πa²/λ). For the experimental set-up, with a piston transducer diameter of approximately 1.125" = 2.86 cm and a wavelength of 0.5 mm, the near-field/far-field transition point, or πa²/λ distance, occurs at approximately 125 cm. Therefore, over the distances used in the experimental set-up (typically approximately 20 cm to the target), the acoustical pulse from the piston transducer will maintain its circular shape [11]. In a surface area equal to that of the piston transducer, the Transducer Hybrid Assembly would contain many acoustical elements. Indeed, the piston transducer head is a circle of approximately 1" in diameter, while the THA is approximately 2" by 2". Therefore, there would be approximately 3200 THA elements contained within the surface area of the piston transducer. When imaging an object that is in focus, each of these elements would project itself through the acoustical lens onto the surface of the target and back. Under these circumstances, the data gathered with the Acoustical Imaging System would have much more spatial resolution than the relatively coarse data now gathered with the experimental set-up. This higher level of detail should prove valuable in object recognition, opening up a range of features that were not available using just the piston transducer. As the imaged target falls out of focus, the received image will blur. The signal received at any one THA element will thus be a convolution over the signals that would have been received at the elements in its neighborhood, with the size of the neighborhood increasing as the image becomes more out of focus. Even for severely distorted images, however, the image will still contain a significant amount of information about the spatial variations in the received signal that should be useful for object recognition.

During all of the data collection with the piston transducer, the maximum output signal voltage was kept at or below 2 V. Because the baseline noise level in the piston transducer and electronics used was slightly higher than 0.02 V, this implies that at any one time, no more than 40 dB of dynamic range was used. The settings used in data collection, however, had to be changed to get the same magnitude of signal from smaller pipes as from larger pipes and plates. At the settings used for the smallest pipes, the largest pipes and plates would have produced returns in the range of 8 V. Therefore, in total, approximately 52 dB of dynamic range was used. These figures compare very favorably with the 60 dB of instantaneous dynamic range available to a detector element in the AIS, and with the considerably greater dynamic range that can be achieved by adjusting the output signal level. Further, if one assumes that detector noise is uncorrelated across elements, then because there are many elements in the AIS, averaging of received values over intelligently selected regions of the detection array may further increase the SNR by a factor of √N, where N is the number of elements averaged.
Such an approach, involving the use of composite signals from a region of detector elements, could significantly increase the dynamic range beyond the 60 dB figure previously quoted. For example, use of 100 elements could increase the dynamic range to 80 dB. Therefore, it appears that the AIS will have more than adequate dynamic range available to perform the type of object recognition tasks accomplished using data from the piston transducer.

Additionally, the length and frequency content of the transmitted pulses from the piston transducer and the THA differ. While the exact length of the pulse to be transmitted by the THA is not yet set, it is currently planned to be about three times longer in duration than the pulse used by the experimental system. This difference will have two effects: 1) the temporal resolution of the AIS will be lower than that of the experimental system, and 2) the transmitted pulse will have less bandwidth. Shown below in figure 5-5 is the frequency content of the recorded acoustical pulse from the piston transducer. To form this estimate of the frequency content, ten recordings of the acoustical pulse's reflection off of a thick (~15 cm), planar, stainless steel block were taken. From each recording, 5 μsec (150 samples) of the reflection was saved. The saved portions of the ten reflections were then appended together into a single file. The power spectral density of the appended file was then estimated using Matlab's psd command with a 1024-point Fast Fourier Transform (FFT), a window size of 100, and an overlap of 80. In this implementation, the input signal is divided into overlapping sections, each of length equal to the window size. Each section is then windowed using a Hanning window and zero-padded to match the length of the FFT. The magnitude-squared Discrete Fourier Transform (DFT) of each section is then computed, and these results are averaged to form the frequency content estimate [12].

Figure 5-5: Frequency content (power spectral density, in dB) of the recorded piston transducer acoustical pulse over the range 0 to 6 MHz.

Shown in figure 5-6 is the receive transfer function of the bistatic AIS transducer. At 1 MHz from the central frequency, the receive transfer function has an attenuation of approximately 13 dB. However, the power spectral density of the recorded piston transducer acoustic pulse has dropped by only approximately 8 dB at 1 MHz from the central frequency. Further, the power spectral density of the recorded acoustic pulse from the piston transducer shown in figure 5-5 takes into account both the transmitted pulse shape and the receive transfer function. The receive transfer function shown in figure 5-6, however, has not been affected by the frequency content of the pulse to be transmitted.
Therefore, the difference between the frequency content of a recorded pulse from the piston transducer and that of a recorded pulse from the bistatic AIS would be somewhat greater.

Figure 5-6: PiezoCAD design report plot showing the receive transfer function, Vo/Pi, of the bistatic AIS transducer over the range 2.000 to 4.000 MHz, with magnitude in dB V/MPa and phase in degrees; the marked value is 44.227 dB V/MPa at -93.050 degrees at 3.000 MHz. In the plot, the smooth curve near the top represents the magnitude portion of the bistatic transducer's receive transfer function. The y-axis label, Vo/Pi, stands for voltage out relative to pressure in.

As can be seen by comparing the data presented in figures 5-5 and 5-6, the frequency-domain characteristics of the AIS and the piston transducer are far from identical. However, they are qualitatively similar in that both have their greatest sensitivity at approximately 3 MHz. Therefore, while data gathered using the piston transducer would be inappropriate for predicting the expected return from an object imaged with the AIS, the same analysis techniques used to perform object recognition with piston transducer frequency data should be applicable to data gathered with the AIS. Because a narrower range of frequencies is present, however, the usefulness of frequency-domain classification characteristics will likely be lower in data gathered by the AIS than by the piston transducer.

The final difference between the experimental set-up and the AIS due to the use of the piston transducer has to do with the sampling period. The AIS quadrature samples its data. In the present application, the AIS samples data at four times the center frequency of the transmitted pulse, 3 MHz. Because data is captured at 12 MHz, the AIS will be capable of sampling data containing frequencies of up to 6 MHz before aliasing becomes an issue. Signals were captured from the piston transducer using an externally generated 30 MHz clock. At this data acquisition rate, the experimental system was capable of capturing data at frequencies of up to 15 MHz before aliasing is introduced. Based upon the examination of the power spectral densities (PSD) of many captured signals, almost all of the frequency content of the returned signals is below 5 MHz, with the frequency content at 5 MHz generally being reduced by a factor of 30 to 100 relative to the content at the center frequencies in the 3-3.5 MHz range. Additionally, while there are on occasion some low-level features of interest in the 6-10 MHz range of the PSDs, the vast majority of interesting features are located below 6 MHz. Finally, essentially all of the frequency content above 10 MHz is noise. Therefore, although the AIS is not capable of sampling the data at the higher rate that was used in the experimental set-up, it appears that aliasing of the returned signal is not a major issue. Furthermore, since the vast majority of the interesting frequency-domain features are in the range below 6 MHz, the frequency-domain recognition power of the AIS should not be adversely affected by its lower sampling rate.

The second class of differences between the experimental set-up and the AIS arose because the experimental set-up is specifically designed to capture interesting signals. Before a piece of data is acquired, the target and the transducer are positioned so that a strong signal will be received.
Further, due to file size constraints, only those portions of the acquired signal containing the reflections of interest are stored for later analysis. With the AIS, however, there is no such guarantee that the signals recorded will be of use in object recognition. Obviously, the system operator must first encounter and image a target. Further, within the volume of data acquired, only a small subset will be useful in object recognition. A mechanism must be put in place, whether automated or operator controlled, to allow the system to identify such regions of interest. This topic will be addressed further in chapter 7.

5.5 Conclusion

Such differences between the Acoustical Imaging System design and the experimental system as the AIS's longer acoustic pulse and narrower frequency band will make the object recognition task more complicated for the AIS than for the experimental system. On the whole, however, the type of data available from the two systems is very similar: low-megahertz underwater acoustic data from generally cylindrical targets that are very large compared to the wavelength. Further, the AIS possesses more dynamic range than was used by the experimental system and has an ensemble of transducers instead of just one. These factors should serve to make the object recognition task less complicated for the AIS than for the experimental system. Moreover, because it is still in the design stage, future changes may be made to the AIS. Indeed, demonstration of a viable object recognition algorithm using information like that available from the experimental system may provide motivation for such changes. Therefore, development of a prototype object recognition algorithm to work with the data available from the experimental system is deemed to be a worthwhile endeavor.

6 Object recognition feasibility study

6.1 Introduction

To determine the feasibility of performing object recognition with the AIS, this chapter presents a prototypical object recognition system built for an experimental system that has access to data similar to that of the AIS. While a great deal of work has been devoted to object recognition in similar systems, such as synthetic aperture radar and sonar, that also possess an inherent noisiness or speckle, no work has been done on the development of object recognition algorithms for a low-megahertz acoustical imager [13, 14]. This chapter begins by presenting the target orientation dependence found in the experimental data. This viewpoint dependence may or may not be present in data from the AIS. The features used during object recognition and the rationale behind their selection are then presented. In general, it is found that no single feature is effective for classifying all of the test objects, so many features must be developed. Next, the object recognition prototype that was developed to work with the data gathered by the experimental system is presented. Finally, the object recognition system's performance is presented in a variety of situations. The performance of the system is analyzed as noise and viewpoint uncertainty are added to the data. Further, the performance of the object recognition prototype is explored under an open problem space. In all cases, the performance of the system is found to be quite good.

6.2 Dependence of acquired data on viewpoint

Early in data collection, it was noted that specular reflections from a target could appear qualitatively different.
In particular, it was observed that the return signal was sensitive to translations of the target with respect to the transducer in the vertical direction. Said another way, as the target's location was adjusted up or down with the cranking system, the observed reflection kept approximately the same amount of power but varied qualitatively. These variations were symmetric about the centerline of the pipe and manifested themselves through strengthening and weakening of the different peak amplitudes in the reflection. Furthermore, minor variations in the reflected signal were observed as the angle of the transducer was varied in the plane containing the target surface normal and long axis. No variations, however, were noted as the target was translated toward or away from the transducer within the transducer's near field.

Following up on the observations noted above, measurements of the reflections from four targets were taken as a function of the vertical translation distance of the transducer from the target centerline and the rotational deviation from perpendicular of the transducer in the plane containing the target surface normal and long axis. The four targets selected are shown below in table 6-1. These objects were selected because they covered a range of materials, diameters, and wall thicknesses. The recorded time series data and an estimate of the frequency content of a region including the first reflection are shown for object #9 and object #19 in figures 6-1 through 6-4. The region from which the frequency content estimate was formed started approximately 1 μsec prior to the reflection start and had roughly 17 μsec duration. The data generated for object #27 and object #31 are very similar in nature to those presented.

Object Number   Material   Outer Diameter   Wall Thickness
9               Copper     41.10 mm         1.08 mm
19              Brass      35.64 mm         1.71 mm
27              Aluminum   31.71 mm         2.05 mm
31              Aluminum   50.80 mm         6.40 mm

Table 6-1: Listing of the four targets for which recordings were taken over a course of precise translational and rotational variations.

Figure 6-1: Acoustical backscatter recordings from object #9 over a series of vertical translations of the object. Object #9 is a copper pipe with 41.10 mm diameter and 1.08 mm shell thickness. In the left column is the time series data, which has a range of 4 μsec, or 120 samples. In the right column is the power spectral density computed from the time series data, which covers the range of 0 to 6 MHz. The first row of plots corresponds to data taken with the transducer located at a height equal to that of the target centerline. Between each row of plots, the target centerline is translated by approximately 2 mm relative to the transducer. Due to the symmetry of the pipe, the results are the same for either an upward or downward translation. Note that the shape of the power spectral density has changed considerably by the time the translation has reached 4 mm. The strength of the time series, however, does not decay significantly until approximately 8 mm of translation.

Figure 6-2: Acoustical backscatter recordings from object #9 over a series of horizontal rotational displacements of the transducer. Object #9 is a copper pipe with 41.10 mm diameter and 1.08 mm shell thickness.
In the left column is the time series data, which has a range of 4 μsec, or 120 samples. In the right column is the power spectral density computed from the time series data, which covers the range of 0 to 6 MHz. The first row of plots corresponds to data taken with the surface normal of the transducer parallel to the surface normal of the target at its centerline. Between each row of plots, the transducer is rotated by approximately 0.2 degrees relative to the target. Due to the symmetry of the pipe, the results are the same for either clockwise or counterclockwise rotation. Note that the strength of the time series begins to drop very quickly, with a considerable decrease in signal strength occurring by a rotation of 0.4 degrees. The form of the power spectral density, however, exhibits a high degree of consistency throughout the rotations.

Figure 6-3: Acoustical backscatter recordings from object #19 over a series of vertical translations of the object. Object #19 is a brass pipe with 35.06 mm diameter and 1.71 mm shell thickness. In the left column is the time series data, which has a range of 4 μsec, or 120 samples. In the right column is the power spectral density computed from the time series data, which covers the range of 0 to 6 MHz. The first row of plots corresponds to data taken with the transducer located at a height equal to that of the target centerline. Between each row of plots, the target centerline is translated by approximately 2 mm relative to the transducer. Due to the symmetry of the pipe, the results are the same for either an upward or downward translation. Note that the shape of the power spectral density has changed considerably by the time the translation has reached 4 mm. The strength of the time series, however, does not decay significantly until approximately 8 mm of translation.

Figure 6-4: Acoustical backscatter recordings from object #19 over a series of horizontal rotational displacements of the transducer. Object #19 is a brass pipe with 35.06 mm diameter and 1.71 mm shell thickness. In the left column is the time series data, which has a range of 4 μsec, or 120 samples. In the right column is the power spectral density computed from the time series data, which covers the range of 0 to 6 MHz. The first row of plots corresponds to data taken with the surface normal of the transducer parallel to the surface normal of the target at its centerline. Between each row of plots, the transducer is rotated by approximately 0.2 degrees relative to the target. Due to the symmetry of the pipe, the results are the same for either clockwise or counterclockwise rotation. Note that the strength of the time series begins to drop very quickly, with a considerable decrease in signal strength occurring by a rotation of 0.4 degrees. The form of the power spectral density, however, exhibits a high degree of consistency throughout the rotations.

As can be seen in figures 6-1 through 6-4, the data for both objects show roughly the same pattern. Significant drops in the reflection power do not occur until approximately 8.0-10.0 mm of vertical translation has been introduced. However, within 3.0 mm of vertical offset, the time series has altered in a qualitatively noticeable fashion.
Further, within 3.0 mm of movement, the frequency content of the signal has altered significantly, with peaks and troughs in the spectrum appearing and disappearing. This finding suggests that for object classification, especially if the frequency content of the returned signal is to be used during the identification process, it may not be enough to simply ensure that a specular reflection is used; it may be necessary to ensure that the reflections are recorded from consistent vertical positions. The degree to which the above assertion is true depends upon the source of the translational variations. The nature of the source is speculated upon later in this section.

The data in figures 6-1 through 6-4 suggest that angular variations do not follow the same pattern as vertical translations. In particular, signal strength generally declined noticeably within 0.4° of rotation from normal. However, the reflection time series do not appear to change qualitatively until approximately 0.8-1.0° of rotation. Similarly, the frequency content of the backscattered acoustical pulse maintains much greater similarity with the normal-incidence frequency content even as the signal strength decays. This finding suggests that, unlike with vertical translations, any strong reflection will have the proper angular alignment for object classification.

It should be noted that the reflections in figures 6-1 and 6-3 appear qualitatively different from those in 6-2 and 6-4. The mechanical apparatus that was used to achieve the precise angular orientations was quite large. Therefore, its use necessitated that the transducer be located far enough away from the target that it was close to the acoustical near-field/far-field transition point discussed earlier in section 5.4.5. This target location is the cause of the dissimilarity between the rotational and translational data. While it was impossible to take precise angular measurements in the near field without the mechanical apparatus, near-field backscatter behavior was observed throughout a range of rotations.
This change in the nature of the incident signal is hypothesized to account for the rapid variation in the observed transfer function as vertical translations occur. The transfer function, however, does not appear to be highly sensitive to perturbations with respect to the angle of incidence in the horizontal direction. If, as conjectured in the previous section, the alteration of the observed transfer function is due to the loss of symmetry as seen at the face of the piston transducer, this lower sensitivity to rotational movements would be expected. Because rotational movements keep the centerline of the piston transducer in the same plane as the centerline of the target, symmetry is maintained throughout. Finally, it must also be recognized that the effects noted may be due solely to spatial variations in the piston transducer beam pattern. The transmission characteristics of a piston transducer vary spatially, with peaks and nulls in the transmission pattern appearing and disappearing in a complex manner. Because the acoustical pulse used had a short duration, it was broadband in frequency content. The broadband nature of the pulse tends to smear out the peaks and nulls of the transmission pattern. A complicated, frequency-related pattern, however, still exists. This pattern could induce orientation-specific alterations in the backscattered signal [15, 8]. Based upon the above findings, the vertical position of the transducer was closely monitored during the recording of the target data. As mentioned previously, because of the symmetry possessed by the pipes and cylinders, the reflection alterations produced by a vertical translation were symmetric with respect to the pipe's centerline. By searching for this point of symmetry in the reflection variations, the transducer was centered on the pipe. During the course of the recordings, the transducer's vertical position was varied within approximately ± 1.5 mm of this centerline. This variation was introduced to make the data gathered in this study more relevant to the AIS being developed by Lockheed Martin: because vertical positions cannot be controlled as precisely in a real-world environment as in the lab, the variation reflects that real-world uncertainty. However, as shown in the data in figure 6-1 above, this degree of variation produces only minor effects on the received reflection, and therefore should not hinder OR system development so much as to make the task impossible. To more precisely explore the importance of vertical position uncertainty in the data, a second set of data was also taken for each target. In this data set, the vertical position was varied over a range of approximately ± 12 mm. While the OR system was developed using the data for which the vertical position was more precisely controlled, the system was also evaluated using the less precise data. The results of this exercise are presented in section 6.5.3. Further, in chapter 7, a discussion of how data with a precise vertical position may be obtained by a real-world system is presented, along with a discussion of how precise such automatic positioning may be expected to be. As previously stated in section 5.4.3, the rotational position of the transducer was not monitored closely prior to acoustical recordings. In between each recording, the transducer head was swiveled so that the acoustic signal would disappear.
The transducer head was then reset, taking care only to ensure that a reflection within about two-thirds of the maximum was received.

6.3 Object recognition feature selection

Early in the OR system development process, it was recognized that a universal set of features that could be extracted once and allow precise determination of the identity of the target was beyond reach. The reason for this is simple: the objects to be identified were often very different. Features that were quite significant in the identification of some objects were not only worthless but detrimental to the identification of others. Therefore, the objects were broken into classes, and for each class a family of attributes was developed that allowed the members of the class to be discerned from the other targets.

[Figure: three time series recordings, titled "Recorded data for object #29, a thick-shelled aluminum pipe", "Recorded data for object #26, a thin-shelled aluminum pipe", and "Recorded data for object #35, a solid aluminum cylinder".]

Figure 6-6: An example of a thick-shelled object (top), a thin-shelled object (middle), and a cylindrical object (bottom). Each of the above three recordings is typical of its class. Comparison of the top two recordings quickly establishes the essential difference between thick- and thin-shelled objects: whereas individual reflections may be identified for thick-shelled objects, only complex reflection regions may be identified for thin-shelled objects. Cylindrical objects were assigned to their own class because of the relative sparseness of their data sets and because they lack the structural features exhibited by pipes.

Three classes of objects were used: thick-shelled objects, thin-shelled objects, and solid cylinders. Figure 6-6 shows an example recording for each of the three classes. An object was considered to be a thick-shelled object if its walls were thick enough to produce distinct reflections. Because the acoustical pulse used for the experiments was approximately 1.4 µsec in duration, for a target to be considered thick-shelled, the wall of the target had to be wide enough that sound required 1.4 µsec to traverse twice its width. Further, because sound travels at different speeds in different materials, the distinction between a thick and a thin shell depends not only on the shell thickness but also on the material composing the shell. For the materials used in the experiments, table 6-2 lists the thickness required to be considered a thick-shelled object. Sixteen objects were assigned to the thin-shelled class, fifteen objects to the thick-shelled class, and eight objects to the solid cylinder class. Note that the two metal plates were both assigned to the thick-shelled class.

Material    Speed of sound (mm/µsec)    Minimum thick-shell wall thickness (mm)
Aluminum    6.40                        4.48
Brass       4.42                        3.09
Copper      5.01                        3.51
PVC         2.38                        1.67

Table 6-2: Pipe wall thickness required to be considered a thick-shelled pipe for various materials.
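The thick-shell boundary follows directly from the pulse duration and the material sound speeds. The following minimal sketch reproduces the table 6-2 thresholds; Python with NumPy/SciPy is assumed for all code examples in this chapter, since the thesis itself supplies no code, and all function and constant names are illustrative rather than taken from the thesis.

```python
# Speed of sound in each shell material, in mm/usec (from table 6-2).
SPEED_MM_PER_USEC = {'aluminum': 6.40, 'brass': 4.42, 'copper': 5.01, 'pvc': 2.38}

PULSE_DURATION_USEC = 1.4  # approximate duration of the transmitted pulse


def thick_shell_threshold(material):
    """Minimum wall thickness for distinct reflections: the round trip
    through the wall (twice its width) must outlast the pulse, so the
    threshold is speed * 1.4 usec / 2."""
    return SPEED_MM_PER_USEC[material] * PULSE_DURATION_USEC / 2.0


# e.g., thick_shell_threshold('aluminum') -> 4.48 mm, matching table 6-2
```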
For each class of objects, a family of classification features was developed to exploit the material and structural properties of the target. Further, estimates of an object's transfer function, which depends upon the material and structural properties of the object, were also used in differing ways by each class of objects.

6.3.1 Material-based classification features

6.3.1.1 Thick-shelled targets

Just as Snell's Law can be used to determine the direction of propagation of scattered waves at material interfaces, the acoustical Fresnel equations describe the relation between the amplitude of scattered waves and the amplitude of the incident waves. The acoustical Fresnel equations allow the determination of the reflection coefficient and the transmission coefficient for a wave incident upon an interface. These coefficients, in turn, can be used to determine the amplitude of the reflected and transmitted portions of the scattered wave in terms of the amplitude of the incident wave. In general, these coefficients are complicated functions of the type of wave (longitudinal or shear) that is incident, the types of waves produced, the angle of incidence, and the materials composing the interface [16]. For the Acoustical Imaging System, all the imaged objects will reside in water. Since liquids support only longitudinal and not shear waves, the acoustical situation is somewhat simplified: all incident waves are longitudinal. While shear wave propagation may occur within the solid as the result of scattered acoustical energy from a non-perpendicular wave incidence, all waves recorded at the transducer will also be longitudinal. Further, because firm restrictions have been placed upon the orientation of the transducer and the target for experimental data acquisition, as discussed in section 6.2, there is only small variation in the angle of incidence of the acoustical waves on the target. While small variations in angle of incidence can cause wild variations in the reflection and transmission coefficients, the acoustical picture is straightforward enough that it may be possible to extract a simple statistic from the reflected time series that will shed light on the material of the target [16]. Figure 6-7 shows example recordings from PVC, aluminum, and brass pipes. As can readily be seen, the ringing patterns observed for these pipes are quite distinct. The ringing from the PVC pipe dies very quickly. The ringing from the aluminum pipe is much more persistent. The brass pipe, however, maintains the most energy in its ringing over time. After examining thick-shelled material time series like those shown above, a statistic was found through trial and error that could serve as an excellent predictor of material from recorded data. The derived statistic is the ratio of the maximum absolute value of the 3rd reflection convolved with a matched filter to the maximum absolute value of the 2nd reflection convolved with the same matched filter. Shown below in figure 6-8 is a schematic illustration of the method by which the material determination statistic was computed.
[Figure: three time series recordings of the data about the first reflection, titled "Time series data about first reflection for object #5 - a PVC pipe", "Time series data about first reflection for object #33 - an aluminum pipe", and "Time series data about first reflection for object #22 - a brass pipe".]

Figure 6-7: Example recordings from thick-shelled PVC, aluminum, and brass pipes. The three figures above illustrate two important points. The first point is that the ringing from brass appears to be more persistent than that from either aluminum or PVC. Likewise, aluminum appears to ring longer than PVC. The second point is that the relationship between the first reflection (which is from the front edge of the front wall of the pipe) and the second reflection (which is from the back edge of the front wall of the pipe) is complicated. Simply because an object exhibits a strong second reflection does not mean that it will ring well. Furthermore, examination of the data indicates that there is a high degree of variability in the relationship between these two reflections, even within the recordings for a single object. The relationships between the second, third, and subsequent reflections (as long as they are still reflections from the back edge of the pipe's front wall) are much simpler and more consistent. Therefore, the ability of a pipe to ring is best characterized by examining these later reflections.

[Figure: flow diagram with the following steps. 1) Convolve the data with the matched filter. 2) Find the maximum absolute value of the convolution - this is the 1st reflection. 3) Find the maximum absolute value of the convolution between 1 and 20 µsec after the first reflection - for thick-shelled objects, this value corresponds to the 1st reflection from the back edge of the pipe's front wall (2nd overall). 4) Measure the time from the 1st to the 2nd reflection. 5) Find the maximum absolute value of the convolution over a 2 µsec window whose center is located as far after the 2nd reflection as the 2nd is located after the 1st - for thick-shelled objects, this value corresponds to the 2nd reflection from the back edge of the pipe's front wall (3rd overall).]

Figure 6-8: A schematic illustration of the method by which the material determination statistic is computed.

Unfortunately, the thick-shelled targets were composed of only the three materials shown above. While the number of materials was quite limited, each of these three produced very different results for the test statistic. As will be discussed later, the thirty data recordings for each of the fifteen thick-shelled objects were divided into training and test sets, with fifteen recordings assigned to each set. With the exception of one outlier from the aluminum recordings, all of the ratios for the training data fell into non-overlapping segments. A sketch of the statistic follows.
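The 30 MHz sampling rate used below is inferred from the 120-samples-per-4-µsec figures quoted earlier; the use of the absolute value of the matched-filter output as an envelope is likewise an illustrative assumption.

```python
import numpy as np


def material_ratio(recording, pulse, fs=30e6):
    """3rd-to-2nd reflection ratio statistic of section 6.3.1.1 (sketch)."""
    # Matched filter: convolve with the time-reversed transmitted pulse.
    env = np.abs(np.convolve(recording, pulse[::-1], mode='same'))

    i1 = int(np.argmax(env))                 # 1st reflection (front edge of front wall)

    lo = i1 + int(1e-6 * fs)                 # search 1 - 20 usec after the 1st reflection
    hi = i1 + int(20e-6 * fs)
    i2 = lo + int(np.argmax(env[lo:hi]))     # 2nd reflection (back edge of front wall)

    # The 3rd reflection is expected as far after the 2nd as the 2nd is after
    # the 1st; search a 2-usec window centered there.
    center = i2 + (i2 - i1)
    half = int(1e-6 * fs)
    peak3 = env[center - half:center + half].max()

    return peak3 / env[i2]
```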
Table 6-3 lists the maximum and minimum mean ratios for each material. Also included is the standard deviation of the ratio values for the object that produced each maximum or minimum.

Material    Maximum mean ratio   Std. deviation   Object producing maximum   Minimum mean ratio   Std. deviation   Object producing minimum
PVC         0.0686               0.0056           7                          0.0440               0.0028           1
Aluminum    0.6819               0.0237           37                         0.5130               0.0120           34
Brass       0.8307               0.0164           25                         0.7594               0.0125           20

Table 6-3: 3rd to 2nd reflection ratio statistics for the training data examined.

The variations in the reflection ratio data are most likely due to unaccounted-for angle-of-incidence and shear wave effects. Because of the short distances over which the acoustical propagation is occurring, these differences are most likely not due to loss during propagation. This assertion is borne out by the data. While for PVC the object producing the maximum ratio is the thinnest and the object producing the minimum ratio is the thickest, this pattern does not hold for aluminum and brass. For example, object #32, an aluminum pipe which at 24.78 mm thick is by far the thickest aluminum pipe, has the third greatest 3rd to 2nd reflection ratio of the six aluminum pipes. While the variations are significant, the data definitely indicates that there is a good deal of information that can be gleaned from data of this type. Further, it appears that a major source of variation in reflection ratios is the shape of the object. While more data is necessary to confirm this suspicion, the objects with the greatest reflection ratio for both aluminum and brass are the flat plates. Indeed, for aluminum, the deviation between the reflection ratio of the flat plate and the second greatest reflection ratio is approximately three times as great as the variation amongst the aluminum pipes. Because of the clear separation amongst the ratio ranges of the materials, in the experimental system the 3rd to 2nd reflection ratio data could be translated directly into a material assignment. For more complex situations in which there are not clearly separable material ratio ranges, such information could be used to restrict the possible materials to be considered.

6.3.1.2 Thin-shelled targets

Because of the complex nature of the reflections encountered with thin-shelled targets, no simple and general statistic could be found that would produce an estimate of the target's material composition. While it was noted that the ringing from copper pipes appeared to be more persistent than that from brass pipes, which in turn appeared to ring longer than aluminum pipes, these observations were difficult to quantify. Moreover, translation of this insight into a computer program proved infeasible. Therefore, a composite statistic was developed that is dependent upon both the material and structural properties of the target. This statistic will be discussed later in section 6.3.2.2, which covers structural classification criteria for thin-shelled targets.

6.3.1.3 Cylindrical targets

While there certainly is clear separation between the reflections produced by the cylindrical targets, data to allow material determination using a technique similar to that used for the thick-shelled targets was not available. Because of memory constraints, the recorded data was occasionally truncated prior to the second reflection from the back wall of the cylinder. Further, by the time that it became apparent that the ratio of the 3rd and 2nd reflections (which correspond to the 2nd and 1st reflections from the cylinder back wall) could be useful in material determination, the experimental set-up was no longer available.
Additionally, as clearly evidenced in figure 6-11 shown later, the acoustical environment for solid cylinders can be relatively complicated. Many reflections are apparent in addition to longitudinal wave reflections from the various interfaces. Sources for these reflections include shear waves, creeping waves, and surface waves [17]. Therefore, no material information could be extracted from the data recorded on the cylinders. Finally, it should be noted that some of these more complex features have been observed in the thick-shelled pipes. They are much less prominent, however, and do not complicate the identification of structural features to a high degree.

6.3.2 Structure-based classification features

6.3.2.1 Thick-shelled targets

For thick-shelled targets, two structural features can be easily extracted: the pipe shell thickness and the inner diameter of the pipe. Both could be measured to a precision on the order of 0.1 mm, and both were very useful in target identification. Figure 6-9 exhibits which features in a typical thick-shelled data recording correspond to these structural features.

[Figure: time series recording with the intervals corresponding to shell thickness and inner diameter marked.]

Figure 6-9: Demonstration of which features in a typical thick-shelled recording correspond to the structure-based features of shell thickness and pipe inner diameter. The data shown above was recorded from object #34, an aluminum pipe with a shell thickness of 12.75 mm and an inner diameter of 50.81 mm. As illustrated in the figure, a thick-shelled pipe's wall thickness may be determined by examining how long after the first reflection (which corresponds to the front edge of the front wall of the pipe) the second reflection (which corresponds to the back edge of the front wall of the pipe) occurs. The inner diameter of the pipe may be determined by locating how long after the reflection from the back edge of the front wall the reflection from the front edge of the back wall occurs. While the algorithm that extracts these features uses a matched filter as explained in this section, the data shown above has not yet undergone this process.

The first step in measuring the thickness of the pipe wall is to convolve the data with a matched filter (which is a time-reversed version of the strongest portion of the transmitted pulse). The start of the pipe's front wall reflection is then found by searching for the maximum absolute value of the convolved data. Because it is assumed during the application of this criterion that the data originated from a thick-shelled pipe, it is known that the 1st back wall reflection must occur at least 1.4 µsec after the front wall reflection. Further, because it is known that the maximum thickness of any of the thick-shelled pipes is 24.78 mm and occurs for target #32, an aluminum pipe, the 1st back wall reflection must occur within 6 µsec of the front wall reflection. Therefore, the convolved data between 1.4 µsec and 6.0 µsec after the front wall reflection is searched for the maximum absolute value. The location of this local maximum is considered to be the location of the 1st back wall reflection. Dividing the number of samples between the front wall reflection and the 1st back wall reflection by the sampling rate, multiplying this value by the speed of sound in the material, and halving the result (since the pulse traverses the wall twice) yields the wall thickness.
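Under the same assumptions as the earlier examples (30 MHz sampling, illustrative names), the wall-thickness measurement just described can be sketched as follows; note the halving, since the measured delay is a two-way trip through the wall.

```python
import numpy as np

# Speed of sound by material, mm/usec (table 6-2).
SPEED_MM_PER_USEC = {'aluminum': 6.40, 'brass': 4.42, 'copper': 5.01, 'pvc': 2.38}


def wall_thickness_mm(recording, pulse, material, fs=30e6):
    """Thick-shelled wall-thickness estimate of section 6.3.2.1 (sketch)."""
    env = np.abs(np.convolve(recording, pulse[::-1], mode='same'))
    i_front = int(np.argmax(env))                 # front wall reflection
    lo = i_front + int(1.4e-6 * fs)               # thick-shell lower bound
    hi = i_front + int(6.0e-6 * fs)               # upper bound from the thickest pipe
    i_back = lo + int(np.argmax(env[lo:hi]))      # 1st back wall reflection
    delay_usec = (i_back - i_front) / fs * 1e6    # two-way travel time in the wall
    return 0.5 * delay_usec * SPEED_MM_PER_USEC[material]
```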
A similar procedure is used to find the inner diameter of the pipe. This task is slightly complicated by the fact that the phase information available through the experimental system is somewhat difficult to use. This difficulty in phase recovery arose because the sampling times are dictated by an external clock that has no relation to the pulse's central frequency. Therefore, phase information was deemed not to have a high enough value to merit the level of effort required to recover it. Although it was possible to work around the lack of phase information, inner diameter determination should be considerably easier for data acquired through the AIS. With the AIS, phase data can be quickly computed from the quadrature samples. This phase data could then be used to tell whether a reflection is from an interface in which sound is passing from a material of higher to lower acoustical impedance or from lower to higher acoustical impedance. With such information, reflections from the back edge of the front wall of the pipe could quickly be identified and ignored during the search for the first reflection from the front edge of the back wall of the pipe. To find the inner diameter of the pipe, the front reflection is again located. From knowledge of the dimensions of the thick-shelled pipes in the problem space, it is known that no back wall reflections will occur within 15 µsec of the front reflection. Therefore, the matched-filtered data from 15 µsec after the front reflection until the end of the recording is examined to find the location with the largest absolute value. Ignoring the first 15 µsec after the front reflection serves two purposes: 1) by ignoring data that could not possibly contain the back wall reflection, the amount of work to be performed is decreased, and 2) during the 15 µsec after the front reflection, the reflections from the back side of the front wall have time to die down to a level at which the front reflection of the pipe's back wall should be larger. To ensure that the reflection of the front edge of the back wall has been found instead of just more ringing from the front wall, the data preceding the presently located data by a period of time equal to the propagation time through a pipe wall is examined. If this data has a strength that relates to the data just located by a relationship matching the material's reflection ratio constraint, then the currently identified reflection is simply ringing from the front wall. If no such relationship exists, however, the currently identified location is determined to be the location of the back wall. Using the location of the back wall and the previously identified position of the back edge of the front wall, the inner diameter of the pipe is calculated. If no back wall data can be identified, the present object is assumed to be a plate.

6.3.2.2 Thin-shelled targets

As stated previously in section 6.3.1.2, because of the complexity of the backscattered signal from a thin-shelled target, formation of an estimate of the object's material is essentially impossible. For the same reason, precise determination of the time at which the 1st reflection from the back surface of the pipe's front wall occurs is quite difficult. While a sophisticated technique involving deconvolution might be able to determine this time, no success was achieved during the work on this thesis. The ringing pattern following the first reflection, however, is highly variable across pipes, yet essentially invariant across recordings for a particular pipe.
Therefore, it was felt that a criterion that captures information about this ringing would be helpful in discriminating amongst the thin-shelled objects. Examination of the thin-shelled target reflection data indicated that for all objects, ringing persisted at a level greater than the background noise for at least 6 µsec. Further, during the first 2 µsec after the start of the first reflection, the backscattered signals all appeared about the same. Therefore, the data in the range of 2 - 6 µsec following the first reflection is used to characterize the ringing. The first step in the characterization process is to convolve the reflection data with a matched filter. By finding the maximum absolute value in the resulting data, the location of the first reflection is found. The convolved data is then broken up into four sections: 2 - 3 µsec, 3 - 4 µsec, 4 - 5 µsec, and 5 - 6 µsec. The sum of the squares of the convolved values is computed for each region (these will be referred to as SSQ1, SSQ2, SSQ3, and SSQ4, respectively). A set of ratios of the sum-of-squares values is then used as a criterion that captures effects caused by the pipe material and wall thickness. Six independent ratios of two regions' sum-of-squares values can be formed from the four regions listed above. They are SSQ1:SSQ2, SSQ1:SSQ3, SSQ1:SSQ4, SSQ2:SSQ3, SSQ2:SSQ4, and SSQ3:SSQ4. From these six ratios, a total of sixty-four sets of ratios could be formed, spanning from the empty set to a set that includes all six of the ratios. To use a ratio set for discrimination, a template matching score is produced, as will be done later with the object recognition prototype. First, the value of each ratio in the set is computed for an unknown target. These values are then compared against the mean and standard deviation ratio values for each of the ratios that are stored in a template formed from the training data. These comparisons are used to produce the average number of standard deviations by which each ratio differs from the template mean. This number of standard deviations is the template matching score. Each of these sixty-four ratio sets was evaluated based upon its discriminatory power. The measure of discriminatory power used was the area under the receiver operating curve (ROC). A receiver operating curve plots the probability of a false alarm versus the probability of a detection for a particular object over all possible detection thresholds for a particular discriminatory test. The area under the ROC then gives a measure of the discriminatory power of a test for a particular object: the greater the area under the ROC (up to a maximum of one), the greater the power of the test. For a particular test, an ROC could be drawn for each object. To create the ROC for an object, a ratio set template was formed from the training data for that object. A template scoring match value was then computed for every piece of training data. By varying the detection threshold, an ROC was plotted for that ratio set for that object. The area under this ROC was then computed. For a particular set of ratios, the average, maximum, and minimum area under the ROC over all objects was computed. Further, the standard deviation of the areas under the ROC over all of the objects was computed for each set of ratios. Following evaluation using this method, a ratio set composed of 2 - 3 µsec : 3 - 4 µsec, 3 - 4 µsec : 5 - 6 µsec, and 4 - 5 µsec : 5 - 6 µsec was selected. This ratio set was selected for two reasons: simplicity and discriminatory power. A sketch of the selected criterion appears below.
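As before, the 30 MHz rate and the function names are assumptions made for illustration.

```python
import numpy as np


def ringing_ratios(recording, pulse, fs=30e6):
    """Thin-shelled ringing criterion of section 6.3.2.2 (sketch): returns
    (SSQ1/SSQ2, SSQ2/SSQ4, SSQ3/SSQ4) over the four 1-usec windows that lie
    2 - 6 usec after the start of the first reflection."""
    conv = np.convolve(recording, pulse[::-1], mode='same')
    i1 = int(np.argmax(np.abs(conv)))          # start of the first reflection
    usec = int(1e-6 * fs)
    ssq = [float(np.sum(conv[i1 + k * usec:i1 + (k + 1) * usec] ** 2))
           for k in range(2, 6)]               # SSQ1 .. SSQ4
    return ssq[0] / ssq[1], ssq[1] / ssq[3], ssq[2] / ssq[3]
```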
Of the sixty-four ratio sets evaluated, the selected set performed the seventh best, with an average area under the ROC of 0.988, a standard deviation of 0.018, a maximum of 1.000, and a minimum of 0.942. Further, the selected criterion had the best performance of any ratio set comprised of three or fewer ratios in terms of both mean area under the ROC and minimum area under the ROC. To summarize, the criterion developed in this section first locates the start of the first reflection by finding the maximum absolute value of the recorded data convolved with a matched filter. The sum of squared values in the convolved data is then calculated for the spans 2 - 3 µsec, 3 - 4 µsec, 4 - 5 µsec, and 5 - 6 µsec following the first reflection start (these are referred to as SSQ1, SSQ2, SSQ3, and SSQ4, respectively). The ratios SSQ1/SSQ2, SSQ2/SSQ4, and SSQ3/SSQ4 are then computed and together serve as the criterion.

[Figure: time series recording with the four 1 µsec regions bracketed and a two-headed arrow spanning the first reflection to the back wall backscatter.]

Figure 6-10: Illustration of the regions of data that correspond to the features extracted for a typical thin-shelled object data recording. The data shown above were recorded from object #16, a brass pipe with a shell thickness of 1.62 mm and an inner diameter of 12.62 mm. Note that the data has not yet undergone matched filtering. The four brackets denote the regions used in the ratio formation. The long two-headed arrow shows roughly the length of time from the start of the first reflection to the strongest portion of the backscatter caused by the pipe's back wall.

For thin-shelled objects, it is also possible to get consistent measurements of the length of time from the first reflection to the strongest portion of the backscatter caused by the back wall of the pipe. Because the material of the target is not known, it is not possible to form an inner diameter estimate from this time data; however, this time information provides a strong constraint on the classification possibilities. Figure 6-10 illustrates the regions of data that correspond to the features extracted for a typical thin-shelled object data recording that has not yet undergone matched filtering.

6.3.2.3 Cylindrical targets

Because of the lack of material information for cylindrical targets, as discussed in section 6.3.1.3, the exact thickness of a cylindrical target cannot be determined. However, the time at which the second reflection occurs can be precisely determined by searching for the first strong reflection following the front wall reflection. Currently, this task is implemented by matched filtering the data, finding the first reflection, and then searching for the first location in the matched-filtered data whose absolute value exceeds a threshold. Much like with thin-shelled targets, this time information provides a strong constraint on the classification possibilities. Figure 6-11 shows an example of the time-to-back-surface feature on a typical cylindrical data recording prior to matched filtering.

[Figure: time series recording annotated with the time from the cylinder front edge reflection to the back edge reflection.]

Figure 6-11: Example of the time from cylinder front edge to back edge structural feature from a typical cylindrical data recording prior to matched filtering. The data shown above was recorded from object #21, a brass cylinder with diameter 12.66 mm.
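The time-of-flight feature just described can be sketched in the same style. The relative threshold of half the front-reflection strength and the 1 µsec guard interval after the front reflection are assumptions, as the thesis does not state the values used.

```python
import numpy as np


def time_to_back_edge_sec(recording, pulse, fs=30e6, rel_threshold=0.5):
    """Cylinder structural feature of section 6.3.2.3 (sketch): the time from
    the front reflection to the first matched-filter sample after it whose
    absolute value exceeds the threshold."""
    env = np.abs(np.convolve(recording, pulse[::-1], mode='same'))
    i_front = int(np.argmax(env))
    guard = int(1e-6 * fs)                       # skip the front reflection itself
    tail = env[i_front + guard:]
    offset = int(np.argmax(tail > rel_threshold * env[i_front]))
    return (offset + guard) / fs                 # seconds after the front reflection
```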
As previously mentioned in section 6.3.1.1, the acoustical recordings from cylindrical objects can include considerable complexity. The recording shown above in figure 6-11 is typical of this class. Through knowledge of the cylinder thickness and the speed of sound in brass, it can be determined that the reflection located at approximately 305 µsec is the first longitudinal wave reflection from the back surface of the cylinder. For all cylinders used, the first relatively strong reflection is the first longitudinal wave reflection from the back wall. Indeed, barring reflections from internal structure, the first relatively strong reflection following the front wall reflection will always be the first longitudinal wave reflection from the back wall, as longitudinal wave velocities are greater than both shear and surface wave velocities. Several other features may be identified in the recording shown in figure 6-11. The reflection at 310 µsec is the result of the first shear wave reflection from the cylinder's back surface. The smaller reflection located just after that reflection is the second longitudinal wave reflection from the back surface of the cylinder. The larger reflection starting at approximately 317 µsec is due to surface waves. Finally, the other features present are a complicated mixture of higher order reflections of the types just mentioned, creeping waves, and reflections from the internal structure of the brass.

6.3.3 Frequency domain-based classification features

Examination of the frequency content of the backscattered acoustical signals indicated that physical processes manifested themselves identifiably in the frequency domain. For example, resonant frequencies of the pipe wall appeared as troughs in the frequency content of the return. This phenomenon occurs because energy at the pipe's resonant frequencies rings easily in the pipe walls. This energy therefore stays trapped within the pipe, which results in a deficit in the amount of energy at the resonant frequencies that is returned to the transducer. While this information could be used manually for pipe wall thickness determination for some pipes, other frequency content effects of unidentified physical origin often interfered with this process. Therefore, it was not possible to exploit frequency content information for uses such as automatic wall thickness determination for thin-shelled pipes. Although these frequency-content effects of unknown source confounded attempts to determine pipe shell thickness from the frequency domain, the effects themselves were consistent and showed a low degree of variation, especially within regions of the frequency spectrum. Matching the frequency content of specific regions of an unknown target against templates individualized for each known object proved highly valuable in the identification task. To make the object frequency content templates more general, the templates were based upon the transfer function of the object. Recall that for an LTI system, the output is the convolution of the input with the system's impulse response; in the frequency domain, this corresponds to the output spectrum being the product of the input spectrum and the system's frequency response. Luckily, it has been assumed that the AIS and the experimental system are LTI. Further, the transmitted acoustical pulse is known, and the backscattered acoustical signal can be measured.
Therefore, an estimate of the frequency response of the system can be generated by dividing the received frequency spectrum by the frequency spectrum of the transmitted acoustical pulse. While this estimated frequency response includes the effects of transmission through water, it also contains information about the target. For the experimental set-up, the impact of the transmission through water on the frequency content was essentially unchanged over all recordings. Therefore, the impact of water on the transfer function could be ignored. However, if it is later found that alterations in water temperature, pulse transmission distance, and water particulate content substantially degrade the target transfer function estimate, corrections can be made for these effects. For frequency response calculations, 500 data points, or 16.667 µsec of recorded data, were used. It is important to remember that by using such a large range of data, the acoustical energy that is recorded may have passed through differing "systems". For example, within 17 µsec for a thick-shelled object, reflections from the front edge of the front pipe wall and reflections from the back edge of the front pipe wall will both be recorded. The backscattered energy resulting from each of these interfaces will have experienced different transfer functions. Therefore, references to a single transfer function estimate are fast and loose with terminology, but they avoid the complication of having to continuously refer to the average frequency response encountered by the acoustical pulse for the data recorded over some period of time. In all cases, the frequency response estimate of a target was calculated in the same manner. The power spectral density of the signal was calculated using Welch's averaged periodogram approach. In this approach, the 500 data points were broken up into twenty-one evenly spaced overlapping sections of length 100 (this corresponds to an overlap of 80 data points). Each of the sections was then filtered with a Hanning window to reduce the frequency content impact of the truncation process. Each section was zero-padded to length 1024, and the magnitude-squared Discrete Fourier Transform (DFT) of each section was computed using the Fast Fourier Transform (FFT) algorithm. Following the FFT computation, the twenty-one regions were averaged to produce the estimate of the frequency content of the signal. The data was then converted into decibels (dB) by taking the base-10 logarithm and multiplying the result by ten. Following this conversion, the transfer function estimate in dB was calculated by subtracting the acoustical pulse frequency content in dB from the received signal frequency content in dB. Figure 6-12 shows a typical complete transfer function estimate for an object. Also marked within the figure are the regions of interest that are used as the transfer function template for the object.

[Figure: transfer function estimate in dB over 0 - 6 MHz, with two template regions marked.]

Figure 6-12: A typical complete transfer function estimate. The data shown above is derived from data taken for object #25, a brass plate with thickness 6.35 mm, height 225 mm, and width 125 mm. The recordings were made at the center of the plate. Also shown are the two regions of the transfer function estimate that are used as templates during transfer function estimate similarity determination. They are denoted by the numbers one and two in the stars and span the entire region between the dotted lines, as indicated by the two-headed arrows with which they are associated. These regions were chosen for two reasons: 1) they were not generally present for all objects, and 2) they were very consistent in form across all of the training data for object #25.
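With SciPy's `welch` routine, the estimate described above can be sketched directly. The segment length, overlap, window, and FFT size follow the text; the 30 MHz rate and the zero-padding of a short pulse record out to one full segment are assumptions.

```python
import numpy as np
from scipy.signal import welch


def transfer_function_db(received, pulse, fs=30e6):
    """Transfer function estimate of section 6.3.3 (sketch): Welch PSDs from
    length-100 Hanning-windowed segments with 80-sample overlap, zero-padded
    to 1024 points, then differenced in dB."""
    kw = dict(fs=fs, window='hann', nperseg=100, noverlap=80, nfft=1024)
    freqs, p_rx = welch(received[:500], **kw)          # 21 averaged segments
    tx = np.pad(pulse, (0, max(0, 100 - pulse.size)))  # ensure one full segment
    _, p_tx = welch(tx, **kw)
    return freqs, 10 * np.log10(p_rx) - 10 * np.log10(p_tx)
```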
While the method used to compute the transfer function estimate of a target was the same in all cases, the data used to calculate this estimate differed by object class. It was found that the frequency content of the received signal contained varying amounts of information at different times depending upon the class of the object. The following sections describe the different ways in which frequency-content information was used for each class of objects.

6.3.3.1 Thick-shelled objects

For thick-shelled objects, data in the range of 20 samples prior to the start of the first reflection to 479 data points after the start of the first reflection were used to create the frequency response estimate. This data includes the first reflection and ringing from the back edge of the front wall of the pipe.

6.3.3.2 Thin-shelled objects

For thin-shelled objects, data in the range of 20 samples prior to the start of the first reflection to 479 data points after the start of the first reflection were used to create the frequency response estimate. This data includes the first reflection and ringing from the back edge of the front wall of the pipe. For pipes with an inner diameter beneath approximately 25 mm, this range will also generally include at least a portion of the acoustical backscatter caused by the back wall of the pipe.

6.3.3.3 Cylindrical objects

For cylindrical objects, data in the range of 20 samples prior to the start of the first reflection from the back edge of the cylinder to 479 data points after the start of this back edge reflection were used to create the frequency response estimate. While excluding the first reflection overall (the front wall reflection), this data includes the first reflection from the back edge of the cylinder. As mentioned previously, the backscattered acoustical energy for the solid cylinders can include a great deal of complexity. Therefore, acoustical backscatter originating from shear wave reflections, creeping waves, and surface waves may be included. Finally, this range also generally includes the second reflection from the back edge of the cylinder if the data recording has been carried out over a sufficiently long period of time.

6.3.4 Summary of target classification criteria

Depending upon the class of an object, between two and four criteria are used in the creation of a template to characterize the acoustical returns expected from that object. Table 6-4 summarizes the classification criteria used for templates of the different object classes. Further, an equation-based explanation of each of the nine criteria listed below is included in appendix E.
Thick-shelled objects:
1) Material determination
2) Pipe shell thickness estimate
3) Pipe inner diameter estimate
4) "Transfer function" estimate from data around the first reflection

Thin-shelled objects:
1) Signal strength ratios for time ranges shortly after the first reflection
2) Length of time from the first reflection to the peak strength of the pipe back wall reflections
3) "Transfer function" estimate from data around the first reflection

Cylindrical objects:
1) Length of time from the first cylinder reflection to the first cylinder back edge reflection
2) "Transfer function" estimate from data around the first cylinder back edge reflection

Table 6-4: Summary of the criteria used to determine the similarity between a target and an object of each class.

6.4 Object recognition prototype presentation

The object recognition prototype consists of two stages: 1) model building from training data and 2) classification from testing data. As stated previously, thirty recordings were taken for each object. These recording sets were broken up into training and testing subsets, each containing fifteen recordings. From the training data, a class-specific template for each object was created. In this template is stored the average score of each member of the object's training set on each of its criteria. The standard deviation of the scores over the training set for each criterion is also recorded. Additionally, contained in each template is a listing of each frequency region that was determined to be of classification interest in the transfer function estimate. Finally, the average transfer function value as determined from the training data is stored in the template. The transfer function region listings and average region values are contained in the template so that a general function can be used to perform the frequency domain template matching instead of requiring the development of a new function for each object. After template formation has been concluded for each of the objects, all of the targets in each object's testing set are classified. As stated previously in section 6.3.1.1, the thick-shelled pipe material determination algorithm was found to perform at essentially 100% accuracy for the training data. Additionally, it was found to classify 100% of the cylindrical targets as having an unknown material for the training data. Over the training data, the results of the algorithm were highly variable for the thin-shelled objects. Therefore, to reduce the amount of computation required, the first step in the classification algorithm is the performance of the thick-shelled pipe material determination algorithm. If the algorithm produces a material classification, the thick-shelled and thin-shelled object templates are checked. If the algorithm decides that the target's material is unknown, the thin-shelled and cylindrical object templates are checked. To check a template against a target, the classification algorithm performs each of the classification criteria for the object template's class on the target data. The results of each of the tests are then compared with the object mean and standard deviation stored in the template to compute the number of standard deviations by which the target data differs from the template. The average number of standard deviations by which the target data's test results differ from the template's means over all of the criteria is then that target's score for the template. After the target has been compared against each template, a classification is made based upon which template produced the lowest score for the target.
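The template scoring and closed-problem-space decision rule just described can be sketched as follows; the dictionary-based template layout is an assumption made for illustration.

```python
import numpy as np


def template_score(criteria_values, template):
    """Average number of standard deviations by which a target's criterion
    results differ from a template's training means (section 6.4)."""
    devs = [abs(criteria_values[name] - mean) / std
            for name, (mean, std) in template.items()]
    return float(np.mean(devs))


def classify_closed(criteria_values, templates):
    """Closed-problem-space decision: the lowest-scoring template wins."""
    return min(templates,
               key=lambda obj: template_score(criteria_values, templates[obj]))
```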
Shown below in figure 6-13 is a schematic illustration of the object recognition prototype.

[Figure: flow diagram. Training: for a particular object, perform the appropriate classification criteria, as described in section 6.3, on every member of the object's training set; calculate the mean score and the standard deviation of the scores for each criterion and store these values in the object template; repeat until this has been completed for every object. Testing: for a particular object, select a particular element of its testing set; apply the thick-shelled object class's material determination algorithm to the recording; if the algorithm returns a material, perform all thin-shelled object and thick-shelled object tests on the recording, otherwise perform all thin-shelled object and cylindrical object tests; score the test results against each template by computing the average number of standard deviations by which the recording differs from the template mean over all criteria; classify the recording according to which template received the lowest score; repeat until each member of every object's testing set has been classified.]

Figure 6-13: Flow diagram of the object recognition prototype program.

6.5 Object recognition prototype performance

The performance of the object recognition prototype is presented for four cases: 1) closed problem space with highly controlled target/transducer orientation and no noise added, 2) closed problem space with highly controlled target/transducer orientation and Gaussian noise added, 3) closed problem space with loosely controlled target/transducer orientation and no additive noise, and 4) open problem space with highly controlled target/transducer orientation and no additive noise. The terms highly and loosely controlled transducer/target orientations refer to the data sets discussed at the end of section 6.2. For the highly controlled orientation data set, the vertical position of the transducer centerline was varied within ± 1.5 mm of the centerline of the pipe. For the loosely controlled data set, this variation reached ± 12 mm. In neither of the data sets was the horizontal position of the transducer controlled beyond assuring that a specular reflection was received. For each of these cases, the accuracy of the system will be presented. To further illustrate the discriminatory power of the prototype, confusion matrices will be presented for some of the cases.

6.5.1 OR performance: closed problem space, highly controlled orientation, no noise added

When the orientation of the transducer and the target was closely controlled, the system's performance was quite good. The prototype achieved 97.78% classification accuracy, incorrectly classifying only 13 out of 585 targets. Shown below in table 6-5 is the confusion matrix for this case.

[Table: confusion matrix over the thirty-nine objects.]

Table 6-5: Confusion matrix for the object recognition prototype output. The row headings correspond to the actual identity of a target.
The column headings correspond to the classification that a target received. The numbers that appear in a specific cell at row i, column j in the table correspond to how many targets of identity i received classification j.

6.5.2 OR performance: closed problem space, highly controlled orientation, Gaussian noise added

The prototype behaved as expected when its data was subjected to additive Gaussian noise: performance degraded as more noise was added. Note that even the lowest amount of noise added to the data, Gaussian with zero mean and 0.01 standard deviation, is greater than the noise inherent in the system. Table 6-6 below presents the accuracy of the system under various levels of additive noise. All noise is Gaussian with zero mean. To increase the noise level, the standard deviation of the Gaussian distribution was increased.

Standard deviation of added noise   0.00     0.01     0.02     0.05
Accuracy                            97.78%   95.73%   84.62%   44.44%

Table 6-6: Performance statistics for the object recognition prototype at varying levels of added Gaussian noise.

Confusion matrices were also prepared for the two lowest noise levels. Tables 6-7 and 6-8 present this data. When misclassifications did occur, the incorrect classification that the prototype produced was skewed towards a small number of objects. Specifically, most of the misclassifications erroneously identified the target as either object #11, 18, 23, or 24. This phenomenon is largely attributable to the large standard deviations assigned to the templates for these objects during the training process. The criteria scores that a target received were often much closer to the mean values stored in the template for the actual object than to the mean values stored in the template for the misclassification. Because the standard deviations in the latter, however, were much greater than those in the former, the target was incorrectly identified. This source of mistakes could possibly be lessened by weighting the scores of a template match by a factor related to the size of the standard deviation.

[Table: confusion matrix over the thirty-nine objects.]

Table 6-7: Confusion matrix for the object recognition prototype output in which Gaussian noise with a standard deviation of 0.01 has been added to the data. The row headings correspond to the actual identity of a target. The column headings correspond to the classification that a target received. The numbers that appear in a specific cell at row i, column j in the table correspond to how many targets of identity i received classification j.

[Table: confusion matrix over the thirty-nine objects.]

Table 6-8: Confusion matrix for the object recognition prototype output in which Gaussian noise with a standard deviation of 0.02 has been added to the data. The row headings correspond to the actual identity of a target. The column headings correspond to the classification that a target received. The numbers that appear in a specific cell at row i, column j in the table correspond to how many targets of identity i received classification j.
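The noise experiment amounts to re-running classification on perturbed recordings. A sketch follows, reusing the decision rule above; `extract_criteria` stands in for the class-specific feature extraction and is hypothetical.

```python
import numpy as np


def accuracy_with_noise(recordings, labels, templates, sigma,
                        extract_criteria, seed=0):
    """Classification accuracy with zero-mean Gaussian noise of standard
    deviation `sigma` added to each recording (section 6.5.2, table 6-6)."""
    rng = np.random.default_rng(seed)
    correct = 0
    for rec, truth in zip(recordings, labels):
        noisy = rec + rng.normal(0.0, sigma, size=rec.shape)
        if classify_closed(extract_criteria(noisy), templates) == truth:
            correct += 1
    return correct / len(recordings)
```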
6.5.3 OR performance: closed problem space, loosely controlled orientation, no noise added

The prototype system performed better than expected on the data set with the loosely controlled orientation. The classification accuracy was 90.76%, a mere seven percentage points below that of the tightly controlled data set. Shown in table 6-9 is the confusion matrix for the classification results derived from the data set with the loosely controlled orientation. This confusion matrix again exhibits a clustering of erroneously used classification labels, with targets often inappropriately declared to be either object #13, 15, or 16. Further, while 54 of 585 targets were mislabeled, 31 of these misclassifications occurred for data originating from either object #29, 32, or 39.

[Table: confusion matrix over the thirty-nine objects.]

Table 6-9: Confusion matrix for the object recognition prototype output when recognition is performed on the loosely controlled orientation data. The row headings correspond to the actual identity of a target. The column headings correspond to the classification that a target received. The numbers that appear in a specific cell at row i, column j in the table correspond to how many targets of identity i received classification j.

6.5.4 OR performance: open problem space, highly controlled orientation, no noise added

Up to this point, the object recognition task has only been dealt with in the context of a closed problem space. That is to say, every target that is presented to the OR system is a member of a finite set of objects. For each of these objects, a template has been generated during the system's training phase. The classification task is thus reduced to selecting the template that best fits the target data. In the real world, however, object recognition must be performed in an open problem space. There is no finite set of which every encountered target must be a member, and templates cannot be built to represent every possible target. Therefore, it is essential to divide the world of objects into two groups: those objects that should be exactly identifiable (known objects) and those objects that should be identified simply as not being members of the former group (unknown objects). For a mine classification application using the AIS, this division is straightforward: the world of objects is divided into mines and not mines. Once the problem space of targets has been divided into known and unknown objects, data should be collected on all of the known objects and classification templates formed. At this stage, however, a complication in the classification process arises. When attempting to classify a target, it is no longer enough to simply determine which template best matches the target data. Under an open problem space, it must also be decided whether the best matching template matches the target data "closely enough", a frustratingly subjective criterion.
A main advantage of the scoring system used in the prototype object recognition system is that it somewhat reduces the degree of this subjectivity. The scoring system assigns each template match a score based upon the average number of standard deviations by which the target data differs from the template's means on class-specific criteria. Therefore, the same subjective "close enough" test can be applied to the top scoring template match regardless of the object to which the template corresponds. Further, because only one criterion must be adjusted, tuning the system to achieve the desired false alarm/false negative ratio is greatly simplified.

[Figure: two histograms of template matching score (standard deviations), titled "Histogram of the top scoring template matches" and "Partial histogram of the second best scoring template matches".]

Figure 6-14: Histograms of the best and second best scoring template matches from an execution of the object recognition algorithm with a closed problem space, highly controlled transducer/target orientation, and no additive noise. The data displayed above were produced from an execution of the object recognition prototype in which 573 of the 585 classifications were correct. Therefore, the top scoring template match data displayed in the top plot is a good estimate of the template match scores that each target's true identity received. The data in the bottom plot, representing the second best template match score for each classification, is only a partial histogram because the scores actually stretch out to approximately 50. To better showcase the degree of overlap between the two plots at low levels, however, the higher-level data points were omitted.

Figure 6-14 above displays histograms of the top scoring template match and the second best scoring template match for an execution of the closed problem space OR prototype similar to that discussed in section 6.5.1. In this execution, the transducer/target orientation was highly controlled for the data collection, and no noise was added to the data prior to its use by the prototype. This particular execution correctly identified 573 out of the 585 targets classified. Therefore, the top scoring template match data shown in figure 6-14 is essentially equivalent to the scoring data for the template that corresponds to each target's true identity. As can be seen, there is a small degree of overlap between the top scoring and the second best scoring templates. For the most part, however, the data indicates that even the best scoring template for a false match performs substantially worse than the worst scoring template for a true match. This suggests that object recognition can be performed in an open problem space without significantly degrading performance. To test the ability of the object recognition prototype to perform object recognition in an open problem space, roughly a third of the thirty-nine objects that have been used in the closed prototype OR assessment so far were randomly selected and removed from the list of known objects. Two additional objects, a solid pine cylinder of diameter 37.31 mm (#40) and a pine plank of thickness 19.32 mm and width 88.68 mm (#41), were added to the third of the objects that were randomly selected to form the unknown object list. Templates were formed for all of the known objects, and then classifications were performed on all of the objects, both known and unknown. A decision was made to err in favor of false alarms instead of false negatives. Therefore, if the top scoring template had a score below the quite generous level of 4.0 standard deviations, a classification was made. If the top score exceeded this threshold, however, the object was labeled as unknown.
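The open-problem-space rule adds only a threshold on the winning score. A sketch, reusing the earlier scoring functions; the 4.0 standard deviation default follows the text.

```python
def classify_open(criteria_values, templates, threshold=4.0):
    """Open-problem-space decision of section 6.5.4 (sketch): accept the best
    template match only if its score falls below the threshold; otherwise
    declare the target unknown."""
    best = classify_closed(criteria_values, templates)
    if template_score(criteria_values, templates[best]) < threshold:
        return best
    return 'unknown'
```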
Following this procedure, a classification accuracy of 92.2% was achieved (567 out of 615). Table 6-10 shows the confusion matrix for the open problem space object recognition prototype execution. This confusion matrix provides a good deal of insight into the overlap in the template matching scores between the known and the unknown objects. As can be seen, accuracy on the known objects was 94.29% and accuracy on the unknown objects was 87.69%. For the threshold of 4.0 standard deviations that was used, the probability of a false alarm for an unknown object was therefore 12.31%. Further, of the twenty-four unknown targets that were assigned classifications, twenty-three had a true identity of either object #24 or object #6. Finally, the probability of a missed detection (a known object being classified as unknown) was 1.67%.

[Table 6-10 (a confusion matrix with known object columns plus an Unk column) appears here; its individual entries are not reproduced.]

Table 6-10: Confusion matrix for the object recognition prototype output with an open problem space. The row headings correspond to the actual identity of a target. The column headings correspond to the classification that a target received. The numbers that appear in a specific cell at row i, column j in the table correspond to how many targets of identity i received classification j. Note that the heading Unk represents unknown objects.

6.6 Conclusion

The object recognition prototype system presented in this chapter demonstrates the wealth of identification information that can be extracted from acoustical backscatter in the low megahertz range. For the data gathered with the experimental system, the prototype was able to achieve accuracy approaching 100%. While the data presented in section 6.2 suggested that at least the frequency content of the backscattered signal may be highly dependent upon target/transducer orientation, the OR prototype produced highly accurate results when using a data set in which this orientation was not tightly controlled. This indicates that one of the main obstacles to implementing an object recognition system using the AIS, ensuring consistency in the viewpoint from which data is gathered, may not be as much of a hurdle as previously suspected. Further, as demonstrated in section 6.5.4, the template scoring showed a good deal of separation between the template corresponding to a target's identity and the other templates. Because of this relatively high degree of separation among the objects in the identification scoring space, implementing a reliable open problem space system for real-world mine identification applications seems plausible.

7 Object recognition implementation suggestions

7.1 Introduction

This chapter briefly presents suggestions for the implementation of an object recognition system using the AIS. In section 7.2, sample images taken using a precursor of the AIS are presented. Section 7.3 then discusses methods by which the presence of a target in an image may be noted and how a region of interest for that target may be extracted.
In section 7.4, the use of an ensemble of time series originating from many detector elements to perform more robust object recognition is explored. Incorporation of image-level classification features into an object recognition algorithm is then dealt with in section 7.5. Finally, section 7.6 investigates how classification features could be given weights that are individualized for each template, thereby allowing those features with higher discriminatory power for a particular object to be emphasized.

7.2 Sample images

As mentioned previously, technical difficulties precluded working with the AIS during this thesis. Recently, however, it has become possible to acquire crude pictures using a full array on a precursor of the AIS. While examining these images, it is important to keep two points in mind. First, only the depth and magnitude of the strongest return value at a particular array location can currently be recorded. Second, many system-level imaging issues remain to be resolved, such as how to reduce acoustical lens reflectivity and how to determine the proper settings for the AIM. As these matters are resolved, image quality will certainly improve. Despite these caveats, inspection of sample images still provides clues as to what an image from the AIS will be like. Shown in figure 7-1 are two such sample images.

[Figure 7-1 appears here. Top panel: image of object #8, a copper pipe with outer diameter 53.98 mm, thickness 1.02 mm, and approx. length 225 mm. Bottom panel: image of object #33, an aluminum pipe with outer diameter 76.23 mm, thickness 6.03 mm, and approx. length 400 mm. Horizontal axis: array elements.]

Figure 7-1: Sample images taken from a precursor of the Acoustical Imaging System (AIS). To allow the figure to be more compact while not distorting the aspect ratio of the images, both images show only the data from the top half of the detector array, since there was no data located in the bottom halves. The top image is of object #8, a copper pipe with outer diameter 53.98 mm, thickness 1.02 mm, and length of approximately 225 mm. The bottom image is of object #33, an aluminum pipe with outer diameter 76.23 mm, thickness 6.03 mm, and length of approximately 400 mm. The images shown represent the maximum reflection intensity encountered at each array element, regardless of the depth of that maximum. Lighter shades of gray represent more intense reflections. In both cases, image data was acquired for approximately one second (ten images at the data acquisition rate used), and the first recorded image is shown. There was a slight degree of background speckle that varied from image to image; the target images, however, were very consistent. Note that a median filter has been applied to the images shown to eliminate some of the speckle.

In both images, there is a central bright spot. Further, the targets appear to stretch further horizontally than they do vertically. Also, the long axes of the targets appear to have a slight clockwise rotation with respect to horizontal. Finally, both images contain elliptical scatter patterns that are centered about the bright spot. Because the central bright spot and the elliptical scatter patterns are both likely results of the transducer beam used, these features are consistent with the way that the pipes were presented to the AIS. The apparent lengths of the pipes, however, are not consistent with their actual lengths.
Both pipes were located approximately a meter from the AIS and were centered horizontally and slightly offset vertically with respect to the imager. The field of view of the AIS at this range is approximately 250 mm by 250 mm; each pixel therefore represents an approximately 2 mm by 2 mm area. The length of the copper pipe (top image) is thus such that it should just about fill the complete horizontal field of view, and the length of the aluminum pipe is such that it should easily span the image horizontally. Neither of the pipes, however, appears to stretch from edge to edge of the image. Instead, the copper and aluminum pipes appear to stretch approximately 80 mm and 120 mm, respectively. Further, examination of the region outside the central bright spot shows the aluminum pipe to be approximately ten pixels (or 20 mm) in diameter and the copper pipe a bit thinner. Because a surface patch must be oriented roughly normal to the imager's line of sight to return energy to the detector, this diminished apparent diameter was expected.

It is not known why the apparent lengths of the pipes are shorter than the actual lengths. The region of insonification had a diameter of approximately 360 mm at the range of the pipes, so the insonified region covered the entire field of view. It is hypothesized that the apparent length of the aluminum pipe is greater than that of the copper pipe due to the greater outer diameter of the aluminum pipe; aluminum's material properties may also be a factor. Further, it is hypothesized that the greater actual length of the aluminum pipe is not a factor in this phenomenon.

7.3 Image segmentation

In real-world object recognition tasks, an important first step is referred to as image segmentation. The system must determine whether there is a target that should be classified in the current image. If so, the spatial extent of that target must be identified. In high signal-to-noise environments, this task is fairly straightforward: a specular reflection can be identified by looking for data that exceed a threshold. To identify the extent of the target, all of the contiguous pixels that contain a reflection with strength greater than some other (likely lower) threshold could then be grouped; a sketch of this two-threshold approach is given at the end of this section.

In lower signal-to-noise environments, the image segmentation task becomes more complicated. Because noise levels may rise toward the level of even the stronger specular reflections, it is not sufficient to naively apply a threshold to all of the data to search for the presence of a target. As can clearly be seen in figure 7-1, however, the images contain a great deal of order. It is therefore suggested that a detection metric which takes into account the connectivity of the high intensity pixels be used to determine whether or not a target is present. Such an approach could be as simple as requiring that the signal strength in some prescribed region be above a threshold, as done by Tuovila, Nelson, and Smith [13]. A more complicated detection method that incorporates knowledge of the noise environment and the targets to be identified could also be developed. Czerwinski, Jones, and O'Brien present an example line and boundary detection system for two-dimensional speckle images and discuss the tradeoffs involved extensively in [14].
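The following Matlab fragment is a minimal sketch of the two-threshold segmentation approach described above for the high signal-to-noise case. It is illustrative only: the function name and the thresholds DetectThresh and ExtentThresh are assumptions, and Image is taken to be a matrix of maximum reflection magnitudes like those displayed in figure 7-1.

% Sketch of two-threshold image segmentation (all names are illustrative).
% A target is declared present if any pixel exceeds DetectThresh; its
% extent is the set of contiguous pixels exceeding ExtentThresh.
function [Found, Mask] = SegmentTarget(Image, DetectThresh, ExtentThresh)
Found = any(Image(:) > DetectThresh);
Mask = zeros(size(Image));
if ~Found
    return;
end
Candidate = Image > ExtentThresh;
[r, c] = find(Image == max(Image(:)));      % seed at the strongest reflection
Stack = [r(1) c(1)];
while ~isempty(Stack)                       % grow the region of contiguous pixels
    p = Stack(1,:);
    Stack(1,:) = [];
    inb = p(1) >= 1 & p(1) <= size(Image,1) & p(2) >= 1 & p(2) <= size(Image,2);
    if inb
        if ~Mask(p(1),p(2)) & Candidate(p(1),p(2))
            Mask(p(1),p(2)) = 1;            % add the pixel to the target region
            Stack = [Stack; p(1)-1 p(2); p(1)+1 p(2); p(1) p(2)-1; p(1) p(2)+1];
        end
    end
end

A detection metric for the low signal-to-noise case could then be built on top of this grouping, for example by requiring that the summed signal strength within Mask exceed a threshold.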
7.4 Use of an ensemble of time series to improve object recognition

Following target detection, the region of interest extracted from the image (assuming one was found) will undergo classification. Due to limitations in the nature of the data that could be acquired, the work in this thesis has focused on a prototype classification algorithm that makes use of only a single acoustical time series. Many such time series, however, make up an image. Using only a single time series therefore neglects a great deal of the information available. Further, the prototype made no use of such image-level characteristics as shapes and relative orientations. This section will suggest approaches that may allow multiple time series to be used together to provide increased discriminatory power over a single time series. The next section will briefly discuss the incorporation of image-level characteristics.

The first step in the use of multiple time series for target classification is to precisely determine the degree to which the signals acquired by the AIS are viewpoint dependent. This issue was first presented in section 6.2. In that section, it was shown that the data acquired by the experimental system was dependent upon the viewpoint, particularly with regard to vertical translations of the transducer with respect to the target. The reasons for this viewpoint dependence, however, could not be precisely determined: neither viewpoint dependent alterations in the transfer function encountered by the acoustical signal nor spatial variations in the transmitted acoustical pulse could be ruled out. Therefore, it will be necessary to perform a similar set of experiments with the AIS to determine the degree to which the data it acquires is viewpoint dependent.

If it is found that the data from the AIS exhibits little or no viewpoint dependence, then all pixels in the region of interest should be good candidates for use in time series object recognition. If, on the other hand, it is found that the data from the AIS exhibits viewpoint dependence similar to that shown by the experimental system, then only a subset of the pixels in the region of interest should be used. The subset of array elements whose time series data is used should exhibit two properties: 1) the viewpoint of all of the elements is similar in nature, and 2) the subset can be reliably located and extracted each time a target is encountered. For the pipes and cylinders used in this thesis, such a subset could be easily extracted. As discussed in section 6.2, due to the symmetry of the pipes and cylinders, the viewpoint dependence of the received signal is also symmetric about the target's long axis. By first identifying the directional orientation of the object's long axis and then searching for this point of symmetry, a line of array elements could be identified from which to take the time series data.

Once a set of array elements has been identified for use in time series object recognition, the time series data from these elements must be combined in some way to produce a classification. Two relatively simple approaches seem to merit exploration; both are sketched below. In the first, the data from each of the elements is combined to form an aggregate signal. This task may be accomplished using an algorithm as simple as aligning the start of each signal and then averaging the returns at every point. The object recognition algorithm would then be performed on the aggregate signal. The second approach would first run the object recognition algorithm on each of the elements' time series. The results of these classifications would then be pooled to form a final classification. This pooling could take place using a simple voting algorithm in which the known object receiving a plurality of the votes is selected. Alternately, a more complicated algorithm that takes into account the similarity between objects could be used. Under this scheme, a similarity score between each time series and the templates would be computed; the similarities for each template would then be summed over all time series, and the template found to be most similar selected. The exact classification method used should depend upon the amount of computation required and the performance achieved by the various methods.
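The following Matlab fragment gives minimal sketches of the two combining approaches just described. It is illustrative only: Signals is assumed to be a cell array holding the time series from the selected elements, StartLoc(k) the alignment point (for example, the first reflection) in the k-th series, ObjectList the known object numbers, and ClassifySeries a stand-in for the single time series classifier of chapter 6.

% Sketch of the first approach: align the signals and average the returns
% (all names are illustrative).
Len = 480;                                  % number of samples kept after alignment
Aggregate = zeros(1, Len);
for k = 1:length(Signals)
    s = Signals{k};
    Aggregate = Aggregate + s(StartLoc(k):StartLoc(k)+Len-1);
end
Aggregate = Aggregate / length(Signals);
AggClass = ClassifySeries(Aggregate);       % classify the aggregate signal

% Sketch of the second approach: classify each series, then pool the
% results with a simple plurality voting algorithm.
for k = 1:length(Signals)
    Votes(k) = ClassifySeries(Signals{k});
end
for k = 1:length(ObjectList)
    Counts(k) = sum(Votes == ObjectList(k));
end
Winner = ObjectList(find(Counts == max(Counts)));
Classification = Winner(1);                 % plurality wins; ties broken arbitrarily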
This pooling could 109 take place using a simple voting algorithm in which the known object receiving a plurality is selected. Alternately, a more complicated algorithm that takes into account the similarity between objects could be used. Under this more complicated scheme, a similarity score between each time series and the templates would be computed. The similarities for each template over all time series would then be summed, and the template that was found to be most similar selected. The exact classification method used should depend upon the amount of computation required and the performance achieved by the various methods. 7.5 Image-level object recognition While to a human observer interpretation of the types of images shown in figure 7-1 is very difficult, it should be possible to extract meaning from such images and to use this meaning in the classification process. For simple objects like those shown in figure 7-1, such features as apparent target diameter and length can be extracted. Target shape is another important feature that should be exploited. For more complicated targets that are composed of interconnected parts, it should also be possible to get a sense not only of the shape of each part, but the relative orientation of the parts. A precise statement of the types of features that can be extracted at the image level and the usefulness of those features cannot be made until more experimentation has been performed. While difficulties in using this type of data, such as how to deal with the inherent viewpoint dependence in an image, are certain to be encountered, addition of image-level information to time series classification algorithms seems promising. 7.6 Weighting of classification features While the performance of the classification algorithm developed in chapter 6 was shown to be quite good in section 6.5, it is believed that this performance could be improved even more by the addition of another step to the training process. As described in section 6.4, for each known object a template is created during the training process that contains the mean score and standard deviation of scores for class specific criteria as computed on the training data. At present, each of the template's criteria is given equal weight during 110 the classification process. The criteria, however, have individual discriminatory power that differs from object to object. By applying template specific weights to each of the criteria, the overall classification performance of the system should be able to be improved. This weighting could be accomplished in one of two ways. First, a weighting function could be developed that assigns weights based upon the discriminatory power of each criterion. Second, after the criteria have been computed, a multidimensional optimization algorithm could be used to compute the weights that would result in the fewest incorrect responses. Both approaches would require that the weights for each criterion in a template be normalized so that the sum of the weights applied equals the same value for all templates. Further, for both approaches, if sufficient data is available, the training data should be split into two subsets. The first subset should then be used for template creation and the second subset for criterion weight determination. 7.7 Acoustical Imaging System (AIS) design suggestions Previously, the surface detection algorithm was carried out by specialized hardware in the Image Interface Board. 
7.7 Acoustical Imaging System (AIS) design suggestions

Under the original design, the surface detection algorithm was carried out by specialized hardware in the Image Interface Board, and only the plane and magnitude of the surface at each point in the detector array were passed back to the DSP Image Processing Board. In chapter 4, an implementation of the surface detection algorithm was developed that could perform the algorithm under worst case conditions using less than half of the computational and communications power available from the DSP chips. Because the surface detection algorithm can be managed by the digital signal processing chips of the DSP Image Processing Board from both a computation and a communication standpoint, these highly versatile programmable chips can now be given access to vast amounts of data. Current plans for the AIS, however, call for presenting only the magnitude of the acoustical return at each of eighty planes to the DSP Image Processing Board. This approach would severely limit the object recognition power that could be achieved by the AIS. Therefore, it is suggested that the capability be added to the AIS for the DSP Image Processing Board to request selected portions of the data from the Image Interface Board. In this way, a background task on a DSP could identify a region of interest in the magnitude data for an acoustical image. It could then request that, for every detection element in the region of interest, all 320 of the samples that were used to generate the 80 magnitude values be shipped back to it. This approach would allow the communications and memory requirements of an object recognition algorithm to be kept in check while at the same time maintaining a high degree of discriminatory power.

Finally, it is also recommended that if object recognition is to be attempted using the AIS, the use of shorter duration acoustical pulses be explored. Shorter pulses will allow for finer-grained determination of structural features, as more objects will be able to be treated as thick-shelled. Further, because a shorter pulse duration translates into a higher bandwidth transmission, more frequency spectrum-based information will be available.

7.8 Conclusion

This chapter has briefly provided suggestions for the implementation of an object recognition algorithm in the AIS. Crude images taken from a precursor to the AIS are shown in section 7.2. Section 7.3 then discusses how regions of interest that may represent known objects can be located within such images. Following this image segmentation, detector array elements must be selected and their time series analyzed; time series classification using data from an ensemble of array elements is discussed in section 7.4. Image-level data will also be available from the AIS, and the incorporation of this type of data into a classification algorithm is the subject of section 7.5. Section 7.6 explores methods by which the classification criteria may be weighted to emphasize those criteria with more discriminatory power.

Because much work remains to be done on the AIS, these suggestions could not always be concrete. This state of the AIS, however, afforded the additional opportunity to suggest simple design changes that could increase the object recognition power of the system. In section 7.7, it is suggested that the DSP Image Processing Board be given the ability to request additional data from the Image Interface Board; this data would contain all of the samples within a specified region of detector elements and could then be used by an object recognition algorithm. Further, section 7.7 recommends exploration of the use of shorter acoustical pulses than currently planned for the AIS.
It is hoped that the suggestions in this chapter may serve as a starting point for the development of a more capable object recognition system for the AIS that builds upon the work in this thesis.

8 Conclusion

This thesis deals with a subset of the data analysis tasks that could be attacked by the Acoustical Imaging System. After presenting a brief overview of the system hardware and software, two main tasks are addressed: surface detection and object recognition.

Chapters 2 and 3 briefly present the hardware and the software components of the Acoustical Imaging System. These chapters are intended to familiarize the reader with the system being developed at Lockheed Martin, as well as to serve as background for various implementation issues that are discussed later.

It is shown in chapter 4 that the surface detection algorithm may be reliably performed by software running on the DSP Image Processing Board. The chapter starts by presenting a brief justification for the algorithm selected. It then presents the preprocessing necessary to prepare the acoustical return data for the surface detection algorithm. The algorithm itself is then presented in depth, followed by a proof of its correctness. Finally, the computational and communications burdens imposed by the surface detection algorithm are analyzed under a few weak assumptions and shown to consume less than half of the processing power of the DSP Image Processing Board.

Chapters 5 and 6 show that, on an experimental system that gathered information similar to that available to the AIS, highly reliable object recognition could be performed on a set of test objects consisting of solid cylinders, pipes, and plates of varying materials. These objects are relatively simple; their shapes, however, were shown to be representative of sea mines currently in use. While they lack the internal structure of many mines, this omission is both a blessing and a curse. Internal structure will make the acoustical returns from mines more complicated, but an intelligent object recognition algorithm that makes use of this structure, as well as its position in the image, could extract a great deal of meaning from it. Further, the similarity among the test objects used created a relatively challenging object recognition environment, despite the simplicity of the individual targets. Therefore, it is firmly believed that with time and effort, an effective sea mine object recognition system could be developed for the Acoustical Imaging System. Further, it is believed that this system could run as a fully automated background task in the spare processing cycles left open by the surface detection algorithm.

Finally, implementation suggestions for an object recognition system on the AIS are presented in chapter 7. To begin the discussion, sample images from a precursor of the AIS are presented. Segmentation of such images to extract the regions of interest that will be used in object recognition is then touched upon. Next, two simple ways in which the use of many time series may improve object recognition over a single time series are presented. Additionally, incorporation of image-level features into the object recognition process and object-specific weighting of classification features are mentioned. Finally, simple issues in the design of the AIS that may increase the object recognition power of the system are addressed. The issues raised in chapter 7 are but a few of the ways in which the work in this thesis could be built upon.
Appendix A. List of Acronyms

2DUS      Two-dimensional Ultrasound
3D        Three-dimensional
3DUS      Three-dimensional Ultrasound
A/D       Analog-to-Digital Converter
AIS       Acoustical Imaging System
AIM       Acoustical Imaging Module
D/A       Digital-to-Analog Converter
DAQ DSP   Data Acquisition Digital Signal Processor
DFT       Discrete Fourier Transform
DIS DSP   Video Display Digital Signal Processor
DMA       Direct Memory Access
DSP       Digital Signal Processor
FFT       Fast Fourier Transform (an algorithm to perform the DFT)
FIFO      First-In First-Out Queue
IIB       Image Interface Board
LCD       Liquid Crystal Display
LMIRIS    Lockheed Martin IR Imaging Systems
LTI       Linear Time Invariant
MSB       Most Significant Bit
NOP       Assembly language instruction indicating that no operation is to be performed on the current clock cycle
OR        Object Recognition
PFP       Potential First Peak
PKD DSP   Peak Detection Digital Signal Processor
PSD       Power Spectral Density
PVC       Polyvinylchloride
ROC       Receiver Operating Curve
ROIC      Read-Out Integrated Circuit
SHARC     Super Harvard Architecture
SNR       Signal-to-Noise Ratio
THA       Transducer Hybrid Assembly
TRIC      Transmit-receive Integrated Circuit

B. Source code of prototype peak detection algorithm

The following is the source code for the prototype peak detection algorithm developed. It was written using Microsoft Visual C++, version 1.5. Note that no image-level coordination function is included. Because the computational burden of calling the frame-level processing algorithm approximately ten times per image is insignificant compared to the cost of running the frame-level processing algorithm itself, it is felt that this omission is justified.

//
// File: Peak051.c - test version 0.5.1 of the peak detection algorithm.
//
// Author: Daniel C. Letzler
// Started: July 9, 1998
// Last Updated: July 21, 1998
//
// Sonoelectronics Project
// Lockheed Martin IR Imaging Systems - Lexington, MA
//

// The header files for the precompiled code that I will want to make use of.
#include <stdio.h>
#include <stdlib.h>

// Some definitions for use during the body of the code.
#define MAX 1
#define MIN -1
#define NEITHER 0
#define SIGTHRESH 400
#define NOISETHRESH 200
#define ARRAYSIZE 8    // How many pixels from the detection array will be handled by a DSP.
#define PLANES 5       // How many total planes of data do we have.
#define FRAMESIZE 5    // How many planes of US data are there to a frame.
#define FRAMES 1       // How many frames of data are there.
#define NORETURN 160   // Value used by system when there was no plane at a pixel.

// The "Potential First Peaks" data structure.
typedef struct {
    char AboveThresh;
    char PeakFound;
    unsigned char plane;
    int mag;
} PFP;

// Global variables - keep it simple, don't worry about elegance.
// The G's are appended on the beginning because these are global
// variables and I use their names elsewhere without the G's.
int GFrameArray[ARRAYSIZE];   // The current entries to pass in.
PFP GPFPArray[ARRAYSIZE];     // The Potential First Peaks array.

//
// This function will coordinate the processing of a block of frame data.
// Tested 07/23/98
//
void ProcessFrame(int *FrameArray, int Frame, int FrameSize, int ArraySize, PFP *PFPArray)
{
    int i, ii;
    int *PixelForFrame;

    // Loop through each of the pixels that this DSP handles. For each pixel,
    // grab the proper information and process its data for the current frame.
    // The increments that advance FrameArray and PFPArray to the next pixel
    // are performed in the for header so that every path through the loop
    // body advances them exactly once.
    for (i = 0; i < ArraySize; i++, FrameArray += FrameSize, PFPArray++) {
        PixelForFrame = FrameArray;

        // Find out if the data at the pixel has been above the SigThresh
        // by the beginning of the frame.
        if (PFPArray->AboveThresh) {
            // Yes: the data has been above the threshold before the current
            // frame. Find out if we have already found a peak.
            if (PFPArray->PeakFound)
                continue;   // A peak was already found; quit processing this pixel.

            // No peak found yet, so process the pixel from the start of the frame.
            for (ii = 0; ii < FrameSize; ii++, PixelForFrame++) {
                // Is the current sample greater than our potential peak?
                if (*PixelForFrame > PFPArray->mag) {
                    // Yes; it meets all of the criteria to update the potential
                    // peak, so update the PFPArray entry.
                    PFPArray->mag = *PixelForFrame;
                    PFPArray->plane = Frame * FrameSize + ii;
                }
                // Are we below the potential peak by enough to call it a peak?
                else if ((PFPArray->mag - (*PixelForFrame & 4095)) > NOISETHRESH) {
                    // Yes! The potential peak should now be marked as a peak,
                    // and our processing of this pixel is done.
                    PFPArray->PeakFound = 1;
                    break;
                }
            }
        }
        else {
            // The data has not been above SigThresh by the beginning of the
            // frame. See if it goes over by the end (the preprocessing sets
            // the 0x8000 flag on samples still below the signal threshold).
            if (PixelForFrame[FrameSize - 1] & 0x8000)
                continue;   // Below SigThresh for the whole frame; quit processing.

            // The data passes above the threshold within this frame. Mark
            // that fact, then advance to the first sample that is above the
            // signal threshold.
            PFPArray->AboveThresh = 1;
            ii = 0;
            while (*PixelForFrame & 0x8000) {
                PixelForFrame++;
                ii++;
            }

            // We are now located at the first sample where the data is above
            // the signal threshold; ii indexes that location within the frame.
            for (; ii < FrameSize; ii++, PixelForFrame++) {
                // Is the current sample greater than our potential peak?
                if (*PixelForFrame > PFPArray->mag) {
                    // Yes, so update the PFPArray entry.
                    PFPArray->mag = *PixelForFrame;
                    PFPArray->plane = Frame * FrameSize + ii;
                }
                // Are we below the potential peak by enough to call it a peak?
                else if ((PFPArray->mag - (*PixelForFrame & 4095)) > NOISETHRESH) {
                    // Yes! Mark the potential peak as a peak and quit.
                    PFPArray->PeakFound = 1;
                    break;
                }
            }
        }
    }
}

C. Assembly code listing associated with the prototype and computational burden calculations

The assembly code generated from the prototype peak detection algorithm listed in appendix B is shown below. This assembly code was generated by the Analog Devices g21k compiler.
The compiler was provided with the command line switches -S and -O3, which instruct the compiler to generate an assembly code listing only and to perform the maximum amount of optimization, respectively.

! Analog Devices ADSP210x0
.file "peak051.c";
.segment /pm seg_pmco;

.global _ProcessFrame;
_ProcessFrame:
! FUNCTION PROLOGUE: ProcessFrame
! rtrts protocol, params in registers, DM stack, doubles are floats
modify(i7,-10);
! saving registers:
dm(-2,i6)=r3;
dm(-3,i6)=r5;
dm(-4,i6)=r7;
dm(-5,i6)=r9;
dm(-6,i6)=r11;
dm(-7,i6)=r13;
r2=i0; dm(-8,i6)=r2;
r2=i1; dm(-9,i6)=r2;
r2=i2; dm(-10,i6)=r2;
r2=i3; dm(-11,i6)=r2;
! end prologue
i2=r12;
r11=dm(1,i6);
r11=pass r11;
if le jump (pc, _L$3) (DB);
i1=dm(2,i6);
r2=r8;
r7=4095;
r3=200;
i3=i1;
r0=1;
r13=r2*r4 (ssi), modify(i3,m6);
r5=32768;
_L$28:
lcntr=r11, do _L$3-1 until lce;
r2=dm(i1,m5);
r2=pass r2;
if eq jump (pc, _L$5) (DB);
i0=i2;
nop;
r2=dm(i3,m5);
r2=pass r2;
if ne jump (pc, _L$15) (DB);
comp(r2,r4);
nop;
if ge jump (pc, _L$15) (DB);
r9=r2;
nop;
r1=pass r13, i4=i1;
modify(i4,2);
_L$14:
r2=dm(i0,m5);
r8=dm(1,i4);
comp(r2,r8);
if gt jump (pc, _L$11) (DB);
r12=r2 and r7;
r12=r8-r12;
comp(r12,r3);
if le jump (pc, _L$12) (DB);
nop;
jump (pc, _L$31) (DB);
modify(i3,4);
m4=r4;
_L$11:
dm(i4,m5)=r1;
dm(1,i4)=r2;
_L$12:
r9=r9+1;
comp(r9,r4);
if lt jump (pc, _L$14) (DB);
modify(i0,m6);
r1=r1+1;
jump (pc, _L$32) (DB);
modify(i1,4);
m4=r4;
_L$5:
r2=r4-1;
nop;
m4=r2;
r2=dm(m4,i2);
r8=r2 and r5;
if ne jump (pc, _L$15) (DB);
dm(i1,m5)=r0;
r2=dm(i2,m5);
r2=r2 and r5;
if eq jump (pc, _L$18) (DB);
r9=r8;
nop;
_L$19:
modify(i0,m6);
r2=dm(i0,m5);
r2=r2 and r5;
if ne jump (pc, _L$19) (DB);
r9=r9+1;
nop;
_L$18:
comp(r9,r4);
if ge jump (pc, _L$15) (DB);
r1=r9+r13, i4=i1;
modify(i4,2);
nop;
_L$26:
r2=dm(i0,m5);
r8=dm(1,i4);
comp(r2,r8);
if gt jump (pc, _L$23) (DB);
r12=r2 and r7;
r12=r8-r12;
comp(r12,r3);
if le jump (pc, _L$24) (DB);
nop;
modify(i3,4);
_L$31:
modify(i1,4);
jump (pc, _L$15) (DB);
dm(-1,i4)=r0;
modify(i2,m4);
_L$23:
dm(i4,m5)=r1;
dm(1,i4)=r2;
_L$24:
r9=r9+1;
comp(r9,r4);
if lt jump (pc, _L$26) (DB);
modify(i0,m6);
r1=r1+1;
_L$15:
modify(i1,4);
m4=r4;
_L$32:
modify(i3,4);
modify(i2,m4);
nop;
_L$29:
_L$3:
! FUNCTION EPILOGUE:
i12=dm(-1,i6);
r3=dm(-2,i6);
r5=dm(-3,i6);
r7=dm(-4,i6);
r9=dm(-5,i6);
r11=dm(-6,i6);
r13=dm(-7,i6);
i0=dm(-8,i6);
i1=dm(-9,i6);
i2=dm(-10,i6);
i3=dm(-11,i6);
jump (m14,i12) (DB);
i7=i6;
i6=dm(0,i6);
.endseg;

.segment /dm seg_dmda;
.global _GFrameArray;
.var _GFrameArray[7];
.global _GPFPArray;
.var _GPFPArray[32];
.endseg;

The computational burden calculations were made using the program Burd051.c, a C program written for the express purpose of calculating the computational burden of the peak detection algorithm with varying parameters. This program was shown to correctly compute the computational burden of processing various situations by checking it against manual calculations. The program groups the control strands of the peak detection algorithm based upon the execution time of the strands. Each group is then assigned a value based upon the most expensive strand in the group, as determined by examining the assembly code listing above. Therefore, the computational estimates are upper bounds on the computational burden that will be incurred. The computational burden calculations performed are shown below.
OverheadNoProcessing = 18
OverheadProcessing = 30
WhileLoopCost = 6
ExaminationCost = 15

UFC = OverheadNoProcessing
PFC = OverheadProcessing
WhC = WhileLoops * WhileLoopCost
ExC = ExaminedPixels * ExaminationCost

PixelFrameCost = UFC,              if pixel skipped
               = PFC + WhC + ExC,  if pixel examined

PixelCost = sum over all frames of PixelFrameCost
ImageCost = sum over all pixels of PixelCost
ComputationalBurden = ImageCost * AcousticUpdateRate

D. Listing of test objects used

Object #   Material      Shape      Dimensions
1          PVC           Pipe       114.25 mm diameter, 5.98 mm thick, approx. 380 mm long
2          PVC           Pipe       89.10 mm diameter, 5.62 mm thick, approx. 350 mm long
3          PVC           Pipe       60.42 mm diameter, 4.21 mm thick, approx. 350 mm long
4          PVC           Pipe       48.12 mm diameter, 3.77 mm thick, approx. 275 mm long
5          PVC           Pipe       42.17 mm diameter, 3.64 mm thick, approx. 300 mm long
6          PVC           Pipe       33.40 mm diameter, 3.44 mm thick, approx. 325 mm long
7          PVC           Pipe       26.93 mm diameter, 2.95 mm thick, approx. 350 mm long
8          Copper        Pipe       53.98 mm diameter, 1.02 mm thick, approx. 225 mm long
9          Copper        Pipe       41.10 mm diameter, 1.08 mm thick, approx. 275 mm long
10         Copper        Pipe       34.82 mm diameter, 0.95 mm thick, approx. 300 mm long
11         Copper        Pipe       28.43 mm diameter, 1.20 mm thick, approx. 275 mm long
12         Copper        Pipe       22.20 mm diameter, 0.73 mm thick, approx. 275 mm long
13         Copper        Pipe       16.04 mm diameter, 0.66 mm thick, approx. 325 mm long
14         Brass         Pipe       14.85 mm diameter, 0.50 mm thick, approx. 225 mm long
15         Brass         Pipe       15.96 mm diameter, 0.80 mm thick, approx. 225 mm long
16         Brass         Pipe       15.86 mm diameter, 1.62 mm thick, approx. 275 mm long
17         Brass         Pipe       9.51 mm diameter, 1.75 mm thick, approx. 325 mm long
18         Brass         Pipe       6.33 mm diameter, 2.12 mm thick, approx. 375 mm long
19         Brass         Pipe       35.06 mm diameter, 1.71 mm thick, approx. 210 mm long
20         Brass         Pipe       63.43 mm diameter, 3.18 mm thick, approx. 375 mm long
21         Brass         Cylinder   12.66 mm diameter, approx. 260 mm long
22         Brass         Cylinder   14.26 mm diameter, approx. 250 mm long
23         Brass         Cylinder   25.43 mm diameter, approx. 450 mm long
24         Brass         Cylinder   50.81 mm diameter, approx. 200 mm long
25         Brass         Plate      6.35 mm thick, approx. 125 mm wide, approx. 225 mm long
26         Aluminum      Pipe       22.16 mm diameter, 1.60 mm thick, approx. 275 mm long
27         Aluminum      Pipe       31.71 mm diameter, 2.05 mm thick, approx. 250 mm long
28         Aluminum      Pipe       34.86 mm diameter, 1.42 mm thick, approx. 325 mm long
29         Aluminum      Pipe       38.06 mm diameter, 6.43 mm thick, approx. 550 mm long
30         Aluminum      Pipe       41.26 mm diameter, 0.86 mm thick, approx. 125 mm long
31         Aluminum      Pipe       50.80 mm diameter, 6.40 mm thick, approx. 325 mm long
32         Aluminum      Pipe       76.19 mm diameter, 24.78 mm thick, approx. 160 mm long
33         Aluminum      Pipe       76.23 mm diameter, 6.03 mm thick, approx. 400 mm long
34         Aluminum      Pipe       76.31 mm diameter, 12.75 mm thick, approx. 475 mm long
35         Aluminum      Cylinder   50.77 mm diameter, approx. 270 mm long
36         Aluminum      Cylinder   76.11 mm diameter, approx. 250 mm long
37         Aluminum      Plate      6.50 mm thick, approx. 150 mm wide, approx. 250 mm long
38         Steel - 303   Cylinder   25.60 mm diameter, approx. 425 mm long
39         Steel - 316   Cylinder   25.51 mm diameter, approx. 240 mm long
40         Wood - Pine   Cylinder   37.31 mm diameter, approx. 390 mm long
41         Wood - Pine   Plank      19.32 mm thick, 88.68 mm wide, approx. 410 mm long

E. Equation-based statement of each of the discriminatory tests used

Listing of operators: As a convention, series will appear in bold and single values will appear normal.
Further, an operation that is performed on a series, unless it is defined as operating on a series and returning a specific value, will perform the operation on every member of the series. So, for example, x + 2 would add 2 to every element in the series x.

conv - performs convolution (example: conv(x,y))
max - finds the maximum value in a series (example: max([1 5 3]) = 5)
abs - returns the absolute value of a value (example: abs(-1) = 1)
find - returns the index of a value in a list (example: find([1 5 3] = 5) = 2)
() - index into a series (example: y(2) = the 2nd value in the series y)
: - indicates that the elements between the first and the last indices shown are selected from a series (example: y(start:stop) selects every element of y with an index in the interval spanned by start and stop, inclusive)
sum - sums the elements of a series (example: sum([1 5 3]) = 9)
^ - exponentiate (example: y^2 squares every element in y)
psd - calculates the power spectral density in the manner described in section 6.3.3 for the input series (example: [Pxx, F] = psd(y(start:stop)) computes the power spectral density of the data in series y with indices in the interval spanned by start and stop, inclusive; Pxx and F are series of the same length - in F is stored every frequency for which a power spectral density value was computed, and in the same location in Pxx is stored the PSD value that corresponds to that frequency)

Background calculations:
USEC = the number of samples in 1 μsec
Input acoustic recording = x
Matched filter = m
y = conv(m,x);
yabs = abs(y);
ymax = max(yabs);
yfirst = find(yabs = ymax);
BEFORE = 20;
AFTER = 479;

Thick-shelled object classification criteria:

Material determination
start = yfirst + 1.4*USEC;
stop = yfirst + 6*USEC;
g = yabs(start:stop);
gmax = max(g);
ysecond = start + find(g = gmax);
start = ysecond + (ysecond - yfirst) - 1*USEC;
stop = ysecond + (ysecond - yfirst) + 1*USEC;
g = yabs(start:stop);
gmax = max(g);
ythird = start + find(g = gmax);
secondval = yabs(ysecond);
thirdval = yabs(ythird);
ratio = thirdval/secondval;
Then just look up what material is indicated by that ratio!

Pipe shell thickness estimate
c = speed of sound in the material upon which the material determination algorithm decided;
samples = ysecond - yfirst;
time = samples/USEC;
pipe shell thickness = time*c/2;

Pipe inner diameter estimate
start = yfirst + 15*USEC;
stop = the end of the data series;
g = yabs(start:stop);
gmax = max(g);
maxloc = start + find(g = gmax);
start = maxloc - samples - 1*USEC;
stop = maxloc - samples + 1*USEC;
g = yabs(start:stop);
gmax = max(g);
checkloc = start + find(g = gmax);
maxval = yabs(maxloc);
checkval = yabs(checkloc);
if (maxval/checkval is a valid ratio for the present material)
    then we are in a plate and we should quit
else
    we are not in a plate and we have found the front edge of the back wall - keep going
cwater = speed of sound in water;
innersamples = maxloc - ysecond;
time = innersamples/USEC;
inner diameter = time*cwater/2;

"Transfer function" estimate about the first reflection
APxx = PSD of the acoustic pulse;
start = yfirst - BEFORE;
stop = yfirst + AFTER;
h = x(start:stop);
[Pxx, F] = psd(h);
Pxx = Pxx - APxx;
Let TemPxx be a region of a PSD stored in a template for an object.
For every template, do the following:
    TotSSQ = 0;
    For every region of a PSD stored in the current template, do the following:
        startFreq = lowest frequency in this region for the template;
        stopFreq = highest frequency in this region for the template;
        start = find(F = startFreq);
        stop = find(F = stopFreq);
        maxTemp = max(TemPxx);
        maxDat = max(Pxx(start:stop));
        Pxx = Pxx + maxTemp - maxDat;
        Error = Pxx(start:stop) - TemPxx;
        SqError = Error^2;
        SSQ = sum(SqError);
        TotSSQ = TotSSQ + SSQ;
    End of the tasks to be completed for every region of a PSD stored in a template.
    "Transfer function" estimate criterion score for the current template = TotSSQ;
End of tasks to be completed for every template.

Thin-shelled object classification criteria:

Signal strength ratios
ysquared = y^2;
start = yfirst + 2*USEC;
stop = yfirst + 3*USEC;
SSQ1 = sum(ysquared(start:stop));
start = stop;
stop = yfirst + 4*USEC;
SSQ2 = sum(ysquared(start:stop));
start = stop;
stop = yfirst + 5*USEC;
SSQ3 = sum(ysquared(start:stop));
start = stop;
stop = yfirst + 6*USEC;
SSQ4 = sum(ysquared(start:stop));
Ratio1 = SSQ1/SSQ2;
Ratio2 = SSQ2/SSQ4;
Ratio3 = SSQ3/SSQ4;
Use these three ratios as the classification criteria!

Length of time from the first reflection to the strongest reflection from the back wall
start = yfirst + 3*USEC;
stop = end of data;
g = yabs(start:stop);
gmax = max(g);
maxloc = start + find(g = gmax);
time = (maxloc - yfirst)/USEC;

"Transfer function" estimate about the first reflection
This criterion is the same as for thick-shelled objects.

Cylindrical object classification criteria:

Length of time from first reflection to reflection from the back edge of the cylinder
start = yfirst + 4*USEC;
stop = end of data;
g = yabs(start:stop);
gmax = max(g);
maxloc = start + find(g = gmax);
time = (maxloc - yfirst)/USEC;

"Transfer function" estimate about the first reflection from the back edge of the cylinder
This criterion is the same as for thick-shelled objects except that the first six lines should be replaced with the following:
APxx = PSD of the acoustic pulse;
start = yfirst + 4*USEC;
stop = end of data;
g = yabs(start:stop);
gmax = max(g);
maxloc = start + find(g = gmax);
start = maxloc - BEFORE;
stop = maxloc + AFTER;
h = x(start:stop);
[Pxx, F] = psd(h);
Pxx = Pxx - APxx;

F. Matlab code for the object recognition prototype

Below are the five highest-level functions used during the OR prototype construction and evaluation. Lower-level functions have been omitted in the interest of saving space, and because mathematical and pseudocode explanations of all the classification criteria have been provided in appendix E. The code is written in Matlab, version 5.

EvaluateOR: The function that coordinates the model building and the prototype evaluation.

function Templates = EvaluateOR()

CylinderObjectList = [21 22 23 24 35 36 38 39];
SmallShellObjectList = [8 9 10 11 12 13 14 15 16 17 18 19 26 27 28 30];
LargeShellObjectList = [1 2 3 4 5 6 7 20 25 29 31 32 33 34 37];
ObjectList = [CylinderObjectList SmallShellObjectList LargeShellObjectList];

% Randomly select the recordings to use for training and then assign
% those unselected to be used for testing.
TrainingArray = [];
TestingArray = [];
NumToTry = 15;
for j = 1:NumToTry
    Try = ceil(30*rand);
    ii = 1;
    while (ii <= length(TrainingArray))
        if TrainingArray(ii) == Try
            ii = 1;
            Try = ceil(30*rand);
        else
            ii = ii + 1;
        end
    end
    TrainingArray = [TrainingArray Try];
end
TrainingArray = sort(TrainingArray);
for i = 1:30
    loc = find(TrainingArray==i);
    if isempty(loc)
        TestingArray = [TestingArray i];
    end
end

disp(['Forming object templates from the training data...']);
TCyl = CreateCylinderTemplates(CylinderObjectList, TrainingArray);
TSma = CreateSmallShellTemplates(SmallShellObjectList, TrainingArray);
TLar = CreateLargeShellTemplates(LargeShellObjectList, TrainingArray);

Templates = ClassifyObjects(TCyl, TSma, TLar, TestingArray);

NumRight = 0;
NumTotal = 0;
for i = 1:length(Templates)
    for ii = 1:length(Templates(i).Classification)
        if Templates(i).Classification(ii) == Templates(i).Number
            NumRight = NumRight + 1;
        end
        NumTotal = NumTotal + 1;
    end
end
Accuracy = (NumRight/NumTotal)*100;
disp(' ');
disp(['OR classification accuracy = ' num2str(Accuracy) '%']);
disp(['NumRight = ' num2str(NumRight) '; NumTotal = ' num2str(NumTotal)]);
disp(' ');
d = cd;
TrainingArray = TestingArray = NumToTry = 15; for j = 1:NumToTry Try = ceil(30*rand); ii = 1; while (ii <= length(TrainingArray)) if TrainingArray(ii) == Try ii = 1; Try = ceil(30*rand); else ii = ii + 1; end end TrainingArray = [TrainingArray Try); end TrainingArray = sort(TrainingArray); for i = 1:30 loc = find(TrainingArray==i); if isempty(loc) TestingArray = [TestingArray i]; end end disp(['Forming object templates from the training data...']); TCyl = CreateCylinderTemplates (CylinderObjectList, TrainingArray); TSma = CreateSmallShellTemplates (SmallShellObjectList, TrainingArray); TLar = CreateLargeShellTemplates (LargeShellObjectList, TraiiingArray); Templates = ClassifyObjects(TCyl, TSma, TLar, TestingArray); NumRight = 0; NumTotal = 0; for i = 1:length(Templates) for ii = 1:length(Templates(i) .Classification) if Templates(i) .Classification(ii) == Templates(i) .Number NumRight = NumRight + 1; end NumTotal = NumTotal + 1; end end Accuracy = (NumRight/NumTotal)*100; disp(' '); disp(['OR classification accuracy = disp(['NumRight = disp(' '); ' ' num2str(Accuracy) '%'); num2str(NumRight) '; NumTotal = d = cd; 130 ' num2str(NumTotal)]); cd c:\Dan\ORProto; FPrigtResultsOf0R(Templates, 'Results.dat'); cd(d); CreateSmallShellTemplates: The function that coordinates the building of the thin-shelled object templates. function Templates = CreateSmallShellTemplates(SmallShellObjectArray, TrainingArray); NumObjs = length(SmallShellObjectArray); DataPts = length(TrainingArray); % What regions were found to be important for the PSD % template matching? These regions were manually typed in here after % finding the important regions with the program FindPSDTemplatePieceCombos. % In future, could automate. PSDpieces(l).ObjectNum = 8; PSDpieces(l).Piece(l).p = [1.5e6 2.0e6]; PSDpieces(l).Piece(2).p = [4.0e6 4.6e6]; PSDpieces(2) PSDpieces(2) PSDpieces(2) PSDpieces(2) .ObjectNum = 9; .Piece(l).p = [1.4e6 2.0e6]; .Piece(2).p = [4.0e6 4.6e6]; .Piece(3).p = [5.le6 5.8e6]; PSDpieces(3) .ObjectNum = 10; PSDpieces (3) .Piece(l).p = [1.5e6 2.0e6]; PSDpieces (3) .Piece(2).p = [5.2e6 6.0e6]; PSDpieces(4) PSDpieces(4) PSDpieces(4) PSDpieces(4) PSDpieces(4) .ObjectNum 11; PSDpieces(5) PSDpieces(5) PSDpieces(5) PSDpieces(5) .ObjectNum = 12; = .Piece(l).p .Piece(2).p .Piece(3).p .Piece(4).p [O.1e6 [1.2e6 = [4.0e6 = [5.2e6 = = 0.5e6]; 2.9e6]; 5.0e6]; 5.8e6]; .Piece(l).p = [2.2e6 3.0e6]; .Piece(2).p = [3.0e6 3.7e6]; .Piece(2).p = (5.0e6 6.0e6]; PSDpieces(6).ObjectNum = 13; PSDpieces(6).Piece(l).p = [3.8e6 4.2e6]; PSDpieces(7).ObjectNum = 14; PSDpieces(7).Piece(l).p = [3.5e6 4.2e6]; PSDpieces(8).ObjectNum = 15; PSDpieces(8).Piece(l).p = [2.0e6 2.8e6]; PSDpieces(9).ObjectNum = 16; PSDpieces(9).Piece(l).p = [1.0e6 2.6e6]; PSDpieces(9).Piece(2).p = [5.0e6 5.6e6]; PSDpieces(10).ObjectNum = 17; PSDpieces(10).Piece(l).p = [0.5e6 1.1e6]; PSDpieces(10).Piece(2).p = [1.le6 1.7e6]; PSDpieces(ll).ObjectNum = 18; PSDpieces(ll).Piece(l).p = [0.5e6 2.0e6]; PSDpieces(ll).Piece(2).p = [3.1e6 4.2e6]; PSDpieces(12).ObjectNum = 19; PSDpieces(12).Piece(l).p = [0.7e6 1.2e6]; PSDpieces(13).ObjectNum = 26; PSDpieces(13).Piece(l).p = [1.3e6 2.0e6]; PSDpieces(14).ObjectNum = 27; PSDpieces(14).Piece(l).p = [0.7e6 1.5e6]; PSDpieces(15).ObjectNum = 28; PSDpieces(15).Piece(l).p = [1.5e6 2.3e6]; 131 PSDpieces(16).ObjectNum = 30; PSDpieces(16).Piece(l).p = [2.5e6 3.6e6]; disp('Calculating the acoustic pulse PSD...'); [AcousticPulse, F] = AcousticPulsePSD; d = cd; cd c:\Dan\Data\052599; for i = l:NumObjs Num = SmallShellObjectArray(i); 
Templates(i).Number = Num; pind = 1; while PSDpieces(pind).ObjectNum -= Num pind = pind + 1; end for j = 1:length(PSDpieces(pind).Piece) Templates(i).PSDTemplate(j).StartFreq = PSDpieces(pind).Piece(j).p(l); Templates(i).PSDTemplate(j).StopFreq = PSDpieces(pind).Piece(j).p(2); end disp(['Calculating the transfer function estimate for object #' num2str(Num)]); [FullPSDTemplate, F] = TrialFullPSDTemplateCreator(Templates(i), TrainingArray, AcousticPulse); Templates(i) = TrialPSDTemplateCreator(F, FullPSDTemplate, Templates(i)); disp(['Creating the template for object #' num2str(Num)]); RatioMatrix = []; TimeArray = HssqArray = for ii = 1:DataPts DataNum = TrainingArray(ii); string = ['load ' num2str(Num) '_' num2str(DataNum) '.asc;']; eval(string); string = ['curr = X' num2str(Num) '_' num2str(DataNum) ';']; eval(string); Ratios = SmallShellRing4(curr); RatioMatrix = [RatioMatrix; Ratios]; Time = NextPeakLocater(curr); TimeArray = [TimeArray Time]; [H, F] = MakeEstimateOfH(curr, AcousticPulse); Hssq = PSDTemplateMatch(F, H, Templates(i).PSDTemplate); HssqArray = [HssqArray Hssq]; end Templates(i).TimeMean = mean(TimeArray); Templates(i).TimeStd = std(TimeArray); Templates(i).RatioMeanArray = mean(RatioMatrix); Templates(i).RatioStdArray = std(RatioMatrix); Templates(i).HssqMean = mean(HssqArray); Templates(i).HssqStd = std(HssqArray); end cd(d); CreateLargeShellTemplates: The function that coordinates the building of the thick-shelled object templates. function Templates = CreateLargeShellTemplates(LargeShellObjectArray, TrainingArray); PVC = 1; BRASS = 2; ALUMINUM = 3; UNKNOWN = 4; IDTHRESH = 2; % ID deviation threshold so that the data may be % median filtered prior to template formation. NumObjs = length(LargeShellobjectArray); DataPts = length(TrainingArray); % What regions were found to be important for the PSD % template matching? These regions were manually typed in here after % finding the important regions with the program FindPSDTemplatePieceCombos. 132 % In future, could automate. 
PSDpieces(1).ObjectNum = 1;
PSDpieces(1).Piece(1).p = [1.0e6 2.0e6];
PSDpieces(1).Piece(2).p = [3.0e6 4.0e6];
PSDpieces(1).Piece(3).p = [4.0e6 5.0e6];
PSDpieces(2).ObjectNum = 2;
PSDpieces(2).Piece(1).p = [0.7e6 2.1e6];
PSDpieces(2).Piece(2).p = [3.0e6 4.0e6];
PSDpieces(3).ObjectNum = 3;
PSDpieces(3).Piece(1).p = [0.5e6 2.2e6];
PSDpieces(3).Piece(2).p = [3.0e6 5.0e6];
PSDpieces(4).ObjectNum = 4;
PSDpieces(4).Piece(1).p = [2.4e6 4.0e6];
PSDpieces(4).Piece(2).p = [4.8e6 5.8e6];
PSDpieces(5).ObjectNum = 5;
PSDpieces(5).Piece(1).p = [0.5e6 2.4e6];
PSDpieces(6).ObjectNum = 6;
PSDpieces(6).Piece(1).p = [0.5e6 2.3e6];
PSDpieces(7).ObjectNum = 7;
PSDpieces(7).Piece(1).p = [0.5e6 2.3e6];
PSDpieces(8).ObjectNum = 20;
PSDpieces(8).Piece(1).p = [0.5e6 2.4e6];
PSDpieces(8).Piece(2).p = [2.6e6 4.0e6];
PSDpieces(9).ObjectNum = 25;
PSDpieces(9).Piece(1).p = [0.4e6 1.5e6];
PSDpieces(9).Piece(2).p = [1.5e6 3.0e6];
PSDpieces(10).ObjectNum = 29;
PSDpieces(10).Piece(1).p = [0.1e6 1.5e6];
PSDpieces(10).Piece(2).p = [2.5e6 3.5e6];
PSDpieces(10).Piece(3).p = [4.1e6 6.0e6];
PSDpieces(11).ObjectNum = 31;
PSDpieces(11).Piece(1).p = [0.4e6 1.8e6];
PSDpieces(11).Piece(2).p = [1.8e6 2.4e6];
PSDpieces(11).Piece(3).p = [2.4e6 3.5e6];
PSDpieces(11).Piece(4).p = [4.8e6 6.0e6];
PSDpieces(12).ObjectNum = 32;
PSDpieces(12).Piece(1).p = [3.8e6 4.7e6];
PSDpieces(12).Piece(2).p = [4.7e6 6.0e6];
PSDpieces(13).ObjectNum = 33;
PSDpieces(13).Piece(1).p = [0.5e6 2.2e6];
PSDpieces(14).ObjectNum = 34;
PSDpieces(14).Piece(1).p = [0.5e6 2.0e6];
PSDpieces(14).Piece(2).p = [2.0e6 4.0e6];
PSDpieces(14).Piece(3).p = [4.0e6 6.0e6];
PSDpieces(15).ObjectNum = 37;
PSDpieces(15).Piece(1).p = [0.5e6 1.6e6];

disp('Calculating the acoustic pulse PSD...');
[AcousticPulse, F] = AcousticPulsePSD;

d = cd;
cd c:\Dan\Data\052599;

for i = 1:NumObjs
    Num = LargeShellObjectArray(i);
    Templates(i).Number = Num;

    pind = 1;
    while PSDpieces(pind).ObjectNum ~= Num
        pind = pind + 1;
    end
    for j = 1:length(PSDpieces(pind).Piece)
        Templates(i).PSDTemplate(j).StartFreq = PSDpieces(pind).Piece(j).p(1);
        Templates(i).PSDTemplate(j).StopFreq = PSDpieces(pind).Piece(j).p(2);
    end

    disp(['Calculating the transfer function estimate for object #' num2str(Num)]);
    [FullPSDTemplate, F] = TrialFullPSDTemplateCreator(Templates(i), TrainingArray, AcousticPulse);
    Templates(i) = TrialPSDTemplateCreator(F, FullPSDTemplate, Templates(i));

    disp(['Creating the template for object #' num2str(Num)]);
    MaterialArray = [];
    WallArray = [];
    IDArray = [];
    HssqArray = [];
    for ii = 1:DataPts
        DataNum = TrainingArray(ii);
        string = ['load ' num2str(Num) '_' num2str(DataNum) '.asc;'];
        eval(string);
        string = ['curr = X' num2str(Num) '_' num2str(DataNum) ';'];
        eval(string);

        Material = DetermineMaterial(curr);
        MaterialArray = [MaterialArray Material];
        Wall = LargeShellWallThickness(curr, Material);
        ID = LargeShellID(curr, Material, Wall);
        if Material ~= UNKNOWN
            WallArray = [WallArray Wall];
            IDArray = [IDArray ID];
        end
        [H, F] = MakeEstimateOfH(curr, AcousticPulse);
        Hssq = PSDTemplateMatch(F, H, Templates(i).PSDTemplate);
        HssqArray = [HssqArray Hssq];
    end
    Templates(i).Material = median(MaterialArray);
    Templates(i).WallMean = mean(WallArray);
    Templates(i).WallStd = std(WallArray);
    % IDArray requires median filtering because very infrequently
    % there is an outlier which greatly increases the IDStd!
    IDArray = MedianFilter(IDArray, IDTHRESH);
    Templates(i).IDMean = mean(IDArray);
    Templates(i).IDStd = std(IDArray);
    Templates(i).HssqMean = mean(HssqArray);
    Templates(i).HssqStd = std(HssqArray);
end
cd(d);

CreateCylinderTemplates: The function that coordinates the building of the cylindrical object templates.

function Templates = CreateCylinderTemplates(CylinderObjectArray, TrainingArray);

CYLDIATHRESH = 0.25;    % Threshold to use while determining the time to
                        % the back wall of a cylinder.

NumObjs = length(CylinderObjectArray);
DataPts = length(TrainingArray);

% What regions were found to be important for the PSD
% template matching? These regions were manually typed in here after
% finding the important regions with the program FindPSDTemplatePieceCombos.
% In future, could automate.
PSDpieces(1).ObjectNum = 21;
PSDpieces(1).Piece(1).p = [0.5e6 1.1e6];
PSDpieces(1).Piece(2).p = [1.1e6 2.1e6];
PSDpieces(1).Piece(3).p = [2.1e6 3.8e6];
PSDpieces(1).Piece(4).p = [3.8e6 5.0e6];
PSDpieces(2).ObjectNum = 22;
PSDpieces(2).Piece(1).p = [0.8e6 1.6e6];
PSDpieces(2).Piece(2).p = [1.6e6 2.4e6];
PSDpieces(3).ObjectNum = 23;
PSDpieces(3).Piece(1).p = [0.5e6 2.5e6];
PSDpieces(4).ObjectNum = 24;
PSDpieces(4).Piece(1).p = [0.3e6 1.0e6];
PSDpieces(4).Piece(2).p = [1.0e6 4.0e6];
PSDpieces(4).Piece(3).p = [4.0e6 5.0e6];
PSDpieces(5).ObjectNum = 35;
PSDpieces(5).Piece(1).p = [1.3e6 2.1e6];
PSDpieces(5).Piece(2).p = [2.1e6 3.2e6];
PSDpieces(5).Piece(3).p = [3.2e6 4.4e6];
PSDpieces(6).ObjectNum = 36;
PSDpieces(6).Piece(1).p = [0.3e6 1.0e6];
PSDpieces(6).Piece(2).p = [1.0e6 4.0e6];
PSDpieces(6).Piece(3).p = [4.0e6 5.0e6];
PSDpieces(7).ObjectNum = 38;
PSDpieces(7).Piece(1).p = [0.5e6 1.5e6];
PSDpieces(7).Piece(2).p = [2.2e6 3.2e6];
PSDpieces(8).ObjectNum = 39;
PSDpieces(8).Piece(1).p = [2.0e6 2.7e6];
PSDpieces(8).Piece(2).p = [2.8e6 4.0e6];

disp('Calculating the acoustic pulse PSD...');
[AcousticPulse, F] = AcousticPulsePSD;

d = cd;
cd c:\Dan\Data\052599;

for i = 1:NumObjs
    Num = CylinderObjectArray(i);
    Templates(i).Number = Num;

    pind = 1;
    while PSDpieces(pind).ObjectNum ~= Num
        pind = pind + 1;
    end
    for j = 1:length(PSDpieces(pind).Piece)
        Templates(i).PSDTemplate(j).StartFreq = PSDpieces(pind).Piece(j).p(1);
        Templates(i).PSDTemplate(j).StopFreq = PSDpieces(pind).Piece(j).p(2);
    end

    disp(['Calculating the transfer function estimate for object #' num2str(Num)]);
    [FullPSDTemplate, F] = CylinderFullPSDTemplateCreator(Templates(i), TrainingArray, AcousticPulse);
    Templates(i) = TrialPSDTemplateCreator(F, FullPSDTemplate, Templates(i));

    disp(['Creating the template for object #' num2str(Num)]);
    DiaTimeArray = [];
    HssqArray = [];
    for ii = 1:DataPts
        DataNum = TrainingArray(ii);
        string = ['load ' num2str(Num) '_' num2str(DataNum) '.asc;'];
        eval(string);
        string = ['curr = X' num2str(Num) '_' num2str(DataNum) ';'];
        eval(string);

        DiaTime = CylinderThickness(curr, CYLDIATHRESH);
        DiaTimeArray = [DiaTimeArray DiaTime];
        [H, F] = CylMakeEstimateOfH(curr, AcousticPulse);
        Hssq = PSDTemplateMatch(F, H, Templates(i).PSDTemplate);
        HssqArray = [HssqArray Hssq];
    end
    Templates(i).DiaTimeMean = mean(DiaTimeArray);
    Templates(i).DiaTimeStd = std(DiaTimeArray);
    Templates(i).HssqMean = mean(HssqArray);
    Templates(i).HssqStd = std(HssqArray);
end
cd(d);

ClassifyObjects: The function that applies the templates and selects the most similar object to be the classification.
ClassifyObjects: The function that applies the templates and selects the most similar object to be the classification.

function Templates = ClassifyObjects(TCyl, TSma, TLar, TestingArray);

PVC = 1;
BRASS = 2;
ALUMINUM = 3;
UNKNOWN = 4;

CylinderObjectList = [];
for i = 1:length(TCyl)
    CylinderObjectList = [CylinderObjectList TCyl(i).Number];
end
SmallShellObjectList = [];
for i = 1:length(TSma)
    SmallShellObjectList = [SmallShellObjectList TSma(i).Number];
end
LargeShellObjectList = [];
for i = 1:length(TLar)
    LargeShellObjectList = [LargeShellObjectList TLar(i).Number];
end
ObjectList = [CylinderObjectList SmallShellObjectList LargeShellObjectList];
NumObjs = length(ObjectList);
DataPts = length(TestingArray);
%disp(ObjectList)

disp('Calculating the acoustic pulse PSD...');
[AcousticPulse, F] = AcousticPulsePSD;

d = cd;
cd c:\Dan\Data\052599;

for i = 1:NumObjs
    Num = ObjectList(i);
    disp(['Classifying the testing data for object #' num2str(Num)]);
    Templates(i).Number = Num;
    SearchSpace = [];
    for ii = 1:DataPts
        DataNum = TestingArray(ii);
        string = ['load ' num2str(Num) '_' num2str(DataNum) '.asc;'];
        eval(string);
        string = ['curr = X' num2str(Num) '_' num2str(DataNum) ';'];
        eval(string);
        SmallScores = CalcSmallScores(TSma, curr, AcousticPulse);
        Material = DetermineMaterial(curr);
        if Material == UNKNOWN
            SearchSpace = [SmallShellObjectList CylinderObjectList];
            CylinderScores = CalcCylinderScores(TCyl, curr, AcousticPulse);
            Scores = [SmallScores(1,:) CylinderScores(1,:)];
        else
            SearchSpace = [SmallShellObjectList LargeShellObjectList];
            LargeScores = CalcLargeScores(TLar, curr, Material, AcousticPulse);
            Scores = [SmallScores(1,:) LargeScores(1,:)];
        end
        % This is like golf - lowest score wins! (score represents std
        % from the template)
        MinScore = min(Scores);
        Index = find(Scores == MinScore);
        Classification = SearchSpace(Index);
        Templates(i).Classification(ii) = Classification;
        Templates(i).Scores(ii).list = Scores;
        Templates(i).SmallScores(ii).list = SmallScores(1:end,:);
        if Material == UNKNOWN
            Templates(i).CylinderScores(ii).list = CylinderScores(1:end,:);
            Templates(i).LargeScores(ii).list = [];
        else
            Templates(i).CylinderScores(ii).list = [];
            Templates(i).LargeScores(ii).list = LargeScores(1:end,:);
        end
        disp(['Data piece #' num2str(TestingArray(ii)) ': Classification = ' num2str(Classification)]);
        disp(Scores);
    end
end
cd(d);
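Because ClassifyObjects records one classification per testing trial in Templates(i).Classification, per-object accuracy can be tallied directly from its output. The following is a minimal post-processing sketch; the Results variable name follows the hypothetical driver above and is an assumption.

% Hypothetical post-processing sketch: count how often each object's
% testing trials were classified as that object.
% 'Results' is the structure array returned by ClassifyObjects.
for i = 1:length(Results)
    Correct = sum(Results(i).Classification == Results(i).Number);
    Total = length(Results(i).Classification);
    disp(['Object #' num2str(Results(i).Number) ': ' ...
          num2str(100 * Correct / Total) '% correct']);
end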
References

1. B. Kamgar-Parsi, B. Johnson, D. L. Folds, and E. O. Belcher. High-Resolution Underwater Acoustical Imaging with Lens-Based Systems. International Journal of Imaging Systems and Technology, 8:377-385, 1997.
2. D. F. Lohmeyer. Signal Processing in a Real-Time Three-Dimensional Acoustical Imaging System. Master of Engineering Thesis, Massachusetts Institute of Technology, 1998.
3. K. Erikson, et al. Imaging with an underwater acoustical camera. Proceedings, SPIE Conference on Information Systems for Navy Divers and Autonomous Underwater Vehicles, Orlando, 1999. To be published.
4. V. O. Knudsen, R. S. Alford, and J. W. Emling. Underwater ambient noise. Journal of Marine Research, 7:410-429, 1948.
5. W. W. L. Au and K. Banks. The acoustics of the snapping shrimp Synalpheus parneomeris in Kaneohe Bay. Journal of the Acoustical Society of America, 103(1):41-47, 1998.
6. ADSP-2106x SHARC User's Manual. Analog Devices, Inc., Norwood, MA, 1995.
7. R. T. Beyer. Nonlinear Acoustics. Acoustical Society of America, New York, 1997.
8. K. Erikson. Private Communication. Lockheed Martin IR Imaging Systems, Lexington, MA, 1999.
9. Jane's Underwater Warfare Systems. Edited by A. J. Watts. Jane's Information Group Inc., Alexandria, VA, 1996.
10. P. M. Morse and K. U. Ingard. Theoretical Acoustics. McGraw-Hill Book Company, New York, 1968.
11. T. F. Hueter and R. H. Bolt. Sonics: Techniques for the Use of Sound and Ultrasound in Engineering and Science. John Wiley & Sons, Inc., New York, 1955.
12. Matlab (IBM PC and Compatibles Version 5.2.0.3084) [Computer Program]. The MathWorks Inc., Natick, MA, 1998.
13. S. M. Tuovila, S. R. Nelson, and C. M. Smith. Automated Target Classification and False Target Rejection in AN/AQS-14 Sonar Images. U.S. Navy Journal of Underwater Acoustics, 47(2):895-903, April 1997.
14. R. N. Czerwinski, D. L. Jones, and W. D. O'Brien. Line and Boundary Detection in Speckle Images. IEEE Transactions on Image Processing, 7(12):1700-1714, December 1998.
15. T. F. Hueter and R. H. Bolt. Sonics. John Wiley & Sons, Inc., New York, 1955.
16. B. A. Auld. Acoustic Fields and Waves in Solids, Volume II. Krieger Publishing Company, Malabar, FL, 1990.
17. N. Yen, L. R. Dragonette, and S. K. Numrich. Time-frequency analysis of scattering from elastic objects. Journal of the Acoustical Society of America, 87(6):2359-2370, June 1990.