A Neural Network Approach to 3D Imaging by Approximating the Inverse Scattering Problem BY Jacob Michael Wilson, B.S.E.E. A thesis submitted to the Graduate School in partial fulfillment of the requirements for the degree Master of Science Major Subject: Electrical Engineering New Mexico State University Las Cruces, New Mexico October 2021 Jacob Wilson Candidate Electrical Engineering Major This thesis is approved on behalf of the faculty of New Mexico State University, and it is acceptable in quality and form for publication. Approved by the thesis committee: Dr. Steven Sandoval Chair Person Dr. Muhammed Dawood Committee Member Dr. Michael DeAntonio Committee Member ii DEDICATION I would like to dedicate this document to my family and beautiful fiancé. iii ACKNOWLEDGMENTS I would like to thank my family and fiancé for being so supportive of me. They are always there for me no matter the difficulties I may face. I would also like to thank my advisor Dr. Sandoval for going above and beyond to help me complete this document. I could not have done it without his help. Additionally, I would like to extend my thanks to Nicholas Von Wolff and the NMSU HPC team for making this work computationally possible. Finally, and most importantly, may all the glory be to God! iv VITA April 02, 1997 Born in El Paso, Texas, USA 2015-2019 B.S.E.E., New Mexico State University, Las Cruces, New Mexico. PUBLICATIONS [or Papers Presented] 1. J. M. Wilson, “A Neural Network Approach to 3D Imaging by Approximating the Inverse Scattering Problem” in preparation. FIELD OF STUDY Major Field: Electrical Engineering Area of Specialty: Electromagnetics v ABSTRACT A Neural Network Approach to 3D Imaging by Approximating the Inverse Scattering Problem BY Jacob Michael Wilson, B.S.E.E. MASTER OF SCIENCE New Mexico State University Las Cruces, New Mexico, 2021 Dr. Steven P. 
Sandoval, Chair

The goal of this work is to present a methodology for approximating the solution to an inverse electromagnetic scattering problem using a data-driven neural network approach. Such a solution can be leveraged to resolve a 3D image without resorting to the use of ionizing (hazardous) radiation, a limitation of methods such as X-ray. Additionally, the proposed methodology has the potential to resolve images of greater resolution than the wavelength of the excitation or the associated radar range bin. In this work, both 1D and 3D imaging problems were considered. The 1D simulation utilized a Convolutional Neural Network (CNN) with cross-convolutional pooling to predict the impedance distribution of a transmission line. The 3D simulation predicted the permittivity distribution of simulated synthetic targets representing functional runway slabs, faulty runway slabs, simplified human thighs, and randomly oriented cubes and spheres. To prevent overfitting and ensure appropriate generalization, the proposed method combines two different neural network architectures which are trained in a multistage process. In particular, the decoder of a Variational Autoencoder (VAE) is used to generate the output image, while a CNN estimates the physics related to the physical problem and is used to extract relevant features to infer the contents of the scene. Applications of this work potentially include biomedical imaging, cancer detection, subterranean imaging, non-destructive testing, and material permittivity characterization via a free space or transmission line approach.

CONTENTS

LIST OF TABLES xi
LIST OF FIGURES xiv
LIST OF ACRONYMS xv
1 Introduction 1
1.1 Purpose of the Presented Work 1
1.2 Overview 2
2 Background 3
2.1 Wave Scattering 3
2.1.1 Forward Scattering Problem 3
2.1.2 Inverse Scattering Problem 4
2.2 Imaging Methods 6
2.2.1 Electromagnetic Imaging 6
2.2.2 Acoustical Imaging 9
2.3 Machine Learning 10
2.3.1 Bayesian 10
2.3.2 Support Vector Machines 10
2.3.3 Neural Networks 11
2.3.4 Variational Autoencoder 17
2.3.5 Other Algorithms 20
3 Method Overview 22
4 Simulation One 24
4.1 Data Synthesis 25
4.2 A Neural Network Approach to the Problem 28
4.2.1 Initialization and Training 29
4.2.2 Initial Architecture Evaluation 30
4.2.3 Secondary Architecture Evaluation 37
4.2.4 Tertiary Architectures 41
4.2.5 Final Evaluation and Summary of Results 43
5 Simulation Two 49
5.1 Data Synthesis 49
5.1.1 Data Sets 54
5.2 Neural Network Design 60
5.2.1 Variational Autoencoder 66
5.2.2 Full Network 83
5.2.3 Summary of Results 90
6 Conclusion 99
7 Future Work 101
REFERENCES 103

LIST OF TABLES

1 Summary of the synthesized data set of simulation one. 27
2 Simulation one notation. 30
3 Initial architectures summary. 32
4 Initial architecture evaluation converged errors. 36
5 Secondary architectures summary. 37
6 Secondary architecture evaluation converged errors. 39
7 Tertiary architectures summary. 41
8 Tertiary architecture evaluation converged errors. 42
9 The log of the expected value of the converged error of the best performing architectures of simulation one. 45
10 Summary of the parameters used for simulation in openEMS. 54
11 Summary of the four simulated data sets of simulation two. 60
12 Simulation two notation. 65
13 Normalization method summary. 69
14 Summary of the layer attributes for each of the models considered in the most successful architectures. 71
15 Trial 1 summary. 71
16 Trial 2 summary. 73
17 Trial 3 summary. 74
18 Trial 4 summary. 75
19 Trial 5 summary. 75
20 Trial 6 summary. 76
21 Trial 7 summary. 77
22 Summary of the layer attributes for each of the models considered in the first three variations of full networks. 85
23 Summary of the layer attributes for each of the models considered in the final two variations of full networks. 87
24 Summary of the minimum error for each of the models considered in the final two variations of full networks. 88

LIST OF FIGURES

1 Illustration of the forward problem. 4
2 Illustration of the inverse problem. 5
3 A simple example that shows it can be difficult to determine the origination of reflected waves. 5
4 Illustration of a simple neural network consisting of two dense layers. 12
5 Illustration of a simple convolutional network consisting of two layers. 13
6 Operation of a simple cross convolutional layer. 14
7 Two possible implementations of a convolutional transpose operation. 15
8 An example of an autoencoder. 18
9 An example of a VAE. 20
10 Illustration of the forward model simulation set up. 25
11 Simulation one forward problem Ansys Circuit. 25
12 Simulation one initial architecture evaluation error for architectures without cross-convolutional layers. 34
13 Simulation one initial architecture evaluation error for architectures with cross-convolutional layers. 35
14 Simulation one secondary architecture evaluation error for architectures without cross-convolutional layers. 39
15 Simulation one secondary architecture evaluation error for architectures with cross-convolutional layers. 40
16 Simulation one tertiary architecture evaluation error. 42
17 Simulation one expected value of the best performing architectures error. 45
18 An example output of the network (‘cCDDv1’). 46
19 An example output of the network (‘cCDDv1’). 47
20 An example output of the network (‘cCDDv1’). 48
21 Possible transmitter and receiver configurations. 51
22 Chosen transmitter/receiver SIMO configuration. 53
23 A sample point from the Runway-Functional data set. 55
24 A sample point from the Runway-Faulty data set. 56
25 A sample point from the Human Leg data set. 58
26 A sample point from the random shapes data set. 59
27 An illustration of a CNN that will be used in conjunction with the decoder of a VAE to create the proposed full network. 62
28 An illustration of the VAE architecture utilized. 63
29 An illustration of a full network. 64
30 An example of a “flat” output from the VAE. 68
31 Error of the top performing VAE architectures. 72
32 A sample output from VAEv3. 79
33 A sample output from VAEv3. 80
34 A sample output from VAEv13. 81
35 A sample output from VAEv13. 82
36 Training and validation error of FNv1. 85
37 Training and validation error of FNv2. 86
38 Training and validation error of FNv3. 87
39 Training and validation error of FNv4. 88
40 Training and validation error of FNv5. 89
41 An example output of network FNv3 on the Human Leg data set. 94
42 An example output of network FNv1 on the Random Shapes data set. 95
43 An example output of network FNv3. 96
44 An example output of network FNv3. 97
45 An example output of network FNv4. 98

LIST OF ACRONYMS

Acronym  Meaning
1D       One Dimensional
2D       Two Dimensional
3D       Three Dimensional
ADAM     ADaptive Moment Estimation
CNN      Convolutional Neural Network
CSV      Comma Separated Value
CT       Computed Tomography
DOT      Diffuse Optical Tomography
GAN      Generative Adversarial Network
GPR      Ground Penetrating Radar
GPU      Graphics Processing Unit
GUI      Graphical User Interface
HPC      High Performance Computer
IQ       In-phase Quadrature
KL       Kullback–Leibler
MAE      Mean Absolute Error
MIMO     Multiple Input Multiple Output
MISO     Multiple Input Single Output
ML       Machine Learning
MRI      Magnetic Resonance Imaging
NDT      Non-Destructive Testing
OMP      Orthogonal Matching Pursuit
PINN     Physics Inspired Neural Network
PML      Perfectly Matched Layer
ReLU     Rectified Linear Unit
SIMO     Single Input Multiple Output
SISO     Single Input Single Output
SVM      Support Vector Machine
VAE      Variational Autoencoder

1 Introduction

Biomedical and subterranean imaging have found great importance in modern society. From industrial Non-Destructive Testing (NDT) [1] to locating cancerous growths within a person [2], these types of imaging have had a profound impact on our lives. Biomedical and subterranean imaging covers a very broad area of study. Arguably the most successful biomedical techniques are X-ray [3, 4] and MRI [5]. Both are able to produce high resolution images, but both have drawbacks. X-ray uses potentially harmful radiation [3] and MRI requires a very strong synthetic magnetic field [5]. Possibly the most successful subterranean imaging technique is Ground Penetrating Radar (GPR). GPR is capable of locating changes in media underground, but struggles to produce easy-to-interpret results [6]. While many methods exist to perform both biomedical and subterranean imaging, no method currently exists that can perform imaging regardless of frequency or excitation. Existing methods leverage particular aspects of their respective problems to generate images, even though the underlying physics remains the same.
These are all examples of the inverse scattering problem; if the inverse scattering problem can be solved, or a solution accurately approximated, then a general type of imaging can be implemented.

1.1 Purpose of the Presented Work

The goal of this thesis is to provide a methodology that can be utilized for 3D subterranean and biomedical imaging by approximating a solution to the inverse scattering problem. The proposed method will approximate the location of objects as well as their electrical properties. First, a 1D method will be developed that will approximate the impedance distribution along a transmission line from a measured reflected pulse. Studying the 1D case can provide important information to better inform the 3D imaging problem. Next, the proposed 3D imaging technique will show that it is possible to produce a system which can potentially improve upon existing radar imaging systems while simultaneously estimating the permittivity of the target.

1.2 Overview

This thesis is organized as follows. Chapter 2 provides relevant background for imaging via inverse scattering, including a brief description of existing popular methods. Chapter 3 presents an overview of the two experiments that were considered in this work, which are presented in Chapters 4 and 5. In particular, Chapter 4 presents the 1D case termed “simulation one”, while Chapter 5 presents the 3D case termed “simulation two”. Finally, Chapter 6 presents conclusions for both experiments and Chapter 7 discusses possible future work.

2 Background

2.1 Wave Scattering

This work will adopt the following terminology. The excitation is the source of energy that creates a wave. It is usually an antenna of some sort. The waves are the electromagnetic or acoustical waves that are generated due to the excitation. The waves can be categorized into the incident, reflected, and transmitted waves. The incident waves are those that are directly created by the excitation.
The reflected waves are created by the collision of the incident wave upon a boundary such that energy is rejected from the boundary. The transmitted waves are accepted by the boundary after the collision of an incident wave upon that boundary. Finally, the system is the object that is being imaged.

2.1.1 Forward Scattering Problem

The forward problem refers to a scenario where the excitation and system are known and the resulting reflected waves are unknown [7]. This is representative of many common problems in electromagnetic theory. Figure 1 shows an illustration of this type of problem.

Figure 1: Illustration of the forward problem. First, a source generates an excitation which propagates through space and strikes a target resulting in a reflected wave. In this scenario, the excitation signal and the target are known, while the reflected wave is unknown. Thus, the forward problem is that of determining the unknown reflected wave.

2.1.2 Inverse Scattering Problem

The inverse scattering problem is opposite to the forward problem, in that the excitation and resulting fields are known (possibly from measurements), but we do not know what system caused these resulting fields [7]. This type of problem is illustrated in Figure 2. Theoretically, the problem is underdetermined and there are many systems that can result in the same measurements. Intuitively, this can be understood using a simple example. In Figure 3, we can see that for an isotropic source, objects that are equidistant from the transmitter and receiver will result in the same measured reflections.

Figure 2: Illustration of the inverse problem. First, a source generates a wave which propagates through space and strikes a target resulting in a reflected wave. In this scenario, the excitation and reflected waves are known, while the target is unknown.
Thus, the inverse problem is that of determining the unknown system.

Figure 3: Illustration of a scenario with an isotropic radiator emitting a pulse to determine the unknown location of a target. This scenario shows a simple example in which it can be difficult to determine the origination of reflected waves.

2.2 Imaging Methods

Imaging via inverse scattering is the generation of an image or scene by reconstructing an unknown system from reflected wave measurements. The measurements can be taken from either mechanical waves (e.g. vibrations/sound) or electromagnetic waves. Both methods of imaging will be briefly reviewed in this document. Many methods of electromagnetic imaging are currently in use and each has its own method of approximating a solution to the inverse scattering problem. Some examples include X-ray/Computed Tomography (CT) [3, 4], Magnetic Resonance Imaging (MRI) [5], Diffuse Optical Tomography (DOT) [8], and impedance tomography [9] for biomedical applications, and Ground Penetrating Radar (GPR) [1] and impedance tomography [10] for Non-Destructive Testing (NDT) and subterranean imaging.

2.2.1 Electromagnetic Imaging

Electromagnetic imaging methods are presented first. Electromagnetic imaging refers to the imaging of a scene using an electromagnetic wave to sample the target. This section will present biomedical electromagnetic imaging methods before presenting a subterranean imaging method.

X-Ray Imaging

X-ray imaging is widely in use today and it is likely that the reader has had some form of X-ray imaging performed on them. X-ray imaging works by exciting the scene using a high frequency electromagnetic wave and measuring the resulting “shadow”. Due to the high frequency of the excitation wave, energy attenuates quickly in certain objects as compared to others [3]. X-ray imaging exploits this high attenuation characteristic of the excitation to avoid solving the inverse scattering problem.
The result of a plain X-ray image is therefore two-dimensional. However, image processing techniques can be used in conjunction with taking multiple images over a 180° arc around a scene to produce high resolution 3D images. These 3D images are known as CT [4]. While X-ray imaging is highly effective for acquiring high resolution images, it has drawbacks. For example, X-ray imaging uses electromagnetic waves at ionizing frequencies to obtain images. Unfortunately, ionizing radiation is energy that produces charged particles as it passes through an object [11] and can cause radiation damage to biological tissue [3].

Magnetic Resonance Imaging

MRI uses a different approach to imaging than X-ray. It works by first subjecting a scene to a strong magnetic field which aligns the spin of protons within the scene being imaged. An electromagnetic pulse is then used to excite the scene. The electromagnetic pulse causes a response that can be measured and interpreted into an image [5]. This method is capable of producing high resolution 3D images without the use of ionizing radiation. At the time of writing this work, a typical cost for an MRI machine is approximately $100k–$3M, which is significantly more expensive than an X-ray machine, which costs approximately $40k–$200k.

Diffuse Optical Tomography

DOT describes the use of an optical laser to illuminate an object and measure the reflected or transmitted photons to form an image. The excitations utilized are electromagnetic waves at optical frequencies. Consequently, the high frequencies associated with optical waves prevent deep penetration [8], which limits the depth of imaging.

Impedance Tomography

Impedance tomography describes the use of an electromagnetic wave with the intention of determining the impedance distribution of the unknown system. There are many methods that are used to determine an impedance distribution.
A common method used to solve for the unknown impedance distribution is to solve some approximation of Laplace’s equation

∇²Φ = 0 (1)

where Φ is the electric potential [12]. Unlike MRI or CT, impedance tomography has the ability to determine the conductivity of biological material. Impedance tomography has the additional advantage that it can estimate the actual properties of the tissue which it is imaging, and not only where a change in medium occurs. However, it also has the drawback of poor spatial resolution [13].

Ground Penetrating Radar

GPR is a technique used to map subterranean objects. It works by injecting an electromagnetic pulse into the ground and measuring the reflected response. GPR can be used to map objects underground, but it takes specialized training to interpret the images. Often there is no shielding, so energy radiates in all directions. When the returned signal is measured, the response may not have originated from directly below the imaging system [6]. Although GPR can produce subterranean images, without proper training these images can often be misinterpreted.

2.2.2 Acoustical Imaging

Next, acoustical imaging methods will be presented. Acoustical imaging is very similar to electromagnetic imaging. The type of excitation is the main difference between the two methods. Electromagnetic waves do not require a physical medium as mechanical waves do; thus there are some minor differences. Below, several types of acoustical imaging are discussed. Similar to the previous electromagnetic imaging section, a biomedical imaging method will be presented first, followed by a subterranean method.

Sonography

Sonography is the use of a high frequency sound pulse to image a scene. The method utilizes an array of sensors to transmit and receive acoustical pulses, then calculates the delay between the transmitted sound and its echo to determine where objects are located. This method can be used to create two- or three-dimensional images.
Sonography has the specific advantage that it can image a scene without the use of any ionizing radiation; however, it cannot be used to image certain objects such as the lungs. This is due to the large acoustical mismatch between the lungs and the surrounding tissue [14].

Seismic Imaging

Seismic imaging bears remarkable similarity to GPR. The main difference is the change in excitation. Seismic imaging uses a mechanical wave, generally applied by using an explosive, that is measured with some sort of transducer, usually a geophone. Measurements taken along a grid are recorded after the explosions are introduced and are used to produce images [15]. Seismic imaging produces images similar to those of GPR, but is capable of imaging at greater depths than GPR [16].

2.3 Machine Learning

Machine Learning (ML) is a data-driven [17] approach to solving a problem. For supervised learning, input examples are fed into a model and the model parameters are iteratively modified in such a way that outputs are driven to match some desired response [18, 19]. Machine learning approaches potentially hold great promise for estimating the solution to inverse scattering problems because they can estimate a solution in cases where no analytical solution has been found. Next, a few common types of machine learning will be briefly reviewed.

2.3.1 Bayesian

Bayesian methods have been used to approximate the solution to inverse scattering problems [20]. Bayesian machine learning seeks to maximize the likelihood of a certain outcome. It involves using some information that is known a priori (beforehand) to calculate which output is most likely to occur given a certain input. This method has the ability to solve problems with few training points because it forces the algorithm to be biased toward particular outputs.

2.3.2 Support Vector Machines

Support Vector Machines (SVMs) work by first projecting the data into a higher-dimensional data space.
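The effect of such a projection can be seen with a hand-built feature map. The following NumPy sketch is an illustration added here for exposition (it is not from this thesis); a real SVM performs the mapping implicitly through a kernel function, and the separating hyperplane below is chosen by hand rather than learned:

```python
import numpy as np

# XOR labels: not linearly separable in the original 2D input space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

# Explicit feature map phi(x1, x2) = (x1, x2, x1*x2): in this
# 3D space the XOR data set becomes linearly separable.
phi = np.column_stack([X, X[:, 0] * X[:, 1]])

# A separating hyperplane w.phi(x) + b, picked by inspection for this
# illustration (an SVM would find an equivalent maximum-margin one).
w, b = np.array([1.0, 1.0, -2.0]), -0.5
pred = np.sign(phi @ w + b)
print(pred)  # matches the labels y exactly
```

No plane in the original 2D space separates these four points, but one product feature suffices; kernels let an SVM exploit such mappings without ever forming `phi` explicitly.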
Projecting to a higher-dimensional space potentially allows a non-linearly separable data set in the input space to be transformed into a linearly separable data set in the higher-dimensional space [17].

2.3.3 Neural Networks

Neural networks were utilized throughout this work. For this reason, more detail will be given on neural networks than on other machine learning methods. Neural networks work by adjusting weighted connections between each neuron in a network of neurons. Typically, neural networks are trained using a backpropagation algorithm to minimize a cost function [17]. One simple type of neural network is composed entirely of layers consisting of fully-connected feed-forward neurons and is known as a dense network. Dense networks contain layers that are often referred to as dense layers and are illustrated in Figure 4. In the figure, each neuron is represented by a circular node, each line segment between two nodes represents a weighted connection, the square nodes represent inputs to the neural network, and line segments ending in arrowheads represent the output of the network. For each neuron, the output is formed by summing the result of all weighted connections entering that node, then passing that result through a nonlinear activation function. Thus, a single neuron can be modeled by the equation

y = σ(wᵀx + b) (2)

where y is the output of the neuron, σ(·) is a nonlinear function, x is the input vector, w is the weight vector, and b is a bias [17].

Figure 4: Illustration of a simple neural network consisting of two dense layers. Note that each neuron is fully connected to the previous layer.

While the use of dense layers in a neural network gives the system significant flexibility, it also complicates training of the network by introducing a large number of degrees of freedom [21]. One way to reduce the number of degrees of freedom is to utilize convolutional layers as shown in Figure 5. Like dense layers, convolutional layers are also feed-forward. However, unlike dense layers, convolutional layers are not fully connected. Rather, each neuron in a convolutional layer is connected to a small set of inputs termed the receptive field, and weights are shared among neurons. One way to interpret the shared connections in a convolutional layer is to imagine that a neuron “slides” its receptive field over the input in a way analogous to a convolution operation [22]. Convolutional Neural Networks (CNNs) significantly reduce the number of degrees of freedom, resulting in improved convergence and training time. While reducing the number of degrees of freedom results in some loss of flexibility in the network, the overall number of neurons may be adjusted to tailor the network for a particular amount of model complexity [23, 24].

Figure 5: Illustration of a simple convolutional network where different colors represent different receptive fields. Note that each neuron is only connected to a subset of the neurons in the previous layer.

One variation of a CNN architecture uses cross-convolutional layers [25, 26]. A cross-convolutional layer connects neurons which have a common receptive field. Thus, instead of simply flattening the output of a CNN layer, the cross-convolutional layer helps to preserve spatial information while at the same time reducing dimensionality. This allows for the use of more complex structures while minimizing computational costs. Figure 6 illustrates how a simple cross-convolutional layer works.

Figure 6: Operation of a simple cross convolutional layer. Cross convolutional layers help to preserve spatial information that can be lost during a flattening process [25, 26].

Similar to convolutional layers are convolutional transpose layers. Convolutional transpose layers also slide a filter over an input; however, the goal of convolutional transpose layers is to increase the output dimensionality, as opposed to decreasing it or keeping it the same [27]. Convolutional transpose layers can be trained to invert a convolutional operation [28]. An example of how a convolutional transpose layer can work is shown in Figure 7.

Figure 7: Two possible implementations of a convolutional transpose operation. Purple shading represents filter receptive fields. Grey shading represents zero padding. (a) Weights can be oriented such that a filter kernel is only a single pixel but the resulting convolution operation is many pixels. (b) Padding can be placed between and around actual image pixels such that a normal convolutional operation will result in an increase in dimensionality.

Neural Network Training

Training of neural networks refers to any algorithm that is used to obtain weights so that the network’s response is tuned to a particular data set. The convergence of training algorithms is strongly dependent on both the initial network parameters and the algorithm which is used to optimize those parameters. The most common method of training is likely the backpropagation algorithm, in which the weights are moved in the direction of the negative gradient of the cost function [18, 19]. In this work, an advanced form of training called the ADaptive Moment estimation (ADAM) algorithm is utilized for network training [29]. ADAM works by adaptively changing the step size (learning rate) for each weight in the network. The algorithm is known to have robust default parameters that need little tuning, and pseudocode for the algorithm is given in Algorithm 1.
Algorithm 1 ADAM optimizer
1: Require: α: Stepsize
2: Require: β1, β2 ∈ [0, 1): Exponential decay rates for the moment estimates
3: Require: f(θ): Stochastic objective function with parameters θ
4: Require: θ0: Initial parameter vector
5: m0 ← 0 (Initialize 1st moment vector)
6: v0 ← 0 (Initialize 2nd moment vector)
7: t ← 0 (Initialize timestep)
8: while θt not converged do
9:   t ← t + 1
10:  gt ← ∇θ ft(θt−1) (Get gradients w.r.t. stochastic objective at timestep t)
11:  mt ← β1 · mt−1 + (1 − β1) · gt (Update biased first moment estimate)
12:  vt ← β2 · vt−1 + (1 − β2) · gt² (Update biased second raw moment estimate)
13:  m̂t ← mt/(1 − β1ᵗ) (Compute bias-corrected first moment estimate)
14:  v̂t ← vt/(1 − β2ᵗ) (Compute bias-corrected second moment estimate)
15:  θt ← θt−1 − α · m̂t/(√v̂t + ε) (Update parameters)
16: end while
17: return θt (Resulting parameters)

Neural Network Cost Function

Selection of an appropriate cost function is necessary for good convergence of the network. For this work, the cost function selected for all networks except the variational autoencoder was the Mean Absolute Error (MAE)

MAE = (1/N) Σᵢ₌₁ᴺ |xᵢ − yᵢ| (3)

where N is the total number of data points in the data set, i is the current point, xᵢ is the output of the network for point i, and yᵢ is the desired output of the network for point i. For purposes of this work, the MAE was empirically found to be superior to the mean squared error loss function.

2.3.4 Variational Autoencoder

A Variational Autoencoder (VAE) is a special type of neural network that is capable of generating data. Once trained, a VAE can be used to draw data points that are similar to, but not exactly like, training points within a data set. Data generation has many applications, such as generating new data to train other neural networks [30]. VAEs are part of a class of neural networks called autoencoders.
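As a brief aside before turning to autoencoders, the ADAM update of Algorithm 1 together with an MAE objective as in Equation (3) can be sketched in a few lines of NumPy. This is an illustration written for this section, not the training code used in this work, and the one-parameter model is hypothetical:

```python
import numpy as np

# Minimal ADAM (Algorithm 1) minimizing an MAE objective for a
# hypothetical one-parameter model y_hat = theta. Illustrative only.
def adam_minimize(grad, theta0, alpha=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)                       # gradient at current parameters
        m = beta1 * m + (1 - beta1) * g       # biased 1st moment estimate
        v = beta2 * v + (1 - beta2) * g ** 2  # biased 2nd raw moment estimate
        m_hat = m / (1 - beta1 ** t)          # bias-corrected 1st moment
        v_hat = v / (1 - beta2 ** t)          # bias-corrected 2nd moment
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta

targets = np.array([2.9, 3.0, 3.1])

def grad_mae(theta):
    # derivative of (1/N) * sum(|theta - y_i|) with respect to theta
    return np.mean(np.sign(theta - targets))

theta = adam_minimize(grad_mae, theta0=0.0)
print(theta)  # settles near the MAE minimizer (the median of the targets, 3.0)
```

Note how the per-step size is roughly α·m̂/√v̂ regardless of the raw gradient magnitude; this is the adaptive step-size behavior described above.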
An autoencoder is a type of network that is trained to replicate its input by first compressing the input into a smaller latent space and then rebuilding the input. By compressing and reconstructing the data, the network first learns to encode an input into a low dimensional space. Then, it learns to decode the compressed version back into an estimate of the original input. This method of compressing and reconstructing data is useful for data compression, noise removal, and data generation [31–33]. By removing the encoder and directly giving a vector to the decoder, new data points can be generated. Note that the encoder and decoder can be composed of many different types of layers, including convolutional layers. An example of an autoencoder is shown in Figure 8.

Figure 8: An example of an autoencoder. The input data is encoded by neural network layers inside of the encoder into a latent space vector. The latent space vector is then decoded by the decoder to reconstruct the input. Once trained, the decoder can be separated from the encoder and can generate new data when given a latent space vector.

VAEs improve upon basic autoencoders by enforcing additional structure in the latent space such that points that are closely related in the input space will be closely related in the latent space. By adding this structure to the latent space, data generation is greatly simplified. This simplification occurs because particular types of data can be generated by first using the encoder to find where in the latent space certain types of data reside. Then, slightly different latent space vectors can be input into the decoder to generate similar, but not exactly the same, data points [30]. The latent space structure of VAEs is enforced by applying a random draw after the encoder and by utilizing a special type of loss function.
By applying a random draw after the encoder, the network no longer learns a mapping from input to output; instead, it learns a mapping from an input distribution to an output distribution. By learning a distribution as opposed to a single point, the structure of the latent space is improved. The loss function penalizes reconstruction error just like a normal neural network or autoencoder. However, VAEs additionally add a term that penalizes latent space vectors that are not drawn from a Gaussian distribution. The additional penalty term is known as the KL divergence [34]. KL divergence is used to estimate the difference between two distributions. The typical VAE loss function utilized is

L = R + βK    (4)

where L is the total loss, R is the reconstruction loss (often the binary cross entropy), β is a weighting parameter, and K is the KL divergence. The introduction of the β parameter allows more emphasis to be applied to the reconstruction loss or to the latent space structure; when applied, the network is known as a disentangled or β-VAE [35]. The binary cross entropy is given by

R = −y log(ŷ + ε) − (1 − y) log(1 − ŷ + ε)    (5)

where y is the input, ŷ is the output, and ε is added for numerical stability. Additionally, Figure 9 illustrates a sample VAE architecture. A full explanation of a VAE is beyond the scope of this text. For a great introduction to VAEs the reader is referred to [36] and [37]. For the original VAE, the reader is referred to [30], and finally, for a disentangled VAE, the reader is referred to [35].

Figure 9: An example of a VAE. The encoder encodes the input into a normal distribution with mean µ and variance σ. A value is then drawn from a normal distribution with mean µ and variance σ. The randomly drawn value is then decoded to reconstruct the input.
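To make the reconstruction and KL terms of equations (4) and (5) concrete, the sketch below evaluates a β-VAE loss on toy numbers, including the reparameterized random draw applied after the encoder. It is written in Python/NumPy purely for illustration (the thesis's implementation used Julia/Flux); all variable values are hypothetical, and here K is taken as the non-negative KL divergence to a standard normal so that it enters the loss with a positive weight (sign conventions vary with how K is defined).

```python
import numpy as np

def bce(y, y_hat, eps=1e-7):
    """Binary cross-entropy reconstruction loss, averaged over elements."""
    return float(np.mean(-y * np.log(y_hat + eps)
                         - (1 - y) * np.log(1 - y_hat + eps)))

def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, sigma^2) and N(0, 1), summed over latent dims."""
    return float(-0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var)))

def reparameterize(mu, log_var, rng):
    """The 'random draw': z = mu + sigma * noise, keeping the draw differentiable."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

# Toy values: a 4-dimensional latent code and an 8-pixel reconstruction.
rng = np.random.default_rng(0)
mu, log_var = np.zeros(4), np.zeros(4)            # encoder output (standard normal)
z = reparameterize(mu, log_var, rng)              # latent vector fed to the decoder
y = np.array([0., 1., 1., 0., 1., 0., 0., 1.])    # target the decoder should rebuild
y_hat = np.full(8, 0.5)                           # hypothetical decoder output
beta = 4.0                                        # weighting parameter
loss = bce(y, y_hat) + beta * kl_to_standard_normal(mu, log_var)
```

With the encoder output exactly standard normal, the KL term vanishes and the loss reduces to the reconstruction term alone, which illustrates how β trades latent-space structure against reconstruction fidelity.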
2.3.5 Other Algorithms

It is worth noting a few other computational methods that may not be considered machine learning but have been used to solve the inverse scattering problem. In [38], the researchers used a method known as conjugate gradient to solve a cost function related to the physics behind the problem. In [39], the authors used a method known as basis pursuit to solve this problem given sparsity constraints and a desire to minimize the number of sensing elements. Additionally, a method known as Orthogonal Matching Pursuit (OMP) has also been used to solve a similar problem with similar constraints. Although these algorithms are not directly machine learning algorithms, it is worth noting that there are other computational approaches to solving inverse scattering. Each works by setting up a cost function that is related to the underlying physics. The cost function does not need to (and likely will not) have an available analytical solution.

3 Method Overview

In this work, a method of imaging is proposed which differs from the previously discussed imaging methods in that it not only images the scene, but also obtains electrical properties of objects in the scene. This is similar in nature to impedance tomography; however, the proposed methodology is based on the use of radiated waves instead of electrostatic electrodes. The proposed methodology is also very similar to a conventional radar system in that it illuminates a target with an electromagnetic wave; however, the proposed method also estimates electrical properties of objects in the scene. This additional information allows the internal composition of the object to be estimated, not simply the relative position of the objects' boundary surfaces. Moreover, this additional information can aid in estimating the location and properties of objects which are typically hidden/masked when only surface reflections are considered.
Although previous attempts have been made to design an imaging system similar to the proposed method of imaging [2, 21], these attempts have fallen short, most likely because the problem is ill-posed and underdetermined [2, 21]. In this work, only an ideal environment is considered, and no part of the reflected wave is considered clutter. The entirety of the target is desired to be recovered. Specifically, it is desired to recover the profile of electrical properties (impedance and permittivity) of the target such that material boundaries and object features can be easily discerned. In the following chapters, two problems are considered. First, only a representative one-dimensional version of the problem at hand is examined. This provides a simplified foundation on which to later approach the more challenging three-dimensional problem. To this end, a transmission line analysis is utilized to represent the one-dimensional scattering problem. This allows a complicated electromagnetic problem to be represented by a simple electrical circuit [40]. The second simulation will seek to take the results from the first simulation and generalize them to three dimensions. Each simulation and the respective results will now be presented independently.

4 Simulation One

In this chapter, a method is formulated for one-dimensional imaging assuming an ideal lossless transmission line. The approach taken in this work is to design a neural network to estimate the characteristic impedance along the transmission line from an observed reflection. As will be seen, the development of such a neural network architecture will aid in the generalization to the 3D problem considered in simulation two. For simulation purposes, the system of interest is modeled as several lossless transmission lines in series, where each line has a different length and characteristic impedance. The lines are modeled in series to simulate plane waves incident upon planar boundaries [40].
This is illustrated in Figure 10. The overall goal is to completely determine the unknown characteristic impedance distribution along the length of the transmission line using measured reflections. The author sets a goal of achieving a Mean Absolute Error (MAE) of impedance prediction of ±1 Ω to roughly correspond with permittivity measurement accuracy currently achievable through transmission line measurement techniques [41]. The log of the errors will be reported; therefore, the goal is to achieve log 1 = 0 error. The remainder of this chapter is organized as follows. First, the methodology for generating the training and testing data set is presented. Second, several potential architectures are presented and their performance evaluated on the data set.

Figure 10: Illustration of the forward model simulation set up. The source will output a rectangular sinusoidal pulse. The source has an input impedance of Zsource. The darkened lines represent each lossless transmission line. The first line will always be matched to the source, but every other line will have a slight mismatch. There is an arbitrary number of lossless transmission lines. The last line will have a high attenuation before connecting to an open stub to represent no further energy return after the last line.

4.1 Data Synthesis

To utilize machine learning methods on this problem, training and testing data must first be acquired or simulated. Simulated data for the forward problem was obtained using Ansys Electromagnetics Desktop - Circuit Designer 2018.2. The forward model utilized assumed an ideal transmission line consisting of 5 elements in series, terminated with an open end, and is shown in Figure 11.

Figure 11: Set up in Ansys Circuit. It consists of a port and 5 transmission lines connected in series and terminated in an open stub. Variables were created for the length and characteristic impedance of each of the lines to vary in a simulation.
In order to simulate reflections from a transmission line, a pulse was injected into one end of the transmission line. The pulse was generated using an In-phase and Quadrature (IQ) modulated source. The IQ waveform was generated using the Julia programming language [42] and was imported into Ansys using a text file. The IQ waveform consisted of a pulsed sine wave 1 ns in duration centered at a frequency of 2 GHz with a peak voltage of 170 V sampled every ∼4.2 ps (fs ≈ 238 GHz). All simulations had an overall duration of 250 ns. Ansys Circuit Designer has the capability of “sweeping” over all possible combinations of a discretized set of circuit parameters to optimize a circuit. The range of swept values is shown in Table 1. This functionality was leveraged to produce several different unique observation points. The “range” variable in Table 1 represents the range of physical lengths of the line considered in degrees (360° corresponds to one wavelength). Similarly, the “count” variable represents the number of quantization levels for each parameter of each segment of the transmission line. For example, the first transmission line segment (TL1) is simulated using three different lengths between 1800° and 2700°, and each of the three lengths is simulated using all other possible combinations of the other sweep variables. A total of 2916 simulated reflections were generated, consisting of approximately 4 GB of data. The number of simulated reflections was constrained to be below 3,000 due to computational limitations imposed by Ansys Circuit Designer. Although the simulation consists of five impedance line segments, there exists the distinct possibility that two adjacent segments have the same impedance. In this case, the simulation effectively consists of fewer impedance line segments. Thus, the simulation configuration described above actually simulates cases with up to five impedance line segments.
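As an illustration of the excitation described above, the burst itself takes only a few lines to generate. The sketch below uses Python/NumPy rather than the Julia code actually used to build the waveform file, and the variable names are hypothetical; it simply produces a 1 ns rectangular-windowed 2 GHz sine with a 170 V peak at the stated ≈238 GHz sample rate.

```python
import numpy as np

fs = 238e9        # sample rate [Hz] (one sample every ~4.2 ps)
fc = 2e9          # carrier frequency [Hz]
t_pulse = 1e-9    # pulse duration [s]
v_peak = 170.0    # peak voltage [V]

n = int(round(t_pulse * fs))                  # 238 samples across the 1 ns burst
t = np.arange(n) / fs                         # uniform time grid
pulse = v_peak * np.sin(2 * np.pi * fc * t)   # rectangular-windowed sine burst
```

Note that the burst spans exactly two carrier cycles (2 GHz over 1 ns), so the excitation is a very short, wideband pulse.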
The first impedance line segment stays fixed at 377 Ω (the approximate impedance of free space) to ensure the line is always matched to the source.

Table 1: Summary of the synthesized data set. The counts were varied to keep the total samples below 3,000 for machine limitations. The total number of samples was 2916. The lengths given are electrical lengths where 360° corresponds to one wavelength. (TL - Transmission Line)

Sweep            Range                        Count
TL1 Length       1800°-2700° (1.5-2.25 m)     3
TL2 Length       1800°-2700° (1.5-2.25 m)     3
TL3 Length       1800°-2700° (1.5-2.25 m)     3
TL4 Length       1800°-2700° (1.5-2.25 m)     3
TL5 Length       10,000° (8.33 m)             1
TL1 Impedance    377 Ω                        1
TL2 Impedance    45 Ω-377 Ω                   3
TL3 Impedance    45 Ω-377 Ω                   2
TL4 Impedance    45 Ω-377 Ω                   2
TL5 Impedance    45 Ω-377 Ω                   3

The length of the final impedance line was chosen to ensure no reflections from the open stub at the end. Specifically, a high attenuation value of 10 V/m was applied to the end of the line to simulate the appropriate boundary condition (all other lines were lossless). The impedance values were chosen to roughly approximate impedances which may be encountered in an application such as GPR. As a result of Ansys Circuit Designer's use of variable step solvers, additional measures were taken to condition the time-sampling of the simulations. In particular, the above configuration was first generated using Ansys Circuit Designer's GUI; then a circuit netlist was generated and modified with the following commands.

.option tran.accurate_points.num = 59415
.option tran.accurate_points.start = 0
.option tran.accurate_points.stop = 2.5e-7
.option tran.trtol = 1

While using a netlist to condition the time-sampling of the simulations allows the generation of data sets with approximately uniform sampling, there is still a slight variation, or jitter, to the sampling. However, it was found that the variation was small enough to not significantly impact results.
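If the residual jitter ever did matter, the traces could be resampled onto an exactly uniform grid before training. The sketch below is a hypothetical post-processing step, not something performed in the thesis; it applies Python/NumPy linear interpolation to a stand-in waveform perturbed by artificial jitter, using the 59,415-point/250 ns dimensions from the netlist options above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a solver trace: 59,415 nominal points over 250 ns, perturbed
# by a small artificial timing jitter (values are illustrative only).
n, t_stop = 59415, 250e-9
t_jittered = np.linspace(0.0, t_stop, n) + rng.normal(0.0, 1e-13, n)
t_jittered.sort()                                  # interp needs increasing times
v_jittered = np.sin(2 * np.pi * 2e9 * t_jittered)  # stand-in 2 GHz waveform

# Resample onto an exactly uniform grid by linear interpolation.
t_uniform = np.linspace(0.0, t_stop, n)
v_uniform = np.interp(t_uniform, t_jittered, v_jittered)
```

Because the nominal sample spacing (~4.2 ps) is tiny relative to the 2 GHz signal period, linear interpolation recovers the uniformly sampled waveform with negligible error.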
Finally, the data sets were exported from Ansys Circuit Designer in a Comma Separated Value (CSV) format.

4.2 A Neural Network Approach to the Problem

Neural networks have been shown to improve the performance of inverse scattering systems and medical imaging applications [43–45]. The researchers in [43] successfully designed a neural network (termed DeepNIS) capable of learning general inverse scattering problems with a finite number of transmitters and receivers surrounding an object of interest. While DeepNIS has good performance in these operating conditions, the imaging system requires the object of interest to be surrounded by transmitters and receivers. In many cases such as GPR [46–48], it is not practical to place transmitters around an object of interest. In these cases, it is desirable to image a target with sensors placed only on one side of the target. The goal of simulation one is to design a compact neural network capable of imaging a 1-dimensional scene. Later experiments consider more complex 3D scattering problems without imposing the restrictive assumption that sensors must be placed entirely around a target of interest. In what follows, various forms of neural network architectures are chosen and evaluated for performance on estimating the characteristic impedances in the data synthesized in the previous section. After the initial performance has been evaluated, each architecture is refined with the intention of improving performance.

4.2.1 Initialization and Training

To evaluate the performance of the architectures, the data was divided into a training set consisting of 85% of the data and a testing set consisting of 15% of the data. All simulations were performed using the Julia [42] programming language with the “Flux” [49, 50] machine learning package. New Mexico State University's high performance cluster, “Discovery”, was used for all network training.
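The 85/15 split described above can be sketched as follows. This is an illustrative Python/NumPy version (the thesis used Julia); the function name `split_dataset` is hypothetical, and the placeholder arrays only mimic the shapes involved (2916 observations paired with 900-sample impedance profiles, with the feature dimension shortened for illustration).

```python
import numpy as np

def split_dataset(x, y, train_frac=0.85, seed=0):
    """Shuffle paired observations/targets and split them into train/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))               # random reordering of the samples
    n_train = int(train_frac * len(x))
    train, test = idx[:n_train], idx[n_train:]
    return x[train], y[train], x[test], y[test]

# Placeholder data: 2916 reflections paired with 900-sample impedance profiles.
x = np.zeros((2916, 16))
y = np.zeros((2916, 900))
x_tr, y_tr, x_te, y_te = split_dataset(x, y)
```

Shuffling before splitting matters here because the Ansys sweep generates the samples in a systematic parameter order; a contiguous split would bias the test set toward particular impedance configurations.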
The weights were initialized using Flux's default layer initialization (Xavier initialization) by drawing from a uniform distribution as described in [51]. All dense layers utilized the Leaky ReLU activation function, while all convolutional and cross-convolutional layers utilized the sigmoid activation function [52]. The network weights were obtained using the ADAM algorithm [29]. All default ADAM parameters were used with the exception of the learning rate. Starting from a value of 0.01, the learning rate was decreased by a factor of 10 if there was no improvement after 5 epochs and the learning rate was still larger than 1e-6. Training was terminated after 10 epochs without improvement. The MAE given in equation (3) was used as the cost function to train the networks. As a side note, it was found experimentally that using the MAE resulted in very low final error values, while using the Mean Squared Error (MSE) resulted in the error always staying very large. This may be caused by the squaring of strong errors at discontinuities causing the network to over-correct and diverge.

4.2.2 Initial Architecture Evaluation

Simulation one began by choosing twelve compact architectures for evaluation. Recall, the goal of the simulations was to achieve an error of log(MAE) = 0. Ideally, all possible network configurations would be simulated. However, due to computational limitations, this is not feasible. By starting with 12 unique networks, an effective starting point can be established and refined. All architectures considered take on one of two forms. The first form is a typical Convolutional Neural Network (CNN) where the first layer(s) are convolutional layers with flattened outputs fed into one or more dense layers. The second form of network is the same as the first form, except a cross-convolutional layer is used to pool the data as opposed to a raw flatten.
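As an aside, the learning-rate schedule and stopping rule described in Section 4.2.1 can be made concrete with a short sketch. The Python function below is hypothetical in name and structure (the thesis trained with Julia/Flux); it simply replays a sequence of per-epoch errors through that schedule: divide the learning rate by 10 after 5 consecutive epochs without improvement while it exceeds 1e-6, and stop after 10 consecutive epochs without improvement.

```python
def schedule_training(epoch_errors, lr0=0.01, lr_min=1e-6,
                      patience_lr=5, patience_stop=10):
    """Replay per-epoch errors through the schedule of Section 4.2.1.

    Returns (final_lr, epochs_run): the learning rate is divided by 10 each
    time 5 consecutive epochs pass without improvement (while lr > lr_min),
    and training stops after 10 consecutive epochs without improvement.
    """
    lr, best, since_best = lr0, float("inf"), 0
    for epoch, err in enumerate(epoch_errors, start=1):
        if err < best:
            best, since_best = err, 0
        else:
            since_best += 1
        if since_best >= patience_stop:            # terminate training
            return lr, epoch
        if since_best >= patience_lr and lr > lr_min and since_best % patience_lr == 0:
            lr /= 10.0                             # decay the learning rate
    return lr, len(epoch_errors)

# Example: error improves for 3 epochs, then plateaus.
lr, epochs = schedule_training([1.0, 0.9, 0.8] + [0.8] * 12)
```

In this example, the plateau triggers one learning-rate decay (at the fifth stagnant epoch) before the 10-epoch stopping rule terminates training.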
Because the data is very temporally dependent (an observation of 1 V at 0 ns means something very different from an observation of 1 V at 100 ns), it was hypothesized that the cross-convolutional layers would preserve temporal information that can be lost by flattening. For convenience, the notation summarized in Table 2 is used for describing the networks considered. Specifically, a lowercase ‘c’ denotes a single convolutional layer, a capital ‘C’ denotes a cross-convolutional layer, and a capital ‘D’ denotes a fully connected dense layer.

Table 2: The abbreviations/notation used to describe neural network architectures used throughout this chapter.

Notation         Meaning
Lowercase ‘c’    Convolutional Layer
Uppercase ‘C’    Cross-Convolutional Layer
Uppercase ‘D’    Dense Layer
vX               Version X

Each of the networks considered contained different numbers of convolutional and dense layers. The minimum number of convolutional layers was chosen to be one and the maximum was chosen to be three. Similarly, the minimum number of dense layers was chosen to be one and the maximum was chosen to be two. All convolutional layers utilized sigmoid activation functions and all dense layers utilized Leaky ReLU activation functions. Finally, all convolutional layers consisted of three filter kernels with a receptive field of size 1×51. All architectures were designed to take a one-dimensional voltage observation as an input and output a prediction for the impedance along the length of the transmission line. The output of the network was discretized to consist of 900 samples along a 9 m transmission line, resulting in a spatial sampling rate of 0.01 meters per sample. Because both network types end in dense layers, the number of output samples is equal to the number of neurons, meaning all networks contain 900 neurons in the final dense layer. A complete summary of the architectures considered is provided in Table 3.
Note that architectures will be refined; therefore, these preliminary architectures are denoted version 1 or ‘v1’.

Table 3: Summary of the layer attributes for each of the models considered in the initial architecture evaluation. LRU signifies the Leaky ReLU activation function.

                 Convolution                       Cross-Convolution          Dense
Architecture   Channels  Rec. Field  Activation  Rec. Field  Activation   Neurons    Activation
cDv1           3         1×51        Sigmoid     -           -            900        LRU
ccDv1          3         1×51        Sigmoid     -           -            900        LRU
cccDv1         3         1×51        Sigmoid     -           -            900        LRU
cDDv1          3         1×51        Sigmoid     -           -            2000→900   LRU
ccDDv1         3         1×51        Sigmoid     -           -            2000→900   LRU
cccDDv1        3         1×51        Sigmoid     -           -            2000→900   LRU
cCDv1          3         1×51        Sigmoid     1×1×3       Sigmoid      900        LRU
ccCDv1         3         1×51        Sigmoid     1×1×3       Sigmoid      900        LRU
cccCDv1        3         1×51        Sigmoid     1×1×3       Sigmoid      900        LRU
cCDDv1         3         1×51        Sigmoid     1×1×3       Sigmoid      2000→900   LRU
ccCDDv1        3         1×51        Sigmoid     1×1×3       Sigmoid      2000→900   LRU
cccCDDv1       3         1×51        Sigmoid     1×1×3       Sigmoid      2000→900   LRU

The results from the initial simulation are shown in Figures 12 and 13. Although the best of the initial architectures achieved an error of 0.528, it was unable to achieve the goal of log(MAE) = 0. The results were divided into CNN and CNN with cross-convolutional pooling for clarity. Curves that do not extend to 1000 epochs were terminated because convergence had plateaued. The sharper dips (e.g., in Figure 13 the dip at 120 epochs for ‘cCDDv1’) occur when the learning rate was decreased. Figure 12 shows the results achieved using the basic CNN architectures. Of the six architectures tested, three performed particularly well (‘cD’, ‘ccD’, and ‘cccD’). These three architectures contained only one dense layer. The three architectures that performed poorly were ‘cDD’, ‘ccDD’, and ‘cccDD’. All the poorly performing basic CNN architectures contained two dense layers. Given this information, it appears as though adding a second dense layer to the basic CNN architecture degrades performance.
This may be because a second dense layer causes the network to be too flexible and therefore difficult to train. Figure 13 shows the results achieved using the CNN architecture that was pooled using a cross-convolutional layer. The two architectures that performed the best were ‘cCDD’ and ‘cCD’. The third best performer was ‘ccCDD’. The final three poor performers were ‘ccCD’, ‘cccCD’, and ‘cccCDD’. Overall, the top two performing architectures were ‘cCDD’ and ‘ccD’. The ‘best’ architectures were determined by looking at the lowest error value achieved. These two architectures were further refined in an attempt to lower the error. Table 4 gives the converged error of each architecture. From this table, the two top performing architectures ‘ccD’ and ‘cCDD’ can easily be identified.

Figure 12: The log of the mean absolute error for the models considered in the initial architecture evaluation. This figure only shows the results for CNN architectures without cross-convolution layers. The sharp decreases in error generally correspond to the changes (decreases) in the learning rate. The best performing architecture is marked with a ‘*’. The versions are denoted v* where * represents the version number.

Figure 13: The log of the mean absolute error for the models considered in the initial architecture evaluation. This figure only shows the results for CNN architectures with cross-convolution layers. The sharp decreases in error generally correspond to the changes (decreases) in the learning rate. The best performing architecture is marked with a ‘*’. The versions are denoted v* where * represents the version number.

Table 4: The log of the converged error for each of the models considered in the initial architecture evaluation. The two entries in bold are the top performing architectures.
Architecture    Minimum Error
cDv1            0.83
ccDv1           0.754
cccDv1          0.992
cDDv1           2.035
ccDDv1          2.035
cccDDv1         2.035
cCDv1           0.87
ccCDv1          2.035
cccCDv1         2.035
cCDDv1          0.528
ccCDDv1         1.503
cccCDDv1        2.035

4.2.3 Secondary Architecture Evaluation

In an attempt to get closer to the goal of log(MAE) = 0, several variations of the two top performing architectures (‘cCDDv1’ and ‘ccDv1’) from the initial architecture evaluation were considered for refinement. More specifically, the performance change is evaluated as the number of channels and the filter sizes are varied. The number of channels was varied from two to five and the receptive field was varied from 1×25 to 1×101. A summary of the layer attributes for each of the models considered in the secondary architecture evaluation is provided in Table 5. The results of the secondary architecture evaluation are given in Figures 14 and 15 and Table 6.

Table 5: Summary of the layer attributes for each of the models considered in the secondary architecture evaluation. The bold entries represent changes from the initial architecture evaluation. LRU signifies the Leaky ReLU activation function.

                 Convolution                       Cross-Convolution          Dense
Architecture   Channels  Rec. Field  Activation  Rec. Field  Activation   Neurons    Activation
ccDv2          4         1×51        Sigmoid     -           -            900        LRU
ccDv3          5         1×51        Sigmoid     -           -            900        LRU
ccDv4          2         1×51        Sigmoid     -           -            900        LRU
ccDv5          3         1×25        Sigmoid     -           -            900        LRU
ccDv6          3         1×101       Sigmoid     -           -            900        LRU
cCDDv2         4         1×51        Sigmoid     1×1×4       Sigmoid      2000→900   LRU
cCDDv3         5         1×51        Sigmoid     1×1×5       Sigmoid      2000→900   LRU
cCDDv4         2         1×51        Sigmoid     1×1×2       Sigmoid      2000→900   LRU
cCDDv5         3         1×25        Sigmoid     1×1×3       Sigmoid      2000→900   LRU
cCDDv6         3         1×101       Sigmoid     1×1×3       Sigmoid      2000→900   LRU

Figure 14 shows the performance of the models considered in the secondary architecture evaluation without cross-convolutional layers. Generally speaking, these five architectures performed as would be expected from variations of the ‘ccD’ architecture.
It is interesting to note that the two smallest architectures (‘ccDv4’ and ‘ccDv5’) achieved the lowest error from this set of evaluations. This supports the idea that small architectures well matched to the problem can have high performance despite their size. Compared to the models with cross-convolutional layers, models without cross-convolutional layers generally have worse performance. Figure 15 shows the performance of the models considered in the secondary architecture evaluation with cross-convolutional layers. Again, the two smallest architectures (‘cCDDv4’ and ‘cCDDv5’) achieved the lowest error from this set of evaluations, outperforming all models without cross-convolutional layers. Given that the smaller architectures, both with and without cross-convolutional layers, performed better, one more evaluation will be performed in the next section where even smaller architectures are evaluated. However, as a final note, the models in the secondary architecture evaluation did not result in better performance than the models in the initial architecture evaluation. The best architecture was able to achieve an error of 0.573 and therefore did not meet the goal of log(MAE) = 0.

Figure 14: The log of the mean absolute error for the first 300 epochs of the secondary architecture evaluation. This figure only shows the results for CNN architectures without cross-convolution layers. The sharp decreases in error generally correspond to the changes (decreases) in the learning rate.

Table 6: The log of the converged error for each of the models considered in the secondary architecture evaluation. The two entries in bold are the top performing architectures.

Architecture    Minimum Error
ccDv2           0.787
ccDv3           0.924
ccDv4           0.729
ccDv5           0.733
ccDv6           0.936
cCDDv2          0.888
cCDDv3          0.734
cCDDv4          0.573
cCDDv5          0.645
cCDDv6          0.768

Figure 15: The log of the mean absolute error for the first 300 epochs of the secondary architecture evaluation.
This figure only shows the results for CNN architectures with cross-convolution layers. The sharp decreases in error generally correspond to the changes (decreases) in the learning rate. The best performing architectures are marked with a ‘*’.

4.2.4 Tertiary Architectures

Although the secondary architectures were unable to improve upon the initial architecture performance, the secondary evaluations generally showed that smaller architectures performed better. Because of this, one final refinement was considered where the two best performing architectures (‘ccDv1’ and ‘cCDDv1’) had a kernel size of 25 and only two channels. This results in the smallest architectures of simulation one. A summary of the layer attributes for each of the models considered in the tertiary architecture evaluation is provided in Table 7. The results of the tertiary architecture evaluation are given in Figure 16 and Table 8. Although the best architecture only achieved an error of 0.681 and was not able to meet the goal of log(MAE) = 0, the model with the cross-convolutional layers outperformed the model without the cross-convolutional layers. The results of the tertiary evaluation were consistent with the first two evaluations, but did not result in improvement over the initial architecture evaluations.

Table 7: Summary of the layer attributes for each of the models considered in the tertiary architecture evaluation. The bold entries represent changes from the initial architecture evaluation. LRU signifies the Leaky ReLU activation function.

                 Convolution                       Cross-Convolution          Dense
Architecture   Channels  Rec. Field  Activation  Rec. Field  Activation   Neurons    Activation
ccDv7          2         1×25        Sigmoid     -           -            900        LRU
cCDDv7         2         1×25        Sigmoid     1×1×2       Sigmoid      2000→900   LRU

Figure 16: The log of the mean absolute error for the first 300 epochs of the tertiary architecture evaluation. The sharp decreases in error generally correspond to the changes (decreases) in the learning rate.
The best performing architectures are marked with a ‘*’.

Table 8: The log of the converged error for each of the models considered in the tertiary architecture evaluation. The entry in bold is the top performing architecture.

Architecture    Minimum Error
ccDv7           1.129
cCDDv7          0.681

4.2.5 Final Evaluation and Summary of Results

The top three performing architectures of simulation one were ‘cCDDv1’, ‘cCDDv4’, and ‘cCDDv5’. Model ‘cCDDv1’ performed the best of the three. Thus far, only one trial for each architecture was simulated and reported. To decrease the stochastic effects of initialization on the findings, five additional simulations of each architecture with random initialization were performed and the results averaged. Average results help give more confidence about network convergence and therefore the design of the architecture. This will give more confidence when applying the networks in simulation two. The results of this final architecture evaluation are given in Figure 17. Thus far, all errors reported correspond to the training error. Next, the testing error is reported to demonstrate how well the models generalize to new data. The average testing and training errors for the final architecture evaluation are shown in Table 9. Overall, the architecture ‘cCDDv1’ has the best performance and appears to generalize better than the other architectures considered. Moreover, the results of these simulations support the hypothesis that cross-convolutional layers aid in performance. In order to give a different demonstration of the model performance, inputs were randomly selected and given to the network for prediction. The outputs were then visually compared to the ideal outputs. Figures 18-20 show the outputs of ‘cCDDv1’ for three randomly selected inputs taken from the testing set. As can be seen in the figures, the network output is fairly consistent with the desired output. However, the exact impedance values for some predictions have some error.
For example, in Figure 20(c) the output impedance after ∼5 m should be 50 Ω but appears to be slightly less than 50 Ω. There also appear to be some false edges. For example, in Figure 19(c) there is an incorrect edge at ∼2 m with an impedance of ∼200 Ω. In some cases a ringing may be observed, which may be an overcompensation for an upcoming sharp discontinuity; for example, see Figure 20(c) at about 4.5 m. Finally, some edges were detected either early or late; for example, see Figure 18(c) at ∼2 m. Simulation one has produced several neural network architectures that can estimate the impedance down a transmission line given a measurement of reflections from an emitted pulse. Subjectively, the networks appeared to produce good impedance profiles; however, no architecture was able to meet the goal of log(MAE) = 0. Two aspects in particular could have been modified to potentially improve performance. First, the use of Ansys imposed restrictions on the type of scenarios that could be simulated. This is due to the fact that Ansys was designed to optimize circuits, not generate machine learning data sets. Future research could make use of custom code to compute the forward problem. Custom code could improve memory management to more easily allow large data sets to be simulated. Second, to decrease the stochastic effects of random initialization, each architecture could be evaluated many more times, as was done in the final evaluation. This work can find application in estimating the permittivity of unknown materials, similar to a conventional transmission line method as described in [53]. Additionally, this method can be directly implemented in 3D if the incident wave is assumed to be a plane wave incident upon flat, parallel media [40]. In summary, simulation one has provided information that will aid in the design of neural networks for the 3D inverse scattering problems in simulation two.
Figure 17: The expected value of the log of the mean absolute error for the first 300 epochs of the top three performing architectures of simulation one. The expected value is computed with N = 5. The best performing architecture is marked with a ‘*’.

Table 9: The log of the expected value of the converged error of the best performing architectures of simulation one. The expected value is computed with N = 5. The entry in bold is the top performing architecture.

Architecture | Average Training Error | Average Testing Error
cCDDv1 | 0.662 | 1.145
cCDDv4 | 1.411 | 1.569
cCDDv5 | 0.962 | 1.189

Figure 18: An example of the network (‘cCDDv1’) output for a randomly selected input taken from the testing set. (a) The measured signal. (b) The ideal output. (c) The output of the network.

Figure 19: An example of the network (‘cCDDv1’) output for a randomly selected input taken from the testing set. (a) The measured signal. (b) The ideal output. (c) The output of the network.

Figure 20: An example of the network (‘cCDDv1’) output for a randomly selected input taken from the testing set. (a) The measured signal. (b) The ideal output. (c) The output of the network.

5 Simulation Two

In this chapter, a neural network will be designed that extends the 1D imaging of the previous chapter to 3D imaging. First, the approach used to synthesize a 3D data set will be presented. Training and refinement of the neural network will then follow.

5.1 Data Synthesis

To properly train the neural network, a 3D data set was required. A 3D data set was obtained through simulation with the free and open source software openEMS [54, 55]. The simulation software openEMS utilizes a flexible Matlab interface that scaled naturally to New Mexico State University’s High Performance Computer (HPC), “Discovery”. In simulation one, the goal was to determine the impedance profile of multiple transmission lines.
In simulation two, the goal is to determine the permittivity profile as opposed to the impedance. Specifically, the goal is to predict the relative permittivity εr(x, y, z). The relative permittivity εr(x, y, z) is related to the total permittivity ε(x, y, z) through

ε(x, y, z) = εr(x, y, z) ε0    (6)

where ε0 is the permittivity of free space and the coordinates x, y, and z specify a particular location in 3D space [56]. The author has chosen to attempt to achieve a Mean Absolute Error (MAE) in εr of less than 1. Given the range of permittivity values that can occur in real world applications (εr ≈ 1–60) [56–58], an MAE in εr of less than 1 should give reasonable differentiation between objects and remain fairly close to the true permittivity of the object. In 3D, there are many locations at which the transmitters and receivers can be placed, and consequently multiple transmitters and receivers can now be utilized simultaneously, as opposed to only a single pair for the 1D case. Specifically, there are four configurations that could be utilized: Single Input Single Output (SISO), Single Input Multiple Output (SIMO), Multiple Input Single Output (MISO), and Multiple Input Multiple Output (MIMO). An example of each configuration is shown in Figure 21. The configuration chosen in simulation two is SIMO. This configuration has several benefits. Only a single source is required, which simplifies the simulation significantly by avoiding any timing issues introduced by multiple sources. If the SISO configuration were used, only one source would be required; however, this source would have to move to scan the object in order to obtain more information about the target. Computationally, moving the source would require multiple simulations for a single target, which is expensive. The chosen SIMO configuration allows only a single simulation to be performed per target.
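A minimal NumPy sketch of equation (6) and the MAE target on εr follows; the volumes and values are toy examples for illustration, not data from the thesis.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # permittivity of free space (F/m)

def total_permittivity(eps_r):
    # Equation (6): eps(x, y, z) = eps_r(x, y, z) * eps0
    return eps_r * EPS0

def mae(pred_eps_r, true_eps_r):
    # Mean absolute error in relative permittivity; the target is < 1
    return float(np.mean(np.abs(pred_eps_r - true_eps_r)))

true_vol = np.full((4, 4, 4), 6.0)   # toy asphalt-like volume
pred_vol = true_vol + 0.5            # prediction uniformly off by 0.5
err = mae(pred_vol, true_vol)        # 0.5, which meets the < 1 goal
```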
The usage of multiple sensors allows more information about the reflected waves to be obtained from a single pulse and a single simulation. The chosen SIMO configuration utilized 25 sensors placed on a 5 × 5 planar grid with 1 wavelength separation between sensors. Note that simulation two considers most distances in terms of wavelengths. The wavelength specified will always be at the minimum cutoff frequency of 1.5 GHz, resulting in the maximum wavelength present in the simulation of 0.1999 m, unless otherwise specified. The chosen configuration is shown in Figure 22. A planar grid was chosen for simplicity; however, this will cause ambiguity if there are targets equidistant behind the sensors (see Figure 3).

Figure 21: Possible transmitter and receiver configurations. In 1D, if constrained to place transmitters and receivers only on one side of the target, there is only one location to place the transmitter/receiver pair. Shown here is only 2D; however, it illustrates how many configurations are possible as well as the types of configurations.

For the purpose of simulation two, no targets were located behind the grid and therefore any ambiguity was eliminated. All simulations were run with the following parameters. Also note that the neural network (to be described in the next section) will only predict a small cubic volume inside of the simulated data set, which is termed the imaging volume. The maximum resolution in all simulations was the minimum wavelength divided by 20 (119.9 mm/20 ≈ 6 mm). The sensor grid was placed 5.5 wavelengths away from the imaging volume with 1 wavelength between sensors. The source was placed at the center of the sensor grid, five wavelengths away from the imaging volume, to avoid numerical instability at the sensors.¹ The receivers were simulated by adding field dumps that straddled single grid points. The source was generated in openEMS by exciting a line of 3 mesh points with y polarization and a strength of 100 V/m.
This excitation numerically simulated a line source. The excitation was a Gaussian pulse centered at 2 GHz with 20 dB cutoff frequencies of 1.5 GHz and 2.5 GHz. The simulation boundary utilized 12 Perfectly Matched Layers (PMLs) to absorb all radiation leaving the simulation. The simulation boundary was kept at least 2 wavelengths away from all objects in the simulation to ensure no objects directly interacted with the PMLs. These parameters are summarized in Table 10.

¹ Numerical instability was observed near the source that is believed to be caused by the Finite Difference Time Domain method near sharp discontinuities.

Figure 22: Chosen transmitter/receiver SIMO configuration. The 25 sensors are placed on a square 5 × 5 planar grid with 1 wavelength separation between receivers. The transmitter is offset from the receivers by half a wavelength to avoid a numerical error consistent with FDTD computational methods near sharp discontinuities. The target is 5.5 wavelengths away from the receivers and 5 wavelengths away from the transmitter.

Table 10: Summary of the parameters used for simulation in openEMS. *The maximum wavelength is termed wavelength throughout this chapter unless otherwise specified.

Simulation Parameter | Value
Excitation Type | Gaussian Pulse
Excitation Strength | 100 V/m (Vertical Polarization)
Center Frequency | 2 GHz
Min/Max Cutoff Frequency | 1.5 GHz / 2.5 GHz
Max/Min Wavelength* | 199.9 mm / 119.9 mm
Max Spatial Resolution | 6 mm
Dist. Sensor Grid to Imaging Volume | 5.5 wavelengths (1.0992 m)
Dist. Source to Imaging Volume | 5 wavelengths (0.9993 m)
Sensor Grid Type/Configuration | Planar Grid (5 × 5)
Sensor Grid Sensor Spacing | 1 wavelength
Simulation Boundary Layers | 12 Perfectly Matched Layers
Simulation Boundary Min Dist. to Objects | 2 wavelengths
Timestep | 5.5 ps
Simulation Length (Samples/Time) | 3501 samples / 19.2555 ns

5.1.1 Data Sets

To demonstrate a simplified real world application, four different data sets were simulated.
The first three represent simplified real world scenarios and the fourth is composed of random objects to aid in algorithm generalization. Note that the imaging volume is only a small cube of the entire simulation. However, the data sets were designed to show the most detail inside of the imaging volume to best demonstrate functionality. The first data set consisted of asphalt cubes of different sizes. This data set will be termed the Runway-Functional data set, as it is representative of a slab of runway without cracks or inclusions. The permittivities were chosen to be similar to that of asphalt [56]. The authors of [56] found the true permittivity values of asphalt to be slightly lossy; however, for simplicity only the lossless permittivity values were utilized. These permittivity values had an εr between 4 and 8. It is assumed that the cubes are floating in free space and all energy that scatters off of a cube will continue to infinity. In total, 320 data points were simulated, with the cube size ranging from 25 mm to 100 mm and εr values ranging from 4 to 8. A sample data point from this data set is shown in Figure 23.

Figure 23: Slices down the yz, xz, and xy planes of a sample point from the Runway-Functional data set. The dimensions of the x and y axes are in mm.

The second data set is composed of cubes of asphalt, but unlike the first data set, the cubes have spherical inclusions of air. This data set will be termed the Runway-Faulty data set. This data set is intended to be compared with the Runway-Functional data set and to demonstrate the ability to detect cracks within an asphalt runway. The cubes were kept at a constant size of 100 mm while the size of the sphere was varied between 12.5 mm and 25 mm. The permittivity of the asphalt was varied between 4 and 8 as before.

Figure 24: Slices down the yz, xz, and xy planes of a sample point from the Runway-Faulty data set. The dimensions of the x and y axes are in mm.
In total, 320 data points were simulated. A sample data point from this data set is shown in Figure 24. The third data set is intended to simulate a simplified biomedical imaging application. Two concentric cylinders of the approximate size and permittivity of a human thigh were simulated. This data set will be termed the Human Leg data set. The leg was assumed to float in free space as before. The outer cylinder had the approximate permittivity of muscle and the inner cylinder had the approximate permittivity of bone [57, 58]. True human tissue is lossy and anisotropic; however, this is neglected for simplicity. The outer cylinder radius was held constant at 75 mm and the length of both cylinders was held constant at 350 mm. The inner cylinder radius was varied from 20 to 30 mm. The permittivities of the muscle and bone were varied from 40 to 60 and from 15 to 40, respectively. The distance from the source to the start of the thigh is 1.07 m (5.37 wavelengths). Note that the imaging volume is smaller than the total size of the leg; therefore, only the inner cylinder (bone) will be visible in the imaging volume. In total, 320 points were simulated. A sample data point from this data set is shown in Figure 25. The fourth data set was created to improve algorithm generalization. It consisted of cubes and spheres of random size, location, and permittivity placed into the space. This data set shall be termed the Random Shapes data set. The number of spheres and the number of cubes inside of the simulation space were varied from 1 to 4 and from 1 to 5, respectively. The sizes of the cubes were randomly drawn from a uniform distribution between 25 and 100 mm. The radii of the spheres were randomly drawn from a uniform distribution between 12.5 and 50 mm. Finally, the permittivity values for the cubes and the spheres were randomly chosen from a uniform distribution between 1 and 60.
The permittivity values were chosen to match the permittivity values of the first three data sets, and the sizes were chosen to keep all shapes inside of a 101 mm × 101 mm × 101 mm imaging volume. It is important to note that as each cube and sphere is added, its permittivity overwrites the object that was written before it. Because of this overwriting process, much more complex structures were simulated, as edges were overwritten to make different, more complex shapes. In total, 960 points were simulated. A sample data point from this data set is shown in Figure 26.

Figure 25: Slices down the yz, xz, and xy planes of a sample point from the Human Leg data set. The dimensions of the x and y axes are in mm. The images shown are of the imaging volume, which is inside of the human leg. Consequently, the boundary of the outer cylinder which is present during synthesis will not be visible during imaging due to the size of the human leg compared to the size of the imaging volume.

Figure 26: Slices down the yz, xz, and xy planes of a sample point from the random shapes data set. The dimensions of the x and y axes are in mm. The sizes of the cubes and spheres were chosen to fit inside of the imaging volume and permittivities were chosen to correspond to the other three data sets. Note that more complex structures are formed through the combination of all structures, such as the orange triangular shape on the lower left side of the xz plane image. This data set was generated to aid in algorithm generalization.

A summary of each data set is shown in Table 11.

Table 11: Summary of the four simulated data sets. The Random Shapes data set was primarily used for training. The Runway-Functional, Runway-Faulty, and Human Leg data sets were designed to represent a simplified real world scene and were primarily used to test trained networks.
Data Set | Number of Points | Description
Runway-Functional | 320 | A solid cube of asphalt
Runway-Faulty | 320 | An asphalt cube with a spherical inclusion of air in the center
Human Leg | 320 | Two concentric cylinders
Random Shapes | 960 | Cubes and spheres randomly placed inside the imaging volume

5.2 Neural Network Design

Due to the design of the simulated data sets, the input of the network is constrained to the 25 electric field traces. The method chosen to represent the output permittivity profile is a 3D cube of size 101 mm × 101 mm × 101 mm, resulting in 1 voxel per mm³. Note that X-ray is currently able to achieve a resolution of approximately 100 µm [59]. Increasing the resolution will either shrink the field of view of the imaging volume or increase the computational complexity of the problem. The chosen size of the imaging volume will allow slightly larger objects to be simulated with manageable computational complexity. The proposed method can easily be modified to increase the resolution by decreasing the size of the imaging volume while keeping the same dimensions. Increasing the resolution is not studied in this work and is left for future work. As previously mentioned, the imaging volume does not cover the entire extent of the 3D simulations from the previous section. However, the data sets were designed to show the most detail inside of the cube. The imaging volumes were generated from the openEMS simulations by specifying the objects drawn (e.g., cubes, spheres, cylinders) and their respective locations. Then 3D arrays were generated where each point was the appropriate permittivity at the appropriate distance from the sensor grid (approximately 5.5 wavelengths away). Once the data sets were simulated, the neural networks could then be trained. Recall that the goal of simulation two is to predict the unknown target’s permittivity profile from the 25 measured time domain electric field traces.
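The overwriting synthesis of the Random Shapes imaging volumes described in Section 5.1.1 can be sketched as below. This is an illustrative NumPy sketch under the stated voxel conventions (1 voxel per mm³, free space background); the function names are hypothetical and this is not the thesis's Matlab/openEMS code.

```python
import numpy as np

def paint_cube(vol, corner, size, eps_r):
    """Overwrite a cube of relative permittivity into the volume."""
    x, y, z = corner
    vol[x:x + size, y:y + size, z:z + size] = eps_r

def paint_sphere(vol, center, radius, eps_r):
    """Overwrite a sphere; later shapes overwrite earlier ones."""
    idx = np.indices(vol.shape)
    dist2 = sum((idx[i] - center[i]) ** 2 for i in range(3))
    vol[dist2 <= radius ** 2] = eps_r

# 101^3 imaging volume, free space (eps_r = 1) background.
vol = np.ones((101, 101, 101))
paint_cube(vol, (10, 10, 10), 40, 5.0)
paint_sphere(vol, (30, 30, 30), 15, 20.0)  # overwrites part of the cube
```

Because each new shape simply overwrites the voxels it covers, intersecting cubes and spheres produce the more complex composite structures noted for the Random Shapes data set.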
Initial attempts to directly transfer the result of simulation one to simulation two were unsuccessful because this method overfit the data and provided poor generalization. For this reason, and due to the large number of neurons required for this approach, a different direction was taken. A custom neural network was designed that consisted of a Convolutional Neural Network (CNN)—used to learn physics—followed by the decoder of a Variational Autoencoder (VAE)—used to draw the predicted scene. More specifically, the neural network was broken into two parts: 1) a normal CNN for the first layers, followed by 2) the decoder of a pre-trained variational autoencoder to generate the imaging volume prediction. The variational autoencoder was first trained to generate the types of imaging volumes that are present in the data sets. A variational autoencoder is used as opposed to a traditional autoencoder to add structure to the latent space. The encoder takes an imaging volume as an input and compresses that volume to a low dimensional vector. The decoder then takes this low dimensional representation and maps it to an imaging volume that is representative of the data set. Once trained, the decoder is removed and is never trained again. The decoder is then attached to the output of the convolutional network so that the input of the decoder is now the output of the convolutional network. Together, the convolutional network is responsible for learning the physics and the decoder is responsible for generating the imaging volume. Note that after the decoder is removed from the variational autoencoder, the random draw for the input is no longer performed. Figures 27 - 29 show the described neural network with color coding for clarity.

Figure 27: An illustration of a CNN that will be used in conjunction with the decoder of a VAE to create the proposed full network.
The CNN is color coded blue for reference in Figure 29.

Figure 28: An illustration of the VAE architecture utilized. First the VAE will be trained, then the decoder will be removed, the weights frozen, and finally the decoder will be attached to the output of the CNN of Figure 27. The decoder is color coded brown for reference in Figure 29.

Figure 29: An illustration of the full network. The CNN of Figure 27 (blue) will be trained to learn physics and will be placed as the input layers of the network. The trained decoder of the VAE (brown) from Figure 28 will then be attached to the output of the CNN (with weights frozen from further training).

Splitting the network into two parts has three benefits.

1. The convolutional network that is learning the physics is not directly connected to the output. Therefore, it is difficult for the network to overfit the data and it is more likely to learn the underlying physics of the problem.

2. The training time for the convolutional network is greatly reduced due to the far fewer parameters to train.

3. The VAE uses the same data for the input and targets. Therefore, data does not have to be labeled and acquiring training data is much easier.

For convenience, the notation summarized in Table 12 is used for describing the networks considered. The notation is identical to simulation one, where a lowercase ‘c’ denotes a single convolutional layer, a capital ‘C’ denotes a cross-convolutional layer, and a capital ‘D’ denotes a fully connected dense layer. Additionally, for this simulation, a ‘ct’ will denote a convolutional transpose layer.

Table 12: The abbreviations/notation used to describe neural network architectures used throughout this chapter.
Notation | Meaning
Lowercase ‘c’ | Convolutional Layer
Uppercase ‘C’ | Cross-Convolutional Layer
Uppercase ‘D’ | Dense Layer
‘ct’ | Convolutional Transpose Layer
VAEvX | Variational Autoencoder Version X
FNvX | Full Network Version X

5.2.1 Variational Autoencoder

The VAE was developed first. As with simulation one, the neural networks (both the CNN and the VAE) were developed using Julia [42] and the “Flux” machine learning package [49, 50]. All networks were trained on New Mexico State University’s HPC, termed “Discovery”.

Initial Considerations

Because the VAE is only used to draw objects, the network can be trained using all available data. This ensures that the output space is adequately sampled for this application. Recall that the VAE is only responsible for generating the imaging volume. Therefore, the training stage for the VAE will never utilize the sensor probe input data—ensuring that only the CNN will be responsible for learning the underlying physics. To add additional structure to the VAE, both convolutional and convolutional transpose layers are used to extract features in the encoder and build features in the decoder [60]. For simplicity, however, the convolutional VAE will still be referred to as the VAE. Although VAEs are typically trained using batch learning, online learning was utilized because GPU memory limitations were encountered (due to the high dimensional output representing the imaging volume). All networks considered had two convolutional layers followed by two dense layers for the encoder. The decoder is symmetrical to the encoder; it begins with two dense layers followed by two convolutional transpose layers. The Adam optimizer was used to minimize the loss function. The loss function chosen was the binary cross entropy for the reconstruction with β = 1 (see Section 2.3.4 for more details).
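A minimal NumPy sketch of this loss (a binary cross entropy reconstruction term plus a β-weighted KL term, with β = 1) is given below; it is illustrative only and is not the thesis's Flux implementation.

```python
import numpy as np

def bce(p, x, eps=1e-7):
    """Binary cross entropy reconstruction term between the decoder
    output p and the (normalized) target volume x."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p)))

def kl_term(mu, log_var):
    """KL divergence of N(mu, exp(log_var)) from the standard normal."""
    return float(0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))

def vae_loss(p, x, mu, log_var, beta=1.0):
    """Reconstruction loss plus the beta-weighted KL term (beta = 1)."""
    return bce(p, x) + beta * kl_term(mu, log_var)
```

With β = 1 this is the standard VAE objective; the KL term is what gives the latent space the added structure discussed above.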
The weights of all networks were initialized using Flux’s default layer initialization (Xavier initialization) by drawing from a uniform distribution as described in [51]. A problem that was initially encountered was that the output was always biased to a cube that covered almost the entire imaging volume, as shown in Figure 30. This was caused by two things. First, if there were more layers than described above, the output would be compressed to very small values in the latent space. These very small values did not carry enough weight to change the decoder output values and consequently resulted in a flat output, as the convolutional transpose layers convolved over small values with little diversity. Second, the choice of an activation function such as the Sigmoid activation function dramatically worsened the small output values of the encoder. The Sigmoid activation function would map the already small values onto a scale between 0 and 1, which ensured the values became smaller as the number of layers increased. By decreasing the number of layers to the described network and utilizing the ReLU or Leaky ReLU activation function to help increase the variance of the encoder output, the problem of a “flat” output was overcome.

Figure 30: An example of a “flat” output from the VAE. The dimensions of the x and y axes are in mm. The output will look “flat” no matter the input or how many epochs are trained. The “flat” output problem is fixed by increasing the variance of the encoder output by decreasing the number of layers and utilizing the ReLU activation function.

Top Performing Architecture Details

The top six VAE variations that were developed are shown in Table 14. The three architectures shown in the top half of Table 14 will be discussed first. Note that most variations utilized the Adam optimizer with learning rate η = 10^-5. Some variations were able to function with η = 10^-4; however, it was found that larger learning rates resulted in NaN parameters.
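The collapse toward a near-constant (“flat”) output described above can be demonstrated numerically. In this NumPy sketch the small latent values are hypothetical; it only illustrates why repeatedly applying the Sigmoid washes out the differences between inputs while the Leaky ReLU preserves them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

# Two distinct small-magnitude latent codes (illustrative values).
a = np.array([0.02, -0.01, 0.03])
b = np.array([-0.03, 0.02, -0.02])

# Repeated Sigmoids drive both codes toward the same fixed point, so
# the decoder sees almost no difference between inputs ("flat" output).
sa, sb = a, b
for _ in range(5):
    sa, sb = sigmoid(sa), sigmoid(sb)
collapse = float(np.max(np.abs(sa - sb)))

# Leaky ReLU leaves the (small) differences between the codes intact.
preserved = float(np.max(np.abs(leaky_relu(a) - leaky_relu(b))))
```

After five Sigmoid applications the two codes are nearly indistinguishable, whereas the Leaky ReLU outputs retain a difference on the order of the original inputs.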
Two different methods of data normalization were implemented to make each point lie on a scale of 0 to 1. In the first method, termed Method 1, each volume is divided by the maximum value in that volume. In the second method, termed Method 2, each volume is divided by the maximum value in the training data set (εr = 60). Table 13 gives a summary of these methods.

Table 13: A summary of the methods used to normalize the data before training the VAE. Method 2 is preferred over Method 1 because the original volume can be recovered by simply multiplying by the maximum value in the data set of 60.

Normalization Method | Description
Method 1 | Divide each volume by the maximum value in that volume.
Method 2 | Divide each volume by the maximum value in the training data set (εr = 60).

For the encoder, each of these three networks utilized 2 convolutional layers with 16 filter kernels in each layer with a receptive field of size 3 × 3 × 3. The small receptive field size helped to increase the speed of the network without sacrificing performance. Additionally, the encoder had two dense layers after the convolutional layers to intelligently compress the extracted features. There were 100 neurons in the first dense layer, and the final dense layer of the encoder was varied from 10 to 100 neurons. The output of the encoder is a vector from a latent space. The more dimensions the latent space vector has, the more flexibility the network has to differentiate between objects inside the latent space. The decoder has 100 neurons in its first dense layer; its second dense layer has a number of neurons equal to the output size of the encoder’s convolutional layers (912,673). The final two layers of the network were convolutional transpose layers. The first convolutional transpose layer utilized 16 filter kernels with a receptive field of size 3 × 3 × 3.
The final convolutional transpose layer generated the imaging volume and consisted of a single neuron with a receptive field of size 3 × 3 × 3. All layers utilized the Leaky ReLU activation function with the exception of the last layer, which used the ReLU activation. The ReLU activation function for the last layer ensured only positive values could be predicted. This kept the range of possible outputs to what is physically possible (positive values only). Additionally, the ReLU activation function does not saturate, so it has the ability to predict any positive permittivity value. The data for each of the first three networks was normalized by Method 1. These three architectures are shown in the top half of Table 14 and their performance is shown in Table 15. The next three highest performing architectures are described in the bottom half of Table 14. These three architectures followed after VAEv3, but the number of filter kernels was varied. Additionally, the last layer was changed to a Leaky ReLU activation as opposed to a typical ReLU activation function. The last activation was changed because the gradient for these three networks was consistently going negative, preventing the network from updating. The Leaky ReLU is not exclusively positive and does not saturate, so the network should be able to predict any possible permittivity value. Also differing from VAEv3, the networks were normalized by Method 2. Normalization in this manner is preferable because the true permittivity of the object can readily be recovered by multiplying the output of the network by the maximum value of 60.

Table 14: Summary of the layer attributes for each of the models considered in the most successful architectures. The format for the convolutional layers is number of kernels/receptive field dimensions. The notation 3* denotes dimensionality of (3 × 3 × 3). The format for the dense layers is the number of neurons.
For the upper three architectures listed, all layers utilized the Leaky ReLU activation function with the exception of the last layer, which utilized the ReLU activation. The lower three architectures listed utilized only the Leaky ReLU activation function.

Architecture | Encoder (Conv1, Conv2, Dense1, Dense2) | Decoder (Dense1, Dense2, ConvTranspose1, ConvTranspose2) | Norm. Method
VAEv1 | 16/3*, 16/3*, 100, 10 | 100, 912673, 16/3*, 1/3* | 1
VAEv2 | 16/3*, 16/3*, 100, 50 | 100, 912673, 16/3*, 1/3* | 1
VAEv3 | 16/3*, 16/3*, 100, 100 | 100, 912673, 16/3*, 1/3* | 1
VAEv12 | 16/3*, 16/3*, 100, 100 | 100, 912673, 16/3*, 1/3* | 2
VAEv13 | 32/3*, 32/3*, 100, 100 | 100, 912673, 32/3*, 1/3* | 2
VAEv14 | 64/3*, 64/3*, 100, 100 | 100, 912673, 64/3*, 1/3* | 2

Table 15: Trial 1 summary.

Architecture | Latent Size | Epochs Trained | Minimum Error
VAEv1 | 10 | 80 | 418572
VAEv2 | 50 | 173 | 397117.06
VAEv3 | 100 | 143 | 394610.8

The errors corresponding to the six top performing architectures are shown in Figure 31. For some networks, additional iterations may have resulted in lower error; however, in most cases the general trend of network performance was clear and therefore training was terminated early. For example, VAEv13 would likely see increased performance with more training iterations. Each network was trained for approximately two days. Architecture VAEv13 achieved the lowest error, but as will be shown momentarily, VAEv3 subjectively captured features the best.

Figure 31: Error of the top performing VAE architectures. VAEv13 achieved the lowest error; however, VAEv3 appeared to capture image features better than any other architecture.

Other Architecture Details

Other trials existed that are worth noting, not because of their performance, but because of the knowledge gained from the lack of performance. These architectures are described briefly in this section. For chronological purposes, it should be noted that the first three architectures of Table 14 were the first architectures tested. Refinements were made after analyzing the results of these three architectures.
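The two normalization schemes compared throughout these trials (Table 13) can be sketched as follows; this is an illustrative NumPy sketch with a toy volume, where EPS_R_MAX names the data-set maximum of 60.

```python
import numpy as np

EPS_R_MAX = 60.0  # maximum relative permittivity in the training set

def normalize_method1(vol):
    """Method 1: divide each volume by its own maximum value."""
    return vol / vol.max()

def normalize_method2(vol):
    """Method 2: divide by the data-set maximum; the original volume
    is recovered simply by multiplying by EPS_R_MAX."""
    return vol / EPS_R_MAX

vol = np.array([[1.0, 4.0], [8.0, 20.0]])  # toy "volume"
m1 = normalize_method1(vol)  # per-volume scale; original scale is lost
m2 = normalize_method2(vol)  # recoverable: m2 * EPS_R_MAX == vol
```

Method 1 always stretches each volume to a full 0-to-1 range (which is the variance boost noted above), while Method 2 keeps all volumes on a common, invertible scale.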
Trial 2—The architectures of Table 16 were trained after analyzing the results of the first three architectures of Table 14. After analyzing the results, the activation of the last layer was changed to the sigmoidal activation because it was believed that using an activation which can only output on the range of 0 to 1 would be beneficial after normalizing all data points. Additionally, normalization was done through Method 2. It is desirable to normalize the data set in this manner because the original permittivity can readily be recovered; however, it appears that normalizing by Method 1 improves training more than normalizing by Method 2. This is possibly due to an increase of variance in the data points by ensuring the different objects have dramatically different values. After these changes were implemented, it was found that performance degraded from the previous set of architectures. Both the sigmoid and the change in normalization adversely affected performance.

Table 16: Trial 2 summary.

Architecture | Latent Size | Epochs Trained | Minimum Error
VAEv4 | 10 | 131 | 665946.06
VAEv5 | 50 | 90 | 666117.6
VAEv6 | 100 | 44 | 673167.7

Trial 3—The next set of architectures that were tested are shown in Table 17. The sigmoidal activation was kept the same; however, the normalization was changed to Method 1. The change in normalization achieved marginally better performance, indicating that normalization by Method 1 aids in algorithm training.

Table 17: Trial 3 summary.

Architecture | Latent Size | Epochs Trained | Minimum Error
VAEv7 | 10 | 51 | 539597.75
VAEv8 | 50 | 47 | 539979.94
VAEv9 | 100 | 26 | 550160.8

Trial 4—It was hypothesized that the utilization of convolutional transpose layers adversely affects network performance. Convolutional transpose layers look for different characteristics as they slide across an input. It was believed the latent space vector and the output of the decoder’s dense layers may not have enough structure to pass valid information through to the convolutional transpose layers.
Additionally, the dense layers have to learn structure to pass information to the convolutional transpose layers. It was believed that removing the convolutional transpose layers and utilizing only dense layers might remove this difficulty. These architectures are described in Table 18. The first architecture was normalized by Method 2 and the second architecture was normalized by Method 1. While performance did increase, it was not able to improve upon the first three architectures of Table 14. This is likely because the convolutional transpose layers make decisions about local regions of the input, while each neuron at the output of a dense layer must make a decision based on the entire input to that layer. For this reason, it is likely that removing the convolutional transpose layers made the output too flexible and degraded performance.

Table 18: Trial 4 summary.

Architecture | Latent Size | Epochs Trained | Minimum Error
VAEv10 | 100 | 116 | 419576.8
VAEv11 | 100 | 6 | 502137

Trial 5—Because no architecture was able to improve upon the performance of the first three architectures of Table 14, a closer refinement of these three architectures was attempted by taking VAEv3, normalizing by Method 2, and increasing the number of filters. These three architectures are shown in the bottom of Table 14 and the corresponding performance is shown in Table 19.

Table 19: Trial 5 summary.

Architecture | Num. Filters | Epochs Trained | Minimum Error
VAEv12 | 16 | 127 | 395603.2
VAEv13 | 32 | 105 | 384601.75
VAEv14 | 64 | 88 | 391259.66

Trial 6—To attempt to further improve the normalization, the architectures in Table 20 were investigated. The VAEv3 model was visited again, but using the sigmoid activation function to ensure the network was only capable of outputting values on the range of 0 to 1. It was found that the sigmoid activation performed very poorly in training any VAE architecture.

Table 20: Trial 6 summary.
Filters Architecture Minimum Error 3 88 47 16 32 64 VAEv15 VAEv16 VAEv17 715101.4 667760.1 671514 Trial 7—After analyzing the results given in in Table 14, it was found that the output of the Leaky ReLU and ReLU functions were difficult to maintain between values of 0 and 1. But if the sigmoid activation was utilized, performance also degraded. For these reasons, it was hypothesized that the ReLU and Leaky ReLU activation functions were firing too strongly at most locations in the imaging volume. To compensate and generate as low an error as possible, the network appeared to be capturing as much of the information about the shapes as possible because it was difficult to perfectly capture the permittivity information. Conversely, it appeared that the sigmoid activation was not firing strongly enough to capture permittivity information. It was hypothesized the network was relying on the fact that it was already outputting values on the correct range of 0 to 1. This appeared to result in the network not being able to learn information about the shapes correctly. In an attempt to remedy these two problems, the architectures shown in Table 21 were first trained with the Leaky ReLU activation function to learn the proper shapes in the data set. Once a plateau in training was reached, the activation function of the last layer was swapped to the Sigmoidal activation function to learn permittivity information. It was found that even with starting with a pre-trained network, the Sigmoidal activation function was unable to learn and adversely 76 affected performance. Table 21: Trial 7 summary. Epochs Trained Num. Filters Architecture Minimum Error 47 41 34 16 32 64 VAEv18 VAEv19 VAEv20 665520.2 662566 664195.2 VAE Summary and Results To summarize, it was found that a large amount of layers coupled with the sigmoid activation function resulted in a constant “flat” output regardless of whether the network was pre-trained or not. 
This was fixed by using fewer layers with the ReLU or Leaky ReLU activation. It was found that the ReLU activation function sometimes produced a zero gradient for negative pre-activations during training (the "dying ReLU" problem) that resulted in no parameters updating. This can be fixed by utilizing the Leaky ReLU activation function, which retains a small nonzero gradient for negative inputs. It was also found that sigmoidal functions appear to perform poorly in these VAE architectures applied to these data sets. Additionally, the following observations were made. Learning rates larger than η = 10⁻⁴ resulted in NaN parameters. This is likely due to large initial errors passed through an exponential function causing values to grow larger than machine precision can handle. Normalizing the data set by Method 1 as opposed to normalizing by Method 2 increased performance, likely due to increased data variance, but made it difficult to determine the original permittivity. Convolutional transpose layers appear to perform better than dense layers for building an output, likely because convolutional layers accept a local region of the previous layer as input as opposed to the entire previous layer. As an additional note, in simulation one, multiple trainings were performed with the same parameters to reduce the uncertainty of network performance due to stochastic initialization. In simulation two, due to time constraints, only single trainings without averaging were performed.

Some sample results are shown in Figures 32–35. The outputs were generated by passing the ideal function through the encoder layers and directly passing the mean output to the decoder without performing a random draw or including the variance output. This should query the network for the ideal output as opposed to an object that looks similar to the input. Figures 32 and 33 are outputs from VAEv3 while Figures 34 and 35 are from VAEv13. Architecture VAEv3 subjectively appeared to produce the best results while VAEv13 achieved the lowest error.
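The "mean query" described above (bypassing the random draw so the decoder is asked for the ideal output) can be sketched as follows; the encoder and decoder here are hypothetical linear stand-ins, not the thesis' trained networks.

```python
import math
import random

def encoder(x):
    """Hypothetical encoder: returns mean and log-variance of q(z|x)."""
    mu = [0.2 * v for v in x]
    logvar = [0.0 for _ in x]
    return mu, logvar

def decoder(z):
    """Hypothetical decoder: draws an output volume from a latent vector."""
    return [5.0 * v for v in z]

def vae_sample(x):
    # Standard VAE forward pass: random draw via the
    # reparameterization trick, z = mu + sigma * epsilon.
    mu, logvar = encoder(x)
    return decoder([m + math.exp(0.5 * lv) * random.gauss(0, 1)
                    for m, lv in zip(mu, logvar)])

def vae_query_ideal(x):
    # Querying for the "ideal" output: pass the mean straight to the
    # decoder, skipping the random draw and the variance output.
    mu, _ = encoder(x)
    return decoder(mu)
```

Unlike `vae_sample`, `vae_query_ideal` is deterministic, so repeated queries of the same input return the same image.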
Because VAEv3 had a higher error than VAEv13 but subjectively appeared to perform better, it is likely that the utilized loss function can be improved upon. VAEv3 utilized the ReLU activation function and appeared to generate sharper objects as opposed to VAEv13, which utilized the Leaky ReLU activation. When trained, the Leaky ReLU activation function tends to fire in the slowly varying negative portion of the activation, likely causing edges to be distorted. Both networks were able to generate objects that looked similar to the ideal images; however, fine details were not able to be recovered. For example, in Figure 33, the strong permittivity shown in the bottom left slice was able to be captured, but the shapes in the center were unable to be recovered. Additionally, the true permittivity values were unable to be recovered. Also, observe Figure 35. The shapes were recovered fairly well; however, the network predicted negative permittivities, which are not physically possible.

Figure 32: A sample output from VAEv3. The dimensions of the x and y axes are in mm. (a) Ideal output. (b) Predicted output. The network was able to capture the sphere in the center, but not the cube shown at the top right of the ideal slices, likely due to the permittivity being close to free space. Note the predicted permittivities are larger than 1.

Figure 33: A sample output from VAEv3. The dimensions of the x and y axes are in mm. (a) Ideal output. (b) Predicted output. The network was able to capture the shape with stronger permittivity shown at the bottom of the ideal slices. The network was not able to capture the shapes with smaller permittivity values. Note the predicted values are greater than 1.

Figure 34: A sample output from VAEv13. The dimensions of the x and y axes are in mm. (a) Ideal output. (b) Predicted output.
The network was able to detect the stronger permittivity features of the cube in the top two ideal slices and the sphere and cube of the bottom ideal slice. Note the permittivities were not correctly predicted.

Figure 35: A sample output from VAEv13. The dimensions of the x and y axes are in mm. (a) Ideal output. (b) Predicted output. The network was able to capture both the sphere and the cube in the ideal image; however, it was not able to display the correct permittivity.

5.2.2 Full Network

After training the VAEs, the top performing VAE architecture was selected (VAEv3). Although VAEv3 was normalized by Method 1 (which makes it difficult to recover the original permittivity values), it achieved one of the lowest error values and subjectively output the best images. Because the decoder of the VAE is simply drawing the output, it should not matter how the network is normalized. A network composed of convolutional and dense layers was placed at the front of a trained VAE decoder. Note that the weights were initialized using Flux's default layer initialization (Xavier initialization) by drawing from a uniform distribution as described in [51]. The combination of these two networks will be referred to as the "full network" for the remainder of this document. The weights corresponding to the VAE decoder are frozen from subsequent training to isolate the input from the output. Isolating the input from the output helps ensure that the network will not overfit the data. The decoder weights ensure the network is capable of drawing the types of objects desired to be imaged. Once the decoder weights are trained and frozen, they cannot learn to draw any other types of objects. Therefore, once the convolutional and dense layers are attached, there is no way for the network to modify the weights that are drawing the imaging volumes, and therefore it is very difficult for the network to overfit to the images it is training on.
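The freeze described above can be sketched as follows. The thesis uses Julia/Flux; this is a hypothetical pure-Python analogue in which only parameters flagged as trainable receive gradient updates, so later training cannot alter what the decoder draws.

```python
class Param:
    """Hypothetical parameter with a trainable flag."""
    def __init__(self, value, trainable=True):
        self.value = value
        self.trainable = trainable

def sgd_step(params, grads, lr=0.1):
    # Update only the trainable parameters; frozen (decoder) weights
    # are skipped entirely by the optimizer.
    for p, g in zip(params, grads):
        if p.trainable:
            p.value -= lr * g

# Front-end CNN/dense weight: trainable. Decoder weight: frozen.
cnn_w = Param(1.0, trainable=True)
dec_w = Param(2.0, trainable=False)

sgd_step([cnn_w, dec_w], grads=[0.5, 0.5])
```

After the step, only the front-end weight has moved; the decoder weight is untouched no matter how many updates are applied.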
The full networks were only trained on the Random Shapes data set using the MAE as the loss function. The Asphalt-Functional, Asphalt-Faulty, and Human Leg data sets were used for validation and testing purposes only. Of the three data sets not used for training, 200 points were randomly selected for validation from each data set and the remaining 120 points from each data set were used for testing. In total, 600 points were used for validation and 360 points were used for testing. Utilizing only the Random Shapes data set for training is highly advantageous for assessing network ability to generalize. If the network can learn how to predict real world scenes from completely random data, then it is highly likely the network has learned the underlying physics and is not overfitting the problem. In total, five variations of full networks were trained and evaluated. From simulation one, it was found that architectures of format "cCDD" process time domain traces well. For this reason, the first three full networks utilized "cCDD" for the CNN half of the full network. That is, each of the first three full networks contained a single convolutional layer, followed by a cross-convolutional layer to pool the results, followed by two dense layers with 100 neurons in each dense layer. The number of kernels was varied as well as the receptive field of each cross-convolutional kernel. Padding was set for the convolutional layers so that the input and output dimensionality were the same by adding zeros to the beginning and end of the time domain traces. A summary of each of the first three architectures is shown in Table 22 and the results are shown in Figures 36–38. Validation error is important because it exemplifies the network's ability to generalize to new data. The network with the lowest validation error was FNv2; however, it appeared that FNv1 was the most stable of the three architectures. For this reason, FNv1 was selected to be refined one final time.
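The "same" zero padding described above (zeros added at both ends of each time-domain trace so the convolution output length matches the input) can be sketched in pure Python; the trace and kernel values are hypothetical, and the kernel is applied correlation-style (no flip), as is conventional in CNNs.

```python
def conv1d_same(trace, kernel):
    """1D convolution with 'same' zero padding: the output has the
    same length as the input trace (odd kernel length assumed)."""
    k = len(kernel)
    pad = (k - 1) // 2
    # Zeros appended to the beginning and end of the time-domain trace.
    padded = [0.0] * pad + list(trace) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(trace))
    ]

# Hypothetical 5-sample trace and 3-tap averaging kernel.
trace = [1.0, 2.0, 3.0, 4.0, 5.0]
out = conv1d_same(trace, [1 / 3] * 3)
```

Because the padding adds (k - 1)/2 zeros on each side, the input and output dimensionality are identical, which is what allows the cross-convolutional pooling layer to be stacked directly on top.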
Note that the lowest error achieved was 11.222; therefore, none of the architectures were able to achieve the goal of a MAE of εr < 1.

Table 22: Summary of the layer attributes and minimum errors for each of the models considered in the first three variations of full networks. The format for the convolutional layers is number of kernels/receptive field dimensions. The format for the dense layers is the number of neurons. Bold entries represent items that were changed from variation to variation. All layers utilized the Leaky ReLU activation function with the exception of the last layers being the ReLU activation.

    Architecture   Decoder   Conv         Cross-Conv   Dense1   Dense2   Min. Training Error   Min. Validation Error
    FNv1           VAEv3     16/(25x51)   1/(16x1)     100      100      10.536                12.864
    FNv2           VAEv3     32/(25x51)   1/(32x1)     100      100      10.803                11.376
    FNv3           VAEv3     64/(25x51)   1/(64x1)     100      100      10.350                12.856

Figure 36: Training and validation error of FNv1.

Figure 37: Training and validation error of FNv2.

Figure 38: Training and validation error of FNv3.

FNv4 and FNv5 were based on architecture FNv1 but with a varying number of layers. A summary of these architectures is shown in Tables 23 and 24 and the results are shown in Figures 39 and 40. Again, note that no architecture was able to achieve the goal of a MAE of εr < 1.

Table 23: Summary of the layer attributes for each of the models considered in the final two variations of full networks. The format for the convolutional layers is number of kernels/receptive field dimensions. The format for the dense layers is the number of neurons. Bold entries represent items that were changed from variation to variation. All layers utilized the Leaky ReLU activation function with the exception of the last layers being the ReLU activation.

    Architecture   Decoder   Conv         Conv         Conv         Cross-Conv   Dense1   Dense2
    FNv4           VAEv3     16/(25x51)   16/(16x51)   —            1/(16x1)     100      100
    FNv5           VAEv3     16/(25x51)   16/(16x51)   16/(16x51)   1/(16x1)     100      100

Table 24: Summary of the minimum error for each of the models considered in the final two variations of full networks.

    Architecture   Decoder Version   Min. Training Error   Min. Validation Error
    FNv4           VAEv3             8.736                 11.222
    FNv5           VAEv3             13.925                19.216

Figure 39: Training and validation error of FNv4.

Figure 40: Training and validation error of FNv5.

5.2.3 Summary of Results

The top performing architecture of simulation two was FNv2; however, FNv1 appeared to have the most stable error curve for generalization. Subjectively, all architectures seemed to perform better or worse for different data sets. Figures 41–45 show some sample outputs taken from the data sets. All points were generated with the weights that resulted in the lowest training error. Figure 42 shows a point taken from the Random Shapes training set, while Figures 41, 43, 44, and 45 show points taken from the Human Leg testing set, the Runway-Functional testing set, and the Runway-Faulty testing set (two points), respectively. The left columns represent the ideal output of the network, the middle columns represent the output of a VAE (VAEv3), and the right columns represent the output of the full network. The middle column should be the best achievable output for the full network because it should represent the best possible way for the decoder to draw the ideal input. Note that none of the VAE outputs were able to resolve finer details such as edges, as will be seen momentarily. Figure 41 shows an example output of FNv3 on the Human Leg testing set. This figure is very interesting because the full network was never trained on this data set; only the VAE was trained on this data set. Vertical lines that should ideally be present can be seen in the lower two images of the full network's prediction.
However, the full network was only trained with cubes and spheres; it was not trained on any cylinders. This means that in the latent space of the VAE, cylindrical objects are situated close enough to other objects that the full network was still able to utilize this section of the VAE latent space. Figure 44 shows FNv3 applied to the Runway-Faulty data set, which is most similar to the Human Leg data set in the center of the images. The network was not able to discern the inclusion in the center. Because the architecture is able to detect the larger cylindrical object but not the smaller spherical inclusion, this suggests the network itself was unable to detect the spherical object, regardless of the VAE's ability to draw it. It should be noted that just because FNv3 was unable to utilize physics to resolve small spherical inclusions, it does not mean it is not possible to resolve small features. Figure 45 shows an example output of FNv4 on the Runway-Faulty data set. Contrary to FNv3, it appears as though FNv4 was able to detect a spherical inclusion at the center of the imaging volume. This figure suggests that it is possible to improve upon the results of simulation two to form a more generalized architecture that can consistently resolve smaller features. Figure 43 shows the output of FNv2 applied to the Runway-Functional data set. In this figure, the full network was able to resolve the edges of the cube, but the permittivity in the center of the cube seems to indicate there is a sphere. This is likely due to bias in the network toward spherical objects. Bias in the network is likely due to a lack of diversity in the training data. This output suggests that the Random Shapes data set had a preference for spherical objects in the center of the imaging volume. This bias can be seen in each of the five sample outputs given. For example, observe Figure 44. The rightmost column shows swirling artifacts in the shapes of spheres.
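The suspected placement bias can be illustrated with a hypothetical sampler: if a sphere's center must keep the whole sphere inside the volume, large spheres can only appear near the center, whereas allowing centers anywhere (with shapes clipped at the boundary) removes that coupling. Everything here is illustrative, not the thesis' data-generation code.

```python
import random

def center_range(volume_size, radius, clip_allowed):
    """Valid x-coordinate range for a sphere center of the given radius
    in a volume spanning [0, volume_size] (hypothetical sampler)."""
    if clip_allowed:
        # Shapes may extend past the boundary: center anywhere.
        return 0.0, volume_size
    # Whole sphere must fit: large radii squeeze centers to the middle.
    return radius, volume_size - radius

random.seed(0)
lo, hi = center_range(volume_size=100.0, radius=45.0, clip_allowed=False)
biased = [random.uniform(lo, hi) for _ in range(1000)]    # confined to [45, 55]
lo, hi = center_range(volume_size=100.0, radius=45.0, clip_allowed=True)
unbiased = [random.uniform(lo, hi) for _ in range(1000)]  # anywhere in [0, 100]
```

The fit-inside constraint couples size to position, so a nominally uniform sampler still concentrates large spheres at the center, which is the bias hypothesized above.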
To improve upon this bias, the Random Shapes data set should be modified such that spheres and cubes can be drawn outside of the imaging volume. By writing spheres only to the interior of the imaging volume, it is likely that larger spheres were always being centered, therefore inadvertently altering the uniform random distribution of the data set to a distribution biased toward the center of the imaging volume. Because the VAE architectures are not able to fully describe the output space, it appears as though the full network compensates to a degree. For example, observe the permittivity color bars in each of the five examples. The best VAE output (middle column) always predicts an unrealistically high permittivity value while the full network always predicts a much lower permittivity value. This means that the full network has been constrained to utilizing sections of the VAE latent space that corresponded to lower and more accurate permittivity values. However, constraining to sections with smaller permittivity values likely means it will not be possible to resolve finer details such as edges. Figure 42 shows an example output of FNv1 on the training set (Random Shapes data set). This figure exemplifies how the full network attempts to predict the correct permittivity. The permittivity values the full network predicts are contained within the permittivity values of the data set (εr = 1–60). The VAE will attempt to capture the edges of the shapes, but in doing so, it sacrifices a realistic permittivity. In this case, it appears as though the full network prioritized the permittivity values as opposed to capturing the finer detail in the imaging volume. Neither simulation one nor simulation two studied the effect of noise on the networks. Noise has a great effect on the performance of any real world electromagnetic system; however, the VAE of the full network adds a sort of noise to the system.
Even though the full network removes the random draw from the decoder, the decoder is still trained to decode a point drawn from a distribution; the full VAE maps input distributions to output distributions rather than input points to output points. For this reason, the full network is technically operating on noisy inputs even though there is no noise present in the inputs. Because the network is operating on a distribution as opposed to a point, it is likely the network will generalize well to unseen points, and this is supported by the results of simulation two. Although the best architecture was only able to achieve an error of 11.222 and was not able to achieve the MAE goal of εr < 1, it has been shown that it is possible to detect features that are much smaller than the wavelength of the excitation or the width of the pulse. The center frequency of the pulse was 2 GHz with a maximum wavelength of 199.9 mm. This is twice the size of every dimension of the entire imaging volume. Similarly, if traditional radar theory is utilized, the range resolution can be computed with equation (7)

    ∆R = c/(2B)    (7)

where ∆R is the range resolution, c is the speed of light in free space (3×10⁸ m/s), and B is the bandwidth of the pulse [61]. Using equation (7) with a bandwidth of 1 GHz leads to a range resolution of ∆R = 150 mm. This is also larger than the entire size of the imaging volume. Therefore, while there is much to be improved, simulation two has shown that it is possible to approximate the inverse scattering problem using the proposed method to potentially exceed existing radar imaging capabilities, with the additional capability of estimating the permittivity of the target simultaneously.

Figure 41: An example output of network FNv3 on the Human Leg data set. The dimensions of the x and y axes are in mm. (a) The ideal output. (b) Best possible VAE prediction. (c) Full Network prediction.
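The wavelength and range-resolution figures quoted above follow directly from the stated parameters; a quick numerical check:

```python
c = 3e8  # speed of light in free space, m/s

# Range resolution of a pulse with bandwidth B, equation (7): dR = c/(2B).
B = 1e9                 # 1 GHz bandwidth
delta_R = c / (2 * B)   # 0.15 m = 150 mm

# Longest wavelength in a 1 GHz band centered at 2 GHz
# (lower band edge at 1.5 GHz). With c = 3e8 this gives 200 mm;
# the thesis' 199.9 mm corresponds to the exact value of c.
f_low = 1.5e9
lambda_max = c / f_low  # 0.2 m = 200 mm
```

Both figures exceed the dimensions of the imaging volume, which is what makes the sub-wavelength detections reported above noteworthy.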
Figure 42: An example output of network FNv1 on the Random Shapes data set. The dimensions of the x and y axes are in mm. (a) The ideal output. (b) Best possible VAE prediction. (c) Full Network prediction.

Figure 43: An example output of network FNv3. The dimensions of the x and y axes are in mm. (a) The ideal output. (b) Best possible VAE prediction. (c) Full Network prediction.

Figure 44: An example output of network FNv3. The dimensions of the x and y axes are in mm. (a) The ideal output. (b) Best possible VAE prediction. (c) Full Network prediction.

Figure 45: An example output of network FNv4. The dimensions of the x and y axes are in mm. (a) The ideal output. (b) Best possible VAE prediction. (c) Full Network prediction.

6 Conclusion

This thesis has presented a methodology that can estimate the solution to an inverse scattering problem, which has relevance in applications such as subterranean and biomedical imaging. Two separate simulations were performed in this work. The first simulation presented a method that can be utilized for 1D imaging down a transmission line. The second simulation expanded upon the first simulation by considering 3D images. For the 1D case considered in simulation one, it was found that Convolutional Neural Networks (CNNs) with cross-convolutional pooling perform well for processing radar-like signals to estimate a permittivity profile down a transmission line. The networks performed subjectively very well, with a minimum log of Mean Absolute Error (MAE) of 0.662, but were unable to achieve the goal of log(MAE) < 0. Simulation one can find application in plane wave imaging, transmission line characterization, and as an alternative transmission line method for determining the unknown impedance/permittivity of a material.
For the 3D case considered in simulation two, it was found that the direct implementation of a CNN architecture on the simulated 3D data sets resulted in overfitting of the data. To improve generalization, a Variational Autoencoder (VAE) was trained to draw the types of objects in the data set. Then, a CNN was trained to output a vector in the latent space the VAE uses to draw outputs. This method greatly improved generalization; however, it resulted in a loss of detail in the generated objects. Although the VAE improved generalization, it is likely that the VAE was limiting the performance of the overall imaging system. In the end, it was found that for a resolution of 1 voxel per mm the proposed method achieved a minimum MAE of 11.222 but was unable to achieve the MAE goal of εr < 1. However, the proposed method has been shown to discern features in the target that are much smaller than the excitation wavelength and much smaller than a traditional radar imaging bin can theoretically differentiate. Additionally, the proposed method simultaneously approximates the permittivity profile of the target and provides more information than ranging information alone. Simulation two can find application in biomedical imaging applications including cancer detection, subterranean imaging, and non-destructive testing. Altogether, this work presented a study of methods which can be used to approximate the solution to an inverse scattering problem, which can potentially exceed existing radar imaging capabilities while simultaneously estimating permittivity. The author hopes that one day this work will lead to the replacement of existing biomedical and subterranean imaging techniques with an inverse scattering technique that will produce easy-to-interpret images, remove the need for hazardous radiation, and be cost effective to the user.

7 Future Work

The proposed methodology can discern small details of targets; however, there is much that can be improved upon.
The Variational Autoencoder (VAE) was not able to completely describe the output space, and this ultimately affected full network performance. Future work could improve upon the VAE by improving the loss function. Developing a custom loss function that enforces edge/feature detection and correct permittivity prediction simultaneously could greatly improve VAE performance and therefore full network performance. Other methods can be utilized when training the full network. The proposed methodology froze the decoder weights completely during the full network training. It is possible that alternating between training the VAE and training the full network could improve performance. In this way, a cost function can be utilized to penalize both the VAE and the full network such that the VAE is tailored to draw only objects that the full network is able to utilize. Alternatively, the existing training can be utilized with the addition of a Generative Adversarial Network (GAN) to penalize the VAE for outputting an object that is unlike any object in the data set. The addition of a GAN could improve the performance of the VAE and therefore the performance of the full network. For the purpose of biomedical imaging, it is highly desirable to trust that the network will not produce large artifacts that are not truly a part of the target. For example, if a person has an unknown foreign object inside of them, a medical professional can trust that an X-ray will not produce artifacts that are not truly in the scene. Currently, the proposed work will make predictions that are not necessarily based on the underlying physics, because no underlying physics of the problem was ever enforced and the network was entirely data driven. Future work can improve the reliability of the proposed method by enforcing physics during training by utilizing Physics Inspired Neural Networks (PINNs).
In this way, the network should not be able to predict scenes that are not physically possible. Simulation of the forward problem takes a significant amount of time. For successful application of the proposed method, a significant amount of data must be simulated to train the network on realistic scenarios. Future work can develop a novel approach that will decrease the time necessary to simulate the forward problem to rapidly generate realistic data. This could be done through the use of a PINN because a typical neural network only takes a few hundred microseconds to generate a prediction. An accurate and fast simulation method should greatly increase the proposed method's performance. Finally, the proposed method can be tested with a working prototype. A working prototype will require the development of an antenna system, the required electronics to drive a signal onto the antenna, and a data acquisition system. Once completely assembled, development for real world applications could begin.

REFERENCES

[1] X. Q. He, Z. Q. Zhu et al., “Review of GPR rebar detection,” in Progress In Electromagnetics Research Symposium Proceedings, 2009, pp. 804–813.
[2] R. M. Murugan, “An improved electrical impedance tomography (EIT) algorithm for the detection and diagnosis of early stages of breast cancer,” Ph.D. dissertation, University of Manitoba, 1999.
[3] R. Fitzgerald, “Phase-sensitive x-ray imaging,” Physics Today, pp. 23–26, July 2000.
[4] T. Dreiseidler, R. A. Mischkowski et al., “Comparison of cone-beam imaging with orthopantomography and computerized tomography for assessment in presurgical implant dentistry,” The International Journal of Oral & Maxillofacial Implants, vol. 24, no. 2, pp. 216–225, 2009.
[5] E. Bresch, Y. C. Kim et al., “Seeing speech: Capturing vocal tract shaping using real-time magnetic resonance imaging,” IEEE Signal Processing Magazine, pp. 123–132, May 2008.
[6] M. Robinson, C. Bristow et al., Ground Penetrating Radar.
British Society for Geomorphology, 2013.
[7] J. C. Portinari, “The one-dimensional inverse scattering problem,” Ph.D. dissertation, Massachusetts Institute of Technology, 1966.
[8] A. Farina, M. Betcke et al., “Multiple-view diffuse optical tomography system based on time-domain compressive measurements,” Optics Letters, vol. 42, no. 14, pp. 2822–2825, 2017.
[9] D. C. Barber and B. H. Brown, “Applied potential tomography,” Journal of Physics E: Scientific Instruments, vol. 17, no. 9, pp. 723–733, 1984.
[10] G. Strobel, “Electrical resistivity tomography in environmental amelioration,” Ph.D. dissertation, University of Manitoba, 1996.
[11] K. H. Ng, “Non-ionizing radiation sources, biological effects, emissions and exposures,” in Proceedings of the International Conference on Non-Ionizing Radiation, 2003, pp. 1–16.
[12] J. Malmivuo and R. Plonsey, Bioelectromagnetism: Principles and Applications of Bioelectric and Biomagnetic Fields. New York: Oxford University Press, 1995.
[13] C. Dang, M. Darnajou et al., “Performance analysis of an electrical impedance tomography sensor with two sets of electrodes of different sizes,” in 9th World Congress on Industrial Process Tomography, 2018.
[14] M. E. Anderson and G. E. Trahey, “A seminar on k-space applied to medical ultrasound,” Department of Biomedical Engineering, Duke University, Apr 2000.
[15] J. A. Scales, Theory of Seismic Imaging. Samizdat Press, 1995.
[16] O. Oncken, G. Asch et al., “Seismic imaging of a convergent continental margin and plateau in the central Andes (Andean Continental Research Project 1996 (ANCORP’96)),” Geophysical Research, vol. 108, 2003.
[17] A. Massa, S. Caorsi et al., “A comparative study of NN and SVM-based electromagnetic inverse scattering approaches to on-line detection of buried objects,” Applied Computational Electromagnetics Society, vol. 18, no. 2, pp. 65–75, Jul 2003.
[18] S. Haykin, Neural Networks and Learning Machines. Pearson, 2009, vol. 3.
[19] C. M.
Bishop, Pattern Recognition and Machine Learning. Springer, 2016.
[20] E. A. Marengo, “Inverse scattering by compressive sensing and signal subspace methods,” 2007 2nd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, 2007.
[21] G. Carleo, I. Cirac et al., “Machine learning and the physical sciences,” Reviews of Modern Physics, vol. 91, no. 4, 2019.
[22] A. V. Oppenheim, A. Willsky, and S. H. Nawab, Signals and Systems, 2nd ed. Prentice Hall, 1997.
[23] K. Fukushima, “Neocognitron for handwritten digit recognition,” Neurocomputing, vol. 51, pp. 161–180, 2003.
[24] S. Lawrence, C. L. Giles et al., “Face recognition: A convolutional neural-network approach,” IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 98–113, 1997.
[25] L. Liu, C. Shen, and A. van den Hengel, “The treasure beneath convolutional layers: Cross-convolutional-layer pooling for image classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[26] ——, “Cross-convolutional-layer pooling for image recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2305–2313, 2016.
[27] W. Shi, J. Caballero et al., “Is the deconvolution layer the same as a convolutional layer?” Computing Research Repository (CoRR), 2016.
[28] M. D. Zeiler, D. Krishnan et al., “Deconvolutional networks,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010, pp. 2528–2535.
[29] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 3rd International Conference for Learning Representations, 2015.
[30] D. Kingma and M. Welling, “Auto-encoding variational bayes,” Computing Research Repository (CoRR), 2014.
[31] L. Theis, W. Shi et al., “Lossy image compression with compressive autoencoders,” Computing Research Repository (CoRR), 2017.
[32] N. M. N. Leite, E. T.
Pereira et al., “Deep convolutional autoencoder for EEG noise filtering,” in 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2018, pp. 2605–2612.
[33] J. Tu, H. Liu et al., “Spatial-temporal data augmentation based on LSTM autoencoder network for skeleton-based human action recognition,” in 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018, pp. 3478–3482.
[34] S. Kullback and R. A. Leibler, “On information and sufficiency,” The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951.
[35] I. Higgins, L. Matthey et al., “beta-VAE: Learning basic visual concepts with a constrained variational framework,” in International Conference on Learning Representations (ICLR), 2017.
[36] M. Tschannen, O. Bachem, and M. Lucic, “Recent advances in autoencoder-based representation learning,” Computing Research Repository (CoRR), 2018.
[37] C. Doersch, “Tutorial on variational autoencoders,” Computing Research Repository (CoRR), 2016.
[38] N. H. Jamali, K. A. H. Ping et al., “Image reconstruction based on combination of inverse scattering technique and total variation regularization method,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 5, no. 3, pp. 569–576, March 2017.
[39] A. C. Fannjiang, “Compressive inverse scattering: I. High-frequency SIMO/MISO and MIMO measurements,” Inverse Problems, 2010.
[40] S. Li, Q. Zhang, and B. C. Ahn, “Wave-amplitude transmission matrix approach to the analysis of the electromagnetic planewave reflection and transmission at multilayer material boundaries,” Department of Radio and Communications Engineering, Graduate School of Electrical and Computer Engineering, Chungbuk National University, 2019.
[41] V. Matko and M. Milanovič, “Sensitivity and accuracy of dielectric measurements of liquids significantly improved by coupled capacitive-dependent quartz crystals,” Multidisciplinary Digital Publishing Institute - Sensors, vol. 21, no. 10, p.
3565, 2021. [42] J. Bezanson, A. Edelman et al., “Julia: A fresh approach to numerical computing,” Society for Industrial and Applied Mathematics Review, vol. 59, no. 1, pp. 65–98, 2017. [43] L. Li, L. G. Wang et al., “DeepNIS: Deep neural network for nonlinear electromagnetic inverse scattering,” IEEE Transactions on Antennas and Propagation, vol. 67, no. 3, pp. 1819–1825, 2018. [44] Y. Chen, Y. Xie et al., “Brain MRI super resolution using 3D deep densely connected neural networks,” in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). IEEE, 2018, pp. 739–742. [45] S. J. Hamilton and A. Hauptmann, “Deep D-bar: Real-time electrical impedance tomography imaging with deep neural networks,” IEEE transactions on medical imaging, vol. 37, no. 10, pp. 2367–2377, 2018. [46] M. S. Kaufmann, A. Klotzsche et al., “Simultaneous multichannel multi-offset ground-penetrating radar measurements for soil characterization,” Vadose Zone Journal, vol. 19, no. 1, 2020. [47] S. Esmaeili, S. Kruse et al., “Resolution of lava tubes with ground penetrating radar: The TubeX project,” Journal of Geophysical Research: Planets, vol. 125, no. 5, 2020. [48] M. S. Kang, N. Kim et al., “3D GPR image-based UcNet for enhancing underground cavity detectability,” Remote Sensing, vol. 11, no. 21, 2019. 106 [49] M. Innes, E. Saba et al., “Fashionable modelling with flux,” Computing Research Repository (CoRR), 2018. [50] M. Innes, “Flux: Elegant machine learning with julia,” Journal of Open Source Software, 2018. [51] X. Glorot, Y. Bengio et al., “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, vol. 9. Proceedings of Machine Learning Research, 13–15 May 2010, pp. 249–256. [52] A. A. S. Sharma, S. Sharma, “Activation functions in neural networks,” International Journal of Engineering Applied Sciences and Technology, vol. 4, no. 12, 2020. [53] J. B. 
Jarvis, M. D. Janezic et al., Transmission/reflection and short-circuit line methods for measuring permittivity and permeability. National Institute of Standards and Technology, 1993. [54] T. Liebig, A. Rennings, and D. Erni, “openEMS–a free and open source equivalent-circuit (EC) FDTD simulation platform supporting cylindrical coordinates suitable for the analysis of traveling wave MRI applications,” International Journal of Numerical Modelling: Electronic Networks, Devices and Fields, vol. 26, no. 6, pp. 680–696, 2013. [55] T. Liebig. openEMS - open electromagnetic field solver. General and Theoretical Electrical Engineering (ATE), University of Duisburg-Essen. [Online]. Available: https://www.openEMS.de [56] E. J. Jaselskis, J. Grigas et al., “Dielectric properties of asphalt pavement,” Journal of materials in civil engineering, vol. 15, no. 5, pp. 427–434, 2003. [57] W. D. Hurt, “Multiterm debye dispersion relations for permittivity of muscle,” IEEE Transactions on Biomedical Engineering, no. 1, pp. 60–64, 1985. [58] B. Amin, M. A. Elahi et al., “An insight into bone dielectric properties variation: A foundation for electromagnetic medical devices,” in 2018 EMF-Med 1st World Conference on Biomedical Applications of Electromagnetic Fields (EMF-Med). IEEE, 2018. [59] W. Huda and R. B. Abrahams, “X-ray-based medical imaging and resolution,” American Journal of Roentgenology, vol. 204, no. 4, pp. W393–W397, 2015. 107 [60] A. Kastanos, “Convolutional VAE in flux,” alecokas.github.io, Sept. 2020. [Online]. Available: http://alecokas.github.io/julia/flux/vae/2020/07/22/ convolutional-vae-in-flux.html [61] B. R. Mahafza, Radar Systems Analysis and Design Using MATLAB Third Edition. Chapman and Hall/Chemical Rubber Company press, 2013. 108 ProQuest Number: 28776120 INFORMATION TO ALL USERS The quality and completeness of this reproduction is dependent on the quality and completeness of the copy made available to ProQuest. Distributed by ProQuest LLC ( 2022 ). 