A Neural Network Approach to 3D Imaging by Approximating the Inverse
Scattering Problem
BY
Jacob Michael Wilson, B.S.E.E.
A thesis submitted to the Graduate School
in partial fulfillment of the requirements
for the degree
Master of Science
Major Subject: Electrical Engineering
New Mexico State University
Las Cruces, New Mexico
October 2021
Jacob Wilson
Candidate
Electrical Engineering
Major
This thesis is approved on behalf of the faculty of New Mexico State University,
and it is acceptable in quality and form for publication.
Approved by the thesis committee:
Dr. Steven Sandoval
Chair Person
Dr. Muhammed Dawood
Committee Member
Dr. Michael DeAntonio
Committee Member
DEDICATION
I would like to dedicate this document to my family and beautiful fiancé.
ACKNOWLEDGMENTS
I would like to thank my family and fiancé for being so supportive of me. They
are always there for me no matter the difficulties I may face. I would also like to
thank my advisor Dr. Sandoval for going above and beyond to help me complete
this document. I could not have done it without his help. Additionally, I would
like to extend my thanks to Nicholas Von Wolff and the NMSU HPC team for
making this work computationally possible. Finally, and most importantly, may
all the glory be to God!
VITA
April 02, 1997    Born in El Paso, Texas, USA
2015-2019         B.S.E.E., New Mexico State University, Las Cruces, New Mexico.
PUBLICATIONS
1. J. M. Wilson, “A Neural Network Approach to 3D Imaging by
Approximating the Inverse Scattering Problem” in preparation.
FIELD OF STUDY
Major Field: Electrical Engineering
Area of Specialty: Electromagnetics
ABSTRACT
A Neural Network Approach to 3D Imaging by Approximating the Inverse
Scattering Problem
BY
Jacob Michael Wilson, B.S.E.E.
MASTER OF SCIENCE
New Mexico State University
Las Cruces, New Mexico, 2021
Dr. Steven P. Sandoval, Chair
The goal of this work is to present a methodology for approximating the solution to an inverse electromagnetic scattering problem using a data-driven neural
network approach. Such a solution can be leveraged to resolve a 3D image without
resorting to the use of ionizing (hazardous) radiation, a limitation of some methods such as X-ray. Additionally, the proposed methodology has the potential to resolve images of greater resolution than the wavelength of the excitation
or the associated radar range bin. In this work, both 1D and 3D imaging problems were considered. The 1D simulation utilized a Convolutional Neural Network
(CNN) with cross-convolutional pooling to predict the impedance distribution of
a transmission line. The 3D simulation predicted the permittivity distribution of
simulated synthetic targets representing functional runway slabs, faulty runway
slabs, simplified human thighs, and randomly oriented cubes and spheres. To
prevent overfitting and ensure appropriate generalization, the proposed method
combines two different neural network architectures which are trained in a multi-stage process. In particular, the decoder of a Variational Autoencoder (VAE) is used to generate the output image, while a CNN estimates the physics related to the physical problem and is used to extract relevant features to infer the contents
of the scene. Applications of this work potentially include biomedical imaging,
cancer detection, subterranean imaging, non-destructive testing, and material permittivity characterization via a free space or transmission line approach.
CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ACRONYMS
1 Introduction
  1.1 Purpose of the Presented Work
  1.2 Overview
2 Background
  2.1 Wave Scattering
    2.1.1 Forward Scattering Problem
    2.1.2 Inverse Scattering Problem
  2.2 Imaging Methods
    2.2.1 Electromagnetic Imaging
    2.2.2 Acoustical Imaging
  2.3 Machine Learning
    2.3.1 Bayesian
    2.3.2 Support Vector Machines
    2.3.3 Neural Networks
    2.3.4 Variational Autoencoder
    2.3.5 Other Algorithms
3 Method Overview
4 Simulation One
  4.1 Data Synthesis
  4.2 A Neural Network Approach to the Problem
    4.2.1 Initialization and Training
    4.2.2 Initial Architecture Evaluation
    4.2.3 Secondary Architecture Evaluation
    4.2.4 Tertiary Architectures
    4.2.5 Final Evaluation and Summary of Results
5 Simulation Two
  5.1 Data Synthesis
    5.1.1 Data Sets
  5.2 Neural Network Design
    5.2.1 Variational Autoencoder
    5.2.2 Full Network
    5.2.3 Summary of Results
6 Conclusion
7 Future Work
REFERENCES
LIST OF TABLES

1  Summary of the synthesized data set of simulation one.
2  Simulation one notation.
3  Initial architectures summary.
4  Initial architecture evaluation converged errors.
5  Secondary architectures summary.
6  Secondary architecture evaluation converged errors.
7  Tertiary architectures summary.
8  Tertiary architecture evaluation converged errors.
9  The log of the expected value of the converged error of the best performing architectures of simulation one.
10 Summary of the parameters used for simulation in openEMS.
11 Summary of the four simulated data sets of simulation two.
12 Simulation two notation.
13 Normalization method summary.
14 Summary of the layer attributes for each of the models considered in the most successful architectures.
15 Trial 1 summary.
16 Trial 2 summary.
17 Trial 3 summary.
18 Trial 4 summary.
19 Trial 5 summary.
20 Trial 6 summary.
21 Trial 7 summary.
22 Summary of the layer attributes for each of the models considered in the first three variations of full networks.
23 Summary of the layer attributes for each of the models considered in the final two variations of full networks.
24 Summary of the minimum error for each of the models considered in the final two variations of full networks.
LIST OF FIGURES

1  Illustration of the forward problem.
2  Illustration of the inverse problem.
3  A simple example that shows it can be difficult to determine the origination of reflected waves.
4  Illustration of a simple neural network consisting of two dense layers.
5  Illustration of a simple convolutional network consisting of two layers.
6  Operation of a simple cross convolutional layer.
7  Two possible implementations of a convolutional transpose operation.
8  An example of an autoencoder.
9  An example of a VAE.
10 Illustration of the forward model simulation set up.
11 Simulation one forward problem Ansys Circuit.
12 Simulation one initial architecture evaluation error for architectures without cross-convolutional layers.
13 Simulation one initial architecture evaluation error for architectures with cross-convolutional layers.
14 Simulation one secondary architecture evaluation error for architectures without cross-convolutional layers.
15 Simulation one secondary architecture evaluation error for architectures with cross-convolutional layers.
16 Simulation one tertiary architecture evaluation error.
17 Simulation one expected value of the best performing architectures error.
18 An example output of the network (‘cCDDv1’).
19 An example output of the network (‘cCDDv1’).
20 An example output of the network (‘cCDDv1’).
21 Possible transmitter and receiver configurations.
22 Chosen transmitter/receiver SIMO configuration.
23 A sample point from the Runway-Functional data set.
24 A sample point from the Runway-Faulty data set.
25 A sample point from the Human Leg data set.
26 A sample point from the random shapes data set.
27 An illustration of a CNN that will be used in conjunction with the decoder of a VAE to create the proposed full network.
28 An illustration of the VAE architecture utilized.
29 An illustration of a full network.
30 An example of a “flat” output from the VAE.
31 Error of the top performing VAE architectures.
32 A sample output from VAEv3.
33 A sample output from VAEv3.
34 A sample output from VAEv13.
35 A sample output from VAEv13.
36 Training and validation error of FNv1.
37 Training and validation error of FNv2.
38 Training and validation error of FNv3.
39 Training and validation error of FNv4.
40 Training and validation error of FNv5.
41 An example output of network FNv3 on the Human Leg data set.
42 An example output of network FNv1 on the Random Shapes data set.
43 An example output of network FNv3.
44 An example output of network FNv3.
45 An example output of network FNv4.
LIST OF ACRONYMS

Acronym  Meaning
1D       One Dimensional
2D       Two Dimensional
3D       Three Dimensional
ADAM     ADaptive Moment Estimation
CNN      Convolutional Neural Network
CSV      Comma Separated Value
CT       Computed Tomography
DOT      Diffuse Optical Tomography
GAN      Generative Adversarial Network
GPR      Ground Penetrating Radar
GPU      Graphics Processing Unit
GUI      Graphical User Interface
HPC      High Performance Computer
IQ       In-phase Quadrature
KL       Kullback–Leibler
MAE      Mean Absolute Error
MIMO     Multiple Input Multiple Output
MISO     Multiple Input Single Output
ML       Machine Learning
MRI      Magnetic Resonance Imaging
NDT      Non-Destructive Testing
OMP      Orthogonal Matching Pursuit
PINN     Physics Inspired Neural Network
PML      Perfectly Matched Layer
ReLU     Rectified Linear Unit
SIMO     Single Input Multiple Output
SISO     Single Input Single Output
SVM      Support Vector Machine
VAE      Variational Autoencoder
1 Introduction
Biomedical and subterranean imaging have found great importance in modern
society. From industrial Non-Destructive Testing (NDT) [1], to locating cancerous
growths within a person [2], these types of imaging have had a profound impact on
our lives. Biomedical and subterranean imaging covers a very broad area of study.
Arguably the most successful biomedical techniques are X-ray [3, 4] and MRI [5].
Both are able to produce high resolution images, but both have drawbacks. X-ray
uses potentially harmful radiation [3] and MRI requires a very strong synthetic
magnetic field [5]. Possibly the most successful subterranean imaging technique is
Ground Penetrating Radar (GPR). GPR is capable of locating changes in media underground, but struggles to produce easy-to-interpret results [6].
While many methods exist to perform both biomedical and subterranean imaging, no method currently exists that can perform imaging regardless of frequency
or excitation. Existing methods leverage aspects of their respective problems to
generate images, although the physics of the methods remain the same. These
are all examples of the inverse scattering problem; if the inverse scattering problem can be solved, or a solution accurately approximated, a general type of imaging can be implemented.
1.1 Purpose of the Presented Work
The goal of this thesis is to provide a methodology that can be utilized for 3D
subterranean and biomedical imaging by approximating a solution to the inverse
scattering problem. The proposed method will approximate the location of objects
as well as their electrical properties.
First, a 1D method will be developed that will approximate the impedance
distribution down a transmission line from a measured reflected pulse. Studying
the 1D case can provide important information to better inform the 3D imaging
problem. Next, the proposed 3D imaging technique will show that it is possible
to produce a system which can potentially improve upon existing radar imaging
systems while simultaneously estimating the permittivity of the target.
1.2 Overview
This thesis is organized as follows. Chapter 2 provides relevant background for
imaging via inverse scattering including a brief description of existing popular
methods. Chapter 3 presents an overview of two experiments that were considered
in this work which are presented in Chapters 4 and 5. In particular, Chapter 4
presents the 1D case termed “simulation one”, while Chapter 5 presents the 3D
case termed “simulation two”. Finally, Chapter 6 presents conclusions for both
experiments and Chapter 7 discusses possible future work.
2 Background
2.1 Wave Scattering
This work will adopt the following terminology. The excitation is the source of energy that creates a wave. It is usually an antenna of some sort. The waves are the
electromagnetic or acoustical waves that are generated due to the excitation. The
waves can be categorized into the incident, reflected, and transmitted waves. The
incident waves are those that are directly created by the excitation. The reflected
waves are created by the collision of the incident wave upon a boundary such that
energy is rejected from the boundary. The transmitted waves are accepted by the
boundary after the collision of an incident wave upon that boundary. Finally, the
system is the object that is being imaged.
2.1.1 Forward Scattering Problem
The forward problem refers to a scenario where the excitation and system are
known and the resulting reflected waves are unknown [7]. This is representative of
many common problems in electromagnetic theory. Figure 1 shows an illustration
of this type of problem.
Figure 1: Illustration of the forward problem. First, a source generates an excitation which
propagates through space and strikes a target resulting in a reflected wave. In this scenario,
the excitation signal and the target are known, while the reflected wave is unknown. Thus, the
forward problem is that of determining the unknown reflected wave.
2.1.2 Inverse Scattering Problem
The inverse scattering problem is opposite to the forward problem, in that the
excitation and resulting fields are known (possibly from measurements), but we
do not know what system caused these resulting fields [7]. This type of problem is
illustrated in Figure 2. Theoretically, the problem is underdetermined and there are
many systems that can result in the same measurements. Intuitively, this can be
understood using a simple example. In Figure 3, we can see that for an isotropic
source, objects that are equidistant from the transmitter and receiver will result
in the same measured reflections.
Figure 2: Illustration of the inverse problem. First, a source generates a wave which propagates
through space and strikes a target resulting in a reflected wave. In this scenario, the excitation
and reflected waves are known, while the target is unknown. Thus, the inverse problem is that
of determining the unknown system.
Figure 3: Illustration of a scenario with an isotropic radiator emitting a pulse to determine
the unknown location of a target. This scenario shows a simple example from which it can be
difficult to determine the origination of reflected waves.
2.2 Imaging Methods
Imaging via inverse scattering is the generation of an image or scene by reconstructing an unknown system from reflected wave measurements. The measurements can either be taken from mechanical waves (e.g. vibrations/sound) or electromagnetic waves. Both methods of imaging will be briefly reviewed in this
document.
Many methods of electromagnetic imaging are currently in use and each has its
own method of approximating a solution to the inverse scattering problem. Some
examples include X-ray/Computed Tomography (CT) [3, 4], Magnetic Resonance
Imaging (MRI) [5], Diffuse Optical Tomography (DOT) [8], and impedance tomography [9] for biomedical applications and Ground Penetrating Radar (GPR) [1]
and impedance tomography [10] for Non-Destructive Testing (NDT) and subterranean imaging.
2.2.1 Electromagnetic Imaging
Electromagnetic imaging methods are presented first. Electromagnetic imaging
refers to the imaging of a scene using an electromagnetic wave to sample the
target. This section will present biomedical electromagnetic imaging methods
before presenting a subterranean imaging method.
X-Ray Imaging X-ray imaging is widely in use today and it is likely that the reader has had some form of X-ray imaging performed on them. X-ray imaging
works by exciting the scene using a high frequency electromagnetic wave and
measuring the resulting “shadow”. Due to the high frequency of the excitation
wave, energy attenuates quickly in certain objects as compared to others [3].
X-ray imaging exploits this high attenuation characteristic of the excitation to
avoid solving the inverse scattering problem. The result of a plain X-ray image is therefore two-dimensional. However, image processing techniques can be used in conjunction with taking multiple images over a 180° arc around a scene to produce high-resolution 3D images. These 3D images are known as CT [4].
While X-ray imaging is highly effective for acquiring high resolution images,
it has drawbacks. For example, X-ray imaging uses electromagnetic waves at ionizing frequencies to obtain images. Unfortunately, ionizing radiation is energy
that produces charged particles as it passes through an object [11] and can cause
radiation damage to biological tissue [3].
Magnetic Resonance Imaging MRI uses a different approach to imaging than
X-ray. It works by first subjecting a scene to a strong magnetic field which aligns
the spin of protons within the scene being imaged. An electromagnetic pulse is
then used to excite the scene. The electromagnetic pulse causes a response that
can be measured and interpreted into an image [5]. This method is capable of
producing high resolution 3D images without the use of ionizing radiation. At
the time of writing this work, a typical cost for an MRI machine is approximately
$100k-3M, which is significantly more expensive than an X-ray machine which
costs approximately $40k-200k.
Diffuse Optical Tomography DOT describes the use of an optical laser to
illuminate an object and measure the reflected or transmitted photons to form an
image. The excitations utilized are electromagnetic waves at optical frequencies.
Consequently, the high frequencies associated with optical waves prevent deep
penetration [8] which limits the depth of imaging.
Impedance Tomography Impedance tomography describes the use of an electromagnetic wave with the intention of determining the impedance distribution
of the unknown system. There are many methods that are used to determine
an impedance distribution. A common method used to solve for the unknown
impedance distribution is to solve some approximation of Laplace’s equation
∇²Φ = 0    (1)
where Φ is the electric potential [12]. Unlike MRI or CT, impedance tomography
has the ability to determine the conductivity of biological material. Impedance
tomography has the additional advantage that it can estimate the actual properties of the tissue which it is imaging and not only where a change in medium
occurs. However, it also has the drawback of poor spatial resolution [13].
Ground Penetrating Radar GPR is a technique used to map subterranean
objects. It works by injecting an electromagnetic pulse into the ground and measuring the reflected response. GPR can be used to map objects underground, but
it takes specialized training to interpret the images. Often there is no shielding, so energy radiates in all directions. When the returned signal is measured,
the response may not have originated from directly below the imaging system [6].
Although GPR can produce subterranean images, without proper training these
images can often be misinterpreted.
2.2.2 Acoustical Imaging
Next, acoustical imaging methods will be presented. Acoustical imaging is very
similar to electromagnetic imaging. The type of excitation is the main difference
between the two methods. Electromagnetic waves do not require a physical medium as mechanical waves do; thus, there are some minor differences. Below, several
types of acoustical imaging are discussed. Similar to the previous electromagnetic
imaging section, a biomedical imaging method will be presented first followed by
a subterranean method.
Sonography Sonography is the use of a high frequency sound pulse to image a
scene. The method utilizes an array of sensors to transmit and receive acoustical
pulses, then calculate the delay between the transmitted sound and echo to determine where objects are located. This method can be used to create two- or three-dimensional images. Sonography has the specific advantage that it can image a scene without the use of any ionizing radiation; however, it cannot be used
to image certain objects such as the lungs. This is due to the large acoustical
mismatch between the lungs and the surrounding tissue [14].
Seismic Imaging Seismic imaging bears remarkable similarity to GPR. The
main difference is the change in excitation. Seismic imaging uses a mechanical wave, generally applied using an explosive, which is measured with some sort of transducer, usually a geophone. Measurements taken along a grid are
recorded after the explosions are introduced and are used to produce images [15].
Seismic imaging produces images similar to those of GPR, but is capable of imaging at greater depths than GPR [16].
2.3 Machine Learning
Machine learning (ML) is a data-driven [17] approach to solving a problem. For supervised learning, input examples are fed into a model and the model parameters are recursively modified in such a way that outputs are driven to match some desired response [18, 19]. Machine learning approaches potentially hold great promise for estimating the solution to inverse scattering problems because they can estimate a solution in cases where no analytical solution has been found. Next,
a few common types of machine learning will be briefly reviewed.
2.3.1 Bayesian
Bayesian methods have been used to approximate the solution to inverse scattering
problems [20]. Bayesian machine learning seeks to maximize the likelihood of a
certain outcome. It involves using some information that is known a priori (or known beforehand) to calculate which output is most likely to occur given a
certain input. This method has the ability to solve problems with few training
points because it forces the algorithm to be biased toward particular outputs.
2.3.2 Support Vector Machines
Support Vector Machines (SVMs) work by first projecting the data into a higher-dimensional data space. Projecting to a higher-dimensional space potentially allows
a non-linearly separable data set in the input space to be transformed into a
linearly separable data set in the higher dimensional space [17].
2.3.3 Neural Networks
Neural networks were utilized throughout this work. For this reason, more detail will be given on neural networks than on other machine learning methods.
Neural networks work by adjusting weighted connections between each neuron
in a network of neurons. Typically neural networks are trained using a backpropagation algorithm to minimize a cost function [17]. One simple type of neural
network is composed entirely of layers consisting of fully-connected feed-forward
neurons and is known as a dense network. Dense networks contain layers that are
often referred to as dense layers and are illustrated in Figure 4. In the figure,
each neuron is represented by a circular node, each line segment between two
nodes represents a weighted connection, the square nodes represent inputs to the
neural network, and line segments ending in arrowheads represent the output
of the network. For each neuron, the output is formed by summing the result
of all weighted connections entering that node then passing that result through
a nonlinear activation function. Thus, a single neuron can be modeled by the
equation
y = σ(wᵀx + b)    (2)
where y is the output of the neuron, σ(·) is a nonlinear function, x is the input
vector, w is the weight vector, and b is a bias [17].
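To make equation (2) concrete, the following minimal sketch implements a single neuron in Julia, the language used throughout this work; the input size and the choice of a sigmoid activation are illustrative assumptions.

```julia
# Minimal sketch of the single-neuron model y = σ(wᵀx + b).
σ(z) = 1 / (1 + exp(-z))   # logistic (sigmoid) activation

x = randn(4)               # input vector (size chosen arbitrarily)
w = randn(4)               # weight vector
b = randn()                # bias

y = σ(w' * x + b)          # scalar output of one neuron
```

A dense layer simply stacks many such neurons, with one weight vector and one bias per neuron.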
While the use of dense layers in a neural network gives the system significant flexibility, it also complicates training of the network by introducing a large
number of degrees of freedom [21]. One way to reduce the number of degrees of
freedom is to utilize convolutional layers as shown in Figure 5. Like dense layers,
convolutional layers are also feed-forward. However, unlike dense layers, convolutional layers are not fully connected. Rather, each neuron in a convolutional layer is connected to a small set of inputs termed the receptive field, and weights are shared among neurons.

Figure 4: Illustration of a simple neural network consisting of two dense layers. Note that each neuron is fully connected to the previous layer.
One way to interpret the shared connections in a convolutional layer is to imagine that a neuron “slides” its receptive field over the input in a way analogous to a
convolution operation [22]. Convolutional Neural Networks (CNNs) significantly
reduce the number of degrees of freedom resulting in improved convergence and
training time. While reducing the number of degrees of freedom results in some
loss of flexibility in the network, the overall number of neurons may be adjusted
to tailor the network for a particular amount of model complexity [23, 24].
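To illustrate the reduction in degrees of freedom, the hedged sketch below uses the Flux package (introduced formally in Chapter 4) to count the trainable parameters of a dense layer and of a convolutional layer applied to the same input; all sizes are arbitrary choices for the example.

```julia
using Flux

# A length-1000, single-channel input with batch size 1 (width, channel, batch).
x = randn(Float32, 1000, 1, 1)

dense = Dense(1000 => 1000, sigmoid)   # fully connected layer
conv  = Conv((51,), 1 => 3, sigmoid)   # 3 kernels with a 1×51 receptive field

sum(length, Flux.params(dense))        # 1_001_000 trainable parameters
sum(length, Flux.params(conv))         # 156 trainable parameters (51·3 + 3)
```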
Figure 5: Illustration of a simple convolutional network where different colors represent different
receptive fields. Note that each neuron is only connected to a subset of the neurons in the
previous layer.
One variation of a CNN architecture uses cross-convolutional layers [25,26]. A
cross-convolutional layer connects neurons which have a common receptive field.
Thus, instead of simply flattening the output of a CNN layer, the cross-convolution
layer helps to preserve spatial information while at the same time reducing dimensionality. This allows for the use of more complex structures while minimizing
computational costs. Figure 6 illustrates how a simple cross convolution layer
works.
Figure 6: Operation of a simple cross convolutional layer. Cross convolutional layers help to preserve spatial information that can be lost during a flattening process [25, 26].

Similar to convolutional layers are convolutional transpose layers. Convolutional transpose layers also slide an activation function over an input; however, the goal of convolutional transpose layers is to increase the output dimensionality as opposed to decreasing it or keeping it the same [27]. Convolutional transpose layers can be trained to invert a convolutional operation [28]. An example of how a convolutional transpose layer can work is shown in Figure 7.
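As a hedged Flux sketch (the kernel size, stride, and padding are arbitrary choices), a convolutional transpose layer can restore the width that a matching convolution removed:

```julia
using Flux

x = randn(Float32, 100, 1, 1)   # (width, channels, batch)

down = Conv((4,), 1 => 1; stride=2, pad=1)           # 100 → 50 samples
up   = ConvTranspose((4,), 1 => 1; stride=2, pad=1)  # 50 → 100 samples

size(down(x))      # (50, 1, 1)
size(up(down(x)))  # (100, 1, 1)
```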
Figure 7: Two possible implementations of a convolutional transpose operation. Purple shading
represents filter receptive fields. Grey shading represents zero padding. (a) Weights can be
oriented such that a filter kernel is only a single pixel but the resulting convolution operation
is many pixels. (b) Padding can be placed between and around actual image pixels such that a
normal convolutional operation will result in an increase in dimensionality.
Neural Network Training Training of neural networks refers to any algorithm
that is used to obtain weights so that the network's response is tuned to a particular data set. The convergence of training algorithms is strongly dependent on both the initial network parameters and the algorithm which is used to optimize those
parameters. The most common method of training is likely the backpropagation
algorithm in which the weights are moved in a direction negative to the gradient
of the cost function [18, 19].
In this work, an advanced form of training called the ADaptive Moment Estimation (ADAM) algorithm is utilized for network training [29]. ADAM works by
adaptively changing the step size (learning rate) for each weight in the network.
The algorithm is known to have robust default parameters that need little tuning
and pseudocode for the algorithm is given in Algorithm 1.
Algorithm 1 ADAM optimizer
1: Require: α: Stepsize
2: Require: β₁, β₂ ∈ [0, 1): Exponential decay rates for the moment estimates
3: Require: f(θ): Stochastic objective function with parameters θ
4: Require: θ₀: Initial parameter vector
5: m₀ ← 0 (Initialize 1st moment vector)
6: v₀ ← 0 (Initialize 2nd moment vector)
7: t ← 0 (Initialize timestep)
8: while θₜ not converged do
9:   t ← t + 1
10:  gₜ ← ∇θ fₜ(θₜ₋₁) (Get gradients w.r.t. stochastic objective at timestep t)
11:  mₜ ← β₁ · mₜ₋₁ + (1 − β₁) · gₜ (Update biased first moment estimate)
12:  vₜ ← β₂ · vₜ₋₁ + (1 − β₂) · gₜ² (Update biased second raw moment estimate)
13:  m̂ₜ ← mₜ/(1 − β₁ᵗ) (Compute bias-corrected first moment estimate)
14:  v̂ₜ ← vₜ/(1 − β₂ᵗ) (Compute bias-corrected second moment estimate)
15:  θₜ ← θₜ₋₁ − α · m̂ₜ/(√v̂ₜ + ε) (Update parameters)
16: end while
17: return θₜ (Resulting parameters)
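For concreteness, a minimal Julia transcription of Algorithm 1 follows; it is not the Flux implementation used later in this work, and the quadratic test objective is an arbitrary choice.

```julia
# Minimal ADAM loop following Algorithm 1 (illustrative, not Flux's version).
function adam(grad, θ; α=0.001, β1=0.9, β2=0.999, ϵ=1e-8, steps=1000)
    m, v = zero(θ), zero(θ)              # 1st and 2nd moment estimates
    for t in 1:steps
        g = grad(θ)                      # gradient at the current parameters
        m = β1 .* m .+ (1 - β1) .* g     # biased first moment estimate
        v = β2 .* v .+ (1 - β2) .* g.^2  # biased second raw moment estimate
        m̂ = m ./ (1 - β1^t)              # bias-corrected first moment
        v̂ = v ./ (1 - β2^t)              # bias-corrected second moment
        θ = θ .- α .* m̂ ./ (sqrt.(v̂) .+ ϵ)
    end
    return θ
end

# Example: minimize f(θ) = ‖θ‖², whose gradient is 2θ.
adam(θ -> 2θ, [1.0, -2.0])               # steps toward the minimizer [0, 0]
```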
Neural Network Cost Function Selection of an appropriate cost function is necessary for good convergence of the network. For this work, the cost function selected for all networks except the variational autoencoder was the Mean Absolute Error (MAE)

MAE = (1/N) ∑_{i=1}^{N} |x_i − y_i|    (3)

where N is the total number of data points in the data set, i is the current point, x_i is the output of the network for point i, and y_i is the desired output of the network for point i. For purposes of this work, the MAE was empirically found to be superior to the mean squared error loss function.
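A direct Julia transcription of equation (3), equivalent in effect to Flux's built-in mae loss, might look like:

```julia
# Mean absolute error between network outputs x and desired outputs y.
mae(x, y) = sum(abs.(x .- y)) / length(y)

mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])   # (0.5 + 0.0 + 1.0) / 3 = 0.5
```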
2.3.4 Variational Autoencoder
A Variational Autoencoder (VAE) is a special type of neural network that is
capable of generating data. Once trained, a VAE can be used to draw data
points that are similar to but not exactly like training points within a data set.
Data generation has many applications such as generating new data to train other
neural networks [30].
VAEs are part of a class of neural networks called autoencoders. An autoencoder is a type of network that is trained to replicate its input by first compressing
the input into a smaller latent space then rebuilding the input. By compressing
and reconstructing the data, the network is first learning to encode an input into a
low dimensional space. Then, it learns to decode the compressed version back into
an estimate of the original input. This method of compressing and reconstructing
data is useful for data compression, noise removal, and data generation [31–33].
By removing the encoder and directly giving a vector to the decoder, new data
Figure 8: An example of an autoencoder. The input data is encoded by neural network layers inside of the encoder into a latent space vector. The latent space vector is then decoded by the decoder to reconstruct the input. Once trained, the decoder can be separated from the encoder and used to generate new data when given a latent space vector.
points can be generated. Note that the encoder and decoder can be composed of
many different types of layers including convolutional layers. An example of an
autoencoder is shown in Figure 8.
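As a hedged Flux sketch (all layer sizes are arbitrary and are not the architectures evaluated later), an autoencoder is simply an encoder chained to a decoder and trained to reproduce its own input:

```julia
using Flux

encoder = Chain(Dense(784 => 64, relu), Dense(64 => 8))          # compress
decoder = Chain(Dense(8 => 64, relu), Dense(64 => 784, sigmoid)) # rebuild
autoencoder = Chain(encoder, decoder)

loss(x) = Flux.Losses.mse(autoencoder(x), x)  # reconstruct the input itself

# After training, the decoder alone generates data from a latent vector.
sample = decoder(randn(Float32, 8))
```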
VAEs improve upon basic autoencoders by enforcing additional structure in
the latent space such that points that are closely related in the input space will
be closely related in the latent space. By adding this structure to the latent
space, data generation is greatly simplified. This simplification occurs because
particular types of data can be generated by first using the encoder to find where
in the latent space certain types of data reside. Then, slightly different latent
space vectors can be input into the decoder to generate similar but not exactly
the same data points [30].
The latent space structure of VAEs is enforced by applying a random draw
after the encoder and by utilizing a special type of loss function. By applying a
random draw after the encoder, the network will no longer learn a mapping from
input to output, but it will learn a mapping from an input distribution to an
output distribution. By learning a distribution as opposed to a single point, the
structure of the latent space is improved.
The loss function penalizes reconstruction error just like a normal neural network or autoencoder. However, VAEs additionally add a term that penalizes latent space vectors that are not drawn from a Gaussian distribution. The additional penalty term is known as the KL divergence [34]. KL divergence is used to estimate the difference between two distributions. The typical VAE loss function utilized is

L = R − βK    (4)

where L is the total loss, R is the reconstruction loss (often the binary cross entropy), β is a weighting parameter, and K is the KL divergence. The introduction of the β parameter allows more emphasis to be applied to the reconstruction loss or to the latent space structure and, when applied, the network is known as a disentangled or β-VAE [35]. The binary cross entropy is given by

R = −y log(ŷ + ε) − (1 − y) log(1 − ŷ + ε)    (5)

where y is the input, ŷ is the output, and ε is added for numerical stability. Additionally, Figure 9 illustrates a sample VAE architecture.
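A minimal Julia sketch of equations (4) and (5) follows, assuming the encoder outputs a latent mean μ and log-variance logσ², for which the KL term between N(μ, σ²) and N(0, 1) has the standard closed form; ε and β are illustrative values, and the code adds the KL penalty to the reconstruction term, which corresponds to equation (4) with K carrying a negative sign.

```julia
const ε = 1f-7   # small constant for numerical stability, as in equation (5)

# Binary cross entropy of equation (5), summed and averaged over elements.
bce(ŷ, y) = sum(@. -y * log(ŷ + ε) - (1 - y) * log(1 - ŷ + ε)) / length(y)

# Closed-form KL divergence between N(μ, σ²) and the standard normal N(0, 1).
kl(μ, logσ²) = -0.5f0 * sum(@. 1 + logσ² - μ^2 - exp(logσ²))

# Total β-VAE loss: reconstruction term plus weighted latent-space penalty.
vae_loss(ŷ, y, μ, logσ²; β=1f0) = bce(ŷ, y) + β * kl(μ, logσ²)
```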
A full explanation of a VAE is beyond the scope of this text. For a great introduction to VAEs the reader is referred to [36] and [37]. For the original VAE, the reader is referred to [30] and finally, for a disentangled VAE, the reader is referred to [35].

Figure 9: An example of a VAE. The encoder encodes the input into a normal distribution with mean µ and standard deviation σ. A value is then drawn from that distribution, and the randomly drawn value is then decoded to reconstruct the input.
2.3.5 Other Algorithms
It is worth noting a few other computational methods that may not be considered machine learning but have been used to solve the inverse scattering problem.
In [38], the researchers used a method known as conjugate gradient to solve a cost
function related to the physics behind the problem. In [39], the authors used a
method known as basis pursuit to solve this problem given sparsity constraints
and a desire to minimize the amount of sensing elements. Additionally, a method
known as Orthogonal Matching Pursuit (OMP) has also been used to solve a similar problem with similar constraints. Although these algorithms are not directly
machine learning algorithms, it is worth noting that there are other computational
approaches to solving inverse scattering. Each works by setting up a cost function
that is related to the underlying physics. The cost function does not need to (and
likely will not) have an available analytical solution.
3 Method Overview
In this work a method of imaging will be proposed which differs from the previously discussed imaging methods in that it not only images the scene, but also obtains electrical properties of objects in the scene. This is similar in nature to
impedance tomography, however, the proposed methodology is based on the use
of radiated waves instead of electrostatic electrodes. The proposed methodology
is also very similar to a conventional radar system in that it illuminates a target with an electromagnetic wave, however, the proposed method also estimates
electrical properties of objects in the scene. This additional information allows
the internal composition of the object to be estimated, not simply the relative
position of the objects’ boundary surfaces. Moreover, this additional information
can aid in estimating the location and properties of objects which are typically
hidden/masked when only surface reflections are considered.
Although previous attempts have been made to design an imaging system similar to the proposed method of imaging [2, 21], these attempts have fallen short
most likely because the problem is ill-posed and underdetermined [2, 21].
In this work, only an ideal environment is considered, and no part of the
reflected wave is considered clutter. The entirety of the target is desired to be
recovered. Specifically, it is desired to recover the profile of electrical properties
(impedance and permittivity) of the target such that material boundaries and
object features can be easily discerned.
In the following chapters two problems are considered. First, only a representative one-dimensional version of the problem at hand is examined. This
provides a simplified foundation on which to later approach the more challenging
three-dimensional problem. To this end, a transmission line analysis is utilized
to represent the one-dimensional scattering problem. This allows a complicated
electromagnetic problem to be represented by a simple electrical circuit [40]. The
second simulation will seek to take the results from the first simulation and generalize the results to three dimensions. Each simulation and the respective results
will now be presented independently.
4 Simulation One
In this chapter, a method is formulated for one-dimensional imaging assuming an
ideal lossless transmission line. The approach taken in this work is to design a
neural network to estimate the characteristic impedance along the transmission
line from an observed reflection. As will be seen, the development of such a neural
network architecture will aid in generalization to the 3D case considered in simulation two.
For simulation purposes, the system of interest will be modeled as several lossless
transmission lines in series where each line has a different length and characteristic
impedance. The lines are modeled in series to simulate plane waves incident upon
planar boundaries [40]. This is illustrated in Figure 10.
The overall goal is to completely determine the unknown characteristic impedance
distribution along the length of the transmission line using measured reflections.
The author set a goal of achieving a Mean Absolute Error (MAE) of impedance prediction of ±1 Ω, roughly corresponding to the permittivity measurement accuracy currently achievable through transmission line measurement techniques [41]. The log of the errors will be reported; therefore, the goal is to achieve an error of log 1 = 0.
The remainder of this chapter is organized as follows. First, the methodology
for generating the training and testing data set is presented. Second, several
potential architectures are presented, and their performance evaluated on the
data set.
Figure 10: Illustration of the forward model simulation set up. The source will output a rectangular sinusoidal pulse. The source has an input impedance of Z_source. The darkened lines represent each lossless transmission line. The first line will always be matched to the source, but every other line will have a slight mismatch. There is an arbitrary number of lossless transmission lines. The last line will have a high attenuation before connecting to an open stub to represent no further energy return after the last line.
4.1 Data Synthesis
To utilize machine learning methods on this problem, training and testing data
must first be acquired or simulated. Simulated data for the forward problem was
obtained using Ansys Electromagnetics Desktop - Circuit Designer 2018.2. The
forward model utilized assumed an ideal transmission line consisting of 5 elements
in series, terminated with an open end and is shown in Figure 11.
Figure 11: Set up in Ansys Circuit. It consists of a port and 5 transmission lines connected in
series and terminated in an open stub. Variables were created for the length and characteristic
impedance of each of the lines to vary in a simulation.
In order to simulate reflections from a transmission line, a pulse was injected
into one end of the transmission line. The pulse was generated using an In-phase and Quadrature (IQ) modulated source. The IQ waveform was generated using
the Julia programming language [42] and was imported into Ansys using a text
file. The IQ waveform consisted of a pulsed sine wave 1 ns in duration centered
at a frequency of 2 GHz with a peak voltage of 170 V sampled every ∼4.2 ps
(fs ≈ 238 GHz). All simulations had an overall duration of 250 ns.
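The generation script itself is not reproduced here; a minimal Julia sketch consistent with the stated parameters (1 ns pulse, 2 GHz carrier, 170 V peak, fs ≈ 238 GHz) might look like the following, where the output file name is an illustrative choice.

```julia
using DelimitedFiles

fs = 238e9                  # sampling rate ≈ 238 GHz (Δt ≈ 4.2 ps)
f0 = 2e9                    # carrier frequency, 2 GHz
Tp = 1e-9                   # pulse duration, 1 ns
A  = 170.0                  # peak voltage, 170 V

t = 0:1/fs:Tp               # time axis over the pulse
pulse = A .* sin.(2π .* f0 .* t)   # rectangular-windowed sine pulse

# Export as a text file for import into Ansys (file name is illustrative).
writedlm("iq_pulse.txt", [t pulse], ',')
```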
Ansys Circuit Designer has the capability of “sweeping” over all possible combinations of a discretized set of circuit parameters to optimize a circuit. The range
of swept values is shown in Table 1. This functionality was leveraged to produce
several different unique observation points.
The “range” variable in Table 1 represents the range of physical lengths of
the line considered in units of degrees (360° corresponds to one wavelength).
Similarly, the “count” variable represents the number of quantization levels for
each parameter of each segment of the transmission line. For example, the first
transmission line segment (TL1) is simulated using three different lengths between 1800° and 2700°, and each of the three lengths is simulated using all other possible
combinations of the other sweep variables.
A total of 2916 simulated reflections were generated and consisted of approximately 4 GB of data. The number of simulated reflections was constrained to be
below 3,000 due to computational limitations imposed by Ansys Circuit Designer.
Although the simulation consists of five impedance line segments, there exists
the distinct possibility that two adjacent segments have the same impedance. In
this case, the simulation effectively consists of fewer impedance line segments.
Thus, the simulation configuration described above actually simulates cases with
up to five impedance line segments.
Table 1: Summary of the synthesized data set. The counts were varied to keep the total number of samples below 3,000 due to machine limitations. The total number of samples was 2916. The lengths given are electrical lengths, where 360° corresponds to one wavelength. (TL - Transmission Line)

Sweep          Range (degrees)             Count
TL1 Length     1800°-2700° (1.5-2.25 m)    3
TL2 Length     1800°-2700° (1.5-2.25 m)    3
TL3 Length     1800°-2700° (1.5-2.25 m)    3
TL4 Length     1800°-2700° (1.5-2.25 m)    3
TL5 Length     10,000° (8.33 m)            1
TL1 Impedance  377 Ω                       1
TL2 Impedance  45 Ω-377 Ω                  3
TL3 Impedance  45 Ω-377 Ω                  2
TL4 Impedance  45 Ω-377 Ω                  2
TL5 Impedance  45 Ω-377 Ω                  3

The first impedance line segment stays fixed at 377 Ω (the approximate impedance
of free space) to ensure the line is always matched to the source. The length of
the final impedance line was chosen to ensure no reflections from the open stub at
the end. Specifically, a high attenuation value of 10 V/m was applied to the end
of the line to simulate the appropriate boundary condition (all other lines were
lossless). The impedance values were chosen to roughly approximate impedances
which may be encountered in an application such as GPR.
As a result of Ansys Circuit Designer’s use of variable step solvers, additional
measures were taken to condition the time-sampling of the simulations. In particular, the above configuration was first generated using Ansys Circuit Desiner’s
GUI, then a circuit netlist was generated and modified with the following commands.
.option tran.accurate_points.num = 59415
.option tran.accurate_points.start = 0
.option tran.accurate_points.stop = 2.5e-7
.option tran.trtol = 1
While using a netlist to condition the time-sampling of the simulations allows
the generation of data sets with approximately uniform sampling, there is still
a slight variation, or jitter, to the sampling. However, it was found that the
variation was small enough to not significantly impact results. Finally, the data
sets were exported from Ansys Circuit Designer in a Comma Separated Value
(CSV) format.
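As a hedged sketch of the ingestion step (the file name and the assumption of a header row with time/voltage columns are illustrative only), the exported waveforms can be read back into Julia with the standard DelimitedFiles module:

```julia
using DelimitedFiles

# Read one exported simulation; the file name and column layout below
# are illustrative assumptions, not the actual export format.
data = readdlm("reflection_0001.csv", ',', Float64; skipstart=1)

t = data[:, 1]   # time samples (approximately uniform; see jitter note above)
v = data[:, 2]   # reflected voltage observation
```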
4.2 A Neural Network Approach to the Problem
Neural networks have been shown to improve the performance of inverse scattering systems and medical imaging applications [43–45]. The researchers in [43]
successfully designed a neural network (termed DeepNIS) capable of learning general inverse scattering problems with a finite amount of transmitters and receivers
surrounding an object of interest. While DeepNIS has good performance in these
operating conditions, the imaging system requires the object of interest be surrounded by transmitters and receivers. In many cases such as GPR [46–48], it is
not practical to place transmitters around an object of interest. In these cases, it is
desirable to image a target with sensors placed only on one side of the target. The
goal of simulation one is to design a compact neural network capable of imaging a
1-dimensional scene. In later experiments, more complex 3D scattering problems are considered, without imposing the restrictive assumption that sensors must be placed entirely around a target of interest.
In what follows, various forms of neural network architectures are chosen and
evaluated for performance on estimating the characteristic impedances in the data
synthesized in the previous section. After the initial performance has been evaluated, each architecture is refined with the intention of improving performance.
4.2.1 Initialization and Training
To evaluate the performance of the architectures, the data was divided into a training set consisting of 85% of the data and a testing set consisting of 15% of the data.
All simulations were performed in the Julia [42] programming language using
the “Flux” [49, 50] machine learning package. New Mexico State University’s high
performance cluster termed “Discovery” was used for all network training. The
weights were initialized using Flux’s default layer initialization (Xavier initialization) by drawing from a uniform distribution as described in [51]. All dense layers
utilized the Leaky ReLU activation function while all convolutional and cross-convolutional layers utilized the sigmoid activation function [52]. The network
weights were obtained using the ADAM algorithm [29]. All default Adam parameters were used with the exception of the learning rate. Starting from a value of
0.01, the learning rate was decreased by a factor of 10 if there was no improvement after 5 epochs and the learning rate was still larger than 1e-6. Training was
terminated after 10 epochs without improvement. The MAE given in equation
(3) was used as the cost function to train the networks. As a side note, it was
found experimentally that using the MAE resulted in very low final error values
while using the Mean Squared Error (MSE) resulted in the error always staying
very large. This may be caused by the squaring of strong errors at discontinuities
causing the network to overcorrect and diverge.
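A hedged sketch of this training protocol using the classic Flux API of the era follows; the placeholder model and synthetic data are illustrative, and only the schedule parameters (0.01 starting rate, factor-of-10 drops, 1e-6 floor, patience of 5 and 10 epochs) come from the description above.

```julia
using Flux

function fit!(model, train_data, val_x, val_y; epochs=1000)
    loss(x, y) = Flux.Losses.mae(model(x), y)   # cost function of equation (3)
    opt = ADAM(0.01)                            # starting learning rate 0.01
    best, stall = Inf, 0
    for epoch in 1:epochs
        Flux.train!(loss, Flux.params(model), train_data, opt)
        err = loss(val_x, val_y)
        if err < best
            best, stall = err, 0
        else
            stall += 1
            # Drop the learning rate by 10× after 5 epochs without
            # improvement, provided it is still above the 1e-6 floor.
            stall == 5 && opt.eta > 1e-6 && (opt.eta /= 10)
            stall >= 10 && break                # early stopping
        end
    end
    return best
end

# Illustrative placeholder model and data; the real architectures and the
# 85/15 data split are described in the surrounding text.
model = Dense(100 => 900)
train_data = [(randn(Float32, 100), randn(Float32, 900)) for _ in 1:16]
fit!(model, train_data, randn(Float32, 100), randn(Float32, 900))
```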
4.2.2 Initial Architecture Evaluation
Simulation one began by choosing twelve compact architectures for evaluation.
Recall, the goal of the simulations was to achieve an error of log(MAE) = 0.
Ideally, all possible network configurations would be simulated. However, due
to computational limitations, this is not feasible. By starting with 12 unique
networks, an effective starting point can be established and refined.
All architectures considered take on one of two forms. The first form is a typical
Convolutional Neural Network (CNN) where the first layer(s) are convolutional
layers with flattened outputs fed into one or more dense layers. The second form
of network is the same as a the first form except a cross-convolutional layer is used
to pool the data as opposed to a raw flatten. Because the data is very temporally
dependent (an observation of 1 V at 0 ns means something very different from an
observation of 1 V at 100 ns), it was hypothesized that the cross-convolutional
layers would preserve temporal information that can be lost by flattening.
For convenience, the notation summarized in Table 2 is used for describing the
networks considered. Specifically, a lowercase ‘c’ denotes a single convolutional
layer, a capital ‘C’ denotes a cross-convolutional layer, and a capital ‘D’ denotes
a fully connected dense layer.
Table 2: The abbreviations/notation used to describe neural network architectures throughout this chapter.

Notation        Meaning
Lowercase ‘c’   Convolutional Layer
Uppercase ‘C’   Cross-Convolutional Layer
Uppercase ‘D’   Dense Layer
vX              Version X
Each of the networks considered contained different numbers of convolutional
and dense layers. The minimum number of convolutional layers was chosen to be
one and the maximum was chosen to be three. Similarly, the minimum number
of dense layers was chosen to be one and the maximum was chosen to be two. All
convolutional layers utilized sigmoid activation functions and all dense layers utilized Leaky ReLU activation functions. Finally, all convolutional layers consisted
of three filter kernels with a receptive field of size 1×51. All architectures were
designed to take a one-dimensional voltage observation as an input and output a
prediction for the impedance along the length of the transmission line. The output
of the network was discretized to consist of 900 samples along a 9 m transmission
line resulting in a spatial sampling rate of 0.01 meters per sample. Because both
network types end in dense layers, the number of output samples is equal to the
number of neurons meaning all networks contain 900 neurons in the final dense
layer. A complete summary of the architectures considered is provided in Table 3.
Note that architectures will be refined, therefore these preliminary architectures
will be denoted version 1 or ‘v1’.
Table 3: Summary of the layer attributes for each of the models considered in the initial architecture evaluation. LRU signifies the Leaky ReLU activation function.

               Convolution                     Cross-Convolution        Dense
Architecture   Channels  Rec. Field  Act.      Rec. Field  Act.         Num. Neurons  Act.
cDv1           3         1×51        Sigmoid   -           -            900           LRU
ccDv1          3         1×51        Sigmoid   -           -            900           LRU
cccDv1         3         1×51        Sigmoid   -           -            900           LRU
cDDv1          3         1×51        Sigmoid   -           -            2000→900      LRU
ccDDv1         3         1×51        Sigmoid   -           -            2000→900      LRU
cccDDv1        3         1×51        Sigmoid   -           -            2000→900      LRU
cCDv1          3         1×51        Sigmoid   1×1×3       Sigmoid      900           LRU
ccCDv1         3         1×51        Sigmoid   1×1×3       Sigmoid      900           LRU
cccCDv1        3         1×51        Sigmoid   1×1×3       Sigmoid      900           LRU
cCDDv1         3         1×51        Sigmoid   1×1×3       Sigmoid      2000→900      LRU
ccCDDv1        3         1×51        Sigmoid   1×1×3       Sigmoid      2000→900      LRU
cccCDDv1       3         1×51        Sigmoid   1×1×3       Sigmoid      2000→900      LRU
The results from the initial simulation are shown in Figures 12 and 13. Although the best architecture from the initial architectures achieved an error of
0.528, it was unable to achieve the goal of log(MAE) = 0. The results were divided into CNN and CNN with cross-convolutional pooling for clarity. Curves that
do not extend to 1000 epochs were terminated because convergence had plateaued.
The sharper dips (e.g. in Figure 13 the dip at 120 epochs for ‘cCDDv1’) occur
when the learning rate was decreased.
Figure 12 shows the results achieved using the basic CNN architectures. Of the
six architectures tested, three performed particularly well (‘cD’,‘ccD’ and ‘cccD’).
These three architectures contained only one dense layer. The three architectures
that performed poorly were ‘cDD’, ‘ccDD’, and ’cccDD’. All the poorly performing
basic CNN architectures contained two dense layers. Given this information, it
appears as though adding a second dense layer to the basic CNN architecture
degrades performance. This may be because a second dense layer is causing the
network to be too flexible and therefore difficult to train.
Figure 13 shows the results achieved using the CNN architecture that was
pooled using a cross-convolutional layer. The two architectures that performed
the best were ‘cCDD’ and ‘cCD’. The third best performer was ‘ccCDD’. The final
three poor performers were ‘ccCD’, ‘cccCD’, and ‘cccCDD’.
The top two performing architectures were ‘cCDD’ and ‘ccD’. The ‘best’ architectures were determined by looking at the lowest error value achieved. These two
architectures were further refined in an attempt to lower the error. Table 4 gives
the converged error of each architecture. From this table, the two top performing
architectures ‘ccD’ and ‘cCDD’ can easily be identified.
Figure 12: The log of the mean absolute error for the models considered in the initial architecture
evaluation. This figure only shows the results for CNN architectures without cross-convolution
layers. The sharp decreases in error generally correspond to the changes (decreases) in the
learning rate. The best performing architecture is marked with a ‘*’. The versions are denoted
v* where * represents the version number.
Figure 13: The log of the mean absolute error for the models considered in the initial architecture
evaluation. This figure only shows the results for CNN architectures with cross-convolution
layers. The sharp decreases in error generally correspond to the changes (decreases) in the
learning rate. The best performing architecture is marked with a ‘*’. The versions are denoted
v* where * represents the version number.
Table 4: The log of the converged error for each of the models considered in the initial architecture evaluation. The two entries marked with an asterisk are the top performing architectures.

Architecture   Minimum Error
cDv1           0.83
ccDv1          0.754 *
cccDv1         0.992
cDDv1          2.035
ccDDv1         2.035
cccDDv1        2.035
cCDv1          0.87
ccCDv1         2.035
cccCDv1        2.035
cCDDv1         0.528 *
ccCDDv1        1.503
cccCDDv1       2.035
4.2.3 Secondary Architecture Evaluation
In an attempt to get closer to the goal of log(MAE) = 0, several variations of
the two top performing architectures (‘cCDDv1’ and ‘ccDv1’) from the initial
architecture evaluation were considered for refinement. More specifically, the performance change is evaluated as the number of channels and the filter sizes are
varied. The number of channels was varied from two to five and the receptive
field was varied from 1×25 to 1×101. A summary of the layer attributes for each
of the models considered in the secondary architecture evaluation is provided in
Table 5. The results of the secondary architecture evaluation are given in Figures
14 and 15 and Table 6.
Table 5: Summary of the layer attributes for each of the models considered in the secondary architecture evaluation. The changes from the initial architecture evaluation are the number of channels and the receptive field sizes. LRU signifies the Leaky ReLU activation function.

    Architecture   Convolution (Channels / Rec. Field / Activation)   Cross-Convolution (Rec. Field / Activation)   Dense 1 (Neurons / Activation)   Dense 2 (Neurons / Activation)
    ccDv2          4 / 1×51  / Sigmoid                                -                                             900 / LRU                        -
    ccDv3          5 / 1×51  / Sigmoid                                -                                             900 / LRU                        -
    ccDv4          2 / 1×51  / Sigmoid                                -                                             900 / LRU                        -
    ccDv5          3 / 1×25  / Sigmoid                                -                                             900 / LRU                        -
    ccDv6          3 / 1×101 / Sigmoid                                -                                             900 / LRU                        -
    cCDDv2         4 / 1×51  / Sigmoid                                1×1×4 / Sigmoid                               2000→900 / LRU                   900 / LRU
    cCDDv3         5 / 1×51  / Sigmoid                                1×1×5 / Sigmoid                               2000→900 / LRU                   900 / LRU
    cCDDv4         2 / 1×51  / Sigmoid                                1×1×2 / Sigmoid                               2000→900 / LRU                   900 / LRU
    cCDDv5         3 / 1×25  / Sigmoid                                1×1×3 / Sigmoid                               2000→900 / LRU                   900 / LRU
    cCDDv6         3 / 1×101 / Sigmoid                                1×1×3 / Sigmoid                               2000→900 / LRU                   900 / LRU
Figure 14 shows the performance of the models considered in the secondary architecture evaluation without cross-convolutional layers. Generally speaking, these five architectures performed as would be expected from variations of the 'ccD' architecture. It is interesting to note that the two smallest architectures ('ccDv4' and 'ccDv5') achieved the lowest error in this set of evaluations. This supports the idea that small architectures well matched to the problem can perform well despite their size. Compared to the models with cross-convolutional layers, the models without cross-convolutional layers generally performed worse.
Figure 15 shows the performance of the models considered in the secondary architecture evaluation with cross-convolutional layers. Again, the two smallest architectures ('cCDDv4' and 'cCDDv5') achieved the lowest error in this set of evaluations, outperforming all models without cross-convolutional layers. Given that the smaller architectures performed better both with and without cross-convolutional layers, one more evaluation is performed in the next section where even smaller architectures are evaluated. As a final note, however, the models in the secondary architecture evaluation did not result in better performance than the models in the initial architecture evaluation. The best architecture was able to achieve an error of 0.645 and therefore did not meet the goal of log(MAE) = 0.
Figure 14: The log of the mean absolute error for the first 300 epochs of the secondary architecture evaluation. This figure only shows the results for CNN architectures without cross-convolution layers. The sharp decreases in error generally correspond to the changes (decreases) in the learning rate.
Table 6: The log of the converged error for each of the models considered in the secondary architecture evaluation. The two top performing architectures are cCDDv4 and cCDDv5.

    Architecture    Minimum Error
    ccDv2           0.787
    ccDv3           0.924
    ccDv4           0.729
    ccDv5           0.733
    ccDv6           0.936
    cCDDv2          0.888
    cCDDv3          0.734
    cCDDv4          0.573
    cCDDv5          0.645
    cCDDv6          0.768
Figure 15: The log of the mean absolute error for the first 300 epochs of the secondary architecture evaluation. This figure only shows the results for CNN architectures with cross-convolution layers. The sharp decreases in error generally correspond to the changes (decreases) in the learning rate. The best performing architectures are marked with a '*'.
4.2.4 Tertiary Architectures
Although the secondary architectures were unable to improve upon the initial architecture performance, the secondary evaluations generally showed that smaller architectures performed better. Because of this, one final refinement was considered in which the two best performing architectures ('ccDv1' and 'cCDDv1') were given a kernel size of 25 and only two channels. This results in the smallest architectures of simulation one. A summary of the layer attributes for each of the models considered in the tertiary architecture evaluation is provided in Table 7. The results of the tertiary architecture evaluation are given in Figure 16 and Table 8. Although the best architecture only achieved an error of 0.681 and was not able to meet the goal of log(MAE) = 0, the model with cross-convolutional layers outperformed the model without them. The results of the tertiary evaluation were consistent with the first two evaluations, but did not improve upon the initial architecture evaluation.
Table 7: Summary of the layer attributes for each of the models considered in the tertiary architecture evaluation. The changes from the initial architecture evaluation are the kernel size (1×25) and the number of channels (two). LRU signifies the Leaky ReLU activation function.

    Architecture   Convolution (Channels / Rec. Field / Activation)   Cross-Convolution (Rec. Field / Activation)   Dense 1 (Neurons / Activation)   Dense 2 (Neurons / Activation)
    ccDv7          2 / 1×25 / Sigmoid                                 -                                             900 / LRU                        -
    cCDDv7         2 / 1×25 / Sigmoid                                 1×1×2 / Sigmoid                               2000→900 / LRU                   900 / LRU
Figure 16: The log of the mean absolute error for the first 300 epochs of the tertiary architecture
evaluation. The sharp decreases in error generally correspond to the changes (decreases) in the
learning rate. The best performing architectures are marked with a ‘*’.
Table 8: The log of the converged error for each of the models considered in the tertiary architecture evaluation. The top performing architecture is cCDDv7.

    Architecture    Minimum Error
    ccDv7           1.129
    cCDDv7          0.681
4.2.5 Final Evaluation and Summary of Results
The top three performing architectures of simulation one were 'cCDDv1', 'cCDDv4', and 'cCDDv5', with 'cCDDv1' performing the best of the three. Thus far, only one trial for each architecture was simulated and reported. To decrease the stochastic effects of initialization on the findings, five additional simulations of each architecture with random initialization were performed and the results averaged. Averaging lends more confidence about network convergence, and therefore about the architecture design, when the networks are applied in simulation two. The results of this final architecture evaluation are given in Figure 17.
Thus far, all errors reported correspond to the training error. Next, the testing
error is reported to demonstrate how well the models generalize to new data. The
average testing and training errors for the final architecture evaluation are shown
in Table 9. Overall, the architecture ‘cCDDv1’ has the best performance and
appears to generalize better than the other architectures considered. Moreover,
the results of these simulations support the hypothesis that cross-convolutional
layers aid in performance.
In order to give a different demonstration of the model performance, inputs
were randomly selected and given to the network for prediction. The outputs were
then visually compared to the ideal outputs. Figures 18 - 20 show the outputs
of ‘cCDDv1’ for three randomly selected inputs taken from the testing set. As
can be seen in the figures, the network output is fairly consistent with the desired
output. However, the exact impedance values for some predictions have some
error. For example, in Figure 20(c) the output impedance after ∼5 m should be
50 Ω but appears to be slightly less than 50 Ω. There also appear to be some false edges; for example, in Figure 19(c) there is an incorrect edge at ∼2 m with an impedance of ∼200 Ω. In some cases a ringing may be observed, which may be an overcompensation for an upcoming sharp discontinuity (for example, see Figure 20(c) at about 4.5 m). Finally, some edges were detected either early or late; for example, see Figure 18(c) at ∼2 m.
Simulation one produced several neural network architectures that can estimate the impedance profile down a transmission line given a measurement of the reflections from an emitted pulse. Subjectively, the networks appeared to produce good impedance profiles; however, no architecture was able to meet the goal of log(MAE) = 0.
Two aspects in particular could have been modified to potentially improve performance. First, the use of Ansys imposed restrictions on the type of scenarios that could be simulated, because Ansys was designed to optimize circuits, not to generate machine learning data sets. Future research could make use of custom code to compute the forward problem; custom code could improve memory management to more easily allow large data sets to be simulated. Second, to decrease the stochastic effects of random initialization, each architecture could be evaluated many more times, as was done in the final evaluation.
This work can find application in estimating the permittivity of unknown materials, similar to the conventional transmission line method described in [53]. Additionally, this method can be directly implemented in 3D if the incident wave is assumed to be a plane wave incident upon flat, parallel media [40]. In summary, simulation one has provided information that will aid in the design of neural networks for the 3D inverse scattering problems in simulation two.
Figure 17: The expected value of the log of the mean absolute error for the first 300 epochs of
the top three performing architectures of simulation one. The expected value is computed with
N = 5. The best performing architecture is marked with a ‘*’.
Table 9: The log of the expected value of the converged error of the best performing architectures of simulation one. The expected value is computed with N = 5. The top performing architecture is cCDDv1.

    Architecture    Average Training Error    Average Testing Error
    cCDDv1          0.662                     1.145
    cCDDv4          1.411                     1.569
    cCDDv5          0.962                     1.189
Figure 18: An example of the network ('cCDDv1') output for a randomly selected input taken from the testing set. (a) The measured signal. (b) The ideal output. (c) The output of the network.
Figure 19: An example of the network ('cCDDv1') output for a randomly selected input taken from the testing set. (a) The measured signal. (b) The ideal output. (c) The output of the network.
Figure 20: An example of the network ('cCDDv1') output for a randomly selected input taken from the testing set. (a) The measured signal. (b) The ideal output. (c) The output of the network.
5 Simulation Two
In this chapter, a neural network will be designed that extends the 1D imaging
of the previous chapter to 3D imaging. First, the approach used to synthesize a
3D data set will be presented. Training and refinement of the neural network will
then follow.
5.1 Data Synthesis
To properly train the neural network, a 3D data set was required. This data set was obtained through simulation with the free and open source software openEMS [54, 55], which provides a flexible Matlab interface that scaled naturally to New Mexico State University's High Performance Computer (HPC) "Discovery".
In simulation one, the goal was to determine the impedance profile of multiple transmission lines. In simulation two, the goal is to determine the permittivity profile as opposed to the impedance. Specifically, the goal is to predict the relative permittivity εr(x, y, z). The relative permittivity εr(x, y, z) is related to the total permittivity ε(x, y, z) through

    ε(x, y, z) = εr(x, y, z) ε0                                  (6)

where ε0 is the permittivity of free space and x, y, and z specify a location in 3D space [56]. The author has chosen to attempt to achieve a Mean Absolute Error (MAE) of εr < 1. Given the range of permittivity values that can occur in real world applications (εr ≈ 1–60) [56–58], a MAE of εr < 1 should give reasonable differentiation between objects and remain fairly close to the true permittivity of the object.
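As a concrete statement of this goal, the following Julia sketch (with hypothetical variable names) shows the metric; a prediction meets the target when the voxel-wise MAE of the relative permittivity falls below 1:

    # Voxel-wise mean absolute error between a predicted and a true εr volume.
    using Statistics

    mae(pred, truth) = mean(abs.(pred .- truth))

    er_true = fill(6.0, 101, 101, 101)                 # e.g. a uniform asphalt-like cube
    er_pred = er_true .+ 0.3 .* randn(101, 101, 101)   # a hypothetical prediction
    @show mae(er_pred, er_true) < 1                    # true: meets the εr < 1 goal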
In 3D, there are many locations where the transmitters and receivers can be placed, and consequently multiple transmitters and receivers can now be utilized simultaneously, as opposed to only a single pair in the 1D case. Specifically, there are four configurations that could be utilized: Single Input Single Output (SISO), Single Input Multiple Output (SIMO), Multiple Input Single Output (MISO), and Multiple Input Multiple Output (MIMO). An example of each configuration
is shown in Figure 21. The configuration chosen in simulation two is SIMO. This configuration has several benefits. Only a single source is required, which simplifies the simulation significantly by avoiding the timing issues introduced by multiple sources. If the SISO configuration were used, only one source would be required; however, this source would have to move to scan the object in order to gather more information about the target. Computationally, moving the source would require multiple simulations for a single target, which is expensive. The chosen SIMO configuration allows a single simulation to be performed per target, and the use of multiple sensors allows more information about the reflected waves to be obtained from a single pulse and a single simulation.
The chosen SIMO configuration utilized 25 sensors placed on a 5×5 planar grid with 1 wavelength of separation between sensors. Note that simulation two considers most distances in terms of wavelengths. Unless otherwise specified, the wavelength refers to the maximum wavelength present in the simulation, 0.1999 m, which corresponds to the minimum cutoff frequency of 1.5 GHz. The chosen configuration is shown in Figure 22. A planar grid was chosen for simplicity; however, this will cause ambiguity if there are targets equidistant behind the sensors (see Figure 3). For the purposes of simulation two, no targets were located behind the grid, and therefore any ambiguity was eliminated.

Figure 21: Possible transmitter and receiver configurations. In 1D, if constrained to place transmitters and receivers only on one side of the target, there is only one location to place the transmitter/receiver pair. Shown here is only 2D; however, it illustrates how many configurations are possible as well as the types of configurations.
All simulations were run with the following parameters. Also note that the neural network (to be described in the next section) will only predict a small cubic volume inside of the simulated data set, which is termed the imaging volume. The maximum resolution in all simulations was the minimum wavelength divided by 20, approximately 6 mm. The sensor grid was placed 5.5 wavelengths away from the imaging volume with 1 wavelength between sensors. The source was placed at the center of the sensor grid, five wavelengths away from the imaging volume, to avoid numerical instability at the sensors.¹ The receivers were simulated by adding field dumps that straddled single grid points. The source was generated in openEMS by exciting a line of 3 mesh points with y polarization and a strength of 100 V/m. This excitation numerically simulated a line source. The excitation was a Gaussian pulse centered at 2 GHz with 20 dB cutoff frequencies of 1.5 GHz and 2.5 GHz. The simulation boundary utilized 12 Perfectly Matched Layers (PMLs) to absorb all radiation leaving the simulation. The simulation boundary was kept at least 2 wavelengths away from all objects in the simulation to ensure no objects directly interacted with the PMLs. These parameters are summarized in Table 10.

¹ Numerical instability was observed near the source that is believed to be caused by the Finite Difference Time Domain method near sharp discontinuities.
Figure 22: Chosen transmitter/receiver SIMO configuration. The 25 sensors are placed on a square 5×5 planar grid with 1 wavelength separation between receivers. The transmitter is offset from the receivers by half a wavelength to avoid a numerical error consistent with FDTD computational methods near sharp discontinuities. The target is 5.5 wavelengths away from the receivers and 5 wavelengths away from the transmitter.
Table 10: Summary of the parameters used for simulation in openEMS. *The maximum wavelength is termed wavelength throughout this chapter unless otherwise specified.

    Simulation Parameter                         Value
    Excitation Type                              Gaussian Pulse
    Excitation Strength                          100 V/m (Vertical Polarization)
    Center Frequency                             2 GHz
    Min/Max Cutoff Frequency                     1.5 GHz / 2.5 GHz
    Max/Min Wavelength*                          199.9 mm / 119.9 mm
    Max Spatial Resolution                       6 mm
    Dist. Sensor Grid to Imaging Volume          5.5 wavelengths (1.0992 m)
    Dist. Source to Imaging Volume               5 wavelengths (0.9993 m)
    Sensor Grid Type/Configuration               Planar Grid (5 × 5)
    Sensor Grid Sensor Spacing                   1 wavelength
    Simulation Boundary Layers                   12 Perfectly Matched Layers
    Simulation Boundary Min Dist. to Objects     2 wavelengths
    Timestep                                     5.5 ps
    Simulation Length (Samples/Time)             3501 samples / 19.2555 ns
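The derived quantities in Table 10 follow directly from the cutoff frequencies; a quick numerical check (a sketch using the free-space speed of light):

    c0   = 299_792_458.0          # speed of light in free space [m/s]
    fmin = 1.5e9                  # minimum 20 dB cutoff frequency [Hz]
    fmax = 2.5e9                  # maximum 20 dB cutoff frequency [Hz]

    λmax = c0 / fmin              # ≈ 0.1999 m, the "wavelength" of this chapter
    λmin = c0 / fmax              # ≈ 0.1199 m
    λmin / 20                     # ≈ 6 mm maximum spatial resolution
    5.5 * λmax                    # ≈ 1.0992 m, sensor grid to imaging volume
    5.0 * λmax                    # ≈ 0.9993 m, source to imaging volume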
5.1.1 Data Sets
To demonstrate a simplified real world application, four different data sets were
simulated. The first three represent simplified real world scenarios and the fourth
is composed of random objects to aid in algorithm generalization. Note that the
imaging volume is only a small cube of the entire simulation. However, the data
sets were designed to show the most detail inside of the imaging volume to best
demonstrate functionality.
The first data set consisted of asphalt cubes of different sizes. This data set will be termed the Runway-Functional data set, as it is representative of a slab of runway without cracks or inclusions. The permittivities were chosen to be similar to that of asphalt [56]. The authors of [56] found the true permittivity of asphalt to be slightly lossy; however, for simplicity only the lossless permittivity values were utilized. These permittivity values had an εr value between 4 and 8. It is assumed that the cubes are floating in free space and all energy that scatters off of a cube will continue to infinity. In total, 320 data points were simulated with the box size ranging from 25 mm to 100 mm and with εr values ranging from 4 to 8. A sample data point from this data set is shown in Figure 23.

Figure 23: Slices down the yz, xz, and xy planes of a sample point from the Runway-Functional data set. The dimensions of the x and y axes are in mm.
The second data set is composed of cubes of asphalt, but unlike the first data set, the cubes have spherical inclusions of air. This data set will be termed the Runway-Faulty data set. It is intended to be compared with the Runway-Functional data set and to demonstrate the ability to detect cracks within an asphalt runway. The cubes were kept at a constant size of 100 mm while the size of the sphere was varied between 12.5 mm and 25 mm. The permittivity of the asphalt was varied between 4 and 8, as before. In total, 320 data points were simulated. A sample data point from this data set is shown in Figure 24.

Figure 24: Slices down the yz, xz, and xy planes of a sample point from the Runway-Faulty data set. The dimensions of the x and y axes are in mm.
The third data set is intended to simulate a simplified biomedical imaging application. Two concentric cylinders of the approximate size and permittivity of a human thigh were simulated. This data set will be termed the Human Leg data set. The leg was assumed to float in free space, as before. The outer cylinder had the approximate permittivity of muscle and the inner cylinder had the approximate permittivity of bone [57, 58]. True human tissue is lossy and anisotropic; however, this is neglected for simplicity. The outer cylinder radius was held constant at 75 mm and the length of both cylinders was held constant at 350 mm. The inner cylinder radius was varied from 20 to 30 mm. The permittivities of the muscle and bone were varied from 40 to 60 and from 15 to 40, respectively. The distance from the source to the start of the thigh is 1.07 m (5.37 wavelengths). Note that the imaging volume is smaller than the total size of the leg; therefore only the inner cylinder (bone) will be visible in the imaging volume. In total, 320 points were simulated. A sample data point from this data set is shown in Figure 25.
The fourth data set was created to improve algorithm generalization. It consisted of cubes and spheres of random size, location, and permittivity placed into the space. This data set shall be termed the Random Shapes data set. The number of spheres and the number of cubes inside of the simulation space were varied from 1 to 4 and from 1 to 5, respectively. The edge lengths of the cubes were randomly drawn from a uniform distribution between 25 and 100 mm. The radii of the spheres were randomly drawn from a uniform distribution between 12.5 and 50 mm. Finally, the permittivity values for the cubes and the spheres were randomly chosen from a uniform distribution between 1 and 60. The permittivity values were chosen to match the permittivity values of the first three data sets, and the sizes were chosen to keep all shapes inside of a 101 mm × 101 mm × 101 mm imaging volume. It is important to note that as each cube and sphere is added, its permittivity overwrites whatever object was written before it. Because of this overwriting process, much more complex structures were simulated
as edges were overwritten to form different, more complex shapes. In total, 960 points were simulated. A sample data point from this data set is shown in Figure 26. A summary of each data set is shown in Table 11.

Figure 25: Slices down the yz, xz, and xy planes of a sample point from the Human Leg data set. The dimensions of the x and y axes are in mm. The images shown are of the imaging volume, which is inside of the human leg. Consequently, the boundary of the outer cylinder, which is present during synthesis, will not be visible during imaging due to the size of the human leg compared to the size of the imaging volume.

Figure 26: Slices down the yz, xz, and xy planes of a sample point from the Random Shapes data set. The dimensions of the x and y axes are in mm. The sizes of the cubes and spheres were chosen to fit inside of the imaging volume, and the permittivities were chosen to correspond to the other three data sets. Note that more complex structures are formed through the combination of all structures, such as the orange triangular shape on the lower left side of the xz plane image. This data set was generated to aid in algorithm generalization.
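The overwriting behavior of the Random Shapes data set is easy to express directly. The following Julia sketch (a hypothetical helper, not the author's generation code, which produced the geometry through openEMS primitives; the cube-then-sphere ordering is an assumption) rasterizes one Random Shapes imaging volume at 1 voxel per mm:

    # Sketch of Random Shapes volume generation: cubes and spheres are drawn in
    # sequence and each new object's permittivity overwrites whatever lies beneath.
    function random_shapes_volume(; n = 101)
        vol = ones(Float64, n, n, n)                 # free space: εr = 1 everywhere
        for _ in 1:rand(1:5)                         # 1–5 cubes, edge 25–100 mm
            s  = rand(25:100)
            εr = 1 + 59 * rand()                     # uniform on [1, 60)
            o  = [rand(1:n - s + 1) for _ in 1:3]    # keep the cube inside the volume
            vol[o[1]:o[1]+s-1, o[2]:o[2]+s-1, o[3]:o[3]+s-1] .= εr
        end
        for _ in 1:rand(1:4)                         # 1–4 spheres, radius 12.5–50 mm
            r  = 12.5 + 37.5 * rand()
            εr = 1 + 59 * rand()
            cx, cy, cz = (r + (n - 2r) * rand() for _ in 1:3)
            for k in 1:n, j in 1:n, i in 1:n
                (i - cx)^2 + (j - cy)^2 + (k - cz)^2 <= r^2 && (vol[i, j, k] = εr)
            end
        end
        return vol
    end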
Table 11: Summary of the four simulated data sets. The Random Shapes data set was primarily used for training. The Runway-Functional, Runway-Faulty, and Human Leg data sets were designed to represent simplified real world scenes and were primarily used to test trained networks.

    Data Set             Number of Points    Description
    Runway-Functional    320                 A solid cube of asphalt
    Runway-Faulty        320                 An asphalt cube with a spherical inclusion of air in the center
    Human Leg            320                 Two concentric cylinders
    Random Shapes        960                 Cubes and spheres randomly placed inside the imaging volume
5.2 Neural Network Design
Due to the design of the simulated data sets, the input of the network is constrained to the 25 electric field traces. The method chosen to represent the output permittivity profile is a 3D cube of size 101 mm × 101 mm × 101 mm, resulting in 1 voxel per mm³. Note that X-ray is currently able to achieve a resolution of approximately 100 µm [59]. Increasing the resolution will either shrink the field of view of the imaging volume or increase the computational complexity of the problem. The chosen size of the imaging volume allows slightly larger objects to be simulated with manageable computational complexity. The proposed method can easily be modified to increase the resolution by decreasing the size of the imaging volume while keeping the same dimensions. Increasing the resolution is not studied in this work and is left for future work.
As previously mentioned, the imaging volume does not cover the entire extent of the 3D simulations from the previous section. However, the data sets were designed to show the most detail inside of the cube. The imaging volumes were generated alongside the openEMS simulations by specifying the objects drawn (e.g., cubes, spheres, cylinders) and their respective locations. Then, 3D arrays were generated where each point was assigned the appropriate permittivity at the appropriate distance from the sensor grid (approximately 5.5 wavelengths away).
Once the data sets were simulated, the neural networks could be trained. Recall that the goal of simulation two is to predict the unknown target's permittivity profile from the 25 measured time domain electric field traces. Initial attempts to directly transfer the result of simulation one to simulation two were unsuccessful because this approach overfit the data and generalized poorly. For this reason, and due to the large number of neurons required by that approach, a different direction was taken. A custom neural network was designed that consists of a Convolutional Neural Network (CNN), used to learn the physics, followed by the decoder of a Variational Autoencoder (VAE), used to draw the predicted scene.
More specifically, the neural network was broken into two parts: 1) a normal CNN for the first layers, followed by 2) the decoder of a pre-trained variational autoencoder to generate the imaging volume prediction. The variational autoencoder was first trained to generate the types of imaging volumes that are present in the data sets.
A variational autoencoder is used as opposed to a traditional autoencoder to
add structure to the latent space. The encoder takes an imaging volume as an
input and compresses that volume to a low dimensional vector. The decoder
then takes this low dimensional representation and maps it to an imaging volume
that is representative of the data set. Once trained, the decoder is removed
and is never trained again. The decoder is then attached to the output of the
convolutional network so that the input of the decoder is now the output of the
convolutional network. Together, the convolutional network is responsible for learning the physics and the decoder is responsible for generating the imaging volume. Note that after the decoder is removed from the variational autoencoder, the random draw at its input is no longer performed. Figures 27–29 show the described neural network with color coding for clarity.
Figure 27: An illustration of a CNN that will be used in conjunction with the decoder of a VAE to create the proposed full network. The CNN takes the 25 input traces and produces an output vector. The CNN is color coded blue for reference in Figure 29.
Figure 28: An illustration of the VAE architecture utilized (input volume → encoder → random draw → latent space vector → decoder → reconstructed volume). First the VAE will be trained, then the decoder will be removed, the weights frozen, and finally the decoder will be attached to the output of the CNN of Figure 27. The decoder is color coded brown for reference in Figure 29.
Figure 29: An illustration of the full network (input traces → CNN → output/latent space vector → decoder → predicted volume). The CNN of Figure 27 (blue) will be trained to learn the physics and is placed as the input layers of the network. The trained decoder of the VAE (brown) from Figure 28 is attached to the output of the CNN, with its weights frozen from further training.
Splitting the network into two parts has three benefits (see the sketch after this list).

1. The convolutional network that is learning the physics is not directly connected to the output. Therefore, it is difficult for the network to overfit the data, and it is more likely to learn the underlying physics of the problem.

2. The training time for the convolutional network is greatly reduced because there are far fewer parameters to train.

3. The VAE uses the same data for its inputs and targets. Therefore, the data does not have to be labeled and acquiring training data is much easier.
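A conceptual sketch of this multistage procedure in Flux follows; all names (`encoder`, `decoder`, `cnn`, `vae_loss`, and the data iterators) are placeholders for objects defined elsewhere, and freezing is achieved simply by excluding the decoder's parameters from the optimized set:

    using Flux

    # Stage 1: train the VAE alone; inputs and targets are the same volumes,
    # so no labeling is required.
    Flux.train!(vae_loss, Flux.params(encoder, decoder), vae_data, ADAM(1e-5))

    # Stage 2: attach the trained decoder behind the CNN and train only the CNN.
    # Omitting the decoder from `Flux.params` freezes its weights.
    full_network = Chain(cnn, decoder)
    fn_loss(x, y) = Flux.Losses.mae(full_network(x), y)
    Flux.train!(fn_loss, Flux.params(cnn), fn_data, ADAM(1e-5))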
For convenience, the notation summarized in Table 12 is used for describing the
networks considered. The notation is identical to simulation one where a lowercase
‘c’ denotes a single convolutional layer, a capital ‘C’ denotes a cross-convolutional
layer, and a capital ‘D’ denotes a fully connected dense layer. Additionally, for
this simulation, a lowercase 'ct' will denote a convolutional transpose layer.
Table 12: The abbreviations/notation used to describe the neural network architectures used throughout this chapter.

    Notation          Meaning
    Lowercase 'c'     Convolutional Layer
    Uppercase 'C'     Cross-Convolutional Layer
    Uppercase 'D'     Dense Layer
    Lowercase 'ct'    Convolutional Transpose Layer
    VAEvX             Variational Autoencoder Version X
    FNvX              Full Network Version X
5.2.1 Variational Autoencoder
The VAE was developed first. As with simulation one, the neural networks (both
the CNN and the VAE) were developed using Julia [42] and the “Flux” machine
learning package [49,50]. All networks were trained on New Mexico State University’s HPC termed “Discovery”.
Initial Considerations. Because the VAE is only used to draw objects, the network can be trained using all available data. This ensures that the output space is adequately sampled for this application. Recall that the VAE is only responsible for generating the imaging volume. Therefore, the training stage for the VAE never utilizes the sensor probe input data, ensuring that only the CNN will be responsible for learning the underlying physics.
To add additional structure to the VAE, both convolutional and convolutional
transpose layers are used to extract features in the encoder and build features
in the decoder [60]. For simplicity however, the convolutional VAE will still be
referred to as the VAE. Although VAEs are typically trained using batch learning,
online learning was utilized because GPU memory limitations were encountered
(due to the high dimensional output representing the imaging volume). All networks considered had two convolutional layers followed by two dense layers for
the encoder. The decoder is symmetrical to the encoder; it begins with two dense
layers followed by two convolutional transpose layers.
The Adam optimizer was used to minimize the loss function. The loss function chosen was the binary cross entropy for the reconstruction with β = 1 (see
Section 2.3.4 for more details). The weights of all networks were initialized using
Flux’s default layer initialization (Xavier initialization) by drawing from a uniform
distribution as described in [51].
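Under those choices, the objective has the usual β-VAE form; the sketch below shows one standard way to write it (the reparameterization detail and the head names `encoder_μ` and `encoder_logvar` are assumptions, not the exact code used here):

    using Flux

    β = 1.0f0
    function vae_loss(x)
        μ      = encoder_μ(x)                       # latent mean
        logvar = encoder_logvar(x)                  # latent log-variance
        z      = μ .+ randn(Float32, size(μ)) .* exp.(0.5f0 .* logvar)  # random draw
        x̂      = decoder(z)
        bce    = Flux.Losses.binarycrossentropy(x̂, x; agg = sum)        # reconstruction
        kl     = -0.5f0 * sum(1f0 .+ logvar .- μ .^ 2 .- exp.(logvar))  # KL divergence
        return bce + β * kl
    end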
A problem that was initially encountered was that the output was always biased toward a cube that covered almost the entire imaging volume, as shown in Figure 30. This was caused by two things. First, if there were more layers than described above, the encoder output would be compressed to very small values in the latent space. These very small values did not carry enough weight to change the decoder output values and consequently resulted in a flat output, as the convolutional transpose layers convolved over small, non-diverse values. Second, the choice of an activation function such as the Sigmoid dramatically worsened the small output values of the encoder: the Sigmoid maps the already small values to a scale between 0 and 1, which ensured the values kept getting smaller as the number of layers increased. By decreasing the number of layers to the described network and utilizing the ReLU or Leaky ReLU activation function to help increase the variance of the encoder output, the problem of a “flat” output was overcome.
Figure 30: An example of a “flat” output from the VAE. The dimensions of the x and y axes are in mm. The output looks “flat” no matter the input or how many epochs are trained. The “flat” output problem is fixed by increasing the variance of the encoder output, i.e., by decreasing the number of layers and utilizing the ReLU activation function.
Top Performing Architecture Details. The top six VAE variations that were developed are shown in Table 14. The three architectures shown in the top half of Table 14 will be discussed first. Note that most variations utilized the Adam optimizer with learning rate η = 10⁻⁵. Some variations were able to function with η = 10⁻⁴; however, it was found that larger learning rates resulted in NaN parameters.

Two different methods of data normalization were implemented to make each point lie on a scale of 0 to 1. In the first method, termed Method 1, each volume is divided by the maximum value in that volume. In the second method, termed Method 2, each volume is divided by the maximum value in the training data set (εr = 60). Table 13 gives a summary of these methods.
Table 13: A summary of the methods used to normalize the data before training the VAE. Method 2 is preferred over Method 1 because the original volume can be recovered by simply multiplying by the maximum value in the data set of 60.

    Normalization Method    Description
    Method 1                Divide each volume by the maximum value in that volume.
    Method 2                Divide each volume by the maximum value in the training data set (εr = 60).
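Both methods are one-liners; a sketch:

    # Method 1: scale by the per-volume maximum (the original value is not
    # recoverable without storing the maximum separately).
    normalize_m1(vol) = vol ./ maximum(vol)

    # Method 2: scale by the data-set maximum εr = 60; inverting is trivial.
    normalize_m2(vol) = vol ./ 60.0
    recover_m2(vol)   = vol .* 60.0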
For the encoder, each of these three networks utilized 2 convolutional layers with receptive fields of size 3 × 3 × 3 (16 filter kernels in the first layer and a single kernel in the second, so that the flattened convolutional output contains 97³ = 912,673 elements). The small receptive field size helped to increase the speed of the network without sacrificing performance. Additionally, the encoder had two dense layers after the convolutional layers to intelligently compress the extracted features. There were 100 neurons in the first dense layer, and the final dense layer of the encoder (the latent dimension) was varied from 10 to 100 neurons. The output of the encoder is a vector from a latent
space. The more dimensions the latent space vector has, the more flexibility the
network has to differentiate between objects inside the latent space.
The decoder has 100 neurons in the first dense layer; the second dense layer has a number of neurons equal to the output size of the encoder's convolutional layers (912,673). The final two layers of the network were convolutional transpose layers. The first convolutional transpose layer utilized 16 filter kernels with a receptive field of size 3 × 3 × 3. The final convolutional transpose layer generated the imaging volume and consisted of a single kernel with a receptive field of size 3 × 3 × 3. All layers utilized the Leaky ReLU activation function, with the exception of the last layer, which used the ReLU activation. The ReLU activation function for the last layer ensured only positive values could be predicted, keeping the range of possible outputs to what is physically possible (positive only values). Additionally, the ReLU activation function does not saturate, so it has the ability to predict any positive permittivity value. The data for each of the first three networks was normalized by Method 1. These three architectures are shown in the top half of Table 14 and their performance is shown in Table 15.
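Putting the pieces together, a Flux sketch of the VAEv3 stack follows, under the assumptions above (valid, unpadded 3×3×3 convolutions, so the 101³ input shrinks to 97³ = 912,673 scalars; the separate mean and log-variance heads are an assumed arrangement):

    using Flux

    encoder_body = Chain(
        Conv((3, 3, 3), 1 => 16, leakyrelu),   # 101³×1 → 99³×16
        Conv((3, 3, 3), 16 => 1, leakyrelu),   # 99³×16 → 97³×1
        Flux.flatten,                          # → 912,673 features
        Dense(912_673, 100, leakyrelu))
    encoder_μ      = Chain(encoder_body, Dense(100, 100))   # latent size 100 (VAEv3)
    encoder_logvar = Chain(encoder_body, Dense(100, 100))

    decoder = Chain(
        Dense(100, 100, leakyrelu),
        Dense(100, 912_673, leakyrelu),
        x -> reshape(x, 97, 97, 97, 1, :),
        ConvTranspose((3, 3, 3), 1 => 16, leakyrelu),  # 97³ → 99³×16
        ConvTranspose((3, 3, 3), 16 => 1, relu))       # 99³×16 → 101³×1, positive output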
The next three highest performing architectures are described in the bottom half of Table 14. These three architectures followed after VAEv3, but the number of filter kernels was varied. Additionally, the last layer was changed to a Leaky ReLU activation as opposed to a typical ReLU activation function. The last activation was changed because the gradient for these three networks was consistently going negative, preventing the network from updating. The Leaky ReLU is not exclusively positive and does not saturate, so the network should be able to predict any possible permittivity value. Also differing from VAEv3, the networks were normalized by Method 2. Normalization in this manner is preferable because the true permittivity of the object can readily be recovered by multiplying the output of the network by the maximum value of 60.
Table 14: Summary of the layer attributes for each of the models considered in the most successful architectures. The format for the convolutional layers is number of kernels/receptive field dimensions; the notation 3* denotes a receptive field of dimensionality (3 × 3 × 3). The format for the dense layers is the number of neurons. The items changed from variation to variation are the latent size (upper three architectures) and the number of filter kernels and normalization method (lower three). For the upper three architectures, all layers utilized the Leaky ReLU activation function with the exception of the last layers, which utilized the ReLU activation. The lower three architectures utilized only the Leaky ReLU activation function.

                   Encoder                              Decoder
    Architecture   Conv1   Conv2   Dense1   Dense2     Dense1   Dense2    ConvT1   ConvT2   Norm. Method
    VAEv1          16/3*   1/3*    100      10         100      912673    16/3*    1/3*     1
    VAEv2          16/3*   1/3*    100      50         100      912673    16/3*    1/3*     1
    VAEv3          16/3*   1/3*    100      100        100      912673    16/3*    1/3*     1
    VAEv12         16/3*   1/3*    100      100        100      912673    16/3*    1/3*     2
    VAEv13         32/3*   1/3*    100      100        100      912673    32/3*    1/3*     2
    VAEv14         64/3*   1/3*    100      100        100      912673    64/3*    1/3*     2
Table 15: Trial 1 summary.

    Architecture    Latent Size    Epochs Trained    Minimum Error
    VAEv1           10             80                418572
    VAEv2           50             173               397117.06
    VAEv3           100            143               394610.8
The errors corresponding to the six top performing architectures are shown in Figure 31. For some networks, additional iterations may have resulted in lower error; however, in most cases the general trend of network performance was clear and training was therefore terminated early. For example, VAEv13 would likely see increased performance with more training iterations. Each network was trained for approximately two days. Architecture VAEv13 achieved the lowest error, but as will be shown momentarily, VAEv3 subjectively captured features the best.
Figure 31: Error of the top performing VAE architectures. VAEv13 achieved the lowest error; however, VAEv3 appeared to capture image features better than any other architecture.
Other Architecture Details. Other trials exist that are worth noting, not because of their performance, but because of the knowledge gained from the lack of performance. These architectures are described briefly in this section. For chronological purposes, it should be noted that the first three architectures of Table 14 were the first architectures tested. Refinements were made after analyzing the results of these three architectures.
Trial 2—The architectures of Table 16 were trained after analyzing the results of the first three architectures of Table 14. The activation of the last layer was changed to the sigmoidal activation because it was believed that using an activation which can only output values on the range of 0 to 1 would be beneficial after normalizing all data points. Additionally, normalization was done through Method 2. It is desirable to normalize the data set in this manner because the original permittivity can readily be recovered; however, it appears that normalizing by Method 1 improves training more than normalizing by Method 2, possibly due to an increase of variance in the data points by ensuring the different objects have dramatically different values. After these changes were implemented, it was found that performance degraded relative to the previous set of architectures. Both the sigmoid and the change in normalization adversely affected performance.
Table 16: Trial 2 summary.

    Architecture    Latent Size    Epochs Trained    Minimum Error
    VAEv4           10             131               665946.06
    VAEv5           50             90                666117.6
    VAEv6           100            44                673167.7
Trial 3—The next set of architectures tested is shown in Table 17. The sigmoidal activation was kept the same; however, the normalization was changed to Method 1. The change in normalization achieved marginally better performance, indicating that normalization by Method 1 aids in algorithm training.
Table 17: Trial 3 summary.

    Architecture    Latent Size    Epochs Trained    Minimum Error
    VAEv7           10             51                539597.75
    VAEv8           50             47                539979.94
    VAEv9           100            26                550160.8
Trial 4—It was hypothesized that utilization of convolutional transpose layers adversely affects network performance. Convolutional transpose layers look for different characteristics as they slide across an input. It was believed that the latent space vector and the output of the decoder's dense layers may not have enough structure to pass valid information through to the convolutional transpose layers; additionally, the dense layers have to learn structure to pass information to the convolutional transpose layers. It was believed that removing the convolutional transpose layers and utilizing only dense layers might remove this difficulty. These architectures are described in Table 18. The first architecture was normalized by Method 2 and the second architecture was normalized by Method 1. While performance did increase, it was not able to improve upon the first three architectures of Table 14. This is likely because the convolutional transpose layers make decisions about local regions of the input, whereas each neuron at the output of a dense layer must make a decision based on the entire input to that layer. For this reason, it is likely that removing the convolutional transpose layers made the output too flexible and degraded performance.
Table 18: Trial 4 summary.

    Architecture    Latent Size    Epochs Trained    Minimum Error
    VAEv10          100            116               419576.8
    VAEv11          100            6                 502137
Trial 5—Because no architecture was able to improve upon the performance of the first three architectures of Table 14, a closer refinement of these three architectures was attempted by taking VAEv3, normalizing by Method 2, and increasing the number of filters. These three architectures are shown in the bottom half of Table 14 and the corresponding performance is shown in Table 19.
Table 19: Trial 5 summary.

    Architecture    Num. Filters    Epochs Trained    Minimum Error
    VAEv12          16              127               395603.2
    VAEv13          32              105               384601.75
    VAEv14          64              88                391259.66
Trial 6—To attempt to further improve the normalization, the architectures in Table 20 were investigated. The VAEv3 model was visited again, but using the sigmoid activation function to ensure the network was only capable of outputting values on the range of 0 to 1. It was found that the sigmoid activation performed very poorly in training any VAE architecture.
Table 20: Trial 6 summary.

    Architecture    Num. Filters    Epochs Trained    Minimum Error
    VAEv15          16              3                 715101.4
    VAEv16          32              88                667760.1
    VAEv17          64              47                671514
Trial 7—After analyzing the results given in Table 14, it was found that the outputs of the Leaky ReLU and ReLU functions were difficult to maintain between values of 0 and 1, but if the sigmoid activation was utilized, performance also degraded. For these reasons, it was hypothesized that the ReLU and Leaky ReLU activation functions were firing too strongly at most locations in the imaging volume. To compensate and generate as low an error as possible, the network appeared to be capturing as much of the information about the shapes as possible, because it was difficult to perfectly capture the permittivity information. Conversely, it appeared that the sigmoid activation was not firing strongly enough to capture permittivity information; it was hypothesized that the network was relying on the fact that it was already outputting values on the correct range of 0 to 1. This appeared to result in the network not learning information about the shapes correctly.

In an attempt to remedy these two problems, the architectures shown in Table 21 were first trained with the Leaky ReLU activation function to learn the proper shapes in the data set. Once a plateau in training was reached, the activation function of the last layer was swapped to the sigmoidal activation function to learn the permittivity information. It was found that, even when starting from a pre-trained network, the sigmoidal activation function was unable to learn and adversely affected performance.
Table 21: Trial 7 summary.

    Architecture    Num. Filters    Epochs Trained    Minimum Error
    VAEv18          16              47                665520.2
    VAEv19          32              41                662566
    VAEv20          64              34                664195.2
VAE Summary and Results. To summarize, it was found that a large number of layers coupled with the sigmoid activation function resulted in a constant “flat” output, regardless of whether the network was pre-trained or not. This was fixed by using fewer layers with the ReLU or Leaky ReLU activation. It was found that the ReLU activation function sometimes produced a negative gradient during training that resulted in no parameters updating; this can be fixed by utilizing the Leaky ReLU activation function. It was also found that sigmoidal functions appear to perform poorly in these VAE architectures applied to these data sets.

Additionally, the following observations were made. Learning rates larger than η = 10⁻⁴ resulted in NaN parameters, likely due to large initial errors passed through an exponential function causing values to grow larger than machine precision can handle. Normalizing the data set by Method 1 as opposed to Method 2 increased performance, likely due to increased data variance, but made it difficult to determine the original permittivity. Convolutional transpose layers appear to perform better than dense layers for building an output, likely because convolutional layers accept a local region of the previous layer as input as opposed to the entire previous layer. As an additional note, in simulation one, multiple trainings were performed with the same parameters to reduce the uncertainty of network performance due to stochastic initialization; in simulation two, due to time constraints, only single trainings without averaging were performed.
Some sample results are shown in Figures 32–35. The outputs were generated by passing the ideal imaging volume through the encoder layers and directly passing the mean output to the decoder, without performing a random draw or including the variance output. This should query the network for the ideal output, as opposed to an object that merely looks similar to the input. Figures 32 and 33 are outputs from VAEv3, while Figures 34 and 35 are from VAEv13. Architecture VAEv3 subjectively appeared to produce the best results while VAEv13 achieved the lowest error. Because VAEv3 had a higher error than VAEv13 but subjectively appeared to perform better, it is likely that the utilized loss function can be improved upon.
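In terms of the earlier sketch, this query bypasses the random draw entirely:

    # Query the VAE for its "ideal" reconstruction: encode, take the mean head's
    # output, and decode it directly (no random draw, variance ignored).
    query(x) = decoder(encoder_μ(x))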
VAEv3 utilized the ReLU activation function and appeared to generate sharper objects than VAEv13, which utilized the Leaky ReLU activation. When trained, the Leaky ReLU activation function tends to fire in the slowly varying negative portion of the activation, likely causing edges to be distorted. Both networks were able to generate objects that looked similar to the ideal images; however, fine details could not be recovered. For example, in Figure 33 the strong permittivity shown in the bottom left slice was captured, but the shapes in the center were not. Additionally, the true permittivity values could not be recovered. Also, observe Figure 35: the shapes were recovered fairly well, however the network predicted negative permittivities, which are not physically possible.
Figure 32: A sample output from VAEv3. The dimensions of the x and y axes are in mm. (a) Ideal output. (b) Predicted output. The network was able to capture the sphere in the center, but not the cube shown at the top right of the ideal slices, likely due to its permittivity being close to free space. Note the predicted permittivities are larger than 1.
Figure 33: A sample output from VAEv3. The dimensions of the x and y axes are in mm. (a) Ideal output. (b) Predicted output. The network was able to capture the shape with stronger permittivity shown at the bottom of the ideal slices. The network was not able to capture the shapes with smaller permittivity values. Note the predicted values are greater than 1.
Figure 34: A sample output from VAEv13. The dimensions of the x and y axes are in mm. (a) Ideal output. (b) Predicted output. The network was able to detect the stronger permittivity features of the cube in the top two ideal slices and the sphere and cube of the bottom ideal slice. Note the permittivities were not correctly predicted.
Figure 35: A sample output from VAEv13. The dimensions of the x and y axes are in mm. (a) Ideal output. (b) Predicted output. The network was able to capture both the sphere and the cube in the ideal image; however, it was not able to display the correct permittivity.
5.2.2 Full Network
After training the VAEs, the top performing VAE architecture was selected (VAEv3). Although VAEv3 normalized by Method 1 (which makes it difficult to recover the original permittivity values), it achieved one of the lowest error values and subjectively output the best images. Because the decoder of the VAE is simply drawing the output, how the data were normalized should not matter.

A network composed of convolutional and dense layers was placed at the front of the trained VAE decoder. Note that the weights were initialized using Flux's default layer initialization (Xavier initialization) by drawing from a uniform distribution as described in [51]. The combination of these two networks will be referred to as the “full network” for the remainder of this document.
The weights corresponding to the VAE decoder are frozen during subsequent training to isolate the input from the output. Isolating the input from the output ensures that the network will not overfit the data. The decoder weights ensure the network is capable of drawing the types of objects desired to be imaged. Once the decoder weights are trained and frozen, they cannot learn to draw any other types of objects. Therefore, once the convolutional and dense layers are attached, there is no way for the network to modify the weights that draw the imaging volumes, and it is consequently very difficult for the network to overfit to the images it is trained on.
The full networks were trained only on the Random Shapes data set using the MAE as the loss function. The Runway-Functional, Runway-Faulty, and Human Leg data sets were used for validation and testing purposes only. Of the three data sets not used for training, 200 points were randomly selected for validation from each data set and the remaining 120 points from each data set were used for testing. In total, 600 points were used for validation and 360 points were used for testing. Utilizing only the Random Shapes data set for training is highly advantageous for assessing the network's ability to generalize: if the network can learn to predict real world scenes from completely random data, then it is highly likely the network has learned the underlying physics and is not overfitting the problem.
In total, five variations of full networks were trained and evaluated. From simulation one, it was found that architectures of format 'cCDD' process time domain traces well. For this reason, the first three full networks utilized 'cCDD' for the CNN half of the full network. That is, each of the first three full networks contained a single convolutional layer, followed by a cross-convolutional layer to pool the results, followed by two dense layers with 100 neurons in each dense layer. The number of kernels was varied, as well as the receptive field of each cross-convolutional kernel. Padding was set for the convolutional layers so that the input and output dimensionality were the same by adding zeros to the beginning and end of the time domain traces.
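As a concrete illustration, a Flux sketch of the FNv1-style CNN half follows, under stated assumptions: each input sample is arranged as a 3501×25 (time samples × traces) single-channel array, the 25×51 kernels span all 25 traces at once, and the cross-convolution is modeled as a 16×1 convolution across the resulting channel axis. This is an interpretation of the notation, not the author's exact code:

    using Flux

    cnn = Chain(
        # 16 kernels of size 51 (time) × 25 (traces); zero padding of 25 in time
        # keeps the output length at 3501 while collapsing the trace axis.
        Conv((51, 25), 1 => 16, leakyrelu; pad = (25, 0)),   # → 3501×1×16
        x -> permutedims(x, (1, 3, 2, 4)),                   # → 3501×16×1
        Conv((1, 16), 1 => 1, leakyrelu),                    # cross-convolutional pooling
        Flux.flatten,
        Dense(3501, 100, leakyrelu),
        Dense(100, 100, leakyrelu))   # 100 outputs feed the VAEv3 latent space

    full_network = Chain(cnn, decoder)   # decoder from the VAEv3 sketch, weights frozen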
A summary of each of the first three architectures is shown in Table 22 and the results are shown in Figures 36–38. Validation error is important because it reflects the network's ability to generalize to new data. The network with the lowest validation error was FNv2; however, FNv1 appeared to be the most stable of the three architectures. For this reason, FNv1 was selected to be refined one final time. Note that the lowest validation error achieved by any full network was 11.222; therefore none of the architectures were able to achieve the goal of a MAE of εr < 1.
Table 22: Summary of the layer attributes and errors for each of the models considered in the first three variations of full networks. The format for the convolutional layers is number of kernels/receptive field dimensions. The format for the dense layers is the number of neurons. The items changed from variation to variation are the number of kernels and the cross-convolutional receptive fields. All layers utilized the Leaky ReLU activation function with the exception of the last layers, which used the ReLU activation.

    Architecture   Conv         Cross-Conv   Dense 1   Dense 2   Decoder Version   Min. Training Error   Min. Validation Error
    FNv1           16/(25×51)   1/(16×1)     100       100       VAEv3             10.536                12.864
    FNv2           32/(25×51)   1/(32×1)     100       100       VAEv3             10.803                11.376
    FNv3           64/(25×51)   1/(64×1)     100       100       VAEv3             10.350                12.856

Figure 36: Training and validation error of FNv1.
Figure 37: Training and validation error of FNv2.
FNv4 and FNv5 were based on architecture FNv1 but with a varying number of convolutional layers. A summary of these architectures is shown in Tables 23 and 24 and the results are shown in Figures 39 and 40. Again, note that no architecture was able to achieve the goal of a MAE of εr < 1.
Figure 38: Training and validation error of FNv3.
Table 23: Summary of the layer attributes for each of the models considered in the final two variations of full networks. The format for the convolutional layers is number of kernels/receptive field dimensions. The format for the dense layers is the number of neurons. The item changed from variation to variation is the number of convolutional layers. All layers utilized the Leaky ReLU activation function with the exception of the last layers, which used the ReLU activation.

    Architecture   Conv 1       Conv 2       Conv 3       Cross-Conv   Dense 1   Dense 2
    FNv4           16/(25×51)   16/(16×51)   -            1/(16×1)     100       100
    FNv5           16/(25×51)   16/(16×51)   16/(16×51)   1/(16×1)     100       100
Table 24: Summary of the minimum error for each of the models considered in the final two variations of full networks.

    Architecture   Decoder Version   Min. Training Error   Min. Validation Error
    FNv4           VAEv3             8.736                 11.222
    FNv5           VAEv3             13.925                19.216
Figure 39: Training and validation error of FNv4.
Figure 40: Training and validation error of FNv5.
5.2.3 Summary of Results
The top performing architecture of simulation two was FNv2; however, FNv1 appeared to have the most stable error curve for generalization. Subjectively, all architectures seemed to perform better or worse on different data sets. Figures 41–45 show some sample outputs taken from the data sets. All points were generated with the weights that resulted in the lowest training error. Figure 42 shows a point taken from the Random Shapes training set, while Figures 41, 43, 44, and 45 show a point taken from the Human Leg testing set, a point from the Runway-Functional testing set, and two points from the Runway-Faulty testing set, respectively. The left columns represent the ideal output of the network, the middle columns represent the output of a VAE (VAEv3), and the right columns represent the output of the full network. The middle column should be the best achievable output for the full network because it represents the best possible way for the decoder to draw the ideal input. Note that none of the VAE outputs were able to resolve finer details such as edges, as will be seen momentarily.
Figure 41 shows an example output of FNv3 on the Human Leg testing set. This figure is very interesting because the full network was never trained on this data set; only the VAE was trained on it. Vertical lines that should ideally be present can be seen in the lower two images of the full network output. The full network was trained only with cubes and spheres, not on any cylinders. This means that, in the latent space of the VAE, cylindrical objects are situated close enough to other objects that the full network was still able to utilize that section of the VAE latent space.
Figure 44 shows FNv3 applied to the Runway-Faulty data set, which is most similar to the Human Leg data set in the center of the images. The network was not able to discern the inclusion in the center. Because the architecture is able to detect the larger object but not the smaller spherical inclusion, this suggests the network was unable to detect the spherical object, regardless of the VAE's ability to draw the object. It should be noted that just because FNv3 was unable to utilize the physics to resolve small spherical inclusions, it does not mean it is not possible to resolve small features. Figure 45 shows an example output of FNv4 on the Runway-Faulty data set. Contrary to FNv3, it appears as though FNv4 was able to detect a spherical inclusion at the center of the imaging volume. This figure suggests that it is possible to improve upon the results of simulation two to form a more generalized architecture that can consistently resolve smaller features.
Figure 43 shows the output of FNv2 on the Runway-Functional data set. In this figure, the full network was able to resolve the edges of the cube, but the permittivity in the center of the cube seems to indicate there is a sphere. This is likely due to a bias in the network toward spherical objects, which in turn is likely due to a lack of diversity in the training data. This output suggests that the Random Shapes data set had a preference for spherical objects in the center of the imaging volume, and this bias can be seen in each of the five sample outputs. For example, the rightmost column of Figure 44 shows swirling artifacts in the shapes of spheres. To mitigate this bias, the Random Shapes data set should be modified so that spheres and cubes can be drawn partially outside of the imaging volume. Because spheres were written only to the center of the imaging volume, larger spheres were likely always centered, inadvertently shifting the nominally uniform random distribution of the data set toward one biased to the center of the imaging volume.
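As one concrete way to realize this modification, shape centers could be drawn uniformly over a region that extends beyond the imaging volume; the following minimal Julia sketch assumes a hypothetical 100 mm volume and 25 mm maximum radius:

    # Draw a random sphere center uniformly over a region that extends one
    # maximum radius beyond each face of the imaging volume, so larger
    # shapes are no longer forced toward the center.
    const VOL  = 100.0   # imaging-volume edge length in mm (assumed)
    const RMAX = 25.0    # maximum sphere radius in mm (assumed)

    rand_center() = rand(3) .* (VOL + 2RMAX) .- RMAX  # each coordinate in [-RMAX, VOL + RMAX]

    center = rand_center()
    radius = rand() * RMAX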
Because the VAE architectures are not able to fully describe the output space, the full network appears to compensate to a degree. For example, observe the permittivity color bars in each of the five examples. The best VAE output (middle column) always predicts an unrealistically high permittivity value, while the full network always predicts a much lower one. This means the full network has been constrained to sections of the VAE latent space that correspond to lower, more accurate permittivity values. However, constraining the network to sections with smaller permittivity values likely means it will not be possible to resolve finer details such as edges.
Figure 42 shows an example output of FNv1 on the training set (the Random Shapes data set). This figure exemplifies how the full network attempts to predict the correct permittivity: the values it predicts are contained within the permittivity range of the data set (εr = 1–60). The VAE will attempt to capture the edges of the shapes, but in doing so it sacrifices a realistic permittivity. In this case, it appears the full network prioritized the permittivity values over capturing the finer detail in the imaging volume.
Neither simulation one nor simulation two studied the effect of noise on the networks. Noise greatly affects the performance of any real-world electromagnetic system; however, the VAE of the full network already adds a form of noise to the system. Even though the full network removes the random draw from the decoder, the decoder is still trained to decode points drawn from a distribution; the full VAE maps input distributions to output distributions rather than input points to output points. For this reason, the full network is effectively operating on noisy inputs even though no noise is present in the inputs. Because the network operates on a distribution as opposed to a point, it is likely to generalize well to unseen points, and this is supported by the results of simulation two.
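To make this concrete: during VAE training the decoder input is sampled via the reparameterization trick of [30], whereas the full network drives the decoder with the CNN's deterministic latent prediction. A minimal Julia sketch, assuming a hypothetical 100-dimensional latent space, is:

    # During VAE training the decoder input is a random draw from the
    # encoder's predicted Gaussian (the reparameterization trick of [30]);
    # in the assembled full network the draw is replaced by the CNN's
    # deterministic latent prediction.
    reparameterize(mu, logvar) = mu .+ exp.(0.5 .* logvar) .* randn(length(mu))

    mu, logvar = zeros(100), zeros(100)   # hypothetical 100-dimensional latent space
    z_train = reparameterize(mu, logvar)  # noisy latent input seen during training
    z_infer = mu                          # deterministic latent input at inference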
Although the best architecture was only able to achieve an error of 11.222 and did not meet the MAE goal of εr < 1, it has been shown that it is possible to detect features much smaller than the wavelength of the excitation or the width of the pulse. The center frequency of the pulse was 2 GHz, with a maximum wavelength of 199.9 mm. This is twice the size of every dimension of the entire imaging volume. Similarly, if traditional radar theory is applied, the range resolution is given by equation (7):

    ∆R = c / (2B)                                    (7)

where ∆R is the range resolution, c is the speed of light in free space (3×10^8 m/s), and B is the bandwidth of the pulse [61]. Using equation (7) with a bandwidth of 1 GHz leads to a range resolution of ∆R = 150 mm.
This is also larger than the entire size of the imaging volume. Therefore, while there is much to be improved, simulation two has demonstrated that it is possible to approximate the inverse scattering problem using the proposed method, potentially exceeding existing radar imaging capabilities while simultaneously estimating the permittivity of the target.
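As a quick check of these numbers, the wavelength and range-resolution arithmetic can be reproduced in a few lines of Julia (assuming the quoted maximum wavelength corresponds to the lower band edge of the 2 GHz, 1 GHz-bandwidth pulse):

    c  = 299_792_458.0   # speed of light in free space (m/s)
    fc = 2e9             # pulse center frequency (Hz)
    B  = 1e9             # pulse bandwidth (Hz)

    lambda_max = c / (fc - B / 2)   # longest in-band wavelength: ~0.1999 m = 199.9 mm
    delta_R    = c / (2 * B)        # range resolution of equation (7): ~0.150 m = 150 mm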
Figure 41: An example output of network FNv3 on the Human Leg data set. The dimensions of
the x and y axes are in mm. (a) The ideal output. (b) Best possible VAE prediction. (c) Full
Network prediction.
Figure 42: An example output of network FNv1 on the Random Shapes data set. The dimensions
of the x and y axes are in mm. (a) The ideal output. (b) Best possible VAE prediction. (c) Full
Network prediction.
Figure 43: An example output of network FNv3 on the Runway-Functional data set. The dimensions of the x and y axes are in mm. (a) The ideal output. (b) Best possible VAE prediction. (c) Full Network prediction.
Figure 44: An example output of network FNv3 on the Runway-Faulty data set. The dimensions of the x and y axes are in mm. (a) The ideal output. (b) Best possible VAE prediction. (c) Full Network prediction.
Figure 45: An example output of network FNv4 on the Runway-Faulty data set. The dimensions of the x and y axes are in mm. (a) The ideal output. (b) Best possible VAE prediction. (c) Full Network prediction.
6 Conclusion
This work has presented a methodology that can estimate the solution to an inverse scattering problem, which has relevance in applications such as subterranean and biomedical imaging. Two separate simulations were performed. The first simulation presented a method for 1D imaging down a transmission line; the second expanded upon the first by considering 3D images.
For the 1D case considered in simulation one, it was found that Convolutional Neural Networks (CNNs) with cross-convolutional pooling perform well for processing radar-like signals to estimate a permittivity profile down a transmission line. The networks performed subjectively very well, with a minimum log Mean Absolute Error (MAE) of 0.662, but were unable to meet the goal of log(MAE) < 0. Simulation one can find application in plane wave imaging, transmission line characterization, and as an alternative transmission line method for determining the unknown impedance/permittivity of a material.
For the 3D case considered in simulation two, it was found that direct implementation of a CNN architecture on the simulated 3D data sets resulted in overfitting. To improve generalization, a Variational Autoencoder (VAE) was trained to draw the types of objects in the data set; a CNN was then trained to output a vector in the latent space the VAE uses to draw outputs. This method greatly improved generalization but resulted in a loss of detail in the generated objects, and although the VAE improved generalization, it likely limited the performance of the overall imaging system. In the end, it was found that, at a resolution of 1 voxel per mm, the proposed method achieved a minimum MAE of 11.222 but was unable to achieve the MAE goal of εr < 1. However, the proposed method has been shown to discern features in the target that are much smaller than the excitation wavelength and much smaller than a traditional radar range bin can theoretically differentiate. Additionally, the proposed method simultaneously approximates the permittivity profile of the target, providing more information than ranging alone. Simulation two can find application in biomedical imaging (including cancer detection), subterranean imaging, and non-destructive testing.
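For readers implementing a similar pipeline, a minimal Flux [49, 50] sketch of the assembled full network is given below; the layer sizes, loss, and data are illustrative assumptions rather than the configurations used in this work, but the frozen-decoder training mirrors the description above:

    using Flux

    # Hypothetical stand-ins for the trained components: the CNN maps the
    # measured scattering data to a latent vector, and the pretrained VAE
    # decoder maps that vector to a permittivity image.
    latent_dim = 100
    cnn     = Chain(Dense(512, 256, relu), Dense(256, latent_dim))
    decoder = Chain(Dense(latent_dim, 256, relu), Dense(256, 1024))
    fullnet = Chain(cnn, decoder)

    # Train only the CNN: gradients are taken with respect to the CNN's
    # parameters, leaving the pretrained decoder frozen.
    opt  = ADAM()
    x, y = randn(Float32, 512), randn(Float32, 1024)   # dummy measurement/target pair
    grads = gradient(() -> Flux.Losses.mae(fullnet(x), y), Flux.params(cnn))
    Flux.Optimise.update!(opt, Flux.params(cnn), grads)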
Altogether, this work presented a study of methods that can be used to approximate the solution to an inverse scattering problem and that can potentially exceed existing radar imaging capabilities while simultaneously estimating permittivity. The author hopes that this work will one day lead to the replacement of existing biomedical and subterranean imaging techniques with an inverse scattering technique that produces easy-to-interpret images, removes the need for hazardous radiation, and is cost effective for the user.
7 Future Work
The proposed methodology can discern small details of targets; however, there is much that can be improved upon. The Variational Autoencoder (VAE) was not able to completely describe the output space, and this ultimately limited full network performance. Future work could improve the VAE by improving its loss function: a custom loss that simultaneously enforces edge/feature detection and correct permittivity prediction could greatly improve VAE performance and therefore full network performance.
Other methods can be utilized when training the full network. The proposed methodology froze the decoder weights completely during full network training. It is possible that alternating between training the VAE and training the full network could improve performance; in this way, a cost function can penalize both the VAE and the full network such that the VAE is tailored to draw only objects the full network can utilize, as sketched below. Alternatively, the existing training can be kept with the addition of a Generative Adversarial Network (GAN) that penalizes the VAE for outputting an object unlike any object in the data set. The addition of a GAN could improve the performance of the VAE and therefore of the full network.
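A minimal Julia sketch of the alternating schedule, with all objects as hypothetical placeholders, is:

    # Sketch of the alternating schedule (all objects are hypothetical
    # placeholders; a real implementation would pass networks and data
    # and perform gradient updates inside each phase).
    train_vae!(state)     = nothing   # one epoch of VAE (encoder + decoder) updates
    train_fullnet!(state) = nothing   # one epoch of CNN updates through the decoder

    state = Dict()                    # stands in for networks, optimizers, and data
    for epoch in 1:10
        train_vae!(state)             # keep refining what the decoder can draw
        train_fullnet!(state)         # keep refining the CNN's latent predictions
    end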
For the purpose of biomedical imaging, it is highly desirable to trust that the network will not produce large artifacts that are not truly part of the target. For example, if a person has an unknown foreign object inside of them, a medical professional can trust that an X-ray will not produce artifacts that are not truly in the scene. Currently, the proposed work can make predictions that are not necessarily based on the underlying physics, because no underlying physics of the problem was ever enforced and the network was entirely data driven. Future work can improve the reliability of the proposed method by enforcing physics during training using Physics-Informed Neural Networks (PINNs). In this way, the network should not be able to predict scenes that are not physically possible.
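One hedged sketch of such a loss, where the physics residual is a hypothetical placeholder for a discretized electromagnetic constraint rather than a worked-out PINN term, is:

    using Flux

    # Hypothetical placeholder for a discretized electromagnetic constraint
    # (e.g., a finite-difference wave-equation residual on the predicted
    # scene); a real PINN would compute this from the governing equations.
    physics_residual(yhat, x) = zero(eltype(yhat))

    # Physics-informed loss: data misfit plus a weighted physics residual,
    # so parameter updates are pushed toward physically consistent outputs.
    function pinn_loss(net, x, y; lambda = 0.1f0)
        yhat = net(x)
        return Flux.Losses.mae(yhat, y) + lambda * physics_residual(yhat, x)
    end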
Simulation of the forward problem takes a significant amount of time, and successful application of the proposed method requires simulating a significant amount of data to train the network on realistic scenarios. Future work could develop a novel approach that decreases the time necessary to simulate the forward problem so realistic data can be generated rapidly. This could be done through the use of a PINN, because a typical neural network takes only a few hundred microseconds to generate a prediction. An accurate and fast simulation method should greatly increase the proposed method's performance.
Finally, the proposed method can be tested with a working prototype. A working prototype will require the development of an antenna system, the electronics required to drive a signal onto the antenna, and a data acquisition system. Once the prototype is completely assembled, development for real-world applications could begin.
REFERENCES
[1] X. Q. He, Z. Q. Zhu et al., “Review of GPR rebar detection,” in Progress In
Electromagnetics Research Symposium proceedings, 2009, pp. 804–813.
[2] R. M. Murugan, “An improved electrical impedance tomography (EIT) algorithm for the detection and diagnosis of early stages of breast cancer,” Ph.D.
dissertation, University of Manitoba, 1999.
[3] R. Fitzgerald, “Phase-sensitive x-ray imaging,” Physics Today, pp. 23–26,
July 2000.
[4] T. Dreiseidler, R. A. Mischkowski et al., “Comparison of cone-beam imaging
with orthopantomography and computerized tomography for assessment in
presurgical implant dentistry,” The International Journal of Oral & Maxillofacial Implants, vol. 24, no. 2, pp. 216–225, 2009.
[5] E. Bresch, Y. C. Kim et al., “Seeing speech: Capturing vocal tract shaping
using real-time magnetic resonance imaging,” IEEE Signal Processing Magazine, pp. 123–132, May 2008.
[6] M. Robinson, C. Bristow et al., Ground Penetrating Radar. British Society
for Geomorphology, 2013.
[7] J. C. Portinari, “The one-dimensional inverse scattering problem,” Ph.D. dissertation, Massachusetts Institute of Technology, 1966.
[8] A. Farina, M. Betcke et al., “Multiple-view diffuse optical tomography system
based on time-domain compressive measurements,” Optics Letters, vol. 42,
no. 14, pp. 2822–2825, 2017.
[9] D. C. Barber and B. H. Brown, “Applied potential tomography,” Journal of
Physics E: Scientific Instruments, vol. 17, no. 9, pp. 723–733, 1984.
[10] G. Strobel, “Electrical resistivity tomography in environmental amelioration,”
Ph.D. dissertation, University of Manitoba, 1996.
[11] K. H. Ng, “Non-ionizing radiation sources, biological effects, emissions and
exposures,” in Proceedings of the international conference on non-ionizing
radiation, 2003, pp. 1–16.
[12] J. Malmivuo and R. Plonsey, Bioelectromagnetism: Principles and Applications of Bioelectric and Biomagnetic Fields. New York: Oxford University Press, 1995.
[13] C. Dang, M. Darnajou et al., “Performance analysis of an electrical impedance
tomography sensor with two sets of electrodes of different sizes,” in 9th World
Congress on Industrial Process Tomography, 2018.
[14] M. E. Anderson and G. E. Trahey, “A seminar on k-space applied to medical
ultrasound,” Department of Biomedical Engineering, Duke University, Apr
2000.
[15] J. A. Scales, Theory of Seismic Imaging. Samizdat Press, 1995.
[16] O. Oncken, G. Asch et al., “Seismic imaging of a convergent continental margin and plateau in the central andes (andean continental research project
1996 (ANCORP’96)),” Geophysical Research, vol. 108, 2003.
[17] A. Massa, S. Caorsi et al., “A comparative study of NN and SVM-based
electromagnetic inverse scattering approaches to on-line detection of buried
objects,” Applied Computational Electromagnetics Society, vol. 18, no. 2, pp. 65–75, Jul 2003.
[18] S. Haykin, Neural Networks and Learning Machines. Pearson, 2009, vol. 3.
[19] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2016.
[20] E. A. Marengo, “Inverse scattering by compressive sensing and signal subspace methods,” 2007 2nd IEEE International Workshop on Computational
Advances in Multi-Sensor Adaptive Processing, 2007.
[21] G. Carleo, I. Cirac et al., “Machine learning and the physical sciences,” Reviews of Modern Physics, vol. 91, no. 4, 2019.
[22] A. V. Oppenheim, A. Willsky, and S. H. Nawab, Signals and Systems, 2nd ed.
Prentice Hall, 1997.
[23] K. Fukushima, “Neocognitron for handwritten digit recognition,” Neurocomputing, vol. 51, pp. 161–180, 2003.
[24] S. Lawrence, C. L. Giles et al., “Face recognition: A convolutional neuralnetwork approach,” IEEE transactions on neural networks, vol. 8, no. 1, pp.
98–113, 1997.
[25] L. Liu, C. Shen, and A. van den Hengel, “The treasure beneath convolutional
layers: Cross-convolutional-layer pooling for image classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
2014.
[26] ——, “Cross-convolutional-layer pooling for image recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp.
2305–2313, 2016.
[27] W. Shi, J. Caballero et al., “Is the deconvolution layer the same as a convolutional layer?” Computing Research Repository (CoRR), 2016.
[28] M. D. Zeiler, D. Krishnan et al., “Deconvolutional networks,” in 2010 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition.
IEEE, 2010, pp. 2528–2535.
[29] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 3rd
International Conference for Learning Representations, 2015.
[30] D. Kingma and M. Welling, “Auto-encoding variational bayes,” Computing
Research Repository (CoRR), 2014.
[31] L. Theis, W. Shi et al., “Lossy image compression with compressive autoencoders,” Computing Research Repository (CoRR), 2017.
[32] N. M. N. Leite, E. T. Pereira et al., “Deep convolutional autoencoder for EEG
noise filtering,” in 2018 IEEE International Conference on Bioinformatics
and Biomedicine (BIBM). IEEE, 2018, pp. 2605–2612.
[33] J. Tu, H. Liu et al., “Spatial-temporal data augmentation based on LSTM
autoencoder network for skeleton-based human action recognition,” in 2018
25th IEEE International Conference on Image Processing (ICIP). IEEE,
2018, pp. 3478–3482.
[34] S. Kullback and R. A. Leibler, “On information and sufficiency,” The Annals
of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951.
[35] I. Higgins, L. Matthey et al., “beta-vae: Learning basic visual concepts with a
constrained variational framework,” in International Conference on Learning
Representations (ICLR), 2017.
[36] M. Tschannen, O. Bachem, and M. Lucic, “Recent advances in autoencoderbased representation learning,” Computing Research Repository (CoRR),
2018.
[37] C. Doersch, “Tutorial on variational autoencoders,” Computing Research
Repository (CoRR), 2016.
[38] N. H. Jamali, K. A. H. Ping et al., “Image reconstruction based on combination of inverse scattering technique and total variation regularization
method,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 5, no. 3, pp. 569–576, March 2017.
[39] A. C. Fannjiang, “Compressive inverse scattering: I. high-frequency
SIMO/MISO and MIMO measurements,” Inverse Problems, 2010.
[40] S. Li, Q. Zhang, and B. C. Ahn, “Wave-amplitude transmission matrix approach to the analysis of the electromagnetic planewave reflection and transmission at multilayer material boundaries,” Department of Radio and Communications Engineering, Graduate School of Electrical and Computer Engineering Chungbuk National University, 2019.
[41] V. Matko and M. Milanovič, “Sensitivity and accuracy of dielectric measurements of liquids significantly improved by coupled capacitive-dependent
quartz crystals,” Multidisciplinary Digital Publishing Institute - Sensors,
vol. 21, no. 10, p. 3565, 2021.
[42] J. Bezanson, A. Edelman et al., “Julia: A fresh approach to numerical computing,” Society for Industrial and Applied Mathematics Review, vol. 59,
no. 1, pp. 65–98, 2017.
[43] L. Li, L. G. Wang et al., “DeepNIS: Deep neural network for nonlinear electromagnetic inverse scattering,” IEEE Transactions on Antennas and Propagation, vol. 67, no. 3, pp. 1819–1825, 2018.
[44] Y. Chen, Y. Xie et al., “Brain MRI super resolution using 3D deep densely
connected neural networks,” in 2018 IEEE 15th International Symposium on
Biomedical Imaging (ISBI 2018). IEEE, 2018, pp. 739–742.
[45] S. J. Hamilton and A. Hauptmann, “Deep D-bar: Real-time electrical
impedance tomography imaging with deep neural networks,” IEEE transactions on medical imaging, vol. 37, no. 10, pp. 2367–2377, 2018.
[46] M. S. Kaufmann, A. Klotzsche et al., “Simultaneous multichannel multi-offset
ground-penetrating radar measurements for soil characterization,” Vadose
Zone Journal, vol. 19, no. 1, 2020.
[47] S. Esmaeili, S. Kruse et al., “Resolution of lava tubes with ground penetrating
radar: The TubeX project,” Journal of Geophysical Research: Planets, vol.
125, no. 5, 2020.
[48] M. S. Kang, N. Kim et al., “3D GPR image-based UcNet for enhancing underground cavity detectability,” Remote Sensing, vol. 11, no. 21, 2019.
[49] M. Innes, E. Saba et al., “Fashionable modelling with flux,” Computing Research Repository (CoRR), 2018.
[50] M. Innes, “Flux: Elegant machine learning with julia,” Journal of Open
Source Software, 2018.
[51] X. Glorot, Y. Bengio et al., “Understanding the difficulty of training deep
feedforward neural networks,” in Proceedings of the Thirteenth International
Conference on Artificial Intelligence and Statistics, vol. 9. Proceedings of
Machine Learning Research, 13–15 May 2010, pp. 249–256.
[52] S. Sharma, S. Sharma, and A. Athaiya, “Activation functions in neural networks,” International Journal of Engineering Applied Sciences and Technology, vol. 4,
no. 12, 2020.
[53] J. B. Jarvis, M. D. Janezic et al., Transmission/reflection and short-circuit
line methods for measuring permittivity and permeability. National Institute
of Standards and Technology, 1993.
[54] T. Liebig, A. Rennings, and D. Erni, “openEMS–a free and open source
equivalent-circuit (EC) FDTD simulation platform supporting cylindrical coordinates suitable for the analysis of traveling wave MRI applications,” International Journal of Numerical Modelling: Electronic Networks, Devices and
Fields, vol. 26, no. 6, pp. 680–696, 2013.
[55] T. Liebig. openEMS - open electromagnetic field solver. General and
Theoretical Electrical Engineering (ATE), University of Duisburg-Essen.
[Online]. Available: https://www.openEMS.de
[56] E. J. Jaselskis, J. Grigas et al., “Dielectric properties of asphalt pavement,”
Journal of materials in civil engineering, vol. 15, no. 5, pp. 427–434, 2003.
[57] W. D. Hurt, “Multiterm debye dispersion relations for permittivity of muscle,”
IEEE Transactions on Biomedical Engineering, no. 1, pp. 60–64, 1985.
[58] B. Amin, M. A. Elahi et al., “An insight into bone dielectric properties variation: A foundation for electromagnetic medical devices,” in 2018 EMF-Med
1st World Conference on Biomedical Applications of Electromagnetic Fields
(EMF-Med). IEEE, 2018.
[59] W. Huda and R. B. Abrahams, “X-ray-based medical imaging and resolution,”
American Journal of Roentgenology, vol. 204, no. 4, pp. W393–W397, 2015.
[60] A. Kastanos, “Convolutional VAE in flux,” alecokas.github.io, Sept. 2020.
[Online]. Available: http://alecokas.github.io/julia/flux/vae/2020/07/22/
convolutional-vae-in-flux.html
[61] B. R. Mahafza, Radar Systems Analysis and Design Using MATLAB, 3rd ed. Chapman and Hall/CRC Press, 2013.