Mark K. Hinders

Intelligent Feature Selection for Machine Learning Using the Dynamic Wavelet Fingerprint
Mark K. Hinders
Department of Applied Science
William & Mary
Williamsburg, VA, USA
ISBN 978-3-030-49394-3
ISBN 978-3-030-49395-0 (eBook)
https://doi.org/10.1007/978-3-030-49395-0
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2020
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Squirrels are the bane of human existence.¹ Everyone accepts this. As with deer, many more squirrels exist today than when Columbus landed in the New World in 1492. Misguided, evil squirrel lovers provide these demons with everything they need to survive and thrive—food (bird feeders), water (birdbaths), and shelter (attics).

¹ Wisdom from "7 Things You Must Know About Stupid Squirrels" by Steve Bender https://www.notestream.com/streams/56e0ec0c6e55e/ Also: "Tapped: A Treasonous Musical Comedy," a live taping at Theater Wit in Chicago, IL from Saturday, July 25, 2016. See https://www.youtube.com/watch?v=JNBd2csSkqg starting at 1:59.
Preface
Wavelet Fingerprints
Machine learning is the modern terminology for what we’ve always been trying to
do, namely, make sense of very complex signals recorded by our instrumentation.
Nondestructive Evaluation (NDE) is an interdisciplinary field of study which is
concerned with the development of analysis techniques and measurement technologies for the quantitative characterization of materials, tissues, and structures by
non-invasive means. Ultrasonic, radiographic, thermographic, electromagnetic, and
optic methods are employed to probe interior microstructure and characterize
subsurface features. Applications are in non-invasive medical diagnosis, intelligent
robotics, security screening, and online manufacturing process control, as well as
the traditional NDE areas of flaw detection, structural health monitoring, and
materials characterization. The focus of our work is to implement new and better
measurements with both novel instrumentation and machine learning that automates
the interpretation of the various (and multiple) imaging data streams.
Twenty years ago, we were facing applications where we needed to automatically interpret very complicated ultrasound waveforms in near real time. We were
also facing applications where we wanted to incorporate far more information than
could be presented to a human expert in the form of images. Time-scale representations, such as the spectrogram, looked quite promising, but there’s no reason
to expect that a boxcar FFT would be optimal. We had been exploring non-Fourier
methods for representing signals and imagery, e.g., Gaussian weighted Hermite
polynomials, and became interested in wavelets.
The basic idea is that we start with a time-domain signal and then perform a
time-scale or time–frequency transformation on it to get a two-dimensional representation, i.e., an image. Most researchers then immediately extracted a parameter
from that image to collapse things back down to a 1D data stream. In tissue
characterization with diagnostic ultrasound, one approach was to take a boxcar FFT
of the A-scan line, which returned something about the spectrum as a function of
anatomical depth. The slope, mid-point value, and/or intercept gave parameters that
seemed to be useful to differentiate healthy from malignant tissues. B-scans are
made up of a fan of A-scan lines, so one could use those spectral parameters to
make parameter images. When Jidong Hou tried that approach using wavelets
instead of windowed FFTs, he rendered them as black and white contour plots
instead of false-color images. Since they often looked like fingerprints, we named
them wavelet fingerprints.
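For readers who want the gist in code, here is a minimal sketch of that pipeline in Python using PyWavelets. It is not the actual implementation described in this book; the wavelet, the scale range, and the number of contour slices are arbitrary illustrative choices.

```python
# A rough sketch of the wavelet fingerprint pipeline, NOT the exact
# implementation used in this book: transform a 1D signal into a
# time-scale image with a continuous wavelet transform, then quantize
# the normalized coefficients into a few alternating black/white bands
# so the image renders like the ridges of a fingerprint. The wavelet,
# scale range, and slice count are arbitrary illustrative choices.
import numpy as np
import pywt

def wavelet_fingerprint(signal, scales=np.arange(1, 64),
                        wavelet="morl", n_slices=6):
    coeffs, _ = pywt.cwt(signal, scales, wavelet)     # 2D time-scale image
    norm = np.abs(coeffs) / np.abs(coeffs).max()      # normalize to [0, 1]
    bands = np.floor(norm * n_slices).astype(int)     # quantize into slices
    return bands % 2                                  # alternating b/w "ridges"

# Example: a chirp-like echo burst
t = np.linspace(0, 1, 1024)
echo = np.sin(2 * np.pi * (20 + 40 * t) * t) * np.exp(-((t - 0.5) ** 2) / 0.02)
fp = wavelet_fingerprint(echo)   # array of 0s and 1s, ready to plot or mine
```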
Since 2002, we have applied this wavelet fingerprint approach to a wide variety
of different applications and have found that it's naturally suited to machine
learning. There are many different wavelet basis functions that can be used. There
are adjustable parameters that allow for differing amounts of de-noising and richness of the resulting fingerprints. There are all manner of features that can be
identified in the fingerprints, and innumerable ways that identified features can be
quantified. In this book, we describe several applications of this approach. We also
discuss stupid squirrels and flying saucers and tweetstorms. You may find some
investment advice.
Williamsburg, Virginia
April 2020
Mark K. Hinders
Acknowledgements
MKH would especially like to thank his research mentors, the late Profs. Asim Yildiz
and Guido Sandri, as well as their research mentors, Julian Schwinger, and J. Robert
Oppenheimer. Asim Yildiz (DEng, Yale) was already a Professor of Engineering
at the University of New Hampshire when Prof. Schwinger at Harvard told him that
he was “doing good physics” already so he should “get a union card.” Schwinger
meant that Yildiz should get a Ph.D. in theoretical physics with him at Harvard, which
Yildiz did while still keeping his faculty position at UNH and mentoring his own
engineering graduate students, running his own research program, etc. He also taught
Schwinger to play tennis, having been a member of the Turkish national team as
well as all the best tennis clubs in the Boston area.
When Prof. Yildiz died at age 58, his genial and irrepressibly jolly BU colleague
took on his orphaned doctoral students, including yours truly, even though the
students’ research areas were all quite distant from his own. Prof. Sandri had done
postdoctoral research with Oppenheimer at the Princeton Institute for Advanced
Study and then was a senior scientist at Aeronautical Research Associates of
Princeton for many years before spending a year in Milan at the Istituto di Matematica del Politecnico and then joining BU. In retirement, he helped found
Wavelet Technologies, Inc. to exploit mathematical applications of novel wavelets
in digital signal processing, image processing, data compression, and problems in
convolution algebra.
Pretty much all of the work described in this book benefited from the talented
efforts of Jonathan Stevens, who can build anything a graduate student or a startup
might ever need, whether that’s a mobile robot or an industrial scanner or an
investigational device for medical ultrasound, or a system to stabilize and digitize
decaying magnetic tape recordings. The technical work described in this volume is
drawn from the thesis and dissertation research of my students in the Applied
Science Department at The College of William and Mary in Virginia, which was
chartered in 1693. I joined the faculty of the College during its Tercentenary and
co-founded the new Department of Applied Science at America’s oldest University.
Our campus abuts the restored eighteenth-century Colonial Williamsburg where
W&M alum Thomas Jefferson and his compatriots founded a new nation. It’s a
beautiful setting to be pushing forward the boundaries of machine learning in order
to solve problems in the real world.
Contents
1 Background and History
   1.1 Has Science Reached the End of Its Tether?
   1.2 Did the Computers Get Invited to the Company Picnic?
       1.2.1 But Who Invented Machine Learning, Anyway?
   1.3 You Call that Non-invasive?
   1.4 Why Is that Stupid Squirrel Named Steve?
   1.5 That's Promising, but What Else Could We Do?
       1.5.1 Short-Time Fourier Transform
       1.5.2 Other Methods of Time–Frequency Analysis
       1.5.3 Wavelets
   1.6 The Dynamic Wavelet Fingerprint
       1.6.1 Feature Creation
       1.6.2 Feature Extraction
       1.6.3 Edge Computing
   1.7 Will the Real Will West Please Step Forward?
       1.7.1 Where Are We Headed?
   References

2 Intelligent Structural Health Monitoring with Ultrasonic Lamb Waves
   2.1 Introduction to Lamb Waves
   2.2 Background
   2.3 Simulation Methods for SHM
   2.4 Signal Processing for Lamb Wave SHM
   2.5 Wavelet Transforms
       2.5.1 Wavelet Fingerprinting
   2.6 Machine Learning with Wavelet Fingerprints
   2.7 Applications in Structural Health Monitoring
       2.7.1 Dent and Surface Crack Detection in Aircraft Skins
       2.7.2 Corrosion Detection in Marine Structures
       2.7.3 Aluminum Sensitization in Marine Plate-Like Structures
       2.7.4 Guided Wave Results—Crack
   2.8 Summary
   References

3 Automatic Detection of Flaws in Recorded Music
   3.1 Introduction
   3.2 Digital Music Editing Tools
   3.3 Method and Analysis of a Non-localized Extra-Musical Event (Coughing)
   3.4 Errors Associated with Digital Audio Processing
   3.5 Automatic Detection of Flaws in Cylinder Recordings Using Wavelet Fingerprints
   3.6 Automatic Detection of Flaws in Digital Recordings
   3.7 Discussion
   References

4 Pocket Depth Determination with an Ultrasonographic Periodontal Probe
   4.1 Introduction
   4.2 Related Work
   4.3 Data Collection
   4.4 Feature Extraction
   4.5 Feature Selection
   4.6 Classification: A Binary Classification Algorithm
       4.6.1 Binary Classification Algorithm Examples
       4.6.2 Dimensionality Reduction
       4.6.3 Classifier Combination
   4.7 Results and Discussion
   4.8 Bland–Altman Statistical Analysis
   4.9 Conclusion
   Appendix
   References

5 Spectral Intermezzo: Spirit Security Systems
   References

6 Classification of Lamb Wave Tomographic Rays in Pipes to Distinguish Through Holes from Gouges
   6.1 Introduction
   6.2 Theory
   6.3 Method
       6.3.1 Apparatus
       6.3.2 Ray Path Selection
   6.4 Classification
       6.4.1 Feature Extraction
       6.4.2 DWFP
       6.4.3 Feature Selection
       6.4.4 Summary of Classification Variables
       6.4.5 Sampling
   6.5 Decision
   6.6 Results and Discussion
       6.6.1 Accuracy
       6.6.2 Flaw Detection Algorithm
   6.7 Conclusion
   References

7 Classification of RFID Tags with Wavelet Fingerprinting
   7.1 Introduction
   7.2 Classification Overview
   7.3 Materials and Methods
   7.4 EPC Extraction
   7.5 Feature Generation
       7.5.1 Dynamic Wavelet Fingerprint
       7.5.2 Wavelet Packet Decomposition
       7.5.3 Statistical Features
       7.5.4 Mellin Features
   7.6 Classifier Design
   7.7 Classifier Evaluation
   7.8 Results
       7.8.1 Frequency Comparison
       7.8.2 Orientation Comparison
       7.8.3 Different Day Comparison
       7.8.4 Damage Comparison
   7.9 Discussion
   7.10 Conclusion
   References

8 Pattern Classification for Interpreting Sensor Data from a Walking-Speed Robot
   8.1 Overview
   8.2 rMary
       8.2.1 Sensor Modalities for Mobile Robots
   8.3 Investigation of Sensor Modalities Using rMary
       8.3.1 Thermal Infrared (IR)
       8.3.2 Kinect
       8.3.3 Audio
       8.3.4 Radar
   8.4 Pattern Classification
       8.4.1 Compiling Data
       8.4.2 Aligning Reflected Signals
       8.4.3 Feature Creation with DWFP
       8.4.4 Intelligent Feature Selection
       8.4.5 Statistical Pattern Classification
   8.5 Results
       8.5.1 Proof-of-Concept: Acoustic Classification of Stationary Vehicles
       8.5.2 Acoustic Classification of Oncoming Vehicles
   8.6 Conclusions
   References

9 Cranks and Charlatans and Deepfakes
   9.1 Digital Cranks and Charlatans
   9.2 Once You Eliminate the Possible, Whatever Remains, No Matter How Probable, Is Fake News
   9.3 Foo Fighters Was Founded by Nirvana Drummer Dave Grohl After the Death of Grunge
   9.4 Digital Imaging Is Why Our Money Gets Redesigned so Often
   9.5 Social Media Had Sped up the Flow of Information
   9.6 Discovering Latent Topics in a Corpus of Tweets
       9.6.1 Document Embedding
       9.6.2 Topic Models
       9.6.3 Uncovering Topics in Tweets
       9.6.4 Analyzing a Tweetstorm
   9.7 DWFP for Account Analysis
   9.8 In-Game Sports Betting
   9.9 Virtual Financial Advisor Is Now Doable
   References
Chapter 1
Background and History
Mark K. Hinders
Abstract Machine learning is the modern lingo for what we’ve been trying to do
for decades, namely, to make sense of the complex signals in radar and sonar and
lidar and ultrasound and so forth. Deep learning is fashionable right now and those
sorts of black-box approaches are effective if there is a sufficient volume and quality of training data. However, when we have appropriate physical and mathematical
models of the underlying interaction of the radar, sonar, lidar, ultrasound, etc. with
the materials, tissues, and/or structures of interest, it seems odd to not harness that
hard-won knowledge. We explain the key issue of feature vector selection in terms of
autonomously distinguishing rats from squirrels. Time–frequency analysis is introduced as a way to identify dynamic features of varmint behavior, and the dynamic
wavelet fingerprint is explained as a tool to identify features from signals that may
be useful for machine learning.
Keywords Machine learning · Pattern classification · Wavelet · Fingerprint ·
Spectrogram
1.1 Has Science Reached the End of Its Tether?
Radar blips are reflections of radio waves. Lidar does that same thing with lasers.
Sonar uses pings of sound to locate targets. Autonomous vehicles use all of these,
plus cameras, in order to identify driving lanes, obstructions, other vehicles, bicycles
and pedestrians, deer and dachshunds, etc. Whales and dolphins echolocate with
sonar. Bats echolocate with chirps of sound that are too high in frequency for us to
hear, which is called ultrasound. Both medical imaging and industrial nondestructive
testing have used kHz and MHz frequency ultrasound for decades. Infrasound refers to frequencies that are too low for us to hear; it hasn't yet found a practical application. It all started with an unthinkable tragedy.
M. K. Hinders (B)
Williamsburg Machine Learning Algorithmics, Williamsburg, VA 23187-0767, USA
e-mail: hinders@wmla.ai
URL: http://www.wmla.ai
Not long after its maiden voyage in 1912, when the unsinkable Titanic struck an iceberg and sank [1], Sir Hiram Maxim self-published a short book and submitted a
letter to Scientific American [2] entitled, “A New System for Preventing Collisions
at Sea” in which he said:
The wreck of the Titanic was a severe and painful shock to us all; many of us lost friends and
acquaintances by this dreadful catastrophe. I asked myself: “Has Science reached the end of
its tether? Is there no possible means of avoiding such a deplorable loss of life and property?
Thousands of ships have been lost by running ashore in a fog, hundreds by collisions with
other ships or with icebergs, nearly all resulting in great loss of life and property.” At the end
of four hours it occurred to me that ships could be provided with what might be appropriately
called a sixth sense, that would detect large objects in their immediate vicinity without the
aid of a searchlight. Much has been said, first and last, by the unscientific of the advantages of
a searchlight. Collisions as a rule take place in a fog, and a searchlight is worse than useless
even in a light haze, because it illuminates the haze, making all objects beyond absolutely
invisible.
Some have even suggested that steam whistle or siren might be employed that would
periodically give off an extremely powerful sound, that is, a veritable blast, an ear-piercing
shriek, and then one is supposed to listen for an echo, it being assumed that if any object
should be near, a small portion of the sound would be reflected back to the ship, but this plan
when tried proved an absolute failure. The very powerful blast given off by the instrument
is extremely painful to the ears and renders them incapable of hearing the very feeble echo
which is supposed to occur only a few seconds later. Moreover, sirens or steam whistles of
great power are extremely objectionable on board passenger ships; they annoy the passengers
and render sleep impossible. It is, therefore, only too evident that nothing in the way of a
light or noise producing apparatus could be of any use whatsoever.
Maxim knew that bats used some form of inaudible sound—outside the range of
human hearing—to echolocate and feed, but he thought it was infrasound rather than
ultrasound. He then proceeded to describe an extremely low-frequency directional
steam whistle or siren that could be used to (echo)locate icebergs during foggy nights
when collisions were most likely to occur. Whether his patented apparatus would
have been effective at preventing collisions at sea is a question that’s a little like
whether Da Vinci’s contraptions would have flown. He got the general idea right,
and can be credited with stimulating the imaginations of those who subsequently
worked out all the engineering details.
His figure reproduced below is quite remarkable. The key idea is that the time
delay of the echoes determines distance because the speed of sound is known, but
more importantly the shape of the echoes gives information about the object which
is returning those echoes. Analysis of those echo waveforms can, in principle, tell
the difference between a ship and an iceberg, and also differentiate large and small
icebergs. He even illustrates how clutter affects the echoes differently from backscattering targets. Science has not reached the end of its tether, even after a century
of further development. This is exactly how radar and sonar and ultrasound work
(Fig. 1.1).
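Maxim's key insight, that echo delay encodes range, is simple arithmetic. A two-line sketch follows; the 1500 m/s sound speed (roughly seawater) and the 2 s delay are illustrative numbers, not values from the text.

```python
# Echo ranging in two lines: a known sound speed turns round-trip echo
# delay into distance. Sound speed and delay here are illustrative.
def echo_range_m(delay_s, sound_speed_m_s=1500.0):
    return sound_speed_m_s * delay_s / 2.0   # halved: the ping goes out and back

print(echo_range_m(2.0))  # an echo heard 2 s after the ping -> ~1500 m away
```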
Maxim’s suggested apparatus embodies a modified form of “siren,” through which
high-pressure steam can be made to flow in order to produce sound waves having
about 14 to 15 vibrations per second, and consequently not coming within the frequency range of human hearing. These waves, it is correctly asserted, would be
capable of traveling great distances, and if they struck against a body ahead of the ship they would be reflected back toward their source, "echo waves" being formed [3]. This self-published pamphlet was discussed in [4].

Fig. 1.1 The infrasonic echo waves would be recorded by a stretched membrane that the infrasound waves would vibrate, and those membrane vibrations could jingle attached bells or wiggle pens tracing lines on paper as Maxim illustrated [3]. Maxim's concept was discussed in Nature, a leading scientific journal yet today [4]
Well more than a century before Maxim, Lazzaro Spallanzani performed extensive
experiments on bats, and concluded that bats possess some sort of sixth sense, in that
they use their ear for detecting objects, if not actually seeing them [5]. His blinded
bats got around just fine in either light or dark, and bats he deafened flew badly
and hurled themselves into objects in both dark and light. The problem was that no
technology existed until the early twentieth century that could measure the ultrasonic
screeches bats were using to echolocate. Prof. G.W. Pierce in the Physics Department
at Harvard happened to build such a device, and in 1938 an undergraduate zoology
student, who was banding bats to study their migration patterns, asked if they could
use that (ultra)sonic detector apparatus to listen to his bats. Young Donald Griffin
brought a cage full of bats to Prof. Pierce’s acoustics laboratory wherein humans first
heard their down-converted “melody of raucous noises from the loudspeaker” of the
device [6, 7]. Donald Griffin and a fellow student found that bats could use reflected
sounds to detect objects (Fig. 1.2).
What Griffin discovered about bats just before WWII, other scientists discovered
about dolphins and related mammals just after WWII. Curiously, unequivocal demonstration of echolocation in dolphins wasn’t accomplished until 1960. Nevertheless,
the first underwater sonar apparatus was constructed in order to detect submarines in
1915, using the concepts sketched out by Maxim. By WWII sonar was an essential
part of antisubmarine warfare, spurred on by Nazi U-boats sinking merchant ships
transporting men and materiel to European allies [10].
Fig. 1.2 Donald R. Griffin: 1915–2003 [8, 9]. Dr. Griffin coined the term echolocation to describe the phenomenon he continued to study for 65 years
Practical radar technology was ready just in time to turn the tide during the Battle
of Britain, thanks to the cavity magnetron [11] and amateur scientist and Wall Street
tycoon Alfred Lee Loomis, who personally funded significant scientific research at
his private estate before leading radar research efforts during World War II [12, 13].
The atomic bomb may have ended the war, but radar and sonar won it.¹ During the
entirety of the Cold War, uncounted billions have been spent continuing to refine
radar and sonar technology, countermeasures, counter-countermeasures, etc. with
much of that mathematically esoteric scientific work quite highly classified. The
result is virtually undetectable submarines that patrol the world’s oceans standing
ready to assure mutual destruction as a deterrent to sneak attack. More visibly, but
highly stealthy, radar evading fighters and bombers carrying satellite-guided precision weapons can destroy any fixed target anywhere on the planet with impunity while minimizing collateral damage. It's both comforting and horrifying at the same time.

¹ Time magazine was all set to do a cover story about the importance of radar technology in the impending Allied victory, but the story got bumped off the August 20, 1945 cover by the A-bombs dropped on Japan and the end of WWII: "The U.S. has spent half again as much (nearly $3 billion) on radar as on atomic bombs. As a military threat, either in combination with atomic explosives or as a countermeasure, radar is probably as important as atomic power itself. And while the peacetime potentialities of atomic power are still only a hope, radar already is a vast going concern—a $2 billion-a-year industry, six times as big as the whole prewar radio business."
What humans have been trying to figure out since Maxim's 1912 pamphlet is how best to interpret the radar blips and sonar pings and lidar reflections and ultrasound images [14]. The fundamental issue is that the shape, size, orientation, and composition of the object determine the character of the scattered signal, so an enormous
amount of mental effort has gone into mathematical modeling and computer simulation to try to understand enough about that exceedingly complex physics in order to
detect navigation hazards, enemy aircraft and submarines, tumors, structural flaws,
etc. Much of that work has been in what mathematical physicists call forward
scattering, where
I know what I transmit, and I know the size & shape & materials & location & orientation
of the scattering object. I then want to predict the scattered field so I’ll know what to look
for in my data.
The real problem is mathematically much more difficult, called inverse scattering,
wherein
I know what I transmit, and I measure some of the scattered field. I then want to estimate
the size & shape & materials & location & orientation of the scatterer.
Many scientific generations have been spent trying to solve inverse scattering problems in radar and sonar and ultrasound. In some special cases or with perhaps a few
too many simplifying assumptions, there has been a fair amount of success [15].
1.2 Did the Computers Get Invited to the Company Picnic?
Once upon a time “computer” was a job description [16, 17]. They were excellent
jobs for mathematically talented women at a time when it was common for young
women to be prevented from taking math classes in school. My mother describes
being jealous of her high school friend who was doing math homework at lunch,
because she wasn’t allowed to take math. As near as I can tell it wasn’t her widower
father who said this, but it was instead her shop-keeper/school-teacher aunt who was
raising her. This was the 1950s, when gender roles had regressed to traditional forms
after the war. Recently, a mathematically talented undergraduate research student
in my lab was amused that her MATLAB code instantly reproduced plots from a
1960 paper [18] that had required “the use of two computers for two months” until
I suggested that she check the acknowledgements where the author thanked those
two computers by name. That undergraduate went on to earn a degree with honors—
double majoring in physics and anthropology, with a minor in math and research
in medical ultrasound—and then a PhD in medical imaging. My mother became
a teacher because her freshman advisor scoffed when she said she wanted to be a
freelance writer and a church organist: you’re a woman, you’ll be either a teacher
or a nurse. She started teaching full time with an associate degree and earned BA, MA, and EdD while working full time and raising children. I remember her struggles with inferential statistics and punch cards for analyzing her dissertation data on an electronic computer (Fig. 1.3).

Fig. 1.3 Computers hard at work in the NACA Dryden High Speed Flight Station "Computer Room" in 1949 [19]. Seen here, left side, front to back, Mary (Tut) Hedgepeth, John Mayer, and Emily Stephens. Right side, front to back, Lilly Ann Bajus, Roxanah Yancey, Gertrude (Trudy) Valentine (behind Roxanah), and Ilene Alexander
Hedy Lamarr was the face of Snow White and Catwoman. In prewar Vienna she
was the bored arm-candy wife of an armaments manufacturer. In Hollywood, she
worked 6 days a week under contract to MGM, but kept equipment in her trailer for
inventing between takes. German U-boats were devastating Atlantic shipping, and
Hedy thought a radio-controlled torpedo might help tip the balance. She came up with
a brilliant, profoundly original jamming-proof way to guide a torpedo to the target:
frequency hopping. The Philco Magic Box remote control [20] probably inspired
the whole thing. The American film composer George Antheil said, “All she wants
to do is stay home and invent things.” He knew how to synchronize player pianos,
and thought that pairs of player-piano rolls could synchronize the communications
of a ship and its torpedo, on 88 different frequencies of course. They donated their
patented invention to the Navy, who said, “What do you want to do, put a player
piano in a torpedo? Get out of here.” The patent was labeled top secret and the idea
hidden away until after the war. They also seized her patent because she was an alien.
The Navy just wanted Hedy Lamarr to entertain the troops and sell war bonds. She
sold $25 M worth of them. That could have paid for quite a few frequency-hopping
torpedoes (Fig. 1.4).
Fig. 1.4 Hedy Lamarr, bombshell and inventor [21]. You may be surprised to know that the most beautiful woman in the world made your smartphone possible
Computers have been getting smaller and more portable since their very beginning.
First, they were well-paid humans in big rooms, then they were expensive machines
in big rooms, then they got small enough and cheap enough to sit on our desktops,
then they sat on our laps and we got used to toting them pretty much everywhere, and
then they were in our pockets and a teenager’s semi-disposable supercomputer comes
free with an expensive data plan. Now high-end computer processing is everywhere.
We might not even realize we’re wearing a highly capable computer until it prompts
us to get up and take some more steps. We talk to computers sitting on an end table and
they understand just like in Star Trek. In real life, the artificially intelligent machines
learn about us and make helpful suggestions on what to watch or buy next and where
to turn off just up ahead.
With the advent of machine learning we can take a new approach to making
sense of radar and sonar and ultrasound. We simply have to agree to stipulate that
the physics of that scattering is too complex to formally solve the sorts of inverse
problems we are faced with, so we accept the more modest challenge of using forward
scattering models to help define features to use in pattern classification. Intelligent
feature selection is the current incarnation of the challenge that was laid down by
Maxim. In this paradigm, we use advanced signal processing methods to proliferate
candidate “fingerprints” of the scattering object in the time traces we record. Machine
learning then formally sorts out which combinations of those signal features are most
useful for object classification.
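As a cartoon of that last step (made-up data, not the book's pipeline), scikit-learn's univariate feature ranking can sift a large pool of candidate features and recover the one planted to be informative:

```python
# Hypothetical illustration only: rank 500 candidate "fingerprint"
# features with a univariate F-test and keep the 10 best. One feature
# (index 7) is planted to correlate with the class label, and the
# selector should recover it. Data shapes and counts are made up.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 500))      # 200 signals x 500 candidate features
y = rng.integers(0, 2, size=200)     # two target classes
X[:, 7] += y * 2.0                   # plant one genuinely useful feature

selector = SelectKBest(f_classif, k=10).fit(X, y)
print(np.flatnonzero(selector.get_support()))  # feature 7 should be listed
```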
1.2.1 But Who Invented Machine Learning, Anyway?
Google [22, 23] tells me that machine learning is a subset of artificial intelligence
where computers autonomously learn from data, and they don’t have to be programmed by humans but can change and improve their algorithms by themselves.
Today, machine learning algorithms enable computers to mine unstructured text,
autonomously drive cars, monitor customer service calls, and find me that thing I
didn’t even know I needed to buy or watch. The enormous amount of data is the fuel
these algorithms need, surpassing the limitations of calculation that existed prior to
ubiquitous computing (Fig. 1.5).
Probably machine learning has its roots at the dawn of electronic computation
when humans were encoding their own algorithmic processes into machines. Alan
Turing [25] described the “Turing Test” in 1950 to determine if a computer has real
intelligence via fooling a human into believing it is also human. Also in the 1950s,
Arthur Samuel [26] wrote the first machine learning program to play the game of
checkers, and as it played more it noted which moves made up winning strategies and incorporated those moves into its program. Frank Rosenblatt [27] designed the first artificial neural network, the perceptron, which he intended to simulate the thought processes of the human brain. Marvin Minsky [28] convinced humans to start using the term Artificial Intelligence.

Fig. 1.5 A mechanical man has real intelligence if he can fool a redditor into believing he is also human [24]
In the 1980s, Gerald Dejong [29] introduced explanation-based learning where a
computer analyzes training data and creates a general rule it can follow by discarding unimportant data, and Terry Sejnowski [30] invented NetTalk, which learns to
pronounce words the same way a baby does. 1980s style expert systems are based on
rules. These were rapidly adopted by the corporate sector, generating new interest in
machine learning.
Work on machine learning then shifted from a knowledge-driven approach to a
data-driven approach in the 1990s. Scientists began creating programs for computers to analyze large amounts of data and learn from the results. Computers started
becoming faster at processing data with computational speeds increasing 1,000-fold
over a decade. Neural networks began to be fast enough to take advantage of their
ability to continue to improve as more training data is added. In recent years, computers have beaten learned humans at chess (1997) and Jeopardy (2011) and Go (2016).
Many businesses are moving their companies toward incorporating machine learning
into their processes, products, and services in order to gain an edge over their competition. In 2015, the non-profit organization OpenAI was launched with a billion
dollars and the objective of ensuring that artificial intelligence has a positive impact
on humanity.
Deep learning [31] is a branch of machine learning that employs algorithms to
process data and imitate the thinking process of carbon-based life forms. It uses
layers of algorithms to process data, understand speech, visually recognize objects,
etc. Information is passed through each layer, with the output of the previous layer
providing input for the next layer. Feature extraction is a key aspect of deep learning
which uses an algorithm to automatically construct meaningful “features” of the
data for purposes of training, learning, and understanding. The data scientist had
heretofore been solely responsible for feature extraction, especially in cases where an
understanding of the underlying physical process aids in identifying relevant features.
In applications like radar and sonar and lidar and diagnostic ultrasonography, the
hard-won insight of humans and their equation-based models are now working in
partnership with machine learning algorithms. It’s not just the understanding of the
physics the humans bring to the party, it’s an understanding of the context.
1.3 You Call that Non-invasive?
It almost goes without saying that continued exponential growth in the price performance of information technology has enabled the gathering of data on an unprecedented scale. This is especially true in the healthcare industry. More than ever before,
data concerning people’s lifestyles, data on medical care, diseases and treatments,
and data about the health systems themselves are available. However, there is concern that this data is not being used as effectively as possible to improve the quality
and efficiency of care. Machine learning has the potential to transform healthcare by
deriving new and important insights from the vast amount of data generated during
the delivery of healthcare [32]. There is a growing awareness that only a fraction of
the enormous amount of data available to make healthcare decisions is being used
effectively to improve the quality and efficiency of care and to help people take
control of their own health [33]. Deep learning offers considerable promise for medical diagnostics [34] and digital pathology [35–45]. The emergence of convolutional
neural networks in computer vision produced a shift from hand-designed feature
extractors to automatically generated feature extractors trained with backpropagation [46]. Typically, substantial expertise is needed to implement machine learning,
but automated deep learning software is becoming available for use by naive users
[47] sometimes resulting in coin-flip results. It’s not necessarily that there’s anything
wrong with deep learning approaches, it’s typically that there isn’t anywhere near
enough training data. If your black-box training dataset is insufficient, your classifier
isn’t learning, it is merely memorizing your training data. Naive users will often
underestimate the amount of training data that’s going to be needed by a few (or
several) orders of magnitude and then not test their classifier(s) on different enough
datasets. We are now just beginning to see AI systems that outperform radiologists
on clinically relevant tasks such as breast cancer identification in mammograms [48],
but there is a lot of angst [49, 50] (Fig. 1.6).
Many of the things that can now be done with machine learning have been talked
about for 20 years or more. An example is a patented system for diagnosis and
treatment of prostate cancer [52] that we worked on in the late 1990s. A talented
NASA gadgeteer and his business partner conceived of a better way to do ultrasound-guided biopsy of the prostate (US Patent 6,824,516 B2) and then arranged for a few
million dollars of Congressional earmarks to make it a reality. In modern language, it
was all about machine learning but we didn’t know to call it that back then. Of course,
training data has always been needed to do machine learning; in this case, it was to
be acquired in partnership with Walter Reed Army Medical Center in Washington,
DC.
The early stages of a prostate tumor often go undetected. Generally, the impetus
for a visit to the urologist is when the tumor has grown large enough to mimic some
of the symptoms of benign prostatic hyperplasia (BPH) which is a bit like being
constipated, but for #1 not #2. Two commonly used methods for detecting prostate
cancer have been available to clinicians. Digital rectal examination (DRE) has been
used for years as a screening test, but its ability to detect prostate cancer is limited.
Small tumors often form in portions of the prostate that cannot be reached by a
DRE. By the time the tumor is large enough to be felt by the doctor through the
rectal wall, it is typically a half inch or larger in diameter. Considering that the entire
prostate is normally on the order of one and a half inches in diameter, this cannot
be considered early detection. Clinicians may also have difficulty distinguishing
between benign abnormalities and prostate cancer, and the interpretation and results
of the examination may vary with the experience of the examiner. The prostate-
1 Background and History
11
Fig. 1.6 A server room in a
modern hospital [51]
specific antigen (PSA) is an enzyme measured in the blood that may rise naturally
as men age. It also rises in the presence of prostate abnormalities. However, the PSA
test cannot distinguish prostate cancer from benign growth of the prostate and other
conditions of the prostate, such as prostatitis. If the patient has a blood test that shows
an elevated PSA level, then that can be an indicator, but the relationship between PSA
level and tumor size is not definitive, nor does it give any indication of tumor location.
PSA testing also fails to detect some prostate cancers—about 20% of patients with
biopsy-proven prostate cancer have PSA levels within normal range. Transrectal
ultrasound (TRUS) is sometimes used as a corroborating technique; however, the
images produced can be ambiguous. In the 1990s, there were transrectal ultrasound
scanners in use, which attempted to image the prostate through the rectal wall. These
systems did not produce very good results [53].
Both PSA and TRUS enhance detection when added to DRE screening, but they
are known to have relatively high false positive rates and they may identify a greater
number of medically insignificant tumors. Thus, PSA screening might lead to treatment of unproven benefit which then could result in impotence and incontinence.
It’s part of what we currently call the overdiagnosis crisis. When cancer is suspected
because of an elevated PSA level or an anomalous digital rectal exam, a fan pattern
of biopsies is frequently taken in an attempt to locate a tumor and determine if a cancerous condition exists. Because of the crudeness of these techniques, the doctor has only a limited control over the placement of the biopsy needle, particularly in the accuracy of the depth of penetration. Unless the tumor is quite large, the chance of hitting it with the biopsy needle is not good. The number of false negatives is extremely high.

Fig. 1.7 Prostate cancer detection and mapping system that "provides the accuracy, reliability, and precision so additional testing is not necessary. No other ultrasound system can provide the necessary precision." The controlling computer displays the data in the form of computer images (presented as level of suspicion (LOS) mapping) to the operating physician as well as archiving all records. Follow-on visits to the urologist for periodic screenings will use this data, together with new examination data, to determine changes in the patient's prostate condition. This is accomplished by a computer software routine. In this way, the physician has at his fingertips all of the information to make an accurate diagnosis and to choose the optimum protocol for treatment
In the late 1990s, it was estimated that 80% of all men would be affected by prostate
problems with advancing age [54, 55]. Prostate cancer was then one of the top three
killers of men from cancer, and diagnosis only occurred in the later stages when
successful treatment probabilities are significantly reduced and negative side effects
from treatment are high. The need was critical for accurate and consistently reliable,
early detection and analysis of prostate cancer. Ultrasound has the advantage that it
can be used to screen for a host of problems before sending patients to more expensive
and invasive tests and procedures that ultimately provide the necessary diagnostic
precision. The new system (Fig. 1.7) was designed with three complementary subsystems: (1) a transurethral scanning system (TUScan), (2) transrectal scanning, and (3) a slaved biopsy system, the latter two referred to together as TRSB.

Fig. 1.8 The ultrasound scan of the prostate from within the urethra is done in lockstep with the complementary, overlapping ultrasound scan from the TRSB probe which has been placed by the doctor within the rectum of the patient. The two ultrasound scanning systems face each other and scan the same volume of prostate tissue from both sides. This arrangement offers a number of distinct advantages over current systems
One of the limitations of diagnostic ultrasound is that while higher frequencies
give better resolution, they also have less depth of penetration into body tissue.
By placing one scanning system within the area of interest and adding a second
system at the back of the area of interest (Fig. 1.8) the necessary depth of penetration
is halved for each system. This permits the effective use of higher frequencies for
better resolution. The design goal was to develop an “expert” system that uses custom
software to integrate and analyze data from several advanced sensor technologies,
something that is beyond the capacity of human performance. In so doing, such a
system could enable detection, analysis, mapping, and confirmation of tumors of a
size that is as much as one-fourth the size of those detected by traditional methods.
Fig. 1.9 The advanced prostate imaging and mapping system serves as an integrated platform for diagnostics and treatments of the prostate that integrates a number of ultrasound technologies and techniques with "expert system" software to provide interpretations, probability assessment, and level of suspicion mapping for guidance to the urologist
That’s all very nice, of course, but the funding stream ended before clinical data
was acquired to develop the software expert system. Machine learning isn’t magic;
training data is always needed.
Note in Fig. 1.9 that in the 1990s we envisioned incorporating medical history into
the software expert system. That became possible after the 2009 Stimulus Package
included $40 billion in funding for healthcare IT [56] to “make sure that every
doctor’s office and hospital in this country is using cutting-edge technology and
electronic medical records so that we can cut red tape, prevent medical mistakes, and
help save billions of dollars each year.” Real-time analysis, via machine learning, of
conversations between humans is now a routine part of customer service (Fig. 1.10).
1.4 Why Is that Stupid Squirrel Named Steve?
Context matters. Machine learning is useful precisely because it helps us to draw
upon a wealth of data to make some determination. Often that determination is easy
for humans to make, but we want to off-load the task onto an electronic computer.
Here’s a simple example: squirrel or rat? Easy, but now try to describe what it is about
squirrels and rats that allow you to tell one from another. The context is that rats are
nasty and rate a call to the exterminator while squirrels are harmless and even a little
cute unless they’re having squirrel babies in my attic in which case they rate a call
to a roofer to install aluminum fascias. The standard machine learning approach to
distinguishing rats from squirrels is to train the system with a huge library of images
that have been previously identified as rat or squirrel. But what if you wanted to
articulate specific qualities that distinguish rats from squirrels? We’re going to need
to select some features.
Fig. 1.10 The heart of the system is software artificial intelligence that develops a level of suspicion
(LOS) map for cancer throughout the prostate volume. Rather than depending on a human to search
for suspect features in the images, the multiple ultrasonic, 4D Doppler, and elastography datasets are
processed in the computer to make that preliminary determination. A false color LOS map is then
presented to the physician for selection of biopsy at those areas, which are likely cancer. The use of
these two complementary ultrasound scanners within the prostate and the adjacent rectum provides
all of the necessary information to the system computer to enable controllable, precise placement
of the integral slaved biopsy needle at any selected point within the volume of the prostate that
the doctor wishes. Current techniques have poor control over the depth of penetration, and only a
limited control over the angle of entry of the biopsy needle. With current biopsy techniques, multiple
fans of 6 or more biopsies are typically taken and false negatives are common
Fig. 1.11 It’s easy to tell the difference between a rat and a squirrel
Fig. 1.12 Only squirrels climb trees. I know this because I googled “rats in trees” and got no hits
Fig. 1.13 Squirrels always have bushy tails. Yes I’m sure those are squirrels
Fig. 1.14 Rats eat whatever they find in the street. Yes, that’s the famous pizza rat [57]
Here are some preliminary assessments based on the images in Figs. 1.11, 1.12, 1.13,
1.14, and 1.15:
• If squirrels happen to have tails, then bushiness might work pretty well, because
rat tails are never bushy.
• Ears, eyes, and heads look pretty similar, at least to my eye.
• Body fur may or may not be very helpful.
• Feet and whiskers look pretty similar.
• Squirrels are more likely to be up in trees, while rats are more likely to be down
in a gutter.
• Both eat junk food, but squirrels especially like nuts.
• Rats are nasty and they scurry whereas squirrels are cute and they scamper.
We can begin to quantify these qualities by drawing some feature spaces. For example, any given rat or squirrel can be characterized by whether it's up in a tree, down in the gutter, or somewhere in between, to give an altitude number. The bushiness of the tail can
be quantified as well (Fig. 1.16). Except for unfortunate tailless squirrels, most of
them will score quite highly in bushiness while rats will have a very low bushiness
score unless they just happen to be standing in front of something that looks like a
bushy tail. Since we’ve agreed that both rats and squirrels eat pizza, but squirrels have
a strong preference for nuts, maybe plotting nuttiness versus bushiness (Fig. 1.17) would work. But recall that squirrels scamper and rats scurry. If we could quantify those, my guess is that would give well-separated classes, like Fig. 1.18.

Fig. 1.15 Rats are mean ole fatties. Squirrels are happy and healthy

Fig. 1.16 Feature space showing the distribution of rats and squirrels quantified by their altitude and bushiness of their tails
The next step in pattern classification is to draw a decision boundary that divides
the phase space into the regions with the squirrels and the rats, respectively. Sometimes this is easy (Fig. 1.19) but it will never be perfect. Once this step is done, any
new image to be classified gets a (scamper, scurry) score which defines a
point in the phase space relative to the decision boundary. Sometimes drawing the
1 Background and History
19
Fig. 1.17 Feature space
showing the distribution of
rats and squirrels quantified
instead by their nuttiness and
bushiness of their tails
Fig. 1.18 An ideal feature
space gives tight,
well-separated clusters, but
note that there always seems
to be a squirrel or two mixed
in with the rats and vice
versa
Fig. 1.19 Easy decision
boundary
decision boundary is tricky (Fig. 1.20) and so the idea is to draw it as best you can to
separate the classes. It doesn’t have to be a straight line, of course, but it’s important
not to overfit things to the data that you happen to be using to train your classifier
(Fig. 1.21).
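To make the decision-boundary idea concrete, here is a minimal sketch in Python (numpy assumed) that fits a straight-line boundary to made-up (scamper, scurry) scores via least squares; it is illustrative only, not a classifier used anywhere in this book.

```python
import numpy as np

# Toy (scamper, scurry) scores; all numbers here are made up for illustration.
rng = np.random.default_rng(1)
squirrels = rng.normal([0.8, 0.2], 0.1, size=(50, 2))   # high scamper, low scurry
rats = rng.normal([0.2, 0.8], 0.1, size=(50, 2))        # low scamper, high scurry

X = np.vstack([squirrels, rats])
y = np.hstack([np.zeros(50), np.ones(50)])              # 0 = squirrel, 1 = rat

# Fit a straight-line decision boundary by least squares: w . (x1, x2, 1) = +/-1
A = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, 2 * y - 1, rcond=None)

def classify(scamper, scurry):
    """A new critter's (scamper, scurry) score lands on one side of the line."""
    return "rat" if w[0] * scamper + w[1] * scurry + w[2] > 0 else "squirrel"

print(classify(0.75, 0.25))   # -> squirrel
```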
Fig. 1.20 Trickier decision
boundary
Fig. 1.21 Complicated
decision boundary
These phase spaces are all 2D, but there's no apparent reason to limit things to two dimensions; we can use three or more. In 3D, the decision boundary line becomes
a plane or some more complicated surface. If we consider more features then the
decision boundary will be a hypersurface that we can’t draw, but that we can define
mathematically. Maybe then we define a feature vector consisting of (altitude,
bushiness, nuttiness, scamper, scurry). We could even go back
and consider a bunch of other characteristics of rats and squirrels and add them.
Don’t go there. Too many features will make things worse. Depending upon the
number of classes you want to differentiate, there will be an optimal size feature
vector and if you add more features the classifier performance will degrade. This is
another place where human insight comes into play (Fig. 1.22).
If instead of trying to define more and more features we think a bit more deeply
about features that look promising, we might be able to do more with less. Scamper
and scurry are inherently dynamic quantities, so presumably we’ll need multiple
image frames rather than snapshots, which shouldn’t be a problem. A game I play
when out and about on campus, or in a city, is to classify the people and animals
around me according to my own personal (scamper, scurry) metrics. Try it
sometime. Scamper implies a lightness of being. Scurry is a bit more emotionally
dark and urgent. Scamper almost rhymes with jumper. Scurry actually rhymes with
Fig. 1.22 The curse of
dimensionality is surprising
and disappointing to
everybody who does pattern
classification [58]
hurry. Let’s agree to stipulate that rats don’t scamper.2 In a sequence of images,
we could simply track the center of gravity of the varmint over time, which will
easily allow us to differentiate scampering from scurrying. A scampering squirrel
will have a center of gravity that goes up and down a lot over time, whereas the rat
will have a center of gravity that drops a bit at the start of the scurry but then remains
stable (Fig. 1.23). If we also calculate a tail bushiness metric for each frame of the
video clip we could form an average bushiness in order to avoid outliers due to some
extraneous bushiness from a shrubbery showing up occasionally in an image. We
then plot these as (CG Stability, Average Bushiness) as sketched in
Fig. 1.24 in case you haven’t been following along with my phase space cartoons
decorated with rat-and-squirrel clipart.
2 I've watched the videos titled "Rats Scamper Outside Notre Dame Cathedral as Flooding Pushes Rodents Onto Paris Streets" (January 24, 2018), but those rats are clearly scurrying. Something must have gotten lost in translation. I wonder if rats are somehow to blame for the Notre Dame fire? Surely it wasn't squirrels nesting up in the attic!
Fig. 1.23 Motion of the center of gravity is the key quantity. The blue dashed line shows the center
of gravity while sitting. The green line shows the center of gravity lowered during a scurry. The red
line shows the up and down of the center of gravity while scampering
Fig. 1.24 Use multiple
frames from video to form
new more sophisticated
feature vectors while
avoiding the curse of
dimensionality
1.5 That’s Promising, but What Else Could We Do?
We might want to use the acoustic part of the video, since it’s going to be available
anyway. We know that
• Rats squeak but squirrels squawk.
• Rats giggle at ultrasound frequencies if tickled.
• Squirrel vocalizations are kuks, quaas, and moans.
• If frequencies are different enough, then an FFT should provide feature(s).
• Time-scale representations will allow us to visualize the time-varying frequency content of complex sounds.
Many measurements acquired from physical systems are one-dimensional time-domain signals, commonly representing amplitude as a function of time. In many
cases, useful information can be extracted from the signal directly. Using the waveform of an audio recording as an example, the total volume of the recording at any
point in time is simply the amplitude of the signal at that time point. More in-depth
analysis of the signal could show that regular, sharp, high-amplitude peaks are drum
hits, while broader peaks are sustained organ notes. Amplitude, peak sharpness, and
peak spacing are all examples of features that can be used to identify particular events
occurring in the larger signal. As signals become more complicated, such as an audio
recording featuring an entire orchestra as compared to a single instrument or added
noise, it becomes more difficult to identify particular features in the waveform and
correlate them to physical events. Features that were previously used to differentiate
signals then no longer do so reliably.
One of the most useful, and most common, transformations we can make on a
time-domain signal is the conversion to a frequency-domain spectrum. For a real
signal $f(t)$, this is accomplished with the Fourier transform

$$F(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt. \qquad (1.1)$$
The resultant signal F(ω) is in the frequency domain, with angular frequency ω
related to the natural frequency ξ (with units cycles per second) by ω = 2π ξ . An
inverse Fourier transform will transform this signal back to the time domain. Since
this is the symmetric formulation of the transform, the inverse transform can be
written as
$$f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F(\omega)\, e^{i\omega t}\, d\omega. \qquad (1.2)$$
Since the Fourier transform is just an extension of the Fourier series, looking at this
series is the best way to understand what actually happens in the Fourier transform.
The Fourier series, discovered in 1807, decomposes any periodic signal into a sum
of sines and cosines. This series can be expressed as the infinite sum
$$f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left[ a_n \cos(nt) + b_n \sin(nt) \right], \qquad (1.3)$$
where the $a_n$ and $b_n$ are the Fourier coefficients. By finding the values of these coefficients that best describe the original signal, we are describing the signal in terms of some new basis functions: sines and cosines. The relation to the complex exponential given in the Fourier transform comes from Euler's formula, $e^{2\pi i \theta} = \cos 2\pi\theta + i \sin 2\pi\theta$.
In general, any continuous signal can be represented by a linear combination of
orthonormal basis functions (specifically, the basis functions must define a Hilbert
space). Sines and cosines fulfill this requirement and, because of their direct relevance
to describing wave propagation, provide a physically relatable explanation for what
exactly the decomposition does—it describes the frequency content of a signal.
In practice, since real-world signals are sampled from a continuous measurement,
calculation of the Fourier transform is accomplished using a discrete Fourier transform. A number of stable, fast algorithms exist and are staples of any numerical signal
processing analysis software. As long as the Nyquist–Shannon sampling theorem is respected (the sampling rate $f_s$ must be at least twice the maximum frequency content present in the signal), no information about the original signal is lost.
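As a quick illustration of the discrete transform in practice (numpy assumed; the test signal and sampling rate are made up):

```python
import numpy as np

fs = 1000                                  # sampling rate, Hz (assumed)
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

X = np.fft.rfft(x)                         # one-sided discrete Fourier transform
freqs = np.fft.rfftfreq(len(x), d=1 / fs)  # frequency axis in Hz

# The two largest spectral peaks recover the 50 and 120 Hz components.
print(sorted(freqs[np.argsort(np.abs(X))[-2:]]))   # [50.0, 120.0]
```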
1.5.1 Short-Time Fourier Transform
While the Fourier transform allows us to determine the frequency content of a signal,
all time-domain information is lost in the transformation. The spectrum of the audio
recording tells us which frequencies are present but not when those notes were being
played.
The simple solution to this problem is to look at the Fourier transform over a
series of short windows along the length of the signal. This is called the short-time
Fourier transform (STFT), and is implemented as
$$\mathrm{STFT}\{f(t)\}(\tau, \omega) \equiv F(\tau, \omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, \bar{w}(t-\tau)\, e^{-i\omega t}\, dt, \qquad (1.4)$$
where $\bar{w}(t-\tau)$ is a windowing function that is nonzero for only a short time, typically a Hann window, described in the discrete domain by $\bar{w}(n) = \sin^2\!\left(\frac{\pi n}{N-1}\right)$. Since this is an invertible process it is possible to recreate the original signal using an inverse transform, but windowing of the signal makes inversion more difficult.
Taking the squared magnitude of the STFT ($|F(\tau, \omega)|^2$) and displaying the result as
a color-mapped image with frequency on the vertical axis and time on the horizontal
axis shows the evolution of the frequency spectrum as a function of time. These plots
are often referred to as spectrograms, an example of which is shown in Fig. 1.25.
It is important to note that this transformation from the one-dimensional time
domain to a joint time–frequency domain creates a two-dimensional representation
of the signal. Adding a dimension to the problem gives us more information about
our signal at the expense of more difficult analysis.
Fig. 1.25 The spectrogram (bottom) of the William and Mary Alma Mater, performed by the
William and Mary Chorus, provides information about the frequency content of the signal not
present in the time-domain waveform (top)
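A minimal spectrogram sketch using scipy's stft (the signal, sampling rate, and window length here are arbitrary illustrative choices):

```python
import numpy as np
from scipy import signal

fs = 8000                                        # sampling rate, Hz (assumed)
t = np.arange(0, 2.0, 1 / fs)
x = np.where(t < 1.0,                            # one note for a second,
             np.sin(2 * np.pi * 440 * t),        # then a higher one
             np.sin(2 * np.pi * 660 * t))

f, tau, Z = signal.stft(x, fs=fs, window="hann", nperseg=256)
S = np.abs(Z) ** 2                               # |F(tau, omega)|^2

# S is the spectrogram: frequency (f) on one axis, window time (tau) on the other.
print(f[np.argmax(S[:, 10])], f[np.argmax(S[:, -10])])   # ~440 Hz, then ~660 Hz
```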
The more serious limitation of the STFT comes from the uncertainty principle
known as the Gabor limit,
$$\Delta t\, \Delta \omega \ge \frac{1}{2}, \qquad (1.5)$$
which says that a function cannot be both time and band limited. It is impossible
to simultaneously localize a function in both the time domain and the frequency
domain, which leads to resolution issues for the STFT. A short window will provide
precise temporal resolution and poor frequency resolution, while a wide window has
the exact opposite effect.
1.5.2 Other Methods of Time–Frequency Analysis
The development of quantum mechanics in the twentieth century ushered in a number
of alternative time–frequency representations because the mathematics are similar in
the position-momentum and time–frequency domains. One of these is the Wigner–
Ville distribution, introduced in 1932, which maps the quantum mechanical wave
function to a probability distribution in phase space. In 1948, Ville wrote a time–
frequency formulation,
$$W(\tau, \omega) = \int_{-\infty}^{\infty} f\!\left(\tau + \frac{t}{2}\right) f^{*}\!\left(\tau - \frac{t}{2}\right) e^{-i\omega t}\, dt, \qquad (1.6)$$
where $f^{*}(t)$ is the complex conjugate of $f(t)$. This can be thought of as the Fourier
transform of the autocorrelation of the original signal f (t), but because it is not a
linear transform, cross-terms occur when the input signal is not monochromatic.
Gabor also tried to improve the resolution issues with the STFT by introducing
the transform
$$G(\tau, \omega) = \int_{-\infty}^{\infty} e^{-\pi(t-\tau)^2}\, e^{-i\omega t}\, f(t)\, dt, \qquad (1.7)$$
which is basically the STFT with a Gaussian window function. Like the STFT, this
is a linear transformation and there is no problem with cross-terms. By combining
the Wigner–Ville and Gabor transforms, we can mitigate the effects of the cross-terms and improve the resolution of the time–frequency representation. One possible
representation of the Gabor–Wigner transform is
$$D(\tau, \omega) = G(\tau, \omega) \times W(\tau, \omega). \qquad (1.8)$$
The spectrogram (STFT) is rarely the optimal time–frequency representation, but
there are others such as the Wigner (Fig. 1.26) and positive transforms (Fig. 1.27).
We can also use wavelet transforms to form analogous time-scale representations.
There are many mother wavelets to choose from.
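For the curious, here is a rough discrete sketch of the Wigner–Ville distribution of Eq. (1.6); discrete conventions vary (factor-of-two frequency scaling, smoothing windows), so treat it as illustrative rather than a production implementation. The cross-terms discussed above appear immediately for any multicomponent input.

```python
import numpy as np
from scipy.signal import hilbert

def wigner_ville(x):
    """Discrete Wigner-Ville distribution sketch (no smoothing; cross-terms included)."""
    z = hilbert(x)                       # analytic signal tames some aliasing
    N = len(z)
    W = np.zeros((N, N))
    for n in range(N):
        m = min(n, N - 1 - n)            # admissible lags keep n +/- m in range
        lags = np.arange(-m, m + 1)
        acf = np.zeros(N, dtype=complex)
        acf[lags % N] = z[n + lags] * np.conj(z[n - lags])   # f(t+u/2) f*(t-u/2)
        W[:, n] = np.fft.fft(acf).real   # Fourier transform over the lag axis
    return W                             # rows: frequency bins, columns: time
```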
Fig. 1.26 Time-domain waveform is shown (bottom) and its power spectrum (rotated, left) along
with the Wigner transform as a false color time–frequency image
1.5.3 Wavelets
The overarching issue with any of the time–frequency methods is that the basis of
the Fourier transform is chosen with the assumption that the signals to be analyzed
are periodic or infinite in time. Most real-world signals are not periodic but change
character over time. This problem becomes even more clear when looking at finite
signals with sharp discontinuities. Approximating such signals as a linear combination of sinusoids creates overshoot at the discontinuities. The well-known Gibbs
phenomenon is illustrated in Fig. 1.28.
Instead we can use a basis of finite signals, called wavelets [59], to better approximate real-world signals. The wavelet transform is written as
Fig. 1.27 Time-domain waveform is shown (bottom) and its power spectrum (rotated, left) along
with the positive transform as a false color time–frequency image
Fig. 1.28 Attempting to
approximate a square wave
using Fourier components
(sines and cosines) creates
large oscillations near the
discontinuities. Known as
the Gibbs phenomenon, this
overshoot increases with
frequency (as more terms are added to the Fourier series)
but eventually approaches a
finite limit
Fig. 1.29 A signal s(t) is decomposed into approximations (A) and details (D), corresponding to
low- and high-pass filters, respectively. By continually decomposing the approximation coefficients
in this manner and removing the first several levels of details, we have effectively applied a low-pass
filter to the signal
$$W(\tau, s) = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} f(t)\, \psi\!\left(\frac{t - \tau}{s}\right) dt. \qquad (1.9)$$
A comparison to the STFT (1.4) shows that this transform decomposes the signal
not into linear combinations of sines and cosines, but into linear combinations of
wavelet functions ψ(τ, s). We can relate this to the Fourier decomposition (1.3) by
defining the wavelet coefficients
$$c_{jk} = W\!\left(k 2^{-j},\, 2^{-j}\right). \qquad (1.10)$$
Here, $\tau = k2^{-j}$ is referred to as the dyadic position and $s = 2^{-j}$ is called the dyadic dilation. We are decomposing our signal in terms of a wavelet that can
move (position τ ) and deform by stretching or shrinking (scale s). This transforms our
original signal into a joint time-scale domain, rather than a frequency domain (Fourier
transform) or joint time–frequency domain (STFT). Although the wavelet transform
doesn’t provide any direct frequency information, scale is related to the inverse of
frequency, with low-scale decompositions relating to high frequency and vice versa.
This relationship is often exploited to de-noise signals by removing information at
particular scales (Fig. 1.29).
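A minimal de-noising sketch of the scheme in Fig. 1.29, using the PyWavelets package as a stand-in for the MATLAB toolbox we actually use (signal and noise level are made up):

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024)
clean = np.sin(2 * np.pi * 7 * t) * (t > 0.3)        # a signal that switches on
noisy = clean + 0.4 * rng.standard_normal(t.size)

# Decompose into approximation and details, zero the two finest detail levels
# (D1, D2), and recompose: effectively a low-pass filter, as in Fig. 1.29.
coeffs = pywt.wavedec(noisy, "db4", level=5)
coeffs[-1] = np.zeros_like(coeffs[-1])               # D1: finest details
coeffs[-2] = np.zeros_like(coeffs[-2])               # D2
denoised = pywt.waverec(coeffs, "db4")
```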
In addition to representing near-discontinuous signals better than the STFT, the
dyadic (factor-of-two) decomposition of the wavelet transform allows an improvement in time resolution at high frequencies (Fig. 1.30).
Fig. 1.30 The STFT has similar time resolution at all frequencies, while the dyadic nature of the
wavelet transform affords better time resolution at high frequencies (low-scale values)
In the time domain, wavelets are completely described by the wavelet function
(mother wavelet ψ(t)) and a scaling function (father wavelet φ(t)). The scaling
function is necessary because stretching the wavelet in the time domain reduces the
bandwidth, requiring an infinite number of wavelets to accurately capture the entire
spectrum. This is similar to Zeno’s paradox, in which trying to get from point A to
point B by crossing half the remaining distance each step is logically fruitless. The
scaling function is an engineering solution to this problem, allowing us to get close
enough for all practical purposes by covering the rest of the spectrum.
In order to completely represent a continuous signal, we must make sure that
our wavelets form an orthonormal basis. Since as part of the decomposition we
are allowed to scale and shift our original wavelet, we only need to ensure that
the mother wavelet is continuously differentiable and compactly supported. For our
analysis, we typically use the wavelet definitions and transform algorithms included
in MATLAB.3
The Haar wavelet is the simplest example of a wavelet—a discontinuous step
function with uniform scaling function. The Haar wavelet is also the first (db1) of
the Daubechies family of wavelets abbreviated dbN, with order N the number of vanishing moments. Historically, these were the first compactly supported orthonormal
set of wavelets and were soon followed by Daubechies’ slightly modified and least
asymmetric Symlet family. The Coiflet family, also exhibiting vanishing moments,
was also created by Daubechies at the request of other researchers.
The Meyer wavelet has both its scaling and wavelet functions defined in the
frequency domain, but is not technically a wavelet because its wavelet function is
not compactly supported. However, $\psi \to 0$ as $x \to \infty$ fast enough that the pseudo-wavelet is infinitely differentiable. This allows the existence of good approximations for use in discrete wavelet transforms, and we often consider the Meyer and
related Discrete Meyer functions as wavelets for our analysis.
Both the Mexican hat and Morlet wavelets are explicitly defined and have no
scaling function. The Mexican hat wavelet is proportional to the second derivative of the Gaussian probability density function, while the Morlet wavelet is defined as $\psi(x) = C e^{-x^2} \cos(5x)$, with scaling constant $C$.
3 See, for example, https://www.mathworks.com/products/wavelet.html.
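In PyWavelets, which mirrors the MATLAB wavelet definitions closely enough for illustration, the families above can be inspected directly:

```python
import pywt

print(pywt.families())               # ['haar', 'db', 'sym', 'coif', 'bior', ...]
w = pywt.Wavelet("db4")
print(w.vanishing_moments_psi)       # 4, matching the dbN naming convention
phi, psi, x = w.wavefun(level=8)     # father (scaling) and mother wavelet samples

m = pywt.ContinuousWavelet("morl")   # Morlet: explicitly defined, no scaling function
psi_m, x_m = m.wavefun(level=8)
```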
1.6 The Dynamic Wavelet Fingerprint
While alternative time–frequency transformations can improve the resolution limits
of the STFT, they often create their own problems such as the cross-terms in the
Wigner–Ville transform. Combinations of transforms can reduce these effects while
still offering increased resolution, but this then comes at the cost of computational
complexity. Wavelets offer an alternative basis for decomposition that is more suited
to finite real-world signals, but without the direct relationship to frequency.
One of the issues with time–frequency representations of signals is the added
complexity of the resultant time–frequency images. Just as displaying a one-dimensional signal requires a two-dimensional image, viewing a two-dimensional
signal requires a three-dimensional visualization method. Common techniques
include three-dimensional surface plots that can be rotated on a computer screen
or color-mapped two-dimensional images where the value at each point is mapped
to a color.
While these visualizations work well for human interpretation of the images, computers have a difficult time distinguishing between those parts of the image we care
about and those that are just background clutter. This difficulty with image segmentation is especially true for noisy signals. The human visual system is evolutionarily
adapted to be quite good at this4 but computers lack such an advantage. Automated
image segmentation methods work well for scenes where a single object is moving
in a predictable path across a mostly stationary background.
We have developed an alternative time–frequency representation called the
dynamic wavelet fingerprint (DWFP) that we have found useful to reveal subtle
features in noisy signals. This technique takes a one-dimensional time-domain waveform and converts it to a two-dimensional time-scale image [60], generating a pre-segmented binary image that can be analyzed using image processing techniques.
1.6.1 Feature Creation
The DWFP process first filters a one-dimensional signal using a stationary discrete
wavelet transform. This decomposes the signal into wavelet components at a set
number of levels, removes the chosen details, and then uses the inverse stationary
wavelet transform to recompose the signal. The number of levels, details to remove,
and wavelet used for the transform are all user-specified parameters. A Tukey window
can also be applied to the filtered signal at this point to smooth out behavior at the
edges.
Next, the wavelet coefficients are created using a continuous wavelet transform.
The normalized coefficients form a three-dimensional surface, and can be thought
of as “peaks” or “valleys” depending on if the coefficients are positive or negative.
4 Those ancient humans who didn't notice that tiger behind the bush failed to pass their genes on to us.
Slicing this surface (both slice thickness and number of slices are user parameters)
and projecting the slices to a plane generate a two-dimensional binary image. The
vertical axis of this image is scale (inversely related to frequency), and the horizontal axis remains time, allowing direct comparison to the original one-dimensional time-domain signal.
The image often resembles a set of fingerprints (hence the name), but most importantly the image is pre-segmented and can be easily analyzed by standard image
processing techniques. Since the slicing process does not distinguish between peaks (positive coefficients) and valleys (negative coefficients), we can instead do the slicing operation in two steps, keeping the peak and valley projections separate. This
generates two fingerprint images for each signal—one for peaks and one for valleys—
which can be analyzed separately or combined into a (still segmented) ternary image.
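Putting the pieces together, a heavily condensed DWFP sketch (PyWavelets stands in for our MATLAB implementation; the parameter names echo Table 1.1 but the slicing details are simplified):

```python
import numpy as np
import pywt

def dwfp(sig, filt_wvt="db4", num_lvls=4, details_to_remove=(1, 2),
         cwt_wvt="mexh", num_scales=64, num_slices=10, thickness=0.05):
    """Minimal dynamic wavelet fingerprint sketch; parameters are illustrative."""
    # Stationary wavelet filter: zero out the chosen detail levels, then invert.
    n = len(sig)
    x = np.pad(sig, (0, (-n) % (2 ** num_lvls)))    # swt needs length % 2^lvls == 0
    coeffs = pywt.swt(x, filt_wvt, level=num_lvls)  # [(cA_n, cD_n), ..., (cA_1, cD_1)]
    coeffs = [(cA, np.zeros_like(cD) if (num_lvls - i) in details_to_remove else cD)
              for i, (cA, cD) in enumerate(coeffs)]
    filtered = pywt.iswt(coeffs, filt_wvt)[:n]

    # Continuous wavelet transform, normalized so coefficients lie in [-1, 1].
    C, _ = pywt.cwt(filtered, np.arange(1, num_scales + 1), cwt_wvt)
    C /= np.max(np.abs(C))

    # Slice the coefficient surface like a contour map: thin bands of "on" pixels
    # alternate with gaps, separately for peaks (+1) and valleys (-1).
    img = np.zeros(C.shape, dtype=np.int8)
    for k in range(num_slices):
        lo = k / num_slices
        band = (np.abs(C) >= lo) & (np.abs(C) < lo + thickness)
        img[band & (C > 0)] = 1
        img[band & (C < 0)] = -1
    return img                  # ternary fingerprint image: scale x time
```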
A number of additional features can be extracted from this fingerprint image.
Some of the features we extract are functions of time, for example, a simple count of
the number of ridges at each time point. However, many of the features that we want
to extract from the image are tied to a particular individual fingerprint, requiring us
to first identify and consecutively label the individual fingerprints. We use a measure of near-connectedness, in which pixels of the same value within a set distance
of each other are considered connected, to label each individual fingerprint. This
measure works well as long as each individual fingerprint is spatially separated from
its neighbor, something that is not necessarily true for the ternary fingerprint images.
For those cases, we actually decompose the ternary image into two separate binary
images, label each one individually, and then recombine and relabel the two images
(Fig. 1.31).
In some cases, the automated labeling will classify objects as a fingerprint even
though they may not represent our expectation of a fingerprint. While this won’t
affect the end results because such fingerprints won’t contain any useful information,
it can slow down an already computationally intensive process. To reduce these false
fingerprints, an option is added to restrict the allowed solidity range for an object to
be classified as an individual fingerprint.
1.6.2 Feature Extraction
Once the location and extent of each individual fingerprint has been determined, we
apply standard image processing libraries included in MATLAB to extract features
from the image. The resemblance of our images to fingerprints, for which a large
image recognition literature already exists, can be exploited in this process.
The extracted features form parameter waveforms that are then linearly interpolated to facilitate a direct
comparison to the original time-domain signal. Typically, about 25 one-dimensional
parameter waveforms are created for each individual measurement. Some of these
features are explained in more detail below.
Fig. 1.31 To consecutively label the individual fingerprints within the fingerprint image, the valleys
(top left) and peaks (top right) images are first labeled individually and then combined into an overall
labeled image (bottom)
A number of features are extracted from both the raw signal and the wavelet
fingerprint image using the MATLAB image processing toolbox regionprops analysis
to create an optimized feature vector for each instance.
1. Area: Number of pixels in the region
2. Filled Area: Number of pixels in the region, with any holes filled in
3. Extent: Ratio of pixels in the region to pixels in the bounding box (the smallest rectangle that completely encloses the region), calculated as Area / (Area of bounding box)
4. Convex Area: Area of the convex hull (the smallest convex polygon that contains the region)
5. Equivalent Diameter: Diameter of a circle with the same area as the region, calculated as √(4·Area/π)
6. Solidity: Proportion of the pixels in the convex hull that are also in the region, calculated as Area / (Convex Area)
7. xCentroid: Center of mass of the region along the horizontal axis
8. yCentroid: Center of mass of the region along the vertical axis
9. Major Axis Length: Pixel length of the major axis of the ellipse that has the same normalized second central moments as the region
10. Minor Axis Length: Pixel length of the minor axis of the ellipse that has the same normalized second central moments as the region
11. Eccentricity: Eccentricity of the ellipse that has the same normalized second central moments as the region, computed as the ratio of the distance between the foci of the ellipse and its major axis length
12. Orientation: Angle (in degrees) between the x-axis and the major axis of the ellipse that has the same second moments as the region
13. Euler Number: Number of objects in the region minus the number of holes in those objects, calculated using 8-connectivity
14. Ridge Count: Number of ridges in the fingerprint image, calculated by looking at the number of transitions between pixels on and off at each point in time

Table 1.1 List of user parameters in DWFP creation and feature extraction process

Setting          | Options                            | Description
Wavelet filtering
  filtmethod     | filt, filtandwindow, window, none  | How to filter data
  wvtpf          | wavelet name                       | Filtering wavelet
  numlvls        | Z+                                 | Number of levels to filter
  swdtoremove    | [Z+]                               | Details to remove
Wavelet transform
  wvt            | wavelet name                       | Transform wavelet
  ns             | Z+                                 | Number of scales for transform
  normconstant   | Z+                                 | Normalization constant
  numslices      | Z+                                 | Number of slices
  slicethickness | R+                                 | Thickness of each slice
Feature extraction
  saveimages     | binary switch                      | Save fingerprint images?
  fullorred      | full, reduced                      | Require certain solidity
  solidity_range | [R ∈ [0,1], R ∈ [0,1]]             | Allowable solidity range
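Most of the features in the numbered list above have direct analogs in scikit-image's regionprops, so a Python sketch of the extraction step looks like the following (note it uses standard 8-connectivity rather than the near-connectedness labeling described earlier):

```python
import numpy as np
from skimage.measure import label, regionprops

def fingerprint_feature_vectors(img):
    """Per-print regionprops features (skimage analog of MATLAB's regionprops).

    img: ternary fingerprint image from the DWFP sketch above; peaks and
    valleys are labeled separately, mirroring the two-step labeling.
    """
    rows = []
    for sign in (1, -1):
        labeled = label(img == sign, connectivity=2)
        for rp in regionprops(labeled):
            rows.append([rp.area, rp.filled_area, rp.extent, rp.convex_area,
                         rp.equivalent_diameter, rp.solidity,
                         rp.centroid[1], rp.centroid[0],
                         rp.major_axis_length, rp.minor_axis_length,
                         rp.eccentricity, np.degrees(rp.orientation),
                         rp.euler_number])
    # Ridge count vs. time: off-to-on transitions down each column of the image.
    ridge_count = (np.diff((img != 0).astype(int), axis=0) == 1).sum(axis=0)
    return np.array(rows), ridge_count
```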
The user has control of a large number of parameters in the DWFP creation and
feature extraction process (Table 1.1), which affect the appearance of the fingerprint
images, and thus the extracted features. There is no way to tell a priori which combination of parameters will create the ideal representation for a particular application.
Past experience with analysis of DWFP images helps us to avoid an entirely brute
force implementation for many applications. However, in some cases, the signals
to be analyzed are so noisy that humans are incapable of picking out useful patterns in the fingerprint images. For these applications, we use the formal machine
learning language of pattern classification and a computing cluster to run this feature
extraction process in parallel for a large number of parameter combinations.
We first have to choose a basis function, which could be sinusoidal, wavelet, or
whatever. Then we have to identify something(s) in the resulting images that can
be used to make features. The tricky bit is automatically extracting those image
features: it’s computationally expensive. Then we have to somehow downselect the
best image features from all the ones that are promising. You’ll be unsurprised that the
curse of dimensionality comes into play here. All we wanted to do was distinguish
rats from squirrels and somehow we’re now doing wavelet transforms to tell the
difference between scampering-squawking squirrels and scurrying-squeaking rats.
It would seem simpler just to brute force the problem by training a standard machine
learning system with a large library of squirrel versus rat images. Yes, that would be
simpler conceptually. Go try that if you like. We’ll wait.
1.6.3 Edge Computing
Meanwhile, the world is pushing hard into the Internet of Things (IoT) right now.
The sorts of low-power sensor hubs that are in wearables have an amazing amount
of capability, both in terms of sensors and local processing of sensor data which
enables what’s starting to be called distributed machine learning, edge computing,
etc. For the last 30 years, my students and I have been modeling the scattering of
radar, sonar, and ultrasound waves from objects, tissues, materials, and structures.
The reflection, transmission, refraction and diffraction of light, and the conduction
of heat are also included in this body of work. As computers have become more and
more capable, three-dimensional simulations of these interactions have become a key
aspect of sorting out very complex behaviors. Typically, our goal has been to solve
inverse problems where we know what the excitation source is, and some response of
the system is measured. Success is being able to automatically and in (near) real time
deduce the state of the object(s), tissue(s), material(s), and/or structure(s). We also
want quantitative outputs with a resolution appropriate for that particular case. The
mathematical modeling techniques and numerical simulation methods are the same
across a wide range of physical situations and fields of application. Why the sky is blue
[61–65] and how to make a bomber stealthy [66–69] both utilize Maxwell’s equations.
Seismic waves and ultrasonic NDE employ identical equations once feature sizes are
normalized by wavelength. Sonar of whales and submarines is identical to that of
echolocating bats and obstacle-avoiding robots.
In general, inverse scattering problems are too hard, but bats do avoid obstacles
and find food. Radar detects inbound threats and remotely estimates precipitation.
Cars park themselves and (usually) stop before impact even if the human driver is
not paying close attention and I’m jaywalking with a cup of WaWa coffee trying not
to spill. Medical imaging is now so good that we find ourselves in an overdiagnosis
dilemma. Computed tomography is familiar to most people from X-ray CT scanners used in medicine and baggage screening. In a different configuration, CT is a
workhorse method for seismic wave exploration for natural resources.
We adapted these methods for ultrasonic guided-wave characterization of flaws in large engineered structures like aircraft and pipelines. By extracting features from
signals that have gone through the region of interest in many directions, the reconstruction algorithms output a map of the quantities of interest, e.g., tissue density
or pipe wall thickness. The key is understanding which features to extract from the
signals for use in tomographic reconstruction, but that understanding comes from
an analysis of the signal energy interacting with the tissue, structure, or material
variations that matter.
For many years now, we’ve been developing the underlying physics to enable
robots to navigate the world around them. We tend to focus on those sensors and
imagers where the physics is interesting, and then develop the machine learning
algorithms to allow the robots to autonomously interpret their sensors and imagers.
For autonomous vehicles, we now say IoV instead, and there the focus is sensors/interpretation, but also communication among vehicles which is how an
autonomous vehicle sees around corners and such.
The gadgets used for computed tomography tend to be large and expensive and
often immobile, but their primary constraint is measurement geometries that can
be overly restrictive. Higher and higher fidelity reconstructions require more and
more ray paths, each of which is a signal from which features must be extracted
in (near) real time. Our primary effort over the last two decades has been signal
processing methods to extract relevant features from these sorts of signals, and we
eventually began calling these sets of algorithms narrow-AI expert systems. They’re
always carefully tailored and finely tuned to the particular measurement scenario of
interest at that time, and they use whatever understanding of the appropriate scattering
problem we can bring to bear.
1.7 Will the Real Will West Please Step Forward?
In colonial India, British bureaucrats had a problem. They couldn’t tell one native
pensioner from another, so when one died another person would show up and continue
to collect that pension. Photo ID wasn’t a thing yet, and illiterate Indians couldn’t
sign for their cash. Fingerprints provided the answer, because there is this belief that
they are unique to each individual.
In Victorian times, prisons switched over from punishment to rehabilitation, but
if you’re going to give harsher sentences to repeat offenders you have to be able
to tell whether someone has been to prison before. Again, without photography
and hence no ability to store and transmit images, it would be pretty simple to avoid
harsher and harsher sentences by giving a fake name every subsequent time you were
arrested. The question then becomes: What features of individuals can be measured
to uniquely identify them? Hairstyle or color won’t work. Eye color would work, but
weight wouldn’t. Scars, tattoos, etc. should work if there are any. Alphonse Bertillon
[71] had the insight that bones don’t change after you’ve stopped growing, so a list of
your particulars including this kind of measurement should allow you to be uniquely
identified. His system included 11 measurements:
Fig. 1.32 In 1903, a prisoner named Will West arrived at Leavenworth. William West had been
serving a life sentence at Leavenworth since 1901. They had identical Bertillon measurements.
Fingerprints were distinct, though. See: [70]
• Height,
• Stretch: Length of body from left shoulder to right middle finger when arm is
raised,
• Bust: Length of torso from head to seat taken when seated,
• Length of head: Crown to forehead,
• Width of head: temple to temple,
• Length of right ear,
• Length of left foot,
• Length of left middle finger,
• Length of left cubit: elbow to tip of middle finger,
• Width of cheeks, and
• Length of left little finger.
This can be pretty expensive to do, and can’t distinguish identical twins: Will West and
William West were both in Leavenworth and had identical Bertillon measurements,
Fig. 1.32. Objections to this system included the cost of instruments employed and
their liability to become out of order; the need for specially instructed measurers
of superior education; errors frequently crept in when carrying out the processes
and were all but irremediable; and modesty for women. The last objection was a
problem in trying to quantify recidivism among prostitutes. The consensus was that
a small number of repeat offenders were responsible for a large proportion of crimes,
and that these “female defective delinquents” spread STDs. Bertillon anthropometry
required a physical intimacy between the operator and the prisoner’s body that was
not deemed appropriate for female prisoners. This was a time when doctors did
make house calls, but typically wouldn’t have their female patients disrobe at all and
may not even lay on hands because such things would be immodest. It’s no wonder
that a common diagnosis was “women’s troubles.” Also, a problem for Bertillonage
was that bouffant hairstyles throw off height measures. When people complained to
Bertillon, he responded that of course you can’t make it work, you’re not French.
Fingerprints seemed like a better solution all around. They also have the advantage
of being cheap and easy, and the Henry fingerprint classification system allowed
fingerprint information to be written down and filed for later use, and also sent by
telegraph or letter:
• Basic patterns: arch, loop, whorl, and composite.
• Fingers numbered 1–10 starting at the left pinkie.
• Primary classification: a fraction with odd-numbered fingers over even-numbered fingers (see the sketch after this list).
• Add 1, 2, 4, 8, 16 when whorls appeared.
• 1/1 is zero whorls, 32/32 is ten whorls.
• Filing cabinets had 32 rows and 32 columns.
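As a sketch, the primary classification described in the list above reduces to a few lines of Python (finger numbering and pair values follow the text; treat the details as illustrative):

```python
# Primary classification sketch: fingers 1-10, whorl values 16, 8, 4, 2, 1
# assigned pairwise, odd-numbered fingers in the numerator.
PAIR_VALUE = {1: 16, 2: 16, 3: 8, 4: 8, 5: 4, 6: 4, 7: 2, 8: 2, 9: 1, 10: 1}

def henry_primary(patterns):
    """patterns: dict mapping finger number (1-10) to 'arch'|'loop'|'whorl'|'composite'."""
    num = 1 + sum(PAIR_VALUE[f] for f, p in patterns.items() if p == "whorl" and f % 2)
    den = 1 + sum(PAIR_VALUE[f] for f, p in patterns.items() if p == "whorl" and not f % 2)
    return f"{num}/{den}"        # 1/1 for zero whorls ... 32/32 for ten whorls

print(henry_primary({f: "whorl" for f in range(1, 11)}))   # 32/32
print(henry_primary({f: "loop" for f in range(1, 11)}))    # 1/1
```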
Secondary classification indicated arches, tented arches, radial loops, or ulnar loops,
with right-hand numerators and left denominators. Further subclassification of whorls
was done by ridge tracing since whorls originate in two deltas. Loops were subclassified by ridge counting, i.e., the number of ridges between delta and core. Hence,
5/17 R/U OO/II 19/8 for the author of a surprisingly interesting history of
fingerprinting [72] who has whorls in both thumbs; high-ridge-count radial loops in
the right index and middle fingers; low-ridge-count ulnar loops in the left index and
middle fingers; nineteen-ridge-count loop in the right little finger; and eight-ridge-count loop in the left little finger. This pattern classification system worked pretty
well until the database got large enough that searching it became impractical. There
was also a terrific battle for control over the official fingerprint repository between
New York and the FBI. You can guess who won that. The key message is that the
way biometrics have always worked is not by comparing images, but by comparing
features extracted from them. In fingerprint analysis that’s the way it works to this
day, with the language being the number of points that match. People still believe
that fingerprints are unique to the individual, and don’t realize that extracting features from fingerprints can be quite subjective. I wonder what a jury would say if the
suspect’s cubit (length from elbow to fingertip) was used as proof of guilt (Fig. 1.33).
1.7.1 Where Are We Headed?
So you see, machine learning is the modern lingo for what we’ve always been trying
to do. Sometimes we’ve called it pattern classification or expert systems, but the
key issue is determining what’s what from measured data streams. Machine learning
is usually divided into supervised and unsupervised learning. Supervised learning
requires training data with known class labels, and unless one has a sufficient amount
of relevant training data these methods will return erroneous classifications. Unsupervised learning can reveal structures and inter-relations in data with no class labels
required for the data. It can be used as a precursor for supervised learning, but can
also uncover the hidden thematic structure in collections of documents, phone conversations, emails, chats, photos, videos, etc.
This is important because more than 90% of all data in the digital universe is
unstructured. Most people have an idea that every customer service phone call is
now monitored, but that doesn’t mean that a supervisor is listening in, it means
Fig. 1.33 From top left to bottom right: loop, double loop, central pocket loop, plain whorl, plain
arch, and tented arch [72]
that computers are hoovering up everything and analyzing the conversations. Latent
topics are the unknown unknowns in unstructured data, and contain the most challenging insights for humans to uncover. Topic modeling can be used as a part of the
human–machine teaming capability that leverages both the machine’s strengths to
reveal structures and inter-relationships, and the human’s strengths to identify patterns and critique solutions using prior experiences. Listening to calls and/or reading
documents will always be the limiting factors if we depend on humans with human
attention spans, but topic modeling allows the machines to plow through seemingly
impossible amounts of data to uncover the unknown unknowns that could lead to
actionable insights. Of course, we don’t want to do this in a truly unsupervised fashion, because we’ve been trying to mathematically model and numerically simulate
complicated systems (and their responses) for decades.
Our current work is designed to exploit those decades of work to minimize the
burden of getting a sufficiently large and representative set of training data with
assigned class labels. We still have both supervised and unsupervised machine learning approaches working in tandem, but now we invoke insight provided by mathematical models and numerical simulations to add what we sometimes call model-assisted
learning. One area that many researchers are beginning to actively explore is health
care, with the goal of replacing the human scribes with machine learning systems
that transcribe conversations between and among patients, nurses, doctors, etc. in
real time while also drawing upon the patients’ digital medical records, scans, and
current bodily function monitors. Recall Fig. 1.9 where we imagined doing this 20
years ago.
Most people have heard about Moore’s Law even if they don’t fully appreciate
that they’re walking around with a semi-disposable supercomputer in their pocket
and perhaps even on their wrist. What’s truly new and amazing, though, is the
smarts contained in the low-power sensor hubs that every tablet, smartphone, smartwatch, etc. are built around. Even low-end fitness trackers do so much more than the
old-fashioned pedometers handed out in gym class. On-board intelligence defeats
attempts to “finish your PE homework” on the school-bus ride home with the old
shakey-shakey, even if some of the hype around tracking sleep quality and counting
calories burned is a bit overblown. These modern MEMS sensor hubs have local
processors to interpret the sensor data streams in an efficient enough manner that the
battery draw is reduced by 90% compared to utilizing the main (general purpose)
CPU. Of course, processing sensor data streams to extract features of interest is at
the heart of machine learning. Now, however, sensor hubs and processing power are
so small and cheap that we can deploy lots and lots of these things—connected to
each other wirelessly and/or via the Internet—and do distributed machine learning
on the Internet of Things (IoT). There’s been a lot of hype about the IoT because
even non-technical business-channel talking head experts understand that something
new is starting to happen and there’s going to be a great deal of money to be made
(and lost).
The hardware to acquire, digest, and share sensor data just plummeted from $10^4 to $10^1, and that trend line seems to be extending. Music and video streaming on-demand
are the norm, even in stadiums where people have paid real money to experience live
entertainment. Battery fires are what make the news, but my new smoke detector lasts
a decade without needing new batteries when I change the clocks twice a year which
happens automatically these days anyway. Even radar has gotten small enough to put on a COTS drone, and not just a little coffee can radar but a real phased array radar using meta-materials to sweep the beam in two directions with an antenna-lens combination about the size and weight of an iPad, at a cost already an order of magnitude less than before. For this drone radar to be able to sense-and-avoid properly, though, the algorithms are going to need to improve because small drones operate in a cluttered environment, which is why their output looks a lot like B-mode ultrasound. While it's true that power lines and barbed-wire fences backscatter radar similarly, the fence will always have bushes, trees, cows, etc. that confound the signal. This clutter problem is a key issue for autonomous ground robots, because there's no clear distinction between targets and clutter and some (me) would say pedestrians are neither (Fig. 1.34).

Fig. 1.34 Kids today may only ever get to drive Cozy Coupes, since by the time they grow up autonomous vehicles should be commonplace. I'll expect all autonomous vehicles to brake sharply if I step out into the street without looking
References
1. 1,340 Perish as Titanic sinks, only 886, mostly women and children, rescued. New York Tribune,
New York. Page 1, Image 1, col. 1. Accessed 16 Apr 1912
2. Maxim SH (1912) Preventing collisions at sea, a mechanical application of the bat’s sixth sense.
Sci Am 80–82. Accessed 27 July 1912
3. Maxim SH (1912) A new system of preventing collisions at sea. Cassel and Co., London, 147
p
4. A new system of preventing collisions at sea. Nature 89(2230):542–543
5. Dijkgraaf S (1960) Spallanzani’s unpublished experiments on the sensory basis of object perception in bats. Isis 51(1):9–20. JSTOR. www.jstor.org/stable/227600
6. Griffin DR (1958) Listening in the dark: the acoustic orientation of bats and men. Yale University Press, New Haven. Paperback – Accessed 1 Apr 1986. ISBN-13: 978-0801493676
7. Grinnell AD (2018) Early milestones in the understanding of echolocation in bats. J Comp
Physiol A 204:519. https://doi.org/10.1007/s00359-018-1263-3
8. Donald R. Griffin obituary. http://www.nytimes.com/2003/11/14/nyregion/donald-r-griffin-88-dies-argued-animals-can-think.html
9. Donald R. Griffin: 1915–2003. Photograph by Greg Auger. Bat Research News 45(1) (Spring
2004). http://www.batresearchnews.org/Miller/Griffin.html
10. Au WWL (1993) The sonar of dolphins. Springer, New York
11. Brittain JE (1985) The magnetron and the beginnings of the microwave age. Physics Today
38:7, 60. https://doi.org/10.1063/1.880982
12. Buderi R (1998) The invention that changed the world: how a small group of radar pioneers
won the second world war and launched a technical revolution. Touchstone, Reprint edition.
ISBN-13: 978-0684835297
13. Conant J (2002) Tuxedo park: a wall street tycoon and the secret palace of science that changed
the course of world war II. Simon and Schuster, New York
14. Denny M (2007) Blip, ping, and buzz: making sense of radar and sonar. Johns Hopkins University Press, Baltimore. ISBN-13: 978-0801886652
15. Bowman JJ, Senior TBA, Uslenghi PLE, Asvestas JS (1970) Electromagnetic and
acoustic scattering by simple shapes. North-Holland Pub. Co., Amsterdam. Paperback edition:
CRC Press, Boca Raton. Accessed 1 Sept 1988. ISBN-13: 978-0891168850
16. Grier DA (2005) When computers were human. Princeton University Press, Princeton
17. The human computer project needs help finding all of the women who worked as computers
or mathematicians at the NACA or NASA. https://www.thehumancomputerproject.com/
18. Anderson VC (1950) Sound scattering from a fluid sphere. J Acoust Soc Am 22:426. https://
doi.org/10.1121/1.1906621
19. NASA Dryden Flight Research Center Photo Collection (1949) NASA Photo: E49-54. https://
www.nasa.gov/centers/dryden/multimedia/imagegallery/Places/E49-54.html
20. Covert A (2011) Philco mystery control: the world’s first wireless remote. Gizmodo.
Accessed 11 Aug 2011. https://gizmodo.com/5857711/philco-mystery-control-the-worlds-first-wireless-remote
21. “Bombshell: the Hedy Lamarr story” Director: Alexandra Dean opened in theaters on November 24, 2017. http://www.pbs.org/wnet/americanmasters/bombshell-hedy-lamarr-story-fullfilm/10248/, https://zeitgeistfilms.com/film/bombshellthehedylamarrstory. Photo credit to
https://twitter.com/Intel - Accessed 11 Mar 2016
22. Marr B (2016) A short history of machine learning – every manager should read.
Forbes. Accessed 19 Feb 2016. https://www.forbes.com/sites/bernardmarr/2016/02/19/a-short-history-of-machine-learning-every-manager-should-read/65578fb215e7
23. Gonzalez V (2018) A brief history of machine learning. Synergic Partners. Accessed Jun 2018.
http://www.synergicpartners.com/en/espanol-una-breve-historia-del-machine-learning
24. Johnson D (2017) Find out if a robot will take your job. Time. Accessed 19 Apr 2017. http://
time.com/4742543/robots-jobs-machines-work/
25. Alan Turing: the enigma. https://www.turing.org.uk/
26. Professor Arthur Samuel. https://cs.stanford.edu/memoriam/professor-arthur-samuel
27. “Professor’s perceptron paved the way for AI – 60 years too soon”. https://news.cornell.edu/
stories/2019/09/professors-perceptron-paved-way-ai-60-years-too-soon
28. Minsky M, Professor of media arts and sciences. https://web.media.mit.edu/~minsky/
29. DeJong G (a.k.a. Mr. EBL). http://mrebl.web.engr.illinois.edu/
30. Sejnowski T, Professor and computational neurobiology laboratory head. https://www.salk.
edu/scientist/terrence-sejnowski/
31. Foote KD (2017) A brief history of deep learning. Dataversity. Accessed 7 Feb 2017. http://
www.dataversity.net/brief-history-deep-learning/
32. US Food and Drug Administration (2019) Proposed regulatory framework for modifications to
artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD)
- discussion paper and request for feedback. www.fda.gov
33. Philips (2019) Adaptive intelligence. The case for focusing AI in healthcare on people, not
technology. https://www.usa.philips.com/healthcare/resources/landing/adaptive-intelligence-in-healthcare
34. Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, Mahendiran T, Moraes G, Shamdas
M, Kern C, Ledsam JR, Schmid MK, Balaskas K, Topol EJ, Bachmann LM, Keane PA, Denniston AK (2019) A comparison of deep learning performance against health-care professionals
in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet
Digit Health 1(6):e271–e297. https://doi.org/10.1016/S2589-7500(19)30123-2
35. Krupinski EA, Graham AR, Weinstein RS (2013) Characterizing the development of visual
search expertise in pathology residents viewing whole slide images. Hum Pathol 44(3):357–
364. https://doi.org/10.1016/j.humpath.2012.05.024
36. Janowczyk A, Madabhushi A (2016) Deep learning for digital pathology image analysis: a
comprehensive tutorial with selected use cases. J Pathol Inform 7:29
37. Roy S, Kumar Jain A, Lal S, Kini J (2018) A study about color normalization methods for
histopathology images. Micron 114:42–61. https://doi.org/10.1016/j.micron.2018.07.005
38. Komura D, Ishikawa S (2018) Machine learning methods for histopathological image analysis.
Comput Struct Biotechnol J 16:34–42. https://doi.org/10.1016/j.csbj.2018.01.001
39. Landau MS, Pantanowitz L (2019) Artificial intelligence in cytopathology: a review of the
literature and overview of commercial landscape. J Am Soc Cytopathol 8(4):230–241. https://
doi.org/10.1016/j.jasc.2019.03.003
40. Kannan S, Morgan LA, Liang B, Cheung MKG, Lin CQ, Mun D, Nader RG, Belghasem ME,
Henderson JM, Francis JM, Chitalia VC, Kolachalama VB (2019) Segmentation of glomeruli
within trichrome images using deep learning. Kidney Int Rep 4(7):955–962. https://doi.org/
10.1016/j.ekir.2019.04.008
41. Niazi MKK, Parwani AV, Gurcan MN (2019) Digital pathology and artificial intelligence.
Lancet Oncol 20(5):e253–e261. https://doi.org/10.1016/S1470-2045(19)30154-8
42. Wang S, Yang DM, Rong R, Zhan X, Xiao G (2019) Pathology image analysis using segmentation deep learning algorithms. Am J Pathol 189(9):1686–1698. https://doi.org/10.1016/
j.ajpath.2019.05.007
43. Wang X et al (2019) Weakly supervised deep learning for whole slide lung cancer image
analysis. IEEE Trans Cybern. https://doi.org/10.1109/TCYB.2019.2935141
44. Abels E, Pantanowitz L, Aeffner F, Zarella MD, van der Laak J, Bui MM, Vemuri VN, Parwani AV, Gibbs J, Agosto-Arroyo E, Beck AH, Kozlowski C (2019) Computational pathology
definitions, best practices, and recommendations for regulatory guidance: a white paper from
the Digital Pathology Association. J Pathol 249:286–294. https://doi.org/10.1002/path.5331
45. Tajbakhsh N, Jeyaseelan L, Li Q, Chiang J, Wu Z, Ding X (2019) Embracing imperfect datasets:
a review of deep learning solutions for medical image segmentation. https://arxiv.org/abs/1908.
10454
46. Janke J, Castelli M, Popovic A (2019) Analysis of the proficiency of fully connected neural
networks in the process of classifying digital images. Benchmark of different classification
algorithms on high-level image features from convolutional layers. Expert Syst Appl 135:12–
38. https://doi.org/10.1016/j.eswa.2019.05.058
47. Faes L, Wagner SK, Fu DJ, Liu X, Korot E, Ledsam JR, Back T, Chopra R, Pontikos N, Kern C,
Moraes G, Schmid MK, Sim D, Balaskas K, Bachmann LM, Denniston AK, Keane PA (2019)
Automated deep learning design for medical image classification by health-care professionals
with no coding experience: a feasibility study. Lancet Digit Health 1(5):e232–e242. https://
doi.org/10.1016/S2589-7500(19)30108-6
48. McKinney SM, Sieniek M, Godbole V et al (2020) International evaluation of an AI system
for breast cancer screening. Nature 577:89–94. https://doi.org/10.1038/s41586-019-1799-6
49. Marks M (2019) The right question to ask about Google’s project nightingale. Slate. Accessed
20 Nov 2019. https://slate.com/technology/2019/11/google-ascension-project-nightingale-emergent-medical-data.html
50. Copeland R, Mattioli D, Evans M (2020) Paging Dr. Google: how the tech giant is laying
claim to health data. Wall Str J. Accessed 11 Jan 2020. https://www.wsj.com/articles/paging-dr-google-how-the-tech-giant-is-laying-claim-to-health-data-11578719700
51. Photo from https://www.reddit.com/r/cablegore/ but it gets reposted quite a lot
52. Rous SN (2002) The prostate book, sound advice on symptoms and treatment. W. W. Norton
& Company, Inc., New York. ISBN 978-0-393-32271-2
53. Imani F et al (2015) Computer-aided prostate cancer detection using ultrasound RF time series.
In vivo feasibility study. IEEE Trans Med Imaging 34(11):2248–2257. https://doi.org/10.1109/
TMI.2015.2427739
54. Welch HG, Schwartz L, Woloshin S (2012) Overdiagnosed: making people sick in the pursuit
of health, 1st edn. Beacon Press, Boston. ISBN-13: 978-0807021996
55. Holtzmann Kevles B (1998) Naked to the bone: medical imaging in the twentieth century, Reprint edn. Basic Books, New York. ISBN-13: 978-0201328332
56. Agarwal S, Milch B, Van Kuiken S (2009) The US stimulus program: taking medical records
online. McKinsey Q. https://www.mckinsey.com/industries/healthcare-systems-and-services/
our-insights/the-us-stimulus-program-taking-medical-records-online
57. Pizza Rat is the nickname given to a rodent that became an overnight Internet sensation after it
was spotted carrying a slice of pizza down the stairs of a New York City subway platform
in September 2015. https://knowyourmeme.com/memes/pizza-rat
58. Surprised Squirrel Selfie image at https://i.imgur.com/Tl1ieNZ.jpg. https://www.reddit.com/
r/aww/comments/4vw1hk/surprised_squirrel_selfie/. This was a PsBattle: a squirrel surprised
by a selfie three years ago
59. Daubechies I (1992) Ten lectures on wavelets. Society for Industrial and Applied Mathematics.
https://epubs.siam.org/doi/abs/10.1137/1.9781611970104
60. Hou J (2004) Ultrasonic signal detection and characterization using dynamic wavelet fingerprints. Doctoral dissertation, William and Mary, Department of Applied Science
61. Howard JN (1964) The Rayleigh notebooks. Appl Opt 3:1129–1133
62. Strutt JW (1871) On the light from the sky, its polarization and colour. Philos Mag XLI:107–
120, 274–279
63. van de Hulst HC (1981) Light scattering by small particles. Dover books on physics. Corrected
edition. Accessed 1 Dec 1981. ISBN-13: 978-0486642284
64. Kerker M (1969) The scattering of light and other electromagnetic radiation. Academic, New
York
65. Bohren C, Huffman D (2007) Absorption and scattering of light by small particles. Wiley, New
York. ISBN: 9780471293408
66. Knott EF, Tuley MT, Shaeffer JF (2004) Radar cross section. Scitech radar and defense, 2nd
edn. SciTech Publishing, Raleigh
67. Richardson D (1989) Stealth: deception, evasion, and concealment in the air. Orion Books,
London. ISBN-13: 978-0517573433
68. Sweetman B (1986) Stealth aircraft: secrets of future airpower. Motorbooks Intl, London.
ISBN-13: 978-0879382087
69. Kenton Z (2016) Stealth aircraft technology. CreateSpace Independent Publishing Platform,
Scotts Valley. ISBN-13: 978-1523749263
70. Mistaken identity. Futility Closet. Accessed 29 Apr 2011. http://www.futilitycloset.com/2011/
04/29/mistaken-identity-2/
71. Cellania M (2014) Alphonse Bertillon and the identity of criminals. Ment Floss. Accessed
21 Oct 2014. https://www.mentalfloss.com/article/59629/alphonse-bertillon-and-identitycriminals
72. Cole SA (2002) Suspect identities a history of fingerprinting and criminal identification. Harvard University Press, Cambridge. ISBN 9780674010024
Chapter 2
Intelligent Structural Health Monitoring
with Ultrasonic Lamb Waves
Mark K. Hinders and Corey A. Miller
Abstract Structural health monitoring is a branch of machine learning where we
automatically interpret the output of in situ sensors to assess the structural integrity
and remaining useful lifetime of engineered systems. Sensors can often be permanently placed in locations that are inaccessible or dangerous, and thus not appropriate for traditional nondestructive evaluation techniques where a technician both
performs the inspection and interprets the output of the measurement. Ultrasonic
Lamb waves are attractive because they can interrogate large areas of structures with
a relatively small number of sensors, but the resulting waveforms are challenging to
interpret even though these guided waves have the property that their propagation
velocity depends on remaining wall thickness. Wavelet fingerprints provide a method
to interpret these complex, multi-mode signals and track changes in arrival time that
correspond to thickness loss due to inevitable corrosion, erosion, etc. Guided waves
follow any curvature of plates and shells, and will interact with defects and structural features on both surfaces. We show results on samples from aircraft and naval
structures.
Keywords Lamb wave · Wavelet fingerprint · Machine learning · Structural
health monitoring
2.1 Introduction to Lamb Waves
Ultrasonic guided waves are naturally suited to structural health monitoring of
aerospace and naval structures, since large areas can be inspected with a relatively
small number of transducers operating in various pitch-catch and/or pulse-echo scenarios. They are confined to the plate-/shell-/pipe-like structure itself and so follow its
shape and curvature, with sensitivity to material discontinuities at either surface or in
the interior. In structures, these guided waves are typically referred to as Lamb waves,
a terminology we will tend to use here. Lamb waves are multi-modal and dispersive.
M. K. Hinders (B) · C. A. Miller
Department of Applied Science, William & Mary, Williamsburg, VA, USA
e-mail: hinders@wm.edu
Fig. 2.1 Dispersion curves for an aluminum plate. Solutions to the Rayleigh–Lamb wave equations are plotted here for both the symmetric (solid lines) and antisymmetric (dashed lines) families of modes, for both phase and group velocities
When plotted versus a combined frequency-thickness parameter, Fig. 2.1, the phase
and group velocities of the symmetric and antisymmetric families of modes are as
shown for aluminum plates, although other structural materials have similar behavior.
With the exception of the zeroth-order modes, all Lamb wave modes have a cutoff
frequency-thickness value where their phase and group velocities tend to infinity and
zero, respectively, and hence below which those modes do not propagate.
Fig. 2.2 Lamb wave tomography allows the reconstruction of thickness maps in false-color images
because slowness is related to plate thickness via the dispersion curves. Each reconstruction requires
many criss-cross Lamb wave measurements with arrival times of the modes of interest automatically
extracted from the recorded waveforms
Fig. 2.3 A typical Lamb waveform with modes of interest indicated
The characteristic change of group velocity with thickness is what makes Lamb waves so useful for detecting flaws, such as corrosion and disbonds, that represent an effective change in thickness. We exploited these properties years ago in making Lamb wave tomographic reconstructions, as shown in false-color thickness maps in Fig. 2.2 for a partially delaminated doubler (left), a dished-out circular thinning (middle), and a circular flat-bottom hole (right).
The key technical challenge to using Lamb waves effectively turns out to be
automatically identifying which modes are which in very complex waveform signals.
A typical signal for an aluminum plate is shown in Fig. 2.3, with modes marked
according to the group velocities from the dispersion curve. In sharp contradistinction
to traditional bulk-wave ultrasonics, it’s pretty clear that peak-detection sorts of
approaches will fail rather badly. Although the dispersion curves tell us that the S2 mode is fastest, for this particular experimental case that mode is not generated with much energy and its amplitude is more or less in the noise. The A0, S0, and A1 modes have higher amplitude, but those three modes all have about the same group velocity so they arrive jumbled together in time. Angle
Fig. 2.4 Lamb wave scattering from flaws is inherently a three-dimensional process, as can be seen
from the screenshots of simulations of Lamb waves interacting with an attached thickness (left) and
a rectangular thinning (right)
blocks, comb transducers, etc. can be employed to select purer modes, or perhaps a
wiser choice of frequency-thickness product can give easier to interpret signals. For
example, fd = 4 MHz-mm could give a single fast S1 mode with all other modes much slower. Of course, most researchers simply choose a value below fd = 2 MHz-mm where all but the two fundamental modes are cut off. Some even go much lower in frequency where the A0 mode is nearly cut off and the S0 mode is not very dispersive, although that tends to minimize the most useful aspect of Lamb waves for flaw detection, which is that the different modes each have different through-thickness displacement profiles. Optimal detection of various flaw types depends on the choice of modes with displacement profiles that will interact strongly with the flaw, i.e., scatter from it. Moreover, this scattering interaction will cause mode mixing to
occur which can be exploited to better identify, locate, and size flaws. Lamb wave
scattering from flaws is inherently a three-dimensional process, as can be seen from
the screenshots of simulations of Lamb waves interacting with an attached thickness
(left) and a rectangular thinning (right) in Fig. 2.4.
2.2 Background
There is a large literature on the use of Lamb waves for nondestructive evaluation
and structural health monitoring. Mathematically, they were first described by Lamb
in 1917 [1] with experimental confirmation at ultrasonic frequencies published by
Worlton in 1961 [2]. Viktorov’s classic book [3] still provides one of the best practical descriptions of the use of Lamb waves although the Dover edition of Graff’s
book [4] is more widely available and gives an excellent discussion of the mathematical necessities. Rose’s more recent text [5] is also quite popular with Lamb wave
researchers. Overviews of this research area can be found in several recent review
articles [6–11] and books such as [12, 13].
There are two distinct guided wave propagation approaches available for structural
health monitoring. The first is characterized by trying to simplify the signal interpretation by selectively generating a single pure guided wave mode either by using
more complex transducers, e.g., angle blocks or interdigitated comb transducers, or
by selecting a low-enough frequency-thickness value that only the two fundamental guided wave modes propagate. The second approach aims to minimize the size and complexity
of the transducer itself but to then choose a particular range of frequency-thickness
values where the multi-mode guided wave signals are still manageable. The first
approach also typically uses a tone-burst excitation in order to narrow the frequency
content of the signals, whereas the second approach allows either a spike excitation or a walking tone-burst scheme with a broader frequency content. Which is the
appropriate choice depends on the specific SHM application, although it’s important
to point out that rapid advances in signal processing hardware and algorithms mean
that the second approach is the clear winner over the long term.
The “pure mode” approach is inherently limited by trying to peak detect as is
done in traditional ultrasound. Also, for scattering flaws such as cracks it’s critical
both to have some bandwidth to the signals, so that the frequency dependence of
scattering can be exploited to size flaws, and to be able to interpret multi-mode
signals because guided wave mode conversion is the most promising approach to
identify flaws without the need to resort to baseline subtraction.
Many structural health monitoring approaches still use baseline subtraction [14–
21] which requires one or more reference signals recorded from the structure in an
unflawed state. Pure modes [22–32] and/or dispersion and temperature correction
[33–39] are also often employed to simplify the signal interpretation, although mode
conversion during scattering at a crack often enables baseline-free approaches [40–
55] as does the time-reversal technique [56–64].
2.3 Simulation Methods for SHM
In optimizing guided wave SHM approaches and defining signal processing strategies, it’s important to be able to simulate the propagation of guided waves in structures
and to be able to visualize how they interact with flaws. Most researchers tend to
use one of the several commercially available finite element method (FEM) packages for this or the semi-analytic finite element (SAFE) technique [65–80]. Not very
many researchers are doing full 3D simulations, since this typically requires computer clusters rather than desktop workstations in order to grid the full volume of the
simulation space at high enough spatial resolution to accurately capture multi-mode
wave behavior as well as the flaw geometry. This is an issue because guided wave
interaction with flaws is inherently a 3D process, and two-dimensional analyses can
give quite misleading information. The primary lure of FEM packages is their ability
to model complex structures, although this is less of an advantage when simulating
guided wave propagation.
A more attractive approach turns out to be variations of the finite-difference time-domain (FDTD) method [81, 82] where the elastodynamic field equations and boundary conditions are discretized directly and the simulation is stepped forward in time,
recording the stress components and/or displacements across the 3D Cartesian grid
at each time step. Fellinger et al. originally developed the basic equations of the elastodynamic finite integration technique (EFIT) along with a unique way to discretize
the material parameters for ensured continuity of stress and displacement across the
staggered grid in 1995 [83]. Schubert et al. demonstrated the flexibility of EFIT
with discretizations in Cartesian, cylindrical, and spherical coordinates for a wide
range of modeling applications [84–86]. Although commercial codes aren’t mature,
these approaches are relatively straightforward to implement on either multi-core
workstations or large computer clusters in order to have sufficient processing power
and memory to perform high-resolution 3D simulations of realistic structures. In
our experience [87–92], the finite integration technique tends to be quite a bit faster
than the finite element method because the computationally intensive meshing step
is eliminated by using a simple uniform Cartesian grid. For pipe inspection, a different, but only slightly more complex, cylindrical discretization of the field equations
and boundary conditions is used, which optimizes cylindrical EFIT for simulating
guided wave propagation in pipe-like structures. In addition to simple plate-like or
pipe-like structures, EFIT can also be used to simulate wave propagation in complex built-up structures. Various material combinations, interface types, etc. can be
simulated directly with cracks and delaminations introduced by merely adjusting the
boundary conditions at the appropriate grid points.
Simulation methods are also critical to optimizing guided wave structural health
monitoring because the traditional approaches for modeling scattering from canonical flaws [93–95] fail for guided waves or are limited to artificial 2D situations. For
low-enough frequency-thickness, the Mindlin plate theory [96] allows for analytical approaches to Lamb wave scattering from simple through-thickness holes and
such, and can even account for some amount of mode conversion, but at the cost
of assuming a simplistic through-thickness displacement profile. Most researchers
have typically used boundary element method (BEM) and related integral equation
approaches to simulate 2D scattering, but these are in some sense inherently low-frequency methods. The sorts of high-frequency approaches that served the radar
community so well until 3D computer simulations became viable aren’t appropriate
for guided wave SHM, although there was a significant effort to begin to derive a
library of diffraction coefficients three decades ago. With currently available computers, almost all researchers are now using FEM or EFIT to isolate the details of
guided wave interaction with flaws. There are also a variety of experimental studies
reported recently [40–110] to examine Lamb wave scattering from flaws, including
Lamb wave tomography [104–113], which we worked on quite a bit a number of years ago [114–125].
2.4 Signal Processing for Lamb Wave SHM
Advanced signal processing methods are necessary for guided wave SHM both
because the signals are quite complex and because identifying and quantifying small
flaws while covering large structures with a minimum number of sensors means propagation distances are going to be large and the “fingerprint” of the flaw scattering
will usually be quite subtle. There is also the confounding issue that environmental changes, especially temperature, will affect the guided wave signals. This is a
particular problem for baseline subtraction methods which assume that one or more
baseline signals have been recorded with the structure in an unflawed state so that
some sort of signal difference metric can be employed to indicate the presence of
damage. With baseline subtraction approaches, there is also the issue of natural fluctuations (noise) in the Lamb waveforms themselves, which usually means that some
sort of simplified representation of the signal, such as the envelope, is rendered before
subtracting off the baseline. The danger of this is that the subtle flaw fingerprints may
be further suppressed by simplifying the signals. Other ways to compare signals in
time domain are cross-correlation sorts of approaches, with or without stretching
or multiple baselines to account for temperature variations, etc. Because scattering
from flaws is frequency dependent, and because different Lamb wave modes have
differing through-thickness displacement profiles and frequency-dispersion properties, the most promising signal processing approaches for Lamb wave SHM include
joint time–frequency and time–scale methods.
Echolocation [126] is actually quite similar to structural health monitoring with
guided waves in that the time delay is used to locate flaws, while the character of the
scattered echoes is what allows us to identify and quantify the flaws. A 2D color image
time–frequency representation (TFR) typically has time delay on the horizontal axis
and frequency on the vertical axis. The simplest way to form a spectrogram is via
a boxcar FFT, where an FFT is performed inside a sliding window to give the spectrum at a sequence of time delays. The boxcar FFT is almost never the optimal
TFR, however, since it suffers rather badly from an uncertainty effect. Making the
time window shorter to better localize the frequency content in time means that
there often aren’t enough sample points to accurately form the FFT. Lengthening the
window to get a more accurate spectrum doesn’t solve the problem, since then the
time localization is imprecise. Alternative TFRs have been developed to overcome
many of the deficiencies of the traditional spectrogram [127]. However, since guided
wave SHM signals are typically composed of one or more relatively short wave
pulses, albeit often overlapping, it is natural to explore TFRs that use basis functions
with compact support. Wavelets [128] are very useful for analyzing time-series data
because the wavelet transform allows us to keep track of both time and frequency, or
scale features. Whereas Fourier transforms break down a signal into a series of sines
and cosines in order to identify the frequency content of the entire signal, wavelet
transforms keep track of local frequency features in the time domain.
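For reference, the boxcar spectrogram described above takes only a few lines in MATLAB; this is a minimal sketch assuming the Signal Processing Toolbox, with the waveform s, sampling rate, window length, and overlap all chosen purely for illustration.

% Boxcar (rectangular-window) spectrogram of a waveform s sampled at fs.
% A shorter window sharpens time localization but blurs the spectrum.
fs = 10e6;                                    % sampling rate (illustrative)
[S, f, t] = spectrogram(s, rectwin(256), 192, 256, fs);
imagesc(t, f, 20*log10(abs(S)));              % time on x, frequency on y
axis xy; xlabel('Time delay'); ylabel('Frequency (Hz)');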
Ultrasonic signal analysis with wavelet transforms was first studied by Abbate in 1994, who found that if the mother wavelet was well defined there was good
peak detection even with large amounts of added white noise [129]. Massicotte,
Goyette, and Bose then found that even noisy EMAT sensor signals were resolvable
using the multi-scale method of the wavelet transform [130]. One of the strengths
compared to the fast Fourier transform was that since the extraction algorithm did
not need to include the inverse transform, the arrival time could be taken directly
from the time–frequency domain of the wavelet transform. In 2002, Perov et al.
considered the basic principles of the formulation of the wavelet transform for the
purpose of an ultrasonic flaw detector and concluded that any of the known systems of
orthogonal wavelets are suitable for this purpose as long as the number of levels does
not drop below 4–5 [131]. In 2003, Lou and Hu found that the wavelet transform
was useful in suppressing non-stationary wideband noise from speech [132]. In a
comparison study between the Wigner–Ville distribution and the wavelet transform,
performed by Zou and Chen, the wavelet transform outperformed the Wigner–Ville
in terms of sensitivity to the change in stiffness of a cracked rotor [133]. In 2002,
Hou and Hinders developed a multi-mode arrival time extraction tool that rendered
the time-series data in 2D time-scale binary images [134]. Since then this technique
has been applied to multi-mode extraction of Lamb wave signals for tomographic
reconstruction [121, 123], time-domain reflectometry signals for wiring flaw detection
[135], acoustic microscopy [136], and a periodontal probing device [137]. Wavelets
remain under active study worldwide for the analysis of a wide variety of SHM
signals [138–153] as do various other time–frequency representations [154–159].
The Hilbert–Huang transform (HHT) [160–164] along with chirplets, Golay codes,
fuzzy complex numbers, and related approaches are also gaining popularity [165, 166].
Our preferred method, which we call dynamic wavelet fingerprints, is discussed in
some detail below.
Wavelets are often ideally suited to analyzing non-stationary signals, especially
since there are a wide variety of mother wavelets that can be evaluated to find those
that most parsimoniously represent a given class of signals. The wavelet transform
coefficients can be rendered in an image similar to the spectrogram, except that
the vertical axis will now be “wavelet scale” instead of frequency. The horizontal
axis will still be time delay because the “wavelet shift” corresponds to that directly.
Nevertheless, these somewhat abstract time-scale images can be quite helpful for
identifying subtle signal features that may not be resolvable via other TFR methods.
2.5 Wavelet Transforms
Wavelets are ideally suited for analyzing non-stationary signals. They were originally
developed to introduce a local formulation of time–frequency analysis techniques.
The continuous wavelet transform (CWT) of a square-integrable, continuous function
s(t) can be written as
C(a, b) = \int_{-\infty}^{+\infty} \psi_{a,b}^{*}(t)\, s(t)\, dt,   (2.1)
where ψ(t) is the mother wavelet, ∗ denotes the complex conjugate, and ψa,b (t) is
given by
\psi_{a,b}(t) = |a|^{-p}\, \psi\!\left( \frac{t-b}{a} \right).   (2.2)
Here, the constants a, b ∈ ℝ, where a is a scaling parameter, p ≥ 0 is a normalization exponent, and b is a translation parameter related to the time localization of ψ. The choice of p is
dependent only upon which source in the literature is being referred to, much like
the different conventions for the Fourier transform, so we choose to implement the
most common value of p = 1/2. The mother wavelet can be any square-integrable
function of finite energy and is often chosen based on its similarity to the inherent
structure of the signal being analyzed. The scale parameter a can be considered to
relate to different frequency components of the signal. For example, small values of
a result in a compressed mother wavelet, which will then highlight many of the high-detail characteristics of the signal related to the signal's high-frequency components.
Similarly, large values of a result in stretched mother wavelets, returning larger
approximations of the signal related to the underlying low-frequency components.
To better understand the behavior of the CWT, it can be rewritten as an inverse
Fourier transform,
C(a, b) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat{s}(\omega)\, \sqrt{a}\, \hat{\psi}^{*}(a\omega)\, e^{j\omega b}\, d\omega,   (2.3)
where ŝ(ω) and ψ̂(ω) are the Fourier transforms of the signal and wavelet, respectively. From Eq. (2.3), it follows that stretching a wavelet in time causes its support
in the frequency domain to shrink as well as shift its center frequency toward a lower
frequency. This concept is illustrated in Fig. 2.5. Applying the CWT with only a single mother wavelet can therefore be thought of as applying a bandpass filter, while a
series of mother wavelets via changes in scale can be thought of as a bandpass filter
bank.
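To make the filter-bank picture concrete, the following sketch plots the magnitude spectrum of a Mexican-hat wavelet at scales a = 1, 2, 4; it is an illustration in the spirit of Fig. 2.5 rather than part of the DWFP code itself.

% Magnitude spectra of |a|^(-1/2) psi(t/a) for a Mexican-hat mother wavelet.
% Each scale acts as a bandpass filter; larger a shifts the passband lower.
t = linspace(-16, 16, 2048);  dt = t(2) - t(1);
f = (-1024:1023) / (2048*dt);                 % frequency axis for fftshift
for a = [1 2 4]
    u = t / a;
    psi = (1 - u.^2) .* exp(-u.^2/2) / sqrt(a);   % scaled Mexican hat
    plot(f, abs(fftshift(fft(psi)))); hold on
end
xlabel('Frequency'); legend('a = 1', 'a = 2', 'a = 4');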
An infinite number of wavelets are therefore needed for the CWT to fully represent the frequency spectrum of a signal s(t), since every time the value of the scaling parameter a is doubled, the bandwidth coverage is reduced by a factor of 2.
An efficient and accurate discretization of this involves selecting dyadic scales and
positions based on powers of two, resulting in the discrete wavelet transform (DWT).
In practice, the DWT requires an additional scaling function to act as a low-pass filter to allow for frequency spectrum coverage from ω = 0 up to the bandpass filter
range of the chosen wavelet scale. Together, scaling functions and wavelet functions
provide full-spectrum coverage for a signal. For each scaled version of the mother
wavelet ψ(t), a corresponding scaling function φ(t) exists.
Just as Fourier analysis can be thought of as the decomposition of a signal into various sine and cosine components, wavelet analysis can be thought of as a decomposition into approximations and details. These are generated through an implementation
of the wavelet and scaling function filter banks. Approximations are the high-scale
Fig. 2.5 Frequency-domain representation of a hypothetical wavelet at scale parameter values of
a = 1, 2, 4. It can be seen that increasing the value of a leads to both a reduced frequency support
and a shift in the center frequency component of the wavelet toward lower frequencies. In this sense,
the CWT acts as a shifting bandpass filter of the input signal
(low-frequency) components of the signal revealed by the low-pass scaling function
filters, while details are the low-scale (high-frequency) components revealed by the
high-pass wavelet function filter. This decomposition process is iterative, with the
output approximations for each level used as the input signal for the following level,
illustrated in Fig. 2.6. In general, most of the information in a time-domain signal
is contained in the approximations of the first few levels of the wavelet transform.
The details of these low levels often have mostly high-frequency noise information.
If we remove the details of these first few levels and then reconstruct the signal with
the inverse wavelet transform, we will have effectively de-noised the signal, keeping
only the information of interest.
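A minimal MATLAB sketch of this de-noising step, assuming the Wavelet Toolbox; the 'db4' mother wavelet, the five decomposition levels, and the two detail levels zeroed here are all illustrative choices.

% De-noise by zeroing the first two detail levels of a 5-level DWT.
[c, l] = wavedec(s, 5, 'db4');        % discrete wavelet decomposition
c = wthcoef('d', c, l, [1 2]);        % zero the level-1 and level-2 details
s_denoised = waverec(c, l, 'db4');    % inverse transform keeps the rest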
2.5.1 Wavelet Fingerprinting
Once a raw signal has been filtered, we then pass it through the DWFP algorithm.
Originally developed by Hou [134], the DWFP applies a wavelet transform on the
original time-domain data, resulting in an image containing “loop” features that
resemble fingerprints. The wavelet transform coefficients can be rendered in an image
similar to a spectrogram, except that the vertical axis will be scale instead of frequency. These time-scale image representations can be quite helpful for identifying
subtle signal features that may not be resolvable via other time–frequency methods.
Combining Eqs. 2.1 and 2.2, the CWT of a continuous square-integrable function
s(t) can be written as
C(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} s(t)\, \psi^{*}\!\left( \frac{t-b}{a} \right) dt.   (2.4)
Fig. 2.6 The signal is decomposed into approximations (A1) and details (D1) at the first level.
The next iteration then decomposes the first-level approximation coefficients into second-level
approximations and details, and this process is repeated for the desired number of levels. For
wavelet filtering, the first few levels of details can be removed, effectively applying a low-pass filter
to the signal
Unlike the DWT, where scale and translation parameters are chosen according to the dyadic scale (a = 2^m, b = n 2^m, with n, m \in \mathbb{Z}), the MATLAB implementation of the CWT used here utilizes a range of real numbers for these coefficients. A normal range of scales includes a = 1, ..., 50 and b = 1, ..., N for a signal of length N.
This results in a two-dimensional array of coefficients, C(a, b), which are normalized to the range of [−1, 1]. These coefficients are then sliced in a “thick” contour
manner, where the number of slices and the thickness of each slice are defined by the user. To increase efficiency, the peaks (C(a, b) ≥ 0) and valleys (C(a, b) < 0) are
considered separately. Each slice is then projected onto the time-scale plane. The
resulting slice projections are labeled in an alternating, binary manner, resulting in
a binary “fingerprint” image, I (a, b):
s(t) \;\xrightarrow{\,DWFP(\psi_{a,b})\,}\; I(a, b).   (2.5)
The values of slice thickness and number of slices can be varied to alter the appearance
of the wavelet coefficients, as can changing which mother wavelet is used. The
process of selecting mother wavelets for consideration is application-specific, since
certain choices of ψ(t) will be more sensitive to certain types of signal features. In
practice, mother wavelets used are often chosen based on preliminary analysis results
as well as experience.
In general, most of the information in a signal is contained in the approximations
of the first few levels of the wavelet transform. The details of these low levels often
have mostly high-frequency noise information. If we set the details of these first few
levels to zero, when we reconstruct the signal with the inverse wavelet transform
we have effectively de-noised our signal to keep only information of the Lamb
wave modes of interest. In our work, we start with the filtered ultrasonic signal and
take a continuous wavelet transform (CWT). The CWT gives a surface of wavelet coefficients, and this surface is then normalized to [−1, 1]. Then, we perform a thick contour slice operation where the user defines the number of slices to use:
the more the slices, the thinner the contour slice. The contour slices are given the
value of 0 or 1 in alternating fashion. They are then projected down to a 2D image
where the result often looks remarkably like the ridges of a human fingerprint, hence
the name “wavelet fingerprints.” Note that we perform a wavelet transform as usual,
but then instead of rendering a color image we form a particular type of binary
contour plot. This is illustrated in Fig. 2.7. The wavelet fingerprint has time on the
horizontal axis and wavelet scales on the vertical axis. We’ve deliberately shown
this at a “resolution” where the pixelated nature of the wavelet fingerprint is obvious.
This is important because each of the pixels is either black or white: it is a binary
image.
The problem has thus been transformed from a one-dimensional signal identification problem into a 2D image recognition scenario. The power of the dynamic wavelet
fingerprint (DWFP) technique is that it converts the time-series data into a binary
matrix that is easily stored and transferred, and is amenable to edge computing implementations. There is also robustness to the simple algorithm (Fig. 2.8) since different
mother wavelets emphasize different features in the signals.
The last piece of the DWFP technique is recognition of the binary image features
that correspond to the waveform features of interest. We have found that different
modes are represented by unique features in our applications. We've also found that
using a simple ridge-counting algorithm on the 2D images is often a helpful way to
identify some of the features of interest. In Fig. 2.9, we show a small portion of a
fingerprint image, blown up so the ridge counting at each time sample can be seen.
Figure 2.10 shows longer fingerprints for two waveforms with and without a flaw
in the location indicated by the dashed rectangle. In this particular case, the flaw is
identified by thresholding the ridge-count metric, as indicated by the bottom panel.
Once such a feature has been identified in the time-scale space, we know its arrival in
the time domain as well and we can then draw conclusions about its location based
on our knowledge of that guided wave mode velocity.
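A minimal sketch of the ridge-count metric, assuming a binary fingerprint image I of size (scales × time); the threshold value is illustrative and would be tuned per application.

% Count connected white ridge segments in each column of the binary image.
% A 0 -> 1 transition moving down a column marks the start of one ridge.
ridges = sum(diff([zeros(1, size(I, 2)); I], 1, 1) == 1, 1);
flawCols = find(ridges > 4);          % columns exceeding the ridge threshold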
Fig. 2.7 A visual summary of the DWFP algorithm [134]. A time-domain signal (top) is transformed into a set of wavelet coefficients via the continuous wavelet transform. The coefficients are then “thickly” sliced and projected onto the time-scale plane, resulting in two-dimensional binary images, shown here with white peaks and gray valleys for distinction
The inherent advantage of traditional TFRs is thus to transform a one-dimensional
time-trace signal into a two-dimensional image, which then allows powerful image
processing methods to be brought to bear. The false-color images also happen to be
visually appealing, but this turns out to be somewhat of a disadvantage when the
goal is to automatically identify the features in the image that carry the information
about the flaw(s). High-resolution color imagery is computationally expensive to
both store and process, and segmentation is always problematic. This latter issue is
particularly difficult for SHM because it’s going to be something about the shape
of the flaw signal(s) in the TFR image that we're searching for automatically via
image processing algorithms. A binary image requires much less computer storage
than does a greyscale or color image, and segmentation isn’t an issue because it’s a
trivial matter to decide between black and white. These sorts of fingerprint images
can be formed from any TFR, of course, although wavelets seem to work quite well
for guided wave ultrasonics and a variety of applications that we have investigated.
Figure 2.10 shows two cases with and without flaws.
Dynamic Wavelet Fingerprint Algorithm
Nondestructive Evaluation Laboratory – William & Mary Applied Science Department
The code at the bottom is a MATLAB algorithm to create a 2D fingerprint image from a 1D signal. This is the same algorithm used in the Wavelet Fingerprint Tool Box. This algorithm can easily be implemented in any programming language that can perform a continuous wavelet transform. The following table describes the variables passed to the function.

datain      The raw 1D signal from which the wavelet fingerprint is created.
wvt         The name of the mother wavelet. For example: 'coif2', 'sym2', 'mexh', 'db10'.
ns          The number of scales to use in the continuous wavelet transform (start with 50).
numridges   The number of ridges used in the wavelet fingerprint (start with 5).
rthickness  The thickness of the ridges, normalized to 1 (start with 0.12).

The output variable fingerprint contains the wavelet fingerprint image. It is an array (ns by length(datain)) of 1's and 0's where the 1's represent the ridgelines. The following is a sample call of this function.

>> fingerprint = getfingerprint( rawdata, 'coif3', 50, 5, 0.12 );

MATLAB Wavelet Fingerprint Algorithm

function [ fingerprint ] = getfingerprint( datain, wvt, ns, numridges, rthickness )
cfX = cwt(datain, 1:ns, wvt);             % get continuous wavelet transform coefficients
cfX = cfX ./ max(max(abs(cfX)));          % normalize coefficients to [-1, 1]
fingerprint = zeros(ns, length(datain));  % set image size and all values to zero
% rlocations holds the center value of each contour slice (ridge), skipping zero
rlocations = [-1:(1/numridges):-(1/numridges) (1/numridges):(1/numridges):1];
for sl = 1:length(rlocations)             % loop through each slice
    for y = 1:ns                          % loop through the scale rows of cfX
        for x = 1:length(datain)          % loop through the time columns of cfX
            if (cfX(y,x) >= rlocations(sl) - rthickness/2) && (cfX(y,x) <= rlocations(sl) + rthickness/2)
                fingerprint(y,x) = 1;     % set ridge pixels to white
            end
        end
    end
end
end
Fig. 2.8 The simple wavelet fingerprint algorithm is only a few lines of code in MATLAB
2.6 Machine Learning with Wavelet Fingerprints
As candidate TFRs are considered and binary fingerprint images are formed from
them, the task is to down-select those that seem to best highlight the signal features of
interest while minimizing the clutter from signal features that aren’t of interest. We
typically perform this in an interactive manner, since we’ve implemented the wavelet
fingerprint method via a MATLAB GUI which allows us to easily read in signals,
perform de-noising and windowing as needed, and then form wavelet fingerprints
from any one of the 36 mother wavelets built into the MATLAB wavelet toolbox,
selecting a variety of other parameters/options particular to the method. This works
well because the human visual system is naturally adapted to identifying features in
this sort of binary imagery (think reading messy handwriting) and we’re interested
initially in naming features and describing qualitatively how the fingerprint features
change as the flaws or other physical features in the measurement setup are varied.
For example, a triangular feature might be the signature of a particular kind of flaw,
with the number of ridges indicating the severity of the flaw and the position in time
corresponding to the flaw’s physical location. A feature of interest might instead be a
circle or an oval, with the “circularity” or eccentricity and orientation as quantitative
measures of severity. The point is that once such features are identified it is a relatively
simple matter to write algorithms to track them in fingerprint images. Indeed, there
Fig. 2.9 Example of ridge counting in a fingerprint image. The number of connected “on” pixel
regions in each column corresponds to the number of ridges for that point in time
Fig. 2.10 A ridge-counting metric clearly identifies the differences between the two wavelet fingerprints due to the presence of a flaw
is quite a large literature on fingerprint classification algorithms dating back to the
telegraph days where fingerprint images were manually converted to strings of letters
and numbers so they could be transmitted over large distances to attempt to identify
criminal suspects [167].
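Shape descriptors such as the eccentricity and orientation mentioned above can be measured automatically; a minimal sketch, assuming the Image Processing Toolbox, a binary fingerprint image I, and an illustrative eccentricity cutoff:

% Measure shape features of each connected region in a binary fingerprint.
stats = regionprops(logical(I), 'Centroid', 'Eccentricity', 'Orientation');
circleLike = stats([stats.Eccentricity] < 0.3);   % flag nearly circular features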
For very subtle flaws, i.e., small cracks in large structures, it may not be possible to find simple, readily identifiable fingerprint features in any TFR, whether
Fourier or wavelet-based. That isn’t an insurmountable problem, of course, because
the transformation of one-dimensional waveforms into two-dimensional TFR images
allows a variety of image characterization and pattern classification approaches to be
employed. Previous researchers have made use of the wavelet transform for pattern
classification applications [168, 169]. One option is to integrate wavelets directly
by capitalizing on the orthogonal property of wavelets to estimate the class density functions [170]. However, most applications of wavelets to pattern recognition
focus on feature extraction techniques [171]. One common method involves finding
the wavelet transform of a continuous variable (sometimes a signal) and computing
the spectral density, or energy, which is the square of the coefficients [172, 173].
Peaks of the spectral density or the sum of the density can be used as features and
have been applied to flank wear estimation in turning processes and classifying diseased lung sounds [172] as well as to evaluating astronomical chirps and the equine
gait [173]. This technique is similar to finding the cross-correlation but is only one
application of wavelets to signal analysis. One alternate method is to deconstruct
the signal into an orthogonal basis, such as Laguerre polynomials [174]. Another
technique is the adaptive wavelet method [175–178] which stems from multiresolution analysis [179]. Multiresolution analysis applies a wavelet transform using an
orthogonal basis resulting in filter coefficients in a pyramidal computation scheme,
while adaptive wavelet analysis uses a generalized M-band wavelet transform to
similarly achieve decomposition into coefficients and insert those coefficients into
matrix form for optimization. Adaptive wavelets result in efficient compression and
have the advantage of being widely applicable.
Wavelet methods are also often applied for feature extraction in images, such
as for shape characterization and boundary detection [180, 181]. Wavelets are particularly useful for detecting singularities, and in 2D data spaces this results in an
ability to identify corners and boundaries. Shape characterization itself is a precursor to template matching in pattern classification, in which outlines of objects are
extracted from an image and matched to known shapes from a library. Other techniques are similar to those described above, including multiresolution analysis, which
is also similar to the image processing technique of matched filters [182–186]. Either
libraries of known wavelets or wavelets constructed from the original signal are used
to match the signal of interest. Pattern recognition then proceeds in a variety of ways
from the deconstructed wavelet coefficients. The coefficients with minimal cost can
be used as features, or the results from each sequential step can be correlated individually. To reduce dimensionality, sometimes projection transforms are precursors
to decomposition. Some authors have constructed a rotationally invariant projection
that deconstructs an image into sub-images and transforms the mathematical space
from 2D to 1D [187, 188]. Also, constructing a set of orthonormal bases, just as in
the case of adaptive wavelets above, remains useful for images [189]. Because of the
dimensionality, computing the square of the energy is cumbersome and so a library
of codebook vectors is necessary for classification [190].
There have been many successful applications of pattern recognition in ultrasound, some of which include wavelet feature extraction methods. In an industrial
field, Tansel et al. [191, 192] selected coefficients from wavelet decomposition for
feature vectors and were able to use pattern recognition techniques to detect tool
failure in drills. Learned and Willsky [193] constructed a wavelet packet approach,
in which energy values are calculated from a full wavelet decomposition of a signal, to detect sonar echoes for submarines. Wu and Du [194] also used a wavelet
packet description but found that feature selection required knowledge of the physical space, in this case, drill failure. Case and Waag [195] used Fourier coefficients
instead of wavelet coefficients for features that successfully identified flaws in pipes.
Comparing several techniques, including features selected from wavelet, time, and
spectral domains, Drai et al. [196] identified welding defects using a neural network
classifier. Buonsanti et al. [197] compared ultrasonic pulse-echo and eddy current
techniques to detect flaws in plates using a fuzzy logic classifier and wavelet features
relevant to the physical domain.
In medical diagnostics, an early example of applying classification techniques
is evidenced in Momenan et al.’s work [198] in which features selected offline
are used to identify changes in tissues as well as clustering in medical ultrasound
images. Bankman et al. [199] classified neural waveforms successfully with the careful application of preprocessing techniques such as whitening via autocorrelation.
Meanwhile, Kalayci et al. [200] detected EEG spikes by selecting eight wavelet
coefficients from two different Daubechies wavelet transforms for application in a
neural network. Tate et al. [201] similarly extracted wavelet coefficients as well as
other prior information to attempt to identify vegans, vegetarians, and meat-eaters
by their magnetic resonance spectra. Interestingly, vegetarians were more difficult
to identify, with a classification accuracy of 80%. Mojsilovic et al. [202] applied
wavelet multiresolution analysis to identify infarcted myocardial tissue from medical ultrasound images. Georgiou et al. [203, 204] used wavelet decomposition to
calculate scale-averaged wavelet power up to a threshold and detected the presence
of breast cancer in ultrasound waveforms by means of hypothesis testing. Also using
multiresolution techniques, Lee et al. [205] further selected fractal features to detect
liver disease. Alacam et al. [206] improved existing breast cancer characterization
of ultrasonic B-mode images by adding fractional differencing and moving average
polynomial coefficients as features. We have applied pattern classification techniques
to mine roof-fall prediction [207] and mobile robotic navigation [208] in addition
to ultrasonic identification of structural flaws via detection of abstract features in
dynamic wavelet fingerprints [209–214].
The goal of pattern recognition is to classify different objects into categories
(classes) based on measurements made on those objects (features). Pattern recognition is a subset of machine learning, in which computers use algorithms to learn from
data. Statistical pattern recognition, however, requires a statistical characterization
of the likelihood of each object belonging to a particular class. Pattern recognition
usually proceeds in two different ways: unsupervised pattern recognition draws distinctions or clusters in the data without taking into account the actual class labels,
while supervised pattern recognition uses the known class labels along with the statistical characteristics of the measurements to assign objects to the class that minimizes the error. We will focus our attention on supervised pattern classification. In
applying pattern recognition techniques to finite, real-world datasets, merely classifying the data is not sufficient. To train the classifier and also evaluate its performance,
we have to withhold some of the data to test the classifier. A classifier that is tested
with a subset of the data it was trained on is said to be overtrained, which means the
classifier may perform very well on data it was trained on but its performance on
unseen data is not measured. Therefore, for finite, real-world datasets, the available
observations need to be split into a subset used for training and a subset used for
testing.
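A minimal sketch of such a holdout split, using the cvpartition function from the MATLAB Statistics and Machine Learning Toolbox; the 70/30 split ratio is an illustrative choice.

% Hold out 30% of the labeled observations for testing; train on the rest.
cv = cvpartition(labels, 'HoldOut', 0.3);          % stratified by class label
Xtrain = X(training(cv), :);  ytrain = labels(training(cv));
Xtest  = X(test(cv), :);      ytest  = labels(test(cv));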
Pattern classification is the subset of machine learning that involves taking in raw
data and grouping it into categories. Many excellent textbooks exist on the subject
[215–220], while several review papers have explored the topic as well [221–223].
Emerging applications in the fields of biology, medicine, financial forecasting, signal
analysis, and database organization have resulted in the rapid growth of pattern classification algorithms. We focus our research here on statistical pattern recognition,
but other approaches exist including template matching, structural classification, and
neural networks [223].
Statistical pattern classification represents data by a series of measurements,
forming a one-dimensional feature vector for each individual data point. A general
approach for statistical pattern classification includes the following steps: preprocessing, feature generation, feature extraction/selection, learning, and classification.
Preprocessing involves any segmentation and normalization of data that leads to a
compact representation of the pattern. Feature generation involves creating the feature vector from each individual pattern, while feature extraction and selection are
optional steps that reduce the dimension of the feature vector using either linear transformations or the direct removal of redundant features. The learning step involves
training a given classification algorithm, which then outputs a series of decision rules
based on the data supplied. Finally, new data points are supplied to the trained classifier during the classification step, where they are categorized based on their own
feature vector relative to the defined decision rules.
There exists a theorem in pattern classification known as the Ugly Duckling Theorem, which states that in the absence of assumptions there is no “best” feature
representation for a dataset, since assumptions about what “best” means are necessary for the choice of features [224]. Appropriate features for a given problem are
usually unknown a priori, and as a result, many features are often generated without
any knowledge of their relevancy [225]. This is frequently the motivation behind
generating a large number of features in the first place. If a specific set of features
is known that completely defines the problem and accurately represents the input
patterns, then there is no need for any reduction in the feature space dimensionality.
In practice, however, this is often not the case, and an intelligent feature reduction
technique can simplify the classifiers that are built, resulting in both increased computational speed and reduced memory requirements. Aside from performance gains,
there is a well-known phenomenon in pattern classification affectionately called the
Curse of Dimensionality that occurs when the number of objects to be classified is
small relative to the dimension of the feature vector. A generally accepted practice
in classifier design is to use at least ten times as many training samples per class as
the number of features [226]. The dataset becomes sparse when represented in too high-dimensional a feature space, degrading classifier performance.
It follows that an intelligent reduction in the dimension of the feature space is
needed. There are two general approaches to reducing the feature set: feature extraction and feature selection. Feature extraction reduces the feature set by creating new
features through transformations and combinations of the original features. Principal
component analysis, for example, is a commonly used feature extraction technique.
Since we are interested in retaining the original physical interpretation of the feature
set, we opt not to use any feature extraction techniques in our analysis.
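For contrast, principal component analysis takes only a few lines in MATLAB; this sketch (with an illustrative 95% explained-variance cutoff) also shows why we avoid it: the reduced features are linear mixtures with no direct physical meaning.

% Project the feature matrix X (observations x features) onto principal
% components and keep enough of them to explain 95% of the variance.
[coeff, score, ~, ~, explained] = pca(X);
k = find(cumsum(explained) >= 95, 1);
Xreduced = score(:, 1:k);             % compact but physically abstract features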
There are three general approaches for feature selection: wrapper methods, embedded methods, and filter methods [227]. Wrapper methods use formal classification
to rank individual feature space subsets, applying an iterative search procedure that
trains and tests a classifier using different feature subsets for accuracy comparison. This continues until a given stopping criterion is met [228]. This approach is
computationally intensive, and there is often a trade-off among algorithms between
computation speed and the quality of results that are produced [229–231]. Additionally, these methods have a tendency to over-train, where data in the training set is perfectly fitted, resulting in poor generalization performance [232].
Similar to wrapper methods, an embedded method performs feature selection while
constructing the classification algorithm itself. The difference is that the feature
search is intelligently guided by the learning process itself. Filter methods perform
their feature ranking by looking at intrinsic properties of the data without the input
of a formal classification algorithm. Traditionally, these methods are univariate and
therefore don’t account for multi-feature dependencies.
Regardless of feature selection, once a feature set has been finalized, the selection
of an appropriate classification algorithm depends heavily on what is known about the
domain of the problem. If the class-conditional densities for the problem at hand are
known, then Bayes decision theory can be applied directly to design a classifier. This,
however, is rarely the case for experimental data. If the training dataset has known
labels associated with it, then the problem is one of supervised learning. If not, then
the underlying structure of the dataset is analyzed through unsupervised classification
techniques such as cluster analysis. Within supervised learning, a further dichotomy
exists based on whether the form of the class-conditional densities is known. If it is
known, parametric approaches such as Bayes plug-in classifiers can be developed
that estimate missing parameters based on the training dataset. If the form of the
densities is unknown, nonparametric approaches must be used, often constructing
the decision boundaries geometrically from the training data.
Wavelets are very useful for analyzing time-series data because wavelet transforms
allow us to keep track of the time localization of frequency components. Unlike the
Fourier transform, which breaks a signal down into sine and cosine components to
identify frequency content, the wavelet transform measures local frequency features
in the time domain. One direct advantage the wavelet transform has over the fast
Fourier transform is that the time information of signal features can be taken directly
from the transformed space without an inverse transform required. One common
wavelet-based feature used in classification is generated by wavelet packet decomposition (WPD) [233]. Yen and Lin successfully classified faults in a helicopter gearbox
using WPD features generated from time-domain vibration analysis signals [234].
The use of wavelet analysis for feature extraction has also been explored by Jin et al.
[235], where wavelet-based features have been used for damage detection in polycrystalline alloys. Gaul and Hurlebaus also used wavelet transforms to identify the
location of impacts on plate structures [236]. Sohn et al. review the statistical pattern
classification models currently being used in structural health monitoring [237]. Most
implementations involve identifying one specific type of “flaw”, including loose bolts
and small notches, and utilize only a few specific features to separate the individual
flaw classes within the feature space [238–240]. Biemans et al. detected crack growth
in an aluminum plate using wavelet coefficient analysis generated from guided waves
[241]. Legendre et al. found that even noisy electromagnetic acoustic transducer sensor signals were resolvable using the multi-scale method of the wavelet transform
[242]. The wavelet transform has also been shown to outperform other traditional
time–frequency representations in many applications. Zou and Chen compared the
wavelet transform to the Wigner–Ville distribution for identifying a cracked rotor
through changes in stiffness [243], identifying the wavelet transform to be more
sensitive to variation and generally superior to the Wigner–Ville distribution.
We have introduced the principles of using the DWFP wavelet transform technique for time-domain signal analysis. We next apply this technique to two real-world
industrial structural health monitoring (SHM) applications. First, we explore how the
DWFP technique can be utilized to distinguish dents and the resulting rear surface
cracks generated in aircraft-grade aluminum plates. We then show how wavelet fingerprints can be used to identify flaws in steel and aluminum naval structures.
2.7 Applications in Structural Health Monitoring
Guided waves were originally explored by Lord Rayleigh in 1885 while investigating the propagation of surface waves in a solid, e.g., as in earthquakes, and their study has continued over the years [244]. The key technical challenge to using Lamb waves effectively is automatically identifying which modes are which in very complex waveform signals. There are usually several guided wave modes present at any given
frequency-thickness value, often having overlapping mode velocities. Since each
mode propagates with its own modal structure, modes traveling with the same group
velocity result in a superposition of the individual displacement and stress components. The resulting signals are inherently complex, and sophisticated analysis
techniques need to be applied to make sense of the signals. Contrary to traditional
Fig. 2.11 A plate with no flaw present, a plate with a shallow, large crack present, and a plate with
both a dent and a crack present
bulk-wave ultrasonics, standard peak-detection approaches often fail for Lamb wave
analysis.
2.7.1 Dent and Surface Crack Detection in Aircraft Skins
Aircraft manufacturers and operators have concerns over fuselage damage commonly
called “ramp rash” which occurs in the lower fuselage sections and can be caused
by incidental contact with ground equipment such as air stairs, baggage elevators,
and food service trucks [245]. Ramp rash costs airlines in both damage repair and
downtime [246]. Even minor damage can become the source of serious mechanical
failure if left unattended [247]. The formation of dents in the fuselage is common,
but the generation of hidden cracks can lead to serious complications. Repairing this
subsequent damage is critical for aviation safety; however, the hidden extent of the
damage is initially unknown. In order to accurately estimate repair requirements,
and therefore reduce unnecessary downtime losses, an accurate assessment of the
damage is required.
A continuously monitoring inspection system at fuselage areas prone to impact
would provide an alternative to conventional point-by-point inspections. Guided
waves have previously shown potential for damage detection in metallic aircraft
components. We test a guided wave-based SHM technique for identifying potential
flaws in metallic plate samples. We employ the DWFP to generate time-scale signal
representations of the complex guided waveforms, from which we extract subtle
features to assist in damage identification.
The samples tested were aircraft-grade 1.62-mm-thick aluminum 2024 plates
(with primer), approximately 45 cm × 35 cm in size. Some of them had dents, and
some also had simulated cracks of varying length and orientation, images of which
are shown in Fig. 2.11.
Experimental data was collected using transducers manufactured by Olympus (Waltham, MA; http://www.olympus-ims.com/en/). Signals were generated and received using a Nanopulser 3 (NP3) from Automated
Inspection Systems (Martinez, CA; http://www.ais4ndt.com/). The NP3 integrates both a pulser/receiver and a digitizer into a
portable USB-interfaced unit, emitting a spike excitation which relies on the resonant
frequency of the transducers in order to generate the ultrasonic waves with a broad
frequency content to maximize scattering interaction with the surface cracks. The
compact size of the NP3 makes it favorable for in-field use for aircraft damage
analysis. A graphical user interface was developed in MATLAB to control the NP3
system.
We explore both a pitch/catch scanning configuration, where one transducer is
the transmitter while a second transducer is the receiver, as well as a pulse/echo
configuration where one transducer acts as both the transmitter and receiver. We
control the position of each transducer by attaching two individual transducers to
linear slides, controlled by stepper motors that advance them along the edges of the
test area. The test area was centered around the flaw, with the transducers placed
215 mm apart from each other and advancing through 25 individual locations in 8.6 mm increments for a total scan length of 215 mm.
Each waveform is run through the DWFP algorithm to generate a fingerprint
representation. This process is demonstrated for a typical waveform in Fig. 2.12 and
involves windowing the full signal around the first arriving wave mode, which is then
used as an input signal for the DWFP transformation.
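In code this step reduces to an index mask followed by the getfingerprint call of Fig. 2.8; the 74–81 µs window of Fig. 2.12, the waveform s, and its time axis t are used here purely for illustration.

% Window the waveform s (with time axis t, in seconds) around the first
% mode arrival, then form its wavelet fingerprint.
idx = (t >= 74e-6) & (t <= 81e-6);    % illustrative first-arrival window
fp = getfingerprint(s(idx), 'coif3', 50, 5, 0.12);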
Dent Detection—Pitch/Catch Scanning
We first present results here from the pitch/catch scanning configuration. The first
sample considered contained a crack only, and the raw waveforms can be seen in
Fig. 2.13. It can be seen that these signals are all very similar to each other, and little
information is readily available to identify the crack in the time domain. Figure 2.14
provides the DWFP representations of these same signals. For each individual DWFP
image, a specific feature is identified and automatically tracked throughout the scan
progression. The feature in this case is the first arriving mode, indicated by the first
“peak” (in white) in time. For each following DWFP image, this feature is identified
and highlighted by a red star (∗), with the corresponding position in time given to the
right of each fingerprint image. This first plate sample shows no meaningful variation in the DWFP representations as the transducers are moved along the edges of the plate; the highlighted feature of interest varies little in its position. This indicates that the
pitch/catch scanning configuration is inadequate for identifying surface cracks in the
material.
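Tracking the first arriving peak can be automated with a few lines of array logic; a sketch, assuming the fingerprint is a boolean image with time along the columns and a matching vector of window times:

    import numpy as np

    def first_peak_time(fingerprint, t_window):
        """Time of the earliest column of a binary DWFP image containing
        any ridge pixels -- a stand-in for the first arriving mode peak."""
        cols = np.any(fingerprint, axis=0)   # True wherever some scale has a ridge
        j = np.argmax(cols)                  # index of the first True column
        return t_window[j] if cols[j] else t_window[-1]  # fall back to window end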
The second plate scanned contained a dent only. Raw waveforms collected as the
pair of transducers progress along the sample edges can be seen in Fig. 2.15. It can be seen again that these signals are all very similar to each other, with the time-domain representations unable to highlight the dent. Figure 2.16 provides the DWFP
representations of these same signals. The same feature described above, the first
arriving mode indicated by the first DWFP peak, is identified and tracked again. It
can be seen that there is a region of the sample where this feature shifts, indicating the
presence of a discontinuity in the propagation caused by the dent. It follows that the
pitch/catch scanning configuration is adequate for identifying dents in the material.
2 Martinez, CA (http://www.ais4ndt.com/).
Fig. 2.12 A raw Rayleigh–Lamb waveform collected from an unflawed plate sample. The signal is first windowed to the region of interest, here the first mode arrival, from which a DWFP image is generated
[Figure panels: raw and windowed waveform amplitude versus Time (µs); DWFP image, Scale versus Time (µs)]
Crack Detection—Pulse/Echo Scanning
We now use the same transmitting transducer in pulse/echo mode to determine if
any energy is being reflected from either type of flaw to aid in detection. We again
present results here for the two samples previously considered: one with a crack
and one with a dent. We first present the raw waveforms collected from interaction
with the crack only, shown in Fig. 2.17. The raw waveforms contain an initial high-amplitude portion that is a result of energy reflecting within the transducer itself. We
window our search in the time frame where a reflection would be expected to exist
if present. It can be seen that these signals are all very similar to each other, and
all have a low signal-to-noise ratio making it difficult to easily detect any features
identifying a flaw.
Figure 2.18 provides the DWFP representations of these same signals. For each
individual DWFP image, a specific feature is identified and automatically tracked
throughout the scan progression. The feature in this case is the first peak “doublet”
feature after the 65 μs mark, where a doublet is indicated by two fingerprint features
Fig. 2.13 Raw waveforms collected from a plate sample with a crack only, using a pitch/catch
scanning configuration. Little information is readily available from the time-domain representation
of the signal. The dotted lines indicate the windowed region used for the DWFP image generation
existing for different scales at the same point in time. For each following DWFP
image, this feature is identified and highlighted by a red star (∗), with the corresponding position in time given to the right of each fingerprint image. If no such
feature is found, the end point of the window is used. We can see that the DWFP
representations of the pulse/echo signals are able to identify waveforms that correlate
with the position of the crack if present.
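The doublet search itself is straightforward to automate; the sketch below is our illustration (the minimum scale separation is an arbitrary choice), scanning the columns after a start time for two disjoint ridge runs at the same instant:

    import numpy as np

    def first_doublet_time(fingerprint, t_window, t_start=65.0, min_gap=3):
        """First time after t_start (us) where two separate fingerprint
        features exist at different scales in the same column."""
        for j in np.flatnonzero(t_window >= t_start):
            rows = np.flatnonzero(fingerprint[:, j])
            if rows.size >= 2 and np.any(np.diff(rows) > min_gap):
                return t_window[j]           # two ridge runs, separated in scale
        return t_window[-1]                  # no doublet found: use window end point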
The second plate under consideration contains a dent only. Raw waveforms collected as the transducer progressed along an edge of the sample can be seen in
Fig. 2.19. It is again clear that these signals are all very similar to each other, with
the low signal-to-noise ratios making it difficult to easily extract any information
from them. Figure 2.20 provides the DWFP representations of these same signals.
The same feature described above, the first peak “doublet” feature after the 65 μs
Fig. 2.14 DWFP representations of the pitch/catch raw waveforms shown in Fig. 2.13 from a
sample with a crack only. For each individual DWFP image, a specific feature is identified and
automatically tracked throughout the scan progression. The feature in this case is the first arriving
mode, indicated by the first “peak” (in white) in time. For each following DWFP image, this feature
is identified and highlighted by a red star (∗), with the corresponding position in time given to the
right of each fingerprint image
mark, is again identified and tracked. It can be seen that there is no region in this
scan where the feature is identified, indicating that this pulse/echo approach is not
sufficient to identify dents.
For each of the DWFP images, we have identified and automatically tracked in
time a specific feature within the image. We summarize the extracted time locations of
these features in Fig. 2.21, where the extracted times are plotted against their position
relative to the plate. Vertical dotted lines indicate the actual (known) locations of each
flaw present. We can clearly see that the pitch/catch waveforms are able to identify
the dent, while the pulse/echo waveforms are able to identify the crack. This follows
from the concept that the surface crack is not severe enough to distort much of the
Fig. 2.15 Raw waveforms collected from a plate sample with a dent only, using a
pitch/catch scanning configuration. Little information is readily available from the time-domain
representation of the signal. The dotted lines indicate the windowed region used for the DWFP
image generation
propagating waveform, but still reflects enough energy to be identified back at the
initial transducer location.
Crack Angle Dependence
In order to determine how the reflected signal's energy depends on the incident wave angle relative to the crack, an angle-dependent study was performed.
Samples containing a large crack only, a dent without a crack, and a dent with a large
crack were included here. A point in the center of the flaw (either the center of the
dent or the center of the crack) was chosen as the center of rotation, and the Rayleigh–
Lamb transducers were placed 10 cm from this center point in 10◦ increments around
the point of rotation, through at least 120◦ from the starting location. A
Fig. 2.16 DWFP representations of the pitch/catch raw waveforms shown in Fig. 2.15 from a sample
with a dent only. For each individual DWFP image, a specific feature is identified and automatically
tracked throughout the scan progression. The feature in this case is the first arriving mode, indicated
by the first “peak” (in white) in time. For each following DWFP image, this feature is identified
and highlighted by a red star (∗), with the corresponding position in time given to the right of each
fingerprint image
pulse/echo measurement was taken at each location to measure any energy reflected
from the flaw.
Two features we extracted from the recorded pulse/echo signals are the arrival
time of the reflected signal and the peak instantaneous amplitude of that reflection.
The instantaneous amplitude is calculated by first computing the discrete Hilbert transform of the signal s(t), which returns a version of the original signal with a 90◦ phase shift (preserving the amplitude and frequency content of the signal). The magnitude of the complex analytic signal formed from the original signal and this phase-shifted version, |s(t) + iH{s(t)}|, is the instantaneous amplitude of the signal, another name for the signal's envelope. The maximum of
Fig. 2.17 Raw waveforms collected from a plate sample with a crack only, using a pulse/echo scanning configuration. The low signal-to-noise ratio makes it difficult to analyze these raw time-domain signals. The dotted lines indicate the windowed region used for the DWFP image generation
this instantaneous amplitude is what will now be referred to as the “peak energy” of
the signal.
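In code the envelope and peak energy reduce to a few lines; a sketch using scipy, whose hilbert routine returns the complex analytic signal s(t) + iH{s(t)} directly:

    import numpy as np
    from scipy.signal import hilbert

    def peak_energy(s):
        """Peak of the instantaneous amplitude (the signal's envelope)."""
        envelope = np.abs(hilbert(s))   # |s(t) + i*H{s(t)}|
        return envelope.max()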
The first sample, which did not have a crack present, did not return any measurable
reflected energy at any angle. Both samples with cracks show a measurable
reflection from the crack when the incident angle is normal to the crack, regardless
of whether or not a dent is present. Incident angles that are 0–20◦ from normal still
had measurable reflection energy; however, incident angles beyond that did not have
a significant measurable reflection. These results agree with expectations that the
cracks would be highly directional in their detectability (Fig. 2.22).
Fig. 2.18 DWFP representations of the pulse/echo raw waveforms shown in Fig. 2.17 from a sample
with a crack only. For each individual DWFP image, a specific feature is identified and automatically
tracked throughout the scan progression. The feature in this case is the first peak “doublet” feature
after the 65 μs mark, where a doublet is indicated by two fingerprint features existing for different
scales at the same point in time. For each following DWFP image, this feature is identified and
highlighted by a red star (∗), with the corresponding position in time given to the right of each
fingerprint image
2.7.2 Corrosion Detection in Marine Structures
Structural health monitoring is an equally important area of research for the world’s
navies. From corrosion due to constant exposure to harsh saltwater environments, to the more recent issue of sensitization driven by the cyclic day/night heating experienced on the open water, maritime vessels are in constant need of repair. The biggest cost
in maintenance of these ships is often having to pull them out of service in order to
characterize and repair any and all damage present. Navies and shipyards are actively
Fig. 2.19 Raw waveforms collected from a plate sample with a dent only, using a pulse/echo scanning configuration. The low signal-to-noise ratio makes it difficult to analyze these raw time-domain signals. The dotted lines indicate the windowed region used for the DWFP image generation
researching intelligent monitoring systems that will provide constant feedback on the structural integrity of areas prone to such damage.
Lamb waves provide a natural approach to identifying corrosion in metals. Each
mode’s group velocity is dependent on both the inspection frequency used and the
material thickness. Since corrosion can be thought of as a local material loss, it follows
that mode velocities will change when passing through a corroded area. They will
either speed up or slow down, depending on the modes and frequency-thickness
product being used. Often several modes will propagate simultaneously within a
structure and will overlap each other if they share a similar group velocity. As long
as a frequency-thickness regime is chosen so that a single mode is substantially faster
than the rest and therefore arrives earlier, most of the slower, jumbled modes can be
windowed out of the signal. Key to utilizing Lamb waves for SHM is understanding
Fig. 2.20 DWFP representations of the pulse/echo raw waveforms shown in Fig. 2.19 from a sample
with a dent only. For each individual DWFP image, a specific feature is identified and automatically
tracked throughout the scan progression. The feature in this case is the first peak “doublet” feature
after the 65 μs mark, where a doublet is indicated by two fingerprint features existing for different
scales at the same point in time. For each following DWFP image, this feature is identified and
highlighted by a red star (∗), with the corresponding position in time given to the right of each
fingerprint image
this dispersion-curve behavior since this is what allows arrival time shifts to be
correlated directly to material thickness changes.
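A short worked example makes the logic concrete: for a straight ray path of length d, the expected arrival time of a mode is t = d / vg(f·h), so thinning lowers f·h and shifts t. The sketch below uses made-up dispersion-curve samples in place of a real lookup table; the numbers are placeholders, not measured values.

    import numpy as np

    # Hypothetical (frequency-thickness, group velocity) samples for one mode,
    # as if digitized from a dispersion curve near its velocity maximum.
    fh_pts = np.array([5.0, 6.0, 7.0, 7.4, 8.0])   # MHz-mm (placeholder)
    vg_pts = np.array([3.0, 4.2, 5.0, 5.3, 5.1])   # mm/us  (placeholder)

    def arrival_time(dist_mm, freq_mhz, thick_mm):
        """Expected mode arrival time (us) for a straight ray path."""
        vg = np.interp(freq_mhz * thick_mm, fh_pts, vg_pts)
        return dist_mm / vg

    print(arrival_time(290.0, 0.78, 9.5))  # nominal thickness
    print(arrival_time(290.0, 0.78, 8.0))  # thinner (corroded) region: later arrival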
In order to reliably identify these arrival time shifts in Lamb wave signals, we
again employ the dynamic wavelet fingerprint (DWFP) technique. The patterns in
DWFP images then allow us to identify particular Lamb wave modes and directly
track subtle shifts in their arrival times.
Apprentice shipbuilders fabricated a “T”-shaped plate sample made of 9.5-mm
(3/8-in.)-thick mild steel for testing, as shown in Fig. 2.23. The sample was first
ground down slightly in several different areas on both the top and bottom surfaces
[Fig. 2.21 panels: Feature arrival time (µs) versus Position (mm)]
Fig. 2.21 Extracted time locations of the features identified in each of the DWFP representations
for the sample with a crack only using pitch/catch and pulse/echo waveforms, as well as those from
the sample with a dent only using pitch/catch and pulse/echo waveforms. We see that the pitch/catch
scanning configuration is able to identify any dents present, while the pulse/echo configuration is
better at identifying any cracks present
[Fig. 2.22 panels: Peak energy versus Degrees]
Fig. 2.22 A plate sample with a dent only. Results show no reflection above the noise level at any
angle due to no crack being present. A plate sample with a dent and a large crack. Results show
significantly higher reflection energy when the Rayleigh–Lamb waves were incident at an angle
close to broadside. A plate sample with a large crack only. Results show highest reflection energy
at normal incidence to the crack
to simulate the effects of corrosion. The sample was then covered in yellow paint,
and a 1-in.-thick green foam layer was bonded to both the top and bottom surfaces.
The sample was intended to be representative of bulkhead sections.
Shear wave contact transducers in a parallel pitch/catch scanning configuration
were used to systematically scan the full length of the T-plate. A Matec3 TB1000
3 Northborough, MA (http://www.matec.com/).
Fig. 2.23 A 9.5-mm (3/8-in.)-thick steel bulkhead sample used in this analysis. Several areas on the
sample were ground down to introduce simulated wastage or corrosion. The sample was then covered in yellow paint, and a 1-in.-thick insulating layer of green foam was bonded to the surface.
In this picture, a section of the green foam has been removed over one of the several underlying
thinned regions
pulser/receiver was paired with a Gage4 CS8012a A/D digitizer to collect data.
We selected a 0.78 MHz tone-burst excitation pulse for this inspection, giving a
frequency-thickness product of 7.4 MHz-mm. This corresponds to an area of the dispersion curve where the S2 mode is fastest and its group velocity is at a maximum,
shown in Fig. 2.24. As the thickness of the plate decreases, i.e., a corroded region,
the frequency-thickness will decrease resulting in a shifted, slower S2 group velocity
and therefore a later arrival time of the S2 mode. Because the entire T-plate sample
4 Lockport, IL (http://www.gage-applied.com/).
Fig. 2.24 Group velocity dispersion curve for steel. At a frequency-thickness product of 7.4 MHz-mm, the S2 mode is the fastest and therefore first arriving mode. If the wave propagates through an
area of reduced thickness, the frequency-thickness product drops as well. This results in a slower
S2 mode velocity, and therefore a later arrival time
was covered in the thick foam insulation, we removed a narrow strip from the edges
of the plate in order to have direct contact between the transducers and the material.
The remaining foam was left on the sample for scanning. This insulation removal
could be avoided if transducers were bonded to the material during the construction
process. The transducers were stepped in parallel along opposing exposed edges of
the T-plate in 1 cm steps through 29 total locations, covering the full length of the
sample (projection 1). The sample was rotated 90◦ , and the process was repeated for
the remaining two edges (projection 2).
In order to extract mode arrivals from the raw waveforms, we used the DWFP
technique. We first filtered the raw waveforms with a third-order Coiflet mother
wavelet. We then used the DWFP to generate 2D fingerprint images. These are
used to identify the features of interest, here the S2 mode arrival, in the signals.
Proper mode selection is especially important for this plate sample because of the
thick rubbery coating on each surface. Some modes are strongly attenuated by such
coatings, although we found that the S2 mode was able to both propagate well and
detect the thinning flaws.
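The filtering step mentioned above can be sketched as standard wavelet denoising with the same coif3 mother wavelet; the soft-threshold rule below is a common default, not necessarily the one used here.

    import numpy as np
    import pywt

    def coif3_denoise(signal, level=4):
        """Wavelet-denoise a waveform with a third-order Coiflet: decompose,
        soft-threshold the detail coefficients, reconstruct."""
        coeffs = pywt.wavedec(signal, 'coif3', level=level)
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745       # noise estimate (MAD)
        thresh = sigma * np.sqrt(2.0 * np.log(len(signal)))  # universal threshold
        coeffs[1:] = [pywt.threshold(c, thresh, mode='soft') for c in coeffs[1:]]
        return pywt.waverec(coeffs, 'coif3')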
Results
The raw waveforms were converted to DWFP fingerprint images for analysis. The
mode of interest is the first arriving S2 , so the signals were windowed around the
expected S2 arrival time in order to observe any changes due to damage. A raw
Fig. 2.25 A raw Lamb wave signal (top) is first windowed around the region of interest, here
the first mode arrival time. DWFP images of this region can then be compared directly between
unflawed (middle) and flawed (bottom) signals. A simple tracking of the mode arrival time can be
applied, shown here as the red dot and triangle with corresponding arrival time on the right of each
image, allowing for the identification of any flawed regions
waveform is shown in Fig. 2.25, with a windowed DWFP representation for both
an unflawed ray path and one that passes through a flaw on the test sample. In each
DWFP image, the red circle and corresponding red triangle indicate the automatically
identified S2 arrival time, provided to the right of each image. It was found that several
areas along the length of the plate in each orientation had easily identifiable changes
in S2 mode arrival time from the 132.0 μs arrival time for the unflawed waveforms.
We can apply a simple threshold to these extracted mode arrival times, where any
waveforms with arrival times later than 134.0 μs are labeled as “flawed”, and any
that arrive before this threshold are labeled as “unflawed”. Since we collected data
along two orthogonal directions in projections 1 and 2, we can map out the flawed
versus unflawed ray paths geometrically and identify any hotspots where ray paths
in both orientations identify a suspected flaw. This is illustrated in Fig. 2.26, where
“flawed” waveforms from each projection are indicated by gray, and any spatial
areas that indicated “flaw” in both orientations highlighted in red. We also provide
a photograph of the actual sample with the foam insulation completely removed for
Fig. 2.26 Spatially overlaying the individual ray paths onto a grid of the bulkhead sample and thresholding their first mode arrival time, we can see agreement between our experimentally identified "flawed" areas highlighted in red and the known thinned regions of the plate identified by the blue (top) and white (bottom) ellipses
final identification of the flawed regions, indicated by the blue/white ovals in each
subfigure. Excellent agreement was found between the suspected flaw locations and
the sample’s actual flaws.
It should be noted that these results were collected with transducers stepping
in parallel (keeping straight across from each other) in two orthogonal directions,
allowing us to do a reasonably good job of localizing and to some extent sizing the
flaws. Two of the expected flaw areas are over-sized, and there is one "ghost flaw" which isn't actually present but is an artifact of the "shadows" cast by two of the real flaws. In order to more accurately localize and size flaws, it is necessary to
incorporate information from Lamb wave ray paths at other angles. The formal way
to do this is called Lamb wave tomography [114–125].
2.7.3 Aluminum Sensitization in Marine Plate-Like
Structures
Al–Mg-based aluminum alloys suffer from intergranular corrosion and intergranular
stress corrosion cracking when the alloy has become “sensitized” [248]. The degree
of sensitization (DoS) is currently quantified by the ASTM G67 Nitric Acid Mass
Loss Test. A nondestructive method for rapid detection and the assessment of the
condition of DoS in AA5XXX aluminum alloys in the field is needed [249]. Two
samples of aluminum plate approximately 2 × 3 in size were provided for this study,
which had been removed from a Naval vessel due to sensitization-induced cracking.
Sample 1, which was 0.375 in. thick, can be seen in Fig. 2.27, and Sample 2, which was 0.25 in. thick, can be seen in Fig. 2.28. The degree of sensitization (DoS) of each was
unknown, although Sample 1 appeared to have a line of testing pits from a previous
destructive sensitization measurement.
The guided wave apparatus that was used in this data collection is a Nanopulser 3 (NP3), a pocket-sized, computer-controlled pulser/receiver and analog-to-digital (A/D) converter combination unit developed by AIS5 connected to a computer running MATLAB. The NP3 generates a spike excitation voltage, which is converted to
mechanical energy by a piezoelectric transducer in contact with the sample surface.
The resonant frequency of the transducer determines the frequency content of ultrasonic waves which are generated. These Lamb waves then propagate throughout the
sample while the receiving transducer converts these vibrations back into a voltage,
which is then sampled and recorded via the on-board A/D. The transducer pairs used
for the data shown here were Panametrics V204 2.25 MHz contact transducers made
by Olympus, although a wide selection of transducer configurations was available
for use with this apparatus.
Ultrasonic C-scans of each of the samples were performed prior to Lamb wave
scanning, with a 10.0 MHz unfocused (immersion) transducer stepped point-by-point
in a raster pattern above the area of interest, recording the pulse-echo ultrasonic
waveforms at each location. These signals are then gated around the reflection of
interest—i.e., from the top surface, bottom surface, or a defect in between—and
a map of that reflection time delay can be reconstructed. Another possible feature
that can be extracted is the maximum amplitude of the reflected signal. Many more
exist, each offering their own unique advantages. Two examples, both maximum
amplitude and top surface reflection time-of-arrival, can be seen in Figs. 2.29 and
2.30 for two regions of interest on one of the samples, a crack and a bottom-surface
welded attachment, respectively.
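A sketch of this gating and feature extraction for a single waveform; the gate bounds and sampling rate are stand-in parameters:

    import numpy as np

    def cscan_features(waveform, fs, gate):
        """Extract two C-scan features from one pulse-echo waveform: the
        time-of-arrival of the gated reflection and its maximum amplitude."""
        i0, i1 = (int(t * fs) for t in gate)   # gate around the reflection of interest
        segment = np.abs(waveform[i0:i1])
        toa = (i0 + np.argmax(segment)) / fs   # arrival time of the peak (s)
        return toa, segment.max()

Repeating this at every raster point and reshaping either feature onto the scan grid yields the C-scan images discussed below.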
5 Automated Inspection Systems, Martinez, CA. http://ais4ndt.com/index.html.
Fig. 2.27 Photographs of Sample 1 showing the top and bottom surfaces. The bottom surface has a vertical line of small pits which are thought to be testing points from a previous destructive sensitization measurement, although no information about the results of those tests has been made available for this study
Due to the large size and curvature of the two samples, they were each scanned in smaller regions, and the individual C-scans for each region were stitched together in the post-scan analysis to form composite images. As this technique uses a point-by-point measurement, each individual section contained a 61 × 61 grid of waveform recordings and took approximately 35 min to complete. Each of the two samples therefore took roughly 12 h to fully scan in an automated, continuous manner. Post-processing to form the composite C-scans was done interactively and was also quite time-consuming. Results can be seen in Fig. 2.30. It can be seen that the Sample 1 results highlight both the obvious top-surface features in Fig. 2.27 and more subtle, reverse-side features. Results for Sample 2 also highlight the obvious macro-features, but don't return any unexpected features that weren't visible to the naked eye (Figs. 2.31 and 2.32).
Once the two samples had been characterized by traditional C-scan techniques,
guided waves were used to interrogate a variety of the features found on the samples.
Unlike the individual point-by-point pulse/echo measurements collected in the C-scan, guided waves propagate along the plate samples between pairs of transducers
which can be widely separated. This allows for wave interactions with structural
Fig. 2.28 Photographs of Sample 2 showing the top and bottom surfaces. The top surface has original paint still attached, while the bottom surface has the remnants of attached insulation in places
features and/or flaws along the entire “ray path” between two transducers arranged
in a pitch/catch setup. Because of this, only a handful of waveforms are needed to
interrogate a large region of interest. This rapid, large-area inspection capability is
the inherent advantage of guided waves. No special surface treatment or coupling is
required, and the transducers only need direct contact with the surface at the pitch–
catch locations. Between the transducers, the surface can be covered or otherwise
inaccessible and the transducers can contact the plate from whichever surface is most
convenient.
Sample 1 had three regions on it where guided waves were used to characterize an
existing feature, as can be seen in Fig. 2.33: an L-shaped surface weld attachment, a
through-thickness crack, and a rear weld located on the opposite side of the sample.
Sample 2 had two regions where guided waves were used, as shown in Fig. 2.34: an
area to demonstrate the distance guided waves are able to propagate as well as the fact
that they also follow the curvature of the plate, and an area where a through-hole had
been poorly repaired. Results that follow are presented via both raw waveforms and
Fig. 2.29 C-scan results from a subsection of Sample 1 that contained a through-thickness crack
using a time-of-arrival extraction technique, where the color relates to the height of the top surface
of the sample. The same data was processed to extract a maximum amplitude value for each signal,
where the color essentially relates to the strength of the reflected signal. Areas where the material
was not consistent or was angled oddly show up as lower energy reflections. The photograph shows
the actual scanned area highlighted by the green dotted line, as well as the visible crack, traces of
which are seen in both of the C-scan representations
DWFP representations, and we will see that the DWFPs allow for easier recognition
of subtle differences between the raw signals.
Guided Wave Results—Surface Weld
The first region inspected using guided waves, a surface weld located on Sample 1,
included an obvious surface feature in the shape of an L where an attachment had
been previously removed, as can be seen in Fig. 2.35. Transmitting and receiving
transducers were placed 17 cm apart from each other on opposite sides of the feature,
and stepped horizontally across the region in 2 cm steps, recording a total of 11
Fig. 2.30 C-scan results from a subsection of Sample 1 that contained a weld on the reverse side
of the plate using a time-of-arrival extraction technique, where the color relates to the height of the
top surface of the sample. The same data was processed to extract a maximum amplitude value for
each signal, where the color essentially relates to the strength of the reflected signal. The weld on
the reverse side of the plate scatters the incident wave, resulting in a lower energy reflection. The
photograph shows the actual scanned area highlighted by the green dotted line. The weld is not
visible from this side of the plate; however, it clearly shows up in the maximum amplitude C-scan
image
waveforms. Even positioning the transducers manually, this 340 cm² region only
took around one minute to cover using guided waves, which is significantly shorter
than the 40+ min required for an automated C-scan of the same area. Visually, it
appears that waveforms 04–08 pass through, or immediately next to, this L feature.
The raw waveforms from surface weld region can be seen in Fig. 2.36. In this
case, it is not too difficult to see that waveforms 5 and 6 are lower in amplitude
than the surrounding waveforms, which correspond to the two ray paths that travel
directly through the L-shaped surface feature. These signals were windowed around
Fig. 2.31 Composite C-scan representation of Sample 1 using a time-of-arrival extraction technique. With point-by-point pulse-echo ultrasound, the time-of-arrival of the reflection measures the
height of the top surface of the plate, which shows up as variations in color intensity in this image.
All the surface features can be easily identified using this method; however, features beneath the
surface or on the reverse side do not show up. An alternate representation for the C-scan data is by
extracting the maximum signal amplitude, which relates the color intensity to the reflected signal
energy. In this image, not only are the surface features visible, but the two large welds on the reverse
side are detectable as well. It should be noted that the curvature in these representations is largely
an artifact of the section-by-section stitching algorithm and is not necessarily to scale
Fig. 2.32 Composite C-scan representation of Sample 2 using a time-of-arrival extraction technique. With point-by-point pulse-echo ultrasound, the time-of-arrival of the reflection measures
the height of the top surface of the plate, which shows up as variations in color intensity in this
image. The surface features can be easily identified using this method; however, the gradual thinning
section is hard to identify from this image alone. An alternate representation for the C-scan data is
by extracting the maximum signal amplitude, which relates the color intensity to the reflected signal
energy. In this image, not only are the surface features visible, but the gradual thinning region shows
up more clearly as well. It should be noted that the curvature in these representations is largely an
artifact of the section-by-section stitching algorithm and is not necessarily to scale
Fig. 2.33 Regions of Sample 1 where guided wave data was collected, including a surface weld, a through-thickness crack, and a weld located on the reverse side of the plate. Each region was inspected individually using guided waves. All three regions represent different types of features that the through-thickness profiles of guided waves will interact with
Fig. 2.34 Regions of Sample 2 where guided wave data was collected, including a full-length region to highlight the guided waves' ability to propagate long distances and follow plate curvature, as well as a section that included a thinned region with a through-hole
the Lamb wave arrival time, as identified by dotted lines in Fig. 2.36, and DWFP
representations were generated as shown in Fig. 2.37. The red arrow indicates a "double-triangle" signal feature that changes with respect to the location of the L-shaped
feature. The feature vanishes completely in signals 5 and 6 but also appears slightly
different in waveforms 4 and 7, corresponding to the ray paths that propagate along
the edges of the L so that the Lamb waves interact weakly with it. By tracking this
feature, the guided waveforms allow a relatively complete interrogation of the full
340 cm² region around the surface weld, interacting strongly with the surface L in
the center of the test area.
Fig. 2.35 Picture of the L-shaped surface weld on Sample 1 with an overlay of the ray paths used
in the guided wave inspection. A transmitting transducer generating a signal was placed at one end
of ray path 01 with a receiving transducer to record the signal placed at the other. Both transducers
were then moved to ray path 02, and the process was repeated across the weld region
Guided Wave Results—Crack
A through-thickness crack near the edge of Sample 1 can be seen in Fig. 2.38. Visually
the crack appeared to have propagated a few centimeters, but the actual end location
of the crack was not known. Lamb waves are quite sensitive to cracks, since through-thickness cracks will transmit very little of the Lamb wave signal. Transmitting and
receiving transducers were placed 5 cm apart from each other, on either side of the
crack, and stepped parallel to the length of the crack in 2 cm increments.
The raw waveforms from the region along the crack can be seen in Fig. 2.39.
Since ultrasonic vibrations at MHz frequencies do not travel well through air, any
cracks or similar discontinuities in the material will reflect almost all of the wave
energy. It follows that a receiving transducer will pick up essentially zero signal so
long as a full-thickness crack is directly between it and the transmitting transducer.
This is seen in waveforms 01–03. If the crack depth is only partial thickness of the
plate, then some wave energy can propagate through the still intact portion of the
material. This can be seen in waveform 04, where a very low amplitude signal was
recorded. The signal gains amplitude in waveforms 05 and 06, indicating that the
crack is still present but less severe, and waveforms 07–09 appear to have no crack
present. The DWFP representations reinforce these interpretations, as can be seen in
Fig. 2.40. These results indicate that the extent of the crack is greater than visually apparent, i.e., it extends several centimeters past where it's obvious visually. Since
Fig. 2.36 Raw waveforms corresponding to ray paths covering a region with the L-shaped surface
weld. The raw waveforms are difficult to interpret, with no clear indication of a weld present in any
of the ray paths other than a slight reduction in amplitude in signals 5 and 6. Windowed sections
of these signals, defined by the dotted lines, were used to generate DWFP fingerprints, as shown in
Fig. 2.37
guided waves propagate with a full-thickness displacement profile, they are useful
at identifying and locating hidden cracks.
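This none/partial/through behavior suggests a simple three-way rule on the received amplitude; the thresholds below are illustrative only, normalized to an unflawed baseline waveform:

    def classify_crack(peak_amp, a_noise=0.05, a_intact=0.8):
        """Classify one ray path from its transmitted Lamb wave amplitude,
        expressed as a fraction of an unflawed baseline amplitude."""
        if peak_amp < a_noise:
            return "through-thickness crack"  # essentially nothing transmitted
        if peak_amp < a_intact:
            return "partial-thickness crack"  # energy leaks under the crack
        return "no crack"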
Guided Wave Results—Rear Weld
The scanning surface for the rear weld on Sample 1 can be seen in Fig. 2.41. However,
the feature of interest in this section is actually on the opposite side of the plate.
Highlighted by the blue dotted line in Fig. 2.41, an attachment on the rear side had
been cut off at the weld, leaving behind the remaining weld in a box outline. The
transmitting and receiving transducers were placed 10 cm apart, so the ray paths
would cross this box weld as shown in Fig. 2.41. Geometrically, it was known that
ray paths 04–07 crossed the weld, with ray paths 03 and 08 not only crossing but
running along the weld.
The raw waveforms, as shown in Fig. 2.42, are more difficult to interpret in this
example, as there is no obvious signal feature that correlates with the box weld. The
windowed and converted DWFP representations, as shown in Fig. 2.43, offer a better
comparison. As indicated by the red arrow, an early DWFP feature can be seen that
shifts in time (to the new location of the blue arrow) when the ray paths cross the
Fig. 2.37 DWFP representations of the raw waveforms covering an area with the L-shaped surface weld. A column beneath the red triangle highlights the location of a "double-triangle" signal feature which can be used to identify the surface weld. As the ray paths cross the surface weld, this double-triangle feature first loses amplitude in signal 4, and then disappears completely in signals 5 and 6, only to reappear once the ray path has crossed the weld. By tracking this signal feature, the guided waveforms allow a complete interrogation of the full 340 cm² region around the surface weld
Fig. 2.38 Picture of the through-thickness crack on Sample 1 with an overlay of the ray paths used
in the guided wave inspection. A transmitting transducer generating a signal was placed at one end
of ray path 01 with a receiving transducer to record the signal placed at the other. Both transducers
were then moved to ray path 02, and the process was repeated along the crack region
Fig. 2.39 Raw waveforms corresponding to ray paths covering a region with a through-thickness
crack. The raw waveforms easily identify where the crack is located, and even suggest that the crack
is longer than visually apparent. Since guided waves propagate with a full-thickness displacement
profile, they are useful at identifying and locating cracks. Windowed sections of these signals,
defined by the dotted lines, were used to generate DWFP fingerprints, as shown in Fig. 2.40
weld located on the reverse side of the plate. Also, in waveforms 03 and 08, there is
an even greater time shift corresponding to the ray paths traveling along the weld,
rather than just crossing it, resulting in the earliest DWFP feature being even further
to the right than the blue arrow. By changing the time-of-arrival of the wave modes,
such welds are readily detectable as effective local increases in plate thickness. These
results show that the guided waves are sensitive to surface features on either side of
a plate, which is essential in situations where only one side of the plate is accessible.
Guided Wave Results—Distance and Curvature
The region of Sample 2 used to demonstrate the propagation properties of guided
waves can be seen in Fig. 2.44. The region of interest spans the full 46 cm length
of the plate and follows the greatest curvature of the plate. Several ray paths were
considered, again 2 cm apart from each other. There were no visible flaws in this
section, and the sample still had paint.
The raw waveforms from this region can be seen in Fig. 2.45. Again, the raw
waveforms all appear very similar. Any differences should be minor, since no flaws
Fig. 2.40 DWFP representations of the raw waveforms covering a region with a through-thickness
crack. The red triangle highlights the arrival time of the waveforms that do not propagate through
the crack. It is clearly seen that waveforms 1–4 have almost zero signal because they are uniformly
dark, indicating that the crack is not only present but extends the full depth of the plate and blocks
the Lamb waves so that no signal is recorded at the receiving transducer. Waveforms 5 and 6 show
a clear signal; however, the first feature seen in the DWFP is a bit further to the right than in
waveforms 7–9, indicating that waveforms 5 and 6 contain slower Lamb wave modes as a result
of a partial-thickness crack still being present. Waveforms 7–9 all appear essentially identical and
correspond to no flaw being present. By tracking the mode arrival features in the DWFP, crack
presence and severity can be determined
were believed to be present along any of the ray paths. The DWFP representations can
be seen in Fig. 2.46 and do not show significant differences between the waveforms.
The initial mode arrival, indicated by the red arrow, does not change in time apart
from slight differences due to coupling since the data was collected by hand. All the
fingerprint representations share the same overall structure, as was expected.
These results highlight the guided waves’ ability to propagate longer distances,
interacting with the full plate thickness throughout the entire propagation path. The
curvature of the plate is also followed, allowing for more complicated structures to
be investigated using guided waves.
Fig. 2.41 Picture of the region of Sample 1 where a rear-located weld was present, indicated by the
blue dotted line, with an overlay of the ray paths used in the guided wave inspection. A transmitting
transducer generating a signal was placed at one end of ray path 01 with a receiving transducer
to record the signal placed at the other. Both transducers were then moved to ray path 02, and the
process was repeated along the width of the region
Fig. 2.42 Raw waveforms corresponding to ray paths covering a region with a weld on the reverse
side of the plate. The raw waveforms do not easily identify the presence of the weld, which runs
from waveforms 3–8. Welds which are effective local increases in plate thickness result in a change
in arrival time for the guided waves. Windowed sections of these signals, defined by the dotted
lines, were used to generate DWFP fingerprints for further inspection, as shown in Fig. 2.43
Fig. 2.43 DWFP representations of the raw waveforms covering a region with a weld located on the
reverse side of the plate. The red triangle highlights the arrival time of the waveforms that do not
interact with the weld, which can be seen to correspond with a small initial group of fingerprints in
1–2 and 9–12. This group of fingerprints is shifted to the right, meaning later in time, in waveforms
5–7. These waveforms are known to cross the weld located on the rear side of the plate. This local
increase in thickness slows the mode velocity momentarily, resulting in a later mode arrival time.
Waveforms 3, 4, and 8 have an even further shift in mode arrival, which agrees with the fact that
these ray paths run along the weld, not just crossing it. By tracking a specific feature in the DWFP
corresponding to a mode arrival, it is straightforward to track features on either side of the plate,
regardless of which side is accessible for scanning
Guided Wave Results—Thinning Near Hole
A region of Sample 2 that contained a thinned region with a through-hole can be seen
in Fig. 2.47. This is a 29 cm by 20 cm region of the plate, with the waveforms crossing
an unflawed region and then a region with a poorly repaired through-hole. Around
the repaired through-hole, the plate gradually loses thickness in a larger circular area.
Again, the paint was still left on the sample in its as-received state.
The raw waveforms can be seen in Fig. 2.48. They are difficult to distinguish from
each other, although they appear to have a gradual decrease in amplitude as the ray
paths approach the flaw. When the signals are windowed and DWFP representations
are generated, a more thorough analysis can be performed. Waveforms 1–4 appear
Fig. 2.44 Picture of the region of Sample 2 where the propagation distance was to be demonstrated
along with the property that the guided waves follow plate curvature, with an overlay of the ray
paths used in the guided wave inspection. A transmitting transducer generating a signal was placed
at one end of ray path 01 with a receiving transducer to record the signal placed at the other. Both
transducers were then moved to ray path 02, and the process was repeated across the region. The
curvature of the plate is also highlighted
very similar to each other, both in the time domain and the number of rings present
in the fingerprint objects. The red arrow corresponds to the point in time for first
mode arrival. Waveforms 5–10 all show a reduced number of rings in the fingerprint
regions, as the ray paths cross the reduced thickness region in the repaired hole. They
also show a shifted mode arrival, as indicated by the blue arrow. Waveform 11 shows
a further time shift, corresponding to the ray path that crosses the through-hole itself.
Again, the guided waves are successfully able to interrogate the plate sample and
identify the flawed region. A series of recorded signals can cover a large region in
very little time, while the DWFP representations allow for quick fingerprint feature
identification, a process that can also be automated using standard image processing
techniques. The rapid inspection and analysis of these signals allows for a thorough
large-area inspection technique.
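One standard image-processing route to that automation, sketched here with scipy.ndimage as an assumed tool choice, is to count the connected ridge components, the "rings", in each binary fingerprint image and flag ray paths where the count drops:

    from scipy import ndimage

    def count_rings(fingerprint):
        """Number of connected ridge components in a binary DWFP image; a
        drop in this count flags ray paths crossing the thinned region."""
        _, n = ndimage.label(fingerprint)   # 2D connected-component labeling
        return n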
Fig. 2.45 Raw waveforms corresponding to ray paths propagating the full length of the plate. The
raw waveforms all appear quite similar, with mode arrival times all the same. Windowed sections
of these signals, defined by the dotted lines, were used to generate DWFP fingerprints, as shown in
Fig. 2.46
2.8 Summary
We have demonstrated the usefulness of the dynamic wavelet fingerprinting technique
for analysis of time-domain signals, with three specific applications in the field of
structural health monitoring. We have discussed the advantages of time–frequency
signal representations, and the specific subset associated with time-scale wavelet
transformations. We demonstrated how the DWFP technique can be used to automatically identify differences between dents and subtle surface cracks in aircraft-grade
aluminum through a combination of pitch/catch and pulse/echo inspection configurations. Combined with straightforward image analysis techniques identifying and
tracking specific features of interest within these DWFP images, we have shown
how to implement a low-power, portable Rayleigh–Lamb wave inspection system for
mapping flaws in airplane fuselages caused by incidental contact on runways. Additionally, we have presented a similar approach for the identification and localization
of corrosion and cracks in marine structures. Even when the material is underneath a
bonded layer of insulation, the guided wave modes were shown to reliably propagate
the full length of the sample without significant distortion of the wavelet fingerprints.
Fig. 2.46 DWFP representations of the raw waveforms propagating the full length of the plate. The
red triangle highlights the approximate arrival time of the guided wave modes in all the waveforms.
Because the propagation distance was so long and the transducers were held by hand, the effects of
transducer coupling and orientation show up as minor fluctuations in the fingerprints. The overall
shape of the fingerprints and time-of-arrival is similar for all waveforms, indicating that there were
no hidden flaws present in any of the ray paths. The main point here is that the signals were still
very usable propagating the full distance of the plate, even with the curvature present
Multiple simulated corrosion regions were identified using DWFP image analysis,
again by identifying a specific feature of interest and tracking it. The two sensitized
aluminum samples provided for this work contained a variety of flaws and features
that can be located and characterized with Lamb waves. The wavelet fingerprint representation of Lamb waveforms that we have used here provides a route to routine
use of Lamb waves outside of laboratory environments. The method can be tailored
to specific inspection scenarios of interest, and the identification of specific “flaw
signatures” can be done automatically and in real time.
These examples both make use of a DWFP feature identification technique, where
the feature identified is specific to the application. There is no one “magic” feature we
can identify that works across all applications. Here, we had insight into the underlying physics of Lamb wave propagation and were able to predict what types of features
in the signals would correlate with the changes we were interested in identifying, i.e.,
changes in mode arrival time or structure due to some form of material damage. For
Fig. 2.47 Picture of the region of Sample 2 where a gradual thinning occurs along with an eventual
through-hole, highlighted by the blue dotted line, with an overlay of the ray paths used in the guided
wave inspection. A transmitting transducer generating a signal was placed at one end of ray path
01 with a receiving transducer to record the signal placed at the other end. Both transducers were
then moved to ray path 02, and the process was repeated up along the area of interest. It should be
noted that ray path 01 corresponds to no flaw, progressing to ray path 11 which corresponds to the
through-hole
Fig. 2.48 Raw waveforms corresponding to ray paths covering a region with gradual thinning
leading up to a through-hole. The raw waveforms 5–11 appear to have a lower amplitude than
waveforms 1–4; however, not much else can be determined directly from the raw signals. Windowed
sections of these signals, defined by the dotted lines, were used to generate DWFP fingerprints for
further inspection, as shown in Fig. 2.49
Fig. 2.49 DWFP representations of the raw waveforms covering a region with gradual thinning
leading up to a through-hole. The red triangle highlights the arrival time of the waveforms that do
not interact with the thinned region, as shown in waveforms 1–4. As the ray paths start interacting
with the thinned region, there is an obvious shift in arrival time where the first fingerprint feature
is further to the right, as indicated by the blue triangle. This indicates that the plate thickness has
decreased in this region. Waveform 11 has a significantly later mode arrival, corresponding to the ray
path that crosses the through-hole. The hole scatters most of the waveforms’ energy. By tracking the
mode arrival through a specific feature in the DWFP, it is straightforward to identify when material
thickness changes or when flaws scatter the propagating wave’s energy
many time-domain signals, we either do not have prior knowledge of expected signal changes, or they are too complicated for analytical solutions that might provide
insight into the physical interaction at hand. The DWFP analysis technique provides
enough freedom for generating and isolating useful features of interest; however,
only formal pattern classification feature selection would guarantee the inclusion of
the best features for a given application.
References
1. Lamb H (1917) On waves in an elastic plate. In: Proceedings of the Royal Society of London, Series A, vol XCIII, pp 114–128
2. Worlton DC (1961) Experimental confirmation of lamb waves at megacycle frequencies. J
Appl Phys 32(6):967–971
3. Viktorov IA (1967) Rayleigh and Lamb waves - physical theory and applications. Plenum
Press, New York
4. Graff KE (1991) Wave motion in elastic solids. Dover, New York
5. Rose J (1999) Ultrasonic waves in solid media. Cambridge University Press, Cambridge
6. Rose JL (2002) A baseline and vision of ultrasonic guided wave inspection potential. J Press
Vessel Technol 124:273–282
7. Giurgiutiu V, Cuc A (2005) Embedded nondestructive evaluation for structural health monitoring, damage detection, and failure prevention. Shock Vib Digest 37:83–105
8. Su Z, Ye L, Lu Y (2006) Guided Lamb waves for identification of damage in composite
structures: a review. J Sound Vib 295:753–780
9. Raghavan A, Cesnik CES (2007) Review of guided-wave structural health monitoring. Shock
Vib Digest 39(2):91–114
10. Giurgiutiu V (2007) Damage assessment of structures - a US Air Force office of scientific research structural mechanics perspective. Key Eng Mater 347:69–74
11. Giurgiutiu V (2010) Structural health monitoring with piezoelectric wafer active sensors - predictive modeling and simulation. INCAS Bull 2(3):31–44
12. Adams DA (2007) Health monitoring of structural materials and components. Wiley, West
Sussex
13. Giurgiutiu V (2008) Structural health monitoring with piezoelectric wafer active sensors.
Academic, London
14. Filho JV, Baptista FG, Inman DJ (2011) Time-domain analysis of piezoelectric impedance-based structural health monitoring using multilevel wavelet decomposition. Mech Syst Signal Process, in press. https://doi.org/10.1016/j.ymssp.2010.12.003
15. Giridhara G, Rathod VT, Naik S, Roy Mahapatra D, Gopalakrishnan S (2010) Rapid localization of damage using a circular sensor array and Lamb wave based triangulation. Mech
Syst Signal Process 24(8):2929–2946. https://doi.org/10.1016/j.ymssp.2010.06.002
16. Park S, Anton SR, Kim J-K, Inman DJ, Ha DS (2010) Instantaneous baseline structural
damage detection using a miniaturized piezoelectric guided waves system. KSCE J Civil Eng
14(6):889–895. https://doi.org/10.1007/s12205-010-1137-x
17. Wang D, Ye L, Ye L, Li F (2010) A damage diagnostic imaging algorithm based on the
quantitative comparison of Lamb wave signals. Smart Mater Struct 19:1–12. https://doi.org/
10.1088/0964-1726/19/6/065008
18. Silva C, Rocha B, Suleman A (2009) A structural health monitoring approach based on a PZT network using a tuned wave propagation method. In: Paper presented at the 50th AIAA/ASME/ASCE/AHS/ASC structures, structural dynamics and materials conference, Palm Springs, California
19. Rocha B, Silva C, Suleman A (2010) Structural health monitoring system using piezoelectric
networks with tuned Lamb waves. Shock Vib 17(4–5):677–695
20. Wang G, Schon J, Dalenbring M (2009) Use of Lamb wave propagation for SHM and damage detection in sandwich composite aeronautical structures. In: Paper presented at the 7th international workshop on structural health monitoring, Stanford, California, pp 111–118
21. Park S, Anton SR, Inman DJ, Kim JK, Ha DS (2009) Instantaneous baseline damage detection
using a low power guided waves system. In: Paper presented at the 7th international workshop on structural health monitoring, Stanford, California, pp 505–512
22. Kim Y-G, Moon H-S, Park K-J, Lee J-K (2011) Generating and detecting torsional guided
waves using magnetostrictive sensors of crossed coils. NDT E Int 44(2):145–151. https://doi.
org/10.1016/j.ndteint.2010.11.006
23. Chati F, Leon F, El Moussaoui M, Klauson A, Maze G (2011) Longitudinal mode L(0,4) used
for the determination of the deposit width on the wall of a pipe. NDT E Int 44(2):188–194.
https://doi.org/10.1016/j.ndteint.2010.12.001
24. Mustapha S, Ye L, Wang D, Ye L (2011) Assessment of debonding in sandwich CF/EP
composite beams using A0 Lamb wave at low frequency. Compos Struct 93(2):483–491.
https://doi.org/10.1016/j.compstruct.2010.08.032
25. Huang Q, Balogun O, Yang N, Regez B, Krishnaswamy S (2010) Detection of disbonding in GLARE composites using Lamb wave approach. In: Paper presented at the review of progress in QNDE, AIP conference proceedings, vol 1211, pp 1198–1205
26. Koduru JP, Rose JL (2010) Modified lead titanate/polymer 1–3 composite transducers for
structural health monitoring. In: Paper presented at the review of progress in QNDE, AIP
conference proceedings, vol 1211, pp 1799–1806
27. Koehler B, Frankenstein B, Schubert F, Barth M (2009) Novel piezoelectric fiber transducers
for mode selective excitation and detection of LAMB waves. In: Paper presented at the review
of progress in QNDE, AIP conference proceedings, vol 1096, pp 982–989
28. Advani SK, Breon LJ, Rose JL (2010) Guided wave thickness measurement tool development
for estimation of thinning in plate-like structures. In: Paper presented at the review of progress
in QNDE, AIP conference proceedings, vol 1211, pp 207–214
29. Kannajosyula H, Puthillath P, Lissenden CJ, Rose JL (2009) Interface waves for SHM of
adhesively bonded joints. In: Paper presented at the 7th international workshop on structural health monitoring, Stanford, California, pp 247–254
30. Chandrasekaran J, Krishnamurthy CV, Balasubramaniam K (2009) Higher order modes cluster
(HOMC) guided waves - a new technique for NDT inspection. In: Paper presented at the review
of progress in QNDE, AIP conference proceedings, vol 1096, pp 121–128
31. Ratassepp M, Lowe MJS (2009) SH0 Guided wave interaction with a crack aligned in the
propagation direction in a plate. In: Paper presented at the review of progress in QNDE, AIP,
conference proceedings, vol 1096, pp 161–168
32. Masserey B, Fromme P (2009) Defect detection in plate structures using coupled rayleigh-like
waves. In: Paper presented at the review of progress in QNDE, AIP conference proceedings,
vol 1096, pp 193–200
33. De Marchi L, Ruzzene M, Buli X, Baravelli E, Speciale N (2010) Warped basis pursuit for damage detection using lamb waves. IEEE Trans Ultrason Ferroelectr Freq Control 57(12):2734–
2741. https://doi.org/10.1109/TUFFC.2010.1747
34. Limongelli MP (2010) Frequency response function interpolation for damage detection under
changing environment. Mech Syst Signal Process 24(8):2898–2913. https://doi.org/10.1016/
j.ymssp.2010.03.004
35. Clarke T, Simonetti F, Cawley P (2010) Guided wave health monitoring of complex structures
by sparse array systems: influence of temperature changes on performance. J Sound Vib
329(12):2306–2322. Structural Health Monitoring Theory Meets Practice. https://doi.org/10.
1016/j.jsv.2009.01.052
36. De Marchi L, Marzani A, Caporale S, Speciale N (2009) A new warped frequency transformation (WFT) for guided waves characterization. In: Paper presented at health monitoring of
structural and biological systems, proceedings of spie - the international society for optical
engineering, vol 7295
37. Xu B, Yu L, Giurgiutiu V (2009) Lamb wave dispersion compensation in piezoelectric wafer
active sensor phased-array applications. In: Paper presented at health monitoring of structural
and biological systems, proceedings of SPIE - the international society for optical engineering,
vol 7295
38. Engholm M, Stepinski T (2010) Using 2-D arrays for sensing multimodal lamb waves. In:
Paper presented at nondestructive characterization for composite materials, aerospace engineering, civil infrastructure, and homeland security, proceedings of SPIE - the international
society for optical engineering, vol 7649
39. De Marchi L, Ruzzene M, Xu B, Baravelli E, Marzani A, Speciale N (2010) Warped frequency
transform for damage detection using lamb waves. In: Paper presented at health monitoring of
104
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
52.
53.
54.
55.
M. K. Hinders and C. A. Miller
structural and biological systems, proceedings of SPIE - the international society for optical
engineering, vol 7650
Thalmayr F, Hashimoto K-Y, Omori T, Yamaguchi M (2010) Frequency domain analysis
of lamb wave scattering and application to film bulk acoustic wave resonators. IEEE Trans
Ultrason Ferroelectr Freq Control 57(7):1641–1648. https://doi.org/10.1109/TUFFC.2010.
1594
Lu Y, Ye L, Wang D, Wang X, Su Z (2010) Conjunctive and compromised data fusion schemes
for identification of multiple notches in an aluminium plate using lamb wave signals. IEEE
Trans Ultrason Ferroelectr Freq Control 57(9):2005–2016. https://doi.org/10.1109/TUFFC.
2010.1648
Ye L, Ye L, Zhongqing S, Yang C (2008) Quantitative assessment of through-thickness crack
size based on Lamb wave scattering in aluminium plates. NDT E Int 41(1):59–68. https://doi.
org/10.1016/j.ndteint.2007.07.003
Moore EZ, Murphy KD, Nichols JM (2011) Crack identification in a freely vibrating plate
using Bayesian parameter estimation. Mech Syst Signal Process In Press, https://doi.org/10.
1016/j.ymssp.2011.01.016
An Y-K, Sohn H (2010) Instantaneous crack detection under varying temperature and static
loading conditions. Struct Control Health Monit 17:730–741. https://doi.org/10.1002/stc.394
Parnell WJ, Martin PA (2011) Multiple scattering of flexural waves by random configurations
of inclusions in thin plates. Wave Motion 48(2):161–175. https://doi.org/10.1016/j.wavemoti.
2010.10.004
Wilcox PD, Velichko A, Drinkwater BW, Croxford AJ, Todd MD (2010) Scattering of plane
guided waves obliquely incident on a straight feature with uniform cross-section. J Acoust
Soc Am 128:2715–2725. https://doi.org/10.1121/1.3488663
Soni S, Kim SB, Chattopadhyay A (2010) Reference-free fatigue crack detection, localization
and quantification in lug joints. In: Paper presented at the 51st AIAA/ASME/ASCE/AHS/ASC
structures, structural dynamics and materials conference, Orlando, Florida
Hall JS, Michaels JE (2009) On a model-based calibration approach to dynamic baseline
estimation for structural health monitoring. In: Paper presented at the review of progress in
QNDE, AIP conference proceedings, vol 1096, pp 896–903
Sohn H, Kim SB (2010) Development of dual PZT transducers for reference-free crack detection in thin plate structures. IEEE Trans Ultrason Ferroelectr Freq Control 57(1):229–240
Park S, Lee C, Sohn H (2009) Frequency domain reference-free crack detection using transfer
impedances in plate structures. In: Paper presented at health monitoring of structural and
biological systems, proceedings of SPIE - the international society for optical engineering,
vol 7295
Michaels TE, Ruzzene M, Michaelsa JE (2009) Frequency-wavenumber domain methods for
analysis of incident and scattered guided wave fields. In: Paper presented at health monitoring
of structural and biological systems, proceedings of SPIE - the international society for optical
engineering, vol 7295
Lee C, Kim S, Sohn H (2009) Application of a baseline-free damage detection technique to
complex structures. In: Paper presented at sensors and smart structures technologies for civil,
mechanical, and aerospace systems, proceedings of SPIE - the international society for optical
engineering, vol 7292
Ruzzene M, Xu B, Lee SJ, Michaels TE, Michaels JE (2010) Damage visualization via
beamforming of frequency-wavenumber filtered wavefield data. In: Paper presented at health
monitoring of structural and biological systems ,proceedings of SPIE - the international society
for optical engineering, vol 7650
Soni S, Kim SB, Chattopadhyay A (2010) Fatigue crack detection and localization using
reference-free method. In: Paper presented at smart sensor phenomena, technology, networks,
and systems proceedings of SPIE - the international society for optical engineering, vol 7648
Ayers J, Apetre N, Ruzzene M, Sharma V (2009) Phase gradient-based damage characterization of structures. In: Paper presented at the 7th international workshop on structural health
monitoring, California, Stanford, pp 1113–1120
2 Intelligent Structural Health Monitoring with Ultrasonic Lamb Waves
105
56. Zhang J, Drinkwater BW, Wilcox PD, Hunter AJ (2010) Defect detection using ultrasonic
arrays: the multi-mode total focusing method. NDT E Int 43(2):123–133. ISSN:0963-8695.
https://doi.org/10.1016/j.ndteint.2009.10.001
57. Kang T, Lee D-H, Song S-J, Kim H-J, Jo Y-D, Cho H-J (2011) Enhancement of detecting
defects in pipes with focusing techniques, NDT E Int 44(2):178–187. ISSN 0963-8695. https://
doi.org/10.1016/j.ndteint.2010.11.009
58. Higuti RT, Martinez-Graullera O, Martin CJ, Octavio A, Elvira L, De Espinosa FM (2010)
Damage characterization using guided- wave linear arrays and image compounding techniques. IEEE Trans Ultrason Ferroelectr Freq Control 57(9):1985–1995. https://doi.org/10.
1109/TUFFC.2010.1646
59. Hall JS, Michaels JE (2010) Minimum variance ultrasonic imaging applied to an in situ sparse
guided wave array. IEEE Trans Ultrason Ferroelectr Freq Control 57(10):2311–2323. https://
doi.org/10.1109/TUFFC.2010.1692
60. Kumar A, Poddar B, Kulkarni G, Mitra M, Mujumdar PM (2010) Time reversibility
of lamb wave for damage detection in isotropic plate. In: Paper presented at the 51st
AIAA/ASME/ASCE/AHS/ASC structures, structural dynamics and materials conference,
Orlando, Florida
61. Zhao N, Yan S (2011) A new structural health monitoring system for composite plate. Adv
Mater Res 183–185:406–410
62. Zhang H, Cao Y, Yu J, Chen X (2010) Time reversal and cross-correlation analysis for damage
detection in plates using lamb waves. In: Paper presented at the 2010 international conference
on audio, language and image processing, Shanghai, China, ICALIP 2010 Proceedings, pp
1516–1520
63. Teramoto K, Uekihara A (2009) Time reversal imaging for gradient sensor networks over the
lamb-wave field. In: Paper presented at the ICCAS-SICE 2009 - ICROS-SICE international
joint conference, Fukuoka, Japan, Proceedings, pp 2311–2316
64. Zhao N, Yan S (2010) Experimental research on damage detection of large thin aluminum
plate based on lamb wave. In: Paper presented at sensors and smart structures technologies
for civil, mechanical, and aerospace systems, proceedings of SPIE - the international society
for optical engineering, vol 7647
65. Bellino A, Fasana A, Garibaldi L, Marchesiello S (2010) PCA-based detection of damage in
time-varying systems. Mech Syst Signal Process 24(7) 2010:2250–2260. https://doi.org/10.
1016/j.ymssp.2010.04.009. Special Issue: ISMA
66. Garcia-Rodriguez M, Yanez Y, Garcia-Hernandez MJ, Salazar J, Turo A, Chavez JA (2010)
Application of Golay codes to improve the dynamic range in ultrasonic Lamb waves aircoupled systems. NDT E Int 43(8):677–686. https://doi.org/10.1016/j.ndteint.2010.07.005
67. Kim K-S, Fratta D (2010) Travel-time tomographic imaging: multi-frequency diffraction
evaluation of a medium with a high-contrast inclusion. NDT E Int 43(8):695–705. https://doi.
org/10.1016/j.ndteint.2010.08.001
68. Xin F, Shen Q (2011) Fuzzy complex numbers and their application for classifiers performance
evaluation. Pattern Recognit 44(7):1403–1417. https://doi.org/10.1016/j.patcog.2011.01.011
69. Gutkin R, Green CJ, Vangrattanachai S, Pinho ST, Robinson P, Curtis PT (2011) On acoustic
emission for failure investigation in CFRP: Pattern recognition and peak frequency analyses.
Mech Syst Signal Process 25(4):1393–1407. https://doi.org/10.1016/j.ymssp.2010.11.014
70. Duroux A, Sabra KG, Ayers J, Ruzzene M (2010) Extracting guided waves from crosscorrelations of elastic diffuse fields: applications to remote structural health monitoring. J
Acoust Soc Am 127:204–215. https://doi.org/10.1121/1.3257602
71. Kerber F, Sprenger H, Niethammer M, Luangvilai K, Jacobs LJ (2010) Attenuation analysis
of lamb waves using the chirplet transform. EURASIP J Adv Signal Process 2010, 6. Article
ID 375171. https://doi.org/10.1155/2010/375171
72. Martinez L, Wilkie-Chancellier N, Glorieux C, Sarens B, Caplain E (2009) Transient spacetime surface waves characterization using Gabor analysis paper presented at the anglo french
physical acoustics conference. J Phys: Conf Ser 195 012009. https://doi.org/10.1088/17426596/195/1/012009
106
M. K. Hinders and C. A. Miller
73. Roellig M, Schubert L, Lieske U, Boehme B, Frankenstein B, Meyendorf N (2010) FEM
assisted development of a SHM-piezo-package for damage evaluation in airplane components.
In: Paper presented at the 11th international conference on thermal, mechanical and multiphysics simulation, and experiments in microelectronics and microsystems, EuroSimE 2010,
Bordeaux, France
74. Xu H, Xu C, Zhou S (2010) Study of lamb wave propagation in plate for UNDE by 2-D FEM
model. In: Paper presented at the 2010 international conference on measuring technology and
mechatronics automation, Changsha, China, ICMTMA 2010, vol 3, pp 556–559
75. Qu W, Xiao L (2009. Finite element simulation of lamb wave with piezoelectric transducers
for composite plate damage detection. Adv Mater Res 79–82, 1095–1098
76. Fromme P (2010) Directionality of the scattering of the A0 lamb wave mode at cracks. In:
Paper presented at the review of progress in QNDE, AIP conference proceedings, vol 11211,
pp 129–136
77. Karthikeyan P, Ramdas C, Bhardwa MC, Balasubramaniam K (2009) Non-contact ultrasound
based guided lamb waves for composite structure inspection: Some interesting observations.
In: Paper presented at the review of progress in QNDE, AIP conference proceedings, vol
1096, pp 928–935
78. Fromme P (2009) Structural health monitoring of plates with surface features using guided
ultrasonic waves. In: Paper presented at health monitoring of structural and biological systems,
proceedings of SPIE - the international society for optical engineering, vol 7295
79. Moreau L, Velichko A, Wilcox PD (2010) Efficient methods to model the scattering of ultrasonic guided waves in 3D. In: Paper presented at Health monitoring of structural and biological
systems proceedings of SPIE - the international society for optical engineering, vol 7650
80. Qu W, Xiao L, Zhou Y (2009) Finite element simulation of Lamb wave with piezoelectric
transducers for plastically-driven damage detection. In: Paper presented at the 7th international
workshop on structural health monitoring, California, Stanford, pp 737–744
81. Teramoto K, Uekihara A (2009) The near-field imaging method based on the spatio-temporal
gradient analysis. In: Paper presented at the ICCAS-SICE 2009 - ICROS-SICE international
joint conference, proceedings, pp 3393–3398
82. Teramoto K, Tamachi N (2010) Near-field acoustical imaging of cracks over the A0-mode
lamb-wave field. In: Paper presented at the SICE annual conference, Proceedings, Taipei,
Taiwan, pp 2742–2747
83. Fellinger P, Marklein R, Langenberg KJ, Klaholz S (1995) Numerical modeling of elastc
wave propagation and scattering with EFIT- elastodynamic finite integration technique. Wave
Motion 21(1):47–66
84. Schubert F, Peiffer A, Kohler B, Sanderson T (1998) The elastodynamic finite integration
technique for waves in cylindrical geometries. J Acoust Soc Am 104(5):2604–2614
85. Schubert F, Koehler B (2001) Three-dimensional time domain modeling of ultrasonic wave
propagation in concrete in explicit consideration of aggregates and porosity. J Comput Acoust
9(4):1543–1560
86. Schubert F (2004) Numerical time-domain modeling of linear and nonlinear ultrasonic
wave propagation using finite integration techniques - theory and applications. Ultrasonics
42(1):221–229
87. Rudd K, Bingham J, Leonard K, Hinders M (2007) Simulation of guided waves in complex
piping geometries using the elastodynamic finite integration technique. JASA 121(3):1449–
1458
88. Rudd K, Hinders M (2008) Simulation of incident nonlinear sound beam 3d scattering from
complex targets. Comput Acoust 16(3):427–445
89. Bingham J, Hinders M (2009) Lamb wave detection of delaminations in large diameter pipe
coatings. Open Acoust J 2:75–86
90. Bingham J, Hinders M (2009) Lamb wave characterization of corrosion-thinning in aircraft
stringers: experiment and 3d simulation. JASA 126(1):103–113
91. Bingham J, Hinders M, Friedman A (2009) Lamb Wave detection of limpet mines on ship
hulls. Ultrasonics 49:706–722
2 Intelligent Structural Health Monitoring with Ultrasonic Lamb Waves
107
92. Bingham J, Hinders M (2010) 3D Elastodynamic finite integration technique simulation of
guided waves in extended built-up structures containing flaws computational acoustics, vol
18, Issue 2, pp 165–192
93. Bowman JJ, Senior TBA, Uslenghi PLE (1987) Electromagnetic and acoustic scattering by
simple shapes. Hemisphere Publishing, New York
94. Varadan VV, Lakhtakia A, Varadan VK (eds) (1986) Low and high frequency asymptotics.
Elsevier Science Publishing, New York
95. Varadan VV, Lakhtakia A, Varadan VK (eds) (1991) Field representations and introduction
to scattering. Elsevier Science Publishing, New York
96. Mindlin RD In: Yang J (ed) (2006) Introduction to the mathematical theory of vibrations of
elastic plates. World Scientific Publishing, Singapore
97. Takeda N, Takahashi I, Ito Y (2010).Visualization of impact damage in composite structures using pulsed laser scanning method. In: Paper presented at the 51st
AIAA/ASME/ASCE/AHS/ASC structures, structural dynamics and materials conference,
Orlando, Florid
98. Garcia-Rodriguez M, Ya Y, Garcia-Hernandez MJ, Salazar J, Turo A, Chavez JA (2010)
Laser interferometric measurements of air-coupled lamb waves. In: Paper presented at the 9th
international conference on vibration measurements by laser and non-contact techniques and
short course, Ancona, Italy. AIP Conference Proceedings, vol 1253, pp 88–93
99. Kostson E, Fromme P (2009) Defect detection in multi-layered structures using guided ultrasonic waves. In: Paper presented at the review of progress in QNDE, AIP conference proceedings, vol 1096, pp 209–216
100. Chunguang X, Rose JL, Yan F, Zhao X (2009) Defect sizing of plate-like structure using Lamb
waves. In: Paper presented at the review of progress in QNDE, AIP conference proceedings,
vol 1096, pp 1575–1582
101. Fromme P (2010) Directionality of the scattering of the A0 Lamb wave mode at cracks. In:
Paper presented at the review of progress in QNDE, AIP conference proceedings, vol 1211,
pp 129–136
102. Schubert L, Lieske U, Kohler B, Frankenstein B (2009) Interaction of Lamb waves with
impact damaged CFRP’s -effects and conclusions for acousto-ultrasonic applications. In:
Paper presented at the 7th international workshop on structural health monitoring, California,
Stanford, pp 151–158
103. Salas KI, Nadella KS, Cesnik CES (2009) characterization of guided-wave excitation and
propagation in composite plates. In: Paper presented at the 7th international workshop on
structural health monitoring, California, Stanford, pp 651–658
104. Zhang H, Sun X, Fan S, Qi X, Liu X, Donghui L (2008) An ultrasonic signal processing
technique for extraction of arrival time from Lamb waveforms, advanced intelligent computing
theories and applications. Springer Lect Notes Comput Sci 5226(2008):704–711. https://doi.
org/10.1007/978-3-540-87442-3-87
105. Chai HK, Momoki S, Kobayashi Y, Aggelis DG, Shiotani T (2011) Tomographic reconstruction for concrete using attenuation of ultrasound. NDT E Int 44(2):206–215. https://doi.org/
10.1016/j.ndteint.2010.11.003
106. Ramadas C, Balasubramaniam Krishnan, Makarand Joshi CV, Krishnamurthy, (2011) Characterisation of rectangular type delaminations in composite laminates through B- and D-scan
images generated using Lamb waves. NDT E Int 44(3):281–289. https://doi.org/10.1016/j.
ndteint.2011.01.002
107. Belanger P, Cawley P, Simonetti F (2010) Guided wave diffraction tomography within the
born approximation. IEEE Trans Ultrason Ferroelectr Freq Control 57(6):1405–1418. https://
doi.org/10.1109/TUFFC.2010.1559
108. Wu1 C-H, Yang C-H (2010) An investigation on ray tracing algorithms in Lamb wave tomography. In: Paper presented at the 31st symposium on ultrasonic electronics, Proceedings, vol
31, Tokyo Japan, pp 483–484
109. Hu Y, Xu C, Xu H (2009) The application of wavelet transform for lamb wave tomography. In:
Paper presented at the 1st international conference on information science and engineering,
Nanjing, China. ICISE 2009, pp 681–683
108
M. K. Hinders and C. A. Miller
110. Lissenden CJ, Cho H, Kim CS (2010) Fatigue crack growth monitoring of an aluminum joint
structure. In: Paper presented at the review of progress in QNDE, AIP conference proceedings,
vol 1211, pp 1868–1875
111. Balvantin A, Baltazar A, Kim J (2010) Ultrasonic lamb wave tomography of non-uniform
interfacial stiffness between contacting solid bodies. In: Paper presented at the review of
progress in QNDE, AIP conference proceedings, vol 1211, pp 1463–1470
112. Xu C, Rose JL, Yan F, Zhao X (2009) Defect sizing of plate-like structure using lamb waves.
In: Paper presented at the review of progress in QNDE, AIP conference proceedings, vol
1096, pp 1575–1582
113. Ng CT, Veidt M, Rajic N (2009) Integrated piezoceramic transducers for imaging damage
in composite laminates. In: Paper presented at the second international conference on smart
materials and nanotechnology in engineering, Proceedings of SPIE - the international society
for optical engineering, vol 7493
114. McKeon J, Hinders M (1999) Parallel projection and crosshole Lamb wave contact scanning
tomography. J Acoust Soc Am 106(5):2568–2577
115. McKeon J, Hinders M (1999) Lamb Wave scattering from a through hole. J Sound Vib
224(2):843–862
116. Malyarenko E, Hinders M (2000) Fan beam and double crosshole Lamb wave tomography
for mapping flaws in aging aircraft structures. J Acous Soc Am 108(10):1631–1639
117. Malyarenko E, Hinders M (2001) Ultrasonic Lamb wave diffraction tomography. Ultrasonics
39(4):269–281
118. Hinders M, Malyarenko E, Leonard K (2002) Blind test of Lamb wave diffraction tomography.
In: Thompson DO, Chimenti DE (eds) Reviews of progress in QNDE, vol 21, AIP CP 615,
pp 278–283
119. Leonard K, Malyarenko E, Hinders M (2002) Ultrasonic Lamb wave tomography. Inverse
Prob Special NDE Issue 18(6):1795–1808
120. Leonard K, Hinders M (2003) Guided wave helical ultrasonic tomography of pipes. JASA
114(2):767–774
121. Hou J, Leonard KR, Hinders M (2004) Automatic multi-mode Lamb wave arrival time extraction for improved tomographic reconstruction. Inverse Prob 20:1873–1888
122. Hinders M, Leonard KR (2005) Lamb wave tomography of pipes and tanks using frequency
compounding. In: Thompson DO, Chimenti DE (eds) Reviews of progress in QNDE, vol 24,
pp 867-874
123. Hinders M, Hou J, Leonard KR (2005) Multi-mode Lamb wave arrival time extraction for
improved tomographic reconstruction. In: Thompson DO, Chimenti DE (eds) Reviews of
progress in QNDE, vol 24, pp 736–743
124. Leonard K, Hinders M (2005) Multi-mode Lamb wave tomography with arrival time sorting.
JASA 117(4):2028–2038
125. Leonard K, Hinders M (2005) Lamb wave tomography of pipe-like structures Ultrasonics
44(7):574–583
126. Griffin DR (1958) Listening in the dark: the acoustic orientation of bats and men. Yale University Press, New Haven
127. L. Cohen, (1995). Time-frequency analysis, Prentice-Hall signal processing series
128. Deubechies I (1992) Ten lectures on wavelets. Society for Industrial and Applied Mathematics
129. Abbate A, Koay J, Frankel J, Schroeder SC, Das P (1994) Application of wavelet transform
signal processor to ultrasound. In: Paper presented at the IEEE Ultrasonics Symposium, pp
1147–1152
130. Masscotte D, Goyette J, Bose TK (2000) Wavelet-transorm-based method of analysis for
lamb-wave ultrasonic nde signals. IEEE Trans Instrum Meas 49(3):524–529
131. Perov DV, Rinkeich AB, Smorodinskii YG (2002) Wavelet filtering of signals for ultrasonic
flaw detector. Russ J Nondestr Test 38(12):869–882
132. Lou HW, Guang Rui H (2003) An approach based on simplified klt and wavelet transform
for enhancing speech degraded by non-stationary wideband noise. J Sound Vib 268:717–729
2 Intelligent Structural Health Monitoring with Ultrasonic Lamb Waves
109
133. Zou J, Chen J (2004) A comparative study on time-frequency fracture of cracked rotor by
Wigner-Ville distribution and wavelet transform. J Sound Vib 276:1–11
134. Hou J, Hinders MK (2002) Dynamic wavelet fingerprint identification of ultrasound signals.
Mater Eval 60(9):1089–1093
135. Hinders M, Bingham J, Jones KR, Leonard K (2006) Wavelet thumbprint analysis of TDR
signals for wiring flaw detection. In: Thompson DO, Chimenti DE (eds) Reviews of progress
in QNDE, vol 25, pp 641–648
136. Hinders M, Hou J, McKeon JCP (2005) Ultrasonic inspection of thin multilayers. In: Thompson DO, Chimenti DE (eds) Reviews of progress in QNDE vol 24, pp 1137–1144
137. Ghorayeb SR, Bertoncini CA, Hinders MK (2008) Ultrasonography in Dentistry: A Review
IEEE Transactions on Ultrasonics. Ferroelectrics and Frequency Control 55(6):1256–1266
138. Al-Badour F, Sunar M, Cheded L (2011) Vibration analysis of rotating machinery using timefrequency analysis and wavelet techniques. Mech Syst Signal Process In Press. https://doi.
org/10.1016/j.ymssp.2011.01.017
139. Hein H, Feklistova L (2011) Computationally efficient delamination detection in composite
beams using Haar wavelets. Mech Syst Signal Process In Press. https://doi.org/10.1016/j.
ymssp.2011.02.003
140. Li H, Zhang Y, Zheng H (2011) Application of Hermitian wavelet to crack fault detection in
gearbox. Mech Syst Signal Process 25(4):1353–1363. https://doi.org/10.1016/j.ymssp.2010.
11.008
141. Wang X, Zi Y, He Z (2011) Multiwavelet denoising with improved neighboring coefficients
for application on rolling bearing fault diagnosis. Mech Syst Signal Process 25(1):285–304.
https://doi.org/10.1016/j.ymssp.2010.03.010
142. Jiang X, Mahadevan S (2011) Wavelet spectrum analysis approach to model validation of
dynamic systems. Mech Syst Signal Process 25(2):575–590. https://doi.org/10.1016/j.ymssp.
2010.05.012
143. Kim JH, Kwak H-G (2011) Rayleigh wave velocity computation using principal waveletcomponent analysis. NDT E Int 44(1):47–56. https://doi.org/10.1016/j.ndteint.2010.09.005
144. Jin X, Gupta S, Mukherjee K, Ray A (2011) Wavelet-based feature extraction using probabilistic finite state automata for pattern classification. Pattern Recognit 44(7):1343–1356.
https://doi.org/10.1016/j.patcog.2010.12.003
145. Acciani G, Brunetti G, Fornarelli G, Giaquinto A (2010) Angular and axial evaluation of
superficial defects on non-accessible pipes by wavelet transform and neural network-based
classification. Ultrasonics 50(1):13–25. https://doi.org/10.1016/j.ultras.2009.07.003
146. Jha R, Watkins R (2009) Lamb wave based diagnostics of composite plates using a modified time reversal method. In: Paper presented at the 50th AIAA/ASME/ASCE/AHS/ASC
structures, structural dynamics and materials conference, Palm Springs, California
147. Rathod VT, Panchal M, Mahapatra DR, Gopalakrishnan S (2009) Lamb wave based sensor
network for identification of damages in plate structures. In: Paper presented at the 50th
AIAA/ASME/ASCE/AHS/ASC structures, structural dynamics and materials conference,
Palm Springs, California
148. Du C, Ni Q, Natsuki T (2010) Determination of vibration source locations in laminated
composite plates using lamb wave analysis. Adv Mater Res 79–82:1181–1184
149. Raghuram V, Shukla R, Pramila T (2010) Studies on Lamb waves in long aluminium plates
generated using laser based ultrasonics. In: Paper presented at the 9th international conference on vibration measurements by laser and non-contact techniques and short course, AIP
conference proceedings, vol 1253, pp 100–105
150. Rathod VT, Mahapatra DR, Gopalakrishnan S (2009) Lamb wave based identification and
parameter estimation of corrosion in metallic plate structure using a circular PWAS array.
In: Paper presented at health monitoring of structural and biological systems, Proceedings of
SPIE - the international society for optical engineering, vol 7295
151. Song F, Huang GL, Hu GK (2009) Online debonding detection in honeycomb sandwich
structures using multi-frequency guided waves. In: Paper presented at the second international
conference on smart materials and nanotechnology in engineering, Proceedings of SPIE - the
international society for optical engineering, vol 7493
110
M. K. Hinders and C. A. Miller
152. Hinders MK, Bingham JP (2010) Lamb wave pipe coating disbond detection using the
dynamic wavelet fingerprinting technique. In: Paper presented at the review of progress in
QNDE AIP conference proceedings, vol 1211, pp 615–622
153. Bingham JP, Hinders MK (2010) Automatic multimode guided wave feature extraction using
wavelet fingerprints. In: Paper presented at the review of progress in QNDE, AIP conference
proceedings, vol 1211, pp 623–630
154. Treesatayapun C, Baltazar A, Balvantin A, Kim J (2009) Thickness determination of a plate
with varying thickness using an artificial neural network for time-frequency representation
of Lamb waves. In: Paper presented at the review of progress in QNDE, AIP conference
proceedings, vol 1096, pp 619–626
155. Yu L, Wang J, Giurgiutiu V, Shin Y (2010) Corrosion detection/quantification on thin-wall
structures using multimode sensing combined with statistical and time-frequency analysis. In:
Paper presented at the ASME international mechanical engineering congress and exposition,
proceedings, vol 14, pp 251–257
156. Bao P., Yuan, M., and Fu, Z. (2009). Research on monitoring technology of bolt tightness
degree based on wavelet analysis. In: Paper presented at the ICEMI 2009 - proceedings of 9th
international conference on electronic measurement and instruments, pp 4329–4333
157. Martinez L, Wilkie-Chancellier N, Glorieux C, Sarens B, Caplain E (2009) Transient spacetime surface waves characterization using Gabor analysis. In: Paper presented at the Anglo
-French physical acoustics conference. J Phys: Conf Ser 195:1–9
158. Michaels TE, Ruzzene M, Michaels JE (2009) Incident wave removal through frequencywavenumber filtering of full wavefield data. In: Paper presented at the review of progress in
QNDE, AIP conference proceedings, vol 1096, pp 604–611
159. Hanhui X, Chunguang X, Shiyuan Z, Yong H (2009) Time-frequency analysis for nonlinear
lamb wave signal. In: Paper presented at the 2nd international congress on image and signal
processing, CISP’09, Tianjin, China
160. Feldman M (2011) Hilbert transform in vibration analysis. Mech Syst Signal Process
25(3):735–802. https://doi.org/10.1016/j.ymssp.2010.07.018
161. Li C, Wang X, Tao Z, Wang Q, Shuanping D (2011) Extraction of time varying information
from noisy signals: an approach based on the empirical mode decomposition. Mech Syst
Signal Process 25(3):812–820. https://doi.org/10.1016/j.ymssp.2010.10.007
162. Yoo B, Pines DJ, Purekar AS, Zhang Y (2010) Piezoelectric paint based 2-D sensor array for detecting damage in aluminum plate. In: Paper presented at the 51st
AIAA/ASME/ASCE/AHS/ASC structures, structural dynamics and materials conference,
Orlando, Florida
163. Yuan M, Fu Z, Bao P (2009) Detection of bolt tightness degree based on HHT. In: Proceedings
of 9th international conference on electronic measurement and instruments, paper presented
at ICEMI 2009, Beijing, China, pp 4334–4337
164. Haiyan Z, Xiuli S, Zhidong S, Yueyu X (2009) Group velocity measurement of piezo-actuated
lamb waves using hilbert-huang transform method. In: Paper presented at the proceedings of
the 2nd international congress on image and signal processing, CISP’09, Tianjin, China
165. Martinez L, Wilkie-Chancellier N, Glorieux C, Sarens B, Caplain E (2009) Transient spacetime surface waves characterization using gabor analysis paper presented at the Anglo–french
physical acoustics conference. J Phys: Conf Ser 195:012009. https://doi.org/10.1088/17426596/195/1/012009
166. Xu B, Giurgiutiu V, Yu L (2009) Lamb waves decomposition and mode identification using
matching pursuit method. In: Paper presented at sensors and smart structures technologies for
civil, mechanical, and aerospace systems, proceedings of SPIE - the international society for
optical engineering, vol 7292
167. Cole SA (2001) Suspect identities: a history of fingerprinting and criminal identification.
Harvard University Press, Cambridge
168. Varitz R (2007) Wavelet transform and pattern recognition method for heart sound analysis.
United States Patent 20070191725
2 Intelligent Structural Health Monitoring with Ultrasonic Lamb Waves
111
169. Aussem A, Campbell J, Murtagh F (1998) Wavelet-based feature extraction and decomposition strategies for financial forecasting. J Comp Int Financ 6(2):5–12
170. Tang YY, Yang LH, Liu J, Ma H (2000) Wavelet theory and its application to pattern recognition. World Scientific, River Edge
171. Brooks RR, Grewe L, Iyengar SS (2001) Recognition in the wavelet domain: a survey. J
Electron Imag 10(3):757–784
172. Nason GP, Silverman BW (1995) The stationary wavelet transform and some statistical applications. In: Oppenheim G Antoniadis A (ed) Wavelets and statistics. Lecture notes in statistics.
Springer, pp 281–299
173. Pittner S, Kamarthi SV (1999) Feature extraction from wavelet coefficients for pattern recognition tasks. IEEE Trans Pattern Anal Mach Intel 21(1):83–88
174. Sabatini AM (2001) A digital-signal-processing technique for ultrasonic signal modeling and
classification. IEEE Trans Instrum Meas 50(1):15–21
175. Coifman R, Wickerhauser M (1992) Entropy based algorithms for best basis selection. IEEE
Trans Inform Theory 38:713–718
176. Szu HH, Telfer B, Kadambe S (1992) Neural network adaptive wavelets for signal representation and classification. Opt Eng 31:1907–1916
177. Telfer BA, Szu HH, Dobeck GJ, Garcia JP, Ko H, Dubey A, Witherspoon N (1994) Adaptive
wavelet classification of acoustic and backscatter and imagery. Opt Eng 33: 2,192–2,203
178. Mallet Y, Coomans D, Kautsky J, De Vel O (1997) Classification using adaptive wavelets for
feature extraction. IEEE Trans Pattern Anal Mach Intel 19:1058–1066
179. Mallat S (1989) A theory for multiresolution signal processing: the wavelet representation.
IEEE Trans Pattern Anal Mach Intel 11:674–693
180. Antoine J-P, Barachea D, Cesar RM Jr, da Fontoura CL (1997) Shape characterization with
the wavelet transform. Sig Process 62(3):265–290
181. Yeh C-H (2003) Wavelet-based corner detection using eigenvectors of covariance matrices.
Pattern Recognit Lett 24(15):2797–2806
182. Chapa JO, Raghuveer MR (1995) Optimal matched wavelet construction and its application
to image pattern recognition. Proc SPIE 2491(1):518–529
183. Liang J, Parks TW (1996) A translation-invariant wavelet representation algorithm with applications. IEEE Trans Sig Process 44(2):225–232
184. Maestre RA, Garcia J, Ferreira C (1997) Pattern recognition using sequential matched filtering
of wavelet coefficients. Opt Commun 133:401–414
185. Murtagh F, Starck J-L, Berry MW (2000) Overcoming the curse of dimensionality in clustering
by means of the wavelet transform. Comput J 43(2):107–120
186. Yu T, Lam ECM, Tang YY (2001) Feature extraction using wavelet and fractal. Pattern
Recognit Lett 22:271–287
187. Tsai D-M, Chiang C-H (2002) Rotation-invariant pattern matching using wavelet decomposition. Pattern Recognit Lett 23(1–3):191–201
188. Du T, Lim KB, Hong GS, Yu WM, Zheng H (2004) 2D occluded object recognition using
wavelets. In: 4th international conference on computer and information technology, pp 227–
232. https://doi.org/10.1109/CIT.2004.1357201
189. Saito N, Coifman RR (1994) Local discriminant bases. Proc SPIE 2303(2):2–14. https://doi.
org/10.1117/12.188763
190. Livens S, Scheunders P, de Wouwer GV, Dyck DV, Smets H, Winkelmans J, Bogaerts W (1995)
Classification of corrosion images by wavelet signatures and LVQ networks. In: Hlavác V,
Sára R (eds) Computer analysis of images and patterns V. Springer, Berlin, pp 538–543
191. Tansel IN, Mekdeci C, Rodriguez O, Uragun B (1993) Monitoring drill conditions with
wavelet based encoding and neural networks. Int J Mach Tool Manu 33:559–575
192. Tansel IN, Mekdeci C, McLaughlin C (1995) Detection of tool failure in end milling with
wavelet transformations and neural networks (WT-NN). Int J Mach Tool Manu 35:1137–1147
193. Learned RE, Wilsky AS (1995) A wavelet packet approach to transient signal classification.
Appl Comput Harmon A 2:265–278
112
M. K. Hinders and C. A. Miller
194. Wu Y, Du R (1996) Feature extraction and assessment using wavelet packets for monitoring
of machining processes. Mech Syst Signal Process 10:29–53
195. Case TJ, Waag RC (1996) Flaw identification from time and frequency features of ultrasonic
waveforms. IEEE Trans Ultrason Ferr Freq Cont 43(4):592–600
196. Drai R, Khelil N, Benchaala A (2002) Time frequency and wavelet transform applied to
selected problems in ultrasonics NDE. NDT E Int 35(8):567–572
197. Buonsanti M, Cacciola M, Calcagno S, Morabito FC, Versaci M (2006) Ultrasonic pulseechoes and eddy current testing for detection, recognition and characterisation of flaws
detected in metallic plates. In: Proceedings of 9th European conference on non-destructive
testing, Berlin, Germany
198. Momenan R, Loew MH, Insana MF, Wagner RF, Garra BS (1990) Application of pattern
recognition techniques in ultrasound tissue characterization. In: Proceedings of 10th international conference on pattern recognition, vol 1, pp 608–612
199. Bankman IN, Johnson KO, Schneider W (1993) Optimal detection, classification, and superposition resolution in neural waveform recordings. IEEE Trans Biomed Eng 40(8):836–841
200. Kalayci T, Özdamar Ö (1995) Wavelet preprocessing for automated neural network detection
of EEG spikes. IEEE Eng Med Biol 14:160–166
201. Tate R, Watson D, Eglen S (1995) Using wavelets for classifying human in vivo Magnetic
Resonance spectra. In: Antoniadis A, Oppenheim G (eds) Wavelets and statistics. Springer,
New York, pp 377–383
202. Mojsilovic A, Popovic MV, Neskovic AN, Popovic AD (1995) Wavelet image extension for
analysis and classification of infarcted myocardial tissue. IEEE Trans Biomed Eng 44(9):856–
866
203. Georgiou G, Cohen FS (2001) Tissue characterization using the continuous wavelet transform.
I. Decomposition method. IEEE Trans Ultrason Ferr Freq Cont 48(2): 355–363
204. Georgiou G, Cohen FS, Piccoli CW, Forsberg F, Goldberg BB (2001) Tissue characterization
using the continuous wavelet transform. II. Application on breast RF data. IEEE Trans Ultrason
Ferr Freq Cont 48(2): 364–373
205. Lee W-L, Chen Y-C, Hsieh K-S (2003) Ultrasonic liver tissues classification by fractal feature
vector based on M-band wavelet transform. IEEE Trans Med Imag 22(3):382–392
206. Alacam B, Yazici B, Bilgutay N, Forsberg F, Piccoli C (2004) Breast tissue characterization
using FARMA modeling of ultrasonic RF echo. Ultrasound Med Biol 30(10):1397–1407
207. Bertoncini C, Hinders M (2010) Fuzzy classification of roof fall predictors in microseismic monitoring measurement, vol 43, pp 1690–1701. https://doi.org/10.1016/j.measurement.
2010.09.015
208. Fehlman W, Hinders M (2009) Mobile robot navigation with intelligent infrared image interpretation. Springer tracts in advanced robotics. Springer, Berlin
209. Cara Leckey, M. Rogge, C. Miller and M. Hinders, Multiple-mode Lamb wave scattering
simulations using 3D elastodynamic finite integration technique. Ultrasonics 52(2), 193–344
(2012). https://doi.org/10.1016/j.ultras.2011.08.003
210. DO Thompson, DE Chimenti (eds) (2012) 3D simulations for the investigation of lamb wave
scattering from flaws, review of progress in quantitative nondestructive evaluation, vol 31. In:
AIP Conference Proceedings, vol 1430, pp 111–117. https://doi.org/10.1063/1.4716220
211. Miller C, Hinders M (2012) Flaw detection and characterization using lamb wave tomography
and pattern classification review of progress in quantitative nondestructive evaluation. In:
Thompson DO, Chimenti DE (eds) AIP conference proceedings, vol 31, pp 1430, 663–670.
https://doi.org/10.1063/1.4716290
212. Miller C, Hinders M (2014) Classification of flaw severity using pattern recognition for guided
wave-based structural health monitoring. Ultrasonics 54:247–258. https://doi.org/10.1016/j.
ultras.2013.04.020
213. Miller C, Hinders M (2014) Multiclass feature selection using computational homology for
Lamb wave-based damage characterization. J Intell Mater Syst Struct 25:1511. https://doi.
org/10.1177/1045389X13508335
2 Intelligent Structural Health Monitoring with Ultrasonic Lamb Waves
113
214. Miller C, Hinders M (2014) Intelligent feature selection techniques for pattern classification
of Lamb wave signals. In: AIP conference proceedings of review of progress in quantitative
nondestructive evaluation, vol 1581, p 294. https://doi.org/10.1063/1.4864833
215. Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York
216. Fukunaga K (1990) Introduction to statistical pattern recognition, 2nd edn. Computer science
and scientific computing. Academic, Boston
217. Kuncheva LI (2004) Combining pattern classifiers. Wiley, New York
218. Bishop CM (2006) Pattern recognition and machine learning. Springer, Berlin
219. Webb AR (2012) Statistical pattern recognition. Wiley, New York
220. Ripley BD (1996) Pattern recognition and neural networks. Cambridge University Press,
Cambridge
221. Nagy G (1968) State of the art in pattern recognition. Proc IEEE 56(5):836–863
222. Kanal L (1974) Patterns in pattern recognition: 1968–1974. IEEE Trans Inf Theory 20(6):697–
722
223. Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern
Anal Mach Intell 22(1):4–37
224. Watanabe S (1985) Pattern recognition: human and mechanical. Wiley-Interscience Publication, Wiley
225. Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning.
Artif Intell 97(1):245–271
226. Jain AK, Chandrasekaran B (1982) Dimensionality and sample size considerations in pattern
recognition practice. In: Krishnaiah PR, Kanal LN (eds) Classification pattern recognition
and reduction of dimensionality. Handbook of statistics, vol 2. Elsevier, pp 835–855
227. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics.
Bioinformatics 23(19):2507–2517
228. Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1(1–4):131–156
229. Raymer ML, Punch WF, Goodman ED, Kuhn LA, Jain AK (2000) Dimensionality reduction
using genetic algorithms. IEEE Trans Evol Comput 4(2):164–171
230. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn
Res 3:1157–1182
231. Fan J, Lv J (2010) A selective overview of variable selection in high dimensional feature
space. Stat Sin 20(1):101–148
232. Romero E, Sopena JM, Navarrete G, Alquézar R (2003) Feature selection forcing overtraining
may help to improve performance. In: 2003 Proceedings of the international joint conference
on neural networks, vol 3. IEEE, pp 2181–2186
233. Learned RE, Willsky AS (1995) A wavelet packet approach to transient signal classification.
Appl Comput Harmon Anal 2(3):265–278
234. Yen GG, Lin KC (2000) Wavelet packet feature extraction for vibration monitoring. IEEE
Trans Ind Electron 47(3):650–667
235. Jin X, Gupta S, Mukherjee K, Ray A (2011) Wavelet-based feature extraction using probabilistic finite state automata for pattern classification. Pattern Recognit 44(7):1343–1356
236. Gaul L, Hurlebaus S (1999) Wavelet-transform to identify the location and force-time-history
of transient load in a plate. In: Chang F-K (ed) Structural health monitoring. Technomic
Publishing Co, Lancaster, pp 851–860
237. Sohn H, Farrar CR, Hemez FM, Shunk DD, Stinemates DW, Nadler BR, Czarnecki JJ (2004)
A review of structural health monitoring literature: 1996-2001. Technical report, Los Alamos
National Laboratory, Los Alamos, NM. Report Number LA-13976-MS
238. Lee C, Park S (2011) Damage classification of pipelines under water flow operation using
multi-mode actuated sensing technology. Smart Mater Struct 20(11):115002–115010
239. Min J, Park S, Yun CB, Lee CG, Lee C (2012) Impedance-based structural health monitoring
incorporating neural network technique for identification of damage type and severity. Eng
Struct 39:210–220
240. Sohn H, Farrar CR, Hunter NF, Worden K (2001) Structural health monitoring using statistical
pattern recognition techniques. J Dyn Syst Meas Contr 123(4):706–711
114
M. K. Hinders and C. A. Miller
241. Biemans C, Staszewski WJ, Boller C, Tomlinson GR (1999) Crack detection in metallic
structures using piezoceramic sensors. Key Eng Mater 167:112–121
242. Legendre S, Massicotte D, Goyette J, Bose TK (2000) Wavelet-transform-based method of
analysis for Lamb-wave ultrasonic NDE signals. IEEE Trans Instrum Meas 49(3):524–530
243. Zou J, Chen J (2004) A comparative study on time-frequency feature of cracked rotor by
Wigner-Ville distribution and wavelet transform. J Sound Vib 276(1):1–11
244. Raghavan A, Cesnik CES (2007) Review of guided-wave structural health monitoring. Shock
Vib Dig 39(2):91–114
245. Harris D (2006) The influence of human factors on operational efficiency. Aircr Eng Aerosp
Technol 78(1):20–25
246. Marsh G (2006) Duelling with composites. Reinf Plast 50(6):18–23
247. Bowermaster D (2006) Alaska isn’t the only airline with ground-safety troubles. http://
seattletimes.com/html/businesstechnology/2002750657_alaska20.html
248. Zhang R, Knight SP, Holtz RL, Goswami R, Davies CHJ, Birbilis N (2016) A survey of
sensitization in 5xxx series aluminum alloys. Corrosion 72(2):144–159. https://doi.org/10.
5006/1787
249. Li F, Xiang D, Qin Y, Pond RB, Slusarski K (2011) Measurements of degree of sensitization
(DoS) in aluminum alloys using EMAT ultrasound. Ultrasonics 51(5), 561–570. https://doi.
org/10.1016/j.ultras.2010.12.009, ISSN 0041-624X
Chapter 3
Automatic Detection of Flaws in
Recorded Music
Ryan Laney and Mark K. Hinders
Abstract This chapter describes the application of wavelet fingerprinting as a technique to analyze and automatically detect flaws in recorded audio. Specifically, it
focuses on time-localized errors in digitized wax cylinder recordings and contemporary digital media. By taking the continuous wavelet transform of various recordings, we created a two-dimensional binary display of audio data. After analyzing the
images, we implemented an algorithm to automatically detect where a flaw occurs
by comparing the image matrix against the matrix of a known flaw. We were able to
use this technique to automatically detect time-localized clicks, pops, and crackles in
both cylinders and digital recordings. We also found that while other extra-musical
noises, such as coughing, did not leave a traceable mark on the fingerprint, they were
distinguishable from samples without the error.
Keywords Digital music editing · Wavelet fingerprint
3.1 Introduction
Practical audio recording began in the late nineteenth century with Thomas Edison’s
wax cylinders (Fig. 3.1). After experimentation with tinfoil as a recording medium, wax was found to be a more viable and marketable medium for capturing sound. A performer would play sound into the recording apparatus, shaped like a horn, and the pressure would increase as the sound traveled down the horn, causing a stylus to etch a groove into the wax mold, which could then be played back. Unfortunately, the material would degrade during playback, and the recording
would become corrupted as the wax eroded. This problem persisted into the twentieth
century, even as the production process continued to improve.
Fig. 3.1 An Edison phonograph (left). An Edison wax cylinder (right) [9]
Among the flaws produced are time-localized pops and crackles, which often render cylinder recordings
unlistenable. The first step in preserving these cylinders is digitization [1], because once digitized the recording cannot undergo any further damage. While many cylinders
have been digitized, many of these recordings are not of sufficient commercial value
to merit fixing by a sound engineer. The National Jukebox project at the Library of
Congress [2] has made available to the public free of charge digitized recordings from
the extraordinary collections of the Library of Congress Packard Campus for Audio
Visual Conservation and other contributing libraries and archives. At launch, the
Jukebox included more than 10,000 recordings made by the Victor Talking Machine
Company between 1901 and 1925. Jukebox content is expected to increase regularly. Other ongoing efforts of note include the Smithsonian’s Time-Based Media
and Digital Art Working Group [3], the National Digital Information Infrastructure
and Preservation Program [4], the New York Public Library, which has one of the
oldest, largest, and most comprehensive institutional preservation programs in the
United States, with activities dating back to 1911 [5], the Stanford Media Preservation Lab [6], the Columbia Center for Oral History [7], and the Media Digitization
and Preservation Initiative at Indiana University [8].
Our work aims to make the next phase easier. If we can automatically detect the flaws that make a digitized recording unlistenable, an engineer will have a much easier time fixing them, and that process will be less expensive. Moreover, if multiple recordings of a particular piece exist, an engineer can select which one is worth fixing based on which has the fewest errors. Thus, more of these
historically important recordings will be conserved and made available for the public
to fully enjoy.
Most recording studios now use digital audio workstations (DAWs) on computers in conjunction with physical mixing stations, and it is common for audio to be recorded at bit depths of 24 bits or more and sample rates of 96 kHz or higher, although this is usually reduced to 16 bits and 44.1 kHz for commercial production.
Fig. 3.2 Gunnar Johansen was the first musician to be appointed artist in residence at an American university. He held that post at the University of Wisconsin in Madison from 1939 to 1976, during which he not only taught at the school but also performed several extended series of concerts on the university's radio station. Acting as his own recording producer and technician, he marketed recordings on his Artist Direct record label from his home near Blue Mounds, Wisconsin. In total, there were about 150 recordings issued on LP, including the complete keyboard works of Franz Liszt, J.S. Bach and the lesser known composer Ignaz Friedman, among others. All of these releases were subsequently made available for purchase on cassette as well. Although it has been said that some of these recordings suffered from Johansen's inexperience as a recording technician, the performances, particularly the Liszt, were highly praised by reviewers in scholarly journals and music magazines. The recording studio included a Hamburg Steinway ca. 1920 Concert Grand (D) (after ca. 1960), a Moor Steinway Double Keyboard (D), a Moor Bösendorfer Double Keyboard, a Harpsichord, Spinet, Virginal, and Clavichord built in the 1950s. Johansen's magnetic tape recordings have been digitized and painstakingly restored by his student, Jonathan Stevens [10]
Unlike the case of digitized
cylinders, engineers have each instrument’s individual track (or tracks) to work with
rather than just one master track. Often this means that a flaw can be isolated to
just one track, and a correction on that level is less musically offensive than altering
the master. Common problems which recording engineers spend hours finding and
fixing include bad splices, accidental noises from an audience, and extra-musical
sounds from the performers such as vocal clicks, guitar pick noises, and unwanted
breathing. A time-consuming and monotonous portion of many recording engineers’
jobs is to go through a track little by little, find these errors, and then correct them.
Fortunately, the circumstances of digital recording allow for much easier correction
of errors. It is usually a simple matter to create a “correct” version of a track using
various processing techniques, as the digital realm gives the engineer essentially
unlimited room to duplicate, manipulate, and test different variations. This makes
automatic detection of the errors much easier in most cases (Fig. 3.2).
3.2 Digital Music Editing Tools
Programs do exist for correcting errors in audio at a very fine level. CEDAR (http://www.cedar-audio.com), commercial audio restoration software, lets the user look
at a spectrogram of the sound and remove unwanted parts by altering the image
of the spectrogram. Even cross-fading and careful equalization in most DAWs can
eliminate time-localized clicks, pops, and splice errors. However, this can be both a
time-consuming and expensive process, as the engineer has to both recognize all the
errors in a recording and then figure out how to correct them. In the music industry,
this never happens unless the recording is highly marketable in the first place. Due to
financial limitations, many historically important recordings on cylinders, magnetic
tape, other analog media, and even digital media don’t merit a human cleaning. The
time and cost it takes to do this would be greatly reduced with an algorithm that automatically detects errors. The engineers would expend no effort finding the errors; they would simply make the correction at the given time. In addition, if multiple versions of a recording exist, an engineer could select which one would be worth fixing based on the number of flaws that require attention.
Previous work in this field has involved using spectrographic analysis as a basis for de-noising a signal with the wavelet transform. The stationary wavelet transform (SWT) one-dimensional de-noising tool in MATLAB (The MathWorks, Inc.) was used to generate a spectrogram of an input signal followed by the spectrogram of the filtered signal. The wavelet decomposition yields coefficients of "approximations" and "details", i.e., lower frequencies and higher frequencies, which can be individually modified with a certain gain. In a sense, this is a type of equalization that relies on a wavelet transform rather than a Fourier transform, as all the coefficients belonging to a certain group are changed according to their individual gain.
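To make the per-level gain idea concrete, here is a minimal MATLAB sketch, assuming the Wavelet Toolbox and using the standard multilevel decomposition for brevity; the file name, wavelet, level count, and gain values are illustrative assumptions, not the settings used in this work:

```matlab
% Minimal sketch of wavelet-domain "equalization": apply a per-level gain
% to the detail coefficients of a multilevel wavelet decomposition.
% Requires the Wavelet Toolbox; all parameter values are illustrative.
[x, fs] = audioread('recording.wav');    % hypothetical input file
x = x(:, 1);                             % work with one channel
nLev = 5;
[c, l] = wavedec(x, nLev, 'db4');        % c holds [A_n, D_n, ..., D_1]
l = l(:);                                % bookkeeping vector as a column
gains = [0.2 0.4 0.7 1.0 1.0];           % gain for detail levels 1..nLev
first = cumsum([1; l(1:end-2)]);         % start index of each block in c
for lev = 1:nLev
    blk = nLev + 2 - lev;                % block holding detail level lev
    idx = first(blk) : first(blk) + l(blk) - 1;
    c(idx) = gains(lev) * c(idx);        % rescale that level's details
end
y = waverec(c, l, 'db4');                % reconstruct the filtered signal
soundsc(y, fs);                          % audition the result
```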
Unfortunately, spectrographic analysis was not a very practical tool for automating noise removal. While it was useful for improving individual sections of a recording, large-scale de-noising did not make the recording more listenable. At a given moment in time it was appropriate to suppress some of the detail coefficients, while removing them at other moments actually detracted from the musical quality.
We are using a Graphical User Interface (GUI) in MATLAB implemented by
the Non-Destructive Evaluation lab at William and Mary to display and analyze
the wavelet fingerprint of filtered data (Fig. 3.3) [11–13]. The continuous wavelet
transform is used to evaluate patterns related to frequency and intensity in the music,
but the output is not a plot of frequency intensities over time like a spectrogram, but
rather a two-dimensional binary image. This is a viable model because it is much
easier to recognize patterns in a binary image than in a spectrogram, and it’s easier to
tell a computer how to look at it. By examining raw and filtered fingerprints for many
variations of a certain type of input, we can make generalizations about how a flaw
in a recording will manifest itself in the wavelet fingerprint. The wavelet fingerprint
tool works similarly to MATLAB's built-in wavelet tool: we input sound data from the
workspace in MATLAB and view it as a waveform in the user interface. Below that,
we see a filtered version of the wave file according to the filter parameters, and then the wavelet fingerprint below that.
Fig. 3.3 Wavelet fingerprint tool. Top plot: the entire input waveform. Middle plot: a selected portion of the filtered waveform. Bottom plot: fingerprint of the selected part
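To make the binary-image idea concrete, the following much-simplified sketch shows one way such a fingerprint can be produced. This is our illustrative rendition, not the NDE lab's GUI code; the segment length, scales, wavelet, and slice count are all assumptions, and the three-argument cwt call is the classic syntax of older Wavelet Toolbox releases:

```matlab
% Much-simplified sketch of a wavelet fingerprint: normalize the CWT
% coefficient magnitudes and slice them into alternating bands, which
% yields a binary, contour-like "thumbprint" image. All parameters are
% illustrative; this is not the actual fingerprint tool.
[x, fs] = audioread('segment.wav');      % hypothetical audio segment
x = x(1:2048, 1);                        % short single-channel window
scales = 1:64;
coefs = cwt(x, scales, 'morl');          % classic CWT syntax (Morlet wavelet)
m = abs(coefs) / max(abs(coefs(:)));     % normalize magnitudes to [0, 1]
nSlices = 6;
fp = mod(floor(m * nSlices), 2) == 1;    % alternating slices -> binary ridges
imagesc(fp), colormap(gray), axis xy
xlabel('time (samples)'), ylabel('scale')
```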
The process of writing a detection algorithm involves several important steps.
When working with digital recordings, we reproduced a given flaw via recording or
processing in a digital audio workstation. Using Avid's Pro Tools DAW (http://www.avid.com), we recorded coughs and instrumental sounds, and generated other noises one might find in a recording. We gathered multiple variations of a type of recorded
data from multiple people; it does not reveal anything if the manifestation of one
person’s cough is unique, for instance, because then the algorithm would only be
valid for finding that one error. The amplitude, panning, and reverb of the error can
be edited in Pro Tools and then synchronized in time with the clean audio tracks.
The wave files are then exported separately, imported into a workspace in MATLAB,
and analyzed using the wavelet fingerprint tool. Files are compared with other files
containing the same error as well as a control file without the error. We can then
filter the waveform using Fourier or wavelet analysis and examine the fingerprints
for patterns and similarities.
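A minimal sketch of the matrix-comparison step might look like the following; the function name and threshold are illustrative assumptions, and we assume the track fingerprint and the known-flaw fingerprint share the same scale (row) axis:

```matlab
% Minimal sketch of flaw detection by fingerprint comparison: slide the
% binary fingerprint of a known flaw along the time axis of a track's
% fingerprint and score the fraction of agreeing pixels at each offset.
% Function name and threshold are illustrative assumptions.
function cols = findFlaw(fp, flawFp, thresh)
    tw = size(flawFp, 2);                    % template width in time samples
    nPos = size(fp, 2) - tw + 1;
    score = zeros(1, nPos);
    for k = 1:nPos
        patch = fp(:, k:k+tw-1);             % same-size window of the track
        score(k) = mean(patch(:) == flawFp(:));  % pixel agreement in [0, 1]
    end
    cols = find(score > thresh);             % candidate flaw start columns
end
```

A call such as findFlaw(trackFp, coughFp, 0.9) would then return the time columns at which a cough-like pattern is suspected.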
When working with cylinder recordings, we took a slightly different approach. In
this case, we did not simply insert an error into a clean recording; the recordings we
worked with had the error to begin with. This lack of control meant that, initially, we did not know exactly what we were looking for. Fortunately, we have access to files that have been processed through CEDAR's de-clicking, de-crackling, and de-hissing system. This does help make the recording more listenable, but the errors are
still present to a smaller degree. The cleaned-up files are also louder and include the
proper fades at the beginning and end of the track. Thus, this process is somewhat
analogous to the mastering process in digital music—the final step in making a track
sound its best. By synchronizing the cleaned-up track with the raw track, we can
figure out what we are looking for in the wavelet fingerprint. The errors that are
affected by CEDAR will appear differently in the edited track, and the rest of the
track should manifest itself in a similar way.
However, synchronizing the files can be rather challenging. In a digital audio
workstation, it is relatively simple to get the files lined-up within several hundredths
of a second. At this point, the files will sound like they overlap almost completely,
but the sound will be more “full” due to the delay. However, several hundredths of
a second at the industry standard 44.1 kHz sample rate can be up to about 2,000
samples. Due to the visual nature of the wavelet fingerprint, we need to get the files
synchronized much better than that. Synchronization within about 100 samples is
sufficient.
One method of synchronizing the files is simply to examine them visually in a
commercial DAW. By filtering out much of the non-musical information using an
equalization program, the similarities in the unedited and edited waveforms become
clearer. We can continue to zoom in on the waveforms in the program and synchronize
them as much as possible until they are lined up down to the sample. However,
although this is an effective approach, it is not preferable because it is time-consuming
and subject to human error. Thus, we have developed a program in MATLAB to
automatically synchronize the two waveforms. Like a commercial DAW, our audio
workstation allows the user to first manually line up the waveforms to the extent that
is humanly practical. A simultaneous playback function exists so that we can verify
aurally that the waveforms are reasonably synchronized.
At this point, the user can run an automatic synchronization. This program analyzes the waveforms against each other over a specified region and calculates a
Pearson correlation coefficient between them. The second waveform is then shifted
by a sample, and another correlation coefficient is calculated. After the desired number
of iterations, based on how close the user thinks the initial manual synchronization
is, the program displays the maximum correlation coefficient and shifts the second waveform by the appropriate amount, thus automatically synchronizing the two
waveforms.
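A minimal sketch of this loop is shown below, assuming x and y are the two mono waveforms after manual alignment, region is the vector of sample indices to compare, and maxShift is the search range in samples; the function name and arguments are ours for illustration, not the tool's actual interface.

    % Sketch of the correlation-based synchronization described above.
    function bestShift = syncWaveforms(x, y, region, maxShift)
        shifts = -maxShift:maxShift;
        r = -inf(size(shifts));                    % correlation for each trial shift
        for i = 1:numel(shifts)
            idx = region + shifts(i);              % shifted window into y
            if idx(1) >= 1 && idx(end) <= numel(y)
                c = corrcoef(x(region), y(idx));   % Pearson correlation coefficient
                r(i) = c(1, 2);
            end
        end
        [rMax, iMax] = max(r);
        bestShift = shifts(iMax);                  % shift y by this amount to align
        fprintf('max correlation %.3f at shift %d samples\n', rMax, bestShift);
    end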
Like a commercial DAW, our MATLAB implementation also includes a Fourier-based equalization program, a compression algorithm to cut intensities above a certain decibel level, a gate algorithm to cut intensities below a certain decibel level,
gain control, and a trim function to change the length of a waveform by adding
zero values. Unlike most DAWs, however, our program allows for a more critical
examination and alteration of a waveform, including frequency spectrum analysis by
way of a discrete Fourier transform, and spectrograms of the waveforms.
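As one small example, the gate can be sketched as a sample-by-sample threshold (simpler than the envelope-following gates in commercial DAWs); here x is assumed to be a mono waveform scaled to plus or minus 1, and the names are ours:

    % Sketch of the gate: cut intensities below a decibel threshold.
    function y = gateWaveform(x, thresholdDB)
        thresh = 10^(thresholdDB / 20);   % convert dB (e.g., -40) to linear amplitude
        y = x;
        y(abs(x) < thresh) = 0;           % zero out samples below the gate level
    end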
The equalization tool, while not a paragraphic equalizer, allows for carefully constructed multi-band rectangular filters to be placed on a waveform after Fourier
analysis. This is particularly helpful in the synchronization process; by removing the
low end of the signal (below anything that is musically important) and removing the
high end (above any of the fundamental pitch frequencies, usually no more than several thousand hertz), we can store the unedited and edited waveforms as temporary
variables that end up looking much more like each other than they did initially. By
running the synchronization algorithm on the musically similar temporary variables,
we know exactly how the actual waveforms should match up. With the unedited and
edited files synchronized, we can examine them effectively with the wavelet fingerprint tool. The more significant differences between the fingerprint of the unedited
sound and that of the edited sound will likely indicate an error. After we examine
multiple instances, if we have enough reason to relate a visual pattern to an audio
error, we can use our detection algorithm to automatically locate a flaw.
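For illustration, the rectangular filtering used to build those temporary variables can be sketched as zeroing FFT bins outside a band; the function name and band edges are ours. The ringing such brick-wall filters introduce is harmless here, since the result is only correlated, never listened to:

    % Sketch of a rectangular band-pass filter via the FFT.
    function y = bandFilter(x, fs, fLo, fHi)
        N = numel(x);
        X = fft(x);
        f = (0:N-1)' * fs / N;          % bin frequencies, 0 .. fs*(N-1)/N
        f = min(f, fs - f);             % fold the upper half back to 0 .. fs/2
        X(f < fLo | f > fHi) = 0;       % zero bins outside the band (both halves)
        y = real(ifft(X));              % back to the time domain
    end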
The detection algorithm we have implemented involves a numerical evaluation
of the similarity between a known flaw and a given fingerprint. Because the wavelet
fingerprint is in reality just a pseudocolor plot of ones and zeros, we can incrementally
shift a smaller flaw matrix over the larger fingerprint matrix and determine how
similar the two matrices are. To do this, we simply increase a value representing
how similar the matrices are by a given amount every time there is a match. The
user is able to decide how many points are awarded for a match of zeroes and how
many points are awarded for a match of ones. After the flaw is shifted over the entire
waveform from beginning to end, we plot the match values and determine where the
error is located. The advantage of this approach is that it not only points out where
errors likely are, but also allows the user to evaluate graphically where an error might
be, in case there is something MATLAB failed to catch.
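A sketch of that matching loop follows, with illustrative names: F is the binary fingerprint matrix (levels × samples), flaw is the smaller flaw matrix, and wOne and wZero are the user-chosen points for matches of ones and zeros.

    % Sketch of the match-value computation for flaw detection.
    function score = matchValues(F, flaw, wOne, wZero)
        [m, n] = size(flaw);
        nShifts = size(F, 2) - n + 1;
        score = zeros(1, nShifts);
        for s = 1:nShifts
            win = F(1:m, s:s+n-1);                         % window under the flaw
            score(s) = wOne  * sum(win(:) == 1 & flaw(:) == 1) ...
                     + wZero * sum(win(:) == 0 & flaw(:) == 0);
        end
        plot(score); xlabel('shift (samples)'); ylabel('match value');
    end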
In using the detection algorithm to analyze audio data, we found that it is often
necessary to increase the width of the ridges and decrease the number of ridges for
a reasonable evaluation. Multiple iterations of one error (a pop, for instance) often
manifest themselves very differently in the fingerprint. Thus, it is helpful to have a
more general picture of what is happening sonically from the fingerprint. The match
value technique in the detection algorithm gives us an unrepresentative evaluation
of similarity between the given flaw and the test fingerprint if we set the ridge detail
too fine.
A Fourier transform is a powerful tool in showing the frequency decomposition
of a given signal, but we would like to see frequency, intensity, and time all in one
representation. A spectrogram shows time and frequency on two axes and intensity
with color; we can see what frequencies are present, and how much, at a given time.
Programs exist which allow the user to modify the colors in the spectrogram to edit
the sound, but spectrograms are difficult to use for automation. The image is difficult
to mathematically associate with a particular type of flaw because the false-color
image has to be first segmented in order to define the boundaries of shapes.
A wavelet transform lends itself to our goals much better. Essentially, we calculate
the similarity between a wavelet function and a section of the input signal, and then
repeat the process over the entire signal. The wavelet function is then rescaled, and
the process is repeated. This leaves us with many coefficients for each different scale
of the wavelet, which we can then interpret as approximation or detail coefficients.
Scale is closely related to frequency, since a wavelet transform taken at a low scale
will correspond to a higher frequency component. This is easy to understand by
imagining how Fourier analysis works; by changing the scale of a sine wave, we are
essentially just changing its frequency. At this point, we now have a representation
of time, frequency, and intensity, all at once. We can plot the coefficients in terms
of time and scale simultaneously. The wavelet fingerprint tool we have implemented
takes these coefficients and creates a binary image of the signal, much like an actual fingerprint, which is easy for both humans to visually interpret and computers to mathematically analyze (Fig. 3.4).
Fig. 3.4 Wavelet fingerprints of an excerpt generated by the coiflet3 wavelet (top) and Haar wavelet (bottom)
The wavelet fingerprint tool currently runs from about 650 lines of MATLAB code,
which we modify as needed for practicality and functionality. The function is implemented in a GUI, so users can easily manipulate input parameters. By clicking on a
section of the filtered signal, the user can view the signal’s fingerprint manifestation,
and clicking on the fingerprint will also let the user view what is happening at that
point in the signal to cause that particular realization.
The mathematically interesting part of the code occurs almost exclusively in the last hundred lines, where the fingerprint is created from the filtered waveform. It obtains the coefficients of the continuous wavelet transform from MATLAB's built-in cwt function, taking as inputs the filtered wave between the left and right bounds selected by the user, the number of levels (which is related to scale) to use, and the wavelet to use for the analysis. This creates a two-dimensional matrix of coefficient values, which are then normalized to 1 by dividing each component of the matrix by the overall maximum value. Initially, the entire fingerprint matrix is set to the value zero, and then certain areas are set to one based on the values of the continuous wavelet transform coefficients and the user-selected parameters for how many of these “ridges” will appear and how thick they will be. Throughout the domain of the fingerprint, wherever a coefficient lies within half a ridge thickness of one of the ridge levels, we set the fingerprint value at that point to one. This outputs a “fingerprint” whose information is contained in a two-dimensional matrix. For each fingerprint, the two-dimensional matrix is displayed as a pseudocolor plot at the bottom of the interface.
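A condensed sketch of that final step is given below. It uses the older cwt(signal, scales, wavelet) signature the tool was built on (newer MATLAB releases replace it), and the variable names are ours:

    % Sketch of the fingerprint construction from the filtered waveform.
    function fp = makeFingerprint(x, nLevels, wname, nRidges, thick)
        C = cwt(x, 1:nLevels, wname);      % continuous wavelet coefficients
        C = abs(C) / max(abs(C(:)));       % normalize to [0, 1]
        fp = zeros(size(C));               % fingerprint starts as all zeros
        for r = 1:nRidges
            level = r / nRidges;           % contour level of this ridge
            fp(C > level - thick/2 & C <= level + thick/2) = 1;
        end
        pcolor(fp); shading flat;          % display as a pseudocolor plot
    end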
Several modifications have been made to the original version of this code. For
speed and ease-of-use, this version’s capability to recognize binary or ASCII input
data has been eliminated. We use only data imported into MATLAB's workspace,
so there is no risk of causing an error by accidentally clicking the wrong button
while the program runs. Several of the program's default parameters have also
been changed. Namely, the wavelet fingerprint no longer needs to discern between
“valleys” (negative intensities) or “peaks” (positive intensities), because it is only the
amplitude of the wave that contains relevant sonic information. We set a fuller
default of 75 “levels” for the wavelet fingerprint, rather than 50, so we can examine
all frequencies contained in the data more efficiently. The highest frequency we can
be concerned with is 22.05 kHz, as we cannot take any information from frequencies
higher than half the sampling rate (we always use the industry standard 44.1 kHz).
More practically, the user can now highlight any part of the fingerprint, input signal,
or filtered signal for analysis. The program still responds even if the user selects
left and right bounds that do not work; instead, it alters the bounds so that the
entire domain shifts over. Titles and axis labels on all plots of the waveform and the
fingerprints are now included as well.
3.3 Method and Analysis of a Non-localized Extra-Musical
Event (Coughing)
To analyze extra-musical events as errors in MATLAB, we generated them artificially using Pro Tools, and tested many different variations of them with the wavelet
fingerprint tool. We then exported both tracks together as a WAV file to be used in
MATLAB. We always specify WAV file because it is lossless audio and gives us the
best possible digital sound of our original recording. We never want to experience
audio losses due to compression and elimination of higher frequencies in formats like
mp3, despite the great reduction of file size. The file was then sent to the workspace
in MATLAB. By muting the cough, we created a control file without any extraneous
noise and sent that to the workspace in MATLAB as well. We always exported audio
files in mono rather than stereo, because currently the wavelet fingerprint tool can
only deal with a single vector of input values rather than two (one each for left and
right). For each file, we renamed the “data” portion of the file, containing the intensity
of the sound at each sample, and remove the “fs” portion, containing the sampling
rate (44.1 kHz). This created two files in the workspace: a vector of intensities for
the “uncorrupted” file and a vector for the file with coughing.
When we call the wavelet fingerprint tool in the command window, the loaded
waveform appears above its fingerprint (Fig. 3.5). In the top plot, we see the original
waveform. The second plot shows an optionally filtered (this one is still clean) version
of the selected part of the first plot. The third image shows the fingerprint, in this
case created by the coiflet3 wavelet. The black line on the fingerprint corresponds
to the red line on the second plot, so we can see how the fingerprint relates to
the waveform. In the third plot, the yellow areas represent ridges, or “ones” in the
modified wavelet transform coefficients matrix. We can also load the file with the
coughing and examine it in a similar fashion. As a convention, we always set the
ridge number to 15 and ridge thickness to 0.03 for unfiltered data.
The code allows us to take a snapshot of all the loaded fingerprints with the
“compare fingerprints” button. This lets us see each of the fingerprints within the most recent bounds of the middle plot (shown in Fig. 3.5).
Fig. 3.5 Analyzing the data with the wavelet fingerprint tool
Fig. 3.6 Comparing wavelet fingerprints using the wavelet fingerprint tool (no error present)
Purposely, we have selected
an area of the fingerprint that does not contain a cough (Fig. 3.6). There is no visually
obvious difference in the fingerprint—as we expect, if there is no difference in the
fingerprint, an error does not exist in this part of the sample. Shifting the region of
interest to a spot with the error (Fig. 3.7), we see in the second fingerprint that the
thumbprint looks very skewed at higher levels. The ridges seem to stretch significantly
further than they did at the lower levels, but their orientation and position are very
much unchanged.
As expected, introducing an error also introduces a difference in the waveform.
But now, we have to test against many other types of errors to determine what the
relationship is. Fortunately, Pro Tools makes it easy to keep track of many different
tracks and move sound clips to the appropriate location in time. We then compared
many different coughing samples in MATLAB (Fig. 3.8).
Fig. 3.7 Difference in fingerprint with (bottom) and without (top) coughing
Fig. 3.8 Comparing many different recorded coughs
While these coughs manifest themselves differently in the music, one similarity we do notice is an exaggeration
of some of the upper level characteristics in the clean sample (the top fingerprint).
For instance, in most variations, we notice a distortion in the “hook” that occurs
at 500 samples around level 50. While this observation is vague, it is still somewhat significant, since these samples are taken from different “types” of coughs
and from different people. Unfortunately, after testing the same coughing samples
against several different audio recordings, the fingerprints did not show sufficient
visual consistency. Although these results are inconclusive, we think that if we filter
the track in some ideal way in future studies, we should be able to identify where a
cough might likely occur. The rest of our analysis focuses on more time-localized
errors.
3.4 Errors Associated with Digital Audio Processing
In mixing music, if an engineer wants to put two sounds together in immediate
sequence, he lines them up next to each other in a DAW and cross-fades each portion
with the other over a short time span. One signal fades out while the other fades
in. When this is done incorrectly, it can leave an awkward jump in the sound that
sometimes manifests itself as an audible and very un-musical popping noise. For this
analysis, we worked with three different types of music one might have to splice in
the studio: tracks of acoustic guitar strumming, vocal takes spliced together, and the
best takes of recorded audio from a live performance spliced together, which is very
useful if there are multiple takes and the performers mess some of them up.
First, we examined chord strumming recorded by an acoustic guitar. One part of
a song was recorded, and then the second part was recorded separately, and the two
pieces were put together. The three plots in Fig. 3.9 show the track spliced together
correctly, at a musically acceptable moment with a brief cross-fade, followed by
two incorrect tracks, which are missing the cross-fade. The last track even has the
splice come too late, so there is a very short silence between the two excerpts.
Mathematically, the second track just contains a slight discontinuity at the point of
the splice, but the third track contains two discontinuities and a region of zero values.
The late splice leaves an obvious mark; at the lower levels, the ridges disappear
entirely. The flaw in the second graph is less obvious. At about 250 samples, the
gap between the two adjacent “thumbprints” is sharper and narrower in the second
plot. We then ran the same analysis with vocal data. We found some similarities, but
the plot of the fingerprint is so dependent on the overall volume that it is hard to
determine what is directly connected with the flaw and what isn’t (Fig. 3.10). The
triangular shape still appears prominently due to the zero values in the third plot, but
the second plot is a little harder to analyze now. Although there is no cross-fade, the
discontinuity between the first and second waveforms is smaller. However, at about
300 samples, the gap between the two thumbprints is once again complete, narrow,
and sharp, although a little more curved this time.
Fig. 3.9 Comparing acoustic guitar splices: a correct splice (top), a bad splice (middle), zero data
in front of splice (bottom)
Fig. 3.10 Comparing vocal splices: a correct splice (top), a bad splice (middle), zero data in front
of splice (bottom)
Finally, to work ensemble playing into our analysis, we chose to examine a performance of an original composition for piano, guitar, percussion, and tenor saxophone
(Fig. 3.11). This piece was performed at a reading session, so splicing together different parts of the piece from the best takes is what made the complete recording; a
full-length, un-spliced recording does not exist. Once again, in the third plot, we see
the characteristic triangle shape. In the upper plots, however, there is little difference,
but there is a slight sharpening of the gap right after 300 samples in the second plot.
Fig. 3.11 Comparing splices of a live recording: a correct splice (top), a bad splice (middle), zero
data in front of splice (bottom)
Fig. 3.12 Comparing wavelet fingerprints of the cylinder data (between 2,257,600 and 2,258,350
samples): edited data (top), unedited data containing pop (bottom)
Fig. 3.13 Instance of a click at 256,000 samples (between 300 and 400 samples in the fingerprint):
edited waveform fingerprint (top) and unedited waveform fingerprint (bottom)
While this result certainly is helpful, it is probably not codeable, since sharpness of
a gap is not an artifact unique to not cross-fading. However, the presence of similar
visual patterns confirmed that investigating splices further was necessary.
Fig. 3.14 Instance of a click at 103,300 samples (slightly less than 400 samples in the fingerprint):
edited waveform fingerprint (top), unedited waveform fingerprint (bottom)
3.5 Automatic Detection of Flaws in Cylinder Recordings
Using Wavelet Fingerprints
Using files from the University of California at Santa Barbara’s Cylinder Conservation Project [1], we compared the fingerprints of unedited and edited digitized
cylinders using the wavelet fingerprint tool. The files provided were in stereo, but in
actuality are the same in both the left and right channels, because stereo sound did
not exist in cylinder recording. So, we converted the files to mono by taking only the
left channel’s data. As before, the unedited file and edited file are each a vector the
length of the waveform in samples.
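In MATLAB the conversion is a one-liner (the file name below is a placeholder; older code would have used wavread instead of audioread):

    [x, fs] = audioread('cylinder_raw.wav');   % stereo file, fs = 44100
    x = x(:, 1);                               % keep only the left channel (mono)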
Because a complete song is millions of samples long, it is impractical for the
wavelet fingerprint program to process all the data at once. Thus, we examined the
complete audio file using our digital audio workstation implemented in MATLAB
to look for more specific spots we wanted to check for errors. CEDAR's de-clicking
program did a relatively good job of getting rid of many of the louder clicks, but they
still affect the sound in a less overbearing way. We compared several fingerprints of
clicks in the unedited waveform with the fingerprints of the edited waveform. We
used a relatively large ridge width of 0.1 and 20 ridges, so we could look for more
general trends in the fingerprint (Fig. 3.12). At about 2,258,000 samples, we found
a noticeable difference in the two images. The flower-like shape that occurs at about
400 samples is common in other instances of clicks as well. From about 256,000–
256,750 samples (Fig. 3.13) and about 103,300 samples (Fig. 3.14), we observed a
similar shape. We found that this particular manifestation was more visually apparent
when the musically important information is quieter than the noise.
We then chose a flaw to use as the control. We compared this against a larger
segment of the wavelet fingerprint, and using our automatic flaw detection algorithm
found the exact location of the error. Taking the first error to generate our generic pop fingerprint, we stored the area from 400 to 450 samples in a 75 × 50 matrix. This was shifted over the entire test fingerprint, counting only the values of the ridges that match, since, as we can see by looking at the fingerprint, the placement of the zero values does not seem to be consistent between shapes. Assigning one match point for each ones match and no points for each zeros match, the program
detected the flaw slightly early at 296 samples (Fig. 3.15). However, it is very visually apparent that the flaw most likely occurs in the neighborhood of 300 samples, so it did a reasonably good job of locating it.
Fig. 3.15 Using the error at 2,258,000 samples as our generic flaw, we automatically detected the flaw at 256,000 samples (beginning at 296 samples in the fingerprint) using our flaw detection program
We expanded on this method further by filtering the waveform before the fingerprint was created. Creating a temporary variable from the original waveform revealed
useful information about where important events are located in the waveform. Using
the coiflet3 wavelet as our de-noising wavelet, we removed the third through fifth
approximation coefficients as well as the second through fifth detail coefficients on
the above excerpts. We found that while much of the musical information is lost,
information about where the errors occurred was exaggerated. At 256,500 samples,
we observed a prominent fork shape in the unedited fingerprint (Fig. 3.16). In taking
this data, we normalized each fingerprint individually due to the large difference in
intensities between the filtered versions of the clean and unedited wave files. We
observed the same shape at 103,300 samples (Fig. 3.17). For all filtered data, we use
a ridge number of 20 and thickness of 0.05.
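A sketch of this de-noising step using a discrete 5-level decomposition is shown below; the chapter's tool works with reconstructed approximation and detail signals, but zeroing the detail coefficients with the Wavelet Toolbox's wthcoef captures the same idea:

    % Sketch of coiflet3 de-noising: kill detail coefficients at levels 2-5,
    % reconstruct, and normalize the excerpt individually.
    [c, l] = wavedec(x, 5, 'coif3');   % 5-level wavelet decomposition
    c = wthcoef('d', c, l, 2:5);       % zero detail coefficients at levels 2-5
    y = waverec(c, l, 'coif3');        % reconstruct the filtered waveform
    y = y / max(abs(y));               % normalize each excerpt individually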
Taking this feature as our representative pop sample, we counted the number of
zeros and ones in the first fingerprint that match up to see if we can automatically
locate it. Our program caught the flaw perfectly at about 350 samples (Fig. 3.18).
Fig. 3.16 Using the coiflet3 wavelet to de-noise the input data, we observed the flaw at 256,500 samples with the Haar wavelet (about 400 samples in the fingerprint): filtered edited data (top) and its associated fingerprint (second from top), filtered unedited data (second from bottom) and its associated fingerprint (bottom)
The
weakness of using this method is that it appears more likely, for instance, that there
is a flaw at zero samples than at 300 samples. For a more accurate search, we need to
develop a method that does not count the zero matches unless there is a significant
number of ones matches.
Interestingly, while it is difficult to locate the crackling noise made by cylinders just by looking at the waveform, we believe we can successfully identify many instances through the same filtering process. Since these are
so widespread throughout the unedited waveform, we simply selected a portion of
the file and examined the fingerprint for the above shape. For instance, we chose
the region at approximately 192,800–193,550 samples in the same file (Fig. 3.19)
and saw fork shapes appear frequently, though less defined in the edited waveform
than in the unedited waveform. At about 292,000 samples, we observed a similar pattern
(Fig. 3.20).
While both fingerprints contain the fork shape throughout, they are more defined
in the unedited waveform. Since the forks seem to correspond to intensity extremes
when they are well defined, we think that this means they relate to the degree of
presence of the crackling sounds. Unfortunately, our algorithm for locating these
forks is of little use in this situation. We need a program to count the number of
occurrences of forks, rather than individual points, per a certain number of samples.
Fig. 3.17 Using the coiflet3 wavelet to de-noise the input data, we observed the flaw at 103,300 samples with the Haar wavelet (about 400 samples in the fingerprint): filtered edited data (top) and its associated fingerprint (second from top), filtered unedited data (second from bottom) and its associated fingerprint (bottom)
When we run the fingerprint through the algorithm we already have, our results
are only reasonable. We assign match points for both ones and zeros so that we can
account for how defined the fork shapes are. Figure 3.21 shows the relative likelihood
of a flaw in the unedited waveform, and in the edited waveform. We see that for
the unedited waveform the peaks in the match values are higher than in the edited
waveform, suggesting that there is more crackling present in the unedited version.
3.6 Automatic Detection of Flaws in Digital Recordings
A common issue in editing is the latency of the cross-fade between splices. For
example, it is often necessary for an engineer to go through vocalists’ recordings
and make sure everything is in tune. This can be fixed in a DAW by selecting the
part of the file that is out of tune and simply raising or lowering its pitch. However,
an extraneous noise such as a click or pop may be created from a change in the
waveform. There might be a discontinuity, or the waveform may stay in the positive
or negative region for too long.
Fig. 3.18 We took the shape in Fig. 3.17 as our generic flaw and automatically detected the error in Fig. 3.16 using our flaw detection program
Oftentimes, an automatically generated cross-fade
will not solve the problem, so an engineer needs to go through a recording, track
by track, and listen for any bad splices. Moreover, after the problem is located, the
engineer must make a musical judgment on how to deal with it. The cross-fade
cannot be too long, or the separate regions will overlap, and filtering only a portion
of the data could make that sound too strange. However, using wavelet filtering and
fingerprinting comparison techniques, it is simple to automatically locate bad splices
that require additional attention.
In Pro Tools, simulating this is straightforward. We took the vocal track from a
recording session and fully examined it, finding and correcting all the pitch errors.
Then, we listened to the track for any sort of error that resulted from the corrections.
We duplicated the corrected track, and at each instance of an error, manually fixed
it on the control track. The track was exported from Pro Tools and examined using
the wavelet fingerprint tool (Fig. 3.22). Fortunately, since we were working in the
digital realm, we already knew exactly where the flaws were, and there was no need
to use our audio workstation to automatically sync the two tracks. At about 81,500
samples, we see a very distinct image corresponding to the flaw: a discontinuity
in the waveform that sounds like a small click. The unedited waveform, unlike the
control, has a triangular feature in the middle of the fingerprint.
Fig. 3.19 Using wavelet filtering and fingerprinting to find where clicks occur between 192,800 and 193,550 samples: filtered edited data (top) and its associated fingerprint (second from top), filtered unedited data (second from bottom) and its associated fingerprint (bottom)
Removing the fifth
approximation coefficient using the coiflet3 wavelet as well as the second through
fifth detail coefficients, however, reveals a very distinct fork shape (Fig. 3.23). We
already know from our analysis of cylinder flaws that it is possible to code for this
shape. In one way, this is a very good thing, because it means that this shape, when
it is well defined, is indicative of a flaw and a computer has a relatively easy time
finding it. However, it does not say much about which particular type of flaw it
is. For that information, we need to rely on the appearance of unfiltered or less
filtered data. At 117,300 samples, we observed another error from pitch correction,
this time a loud pop. The waveform stays in the positive region for too long as a
result of the splicing. By filtering out the second through fifth detail coefficients
using the coiflet3 wavelet, we noticed that while the error is visually identifiable in
the fingerprint, it is a different type of error than the previous one we examined.
Adding the fifth approximation coefficient to the filter, we see the characteristic fork
shape once again (Fig. 3.24). Once more, the heavy filtering with the coiflet3 wavelet
successfully marked an error, but we cannot say from the fingerprint exactly what
kind of error it is.
Fig. 3.20 Using wavelet filtering and fingerprinting to find where clicks occur between approximately 292,600 and 293,350 samples: filtered edited data (top) and its associated fingerprint (second
from top), filtered unedited data (second from bottom) and its associated fingerprint (bottom)
Fig. 3.21 Match values for the unedited (left) and edited (right) waveforms between 292,600 and
293,350 samples and the fork-shaped flaw
Fig. 3.22 Instance of a splice at 81,600 samples (about 400 samples in the fingerprint): edited
waveform (top) and associated fingerprint (second from top), unedited waveform (second from
bottom), and associated fingerprint (bottom). Fingerprint created using the Haar wavelet
Next, we analyzed the fingerprint of an electric guitar run through a time compression program. An engineer would need to change how long a note or group of
notes lasts if a musician was not playing in time, but as with pitch corrections,
this can result in flaws. We found that although the instrument is completely different, the error manifests itself in a similar way in the wavelet fingerprint. At about
567,130 samples, there is a discontinuity in the waveform resulting from the time
compression (to the left). We made a control track by cross-fading the contracted
and normal sections of the waveform and analyzed the differences in the wavelet
fingerprint in MATLAB. Using the Haar wavelet to generate the fingerprint without
any pre-filtering, we found that a triangular figure manifests itself in the middle of the
fingerprint where the splice occurs (Fig. 3.25). This “discontinuity” in the fingerprint
was not apparent in that of the control track.
Fig. 3.23 Using the coiflet3 wavelet to de-noise the input data, we observed the flaw at 81,600
samples with the Haar wavelet (about 400 samples in the fingerprint): filtered edited data (top) and
its associated fingerprint (second from top), filtered unedited data (second from bottom) and its
associated fingerprint (bottom)
Next, we used the coiflet3 wavelet to filter the waveform (Fig. 3.26). We killed the
fifth approximation coefficient as well as the second through fifth detail coefficients,
and we found the typical pronounced fork shape occurred at the discontinuity in
the waveform. In the fingerprint of the control waveform, we saw that there was
a lack of clarity between each fork, and the forks themselves were once again not
well defined. Although the cause of the discontinuity was different in this test, it
manifested itself in the fingerprint the same way. Thus, we were able to identify that
it is a discontinuity error as well as its location in the sample.
3.7 Discussion
Wavelet fingerprinting proved to be a useful technique in portraying and detecting
time-localized flaws. We were able to very accurately and effectively detect pops and
clicks in both digital recordings and digitized cylinders through a process of filtering
the input waveform and analyzing the fingerprint. Our algorithm precisely located at
what point on the wavelet fingerprint the flaw began and provided a useful graphical
display of the relative likelihood of an error. We found that the same algorithm could
be used to detect many manifestations of the same flaw.
Fig. 3.24 Instance of a splice at 117,300 samples (slightly before 400 samples in the fingerprint), with the second through fifth detail coefficients removed as well as the fifth approximation coefficient using the coiflet3 wavelet: edited waveform (top) and associated fingerprint (second from top), unedited waveform (second from bottom), and associated fingerprint (bottom). Fingerprint created using the Haar wavelet
Whether it was produced by
a bad splice from placement, tuning, or tempo adjustment, or induced by cylinder
degradation, we were able to find where the problem occurred. While our filtering
process and algorithm did not tell us exactly what type of error occurred or how
prominent it was, analysis of the unfiltered fingerprint did reveal a visual difference
between flaws.
Wavelet fingerprinting proved relatively ineffective at detecting flaws such as coughs.
We think that because events like this are more sonically diverse in nature and less
time-localized, it was not possible for us to create an automatic detection algorithm.
With a high level of filtering, it may be possible in future work to locate such errors.
We also hope in future studies to further develop a method of detecting less audible
flaws, such as the crackles we noticed in cylinder files, with a higher mathematical
level of precision and certainty. We look to extend our program’s reach to musical
flaws such as vocal clicks and the sound of a guitarist’s pick hitting a fingerboard.
Such a tool could be helpful for engineers in both audio restoration and digital
mastering.
Fig. 3.25 Using Haar wavelet to create the fingerprint without any pre-filtering, we observed the
flaw at about 567,130 samples (slightly before 400 samples in the fingerprint): filtered edited data
(top) and its associated fingerprint (second from top), filtered unedited data (second from bottom)
and its associated fingerprint (bottom)
There are several steps we must take to fully apply wavelet fingerprinting as
an error detection technique. First, we must modify our algorithm to search entire
tracks for errors and display every instance where one occurs, rather than just the
several hundred samples where it most likely occurs. We can do this essentially by
placing intensity marks (for instance, a color or a number) at every point along the
waveform indicating how likely it is that an error occurs there. When searching for
one given type of error, we suspect that these intensity marks will spike at the points
of occurrence. Furthermore, we hope to modify our algorithm to accurately detect
flaws based on the type of flaw it is. This involves inventing a more flexible and
empirical method of detecting a flaw; this could mean not necessarily looking at
all levels of the fingerprint, allowing uncertainty in how close matching points are,
contracting or expanding the flaw matrix, or assigning match points more liberally
or conservatively. It will be a challenge, but our goal is to make our algorithms more
adaptive as we continue to investigate more flaws and improve our current detection
methods.
Fig. 3.26 Using the coiflet3 wavelet to de-noise the input data, we observed the flaw at 81,600
samples with the Haar wavelet (between 300 and 400 samples in the fingerprint): filtered edited data
(top) and its associated fingerprint (second from top), filtered unedited data (second from bottom)
and its associated fingerprint (bottom)
Acknowledgements The authors would like to thank Jonathan Stevens for numerous helpful discussions.
References
1. UC Santa Barbara Library (n.d.) Cylinder recordings: a primer. Retrieved 2010, from Cylinder Preservation and Digitization Project: http://cylinders.library.ucsb.edu/
2. The National Jukebox project at the Library of Congress, http://www.loc.gov/jukebox/
3. Smithsonian's Time-Based Media and Digital Art Working Group, https://www.si.edu/tbma/about
4. National Digital Information Infrastructure and Preservation Program, http://www.digitalpreservation.gov/about/index.html and https://www.diglib.org/
5. New York Public Library Preservation Division, https://www.nypl.org/collections/nyplcollections/preservation-division
6. Stanford Media Preservation Lab, https://library.stanford.edu/research/digitization-services/labs/stanford-media-preservation-lab
7. Columbia Center for Oral History, https://library.columbia.edu/locations/ccoh.html
8. Media Digitization and Preservation Initiative at Indiana University, https://mdpi.iu.edu/index.php
9. Edison phonograph and wax cylinder, http://en.wikipedia.org/wiki/Phonograph and http://en.wikipedia.org/wiki/Phonograph_cylinders
10. Gunnar Johansen, musician and composer, https://www.nytimes.com/1991/05/28/obituaries/gunnar-johansen-85-a-pianist-composer-and-instructor-is-dead.html; https://www.discogs.com/artist/3119653-Gunnar-Johansen-2
11. Hou J, Hinders MK (2002) Dynamic wavelet fingerprint identification of ultrasound signals. Mater Eval 60(9):1089–1093
12. Graps A (n.d.) An introduction to wavelets. Retrieved 2011, from Institute of Electrical and Electronics Engineers, Inc: http://www.amara.com/IEEEwave/IEEEwavelet.html
13. The MathWorks, Inc. (n.d.) Continuous wavelet transform: a new tool for signal analysis. Retrieved 2010, from The MathWorks-MATLAB and Simulink for Technical Computing: http://www.mathworks.com/help/toolbox/wavelet/gs/f3-1000759.html
Chapter 4
Pocket Depth Determination with an
Ultrasonographic Periodontal Probe
Crystal B. Acosta and Mark K. Hinders
Abstract Periodontal disease, commonly known as gum disease, affects millions of
people. The current method of detecting periodontal pocket depth is painful, invasive,
and inaccurate. As an alternative to manual probing, the ultrasonographic periodontal
probe is being developed to use ultrasound echo waveforms to measure periodontal
pocket depth, which is the main measure of periodontal disease. Wavelet transforms
and pattern classification techniques are used in artificial intelligence routines that
can automatically detect pocket depth. The main pattern classification technique
used here, called a binary classification algorithm, compares test objects with only
two possible pocket depth measurements at a time and relies on dimensionality
reduction for the final determination. The method correctly identifies up to 90% of
the ultrasonographic probe measurements within the manual probe’s tolerance.
Keywords Probe · Ultrasonography · Binary classification · Wavelet fingerprint
4.1 Introduction
In the clinical practice of dentistry, radiography is routinely used to detect structural
defects such as cavities, but too much ionizing radiation is harmful to the patient.
Radiography can also only detect defects parallel to the projection path, so cracks are
difficult to detect, and it is useless for identifying conditions such as early stage gum
disease because soft tissues are transparent to X-rays. Medical ultrasound, however,
is safe to use as often as indicated, and computer interpretation software can make
disease detection automatic. The structure of soft tissues can be analyzed effectively
with ultrasound, and even symptoms such as inflammation can be registered with this
technology [1–6].
Fig. 4.1 The figure compares a manual periodontal probing, in which a thin metal probe is inserted in between the tooth and gums, versus b the ultrasonographic periodontal probe, in which ultrasound energy propagates through water non-invasively
An ultrasonographic periodontal probe was developed as a spin-off of NASA technology [7–15]. Periodontal disease is caused by bacterial infections
in plaque, and the advanced stages can cause tooth loss when the periodontal ligament
holding the tooth in place erodes [16]. Periodontal disease is so widespread worldwide that 10–15% of adults have advanced stages of the disease with deep periodontal
pockets that put them at risk of tooth loss [17]. The usual method of detection is with
a thin metal probe scribed with gradations marking depth in millimeters (Fig. 4.1a).
The dental hygienist inserts the probe down into the area between the tooth and gum
to estimate the depth to the periodontal ligament. At best, this method is accurate
to ± 1 mm and depends on the force the hygienist uses to push the probe into the
periodontal pocket. Furthermore, this method is somewhat painful and often causes
bleeding [18]. The ultrasonographic periodontal probe uses high-frequency ultrasound to find the depth of the periodontal ligament non-invasively. An ultrasonic
transducer projects high-frequency (10–15 MHz) ultrasonic energy in between the
tooth and the gum and detects echoes of the returning wave (Fig. 4.1b). In the usual
practice of ultrasonography,¹ the time delay of the reflection is converted to a distance measurement by using the speed of sound in water (1482 m/s). However, both experimental and simulation waveforms show that the echoes from the anatomy of interest are smaller than the surrounding reflections and noise. Further mathematical techniques, including a wavelet transform and pattern classification techniques, are required in order to identify pocket depth from these ultrasound waveforms.
¹ In the medical field, ultrasound (ultrasonic) and ultrasonography (ultrasonographic) are used interchangeably to describe MHz-frequency acoustic waves being used for diagnostic applications. However, in dentistry, ultrasound or ultrasonic refers to kHz-frequency vibrating tools (scalers) used for cleaning, which is a different application of the same technology. Here, we prefer to use “sonographic” or “ultrasonographic” to imply the diagnostic application of ultrasound energy.
The following sections describe the development of machine learning algorithms
for the ultrasonographic periodontal probe. Section 4.2 describes related work on applying wavelet transforms for feature extraction and pattern classification to ultrasound waveforms, as well as other applications of pattern classification
in the medical field. Section 4.3 describes the materials and methods for gathering the ultrasound data. Sections 4.4 and 4.5 describe the feature extraction and
selection procedures in preparation for a binary classification algorithm described in
Sect. 4.6. Section 4.6.2 describes dimensionality reduction, and Sect. 4.6.3 explores
classifier combination. The results are presented in Sect. 4.7 and statistically analyzed
in Sect. 4.8, with conclusions presented in Sect. 4.9.
4.2 Related Work
Previous researchers have made use of the wavelet transform for pattern classification applications [19, 20]. One option is to integrate wavelets directly by capitalizing
on the orthogonal property of wavelets to estimate the class density functions [21].
However, most applications of wavelets to pattern recognition focus on feature extraction techniques [22]. One common method involves finding the wavelet transform
of a continuous variable (sometimes a signal) and computing the spectral density, or
energy, which is the square of the coefficients [23, 24]. Peaks of the spectral density
or the sum of the density can be used as features and have been applied to flank wear
estimation in turning processes and classifying diseased lung sounds [23] as well as
to evaluating astronomical chirps and the equine gait [24]. This technique is similar
to finding the cross-correlation but is only one application of wavelets to signal analysis. One alternate method is to deconstruct the signal into an orthogonal basis, such
as Laguerre polynomials [25]. Another technique is the adaptive wavelet method
[26–29] which stems from multiresolution analysis [30]. Multiresolution analysis
applies a wavelet transform using an orthogonal basis resulting in filter coefficients
in a pyramidal computation scheme, while adaptive wavelet analysis uses a generalized M-band wavelet transform to similarly achieve decomposition into coefficients
and inserts those coefficients into matrix form for optimization. Adaptive wavelets
result in efficient compression and have the advantage of being widely applicable.
Wavelet methods are also often applied for feature extraction in images, such as
for shape characterization and to find boundaries [31, 32]. Wavelets are particularly
useful for detecting singularities, and in 2D data spaces this results in an ability
to identify corners and boundaries. Shape characterization itself is a precursor to
template matching in pattern classification, in which outlines of objects are extracted
from an image and matched to known shapes from a library. Other techniques are
similar to those described above, including multiresolution analysis, which is also
similar to the image processing technique of matched filters [33–37]. Either libraries
of known wavelets or wavelets constructed from the original signal are used to
match the signal of interest [33]. Pattern recognition then proceeds in a variety of
ways from the deconstructed wavelet coefficients. The coefficients, with minimal
cost, can be used as features [34] or the results from each sequential step can be
correlated individually [35].
To reduce dimensionality, sometimes projection transforms are precursors to
decomposition [36, 37]. Some authors have constructed a rotationally invariant projection that deconstructs an image into sub-images and transforms the mathematical
space from 2D to 1D [21, 38, 39]. Also, constructing a set of orthonormal bases, just
as in the case of adaptive wavelets above, remains useful for images [40]. Because
of the dimensionality, computing the square of the energy is cumbersome and so
a library of codebook vectors is necessary for classification [41]. There have been
many successful applications of pattern recognition in ultrasound, some of which
include wavelet feature extraction methods. In an industrial field, Tansel et al. [42,
43] selected coefficients from wavelet decomposition for feature vectors and were
able to use pattern recognition techniques to detect tool failure in drills. Learned
and Wilsky [44] constructed a wavelet packet approach, in which energy values are
calculated from a full wavelet decomposition of a signal, to detect sonar echoes for
submarines. Wu and Du [45] also used a wavelet packet description but found that
feature selection required knowledge of the physical space, in this case, drill failure. Case and Waag [46] used Fourier coefficients instead of wavelet coefficients for
features that successfully identified flaws in pipes. Comparing several techniques,
including features selected from wavelet, time, and spectral domains, Drai et al. [47]
identified welding defects using a neural network classifier. Buonsanti et al. [48]
compared ultrasonic pulse-echo and eddy techniques to detect flaws in plates using
a fuzzy logic classifier and wavelet features relevant to the physical domain.
In the medical field, an early example of applying classification techniques is
evidenced in the work of Momenan et al. [49] in which features selected offline are
used to identify changes in tissues as well as clustering in medical ultrasound images.
Bankman et al. [50] classified neural waveforms successfully with the careful application of preprocessing techniques such as whitening via autocorrelation. Meanwhile,
Kalayci et al. [51] detected EEG spikes by selecting eight wavelet coefficients from
two different Daubechies wavelet transforms for application in a neural network.
Tate et al. [52] similarly extracted wavelet coefficients as well as other prior information to attempt to identify vegans, vegetarians, and meat eaters by their magnetic
resonance spectra. Mojsilovic et al. [53] applied wavelet multiresolution analysis to
identify infarcted myocardial tissue from medical ultrasound images. Georgiou et al.
[54, 55] used wavelet decomposition to calculate scale-averaged wavelet power up
to a threshold and detected the presence of breast cancer in ultrasound waveforms by
means of hypothesis testing. Also using multiresolution techniques, Lee et al. [56]
further selected fractal features to detect liver disease. Alacam et al. [57] improved
existing breast cancer characterization of ultrasonic B-mode images by adding fractional differencing and moving average polynomial coefficients as features.
Hou [15, 58] developed the dynamic wavelet fingerprint (DWFP) method to
transform wavelet coefficients to 2D binary images. The method is general, e.g.,
[59–65], but when directly applied to data obtained from a fourth-generation ultrasonographic periodontal probe tested on 14 patients [66, 67], the authors were able
to resolve at best around 75% of the pocket depths accurately within a tolerance of
1.5 mm. When the tolerance is adjusted to the 1 mm precision of the manual probe,
the highest success rate per patient drops to about 60%. The authors did not use
pattern classification but instead relied on image recognition techniques to detect
pocket depth.
4.3 Data Collection
Previous publications describe the fabrication of several generations of prototype
ultrasonographic probes [7–15, 68–70]. The body of the fifth-generation probe is
manufactured similar to other dental hand pieces, with the 10 MHz piezoelectric
transducer located in the head of the probe. Water is the ultrasonic coupling agent, so
the fabrication of the probe allows water to be funneled through the custom-shaped
tip. The rest of the components used to control the probe, including the general-purpose pulser-receiver and the custom-built water flow interface device, are shown
in Fig. 4.2.
Fig. 4.2 All devices necessary to conduct experiments using the ultrasonographic periodontal probe
are shown, including a laptop computer, b interface device, c pulser-receiver, d ultrasonographic
probe, e manual probe, and f foot pedal to control water flow
Fig. 4.3 Pictured is the apparatus shown in Fig. 4.2 being operated by Prof. Gayle McCombs, with
Jonathan Stevens monitoring the computer
Clinical tests were performed at Old Dominion University’s (ODU) Dental
Hygiene Research Center on 12 patients using both the ultrasonographic periodontal
probe and the traditional manual periodontal probe (Fig. 4.3). The human subjects
protocol was approved by IRBs at both William and Mary and ODU. Most of the
measurements were for healthy subjects, with 76% of the data measuring 3 mm or
less. Figure 4.4 shows a distribution of the data versus manual pocket depth. The
ultrasonographic probe measurement was always performed first, and 30 repeated
ultrasonic waveforms were recorded at the standard 6 tooth sites in at most 28 teeth
per patient. Simulations using a finite integration technique were also performed by
our group prior to the clinical tests [68]. The simulations animate the generation of
ultrasonic energy in the transducer and its propagation through the tissues. The output of the simulations includes a time-domain waveform recorded by the transducer.
Figure 4.5 compares the simulated waveform with a sample filtered experimental
waveform. Note that the region of interest in between the first and second reflections
from the tip does not register any obvious reflections from the periodontal pocket.
Fig. 4.4 The histogram shows the distribution of manual periodontal pocket depth measurements for the ODU clinical tests. Note that the majority of the population has healthy (≤3 mm) pocket depths
Fig. 4.5 Sample waveforms from a simulation and b experiment are compared here. The large reflections indicated in the boxes are artifacts from the tip geometry. The echo from the bottom of the pocket, which would occur in between the rectangles, is not apparent
To detect pocket depth, further mathematical abstractions are required. The basic steps of the artificial intelligence are as follows:
1. Feature Extraction: Get wavelet fingerprints with DWFP and find fingerprint properties using image recognition software.
2. Feature Selection: Find μ and σ (Eqs. 4.3 and 4.4) of each property for all waveforms and collect key values where the average property varies per pocket depth.
3. Binary Classification: Compare the selected features in a leave-one-out routine
using well-known pattern classification schemes and only two possible pocket
depths at a time.
4. Dimensionality Reduction: Evaluate binary labels using four different methods
to collapse each binary choice to one label.
5. Classifier Combination: Combine the predicted labels from the most precise tests
to improve accuracy and/or spread of labels.
Each step is explained in further detail in the sections that follow. It is also important
to note that because of the computation time of this task, the computer algorithms
were adapted to run on William and Mary’s Scientific Computing Cluster (http://www.compsci.wm.edu/SciClone/).
4.4 Feature Extraction
Since reflections from the periodontal ligament are not apparent in the ultrasound
waveforms, advanced mathematics are needed to identify pocket depth. The clinical
trials yielded ultrasonic waveforms ws,k (t), where there are k = 1, . . . , 30 waveforms
recorded for each tooth site s = 1, . . . , 1470. The continuous wavelet transform can
be written as
C(a, b) = \int_{-\infty}^{+\infty} w(t)\, \psi_{a,b}(t)\, dt. \qquad (4.1)
Here, w(t) represents a square-integrable 1D function and ψ(t) represents the mother
wavelet. The mother wavelet is transformed in time (t) and scaled in frequency ( f )
using a, b ∈ R, respectively, where a ∝ t and b ∝ f , in order to form the ψa,b (t) in
Eq. 4.1.
To extract features from ws,k (t) for classification, we use the dynamic wavelet
fingerprinting technique (DWFP) (Fig. 4.6), which creates binary contour plots of
the wavelet transform coefficients C(a, b). The DWFP, along with a mother wavelet
ψ(t) and some scaling and translation parameters a, b, applied to the waveforms
ws,k (t) yields an image, I (Fig. 4.7):
w_{s,k}(t) \;\xrightarrow{\ \mathrm{DWFP}(\psi_{a,b})\ }\; I_{s,k}(a, b). \qquad (4.2)
Preliminary tests indicated that the mother wavelets Daubechies 3 (db3) and Symlet
5 (sym5) showed promise for this application, and so both were applied in this
technique. The resulting image I contains fingerprint-like binary contours of the
initial waveform ws,k (t) at tooth site s.
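As a rough sketch of how the transform-and-binarize step of Eq. 4.2 can be realized in MATLAB, the following function computes wavelet coefficients and slices them into binary contours. The scale range, slice count, and contour band width are illustrative placeholders rather than the settings used in this study, and the classic cwt(signal, scales, wavelet) interface of the Wavelet Toolbox is assumed.

    % Sketch of the DWFP step in Eq. 4.2 (illustrative parameters only).
    function I = dwfp_sketch(w, scales, motherWavelet, nSlices)
    % Continuous wavelet transform; rows are scales, columns are time.
    % Assumes the classic cwt(signal, scales, wavelet) syntax.
    C = cwt(w, scales, motherWavelet);     % e.g. motherWavelet = 'db3' or 'sym5'
    Cn = abs(C) / max(abs(C(:)));          % normalize coefficient magnitudes to [0, 1]
    I = false(size(Cn));                   % binary fingerprint image
    band = 1 / (4 * nSlices);              % half-width of each contour band (placeholder)
    for level = 1/nSlices : 1/nSlices : 1  % slice the coefficient surface at equal heights
        % Thin bands around each slice level form the ridges that project
        % onto the time-scale plane as fingerprint-like contours
        I = I | (abs(Cn - level) < band);
    end
    end

A call such as I = dwfp_sketch(w, 1:50, 'db3', 6) would then yield a binary image I_{s,k}(a, b) in the sense of Eq. 4.2.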
Fig. 4.6 The DWFP technique [58] begins with a the ultrasonic signal, from which it generates b wavelet coefficients indexed by time and scale. Then c the coefficients are sliced and projected onto the time-scale plane (d)
The next step is to perform image processing with MATLAB's Image Processing Toolbox in order to gather properties of the fingerprints in the waveform w_{s,k}. First, the binary
image I is labeled with the 8-connected objects (Fig. 4.7d), allowing each individual
fingerprint in I to be recognized as a separate object using the procedure in Haralick
and Shapiro [71]. Next, properties are measured from each fingerprint. Some of these
properties include counting the on- and off-pixels in the region, but many involve
finding an ellipse matching the second moments of the fingerprint and measuring
properties of that ellipse, such as eccentricity. In addition to the orientation measure
provided by the ellipse, another measurement of inclination relative to the horizontal
axis was determined by Horn’s method for a continuous 2D object [72]. Lastly,
further properties were measured by determining the boundary of the fingerprint and
fitting second- or fourth-order polynomials. Table 4.11 in the appendix summarizes
the features as well as the selected indices from Sect. 4.5.
The image processing routines result in fingerprint properties ps,k,n [t], where n
represents the image processing-extracted fingerprint properties. These properties
are discrete in time because the values of the properties are matched to the time
value of the fingerprint's center of mass. Linear interpolation yields a smoothed array of property values, p_{s,k,n}(t). Figure 4.8a shows the sparse values of the DWFP orientation for one tooth site, while Fig. 4.8b shows the smoothed values.

Fig. 4.7 In a–c, the DWFP is shown applied to a filtered window of the original data. Image recognition software through MATLAB is used to recognize each individual fingerprint (d) and measure their image properties

Fig. 4.8 The image created by the actual wavelet fingerprint properties a has been smoothed using a linear approximation for the intervening points (b). The smoothing shown here is indexed by one value of s, ψ, n but all values of k
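As a concrete illustration of the labeling, measurement, and smoothing steps just described (a sketch only: the property list is abbreviated and the variable names are illustrative), the MATLAB calls might look like the following.

    % Sketch: label fingerprints, measure properties, and smooth them onto
    % the waveform time base. I is the binary DWFP image and t is the time
    % vector of the windowed waveform (one entry per image column).
    L = bwlabel(I, 8);                          % label 8-connected objects [71]
    stats = regionprops(L, 'Area', 'Centroid', 'Eccentricity', ...
                        'Orientation', 'EulerNumber');
    tc = zeros(1, numel(stats));                % time of each fingerprint's center of mass
    orient = zeros(1, numel(stats));
    for j = 1:numel(stats)
        col = round(stats(j).Centroid(1));      % centroid column maps to a time index
        tc(j) = t(col);
        orient(j) = stats(j).Orientation;       % one property; the others are handled alike
    end
    [tcU, iu] = unique(tc);                     % interp1 needs unique sample points
    p = interp1(tcU, orient(iu), t, 'linear');  % smoothed property array, as in Fig. 4.8b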
4.5 Feature Selection
It is now possible to begin reducing the dimensionality of the features extracted from
the DWFP technique. The reasoning is three-fold. First, most of the measurements likely do not correspond to pocket depth, since the periodontal pocket is a discrete event
in the ultrasonic waveform while the features extracted from the DWFP technique are
a 1D array in the time domain. Another reason to cull information from the wavelet
fingerprint property dataset is that it is too large to manipulate even on the available
computing cluster. Lastly, the set of extracted features, ps,k,n (t), is too large to use
directly in pattern classification routines.
One dimension can be eliminated immediately by averaging over the repeated waveforms k, so that p = p_{s,n}(t). The remaining dimensionality reduction will be performed by selecting values of p at times t_i for particular fingerprint properties n_i, so the feature vector at tooth site s, f_s, will be formed of f_s(i) = p_{s,n_i}(t_i). Then the classification matrix will have rows corresponding to s and columns corresponding to f_s(i).
To select features of interest, we look for values of the fingerprint properties that
are different, on average, for different measured values of pocket depth. Therefore,
for each property n, we find the mean (4.3) and standard deviation (4.4) over that
property for all tooth sites:
    m_n(t) = \frac{1}{N} \sum_{s=1}^{N} p_{s,n}(t),    (4.3)

    \sigma_n(t) = \sqrt{\frac{1}{N} \sum_{s=1}^{N} \bigl(p_{s,n}(t) - m_n(t)\bigr)^2}.    (4.4)
The selected features correspond to the times t_i at which m_n(t_i) varies greatly for different pocket depths while σ_n(t_i) remains small. For a particular set of properties n_i and their corresponding times t_i, the feature vector for tooth site s would become

    f_s = \{ p_{s,n_i}(t_i) \}.

Fig. 4.9 The feature vector created by averaging the properties in Fig. 4.8 over the repeated waveforms is plotted here (panels: "db3 Orientation: mean feature vector for all pocket depths" and "db3 Orientation: STD of feature vector for all pocket depths"). The first is the average over k for each pocket depth (Eq. 4.3), and the second is the standard deviation (Eq. 4.4). The red boxes indicate time values of the property selected for classification (Table 4.11), when the mean differs while the standard deviation is low
Figure 4.9 shows an example, with the regions of interest marked out by the red
boxes. This feature selection process was performed interactively, not automatically,
though it would be possible to apply this technique with thresholds on the mean and
standard deviation. Also, it is important to note that the black vertical lines mark
the boundaries of the window regions used in the original wavelet fingerprinting.
The wavelet fingerprints distort along the edges of the window, so the often extreme
behavior of the feature vectors near these points is disregarded. In the end, 54 different
features were selected from the DWFP properties from two different mother wavelets
(see Table 4.11 in the appendix).
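A sketch of this screening step is shown below. Here P is assumed to be an S-by-T array of smoothed values of one property (tooth sites by time samples), d is the vector of manual pocket depths, and the thresholds are illustrative placeholders for what was actually chosen interactively.

    % Sketch of the feature screening for one property n: compute the
    % per-pocket-depth mean and standard deviation curves (Eqs. 4.3 and
    % 4.4 applied within each depth class) and flag candidate times where
    % the means spread apart while the deviations stay small.
    depths = 1:7;
    m  = zeros(numel(depths), size(P, 2));
    sd = zeros(numel(depths), size(P, 2));
    for j = 1:numel(depths)
        rows = (d == depths(j));                % tooth sites with this manual depth
        m(j, :)  = mean(P(rows, :), 1);         % Eq. 4.3 within one depth class
        sd(j, :) = std(P(rows, :), 0, 1);       % Eq. 4.4 within one depth class
    end
    spread = max(m, [], 1) - min(m, [], 1);     % separation of the class means
    noise  = max(sd, [], 1);                    % worst-case within-class deviation
    candidates = find(spread > 5 & noise < 10); % illustrative thresholds only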
Two other methods of reducing the dimensionality of the extracted feature space
are possible, however. The method described above first averaged over the k repeated
waveforms, but it is also possible to avoid reducing dimensionality until after classification. One option is to build feature vectors for each of the k repeated waveforms
and classify them separately. This yields feature vectors of the form
f s,k = { ps,k,ni (ti )},
which are used for training and testing of the classifier related to each of the k =
1 . . . 30 waveforms. The extra k th dimension is collapsed later. Another way is to use
the f s for training and the f s,k for testing the classifier, again reserving dimensionality
reduction until after classification. All three of these basic methods were performed:
1. Train and test the classifier with features f s ,
2. Train and test the classifier with features f_{s,k} for each k = 1, . . . , 30, and
3. Train the classifier with f s and test it with f s,k .
4.6 Classification: A Binary Classification Algorithm
Once the features have been generated for the wavelet fingerprint properties, we then
apply various standard pattern classification techniques from the PRTools catalog of
MATLAB functions [73]. Many of these are Bayesian techniques (Table 4.1) except
in this case we have set the prior probabilities as equal for all classes. Unfortunately,
when the classification matrix is used with these classification maps in a leave-one-out technique, no less than 60% error is observed. In all of these tests, the map tends
to classify most of the waveforms in those pocket depths that have the largest number
of objects, namely, 2 and 3 mm, because they are the most populated pocket depths
in the clinical dataset (Fig. 4.4).
Table 4.1 A list of the standard classifiers from the PRTools catalog of classification functions for MATLAB [73]. These maps were also later used in the binary classification algorithm

Classifier   Description
LDC          Linear classifier using normal densities
QDC          Quadratic classifier using normal densities
KLLDC        Linear classifier using Karhunen–Loève expansion of the covariance matrix
PCLDC        Linear classifier using principal component analysis
LOGLC        Linear logistic classifier
FISHERC      Linear minimum least square classifier
NMC          Nearest mean classifier
NMSC         Scaled NMC
POLYC        Untrained classification with additional polynomial features
SUBSC        Subspace classifier
KNNC         k-nearest neighbor classifier
Table 4.2 A sketch of the steps in the binary classification algorithm is shown below. The classification step was performed with [73]

INPUT:
  C            Classifier map (such as KNN)
  f_{s,k}(i)   Array of features
  d_{s,k}      Array of manually measured pocket depth values

INDICES:
  s   Tooth site index
  k   Index of repeated waveforms
  i   Feature vector index
  r   Repetition index

FOR LOOPS:
  Pick k ∈ 1, . . . , 30
  Pick s ∈ 1, . . . , 1470
  Pick binary pocket depth pairs pd1, pd2 ∈ 1, . . . , 7, pd1 ≠ pd2
  Pick {S1 | (s ∉ S1) & (d_{S1,k} = pd1)} and {S2 | (s ∉ S2) & (d_{S2,k} = pd2)} with |S1| = |S2|
  REPEAT over r until 90% of the data are sampled

Under these conditions:
  CLASSIFY   T = C(f_{S1,2,k}, d_{S1,2,k})
  TEST       L = C · f_{s,k}(i)
  SAVE       labels L(k, s, pd1, pd2, r)
In order to counter this predisposition of the standard classification schemes to
classify all the objects into the highest volume classes, a binary classification scheme
was developed. The procedure is similar to the one-versus-one technique of support
vector machines [74]. The basic idea is to classify the waveform associated with
any tooth site against only two possible classes at a time using standard classifiers
from Table 4.1. The training and test sets are divided from the data using a leave-one-out technique. If the number of objects in each class differs, a random sample
from the larger class of equal size to the smaller class is chosen for training. This
process is repeated until at least 90% of the waveforms from the larger class size have
been sampled. With each repetition, the predicted labels are stored. The procedure
is labeled a binary algorithm because each classification is restricted to only two
possible pocket depth values. Table 4.2 shows a flowchart of our binary classification
algorithm.
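One inner step of the algorithm might be sketched as follows, assuming PRTools' dataset/ldc/labeld interface [73] (prdataset in later releases); the variable names and the leave-one-out bookkeeping around this fragment are illustrative.

    % Sketch of one binary comparison: balance the two classes, train a
    % map from Table 4.1 on the balanced sample, and label the held-out
    % site. F is the feature matrix (tooth sites by features), d the
    % column vector of manual depths, sTest the held-out tooth site index.
    idx1 = find(d == pd1 & (1:numel(d))' ~= sTest);   % candidate sites with depth pd1
    idx2 = find(d == pd2 & (1:numel(d))' ~= sTest);   % candidate sites with depth pd2
    nMin = min(numel(idx1), numel(idx2));
    idx1 = idx1(randperm(numel(idx1), nMin));         % random subsample of the
    idx2 = idx2(randperm(numel(idx2), nMin));         % larger class for balance
    trainSet = dataset(F([idx1; idx2], :), ...
                       [repmat(pd1, nMin, 1); repmat(pd2, nMin, 1)]);
    W = ldc(trainSet);                      % train the binary map (here LDC)
    lab = labeld(dataset(F(sTest, :)) * W); % predicted label for the held-out site

Repeating this over k, s, all pocket depth pairs, and the resampling index r fills the label array L(k, s, pd1, pd2, r) of Table 4.2.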
In general, the predicted pocket depth is calculated as the most frequently labeled
value, but before discussing the dimensionality reduction of the matrix of labels, we
first examine the viability of the binary classification algorithm.
Table 4.3 For any binary pocket depth pair (pd1, pd2), the table shows the percent of pocket depth measurements accurately predicted whenever the test object belongs to the pair. The rows correspond to pd1 and the columns to pd2. Note that the larger accuracies occur for widely different pocket depth pairs, which means the classifier tends to classify the tooth site as closer to its manually measured pocket depth value

pd1 \ pd2   1      2      3      4      5      6      7
1           0      55.4   70.1   75.7   76.3   79.1   74.6
2           57.2   0      62.5   66.2   70.6   73.7   71.5
3           64.3   58.9   0      56.3   61.8   61.6   69
4           70.1   64.7   54.9   0      60.3   66.2   61.8
5           74.5   73.5   59.8   45.1   0      54.9   71.6
6           81.6   68.4   63.2   68.4   44.7   0      65.8
7           66.7   44.4   44.4   44.4   55.6   77.8   0
4.6.1 Binary Classification Algorithm Examples
In our binary classification algorithm, the waveform from one tooth site is tested
against binary pairs of all possible pocket depths using the classification matrices,
even if the manually measured pocket depth of that tooth site is not one of the classes
in the binary choice of pocket depths. To examine the accuracy of this technique, we
first test the binary classification algorithm on a set of waveforms when the manually measured pocket depth value is one of the binary choices. Table 4.3 shows the
percentage of manually measured pocket depths correctly identified by the binary
classification algorithm using a linear Bayes normal classifier (LDC). The grid is
indexed by the binary pocket depth pairs, so that since no pocket depth class is tested
against itself, the diagonal is set to zero. Note that the numbers near the diagonal are
smaller, which implies that the scheme finds it difficult to resolve adjacent pocket
depth measurements. The fact that adjacent pocket depths are poorly resolved is actually an encouraging feature of the classifier, since it implies that adjacent depths share similar characteristics, which we would expect from pocket depths only 1 mm apart. Also, the
precision of the manual probe is itself only within 1 mm at best, because the markings
on the probe are spaced at 1 mm intervals. So it is possible that a reported manual
pocket depth measurement could actually differ by 1 mm, due only to the imprecision
of that probe.
It is much more difficult to quantify how well the classification scheme works
when the manually measured pocket depth is not one of the choices. Consider three
examples from the same classifier as above, LDC, using the same classification
matrix (Table 4.4). Three different binary pairs are shown. The percent of tooth sites
classified in each of the binary pairs is displayed in the table. Note that waveforms
will be labeled with the member of the binary pair closest to its manual pocket depth
measurement. The accuracy increases the more the binary pocket depth pairs differ.
The results imply that the binary classification scheme not only accurately classifies
the ultrasonic waveforms when their manual measurement is one of the binary choices (Table 4.3), but also applies a label closer to the actual value when the manual measurement is not one of the choices (Table 4.4).

Table 4.4 The percent of tooth sites classified in each of the predicted binary pairs is shown for three different binary pair choices

Binary pair   Label   Actual pocket depth (mm)
                      1      2      3      4      5      6      7
(1, 2)        1       55.4   42.8   31.2   26.5   21.6   28.9   44.4
              2       44.6   57.2   68.8   73.5   78.4   71.1   55.6
(1, 7)        1       74.6   74.6   63.2   57.8   51.0   47.4   33.3
              7       25.4   25.4   36.8   42.2   49.0   52.6   66.7
(2, 5)        2       83.1   70.6   50.3   38.2   26.5   18.4   22.2
              5       16.9   29.4   49.7   61.8   73.5   81.6   77.8
4.6.2 Dimensionality Reduction
The predicted class labels L returned by the binary classification algorithm are high dimensional, even if the index k is averaged over before classification, as discussed in Sect. 4.5. Four different methods of dimensionality reduction were performed to yield only one label L_p(s) per tooth site s:

1. Majority Rule: The most frequently labeled pocket depth is declared the predicted pocket depth (see the sketch after this list):

       L_p(s) = mode(L(k, s, pd1, pd2, r)).

2. Weighted Probability 1: The first method unfairly weights the labels from the repetition index r. This method first finds the most frequently labeled pocket depth over the repetition index and then calculates the most frequently labeled pocket depth over the remaining indices:

       L'(k, s, pd1, pd2) = mode_r(L(k, s, pd1, pd2, r)),   L_p(s) = mode(L').

3. Weighted Probability 2: This method creates a normalized vector of weights from the binary pocket depth choices to find the most probable pocket depth:

       w(k, s, pd1, pd2) = (1/7) L'(k, s, pd1, pd2),   L_p = max(w · L').

4. Process of Elimination: Statistics from the three methods above are combined in a process of elimination to predict the final pocket depth.
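A minimal sketch of the majority rule (method 1), with an assumed dimension ordering for the label array, is:

    % Collapse the label array L(k, s, pd1, pd2, r) to one predicted
    % pocket depth per tooth site by majority vote over all other indices.
    Lp = zeros(nSites, 1);
    for s = 1:nSites
        labels = L(:, s, :, :, :);   % every binary label assigned to this site
        Lp(s) = mode(labels(:));     % most frequently assigned pocket depth
    end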
4.6.3 Classifier Combination
The binary classification algorithm as discussed above can be configured in three
different ways with respect to the waveform index k, and the dimensionality can
be reduced in four different ways as discussed in Sect. 4.6.2. All of these methods
were applied using 11 different maps from Table 4.1. To combine the results of these
classifiers, the individual classifiers were sorted by average accuracy within 1 mm, and the top fraction, where the fraction ranged from 10 to 95%, was combined using the same dimensionality reduction methods described above. The mean or mode of the labels can be calculated, or the most probable label can be chosen. Of these, the mode and highest-probability methods are majority voting methods, while the mean method is a democratic voting method. Combining classifiers can often reduce the
error of pattern classifiers but can never substitute for proper classification techniques
applied when the individual classifiers are formed [75].
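A sketch of the combination step, with an illustrative retained fraction and a simple majority vote over the kept classifiers, is:

    % Sketch of classifier combination: keep the individually most
    % accurate classifiers and vote. acc holds each classifier's average
    % within-1-mm accuracy; allLabels is nClassifiers-by-nSites. The
    % retained fraction is a placeholder (the study scanned 10-95%).
    frac = 0.25;
    [~, order] = sort(acc, 'descend');
    keep = order(1 : max(1, round(frac * numel(acc))));
    Lcombined = mode(allLabels(keep, :), 1);   % majority vote per tooth site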
4.7 Results and Discussion
The binary classification algorithm was applied as described in the procedure above.
The primary method of measuring the success of each technique is finding the percent of waveforms accurately described within 1 mm per pocket depth and averaging
over the accuracy per pocket depth. If we instead tried to measure the total number of
waveforms accurately described within the manual probe's 1 mm precision regardless of pocket depth measurement, we would tend to select for the classification techniques
that accurately describe only the most populated pocket depths. We performed these
tests for all seven possible pocket depths from the manually measured dataset, but we
also restricted the datasets to the 1–5 mm pocket depths, since there are so few measurements in the 6–7 mm range in our patient population, and we further restricted
the possible pocket depths to the 2–5 mm set, since we felt the 1 mm datasets might
be poorly described by the ultrasonographic probe because overlapping reflections
from inside the tip seem to cover that range. We show here results before classifier
combination and after, but only the highest average percent accurately identified
within 1 mm will be shown. The following results give the percent correctly identified per pocket depth (% correct) as well as the percent accurately identified within
1 mm tolerance (% close) (Tables 4.5 and 4.6). We also show the results graphically
in a chart indexed by the manual periodontal probe and the ultrasonographic periodontal probe. These charts are useful in determining the strengths and weaknesses
of each classification routine. Each row was normalized by the number of manual
measurements per pocket depth to yield the percent of predicted pocket depth labels
for each manual pocket depth.
The results displayed in Fig. 4.10a–c show results without combining labels from
different binary classification schemes for three different collections of possible
pocket depths. Figure 4.10d–f similarly shows results when classifiers are combined.
Fig. 4.10 The percent of tooth sites with a given manual pocket depth measurement (mm) is plotted
versus their ultrasonographic periodontal probe measurement. Plots a–c and g–h were classifier
results obtained without combination, but d–f and i–j demonstrate the classifier combination results.
Plots a–f demonstrate the highest possible average accuracy within 1 mm, while g–j were manually
adjusted to include more spread of labels. The array of possible pocket depths used in the classifier
is evident in the axes labels
Fig. 4.11 A comparison of labeling in the original (Fig. 4.10b) and revised (Fig. 4.10g) classification
schemes for pocket depths 1–5 is shown here to illustrate the improved spread of labels
There is a small improvement when classifiers are combined, and the time required
for the extra classification is slight as well. Note that only the most accurate results
are displayed here, within a 1 mm tolerance. If the goal is to label all the waveforms as
closely as possible, regardless of spread, this is sufficient. However, as the figures of
manual versus ultrasonographic probe results show, especially in the reduced pocket
depth cases, the lowest and highest pocket depth values do not receive labels at all in
the ultrasonographic probe case. A compromise may be desirable between accuracy
and large spread of labels. Figure 4.10g–j shows results for the restricted pocket depth
cases with smaller accuracy but more spread in the labels. These were reconfigured to
display a balance between a high precision and a larger spread of labels. Figure 4.11
shows how the spread of labels changes in these specially revised cases. Table 4.7
summarizes the average percent of measurements accurately classified within 1 mm
per pocket depth.
The results above show that the highest average percent accuracy for all the data
collected from 12 patients in the ODU clinical trials using a fifth-generation prototype
is at best 86.6% within a tolerance of 1 mm. Meanwhile, the best accuracy at that
tolerance using the fourth-generation probe from previous methods without using
pattern classification was 60% for a single patient [67].
Table 4.5 Percent of tooth sites accurately classified by the ultrasonographic probe

Classifier    PDs used   % Correct per pocket depth (mm)
combination   (mm)       1      2      3      4      5      6      7
No            1–7        40.7   42.2   6.6    8.3    50.0   28.9   100.0
No            1–5        0.6    43.9   24.6   56.9   0.0    0.0    0.0
No            2–5        0.0    3.1    46.0   60.3   0.0    0.0    0.0
Yes           1–7        35.0   36.9   8.4    20.1   48.0   34.2   100.0
Yes           1–5        0.0    55.8   28.7   51.5   0.0    0.0    0.0
Yes           2–5        0.0    3.3    49.9   60.3   0.0    0.0    0.0
No-revised    1–5        0.6    43.9   24.6   56.9   0.0    0.0    0.0
No-revised    2–5        0.0    3.1    46.0   60.3   0.0    0.0    0.0
Yes-revised   1–5        0.0    55.8   28.7   51.5   0.0    0.0    0.0
Yes-revised   2–5        0.0    3.3    49.9   60.3   0.0    0.0    0.0
Table 4.6 Percent of tooth sites accurately classified within 1 mm by the ultrasonographic probe

Classifier    PDs used   % Close per pocket depth (mm)
combination   (mm)       1      2      3       4      5      6      7
No            1–7        71.2   67.8   46.0    53.9   75.5   76.3   100.0
No            1–5        69.5   77.5   99.6    78.9   71.6   0.0    0.0
No            2–5        0.0    70.9   100.0   99.0   73.5   0.0    0.0
Yes           1–7        66.7   63.8   52.4    65.7   84.3   71.1   100.0
Yes           1–5        71.8   81.9   100.0   81.4   64.7   0.0    0.0
Yes           2–5        0.0    71.7   100.0   99.0   75.5   0.0    0.0
No-revised    1–5        69.5   77.5   99.6    78.9   71.6   0.0    0.0
No-revised    2–5        0.0    70.9   100.0   99.0   73.5   0.0    0.0
Yes-revised   1–5        71.8   81.9   100.0   81.4   64.7   0.0    0.0
Yes-revised   2–5        0.0    71.7   100.0   99.0   75.5   0.0    0.0
Table 4.7 Summary table of accuracy within 1 mm (%). The table below shows accuracy (%) within the tolerance of 1 mm for the most accurate and revised classification schemes, which involve more widely spread labels than the original

Pocket depths   Classifier combination
used (mm)       No     Revised   Yes    Revised
1–7             70.1   –         72.0   –
1–5             79.4   71.3      79.9   76.8
2–5             85.9   81        86.6   80.6
Fig. 4.12 A sample Bland–Altman statistical plot is generated here from the best combined classifier results for possible pocket depths 1–7 mm. In this figure, only 6% of the labels were outside
the range of μ ± 2σ
4.8 Bland–Altman Statistical Analysis
We compared the binary classification labels with the manual probe labels using
the Bland–Altman method, which is recommended for comparing different measurement schemes [76]. Figure 4.12 shows a sample plot of this method, in which
the difference between the ultrasonographic and manual measurements is plotted against the mean of the two measurements. Any trend between the difference and the mean of the labels
would indicate bias in the measurements. No trend is visible here and the grid-like
distribution results from the discrete nature of the labels. The numerical results are
displayed by pocket depth in Tables 4.8, 4.9, and 4.10, including the values and the 95% confidence intervals for the mean and for the mean plus or minus twice the standard deviation (μ ± 2σ).
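The quantities reported in Tables 4.8, 4.9, and 4.10 can be sketched as follows, for predicted labels Lp and manual labels M (both in mm); the normal-approximation confidence interval shown for the mean is an assumption about how the intervals were computed, and the variable names are illustrative.

    % Sketch of the Bland-Altman statistics and plot of Fig. 4.12.
    diffs = Lp - M;                        % label differences, L - M
    avgs  = (Lp + M) / 2;                  % pairwise means, (1/2)(L + M)
    mu    = mean(diffs);                   % mean difference (bias)
    sigma = std(diffs);                    % standard deviation of differences
    loa   = [mu - 2*sigma, mu + 2*sigma];  % limits of agreement
    n     = numel(diffs);
    ciMu  = mu + [-1, 1] * 1.96 * sigma / sqrt(n);   % 95% CI of the mean
    plot(avgs, diffs, '.'); hold on;
    xr = [min(avgs), max(avgs)];
    plot(xr, [mu mu], 'k-');               % bias line
    plot(xr, loa(1) * [1 1], 'k--');       % lower limit of agreement
    plot(xr, loa(2) * [1 1], 'k--');       % upper limit of agreement
    xlabel('(1/2)(L+M) [mm]'); ylabel('L-M [mm]');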
Table 4.8 The table shows the values and confidence intervals at 95% for the mean and the limits of agreement (μ ± 2σ) for classifiers using 1–7 mm

Pocket depths 1–7 mm     Classifier combination
                         No       Yes
μ (mm)                   0.44     0.62
  Lower confidence       0.36     0.55
  Upper confidence       0.51     0.69
μ − 2σ                   −3.12    −2.83
  Lower confidence       −3.25    −2.95
  Upper confidence       −2.98    −2.70
μ + 2σ                   3.99     4.07
  Lower confidence       3.86     3.94
  Upper confidence       4.12     4.20
σ (mm)                   1.77     1.72
Table 4.9 Bland–Altman statistical analysis for classification schemes using pocket depths 1–5 mm

Pocket depths 1–5 mm     Classifier combination
                         No       Revised   Yes      Revised
μ (mm)                   0.21     0.30      0.18     0.12
  Lower confidence       0.16     0.24      0.13     0.07
  Upper confidence       0.25     0.36      0.22     0.17
μ − 2σ                   −1.98    −2.58     −1.98    −2.17
  Lower confidence       −2.06    −2.69     −2.06    −2.25
  Upper confidence       −1.89    −2.47     −1.90    −2.08
μ + 2σ                   2.39     3.18      2.33     2.40
  Lower confidence       2.31     3.07      2.25     2.32
  Upper confidence       2.47     3.29      2.42     2.49
σ (mm)                   1.09     1.44      1.08     1.14
The results show that the mean of the difference is low,
less than 0.5 mm in most cases, and the standard deviation is just above 1 mm for
most classification schemes. However, the limits of agreement of μ ± 2σ are large,
around 2.5 mm, indicating that some ultrasonographic labels differ widely from the
manual labels.
It remains to be determined whether the accuracy of the ultrasonographic probe
is sufficient. Bassani et al. [76] showed that differences of 1.5 mm in manual pocket
depth measurements between different observers are not uncommon, which is closer
to the limits of agreement in the 2–5 mm classification schemes. Ahmed [77] compared manual to controlled force probes using Bland–Altman statistics and found
that the limits of agreement (μ + σ to μ − σ) were ±3.31 mm, which is larger than
Table 4.10 Bland–Altman statistical analysis for classification schemes using pocket depths 2–5 mm

Pocket depths 2–5 mm     Classifier combination
                         No       Revised   Yes      Revised
μ (mm)                   0.47     0.32      0.47     0.22
  Lower confidence       0.43     0.27      0.43     0.17
  Upper confidence       0.52     0.37      0.51     0.28
μ − 2σ                   −1.37    −1.77     −1.37    −1.99
  Lower confidence       −1.45    −1.85     −1.45    −2.08
  Upper confidence       −1.30    −1.68     −1.30    −1.90
μ + 2σ                   2.32     2.41      2.31     2.44
  Lower confidence       2.25     2.32      2.24     2.35
  Upper confidence       2.39     2.49      2.39     2.53
σ (mm)                   0.92     1.04      0.92     1.11
any but the upper limits of agreement of the 1–7 mm ultrasonographic probe configuration described here. Yang et al. [78] compared measurements of a controlled
force probe in four different ways and determined that, in most cases, the measurement error was around 0.5 mm, which is lower than the manual probe’s 1 mm
precision. However, controlled force probes tend to be more reproducible than the
manual probe, which we are using as our gold standard. Velden [79] showed that,
when using controlled force probes, which are more accurate than manual probes,
the best agreement occurred when 0.75 N of force was applied. Since the manual probe was used as the gold standard here, we cannot be entirely sure what force was being applied. Rams and Slots [80] compared controlled force probes to manual
probes and found that the standard deviation between measurements was between 0.9
and 0.95 mm for lower pocket depths and 1.0 and 1.3 mm for higher pocket depths,
and the manual probe almost always had a higher standard deviation than controlled
force probes. These values are closer to the standard deviation of the difference in the
2–5 mm and 1–5 mm ultrasonographic probe configurations. The authors also found
that the measurements were reproducible within 1.0 mm only in 82–89% of the cases,
which is within the range of the most accurate ultrasonographic probe classification
schemes described here. Mayfield et al. [81] found a higher degree of reproducibility, 92–96%, when testing two different observers. Lastly, Tupta-Veselicky et al. [82]
found that the reproducibility up to 1 mm with a conventional probe was 92.8% when
tested at 20 sites 2 hours apart by one examiner, again demonstrating the inaccuracy
of the manual probe itself.
4.9 Conclusion
We have described the development of the ultrasonographic periodontal probe and
the clinical tests on 12 patients. The resulting waveforms were analyzed with DWFP
processing and image recognition techniques applied to extract numerical measurements from these wavelet fingerprints. These parameters were optimized and averaged to yield training datasets for pattern recognition analysis, where testing datasets
were configured for a leave-one-out technique. The binary classification algorithm
was described and applied to these classification sets, and the labels from this technique were combined to strengthen the results. Different sets of possible pocket
depths can be used, since several of the pocket depths contain only a small number of measurements. The results can be configured either to yield the highest average
percent of waveforms correctly identified within 1 mm, or they can be configured
to yield a larger spread in the type of labels with approximately a 5% decrease in
accuracy within 1 mm. Overall, the classification scheme yields ultrasonographic periodontal probe measurements close to those of the manual probe,
so that 70.1–86.6% of the ultrasonographic measurements are within 1 mm of the
manual measurement. These values are close to those in the literature comparing
other periodontal probes to the manual probe and may be due to the imprecision and
low reproducibility of the manual probe itself.
We conclude that the ultrasonographic periodontal probe is a viable technology
and that we can use sophisticated mathematical techniques to extract pocket depth
information. To yield better results, a larger training dataset will be required.
Acknowledgements The authors would like to acknowledge partial funding support from DentSply,
Int’l and from the Virginia Space Grant Consortium. We would also like to thank Gayle McCombs
of ODU for acquiring the clinical data and Jonathan Stevens for constructing the investigational
device.
Appendix
The feature selection method described in Sect. 4.5 was applied to collect sample
number indices to later construct feature vectors from the fingerprint properties.
Table 4.11 describes each property used for both mother wavelets and the sample
numbers that were used to select the feature vectors.
Table 4.11 The types of features extracted using image recognition techniques from MATLAB are described here, along with the sample number indices resulting from the feature selection process described above from fingerprints generated using each mother wavelet (db3, sym5)

Inclination: angle between fingerprint and horizontal axis, measured by binary image recognition techniques. db3: 1463, 2300, 2661; sym5: –
Orientation: angle between fingerprint and horizontal axis, measured by ellipse matching second moments. db3: 1200, 2400; sym5: 2275
Eccentricity: eccentricity of ellipse matching second moments of the fingerprint. db3: 1040, 2300; sym5: –
Area: number of pixels in the fingerprint region. db3: 2215; sym5: 1146, 2533
Y-center: value of y-axis (scale) of the binary image associated with its center of mass. db3: 2850; sym5: 2833
Euler number: number of objects in the fingerprint minus the number of holes in the object. db3: 1150, 1500, 2540; sym5: 2042, 2250, 2550, 2650
Deg2fit, p1: first coefficient of second degree polynomial fitting the boundary of the fingerprint. db3: 1200, 2300, 2600; sym5: 2831
Deg2fit, p2: second coefficient of second degree polynomial. db3: 2270, 2800; sym5: 1500, 2275, 2575
Deg2fit, p3: third coefficient of second degree polynomial. db3: 1470, 2600; sym5: 1200
Deg4fit, p1: first coefficient of fourth degree polynomial fitting the boundary of the fingerprint. db3: 1400, 2258, 2550, 2650; sym5: 1495, 2275, 2575
Deg4fit, p2: second coefficient of fourth degree polynomial. db3: 1572; sym5: 2850
Deg4fit, p3: third coefficient of fourth degree polynomial. db3: –; sym5: 2565
Deg4fit, p4: fourth coefficient of fourth degree polynomial. db3: 1200, 2672; sym5: 2865
Deg4fit, p5: fifth coefficient of fourth degree polynomial. db3: 1166, 1600, 2266, 2566; sym5: –
References
1. Ghorayeb SR, Bertoncini CA, Hinders MK (2008) Ultrasonography in dentistry. IEEE Trans
Ultrason Ferr Freq Cont 55:1256–1266
2. Bains VK, Mohan R, Gundappa M, Bains R (2008) Properties, effects and clinical applications of ultrasound in periodontics: an overview. Perio 5:291–302
3. Agrawal P, Sanikop S, Patil S (2012) New developments in tools for periodontal diagnosis. Int
Dent J 62:57–64
4. Hayashi T (2012) Application of ultrasonography in dentistry. Jpn Dent Sci Rev 48:5–13
5. Marotti J, Heger S, Tinschert J, Tortamano P, Chuembou F, Radermacher K, Wolfar S (2013)
Recent advances of ultrasound imaging in dentistry - a review of the literature. Oral Surg Oral
Med Oral Pathol Oral Radiol 115:819–832
6. Evirgen S, Kamburoglu K (2016) Review on the applications of ultrasonography in dentomaxillofacial region. World J Radiol 8:50–58
7. Hinders MK, Companion JA (1998) Ultrasonic periodontal probe. In: Thompson DO, Chimenti
DE (eds) 25th review of progress in quantitative nondestructive evaluation 18b. Plenum Press,
New York, pp 1609–1615
8. Hinders MK, Guan A, Companion JA (1998) Ultrasonic periodontal probe. J Acoust Soc Am
104:1844
9. Hartman SK (1997) Goodbye gingivitis. Virginia Business 9
10. Companion JA (1998) Differential measurement periodontal structures mapping system. US
Patent 5,755,571
11. Farr C (2000) Ultrasonic probing: the wave of the future in dentistry. Dent Today 19:86–91
12. Hinders MK, Lynch JE, McCombs GB (2001) Clinical tests of an ultrasonic periodontal probe.
In: Thompson DO, Chimenti DE (eds), 28th review of progress in quantitative nondestructive
evaluation 21b, pp 1880–1890
13. Lynch JE (2001) Ultrasonographic measurement of periodontal attachment levels. Ph.D thesis
Department of Applied Science, College of William and Mary Williamsburg, VA
14. Lynch JE, Hinders MK (2002) Ultrasonic device for measuring periodontal attachment levels.
Rev Sci Instrum 73:2686–2693
15. Hou J (2004) Ultrasonic signal detection and characterization using dynamic wavelet fingerprints. Ph.D thesis Department of Applied Science, College of William and Mary Williamsburg,
VA
16. Carranza FA, Newman MG (1996) Clinical periodontology, 8th edn. W B Saunders Co,
Philadelphia
17. Peterson PE, Ogawa H (2005) Strengthening the prevention of periodontal disease: the who
approach. J Periodontol 76:2187–2193
18. Lang NP, Corbet EF (1995) Periodontal diagnosis in daily practice. Int Dent J 45:5–15
19. Varitz R (2007) Wavelet transform and pattern recognition method for heart sound
analysis. US Patent 20070191725
20. Aussem A, Campbell J, Murtagh F (1998) Wavelet-based feature extraction and decomposition
strategies for financial forecasting. J Comp Int Finance 6:5–12
21. Tang YY, Yang LH, Liu J, Ma H (2000) Wavelet theory and its application to pattern recognition.
World Scientific, River Edge
22. Brooks RR, Grewe L, Lyengar SS (2001) Recognition in the wavelet domain: a survey. J
Electron Imaging 10:757–784
23. Nason GP, Silverman BW (1995) The stationary wavelet transform and some statistical applications. In: Antoniadis A, Oppenheim G (eds) Wavelets and statistics lecture notes in statistics.
Springer, Germany, pp 281–299
24. Pittner S, Kamarthi SV (1999) Feature extraction from wavelet coefficients for pattern recognition tasks. IEEE Trans Pattern Anal Mach Intel 21:83–88
25. Sabatini AM (2001) A digital-signal-processing technique for ultrasonic signal modeling and classification. IEEE Trans Instrum Meas 50:15–21
26. Coifman R, Wickerhauser M (1992) Entropy based algorithms for best basis selection. IEEE
Trans Inform Theory 38:713–718
27. Szu HH, Telfer B, Kadambe S (1992) Neural network adaptive wavelets for signal representation and classification. Opt Eng 31:1907–1916
28. Telfer BA, Szu HH, Dobeck GJ, Garcia JP, Ko H, Dubey A, Witherspoon N (1994) Adaptive
wavelet classification of acoustic and backscatter and imagery. Opt Eng 33:2192–2203
29. Mallet Y, Coomans D, Kautsky J, Vel OD (1997) IEEE Trans Pattern Anal Mach Intel 19:1058–
1066
30. Mallat S (1989) A theory for multiresolution signal processing: the wavelet representation. IEEE Trans Pattern Anal Mach Intel 11:674–693
31. Antoine J-P, Barache D, Cesar RM Jr, da Fontoura Costa L (1997) Shape characterization with the wavelet transform. Signal Process 62:674–693
32. Yeh CH (2003) Wavelet-based corner detection using eigenvectors of covariance matrices.
Pattern Recogn Lett 24:2797–2806
33. Chapa JO, Raghuveer MR (1995) Optimal matched wavelet construction and its application to
image pattern recognition. Proc SPIE 2491:518–529
34. Liang J, Parks TW (1996) A translation-invariant wavelet representation algorithm with applications. IEEE Trans Sig Proc 44:224–232
35. Maestre RA, Garcia J, Ferreira C (1997) Pattern recognition using sequential matched filtering
of wavelet coefficients. Opt Commun 133:401–414
36. Murtagh F, Starck J-L, Berry MW (2000) Overcoming the curse of dimensionality in clustering
by means of the wavelet transform. Comput J 43:107–120
37. Yu T, Lam ECM, Tang YY (2001) Feature extraction using wavelet and fractal. Pattern Recogn
Lett 22:271–287
38. Tsai D-M, Chiang C-H (2002) Rotation-invariant pattern matching using wavelet decomposition. Pattern Recogn Lett 23:191–201
39. Du T, Lim KB, Hong GS, Yu WM, Zheng H (2004) 2d occluded object recognition using
wavelets. In: 4th International conference on computer and information technology, pp 227–
232
40. Saito N, Coifman RR (1994) Local discriminant bases. Proc SPIE 2303:2–14
41. Livens S, Scheunders P, de Wouwer GV, Dyck DV, Smets H, Winkelmans J, Bogaerts W (2004)
2d occluded object recognition using wavelets. In: Hlavác V, Sára R (eds) Computer analysis
of images and patterns V. Springer, Berlin, pp 538–543
42. Tansel IN, Mekdeci C, Rodriguez O, Uragun B (1993) Monitoring drill conditions with wavelet
based encoding and neural networks. Int J Mach Tool Manu 33:559–575
43. Tansel IN, Mekdeci C, McLaughlin C (1995) Detection of tool failure in end milling with
wavelet transformations and neural networks (wt-nn). Int J Mach Tool Manu 35:1137–1147
44. Learned RE, Wilsky AS (1995) A wavelet packet approach to transient signal classification.
Appl Comput Harmon Anal 2:265–278
45. Wu Y, Du R (1996) Feature extraction and assessment using wavelet packets for monitoring
of machining processes. Mech Syst Signal Process 10:29–53
46. Case TJ, Waag RC (1996) Flaw identification from time and frequency features of ultrasonic
waveforms. IEEE Trans Ultrason Ferr Freq Cont 43:592–600
47. Drai R, Khelil N, Benchaala A (2002) Time frequency and wavelet transform applied to selected
problems in ultrasonics nde. NDT& E Int’l 35:567–572
48. Buonsanti M, Cacciola M, Calcagno S, Morabito FC, Versaci M (2006) Ultrasonic pulse-echoes
and eddy current testing for detection, recognition and characterisation of flaws detected in
metallic plates. In: Proceedings of the 9th European conference non-destructive testing. Berlin,
Germany
49. Momenan R, Loew MH, Insana MF, Wagner RF, Garra BS (1990) Application of pattern recognition techniques in ultrasound tissue characterization. In: Proceedings of 10th international
conference on pattern recognition, vol 1, pp 608–612
50. Bankman IN, Johnson KO, Schneider W (1993) Optimal detection, classification, and superposition resolution in neural waveform recordings. IEEE Trans Biomed Eng 40(8):836–841
51. Kalayci T, Özdamar O (1995) Wavelet preprocessing for automated neural network detection
of eeg spikes. IEEE Eng Med Biol 14:160–166
52. Tate R, Watson D, Eglen S (1995) Using wavelets for classifying human in vivo magnetic
resonance spectra. In: Antoniadis A, Oppenheim G (eds) Wavelets and statistics. Springer,
New York, pp 377–383
53. Mojsilovic A, Popovic MV, Neskovic AN, Popovic AD (1995) Wavelet image extension for
analysis and classification of infarcted myocardial tissue. IEEE Trans Biomed Eng 44:856–866
54. Georgiou G, Cohen FS (2001) Tissue characterization using the continuous wavelet transform.
i. decomposition method. IEEE Trans Ultrason Ferr Freq Cont 48:355–363
55. Georgiou G, Cohen FS, Piccoli CW, Forsberg F, Goldberg BB (2001) Tissue characterization
using the continuous wavelet transform. ii. application on breast rf data. IEEE Trans Ultrason
Ferr Freq Cont 48:364–373
56. Lee W-L, Chen Y-C, Hsieh K-S (2003) Ultrasonic liver tissues classification by fractal feature
vector based on m-band wavelet transform. IEEE Trans Med Imag 22:382–392
57. Alacam B, Yazici B, Bilgutay N, Forsberg F, Piccoli C (2004) Breast tissue characterization
using farma modeling of ultrasonic rf echo. Ultrasound Med Biol 30:1397–1407
58. Hou J, Hinders MK (2002) Dynamic wavelet fingerprint identification of ultrasound signals.
Mate Eval 60:1089–1093
59. Hou J, Leonard KR, Hinders MK (2004) Automatic multi-mode lamb wave arrival time extraction for improved tomographic reconstruction. Inverse Prob 20:1873
60. Jones R, Leonard KR, Hinders MK (2007) Wavelet thumbprint analysis of time domain reflectometry signals for wiring flaw detection. Eng Intell Syst 15:65–79
61. Bingham J, Hinders M, Friedman A (2009) Lamb wave detection of limpet mines on ship hulls.
Ultrasonics 49:706–722
62. Bertoncini C, Hinders M (2010) Fuzzy classification of roof fall predictors in microseismic
monitoring. Measurement 43:1690–1701
63. Bertoncini C, Nousain B, Rudd K, Hinders M (2012) Wavelet fingerprinting of radio-frequency
identification (rfid) tags. IEEE Trans Ind Electron 59:4843–4852
64. Miller C, Hinders M (2014) Classification of flaw severity using pattern recognition for guided
wave-based structural health monitoring. Ultrasonics 54:247–258
65. Lv H, Jiao J, Meng X, He C, Wu B (2017) Characterization of nonlinear ultrasonic effects
using the dynamic wavelet fingerprint technique. J Sound Vib 389:364–379
66. Hinders MK, Hou JR (2004) Ultrasonic periodontal probing based on the dynamic wavelet fingerprint. In: Thompson DO, Chimenti DE (eds) 31st review of progress in quantitative nondestructive evaluation 24b. AIP conference proceedings
67. Hou JR, Rose ST, Hinders MK (2005) Ultrasonic periodontal probing based on the dynamic
wavelet fingerprint. Eurasip J on Appl Signal Processing 7:1137–1146
68. Rudd K, Bertoncini C, Hinders M (2009) Simulations of ultrasonographic periodontal probe
using the finite integration technique. Open Acoustics 2:1–19
69. Lynch JE, Hinders MK, McCombs GB (2006) Clinical comparison of an ultrasonographic
periodontal probe to manual and controlled-force probing. Measurement 39:429–439
70. Hinders MK, McCombs GB (2006) The potential of the ultrasonic probe. Dim Dent Hygiene
4:16–18
71. Haralick RM, Shapiro LG (1992) Computer and Robot Vision. Addison-Wesley
72. Horn BKP (1986) Robot Vision. MIT Press
73. Duin RPW (2000) PRTools version 3.0: a MATLAB toolbox for pattern recognition. Delft University of Technology
74. Platt JC, Cristianini N, Shawe-Taylor J (2000) Large margin DAGs for multiclass classification. In: Advances in neural information processing systems. MIT Press, pp 547–553
75. Kuncheva LI (2004) Combining pattern classifiers: methods and algorithms. Wiley, New York
76. Bassani DG, Miranda LA, Gustafsson A (2007) Use of the limits of agreement approach in
periodontology. Oral Health Prev Dent 5:119–24
77. Ahmed N, Watts TLP, Wilson RF (1996) An investigation of the validity of attachment level
measurements with an automated periodontal probe. J Clin Periodontol 23:452–455
78. Yang MCK, Marks RG, Magnusson I, Clouser B, Clark WB (1992) Reproducibility of an
electronic probe in relative attachment level measurements. J Clin Periodontol 19:306–311
79. Velden U (1979) Probing force and the relationship of the probe tip to the periodontal tissues.
J Clin Periodontol 6:106–114
80. Rams TE, Slots J (1993) Comparison of two pressure-sensitive periodontal probes and a manual probe in shallow and deep pockets. Int J Periodont Rest Dent 13:521–529
81. Mayfield L, Bratthall G, Attström R (1996) Periodontal probe precision using 4 different
periodontal probes. J Clin Periodontol 23:76–82
82. Tupta-Veselicky L, Famili P, Ceravolo FJ, Zullo T (1994) A clinical study of an electronic
constant force periodontal probe. J Periodontol 65:616–622
Chapter 5
Spectral Intermezzo: Spirit Security
Systems
Mark K. Hinders
Abstract Machine learning methods are sometimes applied in ways that make no
sense, and then marketed as software or services. This is increasingly an issue when
the sensors that can acquire enormous amounts of real-time data become so very
inexpensive, e.g. infrared imagers attached to smartphones. As a light-hearted example of how this could go, we describe using infrared cameras in a security system for
ghost detection. It is important to be skeptical of untested and untestable claims of
machine learning systems.
Keywords Infrared imaging · Machine learning
Most camera-based home security systems don’t have sufficient sensitivity to pick
up the presence of ghosts and other apparitions. Even the very expensive invisible
laser systems that are used in museums and bank vaults are no help. Ghosts can pass
through walls at will, so locks on the doors and window sensors provide no protection
to pernicious hauntings. Our system is built around two new technologies that make
it superior to everything else on the market. The first is infrared thermography, a sort
of camera that sees heat variations to a sensitivity of 1/1000th of a degree [1–8].
These cameras came to the public’s attention during the first Gulf War because the
Nighthawk Stealth Fighter used them to sneak undetected into downtown Baghdad
and drop bombs on Saddam’s key military installations. It’s different from the greenish night vision imagery where available moonlight is simply amplified. Infrared is a
sort of invisible light that anything which is even a little bit warm gives off. Contrast
comes from even very tiny temperature variations where, for example, the engine of
a vehicle or the body heat of a person is shown as “white hot” compared to the colder
surroundings. Maintenance engineers can use these infrared cameras to take a simple
snapshot of an electrical breaker box and tell by the infrared heat signature whether
any of the circuits are overheating just a bit and ready to short out. Building inspectors can walk around the outside of a new building and take infrared video of the
outside walls. Any places where the insulation is letting heat (or cold) leak out show
up easily as hot (or cold) spots. Veterinarians are starting to use infrared cameras to
detect inflammation in horses’ legs, since the place where the horse is being caused
pain will have locally increased blood flow and will hence be a little hotter than other
places. Some doctors are even using infrared thermography to detect breast cancer,
since tumors stimulate the growth of new blood vessels to feed themselves that is
called angiogenesis. This also shows up as a hot spot, and most importantly breast
thermography is done without having to squeeze the breast flat like is required in
X-ray mammography.
A few years ago these IR cameras started showing up as snap-on attachments to
smartphones, e.g., FLIR One, which means that the cost of the technology is now
low enough that it can be incorporated into residential security systems [9] and for
some applications we actually do use dynamic wavelet fingerprints [10].
It’s well known that a variety of ghostly presences can be felt by the hauntee as
a localized cold spot (Fig. 5.1), although rarely poltergeists and other hyperactive
apparitions will manifest as a very subtle transient heat flash. Our security systems
employ very sensitive infrared cameras to watch for anomalous hot or cold spots, and
then track them using sophisticated computer processing of the imagery. Artificial
Intelligence (AI)-based image processing is the second new technology that sets our
system apart from all competitors. It turns out that having a human watch infrared
imagery for ghostly hot and cold patterns doesn’t work. Ghosts are somehow able
to modulate and morph their temperature signatures as they move in a way such that
muggles will almost always miss it. In our many years of research and development,
we have demonstrated scientifically that humans are only able to recognize such
transients using peripheral vision. Our studies have also proven conclusively that no
human can be expected to concentrate for extended periods using only their peripheral
vision. We’ve had testees able to do it for 8 or 10 minutes at a stretch, but of course
you never know when the haunting is going to happen. We have mimicked human
peripheral vision in our AI computer system and then quite obviously make use of
the computer’s ability to remain ever vigilant.
In our system, each room is outfitted with a small camera unit which can be
built into either the ceiling or a light fixture. Only the 2 cm diameter hemispherical
omni-directional fisheye lens actually needs to protrude into the room, and each lens
has a full 360◦ field of view. One camera per room covers an entire house with no
blind spots and no moving parts. Traditional cameras with their focused directional
lenses either have to mechanically pan about the room or you have to have several
cameras looking in different directions in order to get full coverage. Our computer
image processors automatically undistort the images at up to several hundred frames
per second, so that even the quickest of little beasties (ghosts, vermin, whatever) are
captured without motion blurring.
Moreover, once an event is detected we pass those image files to a second AI
module based on a dynamically adaptive artificial neural network (ANN) that uses
wavelet fingerprints to classify the apparition into one of more than a hundred object
classes according to their morphing, movements, etc. It then further categorizes the
signals using machine learning techniques to identify the particular ghostly presence
based on behavior patterns recorded in its database of past sightings. The system
is also in continuous Internet contact with our central infrared imagery archival
repository so that those ghosts who haunt more than one spatial and/or temporal
locality can be tracked over both space and time.

Fig. 5.1 The second oldest building at William & Mary is the Brafferton, constructed with funds from the estate of Robert Boyle, the famous English scientist, to house William & Mary's Indian School. The royal charter of 1693 that established W&M stated as one of its goals "that the Christian faith may be propagated amongst the Western Indians, to the glory of Almighty God...." Over 30 years later, William & Mary Statutes reaffirmed the mission to "teach the Indian boys to read, and write, and vulgar Arithmetick.... to teach them thoroughly the Catechism and the Principles of the Christian Religion." The only one of the university's three colonial buildings to have escaped the ravages of fire, the Brafferton nonetheless suffered an almost complete loss of its interior during the Civil War, when the doors and much of the flooring were removed and used for firewood. The window frames and sash are said to have been removed and used in quarters for the Union officers at Fort Magruder. The exterior brick walls of the Brafferton are, however, the most substantially original of the university's three colonial buildings. The exterior was restored to its colonial appearance in 1932 as part of the Rockefeller Restoration of Williamsburg, and today the Brafferton houses the offices of the president and provost of the university. It's a bit haunted: http://flathatnews.com/2013/10/17/braffertons-running-boy/
A side benefit of our system is that living pets and humans and even mammalian
vermin show up very strongly: a mouse gives a signal about ten thousand times as
strong as a typical poltergeist. A black cat in a dark room is even stronger. Our
cameras are so sensitive that we can easily track the minute heat print left over on
the floor from the cat's paws. All human presences (Fig. 5.2) are tracked and noted
with enough detail so that even the stealthiest cat burglar would be immediately
identified and an alarm sounded. Last spring one of our customers had a problem
with a raccoon coming in through their doggy door and scarfing down all the pet food.
They were initially amused when the intruder was identified, but then subsequently
quite concerned about whether the raccoon might be rabid. Review of the imagery
by our experts showed conclusively that the body temperature of the raccoon was
in the normal range and the animal was healthy, even though it seemed particularly
addicted to kibbles and after a few more months would probably have been too fat
to fit through the doggy door.

Fig. 5.2 IR selfie taken with a FLIR One. Human body heat shows the bright colors. A cool orb shows as darker shades (https://www.flir.com)
I mentioned mice previously, but didn’t explain quite what our system can do.
Rats and squirrels are warm blooded, so they show up very strongly. Our system
automatically detects, counts, and tracks such critters, of course, but it can also
spot a dray of squirrel babies nesting inside of walls because their body heat comes
through the sheetrock or plaster or paneling or whatever. That allows exterminators
to target their poison/traps and thus use the smallest quantity while being assured of
complete pest eradication. Not leaving pizza lying around helps also.
Now here’s the really cool part about our paranormal security system: countermeasures. Having an alert to the presence of a ghost is, of course, very useful but only
in the movies are there ghostbusters that you can call to come and take the apparitions
away. Also, if you remember from those movies that trapping and storing the ghosts
didn’t actually work out all that well for the Big Apple. I guess I should explain what
ghosts actually are, because it’s key to our apparition remediation program. When
we die, that part of us that Deepak Chopra would call “information and energy”
separates from our physical body and slips off into a different dimension. All of the
various religions describe this otherworldly dimension in some form or another and
each religion has its own explanation for what happens there and why. Of course,
they all agree that being a “good” person makes your subsequent extra-dimensional
experience much more pleasant.
Ghosts are simply stranded bundles of information and energy that for some reason
don’t make the transition to “the other side” as they are supposed to. Dying isn’t so
bad I suppose, considering that your fate is precisely what you deserve to get. Eternal
damnation and hellfire isn’t exactly something you aspire to, but at least over the
eons you might just come to accept that it’s your own damn fault.
Nobody deserves to get stuck in between life and death, though, especially since
you aren’t actually anybody while in that state because you have no body. You are
a nobody, but with full memory of the life you just finished living along with some
inkling of what fate has waiting for you just across the chasm. People who think ghosts
are able to come and go from “the other side” at will are very much mistaken. Ghosts
can’t actually tell you much of anything because they haven’t really gone there yet.
Nothing pisses a ghost off more than to be asked questions about what it’s like being
dead and whether so-and-so is doing OK there. Probably the best analogy is that Tom
Hanks movie “The Terminal” based on the true story of a guy who got trapped at
JFK airport when his country’s government was overthrown and customs declared
his visa invalid. He couldn’t go home because his passport was from a country that
no longer existed, and he couldn’t walk out the door of the airport terminal to New
York because he didn’t have a visa for that either. He lived at the airport terminal for
years in and among the other travelers and only occasionally glimpsing America out
the doors and windows. Initially, he didn’t even speak any English, so he could barely
communicate with the airport travelers or customs officials and he had to wash as
best he could in the restrooms, sleep on the chairs in quiet parts of the terminal, and
return rental baggage carts to the dispenser to get enough money to eat something at
the food court. Being a ghost must really suck.
What ghosts really want to do is to get on with their travels, and we have figured
out a way to get them their visa, so to speak. First I have to explain a bit about the
electromagnetic spectrum. The light that we see with our eyes is an electromagnetic
wave. It has reflected off of objects with some portion being absorbed, which is what
we see as colors. Something that looks green to us is something that absorbs all
the other colors of the rainbow but reflects the green part of the spectrum. What’s
a little weird is that there are more colors in the rainbow than the human eye can
see. Infrared cameras merely see electromagnetic waves that are “too red” for our
eyes. At the other extreme is ultraviolet, which you might call “extra purple” because
it’s more violet than our eyes can see. There are others that are invisible to us, but
that we know a lot about. Beyond ultraviolet are X-rays and then gamma rays. In the
other direction, beyond infrared are microwaves and radiowaves. You probably never
thought about it this way, but the different radio stations are all simply broadcast as
different colors of radiowaves and the tuner crystal in a radio is just a sort of prism
that separates out the different colors.
Here’s what we do to give ghosts a nudge. We record their characteristic signatures
in the infrared part of the spectrum using the infrared cameras, and then flip that signal
over (what is technically called a frequency inversion) to get the corresponding mirror
signal in the ultraviolet. Since ultraviolet is a higher energy part of the electromagnetic
spectrum than is the infrared, projecting the ultraviolet image back onto the ghost
gives it a gentle but firm push across the divide to where it’s supposed to have been
all along. I should have said that ultraviolet is what most people call black light and
is perfectly safe. It’s invisible, so you only see it when it makes your psychedelic felt
poster glow in that groovy manner hippies liked so much in the 70s.
Although it only takes an instant to nudge a ghost across the chasm and the effect
is permanent, in most cases, it’s first necessary to record the sequential hauntings by
an individual apparition for several dozen visits. Sometimes an individual ghost will
be so lethargic that enough information can be acquired from a single visit’s worth of
infrared imagery for the AI system to calculate the correct ultraviolet inversion profile, but most hauntings are surprisingly fleeting and you have to wait through almost
a month’s worth of nightly visits. The “extra purple” inversion image projectors aren’t
usually installed permanently, but are instead temporarily placed by our expert technicians in the ectoplasmic focal spot identified from the ghost’s demonstrated pattern
of behavior in the infrared imagery. Once the ghost has been dispatched the haunting
will stop and the unit is discarded much like old-fashioned one-time-use flashbulbs.
The IR security system will continue to keep vigilant for other ghosts or mice or cat
burglars.
Afterlife Afterward
I hope you figured out that, of course, this is all pseudoscientific bullsh*t intended to
sound scientific. There’s no such thing as ghosts, despite competing ghost tours in
places like Colonial Williamsburg. All the most famous haunted houses have been
naked frauds, sorry. In Scooby Doo, it was always someone pretending to be a ghost
or a ghoul in order to make money off the gullible. I included this little intermezzo to
make the point that machine learning seems like magic to a lot of people who don’t
have the necessary mathematical background to provide pushback to unscrupulous
sales reps spouting pseudoscientific bullsh*t intended to sound scientific.
Chapter 6
Classification of Lamb Wave
Tomographic Rays in Pipes to
Distinguish Through Holes from Gouges
Crystal B. Acosta and Mark K. Hinders
Abstract Guided ultrasonic waves are often used in structural health monitoring
because the dispersive modal behavior results in differing arrival time shifts for
modes traveling through flaws. One implementation of Lamb waves for structural
health monitoring is helical ultrasound tomography (HUT), in which an array of transducers scans around the perimeter of the sample, resulting in a reconstructed image that
accurately depicts the presence of flaws in plates and plate-like objects such as pipes.
However, these images cannot predict the type or severity of the flaw, which are
needed in order to schedule maintenance for the structure. We describe the incorporation of pattern classification techniques into Lamb wave tomographic reconstructions
of pipes in order to predict flaw type. The features used were extracted using image
recognition techniques on dynamic wavelet fingerprint (DWFP) transforms of the
unfiltered ray paths. Features were selected at the anticipated mode arrival times for
tomographic reconstructions of a pipe with two different flaw types. The application
of support vector machines (SVM) in a multiclass one-versus-one method resulted
in a direct comparison of predicted and known flaw types.
Keywords Helical ultrasound tomography · Pipe flaw classification · Wavelet
fingerprint
6.1 Introduction
Structural health monitoring involves the application of sensor technologies to identify and locate defects that develop over time. The addition of pattern recognition
techniques to structural health monitoring may help to minimize false positives and
false negatives [1]. The feature extraction process used to distinguish between damaged and undamaged samples for specific applications varies, including time-series
analysis [2–4], energy [5, 6], Fourier transforms [7], wavelet energy [8, 9], novelty
C. B. Acosta · M. K. Hinders (B)
Department of Applied Science, William & Mary, Williamsburg, VA, USA
e-mail: hinders@wm.edu
© The Editor(s) (if applicable) and The Author(s), under exclusive
license to Springer Nature Switzerland AG 2020
M. K. Hinders, Intelligent Feature Selection for Machine Learning
Using the Dynamic Wavelet Fingerprint,
https://doi.org/10.1007/978-3-030-49395-0_6
Fig. 6.1 A tomographic
reconstruction of the pipe
under study. The gouge flaw
shows at the top of the 2D
image, while the
through-hole is visible in the
middle of the image
detection [10, 11], and principal component expansion of spectra [12]. Many of
these are in situ experiments using classifiers that include discriminants, k-nearest-neighbor, support vector machines, and neural networks. The classifiers are usually
able to discriminate between specific flaws in the structure. However, these sensing
techniques are one dimensional, so that a single interrogation of a region of the
structure is associated with a single decision of whether or not a flaw exists in that
region.
Tomography, on the other hand, is two dimensional, which means the sensors
gather data from multiple directions over an area of the structure. Lamb wave tomography has previously been used to detect flaws in plates and locally plate-like structures such as pipes [13–17]. In the process of helical ultrasound tomography (HUT),
two parallel transducer arrays are installed around the perimeter of the pipe and
guided waves travel between every pair of transducers in the arrays [18]. In the laboratory, the two arrays of transducers are approximated by two single transducers that
are moved by motors around the perimeter of the area being scanned. The result of
the HUT scan is an array of recorded ultrasonic waveforms for each position of the
pair of transducers. For all the possible positions of the transmitting and receiving
transducers, the result of the tomographic reconstruction is a 2D image where each
pixel relates to the local slowness of that ray path. In this way, flaws in the sample
under study are localized. However, tomographic reconstructions are not always possible because producing accurate images requires many ray paths and access to the
structure in question may be limited. In addition, the tomographic reconstructions
cannot always predict the cause or type of flaw.
Our research involves applying pattern recognition techniques to Lamb waves
generated from a tomographic scan of an aluminum pipe. The application of pattern
classification to tomography will be able to identify the source of flaws as well as
their location using a limited number of ray paths. Figure 6.1 shows a tomographic
reconstruction of the pipe used in this study. This pipe had two flaws, an internal
gouge flaw where the transducers began scanning and appears at the top of the image,
as well as a through hole that appears in the middle of the image. The hole flaw
could be misinterpreted as a circular artifact due to the tomographic reconstruction
process. These artifacts are common [19]. The addition of pattern classification to this
technology will better identify the hole flaw, reducing the risk of identifying artifacts
as flaws.
There are several instances of pattern recognition studies of pipes in the literature.
Ravanbod [20] used ultrasonic pipeline inspection gauges (PIGs) commonly used
for inspection of oil pipelines. Neural network decisions were refined with fuzzy logic
on simulated and real data, with features chosen to accurately detect the location and type
of external and internal flaws, in addition to producing an image of the predicted
flaw. Zhao et al. [21] used electromagnetic transducers mounted on a PIG to inspect
several steel pipes with artificial flaws introduced. Features were extracted through
time-domain correlation analysis, selected with principal component analysis, and
classified using discriminant analysis. Flaws of two different types and at least 2.5 mm
deep were successfully identified with this method. Lastly, Sun et al. [22] applied
fuzzy pattern recognition to detect weld defects in steel pipes using X-rays. However,
to our knowledge there are no instances of applying pattern classification to helical
ultrasound tomography in pipes for defect characterization.
6.2 Theory
The structural acoustics of a pipe is commonly regarded as corresponding to Lamb
waves in an unrolled plate [23]. Therefore, the theory of Lamb waves propagating
in a pipe can be analogously developed by examining Lamb waves in plates.
Lamb waves occur in a plate or other solid layer with free boundaries in which
the elastic waves propagate both perpendicularly and within the plane of the plate
[24]. The solution to the free plate problem yields symmetric and antisymmetric
modes. The phase velocity (cp) and group velocity (cg) of Lamb waves in aluminum
versus the frequency-thickness product f · d are shown in Fig. 6.2 for the first two
symmetric (S0, S1) and antisymmetric (A0, A1) modes. This modal behavior of
Lamb waves illustrates their usefulness for NDE inspections. As the Lamb waves
propagate along the plate, if the plate has a flaw, such as a gouge or corrosion, then
the thickness of the plate is reduced at that point. Since the thickness changes, the
frequency-thickness product f · d in Fig. 6.2 changes, resulting in a different arrival
time of the mode [25]. Depending on the frequency of the ultrasonic transducer used
as well as the material and thickness of the plate, it is often sufficient to detect flaws
by measuring the arrival time of the mode. If the experimentally detected arrival time
Fig. 6.2 The solutions to the Rayleigh–Lamb equations for the first two symmetric and antisymmetric modes of cg and cp in an aluminum plate are plotted here. Frequency-thickness product
(f · d) is plotted on the abscissa versus phase and group velocity, respectively. The value of f · d
used in the experiments is indicated with a line. Note that four modes are present at the selected
f · d value, with S1 appearing first since its velocity is higher. Even a small change of thickness at
this f · d value will result in a larger change in mode arrival time
of the mode differs from its anticipated value, a flaw can be suspected. Changing the
transducer frequency can allow for improved detection capabilities since different
frequency-thickness products f · d change the expected arrival times at different
rates.
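As a toy numeric illustration of this idea, the expected arrival time of a mode is just the path length divided by the group velocity read off the dispersion curve at the operating f · d; the sketch below is in Python, and the cg value is an assumed round number, not a value taken from the actual Rayleigh–Lamb solution.

```python
s_cm = 39.38            # helical path length in cm (the D91 distance quoted later)
cg_cm_per_us = 0.31     # assumed group velocity, ~3.1 km/s expressed in cm/us
print(s_cm / cg_cm_per_us, "microseconds")   # expected mode arrival time
```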
The application of pattern classification to tomographic Lamb wave scans of a
pipe includes the following steps:
1. Sensing: Helical ultrasound tomography is applied to an aluminum pipe with
known flaws, resulting in a 2D array of recorded Lamb waveforms.
2. Feature extraction: An integral transformation, the Dynamic Wavelet Fingerprint
(DWFP), is computed on the waveforms. The transformation maps the 1D waveform to a 2D binary image. Image recognition techniques measure properties of
those fingerprints over time.
3. Feature selection: Fingerprint properties are chosen at the anticipated mode
arrival times for tomographic scans performed at different transducer frequencies.
4. Classification: The dataset is split via a resampling algorithm into training and
testing subsets. The classifier is trained on the training set and tested with the
testing set.
5. Decision: The predicted class of the testing set is finalized and the performance
of the classifier is evaluated.
Formally [26], consider a two-class problem with labels ω1, ω2, where the probability
of each class occurring is given by p(ω1), p(ω2). Now consider a feature vector x,
which is the vector of measurements made by the sensing apparatus for one ray path.
Then x is assigned to class ωj whenever

p(ωj | x) > p(ωk | x),   k ≠ j.   (6.1)
By using Bayes’ theorem, Eq. (6.1) can be rewritten as

p(x | ωj) p(ωj) > p(x | ωk) p(ωk),   k ≠ j.   (6.2)
In this way, the sensed object associated with the feature vector x is assigned to the
class ωj with the highest likelihood. In practice, there are several feature vectors xi,
each with an associated class label wi taking on the value of one of the ωj.
Classification generally involves calculating those posterior probabilities p(ωi |x)
using some mapping. If N represents the number of objects to be classified and M
is the number of features in the feature vector x, then pattern classification will be
performed on the features xi that have an associated array of class labels wi that take
on values ω1 , ω2 , ω3 for i = 1, . . . , N . The most useful classifier for this dataset was
support vector machines (SVM) [27].
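As a concrete illustration of the decision rule in Eq. (6.2), here is a minimal Python sketch assuming Gaussian class-conditional densities; the priors and density parameters are hypothetical and are not drawn from the pipe dataset.

```python
from scipy.stats import norm

# Hypothetical priors p(w1), p(w2) and assumed Gaussian likelihoods p(x|w);
# these numbers are for illustration only, not from the pipe dataset.
p_w1, p_w2 = 0.7, 0.3
lik1 = norm(loc=0.0, scale=1.0)   # assumed class-conditional density p(x|w1)
lik2 = norm(loc=2.0, scale=1.0)   # assumed class-conditional density p(x|w2)

def classify(x):
    """Assign x to the class with the larger score p(x|w_j)p(w_j), Eq. (6.2)."""
    return 1 if lik1.pdf(x) * p_w1 > lik2.pdf(x) * p_w2 else 2

print(classify(0.5), classify(1.8))  # -> 1 2 with these assumed densities
```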
6.3 Method
6.3.1 Apparatus
The experimental apparatus is shown in Fig. 6.3a. In operating the scanner, the transmitting transducer remains fixed while the receiving transducer steps by motor along
the circumference of the pipe until it returns to the starting position. The transmitting
transducer indexes by one unit step, and the process repeats. This process gives a
dataset with helical criss-cross propagation paths which allows for mode-slowness
reconstruction via helical ultrasound tomography.
In this study, tomographic scans of a single aluminum pipe were used (Fig. 6.1).
The pipe was 4 mm thick with a circumference of 19 inches, and the transducers
were placed 12.25 in apart. Tapered delay lines were used between the transducer
face and the pipe surface. Two different kinds of flaws were introduced into the
pipe: a shallow, interior-surface gouge flaw approximately 3.5 cm in diameter, and
Fig. 6.3 a The experimental apparatus, in which motors drive a transmitting and receiving transducer around the perimeter of the pipe, and b an illustration of the ray path wrapping of the “unrolled”
pipe. In (b), the shortest ray path for the same transducer positions is actually the one that wraps
across the 2D boundary
a through-hole 3/8 in in diameter. The positions of the flaws introduced to the pipe
can be seen in Fig. 6.1, where the geometry of the figure corresponds to the unrolled
pipe. There were 180 steps of each of the transducers used. Multiple scans of the pipe
were performed while the frequency of the transducer ranged from 0.8 to 0.89 MHz
in steps of 0.01 MHz. For classification, only three of these frequencies were selected:
0.8 MHz, 0.84 MHz, and 0.89 MHz. The pipe was scanned under these conditions,
then the hole was increased to 1/2 in diameter, and the process was repeated.
6.3.2 Ray Path Selection
In order to perform classification on the ray paths involved, we first calculated which
transmitter and receiver index positions, called i1 and i2, respectively, correspond to
a ray path intersecting one of the flaws. To this end, the physical specifications of
the pipe are mapped to those of a plate, with the positions of the transmitting and
receiving transducers on the left- and right-hand sides. Figure 6.3b demonstrates the
unwrapping of the pipe into 2D geometry and shows that some transmitting and
receiving transducer positions result in a ray path that wraps around the boundary
of the 2D space.
The equation for the distance the Lamb wave travels on the surface of the pipe
can be derived from the geometry as shown in Fig. 6.4 and is given by [16]

s = √(L² + a²) = √(L² + γ²r²),

where L is the axial distance between the transducers, a is the arc length subtended
by the axial and actual distance between the transducers, and r is the radius of the
pipe. The variable γ is the smallest angle between the transducers,

γ = min {(φ1 − φ2 + 2π), |φ1 − φ2|, (φ1 − φ2 − 2π)},
where φ1 and φ2 are the respective angles of the transducers. The transmitting and
receiving transducers have indices represented by i1 and i2, respectively, so that
i1, i2 = 1, . . . , 180. Then, if L is the axial distance between the transducers and Y is the
circumference of the pipe, both measured in centimeters, the abscissa positions
of the transducers are x1 = 0 and x2 = L, and the radius of the pipe is given by
r = Y/(2π). The indices can be converted to angles using the fact that there are 180
transducer positions in one full rotation around the pipe. This gives
Fig. 6.4 The derivation of the equation for the distance a ray path travels on the surface of a pipe
is shown, where L is the axial distance between the transducers, s is the distance traveled by the ray
path, and a is the arc length subtended by the axial and actual distance between the transducers
φ1 = i1 · (2π/180),

and similarly for φ2. Substituting these into the expression for γ, the minimum angle is

γ = (2π/180) · min {(i1 − i2 + 180), |i1 − i2|, (i1 − i2 − 180)},   (6.3)

and the axial distance between the transducers is already given as L. Substituting Eq. (6.3) into the expression for s then yields the helical ray path distance between the transducers.
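The geometry above translates directly into code. The following is a minimal Python sketch of the helical path length computation; the magnitudes inside the min are taken explicitly so that γ is the smallest angular separation, and the function name and argument names are ours, not from the original scan software.

```python
import numpy as np

def helical_distance(i1, i2, L, Y, n_pos=180):
    """Helical ray path length s = sqrt(L^2 + (gamma*r)^2), Fig. 6.4/Eq. (6.3).
    i1, i2 are transducer indices (1..n_pos); L is the axial separation and
    Y the pipe circumference, both in the same units (cm here)."""
    r = Y / (2.0 * np.pi)                       # pipe radius from circumference
    d = i1 - i2
    # smallest angular separation, allowing for wrap-around of the indices
    gamma = (2.0 * np.pi / n_pos) * min(abs(d + n_pos), abs(d), abs(d - n_pos))
    return np.sqrt(L**2 + (gamma * r)**2)

# L = 12.25 in = 31.115 cm, Y = 19 in = 48.26 cm; diametrically opposed
# transducers give s ~ 39.4 cm, the longest path D91 quoted in Sect. 6.4.4.
print(helical_distance(45, 135, L=31.115, Y=48.26))
```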
The positions of the flaws were added to the simulation space, and ray paths were
drawn between the transducers using coordinates (x1, y1) and (x2, y2) that depend
on i1 and i2, as described above. If the ray path intersected any of the flaws, that ray
path was recorded to have a class label corresponding to that type of flaw. The labels
included no flaw encountered (wi = ω1), gouge flaw (wi = ω2), and hole flaw (wi = ω3).
For the identification of class labels, the flaws were approximated as roughly octagonal in shape, and the ray paths were approximated as lines with a width determined
by the smallest pixel size, which is 0.1 · (Y/180) = 0.2681 mm. In reality, Lamb
waves have some horizontal spread. These aspects of the ray path simulation may
result in some mislabeled ray paths.
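A hedged sketch of this labeling step follows; for brevity the flaws are approximated here as disks rather than the octagons used in the study, the wrapped paths of Fig. 6.3b are handled by shifting the receiving end by ±Y, and all names are ours.

```python
import numpy as np

def label_ray(y1, y2, L, Y, flaws, ray_halfwidth=0.0134):
    """Return the class label for the ray from (0, y1) to (L, y2) on the
    unrolled pipe. `flaws` is a list of (x, y, radius, label) tuples with
    label 2 for the gouge and 3 for the hole; disks stand in for the
    roughly octagonal outlines of the text. The default half-width is
    half the 0.02681 cm pixel size quoted above. Illustrative only."""
    label = 1                                   # omega_1: no flaw encountered
    for y2w in (y2, y2 + Y, y2 - Y):            # direct and wrapped paths
        p0, p1 = np.array([0.0, y1]), np.array([L, y2w])
        d = p1 - p0
        for fx, fy, fr, flaw_label in flaws:
            # distance from the flaw center to the segment p0-p1
            t = np.clip(np.dot([fx, fy] - p0, d) / np.dot(d, d), 0.0, 1.0)
            if np.linalg.norm(p0 + t * d - [fx, fy]) <= fr + ray_halfwidth:
                label = max(label, flaw_label)  # hole wins if both are hit
    return label
```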
6.4 Classification
As already mentioned, classification will be performed on ray paths with a limitation
on the distance between the transducers. In preliminary tests, it was noted that when
all the distances were considered, many of the ray paths that actually intersected
the hole flaw were labeled as intersecting no flaws. The explanation is that Lamb
waves tend to scatter from hole flaws [28], which means no features will indicate a
reduction in thickness of the plate, but the distance traveled will be longer. Therefore,
limiting the classification to ray paths below a certain distance improves the classification
by reducing the influence of scattering effects.
6.4.1 Feature Extraction
Let D represent the distance limit. The next step of classification is to find M-many features for the feature vectors xi such that si ≤ D, i = 1, . . . , N. Feature extraction
used dynamic wavelet fingerprinting (DWFP), while feature selection involved either
selecting points at the relevant mode arrival times for a tomographic scan using a
single transducer frequency, or selecting points at only one or two mode arrival times
for three different transducer frequencies at once.
6.4.2 DWFP
The DWFP technique (Fig. 6.5) applies a wavelet transform on the original time-domain waveform, which results in “loop” features that resemble fingerprints. It has
previously shown promise for a variety of applications including an ultrasonographic
periodontal probe [29–32], detection of ultrasonic echoes in thin multilayered structures [33], and structural monitoring with Lamb waves [17, 34–36].
The Lamb wave tomographic waveforms were fingerprinted using the DWFP
algorithm without any preprocessing or filtering. Let φi (t) represent a waveform
selected from a Lamb wave tomographic scan (i = 1, . . . , N ). The first step of the
DWFP (Fig. 6.5a, b) involves applying a wavelet transform on each of the waveforms.
The continuous wavelet transform can be written as
C(a, b) = ∫_{−∞}^{+∞} φ(t) ψ_{a,b}(t) dt.   (6.4)
Here, φ(t) represents a square-integrable 1D function, where we are assuming φ(t) =
φi(t), and ψ(t) represents the mother wavelet. The mother wavelet is scaled
in frequency (f) and translated in time (t) using a, b ∈ R, respectively, where a ∝ f
and b ∝ t, in order to form the ψa,b(t) in Eq. (6.4). The wavelet transform on a
single waveform (Fig. 6.5a) results in wavelet coefficients (Fig. 6.5b). Then, a slicing
algorithm is applied to create an image analogous to the gradient of the wavelet
coefficients in the time-scale plane, resulting in a binary image, I (a, b). The mother
wavelets selected were those that previously showed promise for similar applications,
including Daubechies 3 (db3) and Symlet 5 (sym5). The resulting image I contains
fingerprint-like binary contours of the initial waveform φi (t).
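The DWFP step can be sketched in a few lines of Python with PyWavelets. Note the hedges: pywt.cwt supports continuous wavelets such as the Morlet ('morl') rather than the db3/sym5 filters used in the chapter, and the slicing below is one plausible reading of the gradient-like slicing operation, not the published implementation.

```python
import numpy as np
import pywt

def dwfp(waveform, scales=np.arange(1, 51), n_slices=8, wavelet='morl'):
    """A minimal sketch of the dynamic wavelet fingerprint (Fig. 6.5):
    take the CWT, normalize the coefficient magnitudes, cut them into
    thickness bands, and keep alternate bands, which yields the binary
    fingerprint-like ridge contours I(a, b)."""
    coeffs, _ = pywt.cwt(waveform, scales, wavelet)    # (n_scales, n_samples)
    mag = np.abs(coeffs) / np.abs(coeffs).max()        # normalize to [0, 1]
    band = np.floor(mag * 2 * n_slices).astype(int)    # slice into bands
    return band % 2 == 1                               # binary image I(a, b)
```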
The next step is to apply image processing routines to collect properties from
each fingerprint object in each waveform. First, the binary image I is labeled with
the 8-connected objects, allowing each individual fingerprint in I to be recognized as
a separate object using the procedure in Haralick and Shapiro [37]. Next, properties
Fig. 6.5 The DWFP technique begins with (a) the ultrasonic signal, where it generates (b) wavelet
coefficients indexed by time and scale, where scale is related to frequency. Then the coefficients
are sliced and projected onto the time-scale plane in an operation similar to a gradient, resulting in
(c) a binary image that is used to select features for the pattern classification algorithm
are measured from each fingerprint. Some of these properties include counting the
on- and off-pixels in the region, but many involve finding an ellipse matching the
second moments of the fingerprint and measuring properties of that ellipse such as
eccentricity. In addition to the orientation measure provided by the ellipse, another
measurement of inclination relative to the horizontal axis was determined by Horn’s
method for a continuous 2D object [38]. Lastly, further properties were measured by
determining the boundary of the fingerprint and fitting 2nd or 4th order polynomials.
The image processing routines result in fingerprint properties Fi,ν [t] relative to
the original waveform φi(t), where ν represents an index of the image-processing-extracted fingerprint properties (ν = 1, . . . , 17). These properties are discrete in time
because the values of the properties are matched to the time value of the fingerprint’s
center of mass. Linear interpolation yields a smoothed array of property values,
Fi,ν (t).
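A sketch of this property-extraction step with scikit-image is shown below; only a few of the 17 properties named in the text are reproduced, and the helper name and output layout are ours.

```python
from skimage.measure import label, regionprops

def fingerprint_properties(I, t_axis):
    """Label the 8-connected fingerprint objects in the binary image I
    (axes: scale x time) and measure a few of the pixel-count and
    fitted-ellipse properties named in the text, keyed to the time of
    each fingerprint's center of mass."""
    labeled = label(I, connectivity=2)          # connectivity=2 is 8-connected
    props = []
    for rp in regionprops(labeled):
        scale_cm, time_cm = rp.centroid         # center of mass (scale, time)
        props.append({
            't': t_axis[int(round(time_cm))],   # time value of the fingerprint
            'area': rp.area,                    # count of on-pixels
            'eccentricity': rp.eccentricity,    # of the second-moment ellipse
            'orientation': rp.orientation,      # inclination of that ellipse
        })
    return sorted(props, key=lambda p: p['t'])
```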
6.4.3 Feature Selection
The DWFP algorithm results in a 2D array of fingerprint features for each waveform,
while only a 1D array of features can be used for classification. For each waveform
φi (t), the feature selection procedure finds M-many features xi, j , j ≤ M from the
2D array of wavelet fingerprint features Fi,ν (t). In this case, the DWFP features
selected were wavelet fingerprint features that occurred at the predicted mode arrival
times for all fingerprint features under study. At the frequency-thickness product
used, there are four modes available: S0, S1, A0, and A1. However, as Fig. 6.6
[Plot: a recorded waveform, amplitude versus samples, titled “Expected arrival times from tomographic scan at i1 = 45, i2 = 135”, with the predicted S0, S1, A0, and A1 arrival times marked.]
Fig. 6.6 A sample waveform for a single pair of transducer positions (i 1 , i 2 ) = (45, 135) is shown
here along with the predicted Lamb wave mode arrival times
shows, the S1 arrival time often occurs early in the signal, and there may not always
be a DWFP fingerprint feature available. In addition, because several different transducer frequencies were studied, two different feature selection schemes were used in
order to keep the number of features manageable.
1. All modes: All four mode arrival times for all 17 fingerprint properties from both
mother wavelets are used, but only one transducer frequency is studied from the
range {0.8, 0.84, 0.89} MHz. There are M = 136 features selected.
2. All frequencies: One or two mode arrival times for all 17 fingerprint properties
from both mother wavelets are used for all three frequencies at once. The modes
used include S0, A0, A1, in which case there are M = 102 features used. There
were also combinations of S0 & A0, S0 & A1, and A0 & A1 mode arrival times
used for all properties, frequencies, and mother wavelets, in which case there
were M = 204 features selected. (A sketch of this selection step follows the list.)
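The selection step reduces to sampling each interpolated property trace at the expected arrival times, as in this hedged Python sketch (array names are ours):

```python
import numpy as np

def select_features(F, t_grid, arrival_times):
    """Sample each interpolated fingerprint-property trace at the predicted
    mode arrival times. F has shape (n_properties, len(t_grid)) and holds
    the traces F_{i,nu}(t) for one waveform; stacking the result over both
    mother wavelets and the chosen frequencies gives M = 102 or 204."""
    x = [np.interp(t, t_grid, F[nu])            # property value at arrival time
         for nu in range(F.shape[0])
         for t in arrival_times]
    return np.asarray(x)                        # 1D feature vector x_i
```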
The class imbalance problem must be considered for this dataset [39]. The natural
class distribution for the full tomographic scan has a great majority of ray paths
intersecting with no flaws. In this case, only 3% of the data intersects with the hole
flaw, and 10.5% intersects with the gouge flaw. These are poor statistics from which to build a
classifier. Instead, ray paths were randomly selected from the no flaw cases to be
included for classification so that |ω1 |/|ω2 | = 2. In the resulting class distribution
used for classification, 9% of the ray paths intersect with the hole flaw and 30%
intersect with the gouge flaw, so that the number of ray paths used for classification is
reduced from N = 32,400 to N = 11,274 for all ray path distances. One advantage
of limiting the ω1 cases is that classification proceeds more rapidly. Randomly
selecting the ω1 cases to be used does not adversely affect the results, and later, the
ω1 ray paths not chosen for classification will be used to test the pipe flaw detection
algorithm.
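A minimal sketch of this random undersampling follows (the function name and fixed seed are ours, chosen only for repeatability):

```python
import numpy as np

def undersample_no_flaw(X, w, ratio=2, seed=0):
    """Keep a random subset of the omega_1 (no-flaw) ray paths so that
    |omega_1| / |omega_2| = ratio, returning the reduced dataset plus the
    indices of the held-out omega_1 paths used later to test the detector."""
    rng = np.random.default_rng(seed)
    idx1 = np.flatnonzero(w == 1)
    n_keep = ratio * np.count_nonzero(w == 2)
    keep1 = rng.choice(idx1, size=n_keep, replace=False)
    keep = np.sort(np.concatenate([keep1, np.flatnonzero(w != 1)]))
    held_out = np.setdiff1d(idx1, keep1)        # omega_1 paths for later testing
    return X[keep], w[keep], held_out
```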
6.4.4 Summary of Classification Variables
The list below provides a summary of the variables involved in the classification
design.
1. The same pipe was scanned twice, once when the hole was 3/8 in diameter, and
again when the hole was enlarged to 1/2 in diameter.
2. The pipe had two different flaws, a gouge flaw (ω2 ) and a hole flaw (ω3 ). The ray
paths that intersected no flaw (ω1 ) were also noted.
3. A range of transducer frequencies was used, including 0.8–0.89 MHz. For classification, only three of these frequencies were selected: 0.8 MHz, 0.84 MHz, and
0.89 MHz.
4. Classification was restricted by ray paths that had a maximum path length D such
that s ≤ D. There were 91 different path lengths for all transducer positions. Three
different values of D were selected to limit the ray paths selected for classification.
These correspond to the 10th, 20th, and 91st path length distances, or D10 = 31.21
cm, D20 = 31.53 cm, and D91 = 39.38 cm. The latter case considers all the ray
path distances.
5. For the feature extraction, two different wavelets were used (db3 and sym5), and
17 different wavelet fingerprint properties were extracted.
6. Two different feature selection schemes were attempted, varying either the modes
selected or the frequencies used.
7. One classifier was selected here (SVM). Other classifiers (such as quadratic discriminant) had lower accuracy.
8. Classification will be performed on training and testing datasets drawn from each
individual tomographic scan using the reduced class distribution. The resulting
flaw detection algorithm will be tested with the ω1 ray paths that were excluded
from the distribution used for classification.
9. The classification tests on the 1/2 in hole tomographic scan will use the tomographic scan of the 3/8 in hole solely for training the classifier and the 1/2 in
hole ray paths solely for testing the classifier. This does not substantially alter the
expression of the classification routine in Table 6.1.
6.4.5 Sampling
The SVM classifier was applied to each classifier configuration given by the options listed in
Sect. 6.4.4. However, SVM is a binary classifier, and three classes were considered in
this study. Therefore, the one-versus-one approach was used, in which pairs of classes
are compared at one time for classification, and the remaining class is ignored [40].
The process is repeated until all permutations of the available classes are considered.
In this case, classification compared ω1 versus ω2 , ω1 versus ω3 , and ω2 versus ω3 .
For each pair of binary classes, the training and testing sets were split via bagging by
randomly selecting roughly twice as many samples from the more highly populated
class as the less populated class for training the SVM classifier and splitting those
sets in half for training and testing. The process is repeated until each ray path
has been selected several times for training. The results are collapsed by majority
rule normalized by the number of samples drawn in the bagging process. Table 6.1
displays pseudocode representing the sampling algorithm that splits the data into
training and testing sets and the means by which the majority vote is decided.
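Since Table 6.1 is reproduced here only as a caption, the following Python sketch gives one plausible reading of that sampling scheme using scikit-learn’s SVC; the bag sizes, number of rounds, and kernel are assumptions, not the published settings.

```python
import numpy as np
from sklearn.svm import SVC

def one_vs_one_bagged(X, w, n_rounds=25, seed=0):
    """One-versus-one SVM with a bagging-like resampling: for each class
    pair, repeatedly draw a bag with roughly twice as many samples from
    the larger class, split it in half for training and testing, and
    accumulate votes; the final label lambda_i is the majority vote,
    normalized by how often each ray path was drawn."""
    rng = np.random.default_rng(seed)
    classes = np.unique(w)
    votes = np.zeros((len(w), len(classes)))
    draws = np.zeros(len(w))
    for ci, a in enumerate(classes):
        for b in classes[ci + 1:]:
            ia, ib = np.flatnonzero(w == a), np.flatnonzero(w == b)
            small, big = (ia, ib) if len(ia) < len(ib) else (ib, ia)
            for _ in range(n_rounds):
                n_big = min(2 * len(small), len(big))
                bag = np.concatenate([small, rng.choice(big, n_big, False)])
                rng.shuffle(bag)
                train, test = bag[: len(bag) // 2], bag[len(bag) // 2:]
                pred = SVC(kernel='rbf').fit(X[train], w[train]).predict(X[test])
                for i, p in zip(test, pred):
                    votes[i, np.searchsorted(classes, p)] += 1
                    draws[i] += 1
    lam = classes[np.argmax(votes, axis=1)]     # predicted labels lambda_i
    return lam, votes / np.maximum(draws, 1)[:, None]
```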
6.5 Decision
As Table 6.1 shows, a single configuration of the classifier variables C (described in
Sect. 6.4.4) takes an array of feature vectors xi , i = 1, . . . , N and their corresponding
class labels wi ∈ {ω1 , ω2 , ω3 } and produces an array of predicted class labels λi after
Table 6.1 The sampling algorithm used to split the data into training and testing sets for SVM
classification is described here. It is similar to bagging
dimensionality reduction. Each index i corresponds to a single ray path between the
two transducer indices. The classifier performance can be evaluated by measuring
its accuracy, which is defined as
A(ωk) = |(wi = ωk) & (λi = ωk)| / |(wi = ωk)|,   i = 1, . . . , N; k = 1, 2, 3.   (6.5)
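In code, Eq. (6.5) is essentially a one-liner (a sketch, with our own function name):

```python
import numpy as np

def class_accuracy(w, lam, k):
    """Per-class accuracy A(omega_k) of Eq. (6.5): the fraction of ray
    paths whose true label is k that also received predicted label k."""
    mask = (w == k)
    return np.mean(lam[mask] == k) if mask.any() else np.nan
```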
However, a further step is required in order to determine whether or not there are in
fact any flaws present in the pipe scanned by Lamb wave tomography. The predicted
labels of the ray paths are not sufficient to decide whether or not there are flaws.
Therefore, we will use the ray path drawing algorithm described in Sect. 6.3.2 to
superimpose the ray paths that receive predicted labels λi in each class ω1 , ω2 , ω3 . If
several ray paths intersect on a pixel, their value is added, so the higher the value of
the pixel, the more ray path intersections occur at that point. This technique averages
out the misclassifications that occur in the predicted class labels. The ray paths for
Fig. 6.7 The ray paths that have received a predicted class label of (a) no flaw, (b) gouge flaw, and
(c) hole flaw are drawn here. The classifier configured here used all the available ray path distances
(D = D91 )
Fig. 6.8 The ray paths that have received a predicted class label of (a) no flaw, (b) gouge flaw, and
(c) hole flaw are drawn here. Unlike Fig. 6.7, the ray path lengths limit used was D20
the predicted class labels associated with each flaw type are drawn separately, so that
the more ray paths that have been predicted to be associated with a particular flaw
intersect in the same region, the more likely a flaw exists at that point.
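A hedged sketch of this superposition step follows; dense sampling along each ray stands in for a true line-drawing routine, and the endpoint bookkeeping is simplified.

```python
import numpy as np

def ray_intersection_image(rays, lam, k, shape, L, Y):
    """Superimpose the ray paths whose predicted label lambda_i equals k
    on a pixel grid of the unrolled pipe (rows: circumference, cols:
    length). Each ray increments every pixel it crosses once, so bright
    pixels mark regions where many predicted-flaw rays intersect.
    `rays` holds (y1, y2) endpoints with y2 already shifted by +-Y when
    the shortest path wraps, as in Fig. 6.3b. Illustrative only."""
    ny, nx = shape
    img = np.zeros(shape)
    for (y1, y2), pred in zip(rays, lam):
        if pred != k:
            continue
        xs = np.linspace(0.0, L, 4 * max(nx, ny))       # dense samples
        ys = np.mod(np.linspace(y1, y2, xs.size), Y)    # wrap circumference
        rows = (ys / Y * ny).astype(int) % ny
        cols = np.minimum((xs / L * nx).astype(int), nx - 1)
        img[rows, cols] += 1   # fancy indexing counts each pixel once per ray
    return img
```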
Figures 6.7 and 6.8 show the ray paths drawn for each predicted class label. Both of
these examples were configured for the 3/8 in hole and selected the A0 mode from all
three transducer frequencies. However, Fig. 6.7 classified all ray paths regardless of
length, and Fig. 6.8 restricted the ray path distances to D20 . The ray path intersections
were originally drawn at 10× resolution but the images were later smoothed by
averaging over adjacent cells to 1× resolution. The larger the pixel value, the more
ray paths intersected in that region, and the more likely a flaw exists at that point. Note
that in Fig. 6.8b, c, the pixel values are higher at the locations where their respective
flaws actually exist. Figure 6.8a also shows more intersections at those locations:
because of the geometry of the pipe and the way the transducers scanned, those regions of
the pipe actually did have more ray path intersections than elsewhere. That’s why
those regions were selected to introduce flaws into the pipe. But the largest pixel
value among the ray paths predicted to intersect no flaws is smaller than among the ray paths
predicted to intersect either of the other flaws. Also, Fig. 6.8a shows a higher average
pixel value, showing that more intersections occur throughout the pipe rather than
being focused on a smaller region, as in Fig. 6.8b, c. However, due to the scattering of
the Lamb waves, Fig. 6.7 is not as precise. Figure 6.7a does show a higher average
pixel value, but Fig. 6.7b, c seem to both indicate hole flaws in the middle of the pipe,
and neither seems to indicate the correct location of the gouge flaw.
In order to automate the flaw detection process from these ray intersection plots,
image recognition routines and thresholds were applied. The process of automatically
detecting flaws in the image of ray path intersections (U) includes:

1. Apply a threshold hI to the pixel values of the image. If |U > hI| = ∅, then no
flaws are detected. Otherwise, define U = U(U > hI).
2. Check that the nonzero elements of U cover less than half of its total area,

Σ_{i1} Σ_{i2} U(i1, i2) < (1/2) |U|.

3. Apply a threshold ha to the area of U. If Σ_{i1} Σ_{i2} U(i1, i2) ≤ ha, then no flaws
are detected. Otherwise, decide that U accurately represents the flaws in the
region. Return an image of U.
This algorithm is only intended to be performed on the ray paths that predicted a
flaw location. It does not tend to work well on the ray path intersections that predicted
no flaw, since those depend on the geometry of the object being scanned.
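The numbered steps translate directly into a short routine; this Python sketch mirrors the thresholds above (with |U| read as the pixel count of U), and is not the published image-recognition code.

```python
import numpy as np

def detect_flaws(U, h_I, h_a):
    """Threshold test of the three-step algorithm above on a ray path
    intersection image U. Returns the thresholded image if a flaw is
    deemed present, otherwise None."""
    mask = U > h_I
    if not mask.any():                 # step 1: nothing above the threshold
        return None
    U = np.where(mask, U, 0)
    if U.sum() >= U.size / 2:          # step 2: must cover < half the area
        return None
    if U.sum() <= h_a:                 # step 3: total area must exceed h_a
        return None
    return U                           # image of the detected flaw region
```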
Figure 6.9 shows predicted flaw locations relative to the sample ray path intersection images given in Figs. 6.7 and 6.8. Specifically, Fig. 6.9a, b gives the predicted
flaw locations from Fig. 6.7b, c, while Fig. 6.9c, d gives the predicted flaw locations
from Fig. 6.8b, c. Note that the classifier designed to accept all ray path distances
(Fig. 6.9a, b) produces a flaw location that is more discrete and closer to the size of
the actual flaw, but it is not as accurate at predicting whether or not flaws exist as
the classifier designed to accept only ray paths restricted by path length (Fig. 6.9c, d).
[Four panels of flaw-detector output on the unrolled pipe, length of pipe (cm) versus circumference of pipe (cm): a) 3 flaws detected; b) 1 flaw detected; c) 1 flaw detected; d) 1 flaw detected.]
Fig. 6.9 The results of the automatic pipe flaw detector routine on the ray path intersection images
in Figs. 6.7 and 6.8 that predicted flaws are shown here. The approximate location of the flaw is
shown in the image. Here, subplots (a) and (b) show the predicted flaw locations for the gouge and
hole flaws, respectively, when all distances were used in classification. Similarly, subplots (c) and
(d) show the predicted flaw locations for the gouge and hole flaws when the ray path distance was
limited to D20
6.6 Results and Discussion
6.6.1 Accuracy
The classifier variables described in Sect. 6.4.4 were explored using the classification
routine sketched out in Table 6.1, with predicted labels collapsed by majority voting. The
accuracy of the classifier will be examined before the predictions of the pipe flaw
detector are given. Each table of the accuracy results shows the classifier configuration, including two different tomographic scans of the aluminum pipe with different
hole flaw diameters, as well as the frequencies and modes selected for classification.
The classifiers were also limited by the maximum distance between the transducers.
For ease of comparison, the average accuracy over each class type ωk is computed
in the last column.
Table 6.2 shows the accuracy A(ωk ) for both tomographic scans of the pipe when
only a single transducer frequency was used at a time, and therefore all the available Lamb wave modes were selected. The table is grouped by different maximum
distance lengths used in the classifier. Meanwhile, Tables 6.3, 6.4 and 6.5 show the
classifier accuracy for classifiers in which features were selected for only certain
modes but for three different values of the transducer frequency at once.
These classification results show that no classifier configuration had more than
80% average accuracy per class. Many classifiers scored lower than 50% average
accuracy. However, the detection routine does not require high accuracy classifiers,
and as already mentioned, the diffraction effect of Lamb waves around the hole
flaw could certainly explain the lower accuracy of the classifiers for that hole type.
Other patterns that emerge in the data show that the smaller values of the maximum
path length distance yield higher average accuracy by as much as 20%. In addition,
the feature selection strategy that utilizes only select modes for all three transducer
frequencies seems to work slightly better than the feature selection routine that selects
all modes for only one transducer frequency. Lastly, the tomographic scan of the
aluminum pipe with the 3/8 in hole seems to have higher accuracy than the scan of
the pipe with the 1/2 in hole. These results make sense given the fact that the 3/8 in
hole was trained and tested on unique subsets of the ray paths drawn from the 3/8 in
hole tomographic scan, which yields somewhat optimistic results. Meanwhile, the
results for the 1/2 in hole scan were tested with ray paths drawn from the 1/2 in hole
scan but trained with ray paths selected from the 3/8 in hole scan.
6.6.2 Flaw Detection Algorithm
As mentioned in Sect. 6.5, a more promising way of determining whether or not flaws
exist in the pipe is to use the ray path simulation routine described in Sect. 6.3.2.
Because the result of the pipe flaw detection routine is an image (Fig. 6.9), and
an exhaustive depiction of the resulting flaw prediction images would take up an
unnecessary amount of space, the results will be presented in qualitative format. The
pipe flaw detection routine was tested on all of the classifier configurations and a judgment
was made about which flaw the routine was predicting.
Figure 6.10 shows some of the different ways that the flaw detector routine can
present results. Figure 6.10a shows a correctly identified flaw location for the gouge
flaw, despite the fact that the flaw detector claims the flaws drawn at the bottom of the
plot to be separate. That’s because the pipe in reality wraps the 2D plot into a cylinder,
so the regions identified as flaws at the bottom of the plot actually can be associated
with the correct location of the gouge flaw at the top of the plot. However, Fig. 6.10b
shows an incorrect assignment of a gouge flaw in the center of the plot, where
the hole flaw actually resides. Similarly, Fig. 6.10c shows a correct identification of
the hole flaw, despite the fact that the two identified flaws in the middle of the plot
do not seem connected. This is an artifact of the drawing algorithm. But Fig. 6.10d
Table 6.2 The classifier accuracy when the feature selection utilized only one transducer frequency
and all modes is shown below. The results are grouped by the maximum ray path distance selected.
The average accuracy per class is shown in the last column

Hole Dia. [in] | Frequency used (MHz) | Modes used | D | A(ω1) (%) | A(ω2) (%) | A(ω3) (%) | Average (%)
3/8 | 0.80 | S0, A0, A1 | D10 | 77.7 | 62.4 | 72.4 | 70.8
3/8 | 0.84 | S0, A0, A1 | D10 | 78.5 | 67.3 | 42.9 | 62.9
3/8 | 0.89 | S0, A0, A1 | D10 | 70.8 | 77.6 | 69.5 | 72.6
1/2 | 0.80 | S0, A0, A1 | D10 | 78.5 | 43.2 | 22.8 | 48.2
1/2 | 0.84 | S0, A0, A1 | D10 | 84.0 | 10.9 | 30.9 | 41.9
1/2 | 0.89 | S0, A0, A1 | D10 | 52.3 | 62.4 | 39.8 | 51.5
3/8 | 0.80 | S0, A0, A1 | D20 | 70.5 | 50.6 | 62.8 | 61.3
3/8 | 0.84 | S0, A0, A1 | D20 | 73.9 | 59.9 | 41.9 | 58.5
3/8 | 0.89 | S0, A0, A1 | D20 | 68.3 | 70.0 | 58.1 | 65.5
1/2 | 0.80 | S0, A0, A1 | D20 | 75.8 | 47.7 | 23.3 | 48.9
1/2 | 0.84 | S0, A0, A1 | D20 | 93.0 | 21.2 | 14.6 | 42.9
1/2 | 0.89 | S0, A0, A1 | D20 | 52.6 | 64.9 | 34.4 | 50.6
3/8 | 0.80 | S0, A0, A1 | D91 | 93.8 | 14.1 | 27.4 | 45.1
3/8 | 0.84 | S0, A0, A1 | D91 | 93.8 | 14.6 | 19.8 | 42.7
3/8 | 0.89 | S0, A0, A1 | D91 | 91.7 | 17.0 | 28.2 | 45.6
1/2 | 0.80 | S0, A0, A1 | D91 | 73.7 | 42.7 | 16.1 | 44.1
1/2 | 0.84 | S0, A0, A1 | D91 | 82.9 | 35.7 | 7.6 | 42.0
1/2 | 0.89 | S0, A0, A1 | D91 | 41.9 | 61.1 | 15.4 | 39.5
shows a false-positive region at the top of the plot, predicted to be a hole
flaw where the gouge flaw actually exists. This is a failure of specificity rather than
sensitivity.
Tables 6.6, 6.7 and 6.8 show the performance of the pipe flaw detector routine. All
of the different frequencies, modes, and hole diameters are displayed in each table,
but Table 6.6 shows the results for a maximum ray path length of D = D10 used in
the classification, and likewise Tables 6.7 and 6.8 use a maximum ray path distance of
D20 and D91 , respectively. The latter case considers all possible ray path lengths. The
results are grouped in this way because the threshold values of h I and h a had to be
adjusted for each value of D. Obviously, the smaller the value of D, the fewer ray
paths included in the classification and the smaller the intersection between
the predicted ray paths. The last two columns show the pipe flaw detector routine
applied to ray paths with a predicted class of either gouge flaw (ω2 ) or hole flaw
(ω3 ), respectively. The type of flaw predicted by the detection routine is displayed
under those columns, including the possibility of no flaws displayed (ω1) or both the
gouge flaw and the hole flaw displayed (ω2, ω3). These judgments are made
according to the guidelines for false positives and true positives shown in Fig. 6.10.
Table 6.3 The classifier accuracy when the feature selection used only one or two modes at once
but all transducer frequencies is shown here. The maximum ray path distance used here was D10

Hole Dia. [in] | Frequencies used (MHz) | Modes used | D | A(ω1) (%) | A(ω2) (%) | A(ω3) (%) | Average (%)
3/8 | 0.80, 0.84, 0.89 | S0 | D10 | 71.5 | 65.0 | 61.9 | 66.1
3/8 | 0.80, 0.84, 0.89 | A0 | D10 | 84.2 | 72.6 | 64.8 | 73.8
3/8 | 0.80, 0.84, 0.89 | A1 | D10 | 75.7 | 68.0 | 52.4 | 65.4
3/8 | 0.80, 0.84, 0.89 | S0, A0 | D10 | 87.5 | 79.2 | 73.3 | 80.0
3/8 | 0.80, 0.84, 0.89 | S0, A1 | D10 | 81.0 | 74.9 | 74.3 | 76.7
3/8 | 0.80, 0.84, 0.89 | A0, A1 | D10 | 85.3 | 80.9 | 72.4 | 79.5
1/2 | 0.80, 0.84, 0.89 | S0 | D10 | 82.8 | 34.0 | 16.3 | 44.4
1/2 | 0.80, 0.84, 0.89 | A0 | D10 | 67.5 | 46.9 | 34.1 | 49.5
1/2 | 0.80, 0.84, 0.89 | A1 | D10 | 67.7 | 34.7 | 40.7 | 47.7
1/2 | 0.80, 0.84, 0.89 | S0, A0 | D10 | 82.3 | 40.6 | 29.3 | 50.7
1/2 | 0.80, 0.84, 0.89 | S0, A1 | D10 | 81.8 | 38.6 | 23.6 | 48.0
1/2 | 0.80, 0.84, 0.89 | A0, A1 | D10 | 74.6 | 49.8 | 39.8 | 54.8
Table 6.4 The classifier accuracy when the feature selection used only one or two modes at once
but all transducer frequencies is shown here. The maximum ray path distance used here was D20

Hole Dia. [in] | Frequencies used (MHz) | Modes used | D | A(ω1) (%) | A(ω2) (%) | A(ω3) (%) | Average (%)
3/8 | 0.80, 0.84, 0.89 | S0 | D20 | 69.4 | 57.8 | 60.5 | 63.6
3/8 | 0.80, 0.84, 0.89 | A0 | D20 | 76.5 | 64.6 | 54.9 | 70.5
3/8 | 0.80, 0.84, 0.89 | A1 | D20 | 74.2 | 64.1 | 54.9 | 69.2
3/8 | 0.80, 0.84, 0.89 | S0, A0 | D20 | 81.8 | 66.5 | 66.0 | 74.2
3/8 | 0.80, 0.84, 0.89 | S0, A1 | D20 | 79.9 | 70.3 | 64.7 | 75.1
3/8 | 0.80, 0.84, 0.89 | A0, A1 | D20 | 81.1 | 71.7 | 60.9 | 76.4
1/2 | 0.80, 0.84, 0.89 | S0 | D20 | 82.6 | 33.2 | 21.3 | 57.9
1/2 | 0.80, 0.84, 0.89 | A0 | D20 | 65.4 | 51.0 | 31.6 | 58.2
1/2 | 0.80, 0.84, 0.89 | A1 | D20 | 67.8 | 46.0 | 30.4 | 56.9
1/2 | 0.80, 0.84, 0.89 | S0, A0 | D20 | 80.1 | 44.5 | 26.9 | 62.3
1/2 | 0.80, 0.84, 0.89 | S0, A1 | D20 | 79.5 | 43.4 | 25.7 | 61.5
1/2 | 0.80, 0.84, 0.89 | A0, A1 | D20 | 71.0 | 56.2 | 30.0 | 63.6
These results show that Table 6.6, in which D = D10 = 31.21 cm is the maximum
permitted path length used for classification, has the highest number of true positives
and the fewest false negatives of all the configurations used. Table 6.8
shows the worst discriminative ability when all the possible path lengths (D = D91 )
were applied. However, only one of these classifier configurations needs to perform
Table 6.5 The classifier accuracy when the feature selection used only one or two modes at once but
all transducer frequencies is shown here. All ray paths were used in this configuration (D = D91)

Hole Dia. [in] | Frequencies used (MHz) | Modes used | D | A(ω1) (%) | A(ω2) (%) | A(ω3) (%) | Average (%)
3/8 | 0.80, 0.84, 0.89 | S0 | D91 | 92.6 | 14.3 | 24.1 | 43.7
3/8 | 0.80, 0.84, 0.89 | A0 | D91 | 93.1 | 14.6 | 25.9 | 44.5
3/8 | 0.80, 0.84, 0.89 | A1 | D91 | 92.6 | 15.2 | 29.7 | 45.8
3/8 | 0.80, 0.84, 0.89 | S0, A0 | D91 | 93.8 | 16.1 | 32.1 | 47.3
3/8 | 0.80, 0.84, 0.89 | S0, A1 | D91 | 93.2 | 16.7 | 31.7 | 47.2
3/8 | 0.80, 0.84, 0.89 | A0, A1 | D91 | 93.5 | 16.9 | 32.7 | 47.7
1/2 | 0.80, 0.84, 0.89 | S0 | D91 | 67.4 | 40.8 | 17.0 | 41.7
1/2 | 0.80, 0.84, 0.89 | A0 | D91 | 59.6 | 44.8 | 20.9 | 41.8
1/2 | 0.80, 0.84, 0.89 | A1 | D91 | 57.8 | 48.8 | 19.9 | 42.2
1/2 | 0.80, 0.84, 0.89 | S0, A0 | D91 | 64.1 | 46.0 | 23.5 | 44.6
1/2 | 0.80, 0.84, 0.89 | S0, A1 | D91 | 64.6 | 48.5 | 19.6 | 44.2
1/2 | 0.80, 0.84, 0.89 | A0, A1 | D91 | 59.4 | 53.0 | 22.5 | 44.9
Table 6.6 The predicted flaw types using the pipe flaw detection routine are shown here. The
columns show the flaw types predicted from the ray path intersection images, as in Figs. 6.9 and
6.10. The rows show the different variables used to configure the classifier. All of the classifiers
shown here used D = D10, hI = 1, ha = 10

Hole Dia. [in] | Frequencies used (MHz) | Modes used | ω2 tested | ω3 tested
3/8 | 0.8 | S0, A0, A1 | ω2 | ω3
3/8 | 0.84 | S0, A0, A1 | ω2 | ω3
3/8 | 0.89 | S0, A0, A1 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | S0 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | A0 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | A1 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | S0, A0 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | S0, A1 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | A0, A1 | ω2 | ω3
1/2 | 0.8 | S0, A0, A1 | ω2 | ω1
1/2 | 0.84 | S0, A0, A1 | ω1 | ω3
1/2 | 0.89 | S0, A0, A1 | ω2, ω3 | ω3
1/2 | 0.8, 0.84, 0.89 | S0 | ω2 | ω1
1/2 | 0.8, 0.84, 0.89 | A0 | ω2, ω3 | ω3
1/2 | 0.8, 0.84, 0.89 | A1 | ω2 | ω3
1/2 | 0.8, 0.84, 0.89 | S0, A0 | ω2 | ω1
1/2 | 0.8, 0.84, 0.89 | S0, A1 | ω2 | ω1
1/2 | 0.8, 0.84, 0.89 | A0, A1 | ω2, ω3 | ω3
Table 6.7 Similarly to Table 6.6, the qualitative performance of the pipe flaw detector routine is
shown here using D = D20, hI = 2, ha = 5

Hole Dia. [in] | Frequencies used (MHz) | Modes used | ω2 tested | ω3 tested
3/8 | 0.8 | S0, A0, A1 | ω2 | ω3
3/8 | 0.84 | S0, A0, A1 | ω2 | ω3
3/8 | 0.89 | S0, A0, A1 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | S0 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | A0 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | A1 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | S0, A0 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | S0, A1 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | A0, A1 | ω2 | ω3
1/2 | 0.8 | S0, A0, A1 | ω2 | ω3
1/2 | 0.84 | S0, A0, A1 | ω2 | ω1
1/2 | 0.89 | S0, A0, A1 | ω2, ω3 | ω1
1/2 | 0.8, 0.84, 0.89 | S0 | ω2 | ω1
1/2 | 0.8, 0.84, 0.89 | A0 | ω2, ω3 | ω3
1/2 | 0.8, 0.84, 0.89 | A1 | ω2 | ω1
1/2 | 0.8, 0.84, 0.89 | S0, A0 | ω2 | ω3
1/2 | 0.8, 0.84, 0.89 | S0, A1 | ω2 | ω1
1/2 | 0.8, 0.84, 0.89 | A0, A1 | ω2 | ω1
well in order to select a classifier to be applied in further applications. It is not
necessary (or possible) that all combinations of the classifier variables should perform
well. Clearly it is possible to find at least one classifier configuration that accurately
discriminates between the types of flaws.
In addition, the results for the second tomographic scan of the pipe, when the
hole diameter was increased to 1/2 in, tend to have more false positives and false
negatives than the original scan of the pipe at a hole diameter of 3/8 in. As described
above, the classification for the 1/2 in hole pipe was performed using the 3/8 in hole
pipe as a training set, so the accuracy of the classifier tends to be lower than that
of the 3/8 in pipe when mutually exclusive sets of ray paths drawn from the same
tomographic scan were used for training and testing. Since the same threshold values
h I , h a were used for all the classifier variables that used the same maximum distance
D, it might be possible to adjust these threshold values in the future to optimize for
a training set based on a wider variety of data and a testing set drawn from a new
tomographic scan.
Table 6.8 Similarly to Tables 6.6 and 6.7, the qualitative performance of the pipe flaw detector
routine is shown here using D = D91 (so all ray paths were used), hI = 3, ha = 20

Hole Dia. [in] | Frequencies used (MHz) | Modes used | ω2 tested | ω3 tested
3/8 | 0.8 | S0, A0, A1 | ω2 | ω3
3/8 | 0.84 | S0, A0, A1 | ω2, ω3 | ω3
3/8 | 0.89 | S0, A0, A1 | ω2, ω3 | ω3
3/8 | 0.8, 0.84, 0.89 | S0 | ω2, ω3 | ω3
3/8 | 0.8, 0.84, 0.89 | A0 | ω2, ω3 | ω3
3/8 | 0.8, 0.84, 0.89 | A1 | ω2, ω3 | ω3
3/8 | 0.8, 0.84, 0.89 | S0, A0 | ω2, ω3 | ω3
3/8 | 0.8, 0.84, 0.89 | S0, A1 | ω2, ω3 | ω3
3/8 | 0.8, 0.84, 0.89 | A0, A1 | ω2, ω3 | ω3
1/2 | 0.8 | S0, A0, A1 | ω2 | ω2, ω3
1/2 | 0.84 | S0, A0, A1 | ω2, ω3 | ω1
1/2 | 0.89 | S0, A0, A1 | ω2, ω3 | ω2, ω3
1/2 | 0.8, 0.84, 0.89 | S0 | ω2, ω3 | ω2, ω3
1/2 | 0.8, 0.84, 0.89 | A0 | ω2, ω3 | ω2, ω3
1/2 | 0.8, 0.84, 0.89 | A1 | ω2, ω3 | ω2, ω3
1/2 | 0.8, 0.84, 0.89 | S0, A0 | ω2, ω3 | ω2, ω3
1/2 | 0.8, 0.84, 0.89 | S0, A1 | ω2, ω3 | ω2, ω3
1/2 | 0.8, 0.84, 0.89 | A0, A1 | ω2, ω3 | ω2, ω3
Lastly, recall that the distribution of ray paths drawn from the three classes
ω1 , ω2 , ω3 was adjusted so that a smaller number of ω1 cases were randomly selected
to be used in classification. The ω1 ray paths not chosen for classification would later
be used to test the pipe flaw detection algorithm. The training set used for classification was the same as before but it was tested using the ω1 ray paths originally
excluded from classification in the results above. As expected, some of these ray
paths were falsely identified by the classifier as ω2 or ω3 . The pipe flaw detection
algorithm was performed on these predicted classes to see if the intersection of these
misclassified ray paths would yield a falsely identified flaw. In fact, all of the
classifier configurations described in Sect. 6.4.4 concluded with an assignment of ω1
(no flaw identified).
Fig. 6.10 The qualitative evaluation of different flaw images produced by the flaw detector routine
for the (a)–(b) gouge and (c)–(d) hole flaws. False positives are shown in subplots (b) and (d),
while subplots (a) and (c) show true positives
6.7 Conclusion
These results demonstrate classifiers that were able to distinguish between gouge
flaws, hole flaws, and no flaw present in an aluminum pipe. The type of image
produced by the flaw detection routine is similar to the result of the Lamb wave
tomographic scan itself, which is a 2D color plot of the changes in Lamb wave
velocity over the surface of the pipe. However, because it utilizes pattern classification and image recognition techniques, the method described here can be more
specific, identifying the flaw type, and it may be able to work on smaller flaws. The
higher accuracy rate of the classifier on limited ray path distances demonstrates that
pattern classification aids the successful detection of flaws in geometries where full
scans cannot be completed due to limited access. The method does require building
a training dataset of different types of flaws, and it can take long computation times
to classify and test an entire scan. It may be best used on isolated locations where
a flaw is suspected rather than in an exhaustive search over a large area. In order
to be most accurate, the training dataset should include flaws from different individual tomographic scans but of a structure similar to the intended application of
the intelligent flaw detector. The application of pattern classification techniques to
Lamb wave tomographic scans may be able to improve the detection capability of
structural health monitoring.
Acknowledgements The authors would like to thank Dr. Jill Bingham for sharing the data used
in this paper and Dr. Corey Miller for assistance with the tomographic reconstructions. Johnathan
Stevens constructed the pipe scanner apparatus.
Chapter 7
Classification of RFID Tags with Wavelet
Fingerprinting
Corey A. Miller and Mark K. Hinders
Abstract Passive radio frequency identification (RFID) tags lack the resources for
standard cryptography and are therefore straightforward to clone. Identifying RF signatures
that are unique to an emitter’s signal is known as physical-layer identification, a
technique that allows for distinction between cloned devices. In this work, we study
the effect real-world environmental variations have on the physical-layer fingerprints of passive RFID tags. Signals are collected for a variety of reader frequencies,
tag orientations, and ambient conditions, and pattern classification techniques are
applied to automatically identify these unique RF signatures. We show that identically programmed RFID tags can be distinguished using features generated from
dynamic wavelet fingerprint (DWFP) representations of the raw RF signals.
Keywords RFID tag · Wavelet fingerprint · Pattern classification
7.1 Introduction
Radio frequency identification (RFID) tags are widespread throughout the modern
world, commonly used in retail, aviation, health care, and logistics [1]. As the price
of RFID technology decreases with advancements in manufacturing techniques [2],
new implementations of RFID technology will continue to appear. The embedding of
RFID technology into currency, for example, is being developed overseas to potentially cut down on counterfeiting [3]. Naturally, the security of these RF devices has
become a primary concern. Using techniques that range in complexity from simple
eavesdropping to reverse engineering [4], researchers have shown authentication vulnerabilities in a wide range of current RFID applications for personal identification
and security purposes, with successful cloning attacks made on proximity cards [5],
credit cards [6], and even electronic passports [7].
Basic passive RFID tags, which lack the resources to perform most forms of cryptographic security measures, are especially susceptible to privacy and authentication
attacks because they have no explicit counterfeiting protections built in. These low-cost tags are the type found in most retail applications, where the low price favors the large quantity of tags required. One application of passive RFID tags is as a replacement for barcodes. Instead of relaying a sequence of numbers identifying only
the type of object a barcode is attached to, RFID tags use an Electronic Product Code
(EPC) containing not only information about the type of object, but also a unique
serial number used to individually distinguish the object. RFID tags also eliminate
the need for line-of-sight scanning that barcodes have, avoiding scanning orientation
requirements. In a retail setting, these RFID tags are being explored for point-of-sale
terminals capable of scanning all items in a passing shopping cart simultaneously [8].
Without security measures, however, it is straightforward to surreptitiously obtain
the memory content of these basic RFID tags and reproduce a cloned signal [9].
An emerging subset of RFID short-range wireless communication technology is
near-field communication (NFC), operating within the high-frequency RFID band at
13.56 MHz. Compatible with already existing RFID infrastructures, NFC involves
an initiator that generates an RF field and a passive target, although interactions
between two powered devices are possible. The smartphone industry is one of the
leading areas for NFC research, as many manufacturers have begun building NFC technology into their products. With applications enabling users to pay for items such as groceries and subway tickets by waving their phone in front of a machine, NFC
payment systems are an attractive alternative to the multitude of credit cards available
today [10]. Similarly, NFC-equipped mobile phones are being explored for use as
boarding passes, where the passenger can swipe their handset like a card, even when
its batteries are dead [11].
A variety of approaches exist to solve this problem of RFID signal authentication,
in which an RFID system identifies an RFID tag as being legitimate as opposed to a
fraudulent copy. One such method involves the introduction of alternate tag-reader
protocols, including the installation of a random number generator in the reader and
tag [12], a physical proximity measure when scanning multiple tags simultaneously
[13], or re-purposing the kill PIN in an RFID tag, which normally authorizes the
deactivation of the tag [14].
Rather than changing the current tag-reader protocols, we approach this issue of
RFID tag authentication by applying a wavelet-based RF fingerprinting technique,
utilizing the physical layer of RF communication. The goal is to identify unique signatures in the RF signal that provide hardware-specific information. First pioneered
to identify cellular phones by their transmission characteristics [15], RF fingerprinting has been recently explored for wireless networking devices [16], wired Ethernet
cards [17], universal software radio peripherals (USRP) [18], and RFID devices
[19–22].
Our work builds on that of Bertoncini et al. [20], in which a classification routine was developed using a novel wavelet-based feature set to identify 150 RFID
tags collected with fixed tag orientation and distance relative to the reader with RF
shielding. That dataset, however, was collected in an artificially protected environment and did not include physical proximity variations relative to the reader, one
7 Classification of RFID Tags with Wavelet Fingerprinting
209
of the most commonly exploited benefits of RFID technology over existing barcodes. The resulting classifier performance therefore cannot be expected to translate to real-world situations. Our goal is to collect signals from a set of 40 RFID tags with identical Electronic Product Codes (EPCs) at a variety of orientations and RFID reader frequencies, as well as over several days, to test the robustness of the classifier. The
effects of tag damage in the form of water submersion and physical crumpling are
also briefly explored. Unlike Bertoncini et al., we use a low-cost USRP to record the
RF signals in an unshielded RF environment, resulting in more realistic conditions
and SNR values than previously examined.
7.2 Classification Overview
The application of pattern classification for individual RFID tag identification begins
with data collection, where each individual RFID tag is read and the EPC regions
are windowed and extracted from all of the tag-reader events. A feature space is
formed by collecting a variety of measurements from each of the EPC regions.
Feature selection then reduces this feature space to a smaller, more informative subset, removing
irrelevant features. Once a dataset has been finalized, it is then split into training and
testing sets via a resampling algorithm, and the classifier is trained on the training
set and tested on the testing set. The classifier output is used to predict a finalized
class label for the testing set, and the classifier’s performance can be evaluated.
Each tag is given an individual class label; however, we are only interested in
whether or not a new signal corresponds to an EPC from the specific tag of interest.
The goal for this application is to identify false, cloned signals trying to emulate the
original tag. We therefore implement a binary one-against-one classification routine,
where we consider one individual tag at a time (the classifier tag), and all other tags
(the testing tags) are tested against it one at a time. This assigns one of two labels to
a testing signal, either ω = 1 declaring that the signal corresponds to an EPC from
the classifier tag, or ω = −1 indicating that the signal does not correspond to an EPC
from the classifier tag.
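To make the scheme concrete, the following MATLAB sketch carries out this binary labeling for one (classifier tag, testing tag) pair. All variable names are hypothetical, the holdout resampling is omitted, and fitcdiscr (Statistics and Machine Learning Toolbox) stands in for the various discriminants compared later in this chapter.

    % Minimal sketch of the binary one-against-one scheme described above.
    % X is an (nEPC x M) matrix of feature vectors, tagID holds the tag
    % index of each EPC, c is the classifier tag, and t is the testing tag.
    isC = (tagID == c);                                % EPCs from the classifier tag
    isT = (tagID == t);                                % EPCs from the testing tag
    Xtrain = [X(isC, :); X(isT, :)];                   % training set
    wTrain = [ones(nnz(isC), 1); -ones(nnz(isT), 1)];  % known labels, +/-1
    mdl  = fitcdiscr(Xtrain, wTrain);                  % linear normal-density classifier
    yHat = predict(mdl, X(isT, :));                    % predicted labels for the testing tag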
7.3 Materials and Methods
This study used Avery-Dennison AD-612 RFID tags, which follow the EPCglobal UHF Class 1 Generation 2 (EPCGen2) standards [23]. There were 40 individual RFID tags available, labeled AD01, AD02, . . . , AD40. The experimental
procedure involves writing the same EPC code onto each tag with a Thing Magic
Mercury 5e RFID Reader1 paired with an omni-directional antenna (Laird Technologies2 ). Stand-alone RFID readers sold today perform all of the signal amplification,
modulation/demodulation, mixing, etc. in special-purpose hardware. While this is
1 Cambridge, MA (http://www.thingmagic.com).
2 St. Louis, MO (http://www.lairdtech.com).
Fig. 7.1 Experimental setup for RFID data collection is shown, with the RFID reader, tag, antenna,
connection to the VSA, and USRP2 software-defined radio
beneficial for standard RFID use where only the demodulated EPC is of interest, it
is inadequate for our research because we seek to extract the raw EPC RF signal.
Preliminary work [20] collected raw RF signals through a vector signal analyzer, a laboratory-grade instrument often used in the design and testing of electronic devices, recording 327.50 ms of data at a 3.2 MHz sampling frequency. While the
vector signal analyzer proved useful for data collection in the preliminary work, it
is not a practical tool that could be implemented in real-world applications. We thus
explore the use of an alternate RF signal recording device, a software-defined radio
(SDR) system. Software-defined radios are beneficial over standard RFID units as
they contain their own A/D converters and the majority of their signal processing is
software controlled, allowing them to transmit and receive a wide variety of radio
protocols based solely on the software used. The SDR system used here is from the
Universal Software Radio Peripheral (USRP) family of products developed by Ettus
Research LLC,3 specifically the USRP2, paired with a GnuRadio [24] interface. With
board schematics and open-source drivers widely available, the flexibility of the USRP
system provides a simple and effective solution for our RF interface (Fig. 7.1).
Data was collected in two separate sessions: the first taking place in an environment
that was electromagnetically shielded over the span of a week and the second without
any shielding taking place at William and Mary (W&M) over the span of 2 weeks.
The first session included 25 individual AD-612 RFID tags labeled AD01 − AD25.
The same EPC code was first written onto each tag with the Thing Magic Mercury
5e RFID Reader, and no further modifications were performed to the tags. Data
was collected by placing one tag at a time in a fixed position near the antenna. Tag
3 Mountain View, CA (http://www.ettus.com).
Fig. 7.2 Tag orientations used for data collection. Parallel (PL), oblique (OB), and upside-down (UD)
can be seen, named because of the tag position relative to the antenna. Real-world degradation was
also applied to the tags in the form of water submersion, as well as both light and heavy physical
deformations
transmission events were recorded for 3 seconds for each tag using the USRP2, saving all data in MATLAB format. Each tag was recorded at three different RFID reader
operating frequencies (902, 915, and 928 MHz), with three tag orientations relative
to the antenna being used at each frequency (parallel (PL), upside-down (UD), and
a 45◦ oblique angle (OB)). The second session of data collection at W&M included
a second, independent set of 15 AD-612 RFID tags labeled AD26 − AD40. Similar
to before, the same EPC code was first written onto each tag with the Thing Magic
Mercury 5e RFID Reader.
The second session was different from the first in that the tags were no longer
in a fixed position relative to the antenna, but rather simply held by hand near the
antenna. This introduces additional variability into the individual tag-reader events
throughout each signal recording. Tag transmission events were again recorded for 3
seconds for each tag. Data was collected at a single operating frequency (902 MHz)
with a constant orientation relative to the antenna (parallel (PL)); however, data was
collected on four separate days allowing for environmental variation (temperature,
humidity, etc.). These tags were then split into two subsets, one of which was used
for a water damage study while the other was used for a physical damage study.
For the water damage, tags AD26 − AD32 were submerged in water for 3 hours,
at which point they were patted dry and used to record data (labeled as Wet). They
were then allowed to dry overnight, and again used to record data (Wet-to-Dry). For
the physical damage, tags AD33 − AD40 were first lightly crumpled by hand (light
damage) and subsequently heavily crumpled (heavy damage). Pictures of the tag
orientation variations as well as the tag damage can be seen in Fig. 7.2.
From these datasets, four separate studies were performed. First, a frequency
comparison was run in which the three operating frequencies were used as training
Table 7.1 There were 40 individual Avery-Dennison AD-612 RFID tags used for this study, split into z subsets Dz for the various comparisons. Tag numbers τi are given for each comparison

Comparison type          | Dz                   | Tags used (τi)
Frequency variations     | 902, 915, 928 MHz    | i = 1, . . . , 25
Orientation variations   | PL, UD, OB           | i = 1, . . . , 25
Different day recordings | Day 1, 2, 3, 4       | i = 26, . . . , 40
Water damage             | Wet, Wet-to-dry      | i = 26, . . . , 32
Physical damage          | Light, Heavy damage  | i = 33, . . . , 40
and testing datasets for the classifiers, collected while maintaining a constant PL
orientation. Second, an orientation comparison was performed in which the three
tag orientations were used as training and testing datasets, collected at a constant
902 MHz operating frequency. Third, the 4 days’ worth of constant PL and 902 MHz
handheld recordings were used as training and testing datasets. Finally, the classifiers
were trained on the 4 days’ worth of recordings, and the additional damage datasets
were used as testing sets. The specific tags used for each comparison are summarized
in Table 7.1.
7.4 EPC Extraction
In most RFID applications, the RFID reader only has a few seconds to identify a
specific tag. For example, consumers would not want a car’s key-less entry system
that required the user to stand next to the car for half a minute while it interrogated
the tag. Rather, the user expects access to their car within a second or two of being
within the signal’s range. These short transmission times result in only a handful of
individual EPCs being transmitted, making it important that each one is extracted
efficiently and accurately.
In our dataset, each tag’s raw recording is a roughly 3 second tag-to-reader communication. During this time there is continuous communication between the antenna
and any RFID tags within range. This continuous communication is composed of
repeated individual tag-reader (T⇔R) events. The structure and duration of each
T⇔R event is pre-defined by the specific protocols used. The AD-612 RFID tags are
built to use the EPCGen2 protocols [23], so we can use the inherent structure within
these protocols to automatically extract the EPCs within each signal.
Previous attempts at identifying the individual EPC codes within the raw signals
involved a fixed-window cross-correlation approach, where a manually extracted
EPC region was required for comparison [20]. With smart window sizing, this
approach can identify the majority of EPC regions within a signal. As the communication period is shortened and the number of EPCs contained in each recording
decreases, however, this technique becomes insufficient.
We have developed an alternative technique that automatically identifies components of the EPCGen2 communication protocols. The new extraction algorithm is
Table 7.2 The EPC extraction routine used to find the EPC regions of interest for analysis

For each tag AD01–AD40 {
  Raw recorded signal is sent to getEPC.m
    ◦ Find and window “downtime” regions between tag/reader communication periods
    ◦ Envelope windowed sections, identify individual T⇔R events
    For each T⇔R event
      • Envelope the signal, locate flat [EPC+] region
      • Set start/finish bounds on [EPC+]
      • Return extracted [EPC+] regions
  Each extracted [EPC+] is sent to windowEPC.m
    ◦ Generate artificial Miller (M=4) modulated preamble
    ◦ Locate preamble in recorded signal via cross-correlation
    ◦ Identify all subsequent Miller (M=4) basis functions via cross-correlation
    ◦ Extract corresponding bit values
    ◦ Verify extracted bit sequence matches known EPC bit sequence
    ◦ Return start/end locations of EPC region
  Save EPC regions
}
outlined in Table 7.2, with a detailed explanation to follow. It should be noted that the
region identified as [EPC+] is a region of the signal that is composed of a preamble
which initiates the transmission, a protocol-control element, the EPC itself, as well
as a final 16-bit cyclic-redundancy check.
The first step in the EPC extraction routine is to window each raw signal by locating
the portions that occur between reader transmission repetitions. These periods of no
transmission are referred to here as “downtime” regions. These are the portions of
the signal during which the RFID reader is not communicating with the tag at all. An
amplitude threshold is sufficient to locate the downtime regions, which divide the raw
signal into separate sections, each of which contains several individual T⇔R events.
There is another short “dead” zone between each individual T⇔R event where the
RFID reader stops transmitting briefly. Because of this, the upper envelope of the
active communication region is taken and another amplitude threshold is applied
to identify these dead zones, further windowing the signal into its individual T⇔R
events.
Each individual T⇔R event is then processed to extract the individual [EPC+]
region within. First, the envelope of the T⇔R event is taken, which highlights the
back-and-forth communication between the tag and the RFID reader. The [EPC+] region, being the longest in time duration of all the communication, is relatively consistent in amplitude compared to the up-and-down structure of the signal. Therefore, a region is located that meets flatness as well as time-duration requirements
corresponding to this [EPC+]. Once this [EPC+] region is found, an error check is
applied that envelopes the region and checks this envelope for outliers that would
indicate an incorrectly chosen area.
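A minimal sketch of the first windowing stage is shown below; the raw recording x and the 10% threshold are hypothetical placeholders, the envelope function requires the Signal Processing Toolbox, and the actual getEPC.m routine is more involved.

    % Threshold the upper envelope of the raw recording to separate
    % active communication periods from "downtime" regions.
    env    = envelope(abs(x));          % upper envelope of the recording
    active = env > 0.1*max(env);        % hypothetical amplitude threshold
    edges  = diff([0; active(:); 0]);   % rising/falling edges of active spans
    starts = find(edges ==  1);         % first sample of each active section
    stops  = find(edges == -1) - 1;     % last sample of each active section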
Fig. 7.3 Excerpt from the EPC Class 1 Gen 2 Protocols showing the Miller basis functions and a
generator state diagram [23]
The next step in this process is to extract the actual EPC from the larger [EPC+]
region. For all Class 1 Gen 2 EPCs, the tags encode the backscattered data using either
FM0 baseband or Miller modulation of a subcarrier, with the encoding choice made by
the reader. The Thing Magic Mercury 5e RFID Reader uses Miller (M=4) encoding,
the basis functions of which can be seen in Fig. 7.3. The Miller (M=4) preamble is
then simulated and cross-correlated with the [EPC+] region to determine its location
within. From the end of the preamble, the signal is broken up into individual bits, and
cross-correlation is used to determine which bits are present for the remainder of the
signal (positive or negative, 0 or 1). Upon completion, the bit sequence is compared
to a second known bit sequence generated from the output of the RFID reader’s serial
log for verification, shown in Table 7.3. The bounds of this verified bit sequence are
then used to window the [EPC+] region down to the EPC only. A single T⇔R event
as well as a close-up of a [EPC+] region can be seen in Fig. 7.4.
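The preamble search itself reduces to a single cross-correlation, sketched here with hypothetical variables preamble (the simulated Miller (M=4) template) and epcPlus (one extracted [EPC+] region); xcorr is from the Signal Processing Toolbox.

    % Locate the simulated Miller (M=4) preamble inside the [EPC+] region.
    [xc, lags] = xcorr(epcPlus, preamble);   % cross-correlate with the template
    [~, k]     = max(abs(xc));               % lag of the strongest alignment
    startIdx   = lags(k) + 1;                % preamble start sample within epcPlus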
The goal of the classifier is to identify individual RFID tags despite the fact that all
the tags are of the same type, from the same manufacturer, and written with the same
EPC. The raw RFID signal s(t) is complex valued, so an amplitude representation
α(t) is used for the raw signal [25]. An “optimal” version of our signal was also
reverse engineered using the known Miller (M=4) encoding methods, labeled s0 (t).
We then subtract the raw signal from the optimal representation, producing an EPC
error signal as well, labeled e_EPC. These are summarized by
$$\begin{aligned}
s(t) &= r(t) + i\,c(t) \\
\alpha(t) &= \sqrt{r^2(t) + c^2(t)} \\
e_{EPC}(t) &= s_0(t) - s(t).
\end{aligned} \tag{7.1}$$
This signal processing step of reducing the complex-valued s(t) to either α(t) or
e_EPC(t) will be referred to as EPC compression. A signal that has been compressed
using either one of these methods will be denoted ŝ(t) for generality. Figure 7.5
compares the different EPC compression results on a typical complex RFID signal.
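In MATLAB, the compression step of Eq. (7.1) takes only a few lines; here s and s0 are hypothetical variables holding the raw complex [EPC+] signal and its reverse-engineered “optimal” counterpart, assumed time-aligned and of equal length.

    % EPC compression per Eq. (7.1).
    r     = real(s);             % in-phase component r(t)
    c     = imag(s);             % quadrature component c(t)
    alpha = sqrt(r.^2 + c.^2);   % amplitude representation alpha(t)
    eEPC  = s0 - s;              % EPC error signal e_EPC(t)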
Table 7.3 Thing Magic Mercury 5e RFID Reader Serial Log
(17:05:31.625 - TX(63)): 00 29 CRC:1D26
(17:05:31.671 - RX(63)): 04 29 00 00 00 00 00 01 CRC:9756
(17:05:31.671 - TX(64)): 03 29 00 07 00 CRC:F322
(17:05:31.718 - RX(64)): 19 29 00 00 00 07 00 01 07 72 22 00 80 30 00 30 08 33 B2 DD
D9 01 40 35 05 00 00 42 E7 CRC:4F31
(17:05:31.734 - TX(65)): 00 2A CRC:1D25
(17:05:31.765 - RX(65)): 00 2A 00 00 CRC:01E8
(17:05:31.765 - TX(66)): 00 2A CRC:1D25
(17:05:31.796 - RX(66)): 00 2A 00 00 CRC:01E8
(17:05:31.796 - TX(67)): 05 22 00 00 00 00 FA CRC:0845
(17:05:32.093 - RX(67)): 04 22 00 00 00 00 00 01 CRC:7BA9
(17:05:32.093 - TX(68)): 00 29 CRC:1D26
(17:05:32.140 - RX(68)): 04 29 00 00 00 00 00 01 CRC:9756
(17:05:32.140 - TX(69)): 03 29 00 07 00 CRC:F322
(17:05:32.187 - RX(69)): 19 29 00 00 00 07 00 01 07 72 22 00 80 30 00 30 08 33 B2 DD
D9 01 40 35 05 00 00 42 E7 CRC:4F31
(17:05:32.203 - TX(70)): 00 2A CRC:1D25
(17:05:32.234 - RX(70)): 00 2A 00 00 CRC:01E8
(17:05:32.234 - TX(71)): 00 2A CRC:1D25
(17:05:32.265 - RX(71)): 00 2A 00 00 CRC:01E8
EPC code (in hex) is in bold.
7.5 Feature Generation
For each RFID tag τi , where i is indexed according to the range given in Table 7.1,
the EPC extraction routine produces N-many different EPCs ŝ_{i,j}(t), j = 1, . . . , N.
Four different methods are then used to extract features from these signals: dynamic
wavelet fingerprinting (DWFP), wavelet packet decomposition (WPD), higher order
statistics, and Mellin transform statistics. Using these methods, M feature values are
extracted which make up the feature vector X = x_{i,j,k}, k = 1, . . . , M. It should be
noted here that due to the computation time required to perform this analysis, the
computer algorithms were adapted to run on William and Mary’s Scientific Computer
Cluster.4
4 http://www.compsci.wm.edu/SciClone/.
Fig. 7.4 A single T⇔R event with the automatically determined [EPC+] region highlighted in
gray. Close-up view of a single [EPC+] region with the EPC itself highlighted in gray
7.5.1 Dynamic Wavelet Fingerprint
The DWFP technique is used to generate a subset of the features used for classification. Wavelet-based measurements provide the ability to decompose noisy and
complex information and patterns into elementary components. To summarize this
process, the DWFP technique first applies a continuous wavelet transform to each original time-domain signal ŝ_{i,j}(t) [26]. The resulting coefficients are then used to generate “fingerprint”-type images I_{i,j}(a, b) that are coincident in time with the raw signal. Mother wavelets used in this study include the Daubechies-3 (db3), Symlet-5 (sym5), and Meyer (meyr) wavelets, chosen based on preliminary results.
Since pattern classification uses one-dimensional feature vectors to develop decision boundaries for each group of observations, the dimension of the binary fingerprint images I_{i,j}(a, b) that are generated for each EPC signal needs to be reduced. A subset of ν individual values that best represent the signals for classification will be selected. The number ν (ν < M) of DWFP features to select is arbitrary, and can be adjusted based on memory requirements and computation time constraints. For this
RFID application, we consider all cases of ν ∈ [1, 5, 10, 15, 20, 50, 75, 100].
Fig. 7.5 The different EPC compression techniques are shown here, displaying the real r (t) and
imaginary c(t) components of a raw [EPC+] region (black and gray, respectively, top), the amplitude
α(t) (middle), and the EPC error e_EPC(t) (bottom). The EPC portion of the [EPC+] signal is bounded by vertical red dotted lines
Using standard MATLAB routines,5 the feature extraction process consists of
several steps:
1. Label each binary image with individual values for all sets of connected pixels.
2. Relabel concentric objects centered around a common area (useful for the ring-like features found in the fingerprints).
3. Apply thresholds to remove any insignificant objects in the images.
4. Extract features from each labeled object.
5. Linearly interpolate in time between individual fingerprint locations to generate
a smoothed array of feature values.
6. Identify points in time where the feature values are consistent among individual
RFID tags yet separable between different tags.
5 MATLAB’s Image Processing Toolbox (MATLAB, 2008, The Mathworks, Natick, MA).
Fig. 7.6 An example of 8-connectivity and its application on a binary image
The binary nature of the images allows us to consider each pixel of the image
as having a value of either 1 or 0. The pixels with a value of 0 can be thought
of as the background, while pixels with a value of 1 can be thought of as the
significant pixels. The first step in feature extraction is to assign individual labels
to each set of 8-connected components in the image [27], demonstrated in Fig. 7.6.
Since the fingerprints are often concentric shapes, different concentric “rings” are
often not connected to each other, but still are components of the same fingerprint
object. Therefore, the second step in the process is to relabel groups of concentric
objects using their center of mass, which is the average time coordinate of each pixel,
demonstrated in Fig. 7.7. The third step in the feature extraction process is to remove
any fingerprint objects from the image whose area (sum of the pixels) is below a
particular threshold. Objects that are too small for the computations in later steps are
removed; however, this threshold is subjective and depends on the mother wavelet
used.
At this point in the processing, the image is ready for features to be generated.
Twenty-two measurements are made on each remaining fingerprint object, including
the area, centroid, diameter of a circle with the same area, Euler number, convex
image, solidity, coefficients of second and fourth degree polynomials fit to the fingerprint boundary, as well as major/minor axis length, eccentricity, and orientation
of an ellipse that has the same normalized second central moment as the fingerprint.
The property measurements result in a sparse property array P_{i,j,n}[t], where n represents the property index n = 1, . . . , 22, since each extracted value is matched to the
Fig. 7.7 An example of the fingerprint labeling process. The components of the binary image and
the resulting 8-connected components, where each label index corresponds to a different index on
the “hot” colormap in this image. Concentric objects are then relabeled, resulting in unique labels
for each individual fingerprint object, shown here as orange and white fingerprint objects for clarity
time value of the corresponding fingerprint’s center of mass. Therefore, these sparse
property vectors are linearly interpolated to produce a smoothed vector of property
values, P_{i,j,n}(t). This process is shown for a typical time-domain EPC signal in
Fig. 7.8.
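The labeling, thresholding, measurement, and interpolation steps can be sketched with Image Processing Toolbox routines as follows; the binary image I, the area threshold minArea, and the time grid tGrid are hypothetical placeholders, and the concentric relabeling of step 2 is a custom routine omitted here.

    % Label fingerprint objects and measure a subset of the 22 properties.
    Lbl   = bwlabel(I, 8);                        % step 1: 8-connected labeling
    stats = regionprops(Lbl, 'Area', 'Centroid', 'EquivDiameter', ...
                        'EulerNumber', 'Solidity', 'Eccentricity', ...
                        'Orientation', 'MajorAxisLength', 'MinorAxisLength');
    stats = stats([stats.Area] >= minArea);       % step 3: drop small objects
    % Step 5: interpolate one property (area here) against each object's
    % time coordinate (assumes distinct time coordinates).
    cen   = vertcat(stats.Centroid);              % centers of mass [time, scale]
    [tc, idx] = sort(cen(:, 1));                  % sort objects by time
    areas = [stats.Area];                         % step 4: one measured property
    P     = interp1(tc, areas(idx), tGrid, 'linear');  % smoothed P(t)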
Once an array of fingerprint features for each EPC has been generated, it still
needs to be reduced into a single vector of ν-many values to be used for classification. Without this reduction, not only is the feature set too large to process even on
a computing cluster, but also most of the information contained within it is redundant. Since we are implementing a one-against-one classification scheme, where one
testing tag (τt ) will be compared against features designed to identify one classifier
tag (τc ), we are looking for feature values that are consistent among each individual
RFID tag, yet separable between different tags.
First, the dimensionality of the property array is reduced by calculating the inter-tag mean property value for each tag τi,
$$\mu_{i,n}(t) = \frac{1}{|j|} \sum_j P_{i,j,n}(t). \tag{7.2}$$
Each inter-tag mean vector is then normalized to the range [0, 1]. Next, the difference in inter-tag mean vectors for property n is considered for all binary combinations of tags τ_{i1}, τ_{i2},
$$d_n(t) = \left|\mu_{i_1,n}(t) - \mu_{i_2,n}(t)\right| \quad \text{for } i_1, i_2 \in i, \tag{7.3}$$
for values of i shown in Table 7.1. We are left with a single vector representing the average inter-class difference in property n values as a function of time.
Similarly, we compute the standard deviation within each class,
$$\sigma_{i,n}(t) = \sqrt{\frac{1}{|j|} \sum_j \left(P_{i,j,n}(t) - \mu_{i,n}(t)\right)^2}. \tag{7.4}$$
We next identify the maximum value of standard deviation among all tags τi at each point in time t, essentially taking the upper envelope of all values of σ_{i,n}(t),
$$\sigma_n(t) = \max_i \left[\sigma_{i,n}(t)\right]. \tag{7.5}$$
Times t_m, where m = 1, . . . , ν, are then identified for each property n at which the average inter-class difference d_n(t_m) is high while the intra-class standard deviation σ_n(t_m) remains low. The resulting DWFP feature vector for EPC signal ŝ_{i,j}(t) is
$$x_{i,j,k_m} = P_{i,j,n_m}(t_m). \tag{7.6}$$
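A sketch of this selection for a single property is given below; P is a hypothetical (nTags × nEPC × nTime) array of interpolated property values, and scoring “d_n high, σ_n low” by their ratio is an assumption, since the exact ranking rule is not given in closed form above.

    % Pick nu time samples where the inter-class difference d_n(t) is high
    % while the intra-class spread sigma_n(t) stays low.
    mu    = squeeze(mean(P, 2));                   % mu_{i,n}(t), Eq. (7.2)
    mu    = (mu - min(mu, [], 2)) ./ (max(mu, [], 2) - min(mu, [], 2));  % to [0,1]
    sig   = squeeze(std(P, 1, 2));                 % sigma_{i,n}(t), Eq. (7.4)
    pairs = nchoosek(1:size(P, 1), 2);             % all binary tag combinations
    d     = mean(abs(mu(pairs(:,1),:) - mu(pairs(:,2),:)), 1);  % d_n(t), Eq. (7.3)
    sigN  = max(sig, [], 1);                       % sigma_n(t), Eq. (7.5)
    [~, tm] = maxk(d ./ (sigN + eps), nu);         % the nu selected times t_m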
Fig. 7.8 The DWFP is applied to an EPC signal ŝi, j (t), shown in gray (a). A close-up of the signal
is shown for clarity (b), from which the fingerprint image (c) is generated, shown here with white
peaks and gray valleys for distinction. Each fingerprint object is individually labeled and localized
in both time and scale (d). A variety of measures are extracted from each fingerprint and interpolated
in time, including the area of the on-pixels for each object (e), as well as the coefficients of a fourth-order polynomial (p_1x^4 + p_2x^3 + p_3x^2 + p_4x + p_5) fit to the boundary of each fingerprint object, with coefficient p_3 shown here (f)
7.5.2 Wavelet Packet Decomposition
Another wavelet-based feature used in classification is generated by wavelet packet
decomposition [28]. First, each EPC signal is filtered using a stationary wavelet
transform, removing the first three levels of detail as well as the highest approximation
level. A wavelet packet transform (WPT) is applied to the filtered waveform with
a specified mother wavelet and the number of levels to decompose the waveform,
generating a tree of coefficients similar in nature to the continuous wavelet transform.
From the WPT tree, a vector containing the percentages of energy corresponding to
the T terminal nodes of the tree is computed, known as the wavelet packet energy.
Because the WPT is an orthonormal transform, the entire energy of the signal is
preserved in these terminal nodes [29]. The energy matrix E_i for each RFID tag τi can then be represented as
$$E_i = \left[\,\mathbf{e}_{1,i}, \mathbf{e}_{2,i}, \ldots, \mathbf{e}_{N,i}\,\right], \tag{7.7}$$
where N is the number of EPCs extracted from tag τi and e_{j,i}[b] is the energy from bin number b = 1, . . . , T of the energy map for signal j = 1, . . . , N. Singular value
decomposition is then applied to each energy matrix E_i:
$$E_i = U_i \Sigma_i V_i^{*}, \tag{7.8}$$
where U_i is composed of T-element left singular column vectors u_{b,i},
$$U_i = \left[\,\mathbf{u}_{1,i}, \mathbf{u}_{2,i}, \ldots, \mathbf{u}_{T,i}\,\right]. \tag{7.9}$$
The Σ_i matrix is a T × N singular value matrix. The row space and nullspace of E_i are defined in the N × N matrix V_i^*, and are not used in the analysis of the energy maps. For the energy matrices E_i, we found that there was a dominant singular value relative to the second highest singular value, implying that there was a dominant representative energy vector corresponding to the first singular vector u_{1,i}. From the set of all singular vectors u_{b,i}, the significant bins that have energies above a given threshold are identified. The threshold is lowered until all the vectors return a common significant bin. Finally, the WPT elements corresponding to the extracted bin are
used as features. In the case of multiple bins being selected, all corresponding WPT
elements are included in the feature set. Wavelet packet decomposition uses redundant basis functions and can therefore provide arbitrary time–frequency resolution
details, improving upon the wavelet transform when analyzing signals containing
close, high-frequency components.
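The energy-map construction can be sketched with Wavelet Toolbox routines as follows; the decomposition level (4) and mother wavelet (db3) are illustrative choices, s holds one filtered EPC signal, and Ei is the hypothetical matrix whose columns are the energy vectors of all N EPCs from tag i.

    % Wavelet packet energy map for one EPC, then the SVD of the stacked
    % energy matrix E_i, per Eqs. (7.7)-(7.8).
    wpt = wpdec(s, 4, 'db3');        % 4-level wavelet packet decomposition
    e   = wenergy(wpt);              % percent energy in the terminal nodes
    % ... stack the e vectors from all N EPCs as the columns of Ei, then:
    [Ui, Si, ~] = svd(Ei, 'econ');   % singular value decomposition
    u1 = Ui(:, 1);                   % dominant representative energy vector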
7.5.3 Statistical Features
Several statistical features were generated from the raw EPC signals ŝ_{i,j}(t):

1. The mean of the raw signal,
$$\mu_{i,j} = \frac{1}{|\hat{s}|} \sum_t \hat{s}_{i,j}(t),$$
where |ŝ| is the length of ŝ_{i,j}(t).
2. The maximum cross-correlation of ŝ_{i,j}(t) with another EPC from the same tag, ŝ_{i,k}(t), where j ≠ k,
$$\max_{\tau} \left[\sum_t \hat{s}^{*}_{i,j}(t)\,\hat{s}_{i,k}(t + \tau)\right].$$
3. The Shannon entropy,
$$\sum_t \hat{s}^2_{i,j}(t) \ln\!\left(\hat{s}^2_{i,j}(t)\right).$$
4. The unbiased sample variance,
$$\frac{1}{|\hat{s}| - 1} \sum_t \left(\hat{s}_{i,j}(t) - \mu_{i,j}\right)^2.$$
5. The skewness (third central moment),
$$\frac{1}{\sigma^3_{i,j} |\hat{s}|} \sum_t \left(\hat{s}_{i,j}(t) - \mu_{i,j}\right)^3.$$
6. The kurtosis (fourth central moment),
$$\kappa_{i,j} = \frac{1}{\sigma^4_{i,j} |\hat{s}|} \sum_t \left(\hat{s}_{i,j}(t) - \mu_{i,j}\right)^4.$$

Statistical moments provide insight by highlighting outliers due to any specific flaw-type signatures found in the data.
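These six features are direct one-liners in MATLAB, as sketched below for a hypothetical compressed EPC vector s and a second EPC sOther from the same tag (xcorr requires the Signal Processing Toolbox; skewness and kurtosis the Statistics and Machine Learning Toolbox).

    % Statistical features for one EPC signal.
    mu = mean(s);                           % 1. mean
    xc = max(abs(xcorr(s, sOther)));        % 2. peak cross-correlation
    H  = sum((s.^2) .* log(s.^2 + eps));    % 3. Shannon entropy as defined
                                            %    above (eps guards log(0))
    v  = var(s);                            % 4. unbiased sample variance
    sk = skewness(s);                       % 5. skewness (third central moment)
    ku = kurtosis(s);                       % 6. kurtosis (fourth central moment)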
7.5.4 Mellin Features
The Mellin transform is an integral transform, closely related to the Fourier transform
and the Laplace transform, that can represent a signal in terms of a physical attribute
similar to frequency known as scale. The β-Mellin transform is defined as [30]
$$\mathcal{M}_f(p) = \int_0^{\infty} \hat{s}(t)\, t^{p-1}\, dt, \tag{7.10}$$
for the complex variable p = − jc + β, with fixed parameter β ∈ R and independent
variable c ∈ R. This variation of the Mellin transform is used because the β parameter
allows for the selection of a variety of more specific transforms. In the case of
β = 1/2, this becomes a scale-invariant transform, meaning invariant to compression
or expansion of the time axis while preserving signal energy, defined on the vertical
line p = − jc + 1/2. This scale transform is defined as
$$D_f(c) = \frac{1}{\sqrt{2\pi}} \int_0^{\infty} \hat{s}(t)\, e^{(-jc - 1/2)\ln t}\, dt. \tag{7.11}$$
This transform has the key property of scale invariance: if one signal is a scaled version of another, the two will have the same transform magnitude. Variations
in each RFID tag’s local oscillator can lead to slight but measurable differences in
the frequency of the returned RF signal, effectively scaling the signal. Zanetti et al.
call this the time interval error (TIE), and extract the TIE directly to use as a feature
for individual tag classification [21]. We observed this slight scaling effect in our
data and therefore explore the use of a scale-invariant feature extraction technique.
The Mellin transform’s relationship with the Fourier transform can be highlighted by setting β = 0, which results in a logarithmic-time Fourier transform:
$$\mathcal{M}_f(c) = \int_{-\infty}^{\infty} \hat{s}(t)\, e^{-jc \ln t}\, d(\ln t). \tag{7.12}$$
Similarly, the scale transform of a function ŝ(t) can be defined using the Fourier transform of g(t) = ŝ(e^t):
$$\mathcal{M}_f(c) = \int_{-\infty}^{\infty} g(t)\, e^{-jct}\, dt = \mathcal{F}(g(t)). \tag{7.13}$$
References [30, 33] discuss the complexities associated with discretizing the fast
Mellin transform (FMT) algorithm, as well as provide a MATLAB-based implementation.6 The first step in implementing this is to define both an exponential sampling step and the number of samples needed to exponentially resample a given signal, an example of which can be seen in Fig. 7.9. Once the exponential
axis has been defined, an exponential point-by-point multiplication with the original
signal is performed. A fast Fourier transform (FFT) is then computed, followed by
an energy normalization step. This process is summarized in Fig. 7.10.
6 http://profs.sci.univr.it/~desena/FMT.
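A minimal sketch of this FMT recipe, assuming a uniformly sampled real input s and only a schematic normalization, is:

    % Fast Mellin (scale) transform via exponential resampling and an FFT.
    N  = numel(s);
    t  = linspace(1/N, 1, N);                        % uniform time axis (t > 0)
    te = exp(linspace(log(t(1)), log(t(end)), N));   % exponential time axis
    se = interp1(t, s, te, 'linear');                % exponential resampling
    D  = fft(se .* sqrt(te));                        % t^(1/2) weight (beta = 1/2),
                                                     % then FFT to the scale domain
    D  = D / norm(D);                                % energy normalization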
Fig. 7.9 An example of uniform sampling (in blue, top) and exponential sampling (in red, bottom)
Once the Mellin transform is computed, features are extracted from the resulting
Mellin domain including the mean of the Mellin transform, as well as the standard
deviation, the variance, the second central moment, the Shannon entropy, the kurtosis,
and the skewness of the mean-removed Mellin transform [31, 32].
7.6 Classifier Design
Since the goal of our classification routine is to distinguish individual RFID tags from
nominally identical copies, each individual RFID tag is assigned a unique class label.
This results in a multiclass problem, with the number of classes being equivalent to
the number of tags being compared. There are two main methods which can be used
to address multiclass problems. The first uses classifiers that have multi-dimensional
discriminant functions, which often output classification probabilities for each test
object that then need to be reduced for a final classification decision. The second
method uses a binary comparison between all possible pairs of classes utilizing a
two-class discriminant function, with a voting procedure used to determine final
classification. We have discussed in Sect. 7.2 our choice of the binary classification
approach, allowing us to include intrinsically two-class discriminants in our analysis.
Therefore, only two tags will be considered against each other at a time, a classifier
tag τc ∈ D R and a testing tag τt ∈ DT , where D R represents the training dataset used
and DT the testing dataset, outlined in Table 7.1.
For each binary combination of tags (τc , τt ), a training set (R) is generated composed of feature vectors from k-many EPCs associated with tags τc and τt from
Fig. 7.10 A raw EPC signal s_{i,j}(t) (top, left), the exponentially resampled axis (top, right), the signal resampled according to the exponential axis (middle, left), this same signal after point-by-point multiplication by the exponential axis (middle, right), and the resulting Mellin domain
representation (bottom)
dataset D R . Corresponding known labels (ωk ) are ωk = 1 when k ∈ c, and ωk = −1
when k ∈ t. The testing set (T ) is composed of feature vectors of tag τt only from
dataset DT , where predicted labels are denoted yk = ±1. In other words, the classifier
is trained on data from both tags in the training dataset and tested on the testing tag
only from the testing dataset. When D R = DT , which means the classifier is trained
and tested on the same dataset, a holdout algorithm is used to split the data into R
and T .
The problem of class imbalance, where the class labels ωk are unequally distributed (i.e., |ωk = 1| ≫ |ωk = −1|, or vice versa), can affect classifier performance
and has been a topic of further study by several researchers [34–36]. While the number
of EPCs extracted from each tag here does not present a significant natural imbalance
as all recordings are approximately the same length in time, it is not necessarily true
that the natural distribution between classes, or even a perfect 50:50 distribution,
is ideal. To explore the effect of class imbalance on the classifier performance, a
variable ρ is introduced here, defined as
ρ=
|ωk = −1|
, k ∈ R.
|ωk = 1|
(7.14)
This variable defines the ratio of negative versus positive EPC labels in R, with
ρ ∈ Z+. When ρ = 1, the training set R contains an equal number of EPCs from tag τc as it does from τt, where under-sampling is used as necessary for equality. As ρ increases, additional tags are included at random from τm, m ≠ c, t, with ωm = −1 until ρ is satisfied. When all of the tags in D_R are included in the training set, ρ is
denoted as “all.”
The process of selecting which classifiers to use is a difficult problem. The No
Free Lunch Theorem states that there is no inherently best classifier for a particular
application, and in practice several classifiers are often compared and contrasted.
There exists a hierarchy of possible choices that are application dependent. We have
previously determined that supervised, statistical pattern classification techniques
using both parametric and nonparametric probability-based classifiers are appropriate
for consideration.
For parametric classifiers, we include a linear classifier using normal densities
(LDC) and a quadratic classifier using normal densities (QDC). For nonparametric
classifiers, we include a k-nearest-neighbor classifier (KNNC) for k = 1, 2, 3, and
a linear support vector machine (SVM) classifier. The mathematical explanations
for these classifiers can be found in [37–59]. For implementation of these classifier
functions, we use routines from the MATLAB toolbox PRTools [58].
For the above classifiers that output densities, a function is applied that converts the output to proper confidence values, where the sum of the outcomes is one for
every test object. This allows for comparison between classifier outputs. Since each
EPC’s feature vector is assigned a confidence value for each class, the final label is
decided by the highest confidence of all the classes.
7.7 Classifier Evaluation
Since we have implemented a binary classification algorithm, a confusion matrix
L (c, t), where τc is the classifier tag and τt is the testing tag, can be used to view the
results of a given classifier. Each entry in a confusion matrix represents the number
of EPCs from the testing tag that are labeled as the classifier tag, denoted by a label
of yt = 1, and is given by
$$L(c, t) = \frac{|y_t = 1|}{|y_t|} \quad \text{when } \tau_c \in R,\ \tau_t \in T. \tag{7.15}$$
A perfect classifier would therefore have values of L = 1 whenever τc = τt (on the diagonal) and values of L = 0 when τc ≠ τt (off-diagonal). Given the number of classifier configuration parameters used in this study, it does not make sense to compare individual confusion matrices to each other to determine classifier performance.
Each entry of the confusion matrix is a measure of the number of EPCs from each
testing set that is determined to belong to each possible training class. We can therefore apply a threshold h to each confusion matrix, where the value of h lies within the
range [0, 1]. All confusion matrix entries that are above this threshold are positive
matches for class membership, and all entries below the threshold are identified as
negative matches for class membership. It follows that we can determine the number
of false positive (f_+), false negative (f_−), true positive (t_+), and true negative (t_−) rates for each confusion matrix, given by
$$\begin{aligned}
f_+(h) &= |L(c,t) > h|, \quad c \neq t \\
t_+(h) &= |L(c,t) > h|, \quad c = t \\
f_-(h) &= |L(c,t) \le h|, \quad c = t \\
t_-(h) &= |L(c,t) \le h|, \quad c \neq t.
\end{aligned} \tag{7.16}$$
From these values, we can calculate the sensitivity (χ) and specificity (ψ),
$$\chi(h) = \frac{t_+(h)}{t_+(h) + f_-(h)}, \qquad \psi(h) = \frac{t_-(h)}{t_-(h) + f_+(h)}. \tag{7.17}$$
The concept of sensitivity and specificity values is inherent in binary classification,
where testing data is identified as either a positive or negative match for each possible
class. High values of sensitivity indicate that the classifier successfully classified
most of the testing tags whenever the testing tag and classifier tag were the same,
while high values of specificity indicate that the classifier successfully classified
most of the testing tags being different than the classifier tag whenever the testing
tag and the classifier tag were not the same. Since sensitivity and specificity are
functions of the threshold h, they can be plotted with sensitivity (χ (h)) on the y-axis
and 1 − specificity (1 − ψ(h)) on the x-axis for 0 < h ≤ 1 in what is known as a
receiver operating characteristic (ROC) [60]. The resulting curve on the ROC plane is
essentially a summary of the sensitivity and specificity of a binary classifier as the
threshold for discrimination changes. Points on the diagonal line y = x represent a
result as good as random guessing, where classifiers performing better than chance
have curves above the diagonal in the upper left-hand corner of the plane. The point
(0, 1) corresponding to χ = 1 and ψ = 1 represents perfect classification.
The area under each classifier’s ROC curve (|AUC|) is a common measure of
a classifier’s performance, and is calculated in practice using simple trapezoidal
integration. Higher |AUC| values generally correspond to classifiers with better performance [61]. This is not a strict rule, however, as a classifier with a higher |AUC|
Fig. 7.11 A comparison of confusion matrices L for classifiers of varying performance, with
0 → black and 1 → white. A perfect confusion matrix has values of L = 1 whenever τc = τt, seen as a white diagonal here, and values of L = 0 whenever τc ≠ τt, seen as a black off-diagonal
here. In general, |AUC| = 1 corresponds to a perfect classifier, while |AUC| = 0.5 performs as well
as random guessing. This trend can be seen in the matrices
may perform worse in specific areas of the ROC plane than another classifier with a
lower |AUC| [62]. Several examples of confusion matrices can be seen in Fig. 7.11,
where each corresponding |AUC| value is provided to highlight their relationship to
performance. It can be seen that the confusion matrix with the highest |AUC| has a
clear, distinct diagonal of positive classifications while the lowest |AUC| has positive
classifications scattered throughout the matrix.
The use of |AUC| values for directly comparing classifier performance has recently been questioned [63, 64], with the information loss associated with summarizing the ROC curve distribution identified as a main concern. We therefore do not use |AUC|
as a final classifier ranking measure. Rather, they are only used here to narrow the
results down from all the possible classifier configurations to a smaller subset of the
“best” ones. The values of χ (h) and ψ(h) are still useful measures of the remaining top classifiers. At this point, however, they have been calculated for a range of
threshold values that extend over 0 < h ≤ 1. A variety of methods can be used to
Fig. 7.12 ROC curves for the classifiers corresponding to the confusion matrices in Fig. 7.11, with
|AUC| = 0.9871 (dotted line), |AUC| = 0.8271 (dash-dot line), |AUC| = 0.6253 (dashed line), and
|AUC| = 0.4571 (solid line). The “perfect classifier” result at (0, 1) in this ROC space is represented
by the black star (∗), and each curve’s closest point to this optimal result at threshold ĥ is indicated
by a circle (◦)
determine a final decision threshold h for a given classifier configuration, the choice
of which depends heavily on the classifier’s final application. A popular approach
involves sorting by the minimum number of misclassifications, min( f + + f − ); however, this does not account for differences in severity between the different types of
misclassifications [65]. Instead, the overall classifier results were sorted here using
their position in the ROC space corresponding to the Euclidean distance from the
point (0, 1) as a metric. Formally, this is
$$d_{ROC}(h) = \sqrt{(\chi - 1)^2 + (1 - \psi)^2}. \tag{7.18}$$
For each classifier configuration, the threshold value ĥ corresponding to the minimum
distance was determined,
$$\hat{h} = \arg\min_h d_{ROC}(h) = \left\{ h \mid \forall h' : d_{ROC}(h') \ge d_{ROC}(h) \right\}. \tag{7.19}$$
In other words, ĥ is the threshold value corresponding to the point in the ROC space
that is closest to the (0, 1) “perfect classifier” result. The classifier configurations
are then ranked by the lowest distance dROC (ĥ). Figure 7.12 shows an example of the
ROC curves for the classifiers that are generated from the confusion matrices found
in Fig. 7.11. In it, the point corresponding to ĥ is indicated by a circle, with the (0, 1)
point indicated by a star.
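The evaluation chain of Eqs. (7.16)–(7.19) reduces to a short loop over candidate thresholds; in this sketch, Lmat is a hypothetical square confusion matrix with entries in [0, 1], and the threshold grid is an arbitrary choice.

    % Sensitivity/specificity versus threshold, distance to (0,1), and |AUC|.
    hGrid = linspace(0.01, 1, 100);          % candidate thresholds h
    offD  = ~eye(size(Lmat));                % off-diagonal entries (c ~= t)
    chi = zeros(size(hGrid));  psi = chi;
    for k = 1:numel(hGrid)
        h  = hGrid(k);
        tp = nnz(Lmat >  h & ~offD);  fn = nnz(Lmat <= h & ~offD);
        tn = nnz(Lmat <= h &  offD);  fp = nnz(Lmat >  h &  offD);
        chi(k) = tp/(tp + fn);               % sensitivity chi(h), Eq. (7.17)
        psi(k) = tn/(tn + fp);               % specificity psi(h), Eq. (7.17)
    end
    dROC = sqrt((chi - 1).^2 + (1 - psi).^2);     % Eq. (7.18)
    [~, kHat] = min(dROC);  hHat = hGrid(kHat);   % threshold hHat, Eq. (7.19)
    AUC = abs(trapz(1 - psi, chi));          % trapezoidal area under the ROC curve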
7.8 Results
7.8.1 Frequency Comparison
The ultra-high-frequency (UHF) range of RFID frequencies spans 868–928 MHz; however, in North America UHF can be used unlicensed from 902–928 MHz (±13 MHz from a 915 MHz center frequency). We test the potential for pattern
classification routines to uniquely identify RFID tags at several operating frequencies within this range. Data collected at three frequencies (902, 915, and 928 MHz)
while being held at a single orientation (PL) were used as training and testing frequencies for the classifier. Only amplitude (α(t)) signal compression was used in
this frequency comparison.
Table 7.4 shows the top individual classifier configuration for the RFID reader
operating frequency comparison. Results are presented as sensitivity and specificity
values for the threshold value ĥ that corresponds to the minimum distance dROC in
the ROC space. Similarly, confusion matrices are presented in Fig. 7.13 for each
classifier configuration listed in Table 7.4.
Table 7.4 The classifier configurations ranked by dROC (ĥ) over all values of the classifier configuration variables when trained and tested on the frequency parameters 902, 915, and 928 MHz.
D R and DT correspond to the training and testing datasets, respectively. ρ represents the ratio of
negative versus positive EPCs in the training set. The threshold ĥ corresponding to the minimum
distance dROC is presented, along with the values of χ(ĥ) and ψ(ĥ)
D_R (MHz) | D_T (MHz) | #DWFP features (ν) | Classifier    | ρ   | |AUC|  | χ(ĥ)  | ψ(ĥ)  | ĥ (%) | Accuracy (%)
902       | 902       | 100                | QDC (MATLAB)  | 3   | 0.9983 | 1.000 | 0.997 | 96.6  | 99.7
902       | 915       | 1                  | LDC (PRTools) | 5   | 0.9898 | 1.000 | 0.943 | 8.5   | 94.6
902       | 928       | 50                 | 3NN           | 1   | 0.9334 | 0.960 | 0.950 | 10.9  | 95.0
915       | 902       | 1                  | LDC (PRTools) | 12  | 0.4571 | 0.640 | 0.543 | 2.3   | 54.7
915       | 915       | 1                  | QDC (MATLAB)  | 9   | 0.9977 | 1.000 | 0.995 | 82.2  | 99.5
915       | 928       | 10                 | LDC (MATLAB)  | all | 0.5195 | 0.720 | 0.538 | 1.9   | 54.6
928       | 902       | 10                 | 1NN           | 3   | 0.4737 | 0.520 | 0.757 | 9.1   | 74.7
928       | 915       | 1                  | LDC (MATLAB)  | 7   | 0.6587 | 0.880 | 0.498 | 2.0   | 51.4
928       | 928       | 75                 | QDC (MATLAB)  | 2   | 1.0000 | 1.000 | 1.000 | 86.9  | 100.0
Fig. 7.13 Confusion matrices (L) for the classifier configurations corresponding to the minimum distance dROC(ĥ) over all combinations of (DR, DT) where DR, DT ∈ {902, 915, 928} MHz. Values of L range from [0, 1] with 0 → black and 1 → white here
The classifier performed well when trained on the dataset collected at 902 MHz, regardless of the frequency at which the testing data was collected. Accuracies were above 94.6% for all three testing frequencies, and sensitivity (χ(ĥ)) and specificity (ψ(ĥ)) values were all above 0.943, very close to the ideal value of 1.000. The confusion matrices shown in Fig. 7.13 all display the distinct diagonal line that indicates accurate classification. When the classifier was trained on either the 915 MHz or 928 MHz datasets, however, the classification accuracy was low. Neither case was able to identify tags from other frequencies very well, even though both did well classifying tags from their own frequency. When DR = 915 MHz and DT = 928 MHz, for example, the |AUC| value was only 0.5195, not much higher than the 0.5000 value associated with random guessing. The corresponding confusion matrix
shows no diagonal but instead vertical lines at several predicted tag labels, indicating
that the classifier simply labeled all of the tags as one of these values.
7.8.2 Orientation Comparison
A second variable that is inherent in real-world RFID applications is the orientation relative to the antenna at which the RFID tags are read. This is one of the main reasons
why RFID technology is considered advantageous compared to traditional barcodes;
however, antenna design and transmission power variability result in changes in the
size and shape of the transmission field produced by the antenna [66]. It follows
that changes in the tag orientation relative to this field will result in changes in
the pre-demodulated RF signals. To test how the pattern classification routines will
behave with a changing variable like orientation, data was collected at three different
orientations (PL, OB, and UD) while being held at a common operating frequency
(902 MHz). This data was used as training and testing sets for the classifiers. Again,
only amplitude (α(t)) signal compression was used.
Table 7.5 shows the top individual classifier configurations for the RFID tag orientation comparison. Results are presented as sensitivity and specificity values for
a threshold value ĥ that corresponds to the minimum distance dROC in the ROC
space. Similarly, confusion matrices are presented in Fig. 7.14 for each classifier
configuration listed in Table 7.5.
Similar to the frequency results, the classification results again show a single
orientation that performs well as a training set regardless of the subsequent tag
orientation of the testing set. When trained on data collected at the parallel (PL)
orientation, the classification accuracies range from 94.9 to 99.7% across the three
testing tag orientations. Values of χ(ĥ) range from 0.880 to 1.000, meaning that over
88% of the true positives are correctly identified, and ψ(ĥ) values range from 0.952
to 0.997, indicating that over 95% of the true negatives are accurately identified as
well. These accuracies are verified in the confusion matrix representations found in
Fig. 7.13. When the classifiers are trained on either the oblique (OB) or upside-down
(UD) orientations, we again see that the classifiers struggles to identify testing data
from alternate tag orientations. The best performing of these results is for D R =
OB and DT = PL, where χ (ĥ) = 0.920 and ψ(ĥ) = 0.770 suggesting accurate true
positive classification with slightly more false positives as well, resulting in an overall
accuracy of 77.6%. When D R = UD, the testing results are again only slightly better
than random guessing, with |AUC| values of 0.5398 for DT = PL and 0.5652 for
DT = OB.
Table 7.5 The classifier configurations ranked by dROC(ĥ) over all values of the classifier configuration variables when trained and tested on the orientation parameters PL, UD, and OB. DR and DT correspond to the training and testing datasets, respectively. ρ represents the ratio of negative versus positive EPCs in the training set. The threshold ĥ corresponding to the minimum distance dROC is presented, along with the values of χ(ĥ) and ψ(ĥ)

DR   DT   #DWFP features (ν)   Classifier      ρ     |AUC|    χ(ĥ)    ψ(ĥ)    ĥ (%)   Accuracy (%)
PL   PL   1                    QDC (MATLAB)    13    0.9979   1.000   0.997   88.9    99.7
PL   UD   75                   3NN             1     0.9489   0.960   0.953   12.1    95.4
PL   OB   20                   1NN             1     0.8627   0.880   0.952   2.8     94.9
UD   PL   10                   LDC (MATLAB)    19    0.5398   0.680   0.658   2.9     65.9
UD   UD   1                    QDC (MATLAB)    5     0.9994   1.000   0.995   73.6    99.5
UD   OB   5                    LDC (MATLAB)    15    0.5652   0.680   0.622   1.9     62.4
OB   PL   10                   LDC (MATLAB)    13    0.8250   0.920   0.770   5.8     77.6
OB   UD   5                    1NN             4     0.6042   0.760   0.622   1.9     62.7
OB   OB   75                   QDC (MATLAB)    2     1.0000   1.000   1.000   47.7    100.0
7.8.3 Different Day Comparison
We next present classification results when data recorded on multiple days were used as training and testing datasets. The following analysis provides a better understanding of how comparable signals taken from the same tag, at the same frequency and orientation, are across subsequent recordings made on multiple days. It is important
to note that the data used here was collected with the RFID tag being held by hand
above the antenna. While it was held as consistently as possible, it was not fixed
in position. Additionally, each subsequent recording was done when environmental
conditions were intentionally different from the previous recordings (humidity, temperature, etc.). Data was collected on four different days (Day 1, 2, 3, and 4). This
data was used as training and testing sets for the classifiers. Amplitude (α(t)) and
EPC error (e_EPC(t)) were both used as signal compression methods.
Table 7.6 shows the top individual classifier configuration for the different day
tag recording comparison. Results are presented as sensitivity and specificity values
for a threshold value ĥ that corresponds to the minimum distance dROC in the ROC
space. Similarly, confusion matrices are presented in Fig. 7.15 for each classifier
configuration listed in Table 7.6.
[Figure 7.14: a 3 × 3 grid of confusion-matrix images, with τt ∈ DT on the horizontal axes and τc ∈ DR on the vertical axes]
Fig. 7.14 Confusion matrices (L) for the classifier configurations corresponding to the minimum distance dROC(ĥ) over all combinations of (DR, DT) where DR, DT ∈ {PL, UD, OB}. Values of L range from [0, 1] with 0 → black and 1 → white here
The first thing to note in these results is the prevalence of the EPC error (e_EPC(t)) signal compression compared to the amplitude (α(t)) signal compression. This suggests that e_EPC(t) is better able to correctly classify the RFID tags than the raw signal amplitude. Unlike the two previous sets of results, where one frequency and one orientation classified well compared to the others, there is no dominant subset here. All the different days were classified similarly when tested against each other. This is expected, since there is no reason a classifier trained on data from one specific day should perform better than one trained on any other. |AUC| values were mainly above 0.6700 yet below 0.7500, with accuracies ranging from 63.6 to 80.9% when DR ≠ DT.
The confusion matrix representations of these classification results (Fig. 7.15) again indicate that there is no single dominant training subset.
Table 7.6 The classifier configurations ranked by dROC(ĥ) over all values of the classifier configuration variables when trained and tested on the different day parameters Day 1, 2, 3, and 4. DR and DT correspond to the training and testing datasets, respectively. ρ represents the ratio of negative versus positive EPCs in the training set. The threshold ĥ corresponding to the minimum distance dROC is presented, along with the values of χ(ĥ) and ψ(ĥ)

DR     DT     EPC Comp.   #DWFP features (ν)   Classifier      ρ    |AUC|    χ(ĥ)    ψ(ĥ)    ĥ (%)   Accuracy (%)
Day 1  Day 1  α, e_EPC    15                   QDC (MATLAB)    2    0.9949   1.000   0.986   23.9    98.7
Day 1  Day 2  e_EPC       20                   LDC (MATLAB)    4    0.7432   0.867   0.657   39.5    67.1
Day 1  Day 3  e_EPC       10                   1NN             12   0.6735   0.800   0.662   9.1     67.1
Day 1  Day 4  e_EPC       10                   QDC (MATLAB)    5    0.7287   0.667   0.724   37.6    72.0
Day 2  Day 1  e_EPC       20                   LDC (MATLAB)    10   0.7443   0.800   0.748   42.4    75.1
Day 2  Day 2  α, e_EPC    20                   QDC (MATLAB)    2    0.9990   1.000   0.986   42.4    98.7
Day 2  Day 3  e_EPC       20                   3NN             1    0.7990   0.800   0.790   52.6    79.1
Day 2  Day 4  e_EPC       1                    SVM             1    0.7083   0.800   0.733   20.1    73.8
Day 3  Day 1  e_EPC       1                    SVM             1    0.7014   0.867   0.619   21.9    63.6
Day 3  Day 2  e_EPC       15                   3NN             8    0.6919   0.800   0.719   4.6     72.4
Day 3  Day 3  α, e_EPC    50                   QDC (MATLAB)    5    1.0000   1.000   1.000   72.8    100.0
Day 3  Day 4  e_EPC       50                   3NN             7    0.6390   0.800   0.648   4.6     65.8
Day 4  Day 1  α, e_EPC    1                    3NN             3    0.7705   0.800   0.710   17.9    71.6
Day 4  Day 2  e_EPC       5                    1NN             3    0.7395   0.733   0.719   29.8    72.0
Day 4  Day 3  α           1                    3NN             1    0.7422   0.667   0.819   57.6    80.9
Day 4  Day 4  α, e_EPC    50                   LDC (PRTools)   1    1.0000   1.000   1.000   95.7    100.0
We see that the DR = DT results all show distinct diagonal lines, even with DR, DT = Day 4, where there are additional high off-diagonal entries in the matrix. This is indicated in Table 7.6 by a relatively high threshold (ĥ) value of 95.7. When DR ≠ DT, there are still faint diagonal lines present in some of the confusion matrices. For example, when DR = Day 2 and DT = Day 3, the diagonal entries coming out of the lower left-hand corner are somewhat higher in accuracy (closer to white in the confusion matrix) than their surrounding off-diagonal entries. We see in Table 7.6 that this classifier has an |AUC| equal to 0.7990 and a 79.1% overall accuracy.
[Figure 7.15: a 4 × 4 grid of confusion-matrix images, with τt ∈ DT on the horizontal axes and τc ∈ DR on the vertical axes]
Fig. 7.15 Confusion matrices (L) for the classifier configurations corresponding to the minimum distance dROC(ĥ) over all combinations of (DR, DT) where DR, DT ∈ {Day 1, 2, 3, 4}. Values of L range from [0, 1] with 0 → black and 1 → white here
7.8.4 Damage Comparison
We next present the results of the RFID tag damage analysis to explore how physical degradation affects the RFID signals and the resulting classification accuracy. The datasets from Day 1, 2, 3, and 4 are combined and used here as a single training set. The tags that make up this dataset, AD26–AD40, are split into two subsets: tags AD26–AD32 were subjected to a water damage study, while tags AD33–AD40 were subjected to a physical damage study. The AD-612 tags are neither waterproof nor embedded in a rigid shell of any kind, although many RFID tags exist that are sealed against the elements and/or encased in a shell for protection. For the water damage, each tag was submerged in water for 3 hours, at which point it was patted dry to remove any excess water and used to collect data (labeled as Wet).
Table 7.7 The classifier configurations ranked by dROC(ĥ) over all values of the classifier configuration variables when trained and tested on the tag damage comparisons for both water and physical damage. DR and DT correspond to the training and testing datasets, respectively. ρ represents the ratio of negative versus positive EPCs in the training set. The threshold ĥ corresponding to the minimum distance dROC is presented, along with the values of χ(ĥ) and ψ(ĥ)

DR            DT            EPC Comp.   #DWFP features (ν)   Classifier      ρ    |AUC|    χ(ĥ)    ψ(ĥ)    ĥ (%)   Accuracy (%)
Day 1,2,3,4   Wet           α           1                    SVM             1    0.6361   0.714   0.786   80.1    73.8
Day 1,2,3,4   Wet-to-dry    α           1                    3NN             17   0.7789   0.857   0.738   4.8     75.5
Day 1,2,3,4   Light damage  α           5                    1NN             16   0.7589   0.750   0.839   17.9    82.8
Day 1,2,3,4   Heavy damage  α           20                   LDC (PRTools)   7    0.7980   1.000   0.589   44.5    64.1
They were then allowed to air-dry overnight and were again used to collect data (Wet-to-dry).
For the physical damage, each tag was first gently crumpled by hand (light damage)
and subsequently balled up and then somewhat flattened (heavy damage), with data
being collected after each stage.
Table 7.7 shows the top individual classifier configuration for the two RFID tag
damage comparisons. Results are presented as sensitivity and specificity values for
a threshold value ĥ that corresponds to the minimum distance dROC in the ROC
space. Similarly, confusion matrices are presented in Fig. 7.16 for each classifier
configuration listed in Table 7.7.
The RFID tag damage classification results are similar to those of the previous different-day comparison. The water damage did not seem to have a severe effect on the classification accuracy, while the more severe physical damage showed lower classifier accuracy. However, rather than relatively equal χ(ĥ) and ψ(ĥ) values, the heavy damage resulted in χ(ĥ) = 1.000 and ψ(ĥ) = 0.589, which means that the classifier was optimistically biased and over-classified positive matches. This lower accuracy was not unexpected, as deformation of the tag's antenna should distort the RF signal and therefore degrade the classifier's ability to identify a positive match for the tag.
7.9 Discussion
The results presented above suggest that a dominant reader frequency, 902 MHz in
this case, may exist at which data can be initially collected for the classifier to be
trained on and then used to correctly identify tags read at alternate frequencies. In our
analysis, we have explored reader frequencies that span the North American UHF range, which is only part of the full 865–928 MHz UHF range for which the AD-612 tags used here were optimized.
[Figure 7.16: four confusion-matrix images, with τt ∈ DT on the horizontal axes and τc ∈ DR on the vertical axes]
Fig. 7.16 Confusion matrices (L) for the classifier configurations corresponding to the minimum distance dROC(ĥ) over all combinations of (DR, DT) where DR = Day 1, 2, 3, and 4, and DT ∈ {wet, wet-to-dry, light damage, heavy damage}. Values of L range from [0, 1] with 0 → black and 1 → white here
Therefore, the dominant 902 MHz read frequency we observed lies near the center of the tags' actual operating frequency range. It is of no
surprise that the tags perform best at the center of their optimized frequency range
rather than at the upper limit. Similarly, a classifier can be trained on a tag orientation
(relative to the reader antenna) that may result in accurate classification of RFID tags
regardless of their subsequent orientation to the reader antenna. Antenna design for
both the readers and the tags is an active field of research [2], and it is expected
that the RF field will be non-uniform around the antennas. This could explain why
only one of the experimental orientations used here performs better than the others.
Regardless of the field strength, however, the unique variations in the RF signature of
an RFID tag should be present. It is promising that the classifier still had an accuracy
of over 60% with these variations, and up to 94.9% accuracy if trained on the parallel
(PL) orientation.
Changes in the environmental conditions, like ambient temperature and relative
humidity, were also allowed in the different day study where the RFID tags were
suspended by hand near the antenna (in generally the same spot) for data collection
on successive afternoons. It is important to note that the tags were not fixtured for this
study, and that slight variations in both distance to the reader as well as orientation
were inherent due to the human element. Even so, the classifier was generally able
to correctly identify the majority of the RFID tags as being either a correct match
or a correct mismatch when presented with a dataset it had never seen before, with
accuracies ranging from 63.6 to 80.9%. This study represents a typical real-world
application of RFID tags due to these environmental and human variations.
As previously mentioned, the top classifier configurations tended to favor the EPC error signal e_EPC(t) compression, although there was not a large difference in classifier performance between the different-day comparison, which used both α(t) and e_EPC(t) compression, and the frequency/orientation comparisons, which used only α(t) compression. The parameter ρ had a large spread of values across the classifiers, indicating that the classification results may not be very sensitive to class imbalance within
the training set. The number of DWFP features also shows no consistent trend in our
results, other than being often larger than 1, indicating that there may be room for
feature reduction. With any application of pattern classification, a reduction in the
feature space through feature selection can lead to improved classification results
[44]. Individual feature ranking is one method that can be used to identify features
on a one-by-one basis; however, it can overlook the usefulness of combining feature
variables. In combination with a nested selection method like sequential backward
floating search (SBFS), the relative usefulness of the DWFP features, as well as the
remaining features, can be evaluated [51].
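For concreteness, here is a simplified sketch of the SBFS idea, assuming only a generic score(subset) criterion such as cross-validated accuracy; the function and its arguments are illustrative assumptions, not the chapter's implementation:

def sbfs(n_features, score, k_min=1):
    """Simplified sequential backward floating search (SBFS) sketch.
    score(subset) returns a figure of merit (higher is better) for a
    list of feature indices. Returns the best subset found at each size."""
    current = list(range(n_features))
    pool = []                                  # features removed so far
    best = {len(current): (score(current), list(current))}
    while len(current) > k_min:
        # Exclusion step: drop the feature whose removal hurts least
        trials = [(score([f for f in current if f != g]), g) for g in current]
        s, g = max(trials)
        current.remove(g)
        pool.append(g)
        if s > best.get(len(current), (float("-inf"),))[0]:
            best[len(current)] = (s, list(current))
        # Floating step: re-add a removed feature while doing so beats
        # the best subset already recorded at that larger size
        while pool:
            trials = [(score(current + [f]), f) for f in pool]
            s, f = max(trials)
            if s > best.get(len(current) + 1, (float("-inf"),))[0]:
                pool.remove(f)
                current.append(f)
                best[len(current)] = (s, list(current))
            else:
                break
    return best

Unlike plain backward elimination, the floating step lets a feature that was discarded early return later, which is how combinations of individually weak features can be recovered.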
The results in Tables 7.4–7.6 where DR = DT are comparable to those observed
by Bertoncini et al. [20], with some classifiers having 100% accuracy and the rest
near 99%. In these instances, the classifier was trained on subsets of the testing data,
so it is expected that the classifier performs better in these cases.
It is also important to note that the final decision threshold ĥ can vary greatly depending on the classifier's application. Adjusting the final threshold does not alter the classifier's underlying outputs: the |AUC| takes into account all possible threshold values, and is therefore fixed for each classifier configuration. The threshold value only determines the distribution of error types, χ vs. ψ, within the results. Aside from the minimum dROC
metric, weights can be applied to determine an alternate threshold if the application
calls for a trade-off between false negative and false positive results. For example, if
a user is willing to allow up to five false positives before allowing a false negative,
a minimizing function can be used to identify this weighted optimal threshold.
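A sketch of such a weighted threshold search, under the assumption that the false negative and false positive counts have already been tabulated at each candidate threshold (again, the names are illustrative, not the chapter's code):

import numpy as np

def weighted_threshold(thresholds, n_false_neg, n_false_pos, ratio=(1, 5)):
    """Minimize a weighted misclassification cost. With ratio = (1, 5),
    five false positives cost as much as one false negative, so each
    f+ is down-weighted by 1/5 relative to an f-."""
    w_fp = ratio[0] / ratio[1]   # cost of one f+ in units of f-
    cost = np.asarray(n_false_neg) + w_fp * np.asarray(n_false_pos)
    return thresholds[int(np.argmin(cost))]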
A comparison of different methods to determine a final threshold can be seen
in Table 7.8, where the classifier configuration trained on Day 1 data and tested
on Day 4 data from Table 7.6 is presented for several alternate threshold values.
First, the threshold is shown for the minimum distance dROC (ĥ), as was previously
presented in Table 7.6. The threshold is then shown for the minimum number of total
misclassifications (f+ + f−), followed by the minimum number of false positives (f+),
Fig. 7.17 A plot of sensitivity χ(h) (solid line) and specificity ψ(h) (dashed line) versus threshold for the classifier trained on Day 1 and tested on Day 4 from Table 7.6. Threshold values are shown corresponding to min(f+ + f−) (h = 63.4, dash-dot line) as well as ĥ (h = 37.6, dotted line). The threshold value determines the values of χ and ψ, and is chosen based on the classifier's final application
and then by that of the lowest number of false negatives (f−). Several weighting ratios are then shown, in which the cost of returning a false negative (f−) is increased relative to the cost of returning a false positive (f+). For example, a weighting ratio of 1:5 [f− : f+] means that 5 f+ cost as much as a single f−, putting more emphasis on reducing the number of f− present. It can be seen that the values of χ(h)
and ψ(h) change as the threshold h changes. An overly optimistic classifier is the
result of threshold values that are too low, when all classifications are identified as
positive matches (χ ≈ 1 and ψ ≈ 0). Alternatively, an overly pessimistic classifier is
the result of threshold values that are too high, resulting in all classifications identified
as negative matches (χ ≈ 0 and ψ ≈ 1). The weighting ratio 1 : 10 [ f − : f + ] returns
the most even values of χ and ψ, which matches the ĥ threshold. Figure 7.17 shows
an example of the trade-off between the values of χ (h) and ψ(h) as the threshold h
is increased, where two examples from Table 7.8 are highlighted.
It is useful to discuss a few examples here to better understand the different
threshold results. In a security application, for example, where RFID badges are
used to control entry into a secure area, it is most important to minimize the number
of f + results because allowing a cloned RFID badge access to a secure area could
be devastating. In this situation, we would want the value of ψ to be as close to 1.0
as possible. In Table 7.8, this result corresponds to a threshold value of h = 65.3.
Unfortunately, the value of χ at this threshold is 0.067, which means that almost all
of the true ID badges would also be identified as negative matches. Therefore, that
specific classifier is not appropriate for a security application.
An alternate example is the use of RFID-embedded credit cards in retail point of
sale. To a store, keeping the business of a repeat customer may be much more valuable
than losing some merchandise to a cloned RFID credit card. In this sense, it is useful
to determine an appropriate weight of [ f − : f + ] that evens the gains and losses of
both cases. If it were determined that a repeat customer would bring in 20 times as
much revenue as it would cost to refund a fraudulent charge due to a cloned RFID
account, then a weight of 1:20 [f− : f+] could be used to determine the optimal classifier threshold.
Table 7.8 The classifier configuration trained on Day 1 and tested on Day 4 data from Table 7.6 using several different metrics to determine the final threshold h. Metrics include min(f+ + f−), min(f+), min(f−), and weights of 1:5, 1:10, 1:15, and 1:20 for [f− : f+], meaning that up to 20 f+ are allowed for each f−. The choice of metric used to determine the final threshold value depends on the classifier's final application. All rows share the same classifier configuration: DR = Day 1, DT = Day 4, EPC Comp. = e_EPC, #DWFP features (ν) = 10, Classifier = QDC (MATLAB), ρ = 5, |AUC| = 0.7287

Sorted by          χ      ψ      h (%)   Accuracy (%)
min dROC(ĥ)        0.667  0.724  37.6    72.0
min(f+ + f−)       0.133  0.995  63.4    93.8
min(f+)            0.067  1.000  65.3    93.8
min(f−)            1.000  0.000  0.1     6.7
1:5 [f− : f+]      0.200  0.981  56.7    92.9
1:10 [f− : f+]     0.667  0.724  37.6    72.0
1:15 [f− : f+]     0.867  0.529  30.8    55.1
1:20 [f− : f+]     0.933  0.443  28.2    47.6
From Table 7.8, it can be seen that the corresponding threshold
is h = 28.2, resulting in values of χ = 0.933 and ψ = 0.443. This classifier would
incorrectly identify 7% of the repeat customers as being fraudulent while correctly
identifying 44% of the cloned signals as being fraudulent. This specific classifier
could be useful in this retail example.
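These percentages follow from the prevalence-weighted combination of sensitivity and specificity,

Accuracy(h) = p χ(h) + (1 − p) ψ(h),

where p is the fraction of positive matches in the test set. As a consistency check, the min(f−) row of Table 7.8 (χ = 1.000, ψ = 0.000, accuracy = 6.7%) implies p ≈ 0.067, and at the 1:20 threshold this gives Accuracy(28.2) ≈ 0.067 × 0.933 + 0.933 × 0.443 ≈ 0.476, matching the 47.6% entry in the table.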
7.10 Conclusion
The USRP software-defined radio system has been shown to capture signals at a usable level of detail for RFID tag classification applications. Because the signal manipulations are performed in software, we can extract not only the raw RF signal but can also generate our own ideal signal to compare against. This yields a new signal representation, the difference between the recorded and ideal signal representations, e_EPC(t), which has proven to be very useful in the classification routines.
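A minimal sketch of constructing this representation, assuming the recorded and software-generated ideal waveforms have already been demodulated and time-aligned (the names and the peak normalization are illustrative, not the chapter's exact processing chain):

import numpy as np

def epc_error_signal(recorded, ideal):
    """Sketch of e_EPC(t): the residual between the recorded backscatter
    and an ideal EPC reply synthesized in software. Peak-normalizing both
    first makes the residual reflect waveform shape rather than gain."""
    recorded = np.asarray(recorded, dtype=float)
    ideal = np.asarray(ideal, dtype=float)
    recorded = recorded / np.max(np.abs(recorded))
    ideal = ideal / np.max(np.abs(ideal))
    return recorded - ideal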
The binary classification routine has been explored on more real-world grounds,
including exposure to a variety of environmental conditions without the use of RF
shielding to boost the SNR level. To explore classifier robustness without fixed proximity and orientation relative to the RFID reader, several validation classifications
were performed, including an RFID reader frequency comparison, a tag orientation
comparison, a multi-day data collection comparison, as well as physical damage and
water exposure comparisons. The frequency comparison was performed to determine the effect that the variability of potential RFID readers' inspection frequencies
would have on the classification routines. The results were promising, although not
perfect, and suggest that while it is best to train a classifier on all possible scenarios, a main frequency (i.e., center frequency) could potentially be used for a master
classifier training set. A similar orientation comparison was done, altering the RFID
tag’s orientation relative to the antenna. Again, the results showed it was best to train
the classifiers on the complete set of data; however, there was again promise for a
potential single main orientation that could be used to train a classifier.
In the multi-day collection comparison, data was collected by hand in an identical fashion but on separate days. The results showed that the inconsistency associated with holding an RFID tag near the antenna causes the classifiers to have trouble correctly identifying EPCs as coming from their correct tag. Two further comparisons were performed to assess the effect that physical degradation had on the RFID tags.
When subjected to water, promising classifier configurations were found that were
on the same level of accuracy as results seen for undamaged tags, suggesting that
the water may not have a significant effect on the RFID classification routines. A
separate subset of the RFID tags was subjected to a similar degradation analysis, this
time with physical bending as a result of being crumpled by hand. The results show
that, as expected, bending of the RFID tag’s antenna caused degradation in the raw
signal that caused the classifier to misclassify many tags.
Applications of RFID technology that implement a fixed tag position are a potential market for the classification routine we present. One example is ePassports, which are embedded with an RFID chip containing digitally signed biometric information [7]. These passports are placed into a reader that controls the
position and distance of the RFID chip relative to the antennas. Additionally, passports are generally protected from the elements and can be replaced if they undergo
physical wear and tear. We have demonstrated a specific emitter identification technique that performs well given these restrictions.
Acknowledgements This work was performed using computational facilities at the College of
William and Mary which were provided with the assistance of the National Science Foundation, the
Virginia Port Authority, Sun Microsystems, and Virginia’s Commonwealth Technology Research
Fund. Partial support for the project is provided by the Naval Research Laboratory and the Virginia
Space Grant Consortium. The authors would like to thank Drs. Kevin Rudd and Crystal Bertoncini
for their many helpful discussions.
References
1. Ngai EWT, Moon KKL, Riggins FJ, Yi CY (2008) RFID research: an academic literature
review (1995–2005) and future research directions. Int J Prod Econ 112(2):510–520
2. Abdulhadi AE, Abhari R (2012) Design and experimental evaluation of miniaturized monopole
UHF RFID tag antennas. IEEE Antennas Wirel Propag Lett 11:248–251
3. Khan MA, Bhansali US, Alshareef HN (2012) High-performance non-volatile organic ferroelectric memory on banknotes. Adv Mat 24(16):2165–2170
4. Juels A (2006) RFID security and privacy: a research survey. IEEE J Sel Areas Commun
24(2):381–394
5. Halamka J, Juels A, Stubblefield A, Westhues J (2006) The security implications of verichip
cloning. J Am Med Inform Assoc 13(6):601–607
6. Heydt-Benjamin T, Bailey D, Fu K, Juels A, O’Hare T (2007) Vulnerabilities in first-generation
RFID-enabled credit cards. In: Dietrich S, Dhamija R (eds) Financial Cryptography and Data
Security, vol 4886. Lecture Notes in Computer Science. Springer, Berlin/Heidelberg, pp 2–14
7. Richter H, Mostowski W, Poll E (2008) Fingerprinting passports. In: NLUUG Spring conference on security, pp 21–30
8. White D (2005) NCR: RFID in retail. In: RFID: applications, security, and privacy, pp 381–395
9. Westhues J (2005) Hacking the prox card. In: RFID: applications, security, and privacy, pp
291–300
10. Smart Card Alliance (2007) Proximity mobile payments: leveraging NFC and the contactless
financial payments infrastructure. Whitepaper
11. Léopold E (2009) The future of mobile check-in. J Airpt Manag 3(3):215–222
12. Wang MH, Liu JF, Shen J, Tang YZ, Zhou N (2012) Security issues of RFID technology in
supply chain management. Adv Mater Res 490:2470–2474
13. Juels A (2004) Yoking-proofs for RFID tags. In: Proceedings of the second annual IEEE
pervasive computing and communication workshops (PERCOMW 2004), PERCOMW ’04,
pp 138–143
14. Juels A (2005) Strengthening EPC tags against cloning. In: Proceedings of the 4th ACM workshop on wireless security, WiSe '05, pp 67–76. ACM, New York
15. Riezenman MJ (2000) Cellular security: better, but foes still lurk. IEEE Spectr 37(6):39–42
16. Suski WC, Temple MA, Mendenhall MJ, Mills RF (2008) Using spectral fingerprints to improve
wireless network security. In: Global telecommunications conference, 2008. IEEE GLOBECOM 2008. IEEE. IEEE, New Orleans, pp 1–5
17. Gerdes RM, Mina M, Russell SF, Daniels TE (2012) Physical-layer identification of wired
ethernet devices. IEEE Trans Inf Forensics Secur 7(4):1339–1353
18. Kennedy IO, Scanlon P, Mullany FJ, Buddhikot MM, Nolan KE, Rondeau TW (2008) Radio
transmitter fingerprinting: a steady state frequency domain approach. In: Proceedings of the
IEEE 68th vehicular technology conference (VTC 2008), pp 1–5
19. Danev B, Heydt-Benjamin TS, Čapkun S (2009) Physical-layer identification of RFID devices.
In: Proceedings of the 18th conference on USENIX security symposium, SSYM’09. USENIX
Association, Berkeley, CA, USA, pp 199–214
20. Bertoncini C, Rudd K, Nousain B, Hinders M (2012) Wavelet fingerprinting of radio-frequency
identification (RFID) tags. IEEE Trans Ind Electron 59(12):4843–4850
21. Zanetti D, Danevs B, Čapkun S (2010) Physical-layer identification of UHF RFID tags. In:
Proceedings of the sixteenth annual international conference on Mobile computing and networking, MobiCom ’10. ACM, New York, NY, USA, pp 353–364
22. Romero HP, Remley KA, Williams DF, Wang C-M (2009) Electromagnetic measurements
for counterfeit detection of radio frequency identification cards. IEEE Trans Microw Theory
Tech 57(5):1383–1387
23. EPCglobal Inc. (2008) EPC radio-frequency identity protocols: class-1 generation-2 UHF RFID
protocol for Communications at 860 MHz–960 MHz Version 1.2.0
24. GNU Radio Website (2011) Software. http://www.gnuradio.org
25. Ellis KJ, Serinken N (2001) Characteristics of radio transmitter fingerprints. Radio Sci
36(4):585–597
26. Hou JD, Hinders MK (2002) Dynamic wavelet fingerprint identification of ultrasound signals.
Mater Eval 60(9):1089–1093
27. Haralick RM, Shapiro LG (1992) Computer and robot vision, vol 1. Addison-Wesley, Boston,
MA
28. Learned RE, Willsky AS (1995) A wavelet packet approach to transient signal classification.
Appl Comput Harmon Anal 2(3):265–278
29. Feng Y, Schlindwein FS (2009) Normalized wavelet packets quantifiers for condition monitoring. Mech Syst Signal Process 23(3):712–723
30. de Sena A, Rocchesso D (2007) A fast Mellin and scale transform. EURASIP J Adv Signal
Process 2007(1):089170
31. Harley JB, Moura JMF (2011) Guided wave temperature compensation with the scale-invariant
correlation coefficient. In: 2011 IEEE International Ultrasonics Symposium (IUS), pp 1068–
1071, Orlando, FL
32. Harley JB, Ying Y, Moura JMF, Oppenheim IJ, Sobelman L (2012) Application of Mellin
transform features for robust ultrasonic guided wave structural health monitoring. AIP Conf
Proc 1430:1551–1558
33. Sundaram H, Joshi SD, Bhatt RKP (1997) Scale periodicity and its sampling theorem. IEEE
Trans Signal Proc 45(7):1862–1865
34. Japkowicz N, Stephen S (2002) The class imbalance problem: a systematic study. Intell Data
Anal 6(5):429–449
35. Weiss GM, Provost F (2001) The effect of class distribution on classifier learning: Technical
Report ML-TR-44. Department of Computer Science, Rutgers University
36. Kubat M, Matwin S (1997) Addressing the curse of imbalanced training sets: one-sided selection. In: Proceedings of the 14th international conference on machine learning. Morgan Kaufmann Publishers, Inc, Burlington, pp 179–186
37. Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York
38. Fukunaga K (1990) Introduction to statistical pattern recognition. Computer science and scientific computing, 2nd edn. Academic Press, Boston
39. Kuncheva LI (2004) Combining pattern classifiers. Wiley, New York
40. Bishop CM (2006) Pattern recognition and machine learning. Springer, Berlin
41. Webb AR (2012) Statistical pattern recognition. Wiley, Hoboken
42. Ripley BD (1996) Pattern recognition and neural networks. Cambridge University Press, Cambridge
43. Kanal L (1974) Patterns in pattern recognition: 1968–1974. IEEE Trans Inf Theory 20(6):697–
722
44. Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Patt
Anal Mach Intell 22(1):4–37
45. Watanabe S (1985) Pattern recognition: human and mechanical. Wiley-Interscience publication,
Hoboken
46. Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning.
Artif Intell 97(1):245–271
47. Jain AK, Chandrasekaran B (1982) Dimensionality and sample size considerations in pattern
recognition practice. In: Krishnaiah PR, Kanal LN (eds) Classification pattern recognition and
reduction of dimensionality. Handbook of Statistics, vol 2. Elsevier, Amsterdam, pp 835–855
48. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics.
Bioinformatics 23(19):2507–2517
49. Dash M, Liu H (1997) Feature selection for classification. Intellect Data Anal 1(1–4):131–156
50. Raymer ML, Punch WF, Goodman ED, Kuhn LA, Jain AK (2000) Dimensionality reduction
using genetic algorithms. IEEE Trans Evol Comput 4(2):164–171
51. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn
Res 3:1157–1182
52. Fan J, Lv J (2010) A selective overview of variable selection in high dimensional feature space.
Stat Sin 20(1):101–148
53. Romero E, Sopena JM, Navarrete G, Alquézar R (2003) Feature selection forcing overtraining
may help to improve performance. In: Proceedings of the international joint conference on
neural networks, 2003, vol 3. IEEE, Portland, OR, pp 2181–2186
54. Lambrou T, Kudumakis P, Speller R, Sandler M, Linney A (1998) Classification of audio
signals using statistical features on time and wavelet transform domains. In: Proceedings of the
1998 IEEE international conference on acoustics, speech and signal processing. vol 6. IEEE,
Washington, pp 3621–3624
55. Smith SW (2003) Digital signal processing: a practical guide for engineers and scientists.
Newnes, Oxford
56. Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Applications of mathematics. Springer, Berlin
57. Theodoridis S, Koutroumbas K (1999) Pattern recognition. Academic Press, Cambridge
58. Duin RPW, Juszczak P, Paclik P, Pekalska E, de Ridder D, Tax DMJ, Verzakov S (2007)
PRTools4.1, A Matlab toolbox for pattern recognition
59. Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other
kernel-based learning methods. Cambridge University Press, Cambridge
60. Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874
61. Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating
characteristic curves derived from the same cases. Radiology 148(3):839–843
62. Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine
learning algorithms. Pattern Recognit 30(7):1145–1159
63. Lobo JM, Jiménez-Valverde A, Real R (2007) AUC: a misleading measure of the performance
of predictive distribution models. Glob Ecol Biogeogr 17(2):145–151
64. Hanczar B, Hua J, Sima C, Weinstein J, Bittner M, Dougherty ER (2010) Small-sample precision of ROC-related estimates. Bioinformatics 26(6):822–830
65. Hand DJ (2009) Measuring classifier performance: a coherent alternative to the area under the
ROC curve. Mach Learn 77(1):103–123
66. Rao KVS, Nikitin PV, Lam SF (2005) Antenna design for UHF RFID tags: a review and a
practical application. IEEE Trans Antennas Propag 53(12):3870–3876
Chapter 8
Pattern Classification for Interpreting
Sensor Data from a Walking-Speed
Robot
Eric A. Dieckman and Mark K. Hinders
Abstract In order to perform useful tasks for us, robots must have the ability to
notice, recognize, and respond to objects and events in their environment. This
requires the acquisition and synthesis of information from a variety of sensors. Here
we investigate the performance of a number of sensor modalities in an unstructured
outdoor environment including the Microsoft Kinect, thermal infrared camera, and
coffee can radar. Special attention is given to acoustic echolocation measurements
of approaching vehicles, where an acoustic parametric array propagates an audible
signal to the oncoming target and the Kinect microphone array records the reflected
backscattered signal. Although useful information about the target is hidden inside
the noisy time-domain measurements, the dynamic wavelet fingerprint (DWFP) is
used to create a time–frequency representation of the data. A small-dimensional feature vector is created for each measurement using an intelligent feature selection
process for use in statistical pattern classification routines. Using experimentally
measured data from real vehicles at 50 m, this process is able to correctly classify
vehicles into one of five classes with 94% accuracy.
Keywords Acoustic parametric array · Mobile robotics · Wavelet fingerprint ·
Vehicle classification
8.1 Overview
Useful robots are able to notice, recognize, and respond to objects and events and
then make decisions based on this information in real time. I find myself strongly
disapproving of drivers checking their smartphones while I’m trying to jaywalk with
a cup of coffee from WaWa. I think they should watch out for me, because I’m trying
not to spill. Since I’m too lazy to go down to the crosswalk and wait for the light, I
usually check for oncoming traffic and note both what type of vehicles are coming
my way and estimate my chances of being able to cross safely without spilling on
E. A. Dieckman · M. K. Hinders (B)
Department of Applied Science, William & Mary, Williamsburg, VA 23187-8795, USA
e-mail: hinders@wm.edu
© The Editor(s) (if applicable) and The Author(s), under exclusive
license to Springer Nature Switzerland AG 2020
M. K. Hinders, Intelligent Feature Selection for Machine Learning
Using the Dynamic Wavelet Fingerprint,
https://doi.org/10.1007/978-3-030-49395-0_8
myself. While this may be a simple task for a human, an autonomous robotic assistant
must exhibit many human behaviors to successfully complete the task. Now consider
having a robotic assistant fetch me a cup of coffee from across the street [1], which
I would find useful. In particular, we will consider the sensing aspects of creating a
useful autonomous robot.
A robot must be able to sense and react to objects and events occurring over
a range of distances. For our case of a mobile walking-speed robot, this includes
long-range sensors that can detect dangers such as oncoming motor vehicles, in
time to evade, as well as close-range sensors that provide more information about
stationary objects in the environment. In addition, sensors must be able to provide
useful information in a variety of environmental conditions. While an RGB camera
may provide detailed information in a well-lit environment, it is less useful on a
foggy night. The key to creating a useful autonomous robot is to equip the robot with
a number of complementary sensors so that it can learn about its environment and
make decisions.
In particular, we are interested in the use of acoustic echolocation as a long-range
sensor modality for mobile robotics. While sonar has long been used as a sensor
in underwater environments, the short propagation of ultrasonic waves in air has
restricted its use elsewhere. Lower frequency acoustic signals in the audible range
are able to propagate long distances in air, but traditional methods of creating highly
directional audible acoustic signals require very large speaker arrays not feasible for
a mobile robot. In addition, the complex interactions of these signals with objects in
the environment and ubiquitous environmental noise make the reflected signals very
difficult to analyze.
We use an acoustic parametric array to generate our acoustic echolocation signal.
This is a physically small speaker that uses nonlinear acoustics to create a tight
beam of low-frequency sound that can propagate long distances [34–38]. Such a
highly directional signal provides good spatial resolution that allows a distinction
between the target and environmental clutter. Systematic experimental investigations
and simulations allow us to study the propagation of these nonlinear sound beams
and their interaction with scatterers [39, 40].
These sensor signals are very noisy, making it difficult for the robot to extract useful information. One common technique that can provide additional insight is to transform the problem into an alternate domain. For the simple case of a one-dimensional time-domain signal, this most commonly takes the form of the Fourier transform. While
this converts the signal to the frequency domain and can reveal previously hidden information, all time-domain information is lost in the transformation. A better
solution for time-domain data is to transform the original signal into a joint time–
frequency domain. This can be accomplished by a number of methods, but there is
no one best time–frequency representation. Uncertainty limits restrict simultaneous
time and frequency resolution, some methods are very complex and hard to implement, and the resulting two-dimensional images can be even more difficult to analyze
than the original signal.
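As a simple illustration of what a joint time-frequency representation looks like in practice, the sketch below applies a short-time Fourier transform to a synthetic noisy chirp standing in for a backscattered echo; the sample rate and signal are assumptions, and the DWFP used later in this chapter is a different time-frequency construction:

import numpy as np
from scipy import signal

fs = 44_100                                   # sample rate (Hz), assumed
t = np.arange(0, 0.5, 1 / fs)
echo = signal.chirp(t, f0=2_000, f1=8_000, t1=0.5)
echo += 0.5 * np.random.randn(t.size)         # ubiquitous environmental noise

f, tau, Zxx = signal.stft(echo, fs=fs, nperseg=1024)
tf_image = np.abs(Zxx)                        # a 2-D time-frequency image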
8.2 rMary
Creating an autonomous robot able to act independently of human control has long
been an area of active research in robotics. New low-cost sensors and recent advances
in signal processing necessary to analyze large amounts of streaming data have only
increased the number of researchers focusing on autonomous robotics, buoyed by a
greater public awareness of the field.
In particular, DARPA-funded competitions have enabled focused efforts to create autonomous vehicles and humanoid robots. The 2004 DARPA Grand Challenge
and follow-on 2007 DARPA Urban Challenge [15] focused on creating autonomous
vehicles that could safely operate over complex courses in rural and urban environments. These competitions rapidly expanded the boundaries of the field, leading
to recent near-commercial possibilities such as Google’s self-driving cars [16, 17].
Similarly, the recent and rapid rise of unmanned aerial vehicles (UAVs) has led to
a large amount of research in designing truly autonomous drone aircraft. Designing
autonomous vehicles, whether surface or aerial, comes with its own difficulties,
namely, collecting and interpreting data at a fast enough rate to make decisions.
This often requires expensive sensors that only large research programs can afford.
Commercialization of such technologies will require lower cost alternative sensor
modalities.
A more tractable problem is the design of walking-speed robotic platforms. Compared to road or air vehicles, the lower speed of such platforms allows the use of
low-cost sensor modalities that take longer to acquire and analyze data, since more
time is available to make a decision. Using commercially available wheeled platforms
(such as all-terrain vehicles) shifts focus from the engineering problems in creating
a humanoid robot to the types of sensors used and how such data can be combined.
For these reasons, we will focus on the analysis of different sensor modalities for
a walking-speed robot. The goal in autonomous robotics is to create a robot with
the ability to perform tasks normally accomplished by a human. An added bonus is
the ability to do tasks that are too dangerous for humans such as entering dangerous
environments in disaster situations.
8.2.1 Sensor Modalities for Mobile Robots
We can classify sensor modalities as “active” or “passive” depending on whether they
transmit a signal (i.e., radar) or use information already present in the environment
(i.e., an RGB image), respectively. The use of passive sensors is often preferred to
reduce the possibility of detection in covert operations and to reduce annoyance.
Another important consideration is the range at which a given sensor performs.
For example, imaging systems can provide detailed information about objects near
the sensor but may not detect fast-moving hazards (such as an oncoming vehicle) at
a great enough distance to allow a robot time to evade. Long-range sensors such as
radar or LIDAR are able to detect objects at a greater distance, giving the robot more
time to maneuver out of the way. This long-range detection often requires expensive
sensors which don’t provide detailed information about the target. A combination
of near- and long-range sensors will give a robot the most information about its
environment.
Once the sensor has measured some information about its environment, the robot
needs to know how to interact. A real-world example of this difficulty comes from
agriculture, where smart machines have the ability to replace human workers in
repetitive tasks. One agricultural application in particular is the thinning of lettuce,
where human laborers paid by the acre thin healthy plants unnecessarily. A robotic
“Lettuce Bot” is towed behind a tractor, imaging individual lettuce plants as it passes
and using computer vision algorithms to compare these images to a database of
over a million images to decide which plants to remove by dousing them with a concentrated dose of fertilizer [18]. Though this machine claims 98% accuracy while
driving at 2 kph and may be cost-competitive with manual labor, it also highlights
issues with image-based analysis on mobile robots. Creating a large enough database
for different types of lettuce is a monumental task, given the different colors, shapes,
soil types, and other variables. Even the sun creates problems, causing shadows that
are difficult for the computer vision software to correctly match. Shielding the sensor
and restricting the image database to a particular geographical region (thereby reducing the number of lettuce variants and soil types) allow these techniques to work for
this particular application but the approach is not scalable to more unstructured environments. While the human brain has evolved to process images quickly and easily,
automated interpretation of images is a difficult problem. Using non-imaging sensors can ease the signal processing requirements, but requires sophisticated machine
learning techniques to deal with large amounts of abstract data.
In addition to the range limitations of different sensors and the difficulty in analyzing the resulting data, individual sensor modalities tend to work better in particular
environmental conditions. For example, a webcam can create detailed images in a
well-lit environment but fail to provide much useful information on a dark night
while passive infrared images can detect subtle changes in emissivity from surfaces
in a variety of weather conditions. Because of the limitations of individual sensors,
intelligent combinations of complementary sensors must be used to create the most
robust awareness of an unstructured environment. The exact manner in which these
modalities are combined is referred to as data fusion [19].
Our focus is on the performance of different sensor modalities in real-world,
unstructured environments under a variety of environmental conditions. Our robotic
sensor platform, rMary, contains a number of both passive and active sensors. Passive
vision-based sensors include a standard RGB webcam and infrared sensors operating
in both the near-infrared and long-wave regions of the infrared spectrum. Active sensors include a three-dimensional depth mapping system that uses infrared projection,
a simple radar system, and a speaker/microphone combination to perform acoustic
echolocation in the audible range. We will apply machine learning algorithms to this
data to automatically detect and classify oncoming vehicles at long distances.
8.3 Investigation of Sensor Modalities Using rMary
To collect data in unstructured outdoor environments, we have created a mobile
sensor platform with multiple sensors (Fig. 8.1). This platform, named rMary, was
first placed in service in 2003 to collect infrared data from passive objects [20]. The
robot is remotely operated using a modified RC aircraft controller and is steered
using four independent drive motors synced to allow agile skid-steering. Power is
supplied to these motors from a small battery bank built into the base of the platform,
where the control electronics are also located. The low center of gravity, inflatable
rubber tires, and a custom-built suspension system allow off-road transit to acquire
measurements in otherwise inaccessible locations. The current sensors on rMary
include the following:
• Raytheon ControlIR 2000B long-wave infrared camera
• Microsoft Kinect (2010)
– Active IR projector
– IR and RGB sensors
– 4-channel microphone array
• Sennheiser Audiobeam acoustic parametric array
• Coffee can FMCW ISM-band radar
A parabolic dish microphone can also be attached, but the Kinect microphone
array provides superior audio recordings. Sensor control and data acquisition are
accomplished using a low-powered Asus EeePC 1000h Linux netbook. This underpowered laptop was deliberately used to show that data can be easily acquired with
commodity hardware. The computer’s single internal USB hub does restrict the number of simultaneous data streams, which only became an issue when trying to collect
video data from multiple hardware devices using non-optimized drivers. Each of the
sensors, whose location is shown in Fig. 8.1, will be discussed in the sections that
follow.
8.3.1 Thermal Infrared (IR)
A Raytheon ControlIR 2000B infrared camera couples a long-wave focal plane array
microbolometer detector to a 50 mm lens to provide 320 × 240 resolution at 30 Hz
over an 18° × 13.5° field of view. Although thermal imaging cameras are now low cost
and portable enough to be used by home inspectors for energy audits, this was one
of the few uncooled, portable infrared imaging systems available when first installed
in 2006.
These first experiments showed that the sensor was able to characterize passive
(non-heat-generating) objects through small changes in their thermal signatures [21,
22]. The sensor measures radiance in the long-wave region (8–15 μm) of the infrared
spectrum where radiation from passive objects is maximum.
Fig. 8.1 The rMary sensor
platform contains a
forward-looking long-wave
infrared camera, mounted
upright in an enclosure for
stability and weather
protection, an acoustic
parametric array, the
Microsoft Kinect sensor bar,
and a coffee can radar. All
sensors are powered by the
on-board battery bank and
controlled with a netbook
computer running Linux
For stability and protection from the elements, the camera is mounted vertically
in an enclosed locker. A polished aluminum plate with a low emissivity value makes
a good reflector of thermal radiation and allows the camera to image objects in front
of rMary. Figure 8.2 shows several examples of images of passive objects acquired
with the thermal infrared camera, both indoors and outside.
8.3.2 Kinect
Automated interpretation of images from the thermal infrared camera requires segmentation of the images to distinguish areas of interest, which can be a difficult
image processing task. In addition, the small field-of-view and low resolution of the
infrared camera used here led us to investigate possible alternatives. While there are
still relatively few long-wave thermal sensors with enough sensitivity to measure
the small differences in emissivity between passive objects, other electronics now
contain infrared sensors.
One of the most exciting alternatives was the Microsoft Kinect, released in November 2010 as an accessory to the Xbox 360 gaming console. The Kinect was immensely
popular, selling millions of units in the first several months, and integrates active
infrared illumination, an IR sensor, and an RGB camera to output 640 × 480 RGB-D (RGB + depth) video at 30 Hz. It also contains a tilt motor, accelerometer, and
4-channel microphone array, all at total cost of less than USD $150.
Fig. 8.2 Examples of passive objects imaged with the thermal IR camera include (clockwise from
top left) a car in front of a brick wall, a tree trunk with foliage, a table and chairs in front of a
bookcase, and a window in a brick wall
Access to this low-cost suite of sensors is provided by two different open source
driver libraries: libfreenect [23], with a focus on audio support and motor controls
and OpenNI [24], with greater focus on skeletal tracking and object segmentation.
Other specialized libraries such as nestk [25] used these drivers to provide high-level functions and ease of use. In June 2011, Microsoft released their own SDK
that provides access to the raw sensor streams and high-level functions, but these
libraries only worked in Windows 7 and are closed-source with restrictive licenses
[26]. In addition, Microsoft changed the license agreement in March 2012 to require
use of the “Kinect for Windows” sensor instead of the identical but cheaper Kinect
sensor for Xbox.
We investigate the usefulness and limitations of the Kinect sensor for robotics,
particularly the raw images recorded from the infrared sensor and the depth-mapped
RGB-D images. Since our application is more focused on acquiring raw data for later
processing than utilizing the high-level skeletal tracking algorithms, we are using the
libfreenect libraries to synchronously capture RGB-D video and multi-channel audio
streams from the Kinect.
The Kinect uses a structured light approach similar in principle to [27] to create
a depth mapping. An infrared projector emits a known pattern of dots, allowing the
calculation of depth based on triangulation of the specific angle between the emitter
and receiver, an infrared sensor with 1280 × 1024 resolution. The projected pattern
is visible in some situations in the raw image from the infrared sensor, to which the
open-source drivers allow access. To reduce clutter in the depth mapping, the infrared
sensor also has a band-stop filter at the projector's output wavelength of 830 nm. The
Kinect is able to create these resulting 640 × 480 resolution, 11-bit depth-mapped
images at video frame rates (30 Hz). The stated range of the depth sensing is 1.2–3.5
m, but in the right environments can extend to almost 6 m. An example of this image
for an indoor environment is shown in Fig. 8.3, along with the raw image from the
IR sensor and a separate photograph for comparison.
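As a concrete illustration of this depth mapping, the sketch below converts a raw 11-bit Kinect disparity frame to approximate metric depth. This is not code from this chapter; the coefficients are a commonly cited empirical calibration from the open-source Kinect community, and the filename is hypothetical.

% Hedged sketch: convert an 11-bit Kinect raw disparity frame to metric
% depth. Coefficients are an approximate community calibration, not a
% value from this chapter; 'kinect_raw.png' is a hypothetical file.
raw = double(imread('kinect_raw.png'));          % 640 x 480 raw disparity
valid = raw < 2047;                              % 2047 marks "no reading"
depth = zeros(size(raw));
depth(valid) = 1 ./ (raw(valid)*(-0.0030711016) + 3.3309495161);  % meters
imagesc(depth); axis image; colorbar             % colormapped depth image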
In addition to this colormapped depth image, the depth information can also be
overlaid on the RGB image acquired from a 1280 × 1024 RGB sensor to create a
three-dimensional point-cloud representation (Fig. 8.4).
Since the Kinect was designed to work as an accessory to a gaming system, it
works well in indoor environments, and others have evaluated its applications to
indoor robotics, object segmentation and tracking, and three-dimensional scanning.
Figure 8.5 shows a sampling of the raw IR and depth-mapped images for several
outdoor objects. The most visible feature when compared to images acquired in
indoor environments is that the raw infrared images are very well illuminated, or
even over-exposed. Because of this, the projector pattern is difficult to detect in the
infrared image, and the resulting depth-mapped images don’t tend to have much
structure. There is more likely to be useful depth information if the object being
imaged is not in direct sunlight and/or is located very close to the sensor.
Unlike the thermal IR camera which operates in the long-wave region of the IR
regime, the Kinect’s infrared sensor operates in the near-infrared. This is necessary
so that a distance can be calculated from the projected image, but the proximity of the near-infrared to the visible spectrum means that bright sunlight can saturate the sensor.
Figure 8.6 shows a series of images of the same scene as the sun emerges from
behind a cloud. As there is more sunlight, the infrared sensor becomes saturated and
no depth mapping can be constructed.
Although the Kinect’s depth mapping is of little use in outdoor environments
during the day, it may still be useful outside at night. However, the point-cloud library
representation will not work at night because it requires a well-illuminated webcam
image on which to overlay the depth information. An example of the usefulness of
the Kinect depth mapping at night is shown in Fig. 8.7, where the depth mapping
highlights obstacles not visible in normal webcam images.
In summary, the Kinect's depth sensor will work outside under certain circumstances. Unfortunately, the Kinect's infrared sensor will not replace the more expensive thermal imaging camera for detecting small signals from passive objects, since it operates in the near-infrared regime instead of the long-wave regime that is sensitive to such signals. However, very small and inexpensive LWIR cameras are now available as smartphone attachments.
Fig. 8.3 In an indoor environment, the Kinect is able to take a raw infrared image (top) and convert it to a corresponding depth-mapped image (middle), which overlays depth information on an RGB image. The speckle pattern barely visible on the raw infrared image is the projected infrared pattern that allows the Kinect to create this depth mapping. As shown in this mapping, darker colored surfaces, such as the desk chair on the left of the image, are closer to the sensor while lighter colors are farther away. Unexpected infrared reflectors can confuse this mapping and produce erroneous results such as the light fixture in the center of the image. The bottom image is a photograph of the same scene (with the furniture slightly rearranged) for comparison
Fig. 8.4 Instead of the two-dimensional colormapped images, the Kinect depth-mapping can be overlaid on the RGB image and exported to a point-cloud format. These point-cloud library (PCL) images contain real-world distances and allow for three-dimensional visualization on a computer screen. Examples are shown for an indoor scene (top) and an outdoor scene acquired in low-light conditions (bottom), viewed at an oblique angle to highlight the 3-D representation
8.3.3 Audio
Our main interest in updating rMary is to see how acoustic sensors could be integrated
into mobile robotics. Past work with rMary’s sibling rWilliam (Fig. 8.8) investigated
the use of air-coupled ultrasound in mobile robotics [28, 29], as have others [30–32].
The high attenuation of ultrasound in air limits the use of ultrasound scanning for
mobile robots.
Instead, we wish to study the use of low-frequency acoustic echolocation for
mobile robots. This is similar in principle to how bats navigate, though at much
lower frequencies and with much lower amplitude signals. A similar use of this
technology is found in the Sonar ruler app for the iPhone that attempts to measure
distances using the speaker and microphone, with mixed results [33]. Using signals
in the audible range reduces the attenuation, allowing for propagation over useful
distances. However, there is more background noise in the audible frequency range,
Fig. 8.5 Images acquired outdoors using the Kinect IR sensor (left) and the corresponding depth
mapped images (right) for a window (top) and tree (bottom) show the difficulties sunlight creates
for the infrared sensor
requiring the use of coded excitation signals and sophisticated signal processing
techniques to find the reflected signal in inherently noisy data.
One way to ensure that the reflected signal is primarily backscattering from the
target rather than clutter (unwanted reflections from the environment) is to create a
tightly spatially controlled beam of low-frequency sound using an acoustic parametric array. We can also use insights gleaned from simulations to improve the analysis
methods. Dieckman [2] discusses in detail a method of simulating the propagation
of the nonlinear acoustic beam produced by the acoustic parametric array and its
scattering from targets.
The properties of the acoustic parametric array have been studied in depth [34–38], and the array has been used for area denial, concealed weapons detection, and nondestructive evaluation [39–41]. In brief, the parametric array works by generating ultrasonic signals at frequencies f1 and f2, whose difference is in the audible range. As these signals propagate, the nonlinearity of air causes self-demodulation of the signal, creating signals at the sum (f1 + f2) and difference (f2 − f1) frequencies. Since absorption is proportional to the square of frequency, only the difference frequency remains as the signal propagates away from the array (Fig. 8.9).
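The frequency bookkeeping behind this self-demodulation can be checked with a toy calculation: a quadratic nonlinearity acting on two tones produces exactly the sum and difference components. The following MATLAB sketch only illustrates that trigonometric identity, not the nonlinear beam physics; the 41 kHz second primary and the 0.1 nonlinearity coefficient are made-up values.

% Toy illustration (assumed values, not the chapter's simulation): a weak
% quadratic nonlinearity applied to primaries f1 and f2 creates spectral
% lines at f2 - f1 (audible) and f1 + f2, plus harmonics near 2*f1.
fs = 400e3; t = 0:1/fs:0.05;            % 400 kHz sampling, 50 ms record
f1 = 40e3; f2 = 41e3;                   % hypothetical primaries, 1 kHz apart
p  = sin(2*pi*f1*t) + sin(2*pi*f2*t);   % linear primary field
pn = p + 0.1*p.^2;                      % add a weak quadratic term
P  = abs(fft(pn))/numel(t);
f  = (0:numel(t)-1)*fs/numel(t);
plot(f(f < 100e3), P(f < 100e3))        % lines at 1 kHz, 40/41 kHz, 80-82 kHz
xlabel('Frequency (Hz)')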
The acoustic parametric array allows for tighter spatial control of the low-frequency sound beam than a standard loudspeaker of the same size. Directivity of
Fig. 8.6 As the sun emerges from behind a cloud and sunlight increases (top to bottom), the Kinect’s
infrared sensor (left) becomes saturated and the Kinect is unable to construct a corresponding depth
mapped image (right)
a speaker depends on the ratio of the size of the speaker to the wavelength of sound
produced, with larger speakers able to create more directive low-frequency sound.
Line arrays (speakers arranged in a row) are the traditional way to create directional
low-frequency sound, but can take up a great deal of space [42]. Using nonlinear
acoustics, the acoustic parametric array is able to create directional low-frequency
sound in a normal-sized speaker, as shown in Fig. 8.10.
Fig. 8.7 The Kinect depth mapping (A) works well in nighttime outdoor environments, detecting a
light pole not visible in the illuminated RGB image (B). The image from the thermal camera (C) also
shows the tree and buildings in the background, but has a smaller field of view and lower resolution
than the raw image from the Kinect IR sensor (D) (images resized from original resolutions)
Fig. 8.8 A 50 kHz ultrasound scanner mounted on rWilliam is able to detect objects at close range
Fig. 8.9 The acoustic parametric array creates signals at two frequencies f1 and f2 in the ultrasonic range (pink shaded region). As the signals propagate away from the parametric array, the nonlinearity of air allows the signals to self-demodulate, creating signals at the sum and difference frequencies. Because attenuation is proportional to the square of frequency, the higher frequency signals attenuate more quickly, and after several meters only the audible difference frequency remains
For our tests, we have mounted the Sennheiser Audiobeam parametric array to
rMary, with power supplied directly from rMary’s battery. This commercially available parametric array uses a 40 kHz carrier signal to produce audible sound pressure
levels of 75 ± 5 dB at a distance of 4 m from the face of the transducer. The echolocation signals we use are audible in front of the transducer at distances exceeding
50 m in a quiet environment, but would not necessarily be obtrusive to pedestrians
passing through the area and are easily masked by low levels of external noise.
To record the backscattered echolocation signal, as well as ambient noise from our
environment, we use the 4-channel microphone array included in the Kinect. This
array is comprised of four spatially separated high-quality capsule microphones with
a sampling rate of 16 kHz. The array separation is not large enough to allow implementation of beamforming methods at distances of interest here. The low sampling
rate means that acoustic signals are limited to a maximum frequency content of
8 kHz.
Audio data recorded by the Kinect microphone array was compared to data
recorded using a parabolic dish microphone (Dan Gibson EPM, 48 kHz sampling
rate), whose reflector dish directs sound onto the microphone. Figure 8.11 shows that
Fig. 8.10 The acoustic beam created by the acoustic parametric array has a consistently tighter beam pattern than the physically much larger line array at low frequencies, and fewer sidelobes at higher frequencies
the microphones used in the Kinect actually perform better than the parabolic dish
microphone [43, 44]. All data used in our subsequent analysis is recorded with the
Kinect array.
8.3.4 Radar
The final sensor on rMary is a coffee can radar. A collaboration between MIT and
Lincoln Labs in 2011 produced a design for a low-cost radar system that uses two
metal coffee cans as antennas [45]. Simple amplifier circuits built on a breadboard
power low-cost modular microwave (RF) components to send and acquire signals
Fig. 8.11 Even though the 4-channel Kinect microphone array has tiny capsule microphones that only sample at 16 kHz, they provide a cleaner signal than the parabolic dish microphone with a 48 kHz sampling rate
Fig. 8.12 A low-cost coffee can radar was built and attached to rMary to test the capabilities of
radar sensors on mobile robots
through the transmit (Tx) and receive (Rx) antennas. The entire system is powered
by 8 AA batteries, which allows easy portability, and the total cost of components is
less than USD $350. Our constructed coffee can radar is shown in Fig. 8.12.
The signal processing requirements of the coffee can radar system are reduced by using a frequency-modulated continuous-wave (FMCW) design. In this setup, the radar transmits an 80 MHz chirped waveform centered at 2.4 GHz (in the ISM band). The same waveform is then used to downconvert, or “de-chirp,” the signal so that the
residual bandwidth containing all the information is small enough to digitize with a
sound card. This information is saved in .wav files and analyzed in Matlab.
The system as originally designed has 10 mW Tx power with an approximate
maximum range of 1 km and can operate in one of three modes: Doppler, range, or
Synthetic Aperture Radar (SAR). In Doppler mode the radar emits a continuous-wave signal at a given frequency. By measuring any frequency shifts in this signal,
moving objects are differentiated from stationary ones. Images from this mode show
an object’s speed as a function of time. In ranging mode, the radar signal is frequency
modulated, with the magnitude of this modulation specifying the transmit bandwidth.
This allows the imaging of stationary or slowly moving objects, and the resulting
images show distance from the radar (range) as a function of time. SAR imaging
is basically a set of ranging measurements acquired over a wide area to create a
three-dimensional representation of the radar scattering from a target [46–48].
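In ranging mode, the de-chirped beat frequency is proportional to target range: a target at range R produces a beat tone fb = 2·R·BW/(c·Tp), where BW is the sweep bandwidth and Tp the sweep time. The MATLAB sketch below illustrates this mapping under assumed values; the 44.1 kHz sound-card rate, the 20 ms sweep time, and the filename are hypothetical, with only the 80 MHz bandwidth taken from the text.

% Hedged FMCW ranging sketch (assumed sweep time and sampling rate):
% an FFT of one de-chirped sweep maps beat frequency to target range.
fs = 44.1e3;                          % sound-card sampling rate (assumed)
BW = 80e6;                            % chirp bandwidth, from the text
Tp = 20e-3;                           % sweep time (assumed)
c  = 3e8;
x  = audioread('dechirped.wav');      % hypothetical mono recording
N  = round(fs*Tp);                    % samples in a single sweep
S  = abs(fft(x(1:N).*hann(N)));       % windowed beat spectrum
f  = (0:N-1)*fs/N;                    % beat frequency axis
R  = c*f*Tp/(2*BW);                   % corresponding range axis (m)
plot(R(1:floor(N/2)), 20*log10(S(1:floor(N/2))))  % targets appear as peaks
xlabel('Range (m)'); ylabel('Magnitude (dB)')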
While SAR imaging has the greatest potential application in mobile robotics, since the robotic platform is already moving over time, here we look at ranging measurements in our feasibility tests of the radar. Figure 8.13 shows a ranging measurement
of three vehicles approaching rMary with initial detection of the vehicles at a distance
of 70 m. Since the ranging image is a color-mapped plot of time versus range, the
speed of approaching vehicles can also be calculated directly from the image data.
Fig. 8.13 The ranging image acquired using a coffee can radar shows three vehicles approaching
rMary. The vehicles’ speed can be calculated from the slope of the line
These measurements demonstrate the feasibility of this low-cost radar as a long-range sensor for mobile robotics. Since the radar signal is de-chirped to facilitate
processing with a computer sound card, these measurements may not contain information about scattering from the object, unlike the acoustic echolocation signal.
However, radar ranging measurements could provide an early detection system for a
mobile robot, detecting objects at long range before other sensors are used to classify
the object. This detection distance is dependent upon a number of parameters, the most important of which is the availability of line-of-sight to the target.
Over the course of several months, we collected data on and near the William &
Mary campus as vehicles approached rMary as shown in Fig. 8.14 at ranges of up to
50 m. The parametric array projected a narrow-beam, linear chirp signal down the
road, which backscattered from on-coming vehicles. We noted in each case whether
the on-coming vehicle was a car, SUV, van, truck, bus, motorcycle, or other.
The rMary platform allows us to investigate the capabilities and limitations of a
number of low-cost sensors in unstructured outdoor environments. A combination of
short- and long-range sensors provides a mobile robot with the most useful information about its environment. Previous work focused on passive thermal infrared and
air-coupled ultrasound as possible short-range sensor modalities. Our work looked at
the suitability of the Microsoft Kinect as a short-range active infrared depth sensor,
as well as the performance of a coffee can radar and acoustic echolocation via an
Fig. 8.14 We collected data with rMary on the sidewalk, but with traffic approaching head on
acoustic parametric array as long-range sensors for mobile robotics. While the low-cost
depth sensor on the Microsoft Kinect is of limited use in outdoor environments, the
coffee can radar has the potential to provide low-cost long-range detection capability.
In addition, the Kinect microphone array can be paired with an acoustic parametric
array to provide high-quality acoustic echolocation measurements.
8.4 Pattern Classification
So far we know that our transmitted signal is present in the backscattered reflection
from a target at distances exceeding 50 m. The hope is that this reflected signal
contains useful information that will allow us to determine the type (class) of vehicle.
Since we are using a coded signal we also expect that a time–frequency representation
of the data will prove useful in this classification process. The next step is to use
statistical pattern classification techniques to find that useful information in these
signals to differentiate between vehicles of different classes. These analyses are written to run in parallel in MATLAB on a computing cluster to reduce computation time.
8.4.1 Compiling Data
To more easily compare the large number of measurements from different classes
we organize the measured data into structures. The more than 4000 individual
measurements we have collected are spread over 5926 files including audio, radar, and
image data organized in timestamped directories. Separate plaintext files associate
each timestamp with its corresponding vehicle class. If we are to run our analysis
routines on computing clusters, providing access to this more than 3.6 GB of original
data becomes problematic. Instead we create smaller data structures containing only
the information we require. These datasets range in size from 11–135 MB for 108–
750 measurements and can easily be uploaded to parallel computing resources.
Much of the reduction in size is due to the fact that we only require access to
the audio data for these tests and can eliminate the large image files. One additional
reduction is accomplished by choosing a single channel of the 4-channel Kinect
audio data to include in the structure. The array has a small enough spacing that
all useful information is present in every channel, as seen in Fig. 8.15. Resampling
all data to the acceptable minimum rate allowed by the Nyquist–Shannon sampling
theorem further reduces the size of the data structure.
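A minimal sketch of this size reduction is shown below. The specific rates are assumptions for illustration (a 100–900 Hz chirp recorded by the 16 kHz Kinect array can be resampled to 2 kHz without losing band-limited content), and the placeholder recording stands in for a real measurement.

% Hedged sketch of the data reduction: keep one channel of the 4-channel
% Kinect audio and resample it near the Nyquist minimum for the chirp band.
fs_old = 16e3;                         % Kinect microphone sampling rate
fs_new = 2e3;                          % > 2 x 900 Hz, with guard band (assumed)
audio  = randn(5*fs_old, 4);           % placeholder 5 s, 4-channel recording
ch1 = audio(:, 1);                     % channels are redundant (Fig. 8.15)
ch1 = resample(ch1, fs_new, fs_old);   % anti-alias filter + rate reduction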
Our goal is to differentiate between vehicle classes, so it is natural to create data
structures divided by class. Since it doesn’t make sense to compare data from different
incident signals, we create these structures for a number of data groups. We have
also allowed the option to create combination classes, for example, vans, trucks, and
buses are combined into the “vtb” class. This allows vehicles with similar frontal
266
E. A. Dieckman and M. K. Hinders
Fig. 8.15 Due to the close spacing of the microphone array on the Kinect, all four channels contain
the same information. The parabolic microphone is mounted aft of the Kinect array, causing the
slight delay visible here, and is much more sensitive to noise
profiles to be grouped together to create a larger dataset to train our classifier. When
creating these structures, data is pulled at random from the entire set of possibilities.
The data structure can contain either all possible data or equal amounts of data from
each class (or combined class), which can help reduce classification errors due to
unequal data distribution. It is also important to note that, due to the difficulty of detecting individual reflections inside a signal, not every measurement inside the data structure is guaranteed to be usable. Tables 8.1 and 8.2, which list the amount of data in each group, therefore provide only an upper limit on the number of usable measurements.
8.4.2 Aligning Reflected Signals
The first step in our pattern classification process is to align the signals in time. This
is crucial to ensure that we are comparing the signals reflected from vehicles to each
other, rather than comparing a vehicle reflection to a background measurement that
contains no useful information.
Our control of the transmitted signal gives us several advantages that we can
exploit in this analysis. First, since the frequency content of the transmitted signal
is known, we can apply a bandpass filter to the reflected signal to reduce noise at
other frequencies. In some cases, this will highlight reflections that were previously
hidden in the noise floor, allowing for automated peak detection.
Table 8.1 Maximum amount of data in each classification group (overlaps possible between
groups)
Group          c     s     v    t    b    m   o
5-750 HO       520   501   52   32   15   5   1
10-750 HO      191   190   22   18   7    3   0
100-900 HO     94    88    13   7    16   1   0
250-500 HO     61    74    16   3    3    0   3
250-500 NHO    41    21    1    3    1    0   0
250-500 all    102   95    17   6    4    0   3
250-1000 HO    552   515   55   42   27   3   3
250-1000 NHO   589   410   52   28   20   2   6
250-1000 all   1141  925   107  70   47   5   9
250-comb HO    613   589   71   45   30   3   6
250-comb NHO   630   431   53   31   21   2   6
250-comb all   1243  1020  124  76   51   5   12
Table 8.2 Maximum amount of data in each classification group when binned (overlaps possible).
Group          c     s     vtb
5-750 HO       520   501   99
10-750 HO      191   190   47
100-900 HO     94    88    36
250-500 HO     61    74    22
250-500 NHO    41    21    5
250-500 all    102   95    27
250-1000 HO    552   515   124
250-1000 NHO   589   410   100
250-1000 all   1141  925   224
250-comb HO    613   589   146
250-comb NHO   630   431   105
250-comb all   1243  1020  251
More often, however, the backscattered reflection remains hidden among the noise
even after a bandpass filter is applied. In this case we obtain better results using peak
detection on the envelope signal.
To create this signal, we take our original signal f(x), which has already been bandpass filtered, and form the analytic signal f(x) + i f̂(x), where f̂(x) is the Hilbert transform of f(x). The analytic signal discards the negative frequency components created by the Fourier transform in exchange for dealing with a complex-valued function, and its magnitude traces the envelope of the waveform. The envelope signal is then constructed by applying a very lowpass filter to this magnitude. This process is shown in Fig. 8.16.
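A minimal MATLAB sketch of this envelope construction follows, using the filter quoted in the caption of Fig. 8.16 (5th-order Butterworth, fc = 20 Hz); the placeholder input and the 16 kHz rate are assumptions.

% Hedged sketch of the envelope detector described above.
fs = 16e3;                         % sampling rate (assumed)
x  = randn(5*fs, 1);               % placeholder bandpass-filtered echo
env = abs(hilbert(x));             % magnitude of the analytic signal
[b, a] = butter(5, 20/(fs/2));     % 5th-order lowpass, fc = 20 Hz (Fig. 8.16)
env = filtfilt(b, a, env);         % zero-phase smoothing gives the envelope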
Fig. 8.16 The envelope signal is created by taking the original bandpass-filtered data (top), creating the analytic signal (middle), and applying a very lowpass filter (5th order Butterworth, fc = 20 Hz) (bottom)
In some cases, even peak detection on the envelope signal will not give optimal
results. Occasionally, signals will have a non-constant DC offset that complicates the
envelope signal. This can often be corrected by detrending the signal (removing its mean). A more pressing issue is that the envelope signal is not a reliable detection
method if reflections aren’t visible in the filtered signal. Even when peaks can be
detected in the envelope signal, they tend to be very broad. As a general rule, peak
detection is less sensitive to variations in threshold as the peak grows sharper. Adding
a step to the peak detection that finds the mean value of connected points above a
certain threshold ameliorates this problem, but since the peak widths of the envelope
signal are not uniform, finding the same point on each reflected signal becomes an
issue. Some of these issues are highlighted in Fig. 8.17.
Instead, we can exploit another feature of our transmitted signal—its shape. All
of our pulses are linear frequency chirps which have well-defined characteristics
and, more importantly, maintain a similar shape even after they reflect from a target.
Fig. 8.17 Even for multiple measurements from the same stationary vehicle, the envelope (red) has
difficulty consistently finding peaks unless they are obviously visible in the filtered signal (blue).
The shifted cross-correlation (green) doesn’t have this limitation
By taking the cross-correlation of our particular transmitted signal and the reflected signal, and accounting for the time shift inherent to the process, a sharp peak is created at the time point where the reflected signal begins, which can easily be found by an automated peak detection algorithm.
Peak detection in any form requires setting a threshold at a level which reduces the
number of false peaks detected without disqualifying actual peaks. This is a largely
trial-and-error process and can easily introduce a human bias into the results. Setting
the threshold as a percentage of the signal’s maximum value will also improve the
performance of the peak detection.
Fig. 8.18 Individual reflections aren’t visible in the bandpass-filtered signal from an oncoming
vehicle at 50 m (top) or in its detrended envelope signal (middle). Cross-correlation of the filtered
signal and the 100–900 transmitted pulse (bottom) shows clear peaks at the beginning of each
reflected pulse, which can be used in an automated detection algorithm
Several problems with the automated peak detection are clearly visible in Fig. 8.18,
where a detection level set to 70% of the maximum value will only detect one of
the three separate reflections that a human eye would identify. Although we could
adjust the threshold level to be more inclusive, it would also increase the rate of
false detection and add more computational load to filter these out. Adjustment of
the threshold is also not ideal as it can add a human bias to the procedure.
Another issue is due to the shape of the correlated waveform, caused by a vehicle’s
noise increasing as it nears the microphone. The extra noise in the first part of the
signal is above the detection threshold and will lead to false detection. This is an
easier problem to solve: our algorithm will reject any peaks that are not separated by a large enough distance. A separation distance of half the length of the cut signals
reduces the rate of false detection.
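The sketch below combines these rules (shifted cross-correlation, a threshold at 70% of the maximum, and a minimum peak separation of half the cut-signal length) into one detector; the chirp parameters and the toy recording are placeholders, not the chapter's data.

% Hedged sketch of the shifted cross-correlation detector with the
% thresholding and separation rules described in the text.
fs = 16e3;
tx = chirp(0:1/fs:0.1, 100, 0.1, 900);        % placeholder 100-900 Hz pulse
rx = [zeros(1, fs) tx 0.1*randn(1, fs)];      % toy recording with one echo
[r, lags] = xcorr(rx, tx);                    % correlate replica with recording
r = abs(r(lags >= 0));                        % keep non-negative delays only
minsep = round(numel(tx)/2);                  % half the cut-signal length
[pks, locs] = findpeaks(r, 'MinPeakHeight', 0.7*max(r), ...
                           'MinPeakDistance', minsep);
t0 = (locs - 1)/fs;                           % start times of detected echoes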
It is also important to note the position of the sensors on the robotic platform at
this point. If we were using a regular loudspeaker, the transmitted pulse would be
recorded along with the reflected signal and the cross-correlation would detect both
Fig. 8.19 For the simpler case of a stationary van at 25 m, the backscatter reflection is clearly
visible once it has been bandpass filtered (blue). Automated peak detection may correctly find the
peaks of the envelope signal (red), but is much more sensitive to the threshold level than the shifted
cross-correlation signal (green) due to the sharpness of its peaks
signals, complicating the detection process. Mounting the microphone array behind
the speaker could help, but care would have to be taken with speaker selection. Since the acoustic parametric array transmits an ultrasonic signal that only self-demodulates into audible sound as it propagates, the audible signal develops at distances beyond the position of the microphone array.
The three detection methods (filtered signal, envelope signal, and shifted cross-correlation) are summarized in Fig. 8.19, which uses the simplified situation of data
from a stationary vehicle at 25 m to illustrate all three methods. For our analysis we
will use the shifted cross-correlation to align the signals in time.
8.4.3 Feature Creation with DWFP
We use the dynamic wavelet fingerprint (DWFP) to represent our time-domain
waveforms in a time-scale domain. This analysis has proven useful in past work
to reveal subtle features in noisy signals [4–9] by transforming a one-dimensional,
time-domain waveform to a two-dimensional time-scale image. An example of the
DWFP process is shown in Fig. 8.20 for real-world data.
Fig. 8.20 A one-second long acoustic signal reflected from a bus (top) is filtered (middle) and
transformed into a time-scale image that resembles a set of individual fingerprints (bottom). This
image is a pre-segmented ternary image that can easily be analyzed using existing image processing
algorithms
The main advantage of the DWFP process is that the output is a pre-segmented
image that can be analyzed using existing image processing techniques. We use these techniques to create a number of one-dimensional parameter waveforms that describe the image and, by extension, our original signal. This analysis yields approximately 25 parameter waveforms.
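The following MATLAB sketch illustrates the fingerprint-image idea in highly simplified form; it is not the authors' implementation. It uses the legacy cwt(signal, scales, wavelet) interface, the 15 slices of thickness 0.03 quoted later in the text, and regionprops measures such as FilledArea and Orientation as examples of image-derived parameters; the input signal and scale range are placeholders.

% Hedged sketch of a DWFP-style ternary fingerprint image (illustrative
% only): slice the normalized CWT magnitude into thin bands, then measure
% the resulting blobs with standard image processing tools.
fs = 16e3;
x  = chirp(0:1/fs:0.25, 100, 0.25, 900);      % placeholder filtered signal
C  = cwt(x, 1:50, 'db3');                     % legacy CWT: 50 scales, db3
C  = C/max(abs(C(:)));                        % normalize to [-1, 1]
nslice = 15; thick = 0.03;                    % slice settings from the text
fp = zeros(size(C), 'int8');                  % ternary fingerprint image
for k = 1:nslice
    lev  = k/nslice;                          % slice level in (0, 1]
    band = abs(C) >= lev - thick & abs(C) <= lev;
    fp(band) = int8(sign(C(band)));           % -1/0/+1 ridges
end
stats = regionprops(fp > 0, 'FilledArea', 'Orientation');  % example measures

Tracking such measures column by column along the time axis is, in spirit, what turns the fingerprint image into the one-dimensional parameter waveforms described next.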
As an overview, our feature extraction process takes a time-domain signal and
applies a bandpass filter. A pre-segmented fingerprint image is created using the
DWFP process, from which a number of one-dimensional parameter waveforms are
extracted. In effect, our original one-dimensional time-domain signal is now represented by multiple parameter waveforms. Most importantly, the time axis is maintained throughout this process so that features of the parameter waveform are directly
correlated to events in the original time-domain signal. A visual representation of
the process is shown in Fig. 8.21.
The user has control of a large number of parameters in the DWFP creation and feature extraction process, and these greatly affect the appearance of the fingerprint images, and thus the extracted features. The parameters that most affect the fingerprint image are the wavelets used for pre-filtering and for performing the continuous
wavelet transform to create the DWFP image. A list of candidate wavelets is shown in
Table 8.3. However, there is no way to tell a priori which combination of parameters
will create the ideal representation for a particular application. We use a computing
cluster to run this process in parallel for a large number of parameter combinations,
combined with past experience with analysis of DWFP images to avoid an entirely
brute force implementation.
Fig. 8.21 A one-second long 100–900 backscatter signal is bandpass filtered and converted to a
ternary image using the DWFP process. Since the image is pre-segmented it is easy to apply existing
image analysis techniques and create approximately 25 one-dimensional parameter waveforms that
describe the image. Our original signal is now represented by these parameter waveforms, three
examples of which are shown here (ridge count, filled area, and orientation). Since the time axis is
maintained throughout the entire process, features in the parameter waveforms are directly correlated
to events in the original time-domain signal
8.4.4 Intelligent Feature Selection
Now that each original one-dimensional backscatter signal is represented by a set
of continuous one-dimensional parameter waveforms, we need to determine which
will best differentiate between different vehicles. The end goal is to create a low-dimensional feature vector for each original backscatter signal which contains the
value of a parameter waveform at a particular point in time. By choosing these time
Table 8.3 List of usable wavelets. For those wavelet families with multiple representations (db,
sym, and coif), the default value used is shown
Name            Matlab name   Prefiltering   Transform
Haar            haar          X              X
Daubechies      db3           X              X
Symlets         sym5          X              X
Coiflets        coif3         X              X
Meyer           meyr                         X
Discrete meyer  dmey                         X
Mexican hat     mexh                         X
Morlet          morl                         X
points correctly, we create a new representation of the signal that is much more information-dense than the original signal. This feature vector compactly describes the original signal and can be used in statistical pattern classification algorithms to classify the data in seconds.
For this analysis we are using a variant of linear discriminant analysis to find the points in time where the parameter waveform has the greatest separation between different classes, but also where signals of the same class have a small variance. For each parameter waveform, all of the signals from a single class are averaged to create a mean and corresponding standard deviation signal. Comparing the mean signals to each other and keeping a running average of the difference allows us to create an overall separation distance signal (δ), while a measure of the variance between signals of the same class comes from the maximum standard deviation of all signals (σ). Instead of using iterative methods to simultaneously maximize δ and minimize σ, we create a ratio signal ρ = δ/σ and find its maxima (Fig. 8.22).
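A compact sketch of this selection rule is given below for a single parameter waveform; the matrix sizes and placeholder data are assumptions, and the pairwise-mean-difference form of δ is our reading of the "running average of the difference" described above.

% Hedged sketch of the feature selection: W is (nSignals x nTime) samples
% of one parameter waveform; labels holds the class of each row.
W = randn(60, 4000); labels = repelem((1:3)', 20);   % placeholder data
classes = unique(labels); nc = numel(classes);
mu = zeros(nc, size(W, 2)); sd = mu;
for k = 1:nc
    rows = labels == classes(k);
    mu(k, :) = mean(W(rows, :), 1);          % class-mean signal
    sd(k, :) = std(W(rows, :), 0, 1);        % class standard deviation
end
delta = zeros(1, size(W, 2)); npair = 0;
for i = 1:nc-1                               % running average of pairwise
    for j = i+1:nc                           % differences of the means
        delta = delta + abs(mu(i, :) - mu(j, :)); npair = npair + 1;
    end
end
delta = delta/npair;                         % separation distance signal
sigma = max(sd, [], 1);                      % max intraclass std at each time
rho   = delta./max(sigma, eps);              % joint separation curve
[~, order] = sort(rho, 'descend');
best = order(1:5);                           % keep the top 5-10 time points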
We save the time point and value of ρ of the top 5–10 points for each extracted
feature. When this process has been completed for all parameter waveforms, this list
is sorted based on decreasing ρ and reduced to the top 25–50 points, keeping track of
both points and feature name. Restricting the process in this manner tends to create
a feature vector with components from many of the features, as shown in Fig. 8.23.
The number of top points saved for both steps is a user parameter, shown in Table 8.4
and restricted to mitigate the curse of dimensionality.
Feature vectors can then be created for each original backscatter signal by taking
the value of the selected features at the given points. This results in a final, dense
feature vector representation for each original signal.
The entire pattern classification process for data from three classes is summarized
in Fig. 8.24.
Fig. 8.22 For each of the parameter waveforms (FilledArea shown here), a mean value is created
by averaging all the measurements of that class (top). The distance between classes is quantified
by the separation distance (middle left) and tempered by the intraclass variance, represented by the
maximum standard deviation (middle right). The points that are most likely to separate the classes
are shown as the peaks of the joint separation curve (bottom)
8.4.5 Statistical Pattern Classification
The final step in our analysis is to test the ability of the feature vector to differentiate
between vehicle classes using various pattern classification algorithms. This is often the most time-consuming step in the process, but because we have used intelligent feature selection to create a small, optimized feature vector, it is here the fastest step of the entire process and can be completed in seconds on a desktop machine. Of
course, we have simply shifted the hard computational work that requires a computing
cluster to the feature selection step. That is not to say that there are no advantages to
doing the analysis this way—having such small feature vectors allows us to easily
test a number of parameters of the classification.
Before we can run pattern classification routines we must separate our data into
training and testing (or validation) datasets. By withholding a subset of the data for
testing the classifier’s performance, we can eliminate any “cheating” that comes from
using training data for testing. We also use equal amounts of data from each class
for both testing and training to eliminate bias from unequal-sized datasets.
Fig. 8.23 The list of top features selected for 100–900 (left) and 250–500 (right) datasets illustrate
how features are chosen from a variety of different parameter waveforms
Table 8.4 List of user parameters in feature selection
Setting        Options          Description
Peakdetect     Joint, separate  Method to choose top points
Viewselected   Binary switch    View selected points
Selectnfeats   Z+               Keep this many top points for each feature
Topnfeats      Z+               Keep this many top points overall
Our classification routines are run in MATLAB, using a number of standard classifiers included in the PRTools toolbox [14]. Because of our small feature vectors and short classification run time, we run the pattern classification many times, randomizing the data used for testing and training on each run. This gives us an
average classification performance and allows us to use standard deviation of correct
classifications as a measure of classification repeatability. While this single-valued
metric is useful in comparing classifiers, more detailed information about classifier
performance will come from the average confusion matrix. For n classes, this is an
n × n matrix that plots the estimated class against the known class. The confusion
matrix for a perfect classification would resemble the identity matrix, with values of
1 on the diagonal and 0 elsewhere. An example of a confusion matrix is shown in
Fig. 8.25.
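A minimal sketch of this evaluation loop using the PRTools API is shown below; the feature matrix, labels, split fraction, number of runs, and choice of ldc are placeholders (gendat, testc, labeld, and confmat are standard PRTools calls, though exact usage may differ across toolbox versions).

% Hedged PRTools sketch: repeated random train/test splits, averaging the
% error and the confusion matrix over runs (placeholder data and settings).
F = randn(90, 30); lab = repelem((1:3)', 30);   % placeholder feature vectors
a = prdataset(F, lab);            % labeled dataset ('dataset' in PRTools 4)
nruns = 20; err = 0; C = 0;
for r = 1:nruns
    [tr, te] = gendat(a, 0.6);    % random split, 60% for training
    w = ldc(tr);                  % train a linear discriminant classifier
    err = err + testc(te*w)/nruns;                       % mean error rate
    C = C + confmat(getlabels(te), labeld(te*w))/nruns;  % mean confusion matrix
end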
Fig. 8.24 For each class, every individual measurement is filtered and transformed to a fingerprint image, from which a number of parameter waveforms are
extracted. For each of these parameter waveforms, an average is created for each class. A comparison of these average waveforms finds the points that are best
separated between the classes, and the feature vector is compiled using the values of the parameter waveform at these points. This image diagrams the process
for sample data from three classes (blue, green, and red)
Fig. 8.25 This example of a real confusion matrix shows good classification performance, with
high values on the diagonal and low values elsewhere. The confusion matrix allows more detailed
visualization of the classification performance for specific classes than the single-value metric of
overall percent correct. For example, although this classification has a fairly high accuracy of 86%
correct, the confusion matrix shows that most of the error comes from misclassifying SUVs into
the van/truck/bus class [udc classifier, 20 runs]
8.5 Results
We illustrate the use of the pattern classification analyses on data collected from both
stationary and oncoming vehicles. Due to the similar frontal profiles of vans, trucks,
and buses, and to mitigate the small number of observations recorded from these
vehicles, we will create a combined class of these measurements. The classes for
classification purposes are then “car”, “SUV”, and “van/truck/bus”. For this three-class problem, a classification accuracy of greater than 33% means the classifier is
performing better than random guessing.
8.5.1 Proof-of-Concept: Acoustic Classification of Stationary Vehicles
We begin our analysis with data collected from stationary vehicles. The first test is
a comparison of classification performance when observations come from a single
vehicle as compared to multiple vehicles. Multiple observations were made from
vehicles in a parking lot at distances between 5 and 20 m. The orientation is approximately head-on (orthogonal) but with slight repositioning after every measurement
to construct a more realistic dataset. The classification accuracy shown in Fig. 8.26
validates our expectation that classification performs better when observations are
exclusively from a single vehicle rather than from a number of different vehicles.
Fig. 8.26 The first attempt to classify vehicles based on their backscattered acoustic reflection from a 250–1000 transmitted signal shows very good classification accuracy when all observations come from a single vehicle at 20 m (top). When observations come from multiple vehicles at 5 m, the classification accuracy is much lower but still better than random guessing (bottom) [knnc classifier, 10 runs]
We see that reasonable classification accuracies can be achieved even when the
observations come from multiple vehicles. Optimizing the transmitted signal and
recognizing the importance of proper alignment of the reflected signals will help
improve classification performance, as seen in Fig. 8.27. Here, data is collected from
multiple stationary vehicles at a range of short distances between 10 and 25 m using
both a 10–750 and 250–1000 transmitted signal. Classification performance seems
slightly better for the shorter chirp-length signal, but real improvement comes from
ensuring the signals are aligned in time. For this dataset, alignment was ensured by
visual inspection of all observations in the 250–1000 dataset. This labor-intensive
manual inspection has been replaced by the cross-correlation methods described earlier in the analysis of data from oncoming vehicles.
While these initial tests exhibit poorer classification performance than the data
from oncoming vehicles, it is worth noting that these datasets consist of relatively few
observations and are intended as a proof-of-concept. These stationary datasets were
used to optimize the analysis procedure for the more interesting data from oncoming vehicles. For example, alignment algorithms weren’t yet completely developed,
and the k-nearest neighbor classifier used to generate the above confusion matrices has proven to have consistently worse performance than the classifiers used for
Fig. 8.27 Observations from multiple stationary vehicles at distances of 10–25 m show approximately equal classification performance for both the 10–750 (left) and 250–1000 (right) transmitted signals. Manual inspection of the observations in the 250–1000 dataset to ensure clear visual alignment leads to markedly improved classification performance (bottom) [knnc, 10 runs]
the results shown from oncoming vehicles. Nevertheless, we see that better-than-random-guessing classification accuracy is possible using only the acoustic echolocation signal.
8.5.2 Acoustic Classification of Oncoming Vehicles
Now that we have seen that it is possible to classify stationary vehicles using only
the reflected acoustic echolocation signal, the more interesting problem is trying to
classify oncoming vehicles at greater distances.
Since the DWFP feature creation process has a large number of user parameters,
the first step was to find which few parameters will give us the best classification
performance. This reduces the parameter space in our analysis and allows us to focus
on more interesting details of the classification, such as the effect of the transmitted
signal on the classification accuracy. Through previous work we have seen that the
choice of wavelet in both the prefiltering and transform stages of the DWFP process causes the greatest change in the fingerprint appearance, and thus in the features
extracted from the fingerprint.
Table 8.5 shows the classification accuracy for different prefiltering wavelets,
while Table 8.6 shows the classification accuracy for different transform wavelets. In
both cases, the dataset being classified is from vehicles at 50 m approaching head-on
using the 100–900 transmitted signal. The settings for other user parameters are:
filtering at 5 levels, removing the first 5 details, 15 slices of thickness 0.03, and
removing fingerprints that do not have a solidity in the range 0.3–0.6. The mean
correct classification rate is created from 20 classification runs for each classifier.
These are the parameters for the rest of our analysis unless noted otherwise.
The choice of prefiltering wavelet does not affect the classification accuracy much.
The variance measure (given by the standard deviation of repeated classifications)
is not shown, but is consistently around 0.07 for all classifiers. With this knowledge,
there is no obvious preference of prefiltering wavelet and we chose coif3 for further
analysis.
The choice of transform wavelet does seem to affect the classification accuracy
somewhat more than the choice of the prefiltering wavelet. Still, the choice of classifier is by far the most important factor in classification accuracy. We select db3 as
the default transform wavelet, with dmey and sym5 as alternatives.
From this analysis we can also select a few good classifiers for our problem.
Pre-selecting classifiers runs against the No Free Lunch theorem, which states that we should not prefer one classifier over another a priori, but because the underlying physical situation is similar across all of our datasets we are justified in selecting a small
number of well-performing classifiers. We will use the top five classifiers from our
initial analysis: nmsc, perlc, ldc, fisherc, and udc. The klldc and pcldc classifiers also
performed well, but since they are closely related to ldc, we choose other classifiers
for diversity and to provide a good mix of parametric and non-parametric classifiers.
The effect that physical differences have on classification accuracy can now be explored in more depth, using coif3 as the prefiltering wavelet and db3 as the transform wavelet, with the nmsc, perlc, ldc, fisherc, and udc classifiers.
Table 8.7 shows the datasets constructed for the following analyses. All datasets
are constructed from data pulled at random from the overall datasets from that particular signal type and contain an equal number of instances for each class. Most of
the datasets consist of three classes: car (c), SUV (s), and a combined van/truck/bus
(vtb), though a few datasets with data from all five classes (car (c), SUV (s), van (v), truck (t), and bus (b)) were created to attempt this individual classification. Requiring an equal number of instances from each class leads to small datasets, even after
creating the combined van/truck/bus class to mitigate this effect. In addition, not all
Table 8.5 A comparison of prefiltering wavelet (PW) choice on classification accuracy (rows: classifier; columns: PW). The transform wavelet is db3. Data is from the 100–900 approaching vehicle dataset with a train/test ratio of 0.7 and classification into three classes (c, s, vtb). The differences in performance between prefiltering wavelets fall within the measure of variance for a single classifier (not shown here for reasons of space). Since there seems to be no preferred prefiltering wavelet, future analysis will use coif3

Classifier   haar   db3    sym5   coif3  Average
qdc          0.40   0.49   0.41   0.55   0.46
udc          0.81   0.78   0.81   0.85   0.81
ldc          0.83   0.80   0.81   0.77   0.80
klldc        0.78   0.76   0.81   0.80   0.79
pcldc        0.79   0.78   0.81   0.79   0.79
nmc          0.66   0.57   0.54   0.60   0.59
nmsc         0.87   0.86   0.85   0.89   0.87
loglc        0.75   0.69   0.73   0.77   0.73
fisherc      0.82   0.79   0.79   0.80   0.80
knnc         0.56   0.59   0.54   0.57   0.56
parzenc      0.63   0.58   0.52   0.59   0.58
parzendc     0.73   0.78   0.72   0.69   0.73
kernelc      0.59   0.57   0.58   0.66   0.60
perlc        0.86   0.80   0.84   0.86   0.84
svc          0.67   0.72   0.73   0.72   0.71
nusvc        0.70   0.68   0.69   0.71   0.69
treec        0.51   0.49   0.54   0.47   0.50
Average      0.70   0.69   0.69   0.71   0.70
Table 8.6 A comparison of transform wavelet (TW) choice on classification accuracy shows very similar classification performance for many wavelet choices (rows: classifier; columns: TW). The prefiltering wavelet is coif3. Data is from the 100–900 approaching vehicle dataset with train/test ratio of 0.7 and classification into three classes (c, s, vtb). Due to space constraints, the variance is not shown

Classifier   haar   db3    sym5   coif3  meyr   dmey   mexh   morl   Average
qdc          0.45   0.55   0.55   0.45   0.43   0.43   0.47   0.47   0.47
udc          0.68   0.85   0.77   0.77   0.65   0.84   0.64   0.68   0.73
ldc          0.73   0.77   0.78   0.68   0.77   0.82   0.76   0.82   0.77
klldc        0.67   0.80   0.79   0.68   0.77   0.80   0.77   0.82   0.76
pcldc        0.68   0.79   0.81   0.67   0.77   0.82   0.73   0.82   0.76
nmc          0.42   0.60   0.53   0.49   0.51   0.50   0.55   0.52   0.51
nmsc         0.81   0.89   0.86   0.76   0.86   0.88   0.78   0.84   0.83
loglc        0.59   0.77   0.72   0.64   0.66   0.81   0.63   0.72   0.69
fisherc      0.66   0.80   0.82   0.64   0.80   0.80   0.76   0.83   0.76
knnc         0.47   0.57   0.53   0.49   0.53   0.47   0.57   0.51   0.52
parzenc      0.47   0.59   0.54   0.51   0.61   0.50   0.51   0.51   0.53
parzendc     0.67   0.69   0.73   0.68   0.68   0.71   0.63   0.76   0.69
kernelc      0.54   0.66   0.61   0.55   0.62   0.54   0.61   0.51   0.58
perlc        0.77   0.86   0.80   0.75   0.77   0.88   0.73   0.85   0.80
svc          0.61   0.72   0.71   0.65   0.69   0.71   0.64   0.70   0.68
nusvc        0.63   0.71   0.66   0.63   0.65   0.62   0.63   0.65   0.65
treec        0.48   0.47   0.49   0.56   0.47   0.52   0.44   0.51   0.49
Average      0.61   0.71   0.69   0.62   0.66   0.69   0.64   0.68   0.70
Table 8.7 A survey of the datasets used in this analysis (in order of appearance) shows the number
of classes, total number of instances, and distance at which data was acquired. The small size of the
datasets is a direct result of requiring the datasets to have an equal number of instances per class
and the relatively few observations from vans, trucks, and buses
Dataset        Classes  Instances  Distance (m)
100–900 a      3        108        50
100–900 b      3        108        50
100–900 c      3        108        50
250-comb a     3        251        25, 30, 50
250-comb b     3        251        25, 30, 50
250-comb c     3        251        25, 30, 50
250–500        3        66         30, 50
250–1000       3        66         < 25, 25, 30, 60
250-comb HO    3        98         50
250-comb NHO   3        98         25, 30
5–750          3        108        50
10–750         3        108        50
100–900        5        35         50
Table 8.8 Increasing the amount of data used for training increases the classification accuracy, but
reduces the amount of available data for validation. As too much of the available data is used for
training the classifier becomes overtrained and the variance of the accuracy measurement increases.
Data is from the 100–900 approaching vehicle dataset with classification into three classes (c, s,
vtb)
Train %  nmsc         perlc        ldc          fisherc      udc          Avg
0.25     0.82 ± 0.06  0.77 ± 0.06  0.52 ± 0.08  0.43 ± 0.09  0.72 ± 0.08  0.65
0.5      0.89 ± 0.05  0.85 ± 0.04  0.69 ± 0.07  0.74 ± 0.08  0.80 ± 0.05  0.79
0.6      0.89 ± 0.03  0.84 ± 0.06  0.76 ± 0.06  0.75 ± 0.06  0.82 ± 0.06  0.81
0.7      0.90 ± 0.06  0.87 ± 0.05  0.81 ± 0.06  0.79 ± 0.06  0.83 ± 0.06  0.84
0.8      0.91 ± 0.06  0.89 ± 0.06  0.79 ± 0.11  0.80 ± 0.10  0.82 ± 0.07  0.84
0.9      0.91 ± 0.08  0.86 ± 0.12  0.82 ± 0.13  0.82 ± 0.12  0.88 ± 0.10  0.86
of the instances are usable due to the difficulty of detecting and aligning the signals.
This is especially true for the 250 ms signals.
We will first look at the influence of the train/test ratio on the classification performance. Table 8.8 shows the classification accuracy as a function of train/test ratio
for the same 100–900 dataset used in our earlier analysis of wavelets and classifiers
shown in Tables 8.5 and 8.6. In general, the classifiers are able to perform well even
when only 25% of the dataset is used for training, with the notable exception of the
fisherc classifier. Classification accuracy increases with increasing training ratio, but
when too much of the dataset is used for training (90% here) not enough data is
Table 8.9 Classification of datasets whose instances are selected at random from the larger dataset
containing all possible observations shows repeatable results for both 100–900 and 250-comb data.
The overall lower performance of the 250-comb datasets is likely due to the greater variety in the
observations present in this dataset
Dataset      nmsc         perlc        ldc          fisherc      udc          Avg
100–900: a   0.81 ± 0.04  0.84 ± 0.05  0.78 ± 0.05  0.78 ± 0.06  0.76 ± 0.07  0.79
100–900: b   0.84 ± 0.06  0.76 ± 0.09  0.69 ± 0.08  0.74 ± 0.08  0.71 ± 0.08  0.75
100–900: c   0.86 ± 0.05  0.84 ± 0.05  0.79 ± 0.07  0.77 ± 0.06  0.72 ± 0.05  0.77
250-comb: a  0.60 ± 0.06  0.55 ± 0.06  0.56 ± 0.06  0.56 ± 0.03  0.53 ± 0.04  0.56
250-comb: b  0.58 ± 0.05  0.55 ± 0.06  0.56 ± 0.06  0.54 ± 0.03  0.56 ± 0.05  0.56
250-comb: c  0.52 ± 0.06  0.45 ± 0.05  0.51 ± 0.06  0.50 ± 0.05  0.50 ± 0.04  0.50
available for validation and the variance of the classification accuracy increases. A
more in-depth look at this phenomenon comes from the confusion matrices, shown
in Fig. 8.28. For future analysis we choose a train/test ratio of 0.6 to ensure we have
enough data for validation, with the caveat that our classification performance could
be a few points higher if we used a higher training ratio.
8.5.2.1 Repeatability of Classification Results
Since our code creates a dataset for classification by randomly selecting observations
from a given class from among all the total possibilities, we would expect some
variability between these separate datasets. Table 8.9 shows the classification results
for three datasets compiled from all available data from the 100–900 and 250-comb
overall datasets.
Classification performance is similar among both the 100–900 and 250-comb
datasets. The 250-comb dataset has an overall lower classification performance, likely
due to the greater variety of observations present in the dataset. The 250-comb data is
a combination of 250–500 and 250–1000 data, created with the assumption that the
time between chirps is less important than the length of the chirp. A comparison of
classification performance of all the 250 ms chirps, shown in Table 8.10 and Fig. 8.29, calls this into question.
While it is possible that this particular 250-comb dataset that was randomly
selected from the largest and most diverse dataset just suffered from bad luck, there
is clearly a difference in classification performance between the 250–500/250–1000
datasets and the combined 250-comb dataset. This leads us to believe that the entire
transmitted signal, including the space between chirps, is important in defining a
signal, rather than just the chirp length.
That said, both the 250–500 and 250–1000 datasets exhibit good classification
performance, albeit with large variances. These large variances are caused by the
relatively small number of useful observations in each dataset. Figure 8.30 shows
Fig. 8.28 This example of the classification of 98 instances pulled from the 100–900 dataset
shows how increasing the training ratio first improves classification performance and then increases
variance due to overtraining and a lack of data to validate the classifier. The mean confusion matrix
is shown for the fisherc classifier at 25% training data (top), 60% train (middle), and 90% train
(bottom)
Fig. 8.29 A comparison of the confusion matrices (perlc classifier) of the 250–500 (top), 250–1000
(middle), and 250-comb (bottom) datasets shows that the non-combined datasets have a much higher
classification performance than the 250-comb dataset. Both the 250–500 and 250–1000 datasets
are small, with 17 and 19 total instances, respectively, compared with the 215 total instances of the
250-comb dataset. The classification performance of the 250–1000 dataset is almost perfect
Table 8.10 A comparison of classification performance between the 250 datasets shows that the
spacing between chirps is important in defining our transmitted signal
Dataset      nmsc         perlc        ldc          fisherc      udc          Avg
250–500      0.92 ± 0.09  0.85 ± 0.12  0.72 ± 0.16  0.75 ± 0.14  0.70 ± 0.15  0.79
250–1000     0.99 ± 0.03  0.98 ± 0.05  0.56 ± 0.22  0.68 ± 0.19  0.97 ± 0.07  0.84
250-comb: a  0.60 ± 0.06  0.55 ± 0.06  0.56 ± 0.06  0.56 ± 0.03  0.53 ± 0.04  0.56
Table 8.11 A comparison of 250-comb data acquired in a “head-on” orientation and data collected
at a slight angle shows that both orientations have a similar classification performance.
Orientation  nmsc         perlc        ldc          fisherc      udc          Avg
Head-on      0.64 ± 0.08  0.56 ± 0.11  0.54 ± 0.11  0.52 ± 0.09  0.65 ± 0.12  0.58
Oblique      0.63 ± 0.09  0.56 ± 0.07  0.57 ± 0.10  0.55 ± 0.12  0.58 ± 0.08  0.58
an example of how these variances play out and why the confusion matrices are
so important to understanding the classification results. Here, both the ldc and udc
classifiers have an average overall correct rate of around 70%, but the udc classifier
has difficulty correctly classifying the van/truck/bus class.
This example also illustrates the importance of having large datasets to create
training and testing sets with a sufficient number of observations for each class. Both
the 250–500 and 250–1000 datasets used here have fewer than 10 instances per class,
meaning that even at a 60% training percentage, the classification can only be tested
on a few instances.
We are forced to use these small datasets in this situation because our automated
detection routine has a good deal of difficulty locating peaks from the 250 ms signals. The
detection rate of this 250–500 dataset is 26% and the rate for the 250–1000 dataset
is 29%. This is compared to a detection rate of 89% for the 100–900 signal. For
this reason, and reasons discussed earlier, the 100–900 signal remains our preferred
transmission signal.
8.5.2.2 Head-on Versus Oblique Reflections
Another useful comparison is between data acquired “head-on” and at a slight angle.
Results from stationary vehicles confirm that the recorded signal contains reflected
pulses, and Table 8.11 shows that both datasets have a similar classification performance. With an average correct classification rate of 58% for both, this 250-comb
data isn’t an ideal dataset for reasons that we discussed above, but was the only
dataset containing observations from both orientations.
Fig. 8.30 A comparison of the confusion matrices from the ldc (top) and udc (bottom) classifiers
on data from the 250–500 dataset highlights the importance of using the extra information present
in the confusion matrix. Both classifiers have a mean overall correct rate of approximately 70%, but
the udc classifier has a much more difficult time classifying vans/trucks/buses into the correct class
8.5.2.3 Comparison of All Input Signals and Classification into Five Classes
Finally, we make an overall comparison between the different incident signals for
the three-class problem, and make an attempt at classifying the data from one dataset
into the five original, non-grouped classes.
Table 8.12 A comparison of datasets from all of the transmitted signal types shows very good
classification performance for all classifiers except ldc and fisherc. Removing these classifiers
gives the average in the last column (Avg2). The 100–900 and 250 datasets have been discussed
previously in more detail and are included here for completeness. The final row shows the lone
five-class classification attempt, with only seven instances per class

Signal       nmsc          perlc         ldc           fisherc       udc           Avg    Avg2
5–750        0.98 ± 0.05   0.99 ± 0.02   0.53 ± 0.19   0.54 ± 0.14   0.96 ± 0.07   0.80   0.98
10–750       0.96 ± 0.06   0.94 ± 0.04   0.61 ± 0.13   0.50 ± 0.17   0.88 ± 0.09   0.78   0.93
100–900      0.89 ± 0.06   0.86 ± 0.05   0.77 ± 0.07   0.80 ± 0.09   0.85 ± 0.06   0.83   0.87
250–500      0.92 ± 0.09   0.85 ± 0.12   0.72 ± 0.16   0.75 ± 0.14   0.70 ± 0.15   0.79   0.82
250–1000     0.99 ± 0.03   0.98 ± 0.05   0.56 ± 0.22   0.68 ± 0.19   0.97 ± 0.07   0.84   0.98
100–900 5C   0.94 ± 0.09   0.92 ± 0.08   0.47 ± 0.14   0.45 ± 0.19   0.82 ± 0.15   0.72   0.89
Table 8.12 shows the results from these comparisons. Even though our reflection
detection algorithm has difficulties with both the 5–750 and 10–750 datasets (as well
as with the 250 datasets as discussed earlier) and can only detect the reflection in
25% of the observations, we get good classification performance. The ldc and fisherc
classifiers give anomalously low mean overall classification performance rates with
high variance. Removing these classifiers, we can calculate an average performance
for the remaining three classifiers, shown in the last column of Table 8.12. With
mean overall classification rates ranging from 82 to 98%, we can't say much about
one signal being preferred to another, except that our algorithms are able to detect
the reflections in the 100–900 signal best.
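As a sanity check on the bookkeeping, the Avg and Avg2 entries of Table 8.12 can be reproduced directly from the classifier rates; the Python snippet below does the arithmetic for the 250–1000 row.

# Mean overall correct rates for the 250-1000 row of Table 8.12.
rates = {"nmsc": 0.99, "perlc": 0.98, "ldc": 0.56, "fisherc": 0.68, "udc": 0.97}

avg = sum(rates.values()) / len(rates)  # all five classifiers
keep = {k: v for k, v in rates.items() if k not in ("ldc", "fisherc")}
avg2 = sum(keep.values()) / len(keep)   # drop the two anomalous classifiers

print(f"Avg = {avg:.2f}, Avg2 = {avg2:.2f}")  # Avg = 0.84, Avg2 = 0.98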
Even with the severely limited data available for classification into five classes
(only 7 instances per class), we surprisingly find good classification performance,
with an average classification rate of 89%. The best (nmsc at 94%) and worst (udc
at 82%) classifiers for this data are shown in Fig. 8.31.
8.6 Conclusions
We have shown that oncoming vehicles can be classified with a high degree of
accuracy, and at useful distances, using only reflected acoustic pulses. Finding and
aligning these reflected signals is a nontrivial, vital step in the process, but one that
can be successfully automated, especially if the transmitted signal is optimized to
the application. Useful feature vectors that differentiate between vehicles of different
Fig. 8.31 Classification of 100–900 data into five classes is only possible with a limited dataset of
seven instances per class, but manages mean classification performance rates of 82% for the udc
classifier (top) and 94% for the nmsc classifier (bottom)
classes can be formed by using the Dynamic Wavelet Fingerprint to create alternative time–frequency representations of the reflected signal, and intelligent feature
selection algorithms create information-dense representations of our data that allow
for very fast and accurate classification.
We have found that a 100–900 linear chirp transmitted signal is best optimized
for this particular problem. The signal contains enough energy to propagate long
distances, while remaining compact enough in time to allow easy automatic detection.
With this signal we can consistently attain correct overall classification rates upwards
of 85% at distances of 50 m.
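For readers who want to experiment with such signals, the sketch below generates a 100–900 Hz linear chirp and locates it in a recording by matched filtering. The 10 ms duration and 44.1 kHz sample rate are illustrative assumptions rather than the parameters from our experiments, and cross-correlation against the known transmit signal is a standard detection approach, not necessarily the exact routine used in this chapter.

import numpy as np
from scipy.signal import chirp

fs = 44100                    # sample rate in Hz (assumed)
T = 0.010                     # chirp duration in seconds (assumed)
t = np.arange(0, T, 1 / fs)
tx = chirp(t, f0=100, f1=900, t1=T, method="linear")  # 100-900 Hz sweep

def detect(rx, template=tx):
    """Return the sample index where the template best aligns with rx."""
    corr = np.correlate(rx, template, mode="valid")
    return int(np.argmax(np.abs(corr)))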
We have investigated a number of sensor modalities that may be appropriate
for mobile walking-speed robots operating in unstructured outdoor environments. A
combination of short- and long-range sensors is necessary for a robot to capture usable
data about its environment. Our prior work had focused on passive thermal infrared
and air-coupled ultrasound as possible short-range sensor modalities. This work
looked at the suitability of the Microsoft Kinect as a short-range active infrared depth
sensor, as well as the performance of a coffee can radar and acoustic echolocation
via acoustic parametric array as long-range sensors for mobile robotics.
In particular, we have demonstrated that the most exciting feature of the Microsoft
Kinect, a low-cost depth sensor, is of limited use in outdoor environments. The active
illumination source in the near infrared is both limited to a range of several meters
and easily saturated by sunlight so that it is mostly useful in nighttime outdoor
environments. The infrared sensor is tuned to this near infrared wavelength and
provides little more information than the included RGB webcam.
The Kinect 4-channel microphone array proved to be of high quality. The microphones are not spatially separated enough to allow for implementation of beamforming
methods at distances over several meters and are limited to a relatively low 16 kHz
sampling rate by current software, but the design of the capsule microphones and
built-in noise cancellation algorithms allow for high-quality recording.
Construction of a coffee can radar showed that such devices are feasible for mobile
robotics, providing long-range detection capability at low cost and in a physically
small package. Since the radar signal is de-chirped to facilitate processing with a
computer sound card, these measurements do not contain much useful information
about scattering from the target. However, radar ranging measurements could provide
an early detection system for a mobile robot, detecting objects at long range before
other sensors are used to classify the object.
Another exciting possible use of the radar sensor is the creation of synthetic aperture radar (SAR) images. This method of creating a three-dimensional representation
of the radar scattering from a target is essentially a set of ranging measurements
acquired over a wide area. Normally this requires either an array of individual radar
sensors or a radar that can be steered by beamforming, but SAR is a natural fit for mobile
robotics since the radar sensor is in motion on a well-defined path.
The main focus of our work has been the use of acoustic echolocation as a long-range sensor for mobile robotics. Using coded signals in the audible range increases
the range of the signal while still allowing for detection in noisy environments. The
acoustic parametric array is able to create a tight beam of this low-frequency sound,
directing the majority of the sound energy on the target. This serves the dual purpose
of reducing clutter in the backscattered signal and keeping noise pollution added to
the surrounding environment to a minimum level.
As a test of this sensor modality, several thousand acoustic echolocation measurements were acquired from approaching vehicles in a variety of environmental
conditions. The goal was to classify the vehicle into one of five classes (car, SUV,
van, truck, or bus) based on the frontal profile. To test feasibility as a long-range
sensor, vehicles were interrogated at distances up to 50 m.
Initial analysis of the measured backscattered data showed that useful information about the target under investigation is buried deep in noise. Time–frequency representations of
the data, in particular representations created using the dynamic wavelet fingerprint
(DWFP) process, reveal hidden information. The formal framework of statistical pattern classification allowed us to intelligently create small-dimensional, information-dense feature vectors that best describe the target. This process was able to correctly
classify vehicles using only the backscattered acoustic signal with 94% accuracy.
Chapter 9
Cranks and Charlatans and Deepfakes
Mark K. Hinders and Spencer L. Kirn
Abstract Media gate-keepers who curate information and decide which things regular people should see and read are largely superfluous in the age of social media.
There have always been hoaxsters and pranksters and hucksters, but who and what
online sources to trust and believe is becoming an urgent problem for us regular
people. Snake oil salesmen provide insight into what to watch for when skeptically
evaluating the claims of machine learning software sales reps. Digital photos are so
easy to manipulate that we can’t believe what we see in pictures, of course, but now
deepfake videos mean we can’t tell when video evidence is deliberately misleading.
Classic faked UFO photos instruct us to pay attention to the behavior of the photographer, which can now be done automatically by analyzing tweetstorms surrounding
events. Topic modeling gives a way to form time-series plots of tweetstorm subjects
that can then be transformed via dynamic wavelet fingerprints to isolate shapes that
may be characteristic of organic versus artificial virality.
Keywords Tweetstorm · Digital charlatans · Natural language processing ·
Wavelet fingerprint
9.1 Digital Cranks and Charlatans
Question: Whose snake oil is more dangerous, the crank or the charlatan? First some
historical context. Medicine in the modern sense is a fairly recent phenomenon,
with big advances happening when we’re at war. During the Civil War, surgeons
mostly knew to wash their saws between amputations, but their primary skill was
speed because anesthesia (ether) wasn’t really used in the field hospitals. Part of the
punishment for losing WWI was Germany giving up their aspirin patent. Antibiotics
can be thought of as a WWII innovation along with radar and the A-bomb. In our
current wars to ensure the flow of oil and staunch the resulting spread of terrorism,
we have been doing an excellent job of saving lives of those who have been gravely
wounded on the battlefield, as evidenced by the now common sight of prosthetic arms
and legs in everyday America. Same goes for the dramatic drop in murder rates that
big-city mayors all took credit for. Emergency medical technology on the battlefield
and in the hood is really quite amazing, but it’s very recent.
In the late nineteenth century there really wasn't what we would now call
medicine [1–7]. Nevertheless, there was a robust market for patent medicines, i.e.,
snake oil. Traveling salesmen would hawk them, you could order them from the
Sears and Roebuck catalog, and so on. There was no way to know what was in them
because the requirement to list ingredients on the label didn’t arise until the Pure
Food and Drug Act of 1906. The modern requirement to prove both safety and efficacy prior to marketing a drug or treatment was far off into the distant future. You
might think, then, that soothing syrup that was so effective at getting a teething baby
to sleep was mostly alcohol like some modern cough syrups. Nope. Thanks to recent
chemical analyses of samples from the Henry Ford museum, we now know that most
of these snake oils were as much as thirty percent by weight opiates. Rub a bit of that
on the baby’s gums. Have a swig or three yourself. Everybody has a nice long nap.
Some of the hucksters who mixed up their concoctions in kitchens, barns, etc.
knew full well that their medicines didn’t actually treat or cure any ailments, but the
power of suggestion can and does make people feel better if it’s wrapped up properly
in a convincing gimmick. This turns out to be how ancient Chinese medicine works.
Acupuncture makes you feel better because you think it should make you feel better.
Billions of people over thousands of years can’t have been wrong! Plus, you see the
needles go in and you feel a tingling that must be your Qi unblocking. Don’t worry,
placebos still work even after you’ve been told that they’re placebos. Similarly, tap
water in a plastic bottle sells for way more if there’s a picture of nature on the label.
You know perfectly well there’s no actual Poland Spring in Maine and that your
square-bottled Fiji is no more watery than regular water, but their back stories make
the water taste so much more natural. Plus, everybody remembers that gross kid in
school who put his mouth right on the water fountain, ew!
Hucksters who know perfectly well that their snake oil isn’t medicine are charlatans. They are selling you pretend medicine (or tap water) strictly to make a buck.
The story they tell is what makes their medicine work. Even making it taste kind
of bad is important because we all know strong medicine has side effects. Oh, and
expensive placebos actually do work better than cheap ones, so charlatans don’t have
to feel bad about their high-profit margins.
Some of the hucksters have actually convinced themselves that the concoction
they’ve mixed up has curative powers bordering on the magical, kind of like the
second-grade teacher whose effervescent vitamin tablets prevent you from getting
sick touching in-flight magazines. These deluded individuals are called cranks.
Charlatans are criminals. Cranks are fools. But whose snake oil is more dangerous? Note again that all medicine has side effects, which you hear listed during
commercials for medicines on the evening news. Charlatans don’t want to go to
prison and they want a reliable base of repeat customers. Hence, they will typically
take great care to not kill people off via side effects. Cranks are much more likely to
Fig. 9.1 Charlatans know full well their snake oil works via the placebo effect. Cranks may overlook
dangerous side effects because they fervently believe in their concoction’s magical curative powers.
School pills give a student knowledge without having to attend lessons [8–10] so that the student’s
time can instead be applied to Athletic pursuits. School pills are sugar coated and easily swallowed,
unlike most machine learning software
overlook quite dangerous side effects in their fervent belief in the ubiquitous healing
powers of their concoctions. That fervent belief can also make cranks surprisingly
effective at convincing others. They are, in fact, true believers. Their snake oil is way
more dangerous than the charlatan’s, even the charlatan who is a pure psychopath
that lacks any ability to care whether you live or die (Fig. 9.1).
Raymond Merriman is a financial astrologer and author of several financial market timing books, and is the recipient of the Regulus Award for Enhancing Astrology’s Image as a Profession [11]. “Financial markets offer an objective means to
test astrological validity. The Moon changes signs every 2–3 days and is valuable
for short-term trading. Planetary stations and aspects identify longer term market
reversals. Approximately 4–5 times per year, markets will form important highs or
lows, which are the most favorable times to buy and sell.” If you’d like to pay actual
money to attend, his workshop “provides research studies showing the correlation of
astrological factors to short-term and longer term financial market timing events in
stock markets, precious metals, and Bitcoin.” His website does offer a free weekly
forecast, so there’s that. Guess what he didn’t forecast in Q1 2020?
Warren Buffett is neither a crank nor a charlatan. He is the second or third wealthiest, and one of the most influential, business people in America [12], having earned
the nickname “Oracle of Omaha” by predicting the future performance of companies
and then hand-picking those that he thought were going to do well. Ordinary people who bought shares in his company have done quite well. Nevertheless, Buffett
advocates index funds [13] for people who are either not interested in managing their
own money or don’t have the time. He is skeptical that active management can outperform the market in the long run, and has advised both individual and institutional
investors to move their money to low-cost index funds that track broad, diversified
stock market indices. Buffett said in one of his letters to shareholders that [14] “when
trillions of dollars are managed by Wall Streeters charging high fees, it will usually
be the managers who reap outsized profits, not the clients.” Buffett won a friendly
wager with numerous managers that a simple Vanguard S&P 500 index fund would
outperform hedge funds that charge exorbitant fees. Some call that a bet, some call
it a controlled prospective experiment. If you’re trying to figure out whether a new
miracle cure or magic AI software actually works, you have to figure out some way
to test it in order to make sure you aren’t being fooled or even fooling yourself.
A casual perusal of software platforms for traders, both amateur day-traders and
professional money managers, sets off my bullshit detectors which I have honed over
35 years of learning about and teaching pseudoscience. It seems obvious to me that
there are lots of cranks and charlatans in financial services. I suppose that’s always
been the case, but now that there’s so much AI and machine learning snake oil in the
mix it has become suddenly important to understand their modus operandi. OK, so
the stock market isn’t exactly life and death, but sometimes it sure feels like it and a
little soothing syrup would often hit the spot right about 4pm Eastern time. How do
we go about telling the crank from the charlatan when the snake oil they’re selling is
machine learning software? They’re pretty unlikely to list the algorithmic ingredients
on their labels, since that might involve open-sourcing their software. Even if you
could get access to the source code, good luck decoding that uncommented jumble of
function calls, loops and what ifs. The advice of diplomats might be: trust, but verify.
Scientists would approach it a bit differently. One of the key foundational tenets of
science is that skeptical disbelief is good manners. The burden of proof is always
with the one making a claim of some new finding, and the more amazing the claim
is the more evidence that the claimant must provide. Politicians (and cranks) always
seem to get cross when we don’t immediately believe their outlandish promises, so
we can use that sort of emotional response as a red flag.
Another key aspect of science is that no one experiment, no matter how elegantly
executed, is convincing by itself. Replication, replication, replication. Independent
verification by multiple laboratories is how science comes around to new findings.
Much of the research literature in psychology is suspect right now because of its so-called replication crisis [15]. Standard procedures in psychology have often involved
what’s called p-hacking, which is basically slicing and dicing the results with different
sorts of statistical analysis methods until something notable shows up with statistical
significance. In predictive financial modeling it’s a trivial matter to back test your
algorithms over a range of time slices and market-segment dices until you find a
combination where your algorithms work quite well. Charlatans will try to fool you
by doing this. Cranks will fool themselves. We can use this as another red flag, and
again science has worked out for us a way to proceed.
Let’s say you wanted to design a controlled test of urine therapy [16–24] which is
the crazy idea that drinking your own tinkle has health benefits. I suppose that loners
who believe chugging their first water of the morning will make them feel better do
actually feel better, but that’s almost certainly just the placebo effect [25]. Hence,
any test of urine therapy would have to be placebo controlled, which brings up the
obvious question of what one would use for a placebo. Most college students that I’ve
polled over the last several years agree that warm Natty Light tastes just like piss, so
that would probably work. So the way a placebo-controlled study goes is, people are
randomly assigned to either drink their own urine or chug a beer, but they can’t know
which group they’re in so we’ll have to introduce a switcheroo of some sort. What
might work is that everybody pees into a cup and then gives the cup to an assistant
who either hands back that cup of piss or hands over an identical cup full of warm
Natty Light according to which group the testee is in. It might even be necessary
to give the testee an N95 facemask and/or the tester a blindfold so as to minimize
the chances of inadvertently passing some clues as to what the cup contains. Then
another research assistant who has no idea whether the testee has been drinking urine
or beer will be asked to assess some proposed health benefit, which is the part that
makes it double blind. If the person interpreting the results knows who’s in which
group, that can easily skew their assessment of the results. Come to think of it, the
testees should probably get an Altoids before they go get their health assessed.
That scheme is pretty standard, but we’ve left out a very important thing that
needs to get nailed down before we start. Namely, what is urine therapy good for?
If we look for ill-defined health benefits we almost certainly will be able to p-hack
the results afterwards in order to find some, so we have to be crystal clear up front
about what it is that drinking urine might do for us. If there’s more than one potential
health benefit, we might just have to run the experiment again with another whole
set of testees. Don’t worry, Natty Light is cheaper than water.
So when somebody has a fancy new bit of Python code for financial modeling or
whatever, we’re going to insist that they show that it works. The burden of proof lies
with them. We are appropriately skeptical of their claims, and the more grandiose their
claims are the more skeptical we are. That’s just good manners, after all. And since
we’re afraid of p-hacking, we’re going to insist that a specific benefit be described.
We might even phrase that as, “What is it good for?” Obviously we’re not going
to buy software unless it solves a need, and we’re not dumb enough to believe that
a magic bit of coded algorithm could solve a whole passel of needs, so pick one
and we’ll test that. We’ll also need something to compare against, which I suppose
could be random guessing but probably is going to be some market average or what an
above-average hedge fund can do or whatever.1 Then we back test, but we’re going to
have to blind things first. That probably means having assistants select slices of time
and dices of market sectors and running back tests without knowing whether they’re
using the candidate machine learning software or the old stand-by comparison. Other
assistants who weren’t involved in running the back tests can then assess the results
or if there’s a simple metric of success then that can be used to automatically score
the new software against the standard. Of course, some of those back tests should
run forward a bit into the future so that we can get a true test of whether they work
or not. If the vendor really believes in their software, they might even be willing to
1 Warren Buffett would say compare to some Vanguard low-load index funds.
agree to stake a bet on predictions a fair chunk into the future. If they’re right, you’ll
buy more software from them. If they’re wrong, you get your money back and they
have to drink their own urine every morning for a month.
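A toy version of that blinded back-test protocol is sketched below in Python. The model functions, the time slices, and the scoring are all hypothetical stand-ins; the point is the switcheroo, i.e., that whoever runs and scores the slices sees anonymous labels rather than which package produced which numbers.

import random

def backtest(model, window):
    # Stand-in: run one back test and return a score for this time slice.
    return model(window)

def blinded_comparison(candidate, baseline, windows, seed=0):
    rng = random.Random(seed)
    key = {"A": candidate, "B": baseline}
    if rng.random() < 0.5:    # an assistant shuffles the labels once
        key = {"A": baseline, "B": candidate}
    scores = {"A": [], "B": []}
    for w in windows:
        for label, model in key.items():
            scores[label].append(backtest(model, w))
    return scores             # unblind the A/B key only after scoring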
9.2 Once You Eliminate the Possible, Whatever Remains, No Matter How Probable, Is Fake News [26]
Harry Houdini was a mamma’s boy. He loved his mommy. It was awkward for his
wife. Then his mommy died, and he wanted nothing more than to talk to her one
last time to make sure she was all right and knew that he still loved her. Eternally
awkward for his wife. I know what you’re thinking, so here’s Houdini in his own
words, “If there ever was a son who idolized and worshipped his Mother, that son
was myself. My Mother meant my life, her happiness was synonymous with my
peace of mind.” Notice that it’s capital-M.
Houdini was perhaps the world’s most famous performer. He may have been the
most talented magician ever. He was also an amazing physical specimen, and many
of his escapes depended on the rare combination of raw talent, physical abilities,
and practice, practice, practice. His hotness also helped with at least half of his
audience, because of course he had to be half-clothed to prove he wasn’t hiding a
key or whatnot. That may have been awkward for both Mrs. Houdinis.
Sir Arthur Conan Doyle was Houdini’s frenemy. He thought that Houdini wasn’t
abnormally talented physically, but that he had the paranormal ability to disapparate
his physical body from inside the locked chest under water, or whatever, and then
re-apparate it back on land, stage, etc. I suppose that could explain all manner of
seemingly impossible escapes, but geez. Doyle’s mind was responsible for Sherlock
Holmes, who said that when you have eliminated the impossible, whatever remains,
however improbable, must be the truth. Doyle apparently didn’t allow for the possibility of Houdini’s rare combination of raw talent, physical abilities, and practice,
practice, practice.
He did, however, believe in fairies. The cardboard cutout kind. Not in person, of
course, but photos. Notice in Fig. 9.2 that the wings of the hovering fairies are captured
crisply but the waterfall in the background is blurred. That’s not an Instagram filter,
because this was a century ago. Otherwise, those little girls had a pretty strong selfie
game.
It’s always been fun for kids to fool adults. Kate and Margaret Fox were two
little girls in upstate New York in 1848 who weren’t tired even though it was their
bedtime, so they were fooling around. When their mom said to “quiet down up there”
they said, of course, that it wasn’t them. It was Mr. Slipfoot. A spirit. They asked
him questions and he responded in a sort of Morse code by making rapping noises.
Had momma Fox been less gullible she might have asked the obvious question about
ghosts and spirits, “If they can freely pass through walls and such, how can they
rattle a chain or make a rapping noise? Now go to sleep, your father and I worked
hard all day and we’re tired.”
Fig. 9.2 So you see, comrade, the Russians did not invent fake news on social media
They didn’t quiet down, because they had merely been playing all day and weren’t
tired at all. When their mom came to tell them one last time to hush they had an apple
tied to a string so they could make the rapping noises with her standing right there.
Mr. Slipfoot backed up their story. Momma Fox was amazed. She told the neighbors.
They were similarly amazed, and maybe even a bit jealous because their own special
snowflakes hadn’t been singled out by the Spirits to communicate on behalf of the
living. How long ‘till I die, and then what? Everybody ever has had at least some
passing curiosity about that.
Kate and Margaret had a married sister who saw the real potential. She took them
on tour, which they then did for the rest of their lives. Margaret gave it up for a
while and admitted it was all trickery, but eventually had to go back to performing
tricks to make a living.2 What amazes me is that as little kids they figured out how to
make the raps by swishing their fingers and then their toes, and do it loudly enough to
make a convincing stage demonstration. Margaret says, “The rappings are simply the
result of a perfect control of the muscles of the leg below the knee, which govern the
tendons of the foot and allow action of the toe and ankle bones that is not commonly
known.” That sounds painful. Margaret continues, “With control of the muscles of
2 Rapping noises, not those tricks, you perv.
the foot, the toes may be brought down to the floor without any movement that is
perceptible to the eye. The whole foot, in fact can be made to give rappings by the
use only of the muscles below the knee. This, then is the simple explanation of the
whole method of the knock and raps.” My knee sometimes makes these noises when
I first get out of bed in the morning.
Look up gullible in an old thesaurus and you’ll find its antonym is Houdini. You’ll
also find his picture next to the definition of momma’s boy in most good dictionaries
from the roaring twenties. To quote one final time from his book [27] “I was willing to
believe, even wanted to believe. It was weird to me and with a beating heart I waited,
hoping that I might feel once more the presence of my beloved Mother.” Also, “I
was determined to embrace Spiritualism if there was any evidence strong enough to
down the doubts that have crowded my brain for the last thirty years.” As Houdini
traveled the world performing, he did two things in each town. First, he gave money
to have the graves of local magicians maintained. Second, he sought out the locally
most famous Spiritualist medium and asked her if she could please get in contact
with his beloved Mother, which they all agreed they could easily do. Including Lady
Doyle, Sir Arthur’s wife. They all failed. Lady Doyle failed on Mother’s birthday.
Spiritualist mediums were all doing rather rudimentary magic tricks, which Houdini spotted immediately because it takes one to know one. That pissed him off so
much that he wrote the book I’ve been quoting from. To give an idea of the kind
of things that Spiritualists were getting away with, imagine what you might think
if your favorite medium was found to have your family jewels in his hand.3 You
would be within your rights to assume that you were being robbed. But no, it was the
Spirits who dematerialized your jewels from your safe and rematerialized them into
the hands of the medium while the lights were off for the séance. It’s proof that the
spirits want your medium to have your valuables, and if you call the police that will
piss them off and you’ll never get the spirits to go find Mother and bring her around
to tell you that she’s been thinking of you and watching over you and protecting you
from evil spirits as any good Mother would do for her beloved son.
Most people remember that Houdini left a secret code with his wife, so that after
he died she could ask mediums to contact him and if it was in fact the Great Houdini
he could verify his presence with that code. What most people don’t know is that
Houdini also arranged codes with other close associates who died before he did. No
secret codes were ever returned. If anybody could escape from the great beyond it
would be the world’s (former) greatest escape artist. So much for Spiritualism. But
wait, just because Houdini’s wife wasn’t able to contact her husband after he died,
that doesn’t mean anything. Like Mother would let him talk to that woman now that
she had him all to herself, for eternity.
3 Diamonds and such, you perv. Geez.
9.3 Foo Fighters Was Founded by Nirvana Drummer Dave Grohl After the Death of Grunge
The term “flying saucer” is a bit of a misnomer. When private pilot Kenneth Arnold
kicked off the UFO craze [28] by reporting unidentifiable flying objects in the Pacific
Northwest in 1947, he said they skipped like saucers, not that they were shaped like
saucers. It’s kind of a shame that he didn’t call them flapjacks, because there actually
was an experimental aircraft back then with that nickname, see Fig. 9.3. Many of the
current stealth aircraft are also flying wing designs that look quite a lot like flying
saucers from some angles, and some of the newest military drones look exactly like
flying pancakes.
Fig. 9.3 The Flying Flapjack was an experimental U.S. Navy fighter aircraft designed by Charles
H. Zimmerman for Vought during World War II. This unorthodox design consisted of a flying-saucer-shaped body that served as the lifting surface [29]. The proof-of-concept vehicle was built
under a U.S. Navy contract and it made its first flight on November 23, 1942. It has a circular
wing 23.3 feet in diameter and a symmetrical NACA airfoil section. A huge 16-foot diameter
three-bladed propeller was mounted at the tip of each airfoil blanketing the entire aircraft in their
slipstreams. Power was provided by two 80 HP Continental engines. Although it had unusual flight
characteristics and control responses, it could be handled effectively without modern computerized
flight control systems. It could almost hover and it survived several forced landings, including a
nose-over, with no serious damage to the aircraft, or injury to the pilot. Recently, the Vought Aircraft
Heritage Foundation Volunteers donated over 25,000 labor hours to complete a restoration effort,
and the aircraft is on long-term loan from the Smithsonian Institution
Fig. 9.4 Most UFO
sightings are early aircraft or
weather balloons or
meteorological phenomena
[31]. The planet Venus is
commonly reported as a
UFO, including by former
President Jimmy Carter
So, lots of people started seeing flying saucers and calling up the government to
report them. The sightings were collected and collated and evaluated in the small,
underfunded Project Bluebook (Fig. 9.4). Then one crashed near Roswell, NM [30]
and the wreckage was recovered. Never mind, said the Air Force. “It was just
a weather balloon.” Actually it was a top-secret Project Mogul balloon, but they
couldn’t say that in 1947 because it was top secret.
If a flying saucer had crashed and been recovered, of course it would have been top
secret. Duh. In 1947 we were this world’s only nuclear superpower. Our industrial
might and fighting spirit (and radar) had defeated the Axis of evil. Now all we
had to worry about was our Soviet frenemies. And all those other Commies. But if
we could reverse-engineer an interplanetary space craft we would be unstoppable.
Unless the aliens attacked, in which case we had better hurry up and figure out all
their advanced technology so we can defend ourselves. Ergo, any potentially usable
physical evidence from a UFO would be spirited away to top-secret government labs
where “top men” would be put to work in our national interest.
The key thing to know about government secrecy is that it’s compartmentalized
[32–43].
You don’t even know what you don’t know because before you can know, somebody who
already knows has to decide that you have a need to know what you don’t yet know. You
will then be read into the program and will know, but you can’t ever say what you know to
somebody who doesn’t have a need to know. There’s no way to know when things become
knowable, so it’s best to make like you don’t know.
For example, Vice President Truman didn’t know about the Manhattan Project [44]
because the President and the Generals didn’t think he needed to know. Then when
Roosevelt died the Generals had to have a slightly awkward conversation about this
Fig. 9.5 Trent photos with indications of how they are typically cropped. The Condon Committee
concluded, “The object appears beneath a pair of wires, as is seen in Plates 23 and 24. We may
question, therefore, whether it could have been a model suspended from one of the wires. This
possibility is strengthened by the observation that the object appears beneath roughly the same
point in the two photos, in spite of their having been taken from two positions,” and concluded [46]
that “These tests do not rule out the possibility that the object was a small model suspended from
the nearby wire by an unresolved thread”
top-secret gadget that could end the war and save millions of American lives. It’s not
too surprising that there’s no good evidence of UFOs, that I know of....
In 1950 Paul and Evelyn Trent took a couple of pictures of a flying saucer above
their farm near McMinnville, OR. Without wreckage, pictures are the best evidence,
except that in the 1950s this is how deepfakes were done, so it's worth critically assessing both the photos themselves and the behavior of those involved. Life magazine
[45] published cropped versions of the photos (Fig. 9.5) but misplaced the original
negatives. That’s highly suspicious, because if real it would be a huge story. In those
days the fakery typically happened when prints were made from the negatives so
careful analysis of the original negatives was key. Another oddity is that because the
roll of film in Paul Trent's camera was not entirely used up, they did not have the film
developed immediately. You never really knew if your pictures had come out until
you had them developed, of course, so you’d think the Trents would take a bunch of
selfies to finish out the roll and run right down to the drug store.
The Trents said that the pictures were taken at 7:30 pm, but analysis of the shadows shows pretty clearly that it was instead morning. Also, the other objects in
the photos—oil tank, bush, fencepost, garage—allow the 3D geometry to be reconstructed. Clearly it's a small object hanging from the wires and the photos were taken
from two different lateral positions, rather than a large distant object zooming past
with two photos snapped from the same position while panning across as the object
passed by. Despite being thoroughly debunked, the McMinnville UFO photographs
remain perhaps the best publicized in UFO history. They’re most convincing if the
overhead wires are cropped out of the pictures.
Rex Heflin's UFO photos are at least as famous as the Trents' (Fig. 9.6) and again
it's the behavior of the photographer that is of most interest. First of all, he is the
Fig. 9.6 In 1965 an Orange County, CA highway inspector named Rex Heflin [47] snapped three
close-up photos through the windows of his truck of a low-flying UFO with the Polaroid camera
he carried for work. They clearly show a round, hat-like object with a dark band around its raised
superstructure, and in the side mirror you can see the line of telephone poles it was hanging from
only one who saw it, even though a large UFO in the distance would have been above one
of the busiest freeways in southern California at noon on a weekday. He originally
produced three Polaroids, then later came up with a fourth. Rather than providing
the originals for analysis, he said the “men from NORAD” took them so all he had
was copies. He had snapped the photos from inside the cab of his pickup, so the UFO
is framed by that, and the telephone poles are only visible accidentally in the side
mirror of one of the photos. Oops.
Both the Trent and the Heflin photos have been analyzed by experts on both sides.
Ufologists declared them genuine. Others, not so much. Or any. Just, nope. Double
nope. And these are the best UFO photos in existence, something both sides agree to
stipulate. So there’s that. What we needed was a controlled test of this sort of thing.
What you might call a hoax. It turns out that ufologists are fun to pwn. The Trent
and Heflin episodes are what I call shallowfakes. Warminster isn’t quite a deepfake,
but at least it's past your navel.
In 1970 Flying Saucer Watch published photographs taken in March of that year
at Cradle Hill in Warminster, UK showing a UFO at night. This was a place where
ufologists would gather together in order to watch the skies and socialize. It was
before the Internet. Mixed in with the enthusiasts were a couple of hoaxers, one of
whom had a camera and excitedly took a picture or three of a UFO across the way.
Figure 9.7 shows one of those photos and also what they actually were looking at.
Two more of the Warminster photos are shown in Fig. 9.8. Part of the hoax was to
shine the purple spotlight and pretend to take a picture of it and then see if anybody
noticed the difference between the UFO in the photo and what they all saw blinking
purple on and off on the other hill. The second part of the test was that some of the
Fig. 9.7 A UFO over Cradle Hill in Warminster, UK. The photo on the left was pre-exposed with
an indistinct UFO shape. The photo on the right shows what the ufologists were actually seeing.
Note that the UFO in the photo, which is indicated by an arrow, isn't a purple spotlight
Fig. 9.8 Two photos of the scene of the UFO over Cradle Hill in Warminster, UK, except that
these pictures were taken on different nights (with different patterns of lights) and from different
enough perspectives (streetlight spacings are different) that ufologists investigating the sightings
should have noticed
photos were taken on a different night, so the pattern of lights in the pictures was
a bit different. Nobody noticed that either. The hoaxers let the experiment run for
a couple of years and with some amusement watched a series of pseudo-scholarly
analyses pronounce the photos genuine. See for example [48].
At this point you might want to refer back to the Spectral Intermezzo chapter. That
was intended to be a bit silly, of course, but now that we’ve talked explicitly about
cranks/charlatans and hoaxes/pwns, you should be better able to spot such obvious
attempts to apply machine learning lingo to pseudoscientific phenomena. You might
also appreciate Tim Minchin’s 9-min beat poem, Storm [49] where he notes that
Scooby Doo is like CSI for kids.
9.4 Digital Imaging Is Why Our Money Gets Redesigned so Often
A picture is worth a thousand words, so in 2002 the website worth1000.com
started having photoshopping contests. Go ahead and pick your favorites from
Figs. 9.9, 9.10, and 9.11. I may have done the middle one.
We’ve all gotten used to the idea that most images we see in print have been
manipulated, but people didn’t really pay attention until Time magazine added a tear
to an existing photo of the Gipper [51]. First there were Instagram filters (Fig. 9.12)
and now Chinese smartphones come with automatic image manipulation software
built in, and it’s considered bad manners to not “fix” a friend’s pic before you upload
a selfie. We are now apparently in the age of Instagram Face [52]. “It’s a young face,
of course, with poreless skin and plump, high cheekbones. It has catlike eyes and
long, cartoonish lashes; it has a small, neat nose and full, lush lips.”
So now we’re at this strange moment in technology development when seeing may
or may not be believing [54–58]. Figures 9.9 and 9.12 are all in good fun, but you just
have to get used to the idea that the news isn’t curated text delivered to your driveway
in the morning by the paperboy anymore. The news is images that come to that screen
you’re holding all the damn time and scrolling through when you should be watching
where you’re going. “Powerful images of current events, of controversies, of abuses
have been an important driver of social change and public policy. If the public, if the
news consuming, image consuming, picture drenched public loses confidence in the
Fig. 9.9 Three of the five stooges deciding the fate of post-war Europe [50]. Shemp and Curly-Joe
are not pictured. Joe Besser isn't a real stooge, sorry Stinky
Fig. 9.10 UFO over William and Mary took no talent and almost no time. The lighting direction
isn’t even consistent between the two component images
ability of photographers to tell the truth in a fundamental way, then the game is up.”
[59] Fake news on social media seems to be able to influence elections. Yikes!
At some level we’re also going to have to get used to the idea that any given
picture might be totally fake. I’m willing to stipulate for the record that there is
no convincing photographic evidence of alien visitation. The best pictures from the
old days were faked, and ufologists showed themselves incapable of even moderate
skepticism during the Warminster pwn.4 It would be trivially simple to fake a UFO
selfie these days. But what about video? It’s bad news. Really bad news, for Hermione
Granger in particular.
4 I said pwn here, but it was a reasonably well-controlled prospective test of ufologists.
Fig. 9.11 This took some real talent, and time [50]. Even today, I would expect it to win some
reddit gold even though I’m not entirely sure what that means
Fig. 9.12 This Guy Can't Stop Photoshopping Himself Into Kendall Jenner's Instagram Pics, and
it's hilarious because she has someone to 'Shop her pics before she uploads them [53]
We probably should have seen it coming, though. In 2007 a series of videos hit
the interwebs showing UFOs over various cities around the world [60, 61] made by a
35-year-old professional animator who had attended one of the most prestigious art
schools in France and brought a decade of experience with computer graphics and
commercial animation to the party. It took a total of 17 h to make the infamous Haiti
and Dominican Republic videos, working by himself using a MacBook Pro and a
suite of commercially available three-dimensional animation programs. The videos
are 100% computer-generated, and may have been intended as a viral marketing ploy
in which case it worked, since they racked up tens of millions of views on YouTube
[62]. “No other footage of a UFO sailing through the sky has been watched by as
many people.”
9.5 Social Media Has Sped Up the Flow of Information
Deepfake is the muggle name for putting the face of a celebrity onto another body in
an existing video. The software is free. It runs on your desktop computer. It’s getting
better fast enough that soon analysis of the deepfake video footage won’t be able
to tell the difference [63–68]. One strategy is to embed watermarks or some sort of
metadata into the video in order to tell if it’s been altered [69]. Some young MBA
has probably already declared confidently via a thick slide deck that Blockchain is
the answer. YouTube’s strategy [70] is to “identify authoritative news sources, bring
those videos to the top of users’ feeds, and support quality journalism with tools and
funding that will help news organizations more effectively reach their audiences. The
challenge is deciding what constitutes authority when the public seems more divided
than ever before on which news sources to trust—or whether to trust the traditional
news industry at all.”
Recall the lesson from the Trents and Rex Heflin that we should look to the
behavior of the photographer to assess the reliability of their photographs. Are they
cranks or charlatans? I forgot to mention it before, but the Trents were repeaters
who had previously faked UFO photos. Those early attempts didn’t go viral via Life
magazine, of course, but in my rulebook any previous faking gets you a Pete Rose
style lifetime ban. Any hint of shenanigans gets you banished. We can’t afford to
play around anymore, so if you’ve been photoshopping yourself into some random
Kardashian’s social media we don’t ever believe your pics even if you’re trying to
be serious this time.
Content moderation at scale isn’t practical, but Twitter can be analyzed in bulk to
see what people are tweeting about and if you have enough tweets, topic modeling
can go much further than keeping track of what’s trending. Tweetstorms are localized
in time, so trending isn’t really an appropriate metric. Trends are first derivatives.
We should at least look at higher derivatives, don’t you think? Second derivatives
give the rate of change of the trend, or acceleration. That would be better. How fast something trends
and then detrends should be pretty useful. Our current @POTUS has figured out that
tweeting something outrageous just often enough distracts the mainstream media
quite effectively and allows the public narrative to be steered. The mainstream media
even use up screentime on their old-fashioned newscasts to read for oldsters the
things @POTUS just tweeted. They also show whatever viral video trended on the
moms’ Facebook a couple of days ago and had been seen by all the digital natives a
couple of days before that. These patterns can be used to identify deepfakes before
they make it to the screens of people who are too used to believing their own eyes. It’s
not the videos per se that we analyze, but the pattern of spread of the videos and most
importantly the cloud of conversations that are happening during that
spread. Social media platforms come and go on a relatively short time scale, so the
key is to be able to analyze the text contained in whatever platform people are using
to communicate and share. We’re talking about Twitter, but the method described
below can be adapted to a wide variety of communication modalities, including voice
because speech-to-text already works pretty well and is getting better.
Fig. 9.13 A group of squirrels that isn’t a family unit is called a scurry. A family of squirrels is called a dray. Most squirrels aren’t social, though, so they normally don’t gather in a group. Oh look: pizza [71, 72]
Unless you downloaded this chapter first and haven’t read the rest of this book, you
probably can guess what I’m about to say: dynamic wavelet fingerprints. Tweetstorms
have compact support, by which I mean that they start and stop. Tweetstorms have
echoes via retweeting and such. Wavelets are mathematically suited to analyzing
time-domain signals that start and stop and echo in very complicated ways. Wavelet
fingerprints allow us to identify patterns in precisely this sort of complex time-domain
signal. First we have to be able to extract meaning from tweetstorms, but that’s what
topic modeling does.
Storms of tweets about shared cultural events will be shaped by the natural time
scale of the events and whether people are watching it live or on tape delay or are
waking up the next morning and just then seeing the results. There should be fingerprints that allow us to identify these natural features. Deepfakery-fueled tweetstorms
and echoes should leave fingerprints that allow us to identify un-natural features.
It should be possible to flag deepfakes before they make it to the moms’ Facebook
feeds and oldsters’ television newscasts (Fig. 9.13).
9.6 Discovering Latent Topics in a Corpus of Tweets
In an age where everyone is rushing to broadcast opinions and inanities and commentaries on the Internet, any mildly notable event can sometimes trigger a tweetstorm.
We define a tweetstorm as any surge in a specific topic on social media, where tweet
is used in a generic sense as a label for any form of unstructured textual data posted to
the interwebs. Since tweetstorms generate vast amounts of text data in short amounts
of time, text analytics gives us the ability to dissect and analyze Internet chatter in a host of different situations. Companies can analyze transcribed phone calls
from their call centers to see what customers call about most often, which is exactly
what happens when they say “this call may be monitored for quality control pur-
poses.” Journalists can analyze tweets from their region to determine which stories
are the most interesting to their audience and will likely have the greatest impact on
the local community. Political campaigns can analyze social media posts to discern
various voters’ opinions on issues and use those insights to refine their messages.
Topic modeling is an extraordinarily powerful tool that allows us to process large-scale, unstructured text data to uncover latent topics that would otherwise be
obscured from human analysis. According to the Domo Data Never Sleeps study,
nearly 475,000 tweets are published every minute [73]. Even with this firehose of
data continually streaming in, it is possible to run analysis in real time to examine
the popular topics of online conversation at any given instant.
Using unsupervised machine learning techniques to leverage large-scale text analytics allows for the discovery of hidden structures, groupings, and themes throughout
massive amounts of data. Although humans possess a natural ability to observe patterns, manual analysis of huge datasets is highly impractical. To extract topics from
Twitter we have explored a range of topic modeling and document clustering methods
including non-negative matrix factorization, doc2vec, and k-means clustering [74].
9.6.1 Document Embedding
Document and word embeddings are instrumental in running any kind of natural
language modeling algorithm because they produce numerical representations of
language, which can be interpreted by a computer. There are two common methods
for creating both document and word embeddings: document-term methods and
neural network methods.
Document-term methods are the simplest and most common ways to create document embeddings. Each of these produces a matrix with dimensions m × n, where
m is the total number of documents in the corpus being analyzed and n is the total
number of words in the vocabulary. Every entry in the matrix represents how a
word in the vocabulary is related to each document. A weight is assigned to each
entry, often according to one of three common weighting styles. The first of these
is a simple binary weighting where a 1 is assigned to all terms that appear in the
corresponding document and a 0 is assigned to all other terms. The frequency of
each term appearing in a specific document has no bearing on the weight, so a term
that appears once in the text has the same weight as a term that appears 100,000
times. The second method, the term frequency weighting method, accounts for the
frequent or infrequent use of a term throughout a text. Term frequency counts the
occurrences of each term in a document and uses that count as the assigned weight
for the corresponding entry of the document-term matrix. The last weighting style
is term frequency-inverse document frequency (TF-IDF). This weighting scheme is
defined as
$$\mathrm{tf\text{-}idf}_{t,d} = tf_{t,d} \times \left( \log \frac{N}{df_t} + 1 \right), \qquad (9.1)$$
where $\mathrm{tf\text{-}idf}_{t,d}$ is the weight value for a term t in a document d, $tf_{t,d}$ is the number of occurrences of term t in document d, N is the total number of documents in the collection, and $df_t$ is the total number of documents the term t occurred in. The TF-IDF weight for each term can be normalized by dividing each term by the Euclidean
norm of all TF-IDF weights for the corresponding document. This normalization
ensures that all entries of the document-term matrix are between zero and one. The
TF-IDF weighting scheme weights terms that appear many times in a few documents
higher than terms that appear many times in many documents, allowing for more
descriptive terms to have larger weights. Document-term methods for word and
document embeddings are simple and effective, but they can occasionally create
issues with the curse of dimensionality [75] since they project language into spaces
that may reach thousands of dimensions. Not only do such large embeddings cause
classification issues, but they can cause memory issues because the document-term
matrix might have tens or hundreds of thousands of entries for even a small corpus
and several millions for a larger corpus.
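As a concrete illustration, here is a minimal sketch of building such a TF-IDF document-term matrix with scikit-learn; the three-tweet corpus is a made-up stand-in, and note that scikit-learn's default smoothed IDF differs slightly from (9.1).

# A minimal TF-IDF sketch; the toy corpus is a stand-in, and
# scikit-learn's default smoothed IDF differs slightly from (9.1).
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = ["mexico scores against germany worldcup germex",
          "germany falls to mexico germex worldcup",
          "messi misses a penalty against iceland worldcup argisl"]

vectorizer = TfidfVectorizer(norm="l2")  # Euclidean normalization, as above
A = vectorizer.fit_transform(tweets)     # sparse m x n document-term matrix
print(A.shape)                           # (3 documents, n vocabulary terms)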
Since large document-term embeddings can lead to issues downstream with uses
of document and word vectors, it is beneficial to have embeddings in lower dimensions. This is why methods like word2vec and doc2vec, developed by Le and
Mikolov [76, 77], are so powerful. These methods work by training a fully connected
neural network to predict context words based on an input word or document. The
weight matrices created in the training stage contain a vector for each term and document in the training set. Each of these vectors is close to semantically similar terms
and documents in the embedding space.
The greatest benefits of using neural network embedding methods are that we can
control the size of the embedding vectors, and that the vectors are not sparse. This
allows vectors to be passed to downstream analysis methods without encountering
the issues of large dimension size and sparsity that commonly arise in document-term
methods. However, they require significantly more data to train a model. While we
can make document-term embeddings from only a few documents, it takes thousands
or perhaps millions of documents and terms to accurately train these embeddings.
Furthermore, the individual dimensions are arbitrary, unlike document-term embeddings where each dimension represents a specific term. Where sufficient training
data does exist, doc2vec has proven adept at topic modeling [78–81].
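Where enough data is available, a gensim Doc2Vec model can stand in for the document-term matrix; the sketch below reuses the toy corpus purely as a placeholder, since useful embeddings require thousands or millions of training documents.

# A minimal gensim Doc2Vec sketch; the toy corpus is only a placeholder.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=text.split(), tags=[i])
        for i, text in enumerate(tweets)]  # `tweets` from the TF-IDF sketch
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=20)
vec = model.infer_vector("germany falls to mexico".split())  # 50-dim vector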
9.6.2 Topic Models
Clustering is the generic term used in machine learning for grouping objects based
on similarities. Soft clustering methods allow objects to belong to multiple different
clusters and have weights associated with each cluster, while hard clustering assigns
objects to a single group. Matrix decomposition algorithms are popular soft clustering methods because they run quickly and produce interpretable results. Matrix
methods for topic modeling are used for document-term embedding methods, but
they are useless for more sophisticated embedding methods like doc2vec because
Fig. 9.14 Decomposition of a document-term matrix, A, into a document-topic matrix, W, and a topic-term matrix, H
the dimensions of those embeddings are arbitrary. However, there has been some work on implementing non-negative matrix factorization (NMF) on word co-occurrence matrices
to create word and document embeddings like those generated by word2vec and
doc2vec [80, 82].
Non-negative matrix factorization (NMF) was first proposed as a topic modeling
technique by Xu et al. [83]. NMF is a matrix manipulation method that decomposes
an m × n matrix A into two matrices
$$A = WH, \qquad (9.2)$$
where W is an m × k matrix, H is a k × n matrix, and k is a predefined parameter
that represents the number of topics. Decomposing a large matrix into two smaller
matrices (Fig. 9.14) is useful for analyzing text from a large corpus of documents
to find underlying trends and similarities. The matrix A is a document-term matrix,
where each row represents one of the m documents in the corpus and each column
represents one of the n terms.
The decomposition of A is calculated by minimizing the function
$$L = \frac{1}{2}\, \lVert A - WH \rVert, \qquad (9.3)$$
where both W and H are constrained to non-negative entries [83] and L is the objective function to be minimized. The matrix norm can be any p-norm, but is usually
either the 1 or 2 norm or a combination of the two. In topic modeling applications,
k is the number of topics to be extracted from the corpus of m documents. Thus, the
matrix W is referred to as the document-topic matrix. Each entry in the document-topic
weight indicates the prevalence of that topic throughout the corresponding document.
The matrix H is the topic-term matrix. Weights in H represent the relevance of each
term to a topic.
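A minimal sketch of this decomposition with scikit-learn, reusing A and vectorizer from the TF-IDF sketch; k = 2 is an arbitrary choice for the toy corpus.

# A minimal NMF topic-extraction sketch, reusing A and vectorizer from
# the TF-IDF sketch; k = 2 is arbitrary for the toy corpus.
from sklearn.decomposition import NMF

k = 2
nmf = NMF(n_components=k, init="nndsvd", random_state=0)
W = nmf.fit_transform(A)   # m x k document-topic matrix
H = nmf.components_        # k x n topic-term matrix
terms = vectorizer.get_feature_names_out()
for topic in H:            # print the highest-weighted terms per topic
    print([terms[i] for i in topic.argsort()[-5:][::-1]])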
A popular extension of NMF is as a dynamic topic model (DTM), which tracks topics through time, updating the weights of terms in topics and adding new topics when necessary [84]. One of the original DTMs was actually derived for latent Dirichlet allocation (LDA) [85]. Several popular DTM extensions of NMF include
[86–88].
Principal component analysis (PCA) is a common method for dimension reduction. PCA uses eigenvectors to find the directions of greatest variance in the data and
uses those as features in a lower dimensional space. In topic modeling, we can use
these eigenvectors as topics inherent in the dataset.
PCA operates on the TF-IDF document-term matrix A. We calculate the covariance matrix of A by pre-multiplying it by its transpose
$$C = \frac{1}{n-1} A^T A, \qquad (9.4)$$
which gives a symmetric covariance matrix with dimensions of the total vocabulary, with the $\frac{1}{n-1}$ term acting as a normalizer. The covariance matrix shows how often each
word co-occurs with every other word in the vocabulary. We can then diagonalize it
to find the eigenvectors and eigenvalues
$$C = E D E^{-1}, \qquad (9.5)$$
where the matrix D is a diagonal matrix with eigenvalues as the entries and E is the
eigenvector matrix. The eigenvectors have shape 1 × V, where V is the total number of terms in the vocabulary. We select the eigenvectors that correspond to the largest eigenvalues
as our principal components and eliminate all others. These eigenvectors are used
as the topics and the original document-term matrix can now be recast into a lower
dimensional document-topic space. The terms defining a topic are found by selecting
the largest entries in each eigenvector.
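Here is a sketch of that eigendecomposition route in numpy, following (9.4) and (9.5) as written (i.e., without mean-centering), applied to a dense copy of the toy TF-IDF matrix A from earlier.

# A sketch of PCA-style topic extraction via (9.4)-(9.5), applied as
# written (without mean-centering) to a dense copy of the earlier toy A.
import numpy as np

Ad = A.toarray()                      # dense m x n document-term matrix
C = Ad.T @ Ad / (Ad.shape[0] - 1)     # covariance matrix, as in (9.4)
eigvals, eigvecs = np.linalg.eigh(C)  # C is symmetric, so eigh applies
order = np.argsort(eigvals)[::-1]     # sort eigenvalues, largest first
topics = eigvecs[:, order[:2]]        # top-2 eigenvectors as topics
doc_topic = Ad @ topics               # recast documents into topic space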
Singular value decomposition (SVD) uses the same process as PCA, but it does
not rely on the covariance matrix. In the context of topic modeling SVD is also
referred to as latent semantic indexing (LSI) [89] which decomposes matrix A into
$$A = U \Sigma V^T, \qquad (9.6)$$
where Σ is the diagonal matrix with the singular values as its entries. The matrix
U represents the document-topic matrix and the matrix V is the topic-term matrix.
The largest values of Σ represent the most common topics occurring in the original
corpus. We select the largest values and the corresponding vectors from the topic-term
matrix as the topics.
Latent Dirichlet allocation (LDA) is a probabilistic model which assumes that documents can be represented as random mixtures over latent topics, where each topic is a distribution over vocabulary terms [90]. LDA is described as a generative process which draws a multinomial topic distribution $\theta_d \sim \mathrm{Dir}(\alpha)$, where α is a corpus-level scaling parameter, for each document d. Then for each word $w_n \in \{w_1, w_2, \ldots, w_N\}$,
where N is the total number of terms in d, we draw one of k topics, $z_{d,n}$, from $\theta_d$ and choose a word $w_{d,n}$ from $p(w_n \mid z_n, \beta)$. This probability is a multinomial distribution
of length V , where V is the number of terms in the total vocabulary. In this algorithm
α and β are corpus-level parameters that are fit through training. The parameter α
is a scaling parameter and β is a k × V matrix representing the word probabilities
in each topic. β can be compared to the topic-term matrix observed in prior
methods.
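A minimal LDA sketch with gensim, reusing the toy tweets from the earlier sketches; note that gensim names the β parameter eta, and alpha="auto" lets the scaling parameter be learned from the data.

# A minimal gensim LDA sketch on the earlier toy corpus; gensim names
# the beta parameter `eta`, and alpha="auto" is learned from the data.
from gensim import corpora
from gensim.models import LdaModel

tokens = [t.split() for t in tweets]               # reuse the toy tweets
dictionary = corpora.Dictionary(tokens)
bow = [dictionary.doc2bow(doc) for doc in tokens]  # bag-of-words corpus
lda = LdaModel(bow, num_topics=2, id2word=dictionary, alpha="auto")
print(lda.print_topics())                          # top terms per topic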
The k-means algorithm seeks to partition an n-dimensional dataset into a predetermined number of clusters, where the variance of each cluster is minimized. The
data are initially clustered randomly, and the mean of each cluster is calculated. These
means, called the centroids and denoted $\mu_j$, are used to calculate the variance in successive iterations of clusters. After the centroids have been calculated, the distance
between each element of the dataset and each centroid is computed, and samples are
re-assigned to the cluster with the closest centroid. New centroids are then calculated
for the updated clusters. This process continues until the variance for each cluster
reaches a minimum, and the iterative process has converged.
Minimization of the within-cluster sum-of-squares
$$\sum_{i=1}^{n} \min_{\mu_j \in C} \left( \lVert x_i - \mu_j \rVert^2 \right)$$
is the ultimate goal of the k-means algorithm [91]. Here, a set of n datapoints $x_1, x_2, \ldots, x_n$ is divided into k clusters, each with a centroid $\mu_j$. The set of all centroids is denoted C. Unfortunately, clustering becomes difficult since the concept of
distance in very high-dimensional spaces is poorly defined. However, k-means clustering often remains usable because it evaluates the variance of a cluster rather than
computing the distance between all points in a cluster. It is generally useful to reduce
the dimension using SVD or PCA before applying clustering to document-term
embeddings, or to train a doc2vec model and cluster the resulting lower dimension
embeddings.
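A sketch of that reduce-then-cluster workflow on the toy matrix A: truncated SVD (the LSI decomposition of (9.6)) shrinks the TF-IDF vectors before k-means assigns hard cluster labels.

# A sketch of the reduce-then-cluster workflow described above.
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

X = TruncatedSVD(n_components=2, random_state=0).fit_transform(A)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # one hard cluster assignment per document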
Evaluation of our topic models requires a way to quantify the coherence of each
topic generated. Mimno et al. suggest that there are four varieties of “bad” topics
that may arise in topic modeling [92]:
1. Chained: Chained topics occur when two distinct concepts arise in the same topic
because they share a common word. For example, in a corpus of documents
containing texts about river banks and financial banks, a model might generate a
topic containing the terms “river”, “financial”, and “bank”. Although “river bank”
and “financial bank” are two very different concepts, they still appear in the same
topic due to the shared word “bank”.
2. Intruded: Intruded topics contain sets of distinct topic terms that are not related
to other sets within that topic.
3. Random: Random topics are sets of terms that make little sense when strung
together.
4. Unbalanced: Unbalanced topics have top terms that are logically connected, but
also contain a mix of both general terms and very specific terms within the topic.
We want to be able to locate “bad” topics automatically, as opposed to manually
sifting through and assigning scores to all topics. To accomplish this, Mimno et al.
propose a coherence score
$$C(t, V^{(t)}) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \frac{D(v_m^{(t)}, v_l^{(t)}) + 1}{D(v_l^{(t)})}, \qquad (9.7)$$
where t is the topic, $V^{(t)} = [v_1^{(t)}, \ldots, v_M^{(t)}]$ is the list of the top M terms in topic t, $D(v_l^{(t)})$ is the document frequency of term $v_l^{(t)}$, and $D(v_m^{(t)}, v_l^{(t)})$ is the co-document frequency of the two terms $v_m^{(t)}$ and $v_l^{(t)}$ [92]. The document frequency is a count of the
number of documents in the corpus in which a term appears, and the co-document frequency is the count of documents in which two different terms appear together. A 1 is
added in the numerator of the log function to avoid the possibility of taking the log
of zero. For highly coherent topics, the value of C will be close to zero while for
incoherent topics this value will become increasingly negative. The only way for C to be positive is for the terms $v_m^{(t)}$ to appear only in documents that also contain $v_l^{(t)}$.
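A minimal numpy sketch of (9.7); the binary document-term matrix B and the column indices of a topic's top M terms are hypothetical inputs here.

# A minimal numpy sketch of the coherence score (9.7); `B` (a binary
# document-term matrix) and `top_terms` (column indices of a topic's
# top M terms) are hypothetical inputs.
import numpy as np

def coherence(B, top_terms):
    score = 0.0
    for m in range(1, len(top_terms)):
        for l in range(m):                     # l < m, as in (9.7)
            co_df = np.sum(B[:, top_terms[m]] * B[:, top_terms[l]])
            df_l = np.sum(B[:, top_terms[l]])  # document frequency
            score += np.log((co_df + 1) / df_l)
    return score                               # near 0 means coherent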
9.6.3 Uncovering Topics in Tweets
As an example showing the application of topic modeling to real Twitter data, we
have scraped a set of tweets from the 2018 World Cup in Russia and extracted the
topics of conversation. Soccer is the most popular sport worldwide, so the World Cup
naturally creates tweet storms whenever matches occur. Beyond a general interest in
the event, national teams and television coverage push conversation on social media
by promoting and posting various hashtags during coverage. Two of the common
World Cup hashtags were #WorldCup and #Russia2018. Individual games even had
their own hashtags such as #poresp for a game between Portugal and Spain or the
individual hashtags for the teams #por #esp.
Along with the continued swell of general online activity, the rise of 5G wireless
technology will facilitate online chatter at stadiums, arenas, and similar venues.
KT Corp., a South Korean telecom carrier, was the first to debut 5G technology
commercially at the 2018 Winter Olympics. 5G wireless will allow for increased
communication at speeds reaching up to 10 gigabits per second, even in areas with
congested networks [93]. Furthermore, 5G will allow for better device-to-device
communication. For text analysis, this means that more people will have the ability
to post thoughts, pictures, videos, etc. during events and might even be able to
communicate with event officials through platforms like Twitter to render a more
interactive experience at events like the World Cup. This will create a greater influx
of text data which event coordinators can use to analyze their performance and
pinpoint areas that need improvement.
Twitter can be accessed through the API with a Twitter account, and data can be
obtained using the Tweepy module. The API provides the consumer key, consumer secret, access token, and access token secret, the credentials necessary to retrieve Twitter
data. Once access to the Twitter API has been established, there are two ways to
gather tweets: one can either stream them in real time, or pull them from the past.
To stream tweets in real time we use the tweepy.Stream method. To pull older
tweets we use the tweepy.Cursor method, which also provides an option to
search for specific terms throughout Twitter using the tweepy.API().search
function and defining query terms in the argument using q=‘query words’.
Further information about streaming, cursoring, and other related functions can be
found in the Tweepy documentation [94].
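To make the workflow concrete, here is a minimal sketch using the pre-4.0 Tweepy interface described above; the four credential strings are placeholders you obtain from the Twitter developer portal.

# A minimal sketch of pulling past tweets with the pre-4.0 Tweepy
# interface described above; the credential strings are placeholders.
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Cursor back through tweets matching a query, 100 at most here.
for tweet in tweepy.Cursor(api.search, q="#WorldCup").items(100):
    print(tweet.created_at, tweet.text)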
We pulled tweets from the group stage of the 2018 World Cup by searching
for posts containing the hashtag #WorldCup. These tweets were published between
June 19–25, 2018. Our dataset is composed of about 16,000 tweets—a small number
relative to the total volume of Twitter, but a sufficient amount for this proof-of-concept
demonstration.
9.6.4 Analyzing a Tweetstorm
We have found that NMF produces better results for our purposes than the other
topic modeling methods we have explored, so the remainder of this chapter relies on
the output of scikit-learn’s NMF implementation [91]. However, the same analysis
could be carried out on any topic model output. Our goal is to obtain the most coherent
set of topics possible by varying the number of extracted topics, k. Testing k values
ranging from 10 to 25, we find that the most coherent topics are produced when
k = 17. The topics are as follows:
• Topic 0: russia2018 worldcup, coherence score: −1.099
• Topic 1: memberships escort discounts google website number news great lasele
bitcoin, coherence score: −1.265
• Topic 2: 2018 russia, coherence score −1.386
• Topic 3: world cup russia win, coherence score: −0.510
• Topic 4: mex (Mexico) ger (Germany) germex mexicovsalemania, coherence
score: −1.449
• Topic 5: arg (Argentina) isl (Iceland) argisl 11, coherence score: −0.869
• Topic 6: nigeria croatia cronga nga (Nigeria) cro (Croatia) supereagles win match
soarsupereagles 20, coherence score: −0.998
• Topic 7: worldcup2018 worldcuprussia2018 rusya2018 rusia2018, coherence
score: −0.992
• Topic 8: mexico germany germex game, coherence score: −1.610
• Topic 9: bra (Brazil) brazil brasui switzerland brasil neymar 11 copa2018 coutinho,
coherence score: −1.622
• Topic 10: fifa football, coherence score: −1.386
• Topic 11: vs serbia, coherence score: −1.099
• Topic 12: argentina iceland 11, coherence score: −0.903
• Topic 13: england tunisia eng (England) threelions football tun (Tunisia) engtun
tuneng go tonight, coherence score: −1.485
• Topic 14: fraaus france australia 21 match pogba, coherence score: −1.196
• Topic 15: fifaworldcup2018 fifaworldcup soccer russia2018worldcup fifa18, coherence score: −0.439
• Topic 16: messi ronaldo penalty argisl lionel miss, coherence score: −1.009
Overall, these topics are intelligible and there are no obvious chained, intruded,
unbalanced, or random topics. Even the least coherent topic, topic 9, still makes
good sense, with references to Brazil and Switzerland during their match; Neymar and Coutinho are both star Brazilian players. There are a few topics that overlap, such
as topics 4 and 8 that both discuss the Germany–Mexico match, and topics 5, 12, and
16 which all discuss the Argentina–Iceland match. Since these particular matches
were highly anticipated, it is not surprising that they prompted enough Twitter activity
to produce several topics.
Tweets are significantly shorter than other documents we might run topic modeling
on, such as news articles, so there are far fewer term co-occurrences. In our dataset,
a portion of the tweets only reference a single occurrence in a match. As a result,
some of our extracted topics are extremely specific. For example, topic 16 seems to
be primarily about Lionel Messi missing a penalty kick, while topic 5 is about the
Argentina–Iceland game in general. One distinction between topics 5 and 12 seems
to be the simple difference in how the game is referenced—the tweets represented by
topic 5 use the hashtag shortenings “arg” and “isl” for Argentina and Iceland, while
the tweets generating topic 12 use the full names of the countries. A similar difference
is seen in topics 4 and 8 referencing the Germany–Mexico match. Topics 6, 9, 13, and
14 all also reference specific games that occurred during the time of data collection.
Beyond individual games we also see topics that reference the hashtags that were
used during the event and how these hashtags overlap in the dataset. Specifically
topics 0, 7, 15 all show different groupings of hashtags. Topic 0 is by far the most
popular of these hashtags with over 9000 tweets referencing it, while topics 7 and
15 each have fewer than 2000 tweets. This makes sense because we searched by
#WorldCup so most tweets should have a reference to it by default.5 The two hashtags in topic 0 were also the ones being promoted by the traditional media companies covering the event.
Although topics extracted from such a dataset provide insight, they only tell a
fraction of the story. With enough tweets, we can use time-series representations to
5 It should be noted here that by definition all tweets have the hashtag #WorldCup because that was our query term. However, many tweets that push the character limit are truncated through the API. If #WorldCup appears in the truncated part of the tweet it does not show up in the information we pull from the API.
Fig. 9.15 Time-series representation of the World Cup dataset. Each point in the time series represents the total number of tweets occurring within that given five-minute window. All times are in
UTC
gain a deeper understanding of the topic and a way to measure a topic’s relevance
by allowing us to identify various trending behaviors. Did a topic build slowly to a
peak the way a grassroots movement might? Did it spike sharply, indicating a collective reaction to a singular event? These questions can be answered with time-series
representations.
To create a time series from topics, we use the topic-document matrix, W , and
the publish time of each tweet. In our case the tweets in W are in chronological
order. Figure 9.15 shows the time-series representation of the full dataset, f. Each
data point in this figure represents an integer number of tweets that were collected
during a five-minute window. We first iterate through the rows of W, collecting all the tweets that correspond to a single time window, and count the number of nonzero entries for each topic. Next, W is normalized across topics so that each row represents a distribution across topics. We then convert W to a binary matrix, $W_{bin}$,
by setting every entry in W greater than 0.1 to 1, and sum across each column to get
a count for each topic
$$G(t, k) = \sum_{i=i_0}^{i_0 + \mathbf{f}(t)} W_{bin}(i, k), \qquad (9.8)$$
where G is a T × k matrix representing the time series for each of the discovered
topics, $t = 1, \ldots, T$, where T is the total time of the time series f, and $i_0$ is calculated as
$$i_0 = \sum_{j=0}^{t-1} \mathbf{f}(j). \qquad (9.9)$$
The result is a matrix G, where each column gives the time-series representation
of a topic. Two examples of topic progression over time can be seen in Fig. 9.16.
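A minimal sketch of that construction, assuming W is the chronologically ordered document-topic matrix (with every row carrying some nonzero topic mass) and f lists the tweet counts per five-minute window; topic_time_series is a hypothetical helper name.

# A minimal sketch of (9.8)-(9.9); `W` and `f` are assumed inputs and
# `topic_time_series` is a hypothetical helper name.
import numpy as np

def topic_time_series(W, f, threshold=0.1):
    W = W / W.sum(axis=1, keepdims=True)  # normalize rows across topics
    W_bin = (W > threshold).astype(int)   # binarize, as described above
    G = np.zeros((len(f), W.shape[1]))
    i0 = 0
    for t, count in enumerate(f):         # one row of G per time window
        G[t] = W_bin[i0:i0 + count].sum(axis=0)
        i0 += count
    return G                              # T x k topic time-series matrix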
Figure 9.16 shows the time-series plots of World Cup topics 4 and 5, which discuss
the Germany–Mexico match and the Argentina–Iceland match, respectively. Topic
4 spikes shortly after the match started at 15:00 UTC, with additional peaks at 16:50
Fig. 9.16 Time-series representation for topics 4 and 5. Topic 4 is about the Germany–Mexico
match, that occurred on June 17, and topic 5 is about the Argentina–Iceland match that occurred on
June 16
and 15:35. The latter two times correspond to events that occurred during the game:
at 15:35 Mexico scored the first goal to take a 1-0 lead, and at 16:50 the game ended,
signifying Mexico’s defeat of the reigning World Cup champions. Topic 5 focuses on
the Argentina–Iceland game which took place on June 16 at 13:00 UTC. The highest
peak occurs at the conclusion of the game, while another peak occurs at the start.
This was such a popular game because although Iceland was a major underdog, they
were able to play the star-studded Argentinian team to a tie. Two other peaks occur
at 13:20 and 13:25, the times when the two teams scored their only goals in the 1-1
tie.
Although we are able to identify important characteristics of each topic using
this sort of standard time-series representation, we would like to be able to analyze
each topic’s behavior at a deeper level. Fourier analysis does not give behavior
localized in time and spectrograms are almost never the most efficient representations
of localized frequency content. Thus we look to wavelet analysis, which converts a
one-dimensional time series into two dimensions with coefficients representing both
wavelet scale and time. Wavelet scale is the wavelet analog to frequency, where high
wavelet scales represent low-frequency behavior and low-wavelet scales represent
high-frequency behavior [95].
The dynamic wavelet fingerprint is a technique developed [96] to analyze ultrasonic waveforms, though it has been found to be an effective tool for a wide array
of problems [97–105]. It is also effective for characterizing behaviors exhibited by
time series extracted from topic modeling millions of tweets [74]. The key to this
method, and why it has been so broadly applicable, is that it has the ability to identify behavior that is buried deeply in noise in a way that traditional Fourier analysis is
unable to. By casting the time series into higher dimensions and isolating behavior
in time we have shown that we can identify common behavior across diverse topics
from different tweet storms, even if this behavior was not obvious from looking at
their time series.
DWFP casts one-dimensional time-series data into a two-dimensional black and
white time-scale image. Every mother wavelet creates a unique fingerprint, which
can highlight different features in the time series [95]. The DWFP is performed on a
time series w(t), where t = 1, . . . , T by calculating a continuous wavelet transform
$$C(a, b) = \int_{-\infty}^{+\infty} w(t)\, \psi_{a,b}(t)\, dt, \qquad (9.10)$$
where a and b represent the wavelet scale and time shift of the mother wavelet $\psi_{a,b}$, respectively. The time shift value, b, represents the point in time the wavelet begins and allows wavelet analysis to be localized in both time and frequency, similar to
a spectrogram [106]. The resulting $n_s \times T$ coefficient matrix from (9.10), C(a, b), represents a three-dimensional surface with a coefficient for each wavelet $\psi_{a,b}$, where $n_s$ is the total number of wavelet scales. This matrix is then normalized between −1
and 1
$$C_{norm}(a, b) = C(a, b) / \max(|C(a, b)|). \qquad (9.11)$$
With the normalized surface represented by $C_{norm}(a, b)$, a thick contour slice operation is performed to create a binary image I(a, b) such that
$$I(a, b) = \begin{cases} 1, & s - \frac{rt}{2} \le C_{norm}(a, b) \le s + \frac{rt}{2} \\ 0, & \text{otherwise,} \end{cases} \qquad (9.12)$$
where s represents the center of each of the S total thick contour slices, $s = \left( \pm\frac{1}{S}, \pm\frac{2}{S}, \ldots, \pm\frac{S}{S} \right)$, and rt represents the total width of each slice. This gives a binary matrix, I(a, b), which we call a fingerprint.
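A minimal sketch of steps (9.10)–(9.12) using PyWavelets; note that pywt.cwt supports only continuous mother wavelets (e.g. "mexh" for the Mexican Hat or "gaus6" for the 6th Gaussian), and the slice parameters here are illustrative.

# A minimal sketch of (9.10)-(9.12) with PyWavelets; pywt.cwt supports
# only continuous mother wavelets, and the parameters are illustrative.
import numpy as np
import pywt

def wavelet_fingerprint(w, n_scales=50, wavelet="mexh", S=5, rt=0.12):
    scales = np.arange(1, n_scales + 1)
    C, _ = pywt.cwt(w, scales, wavelet)   # continuous wavelet transform
    C = C / np.abs(C).max()               # normalize to [-1, 1], (9.11)
    I = np.zeros_like(C, dtype=int)
    centers = np.concatenate([np.arange(1, S + 1), -np.arange(1, S + 1)]) / S
    for s in centers:                     # thick contour slices, (9.12)
        I[(C >= s - rt / 2) & (C <= s + rt / 2)] = 1
    return I                              # binary fingerprint image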
Figure 9.17 illustrates the steps performed to create a wavelet fingerprint from
the topic 4 time series. Shown at the top is the raw time series for topic 4. Noise is
removed from the time series using a low-pass filter resulting in the middle image
of Fig. 9.17. Once the noise is removed from a time series it can be processed by the
DWFP algorithm, which generates the fingerprint shown at the bottom of Fig. 9.17.
It should be noted that the shading present in Fig. 9.17, as well as in other fingerprints in this chapter, is simply for readability. Generally, when analyzing fingerprints, just the binary image is used. The alternating shades show different objects in a fingerprint. A single object represents either a peak or a valley in the $C_{norm}$ matrix, where peaks refer to areas where all entries are positive and valleys are areas where all entries are negative.
One of the main advantages of using the DWFP for time-series analysis is the ability to create variations in fingerprint representations with different mother wavelets.
This is depicted in Fig. 9.18, which shows three different DWFP transformations of
the filtered time series in Fig. 9.17. All parameters remain constant—only the mother
wavelet is changed. The top image uses the Mexican Hat wavelet, the same as in Fig. 9.17; the middle fingerprint was created using the 4th Daubechies wavelet, and
the bottom image was created using the 6th Gaussian wavelet. From these representations we can extract features such as ridge count, filled area, best fit ellipses, etc.
Detailed explanations of these features can be found in [107].
Fig. 9.17 Illustration of process to create DWFP of topic 4. Beginning with the raw waveform
(top) we run a low-pass filter to filter out the noise. The filtered waveform (middle) is then passed
through the DWFP process to create the fingerprint (bottom). The resulting fingerprint gives a two-dimensional representation of the wavelet transform. In the above example we used the Mexican Hat wavelet, with rt = 0.12 and S = 5. The different shades show the different objects within the
fingerprint
Each different representation gives a unique description of the behavior of the signal and can be used to identify different characteristics. For example, the middle fingerprint in Fig. 9.18 has
more activity in low-scale wavelets, while the Mexican Hat DWFP shows more low-frequency information from the signal and a much wider fingerprint.
Individual objects in a fingerprint tell a story about the temporal behavior of a
signal over a short time. Each object in a fingerprint describes behavior in the time
series in a manner that is not possible through traditional Fourier analysis. In [74]
we analyzed seven different tweet storms of various volumes and contexts. Some
Fig. 9.18 Example of the ability to change representations of the same time series by simply
changing the mother wavelet. Represented are three DWFP transforms of the filtered time series
in Fig. 9.17. The top is the same Mexican Hat representation shown in Fig. 9.17. The middle representation was created using the 4th Daubechies wavelet. The bottom was created using the
6th Gaussian wavelet
were political tweet storms, while some were focused on major sporting events
including this World Cup dataset. Through this analysis, we identified 11 distinct
objects that described behavior that differentiated different types of topics. We called
these objects characteristic storm cells.
Figure 9.19 shows the fingerprints from topics 5 (top), 6 (middle), and 9 (bottom).
In each of these topics, as well as the fingerprint for topic 4 at the bottom of Fig. 9.17,
the far left object resembles storm cell 8 as found in [74]. We found this storm cell
was descriptive of topics that build up to a peak and maintain that peak for some time.
Each of the topics representing individual games shows this slow build up. Though
there are occasionally quick spikes for goals scored, these spikes will be registered
Fig. 9.19 Fingerprints for topics 5 (top), 6 (middle), and 9 (bottom). Each topic represents a
different match that occurred during data collection and all three show relatively similar behavior.
Different shades in each fingerprint represent different objects. Objects give a description of the
temporal behavior of a signal over a short time
in the second object which describes the behavior during the match. These second
objects represent various storm cells such as storm cell 7 which describes behavior
after the initial increase in volume.
Many of the match topics appear similar in the time-series representation, so it
is no surprise that they look similar in the DWFP representation. However, for some of the other topics, even though their time-series representations look quite different, we can show that the behavior is actually quite similar. Figure 9.20 shows
two different topics with their time series and corresponding fingerprints. The top
two images are for topic 0, which represents tweets that had strong correlation to the
hashtags #Russia2018 and #WorldCup, while the bottom two images are for topic 1,
Fig. 9.20 Time series and DWFP representations for topics 0 (top 2) and 1 (bottom 2). Even though these two topics emit quite different time signatures we can still identify common patterns in their
behavior using the DWFP. Specifically the objects indicated by the red arrows look quite similar to
the characteristic storm cell 0 as found in [74]
which represents mostly ads and spam that ran along with the World Cup coverage.
The time series for topic 0 mimics the time series for the dataset as a whole, Fig. 9.15,
with a rhythmic pattern over the three days of data. This should be expected because
most tweets in the dataset referenced #WorldCup. However, topic 1 does not have
this same shape. It exhibits two quick spikes around the same times topic 0 has its
local maxima, showing that the advertisers targeted their content for when traffic was
at its heaviest, but there is not the same rhythmic pattern over the three days as shown
from topic 0. While these time series look quite different in the time domain, they
share some key characteristics in the DWFP domain, most notably an object sitting
between the two local maxima, marked by the red arrow in both fingerprints, that
resembles storm cell 0 in [74]. This shows the underlying behavior of the advertisers,
who tailor their ads to World Cup fans, thus they try to run their ads when more users
are online to maximize exposure while minimizing the overall costs of running the
ads.
As teams and networks continue to push conversation and coverage to social
media over time, the amount of data on Twitter will continue to grow. For instance,
during the 2018 Olympics in South Korea, NBC began streaming live events online
and providing real-time coverage via Twitter [108], and several NFL games and
Wimbledon matches were also aired on Twitter in 2016 [109].
The application of topic modeling and DWFP analysis to social media could be
far more powerful than what was illustrated above. Disinformation campaigns have
been launched in at least 28 different countries [110] using networks of automated
accounts to target vulnerable audiences with their fake news stories disguised as
traditional news [111–114].
Content moderation at scale is an intractable problem in a free society, so analyzing characteristics of individual stories posted to social media is not the answer.
Instead we need to look for the underlying behavior of the actors operating campaigns. We can use the DWFP to uncover this behavior by identifying key temporal
signals buried deeply in the noise of social media. Networks of automated accounts
targeting specific audiences will necessarily emit different time signatures than would
a network of human users discussing some topic, as illustrated by ads timed to coincide with high World Cup tweet volume. This can be exploited by identifying the
fingerprint pattern that best identifies this time signature. Topics giving this time
signature can then be flagged so that preventative action can be taken to slow the
dissemination of that disinformation. Paired with spatial methods such as retweet
cascades [115] and network analysis [116, 117], this can differentiate bot-driven topics and disinformation campaigns from all other normal topics on social media. Part
of the inherent advantage of the DWFP representation is that it can highlight signal
features of interest that are buried deeply in noise.
9.7 DWFP for Account Analysis
Much of the spread of inflammatory content and disinformation on social media is
pushed by bots, many of which operate as part of disinformation campaigns [118–120]. Thus it is advantageous to be able to identify whether an individual account is a bot.
Bot accounts are adept at disguising themselves as humans so they can be exceedingly
difficult to detect [121]. Many bot accounts on social media platforms are harmless
and upfront about their nature. Examples of these include news aggregators and
customer service bots. Of greater concern to the public are accounts that do not
identify themselves as bots, but instead pretend to be real human accounts in order
to fool other users. These types of bots have been identified as a large component
in spreading misinformation on social media [122, 123]. They were instrumental
in the coordinated effort by the Russian-operated Internet Research Agency to sow discord in the United States’ voter base in an attempt to influence the 2016 US Presidential Election [124–126]. Concerns about malicious bot accounts pushing misinformation and misconstruing public sentiment have led to the need to
develop robust algorithms that can identify bots operating in near real time [127].
We can again use Tweepy to extract metadata from individual Twitter users [94].
Metadata related to an account include total number of followers/following, total
number of tweets, creation date, bio, etc. Metadata is commonly used to detect
bots [128–131]. One of the most popular currently available bot detection algorithms, Botometer—originally BotOrNot—uses over 1,000 features extracted from
an account’s metadata and a random forest classifier to return the likelihood an
account is a bot [132, 133]. However, training a bot detection system purely on
metadata will not create an algorithm that is robust through time. Bot creators are
constantly evolving their methods to subvert detection [134]. Thus, any bot detection
method needs to be able to look beyond surface level metadata and analyze behavioral characteristics of both bots and their networks, as these will be more difficult
to manipulate [135, 136]. We use the wavelet fingerprint to analyze the temporal
posting behavior of an account to identify inorganic behaviors.
Recently we [74] illustrated the basic method by analyzing a corpus of tweets curated
from 7 distinct tweet storms that occurred between 2015 and 2018 that we deemed
to have cultural significance. The Twitter API was used to stream tweets as events
unfolded for the Brett Kavanaugh confirmation hearings, Michael Cohen congressional testimony, President Trump’s summit with North Korea’s Kim Jong Un, the
release of the Mueller Report on Russian meddling in the 2016 presidential election,
and the 2018 World Cup. Open datasets were found for the 2018 Winter Olympics,
2017 Unite the Right rally in Charlottesville VA, and the 2015 riots in Baltimore
over the killing of Freddie Gray. In all we collected and analyzed 28,285,124 tweets.
The key to identifying twitterbots is sophisticated analysis of the timing of tweets.
Pre-programmed posts will not have the organic signatures characteristic of humans
posting thoughts and re-tweeting mots. The wavelet fingerprint method can be used
to analyze tweetstorms and automatically classify different types of tweetstorms.
We first use topic modeling to identify the things that are being tweeted about, and
then form time series of the topics of interest. We then perform our particular type of
wavelet transform on the time series to return the black-and-white time-scale images.
We then extract features from these wavelet fingerprints to use for machine learning
in order to classify categories of tweetstorms. Using the time series of each topic we
run wavelet fingerprint analyses to get a two-dimensional, time-scale, binary image.
Gaussian mixture model (GMM) clustering is used to identify individual objects, or
storm cells, that are characteristic of specific local behaviors commonly occurring in topics. The wavelet fingerprint transformation is volume agnostic, meaning we can
compare tweet storms of different intensities. We find that we can identify behavior,
localized in time, that is characteristic of how different topics propagate through
Twitter.
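A minimal sketch of that storm-cell clustering step with scikit-learn; the feature array here is a random stand-in for real per-object fingerprint measurements (filled area, ridge count, ellipse fits, and so on), and 11 components echoes the storm-cell count reported above.

# A minimal GMM storm-cell clustering sketch; `features` is a random
# stand-in for real per-object fingerprint measurements.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
features = rng.random((200, 6))  # (objects x measurements) placeholder
gmm = GaussianMixture(n_components=11, random_state=0).fit(features)
cells = gmm.predict(features)    # storm-cell label for each object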
Organic tweetstorms are driven by human emotions, e.g., outrage or team spirit,
and many people are often simultaneously expressing those same, or perhaps the
contra-lateral, emotions via tweets. It’s the characteristic fingerprints of those collective human emotions that bots can’t accurately mimic, because we’re pulling out
the topics from the specific groups of words being used. We have found, for example,
that fans tweeting about World Cup matches are distinguishable from fans tweeting
about Olympic events, presumably because football fans care much more deeply and
may be tweeting from a pub.
With enough tweets, we can use time-series representations to gain a deeper
understanding of the topic and a way to measure a topic’s cultural relevance by
allowing us to identify various trending behaviors. Did a topic build slowly to a peak
the way a grassroots movement might? Did it spike sharply, indicating a collective reaction to a singular event? These questions can be answered with time-series
representations. Although we are often able to identify important characteristics of
each topic using this sort of standard time-series representation, we need to be able to
analyze each topic’s behavior at a deeper level. Fourier analysis does not give behavior
localized in time and spectrograms are almost never the most efficient representations
of localized frequency content. Thus we look to wavelet analysis, which converts a
one-dimensional time series into two dimensions with coefficients representing both
wavelet scale and time. Wavelet scale is the wavelet analog to frequency, where high
wavelet scales represent low-frequency behavior and low-wavelet scales represent
high-frequency behavior. Each mother wavelet creates a unique fingerprint, which
can highlight different features in the time series. One of the main advantages of using
the wavelet fingerprint for time-series analysis is the ability to create variations in
fingerprint representations with different mother wavelets.
As an illustration of how we can use the DWFP to identify inorganic behavior emanating from a single account, we analyze the account trvestuff (@trvestuff). At 7:45
am Eastern time on 22 Jan 20, one of our curated sock puppets (@IBNowitall) with
8 known human followers, tweeted the following, “Al Gore tried to warn us all about
adverse climate change. He had a powerpoint and everything. Now even South Park
admits that manbearpig is real. It’s time to get ready for commerce in the Arctic. I’m
super serial.” It was retweeted almost immediately by a presumed bot programmed
to watch for tweets about climate change. The bot’s profile pic is clearly a teenager
purporting to be in Tacoma, WA where it would have been rather early. First-period
Fig. 9.21 Time-series analysis of the posting behavior of the account trvestuff (@trvestuff). The
top plot shows the time series of the raw number of posts every 15 min for two weeks. The middle
plot shows the wavelet fingerprint of this time series. The wavelet fingerprint was created using the
Gauss 2 wavelet, 50 wavelet scales, 5 slices, and a ridge thickness of 0.12. The bottom plot shows
the ridge count for the wavelet fingerprint. There is a clearly un-natural repetition to this signal
bell there is 7:35 am and this was a school day. Although this twitter account is only
a few months old, that profile pic has been on the interwebs for more than 5 years,
mostly in non-English-speaking countries. We ran our topic modeling procedure on
that bot’s corpus of tweets and formed a time signature of retweets characteristic of
bots.
Figure 9.21 shows the actual tweeting behavior of trvestuff plotted in time. The
top plot in this figure shows the time series of all tweets by trvestuff over a two-week period from 5 Mar 20 until 19 Mar 20. Each point in the time series shows how many tweets the account posted in a 15 min interval. There is clearly a repetitive behavior exhibited by trvestuff, with 12 h of tweeting 4–6 tweets every hour, then 12 h off.
This is confirmed by looking at the wavelet fingerprint shown in the middle image of
Fig. 9.21. Compare this to Fig. 9.22 which shows the same information for a prolific
human tweeter (@realDonaldTrump) and looks much less regular, with a few spikes
of heavy tweeting activity. Looking at the fingerprint we cannot see any repetitive
behavior, outside of maybe a slight diurnal pattern, which would be natural for any
human tweeter.
Features are extracted from the fingerprints which describe the tweeting behavior
of an account in different ways. For example at the bottom of both Figs. 9.21 and
9.22 is the ridge count for the corresponding wavelet fingerprint. The ridge count
tabulates the number of times each column of pixels in the fingerprint switches from
Fig. 9.22 Time-series analysis of the posting behavior of the account Donald Trump (@realDonaldTrump). The top plot shows the time series of the raw number of posts every 15 min for two
weeks. The middle plot shows the wavelet fingerprint of this time series. The wavelet fingerprint
was created using the Gauss 2 wavelet, 50 wavelet scales, 5 slices, and a ridge thickness of 0.12.
The bottom plot shows the ridge count for the wavelet fingerprint
Fig. 9.23 Autocorrelation coefficients for each of the ridge counts shown in Figs. 9.21 (top) and
9.22 (bottom). Each data point shows the correlation coefficient of the ridge count with itself offset
by that many data points
0 to 1 or 1 to 0. We can use the autocorrelation of the ridge counts to unearth any
repetitive behaviors. Figure 9.23 shows the normalized autocorrelation. Each plot shows how similar the vector is to itself shifted by some number of time steps, with a maximum of 1 if they are exactly the same. Again we see further evidence of the artificial nature of trvestuff’s tweeting behavior. The President’s ridge count shows no real repetition. There is a small spike when the vector is offset by about three days. This is from the ridge count spikes on March 9th and March 12th aligning. However, the prominence of that spike is nowhere close to the prominence of the spikes exhibited by trvestuff.
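A sketch of the ridge count and its normalized autocorrelation, assuming I is a binary fingerprint image (rows are wavelet scales, columns are time) such as the one returned by the earlier wavelet_fingerprint sketch.

# A sketch of the ridge count and its normalized autocorrelation; `I`
# is an assumed binary fingerprint image (rows: scales, columns: time).
import numpy as np

ridge_count = np.abs(np.diff(I, axis=0)).sum(axis=0)  # 0/1 flips per column

r = ridge_count - ridge_count.mean()                # remove the mean
acf = np.correlate(r, r, mode="full")[len(r) - 1:]  # autocorrelation
acf = acf / acf[0]                                  # normalize so acf[0] = 1
# A bot's rigid posting schedule shows up as strong periodic peaks in acf.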
The DWFP was probably unnecessary to see that trvestuff is a bot posing as a
human. Sophisticated bots operating on Twitter will not be this obvious, of course,
but the goal is to detect bots automatically. The DWFP has shown itself to be adept
at identifying behavior that is buried deeply in noise, so it can identify these inorganic signals that are buried deeply in the noise of a Twitter bot’s posting behavior.
This is the capability necessary to uncover sophisticated bots spreading spam and
disinformation through Twitter. But what else might this be useful for?
9.8 In-Game Sports Betting
Some people think Ultimate Frisbee is a sport. Some people think the earth is shaped
like a Frisbee. Some people think soccer is a sport. Some people think the earth is
shaped like a football. I think American football is a sport. I think basketball is a
sport. I think baseball and golf are excellent games for napping. I think Frisbee is a
sport for dogs. I think muggle quidditch is embarrassing. I have no particular opinion
on soccer, except that I don’t think kids should get a trophy if their feet never touched
the ball. But now that sports betting has been legalized nationally (What were the
odds of that?) we may have to expand our definition of sports to be pretty much any
contest that you can bet on [137]. Americans currently wager a total of $150 billion
each year through illegal sports gambling, although it’s not clear how much of that
is March Madness office pools where most people make picks by their teams’ pretty
colors or offensive mascots.
Some people are worried that sports betting will create ethical issues for the
NCAA. I’m worried that academically troubled student-athletes are encouraged to
sign up for so-called paper classes, which are essentially no-show independent studies
involving a single paper that allows functionally illiterate football players to prop
up their GPAs to satisfy the NCAA’s eligibility requirements.6 As many others have
6 Here’s an actual UNC Tarheel paper that was written for an actual intro class, in which the student-athlete finished with an actual A-: On the evening of December Rosa Parks decided that she was going to sit in the white people section on the bus in Montgomery, Alabama. During this time blacks had to give up there seats to whites when more whites got on the bus. Rosa Parks refused to give up her seat. Her and the bus driver began to talk and the conversation went like this. “Let me have those front seats” said the driver. She didn’t get up and told the driver that she was tired of giving her seat to white people. “I’m going to have you arrested,” said the driver. “You may do that,” Rosa Parks responded. Two white policemen came in and Rosa Parks asked them “why do you all push us around?” The police officer replied and said “I don’t know, but the law is the law and you’re under arrest.”
pointed out, people who think that most big-time college athletes are at school first
and foremost to be educated are fooling themselves. They’re there to work and
earn money and prestige for the school. That money all stays inside the athletic
departments. The English department faculty don’t even get free tee shirts or tickets
to the games, because that might create a conflict of interest if someday a semi-literate
star player needed an A- to keep his eligibility. Profs. do get to share in the prestige,
though. There might be a small faculty discount at the university bookstore, which
mostly sells tee shirts.
Tweetstorms about the Olympics give characteristically different fingerprints than
the World Cup. Fans are somewhat less emotionally engaged for the former, and
somewhat more likely to be drinking during the latter. Winter Olympics hooliganism simply isn’t a thing, unless Canada loses at curling. Both the Olympics and the World
Cup do take place over about a fortnight, and whoever paid for the broadcast rights
does try hard to generate buildup. Hence, related tweetstorms can be considered to
have compact support in time. Cultural events also often happen over about this sort
of time scale because people collectively are fickle and the public is thought to have
short attention spans. There’s also always some new disaster happening or looming to
grab the public’s attention and perhaps generate advertising dollars. We can analyze
tweetstorms about global pandemics in a manner similar to how we did the Olympics
and the World Cup. The minutiae of the wavelet fingerprints are different in ways
that should allow us to begin to automatically characterize tweetstorm events as they
begin to build and develop so that proper action can be taken. Of particular interest
is distinguishing tweetstorms that blow up organically versus those that are inflated
artificially. Fake news is now widely recognized as a problem in society, but some
politicians have gotten into the habit of accusing anything that they disagree with of
being fake news. Analyzing the tweetstorms that surround an event should allow us
to determine its inherent truthiness. We expect this to be of critical importance during
the next few election cycles, now that deepfake videos can be produced without the
hardware, software and talent that used to be confined to Hollywood special effects
studios.
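To make the tweetstorm fingerprinting concrete, here is a minimal sketch in the spirit of [74]: bin the tweet timestamps into a rate series, take its continuous wavelet transform, and slice the normalized coefficients into a binary ridge image whose minutiae can then be compared across events. The bin width, mother wavelet, scales, and slice levels below are illustrative assumptions, not the settings from the published analysis.

```python
# A hedged sketch: tweet timestamps -> rate series -> binary wavelet
# fingerprint image. All parameter choices here are illustrative only.
import numpy as np
import pywt

def tweet_rate(timestamps, bin_seconds=60):
    """Bin raw timestamps (in seconds) into a tweets-per-minute series."""
    t = np.sort(np.asarray(timestamps, dtype=float))
    edges = np.arange(t[0], t[-1] + bin_seconds, bin_seconds)
    counts, _ = np.histogram(t, bins=edges)
    return counts.astype(float)

def fingerprint(series, scales=np.arange(1, 64), n_slices=8):
    """Slice normalized CWT coefficients into a binary ridge image."""
    coeffs, _ = pywt.cwt(series, scales, 'mexh')
    coeffs = coeffs / np.max(np.abs(coeffs))       # normalize to [-1, 1]
    levels = np.linspace(-1, 1, n_slices)
    image = np.zeros_like(coeffs, dtype=bool)
    for lv in levels:                              # mark pixels near each slice
        image |= np.abs(coeffs - lv) < 1.0 / (2 * n_slices)
    return image                                   # rows: scale, columns: time

# Synthetic storm that builds and fades over a couple of hours
stamps = np.random.normal(loc=3600, scale=900, size=5000)
fp = fingerprint(tweet_rate(stamps))
print(fp.shape)                                    # (n_scales, n_minutes)
```

One would expect an organically building storm to ramp up and die away more gradually than a bot-inflated one, and it is that sort of difference the fingerprint minutiae encode.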
These techniques can be used to analyze in real time the minutiae of in-game
sports betting in order to adjust odds on the fly faster than human bookies could ever
do. For a slow-moving sport like major league baseball, fans inside the stadium will
be able to hold a beer in one hand and their stadium-WiFi-connected smartphone in
the other. Teams will want to provide excellent WiFi and convenient apps that let
fans bet on every little thing that happens during the game. They can encourage fans
to come to the game by exploiting the demise of net neutrality to give a moment’s
advantage to fans inside the stadium as compared to those streaming the game at
home. The in-stadium betting can be easily monetized by taking a small cut off the
top for those using the team’s app, but the real issue with in-game betting is adjusting
the odds to make sure they are always in Panem’s favor. We expect a rich storm of
information to develop around any live sporting event as people draw on multiple
sources of information, guidance, speculation, etc. in order to improve their odds. As
of 2020, that seems likely to be primarily Twitter, although we use the terms “tweets”
and “tweetstorms” in the general sense, rather than referring to a particular social
media platform because those come and go all the time. Speech-to-text technology
is robust enough that radio broadcasts can be entrained into topic modeling systems,
and databases such as in-game, in-stadium beer sales can be updated and accessed
in real time. Meteorological information and the like can also be incorporated via a
comprehensive machine learning approach to fine-tune the odds. Gotta go, the game's
about to start.
9.9 Virtual Financial Advisor Is Now Doable
Meanwhile, Warren Buffett and Bill Gates are persuading fellow billionaires to each
commit to giving away half of their wealth to good causes before they die. If none
of the billionaires listed at givingpledge.org think you’re a good cause to
give their money to, you might be interested in some fancier investment strategies
than diversified, low-load index funds. Econometrics is the field of study where mathematical modeling and statistics are used to figure out why the market(s) just did that thing that nobody except Warren Buffett predicted. Business analytics and
machine learning are all the rage right now, and applying artificial intelligence to
financial modeling and prediction seems like a no-lose business proposition. As I
may have mentioned, there seem to be lots of cranks and charlatans in this business
right now.
If you page back through this book you should see that the dynamic wavelet
fingerprint technique works quite well to identify subtle patterns in time-series data.
It would be a trivial matter to form wavelet fingerprints from econometric datasets,
and then to look for patterns that you think might predict the future. The human eye is
very good at seeing patterns in these kinds of fingerprint images, but will see patterns
even if they aren’t there, so you have to be very careful about fooling yourself. The
good news is you can place bets on yourself and see if your predictions return profit
or loss. You could also write a simple wavelet fingerprint app for day traders and see
if that hits the jackpot. If you make billions of dollars doing that sort of thing, be sure
to sign on to the giving pledge. Generously supporting The Alma Mater of a Nation
would be a worthy cause, and we’ll be happy to name a building after you and yours.
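Before you write that app, here is a minimal sketch of the bet-on-yourself sanity check, with every choice in it (wavelet, threshold, feature window) invented for illustration: extract a crude fingerprint-density feature from a stand-in price series and ask a plain logistic regression to predict next-day direction on a holdout.

```python
# A hedged sketch: fingerprint-density feature vs. next-day direction.
# All parameters are invented; on a random walk, expect coin-flip accuracy.
import numpy as np
import pywt
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
prices = np.cumsum(rng.standard_normal(600))   # stand-in for daily closes
returns = np.diff(prices)

coeffs, _ = pywt.cwt(prices, np.arange(1, 32), 'mexh')
coeffs = coeffs / np.max(np.abs(coeffs))
ridges = np.abs(coeffs) > 0.5                  # crude binary fingerprint

# Feature for day t: ridge density over the trailing 10-day window.
# NB: a real backtest must avoid the look-ahead that transforming the
# whole series at once introduces; this is only a toy.
X = np.array([[ridges[:, t - 10:t].mean()] for t in range(10, len(returns))])
y = (returns[10:] > 0).astype(int)             # did the next day go up?

split = len(X) // 2                            # simple walk-forward holdout
clf = LogisticRegression().fit(X[:split], y[:split])
print("holdout accuracy:", clf.score(X[split:], y[split:]))
```

If a feature like this beats the coin flip consistently out of sample, you might have something; until then, assume your eye found a pattern that isn't there.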
Machine learning has rather a lot of potential for doing good in financial services, so we close with that to end on a hopeful note, as opposed to deepfakes and Twitter bots and such. Structured data typically comprises well-defined data types whose pattern makes them easily searchable, while unstructured data is usually not as easily searchable and includes formats like audio, video, and social media postings that require more preprocessing. It's now possible to make use of both structured and unstructured data types in a unified machine learning paradigm. Key to the applications discussed in this book is extracting meaning from time-series data
using standard statistical methods as well as time-series and time-frequency/time-scale transformations which convert time-series signals to higher-dimensional representations where image processing and shape identification methods can be brought to bear. The standard tools available via day-trading platforms such as E*trade, TD Ameritrade, etc.—and even the more sophisticated tools on Bloomberg terminals—can be leveraged to develop advanced signal processing and machine learning methods for a virtual financial advisor (VFA).
A VFA could, for example, give advice on purchasing a car that would best fit
a person’s particular financial situation, accounting for marital status, dependents,
professional status, savings, total debt, credit score, etc. as well as previous cars
purchased. Rather than a simple credit score decision tree, the goal is a more holistic assessment of the person’s situation and financial trajectory and how notional
car-buying decisions would affect that financial trajectory both over the short term
and much longer. Natural language processing allows the VFA to simply engage the
person in a conversation without the pressure of a salesman trying to close a deal,
and without having to type a bunch of information into a stupid webform. Of course,
major financial decisions like buying a car should never be made without an assessment of the state of the larger economy and economic outlook. The VFA algorithms should allow for incorporation of micro- and macro-economic data including real-time assessments of markets, trending topics on world financial markets discussion sites (verbal and text-based), and historical data/analysis with an eye toward precursor signals of significant financial events.7 Topic modeling and pattern classification
via dynamic wavelet fingerprints allow a machine learning system to incorporate and
make sense of patterns in all of these diverse and disparate information sources in
order to give neutral but tailored guidance to a potential car buyer. The VFA
would need to be able to
• Leverage data of people with similar attributes, e.g., education, professional status,
life events, financial situations, and previous car ownership. The data types and
formats available, especially the time resolution and length of archived data history,
will be critical in this algorithm development.
• Understand semantics in conversations via an up-to-date library of car names, nicknames, and related jargon; no such library ships ready-made, but freely available sources from Wikipedia to specialized subreddits can be scraped and analyzed with topic-modeling methods (see the sketch after this list).
• Present alternatives as suggestions, but not rank ordered in any way because most
people view their cars as an extension of their own self-image and part of the
“personality” of the VFA is that it doesn’t presume to know which car makes you
cool or whatever human emotions happen to be driving your car-buying decisions.
The recommendations would simply be couched in terms of how various options might affect your financial situation now and in the future.
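As a toy illustration of the scraping-and-topic-modeling step mentioned in the second bullet (the corpus and every parameter below are invented), a count vectorizer plus LDA is enough to surface topics whose top words can seed the jargon library:

```python
# A toy sketch: scraped car chatter -> bag of words -> LDA topics whose
# top words seed an up-to-date jargon library. The corpus is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the mustang gt rumbles but the ecoboost sips gas",
    "crv vs rav4 which compact suv holds its resale value",
    "leased a model 3 because the autopilot hype got me",
    "f150 towing capacity beats the silverado again this year",
]
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
vocab = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [vocab[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {top_words}")
```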
Of course the VFA would need to be integrated as a module in an intelligent personal
assistant technology with an interface to interact using (Warren Buffett's) voice, which
7 As I'm typing this, oil is at $20 per barrel and some analysts are predicting single digits. Who could have predicted that?
means that it could be used to keep track of television shows such as Motorweek and
several of the numerous car-repair and rework shows airing currently, automotive
podcasts, etc. Similarly, it could monitor financial news television shows, podcasts,
etc. with a particular focus on identifying and isolating discussions about the state
of the automotive industry. A side benefit of the ability to monitor and classify
automotive industry and financial audio streams may be to “bookmark” segments that would be of interest to you and suggest that you might want to listen to or watch current news about the cars you've expressed interest in buying. The TextRank algorithm for automatic keyword extraction and summarization, using Levenshtein distance as the relation between text units, can be leveraged to tl;dr them.
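Here is a minimal sketch of that tl;dr step, assuming sentence-level text units and a simple normalized-edit-distance similarity (the weighting in the original TextRank paper differs): sentences become graph nodes, Levenshtein-based similarities become edge weights, and PageRank picks the most central sentences.

```python
# A hedged TextRank-style summarizer: sentences are nodes, edge weights
# are Levenshtein-based similarities, and PageRank picks the tl;dr.
import re
import networkx as nx

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def tldr(text, n_sentences=2):
    """Return the n most central sentences as a summary."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i, si in enumerate(sentences):
        for j in range(i):
            d = levenshtein(si.lower(), sentences[j].lower())
            sim = 1.0 - d / max(len(si), len(sentences[j]))
            if sim > 0:
                graph.add_edge(i, j, weight=sim)
    scores = nx.pagerank(graph, weight='weight')
    top = sorted(scores, key=scores.get, reverse=True)[:n_sentences]
    return ' '.join(sentences[k] for k in sorted(top))
```

Fed the transcript of a half-hour car show, something like this hands back the two sentences most related to everything else, which is usually a serviceable tl;dr.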
This sort of system could be expanded to give advice for other decisions such
as home purchases (e.g., finding a home, mortgages and insurance) and college
savings (e.g., 529 College Savings). The VFA would give advice on how to approach
purchasing a home, for example, by helping with mortgages and insurance as well as
finding homes that best suit a person’s family and financial situation. It would make
use of existing home-search tools, like Realtor.com, Zillow.com, etc. Buying a home
is a much more significant financial decision with much longer term consequences,
so a feature of a VFA personality for this would be entering into a patient, long-term conversation about the range of choices. Also, once the person's dream home appears on the market there is often a sudden urgency to downselect all the ancillary financial issues, especially financing options and the like. A virtual financial
advisor is ideally suited to all configurations of a hurry-up-and-wait scenario and can
patiently and diligently watch both the local housing market and mortgage rates for
things to perfectly align. Individuals can be confident that the VFA is driven by what’s
in their best interests, not just the need to make its sales numbers this quarter, and
will present ranges of options with implications for their long-term financial well-being at the center of the calculus.
You wouldn’t have to worry about whether the VFA might be a crank (fool) or a
charlatan (crook) because its sole purpose is to collect information and detect patterns
and then give you tailored, unbiased advice. Similarly, modules to offer advice concerning other long-term financial planning decisions, such as college savings, could
be incorporated. Specialized expertise—such as details about how to maximize benefits under the Post-9/11 GI Bill or why it’s better to have grandparents open a 529
College Savings Plan or how the typical discount rate at private colleges affects cost
of attendance—could be incorporated into the VFA. There is a wealth of insider
information about how college works and what it actually costs and how US News
rankings, Final Four results, and post-college placement statistics, etc. affect everything from the acceptance rate to the cost of attendance to the subsequent earning potential
for the child. The VFA can help to demystify college for a young parent who needs
to know now how to start planning enough in advance to help open the right doors 10
or 15 years hence without resorting to bribing a coach or something. We think wavelet
fingerprints can help.
References
1. Stromberg J (2013) What’s in century-old ‘Snake Oil’ medicines? Mercury and lead, smithsonian.com April 8, 2013. https://www.smithsonianmag.com/science-nature/whats-in-centuryold-snake-oil-medicines-mercury-and-lead-16743639/
2. Silberman S (2012) Are warnings about the side effects of drugs making us sick? PLOS
(The Public Library of Science). http://blogs.plos.org/neurotribes/2012/07/16/are-warningsabout-the-side-effects-of-drugs-making-us-sick/. Accessed 16 July 2012
3. Preston E (2014) Say no to nocebo: how doctors can keep patients’ minds from making them
sicker. Discover Magazine. http://blogs.discovermagazine.com/inkfish/2014/07/09/say-noto-nocebo-how-doctors-can-keep-patients-minds-from-making-them-sicker/. Accessed 9
July 2014
4. Zublin F (2015) Keep your whooping cough to yourself, OZY.com. https://www.ozy.com/
immodest-proposal/keep-your-whooping-cough-to-yourself/60310. Accessed 31 May 2015
5. Lewandowsky S, Mann ME, Bauld L, Hastings G, Loftus EF (2013) The subterranean war
on science. Association for Psychological Science. https://www.psychologicalscience.org/
observer/the-subterranean-war-on-science
6. Mole B (2016) FDA: homeopathic teething gels may have killed 10 babies, sickened
400. Ars Technica. https://arstechnica.com/science/2016/10/fda-homeopathic-teething-gelsmay-have-killed-10-babies-sickened-400/. Accessed 13 Oct 2016
7. Gorski D (2010) The dietary supplement safety act of 2010: a long overdue correction to the
DSHEA of 1994? Science-Based Medicine. http://www.sciencebasedmedicine.org/?p=3772.
Accessed 8 Feb 2010
8. Summers D (2014) Dr. Oz: world’s best snake oil salesman. The Daily Beast. https://www.
thedailybeast.com/dr-oz-worlds-best-snake-oil-salesman. Accessed 14 June 2014
9. McCoy T (2014) Half of Dr. Oz’s medical advice is baseless or wrong, study says. The
Washington Post. http://www.washingtonpost.com/news/morning-mix/wp/2014/12/19/halfof-dr-ozs-medical-advice-is-baseless-or-wrong-study-says/. Accessed 19 Dec 2014
10. Kaplan K (2014) Real-world doctors fact-check Dr. Oz, and the results aren’t pretty.
LA Times. http://www.latimes.com/science/sciencenow/la-sci-sn-dr-oz-claims-fact-checkbmj-20141219-story.html. Accessed 19 Dec 2014
11. Ray Merriman Workshop at ISAR Conference on “Reimagining the Future”. https://isar2020.
org/, https://www.mmacycles.com/events/ray-merriman-workshop/. Accessed 9 Sept 2020
12. Kennon J (2019) How Warren Buffett became one of the wealthiest people in America: a chronological history of the Oracle of Omaha, 1930–2019. The Balance. https://www.thebalance.
com/warren-buffett-timeline-356439. Accessed 25 June 2019
13. Yochim D, Voigt K (2019) Index funds: how to invest and best funds to choose. NerdWallet.
https://www.nerdwallet.com/blog/investing/how-to-invest-in-index-funds/. Accessed 5 Dec
2019
14. Warren Buffett, ‘Oracle of Omaha’, criticises Wall Street and praises immigrants (2017). The Guardian. https://www.theguardian.com/business/2017/feb/25/warren-buffettberkshire-hathaway-wall-street-apple-annual-letter. Accessed 25 Feb 2017
15. Yong Ed (2016) Psychology’s replication crisis can’t be wished away. The Atlantic. https://
www.theatlantic.com/science/archive/2016/03/psychologys-replication-crisis-cant-bewished-away/472272/. Accessed 4 March 2016
16. Reilly J (2014) Chin chin: urine-drinking Hindu cult believes a warm cup before sunrise
straight from a virgin cow heals cancer - and followers are queuing up to try it. The
Daily Mail. http://www.dailymail.co.uk/news/article-2538520/Urine-drinking-Hindu-cultbelieves-warm-cup-sunrise-straight-virgin-cow-heals-cancer-followers-queuing-try-it.
html. Accessed 13 Jan 2014
17. Adams C (2014) Is there a scientifically detectable difference between high-price liquor and
regular stuff? The Straight Dope. http://www.straightdope.com/columns/read/3142/is-therea-scientifically-detectable-difference-between-high-price-liquor-and-regular-stuff/. Accessed 3 Jan 2014
18. Adams W (2014) Wine expert reviews cheap beer. Devour.com. http://devour.com/video/
wine-expert-reviews-cheap-beer/. Accessed 26 Feb 2014
19. Burke J (2013) Woo-ing wine drinkers. Skepchick. http://skepchick.org/2013/01/guest-postwoo-ing-wine-drinkers/. Accessed 19 Jan 2013
20. The bottled water taste test. BuzzFeedVideo. https://youtu.be/2jIC6MBkjjs. Accessed 30 Nov
2014
21. Peters J (2008) What’s the best adult diaper? That depends. Slate Geezers issue. http://
www.slate.com/articles/life/geezers/2008/09/whats_the_best_adult_diaper.html. Accessed
10 Sept 2008
22. Barry-Jester AM (2016) What went wrong in Flint. FiveThirtyEight. https://fivethirtyeight.
com/features/what-went-wrong-in-flint-water-crisis-michigan/. Accessed 26 Jan 2016
23. Groll E (2015) WHO: to avoid MERS, Don’t Drink Camel Urine. Foreign Policy. https://
foreignpolicy.com/2015/06/08/who-to-avoid-mers-dont-drink-camel-urine/. Accessed 8
June 2015
24. Lee C (2015) Magic placebo more effective than ordinary placebo. Ars Technica. https://
arstechnica.com/science/2015/04/a-tale-of-two-placebos/, http://journals.plos.org/plosone/
article?id=10.1371/journal.pone.0118440. Accessed 22 April 2015
25. Kubo K, Guillory N (2016) People tried their own urine for the first time and they were disgusted. Buzz Feed. https://www.buzzfeed.com/kylekubo/people-tried-their-own-urine-forthe-first-time-and-they-wer. Accessed 5 Jan 2016
26. This section adapted with permission from: Ignatius B. Nowitall, "Scientific Adulting and BS
Detection" W&N Edutainment, 2020
27. Houdini H (1924) A magician among the spirits. Harper, New York
28. Mizokami K (2018) 70 years and counting, the UFO phenomenon is as mysterious as ever.
Popular Mechanics. https://www.popularmechanics.com/space/a22025557/world-ufo-day2018/. Accessed 2 July 2018
29. Wright T (2013) Why there will never be another Flying Pancake. The end of Vought V173. Air & Space Magazine. http://www.vought.org/rest/html/rv-1731.html, https://www.
airspacemag.com/history-of-flight/restoration-vought-v-173-7990846/
30. Webster D (2017) In 1947, a high-altitude balloon crash-landed in Roswell. The aliens never left. Smithsonian Magazine. https://www.smithsonianmag.com/smithsonian-institution/in1947-high-altitude-balloon-crash-landed-roswell-aliens-never-left-180963917/. Accessed
5 July 2017
31. Project BLUE BOOK - Unidentified Flying Objects. United States Air Force, 1952–1969.
https://www.archives.gov/research/military/air-force/ufos.html
32. “Top Secret America” was a project nearly two years in the making that describes the huge
national security buildup in the United States after the Sept. 11, 2001, attacks. The project
was last updated in September 2010. http://projects.washingtonpost.com/top-secret-america/
33. Kirk M, Gilmore J, Wiser M, Smith M (2014) United States of Secrets. Frontline. https://
www.pbs.org/wgbh/frontline/film/united-states-of-secrets/. Accessed 13 May 2014
34. Lundberg J, Pilkington M, Denning R, Kyprianou K (2013) Mirage Men, a journey into
paranoia, disinformation and UFOs. Random Media. World premiere at Sheffield Doc/Fest
in June 2013. https://www.imdb.com/title/tt2254010/
35. Plackett B (2012) Declassified at last: air force’s supersonic flying saucer schematics. Wired.
https://www.wired.com/2012/10/the-airforce/. Accessed 05 Oct 2012
36. Cowing K (2014) CIA admits that it owns all of the flying saucers. NASA Watch. http://
nasawatch.com/archives/2014/12/cia-admits-that.html. Accessed 29 Dec 2014
37. Dickson C (2015) Obama adviser John Podesta’s biggest regret: keeping America in dark
about UFOs. Yahoo News. https://www.yahoo.com/news/outgoing-obama-adviser-johnpodesta-s-biggest-regret-of-2014--keeping-america-in-the-dark-about-ufos-234149498.
html. Accessed 13 Feb 2015
38. Cooper H, Blumenthal R, Kean L (2017) Glowing auras and ‘Black Money’: the Pentagon’s
Mysterious U.F.O. Program. New York Times. https://www.nytimes.com/2017/12/16/us/
politics/pentagon-program-ufo-harry-reid.html. Accessed 16 Dec 2017
39. Tchou A (2011) I saw four green objects in a formation, an interactive map of 15 years
of UFO sightings. Slate. http://www.slate.com/articles/news_and_politics/maps/2011/01/i_
saw_four_green_objects_in_a_formation.html. Accessed 11 Jan 2011
40. British UFO files reveal shocking sightings and hoaxes from 1950 to present. Huffington Post, May 25, 2011. https://www.huffingtonpost.com/2011/03/03/uk-releases-8500pages-of_n_830880.html
41. Darrach HB, Jr, Ginna R (1952) HAVE WE VISITORS FROM SPACE? LIFE Magazine.
http://www.project1947.com/shg/csi/life52.html. Accessed 7 April 1952
42. Sofge E (2009) The 10 most influential UFO-inspired books, movies and TV shows. Popular
Mechanics. https://www.popularmechanics.com/space/g204/4305349/. Accessed 17 March
2009
43. Mikkelson D (2014) Did Boyd Bushman provide evidence of alien contact? Snopes.com.
https://www.snopes.com/fact-check/boyd-bushman-aliens/. Accessed 31 Oct 2014
44. When President Roosevelt died, Truman was informed by Secretary of War Henry Stimson
of a new and terrible weapon being developed by physicists in New Mexico. https://www.
history.com/this-day-in-history/truman-is-briefed-on-manhattan-project
45. "Farmer Trent’s Flying Saucer". Life: 40. https://books.google.com/books?
id=50oEAAAAMBAJ. Accessed 26 June 1950
46. SCIENTIFIC STUDY OF UNIDENTIFIED FLYING OBJECTS Conducted by the University
of Colorado Under contract No. 44620-67-C-0035 With the United States Air Force, Dr.
Edward U. Condon, Scientific Director (1968). Electronic edition 1999 by National Capital
Area Skeptics (NCAS). http://files.ncas.org/condon/text/case46.htm
47. Armentano D (2009) OC's moment in UFO history. The Orange County Register, October
30, 2009. https://www.ocregister.com/2009/10/30/ocs-moment-in-ufo-history/. See also:
https://www.washingtonpost.com/news/morning-mix/wp/2015/01/21/two-decades-ofmysterious-air-force-ufo-files-now-available-online/
48. Bowen C (1970) “Progress at Cradle Hill” in the March/April edition of Flying Saucer Review.
An international journal devoted to the study of Unidentified Flying Objects vol 17, no 2
(1970). http://www.ignaciodarnaude.com/ufologia/FSR%201971%20V%2017%20N%202.
pdf
49. Minchin T (2011) Storm the animated movie. https://www.youtube.com/watch?
v=HhGuXCuDb1U. Accessed 7 April 2011
50. Worth1000.com is now part of DesignCrowd.com, which has preserved all the amazing
Worth1000 content here so you can search the archives to find old favorites and new contest
art. https://blog.designcrowd.com/article/898/worth1000-on-designcrowd
51. Tumulty K (2007) How the right went wrong. Time Magazine. (PHOTOGRAPH BY DAVID
HUME KENNERLY. TEAR BY TIM O’BRIEN) http://content.time.com/time/covers/0,
16641,20070326,00.html. Accessed 15 March 2007
52. “The Age of Instagram Face, How social media, FaceTune, and plastic surgery created a
single, cyborgian look.” by Jia Tolentino. The New Yorker. https://www.newyorker.com/
culture/decade-in-review/the-age-of-instagram-face. Accessed 12 Dec 2019
53. This guy can’t stop photoshopping himself into Kendall Jenner’s instagram pics, twisted Sifter.
http://twistedsifter.com/2017/01/guy-photoshops-himself-into-kendall-jenners-pics/. Kirby
Jenner, Fraternal Twin of Kendall Jenner is @KirbyJenner. Accessed 5 Jan 2017
54. Shiffman D (2013) How to tell if a “shark in flooded city streets after a storm” photo is a fake in
5 easy steps. Southern Fried Science. http://www.southernfriedscience.com/how-to-tell-if-ashark-in-flooded-city-streets-after-a-storm-photo-is-a-fake-in-5-easy-steps/. Accessed 23
Jan 2013
55. Geigner T (2013) This week’s bad photoshopping lesson comes from scientology (from the
the-thetans-did-it dept). Techdirt. Accessed 16 May 2013
56. Live Science Staff “10 Paranormal Videos Debunked” Live Science. https://www.livescience.
com/33237-6-paranormal-video-hoaxes.html. Accessed 27 April 2011
57. Boehler P (2013) Nine worst doctored photos of Chinese officials. South China
Morning Post. https://www.scmp.com/news/china-insider/article/1343568/slideshow-doctoredphotos-chinese-officials. Accessed 30 Oct 2013
58. Dickson EJ (2014) Plastic surgeons say Three-Boob Girl is a hoax. The Daily Dot. https://
www.dailydot.com/irl/three-boob-internet-hoax/. Accessed 23 Sept 2014
59. Bruce Shapiro: What happens when Photoshop goes too far? PBS Newshour. https://www.pbs.
org/newshour/show/now-see-exhibit-chronicles-manipulated-news-photos. Accessed 26
July 2015
60. Slade N (2007) Haiti UFO debunked: slow motion and enhanced stills. Video posted on Aug 10, 2007. https://www.youtube.com/watch?v=rrrx9izp0Lc. See also https://www.snopes.com/fact-check/ufos-over-haiti/
61. Sarno D (2007) It came from outer space. Los Angeles Times. Accessed 22 Aug 2007. http://
www.latimes.com/newsletters/topofthetimes/la-et-ufo22aug22-story.html
62. Newitz A (2012) Why is this the most popular UFO footage on YouTube?” io9, Gizmodo.
https://io9.gizmodo.com/5912215/why-is-this-the-most-popular-ufo-footage-on-youtube.
Accessed 22 May 2012
63. Weiskott E (2016) Before ‘Fake News’ Came False Prophecy. The Atlantic. https://www.
theatlantic.com/politics/archive/2016/12/before-fake-news-came-false-prophecy/511700/.
Accessed 27 Dec 2016
64. Holiday R (2012) How your fake news gets made (Two Quick Examples). Forbes.
https://www.forbes.com/sites/ryanholiday/2012/05/24/how-your-fake-news-gets-madetwo-quick-examples/. Accessed 24 May 2012
65. Achenbach J (2015) Why do many reasonable people doubt science? National Geographic.
https://www.nationalgeographic.com/magazine/2015/03/
66. Bergstrom CT, West J (2017) Calling bullshit: data reasoning in a digital world. INFO
198/BIOL 106B. University of Washington, Autumn Quarter 2017. https://callingbullshit.
org/syllabus.html
67. Aschwanden C (2015) Science isn’t broken, it’s just a hell of a lot harder than we give it credit
for. FiveThirtyEight. https://fivethirtyeight.com/features/science-isnt-broken/. Accessed 19
Aug 2015
68. Yglesias M (2018) Mark Zuckerberg has been apologizing for reckless privacy violations
since he was a freshman. Vox. https://www.vox.com/2018/4/10/17220290/mark-zuckerbergfacemash-testimony. Accessed 11 April 2018
69. Constine J (2018) Truepic raises $8M to expose Deepfakes, verify photos for Reddit.
TechCrunch. https://techcrunch.com/2018/06/20/detect-deepfake/. Accessed 20 June 2018
70. Building a better news experience on YouTube, together. YouTube Official Blog, Monday.
https://youtube.googleblog.com/2018/07/building-better-news-experience-on.html.
Accessed 9 July 2018
71. Trulove R (2018) Wildlife writer and conservationist of over a half century. Quora. https://
top.quora.com/What-is-a-group-of-squirrels-called. Accessed 25 March 2018
72. Blakeslee S (1997) Kentucky doctors warn against a regional dish: squirrels’ brains.
New York Times. https://www.nytimes.com/1997/08/29/us/kentucky-doctors-warn-againsta-regional-dish-squirrels-brains.html. Accessed 29 Aug 1997
73. Data Never Sleeps 6.0, https://www.domo.com/learn/data-never-sleeps-6
74. Kirn SL, Hinders MK (2020) Dynamic wavelet fingerprint for differentiation of tweet storm
types. Soc Netw Anal Min 10:4. https://doi.org/10.1007/s13278-019-0617-3
75. Raschka S (2015) Python machine learning. Packt Publishing Ltd., Birmingham
76. Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: International conference on machine learning, pp 1188–1196
77. Mikolov T et al (2013) Distributed representations of words and phrases and their compositionality. Adv Neural Inf Process Syst 3111–3119
78. Jedrzejowicz J, Zakrzewska M (2017) Word embeddings versus LDA for topic assignment in documents. In: Nguyen N, Papadopoulos G, Jedrzejowicz P, Trawinski B, Vossen G (eds) Computational collective intelligence. ICCCI 2017. Lecture notes in computer science, vol 10449. Springer, Cham
79. Lau JH, Baldwin T (2016) An empirical evaluation of doc2vec with practical insights into document embedding generation. In: Proceedings of the 1st workshop on representation learning
for NLP. Berlin, Germany, pp 78–86
80. Levy O, Goldberg Y (2014) Neural word embedding as implicit matrix factorization. Adv
Neural Inf Process Syst 2177–2185
81. Niu L, Dai X (2015) Topic2Vec: learning distributed representations of topics, CoRR,
arXiv:1506.08422
82. Xun G et al (2017) Collaboratively improving topic discovery and word embeddings by
coordinating global and local contexts. KDD 17 Research Paper, pp 535–543
83. Xu W et al (2003) Document clustering based on non-negative matrix factorization. In:
Research and development in information retrieval conference proceedings, pp 267–273
84. Jähnichen P et al (2018) Scalable generalized dynamic topic models. arXiv:1803.07868
85. Blei DM, Lafferty JD (2006) Dynamic topic models. In: Proceedings of the 23rd international
conference on machine learning, pp 113–129
86. Saha A, Sindhwani V (2012) Learning evolving and emerging topics in social media: a
dynamic NMF approach with temporal regularization. In: Proceedings of the fifth ACM international conference on Web search and data mining, pp 693–702
87. Greene D, Cross JP (2017) Exploring the political agenda of the European Parliament using a dynamic topic modeling approach. Polit Anal 25(1):77–94
88. Chen Y et al (2015) Modeling emerging, evolving, and fading topics using dynamic soft orthogonal NMF with sparse representation. In: 2015 IEEE international conference on data
mining, pp 61–70
89. Deerwester S et al (1990) Indexing by latent semantic analysis. J Amer Soc Inf Sci 41(6):391–
407
90. Blei DM et al (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
91. Pedregosa F et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–
2830
92. Mimno D et al (2011) Optimizing semantic coherence in topic models. In: Proceedings of the
2011 conference on empirical methods in natural language processing, pp 262–272
93. Kim S (2018) 5G is making its global debut at Olympics, and it's wicked fast. Bloomberg.
https://www.bloomberg.com/news/articles/2018-02-12/5g-is-here-super-speed-makesworldwide-debut-at-winter-olympics
94. Tweepy Documentation, http://docs.tweepy.org/en/v3.5.0/index.html
95. Daubechies I (1992) Ten lectures on wavelets, vol 61. SIAM
96. Hou J, Hinders MK (2002) Dynamic Wavelet fingerprint identification of ultrasound signals.
Mater Eval 60:1089–1093
97. Bertoncini CA, Hinders MK (2010) Fuzzy classification of roof fall predictors in microseismic
monitoring. Measurement 43(10):1690–1701
98. Bertoncini CA, Rudd K, Nousain B, Hinders M (2012) Wavelet fingerprinting of radiofrequency identification (RFID) tags. IEEE Trans Ind Electron 59(12):4843–4850
99. Bingham J, Hinders M (2009) Lamb wave characterization of corrosion-thinning in aircraft
stringers: experiment and three-dimensional simulation. J Acoust Soc Amer 126(1):103–113
100. Bingham J, Hinders M, Friedman A (2009) Lamb wave detection of limpet mines on ship
hulls. Ultrasonics 49(8):706–722
101. Hinders M, Bingham J, Rudd K, Jones R, Leonard K (2006) Wavelet thumbprint analysis of
time domain reflectometry signals for wiring flaw detection. In: Thompson DO, Chimenti DE
(eds) Review of progress in quantitative nondestructive evaluation, vol 25, American Institute
of Physics Conference Series, vol 820, pp 641–648
102. Hou J, Leonard KR, Hinders MK (2004) Automatic multi-mode lamb wave arrival time
extraction for improved tomographic reconstruction. Inverse Probl 20(6):1873–1888
103. Hou J, Rose ST, Hinders MK (2005) Ultrasonic periodontal probing based on the dynamic
wavelet fingerprint. EURASIP J Adv Signal Process 2005:1137–1146
104. Miller CA, Hinders MK (2014) Classification of flaw severity using pattern recognition for
guided wave-based structural health monitoring. Ultrasonics 54(1):247–258
105. Skinner E, Kirn S, Hinders M (2019) Development of underwater beacon for arctic through-ice
communication via satellite. Cold Reg Sci Technol 160:58–79
106. Cohen L (1995) Time-frequency analysis, vol 778. Prentice Hall, Upper Saddle River
107. Bertoncini CA (2010) Applications of pattern classification to time-domain signals. PhD
dissertation, William and Mary, Department of Physics
108. Pivot to Video: Inside NBC’s Social Media Strategy for the 2018 Winter Games
109. Dillinger A, Everything you need to know about Watching the NFL on Twitter, https://www.
dailydot.com/debug/how-to-watch-nfl-games-on-twitter/
110. Bradshaw S, Howard PN (2018) The global organization of social media disinformation
campaigns. J Int Affairs 71:23–35
111. Keller FB et al (2019) Political astroturfing on Twitter: how to coordinate a disinformation
campaign. Polit Commun 1–25
112. Pierri F et al (2019) Investigating Italian disinformation spreading on Twitter in the context
of 2019 European elections, arXiv:1907.08170
113. Yao Y et al (2017) Automated crowdturfing attacks and defenses in online review systems.
In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications
security, pp 1143–1158
114. Zannettou S et al (2019) Disinformation warfare: Understanding state-sponsored trolls on
Twitter and their influence on the web. In: Companion proceedings of the 2019 World Wide
Web Conference, pp 218–226
115. Starbird K and Palen L (2012) (How) will the revolution be retweeted?: information diffusion
and the 2011 Egyptian uprising. In: Proceedings of the ACM 2012 conference on computer
supported cooperative work, pp 7–16
116. Xiong F et al (2012) An information diffusion model based on retweeting mechanism for
online social media. Phys Lett A 376(30–31):2103–2108
117. Zannettou S et al (2017) The web centipede: understanding how web communities influence
each other through the lens of mainstream and alternative news sources. In: Proceedings of
the 2017 internet measurement conference, pp 405–417
118. Shao C et al (2017) The spread of fake news by social bots. arXiv:1707.07592
119. Woolley SC (2016) Automating power: social bot interference in global politics. First Monday
21(4)
120. Woolley SC (2020) The reality game: how the next wave of technology will break the truth.
PublicAffairs. ISBN 9781541768246
121. Schneier B (2020) Bots are destroying political discourse as we know it. The Atlantic
122. Ferrara E et al (2016) The rise of social bots. Commun ACM 59(7):96–104
123. Shao C et al (2018) The spread of low-credibility content by social bots. Nat Commun
9(1):4787
124. Mueller RS (2019) Report on the investigation into Russian interference in the 2016 Presidential
Election. US Department of Justice, Washington, DC
125. Bessi A, Ferrara E (2016) Social bots distort the 2016 US Presidential Election online discussion. First Monday 21(11)
126. Linvill DL et al (2019) “The Russians are Hacking my Brain!” Investigating Russia's Internet Research Agency Twitter tactics during the 2016 United States Presidential Campaign. Comput Hum Behav
127. Subrahmanian VS et al (2016) The DARPA Twitter bot challenge. Computer 49(6):38–46
128. Minnich A et al (2017) Botwalk: efficient adaptive exploration of Twitter bot networks. In:
Proceedings of the 2017 IEEE/ACM international conference on advances in social networks
analysis and mining 2017, pp 467–474
129. Varol O et al (2017) Online human-bot interactions: detection, estimation, and characterization. In: Eleventh international AAAI conference on web and social media
130. Kudugunta S, Ferrara E (2018) Deep neural networks for bot detection. Inf Sci 467:312–322
131. Yang K et al (2019) Scalable and generalizable social bot detection through data selection.
arXiv:1911.09179
132. Davis CA et al (2016) BotOrNot: a system to evaluate social bots. In: Proceedings of the 25th
international conference companion on the World Wide Web, pp 273–274
133. Yang K et al (2019) Arming the public with artificial intelligence to counter social bots. Hum
Behav Emerg Technol 1(1):48–61
134. Cresci S et al (2017) The paradigm-shift of social spambots: evidence, theories and tools
for the arms race. In: Proceedings of the 26th international conference on World Wide Web
companion, pp 963–972
135. Cresci S et al (2017) Social fingerprinting: detection of spambot groups through DNA-inspired
behavioral modeling. IEEE Trans Dependable Secure Comput 15(4):561–576
136. Cresci S et al (2019) On the capability of evolved spambots to evade detection via genetic
engineering. Online Soc Netw Media 9:1–16
137. Stradbrooke S (2018) US Supreme Court rules federal sports betting ban is unconstitutional. CalvinAyre.com. https://calvinayre.com/2018/05/14/business/us-supreme-courtrules-paspa-sports-betting-ban-unconstitutional/. Accessed 14 May 2018