Security Engineering
A Guide to Building Dependable Distributed Systems
Third Edition
Ross Anderson
Copyright © 2020 by Ross Anderson
Published by John Wiley & Sons, Inc., Indianapolis, Indiana
Published simultaneously in Canada and the United Kingdom
ISBN: 978-1-119-64278-7
ISBN: 978-1-119-64283-1 (ebk)
ISBN: 978-1-119-64281-7 (ebk)
Manufactured in the United States of America
No part of this publication may be reproduced, stored in a retrieval system or transmitted
in any form or by any means, electronic, mechanical, photocopying, recording, scanning or
otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright
Act, without either the prior written permission of the Publisher, or authorization through
payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood
Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher
for permission should be addressed to the Permissions Department, John Wiley & Sons,
Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this
work and specifically disclaim all warranties, including without limitation warranties of
fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every
situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required,
the services of a competent professional person should be sought. Neither the publisher nor
the author shall be liable for damages arising herefrom. The fact that an organization or Web
site is referred to in this work as a citation and/or a potential source of further information
does not mean that the author or the publisher endorses the information the organization or
website may provide or recommendations it may make. Further, readers should be aware
that Internet websites listed in this work may have changed or disappeared between when
this work was written and when it is read.
For general information on our other products and services please contact our Customer
Care Department within the United States at (877) 762-2974, outside the United States at
(317) 572-3993 or fax (317) 572-4002.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some
material included with standard print versions of this book may not be included in e-books
or in print-on-demand. If this book refers to media such as a CD or DVD that is not included
in the version you purchased, you may download this material at booksupport.wiley.com.
For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2020948679
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John
Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not
be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned
in this book.
For Shireen, Bavani, Nav, Ivan, Lily-Rani, Veddie and Bella
About the Author
I’ve worked with systems for over forty years. I graduated in mathematics
and natural science from Cambridge in the 1970s, and got a qualification in
computer engineering; my first proper job was in avionics; and after getting
interested in cryptology and computer security, I worked in the banking
industry in the 1980s. I then started working for companies who designed
equipment for banks, and then on related applications such as prepayment
electricity meters.
I moved to academia in 1992 but continued to consult to industry on security
technology. During the 1990s, the number of applications that used cryptology
rose rapidly: burglar alarms, car door locks, road toll tags and satellite TV systems all made their appearance. The first legal disputes about these systems
came along, and I was lucky enough to be an expert witness in some of the
important cases. The research team I lead had the good fortune to be in the
right place at the right time when technologies such as peer-to-peer systems,
tamper-resistance and digital watermarking became hot topics.
After I’d taught security and cryptology to students for a few years, it
became clear to me that the existing textbooks were too narrow and theoretical: the security textbooks focused on the access control mechanisms in
operating systems, while the cryptology books developed the theory behind
cryptographic algorithms and protocols. These topics are interesting, and
important. But they’re only part of the story. Most working engineers are
not overly concerned with crypto or operating system internals, but with
getting good tools and learning how to use them effectively. The inappropriate
use of protection mechanisms is one of the main causes of security failure.
I was encouraged by the positive reception of a number of articles I wrote
on security engineering (starting with ‘Why Cryptosystems Fail’ in 1993).
Finally, in 1999, I got round to rewriting my class lecture notes and a number
of real-world case studies into a book for a general technical audience.
The first edition of the book, which appeared in 2001, helped me consolidate
my thinking on the economics of information security, as I found that when I
pulled my experiences about some field together into a narrative, the backbone
of the story was often the incentives that the various players had faced. As the
first edition of this book established itself as the standard textbook in the field,
I worked on establishing security economics as a discipline. In 2002, we started
the Workshop on the Economics of Information Security to bring researchers
and practitioners together.
By the time the second edition came out in 2008, it was clear we’d not paid
enough attention to the psychology of security either. Although we’d worked
on security usability from the 1990s, there’s much more to it than that. We need
to understand everything from the arts of deception to how people’s perception of risk is manipulated. So in 2008 we started the Workshop on Security and
Human Behaviour to get security engineers talking to psychologists, anthropologists, philosophers and even magicians.
A sabbatical in 2011, which I spent partly at Google and partly at Carnegie
Mellon University, persuaded me to broaden our research group to hire psychologists and criminologists. Eventually in 2015 we set up the Cambridge
Cybercrime Centre to collect lots of data on the bad things that happen online
and make them available to over a hundred researchers worldwide. This hasn’t
stopped us doing research on technical security; in fact it’s helped us pick more
relevant technical research topics.
A medic needs to understand a whole series of subjects including anatomy,
physiology, biochemistry, pharmacy and psychology, and then temper this
knowledge with experience of working on hundreds of cases with experienced
colleagues. So also a security engineer needs to understand technical subjects
like crypto, access controls, protocols and side channels; but this knowledge
also needs to be honed by studying real cases. My goal in my academic career
has been to pull all this together. The result you now hold in your hands.
I have learned a lot in the process; writing down what you think you know
is a good way of finding out what you don’t. I have also had a lot of fun. I hope
you have as much fun reading it!
Ross Anderson
Cambridge, November 2020
Acknowledgements
A great many people have helped in various ways with the third edition of
this book. I put the chapters online for comment as I wrote them, and I owe
thanks to the many people who read them and pointed out assorted errors and
obscurities. They are: Mansoor Ahmed, Sam Ainsworth, Peter Allan, Amit
Seal Ami, James Andrews, Tom Auger, Asokan, Maria Bada, Daniel Bates,
Craig Bauer, Pilgrim Beart, Gerd Beuster, Johann Bezuidenhoudt, Fred Bone,
Matt Brockman, Nick Bohm, Fred Bone, Phil Booth, Lorenzo Cavallaro, David
Chaiken, Yi Ting Chua, Valerio Cini, Ben Collier, Hugo Connery, Lachlan
Cooper, Franck Courbon, Christopher Cowan, Ot van Daalen, Ezra Darshan,
Roman Dickmann, Saar Drimer, Charles Duffy, Marlena Erdos, Andy Farnell,
Bob Fenichel, David Fernee, Alexis FitzGerald, Jean-Alain Fournier, Jordan
Frank, Steve Friedl, Jerry Gamache, Alex Gantman, Ben Gardiner, Jon Geater,
Stuart Gentry, Cam Gerlach, John Gilmore, Jan Goette, Ralph Gross, Cyril
Guerin, Pedram Hayati, Chengying He, Matt Hermannson, Alex Hicks, Ross
Hinds, Timothy Howell, Nick Humphrey, James Humphry, Duncan Hurwood,
Gary Irvine, Erik Itland, Christian Jeschke, Gary Johnson, Doug Jones, Henrik
Karlzen, Joud Khoury, Jon Kilian, Timm Korte, Ronny Kuckuck, Mart Kung,
Jay Lala, Jack Lang, Susan Landau, Peter Landrock, Carl Landwehr, Peter
Lansley, Jeff Leese, Jochen Leidner, Tom de Leon, Andrew Lewis, David
Lewis, Steve Lipner, Jim Lippard, Liz Louis, Simon Luyten, Christian Mainka,
Dhruv Malik, Ivan Marsa-Maestra, Phil Maud, Patrick McCorry, TJ McIntyre,
Marco Mesturino, Luke Mewburn, Spencer Moss, Steven Murdoch, Arvind
Narayanan, Lakshmi Narayanan, Kristi Nikolla, Greg Norcie, Stanislav
Ochotnický, Andy Ozment, Deborah Peel, Stephen Perlmutter, Tony Plank,
William Porquet, David Pottage, Mark Quevedo, Roderick Rees, Larry Reeves,
Philipp Reisinger, Mark Richards, Niklas Rosencrantz, Andy Sayler, Philipp
Schaumann, Christian Schneider, Ben Scott, Jean-Pierre Seifert, Mark Shawyer,
Adam Shostack, Ilia Shumailov, Barbara Simons, Sam Smith, Saija Sorsa,
Michael Specter, Chris Tarnovski, Don Taylor, Andrew Thaeler, Kurt Thomas,
Anthony Vance, Jonas Vautherin, Alex Vetterl, Jeffrey Walton, Andrew Watson, Debora Weber-Wulff, Nienke Weiland, David White, Blake Wiggs, Robin
Wilton, Ron Woerner, Bruno Wolff, Stuart Wray, Jeff Yan, Tom Yates, Andrew
Yeomans, Haaroon Yousaf, Tim Zander and Yiren Zhao. I am also grateful to
my editors at Wiley, Tom Dinse, Jim Minatel and Pete Gaughan, and to my
copyeditors Judy Flynn and Kim Wimpsett, who have all helped make the
process run smoothly.
The people who contributed in various ways to the first and second editions
included the late Anne Anderson, Adam Atkinson, Jean Bacon, Robin Ball,
Andreas Bender, Alastair Beresford, Johann Bezuidenhoudt, Maximilian
Blochberger, David Boddie, Kristof Boeynaems, Nick Bohm, Mike Bond,
Richard Bondi, Robert Brady, Martin Brain, John Brazier, Ian Brown, Mike
Brown, Nick Bohm, Richard Bondi, the late Caspar Bowden, Duncan Campbell, Piotr Carlson, Peter Chambers, Valerio Cini, Richard Clayton, Frank Clish,
Jolyon Clulow, Richard Cox, Dan Cvrcek, George Danezis, James Davenport,
Peter Dean, John Daugman, Whit Diffie, Roger Dingledine, Nick Drage,
Austin Donnelly, Ben Dougall, Saar Drimer, Orr Dunkelman, Steve Early, Dan
Eble, Mike Ellims, Jeremy Epstein, Rasit Eskicioǧlu, Robert Fenichel, Fleur
Fisher, Shawn Fitzgerald, Darren Foong, Shailendra Fuloria, Dan Geer, Gary
Geldart, Paul Gillingwater, John Gilmore, Brian Gladman, Virgil Gligor, Bruce
Godfrey, John Gordon, Gary Graunke, Rich Graveman, Wendy Grossman,
Dan Hagon, Feng Hao, Tony Harminc, Pieter Hartel, David Håsäther, Bill Hey,
Fay Hider, Konstantin Hyppönen, Ian Jackson, Neil Jenkins, Simon Jenkins,
Roger Johnston, Oliver Jorns, Nikolaos Karapanos, the late Paul Karger, Ian
Kelly, Grant Kelly, Alistair Kelman, Ronald De Keulenaer, Hyoung Joong Kim,
Patrick Koeberl, Oliver Kömmerling, Simon Kramer, Markus Kuhn, Peter
Landrock, Susan Landau, Jack Lang, Jong-Hyeon Lee, the late Owen Lewis,
Stephen Lewis, Paul Leyland, Jim Lippard, Willie List, Dan Lough, John
McHugh, the late David MacKay, Garry McKay, Udi Manber, John Martin,
Nick Mathewson, Tyler Moore, the late Bob Morris, Ira Moskowitz, Steven
Murdoch, Shishir Nagaraja, Roger Nebel, the late Roger Needham, Stephan
Neuhaus, Andrew Odlyzko, Mark Oeltjenbruns, Joe Osborne, Andy Ozment,
Alexandros Papadopoulos, Roy Paterson, Chris Pepper, Oscar Pereira, Fabien
Petitcolas, Raphael Phan, Mike Roe, Mark Rotenberg, Avi Rubin, Jerry Saltzer,
Marv Schaefer, Denise Schmandt-Besserat, Gus Simmons, Sam Simpson,
Sergei Skorobogatov, Matthew Slyman, Rick Smith, Sijbrand Spannenburg, the
late Karen Spärck Jones, Mark Staples, Frank Stajano, Philipp Steinmetz, Nik
Sultana, Don Taylor, Martin Taylor, Peter Taylor, Daniel Thomas, Paul Thomas,
Vlasios Tsiatsis, Marc Tobias, Hal Varian, Nick Volenec, Daniel Wagner-Hall,
Randall Walker, Robert Watson, Keith Willis, Simon Wiseman, Stuart Wray, Jeff
Yan and the late Stefek Zaba. I also owe a lot to my first publisher, Carol Long.
Through the whole process I have been supported by my family, and especially by my long-suffering wife Shireen. Each edition of the book meant over
a year when I was constantly distracted. Huge thanks to all for putting up
with me!
Contents at a Glance

Preface to the Third Edition
Preface to the Second Edition
Preface to the First Edition
For my daughter, and other lawyers …
Foreword

Part I
Chapter 1   What Is Security Engineering?
Chapter 2   Who Is the Opponent?
Chapter 3   Psychology and Usability
Chapter 4   Protocols
Chapter 5   Cryptography
Chapter 6   Access Control
Chapter 7   Distributed Systems
Chapter 8   Economics

Part II
Chapter 9   Multilevel Security
Chapter 10  Boundaries
Chapter 11  Inference Control
Chapter 12  Banking and Bookkeeping
Chapter 13  Locks and Alarms
Chapter 14  Monitoring and Metering
Chapter 15  Nuclear Command and Control
Chapter 16  Security Printing and Seals
Chapter 17  Biometrics
Chapter 18  Tamper Resistance
Chapter 19  Side Channels
Chapter 20  Advanced Cryptographic Engineering
Chapter 21  Network Attack and Defence
Chapter 22  Phones
Chapter 23  Electronic and Information Warfare
Chapter 24  Copyright and DRM
Chapter 25  New Directions?

Part III
Chapter 26  Surveillance or Privacy?
Chapter 27  Secure Systems Development
Chapter 28  Assurance and Sustainability
Chapter 29  Beyond “Computer Says No”

Bibliography
Index
Contents

Preface to the Third Edition
Preface to the Second Edition
Preface to the First Edition
For my daughter, and other lawyers …
Foreword

Part I

Chapter 1  What Is Security Engineering?
  1.1 Introduction
  1.2 A framework
  1.3 Example 1 – a bank
  1.4 Example 2 – a military base
  1.5 Example 3 – a hospital
  1.6 Example 4 – the home
  1.7 Definitions
  1.8 Summary

Chapter 2  Who Is the Opponent?
  2.1 Introduction
  2.2 Spies
    2.2.1 The Five Eyes
      2.2.1.1 Prism
      2.2.1.2 Tempora
      2.2.1.3 Muscular
      2.2.1.4 Special collection
      2.2.1.5 Bullrun and Edgehill
      2.2.1.6 Xkeyscore
      2.2.1.7 Longhaul
      2.2.1.8 Quantum
      2.2.1.9 CNE
      2.2.1.10 The analyst’s viewpoint
      2.2.1.11 Offensive operations
      2.2.1.12 Attack scaling
    2.2.2 China
    2.2.3 Russia
    2.2.4 The rest
    2.2.5 Attribution
  2.3 Crooks
    2.3.1 Criminal infrastructure
      2.3.1.1 Botnet herders
      2.3.1.2 Malware devs
      2.3.1.3 Spam senders
      2.3.1.4 Bulk account compromise
      2.3.1.5 Targeted attackers
      2.3.1.6 Cashout gangs
      2.3.1.7 Ransomware
    2.3.2 Attacks on banking and payment systems
    2.3.3 Sectoral cybercrime ecosystems
    2.3.4 Internal attacks
    2.3.5 CEO crimes
    2.3.6 Whistleblowers
  2.4 Geeks
  2.5 The swamp
    2.5.1 Hacktivism and hate campaigns
    2.5.2 Child sex abuse material
    2.5.3 School and workplace bullying
    2.5.4 Intimate relationship abuse
  2.6 Summary
  Research problems
  Further reading

Chapter 3  Psychology and Usability
  3.1 Introduction
  3.2 Insights from psychology research
    3.2.1 Cognitive psychology
    3.2.2 Gender, diversity and interpersonal variation
    3.2.3 Social psychology
      3.2.3.1 Authority and its abuse
      3.2.3.2 The bystander effect
    3.2.4 The social-brain theory of deception
    3.2.5 Heuristics, biases and behavioural economics
      3.2.5.1 Prospect theory and risk misperception
      3.2.5.2 Present bias and hyperbolic discounting
      3.2.5.3 Defaults and nudges
      3.2.5.4 The default to intentionality
      3.2.5.5 The affect heuristic
      3.2.5.6 Cognitive dissonance
      3.2.5.7 The risk thermostat
  3.3 Deception in practice
    3.3.1 The salesman and the scamster
    3.3.2 Social engineering
    3.3.3 Phishing
    3.3.4 Opsec
    3.3.5 Deception research
  3.4 Passwords
    3.4.1 Password recovery
    3.4.2 Password choice
    3.4.3 Difficulties with reliable password entry
    3.4.4 Difficulties with remembering the password
      3.4.4.1 Naïve choice
      3.4.4.2 User abilities and training
      3.4.4.3 Design errors
      3.4.4.4 Operational failures
      3.4.4.5 Social-engineering attacks
      3.4.4.6 Customer education
      3.4.4.7 Phishing warnings
    3.4.5 System issues
    3.4.6 Can you deny service?
    3.4.7 Protecting oneself or others?
    3.4.8 Attacks on password entry
      3.4.8.1 Interface design
      3.4.8.2 Trusted path, and bogus terminals
      3.4.8.3 Technical defeats of password retry counters
    3.4.9 Attacks on password storage
      3.4.9.1 One-way encryption
      3.4.9.2 Password cracking
      3.4.9.3 Remote password checking
    3.4.10 Absolute limits
    3.4.11 Using a password manager
    3.4.12 Will we ever get rid of passwords?
  3.5 CAPTCHAs
  3.6 Summary
  Research problems
  Further reading

Chapter 4  Protocols
  4.1 Introduction
  4.2 Password eavesdropping risks
  4.3 Who goes there? – simple authentication
    4.3.1 Challenge and response
    4.3.2 Two-factor authentication
    4.3.3 The MIG-in-the-middle attack
    4.3.4 Reflection attacks
  4.4 Manipulating the message
  4.5 Changing the environment
  4.6 Chosen protocol attacks
  4.7 Managing encryption keys
    4.7.1 The resurrecting duckling
    4.7.2 Remote key management
    4.7.3 The Needham-Schroeder protocol
    4.7.4 Kerberos
    4.7.5 Practical key management
  4.8 Design assurance
  4.9 Summary
  Research problems
  Further reading

Chapter 5  Cryptography
  5.1 Introduction
  5.2 Historical background
    5.2.1 An early stream cipher – the Vigenère
    5.2.2 The one-time pad
    5.2.3 An early block cipher – Playfair
    5.2.4 Hash functions
    5.2.5 Asymmetric primitives
  5.3 Security models
    5.3.1 Random functions – hash functions
      5.3.1.1 Properties
      5.3.1.2 The birthday theorem
    5.3.2 Random generators – stream ciphers
    5.3.3 Random permutations – block ciphers
    5.3.4 Public key encryption and trapdoor one-way permutations
    5.3.5 Digital signatures
  5.4 Symmetric crypto algorithms
    5.4.1 SP-networks
      5.4.1.1 Block size
      5.4.1.2 Number of rounds
      5.4.1.3 Choice of S-boxes
      5.4.1.4 Linear cryptanalysis
      5.4.1.5 Differential cryptanalysis
    5.4.2 The Advanced Encryption Standard (AES)
    5.4.3 Feistel ciphers
      5.4.3.1 The Luby-Rackoff result
      5.4.3.2 DES
  5.5 Modes of operation
    5.5.1 How not to use a block cipher
    5.5.2 Cipher block chaining
    5.5.3 Counter encryption
    5.5.4 Legacy stream cipher modes
    5.5.5 Message authentication code
    5.5.6 Galois counter mode
    5.5.7 XTS
  5.6 Hash functions
    5.6.1 Common hash functions
    5.6.2 Hash function applications – HMAC, commitments and updating
  5.7 Asymmetric crypto primitives
    5.7.1 Cryptography based on factoring
    5.7.2 Cryptography based on discrete logarithms
      5.7.2.1 One-way commutative encryption
      5.7.2.2 Diffie-Hellman key establishment
      5.7.2.3 ElGamal digital signature and DSA
    5.7.3 Elliptic curve cryptography
    5.7.4 Certification authorities
    5.7.5 TLS
      5.7.5.1 TLS uses
      5.7.5.2 TLS security
      5.7.5.3 TLS 1.3
    5.7.6 Other public-key protocols
      5.7.6.1 Code signing
      5.7.6.2 PGP/GPG
      5.7.6.3 QUIC
    5.7.7 Special-purpose primitives
    5.7.8 How strong are asymmetric cryptographic primitives?
    5.7.9 What else goes wrong
  5.8 Summary
  Research problems
  Further reading

Chapter 6  Access Control
  6.1 Introduction
  6.2 Operating system access controls
    6.2.1 Groups and roles
    6.2.2 Access control lists
    6.2.3 Unix operating system security
    6.2.4 Capabilities
    6.2.5 DAC and MAC
    6.2.6 Apple’s macOS
    6.2.7 iOS
    6.2.8 Android
    6.2.9 Windows
    6.2.10 Middleware
      6.2.10.1 Database access controls
      6.2.10.2 Browsers
    6.2.11 Sandboxing
    6.2.12 Virtualisation
  6.3 Hardware protection
    6.3.1 Intel processors
    6.3.2 Arm processors
  6.4 What goes wrong
    6.4.1 Smashing the stack
    6.4.2 Other technical attacks
    6.4.3 User interface failures
    6.4.4 Remedies
    6.4.5 Environmental creep
  6.5 Summary
  Research problems
  Further reading

Chapter 7  Distributed Systems
  7.1 Introduction
  7.2 Concurrency
    7.2.1 Using old data versus paying to propagate state
    7.2.2 Locking to prevent inconsistent updates
    7.2.3 The order of updates
    7.2.4 Deadlock
    7.2.5 Non-convergent state
    7.2.6 Secure time
  7.3 Fault tolerance and failure recovery
    7.3.1 Failure models
      7.3.1.1 Byzantine failure
      7.3.1.2 Interaction with fault tolerance
    7.3.2 What is resilience for?
    7.3.3 At what level is the redundancy?
    7.3.4 Service-denial attacks
  7.4 Naming
    7.4.1 The Needham naming principles
    7.4.2 What else goes wrong
      7.4.2.1 Naming and identity
      7.4.2.2 Cultural assumptions
      7.4.2.3 Semantic content of names
      7.4.2.4 Uniqueness of names
      7.4.2.5 Stability of names and addresses
      7.4.2.6 Restrictions on the use of names
    7.4.3 Types of name
  7.5 Summary
  Research problems
  Further reading

Chapter 8  Economics
  8.1 Introduction
  8.2 Classical economics
    8.2.1 Monopoly
  8.3 Information economics
    8.3.1 Why information markets are different
    8.3.2 The value of lock-in
    8.3.3 Asymmetric information
    8.3.4 Public goods
  8.4 Game theory
    8.4.1 The prisoners’ dilemma
    8.4.2 Repeated and evolutionary games
  8.5 Auction theory
  8.6 The economics of security and dependability
    8.6.1 Why is Windows so insecure?
    8.6.2 Managing the patching cycle
    8.6.3 Structural models of attack and defence
    8.6.4 The economics of lock-in, tying and DRM
    8.6.5 Antitrust law and competition policy
    8.6.6 Perversely motivated guards
    8.6.7 Economics of privacy
    8.6.8 Organisations and human behaviour
    8.6.9 Economics of cybercrime
  8.7 Summary
  Research problems
  Further reading

Part II

Chapter 9  Multilevel Security
  9.1 Introduction
  9.2 What is a security policy model?
  9.3 Multilevel security policy
    9.3.1 The Anderson report
    9.3.2 The Bell-LaPadula model
    9.3.3 The standard criticisms of Bell-LaPadula
    9.3.4 The evolution of MLS policies
    9.3.5 The Biba model
  9.4 Historical examples of MLS systems
    9.4.1 SCOMP
    9.4.2 Data diodes
  9.5 MAC: from MLS to IFC and integrity
    9.5.1 Windows
    9.5.2 SELinux
    9.5.3 Embedded systems
  9.6 What goes wrong
    9.6.1 Composability
    9.6.2 The cascade problem
    9.6.3 Covert channels
    9.6.4 The threat from malware
    9.6.5 Polyinstantiation
    9.6.6 Practical problems with MLS
  9.7 Summary
  Research problems
  Further reading

Chapter 10  Boundaries
  10.1 Introduction
  10.2 Compartmentation and the lattice model
  10.3 Privacy for tigers
  10.4 Health record privacy
    10.4.1 The threat model
    10.4.2 The BMA security policy
    10.4.3 First practical steps
    10.4.4 What actually goes wrong
      10.4.4.1 Emergency care
      10.4.4.2 Resilience
      10.4.4.3 Secondary uses
    10.4.5 Confidentiality – the future
    10.4.6 Ethics
    10.4.7 Social care and education
    10.4.8 The Chinese Wall
  10.5 Summary
  Research problems
  Further reading

Chapter 11  Inference Control
  11.1 Introduction
  11.2 The early history of inference control
    11.2.1 The basic theory of inference control
      11.2.1.1 Query set size control
      11.2.1.2 Trackers
      11.2.1.3 Cell suppression
      11.2.1.4 Other statistical disclosure control mechanisms
      11.2.1.5 More sophisticated query controls
      11.2.1.6 Randomization
    11.2.2 Limits of classical statistical security
    11.2.3 Active attacks
    11.2.4 Inference control in rich medical data
    11.2.5 The third wave: preferences and search
    11.2.6 The fourth wave: location and social
  11.3 Differential privacy
  11.4 Mind the gap?
    11.4.1 Tactical anonymity and its problems
    11.4.2 Incentives
    11.4.3 Alternatives
    11.4.4 The dark side
  11.5 Summary
  Research problems
  Further reading

Chapter 12  Banking and Bookkeeping
  12.1 Introduction
  12.2 Bookkeeping systems
    12.2.1 Double-entry bookkeeping
    12.2.2 Bookkeeping in banks
    12.2.3 The Clark-Wilson security policy model
    12.2.4 Designing internal controls
    12.2.5 Insider frauds
    12.2.6 Executive frauds
      12.2.6.1 The post office case
      12.2.6.2 Other failures
      12.2.6.3 Ecological validity
      12.2.6.4 Control tuning and corporate governance
    12.2.7 Finding the weak spots
  12.3 Interbank payment systems
    12.3.1 A telegraphic history of E-commerce
    12.3.2 SWIFT
    12.3.3 What goes wrong
  12.4 Automatic teller machines
    12.4.1 ATM basics
    12.4.2 What goes wrong
    12.4.3 Incentives and injustices
  12.5 Credit cards
    12.5.1 Credit card fraud
    12.5.2 Online card fraud
    12.5.3 3DS
    12.5.4 Fraud engines
  12.6 EMV payment cards
    12.6.1 Chip cards
      12.6.1.1 Static data authentication
      12.6.1.2 ICVVs, DDA and CDA
      12.6.1.3 The No-PIN attack
    12.6.2 The preplay attack
    12.6.3 Contactless
  12.7 Online banking
    12.7.1 Phishing
    12.7.2 CAP
    12.7.3 Banking malware
    12.7.4 Phones as second factors
    12.7.5 Liability
    12.7.6 Authorised push payment fraud
  12.8 Nonbank payments
    12.8.1 M-Pesa
    12.8.2 Other phone payment systems
    12.8.3 Sofort, and open banking
  12.9 Summary
  Research problems
  Further reading

Chapter 13  Locks and Alarms
  13.1 Introduction
  13.2 Threats and barriers
    13.2.1 Threat model
    13.2.2 Deterrence
    13.2.3 Walls and barriers
    13.2.4 Mechanical locks
    13.2.5 Electronic locks
  13.3 Alarms
    13.3.1 How not to protect a painting
    13.3.2 Sensor defeats
    13.3.3 Feature interactions
    13.3.4 Attacks on communications
    13.3.5 Lessons learned
  13.4 Summary
  Research problems
  Further reading

Chapter 14  Monitoring and Metering
  14.1 Introduction
  14.2 Prepayment tokens
    14.2.1 Utility metering
    14.2.2 How the STS system works
    14.2.3 What goes wrong
    14.2.4 Smart meters and smart grids
    14.2.5 Ticketing fraud
  14.3 Taxi meters, tachographs and truck speed limiters
    14.3.1 The tachograph
    14.3.2 What goes wrong
      14.3.2.1 How most tachograph manipulation is done
      14.3.2.2 Tampering with the supply
      14.3.2.3 Tampering with the instrument
      14.3.2.4 High-tech attacks
    14.3.3 Digital tachographs
      14.3.3.1 System-level problems
      14.3.3.2 Other problems
    14.3.4 Sensor defeats and third-generation devices
    14.3.5 The fourth generation – smart tachographs
  14.4 Curfew tags: GPS as policeman
  14.5 Postage meters
  14.6 Summary
  Research problems
  Further reading

Chapter 15  Nuclear Command and Control
  15.1 Introduction
  15.2 The evolution of command and control
    15.2.1 The Kennedy memorandum
    15.2.2 Authorization, environment, intent
  15.3 Unconditionally secure authentication
  15.4 Shared control schemes
  15.5 Tamper resistance and PALs
  15.6 Treaty verification
  15.7 What goes wrong
    15.7.1 Nuclear accidents
    15.7.2 Interaction with cyberwar
    15.7.3 Technical failures
  15.8 Secrecy or openness?
  15.9 Summary
  Research problems
  Further reading

Chapter 16  Security Printing and Seals
  16.1 Introduction
  16.2 History
  16.3 Security printing
    16.3.1 Threat model
    16.3.2 Security printing techniques
  16.4 Packaging and seals
    16.4.1 Substrate properties
    16.4.2 The problems of glue
    16.4.3 PIN mailers
  16.5 Systemic vulnerabilities
    16.5.1 Peculiarities of the threat model
    16.5.2 Anti-gundecking measures
    16.5.3 The effect of random failure
    16.5.4 Materials control
    16.5.5 Not protecting the right things
    16.5.6 The cost and nature of inspection
  16.6 Evaluation methodology
  16.7 Summary
  Research problems
  Further reading

Chapter 17  Biometrics
  17.1 Introduction
  17.2 Handwritten signatures
  17.3 Face recognition
  17.4 Fingerprints
    17.4.1 Verifying positive or negative identity claims
    17.4.2 Crime scene forensics
  17.5 Iris codes
  17.6 Voice recognition and morphing
  17.7 Other systems
  17.8 What goes wrong
  17.9 Summary
  Research problems
  Further reading

Chapter 18  Tamper Resistance
  18.1 Introduction
  18.2 History
  18.3 Hardware security modules
  18.4 Evaluation
  18.5 Smartcards and other security chips
    18.5.1 History
    18.5.2 Architecture
    18.5.3 Security evolution
    18.5.4 Random number generators and PUFs
    18.5.5 Larger chips
    18.5.6 The state of the art
  18.6 The residual risk
    18.6.1 The trusted interface problem
    18.6.2 Conflicts
    18.6.3 The lemons market, risk dumping and evaluation games
    18.6.4 Security-by-obscurity
    18.6.5 Changing environments
  18.7 So what should one protect?
  18.8 Summary
  Research problems
  Further reading

Chapter 19  Side Channels
  19.1 Introduction
  19.2 Emission security
    19.2.1 History
    19.2.2 Technical surveillance and countermeasures
  19.3 Passive attacks
    19.3.1 Leakage through power and signal cables
    19.3.2 Leakage through RF signals
    19.3.3 What goes wrong
  19.4 Attacks between and within computers
    19.4.1 Timing analysis
    19.4.2 Power analysis
    19.4.3 Glitching and differential fault analysis
    19.4.4 Rowhammer, CLKscrew and Plundervolt
    19.4.5 Meltdown, Spectre and other enclave side channels
  19.5 Environmental side channels
    19.5.1 Acoustic side channels
    19.5.2 Optical side channels
    19.5.3 Other side-channels
  19.6 Social side channels
  19.7 Summary
  Research problems
  Further reading

Chapter 20  Advanced Cryptographic Engineering
  20.1 Introduction
  20.2 Full-disk encryption
  20.3 Signal
  20.4 Tor
  20.5 HSMs
    20.5.1 The xor-to-null-key attack
    20.5.2 Attacks using backwards compatibility and time-memory tradeoffs
    20.5.3 Differential protocol attacks
    20.5.4 The EMV attack
    20.5.5 Hacking the HSMs in CAs and clouds
    20.5.6 Managing HSM risks
  20.6 Enclaves
  20.7 Blockchains
    20.7.1 Wallets
    20.7.2 Miners
    20.7.3 Smart contracts
    20.7.4 Off-chain payment mechanisms
    20.7.5 Exchanges, cryptocrime and regulation
    20.7.6 Permissioned blockchains
  20.8 Crypto dreams that failed
  20.9 Summary
  Research problems
  Further reading

Chapter 21  Network Attack and Defence
  21.1 Introduction
  21.2 Network protocols and service denial
    21.2.1 BGP security
    21.2.2 DNS security
    21.2.3 UDP, TCP, SYN floods and SYN reflection
    21.2.4 Other amplifiers
    21.2.5 Other denial-of-service attacks
    21.2.6 Email – from spies to spammers
  21.3 The malware menagerie – Trojans, worms and RATs
    21.3.1 Early history of malware
    21.3.2 The Internet worm
    21.3.3 Further malware evolution
    21.3.4 How malware works
    21.3.5 Countermeasures
  21.4 Defense against network attack
    21.4.1 Filtering: firewalls, censorware and wiretaps
      21.4.1.1 Packet filtering
      21.4.1.2 Circuit gateways
      21.4.1.3 Application proxies
      21.4.1.4 Ingress versus egress filtering
      21.4.1.5 Architecture
    21.4.2 Intrusion detection
      21.4.2.1 Types of intrusion detection
      21.4.2.2 General limitations of intrusion detection
      21.4.2.3 Specific problems detecting network attacks
  21.5 Cryptography: the ragged boundary
    21.5.1 SSH
    21.5.2 Wireless networking at the periphery
      21.5.2.1 WiFi
      21.5.2.2 Bluetooth
      21.5.2.3 HomePlug
      21.5.2.4 VPNs
  21.6 CAs and PKI
  21.7 Topology
  21.8 Summary
  Research problems
  Further reading

Chapter 22  Phones
  22.1 Introduction
  22.2 Attacks on phone networks
    22.2.1 Attacks on phone-call metering
    22.2.2 Attacks on signaling
    22.2.3 Attacks on switching and configuration
    22.2.4 Insecure end systems
    22.2.5 Feature interaction
    22.2.6 VOIP
    22.2.7 Frauds by phone companies
    22.2.8 Security economics of telecomms
  22.3 Going mobile
    22.3.1 GSM
    22.3.2 3G
    22.3.3 4G
    22.3.4 5G and beyond
    22.3.5 General MNO failings
  22.4 Platform security
    22.4.1 The Android app ecosystem
      22.4.1.1 App markets and developers
      22.4.1.2 Bad Android implementations
      22.4.1.3 Permissions
      22.4.1.4 Android malware
      22.4.1.5 Ads and third-party services
      22.4.1.6 Pre-installed apps
    22.4.2 Apple’s app ecosystem
    22.4.3 Cross-cutting issues
  22.5 Summary
  Research problems
  Further reading

Chapter 23  Electronic and Information Warfare
  23.1 Introduction
  23.2 Basics
  23.3 Communications systems
    23.3.1 Signals intelligence techniques
    23.3.2 Attacks on communications
    23.3.3 Protection techniques
      23.3.3.1 Frequency hopping
      23.3.3.2 DSSS
      23.3.3.3 Burst communications
      23.3.3.4 Combining covertness and jam resistance
    23.3.4 Interaction between civil and military uses
  23.4 Surveillance and target acquisition
    23.4.1 Types of radar
    23.4.2 Jamming techniques
    23.4.3 Advanced radars and countermeasures
    23.4.4 Other sensors and multisensor issues
  23.5 IFF systems
  23.6 Improvised explosive devices
  23.7 Directed energy weapons
  23.8 Information warfare
    23.8.1 Attacks on control systems
    23.8.2 Attacks on other infrastructure
    23.8.3 Attacks on elections and political stability
    23.8.4 Doctrine
  23.9 Summary
  Research problems
  Further reading

Chapter 24  Copyright and DRM
  24.1 Introduction
  24.2 Copyright
    24.2.1 Software
    24.2.2 Free software, free culture?
    24.2.3 Books and music
    24.2.4 Video and pay-TV
      24.2.4.1 Typical system architecture
      24.2.4.2 Video scrambling techniques
      24.2.4.3 Attacks on hybrid scrambling systems
      24.2.4.4 DVB
    24.2.5 DVD
  24.3 DRM on general-purpose computers
    24.3.1 Windows media rights management
    24.3.2 FairPlay, HTML5 and other DRM systems
    24.3.3 Software obfuscation
    24.3.4 Gaming, cheating, and DRM
    24.3.5 Peer-to-peer systems
    24.3.6 Managing hardware design rights
  24.4 Information hiding
    24.4.1 Watermarks and copy generation management
    24.4.2 General information hiding techniques
    24.4.3 Attacks on copyright marking schemes
  24.5 Policy
    24.5.1 The IP lobby
    24.5.2 Who benefits?
  24.6 Accessory control
  24.7 Summary
  Research problems
  Further reading

Chapter 25  New Directions?
  25.1 Introduction
  25.2 Autonomous and remotely-piloted vehicles
    25.2.1 Drones
    25.2.2 Self-driving cars
    25.2.3 The levels and limits of automation
    25.2.4 How to hack a self-driving car
  25.3 AI / ML
    25.3.1 ML and security
    25.3.2 Attacks on ML systems
    25.3.3 ML and society
  25.4 PETS and operational security
    25.4.1 Anonymous messaging devices
    25.4.2 Social support
    25.4.3 Living off the land
    25.4.4 Putting it all together
    25.4.5 The name’s Bond. James Bond
  25.5 Elections
    25.5.1 The history of voting machines
    25.5.2 Hanging chads
    25.5.3 Optical scan
    25.5.4 Software independence
    25.5.5 Why electronic elections are hard
  25.6 Summary
  Research problems
  Further reading

Part III

Chapter 26  Surveillance or Privacy?
  26.1 Introduction
  26.2 Surveillance
    26.2.1 The history of government wiretapping
    26.2.2 Call data records (CDRs)
    26.2.3 Search terms and location data
    26.2.4 Algorithmic processing
    26.2.5 ISPs and CSPs
    26.2.6 The Five Eyes’ system of systems
    26.2.7 The crypto wars
      26.2.7.1 The back story to crypto policy
      26.2.7.2 DES and crypto research
      26.2.7.3 Crypto War 1 – the Clipper chip
      26.2.7.4 Crypto War 2 – going spotty
    26.2.8 Export control
  26.3 Terrorism
    26.3.1 Causes of political violence
    26.3.2 The psychology of political violence
    26.3.3 The role of institutions
    26.3.4 The democratic response
  26.4 Censorship
    26.4.1 Censorship by authoritarian regimes
    26.4.2 Filtering, hate speech and radicalisation
  26.5 Forensics and rules of evidence
    26.5.1 Forensics
    26.5.2 Admissibility of evidence
    26.5.3 What goes wrong
  26.6 Privacy and data protection
    26.6.1 European data protection
    26.6.2 Privacy regulation in the USA
    26.6.3 Fragmentation?
  26.7 Freedom of information
  26.8 Summary
  Research problems
  Further reading

Chapter 27  Secure Systems Development
  27.1 Introduction
  27.2 Risk management
  27.3 Lessons from safety-critical systems
    27.3.1 Safety engineering methodologies
    27.3.2 Hazard analysis
    27.3.3 Fault trees and threat trees
    27.3.4 Failure modes and effects analysis
    27.3.5 Threat modelling
    27.3.6 Quantifying risks
  27.4 Prioritising protection goals
  27.5 Methodology
    27.5.1 Top-down design
    27.5.2 Iterative design: from spiral to agile
    27.5.3 The secure development lifecycle
    27.5.4 Gated development
    27.5.5 Software as a Service
    27.5.6 From DevOps to DevSecOps
      27.5.6.1 The Azure ecosystem
      27.5.6.2 The Google ecosystem
      27.5.6.3 Creating a learning system
    27.5.7 The vulnerability cycle
      27.5.7.1 The CVE system
      27.5.7.2 Coordinated disclosure
      27.5.7.3 Security incident and event management
    27.5.8 Organizational mismanagement of risk
  27.6 Managing the team
    27.6.1 Elite engineers
    27.6.2 Diversity
    27.6.3 Nurturing skills and attitudes
    27.6.4 Emergent properties
    27.6.5 Evolving your workflow
    27.6.6 And finally …
  27.7 Summary
  Research problems
  Further reading

Chapter 28  Assurance and Sustainability
  28.1 Introduction
  28.2 Evaluation
    28.2.1 Alarms and locks
    28.2.2 Safety evaluation regimes
    28.2.3 Medical device safety
    28.2.4 Aviation safety
    28.2.5 The Orange book
    28.2.6 FIPS 140 and HSMs
    28.2.7 The common criteria
      28.2.7.1 The gory details
      28.2.7.2 What goes wrong with the Common Criteria
      28.2.7.3 Collaborative protection profiles
    28.2.8 The ‘Principle of Maximum Complacency’
    28.2.9 Next steps
  28.3 Metrics and dynamics of dependability
    28.3.1 Reliability growth models
    28.3.2 Hostile review
    28.3.3 Free and open-source software
    28.3.4 Process assurance
  28.4 The entanglement of safety and security
    28.4.1 The electronic safety and security of cars
    28.4.2 Modernising safety and security regulation
    28.4.3 The Cybersecurity Act 2019
  28.5 Sustainability
    28.5.1 The Sales of goods directive
    28.5.2 New research directions
  28.6 Summary
  Research problems
  Further reading

Chapter 29  Beyond “Computer Says No”

Bibliography
Index
Preface to the Third Edition
The first edition of Security Engineering was published in 2001 and the second
in 2008. Since then there have been huge changes.
The most obvious is that the smartphone has displaced the PC and laptop.
Most of the world’s population now walk around with a computer that’s also
a phone, a camera and a satnav; and the apps that run on these magic devices
have displaced many of the things we were building ten years ago. Taxi rides
are now charged by ride-hailing apps rather than by taxi meters. Banking has
largely gone online, with phones starting to displace credit cards. Energy saving is no longer about your meter talking to your heating system but about
both talking to your phone. Social networking has taken over many people’s
lives, driving everything from advertising to politics.
A related but less visible change is the move to large server farms. Sensitive data have moved from servers in schools, doctors’ offices and law firms to
cloud service providers. Many people no longer do their writing on word processing software on their laptop but on Google Docs or Office365 (I’m writing
this book on Overleaf). This has consequences. Security breaches can happen
at a scale no-one would have imagined twenty years ago. Compromises of
tens of millions of passwords, or credit cards, have become almost routine.
And in 2013, we discovered that fifteen years’ worth of UK hospital medical
records had been sold to 1200 organisations worldwide without the consent of
the patients (who were still identifiable via their postcodes and dates of birth).
A real game-changer of the last decade was the Snowden revelations, also
in 2013, when over 50,000 Top Secret documents about the NSA’s signals
intelligence activities were leaked to the press. The scale and intrusiveness
of government surveillance surprised even cynical security engineers. It
followed on from Stuxnet, where America attacked Iran’s nuclear weapons
program using malware, and was followed by NotPetya, where a Russian
cyberweapon, deployed against the Ukraine, inflicted hundreds of millions
of dollars’ worth of collateral damage on firms elsewhere. This brings us to
the third big change, which is a much better understanding of nation-state
security threats. In addition to understanding the capabilities and priorities
of western intelligence agencies, we have a reasonably good idea of what the
Chinese, the Russians and even the Syrians get up to.
And where the money is, the crooks follow too. The last decade has also seen
the emergence of a cyber-crime ecosystem, with malware writers providing
the tools to subvert millions of machines, many of which are used as criminal
infrastructure while others are subverted in various ways into defrauding their
users. We have a team at Cambridge that studies this, and so do dozens of
other research groups worldwide. The rise of cybercrime is changing policing,
and other state activity too: cryptocurrencies are not just making it easier to
write ransomware, but undermining financial regulation. And then there are
non-financial threats from cyber-bullying up through hate speech to election
manipulation and videos of rape and murder.
So online harms now engage all sorts of people from teachers and the police
to banks and the military. It is ever more important to measure the costs of these
harms, and the effectiveness of the measures we deploy to mitigate them.
Some of the changes would have really surprised someone who read my book
ten years ago and then spent a decade in solitary confinement. For example,
the multilevel security industry is moribund, despite being the beneficiary of
billions of dollars of US government funding over forty years; the Pentagon’s
entire information security philosophy – of mandating architectures to stop
information flowing downward from Top Secret to Secret to Confidential to
Unclassified – has been abandoned as unworkable. While architecture still matters, the emphasis has shifted to ecosystems. Given that bugs are ubiquitous
and exploits inevitable, we had better be good at detecting exploits, fixing bugs
and recovering from attacks. The game is no longer trusted systems but coordinated disclosure, DevSecOps and resilience.
What might the future hold? A likely game-changer is that as we put software
into safety-critical systems like cars and medical devices, and connect them to
the Internet, safety and security engineering are converging. This is leading to
real strains; while security engineers fix bugs quickly, safety engineers like to
test systems rigorously against standards that change slowly if at all. A wicked
problem is how we will patch durable goods. At present, you might get security
patches for your phone for three years and your laptop for five; you’re expected
to buy a new one after that. But cars last for fifteen years on average and if
we’re suddenly asked to scrap them after five the environmental costs won’t
be acceptable. So tell me, if you’re writing navigation software today in 2020
for a car that will launch in 2023, how will you ensure that you can keep on
shipping security patches in 2033, 2043 and 2053? What tools will you choose
today?
Finally, there has been a sea change in the political environment. After
decades in which political leaders considered technology policy to be for
men in anoraks, and generally took the line of least resistance, the reports
of Russian interference in the Brexit referendum and the Trump election got
their attention. The prospect of losing your job can concentrate the mind
wonderfully. The close attention of lawmakers is changing the game, first with
tighter general rules such as Europe’s General Data Protection Regulation; and
second as products that are already regulated for safety, from cars and railway
signals to children’s toys, acquire software and online connectivity, which has
led to rules in Europe about how long software has to be maintained.
The questions the security engineer has to ask today are just the same as a
decade ago: what are we seeking to prevent, and will the proposed mechanisms
actually work? However, the canvas on which we work is now much broader.
Almost all human life is there.
Ross Anderson
Cambridge, October 2020
Preface to the Second Edition
The first edition of Security Engineering was published in May 2001. Since then
the world has changed.
System security was one of Microsoft’s lowest priorities then; it’s now one
of the highest. The volume of malware continues to increase along with the
nuisance that it causes. Although a lot of effort has gone into defence – we have
seen Windows NT replaced by XP and then Vista, and occasional service packs
replaced by monthly security patches – the effort put into attacks has increased
far more. People who write viruses no longer do so for fun, but for profit; the
last few years have seen the emergence of a criminal economy that supports
diverse specialists. Spammers, virus writers, phishermen, money launderers
and spies trade busily with each other.
Cryptography has also moved on. The Advanced Encryption Standard is
being embedded into more and more products, and we have some interesting
developments on the public-key side of things too. But just as our algorithm
problems get solved, so we face a host of implementation issues. Side channels,
poorly designed APIs and protocol failures continue to break systems. Applied
cryptography is harder than ever to do well.
Pervasive computing also opens up new challenges. As computers and communications become embedded invisibly everywhere, so problems that used
to only afflict ‘proper computers’ crop up in all sorts of other devices too. What
does it mean for a thermometer to be secure, or an air-conditioner?
The great diversity of intelligent devices brings with it a great diversity
of interests and actors. Security is not just about keeping the bad guys out,
but increasingly concerned with tussles for power and control. DRM pits the
content and platform industries against consumers, and against each other;
accessory control is used to tie printers to their vendors’ cartridges, but leads
to antitrust lawsuits and government intervention. Security also interacts with
safety in applications from cars through utilities to electronic healthcare. The
security engineer needs to understand not just crypto and operating systems,
but economics and human factors as well.
And the ubiquity of digital devices means that ‘computer security’ is no
longer just a problem for a few systems specialists. Almost all white-collar
crime (and much crime of the serious violent sort) now involves computers
or mobile phones, so a detective needs to understand computer forensics just
as she needs to know how to drive. More and more lawyers, accountants,
managers and other people with no formal engineering training are going to
have to understand system security in order to do their jobs well.
The rapid growth of online services, from Google and Facebook to massively
multiplayer games, has also changed the world. Bugs in online applications
can be fixed rapidly once they’re noticed, but the applications get ever more
complex and their side-effects harder to predict. We may have a reasonably
good idea what it means for an operating system or even a banking service to
be secure, but we can’t make any such claims for online lifestyles that evolve
all the time. We’re entering a novel world of evolving socio-technical systems,
and that raises profound questions about how the evolution is driven and who
is in control.
The largest changes, however, may be those driven by the tragic events of
September 2001 and by our reaction to them. These have altered perceptions
and priorities in many ways, and changed the shape of the security industry.
Terrorism is not just about risk, but about the perception of risk, and about
the manipulation of perception. This adds psychology and politics to the mix.
Security engineers also have a duty to contribute to the political debate. Where
inappropriate reactions to terrorist crimes have led to major waste of resources
and unforced policy errors, we have to keep on educating people to ask a few
simple questions: what are we seeking to prevent, and will the proposed mechanisms actually work?
Ross Anderson
Cambridge, January 2008
Preface to the First Edition
For generations, people have defined and protected their property and their
privacy using locks, fences, signatures, seals, account books, and meters. These
have been supported by a host of social constructs ranging from international
treaties through national laws to manners and customs.
This is changing, and quickly. Most records are now electronic, from bank
accounts to registers of real property; and transactions are increasingly electronic, as shopping moves to the Internet. Just as important, but less obvious,
are the many everyday systems that have been quietly automated. Burglar
alarms no longer wake up the neighborhood, but send silent messages to the
police; students no longer fill their dormitory washers and dryers with coins,
but credit them using a smartcard they recharge at the college bookstore; locks
are no longer simple mechanical affairs, but are operated by electronic remote
controls or swipe cards; and instead of renting videocassettes, millions of people get their movies from satellite or cable channels. Even the humble banknote
is no longer just ink on paper, but may contain digital watermarks that enable
many forgeries to be detected by machine.
How good is all this new security technology? Unfortunately, the honest
answer is ‘nowhere near as good as it should be.’ New systems are often
rapidly broken, and the same elementary mistakes are repeated in one application after another. It often takes four or five attempts to get a security design
right, and that is far too many.
The media regularly report security breaches on the Internet; banks fight
their customers over ‘phantom withdrawals’ from cash machines; VISA
reports huge increases in the number of disputed Internet credit card transactions; satellite TV companies hound pirates who copy their smartcards; and
law enforcement agencies try to stake out territory in cyberspace with laws
controlling the use of encryption. Worse still, features interact. A mobile phone
that calls the last number again if one of the keys is pressed by accident may
be just a minor nuisance – until someone invents a machine that dispenses
a can of soft drink every time its phone number is called. When all of a
sudden you find 50 cans of Coke on your phone bill, who is responsible, the
phone company, the handset manufacturer, or the vending machine operator?
Once almost every electronic device that affects your life is connected to the
Internet – which Microsoft expects to happen by 2010 – what does ‘Internet
security’ mean to you, and how do you cope with it?
As well as the systems that fail, many systems just don’t work well enough.
Medical record systems don’t let doctors share personal health information
as they would like, but still don’t protect it against inquisitive private eyes.
Zillion-dollar military systems prevent anyone without a “top secret” clearance from getting at intelligence data, but are often designed so that almost
everyone needs this clearance to do any work. Passenger ticket systems are
designed to prevent customers cheating, but when trustbusters break up the
railroad, they cannot stop the new rail companies cheating each other. Many
of these failures could have been foreseen if designers had just a little bit more
knowledge of what had been tried, and had failed, elsewhere.
Security engineering is the new discipline that is starting to emerge out of all
this chaos.
Although most of the underlying technologies (cryptology, software reliability, tamper resistance, security printing, auditing, etc.) are relatively well
understood, the knowledge and experience of how to apply them effectively
is much scarcer. And since the move from mechanical to digital mechanisms
is happening everywhere at once, there just has not been time for the lessons
learned to percolate through the engineering community. Time and again, we
see the same old square wheels being reinvented.
The industries that have managed the transition most capably are often
those that have been able to borrow an appropriate technology from another
discipline. Examples include the reuse of technology designed for military
identify-friend-or-foe equipment in bank cash machines and even prepayment gas meters. So even if a security designer has serious expertise in some
particular speciality – whether as a mathematician working with ciphers or a
chemist developing banknote inks – it is still prudent to have an overview of the
whole subject. The essence of good security engineering is understanding the
potential threats to a system, then applying an appropriate mix of protective
measures – both technological and organizational – to control them. Knowing
what has worked, and more importantly what has failed, in other applications
is a great help in developing judgment. It can also save a lot of money.
The purpose of this book is to give a solid introduction to security engineering, as we understand it at the beginning of the twenty-first century. My goal
is that it works at four different levels:
1. as a textbook that you can read from one end to the other over a few
days as an introduction to the subject. The book is to be used mainly
by the working IT professional who needs to learn about the subject,
but it can also be used in a one-semester course in a university;
2. as a reference book to which you can come for an overview of the
workings of some particular type of system (such as cash machines,
taxi meters, radar jammers, anonymous medical record databases or
whatever);
3. as an introduction to the underlying technologies, such as crypto,
access control, inference control, tamper resistance, and seals. Space
prevents me from going into great depth; but I provide a basic road
map for each subject, plus a reading list for the curious (and a list
of open research problems for the prospective graduate student);
4. as an original scientific contribution in which I have tried to draw
out the common principles that underlie security engineering, and
the lessons that people building one kind of system should have
learned from others. In the many years I have been working in
security, I keep coming across these. For example, a simple attack
on stream ciphers wasn’t known to the people who designed a
common anti-aircraft fire control radar so it was easy to jam; while
a trick well known to the radar community wasn’t understood by
banknote printers and people who design copyright marking schemes,
which led to a quite general attack on most digital watermarks.
I have tried to keep this book resolutely mid-Atlantic. A security engineering book has to be, as many of the fundamental technologies are American,
while many of the interesting applications are European. (This isn’t surprising
given the better funding of US universities and research labs, and the greater
diversity of nations and markets in Europe.) What’s more, many of the successful European innovations – from the smartcard to the GSM mobile phone
to the pay-per-view TV service – have crossed the Atlantic and now thrive in
the Americas. Both the science, and the case studies, are necessary.
This book grew out of the security engineering courses I teach at Cambridge
University, but I have rewritten my notes to make them self-contained and
added at least as much material again. It should be useful to the established
professional security manager or consultant as a first-line reference; to the
computer science professor doing research in cryptology; to the working
police detective trying to figure out the latest computer scam; and to policy
wonks struggling with the conflicts involved in regulating cryptography and
anonymity. Above all, it is aimed at Dilbert. My main audience is the working
programmer or engineer who is trying to design real systems that will keep on
working despite the best efforts of customers, managers, and everybody else.
This book is divided into three parts.
The first looks at basic concepts, starting with the central concept
of a security protocol, and going on to the human-computer interface, access controls, cryptology and distributed system issues.
It does not assume any particular technical background other
than basic computer literacy. It is based on an ‘Introduction to
Security’ course which we teach to second year undergraduates.
The second part looks in much more detail at a number of important
applications such as military communications, medical record systems, cash machines, mobile phones and pay-TV. These are used to
introduce more of the advanced technologies and concepts. It also
considers information security from the viewpoint of a number of
different interest groups such as companies, consumers, criminals,
the police and spies. This material is drawn from my senior course
on security, from research work, and from experience consulting.
The third part looks at the organizational and policy issues: how
computer security interacts with law, with evidence, and with corporate
politics; how we can gain confidence that a system will perform as
intended; and how the whole business of security engineering can best
be managed.
I believe that building systems which continue to perform robustly in the face
of malice is one of the most important, interesting, and difficult tasks facing
engineers in the twenty-first century.
Ross Anderson
Cambridge, January 2001
For my daughter,
and other lawyers …
The tricks taught in this book are intended only to enable you to build better
systems. They are not in any way given as a means of helping you to break into
systems or do anything else illegal. So where possible I have tried to give case
histories at a level of detail that illustrates the underlying principles without
giving a ‘hacker’s cookbook’.
Governments fought to restrict knowledge of cryptography until the turn
of the century, and there may still be people who believe that the knowledge
contained in this book should not be published.
Their fears were answered in the first book in English that discussed cryptology, a 1641 treatise on optical and acoustic telegraphy written by Oliver
Cromwell’s cryptographer and brother-in-law John Wilkins [2025]. He traced
scientific censorship back to the Egyptian priests who forbade the use of
alphabetic writing on the grounds that it would spread literacy among the
common people and thus foster dissent. As he said:
‘It will not follow that everything must be suppresst which may be abused … If
all those useful inventions that are liable to abuse should therefore be concealed
there is not any Art or Science which may be lawfully profest.’
The question was raised again in the nineteenth century, when some
well-meaning people wanted to ban books on locksmithing. In 1853, a
contemporary writer replied [1899]:
‘Many well-meaning persons suppose that the discussion respecting the means
for baffling the supposed safety of locks offers a premium for dishonesty, by showing others how to be dishonest. This is a fallacy. Rogues are very keen in their
profession, and already know much more than we can teach them respecting their
several kinds of roguery. Rogues knew a good deal about lockpicking long before
locksmiths discussed it among themselves … if there be harm, it will be much
more than counterbalanced by good.’
Thirty years later, in the first book on cryptographic engineering, Auguste
Kerckhoffs explained that you must always assume that the other side knows
the system, so security must reside in the choice of a key.
His wisdom has been borne out by long experience since. The relative
benefits of ‘Open’ versus ‘Closed’ security systems have also been studied
by researchers applying the tools of dependability analysis and security
economics. We discuss their findings in this book.
In short, while some bad guys will benefit from a book such as this, they
mostly know it already – and the good guys benefit much more.
Ross Anderson
Cambridge, November 2020
Foreword
In a paper he wrote with Roger Needham, Ross Anderson coined the phrase
‘programming Satan’s computer’ to describe the problems faced by computer-security engineers. It’s the sort of evocative image I’ve come to expect from
Ross, and a phrase I’ve used ever since.
Programming a computer is straightforward: keep hammering away at the
problem until the computer does what it’s supposed to do. Large application
programs and operating systems are a lot more complicated, but the methodology is basically the same. Writing a reliable computer program is much harder,
because the program needs to work even in the face of random errors and
mistakes: Murphy’s computer, if you will. Significant research has gone into
reliable software design, and there are many mission-critical software applications that are designed to withstand Murphy’s Law.
Writing a secure computer program is another matter entirely. Security
involves making sure things work, not in the presence of random faults, but in
the face of an intelligent and malicious adversary trying to ensure that things
fail in the worst possible way at the worst possible time … again and again.
It truly is programming Satan’s computer.
Security engineering is different from any other kind of programming. It’s a
point I made over and over again: in my own book, Secrets and Lies, in my
monthly newsletter Crypto-Gram, and in my other writings. And it’s a point
Ross makes in every chapter of this book. This is why, if you’re doing any security engineering … if you’re even thinking of doing any security engineering,
you need to read this book. It’s the first, and only, end-to-end modern security
design and engineering book ever written.
And it comes just in time. You can divide the history of the Internet into three
waves. The first wave centered around mainframes and terminals. Computers
were expensive and rare. The second wave, from about 1992 until now,
centered around personal computers, browsers, and large application programs. And the third, starting now, will see the connection of all sorts of devices
that are currently in proprietary networks, standalone, and non-computerized.
By 2003, there will be more mobile phones connected to the Internet than
computers. Within a few years we’ll see many of the world’s refrigerators,
heart monitors, bus and train ticket dispensers, burglar alarms, and electricity
meters talking IP. Personal computers will be a minority player on the Internet.
Security engineering, especially in this third wave, requires you to think
differently. You need to figure out not how something works, but how
something can be made to not work. You have to imagine an intelligent
and malicious adversary inside your system (remember Satan’s computer),
constantly trying new ways to subvert it. You have to consider all the ways
your system can fail, most of them having nothing to do with the design itself.
You have to look at everything backwards, upside down, and sideways. You
have to think like an alien.
As the late great science fiction editor John W. Campbell said: “An alien
thinks as well as a human, but not like a human.” Computer security is a lot
like that. Ross is one of those rare people who can think like an alien, and then
explain that thinking to humans. Have fun reading.
Bruce Schneier
January 2001
PART I
In the first section of the book, I cover the basics.
The first chapter sets out to clarify concepts and
terminology by describing the secure distributed
systems commonly found in four environments: a
bank, an air force base, a hospital, and the home. The
second chapter then plunges into the thick of things by
describing the threat actors and how they operate. We
look at state actors such as the US, Chinese and Russian
intelligence communities, about which we now know
quite a lot thanks to disclosures by Ed Snowden and
others; we describe the cybercrime ecosystem, which
we’ve been studying for some years now; and we also
describe non-financial abuses from cyber-bullying and
intimate partner abuse up to election manipulation
and political radicalisation. This teaches that a wide
range of attackers use similar techniques, not just at
the technical level but increasingly to deceive and
manipulate people.
In the third chapter we therefore turn to psychology.
Phishing is a key technique for both online crime and
national intelligence gathering; usability failures are
exploited all the time, and are really important for
safety as well as security. One of the most fruitful areas
of security research in recent years has therefore been
psychology. Security engineers need to understand
how people can be deceived, so we can design systems
that make deception harder. We also need to understand how risk perceptions and realities have drifted
ever further apart.
The following chapters dig deeper into the technical meat. The fourth chapter
is on security protocols, which specify how the players in a system – whether
people, computers, phones or other electronic devices – establish and maintain
trust. The fifth is on the ‘duct tape’ that underlies most of the protocols and
holds distributed systems together: cryptography. This is the art (and science)
of codes and ciphers; but it is much more than a clever means for keeping messages secret from an eavesdropper. Nowadays its job is taking trust from where
it exists to where it’s needed, maintaining the integrity of security contexts, and
much more besides.
The sixth chapter is on access control: how can we keep apart the different
apps on a phone, or the different virtual machines or containers on a server,
and how can we control the data flows we want to permit between them.
Sometimes this can be done cleanly, but often it’s hard; web browsers deal with
JavaScript code from multiple untrustworthy websites, while home assistants
have to deal with multiple people.
The next chapter is on distributed systems. Systems that run on multiple
devices have to deal with coordination problems such as concurrency control,
fault tolerance, and naming. These take on subtle new meanings when systems must be made resilient against malice as well as against accidental failure.
Many systems perform poorly or even fail because their designers don’t think
through these issues.
The final chapter in this part is on economics. Security economics has grown
hugely since this book first appeared in 2001 and helped to launch it as a subject. We now know that many security failures are due to perverse incentives
rather than to deficient technical protection mechanisms. (Indeed, the former
often explain the latter.) The dependability of a system is increasingly an emergent property that depends on the self-interested striving of large numbers of
players; in effect it’s an equilibrium in a market. Security mechanisms are not
just used to keep ‘bad’ people out of ‘good’ systems, but to enable one principal to exert power over another; they are often abused to capture or distort
markets. If we want to understand such plays, or to design systems that resist
strategic manipulation, we need some game theory and auction theory.
These chapters cover basic material, and largely follow what we teach
first-year and second-year undergraduates at Cambridge. But I hope that even
experts will find the case studies of interest and value.
CHAPTER 1
What Is Security Engineering?
Out of the crooked timber of humanity, no straight thing was ever made.
– IMMANUEL KANT
The world is never going to be perfect, either on- or offline; so let’s not set impossibly high
standards for online.
– ESTHER DYSON
1.1 Introduction
Security engineering is about building systems to remain dependable in the
face of malice, error, or mischance. As a discipline, it focuses on the tools, processes, and methods needed to design, implement, and test complete systems,
and to adapt existing systems as their environment evolves.
Security engineering requires cross-disciplinary expertise, ranging from
cryptography and computer security through hardware tamper-resistance to
a knowledge of economics, applied psychology, organisations and the law.
System engineering skills, from business process analysis through software
engineering to evaluation and testing, are also important; but they are not
sufficient, as they deal only with error and mischance rather than malice. The
security engineer also needs some skill at adversarial thinking, just like a chess
player; you need to have studied lots of attacks that worked in the past, from
their openings through their development to the outcomes.
Many systems have critical assurance requirements. Their failure may
endanger human life and the environment (as with nuclear safety and control
systems), do serious damage to major economic infrastructure (cash machines
and online payment systems), endanger personal privacy (medical record
systems), undermine the viability of whole business sectors (prepayment
utility meters), and facilitate crime (burglar and car alarms). Security and
safety are becoming ever more intertwined as we get software in everything.
Even the perception that a system is more vulnerable or less reliable than it
really is can have real social costs.
The conventional view is that while software engineering is about ensuring
that certain things happen (“John can read this file”), security is about ensuring that they don’t (“The Chinese government can’t read this file”). Reality is
much more complex. Security requirements differ greatly from one system to
another. You typically need some combination of user authentication, transaction integrity and accountability, fault-tolerance, message secrecy, and covertness. But many systems fail because their designers protect the wrong things,
or protect the right things but in the wrong way.
Getting protection right thus depends on several different types of process.
You have to figure out what needs protecting, and how to do it. You also need to
ensure that the people who will guard the system and maintain it are properly
motivated. In the next section, I’ll set out a framework for thinking about this.
Then, in order to illustrate the range of different things that security and safety
systems have to do, I will take a quick look at four application areas: a bank, a
military base, a hospital, and the home. Once we’ve given concrete examples
of the stuff that security engineers have to understand and build, we will be in
a position to attempt some definitions.
1.2 A framework
To build really dependable systems, you need four things to come together.
There’s policy: what you’re supposed to achieve. There’s mechanism: the
ciphers, access controls, hardware tamper-resistance and other machinery that
you use to implement the policy. There’s assurance: the amount of reliance you
can place on each particular mechanism, and how well they work together.
Finally, there’s incentive: the motive that the people guarding and maintaining
the system have to do their job properly, and also the motive that the attackers
have to try to defeat your policy. All of these interact (see Figure 1.1).
As an example, let’s think of the 9/11 terrorist attacks. The hijackers’ success
in getting knives through airport security was not a mechanism failure but
a policy one; the screeners did their job of keeping out guns and explosives,
but at that time, knives with blades up to three inches were permitted. Policy
changed quickly: first to prohibit all knives, then most weapons (baseball bats
are now forbidden but whiskey bottles are OK); it’s flip-flopped on many
details (butane lighters forbidden then allowed again). Mechanism is weak,
because of things like composite knives and explosives that don’t contain
nitrogen. Assurance is always poor; many tons of harmless passengers’
possessions are consigned to the trash each month, while less than half of all
the real weapons taken through screening (whether accidentally or for test
purposes) are spotted and confiscated.
[Figure 1.1: Security Engineering Analysis Framework – a diagram showing Policy, Incentives, Mechanism and Assurance, all interacting]
Most governments have prioritised visible measures over effective ones. For
example, the TSA has spent billions on passenger screening, which is fairly
ineffective, while the $100m spent on reinforcing cockpit doors removed most
of the risk [1526]. The President of the Airline Pilots Security Alliance noted
that most ground staff aren’t screened, and almost no care is taken to guard
aircraft parked on the ground overnight. As most airliners don’t have door
locks, there’s not much to stop a bad guy wheeling steps up to a plane and
placing a bomb on board; if he had piloting skills and a bit of chutzpah, he could
file a flight plan and make off with it [1204]. Yet screening staff and guarding
planes are just not a priority.
Why are such policy choices made? Quite simply, the incentives on the
decision makers favour visible controls over effective ones. The result is what
Bruce Schneier calls ‘security theatre’ – measures designed to produce a feeling
of security rather than the reality. Most players also have an incentive to exaggerate the threat from terrorism: politicians to ‘scare up the vote’ (as President
Obama put it), journalists to sell more papers, companies to sell more equipment, government officials to build their empires, and security academics to
get grants. The upshot is that most of the damage done by terrorists to democratic countries comes from the overreaction. Fortunately, electorates figure
this out over time, and now – nineteen years after 9/11 – less money is wasted.
Of course, we now know that much more of our society’s resilience budget
should have been spent on preparing for pandemic disease. It was at the top
of Britain’s risk register, but terrorism was politically more sexy. The countries
that managed their priorities more rationally got much better outcomes.
Security engineers need to understand all this; we need to be able to put risks
and threats in context, make realistic assessments of what might go wrong,
and give our clients good advice. That depends on a wide understanding of
what has gone wrong over time with various systems; what sort of attacks have
worked, what their consequences were, and how they were stopped (if it was
worthwhile to do so). History also matters because it leads to complexity, and
complexity causes many failures. Knowing the history of modern information
security enables us to understand its complexity, and navigate it better.
So this book is full of case histories. To set the scene, I’ll give a few brief
examples here of interesting security systems and what they’re designed to
prevent.
1.3 Example 1 – a bank
Banks operate a lot of security-critical computer systems.
1. A bank’s operations rest on a core bookkeeping system. This keeps
customer account master files plus a number of journals that record
incoming and outgoing transactions. The main threat here is the
bank’s own staff; about one percent of bank branch staff are fired
each year, mostly for petty dishonesty (the average theft is only a
few thousand dollars). The traditional defence comes from bookkeeping procedures that have evolved over centuries. For example,
each debit against one account must be matched by a credit against
another; so money can only be moved within a bank, never created or destroyed. In addition, large transfers typically need two
people to authorize them. There are also alarms that look for
unusual volumes or patterns of transactions, and staff are required
to take regular vacations with no access to the bank’s systems.
2. One public face is the bank’s automatic teller machines. Authenticating transactions based on a customer’s card and personal
identification number – so as to defend against both outside and
inside attack – is harder than it looks! There have been many epidemics of ‘phantom withdrawals’ in various countries when local
villains (or bank staff) have found and exploited loopholes in the
system. Automatic teller machines are also interesting as they were
the first large-scale commercial use of cryptography, and they helped
establish a number of crypto standards. The mechanisms developed
for ATMs have been extended to point-of-sale terminals in shops,
where card payments have largely displaced cash; and they’ve been
adapted for other applications such as prepayment utility meters.
3. Another public face is the bank’s website and mobile phone app. Most
customers now do their routine business, such as bill payments and
transfers between savings and checking accounts, online rather than
at a branch. Bank websites have come under heavy attack since 2005
from phishing – where customers are invited to enter their passwords
at bogus websites. The standard security mechanisms designed in the
1990s turned out to be less effective once criminals started attacking
the customers rather than the bank, so many banks now send you
a text message with an authentication code. The crooks’ reaction
is to go to a phone shop, pretend to be you, and buy a new phone
that takes over your phone number. This arms race poses many
fascinating security engineering problems mixing elements from
authentication, usability, psychology, operations and economics.
4. Behind the scenes are high-value messaging systems, used to move large
sums between banks; to trade in securities; to issue letters of credit and
guarantees; and so on. An attack on such a system is the dream of the
high-tech criminal – and we hear that the government of North Korea
has stolen many millions by attacks on banks. The defence is a mixture of bookkeeping controls, access controls, and cryptography.
5. The bank’s branches may seem large, solid and prosperous, reassuring
customers that their money is safe. But the stone facade is theatre rather
than reality. If you walk in with a gun, the tellers will give you all the
cash you can see; and if you break in at night, you can cut into the safe in
minutes with an abrasive wheel. The effective controls center on alarm
systems, which are connected to a security company’s control center,
whose staff check things out by video and call the police if they have to.
Cryptography is used to prevent a robber manipulating the communications and making the alarm appear to say ‘all’s well’ when it isn’t.
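To make that last point concrete, here is a minimal sketch, in Python, of how an alarm might authenticate its periodic ‘all’s well’ messages so that a recorded message cannot simply be replayed. The key, the message format and the HMAC-SHA256 tag are illustrative assumptions, not a description of any real alarm product; a real system must also handle lost messages, key management, and the case where the line is simply cut.

```python
import hmac, hashlib, struct

SHARED_KEY = b'key-provisioned-at-installation'   # assumed pre-shared with the monitoring centre

def make_heartbeat(sequence_number: int) -> bytes:
    """Alarm panel: build an 'all's well' message with a MAC over a sequence number.

    The increasing sequence number provides freshness (a recorded message is
    rejected later); the HMAC provides integrity (the message cannot be forged
    or altered without the key).
    """
    body = b"ALLS-WELL" + struct.pack(">Q", sequence_number)
    return body + hmac.new(SHARED_KEY, body, hashlib.sha256).digest()

def check_heartbeat(message: bytes, last_sequence: int) -> int:
    """Monitoring centre: verify the MAC, reject replays, return the new sequence number."""
    body, tag = message[:-32], message[-32:]
    if not hmac.compare_digest(tag, hmac.new(SHARED_KEY, body, hashlib.sha256).digest()):
        raise ValueError("bad MAC: forged or corrupted message")
    sequence = struct.unpack(">Q", body[len(b"ALLS-WELL"):])[0]
    if sequence <= last_sequence:
        raise ValueError("stale sequence number: possible replay")
    return sequence

last = check_heartbeat(make_heartbeat(42), last_sequence=41)   # accepted; a replay of 42 would now fail
```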
I’ll look at these applications in later chapters. Banking computer security
is important: until the early 2000s, banks were the main civilian market for
many computer security products, so they had a huge influence on security
standards.
1.4 Example 2 – a military base
Military systems were the other technology driver back in the 20th century, as
they motivated much of the academic research that governments funded into
computer security from the early 1980s onwards. As with banking, there’s not
one application but many.
1. Military communications drove the development of cryptography, going right back to ancient Egypt and Mesopotamia. But
it is often not enough to just encipher messages: an enemy who
sees traffic encrypted with somebody else’s keys may simply
locate and attack the transmitter. Low-probability-of-intercept
(LPI) radio links are one answer; they use tricks that are now
adopted in everyday communications such as Bluetooth.
2. Starting in the 1940s, governments spent a lot of money on electronic warfare systems. The arms race of trying to jam enemy
radars while preventing the enemy from jamming yours has led
to many sophisticated deception tricks, countermeasures, and
counter-countermeasures – with a depth, subtlety and range of strategies that are still not found elsewhere. Spoofing and service-denial
attacks were a reality there long before blackmailers started
targeting the websites of bankers, bookmakers and gamers.
3. Military organisations need to hold some information close, such
as intelligence sources and plans for future operations. These are
typically labeled ‘Top Secret’ and handled on separate systems; they
may be further restricted in compartments, so that the most sensitive
information is known to only a handful of people. For years, attempts
were made to enforce information flow rules, so you could copy a file
from a Secret stores system to a Top Secret command system, but not vice
versa. Managing multiple systems with information flow restrictions
is a hard problem, and the billions that were spent on attempting
to automate military security helped develop the access-control
technology you now have in your mobile phone and laptop.
4. The problems of protecting nuclear weapons led to the invention
of a lot of cool security technology, ranging from provably-secure
authentication systems, through optical-fibre alarm sensors,
to methods of identifying people using biometrics – including
the iris patterns now used to identify all citizens of India.
The security engineer can still learn a lot from this. For example, the military
was until recently one of the few customers for software systems that had to
be maintained for decades. Now that software and Internet connectivity are
finding their way into safety-critical consumer goods such as cars, software
sustainability is becoming a much wider concern. In 2019, the European Union
passed a law demanding that if you sell goods with digital components, you
must maintain those components for two years, or for longer if that’s a reasonable expectation of the customer – which will mean ten years for cars and
white goods. If you’re writing software for a car or fridge that will be on sale
for seven years, you’ll have to maintain it for almost twenty years. What tools
should you use?
1.5 Example 3 – a hospital
From bankers and soldiers we move on to healthcare. Hospitals have a number
of interesting protection requirements – mostly to do with patient safety and
privacy.
1. Safety usability is important for medical equipment, and is by no
means a solved problem. Safety usability failures are estimated to
kill about as many people as road traffic accidents – a few tens of
thousands a year in the USA, for example, and a few thousand in
the UK. The biggest single problem is with the infusion pumps
used to drip-feed patients with drugs; a typical hospital might have
half-a-dozen makes, all with somewhat different controls, making
fatal errors more likely. Safety usability interacts with security: unsafe
devices that are also found to be hackable are much more likely to
have product recalls ordered as regulators know that the public’s
appetite for risk is lower when hostile action becomes a possibility. So
as more and more medical devices acquire not just software but radio
communications, security sensitivities may lead to better safety.
2. Patient record systems should not let all the staff see every patient’s
record, or privacy violations can be expected. In fact, since the second
edition of this book, the European Court has ruled that patients have
a right to restrict their personal health information to the clinical staff
involved in their care. That means that systems have to implement rules
such as “nurses can see the records of any patient who has been cared for
in their department at any time during the previous 90 days”. This can
be harder than it looks. (The US HIPAA legislation sets easier standards
for compliance but is still a driver of information security investment.)
3. Patient records are often anonymized for use in research, but this
is hard to do well. Simply encrypting patient names is not enough:
an enquiry such as “show me all males born in 1953 who were
treated for atrial fibrillation on October 19th 2003” should be enough
to target former Prime Minister Tony Blair, who was rushed to
hospital that day to be treated for an irregular heartbeat. Figuring
out what data can be anonymized effectively is hard, and it’s also
a moving target as we get more and more social and contextual
data – not to mention the genetic data of relatives near and far.
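(A minimal sketch of such a re-identification query appears after this list.)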
4. New technology can introduce poorly-understood risks. Hospital
administrators understand the need for backup procedures to deal
with outages of power; hospitals are supposed to be able to deal with
casualties even if their mains electricity and water supplies fail. But
after several hospitals in Britain had machines infected by the Wannacry
malware in May 2017, they closed down their networks to limit further
infection, and then found that they had to close their accident and
emergency departments – as X-rays no longer travel from the X-ray
machine to the operating theatre in an envelope, but via a server in a
distant town. So a network failure can stop doctors operating when a
power failure would not. There were standby generators, but no standby
network. Cloud services can make things more reliable on average, but
the failures can be bigger, more complex, and correlated. An issue surfaced by the coronavirus pandemic is accessory control: some medical
devices authenticate their spare parts, just as printers authenticate ink
cartridges. Although the vendors claim this is for safety, it’s actually
so they can charge more money for spares. But it introduces fragility:
when the supply chain gets interrupted, things are a lot harder to fix.
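As promised in item 3, here is a minimal sketch of why removing or encrypting names is not enough. The records below are invented for illustration; the point is only that a handful of innocuous-looking attributes can single out one individual.

```python
# Hypothetical 'anonymised' records: no names or patient numbers, but sex,
# year of birth, diagnosis and admission date are kept for research use.
records = [
    {"sex": "M", "born": 1953, "diagnosis": "atrial fibrillation", "admitted": "2003-10-19"},
    {"sex": "F", "born": 1967, "diagnosis": "asthma",              "admitted": "2003-10-19"},
    {"sex": "M", "born": 1953, "diagnosis": "fractured wrist",     "admitted": "2004-02-02"},
]

# The enquiry quoted in the text: males born in 1953 treated for
# atrial fibrillation on 19 October 2003.
matches = [r for r in records
           if r["sex"] == "M" and r["born"] == 1953
           and r["diagnosis"] == "atrial fibrillation"
           and r["admitted"] == "2003-10-19"]

print(len(matches))   # 1 - the 'anonymous' dataset has identified one person
```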
We’ll look at medical system security (and safety too) in more detail later.
This is a younger field than banking IT or military systems, but as healthcare
accounts for a larger proportion of GNP than either of them in all developed
countries, its importance is growing. It’s also consistently the largest source of
privacy breaches in countries with mandatory reporting.
1.6 Example 4 – the home
You might not think that the typical family operates any secure systems. But
just stop and think.
1. You probably use some of the systems I’ve already described.
You may use a web-based electronic banking system to pay bills,
and you may have online access to your doctor’s surgery so
you can order repeat prescriptions. If you’re diabetic then your
insulin pump may communicate with a docking station at your
bedside. Your home burglar alarm may send an encrypted ‘all’s
well’ signal to the security company every few minutes, rather
than waking up the neighborhood when something happens.
2. Your car probably has an electronic immobilizer. If it was made before
about 2015, the car unlocks when you press a button on the key, which
sends an encrypted unlock command. If it’s a more recent model, where
you don’t have to press any buttons but just have the key in your pocket,
the car sends an encrypted challenge to the key and waits for the right
response. But eliminating the button press meant that if you leave your
key near the front door, a thief might use a radio relay to steal your
car. Car thefts have shot up since this technology was introduced.
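(A minimal sketch of such a challenge-response exchange appears after this list.)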
3. Your mobile phone authenticates itself to the network by a cryptographic challenge-response protocol similar to the ones used
in car door locks and immobilizers, but the police can use a false
base station (known in Europe as an IMSI-catcher, and in America as a Stingray) to listen in. And, as I mentioned above, many
phone companies are relaxed about selling new SIM cards to people
who claim their phones have been stolen; so a crook might steal
your phone number and use this to raid your bank account.
4. In over 100 countries, households can get prepayment meters
for electricity and gas, which they top up using a 20-digit code
that they buy from an ATM or an online service. It even works
off-grid; in Kenyan villages, people who can’t afford $200 to buy
a solar panel can get one for $2 a week and unlock the electricity
it generates using codes they buy with their mobile phones.
5. Above all, the home provides a haven of physical security and
seclusion. This is changing in a number of ways. Burglars aren’t
worried by locks as much as by occupants, so alarms and monitoring
systems can help; but monitoring is also becoming pervasive, with
many households buying systems like Alexa and Google Home
that listen to what people say. All sorts of other gadgets now have
microphones and cameras as voice and gesture interfaces become
common, and the speech processing is typically done in the cloud
to save battery life. By 2015, President Obama’s council of advisers
on science and technology was predicting that pretty soon every
inhabited space on earth would have microphones that were connected to a small number of cloud service providers. (The USA and
Europe have quite different views on how privacy law should deal
with this.) One way or another, the security of your home may come
to depend on remote systems over which you have little control.
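Here is a minimal sketch, in Python, of the challenge-response idea behind items 2 and 3: the car (or the network) sends a fresh random challenge, and only a device holding the right key can compute the expected response. It is deliberately simplified – the key, the 16-byte challenge and the use of HMAC-SHA256 are my assumptions, and real car keys and SIM cards use their own algorithms and message formats. Note that the relay attack described above defeats even a correct protocol without touching the cryptography: it simply extends the radio path so that the genuine key answers from inside the house.

```python
import hmac, hashlib, secrets

KEY = b'key-shared-between-car-and-fob-at-manufacture'   # illustrative assumption

def car_issue_challenge() -> bytes:
    # A fresh random challenge makes every exchange different, so a
    # recorded response is useless later on (freshness).
    return secrets.token_bytes(16)

def fob_respond(challenge: bytes) -> bytes:
    # Only a device that holds the key can compute this value.
    return hmac.new(KEY, challenge, hashlib.sha256).digest()

def car_verify(challenge: bytes, response: bytes) -> bool:
    expected = hmac.new(KEY, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

challenge = car_issue_challenge()
assert car_verify(challenge, fob_respond(challenge))   # the genuine key unlocks the car
```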
Over the next few years, the number of such systems is going to increase
rapidly. On past experience, many of them will be badly designed. For
example, in 2019, Europe banned a children’s watch that used unencrypted
communications to the vendor’s cloud service; a wiretapper could download
any child’s location history and cause their watch to phone any number in the
world. When this was discovered, the EU ordered the immediate safety recall
of all watches [903].
This book aims to help you avoid such outcomes. To design systems that are
safe and secure, an engineer needs to know about what systems there are, how
they work, and – at least as important – how they have failed in the past. Civil
engineers learn far more from the one bridge that falls down than from the
hundred that stay up; exactly the same holds in security engineering.
1.7 Definitions
Many of the terms used in security engineering are straightforward, but some
are misleading or even controversial. There are more detailed definitions of
technical terms in the relevant chapters, which you can find using the index.
In this section, I’ll try to point out where the main problems lie.
The first thing we need to clarify is what we mean by system. In practice, this
can denote:
1. a product or component, such as a cryptographic protocol, a
smartcard, or the hardware of a phone, a laptop or server;
2. one or more of the above plus an operating system, communications and
other infrastructure;
3. the above plus one or more applications (banking app, health
app, media player, browser, accounts/payroll package, and
so on – including both client and cloud components);
4. any or all of the above plus IT staff;
5. any or all of the above plus internal users and management;
6. any or all of the above plus customers and other external users.
Confusion between the above definitions is a fertile source of errors and vulnerabilities. Broadly speaking, the vendor and evaluator communities focus
on the first and (occasionally) the second of them, while a business will focus
on the sixth (and occasionally the fifth). We will come across many examples
of systems that were advertised or even certified as secure because the hardware was, but that broke badly when a particular application was run, or when
the equipment was used in a way the designers didn’t anticipate. Ignoring the
human components, and thus neglecting usability issues, is one of the largest
causes of security failure. So we will generally use definition 6; when we take
a more restrictive view, it should be clear from the context.
The next set of problems comes from lack of clarity about who the players are
and what they’re trying to prove. In the literature on security and cryptology,
it’s a convention that principals in security protocols are identified by names
chosen with (usually) successive initial letters – much like hurricanes, except
that we use alternating genders. So we see lots of statements such as “Alice
authenticates herself to Bob”. This makes things much more readable, but can
come at the expense of precision. Do we mean that Alice proves to Bob that
her name actually is Alice, or that she proves she’s got a particular credential?
Do we mean that the authentication is done by Alice the human being, or by a
smartcard or software tool acting as Alice’s agent? In that case, are we sure it’s
Alice, and not perhaps Carol to whom Alice lent her card, or David who stole
her phone, or Eve who hacked her laptop?
By a subject I will mean a physical person in any role including that of an
operator, principal or victim. By a person, I will mean either a physical person
or a legal person such as a company or government.¹
¹ The law around companies may come in handy when we start having to develop rules around AI. A company, like a robot, may be immortal and have some functional intelligence – but without consciousness. You can’t jail a company but you can fine it.
A principal is an entity that participates in a security system. This entity can
be a subject, a person, a role, or a piece of equipment such as a laptop, phone,
smartcard, or card reader. A principal can also be a communications channel
(which might be a port number, or a crypto key, depending on the circumstance). A principal can also be a compound of other principals; examples are
a group (Alice or Bob), a conjunction (Alice and Bob acting together), a compound role (Alice acting as Bob’s manager) and a delegation (Bob acting for
Alice in her absence).
Beware that groups and roles are not the same. By a group I will mean
a set of principals, while a role is a set of functions assumed by different
persons in succession (such as ‘the officer of the watch on the USS Nimitz’
or ‘the president for the time being of the Icelandic Medical Association’). A
principal may be considered at more than one level of abstraction: e.g. ‘Bob
acting for Alice in her absence’ might mean ‘Bob’s smartcard representing
Bob who is acting for Alice in her absence’ or even ‘Bob operating Alice’s
smartcard in her absence’. When we have to consider more detail, I’ll be
more specific.
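A minimal sketch of how these compound principals might be written down may help fix the terminology. The class names below are mine, chosen only for illustration; real access-control systems represent principals in many different ways.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Person:          # a physical or legal person, e.g. Alice
    name: str

@dataclass(frozen=True)
class Equipment:       # e.g. Alice's smartcard or phone
    description: str

@dataclass(frozen=True)
class Group:           # a set of principals: Alice or Bob
    members: Tuple['Principal', ...]

@dataclass(frozen=True)
class Conjunction:     # Alice and Bob acting together
    parties: Tuple['Principal', ...]

@dataclass(frozen=True)
class Role:            # functions assumed by different persons in succession
    title: str         # e.g. 'officer of the watch on the USS Nimitz'

@dataclass(frozen=True)
class ActingFor:       # a compound role or delegation: one principal acting for another
    actor: 'Principal'
    on_behalf_of: 'Principal'

Principal = Union[Person, Equipment, Group, Conjunction, Role, ActingFor]

# 'Bob's smartcard representing Bob, who is acting for Alice in her absence'
example = ActingFor(Equipment("Bob's smartcard"),
                    ActingFor(Person("Bob"), Person("Alice")))
```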
The meaning of the word identity is controversial. When we have to be careful, I will use it to mean a correspondence between the names of two principals
signifying that they refer to the same person or equipment. For example, it may
be important to know that the Bob in ‘Alice acting as Bob’s manager’ is the
same as the Bob in ‘Bob acting as Charlie’s manager’ and in ‘Bob as branch
manager signing a bank draft jointly with David’. Often, identity is abused to
mean simply ‘name’, an abuse entrenched by such phrases as ‘user identity’
and ‘citizen identity card’.
The definitions of trust and trustworthy are often confused. The following
example illustrates the difference: if an NSA employee is observed in a toilet stall at Baltimore Washington International airport selling key material to a
Chinese diplomat, then (assuming his operation was not authorized) we can
describe him as ‘trusted but not trustworthy’. I use the NSA definition that a
trusted system or component is one whose failure can break the security policy,
while a trustworthy system or component is one that won’t fail.
There are many alternative definitions of trust. In the corporate world,
trusted system might be ‘a system which won’t get me fired if it gets hacked
on my watch’ or even ‘a system which we can insure’. But when I mean an
approved system, an insurable system or an insured system, I’ll say so.
The definition of confidentiality versus privacy versus secrecy opens another
can of worms. These terms overlap, but are not exactly the same. If my neighbor
cuts down some ivy at our common fence with the result that his kids can look
into my garden and tease my dogs, it’s not my confidentiality that has been
invaded. And the duty to keep quiet about the affairs of a former employer is
a duty of confidence, not of privacy.
The way I’ll use these words is as follows.
Secrecy is an engineering term that refers to the effect of the mechanisms used to limit the number of principals who can access
information, such as cryptography or computer access controls.
Confidentiality involves an obligation to protect some other person’s or
organisation’s secrets if you know them.
Privacy is the ability and/or right to protect your personal information
and extends to the ability and/or right to prevent invasions of your
personal space (the exact definition of which varies from one country to
another). Privacy can extend to families but not to legal persons such as
corporations.
For example, hospital patients have a right to privacy, and in order to
uphold this right the doctors, nurses and other staff have a duty of confidence
towards their patients. The hospital has no right of privacy in respect of its
business dealings but those employees who are privy to them may have
a duty of confidence (unless they invoke a whistleblowing right to expose
wrongdoing). Typically, privacy is secrecy for the benefit of the individual
while confidentiality is secrecy for the benefit of the organisation.
There is a further complexity in that it’s often not sufficient to protect
data, such as the contents of messages; we also have to protect metadata,
such as logs of who spoke to whom. For example, many countries have
laws making the treatment of sexually transmitted diseases secret, and yet
if a private eye could observe you exchanging encrypted messages with a
sexually-transmitted disease clinic, he might infer that you were being treated
there. In fact, a key privacy case in the UK turned on such a fact: a model in
Britain won a privacy lawsuit against a tabloid newspaper which printed a
photograph of her leaving a meeting of Narcotics Anonymous. So anonymity
can be just as important a factor in privacy (or confidentiality) as secrecy. But
anonymity is hard. It’s difficult to be anonymous on your own; you usually
need a crowd to hide in. Also, our legal codes are not designed to support
anonymity: it’s much easier for the police to get itemized billing information
from the phone company, which tells them who called whom, than it is to get
an actual wiretap. (And it’s often more useful.)
The meanings of authenticity and integrity can also vary subtly. In the academic literature on security protocols, authenticity means integrity plus freshness: you have established that you are speaking to a genuine principal, not
a replay of previous messages. We have a similar idea in banking protocols. If
local banking laws state that checks are no longer valid after six months, a seven-month-old uncashed check has integrity (assuming it’s not been altered) but
is no longer valid. However, there are some strange edge cases. For example,
a police crime scene officer will preserve the integrity of a forged check – by
placing it in an evidence bag. (The meaning of integrity has changed in the new
context to include not just the signature but any fingerprints.)
The things we don’t want are often described as hacking. I’ll follow Bruce
Schneier and define a hack as something a system’s rules permit, but which
was unanticipated and unwanted by its designers [1682]. For example, tax
attorneys study the tax code to find loopholes which they develop into tax
avoidance strategies; in exactly the same way, black hats study software code
to find loopholes which they develop into exploits. Hacks can target not just
the tax system and computer systems, but the market economy, our systems
for electing leaders and even our cognitive systems. They can happen at
multiple layers: lawyers can hack the tax code, or move up the stack and
hack the legislature, or even the media. In the same way, you might try to
hack a cryptosystem by finding a mathematical weakness in the encryption
algorithm, or you can go down a level and measure the power drawn by
a device that implements it in order to work out the key, or up a level and
deceive the device’s custodian into using it when they shouldn’t. This book
contains many examples. In the broader context, hacking is sometimes a
source of significant innovation. If a hack becomes popular, the rules may be
changed to stop it; but it may also become normalised (examples range from
libraries through the filibuster to search engines and social media).
The last matter I’ll clarify here is the terminology that describes what we’re
trying to achieve. A vulnerability is a property of a system or its environment
which, in conjunction with an internal or external threat, can lead to a security
failure, which is a breach of the system’s security policy. By security policy I will
mean a succinct statement of a system’s protection strategy (for example, “in
each transaction, sums of credits and debits are equal, and all transactions
over $1,000,000 must be authorized by two managers”). A security target is
a more detailed specification which sets out the means by which a security
policy will be implemented in a particular product – encryption and digital
signature mechanisms, access controls, audit logs and so on – and which
will be used as the yardstick to evaluate whether the engineers have done a
proper job. Between these two levels you may find a protection profile which
is like a security target, except written in a sufficiently device-independent
way to allow comparative evaluations among different products and different
versions of the same product. I’ll elaborate on security policies, security targets
and protection profiles in Part 3. In general, the word protection will mean a
property such as confidentiality or integrity, defined in a sufficiently abstract
way for us to reason about it in the context of general systems rather than
specific implementations.
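To make the distinction concrete, here is a minimal sketch of a check for the example policy quoted above – balanced credits and debits, with transactions over $1,000,000 authorized by two managers. The data layout is an illustrative assumption; a security target would pin down far more (how approvers are authenticated, where the check is enforced, what gets logged, and so on).

```python
def satisfies_policy(transaction: dict) -> bool:
    """Check the example security policy: in each transaction the sums of
    credits and debits are equal, and any transaction over $1,000,000
    carries approvals from at least two distinct managers."""
    credits = sum(e["amount"] for e in transaction["entries"] if e["kind"] == "credit")
    debits  = sum(e["amount"] for e in transaction["entries"] if e["kind"] == "debit")
    if credits != debits:
        return False                              # money would be created or destroyed
    if credits > 1_000_000 and len(set(transaction.get("approved_by", []))) < 2:
        return False                              # dual authorization missing
    return True

# A balanced $2m transfer approved by two distinct managers passes the check.
assert satisfies_policy({
    "entries": [
        {"kind": "debit",  "account": "A", "amount": 2_000_000},
        {"kind": "credit", "account": "B", "amount": 2_000_000},
    ],
    "approved_by": ["manager-1", "manager-2"],
})
```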
This somewhat mirrors the terminology we use for safety-critical systems,
and as we are going to have to engineer security and safety together in ever
more applications it is useful to keep thinking of the two side by side.
In the safety world, a critical system or component is one whose failure could
lead to an accident, given a hazard – a set of internal conditions or external
circumstances. Danger is the probability that a hazard will lead to an accident,
and risk is the overall probability of an accident. Risk is thus hazard level combined with danger and latency – the hazard exposure and duration. Uncertainty
is where the risk is not quantifiable, while safety is freedom from accidents.
We then have a safety policy which gives us a succinct statement of how risks
will be kept below an acceptable threshold (and this might range from succinct, such as “don’t put explosives and detonators in the same truck”, to the
much more complex policies used in medicine and aviation); at the next level
down, we might find a safety case having to be made for a particular component such as an aircraft, an aircraft engine or even the control software for an
aircraft engine.
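One hedged, purely illustrative way of putting numbers on the risk terminology above: if a hazard is present for some fraction of operating time (exposure), and while present leads to an accident with some probability per hour (danger), then the two combine over the period of interest. The figures below are invented solely to show the arithmetic, not taken from any real safety case.

```python
exposure = 0.05      # hazard present for 5% of operating hours
danger   = 0.0001    # probability per exposed hour that the hazard leads to an accident
hours    = 10_000    # period of interest

expected_accidents = exposure * danger * hours
print(expected_accidents)   # 0.05 - roughly the accident probability when the number is small
```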
1.8 Summary
‘Security’ is a terribly overloaded word, which often means quite incompatible
things to different people. To a corporation, it might mean the ability to monitor
all employees’ email and web browsing; to the employees, it might mean being
able to use email and the web without being monitored.
As time goes on, and security mechanisms are used more and more by the
people who control a system’s design to gain some commercial advantage over
the other people who use it, we can expect conflicts, confusion and the deceptive use of language to increase.
One is reminded of a passage from Lewis Carroll:
“When I use a word,” Humpty Dumpty said, in a rather scornful tone, “it means
just what I choose it to mean – neither more nor less.” “The question is,” said
Alice, “whether you can make words mean so many different things.” “The question is,” said Humpty Dumpty, “which is to be master – that’s all.”
The security engineer must be sensitive to the different nuances of meaning
that words acquire in different applications, and be able to formalize what the
security policy and target actually are. That may sometimes be inconvenient for
clients who wish to get away with something, but, in general, robust security
design requires that the protection goals are made explicit.
CHAPTER 2
Who Is the Opponent?
Going all the way back to early time-sharing systems we systems people regarded the users,
and any code they wrote, as the mortal enemies of us and each other. We were like the
police force in a violent slum.
– ROGER NEEDHAM
False face must hide what the false heart doth know.
– MACBETH
2.1 Introduction
Ideologues may deal with the world as they would wish it to be, but engineers
deal with the world as it is. If you’re going to defend systems against attack,
you first need to know who your enemies are.
In the early days of computing, we mostly didn’t have real enemies; while
banks and the military had to protect their systems, most other people didn’t
really bother. The first computer systems were isolated, serving a single
company or university. Students might try to hack the system to get more
resources and sysadmins would try to stop them, but it was mostly a game.
When dial-up connections started to appear, pranksters occasionally guessed
passwords and left joke messages, as they’d done at university. The early
Internet was a friendly place, inhabited by academics, engineers at tech
companies, and a few hobbyists. We knew that malware was possible but
almost nobody took it seriously until the late 1980s when PC viruses appeared,
followed by the Internet worm in 1988. (Even that was a student experiment
that escaped from the lab; I tell the story in section 21.3.2.)
Things changed once everyone started to get online. The mid-1990s saw the
first spam, the late 1990s brought the first distributed denial-of-service attack,
and the explosion of mail-order business in the dotcom boom introduced credit
card fraud. To begin with, online fraud was a cottage industry; the same person would steal credit card numbers and use them to buy goods which he’d
then sell, or make up forged cards to use in a store. Things changed in the
mid-2000s with the emergence of underground markets. These let the bad guys
specialise – one gang could write malware, another could harvest bank credentials, and yet others could devise ways of cashing out. This enabled them to get
good at their jobs, to scale up and to globalise, just as manufacturing did in the
late eighteenth century. The 2000s also saw the world’s governments putting
in the effort to ‘Master the Internet’ (as the NSA put it) – working out how to
collect data at scale and index it, just as Google does, to make it available to
analysts. It also saw the emergence of social networks, so that everyone could
have a home online – not just geeks with the skills to create their own handcrafted web pages. And of course, once everyone is online, that includes not
just spies and crooks but also jerks, creeps, racists and bullies.
Over the past decade, this threat landscape has stabilised. We also know quite
a lot about it. Thanks to Ed Snowden and other whistleblowers, we know a lot
about the capabilities and methods of Western intelligence services; we’ve also
learned a lot about China, Russia and other nation-state threat actors. We know
a lot about cybercrime; online crime now makes up about half of all crime, by
volume and by value. There’s a substantial criminal infrastructure based on
malware and botnets with which we are constantly struggling; there’s also a
large ecosystem of scams. Many traditional crimes have gone online, and a
typical firm has to worry not just about external fraudsters but also about dishonest insiders. Some firms have to worry about hostile governments, some
about other firms, and some about activists. Many people have to deal with
online hostility, from kids suffering cyber-bullying at school through harassment of elected politicians to people who are stalked by former partners. And
our politics may become more polarised because of the dynamics of online
extremism.
One of the first things the security engineer needs to do when tackling a new
problem is to identify the likely opponents. Although you can design some
specific system components (such as cryptography) to resist all reasonable
adversaries, the same is much less true for a complex real-world system.
You can’t protect it against all possible threats and still expect it to do useful
work at a reasonable cost. So what sort of capabilities will the adversaries
have, and what motivation? How certain are you of this assessment, and how
might it change over the system’s lifetime? In this chapter I will classify online
and electronic threats depending on motive. First, I’ll discuss surveillance,
intrusion and manipulation done by governments for reasons of state, ranging
from cyber-intelligence to cyber-conflict operations. Second, I’ll deal with
criminals whose motive is mainly money. Third will be researchers who
find vulnerabilities for fun or for money, or who report them out of social
conscience – compelling firms to patch their software and clean up their
operations. Finally, I’ll discuss bad actors whose reasons are personal and who
mainly commit crimes against the person, from cyber-bullies to stalkers.
The big service firms, such as Microsoft, Google and Facebook, have to worry
about all four classes of threat. Most firms and most private individuals will
only be concerned with some of them. But it’s important for a security engineer
to understand the big picture so you can help clients work out what their own
threat model should be, and what sort of attacks they should plan to forestall.
2.2 Spies
Governments have a range of tools for both passive surveillance of networks
and active attacks on computer systems. Hundreds of firms sell equipment for
wiretapping, for radio intercept, and for using various vulnerabilities to take
over computers, phones and other digital devices. However, there are significant differences among governments in scale, objectives and capabilities. We’ll
discuss four representative categories – the USA and its allies, China, Russia
and the Arab world – from the viewpoint of potential opponents. Even if spies
aren’t in your threat model today, the tools they use will quite often end up in
the hands of the crooks too, sooner or later.
2.2.1 The Five Eyes
Just as everyone in a certain age range remembers where they were when John
Lennon was shot, everyone who’s been in our trade since 2013 remembers
where they were when they learned of the Snowden revelations on Friday 7th
June of that year.
2.2.1.1 Prism
I was in a hotel in Palo Alto, California, reading the Guardian online before
a scheduled visit to Google where I’d been as a scientific visitor in 2011, helping develop contactless payments for Android phones. The headline was ‘NSA
Prism program taps in to user data of Apple, Google and others’; the article,
written by Glenn Greenwald and Ewen MacAskill, describes a system called
Prism that collects the Gmail and other data of users who are not US citizens or permanent residents, with collection carried out under an order from the FISA
court [818]. After breakfast I drove to the Googleplex, and found that my former colleagues were just as perplexed as I was. They knew nothing about
Prism. Neither did the mail team. How could such a wiretap have been built?
Had an order been served on Eric Schmidt, and if so how could he have implemented it without the mail and security teams knowing? As the day went on,
people stopped talking.
It turned out that Prism was an internal NSA codename for an access channel that had been provided to the FBI to conduct warranted wiretaps. US law
permits US citizens to be wiretapped provided an agency convinces a court
to issue a warrant, based on ‘probable cause’ that they were up to no good;
but foreigners could be wiretapped freely. So for a foreign target like me, all
an NSA intelligence analyst had to do was click on a tab saying they believed
I was a non-US person. The inquiry would be routed automatically via the
FBI infrastructure and pipe my Gmail to their workstation. According to the
article, this program had started at Microsoft in 2007; Yahoo had fought it in
court, but lost, joining in late 2008; Google and Facebook had been added in
2009 and Apple finally in 2012. A system that people thought was providing
targeted, warranted wiretaps to law enforcement was providing access at scale
for foreign intelligence purposes, and according to a slide deck leaked to the
Guardian it was ‘the SIGAD1 most used in NSA reporting’.
The following day we learned that the source of the story was Edward Snowden, an NSA system administrator who’d decided to blow the whistle. The
story was that he’d smuggled over 50,000 classified documents out of a facility
in Hawaii on a memory stick and met Guardian journalists in Hong Kong [819].
He tried to fly to Latin America on June 21st to claim asylum, but after the US
government cancelled his passport he got stuck in Moscow and eventually got
asylum in Russia instead. A consortium of newspapers coordinated a series of
stories describing the signals intelligence capabilities of the ‘Five Eyes’ countries – the USA, the UK, Canada, Australia and New Zealand – as well as how
these capabilities were not just used but also abused.
The first story based on the leaked documents had actually appeared two
days before the Prism story; it was about how the FISA court had ordered Verizon to hand over all call data records (CDRs) to the NSA in February that
year [815]. This hadn’t got much attention from security professionals as we
knew the agencies did that anyway. But it certainly got the attention of lawyers
and politicians, as it broke during the Privacy Law Scholars’ Conference and
showed that US Director of National Intelligence James Clapper had lied to
Congress when he’d testified that the NSA collects Americans’ domestic communications ‘only inadvertently’. And what was to follow changed everything.
2.2.1.2 Tempora
On June 21st, the press ran stories about Tempora, a program to collect intelligence from international fibre optic cables [1201]. This wasn’t a complete surprise; the journalist Duncan Campbell had described a system called Echelon
in 1988 which tapped the Intelsat satellite network, keeping voice calls on tape
while making metadata available for searching so that analysts could select
1 Sigint (Signals Intelligence) Activity Designator
traffic to or from phone numbers of interest [375, 376] (I’ll give more historical
background in section 26.2.6). Snowden gave us an update on the technology.
In Cornwall alone, 200 transatlantic fibres were tapped and 46 could be collected at any one time. As each of these carried 10Gb/s, the total data volume
could be as high as 21Pb a day, so the incoming data feeds undergo massive volume reduction, discarding video, news and the like. Material was then selected
using selectors – not just phone numbers but more general search terms such as
IP addresses – and stored for 30 days in case it turns out to be of interest.
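A quick back-of-the-envelope check of these figures (assuming 'Pb' means petabytes and that each tapped fibre runs at the stated 10Gb/s; the code is illustrative only):

# Rough check of the Tempora volumes quoted above. Assumptions: 'Pb' means
# petabytes, and each tapped fibre carries a full 10 Gbit/s.
SECONDS_PER_DAY = 86_400
RATE_BITS_PER_SECOND = 10e9           # 10 Gbit/s per fibre

def petabytes_per_day(n_fibres):
    bits = n_fibres * RATE_BITS_PER_SECOND * SECONDS_PER_DAY
    return bits / 8 / 1e15            # bits -> bytes -> petabytes

print(petabytes_per_day(200))         # ~21.6 PB/day if all 200 tapped fibres ran flat out
print(petabytes_per_day(46))          # ~5 PB/day for the 46 collected at any one time

Either way, the raw feed is far too large to retain, which is why the volume reduction and the 30-day retention window matter.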
The Tempora program, like Echelon before it, has heavy UK involvement.
Britain has physical access to about a quarter of the Internet’s backbone, as
modern cables tend to go where phone cables used to, and they were often laid
between the same end stations as nineteenth-century telegraph cables. So one
of the UK’s major intelligence assets turns out to be the legacy of the communications infrastructure it built to control its nineteenth-century empire. And the
asset is indeed significant: by 2012, 300 analysts from GCHQ, and 250 from the
NSA, were sifting through the data, using 40,000 and 31,000 selectors respectively to sift 600m ‘telephone events’ each day.
2.2.1.3 Muscular
One of the applications running on top of Tempora was Muscular. Revealed
on October 30th, this collected data as it flowed between the data centres of
large service firms such as Yahoo and Google [2020]. Your mail may have been
encrypted using SSL en route to the service’s front end, but it then flowed in
the clear between each company’s data centres. After an NSA PowerPoint slide
on ‘Google Cloud Exploitation’ was published in the Washington Post – see
figure 2.1 – the companies scrambled to encrypt everything on their networks.
Executives and engineers at cloud service firms took the smiley as a personal
affront. It reminded people in the industry that even if you comply with
warrants, the agencies will also hack you if they can. It made people outside
the industry stop and think: Google had accreted so much access to all our
lives via search, mail, maps, calendars and other services that unrestricted
intelligence-service access to its records (and to Facebook’s and Microsoft’s
too) was a major privacy breach.
Two years later, at a meeting at Princeton which Snowden attended in the
form of a telepresence robot, he pointed out that a lot of Internet communications that appear to be encrypted aren’t really, as modern websites use content
delivery networks (CDNs) such as Akamai and Cloudflare; while the web traffic
is encrypted from the user’s laptop or phone to the CDN’s point of presence
at their ISP, it isn’t encrypted on the backhaul unless they pay extra – which
most of them don’t [87]. So the customer thinks the link is encrypted, and it’s
protected from casual snooping – but not from nation states or from firms who
can read backbone traffic.
Figure 2.1: Muscular – the slide
2.2.1.4 Special collection
The NSA and CIA jointly operate the Special Collection Service (SCS) whose
most visible activity may be the plastic panels near the roofs of US and
allied embassies worldwide; these hide antennas for hoovering up cellular
communication (a program known as ‘Stateroom’). Beyond this, SCS implants
collection equipment in foreign telcos, Internet exchanges and government
facilities. This can involve classical spy tradecraft, from placing bugs that
monitor speech or electronic communications, through recruiting moles in
target organisations, to the covert deployment of antennas in target countries
to tap internal microwave links. Such techniques are not restricted to state
targets: Mexican drug cartel leader ‘El Chapo’ Guzman was caught after US
agents suborned his system administrator.
Close-access operations include Tempest monitoring: the collection of information leaked by the electromagnetic emissions from computer monitors and
other equipment, described in 19.3.2. The Snowden leaks disclose the collection
of computer screen data and other electromagnetic emanations from a number of countries’ embassies and UN missions including those of India, Japan,
Slovakia and the EU2.
2.2.1.5 Bullrun and Edgehill
Special collection increasingly involves supply-chain tampering. SCS routinely intercepts equipment such as routers being exported from the USA,
2 If the NSA needs to use high-tech collection against you as they can't get a software implant into your computer, that may be a compliment!
adds surveillance implants, repackages them with factory seals and sends
them onward to customers. And an extreme form of supply-chain tampering
was when the NSA covertly bought Crypto AG, a Swiss firm that was the
main supplier of cryptographic equipment to non-aligned countries during
the Cold War; I tell the story in more detail later in section 26.2.7.1.
Bullrun is the NSA codename, and Edgehill the GCHQ one, for ‘crypto
enabling’, a $100m-a-year program of tampering with supplies and suppliers
at all levels of the stack. This starts off with attempts to direct, or misdirect,
academic research3 ; it continued with placing trusted people on standards
committees, and using NIST’s influence to get weak standards adopted. One
spectacular incident was the Dual_EC_DRBG debacle, where NIST standardised
a random number generator based on elliptic curves that turned out to
contain an NSA backdoor. Most of the actual damage, though, was done by
restrictions on cryptographic key length, dovetailed with diplomatic pressure
on allies to enforce export controls, so that firms needing export licenses could
have their arms twisted to use an ‘appropriate’ standard, and was entangled
with the Crypto Wars (which I discuss in section 26.2.7). The result was that
many of the systems in use today were compelled to use weak cryptography,
leading to vulnerabilities in everything from hotel and car door locks to VPNs.
In addition to that, supply-chain attacks introduce covert vulnerabilities into
widely-used software; many nation states play this game, along with some
private actors [892]. We’ll see vulnerabilities that result from surveillance and
cryptography policies in one chapter after another, and return in Part 3 of the
book to discuss the policy history in more detail.
2.2.1.6 Xkeyscore
With such a vast collection of data, you need good tools to search it. The
Five Eyes search computer data using Xkeyscore, a distributed database that
enables an analyst to search collected data remotely and assemble the results.
Exposed on July 31 2013, NSA documents describe it as its “widest-reaching”
system for developing intelligence; it enables an analyst to search emails,
SMSes, chats, address book entries and browsing histories [816]. Examples
in a 2008 training deck include “my target speaks German but is in Pakistan.
How can I find him?” “Show me all the encrypted Word documents from
Iran” and “Show me all PGP usage in Iran”. By searching for anomalous
behaviour, the analyst can find suspects and identify strong selectors (such
3 In the 1990s, when I bid to run a research program in coding theory, cryptography and computer security at the Isaac Newton Institute at Cambridge University, a senior official from GCHQ offered the institute a £50,000 donation not to go ahead, saying "There's nothing interesting happening in cryptography, and Her Majesty's Government would like this state of affairs to continue". He was shown the door and my program went ahead.
as email addresses, phone numbers or IP addresses) for more conventional
collection.
Xkeyscore is a federated system, where one query scans all sites. Its components buffer information at collection points – in 2008, 700 servers at 150 sites.
Some appear to be hacked systems overseas from which the NSA malware can
exfiltrate data matching a submitted query. The only judicial approval required
is a prompt for the analyst to enter a reason why they believe that one of the
parties to the conversation is not resident in the USA. The volumes are such that
traffic data are kept for 30 days but content for only 3–5 days. Tasked items are
extracted and sent on to whoever requested them, and there’s a notification
system (Trafficthief) for tipping off analysts when their targets do anything of
interest. Extraction is based either on fingerprints or plugins – the latter allow
analysts to respond quickly with detectors for new challenges like steganography and homebrew encryption.
Xkeyscore can also be used for target discovery: one of the training queries
is “Show me all the exploitable machines in country X” (machine fingerprints
are compiled by a crawler called Mugshot). For example, it came out in 2015
that GCHQ and the NSA hacked the world’s leading provider of SIM cards,
the Franco-Dutch company Gemalto, to compromise the keys needed to intercept (and if need be spoof) the traffic from hundreds of millions of mobile
phones [1661]. The hack used Xkeyscore to identify the firm’s sysadmins, who
were then phished; agents were also able to compromise billing servers to
suppress SMS billing and authentication servers to steal keys; another technique was to harvest keys in transit from Gemalto to mobile service providers.
According to an interview with Snowden in 2014, Xkeyscore also lets an analyst build a fingerprint of any target’s online activity so that they can be followed automatically round the world. The successes of this system are claimed
to include the capture of over 300 terrorists; in one case, Al-Qaida’s Sheikh
Atiyatallah blew his cover by googling himself, his various aliases, an associate
and the name of his book [1661].
There’s a collection of decks on Xkeyscore with a survey by Morgan
Marquis-Boire, Glenn Greenwald and Micah Lee [1232]; a careful reading of
the decks can be a good starting point for exploring the Snowden hoard4.
2.2.1.7 Longhaul
Bulk key theft and supply-chain tampering are not the only ways to defeat
cryptography. The Xkeyscore training deck gives an example: “Show me all
the VPN startups in country X, and give me the data so I can decrypt and
discover the users”. VPNs appear to be easily defeated; a decryption service
4 There's also a search engine for the collection at https://www.edwardsnowden.com.
called Longhaul ingests ciphertext and returns plaintext. The detailed description of cryptanalytic techniques is held as Extremely Compartmented Information
(ECI) and is not found in the Snowden papers, but some of them talk of recent
breakthroughs in cryptanalysis. What might these be?
The leaks do show diligent collection of the protocol messages used to set up
VPN encryption, so some cryptographers suggested in 2015 that some variant
of the “Logjam attack” is feasible for a nation-state attacker against the 1024-bit
prime used by most VPNs and many TLS connections with Diffie-Hellman key
exchange [26]. Others pointed to the involvement of NSA cryptographers in
the relevant standard, and a protocol flaw discovered later; yet others pointed
out that even with advances in number theory or protocol exploits, the NSA
has enough money to simply break 1024-bit Diffie-Hellman by brute force, and
this would be easily justified if many people used the same small number of
prime moduli – which they do [854]. I’ll discuss cryptanalysis in more detail
in Chapter 5.
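To see why a single standardised prime is such an attractive target, here is a minimal sketch of finite-field Diffie-Hellman with toy parameters (the code and numbers are purely illustrative): the heavy precomputation in a Logjam-style number-field-sieve attack depends only on the prime p, so if most VPNs share one 1024-bit p, an attacker who pays that cost once can then recover individual session keys comparatively cheaply.

# Minimal Diffie-Hellman sketch with a toy group; real systems shared a
# standardised 1024-bit prime, which is what makes precomputation pay off.
import secrets

p, g = 23, 5                            # toy parameters for illustration only

def keypair():
    x = secrets.randbelow(p - 2) + 1    # private exponent in [1, p-2]
    return x, pow(g, x, p)              # (private, public = g^x mod p)

a_priv, a_pub = keypair()
b_priv, b_pub = keypair()

# Both sides derive the same shared secret g^(ab) mod p; an attacker who can
# take discrete logs modulo this particular p can do the same for every
# session that uses it.
assert pow(b_pub, a_priv, p) == pow(a_pub, b_priv, p)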
2.2.1.8 Quantum
There is a long history of attacks on protocols, which can be spoofed, replayed
and manipulated in various ways. (We’ll discuss this topic in detail in
Chapter 4.) The best-documented NSA attack on Internet traffic goes under
the codename of Quantum and involves the dynamic exploitation of one of the
communication end-points. Thus, to tap an encrypted SSL/TLS session to a
webmail provider, the Quantum system fires a ‘shot’ that exploits the browser.
There are various flavours; in ‘Quantuminsert’, an injected packet redirects the
browser to a ‘Foxacid’ attack server. Other variants attack software updates
and the advertising networks whose code runs in mobile phone apps [1999].
2.2.1.9 CNE
Computer and Network Exploitation (CNE) is the generic NSA term for hacking, and it can be used for more than just key theft or TLS session hijacking;
it can be used to acquire access to traffic too. Operation Socialist was the
GCHQ codename for a hack of Belgium’s main telco Belgacom5 in 2010–11.
GCHQ attackers used Xkeyscore to identify three key Belgacom technical staff,
then used Quantuminsert to take over their PCs when they visited sites like
LinkedIn. The attackers then used their sysadmin privileges to install malware
on dozens of servers, including authentication servers to leverage further
access, billing servers so they could cover their tracks, and the company’s
core Cisco routers [734]. This gave them access to large quantities of mobile
5 It is now called Proximus.
roaming traffic, as Belgacom provides service to many foreign providers when
their subscribers roam in Europe. The idea that one NATO and EU member
state would conduct a cyber-attack on the critical infrastructure of another
took many by surprise. The attack also gave GCHQ access to the phone system
in the European Commission and other European institutions. Given that
these institutions make many of the laws for the UK and other member states,
this was almost as if a US state governor had got his state troopers to hack
AT&T so he could wiretap Congress and the White House.
Belgacom engineers started to suspect something was wrong in 2012, and
realised they’d been hacked in the spring of 2013; an anti-virus company found
sophisticated malware masquerading as Windows files. The story went public in September 2013, and the German news magazine Der Spiegel published
Snowden documents showing that GCHQ was responsible. After the Belgian
prosecutor reported in February 2018, we learned that the attack must have
been authorised by then UK Foreign Secretary William Hague, but there was
not enough evidence to prosecute anyone; the investigation had been hampered in all sorts of ways both technical and political; the software started
deleting itself within minutes of discovery, and institutions such as Europol
(whose head was British) refused to help. The Belgian minister responsible
for telecomms, Alexander de Croo, even suggested that Belgium’s own intelligence service might have informally given the operation a green light [735].
Europol later adopted a policy that it will help investigate hacks of ‘suspected
criminal origin’; it has nothing to say about hacks by governments.
A GCHQ slide deck on CNE explains that it’s used to support conventional
Sigint both by redirecting traffic and by “enabling” (breaking) cryptography; that it must always be “UK deniable”; and that it can also be used for
“effects”, such as degrading communications or “changing users’ passwords
on extremist website” [735]. Other papers show that the agencies frequently
target admins of phone companies and ISPs in the Middle East, Africa and
indeed worldwide – compromising a key technician is “generally the entry
ticket to the network” [1141]. As one phone company executive explained,
“The MNOs were clueless at the time about network security. Most networks
were open to their suppliers for remote maintenance with an ID and password
and the techie in China or India had no clue that their PC had been hacked”.
The hacking tools and methods used by the NSA and its allies are now fairly
well understood; some are shared with law enforcement. The Snowden papers
reveal an internal store where analysts can get a variety of tools; a series of
leaks in 2016–7 by the Shadow Brokers (thought to be Russian military intelligence, the GRU) disclosed a number of actual NSA malware samples, used by
hackers at the NSA’s Tailored Access Operations team to launch attacks [239].
(Some of these tools were repurposed by the Russians to launch the NotPetya
worm and by the North Koreans in Wannacry, as I’ll discuss later.) The best
documentation of all is probably about a separate store of goodies used by the
CIA, disclosed in some detail to Wikileaks in the ‘Vault 7’ leaks in 2017. These
include manuals for tools that can be used to install a remote access Trojan on
your machine, with components to geolocate it and to exfiltrate files (including
SSH credentials), audio and video; a tool to jump air gaps by infecting thumb
drives; a tool for infecting wifi routers so they’ll do man-in-the-middle attacks;
and even a tool for watermarking documents so a whistleblower who leaks
them could be tracked. Many of the tools are available not just for Windows
but also for macOS and Android; some infect firmware, making them hard
to remove. There are tools for hacking TVs and IoT devices too, and tools to
hamper forensic investigations. The Vault 7 documents are useful reading if
you’re curious about the specifications and manuals for modern government
malware [2023]. As an example of the law-enforcement use of such tools, in
June 2020 it emerged that the French police in Lille had since 2018 installed
malware on thousands of Android phones running EncroChat, an encrypted
messaging system favoured by criminals, leading to the arrest of 800 criminal suspects in France, the Netherlands, the UK and elsewhere, as well as the
arrest of several police officers for corruption and the seizure of several tons of
drugs [1334].
2.2.1.10 The analyst’s viewpoint
The intelligence analyst thus has a big bag of tools. If they’re trying to find the
key people in an organisation – whether the policymakers advising on a critical decision, or the lawyers involved in laundering an oligarch’s profits – they
can use the traffic data in Xkeyscore to map contact networks. There are various neat tools to help, such as ‘Cotraveler’ which flags up mobile phones that
have traveled together. We have some insight into this process from our own
research into cybercrime, where we scrape tens of millions of messages from
underground forums and analyse them to understand crime types new and
old. One might describe the process as ‘adaptive message mining’. Just as you
use adaptive text mining when you do a web search, and constantly refine your
search terms based on samples of what you find, with message mining you
also have metadata – so you can follow threads, trace actors across forums,
do clustering analysis and use various other tricks to ‘find more messages like
this one’. The ability to switch back and forth between the detailed view you
get from reading individual messages, and the statistical view you get from
analysing bulk collections, is extremely powerful.
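As a purely illustrative sketch of the 'find more messages like this one' step (the message texts, the query and the choice of scikit-learn are invented for this example, and describe neither any agency's tooling nor our actual pipeline):

# Toy example of ranking messages by similarity to a query, using TF-IDF
# cosine similarity. All message texts here are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

messages = [
    "selling fresh dumps, EU cards, good valid rate",
    "need a mule in Germany for weekly cashout",
    "anyone got a working cc checker? mine is dead",
    "cheap bulletproof hosting, abuse reports ignored",
]

vectoriser = TfidfVectorizer(stop_words="english")
X = vectoriser.fit_transform(messages)

query = vectoriser.transform(["looking for a cashout mule in the EU"])
scores = cosine_similarity(query, X)[0]

# Print the three closest messages, best match first
for score, msg in sorted(zip(scores, messages), reverse=True)[:3]:
    print(f"{score:.2f}  {msg}")

In practice the metadata (thread structure, author handles, timestamps) matters at least as much as the text; the value lies in being able to pivot between the two.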
Once the analyst moves from the hunting phase to the gathering phase,
they can use Prism to look at the targets’ accounts at Facebook, Google
and Microsoft, while Xkeyscore will let them see what websites they visit.
Traffic data analysis gives still more: despite the growing use of encryption,
the communications to and from a home reveal what app or device is used
when and for how long6. The agencies are pushing for access to end-to-end
messaging systems such as WhatsApp; in countries like the UK, Australia
and China, legislators have already authorised this, though it’s not at all clear
which US companies might comply (I’ll discuss policy in Chapter 26).
Given a high-value target, there’s a big bag of tools the analyst can install
on their laptop or cellphone directly. They can locate it physically, turn it into
a room bug and even use it as a remote camera. They can download the target’s address book and contact history and feed that into Xkeyscore to search
recursively for their direct and indirect contacts. Meanwhile the analyst can
bug messaging apps, beating the end-to-end encryption by collecting the call
contents once they’ve been decrypted. They can set up an alarm to notify them
whenever the target sends or receives messages of interest, or changes location.
The coverage is pretty complete. And when it’s time for the kill, the target’s
phone can be used to guide a bomb or a missile. Little wonder Ed Snowden
insisted that journalists interviewing him put their phones in the fridge!
Finally, the analyst also has a proxy through which they can access the Internet surreptitiously – typically a machine on a botnet. It might even be the PC
in your home office.
2.2.1.11 Offensive operations
The Director NSA also heads the US Cyber Command, which since 2009 has
been one of ten unified commands of the United States Department of Defense.
It is responsible for offensive cyber operations, of which the one that made a
real difference was Stuxnet. This was a worm designed to damage Iran’s uranium enrichment centrifuges by speeding them up and slowing them down
in patterns designed to cause mechanical damage, and was developed jointly
by the USA and Israel [326, 827]. It was technically sophisticated, using four
zero-day exploits and two stolen code-signing certificates to spread promiscuously through Windows PCs, until it found Siemens programmable logic
controllers of the type used at Iran’s Natanz enrichment plant – where it would
then install a rootkit that would issue the destructive commands, while the PC
assured the operators that everything was fine. It was apparently introduced
using USB drives to bridge the air gap to the Iranian systems, and came to
light in 2010 after copies had somehow spread to central Asia and Indonesia.
Two other varieties of malware (Flame and Duqu) were then discovered using
similar tricks and common code, performing surveillance at a number of companies in the Middle East and South Asia; more recent code-analysis tools have
traced a lineage of malware that goes back to 2002 (Flowershop) and continued
to operate until 2016 (with the Equation Group tools) [2071].
6 See for example Hill and Mattu, who wiretapped a modern smart home to measure this [902].
Stuxnet acted as a wake-up call for other governments, which rushed
to acquire ‘cyber-weapons’ and develop offensive cyber doctrine – a set of
principles for what cyber warriors might do, developed with some thought
given to rationale, strategy, tactics and legality. Oh, and the price of zero-day
vulnerabilities rose sharply.
2.2.1.12 Attack scaling
Computer scientists know the importance of how algorithms scale, and exactly
the same holds for attacks. Tapping a single mobile phone is hard. You have to
drive around behind the suspect with radio and cryptanalysis gear in your car,
risk being spotted, and hope that you manage to catch the suspect’s signal as
they roam from one cell to another. Or you can drive behind them with a false
base station7 and hope their phone will roam to it as the signal is louder than
the genuine one; but then you risk electronic detection too. Both are highly
skilled work and low-yield: you lose the signal maybe a quarter of the time.
So if you want to wiretap someone in central Paris often enough, why not just
wiretap everyone? Put antennas on your embassy roof, collect it all, write the
decrypted calls and text messages into a database, and reconstruct the sessions
electronically. If you want to hack everyone in France, hack the telco, perhaps
by subverting the equipment it uses. At each stage the capital cost goes up but
the marginal cost of each tap goes down. The Five Eyes strategy is essentially to
collect everything in the world; it might cost billions to establish and maintain
the infrastructure, but once it’s there you have everything.
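The economics can be made explicit with a toy cost model; every number below is hypothetical, chosen only to show the shape of the trade-off.

# Toy model of cost per target as fixed infrastructure cost is amortised.
# Every number here is hypothetical, for illustration only.
def cost_per_tap(fixed_cost, marginal_cost, n_taps):
    return fixed_cost / n_taps + marginal_cost

# Follow-one-suspect approach: little infrastructure, skilled labour per tap
print(cost_per_tap(fixed_cost=1e5, marginal_cost=5e4, n_taps=10))          # 60,000.0 per target
# Collect-it-all approach: huge sunk cost, then each extra tap is nearly free
print(cost_per_tap(fixed_cost=2e9, marginal_cost=10, n_taps=50_000_000))   # 50.0 per target

Once the fixed cost is sunk, each additional target costs almost nothing, which is the collect-everything logic in a nutshell.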
The same applies to offensive cyber operations, which are rather like sabotage. In wartime, you can send commandos to blow up an enemy radar station;
but if you do it more than once or twice, your lads will start to run into a
lot of sentries. So we scale kinetic attacks differently: by building hundreds
of bomber aircraft, or artillery pieces, or (nowadays) thousands of drones. So
how do you scale a cyber attack to take down not just one power station, but the
opponent’s whole power grid? The Five Eyes approach is this. Just as Google
keeps a copy of the Internet on a few thousand servers, with all the content and
links indexed, US Cyber Command keeps a copy of the Internet that indexes
what version of software all the machines in the world are using – the Mugshot
system mentioned above – so a Five Eyes cyber warrior can instantly see which
targets can be taken over by which exploits.
A key question for competitor states, therefore, is not just to what extent they
can create some electronic spaces that are generally off-limits to the Five Eyes.
It’s the extent to which they can scale up their own intelligence and offensive
capabilities rather than having to rely on America. The number of scans and
7 These devices are known in the USA as a Stingray and in Europe as an IMSI-catcher; they conduct a man-in-the-middle attack of the kind we'll discuss in detail in section 22.3.1.
probes that we see online indicates that the NSA are not alone in trying to build
cyber weapons that scale. Not all of them might be nation states; some might
simply be arms vendors or mercenaries. This raises a host of policy problems
to which we’ll return in Part 3. For now we’ll continue to look at capabilities.
2.2.2 China
China is now the leading competitor to the USA, being second not just in terms
of GDP but as a technology powerhouse. The Chinese lack the NSA’s network of alliances and access to global infrastructure (although they’re working
hard at that). Within China itself, however, they demand unrestricted access to
local data. Some US service firms used to operate there, but trouble followed.
After Yahoo’s systems were used to trap the dissident Wang Xiaoning in 2002,
Alibaba took over Yahoo’s China operation in 2005; but there was still a row
when Wang’s wife sued Yahoo in US courts in 2007, and showed that Yahoo had
misled Congress over the matter [1764]. In 2008, it emerged that the version of
Skype available in China had been modified so that messages were scanned for
sensitive keywords and, if they were found, the user’s texts were uploaded to a
server in China [1963]. In December 2009, Google discovered a Chinese attack
on its corporate infrastructure, which became known as Operation Aurora;
Chinese agents had hacked into the Google systems used to do wiretaps for
the FBI (see Prism above) in order to discover which of their own agents in
the USA were under surveillance. Google had already suffered criticism for
operating a censored version of their search engine for Chinese users, and a
few months later, they pulled out of China. By this time, Facebook, Twitter and
YouTube had already been blocked. A Chinese strategy was emerging of total
domestic control, augmented by ever-more aggressive collection overseas.
From about 2002, there had been a series of hacking attacks on US and UK
defence agencies and contractors, codenamed ‘Titan Rain’ and ascribed to the
Chinese armed forces. According to a 2004 study by the US Foreign Military
Studies Office (FMSO), Chinese military doctrine sees the country in a state of
war with the West; we are continuing the Cold War by attacking China, trying
to overthrow its communist regime by exporting subversive ideas to it over
the Internet [1884]. Chinese leaders see US service firms, news websites and
anonymity tools such as Tor (which the State Department funds so that Chinese
and other people can defeat censorship) as being of one fabric with the US
surveillance satellites and aircraft that observe their military defences. Yahoo
and Google were thus seen as fair game, just like Lockheed Martin and BAe.
Our own group’s first contact with the Chinese came in 2008. We were asked
for help by the Dalai Lama, who had realised that the Chinese had hacked
his office systems in the run-up to the Beijing Olympics that year. One of my
research students, Shishir Nagaraja, happened to be in Delhi waiting for his UK
visa to be renewed, so he volunteered to go up to the Tibetan HQ in Dharamsala
and run some forensics. He found that about 35 of the 50 PCs in the office of the
Tibetan government in exile had been hacked; information was being siphoned
off to China, to IP addresses located near the three organs of Chinese state security charged with different aspects of Tibetan affairs. The attackers appear to
have got in by sending one of the monks an email that seemed to come from a
colleague; when he clicked on the attached PDF, it had a JavaScript buffer overflow that used a vulnerability in Adobe Reader to take over his machine. This
technique is called phishing, as it works by offering a lure that someone bites on;
when it’s aimed at a specific individual (as in this case) it’s called spear phishing.
They then compromised the Tibetans’ mail server, so that whenever one person in the office sent a .pdf file to another, it would arrive with an embedded
attack. The mail server itself was in California.
This is pretty sobering, when you stop to think about it. You get an email from
a colleague sitting ten feet away, you ask him if he just sent it – and when he
says yes, you click on the attachment. And your machine is suddenly infected
by a server that you rent ten thousand miles away in a friendly country. We
wrote this up in a tech report on the ‘Snooping Dragon’ [1376]. After it came
out, we had to deal for a while with attacks on our equipment, and heckling
at conference talks by Chinese people who claimed we had no evidence to
attribute the attacks to their government. Colleagues at the Open Net Initiative in Toronto followed through, and eventually found from analysis of the
hacking tools’ dashboard that the same espionage network had targeted 1,295
computers in 103 countries [1225] – ranging from the Indian embassy in Washington through Associated Press in New York to the ministries of foreign affairs
in Thailand, Iran and Laos.
There followed a series of further reports of Chinese state hacking, from a
complex dispute with Rio Tinto in 2009 over the price of iron ore to a hack of
the Melbourne International Film Festival in the same year, when it showed a
film about a Uighur leader [1902]. In 2011, the Chinese hacked the CIA’s covert
communications system, after the Iranians had traced it, and executed about 30
agents – though that did not become publicly known till later [578]. The first
flashbulb moment was a leaked Pentagon report in 2013 that Chinese hackers had stolen some of the secrets of the F35 joint strike fighter, as well as a
series of other weapon systems [1381]. Meanwhile China and Hong Kong were
accounting for over 80% of all counterfeit goods seized at US ports. The Obama
administration vowed to make investigations and prosecutions in the theft of
trade secrets a top priority, and the following year five members of the People’s
Liberation Army were indicted in absentia.
The White House felt compelled to act once more after the June 2015 news
that the Chinese had hacked the Office of Personnel Management (OPM),
getting access to highly personal data on 22 million current and former federal
employees, ranging from fingerprints to sensitive information from security
clearance interviews. Staff applying for Top Secret clearances are ordered to
divulge all information that could be used to blackmail them, from teenage
drug use to closeted gay relationships. All sexual partners in the past five years
have to be declared for a normal Top Secret clearance; for a Strap clearance (to
deal with signals intelligence material) the candidate even has to report any
foreigners they meet regularly at their church. So this leak affected more than
just 22 million people. Officially, this invasive data collection is to mitigate
the risk that intelligence agency staff can be blackmailed. (Cynics supposed it
was also so that whistleblowers could be discredited.) Whatever the motives,
putting all such information in one place was beyond stupid; it was a real
‘database of ruin’. For the Chinese to get all the compromising information on
every American with a sensitive government job was jaw-dropping. (Britain
screwed up too; in 2008, a navy officer lost a laptop containing the personal
data of 600,000 people who had joined the Royal Navy, or tried to [1074].) At
a summit in September that year, Presidents Obama and Xi agreed to refrain
from computer-enabled theft of intellectual property for commercial gain8.
Nothing was said in public though about military secrets – or the sex lives of
federal agents.
The Chinese attacks of the 2000s used smart people plus simple tools; the
attacks on the Tibetans used Russian crimeware as the remote access Trojans.
The state also co-opted groups of ‘patriotic hackers’, or perhaps used them
for deniability; some analysts noted waves of naïve attacks on western firms
that were correlated with Chinese university terms, and wondered whether
students had been tasked to hack as coursework. The UK police and security
service warned UK firms in 2007. By 2009, multiple Chinese probes had
been reported on US electricity firms, and by 2010, Chinese spear-phishing
attacks had been reported on government targets in the USA, Poland and
Belgium [1306]. As with the Tibetan attacks, these typically used crude tools
and had such poor operational security that it was fairly clear where they
came from.
By 2020 the attacks had become more sophisticated, with a series of advanced
persistent threats (APTs) tracked by threat intelligence firms. A campaign
to hack the phones of Uighurs involved multiple zero-day attacks, even on
iPhones, that were delivered via compromised Uighur websites [395]; this
targeted not only Uighurs in China but the diaspora too. China also conducts
industrial and commercial espionage, and Western agencies claim they exploit
8 The Chinese have kept their promise; according to US firms doing business in China, IP is now sixth on the list of concerns, down from second in 2014 [704]. In any case, the phrase 'IP theft' was always a simplification, used to conflate the theft of classified information from defence contractors with the larger issue of compelled technology transfer by other firms who wanted access to Chinese markets and the side-issue of counterfeiting.
managed service providers9. Another approach was attacking software
supply chains; a Chinese group variously called Wicked Panda or Barium
compromised software updates from computer maker Asus, a PC cleanup
tool and a Korean remote management tool, as well as three popular computer
games, getting its malware installed on millions of machines; rather than
launching banking trojans or ransomware, it was then used for spying [811].
Just as in GCHQ’s Operation Socialist, such indirect strategies give a way to
scale attacks in territory where you’re not the sovereign. And China was also
playing the Socialist game: it came out in 2019 that someone had hacked at
least ten western mobile phone companies over the previous seven years and
exfiltrated call data records – and that the perpetrators appeared to be the
APT10 gang, linked to the Chinese military [2021].
Since 2018 there has been a political row over whether Chinese firms should
be permitted to sell routers and 5G network hardware in NATO countries, with
the Trump administration blacklisting Huawei in May 2019. There had been
a previous spat over another Chinese firm, ZTE; in 2018 GCHQ warned that
ZTE equipment “would present risk to UK national security that could not
be mitigated effectively or practicably” [1477]10. President Trump banned ZTE
for breaking sanctions on North Korea and Iran, but relented and allowed its
equipment back in the USA subject to security controls11.
The security controls route had been tried with Huawei, which set up a
centre in Oxfordshire in 2010 where GCHQ could study its software as a
condition of the company’s being allowed to sell in the UK. While the analysts
did not find any backdoors, their 2019 report surfaced some scathing criticisms
of Huawei’s software engineering practices [933]. Huawei had copied a lot
of code, couldn’t patch what they didn’t understand, and no progress was
being made in tackling many problems despite years of promises. There
was an unmanageable number of versions of OpenSSL, including versions
that had known vulnerabilities and that were not supported: 70 full copies
of 4 different OpenSSL versions, and 304 partial copies of 14 versions. Not
only could the Chinese hack the Huawei systems; so could anybody. Their
equipment had been excluded for some years from UK backbone routers and
from systems used for wiretapping. The UK demanded “sustained evidence
of improvement across multiple versions and multiple product ranges” before
9 This became public in 2019 with the claim that they had hacked Wipro and used this to compromise their customers [1095]; but it later emerged that Wipro had been hacked by a crime gang operating for profit.
10 The only router vendor to have actually been caught with a malicious backdoor in its code is the US company Juniper, which not only used the NSA's Dual-EC backdoor to make VPN traffic exploitable, but did it in such a clumsy way that others could exploit it too – and at least one other party did so [415].
11 This was done as a favour to President Xi, according to former National Security Adviser John Bolton, who declared himself 'appalled' that the president would interfere in a criminal prosecution [157].
it will put any more trust in it. A number of countries, including Australia and
New Zealand, then banned Huawei equipment outright, and in 2019 Canada
arrested Huawei’s CFO (who is also its founder’s daughter) following a US
request to extradite her for conspiring to defraud global banks about Huawei’s
relationship with a company operating in Iran. China retaliated by arresting
two Canadians, one a diplomat on leave, on spurious espionage charges, and
by sentencing two others to death on drugs charges. The USA hit back with
a ban on US suppliers selling chips, software or support to Huawei. The UK
banned the purchase of their telecomms equipment from the end of 2020 and
said it would remove it from UK networks by 2027. Meanwhile, China is helping many less developed countries modernise their networks, and this access
may help them rival the Five Eyes’ scope in due course. Trade policy, industrial
policy and cyber-defence strategy have become intertwined in a new Cold War.
Strategically, the question may not be just whether China could use Huawei
routers to wiretap other countries at scale, so much as whether they could use
it in time of tension to launch DDoS attacks that would break the Internet by
subverting BGP routing. I discuss this in more detail in section 21.2.1. For
years, China’s doctrine of ‘Peaceful Rise’ meant avoiding conflict with other
major powers until they’re strong enough. The overall posture is one of largely
defensive information warfare, combining pervasive surveillance at home, a
walled-garden domestic Internet that is better defended against cyber-attack
than anyone else’s, plus considerable and growing capabilities, which are
mainly used for diligent intelligence-gathering in support of national strategic
interests. They are starting to bully other countries in various ways that
sometimes involve online operations. In 2016, during a dispute with Vietnam
over some islands in the South China Sea, they hacked the airport systems
in Hanoi and Ho Chi Minh City, displaying insulting messages and forcing
manual check-in for passengers [1197]. In 2020, the EU has denounced China
for spreading disruptive fake news about the coronavirus pandemic [1580],
and Australia has denounced cyber-attacks that have happened since it
called for an international inquiry into the pandemic’s origins [937]. These
information operations displayed a first-class overt and covert disinformation
capability and followed previous more limited campaigns in Hong Kong
and Taiwan [564]. Diplomatic commentators note that China’s trade policy,
although aggressive, is no different from Japan’s in the 1970s and not as
aggressive as America’s; that the new Cold War is just as misguided and
just as likely to be wasteful and dangerous as the last one; that China still
upholds the international order more than it disrupts it; and that it upholds it
more consistently than the USA has done since WWII [704]. China’s external
propaganda aim is to present itself as a positive socio-economic role model
for the world, as it competes for access and influence and emerges as a peer
competitor to the USA and Europe.
2.2.3 Russia
Russia, like China, lacks America’s platform advantage and compensates with
hacking teams that use spear-phishing and malware. Unlike China, it takes
the low road, acting frequently as a spoiler, trying to disrupt the international
order, and sometimes benefiting directly via a rise in the price of oil, its main
export. The historian Timothy Snyder describes Putin’s rise to power and
his embrace of oligarchs, orthodox Christianity, homophobia and the fascist
ideologue Ivan Ilyin, especially since rigged elections in 2012. This leaves
the Russian state in need of perpetual struggle against external enemies who
threaten the purity of the Russian people [1802]. Its strategic posture online
is different from China’s in four ways. First, it’s a major centre for cybercrime;
underground markets first emerged in Russia and Ukraine in 2003–5, as we’ll
discuss in the following section on cybercrime. Second, although Russia is
trying to become more closed like China, its domestic Internet is relatively
open and intertwined with the West’s, including major service firms such as
VK and Yandex [605]. Third, Russia’s strategy of re-establishing itself as a
regional power has been pursued much more aggressively than China’s, with
direct military interference in neighbours such as Georgia and Ukraine. These
interventions have involved a mixed strategy of cyber-attacks plus ‘little green
men’ – troops without Russian insignia on their uniforms – with a political
strategy of denial. Fourth, Russia was humiliated by the USA and Europe
when the Soviet bloc collapsed in 1989–91, and still feels encircled. Since about 2005
its goal has been to undermine the USA and the EU, and to promote authoritarianism and nationalism as an alternative to the rules-based international
order. This has been pursued more forcefully since 2013; Snyder tells the history [1802]. With Brexit, and with the emergence of authoritarian governments
in Hungary, Turkey and Poland, this strategy appears to be winning.
Russian cyber-attacks came to prominence in 2007, after Estonia moved
a much-hated Soviet-era statue in Tallinn to a less prominent site, and the
Russians felt insulted. DDoS attacks on government offices, banks and media
companies forced Estonia to rate-limit its external Internet access for a few
weeks [692]. Russia refused to extradite the perpetrators, most of whom were
Russian, though one ethnic-Russian Estonian teenager was fined. Sceptics
said that the attacks seemed the work of amateurs and worked because the
Estonians hadn’t hardened their systems the way US service providers do.
Estonia nonetheless appealed to NATO for help, and one outcome was the
Tallinn Manual, which sets out the law of cyber conflict [1667]. I’ll discuss
this in more detail in the chapter on electronic and information warfare, in
section 23.8. The following year, after the outbreak of a brief war between
Russia and Georgia, Russian hackers set up a website with a list of targets in
Georgia for Russian patriots to attack [1994].
Estonia and Georgia were little more than warm-ups for the Ukraine invasion. Following demonstrations in Maidan Square in Kiev against pro-Russian
President Yanukovich, and an intervention in February 2014 by Russian
mercenaries who shot about a hundred demonstrators, Yanukovich fled. The
Russians invaded Ukraine on February 24th, annexing Crimea and setting
up two puppet states in the Donbass area of eastern Ukraine. Their tactics
combined Russian special forces in plain uniforms, a welter of propaganda
claims of an insurgency by Russian-speaking Ukrainians or of Russia helping
defend the population against Ukrainian fascists or of defending Russian
purity against homosexuals and Jews; all of this coordinated with a variety
of cyber-attacks. For example, in May the Russians hacked the website of
the Ukrainian election commission and rigged it to display a message that a
nationalist who’d received less than 1% of the vote had won; this was spotted
and blocked, but Russian media announced the bogus result anyway [1802].
The following year, as the conflict dragged on, Russia took down 30 electricity substations on three different distribution systems within half an hour
of each other, leaving 230,000 people without electricity for several hours.
The attacks involved multiple attack vectors that had been implanted over
a period of months, and since they followed a Ukrainian attack on power
distribution in Crimea – and switched equipment off when they could have
destroyed it instead – seemed to have been intended as a warning [2070]. This
attack was still tiny compared with the other effects of the conflict, which
included the shooting down of a Malaysian Airlines airliner with the loss
of all on board; but it was the first cyber-attack to disrupt mains electricity.
Finally on June 27 2017 came the NotPetya attack – by far the most damaging
cyber-attack to date [814].
The NotPetya worm was initially distributed using the update service
for MeDoc, the accounting software used by the great majority of Ukrainian
businesses. It then spread laterally in organisations across Windows file-shares
using the EternalBlue vulnerability, an NSA exploit with an interesting history.
From March 2016, a Chinese gang started using it against targets in Vietnam,
Hong Kong and the Philippines, perhaps as a result of finding and reverse
engineering it (it’s said that you don’t launch a cyberweapon; you share it).
It was leaked by a gang called the ‘Shadow Brokers’ in April 2017, along
with other NSA software that the Chinese didn’t deploy, and then used by
the Russians in June. The NotPetya worm used EternalBlue together with the
Mimikatz tool that recovers passwords from Windows memory. The worm’s
payload pretended to be ransomware; it encrypted the infected computer’s
hard disk and demanded a ransom of $300 in bitcoin. But there was no
mechanism to decrypt the files of computer owners who paid the ransom, so
it was really a destructive service-denial worm. The only way to deal with it
was to re-install the operating system and restore files from backup.
The NotPetya attack took down banks, telcos and even the radiation monitoring systems at the former Chernobyl nuclear plant. What’s more, it spread
from Ukraine to international firms who had offices there. The world’s largest
container shipping company, Maersk, had to replace most of its computers and
compensate customers for late shipments, at a cost of $300m; FedEx also lost
$300m, and Mondelez $100m. Mondelez’ insurers refused to pay out on the
ground that it was an ‘Act of War’, as the governments of Ukraine, the USA and
the UK all attributed NotPetya to Russian military intelligence, the GRU [1234].
2016 was marked by the Brexit referendum in the UK and the election of
President Trump in the USA, in both of which there was substantial Russian
interference. In the former, the main intervention was financial support
for the leave campaigns, which were later found to have broken the law by
spending too much [1267]; this was backed by intensive campaigning on social
media [365]. In the latter, Russian interference was denounced by President
Obama during the campaign, leading to renewed economic sanctions, and by
the US intelligence community afterwards. An inquiry by former FBI director
Robert Mueller found that Russia interfered very widely via the disinformation and social media campaigns run by its Internet Research Agency ‘troll
farm’, and by the GRU which hacked the emails of the Democratic national
and campaign committees, most notably those of the Clinton campaign chair
John Podesta. Some Trump associates went to jail for various offences.
As I’ll discuss in section 26.4.2, it’s hard to assess the effects of such interventions. On the one hand, a report to the US Senate’s Committee on Foreign
Relations sets out a story of a persistent Russian policy, since Putin came to
power, to undermine the influence of democratic states and the rules-based
international order, promoting authoritarian governments of both left and
right, and causing trouble where it can. It notes that European countries
use broad defensive measures including bipartisan agreements on electoral
conduct and raising media literacy among voters; it recommends that these
be adopted in the USA as well [387]. On the other hand, Yochai Benkler
cautions Democrats against believing that Trump’s election was all Russia’s
fault; the roots of popular disaffection with the political elite are much older
and deeper [228]. Russia’s information war with the West predates Putin;
it continues the old USSR’s strategy of weakening the West by fomenting
conflict via a variety of national liberation movements and terrorist groups
(I discuss the information-warfare aspects in section 23.8.3). Timothy Snyder
places this all in the context of modern Russian history and politics [1802]; his
analysis also outlines the playbook for disruptive information warfare against
a democracy. It’s not just about hacking substations, but about hacking voters’
minds; about undermining trust in institutions and even in facts, exploiting
social media and recasting politics as showbusiness. Putin is a judo player;
judo’s about using an opponent’s strength and momentum to trip them up.
2.2.4 The rest
The rest of the world’s governments have quite a range of cyber capabilities,
but common themes, including the nature and source of their tools. Middle
Eastern governments were badly shaken by the Arab Spring uprisings, and
some even turned off the Internet for a while, such as Libya in April–July
2011, when rebels were using Google maps to generate target files for US, UK
and French warplanes. Since then, Arab states have developed strategies that
combine spyware and hacking against high-profile targets, through troll farms
pumping out abusive comments in public fora, with physical coercion.
The operations of the United Arab Emirates were described in 2019 by a
whistleblower, Lori Stroud [248]. An NSA analyst – and Ed Snowden’s former
boss – she was headhunted by a Maryland contractor in 2014 to work in Dubai
as a mercenary, but left after the UAE’s operations started to target Americans.
The UAE’s main technique was spear-phishing with Windows malware, but
their most effective tool, called Karma, enabled them to hack the iPhones of
foreign statesmen and local dissidents. They also targeted foreigners critical of
the regime. In one case they social-engineered a UK grad student into installing
spyware on his PC on the pretext that it would make his communications hard
to trace. The intelligence team consisted of several dozen people, both mercenaries and Emiratis, in a large villa in Dubai. The use of iPhone malware by the
UAE government was documented by independent observers [1221].
In 2018, the government of Saudi Arabia murdered the Washington Post
journalist Jamal Khashoggi in its consulate in Istanbul. The Post campaigned
to expose Saudi crown prince Mohammed bin Salman as the man who gave
the order, and in January 2019 the National Enquirer published a special
edition containing texts showing that the Post’s owner Jeff Bezos was having
an affair. Bezos pre-empted the Enquirer by announcing that he and his wife
were divorcing, and hired an investigator to find the source of the leak. The
Enquirer had attempted to blackmail Bezos over some photos it had also
obtained; it wanted both him and the investigator to declare that the paper
hadn’t relied upon ‘any form of electronic eavesdropping or hacking in their
news-gathering process’. Bezos went public instead. According to the investigator, his iPhone had been hacked by the Saudi Arabian government [200]; the
malicious WhatsApp message that did the damage was sent from the phone of
the Crown Prince himself [1055]. The US Justice Department later charged two
former Twitter employees with spying, by disclosing to the Saudis personal
account information of people who criticised their government [1502].
An even more unpleasant example is Syria, where the industrialisation
of brutality is a third approach to scaling information collection. Malware
attacks on dissidents were reported from 2012, and initially used a variety
of spear-phishing lures. As the civil war got underway, police who were
arresting suspects would threaten female family members with rape on the
spot unless the suspect disclosed his passwords for mail and social media.
They would then spear-phish all his contacts while he was being taken away
in the van to the torture chamber. This victim-based approach to attack scaling
resulted in the compromise of many machines not just in Syria but in America
and Europe. The campaigns became steadily more sophisticated as the war
evolved, with false-flag attacks, yet retained a brutal edge with some tools
displaying beheading videos [737].
Thanks to John Scott-Railton and colleagues at Toronto, we have many
further documented examples of online surveillance, computer malware and
phone exploits being used to target dissidents; many in Middle Eastern and
African countries but also in Mexico and indeed in Hungary [1221]. The
real issue here is the ecosystem of companies, mostly in the USA, Europe
and Israel, that supply hacking tools to unsavoury states. These tools range
from phone malware, through mass-surveillance tools you use on your own
network against your own dissidents, to tools that enable you to track and
eavesdrop on phones overseas by abusing the signaling system [489]. These
tools are used by dictators to track and monitor their enemies in the USA
and Europe.
NGOs have made attempts to push back on this cyber arms trade. In
one case NGOs argued that the Syrian government’s ability to purchase
mass-surveillance equipment from the German subsidiary of a UK company
should be subject to export control, but the UK authorities were unwilling to
block it. GCHQ was determined that if there were going to be bulk surveillance
devices on President Assad’s network, they should be British devices rather
than Ukrainian ones. (I describe this in more detail later in section 26.2.8.) So
the ethical issues around conventional arms sales persist in the age of cyber;
indeed they can be worse because these tools are used against Americans,
Brits and others who are sitting at home but who are unlucky enough to be
on the contact list of someone an unpleasant government doesn’t like. In the
old days, selling weapons to a far-off dictator didn’t put your own residents
in harm’s way; but cyber weapons can have global effects.
Having been isolated for years by sanctions, Iran has developed an indigenous cyber capability, drawing on local hacker forums. Like Syria, its main
focus is on intelligence operations, particularly against dissident Iranians,
both at home and overseas. It has also been the target of US and other attacks
of which the best known was Stuxnet, after which it traced the CIA’s covert
communications network and rounded up a number of agents [578]. It has
launched both espionage operations and attacks of its own overseas. An
example of the former was its hack of the DigiNotar CA in the Netherlands,
which enabled it to monitor dissidents’ Gmail; an example of the latter was its Shamoon malware, which
damaged thousands of PCs at Aramco, Saudi Arabia’s national oil company.
The history of Iranian cyber capabilities is told by Collin Anderson and Karim
Sadjadpour [50]. Most recently, it attacked Israeli water treatment plants in
April 2020; Israel responded the following month with an attack on the Iranian
port of Bandar Abbas [230].
Finally, it’s worth mentioning North Korea. In 2014, after Sony Pictures
started working on a comedy about a plot to assassinate the North Korean
leader, a hacker group trashed much of Sony’s infrastructure, released embarrassing emails that caused its top film executive Amy Pascal to resign, and
leaked some unreleased films. This was followed by threats of terrorist attacks
on movie theatres if the comedy were put on general release. The company
put the film on limited release, but when President Obama criticised them for
giving in to North Korean blackmail, they put it on full release instead.
In 2017, North Korea again came to attention after their Wannacry worm
infected over 200,000 computers worldwide, encrypting data and demanding
a bitcoin ransom – though like NotPetya it didn’t have a means of selective
decryption, so was really just a destructive worm. It used the NSA EternalBlue vulnerability, like NotPetya, but was stopped when a malware researcher
discovered a kill switch. In the meantime it had disrupted production at carmakers Nissan and Renault and at the Taiwanese chip foundry TSMC, and
also caused several hospitals in Britain’s National Health Service to close their
accident and emergency units. In 2018, the US Department of Justice unsealed
an indictment of a North Korean government hacker for both incidents, and
also for a series of electronic bank robberies, including the theft of $81m from
Bangladesh Bank [1656]. In 2019, North Korean agents were further blamed, in a
leaked United Nations report, for the theft of over $1bn from cryptocurrency
exchanges [348].
2.2.5 Attribution
It’s often said that cyber is different, because attribution is hard. As a general
proposition this is untrue; anonymity online is much harder than you think.
Even smart people make mistakes in operational security that give them away,
and threat intelligence companies have compiled a lot of data that enable
them to attribute even false-flag operations with reasonable probability in
many cases [181]. Yet sometimes it may be true, and people still point to the
Climategate affair. Several weeks before the 2009 Copenhagen summit on
climate change, someone published over a thousand emails, mostly sent to or
from four climate scientists at the University of East Anglia, England. Climate
sceptics seized on some of them, which discussed how to best present evidence
of global warming, as evidence of a global conspiracy. Official inquiries later
established that the emails had been quoted out of context, but the damage
had been done. People wonder whether the perpetrator could have been the
Russians or the Saudis or even an energy company. However, one of the more
convincing analyses suggests that it was an internal leak, or even an accident;
only one archive file was leaked, and its filename (FOIA2009.zip) suggests it
may have been prepared for a freedom-of-information disclosure in any case.
The really interesting thing here may be how the emails were talked up into a
conspiracy theory.
Another possible state action was the Equifax hack. The initial story was that
on 8th March 2017, Apache warned of a vulnerability in Apache Struts and
issued a patch; two days later, a gang started looking for vulnerable systems;
on May 13th, they found that Equifax’s dispute portal had not been patched,
and got in. The later story, in litigation, was that Equifax had used the default
username and password ‘admin’ for the portal [354]. Either way, the breach had
been preventable; the intruders found a plaintext password file giving access
to 51 internal database systems, and spent 76 days helping themselves to the
personal information of at least 145.5 million Americans before the intrusion
was reported on July 29th and access blocked the following day. Executives
sold stock before they notified the public on September 7th; Congress was outraged, and the CEO Rick Smith was fired. So far, so ordinary. But no criminal
use has been made of any of the stolen information, which led analysts at the
time to suspect that the perpetrator was a nation-state actor seeking personal
data on Americans at scale [1446]; in due course, four members of the Chinese
military were indicted for it [552].
In any case, the worlds of intelligence and crime have long been entangled,
and in the cyber age they seem to be getting more so. We turn to cybercrime next.
2.3 Crooks
Cybercrime is now about half of all crime, both by volume and by value,
at least in developed countries. Whether it is slightly more or less than half
depends on definitions (do you include tax fraud now that tax returns are
filed online?) and on the questions you ask (do you count harassment and
cyber-bullying?) – but even with narrow definitions, it’s still almost half. Yet
the world’s law-enforcement agencies typically spend less than one percent of
their budgets on fighting it. Until recently, police forces in most jurisdictions
did their best to ignore it; in the USA, it was dismissed as ‘identity theft’ and
counted separately, while in the UK victims were told to complain to their
bank instead of the police from 2005–15. The result was that as crime went
online, like everything else, the online component wasn’t counted and crime
appeared to fall. Eventually, though, the truth emerged in those countries that
have started to ask about fraud in regular victimisation surveys12.
12 The USA, the UK, Australia, Belgium and France.
Colleagues and I run the Cambridge Cybercrime Centre where we collect
and curate data for other researchers to use, ranging from spam and phish
through malware and botnet command-and-control traffic to collections of
posts to underground crime forums. This section draws on a survey we did in
2019 of the costs of cybercrime and how they’ve been changing over time [92].
Computer fraud has been around since the 1960s, a notable early case being
the Equity Funding insurance company which from 1964-72 created more than
60,000 bogus policies which it sold to reinsurers, creating a special computer
system to keep track of them all. Electronic frauds against payment systems
have been around since the 1980s, and spam arrived when the Internet was
opened to all in the 1990s. Yet early scams were mostly a cottage industry,
where individuals or small groups collected credit card numbers, then forged
cards to use in shops, or used card numbers to get mail-order goods. Modern cybercrime can probably be dated to 2003–5 when underground markets
emerged that enabled crooks to specialise and get good at their jobs, just as
happened in the real economy with the Industrial Revolution.
To make sense of cybercrime, it’s convenient to consider the shared infrastructure first, and then the main types of cybercrime that are conducted for
profit. There is a significant overlap with the crimes committed by states that
we considered in the last section, and those committed by individuals against
other individuals that we’ll consider in the next one; but the actors’ motives
are a useful primary filter.
2.3.1 Criminal infrastructure
Since about 2005, the emergence of underground markets has led to people
specialising as providers of criminal infrastructure, most notably botnet
herders, malware writers, spam senders and cashout operators. I will discuss
the technology in much greater detail in section 21.3; in this section my focus
is on the actors and the ecosystem in which they operate. Although this
ecosystem consists of perhaps a few thousand people with revenues in the
tens to low hundreds of millions, they impose costs of many billions on the
industry and on society. Now that cybercrime has been industrialised, the
majority of ‘jobs’ are in boring roles such as customer support and system
administration, including all the tedious setup work involved in evading
law enforcement takedowns [456]. The ‘firms’ they work for specialise; the
entrepreneurs and technical specialists can make real money. (What’s more,
the cybercrime industry has been booming during the coronavirus pandemic.)
2.3.1.1 Botnet herders
The first botnets – networks of compromised computers – may have been seen
in 1996 with an attack on the ISP Panix in New York, using compromised Unix
machines in hospitals to conduct a SYN flood attack [370]. The next use was
spam, and by 2000 the Earthlink spammer sent over a million phishing emails,
and was sued by Earthlink. Once cyber-criminals started to get organised, there was a significant scale-up. We started to see professionally built and
maintained botnets that could be rented out to bad guys, whether spammers,
phishermen or others; by 2007 the Cutwail botnet was sending over 50 million
spams a minute from over a million infected machines [1836]. Bots would initially contact a command-and-control server for instructions; these would be
taken down, or taken over by threat intelligence companies for use as sinkholes
to monitor infected machines, and to feed lists of them to ISPs and corporates.
The spammers’ first response was peer-to-peer botnets. In 2007 Storm suddenly grew to account for 8% of all Windows malware; it infected machines
mostly via malicious email attachments and had them use the eDonkey
peer-to-peer network to find other infected machines. It was used not just
for spam but for DDoS, for pump-and-dump stock scams and for harvesting
bank credentials. Defenders got lots of peers to join this network to harvest
lists of bot addresses, so the bots could be cleaned up, and by late 2008 Storm
had been cut to a tenth of the size. It was followed by Kelihos, a similar
botnet that also stole bitcoins; its creator, a Russian national, was arrested
while on holiday in Spain in 2017 and extradited to the USA where he pled
guilty in 2018 [661].
The next criminal innovation arrived with the Conficker botnet: the domain
generation algorithm (DGA). Conficker was a worm that spread by exploiting a Windows network service vulnerability; it generated 250 domain names
every day, and infected machines would try them all out in the hope that the
botmaster had managed to rent one of them. Defenders started out by simply
buying up the domains, but a later variant generated 50,000 domains a day
and an industry working group made agreements with registrars that these
domains would simply be put beyond use. By 2009 Conficker had grown so
large, with maybe ten million machines, that it was felt to pose a threat to the
largest websites and perhaps even to nation states. As with Storm, its use of
randomisation proved to be a two-edged sword; defenders could sit on a subset of the domains and harvest feeds of infected machines. By 2015 the number
of infected machines had fallen to under a million.
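To make the mechanism concrete, here is a minimal sketch of a date-seeded domain generation algorithm in Python. It is a hypothetical illustration, not Conficker's actual algorithm: the seed, hash and parameters are invented, and the point is simply that bot and botmaster can derive the same daily list of candidate domains from a shared seed – and that anyone who reverse-engineers the code can precompute that list.

    # Hypothetical DGA sketch (not any real botnet's algorithm): bot and
    # botmaster derive the same daily list of candidate rendezvous domains
    # from the current date and a shared seed.
    import hashlib
    from datetime import date, timedelta

    TLDS = ['.com', '.net', '.org', '.info', '.biz']

    def daily_domains(day, seed=b'example-seed', count=250):
        domains = []
        for i in range(count):
            material = seed + day.isoformat().encode() + i.to_bytes(2, 'big')
            digest = hashlib.sha256(material).hexdigest()
            domains.append(digest[:12] + TLDS[i % len(TLDS)])
        return domains

    # A bot tries today's list; a defender who has the code can precompute
    # tomorrow's list and register or sinkhole a subset in advance.
    print(daily_domains(date.today())[:3])
    print(daily_domains(date.today() + timedelta(days=1))[:3])

Real DGAs vary the seed, the hash, the domain count and the TLD list, but the determinism that lets the bots find their master is exactly what lets defenders sit on a subset of the domains.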
Regardless of whether something can be done to take out the command-and-control system, whether by arresting the botmaster or by technical tricks, the
universal fix for botnet infections is to clean up infected machines. But this
raises many issues of scale and incentives. While AV companies make tools
available, and Microsoft supplies patches, many people don’t use them. So long
as your infected PC is merely sending occasional spam but works well enough
otherwise, why should you go to the trouble of doing anything? But bandwidth
costs ISPs money, so the next step was that some ISPs, particularly the cable
companies like Comcast, would identify infected machines and confine their
users to a ‘walled garden’ until they promised to clean up. By 2019 that had
become less common as people now have all sorts of devices on their wifi,
many of which have no user interface; communicating with human users has
become harder.
In 2020, we find many botnets with a few tens of thousands of machines that
are too small for most defenders to care about, plus some large ones that tend
to be multilayer – typically with peer-to-peer mechanisms at the bottom that
enable the footsoldier bots to communicate with a few control nodes, which
in turn use a domain generation algorithm to find the botmaster. Fragmenting
the footsoldiers into a number of small botnets makes it hard for defenders to
infiltrate all of them, while the control nodes may be located in places that are
hard for defenders to get at. The big money for such botnets in 2020 appears to
be in clickfraud.
The latest innovation is Mirai, a family of botnets that exploit IoT devices.
The first Mirai worm infected CCTV cameras that had been manufactured
by XiongMai and that had a known factory default password that couldn’t
be changed. Mirai botnets scan the Internet’s IPv4 address space for other
vulnerable devices which typically get infected within minutes of being
powered up. The first major attack was on DynDNS and took down Twitter
for six hours on the US eastern seaboard in October 2016. Since then there
have been over a thousand variants, which researchers study to determine
what’s changed and to work out what countermeasures might be used.
At any one time, there may be half a dozen large botnet herders. The Mirai
operators, for example, seem to be two or three groups that might have
involved a few dozen people.
2.3.1.2 Malware devs
In addition to the several hundred software engineers who write malware
for the world’s intelligence agencies and their contractors, there may be
hundreds of people writing malware for the criminal market; nobody really
knows (though we can monitor traffic on hacker forums to guess the order of
magnitude).
Within this community there are specialists. Some concentrate on turning
vulnerabilities into exploits, a nontrivial task for modern operating systems
that use stack canaries, ASLR and other techniques we’ll discuss later in
section 6.4.1. Others specialise in the remote access Trojans that the exploits
install; others build the peer-to-peer and DGA software for resilient command-and-control communications; yet others design specialised payloads for bank
fraud. The highest-value operations seem to be platforms that are maintained
with constant upgrades to cope with the latest countermeasures from the
anti-virus companies. Within each specialist market segment there are typically a handful of operators, so that when we arrest one of them it makes a
difference for a while. Some of the providers are based in jurisdictions that
don’t extradite their nationals, like Russia, and Russian crimeware is used not
just by Russian state actors but by others too.
As Android has taken over from Windows as the most frequently used
operating system, we’ve seen a rise in Android malware. In China and in
countries with a lot of second-hand and older phones, this may be software
that uses an unpatched vulnerability to root an Android phone; the USA and
Europe have lots of unpatched phones (as many OEMs stop offering patches
once a phone is no longer on sale) but it’s often just apps that do bad things,
such as stealing SMSes used to authenticate banking transactions.
2.3.1.3 Spam senders
Spamming arrived on a small scale when the Internet opened to the public
in the mid-1990s, and by 2000 we saw the Earthlink spammer making millions from sending phishing lures. By 2010 spam was costing the world’s ISPs
and tech companies about $1bn a year in countermeasures, but it earned its
operators perhaps one percent of that. The main beneficiaries may have been
webmail services such as Yahoo, Hotmail and Gmail, which can operate better
spam filters because of scale; during the 2010s, hundreds of millions of people
switched to using their services.
Spam is now a highly specialised business, as getting past modern spam
filters requires a whole toolbox of constantly-changing tricks. If you want to
use spam to install ransomware, you’re better off paying an existing service
than trying to learn it all from scratch. Some spam involves industrial-scale
email compromise, which can be expensive for the victim; some $350m was
knocked off the $4.8bn price at which Yahoo was sold to Verizon after a bulk
compromise [772].
2.3.1.4 Bulk account compromise
Some botnets are constantly trying to break into email and other online
accounts by trying to guess passwords and password recovery questions.
A large email service provider might be recovering several tens of thousands
of accounts every day. There are peaks, typically when hackers compromise
millions of email addresses and passwords at one website and then try them
out at all the others. In 2019, this credential stuffing still accounts for the largest
number of attempted account compromises by volume [1885]. Compromised
accounts are sold on to people who exploit them in various ways. Primary
email accounts often have recovery information for other accounts, including
bank accounts if the attacker is lucky. They can also be used for scams such as
the stranded traveler, where the victim emails all their friends saying they’ve
been robbed in some foreign city and asking for urgent financial help to pay
the hotel bill. If all else fails, compromised email accounts can be used to
send spam.
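On the defensive side, one common countermeasure is to screen passwords against corpora of known-breached credentials, so that passwords likely to be stuffed or guessed are rejected at sign-up or reset. The sketch below assumes the publicly documented Pwned Passwords range API, which supports k-anonymity lookups; treat it as an illustration of the technique rather than an endorsement of any particular service.

    # Sketch: check whether a password appears in known breach data via a
    # k-anonymity range lookup; only the first 5 hex chars of the SHA-1 hash
    # leave the machine. Assumes the public Pwned Passwords range API.
    import hashlib
    import urllib.request

    def breach_count(password):
        sha1 = hashlib.sha1(password.encode()).hexdigest().upper()
        prefix, suffix = sha1[:5], sha1[5:]
        url = 'https://api.pwnedpasswords.com/range/' + prefix
        with urllib.request.urlopen(url) as resp:
            for line in resp.read().decode().splitlines():
                candidate, _, count = line.partition(':')
                if candidate == suffix:
                    return int(count)
        return 0

    # A count in the millions means the password will fall to credential
    # stuffing or guessing; reject it or force a change.
    print(breach_count('password123'))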
A variant on the theme is the pay-per-install service, which implants malware
on phones or PCs to order and at scale. This can involve a range of phishing
lures in a variety of contexts, from free porn sites that ask you to install a special
viewer, to sports paraphernalia offers and news about topical events. It can
also use more technical means such as drive-by downloads. Such services are
often offered by botnets which need them to maintain their own numbers; they
might charge third party customers $10-15 per thousand machines infected in
the USA and Europe, and perhaps $3 for Asia.
2.3.1.5 Targeted attackers
We’ve seen the emergence of hack-for-hire operators who will try to compromise a specific target account for a fee, typically $750 [1885]. They will
investigate the target, make multiple spear-phishing attempts, try password
recovery procedures, and see if they can break in through related accounts.
This continues an old tradition of private eyes who helped in divorce
cases and also stalked celebrities on behalf of red-top newspapers – though
with even fewer ethical constraints now that services can be purchased anonymously online. John Scott-Railton and colleagues exposed the workings of
Dark Basin, a hack-for-hire company that had targeted critics of ExxonMobil,
and also net neutrality advocates, and traced it to a company in India [1695].
In recent years, targeted attacks have also been used at scale against small
business owners and the finance staff of larger firms in order to carry out various kinds of payment fraud, as I’ll discuss below in 2.3.2.
2.3.1.6 Cashout gangs
Back in the twentieth century, people who stole credit card numbers would
have to go to the trouble of shopping for goods and then selling them to get
money out. Nowadays there are specialists who buy compromised bank credentials on underground markets and exploit them. The prices reveal where
the real value lies in the criminal chain; a combination of credit card number
and expiry date sells for under a dollar, and to get into the single dollars you
need a CVV, the cardholder’s name and address, and more.
Cashout techniques change every few years, as paths are discovered through
the world’s money-laundering controls, and the regulations get tweaked to
block them. Some cashout firms organise armies of mules to whom they transfer some of the risk. Back in the mid-2000s, mules could be drug users who
would go to stores and buy goods with stolen credit cards; then there was a
period when unwitting mules were recruited by ads promising large earnings
to ‘agents’ who would represent foreign companies, but who were actually used to remit stolen
funds through their personal bank accounts. The laundrymen next used Russian banks in Latvia, to which Russian mules would turn up to withdraw cash.
Then Liberty Reserve, an unlicensed digital currency based in Costa Rica, was
all the rage until it was closed down and its founder arrested in 2013. Bitcoin
took over for a while but its popularity with the cybercrime community tailed
off as its price became more volatile and as the US Department of the Treasury
started arm-twisting bitcoin exchanges into identifying their customers.
As with spam, cashout is a constantly evolving attack-defence game. We
monitor it and analyse the trends using CrimeBB, a database we’ve assembled
of tens of millions of posts in underground hacker forums where cybercriminals buy and sell services including cashout [1501]. It also appears to
favour gangs who can scale up, until they get big enough to attract serious
law-enforcement attention: in 2020, one Sergey Medvedev pleaded guilty to
inflicting more than $568 million in actual losses over the period 2010–15 [1932].
2.3.1.7 Ransomware
One reason for the decline in criminal use of cryptocurrency may have been the growth
of ransomware, as the gangs involved switched to payment
methods that are easier for victims to use. By 2016–17, 42% of ransomware
encountered by US victims demanded prepaid vouchers such as Amazon gift
cards; 14% demanded wire transfers and only 12% demanded cryptocurrency;
a lot of the low-end ransomware aimed at consumers is now really scareware as it doesn’t actually encrypt files at all [1746]. Since 2017, we’ve seen
ransomware-as-a-service platforms; the operators who use these platforms
are often amateurs and can’t decrypt even if you’re willing to pay.
Meanwhile a number of more professional gangs penetrate systems, install
ransomware, wait until several days or weeks of backup data have been
encrypted and demand substantial sums of bitcoin. This has grown rapidly
over 2019–20, with the most high-profile ransomware victims in the USA being
public-sector bodies; several hundred local government bodies and a handful
of hospitals have suffered service failures [356]. During the pandemic, more
hospitals have been targeted; the medical school at UCSF paid over $1m [1482].
It’s an international phenomenon, though, and many private-sector firms
fall victim too. Ransomware operators have also been threatening large-scale
leaks of personal data to bully victims into paying.
2.3.2 Attacks on banking and payment systems
Attacks on card payment systems started with lost and stolen cards, with
forgery at scale arriving in the 1980s; the dotcom boom ramped things up
further in the 1990s as many businesses started selling online with little idea
of how to detect fraud; and it was card fraud that spawned underground
markets in the mid-2000s as criminals sought ways to buy and sell stolen card
numbers as well as related equipment and services.
Another significant component is pre-issue fraud, known in the USA as ‘identity theft’ [670], where criminals obtain credit cards, loans and other assets in
your name and leave you to sort out the mess. I write ‘identity theft’ in quotes as
it’s really just the old-fashioned offence of impersonation. Back in the twentieth
century, if someone went to a bank, pretended to be me, borrowed money from
them and vanished, then that was the bank’s problem, not mine. In the early
twenty-first, banks took to claiming that it’s your identity that’s been stolen
rather than their money [1730]. There is less of that liability dumping now, but
the FBI still records much cybercrime as ‘identity theft’ which helps keep it out
of the mainstream US crime statistics.
The card fraud ecosystem is now fairly stable. Surveys in 2011 and 2019 show
that while card fraud doubled over the decade, the loss fell slightly as a percentage of transaction value [91, 92]; the system has been getting more efficient as it
grows. Many card numbers are harvested in hacking attacks on retailers, which
can be very expensive for them once they’ve paid to notify affected customers
and reimburse banks for reissued cards. As with the criminal infrastructure,
the total costs may be easily two orders of magnitude greater than anything
the criminals actually get away with.
Attacks on online banking ramped up in 2005 with the arrival of large-scale
phishing attacks; emails that seemed to come from banks drove customers to
imitation bank websites that stole their passwords. The banks responded with
techniques such as two-factor authentication, or the low-cost substitute of asking for only a few letters of the password at a time; the crooks’ response, from
about 2009, has been credential-stealing malware. Zeus and later Trojans lurk
on a PC until the user logs on to a bank whose website they recognise; they then
make payments to mule accounts and hide their activity from the user – the
so-called ‘man-in-the-browser attack’. (Some Trojans even connect in real time
to a human operator.) The crooks behind the Zeus and later the Dridex banking
malware were named and indicted by US investigators in December 2019, and
accused of stealing some $100m, but they remain at liberty in Russia [796].
Other gangs have been broken up and people arrested for such scams, which
continue to net in the hundreds of millions to low billions a year worldwide.
Firms also have to pay attention to business email compromise, where a
crook compromises a business email account and tells a customer that their
bank account number has changed; or where the crook impersonates the CEO
and orders a financial controller to make a payment; and to social engineering
attacks by people pretending to be from your bank who talk you into releasing
a code to authorise a payment. Most targeted attacks on company payment
systems can in theory be prevented by the control procedures that most large
firms already have, and so the typical target is a badly-run large firm, or a
medium-sized firm with enough money to be worth stealing but not enough
control to lock everything down.
I’ll discuss the technicalities of such frauds in Chapter 12, along with a growing number of crimes that directly affect only banks, their regulators and their
retail customers. I’ll also discuss cryptocurrencies, which facilitate cybercrimes
from ransomware to stock frauds, in Chapter 20.
2.3.3 Sectoral cybercrime ecosystems
A number of sectors other than banking have their own established cybercrime
scenes. One example is travel fraud. There’s a whole ecosystem of people who
sell fraudulently obtained air tickets, which are sometimes simply bought with
stolen credit card numbers, sometimes obtained directly by manipulating or
hacking the systems of travel agents or airlines, sometimes booked by corrupt
staff at these firms, and sometimes scammed from the public directly by stealing their air miles. The resulting cut-price tickets are sold directly using spam
or through various affiliate marketing scams. Some of the passengers who use
them to fly know they’re dubious, while others are dupes – which makes it hard
to deal with the problem just by arresting people at the boarding gate. (The
scammers also supply tickets at the last minute, so that the alarms are usually
too late.) For an account and analysis of travel fraud, see Hutchings [938]. An
increasing number of other business sectors are acquiring their own dark side,
and I will touch on some of them in later chapters.
2.3.4 Internal attacks
Fraud by insiders has been an issue since businesses started hiring people.
Employees cheat the firm, partners cheat each other, and firms cheat their
shareholders. The main defence is bookkeeping. The invention of double-entry
bookkeeping, of which our earliest records are from the Cairo of a thousand
years ago, enabled businesses to scale up beyond the family that owned them.
This whole ecosystem is evolving as technology does, and its design is driven
by the Big Four accounting firms who make demands on their audit clients
that in turn drive the development of accounting software and the supporting
security mechanisms. I discuss all this at length in Chapter 12. There are also
inside attacks involving whistleblowing, which I discuss below.
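To see why double-entry bookkeeping works as a control, here is a toy sketch – a deliberately simplified illustration, not how any real accounting package is built. Every transaction is posted to two accounts with equal and opposite effect, so a one-sided edit by a fraudster leaves the books unbalanced and shows up at the next trial balance.

    # Toy double-entry ledger: every transaction posts equal debits and
    # credits, so a one-sided 'adjustment' shows up as an imbalance.
    from collections import defaultdict

    ledger = defaultdict(float)

    def post(debit_account, credit_account, amount):
        ledger[debit_account] += amount
        ledger[credit_account] -= amount

    post('inventory', 'cash', 1000.0)   # buy stock
    post('cash', 'sales', 1500.0)       # sell it for cash

    def books_balance():
        return abs(sum(ledger.values())) < 1e-9

    print(books_balance())    # True: debits equal credits
    ledger['cash'] += 200.0   # an insider quietly inflates one account
    print(books_balance())    # False: the tampering is now detectable

Real controls layer authorisation, audit trails and separation of duties on top of this basic invariant.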
2.3.5 CEO crimes
Companies attack each other, and their customers too. From the 1990s, printer
vendors have used cryptography to lock their customers in to using proprietary ink cartridges, as I describe in section 24.6, while companies selling refills
have been breaking the crypto. Games console makers have been playing
exactly the same game with aftermarket vendors. The use of cryptography for
accessory control is now pervasive, being found even on water filter cartridges
in fridges [1073]. Many customers find this annoying and try to circumvent
the controls. The US courts decided in the Lexmark v SCC case that this was
fine: the printer vendor Lexmark sued SCC, a company that sold clones of
its security chips to independent ink vendors, but lost. So the incumbent
can now hire the best cryptographers they can find to lock their products,
while the challenger can hire the best cryptanalysts they can find to unlock
them – and customers can hack them any way they can. Here, the conflict is
legal and open. As with state actors, corporates sometimes assemble teams
with multiple PhDs, millions of dollars in funding, and capital assets such as
electron microscopes13. We discuss this in greater detail later in section 24.6.
13 Full disclosure: both our hardware lab and our NGO activities have on occasion received funding from such actors.
Not all corporate attacks are conducted as openly. Perhaps the best-known
covert hack was by Volkswagen on the EU and US emissions testing schemes;
diesel engines sold in cars were programmed to run cleanly if they detected
the standard emission test conditions, and efficiently otherwise. For this, the
CEO of VW was fired and indicted in the USA (to which Germany won’t
extradite him), while the CEO of Audi was fired and jailed in Germany [1086].
VW has set aside €25bn to cover criminal and civil fines and compensation.
Other carmakers were cheating too; Daimler was fined €860m in Europe
in 2019 [1468], and in 2020 reached a US settlement consisting of a fine of
$1.5bn from four government agencies plus a class action of $700m [1859].
Settlements for other manufacturers and other countries are in the pipeline.
Sometimes products are designed to break whole classes of protection system, an example being the overlay SIM cards described later in Chapter 12.
These are SIM cards only 160 microns thick, with contacts on both sides, which you
stick on top of the SIM card in your phone to provide a second root of trust; they
were designed to enable people in China to defeat the high roaming charges of
the early 2010s. The overlay SIM essentially does a man-in-the-middle attack
on the real SIM, and can be programmed in Javacard. A side-effect is that such
SIMs make it really easy to do some types of bank fraud.
So when putting together the threat model for your system, stop and think
what capable motivated opponents you might have among your competitors,
or among firms competing with suppliers on which products you depend. The
obvious attacks include industrial espionage, but nowadays it’s much more
complex than that.
2.3.6 Whistleblowers
Intelligence agencies, and secretive firms, can get obsessive about ‘the insider
threat’. But in 2018, Barclays Bank’s CEO was fined £642,000 and ordered to
repay £500,000 of his bonus for attempting to trace a whistleblower in the
bank [698]. So let’s turn it round and look at it from the other perspective – that
of the whistleblower. Many are trying to do the right thing, often at a fairly
mundane level such as reporting a manager who’s getting bribes from suppliers or who is sexually harassing staff. In regulated industries such as banking they may have a legal duty to report wrongdoing and legal immunity
against claims of breach of confidence by their employer. Even then, they often
lose because of the power imbalance; they get fired and the problem goes on.
Many security engineers think the right countermeasure to leakers is technical,
such as data loss prevention systems, but robust mechanisms for staff to report
wrongdoing are usually more important. Some organisations, such as banks,
police forces and online services, have mechanisms for reporting crimes by
staff but no effective process for raising ethical concerns about management
decisions14.
14 Google staff ended up going on strike in 2018 about the handling of sexual harassment scandals.
But even basic whistleblowing mechanisms are often an afterthought;
they typically lead the complainant to HR rather than to the board’s audit
committee. External mechanisms may be little better. One big service firm
ran a “Whistle-blowing hotline” for its clients in 2019; but the web page code
had trackers from LinkedIn, Facebook and Google, who could thus identify
unhappy staff members, and also JavaScript from CDNs, littered with cookies
and referrers from yet more IT companies. No technically savvy leaker would
use such a service. At the top end of the ecosystem, some newspapers offer
ways for whistleblowers to make contact using encrypted email. But the
mechanisms tend to be clunky and the web pages that promote them do not
always educate potential leakers about either the surveillance risks, or the
operational security measures that might counter them. I discuss the usability
and support issues around whistleblowing in more detail in section 25.4.
This is mostly a policy problem rather than a technical one. It’s difficult
to design a technical mechanism whereby honest staff can blow the whistle
on abuses that have become ingrained in an organisation’s culture, such
as pervasive sexual harassment or financial misconduct. In most cases, it’s
immediately clear who the whistleblower is, so the critical factor is whether
the whistleblower will get external support. For example, will they ever get
another job? This isn’t just a matter of formal legal protection but also of culture. For example, the rape conviction of Harvey Weinstein empowered many
women to protest about sexual harassment and discrimination; hopefully the
Black Lives Matter protests will similarly empower people of colour [32].
An example where anonymity did help, though, was the UK parliamentary
expenses scandal of 2008–9. During a long court case about whether the public could get access to the expense claims of members of parliament, someone
went to the PC where the records were kept, copied them to a DVD and sold
the lot to the Daily Telegraph. The paper published the juicy bits in instalments
all through May and June, when MPs gave up and published the lot on Parliament’s website. Half-a-dozen ministers resigned; seven MPs and peers went to
prison; dozens of MPs stood down or lost their seats at the following election;
and there was both mirth and outrage at some of the things charged to the taxpayer. The whistleblower may have technically committed a crime, but their
action was clearly in the public interest; now all parliamentary expenses are
public, as they should have been all along. If a nation’s lawmakers have their
hands in the till, what else will clean up the system?
Even in the case of Ed Snowden, there should have been a robust way for
him to report unlawful conduct by the NSA to the appropriate arm of government, probably a Congressional committee. But he knew that a previous
whistleblower, Bill Binney, had been arrested and harassed after trying to do
that. In hindsight, that aggressive approach was unwise, as President Obama’s
NSA review group eventually conceded. At the less exalted level of a commercial firm, if one of your staff is stealing your money, and another wants to tell
you about it, you’d better make that work.
2.4 Geeks
Our third category of attacker consists of people like me – researchers who
investigate vulnerabilities and report them so they can be fixed. Academics
look for new attacks out of curiosity, and get rewarded with professional
acclaim – which can lead to promotion for professors and jobs for the students
who help us. Researchers working for security companies also look for
newsworthy exploits; publicity at conferences such as Black Hat can win new
customers. Hobby hackers break into stuff as a challenge, just as people climb
mountains or play chess; hacktivists do it to annoy companies they consider
to be wicked. Whether on the right side of the law or not, we tend to be
curious introverts who need to feel in control, but accept challenges and look
for the ‘rush’. Our reward is often fame – whether via academic publications,
by winning customers for a security consulting business, by winning medals
from academic societies or government agencies, or even on social media.
Sometimes we break stuff out of irritation, so we can circumvent something
that stops us fixing something we own; and sometimes there’s an element
of altruism. For example, people have come to us in the past complaining
that their bank cards had been stolen and used to buy stuff, and the banks
wouldn’t give them a refund, saying their PIN must have been used, when it
hadn’t. We looked into some of these cases and discovered the No-PIN and
preplay attacks on chip and PIN systems, which I’ll describe in the chapter on
banking (the bad guys had actually discovered these attacks, but we replicated
them and got justice for some of the victims).
Security researchers who discovered and reported vulnerabilities to a
software vendor or system operator used to risk legal threats, as companies sometimes thought this would be cheaper than fixing things. So some
researchers took to disclosing bugs anonymously on mailing lists; but this
meant that the bad guys could use them at once. By the early 2000s, the IT
industry had evolved practices of responsible disclosure whereby researchers
disclose the bug to the maintainer some months in advance of publication.
Many firms operate bug-bounty programs that offer rewards for vulnerabilities; as a result, independent researchers can now make serious money selling
vulnerabilities, and more than one assiduous researcher has earned
over $1m doing this. Since the Stuxnet worm, governments have raced to
stockpile vulnerabilities, and we now see some firms that buy vulnerabilities
from researchers in order to weaponise them, and sell them to cyber-arms
suppliers. Once they’re used, they spread, are eventually reverse-engineered
and patched. I’ll discuss this ecosystem in more detail in the chapters on
economics and assurance.
Some more traditional sectors still haven’t adopted responsible disclosure.
Volkswagen sued researchers in the universities of Birmingham and Nijmegen
who reverse-engineered some online car theft tools and documented how poor
their remote key entry system was. The company lost, making fools of themselves and publicising the insecurity of their vehicles (I’ll discuss the technical
details in section 4.3.1 and the policy in section 27.5.7.2). Eventually, as software permeates everything, software industry ways of working will become
more widespread too. In the meantime, we can expect turbulence. Firms that
cover up problems that harm their customers will have to reckon with the possibility that either an internal whistleblower, or an external security researcher,
will figure out what’s going on, and when that happens there will often be an
established responsible disclosure process to invoke. This will impose costs on
firms that fail to align their business models with it.
2.5 The swamp
Our fourth category is abuse, by which we usually mean offences against
the person rather than against property. These range from cyber-bullying at
schools all the way to state-sponsored Facebook advertising campaigns that
get people to swamp legislators with death threats. I’ll deal first with offences
that scale, including political harassment and child sex abuse material, and
then with offences that don’t, ranging from school bullying to intimate partner
abuse.
2.5.1 Hacktivism and hate campaigns
Propaganda and protest evolved as technology did. Ancient societies had to
make do with epic poetry; cities enabled people to communicate with hundreds of others directly, by making speeches in the forum; and the invention
of writing enabled a further scale-up. The spread of printing in the sixteenth
century led to wars of religion in the seventeenth, daily newspapers in the
eighteenth and mass-market newspapers in the nineteenth. Activists learned
to compete for attention in the mass media, and honed their skills as radio and
then TV came along.
Activism in the Internet age started off with using online media to mobilise
people to do conventional lobbying, such as writing to legislators; organisations such as Indymedia and Avaaz developed expertise at this during the
2000s. In 2011, activists such as Wael Ghonim used social media to trigger the
Arab Spring, which we discuss in more detail in section 26.4.1. Since then, governments have started to crack down, and activism has spread into online hate
campaigns and radicalisation. Many hate campaigns are covertly funded by
governments or opposition parties, but by no means all: single-issue campaign
groups are also players. If you can motivate hundreds of people to send angry
emails or tweets, then a company or individual on the receiving end can have a
real problem. Denial-of-service attacks can interrupt operations while doxxing
can do real brand damage as well as causing distress to executives and staff.
Activists vary in their goals, in their organisational coherence and in the
extent to which they’ll break the law. There’s a whole spectrum, from the
completely law-abiding NGOs who get their supporters to email legislators to
the slightly edgy, who may manipulate news by getting bots to click on news
stories, to game the media analytics and make editors pay more attention to
their issue. Then there are whistleblowers who go to respectable newspapers,
political partisans who harass people behind the mild anonymity of Twitter
accounts, hackers who break into target firms and vandalise their websites or
even doxx them. The Climategate scandal, described in 2.2.5 above, may be
an example of doxxing by a hacktivist. At the top end, there are the hard-core
types who end up in jail for terrorist offences.
During the 1990s, I happily used email and usenet to mobilise people against
surveillance bills going through the UK parliament, as I’ll describe later in
section 26.2.7. I found myself on the receiving end of hacktivism in 2003 when
the Animal Liberation Front targeted my university because of plans to build
a monkey house, for primates to be used in research. The online component
consisted of thousands of emails sent to staff members with distressing images
of monkeys with wires in their brains; this was an early example of ‘brigading’,
where hundreds of people gang up on one target online. We dealt with that
online attack easily enough by getting their email accounts closed down.
But they persisted with physical demonstrations and media harassment; our
Vice-Chancellor decided to cut her losses, and the monkey house went to
Oxford instead. Some of the leaders were later jailed for terrorism offences
after they assaulted staff at a local pharmaceutical testing company and placed
bombs under the cars of medical researchers [21].
Online shaming has become popular as a means of protest. It can be quite
spontaneous, with a flash mob of vigilantes forming when an incident goes
viral. An early example happened in 2005 when a young lady in Seoul failed
to clean up after her dog defecated in a subway carriage. Another passenger
photographed the incident and put it online; within days the ‘dog poo girl’ had
been hounded into hiding, abandoning her university course [420]. There have
been many other cases since.
The power of platforms such as Twitter became evident in Gamergate, a
storm sparked by abusive comments about a female game developer made
publicly by a former boyfriend in August 2014, and cascading into a torrent
of misogynistic criticism of women in the gaming industry and of feminists
who had criticised the industry’s male-dominated culture. A number of
people were doxxed, SWATted, or hounded from their homes [1936]. The
harassment was coordinated on anonymous message boards such as 4chan
and the attackers would gang up on a particular target – who then also
got criticised by mainstream conservative journalists [1132]. The movement
appeared leaderless and evolved constantly, with one continuing theme being
a rant against ‘social justice warriors’. It appears to have contributed to the
development of the alt-right movement which influenced the 2016 election
two years later.
A growing appreciation of the power of angry online mobs is leading politicians to stir them up, at all levels from local politicians trying to undermine
their rivals to nation states trying to swing rival states’ elections. Angry mobs
are an unpleasant enough feature of modern politics in developed countries;
in less developed countries things get even worse, with real lynchings in
countries such as India (where the ruling BJP party has been building a
troll army since at least 2011 to harass political opponents and civil-society
critics [1640]). Companies are targeted less frequently, but it does happen.
Meanwhile the social-media companies are under pressure to censor online
content, and as it’s hard for an AI program to tell the difference between a joke,
abuse, a conspiracy theory and information warfare by a foreign government,
they end up having to hire more and more moderators. I will return to the law
and policy aspects of this in 26.4 below.
2.5.2 Child sex abuse material
When the Internet came to governments’ attention in the 1990s and they wondered how to get a handle on it, the first thing to be regulated was images
of child sex abuse (CSA), in the Budapest Convention in 2001. We have little
data on the real prevalence of CSA material as the legal restrictions make it
hard for anyone outside law enforcement to do any research. In many countries, the approach to CSA material has less focus on actual harm reduction
than it deserves. Indeed, many laws around online sexual offences are badly
designed, and seem to be driven more by exploiting outrage than by minimising the number of victims and the harm they suffer. CSA may be a case study on
how not to do online regulation because of forensic failures, takedown failures,
weaponisation and the law-norm gap.
The most notorious forensic failure was Britain’s Operation Ore, which I
describe in more detail in 26.5.3. Briefly, several thousand men were arrested
on suspicion of CSA offences after their credit card numbers were found on
an abuse website, and perhaps half of them turned out to be victims of credit
card fraud. Hundreds of innocent men had their lives ruined. Yet nothing
was done for the child victims in Brazil and Indonesia, and the authorities are
still nowhere near efficient at taking down websites that host CSA material.
In most countries, CSA takedown is a monopoly of either the police, or a
regulated body that operates under public-sector rules (NCMEC in the USA
and the IWF in the UK), and takes from days to weeks; things would go much
more quickly if governments were to use the private-sector contractors that
banks use to deal with phishing sites [940]. The public-sector monopoly stems
from laws in many countries that make the possession of CSA material a
strict-liability offence. This not only makes it hard to deal with such material
using the usual abuse channels, but also allows it to be weaponised: protesters
can send it to targets and then report them to the police. It also makes it
difficult for parents and teachers to deal sensibly with incidents that arise with
teens using dating apps or having remote relationships. The whole thing is a
mess, caused by legislators wanting to talk tough without understanding the
technology. (CSA material is now a significant annoyance for some legislators’
staff, and also makes journalists at some newspapers reluctant to make their
email addresses public.)
There is an emerging law-norm gap with the growth in popularity of sexting
among teenagers. Like it or not, sending intimate photographs to partners (real
and intended) became normal behaviour for teens in many countries when
smartphones arrived in 2008. This was a mere seven years after the Budapest
convention, whose signatories may have failed to imagine that sexual images
of under-18s could be anything other than abuse. Thanks to the convention,
possessing an intimate photo of anyone under 18 can now result in a prison
sentence in any of the 63 countries that have ratified it. Teens laugh at lectures
from schoolteachers to not take or share such photos, but the end result is real
harm. Kids may be tricked or pressured into sharing photos of themselves, and
even if the initial sharing is consensual, the recipient can later use it for blackmail or just pass it round for a laugh. Recipients – even if innocent – are also
committing criminal offences by simply having the photos on their phones, so
kids can set up other kids and denounce them. This leads to general issues of
bullying and more specific issues of intimate partner abuse.
2.5.3 School and workplace bullying
Online harassment and bullying are a fact of life in modern societies, not
just in schools but in workplaces too, as people jostle for rank, mates and
resources. From the media stories of teens who kill themselves following
online abuse, you might think that cyber-bullying now accounts for most of
the problem – at least at school – but the figures show that it’s less than half. An
annual UK survey discloses that about a quarter of children and young people
are constantly bullied (13% verbal, 5% cyber and 3% physical) while about half
are bullied sometimes (24%, 8% and 9% respectively) [565]. The only national
survey of all ages of which I’m aware is the French national victimisation
survey, which since 2007 has collected data not just on physical crimes such as
burglary and online crimes such as fraud, but on harassment too [1460]. This is
based on face-to-face interviews with 16,000 households; the 2017 survey
reported two million cases of threatening behaviour, of which 7% were made on social
networks and a further 9% by phone. But have social media made this worse?
Research suggests that the effects of social media use on adolescent well-being
are nuanced, small at best, and contingent on analytic methods [1475].
Yet there is talk in the media of a rise in teen suicide which some commentators link to social media use. Thankfully, the OECD mortality statistics show
that this is also untrue: suicides among 15–19 year olds have declined slightly
from about 8 to about 7 cases per 100,000 over the period 1990–2015 [1479].
2.5.4 Intimate relationship abuse
Just as I ended the last section by discussing whistleblowers – the insider
threat to companies – I’ll end this section with intimate relationship abuse,
the insider threat to families and individuals. Gamergate may have been a
flashbulb example, but protection from former intimate partners and other
family members is a real problem that exists at scale – with about half of all
marriages ending in divorce, and not all breakups being amicable. Intimate
partner abuse has been suffered by 27% of women and 11% of men. Stalking
is not of course limited to former partners. Celebrities in particular can be
stalked by people they’ve never met – with occasional tragic outcomes, as in
the case of John Lennon. But former partners account for most of it, and law
enforcement in most countries have historically been reluctant to do anything
effective about them. Technology has made the victims’ plight worse.
One subproblem is the publication of non-consensual intimate imagery
(NCII), once called ‘revenge porn’ – until California Attorney General Kamala
Harris objected that this is cyber-exploitation and a crime. Her message got
through to the big service firms who since 2015 have been taking down such
material on demand from the victims [1693]. This followed an earlier report
in 2012 where Harris documented the increasing use of smartphones, online
marketplaces and social media in forcing vulnerable people into unregulated
work including prostitution – raising broader questions about how technology
can be used to connect with, and assist, crime victims [867].
The problems faced by a woman leaving an abusive and controlling husband are among the hardest in the universe of information security. All the
usual advice is the wrong way round: your opponent knows not just your
passwords but has such deep contextual knowledge that he can answer all
your password recovery questions. There are typically three phases: a physical control phase where the abuser has access to your device and may install
malware, or even destroy devices; a high-risk escape phase as you try to find
a new home, a job and so on; and a life-apart phase when you might want to
shield location, email address and phone numbers to escape harassment, and
may have lifelong concerns. It takes seven escape attempts on average to get
to life apart, and disconnecting from online services can cause other abuse to
escalate. After escape, you may have to restrict children's online activities and
sever mutual relationships; letting your child post anything can leak the school
location and lead to the abuser turning up. You may have to change career as it
can be impossible to work as a self-employed professional if you can no longer
advertise.
To support such users, responsible designers should think hard about
usability during times of high stress and high risk; they should allow users to
have multiple accounts; they should design things so that someone reviewing your history cannot tell that you deleted anything; they should push
two-factor authentication, unusual activity notifications, and incognito mode.
They should also think about how a survivor can capture evidence for use in
divorce and custody cases and possibly in criminal prosecution, while minimising the trauma [1250]. But that’s not what we find in real life. Many banks
don’t really want to know about disputes or financial exploitation within
families. A big problem in some countries is stalkerware – apps designed to
monitor partners, ex-partners, children or employees. A report from Citizen
Lab spells out the poor information security practices of these apps, how
they are marketed explicitly to abusive men, and how they break the law in
Europe and Canada; as for the USA and Australia, over half of abusers tracked
women using stalkerware [1497]. And then there’s the Absher app, which
enables men in Saudi Arabia to control their women in ways unacceptable in
developed countries; its availability in app stores has led to protests against
Apple and Google elsewhere in the world, but as of 2020 it’s still there.
Intimate abuse is hard for designers and others to deal with as it’s entangled
with normal human caregiving between partners, between friends and
colleagues, between parents and young children, and later between children
and elderly parents. Many relationships are largely beneficent but with
some abusive aspects, and participants often don’t agree on which aspects.
The best analysis I know, by Karen Levy and Bruce Schneier, discusses the
combination of multiple motivations, copresence which leads to technical vulnerabilities, and power dynamics leading to relational vulnerabilities [1156].
Technology facilitates multiple privacy invasions in relationships, ranging
from casual annoyance to serious crime; designers need to be aware that
households are not units, devices are not personal, and the purchaser of a
device is not the only user. I expect that concerns about intimate abuse will
expand in the next few years to concerns about victims of abuse by friends,
teachers and parents, and will be made ever more complex by new forms of
home and school automation.
2.6 Summary
The systems you build or operate can be attacked by a wide range of opponents. It’s important to work out who might attack you and how, and it’s also
important to be able to figure out how you were attacked and by whom. Your
systems can also be used to attack others, and if you don’t think about this in
advance you may find yourself in serious legal or political trouble.
In this chapter I’ve grouped adversaries under four general themes: spies,
crooks, hackers and bullies. Not all threat actors are bad: many hackers report
bugs responsibly and many whistleblowers are public-spirited. (‘Our’ spies are
of course considered good while ‘theirs’ are bad; moral valence depends on the
public and private interests in play.) Intelligence and law enforcement agencies
may use a mix of traffic data analysis and content sampling when hunting, and
targeted collection for gathering; collection methods range from legal coercion
via malware to deception. Both spies and crooks use malware to establish botnets as infrastructure. Crooks typically use opportunistic collection for mass
attacks, while for targeted work, spear-phishing is the weapon of choice; the
agencies may have fancier tools but use the same basic methods. There are also
cybercrime ecosystems attached to specific business sectors; crime will evolve
where it can scale. As for the swamp, the weapon of choice is the angry mob,
wielded nowadays by states, activist groups and even individual orators. There
are many ways in which abuse can scale, and when designing a system you
need to work out how crimes against it, or abuse using it, might scale. It’s not
enough to think about usability; you need to think about abusability too.
Personal abuse matters too. Every police officer knows that the person
who assaults you or murders you isn’t usually a stranger, but someone
you know – maybe another boy in your school class, or your stepfather.
This has been ignored by the security research community, perhaps because
we’re mostly clever white or Asian boys from stable families in good
neighbourhoods.
If you’re defending a company of any size, you’ll see enough machines on
your network getting infected, and you need to know whether they’re just
zombies on a botnet or part of a targeted attack. So it’s not enough to rely
on patching and antivirus. You need to watch your network and keep good
enough logs that when an infected machine is spotted you can tell whether
it’s a kid building a botnet or a targeted attacker who responds to loss of a
viewpoint with a scramble to develop another one. You need to make plans
to respond to incidents, so you know who to call for forensics – and so your
CEO isn’t left gasping like a landed fish in front of the TV cameras. You need
to think systematically about your essential controls: backup to recover from
ransomware, payment procedures to block business email compromise, and so
on. If you’re advising a large company they should have much of this already,
and if it’s a small company you need to help them figure out how to do enough
of it.
The rest of this book will fill in the details.
Research problems
Until recently, research on cybercrime wasn’t really scientific. Someone would
get some data – often under NDA from an anti-virus company – work out some
statistics, write up their thesis, and then go get a job. The data were never available to anyone else who wanted to check their results or try a new type of
analysis. Since 2015 we’ve been trying to fix that by setting up the Cambridge
Cybercrime Centre, where we collect masses of data on spam, phish, botnets
and malware as a shared resource for researchers. We’re delighted for other
academics to use it. If you want to do research on cybercrime, call us.
We also need something similar for espionage and cyber warfare. People trying to implant malware into control systems and other operational technology
are quite likely to be either state actors, or cyber-arms vendors who sell to
states. The criticisms made by President Eisenhower of the ‘military-industrial
complex’ apply here in spades. Yet not one of the legacy think-tanks seems
interested in tracking what’s going on. As a result, nations are more likely to
make strategic miscalculations, which could lead not just to cyber-conflict but
the real kinetic variety, too.
As for cyber abuse, there is now some research, but the technologists, the psychologists, the criminologists and the political scientists aren't
talking to each other enough. There are many issues, from the welfare and
rights of children and young people, through the issues facing families separated by prison, to our ability to hold fair and free elections. We need to engage
more technologists with public-policy issues and educate more policy people
about the realities of technology. We also need to get more women involved,
and people from poor and marginalised communities in both developed and
less developed countries, so we have a less narrow perspective on what the
real problems are.
Further reading
There’s an enormous literature on the topics discussed in this chapter but it’s
rather fragmented. A starting point for the Snowden revelations might be
Glenn Greenwald’s book ‘No Place to Hide’ [817]; for an account of Russian
strategy and tactics, see the 2018 report to the US Senate’s Committee on Foreign Relations [387]; and for a great introduction to the history of propaganda
see Tim Wu’s ‘The Attention Merchants’ [2052]. For surveys of cybercrime,
see our 2012 paper “Measuring the Cost of Cybercrime” [91] and our 2019
follow-up “Measuring the Changing Cost of Cybercrime” [92]. Criminologists
such as Bill Chambliss have studied state-organised crime, from piracy and
slavery in previous centuries through the more recent smuggling of drugs
and weapons by intelligence agencies to torture and assassination; this gives
the broader context within which to assess unlawful surveillance. The story
of Gamergate is told in Zoë Quinn’s ‘Crash Override’ [1570]. Finally, the tale of
Marcus Hutchins, the malware expert who stopped Wannacry, is at [812].
CHAPTER 3
Psychology and Usability
Humans are incapable of securely storing high-quality cryptographic keys, and they have
unacceptable speed and accuracy when performing cryptographic operations. (They are also
large, expensive to maintain, difficult to manage, and they pollute the environment. It is
astonishing that these devices continue to be manufactured and deployed. But they are
sufficiently pervasive that we must design our protocols around their limitations.)
– KAUFMAN, PERLMAN AND SPECINER [1028]
Only amateurs attack machines; professionals target people.
– BRUCE SCHNEIER
Metternich told lies all the time, and never deceived any one; Talleyrand never told a lie and
deceived the whole world.
– THOMAS MACAULAY
3.1 Introduction
Many real attacks exploit psychology at least as much as technology. We saw
in the last chapter how some online crimes involve the manipulation of angry
mobs, while both property crimes and espionage make heavy use of phishing,
in which victims are lured by an email to log on to a website that appears genuine but that’s actually designed to steal their passwords or get them to install
malware.
Online frauds like phishing are often easier to do, and harder to stop, than
similar real-world frauds because many online protection mechanisms are neither as easy to use nor as difficult to forge as their real-world equivalents. It’s
much easier for crooks to create a bogus bank website that passes casual inspection than to build an actual bogus bank branch in a shopping street.
We’ve evolved social and psychological tools over millions of years to help
us deal with deception in face-to-face contexts, but these are less effective when
we get an email that asks us to do something. For an ideal technology, good use
would be easier than bad use. We have many examples in the physical world: a
potato peeler is easier to use for peeling potatoes than a knife is, but a lot harder
to use for murder. But we've not always got this right for computer systems.
Much of the asymmetry between good and bad on which we rely in our daily
business doesn’t just depend on formal exchanges – which can be automated
easily – but on some combination of physical objects, judgment of people, and
the supporting social protocols. So, as our relationships with employers, banks
and government become more formalised via online communication, and we
lose both physical and human context, the forgery of these communications
becomes more of a risk.
Deception, of various kinds, is now the principal mechanism used to defeat
online security. It can be used to get passwords, to compromise confidential
information or to manipulate financial transactions directly. Hoaxes and frauds
have always happened, but the Internet makes some of them easier, and lets
others be repackaged in ways that may bypass our existing controls (be they
personal intuitions, company procedures or even laws).
Another driver for the surge in attacks based on social engineering is that
people are getting better at technology. As designers learn how to forestall
the easier technical attacks, psychological manipulation of system users or
operators becomes ever more attractive. So the security engineer absolutely
must understand basic psychology, as a prerequisite for dealing competently
with everything from passwords to CAPTCHAs and from phishing to social
engineering in general; a working appreciation of risk misperception and
scaremongering is also necessary to understand the mechanisms underlying
angry online mobs and the societal response to emergencies from terrorism to
pandemic disease. So just as research in security economics led to a real shift
in perspective between the first and second editions of this book, research
in security psychology has made much of the difference to how we view the
world between the second edition and this one.
In the rest of this chapter, I’ll first survey relevant research in psychology, then
work through how we apply the principles to make password authentication
mechanisms more robust against attack, to security usability more generally,
and beyond that to good design.
3.2 Insights from psychology research
Psychology is a huge subject, ranging from neuroscience through to clinical
topics, and spilling over into cognate disciplines from philosophy through artificial intelligence to sociology. Although it has been studied for much longer
than computer science, our understanding of the mind is much less complete:
the brain is so much more complex. There’s one central problem – the nature of
consciousness – that we just don’t understand at all. We know that ‘the mind
is what the brain does’, yet the mechanisms that underlie our sense of self and
of personal history remain obscure.
Nonetheless a huge amount is known about the functioning of the mind and
the brain, and we’re learning interesting new things all the time. In what follows I can only offer a helicopter tour of three of the themes in psychology
research that are very relevant to our trade: cognitive psychology, which studies topics such as how we remember and what sort of mistakes we make; social
psychology, which deals with how we relate to others in groups and to authority; and behavioral economics, which studies the heuristics and biases that
lead us to make decisions that are consistently irrational in measurable and
exploitable ways.
3.2.1 Cognitive psychology
Cognitive psychology is the classical approach to the subject – building on
early empirical work in the nineteenth century. It deals with how we think,
remember, make decisions and even daydream. Twentieth-century pioneers
such as Ulric Neisser discovered that human memory doesn’t work like a
video recorder: our memories are stored in networks across the brain, from
which they are reconstructed, so they change over time and can be manipulated [1429]. There are many well-known results. For example, it’s easier to
memorise things that are repeated frequently, and it’s easier to store things
in context. Many of these insights are used by marketers and scammers, but
misunderstood or just ignored by most system developers.
For example, most of us have heard of George Miller’s result that human
short-term memory can cope with about seven (plus or minus two) simultaneous choices [1319] and, as a result, many designers limit menu choices to about
five. But this is not the right conclusion. People search for information first by
recalling where to look, and then by scanning; once you’ve found the relevant
menu, scanning ten items is only twice as hard as scanning five. The real limits on menu size are screen size, which might give you ten choices, and, with spoken menus, the fact that the average user has difficulty dealing with more than three or four [1547]. Here, too, Miller's insight is misused, because spatio-structural memory is a different faculty from echoic memory. This illustrates why a broad idea like 7±2 can be hazardous; you need to look at the detail.
In recent years, the centre of gravity in this field has been shifting from
applied cognitive psychology to the human-computer interaction (HCI)
research community, because of the huge amount of empirical know-how
gained not just from lab experiments, but from the iterative improvement of
fielded systems. As a result, HCI researchers not only model and measure
human performance, including perception, motor control, memory and
problem-solving; they have also developed an understanding of how users’
mental models of systems work, how they differ from developers’ mental models, and of the techniques (such as task analysis and cognitive walkthrough)
that we can use to explore how people learn to use and understand systems.
Security researchers need to find ways of turning these ploughshares into
swords (the bad guys are already working on it). There are some low-hanging
fruit; for example, the safety research community has put a lot of effort into
studying the errors people make when operating equipment [1592]. It’s said
that ‘to err is human’ and error research confirms this: the predictable varieties
of human error are rooted in the very nature of cognition. The schemata, or
mental models, that enable us to recognise people, sounds and concepts so
much better than computers, also make us vulnerable when the wrong model
gets activated.
Human errors made while operating equipment fall broadly into three categories, depending on where they occur in the ‘stack’: slips and lapses at the
level of skill, mistakes at the level of rules, and misconceptions at the cognitive
level.
Actions performed often become a matter of skill, but we can slip when
a manual skill fails – for example, pressing the wrong button – and
we can also have a lapse where we use the wrong skill. For example,
when you intend to go to the supermarket on the way home from
work you may take the road home by mistake, if that’s what you do
most days (this is also known as a capture error). Slips are exploited
by typosquatters, who register domains similar to popular ones, and
harvest people who make typing errors; other attacks exploit the fact
that people are trained to click ‘OK’ to pop-up boxes to get their work
done. So when designing a system you need to ensure that dangerous
actions, such as installing software, require action sequences that are
quite different from routine ones. Errors also commonly follow interruptions and perceptual confusion. One example is the post-completion
error: once they’ve accomplished their immediate goal, people are easily
distracted from tidying-up actions. More people leave cards behind
in ATMs that give them the money first and the card back second.
Actions that people take by following rules are open to errors when
they follow the wrong rule. Various circumstances – such as information overload – can cause people to follow the strongest rule they know,
or the most general rule, rather than the best one. Phishermen use many
tricks to get people to follow the wrong rule, ranging from using https
(because ‘it’s secure’) to starting URLs with the impersonated bank’s
name, as www.citibank.secureauthentication.com – for most people, looking for a name is a stronger rule than parsing its position (the short sketch after this list shows what parsing the position actually means).
The third category of mistakes is those made by people for cognitive reasons – either they simply don't understand the problem, or they pretend
that they do, and ignore advice in order to get their work done. The
seminal paper on security usability, Alma Whitten and Doug Tygar’s
“Why Johnny Can’t Encrypt”, demonstrated that the encryption
program PGP was simply too hard for most college students to use
as they didn’t understand the subtleties of private versus public keys,
encryption and signatures [2022]. And there’s growing realisation
that many security bugs occur because most programmers can’t use
security mechanisms either. Both access control mechanisms and
security APIs are hard to understand and fiddly to use; security testing
tools are often not much better. Programs often appear to work even
when protection mechanisms are used in quite mistaken ways. Engineers then copy code from each other, and from online code-sharing
sites, so misconceptions and errors are propagated widely [11]. They
often know this is bad, but there’s just not the time to do better.
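To make the rule-following point concrete, here is a minimal Python sketch – my own illustration, not from any cited work, and the toy suffix list and function name are assumptions. It extracts the part of a hostname that actually identifies the site: the label next to the public suffix on the right, whatever familiar brand name appears further left.

from urllib.parse import urlsplit

# Toy suffix list; real code would use the full Public Suffix List,
# for example via the 'tldextract' package.
PUBLIC_SUFFIXES = {"com", "org", "co.uk"}

def registrable_domain(url: str) -> str:
    """Return the label that actually identifies the site: the one just
    to the left of the public suffix, whatever brand names appear
    further to the left."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:])
    return host

print(registrable_domain("https://www.citibank.secureauthentication.com/login"))
# -> secureauthentication.com, not citibank.com: the bank's name is bait

The phisherman relies on the victim applying the opposite rule – scanning left to right for a trusted name – rather than this right-to-left parse.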
There is some important science behind all this, and here are just two
examples. James Gibson developed the concept of action possibilities or
affordances: the physical environment may be climbable or fall-off-able or
get-under-able for an animal, and similarly a seat is sit-on-able. People have
developed great skill at creating environments that induce others to behave
in certain ways: we build stairways and doorways, we make objects portable
or graspable; we make pens and swords [763]. Often perceptions are made
up of affordances, which can be more fundamental than value or meaning.
In exactly the same way, we design software artefacts to train and condition
our users’ choices, so the affordances of the systems we use can affect how we
think in all sorts of ways. We can also design traps for the unwary: an animal
that mistakes a pitfall for solid ground is in trouble.
Gibson also came up with the idea of optical flows, further developed by
Christopher Longuet-Higgins [1187]. As our eyes move relative to the environment, the resulting optical flow field lets us interpret the image, understanding
the size, distance and motion of objects in it. There is an elegant mathematical
theory of optical parallax, but our eyes deal with it differently: they contain
receptors for specific aspects of this flow field which assume that objects in it
are rigid, which then enables us to resolve rotational and translational components. Optical flows enable us to understand the shapes of objects around us,
independently of binocular vision. We use them for some critical tasks such as
landing an aeroplane and driving a car.
In short, cognitive science gives useful insights into how to design system
interfaces so as to make certain courses of action easy, hard or impossible. It
is increasingly tied up with research into computer human interaction. You
can make mistakes more or less likely by making them easy or difficult; in
section 28.2.2 I give real examples of usability failures causing serious accidents involving both medical devices and aircraft. Yet security can be even
harder than safety if we have a sentient attacker who can provoke exploitable
errors.
What can the defender expect attackers to do? They will use errors whose
effect is predictable, such as capture errors; they will exploit perverse affordances; they will disrupt the flows on which safe operation relies; and they will
look for, or create, exploitable dissonances between users’ mental models of a
system and its actual logic. To look for these, you should try a cognitive walkthrough aimed at identifying attack points, just as a code walkthough can be
used to search for software vulnerabilities. Attackers also learn by experiment
and share techniques with each other, and develop tools to look efficiently for
known attacks. So it’s important to be aware of the attacks that have already
worked. (That’s one of the functions of this book.)
3.2.2 Gender, diversity and interpersonal variation
Many women die because medical tests and technology assume that patients
are men, or because engineers use male crash-test dummies when designing
cars; protective equipment, from sportswear through stab-vests to spacesuits,
gets tailored for men by default [498]. So do we have problems with information systems too? They are designed by men, and young geeky men at that,
yet over half their users may be women. This realisation has led to research on
gender HCI – on how software should be designed so that women can also use
it effectively. Early experiments started from the study of behaviour: experiments showed that women use peripheral vision more, and it duly turned out
that larger displays reduce gender bias. Work on American female programmers suggested that they tinker less than males, but more effectively [203]. But
how much is nature, and how much nurture? Societal factors matter, and US
women who program appear to be more thoughtful, but lower self-esteem and
higher risk-aversion lead them to use fewer features.
Gender has become a controversial topic in psychology research. In the
early 2000s, discussion of male aptitude for computer science was sometimes
in terms of an analysis by Simon Baron-Cohen which gives people separate
scores as systemisers (good at geometry and some kinds of symbolic reasoning) and as empathisers (good at intuiting the emotions of others and
social intelligence generally) [177]. Most men score higher at systematising,
while most women do better at empathising. The correspondence isn’t exact;
a minority of men are better at empathising while a minority of women are
better at systematising. Baron-Cohen’s research is in Asperger’s and autism
spectrum disorder, which he sees as an extreme form of male brain. This
theory gained some traction among geeks who saw an explanation of why
we’re often introverted with more aptitude for understanding things than for
understanding people. If we're born that way, it's not our fault. It also suggests
an explanation for why geek couples often have kids on the spectrum.
Might this explain why men are more interested in computer science than
women, with women consistently taking about a sixth of CS places in the USA
and the UK? But here, we run into trouble. Women make up a third of CS
students in the former communist countries of Poland, Romania and the Baltic
states, while numbers in India are close to equal. Male dominance of software
is also a fairly recent phenomenon. When I started out in the 1970s, there were
almost as many women programmers as men, and many of the pioneers were
women, whether in industry, academia or government. This suggests that
the relevant differences are more cultural than genetic or developmental. The
argument for a ‘male brain / female brain’ explanation has been progressively
undermined by work such as that of Daphna Joel and colleagues who’ve
shown by extensive neuroimaging studies that while there are recognisable
male and female features in brains, the brains of individuals are a mosaic of
both [987]. And although these features are visible in imaging, that does not
mean they’re all laid down at birth: our brains have a lot of plasticity. As with
our muscles, the tissues we exercise grow bigger. Perhaps nothing else might have been expected, given the variance in gender identity, sexual preference,
aggression, empathy and so on that we see all around us.
Other work has shown that gender performance differences are absent in
newborns, and appear round about age 6–7, by which time children have
long learned to distinguish gender and adapt to the social cues all around
them, which are reinforced in developed countries by a tsunami of blue/pink
gendered toys and marketing. (Some believe that women are happier to
work in computing in India because India escaped the home computer
boom in the 1980s and its evolution into gaming.) This is reinforced in later
childhood and adolescence by gender stereotypes that they internalise as
part of their identity; in cultures where girls aren’t supposed to be good
at maths or interested in computers, praise for being ‘good at maths’ can
evoke a stereotype threat (the fear of confirming a negative stereotype about a
group to which one belongs). Perhaps as a result, men react better to personal
praise (‘That was really clever of you!’) while women are motivated better
by performance praise (‘You must have put in a hell of a lot of effort’). So it
may not be surprising that we see a deficit of women in disciplines that praise
genius, such as mathematics. What’s more, similar mechanisms appear to
underlie the poorer academic performance of ethnic groups who have been
stigmatised as non-academic. In short, people are not just born different; we
learn to be different, shaped by power, by cultural attitudes, by expectations
and by opportunities. There are several layers between gene and culture
with emergent behaviour, including the cell and the circuit. So if we want
more effective interventions in the pipeline from school through university
to professional development, we need a better understanding of the underlying neurological and cultural mechanisms. For a survey of this, see Gina
Rippon [1608].
Gender matters at many levels of the stack, from what a product should
do through how it does it. For example, should a car be faster or safer? This
is entangled with social values. Are men better drivers because they win car
races, or are women better drivers because they have fewer insurance claims?
Digging down, we find gendered and cultural attitudes to risk. In US surveys,
risks are judged lower by white people and by men, and on closer study this
is because about 30% of white males judge risks to be extremely low. This
bias is consistent across a wide range of hazards but is particularly strong for
handguns, second-hand cigarette smoke, multiple sexual partners and street
drugs. Asian males show similarly low sensitivity to some hazards, such as
motor vehicles. White males are more trusting of technology, and less of government [693].
We engineers must of course work with the world as it is, not as it might
be if our education system and indeed our culture had less bias; but we must
be alert to the possibility that computer systems discriminate because they are
built by men for men, just like cars and spacesuits. For example, Tyler Moore
and I did an experiment to see whether anti-phishing advice given by banks to
their customers was easier for men to follow than for women, and we found that
indeed it was [1339]. No-one seems to have done much work on gender and
security usability, so there’s an opportunity.
But the problem is much wider. Many systems will continue to be designed
by young fit straight clever men who are white or Asian and may not think
hard or at all about the various forms of prejudice and disability that they do
not encounter directly. You need to think hard about how you mitigate the
effects. It’s not enough to just have your new product tested by a token geek
girl on your development team; you have to think also of the less educated and
the vulnerable – including older people, children and women fleeing abusive
relationships (about which I’ll have more to say later). You really have to think
of the whole stack. Diversity matters in corporate governance, market research,
product design, software development and testing. If you can’t fix the imbalance in dev, you’d better make it up elsewhere. You need to understand your
users; it’s also good to understand how power and culture feed the imbalance.
As many of the factors relevant to group behaviour are of social origin, we
next turn to social psychology.
3.2.3 Social psychology
Social psychology attempts to explain how the thoughts, feelings, and behaviour of individuals are influenced by the actual, imagined, or implied presence of others.
It has many aspects, from the identity that people derive from belonging to
groups – whether of gender, tribe, team, profession or even religion – through
the self-esteem we get by comparing ourselves with others. The results that
put it on the map were three early papers that laid the groundwork for understanding the abuse of authority and its relevance to propaganda, interrogation
and aggression. They were closely followed by work on the bystander effect
which is also highly relevant to crime and security.
3.2.3.1 Authority and its abuse
In 1951, Solomon Asch showed that people could be induced to deny the
evidence of their own eyes in order to conform to a group. Subjects judged
the lengths of lines after hearing wrong opinions from other group members,
who were actually the experimenter’s stooges. Most subjects gave in and
conformed, with only 29% resisting the bogus majority [136].
Stanley Milgram was inspired by the 1961 trial of Nazi war criminal
Adolf Eichmann to investigate how many experimental subjects were prepared to administer severe electric shocks to an actor playing the role of a
‘learner’ at the behest of an experimenter while the subject played the role
of the ‘teacher’ – even when the ‘learner’ appeared to be in severe pain and
begged the subject to stop. This experiment was designed to measure what
proportion of people will obey an authority rather than their conscience.
Most did – Milgram found that consistently over 60% of subjects would do
downright immoral things if they were told to [1314]. This experiment is now
controversial but had real influence on the development of the subject.
The third was the Stanford Prison Experiment, which showed that normal
people can behave wickedly even in the absence of orders. In 1971, experimenter Philip Zimbardo set up a ‘prison’ at Stanford where 24 students were
assigned at random to the roles of 12 warders and 12 inmates. The aim of the
experiment was to discover whether prison abuses occurred because warders
(and possibly prisoners) were self-selecting. However, the students playing the
role of warders rapidly became sadistic authoritarians, and the experiment was
halted after six days on ethical grounds [2076]. This experiment is also controversial now and it’s unlikely that a repeat would get ethical approval today. But
abuse of authority, whether real or ostensible, is a real issue if you are designing
operational security measures for a business.
During the period 1995–2005, a telephone hoaxer calling himself ‘Officer
Scott’ ordered the managers of over 68 stores and restaurants in 32 US states
(including at least 17 McDonald’s stores) to detain some young employee on
suspicion of theft and strip-search them. Various other degradations were
ordered, including beatings and sexual assaults [2036]. A former prison
guard was tried for impersonating a police officer but acquitted. At least 13
people who obeyed the caller and did searches were charged with crimes,
and seven were convicted. McDonald’s got sued for not training its store
managers properly, even years after the pattern of hoax calls was established;
and in October 2007, a jury ordered them to pay $6.1 million to one
of the victims, who had been strip-searched when she was an 18-year-old
employee. It was a nasty case, as she was left by the store manager in the
custody of her boyfriend, who then committed a further indecent assault
on her. The boyfriend got five years, and the manager pleaded guilty to
unlawfully detaining her. McDonald’s argued that she was responsible for
whatever damages she suffered for not realizing it was a hoax, and that the
store manager had failed to apply common sense. A Kentucky jury didn’t
buy this and ordered McDonald’s to pay up. The store manager also sued,
claiming to be another victim of the firm’s negligence to warn her of the hoax,
and got $1.1 million [1090]. So US employers now risk heavy damages if they
fail to train their staff to resist the abuse of authority.
3.2.3.2 The bystander effect
On March 13, 1964, a young lady called Kitty Genovese was stabbed to death
in the street outside her apartment in Queens, New York. The press reported
that thirty-eight separate witnesses had failed to help or even to call the police,
although the assault lasted almost half an hour. Although these reports were
later found to be exaggerated, the crime led to the nationwide 911 emergency
number, and also to research on why bystanders often don’t get involved.
John Darley and Bibb Latané reported experiments in 1968 on what factors
modulated the probability of a bystander helping someone who appeared to
be having an epileptic fit. They found that a lone bystander would help 85%
of the time, while someone who thought that four other people could see the
victim would help only 31% of the time; group size dominated all other effects.
Whether another bystander was male, female or even medically qualified made
essentially no difference [513]. The diffusion of responsibility has visible effects
in many other contexts. If you want something done, you’ll email one person
to ask, not three people. Of course, security is usually seen as something that
other people deal with.
However, if you ever find yourself in danger, the real question is whether
at least one of the bystanders will help, and here the recent research is much
more positive. Lasse Liebst, Mark Levine and others have surveyed CCTV
footage of a number of public conflicts in several countries over the last ten
years, finding that in 9 out of 10 cases, one or more bystanders intervened to
de-escalate a fight, and that the more bystanders intervene, the more successful
they are [1166]. So it would be wrong to assume that bystanders generally pass
by on the other side; the bystander effect's name is rather misleading.
3.2.4 The social-brain theory of deception
Our second big theme, which also fits into social psychology, is the growing
body of research into deception. How does deception work, how can we detect
and measure it, and how can we deter it?
The modern approach started in 1976 with the social intelligence hypothesis.
Until then, anthropologists had assumed that we evolved larger brains in order
to make better tools. But the archaeological evidence doesn’t support this. All
through the paleolithic period, while our brains evolved from chimp size to
human size, we used the same simple stone axes. They only became more
sophisticated in the neolithic period, by which time our ancestors were anatomically modern homo sapiens. So why, asked Nick Humphrey, did we evolve
large brains if we didn’t need them yet? Inspired by observing the behaviour
of both caged and wild primates, his hypothesis was that the primary function
of the intellect was social. Our ancestors didn’t evolve bigger brains to make
better tools, but to use other primates better as tools [936]. This is now supported by a growing body of evidence, and has transformed psychology as a
discipline. Social psychology had been a poor country cousin until then and
was not seen as rigorous; since then, people have realised it was probably the
driving force of cognitive evolution. Almost all intelligent species developed
in a social context. (One exception is the octopus, but even it has to understand
how predators and prey react.)
The primatologist Andy Whiten then collected much of the early evidence
on tactical deception, and recast social intelligence as the Machiavellian brain
hypothesis: we became smart in order to deceive others, and to detect deception too [362]. Not everyone agrees completely with this characterisation, as
the positive aspects of socialisation, such as empathy, also matter. But Hugo
Mercier and Dan Sperber have recently collected masses of evidence that the
modern human brain is more a machine for arguing than anything else [1296].
Our goal is persuasion rather than truth; rhetoric comes first, and logic second.
The second thread coming from the social intellect hypothesis is theory of
mind, an idea due to David Premack and Guy Woodruff in 1978 but developed
by Heinz Wimmer and Josef Perner in a classic 1983 experiment to determine
when children are first able to tell that someone has been deceived [2032]. In
this experiment, the Sally-Anne test, a child sees a sweet hidden under a cup
by Sally while Anne and the child watch. Anne then leaves the room and Sally
switches the sweet to be under a different cup. Anne then comes back and the
child is asked where Anne thinks the sweet is. Normal children get the right
answer from about age five; this is when they acquire the ability to discern
others’ beliefs and intentions. Simon Baron-Cohen, Alan Leslie and Uta Frith
then showed that children on the Aspergers / autism spectrum acquire this
ability significantly later [178].
Many computer scientists and engineers appear to be on the spectrum to
some extent, and we’re generally not as good at deception as neurotypical
people are. This has all sorts of implications! We’re under-represented in politics, among senior executives and in marketing. Oh, and there was a lot less
cybercrime before underground markets brought together geeks who could
write wicked code with criminals who could use it for wicked purposes. Geeks
are also more likely to be whistleblowers; we’re less likely to keep quiet about
an uncomfortable truth just to please others, as we place less value on their
opinions. But this is a complex field. Some well-known online miscreants who
are on the spectrum were hapless more than anything else; Gary McKinnon
claimed to have hacked the Pentagon to discover the truth about flying saucers
and didn’t anticipate the ferocity of the FBI’s response. And other kinds of
empathic deficit are involved in many crimes. Other people with dispositional
empathy deficits include psychopaths who disregard the feelings of others
but understand them well enough to manipulate them, while there are many
people whose deficits are situational, ranging from Nigerian scammers who
think that any white person who falls for their lure must think Africans are
stupid, so they deserve it, right through to soldiers and terrorists who consider
their opponents to be less than human or to be morally deserving of death. I’ll
discuss radicalisation in more detail later in section 26.4.2.
The third thread is self-deception. Robert Trivers argues that we’ve evolved
the ability to deceive ourselves in order to better deceive others: “If deceit is
fundamental in animal communication, then there must be strong selection to
spot deception and this ought, in turn, to select for a degree of self-deception,
rendering some facts and motives unconscious so as to not betray – by the
subtle signs of self-knowledge – the deception being practiced” [906]. We forget
inconvenient truths and rationalise things we want to believe. There may well
be a range of self-deception abilities from honest geeks through to the great
salesmen who have a magic ability to believe completely in their product. But
it’s controversial, and at a number of levels. For example, if Tony Blair really
believed that Iraq had weapons of mass destruction when he persuaded Britain
to go to war in 2003, was it actually a lie? How do you define sincerity? How
can you measure it? And would you even elect a national leader if you expected
that they’d be unable to lie to you? There is a lengthy discussion in [906], and
the debate is linked to other work on motivated reasoning. Russell Golman,
David Hagman and George Loewenstein survey research on how people avoid
information, even when it is free and could lead to better decision-making:
people at risk of illness avoid medical tests, managers avoid information that
might show they made bad decisions, and investors look at their portfolios less
when markets are down [782]. This strand of research goes all the way back
to Sigmund Freud, who described various aspects of the denial of unpleasant
information, including the ways in which we try to minimise our feelings of
guilt for the bad things we do, and to blame others for them.
It also links up with filter-bubble effects on social media. People prefer to listen to others who confirm their beliefs and biases, and this can be analysed in
terms of the hedonic value of information. People think of themselves as honest
and try to avoid the ethical dissonance that results from deviations [173]; criminologists use the term neutralisation to describe the strategies that rule-breakers
use to minimise the guilt that they feel about their actions (there’s an overlap
with both filter effects and self-deception). A further link is to Hugo Mercier
and Dan Sperber’s work on the brain as a machine for argument, which I mentioned above.
The fourth thread is intent. The detection of hostile intent was a big deal in
our ancestral evolutionary environment; in pre-state societies, perhaps a quarter of men and boys die of homicide, and further back many of our ancestors
were killed by animal predators. So we appear to have evolved a sensitivity to
sounds and movements that might signal the intent of a person, an animal or
even a god. As a result, we now spend too much on defending against threats
that involve hostile intent, such as terrorism, and not enough on defending
against epidemic disease, which kills many more people – or climate change,
which could kill even more.
There are other reasons why we might want to think about intent more
carefully. In cryptography, we use logics of belief to analyse the security of
authentication protocols, and to deal with statements such as ‘Alice believes
that Bob believes that Charlie controls the key K’; we’ll come to this in the next
chapter. And now that we realise people use theories of mind to understand each other, philosophers have got engaged too. Dan Dennett derived the
intentional stance in philosophy, arguing that the propositional attitudes we
use when reasoning – beliefs, desires and perceptions – come down to the
intentions of people and animals.
A related matter is socially-motivated reasoning: people do logic much better
if the problem is set in a social role. In the Wason test, subjects are told they have
to inspect some cards with a letter grade on one side, and a numerical code on
the other, and given a rule such as “If a student has a grade D on the front of
their card, then the back must be marked with code 3”. They are shown four
cards displaying (say) D, F, 3 and 7 and then asked “Which cards do you have
to turn over to check that all cards are marked correctly?” Most subjects get this
wrong; in the original experiment, only 48% of 96 subjects got the right answer
of D and 7. However the evolutionary psychologists Leda Cosmides and John
Tooby found the same problem becomes easier if the rule is changed to ‘If a
person is drinking beer, he must be 20 years old’ and the individuals are a beer
drinker, a coke drinker, a 25-year-old and a 16-year-old. Now three-quarters
of subjects deduce that the bouncer should check the age of the beer drinker
and the drink of the 16-year-old [483]. Cosmides and Tooby argue that our
ability to do logic and perhaps arithmetic evolved as a means of policing social
exchanges.
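Since the Wason task is at bottom a question about which observations could falsify a conditional rule, a short sketch may make the logic explicit. This is my own illustration, not from the cited study; the card values are the ones used above.

# Rule: "if a card has grade D on one side, it has code 3 on the other".
# Only two kinds of visible face can falsify it: the grade D itself
# (the back might not be 3) and a code other than 3 (the back might be D).

ANTECEDENT = "D"    # the grade that triggers the rule
CONSEQUENT = "3"    # the code the rule requires on the back

def must_turn(visible_face: str) -> bool:
    if visible_face.isalpha():
        # A grade side: only the antecedent can falsify the rule.
        return visible_face == ANTECEDENT
    # A code side: only a value other than the required one can falsify it.
    return visible_face != CONSEQUENT

print([card for card in ("D", "F", "3", "7") if must_turn(card)])
# -> ['D', '7']; most subjects turn D and 3 instead, missing the 7

Recasting the same check in terms of drinkers and ages changes nothing in the logic, only in how easily people see it.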
The next factor is minimisation – the process by which people justify bad
actions or make their harm appear to be less. I mentioned Nigerian scammers
who think that white people who fall for their scams must be racist, so they
deserve it; there are many more examples of scammers working up reasons
why their targets are fair game. The criminologist Donald Cressey developed a
Fraud Triangle theory to explain the factors that lead to fraud: as well as motive
and opportunity, there must be a rationalisation. People may feel that their
employer has underpaid them so it’s justifiable to fiddle expenses, or that the
state is wasting money on welfare when they cheat on their taxes. Minimisation is very common in cybercrime. Kids operating DDoS-for-hire services
reassured each other that offering a ‘web stresser’ service was legal, and said on
their websites that the service could only be used for legal purposes. So undermining minimisation can work as a crime-fighting tool. The UK National Crime
Agency bought Google ads to ensure that anyone searching for a web stresser
service would see an official warning that DDoS was a crime. A mere £3,000
spent between January and June 2018 suppressed demand growth; DDoS revenues remained constant in the UK while they grew in the USA [457].
Finally, the loss of social context is a factor in online disinhibition. People
speak more frankly online, and this has both positive and negative effects.
Shy people can find partners, but we also see vicious flame wars. John Suler
analyses the factors as anonymity, invisibility, asynchronicity and the loss of
symbols of authority and status; in addition there are effects relating to psychic boundaries and self-imagination which lead us to drop our guard and
express feelings from affection to aggression that we normally rein in for social
reasons [1849].
Where all this leads is that the nature and scale of online deception can be
modulated by suitable interaction design. Nobody is as happy as they appear
on Facebook, as attractive as they appear on Instagram or as angry as they
appear on Twitter. They let their guard down on closed groups such as those
supported by WhatsApp, which offer neither celebrity to inspire performance,
nor anonymity to promote trolling. However, people are less critical in closed
groups, which makes them more suitable for spreading conspiracy theories,
and for radicalisation [523].
3.2.5 Heuristics, biases and behavioural economics
One field of psychology that has been applied by security researchers since the
mid-2000s has been decision science, which sits at the boundary of psychology
and economics and studies the heuristics that people use, and the biases
that influence them, when making decisions. It is also known as behavioural
economics, as it examines the ways in which people’s decision processes depart
from the rational behaviour modeled by economists. An early pioneer was
Herb Simon – both an early computer scientist and a Nobel-prizewinning
economist – who noted that classical rationality meant doing whatever maximizes your expected utility regardless of how hard that choice is to compute.
So how would people behave in a realistic world of bounded rationality? The
real limits to human rationality have been explored extensively in the years
since, and Daniel Kahneman won the Nobel prize in economics in 2002 for his
major contributions to this field (along with the late Amos Tversky) [1006].
3.2.5.1 Prospect theory and risk misperception
Kahneman and Tversky did extensive experimental work on how people made
decisions faced with uncertainty. They first developed prospect theory which
models risk appetite: in many circumstances, people dislike losing $100 they
already have more than they value winning $100. Framing an action as avoiding a loss can make people more likely to take it; phishermen hook people by
sending messages like ‘Your PayPal account has been frozen, and you need to
click here to unlock it.’ We’re also bad at calculating probabilities, and use all
sorts of heuristics to help us make decisions:
we often base a judgment on an initial guess or comparison and then
adjust it if need be – the anchoring effect;
we base inferences on the ease of bringing examples to mind – the
availability heuristic, which was OK for lion attacks 50,000 years ago but
gives the wrong answers when mass media bombard us with images of
terrorism;
we’re more likely to be sceptical about things we’ve heard than about
things we’ve seen, perhaps as we have more neurons processing vision;
we worry too much about events that are very unlikely but have very
bad consequences;
we’re more likely to believe things we’ve worked out for ourselves
rather than things we’ve been told.
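One common way to write down the loss-aversion idea is the prospect-theory value function; the functional form and parameter estimates below are the ones usually quoted from Tversky and Kahneman's later cumulative-prospect-theory work, not figures given in this chapter.

v(x) =
  \begin{cases}
    x^{\alpha} & \text{for gains } (x \ge 0) \\
    -\lambda\,(-x)^{\beta} & \text{for losses } (x < 0)
  \end{cases}
\qquad \alpha \approx \beta \approx 0.88, \quad \lambda \approx 2.25

With the loss-aversion coefficient λ well above 1, a $100 loss looms more than twice as large as a $100 gain, which is why 'your account has been frozen' hooks more people than 'click here to win $100'.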
Behavioral economics is not just relevant to working out how likely people
are to click on links in phishing emails, but to the much deeper problem of the
perception of risk. Many people perceive terrorism to be a much worse threat
than epidemic disease, road traffic accidents or even food poisoning: this is
wrong, but hardly surprising to a behavioural economist. We overestimate the
small risk of dying in a terrorist attack not just because it’s small but because
of the visual effect of the 9/11 TV coverage, the ease of remembering the event,
the outrage of an enemy attack, and the effort we put into thinking and worrying about it. (There are further factors, which we’ll explore in Part 3 when we
discuss terrorism.)
The misperception of risk underlies many other public-policy problems. The
psychologist Daniel Gilbert, in an article provocatively entitled ‘If only gay sex
caused global warming’, compares our fear of terrorism with our fear of climate
change. First, we evolved to be much more wary of hostile intent than of nature;
100,000 years ago, a man with a club (or a hungry lion) was a much worse threat
than a thunderstorm. Second, global warming doesn’t violate anyone’s moral
sensibilities; third, it’s a long-term threat rather than a clear and present danger;
and fourth, we’re sensitive to rapid changes in the environment rather than
slow ones [765]. There are many more risk biases: we are less afraid when we’re
in control, such as when driving a car, as opposed to being a passenger in a car
or airplane; and we are more afraid of uncertainty, that is, when the magnitude
of the risk is unknown (even when it’s small) [1674, 1678]. We also indulge in
satisficing which means we go for an alternative that’s ‘good enough’ rather
than going to the trouble of trying to work out the odds perfectly, especially
for small transactions. (The misperception here is not that of the risk taker, but
of the economists who ignored the fact that real people include transaction
costs in their calculations.)
So, starting out from the folk saying that a bird in the hand is worth two in
the bush, we can develop quite a lot of machinery to help us understand and
model people’s attitudes towards risk.
3.2.5.2 Present bias and hyperbolic discounting
Saint Augustine famously prayed ‘Lord, make me chaste, but not yet.’ We find a
similar sentiment with applying security updates, where people may pay more
attention to the costs as they’re immediate and determinate in time, storage
and bandwidth, than the unpredictable future benefits. This present bias causes
many people to decline updates, which was the major source of technical vulnerability online for many years. One way software companies pushed back
was by allowing people to delay updates: Windows has ‘restart / pick a time /
snooze’. Reminders cut the ignore rate from about 90% to about 34%, and may
ultimately double overall compliance [726]. A better design is to make updates
so painless that they can be made mandatory, or nearly so; this is the approach
now followed by some web browsers, and by cloud-based services generally.
Hyperbolic discounting is a model used by decision scientists to quantify
present bias. Intuitive reasoning may lead people to use utility functions
that discount the future so deeply that immediate gratification seems to be
the best course of action, even when it isn’t. Such models have been applied
to try to explain the privacy paradox – why people say in surveys that they
care about privacy but act otherwise online. I discuss this in more detail in
section 8.67: other factors, such as uncertainty about the risks and about the
efficacy of privacy measures, play a part too. Taken together, the immediate
and determinate positive utility of getting free stuff outweighs the random
future costs of disclosing too much personal information, or disclosing it to
dubious websites.
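The standard hyperbolic model values an amount A received after delay D at A/(1 + kD), for some individual impatience parameter k. A minimal sketch – my own illustration, with made-up parameter values – shows the preference reversal this produces and which exponential discounting cannot:

def hyperbolic(amount: float, delay_days: float, k: float = 0.1) -> float:
    """Perceived present value of 'amount' received after 'delay_days'."""
    return amount / (1 + k * delay_days)

def exponential(amount: float, delay_days: float, daily: float = 0.99) -> float:
    return amount * daily ** delay_days

for t in (0, 90):
    # $100 after t days versus $110 a week later than that
    print(t,
          round(hyperbolic(100, t), 1), round(hyperbolic(110, t + 7), 1),
          round(exponential(100, t), 1), round(exponential(110, t + 7), 1))
# Hyperbolic: 100.0 beats 64.7 today, but 10.0 loses to 10.3 at t = 90 days,
# so the choice flips as both rewards recede into the future: present bias.
# Exponential: the later, larger reward wins at both delays; no reversal.

Security updates present exactly this choice: a small, certain cost now against a larger, uncertain benefit later.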
3.2.5.3 Defaults and nudges
This leads to the importance of defaults. Many people usually take the easiest path and use the standard configuration of a system, as they assume it
will be good enough. In 2009, Richard Thaler and Cass Sunstein wrote a bestseller ‘Nudge’ exploring this, pointing out that governments can achieve many
policy goals without infringing personal liberty simply by setting the right
defaults [1879]. For example, if a firm’s staff are enrolled in a pension plan
by default, most will not bother to opt out, while if it’s optional most will not
bother to opt in. A second example is that many more organs are made available for transplant in Spain, where the law lets a dead person’s organs be used
unless they objected, than in Britain where donors have to consent actively.
A third example is that tax evasion can be cut by having the taxpayer declare
that the information in the form is true when they start to fill it out, rather than
at the end. The set of choices people have to make, the order in which they
make them, and the defaults if they do nothing, are called the choice architecture. Sunstein got a job in the Obama administration implementing some of these ideas, while Thaler won the 2017 Nobel prize in economics.
Defaults matter in security too, but often they are set by an adversary so as to
trip you up. For example, Facebook defaults to fairly open information sharing,
and whenever enough people have figured out how to increase their privacy
settings, the architecture is changed so you have to opt out all over again. This
exploits not just hazardous defaults but also the control paradox – providing
the illusion of control causes people to share more information. We like to feel
in control; we feel more comfortable driving in our cars than letting someone
else fly us in an airplane – even if the latter is an order of magnitude safer. “Privacy control settings give people more rope to hang themselves,” as behavioral
economist George Loewenstein puts it. “Facebook has figured this out, so they
give you incredibly granular controls.” [1536]
3.2.5.4 The default to intentionality
Behavioral economists follow a long tradition in psychology of seeing the
mind as composed of interacting rational and emotional components – ‘heart’
and ‘head’, or ‘affective’ and ‘cognitive’ systems. Studies of developmental
biology have shown that, from an early age, we have different mental processing systems for social phenomena (such as recognising parents and siblings)
and physical phenomena. Paul Bloom argues that the tension between them
explains why many people believe that mind and body are basically different [269]. Children try to explain what they see using physics, but when their
understanding falls short, they explain phenomena in terms of intentional
action. This has survival value to the young, as it disposes them to get advice
from parents or other adults about novel natural phenomena. Bloom suggests
that it has an interesting side effect: it predisposes humans to believe that
body and soul are different, and thus lays the ground for religious belief. This
argument may not overwhelm the faithful (who will retort that Bloom simply
stumbled across a mechanism created by the Intelligent Designer to cause us
to have faith in Him). But it may have relevance for the security engineer.
First, it goes some way to explaining the fundamental attribution error – people
often err by trying to explain things from intentionality rather than from context. Second, attempts to curb phishing by teaching users about the gory design
details of the Internet – for example, by telling them to parse URLs in emails
that seem to come from a bank – will be of limited value once they get bewildered. If the emotional is programmed to take over whenever the rational runs
out, then engaging in a war of technical instruction and counter-instruction
with the phishermen is unsound, as they’ll be better at it. Safe defaults would
be better.
3.2.5.5 The affect heuristic
Nudging people to think in terms of intent rather than of mechanism can
exploit the affect heuristic, explored by Paul Slovic and colleagues [1791]. The
idea is that while the human brain can handle multiple threads of cognitive
processing, our emotions remain resolutely single-threaded, and they are
even less good at probability theory than the rational part of our brains. So by
making emotion salient, a marketer or a fraudster can try to get you to answer
questions using emotion rather than reason, and using heuristics rather than
calculation. A common trick is to ask an emotional question (whether ‘How
many dates did you have last month?’ or even ‘What do you think of President
Trump?’) to make people insensitive to probability.
So it should not surprise anyone that porn websites have been used to install
a lot of malware – as have church websites, which are often poorly maintained
and easy to hack. Similarly, events that evoke a feeling of dread – from cancer
to terrorism – not only scare people more than the naked probabilities justify,
but also make those probabilities harder to calculate, and deter people from
even making the effort.
Other factors that can reinforce our tendency to explain things by intent
include cognitive overload, where the rational part of the brain simply gets
tired. Our capacity for self-control is also liable to fatigue, both physical and
mental; some mental arithmetic will increase the probability that we’ll pick up
a chocolate rather than an apple. So a bank that builds a busy website may be
able to sell more life insurance, but it’s also likely to make its customers more
vulnerable to phishing.
3.2.5.6 Cognitive dissonance
Another interesting offshoot of social psychology is cognitive dissonance theory. People are uncomfortable when they hold conflicting views; they seek out
information that confirms their existing views of the world and of themselves,
and try to reject information that conflicts with their views or might undermine
their self-esteem. One practical consequence is that people are remarkably able
to persist in wrong courses of action in the face of mounting evidence that
things have gone wrong [1866]. Admitting to yourself or to others that you
were duped can be painful; hustlers know this and exploit it. A security professional should ‘feel the hustle’ – that is, be alert for a situation in which recently
established social cues and expectations place you under pressure to ‘just do’
something about which you’d normally have reservations. That’s the time to
step back and ask yourself whether you’re being had. But training people to
perceive this is hard enough, and getting the average person to break the social
flow and say ‘stop!’ is hard. There have been some experiments, for example
with training health-service staff to not give out health information on the
phone, and training people in women’s self-defence classes to resist demands
for extra personal information. The problem with mainstreaming such training
is that the money available for it is orders of magnitude less than the marketing
budgets of the firms whose business model is to hustle their customers.
3.2.5.7 The risk thermostat
Some interesting empirical work has been done on how people manage their
exposure to risk. John Adams studied mandatory seat belt laws, and established that they don’t actually save lives: they just transfer casualties from
vehicle occupants to pedestrians and cyclists [20]. Seat belts make drivers feel
safer, so they drive faster in order to bring their perceived risk back up to its
previous level. He calls this a risk thermostat and the model is borne out in other
applications too [19]. The lesson is that testing needs to have ecological validity: you need to evaluate the effect of a proposed intervention in as realistic a
setting as possible.
3.3 Deception in practice
This takes us from the theory to the practice. Deception often involves an abuse
of the techniques developed by compliance professionals – those people whose
job it is to get other people to do things. While a sales executive might dazzle
you with an offer of a finance plan for a holiday apartment, a police officer
might nudge you by their presence to drive more carefully, a park ranger
might tell you to extinguish campfires carefully and not feed the bears, and a
corporate lawyer might threaten you into taking down something from your
website.
The behavioural economics pioneer and apostle of ‘nudge’, Dick Thaler,
refers to the selfish use of behavioural economics as 'sludge' [1878]. But it's odd that economists ever thought the altruistic use of such techniques would be more common than the selfish one. Not only do marketers
push the most profitable option rather than the best value, but they use every
other available trick too. Stanford’s Persuasive Technology Lab has been at the
forefront of developing techniques to keep people addicted to their screens,
and one of their alumni, ex-Googler Tristan Harris, has become a vocal critic.
Sometimes dubbed 'Silicon Valley's conscience', he explains how tech earns its
money by manipulating not just defaults but choices, and asks how this can be
done ethically [868]. Phones and other screens present menus and thus control
choices, but there’s more to it than that. Two techniques that screens have
made mainstream are the casino’s technique of using intermittent variable
rewards to create addiction (we check our phones 150 times a day to see if
someone has rewarded us with attention) and bottomless message feeds (to
keep us consuming even when we aren’t hungry any more). But there are
many older techniques that predate computers.
3.3.1 The salesman and the scamster
Deception is the twin brother of marketing, so one starting point is the huge
literature about sales techniques. One eminent writer is Robert Cialdini, a psychology professor who took summer jobs selling everything from used cars to
home improvements and life insurance in order to document the tricks of the
trade. His book ‘Influence: Science and Practice’ is widely read by sales professionals and describes six main classes of technique used to influence people
and close a sale [426].
These are:
1. Reciprocity: most people feel the need to return favours;
2. Commitment and consistency: people suffer cognitive dissonance if they
feel they’re being inconsistent;
3. Social proof: most people want the approval of others. This
means following others in a group of which they’re a member, and the smaller the group the stronger the pressure;
4. Liking: most people want to do what a good-looking or otherwise likeable person asks;
5. Authority: most people are deferential to authority figures (recall the
Milgram study mentioned above);
6. Scarcity: we’re afraid of missing out, if something we might want could
suddenly be unavailable.
All of these are psychological phenomena that are the subject of continuing research. They are also traceable to pressures in our ancestral evolutionary
environment, where food scarcity was a real threat, strangers could be dangerous and group solidarity against them (and in the provision of food and shelter)
was vital. All are used repeatedly in the advertising and other messages we
encounter constantly.
Frank Stajano and Paul Wilson built on this foundation to analyse the principles behind scams. Wilson researched and appeared in nine seasons of TV
programs on the most common scams – ‘The Real Hustle’ – where the scams
would be perpetrated on unsuspecting members of the public, who would then
be given their money back, debriefed and asked permission for video footage
to be used on TV. The know-how from experimenting with several hundred
frauds on thousands of marks over several years was distilled into the following seven principles [1823].
1. Distraction – the fraudster gets the mark to concentrate on the
wrong thing. This is at the heart of most magic performances.
2. Social compliance – society trains us not to question people who
seem to have authority, leaving people vulnerable to conmen
who pretend to be from their bank or from the police.
3. The herd principle – people let their guard down when everyone
around them appears to share the same risks. This is a mainstay of the
three-card trick, and a growing number of scams on social networks.
4. Dishonesty – if the mark is doing something dodgy, they’re less likely
to complain. Many are attracted by the idea that ‘you’re getting a
good deal because it’s illegal’, and whole scam families – such as
the resale of fraudulently obtained plane tickets – turn on this.
5. Kindness – this is the flip side of dishonesty, and an adaptation of
Cialdini’s principle of reciprocity. Many social engineering scams
rely on the victims’ helpfulness, from tailgating into a building
to phoning up with a sob story to ask for a password reset.
6. Need and greed – sales trainers tell us we should find what someone
really wants and then show them how to get it. A good fraudster
can help the mark dream a dream and use this to milk them.
7. Time pressure – this causes people to act viscerally rather than stopping
to think. Normal marketers use this all the time (‘only 2 seats left at this
price’); so do crooks.
The relationship with Cialdini’s principles should be obvious. A cynic might
say that fraud is just a subdivision of marketing; or perhaps that, as marketing
becomes ever more aggressive, it comes to look ever more like fraud. When we
investigated online accommodation scams we found it hard to code detectors,
since many real estate agents use the same techniques. In fact, the fraudsters’
behaviour was already well described by Cialdini's model, except that the scamsters added appeals to sympathy, arguments to establish their own credibility,
and ways of dealing with objections [2065]. (These are also found elsewhere in
the regular marketing literature.)
Oh, and we find the same in software, where there’s a blurry dividing
line between illegal malware and just-about-legal ‘Potentially Unwanted
Programs’ (PUPs) such as browser plugins that replace your ads with different
ones. One good distinguisher seems to be technical: malware is distributed
by many small botnets because of the risk of arrest, while PUPs are mostly
distributed by one large network [956]. But crooks use regular marketing
channels too: Ben Edelman found in 2006 that while 2.73% of companies
ranked top in a web search were bad, 4.44% of companies that appeared
alongside in the search ads were bad [612]. Bad companies were also more
likely to exhibit cheap trust signals, such as TRUSTe privacy certificates on
their websites. Similarly, bogus landlords often send reference letters or even
copies of their ID to prospective tenants, something that genuine landlords
never do.
And then there are the deceptive marketing practices of ‘legal’ businesses.
To take just one of many studies, a 2019 crawl of 11K shopping websites
by Arunesh Mathur and colleagues found 1,818 instances of ‘dark patterns’ – manipulative marketing practices such as hidden subscriptions,
hidden costs, pressure selling, sneak-into-basket tactics and forced account
opening. Of these at least 183 were clearly deceptive [1244]. What’s more,
the bad websites were among the most popular; perhaps a quarter to a third
of websites you visit, weighted by traffic, try to hustle you. This constant
pressure from scams that lie just short of the threshold for a fraud prosecution
has a chilling effect on trust generally. People are less likely to believe security
warnings if they are mixed with marketing, or smack of marketing in any
way. And we even see some loss of trust in software updates; people say
in surveys that they’re less likely to apply a security-plus-features upgrade
than a security patch, though the field data on upgrades don’t (yet) show any
difference [1594].
3.3.2 Social engineering
Hacking systems through the people who operate them is not new. Military
and intelligence organisations have always targeted each other’s staff; most of
the intelligence successes of the old Soviet Union were of this kind [119]. Private
investigation agencies have not been far behind.
Investigative journalists, private detectives and fraudsters developed the
false-pretext phone call into something between an industrial process and an
art form in the latter half of the 20th century. An example of the industrial
process was how private detectives tracked people in Britain. Given that the
country has a National Health Service with which everyone’s registered, the
trick was to phone up someone with access to the administrative systems in
the area you thought the target was, pretend to be someone else in the health
service, and ask. Colleagues of mine did an experiment in England in 1996
where they trained the staff at a local health authority to identify and report
such calls1. They detected about 30 false-pretext calls a week, which would
scale to 6000 a week or 300,000 a year for the whole of Britain. That eventually
got sort-of fixed but it took over a decade. The real fix wasn’t the enforcement
of privacy law, but that administrators simply stopped answering the phone.
Another old scam from the 20th century is to steal someone’s ATM card and
then phone them up pretending to be from the bank asking whether their card’s
been stolen. On hearing that it has, the conman says ‘We thought so. Please just
tell me your PIN now so I can go into the system and cancel your card.’ The
most rapidly growing recent variety is the ‘authorised push payment’, where
the conman again pretends to be from the bank, and persuades the customer to
make a transfer to another account, typically by confusing the customer about
the bank’s authentication procedures, which most customers find rather mysterious anyway2 .
As for art form, one of the most disturbing security books ever published is
Kevin Mitnick’s ‘Art of Deception’. Mitnick, who was arrested and convicted
for breaking into US phone systems, related after his release from prison how
almost all of his exploits had involved social engineering. His typical hack was
to pretend to a phone company employee that he was a colleague, and solicit
‘help’ such as a password. Ways of getting past a company’s switchboard and
winning its people’s trust are a staple of sales-training courses, and hackers
apply these directly. A harassed system administrator is called once or twice
on trivial matters by someone claiming to be the CEO’s personal assistant; once
this idea has been accepted, the caller demands a new password for the boss.
Mitnick became an expert at using such tricks to defeat company security procedures, and his book recounts a fascinating range of exploits [1327].
Social engineering became world headline news in September 2006 when it
emerged that Hewlett-Packard chairwoman Patricia Dunn had hired private
investigators who used pretexting to obtain the phone records of other board
members of whom she was suspicious, and of journalists she considered hostile. She was forced to resign. The detectives were convicted of fraudulent wire
communications and sentenced to do community service [139]. In the same
1 The story is told in detail in chapter 9 of the second edition of this book, available free online.
2 Very occasionally, a customer can confuse the bank; a 2019 innovation was the 'callhammer' attack, where someone phones up repeatedly to 'correct' the spelling of 'his name' and changes it one character at a time into another one.
year, the UK privacy authorities prosecuted a private detective agency that did
pretexting jobs for top law firms [1140].
Amid growing publicity about social engineering, there was an audit of the
IRS in 2007 by the Treasury Inspector General for Tax Administration, whose
staff called 102 IRS employees at all levels, asked for their user IDs, and told
them to change their passwords to a known value; 62 did so. What’s worse,
this happened despite similar audit tests in 2001 and 2004 [1676]. Since then, a
number of audit firms have offered social engineering as a service; they phish
their audit clients to show how easy it is. Since the mid-2010s, opinion has
shifted against this practice, as it causes a lot of distress to staff without changing behaviour very much.
Social engineering isn’t limited to stealing private information. It can also
be about getting people to believe bogus public information. The quote from
Bruce Schneier at the head of this chapter appeared in a report of a stock scam,
where a bogus press release said that a company’s CEO had resigned and its
earnings would be restated. Several wire services passed this on, and the stock
dropped 61% until the hoax was exposed [1673]. Fake news of this kind has
been around forever, but the Internet has made it easier to promote and social
media seem to be making it ubiquitous. We’ll revisit this issue when I discuss
censorship in section 26.4.
3.3.3 Phishing
While phone-based social engineering was the favoured tactic of the 20th
century, online phishing seems to have replaced it as the main tactic of the
21st. The operators include both criminals and intelligence agencies, while the
targets are both your staff and your customers. It is difficult enough to train
your staff; training the average customer is even harder. They’ll assume you’re
trying to hustle them, ignore your warnings and just figure out the easiest
way to get what they want from your system. And you can’t design simply
for the average. If your systems are not safe to use by people who don’t speak
English well, or who are dyslexic, or who have learning difficulties, you are
asking for serious legal trouble. So the easiest way to use your system had
better be the safest.
The word ‘phishing’ appeared in 1996 in the context of the theft of AOL passwords. By then, attempts to crack email accounts to send spam had become
common enough for AOL to have a ‘report password solicitation’ button on its
web page; and the first reference to ‘password fishing’ is in 1990, in the context
of people altering terminal firmware to collect Unix logon passwords [445].
Also in 1996, Tony Greening reported a systematic experimental study: 336
computer science students at the University of Sydney were sent an email message asking them to supply their password on the pretext that it was required
to ‘validate’ the password database after a suspected break-in. 138 of them
returned a valid password. Some were suspicious: 30 returned a plausible-looking but invalid password, while over 200 changed their passwords without
official prompting. But very few of them reported the email to authority [813].
Phishing attacks against banks started seven years later in 2003, with
half-a-dozen attempts reported [443]. The early attacks imitated bank websites, but were both crude and greedy; the attackers asked for all sorts of
information such as ATM PINs, and their emails were also written in poor
English. Most customers smelt a rat. By about 2008, the attackers learned to
use better psychology; they often reused genuine bank emails, with just the
URLs changed, or sent an email saying something like ‘Thank you for adding
a new email address to your PayPal account’ to provoke the customer to log
on to complain that they hadn’t. Of course, customers who used the provided
link rather than typing in www.paypal.com or using an existing bookmark
would get their accounts emptied. By then phishing was being used by state
actors too; I described in section 2.2.2 how Chinese intelligence compromised
the Dalai Lama’s private office during the 2008 Olympic games. They used
crimeware tools that were originally used by Russian fraud gangs, which they
seemed to think gave them some deniability afterwards.
Fraud losses grew rapidly but stabilised by about 2015. A number of countermeasures helped bring things under control, including more complex logon
schemes (using two-factor authentication, or its low-cost cousin, the request for
some random letters of your password); a move to webmail systems that filter
spam better; and back-end fraud engines that look for cashout patterns. The
competitive landscape was rough, in that the phishermen would hit the easiest targets at any time in each country, both in terms of stealing their customer
credentials and using their accounts to launder stolen funds. Concentrated
losses caused the targets to wake up and take action. Since then, we’ve seen
large-scale attacks on non-financial firms like Amazon; in the late 2000s, the
crook would change your email and street address, then use your credit card to
order a wide-screen TV. Since about 2016, the action has been in gift vouchers.
As we noted in the last chapter, phishing is also used at scale by botmasters
to recruit new machines to their botnets, and in targeted ways both by crooks
aiming at specific people or firms, and by intelligence agencies. There’s a big
difference between attacks conducted at scale, where the economics dictate that
the cost of recruiting a new machine to a botnet can be at most a few cents, and
targeted attacks, where spies can spend years trying to hack the phone of a
rival head of government, or a fraudster can spend weeks or months of effort
stalking a chief financial officer in the hope of a large payout. The lures and
techniques used are different, even if the crimeware installed on the target’s
laptop or phone comes from the same stable. Cormac Herley argues that this
gulf between the economics of targeted crime and volume crime is one of the
reasons why cybercrime isn’t much worse than it is [889]. After all, given that
we depend on computers, and that all computers are insecure, and that there
are attacks all the time, how come civilisation hasn’t collapsed? Cybercrime
can’t always be as easy as it looks.
Another factor is that it takes time for innovations to be developed and disseminated. We noted that it took seven years for the bad guys to catch up with
Tony Greening’s 1995 phishing work. As another example, a 2007 paper by
Tom Jagatic and colleagues showed how to make phishing much more effective by automatically personalising each phish using context mined from the
target’s social network [973]. I cited that in the second edition of this book, and
in 2016 we saw it in the wild: a gang sent hundreds of thousands of phish with
US and Australian banking Trojans to individuals working in finance departments of companies, with their names and job titles apparently scraped from
LinkedIn [1299]. This seems to have been crude and hasn’t really caught on,
but once the bad guys figure it out we may see spear-phishing at scale in the
future, and it’s interesting to think of how we might respond. The other personalised bulk scams we see are blackmail attempts where the victims get email
claiming that their personal information has been compromised and including
a password or the last four digits of a credit card number as evidence, but the
yield from such scams seems to be low.
As I write, crime gangs have been making ever more use of spear-phishing
in targeted attacks on companies where they install ransomware, steal gift
coupons and launch other scams. In 2020, a group of young men hacked Twitter, where over a thousand employees had access to internal tools that enabled
them to take control of user accounts; the gang sent bitcoin scam tweets from
the accounts of such well-known users as Bill Gates, Barack Obama and Elon
Musk [1294]. They appear to have honed their spear-phishing skills on SIM
swap fraud, which I’ll discuss later in sections 3.4.1 and 12.7.4. The spread
of such ‘transferable skills’ among crooks is similar in many ways to the
adoption of mainstream technology.
3.3.4 Opsec
Getting your staff to resist attempts by outsiders to inveigle them into revealing secrets, whether over the phone or online, is known in military circles as
operational security or opsec. Protecting really valuable secrets, such as unpublished financial data, not-yet-patented industrial research and military plans,
depends on limiting the number of people with access, and also on doctrines
about what may be discussed with whom and how. It’s not enough for rules to
exist; you have to train the staff who have access, explain the reasons behind
the rules, and embed them socially in the organisation. In our medical privacy
case, we educated health service staff about pretext calls and set up a strict
callback policy: they would not discuss medical records on the phone unless
they had called a number they had got from the health service internal phone
book rather than from a caller. Once the staff have detected and defeated a few
false-pretext calls, they talk about it and the message gets embedded in the way
everybody works.
Another example comes from a large Silicon Valley service firm, which
suffered intrusion attempts when outsiders tailgated staff into buildings on
campus. Stopping this with airport-style ID checks, or even card-activated
turnstiles, would have changed the ambience and clashed with the culture. The
solution was to create and embed a social rule that when someone holds open
a building door for you, you show them your badge. The critical factor, as with
the bogus phone calls, is social embedding rather than just training. Often the
hardest people to educate are the most senior; in my own experience in banking, the people you couldn’t train were those who were paid more than you,
such as traders in the dealing rooms. The service firm in question did better, as
its CEO repeatedly stressed the need to stop tailgating at all-hands meetings.
Some opsec measures are common sense, such as not throwing sensitive
papers in the trash, or leaving them on desks overnight. (One bank at which
I worked had the cleaners move all such papers to the departmental manager’s desk.) Less obvious is the need to train the people you trust. A leak
of embarrassing emails that appeared to come from the office of UK Prime
Minister Tony Blair and was initially blamed on ‘hackers’ turned out to have
been fished out of the trash at his personal pollster’s home by a private
detective [1210].
People operate systems however they have to, and this usually means breaking some of the rules in order to get their work done. Research shows that
company staff have only so much compliance budget, that is, they’re only prepared to put so many hours a year into tasks that are not obviously helping
them achieve their goals [197]. You need to figure out what this budget is, and
use it wisely. If there’s some information you don’t want your staff to be tricked
into disclosing, it’s safer to design systems so that they just can’t disclose it, or
at least so that disclosures involve talking to other staff members or jumping
through other hoops.
But what about a firm’s customers? There is a lot of scope for phishermen to
simply order bank customers to reveal their security data, and this happens at
scale, against both retail and business customers. There are also the many small
scams that customers try on when they find vulnerabilities in your business
processes. I’ll discuss both types of fraud further in the chapter on banking
and bookkeeping.
3.3.5 Deception research
Finally, a word on deception research. Since 9/11, huge amounts of money have
been spent by governments trying to find better lie detectors, and deception
researchers are funded across about five different subdisciplines of psychology.
The polygraph measures stress via heart rate and skin conductance; it has been
around since the 1920s and is used by some US states in criminal investigations,
as well as by the Federal government in screening people for Top Secret clearances. The evidence on its effectiveness is patchy at best, and surveyed extensively by Aldert Vrij [1974]. While it can be an effective prop in the hands of a
skilled interrogator, the key factor is the skill rather than the prop. When used
by unskilled people in a lab environment, against experimental subjects telling
low-stakes lies, its output is little better than random. As well as measuring
stress via skin conductance, you can measure distraction using eye movements
and guilt by upper body movements. In a research project with Sophie van
der Zee, we used body motion-capture suits and also the gesture-recognition
cameras in an Xbox and got slightly better results than a polygraph [2066].
However such technologies can at best augment the interrogator’s skill, and
claims that they work well should be treated as junk science. Thankfully, the
government dream of an effective interrogation robot is some way off.
A second approach to dealing with deception is to train a machine-learning
classifier on real customer behaviour. This is what credit-card fraud engines
have been doing since the late 1990s, and recent research has pushed into other
fields too. For example, Noam Brown and Tuomas Sandholm have created
a poker-playing bot called Pluribus that beat a dozen expert players over a
12-day marathon of 10,000 hands of Texas Hold ’em. It doesn’t use psychology
but game theory, playing against itself millions of times and tracking regret at
bids that could have given better outcomes. That it can consistently beat experts
without access to ‘tells’ such as its opponents’ facial gestures or body language
is itself telling. Dealing with deception using statistical machine learning rather
than physiological monitoring may also be felt to intrude less into privacy.
3.4 Passwords
The management of passwords gives an instructive context in which usability,
applied psychology and security meet. Passwords have been one of the
biggest practical problems facing security engineers since perhaps the 1970s.
In fact, as the usability researcher Angela Sasse puts it, it’s hard to think of a
worse authentication mechanism than passwords, given what we know about
human memory: people can’t remember infrequently-used or frequentlychanged items; we can’t forget on demand; recall is harder than recognition;
and non-meaningful words are more difficult.
To place the problem in context, most passwords you’re asked to set are
not for your benefit but for somebody else’s. The modern media ecosystem
is driven by websites seeking to maximise both their page views and their registered user bases so as to maximise their value when they are sold. That’s why,
when you’re pointed to a news article that’s so annoying you feel you have to
leave a comment, you find you have to register. Click, and there’s a page of
ads. Fill out the form with an email address and submit. Got the CAPTCHA
wrong, so do it again and see another page of ads. Click on the email link, and
see a page with another ad. Now you can add a comment that nobody will ever
read. In such circumstances you're better off typing random garbage and letting the browser remember it; or better still, don't bother. Even major news sites use
passwords against the reader’s interest, for example by limiting the number of
free page views you get per month unless you register again with a different
browser. This ecosystem is described in detail by Ryan Holiday [915].
Turning now to the more honest uses, the password system used by a big
modern service firm has a number of components.
1. The visible part is the logon page, which asks you to choose a
password when you register and probably checks its strength in
some way. It later asks for this password whenever you log on.
2. There will be recovery mechanisms that enable you to deal
with a forgotten password or even a compromised account,
typically by asking further security questions, or via your primary email account, or by sending an SMS to your phone.
3. Behind this lie technical protocol mechanisms for password checking, typically routines that encrypt your password when you enter it at your laptop or phone, and then either compare it with a local encrypted value, or take it to a remote server for checking (a minimal sketch of this step follows the list).
4. There are often protocol mechanisms to synchronise passwords
across multiple platforms, so that if you change your password
on your laptop, your phone won’t let you use that service until
you enter the new one there too. And these mechanisms may
enable you to blacklist a stolen phone without having to reset
the passwords for all the services it was able to access.
5. There will be intrusion-detection mechanisms to propagate an alarm if
one of your passwords is used somewhere it probably shouldn’t be.
6. There are single-signon mechanisms to use one logon for many websites,
as when you use your Google or Facebook account to log on to a newspaper.
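As a minimal sketch of item 3, and assuming a salted, deliberately slow hash of the kind discussed in the next chapter, server-side checking might look like this in Python (the parameter choices are illustrative, not a recommendation):

    import hashlib, hmac, secrets

    def enroll(password: str):
        # Store a random salt and a slow salted hash -- never the password itself
        salt = secrets.token_bytes(16)
        digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 600_000)
        return salt, digest

    def verify(password: str, salt: bytes, digest: bytes) -> bool:
        candidate = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 600_000)
        return hmac.compare_digest(candidate, digest)   # constant-time comparison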
Let’s work up from the bottom. Developing a full-feature password management system can be a lot of work, and providing support for password
recovery also costs money (a few years ago, the UK phone company BT had two
hundred people in its password-reset centre). So outsourcing ‘identity management’ can make business sense. In addition, intrusion detection works best at
scale: if someone uses my gmail password in an Internet cafe in Peru while
Google knows I’m in Scotland, they send an SMS to my phone to check, and
a small website can’t do that. The main cause of attempted password abuse is
when one firm gets hacked, disclosing millions of email addresses and passwords, which the bad guys try out elsewhere; big firms spot this quickly while
small ones don’t. The big firms also help their customers maintain situational
awareness, by alerting you to logons from new devices or from strange places.
Again, it’s hard to do that if you’re a small website or one that people visit
infrequently.
As for syncing passwords between devices, only the device vendors can
really do that well; and the protocol mechanisms for encrypting passwords in
transit to a server that verifies them will be discussed in the next chapter. That
brings us to password recovery.
3.4.1 Password recovery
The experience of the 2010s, as the large service firms scaled up and people
moved en masse to smartphones, is that password recovery is often the hardest aspect of authentication. If people you know, such as your staff, forget their
passwords, you can get them to interact with an administrator or manager who
knows them. But for people you don’t know such as your online customers
it’s harder. And as a large service firm will be recovering tens of thousands of
accounts every day, you need some way of doing it without human intervention in the vast majority of cases.
During the 1990s and 2000s, many websites did password recovery using
‘security questions’ such as asking for your favourite team, the name of your
pet or even that old chestnut, your mother’s maiden name. Such near-public
information is often easy to guess so it gave an easier way to break into
accounts than guessing the password itself. This was made even worse by
everyone asking the same questions. In the case of celebrities – or abuse by a
former intimate partner – there may be no usable secrets. This was brought
home to the public in 2008, when a student hacked the Yahoo email account of
US Vice-Presidential candidate Sarah Palin via the password recovery questions – her date of birth and the name of her first school. Both of these were
public information. Since then, crooks have learned to use security questions
to loot accounts when they can; at the US Social Security Administration, a
common fraud was to open an online account for a pensioner who'd dealt
with their pension by snail mail in the past, and redirect the payments to a
different bank account. This peaked in 2013; the countermeasure that fixed it
was to always notify beneficiaries of account changes by snail mail.
In 2015, five Google engineers published a thorough analysis of security
questions, and many turned out to be extremely weak. For example, an
attacker could get a 19.7% success rate against ‘Favourite food?’ in English.
Some 37% of people provided wrong answers, in some cases to make them
stronger, but sometimes not. Fully 16% of people’s answers were public. In
addition to being insecure, the ‘security questions’ turned out to be hard to
use: 40% of English-speaking US users were unable to recall the answers
when needed, while twice as many could recover accounts using an SMS reset
code [292].
Given these problems with security and memorability, most websites now let
you recover your password by an email to the address with which you first registered. But if someone compromises that email account, they can get all your
dependent accounts too. Email recovery may be adequate for websites where a
compromise is of little consequence, but for important accounts – such as banking and email itself – standard practice is now to use a second factor. This is
typically a code sent to your phone by SMS, or better still using an app that can
encrypt the code and tie it to a specific handset. Many service providers that
allow email recovery are nudging people towards using such a code instead
where possible. Google research shows that SMSs stop all bulk password
guessing by bots, 96% of bulk phishing and 76% of targeted attacks [574].
But this depends on phone companies taking care over who can get a
replacement SIM card, and many don’t. The problem in 2020 is rapid growth
in attacks based on intercepting SMS authentication codes, which mostly
seem to involve SIM swap, where the attacker pretends to be you to your
mobile phone company and gets a replacement SIM card for your account.
SIM-swap attacks started in South Africa in 2007, became the main form of
bank fraud in Nigeria, then caught on in America – initially as a means of
taking over valuable Instagram accounts, then to loot people’s accounts at
bitcoin exchanges, then for bank fraud more generally [1094]. I will discuss
SIM-swap attacks in more detail in section 12.7.4.
Attackers have also exploited the SS7 signalling protocol to wiretap targets’
mobile phones remotely and steal codes [485]. I’ll discuss such attacks in
more detail in the chapters on phones and on banking. The next step in the
arms race will be moving customers from SMS messages for authentication
and account recovery to an app; the same Google research shows that this
improves these last two figures to 99% for bulk phishing and 90% for targeted
attacks [574]. As for the targeted attacks, other research by Ariana Mirian along
with colleagues from UCSD and Google approached gangs who advertised
‘hack-for-hire’ services online and asked them to phish Gmail passwords.
Three of the gangs succeeded, defeating SMS-based 2fa with a middleperson
attack; forensics then revealed 372 other attacks on Gmail users from the same
IP addresses during March to October 2018 [1324]. This is still an immature
criminal market, but to stop such attacks an app or authentication token is
the way to go. It also raises further questions about account recovery. If I
use a hardware security key on my Gmail, do I need a second one in a safe
as a recovery mechanism? (Probably.) If I use one app on my phone to do
banking and another as an authenticator, do I comply with rules on two-factor
authentication? (See section 12.7.4 in the chapter on banking.)
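One common form of authenticator app generates time-based one-time codes (TOTP, standardised in RFC 6238) from a secret shared with the server at enrolment; a minimal sketch in Python follows. It illustrates the general idea only, not the specific apps mentioned above.

    import base64, hashlib, hmac, struct, time

    def totp(shared_secret_b32: str, period: int = 30, digits: int = 6) -> str:
        # Derive a short code from the shared secret and the current 30-second window
        key = base64.b32decode(shared_secret_b32)
        counter = int(time.time()) // period
        mac = hmac.new(key, struct.pack('>Q', counter), hashlib.sha1).digest()
        offset = mac[-1] & 0x0F                       # dynamic truncation (RFC 4226)
        code = (struct.unpack('>I', mac[offset:offset + 4])[0] & 0x7FFFFFFF) % 10 ** digits
        return str(code).zfill(digits)

    # Example with a made-up base32 secret; client and server compute the same code
    print(totp('JBSWY3DPEHPK3PXP'))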
Email notification is the default for telling people not just of suspicious login
attempts, but of logins to new devices that succeeded with the help of a code.
That way, if someone plants malware on your phone, you have some chance
of detecting it. How a victim recovers then is the next question. If all else fails,
a service provider may eventually let them speak to a real person. But when
designing such a system, never forget that it’s only as strong as the weakest fallback mechanism – be it a recovery email loop with an email provider you don’t
control, a phone code that’s vulnerable to SIM swapping or mobile malware,
or a human who’s open to social engineering.
3.4.2 Password choice
Many accounts are compromised by guessing PINs or passwords. There are
botnets constantly breaking into online accounts by guessing passwords and
password-recovery questions, as I described in section 2.3.1.4, in order to use email
accounts to send spam and to recruit machines to botnets. And as people invent
new services and put passwords on them, the password guessers find new
targets. A recent example is cryptocurrency wallets: an anonymous ‘bitcoin
bandit’ managed to steal $50m by trying lots of weak passwords for ethereum
wallets [810]. Meanwhile, billions of dollars’ worth of cryptocurrency has been
lost because passwords were forgotten. So passwords matter, and there are
basically three broad concerns, in ascending order of importance and difficulty:
1. Will the user enter the password correctly with a high enough
probability?
2. Will the user remember the password, or will they have to either
write it down or choose one that’s easy for the attacker to guess?
3. Will the user break the system security by disclosing the password to a
third party, whether accidentally, on purpose, or as a result of deception?
3.4.3 Difficulties with reliable password entry
The first human-factors issue is that if a password is too long or complex, users
might have difficulty entering it correctly. If the operation they’re trying to
perform is urgent, this might have safety implications. If customers have difficulty entering software product activation codes, this can generate expensive
calls to your support desk. And the move from laptops to smartphones during the 2010s has made password rules such as ‘at least one lower-case letter,
upper-case letter, number and special character’ really fiddly and annoying.
This is one of the factors pushing people toward longer but simpler secrets,
such as passphrases of three or four words. But will people be able to enter
them without making too many errors?
An interesting study was done for the STS prepayment meters used to sell
electricity in many less-developed countries. The customer hands some money
to a sales agent, and gets a 20-digit number printed out on a receipt. They take
this receipt home, enter the numbers at a keypad in the meter, and the lights
come on. The STS designers worried that since a lot of the population was illiterate, and since people might get lost halfway through entering the number, the
system might be unusable. But illiteracy was not a problem: even people who
could not read had no difficulty with numbers (‘everybody can use a phone’,
as one of the engineers said). The biggest problem was entry errors, and these
were dealt with by printing the twenty digits in two rows, with three groups
of four digits in the first row followed by two in the second [94]. I’ll describe
this in detail in section 14.2.
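A toy sketch of the fix, in Python: lay the 20-digit token out in the two-row pattern just described, so customers can keep their place while typing. (The layout is from the study; the code is only an illustration.)

    def format_sts_token(digits: str) -> str:
        # Three groups of four digits on the first row, two on the second
        assert len(digits) == 20 and digits.isdigit()
        groups = [digits[i:i + 4] for i in range(0, 20, 4)]
        return ' '.join(groups[:3]) + '\n' + ' '.join(groups[3:])

    print(format_sts_token('01234567890123456789'))
    # 0123 4567 8901
    # 2345 6789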
A quite different application is the firing codes for US nuclear weapons. These
consist of only 12 decimal digits. If they are ever used, the operators will be
under extreme stress, and possibly using improvised or obsolete communications channels. Experiments suggested that 12 digits was the maximum that
could be conveyed reliably in such circumstances. I’ll discuss how this evolved
in section 15.2.
3.4.4 Difficulties with remembering the password
Our second psychological issue is that people often find passwords hard to
remember [2079]. Twelve to twenty digits may be easy to copy from a telegram
or a meter ticket, but when customers are expected to memorize passwords,
they either choose values that are easy for attackers to guess, or write them
down, or both. In fact, standard password advice has been summed up as:
“Choose a password you can’t remember, and don’t write it down”.
The problems are not limited to computer access. For example, one chain of
cheap hotels in France introduced self service. You’d turn up at the hotel, swipe
your credit card in the reception machine, and get a receipt with a numerical
access code to unlock your room door. To keep costs down, the rooms did not
have en-suite bathrooms. A common failure mode was that you’d get up in the
middle of the night to go to the bathroom, forget your access code, and realise
you hadn’t taken the receipt with you. So you’d have to sleep on the bathroom
floor until the staff arrived the following morning.
Password memorability can be discussed under five main headings: naïve
choice, user abilities and training, design errors, operational failures and vulnerability to social-engineering attacks.
3.4.4.1 Naïve choice
Since the mid-1980s, people have studied what sort of passwords people
choose, and found they use spouses’ names, single letters, or even just hit
carriage return giving an empty string as their password. Cryptanalysis of
tapes from a 1980 Unix system showed that of the pioneers, Dennis Ritchie
used ‘dmac’ (his middle name was MacAlistair); the later Google chairman
Eric Schmidt used ‘wendy!!!’ (his wife’s name) and Brian Kernighan used
‘/.,/.,’ [796]. Fred Grampp and Robert Morris’s classic 1984 paper on Unix
security [806] reports that after software became available which forced
passwords to be at least six characters long and have at least one nonletter,
they made a file of the 20 most common female names, each followed by a
single digit. Of these 200 passwords, at least one was in use on each of several
dozen machines they examined. At the time, Unix systems kept encrypted
passwords in a file /etc/passwd that all system users could read, so any user
could verify a guess of any other user’s password. Other studies showed
that requiring a non-letter simply changed the most popular password from
‘password’ to ‘password1’ [1675].
In 1990, Daniel Klein gathered 25,000 Unix passwords and found that
21–25% of passwords could be guessed depending on the amount of effort put
in [1058]. Dictionary words accounted for 7.4%, common names for 4%, combinations of user and account name 2.7%, and so on down a list of less probable
choices such as words from science fiction (0.4%) and sports terms (0.2%).
Other password guesses used patterns, such as by taking an account ‘klone’
belonging to the user ‘Daniel V. Klein’ and trying passwords such as klone,
klone1, klone123, dvk, dvkdvk, leinad, neilk, DvkkvD, and so on. The following year, Alec Muffett released ‘crack’, software that would try to brute-force
Unix passwords using dictionaries and patterns derived from them by a set of
mangling rules.
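The pattern-based guesses can be generated mechanically. The sketch below shows a handful of crack-style mangling rules applied to an account name and the user's full name; it is an illustrative subset made up for this example, not Muffett's actual rule set.

    def mangle(account: str, fullname: str) -> set:
        # Derive seed strings from the account and the user's name, then apply
        # simple transformations of the kind password crackers try first
        parts = fullname.lower().replace('.', '').split()
        seeds = {account, account.capitalize(), parts[-1], ''.join(p[0] for p in parts)}
        guesses = set()
        for s in seeds:
            guesses |= {s, s.upper(), s[::-1], s + s, s + '1', s + '123'}
        return guesses

    print(sorted(mangle('klone', 'Daniel V. Klein')))
    # includes 'klone', 'klone1', 'klone123', 'dvk', 'dvkdvk', 'enolk', 'klein', ...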
The largest academic study of password choice of which I am aware is by Joe
Bonneau, who in 2012 analysed tens of millions of passwords in leaked password files, and also interned at Yahoo where he instrumented the login system to collect live statistics on the choices of 70 million users. He also worked
out the best metrics to use for password guessability, both in standalone systems and where attackers use passwords harvested from one system to crack
accounts on another [290]. This work informed the design of password strength
checkers and other current practices at the big service firms.
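As a hedged illustration of one such metric, the sketch below computes a β-success-rate – the fraction of accounts that fall to the β most popular passwords – from a frequency list; the sample counts are invented.

    from collections import Counter

    def beta_success_rate(freq: Counter, beta: int) -> float:
        # Fraction of all accounts covered by the beta most common passwords
        total = sum(freq.values())
        return sum(count for _, count in freq.most_common(beta)) / total

    sample = Counter({'123456': 290, 'password': 180, 'qwerty': 95, 'iloveyou': 60,
                      'letmein': 40})          # a real dataset has a long tail of rarer choices
    print(beta_success_rate(sample, 3))        # share cracked by an attacker's first 3 guesses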
3.4.4.2 User abilities and training
Sometimes you can train the users. Password checkers have trained them to
use longer passwords with numbers as well as letters, and the effect spills over
to websites that don’t use them [446]. But you do not want to drive customers
away, so the marketing folks will limit what you can do. In fact, research shows
that password rule enforcement is not a function of the value at risk, but of
whether the website is a monopoly. Such websites typically have very annoying rules, while websites with competitors, such as Amazon, are more usable,
placing more reliance on back-end intrusion-detection systems.
In a corporate or military environment you can enforce password choice
rules, or password change rules, or issue random passwords. But then people
will have to write them down. So you can insist that passwords are treated
the same way as the data they protect: bank master passwords go in the
vault overnight, while military ‘Top Secret’ passwords must be sealed in an
envelope, in a safe, in a room that’s locked when not occupied, in a building
patrolled by guards. You can send guards round at night to clean all desks
and bin everything that hasn’t been locked up. But if you want to hire and
retain good people, you’d better think things through a bit more carefully. For
example, one Silicon Valley firm had a policy that the root password for each
machine would be written down on a card and put in an envelope taped to
the side of the machine – a more human version of the rule that passwords be
treated the same way as the data they protect. The domestic equivalent is the
card in the back of your wifi router with the password.
While writing the first edition of this book, I could not find any account of
experiments on training people in password choice that would hold water
by the standards of applied psychology (i.e., randomized controlled trials
with adequate statistical power). The closest I found was a study of the recall
rates, forgetting rates, and guessing rates of various types of password [347];
this didn’t tell us the actual effects of giving users various kinds of advice.
We therefore decided to see what could be achieved by training, and selected
three groups of about a hundred volunteers from our first-year science
students [2058]:
the red (control) group was given the usual advice (password at least six
characters long, including one nonletter);
the green group was told to think of a passphrase and select letters from
it to build a password. So ‘It’s 12 noon and I am hungry’ would give
‘I’S12&IAH’;
the yellow group was told to select eight characters (alpha or numeric)
at random from a table we gave them, write them down, and destroy
this note after a week or two once they’d memorized the password.
What we expected to find was that the red group’s passwords would be easier
to guess than the green group’s which would in turn be easier than the yellow
group’s; and that the yellow group would have the most difficulty remembering their passwords (or would be forced to reset them more often), followed
by green and then red. But that’s not what we found.
About 30% of the control group chose passwords that could be guessed
using Alec Muffett’s ‘crack’ software, versus about 10 percent for the other
two groups. So passphrases and random passwords seemed to be about
equally effective. When we looked at password reset rates, there was no
significant difference between the three groups. When we asked the students
whether they’d found their passwords hard to remember (or had written them
down), the yellow group had significantly more problems than the other two;
but there was no significant difference between red and green.
The conclusions we drew were as follows.
For users who follow instructions, passwords based on mnemonic
phrases offer the best of both worlds. They are as easy to remember as
naively selected passwords, and as hard to guess as random passwords.
The problem then becomes one of user compliance. A significant number
of users (perhaps a third of them) just don’t do what they’re told.
So when the army gives soldiers randomly-selected passwords, its value
comes from the fact that the password assignment compels user compliance,
rather than from the fact that they’re random (as mnemonic phrases would do
just as well).
But centrally-assigned passwords are often inappropriate. When you are
offering a service to the public, your customers expect you to present broadly
the same interfaces as your competitors. So you must let users choose their
own website passwords, subject to some lightweight algorithm to reject
passwords that are ‘clearly bad’. (GCHQ suggests using a ‘bad password list’
of the 100,000 passwords most commonly found in online password dumps.)
In the case of bank cards, users expect a bank-issued initial PIN plus the ability
to change the PIN afterwards to one of their choosing (though again you may
block a ‘clearly bad’ PIN such as 0000 or 1234). Over half of cardholders keep
a random PIN, but about a quarter choose PINs such as children’s birth dates
which have less entropy than random PINs would, and have the same PIN
on different cards. The upshot is that a thief who steals a purse or wallet may
have a chance of about one in eleven to get lucky, if he tries the most common
PINs on all the cards first in offline mode and then in online mode, so he gets
six goes at each. Banks that forbid popular choices such as 1234 can increase
the odds to about one in eighteen [296].
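A minimal sketch of such a lightweight check, in Python: reject anything too short or on a denylist of commonly leaked passwords. The filename is a placeholder; in practice you would load something like the 100,000-entry list GCHQ suggests.

    def load_denylist(path: str = 'common-leaked-passwords.txt') -> set:
        # Placeholder file: one commonly leaked password per line
        with open(path, encoding='utf-8') as f:
            return {line.strip().lower() for line in f if line.strip()}

    def clearly_bad(candidate: str, denylist: set) -> bool:
        return len(candidate) < 8 or candidate.lower() in denylist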
3.4.4.3 Design errors
Attempts to make passwords memorable are a frequent source of severe
design errors. The classic example of how not to do it is to ask for ‘your
mother’s maiden name’. A surprising number of banks, government departments and other organisations still authenticate their customers in this way,
though nowadays it tends to be not a password but a password recovery
question. You could always try to tell ‘Yngstrom’ to your bank, ‘Jones’ to the
phone company, ‘Geraghty’ to the travel agent, and so on; but data are shared
extensively between companies, so you could easily end up confusing their
systems – not to mention yourself. And if you try to phone up your bank and
tell them that you’ve decided to change your mother’s maiden name from
Yngstrom to yGt5r4ad – or even Smith – then good luck. In fact, given the
large number of data breaches, you might as well assume that anyone who
wants to can get all your common password recovery information – including
your address, your date of birth, your first school and your social security
number, as well as your mother’s maiden name.
Some organisations use contextual security information. A bank I once used
asks its business customers the value of the last check from their account that
was cleared. In theory, this could be helpful: if someone overhears me doing a
transaction on the telephone, then it’s not a long-term compromise. The details
bear some attention though. When this system was first introduced, I wondered whether a supplier, to whom I’d just written a check, might impersonate
me, and concluded that asking for the last three checks’ values would be safer.
But the problem we actually had was unexpected. Having given the checkbook
to our accountant for the annual audit, we couldn’t talk to the bank. I also don’t
like the idea that someone who steals my physical post can also steal my money.
The sheer number of applications demanding a password nowadays
exceeds the powers of human memory. A 2007 study by Dinei Florêncio and
Cormac Herley of half a million web users over three months showed that
the average user has 6.5 passwords, each shared across 3.9 different sites;
has about 25 accounts that require passwords; and types an average of 8
passwords per day. Bonneau published more extensive statistics in 2012 [290]
but since then the frequency of user password entry has fallen, thanks to
smartphones. Modern web browsers also cache passwords; see the discussion
of password managers at section 3.4.11 below. But many people use the same
password for many different purposes and don’t work out special processes
to deal with their high-value logons such as to their bank, their social media
accounts and their email. So you have to expect that the password chosen
by the customer of the electronic banking system you’ve just designed, may
be known to a Mafia-operated porn site as well. (There’s even a website,
http://haveibeenpwned.com, that will tell you which security breaches have
leaked your email address and password.)
One of the most pervasive and persistent errors has been forcing users to
change passwords regularly. When I first came across enforced monthly password changes in the 1980s, I observed that it led people to choose passwords
such as ‘julia03’ for March, ‘julia04’ for April, and so on, and said as much in
the first (2001) edition of this book (chapter 3, page 48). However, in 2003, Bill
Burr of NIST wrote password guidelines recommending regular update [1098].
This was adopted by the Big Four auditors, who pushed it out to all their audit
clients³. Meanwhile, security usability researchers conducted survey after
survey showing that monthly change was suboptimal. The first systematic
study by Yinqian Zhang, Fabian Monrose and Mike Reiter of the password
transformation techniques users invented showed that in a system with forced
expiration, over 40% of passwords could be guessed from previous ones, that
forced change didn’t do much to help people who chose weak passwords,
and that the effort of regular password choice may also have diminished
password quality [2073]. Finally a survey was written by usability guru Lorrie
Cranor while she was Chief Technologist at the FTC [492], and backed up
by an academic study [1507]. In 2017, NIST recanted; they now recommend
long passphrases that are only changed on compromise⁴. Other governments’
agencies such as Britain’s GCHQ followed, and Microsoft finally announced
the end of password-expiration policies in Windows 10 from April 2019.
However, many firms are caught by the PCI standards set by the credit-card
issuers, which haven’t caught up and still dictate three-monthly changes;
another problem is that the auditors dictate compliance to many companies,
and will no doubt take time to catch up.
The current fashion, in 2020, is to invite users to select passphrases of three
or more random dictionary words. This was promoted by a famous xkcd cartoon which suggested ‘correct horse battery staple’ as a password. Empirical
research, however, shows that real users select multi-word passphrases with
much less entropy than they’d get if they really did select at random from a
dictionary; they tend to go for common noun bigrams, and moving to three or
four words brings rapidly diminishing returns [297]. The Electronic Frontier
Foundation now promotes using dice to pick words; they have a list of 7,776
words (6⁵, so five dice rolls to pick a word) and note that a six-word phrase has
77 bits of entropy and is memorable [291].
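The entropy calculation is easy to reproduce. The following sketch assumes a local copy of the EFF long wordlist saved as eff_large_wordlist.txt, each line holding a dice index and a word; the filename and the six-word default are assumptions for the example.

    import math
    import secrets

    def load_wordlist(path='eff_large_wordlist.txt'):
        with open(path, encoding='utf-8') as f:
            words = [line.split()[-1] for line in f if line.strip()]
        assert len(words) == 7776           # 6^5 words, one per five dice rolls
        return words

    def passphrase(words, n_words=6):
        return ' '.join(secrets.choice(words) for _ in range(n_words))

    # entropy of a six-word phrase: 6 * log2(7776), about 77.5 bits
    print(6 * math.log2(7776))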
3.4.4.4 Operational failures
The most pervasive operational error is failing to reset default passwords. This
has been a chronic problem since the early dial access systems in the 1980s
attracted attention from mischievous schoolkids. A particularly bad example
is where systems have default passwords that can’t be changed, checked by
software that can’t be patched. We see ever more such devices in the Internet
of Things; they remain vulnerable for their operational lives. The Mirai botnets
have emerged to recruit and exploit them, as I described in Chapter 2.
³ Our university’s auditors wrote in their annual report for three years in a row that we should have monthly enforced password change, but couldn’t provide any evidence to support this and weren’t even aware that their policy came ultimately from NIST. Unimpressed, we asked the chair of our Audit Committee to appoint a new lot of auditors, and eventually that happened.
⁴ NIST SP 800-63-3
Passwords in plain sight are another long-running problem, whether on
sticky notes or some electronic equivalent. A famous early case was R v
Gold and Schifreen, where two young hackers saw a phone number for the
development version of Prestel, an early public email service run by British
Telecom, in a note stuck on a terminal at an exhibition. They dialed in later,
and found the welcome screen had a maintenance password displayed on it.
They tried this on the live system too, and it worked! They proceeded to hack
into the Duke of Edinburgh’s electronic mail account, and sent mail ‘from’
him to someone they didn’t like, announcing the award of a knighthood. This
heinous crime so shocked the establishment that when prosecutors failed to
persuade the courts to convict the young men, Britain’s parliament passed its
first Computer Misuse Act.
A third operational issue is asking for passwords when they’re not really
needed, or wanted for dishonest reasons, as I discussed at the start of this
section. Most of the passwords you’re forced to set up on websites are there
for marketing reasons – to get your email address or give you the feeling of
belonging to a ‘club’ [295]. So it’s perfectly rational for users who never plan to
visit that site again to express their exasperation by entering ‘123456’ or even
ruder words in the password field.
A fourth is atrocious password management systems: some don’t encrypt
passwords at all, and there are reports from time to time of enterprising hackers
smuggling back doors into password management libraries [429].
But perhaps the biggest operational issue is vulnerability to social-engineering attacks.
3.4.4.5 Social-engineering attacks
Careful organisations communicate security context in various ways to help
staff avoid making mistakes. The NSA, for example, had different colored internal and external telephones, and when an external phone in a room is off-hook,
classified material can’t even be discussed in the room – let alone on the phone.
Yet while many banks and other businesses maintain some internal security
context, they often train their customers to act in unsafe ways. Because of pervasive phishing, it’s not prudent to try to log on to your bank by clicking on a
link in an email, so you should always use a browser bookmark or type in the
URL by hand. Yet bank marketing departments send out lots of emails containing clickable links. Indeed much of the marketing industry is devoted to getting
people to click on links. Many email clients – including Apple’s, Microsoft’s,
and Google’s – make plaintext URLs clickable, so their users may never see a
URL that isn’t. Bank customers are well trained to do the wrong thing.
A prudent customer should also be cautious if a web service directs
them somewhere else – yet bank systems use all sorts of strange URLs for
their services. A spam from the Bank of America directed UK customers
to mynewcard.com and got the certificate wrong (it was for mynewcard
.bankofamerica.com). There are many more examples of major banks training
their customers to practice unsafe computing – by disregarding domain
names, ignoring certificate warnings, and merrily clicking links [582]. As a
result, even security experts have difficulty telling bank spam from phish [445].
It’s not prudent to give out security information over the phone to unidentified callers – yet we all get phoned by bank staff who demand security information. Banks also call us on our mobiles now and expect us to give out security
information to a whole train carriage of strangers, rather than letting us text
a response. (I’ve had a card blocked because a bank security team phoned me
while I was driving; it would have been against the law to deal with the call
other than in hands-free mode, and there was nowhere safe to stop.) It’s also
not prudent to put a bank card PIN into any device other than an ATM or a PIN
entry device (PED) in a store; and Citibank even asks customers to disregard
and report emails that ask for personal information, including PIN and account
details. So what happened? You guessed it – it sent its Australian customers an
email asking customers ‘as part of a security upgrade’ to log on to its website and authenticate themselves using a card number and an ATM PIN [1089].
And in one 2005 case, the Halifax sent a spam to the mother of a student of
ours who contacted the bank’s security department, which told her it was a
phish. The student then contacted the ISP to report abuse, and found that the
URL and the service were genuine [1243]. The Halifax disappeared during the
crash of 2008, and given that their own security department couldn’t tell spam
from phish, perhaps that was justice (though it cost us taxpayers a shedload
of money).
3.4.4.6 Customer education
After phishing became a real threat to online banking in the mid-2000s, banks
tried to train their customers to look for certain features in websites. This has
been partly risk reduction, but partly risk dumping – seeing to it that customers
who don’t understand or can’t follow instructions can be held responsible for
the resulting loss. The general pattern has been that as soon as customers are
trained to follow some particular rule, the phishermen exploit this, as the reasons for the rule are not adequately explained.
At the beginning, the advice was ‘Check the English’, so the bad guys either
got someone who could write English, or simply started using the banks’ own
emails but with the URLs changed. Then it was ‘Look for the lock symbol’,
so the phishing sites started to use SSL (or just forging it by putting graphics
of lock symbols on their web pages). Some banks started putting the last four
digits of the customer account number into emails; the phishermen responded
by putting in the first four (which are constant for a given bank and card product). Next the advice was that it was OK to click on images, but not on URLs;
the phishermen promptly put in links that appeared to be images but actually
pointed at executables. The advice then was to check where a link would
really go by hovering your mouse over it; the bad guys then either inserted a
non-printing character into the URL to stop Internet Explorer from displaying
the rest, or used an unmanageably long URL (as many banks also did).
This sort of arms race is most likely to benefit the attackers. The countermeasures become so complex and counterintuitive that they confuse more and
more users – exactly what the phishermen need. The safety and usability communities have known for years that ‘blame and train’ is not the way to deal
with unusable systems – the only real fix is to design for safe usability in the
first place [1453].
3.4.4.7 Phishing warnings
Part of the solution is to give users better tools. Modern browsers alert you to
wicked URLs, with a range of mechanisms under the hood. First, there are lists
of bad URLs collated by the anti-virus and threat intelligence community. Second, there’s logic to look for expired certificates and other compliance failures
(as the majority of those alerts are false alarms).
There has been a lot of research, in both industry and academia, about
how you get people to pay attention to warnings. We see so many of them,
most are irrelevant, and many are designed to shift risk to us from someone
else. So when do people pay attention? In our own work, we tried a number
of things and found that people paid most attention when the warnings
were not vague and general (‘Warning - visiting this web site may harm your
computer!’) but specific and concrete (‘The site you are about to visit has been
confirmed to contain software that poses a significant risk to you, with no tangible
benefit. It would try to infect your computer with malware designed to steal your
bank account and credit card details in order to defraud you’) [1329]. Subsequent
research by Adrienne Porter Felt and Google’s usability team has tried many
ideas including making warnings psychologically salient using faces (which
doesn’t work), simplifying the text (which helps) and making the safe defaults
both attractive and prominent (which also helps). Optimising these factors
improves compliance from about 35% to about 50% [675]. However, if you
want to stop the great majority of people from clicking on known-bad URLs,
then voluntary compliance isn’t enough. You either have to block them at
your firewall, or block them at the browser (as both Chrome and Firefox do
for different types of certificate error – a matter to which we’ll return in 21.6).
3.4.5 System issues
Not all phishing attacks involve psychology. Some involve technical mechanisms to do with password entry and storage together with some broader
system issues.
As we already noted, a key question is whether we can restrict the number of
password guesses. Security engineers sometimes refer to password systems as
‘online’ if guessing is limited (as with ATM PINs) and ‘offline’ if it is not (this
originally meant systems where a user could fetch the password file and take
it away to try to guess the passwords of other users, including more privileged
users). But the terms are no longer really accurate. Some offline systems can
restrict guesses, such as payment cards which use physical tamper-resistance to
limit you to three PIN guesses, while some online systems cannot. For example,
if you log on using Kerberos, an opponent who taps the line can observe your
key encrypted with your password flowing from the server to your client, and
then data encrypted with that key flowing on the line; so they can take their
time to try out all possible passwords. The most common trap here is the system that normally restricts password guesses but then suddenly fails to do so,
when it gets hacked and a one-way encrypted password file is leaked, together
with the encryption keys. Then the bad guys can try out their entire password
dictionary against each account at their leisure.
Password guessability ultimately depends on the entropy of the chosen passwords and the number of allowed guesses, but this plays out in the context of a
specific threat model, so you need to consider the type of attacks you are trying
to defend against. Broadly speaking, these are as follows.
Targeted attack on one account: an intruder tries to guess a specific user’s password. They might try to guess a rival’s logon
password at the office, in order to do mischief directly.
Attempt to penetrate any account belonging to a specific target: an enemy
tries to hack any account you own, anywhere, to get information that
might help take over other accounts, or do harm directly.
Attempt to penetrate any account on a target system: the intruder tries to get
a logon as any user of the system. This is the classic case of the phisherman trying to hack any account at a target bank so he can launder stolen
money through it.
Attempt to penetrate any account on any system: the intruder merely
wants an account at any system in a given domain but doesn’t care
which one. Examples are bad guys trying to guess passwords on any
online email service so they can send spam from the compromised
account, and a targeted attacker who wants a logon to any random
machine in the domain of a target company as a beachhead.
Attempt to use a breach of one system to penetrate a related one: the
intruder has got a beachhead and now wants to move inland to capture
higher-value targets.
Service-denial attack: the attacker may wish to block one or more legitimate users from using the system. This might be targeted on a particular
account or system-wide.
This taxonomy helps us ask relevant questions when evaluating a password
system.
3.4.6 Can you deny service?
There are basically three ways to deal with password guessing when you
detect it: lockout, throttling, and protective monitoring. Banks may freeze your
card after three wrong PINs; but if they freeze your online account after three
bad password attempts they open themselves up to a denial-of-service attack.
Service can also fail by accident; poorly-configured systems can generate
repeat fails with stale credentials. So many commercial websites nowadays
use throttling rather than lockout. In a military system, you might not want
even that, in case an enemy who gets access to the network could jam it with a
flood of false logon attempts. In this case, protective monitoring might be the
preferred option, with a plan to abandon rate-limiting if need be in a crisis. Joe
Bonneau and Soren Preibusch collected statistics of how many major websites
use account locking versus various types of rate control [295]. They found that
popular, growing, competent sites tend to be more secure, as do payment sites,
while content sites do worst. Microsoft Research’s Yuan Tian, Cormac Herley
and Stuart Schechter investigated how to do locking or throttling properly;
among other things, it’s best to penalise guesses of weak passwords (as otherwise an attacker gets advantage by guessing them first), to be more aggressive
when protecting users who have selected weak passwords, and to not punish
IPs or clients that repeatedly submit the same wrong password [1892].
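As a rough illustration of those recommendations, here is a toy throttling sketch that counts distinct wrong guesses per account within a time window, so a client that keeps resubmitting the same wrong password is not punished further; the window and threshold are invented for the example, and a real system would persist this state and combine it with per-IP limits.

    import time
    from collections import defaultdict

    WINDOW = 600         # seconds; illustrative
    MAX_DISTINCT = 5     # distinct wrong guesses allowed per window

    failures = defaultdict(dict)    # account -> {wrong_guess_hash: last_seen}

    def allow_attempt(account):
        now = time.time()
        seen = failures[account]
        for h in [h for h, t in seen.items() if now - t > WINDOW]:
            del seen[h]             # forget stale failures
        return len(seen) < MAX_DISTINCT

    def record_failure(account, guess_hash):
        # resubmitting the same wrong password overwrites one entry
        # rather than consuming more of the budget
        failures[account][guess_hash] = time.time()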
3.4.7 Protecting oneself or others?
Next, to what extent does the system need to protect users and subsystems
from each other? In global systems on which anyone can get an account – such
as mobile phone systems and cash machine systems – you must assume that
the attackers are already legitimate users, and see to it that no-one can use the
service at someone else’s expense. So knowledge of one user’s password will
not allow another user’s account to be compromised. This has both personal
aspects, and system aspects.
On the personal side, don’t forget what we said about intimate partner
abuse in 2.5.4: the passwords people choose are often easy for their spouses or
partners to guess, and the same goes for password recovery questions: so some
thought needs to be given to how abuse victims can recover their security.
On the system side, there are all sorts of passwords used for mutual authentication between subsystems, few mechanisms to enforce password quality in
server-server environments, and many well-known issues (for example, the
default password for the Java trusted keystore file is ‘changeit’). Development
teams often share passwords that end up in live systems, even 30 years after
this practice led to the well-publicised hack of the Duke of Edinburgh’s email
described in section 3.4.4.4. Within a single big service firm you can lock stuff
down by having named crypto keys and seeing to it that each name generates
a call to an underlying hardware security module; or you can even use mechanisms like SGX to tie keys to known software. But that costs real money, and
money isn’t the only problem. Enterprise system components are often hosted
at different service companies, which makes adoption of better practices a hard
coordination problem too. As a result, server passwords often appear in scripts
or other plaintext files, which can end up in Dropbox or Splunk. So it is vital
to think of password practices beyond end users. In later chapters we’ll look at
protocols such as Kerberos and ssh; for now, recall Ed Snowden’s remark that
it was trivial to hack the typical large company: just spear-phish a sysadmin
and then chain your way in. Much of this chapter is about the ‘spear-phish a
sysadmin’ part; but don’t neglect the ‘chain your way in’ part.
3.4.8 Attacks on password entry
Password entry is often poorly protected.
3.4.8.1 Interface design
Thoughtless interface design is all too common. Some common makes of
cash machine have a vertical keyboard at head height, making it simple for a
pickpocket to watch a woman enter her PIN before lifting her purse from her
handbag. The keyboards may have been at a reasonable height for the men
who designed them, but women who are a few inches shorter are exposed.
When entering a card number or PIN in a public place, I usually cover my typing hand with my body or my other hand – but you can’t assume that all your
customers will. Many people are uncomfortable shielding a PIN as it’s a signal
of distrust, especially if they’re in a supermarket queue and a friend is standing
nearby. UK banks found that 20% of users never shield their PIN [128] – and
then used this to blame customers whose PINs were compromised by an overhead CCTV camera, rather than designing better PIN entry devices.
3.4.8.2 Trusted path, and bogus terminals
A trusted path is some means of being sure that you’re logging into a genuine machine through a channel that isn’t open to eavesdropping. False terminal attacks go back to the dawn of time-shared computing. A public terminal
would be left running an attack program that looks just like the usual logon
screen – asking for a user name and password. When an unsuspecting user did
this, it would save the password, reply ‘sorry, wrong password’ and then vanish, invoking the genuine password program. The user assumed they’d made
a typing error and just entered the password again. This is why Windows had
a secure attention sequence; hitting ctrl-alt-del was guaranteed to take you to
a genuine password prompt. But eventually, in Windows 10, this got removed
to prepare the way for Windows tablets, and because almost nobody understood it.
ATM skimmers are devices that sit on an ATM’s throat, copy card details,
and have a camera to record the customer PIN. There are many variants on the
theme. Fraudsters deploy bad PIN entry devices too, and have even been jailed
for attaching password-stealing hardware to terminals in bank branches. I’ll
describe this world in much more detail in the chapter on banking and bookkeeping; the long-term solution has been to move from magnetic-strip cards
that are easy to copy to chip cards that are much harder. In any case, if a terminal might contain malicious hardware or software, then passwords alone will
not be enough.
3.4.8.3 Technical defeats of password retry counters
Many kids find out that a bicycle combination lock can usually be broken in a
few minutes by solving each ring in order of looseness. The same idea worked
against a number of computer systems. The PDP-10 TENEX operating system
checked passwords one character at a time, and stopped as soon as one of them
was wrong. This opened up a timing attack: the attacker would repeatedly place
a guessed password in memory at a suitable location, have it verified as part
of a file access request, and wait to see how long it took to be rejected [1131].
An error in the first character would be reported almost at once, an error in the
second character would take a little longer to report, and in the third character a little longer still, and so on. So you could guess the characters one after
another, and instead of a password of N characters drawn from an alphabet of
A characters taking Aᴺ/2 guesses on average, it took AN/2. (Bear in mind that
in thirty years’ time, all that might remain of the system you’re building today
is the memory of its more newsworthy security failures.)
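The underlying coding mistake is easy to reproduce today. A minimal sketch: an early-exit comparison leaks, through timing, how many leading characters of a guess are correct, whereas a constant-time comparison (Python’s hmac.compare_digest, used here) does not.

    import hmac

    def naive_check(guess, secret):          # the TENEX-style mistake
        if len(guess) != len(secret):
            return False
        for g, s in zip(guess, secret):
            if g != s:
                return False                 # returns sooner for earlier errors
        return True

    def safe_check(guess, secret):           # constant-time comparison
        return hmac.compare_digest(guess.encode(), secret.encode())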
These same mistakes are being made all over again in the world of embedded systems. With one remote car locking device, as soon as a wrong byte was
transmitted from the key fob, the red telltale light on the receiver came on. With
some smartcards, it has been possible to determine the customer PIN by trying
each possible input value and looking at the card’s power consumption, then
issuing a reset if the input was wrong. The reason was that a wrong PIN caused
a PIN retry counter to be decremented, and writing to the EEPROM memory
which held this counter caused a current surge of several milliamps – which
could be detected in time to reset the card before the write was complete [1107].
These implementation details matter. Timing channels are a serious problem
for people implementing cryptography, as we’ll discuss at greater length in
the next chapter.
A recent high-profile issue was the PIN retry counter in the iPhone. My
colleague Sergei Skorobogatov noted that the iPhone keeps sensitive data
encrypted in flash memory, and built an adapter that enabled him to save the
encrypted memory contents and restore them to their original condition after
several PIN attempts. This enabled him to try all 10,000 possible PINs rather
than the limit of ten attempts that Apple tried to impose [1781]⁵.
3.4.9 Attacks on password storage
Passwords have often been vulnerable where they are stored. In MIT’s ‘Compatible Time Sharing System’ CTSS – a 1960s predecessor of Multics – it once
happened that one person was editing the message of the day, while another
was editing the password file. Because of a software bug, the two editor temporary files got swapped, and everyone who logged on was greeted with a copy
of the password file! [476].
Another horrible programming error struck a UK bank in the late 1980s,
which issued all its customers with the same PIN by mistake [55]. As the procedures for handling PINs meant that no one in the bank got access to anyone’s
PIN other than their own, the bug wasn’t spotted until after thousands of customer cards had been shipped. Big blunders continue: in 2019 the security
company that does the Biostar and AEOS biometric lock system for building
entry control and whose customers include banks and police forces in 83 countries left a database unprotected online with over a million people’s IDs, plaintext passwords, fingerprints and facial recognition data; security researchers
who discovered this from an Internet scan were able to add themselves as
users [1867].
Auditing provides another hazard. When systems log failed password
attempts, the log usually contains a large number of passwords, as users
get the ‘username, password’ sequence out of phase. If the logs are not well
protected then someone who sees an audit record of a failed login with a
⁵ This was done to undermine an argument by then FBI Director James Comey that the iPhone was unhackable and so Apple should be ordered to produce an operating system upgrade that created a backdoor; see section 26.2.7.4.
non-existent user name of e5gv*8yp just has to try this as a password for all
the valid user names.
3.4.9.1 One-way encryption
Such incidents taught people to protect passwords by encrypting them using
a one-way algorithm, an innovation due to Roger Needham and Mike Guy.
The password, when entered, is passed through a one-way function and the
user is logged on only if it matches a previously stored value. However, it’s
often implemented wrong. The right way to do it is to generate a random key,
historically known in this context as a salt; combine the password with the salt
using a slow, cryptographically strong one-way function; and store both the
salt and the hash.
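A minimal sketch of doing this right, using scrypt from the Python standard library as the slow one-way function; the cost parameters here are illustrative and would need tuning for real hardware.

    import hashlib, hmac, os

    def hash_password(password):
        salt = os.urandom(16)
        digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
        return salt, digest                  # store both

    def check_password(password, salt, stored_digest):
        digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
        return hmac.compare_digest(digest, stored_digest)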
3.4.9.2 Password cracking
Some systems that use an encrypted password file make it widely readable.
Unix used to be the prime example – the password file /etc/passwd was readable by all users. So any user could fetch it and try to break passwords by
encrypting all the passwords in a dictionary and comparing them with the
encrypted values in the file. We already mentioned in 3.4.4.1 the ‘Crack’ software that people have used for years for this purpose.
Most modern operating systems have sort-of fixed this problem; in modern
Linux distributions, for example, passwords are salted, hashed using 5000
rounds of SHA-512, and stored in a file that only the root user can read. But
there are still password-recovery tools to help you if, for example, you’ve
encrypted an Office document with a password you’ve forgotten [1677]. Such
tools can also be used by a crook who has got root access, and there are still
lots of badly designed systems out there where the password file is vulnerable
in other ways.
There is also credential stuffing: when a system is hacked and passwords are
cracked (or were even found unencrypted), they are then tried out on other
systems to catch the many people who reused them. This remains a live problem. So password cracking is still worth some attention. One countermeasure
worth considering is deception, which can work at all levels in the stack. You
can have honeypot systems that alarm if anyone ever logs on to them, honeypot accounts on a system, or password canaries – bogus encrypted passwords
for genuine accounts [998].
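To see why the salt and the slow hash matter, consider a toy cracking loop against a leaked file of (user, salt, hash) records in the format sketched in the previous section; the wordlist filename is an assumption. Every candidate word must be re-hashed for every salt, and each hash deliberately costs real time.

    import hashlib

    def crack(leaked_records, wordlist_path='dictionary.txt'):
        with open(wordlist_path, encoding='utf-8') as f:
            candidates = [w.strip() for w in f if w.strip()]
        for user, salt, digest in leaked_records:
            for word in candidates:          # one slow hash per word per salt
                h = hashlib.scrypt(word.encode(), salt=salt, n=2**14, r=8, p=1)
                if h == digest:
                    print(user, word)
                    break

Without a salt the attacker could hash the dictionary once and compare it against every account; without a slow function each trial would cost microseconds rather than tens of milliseconds.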
3.4.9.3 Remote password checking
Many systems check passwords remotely, using cryptographic protocols
to protect the password in transit, and the interaction between password
security and network security can be complex. Local networks often use a
protocol called Kerberos, where a server sends you a key encrypted under
your password; if you know the password you can decrypt the key and use
it to get tickets that give you access to resources. I’ll discuss this in the next
chapter, in section 4.7.4; it doesn’t always protect weak passwords against
an opponent who can wiretap encrypted traffic. Web servers mostly use a
protocol called TLS to encrypt your traffic from the browser on your phone or
laptop; I discuss TLS in the following chapter, in section 5.7.5. TLS does not
protect you if the server gets hacked. However there is a new protocol called
Simultaneous Authentication of Equals (SAE) which is designed to set up
secure sessions even where the password is guessable, and which has been
adopted from 2018 in the WPA3 standard for WiFi authentication. I’ll discuss
this later too.
And then there’s OAuth, a protocol which allows access delegation, so you
can grant one website the right to authenticate you using the mechanisms provided by another. Developed by Twitter from 2006, it’s now used by the main
service providers such as Google, Microsoft and Facebook to let you log on to
media and other sites; an authorisation server issues access tokens for the purpose. We’ll discuss the mechanisms later too. The concomitant risk is cross-site
attacks; we are now (2019) seeing OAuth being used by state actors in authoritarian countries to phish local human-rights defenders. The technique is to
create a malicious app with a plausible name (say ‘Outlook Security Defender’)
and send an email, purportedly from Microsoft, asking for access. If the target
responds they end up at a Microsoft web page where they’re asked to authorise
the app to have access to their data [47].
3.4.10 Absolute limits
If you have confidence in the cryptographic algorithms and operating-system
security mechanisms that protect passwords, then the probability of a successful password guessing attack is a function of the entropy of passwords, if
they are centrally assigned, and the psychology of users if they’re allowed to
choose them. Military sysadmins often prefer to issue random passwords, so
the probability of password guessing attacks can be managed. For example,
if L is the maximum password lifetime, R is login attempt rate, S is the size of
the password space, then the probability that a password can be guessed in
its lifetime is P = LR∕S, according to the US Department of Defense password
management guideline [546].
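A worked example, with invented numbers rather than anything from the guideline itself:

    L = 365 * 24 * 3600    # maximum password lifetime: one year, in seconds
    R = 0.1                # allowed login attempt rate: one guess per 10 seconds
    S = 2 ** 29            # size of the password space
    P = L * R / S
    print(P)               # about 0.006, i.e. a 0.6% chance over the lifetime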
There are issues with such a ‘provable security’ doctrine, starting with the
attackers’ goal. Do they want to crack a target account, or just any account? If
an army has a million possible passwords and a million users, and the alarm
goes off after three bad password attempts on any account, then the attacker
can just try one password for every different account. If you want to stop this,
you have to do rate control not just for every account, but for all accounts.
To take a concrete example, Unix systems used to be limited to eight-character passwords, so there were 96⁸ or about 2⁵² possible passwords. Some UK
government systems used to issue passwords randomly selected with a fixed
template of consonants, vowels and numbers designed to make them easier
to remember, such as CVCNCVCN (e.g. fuR5xEb8). If passwords are not case
sensitive, the search space is cut drastically, to only 21⁴·5²·10² possibilities, giving a guess
probability of about 2⁻²⁹. So if an attacker could guess 100 passwords a second – perhaps distributed across 10,000 accounts on hundreds of machines on a network, so as
not to raise the alarm – then they would need about 5 million seconds, or two
months, to get in. If you’re defending such a system, you might find it prudent to do rate control: set a limit of say one password guess per ten seconds
per user account, and perhaps by source IP address. You might also count the
failed logon attempts and analyse them: is there a constant series of guesses
that suggests an attacker using a botnet, or some other attempted intrusion?
And what will you do once you notice one? Will you close the system down?
Welcome back to the world of service denial.
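For readers who want to check the arithmetic in this example:

    import math
    full_space = 96 ** 8                 # eight printable characters
    print(math.log2(full_space))         # about 52.7 bits
    template = 21**4 * 5**2 * 10**2      # CVCNCVCN, not case sensitive
    print(math.log2(template))           # about 28.9 bits
    print(template / 100 / 86400)        # about 56 days at 100 guesses per second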
With a commercial website, 100 passwords per second may translate to one
compromised user account per second, because of poor user password choices.
That may not be a big deal for a web service with 100 million accounts – but
it may still be worth trying to identify the source of any industrial-scale
password-guessing attacks. If they’re from a small number of IP addresses,
you can block them, but doing this properly is harder than it looks, as we
noted in section 3.4.6 above. And if an automated guessing attack does persist,
then another way of dealing with it is the CAPTCHA, which I’ll describe in
section 3.5.
3.4.11 Using a password manager
Since the 1980s, companies have been selling single sign-on systems that
remember your passwords for multiple applications, and when browsers
came along in the mid-1990s and people started logging into dozens of websites, password managers became a mass-market product. Browser vendors
noticed, and started providing much the same functionality for free.
Choosing random passwords and letting your browser remember them can
be a pragmatic way of operating. The browser will only enter the password
into a web page with the right URL (IE) or the same hostname and field name
(Firefox). Browsers let you set a master password, which encrypts all the individual site passwords and which you only have to enter when your browser is
updated. The main drawbacks of password managers in general are that you
might forget the master password; and that all your passwords may be compromised at once, since malware writers can work out how to hack common
products. This is a particular issue when using a browser, and another is that
a master password is not always the default so many users don’t set one. (The
same holds for other security services you get as options with platforms, such
as encrypting your phone or laptop.) An advantage of using the browser is that
you may be able to sync passwords between the browser in your phone and
that in your laptop.
Third-party password managers can offer more, such as choosing long
random passwords for you, identifying passwords shared across more than
one website, and providing more controllable ways for you to manage the
backup and recovery of your password collection. (With a browser, this comes
down to backing up your whole laptop or phone.) They can also help you track
your accounts, so you can see whether you had a password on a system that’s
announced a breach. The downside is that many products are truly dreadful,
with even some hardware password managers storing all your secrets in
the clear [131], while the top five software products suffer from serious and
systemic vulnerabilities, from autocomplete to ignoring subdomains [391].
How do you know that any given product is actually sound?
Many banks try to disable storage, whether by setting autocomplete="off"
in their web pages or using other tricks that block password managers too.
Banks think this improves security, but I’m not at all convinced. Stopping
people using password managers or the browser’s own storage will probably
make most of them use weaker passwords. The banks may argue that killing
autocomplete makes compromise following device theft harder, and may
stop malware stealing the password from the database of your browser or
password manager, but the phishing defence provided by that product is
disabled – which may expose the average customer to greater risk [1357]. It’s
also inconvenient; one bank that suddenly disabled password storage had to
back down the following day, because of the reaction from customers [1280].
People manage risk in all sorts of ways. I personally use different browsers
for different purposes, and let them store low-value passwords; for important
accounts, such as email and banking, I always enter passwords manually, and always navigate to them via bookmarks rather than by clicking
on links. But most people are less careful. And be sure to think through
backup and recovery, and exercise it to make sure it works. What happens
when your laptop dies? When your phone dies? When someone persuades
your phone company to link your phone number to their SIM? When you
die – or when you fall ill and your partner needs to manage your stuff? Do
they know where to find the master passwords? Writing them down in a
book can make sense, if all you (and your executor) have to remember is
‘page 169, Great Expectations.’ Writing them down in a diary you tote with
you, on a page saying ‘passwords’, is not so great. Very few people get all
this right.
3.4.12 Will we ever get rid of passwords?
Passwords are annoying, so many people have discussed getting rid of them,
and the move from laptops to phones gives us a chance. The proliferation of
IoT devices that don’t have keyboards will force us to do without them for
some purposes. A handful of firms have tried to get rid of them completely.
One example is the online bank Monzo, which operates exclusively via an app.
They leave it up to the customer whether they protect their phone using a fingerprint, a pattern lock, a PIN or a password. However they still use email to
prompt people to upgrade, and to authenticate people who buy a new phone,
so account takeover involves either phone takeover, or guessing a password or
a password recovery question. The most popular app that uses SMS to authenticate rather than a password may be WhatsApp. I expect that this will become
more widespread; so we’ll see more attacks based on phone takeover, from SIM
swaps through Android malware, SS7 and RCS hacking, to simple physical
theft. In such cases, recovery often means an email loop, making your email
password more critical than ever – or phoning a call centre and telling them
your mother’s maiden name. So things may change less than they seem.
Joe Bonneau and colleagues analysed the options in 2012 [293]. There are
many criteria against which an authentication system can be evaluated, and
we’ve worked through them here: resilience to theft, to physical observation,
to guessing, to malware and other internal compromise, to leaks from other
verifiers, to phishing and to targeted impersonation. Other factors include ease
of use, ease of learning, whether you need to carry something extra, error rate,
ease of recovery, cost per user, and whether it’s an open design that anyone
can use. They concluded that most of the schemes involving net benefits were
variants on single sign-on – and OpenID has indeed become widespread, with
many people logging in to their newspaper using Google or Facebook, despite
the obvious privacy cost⁶. Beyond that, any security improvements involve
giving up one or more of the benefits of passwords, namely that they’re easy,
efficient and cheap.
Bonneau’s survey gave high security ratings to physical authentication
tokens such as the CAP reader, which enables people to use their bank cards to
log on to online banking; bank regulators have already mandated two-factor
⁶ Government attempts to set up single sign-on for public services have been less successful, with
the UK ‘Verify’ program due to be shuttered in 2020 [1394]. There have been many problems
around attempts to entrench government’s role in identity assurance, which I’ll discuss further
in the chapter on biometrics, and which spill over into issues from online services to the security of
elections. It was also hard for other private-sector firms to compete because of the network effects
enjoyed by incumbents. However in 2019 Apple announced that it would provide a new, more
privacy-friendly single sign-on mechanism, and use the market power of its app store to force
websites to support it. Thus the quality and nature of privacy on offer is becoming a side-effect
of battles fought for other motives. We’ll analyse this in more depth in the chapter on economics.
authentication in a number of countries. Using something tied to a bank
card gives a more traditional root of trust, at least with traditional high-street
banks; a customer can walk into a branch and order a new card⁷. Firms that
are targets of state-level attackers, such as Google and Microsoft, now give
authentication tokens of some kind or another to all their staff.
Did the survey miss anything? Well, the old saying is ‘something you have,
something you know, or something you are’ – or, as Simson Garfinkel engagingly puts it, ‘something you had once, something you’ve forgotten, or something you once were’. The third option, biometrics, has started coming into
wide use since high-end mobile phones started offering fingerprint readers.
Some countries, like Germany, issue their citizens with ID cards containing a
fingerprint, which may provide an alternate root of trust for when everything
else goes wrong. We’ll discuss biometrics in its own chapter later in the book.
Both tokens and biometrics are still mostly used with passwords, first as a
backstop in case a device gets stolen, and second as part of the process of security recovery. So passwords remain the (shaky) foundation on which much of
information security is built. What may change this is the growing number of
devices that have no user interface at all, and so have to be authenticated using
other mechanisms. One approach that’s getting ever more common is trust on
first use, also known as the ‘resurrecting duckling’ after the fact that a duckling bonds on the first moving animal it sees after it hatches. We’ll discuss this
in the next chapter, and also when we dive into specific applications such as
security in vehicles.
Finally, you should think hard about how to authenticate customers or other
people who exercise their right to demand copies of their personal information
under data-protection law. In 2019, James Pavur sent out 150 such requests
to companies, impersonating his fiancée [1890]. 86 firms admitted they had
information about her, and many had the sense to demand her logon and password to authenticate her. But about a quarter were prepared to accept an email
address or phone number as authentication; and a further 16 percent asked for
easily forgeable ID. He collected full personal information about her, including
her credit card number, her social security number and her mother’s maiden
name. A threat intelligence firm with which she’d never interacted sent a list
of her accounts and passwords that had been compromised. Given that firms
face big fines in the EU if they don’t comply with such requests within 30 days,
you’d better work out in advance how to cope with them, rather than leaving
it to an assistant in your law office to improvise a procedure. If you abolish
passwords, and a former customer claims their phone was stolen, what do you
do then? And if you hold personal data on people who have never been your
customers, how do you identify them?
⁷ This doesn’t work for branchless banks like Monzo; but they do take a video of you when you
register so that their call centre can recognise you later.
3.5 CAPTCHAs
Can we have protection mechanisms that use the brain’s strengths rather than
its weaknesses? The most successful innovation in this field is probably the
CAPTCHA – the ‘Completely Automated Public Turing Test to Tell Computers
and Humans Apart’. These are the little visual puzzles that you often have to
solve to post to a blog, to register for a free online account, or to recover a password. The idea is that people can solve such problems easily, while computers
find them hard.
CAPTCHAs first came into use in a big way in 2003 to stop spammers
using scripts to open thousands of accounts on free email services, and to
make it harder for attackers to try a few simple passwords with each of a
large number of existing accounts. They were invented by Luis von Ahn and
colleagues [1973], who were inspired by the test famously posed by Alan
Turing as to whether a computer was intelligent: you put a computer in one
room and a human in another, and invite a human to try to tell them apart.
The test is turned round so that a computer can tell the difference between
human and machine.
Early versions set out to use a known ‘hard problem’ in AI such as the recognition of distorted text against a noisy background. The idea is that breaking
the CAPTCHA was equivalent to solving the AI problem, so an attacker would
actually have to do the work by hand, or come up with a real innovation in computer science. Humans were good at reading distorted text, while programs
were less good. It turned out to be harder than it seemed. A lot of the attacks
on CAPTCHAs, even to this day, exploit the implementation details.
Many of the image recognition problems posed by early systems also turned
out not to be too hard at all once smart people tried hard to solve them. There
are also protocol-level attacks; von Ahn mentioned that in theory a spammer
could get people to solve them as the price of access to free porn [1972]. This
soon started to happen: spammers created a game in which you undress a
woman by solving one CAPTCHA after another [192]. Within a few years,
we saw commercial CAPTCHA-breaking tools arriving on the market [844].
Within a few more, generic attacks using signal-processing techniques inspired
by the human visual system had become fairly efficient at solving at least a subset of most types of text CAPTCHA [746]. And security-economics research in
underground markets has shown that by 2011 the action had moved to using
humans; people in countries with incomes of a few dollars a day will solve
CAPTCHAs for about 50c per 1000.
From 2014, the CAPTCHA has been superseded by the ReCAPTCHA,
another of Luis von Ahn’s inventions. Here the idea is to get a number of
users to do some useful piece of work, and check their answers against each
other. The service initially asked people to transcribe fragments of text from
Google books that confused OCR software; more recently you get a puzzle
with eight pictures asking ‘click on all images containing a shop front’, which
helps Google train its vision-recognition AI systems⁸. It pushes back on the
cheap-labour attack by putting up two or three multiple-choice puzzles and
taking tens of seconds over it, rather than allowing rapid responses.
The implementation of CAPTCHAs is often thoughtless, with accessibility
issues for users who are visually impaired. And try paying a road toll in Portugal where the website throws up a CAPTCHA asking you to identify pictures
with an object, if you can’t understand Portuguese well enough to figure out
what you’re supposed to look for!
3.6 Summary
Psychology matters to the security engineer, because of deception and because
of usability. Most real attacks nowadays target the user. Various kinds of phishing are the main national-security threat, the principal means of developing
and maintaining the cybercrime infrastructure, and one of the principal threats
to online banking systems. Other forms of deception account for much of the
rest of the cybercrime ecosystem, which is roughly equal to legacy crime in
both volume and value.
Part of the remedy is security usability, yet research in this field was long
neglected, being seen as less glamorous than cryptography or operating systems. That was a serious error on our part, and from the mid-2000s we have
started to realise the importance of making it easier for ordinary people to use
systems in safe ways. Since the mid-2010s we’ve also started to realise that we
also have to make things easier for ordinary programmers; many of the security bugs that have broken real systems have been the result of tools that were
just too hard to use, from cryptographic APIs that used unsafe defaults to the
C programming language. Getting usability right also helps business directly:
PayPal has built a $100bn business through being a safer and more convenient
way to shop online⁹.
In this chapter, we took a whistle-stop tour through psychology research relevant to deception and to the kinds of errors people make, and then tackled
authentication as a case study. Much of the early work on security usability
focused on password systems, which raise dozens of interesting questions. We
now have more and more data not just on things we can measure in the lab such
as guessability, memorability, and user trainability, but also on factors that can
⁸ There’s been pushback from users who see a ReCAPTCHA saying ‘click on all images containing
a helicopter’ and don’t want to help in military AI research. Google’s own staff protested at this
research too and the military program was discontinued. But other users still object to working
for Google for free.
⁹ Full disclosure: I consult for them.
only be observed in the field such as how real systems break, how real attacks
scale and how the incentives facing different players lead to unsafe equilibria.
At the end of the first workshop on security and human behavior in 2008, the
psychologist Nick Humphrey summed up a long discussion on risk. “We’re
all agreed,” he said, “that people pay too much attention to terrorism and not
enough to cybercrime. But to a psychologist this is obvious. If you want people
to be more relaxed in airports, take away the tanks and guns, put in some nice
sofas and Mozart in the loudspeakers, and people will relax soon enough. And
if you want people to be more wary online, make everyone use Jaws as their
screen saver. But that’s not going to happen as the computer industry goes out
of its way to make computers seem a lot less scary than they used to be.” And
of course governments want people to be anxious about terrorism, as it bids
up the police budgets and helps politicians get re-elected. So we give people
the wrong signals as well as spending our money on the wrong things. Understanding the many tensions between the demands of psychology, economics
and engineering is essential to building robust systems at global scale.
Research problems
Security psychology is one of the hot topics in 2020. In the second edition of
this book, I noted that the whole field of security economics had sprung into
life since the first edition in 2001, and wrote ‘We also need more fundamental thinking about the relationship between psychology and security’. Security
usability has become a discipline too, with the annual Symposium on Usable
Privacy and Security, and we’ve been running workshops to bring security
engineers together with anthropologists, psychologists, philosophers and others who work on risk and how people cope with it.
My meta-algorithm for finding research topics is to look first at applications
and then at neighbouring disciplines. An example of the first is safe usability: as
safety-critical products from cars to medical devices acquire not just software
and Internet connections, but complex interfaces and even their own apps, how
can we design them so that they won’t harm people by accident, or as a result
of malice?
An example of the second, and the theme of the Workshop on Security and
Human Behaviour, is what we can learn from disciplines that study how people
deal with risk, ranging from anthropology and psychology to sociology, history
and philosophy. Our 2020 event is hosting leading criminologists. The pandemic now suggests that maybe we should work with architects too. They’re
now working out how people can be physically distant but socially engaged,
and their skill is understanding how form facilitates human experience and
human interaction. There’s more to design than just hacking code.
Further reading
The Real Hustle videos are probably the best tutorial on deception; a number
of episodes are on YouTube. Meanwhile, the best book on social engineering is
still Kevin Mitnick’s ‘The Art of Deception’ [1327]. Amit Katwala wrote a short
survey of deception detection technologies [1027] while Tony Docan-Morgan
has edited a 2019 handbook on the state of deception research with 51 chapters
by specialists on its many aspects [569].
For how social psychology gets used and abused in marketing, the must-read
book is Tim Wu’s ‘The Attention Merchants’ which tells the history of advertising [2052].
In the computer science literature, perhaps a good starting point is James
Reason’s ‘Human Error’, which tells us what the safety-critical systems community has learned from many years studying the cognate problems in their
field [1592]. Then there are standard HCI texts such as [1547], while early
papers on security usability appeared as [493] and on phishing appeared
as [978]. As we move to a world of autonomous devices, there is a growing
body of research on how we can get people to trust robots more by Disneyfication – for example, giving library robots eyes that follow the direction of travel,
and making them chirp with happiness when they help a customer [1690].
Similar research on autonomous vehicles shows that people trust such vehicles
more if they’re given some personality, and the passengers are given some
strategic control such as the ability to select routes or even just to order the car
to stop.
As for behavioral economics, I get my students to read Danny Kahneman’s
Nobel prize lecture. For more technical detail, there’s a volume of papers
Danny edited just before that with Tom Gilovich and Dale Griffin [770], or
the pop science book ‘Thinking, Fast and Slow’ that he wrote afterwards [1007].
An alternative view, which gives the whole history of behavioral economics,
is Dick Thaler’s ‘Misbehaving: The Making of Behavioural Economics’ [1877]. For
the applications of this theory in government and elsewhere, the standard
reference is Dick Thaler and Cass Sunnstein’s ‘Nudge’ [1879]. Dick’s later
second thoughts about ‘Sludge’ are at [1878].
For a detailed history of passwords and related mechanisms, as well as many
empirical results and an analysis of statistical techniques for measuring both
guessability and recall, I strongly recommend Joe Bonneau’s thesis [290], a
number of whose chapters ended up as papers I cited above.
Finally, if you’re interested in the dark side, ‘The Manipulation of Human
Behavior’ by Albert Biderman and Herb Zimmer reports experiments on interrogation carried out after the Korean War with US Government funding [240].
Known as the Torturer’s Bible, it describes the relative effectiveness of sensory
deprivation, drugs, hypnosis, social pressure and so on when interrogating and
brainwashing prisoners. As for the polygraph and other deception-detection
techniques used nowadays, the standard reference is by Aldert Vrij [1974].
CHAPTER 4
Protocols
It is impossible to foresee the consequences of being clever.
– CHRISTOPHER STRACHEY
If it’s provably secure, it probably isn’t.
– LARS KNUDSEN
4.1 Introduction
Passwords are just one example of a more general concept, the security
protocol. If security engineering has a core theme, it may be the study of
security protocols. They specify the steps that principals use to establish trust
relationships. They are where the cryptography and the access controls meet;
they are the tools we use to link up human users with remote machines, to
synchronise security contexts, and to regulate key applications such as payment. We’ve come across a few protocols already, including challenge-response
authentication and Kerberos. In this chapter, I’ll dig down into the details, and
give many examples of how protocols fail.
A typical security system consists of a number of principals such as people,
companies, phones, computers and card readers, which communicate using
a variety of channels including fibre, wifi, the cellular network, bluetooth,
infrared, and by carrying data on physical devices such as bank cards and
transport tickets. The security protocols are the rules that govern these communications. They are designed so that the system will survive malicious acts
such as people telling lies on the phone, hostile governments jamming radio,
or forgers altering the data on train tickets. Protection against all possible
attacks is often too expensive, so protocol designs make assumptions about
threats. For example, when we get a user to log on by entering a password into
a machine, we implicitly assume that she can enter it into the right machine.
In the old days of hard-wired terminals in the workplace, this was reasonable;
now that people log on to websites over the Internet, it is much less obvious.
Evaluating a protocol thus involves two questions: first, is the threat model
realistic? Second, does the protocol deal with it?
Protocols may be very simple, such as swiping a badge through a reader to
enter a building. They often involve interaction, and are not necessarily technical. For example, when we order a bottle of fine wine in a restaurant, the
standard protocol is that the wine waiter offers us the menu (so that we see the
prices but our guests don’t); they bring the bottle, so we can check the label,
the seal and the temperature; they open it so we can taste it; and then serve it.
This has evolved to provide some privacy (our guests don’t learn the price),
some integrity (we can be sure we got the right bottle and that it wasn’t refilled
with cheap plonk) and non-repudiation (we can’t complain afterwards that the
wine was off). Matt Blaze gives other non-technical protocol examples from
ticket inspection, aviation security and voting in [261]. Traditional protocols
like these often evolved over decades or centuries to meet social expectations
as well as technical threats.
At the technical end of things, protocols get a lot more complex, and they
don’t always get better. As the car industry moved from metal keys to electronic
keys with buttons you press, theft fell, since the new keys were harder to copy.
But the move to keyless entry has seen car crime rise again, as the bad guys
figured out how to build relay devices that would make a key seem closer to
the car than it actually was. Another security upgrade that’s turned out to be
tricky is the move from magnetic-strip cards to smartcards. Europe made this
move in the late 2000s while the USA is only catching up in the late 2010s.
Fraud against cards issued in Europe actually went up for several years; clones
of European cards were used in magnetic-strip cash machines in the USA, as
the two systems’ protection mechanisms didn’t quite mesh. And there was a
protocol failure that let a thief use a stolen chipcard in a store even if he didn’t
know the PIN, which took the banks several years to fix.
So we need to look systematically at security protocols and how they fail.
4.2 Password eavesdropping risks
Passwords and PINs are still the foundation for much of computer security, as
the main mechanism used to authenticate humans to machines. We discussed
their usability in the last chapter; now let’s consider the kinds of technical
attack we have to block when designing protocols that operate between one
machine and another.
Remote key entry is a good place to start. The early systems, such as the
remote control used to open your garage or to unlock cars manufactured up
to the mid-1990’s, just broadcast a serial number. The attack that killed them
was the ‘grabber’, a device that would record a code and replay it later. The first
grabbers, seemingly from Taiwan, arrived on the market in about 1995; thieves
would lurk in parking lots or outside a target’s house, record the signal used
to lock the car and then replay it once the owner had gone¹.
The first countermeasure was to use separate codes for lock and unlock. But
the thief can lurk outside your house and record the unlock code before you
drive away in the morning, and then come back at night and help himself.
Second, sixteen-bit passwords are too short. Occasionally people found they
could unlock the wrong car by mistake, or even set the alarm on a car whose
owner didn’t know he had one [309]. And by the mid-1990’s, devices appeared
that could try all possible codes one after the other. A code will be found on
average after about 2¹⁵ tries, and at ten per second that takes under an hour.
A thief operating in a parking lot with a hundred vehicles within range would
be rewarded in less than a minute with a car helpfully flashing its lights.
The next countermeasure was to double the length of the password from 16
to 32 bits. The manufacturers proudly advertised ‘over 4 billion codes’. But this
only showed they hadn’t really understood the problem. There were still only
one or two codes for each car, and grabbers still worked fine.
Using a serial number as a password has a further vulnerability: lots of people
have access to it. In the case of a car, this might mean all the dealer staff, and
perhaps the state motor vehicle registration agency. Some burglar alarms have
also used serial numbers as master passwords, and here it’s even worse: when
a bank buys a burglar alarm, the serial number may appear on the order, the
delivery note and the invoice. And banks don’t like sending someone out to
buy something for cash.
Simple passwords are sometimes the appropriate technology. For example, a
monthly season ticket for our local swimming pool simply has a barcode. I’m
sure I could make a passable forgery, but as the turnstile attendants get to know
the ‘regulars’, there’s no need for anything more expensive. For things that are
online, however, static passwords are hazardous; the Mirai botnet got going by
recruiting wifi-connected CCTV cameras which had a password that couldn’t
be changed. And for things people want to steal, like cars, we also need something better. This brings us to cryptographic authentication protocols.
¹ With garage doors it’s even worse. A common chip is the Princeton PT2262, which uses 12 tri-state pins to encode 3¹² or 531,441 address codes. However, implementers often don’t read the data sheet carefully enough to understand tri-state inputs and treat them as binary instead, getting 2¹². Many of them only use eight inputs, as the other four are on the other side of the chip. And as the chip has no retry-lockout logic, an attacker can cycle through the combinations quickly and open your garage door after 2⁷ attempts on average. Twelve years after I noted these problems in the second edition of this book, the chip has not been withdrawn. It’s now also sold for home security systems and for the remote control of toys.
4.3 Who goes there? – simple authentication
A simple modern authentication device is the token that some multistorey
parking garages give subscribers to raise the barrier. The token has a single
button; when you press it, it first transmits its serial number and then sends an
authentication block consisting of the same serial number, followed by a random number, all encrypted using a key unique to the device, and sent to the
garage barrier (typically by radio at 434MHz, though infrared is also used). We
will postpone discussion of how to encrypt data to the next chapter, and simply
write {X}K for the message X encrypted under the key K.
Then the protocol between the access token and the parking garage can be
written as:
T → G ∶ T, {T, N}KT
This is standard protocol notation, so we’ll take it slowly.
The token T sends a message to the garage G consisting of its name T followed by the encrypted value of T concatenated with N, where N stands for
‘number used once’, or nonce. Everything within the braces is encrypted, and
the encryption binds T and N together as well as obscuring their values. The
purpose of the nonce is to assure the recipient that the message is fresh, that is,
it is not a replay of an old message. Verification is simple: the garage reads T,
gets the corresponding key KT, deciphers the rest of the message, checks that
the nonce N has not been seen before, and finally that the plaintext contains T.
One reason many people get confused is that to the left of the colon, T identifies one of the principals (the token that represents the subscriber) whereas to
the right it means the name (that is, the unique device number) of the token.
Another is that once we start discussing attacks on protocols, we may find
that a message intended for one principal was intercepted and played back
by another. So you might think of the T → G to the left of the colon as a hint as
to what the protocol designer had in mind.
A nonce can be anything that guarantees the freshness of a message. It can be
a random number, a counter, a random challenge received from a third party,
or even a timestamp. There are subtle differences between them, such as in the
level of resistance they offer to various kinds of replay attack, and the ways
in which they increase system cost and complexity. In very low-cost systems,
random numbers and counters predominate as it’s cheaper to communicate in
one direction only, and cheap devices usually don’t have clocks.
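To make the notation concrete, here is a minimal Python sketch of the exchange T → G ∶ T, {T, N}KT and of the garage’s checks. It is an illustration only: the function names are invented, and HMAC-SHA256 is used to bind T and N together (with the nonce sent in clear) rather than hiding them inside a cipher as a real token would.

```python
import hmac, hashlib, secrets

def make_token_message(token_id: bytes, k_token: bytes):
    """Token side of  T -> G : T, {T, N}KT.  Simplification: the nonce travels
    in clear and is bound to T by an HMAC tag instead of being enciphered."""
    nonce = secrets.token_bytes(8)                       # 'number used once'
    tag = hmac.new(k_token, token_id + nonce, hashlib.sha256).digest()
    return token_id, nonce, tag

def garage_accepts(token_id, nonce, tag, key_table, seen_nonces):
    k_token = key_table.get(token_id)                    # look up KT from the name T
    if k_token is None:
        return False
    expected = hmac.new(k_token, token_id + nonce, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):           # binds T and N together
        return False
    if (token_id, nonce) in seen_nonces:                 # freshness: reject replays
        return False
    seen_nonces.add((token_id, nonce))
    return True

key_table = {b"T42": secrets.token_bytes(16)}
seen = set()
msg = make_token_message(b"T42", key_table[b"T42"])
assert garage_accepts(*msg, key_table, seen)             # first use: barrier opens
assert not garage_accepts(*msg, key_table, seen)         # grabber replay: rejected
```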
Key management in such devices can be very simple. In a typical garage
token product, each token’s key is just its unique device number encrypted
under a global master key KM known to the garage:
KT = {T}KM
This is known as key diversification or key derivation. It’s a common way of
implementing access tokens, and is widely used in smartcards too. The goal
is that someone who compromises a token by drilling into it and extracting
the key cannot masquerade as any other token; all he can do is make a copy
of one particular subscriber’s token. In order to do a complete break of the
system, and extract the master key that would enable him to pretend to be
any of the system’s users, an attacker has to compromise the central server at
the garage (which might protect this key in a tamper-resistant smartcard or
hardware security module).
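The derivation KT = {T}KM can be sketched in a few lines. A real product would encrypt T under KM with a block cipher, with KM living in an HSM or tamper-resistant card at the garage; the sketch below substitutes HMAC-SHA256 as the derivation function, and the names are invented for illustration.

```python
import hmac, hashlib, secrets

MASTER_KEY = secrets.token_bytes(16)          # KM: held only at the garage server

def diversify(master_key: bytes, token_id: bytes) -> bytes:
    """KT = {T}KM: derive a per-device key from the device's unique number.
    HMAC-SHA256 stands in for the block-cipher encryption a real system uses."""
    return hmac.new(master_key, token_id, hashlib.sha256).digest()[:16]

# Each token is personalised with its own key at manufacture.
kt_42 = diversify(MASTER_KEY, b"T42")
# Drilling kt_42 out of one token lets an attacker clone that one subscriber's
# token, but gives no direct way back to MASTER_KEY and hence to other tokens.
```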
But there is still room for error. A common failure mode is for the serial numbers – whether unique device numbers or protocol counters – not to be long
enough, so that someone occasionally finds that their remote control works for
another car in the car park as well. This can be masked by cryptography. Having 128-bit keys doesn’t help if the key is derived by encrypting a 16-bit device
number, or by taking a 16-bit key and repeating it eight times. In either case,
there are only 2¹⁶ possible keys, and that’s unlikely to be enough even if they appear to be random².
² We’ll go into this in more detail in section 5.3.1.2, where we discuss the birthday theorem in probability theory.
Protocol vulnerabilities usually give rise to more, and simpler, attacks than
cryptographic weaknesses do. An example comes from the world of prepayment utility meters. Over a million households in the UK, plus over 400 million
in developing countries, have an electricity or gas meter that accepts encrypted
tokens: the householder buys a magic number and types it into the meter,
which then dispenses the purchased quantity of energy. One early meter that
was widely used in South Africa checked only that the nonce was different
from last time. So the customer could charge their meter indefinitely by buying two low-value power tickets and then feeding them in one after the other;
given two valid codes A and B, the series ABABAB... was seen as valid [94].
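A few lines of Python show why the ‘different from last time’ rule fails. The NaiveMeter class and its token strings are hypothetical, and a real meter would of course check the cryptogram and the encoded value; the point is only that this acceptance rule lets the series ABAB... recharge the meter forever.

```python
class NaiveMeter:
    """Flawed freshness rule: a ticket is accepted if it merely differs
    from the previous one."""
    def __init__(self):
        self.last_token = None
        self.credit = 0

    def feed(self, token, value):
        if token != self.last_token:          # the only 'replay' check
            self.credit += value
            self.last_token = token
            return True
        return False

meter = NaiveMeter()
A, B = "ticket-A", "ticket-B"                 # two genuine low-value tickets
for _ in range(3):                            # feed them alternately: ABABAB...
    meter.feed(A, 5)
    meter.feed(B, 5)
print(meter.credit)                           # 30 units of credit from 10 units paid
```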
So the question of whether to use a random number or a counter is not as
easy as it looks. If you use random numbers, the lock has to remember a lot of
past codes. There’s the valet attack, where someone with temporary access, such
as a valet parking attendant, records some access codes and replays them later
to steal your car. In addition, someone might rent a car, record enough unlock
codes, and then go back later to the rental lot to steal it. Providing enough nonvolatile memory to remember thousands of old codes might add a few cents to
the cost of your lock.
If you opt for counters, the problem is synchronization. The key might be
used for more than one lock; it may also be activated repeatedly by accident
(I once took an experimental token home where it was gnawed by my dogs).
So you need a way to recover after the counter has been incremented hundreds
or possibly even thousands of times. One common product uses a sixteen-bit counter, and allows access when the deciphered counter value is the last valid
code incremented by no more than sixteen. To cope with cases where the token
has been used more than sixteen times elsewhere (or gnawed by a family pet),
the lock will open on a second press provided that the counter value has been
incremented between 17 and 32,767 times since a valid code was entered (the
counter rolls over so that 0 is the successor of 65,535). This is fine in many
applications, but a thief who can get six well-chosen access codes – say for
values 0, 1, 20,000, 20,001, 40,000 and 40,001 – can break the system completely.
In your application, would you be worried about that?
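Here is one way the resynchronisation rule just described might be coded; the class and parameter names are mine, and real products differ in detail. It also makes it easy to see how the six well-chosen codes mentioned above let a thief walk the stored counter forward at will.

```python
WINDOW = 16
RESYNC_LIMIT = 32767
MODULUS = 1 << 16                             # sixteen-bit counter: 0 follows 65,535

class RollingCodeLock:
    def __init__(self, last_valid):
        self.last_valid = last_valid
        self.pending = None                   # counter seen on a first press, awaiting confirmation

    def try_open(self, counter):
        gap = (counter - self.last_valid) % MODULUS
        if 1 <= gap <= WINDOW:                # normal case: open straight away
            self.last_valid = counter
            self.pending = None
            return True
        if WINDOW < gap <= RESYNC_LIMIT:      # token has drifted: need two consecutive presses
            if self.pending is not None and (counter - self.pending) % MODULUS == 1:
                self.last_valid = counter
                self.pending = None
                return True
            self.pending = counter
            return False
        return False                          # too far ahead (or behind): reject

lock = RollingCodeLock(last_valid=100)
print(lock.try_open(105))                     # True  - within the window of sixteen
print(lock.try_open(20000))                   # False - first press starts resynchronisation
print(lock.try_open(20001))                   # True  - second consecutive press accepted
```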
So designing even a simple token authentication mechanism is not as easy
as it looks, and if you assume that your product will only attract low-grade
adversaries, this assumption might fail over time. An example is accessory
control. Many printer companies embed authentication mechanisms in printers
to ensure that genuine toner cartridges are used. If a competitor’s product is
loaded instead, the printer may quietly downgrade from 1200 dpi to 300 dpi,
or simply refuse to work at all. All sorts of other industries are getting in on the
act, from scientific instruments to games consoles. The cryptographic mechanisms used to support this started off in the 1990s being fairly rudimentary, as
vendors thought that any competitor who circumvented them on an industrial
scale could be sued or even jailed under copyright law. But then a judge found
that while a vendor had the right to hire the best cryptographer they could
find to lock their customers in, a competitor also had the right to hire the best
cryptanalyst they could find to set them free to buy accessories from elsewhere.
This set off a serious arms race, which we’ll discuss in section 24.6. Here I’ll
just remark that security isn’t always a good thing. Security mechanisms are
used to support many business models, where they’re typically stopping the
device’s owner doing things she wants to rather than protecting her from
the bad guys. The effect may be contrary to public policy; one example is
cellphone locking, which results in hundreds of millions of handsets ending
up in landfills each year, with toxic heavy metals as well as the embedded
carbon cost.
4.3.1 Challenge and response
Since 1995, all cars sold in Europe have been required to have a ‘cryptographically
enabled immobiliser’ and by 2010, most cars had remote-controlled door
unlocking too, though most also have a fallback metal key so you can still
get into your car even if the key fob battery is flat. The engine immobiliser is
harder to bypass using physical means and uses a two-pass challenge-response
protocol to authorise engine start. As the car key is inserted into the steering
lock, the engine controller sends a challenge consisting of a random n-bit
number to the key using short-range radio. The car key computes a response
by encrypting the challenge; this is often done by a separate RFID chip that’s
powered by the incoming radio signal and so keeps on working even if
the battery is flat. The frequency is low (125kHz) so the car can power the
transponder directly, and the exchange is also relatively immune to a noisy RF
environment.
Writing E for the engine controller, T for the transponder in the car key, K
for the cryptographic key shared between the transponder and the engine controller, and N for the random challenge, the protocol may look something like:
E → T ∶ N
T → E ∶ T, {T, N}K
This is sound in theory, but implementations of security mechanisms often
fail the first two or three times people try it.
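A sketch of the two-pass exchange, with HMAC-SHA256 standing in for the transponder’s cipher and all names invented for illustration:

```python
import hmac, hashlib, secrets

K = secrets.token_bytes(16)                   # key shared by transponder and engine controller

def engine_challenge() -> bytes:
    return secrets.token_bytes(4)             # the random challenge N

def transponder_response(transponder_id: bytes, challenge: bytes) -> bytes:
    # T -> E : T, {T, N}K  (HMAC models the encryption of T and N under K)
    return hmac.new(K, transponder_id + challenge, hashlib.sha256).digest()

def engine_accepts(transponder_id: bytes, challenge: bytes, response: bytes) -> bool:
    expected = hmac.new(K, transponder_id + challenge, hashlib.sha256).digest()
    return hmac.compare_digest(response, expected)

n = engine_challenge()
r = transponder_response(b"KEYFOB-1", n)
assert engine_accepts(b"KEYFOB-1", n, r)                        # correct key, fresh challenge
assert not engine_accepts(b"KEYFOB-1", engine_challenge(), r)   # old response fails a new challenge
```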
Between 2005 and 2015, all the main remote key entry and immobiliser systems were broken, whether by security researchers, car thieves or both. The
attacks involved a combination of protocol errors, poor key management, weak
ciphers, and short keys mandated by export control laws.
The first to fall was TI’s DST transponder chip, which was used by at least
two large car makers and was also the basis of the SpeedPass toll payment
system. Stephen Bono and colleagues found in 2005 that it used a block cipher
with a 40-bit key, which could be calculated by brute force from just two
responses [298]. This was one side-effect of US cryptography export controls,
which I discuss in 26.2.7.1. From 2010, Ford, Toyota and Hyundai adopted
a successor product, the DST80. The DST80 was broken in turn in 2020 by
Lennert Wouters and colleagues, who found that as well as side-channel
attacks on the chip, there are serious implementation problems with key
management: Hyundai keys have only 24 bits of entropy, while Toyota keys
are derived from the device serial number that an attacker can read (Tesla
was also vulnerable but unlike the older firms it could fix the problem with
a software upgrade) [2050]. Next was Keeloq, which was used for garage
door openers as well as by some car makers; in 2007, Eli Biham and others
found that given an hour’s access to a token they could collect enough data
to recover the key [244]. Worse, in some types of car, there is also a protocol
bug, in that the key diversification used exclusive-or: KT = T ⊕ KM. So you
can rent a car of the type you want to steal and work out the key for any other
car of that type.
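The difference between sound and broken diversification is easy to see in code. The sketch below is a toy – the serial number, key lengths and use of HMAC are my own assumptions – but it shows why XOR-based derivation turns the compromise of one transponder into a break of the whole fleet.

```python
import hmac, hashlib, secrets

KM = secrets.token_bytes(16)                       # global master key

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Weak diversification: KT = T xor KM
T = b"CAR-SERIAL-00042"                            # 16-byte device ID (illustrative)
KT = xor_bytes(T, KM)

# An attacker who rents one car, reads T off the device and extracts KT
# gets the master key back for free, and with it every other car's key:
recovered_KM = xor_bytes(T, KT)
assert recovered_KM == KM

# Sound diversification (KT = {T}KM, modelled here with HMAC) has no such shortcut.
KT_good = hmac.new(KM, T, hashlib.sha256).digest()[:16]
```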
Also in 2007, someone published the Philips Hitag 2 cipher, which also had
a 48-bit secret key. But this cipher is also weak, and as it was attacked by various cryptanalysts, the time needed to extract a key fell from days to hours to
minutes. By 2016, attacks took 8 authentication attempts and a minute of computation on a laptop; they worked against cars from all the French and Italian
makers, along with Nissan, Mitsubishi and Chevrolet [748].
The last to fall was the Megamos Crypto transponder, used by Volkswagen
and others. Car locksmithing tools appeared on the market from 2008, which
included the Megamos cipher and were reverse engineered by researchers from
Birmingham and Nijmegen – Roel Verdult, Flavio Garcia and Barış Ege – who
cracked it [1956]. Although it has a 96-bit secret key, the effective key length
is only 49 bits, about the same as Hitag 2. Volkswagen got an injunction in
the High Court in London to stop them presenting their work at Usenix 2013,
claiming that their trade secrets had been violated. The researchers resisted,
arguing that the locksmithing tool supplier had extracted the secrets. After two
years of argument, the case settled without admission of liability on either side.
Closer study then threw up a number of further problems. There’s also a protocol attack: an adversary can rewrite each 16-bit word of the 96-bit key, one
after another, and search for the key 16 bits at a time; this reduces the time
needed for an attack from days to minutes [1957].
Key management was pervasively bad. A number of Volkswagen implementations did not diversify keys across cars and transponders, but used a fixed
global master key for millions of cars at a time. Up till 2009, this used a cipher
called AUT64 to generate device keys; thereafter they moved to a stronger
cipher called XTEA but kept on using global master keys, which were found
in 23 models from the Volkswagen-Audi group up till 2016 [748]³.
³ There are some applications where universal master keys are inevitable, such as in communicating with a heart pacemaker – where a cardiologist may need to tweak the pacemaker of any patient who walks in, regardless of where it was first fitted, and regardless of whether the network’s up – so the vendor puts the same key in all its equipment. Another example is the subscriber smartcard in a satellite-TV set-top box, which we’ll discuss later. But they often result in a break-once-run-anywhere (BORA) attack. To install universal master keys in valuable assets like cars in a way that facilitated theft, and without even using proper tamper-resistant chips to protect them, was an egregious error.
It’s easy to find out if a car is vulnerable: just try to buy a spare key. If the locksmith companies have figured out how to duplicate the key, your local garage
will sell you a spare for a few bucks. We have a spare key for my wife’s 2005
Lexus, bought by the previous owner. But when we lost one of the keys for my
2012 Mercedes, we had to go to a main dealer, pay over £200, show my passport
and the car log book, have the mechanic photograph the vehicle identification
number on the chassis, send it all off to Mercedes and wait for a week. We saw
in Chapter 3 that the hard part of designing a password system was recovering from compromise without the recovery mechanism itself becoming either
a vulnerability or a nuisance. Exactly the same applies here!
But the worst was still to come: passive keyless entry systems (PKES).
Challenge-response seemed so good that car vendors started using it with just
a push button on the dashboard to start the car, rather than with a metal key.
Then they increased the radio frequency to extend the range, so that it worked
not just for short-range authentication once the driver was sitting in the car,
but as a keyless entry mechanism. The marketing pitch was that so long as you keep the key in your pocket or handbag you don’t have to worry about it;
the car will unlock when you walk up to it, lock as you walk away, and start
automatically when you touch the controls. What’s not to like?
Well, now that you don’t have to press a button to unlock your car, it’s easy for
thieves to use devices that amplify or relay the signals. The thief sneaks up to
your front door with one relay while leaving the other next to your car. If you
left your keys on the table in the hall, the car door opens and away he goes.
Even if the car is immobilised he can still steal your stuff. And after many years
of falling car thefts, the statistics surged in 2017 with 56% more vehicles stolen
in the UK, followed by a further 9% in 2018 [824]⁴.
The takeaway message is that the attempt since about 1990 to use cryptography to make cars harder to steal had some initial success, as immobilisers
made cars harder to steal and insurance premiums fell. It has since backfired,
as the politicians and then the marketing people got in the way. The politicians
said it would be disastrous for law enforcement if people were allowed to use
cryptography they couldn’t crack, even for stopping car theft. Then the immobiliser vendors’ marketing people wanted proprietary algorithms to lock in the
car companies, whose own marketing people wanted passive keyless entry as
it seemed cool.
What can we do? Well, at least two car makers have put an accelerometer in
the key fob, so it won’t work unless the key is moving. One of our friends left
her key on the car seat while carrying her child indoors, and got locked out.
The local police advise us to use old-fashioned metal steering-wheel locks; our
residents’ association recommends keeping keys in a biscuit tin. As for me, we
bought such a car but found that the keyless entry was simply too flaky; my
wife got stranded in a supermarket car park when it just wouldn’t work at all.
So we took that car back, and got a second-hand one with a proper push-button
remote lock. There are now chips using AES from NXP, Atmel and TI – of which
the Atmel is open source with an open protocol stack.
However crypto by itself can’t fix relay attacks; the proper fix is a new radio
protocol based on ultrawideband (UWB) with intrinsic ranging, which measures the distance from the key fob to the car with a precision of 10cm up to
a range of 150m. This is fairly complex to do properly, and the design of the
new 802.15.4z Enhanced Impulse Radio is described by Srdjan Capkun and
colleagues [1768]; the first chip became available in 2019, and it will ship in cars
from 2020. Such chips have the potential to replace both the Bluetooth and NFC
protocols, but they might not all be compatible; there’s a low-rate pulse (LRP)
mode that has an open design, and a high-rate pulse (HRP) variant that’s partly
proprietary. Were I advising a car startup, LRP would be my starting point.
⁴ To be fair, this was not due solely to relay attacks, as about half of the high-value thefts seem to involve connecting a car theft kit to the onboard diagnostic port under the glove box. As it happens, the authentication protocols used on the CAN bus inside the vehicle are also vulnerable in a number of ways [893]. Updating these protocols will take many years because of the huge industry investment.
Locks are not the only application of challenge-response protocols. In HTTP
Digest Authentication, a web server challenges a client or proxy, with whom it
shares a password, by sending it a nonce. The response consists of the hash of
the nonce, the password, and the requested URI [715]. This provides a mechanism that’s not vulnerable to password snooping. It’s used, for example, to
authenticate clients and servers in SIP, the protocol for Voice-Over-IP (VOIP)
telephony. It’s much better than sending a password in the clear, but like keyless entry it suffers from middleperson attacks (the beneficiaries seem to be
mostly intelligence agencies).
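The idea can be sketched as follows. This follows the classic Digest computation – HA1, HA2 and the nonce hashed together – but omits the further fields and options of the full specification, so treat it as an illustration rather than an interoperable implementation.

```python
import hashlib, secrets

def h(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()     # Digest traditionally used MD5

def digest_response(username, realm, password, method, uri, nonce):
    """Simplified Digest idea: the password never crosses the wire, only a
    hash bound to the server's nonce and the requested URI."""
    ha1 = h(f"{username}:{realm}:{password}")
    ha2 = h(f"{method}:{uri}")
    return h(f"{ha1}:{nonce}:{ha2}")

nonce = secrets.token_hex(16)                      # server's challenge
resp = digest_response("alice", "sip.example.com", "s3cret", "GET", "/index.html", nonce)
# The server recomputes the same value from its stored credential and compares.
assert resp == digest_response("alice", "sip.example.com", "s3cret", "GET", "/index.html", nonce)
```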
4.3.2 Two-factor authentication
The most visible use of challenge-response is probably in two-factor authentication. Many organizations issue their staff with password generators to let them
log on to corporate computer systems, and many banks give similar devices
to customers. They may look like little calculators (and some even work as
such) but their main function is as follows. When you want to log in, you
are presented with a random nonce of maybe seven digits. You key this into
your password generator, together with a PIN of maybe four digits. The device
encrypts these eleven digits using a secret key shared with the corporate security server, and displays the first seven digits of the result. You enter these seven
digits as your password. This protocol is illustrated in Figure 4.1. If you had a
password generator with the right secret key, and you entered the PIN right,
and you typed in the result correctly, then you get in.
Formally, with S for the server, P for the password generator, PIN for the
user’s Personal Identification Number, U for the user and N for the nonce:
S → U ∶ N
U → P ∶ N, PIN
P → U ∶ {N, PIN}K
U → S ∶ {N, PIN}K
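A sketch of the device’s computation, with HMAC-SHA256 standing in for whatever cipher a real token uses and the seven-digit truncation modelled crudely; all names and parameters here are illustrative.

```python
import hmac, hashlib, secrets

def generator_response(shared_key: bytes, nonce: str, pin: str) -> str:
    """The device 'encrypts' the challenge and PIN under its secret key and
    displays the first seven digits of the result."""
    mac = hmac.new(shared_key, (nonce + pin).encode(), hashlib.sha256).digest()
    return str(int.from_bytes(mac[:8], "big"))[:7]

shared_key = secrets.token_bytes(16)               # shared with the corporate security server
nonce = str(secrets.randbelow(10**7)).zfill(7)     # the seven-digit challenge shown at logon

user_entry = generator_response(shared_key, nonce, pin="4711")
# The server, which knows both the key and the user's PIN, recomputes and compares.
assert user_entry == generator_response(shared_key, nonce, pin="4711")
print(user_entry == generator_response(shared_key, nonce, pin="0000"))  # almost certainly False
```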
These devices appeared from the early 1980s and caught on first with phone
companies, then in the 1990s with banks for use by staff. There are simplified
versions that don’t have a keyboard, but just generate new access codes by
encrypting a counter or a clock. And they work; the US Defense Department
announced in 2007 that an authentication system based on the DoD Common
Access Card had cut network intrusions by 46% in the previous year [321].
This was just when crooks started phishing bank customers at scale, so many
banks adopted the technology. One of my banks gives me a small calculator
that generates a new code for each logon, and also allows me to authenticate
new payees by using the last four digits of their account number in place of
the challenge. My other bank uses the Chip Authentication Program (CAP), a
calculator in which I can insert my bank card to do the crypto.
Figure 4.1: Password generator use
But this still isn’t foolproof. In the second edition of this book, I noted ‘someone who takes your bank card from you at knifepoint can now verify that
you’ve told them the right PIN’, and this now happens. I also noted that ‘once
lots of banks use one-time passwords, the phishermen will just rewrite their
scripts to do real-time man-in-the-middle attacks’ and this has also become
widespread. To see how such attacks work, let’s look at a military example.
4.3.3 The MIG-in-the-middle attack
The first use of challenge-response authentication protocols was probably in
the military, with ‘identify-friend-or-foe’ (IFF) systems. The ever-increasing
speeds of warplanes in the 1930s and 1940s, together with the invention of the
jet engine, radar and rocketry, made it ever more difficult for air defence forces
to tell their own craft apart from the enemy’s. This led to a risk of pilots shooting down their colleagues by mistake and drove the development of automatic
systems to prevent this. These were first fielded in World War II, and enabled
an airplane illuminated by radar to broadcast an identifying number to signal friendly intent. In 1952, this system was adopted to identify civil aircraft
to air traffic controllers and, worried about the loss of security once it became
widely used, the US Air Force started a research program to incorporate cryptographic protection in the system. Nowadays, the typical air defense system
sends random challenges with its radar signals, and friendly aircraft can identify themselves with correct responses.
It’s tricky to design a good IFF system. One of the problems is illustrated
by the following story, which I heard from an officer in the South African Air
Force (SAAF). After it was published in the first edition of this book, the story
was disputed – as I’ll discuss below. Be that as it may, similar games have
been played with other electronic warfare systems since World War 2. The
‘MIG-in-the-middle’ story has since become part of the folklore, and it nicely
illustrates how attacks can be carried out in real time on challenge-response
protocols.
In the late 1980’s, South African troops were fighting a war in northern
Namibia and southern Angola. Their goals were to keep Namibia under white
rule, and impose a client government (UNITA) on Angola. Because the South
African Defence Force consisted largely of conscripts from a small white
population, it was important to limit casualties, so most South African soldiers
remained in Namibia on policing duties while the fighting to the north was
done by UNITA troops. The role of the SAAF was twofold: to provide tactical
support to UNITA by bombing targets in Angola, and to ensure that the
Angolans and their Cuban allies did not return the compliment in Namibia.
Suddenly, the Cubans broke through the South African air defenses and
carried out a bombing raid on a South African camp in northern Namibia,
killing a number of white conscripts. This proof that their air supremacy had
been lost helped the Pretoria government decide to hand over Namibia to the
insurgents – itself a huge step on the road to majority rule in South Africa
several years later. The raid may also have been the last successful military
operation ever carried out by Soviet bloc forces.
Some years afterwards, a SAAF officer told me how the Cubans had pulled
it off. Several MIGs had loitered in southern Angola, just north of the South
African air defense belt, until a flight of SAAF Impala bombers raided a target in Angola. Then the MIGs turned sharply and flew openly through the
SAAF’s air defenses, which sent IFF challenges. The MIGs relayed them to the
Angolan air defense batteries, which transmitted them at a SAAF bomber; the
responses were relayed back to the MIGs, who retransmitted them and were
allowed through – as in Figure 4.2. According to my informant, this shocked
the general staff in Pretoria. Being not only outfought by black opponents, but
actually outsmarted, was not consistent with the world view they had held up
till then.
After this tale was published in the first edition of my book, I was contacted
by a former officer in the SA Communications Security Agency who disputed the
story’s details. He said that their IFF equipment did not use cryptography yet
at the time of the Angolan war, and was always switched off over enemy territory. Thus, he said, any electronic trickery must have been of a more primitive
kind. However, others tell me that ‘Mig-in-the-middle’ tricks were significant
in Korea, Vietnam and various Middle Eastern conflicts.
Figure 4.2: The MIG-in-the-middle attack
In any case, the tale gives us another illustration of the man-in-the-middle
attack. The relay attack against cars is another example. It also works against
password calculators: the phishing site invites the mark to log on and
simultaneously opens a logon session with his bank. The bank sends a
challenge; the phisherman relays this to the mark, who uses his device to
respond to it; the phisherman relays the response to the bank, and the bank
now accepts the phisherman as the mark.
Stopping a middleperson attack is harder than it looks, and may involve multiple layers of defence. Banks typically look for a known machine, a password,
a second factor such as an authentication code from a CAP reader, and a risk
assessment of the transaction. For high-risk transactions, such as adding a new
payee to an account, both my banks demand that I compute an authentication
code on the payee account number. But they only authenticate the last four
digits, because of usability. If it takes two minutes and the entry of dozens of
digits to make a payment, then a lot of customers will get digits wrong, give
up, and then either call the call center or get annoyed and bank elsewhere.
Also, the bad guys may be able to exploit any fallback mechanisms, perhaps
by spoofing customers into calling phone numbers that run a middleperson
attack between the customer and the call center. I’ll discuss all this further in
the chapter on Banking and Bookkeeping.
We will come across such attacks again and again in applications ranging
from Internet security protocols to Bluetooth. They even apply in gaming. As
the mathematician John Conway once remarked, it’s easy to get at least a draw
against a grandmaster at postal chess: just play two grandmasters at once, one
as white and the other as black, and relay the moves between them!
4.3.4 Reflection attacks
Further interesting problems arise when two principals have to identify each
other. Suppose that a challenge-response IFF system designed to prevent
anti-aircraft gunners attacking friendly aircraft had to be deployed in a
fighter-bomber too. Now suppose that the air force simply installed one of
their air gunners’ challenge units in each aircraft and connected it to the
fire-control radar.
But now when a fighter challenges an enemy bomber, the bomber might just
reflect the challenge back to the fighter’s wingman, get a correct response, and
then send that back as its own response:
F → B ∶ N
B → F′ ∶ N
F′ → B ∶ {N}K
B → F ∶ {N}K
There are a number of ways of stopping this, such as including the names
of the two parties in the exchange. In the above example, we might require a
friendly bomber to reply to the challenge:
F → B ∶ N
with a response such as:
B → F ∶ {B, N}K
Thus a reflected response {F′, N}K from the wingman F′ could be detected⁵.
⁵ And don’t forget: you also have to check that the intruder didn’t just reflect your own challenge back at you. You must be able to remember or recognise your own messages!
This serves to illustrate the subtlety of the trust assumptions that underlie authentication. If you send out a challenge N and receive, within 20 milliseconds, a response {N}K, then – since light can travel a bit under 3,730 miles in 20 ms – you know that there is someone with the key K within 2000 miles.
But that’s all you know. If you can be sure that the response was not computed using your own equipment, you now know that there is someone else
with the key K within two thousand miles. If you make the further assumption that all copies of the key K are securely held in equipment which may
be trusted to operate properly, and you see {B, N}K , you might be justified
in deducing that the aircraft with callsign B is within 2000 miles. A careful
analysis of trust assumptions and their consequences is at the heart of security
protocol design.
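To see the check in code: the sketch below assumes a single net key K shared by all friendly aircraft, and the function names are invented. The essential point is that the verifier rejects any response naming itself, or naming anyone other than the party it actually challenged.

```python
import hmac, hashlib, secrets

K = secrets.token_bytes(16)                        # key shared across the friendly IFF net

def respond(responder_name: bytes, challenge: bytes):
    # B -> F : {B, N}K   (HMAC models the encryption; the responder's name is bound in)
    return responder_name, hmac.new(K, responder_name + challenge, hashlib.sha256).digest()

def challenger_accepts(own_name, expected_responder, challenge, response) -> bool:
    name, tag = response
    if name == own_name or name != expected_responder:
        return False                               # reflected or misdirected response: reject
    good = hmac.new(K, name + challenge, hashlib.sha256).digest()
    return hmac.compare_digest(tag, good)

N = secrets.token_bytes(8)
# Fighter F challenges bomber B; the bomber reflects the challenge to F's wingman F'.
reflected = respond(b"F-PRIME", N)                          # the wingman answers honestly, naming itself
print(challenger_accepts(b"F", b"B", N, reflected))         # False: wrong name, reflection detected
print(challenger_accepts(b"F", b"B", N, respond(b"B", N)))  # True: genuine response from B
```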
By now you might think that we understand all the protocol design aspects
of IFF. But we’ve omitted one of the most important problems – and one which
the designers of early IFF systems didn’t anticipate. As radar is passive the
returns are weak, while IFF is active and so the signal from an IFF transmitter
will usually be audible at a much greater range than the same aircraft’s
radar return. The Allies learned this the hard way; in January 1944, decrypts
of Enigma messages revealed that the Germans were plotting British and
American bombers at twice the normal radar range by interrogating their IFF.
So more modern systems authenticate the challenge as well as the response.
The NATO mode XII, for example, has a 32-bit encrypted challenge, and a
different valid challenge is generated for every interrogation signal, of which
there are typically 250 per second. Theoretically there is no need to switch
off over enemy territory, but in practice an enemy who can record valid
challenges can replay them as part of an attack. Relays are made difficult in
mode XII using directionality and time-of-flight.
Other IFF design problems include the difficulties posed by neutrals, error
rates in dense operational environments, how to deal with equipment failure,
how to manage keys, and how to cope with multinational coalitions. I’ll
return to IFF in Chapter 23. For now, the spurious-challenge problem serves
to reinforce an important point: that the correctness of a security protocol
depends on the assumptions made about the requirements. A protocol that
can protect against one kind of attack (being shot down by your own side) but
which increases the exposure to an even more likely attack (being shot down
by the other side) might not help. In fact, the spurious-challenge problem
became so serious in World War II that some experts advocated abandoning
IFF altogether, rather than taking the risk that one bomber pilot in a formation
of hundreds would ignore orders and leave his IFF switched on while over
enemy territory.
4.4 Manipulating the message
We’ve now seen a number of middleperson attacks that reflect or spoof the
information used to authenticate a participant. However, there are more
complex attacks where the attacker doesn’t just impersonate someone, but
manipulates the message content.
One example we saw already is the prepayment meter that remembers only
the last ticket it saw, so it can be recharged without limit by copying in the codes
from two tickets A and B one after another: ABABAB.... Another is when dishonest cabbies insert pulse generators in the cable that connects their taximeter
to a sensor in their taxi’s gearbox. The sensor sends pulses as the prop shaft
turns, which lets the meter work out how far the taxi has gone. A pirate device
can insert extra pulses, making the taxi appear to have gone further. A truck
driver who wants to drive faster or further than regulations allow can use a
similar device to discard some pulses, so he seems to have been driving more
slowly or not at all. We’ll discuss such attacks in the chapter on ‘Monitoring
Systems’, in section 14.3.
As well as monitoring systems, control systems often need to be hardened
against message-manipulation attacks. The Intelsat satellites used for international telephone and data traffic have mechanisms to prevent a command being
accepted twice – otherwise an attacker could replay control traffic and repeatedly order the same maneuver to be carried out until the satellite ran out of
fuel [1529]. We will see lots of examples of protocol attacks involving message
manipulation in later chapters on specific applications.
4.5 Changing the environment
A common cause of protocol failure is that the environment changes, so that
the design assumptions no longer hold and the security protocols cannot cope
with the new threats.
A nice example comes from the world of cash machine fraud. In 1993,
Holland suffered an epidemic of ‘phantom withdrawals’; there was much controversy in the press, with the banks claiming that their systems were secure
while many people wrote in to the papers claiming to have been cheated. Eventually the banks noticed that many of the victims had used their bank cards at
a certain filling station near Utrecht. This was staked out and one of the staff
was arrested. It turned out that he had tapped the line from the card reader
to the PC that controlled it; his tap recorded the magnetic stripe details from
their cards while he used his eyeballs to capture their PINs [55]. Exactly the
same fraud happened in the UK after the move to ‘chip and PIN’ smartcards
in the mid-2000s; a gang wiretapped perhaps 200 filling stations, collected
card data from the wire, observed the PINs using CCTV cameras, then made
up thousands of magnetic-strip clone cards that were used in countries whose
ATMs still used magnetic strip technology. At our local filling station, over 200
customers suddenly found that their cards had been used in ATMs in Thailand.
Why had the system been designed so badly, and why did the design error
persist for over a decade through a major technology change? Well, when the
standards for managing magnetic stripe cards and PINs were developed in the
early 1980’s by organizations such as IBM and VISA, the engineers had made
two assumptions. The first was that the contents of the magnetic strip – the
card number, version number and expiration date – were not secret, while
the PIN was [1303]. (The analogy used was that the magnetic strip was your
name and the PIN your password.) The second assumption was that bank card
equipment would only be operated in trustworthy environments, such as in a
physically robust automatic teller machine, or by a bank clerk at a teller station. So it was ‘clearly’ only necessary to encrypt the PIN, on its way from the
PIN pad to the server; the magnetic strip data could be sent in clear from the
card reader.
Both of these assumptions had changed by 1993. An epidemic of card forgery,
mostly in the Far East in the late 1980’s, drove banks to introduce authentication codes on the magnetic strips. Also, the commercial success of the bank
card industry led banks in many countries to extend the use of debit cards
from ATMs to terminals in all manner of shops. The combination of these two
environmental changes destroyed the assumptions behind the original system
architecture. Instead of putting a card whose magnetic strip contained no security data into a trusted machine, people were putting a card with clear security
data into an untrusted machine. These changes had come about so gradually,
and over such a long period, that the industry didn’t see the problem coming.
4.6 Chosen protocol attacks
Governments keen to push ID cards have tried to get them used for many other
transactions; some want a single card to be used for ID, banking and even transport ticketing. Singapore went so far as to experiment with a bank card that
doubled as military ID. This introduced some interesting new risks: if a Navy
captain tries to withdraw some cash from an ATM after a good dinner and forgets his PIN, will he be unable to take his ship to sea until Monday morning
when they open the bank and give him his card back?
Some firms are pushing multifunction authentication devices that could be
used in a wide range of transactions to save you having to carry around dozens
of different cards and keys. A more realistic view of the future may be that
people’s phones will be used for most private-sector authentication functions.
But this too may not be as simple as it looks. The idea behind the ‘Chosen
Protocol Attack’ is that given a target protocol, you design a new protocol that
will attack it if the users can be inveigled into reusing the same token or crypto
key. So how might the Mafia design a protocol to attack the authentication of
bank transactions?
Here’s one approach. It used to be common for people visiting a porn
website to be asked for ‘proof of age,’ which usually involves giving a credit
card number, whether to the site itself or to an age checking service. If
smartphones are used to authenticate everything, it would be natural for the
porn site to ask the customer to authenticate a random challenge as proof of
age. A porn site might then mount a ‘Mafia-in-the-middle’ attack as shown
in Figure 4.3. They wait until an unsuspecting customer visits their site, then
order something resellable (such as gold coins) from a dealer, playing the role
of the coin dealer’s customer. When the coin dealer sends them the transaction
data for authentication, they relay it through their porn site to the waiting
customer. The poor man OKs it, the Mafia gets the gold coins, and when
thousands of people suddenly complain about the huge charges to their cards
at the end of the month, the porn site has vanished – along with the gold [1034].
Figure 4.3: The Mafia-in-the-middle attack
In the 1990s a vulnerability of this kind found its way into international
standards: the standards for digital signature and authentication could be run
back-to-back in this way. It has since been shown that many protocols, though
secure in themselves, can be broken if their users can be inveigled into reusing
the same keys in other applications [1034]. This is why, if we’re going to use
our phones to authenticate everything, it will be really important to keep the
banking apps and the porn apps separate. That will be the subject of Chapter 6
on Access Control.
In general, using crypto keys (or other authentication mechanisms) in more
than one application is dangerous, while letting other people bootstrap their
own application security off yours can be downright foolish. The classic case
is where a bank relies for two-factor authentication on sending SMSes to customers as authentication codes. As I discussed in section 3.4.1, the bad guys
have learned to attack that system by SIM-swap fraud – pretending to the
phone company that they’re the target, claiming to have lost their phone, and
getting a replacement SIM card.
4.7 Managing encryption keys
The examples of security protocols that we’ve discussed so far are mostly about
authenticating a principal’s name, or application data such as the impulses
driving a taximeter. There is one further class of authentication protocols that
is very important – the protocols used to manage cryptographic keys.
4.7.1 The resurrecting duckling
In the Internet of Things, keys can sometimes be managed directly and physically, by local setup and a policy of trust-on-first-use or TOFU.
Vehicles provided an early example. I mentioned above that crooked taxi
drivers used to put interruptors in the cable from their car’s gearbox sensor
to the taximeter, to add additional mileage. The same problem happened in
reverse with tachographs, the devices used by trucks to monitor drivers’ hours
and speed. When tachographs went digital in the late 1990s, we decided to
encrypt the pulse train from the sensor. But how could keys be managed? The
solution was that whenever a new tachograph is powered up after a factory
reset, it trusts the first crypto key it receives over the sensor cable. I’ll discuss
this further in section 14.3.
A second example is Homeplug AV, the standard used to encrypt data communications over domestic power lines, and widely used in LAN extenders.
In the default, ‘just-works’ mode, a new Homeplug device trusts the first key
it sees; and if your new wifi extender mates with the neighbour’s wifi instead,
you just press the reset button and try again. There is also a ‘secure mode’
where you open a browser to the network management node and manually
enter a crypto key printed on the device packaging, but when we designed
the Homeplug protocol we realised that most people have no reason to bother
with that [1439].
The TOFU approach is also known as the ‘resurrecting duckling’ after
an analysis that Frank Stajano and I did in the context of pairing medical
devices [1822]. The idea is that when a baby duckling hatches, it imprints on
the first thing it sees that moves and quacks, even if this is the farmer – who
can end up being followed everywhere by a duck that thinks he’s mummy. If
such false imprinting happens with an electronic device, you need a way to
kill it and resurrect it into a newborn state – which the reset button does in a
device such as a LAN extender.
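A toy state machine captures the policy; the class name and key values are invented for illustration.

```python
class DucklingDevice:
    """Trust-on-first-use pairing: after a factory reset the device is an
    unimprinted duckling and bonds to the first key it is offered."""
    def __init__(self):
        self.paired_key = None

    def offer_key(self, key: bytes) -> bool:
        if self.paired_key is None:          # imprint on the first key seen
            self.paired_key = key
            return True
        return key == self.paired_key        # later offers must match the imprinted key

    def factory_reset(self):                 # 'kill' the duckling so it can be resurrected
        self.paired_key = None

extender = DucklingDevice()
extender.offer_key(b"neighbours-network-key")   # oops - it mated with next door's network
extender.factory_reset()                        # press the reset button...
extender.offer_key(b"my-network-key")           # ...and imprint it on the right key
```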
4.7.2 Remote key management
The more common, and interesting, case is the management of keys in
remote devices. The basic technology was developed from the late 1970s to
manage keys in distributed computer systems, with cash machines being an
early application. In this section we’ll discuss shared-key protocols such as
Kerberos, leaving public-key protocols such as TLS and SSH until after we’ve
discussed public-key cryptology in Chapter 5.
The basic idea behind key-distribution protocols is that where two principals
want to communicate, they may use a trusted third party to introduce them.
It’s customary to give them human names in order to avoid getting lost in too
much algebra. So we will call the two communicating principals ‘Alice’ and
‘Bob’, and the trusted third party ‘Sam’. Alice, Bob and Sam are likely to be
programs running on different devices. (For example, in a protocol to let a car
dealer mate a replacement key with a car, Alice might be the car, Bob the key
and Sam the car maker.)
A simple authentication protocol could run as follows.
1. Alice first calls Sam and asks for a key for communicating with Bob.
2. Sam responds by sending Alice a pair of certificates. Each contains a
copy of a key, the first encrypted so only Alice can read it, and the second
encrypted so only Bob can read it.
3. Alice then calls Bob and presents the second certificate as her
introduction. Each of them decrypts the appropriate certificate
under the key they share with Sam and thereby gets access to
the new key. Alice can now use the key to send encrypted messages to Bob, and to receive messages from him in return.
We’ve seen that replay attacks are a known problem, so in order that both Bob
and Alice can check that the certificates are fresh, Sam may include a timestamp
in each of them. If certificates never expire, there might be serious problems
dealing with users whose privileges have been revoked.
Using our protocol notation, we could describe this as
A → S ∶ A, B
S → A ∶ {A, B, KAB , T}KAS , {A, B, KAB , T}KBS
A → B ∶ {A, B, KAB , T}KBS , {M}KAB
Expanding the notation, Alice calls Sam and says she’d like to talk to Bob.
Sam makes up a message consisting of Alice’s name, Bob’s name, a session key
for them to use, and a timestamp. He encrypts all this under the key he shares
with Alice, and he encrypts another copy of it under the key he shares with
Bob. He gives both ciphertexts to Alice. Alice retrieves the session key from
the ciphertext that was encrypted to her, and passes on to Bob the ciphertext
encrypted for him. She now sends him whatever message she wanted to send,
encrypted using this session key.
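Here is a runnable sketch of the three steps. It uses a toy XOR-with-keystream cipher so that it needs only the Python standard library – emphatically not something to use in a real system – and the field names and the five-minute freshness window are arbitrary illustrative choices.

```python
import hmac, hashlib, os, time, json

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy cipher (XOR with an HMAC-derived keystream) standing in for {X}K."""
    stream, block = b"", 0
    while len(stream) < len(plaintext):
        stream += hmac.new(key, block.to_bytes(4, "big"), hashlib.sha256).digest()
        block += 1
    return bytes(a ^ b for a, b in zip(plaintext, stream))

toy_decrypt = toy_encrypt                            # XOR stream: decryption = encryption

K_AS, K_BS = os.urandom(16), os.urandom(16)          # long-term keys shared with Sam

# 1. A -> S : A, B        2. S -> A : {A, B, KAB, T}KAS, {A, B, KAB, T}KBS
K_AB = os.urandom(16)
cert = json.dumps({"a": "Alice", "b": "Bob", "kab": K_AB.hex(), "t": time.time()}).encode()
cert_for_alice = toy_encrypt(K_AS, cert)
cert_for_bob = toy_encrypt(K_BS, cert)

# 3. A -> B : {A, B, KAB, T}KBS, {M}KAB
alice_view = json.loads(toy_decrypt(K_AS, cert_for_alice))
session_key = bytes.fromhex(alice_view["kab"])
message_for_bob = toy_encrypt(session_key, b"Hello Bob")

# Bob decrypts his certificate, checks the timestamp, recovers KAB and reads M.
bob_view = json.loads(toy_decrypt(K_BS, cert_for_bob))
assert time.time() - bob_view["t"] < 300             # freshness window (an assumption)
assert toy_decrypt(bytes.fromhex(bob_view["kab"]), message_for_bob) == b"Hello Bob"
```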
4.7.3 The Needham-Schroeder protocol
Many things can go wrong, and here is a famous historical example. Many
existing key distribution protocols are derived from the Needham-Schroeder
protocol, which appeared in 1978 [1428]. It is somewhat similar to the above,
but uses nonces rather than timestamps. It runs as follows:
Message 1   A → S ∶ A, B, NA
Message 2   S → A ∶ {NA, B, KAB, {KAB, A}KBS}KAS
Message 3   A → B ∶ {KAB, A}KBS
Message 4   B → A ∶ {NB}KAB
Message 5   A → B ∶ {NB − 1}KAB
Here Alice takes the initiative, and tells Sam: ‘I’m Alice, I want to talk to Bob,
and my random nonce is NA .’ Sam provides her with a session key, encrypted
using the key she shares with him. This ciphertext also contains her nonce so
she can confirm it’s not a replay. He also gives her a certificate to convey this
key to Bob. She passes it to Bob, who then does a challenge-response to check
that she is present and alert.
There is a subtle problem with this protocol – Bob has to assume that the
key KAB he receives from Sam (via Alice) is fresh. This is not necessarily so:
Alice could have waited a year between steps 2 and 3. In many applications
this may not be important; it might even help Alice to cache keys against possible server failures. But if an opponent – say Charlie – ever got hold of Alice’s
key, he could use it to set up session keys with many other principals. And if
Alice ever got fired, then Sam had better have a list of everyone in the firm to
whom he issued a key for communicating with her, to tell them not to believe
it any more. In other words, revocation is a problem: Sam may have to keep
complete logs of everything he’s ever done, and these logs would grow in
size forever unless the principals’ names expired at some fixed time in the
future.
Almost 40 years later, this example is still controversial. The simplistic view
is that Needham and Schroeder just got it wrong; the view argued by Susan
Pancho and Dieter Gollmann (for which I have some sympathy) is that this
is a protocol failure brought on by shifting assumptions [781, 1493]. 1978 was
a kinder, gentler world; computer security then concerned itself with keeping ‘bad guys’ out, while nowadays we expect the ‘enemy’ to be among the
users of our system. The Needham-Schroeder paper assumed that all principals behave themselves, and that all attacks came from outsiders [1428]. Under
those assumptions, the protocol remains sound.
4.7.4 Kerberos
The most important practical derivative of the Needham-Schroeder protocol
is Kerberos, a distributed access control system that originated at MIT and is
now one of the standard network authentication tools [1829]. It has become
part of the basic mechanics of authentication for both Windows and Linux,
particularly when machines share resources over a local area network. Instead
of a single trusted third party, Kerberos has two kinds: authentication servers
to which users log on, and ticket granting servers which give them tickets
allowing access to various resources such as files. This enables scalable access
management. In a university, for example, one might manage students through
their colleges or halls of residence but manage file servers by departments;
in a company, the personnel people might register users to the payroll system while departmental administrators manage resources such as servers and
printers.
First, Alice logs on to the authentication server using a password. The client
software in her PC fetches a ticket from this server that is encrypted under her
password and that contains a session key KAS . Assuming she gets the password
right, she now controls KAS and to get access to a resource B controlled by the
ticket granting server S, the following protocol takes place. Its outcome is a
key KAB with timestamp TS and lifetime L, which will be used to authenticate
Alice’s subsequent traffic with that resource:
A → S:    A, B
S → A:    {TS, L, KAB, B, {TS, L, KAB, A}KBS}KAS
A → B:    {TS, L, KAB, A}KBS, {A, TA}KAB
B → A:    {TA + 1}KAB
Translating this into English: Alice asks the ticket granting server for access
to B. If this is permissible, the ticket {TS , L, KAB , A}KBS is created containing a
suitable key KAB and given to Alice to use. She also gets a copy of the key in a
form readable by her, namely encrypted under KAS . She now verifies the ticket
by sending a timestamp TA to the resource, which confirms it’s alive by sending
back the timestamp incremented by one (this shows it was able to decrypt the
ticket correctly and extract the key KAB ).
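As a sketch of what the timestamp buys us, the freshness check a verifier makes is essentially the following; the parameter names and the five-minute clock-skew allowance are illustrative assumptions, not values mandated by the protocol.
import time

ALLOWED_SKEW = 300                    # seconds of clock drift we will tolerate

def ticket_is_fresh(TS, L, now=None):
    now = time.time() if now is None else now
    return (TS - ALLOWED_SKEW) <= now <= (TS + L + ALLOWED_SKEW)

issued = time.time() - 3600           # a ticket issued an hour ago ...
print(ticket_is_fresh(issued, 8 * 3600))                           # ... with an 8-hour lifetime: accepted
print(ticket_is_fresh(issued, 8 * 3600, now=time.time() - 7200))   # a verifier whose clock runs slow sees a ticket from the future and rejects it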
The revocation issue with the Needham-Schroeder protocol has been fixed
by introducing timestamps rather than random nonces. But, as in most of life,
we get little in security for free. There is now a new vulnerability, namely that
the clocks on our various clients and servers might get out of sync; they might
even be desynchronized deliberately as part of a more complex attack.
What’s more, Kerberos is a trusted third-party (TTP) protocol in that S is
trusted: if the police turn up with a warrant, they can get Sam to turn over
the keys and read the traffic. Protocols with this feature were favoured during
the ‘crypto wars’ of the 1990s, as I will discuss in section 26.2.7. Protocols that
involve no or less trust in a third party generally use public-key cryptography,
which I describe in the next chapter.
A rather similar protocol to Kerberos is OAuth, a mechanism to allow secure
delegation. For example, if you log into Doodle using Google and allow Doodle to update your Google calendar, Doodle’s website redirects you to Google,
which gets you to log in (or relies on a master cookie from a previous login) and
asks you for consent for Doodle to write to your calendar. Doodle then gives
you an access token for the calendar service [864]. I mentioned in section 3.4.9.3
that this poses a cross-site phishing risk. OAuth was not designed for user
authentication, and access tokens are not strongly bound to clients. It’s a complex framework within which delegation mechanisms can be built, with both
short-term and long-term access tokens; the details are tied up with how cookies and web redirects operate and optimised to enable servers to be stateless,
so they scale well for modern web services. In the example above, you want to
be able to revoke Doodle’s access at Google, so behind the scenes Doodle only
gets short-lived access tokens. Because of this complexity, the OpenID Connect
protocol is a ‘profile’ of OAuth which ties down the details for the case where
the only service required is authentication. OpenID Connect is what you use
when you log into your newspaper using your Google or Facebook account.
4.7.5 Practical key management
So we can use a protocol like Kerberos to set up and manage working keys
between users given that each user shares one or more long-term keys with a
server that acts as a key distribution centre. But there may be encrypted passwords for tens of thousands of staff and keys for large numbers of devices too.
That’s a lot of key material. How is it to be managed?
Key management is a complex and difficult business and is often got wrong
because it’s left as an afterthought. You need to sit down and think about how
many keys are needed, how they’re to be generated, how long they need to
remain in service and how they’ll eventually be destroyed. There is a much
longer list of concerns – many of them articulated in the Federal Information
Processing Standard for key management [1410]. And things go wrong as
applications evolve; it’s important to provide headroom to support next year’s
functionality. It’s also important to support recovery from security failure. Yet
there are no standard ways of doing either.
Public-key cryptography, which I’ll discuss in Chapter 5, can simplify the
key-management task slightly. In banking the usual answer is to use dedicated cryptographic processors called hardware security modules, which I’ll
describe in detail later. Both of these introduce further complexities though,
and even more subtle ways of getting things wrong.
4.8 Design assurance
Subtle difficulties of the kind we have seen above, and the many ways in which
protection properties depend on subtle assumptions that may be misunderstood, have led researchers to apply formal methods to protocols. The goal of
this exercise was originally to decide whether a protocol was right or wrong: it
should either be proved correct, or an attack should be exhibited. We often find
that the process helps clarify the assumptions that underlie a given protocol.
There are several different approaches to verifying the correctness of protocols. One of the best known is the logic of belief, or BAN logic, named after its
inventors Burrows, Abadi and Needham [352]. It reasons about what a principal might reasonably believe having seen certain messages, timestamps and so
on. Other researchers have applied mainstream formal methods such as CSP
and verification tools such as Isabelle.
Some history exists of flaws being found in protocols that had been proved
correct using formal methods; I described an example in Chapter 3 of the second edition, of how the BAN logic was used to verify a bank card used for
stored-value payments. That’s still used in Germany as the ‘Geldkarte’ but
elsewhere its use has died out (it was Net1 in South Africa, Proton in Belgium,
Moneo in France and a VISA product called COPAC). I’ve therefore decided
to drop the gory details from this edition; the second edition is free online, so
you can download and read the details.
Formal methods can be an excellent way of finding bugs in security protocol
designs as they force the designer to make everything explicit and thus confront difficult design choices that might otherwise be fudged. But they have
their limitations, too.
We often find bugs in verified protocols; they’re just not in the part that
we verified. For example, Larry Paulson verified the SSL/TLS protocol using
his Isabelle theorem prover in 1998, and about one security bug has been
found every year since then. These have not been flaws in the basic design but
exploited additional features that had been added later, and implementation
issues such as timing attacks, which we’ll discuss later. In this case there was
no failure of the formal method; that simply told the attackers where they
needn’t bother looking.
For these reasons, people have explored alternative ways of assuring the
design of authentication protocols, including the idea of protocol robustness. Just
as structured programming techniques aim to ensure that software is designed
methodically and nothing of importance is left out, so robust protocol design is
largely about explicitness. Robustness principles include that the interpretation
of a protocol should depend only on its content, not its context; so everything
of importance (such as principals’ names) should be stated explicitly in the
messages. It should not be possible to interpret data in more than one way;
so the message formats need to make clear what’s a name, what’s an address,
what’s a timestamp, and so on; string formats have to be unambiguous and it
should be impossible to use the protocol itself to mount attacks on the software
that handles it, such as by buffer overflows. There are other issues concerning
the freshness provided by counters, timestamps and random challenges, and
on the way encryption is used. If the protocol uses public key cryptography
or digital signature mechanisms, there are more subtle attacks and further
robustness issues, which we’ll start to tackle in the next chapter. To whet your
appetite, randomness in protocols often helps robustness at other layers, since
it makes it harder to do a whole range of attacks – from those based on mathematical cryptanalysis through those that exploit side-channels such as power
consumption and timing to physical attacks that involve microprobes or lasers.
4.9 Summary
Passwords are just one example of a more general concept, the security
protocol. Protocols specify the steps that principals use to establish trust relationships in a system, such as authenticating a claim to identity, demonstrating
ownership of a credential, or establishing a claim on a resource. Cryptographic
authentication protocols are used for a wide range of purposes, from basic
entity authentication to providing infrastructure for distributed systems that
allows trust to be taken from where it exists to where it is needed. Security
protocols are fielded in all sorts of systems from remote car door locks through
military IFF systems to authentication in distributed computer systems.
Protocols are surprisingly difficult to get right. They can suffer from a number
of problems, including middleperson attacks, modification attacks, reflection
attacks, and replay attacks. These threats can interact with implementation vulnerabilities and poor cryptography. Using mathematical techniques to verify
the correctness of protocols can help, but it won’t catch all the bugs. Some of
the most pernicious failures are caused by creeping changes in the environment
for which a protocol was designed, so that the protection it gives is no longer
relevant. The upshot is that attacks are still found frequently on protocols that
we’ve been using for years, and sometimes even on protocols for which we
thought we had a security proof. Failures have real consequences, including
the rise in car crime worldwide since car makers started adopting passive keyless entry systems without stopping to think about relay attacks. Please don’t
design your own protocols; get a specialist to help, and ensure that your design
is published for thorough peer review by the research community. Even specialists get the first versions of a protocol wrong (I have, more than once). It’s
a lot cheaper to fix the bugs before the protocol is actually deployed, both in
terms of cash and in terms of reputation.
Research problems
At several times during the past 30 years, some people have thought that protocols had been ‘done’ and that we should turn to new research topics. They
have been repeatedly proved wrong by the emergence of new applications with
a new crop of errors and attacks to be explored. Formal methods blossomed in
the early 1990s, then key management protocols; during the mid-1990s the
flood of proposals for electronic commerce mechanisms kept us busy. Since
2000, one strand of protocol research has acquired an economic flavour as security mechanisms are used more and more to support business models; the
designer’s ‘enemy’ is often a commercial competitor, or even the customer.
Another has applied protocol analysis tools to look at the security of application programming interfaces (APIs), a topic to which I’ll return later.
Much protocol research is problem-driven, but there are still deep questions.
How much can we get out of formal methods, for example? And how do we
manage the tension between the principle that robust protocols are generally
those in which everything is completely specified and checked and the system
engineering principle that a good specification should not overconstrain the
implementer?
Further reading
Research papers on security protocols are scattered fairly widely throughout
the literature. For the historical background you might read the original
Needham-Schroeder paper [1428], the Burrows-Abadi-Needham authentication logic [352], papers on protocol robustness [2, 113] and a survey paper by
Anderson and Needham [114]. Beyond that, there are many papers scattered
around a wide range of conferences; you might also start by studying the
protocols used in a specific application area, such as payments, which we
cover in more detail in Part 2. As for remote key entry and other security
issues around cars, a good starting point is a tech report by Charlie Miller and
Chris Valasek on how to hack a Jeep Cherokee [1318].
CHAPTER 5
Cryptography
ZHQM ZMGM ZMFM
– G JULIUS CAESAR
KXJEY UREBE ZWEHE WRYTU HEYFS KREHE GOYFI WTTTU OLKSY CAJPO BOTEI ZONTX BYBWT
GONEY CUZWR GDSON SXBOU YWRHE BAAHY USEDQ
– JOHN F KENNEDY
5.1 Introduction
Cryptography is where security engineering meets mathematics. It gives us the
tools that underlie most modern security protocols. It is the key technology for
protecting distributed systems, yet it is surprisingly hard to do right. As we’ve
already seen in Chapter 4, “Protocols,” cryptography has often been used to
protect the wrong things, or to protect them in the wrong way. Unfortunately,
the available crypto tools aren’t always very usable.
But no security engineer can ignore cryptology. A medical friend once told
me that while she was young, she worked overseas in a country where, for
economic reasons, they’d shortened their medical degrees and concentrated
on producing specialists as quickly as possible. One day, a patient who’d had
both kidneys removed and was awaiting a transplant needed her dialysis shunt
redone. The surgeon sent the patient back from the theater on the grounds that
there was no urinalysis on file. It just didn’t occur to him that a patient with no
kidneys couldn’t produce any urine.
Just as a doctor needs to understand physiology as well as surgery, so
a security engineer needs to be familiar with at least the basics of crypto
(and much else). There are, broadly speaking, three levels at which one can
approach crypto. The first consists of the underlying intuitions; the second of
the mathematics that we use to clarify these intuitions, provide security proofs
where possible and tidy up the constructions that cause the most confusion;
and the third is the cryptographic engineering – the tools we commonly use,
and the experience of what can go wrong with them. In this chapter, I assume
you have no training in crypto and set out to explain the basic intuitions.
I illustrate them with engineering, and sketch enough of the mathematics to
help give you access to the literature when you need it. One reason you need
some crypto know-how is that many common constructions are confusing,
and many tools offer unsafe defaults. For example, Microsoft’s Crypto API
(CAPI) nudges engineers to use electronic codebook mode; by the end of
this chapter you should understand what that is, why it’s bad, and what you
should do instead.
Many crypto textbooks assume that their readers are pure maths graduates,
so let me start off with non-mathematical definitions. Cryptography refers to
the science and art of designing ciphers; cryptanalysis to the science and art of
breaking them; while cryptology, often shortened to just crypto, is the study of
both. The input to an encryption process is commonly called the plaintext or
cleartext, and the output the ciphertext. Thereafter, things get somewhat more
complicated. There are a number of basic building blocks, such as block ciphers,
stream ciphers, and hash functions. Block ciphers may either have one key for
both encryption and decryption, in which case they’re called shared-key (also
secret-key or symmetric), or have separate keys for encryption and decryption,
in which case they’re called public-key or asymmetric. A digital signature scheme
is a special type of asymmetric crypto primitive.
I will first give some historical examples to illustrate the basic concepts. I’ll
then fine-tune definitions by introducing the security models that cryptologists
use, including perfect secrecy, concrete security, indistinguishability and the
random oracle model. Finally, I’ll show how the more important cryptographic
algorithms actually work, and how they can be used to protect data. En route,
I’ll give examples of how people broke weak ciphers, and weak constructions
using strong ciphers.
5.2 Historical background
Suetonius tells us that Julius Caesar enciphered his dispatches by writing ‘D’
for ‘A’, ‘E’ for ‘B’ and so on [1847]. When Augustus Caesar ascended the throne,
he changed the imperial cipher system so that ‘C’ was now written for ‘A’, ‘D’
for ‘B’ etcetera. In modern terminology, we would say that he changed the key
from ‘D’ to ‘C’. Remarkably, a similar code was used by Bernardo Provenzano,
allegedly the capo di tutti capi of the Sicilian mafia, who wrote ‘4’ for ‘a’, ‘5’ for
‘b’ and so on. This led directly to his capture by the Italian police in 2006 after
they intercepted and deciphered some of his messages [1538].
The Arabs generalised this idea to the monoalphabetic substitution, in which a
keyword is used to permute the cipher alphabet. We will write the plaintext in
lower case letters, and the ciphertext in upper case, as shown in Figure 5.1:
abcdefghijklmnopqrstuvwxyz
SECURITYABDFGHJKLMNOPQVWXZ
Figure 5.1: Monoalphabetic substitution cipher
OYAN RWSGKFR AN AH RHTFANY MSOYRM OYSH SMSEAC NCMAKO; but it’s a
pencil and paper puzzle to break ciphers of this kind. The trick is that some
letters, and combinations of letters, are much more common than others; in
English the most common letters are e,t,a,i,o,n,s,h,r,d,l,u in that order. Artificial
intelligence researchers have experimented with programs to solve monoalphabetic substitutions. Using letter and digram (letter pair) frequencies alone,
they typically need about 600 letters of ciphertext; smarter strategies such as
guessing probable words can cut this to about 150 letters; and state-of-the-art
systems that use neural networks and approach the competence of human
analysts are also tested on deciphering ancient scripts such as Ugaritic and
Linear B [1196].
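The first step of such an attack is just counting; a couple of lines of Python applied to the sample ciphertext above show the idea. On a text this short the statistics are noisy, but on a few hundred letters the most frequent ciphertext letters line up with e, t, a and so on.
from collections import Counter

ct = "OYAN RWSGKFR AN AH RHTFANY MSOYRM OYSH SMSEAC NCMAKO"
print(Counter(c for c in ct if c.isalpha()).most_common(5))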
There are basically two ways to make a stronger cipher – the stream cipher
and the block cipher. In the former, you make the encryption rule depend on
a plaintext symbol’s position in the stream of plaintext symbols, while in the
latter you encrypt several plaintext symbols at once in a block.
5.2.1 An early stream cipher – the Vigenère
This early stream cipher is commonly ascribed to the Frenchman Blaise de
Vigenère, a diplomat who served King Charles IX. It works by adding a key
repeatedly into the plaintext using the convention that ‘A’ = 0, ‘B’ = 1, … , ‘Z’
= 25, and addition is carried out modulo 26 – that is, if the result is greater than
25, we subtract as many multiples of 26 as are needed to bring it into the range
[0, … , 25], that is, [A, … , Z]. Mathematicians write this as
C = P + K mod 26
So, for example, when we add P (15) to U (20) we get 35, which we reduce to
9 by subtracting 26. 9 corresponds to J, so the encryption of P under the key U
(and of U under the key P) is J, or more simply U + P = J. In this notation, Julius
Caesar’s system used a fixed key K = D, while Augustus Caesar’s used K = C
and Vigenère used a repeating key, also known as a running key. Techniques
were developed to do this quickly, ranging from printed tables to brass cipher
wheels. Whatever the technology, the encryption using a repeated keyword for
the key would look as shown in Figure 5.2:
Plain    tobeornottobethatisthequestion
Key      runrunrunrunrunrunrunrunrunrun
Cipher   KIOVIEEIGKIOVNURNVJNUVKHVMGZIA
Figure 5.2: Vigenère (polyalphabetic substitution cipher)
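The rule C = P + K mod 26 is a one-liner in code; this short sketch reproduces the example in Figure 5.2.
def vigenere(plain, key):
    return ''.join(chr(ord('A') + (ord(p) - ord('a') + ord(key[i % len(key)]) - ord('a')) % 26)
                   for i, p in enumerate(plain))

print(vigenere("tobeornottobethatisthequestion", "run"))
# KIOVIEEIGKIOVNURNVJNUVKHVMGZIA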
A number of people appear to have worked out how to solve polyalphabetic
ciphers, from the womaniser Giacomo Casanova to the computing pioneer
Charles Babbage. But the first published solution was in 1863 by Friedrich
Kasiski, a Prussian infantry officer [1023]. He noticed that given a long enough
piece of ciphertext, repeated patterns will appear at multiples of the keyword
length.
In Figure 5.2, for example, we see ‘KIOV’ repeated after nine letters, and ‘NU’
after six. Since three divides both six and nine, we might guess a keyword of
three letters. Then ciphertext letters one, four, seven and so on were all enciphered under the same keyletter; so we can use frequency analysis techniques
to guess the most likely values of this letter, and then repeat the process for the
remaining letters of the key.
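Kasiski's observation is easy to automate; the sketch below lists the distances between repeated letter pairs in the Figure 5.2 ciphertext and takes their greatest common divisor, which points to a three-letter key.
from math import gcd
from functools import reduce

ct = "KIOVIEEIGKIOVNURNVJNUVKHVMGZIA"
gaps = [j - i
        for i in range(len(ct) - 1)
        for j in range(i + 1, len(ct) - 1)
        if ct[i:i+2] == ct[j:j+2]]
print(gaps, reduce(gcd, gaps))        # gaps of 9, 9, 9 and 6; gcd 3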
5.2.2 The one-time pad
One way to make a stream cipher of this type proof against attacks is for the
key sequence to be as long as the plaintext, and to never repeat. This is known
as the one-time pad and was proposed by Gilbert Vernam during World War
I [1003]; given any ciphertext, and any plaintext of the same length, there’s a
key that decrypts the ciphertext to the plaintext. So regardless of the amount of
computation opponents can do, they’re none the wiser, as given any ciphertext,
all possible plaintexts of that length are equally likely. This system therefore has
perfect secrecy.
Here’s an example. Suppose you had intercepted a message from a wartime
German agent which you knew started with ‘Heil Hitler’, and the first ten letters of ciphertext were DGTYI BWPJA. So the first ten letters of the one-time pad
were wclnb tdefj, as shown in Figure 5.3:
Plain    heilhitler
Key      wclnbtdefj
Cipher   DGTYIBWPJA
Figure 5.3: A spy’s message
But once he’s burnt the piece of silk with his key material, the spy can claim
that he’s actually a member of the underground resistance, and the message
actually said ‘Hang Hitler’. This is also possible, as the key material could just
as easily have been wggsb tdefj, as shown in Figure 5.4:
Cipher   DGTYIBWPJA
Key      wggsbtdefj
Plain    hanghitler
Figure 5.4: What the spy can claim he said
Now we rarely get anything for nothing in cryptology, and the price of the
perfect secrecy of the one-time pad is that it fails completely to protect message
integrity. So if you wanted to get this spy into trouble, you could change the
ciphertext to DCYTI BWPJA (Figure 5.5):
Cipher   DCYTIBWPJA
Key      wclnbtdefj
Plain    hanghitler
Figure 5.5: Manipulating the message to entrap the spy
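The entrapment needs no knowledge of the key at all: with an additive cipher, anyone who can guess the plaintext can compute C′ = C + (P′ − P) mod 26 and so turn the ciphertext into one that decrypts to any other plaintext of the same length. A few lines of Python reproduce the forged message above.
def to_nums(s):                                   # a/A = 0 ... z/Z = 25
    return [ord(c.lower()) - ord('a') for c in s]

def forge(ciphertext, known_plain, new_plain):
    return ''.join(chr((c + q - p) % 26 + ord('A'))
                   for c, p, q in zip(to_nums(ciphertext),
                                      to_nums(known_plain),
                                      to_nums(new_plain)))

print(forge("DGTYIBWPJA", "heilhitler", "hanghitler"))   # DCYTIBWPJA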
Leo Marks’ engaging book on cryptography in the Special Operations Executive in World War II [1226] relates how one-time key material was printed on
silk, which agents could conceal inside their clothing; whenever a key had been
used it was torn off and burnt. In fact, during the war, Claude Shannon proved
that a cipher has perfect secrecy if and only if there are as many possible keys
as possible plaintexts, and every key is equally likely; so the one-time pad is
the only kind of system that offers perfect secrecy. He was finally allowed to
publish this in 1948 [1717, 1718].
The one-time tape was used for top-level communications by both sides from
late in World War II, then for strategic communications between NATO allies,
and for the US-USSR hotline from 1963. Thousands of machines were produced
in total, using paper tapes for key material, until they were eventually replaced
by computers from the mid-1980s¹. But such cryptography is too expensive for
most applications as it consumes as much key material as there is traffic. It’s
more common for stream ciphers to use a pseudorandom number generator
to expand a short key into a long keystream. The data is then encrypted by
combining the keystream, one symbol at a time, with the data. It’s not enough
for the keystream to appear “random” in the sense of passing the standard
statistical randomness tests: it must also have the property that an opponent
who gets his hands on even quite a lot of keystream symbols should not be able
to predict any more of them.
¹ Information about the machines can be seen at the Crypto Museum, https://www.cryptomuseum.com.
An early example was rotor machines, mechanical stream-cipher devices that
produce a very long sequence of pseudorandom states² and combine them
with plaintext to get ciphertext. These machines were independently invented
by a number of people from the 1920s, many of whom tried to sell them to the
banking industry. Banks weren’t in general interested, for reasons we’ll discuss below, but rotor machines were very widely used by the combatants in
World War II to encipher radio traffic, and the efforts made by the Allies to decipher German traffic included the work by Alan Turing and others on Colossus,
which helped kickstart the computer industry after the war.
Stream ciphers have been widely used in hardware applications where the
number of gates had to be minimised to save power. However, block ciphers
are more flexible and are more common in systems being designed now, so let’s
look at them next.
5.2.3 An early block cipher – Playfair
The Playfair cipher was invented in 1854 by Sir Charles Wheatstone, a telegraph pioneer who also invented the concertina and the Wheatstone bridge.
The reason it’s not called the Wheatstone cipher is that he demonstrated it to
Baron Playfair, a politician; Playfair in turn demonstrated it to Prince Albert
and to Viscount Palmerston (later Prime Minister), on a napkin after dinner.
This cipher uses a 5 by 5 grid, in which we place the alphabet, permuted by
the key word, and omitting the letter ‘J’ (see Figure 5.6):
P A L M E
R S T O N
B C D F G
H I K Q U
V W X Y Z
Figure 5.6: The Playfair enciphering table
The plaintext is first conditioned by replacing ‘J’ with ‘I’ wherever it occurs,
then dividing it into letter pairs, preventing double letters occurring in a pair by
separating them with an ‘x’, and finally adding a ‘z’ if necessary to complete the
last letter pair. The example Playfair wrote on his napkin was ‘Lord Granville’s
letter’ which becomes ‘lo rd gr an vi lx le sl et te rz’.
² Letters in the case of the Hagelin machine used by the USA, permutations in the case of the German Enigma and the British Typex.
Plain    lo rd gr an vi lx le sl et te rz
Cipher   MT TB BN ES WH TL MP TA LN NL NV
Figure 5.7: Example of Playfair enciphering
It is then enciphered two letters at a time using the following rules:
if the two letters are in the same row or column, they are replaced
by the succeeding letters. For example, ‘am’ enciphers to ‘LE’;
otherwise the two letters stand at two of the corners of a rectangle
in the table, and we replace them with the letters at the other two
corners of this rectangle. For example, ‘lo’ enciphers to ‘MT’.
We can now encipher our specimen text as shown in Figure 5.7 above.
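A compact implementation over the table in Figure 5.6 reproduces this; the plaintext is assumed to be already conditioned into pairs as described above.
TABLE = ["PALME", "RSTON", "BCDFG", "HIKQU", "VWXYZ"]          # the rows of Figure 5.6
POS = {ch: (r, c) for r, row in enumerate(TABLE) for c, ch in enumerate(row)}

def encipher_pair(a, b):
    (ra, ca), (rb, cb) = POS[a], POS[b]
    if ra == rb:                                  # same row: take the letters to the right
        return TABLE[ra][(ca + 1) % 5] + TABLE[rb][(cb + 1) % 5]
    if ca == cb:                                  # same column: take the letters below
        return TABLE[(ra + 1) % 5][ca] + TABLE[(rb + 1) % 5][cb]
    return TABLE[ra][cb] + TABLE[rb][ca]          # rectangle: swap the columns

pairs = "lo rd gr an vi lx le sl et te rz".upper().split()
print(' '.join(encipher_pair(p[0], p[1]) for p in pairs))
# MT TB BN ES WH TL MP TA LN NL NV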
Variants of this cipher were used by the British army as a field cipher in World
War I, and by the Americans and Germans in World War II. It’s a substantial improvement on Vigenère as the statistics that an analyst can collect are
of digraphs (letter pairs) rather than single letters, so the distribution is much
flatter and more ciphertext is needed for an attack.
Again, it’s not enough for the output of a block cipher to just look intuitively
“random”. Playfair ciphertexts look random; but they have the property that
if you change a single letter of a plaintext pair, then often only a single letter
of the ciphertext will change. Thus using the key in Figure 5.7, rd enciphers to
TB while rf enciphers to OB and rg enciphers to NB. One consequence is that
given enough ciphertext, or a few probable words, the table (or an equivalent
one) can be reconstructed [740]. In fact, the quote at the head of this chapter
is a Playfair-encrypted message sent by the future President Jack Kennedy
when he was a young lieutenant holed up on a small island with ten other survivors after his motor torpedo boat had been sunk in a collision with a Japanese
destroyer. Had the Japanese intercepted it, they might possibly have decrypted
it, and history could be different. For a stronger cipher, we will want the effects
of small changes in the cipher’s input to diffuse completely through its output. Changing one input bit should, on average, cause half of the output bits
to change. We’ll tighten these ideas up in the next section.
The security of a block cipher can also be greatly improved by choosing a
longer block length than two characters. For example, the Data Encryption Standard (DES), which is widely used in payment systems, has a block length of 64
bits and the Advanced Encryption Standard (AES), which has replaced it in most
other applications, has a block length of twice this. I discuss the internal details
of DES and AES below; for the time being, I’ll just remark that we need more
than just an adequate block size.
For example, if a bank account number always appears at the same place
in a transaction, then it’s likely to produce the same ciphertext every time a
transaction involving it is encrypted with the same key. This might allow an
opponent to cut and paste parts of two different ciphertexts in order to produce
a valid but unauthorised transaction. Suppose a crook worked for a bank’s
phone company, and monitored an enciphered transaction that he knew said
“Pay IBM $10,000,000”. He might wire $1,000 to his brother causing the bank
computer to insert another transaction saying “Pay John Smith $1,000”, intercept this instruction, and make up a false instruction from the two ciphertexts
that decrypted as “Pay John Smith $10,000,000”. So unless the cipher block is
as large as the message, the ciphertext will contain more than one block and
we’ll need some way of binding the blocks together.
5.2.4 Hash functions
The third classical type of cipher is the hash function. This evolved to protect
the integrity and authenticity of messages, where we don’t want someone to
be able to manipulate the ciphertext in such a way as to cause a predictable
change in the plaintext.
After the invention of the telegraph in the mid-19th century, banks rapidly
became its main users and developed systems for transferring money electronically. What’s ‘wired’ is a payment instruction, such as:
‘To Lombard Bank, London. Please pay from our account with you no.
1234567890 the sum of £1000 to John Smith of 456 Chesterton Road, who has
an account with HSBC Bank Cambridge no. 301234 4567890123, and notify
him that this was for “wedding present from Doreen Smith”. From First Cowboy
Bank of Santa Barbara, CA, USA. Charges to be paid by us.’
Since telegraph messages were relayed from one office to another by human
operators, it was possible for an operator to manipulate a payment message.
In the nineteenth century, banks, telegraph companies and shipping companies developed code books that could not only protect transactions but also
shorten them – which was important given the costs of international telegrams
at the time. A code book was essentially a block cipher that mapped words or
phrases to fixed-length groups of letters or numbers. So “Please pay from our
account with you no.” might become ‘AFVCT’. Sometimes the codes were also
enciphered.
The banks realised that neither stream ciphers nor code books protect message authenticity. If, for example, the codeword for ‘1000’ is ‘mauve’ and for
‘1,000,000’ is ‘magenta’, then the crooked telegraph clerk who can compare the
coded traffic with known transactions should be able to figure this out and
substitute one for the other.
The critical innovation, for the banks’ purposes, was to use a code book but
to make the coding one-way by adding the code groups together into a number
called a test key. (Modern cryptographers would describe it as a hash value or
message authentication code, terms I’ll define more carefully later.)
Here is a simple example. Suppose the bank has a code book with a table of
numbers corresponding to payment amounts as in Figure 5.8.
              0   1   2   3   4   5   6   7   8   9
x 1000       14  22  40  87  69  93  71  35  06  58
x 10,000     73  38  15  46  91  82  00  29  64  57
x 100,000    95  70  09  54  82  63  21  47  36  18
x 1,000,000  53  77  66  29  40  12  31  05  87  94
Figure 5.8: A simple test key system
Now in order to authenticate a transaction for £376,514 we might add
together 53 (no millions), 54 (300,000), 29 (70,000) and 71 (6,000) ignoring the
less significant digits. This gives us a test key of 207.
Most real systems were more complex than this; they usually had tables for
currency codes, dates and even recipient account numbers. In the better systems, the code groups were four digits long rather than two, and in order to
make it harder for an attacker to reconstruct the tables, the test keys were compressed: a key of ‘7549’ might become ‘23’ by adding the first and second digits,
and the third and fourth digits, ignoring the carry.
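The arithmetic is easy to check; the sketch below transcribes Figure 5.8, computes the test for £376,514, and applies the digit-pair compression just described.
TEST_TABLE = {
    1_000:     [14, 22, 40, 87, 69, 93, 71, 35, 6, 58],
    10_000:    [73, 38, 15, 46, 91, 82, 0, 29, 64, 57],
    100_000:   [95, 70, 9, 54, 82, 63, 21, 47, 36, 18],
    1_000_000: [53, 77, 66, 29, 40, 12, 31, 5, 87, 94],
}

def test_key(amount):
    return sum(row[(amount // unit) % 10] for unit, row in TEST_TABLE.items())

def compress(key):                    # '7549' -> '23': add digit pairs, drop the carry
    s = f"{key:04d}"
    return f"{(int(s[0]) + int(s[1])) % 10}{(int(s[2]) + int(s[3])) % 10}"

print(test_key(376_514))              # 71 + 29 + 54 + 53 = 207
print(compress(7549))                 # 23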
This made such test key systems into one-way functions in that although it
was possible to compute a test from a message, given knowledge of the key, it
was not possible to reverse the process and recover either a message or a key
from a single test – the test just did not contain enough information. Indeed,
one-way functions had been around since at least the seventeenth century. The
scientist Robert Hooke published in 1678 the sorted anagram ‘ceiiinosssttuu’
and revealed two years later that it was derived from ‘Ut tensio sic uis’ – ‘the
force varies as the tension’, or what we now call Hooke’s law for a spring. (The
goal was to establish priority for the idea while giving him time to do more
work on it.)
Banking test keys are not strong by the standards of modern cryptography.
Given between a few dozen and a few hundred tested messages, depending
on the design details, a patient analyst could reconstruct enough of the tables
to forge a transaction. With a few carefully chosen messages inserted into the
banking system by an accomplice, it’s even easier. But the banks got away
with it: test keys worked fine from the late nineteenth century through the
1980s. In several years working as a bank security consultant, and listening
to elderly auditors’ tales over lunch, I only ever heard of two cases of fraud
that exploited it: one external attempt involving cryptanalysis, which failed
because the attacker didn’t understand bank procedures, and one successful
but small fraud involving a crooked staff member. I’ll discuss the systems that
replaced test keys in the chapter on Banking and Bookkeeping.
However, test keys are our historical example of an algebraic function
used for authentication. They have important modern descendants in the
authentication codes used in the command and control of nuclear weapons,
and also with modern block ciphers. The idea in each case is the same: if you
can use a unique key to authenticate each message, simple algebra can give
you ideal security. Suppose you have a message M of arbitrary length and
want to compute an authentication code or tag A of 128 bits length, and the
property you want is that nobody should be able to find a different message
M′ whose authentication code under the same key will also be A, unless
they know the key, except by a lucky guess for which the probability is 2^−128.
You can simply choose a 128-bit prime number p and compute A = k1 M + k2
(mod p) where the key consists of two 128-bit numbers k1 and k2 .
This is secure for the same reason the one-time pad is: given any other message M′ you can find another key (k1′ , k2′ ) that authenticates M′ to A. So without
knowledge of the key, the adversary who sees M and A simply has no information of any use in creating a valid forgery. As there are 256 bits of key and
only 128 bits of tag, this holds even for an adversary with unlimited computing power: such an adversary can find the 2^128 possible keys for each pair of
message and tag but has no way to choose between them. I’ll discuss how this
universal hash function is used with block ciphers below, and how it’s used in
nuclear command and control in Part 2.
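The construction is short enough to write down directly. The sketch below uses the 127-bit Mersenne prime 2^127 − 1 rather than a 128-bit prime, purely to keep the example self-contained; the message encoding and variable names are likewise illustrative.
import secrets

P = 2**127 - 1                                  # a convenient large prime for the sketch

def tag(message_int, k1, k2):
    return (k1 * message_int + k2) % P          # A = k1*M + k2 (mod p)

k1, k2 = secrets.randbelow(P), secrets.randbelow(P)
M = int.from_bytes(b"Pay IBM $10,000,000", 'big') % P
A = tag(M, k1, k2)

# For any other message M2 there is some key pair mapping M2 to A, so seeing
# (M, A) gives an adversary no information with which to forge.
M2 = int.from_bytes(b"Pay Mafia Holdings $10,000,000", 'big') % P
print(tag(M2, k1, k2) == A)                     # almost certainly False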
5.2.5 Asymmetric primitives
Finally, some modern cryptosystems are asymmetric, in that different keys are
used for encryption and decryption. So, for example, most web sites nowadays
have a certificate containing a public key with which people can encrypt their
session using a protocol called TLS; the owner of the web page can decrypt the
traffic using the corresponding private key. We’ll go into the details later.
There are some pre-computer examples of this too; perhaps the best is the
postal service. You can send me a private message by addressing it to me and
dropping it into a post box. Once that’s done, I’m the only person who’ll be
able to read it. Of course, many things can go wrong: you might get the wrong
address for me (whether by error or as a result of deception); the police might
get a warrant to open my mail; the letter might be stolen by a dishonest postman; a fraudster might redirect my mail without my knowledge; or a thief
might steal the letter from my doormat. Similar things can go wrong with
public key cryptography: false public keys can be inserted into the system,
computers can be hacked, people can be coerced and so on. We’ll look at these
problems in more detail in later chapters.
Another asymmetric application of cryptography is the digital signature. The
idea here is that I can sign a message using a private signature key and then
anybody can check this using my public signature verification key. Again, there
are pre-computer analogues in the form of manuscript signatures and seals;
and again, there is a remarkably similar litany of things that can go wrong,
both with the old way of doing things and with the new.
5.3 Security models
Before delving into the detailed design of modern ciphers, I want to look more
carefully at the various types of cipher and the ways in which we can reason
about their security.
Security models seek to formalise the idea that a cipher is “good”. We’ve
already seen the model of perfect secrecy: given any ciphertext, all possible plaintexts of that length are equally likely. Similarly, an authentication scheme that
uses a key only once can be designed so that the best forgery attack on it is a
random guess, whose probability of success can be made as low as we want by
choosing a long enough tag.
The second model is concrete security, where we want to know how much
actual work an adversary has to do. At the time of writing, it takes the most
powerful adversary in existence – the community of bitcoin miners, burning
about as much electricity as the state of Denmark – about ten minutes to solve
a 68-bit cryptographic puzzle and mine a new block. So an 80-bit key would
take them 2^12 times as long, or about a month; a 128-bit key, the default in
modern systems, is 2^48 times harder again. So even in 1000 years the probability
of finding the right key by chance is 2^−35 or one in many billion. In general, a
system is (t, 𝜖)-secure if an adversary working for time t succeeds in breaking
the cipher with probability at most 𝜖.
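The arithmetic behind these figures is worth checking for yourself; a few lines of Python, assuming 2^68 trials every ten minutes as above, reproduce the month and the 2^−35.
import math

work_per_10min, ten_minutes = 2**68, 600                  # trials per ten minutes; seconds

full_80bit_search = 2**80 / work_per_10min * ten_minutes
print(full_80bit_search / 86400, "days")                  # about 28 days - roughly a month

seconds_in_1000y = 1000 * 365.25 * 86400
trials = seconds_in_1000y / ten_minutes * work_per_10min
print(math.log2(trials / 2**128))                         # about -34.4, i.e. roughly 2^-35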
The third model, which many theoreticians now call the standard model,
is about indistinguishability. This enables us to reason about the specific
properties of a cipher we care about. For example, most cipher systems
don’t hide the length of a message, so we can’t define a cipher to be secure
by just requiring that an adversary not be able to distinguish ciphertexts
corresponding to two messages; we have to be more explicit and require that
the adversary not be able to distinguish between two messages M1 and M2
of the same length. This is formalised by having the cryptographer and the
cryptanalyst play a game in which the analyst wins by finding an efficient
discriminator of something she shouldn’t be able to discriminate with more
than negligible probability. If the cipher doesn’t have perfect security this can
be asymptotic, where we typically want the effort to grow faster than any polynomial function of a security parameter n – say the length of the key in bits.
A security proof typically consists of a reduction where we show that if there
exists a randomised (i.e., probabilistic) algorithm running in time polynomial
in n that learns information it shouldn’t with non-negligible probability, then
this would give an efficient discriminator for an underlying cryptographic
primitive that we already trust. Finally, a construction is said to have semantic
security if there’s no efficient distinguisher for the plaintext regardless of any
side information the analyst may have about it; even if she knows all but one
bit of it, and even if she can get a decryption of any other ciphertext, she can’t
learn anything more from the target ciphertext. This skips over quite a few
mathematical details, which you can find in a standard text such as Katz and
Lindell [1025].
The fourth model is the random oracle model, which is not as general as the
standard model but which often leads to more efficient constructions. We call a
cryptographic primitive pseudorandom if there’s no efficient way of distinguishing it from a random function of that type, and in particular it passes all the
statistical and other randomness tests we apply. Of course, the cryptographic
primitive will actually be an algorithm, implemented as an array of gates in
hardware or a program in software; but the outputs should “look random” in
that they’re indistinguishable from a suitable random oracle given the type and
the number of tests that our model of computation permits.
To visualise a random oracle, we might imagine an elf sitting in a black
box with a source of physical randomness and some means of storage (see
Figure 5.9) – represented in our picture by the dice and the scroll. The elf will
accept inputs of a certain type, then look in the scroll to see whether this query
has ever been answered before. If so, it will give the answer it finds there; if
not, it will generate an answer at random by throwing the dice, and keep a
record for future reference.
Figure 5.9: The random oracle
We’ll further assume finite bandwidth – the elf will only answer so many queries every second. What’s more, our oracle can
operate according to several different rules.
5.3.1 Random functions – hash functions
The first type of random oracle is the random function. A random function
accepts an input string of any length and outputs a string of fixed length, say n
bits long. The same input gives the same output, but the set of outputs appears
random. So the elf just has a simple list of inputs and outputs, which grows
steadily as it works.
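The elf and its scroll translate almost directly into code; the class below is a model of a random function, not a usable hash function, since it needs the scroll to reproduce its answers.
import secrets

class RandomFunction:
    def __init__(self, out_bytes=32):
        self.out_bytes = out_bytes
        self.scroll = {}                          # record of all past queries

    def __call__(self, x: bytes) -> bytes:
        if x not in self.scroll:                  # new query: throw the dice
            self.scroll[x] = secrets.token_bytes(self.out_bytes)
        return self.scroll[x]

h = RandomFunction()
print(h(b"hello") == h(b"hello"))                 # True: same input, same output
print(h(b"hello") == h(b"hellp"))                 # almost certainly False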
Random functions are our model for cryptographic hash functions. These were
first used in computer systems for one-way encryption of passwords in the
1960s and have many more uses today. For example, if the police seize your
laptop, the standard forensic tools will compute checksums on all the files, to
identify which files are already known (such as system files) and which are
novel (such as user data). These hash values will change if a file is corrupted
and so can assure the court that the police haven’t tampered with evidence.
And if we want evidence that we possessed a given electronic document
by a certain date, we might submit it to an online time-stamping service or
have it mined into the Bitcoin blockchain. However, if the document is still
secret – for example an invention for which we want to establish a priority
date – then we would not upload the whole document, but just the message
hash. This is the modern equivalent of Hooke’s anagram that we discussed in
section 5.2.4 above.
5.3.1.1 Properties
The first main property of a random function is one-wayness. Given knowledge of an input x we can easily compute the hash value h(x), but it is very
difficult given h(x) to find x if such an input is not already known. (The elf
will only pick outputs for given inputs, not the other way round.) As the output is random, the best an attacker can do to invert a random function is to
keep on feeding in more inputs until he gets lucky; with an n-bit output this
will take about 2^(n−1) guesses on average. A pseudorandom function will have
the same properties, or they could be used to distinguish it from a random
function, contrary to our definition. So a pseudorandom function will also be
a one-way function, provided there are too many possible outputs for the opponent to guess an input that has a desired target output by chance. This means
choosing n so that the opponent can’t do anything near 2^n computations. If we
claim, for example, that SHA256 is a pseudorandom function, then we’re saying that there’s no practical way to find an input that hashes to a given 256-bit
value, unless you knew it already and used it to compute that value.
A second property of pseudorandom functions is that the output will not give
any information at all about even part of the input. So we can get a one-way
encryption of the value x by concatenating it with a secret key k and computing
h(x, k). If the hash function isn’t random enough, though, using it for one-way
encryption in this manner is asking for trouble. (I’ll discuss an example later in
section 22.3.1: the hash function used by many phone companies in the 1990s
and early 2000s to authenticate mobile phone users wasn’t random enough,
which led to attacks.)
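In code, the one-way encryption described above amounts to hashing the value together with the key; the sketch below uses SHA-256 for concreteness, though a purpose-built construction such as HMAC is the better engineering choice.
import hashlib

def oneway(x: bytes, k: bytes) -> bytes:
    return hashlib.sha256(x + k).digest()         # h(x, k): easy to compute, hard to invert

print(oneway(b"PIN=1234", b"some secret key").hex())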
A third property of pseudorandom functions with sufficiently long outputs is
that it is hard to find collisions, that is, different messages M1 ≠ M2 with h(M1 ) =
h(M2 ). Unless the opponent can find a shortcut attack (which would mean the
function wasn’t pseudorandom) then the best way of finding a collision is to
collect a large set of messages Mi and their corresponding hashes h(Mi ), sort the
hashes, and look for a match. If the hash function output is an n-bit number, so
that there are 2^n possible hash values, then the number of hashes the enemy will need to compute before he can expect to find a match will be about the square root of this, namely 2^(n/2) hashes. This fact is of huge importance in security
engineering, so let’s look at it more closely.
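You can see the square-root effect on your own machine by truncating a real hash function: with a 32-bit output, a collision turns up after a few tens of thousands of trials (on the order of 2^16), not 2^32.
import hashlib

def h32(i: int) -> bytes:
    return hashlib.sha256(str(i).encode()).digest()[:4]   # truncate to 32 bits

seen = {}
for i in range(1 << 20):
    d = h32(i)
    if d in seen:
        print(f"collision after {i} trials: {seen[d]} and {i}")
        break
    seen[d] = i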
5.3.1.2 The birthday theorem
The birthday theorem gets its name from the following problem. A maths
teacher asks a class of 30 pupils what they think is the probability that two
of them have the same birthday. Most pupils intuitively think it’s unlikely,
and the maths teacher then asks the pupils to state their birthdays one after
another. The odds of a match exceed 50% once 23 pupils have been called.
As this surprises most people, it’s also known as the ‘birthday paradox’.
The birthday theorem was first used in the 1930s to count fish, so it’s also
known as capture-recapture statistics [1668]. Suppose there are N fish in a lake
and you catch m of them, ring them and throw them back, then when you first
catch a fish you’ve ringed already, m should be ‘about’ the square root of N.
The intuitive reason why this holds is that once you have √N samples, each could potentially match any of the others, so the number of possible matches is about √N × √N, or N, which is what you need³.
³ More precisely, the probability that m fish chosen randomly from N fish are different is β = N(N − 1) … (N − m + 1)/N^m, which is asymptotically solved by N ≃ m^2/(2 log(1/β)) [1039].
This theorem has many applications for the security engineer. For example,
if we have a biometric system that can authenticate a person’s claim to identity
with a probability of only one in a million that two randomly selected subjects
will be falsely identified as the same person, this doesn’t mean that we can use
it as a reliable means of identification in a university with a user population of twenty thousand staff and students. This is because there will be almost two
hundred million possible pairs. In fact, you expect to find the first collision – the
first pair of people who can be mistaken for each other by the system – once
you have somewhat over a thousand people enrolled. It may well, however,
be OK to use it to verify a claimed identity (though many other things can go
wrong; see the chapter on Biometrics in Part 2 for a discussion).
There are some applications where collision-search attacks aren’t a problem,
such as in challenge-response protocols where an attacker has to find the
answer to the challenge just issued, and where you can prevent challenges
repeating. In identify-friend-or-foe (IFF) systems, for example, common
equipment has a response length of 48 to 80 bits. You can’t afford much more
than that, as it costs radar accuracy.
But there are other applications in which collisions are unacceptable. When
we design digital signature systems, we typically pass the message M through
a cryptographic hash function first, and then sign the hash h(M), for a number of reasons we’ll discuss later. In such an application, if it were possible
to find collisions with h(M1 ) = h(M2 ) but M1 ≠ M2 , then a Mafia owned bookstore’s web site might precalculate suitable pairs M1 , M2 , get you to sign an M1
saying something like “I hereby order a copy of Rubber Fetish volume 7 for
$32.95” and then present the signature together with an M2 saying something
like “I hereby mortgage my house for $75,000 and please send the funds to
Mafia Holdings Inc., Bermuda.”
For this reason, hash functions used with digital signature schemes have
n large enough to make them collision-free. Historically, the two most common hash functions have been MD5, which has a 128-bit output and will thus
require at most 2^64 computations to break, and SHA1 with a 160-bit output and a work factor for the cryptanalyst of at most 2^80. However, collision search
gives at best an upper bound on the strength of a hash function, and both these
particular functions have turned out to be disappointing, with cryptanalytic
attacks that I’ll describe later in section 5.6.2.
To sum up: if you need a cryptographic hash function to be collision resistant,
then you’d better choose a function with an output of at least 256 bits, such as
SHA-2 or SHA-3. However if you only need to be sure that nobody will find a
second preimage for an existing, externally given hash, then you can perhaps
make do with less.
5.3.2 Random generators – stream ciphers
The second basic cryptographic primitive is the random generator, also known
as a keystream generator or stream cipher. This is also a random function, but it’s
the reverse of the hash function in that it has a short input and a long output.
If we had a good pseudorandom function whose input and output were long
enough, we could turn it into a hash function by throwing away all but a few
hundred bits of the output, and turn it into a stream cipher by padding all
but a few hundred bits of the input with a constant and using the output as a
keystream.
It can be used to protect the confidentiality of our backup data as follows:
we go to the keystream generator, enter a key, get a long file of random bits,
and exclusive-or it with our plaintext data to get ciphertext, which we then
send to our backup service in the cloud. (This is also called an additive stream
cipher as exclusive-or is addition modulo 2.) We can think of the elf generating
a random tape of the required length each time he is presented with a new key,
giving it to us and keeping a copy on his scroll for reference in case he’s given
the same input key again. If we need to recover the data, we go back to the
generator, enter the same key, get the same keystream, and exclusive-or it with
our ciphertext to get our plaintext back again. Other people with access to the
keystream generator won’t be able to generate the same keystream unless they
know the key. Note that this would not give us any guarantee of file integrity; as
we saw in the discussion of the one-time pad, adding a keystream to plaintext
can protect confidentiality, but it can’t detect modification of the file. For that,
we might make a hash of the file and keep that somewhere safe. It may be easier
to protect the hash from modification than the whole file.
One-time pad systems are a close fit for our theoretical model, except in that
they are used to secure communications across space rather than time: the
two communicating parties have shared a copy of a keystream in advance.
Vernam’s original telegraph cipher machine used punched paper tape; Marks
describes how SOE agents’ silken keys were manufactured in Oxford by retired
ladies shuffling counters; we’ll discuss modern hardware random number generators in the chapter on Physical Security.
A real problem with keystream generators is to prevent the same keystream
being used more than once, whether to encrypt more than one backup tape or
to encrypt more than one message sent on a communications channel. During
World War II, the amount of Russian diplomatic traffic exceeded the quantity
of one-time tape they had distributed in advance to their embassies, so it was
reused. But if M1 + K = C1 and M2 + K = C2 , then the opponent can combine
the two ciphertexts to get a combination of two messages: C1 − C2 = M1 − M2 ,
and if the messages Mi have enough redundancy then they can be recovered.
Text messages do in fact contain enough redundancy for much to be recovered;
in the case of the Russian traffic this led to the Venona project in which the US
and UK decrypted large amounts of wartime Russian traffic from 1943 onwards
and broke up a number of Russian spy rings. In the words of one former NSA
chief scientist, it became a “two-time tape”.
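The failure is visible in a couple of lines: with an exclusive-or stream cipher, combining two ciphertexts that share a keystream cancels the key entirely (the message texts here are of course made up).
import os

m1 = b"attack the bridge at dawn tomorrow"
m2 = b"retreat to the harbour immediately"
k  = os.urandom(len(m1))                           # one keystream, used twice
c1 = bytes(a ^ b for a, b in zip(m1, k))
c2 = bytes(a ^ b for a, b in zip(m2, k))
print(bytes(a ^ b for a, b in zip(c1, c2)) ==
      bytes(a ^ b for a, b in zip(m1, m2)))        # True: the key has vanished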
To avoid this, the normal engineering practice is to have not just a key but also
a seed (also known as an initialisation vector or IV) so we start the keystream at a
different place each time. The seed N may be a sequence number, or generated
from a protocol in a more complex way. Here, you need to ensure that both parties synchronise on the right working key even in the presence of an adversary
who may try to get you to reuse old keystream.
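For illustration only, here is a toy additive stream cipher whose keystream is made by hashing the key, a nonce and a counter; a real design should use an established cipher such as AES-GCM or ChaCha20-Poly1305 rather than anything hand-rolled like this.
import hashlib, os

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, 'big')).digest()
        counter += 1
    return out[:length]

def xor_encrypt(key, nonce, data):                 # encryption and decryption are the same
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

key, nonce = os.urandom(32), os.urandom(16)        # a fresh nonce per message avoids keystream reuse
ct = xor_encrypt(key, nonce, b"backup tape contents")
print(xor_encrypt(key, nonce, ct))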
5.3.3 Random permutations – block ciphers
The third type of primitive, and the most important in modern cryptography, is
the block cipher, which we model as a random permutation. Here, the function
is invertible, and the input plaintext and the output ciphertext are of a fixed
size. With Playfair, both input and output are two characters; with DES, they’re
both bit strings of 64 bits. Whatever the number of symbols and the underlying
alphabet, encryption acts on a block of fixed length. (So if you want to encrypt
a shorter input, you have to pad it as with the final ‘z’ in our Playfair example.)
We can visualise block encryption as follows. As before, we have an elf in a
box with dice and a scroll. This has on the left a column of plaintexts and on
the right a column of ciphertexts. When we ask the elf to encrypt a message, it
checks in the left-hand column to see if it has a record of it. If not, it rolls the
dice to generate a random ciphertext of the appropriate size (and which doesn’t
appear yet in the right-hand column of the scroll), and then writes down the
plaintext/ciphertext pair in the scroll. If it does find a record, it gives us the
corresponding ciphertext from the right-hand column.
When asked to decrypt, the elf does the same, but with the function of the
columns reversed: he takes the input ciphertext, looks for it on the right-hand
scroll, and if he finds it he gives the message with which it was previously
associated. If not, he generates a new message at random, notes it down and
gives it to us.
A block cipher is a keyed family of pseudorandom permutations. For each
key, we have a single permutation that’s independent of all the others. We can
think of each key as corresponding to a different scroll. The intuitive idea is
that a cipher machine should output the ciphertext given the plaintext and the
key, and output the plaintext given the ciphertext and the key, but given only
the plaintext and the ciphertext it should output nothing. Furthermore, nobody
should be able to infer any information about plaintexts or ciphertexts that it
has not yet produced.
We will write a block cipher using the notation established for encryption in
the chapter on protocols:
C = {M}K
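The elf’s two-column scroll is again easy to model; the sketch below builds a lazy random permutation on 64-bit blocks, with one scroll per key. It is a model of the ideal object rather than anything you could deploy.
import secrets

class RandomPermutation:
    def __init__(self, block_bytes=8):
        self.block_bytes = block_bytes
        self.enc, self.dec = {}, {}                # the left- and right-hand columns

    def encrypt(self, m: bytes) -> bytes:
        if m not in self.enc:
            c = secrets.token_bytes(self.block_bytes)
            while c in self.dec:                   # a permutation never repeats a ciphertext
                c = secrets.token_bytes(self.block_bytes)
            self.enc[m], self.dec[c] = c, m
        return self.enc[m]

    def decrypt(self, c: bytes) -> bytes:
        if c not in self.dec:
            m = secrets.token_bytes(self.block_bytes)
            while m in self.enc:
                m = secrets.token_bytes(self.block_bytes)
            self.enc[m], self.dec[c] = c, m
        return self.dec[c]

scrolls = {}                                       # one scroll (permutation) per key
def E(key, m): return scrolls.setdefault(key, RandomPermutation()).encrypt(m)
print(E(b'k1', b'8 bytes!') == E(b'k1', b'8 bytes!'))   # True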
The random permutation model also allows us to define different types of
attack on block ciphers. In a known plaintext attack, the opponent is just given a
number of randomly chosen inputs and outputs from the oracle corresponding
to a target key. In a chosen plaintext attack, the opponent is allowed to put a
certain number of plaintext queries and get the corresponding ciphertexts. In
a chosen ciphertext attack he gets to make a number of ciphertext queries. In a
chosen plaintext/ciphertext attack he is allowed to make queries of either type.
Finally, in a related key attack he can make queries that will be answered using
keys related to the target key K, such as K + 1 and K + 2.
In each case, the objective of the attacker may be either to deduce the answer
to a query he hasn’t already made (a forgery attack), or to recover the key (unsurprisingly known as a key recovery attack).
This precision about attacks is important. When someone discovers a vulnerability in a cryptographic primitive, it may or may not be relevant to your
application. Often it won’t be, but will have been hyped by the media – so you
will need to be able to explain clearly to your boss and your customers why
it’s not a problem. So you have to look carefully to find out exactly what kind
of attack has been found, and what the parameters are. For example, the first
major attack announced on the Data Encryption Standard algorithm (differential cryptanalysis) required 2^47 chosen plaintexts to recover the key, while the next major attack (linear cryptanalysis) improved this to 2^43 known plaintexts. While these attacks were of huge scientific importance, their practical
engineering effect was zero, as no practical systems make that much known
text (let alone chosen text) available to an attacker. Such impractical attacks
are often referred to as certificational as they affect the cipher’s security certification rather than providing a practical exploit. They can have a commercial
effect, though: the attacks on DES undermined confidence and started moving people to other ciphers. In some other cases, an attack that started off as
certificational has been developed by later ideas into an exploit.
Which sort of attacks you should be worried about depends on your
application. With a broadcast entertainment system, for example, a hacker can
buy a decoder, watch a lot of movies and compare them with the enciphered
broadcast signal; so a known-plaintext attack might be the main threat. But
there are surprisingly many applications where chosen-plaintext attacks are
possible. A historic example is from World War II, where US analysts learned
of Japanese intentions for an island ‘AF’ which they suspected meant Midway.
So they arranged for Midway’s commander to send an unencrypted message
reporting problems with its fresh water condenser, and then intercepted a
Japanese report that ‘AF is short of water’. Knowing that Midway was the
Japanese objective, Admiral Chester Nimitz was waiting for them and sank
four Japanese carriers, turning the tide of the war [1003].
The other attacks are more specialised. Chosen plaintext/ciphertext attacks may
be a worry where the threat is a lunchtime attack: someone who gets temporary
access to a cryptographic device while its authorised user is out, and tries out
the full range of permitted operations for a while with data of their choice.
Related-key attacks are a concern where the block cipher is used as a building block in the construction of a hash function (which we’ll discuss below).
To exclude all such attacks, the goal is semantic security, as discussed above; the
cipher should not allow the inference of unauthorised information (whether of
plaintexts, ciphertexts or keys) other than with negligible probability.
5.3.4 Public key encryption and trapdoor one-way
permutations
A public-key encryption algorithm is a special kind of block cipher in which the
elf will perform the encryption corresponding to a particular key for anyone
who requests it, but will do the decryption operation only for the key’s owner.
To continue with our analogy, the user might give a secret name to the scroll
that only she and the elf know, use the elf’s public one-way function to compute
a hash of this secret name, publish the hash, and instruct the elf to perform
the encryption operation for anybody who quotes this hash. This means that a
principal, say Alice, can publish a key and if Bob wants to, he can now encrypt
a message and send it to her, even if they have never met. All that is necessary
is that they have access to the oracle.
The simplest variation is the trapdoor one-way permutation. This is a computation that anyone can perform, but which can be reversed only by someone who
knows a trapdoor such as a secret key. This model is like the ‘one-way function’
model of a cryptographic hash function. Let us state it formally nonetheless: a
public key encryption primitive consists of a function which given a random
input R will return two keys, KR (the public encryption key) and KR−1 (the
private decryption key) with the properties that
1. Given KR, it is infeasible to compute KR−1 (so it’s not possible to compute R either);
2. There is an encryption function { … } which, applied to a message M
using the encryption key KR, will produce a ciphertext C = {M}KR ; and
3. There is a decryption function which, applied to a ciphertext C using the
decryption key KR−1 , will produce the original message M = {C}KR−1 .
For practical purposes, we will want the oracle to be replicated at both ends
of the communications channel, and this means either using tamper-resistant
hardware or (more commonly) implementing its functions using mathematics
rather than metal.
In most real systems, the encryption is randomised, so that every time
someone uses the same public key to encrypt the same message, the answer
is different; this is necessary for semantic security, so that an opponent cannot
check whether a guess of the plaintext of a given ciphertext is correct. There
are even more demanding models than this, for example to analyse security
in the case where the opponent can get ciphertexts of their choice decrypted,
with the exception of the target ciphertext. But this will do for now.
5.3.5 Digital signatures
The final cryptographic primitive we’ll define here is the digital signature. The
basic idea is that a signature on a message can be created by only one principal,
but checked by anyone. It can thus perform the same function in the electronic
world that ordinary signatures do in the world of paper. Applications include
signing software updates, so that a PC can tell that an update to Windows was
really produced by Microsoft rather than by a foreign intelligence agency.
Signature schemes, too, can be deterministic or randomised: in the first,
computing a signature on a message will always give the same result and in
the second, it will give a different result. (The latter is more like handwritten
signatures; no two are ever alike but the bank has a means of deciding whether
a given specimen is genuine or forged.) Also, signature schemes may or may
not support message recovery. If they do, then given the signature, anyone can
recover the message on which it was generated; if they don’t, then the verifier
needs to know or guess the message before they can perform the verification.
Formally, a signature scheme, like a public key encryption scheme, has a keypair generation function which given a random input R will return two keys,
𝜎R (the private signing key) and VR (the public signature verification key) with
the properties that
1. Given the public signature verification key VR, it is infeasible to compute
the private signing key 𝜎R;
2. There is a digital signature function which given a message M and a
private signature key 𝜎R, will produce a signature Sig𝜎R {M}; and
3. There is a verification function which, given a signature Sig𝜎R {M} and
the public signature verification key VR, will output TRUE if the signature was computed correctly with 𝜎R and otherwise output FALSE.
Where we don’t need message recovery, we can model a simple digital signature algorithm as a random function that reduces any input message to a
one-way hash value of fixed length, followed by a special kind of block cipher
in which the elf will perform the operation in one direction, known as signature, for only one principal. In the other direction, it will perform verification
for anybody.
For this simple scheme, signature verification means that the elf (or the
signature verification algorithm) only outputs TRUE or FALSE depending
on whether the signature is good. But in a scheme with message recovery,
anyone can input a signature and get back the message corresponding to it.
In our elf model, this means that if the elf has seen the signature before, it
will give the message corresponding to it on the scroll, otherwise it will give
a random value (and record the input and the random output as a signature
and message pair). This is sometimes desirable: when sending short messages
over a low bandwidth channel, it can save space if only the signature has to
be sent rather than the signature plus the message. An application that uses
message recovery is machine-printed postage stamps, or indicia: the stamp
consists of a 2-d barcode with a digital signature made by the postal meter
and which contains information such as the value, the date and the sender’s
and recipient’s post codes. We discuss this at the end of section 16.3.2.
In the general case we do not need message recovery; the message to be
signed may be of arbitrary length, so we first pass it through a hash function
and then sign the hash value. We need the hash function to be not just one-way,
but also collision resistant.
5.4 Symmetric crypto algorithms
Now that we’ve tidied up the definitions, we’ll look under the hood to see
how they can be implemented in practice. While most explanations are geared
towards graduate mathematics students, the presentation I’ll give here is based
on one I developed over the years with computer science undergraduates, to
help the non-specialist grasp the essentials. In fact, even at the research level,
most of cryptography is as much computer science as mathematics: modern
attacks on ciphers are put together from guessing bits, searching for patterns,
sorting possible results and so on, and require ingenuity and persistence rather
than anything particularly highbrow.
5.4.1 SP-networks
Claude Shannon suggested in the 1940s that strong ciphers could be built by
combining substitution with transposition repeatedly. For example, one might
add some key material to a block of input text, and then shuffle subsets of the
input, and continue in this way a number of times. He described the properties of a cipher as being confusion and diffusion – adding unknown key values
will confuse an attacker about the value of a plaintext symbol, while diffusion means spreading the plaintext information through the ciphertext. Block
ciphers need diffusion as well as confusion.
The earliest block ciphers were simple networks which combined substitution and permutation circuits, and so were called SP-networks [1011].
Figure 5.10 shows an SP-network with sixteen inputs, which we can imagine
as the bits of a sixteen-bit number, and two layers of four-bit invertible
substitution boxes (or S-boxes), each of which can be visualised as a lookup
table containing some permutation of the numbers 0 to 15.
The point of this arrangement is that if we were to implement an arbitrary 16
bit to 16 bit function in digital logic, we would need 2^20 bits of memory – one lookup table of 2^16 bits for each single output bit. That's hundreds of thousands
Figure 5.10: A simple 16-bit SP-network block cipher
of gates, while a four bit to four bit function takes only 4 x 2^4 or 64 bits of
memory. One might hope that with suitable choices of parameters, the function
produced by iterating this simple structure would be indistinguishable from a
random 16 bit to 16 bit function to an opponent who didn’t know the value of
the key. The key might consist of some choice of a number of four-bit S-boxes,
or it might be added at each round to provide confusion and the resulting text
fed through the S-boxes to provide diffusion.
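The structure of Figure 5.10 is easy to write down. Here is a toy Python sketch of a 16-bit SP-network; the S-box, wire permutation and round keys are made up, and it is of course far too small and too shallow to be secure. It is only meant to show how key addition, substitution and permutation are iterated.

SBOX = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
        0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]                 # an arbitrary 4-bit permutation
PERM = [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]   # wire i goes to wire PERM[i]

def substitute(x):            # run each 4-bit nibble through the S-box (confusion)
    return sum(SBOX[(x >> (4 * i)) & 0xF] << (4 * i) for i in range(4))

def permute(x):               # shuffle the sixteen wires between rounds (diffusion)
    y = 0
    for i in range(16):
        if (x >> i) & 1:
            y |= 1 << PERM[i]
    return y

def toy_sp_encrypt(block, round_keys):
    for k in round_keys:      # add key material, substitute, permute, and repeat
        block = permute(substitute(block ^ k))
    return block

ciphertext = toy_sp_encrypt(0xBEEF, [0x1234, 0x5678])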
Three things need to be done to make such a design secure:
1. the cipher needs to be “wide” enough
2. it needs to have enough rounds, and
3. the S-boxes need to be suitably chosen.
5.4.1.1 Block size
First, a block cipher which operated on sixteen bit blocks would be rather limited, as an opponent could just build a dictionary of plaintext and ciphertext
blocks as they were observed. The birthday theorem tells us that even if the
input plaintexts were random, he’d expect to find a match as soon as he had
seen a few hundred blocks. So a practical block cipher will usually deal with
plaintexts and ciphertexts of 64 bits, 128 bits or even more. So if we are using
four-bit to four-bit S-boxes, we may have 16 of them (for a 64 bit block size) or
32 of them (for a 128 bit block size).
5.4.1.2 Number of rounds
Second, we have to have enough rounds. The two rounds in Figure 5.10 are
completely inadequate, as an opponent can deduce the values of the S-boxes
by tweaking input bits in suitable patterns. For example, he could hold the
rightmost 12 bits constant and try tweaking the leftmost four bits, to deduce
the values in the top left S-box. (The attack is slightly more complicated than
this, as sometimes a tweak in an input bit to an S-box won’t produce a change
in any output bit, so we have to change one of its other inputs and tweak again.
But it is still a basic student exercise.)
The number of rounds we need depends on the speed with which data diffuse through the cipher. In our simple example, diffusion is very slow because
each output bit from one round of S-boxes is connected to only one input bit in
the next round. Instead of having a simple permutation of the wires, it is more
efficient to have a linear transformation in which each input bit in one round
is the exclusive-or of several output bits in the previous round. If the block
cipher is to be used for decryption as well as encryption, this linear transformation will have to be invertible. We’ll see some concrete examples below in
the sections on AES and DES.
5.4.1.3 Choice of S-boxes
The design of the S-boxes also affects the number of rounds required for security, and studying bad choices gives us our entry into the deeper theory of block
ciphers. Suppose that the S-box were the permutation that maps the inputs
(0,1,2, … ,15) to the outputs (5,7,0,2,4,3,1,6,8,10,15,12,9,11,14,13). Then the most
significant bit of the input would come through unchanged as the most significant bit of the output. If the same S-box were used in both rounds in the
above cipher, then the most significant bit of the input would pass through to
become the most significant bit of the output. We certainly couldn’t claim that
our cipher was pseudorandom.
5.4.1.4 Linear cryptanalysis
Attacks on real block ciphers are usually harder to spot than in this example,
but they use the same ideas. It might turn out that the S-box had the property
that bit one of the input was equal to bit two plus bit four of the output; more
commonly, there will be linear approximations to an S-box which hold with
a certain probability. Linear cryptanalysis [897, 1246] proceeds by collecting a
number of relations such as “bit 2 plus bit 5 of the input to the first S-box is equal
to bit 1 plus bit 8 of the output, with probability 13/16”, then searching for
ways to glue them together into an algebraic relation between input bits, output bits and key bits that holds with a probability different from one half. If we
can find a linear relationship that holds over the whole cipher with probability
p = 0.5 + 1∕M, then according to the sampling theorem in probability theory
we can expect to start recovering keybits once we have about M^2 known texts.
If the value of M^2 for the best linear relationship is greater than the total possible number of known texts (namely 2^n where the inputs and outputs are n bits
wide), then we consider the cipher to be secure against linear cryptanalysis.
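These linear approximations can be tabulated mechanically. The short Python sketch below builds the linear approximation table of the deliberately weak 4-bit S-box from section 5.4.1.3; an entry far from zero means a good linear approximation, and the entry for input mask 8 and output mask 8 is the maximal value of 8, reflecting the fact that the top bit passes straight through.

SBOX = [5, 7, 0, 2, 4, 3, 1, 6, 8, 10, 15, 12, 9, 11, 14, 13]   # the weak S-box discussed above

def parity(x):
    return bin(x).count("1") & 1

def lat(sbox):
    # LAT[a][b] = (number of x with <a,x> = <b,S(x)>) - 8; an entry of +/-8 is an exact relation
    return [[sum(parity(a & x) == parity(b & sbox[x]) for x in range(16)) - 8
             for b in range(16)] for a in range(16)]

table = lat(SBOX)
assert table[8][8] == 8      # the most significant bit always survives: a perfect linear relation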
5.4.1.5 Differential cryptanalysis
Differential Cryptanalysis [246, 897] is similar but is based on the probability that
a given change in the input to an S-box will give rise to a certain change in the
output. A typical observation on an 8-bit S-box might be that “if we flip input
bits 2, 3, and 7 at once, then with probability 11∕16 the only output bits that will
flip are 0 and 1”. In fact, with any nonlinear Boolean function, tweaking some
combination of input bits will cause some combination of output bits to change
with a probability different from one half. The analysis procedure is to look at
all possible input difference patterns and look for those values 𝛿i , 𝛿o such that
an input change of 𝛿i will produce an output change of 𝛿o with particularly
high (or low) probability.
As in linear cryptanalysis, we then search for ways to join things up so that
an input difference which we can feed into the cipher will produce a known
output difference with a useful probability over a number of rounds. Given
enough chosen inputs, we will see the expected output and be able to make
deductions about the key. As in linear cryptanalysis, it’s common to consider
the cipher to be secure if the number of texts required for an attack is greater
than the total possible number of different texts for that key. (We have to be
careful of pathological cases, such as if you had a cipher with a 32-bit block
and a 128-bit key with a differential attack whose success probability given a
single pair was 2^-40. Given a lot of text under a number of keys, we'd eventually
solve for the current key.)
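The corresponding tool here is the difference distribution table, which counts, for each input difference, how often each output difference appears. A Python sketch for the same weak 4-bit S-box as above:

SBOX = [5, 7, 0, 2, 4, 3, 1, 6, 8, 10, 15, 12, 9, 11, 14, 13]

def ddt(sbox):
    # DDT[d_in][d_out] = number of inputs x with S(x) XOR S(x XOR d_in) = d_out
    table = [[0] * 16 for _ in range(16)]
    for x in range(16):
        for d_in in range(16):
            table[d_in][sbox[x] ^ sbox[x ^ d_in]] += 1
    return table

table = ddt(SBOX)
# A good S-box keeps every non-trivial entry small; any large entry is a
# high-probability differential that an attacker can try to chain across rounds.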
There are many variations on these two themes. For example, instead of
looking for high probability differences, we can look for differences that can’t
happen (or that happen only rarely). This has the charming name of impossible
cryptanalysis, but it is quite definitely possible against many systems [243]4.
Block cipher design involves a number of trade-offs. For example, we can
reduce the per-round information leakage, and thus the required number of
rounds, by designing the rounds carefully. But a complex design might be slow
in software, or need a lot of gates in hardware, so using simple rounds but more
of them might have been better. Simple rounds may also be easier to analyse.
A prudent designer will also use more rounds than are strictly necessary to
block the attacks known today, in order to give some safety margin, as attacks
only ever get better. But while we may be able to show that a cipher resists
all the attacks we know of, and with some safety margin, this says little about
whether it will resist novel types of attack. (A general security proof for a block cipher would appear to imply a result such as P ≠ NP that would revolutionise computer science.)
4 This may have been used first at Bletchley in World War II, where a key insight into breaking the German Enigma machine was that no letter ever enciphered to itself.
5.4.2 The Advanced Encryption Standard (AES)
The Advanced Encryption Standard (AES) is an algorithm originally known
as Rijndael after its inventors Vincent Rijmen and Joan Daemen [507]. It acts
on 128-bit blocks and can use a key of 128, 192 or 256 bits in length. It is an
SP-network; in order to specify it, we need to fix the S-boxes, the linear transformation between the rounds, and the way in which the key is added into the
computation.
AES uses a single S-box that acts on a byte input to give a byte output. For
implementation purposes it can be regarded simply as a lookup table of 256
bytes; it is actually defined by the equation S(x) = M(1∕x) + b over the field
GF(2^8) where M is a suitably chosen matrix and b is a constant. This construction gives tight differential and linear bounds.
The linear transformation is based on arranging the 16 bytes of the value
being enciphered in a square and then doing bytewise shuffling and mixing
operations. The first step is the shuffle, in which the top row of four bytes is
left unchanged while the second row is shifted one place to the left, the third
row by two places and the fourth row by three places. The second step is a
column-mixing step in which the four bytes in a column are mixed using matrix
multiplication. This is illustrated in Figure 5.11, which shows, as an example,
how a change in the value of the third byte in the first column is propagated.
The effect of this combination is that a change in the input to the cipher can
potentially affect all of the output after just two rounds – an avalanche effect
that makes both linear and differential attacks harder.
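For concreteness, here is a Python sketch of the two linear-layer steps just described, with the state held as four rows of four byte values (AES specifications usually store the state column-wise, but that detail does not matter for the illustration):

def xtime(a):                  # multiply by x (i.e. by 2) in GF(2^8), modulo x^8 + x^4 + x^3 + x + 1
    return ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else a << 1

def shift_rows(state):         # row r is rotated left by r places
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def mix_column(col):           # multiply one column by the fixed matrix with rows (2 3 1 1), (1 2 3 1), ...
    a0, a1, a2, a3 = col
    return [xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3,
            a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3,
            a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3,
            xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3)]

def mix_columns(state):
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]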
The key material is added byte by byte after the linear transformation. This
means that 16 bytes of key material are needed per round; they are derived
from the user supplied key material by means of a recurrence relation.
Figure 5.11: The AES linear transformation, illustrated by its effect on byte 3 of the input
The algorithm uses 10 rounds with 128-bit keys, 12 rounds with 192-bit
keys and 14 rounds with 256-bit keys. These are enough to give practical, but
not certificational, security – as indeed we expected at the time of the AES
competition, and as I described in earlier editions of this chapter. The first
key-recovery attacks use a technique called biclique cryptanalysis and were
discovered in 2011 by Andrey Bogdanov, Dmitry Khovratovich, and Christian Rechberger [274]; they give only a very small advantage, with complexity now estimated at 2^126 for 128-bit AES and 2^254.3 for 256-bit AES, as opposed to 2^127 and 2^255 for brute-force search. Faster shortcut attacks are known for the case
where we have related keys. But none of these attacks make any difference
in practice, as they require infeasibly large numbers of texts or very special
combinations of related keys.
Should we trust AES? The governments of Russia, China and Japan try to
get firms to use local ciphers instead, and the Japanese offering, Camellia, is
found in a number of crypto libraries alongside AES and another AES competition finalist, Bruce Schneier’s Twofish. (Camellia was designed by a team
whose own AES candidate was knocked out at the first round.) Conspiracy theorists note that the US government picked the weakest of the five algorithms
that were finalists in the AES competition. Well, I was one of the designers of
the AES finalist Serpent [95], which came second in the competition: the winner Rijndael got 86 votes, Serpent 59 votes, Twofish 31 votes, RC6 23 votes
and MARS 13 votes. Serpent has a simple structure that makes it easy to analyse – the structure of Figure 5.10, but modified to be wide enough and to have
enough rounds – and was designed to have a much larger security margin than
Rijndael in anticipation of the attacks that have now appeared. Yet the simple
fact is that while Serpent is more secure, Rijndael is faster; industry and crypto
researchers voted for it at the last AES conference, and NIST approved it as the
standard.
Having been involved in the whole process, and having worked on the analysis and design of shared-key ciphers for much of the 1990s, I have a high level
of confidence that AES is secure against practical attacks based on mathematical cryptanalysis. And even though AES is less secure than Serpent, practical
security is all about implementation, and we now have enormous experience at
implementing AES. Practical attacks include timing analysis and power analysis. In the former, the main risk is that an opponent observes cache misses and
uses them to work out the key. In the latter, an opponent uses measurements of
the current drawn by the device doing the crypto – think of a bank smartcard
that a customer places in a terminal in a Mafia-owned shop. I discuss both in
detail in Part 2, in the chapter on Emission Security; countermeasures include
special operations in many CPUs to do AES, which are available precisely
because the algorithm is now a standard. It does not make sense to implement
Serpent as well, ‘just in case AES is broken’: having swappable algorithms is
known as pluggable cryptography, yet the risk of a fatal error in the algorithm
negotiation protocol is orders of magnitude greater than the risk that anyone
will come up with a production attack on AES. (We’ll see a number of examples
later where using multiple algorithms caused something to break horribly.)
The back story is that, back in the 1970s, the NSA manipulated the choice
and parameters of the previous standard block cipher, the Data Encryption
Standard (DES) in such a way as to deliver a cipher that was good enough
for US industry at the time, while causing foreign governments to believe it
was insecure, so they used their own weak designs instead. I’ll discuss this
in more detail below, once I’ve described the design of DES. AES seems to
have followed this playbook; by selecting an algorithm that was only just
strong enough mathematically and whose safe implementation requires skill
and care, the US government saw to it that firms in Russia, China, Japan and
elsewhere will end up using systems that are less secure because less skill and
effort has been invested in the implementation. However, this was probably
luck rather than Machiavellian cunning: the relevant committee at NIST
would have had to have a lot of courage to disregard the vote and choose
another algorithm instead. Oh, and the NSA has since 2005 approved AES
with 128-bit keys for protecting information up to SECRET and with 192-bit
or 256-bit keys for TOP SECRET. So I recommend that you use AES instead
of GOST, or Camellia, or even Serpent. The definitive specification of AES is
Federal Information Processing Standard 197, and its inventors have written
a book describing its design in detail [507].
5.4.3 Feistel ciphers
Many block ciphers use a more complex structure, which was invented by
Feistel and his team while they were developing the Mark XII IFF in the late
1950s and early 1960s. Feistel then moved to IBM and founded a research group
that produced the Data Encryption Standard (DES) algorithm, which is still a
mainstay of payment system security.
A Feistel cipher has the ladder structure shown in Figure 5.12. The input is
split up into two blocks, the left half and the right half. A round function f1 of the
left half is computed and combined with the right half using exclusive-or
(binary addition without carry), though in some Feistel ciphers addition
with carry is also used. (We use the notation ⊕ for exclusive-or.) Then, a
function f2 of the right half is computed and combined with the left half,
and so on. Finally (if the number of rounds is even) the left half and right half
are swapped.
A notation which you may see for the Feistel cipher is 𝜓(f , g, h, ...) where f ,
g, h, … are the successive round functions. Under this notation, the above
Figure 5.12: The Feistel cipher structure
cipher is 𝜓(f1 , f2 , ... f2k−1 , f2k ). The basic result that enables us to decrypt a Feistel
cipher – and indeed the whole point of his design – is that:
𝜓 −1 (f1 , f2 , ..., f2k−1 , f2k ) = 𝜓(f2k , f2k−1 , ..., f2 , f1 )
In other words, to decrypt, we just use the round functions in the reverse
order. Thus the round functions fi do not have to be invertible, and the Feistel
structure lets us turn any one-way function into a block cipher. This means
that we are less constrained in trying to choose a round function with good
diffusion and confusion properties, and which also satisfies any other design
constraints such as code size, software speed or hardware gate count.
5.4.3.1 The Luby-Rackoff result
The key theoretical result on Feistel ciphers was proved by Mike Luby
and Charlie Rackoff in 1988. They showed that if fi were random functions, then 𝜓(f1 , f2 , f3 ) was indistinguishable from a random permutation
under chosen-plaintext attack, and this result was soon extended to show
that 𝜓(f1 , f2 , f3 , f4 ) was indistinguishable under chosen plaintext/ciphertext
attack – in other words, it was a pseudorandom permutation. (I omit a number
of technicalities.)
In engineering terms, the effect is that given a really good round function,
four rounds of Feistel are enough. So if we have a hash function in which we
have confidence, it is straightforward to construct a block cipher from it: use
four rounds of keyed hash in a Feistel network.
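Here is what that looks like as a Python sketch, with SHA-256 standing in for 'a hash function in which we have confidence' (the key and plaintext are invented; this is an illustration of the construction, not a vetted cipher):

import hashlib

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def keyed_hash(key, data, out_len):
    return hashlib.sha256(key + data).digest()[:out_len]

def feistel(block, round_keys):
    # one pass over the ladder of Figure 5.12; decryption is the same pass
    # with the round keys taken in reverse order
    half = len(block) // 2
    l, r = block[:half], block[half:]
    for k in round_keys:
        l, r = r, xor(l, keyed_hash(k, r, half))
    return r + l                                     # final swap

key = b"a shared secret"
round_keys = [key + bytes([i]) for i in range(4)]    # four rounds, per Luby-Rackoff
plaintext = b"sixteen byte blk"
ciphertext = feistel(plaintext, round_keys)
assert feistel(ciphertext, round_keys[::-1]) == plaintext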
5.4.3.2 DES
The DES algorithm is widely used in banking and other payment applications.
The ‘killer app’ that got it widely deployed was ATM networks; from there
it spread to prepayment meters, transport tickets and much else. In its classic
form, it is a Feistel cipher, with a 64-bit block and 56-bit key. Its round function
operates on 32-bit half blocks and consists of three operations:
first, the block is expanded from 32 bits to 48;
next, 48 bits of round key are mixed in using exclusive-or;
the result is passed through a row of eight S-boxes, each of
which takes a six-bit input and provides a four-bit output;
finally, the bits of the output are permuted according to a fixed pattern.
The effect of the expansion, key mixing and S-boxes is shown in Figure 5.13:
Figure 5.13: The DES round function
The round keys are derived from the user-supplied key by using each user
key bit in twelve different rounds according to a slightly irregular pattern. A
full specification of DES is given in [1399].
DES was introduced in 1974 and immediately caused controversy. The most
telling criticism was that the key is too short. Someone who wants to find a 56
bit key using brute force, that is by trying all possible keys, will have a total
exhaust time of 2^56 encryptions and an average solution time of half that, namely 2^55 encryptions. Whit Diffie and Martin Hellman argued in 1977 that a DES keysearch machine could be built with a million chips, each testing a million keys a second; as a million is about 2^20, this would take on average 2^15 seconds, or a
bit over 9 hours, to find the key. They argued that such a machine could be built
for $20 million in 1977 [557]. IBM, whose scientists invented DES, retorted that
they would charge the US government $200 million to build such a machine.
(In hindsight, both were right.)
During the 1980s, there were persistent rumors of DES keysearch machines
being built by various intelligence agencies, but the first successful public keysearch attack took place in 1997. In a distributed effort organised over the net,
14,000 PCs took more than four months to find the key to a challenge. In 1998,
the Electronic Frontier Foundation (EFF) built a DES keysearch machine called
Deep Crack for under $250,000, which broke a DES challenge in 3 days. It contained 1,536 chips run at 40MHz, each chip containing 24 search units which
each took 16 cycles to do a test decrypt. The search rate was thus 2.5 million
test decryptions per second per search unit, or 60 million keys per second per
chip. The design of the cracker is public and can be found at [619]. By 2006,
Sandeep Kumar and colleagues at the universities of Bochum and Kiel built
a machine using 120 FPGAs and costing $10,000, which could break DES in 7
days on average [1110]. A modern botnet with 100,000 machines would take a
few hours. So the key length of single DES is now inadequate.
Another criticism of DES was that, since IBM kept its design principles secret
at the request of the US government, perhaps there was a ‘trapdoor’ which
would give them easy access. However, the design principles were published
in 1992 after differential cryptanalysis was invented and published [473].
The story was that IBM had discovered these techniques in 1972, and the US
National Security Agency (NSA) even earlier. IBM kept the design details
secret at the NSA’s request. We’ll discuss the political aspects of all this
in 26.2.7.1.
We now have a fairly thorough analysis of DES. The best known shortcut
attack, that is, a cryptanalytic attack involving less computation than keysearch,
is a linear attack using 2^42 known texts. DES would be secure with more than
20 rounds, but for practical purposes its security is limited by its keylength.
I don’t know of any real applications where an attacker might get hold of even
2^40 known texts. So the known shortcut attacks are not an issue. However, its
vulnerability to keysearch makes single DES unusable in most applications. As
with AES, there are also attacks based on timing analysis and power analysis.
The usual way of dealing with the DES key length problem is to use the
algorithm multiple times with different keys. Banking networks have largely
moved to triple-DES, a standard since 1999 [1399]. Triple-DES does an encryption, then a decryption, and then a further encryption, all with independent
keys. Formally:
3DES(k0 , k1 , k2 ; M) = DES(k2 ; DES−1 (k1 ; DES(k0 ; M)))
By setting the three keys equal, you get the same result as a single DES
encryption, thus giving a backwards compatibility mode with legacy equipment. (Some banking systems use two-key triple-DES which sets k2 = k0 ; this
gives an intermediate step between single and triple DES.) Most new systems
use AES as the default choice, but many banking systems are committed to
using block ciphers with an eight-byte block, because of the message formats
used in the many protocols by which ATMs, point-of-sale terminals and bank
networks talk to each other, and because of the use of block ciphers to generate
and protect customer PINs (which I discuss in the chapter on Banking and
Bookkeeping). Triple DES is a perfectly serviceable block cipher for such
purposes for the foreseeable future.
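The encrypt-decrypt-encrypt structure and its backwards compatibility property are easy to check in code. In the Python sketch below I assume the pyca/cryptography package, and AES stands in for DES purely because single DES is no longer shipped by default in modern libraries; the point is the E-D-E composition, and that setting all three keys equal collapses it to a single encryption.

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def block_encrypt(key, block):        # one raw block-cipher call (single-block ECB)
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(block) + enc.finalize()

def block_decrypt(key, block):
    dec = Cipher(algorithms.AES(key), modes.ECB()).decryptor()
    return dec.update(block) + dec.finalize()

def ede(k0, k1, k2, block):           # the triple-DES structure: encrypt, decrypt, encrypt
    return block_encrypt(k2, block_decrypt(k1, block_encrypt(k0, block)))

k = bytes(range(16))
m = b"one 16-byte blok"
assert ede(k, k, k, m) == block_encrypt(k, m)   # equal keys give the backwards compatibility mode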
Another way of preventing keysearch (and making power analysis harder) is
whitening. In addition to the 56-bit key, say k0 , we choose two 64-bit whitening
keys k1 and k2 , xor’ing the first with the plaintext before encryption and the
second with the output of the encryption to get the ciphertext afterwards. This
composite cipher is known as DESX. Formally,
DESX(k0 , k1 , k2 ; M) = DES(k0 ; M ⊕ k1 ) ⊕ k2
It can be shown that, on reasonable assumptions, DESX has the properties
you’d expect; it inherits the differential strength of DES but its resistance to
keysearch is increased by the amount of the whitening [1049]. Whitened block
ciphers are used in some applications, most specifically in the XTS mode of
operation which I discuss below. Nowadays, it’s usually used with AES, and
AESX is defined similarly, with the whitening keys used to make each block
encryption operation unique – as we shall see below in section 5.5.7.
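The whitening construction is equally short. The sketch below again assumes pyca/cryptography and uses AES as the core cipher, in effect the AESX just mentioned; keys and plaintext are made up.

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def whitened_encrypt(k0, k1, k2, block):
    # E(k0; M xor k1) xor k2, the DESX/AESX construction
    enc = Cipher(algorithms.AES(k0), modes.ECB()).encryptor()
    return xor(enc.update(xor(block, k1)) + enc.finalize(), k2)

k0 = os.urandom(16)
k1, k2 = os.urandom(16), os.urandom(16)    # whitening keys, the same size as the block
ct = whitened_encrypt(k0, k1, k2, b"yellow submarine")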
5.5 Modes of operation
A common failure is that cryptographic libraries enable or even encourage
developers to use an inappropriate mode of operation. This specifies how a block
cipher with a fixed block size (8 bytes for DES, 16 for AES) can be extended to
process messages of arbitrary length.
There are several standard modes of operation for using a block cipher on
multiple blocks [1406]. It is vital to understand them, so you can choose the
right one for the job, especially as some common tools provide a weak one
by default. This weak mode is electronic code book (ECB) mode, which we
discuss next.
5.5.1 How not to use a block cipher
In electronic code book mode, we just encrypt each succeeding block of plaintext with our block cipher to get ciphertext, as with the Playfair example above.
This is adequate for protocols using single blocks such as challenge-response
and some key management tasks; it’s also used to encrypt PINs in cash machine
systems. But if we use it to encrypt redundant data the patterns will show
through, giving an opponent information about the plaintext. For example,
figure 5.14 shows what happens to a cartoon image when encrypted using DES
in ECB mode. Repeated blocks of plaintext all encrypt to the same ciphertext,
leaving the image quite recognisable.
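You can see the problem in a few lines of Python (assuming the pyca/cryptography package; the key and message are invented): equal plaintext blocks give equal ciphertext blocks, which is exactly the structure visible in Figure 5.14.

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)
plaintext = b"ATTACK AT DAWN!!" * 2                 # two identical 16-byte blocks

enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
ciphertext = enc.update(plaintext) + enc.finalize()
assert ciphertext[:16] == ciphertext[16:32]         # the repetition shows straight through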
In one popular corporate email system from the last century, the encryption
used was DES ECB with the key derived from an eight-character password.
If you looked at a ciphertext generated by this system, you saw that a certain block was far more common than the others – the one corresponding to
a plaintext of nulls. This gave one of the simplest attacks ever on a fielded DES
encryption system: just encrypt a null block with each password in a dictionary and sort the answers. You can now break at sight any ciphertext whose
password was one of those in your dictionary.
In addition, using ECB mode to encrypt messages of more than one block
length which require authenticity – such as bank payment messages – is
Figure 5.14: The Linux penguin, (a) in clear and (b) ECB encrypted (from Wikipedia, derived from images created by Larry Ewing).
particularly foolish, as it opens you to a cut and splice attack along the block
boundaries. For example, if a bank message said “Please pay account number
X the sum Y, and their reference number is Z” then an attacker might initiate
a payment designed so that some of the digits of X are replaced with some of
the digits of Z.
5.5.2 Cipher block chaining
Most commercial applications which encrypt more than one block used to use
cipher block chaining, or CBC, mode. Like ECB, this was one of the original
modes of operation standardised with DES. In it, we exclusive-or the previous block of ciphertext to the current block of plaintext before encryption (see
Figure 5.15).
This mode disguises patterns in the plaintext: the encryption of each block
depends on all the previous blocks. The input initialisation vector (IV) ensures
that stereotyped plaintext message headers won’t leak information by encrypting to identical ciphertexts, just as with a stream cipher.
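A minimal CBC sketch in Python, assuming the pyca/cryptography package, with a fresh random IV per message and PKCS7 padding (all values invented); note that CBC on its own provides no integrity, which is the subject of the next paragraph.

import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, iv = os.urandom(16), os.urandom(16)            # a fresh random IV for every message

padder = padding.PKCS7(128).padder()
padded = padder.update(b"pay account X the sum Y, reference Z") + padder.finalize()

enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
message = iv + enc.update(padded) + enc.finalize()  # the IV is sent along with the ciphertext
# A separate authentication code (or an AEAD mode such as GCM) is still needed.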
However, an opponent who knows some of the plaintext may be able to cut
and splice a message (or parts of several messages encrypted under the same
key). In fact, if an error is inserted into the ciphertext, it will affect only two
blocks of plaintext on decryption, so if there isn’t any integrity protection on the
plaintext, an enemy can insert two-block garbles of random data at locations
of their choice. For that reason, CBC encryption usually has to be used with a
separate authentication code.
More subtle things can go wrong, too; systems have to pad the plaintext to a
multiple of the block size, and if a server that decrypts a message and finds
incorrect padding signals this fact, whether by returning an ‘invalid padding’
message or just taking longer to respond, then this opens a padding oracle attack
Figure 5.15: Cipher Block Chaining (CBC) mode
in which the attacker tweaks input ciphertexts, one byte at a time, watches
the error messages, and ends up being able to decrypt whole messages. This
was discovered by Serge Vaudenay in 2002; variants of it were used against
SSL, IPSEC and TLS as late as 2016 [1953].
5.5.3 Counter encryption
Feedback modes of block cipher encryption are falling from fashion, and not
just because of cryptographic issues. They are hard to parallelise. With CBC,
a whole block of the cipher must be computed between each block input and
each block output. This can be inconvenient in high-speed applications, such
as protecting traffic on backbone links. As silicon is cheap, we would rather
pipeline our encryption chip, so that it encrypts a new block (or generates a
new block of keystream) in as few clock ticks as possible.
The simplest solution is to use AES as a stream cipher. We generate
a keystream by encrypting a counter starting at an initialisation vector:
Ki = {IV + i}K , thus expanding the key K into a long stream of blocks Ki of
keystream, which is typically combined with the blocks of a message Mi using
exclusive-or to give ciphertext Ci = Mi ⊕ Ki .
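Written out as a Python sketch (pyca/cryptography assumed, names invented), counter mode is just the block cipher run over successive counter values, with decryption identical to encryption:

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def ctr_keystream(key, iv, nblocks):
    # K_i = {IV + i}_K : encrypt successive counter values to get keystream blocks
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    counters = b"".join(((int.from_bytes(iv, "big") + i) % (1 << 128)).to_bytes(16, "big")
                        for i in range(nblocks))
    return enc.update(counters) + enc.finalize()

def ctr_crypt(key, iv, data):                       # encryption and decryption are the same operation
    ks = ctr_keystream(key, iv, -(-len(data) // 16))
    return bytes(d ^ k for d, k in zip(data, ks))

key, iv = os.urandom(16), os.urandom(16)
ct = ctr_crypt(key, iv, b"a message of arbitrary length")
assert ctr_crypt(key, iv, ct) == b"a message of arbitrary length"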
Additive stream ciphers have two systemic vulnerabilities, as we noted in
section 5.2.2 above. The first is an attack in depth: if the same keystream is
used twice, then the xor of the two ciphertexts is the xor of the two plaintexts,
from which plaintext can often be deduced, as with Venona. The second is that
they fail to protect message integrity. Suppose that a stream cipher were used
to encipher fund transfer messages. These messages are highly structured; you
might know, for example, that bytes 37–42 contain the sum being transferred.
You could then cause the data traffic from a local bank to go via your computer,
for example by an SS7 exploit. You go into the bank and send $500 to an accomplice. The ciphertext Ci = Mi ⊕ Ki duly arrives in your machine. You know Mi
for bytes 37–42, so you can recover Ki and construct a modified message which
instructs the receiving bank to pay not $500 but $500,000! This is an example of
an attack in depth; it is the price not just of the perfect secrecy we get from the
one-time pad, but of much more humble stream ciphers, too.
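The forgery needs no knowledge of the key at all, only of the plaintext bytes being targeted. A tiny Python sketch, with a random keystream standing in for the counter-mode output and an invented amount field:

import os

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

keystream = os.urandom(10)                    # the attacker never sees this
old_amount = b"0000000500"                    # known plaintext in the amount field
ciphertext = xor(old_amount, keystream)       # what the attacker intercepts

new_amount = b"0000500000"
forged = xor(ciphertext, xor(old_amount, new_amount))   # cancel the old amount, splice in the new
assert xor(forged, keystream) == new_amount   # the receiving bank decrypts the attacker's figure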
The usual way of dealing with this is to add an authentication code, and the
most common standard uses a technique called Galois counter mode, which I
describe later.
5.5.4 Legacy stream cipher modes
You may find two old stream-cipher modes of operation, output feedback
mode (OFB) and less frequently ciphertext feedback mode (CFB).
Output feedback mode consists of repeatedly encrypting an initial value and
using this as a keystream in a stream cipher. Writing IV for the initialization
vector, we will have K1 = {IV}K and Ki = {Ki−1}K. However an n-bit block cipher in OFB mode will typically have a cycle length of 2^(n/2) blocks, after which the birthday theorem will see to it that we loop back to the IV. So we may have a cycle-length problem if we use a 64-bit block cipher such as triple-DES on a high-speed link: once we've called a little over 2^32 pseudorandom 64-bit values, the odds favour a match. (In CBC mode, too, the birthday theorem ensures that after about 2^(n/2) blocks, we will start to see repeats.) Counter mode encryption, however, has a guaranteed cycle length of 2^n rather than 2^(n/2), and as we noted
above is easy to parallelise. Despite this OFB is still used, as counter mode only
became a NIST standard in 2002.
Cipher feedback mode is another kind of stream cipher, designed for
use in radio systems that have to resist jamming. It was designed to be
self-synchronizing, in that even if we get a burst error and drop a few bits, the
system will recover synchronization after one block length. This is achieved
by using our block cipher to encrypt the last n bits of ciphertext, adding
the last output bit to the next plaintext bit, and shifting the ciphertext along
one bit. But this costs one block cipher operation per bit and has very bad
error amplification properties; nowadays people tend to use dedicated link
layer protocols for synchronization and error correction rather than trying to
combine them with the cryptography at the traffic layer.
5.5.5 Message authentication code
Another official mode of operation of a block cipher is not used to encipher
data, but to protect its integrity and authenticity. This is the message authentication code, or MAC. To compute a MAC on a message using a block cipher, we
encrypt it using CBC mode and throw away all the output ciphertext blocks
except the last one; this last block is the MAC. (The intermediate results are
kept secret in order to prevent splicing attacks.)
This construction makes the MAC depend on all the plaintext blocks as well
as on the key. It is secure provided the message length is fixed; Mihir Bellare,
Joe Kilian and Philip Rogaway proved that any attack on a MAC under these
circumstances would give an attack on the underlying block cipher [212].
If the message length is variable, you have to ensure that a MAC computed
on one string can’t be used as the IV for computing a MAC on a different string,
so that an opponent can’t cheat by getting a MAC on the composition of the two
strings. In order to fix this problem, NIST has standardised CMAC, in which
a variant of the key is xor-ed in before the last encryption [1407]. (CMAC is
based on a proposal by Tetsu Iwata and Kaoru Kurosawa [967].) You may see
legacy systems in which the MAC consists of only half of the last output block,
with the other half thrown away, or used in other mechanisms.
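The construction itself is a one-liner on top of CBC. A Python sketch, assuming pyca/cryptography (which also ships a proper CMAC class for the variable-length case):

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def cbc_mac(key, message):
    # CBC-encrypt with a zero IV and keep only the final ciphertext block;
    # only safe for messages of a fixed, agreed length: use CMAC otherwise
    assert len(message) % 16 == 0
    enc = Cipher(algorithms.AES(key), modes.CBC(b"\x00" * 16)).encryptor()
    return (enc.update(message) + enc.finalize())[-16:]

key = os.urandom(16)
tag = cbc_mac(key, b"a fixed-length, 32-byte message!")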
There are other possible constructions of MACs: the most common one is
HMAC, which uses a hash function with a key; we’ll describe it in section 5.6.2.
5.5.6 Galois counter mode
The above modes were all developed for DES in the 1970s and 1980s (although
counter mode only became an official US government standard in 2002). They
are not efficient for bulk encryption where you need to protect integrity as well
as confidentiality; if you use either CBC mode or counter mode to encrypt your
data and a CBC-MAC or CMAC to protect its integrity, then you invoke the
block cipher twice for each block of data you process, and the operation cannot
be parallelised.
The modern approach is to use a mode of operation designed for authenticated encryption. Galois Counter Mode (GCM) has taken over as the default
since being approved by NIST in 2007 [1409]. It uses only one invocation of
the block cipher per block of text, and it’s parallelisable so you can get high
throughput on fast data links with low cost and low latency. Encryption is performed in a variant of counter mode; the resulting ciphertexts are also used as
coefficients of a polynomial which is evaluated at a key-dependent point over
a Galois field of 2^128 elements to give an authenticator tag. The tag computation
is a universal hash function of the kind I described in section 5.2.4 and is provably secure so long as keys are never reused. The supplied key is used along
with a random IV to generate both a unique message key and a unique authenticator key. The output is thus a ciphertext of the same length as the plaintext,
plus an IV and a tag of typically 128 bits each.
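In code this is the simplest mode of all to use correctly, which is part of its appeal. A Python sketch with pyca/cryptography's AESGCM interface (all values invented); the one rule that matters is never to reuse a nonce under the same key.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
aead = AESGCM(key)
nonce = os.urandom(12)                            # must be unique per message under this key
associated = b"header, authenticated but not encrypted"
ct = aead.encrypt(nonce, b"the payment instruction", associated)
pt = aead.decrypt(nonce, ct, associated)          # raises an exception if anything was tampered with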
GCM also has an interesting incremental property: a new authenticator
and ciphertext can be calculated with an amount of effort proportional to the
number of bits that were changed. GCM was invented by David McGrew and
John Viega of Cisco; their goal was to create an efficient authenticated encryption mode suitable for use in high-performance network hardware [1270]. It
is the sensible default for authenticated encryption of bulk content. (There’s
an earlier composite mode, CCM, which you’ll find used in Bluetooth 4.0
and later; this combines counter mode with CBC-MAC, so it costs about
twice as much effort to compute, and cannot be parallelised or recomputed
incrementally [1408].)
5.5.7 XTS
GCM and other authenticated encryption modes expand the plaintext by
adding a message key and an authenticator tag. This is very inconvenient
in applications such as hard disk encryption, where we prefer a mode of
operation that preserves plaintext length. Disk encryption systems used to
use CBC with the sector number providing an IV, but since Windows 10,
Microsoft has been using a new mode of operation, XTS-AES, inspired by
GCM and standardised in 2007. This is a codebook mode but with the plaintext
whitened by a tweak key derived from the disk sector. Formally, the message
Mi encrypted with the key K at block j is
AESX(KTj , K, KTj ; M)
where the tweak key KTj is derived by encrypting the IV using a different
key and then multiplying it repeatedly with a suitable constant so as to give
a different whitener for each block. This means that if an attacker swaps two
encrypted blocks, all 256 bits will decrypt to randomly wrong values. You still
need higher-layer mechanisms to detect ciphertext manipulation, but simple
checksums will be sufficient.
5.6 Hash functions
In section 5.4.3.1 I showed how the Luby-Rackoff theorem enables us to construct a block cipher from a hash function. It’s also possible to construct a hash
function from a block cipher5. The trick is to feed the message blocks one at
a time to the key input of our block cipher, and use it to update a hash value
(which starts off at say H0 = 0). In order to make this operation non-invertible,
we add feedforward: the (i − 1)st hash value is exclusive or’ed with the output
of round i. This Davies-Meyer construction gives our final mode of operation of
a block cipher (Figure 5.16).
The birthday theorem makes another appearance here, in that if a hash function h is built using an n bit block cipher, it is possible to find two messages
M1 ≠ M2 with h(M1) = h(M2) with about 2^(n/2) effort (hash slightly more than that many messages Mi and look for a match). So a 64 bit block cipher is not adequate, as forging a message would cost of the order of 2^32 messages, which
is just too easy. A 128-bit cipher such as AES used to be just about adequate,
and in fact the AACS content protection mechanism in Blu-ray DVDs used
‘AES-H’, the hash function derived from AES in this way.
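The Davies-Meyer construction is short enough to write down directly. A Python sketch using AES via pyca/cryptography (the zero padding of the final block is a simplification; real designs use proper length padding):

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def davies_meyer(message, h0=b"\x00" * 16):
    # h_i = E_{M_i}(h_{i-1}) xor h_{i-1}: each message block goes into the key input,
    # and the feedforward makes the compression step non-invertible
    message = message + b"\x00" * (-len(message) % 16)
    h = h0
    for i in range(0, len(message), 16):
        block = message[i:i + 16]                 # a 16-byte block used as an AES-128 key
        enc = Cipher(algorithms.AES(block), modes.ECB()).encryptor()
        h = bytes(a ^ b for a, b in zip(enc.update(h) + enc.finalize(), h))
    return h

digest = davies_meyer(b"a message of any length")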
5.6.1 Common hash functions
The hash functions most commonly used through the 1990s and 2000s evolved
as variants of a block cipher with a 512 bit key and a block size increasing from
5 In fact, we can also construct hash functions and block ciphers from stream ciphers – so, subject to some caveats I'll discuss in the next section, given any one of these three primitives we can construct the other two.
Figure 5.16: Feedforward mode (hash function)
128 to 512 bits. The first two were designed by Ron Rivest and the others by
the NSA:
MD4 has three rounds and a 128 bit hash value, and a collision was
found for it in 1998 [568];
MD5 has four rounds and a 128 bit hash value, and a collision was found
for it in 2004 [1983, 1985];
SHA-1, released in 1995, has five rounds and a 160 bit hash value. A collision was found in 2017 [1831], and a more powerful version of the attack
in 2020 [1148];
SHA-2, which replaced it in 2002, comes in 256-bit and 512-bit
versions (called SHA256 and SHA512) plus a number of variants.
The block ciphers underlying these hash functions are similar: their round
function is a complicated mixture of the register operations available on 32 bit
processors [1670]. Cryptanalysis has advanced steadily. MD4 was broken by
Hans Dobbertin in 1998 [568]; MD5 was broken by Xiaoyun Wang and her colleagues in 2004 [1983, 1985]; collisions can now be found easily, even between
strings containing meaningful text and adhering to message formats such as
those used for digital certificates. Wang seriously dented SHA-1 the following year in work with Yiqun Lisa Yin and Hongbo Yu, providing an algorithm
to find collisions in only 2^69 steps [1984]; it now takes about 2^60 computations.
In February 2017, scientists from Amsterdam and Google published just such a
collision, to prove the point and help persuade people to move to stronger hash
functions such as SHA-2 [1831] (and from earlier versions of TLS to TLS 1.3).
In 2020, Gaëtan Leurent and Thomas Peyrin developed an improved attack
that computes chosen-prefix collisions, enabling certificate forgery at a cost of
several tens of thousands of dollars [1148].
In 2007, the US National Institute of Standards and Technology (NIST) organised a competition to find a replacement hash function family [1411]. The winner, Keccak, has a quite different internal structure, and was standardised as
SHA-3 in 2015. So we now have a choice of SHA-2 and SHA-3 as standard hash
functions.
A lot of deployed systems still use hash functions such as MD5 for which
there’s an easy collision-search algorithm. Whether a collision will break any
given application can be a complex question. I already mentioned forensic systems, which keep hashes of files on seized computers, to reassure the court
that the police didn’t tamper with the evidence; a hash collision would merely
signal that someone had been trying to tamper, whether the police or the defendant, and trigger a more careful investigation. If bank systems actually took a
message composed by a customer saying ‘Pay X the sum Y’, hashed it and
signed it, then a crook could find two messages ‘Pay X the sum Y’ and ‘Pay X
the sum Z’ that hashed to the same value, get one signed, and swap it for the
other. But bank systems don’t work like that. They typically use MACs rather
than digital signatures on actual transactions, and logs are kept by all the parties to a transaction, so it’s not easy to sneak in one of a colliding pair. And in
both cases you’d probably have to find a preimage of an existing hash value,
which is a much harder cryptanalytic task than finding a collision.
5.6.2 Hash function applications – HMAC, commitments
and updating
But even though there may be few applications where a collision-finding
algorithm could let a bad guy steal real money today, the existence of a vulnerability can still undermine a system’s value. Some people doing forensic
work continue to use MD5, as they’ve used it for years, and its collisions don’t
give useful attacks. This is probably a mistake. In 2005, a motorist accused of
speeding in Sydney, Australia was acquitted after the New South Wales Roads
and Traffic Authority failed to find an expert to testify that MD5 was secure
in this application. The judge was “not satisfied beyond reasonable doubt that
the photograph [had] not been altered since it was taken” and acquitted the
motorist; his strange ruling was upheld on appeal the following year [1434].
So even if a vulnerability doesn’t present an engineering threat, it can still
present a certificational threat.
Hash functions have many other uses. One of them is to compute MACs.
A naïve method would be to hash the message with a key: MACk (M) = h(k, M).
However the accepted way of doing this, called HMAC, uses an extra step
in which the result of this computation is hashed again. The two hashing
operations are done using variants of the key, derived by exclusive-or’ing
them with two different constants. Thus HMACk (M) = h(k ⊕ B, h(k ⊕ A, M)).
A is constructed by repeating the byte 0x36 as often as necessary, and B
similarly from the byte 0x5C. If a hash function is on the weak side, this
construction can make exploitable collisions harder to find [1091]. HMAC is
now FIPS 198-1.
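The construction is easy to write out and check against a library implementation. A Python sketch for HMAC-SHA256 (block size 64 bytes), following the formula above; the key and message are invented.

import hashlib, hmac

def hmac_sha256(key, message):
    # HMAC_k(M) = h(k xor B, h(k xor A, M)), with A = 0x36... and B = 0x5c...
    if len(key) > 64:                      # keys longer than the hash block size are hashed first
        key = hashlib.sha256(key).digest()
    key = key.ljust(64, b"\x00")
    inner_key = bytes(x ^ 0x36 for x in key)
    outer_key = bytes(x ^ 0x5C for x in key)
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()

key, msg = b"a shared key", b"an important message"
assert hmac_sha256(key, msg) == hmac.new(key, msg, hashlib.sha256).digest()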
Another use of hash functions is to make commitments that are to be revealed
later. For example, I might wish to timestamp a digital document in order
to establish intellectual priority, but not reveal the contents yet. In that case,
I can publish a hash of the document, or send it to a commercial timestamping
service, or have it mined into the Bitcoin blockchain. Later, when I reveal the
document, the timestamp on its hash establishes that I had written it by then.
Again, an algorithm that generates colliding pairs doesn’t break this, as you
have to have the pair to hand when you do the timestamp.
Merkle trees hash a large number of inputs to a single hash output. The inputs
are hashed to values that form the leaves of a tree; each non-leaf node contains
the hash of all the hashes at its child nodes, so the hash at the root is a hash of
all the values at the leaves. This is a fast way to hash a large data structure; it’s
used in code signing, where you may not want to wait for all of an application’s
files to have their signatures checked before you open it. It’s also widely used
in blockchain applications; in fact, a blockchain is just a Merkle tree. It was
invented by Ralph Merkle, who first proposed it to calculate a short hash of a
large file of public keys [1298], particularly for systems where public keys are
used only once. For example, a Lamport digital signature can be constructed
from a hash function: you create a private key of 512 random 256-bit values
ki and publish the verification key V as their Merkle tree hash. Then to sign
h = SHA256(M) you would reveal k2i if the i-th bit of h is zero, and otherwise
reveal k2i+1 . This is secure if the hash function is, but has the drawback that
each key can be used only once. Merkle saw that you could generate a series of
private keys by encrypting a counter with a master secret key, and then use a
tree to hash the resulting public keys. However, for most purposes, people use
signature algorithms based on number theory, which I’ll describe in the next
section.
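Computing a Merkle tree hash takes only a few lines. A Python sketch (one common convention for an odd number of nodes is shown; real systems differ in such details):

import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]          # hash the inputs to get the leaves
    while len(level) > 1:
        nxt = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:                        # an unpaired node is carried up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]                               # the root commits to every leaf

root = merkle_root([b"key 1", b"key 2", b"key 3", b"key 4"])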
One security-protocol use of hash functions is worth a mention: key updating
and autokeying. Key updating means that two or more principals who share a
key pass it through a one-way hash function at agreed times: K_i = h(K_{i−1}). The
point is that if an attacker compromises one of their systems and steals the key,
he only gets the current key and is unable to decrypt back traffic. The chain
of compromise is broken by the hash function’s one-wayness. This property is
also known as backward security. A variant is autokeying where the principals
update a key by hashing it with the messages they have exchanged since the
last key change: K_{i+1} = h(K_i, M_{i1}, M_{i2}, …). If an attacker now compromises one
of their systems and steals the key, then as soon as they exchange a message
which he can’t observe or guess, security will be recovered; again, the chain
of compromise is broken. This property is known as forward security. It was
first used in banking in EFT payment terminals in Australia [208, 210]. The
use of asymmetric cryptography allows a slightly stronger form of forward
security, namely that as soon as a compromised terminal exchanges a message with an uncompromised one which the opponent doesn’t control, security
can be recovered even if the message is in plain sight. I’ll describe how this
works next.
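Both constructions are one-liners once you have a hash function; here is a sketch assuming SHA-256.

import hashlib

def update_key(k: bytes) -> bytes:
    # Backward security: K_i = h(K_{i-1}), so today's key doesn't reveal yesterday's.
    return hashlib.sha256(k).digest()

def autokey(k: bytes, *messages: bytes) -> bytes:
    # Forward security: K_{i+1} = h(K_i, M_1, M_2, ...); one message exchange the
    # attacker can't observe or guess and the key is out of his reach again.
    return hashlib.sha256(k + b''.join(messages)).digest()

k1 = update_key(b'\x00' * 32)
k2 = autokey(k1, b'first message', b'second message')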
5.7 Asymmetric crypto primitives
The commonly used building blocks in asymmetric cryptography, public-key
encryption and digital signature, are based on number theory. I'll give a brief
overview here, and look in more detail at some of the mechanisms in Part 2
when I discuss applications.
The basic idea is to make the security of the cipher depend on the difficulty
of solving a mathematical problem that’s known to be hard, in the sense that a
lot of people have tried to solve it and failed. The two problems used in almost
all real systems are factorization and discrete logarithm.
5.7.1 Cryptography based on factoring
The prime numbers are the positive whole numbers with no proper divisors:
the only numbers that divide a prime number are 1 and the number itself. By
definition, 1 is not prime; so the primes are {2, 3, 5, 7, 11, … }. The fundamental
theorem of arithmetic states that each natural number greater than 1 factors into
prime numbers in a way that is unique up to the order of the factors. It is easy to
find prime numbers and multiply them together to give a composite number,
but much harder to resolve a composite number into its factors. And lots of
smart people have tried really hard since we started using cryptography based
on factoring. The largest composite product of two large random primes to
have been factorized in 2020 was RSA-250, an 829-bit number (250 decimal
digits). This took the equivalent of 2700 years’ work on a single 2.2GHz core; the
previous record, RSA-240 in 2019, had taken the equivalent of 900 years [302].
It is possible for factoring to be done surreptitiously, perhaps using a botnet; in
2001, when the state of the art was factoring 512-bit numbers, such a challenge
was set in Simon Singh’s ‘Code Book’ and solved by five Swedish students
using several hundred computers to which they had access [44]. As for 1024-bit
numbers, I expect the NSA can factor them already, and I noted in the second
edition that ‘an extrapolation of the history of factoring records suggests the
first factorization will be published in 2018.’ Moore’s law is slowing down, and
we’re two years late. Anyway, organisations that want keys to remain secure
for many years are already using 2048-bit numbers at least.
The algorithm commonly used to do public-key encryption and digital signatures based on factoring is RSA, named after its inventors Ron Rivest, Adi
Shamir and Len Adleman. It uses Fermat’s little theorem, which states that for all
primes p not dividing a, a^(p−1) ≡ 1 (mod p) (proof: take the set {1, 2, … , p − 1} and multiply each of them modulo p by a, then cancel out (p − 1)! each side). For a general integer n, a^φ(n) ≡ 1 (mod n), where Euler's function φ(n) is the number of positive integers less than n with which it has no divisor in common (the proof is similar). So if n is the product of two primes pq then φ(n) = (p − 1)(q − 1).
In RSA, the encryption key is a modulus N which is hard to factor (take
N = pq for two large randomly chosen primes p and q, say of 1024 bits each)
plus a public exponent e that has no common factors with either p − 1 or q − 1.
The private key is the factors p and q, which are kept secret. Where M is the
message and C is the ciphertext, encryption is defined by
C ≡ M^e (mod N)
Decryption is the reverse operation:
M ≡ C^(1/e) (mod N)
Whoever knows the private key – the factors p and q of N – can easily calculate this e-th root C^(1/e) (mod N). As φ(N) = (p − 1)(q − 1) and e has no common factors with φ(N), the key's owner can find a number d such that de ≡ 1 (mod φ(N)) – she finds the value of d separately modulo p − 1 and q − 1, and combines the answers. C^(1/e) (mod N) is now computed as C^d (mod N), and decryption works because of Fermat's theorem:
C^d ≡ {M^e}^d ≡ M^(ed) ≡ M^(1+kφ(N)) ≡ M·M^(kφ(N)) ≡ M·1 ≡ M (mod N)
Similarly, the owner of a private key can operate on a message with it to
produce a signature
Sig_d(M) ≡ M^d (mod N)
and this signature can be verified by raising it to the power e mod N (thus,
using e and N as the public signature verification key) and checking that the
message M is recovered:
M ≡ (Sig_d(M))^e (mod N)
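Here is a toy version of these equations in Python, with deliberately tiny primes; as the next paragraphs explain, raw RSA like this must never be used as it stands.

# Toy RSA; real keys are 2048 bits or more and need padding such as OAEP.
p, q = 61, 53
N = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, coprime to phi
d = pow(e, -1, phi)          # private exponent: d*e = 1 (mod phi)

M = 42
C = pow(M, e, N)             # encryption: C = M^e mod N
assert pow(C, d, N) == M     # decryption: C^d = M mod N

S = pow(M, d, N)             # signature: S = M^d mod N
assert pow(S, e, N) == M     # verification: S^e = M mod N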
Neither RSA encryption nor signature is safe to use on its own. The reason is
that, as encryption is an algebraic process, it preserves certain algebraic properties. For example, if we have a relation such as M_1 M_2 = M_3 that holds among plaintexts, then the same relationship will hold among ciphertexts C_1 C_2 = C_3 and signatures Sig_1 Sig_2 = Sig_3. This property is known as a multiplicative homomorphism; a homomorphism is a function that preserves some mathematical
structure. The homomorphic nature of raw RSA means that it doesn’t meet the
random oracle model definitions of public key encryption or signature.
Another general problem with public-key encryption is that if the plaintexts
are drawn from a small set, such as ‘attack’ or ‘retreat’, and the encryption process is deterministic (as RSA is), then an attacker might just precompute the
possible ciphertexts and recognise them when they appear. With RSA, it’s also
dangerous to use a small exponent e to encrypt the same message to multiple
recipients, as this can lead to an algebraic attack. To stop the guessing attack,
the low-exponent attack and attacks based on homomorphism, it’s sensible to
add in some randomness, and some redundancy, into a plaintext block before
encrypting it. Every time we encrypt the same short message, say ‘attack’, we
want to get a completely different ciphertext, and for these to be indistinguishable from each other as well as from the ciphertexts for ‘retreat’. And there are
good ways and bad ways of doing this.
Crypto theoreticians have wrestled for decades to analyse all the things that
can go wrong with asymmetric cryptography, and to find ways to tidy it up.
Shafi Goldwasser and Silvio Micali came up with formal models of probabilistic
encryption in which we add randomness to the encryption process, and semantic
security, which we mentioned already; in this context it means that an attacker
cannot get any information at all about a plaintext M that was encrypted to
a ciphertext C, even if he is allowed to request the decryption of any other
ciphertext C′ not equal to C [778]. In other words, we want the encryption to
resist chosen-ciphertext attack as well as chosen-plaintext attack. There are a
number of constructions that give semantic security, but they tend to be too
ungainly for practical use.
The usual real-world solution is optimal asymmetric encryption padding
(OAEP), where we concatenate the message M with a random nonce N, and
use a hash function h to combine them:
C_1 = M ⊕ h(N)
C_2 = N ⊕ h(C_1)
In effect, this is a two-round Feistel cipher that uses h as its round function. The result, the combination C_1, C_2, is then encrypted with RSA and sent. The recipient then computes N as C_2 ⊕ h(C_1) and recovers M as C_1 ⊕ h(N) [213].
This was eventually proven to be secure. There are a number of public-key
cryptography standards; PKCS #1 describes OAEP [995]. These block a whole
lot of attacks that were discovered in the 20th century and about which people
have mostly forgotten, such as the fact that an opponent can detect if you
encrypt the same message with two different RSA keys. In fact, one of the
things we learned in the 1990s was that randomisation helps make crypto
protocols more robust against all sorts of attacks, and not just the mathematical
ones. Side-channel attacks and even physical probing of devices take a lot
more work.
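The following sketch shows just the two-round Feistel structure described above, with a fixed 32-byte block for simplicity; real PKCS #1 OAEP adds proper length encoding and a mask generation function, so treat this purely as an illustration.

import hashlib, os

h = lambda x: hashlib.sha256(x).digest()
xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))

def pad(M: bytes) -> bytes:
    M = M.ljust(32, b'\x00')   # crude padding to 32 bytes, for illustration only
    N = os.urandom(32)         # the random nonce
    C1 = xor(M, h(N))
    C2 = xor(N, h(C1))
    return C1 + C2             # this 64-byte block is what then gets encrypted with RSA

def unpad(block: bytes) -> bytes:
    C1, C2 = block[:32], block[32:]
    N = xor(C2, h(C1))
    return xor(C1, h(N)).rstrip(b'\x00')

assert unpad(pad(b'attack')) == b'attack'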
With signatures, things are slightly simpler. In general, it’s often enough to
just hash the message before applying the private key: Sig_d = [h(M)]^d (mod N);
PKCS #7 describes simple mechanisms for signing a message digest [1010].
However, in some applications one might wish to include further data in
the signature block, such as a timestamp, or some randomness to make
side-channel attacks harder.
Many of the things that have gone wrong with real implementations have
to do with side channels and error handling. One spectacular example was
when Daniel Bleichenbacher found a way to break the RSA implementation
in SSL v 3.0 by sending suitably chosen ciphertexts to the victim and observing any resulting error messages. If he could learn from the target whether
a given c, when decrypted as cd (mod n), corresponds to a PKCS #1 message,
then he could use this to decrypt or sign messages [265]. There have been many
more side-channel attacks on common public-key implementations, typically
via measuring the precise time taken to decrypt. RSA is also mathematically
fragile; you can break it using homomorphisms, or if you have the same ciphertext encrypted under too many different small keys, or if the message is too
short, or if two messages are related by a known polynomial, or in several other
edge cases. Errors in computation can also give a result that’s correct modulo
one factor of the modulus and wrong modulo the other, enabling the modulus
to be factored; errors can be inserted tactically, by interfering with the crypto
device, or strategically, for example by the chipmaker arranging for one particular value of a 64-bit multiply to be computed incorrectly. Yet other attacks
have involved stack overflows, whether by sending the attack code in as keys,
or as padding in poorly-implemented standards.
5.7.2 Cryptography based on discrete logarithms
While RSA was the first public-key encryption algorithm deployed in the SSL
and SSH protocols, the most popular public-key algorithms now are based on
discrete logarithms. There are a number of flavors, some using normal modular
arithmetic while others use elliptic curves. I’ll explain the normal case first.
A primitive root modulo p is a number whose powers generate all the nonzero
numbers mod p; for example, when working modulo 7 we find that 5^2 = 25, which reduces to 4 (modulo 7); then we can compute 5^3 as 5^2 × 5, or 4 × 5, which is 20, which reduces to 6 (modulo 7); and so on, as in Figure 5.17.

Thus 5 is a primitive root modulo 7. This means that given any y, we can always solve the equation y = 5^x (mod 7); x is then called the discrete logarithm of y modulo 7. Small examples like this can be solved by inspection, but for a large random prime number p, we do not know how to do this efficiently. So the mapping f: x → g^x (mod p) is a one-way function, with the additional properties that f(x + y) = f(x)f(y) and f(nx) = f(x)^n. In other words, it is a one-way
homomorphism. As such, it can be used to construct digital signature and public
key encryption algorithms.
5^1 = 5 (mod 7)
5^2 = 25 ≡ 4 (mod 7)
5^3 ≡ 4 × 5 ≡ 6 (mod 7)
5^4 ≡ 6 × 5 ≡ 2 (mod 7)
5^5 ≡ 2 × 5 ≡ 3 (mod 7)
5^6 ≡ 3 × 5 ≡ 1 (mod 7)
Figure 5.17: Example of discrete logarithm calculations
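You can reproduce the figure, and see why brute-force discrete logs only work for toy moduli, with a few lines of Python:

p, g = 7, 5
powers = {x: pow(g, x, p) for x in range(1, p)}
assert powers == {1: 5, 2: 4, 3: 6, 4: 2, 5: 3, 6: 1}   # matches Figure 5.17

def dlog(y, g, p):
    # Try every exponent in turn - feasible only because p is tiny.
    for x in range(1, p):
        if pow(g, x, p) == y:
            return x

assert dlog(6, g, p) == 3    # 5^3 = 6 (mod 7)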
5.7.2.1 One-way commutative encryption
Imagine we’re back in ancient Rome, that Anthony wants to send a secret to
Brutus, and the only communications channel available is an untrustworthy
courier (say, a slave belonging to Caesar). Anthony can take the message, put
it in a box, padlock it, and get the courier to take it to Brutus. Brutus could then
put his own padlock on it too, and have it taken back to Anthony. He in turn
would remove his padlock, and have it taken back to Brutus, who would now
at last open it.
Exactly the same can be done using a suitable encryption function that commutes, that is, has the property that {{M}KA }KB = {{M}KB }KA . Alice can take
the message M and encrypt it with her key KA to get {M}KA which she sends
to Bob. Bob encrypts it again with his key KB getting {{M}KA }KB . But the commutativity property means that this is just {{M}KB }KA , so Alice can decrypt it
using her key KA getting {M}KB . She sends this to Bob and he can decrypt it
with KB, finally recovering the message M.
How can a suitable commutative encryption be implemented? The one-time
pad does indeed commute, but is not suitable here. Suppose Alice chooses a random key xA and sends Bob M ⊕ xA, while Bob returns M ⊕ xB and Alice finally sends him M ⊕ xA ⊕ xB. An attacker can then simply exclusive-or these three messages together; as X ⊕ X = 0 for all X, the two values xA and xB both cancel out, leaving the plaintext M.
The discrete logarithm problem comes to the rescue. If the discrete log
problem based on a primitive root modulo p is hard, then we can use discrete
exponentiation as our encryption function. For example, Alice encodes her
message as the primitive root g, chooses a random number xA with no common factor with p − 1, calculates g^xA modulo p and sends it, together with p, to Bob. Bob likewise chooses a random number xB and forms g^(xA·xB) modulo p, which he passes back to Alice. Alice can now remove her exponentiation: as Fermat's theorem means that exponents can be reduced modulo p − 1, she raises this to the power xA^(−1) (mod p − 1) to get g^xB, and sends it to Bob. Bob can now
remove his exponentiation, too, and so finally gets hold of g. The security of
this scheme depends on the difficulty of the discrete logarithm problem. In
practice, it can be tricky to encode a message as a primitive root; but there’s a
simpler way to achieve the same effect.
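Here is a sketch of the padlock protocol using modular exponentiation; note that each party's exponent must be coprime to p − 1 so that it can be undone, and the prime and message below are purely illustrative.

import math, random

p = 2 ** 127 - 1                        # a convenient prime for a demo
def padlock():
    # Pick an exponent that is invertible modulo p-1, plus its inverse.
    while True:
        x = random.randrange(3, p - 1)
        if math.gcd(x, p - 1) == 1:
            return x, pow(x, -1, p - 1)

m = 123456789                           # the message, encoded as a number mod p
xa, xa_inv = padlock()                  # Alice's padlock
xb, xb_inv = padlock()                  # Bob's padlock

step1 = pow(m, xa, p)                   # Alice locks:    m^xa
step2 = pow(step1, xb, p)               # Bob locks too:  m^(xa*xb)
step3 = pow(step2, xa_inv, p)           # Alice unlocks:  m^xb
assert pow(step3, xb_inv, p) == m       # Bob unlocks and reads m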
5.7.2.2 Diffie-Hellman key establishment
The first public-key encryption scheme to be published, by Whitfield Diffie and
Martin Hellman in 1976, has a fixed primitive root g and uses g^(xA·xB) modulo p
as the key to a shared-key encryption system. The values xA and xB can be the
private keys of the two parties.
Let’s walk through this. The prime p and generator g are common to all users.
Alice chooses a secret random number xA, calculates yA = g^xA and publishes it opposite her name in the company phone book. Bob does the same, choosing a random number xB and publishing yB = g^xB. In order to communicate with Bob, Alice fetches yB from the phone book, forms yB^xA, which is just g^(xA·xB), and uses this to encrypt the message to Bob. On receiving it, Bob looks up Alice's public key yA and forms yA^xB, which is also equal to g^(xA·xB), so he can decrypt her message.
Alternatively, Alice and Bob can use transient keys, and get a mechanism
for providing forward security. As before, let the prime p and generator g be
common to all users. Alice chooses a random number RA, calculates g^RA and sends it to Bob; Bob does the same, choosing a random number RB and sending g^RB to Alice; they then both form g^(RA·RB), which they use as a session key (see Figure 5.18).
Alice and Bob can now use the session key g^(RA·RB) to encrypt a conversation.
If they used transient keys, rather than long-lived ones, they have managed
to create a shared secret ‘out of nothing’. Even if an opponent had inspected
both their machines before this protocol was started, and knew all their
stored private keys, then provided some basic conditions were met (e.g., that
their random number generators were not predictable and no malware was
left behind) the opponent could still not eavesdrop on their traffic. This is
the strong version of the forward security property to which I referred in
section 5.6.2. The opponent can’t work forward from knowledge of previous
keys, however it was obtained. Provided that Alice and Bob both destroy the
shared secret after use, they will also have backward security: an opponent
who gets access to their equipment later cannot work backward to break their
old traffic. In what follows, we may write the Diffie-Hellman key derived
from RA and RB as DH(RA, RB) when we don't have to be explicit about which group we're working in, and don't need to write out explicitly which is the private key RA and which is the public key g^RA.
A → B: g^RA (mod p)
B → A: g^RB (mod p)
A → B: {M}_{g^(RA·RB)}
Figure 5.18: The Diffie-Hellman key exchange protocol
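A toy version of Figure 5.18 follows. The prime is far too small for real use, there is no authentication (hence the middleperson attack discussed below), and the shared group element is hashed down into a symmetric session key.

import hashlib, secrets

p = 2 ** 127 - 1                      # demo prime only; real groups are 2048+ bits
g = 3                                 # illustrative generator
ra = secrets.randbelow(p - 2) + 1     # Alice's transient secret R_A
rb = secrets.randbelow(p - 2) + 1     # Bob's transient secret R_B

A = pow(g, ra, p)                     # Alice -> Bob: g^RA mod p
B = pow(g, rb, p)                     # Bob -> Alice: g^RB mod p

k_alice = pow(B, ra, p)               # both sides now hold g^(RA*RB) mod p
k_bob = pow(A, rb, p)
assert k_alice == k_bob

session_key = hashlib.sha256(k_alice.to_bytes(16, 'big')).digest()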
Slightly more work is needed to provide a full solution. Some care is needed
when choosing the parameters p and g; we can infer from the Snowden disclosures, for example, that the NSA can solve the discrete logarithm problem for
commonly-used 1024-bit prime numbers⁶. And there are several other details
which depend on whether we want properties such as forward security.
But this protocol has a small problem: although Alice and Bob end up with a
session key, neither of them has any real idea who they share it with.
Suppose that in our padlock protocol Caesar had just ordered his slave
to bring the box to him instead, and placed his own padlock on it next
to Anthony’s. The slave takes the box back to Anthony, who removes his
padlock, and brings the box back to Caesar who opens it. Caesar can even run
two instances of the protocol, pretending to Anthony that he’s Brutus and to
Brutus that he’s Anthony. One fix is for Anthony and Brutus to apply their
seals to their locks.
With the Diffie-Hellman protocol, the same idea leads to a middleperson
attack. Charlie intercepts Alice’s message to Bob and replies to it; at the same
time, he initiates a key exchange with Bob, pretending to be Alice. He ends up
with a key DH(RA , RC ) which he shares with Alice, and another key DH(RB , RC )
which he shares with Bob. So long as he continues to sit in the middle of the
network and translate the messages between them, they may have a hard time
detecting that their communications are compromised. The usual solution is to
authenticate transient keys, and there are various possibilities.
In the STU-2 telephone, which is now obsolete but which you can see in the
NSA museum at Fort Meade, the two principals would read out an eight-digit
hash of the key they had generated and check that they had the same value
before starting to discuss classified matters. Something similar is implemented
in Bluetooth versions 4 and later, but is complicated by the many versions that
the protocol has evolved to support devices with different user interfaces. The
protocol has suffered from multiple attacks, most recently the Key Negotiation
of Bluetooth (KNOB) attack, which allows a middleperson to force one-byte
keys that are easily brute forced; all devices produced before 2018 are vulnerable [125]. The standard allows for key lengths between one and sixteen bytes;
as the keylength negotiation is performed in the clear, an attacker can force the
length to the lower limit. All standards-compliant chips are vulnerable; this
may be yet more of the toxic waste from the Crypto Wars, which I discuss in
section 26.2.7. Earlier versions of Bluetooth are more like the ‘just-works’ mode
of the HomePlug protocol described in section 4.7.1 in that they were principally designed to help you set up a pairing key with the right device in a benign
⁶ The likely discrete log algorithm, NFS, involves a large computation for each prime number followed by a smaller computation for each discrete log modulo that prime number. The open record is 795 bits, which took 3,100 core-years in 2019 [302], using a version of NFS that's three times more efficient than ten years ago. There have been persistent rumours of a further NSA improvement and in any case the agency can throw a lot more horsepower at an important calculation.
environment, rather than defending against a sophisticated attack in a hostile
one. The more modern ones appear to be better, but it’s really just theatre.
So many things go wrong: protocols that will generate or accept very weak
keys and thus give only the appearance of protection; programs that leak keys
via side channels such as the length of time they take to decrypt; and software
vulnerabilities leading to stack overflows and other hacks. If you’re implementing public-key cryptography you need to consult up-to-date standards,
use properly accredited toolkits, and get someone knowledgeable to evaluate what you’ve done. And please don’t write the actual crypto code on your
own – doing it properly requires a lot of different skills, from computational
number theory to side-channel analysis and formal methods. Even using good
crypto libraries gives you plenty of opportunities to shoot your foot off.
5.7.2.3 ElGamal digital signature and DSA
Suppose that the base p and the generator g are public values chosen in some
suitable way, and that each user who wishes to sign messages has a private
signing key X with a public signature verification key Y = g^X. An ElGamal signature scheme works as follows. Choose a message key k at random, and form r = g^k (mod p). Now form the signature s using a linear equation in k, r, the message M and the private key X. There are a number of equations that will do; the one that happens to be used in ElGamal signatures is
rX + sk = M
So s is computed as s = (M − rX)/k; this is done modulo φ(p). When both sides are passed through our one-way homomorphism f(x) = g^x mod p we get:
g^(rX) g^(sk) ≡ g^M
or
Y^r r^s ≡ g^M
An ElGamal signature on the message M consists of the values r and s, and
the recipient can verify it using the above equation.
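A toy implementation that follows these equations directly is shown below; it signs a small number rather than a hashed message, and the parameters are illustrative only.

import math, random

p, g = 2 ** 127 - 1, 3                  # illustrative parameters
X = random.randrange(2, p - 1)          # private signing key
Y = pow(g, X, p)                        # public verification key

def sign(M: int):
    while True:                         # the message key k must be invertible mod p-1
        k = random.randrange(2, p - 1)
        if math.gcd(k, p - 1) == 1:
            break
    r = pow(g, k, p)
    s = (M - r * X) * pow(k, -1, p - 1) % (p - 1)
    return r, s

def verify(M: int, r: int, s: int) -> bool:
    return (pow(Y, r, p) * pow(r, s, p)) % p == pow(g, M, p)

r, s = sign(42)
assert verify(42, r, s)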
A few more details need to be fixed up to get a functional digital signature
scheme. As before, bad choices of p and g can weaken the algorithm. We will
also want to hash the message M using a hash function so that we can sign
messages of arbitrary length, and so that an opponent can’t use the algorithm’s
algebraic structure to forge signatures on messages that were never signed.
Having attended to these details and applied one or two optimisations, we get
the Digital Signature Algorithm (DSA) which is a US standard and widely used
in government applications.
DSA assumes a prime p of typically 2048 bits⁷, a prime q of 256 bits dividing (p − 1), an element g of order q in the integers modulo p, a secret signing key x and a public verification key y = g^x. The signature on a message M, Sig_x(M), is (r, s) where
r ≡ (g^k (mod p)) (mod q)
s ≡ (h(M) − xr)/k (mod q)
The hash function used by default is SHA256⁸.
DSA is the classic example of a randomised digital signature scheme without
message recovery. The most commonly-used version nowadays is ECDSA, a
variant based on elliptic curves, which we’ll discuss now – this is for example
the standard for cryptocurrency and increasingly also for certificates in bank
smartcards.
5.7.3 Elliptic curve cryptography
Discrete logarithms and their analogues exist in many other mathematical
structures. Elliptic curve cryptography uses discrete logarithms on an elliptic
curve – a curve given by an equation like y² = x³ + ax + b. These curves have
the property that you can define an addition operation on them and the
resulting Mordell group can be used for cryptography. The algebra gets a bit
complex and this book isn't the place to set it out⁹. However, elliptic curve
cryptosystems are interesting for at least two reasons.
First is performance; they give versions of the familiar primitives such
as Diffie-Hellman key exchange and the Digital Signature Algorithm that
use less computation, and also have shorter variables; both are welcome in
constrained environments. Elliptic curve cryptography is used in applications
from the latest versions of EMV payment cards to Bitcoin.
Second, some elliptic curves have a bilinear pairing which Dan Boneh and
Matt Franklin used to construct cryptosystems where your public key is
your name [287]. Recall that in RSA and Diffie-Hellman, the user chose his
private key and then computed a corresponding public key. In a so-called
identity-based cryptosystem, you choose your identity then go to a central
authority that issues you with a private key corresponding to that identity.
There is a global public key, with which anyone can encrypt a message
⁷ In the 1990s p could be in the range 512–1024 bits and q 160 bits; this was changed to 1023–1024 bits in 2001 [1404] and 1024–3072 bits in 2009, with q in the range 160–256 bits [1405].
⁸ The default sizes of p are chosen to be 2048 bits and q 256 bits in order to equalise the work factors of the two best known cryptanalytic attacks, namely the number field sieve, whose running speed depends on the size of p, and Pollard's rho, which depends on the size of q. Larger sizes can be chosen if you're anxious about Moore's law or about progress in algorithms.
⁹ See Katz and Lindell [1025] for an introduction.
to your identity; you can decrypt this using your private key. Earlier, Adi
Shamir had discovered identity-based signature schemes that allow you to
sign messages using a private key so that anyone can verify the signature
against your name [1707]. In both cases, your private key is computed
by the central authority using a system-wide private key known only to
itself. Identity-based primitives have been used in a few specialist systems:
in Zcash for the payment privacy mechanisms, and in a UK government
key-management protocol called Mikey-Sakke. Computing people’s private
keys from their email addresses or other identifiers may seem a neat hack,
but it can be expensive when government departments are reorganised or
renamed [116]. Most organisations and applications use ordinary public-key
systems with certification of public keys, which I’ll discuss next.
5.7.4 Certification authorities
Now that we can do public-key encryption and digital signature, we need
some mechanism to bind users to keys. The approach proposed by Diffie and
Hellman when they invented digital signatures was to have a directory of the
public keys of a system’s authorised users, like a phone book. A more common
solution, due to Loren Kohnfelder, is for a certification authority (CA) to sign
the users’ public encryption keys or their signature verification keys giving
certificates that contain a user’s name, one or more of their public keys, and
attributes such as authorisations. The CA might be run by the local system
administrator; but it is most commonly a third party service such as Verisign
whose business is to sign public keys after doing some due diligence about
whether they are controlled by the principals named in them.
A certificate might be described symbolically as
CA = Sig_KS(TS, L, A, KA, VA)    (5.1)
where TS is the certificate’s starting date and time, L is the length of time for
which it is valid, A is the user’s name, KA is her public encryption key, and VA
is her public signature verification key. In this way, only the administrator’s
public signature verification key needs to be communicated to all principals in
a trustworthy manner.
Certification is hard, for a whole lot of reasons. Naming is hard, for starters;
we discuss this in Chapter 7 on Distributed Systems. But often names aren’t
really what the protocol has to establish, as in the real world it’s often
about authorisation rather than authentication. Government systems are often
about establishing not just a user’s name or role but their security clearance
level. In banking systems, it’s about your balance, your available credit and
your authority to spend it. In commercial systems, it’s often about linking
remote users to role-based access control. In user-facing systems, there is
a tendency to dump on the customer as many of the compliance costs as
possible [524]. There are many other things that can go wrong with certification at the level of systems engineering. At the level of politics, there are
hundreds of certification authorities in a typical browser, they are all more or
less equally trusted, and many nation states can coerce at least one of them¹⁰.
The revocation of bad certificates is usually flaky, if it works at all. There will
be much more on these topics later. With these warnings, it’s time to look at
the most commonly used public key protocol, TLS.
5.7.5 TLS
I remarked above that a server could publish a public key KS and any web
browser could then send a message M containing a credit card number to it
encrypted using KS: {M}KS . This is in essence what the TLS protocol (then
known as SSL) was designed to do, at the start of e-commerce. It was developed by Paul Kocher and Taher ElGamal in 1995 to support encryption and
authentication in both directions, so that both http requests and responses can
be protected against both eavesdropping and manipulation. It’s the protocol
that’s activated when you see the padlock on your browser toolbar.
Here is a simplified description of the basic version of the protocol in TLS v1:
1. the client sends the server a client hello message that contains its
name C, a transaction serial number C#, and a random nonce NC ;
2. the server replies with a server hello message that contains its name
S, a transaction serial number S#, a random nonce NS , and a certificate CS containing its public key KS. The client now checks
the certificate CS, and if need be checks the key that signed it
in another certificate, and so on back to a root certificate issued
by a company such as Verisign and stored in the browser;
3. the client sends a key exchange message containing a pre-master-secret key,
K0 , encrypted under the server public key KS. It also sends a finished
message with a message authentication code (MAC) computed on
all the messages to date. The key for this MAC is the master-secret,
K1 . This key is computed by hashing the pre-master-secret key with
the nonces sent by the client and server: K1 = h(K0, NC, NS). From this
point onward, all the traffic is encrypted; we’ll write this as {...}KCS
in the client-server direction and {...}KSC from the server to the client.
These keys are generated in turn by hashing the nonces with K1 .
¹⁰ The few that can't, try to cheat. In 2011 Iran hacked the CA Diginotar, and in 2019 Kazakhstan forced its citizens to add a local police certificate to their browser. In both cases the browser vendors pushed back fast and hard: Diginotar failed after it was blacklisted, while the Kazakh cert was blocked even if its citizens installed it manually. This of course raises issues of sovereignty.
4. The server also sends a finished message with a MAC computed on
all the messages to date. It then finally starts sending the data.
C → S: C, C#, NC
S → C: S, S#, NS, CS
C → S: {K0}KS
C → S: {finished, MAC(K1, everything to date)}KCS
S → C: {finished, MAC(K1, everything to date)}KSC, {data}KSC
Once a client and server have established a pre-master-secret, no more
public-key operations are needed as further master secrets can be obtained by
hashing it with new nonces.
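A much-simplified sketch of this key derivation is shown below; the labels and the use of plain SHA-256 are illustrative assumptions, since the real protocol uses a dedicated PRF (and HKDF in TLS 1.3).

import hashlib, os

h = lambda *parts: hashlib.sha256(b''.join(parts)).digest()

nc, ns = os.urandom(32), os.urandom(32)   # the nonces NC and NS
k0 = os.urandom(48)                       # pre-master secret, sent under the server key

k1 = h(k0, nc, ns)                        # master secret, also the MAC key for 'finished'
kcs = h(k1, b'client-to-server', nc, ns)  # working key for client-to-server traffic
ksc = h(k1, b'server-to-client', nc, ns)  # working key for server-to-client traffic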
5.7.5.1 TLS uses
The full protocol is more complex than this, and has gone through a number
of versions. It has supported a number of different ciphersuites, initially so
that export versions of software could be limited to 40 bit keys – a condition
of export licensing that was imposed for many years by the US government.
This led to downgrade attacks where a middleperson could force the use of
weak keys. Other ciphersuites support signed Diffie-Hellman key exchanges
for transient keys, to provide forward and backward secrecy. TLS also has
options for bidirectional authentication so that if the client also has a certificate,
this can be checked by the server. In addition, the working keys KCS and KSC
can contain separate subkeys for encryption and authentication, as is needed
for legacy modes of operation such as CBC plus CBC MAC.
As well as being used to encrypt web traffic, TLS has also been available as an
authentication option in Windows from Windows 2000 onwards; you can use
it instead of Kerberos for authentication on corporate networks. I will describe
its use in more detail in the chapter on network attack and defence.
5.7.5.2 TLS security
Although early versions of SSL had a number of bugs [1977], SSL 3.0 and later
appear to be sound; the version after SSL 3.0 was renamed TLS 1.0. It was formally verified by Larry Paulson in 1998, so we know that the idealised version
of the protocol doesn’t have any bugs [1504].
However, in the more than twenty years since then, there have been over
a dozen serious attacks. Even in 1998, Daniel Bleichenbacher came up with
the first of a number of attacks based on measuring the time it takes a server
to decrypt, or the error messages it returns in response to carefully-crafted
protocol responses [265]. TLS 1.1 appeared in 2006 with protection against
exploits of CBC encryption and of padding errors; TLS 1.2 followed two years
later, upgrading the hash function to SHA256 and supporting authenticated
encryption; and meanwhile there were a number of patches dealing with
various attacks that had emerged. Many of these patches were rather inelegant
because of the difficulty of changing a widely-used protocol; it’s difficult to
change both the server and client ends at once, as any client still has to interact
with millions of servers, many running outdated software, and most websites
want to be able to deal with browsers of all ages and on all sorts of devices.
This has been dealt with by the big service firms changing their browsers to
reject obsolete ciphersuites, and to add features like strict transport security
(STS) whereby a website can instruct browsers to only interact with it using
https in future (to prevent downgrade attacks). The browser firms have also
mandated a number of other supporting measures, from shorter certificate
lifetimes to certificate transparency, which we’ll discuss in the chapter on
network attack and defence.
5.7.5.3 TLS 1.3
The most recent major upgrade to the core protocol, TLS 1.3, was approved
by the IETF in January 2019 after two years of discussion. It has dropped
backwards compatibility in order to end support for many old ciphers, and
made it mandatory to establish end-to-end forward secrecy by means of a
Diffie-Hellman key exchange at the start of each session. This has caused
controversy with the banking industry, which routinely intercepts encrypted
sessions in order to do monitoring for compliance purposes. This will no
longer be possible, so banks will have to bear the legal discomfort of using
obsolete encryption or the financial cost of redeveloping systems to monitor
compliance at endpoints instead¹¹.
5.7.6 Other public-key protocols
Dozens of other public-key protocols have found wide use, including the following, most of which we’ll discuss in detail later. Here I’ll briefly mention
code signing, PGP and QUIC.
5.7.6.1 Code signing
Code signing was introduced in the 1990s when people started downloading
software rather than getting it on diskettes. It is now used very widely to
¹¹ The COVID-19 pandemic has given some respite: Microsoft had been due to remove support for legacy versions of TLS in spring 2020 but has delayed this.
assure the provenance of software. You might think that having a public
signature-verification key in your software so that version N can verify an
update to version N + 1 would be a simple application of public-key cryptography but this is far from the case. Many platforms sign their operating-system
code, including updates, to prevent persistent malware; the mechanisms often
involve trusted hardware such as TPMs and I’ll discuss them in the next
chapter in section 6.2.5. Some platforms, such as the iPhone, will only run
signed code; this not only assures the provenance of software but enables
platform owners to monetise apps, as I will discuss in section 22.4.2; games
consoles are similar. As some users go to great lengths to jailbreak their
devices, such platforms typically have trustworthy hardware to store the
verification keys. Where that isn’t available, verification may be done using
code that is obfuscated to make it harder for malware (or customers) to tamper
with it; this is a constant arms race, which I discuss in section 24.3.3. As for the
signing key, the developer may keep it in a hardware security module, which
is expensive and breaks in subtle ways discussed in section 20.5; there may be
a chain of trust going back to a commercial CA, but then have to worry about
legal coercion by government agencies, which I discuss in section 26.2.7; you
might even implement your own CA for peace of mind. In short, code signing
isn’t quite as easy as it looks, particularly when the user is the enemy.
5.7.6.2 PGP/GPG
During the ‘Crypto Wars’ in the 1990s, cyber-activists fought governments
for the right to encrypt email, while governments pushed for laws restricting
encryption; I’ll discuss the history and politics in section 26.2.7. The crypto
activist Phil Zimmermann wrote an open-source encryption product Pretty
Good Privacy (PGP) and circumvented U.S. export controls by publishing the
source code in a paper book, which could be posted, scanned and compiled.
Along with later compatible products such as GPG, it has become fairly widely
used among geeks. For example, sysadmins, Computer Emergency Response
Teams (CERTs) and malware researchers use it to share information about
attacks and vulnerabilities. It has also been built into customised phones sold
to criminal gangs to support messaging; I’ll discuss this later in section 25.4.1.
PGP has a number of features but, in its most basic form, each user generates
private/public keypairs manually and shares public keys with contacts. There
are command-line options to sign a message with your signature key and/or
encrypt it using the public key of each of the intended recipients. Manual key
management avoids the need for a CA that can be cracked or coerced. Many
things were learned from the deployment and use of PGP during the 1990s. As
I described in section 3.2.1, Alma Whitten and Doug Tygar wrote the seminal
paper on security usability by assessing whether motivated but cryptologically
unsophisticated users could understand it well enough to drive the program
safely. Only four of twelve subjects were able to correctly send encrypted email
to the other subjects, and every subject made at least one significant error.
5.7.6.3 QUIC
QUIC is a new UDP-based protocol designed by Google and promoted as
an alternative to TLS that allows quicker session establishment and cutting
latency in the ad auctions that happen as pages load; sessions can persist as
people move between access points. This is achieved by a cookie that holds
the client’s last IP address, encrypted by the server. It appeared in Chrome
in 2013 and now has about 7% of Internet traffic; it’s acquired a vigorous
standardisation community. Google claims it reduces search latency 8% and
YouTube buffer time 18%. Independent evaluation suggests that the benefit is
mostly on the desktop rather than mobile [1009], and there’s a privacy concern
as the server can use an individual public key for each client, and use this for
tracking. As a general principle, one should be wary of corporate attempts
to replace open standards with proprietary ones, whether IBM’s EBCDIC
coding standard of the 1950s and SNA in the 1970s, or Microsoft’s attempts
to ‘embrace and extend’ both mail standards and security protocols since the
1990s, or Facebook’s promotion of Internet access in Africa that kept users
largely within its walled garden. I’ll discuss the monopolistic tendencies of
our industry at greater length in Chapter 8.
5.7.7 Special-purpose primitives
Researchers have invented a large number of public-key and signature primitives with special properties. Two that have so far appeared in real products
are threshold cryptography and blind signatures.
Threshold crypto is a mechanism whereby a signing key, or a decryption key,
can be split up among n principals so that any k out of n can sign a message
(or decrypt). For k = n the construction is easy. With RSA, for example, you can
split up the private key d as d = d_1 + d_2 + … + d_n. For k < n it's slightly more
complex (but not much – you use the Lagrange interpolation formula) [554].
Threshold signatures were first used in systems where a number of servers
process transactions independently and vote independently on the outcome;
they have more recently been used to implement business rules on cryptocurrency wallets such as ‘a payment must be authorised by any two of the seven
company directors’.
Blind signatures are a way of making a signature on a message without
knowing what the message is. For example, if we are using RSA, I can take a
random number R, form R^e·M (mod n), and give it to the signer who computes
(R^e·M)^d = R·M^d (mod n). When he gives this back to me, I can divide out R to get the signature M^d. Now you might ask why on earth someone would want to
sign a document without knowing its contents, but there are some applications.
The first was in digital cash; you might want to be able to issue anonymous
payment tokens to customers, and the earliest idea, due to David Chaum, was a
way to sign ‘digital coins’ without knowing their serial numbers [413]. A bank
might agree to honour for $10 any string M with a unique serial number and a
specified form of redundancy, bearing a signature that verified as correct using
the public key (e, n). The blind signature protocol ensures a customer can get a
bank to sign a coin without the banker knowing its serial number, and it was
used in prototype road toll systems. The effect is that the digital cash can be
anonymous for the spender. The main problem with digital cash was to detect
people who spend the same coin twice, and this was eventually fixed using
blockchains or other ledger mechanisms, as I discuss in section 20.7. Digital
cash failed to take off because neither banks nor governments really want payments to be anonymous: anti-money-laundering regulations since 9/11 restrict
anonymous payment services to small amounts, while both banks and bitcoin
miners like to collect transaction fees.
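The arithmetic of the blinding trick is easy to check with toy numbers; the parameters below are illustrative only.

import math, random

p, q, e = 61, 53, 17                     # the bank's toy RSA key
N, phi = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi)                      # the bank's private signing exponent

M = 42                                   # the 'coin' the customer wants signed
while True:                              # blinding factor, invertible mod N
    R = random.randrange(2, N)
    if math.gcd(R, N) == 1:
        break

blinded = pow(R, e, N) * M % N           # customer sends R^e * M to the bank
blind_sig = pow(blinded, d, N)           # bank returns (R^e * M)^d = R * M^d
sig = blind_sig * pow(R, -1, N) % N      # customer divides out R
assert sig == pow(M, d, N)               # a valid signature on M the bank never saw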
Anonymous digital credentials are now used in attestation: the TPM chip on
your PC motherboard might prove something about the software running on
your machine without identifying you. Unfortunately, this led to designs for
attestation in SGX (and its AMD equivalent) which mean that a single compromised device breaks the whole ecosystem. Anonymous signatures are also
found in prototype systems for conducting electronic elections, to which I will
return in section 25.5.
5.7.8 How strong are asymmetric cryptographic primitives?
In order to provide the same level of protection as a symmetric block cipher,
asymmetric cryptographic primitives generally require at least twice the block
length. Elliptic curve systems appear to achieve this bound; a 256-bit elliptic
scheme could be about as hard to break as a 128-bit block cipher with a 128-bit
key; and the only public-key encryption schemes used in the NSA’s Suite B of
military algorithms are 384-bit elliptic curve systems. The traditional schemes,
based on factoring and discrete log, now require 3072-bit keys to protect material at Top Secret, as there are shortcut attack algorithms such as the number
field sieve. As a result, elliptic curve cryptosystems are faster.
When I wrote the first edition of this book in 2000, the number field sieve
had been used to attack keys up to 512 bits, a task comparable in difficulty to
keysearch on 56-bit DES keys; by the time I rewrote this chapter for the second
edition in 2007, 64-bit symmetric keys had been brute-forced, and the 663-bit
challenge number RSA-200 had been factored. By the third edition in 2019,
bitcoin miners are finding 68-bit hash collisions every ten minutes, RSA-768
has been factored and Ed Snowden has as good as told us that the NSA can do
discrete logs for a 1024-bit prime modulus.
There has been much research into quantum computers – devices that perform
a large number of computations simultaneously using superposed quantum
states. Peter Shor has shown that if a sufficiently large quantum computer
could be built, then both factoring and discrete logarithm computations will
become easy [1728]. So far only very small quantum devices have been built;
although there are occasional claims of ‘quantum supremacy’ – of a quantum computer performing a task sufficiently faster than a conventional one
to convince us that quantum superposition or entanglement is doing any real
work – they seem to lead nowhere. I am sceptical (as are many physicists) about
whether the technology will ever threaten real systems. I am even more sceptical about the value of quantum cryptography; it may be able to re-key a line
encryption device that uses AES for bulk encryption on a single uninterrupted
fibre run, but we already know how to do that.
What’s more, I find the security proofs offered for entanglement-based quantum cryptography to be unconvincing. Theoretical physics has been stalled
since the early 1970s when Gerard ’t Hooft completed the Standard Model
by proving the renormalisability of Yang-Mills. Since then, a whole series of
ideas have come and gone, such as string theory [2035]. Quantum information
theory is the latest enthusiasm. Its proponents talk up the mystery of the
Bell tests, which are supposed to demonstrate that physics cannot be simultaneously local and causal. But alternative interpretations such as ’t Hooft’s
cellular automaton model [918] and Grisha Volovik’s superfluid model [1971]
suggest that the Bell tests merely demonstrate the existence of long-range
order in the quantum vacuum, like the order parameter of a superfluid. Since
2005, we’ve had lab experiments involving bouncing droplets on a vibrating
fluid bath that demonstrate interesting analogues of quantum-mechanical
properties relevant to the Bell tests [1560]. This book is not the place to discuss
the implications in more detail; for that, see [312]. There is a whole community
of physicists working on emergent quantum mechanics – the idea that to make
progress beyond the Standard Model, and to reconcile the apparent conflict
between quantum mechanics and general relativity, we may need to look at
things differently. Meantime, if anyone claims their system is secure ‘because
quantum mechanics’ then scepticism may be in order.
I think it more likely that a major challenge to public-key cryptography could
come in the form of a better algorithm for computing discrete logarithms on
elliptic curves. These curves have a lot of structure; they are studied intensively by some of the world’s smartest pure mathematicians; better discrete-log
algorithms for curves of small characteristic were discovered in 2013 [169]; and
the NSA is apparently moving away from using elliptic-curve crypto.
If quantum computers ever work, we have other ‘post-quantum’ algorithms
ready to go, for which quantum computers give no obvious advantage.
In 2020, NIST began the third round of public review of submissions for
the Post-Quantum Cryptography Standardization Process. The 65 initial
submissions have been cut to 15 through two rounds of review¹². One or more
algorithms will now be chosen and standardised, so ciphersuites using them
could be dropped into protocols such as TLS as upgrades. Many protocols in
use could even be redesigned to use variants on Kerberos. If elliptic logarithms
become easy, we have these resources and can also fall back to discrete logs in
prime fields, or to RSA. But if elliptic logs become easy, bitcoins will become
trivial to forge, and the cryptocurrency ecosystem would probably collapse,
putting an end to the immensely wasteful mining operations I describe in
section 20.7. So mathematicians who care about the future of the planet might
do worse than to study the elliptic logarithm problem.
5.7.9 What else goes wrong
Very few attacks on systems nowadays involve cryptanalysis in the sense of a
mathematical attack on the encryption algorithm or key. There have indeed
been attacks on systems designed in the 20th century, mostly involving keys
that were kept too short by export-control rules, clueless designs or both.
I already discussed in section 4.3.1 how weak crypto has facilitated a wave of
car theft, as all the devices used for remote key entry were defeated one after
another in 2005–15. In later chapters, I give examples of how the crypto wars
and their export control rules resulted in attacks on door locks (section 13.2.5),
mobile phones (section 22.3.1) and copyright enforcement (section 24.2.5).
Most attacks nowadays exploit the implementation. In chapter 2, I mentioned the scandal of NIST standardising a complicated random number
generator based on elliptic curves that turned out to contain an NSA backdoor; see section 2.2.1.5. Poor random number generators have led to many
other failures: RSA keys with common factors [1142], predictable seeds for
discrete logs [1679], etc. These vulnerabilities have continued; thanks to the
Internet of Things, the proportion of RSA certs one can find out there on the
Internet that share a common factor with other RSA keys has actually risen
between 2012 and 2020; 1 in 172 IoT certs are trivially vulnerable [1048].
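Exploiting such keys takes one line of arithmetic: a greatest-common-divisor computation on two moduli that share a prime factors both. The primes below are toy values for illustration.

from math import gcd

p, q1, q2 = 104729, 1299709, 15485863   # toy primes; p is the shared factor
n1, n2 = p * q1, p * q2                 # two 'independent' public moduli

shared = gcd(n1, n2)                    # one gcd call recovers the common prime
assert shared == p
assert (n1 // shared, n2 // shared) == (q1, q2)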
Many of the practical attacks on cryptographic implementations that have
forced significant changes over the past 20 years have exploited side channels
such as timing and power analysis; I devote Chapter 19 to these.
¹² One of them, the McEliece cryptosystem, has been around since 1978; we've had digital signatures based on hash functions for about as long, and some of us used them in the 1990s to avoid paying patent royalties on RSA.
In Chapter 20, I’ll discuss a number of systems that use public-key mechanisms in intricate ways to get interesting emergent properties, including the
Signal messaging protocol, the TOR anonymity system, and cryptocurrencies.
I’ll also look at the crypto aspects of SGX enclaves. These also have interesting
failure modes, some but not all of them relating to side channels.
In Chapter 21, I’ll discuss protocols used in network infrastructure such as
DKIM, DNSSec versus DNS over HTTP, and SSH.
5.8 Summary
Many ciphers fail because they’re used badly, so the security engineer needs a
clear idea of what different types of cipher do. This can be tackled at different
levels; one is at the level of crypto theory, where we can talk about the
random oracle model, the concrete model and the semantic security model,
and hopefully avoid using weak modes of operation and other constructions.
The next level is that of the design of individual ciphers, such as AES, or
the number-theoretic mechanisms that underlie public-key cryptosystems
and digital signature mechanisms. These also have their own specialised
fields of mathematics, namely block cipher cryptanalysis and computational
number theory. The next level involves implementation badness, which is
much more intractable and messy. This involves dealing with timing, error
handling, power consumption and all sorts of other grubby details, and is
where modern cryptosystems tend to break in practice.
Peering under the hood of real systems, we’ve discussed how block ciphers
for symmetric key applications can be constructed by the careful combination
of substitutions and permutations; for asymmetric applications such as public
key encryption and digital signature one uses number theory. In both cases,
there is quite a large body of mathematics. Other kinds of ciphers – stream
ciphers and hash functions – can be constructed from block ciphers by using
them in suitable modes of operation. These have different error propagation,
pattern concealment and integrity protection properties. A lot of systems fail
because popular crypto libraries encourage programmers to use inappropriate
modes of operation by exposing unsafe defaults. Never use ECB mode unless
you really understand what you’re doing.
There are many other things that can go wrong, from side channel attacks to
poor random number generators. In particular, it is surprisingly hard to build
systems that are robust even when components fail (or are encouraged to) and
where the cryptographic mechanisms are well integrated with other measures
such as access control and physical security. I’ll return to this repeatedly in later
chapters.
The moral is: Don’t roll your own! Don’t design your own protocols, or your
own ciphers; and don’t write your own crypto code unless you absolutely have
to. If you do, then you not only need to read this book (and then read it again,
carefully); you need to read up the relevant specialist material, speak to experts,
and have capable motivated people try to break it. At the very least, you need to
get your work peer-reviewed. Designing crypto is a bit like juggling chainsaws;
it’s just too easy to make fatal errors.
Research problems
There are many active threads in cryptography research. Many of them are
where crypto meets a particular branch of mathematics (number theory,
algebraic geometry, complexity theory, combinatorics, graph theory, and
information theory). The empirical end of the business is concerned with
designing primitives for encryption, signature and composite operations, and
which perform reasonably well on available platforms. The two meet in the
study of subjects ranging from cryptanalysis, to the search for primitives that
combine provable security properties with decent performance.
The best way to get a flavor of what’s going on at the theoretical end of
things is to read the last few years’ proceedings of research conferences such
as Crypto, Eurocrypt and Asiacrypt; work on cipher design appears at Fast
Software Encryption; attacks on implementations often appear at CHES; while
attacks on how crypto gets used in systems can be found in the systems security
conferences such as IEEE Security and Privacy, CCS and Usenix.
Further reading
The classic papers by Whit Diffie and Martin Hellman [556] and by Ron Rivest,
Adi Shamir and Len Adleman [1610] are the closest to required reading in this
subject. Bruce Schneier’s Applied Cryptography [1670] covers a lot of ground at
a level a non-mathematician can understand, and got crypto code out there
in the 1990s despite US export control laws, but is now slightly dated. Alfred
Menezes, Paul van Oorschot and Scott Vanstone's Handbook of Applied Cryptography [1291] is one reference book on the mathematical detail. Katz and Lindell
is the book we get our students to read for the math. It gives an introduction to
the standard crypto theory plus the number theory you need for public-key
crypto (including elliptic curves and index calculus) but is also dated: they
don’t mention GCM, for example [1025].
There are many more specialist books. The bible on differential cryptanalysis
is by its inventors Eli Biham and Adi Shamir [246], while a good short tutorial
on linear and differential cryptanalysis was written by Howard Heys [897].
Doug Stinson’s textbook has another detailed explanation of linear cryptanalysis [1832]; and the modern theory of block ciphers can be traced through the
papers in the Fast Software Encryption conference series. The original book on
modes of operation is by Carl Meyer and Steve Matyas [1303]. Neal Koblitz
has a good basic introduction to the mathematics behind public key cryptography [1062]; and the number field sieve is described by Arjen and Henrik
Lenstra [1143]. For the practical attacks on TLS over the past twenty years, see
the survey paper by Christopher Meyer and Joerg Schwenk [1304] as well as
the chapter on Side Channels later in this book.
If you want to work through the mathematical detail of theoretical cryptology, there's a recent graduate textbook by Dan Boneh and Victor Shoup [288].
A less thorough but more readable introduction to randomness and algorithms
is in [836]. Research at the theoretical end of cryptology is found at the FOCS,
STOC, Crypto, Eurocrypt and Asiacrypt conferences.
The history of cryptology is fascinating, and so many old problems keep on
recurring that anyone thinking of working with crypto should study it. The
standard work is Kahn [1003]; there are also compilations of historical articles
from Cryptologia [529–531] as well as several books on the history of cryptology
in World War II by Kahn, Marks, Welchman and others [440, 1004, 1226, 2011].
The NSA Museum at Fort George Meade, Md., is also worth a visit, but perhaps
the best is the museum at Bletchley Park in England.
Finally, no chapter that introduces public key encryption would be complete
without a mention that, under the name of ‘non-secret encryption,’ it was first
discovered by James Ellis in about 1969. However, as Ellis worked for GCHQ,
his work remained classified. The RSA algorithm was then invented by Clifford
Cocks, and also kept secret. This story is told in [626]. One effect of the secrecy
was that their work was not used: although it was motivated by the expense
of Army key distribution, Britain’s Ministry of Defence did not start building
electronic key distribution systems for its main networks until 1992. And the
classified community did not pre-invent digital signatures; they remain the
achievement of Whit Diffie and Martin Hellman.
CHAPTER 6

Access Control
Anything your computer can do for you it can potentially do for someone else.
– ALAN COX
Microsoft could have incorporated effective security measures as standard, but good sense
prevailed. Security systems have a nasty habit of backfiring and there is no doubt they would
cause enormous problems.
– RICK MAYBURY
6.1 Introduction
I first learned to program on an IBM mainframe whose input was punched
cards and whose output was a printer. You queued up with a deck of cards, ran
the job, and went away with printout. All security was physical. Then along
came machines that would run more than one program at once, and with them the protection problem of preventing one program from interfering with another. You
don’t want a virus to steal the passwords from your browser, or patch a banking application so as to steal your money. And many reliability problems stem
from applications misunderstanding each other, or fighting with each other.
But it’s tricky to separate applications when the customer wants them to share
data. It would make phishing much harder if your email client and browser
ran on separate machines, so you were unable to just click on URLs in emails,
but that would make life too hard.
From the 1970s, access control became the centre of gravity of computer security. It’s where security engineering meets computer science. Its function is
to control which principals (persons, processes, machines, . . .) have access to
which resources in the system – which files they can read, which programs they
can execute, how they share data with other principals, and so on. It’s become
horrendously complex. If you start out by leafing through the 7000-plus pages
of Arm’s architecture reference manual or the equally complex arrangements
for Windows, your first reaction might be ‘I wish I’d studied music instead!’ In
this chapter I try to help you make sense of it all.
Access control works at a number of different levels, including at least:
1. Access controls at the application level may express a very rich,
domain-specific security policy. The call centre staff in a bank are typically not allowed to see your account details until you have answered a
couple of security questions; this not only stops outsiders impersonating
you, but also stops the bank staff looking up the accounts of celebrities,
or their neighbours. Some transactions might also require approval from
a supervisor. And that’s nothing compared with the complexity of the
access controls on a modern social networking site, which will have a
thicket of rules about who can see, copy, and search what data from
whom, and privacy options that users can set to modify these rules.
2. The applications may be written on top of middleware, such as a
web browser, a bank’s bookkeeping system or a social network’s
database management system. These enforce a number of protection
properties. For example, bookkeeping systems ensure that a transaction that debits one account must credit another, with the debits
and credits balancing so that money cannot be created or destroyed;
they must also allow the system’s state to be reconstructed later.
3. As the operating system constructs resources such as files and communications ports from lower level components, it has to provide ways
to control access to them. Your Android phone treats apps written by
different companies as different users and protects their data from
each other. The same happens when a shared server separates the
VMs, containers or other resources belonging to different users.
4. Finally, the operating system relies on the processor and its associated memory-management hardware, which control which
memory addresses a given process or thread can access.
As we work up from the hardware through the operating system and
middleware to the application layer, the controls become progressively more
complex and less reliable. And we find the same access-control functions
being implemented at multiple layers. For example, the separation between
different phone apps that is provided by Android is mirrored in your browser
which separates web page material according to the domain name it came
from (though this separation is often less thorough). And the access controls
built at the application layer or the middleware layer may largely duplicate
access controls in the underlying operating system or hardware. It can get
very messy, and to make sense of it we need to understand the underlying
principles, the common architectures, and how they have evolved.
I will start off by discussing operating-system protection mechanisms that
support the isolation of multiple processes. These came first historically – being
invented along with the first time-sharing systems in the 1960s – and they
remain the foundation on which many higher-layer mechanisms are built, as
well as inspiring similar mechanisms at higher layers. They are often described
as discretionary access control (DAC) mechanisms, which leave protection to the
machine operator, or mandatory access control (MAC) mechanisms which are
typically under the control of the vendor and protect the operating system itself
from being modified by malware. I’ll give an introduction to software attacks
and techniques for defending against them – MAC, ASLR, sandboxing, virtualisation and what can be done with hardware. Modern hardware not only
provides CPU support for virtualisation and capabilities, but also hardware
support such as TPM chips for trusted boot to stop malware being persistent. These help us tackle the toxic legacy of the old single-user PC operating
systems such as DOS and Win95/98 which let any process modify any data,
and constrain the many applications that won’t run unless you trick them into
thinking that they are running with administrator privileges.
6.2 Operating system access controls
The access controls provided with an operating system typically authenticate
principals using a mechanism such as passwords or fingerprints in the case of
phones, or passwords or security protocols in the case of servers, then authorise
access to files, communications ports and other system resources.
Access controls can often be modelled as a matrix of access permissions, with
columns for files and rows for users. We’ll write r for permission to read, w for
permission to write, x for permission to execute a program, and - for no access
at all, as shown in Figure 6.1.
In this simplified example, Sam is the system administrator and has universal
access (except to the audit trail, which even he should only be able to read).
Alice, the manager, needs to execute the operating system and application, but
only through the approved interfaces – she mustn't have the ability to tamper
with them. She also needs to read and write the data. Bob, the auditor, can read
everything.

        Operating   Accounts   Accounting   Audit
        System      Program    Data         Trail
Sam     rwx         rwx        rw           r
Alice   x           x          rw           –
Bob     rx          r          r            r

Figure 6.1: Naive access control matrix
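As a rough sketch (mine, not the book's), the matrix in Figure 6.1 can be modelled directly in a few lines of Python; the resource names are abbreviations of the column headings, and the check function stands in for a reference monitor.

# Toy model of the access control matrix in Figure 6.1: rows are principals,
# columns are resources, and each cell holds a string of permission flags.
ACM = {
    'Sam':   {'OS': 'rwx', 'AcctProg': 'rwx', 'AcctData': 'rw', 'Audit': 'r'},
    'Alice': {'OS': 'x',   'AcctProg': 'x',   'AcctData': 'rw', 'Audit': ''},
    'Bob':   {'OS': 'rx',  'AcctProg': 'r',   'AcctData': 'r',  'Audit': 'r'},
}

def allowed(principal, resource, right):
    """Return True if the matrix grants `right` ('r', 'w' or 'x')."""
    return right in ACM.get(principal, {}).get(resource, '')

assert allowed('Bob', 'AcctData', 'r')      # the auditor may read the accounts
assert not allowed('Alice', 'Audit', 'r')   # the manager may not read the audit trail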
This is often enough, but in the specific case of a bookkeeping system it’s not
quite what we need. We want to ensure that transactions are well-formed – that
each debit is balanced by credits somewhere else – so we don’t want Alice to
have uninhibited write access to the account file. We would also rather that Sam
didn’t have this access. So we would prefer that write access to the accounting
data file be possible only via the accounting program. The access permissions
might now look as shown in Figure 6.2:
User              Operating   Accounts   Accounting   Audit
                  System      Program    Data         Trail
Sam               rwx         rwx        r            r
Alice             rx          x          –            –
Accounts program  rx          rx         rw           w
Bob               rx          r          r            r

Figure 6.2: Access control matrix for bookkeeping
Another way of expressing a policy of this type would be with access triples
of (user, program, file). In the general case, our concern isn’t with a program so
much as a protection domain which is a set of processes or threads that share
access to the same resources.
Access control matrices (whether in two or three dimensions) can be used to
implement protection mechanisms as well as just model them. But they don’t
scale well: a bank with 50,000 staff and 300 applications would have a matrix of
15,000,000 entries, which might not only impose a performance overhead but
also be vulnerable to administrators’ mistakes. We will need a better way of
storing and managing this information, and the two main options are to compress the users and to compress the rights. With the first, we can use groups or
roles to manage large sets of users simultaneously, while with the second we
may store the access control matrix either by columns (access control lists) or
rows (capabilities, also known as ‘tickets’ to protocol engineers and ‘permissions’ on mobile phones) [1642, 2024].
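To see what 'by columns' and 'by rows' means in practice, here is the same toy matrix sliced both ways; slicing by column gives the ACL that would be stored with each resource, and slicing by row gives each user's capability list. This is an illustration only, not code from the book.

# The toy matrix again, viewed as ACLs (columns) and capabilities (rows).
ACM = {
    'Sam':   {'OS': 'rwx', 'AcctProg': 'rwx', 'AcctData': 'rw', 'Audit': 'r'},
    'Alice': {'OS': 'x',   'AcctProg': 'x',   'AcctData': 'rw', 'Audit': ''},
    'Bob':   {'OS': 'rx',  'AcctProg': 'r',   'AcctData': 'r',  'Audit': 'r'},
}

def acl_of(resource):
    """Column view: who may do what to this resource."""
    return {user: row[resource] for user, row in ACM.items() if row.get(resource)}

def capabilities_of(user):
    """Row view: what this user may do to each resource."""
    return {res: rights for res, rights in ACM.get(user, {}).items() if rights}

print(acl_of('AcctData'))       # {'Sam': 'rw', 'Alice': 'rw', 'Bob': 'r'}
print(capabilities_of('Bob'))   # {'OS': 'rx', 'AcctProg': 'r', 'AcctData': 'r', 'Audit': 'r'}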
6.2.1 Groups and roles
When we look at large organisations, we usually find that most staff fit into
one of a small number of categories. A bank might have 40 or 50: teller, call
centre operator, loan officer and so on. Only a few dozen people (security
manager, chief foreign exchange dealer, … ) will need personally customised
access rights.
So we need to design a set of groups, or functional roles, to which staff can
be assigned. Some vendors (such as Microsoft) use the words group and role
almost interchangeably, but a more careful definition is that a group is a list
of principals, while a role is a fixed set of access permissions that one or more
principals may assume for a period of time. The classic example of a role is the
officer of the watch on a ship. There is exactly one watchkeeper at any one time,
and there is a formal procedure whereby one officer relieves another when the
watch changes. In most government and business applications, it’s the role that
matters rather than the individual.
Groups and roles can be combined. The officers of the watch of all ships currently
at sea is a group of roles. In banking, the manager of the Cambridge branch
might have their privileges expressed by membership of the group manager
and assumption of the role acting manager of Cambridge branch. The group manager might express a rank in the organisation (and perhaps even a salary band)
while the role acting manager might include an assistant accountant standing in
while the manager, deputy manager, and branch accountant are all off sick.
Whether we need to be careful about this distinction is a matter for the application. In a warship, even an ordinary seaman may stand watch if everyone
more senior has been killed. In a bank, we might have a policy that “transfers
over $10m must be approved by two staff, one with rank at least manager and
one with rank at least assistant accountant”. If the branch manager is sick, then
the assistant accountant acting as manager might have to get the regional head
office to provide the second signature on a large transfer.
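As a sketch of how such a dual-control rule might be checked in application code – the threshold and ranks come from the example above, everything else is invented for illustration:

# Toy check for "transfers over $10m must be approved by two staff, one with
# rank at least manager and one with rank at least assistant accountant".
RANK = {'teller': 1, 'assistant accountant': 2, 'manager': 3}

def transfer_approved(amount_usd, approvers):
    """approvers is a list of (name, rank) pairs; two distinct people are needed."""
    if amount_usd <= 10_000_000:
        return len(approvers) >= 1
    names = {name for name, _ in approvers}
    ranks = sorted((RANK[rank] for _, rank in approvers), reverse=True)
    return (len(names) >= 2 and len(ranks) >= 2
            and ranks[0] >= RANK['manager']
            and ranks[1] >= RANK['assistant accountant'])

assert transfer_approved(15_000_000, [('Alice', 'manager'), ('Carol', 'assistant accountant')])
assert not transfer_approved(15_000_000, [('Carol', 'assistant accountant'), ('Dave', 'teller')])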
6.2.2 Access control lists
The traditional way to simplify the management of access rights is to store the
access control matrix a column at a time, along with the resource to which the
column refers. This is called an access control list or ACL (pronounced ‘ackle’).
In the first of our above examples, the ACL for file 3 (the account file) might
look as shown here in Figure 6.3.
User     Accounting
         Data
Sam      rw
Alice    rw
Bob      r

Figure 6.3: Access control list (ACL)
ACLs have a number of advantages and disadvantages as a means of managing security state. They are a natural choice in environments where users
manage their own file security, and became widespread in Unix systems from
the 1970s. They are the basic access control mechanism in Unix-based systems
such as Linux and Apple’s macOS, as well as in derivatives such as Android
and iOS. The access controls in Windows were also based on ACLs, but have
become more complex over time. Where access control policy is set centrally,
ACLs are suited to environments where protection is data-oriented; they are
less suited where the user population is large and constantly changing, or
where users want to be able to delegate their authority to run a particular
program to another user for some set period of time. ACLs are simple to
implement, but are not efficient for security checking at runtime, as the typical
operating system knows which user is running a particular program, rather
than what files it has been authorized to access since it was invoked. The
operating system must either check the ACL at each file access, or keep track
of the active access rights in some other way.
Finally, distributing the access rules into ACLs makes it tedious to find
all the files to which a user has access. Verifying that no files have been left
world-readable or even world-writable could involve checking ACLs on
millions of user files; this is a real issue for large complex firms. Although you
can write a script to check whether any file on a server has ACLs that breach
a security policy, you can be tripped up by technology changes; the move to
containers has led to many corporate data exposures as admins forgot to check
the containers’ ACLs too. (The containers themselves are often dreadful as it’s
a new technology being sold by dozens of clueless startups.) And revoking
the access of an employee who has just been fired will usually have to be done
by cancelling their password or authentication token.
Let’s look at an important example of ACLs – their implementation in Unix
(plus its derivatives Android, MacOS and iOS).
6.2.3 Unix operating system security
In traditional Unix systems, files are not allowed to have arbitrary access
control lists, but simply rwx attributes that allow the file to be read, written
and executed. The access control list as normally displayed has a flag to show
whether the file is a directory, then flags r, w and x for owner, group and world
respectively; it then has the owner’s name and the group name. A directory
with all flags set would have the ACL:
drwxrwxrwx Alice Accounts
In our first example in Figure 6.1, the ACL of file 3 would be:
-rw-r----- Alice Accounts
This records that the file is simply a file rather than a directory; that the file
owner can read and write it; that group members (including Bob) can read it
but not write it; that non-group members have no access at all; that the file
owner is Alice; and that the group is Accounts.
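For illustration, and assuming a Unix-like system, Python's standard stat module can create and decode exactly this kind of permission setting; the file path is a made-up example.

import os
import stat

# Create a file whose mode matches -rw-r----- (owner read/write, group read).
path = '/tmp/accounts_data'            # hypothetical example file
with open(path, 'w') as f:
    f.write('ledger\n')
os.chmod(path, 0o640)                  # rw- r-- ---

mode = os.stat(path).st_mode
print(stat.filemode(mode))             # '-rw-r-----'
print(bool(mode & stat.S_IRGRP))       # group members such as Bob may read: True
print(bool(mode & stat.S_IWGRP))       # ...but not write: False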
The program that gets control when the machine is booted (the operating
system kernel) runs as the supervisor, and has unrestricted access to the whole
machine. All other programs run as users and have their access mediated by the
supervisor. Access decisions are made on the basis of the userid associated with
the program. However if this is zero (root), then the access control decision is
‘yes’. So root can do what it likes – access any file, become any user, or whatever.
What’s more, there are certain things that only root can do, such as starting
certain communication processes. The root userid is typically made available
to the system administrator in systems with discretionary access control.
This means that the system administrator can do anything, so we have difficulty implementing an audit trail as a file that they cannot modify. In our
example, Sam could tinker with the accounts, and have difficulty defending
himself if he were falsely accused of tinkering; what’s more, a hacker who
managed to become the administrator could remove all evidence of his intrusion. The traditional, and still the most common, way to protect logs against
root compromise is to keep them separate. In the old days that meant sending
the system log to a printer in a locked room; nowadays, it means sending it
to another machine, or even to a third-party service. Increasingly, it may also
involve mandatory access control, as we discuss later.
Second, ACLs only contain the names of users, not of programs; so there is
no straightforward way to implement access triples of (user, program, file).
Instead, Unix provides an indirect method: the set-user-id (suid) file attribute.
The owner of a program can mark the file representing that program as suid,
which enables it to run with the privilege of its owner rather than the privilege
of the user who has invoked it. So in order to achieve the functionality needed
by our second example above, we could create a user ‘account-package’ to
own file 2 (the accounts package), make the file suid and place it in a directory
to which Alice has access. This special user can then be given the access that
the accounts program needs.
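To give a feel for the mechanism – a sketch only, which assumes the program file exists and that you run it as its owner or as root – setting the suid bit is a one-bit change to the file's mode, equivalent to chmod u+s on the command line.

import os
import stat

prog = '/usr/local/bin/accounts'            # hypothetical file owned by 'account-package'
st = os.stat(prog)
os.chmod(prog, st.st_mode | stat.S_ISUID)   # add the set-user-id bit
print(stat.filemode(os.stat(prog).st_mode)) # e.g. '-rwsr-xr-x': note the 's'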
But when you take an access control problem that has three dimensions –
(user, program, data) – and implement it using two-dimensional mechanisms,
the outcome is much less intuitive than triples and people are liable to make
mistakes. Programmers are often lazy or facing tight deadlines; so they just
make the application suid root, so it can do anything. This practice leads
to some shocking security holes. The responsibility for making access control
decisions is moved from the operating system environment to the application
program, and most programmers are insufficiently experienced to check everything they should. (It’s hard to know what to check, as the person invoking
a suid root program controls its environment and could manipulate this in
unexpected ways.)
Third, ACLs are not very good at expressing mutable state. Suppose we want
a transaction to be authorised by a manager and an accountant before it’s acted
on; we can either do this at the application level (say, by having queues of transactions awaiting a second signature) or by doing something fancy with suid.
Managing stateful access rules is difficult; they can complicate the revocation of
users who have just been fired, as it can be hard to track down the files they’ve
opened, and stuff can get stuck.
Fourth, the Unix ACL only names one user. If a resource will be used by
more than one of them, and you want to do access control at the OS level, you
have a couple of options. With older systems you had to use groups; newer
systems implement the Posix system of extended ACLs, which may contain
any number of named user and named group entities. In theory, the ACL and
suid mechanisms can often be used to achieve the desired effect. In practice,
programmers are often in too much of a hurry to figure out how to do this,
and security interfaces are usually way too fiddly to use. So people design their
code to require much more privilege than it strictly ought to have, as that seems
to be the only way to get the job done.
6.2.4 Capabilities
The next way to manage the access control matrix is to store it by rows. These
are called capabilities, and in our example in Figure 6.1 above, Bob’s capabilities
would be as in Figure 6.4 here:
User    Operating   Accounts   Accounting   Audit
        System      Program    Data         Trail
Bob     rx          r          r            r

Figure 6.4: A capability
The strengths and weaknesses of capabilities are roughly the opposite of
ACLs. Runtime security checking is more efficient, and we can delegate a right
without much difficulty: Bob could create a certificate saying ‘Here is my capability and I hereby delegate to David the right to read file 4 from 9am to 1pm,
signed Bob’. On the other hand, changing a file’s status becomes tricky as it
can be hard to find out which users have access. This can be tiresome when we
have to investigate an incident or prepare evidence. In fact, scalable systems
end up using de-facto capabilities internally, as instant system-wide revocation is just too expensive; in Unix, file descriptors are really capabilities, and
continue to grant access for some time even after ACL permissions or even
file owners change. In a distributed Unix, access may persist for the lifetime of
Kerberos tickets.
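Here is a toy version of Bob's delegation certificate, using an HMAC under a secret key as a stand-in for Bob's signature; a real system would use a public-key signature so that anyone could verify the capability without holding Bob's key.

import hashlib
import hmac
import json
import time

BOB_KEY = b'example key - not for real use'       # stands in for Bob's signing key

def delegate(grantee, resource, right, not_after):
    claim = {'from': 'Bob', 'to': grantee, 'resource': resource,
             'right': right, 'not_after': not_after}
    msg = json.dumps(claim, sort_keys=True).encode()
    return {'claim': claim, 'tag': hmac.new(BOB_KEY, msg, hashlib.sha256).hexdigest()}

def valid(cert, now):
    msg = json.dumps(cert['claim'], sort_keys=True).encode()
    expected = hmac.new(BOB_KEY, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert['tag']) and now <= cert['claim']['not_after']

cert = delegate('David', 'file4', 'r', time.time() + 4 * 3600)   # four-hour delegation
print(valid(cert, time.time()))                                  # True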
Could we do away with ACLs entirely then? People built experimental
machines in the 1970s that used capabilities throughout [2024]; the first
commercial product was the Plessey System 250, a telephone-switch controller [1578]. The IBM AS/400 series systems brought capability-based
protection to the mainstream computing market in 1988, and enjoyed some
commercial success. The public key certificates used in cryptography are in
effect capabilities, and became mainstream from the mid-1990s. Capabilities
have started to supplement ACLs in operating systems, including more recent
versions of Windows, FreeBSD and iOS, as I will describe later.
In some applications, they can be the natural way to express security policy.
For example, a hospital may have access rules like ‘a nurse shall have access to
all the patients who are on his or her ward, or who have been there in the last
90 days’. In early systems based on traditional ACLs, each access control decision required a reference to administrative systems to find out which nurses
and which patients were on which ward, when – but this made both the HR
system and the patient administration system safety-critical, which hammered
reliability. Matters were fixed by giving nurses ID cards with certificates that
entitle them to access the files associated with a number of wards or hospital departments [535, 536]. If you can make the trust relationships in systems
mirror the trust relationships in that part of the world you’re trying to automate, you should. Working with the grain can bring advantages at all levels in
the stack, making things more usable, supporting safer defaults, cutting errors,
reducing engineering effort and saving money too.
6.2.5 DAC and MAC
In the old days, anyone with physical access to a computer controlled all of
it: you could load whatever software you liked, inspect everything in memory or on disk and change anything you wanted to. This is the model behind
discretionary access control (DAC): you start your computer in supervisor mode
and then, as the administrator, you can make less-privileged accounts available for less-trusted tasks – such as running apps written by companies you
don’t entirely trust, or giving remote logon access to others. But this can make
things hard to manage at scale, and in the 1970s the US military started a
huge computer-security research program whose goal was to protect classified
information: to ensure that a file marked ‘Top Secret’ would never be made
available to a user with only a ‘Secret’ clearance, regardless of the actions of
any ordinary user or even of the supervisor. In such a multilevel secure (MLS)
system, the sysadmin is no longer the boss: ultimate control rests with a remote
government authority that sets security policy. The mechanisms started to be
described as mandatory access control (MAC). The supervisor, or root access if
you will, is under remote control. This drove development of technology for
mandatory access control – a fascinating story, which I tell in Part 2 of the book.
From the 1980s, safety engineers also worked on the idea of safety integrity
levels; roughly, that a more dependable system must not rely on a less dependable one. They started to realise they needed something similar to multilevel
security, but for safety. Military system people also came to realise that the
tamper-resistance of the protection mechanisms themselves was of central
importance. In the 1990s, as computers and networks became fast enough
to handle audio and video, the creative industries lobbied for digital rights
management (DRM) in the hope of preventing people undermining their
business models by sharing music and video. This is also a form of mandatory
access control – stopping a subscriber sharing a song with a non-subscriber is
in many ways like stopping a Top Secret user sharing an intelligence report
with a Secret user.
In the early 2000s, these ideas came together as a number of operating-system
vendors started to incorporate ideas and mechanisms from the MAC research
programme into their products. The catalyst was an initiative by Microsoft and
Intel to introduce cryptography into the PC platform to support DRM. Intel
believed the business market for PCs was saturated, so growth would come
from home sales where, they believed, DRM would be a requirement. Microsoft
started with DRM and then realised that offering rights management for documents too might be a way of locking customers tightly into Windows and
Office. They set up an industry alliance, now called the Trusted Computing
Group, to introduce cryptography and MAC mechanisms into the PC platform. To do this, the operating system had to be made tamper-resistant, and
this is achieved by means of a separate processor, the Trusted Platform Module
(TPM), basically a smartcard chip mounted on the PC motherboard to support
trusted boot and hard disk encryption. The TPM monitors the boot process,
and at each stage a hash of everything loaded so far is needed to retrieve the
key needed to decrypt the next stage. The real supervisor on the system is now
no longer you, the machine owner – it’s the operating-system vendor.
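The 'measure and extend' idea behind trusted boot can be sketched in a few lines; the stage contents below are invented, and a real TPM uses fixed PCR-extend operations and sealed keys rather than this simplified hash chain.

import hashlib

def extend(register, measurement):
    """Fold the hash of the next boot stage into the running register."""
    return hashlib.sha256(register + hashlib.sha256(measurement).digest()).digest()

pcr = bytes(32)                                   # register starts at all zeros
for stage in [b'boot loader v1.2', b'kernel 5.4', b'init system']:
    pcr = extend(pcr, stage)

print(pcr.hex())
# The disk-encryption key is released only if this value matches the one
# recorded when the key was sealed; change any stage and the value changes.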
MAC, based on TPMs and trusted boot, was used in Windows 6 (Vista) from
2006 as a defence against persistent malware¹. The TPM standards and architecture were adapted by other operating-system vendors and device OEMs,
and there is now even a project for an open-source TPM chip, OpenTitan, based
on Google's product. However the main purpose of such a design, whether the
design itself is open or closed, is to lock a hardware device to using specific
software.

¹ Microsoft had had more ambitious plans; its project Palladium would have provided a new, more
trusted world for rights-management apps, alongside the normal one for legacy software. They
launched Information Rights Management – DRM for documents – in 2003 but corporates didn't
buy it, seeing it as a lock-in play. A two-world implementation turned out to be too complex for
Vista and after two separate development efforts it was abandoned; but the vision persisted from
2004 in Arm's TrustZone, which I discuss below.
6.2.6 Apple’s macOS
Apple’s macOS operating system (formerly called OS/X or Mac OS X) is based
on the FreeBSD version of Unix running on top of the Mach kernel. The BSD
layer provides memory protection; applications cannot access system memory
(or each others’) unless running with advanced permissions. This means, for
example, that you can kill a wedged application using the ‘Force Quit’ command without having to reboot the system. On top of this Unix core are a
number of graphics components, including OpenGL, Quartz, QuickTime and
Carbon, while at the surface the Aqua user interface provides an elegant and
coherent view to the user.
At the file system level, macOS is almost a standard Unix. The default installation has the root account disabled, but users who may administer the system
are in a group ‘wheel’ that allows them to su to root. If you are such a user,
you can install programs (you are asked for the root password when you do
so). Since version 10.5 (Leopard), it has been based on TrustedBSD, a variant of
BSD that incorporates mandatory access control mechanisms, which are used
to protect core system components against tampering by malware.
6.2.7 iOS
Since 2008, Apple has led the smartphone revolution with the iPhone, which
(along with other devices like the iPad) uses the iOS operating system – which
is now (in 2020) the second-most popular. iOS is based on Unix; Apple took the
Mach kernel from CMU and fused it with the FreeBSD version of Unix, making
a number of changes for performance and robustness. For example, in vanilla
Unix a filename can have multiple pathnames that lead to an inode representing a file object, which is what the operating system sees; in iOS, this has been
simplified so that files have unique pathnames, which in turn are the subject of
the file-level access controls. Again, there is a MAC component, where mechanisms from Domain and Type Enforcement (DTE) are used to tamper-proof
core system components (we’ll discuss DTE in more detail in chapter 9). Apple
introduced this because they were worried that apps would brick the iPhone,
leading to warranty claims.
Apps also have permissions, which are capabilities; they request a capability
to access device services such as the mobile network, the phone, SMSes, the
camera; the first time the app attempts to use such a service, access is granted
if the user consents². The many device services open up possible side-channel
attacks; for example, an app that’s denied access to the keyboard could deduce
keypresses using the accelerometer and gyro. We’ll discuss side channels in
Part 2, in the chapter on that subject.
The Apple ecosystem is closed in the sense that an iPhone will only run apps
that Apple has signed³. This enables the company to extract a share of app
revenue, and also to screen apps for malware or other undesirable behaviour,
such as the exploitation of side channels to defeat access controls.
The iPhone 5S introduced a fingerprint biometric and payments, adding
a secure enclave (SE) to the A7 processor to give them separate protection.
Apple decided to trust neither iOS nor TrustZone with such sensitive data,
since vulnerabilities give transient access until they’re patched. Its engineers
also worried that an unpatchable exploit might be found in the ROM (this
eventually happened, with Checkm8). While iOS has access to the system
partition, the user’s personal data are encrypted, with the keys managed by
the SE. Key management is bootstrapped by a unique 256-bit AES key burned
into fusible links on the system-on-chip. When the device is powered up, the
user has ten tries to enter a passcode; only then are file keys derived from the
master key and made available⁴. When the device is locked, some keys are still
usable so that iOS can work out who sent an incoming message and notify you;
the price of this convenience is that forensic equipment can get some access to
user data. The SE also manages upgrades and prevents rollbacks. Such public
information as there is can be found in the iOS Security white paper [129].
The security of mobile devices is a rather complex issue, involving not just
access controls and tamper resistance, but the whole ecosystem – from the
provision of SIM cards through the operation of app stores to the culture of
how people use devices, how businesses try to manipulate them and how
government agencies spy on them. I will discuss this in detail in the chapter
on phones in Part 2.
² The trust-on-first-use model goes back to the 1990s with the Java standard J2ME, popularised by
Symbian, and the Resurrecting Duckling model from about the same time. J2ME also supported
trust-on-install and more besides. When Apple and Android came along, they initially made different choices. In each case, having an app store was a key innovation; Nokia failed to realise that
this was important to get a two-sided market going. The app store does some of the access control
by deciding what apps can run. This is hard power in Apple's case, and soft power in Android's;
we'll discuss this in the chapter on phones.
³ There are a few exceptions: corporates can get signing keys for internal apps, but these can be
blacklisted if abused.
⁴ I'll discuss fusible links in the chapter on tamper resistance, and iPhone PIN retry defeats in the
chapter on surveillance and privacy.

6.2.8 Android

Android is the world's most widely used operating system, with 2.5 billion
active Android devices in May 2019, according to Google's figures. Android
is based on Linux; apps from different vendors run under different userids.
The Linux mechanisms control access at the file level, preventing one app from
reading another’s data and exhausting shared resources such as memory and
CPU. As in iOS, apps have permissions, which are in effect capabilities: they
grant access to device services such as SMSes, the camera and the address book.
Apps come in signed packages, as .apk files, and while iOS apps are signed
by Apple, the verification keys for Android come in self-signed certificates
and function as the developer’s name. This supports integrity of updates
while maintaining an open ecosystem. Each package contains a manifest that
demands a set of permissions, and users have to approve the ‘dangerous’
ones – roughly, those that can spend money or compromise personal data.
In early versions of Android, the user would have to approve the lot on
installation or not run the app. But experience showed that most users would
just click on anything to get through the installation process, and you found
even flashlight apps demanding access to your address book, as they could
sell it for money. So Android 6 moved to the Apple model of trust on first use;
apps compiled for earlier versions still demand capabilities on installation.
Since Android 5, SELinux has been used to harden the operating system with
mandatory access controls, so as not only to protect core system functions from
attack but also to separate processes strongly and log violations. SELinux was
developed by the NSA to support MAC in government systems; we’ll discuss
it further in chapter 9. The philosophy is that actions require the consent of three
parties: the user, the developer and the platform.
As with iOS (and indeed Windows), the security of Android is a matter
of the whole ecosystem, not just of the access control mechanisms. The new
phone ecosystem is sufficiently different from the old PC ecosystem, but
inherits enough of the characteristics of the old wireline phone system, that
it merits a separate discussion in the chapter on Phones in Part Two. We’ll
consider other aspects in the chapters on Side Channels and Surveillance.
6.2.9 Windows
The current version of Windows (Windows 10) appears to be the third-most
popular operating system, having achieved a billion monthly active devices
in March 2020 (until 2016, Windows was the leader). Windows has a scarily
complex access control system, and a quick canter through its evolution may
make it easier to understand what’s going on.
Early versions of Windows had no access control. A break came with Windows 4 (NT), which was very much like Unix, and was inspired by it, but with
some extensions. First, rather than just read, write and execute there were separate attributes for take ownership, change permissions and delete, to support more
flexible delegation. These attributes apply to groups as well as users, and group
permissions allow you to achieve much the same effect as suid programs in
Unix. Attributes are not simply on or off, as in Unix, but have multiple values:
you can set AccessDenied, AccessAllowed or SystemAudit. These are parsed in
that order: if an AccessDenied is encountered in an ACL for the relevant user or
group, then no access is permitted regardless of any conflicting AccessAllowed
flags. The richer syntax lets you arrange matters so that everyday configuration tasks, such as installing printers, don’t have to require full administrator
privileges.
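A toy model of that deny-before-allow evaluation might look as follows; the entries, group names and rights are invented, and this is not the real Windows API.

# Access control entries in canonical order, with AccessDenied entries first.
ACES = [
    {'principal': 'Contractors', 'type': 'AccessDenied',  'rights': {'write'}},
    {'principal': 'Staff',       'type': 'AccessAllowed', 'rights': {'read', 'write'}},
    {'principal': 'Everyone',    'type': 'AccessAllowed', 'rights': {'read'}},
]

def access_check(user_groups, wanted):
    for ace in ACES:                       # the first matching entry decides
        if ace['principal'] in user_groups and wanted in ace['rights']:
            return ace['type'] == 'AccessAllowed'
    return False                           # no matching entry: no access

print(access_check({'Staff'}, 'write'))                  # True
print(access_check({'Staff', 'Contractors'}, 'write'))   # False - the deny wins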
Second, users and resources can be partitioned into domains with distinct
administrators, and trust can be inherited between domains in one direction or
both. In a typical large company, you might put all the users into a personnel
domain administered by HR, while assets such as servers and printers may
be in resource domains under departmental control; individual workstations
may even be administered by their users. Things can be arranged so that the
departmental resource domains trust the user domain, but not vice versa – so
a hacked or careless departmental administrator can’t do too much external
damage. The individual workstations would in turn trust the department
(but not vice versa) so that users can perform tasks that require local privilege (such as installing software packages). Limiting the damage a hacked
administrator can do still needs careful organisation. The data structure used
to manage all this, and hide the ACL details from the user interface, is called
the Registry. Its core used to be the Active Directory, which managed remote
authentication – using either a Kerberos variant or TLS, encapsulated behind
the Security Support Provider Interface (SSPI), which enables administrators to
plug in other authentication services. Active Directory is essentially a database
that organises users, groups, machines, and organisational units within a
domain in a hierarchical namespace. It lurked behind Exchange, but is now
being phased out as Microsoft becomes a cloud-based company and moves its
users to Office365.
Windows has added capabilities in two ways which can override or complement ACLs. First, users or groups can be either allowed or denied access by
means of profiles. Security policy is set by groups rather than for the system
as a whole; group policy overrides individual profiles, and can be associated
with sites, domains or organisational units, so it can start to tackle complex
problems. Policies can be created using standard tools or custom coded.
The second way in which capabilities insinuate their way into Windows is
that in many applications, people use TLS for authentication, and TLS certificates provide another, capability-oriented, layer of access control outside the
purview of the Active Directory.
I already mentioned that Windows Vista introduced trusted boot to make the
operating system itself tamper-resistant, in the sense that it always boots into a
known state, limiting the persistence of malware. It added three further protection mechanisms to get away from the previous default of all software running
as root. First, the kernel was closed off to developers; second, the graphics
subsystem and most drivers were removed from the kernel; and third, User
Account Control (UAC) replaced the default administrator privilege with user
defaults instead. Previously, so many routine tasks needed administrative privilege that many enterprises made all their users administrators, which made it
difficult to contain malware; and many developers wrote their software on the
assumption that it would have access to everything. According to Microsoft
engineers, this was a major reason for Windows’ lack of robustness: applications monkey with system resources in incompatible ways. So they added
an Application Information Service that launches applications which require
elevated privilege and uses virtualisation to contain them: if they modify the
registry, for example, they don’t modify the ‘real’ registry but simply the version of it that they can see.
Since Vista, the desktop acts as the parent process for later user processes,
so even administrators browse the web as normal users, and malware they
download can’t overwrite system files unless given later authorisation. When
a task requires admin privilege, the user gets an elevation prompt asking them
for an admin password. (Apple’s macOS is similar although the details under
the hood differ somewhat.) As admin users are often tricked into installing
malicious software, Vista added mandatory access controls in the form of file
integrity levels. The basic idea is that low-integrity processes (such as code you
download from the Internet) should not be able to modify high-integrity data
(such as system files) in the absence of some trusted process (such as verification of a signature by Microsoft on the code in question).
In 2012, Windows 8 added dynamic access control which lets you control
user access by context, such as their work PC versus their home PC and their
phone; this is done via account attributes in Active Directory, which appear
as claims about a user, or in Kerberos tickets as claims about a domain. In
2016, Windows 8.1 added a cleaner abstraction with principals, which can be a
user, computer, process or thread running in a security context or a group to
which such a principal belongs, and security identifiers (SIDs), which represent
such principals. When a user signs in, they get tickets with the SIDs to which
they belong. Windows 8.1 also prepared for the move to cloud computing
by adding Microsoft accounts (formerly LiveID), whereby a user signs in to a
Microsoft cloud service rather than to a local server. Where credentials are
stored locally, it protects them using virtualisation. Finally, Windows 10 added
a number of features to support the move to cloud computing with a diversity
of client devices, ranging from certificate pinning (which we’ll discuss in
the chapter on Network Security) to the abolition of the old secure attention
sequence ctrl-alt-del (which is hard to do on touch-screen devices and which
users didn’t understand anyway).
To sum up, Windows evolved to provide a richer and more flexible set of
access control tools than any system previously sold in mass markets. It was
driven by corporate customers who need to manage tens of thousands of staff
performing hundreds of different job roles across hundreds of different sites,
providing internal controls to limit the damage that can be done by small numbers of dishonest staff or infected machines. (How such controls are actually
designed will be our topic in the chapter on Banking and Bookkeeping.) The
driver for this development was the fact that Microsoft made over half of its
revenue from firms that licensed more than 25,000 seats; but the cost of the
flexibility that corporate customers demanded is complexity. Setting up access
control for a big Windows shop is a highly skilled job.
6.2.10 Middleware
Doing access control at the level of files and programs was fine in the early
days of computing, when these were the resources that mattered. Since the
1980s, growing scale and complexity has led to access control being done at
other levels instead of (or as well as) at the operating system level. For example,
bookkeeping systems often run on top of a database product such as Oracle,
which looks to the operating system as one large file. So most of the access
control has to be done in the database; all the operating system supplies may
be an authenticated ID for each user who logs on. And since the 1990s, a lot of
the work at the client end has been done by the web browser.
6.2.10.1 Database access controls
Before people started using websites for shopping, database security was
largely a back-room concern. But enterprises now have critical databases to
handle inventory, dispatch and e-commerce, fronted by web servers that pass
transactions to the databases directly. These databases now contain much of the
data that matter to our lives – bank accounts, vehicle registrations and employment records – and failures sometimes expose them to random online users.
Database products, such as Oracle, DB2 and MySQL, have their own access
control mechanisms, which are modelled on operating-system mechanisms,
with privileges typically available for both users and objects (so the mechanisms are a mixture of access control lists and capabilities). However, the
typical database access control architecture is comparable in complexity with
Windows; modern databases are intrinsically complex, as are the things they
support – typically business processes involving higher levels of abstraction
than files or domains. There may be access controls aimed at preventing any
user learning too much about too many customers; these tend to be stateful,
and may deal with possible statistical inference rather than simple yes-no
access rules. I devote a whole chapter in Part 2 to exploring the topic of
Inference Control.
Ease of administration is often a bottleneck. In companies I’ve advised, the
operating-system and database access controls have been managed by different
departments, which don’t talk to each other; and often IT departments have to
put in crude hacks to make the various access control systems seem to work as
one, but which open up serious holes.
Some products let developers bypass operating-system controls. For
example, Oracle has both operating system accounts (whose users must be
authenticated externally by the platform) and database accounts (whose users
are authenticated directly by the Oracle software). It is often convenient to
use the latter, to save the effort of synchronising with what other departments
are doing. In many installations, the database is accessible directly from the
outside; and even where it’s shielded by a web service front-end, this often
contains loopholes that let SQL code be inserted into the database.
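The standard defence is to pass user input as bound parameters rather than pasting it into the SQL text. Here is a sketch using SQLite and a made-up customers table; production code would of course use the real database's parameter syntax.

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE customers (name TEXT, balance INTEGER)')
conn.execute("INSERT INTO customers VALUES ('alice', 100), ('bob', 50)")

user_input = "alice' OR '1'='1"        # hostile input from a web form

# Vulnerable: the input is spliced into the SQL text, the OR clause runs,
# and every row comes back.
bad = conn.execute("SELECT * FROM customers WHERE name = '" + user_input + "'").fetchall()

# Safer: the input is passed as a bound parameter and treated purely as data.
good = conn.execute('SELECT * FROM customers WHERE name = ?', (user_input,)).fetchall()

print(len(bad), len(good))             # 2 0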
Database security failures can thus cause problems directly. The Slammer worm in 2003 propagated itself using a stack-overflow exploit against
Microsoft SQL Server 2000 and created large amounts of traffic as compromised
machines sent floods of attack packets to random IP addresses.
Just as Windows is tricky to configure securely, because it’s so complicated,
the same goes for the typical database system. If you ever have to lock one
down – or even just understand what’s going on – you had better read a specialist textbook, such as [1175], or get in an expert.
6.2.10.2 Browsers
The web browser is another middleware platform on which we rely for access
control and whose complexity often lets us down. The main access control rule
is the same-origin policy whereby JavaScript or other active content on a web
page is only allowed to communicate with the IP address that it originally came
from; such code is run in a sandbox to prevent it altering the host system, as I’ll
describe in the next section. But many things can go wrong.
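In practice browsers compare the scheme, host and port of the origin rather than the raw IP address; a minimal origin check might be sketched as follows (an illustration, not any browser's actual code).

from urllib.parse import urlsplit

def origin(url):
    """Two URLs share an origin only if scheme, host and port all match."""
    parts = urlsplit(url)
    default = {'http': 80, 'https': 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default)

print(origin('https://mybank.com/login') == origin('https://mybank.com:443/pay'))   # True
print(origin('https://mybank.com/') == origin('https://evil.example/'))             # False
print(origin('https://mybank.com/') == origin('http://mybank.com/'))                # False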
In previous editions of this book, we considered web security to be a matter
of how the servers were configured, and whether this led to cross-site vulnerabilities. For example a malicious website can include links or form buttons
aimed at creating a particular side-effect:
https://mybank.com/transfer.cgi?amount=10000USD&recipient=thief
The idea is that if a user clicks on this who is logged into mybank.com,
there may be a risk that the transaction will be executed, as there’s a valid
session cookie. So payment websites deploy countermeasures such as using
short-lived sessions and an anti-CSRF token (an invisible MAC of the session
cookie), and checking the Referer: header. There are also issues around web
authentication mechanisms; I described OAuth briefly in section 4.7.4. If you
design web pages for a living you had better understand the mechanics of all
this in rather more detail (see for example [120]); but many developers don’t
take enough care. For example, as I write in 2020, Amazon Alexa has just
turned out to have a misconfigured policy on cross-origin resource sharing,
which meant that anyone who compromised another Amazon subdomain
could replace the skills on a target Alexa with malicious ones [1483].
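The anti-CSRF token mentioned above can be sketched as an HMAC of the session cookie under a server-side key; the key and cookie values here are placeholders.

import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)        # hypothetical server-side secret

def csrf_token(session_cookie):
    return hmac.new(SERVER_KEY, session_cookie.encode(), hashlib.sha256).hexdigest()

def check_csrf(session_cookie, token):
    return hmac.compare_digest(csrf_token(session_cookie), token)

cookie = 'session=3f2a9c'                   # placeholder session cookie value
token = csrf_token(cookie)                  # embedded invisibly in the transfer form
print(check_csrf(cookie, token))            # True: the request came from our form
print(check_csrf(cookie, '0' * 64))         # False: a forged cross-site request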
By now there’s a realisation that we should probably have treated browsers as
access control devices all along. After all, the browser is the place on your laptop where you run code written by people you don't want to trust and who will
occasionally be malicious; as we discussed earlier, mobile-phone operating systems run different apps as different users to give even more robust protection.
Even in the absence of malice, you don’t want to have to reboot your browser
if it hangs because of a script in one of the tabs. (Chrome tries to ensure this by
running each tab in a separate operating-system process.)
Bugs in browsers are exploited in drive-by download attacks, where visiting an
attack web page can infect your machine, and even without this the modern
web environment is extremely difficult to control. Many web pages are full
of trackers and other bad things, supplied by multiple ad networks and data
brokers, which make a mockery of the intent behind the same-origin policy.
Malicious actors can even use web services to launder origin: for example, the
attacker makes a mash-up of the target site plus some evil scripts of his own,
and then gets the victim to view it through a proxy such as Google Translate.
A prudent person will go to their bank website by typing in the URL directly,
or using a bookmark; unfortunately, the marketing industry trains everyone to
click on links in emails.
6.2.11 Sandboxing
The late 1990s saw the emergence of yet another type of access control: the
software sandbox, introduced by Sun with its Java programming language. The
model is that a user wants to run some code that she has downloaded as an
applet, but is concerned that the applet might do something nasty, such as
stealing her address book and mailing it off to a marketing company, or just
hogging the CPU and running down the battery.
The designers of Java tackled this problem by providing a ‘sandbox’ – a
restricted environment in which the code has no access to the local hard
disk (or at most only temporary access to a restricted directory), and is only
allowed to communicate with the host it came from (the same-origin policy).
This is enforced by having the code executed by an interpreter – the Java
Virtual Machine (JVM) – with only limited access rights [784]. This idea was
adapted to JavaScript, the main scripting language used in web pages, though
it’s actually a different language; and other active content too. A version
of Java is also used on smartcards so they can support applets written by
different firms.
6.2.12 Virtualisation
Virtualisation is what powers cloud computing; it enables a single machine to
emulate a number of machines independently, so that you can rent a virtual
machine (VM) in a data centre for a few tens of dollars a month rather than having to pay maybe a hundred for a whole server. Virtualisation was invented in
the 1960s by IBM [496]; a single machine could be partitioned using VM/370
into multiple virtual machines. Initially this was about enabling a new
mainframe to run legacy apps from several old machine architectures; it soon
became normal for a company that bought two computers to use one for its production environment and the other as a series of logically separate machines
for development, testing, and minor applications. It’s not enough to run a
virtual machine monitor (VMM) on top of a host operating system, and then
run other operating systems on top; you have to deal with sensitive instructions
that reveal processor state such as absolute addresses and the processor clock.
Working VMMs appeared for Intel platforms with VMware ESX Server in 2003
and (especially) Xen in 2003, which accounted for resource usage well enough
to enable AWS and the cloud computing revolution. Things can be done more
cleanly with processor support, which Intel has provided since 2006 with VT-x,
and whose details I’ll discuss below. VM security claims rest to some extent
on the argument that a VMM hypervisor’s code can be much smaller than an
operating system and thus easier to code-review and secure; whether there
are actually fewer vulnerabilities is of course an empirical question [1578].
At the client end, virtualisation allows people to run a guest operating
system on top of a host (for example, Windows on top of macOS), which
offers not just flexibility but the prospect of better containment. For example,
an employee might have two copies of Windows running on their laptop – a
locked-down version with the office environment, and another for use at
home. Samsung offers Knox, which creates a virtual machine on a mobile
phone that an employer can lock down and manage remotely, while the user
enjoys a normal Android as well on the same device.
But using virtualisation to separate security domains on clients is harder than
it looks. People need to share data between multiple VMs and if they use ad-hoc
mechanisms, such as USB sticks and webmail accounts, this undermines the
separation. Safe data sharing is far from trivial. For example, Bromium (now owned by HP) offers
VMs tailored to specific apps on corporate PCs, so you have one VM for Office,
one for Acrobat reader, one for your browser and so on. This enables firms
to work reasonably securely with old, unsupported software. So how do you
download an Office document? Well, the browser exports the file from its VM
to the host hard disc, marking it ‘untrusted’, so when the user tries to open
it they’re given a new VM with that document plus Office and nothing else.
When they then email this untrusted document, there’s an Outlook plugin that
stops it being rendered in the ‘sent mail’ pane. Things get even more horrible
with network services integrated into apps; the rules on what sites can access
which cookies are complicated, and it’s hard to deal with single signon and
workflows that cross multiple domains. The clipboard also needs a lot more
rules to control it. Many of the rules change from time to time, and are heuristics rather than hard, verifiable access logic. In short, using VMs for separation
at the client requires deep integration with the OS and apps if it’s to appear
transparent to the user, and there are plenty of tradeoffs made between security and usability. In effect, you’re retrofitting virtualisation on to an existing
OS and apps that were not built for it.
Containers have been the hot new topic in the late 2010s. They evolved as
a lightweight alternative to virtualisation in cloud computing and are often
confused with it, especially by the marketing people. My definition is that
while a VM has a complete operating system, insulated from the hardware by
a hypervisor, a container is an isolated guest process that shares a kernel with
other containers. Container implementations separate groups of processes
by virtualising a subset of operating-system mechanisms, including process
identifiers, interprocess communication, and namespaces; they also use techniques such as sandboxing and system call filtering. The business incentive
is to minimise the guests’ size, their interaction complexity and the costs of
managing them, so they are deployed along with orchestration tools. Like
any other new technology, there are many startups with more enthusiasm
than experience. A 2019 survey by Jerry Gamblin disclosed that of the top
1000 containers available to developers on Docker Hub, 194 were setting up
blank root passwords [743]. If you’re going to use cloud systems, you need to
pay serious attention to your choice of tools, and also learn yet another set of
access control mechanisms – those offered by the service provider, such as the
Amazon AWS Identity and Access Management (IAM). This adds another
layer of complexity, which people can get wrong. For example, in 2019 a
security firm providing biometric identification services to banks and the
police left its entire database unprotected; two researchers found it using Elasticsearch and discovered millions of people’s photos, fingerprints, passwords
and security clearance levels on a database that they could not only read but
write [1867].
But even if you tie down a cloud system properly, there are hardware limits on what the separation mechanisms can achieve. In 2018, two classes of
powerful side-channel attacks were published: Meltdown and Spectre, which
I discuss in the following section and at greater length in the chapter on side
channels. Those banks that use containers to deploy payment processing rely,
at least implicitly, on their containers being difficult to target in a cloud the
size of Amazon’s or Google’s. For a comprehensive survey of the evolution of
virtualisation and containers, see Randal [1578].
6.3 Hardware protection
Most access control systems set out not just to control what users can do, but
to limit what programs can do as well. In many systems, users can either write
programs, or download and install them, and these programs may be buggy
or even malicious.
Preventing one process from interfering with another is the protection problem. The confinement problem is that of preventing programs communicating
outward other than through authorized channels. There are several flavours
of each. The goal may be to prevent active interference, such as memory overwriting, or to stop one process reading another’s memory directly. This is what
commercial operating systems set out to do. Military systems may also try to
protect metadata – data about other data, or subjects, or processes – so that, for
example, a user can’t find out what other users are logged on to the system or
what processes they’re running.
Unless one uses sandboxing techniques (which are too restrictive for general programming environments), solving the protection problem on a single
processor means, at the very least, having a mechanism that will stop one program from overwriting another’s code or data. There may be areas of memory
that are shared to allow interprocess communication; but programs must be
protected from accidental or deliberate modification, and must have access to
memory that is similarly protected.
This usually means that hardware access control must be integrated with
the processor’s memory management functions. A classic mechanism is segment addressing. Memory is addressed by two registers, a segment register that
points to a segment of memory, and an address register that points to a location within that segment. The segment registers are controlled by the operating
system, often by a component of it called the reference monitor which links the
access control mechanisms with the hardware.
The implementation has become more complex as processors themselves
have. Early IBM mainframes had a two-state CPU: the machine was either in
authorized state or it was not. In the latter case, the program was restricted to
a memory segment allocated by the operating system; in the former, it could
write to segment registers at will. An authorized program was one that was
loaded from an authorized library.
Any desired access control policy can be implemented on top of this, given
suitable authorized libraries, but this is not always efficient; and system security depended on keeping bad code (whether malicious or buggy) out of the
authorized libraries. So later processors offered more complex hardware mechanisms. Multics, an operating system developed at MIT in the 1960s and which
inspired Unix, introduced rings of protection which express differing levels of
privilege: ring 0 programs had complete access to disk, supervisor states ran
in ring 2, and user code at various less privileged levels [1687]. Many of its
features have been adopted in more recent processors.
There are a number of general problems with interfacing hardware and software security mechanisms. For example, it often happens that a less privileged
process such as application code needs to invoke a more privileged process
(e.g., a device driver). The mechanisms for doing this need to be designed with
care, or security bugs can be expected. Also, performance may depend quite
drastically on whether routines at different privilege levels are called by reference or by value [1687].
6.3.1 Intel processors
The Intel 8088/8086 processors used in early PCs had no distinction between
system and user mode, and thus any running program controlled the whole
machine6 . The 80286 added protected segment addressing and rings, so for
the first time a PC could run proper operating systems. The 80386 had built-in
virtual memory, and large enough memory segments (4 Gb) that they could be
ignored and the machine treated as a 32-bit flat address machine. The 486 and
Pentium series chips added more performance (caches, out of order execution
and additional instructions such as MMX).
The rings of protection are supported by a number of mechanisms. The current privilege level can only be changed by a process in ring 0 (the kernel).
Procedures cannot access objects in lower-level rings directly but there are gates
that allow execution of code at a different privilege level and manage the supporting infrastructure, such as multiple stack segments.
From 2006, Intel added hardware support for x86 virtualisation, known as
Intel VT, which helped drive the adoption of cloud computing. Some processor architectures such as S/370 and PowerPC are easy to virtualise, and the
theoretical requirements for this had been established in 1974 by Gerald Popek
and Robert Goldberg [1535]; they include that all sensitive instructions that
expose raw processor state must be privileged instructions. The native Intel
instruction set, however, has sensitive user-mode instructions, requiring messy
workarounds such as application code rewriting and patches to hosted operating systems. Adding VMM support in hardware means that you can run an
operating system in ring 0 as it was designed; the VMM has its own copy of
the memory architecture underneath. You still have to trap sensitive opcodes,
but system calls don’t automatically require VMM intervention, you can run
unmodified operating systems, things go faster and systems are generally more
robust. Modern Intel CPUs now have nine rings: ring 0–3 for normal code,
6 They had been developed on a crash programme to save market share following the advent of RISC processors and the market failure of the iAPX432.
under which is a further set of ring 0–3 VMM root mode for the hypervisor,
and at the bottom is system management mode (SMM) for the BIOS. In practice,
the four levels that are used are SMM, ring 0 of VMX root mode, the normal
ring 0 for the operating system, and ring 3 above that for applications.
In 2015, Intel released Software Guard eXtensions (SGX), which lets trusted
code run in an enclave – an encrypted section of the memory – while the
rest of the code is executed as usual. The company had worked on such
architectures in the early years of the Trusted Computing initiative, but let
things slide until it needed an enclave architecture to compete with TrustZone,
which I discuss in the next section. The encryption is performed by a Memory
Encryption Engine (MEE), while SGX also introduces new instructions and
memory-access checks to ensure non-enclave processes cannot access enclave
memory (not even root processes). SGX has been promoted for DRM and
securing cloud VMs, particularly those containing crypto keys, credentials or
sensitive personal information; this is under threat from Spectre and similar
attacks, which I discuss in detail in the chapter on side channels. Since SGX’s
security perimeter is the CPU, its software is encrypted in main memory,
which imposes real penalties in both time and space. Another drawback used
to be that SGX code had to be signed by Intel. The company has now delegated
signing (so bad people can get code signed) and from SGXv2 will open up the
root of trust to others. So people are experimenting with SGX malware, which
can remain undetectable by anti-virus software. As SGX apps cannot issue
syscalls, it had been hoped that enclave malware couldn’t do much harm,
yet Michael Schwarz, Samuel Weiser and Daniel Gruss have now worked out
how to mount stealthy return-oriented programming (ROP) attacks from an
enclave on a host app; they argue that the problem is a lack of clarity about
what enclaves are supposed to do, and that any reasonable threat model must
include untrusted enclaves [1691]. This simple point may force a rethink of
enclave architectures; Intel says ‘In the future, Intel’s control-flow enforcement
technology (CET) should help address this threat inside SGX’7 . As for what
comes next, AMD released full system memory encryption in 2016, and Intel
announced a competitor. This aimed to deal with cold-boot and DMA attacks,
and protect code against an untrusted hypervisor; it might also lift space and
performance limits on next-generation enclaves. However, Jan Werner and colleagues found multiple inference and data-injection attacks on AMD’s offering
when it’s used in a virtual environment. [2014]. There’s clearly some way to go.
As well as the access-control vulnerabilities, there are crypto issues, which
I’ll discuss in the chapter on Advanced Cryptographic Engineering.
7 The best defence against ROP attacks in 2019 appears to be Apple's mechanism, in the iPhone XS and later, for signing pointers with a key that's kept in a register; this stops ROP attacks as the attacker can't guess the signatures.
6.3.2 Arm processors
The Arm is the processor core most commonly used in phones, tablets and
IoT devices; billions have been used in mobile phones alone, with a high-end
device having several dozen Arm cores of various sizes in its chipset. The
original Arm (which stood for Acorn Risc Machine) was the first commercial
RISC design; it was released in 1985, just before MIPS. In 1991, Arm became a
separate firm which, unlike Intel, does not own or operate any fabs: it licenses a
range of processor cores, which chip designers include in their products. Early
cores had a 32-bit datapath and contained fifteen registers, of which seven
were shadowed by banked registers for system processes to cut the cost of
switching context on interrupt. There are multiple supervisor modes, dealing
with fast and normal interrupts, the system mode entered on reset, and various
kinds of exception handling. The core initially contained no memory management, so Arm-based designs could have their hardware protection extensively
customized; there are now variants with memory protection units (MPUs), and
others with memory management units (MMUs) that handle virtual memory
as well.
In 2011, Arm launched version 8, which supports 64-bit processing and
enables multiple 32-bit operating systems to be virtualised. Hypervisor
support added yet another supervisor mode. The cores come in all sizes, from
large 64-bit superscalar processors with pipelines over a dozen stages deep, to
tiny ones for cheap embedded devices.
TrustZone is a security extension that supports the ‘two worlds’ model
mentioned above and was made available to mobile phone makers in
2004 [45]. Phones were the ‘killer app’ for enclaves as operators wanted to
lock subsidised phones and regulators wanted to make the baseband software
that controls the RF functions tamper-resistant [1241]. TrustZone supports an
open world for a normal operating system and general-purpose applications,
plus a closed enclave to handle sensitive operations such as cryptography
and critical I/O (in a mobile phone, this can include the SIM card and the
fingerprint reader). Whether the processor is in a secure or non-secure state
is orthogonal to whether it’s in user mode or a supervisor mode (though the
interaction between secure mode and hypervisor mode can be nontrivial). The
closed world hosts a single trusted execution environment (TEE) with separate
stacks, a simplified operating system, and typically runs only trusted code
signed by the OEM – although Samsung’s Knox, which sets out to provide
‘home’ and ‘work’ environments on your mobile phone, allows regular rich
apps to execute in the secure environment.
Although TrustZone was released in 2004, it was kept closed until 2015;
OEMs used it to protect their own interests and didn’t open it up to app
developers, except occasionally under NDA. As with Intel SGX, there appears
to be no way yet to deal with malicious enclave apps, which might come
bundled as DRM with gaming apps or be mandated by authoritarian states;
and, as with Intel SGX, enclave apps created with TrustZone can raise issues
of transparency and control, which can spill over into auditability, privacy and
much else. Again, company insiders mutter ‘wait and see’; no doubt we shall.
Arm’s latest offering is CHERI8 which adds fine-grained capability support
to Arm CPUs. At present, browsers such as Chrome put tabs in different processes, so that one webpage can’t slow down the other tabs if its scripts run
slowly. It would be great if each object in each web page could be sandboxed
separately, but this isn’t possible because of the large cost, in terms of CPU
cycles, of each inter-process context switch. CHERI enables a process spawning
a subthread to allocate it read and write accesses to specific ranges of memory,
so that multiple sandboxes can run in the same process. This was announced
as a product in 2018 and we expect to see first silicon in 2021. The long-term
promise of this technology is that, if it were used thoroughly in operating systems such as Windows, Android or iOS, it would have prevented most of the
zero-day exploits of recent years. Incorporating a new protection technology
at scale costs real money, just like the switch from 32-bit to 64-bit CPUs, but it
could save the cost of lots of patches.
6.4 What goes wrong
Popular operating systems such as Android, Linux and Windows are very large
and complex, with their features tested daily by billions of users under very
diverse circumstances. Many bugs are found, some of which give rise to vulnerabilities, which have a typical lifecycle. After discovery, a bug is reported to
a CERT or to the vendor; a patch is shipped; the patch is reverse-engineered,
and an exploit may be produced; and people who did not apply the patch
in time may find that their machines have been compromised. In a minority
of cases, the vulnerability is exploited at once rather than reported – called a
zero-day exploit as attacks happen from day zero of the vulnerability’s known
existence. The economics, and the ecology, of the vulnerability lifecycle are the
subject of study by security economists; I’ll discuss them in Part 3.
The traditional goal of an attacker was to get a normal account on the system
and then become the system administrator, so they could take over the system
completely. The first step might have involved guessing, or social-engineering,
a password, and then using an operating-system bug to escalate from user to
root [1131].
The user/root distinction became less important in the twenty-first century
for two reasons. First, Windows PCs were the most common online devices
8 Full disclosure: this was developed by a team of my colleagues at Cambridge and elsewhere, led by Robert Watson.
(until 2017 when Android overtook them) so they were the most common
attack targets; and as they ran many applications as administrator, an application that could be compromised typically gave administrator access. Second,
attackers come in two basic types: targeted attackers, who want to spy on
a specific individual and whose goal is typically to acquire access to that
person’s accounts; and scale attackers, whose goal is typically to compromise
large numbers of PCs, which they can organise into a botnet. This, too,
doesn’t require administrator access. Even if your mail client does not run as
administrator, it can still be used by a spammer who takes control.
However, botnet herders do prefer to install rootkits which, as their name suggests, run as root; they are also known as remote access trojans or RATs. The
user/root distinction does still matter in business environments, where you do
not want such a kit installed as an advanced persistent threat by a hostile intelligence agency, or by a corporate espionage firm, or by a crime gang doing
reconnaissance to set you up for a large fraud.
A separate distinction is whether an exploit is wormable – whether it can be
used to spread malware quickly online from one machine to another without
human intervention. The Morris worm was the first large-scale case of this, and
there have been many since. I mentioned Wannacry and NotPetya in chapter
2; these used a vulnerability developed by the NSA and then leaked to other
state actors. Operating system vendors react quickly to wormable exploits, typically releasing out-of-sequence patches, because of the scale of the damage
they can do. The most troublesome wormable exploits at the time of writing
are variants of Mirai, a worm used to take over IoT devices that use known
root passwords. This appeared in October 2016 to exploit CCTV cameras, and
hundreds of versions have been produced since, adapted to take over different vulnerable devices and recruit them into botnets. Wormable exploits often
use root access but don’t have to; it is sufficient that the exploit be capable of
automatic onward transmission9 . I will discuss the different types of malware
in more detail in section 21.3.
However, the basic types of technical attack have not changed hugely in a
generation and I’ll now consider them briefly.
6.4.1 Smashing the stack
The classic software exploit is the memory overwriting attack, colloquially
known as ‘smashing the stack’, as used by the Morris worm in 1988; this
infected so many Unix machines that it disrupted the Internet and brought
9 In rare cases even human transmission can make malware spread quickly: an example was the ILoveYou worm, which spread itself in 2000 via an email with that subject line; enough people opened it and ran a script that sent it to everyone in the new victim's address book.
malware forcefully to the attention of the mass media [1810]. Attacks involving
violations of memory safety accounted for well over half the exploits against
operating systems in the late 1990s and early 2000s [484] but the proportion
has been dropping slowly since then.
Programmers are often careless about checking the size of arguments, so an
attacker who passes a long argument to a program may find that some of it gets
treated as code rather than data. The classic example, used in the Morris worm,
was a vulnerability in the Unix finger command. A common implementation
of this would accept an argument of any length, although only 256 bytes had
been allocated for this argument by the program. When an attacker used the
command with a longer argument, the trailing bytes of the argument ended up
overwriting the stack and being executed by the system.
The usual exploit technique was to arrange for the trailing bytes of the argument to have a landing pad – a long space of no-operation (NOP) commands, or
other register commands that didn’t change the control flow, and whose task
was to catch the processor if it executed any of them. The landing pad delivered the processor to the attack code, which would do something like creating a
shell with administrative privilege directly (see Figure 6.5).
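A minimal sketch of the underlying coding error (hypothetical code, not the actual finger daemon): the copy into a fixed-size stack buffer is not bounded, so an over-long input overwrites whatever lies beyond the buffer, including the saved return address.

    #include <stdio.h>
    #include <string.h>

    /* Vulnerable: no length check, so input longer than 255 bytes runs
     * past the end of buf and smashes the stack. */
    void handle_request(const char *input) {
        char buf[256];
        strcpy(buf, input);
        printf("looking up %s\n", buf);
    }

    /* Safer: bound the copy and guarantee termination. */
    void handle_request_safe(const char *input) {
        char buf[256];
        strncpy(buf, input, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
        printf("looking up %s\n", buf);
    }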
Stack-overwriting attacks were around long before 1988. Most of the
early 1960s time-sharing systems suffered from this vulnerability, and fixed
it [805]. Penetration testing in the early ’70s showed that one of the most
frequently-used attack strategies was still “unexpected parameters” [1168].
Intel’s 80286 processor introduced explicit parameter checking instructions – verify read, verify write, and verify length – in 1982, but they were
avoided by most software designers to prevent architecture dependencies.
Stack overwriting attacks have been found against all sorts of programmable
devices – even against things like smartcards and hardware security modules,
whose designers really should have known better.
Figure 6.5: Stack smashing attack (diagram: the target machine's memory map, in which an over-long input overruns the area allocated for the input buffer; the malicious argument's 'landing pad' leads into the malicious code, adjacent to the executable code)
6.4.2 Other technical attacks
Many vulnerabilities are variations on the same general theme, in that they
occur when data in grammar A is interpreted as being code in grammar B.
A stack overflow is when data are accepted as input (e.g. a URL) and end up
being executed as machine code. These are failures of type safety. In fact, a stack
overflow can be seen either as a memory safety failure or as a failure to sanitise
user input, but there are purer examples of each type.
The use after free type of safety failure is now the most common cause of
remote execution vulnerabilities and has provided a lot of attacks on browsers
in recent years. It can happen when a chunk of memory is freed and then still
used, perhaps because of confusion over which part of a program is responsible for freeing it. If a malicious chunk is now allocated, it may end up taking
its place on the heap, and when an old innocuous function is called a new,
malicious function may be invoked instead. There are many other variants
on the memory safety theme; buffer overflows can be induced by improper
string termination, passing an inadequately sized buffer to a path manipulation function, and many other subtle errors. See Gary McGraw’s book ‘Software
Security' [1268] for a taxonomy.
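A minimal sketch of a use-after-free, with illustrative names rather than any real browser code: a heap object is freed, a stale pointer survives, and a later indirect call through it may land in memory the attacker has since managed to occupy.

    #include <stdlib.h>

    struct handler {
        void (*on_event)(void);
    };

    static void benign(void) { /* ... */ }

    int main(void) {
        struct handler *h = malloc(sizeof *h);
        if (!h) return 1;
        h->on_event = benign;

        free(h);        /* chunk goes back to the heap allocator            */
        /* ...an attacker-influenced allocation may now reuse that chunk... */

        h->on_event();  /* use after free: undefined behaviour, possibly a
                           jump to attacker-chosen code; the fix is to null
                           the pointer at the point of free and check it    */
        return 0;
    }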
SQL injection attacks are the most common attack based on failure to sanitise
input, and arise when a careless web developer passes user input to a back-end
database without checking to see whether it contains SQL code. The game is
often given away by error messages, from which a capable and motivated user
may infer enough to mount an attack. There are similar command-injection
problems afflicting other languages used by web developers, such as PHP. The
usual remedy is to treat all user input as suspicious and validate it. But this
can be harder than it looks, as it’s difficult to anticipate all possible attacks and
the filters written for one shell may fail to be aware of extensions present in
another. Where possible, one should only act on user input in a safe context,
by designing such attacks out; where it’s necessary to blacklist specific exploits,
the mechanism needs to be competently maintained.
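As a sketch of the safe-context approach, assuming SQLite's C API and made-up table and column names (any database library with prepared statements works the same way): the unsafe version splices user input into the SQL text, so an input such as x' OR '1'='1 changes the query's grammar, while the safe version binds the input as a parameter so it is only ever treated as data.

    #include <sqlite3.h>
    #include <stdio.h>

    /* UNSAFE: user_input becomes part of the SQL grammar. */
    void lookup_unsafe(sqlite3 *db, const char *user_input) {
        char sql[512];
        snprintf(sql, sizeof sql,
                 "SELECT id FROM users WHERE name = '%s';", user_input);
        sqlite3_exec(db, sql, NULL, NULL, NULL);
    }

    /* SAFER: the input is bound as a parameter, never parsed as SQL. */
    void lookup_safe(sqlite3 *db, const char *user_input) {
        sqlite3_stmt *stmt;
        if (sqlite3_prepare_v2(db, "SELECT id FROM users WHERE name = ?1;",
                               -1, &stmt, NULL) != SQLITE_OK)
            return;
        sqlite3_bind_text(stmt, 1, user_input, -1, SQLITE_TRANSIENT);
        while (sqlite3_step(stmt) == SQLITE_ROW)
            printf("id = %d\n", sqlite3_column_int(stmt, 0));
        sqlite3_finalize(stmt);
    }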
Once such type-safety and input-sanitisation attacks are dealt with, race conditions are probably next. These occur when a transaction is carried out in two
or more stages, where access rights are verified at the first stage and something sensitive is done at the second. If someone can alter the state in between
the two stages, this can lead to an attack. A classic example arose in early versions of Unix, where the command to create a directory, ‘mkdir’, used to work
in two steps: the storage was allocated, and then ownership was transferred
to the user. Since these steps were separate, a user could initiate a ‘mkdir’ in
background, and if this completed only the first step before being suspended,
a second process could be used to replace the newly created directory with a
link to the password file. Then the original process would resume, and change
ownership of the password file to the user.
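The same pattern is easy to write today; here is a minimal sketch (illustrative, not the historical mkdir code) in which the check and the use are two separate system calls, leaving a window that a concurrent attacker can win.

    #include <fcntl.h>
    #include <unistd.h>

    int serve_file(const char *path) {
        if (access(path, R_OK) != 0)   /* time of check                        */
            return -1;
        /* window: attacker replaces 'path' with a link to /etc/shadow here    */
        return open(path, O_RDONLY);   /* time of use                          */
    }

The usual fixes are to drop privilege and simply attempt the open, or to pin the directory with openat() so the path cannot be swapped underneath.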
A more modern example arises with the wrappers used in containers to
intercept system calls made by applications to the operating system, parse
them, and modify them if need be. These wrappers execute in the kernel’s
address space, inspect the enter and exit state on all system calls, and encapsulate only security logic. They generally assume that system calls are atomic,
but modern operating system kernels are highly concurrent. System calls are
not atomic with respect to each other; there are many possibilities for two
system calls to race each other for access to shared memory, which gives rise to
time-of-check-to-time-of-use (TOCTTOU) attacks. An early (2007) example calls a
path whose name spills over a page boundary by one byte, causing the kernel
to sleep while the page is fetched; it then replaces the path in memory [1996].
There have been others since, and as more processors ship in each CPU chip as
time passes, and containers become an ever more common way of deploying
applications, this sort of attack may become more and more of a problem.
Some operating systems have features specifically to deal with concurrency
attacks, but this field is still in flux.
A different type of timing attack can come from backup and recovery systems. It’s convenient if you can let users recover their own files, rather than
having to call a sysadmin – but how do you protect information assets from a
time traveller? People can reacquire access rights that were revoked, and play
even more subtle tricks.
One attack that has attracted a lot of research effort recently is return-oriented
programming (ROP) [1711]. Many modern systems try to prevent type safety
attacks by data execution prevention – marking memory as either code or
data, a measure that goes back to the Burroughs 5000; and if all the code
is signed, surely you’d think that unauthorised code cannot be executed?
Wrong! An attacker can look for gadgets – sequences of instructions with some
useful effect, ending in a return. By collecting enough gadgets, it’s possible
to assemble a machine that’s Turing powerful, and implement our attack
code as a chain of ROP gadgets. Then all one has to do is seize control of the
call stack. This evolved from the return-to-libc attack which uses the common
shared library libc to provide well-understood gadgets; many variants have
been developed since, including an attack that enables malware in an SGX
enclave to mount stealthy attacks on host apps [1691]. The latest attack variant,
block-oriented programming (BOP), can often generate attacks automatically
from crashes discovered by program fuzzing, defeating current control-flow
integrity controls [966]. This coevolution of attack and defence will no doubt
continue.
Finally there are side channels. The most recent major innovation in attack
technology targets CPU pipeline behaviour. In early 2018, two game-changing
attacks pioneered the genre: Meltdown, which exploits side-channels created
by out-of-order execution on Intel processors [1173], and Spectre, which exploits
speculative execution on Intel, AMD and Arm processors [1070]. The basic idea
is that large modern CPUs’ pipelines are so long and complex that they look
ahead and anticipate the next dozen instructions, even if these are instructions
that the current process wouldn’t be allowed to execute (imagine the access
check is two instructions in the future and the read operation it will forbid is
two instructions after that). The path not taken can still load information into a
cache and thus leak information in the form of delays. With some cunning, one
process can arrange things to read the memory of another. I will discuss Spectre
and Meltdown in more detail later in the chapter on side channels. Although
mitigations have been published, further attacks of the same general kind keep
on being discovered, and it may take several years and a new generation of processors before they are brought entirely under control. It all reminds me of a
saying by Roger Needham, that optimisation consists of replacing something
that works with something that almost works, but is cheaper. Modern CPUs
are so heavily optimised that we’re bound to see more variants on the Spectre
theme. Such attacks limit the protection that can be offered not just by containers and VMs, but also by enclave mechanisms such as TrustZone and SGX.
In particular, they may stop careful firms from entrusting high-value cryptographic keys to enclaves and prolong the service life of old-fashioned hardware
cryptography.
6.4.3 User interface failures
A common way to attack a fortress is to trick the guards into helping you, and
operating systems are no exception. One of the earliest attacks was the Trojan
Horse, a program the administrator is invited to run but which contains a nasty
surprise. People would write games that checked whether the player was the
system administrator, and if so would create another administrator account
with a known password. A variant was to write a program with the same name
as a common system utility, such as the ls command which lists all the files
in a Unix directory, and design it to abuse the administrator privilege (if any)
before invoking the genuine utility. You then complain to the administrator that
something’s wrong with the directory. When they enter the directory and type
ls to see what’s there, the damage is done. This is an example of the confused
deputy problem: if A does some task on behalf of B, and its authority comes
from both A and B, and A’s authority exceeds B, things can go wrong. The fix
in this particular case was simple: an administrator’s ‘PATH’ variable (the list
of directories to be searched for a suitably-named program when a command is
invoked) should not contain ‘.’ (the symbol for the current directory). Modern
Unix versions ship with this as a default. But it’s still an example of how you
have to get lots of little details right for access control to be robust, and these
details aren’t always obvious in advance.
Perhaps the most serious example of user interface failure, in terms of the
number of systems historically attacked, consists of two facts: first, Windows
is forever popping up confirmation dialogues, which trained people to click
boxes away to get their work done; and second, that until 2006 a user needed
to be the administrator to install anything. The idea was that restricting software installation to admins enabled Microsoft’s big corporate customers, such
as banks and government departments, to lock down their systems so that staff
couldn’t run games or other unauthorised software. But in most environments,
ordinary people need to install software to get their work done. So hundreds
of millions of people had administrator privileges who shouldn’t have needed
them, and installed malicious code when a website simply popped up a box
telling them to do something. This was compounded by the many application
developers who insisted that their code run as root, either out of laziness or
because they wanted to collect data that they really shouldn’t have had. Windows Vista started to move away from this, but a malware ecosystem is now
well established in the PC world, and one is starting to take root in the Android
ecosystem as businesses pressure people to install apps rather than using websites, and the apps demand access to all sorts of data and services that they
really shouldn’t have. We’ll discuss this later in the chapter on phones.
6.4.4 Remedies
Software security is not all doom and gloom; things got substantially better
during the 2000s. At the turn of the century, 90% of vulnerabilities were buffer
overflows; by the time the second edition of this book came out in 2008, it was
just under half, and now it’s even less. Several things made a difference.
1. The first consists of specific defences. Stack canaries are a random
number inserted by the compiler next to the return address on the
stack. If the stack is overwritten, then with high probability the canary
will change [484]; a conceptual sketch appears after this list. Data execution prevention (DEP) marks all memory as
either data or code, and prevents the former being executed; it appeared
in 2003 with Windows XP. Address space layout randomisation (ASLR)
arrived at the same time; by making the memory layout different in
each instance of a system, it makes it harder for an attacker to predict
target addresses. This is particularly important now that there are
toolkits to do ROP attacks, which bypass DEP. Control flow integrity
mechanisms involve analysing the possible control-flow graph at
compile time and enforcing this at runtime by validating indirect
control-flow transfers; this appeared in 2005 and was incorporated
in various products over the following decade [351]. However the
analysis is not precise, and block-oriented programming attacks
are among the tricks that have evolved to exploit the gaps [966].
2. The second consists of better general-purpose tools. Static-analysis programs such as Coverity can find large numbers of potential software
bugs and highlight ways in which code deviates from best practice;
if used from the start of a project, they can make a big difference. (If
added later, they can throw up thousands of alerts that are a pain to deal
with.) The radical solution is to use a better language; my colleagues
increasingly write systems code in Rust rather than in C or C++10 .
3. The third is better training. In 2002, Microsoft announced a security
initiative that involved every programmer being trained in how to
write secure code. (The book they produced for this, ‘Writing Secure
Code’ [929], is still worth a read.) Other companies followed suit.
4. The latest approach is DevSecOps, which I discuss in section 27.5.6.
Agile development methodology is extended to allow very rapid
deployment of patches and response to incidents; it may enable the
effort put into design, coding and testing to be aimed at the most urgent
problems.
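As a purely conceptual sketch of point 1: real compilers (for example gcc or clang with -fstack-protector) insert the reference value, the copy and the check themselves, and control where the canary sits relative to the saved return address; the C below only illustrates the idea and guarantees nothing about actual stack layout.

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical reference value; real implementations pick it at random
     * at program start and keep it in thread-local storage or a register. */
    static unsigned long reference_canary = 0x9e3779b97f4a7c15UL;

    void guarded(const char *input) {
        unsigned long canary = reference_canary; /* placed near the saved state */
        char buf[64];
        strcpy(buf, input);   /* deliberately unchecked: an over-long input
                                 is meant to clobber 'canary' before it can
                                 reach the saved return address              */
        if (canary != reference_canary)
            abort();          /* stack smashing detected: die, don't return  */
    }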
Architecture matters; having clean interfaces that evolve in a controlled
way, under the eagle eye of someone experienced who has a long-term stake
in the security of the product, can make a huge difference. Programs should
only have as much privilege as they need: the principle of least privilege [1642].
Software should also be designed so that the default configuration, and in
general, the easiest way of doing something, should be safe. Sound architecture is critical in achieving safe defaults and using least privilege. However,
many systems are shipped with dangerous defaults and messy code, exposing
all sorts of interfaces to attacks like SQL injection that just shouldn’t happen.
These involve failures of incentives, personal and corporate, as well as
inadequate education and the poor usability of security tools.
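A minimal sketch of least privilege in a Unix daemon, assuming an unprivileged account exists (the name 'daemon-user' here is made up): keep root only long enough to bind the privileged port, then drop it, so that a later compromise of the process gains much less.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <pwd.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { 0 };
        addr.sin_family = AF_INET;
        addr.sin_port = htons(80);               /* binding port 80 needs root */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        if (s < 0 || bind(s, (struct sockaddr *)&addr, sizeof addr) != 0)
            exit(1);

        struct passwd *pw = getpwnam("daemon-user");     /* hypothetical user  */
        if (!pw || setgid(pw->pw_gid) != 0 || setuid(pw->pw_uid) != 0)
            exit(1);             /* refuse to run if privilege can't be dropped */

        /* ...from here on the process no longer runs as root... */
        return 0;
    }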
6.4.5 Environmental creep
Many security failures result when environmental change undermines a security model. Mechanisms that worked adequately in an initial environment
often fail in a wider one.
Access control mechanisms are no exception. Unix, for example, was originally designed as a ‘single user Multics’ (hence the name). It then became an
operating system to be used by a number of skilled and trustworthy people
in a laboratory who were sharing a single machine. In this environment the
function of the security mechanisms is mostly to contain mistakes; to prevent
one user’s typing errors or program crashes from deleting or overwriting
another user’s files. The original security mechanisms were quite adequate for
this purpose.
10 Rust emerged from Mozilla research in 2010 and has been used to redevelop Firefox; it's been voted the favourite language in the Stack Overflow annual survey from 2016–2019.
But Unix security became a classic ‘success disaster’. Over the 50 years since
Ken Thomson started work on it at Bell Labs in 1969, Unix was repeatedly
extended without proper consideration being given to how the protection
mechanisms also needed to be extended. The Berkeley versions assumed an
extension from a single machine to a network of machines that were all on
one LAN and all under one management. The Internet mechanisms (telnet,
ftp, DNS, SMTP) were originally written for mainframes on a secure network.
Mainframes were autonomous, the network was outside the security protocols, and there was no transfer of authorisation. So remote authentication,
which the Berkeley model really needed, was simply not supported. The
Sun extensions such as NFS added to the party, assuming a single firm with
multiple trusted LANs. We’ve had to retrofit protocols like Kerberos, TLS and
SSH as duct tape to hold the world together. The arrival of billions of phones,
which communicate sometimes by wifi and sometimes by a mobile network,
and which run apps from millions of authors (most of them selfish, some of
them actively malicious), has left security engineers running ever faster to
catch up.
Mixing many different models of computation together has been a factor
in the present chaos. Some of their initial assumptions still apply partially,
but none of them apply globally any more. The Internet now has billions of
phones, billions of IoT devices, maybe a billion PCs, and millions of organisations whose managers not only fail to cooperate but may be in conflict.
There are companies that compete; political groups that despise each other,
and nation states that are at war with each other. Users, instead of being trustworthy but occasionally incompetent, are now largely unskilled – but some
are both capable and hostile. Code used to be simply buggy – but now there
is a lot of malicious code out there. Attacks on communications used to be the
purview of intelligence agencies – now they can be done by youngsters who’ve
downloaded attack tools from the net and launched them without any real idea
of how they work.
6.5 Summary
Access control mechanisms operate at a number of levels in a system, from
the hardware up through the operating system and middleware like browsers
to the applications. Higher-level mechanisms can be more expressive, but also
tend to be more vulnerable to attack for a variety of reasons ranging from intrinsic complexity to implementer skill.
The main function of access control is to limit the damage that can be
done by particular groups, users, and programs whether through error or
malice. The most widely fielded examples are Android and Windows at the
client end and Linux at the server end; they have a common lineage and
many architectural similarities. The basic mechanisms (and their problems)
are pervasive. Most attacks involve the opportunistic exploitation of bugs;
products that are complex, widely used, or both are particularly likely to have
vulnerabilities found and turned into exploits. Many techniques have been
developed to push back on the number of implementation errors, to make
it less likely that the resulting bugs give rise to vulnerabilities, and harder to
turn the vulnerabilities into exploits; but the overall dependability of large
software systems improves only slowly.
Research problems
Most of the issues in access control were identified by the 1960s or early 1970s
and were worked out on experimental systems such as Multics [1687] and
the CAP [2024]. Much of the research in access control systems since then has
involved reworking the basic themes in new contexts, such as mobile phones.
Recent threads of research include enclaves, and the CHERI mechanisms for
adding finer-grained access control. Another question is: how will developers
use such tools effectively?
In the second edition I predicted that ‘a useful research topic for the next
few years will be how to engineer access control mechanisms that are not just
robust but also usable – by both programmers and end users.’ Recent work
by Yasemin Acar and others has picked that up and developed it into one
of the most rapidly-growing fields of security research [11]. Many if not most
technical security failures are due at least in part to the poor usability of the protection mechanisms that developers are expected to use. I already mentioned in
the chapter on cryptography how crypto APIs often induce people to use really
unsafe defaults, such as encrypting long messages with ECB mode; access control is just as bad, as anyone coming cold to the access control mechanisms in
a Windows system or either an Intel or Arm CPU will find.
As a teaser, here’s a new problem. Can we extend what we know about access
control at the technical level – whether hardware, OS or app – to the organisational level? In the 20th century, there were a number of security policies
proposed, from Bell-LaPadula to Clark-Wilson, which we discuss at greater
length in Part 2. Is it time to revisit this for a world of deep outsourcing and
virtual organisations, now that we have interesting technical analogues?
Further reading
There’s a history of virtualisation and containers by Allison Randal at [1578];
a discussion of how mandatory access controls were adapted to operating
systems such as OS X and iOS by Robert Watson in [1997]; and a reference
book for Java security written by its architect Li Gong [784]. The Cloud Native
Security Foundation is trying to move people towards better open-source
practices around containers and other technologies for deploying and managing cloud-native software. Going back a bit, the classic descriptions of Unix
security are by Fred Grampp and Robert Morris in 1984 [806] and by Simson
Garfinkel and Eugene Spafford in 1996 [753], while the classic on Internet
security by Bill Cheswick and Steve Bellovin [222] gives many examples of
network attacks on Unix systems.
Carl Landwehr gives a useful reference to many of the flaws found in operating systems in the 1960s through the 1980s [1131]. One of the earliest reports
on the subject (and indeed on computer security in general) is by Willis Ware
in 1970 [1990]; Butler Lampson’s seminal paper on the confinement problem
appeared in 1973 [1127] and three years later, another influential early paper
was written by Jerry Saltzer and Mike Schroeder [1642]. The textbook we get
our students to read on access control issues is Dieter Gollmann’s ‘Computer
Security’ [780]. The standard reference on Intel’s SGX and indeed its CPU security architecture is by Victor Costan and Srini Devadas [479].
The field of software security is fast-moving; the attacks change significantly
(at least in their details) from one year to the next. The classic starting point is
Gary McGraw’s 2006 book [1268]. Since then we’ve had ROP attacks, Spectre
and much else; a short but useful update is Matthias Payer’s Software Security [1506]. But to really keep up, it’s not enough to just read textbooks; you
need to follow security conferences such as Usenix and CCS as well as the
security blogs such as Bruce Schneier, Brian Krebs and – dare I say it – our own
lightbluetouchpaper.org. The most detail on the current attacks is probably in Google’s Project Zero blog; see for example their analysis of attacks
on iPhones found in the wild for an insight into what’s involved in hacking
modern operating systems with mandatory access control components [205].
CHAPTER 7
Distributed Systems
A distributed system is one in which the failure of a computer you didn’t even know existed can
render your own computer unusable.
– LESLIE LAMPORT [1125]
What’s in a name? That which we call a rose by any other name would smell as sweet.
– WILLIAM SHAKESPEARE
7.1 Introduction
We need a lot more than authentication, access control and cryptography to
build a robust distributed system of any size. Some things need to happen
quickly, or in the right order, and matters that are trivial to deal with for a few
machines become a big deal once we have hyperscale data centres with complex arrangements for resilience. Everyone must have noticed that when you
update your address book with an online service provider, the update might
appear a second later on another device, or perhaps only hours later.
Over the last 50 years, we’ve learned a lot about issues such as concurrency,
failure recovery and naming as we’ve built things ranging from phone systems
and payment networks to the Internet itself. We have solid theory, and a lot of
hard-won experience. These issues are central to the design of robust secure
systems but are often handled rather badly. I’ve already described attacks
on protocols that arise as concurrency failures. If we replicate data to make
a system fault-tolerant, then we may increase the risk of data theft. Finally,
naming can be a thorny problem. There are complex interactions of people
and objects with accounts, sessions, documents, files, pointers, keys and other
ways of naming stuff. Many organisations are trying to build larger, flatter
namespaces – whether using identity cards to track citizens or using device ID
to track objects – but there are limits to what we can practically do. Big data
means dealing with lots of identifiers, many of which are ambiguous or even
changing, and a lot of things can go wrong.
7.2 Concurrency
Processes are called concurrent if they can run at the same time, and this is
essential for performance; modern computers have many cores and run many
programs at a time, typically for many users. However, concurrency is hard to
do robustly, especially when processes can act on the same data. Processes may
use old data; they can make inconsistent updates; the order of updates may or
may not matter; the system might deadlock; the data in different systems might
never converge to consistent values; and when it’s important to make things
happen in the right order, or even to know the exact time, this can be trickier
than you might think. These issues go up and down the entire stack.
Systems are becoming ever more concurrent for a number of reasons. First
is scale: Google may have started off with four machines but their fleet passed
a million in 2011. Second is device complexity; a luxury car can now contain
dozens to hundreds of different processors. The same holds for your laptop and
your mobile phone. Deep within each CPU, instructions are executed in parallel, and this complexity leads to the Spectre attacks we discussed in the chapter
on access control. On top of this, virtualization technologies such as Xen are
the platforms on which modern cloud services are built, and they may turn a
handful of real CPUs in a server into hundreds or even thousands of virtual
CPUs. Then there’s interaction complexity: going up to the application layer,
an everyday transaction such as booking a rental car may call other systems to
check your credit card, your credit reference agency score, your insurance claim
history and much else, while these systems in turn may depend on others.
Programming concurrent systems is hard, and the standard textbook
examples come from the worlds of operating system internals and of performance measurement. Computer scientists are taught Amdahl’s law: if the
proportion that can be parallelised is p and s is the speedup from the extra
resources, the overall speedup is (1 − p + p/s)^{-1}. Thus if three-quarters of your
program can be parallelised but the remaining quarter cannot be, then the
maximum speedup you can get is four times; and if you throw eight cores
at it, the practical speedup is not quite three times1 . But concurrency control
in the real world is also a security issue. Like access control, it is needed to
prevent users interfering with each other, whether accidentally or on purpose.
And concurrency problems can occur at many levels in a system, from the
hardware right up to the business logic. In what follows, I provide a number
of concrete examples; they are by no means exhaustive.
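Setting the formula and the footnote's arithmetic in display form, for p = 3/4 and s = 8:

    \[
      S = \frac{1}{1 - p + p/s}, \qquad
      S = \left(1 - \tfrac{3}{4} + \tfrac{3}{4}\cdot\tfrac{1}{8}\right)^{-1}
        = (0.25 + 0.09375)^{-1} = (0.34375)^{-1} \approx 2.909
    \]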
1 (1 − 3/4 + (3/4)·(1/8))^{-1} = (0.25 + 0.09375)^{-1} = (0.34375)^{-1} = 2.909
7.2.1 Using old data versus paying to propagate state
I’ve already described two kinds of concurrency problem: replay attacks on
protocols, where an attacker manages to pass off out-of-date credentials; and
race conditions, where two programs can race to update some security state.
As an example, I mentioned the ‘mkdir’ vulnerability from Unix, in which a
privileged instruction that is executed in two phases could be attacked halfway
through by renaming the object on which it acts. Another example goes back to
the 1960s, where in one of the first multiuser operating systems, IBM’s OS/360,
an attempt to open a file caused it to be read and its permissions checked; if
the user was authorised to access it, it was read again. The user could arrange
things so that the file was altered in between [1131].
These are examples of a time-of-check-to-time-of-use (TOCTTOU) attack. We
have systematic ways of finding such attacks in file systems [252], but attacks
still crop up both at lower levels, such as system calls in virtualised environments, and at higher levels such as business logic. Preventing them isn’t always
economical, as propagating changes in security state can be expensive.
A good case study is card fraud. Since credit and debit cards became popular in the 1970s, the banking industry has had to manage lists of hot cards
(whether stolen or abused), and the problem got steadily worse in the 1980s
as card networks went international. It isn’t possible to keep a complete hot
card list in every merchant terminal, as we’d have to broadcast all loss reports
instantly to tens of millions of devices, and even if we tried to verify all transactions with the bank that issued the card, we’d be unable to use cards in
places with no network (such as in remote villages and on airplanes) and we’d
impose unacceptable costs and delays elsewhere. Instead, there are multiple
levels of stand-in processing, exploiting the fact that most payments are local,
or low-value, or both.
Merchant terminals are allowed to process transactions up to a certain limit
(the floor limit) offline; larger transactions need online verification with the merchant’s bank, which will know about all the local hot cards plus foreign cards
that are being actively abused; above another limit it might refer the transaction to a network such as VISA with a reasonably up-to-date international
list; while the largest transactions need a reference to the card-issuing bank. In
effect, the only transactions that are checked immediately before use are those
that are local or large.
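A sketch of that tiered decision in code, with invented thresholds (real schemes set floor limits per merchant, currency and risk profile):

    /* Illustrative stand-in processing: route a card transaction to the
     * cheapest check that its value justifies. All numbers are made up. */
    enum route { OFFLINE_FLOOR, ACQUIRER_BANK, CARD_NETWORK, ISSUER_BANK };

    enum route route_transaction(double amount) {
        const double floor_limit    = 50.0;    /* terminal decides alone           */
        const double acquirer_limit = 500.0;   /* merchant's bank: local hot cards */
        const double network_limit  = 5000.0;  /* card network's international list */

        if (amount <= floor_limit)    return OFFLINE_FLOOR;
        if (amount <= acquirer_limit) return ACQUIRER_BANK;
        if (amount <= network_limit)  return CARD_NETWORK;
        return ISSUER_BANK;                    /* largest: refer to the issuer     */
    }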
Experience then taught that a more centralised approach can work better for
bad terminals. About half the world’s ATM transactions use a service that gets
alerts from subscribing banks when someone tries to use a stolen card at an
ATM, or guesses the PIN wrong. FICO observed that criminals take a handful
of stolen cards to a cash machine and try them out one by one; they maintain a
list of the 40 ATMs worldwide that have been used most recently for attempted
fraud, and banks that subscribe to their service decline all transactions at those
machines – which become unusable by those banks’ cards for maybe half an
hour. Most thieves don’t understand this and just throw them away.
Until about 2010, payment card networks had the largest systems that manage the global propagation of security state, and their experience taught us
that revoking compromised credentials quickly and on a global scale is expensive. The lesson was learned elsewhere too; the US Department of Defense, for
example, issued 16 million certificates to military personnel during 1999–2005,
by which time it had to download 10 million revoked certificates to all security
servers every day, and some systems took half an hour to do this when they
were fired up [1301].
The costs of propagating security state can lead to centralisation. Big service
firms such as Google, Facebook and Microsoft have to maintain credentials
for billions of users anyway, so they offer logon as a service to other websites. Other firms, such as certification authorities, also provide online credentials. But although centralisation can cut costs, a compromise of the central
service can be disruptive. In 2011, for example, hackers operating from Iranian
IP addresses compromised the Dutch certification authority Diginotar. On July
9th, they generated fake certificates and did middleperson attacks on the gmail
of Iranian activists. Diginotar noticed on the 19th that certificates had been
wrongly issued but merely called in its auditors. The hack became public on the
29th, and Google reacted by removing all Diginotar certificates from Chrome
on September 3rd, and getting Mozilla to do likewise. This led immediately to
the failure of the company, and Dutch public services were unavailable online
for many days as ministries scrambled to get certificates for their web services
from other suppliers [471].
7.2.2 Locking to prevent inconsistent updates
When people work concurrently on a document, they may use a version control
system to ensure that only one person has write access at any one time to any
given part of it, or at least to warn of contention and flag up any inconsistent
edits. Locking is one general way to manage contention for resources such as
filesystems and to make conflicting updates less likely. Another approach is
callback; a server may keep a list of all those clients which rely on it for security
state and notify them when the state changes.
Credit cards again provide an example of how this applies to security. If I
own a hotel and a customer presents a credit card on check-in, I ask the card
company for a pre-authorisation, which records that I will want to make a debit
in the near future; I might register a claim on ‘up to $500’. This is implemented
by separating the authorisation and settlement systems. Handling the failure
modes can be tricky. If the card is cancelled the following day, my bank can
call me and ask me to contact the police, or to get her to pay cash2 . This is an
example of the publish-register-notify model of how to do robust authorisation
in distributed systems (of which there’s a more general description in [153]).
Callback mechanisms don’t provide a universal solution, though. The credential issuer might not want to run a callback service, and the customer might
object on privacy grounds to the issuer being told all her comings and goings.
Consider passports as another example. In many countries, government ID is
required for many transactions, but governments won’t provide any guarantee, and most citizens would object if the government kept a record of every
time an ID document was presented. Indeed, one of the frequent objections to
the Indian government’s requirement that the Aadhar biometric ID system be
used in more and more transactions is that checking citizens’ fingerprints or
iris codes at all significant transactions creates an audit trail of all the places
where they have done business, which is available to officials and to anyone
who cares to bribe them.
There is a general distinction between those credentials whose use gives rise
to some obligation on the issuer, such as credit cards, and the others, such
as passports. Among the differences is whether the credential’s use changes
important state, beyond possibly adding to a log file or other surveillance
system. This is linked with whether the order in which updates are made is
important.
7.2.3 The order of updates
If two transactions arrive at the government’s bank account – say a credit of
$500,000 and a debit of $400,000 – then the order in which they are applied
may not matter much. But if they’re arriving at my bank account, the order will
have a huge effect on the outcome! In fact, the problem of deciding the order
in which transactions are applied has no clean solution. It’s closely related to
the problem of how to parallelise a computation, and much of the art of building efficient distributed systems lies in arranging matters so that processes are
either simply sequential or completely parallel.
The traditional bank algorithm was to batch the transactions overnight and
apply all the credits for each account before applying all the debits. Inputs
from devices such as ATMs and check sorters were first batched up into
journals before the overnight reconciliation. Payments which bounce then
2 My bank might or might not have guaranteed me the money; it all depends on what sort of contract I've got with it. There were also attacks for a while when crooks figured out how to impersonate a store and cancel an authorisation so that a card could be used to make multiple big purchases. And it might take a day or three for the card-issuing bank to propagate an alarm to the merchant's bank. A deep dive into all this would be a book chapter in itself!
have to be reversed out – and in the case of ATM and debit transactions where
the cash has already gone, you can end up with customers borrowing money
without authorisation. In practice, chains of failed payments terminate. In
recent years, one country after another has introduced real-time gross settlement
(RTGS) systems in which transactions are booked in order of arrival. There
are several subtle downsides. First, at many institutions, the real-time system
for retail customers is an overlay on a platform that still works by overnight
updates. Second, the outcome can depend on the order of transactions, which in turn depends on human, system and network vagaries – an issue when many very large payments are made between financial institutions.
Credit cards operate a hybrid strategy, with credit limits run in real time while
settlement is run just as in an old-fashioned checking account.
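To make the overnight-batch idea concrete, here is a minimal sketch in Python (mine, not the book's; the account numbers and amounts are invented). It applies all of a day's credits to each account before any of its debits, and flags debits that would overdraw so they can be reversed out later.

    def overnight_batch(balances, journal):
        """Apply one day's journal to account balances, credits first.
        balances: dict of account -> balance in cents
        journal:  list of (account, amount); positive = credit, negative = debit
        Returns the list of bounced debits that must be reversed out."""
        bounced = []
        for account, amount in journal:          # pass 1: all the credits
            if amount > 0:
                balances[account] = balances.get(account, 0) + amount
        for account, amount in journal:          # pass 2: all the debits
            if amount < 0:
                if balances.get(account, 0) + amount < 0:
                    bounced.append((account, amount))
                else:
                    balances[account] += amount
        return bounced

    # A credit of $500,000 and a debit of $400,000 now leave my account in the
    # same state whichever order they were captured in during the day.
    accounts = {"12345678": 0}
    print(overnight_batch(accounts, [("12345678", -40_000_000), ("12345678", 50_000_000)]))
    print(accounts)   # {'12345678': 10000000}
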
In the late 2010s, the wave of interest in cryptocurrency led some
entrepreneurs to believe that a blockchain might solve the problems of
inconsistent update, simplifying applications such as supply-chain management. The energy costs rule out a blockchain based on proof-of-work for most
applications; but might some other kind of append-only public ledger find
a killer app? We will have to wait and see. Meanwhile, the cryptocurrency
community makes extensive use of off-chain mechanisms that are often very
reminiscent of the checking-account approach: disconnected applications
propose tentative updates that are later reconciled and applied to the main
chain. Experience suggests that there is no magic solution that works in the
general case, short perhaps of having a small number of very large banks that
are very competent at technology. We’ll discuss this further in the chapter on
banking.
In other systems, the order in which transactions arrive is much less important. Passports are a good example. Passport issuers only worry about their creation and expiration dates, not the order in which visas are stamped on them3.

3 Many Arab countries won't let you in with an Israeli stamp on your passport, but most pure identification systems are essentially stateless.

7.2.4 Deadlock
Another problem is deadlock, where two systems are each waiting for the other
to move first. Edsger Dijkstra famously explained this problem, and its possible
solutions, via the dining philosophers’ problem. A number of philosophers are
seated round a table, with a chopstick between each of them; and a philosopher
can only eat when they can pick up the two chopsticks on either side. So if all
of them try to eat at once and each picks up the chopstick on their right, they
get stuck [560].
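Here is a minimal Python sketch of the problem and of the classic resource-ordering fix (my illustration, not Dijkstra's formulation): if every philosopher always picks up the lower-numbered chopstick first, no circular chain of waiting can form, so the deadlock in the all-grab-right scenario cannot occur.

    import threading

    N = 5
    chopsticks = [threading.Lock() for _ in range(N)]

    def philosopher(i, rounds=1000):
        left, right = i, (i + 1) % N
        # If everyone grabbed their right-hand chopstick first, all five could
        # each hold one lock and wait forever for the other. Acquiring the
        # lower-numbered chopstick first breaks the circular wait.
        first, second = min(left, right), max(left, right)
        for _ in range(rounds):
            with chopsticks[first]:
                with chopsticks[second]:
                    pass   # eat

    threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("all philosophers ate without deadlock")
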
This can get really complex when you have multiple hierarchies of locks distributed across systems, some of which fail (and where failures can mean that the locks aren't reliable) [152]. And deadlock is not just about technology; the phrase 'Catch-22' has become popular to describe deadlocks in bureaucratic processes4. Where a process is manual, some fudge may be found to get round the catch, but when everything becomes software, this option may no longer be available.

4 Joseph Heller's 1961 novel of that name described multiple instances of inconsistent and crazy rules in the World War 2 military bureaucracy.

In a well-known business problem – the battle of the forms – one company issues an order with its own contract terms attached, another company accepts it subject to its own terms, and trading proceeds without any further agreement. In the old days, the matter might only be resolved if something went wrong and the companies ended up in court; even so, one company's terms might specify an American court while the other's specified one in England. As
trading has become more electronic, the winner is often the company that can
compel the loser to trade using its website and thus accept its terms and conditions. Firms increasingly try to make sure that things fail in their favour. The
resulting liability games can have rather negative outcomes for both security
and safety; we’ll discuss them further in the chapter on economics.
7.2.5 Non-convergent state
When designing protocols that update the state of a distributed system, the
‘motherhood and apple pie’ is ACID – that transactions should be atomic, consistent, isolated and durable. A transaction is atomic if you ‘do it all or not at
all’ – which makes it easier to recover after a failure. It is consistent if some
invariant is preserved, such as that the books must still balance. This is common
in banking systems, and is achieved by insisting that the sum total of credits
and debits made by each transaction is zero (I’ll discuss this more in the chapter
on banking and bookkeeping). Transactions are isolated if they are serialisable,
and they are durable if once done they can’t be undone.
These properties can be too much, or not enough, or both. On the one hand,
each of them can fail or be attacked in numerous obscure ways; on the other,
it’s often sufficient to design the system to be convergent. This means that, if
the transaction volume were to tail off, then eventually there would be consistent state throughout [1355]. Convergence is usually achieved using semantic
tricks such as timestamps and version numbers; this can often be enough where
transactions get appended to files rather than overwritten.
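As a minimal sketch of the version-number trick (an illustration under my own assumptions, not a description of any particular product): each record carries a version counter, a replica accepts an update only if it is newer than what it already holds, and replicas that see the same updates in different orders still end up in the same state.

    def merge(replica, update):
        """Apply an update (key, version, value) only if it is newer."""
        key, version, value = update
        current = replica.get(key)
        if current is None or version > current[0]:
            replica[key] = (version, value)

    a, b = {}, {}
    updates = [("credit_limit", 1, 500), ("credit_limit", 3, 900), ("credit_limit", 2, 700)]
    for u in updates:             # replica a sees them in one order...
        merge(a, u)
    for u in reversed(updates):   # ...replica b sees them in the reverse order
        merge(b, u)
    assert a == b == {"credit_limit": (3, 900)}   # both converge
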
In real life, you also need ways to survive things that go wrong and are
not completely recoverable. The life of a security or audit manager can be a
constant battle against entropy: apparent deficits (and surpluses) are always
turning up, and sometimes simply can’t be explained. For example, different
national systems have different ideas of which fields in bank transaction
records are mandatory or optional, so payment gateways often have to guess
data in order to make things work. Sometimes they guess wrong; and sometimes people see and exploit vulnerabilities which aren’t understood until
much later (if ever). In the end, things may get fudged by adding a correction
factor and setting a target for keeping it below a certain annual threshold.
Durability is a subject of debate in transaction processing. The advent of
phishing and keylogging attacks has meant that some small proportion of
bank accounts will at any time be under the control of criminals; money gets
moved both from them and through them. When an account compromise is
detected, the bank moves to freeze it and perhaps to reverse payments that
have recently been made from it. The phishermen naturally try to move funds
through institutions, or jurisdictions, that don’t do transaction reversal, or
do it at best slowly and grudgingly [76]. This sets up a tension between the
recoverability and thus the resilience of the payment system on the one hand
and transaction durability and finality on the other5.

5 This problem goes back centuries, with a thicket of laws around whether someone acting in good faith can acquire good title to stolen goods or stolen funds. The Bills of Exchange Act 1882 gave good title to people who bought bills of exchange in good faith, even if they were stolen. Something similar used to hold for stolen goods bought in an open market, but that was eventually repealed. In the case of electronic payments, the banks acted as a cartel to make payments final more quickly, both via card network rules and by lobbying European institutions over the Payment Services Directives. As for the case of bitcoin, it's still in flux; see section 20.7.5.

7.2.6 Secure time
The final concurrency problem of special interest to the security engineer is the
provision of accurate time. As authentication protocols such as Kerberos can
be attacked by inducing clock error, it’s not enough to simply trust a random
external time source. One possibility is a Cinderella attack: if a security critical
program such as a firewall has a licence with a timelock, an attacker might wind
your clock forward “and cause your firewall to turn into a pumpkin”. Given
the spread of IoT devices that may be safety-critical and use time in ways that
are poorly understood, there is now some concern about possible large-scale
service denial attacks. Time is a lot harder than it looks: even if you have an
atomic clock, leap seconds cannot be predicted but need to be broadcast somehow; some minutes have 61 and even 62 seconds; odd time effects can be a
security issue6; and much of the world is not using the Gregorian calendar.

6 Some ATMs didn't check customer balances for a few days after Y2K, leading to unauthorised overdrafts once the word got round.

Anyway, there are several possible approaches to the provision of secure
time. You can give every computer a radio clock, and indeed your smartphone
has GPS – but that can be jammed by a passing truck driver. You can abandon absolute time and instead use Lamport time, in which all you care about is
whether event A happened before event B rather than what date it is [1124].
For robustness reasons, Google doesn’t use time in its internal certificates, but
uses ranges of serial numbers coupled to a revocation mechanism [23].
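A Lamport clock takes only a few lines to sketch (a generic illustration, not Google's scheme): each process keeps a counter, increments it for every local event or message sent, and on receiving a message sets its counter just above the larger of its own value and the timestamp in the message. If event A could have caused event B, A's timestamp is then guaranteed to be smaller.

    class LamportClock:
        def __init__(self):
            self.time = 0

        def tick(self):
            # called for a local event or just before sending a message
            self.time += 1
            return self.time

        def receive(self, msg_time):
            # called when a message stamped msg_time arrives
            self.time = max(self.time, msg_time) + 1
            return self.time

    p, q = LamportClock(), LamportClock()
    t_send = p.tick()            # p does some work and sends a message
    t_recv = q.receive(t_send)   # q receives it
    assert t_send < t_recv       # causal order shows up in the timestamps
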
In many applications, you may end up using the network time protocol (NTP).
This has a moderate amount of protection, with clock voting and authentication of time servers, and is dependable enough for many purposes. However, you still need to take care. First, Netgear hardwired their home
routers to use an NTP server at the University of Wisconsin-Madison, which
was swamped with hundreds of thousands of packets a second; Netgear ended
up having to pay them $375,000 to maintain the time service for three years.
Shortly afterwards, D-Link repeated the same mistake [447]. Second, from 2016
there have been denial-of-service attacks using NTP servers as force multipliers; millions of servers turned out to be abusable, so many ISPs and even IXPs
started blocking them. So if you’re planning to deploy lots of devices outside
your corporate network that will rely on NTP, you’d better think hard about
which servers you want to trust and pay attention to the latest guidance from
CERT [1801].
7.3 Fault tolerance and failure recovery
Failure recovery is often the most important aspect of security engineering, yet
it is one of the most neglected. For many years, most of the research papers
on computer security have dealt with confidentiality, and most of the rest with
authenticity and integrity; availability has almost been ignored. Yet the actual
expenditures of a modern information business – whether a bank or a search
engine – are the other way round. Far more is spent on availability and recovery mechanisms, such as multiple processing sites and redundant networks,
than on integrity mechanisms such as code review and internal audit, and this
in turn is way more than is spent on encryption. As you read through this book,
you’ll see that many other applications, from burglar alarms through electronic
warfare to protecting a company from DDoS attacks, are fundamentally about
availability. Fault tolerance and failure recovery are often the core of the security engineer’s job.
Classical fault tolerance is usually based on redundancy, fortified using
mechanisms such as logs and locking, and is greatly complicated when it must
withstand malicious attacks on these mechanisms. Fault tolerance interacts
with security in a number of ways: the failure model, the nature of resilience,
the location of redundancy used to provide it, and defence against service
denial attacks. I’ll use the following definitions: a fault may cause an error,
which is an incorrect state; this may lead to a failure, which is a deviation from
the system’s specified behavior. The resilience which we build into a system
to tolerate faults and recover from failures will have a number of components,
such as fault detection, error recovery and if necessary failure recovery. The
meaning of mean-time-between-failures (MTBF) and mean-time-to-repair (MTTR)
should be obvious.
7.3.1 Failure models
In order to decide what sort of resilience we need, we must know what sort of
attacks to expect. Much of this will come from an analysis of threats specific to
our system’s operating environment, but some general issues bear mentioning.
7.3.1.1 Byzantine failure
First, the failures with which we are concerned may be normal or malicious,
and we often model the latter as Byzantine. Byzantine failures are inspired by
the idea that there are n generals defending Byzantium, t of whom have been
bribed by the attacking Turks to cause as much confusion as possible. The generals can pass oral messages by courier, and the couriers are trustworthy, so
each general can exchange confidential and authentic communications with
each other general (we could imagine them encrypting and computing a MAC
on each message). What is the maximum number t of traitors that can be tolerated?
The key observation is that if we have only three generals, say Anthony, Basil
and Charalampos, and Anthony is the traitor, then he can tell Basil “let’s attack”
and Charalampos “let’s retreat”. Basil can now say to Charalampos “Anthony
says let’s attack”, but this doesn’t let Charalampos conclude that Anthony’s the
traitor. It could just as easily have been Basil; Anthony could have said “let’s
retreat” to both of them, but Basil lied when he said “Anthony says let’s attack”.
This beautiful insight is due to Leslie Lamport, Robert Shostak and
Marshall Pease, who proved that the problem has a solution if and only if
n ≥ 3t + 1 [1126]. Of course, if the generals are able to sign their messages,
then no general dare say different things to two different colleagues. This
illustrates the power of digital signatures in particular and of end-to-end
security mechanisms in general. There is now a substantial literature on
Byzantine fault tolerance – the detailed design of systems able to withstand
this kind of failure; see for example the algorithm by Miguel Castro and
Barbara Liskov [396].
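To see why the bound bites, compare what Charalampos observes in the two scenarios just described; this toy sketch (illustrative only) shows his view is identical in both, so no protocol can let him pick out the traitor when n = 3 and t = 1.

    # Scenario 1: Anthony is the traitor. He tells Basil "attack" and
    # Charalampos "retreat"; Basil honestly relays what he was told.
    view_if_anthony_lies = {"direct from Anthony": "retreat",
                            "relayed by Basil": "Anthony says attack"}

    # Scenario 2: Basil is the traitor. Anthony tells everyone "retreat";
    # Basil lies about what Anthony said.
    view_if_basil_lies = {"direct from Anthony": "retreat",
                          "relayed by Basil": "Anthony says attack"}

    # Charalampos cannot distinguish the two worlds, so he cannot decide whom
    # to follow: with n = 3 and t = 1 we have n < 3t + 1, matching the
    # Lamport-Shostak-Pease impossibility result.
    assert view_if_anthony_lies == view_if_basil_lies
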
Another lesson is that if a component which fails (or can be induced to fail
by an opponent) gives the wrong answer rather than just no answer, then it’s
much harder to use it to build a resilient system. It can be useful if components
that fail just stop, or if they can at least be quickly identified and blacklisted.
7.3.1.2 Interaction with fault tolerance
So we can constrain the failure rate in a number of ways. The two most
obvious are by using redundancy and fail-stop processes. The latter process
error-correction information along with data, and stop when an inconsistency
is detected; for example, bank transaction processing will typically stop if
an out-of-balance condition is detected after a processing task. The two may
be combined; the processors used in some safety-critical functions in cars
and aircraft typically have two or more cores. There was pioneering work
on a fault-tolerant multiprocessor (FTMP) in the 1970s, driven by the Space
Shuttle project; this explored which components should be redundant and
the associated design trade-offs around where the error detection takes place
and how closely everything is synchronised [922]. Such research ended up
driving the design of fault-tolerant processors used in various submarines
and spacecraft, as well as architectures used by Boeing and Airbus. The FTMP
idea was also commercialised by Tandem and then by Stratus, which sold
machines for payment processing. The Stratus had two disks, two buses and
even two CPUs, each of which would stop if it detected errors; the fail-stop
CPUs were built by having two CPU chips on the same card and comparing
their outputs. If they disagreed the output went open-circuit. A replacement
card would arrive in the post; you’d take it down to the machine room, notice
that card 5 had a flashing red light, pull it out and replace it with the new
one – all while the machine was processing dozens of transactions per second.
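In software, the paired-CPU trick amounts to the following sketch (a caricature of the idea, not the Stratus design): run the computation on two replicas, compare the outputs, and if they differ emit nothing at all, so a higher layer can switch in a spare rather than propagate a wrong answer.

    class OutputMismatch(Exception):
        """The two replicas disagreed - the card 'goes open-circuit'."""

    def failstop(replica_a, replica_b, *args):
        out_a, out_b = replica_a(*args), replica_b(*args)
        if out_a != out_b:
            raise OutputMismatch(out_a, out_b)   # no answer beats a wrong answer
        return out_a

    healthy = lambda x, y: x + y
    faulty = lambda x, y: (x + y) | 1            # a contrived stuck bit
    print(failstop(healthy, healthy, 2, 2))      # 4
    try:
        failstop(healthy, faulty, 2, 2)
    except OutputMismatch:
        print("mismatch detected - output suppressed, card flagged for replacement")
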
Nowadays, the data centres of large service firms have much more elaborate
protocols to ensure that if a machine fails, another machine takes over; if a
rack fails, another rack takes over; and even if a data centre fails, its workload
is quickly recovered on others. Google was a leader in developing the relevant
software stack, having discovered in the early 2000s that it was much cheaper
to build large-scale systems with commodity PCs and smart software than to
buy ever-larger servers from specialist vendors.
While redundancy can make a system more resilient, it has costs. First, we
have to deal with a more complex software stack and toolchain. Banks eventually moved away from Stratus because they found it was less reliable overall
than traditional mainframes: although there was less downtime due to hardware failure, this didn’t compensate for the extra software failure caused by
an unfamiliar development environment. Second, if I have multiple sites with
backup data, then confidentiality could fail if any of them gets compromised7;
and if I have some data that I have a duty to destroy, then purging it from multiple backup tapes can be a headache. The modern-day issue with developing
software in containers on top of redundant cloud services is not so much the
programming languages, or compromise via data centres; it’s that developers
are unfamiliar with the cloud service providers’ access control tools and all too
often leave sensitive data world-readable.

7 Or the communications between your data centres get tapped; we discussed in section 2.2.1.3 how GCHQ did that to Google.

There are other traps for the unwary. In one case in which I was called as
an expert, my client was arrested while using a credit card in a store, accused
of having a forged card, and beaten up by the police. He was adamant that
the card was genuine. Much later, we got the card examined by VISA, who
confirmed that it was indeed genuine. What happened, as well as we can reconstruct it, was this. Credit cards have two types of redundancy on the magnetic
strip – a simple checksum obtained by combining together all the bytes on the
track using exclusive-or, and a cryptographic checksum which we’ll describe
in detail later in section 12.5.1. The former is there to detect errors, and the latter to detect forgery. It appears that in this particular case, the merchant’s card
reader was out of alignment in such a way as to cause an even number of bit
errors which cancelled each other out by chance in the simple checksum, while
causing the crypto checksum to fail. The result was a false alarm, and a major
disruption in my client’s life.
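The mechanism of the false alarm can be sketched numerically (a simplification with an invented track string; the real cryptographic check value on a card is computed quite differently): the simple checksum is just the XOR of all the bytes, so two read errors that flip the same bit position in different bytes cancel out, while any half-decent cryptographic checksum still changes.

    import hashlib
    from functools import reduce

    def simple_checksum(data: bytes) -> int:
        # longitudinal redundancy check: XOR of all the bytes together
        return reduce(lambda a, b: a ^ b, data, 0)

    def crypto_checksum(data: bytes) -> str:
        # stand-in for the card's cryptographic check value
        return hashlib.sha256(data).hexdigest()[:8]

    track = bytearray(b";4111111111111111=2512101000000000000?")   # invented track data
    garbled = bytearray(track)
    garbled[5] ^= 0x04    # a misaligned reader flips bit 2 of one byte...
    garbled[17] ^= 0x04   # ...and the same bit of another byte

    assert simple_checksum(track) == simple_checksum(garbled)   # the errors cancel
    assert crypto_checksum(track) != crypto_checksum(garbled)   # but this check fails
    print("error checksum passes, forgery checksum fails: a genuine card looks forged")
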
Redundancy is hard enough to deal with in mechanical systems. For
example, training pilots to handle multi-engine aircraft involves drilling them
on engine failure procedures, first in the simulator and then in real aircraft
with an instructor. Novice pilots are in fact more likely to be killed by an
engine failure in a multi-engine plane than in a single; landing in the nearest
field is less hazardous for them than coping with sudden asymmetric thrust.
The same goes for instrument failures; it doesn’t help to have three artificial
horizons in the cockpit if, under stress, you rely on the one that’s broken.
Aircraft are much simpler than many modern information systems – yet there
are still air crashes when pilots fail to manage the redundancy that’s supposed
to keep them safe. There are also complex failures, as when two Boeing 737
Max aircraft crashed because of failures in a single sensor, when the plane
had two but the software failed to read them both, and the pilots hadn’t been
trained how to diagnose the problem and manage the consequences. All too
often, system designers put in multiple protection mechanisms and don’t
think through the consequences carefully enough. Many other safety failures
are failures of usability, and the same applies to security, as we discussed in
Chapter 3; redundancy isn’t an antidote to poor design.
7.3.2 What is resilience for?
When introducing redundancy or other resilience mechanisms into a system,
we need to understand what they’re for and the incentives facing the various
actors. It therefore matters whether the resilience is local or crosses geographical or organisational boundaries.
In the first case, replication can be an internal feature of the server to make it
more trustworthy. I already mentioned 1980s systems such as Stratus and Tandem; then we had replication of standard hardware at the component level,
such as redundant arrays of inexpensive disks (RAID). Since the late 1990s there
has been massive investment in developing rack-scale systems that let multiple cheap PCs do the work of expensive servers, with mechanisms to ensure
a single server that fails will have its workload taken over rapidly by another,
and indeed a rack that fails can also be recovered on a hot spare. These are now
a standard component of cloud service architecture: any firm operating hundreds of thousands of servers will have so many failures that recovery must be
largely automated.
But often things are much more complicated. A service may have to assume
that some of its clients are trying to cheat it and may also have to rely on a number of services, none of which is completely accurate. When opening a bank
account, or issuing a passport, we might want to check against services from
voter rolls through credit reference agencies to a database of driver’s licences,
and the results may often be inconsistent. Trust decisions may involve complex
logic, not entirely unlike the systems used in electronic warfare to try to work
out which of your inputs are being jammed. (I’ll discuss these further in the
chapter on electronic and information warfare.)
The direction of mistrust has an effect on protocol design. A server faced with
multiple untrustworthy clients and a client relying on multiple servers that
may be incompetent, unavailable or malicious will both wish to control the
flow of messages in a protocol in order to contain the effects of service denial.
It’s hard to design systems for the real world in which everyone is unreliable
and all are mutually suspicious.
Sometimes the emphasis is on security renewability. The obvious example here
is bank cards: a bank can upgrade security from time to time by mailing out
newer versions of its cards, whether upgrading from mag strip to chip or from
cheap chips to more sophisticated ones; and it can recover from a compromise by mailing replacement cards out of cycle to affected customers. Pay TV and mobile
phones are somewhat similar.
7.3.3 At what level is the redundancy?
Systems may be made resilient against errors, attacks and equipment failures
at a number of levels. As with access control, these become progressively more
complex and less reliable as we go up to higher layers in the system.
Some computers have been built with redundancy at the hardware level,
such as the Stratus systems and RAID discs I mentioned earlier. But simple
replication cannot provide a defense against malicious software, or against an
intruder who exploits faulty software.
At the next level up, there is process group redundancy. Here, we may run multiple copies of a system on multiple servers in different locations and compare
their outputs. This can stop the kind of attack in which the opponent gets physical access to a machine and subverts it, whether by mechanical destruction or
by inserting unauthorised software. It can’t defend against attacks by authorised users or damage by bad authorised software, which could simply order
the deletion of a critical file.
The next level is backup, where we typically take a copy of the system (a
checkpoint) at regular intervals. The copies are usually kept on media that can’t
be overwritten such as write-protected tapes or discs with special software.
We may also keep journals of all the transactions applied between checkpoints.
Whatever the detail, backup and recovery mechanisms not only enable us
to recover from physical asset destruction, they also ensure that if we do get
an attack at the logical level, we have some hope of recovering. The classic
example in the 1980s would have been a time bomb that deletes the customer
database on a specific date; since the arrival of cryptocurrency, the fashion has
been for ransomware.
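A minimal sketch of checkpoint-plus-journal recovery (an illustration with invented account names, not any specific product): recovery means reloading the last checkpoint and replaying the journal of transactions applied since, so even if a time bomb or ransomware destroys the live data, the state can be rebuilt.

    import copy

    class Store:
        def __init__(self):
            self.state, self.checkpoint, self.journal = {}, {}, []

        def apply(self, account, amount):
            self.journal.append((account, amount))       # journal first, then apply
            self.state[account] = self.state.get(account, 0) + amount

        def take_checkpoint(self):
            # in practice written to write-protected tape or other WORM media,
            # as is the journal, so the attacker cannot reach them
            self.checkpoint = copy.deepcopy(self.state)
            self.journal = []

        def recover(self):
            # last checkpoint plus a replay of everything journalled since
            self.state = copy.deepcopy(self.checkpoint)
            for account, amount in self.journal:
                self.state[account] = self.state.get(account, 0) + amount

    s = Store()
    s.apply("alice", 100); s.take_checkpoint(); s.apply("bob", 50)
    s.state = {}          # the logical attack wipes the live database
    s.recover()
    assert s.state == {"alice": 100, "bob": 50}
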
Businesses with critical service requirements, such as banks and retailers,
have had backup data centres for many years. The idea is that if the main centre goes down, the service will failover to a second facility. Maintaining such
facilities absorbed most of a typical bank’s information security budget.
Backup is not the same as fallback. A fallback system is typically a less capable system to which processing reverts when the main system is unavailable.
One example was the use of manual imprinting machines to capture credit
card transactions from the card embossing when electronic terminals failed.
Fallback systems are an example of redundancy in the application layer – the
highest layer we can put it.
It is important to realise that these are different mechanisms, which do different things. Redundant disks won’t protect against a malicious programmer
who deletes all your account files, and backups won’t stop him if rather than
just deleting files he writes code that slowly inserts more and more errors8.
Neither will give much protection against attacks on data confidentiality. On
the other hand, the best encryption in the world won’t help you if your data
processing center burns down. Real-world recovery plans and mechanisms
involve a mixture of all of the above.

8 Nowadays the really serious ransomware operators will hack your system, add file encryption surreptitiously and wait before they pounce – so they hold hostage not just your current data but several weeks' backups too.

The remarks that I made earlier about the difficulty of redundancy, and the
absolute need to plan and train for it properly, apply in spades to system
backup. When I was working in banking in the 1980s, we reckoned that we
could probably get our backup system working within an hour or so of our
main processing centre being destroyed, but the tests were limited by the
fact that we didn’t want to risk processing during business hours: we would
recover the main production systems on our backup data centre one Saturday
a year. By the early 1990s, Tesco, a UK supermarket, had gotten as far as
live drills: they’d pull the plug on the main processing centre once a year
without warning the operators, to make sure the backup came up within
40 seconds. By 2011, Netflix had developed ‘chaos monkeys’ – systems that
would randomly knock out a machine, or a rack, or even a whole data centre,
to test resilience constantly. By 2019, large service firms have gotten to such
a scale that they don’t need this. If you have three million machines across
thirty data centres, then you’ll lose machines constantly, racks frequently, and
whole data centres often enough that you have to engineer things to keep
going. So nowadays, you can simply pay money and a cloud service provider
will worry about a lot of the detail for you. But you need to really understand
what sort of failures Amazon or Google or Microsoft can handle for you and
what you have to deal with yourself. The standard service level agreements of
the major providers allow them to interrupt your service for quite a few hours
per month, and if you use a smaller cloud service (even a government cloud),
it will have capacity limits about which you have to think carefully.
It’s worth trying to work out which services you depend on that are outside
your direct supply chain. For example, Britain suffered a fuel tanker drivers’
strike in 2001, and some hospitals had to close because of staff shortages, which
was not supposed to happen. The government had allocated petrol rations to
doctors and nurses, but not to schoolteachers. So the schools closed, and the
nurses had to stay home to look after their kids, and this closed hospitals too.
This helped the strikers defeat Prime Minister Tony Blair: he abandoned his
signature environmental policy of steadily increasing fuel duty. As we become
increasingly dependent on each other, contingency planning gets ever harder.
7.3.4 Service-denial attacks
One of the reasons we want security services to be fault-tolerant is to make
service-denial attacks less attractive, less effective, or both. Such attacks are
often used as part of a larger plan. For example, one might take down a security
server to force other servers to use cached copies of credentials, or swamp a
web server to take it temporarily offline and then get another machine to serve
the pages that victims try to download.
A powerful defense against service denial is to prevent the opponent from
mounting a selective attack. If principals are anonymous – say there are several
equivalent services behind a load balancer, and the opponent has no idea which
one to attack – then he may be ineffective. I’ll discuss this further in the context
of burglar alarms and electronic warfare.
Where this isn’t possible, and the opponent knows where to attack, then there
are some types of service-denial attacks that can be stopped by redundancy and
resilience mechanisms and others that can’t. For example, the TCP/IP protocol
has few effective mechanisms for hosts to protect themselves against network
flooding, which comes in a wide variety of flavours. Defense against this kind
of attack tends to involve moving your site to a beefier hosting service with
specialist packet-washing hardware – or tracing and arresting the perpetrator.
Distributed denial-of-service (DDoS) attacks came to public notice when they
were used to bring down Panix, a New York ISP, for several days in 1996. During the late 1990s they were occasionally used by script kiddies to take down
chat servers. In 2001 I mentioned them in passing in the first edition of this
book. Over the following three years, extortionists started using them; they’d
assemble a botnet, a network of compromised PCs, which would flood a target
webserver with packet traffic until its owner paid them to desist. Typical targets were online bookmakers; amounts of $10,000–$50,000 were demanded to leave them alone, and the typical bookie paid up the first time this
happened. When the attacks persisted, the first solution was replication: operators moved their websites to hosting services such as Akamai whose servers
are so numerous (and so close to customers) that they can shrug off anything
the average botnet could throw at them. In the end, the blackmail problem
was solved when the bookmakers met and agreed not to pay any more blackmail money, and the Ukrainian police were prodded into arresting the gang
responsible.
By 2018, we had come full circle, and about fifty bad people were operating
DDoS-as-a-service, mostly for gamers who wanted to take down their opponents’ teamspeak servers. The services were sold online as ‘booters’ that would
boot your opponents out of the game; a few dollars would get a flood of perhaps 100Gbit/sec. Service operators also called them, more euphemistically,
‘stressors’ – with the line that you could use them to test the robustness of
your own website. This didn’t fool anyone, and just before Christmas 2018
the FBI took down fifteen of these sites, arresting a number of their operators and causing the volumes of DDoS traffic to drop noticeably for several
months [1447].
Finally, where a more vulnerable fallback system exists, a common technique
is to use a service-denial attack to force victims into fallback mode. The classic
example is in payment cards. Smartcards are generally harder to forge than
magnetic strip cards, but perhaps 1% of them fail every year, thanks to static
electricity and worn contacts. Also, some tourists still use magnetic strip cards.
So most card payment systems still have a fallback mode that uses the magnetic
strip. A simple attack is to use a false terminal, or a bug inserted into the cable to
a genuine terminal, to capture card details and then write them to the magnetic
strip of a card with a dead chip.
7.4 Naming
Naming is a minor if troublesome aspect of ordinary distributed systems, but
it becomes surprisingly hard in security engineering. During the dotcom boom
in the 1990s, when SSL was invented and we started building public-key certification authorities, we hit the problem of what names to put on certificates.
A certificate that says simply “the person named Ross Anderson is allowed to
administer machine X” is little use. I used to be the only Ross Anderson I knew
of; but as soon as the first search engines came along, I found dozens of us. I am
also known by different names to dozens of different systems. Names exist in
contexts, and naming the principals in secure systems is becoming ever more
important and difficult.
Conceptually, namespaces can be hierarchical or flat. You can identify me as
‘The Ross Anderson who teaches computer science at Cambridge, England’ or
as ‘The Ross Anderson who’s rossjanderson@gmail.com’ or even as ‘the Ross
Anderson with such-and-such a passport number’. But these are not the same
kind of thing, and linking them causes all sorts of problems.
In general, using more names increases complexity. A public-key certificate
that simply says “this is the key to administer machine X” is a bearer token,
just like a metal door key; whoever controls the private key for that certificate
is the admin, just as if the root password were in an envelope in a bank vault.
But once my name is involved, and I have to present some kind of passport or
ID card to prove who I am, the system acquires a further dependency. If my
passport is compromised the consequences could be far-reaching, and I really
don’t want to give the government an incentive to issue a false passport in my
name to one of its agents.
After 9/11, governments started to force businesses to demand government-issue photo ID in places where this was not previously thought necessary. In
the UK, for example, you can no longer board a domestic flight using just the
credit card with which you bought the ticket; you have to produce a passport
or driving license – which you also need to order a bank transfer in a branch
for more than £1000, to rent an apartment, to hire a lawyer or even to get a job.
Such measures are not only inconvenient but introduce new failure modes into
all sorts of systems.
There is a second reason that the world is moving towards larger, flatter name
spaces: the growing dominance of the large service firms in online authentication. Your name is increasingly a global one; it’s your Gmail or Hotmail
address, your Twitter handle, or your Facebook account. These firms have not
merely benefited from the technical externalities, which we discussed in the
chapter on authentication, and business externalities, which we’ll discuss in
the chapter on economics, they have sort-of solved some of the problems of
naming. But we can’t be complacent as many other problems remain. So it’s
useful to canter through what a generation of computer science researchers
have learned about naming in distributed systems.
7.4.1 The Needham naming principles
During the last quarter of the twentieth century, engineers building distributed
systems ran up against many naming problems. The basic algorithm used to
bind names to addresses is known as rendezvous: the principal exporting a
name advertises it somewhere, and the principal seeking to import and use it
searches for it. Obvious examples include phone books and file system directories.
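Rendezvous can be sketched in a few lines (a toy illustration, not any particular directory service): the exporter publishes a binding from name to address at some agreed place, and the importer looks it up there.

    directory = {}   # the agreed meeting point: a phone book, DNS zone, etc.

    def advertise(name, address):
        # the principal exporting the name publishes the binding
        directory[name] = address

    def resolve(name):
        # the principal importing the name searches for it
        return directory.get(name)

    advertise("print-server", "10.0.0.17:631")
    assert resolve("print-server") == "10.0.0.17:631"
    assert resolve("no-such-name") is None
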
People building distributed systems soon realised that naming gets complex
quickly, and the lessons are set out in a classic article by Needham [1426]. Here
are his ten principles.
1. The function of names is to facilitate sharing. This continues to hold:
my bank account number exists in order to share the information
that I deposited money last week with the teller from whom I
am trying to withdraw money this week. In general, names are
needed when the data to be shared is changeable. If I only ever
wished to withdraw exactly the same sum as I’d deposited, a
bearer deposit certificate would be fine. Conversely, names need
not be shared – or linked – where data will not be; there is no
need to link my bank account number to my telephone number
unless I am going to pay my phone bill from the account.
2. The naming information may not all be in one place, and so resolving names
brings all the general problems of a distributed system. This holds with
a vengeance. A link between a bank account and a phone number
assumes both of them will remain stable. So each system relies on
the other, and an attack on one can affect the other. Many banks
use two-channel authorisation to combat phishing – if you order
a payment online, you get a text message on your mobile phone
saying ‘if you want to pay $X to account Y, please enter the following
four-digit code into your browser’. The standard attack is for the
crook to claim to be you to the phone company and report the loss of
your phone. So they give him a new SIM that works for your phone
number, and he makes off with your money. The phone company
could stop that, but it doesn’t care too much about authentication,
as all it stands to lose is some airtime, whose marginal cost is zero.
And the latest attack is to use Android malware to steal authentication codes. Google could stop that by locking down the Android
platform as tightly as Apple – but it lacks the incentive to do so.
3. It is bad to assume that only so many names will be needed. The shortage
of IP addresses, which motivated the development of IP version 6
(IPv6), is well enough discussed. What is less well known is that the
most expensive upgrade the credit card industry ever had to make was
the move from thirteen-digit credit card numbers to sixteen. Issuers
originally assumed that thirteen digits would be enough, but the
system ended up with tens of thousands of banks – many with dozens
of products – so a six-digit bank identification number was needed.
Some issuers have millions of customers, so a nine-digit account
number is the norm. And there’s also a check digit to detect errors.
4. Global names buy you less than you think. For example, the 128-bit
address in IPv6 can in theory enable every object in the universe to
have a unique name. However, for us to do business, a local name
at my end must be resolved into this unique name and back into
a local name at your end. Invoking a unique name in the middle
may not buy us anything; it may even get in the way if the unique
naming service takes time, costs money, or occasionally fails (as it
surely will). In fact, the name service itself will usually have to be
a distributed system, of the same scale (and security level) as the
system we’re trying to protect. So we can expect no silver bullets from
this quarter. Adding an extra name, or adopting a more complicated
one, has the potential to add extra costs and failure modes.
5. Names imply commitments, so keep the scheme flexible enough to cope
with organisational changes. This sound principle was ignored
in the design of the UK government’s key management system for secure email [116]. There, principals’ private keys are
generated from their email addresses. So the frequent reorganisations meant that the security infrastructure had to be rebuilt
each time – and that more money had to be spent solving secondary problems such as how people access old material.
6. Names may double as access tickets, or capabilities. We have already
seen a number of examples of this in Chapters 2 and 3. In general,
it’s a bad idea to assume that today’s name won’t be tomorrow’s
password or capability – remember the Utrecht fraud we discussed
in section 4.5. Norway, for example, used to consider the citizen’s
ID number to be public, but it ended up being used as a sort of
password in so many applications that they had to relent and make
it private. There are similar issues around the US Social Security
Number (SSN). So the Department of Defense created a surrogate
number called the EDIPI, which was supposed to be not sensitive; but,
sure enough, people started using it as an authenticator instead of as
an identifier.
I’ve given a number of examples of how things go wrong when a
name starts being used as a password. But sometimes the roles of
name and password are ambiguous. In order to get entry to a car
park I used to use at the university, I had to speak my surname
and parking badge number into a microphone at the barrier. So if I
say, “Anderson, 123”, which of these is the password? In fact it was
“Anderson”, as anyone can walk through the car park and note down
valid badge numbers from the parking permits on the car windscreens.
7. Things are made much simpler if an incorrect name is obvious. In standard
distributed systems, this enables us to take a liberal attitude to caching.
In payment systems, credit card numbers used to be accepted while the terminal was offline so long as the number appeared valid (i.e., the last digit was a proper check digit of the first fifteen – the Luhn check, sketched after this list) and it was not on the hot card list. The certificates on modern chip cards provide
a higher-quality implementation of the same basic concept; authentication mechanisms such as crypto and security printing can give the
added benefit of making names resilient to spoofing. As an example
of what can still go wrong, the Irish police created over 50 dockets
for Mr ‘Prawo Jazdy’, wanted for failing to pay over fifty traffic tickets – until they realised that this is Polish for ‘Driving licence’ [193].
8. Consistency is hard, and is often fudged. If directories are replicated, then
you may find yourself unable to read, or to write, depending on whether
too many or too few directories are available. Naming consistency causes
problems for business in a number of ways, of which perhaps the
most notorious is the bar code system. Although this is simple enough
in theory – with a unique numerical code for each product – in
practice different manufacturers, distributors and retailers attach
quite different descriptions to the bar codes in their databases. Thus
a search for products by ‘Kellogg’s’ will throw up quite different
results depending on whether or not an apostrophe is inserted, and
this can cause confusion in the supply chain. Proposals to fix this
problem can be surprisingly complicated [916]. There are also the issues of convergence discussed above; data might not be consistent across a system, even in theory. And there are problems of timeliness, such as whether a product has been recalled.
9. Don’t get too smart. Phone numbers are much more robust than computer
addresses. Early secure messaging systems – from PGP to government
systems – tried to link keys to email addresses, but these change
when people’s jobs do. More modern systems such as Signal and
WhatsApp use mobile phone numbers instead. In the same way,
early attempts to replace bank account numbers and credit card
numbers with public-key certificates in protocols like SET failed,
though in some mobile payment systems, such as Kenya’s M-Pesa,
they’ve been replaced by phone numbers. (I’ll discuss further
specific problems of public key infrastructures in section 21.6.)
10. Some names are bound early, others not; and in general it is a bad thing
to bind early if you can avoid it. A prudent programmer will normally
avoid coding absolute addresses or filenames as that would make
it hard to upgrade or replace a machine. It’s usually better to leave
this to a configuration file or an external service such as DNS. Yet
secure systems often want stable and accountable names as any
third-party service used for last-minute resolution could be a point
of attack. Designers therefore need to pay attention to where the
naming information goes, how devices get personalised with it,
and how they get upgraded – including the names of services on
which the security may depend, such as the NTP service discussed in
section 7.2.6 above.
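The check digit mentioned in principles 3 and 7 is the Luhn digit, and the offline validity test is easy to sketch (the card numbers below are standard test values, not real accounts):

    def luhn_valid(pan: str) -> bool:
        """True if the last digit is a proper Luhn check digit of the rest."""
        total = 0
        for i, ch in enumerate(reversed(pan)):
            d = int(ch)
            if i % 2 == 1:        # double every second digit from the right
                d *= 2
                if d > 9:
                    d -= 9
            total += d
        return total % 10 == 0

    assert luhn_valid("4111111111111111")        # a well-known test number
    assert not luhn_valid("4111111111111112")    # one digit wrong is caught
    # A valid-looking number is necessary but nowhere near sufficient: the
    # terminal also had to check the hot card list, and modern chip cards
    # add cryptographic certificates on top.
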
7.4.2 What else goes wrong
The Needham principles were crafted for the world of the early 1990s in which
naming systems could be imposed at the system owner’s convenience. Once we
moved to the reality of modern web-based (and interlinked) service industries,
operating at global scale, we found that there is more to add.
By the early 2000s, we had learned that no naming system can be globally
unique, decentralised and human-meaningful. In fact, it’s a classic trilemma:
you can only have two of those attributes (Zooko’s triangle) [38]. In the
past, engineers went for naming systems that were unique and meaningful,
like URLs, or unique and decentralised, as with public keys in PGP or the
self-signed certificates that function as app names in Android. Human names
are meaningful and local but don’t scale to the Internet. I mentioned above that
as soon as the first search engines came along, I could instantly find dozens
of other people called Ross Anderson, but it’s even worse than that; half a
dozen worked in fields I’ve also worked in, such as software engineering and
electricity distribution.
The innovation from sites like Facebook is to show on a really large scale that
names don’t have to be unique. We can use social context to build systems that
are both decentralised and meaningful – which is just what our brains evolved
to cope with. Every Ross Anderson has a different set of friends and you can
tell us apart that way.
How can we make sense of all this, and stop it being used to trip people up?
It is sometimes helpful to analyse the properties of names in detail.
7.4.2.1 Naming and identity
First, the principals in security protocols are usually known by many different
kinds of name – a bank account number, a company registration number, a
personal name plus a date of birth or a postal address, a telephone number, a
passport number, a health service patient number, or a userid on a computer
system.
A common mistake is to confuse naming with identity. Identity is when two
different names (or instances of the same name) correspond to the same principal (this is known to computer scientists as an indirect name or symbolic link).
One classic example comes from the registration of title to real estate. Someone
who wishes to sell a house often uses a different name than they did at the time
it was purchased: they might have changed their name on marriage, or on gender transition, or started using their middle name instead. A land-registration
system must cope with a lot of identity issues like this.
There are two types of identity failure leading to compromise: where I’m
happy to impersonate anybody, and where I want to impersonate a specific
individual. The former case includes setting up accounts to launder cybercrime
proceeds, while an example of the latter is SIM replacement (I want to clone a
CEO's phone so I can loot a company bank account). If banks (or phone companies) just ask people for two proofs of address, such as utility bills, that's easy for the fraudster to subvert. Demanding government-issue photo ID may require us to analyse statements such as "The Aaron Bell who owns bank account number 12345678 is the
Aaron James Bell with passport number 98765432 and date of birth 3/4/56”.
This may be seen as a symbolic link between two separate systems – the bank’s
and the passport office’s. Note that the latter part of this ‘identity’ encapsulates
a further statement, which might be something like “The US passport office’s
file number 98765432 corresponds to the entry in the New York birth register
for 3/4/56 of one Aaron James Bell.” If Aaron is commonly known as Jim, it
gets messier still.
In general, names may involve several steps of recursion, which gives attackers a choice of targets. For example, a lot of passport fraud is pre-issue fraud:
the bad guys apply for passports in the names of genuine citizens who haven’t
applied for a passport already and for whom copies of birth certificates are easy
to obtain. Postmortem applications are also common. Linden Labs, the operators of Second Life, introduced a scheme whereby you prove you’re over 18 by
providing the driver’s license number or social security number of someone
who is. Now a web search quickly pulls up such data for many people, such as
the rapper Tupac Amaru Shakur; and yes, Linden Labs did accept Mr Shakur’s
license number – even though the license had expired and he's dead.
There can also be institutional failure. For example, the United Arab Emirates
started taking iris scans of all visitors after women who had been deported
to Pakistan for prostitution offences would turn up a few weeks later with a
genuine Pakistani passport in a different name and accompanied by a different
‘husband’. Similar problems led many countries to issue biometric visas so they
don’t have to depend on passport issuers in countries they don’t want to have
to trust.
In addition to corruption, a pervasive failure is the loss of original records.
In countries where registers of births, marriages and deaths are kept locally
and on paper, some are lost, and smart impersonators exploit these. You might
think that digitisation is fixing this problem, but the long-term preservation
of digital records is a hard problem even for rich countries; document formats
change, software and hardware become obsolete, and you either have to emulate old machines or translate old data, neither of which is ideal. Various states
have run pilot projects on electronic documents that must be kept forever, such
as civil registration, but we still lack credible standards. Sensible developed
countries still keep paper originals as the long-term document of record. In
less developed countries, you may have to steer between the Scylla of flaky
government IT and the Charybdis of natural disasters – while listening to the
siren song of development consultants saying 'put it on the blockchain!'
7.4.2.2 Cultural assumptions
The assumptions that underlie names change from one country to another. In
the English-speaking world, people may generally use as many names as they
please; a name is simply what you are known by. But some countries forbid
the use of aliases, and others require them to be registered. The civil registration of births, marriages, civil partnerships, gender transitions and deaths
is an extremely complex matter, often politicised, tied up with religion in many
countries and with the issue of ID documents as well. And incompatible rules
between countries cause real problems for migrants, for tourists and indeed
for companies with overseas customers.
In earlier editions of this book, I gave as an example that writers who change
their legal name on marriage often keep publishing using their former name.
So my lab colleague, the late Professor Karen Spärck Jones, got a letter from
the university every year asking why she hadn’t published anything (she was
down on the payroll as Karen Needham). The publication-tracking system just
could not cope with everything the personnel system knew. And as software
gets in everything and systems get linked up, conflicts can have unexpected
remote effects. For example, Karen was also a trustee of the British Library and
was not impressed when it started to issue its own admission tickets using
the name on the holder’s home university library card. Such issues caused
even more friction when the university introduced an ID card system keyed
to payroll names to give unified access to buildings, libraries and canteens.
These issues with multiple names are now mainstream; it’s not just professors,
musicians and novelists who use more than one name. Trans people who want
to stop firms using names from a previous gender; women who want to stop
using a married name when they separate or divorce, and who perhaps need
to if they’re fleeing an abusive partner; people who’ve assumed new names
following religious conversion – there’s no end of sources of conflict. If you’re
building a system that you hope will scale up globally, you’ll eventually have
to deal with them all.
Human naming conventions also vary by culture. Chinese may have both
English and Chinese given names if they’re from Hong Kong, with the English
one coming before and the Chinese one coming after the family name. Many
people in South India, Indonesia and Mongolia have only a single name – a
mononym. The Indian convention is to add two initials – for your place of
birth and your father’s name. So ‘BK Rajan’ may mean Rajan, son of Kumar,
from Bangalore. A common tactic among South Indian migrants to the USA
is to use the patronymic (here, Kumar) as a surname; but when western computer systems misinterpret Rajan as a surname, confusion can arise. Russians
are known by a forename, a patronymic and a surname. Icelanders have no
surname; their given name is followed by a patronymic if they are male and
a matronymic if they are female. In the old days, when ‘Maria Trosttadóttir’
arrived at US immigration and the officer learned that ‘Trosttadóttir’ isn’t a
surname or even a patronymic, their standard practice was to compel her to
adopt as a surname a patronymic (say, ‘Carlsson’ if her father was called Carl).
Many Indians in the USA have had similar problems, all of which cause unnecessary offence. And then there are cultures where your name changes after you
have children.
Another cultural divide is often thought to be that between the English-speaking countries, where identity cards were unacceptable on privacy grounds9, and the countries conquered by Napoleon or by the Soviets, where identity cards are the norm. What's less well known is that the British Empire happily imposed ID on many of its subject populations, so the real divide is perhaps whether a country was ever conquered.

9 unless they're called drivers' licences or health service cards!

The local history of ID conditions all sorts of assumptions. I know Germans
who have refused to believe that a country could function at all without a
proper system of population registration and ID cards yet admit they are asked
for their ID card only rarely (for example, to open a bank account or get married). Their card number can’t be used as a name because it is a document
number and changes every time a new card is issued. The Icelandic ID card
number, however, is static; it’s just the citizen’s date of birth plus two further
digits. What’s more, the law requires that bank account numbers contain the
account holder’s ID number. These are perhaps the extremes of private and
public ID numbering.
Finally, in many less developed countries, the act of registering citizens and
issuing them with ID is not just inefficient but political [89]. The ruling tribe
may seek to disenfranchise the others by making it hard to register births in
their territory or by making it inconvenient to get an ID card. Sometimes cards
are reissued in the run-up to an election in order to refresh or reinforce the
discrimination. Cards can be tied to business permits and welfare payments;
delays can be used to extract bribes. Some countries (such as Brazil) have
separate registration systems at the state and federal level, while others (such
as Malawi) have left most of their population unregistered. There are many
excluded groups, such as refugee children born outside the country of their
parents’ nationality, and groups made stateless for religious or ideological
reasons. Target 16.9 of the United Nations’ Sustainable Development Goals is
to ‘provide legal identity for all, including birth registration’; and a number of
companies sell ID systems and voting systems financed by development aid.
These interact with governments in all sorts of complex ways, and there’s a
whole research community that studies this [89]. Oh, and if you think this is
a third-world problem, there are several US states using onerous registration
procedures to make it harder for Black people to vote; and in the Windrush
scandal, it emerged that the UK government had deported a number of
foreign-born UK residents who were automatically entitled to citizenship as
they had not maintained a good enough paper trail of their citizenship to
satisfy increasingly xenophobic ministers.
In short, the hidden assumptions about the relationship between governments and people’s names vary in ways that constrain system design and
cause unexpected failures when assumptions are carried across borders. The
engineer must always be alert to the fact that a service-oriented ID is one thing
and a legal identity or certificate of citizenship is another. Governments are
forever trying to entangle the two, but this leads to all sorts of pain.
7.4.2.3 Semantic content of names
Changing from one type of name to another can be hazardous. A bank got sued
after they moved from storing customer data by account number to storing it
by name and address. They wrote a program to link up all the accounts operated by each of their customers, in the hope that it would help them target
junk mail more accurately. The effect on one customer was serious: the bank
statement for the account he kept for his mistress got sent to his wife, who
divorced him.
The semantics of names can change over time. In many transport systems,
tickets and toll tags can be bought for cash, which defuses privacy concerns, but
it’s more convenient to link them to bank accounts, and these links accumulate
over time. The card that UK pensioners use to get free bus travel also started out
anonymous, but in practice the bus companies try to link up the card numbers
to other passenger identifiers. In fact, I once got a hardware store loyalty card
with a random account number (and no credit checks). I was offered the chance
to change this into a bank card after the store was taken over by a supermarket
and the supermarket started a bank.
7.4.2.4 Uniqueness of names
Human names evolved when we lived in small communities. We started
off with just forenames, but by the late Middle Ages the growth of travel
led governments to bully people into adopting surnames. That process
took a century or so and was linked with the introduction of paper into
Europe as a lower-cost and more tamper-resistant replacement for parchment;
paper enabled the badges, seals and other bearer tokens, which people had
previously used for road tolls and the like, to be replaced with letters that
mentioned their names.
The mass movement of people, business and administration to the Internet has been too fast for social adaptation. There are now way more people
(and systems) online than we’re used to dealing with. So how can we make
human-memorable names unique? As we discussed above, Facebook tells one
John Smith from another the way humans do, by clustering each one with his
set of friends and adding a photo.
Perhaps the other extreme is cryptographic names. Names are hashes either
of public keys or of other stable attributes of the object being named. All sorts
of mechanisms have been proposed to map real-world names, addresses and
even document content indelibly and eternally on to the bitstring outputs
of hash functions (see, for example, [846]). You can even use hashes of
biometrics or the surface microstructure of objects, coupled with a suitable
error-correction code. The world of cryptocurrency and blockchains makes
much use of hash-based identifiers. Such mechanisms can make it impossible
to reuse names; as expired domain names are often bought by bad people and
exploited, this is sometimes important.
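As a concrete illustration (a minimal sketch, not any particular deployed scheme), a hash-based name can be derived from a public key or from a document's content with a standard hash function; the hash_name helper and the key and document bytes below are placeholders of my own:

import hashlib

def hash_name(stable_bytes):
    # The name is the SHA-256 digest of a stable attribute of the object,
    # so it cannot later be reassigned to different content.
    return hashlib.sha256(stable_bytes).hexdigest()

public_key = b"...placeholder public key bytes..."
document = b"The content of a document we wish to name indelibly."
print(hash_name(public_key))   # a name bound to the key rather than to any registry
print(hash_name(document))     # a content-addressed name, as used in blockchains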
This isn’t entirely new, as it has long been common in transaction processing to just give everything and everyone a number. This can lead to failures,
though, if you don’t put enough uniqueness in the right place. For example, a
UK bank assigned unique sequence numbers to transactions by printing them
on the stationery used to capture the deal. Once, when they wanted to send
£20m overseas, the operator typed in £10m by mistake. A second payment
of £10m was ordered – but this acquired the same transaction sequence number from the paperwork. So two payments were sent to SWIFT with the same
date, payee, amount and sequence number – and the second was discarded as
a duplicate [310].
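A minimal sketch of the obvious remedy, assuming the bank is free to generate a fresh reference for each payment instruction rather than recycling one from the stationery; the Payment class, its fields and the payee and date values are hypothetical:

import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Payment:
    payee: str
    amount_pence: int
    value_date: str
    # A unique reference generated per payment instruction, not per sheet of stationery.
    reference: str = field(default_factory=lambda: uuid.uuid4().hex)

first = Payment("Example Bank, New York", 1_000_000_000, "1990-06-01")    # £10m in pence
second = Payment("Example Bank, New York", 1_000_000_000, "1990-06-01")   # the corrective payment
assert first.reference != second.reference   # the second payment no longer looks like a duplicate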
7.4.2.5 Stability of names and addresses
Many names include some kind of address, yet addresses change. While we
still had a phone book in Cambridge, about a quarter of the addresses changed
every year; with work email, the turnover is probably higher. When we tried
in the late 1990s to develop a directory of people who use encrypted email,
together with their keys, we found that the main cause of changed entries
was changes of email address [104]. (Some people had assumed it would
be the loss or theft of keys; the contribution from this source was precisely
zero.) Things are perhaps more stable now. Most people try to keep their
personal mobile phone numbers, so they tend to be long-lived, and the same
goes increasingly for personal email addresses. The big service providers like
Google and Microsoft generally don’t issue the same email address twice, but
other firms still do.
Distributed systems pioneers considered it a bad thing to put addresses in
names [1355]. But hierarchical naming systems can involve multiple layers
of abstraction with some of the address information at each layer forming
part of the name at the layer above. Also, whether a namespace is better kept flat or made hierarchical depends on the application. Often people end up with different names
at the departmental and organisational level (such as rja14@cam.ac.uk and
ross.anderson@cl.cam.ac.uk in my own case). So a clean demarcation
between names and addresses is not always possible.
Authorisations have many (but not all) of the properties of addresses. Kent’s
Law tells designers that if a credential contains a list of what it may be used for,
then the more things there are on this list the shorter its period of usefulness.
A similar problem besets systems where names are composite. For example,
some online businesses recognize me by the combination of email address and
credit card number. This is clearly bad practice. Quite apart from the fact that
I have several email addresses, I have several credit cards.
There are good reasons to use pseudonyms. Until Facebook came along, people considered it sensible for children and young people to use online names
that weren’t easily linkable to their real names and addresses. When you go
for your first job on leaving college aged 22, or for a CEO’s job at 45, you don’t
want a search to turn up all your teenage rants. Many people also change email
addresses from time to time to escape spam; I used to give a different email
address to every website where I shop. On the other hand, some police and
other agencies would prefer people not to use pseudonyms, which takes us
into the whole question of traceability online – which I’ll discuss in Part 2.
7.4.2.6 Restrictions on the use of names
The interaction between naming and society brings us to a further problem:
some names may be used only in restricted circumstances. This may be laid
down by law, as with the US social security number and its equivalents in some
other countries. Sometimes it is a matter of marketing: a significant minority
of customers avoid websites that demand too much information.
Restricted naming systems interact in unexpected ways. For example, it’s
fairly common for hospitals to use a patient number as an index to medical
record databases, as this may allow researchers to use pseudonymous records
for some purposes. This causes problems when a merger of health maintenance organisations, or a policy change, forces the hospital to introduce
uniform names. There have long been tussles in Britain’s health service, for
example, about which pseudonyms can be used for which purposes.
Finally, when we come to law and policy, the definition of a name throws
up new and unexpected gotchas. For example, regulations that allow police
to collect communications data – that is, a record of who called whom and
when – are usually much more lax than the regulations governing phone
tapping; in many countries, police can get communications data just by
asking the phone company. This led to tussles over the status of URLs, which
contain data such as the parameters passed to search engines. Clearly some
policemen would like a list of everyone who hit a URL like http://www.google.com/search?q=cannabis+cultivation; just as clearly, many people would consider such large-scale trawling to be an unacceptable invasion
of privacy. The resolution in UK law was to define traffic data as that which
was sufficient to identify the machine being communicated with, or in lay
language ‘Everything up to the first slash.’ I discuss this in much more detail
later, in the chapter ‘Surveillance or Privacy?’
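To make the ‘everything up to the first slash’ rule concrete, here is a minimal sketch (purely illustrative, not a statement of the law; the helper name is mine) that splits a URL into the part treated as traffic data and the part treated as content:

from urllib.parse import urlsplit

def split_traffic_and_content(url):
    parts = urlsplit(url)
    traffic_data = parts.scheme + "://" + parts.netloc            # everything up to the first slash
    content = parts.path + ("?" + parts.query if parts.query else "")
    return traffic_data, content

print(split_traffic_and_content("http://www.google.com/search?q=cannabis+cultivation"))
# ('http://www.google.com', '/search?q=cannabis+cultivation')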
7.4.3 Types of name
Not only is naming complex at all levels – from the technical up through the
organisational to the political – but some of the really wicked issues go across
levels. I noted in the introduction that names can refer not just to persons (and
machines acting on their behalf), but also to organisations, roles (‘the officer of
the watch’), groups, and compound constructions: principal in role – Alice as
manager; delegation – Alice for Bob; conjunction – Alice and Bob. Conjunction
often expresses implicit access rules: ‘Alice acting as branch manager plus Bob
as a member of the group of branch accountants’.
That’s only the beginning. Names also apply to services (such as NFS, or a
public-key infrastructure) and channels (which might mean wires, ports or
crypto keys). The same name might refer to different roles: ‘Alice as a computer
game player’ ought to have less privilege than ‘Alice the system administrator’.
The usual abstraction used in the security literature is to treat them as different
principals. So there’s no easy mapping between names and principals, especially when people bring their own devices to work or take work devices home,
and therefore may have multiple conflicting names or roles on the same platform. Many organisations are starting to distinguish carefully between ‘Alice
in person’, ‘Alice as a program running on Alice’s home laptop’ and ‘a program
running on Alice’s behalf on the corporate cloud’, and we discussed some of
the possible mechanisms in the chapter on access control.
Functional tensions are often easier to analyse if you work out how they’re
driven by the underlying business processes. Businesses mainly want to get
paid, while governments want to identify people uniquely. In effect, business
wants your credit card number while government wants your passport number. An analysis based on incentives can sometimes indicate whether a naming
system might be better open or closed, local or global, stateful or stateless – and
whether the people who maintain it are the same people who will pay the costs
of failure (economics is one of the key issues for dependability, and is the subject
of the next chapter).
Finally, although I’ve illustrated many of the problems of naming with
respect to people – as that makes the problems more immediate and compelling – many of the same problems pop up in various ways for cryptographic
keys, unique product codes, document IDs, file names, URLs and much more.
When we dive into the internals of a modern corporate network we may find
DNS Round Robin to multiple machines, each with its own IP address, behind
a single name; or Anycast to multiple machines, each on the same IP address,
behind a single name; or Cisco’s HSRP protocol, where the IP address and the
Ethernet MAC address move from one router to another router. (I’ll discuss
more technical aspects of network security in Part 2.) Anyway, as systems
scale, it becomes less realistic to rely on names that are simple, interchangeable
and immutable. You need to scope naming carefully, understand who controls
the names on which you rely, work out how slippery they are, and design
your system to be dependable despite their limitations.
7.5 Summary
Many secure distributed systems have incurred large costs, or developed serious vulnerabilities, because their designers ignored the basics of how to build
(and how not to build) distributed systems. Most of these basics have been in
computer science textbooks for a generation.
Many security breaches are concurrency failures of one kind or another;
systems use old data, make updates inconsistently or in the wrong order, or
assume that data are consistent when they aren’t or even can’t be. Using time
to order transactions may help, but knowing the right time is harder than
it seems.
Fault tolerance and failure recovery are critical. Providing the ability to
recover from security failures, as well as from random physical and software
failures, is the main purpose of the protection budget for many organisations.
At a more technical level, there are significant interactions between protection
and resilience mechanisms. Byzantine failure – where defective processes
conspire rather than failing randomly – is an issue, and it interacts with our
choice of cryptographic tools.
There are many different flavors of redundancy, and we have to use the right
combination. We need to protect not just against failures and attempted manipulation, but also against deliberate attempts to deny service that may be part
of larger attack plans.
Many problems also arise from trying to make a name do too much, or
making assumptions about it which don’t hold outside of one particular
system, culture or jurisdiction. For example, it should be possible to revoke a
user’s access to a system by cancelling their user name without getting sued
on account of other functions being revoked. The simplest solution is often
to assign each principal a unique identifier used for no other purpose, such
as a bank account number or a system logon name. But many problems arise
when merging two systems that use naming schemes that are incompatible.
Sometimes this can even happen by accident.
Research problems
I’ve touched on many technical issues in this chapter, from secure time
protocols to the complexities of naming. But perhaps the most important
research problem is to work out how to design systems that are resilient in the
face of malice, that degrade gracefully, and whose security can be recovered
simply once the attack is past. All sorts of remedies have been pushed in
the past, from getting governments to issue everyone with ID to putting it
all on the blockchain. However these magic bullets don’t seem to kill any of
the goblins.
It’s always a good idea for engineers to study failures; we learn more from
the one bridge that falls down than from the thousand that don’t. We now have
a growing number of failed ID systems, such as the UK government’s Verify
scheme – an attempt to create a federated logon system for public service that
was abandoned in 2019 [1394]. There is a research community that studies failures of ID systems in less developed countries [89]. And then there’s the failure
of blockchains to live up to their initial promise, which I’ll discuss in Part 2 of
this book.
Perhaps we need to study more carefully the conditions under which we
can recover neatly from corrupt security state. Malware and phishing attacks
mean that at any given time a small (but nonzero) proportion of customer bank
accounts are under criminal control. Yet the banking system carries on. The proportion of infected laptops, and phones, varies quite widely by country, and the
effects might be worth more careful study.
Classical computer science theory saw convergence in distributed systems as
an essentially technical problem, whose solution depended on technical properties (at one level, atomicity, consistency, isolation and durability; at another,
digital signatures, dual control and audit). Perhaps we need a higher-level view
in which we ask how we obtain sufficient agreement about the state of the
world and incorporate not just technical resilience mechanisms and protection
technologies, but also the mechanisms whereby people who have been victims
of fraud obtain redress. Purely technical mechanisms that try to obviate the
need for robust redress may actually make things worse.
Further reading
If the material in this chapter is unfamiliar to you, you may be coming to
the subject from a maths/crypto background or chips/engineering or even
law/policy. Computer science students get many lectures on distributed
systems; to catch up, I’d suggest Saltzer and Kaashoek [1643]. Other books
we’ve recommended to our students over the years include Tanenbaum and
van Steen [1863] and Mullender [1355]. A 2003 report from the US National
Research Council, ‘Who Goes There? Authentication Through the Lens of Privacy’,
discusses the tradeoffs between authentication and privacy and how they
tend to scale poorly [1041]. Finally, there’s a recent discussion of naming by
Pat Helland [882].
CHAPTER 8
Economics
The great fortunes of the information age lie in the hands of companies that have established
proprietary architectures that are used by a large installed base of locked-in customers.
– CARL SHAPIRO AND HAL VARIAN
There are two things I am sure of after all these years: there is a growing societal need for high
assurance software, and market forces are never going to provide it.
– EARL BOEBERT
The law locks up the man or woman
Who steals the goose from off the common
But leaves the greater villain loose
Who steals the common from the goose.
– TRADITIONAL, 17th CENTURY
8.1 Introduction
Round about 2000, we started to realise that many security failures weren’t due
to technical errors so much as to wrong incentives: if the people who guard a
system are not the people who suffer when it fails, then you can expect trouble.
In fact, security mechanisms are often designed deliberately to shift liability,
which can lead to even worse trouble.
Economics has always been important to engineering, at the raw level of cost
accounting; a good engineer was one who could build a bridge safely with a
thousand tons of concrete when everyone else used two thousand tons. But the
perverse incentives that arise in complex systems with multiple owners make
economic questions both more important and more subtle for the security
engineer. Truly global-scale systems like the Internet arise from the actions of
millions of independent principals with divergent interests; we hope that
reasonable global outcomes will result from selfish local actions. The outcome
we get is typically a market equilibrium, and often a surprisingly stable one.
Attempts to make large complex systems more secure, or safer, will usually
fail if this isn’t understood. At the macro level, cybercrime patterns have
been remarkably stable through the 2010s even though technology changed
completely, with phones replacing laptops, with society moving to social
networks and servers moving to the cloud. Network insecurity is somewhat
like air pollution or congestion, in that people who connect insecure machines
to the Internet do not bear the full consequences of their actions while people
who try to do things right suffer the side-effects of others’ carelessness.
In general, people won’t change their behaviour unless they have an incentive to. If their actions take place in some kind of market, then the equilibrium
will be where the forces pushing and pulling in different directions balance
each other out. But markets can fail; the computer industry has been dogged
by monopolies since its earliest days. The reasons for this are now understood,
and their interaction with security is starting to be.
Security economics has developed rapidly as a discipline since the early
2000s. It provides valuable insights not just into ‘security’ topics such as
privacy, bugs, spam, and phishing, but into more general areas of system
dependability. For example, what’s the optimal balance of effort by programmers and testers? (For the answer, see section 8.6.3.) It also enables us to
analyse many important policy problems – such as the costs of cybercrime and
the most effective responses to it. And when protection mechanisms are used
to limit what someone can do with their possessions or their data, questions
of competition policy and consumer rights follow – which we need economics
to analyse. There are also questions of the balance between public and private
action: how much of the protection effort should be left to individuals, and
how much should be borne by vendors, regulators or the police? Everybody
tries to pass the buck.
In this chapter I first describe how we analyse monopolies in the classical
economic model, how information goods and services markets are different,
and how network effects and technical lock-in make monopoly more likely.
I then look at asymmetric information, another source of market power. Next
is game theory, which enables us to analyse whether people will cooperate or
compete; and auction theory, which lets us understand the working of the ad
markets that drive much of the Internet – and how they fail. These basics then
let us analyse key components of the information security ecosystem, such as
the software patching cycle. We also get to understand why systems are less
reliable than they should be: why there are too many vulnerabilities and why
too few cyber-crooks get caught.
8.2 Classical economics
Modern economics is an enormous field covering many different aspects
of human behaviour. The parts of it that have found application in security so far are largely drawn from microeconomics, game theory and
behavioral economics. In this section, I’ll start with a helicopter tour of the
most relevant ideas from microeconomics. My objective is not to provide a
tutorial on economics, but to get across the basic language and ideas, so we
can move on to discuss security economics.
The modern subject started in the 18th century when growing trade
changed the world, leading to the industrial revolution, and people wanted
to understand what was going on. In 1776, Adam Smith’s classic ‘The Wealth
of Nations’ [1792] provided a first draft: he explained how rational self-interest
in a free market leads to progress. Specialisation leads to productivity gains,
as people try to produce something others value to survive in a competitive
market. In his famous phrase, “It is not from the benevolence of the butcher,
the brewer, or the baker, that we can expect our dinner, but from their regard
to their own interest.” The same mechanisms scale up from a farmers’ market
or small factory to international trade.
These ideas were refined by nineteenth-century economists; David Ricardo
clarified and strengthened Smith’s arguments in favour of free trade, while
Stanley Jevons, Léon Walras and Carl Menger built detailed models of supply
and demand. One of the insights from Jevons and Menger is that the price of
a good, at equilibrium in a competitive market, is the marginal cost of production. When coal cost nine shillings a ton in 1870, that didn’t mean that every
mine dug coal at this price, merely that the marginal producers – those who
were only just managing to stay in business – could sell at that price. If the
price went down, these mines would close; if it went up, even more marginal
mines would open. That’s how supply responded to changes in demand. (It
also gives us an insight into why so many online services nowadays are free;
as the marginal cost of duplicating information is about zero, lots of online
businesses can’t sell it and have to make their money in other ways, such as
from advertising. But we’re getting ahead of ourselves.)
By the end of the century Alfred Marshall had combined models of supply
and demand in markets for goods, labour and capital into an overarching ‘classical’ model in which, at equilibrium, all the excess profits would be competed
away and the economy would be functioning efficiently. By 1948, Kenneth
Arrow and Gérard Debreu had put this on a rigorous mathematical foundation
by proving that markets give efficient outcomes, subject to certain conditions,
including that the buyers and sellers have full property rights, that they have
complete information, that they are rational and that the costs of doing transactions can be neglected.
Much of the interest in economics comes from the circumstances in which
one or more of these conditions aren’t met. For example, suppose that transactions have side-effects that are not captured by the available property rights.
Economists call these externalities, and they can be either positive or negative.
An example of a positive externality is scientific research, from which everyone can benefit once it’s published. As a result, the researcher doesn’t capture
the full benefit of their work, and we get less research than would be ideal
(economists reckon we do only a quarter of the ideal amount of research). An
example of a negative externality is environmental pollution; if I burn a coal
fire, I get the positive effect of heating my house but my neighbour gets the
negative effect of smell and ash, while everyone shares the negative effect of
increased CO2 emissions.
Externalities, and other causes of market failure, are of real importance to the
computer industry, and to security folks in particular, as they shape many of
the problems we wrestle with, from industry monopolies to insecure software.
Where one player has enough power to charge more than the market clearing price, or nobody has the power to fix a common problem, then markets
alone may not be able to sort things out. Strategy is about acquiring power, or
preventing other people having power over you; so the most basic business
strategy is to acquire market power in order to extract extra profits, while distributing the costs of your activity on others to the greatest extent possible.
Let’s explore that now in more detail.
8.2.1 Monopoly
As an introduction, let’s consider a textbook case of monopoly. Suppose we
have a market for apartments in a university town, and the students have different incomes. We might have one rich student able to pay $4000 a month,
maybe 300 people willing to pay at least $2000 a month, and (to give us round
numbers) at least 1000 prepared to pay at least $1000 a month. That gives us
the demand curve shown in Figure 8.1.
So if there are 1000 apartments being let by many competing landlords, the
market-clearing price will be at the intersection of the demand curve with the
vertical supply curve, namely $1000. But suppose the market is rigged – say
the landlords have set up a cartel, or the university makes its students rent
through a tied agency. A monopolist landlord examines the demand curve, and
notices that if he rents out only 800 apartments, he can get $1400 per month for
each of them. Now 800 times $1400 is $1,120,000 per month, which is more
than the million dollars a month he’ll make from the market price at $1000.
(Economists would say that his ‘revenue box’ is the box CBFO rather than
EDGO in Figure 8.1.) So he sets an artificially high price, and 200 apartments
remain empty.
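The arithmetic behind the landlord's choice, as a minimal sketch using only the numbers from the example above:

competitive_revenue = 1000 * 1000   # let all 1000 apartments at the market-clearing price of $1000
monopoly_revenue    = 800 * 1400    # restrict supply to 800 at the $1400 the demand curve then supports
print(competitive_revenue)          # 1,000,000 per month
print(monopoly_revenue)             # 1,120,000 per month, so the monopolist leaves 200 apartments empty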
This is clearly inefficient, and the Italian economist Vilfredo Pareto invented
a neat way to formalise this. A Pareto improvement is any change that would
make some people better off without making anyone else worse off, and an
allocation is Pareto efficient if there isn’t any Pareto improvement available.
Figure 8.1: The market for apartments [demand curve falling from $4000 per month to meet the vertical supply line of 1000 apartments at $1000; the monopoly point of 800 apartments at $1400 is also marked, with lettered points A–G bounding the revenue boxes referred to in the text]
Here, the allocation is not efficient, as the monopolist could rent out one empty
apartment to anyone at a lower price, making both him and them better off.
Now Pareto efficiency is a rather weak criterion; both perfect communism
(everyone gets the same income) and perfect dictatorship (the king gets the
lot) are Pareto-efficient. In neither case can you make anyone better off without
making someone else worse off! Yet the simple monopoly described here is
not efficient even in this very weak sense.
So what can the monopolist do? There is one possibility – if he can charge
everyone a different price, then he can set each student’s rent at exactly
what they are prepared to pay. We call such a landlord a price-discriminating
monopolist; he charges the rich student exactly $4000, and so on down to
the 1000th student whom he charges exactly $1000. The same students get
apartments as before, yet almost all of them are worse off. The rich student
loses $3000, money that he was prepared to pay but previously didn’t have
to; economists refer to this money he saved as surplus. The discriminating
monopolist manages to extract all the consumer surplus.
Merchants have tried to price-discriminate since antiquity. The carpet seller
in Istanbul who expects you to haggle down his price is playing this game, as
is an airline selling first, business and cattle class seats. The extent to which
firms can charge people different prices depends on a number of factors,
principally their market power and their information asymmetry. Market power is
a measure of how close a merchant is to being a monopolist; under monopoly
the merchant is a price setter, while under perfect competition he is a price taker
who has to accept whatever price the market establishes. Merchants naturally
try to avoid this. Information asymmetry can help them in several ways.
A carpet seller has much more information about local carpet prices than a
tourist who’s passing through, and who won’t have the time to haggle in ten
different shops. So the merchant may prefer to haggle rather than display fixed
prices. An airline is slightly different. Thanks to price-comparison sites, its
passengers have good information on base prices, but if it does discount to fill
seats, it may be able to target its offers using information from the advertising
ecosystem. It can also create its own loyalty ecosystem by offering occasional
upgrades. Technology tends to make firms more like airlines and less like small
carpet shops; the information asymmetry isn’t so much whether you know
about average prices, as what the system knows about you and how it locks
you in.
Monopoly can be complex. The classic monopolist, like the landlord or cartel in our example, may simply push up prices for everyone, resulting in a
clear loss of consumer surplus. Competition law in the USA looks for welfare loss of this kind, which often happens where a cartel operates price discrimination. During the late 19th century, railroad operators charged different
freight rates to different customers, depending on how profitable they were,
how perishable their goods were and other factors – basically, shaking them
all down according to their ability to pay. This led to massive resentment and
to railway regulation. In the same way, telcos used to price-discriminate like
crazy; SMSes used to cost a lot more than voice, and voice a lot more than
data, especially over distance. This led to services like Skype and WhatsApp
which use data services to provide cheaper calls and messaging, and also to
net neutrality regulation in a number of countries. This is still a tussle space,
with President Trump’s appointee at the FCC reversing many previous net
neutrality rulings.
However, many firms with real market power like Google and Facebook give
their products away free to most of their users, while others, like Amazon (and
Walmart), cut prices for their customers. This challenges the traditional basis
that economists and lawyers used to think about monopoly, in the USA at least.
Yet there’s no doubt about monopoly power in tech. We may have gone from
one dominant player in the 1970s (IBM) to two in the 1990s (Microsoft and Intel)
and a handful now (Google, Facebook, Amazon, Microsoft, maybe Netflix)
but each dominates its field; although Arm managed to compete with Intel,
there has been no new search startup since Bing in 2009 (whose market share
is slipping), and no big social network since Instagram in 2011 (now owned by
Facebook). So there’s been a negative effect on innovation, and the question of what we do about it is becoming a hot political topic. The EU has fined tech
majors multiple times for competition offences.
To understand what’s going on, we need to dive more deeply into how information monopolies work.
8.3 Information economics
The information and communications industries are different from traditional
manufacturing in a number of ways, and among the most striking is that these
markets have been very concentrated for generations. Even before computers
came along, newspapers tended to be monopolies, except in the biggest cities.
Much the same happened with railways, and before that with canals. When
electrical tabulating equipment came along in the late 19th century, it was dominated by NCR, until a spin-off from NCR’s Manhattan sales office called IBM
took over. IBM dominated the computer industry in the 1960s and 70s, then
Microsoft came along and took pole position in the 90s. Since then, Google and
Facebook have come to dominate advertising, Apple and Google sell phone
operating systems, ARM and Intel do CPUs, while many other firms dominate
their own particular speciality. Why should this be so?
8.3.1 Why information markets are different
Recall that in a competitive equilibrium, the price of a good should be its
marginal cost of production. But for information that’s almost zero! That’s
why there is so much free stuff online; zero is its fair price. If two or more
suppliers compete to offer an operating system, or a map, or an encyclopedia,
that they can duplicate for no cost, then they will keep on cutting their prices
without limit. Take for example encyclopedias; the Britannica used to cost
$1,600 for 32 volumes; then Microsoft brought out Encarta for $49.95, forcing
Britannica to produce a cheap CD edition; and now we have Wikipedia for
free [1721]. One firm after another has had to move to a business model in
which the goods are given away free, and the money comes from advertising
or in some parallel market. And it can be hard to compete with services that
are free, or are so cheap it’s hard to recoup the capital investment you need to
get started. So other industries with high fixed costs and low marginal costs
tend to be concentrated – such as newspapers, airlines and hotels.
Second, there are often network externalities, whereby the value of a network
grows more than linearly in the number of users. Networks such as the telephone and email took some time to get going because at the start there were
only a few other enthusiasts to talk to, but once they passed a certain threshold in each social group, everyone needed to join and the network rapidly
became mainstream. The same thing happened again with social media from
the mid-2000s; initially there were 40–50 startups doing social networks, but
once Facebook started to pull ahead, suddenly all young people had to be there,
as that was where all your friends were, and if you weren’t there then you
missed out on the party invitations. This positive feedback is one of the mechanisms by which network effects can get established. It can also operate in a
two-sided market which brings together two types of user. For example, when
local newspapers got going in the nineteenth century, businesses wanted to
advertise in the papers with lots of readers, and readers wanted papers with
lots of small ads so they could find stuff. So once a paper got going, it often
grew to be a local monopoly; it was hard for a competitor to break in. The same
thing happened when the railways allowed the industrialisation of agriculture;
powerful firms like Cargill and Armour owned the grain elevators and meat
packers, dealing with small farmers on one side and the retail industry on the
other. We saw the same pattern in the 1960s when IBM mainframes dominated
computing: firms used to develop software for IBM as they’d have access to
more users, while many users bought IBM because there was more software
for it. When PCs came along, Microsoft beat Apple for the same reason; and
now that phones are replacing laptops, we see a similar pattern with Android
and iPhone. Another winner was eBay in the late 1990s: most people wanting
to auction stuff will want to use the largest auction, as it will attract more bidders. Network effects can also be negative; once a website such as Myspace
starts losing custom, negative feedback can turn the loss into a rout.
Third, there are various supply-side scale economies enjoyed by leading
information services firms, ranging from access to unmatchable quantities of
user data to the ability to run large numbers of A/B tests to understand user
preferences and optimise system performance. These enable early movers to
create, and incumbents to defend, competitive advantage in service provision.
Fourth, there’s often lock-in stemming from interoperability, or a lack thereof.
Once a software firm commits to using a platform such as Windows or Oracle for its product, it can be expensive to change. This has both technical and
human components, and the latter are often dominant; it’s cheaper to replace
tools than to retrain programmers. The same holds for customers, too: it can
be hard to close a sale if they not only have to buy new software and convert
files, but retrain their staff too. These switching costs deter migration. Earlier
platforms where interoperability mattered included the telephone system, the
telegraph, mains electricity and even the railways.
These four features separately – low marginal costs, network externalities,
supply-side scale economies and technical lock-in – can lead to industries with
dominant firms; in combination, they are even more likely to. If users want to
be compatible with other users (and with vendors of complementary products
such as software) then they will logically buy from the vendor they expect to
win the biggest market share.
8.3.2 The value of lock-in
There is an interesting result, due to Carl Shapiro and Hal Varian: that the value
of a software company is the total lock-in (due to both technical and network
effects) of all its customers [1721]. To see how this might work, consider a firm
with 100 staff each using Office, for which it has paid $150 per copy. It could
save this $15,000 by moving to a free program such as LibreOffice, so if the costs
of installing this product, retraining its staff, converting files and so on – in
other words the total switching costs – were less than $15,000, it would switch.
But if the costs of switching were more than $15,000, then Microsoft would put
up its prices.
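As a minimal sketch of this argument, using the figures from the example (the switching-cost values tested below are made up for illustration):

seats = 100
price_per_seat = 150                       # what the firm currently pays per copy of Office
licence_spend = seats * price_per_seat     # $15,000 of revenue at stake for the vendor

def firm_will_switch(total_switching_cost):
    # The firm moves to the free alternative only if switching costs less than
    # what it would save; otherwise the incumbent can raise prices instead.
    return total_switching_cost < licence_spend

print(firm_will_switch(12_000))   # True: the vendor must cut prices to keep the customer
print(firm_will_switch(20_000))   # False: the customer is locked in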
As an example of the link between lock-in, pricing and value, consider how
prices changed over a decade. In the second edition of this book, this example
had the cost of Office as $500; since then, cloud-based services that worked just
like Office, such as Google Docs, cut the costs of switching – so Microsoft had to
slash its prices. As I started writing this edition in 2019, I saw standalone Office
for sale at prices ranging between $59.99 and £164. Microsoft’s response since
2013 has been trying to move its customers to an online subscription service
(Office365) which costs universities a few tens of pounds per seat depending on
what options they choose and how good they are at negotiating, while Google
is also trying to move organisations away from their free services to paid G
Suite versions that cost about the same. Charging $30 a year for an online service is better business than charging $60 for a program that the customer might
use for five years or even seven. When I revised this chapter in 2020, I saw that I could now get a ‘lifetime key’ for about double the cost of a standalone product last
year. There’s a new form of lock-in, namely that the cloud provider now looks
after all your data.
Lock-in explains why so much effort gets expended in standards wars and
antitrust suits. It also helps explain the move to the cloud (though cost cutting is
a bigger driver). It’s also why so many security mechanisms aim at controlling
compatibility. In such cases, the likely attackers are not malicious outsiders,
but the owners of the equipment, or new firms trying to challenge the incumbent by making compatible products. This doesn’t just damage competition,
but innovation too. Locking things down too hard can also be bad for business,
as innovation is often incremental, and products succeed when new firms find
killer applications for them [905]. The PC, for example, was designed by IBM
as a machine to run spreadsheets; if they had locked it down to this application
alone, then a massive opportunity would have been lost. Indeed, the fact that
the IBM PC was more open than the Apple Mac was a factor in its becoming the
dominant desktop platform. (That Microsoft and Intel later stole IBM’s lunch
is a separate issue.)
So the law in many countries gives companies a right to reverse-engineer
their competitors’ products for compatibility [1650]. Incumbents try to build
ecosystems in which their offerings work better together than with their competitors’. They lock down their products using digital components such as
cloud services and cryptography so that even if competitors have the legal right
to try to reverse engineer these products, they are not always going to succeed
in practice. Incumbents also use their ecosystems to learn a lot about their customers, the better to lock them in; and a variety of digital mechanisms are used
to control aftermarkets and enforce planned obsolescence. I will discuss these
more complex ecosystem strategies in more detail below in section 8.6.4.
8.3.3 Asymmetric information
Another way markets can fail, beyond monopoly and public goods, is when
some principals know more than others, or know it slightly earlier, or can find
it out more cheaply. We discussed how an old-fashioned carpet trader has an
information advantage over tourists buying in his store; but the formal study of
asymmetric information was kicked off by a famous paper in 1970 on the ‘market
for lemons’ [35], for which George Akerlof won a Nobel prize. It presents the
following simple yet profound insight: suppose that there are 100 used cars
for sale in a town: 50 well-maintained cars worth $2000 each, and 50 ‘lemons’
worth $1000. The sellers know which is which, but the buyers don’t. What is
the market price of a used car?
You might think $1500; but at that price, no good cars will be offered for sale.
So the market price will be close to $1000. This is why, if you buy a new car,
maybe 20% falls off the price the second you drive it out of the dealer’s lot.
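A minimal sketch of the lemons logic with the numbers above; the one-round price-formation rule is a deliberately crude simplification of mine:

good, lemon = 2000, 1000
cars = [good] * 50 + [lemon] * 50

naive_offer = sum(cars) / len(cars)                  # buyers offer the average value: $1500
for_sale = [v for v in cars if v <= naive_offer]     # owners of good cars withdraw them
market_price = sum(for_sale) / len(for_sale)
print(naive_offer, market_price)                     # 1500.0 1000.0: only lemons trade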
Asymmetric information is also why poor security products dominate some
markets. When users can’t tell good from bad, they might as well buy the
cheapest. When the market for antivirus software took off in the 1990s, people
would buy the $10 product rather than the $20 one. (Nowadays there’s much
less reason to buy AV, as the malware writers test their code against all available
products before releasing it – you should focus on patching systems instead.
That people still buy lots of AV is another example of asymmetric information.)
A further distinction can be drawn between hidden information and hidden action. For example, Volvo has a reputation for building safe cars that help
their occupants survive accidents, yet Volvo drivers have more accidents. Is this
because people who know they’re bad drivers buy Volvos so they’re less likely
to get killed, or because people in Volvos believe they’re safer and drive faster?
The first is the hidden-information case, also known as adverse selection, while
the second is the hidden-action case, also known as moral hazard. Both effects
are important in security, and both may combine in specific cases. (In the case
of drivers, people adjust their driving behaviour to keep their risk exposure
at the level with which they’re comfortable. This also explains why mandatory seat-belt laws tend not to save lives overall, merely to move fatalities from
vehicle occupants to pedestrians and cyclists [19].)
Asymmetric information explains many market failures in the real world,
from low prices in used-car markets to the high price of cyber-risks insurance
(firms who know they cut corners may buy more of it, making it expensive
for the careful). In the world of information security, it’s made worse by the
fact that most stakeholders are not motivated to tell the truth; police and intelligence agencies, as well as security vendors, try to talk up the threats while
software vendors, e-commerce sites and banks downplay them [112].
8.3.4 Public goods
An interesting case of positive externalities is when everyone gets the same
quantity of some good, whether they want it or not. Classic examples are air
quality, national defense and scientific research. Economists call these public
goods, and the formal definition is that such goods are non-rivalrous (my using
them doesn’t mean there’s less for you) and non-excludable (there’s no practical way to stop people consuming them). Uncoordinated markets are generally
unable to provide public goods in socially optimal quantities.
Public goods may be supplied by governments directly, as with national
defense, or by using indirect mechanisms such as laws on patents and copyrights to encourage people to produce inventions, books and music by giving
them a temporary monopoly. Very often, public goods are provided by some
mix of public and private action; scientific research is done in universities
that get some public subsidy, earn some income from student fees, and get
some research contracts from industry (which may get patents on the useful
inventions).
Many aspects of security are public goods. I do not have an anti-aircraft gun
on the roof of my house; air-defense threats come from a small number of
actors, and are most efficiently dealt with by government action. So what about
Internet security? Certainly there are strong externalities; people who connect
insecure machines to the Internet end up dumping costs on others, as they
enable bad actors to build botnets. Self-protection has some aspects of a public
good, while insurance is more of a private good. So what should we do about it?
The answer may depend on whether the bad actors we’re concerned with
are concentrated or dispersed. In our quick survey of cybercrime in section 2.3
we noted that many threats have consolidated as malware writers, spammers
and others have become commercial. By 2007, the number of serious spammers had dropped to a handful, and by 2020, the same had become true of
denial-of-service (DoS) attacks: there seems to be one dominant DoS-for-hire
provider. This suggests a more centralised defence strategy, namely, finding
the bad guys and throwing them in jail.
Some have imagined a gentler government response, with rewards paid to
researchers who discover vulnerabilities, paid for by fines imposed on the
firms whose software contained them. To some extent this happens already
via bug bounty programs and vulnerability markets, without government
intervention. But a cynic will point out that in real life what happens is that
vulnerabilities are sold to cyber-arms manufacturers who sell them to governments who then stockpile them – and industry pays for the collateral damage,
as with NotPetya. So is air pollution the right analogy – or air defense? This
brings us to game theory.
8.4 Game theory
Game theory has some of the most fundamental insights of modern economics.
It’s about when we cooperate, and when we fight.
There are really just two ways to get something you want if you can’t find or
make it yourself. You either make something useful and trade it; or you take
what you need, by force, by the ballot box or whatever. Choices between cooperation and conflict are made every day at all sorts of levels, by both humans
and animals.
The main tool we can use to study and analyse them is game theory – the
study of problems of cooperation and conflict among independent decision
makers. Game theory provides a common language used by economists, biologists and political scientists as well as computer scientists, and is a useful tool
for building collaboration across disciplines. We’re interested in games of strategy, and we try to get to the core of a decision by abstracting away much of the
detail. For example, consider the school playground game of ‘matching pennies’: Alice and Bob toss coins and reveal them simultaneously, upon which
Alice gets Bob’s penny if they’re different and Bob gets Alice’s penny if they’re
the same. I’ll write this as in Figure 8.2:
                    Bob
                 H          T
Alice   H      −1, 1      1, −1
        T       1, −1    −1, 1
Figure 8.2: Matching pennies
Each entry in the table shows first Alice’s outcome and then Bob’s. Thus if the
coins fall (H,H) Alice loses a penny and Bob gains a penny. This is an example
of a zero-sum game: Alice’s gain is Bob’s loss.
Often we can solve a game quickly by writing out a payoff matrix like this.
Here’s an example (Figure 8.3):
                    Bob
                Left      Right
Alice   Top     1, 2      0, 1
        Bottom  2, 1      1, 0
Figure 8.3: Dominant strategy equilibrium
In game theory, a strategy is just an algorithm that takes a game state and
outputs a move1 . In this game, no matter what Bob plays, Alice is better off
playing ‘Bottom’; and no matter what Alice plays, Bob is better off playing
‘Left’. Each player has a dominant strategy – an optimal choice regardless of
what the other does. So Alice’s strategy should be a constant ‘Bottom’ and Bob’s
a constant ‘Left’. We call this a dominant strategy equilibrium.
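The same reasoning can be checked mechanically. A minimal sketch, representing the payoff matrix of Figure 8.3 as a table and testing whether one strategy does at least as well as another against every possible reply (the dominates helper is my own):

payoffs = {                        # (Alice move, Bob move) -> (Alice payoff, Bob payoff)
    ("Top", "Left"): (1, 2), ("Top", "Right"): (0, 1),
    ("Bottom", "Left"): (2, 1), ("Bottom", "Right"): (1, 0),
}

def dominates(player, mine, other_option):
    idx = 0 if player == "Alice" else 1
    replies = ["Left", "Right"] if player == "Alice" else ["Top", "Bottom"]
    def payoff(move, reply):
        key = (move, reply) if player == "Alice" else (reply, move)
        return payoffs[key][idx]
    return all(payoff(mine, r) >= payoff(other_option, r) for r in replies)

print(dominates("Alice", "Bottom", "Top"))   # True: Bottom dominates for Alice
print(dominates("Bob", "Left", "Right"))     # True: Left dominates for Bob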
Another example is shown in Figure 8.4:
                    Bob
                Left      Right
Alice   Top     2, 1      0, 0
        Bottom  0, 0      1, 2
Figure 8.4: Nash equilibrium
Here each player’s optimal strategy depends on what they think the other
player will do. We say that two strategies are in Nash equilibrium when
Alice’s choice is optimal given Bob’s, and vice versa. Here there are two
symmetric Nash equilibria, at top left and bottom right. You can think of them
as being like local optima while a dominant strategy equilibrium is a global
optimum.
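Likewise, a minimal sketch that enumerates the pure-strategy Nash equilibria of the game in Figure 8.4, by checking that neither player can gain from a unilateral deviation (the is_nash helper is my own):

payoffs = {("Top", "Left"): (2, 1), ("Top", "Right"): (0, 0),
           ("Bottom", "Left"): (0, 0), ("Bottom", "Right"): (1, 2)}
alice_moves, bob_moves = ["Top", "Bottom"], ["Left", "Right"]

def is_nash(a, b):
    ua, ub = payoffs[(a, b)]
    alice_happy = all(payoffs[(a2, b)][0] <= ua for a2 in alice_moves)
    bob_happy = all(payoffs[(a, b2)][1] <= ub for b2 in bob_moves)
    return alice_happy and bob_happy

print([cell for cell in payoffs if is_nash(*cell)])
# [('Top', 'Left'), ('Bottom', 'Right')]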
8.4.1 The prisoners’ dilemma
We’re now ready to look at a famous problem that applies to many situations
from international trade negotiations through cooperation between hunting
animals to whether the autonomous systems that make up the Internet cooperate effectively to protect its infrastructure. It was first studied by scientists
at the Rand corporation in 1950 in the context of US and USSR defense spending; Rand was paid to think about possible strategies in nuclear war. But they
presented it using the following simple example.
Two prisoners are arrested on suspicion of planning a bank robbery. The
police interview them separately and tell each of them: “If neither of you confesses you’ll each get a year for carrying a concealed firearm without a permit.
If only one of you confesses, he’ll go free and the other will get 6 years for
conspiracy to rob. If both of you confess, you will each get three years.”
1 In business and politics, a strategy is a means of acquiring power, such as monopoly power or military advantage, by a sequence of moves; the game-theoretic meaning is a somewhat simplified version, to make problems more tractable.
What should the prisoners do? Figure 8.5 shows their payoff matrix:
                      Benjy
                 Confess      Deny
Alfie  Confess   −3, −3       0, −6
       Deny      −6, 0       −1, −1
Figure 8.5: The prisoners’ dilemma
When Alfie looks at this table, he will reason as follows: “If Benjy’s going to
confess then I should too as then I get 3 years rather than 6; and if he’s going to
deny then I should still confess as I’ll walk rather than doing a year”. Benjy will
reason similarly. The two of them confess, and get three years each. This is not
just a Nash equilibrium; it’s a dominant strategy equilibrium. Each prisoner
should confess regardless of what the other does.
But hang on, you say: if they had agreed to keep quiet then they’d get a year each, which is a better outcome for them! In fact the strategy (deny, deny) is
Pareto efficient, while the dominant strategy equilibrium is not. (That’s one
reason it’s useful to have concepts like ‘Pareto efficient’ and ‘dominant strategy
equilibrium’ rather than just arguing over ‘best’.)
So what’s the solution? Well, so long as the game is going to be played once
only, and this is the only game in town, there isn’t a solution. Both prisoners
will confess and get three years.
You may think this is fair enough, as it serves them right. However, the Prisoners’ Dilemma can be used to model all sorts of interactions where we decide
whether or not to cooperate: international trade, nuclear arms control, fisheries protection, the reduction of CO2 emissions, and the civility of political
discourse. Even matters of self-control such as obesity and addiction can be
seen as failures of cooperation with our future selves. In these applications, we
really want cooperation so we can get good outcomes, but the way a single-shot
game is structured can make them really hard to achieve. We can only change
this if somehow we can change the game itself.
There are many possibilities: there can be laws of various kinds from
international treaties on trade to the gangster’s omertà. In practice, a prisoner’s
dilemma game is changed by altering the rules or the context so as to turn it
into another game where the equilibrium is more efficient.
8.4.2 Repeated and evolutionary games
Suppose the game is played repeatedly – say Alfie and Benjy are career criminals who expect to be dealing with each other again and again. Then of course
there can be an incentive for them to cooperate. There are at least two ways of
modelling this.
In the 1970s, Bob Axelrod started thinking about how people might play
many rounds of prisoners’ dilemma. He set up a series of competitions to
which people could submit programs, and these programs played each other
repeatedly in tournaments. He found that one of the best strategies overall
was tit-for-tat, which is simply that you cooperate in round one, and at each
subsequent round you do to your opponent what he or she did in the previous
round [148]. It began to be realised that strategy evolution could explain a lot.
For example, in the presence of noise, players tend to get locked into (defect,
defect) whenever one player’s cooperative behaviour is misread by the other
as defection. So in this case it helps to ‘forgive’ the other player from time
to time.
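A minimal sketch of such a tournament round, pitting tit-for-tat against an always-defect strategy using the payoffs of Figure 8.5 (years in prison written as negative numbers; the ten-round length and the strategy function names are my own choices):

PAYOFF = {("C", "C"): (-1, -1), ("C", "D"): (-6, 0),
          ("D", "C"): (0, -6), ("D", "D"): (-3, -3)}

def tit_for_tat(opponent_history):
    return "C" if not opponent_history else opponent_history[-1]   # copy the opponent's last move

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_b), strategy_b(hist_a)   # each player sees the other's past moves
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a); hist_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))     # (-10, -10): cooperation is sustained
print(play(tit_for_tat, always_defect))   # (-33, -27): exploited once, then mutual defection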
A parallel approach was opened up by John Maynard Smith and George
Price [1253]. They considered what would happen if you had a mixed population of aggressive and docile individuals, ‘hawks’ and ‘doves’, with the
behaviour that doves cooperate; hawks take food from doves; and hawks fight,
with a risk of death. Suppose the value of the food at each interaction is v and
the risk of death in a hawk fight is c per encounter. Then the payoff matrix looks
like Figure 8.6:
                Hawk                     Dove
Hawk     (v−c)/2, (v−c)/2              v, 0
Dove      0, v                         v/2, v/2
Figure 8.6: The hawk-dove game
Here, if v > c, the whole population will become hawk, as that’s the dominant
strategy, but if c > v (fighting is too expensive) then there is an equilibrium
where the probability p that a bird is a hawk sets the hawk payoff and the dove
payoff equal, that is
p(v − c)/2 + (1 − p)v = (1 − p)v/2
which is solved by p = v∕c. In other words, you can have aggressive and docile
individuals coexisting in a population, and the proportion of aggressive individuals will be a function of the costs of aggression; the more dangerous a
fight is, the fewer combative individuals there will be. Of course, the costs can
change over time, and diversity can be a good thing in evolutionary terms, as a
society with some hard men may be at an advantage when war breaks out. But
it takes generations for a society to move to equilibrium. Perhaps our current
high incidence of aggression reflects conditions in pre-state societies. Indeed,
anthropologists believe that tribal warfare used to be endemic in such societies;
the archaeological record shows that until states came along, about a quarter to
a third of men and boys died of homicide [1134]. Maybe we just haven’t been
civilised long enough for evolution to catch up.
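Returning to the arithmetic of the hawk-dove equilibrium, here is a minimal numerical check with illustrative values of v and c (chosen by me, with fighting costlier than the food is worth):

v, c = 2.0, 8.0                                   # food value and fight cost, c > v
p = v / c                                         # predicted hawk fraction: 0.25
hawk_payoff = p * (v - c) / 2 + (1 - p) * v       # expected payoff of playing hawk
dove_payoff = p * 0 + (1 - p) * v / 2             # expected payoff of playing dove
print(p, hawk_payoff, dove_payoff)                # 0.25 0.75 0.75: the two payoffs are equal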
Such insights, along with Bob Axelrod’s simulation methodology, got many
people from moral philosophers to students of animal behaviour interested
in evolutionary game theory. They offer further insights into how cooperation
evolved. It turns out that many primates have an inbuilt sense of fairness and
punish individuals who are seen to be cheating – the instinct for vengeance
is one mechanism to enforce sociality. Fairness can operate in a number of
different ways at different levels. For example, doves can get a better result
against hawks if they can recognise each other and interact preferentially, giving a model for how some social movements and maybe even some religions
establish themselves [1788]. Online reputation systems, as pioneered by eBay
and now used by firms like Uber and AirBnB, perform a similar function: they
help doves avoid hawks by making interactions into iterated games.
Of course, the basic idea behind tit-for-tat goes back a long way. The Old
Testament has ‘An eye for an eye’ and the New Testament ‘Do unto others
as you’d have them do unto you’ – the latter formulation being the more
fault-tolerant – and versions of it can be found in Aristotle, in Confucius and
elsewhere. More recently, Thomas Hobbes used similar arguments in the
seventeenth century to argue that a state did not need the Divine Right of
Kings to exist, paving the way for revolutions, republics and constitutions in
the eighteenth.
Since 9/11, people have used hawk-dove games to model the ability of fundamentalists to take over discourse in religions at a time of stress. Colleagues and
I have used evolutionary games to model how insurgents organise themselves
into cells [1375]. Evolutionary games also explain why cartel-like behaviour
can appear in industries even where there are no secret deals.
For example, Internet service in the UK involves a regulated monopoly that
provides the local loop, and competing retail companies that sell Internet service to households. If the local loop costs the ISPs £6 a month, how come the
ISPs all charge about £30? Well, if one were to undercut the others, they’d all
retaliate by cutting their own prices, punishing the defector. It’s exactly the
same behaviour you see where three airlines operate a profitable route, and one
lowers its prices to compete for volume; the others will often respond by cutting prices even more sharply to punish it and make the route unprofitable.
And just as airlines offer all sorts of deals, air miles and so on to confuse the
customer, so also the telecomms providers offer their own confusion pricing.
Similar structures lead to similar behaviour. Tacit collusion can happen in both
industries without the company executives actually sitting down and agreeing
to fix prices (which would be illegal). As pricing becomes more algorithmic,
both lawyers and economists may need to understand more computer science;
and computer scientists need to understand economic analysis tools such as
game theory and auction theory.
8.5 Auction theory
Auction theory is vital for understanding how Internet services work, and what
can go wrong. Much online activity is funded by the ad auctions run by firms
like Google and Facebook, and many e-commerce sites run as auctions.
Auctions have been around for millennia, and are the standard way of selling
livestock, fine art, mineral rights, bonds and much else; many other transactions from corporate takeovers to house sales are also really auctions. They are
the fundamental way of discovering prices for unique goods. There are many
issues of game play, asymmetric information, cheating – and some solid theory
to guide us.
Consider the following five traditional types of auction.
1. In the English, or ascending-bid, auction, the auctioneer starts at a
reserve price and then raises the price until only one bidder is left. This is
used to sell art and antiques.
2. In the Dutch, or descending-bid, auction, the auctioneer starts out at a
high price and cuts it gradually until someone bids. This is used to sell
flowers.
3. In the first-price sealed-bid auction, each bidder is allowed to make
one bid. After bidding closes, all the bids are opened and the highest bid wins. This has been used to auction TV rights; it’s also used
for government contracts, where it’s the lowest bid that wins.
4. In the second-price sealed-bid auction, or Vickrey auction, we also
get sealed bids and the highest bid wins, but that bidder pays
the price in the second-highest bid. This is familiar from eBay,
and is also how online ad auctions work; it evolved to sell rare
postage stamps, though the earliest known use was by the poet
Goethe to sell a manuscript to a publisher in the 18th century.
5. In the all-pay auction, every bidder pays at every round, until all but one
drop out. This is a model of war, litigation, or a winner-take-all market
race between several tech startups. It’s also used for charity fundraising.
The first key concept is strategic equivalence. The Dutch auction and the
first-price sealed-bid auction give the same result, in that the highest bidder
gets the goods at his reservation price – the maximum he’s prepared to bid.
Similarly, the English auction and the Vickrey auction give the same result
(modulo the bid increment). However the two pairs are not strategically
equivalent. In a Dutch auction, you should bid low if you believe your
valuation is a lot higher than anybody else’s, while in a second-price auction
it’s best to bid truthfully.
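To make the difference concrete, here is a small sketch of the two sealed-bid settlement rules, together with a brute-force check that under the second-price rule a bidder never gains by bidding anything other than their true valuation. This is my own illustration; the function names and figures are made up.

```
# Settle sealed-bid auctions and check the second-price incentive property.
def settle(bids, rule):
    """bids: dict of bidder -> bid. Returns (winner, price paid)."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    price = bids[winner] if rule == 'first-price' else bids[ranked[1]]
    return winner, price

def utility(value, my_bid, best_rival_bid, rule):
    """Payoff to a bidder with the given valuation (ties are lost)."""
    if my_bid <= best_rival_bid:
        return 0
    return value - (my_bid if rule == 'first-price' else best_rival_bid)

value = 100
for rival in range(0, 201, 10):          # whatever the strongest rival happens to bid
    truthful = utility(value, value, rival, 'second-price')
    for other_bid in range(0, 201, 10):  # any deviation from bidding the true value
        assert utility(value, other_bid, rival, 'second-price') <= truthful

print(settle({'Alice': 80, 'Bob': 120, 'Carol': 95}, 'second-price'))
# -> ('Bob', 95): the highest bidder wins but pays the second-highest bid
```

In a first-price auction, by contrast, the winner pays their own bid, so it can pay to shade it below your true valuation if you think the rivals are weak; that is why the Dutch and first-price formats reward guesses about other bidders' valuations.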
The second key concept is revenue equivalence. This is a weaker concept; it’s
not about who will win, but how much money the auction is expected to raise.
The interesting result here is the revenue equivalence theorem, which says that you
get the same revenue from any well-behaved auction under ideal conditions.
These conditions include risk-neutral bidders, no collusion, Pareto efficiency
(the highest bidder gets the goods) and independent valuations (no externalities between bidders). In such circumstances, the bidders adjust their strategies
and the English, Dutch and all-pay auctions all yield the same. So when you
design an auction, you have to focus on the ways in which the conditions aren’t
ideal. For details and examples, see Paul Klemperer’s book [1059].
And there are many things that can go wrong. There may be bidding rings,
where all the buyers collude to lowball the auction; here, a first-price auction
is best as it takes only one defector to break ranks, rather than two. Second,
there’s entry detection: in one UK auction of TV rights, bidders had to submit
extensive programming schedules, which involved talking to production companies, so everyone in the industry knew who was bidding and the franchises
with only one bidder went for peanuts. Third, there’s entry deterrence: bidders
in corporate takeovers often declare that they will top any other bid. Fourth,
there’s risk aversion: if you prefer a certain profit of $1 to a 50% chance of $2,
you’ll bid higher at a first-price auction. Fifth, there are signaling games; in US
spectrum auctions, some bidders broke anonymity by putting zip codes in the
least significant digits of their bids, to signal what combinations of areas they
were prepared to fight for, and to deter competitors from starting a bidding
war there. And then there are budget constraints: if bidders are cash-limited,
all-pay auctions are more profitable.
Advertisement auctions are big business, with Google, Facebook and Amazon making about $50bn, $30bn and $10bn respectively in 2019, while the rest
of the industry gets about $40bn. The ad auction mechanism pioneered by
Google is a second-price auction tweaked to optimise revenue. Bidders offer
to pay prices b_i; the platform estimates each ad's quality e_i from its relevance and clickthrough rate, and then calculates 'ad rank' as a_i = b_i e_i. The idea is that if my ad is five times as likely to be clicked on as yours, then my bid of 10c is just as good as your bid of 50c. This is therefore a second-price auction, but based on the ranking a_i rather than the bid b_i. Thus if I have five times your ad quality,
I bid 10c and you bid 40c, then I get the ad and pay 8c. It can be shown that
under reasonable assumptions, this maximises platform revenue.
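A rough sketch of this quality-weighted second-price rule, reproducing the 8c example (my own illustration of the mechanism as described above, not Google's production code; names and figures are made up):

```
# Rank ads by a_i = b_i * e_i and charge the winner the smallest bid that
# would still have won: the runner-up's ad rank divided by the winner's quality.
def ad_auction(entries):
    """entries: list of (advertiser, bid_in_cents, quality). Returns (winner, price)."""
    ranked = sorted(entries, key=lambda e: e[1] * e[2], reverse=True)
    (winner, _, quality), (_, b2, e2) = ranked[0], ranked[1]
    return winner, (b2 * e2) / quality

# My ad has five times your quality; I bid 10c and you bid 40c.
print(ad_auction([('me', 10, 5.0), ('you', 40, 1.0)]))
# -> ('me', 8.0): I win the slot and pay 8c
```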
There’s one catch, though. Once media become social, then ad quality can
easily segue into virality. If your ads are good clickbait and people click on
them, you pay less. One outcome was that in the 2016 US Presidential Election, Hillary Clinton paid a lot more per ad than Donald Trump did [1236].
Both auction theory and empirical data show how the drive to optimise platform revenue may lead to ever more extreme content: in addition to virality
effects at the auction step, Facebook’s delivery algorithms put ads in front of
the people most likely to click on them, strengthening the effect of filter bubbles; and this is not all due to user actions [41]. Some people feel this
‘delivery optimisation’ should be prohibited by electoral law; certainly it’s one
more example of mechanisms with structural tension between efficiency and
fairness. In fact, in the UK, election ads aren’t permitted on TV, along with some
other categories such as tobacco. In my opinion, the cleanest solution in such
jurisdictions is to ban them online too, just like tobacco.
Ad pricing isn’t the only way market mechanisms drive social media to
promote extreme content. As former Googler Tristan Harris has explained,
the platforms’ recommender algorithms are optimised to maximise the time
people spend on-site, which means not just providing bottomless scrolling
feeds and letting users accumulate followers, but also a bias towards anxiety
and outrage. At YouTube, such algorithms gave recommendations that heavily favoured Trump in 2016 [1886]. What’s more, ad delivery can be skewed
by factors such as gender and race, as advertisers compete for more ‘valuable’
demographics, and by content effects because of the appeal of ad headlines or
images. This can be deliberate or accidental, and can affect a broad range of
ads including employment and housing [40]. This all raises thorny political
issues at the boundary between economics and psychology, which are at
the centre of policy debates around regulating tech. Economic tools such as
auction theory can often be used to unpick them.
8.6 The economics of security and dependability
Economists used to see a simple interaction between economics and security:
richer nations could afford bigger armies. But after 1945, nuclear weapons were
thought to decouple national survival from economic power, and the fields
of economics and strategic studies drifted apart [1240]. It has been left to the
information security world to re-establish the connection.
Round about 2000, a number of us noticed persistent security failures that
appeared at first sight to be irrational, but which we started to understand
once we looked more carefully at the incentives facing the various actors.
I observed odd patterns of investment by banks in information security
measures [55, 56]. Hal Varian looked into why people were not spending as
much money on anti-virus software as the vendors hoped [1947]. When the
two of us got to discussing these cases in 2001, we suddenly realised that there
was an interesting and important research topic here, so we contacted other
people with similar interests and organised a workshop for the following
year. I was writing the first edition of this book at the time, and found that
describing many of the problems as incentive problems made the explanations
much more compelling; so I distilled what I learned from the book’s final edit
into a paper ‘Why Information Security is Hard – An Economic Perspective’.
This paper, plus the first edition of this book, got people talking [73]. By
the time they came out, the 9/11 attacks had taken place and people were
searching for new perspectives on security.
We rapidly found many other examples of security failure associated with
institutional incentives, such as hospital systems bought by medical directors
and administrators that support their interests but don’t protect patient privacy. (Later, we found that patient safety failures often had similar roots.) Jean
Camp had been writing about markets for vulnerabilities, and two startups
had set up early vulnerability markets. Networking researchers were starting
to use auction theory to design strategy-proof routing protocols. The Department of Defense had been mulling over its failure to get vendors to sell them
secure systems, as you can see in the second quote at the head of this chapter.
Microsoft was thinking about the economics of standards. All these ideas came
together at the Workshop on the Economics of Information Security at Berkeley in June 2002, which launched security economics as a new field of study.
The picture that started to emerge was of system security failing because the
people guarding a system were not the people who suffered the costs of failure. Sometimes, security mechanisms are used to dump risks on others, and
if you are one of those others you’d be better off with an insecure system. Put
differently, security is often a power relationship; the principals who control
what it means in a given system often use it to advance their own interests.
This was the initial insight, and the story of the birth of security economics is
told in [79]. But once we started studying the subject seriously, we found that
there’s a lot more to it than that.
8.6.1 Why is Windows so insecure?
The hot topic in 2002, when security economics got going, was this. Why is
Windows so insecure, despite Microsoft’s dominant market position? It’s possible to write much better software, and there are fields such as defense and
healthcare where a serious effort is made to produce dependable systems. Why
do we not see a comparable effort made with commodity platforms, especially
since Microsoft has no real competitors?
By then, we understood the basics of information economics: the combination of high fixed and low marginal costs, network effects and technical lock-in
makes platform markets particularly likely to be dominated by single vendors,
who stand to gain vast fortunes if they can win the race to dominate the market.
In such a race, the Microsoft philosophy of the 1990s – ‘ship it Tuesday and get it
right by version 3’ – is perfectly rational behaviour. In such a race, the platform
vendor must appeal not just to users but also to complementers – to the software companies who decide whether to write applications for its platform or
for someone else’s. Security gets in the way of applications, and it tends to be a
lemons market anyway. So the rational vendor engaged in a race for platform
dominance will enable all applications to run as root on his platform (to make coding easier, and enable app developers to steal the user's other data for sale in secondary markets), until
his position is secure. Then he may add more security – but will be tempted
to engineer it in such a way as to maximise customer lock-in, or to appeal to
complementers in new markets such as digital media.
The same pattern was also seen in other platform products, from the old
IBM mainframe operating systems through telephone exchange switches to
the early Symbian operating system for mobile phones. Products are insecure
at first, and although they improve over time, many of the new security features are for the vendor’s benefit as much as the user’s. And this is exactly
what we saw with Microsoft’s product lines. DOS had no protection at all and
kick-started the malware market; Windows 3 and Windows 95 were dreadful; Windows 98 was only slightly better; and security problems eventually so
annoyed Microsoft’s customers that finally in 2003 Bill Gates decided to halt
development until all its engineers had been on a secure coding course. This
was followed by investment in better testing, static analysis tools, and regular
patching. The number and lifetime of exploitable vulnerabilities continued to
fall through later releases of Windows. But the attackers got better too, and the
protection in Windows isn’t all for the user’s benefit. As Peter Gutmann points
out, much more effort went into protecting premium video content than into
protecting users’ credit card numbers [843].
From the viewpoint of the consumer, markets with lock-in are often ‘bargains
then rip-offs’. You buy a nice new printer for $39.95, then find to your disgust
after just a few months that you need two new printer cartridges for $19.95
each. You wonder whether you’d not be better off just buying a new printer.
From the viewpoint of the application developer, markets with standards races
based on lock-in look a bit like this. At first it’s really easy to write code for
them; later on, once you’re committed, there are many more hoops to jump
through. From the viewpoint of the poor consumer, they could be described as
‘poor security, then security for someone else’.
The same pattern can be seen with externalities from security management
costs to infrastructure decisions that the industry takes collectively. When racing to establish a dominant position, vendors are tempted to engineer products so that most of the security management cost is dumped on the user. A
classic example is SSL/TLS encryption. This was adopted in the mid-1990s as
Microsoft and Netscape battled for dominance of the browser market. As we
discussed in Chapter 5, SSL leaves it up to the user to assess the certificate
offered by a web site and decide whether to trust it; and this led to all kinds of
phishing and other attacks. Yet dumping the compliance costs on the user made
perfect sense at the time; competing protocols such as SET would have saddled
banks with the cost of issuing certificates to every customer who wanted to buy
stuff online, and that would just have cost too much [524]. The world ended
up with an insecure system of credit card payments on the Internet, and with
most of the stakeholders trying to dump liability on others in ways that block
progress towards something better.
There are also network effects for bads, as well as for goods. Most malware
writers targeted Windows rather than Mac or Linux through the 2000s and
2010s as there are simply more Windows machines to infect – leading to an odd
equilibrium in which people who were prepared to pay more for their laptop
could have a more secure one, albeit one that didn’t run as much software. This
model replicated itself when smartphones took over the world in the 2010s;
since Android took over from Windows as the world’s most popular operating
system, we’re starting to see a lot of bad apps for Android, while people who
pay more for an iPhone get better security but less choice. We will discuss this
in detail in the chapter on phones.
8.6.2 Managing the patching cycle
The second big debate in security economics was about how to manage the
patching cycle. If you discover a vulnerability, should you just publish it, which
may force the vendor to patch it but may leave people exposed for months until
they do so? Or should you report it privately to the vendor – and risk getting
a lawyer’s letter threatening an expensive lawsuit if you tell anyone else, after
which the vendor just doesn’t bother to patch it?
This debate goes back a long way; as we noted in the preface, the Victorians agonised over whether it was socially responsible to publish books about
lockpicking, and eventually concluded that it was [1899]. People have worried
more recently about whether the online availability of the US Army Improvised
Munitions Handbook [1928] helps terrorists; in some countries it’s a crime to
possess a copy.
Security economics provides both a theoretical and a quantitative framework
for discussing some issues of this kind. We started in 2002 with simple models
in which bugs were independent, identically distributed and discovered at
random; these have nice statistical properties, as attackers and defenders are
on an equal footing, and the dependability of a system is a function only of
the initial code quality and the total amount of time spent testing it [75]. But
is the real world actually like that? Or is it skewed by correlated bugs, or by
the vendor’s inside knowledge? This led to a big policy debate. Eric Rescorla
argued that software is close enough to the ideal that removing one bug makes
little difference to the likelihood of an attacker finding another one later, so frequent disclosure and patching were an unnecessary expense unless the same
vulnerabilities were likely to be rediscovered [1599]. Ashish Arora and others
responded with data showing that public disclosure made vendors fix bugs
more quickly; attacks increased to begin with, but reported vulnerabilities
declined over time [134]. In 2006, Andy Ozment and Stuart Schechter found
that the rate at which unique vulnerabilities were disclosed for the core
OpenBSD operating system decreased over a six-year period [1490]. In short,
in the right circumstances, software can be more like wine than like milk – it
improves with age. (Sustainability is a holy grail, and I discuss it in more
detail in Part 3.)
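The flavour of the i.i.d. model can be conveyed with a back-of-the-envelope calculation (my own sketch, not the model in [75]): if a system ships with n latent bugs, each of which an attacker finds independently at rate λ per unit of effort, the first find arrives after an expected effort of 1/(nλ). Patching one bug out of a thousand barely moves that figure, which was Rescorla's point, while better initial code quality or more total testing moves it a lot.

```
# Expected attacker effort to find a first exploitable bug under the
# i.i.d. assumption, plus a Monte Carlo sanity check. Rates and bug
# counts are illustrative assumptions.
import random

def expected_effort(n_bugs, lam=0.001):
    return 1.0 / (n_bugs * lam)          # minimum of n exponentials has rate n*lam

def simulated_effort(n_bugs, lam=0.001, trials=2000):
    return sum(min(random.expovariate(lam) for _ in range(n_bugs))
               for _ in range(trials)) / trials

print(expected_effort(1000), simulated_effort(1000))  # about 1.0 either way
print(expected_effort(999))     # about 1.001: patching one bug changes little
print(expected_effort(500))     # 2.0: halving the bug count doubles the attacker's work
```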
Several further institutional factors helped settle the debate in favour of
responsible disclosure, also known as coordinated disclosure, whereby people
report bugs to vendors or to third parties that keep them confidential for a
period until patches are available, then let the reporters get credit for their discoveries. One was the political settlement at the end of Crypto War I whereby
bugs would be reported to CERT which would share them with the NSA
during the bug-fixing process, as I will discuss later in section 26.2.7.3. This
got governments on board. The second was the emergence of commercial vulnerability markets such as those set up by iDefense and TippingPoint, where
security researchers could sell bugs; these firms would then disclose each
bug responsibly to the vendor, and also work out indicators of compromise
that could be sold to firms operating firewall or intrusion-detection services.
Third, smart software firms started their own bug-bounty programs, so that
security researchers could sell their bugs directly, cutting out middlemen such
as CERT and iDefense.
This marketplace sharpened considerably after Stuxnet drove governments
to stockpile vulnerabilities. We’ve seen the emergence of firms like Zerodium
that buy bugs and sell them to state actors, and to cyberweapons suppliers
that also sell to states; zero-day exploits for platforms such as the iPhone can
now sell for a million dollars or more. This had knock-on effects on the supply
chain. For example, in 2012 we came across the first case of a volunteer deliberately contributing vulnerable code to an open-source project (WebKit, which is used in mobile phone browsers), no doubt in the
hope of a six-figure payoff if it had found its way into widely-used platforms.
Already in 2010, Sam Ransbotham had shown that although open-source and
proprietary software are equally secure in an ideal model, bugs get turned into
exploits faster in the open source world, so attackers target it more [1582]. In
2014, Abdullah Algarni and Yashwant Malaiya surveyed vulnerability markets and interviewed some of the more prolific researchers: a combination of
curiosity and economic incentives draws in many able young men, many from
less developed countries. Some disclose responsibly, some use vulnerability
markets to get both money and recognition, while others sell for more money
to the black hats. Some will offer bugs to the vendor, but if not treated properly
will offer them to the bad guys instead. Vendors have responded with comparable offers: at Black Hat 2019, Apple announced a bug bounty schedule that
goes up to $1m for exploits that allow zero-click remote command execution
on iOS. Oh, and many of the bug hunters retire after a few years [39]. Like
it or not, volunteers running open-source projects now find themselves some
capable motivated opponents if their projects get anywhere, and even if they
can’t match Apple’s pocket, it’s a good idea to keep as many of the researchers
onside as possible.
The lifecycle of a vulnerability now involves not just its discovery, but perhaps some covert use by an intelligence agency or other black-hat actor; then its
rediscovery, perhaps by other black hats but eventually by a white hat; the shipment of a patch; and then further exploitation against users who didn’t apply
the patch. There are tensions between vendors and their customers over the
frequency and timing of patch release, as well as with complementers and secondary users over trust. A vulnerability in Linux doesn’t just affect the server in
your lab and your kid’s Raspberry Pi. Linux is embedded everywhere: in your
air-conditioner, your smart TV and even your car. This is why responsible disclosure is being rebranded as coordinated disclosure. There may be simply too
many firms using a platform for the core developers to trust them all about a
forthcoming patch release. There are also thousands of vulnerabilities, of which
dozens appear each year in the exploit kits used by criminals (and some no
doubt used only once against high-value targets, so they never become known
to defense systems). We have to study multiple overlapping ecosystems – of
the vulnerabilities indexed by their CVE numbers; of the Indicators of Compromise (IoCs) that get fed to intrusion detection systems; of disclosure to vendors
directly, via markets, via CERTs and via ISACs; of the various botnets, crime
gangs and state actors; and of the various recorded crime patterns. We have
partial correlations between these ecosystems, but the data are generally noisy.
I’ll come back to all this and discuss the technical details in section 27.5.7.
8.6.3 Structural models of attack and defence
The late Jack Hirshleifer, the founder of conflict theory, told the story of Anarchia, an island whose flood defences were constructed by individual families
each of whom maintained a section of the flood wall. The island’s flood defence
thus depended on the weakest link, that is, the laziest family. He compared
this with a city whose defences against missile attack depend on the single
best defensive shot [908]. Another example of best-shot is medieval warfare,
where there could be a single combat between the two armies’ champions. This
can lead to different political systems. Medieval Venice, the best example of
weakest-link defence because of the risk of flooding, had strong central government, with the merchant families electing a Doge with near-dictatorial powers
over flood defence. In much of the rest of late medieval Europe, kings or chieftains led their own armies to kill enemies and seize land; the strongest king
built the biggest empire, and this led to a feudal system that optimised the
number of men at arms.
Hal Varian extended this model to the dependability of information systems – where performance can depend on the weakest link, the best effort,
or the sum-of-efforts [1949]. This last case, the sum-of-efforts, is the modern
model for warfare: we pay our taxes and the government hires soldiers. It’s
more efficient than best-shot (where most people will free-ride behind the
heroes), which in turn is more efficient than weakest-link (where everyone will
be vulnerable via the laziest). Information security is an interesting mix of all
three modes. Program correctness can depend on the weakest link (the most
careless programmer introducing a vulnerability) while software vulnerability
testing may depend on the sum of everyone’s efforts. Security may also
depend on the best effort – the actions taken by an individual champion
such as a security architect. As more agents are added, systems become more
reliable in the sum-of-efforts case but less reliable in the weakest-link case. So
as software companies get bigger, they end up hiring more testers and fewer
(but more competent) programmers; Microsoft found by the early 2000s that
they had more test engineers than software engineers.
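The three cases are easy to see in a toy simulation (my own illustration of Varian's distinction, not his model verbatim): give each of n defenders an independent random effort and take the minimum, maximum or sum.

```
# Protection under weakest-link, best-shot and sum-of-efforts as the
# number of defenders grows. The effort distribution is an illustrative assumption.
import random

def protection(n, mode, trials=10_000):
    combine = {'weakest-link': min, 'best-shot': max, 'sum-of-efforts': sum}[mode]
    total = 0.0
    for _ in range(trials):
        efforts = [random.random() for _ in range(n)]   # each defender's effort in [0, 1]
        total += combine(efforts)
    return total / trials

for n in (2, 10, 50):
    print(n, round(protection(n, 'weakest-link'), 3),
             round(protection(n, 'best-shot'), 3),
             round(protection(n, 'sum-of-efforts'), 1))
# weakest-link decays towards zero, best-shot creeps towards its ceiling,
# and sum-of-efforts grows with every extra defender
```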
Other models of attack and defence include epidemic models of malware
spread, which were important back when computer viruses spread from
machine to machine via floppy disks, but are of less interest now that we see
relatively few wormable exploits; and models of security games that hinge
on timing, notably the game of FlipIt by Ron Rivest and colleagues [559];
indeed, there’s a whole conference (Gamesec) devoted to game theory and
information security. There are also models of social networks. For example,
most social networks owe their connectivity to a relatively small number of
nodes that have a relatively high number of links to other nodes [1998]. Knocking out these nodes can rapidly disconnect things; William the Conqueror
consolidated England after 1066 by killing the Anglo-Saxon nobility and
replacing them with Normans, while Stalin killed the richer peasants. US and
British forces similarly targeted highly-connected people in counterinsurgency
operations during the Iraq war (and the resulting social breakdown in Sunni
areas helped the emergence of Islamic State). Such models also suggest that
for insurgents to form into cells is the natural and most effective response to
repeated decapitation attacks [1375].
George Danezis and I also showed that where solidarity is needed for
defence, smaller and more homogeneous groups will be more effective [511].
Rainer Böhme and Tyler Moore studied what happens where it isn’t – if people
use defense mechanisms that bring only private benefit, then the weakest-link
model becomes one of low-hanging fruit. Examples include spammers who
simply guess enough weak passwords to replenish their stock of compromised
email accounts, and some types of card-not-present fraud [277].
In short, the technology of conflict in any age can have deep and subtle effects
on politics, as it conditions the kind of institutions that can survive and thrive.
These institutions in turn shape the security landscape. Tyler Moore, Allan
Friedman and Ariel Procaccia studied whether a national agency such as the
NSA with both defensive and offensive missions would disclose vulnerabilities so they could be fixed, or stockpile them; they concluded that if it could
ignore the social costs that fall on others, it would stockpile [1340]. However the
biggest institutions in the security ecosystem are probably not the government
agencies but the dominant firms.
8.6.4 The economics of lock-in, tying and DRM
Technical lock-in is one of the factors that lead to dominant-firm markets, and
software firms have spent billions over more than thirty years on mechanisms
that make it hard for their customers to leave but easy for their competitors to
defect. The 1980s saw file format wars where companies tried to stop anyone
else accessing the word-processing files or spreadsheets their software generated. By the 1990s, the fight had shifted to network compatibility as Microsoft
tried to exclude other operating systems from LANs, until SAMBA created
interoperability with Apple; in the wake of a 1993 anti-trust suit, Microsoft held
back from using the Windows contract to block it. Adversarial interoperability
emerged as a kind of judo to fight network effects [570]. Similar mechanisms are
used to control markets in neighbouring or complementary goods and services,
examples being tying ink cartridges to printers, and digital rights management
(DRM) systems that lock music and videos to a specific machine or family of
machines, by preventing users from simply copying them as files. In an early
security-economics paper, Hal Varian pointed out in 2002 that their unfettered
use could damage competition [1948].
In 2003, Microsoft, Intel and others launched a ‘Trusted Computing’ initiative
that extended rights management to other types of file, and Windows Server
2003 offered ‘Information Rights Management’ (IRM) whereby I could email
you a Word document that you could only read on screen, not print, and only
till the end of the month. There was obvious potential for competitive abuse;
by transferring control of user data from the owner of the machine on which it
is stored to the creator of the file in which it is stored, the potential for lock-in
is hugely increased [74]. Think of the example in section 8.3.2 above, in which
a firm has 100 staff, each with a PC on which they install Office for $150. The
$15,000 they pay Microsoft is roughly equal to the total costs of switching to
(say) LibreOffice, including training, converting files and so on. However, if
control of the files moves to its thousands of customers, and the firm now has
to contact each customer and request a digital certificate in order to migrate
the file, then clearly the switching costs have increased – so you could expect
the cost of Office to increase too. IRM failed to take off at the time: corporate
America quickly understood that it was a lock-in play, European governments
objected to the fact that the Trusted Computing initiative excluded small firms,
and Microsoft couldn’t get the mechanisms to work properly with Vista. (But
now that email has moved to the cloud, both Microsoft and Google are offering
restricted email services of just the type that was proposed, and objected to,
back in 2003.)
Another aspect concerns DRM and music. In the late 1990s and early 2000s,
Hollywood and the music industry lobbied hard for mandatory DRM in
consumer electronics equipment, and we still pay the costs of that in various
ways; for example, when you switch your presentation from a VGA adapter to
HDMI and you lose the audio. Hollywood’s claim that unlicensed peer-to-peer
filesharing would destroy the creative industries was always shaky; a 2004
study showed that downloads didn’t harm music industry revenues overall [1459] while a later one suggested that downloaders actually bought more
CDs [51]. However the real issue was explained in 2005 by Google’s chief
economist [1950]: that a stronger link between the tech industry and music
would help tech firms more than the music industry, because tech was more
concentrated (with only three serious music platforms then – Microsoft,
Sony and Apple). The content industry scoffed, but by the end of that year
music publishers were protesting that Apple was getting too large a share
of the cash from online music sales. Power in the supply chain moved from
the music majors to the platforms, so the platforms (now Apple, Google,
Amazon and Spotify) got most of the money and the residual power in the
music industry shifted from the majors to the independents – just as airline
deregulation favoured aircraft makers and low-cost airlines. This is a striking
demonstration of the predictive power of economic analysis. By fighting a
non-existent threat, the record industry let the computer industry eat its lunch.
I discuss this in more detail in section 24.5.
DRM had become much less of an issue by 2020; the move from removable
media to streaming services means that few people copy music or movies
any more; the question is whether you pay a subscription to avoid the ads.
Similarly, the move to cloud-based services means that few people steal
software. As a result, crimes involving copyright infringement have dropped
sharply [92].
However, the move to the cloud is making lock-in a more complex matter,
operating at the level of ecosystems as well as of individual products. We discussed above how competition from Google Docs cut the price of Office, and
so Microsoft responded with a move to Office365; and how the total cost of
ownership of either that service or G-suite is greater than a standalone productivity product. So where is the lock-in? Well, if you opt for the Google ecosystem, you’ll probably be using not just Gmail and Google Docs but a Google
calendar, maps and much else. Although you can always download all your
data, reinstalling it on a different platform (such as Microsoft’s or Apple’s) will
be a lot of bother, so you’ll probably just grit your teeth and pay for more storage when the free quota runs out. Similarly, if you start using tools like Slack
or Splunk in an IT company, you’ll end up customising them in all sorts of
ways that make it difficult to migrate. Again, this is nothing new; my own university’s dreadful accounting system has been a heavily customised version
of Oracle Financials for about 20 years. Now everyone’s playing the lock-in
game by inducing customers to buy or build complementary assets, or even to
outsource whole functions. Salesforce has taken over many companies’ sales
admin, Palantir has locked in many US police forces, and the big academic
publishers are usurping the functions of university libraries. Where there’s no
viable competition, there’s a real policy issue. The depth of Microsoft lock-in
on public-sector IT is illustrated by the brave attempts made by the city of
Munich to break away and use Linux in public administration: this was eventually reverted after 15 years, several visits by Bill Gates, and a new mayor [760].
The IT industry now has such global scale and influence that we need to see its
competition problems in a larger context.
8.6.5 Antitrust law and competition policy
The control of whole ecosystems by cartels is nothing new. Tim Wu reminds
us that both the English civil war and the American revolution started as
revolts against royal monopolies, while US antitrust law was inspired by Louis
Brandeis’ campaign against J.P. Morgan’s railway empire, and its European
equivalent by the help that German monopolists gave Hitler in his rise to
power [2053]. Joshua Specht tells the history of how big food companies like
Cargill and Armour grabbed control of the two-sided markets opened up
by the railroads, consolidated their power by buying infrastructure such as
grain elevators, dumped climate risk on small farmers, ran union organisers
out of town and even got the politicians to pass ‘ag-gag’ laws that define
animal-rights activism as terrorism [1812]. There are echoes of this in the
way the big IT service firms have built out their market power, controlling
everything from the ad ecosystem through operating systems to datacentres,
and seeking to marginalise their critics.
US antitrust activity has been on the wane since the 2000 election, after which
the new President Bush ended a big case against Microsoft. This was coupled
with US competition law turning its focus to consumer surplus, at the expense
of the other effects of monopoly [2053]. In fact, the whole global economy
has become more monopolistic over the first two decades of the twenty-first
century, and IT appears to account for much of the growth in industry concentration [235]. But it isn’t the only factor. The USA has also seen a wave of
corporate mergers, and there is a growing literature on moats – structural barriers to competition, of which network effects and technical lock-in are merely
two examples. Others range from patents and regulatory capture to customer
insight derived from control of data [1433]. (The word ‘moat’ appears due to
Warren Buffett, who became one of the world’s richest men by buying shares
in several dozen companies with captive markets [1834].) The dynamics of the
information industries compound many of these existing problems and can
make both effective competition, and effective regulation, even harder. However a clear pattern is now emerging: that US markets are becoming steadily
less competitive, while markets in Europe are becoming slightly more so [1524].
A new generation of competition-law scholars, such as Lina Khan of Harvard, argues that American law needs to take a much broader view of competition abuse than consumer surplus, just as Europe has always done [1046].
So should Amazon and Facebook be broken up, just like AT&T? President
Obama's antitrust economist Carl Shapiro argues that antitrust law is ill-suited
to tackle the political power that large corporations wield, and so remedies
should be targeted at specific harms [1719]. Carl does however concede that
US antitrust law has been excessively narrowed by the Supreme Court in the
last 40 years, that the consumer-welfare test is inadequate, that dominant firms’
exclusionary conduct and labour-market practices both need to be tackled, and
that the USA needs to control horizontal mergers better [1720].
European competition law has for many years forbidden firms from using a
dominant position in one market to establish a dominant position in another,
and we’ve seen a whole series of judgements against the big tech firms in the
European courts. Regulators are designed to be more independent, since no
one member state wants to risk them being captured by any other [1524]. As
for the likely future direction, a 2019 report for the European Commission’s
Directorate-General of Competition by Jacques Crémer, Yves-Alexandre de
Montjoye and Heike Schweitzer highlights not just the tech majors' network
externalities and extreme returns to scale, but also the fact that they control
more and more of the data thanks to the move to online services and cloud
computing [497]. As a result they have economies of scope: succeeding in
one business makes it easier to succeed in another. It concludes that the
EU’s competition-law framework is basically sound but needs some tuning:
regulators need to protect both competition for the market and competition in
the market, such as on dominant platforms, which have a responsibility not to
distort competition there. In this environment, regulators must pay attention
to multihoming, switching, interoperability, data portability and the effect on
aftermarkets.
Tying spare parts is already regulated in Europe, with specific laws in some
sectors requiring vendors to let other firms make compatible spare parts, and
in others requiring that they make spares available for a certain period of time.
Some very specific policy issues can arise if you use security mechanisms to
tie products to each other. This links in with laws on planned obsolescence,
which is reinforced for goods with digital components when the vendors
limit the time period for which software updates are made available. The
rules have recently been upgraded in the European Union by a new Sales
of Goods Directive (2019/771) that from January 2022 requires firms selling
goods with digital components – whether embedded software, cloud services
or associated phone apps – to maintain this software for at least two years
after the goods are sold, and for longer if this is the reasonable expectation
of the customer (for cars and white goods it’s likely to mean ten years). Such
regulations will become more of an issue now we have software in durable
goods such as cars and medical devices; I’ll discuss sustainability in the last
chapter of this book.
8.6.6 Perversely motivated guards
“There’s nane sae blind as them that will na see”, goes an old Scots proverb,
and security engineering throws up lots of examples.
There’s very little police action against cybercrime, as they found it
simpler to deter people from reporting it. As we noted in section 2.3,
this enabled them to claim that crime was falling for many years
even though it was just moving online like everything else.
Governments have imposed a duty on banks to spot money laundering, especially since 9/11. However no banker really wants to know
that one of his customers is a Mafioso. So banks lobby for risk reduction to be formalised as due diligence; they press for detailed regulations that specify the forms of ID they need for new account opening,
and the processing to be done to identify suspicious transactions.
When it comes to fraud, spotting a rare bank fraud pattern means
a payment service provider should now carry the loss rather than
just telling the customer she must be mistaken or lying. So they’re
tempted to wait and learn about new fraud types from industry or
from academics, rather than doing serious research of their own.
Click fraud is similar. Spotting a pattern of ‘inorganic clicks’ from
a botnet means you can’t charge the advertisers for those clicks
any more. You have to do some work to mitigate the worst of
it, but if you have a dominant market position then the harder
you work at fighting click fraud, the less revenue you earn.
Finding bugs in your own code is another example. Of course you
have to tweak the obvious bugs that stop it working, but what about
the more subtle bugs that can be exploited by attackers? The more
time you spend looking for them, the more time you have to spend
fixing them. You can always go and buy static analysis tools, but
then you’ll find thousands more bugs and your ship date will slip by
months. So firms tend to do that only if their customers demand it,
and it’s only cheap if you do it from the start of a project (but in that
case you could just as well write the code in Rust rather than in C).
There are more subtle examples, such as when it’s not politically acceptable
to tell the truth about threats. In the old days, it was hard to talk to a board of
directors about the insider threat, as directors mostly preferred to believe the
best about their company; so a typical security manager would make chilling
presentations about ‘evil hackers’ in order to get the budget to build internal
controls. Nowadays, the security-policy space in many companies has been
captured by the big four accountancy firms, whose consensus on internal controls is tied to their thought leadership on governance, which a cynic might say
is optimised for the welfare not of their ostensible client, the shareholders, but
for their real client, the CEO. Executive frauds are rarely spotted unless they
bring the company down; the effort goes instead into the annoying and irrelevant, such as changing passwords every month and insisting on original paper
receipts. I discuss all this in detail in section 12.2.2.
Or consider the 2009 parliamentary expenses scandal in the UK described in
section 2.3.6. Perhaps the officers of the Houses of Parliament didn’t defend the
expenses system more vigorously because they have to think of MPs and peers
as ‘honourable members’ in the context of a government that was pushing
harsh surveillance legislation with a slogan of ‘If you’ve nothing to hide you
have nothing to fear’. The author of that slogan, then Home Secretary Jacqui
Smith, may have had nothing to hide, but her husband did: he was watching
porn and charging it to her parliamentary expenses. Jacqui lost her job, and her
seat in Parliament too. Had officers known that the information on the expenses
server could cost a cabinet minister her job, they would probably have classified it Top Secret and kept it in a vault. But how could the extra costs have
been justified to the Treasury? On that cheerful note, let’s go on to privacy.
8.6.7 Economics of privacy
The privacy paradox is that people say that they value privacy, yet act otherwise. If you stop people in the street and ask them their views, about a third
say they are privacy fundamentalists and will never hand over their personal
information to marketers or anyone else; about a third say they don’t care; and
about a third are in the middle, saying they’d take a pragmatic view of the risks
and benefits of any disclosure. However, their shopping behaviour – both online
and offline – is quite different; the great majority of people pay little heed to
privacy, and will give away the most sensitive information for little benefit.
Privacy-enhancing technologies have been offered for sale by various firms,
yet most have failed in the marketplace. Why should this be?
Privacy is one aspect of information security that interested economists
before 2000. In 1978, Richard Posner defined privacy in terms of secrecy [1539],
and the following year extended it to seclusion [1540]. In 1980, Jack Hirshleifer
published a seminal paper in which he argued that rather than being about
withdrawing from society, privacy was a means of organising society, arising
from evolved territorial behaviour; internalised respect for property supports
autonomy. In 1996, Hal Varian analysed privacy in terms of information
markets [1944]. Consumers want to not be annoyed by irrelevant marketing
calls while marketers do not want to waste effort; yet both are frustrated,
because of search costs, externalities and other factors. Varian suggested giving
consumers rights in information about themselves, and letting contracts sort
it out.
However, as we’ve seen, the information industries are prone to market
failures leading to monopoly, and the proliferation of dominant, information-intensive business models demands a different approach. Andrew Odlyzko
argued in 2003 that these monopolies simultaneously increase both the
incentives and the opportunities for price discrimination [1464]. Companies
mine online interactions for data revealing individuals’ willingness to pay,
and while the differential pricing we see in many markets from airline
yield-management systems to telecommunications prices may be economically efficient, it is increasingly resented. Peter Swire argued that we should
measure the externalities of privacy intrusion [1856]. If a telesales operator
calls 100 prospects, sells three of them insurance, and annoys 80, then the
conventional economic analysis considers only the benefit to the three and
to the insurer. But persistent annoyance causes millions of people to go
ex-directory, screen calls through an answering machine, or just not have a
landline at all. The long-run societal costs of robocalls can be considerable.
Empirical studies of people’s privacy valuations have supported this.
The privacy paradox has generated a significant literature, and is compounded by at least three factors. First, there are many different types of
privacy harm, from discrimination in employment, credit and insurance,
through the kind of cybercrime that presents as payment fraud, to personal
crimes such as stalking and non-consensual intimate imagery.
Second, the behavioral factors we discussed in section 3.2.5 play a large role.
Leslie John and colleagues demonstrated the power of context with a neat
experiment. She devised a ‘privacy meter’ in the form of a list of embarrassing
questions; the score was how many questions a subject would answer before
they balked. She tried this on three groups of students: a control group in a
neutral university setting, a privacy treatment group who were given strong
assurances that their data would be encrypted, their IP addresses not stored,
and so on; and a gamer treatment group that was taken to an external website (howbadareyou.com with a logo of a smiling devil). You might think that
the privacy treatment group would disclose more, but in fact they disclosed
less – as privacy had been made salient to them. As for the gamer group, they
happily disclosed twice as much as the control group [989].
Third, the industry understands this, and goes out of its way to make privacy risks less salient. Privacy policies are usually not on the front page, but
are easily findable by concerned users; policies typically start with anodyne
text and leave the unpleasant stuff to the end, so they don’t alarm the casual
viewer, but the vigilant minority can quickly find a reason not to use the site,
so they also don’t stop the other users clicking on the ads. The cookie warnings mandated in Europe are mostly anodyne, though some firms give users
fine-grained control; as noted in section 3.2.5, the illusion of control is enough
to reassure many.
So what’s the overall effect? In the 2000s and early 2010s there was evidence
that the public were gradually learning what we engineers already understood
about the risks; we could see this for example in the steadily rising proportion
of Facebook users who opt to use privacy controls to narrow that system’s very
open defaults.
In 2015, almost two years after the Snowden revelations, two surveys conducted by Pew Research disclosed a growing sense of learned helplessness
among the US public. 93% of adults said that being in control of who can get
information about them is important, and 90% that controlling what information is collected about them is important; 88% said it’s important that no-one
watch or listen to them without their permission. Yet just 6% of adults said
they were ‘very confident’ that government agencies could keep their records
private and secure, while another 25% said they were ‘somewhat confident.’
The figures for phone companies and credit card companies were similar while
those for advertisers, social media and search engines were significantly worse.
Yet few respondents had done anything significant, beyond occasionally clearing their browser history or refusing particularly inappropriate demands for
personal information [1206].
These tensions have been growing since the 1960s, and have led to complex
privacy regulation that differs significantly between the US and Europe. I’ll
discuss this in much more detail in section 26.6.
8.6.8 Organisations and human behaviour
Organisations often act in apparently irrational ways. We frequently see
firms and even governments becoming so complacent that they’re unable
to react to a threat until it’s a crisis, when they panic. The erosion of health
service resilience and pandemic preparedness in Europe and North America
in the century since the 1918–19 Spanish flu is merely the most salient of
many examples. As another example, it seems that there’s always one phone
company, and one bank, that the bad guys are picking on. A low rate of fraud
makes people complacent, until the bad guys notice. The rising tide of abuse
is ignored, or blamed on customers, for as long as possible. Then it gets in the
news and executives panic. Loads of money get spent for a year or two, stuff
gets fixed, and the bad guys move on to the next victim.
So the security engineer needs to anticipate the ways in which human frailties
express themselves through organisational behaviour.
There’s a substantial literature on institutional economics going back to
Thorstein Veblen. One distinguished practitioner, Herb Simon, was also
a computing pioneer and founded computer science at CMU. In a classic
book on administrative behaviour, he explained that the decisions taken by
managers are not just about efficiency but also organisational loyalty and
authority, and the interaction between the organisation’s goals and the incentives facing individual employees; there are messy hierarchies of purpose,
while values and facts are mixed up [1758]. A more modern analysis of these
problems typically sees them as principal-agency issues in the framework
of microeconomics; this is a typical approach of professors of accountancy.
We will discuss the failures of the actual practice of accountancy later, in
section 12.2. Another approach is public-choice economics, which applies
microeconomic methods to study the behaviour of politicians, civil servants
and people in public-sector organisations generally. I summarise public choice
in section 26.3.3; the principles are illustrated well in the TV sitcom ‘Yes Minister’, which explores the behaviour of British civil servants. Cynics note
that bureaucracies seem to evolve in such a way as to minimise the likelihood
of blame.
My own observation, having worked in banks, tech companies big and small
and in the university sector too, is that competition is more important than
whether an enterprise is publicly or privately owned. University professors
compete hard with each other; our customer isn’t our Vice-Chancellor but the
Nobel Prize committee or equivalent. But as university administrators work
in a hierarchy with the VC at the top, they face the same incentives as civil
servants and display many of the same strengths and weaknesses. Meanwhile,
some private firms have such market power that internally they behave just
like government (though with much better pay at the top).
8.6.9 Economics of cybercrime
If you’re going to protect systems from attack, it’s a good idea to know who the
attackers are, how many they are, where they come from, how they learn their
jobs and how they’re motivated. This brings us to the economics of cybercrime.
In section 2.3 we gave an overview of the cybercrime ecosystem, and there are
many tools we can use to study it in more detail. At the Cambridge Cybercrime
Centre we collect and curate the data needed to do this, and make it available
to over a hundred researchers worldwide. As in other economic disciplines,
there’s an iterative process of working out what the interesting questions are
and collecting the data to answer them. The people with the questions are not
just economists but engineers, psychologists, lawyers, law enforcement and,
increasingly, criminologists.
One approach to crime is that of Chicago-school economists such as Gary
Becker, who in 1968 analysed crime in terms of rewards and punishments [201].
This approach gives many valuable insights but isn’t the whole story. Why is
crime clustered in bad neighbourhoods? Why do some kids from these neighbourhoods become prolific and persistent offenders? Traditional criminologists
study questions like these, and find explanations of value in crime prevention:
the worst offenders often suffer multiple deprivation, with poor parenting,
with substance and alcohol abuse, and get drawn into cycles of offending.
The earlier they start in their teens, the longer they’ll persist before they give
up. Critical criminologists point out that laws are made by the powerful, who
maintain their power by oppressing the poor, and that bad neighbourhoods are
more likely to be over-policed and stigmatised than the nice suburbs where the
rich white people live.
Drilling down further, we can look at the bad neighbourhoods, the psychology of offenders, and the pathways they take into crime. Since the 1960s
there has been a substantial amount of research into using environmental
design to suppress crime, initially in low-cost housing and then everywhere.
For example, courtyards are better than parks, as residents are more likely
to identify and challenge intruders; many of these ideas for situational crime
prevention go across from criminology into systems design. In section 13.2.2
we’ll discuss this in more detail.
Second, psychologically normal people don’t like harming others; people
who do so tend to have low empathy, perhaps because of childhood abuse,
or (more often) to have minimisation strategies to justify their actions. Bank
robbers see bankers as the real exploiters; soldiers dehumanise the enemy as
‘gooks’ or ‘terrs’; and most common murderers see their crimes as a matter of
honour. “She cheated on me” and “He disrespected me” are typical triggers;
we discussed the mechanisms in section 3.2.4. These mechanisms go across to
the world of online and electronic fraud. Hackers on the wrong side of the law
tend to feel their actions are justified anyway: hacktivists are political activists
after all, while cyber-crooks use a variety of minimisation strategies to avoid
feeling guilty. Some Russian cybercrooks take the view that the USA screwed
Russia over after 1989, so they’re just getting their own back (and they’re supported in this by their own government’s attitudes and policies). As for bankers
who dump fraud risks on customers, they talk internally about ‘the avalanche
of fraudulent claims of fraud’ they’d face if they owned up to security holes.
Third, it’s important to understand the pathways to crime, the organisation
of criminal gangs, and the diffusion of skills. Steve Levitt studied the organisation and finances of Chicago crime gangs, finding that the street-level dealers
were earning less than minimum wage [1153]. They were prepared to stand
in the rain and be shot at for a chance to make it to the next level up, where
the neighbourhood boss drove around in a BMW with three girls. Arresting
the boss won’t make any difference as there are dozens of youngsters who’ll
fight to replace him. To get a result, the police should target the choke point,
such as the importer’s system administrator. These ideas also go across. Many
cyber-criminals start off as gamers, then cheat on games, then deal in game
cheats, then learn how to code game cheats, and within a few years the more
talented have become malware devs. So one policy intervention is to try to
stop kids crossing the line between legal and illegal game cheating. As I mentioned in section 3.2.4, the UK National Crime Agency bought Google ads
which warned people in Britain searching for DDoS-for-hire services that the
use of such services was illegal. Ben Collier and colleagues used our Cybercrime Centre data to show that this halted the growth of DDoS attacks in the
UK, compared with the USA where they continued to grow [457].
We discussed the overall costs of cybercrime in section 2.3, noting that the
ecosystem has been remarkably stable over the past decade, despite the fact
that the technology has changed; we now go online from phones more than
laptops, use social networks, and keep everything in the cloud. Most acquisitive crime is now online; in 2019 we estimate that about a million UK households
suffered a burglary or car theft, while over two million suffered a fraud or scam,
almost always online. (In 2020 the difference will be even more pronounced;
burglary has fallen still further with people staying at home through the lockdown.) Yet policy responses lag almost everywhere. Studies of specific crimes
are reported at various places in this book.
The effects of cybercrime are also studied via the effects of breach disclosures.
Alessandro Acquisti and colleagues have studied the effects on the stock price
of companies of reporting a security or privacy breach [15]; a single breach
tends to cause a small dip that dissipates after a week or so, but a double breach
can impair investor confidence over the longer term. Breach disclosure laws
have made breaches into insurable events; if TJX loses 47m records and has to
pay $5 to mail each customer, that’s a claim; we’ll discuss cyber-insurance later
in section 28.2.9.
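To see why disclosure turns a breach into an insurable event, here is a back-of-the-envelope calculation using the TJX figures just quoted; the $5 mailing cost is the only input, so treat the result as an order-of-magnitude illustration rather than the actual claim.

# Rough size of a breach-notification claim, using the figures quoted above.
records_lost = 47_000_000          # customer records exposed in the TJX breach
cost_per_notification = 5.00       # dollars to mail each affected customer
claim = records_lost * cost_per_notification
print(f"Notification claim: ${claim:,.0f}")    # roughly $235,000,000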
Overall, though, measurement is tricky. Most of the relevant publications
come from organisations with an incentive to talk up the losses, from police
agencies to anti-virus vendors; our preferred methodology is to count the losses
by modus operandi and by sector, as presented in section 2.3.
8.7 Summary
Many systems fail because the incentives are wrong, rather than because of
some technical design mistake. As a result, the security engineer needs to
understand basic economics as well as the basics of crypto, protocols, access
controls and psychology. Security economics has grown rapidly to explain
many of the things that we used to consider just ‘bad weather’. It constantly
throws up fascinating new insights into all sorts of questions from how to
optimise the patching cycle through whether people really care about privacy.
Research problems
So far, three areas of economics have been explored for their relevance to security, namely microeconomics, game theory and behavioural economics. But
economics is a vast subject. What other ideas might it give us?
In the history paper I wrote on the origins of security economics, I suggested a
new research student might use the following heuristics to select a research
topic. First, think of security and X for other subfields X of economics. Second, think about the security economics of Y for different applications Y; there
have already been some papers on topics like payments, pornography, gaming, and censorship, but these aren’t the only things computers are used for.
Third, where you find gold, keep digging (e.g. behavioral privacy) [79]. Since
then I would add the following.
Fourth, there is a lot of scope for data-driven research now that we’re starting to make large datasets available to academics (via the Cambridge Cybercrime Centre) and many students are keen to develop skills in data science. A
related problem is how to gather more data that might be useful in exploring
other fields, from the productivity of individual security staff to how security works within institutions, particularly large complex institutions such as
governments and healthcare systems. Is there any good way of measuring the
quality of a security culture?
Fifth, now we’re starting to put software and online connectivity in durable
safety-critical things like cars and medical devices, we need to know a lot more
about the interaction between security and safety, and about how we can keep
such systems patched and running for decades. This opens up all sorts of new
topics in dependability and sustainability.
The current research in security economics is published mostly at the Workshop on the Economics of Information Security (WEIS), which has been held
annually since 2002 [77]. There are liveblogs of all but one of the workshops,
which you can find on our blog https://www.lightbluetouchpaper.org.
Further reading
The classic introduction to information economics is Shapiro and Varian’s
‘Information Rules’ which remains remarkably fresh for a book written twenty
years ago [1721]. This is still on our student reading list. The most up-to-date
summary is probably Jacques Crémer, Yves-Alexandre de Montjoye and Heike
Schweitzer's 2019 report for the European Commission's Directorate-General
of Competition, which analyses what goes wrong with markets in which
information plays a significant role [497]; I would also read Carl Shapiro's
2019 review of the state of competition policy in the USA [1720]. Tim Wu's
“The Master Switch” discusses monopoly in telecomms and the information
industries generally, including the breakup of AT&T, which was essential
to the development of the Internet as we know it today – one of antitrust
law’s greatest achievements [2051]. His later book, “The Curse of Bigness”,
tells the broader antitrust story, including the antitrust case against IBM that
spawned the modern software industry [2053]. If you’re seriously interested
in antitrust and competition policy you need to dive into the detail, for which
I’d suggest Thomas Philippon’s “The Great Reversal – How America Gave
up on Free Markets” [1524]. This analyses multiple aspects of market power
across several industries in America and Europe, and explains the machinery
economists use for the purpose.
The early story of security economics is told in [79]; there’s an early (2007)
survey of the field that I wrote with Tyler Moore at [111], and a more comprehensive 2011 survey, also with Tyler, at [112]. For privacy economics, see
Alessandro Acquisti’s online bibliography, and the survey paper he wrote with
George Loewenstein and Laura Brandimarte [16]; there’s also a survey of the
literature on the privacy paradox by Spiros Kokolakis [1078]. Then, to dive into
the research literature, I’d suggest the WEIS conference papers and liveblogs.
A number of economists study related areas. I mentioned Jack Hirshleifer’s
conflict theory [909]; another important strand is the economics of crime, which
was kick-started by Gary Becker [201], and has been popularised by Steve
Levitt and Stephen Dubner’s “Freakonomics” [1153]. Diego Gambetta is probably the leading scholar of organised crime; his ‘Codes of the Underworld: How
Criminals Communicate’ is a classic [742]. Finally, there is a growing research
community and literature on cyber-criminology, for which the website of our
Cambridge Cybercrime Centre might be a reasonable starting point.
If you plan to do research in security economics and your degree wasn’t in
economics, you might work through a standard textbook such as Varian [1945]
or the Core Economics website. Adam Smith’s classic ‘An inquiry into the nature
and causes of the wealth of nations’ is still worth a look, while Dick Thaler’s ‘Misbehaving’ tells the story of behavioural economics.
PART II
In this second part of the book, I describe a large number of applications of secure systems, many of which
introduce particular protection concepts or technologies.
There are three broad themes. Chapters 9–12 look
at conventional computer security issues, and by
discussing what we’re trying to do and how it’s done
in different environments – the military, healthcare,
the census and banking – we introduce security policy
models which set out the protection concepts that real
systems try to implement. These range from multilevel
security through compartments to anonymisation and
internal control. We introduce our first detailed case
studies, from government networks through medical
records to payment systems.
Chapters 13–20 look at the hardware and system engineering aspects of information security.
This ranges from biometrics, through the design
of hardware security mechanisms from physical
locks, security printing and seals, to chip-level
tamper-resistance and emission security. We study
applications that illustrate these technologies, ranging
from burglar alarms through curfew tags, utility meters
and payment cards to the control of nuclear weapons.
We end up with a chapter on advanced cryptographic
engineering, where hardware and software security
meet: there we discuss topics from secure messaging
and anonymous communications through hardware
security modules and enclaves to blockchains.
Our third theme is attacks on networks and on highly-networked systems.
We start off in Chapter 21 with attacks on computer networks and defensive
technologies ranging from firewalls to PKI. We then study the phone ecosystems in Chapter 22, from bad apps down through switching exploits and out
to the policy tussles over 5G. Chapter 23 tackles electronic and information
warfare, showing how far techniques of denial, deception and exploitation
can be taken by serious combatants, and helping us hone our appreciation
of anonymity and traffic analysis. Chapter 24 shows how some of these techniques are adapted in systems for digital rights management.
Finally, in Chapter 25 I present four areas of bleeding-edge security research
in 2020. First are autonomous vehicles, including the ‘autopilot’ systems
starting to appear in family cars. Second, we look at the machine-learning
systems on which such vehicles increasingly rely, and which are starting to be
used in many other applications. Third, we work through the realistic options
for people to protect themselves using privacy technology and operational
security measures in a world where surveillance is being turbocharged by
machine-learning techniques. Finally, we look at elections, which are becoming ever more fraught with claims (whether true or false) of interference and
cheating.
This ordering tries to give the chapters a logical progression. Thus, for
example, I discuss frauds against magnetic stripe bank cards before going
on to describe the smartcards which replaced them and the phone payment
systems which rely on smartcards for the SIMs that authenticate devices to
the network, but which have so many more vulnerabilities thanks to the rich
environment that has evolved since the launch of the iPhone.
Often a technology has evolved through a number of iterations over several
applications. In such cases I try to distill what I know into a history. It can be
confusing and even scary when you first dive into a 5,000-page manual for
something that’s been evolving for thirty years like the card payment system,
or an Intel or Arm CPU; the story of how it evolved and why is often what you
need to make sense of it.
CHAPTER 9
Multilevel Security
Most high assurance work has been done in the area of kinetic devices and infernal machines
that are controlled by stupid robots. As information processing technology becomes more
important to society, these concerns spread to areas previously thought inherently harmless, like
operating systems.
– EARL BOEBERT
The password on the government phone always seemed to drop, and I couldn’t get into it.
– US diplomat and former CIA officer KURT VOLKER, explaining why he texted from his
personal phone
I brief; you leak; he/she commits a criminal offence by divulging classified information.
– BRITISH CIVIL SERVICE VERB
9.1 Introduction
In the next few chapters I’m going to explore the concept of a security policy
using case studies. A security policy is a succinct description of what we’re trying to achieve; it’s driven by an understanding of the bad outcomes we wish
to avoid and in turn drives the engineering. After I’ve fleshed out these ideas
a little, I’ll spend the rest of this chapter exploring the multilevel security (MLS)
policy model used in many military and intelligence systems, which hold information at different levels of classification (Confidential, Secret, Top Secret, … ),
and have to ensure that data can be read only by a principal whose clearance
level is at least as high. Such policies are increasingly also known as information
flow control (IFC).
They are important for a number of reasons, even if you’re never planning to
work for a government contractor:
1. from about 1980 to about 2005, the US Department of Defense spent
several billion dollars funding research into multilevel security. So the
model was worked out in great detail, and we got to understand the
second-order effects of pursuing a single policy goal with great zeal;
2. the mandatory access control (MAC) systems used to implement it have
now appeared in all major operating systems such as Android, iOS and
Windows to protect core components against tampering by malware, as I
described in Chapter 6;
3. although multilevel security concepts were originally developed
to support confidentiality in military systems, many commercial systems now use multilevel integrity policies. For example,
safety-critical systems use a number of safety integrity levels1 .
The poet Archilochus famously noted that a fox knows many little things,
while a hedgehog knows one big thing. Security engineering is usually in fox
territory, but multilevel security is an example of the hedgehog approach.
9.2 What is a security policy model?
Where a top-down approach to security engineering is possible, it will typically
take the form of threat model – security policy – security mechanisms. The critical,
and often neglected, part of this process is the security policy.
By a security policy, we mean a document that expresses clearly and concisely
what the protection mechanisms are to achieve. It is driven by our understanding of threats, and in turn drives our system design. It will often take the form
of statements about which users may access which data. It plays the same role
in specifying the system’s protection requirements, and evaluating whether
they have been met, that the system specification does for functionality and
the safety case for safety. Like the specification, its primary function is to communicate.
Many organizations use the phrase ‘security policy’ to mean a collection of
vapid statements, as in Figure 9.1:
Megacorp, Inc. security policy
1. This policy is approved by Management.
2. All staff shall obey this security policy.
3. Data shall be available only to those with a “need-to-know”.
4. All breaches of this policy shall be reported at once to Security.
Figure 9.1: typical corporate policy language
1 Beware
though that terminology varies between different safety-engineering disciplines. The
safety integrity levels in electricity generation are similar to Biba, while automotive safety
integrity levels are set in ISO 26262 as a hazard/risk metric that depends on the likelihood
that a fault will cause an accident, together with the expected severity and controllability.
This sort of language is common, but useless – at least to the security engineer. First, it dodges the central issue, namely 'Who determines "need-to-know" and
how?’ Second, it mixes statements at different levels (organizational approval
of a policy should logically not be part of the policy itself). Third, there is a
mechanism but it’s implied rather than explicit: ‘staff shall obey’ – but what
does this mean they actually have to do? Must the obedience be enforced by the
system, or are users ‘on their honour’? Fourth, how are breaches to be detected
and who has a specific duty to report them?
When you think about it, this is political language. A politician’s job is to
resolve the tensions in society, and this often requires vague language on which
different factions can project their own wishes; corporate executives are often
operating politically, to balance different factions within a company2.
2 Big projects often fail in companies when the specification becomes political, and they fail even more often when run by governments – issues I'll discuss further in Part 3.
Because the term ‘security policy’ is often abused to mean using security for
politics, more precise terms have come into use by security engineers.
A security policy model is a succinct statement of the protection properties that
a system must have. Its key points can typically be written down in a page or
less. It is the document in which the protection goals of the system are agreed
with an entire community, or with the top management of a customer. It may
also be the basis of formal mathematical analysis.
A security target is a more detailed description of the protection mechanisms
that a specific implementation provides, and how they relate to a list of control
objectives (some but not all of which are typically derived from the policy
model). The security target forms the basis for testing and evaluation of a
product.
A protection profile is like a security target but expressed in a manner that is
independent of the implementation, so as to enable comparable evaluations
across products and versions. This can involve the use of a semi-formal language, or at least of suitable security jargon. A protection profile is a requirement for products that are to be evaluated under the Common Criteria [1398].
(I discuss the Common Criteria in section 28.2.7; they are used by many governments for mutual recognition of security evaluations of defense information
systems.)
When I don’t have to be so precise, I may use the phrase ‘security policy’ to
refer to either a security policy model or a security target. I will never use it to
refer to a collection of platitudes.
Sometimes, we’re confronted with a completely new application and have
to design a security policy model from scratch. More commonly, there already
exists a model; we just have to choose the right one, and develop it into
a security target. Neither of these steps is easy. In this section of the book,
I provide a number of security policy models, describe them in the context
of real systems, and examine the engineering mechanisms (and associated
constraints) which a security target can use to meet them.
9.3 Multilevel security policy
On March 22, 1940, President Roosevelt signed Executive Order 8381, enabling
certain types of information to be classified Restricted, Confidential or
Secret [980]. President Truman later added a higher level of Top Secret. This
developed into a common protective marking scheme for the sensitivity
of documents, and was adopted by NATO governments too in the Cold
War. Classifications are labels, which run upwards from Unclassified through
Confidential, Secret and Top Secret (see Figure 9.2). The original idea was that
information whose compromise could cost lives was marked ‘Secret’ while
information whose compromise could cost many lives was ‘Top Secret’.
Government employees and contractors have clearances depending on the care
with which they’ve been vetted; in the USA, for example, a ‘Secret’ clearance
involves checking FBI fingerprint files, while ‘Top Secret’ also involves
background checks for the previous five to fifteen years’ employment plus
an interview and often a polygraph test [548]. Candidates have to disclose all
their sexual partners in recent years and all material that might be used to
blackmail them, such as teenage drug use or gay affairs3 .
The access control policy was simple: you can read a document only if your
clearance is at least as high as the document’s classification. So an official
cleared to ‘Top Secret’ could read a ‘Secret’ document, but not vice versa.
So information may only flow upwards, from confidential to secret to top
secret, but never downwards – unless an authorized person takes a deliberate
decision to declassify it.
The system rapidly became more complicated. The damage criteria for classifying documents were expanded from possible military consequences to economic harm and even political embarrassment. Information that is neither classified nor public is known as ‘Controlled Unclassified Information’ (CUI) in the
USA while Britain uses ‘Official’4 .
3 In
June 2015, the clearance review data of about 20m Americans was stolen from the Office of
Personnel Management by the Chinese intelligence services. By then, about a million Americans
had a Top Secret clearance; the OPM data also covered former employees and job applicants, as
well as their relatives and sexual partners. With hindsight, collecting all the dirt on all the citizens
with a sensitive job may not have been a great idea.
4 Prior to adopting the CUI system, the United States had more than 50 different markings for data that was controlled but not classified, including For Official Use Only (FOUO), Law Enforcement Sensitive (LES), Proprietary (PROPIN), Federal Tax Information (FTI), Sensitive but Unclassified (SBU), and many, many others. Some agencies made up their own labels, without any coordination. Further problems arose when civilian documents marked Confidential ended up at the National Archives and Records Administration, where CONFIDENTIAL was a national security classification. Moving from this menagerie of markings to a single centrally-managed government-wide system has taken more than a decade and is still ongoing. The UK has its own post-Cold-War simplification story.
Figure 9.2: multilevel security – classification levels run upwards from UNCLASSIFIED through CONFIDENTIAL and SECRET to TOP SECRET
There is also a system of codewords whereby information, especially at Secret
and above, can be restricted further. For example, information that might reveal
intelligence sources or methods – such as the identities of agents or decryption
capabilities – is typically classified ‘Top Secret Special Compartmented Intelligence’ or TS/SCI, which means that so-called need to know restrictions are
imposed as well, with one or more codewords attached to a file. Some codewords relate to a particular military operation or intelligence source and are
available only to a group of named users. To read a document, a user must
have all the codewords that are attached to it. A classification label, plus a
set of codewords, makes up a security category or (if there’s at least one codeword) a compartment, which is a set of records with the same access control
policy. Compartmentation is typically implemented nowadays using discretionary access control mechanisms; I’ll discuss it in the next chapter.
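The read rule just described – a clearance must dominate the label, meaning its level is at least as high and it carries every codeword attached to the file – can be sketched in a few lines of Python. The level ordering follows Figure 9.2; the codeword names are purely illustrative.

# Illustrative dominance check for labels made up of a level plus codewords.
LEVELS = {"UNCLASSIFIED": 0, "CONFIDENTIAL": 1, "SECRET": 2, "TOP SECRET": 3}

def dominates(clr_level, clr_codewords, doc_level, doc_codewords):
    """True if the clearance may read the document: level at least as high,
    and every codeword on the document also held by the reader."""
    return (LEVELS[clr_level] >= LEVELS[doc_level]
            and set(doc_codewords) <= set(clr_codewords))

print(dominates("TOP SECRET", {"SCI"}, "SECRET", set()))              # True: read down
print(dominates("SECRET", set(), "TOP SECRET", set()))                # False: no read up
print(dominates("TOP SECRET", {"SCI"}, "TOP SECRET", {"SCI", "X"}))   # False: missing codeword "X"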
There are also descriptors, caveats and IDO markings. Descriptors are words
such as ‘Management’, ‘Budget’, and ‘Appointments’: they do not invoke
any special handling requirements, so we can deal with a file marked ‘Confidential – Management’ as if it were simply marked ‘Confidential’. Caveats
are warnings such as “UK Eyes Only”, or the US equivalent, “NOFORN”;
they do create restrictions. There are also International Defence Organisation
markings such as NATO5 . The lack of obvious differences between codewords,
descriptors, caveats and IDO marking helps make the system confusing.
A more detailed explanation can be found in [1565].
5 Curiously, in the UK 'NATO Secret' is less secret than 'Secret', so it's a kind of anti-codeword that moves the content down the lattice rather than up.
9.3.1 The Anderson report
In the 1960s, when computers started being widely used, the classification
system caused serious friction. Paul Karger, who worked for the USAF then,
described having to log off from a Confidential system, walk across the yard
to a different hut, show a pass to an armed guard, then go in and log on to a
Secret system – over a dozen times a day. People soon realised they needed
a way to deal with information at different levels at the same desk, but how
could this be done without secrets leaking? As soon as one operating system
bug was fixed, some other vulnerability would be discovered. The NSA hired
an eminent computer scientist, Willis Ware, to its scientific advisory board,
and in 1967 he brought the extent of the computer security problem to official
and public attention [1989]. There was the constant worry that even unskilled
users would discover loopholes and use them opportunistically; there was
also a keen and growing awareness of the threat from malicious code. (Viruses
were not invented until the 1980s; in the 1970s the concern was Trojans.) There was
then a serious scare when it was discovered that the Pentagon’s World Wide
Military Command and Control System (WWMCCS) was vulnerable to Trojan
Horse attacks; this had the effect of restricting its use to people with a ‘Top
Secret’ clearance, which was inconvenient.
The next step was a 1972 study by James Anderson for the US government
which concluded that a secure system should do one or two things well; and
that these protection properties should be enforced by mechanisms that were
simple enough to verify and that would change only rarely [52]. It introduced
the concept of a reference monitor – a component of the operating system that
would mediate access control decisions and be small enough to be subject to
analysis and tests, the completeness of which could be assured. In modern
parlance, such components – together with their associated operating procedures – make up the Trusted Computing Base (TCB). More formally, the TCB
is defined as the set of components (hardware, software, human, … ) whose
correct functioning is sufficient to ensure that the security policy is enforced,
or, more vividly, whose failure could cause a breach of the security policy. The
Anderson report’s goal was to make the security policy simple enough for the
TCB to be amenable to careful verification.
9.3.2 The Bell-LaPadula model
The multilevel security policy model that gained wide acceptance was proposed by Dave Bell and Len LaPadula in 1973 [211]. Its basic property is that
information cannot flow downwards. More formally, the Bell-LaPadula (BLP)
model enforces two properties:
The simple security property: no process may read data at a higher level.
This is also known as no read up (NRU);
The *-property: no process may write data to a lower level. This is also
known as no write down (NWD).
The *-property was Bell and LaPadula’s critical innovation. It was driven by
the WWMCCS debacle and the more general fear of Trojan-horse attacks. An
uncleared user might write a Trojan and leave it around where a system administrator cleared to ‘Secret’ might execute it; it could then copy itself into the
‘Secret’ part of the system, read the data there and try to signal it down somehow. It’s also quite possible that an enemy agent could get a job at a commercial
software house and embed some code in a product that would look for secret
documents to copy. If it could then write them down to where its creator could
read them, the security policy would have been violated. Information might
also be leaked as a result of a bug, if applications could write down.
Vulnerabilities such as malicious and buggy code are assumed to be given. It
is also assumed that most staff are careless, and some are dishonest; extensive
operational security measures have long been used, especially in defence
environments, to prevent people leaking paper documents. So the pre-existing
culture assumed that security policy was enforced independently of user
actions; Bell-LaPadula sets out to enforce it not just independently of users’
direct actions, but of their indirect actions (such as the actions taken by
programs they run).
So we must prevent programs running at ‘Secret’ from writing to files at
‘Unclassified’. More generally we must prevent any process at High from
signalling to any object at Low. Systems that enforce a security policy independently of user actions are described as having mandatory access control, as
opposed to the discretionary access control in systems like Unix where users can
take their own access decisions about their files.
The Bell-LaPadula model enabled designers to prove theorems. Given both
the simple security property (no read up), and the star property (no write
down), various results can be proved: in particular, if your starting state
is secure, then your system will remain so. To keep things simple, we will
generally assume from now on that the system has only two levels, High
and Low.
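The two properties can be captured directly as a toy reference monitor over these two levels; this is only a sketch of the model, not a description of any fielded system.

# Toy Bell-LaPadula reference monitor with the two levels High and Low.
LEVELS = {"Low": 0, "High": 1}

def may_read(subject_level, object_level):
    # simple security property: no read up
    return LEVELS[subject_level] >= LEVELS[object_level]

def may_write(subject_level, object_level):
    # *-property: no write down
    return LEVELS[subject_level] <= LEVELS[object_level]

assert may_read("High", "Low") and not may_read("Low", "High")
assert may_write("Low", "High") and not may_write("High", "Low")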
9.3.3 The standard criticisms of Bell-LaPadula
The introduction of BLP caused a lot of excitement: here was a security policy that did what the defence establishment thought it wanted, was intuitively
clear, yet still allowed people to prove theorems. Researchers started to beat up
on it and refine it.
The first big controversy was about John McLean’s System Z, which he
defined as a BLP system with the added feature that a user can ask the system
administrator to temporarily declassify any file from High to Low. In this way,
Low users can read any High file without breaking the BLP assumptions.
Dave Bell countered that System Z cheats by doing something his model
doesn’t allow (changing labels isn’t a valid operation on the state), and John
McLean's retort was that the model didn't say so explicitly: so the BLP rules were
not in themselves enough. The issue is dealt with by introducing a tranquility
property. Strong tranquility says that security labels never change during
system operation, while weak tranquility says that labels never change in such
a way as to violate a defined security policy.
Why weak tranquility? In a real system we often want to observe the principle of least privilege and start off a process at the uncleared level, even if the
owner of the process were cleared to ‘Top Secret’. If they then access a confidential email, their session is automatically upgraded to ‘Confidential’; in general,
a process is upgraded each time it accesses data at a higher level (the high water
mark principle). As subjects are usually an abstraction of the memory management sub-system and file handles, rather than processes, this means that state
changes when access rights change, rather than when data actually moves.
The practical implication is that a process acquires the security labels of all the
files it reads, and these become the default label set of every file that it writes.
So a process which has read files at ‘Secret’ and ‘Crypto’ will thereafter create
files marked ‘Secret Crypto’. This will include temporary copies made of other
files. If it then reads a file at ‘Secret Nuclear’ then all files it creates after that
will be labelled ‘Secret Crypto Nuclear’, and it will not be able to write to any
temporary files at ‘Secret Crypto’.
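The floating-label behaviour described above can be modelled with a set union on every read; the sketch below reproduces the 'Secret Crypto Nuclear' example, though real MLS platforms track labels at the level of memory and file handles rather than whole processes.

# High-water-mark sketch: a process accumulates the labels of everything it
# reads, and every file it then writes gets that whole label set by default.
class Process:
    def __init__(self):
        self.labels = set()                 # start uncleared (least privilege)

    def read(self, file_labels):
        self.labels |= set(file_labels)     # labels only ever float upwards

    def new_file_labels(self):
        return set(self.labels)             # default label set for anything written

p = Process()
p.read({"Secret", "Crypto"})
print(p.new_file_labels())      # {'Secret', 'Crypto'}
p.read({"Secret", "Nuclear"})
print(p.new_file_labels())      # {'Secret', 'Crypto', 'Nuclear'}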
The effect this has on applications is one of the serious complexities of multilevel security; most application software needs to be rewritten (or at least
modified) to run on MLS platforms. Real-time changes in security level mean
that access to resources can be revoked at any time, including in the middle of
a transaction. And as the revocation problem is generally unsolvable in modern operating systems, at least in any complete form, the applications have to
cope somehow. Unless you invest some care and effort, you can easily find
that everything ends up in the highest compartment – or that the system fragments into thousands of tiny compartments that don’t communicate at all with
each other. In order to prevent this, labels are now generally taken outside the
MLS machinery and dealt with using discretionary access control mechanisms
(I’ll discuss this in the next chapter).
Another problem with BLP, and indeed with all mandatory access control
systems, is that separating users and processes is the easy part; the hard part is
when some controlled interaction is needed. Most real applications need some
kind of trusted subject that can break the security policy; the classic example
was a trusted word processor that helps an intelligence analyst scrub a Top
Secret document when she’s editing it down to Secret [1272]. BLP is silent on
how the system should protect such an application. So it becomes part of the
Trusted Computing Base, but a part that can’t be verified using models based
solely on BLP.
Finally it’s worth noting that even with the high-water-mark refinement, BLP
still doesn’t deal with the creation or destruction of subjects or objects (which
is one of the hard problems of building a real MLS system).
9.3.4 The evolution of MLS policies
Multilevel security policies have evolved in parallel in both the practical and
research worlds.
The first multilevel security policy was a version of high water mark written in 1967–8 for the ADEPT-50, a mandatory access control system developed
for the IBM S/360 mainframe [2010]. This used triples of level, compartment
and group, with the groups being files, users, terminals and jobs. As programs
(rather than processes) were subjects, it was vulnerable to Trojan horse compromises. Nonetheless, it laid the foundation for BLP, and also led to the current
IBM S/390 mainframe hardware security architecture [942].
The next big step was Multics. This had started as an MIT project in 1965 and
developed into a Honeywell product; it became the template and inspirational
example for ‘trusted systems’. The evaluation that was carried out on it by Paul
Karger and Roger Schell was hugely influential and was the first appearance of
the idea that malware could be hidden in the compiler [1022] – and led to Ken
Thompson’s famous paper ‘Reflections on Trusting Trust’ ten years later [1887].
Multics had a derivative system called SCOMP that I’ll discuss in section 9.4.1.
The torrent of research money that poured into multilevel security from
the 1980s led to a number of alternative formulations. Noninterference was
introduced by Joseph Goguen and Jose Meseguer in 1982 [774]. In a system
with this property, High’s actions have no effect on what Low can see.
Nondeducibility is less restrictive and was introduced by David Sutherland in
1986 [1851] to model applications such as a LAN on which there are machines
at both Low and High, with the High machines encrypting their LAN traffic6 .
Nondeducibility turned out to be too weak, as there’s nothing to stop Low
making deductions about High input with 99% certainty. Other theoretical
models include Generalized Noninterference and restrictiveness [1278]; the
Harrison-Ruzzo-Ullman model tackles the problem of how to deal with the creation and deletion of files, on which BLP is silent [869]; and the Compartmented
Mode Workstation (CMW) policy attempted to model the classification of
information using floating labels, as in the high water mark policy [808, 2042].
6 Quite a lot else is needed to do this right, such as padding the High traffic with nulls so that Low users can't do traffic analysis – see [1635] for an early example of such a system. You may also need to think about Low traffic over a High network, such as facilities for soldiers to phone home.
Out of this wave of innovation, the model with the greatest impact on modern
systems is probably the type enforcement (TE) model, due to Earl Boebert and
Dick Kain [272], later extended by Lee Badger and others to Domain and Type
Enforcement (DTE) [154]. This assigns subjects to domains and objects to types,
with matrices defining permitted domain-domain and domain-type interactions. This is used in SELinux, now a component of Android, which simplifies
it by putting both subjects and objects in types and having a matrix of allowed
type pairs [1189]. In effect this is a second access-control matrix; in addition to
having a user ID and group ID, each process has a security ID (SID). The Linux
Security Modules framework provides pluggable security where you can set
rules that operate on SIDs.
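Stripped of the operating-system plumbing, a type-enforcement decision is just a lookup in a table of allowed (subject type, object type, permission) triples. The sketch below uses invented type names that loosely echo SELinux naming style; it illustrates the idea, not the SELinux policy language itself.

# Type-enforcement sketch: access is permitted only if the triple of subject
# type, object type and permission appears in the policy.
ALLOWED = {
    ("httpd_t", "web_content_t", "read"),
    ("httpd_t", "httpd_log_t", "append"),
    ("named_t", "dns_zone_t", "read"),
}

def allowed(subject_type, object_type, perm):
    return (subject_type, object_type, perm) in ALLOWED

print(allowed("httpd_t", "web_content_t", "read"))   # True
print(allowed("httpd_t", "dns_zone_t", "read"))      # False: the web server gets no DNS data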
DTE introduced a language for configuration (DTEL), and implicit typing of
files based on pathname; so all objects in a given subdirectory may be declared
to be in a given domain. DTE is more general than BLP, as it starts to deal
with integrity as well as confidentiality concerns. One of the early uses was to
enforce trusted pipelines: the idea is to confine a set of processes in a pipeline so
that each can only talk to the previous stage and the next stage. This can be used
to assemble guards and firewalls that cannot be bypassed unless at least two
stages are compromised [1432]. Type-enforcement mechanisms can be aware
of code versus data, and privileges can be bound to code; in consequence the
tranquility problem can be dealt with at execute time rather than as data are
read. This can make things much more tractable. They are used, for example,
in the Sidewinder firewall.
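A trusted pipeline then amounts to generating only the triples that connect each stage to the next, so that no single compromised stage can bypass the others; here is a sketch using the same set-of-triples representation as above, with invented stage names.

# Generate an allowed-triples policy for an n-stage trusted pipeline: each
# stage may write only to the stage immediately after it.
def pipeline_policy(stages):
    return {(stages[i], stages[i + 1], "write") for i in range(len(stages) - 1)}

policy = pipeline_policy(["ingress_t", "filter_t", "audit_t", "egress_t"])
print(("ingress_t", "filter_t", "write") in policy)   # True: adjacent stages may talk
print(("ingress_t", "egress_t", "write") in policy)   # False: no skipping the guard stages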
The downside of the greater flexibility and expressiveness of TE/DTE is that
it is not always straightforward to implement policies like BLP, because of state
explosion; when writing a security policy you have to consider all the possible
interactions between different types. Other mechanisms may be used to manage policy complexity, such as running a prototype for a while to observe what
counts as normal behaviour; you can then turn on DTE and block all the information flows not seen to date. But this doesn’t give much assurance that the
policy you’ve derived is the right one.
In 1992, role-based access control (RBAC) was introduced by David Ferraiolo
and Richard Kuhn to manage policy complexity. It formalises rules that attach
primarily to roles rather than to individual users or machines [678, 679].
Transactions that may be performed by holders of a given role are specified,
then mechanisms for granting membership of a role (including delegation).
Roles, or groups, had for years been the mechanism used in practice in organizations such as banks to manage access control; the RBAC model started
to formalize this. It can be used to give finer-grained control, for example
by granting different access rights to ‘Ross as Professor’, ‘Ross as member of
the Admissions Committee’ and ‘Ross reading private email’. A variant of
it, attribute-based access control (ABAC), adds context, so you can distinguish
‘Ross at his workstation in the lab’ from ‘Ross on his phone somewhere on
Earth’. Both have been supported by Windows since Windows 8.
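An RBAC decision reduces to two lookups – is the user a member of the role, and does the role carry the transaction? The sketch below uses the 'Ross as Professor' example from the text; the role and transaction names are invented for illustration.

# RBAC sketch: transactions are granted to roles, and roles to users; a user
# acts in one role at a time.
ROLE_PERMS = {
    "professor":         {"read_exam_scripts", "submit_grades"},
    "admissions_member": {"read_applications"},
}
USER_ROLES = {"ross": {"professor", "admissions_member"}}

def permitted(user, active_role, transaction):
    return (active_role in USER_ROLES.get(user, set())
            and transaction in ROLE_PERMS.get(active_role, set()))

print(permitted("ross", "professor", "submit_grades"))       # True
print(permitted("ross", "professor", "read_applications"))   # False: wrong role active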
SELinux builds it on top of TE, so that users are mapped to roles at login time,
roles are authorized for domains and domains are given permissions to types.
On such a platform, RBAC can usefully deal with integrity issues as well as
confidentiality, by allowing role membership to be revised when certain programs are invoked. Thus, for example, a process calling untrusted software that
had been downloaded from the net might lose the role membership required
to write to sensitive system files. I discuss SELinux in more detail in section 9.5.2.
9.3.5 The Biba model
The incorporation into Windows 7 of a multilevel integrity model revived interest in a security model devised in 1975 by Ken Biba [238], which deals with
integrity alone and ignores confidentiality. Biba’s observation was that confidentiality and integrity are in some sense dual concepts – confidentiality is a
constraint on who can read a message, while integrity is a constraint on who
can write or alter it. So you can recycle BLP into an integrity policy by turning
it upside down.
As a concrete application, an electronic medical device such as an ECG may
have two separate modes: calibration and use. Calibration data must be protected from corruption, so normal users should be able to read it but not write
to it; when a normal user resets the device, it will lose its current user state (i.e.,
any patient data in memory) but the calibration must remain unchanged. Only
an authorised technician should be able to redo the calibration.
To model such a system, we can use a multilevel integrity policy with the
rules that we can read data at higher levels (i.e., a user process can read the
calibration data) and write to lower levels (i.e., a calibration process can write to
a buffer in a user process); but we must never read down or write up, as either
could allow High integrity objects to become contaminated with Low – i.e.,
potentially unreliable – data. The Biba model is often formulated in terms of
the low water mark principle, which is the dual of the high water mark principle
discussed above: the integrity of an object is the lowest level of all the objects
that contributed to its creation.
This was the first formal model of integrity. A surprisingly large number
of real systems work along Biba lines. For example, the passenger information system in a railroad may get information from the signalling system, but
shouldn’t be able to affect it; and an electricity utility’s power dispatching system will be able to see the safety systems’ state but not interfere with them. The
safety-critical systems community talks in terms of safety integrity levels, which
relate to the probability that a safety mechanism will fail and to the level of risk
reduction it is designed to give.
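Turning BLP upside down in code gives the following toy version of Biba with the low-water-mark rule; the level names are arbitrary, and the example checks correspond to the ECG calibration scenario above.

# Biba sketch: no read down, no write up; the low water mark gives the
# integrity of an object created from several inputs.
LEVEL = {"Low": 0, "Medium": 1, "High": 2}

def may_read(subject, obj):
    return LEVEL[subject] <= LEVEL[obj]     # reading below your level is forbidden

def may_write(subject, obj):
    return LEVEL[subject] >= LEVEL[obj]     # writing above your level is forbidden

def low_water_mark(contributing_levels):
    return min(contributing_levels, key=LEVEL.get)

print(may_read("Medium", "High"))       # True: a user process may read calibration data
print(may_write("Medium", "High"))      # False: but may not alter it
print(low_water_mark(["High", "Low"]))  # 'Low' - mixing in Low data taints the result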
Windows, since version 6 (Vista), marks file objects with an integrity level,
which can be Low, Medium, High or System, and implements a default policy
of NoWriteUp. Critical files are at System and other objects are at Medium by
default – except for the browser which is at Low. So things downloaded using
IE can read most files in a Windows system, but cannot write to them. The goal
is to limit the damage that can be done by malware.
As you might expect, Biba has the same fundamental problems as BellLaPadula. It cannot accommodate real-world operation very well without
numerous exceptions. For example, a real system will usually require trusted
subjects that can override the security model, but Biba on its own cannot
protect and confine them, any more than BLP can. For example, a car’s airbag
is on a less critical bus than the engine, but when it deploys you assume there’s
a risk of a fuel fire and switch the engine off. There are other real integrity
goals that Biba also cannot express, such as assured pipelines. In the case of
Windows, Microsoft even dropped the NoReadDown restriction and did not
end up using its integrity model to protect the base system from users, as this
would have required even more frequent user confirmation. In fact, the Type
Enforcement model was introduced by Boebert and Kain as an alternative to
Biba. It is unfortunate that Windows didn’t incorporate TE.
9.4 Historical examples of MLS systems
The second edition of this book had a much fuller history of MLS systems; since
these have largely gone out of fashion, and the MLS research programme has
been wound down, I give a shorter version here.
9.4.1 SCOMP
A key product was the secure communications processor (SCOMP), a derivative of
Multics launched in 1983 [710]. This was a no-expense-spared implementation
of what the US Department of Defense believed it wanted for handling messaging at multiple levels of classification. It had formally verified hardware and
software, with a minimal kernel to keep things simple. Its operating system,
STOP, used Multics’ system of rings to maintain up to 32 separate compartments, and to allow appropriate one-way information flows between them.
SCOMP was used in applications such as military mail guards. These are
firewalls that allow mail to pass from Low to High but not vice versa [538].
(In general, a device which supports one-way flow is known as a data diode.)
SCOMP’s successor, XTS-300, supported C2G, the Command and Control
Guard. This was used in the time phased force deployment data (TPFDD)
system whose function was to plan US troop movements and associated
logistics. SCOMP’s most significant contribution was to serve as a model for
the Orange Book [544] – the US Trusted Computer Systems Evaluation Criteria.
This was the first systematic set of standards for secure computer systems,
being introduced in 1985 and finally retired in December 2000. The Orange
Book was enormously influential not just in the USA but among allied powers;
countries such as the UK, Germany, and Canada based their own national
standards on it, until these national standards were finally subsumed into the
Common Criteria [1398].
The Orange Book allowed systems to be evaluated at a number of levels with
A1 being the highest, and moving downwards through B3, B2, B1 and C2 to C1.
SCOMP was the first system to be rated A1. It was also extensively documented
in the open literature. Being first, and being fairly public, it set a target for the
next generation of military systems.
MLS versions of Unix started to appear in the late 1980s, such as AT&T’s
System V/MLS [48]. This added security levels and labels, showing that MLS
properties could be introduced to a commercial operating system with minimal changes to the system kernel. By this book’s second edition (2007), Sun’s
Solaris had emerged as the platform of choice for high-assurance server systems and for many clients as well. Compartmented Mode Workstations (CMWs) were
an example of the latter, allowing data at different levels to be viewed and
modified at the same time, so an intelligence analyst could read ‘Top Secret’
data in one window and write reports at ‘Secret’ in another, without being
able to accidentally copy and paste text downwards [934]. For the engineering,
see [635, 636].
9.4.2 Data diodes
It was soon realised that simple mail guards and crypto boxes were too
restrictive, as more complex networked services were developed besides mail.
First-generation MLS mechanisms were inefficient for real-time services.
The US Naval Research Laboratory (NRL) therefore developed the Pump – a
one-way data transfer device (a data diode) to allow secure one-way information flow (Figure 9.3).
Figure 9.3: The NRL pump (data flows one way only, from LOW through the PUMP up to HIGH)
The main problem is that while sending data from Low to
High is easy, the need for assured transmission reliability means that acknowledgement messages must be sent back from High to Low. The Pump limits the
bandwidth of possible backward leakage using a number of mechanisms such
as buffering and random timing of acknowledgements [1014,1016, 1017]. The
attraction of this approach is that one can build MLS systems by using data
diodes to connect separate systems at different security levels. As these systems don’t process data at more than one level – an architecture called system
high – they can be built from cheap commercial-off-the-shelf (COTS) components.
You don’t need to worry about applying MLS internally, merely protecting
them from external attack, whether physical or network-based. As the cost
of hardware has fallen, this has become the preferred option, and the world’s
military bases are now full of KVM switches (which let people switch their
keyboard, video display and mouse between Low and High systems) and data
diodes (to link Low and High networks). The pump’s story is told in [1018].
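The pump's handling of the backward channel can be illustrated in a few lines: acknowledgements returned to Low are timed from a moving average of recent High-side delays plus random jitter, so High can no longer signal much by modulating its ACK timing. The window size and delays below are arbitrary, and the real designs are considerably more sophisticated [1014, 1016, 1017].

import random

# Sketch of the pump's ACK handling: the delay sent back to Low is drawn from
# a distribution based on a moving average of recent High-side delays; the
# averaging and jitter squeeze the covert channel down to a trickle.
class PumpAcks:
    def __init__(self, window=100, seed_delay=0.01):
        self.recent = [seed_delay] * window     # arbitrary 10 ms starting point

    def ack_delay(self, high_side_delay):
        self.recent = self.recent[1:] + [high_side_delay]
        mean = sum(self.recent) / len(self.recent)
        return random.expovariate(1.0 / mean)   # jitter swamps deliberate modulation

pump = PumpAcks()
print(pump.ack_delay(0.02))   # the delay Low sees, largely decoupled from High's timing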
An early application was logistics. Some signals intelligence equipment is
‘Top Secret’, while things like jet fuel and bootlaces are not; but even such
simple commodities may become ‘Secret’ when their quantities or movements
might leak information about tactical intentions. The systems needed to manage all this can be hard to build; MLS logistics projects in both the USA and UK
have ended up as expensive disasters. In the UK, the Royal Air Force’s Logistics
Information Technology System (LITS) was a 10 year (1989–99), £500m project
to provide a single stores management system for the RAF’s 80 bases [1388]. It
was designed to operate on two levels: ‘Restricted’ for the jet fuel and boot polish, and ‘Secret’ for special stores such as nuclear bombs. It was initially implemented as two separate database systems connected by a pump to enforce the
MLS property. The project became a classic tale of escalating costs driven by
creeping changes in requirements. One of these changes was the easing of classification rules with the end of the Cold War. As a result, it was found that
almost all the ‘Secret’ information was now static (e.g., operating manuals for
air-drop nuclear bombs that are now kept in strategic stockpiles rather than
at airbases). To save money, the ‘Secret’ information is now kept on a CD and
locked up in a safe.
Another major application of MLS is in wiretapping. The target of investigation should not know they are being wiretapped, so the third party must be
silent – and when phone companies started implementing wiretaps as silent
conference calls, the charge for the conference call had to go to the wiretapper,
not to the target. The modern requirement is a multilevel one: multiple agencies at different levels may want to monitor a target, and each other, with the
police tapping a drug dealer, an anti-corruption unit watching the police, and
so on. Eliminating covert channels is harder than it looks; for a survey from the
mid-2000s, see [1710]; a pure MLS security policy is insufficient, as suspects
can try to hack or confuse wiretapping equipment, which therefore needs to
resist online tampering. In one notorious case, a wiretap was discovered on the
mobile phones of the Greek Prime Minister and his senior colleagues during
the Athens Olympics; the lawful intercept facility in the mobile phone company's switchgear was abused by unauthorised software, and was detected
when the buggers’ modifications caused some text messages not to be delivered [1553]. The phone company was fined 76 million Euros (almost $100m).
The clean way to manage wiretaps nowadays with modern VOIP systems may
just be to write everything to disk and extract what you need later.
There are many military embedded systems too. In submarines, speed, reactor output and RPM are all Top Secret, as a history of these three measurements
would reveal the vessel’s performance – and that’s among the few pieces of
information that even the USA and the UK don’t share. The engineering is
made more complex by the need for the instruments not to be Top Secret when
the vessel is in port, as that would complicate maintenance. And as for air
combat, some US radars won’t display the velocity of a US aircraft whose performance is classified, unless the operator has the appropriate clearance. When
you read stories about F-16 pilots seeing an insanely fast UFO whose speed on
their radar didn’t make any sense, you can put two and two together. It will be
interesting to see what sort of other side-effects follow when powerful actors
try to bake MAC policies into IoT infrastructure, and what sort of superstitious
beliefs they give rise to.
9.5 MAC: from MLS to IFC and integrity
In the first edition of this book, I noted a trend to use mandatory access
controls to prevent tampering and provide real-time performance guarantees
[1021, 1315], and ventured that “perhaps the real future of multilevel systems
is not in confidentiality, but integrity.” Government agencies had learned that
MAC was what it took to stop malware. By the second edition, multilevel
integrity had hit the mass market in Windows, which essentially uses the Biba
model.
9.5.1 Windows
In Windows, all processes do, and all securable objects (including directories, files and registry keys) may, have an integrity-level label. File objects
are labelled ‘Medium’ by default, while Internet Explorer (and everything
downloaded using it) is labelled ‘Low’. User action is therefore needed to
upgrade downloaded content before it can modify existing files. It’s also
possible to implement a crude BLP policy using Windows, as you can also set
‘NoReadUp’ and ‘NoExecuteUp’ policies. These are not installed as default;
Microsoft was concerned about malware installing itself in the system and
then hiding. Keeping the browser ‘Low’ makes installation harder, and
allowing all processes (even Low ones) to inspect the rest of the system makes
hiding harder. But this integrity-only approach to MAC does mean that
malware running at Low can steal all your data; so some users might care
to set ‘NoReadUp’ for sensitive directories. This is all discussed by Joanna
Rutkowska in [1637]; she also describes some interesting potential attacks
based on virtualization.
9.5.2 SELinux
The case of SELinux is somewhat similar to Windows in that the immediate
goal of mandatory access control mechanisms was also to limit the effects of a
compromise. SELinux [1189] was implemented by the NSA, based on the Flask
security architecture [1815], which separates the policy from the enforcement
mechanism. In Flask, a security context contains all of the security attributes associated with a subject or object, one of which is the Type Enforcement type; a security identifier is a handle to a security context, mapped by the security server, which makes the policy decisions and resides in the kernel for performance [820]. SELinux has been mainstream since
Linux 2.6. The server provides a security API to the rest of the kernel, behind
which the security model is hidden. The server internally implements a general constraints engine that can express RBAC, TE, and MLS. In typical Linux
distributions from the mid-2000s, it was used to separate various services, so
an attacker who takes over your web server does not thereby acquire your DNS
server as well. Its adoption by Android has made it part of the world’s most
popular operating system, as described in Chapter 6.
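To see what a type-enforcement decision looks like, here is a toy sketch of the kind of check the security server performs: everything is denied unless an explicit allow rule matches the subject’s domain, the object’s type, the object class and the requested permission. The rule set here is invented for illustration (real SELinux policy is written in a much richer language and the decisions are cached), but it shows why a compromised web server gains nothing over, say, DNS data.

    # Toy type-enforcement check -- not the real SELinux implementation.
    ALLOW_RULES = {
        # (subject domain, object type, object class): permitted operations
        ('httpd_t', 'web_content_t', 'file'): {'read', 'getattr'},
        ('named_t', 'dns_zone_t', 'file'): {'read', 'write', 'getattr'},
    }

    def allowed(domain, obj_type, obj_class, perm):
        # Default deny: anything not explicitly allowed is refused.
        return perm in ALLOW_RULES.get((domain, obj_type, obj_class), set())

    assert allowed('httpd_t', 'web_content_t', 'file', 'read')
    assert not allowed('httpd_t', 'dns_zone_t', 'file', 'read')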
9.5.3 Embedded systems
There are many fielded systems that implement some variant of the Biba
model. As well as the medical-device and railroad signalling applications I
already mentioned, there are utilities. In an electricity utility, for example,
there is typically a hierarchy of safety systems, which operate completely
independently at the highest safety integrity level; these are visible to, but
cannot be influenced by, operational systems such as power dispatching;
retail-level metering systems can be observed by, but not influenced by,
the billing system. Both retail meters and the substation-level meters in the
power-dispatching system feed information into fraud detection, and finally
there are the executive information systems, which can observe everything
while having no direct effect on operations. In cars, most makes have separate CAN buses for the powertrain and for the cabin, as you don’t want
a malicious app on your radio to be able to operate your brakes (though
in 2010, security researchers found that the separation was completely
inadequate [1087]).
It’s also worth bearing in mind that simple integrity controls merely stop
malware taking over the machine – they don’t stop it infecting a Low compartment and using that as a springboard from which to spread elsewhere, or to
issue instructions to other machines.
To sum up, many of the lessons learned in the early multilevel systems go
across to a number of applications of wider interest. So do a number of the
failure modes, which I’ll now discuss.
9.6 What goes wrong
Engineers learn more from the systems that fail than from those that succeed,
and here MLS systems have been an effective teacher. The billions of dollars
spent on building systems to follow a simple policy with a high level of assurance have clarified many second-order and third-order consequences of information flow controls. I’ll start with the more theoretical and work through to
the business and engineering end.
9.6.1 Composability
Consider a simple device that accepts two ‘High’ inputs H1 and H2; multiplexes them; encrypts them by xor’ing them with a one-time pad (i.e., a random generator); outputs the other copy of the pad on H3; and outputs the ciphertext, which, being encrypted with a cipher system giving perfect secrecy, is considered to be low (output L), as in Figure 9.4.
In isolation, this device is provably secure. However, if feedback is permitted, then the output from H3 can be fed back into H2, with the result that the high input H1 now appears at the low output L. Timing inconsistencies can also break the composition of two secure systems (noted by Daryl McCullough [1262]).
[Figure 9.4: Insecure composition of secure systems with feedback – a random generator RAND, two XOR gates, High inputs H1 and H2, High output H3 and Low output L.]
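The leak is easy to demonstrate in a few lines of simulation. The sketch below reads Figure 9.4 as computing L = H1 xor H2 xor pad while echoing the pad on H3 – a simplification of the multiplexer wording, and an assumption on my part about the exact wiring – so that once the feedback loop H2 := H3 is closed, the pad cancels and the High input emerges verbatim at the Low output.

    import secrets

    def device(h1, h2, pad):
        # One clock tick of the component in Figure 9.4: the pad masks the
        # High data on the Low output, and a copy of the pad goes out on H3.
        l_out = h1 ^ h2 ^ pad       # ciphertext, released at Low
        h3_out = pad                # pad copy, released at High
        return l_out, h3_out

    h1_secret = [1, 0, 1, 1, 0, 0, 1, 0]

    # In isolation (H2 held at 0), L is one-time-pad ciphertext:
    isolated = [device(b, 0, secrets.randbits(1))[0] for b in h1_secret]

    # With the feedback wire H3 -> H2, the same pad bit that masks the
    # ciphertext also arrives on H2 and cancels itself out:
    leaked = []
    for b in h1_secret:
        pad = secrets.randbits(1)
        l_out, _ = device(b, pad, pad)     # H2 := H3 = pad
        leaked.append(l_out)

    assert leaked == h1_secret             # Top Secret data at the Low output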
In general, the composition problem – how to compose two or more secure components into a secure system – is hard, even at the relatively uncluttered level
of proving results about ideal components [1432]. (Simple information flow
doesn’t compose; neither does noninterference or nondeducibility.) Most of
the low-level problems arise when some sort of feedback is introduced; without it, composition can be achieved under a number of formal models [1279].
However, in real life, feedback is pervasive, and composition of security properties can be made even harder by interface issues, feature interactions and so
on. For example, one system might produce data at such a rate as to perform
a service-denial attack on another. And the composition of secure components
is often frustrated by higher-level incompatibilities. Components might have
been designed in accordance with two different security policies, or designed
according to inconsistent requirements.
9.6.2 The cascade problem
An example of the composition problem is given by the cascade problem
(Figure 9.5). After the Orange Book introduced a series of evaluation levels, span-limit rules emerged about the number of levels at which a single system could operate [548]. For example, a system evaluated to B3 was in general allowed to process information at Unclassified, Confidential and Secret, or at Confidential, Secret and Top Secret; no system was permitted to process Unclassified and Top Secret data simultaneously [548].
As the diagram shows, it is straightforward to connect two B3 systems in such a way that this policy is broken. The first system spans Unclassified and Secret, and its Secret level communicates with the second
system – which also processes Top Secret information [925]. This defeats the
span limit.
[Figure 9.5: The cascade problem – one system spanning Unclassified and Secret connected at Secret to a second system spanning Secret and Top Secret.]
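The arithmetic behind the cascade is simple enough to sketch, under the assumption that each accredited system is described just by the set of levels it may process: two systems that each respect a three-level span limit, but share a level, compose into something that effectively spans four.

    # Toy span-limit check -- levels ordered from low to high.
    ORDER = ['Unclassified', 'Confidential', 'Secret', 'Top Secret']
    RANK = {name: i for i, name in enumerate(ORDER)}

    def span(levels):
        ranks = sorted(RANK[l] for l in levels)
        return ranks[-1] - ranks[0] + 1        # number of levels covered

    sys_a = {'Unclassified', 'Confidential', 'Secret'}   # within the B3 limit
    sys_b = {'Confidential', 'Secret', 'Top Secret'}     # also within the limit

    assert span(sys_a) <= 3 and span(sys_b) <= 3
    # They share 'Secret', so data can cascade from one to the other:
    assert span(sys_a | sys_b) == 4           # Unclassified..Top Secret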
9.6.3 Covert channels
One of the reasons why span limits are imposed on multilevel systems emerges
from a famous – and extensively studied – problem: the covert channel. First
pointed out by Lampson in 1973 [1127], a covert channel is a mechanism that
was not designed for communication but that can nonetheless be abused to
allow information to be communicated down from High to Low.
A typical covert channel arises when a high process can signal to a low one by
affecting some shared resource. In a modern multicore CPU, it could increase
the clock frequency of the CPU core it’s using at time t_i to signal that the i-th
bit in a Top Secret file was a 1, and let it scale back to signal that the bit was a 0.
This gives a covert channel capacity of several tens of bits per second [36]. Since
2018, CPU designers have been struggling with a series of covert channels that
exploit the CPU microarchitecture; with names like Meltdown, Spectre, and
Foreshadow, they have provided not just ways for High to signal to Low but
for Low to circumvent access control and read memory at High. I will discuss
these in detail in the chapter on side channels.
The best that developers of regular operating systems have consistently managed for confidentiality protection is to limit covert-channel bandwidth to 1 bit per second or
so. (That is a DoD target [545], and techniques for doing a systematic analysis
may be found in Kemmerer [1038].) One bit per second may be tolerable in an
environment where we wish to prevent large TS/SCI files – such as satellite
photographs – leaking down from TS/SCI users to ‘Secret’ users. However,
it’s potentially a lethal threat to high-value cryptographic keys. This is one of
the reasons for the military and banking doctrine of doing crypto in special
purpose hardware.
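The difference between ‘tolerable’ and ‘lethal’ is easy to quantify. At a rate of one bit per second, exfiltrating a modern symmetric key takes a few minutes, while a bulky image file takes decades; the figures below are simple illustrative arithmetic, not measurements of any particular channel.

    # How long does a 1 bit/second covert channel take to leak a secret?
    RATE_BPS = 1                                  # DoD-style target bandwidth

    def leak_time_seconds(n_bytes, rate_bps=RATE_BPS):
        return n_bytes * 8 / rate_bps

    print(leak_time_seconds(32))                  # 256-bit key: 256 s, about four minutes
    print(leak_time_seconds(100 * 2**20) / (365 * 24 * 3600))
                                                  # 100 MB image: roughly 27 years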
The highest-bandwidth covert channel of which I’m aware occurs in large
early-warning radar systems, where High – the radar processor – controls hundreds of antenna elements that illuminate Low – the target – with high-speed pulse trains, which are modulated with pseudorandom noise to make jamming harder. In this case, the radar code must be trusted, as the covert channel bandwidth is many megabits per second.
9.6.4 The threat from malware
The defense computer community was shocked when Fred Cohen wrote the
first thesis on computer viruses, and used a virus to penetrate multilevel secure
systems easily in 1983. In his first experiment, a file virus that took only eight
hours to write managed to penetrate a system previously believed to be multilevel secure [452]. People had been thinking about malware since the 1960s
and had done various things to mitigate it, but their focus had been on Trojans.
There are many ways in which malicious code can be used to break access
controls. If the reference monitor (or other TCB components) can be corrupted,
then malware can deliver the entire system to the attacker, for example by issuing an unauthorised clearance. For this reason, slightly looser rules apply to
so-called closed security environments which are defined to be those where ‘system applications are adequately protected against the insertion of malicious
logic’ [548], and this in turn created an incentive for vendors to tamper-proof
the TCB, using techniques such as TPMs. But even if the TCB remains intact,
malware could still copy itself up from Low to High (which BLP doesn’t prevent) and use a covert channel to signal information down.
9.6.5 Polyinstantiation
Another problem that exercised the research community is polyinstantiation.
Suppose our High user has created a file named agents, and our Low user
now tries to do the same. If the MLS operating system prohibits him, it will
have leaked information – namely that there is a file called agents at High. But
if it lets him, it will now have two files with the same name.
Often we can solve the problem by a naming convention, such as giving Low
and High users different directories. But the problem remains a hard one for
databases [1652]. Suppose that a High user allocates a classified cargo to a ship.
The system will not divulge this information to a Low user, who might think
the ship is empty, and try to allocate it another cargo or even to change its
destination.
Here the US and UK practices diverge. The solution favoured in the USA is
that the High user allocates a Low cover story at the same time as the real High
cargo. Thus the underlying data will look something like Figure 9.6.
In the UK, the theory is simpler – the system will automatically reply ‘classified’ to a Low user who tries to see or alter a High record. The two available
views would be as in Figure 9.7.
This makes the system engineering simpler. It also prevents the mistakes and
covert channels that can still arise with cover stories (e.g., a Low user tries to
add a container of ammunition for Cyprus). The drawback is that everyone
tends to need the highest available clearance in order to get their work done.
(In practice, cover stories still get used in order not to advertise the existence
of a covert mission any more than need be.)
Level          Cargo           Destination
Secret         Missiles        Iran
Restricted     –               –
Unclassified   Engine spares   Cyprus

Figure 9.6: How the USA deals with classified data
Level          Cargo        Destination
Secret         Missiles     Iran
Restricted     Classified   Classified
Unclassified   –            –

Figure 9.7: How the UK deals with classified data
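The two doctrines can be modelled as different views over the same stored rows. In the sketch below – which is purely illustrative, with the data taken from Figures 9.6 and 9.7 – the US-style view returns the most highly classified row the user is cleared to see, so a Low user gets the cover story, while the UK-style view simply answers ‘Classified’ to anyone not cleared for the real record.

    # Toy polyinstantiation views -- illustrative only.
    ORDER = {'Unclassified': 0, 'Restricted': 1, 'Secret': 2}

    # US style: one row per level; the lower row is the cover story.
    us_rows = {
        'Secret':       {'cargo': 'Missiles',      'dest': 'Iran'},
        'Unclassified': {'cargo': 'Engine spares', 'dest': 'Cyprus'},
    }

    def us_view(clearance):
        visible = [l for l in us_rows if ORDER[l] <= ORDER[clearance]]
        return us_rows[max(visible, key=ORDER.get)]

    # UK style: a single real row; uncleared users just see 'Classified'.
    uk_row = {'level': 'Secret', 'cargo': 'Missiles', 'dest': 'Iran'}

    def uk_view(clearance):
        if ORDER[clearance] >= ORDER[uk_row['level']]:
            return {'cargo': uk_row['cargo'], 'dest': uk_row['dest']}
        return {'cargo': 'Classified', 'dest': 'Classified'}

    assert us_view('Unclassified') == {'cargo': 'Engine spares', 'dest': 'Cyprus'}
    assert uk_view('Restricted') == {'cargo': 'Classified', 'dest': 'Classified'}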
9.6.6 Practical problems with MLS
Multilevel secure systems are surprisingly expensive and difficult to build and
deploy. There are many sources of cost and confusion.
1. They are built in small volumes, and often to high standards of physical
robustness, using elaborate documentation, testing and other quality
control measures driven by military purchasing bureaucracies.
2. MLS systems have idiosyncratic administration tools and procedures. A trained Unix administrator can’t just take on an MLS
installation without significant further training; so many MLS
systems are installed without their features being used.
3. Many applications need to be rewritten or at least greatly modified to
run under MLS operating systems [1632].
4. Because processes are automatically upgraded as they see new labels,
the files they use have to be too. New files default to the highest label
belonging to any possible input. The result of all this is a chronic tendency for things to be overclassified. There’s a particular problem when
system components accumulate all the labels they’ve seen, leading to
label explosion where they acquire such a collection that no single principal can access them any more. So they get put in the trusted computing base, which ends up containing an uncomfortably large part
of the operating system (plus utilities, plus windowing system software, plus middleware such as database software). This ‘TCB bloat’
constantly pushes up the cost of evaluation and reduces assurance.
5. The classification of data can get complex:
– in the run-up to a conflict, the location of ‘innocuous’ stores such as food could reveal tactical intentions, and so may be suddenly upgraded;
– classifications are not always monotone. Equipment classified at ‘confidential’ may easily contain components classified ‘secret’, and on the flip side it’s hard to grant access at ‘secret’ to secret information in a ‘top secret’ database;
– information may need to be downgraded. An intelligence analyst might need to take a satellite photo classified at TS/SCI, and paste it into an assessment for field commanders at ‘secret’. In case information was covertly hidden in the image by a virus, this may involve special filters, lossy compression of images and so on. One option is a ‘print-and-fax’ mechanism that turns a document into a bitmap, and logs it for traceability;
– we may need to worry about the volume of information available to an attacker. For example, we might be happy to declassify any single satellite photo, but declassifying the whole collection would reveal our surveillance capability and the history of our intelligence priorities. (I will look at this aggregation problem in more detail in section 11.2.)
– similarly, the output of an unclassified program acting on unclassified data may be classified, for example if standard data mining techniques applied to an online forum throw up a list of terror suspects.
6. Although MLS systems can prevent undesired things (such as information leakage), they also prevent desired things (such as building
a search engine to operate across all an agency’s Top Secret compartmented data). So even in military environments, the benefits can be
questionable. After 9/11, many of the rules were relaxed, and access
controls above Top Secret are typically discretionary, to allow information sharing. The cost of that, of course, was the Snowden disclosures.
7. Finally, obsessive government secrecy is a chronic burden. The late
Senator Daniel Moynihan wrote a critical study of its real purposes,
and its huge costs in US foreign and military affairs [1348]. For
example, President Truman was never told of the Venona decrypts
because the material was considered ‘Army Property’. As he put it:
“Departments and agencies hoard information, and the government
becomes a kind of market. Secrets become organizational assets, never
to be shared save in exchange for another organization’s assets.”
More recent examples of MLS doctrine impairing operational effectiveness include the use of unencrypted communications to drones
in the Afghan war (as the armed forces feared that if they got the
NSA bureaucracy involved, the drones would be unusable), and
the use of the notoriously insecure Zoom videoconferencing system
for British government cabinet meetings during the coronavirus
crisis (the government’s encrypted videoconferencing terminals
are classified, so ministers aren’t allowed to take them home). This
brings to mind a quip from an exasperated British general: “What’s
the difference between Jurassic Park and the Ministry of Defence?
One’s a theme park full of dinosaurs, and the other’s a movie!”
There has been no shortage of internal strategic critique. A 2004 report by Mitre’s JASON programme on the US system of classification concluded that it was no longer fit for purpose [980]. There are many interesting reasons, including the widely different risk/benefit calculations of the producer and consumer
communities; classification comes to be dominated by distribution channels
rather than by actual risk. The relative ease of attack has led government systems to be too conservative and risk-averse. It noted many perverse outcomes;
for example, Predator imagery in Iraq is Unclassified, and was for some time
transmitted in clear, as the Army feared that crypto would involve the NSA
bureaucracy in key management and inhibit warfighting.
Mitre proposed instead that flexible compartments be set up for specific purposes, particularly when getting perishable information to tactical compartments; that intelligent use be made of technologies such as rights management
and virtualisation; and that lifetime trust in cleared individuals be replaced
with a system focused on transaction risk.
Anyway, one of the big changes since the second edition of this book is that
the huge DoD research programme on MLS has disappeared, MLS equipment
is no longer very actively promoted on the government-systems market, and
systems have remained fairly static for a decade. Most government systems
now operate system high – that is, entirely at Official, or at Secret, or at Top
Secret. The difficulties discussed in the above section, plus the falling cost of
hardware and the arrival of virtualisation, have undermined the incentive to
have different levels on the same machine. The deployed MLS systems thus
tend to be firewalls or mail guards between the different levels, and are often
referred to by a new acronym, MILS (for multiple independent levels of security). The real separation is at the network level, between unclassified networks,
the Secret Internet Protocol Router Network (SIPRNet) which handles secret
data using essentially standard equipment behind crypto, and the Joint Worldwide Intelligence Communications System (JWICS) which handles Top Secret
material and whose systems are kept in Sensitive Compartmented Information Facilities (SCIFs) – rooms shielded to prevent electronic eavesdropping,
which I’ll discuss later in the chapter on side channels.
There are occasional horrible workarounds such as ‘browse-down’ systems
that will let someone at High view a website at Low; they’re allowed to click
on buttons and links to navigate, just not to enter any text. Such ugly hacks
have clear potential for abuse; at best they can help keep honest people from
careless mistakes.
9.7 Summary
Mandatory access control was initially developed for military applications,
where it is still used in specialized firewalls (guards and data diodes). The main
use of MAC mechanisms nowadays, however, is in platforms such as Android,
iOS and Windows, where they protect the operating systems themselves from
malware. MAC mechanisms have been a major subject of computer security
research since the mid-1970s, and the lessons learned in trying to use them
for military multilevel security underlie many of the schemes used for security evaluation. It is important for the practitioner to understand both their
strengths and limitations, so that you can draw on the research literature when
it’s appropriate, and avoid being dragged into overdesign when it’s not.
There are many problems which we need to be a ‘fox’ rather than a ‘hedgehog’ to solve. By trying to cast all security problems as hedgehog problems,
MLS often leads to inappropriate security goals, policies and mechanisms.
Research problems
A standing challenge, sketched out by Earl Boebert in 2001 after the NSA
launched SELinux, is to adapt mandatory access control mechanisms to
safety-critical systems (see the quote at the head of this chapter, and [271]).
As a tool for building high-assurance, special-purpose devices where the
consequences of errors and failures can be limited, mechanisms such as type
enforcement and role-based access control should be useful outside the world
of security. Will we see them widely used in the Internet of Things? We’ve
mentioned Biba-type mechanisms in applications such as cars and electricity
distribution; will the MAC mechanisms in products such as SELinux, Windows and Android enable designers to lock down information flows and
reduce the likelihood of unanticipated interactions?
The NSA continues to fund research on MLS, now under the label of IFC,
albeit at a lower level than in the past. Doing it properly in a modern smartphone is hard; for an example of such work, see the Weir system by Adwait
Nadkarni and colleagues [1374]. In addition to the greater intrinsic complexity of modern operating systems, phones have a plethora of side-channels and
their apps are often useful only in communication with cloud services, where
the real heavy lifting has to be done. The commercial offering for separate ‘low’
and ‘high’ phones consists of products such as Samsung’s Knox.
A separate set of research issues surround actual military opsec, where reality
falls far short of policy. All armed forces involved in recent conflicts, including
US and UK forces in Iraq and Afghanistan, have had security issues around
their personal mobile phones, with insurgents in some cases tracing their families back home and harassing them with threats. The Royal Navy tried to
ban phones in 2009, but too many sailors left. Tracking ships via Instagram
is easy; a warship consists of a few hundred young men and women, aged
18-24, with nothing much else to do but put snaps on social media. Discipline
tends to focus on immediate operational threats, such as when a sailor is seen
snapchatting on mine disposal: there the issue is the risk of using a radio near
a mine! Different navies have tried different things: the Norwegians have their
own special network for sailors and the USA is trying phones with MLS features. But NATO exercises have shown that for one navy to hack another’s
navigation is shockingly easy. And even the Israelis have had issues with their
soldiers using mobiles on the West Bank and the Golan Heights.
Further reading
The unclassified manuals for the UK government’s system of information classification, and the physical, logical and other protection mechanisms required
at the different levels, have been available publicly since 2013, with the latest
documents (at the time of writing) having been released in November 2018 on
the Government Security web page [803]. The report on the Walker spy ring is a
detailed account of a spectacular failure, and brings home the sheer complexity
of running a system in which maybe three million people have a clearance at
any one time, with a million applications being processed each year [878]. And
the classic on the abuse of the classification process to cover up waste, fraud
and mismanagement in the public sector is by Chapman [409].
On the technical side, textbooks such as Dieter Gollmann’s Computer
Security [780] give an introduction to MLS systems, while many of the
published papers on actual MLS systems can be found in the proceedings of
two conferences: the academics’ conference is the IEEE Symposium on Security &
Privacy (known in the trade as ‘Oakland’ as that’s where it used to be held),
while the NSA supplier community’s unclassified bash is the Computer Security Applications Conference (ACSAC) whose proceedings are (like Oakland’s)
published by the IEEE. Fred Cohen’s experiments on breaking MLS systems
using viruses are described in his book [452]. Many of the classic early papers
in the field can be found at the NIST archive [1397]; NIST ran a conference
series on multilevel security up till 1999. Finally, a history of the Orange Book
was written by Steve Lipner [1172]; this also tells the story of the USAF’s early
involvement and what was learned from systems like WWMCCS.
CHAPTER
10
Boundaries
They constantly try to escape
From the darkness outside and within
By dreaming of systems so perfect that no one will need to be good.
– TS ELIOT
You have zero privacy anyway. Get over it.
– SCOTT MCNEALY
10.1 Introduction
When we restrict information flows to protect privacy or confidentiality, a policy goal is usually not to prevent information flowing ‘down’ a hierarchy but
to prevent it flowing ‘across’ between smaller groups.
1. If you give the million US Federal employees and contractors with a
Top Secret clearance access to too much Top Secret data, then you get a
whistleblower like Ed Snowden if you’re lucky, or a traitor like Aldrich
Ames if you’re not.
2. As mobile phones spread round the world, they’ve made wildlife crime
easier. Game rangers and others who fight poaching face organised
crime, violence and insider threats at all levels, but unlike in national
intelligence there’s no central authority to manage clearances and counterintelligence.
3. If you let too many people in a health service see patient records, you
get scandals where staff look up data on celebrities. And the existence
of big central systems can lead to big scandals, such as where a billion
English medical records going back a decade were sold to multiple drug
companies.
4. Similar issues arise in social care and in education. There are frequent
calls for data sharing, yet attempts to do it in practice cause all sorts of
problems.
5. If you let everyone in a bank or an accountancy firm see all the customer
records, then an unscrupulous manager could give really good advice
to a client by looking at the confidential financial information of that
client’s competitors.
The basic problem is that if you centralise systems containing sensitive information, you create a more valuable asset and simultaneously give more people
access to it. Just as the benefits of networks can scale more than linearly, so can
the harms.
A common mitigation is to restrict how much information any individual
sees. In our five example cases above:
1. Intelligence services put sensitive information into compartments, so
that an analyst working on Argentina might see only the Top Secret
reports relating to Argentina and its neighbouring countries;
2. Systems that support game conservation have to do something
similar, but access control has to be a federated effort involving
multiple conservancies, researchers, rangers and other actors;
3. Many hospital systems limit staff access to the wards or departments
where they work, to the extent that this is reasonably practical, and
patients have a right to forbid the use of their data outside their direct
care. Both are becoming more difficult to implement as systems get more
complex and their operators lack the incentive to make the effort;
4. In 2010, the UK parliament closed down a system that was supposed
to give doctors, teachers and social workers shared access to all
children’s data, as they realised it was both unsafe and illegal. Yet
there’s constant pressure for information sharing, and all sorts of issues
with schools and other institutions using dubious cloud services;
5. Financial firms have ‘Chinese walls’ between different parts of the
business, and bank staff are now often limited to accessing records
for which they have a recent customer authorisation, such as by
the customer answering security questions over the phone.
We will discuss these kinds of access control in this chapter. There are several aspects: what sort of technical designs are feasible, the operational costs
they impose on the organisation, and – often the critical factor – whether the
organisation is motivated to implement and police them properly.
In the last chapter, we discussed multilevel security and saw that it can be
hard to get the mechanisms right. In this chapter, we’ll see that when we go for
fine-grained access controls, it’s also hard to get the policy right. Are the groups
or roles static or dynamic? Are they set by national policy, by commercial law,
by professional ethics, or – as with your group of Facebook friends – by the system’s users? What happens when people fight over the rules, or deceive each
other? Even where everyone is working for the same boss, different parts of an
organisation can have quite different incentives. Some problems can be technically complex but simple in policy terms (wildlife) while others use standard
mechanisms but have wicked policy problems (healthcare).
To start with a simpler case, suppose you’re trying to set security policy at the
tax collection office. Staff have been caught in the past making improper access
to the records of celebrities, selling data to outsiders, and leaking income details
in alimony cases [189]. How might you go about stopping that?
Your requirement might be to stop staff looking at tax records belonging to a
different geographical region, or a different industry – except under strict controls. Thus instead of the information flow control boundaries being horizontal
as we saw in the classic civil service model in Figure 10.1, we actually need the
boundaries to be mostly vertical, as shown in Figure 10.2.
Lateral information flow controls may be organizational, as when an intelligence agency keeps the names of agents working in one foreign country
secret from the department responsible for spying on another. They may be
relationship-based, as in a law firm where different clients’ affairs, and the
clients of different partners, must be kept separate. They may be a mixture of
the two, as in medicine where patient confidentiality is based in law on the
rights of the patient but may be enforced by limiting access to a particular
hospital department or medical practice. They may be volumetric, as when
a game conservancy doesn’t mind declassifying a handful of leopard photos
but doesn’t want the poachers to get the whole collection, as that would let
them work out the best places to set traps.
Doctors, bankers and spies have all learned that as well as preventing overt
information flows, they also have to prevent information leakage through
side-channels such as billing data. The mere fact that patient X paid doctor Y
suggests that X suffered from something in Y’s speciality.
[Figure 10.1: Multilevel security – the classification levels TOP SECRET, SECRET, CONFIDENTIAL and OPEN stacked as horizontal layers.]
[Figure 10.2: Multilateral security – compartments A, B, C, D and E standing as vertical silos above a pool of shared data.]
10.2 Compartmentation and the lattice model
The United States and its allies restrict access to secret information by codewords as well as classifications. These are pre-computer mechanisms for
expressing an access control group, such as the codeword Ultra in World War
2, which referred to British and American decrypts of messages that had been
enciphered using the German Enigma machine. The fact that the Enigma had
been broken was worth protecting at almost any cost. So Ultra clearances
were given to only a small group of people – in addition to the cryptologists,
translators and analysts, the list included the Allied leaders and their senior
generals. No-one who had ever held an Ultra clearance could be placed at risk
of capture; and the intelligence could never be used in such a way as to let
Hitler suspect that his principal cipher had been broken. So when Ultra told of
a target, such as an Italian convoy to North Africa, the Allies would send over
a plane to ‘spot’ it an hour or so before the attack. This policy was enforced
by special handling rules; for example, Churchill got his Ultra summaries in a
special dispatch box to which he had a key but his staff did not. (Ultra security
is described by David Kahn [1004] and Gordon Welchman [2011].)
Much the same precautions are in place today. Information whose compromise could expose intelligence sources or methods is marked TS/SCI for
‘Top Secret – Sensitive Compartmented Information’ and may have one or more
codewords. A classification plus a set of codewords gives a compartment or
security context. So if you have N codewords, you can have 2^N compartments;
some intelligence agencies have had over a million of them active. This
caution was a reaction to a series of disastrous insider threats. Aldrich Ames,
a CIA officer who had accumulated access to a large number of compartments by virtue of long service and seniority, and because he worked in
counterintelligence, was able to betray almost the entire US agent network
in Russia. The KGB’s overseas operations were similarly compromised by
Vassily Mitrokhin – an officer who’d become disillusioned with communism
and who was sent to work in the archives while waiting for his pension [119].
There was an even earlier precedent in the Walker spy case. There, an attempt
to keep naval vessels in compartments just didn’t work, as a ship could be
sent anywhere without notice, and for a ship to have no local key material was
operationally unacceptable. So the US Navy’s 800 ships all ended up with the
same set of cipher keys, which the Walker family sold to the Russians [878].
You clearly don’t want anybody to have access to too much, but how can you
do that?
Attempts were made to implement compartments using mandatory access
controls, leading to the lattice model. Classifications together with codewords
form a lattice – a mathematical structure in which any two objects A and B can
be in a dominance relation A > B or B > A. They don’t have to be: A and B
could simply be incomparable (but in this case, for the structure to be a lattice,
they will have a least upper bound and a greatest lower bound). As an illustration, suppose we have a codeword, say ‘Crypto’. Then someone cleared to
‘Top Secret’ would be entitled to read files classified ‘Top Secret’ and ‘Secret’,
but would have no access to files classified ‘Secret Crypto’ unless he also had
a crypto clearance. This can be expressed as shown in Figure 10.3.
[Figure 10.3: A lattice of security labels – (TOP SECRET, {CRYPTO, FOREIGN}), (TOP SECRET, {CRYPTO}), (TOP SECRET, {}), (SECRET, {CRYPTO, FOREIGN}), (SECRET, {CRYPTO}), (SECRET, {}), (UNCLASSIFIED, {}).]
As it happens, the Bell-LaPadula model can work more or less unchanged.
We still have information flows between High and Low as before, where High
is a compartment that dominates Low. If two nodes in a lattice are incomparable – as with ‘Top Secret’ and ‘Secret Crypto’ in Figure 10.3 – then there
should be no information flow between them at all. In fact, the lattice and
Bell-LaPadula models are essentially equivalent, and were developed in parallel. Most products built in the 20th century for the multilevel secure market
could be used in compartmented mode. For a fuller history, see the second
edition of this book.
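The dominance rule for such labels takes only a few lines to write down: one label dominates another if its classification is at least as high and its codeword set includes the other’s. The following sketch is illustrative, with the labels taken from Figure 10.3; incomparable labels dominate in neither direction, so no flow is permitted between them.

    # Label = (classification, frozenset of codewords) -- illustrative sketch.
    LEVELS = {'UNCLASSIFIED': 0, 'SECRET': 1, 'TOP SECRET': 2}

    def dominates(a, b):
        """True if information may flow from label b up to label a."""
        (lvl_a, words_a), (lvl_b, words_b) = a, b
        return LEVELS[lvl_a] >= LEVELS[lvl_b] and words_a >= words_b

    ts = ('TOP SECRET', frozenset())
    s_crypto = ('SECRET', frozenset({'CRYPTO'}))

    assert dominates(ts, ('SECRET', frozenset()))   # ordinary read down
    assert not dominates(ts, s_crypto)              # no CRYPTO clearance...
    assert not dominates(s_crypto, ts)              # ...nor flow the other way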
In practice, mandatory access control products turned out to be not that
effective for compartmentation. It is easy to use such a system to keep data in
different compartments separate – just give them incompatible labels (‘Secret
Tulip’, ‘Secret Daffodil’, ‘Secret Crocus’, … ). But the operating system has now
become an isolation mechanism, rather than a sharing mechanism; and the
real problems facing users of intelligence systems have to do with combining
data in different compartments, and downgrading it after sanitization. Lattice
security models offer little help here.
There was a sea change in the US intelligence community after 9/11. Leaders
claimed that the millions of compartments had got in the way of the war on terror, and that better information sharing might have enabled the community to
forestall the attack, so President Bush ordered more information sharing within
the intelligence community. There was a drive by NSA Director Keith Alexander to ‘collect it all’ – rather than minimising data collection, to maximise it
instead and make everything searchable. So nowadays, government systems
use mandatory access control to keep the Secret systems apart from the unclassified stuff, and the Top Secret systems from both, using data diodes and other
mechanisms that we discussed in the previous chapter. The stuff above Top
Secret now appears to be mostly managed using discretionary access controls.
The Snowden revelations have told us all about search systems such as
XKeyscore, which search over systems that used to have many compartments.
If a search can throw up results with many codewords attached, then reading
that result would require all those clearances. In such a world, local labels just
get in the way; but without them, as I asked in the second edition of this book,
how do you forestall a future Aldrich Ames? Perhaps the US intelligence
community was lucky that the failure mode was Ed Snowden instead. As a
system administrator he was in a position to circumvent the discretionary
access controls and access a large number of compartments.
We later learned that at the CIA, too, compartmentation was not always
effective. In 2017, its hacking tools were leaked in the Vault 7 incident, and a
redacted version of the internal report into that was published in 2020 after
the trial of the alleged leaker. It revealed that most sensitive cyberweapons
were not compartmented, users shared sysadmin passwords, there was no
user activity monitoring and historical data were available indefinitely. They
did not notice the loss until the tools ended up on Wikileaks a year later.
In fact, the Joint Worldwide Intelligence Communications System (JWICS), which
the intel community uses for Top Secret data, did not yet use two-factor
authentication [2054].
There are a few compartments Ed Snowden didn’t get to, such as the details
of which cryptographic systems the NSA can exploit and how – this was
marked ‘extremely compartmented information’ (ECI). Commercial firms may
also have special mechanisms for protecting material such as unpublished
financial results; at my university we compile exam papers on machines that
are not even attached to the network. In such cases, what’s happening may be
not so much a compartment as a whole new level above Top Secret.
10.3 Privacy for tigers
People involved in fighting wildlife crime face a fascinating range of problems.
The threats range from habitat encroachment through small-scale poaching for
bushmeat to organised crime gangs harvesting ivory, rhino horn and tiger body
parts on an industrial scale. The gangs may be protected by disaffected communities; even heads of government can be a threat, whether by undermining
environmental laws or even by protecting poaching gangs. And often the best
poacher is a former ranger.
Even where sovereign threats are absent, public-sector defenders often
work for mutually suspicious governments; protecting the snow leopard from
poachers involves rangers in India, Pakistan, China, Nepal and Tajikistan,
while the illegal ivory trade in East Africa spills over borders from Kenya
down to South Africa. And technology is making matters worse; as mobile
phone masts have gone up in less developed countries, so has poaching. Its
military, insider-threat and political aspects are thus similar in many ways to
traditional security and intelligence work. The critical difference is that the
defenders are a loose coalition of NGOs, park rangers and law-enforcement
agencies. There isn’t a central bureaucracy to manage classifications, clearances
and counterintelligence.
We had a project with Tanya Berger-Wolf, the leader of Wildbook, an ecological information management system that uses image recognition to match and
analyse data collected on animals via tourist photos, camera traps, drones and
other data sources [93]. Her idea was that if we could link up the many photographs taken of individual wild animals, we could dramatically improve the
science of ecology and population biology, together with the resource management, biodiversity, and conservation decisions that depend on them. Modern
image-recognition software makes this feasible, particularly for large animals
with distinctive markings, such as elephants, giraffes and zebras. Wildbook is
now deployed for over a dozen species at over a dozen locations.
In 2015, two Spanish citizens were arrested in Namibia’s Knersvlagte nature
reserve with 49 small succulent plants; a search of their hotel room revealed
2000 more, of which hundreds were threatened species. It turned out that they
sold these plants through a website, had made numerous collecting trips, and
found rare specimens via botanical listservs and social networks. They pleaded
guilty, paid a $160,000 fine and were banned from the country for life. It turned
out that they had also used another citizen-science website, iSpot [2013]. Incidents like this showed that wildlife aggregators need access control, and are
also leading to a rethink among botanists, zoologists and others about open
data [1169]. So what should the policy be?
What one needs to protect varies by species and location. With rare plants,
we don’t want thieves to learn the GPS location of even a single specimen. With
endangered Coahuilan box tortoises, we don’t want thieves stealing them from
the wild and selling them as pets with false documents claiming they were
bred in captivity. There, the goal is a public database of all known tortoises, and
conservators are busy photographing all the wild specimens in their range, a
360 km² region of Mexico. This will enable the US Fish and Wildlife Service
to check shipments. With the snow leopard, Wildbook had three years of
camera-trap data from one Nepal conservancy, and wanted a security policy to
help this scale to five locations in Nepal, India and Pakistan. This is a Red List
species with only a few hundred individuals in each of these three countries. In
Africa the picture is similar; Wildbook started out by tracking zebras, of which
the Grévy’s zebra is endangered. Animals cross borders between mutually
suspicious countries, and tourists post tagged photos despite leaflets and
warnings that they should not geotag [2077]. Some tourists simply don’t know
how to turn off tagging; some are so dumb they get out of their cars and
get eaten. The protection requirements also vary by country; in Namibia the
authorities are keen to stop tourists posting tagged photos of rhino, while in
Kenya the rhinos all have their own armed guards and the authorities are less
bothered.
The new wildlife aggregation sites can use image recognition to identify individual animals and link up sightings into location histories; other
machine-learning techniques then aggregate these histories into movement
models. We rapidly find sensitive outputs, such as which waterhole has lots of
leopards, or which island has lots of breeding whales. This is one of the ways
animal privacy differs from the human variety: highly abstracted data are
often more sensitive rather than less. In effect, our machine-learning models
acquire the ‘lore’ that an individual ranger might learn after a decade working
at a conservancy. As such individuals make the best poachers if they go over
to the dark side, we need to keep models that learn their skills out of the
poachers’ hands. And we need to be smart about sensitivity: it’s not enough
to protect only the data and movement models of snow leopards, if a poacher
can also track them by tracking the mountain goats that they eat.
Our primary protection goal is to not give wildlife criminals actionable intelligence, such as “an animal of species A is more likely to be at location X at time
T”. In particular, we don’t want the citizen-science data platforms we build to
make the situation worse. Our starting point is to use an operations-research
model as a guide to derive access rules for (a) recent geotagged photos, (b) predictive models and (c) photo collections. And we need to be able to tweak the
rules by species and location.
There are four levels of access. The core Wildbook team maintains the software and has operational access to almost everything; we might call this level
zero. At level one are the admins of whom there might be maybe 20 per species;
as access control is delegated there will be further admins per conservancy or
per reserve. At level two are hundreds of people who work for conservancies
collecting and contributing data, and who at present are sort-of known to Wildbook; as the system scales up, we need to cope with delegated administration.
At level three there are thousands of random citizens who contribute photos
and are rewarded with access to non-sensitive outputs. Our threat model is
that the set of citizen scientists at level 3 will always include poachers; the
set of conservancy staff at level 2 will include a minority who are careless or
disloyal; and we hope that the level 1 admins usually won’t be in cahoots with
poachers.
The focus of our insider threat mitigation is conservancy staff who may
be tempted to defect. Given that conservancies often operate in weak states,
the threat of eventual detection and imprisonment can seem remote. The
most powerful deterrent available is the social pressure from conservancy
peers: loyalty to colleagues, a sense of teamwork and a sense of mission.
The task is to find a technical means of supporting group cohesion and
loyalty. The civil-service approach of having a departmental security officer
who looks over everyone’s shoulder all the time is not feasible anyway in a
financially-stretched conservancy employing ten or twenty people on low
wages in less-developed country (LDC) conditions.
The problem is not just one of providing analytics so that we can raise an alarm if
a member of staff starts looking at lots of records of rhino, or lots of records
at a Serengeti waterhole. We already have admins per species and per location. The problem is motivating people to pay attention and take action. Our
core strategy is local public auditability for situational awareness and deterrence, based on two-dimensional transparency. All conservancy staff are in at
least one group, relating to the species of interest to them or the park where
they work. Staff in the rhino group therefore see who’s been looking at rhino
records – including individual sighting records and models – while staff working in the Serengeti see who’s interested in data and models there. In effect it’s
a matrix system for level 2 staff; you get to see Serengeti rhinos if you’re there
or if you’re a rhino expert, and in either case you share first-line responsibility
for vigilance. Level 1 staff can enrol level 2 staff and make peering arrangements with other conservancies, but their relevant actions are visible to level 2
colleagues. We will have to see how this works in the field.
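Here is a minimal sketch of that two-dimensional rule, with invented names and a deliberately simplified data model: a level-2 staff member can see (and so audit) accesses to a sighting record if they share either its species group or its park group, so every access is exposed to the two peer groups best placed to notice anything odd.

    # Illustrative sketch of two-dimensional (species x park) visibility.
    STAFF_GROUPS = {
        'amina':  {'species:rhino', 'park:serengeti'},
        'joseph': {'species:rhino'},
        'li':     {'park:serengeti'},
        'sam':    {'species:leopard', 'park:kaziranga'},
    }

    def record_groups(record):
        return {'species:' + record['species'], 'park:' + record['park']}

    def may_audit(staff, record):
        # Peer visibility rather than central policing: staff see who has
        # been looking at records for their species or at their park.
        return bool(STAFF_GROUPS[staff] & record_groups(record))

    sighting = {'species': 'rhino', 'park': 'serengeti'}
    watchers = [s for s in STAFF_GROUPS if may_audit(s, sighting)]
    assert watchers == ['amina', 'joseph', 'li']   # 'sam' has no need to see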
10.4 Health record privacy
Perhaps the most complex and instructive example of security policies where
access control supports privacy is found in clinical information systems. The
healthcare sector spends a much larger share of national income than the military in all developed countries, and although hospitals are still less automated,
they are catching up fast. The protection of medical information is thus an
important case study for us all, with many rich and complex tradeoffs.
Many countries have laws regulating healthcare safety and privacy, which
help shape the health IT sector. In the USA, the Health Insurance Portability
and Accountability Act (HIPAA) was passed by Congress in 1996 following
a number of privacy failures. In one notorious case, a convicted child rapist
working as an orthopedic technician at Newton-Wellesley Hospital in Newton,
Massachusetts, was caught using a former employee’s password to go through
the records of 954 patients (mostly young females) to get the phone numbers
of girls to whom he then made obscene phone calls [318]. He ended up doing
jail time, and the Massachusetts senator Edward Kennedy was one of HIPAA’s
sponsors.
The HIPAA regulations have changed over time. The first set, issued by
the Clinton administration in December 2000, were moderately robust, and
based on assessment of the harm done to people who were too afraid to
seek treatment in time because of privacy concerns. In the run-up to the
rulemaking, HHS estimated that privacy concerns led 586,000 Americans to
delay seeking cancer treatment, and over 2 million to delay seeking mental
health treatment. Meanwhile, over 1 million simply did not seek treatment
for sexually transmitted infections [875]. In 2002, President Bush rewrote
and relaxed them to the ‘Privacy Rule’; this requires covered entities such as
hospitals and insurers to maintain certain security standards and procedures
for protected health information (PHI), with both civil and criminal penalties for
violations (although very few penalties were imposed in the first few years).
The rule also gave patients the right to demand copies of their records. Covered entities can disclose information to support treatment or payment, but
other disclosures require patient consent; this led to complaints by researchers.
The privacy rule was followed by further ‘administrative simplification’ rules
in 2006 to promote healthcare systems interoperability. This got a further
boost when President Obama’s stimulus bill allocated billions of dollars to
health IT, and slightly increased the penalties for privacy violations; in 2013
his administration extended the rules to the business associates of covered
entities. But grumbling continues. Health privacy advocates note that the
regime empowered health data holders to freely and secretly aggregate and
broker protected health information, while hospitals complain that it adds to
their costs, and patient advocates have been complaining for over a decade
that it’s often used by hospital staff as an excuse to be unhelpful – such
as by preventing people tracing injured relatives [828]. Although HIPAA
regulation gives much less privacy than in Europe, it is still the main driver
for information security in healthcare, which accounts for over 10% of the
U.S. economy. Another driver is local market effects: in the USA, for example,
systems are driven to some extent by the need to generate billing records,
and the market is also concentrated with Epic having a 29% market share for
electronic medical record systems in 2019 while Cerner had 26% [1353].
In Europe, data-protection law sets real boundaries. In 1995, the UK government attempted to centralise all medical records, which led to a confrontation
with the doctors’ professional body, the British Medical Association (BMA).
The BMA hired me to devise a policy for safety and privacy of clinical information, which I’ll discuss later in this chapter. The evolution of medical privacy
over the 25 years since is a valuable case study; it’s remarkable how little the
issues have changed despite the huge changes in technology.
Debates about the safety and privacy tradeoffs involved with medical information started around this time in other European countries too. The Germans
put summary data such as current prescriptions and allergies on the medical
insurance card that residents carry; other countries held back, reasoning that
if emergency data are moved from a human-readable MedAlert bracelet to a
smartcard, this could endanger patients who fall ill on an airplane or a foreign
holiday. There was a series of scandals in which early centralised systems were
used to get information on celebrities. There were also sharp debates about
whether people could stop their records being used in research, whether out
of privacy concerns or for religious reasons – for example, a Catholic woman
might want to forbid her gynaecological records being sold to a drug company
doing research on abortion pills.
European law around consent and access to records was clarified in 2010 by
the European Court of Human Rights in the case I v Finland. The complainant
was a nurse at a Finnish hospital, and also HIV-positive. Word of her condition
spread among colleagues, and her contract was not renewed. The hospital’s
access controls were not sufficient to prevent colleagues accessing her record,
and its audit trail was not sufficient to determine who had compromised her
privacy. The court’s view was that health care staff who are not involved in
the care of a patient must be unable to access that patient’s electronic medical
record: “What is required in this connection is practical and effective protection
to exclude any possibility of unauthorised access occurring in the first place.”
This judgment became final in 2010, and since then health providers have been
supposed to design their systems so that patients can opt out effectively from
secondary uses of their data.
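The I v Finland requirement can be restated as a simple rule: a staff member may open a record only if they currently have a care relationship with that patient, and every attempt – successful or not – is logged well enough to answer ‘who read this?’ afterwards. The sketch below illustrates the rule in the abstract; the names, data structures and the record-fetching helper are all invented, and a real system would of course need emergency overrides and much more besides.

    from datetime import datetime

    # Illustrative care-relationship check with an audit trail.
    care_team = {'patient-1017': {'dr_virtanen', 'nurse_lehto'}}
    audit_log = []

    def fetch(patient_id):
        return {'id': patient_id}        # stand-in for the real record store

    def read_record(staff_id, patient_id):
        allowed = staff_id in care_team.get(patient_id, set())
        audit_log.append((datetime.now(), staff_id, patient_id, allowed))
        if not allowed:
            raise PermissionError('no care relationship with this patient')
        return fetch(patient_id)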
10.4.1 The threat model
The appropriate context to study health IT threats is not privacy alone, but
safety and privacy together. The main objective is safety, and privacy is often
subordinate. The two are intertwined in many ways, though.
There are various hazards with medical systems, most notably safety-usability failures, which are reckoned to kill about as many people as road traffic
accidents. I will discuss these issues in the chapter on Assurance and Sustainability. They interact directly with security; vulnerabilities are particularly
likely to result in the FDA mandating recalls of products such as infusion
pumps. The public are much more sensitive to safety issues if they have
a security angle; we have much less tolerance of hostile action than of
impersonal risk.
A second hazard is that loss of confidence in medical privacy causes people
to avoid treatment, or to seek it too late.
1. The most comprehensive data were collected by the US Department
of Health and Human Services prior to the HIPAA rulemaking under
President Clinton. HHS estimated that privacy concerns led 586,000
Americans to delay seeking cancer treatment, and over 2 million to
delay seeking mental health treatment. Meanwhile, over 1 million simply did not seek treatment for sexually transmitted infections [875];
2. The Rand corporation found that over 150,000 soldiers who served in
Iraq and Afghanistan failed to seek treatment for post-traumatic stress
disorder (PTSD), which is believed to contribute to the suicide rate
among veterans being about double that of comparable civilians – a
significant barrier being access to confidential treatment [1864];
3. The most authoritative literature review concluded that many patients,
particularly teenagers, gay men and prostitutes, withheld information
or simply failed to seek treatment because of confidentiality concerns.
Anonymised HIV testing more than doubled the testing rate among gay
men [1654].
So poor privacy is a safety issue, as well as a critical factor in providing
equal healthcare access to a range of citizens, from veterans to at-risk and
marginalised groups. The main privacy threat comes from insiders, with a
mix of negligence and malice, in roughly three categories:
1. There are targeted attacks on specific individuals, ranging from creepy
doctors looking up the records of a date on a hospital computer, to journalists stalking a politician or celebrity. These cause harm to individuals
directly;
2. There are bulk attacks, as where governments or hospitals sell millions
of records to a drug company, sometimes covertly and sometimes
with the claim that the records have been ‘anonymised’ and are thus no
longer personal health information;
3. Most of the reported breaches are accidents, for example where a doctor
leaves a laptop on a train, or when a misconfigured cloud server leaves
millions of people’s records online [768]. These are reported at five times
the rate of breaches at private firms, as healthcare providers have a
reporting duty. Sometimes accidental leaks lead to opportunistic attacks.
The resulting press coverage, which is mostly of bulk attacks and accidents,
causes many to fear for the privacy of their health data, although they may not
be directly at risk. The bulk attacks also offend many people’s sense of justice,
violate their autonomy and agency, and undermine trust in the system.
So how big is the direct risk? And how much of the risk is due to technology?
As things get centralised, we hit a fundamental scaling problem. The likelihood
that a resource will be abused depends on its value and on the number of people with access to it. Aggregating personal information into large databases
increases both these risk factors at the same time. Over the past 25 years, we’ve
moved from a world in which each doctor’s receptionist had access to maybe
5,000 patients’ records in a paper library or on the practice PC, to one in which
the records of thousands of medical practices are hosted on common platforms.
Some shared systems give access to data on many patients and have been
abused. This was already a concern 25 years ago as people started building
centralised systems to support emergency care, billing and research, and it has
become a reality since. Even local systems can expose data at scale: a large district hospital is likely to have records on over a million former patients. And
privacy issues aren't limited to organisations that treat patients directly: some
of the largest collections of personal health information are in the hands of
health insurers and research organisations.
To prevent abuses scaling, lateral information flow controls are needed.
Early hospital systems that gave all staff access to all records led to a number
of privacy incidents, of which the most notable was the one that led to the I v
Finland judgment of the European court; but there were similar incidents in
the UK going back to the mid-1990s. All sorts of ad hoc privacy mechanisms
had been tried, but by the mid-1990s we felt the need for a proper access
control policy, thought through from first principles and driven by a realistic
model of the threats.
10.4.2 The BMA security policy
By 1995, most medical practices had computer systems to keep records; the
suppliers were small firms that had often been started by doctors whose hobby
was computing rather than golf or yachting, and they were attuned to doctors’ practical needs. Hospitals had central administrative systems to take care
of billing, and some were moving records from paper to computers. There
was pressure from the government, which pays for about 90% of medical care
in Britain through the National Health Service; officials believed that if they
had access to all the information, they could manage things better, and this
caused tension with doctors who cared about professional autonomy. One of
the last things done by Margaret Thatcher’s government, in 1991, had been to
create an ‘internal market’ in the health service where regional commissioners act like insurers and hospitals bill them for treatments; implementing this
was a work in progress, both messy and contentious. So the Department of
Health announced that it wanted to centralise all medical records. The Internet
boom had just started, and medics were starting to send information around
by private email; enthusiasts were starting to build systems to get test results
electronically from hospitals to medical practices. The BMA asked whether
personal health information should be encrypted on networks, but the government refused to even consider this (the crypto wars were getting underway;
see 26.2.7.3 for that story). This was the last straw; the BMA realised they’d better get an expert and asked me what their security policy should be. I worked
with their staff and members to develop one.
We rapidly hit a problem. The government strategy assumed a single
electronic patient record (EPR) that would follow the patient around from
conception to autopsy, rather than the traditional system of having different
records on the same patient at different hospitals and doctors’ offices, with
information flowing between them in the form of referral and discharge
letters. An attempt to devise a security policy for the EPR that would observe
existing ethical norms became unmanageably complex [822], with over 60
rules. Different people have access to your record at different stages of your
life; your birth record is also part of your mother’s record, your record while
you’re in the army or in jail might belong to the government, and when you
get treatment for a sexually transmitted disease you may have the right to
keep that completely private.
The Department of Health next proposed a multilevel security policy:
sexually transmitted diseases would be at a level corresponding to Secret,
normal patient records at Confidential and administrative data such as drug
prescriptions and invoices at Restricted. But this was obviously a non-starter.
For example, how should a prescription for anti-retroviral drugs be classified?
As it’s a prescription, it should be Restricted; but as it identifies a person as
HIV positive, it should be Secret. It was wrong in all sorts of other ways too;
some people with HIV are open about their condition while others with minor
conditions are very sensitive about them. Sensitivity is a matter for the patient
to decide, not the Prime Minister. Patient consent is central: records can only
be shared with third parties if the patient agrees, or in a limited range of legal
exceptions, such as contact tracing for infectious diseases like TB.
Medical colleagues and I realised that we needed a security context with finer
granularity than a lifetime record, so we decided to let existing law and practice set the granularity, then build the policy on that. We defined a record as
the maximum set of facts to which the same people have access: patient + doctor, patient + doctor plus surgery staff, patient + patient’s mother + doctor +
staff, and so on. So a patient will usually have more than one record, and this
offended the EPR advocates.
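To make the granularity rule concrete, here is a toy sketch in Python (my own construction, not drawn from the BMA work): if each fact about a patient is tagged with the set of people entitled to see it, a 'record' is simply an equivalence class of facts sharing the same access set, so a patient with three distinct access sets has three records.

from collections import defaultdict

# Each fact is tagged with the set of people who may access it.
# The names below are illustrative, not taken from any real system.
facts = [
    ("birth details",         frozenset({"patient", "mother", "gp"})),
    ("routine consultations", frozenset({"patient", "gp", "surgery staff"})),
    ("GUM clinic visit",      frozenset({"patient", "gum clinic doctor"})),
]

records = defaultdict(list)
for fact, accessors in facts:
    records[accessors].append(fact)    # same access set => same record

for accessors, contents in records.items():
    print(sorted(accessors), "->", contents)
# Three distinct access sets, so this patient has three records rather than
# one lifetime electronic patient record.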
A really hard problem was the secondary use of records. In the old days, this
meant a researcher or clinical auditor sitting in the library of a hospital or medical practice, patiently collecting statistics; consent consisted of a notice in the
waiting room saying something like ‘We use our records in medical research
to improve care for all; if you don’t want your records used in this way, please
speak to your doctor.’ By 1995, we’d already seen one company offering subsidised computers to General Practitioners (GPs)1 in return for allowing remote
queries by drug companies to return supposedly anonymous data.
1 Britain's GPs are the equivalent of family doctors in the USA; they have historically acted as gatekeepers to the system and as custodians of each patient's lifetime medical record. They also act as the patient's advocate and join up care between medical practice, hospital and community. This helps keep healthcare costs down in the UK, compared with the USA.
The goals of the BMA security policy were therefore to enforce the principle of consent, and to prevent too many people getting access to too many
records. It did not try to do anything new, but merely to codify existing best
practice, and to boil it down into a page of text that everyone – doctor, engineer
or administrator – could understand.
Starting from these principles and insights, we proposed a policy of nine
principles.
1. Access control: each identifiable clinical record shall be marked with an
access control list naming the people who may read it and append data
to it.
2. Record opening: a clinician may open a record with herself
and the patient on the access control list. Where a patient has
been referred, she may open a record with herself, the patient
and the referring clinician(s) on the access control list.
3. Control: One of the clinicians on the access control list must be
marked as being responsible. Only she may alter the access control
list, and she may only add other health care professionals to it.
4. Consent and notification: the responsible clinician must notify the
patient of the names on his record’s access control list when it is opened,
of all subsequent additions, and whenever responsibility is transferred.
His consent must also be obtained, except in emergency or in the case of
statutory exemptions.
5. Persistence: no-one shall have the ability to delete clinical information until the appropriate time period has expired.
6. Attribution: all accesses to clinical records shall be marked on
the record with the subject’s name, as well as the date and
time. An audit trail must also be kept of all deletions.
7. Information flow: Information derived from record A may be appended
to record B if and only if B’s access control list is contained in A’s.
8. Aggregation control: there shall be effective measures to prevent
the aggregation of personal health information. In particular,
patients must receive special notification if any person whom it
is proposed to add to their access control list already has access
to personal health information on a large number of people.
9. Trusted computing base: computer systems that handle personal health
information shall have a subsystem that enforces the above principles in
an effective way. Its effectiveness shall be subject to evaluation by independent experts.
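To see how such a policy might be mechanised, here is a minimal sketch in Python. It is not the BMA's formalism; the class names, the notification hook and the threshold for 'a large number of people' are my own illustrative assumptions. It captures the per-record access control list of principles 1 and 2, attribution from principle 6, the information-flow rule of principle 7 and the aggregation warning of principle 8.

from datetime import datetime

AGGREGATION_THRESHOLD = 1000   # assumed stand-in for 'a large number of people'

class Record:
    def __init__(self, patient, responsible_clinician):
        self.patient = patient
        self.responsible = responsible_clinician             # principle 3
        self.acl = {patient, responsible_clinician}           # principles 1 and 2
        self.entries = []                                      # append-only store

    def append(self, author, data):
        if author not in self.acl:
            raise PermissionError("not on this record's access control list")
        self.entries.append((author, datetime.now(), data))    # principle 6: attribution

def may_copy(source, dest):
    # Principle 7: information derived from record A may be appended to
    # record B if and only if B's access control list is contained in A's.
    return dest.acl <= source.acl

def patients_accessible_to(person, all_records):
    return {r.patient for r in all_records if person in r.acl}

def add_to_acl(record, newcomer, all_records, notify):
    # Principle 8: warn the patient if the newcomer already has access to
    # personal health information on a large number of people.
    # (Consent and notification under principle 4 are handled via notify().)
    if len(patients_accessible_to(newcomer, all_records)) > AGGREGATION_THRESHOLD:
        notify(record.patient, newcomer)
    record.acl.add(newcomer)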
From the technical viewpoint, this policy is strictly more expressive than
the Bell-LaPadula model of the last chapter, as it contains an information
flow control mechanism in principle 7, but also contains state. In fact, it takes
compartmentation to the logical limit, as there are more compartments than
patients. A discussion for a technical audience can be found at [60]. The full
policy dealt with a lot more issues, such as access to records by vulnerable
patients who might be coerced [59].
Similar policies were developed by other medical bodies, including the
Swedish and German medical associations, the Health Informatics Association of Canada, and an EU project (these are surveyed in [1079]). The BMA
model was adopted by the Union of European Medical Organisations (UEMO)
in 1996, and feedback from public consultation on the policy can be found
in [61].
10.4.3 First practical steps
Feedback from the field came from a pilot implementation in a medical
practice [871], which was positive, and from a hospital system developed in
Hastings, which controlled access using a mixture of roles and capabilities,
rather than the ACLs in which the BMA model was expressed. It turned
out that the practical way to do access control at hospital scale was by rules
such as ‘a ward nurse can see the records of all patients who have within the
previous 90 days been on her ward’, ‘a junior doctor can see the records of all
patients who have been treated in her department’, and ‘a senior doctor can
see the records of all patients, but if she accesses the record of a patient who
has never been treated in her department, then the senior doctor responsible
for that patient's care will be notified'2.
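These Hastings-style rules translate directly into code. The sketch below is my own reconstruction under an assumed data model (staff roles, ward and department assignments, and a log of admissions); it is not the actual hospital system, but it shows how role plus recent-treatment history, rather than a per-record ACL, decides access, with a senior doctor's out-of-department access triggering a notification instead of a refusal.

from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Staff:
    name: str
    role: str          # 'ward nurse', 'junior doctor' or 'senior doctor'
    ward: str = ""
    department: str = ""

@dataclass
class Admission:
    patient: str
    ward: str
    department: str
    when: date

def may_access(staff, patient, admissions, today, notify=lambda *args: None):
    history = [a for a in admissions if a.patient == patient]
    if staff.role == "ward nurse":
        # patients who have been on her ward within the previous 90 days
        return any(a.ward == staff.ward and (today - a.when) <= timedelta(days=90)
                   for a in history)
    if staff.role == "junior doctor":
        # patients who have been treated in her department
        return any(a.department == staff.department for a in history)
    if staff.role == "senior doctor":
        # all patients, but out-of-department access notifies the senior
        # doctor responsible for that patient's care (lookup omitted here)
        if not any(a.department == staff.department for a in history):
            notify(patient, staff.name)
        return True
    return False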
The technical lessons learned are discussed in [535, 536, 871]. With hindsight,
the BMA model was a lossless compression of what doctors said they did, while
the role-based model was a slightly lossy version that captured what hospitals
do in practice, and it worked well in that context. One of the BMA rules,
though, created difficulty in both contexts: the desire for a small trusted computing base. GPs ended up having to trust all the application code that they
got from their suppliers, and while they could influence its evolution, there
was no useful trusted subset. The hospital records system was much worse: it
had to rely on the patient administrative system (PAS) to tell it which patients,
and which nurses, are on which ward. The PAS was flaky and often down, so it
wasn’t acceptable to make a safety-critical system depend on it. The next iteration was to give each hospital staff member a smartcard containing credentials
for their departments or wards.
2 The Hastings system was initially designed independently of the BMA project. When we learned
of each other we were surprised at how much our approaches coincided, and reassured that we
had captured the profession’s expectations in a reasonably consistent way.
The policy response from the Department of Health was to set up a committee of inquiry under Dame Fiona Caldicott. She acknowledged that some 60
established flows of information within the NHS were unlawful, and recommended the appointment of a responsible privacy officer in each healthcare
organisation [369]. This was at least a start, but it created a moral hazard:
while the privacy officer, typically a senior nurse, was blamed when things
went wrong, the actual policy was set by ministers – leading to the classic
security-economics gotcha we discussed in Chapter 8, of Bob guarding the
system while Alice pays the cost of failure. Anyway, the government changed,
and the new administration of Tony Blair went for a legal rather than a technical fix – with a data-protection law that allowed data controllers to pretend
that data were anonymous so long as they themselves could not re-identify
them, even if others could re-identify them by matching them with other
data3. We will discuss the limits of anonymisation in the following chapter.
3 The UK law was supposed to transpose the EU Data Protection Directive (95/46/EC) into UK law to provide a level playing field on privacy; this loophole was one of several that allowed UK firms a lot of wriggle room, annoying the French and Germans [597]. The EU eventually pushed through the stricter General Data Protection Regulation (2016/679).
10.4.4 What actually goes wrong
In his second term as Prime Minister, Tony Blair announced a £6bn plan to
modernise health service computing in England. The National Programme for
IT (NPfIT), as it came to be known, turned out to be the world’s most expensive
civilian IT disaster. After David Cameron came to power in 2010, an inquiry
from the National Audit Office noted that, of a total expenditure of about £10bn,
some £2bn spent on broadband networking and digital X-ray imaging resulted
in largely working systems, while the rest didn’t give value for money, and the
core aim that every patient should have an electronic care record would not be
achieved [1392]. Cameron formally killed the project, but its effects continued
for years because of entrenched supplier contracts, and health IT was held up
for a decade [1562].
NPfIT had called for all hospital systems to be replaced during 2004–2010
with standard ones, to give each NHS patient a single electronic care record.
The security policy had three main mechanisms.
1. There are role-based access controls like those pioneered at Hastings.
2. In order to access patient data, a staff member also needs a legitimate
relationship. This abstracts the Hastings idea of ‘her department’.
3. There was a plan that patients would be able to seal certain parts
of their records, making them visible only to a particular care team.
However, the providers never got round to implementing this. It wasn't
consistent with the doctrine of a single electronic health record, which
had been repeated so often by ministers that it had become an article
of religious faith. As late as 2007, Parliament's Health Committee
noted that suppliers hadn't even got a specification yet [927].
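In code, 'RBAC plus legitimate relationship' amounts to two checks before any read, with sealing as an optional third. The sketch below is my own simplification; the role table, the relationship set and the sealing structure are illustrative assumptions, not the NPfIT specification.

# Assumed, simplified role table: which data categories each role may touch.
ROLE_PERMISSIONS = {
    "receptionist": {"appointments"},
    "nurse":        {"appointments", "clinical notes"},
    "doctor":       {"appointments", "clinical notes", "prescriptions"},
}

def may_view(staff, role, patient, category, relationships, sealed=frozenset()):
    if category not in ROLE_PERMISSIONS.get(role, set()):
        return False                    # mechanism 1: role-based access control
    if (staff, patient) not in relationships:
        return False                    # mechanism 2: legitimate relationship
    if (patient, category) in sealed:
        return False                    # mechanism 3: sealing, never implemented
    return True

Note that this sketch assumes a finer granularity than the single record with a single security context that NPfIT actually mandated: without sealing or per-category contexts, an administrative role and a wing-wide 'legitimate relationship' open up the whole record.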
As a result, patients receiving outpatient psychiatric care at a hospital found
that the receptionist could see their case notes. Formerly, the notes were kept
in paper in the psychiatrist’s filing cabinet; all the receptionist got to know was
that Mrs Smith was seen once a month by Dr Jones. But now the receptionist role had to be given access to patient records so that they could see and
amend administrative data such as appointment times; and everyone working
reception in the hospital wing where Dr Jones had his office had a legitimate
relationship. So they all got access to everything. This illustrates why the doctrine of a single record with a single security context per patient was a bad idea.
Thanks to project mismanagement, less than ten percent of England’s hospitals
actually installed these systems, though the doctrine of ‘RBAC + relationship’
has affected others since. It now looks like the failure to support multiple security contexts per patient is about to become an issue in the USA as firms start
pushing health apps supported by the FHIR standard, to which I’ll return in
section 10.4.5.
10.4.4.1 Emergency care
The next thing to go wrong was emergency medical records. One of the stories
used by politicians to sell NPfIT had been ‘Suppose you fall ill in Aberdeen
and the hospital wants access to your records in London … ’. This was, and
remains, bogus. Paramedics and emergency-room physicians are trained to
treat what they see, and assume nothing; the idea that they'd rely on a computer to tell them the blood group of an unconscious patient is simply daft. But policy
was policy, and in Scotland the government created an ‘emergency care record’
of prescriptions and allergies that is kept on a central database for use by emergency room clinicians, paramedics and the operators of out-of-hours medical
helpline services. Sensitive information about 2.5 million people was made
available to tens of thousands of people, and the inevitable happened; one
doctor at Queen Margaret Hospital in Dunfermline was arrested and charged
with browsing the health records of then Prime Minister Gordon Brown, First
Minister Alex Salmond and various sports and TV personalities. The case was
eventually dropped because it was 'not in the public interest' to prosecute [1745]. Patients
had been offered the right to opt out of this system, but it was a very odd
opt-out: if you did nothing, your data were collected from your GP and made
available to the Department of Health in Edinburgh and also to the ambulance
service. If you opted out, your data were still collected from your GP and made
available to the Department of Health; they just weren’t shared with the ambulance crew.
This was also policy in England where it was called ‘consent-to-view’: the
state would collect everything and show users only what they were allowed to
see. Everybody’s records would be online, and doctors would only be allowed
to look at them if they claimed the patient had consented. Officials assured
Parliament that this was the only practical way to build NPfIT; they described
this as ‘an electronic version of the status quo’ [927]. The English emergency
system, the Summary Care Record (SCR), also holds sensitive data on most citizens and is widely accessible, but it is little used; if you end up in an ambulance,
they’ll take a medical history from you en route to hospital, just as they always
have4. Something similar also happened in the Netherlands, where a database
of citizens' medical insurance details ended up being accessible not just by doctors and pharmacists but also by alternative healers and even taxi firms, with entirely predictable results [187].
4 In the coronavirus crisis, the SCR was 'enriched' by adding a lot of data from the GP record, making it available to planners, and making it opt-out by default. It's still not clear that any worthwhile use has been made of it.
10.4.4.2 Resilience
The move to centralised systems typically makes failures rarer but larger, and
health systems are no exception. The NPfIT’s only real achievement was to
standardise all X-ray imaging in England using digital machines and cloud
storage. An early warning of fragility came on 11th December 2005, when a leak
of 250,000 litres of petrol at the Buncefield oil storage depot formed a vapour
cloud and detonated – the largest peacetime explosion in Europe. Oil companies were later fined millions of pounds for safety breaches. Our local hospital
lost X-ray service as both the primary and backup network connections to
the cloud service passed nearby. A further warning came when the WannaCry
worm infected machines at another nearby hospital in 2017; managers foolishly
closed down the network, in the hope of preventing further infection, and then
found that they had to close the emergency room and send patients elsewhere.
With no network they could do no X-rays (and get no pathology test results
either, even from the hospital’s own lab). There have been further incidents of
hospitals closed by ransomware since, particularly in the USA.
10.4.4.3 Secondary uses
Databases relating to payment usually don’t allow a real opt-out, and the
UK example is the Hospital Episode Statistics (HES) database, which collects
bills sent by hospitals to the commissioning bodies that pay them, and has
extensive information on every state-funded hospital visit and test in England
and Wales since 1998 – about a billion records in total5. These records have
proved impossible to protect, not just because anonymisation of complete
records is impractical but because of the intense political pressure for access
by researchers. More and more people had got access under the 1997–2010
Labour government, and after David Cameron became Prime Minister in
2010, the floodgates opened. Cameron hired a 'transparency tsar' who'd
previously run a health IT business, and announced 'Open Data measures' in
2011 with the goal that every NHS patient would be a research patient,
in order to make Britain a world leader in pharmaceutical research. Officials
claimed that ‘All necessary safeguards would be in place to ensure protection
of patients’ details – the data will be anonymised and the process will be
carefully and robustly regulated’ [1811]. Anonymisation meant that your
personal details were redacted down to your postcode and date of birth; this
is quite inadequate, as we’ll discuss in the next chapter.
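A quick back-of-the-envelope calculation shows why. The figures below are my own rough assumptions (about 36 residents behind a full UK postcode, birth dates spread over some 90 years), not numbers from the book, but the conclusion is robust: postcode plus date of birth is almost always a unique identifier.

from math import comb, exp

residents_per_postcode = 36        # assumed: roughly 15 households of ~2.4 people
possible_birth_dates = 90 * 365    # assumed: ages spread over about 90 years

pairs = comb(residents_per_postcode, 2)
p_any_shared_dob = 1 - exp(-pairs / possible_birth_dates)   # birthday-problem estimate

print(f"chance that any two residents share a date of birth: {p_any_shared_dob:.1%}")
# About 2%, so for the overwhelming majority of people a postcode and date of
# birth pin them down exactly, before sex, treatment dates or diagnoses are added.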
In 2013 the government announced that records would also be harvested
from GP systems; GPs were given eight weeks to inform their patients of the
impending upload. This caused enough disquiet that privacy campaigners,
GPs and others got together to set up a medical privacy campaign group,
medConfidential.