DETECTION PROBLEMS IN
WATERMARKING,
STEGANOGRAPHY, AND
CAMERA IDENTIFICATION
BY
JUN (EUGENE) YU
BS, Lanzhou University, China, 2001
MS, Lanzhou University, China, 2004
DISSERTATION
Submitted in partial fulfillment of the requirements for
the degree of Doctor of Philosophy in Electrical Engineering
in the Graduate School of
Binghamton University
State University of New York
2011
© Copyright by Jun Yu 2011
All Rights Reserved
Accepted in partial fulfillment of the requirements for
the degree of Doctor of Philosophy in Electrical Engineering
in the Graduate School of
Binghamton University
State University of New York
2011
July 29, 2011
Scott A. Craver, Chair and Faculty Advisor
Department of Electrical Engineering, Binghamton University
Jessica Fridrich, Member
Department of Electrical Engineering, Binghamton University
Mark Fowler, Member
Department of Electrical Engineering, Binghamton University
Kenneth Chiu, Outside Examiner
Department of Computer Science, Binghamton University
Abstract
This dissertation covers research results in three fields: reverse engineering digital watermarking systems, security in steganography, and camera-related digital image processing.

The research in the first field follows a new attacking philosophy: learn the target watermarking system before attacking it. A three-stage model is introduced to describe generic watermark detection systems and to aid reverse engineering in a systematic manner. Two reversing techniques, super-robustness and noise snakes, are developed and applied to real-life attacks. In order to handle watermarks residing in feature spaces of very large dimensionality, a more precise geometric model for normalized-correlation-based watermark detectors is presented and analyzed.

The research in the second field follows the philosophy of creating a stego-friendly cover that eases the embedding restriction. As a theoretical trial, the subset selection technique is examined under the i.i.d. and the first-order Markov cases. As a practical implementation, this philosophy is realized in the Apple iChat video-conferencing software via cultural engineering. The goal of this cultural-engineering approach is to render covert data culturally undetectable rather than statistically undetectable.

The research in the last category includes two topics: DSLR lens identification and benchmarking camera image stabilization (IS) systems. In both applications, a white noise pattern image is used as the shooting target in place of the commonly used checkerboard pattern, since it excels at position registration and is histogram-invariant, the properties needed in each scenario respectively. For the task of distinguishing different copies of the same lens using chromatic aberration, focal distance is determined to be another important factor shaping the lens chromatic aberration pattern, which completes the lens identification feature space. For the task of evaluating camera IS systems, a camera-motion blur metric generated directly from test-picture histograms is developed to fulfill the benchmarking task without resorting to either human visual assessment or more complex motion blur metrics.
Acknowledgements
I would like to thank my advisor Scott Craver for guiding me, step by step, in how to perform scientific research as well as reverse engineering¹, and in how to behave as a rigorous researcher. I am also very grateful for his generous and continuous financial support over these years.

I would also like to thank Professor Jessica Fridrich for her very systematic way of teaching and her inspiring discussions, as well as her enlightening comments on this dissertation.

I must thank all the professors who showed me human wisdom in the fields of Electrical Engineering and Mathematics in their courses during my studies at Binghamton University.

Special thanks to my friends Enping Li, Idris Atakli, Yibo Sun, Hongbing Hu, Xiao Xiao, Kun Huang, Mo Chen, Ming Liu, Pranav, and Yan Zhang for our cherished memories here in Binghamton.

Finally, I am very grateful to my parents Kaibei Yu and Xiaoqiao Lian for their wholehearted support in all these years.

¹ For the difference between the two subjects, please refer to Section 2.1.1.
Contents

List of Tables
List of Figures

1 Introduction
  1.1 Organization
  1.2 Specific Contributions

2 Breaking the BOWS Watermark by Super-robustness
  2.1 Introduction
    2.1.1 Reverse Engineering
    2.1.2 Reversing Digital Watermark
  2.2 A 3-Stage Watermarking System Model
    2.2.1 Common Transforms and Detection Algorithms
  2.3 Breaking BOWS by Identifying Super-robustness
    2.3.1 Introduction
      2.3.1.1 Goals of the Contest
      2.3.1.2 A Comparison to the SDMI Challenge
      2.3.1.3 Results
    2.3.2 Reverse Engineering
      2.3.2.1 Determining the Feature Space
      2.3.2.2 Determining the Sub-band
    2.3.3 Breaking the Watermark
      2.3.3.1 A Curious Anomaly
      2.3.3.2 The Rest of the Algorithm
    2.3.4 Super-robustness and Oracle Attacks
      2.3.4.1 Oracle Attacks Exploiting Super-robustness
      2.3.4.2 Mitigation Strategies
      2.3.4.3 Other Advice to the Watermark Designer
    2.3.5 Conclusions

3 Reversing Normalized Correlation based Watermark Detector by Noise Snakes
  3.1 Introduction
    3.1.1 Motivation
  3.2 Super-robustness and False Positives
    3.2.1 Examples of Super-robustness
    3.2.2 Generic Super-robustness
  3.3 Measuring a Detection Region
    3.3.1 Noise Snakes
    3.3.2 The Growth Rate of a Noise Snake
    3.3.3 Information Leakage from Snakes
    3.3.4 Extracting the Size of the Feature Space
  3.4 Results
  3.5 Discussion and Conclusions
    3.5.1 Is it Useful?
    3.5.2 Attack Prevention

4 A More Precise Model for Watermark Detector Using Normalized Correlation
  4.1 Introduction
  4.2 Problem Setting
  4.3 Quartic Equation
  4.4 Numerical Solutions
    4.4.1 Identify Valid Roots
    4.4.2 Detection Rate
    4.4.3 The Pattern of α(h)
      4.4.3.1 Five AC DCT Embedding
      4.4.3.2 Twelve AC DCT Embedding
  4.5 Conclusions

5 Circumventing the Square Root Law in Steganography by Subset Selection
  5.1 The Square-root Law in Steganography
  5.2 Circumventing the Law by Subset Selection
    5.2.1 Introduction
    5.2.2 The i.i.d. Case
    5.2.3 The First Order Markov Case
      5.2.3.1 The Algorithm
  5.3 Conclusion

6 Covert Communication via Cultural Engineering
  6.1 Introduction
  6.2 Supraliminal Channels
    6.2.1 A two-image protocol for supraliminal encoding
    6.2.2 Public key encoding
  6.3 Manufacturing a Stego-friendly Channel
    6.3.1 Apple iChat
    6.3.2 Cultural engineering
  6.4 Software Overview
    6.4.1 Quartz Composer animations
    6.4.2 Receiver architecture and implementation issues
  6.5 Results
  6.6 Discussions and Conclusions

7 DSLR Lens Identification via Chromatic Aberration
  7.1 Introduction on Camera-related Digital Forensics
    7.1.1 Forensic Issues in The Information Age
    7.1.2 Digital Camera: 'Analog' Part and 'Digital' Part
    7.1.3 Identification Feature Category
  7.2 DSLR Lens Identification
    7.2.1 Motivation
      7.2.1.1 Lens Aberrations
      7.2.1.2 Previous Works
    7.2.2 A New Shooting Target
      7.2.2.1 Pattern Misalignment Problems
      7.2.2.2 White Noise Pattern
      7.2.2.3 Paper Printout vs LCD Display
    7.2.3 Major Feature-Space Dimensions of Lens CA Pattern
      7.2.3.1 Focal Length
      7.2.3.2 Focal Distance
      7.2.3.3 Aperture Size and Other Issues
      7.2.3.4 Size of a Complete Lens CA Identification Feature Space
    7.2.4 Experiment
      7.2.4.1 Test Picture Shooting Settings
      7.2.4.2 CA Pattern Extraction
      7.2.4.3 CA Pattern Comparison Results
    7.2.5 Outlook
      7.2.5.1 Improve the Identification Capability
      7.2.5.2 The Potential of the Bokeh Modulation
    7.2.6 Conclusions

8 Benchmarking Camera IS by the Picture Histograms of a White Noise Pattern
  8.1 Motivation
    8.1.1 The Volume of Test Picture Space
    8.1.2 The Consistency of an IS Metric
    8.1.3 Our Goal
  8.2 Our Automatic IS Benchmarks
    8.2.1 A Histogram based IS Metric
    8.2.2 A Histogram-Invariant Shooting Target
    8.2.3 Our Algorithm
      8.2.3.1 The Rationale behind the Histogram Subtraction Operation
  8.3 Experiment
    8.3.1 Synthetic Motion Blur
    8.3.2 Real Camera Motion Blur
  8.4 Outlook
  8.5 Conclusion

9 Conclusions
  9.1 Reversing Digital Watermarking
    9.1.1 Security and Robustness
    9.1.2 Reversing before Attacking
      9.1.2.1 Caution on Designing Patterns
  9.2 Circumventing Detection in Steganography
  9.3 Camera related Image Processing
    9.3.1 White Noise Pattern Image
    9.3.2 Complete Lens Identification Feature Space
      9.3.2.1 Outlook for the Bokeh Modulation
    9.3.3 Motion Blur Measure from Histogram

References
List of Tables

2.1 The distribution of transform types.
2.2 The distribution of detection algorithms.
2.3 Successive attacks for image 1. By amplifying a few AC coefficients by the right value, the detector will fail.
2.4 Successive attacks for image 2. We start with 9 AC coefficients. 'R' denotes that the coefficient is no longer needed when the gain increases.
7.1 Six Canon DSLR lenses.
7.2 The sequential order of test pictures.
List of Figures

2.1 Three-stage decomposition of a watermark detection system.
2.2 The three images from the BOWS contest.
2.3 Some visible artifacts in the image, suggesting either severe compression or a block-based watermark.
2.4 Some severe transformations of an image. Left: a shuffling of blocks aimed at preserving block-based FFT magnitudes. Right: a severe addition of 8-by-8 DC offsets. This preserved the watermark with only 3.94 dB PSNR.
2.5 Left: AC removal pattern 1; center: AC removal pattern 2; right: intersection of patterns 1 and 2.
2.6 The three images after the attack. Note the obvious block artifacts.
3.1 An image from the BOWS contest, and a severe distortion that preserves the watermark. From this experimental image we concluded that the watermark was embedded in non-DC 8-by-8 DCT coefficients.
3.2 A normalized correlation detection region with threshold τ.
3.3 The construction of a noise snake.
3.4 A pair of constructed noise snakes, used to plumb the detection cone.
3.5 Optimal growth factors for a snake of unit length. On the left, a cone angle of π/3 and n = 1000. On the right, π/3 and n = 9000.
3.6 Lemma: two independently generated snakes have approximately perpendicular off-axis components.
3.7 The relation between the cone angle and the dot product of two independently generated noise snakes.
3.8 Estimated threshold using two noise snakes. Ground truth is τ = 0.5, n = 500.
3.9 Estimated dimension using two noise snakes. Ground truth is n = 500.
4.1 A detection cone intersects with a Gaussian noise ball.
4.2 Three possible geometrical relationships. The arc in solid line is within the detection cone.
4.3 Left: AC coefficients used for embedding. Middle: image attacked by noise with amplitude R = 10000. Right: image attacked by noise with amplitude R = 15000.
4.4 Detection threshold is 0.011. Left: the arc angle α(h) predicted by our model. Right: the detection rate obtained by the Monte Carlo method.
4.5 Detection threshold is 0.008. Left: the arc angle α(h) predicted by our model. Right: the detection rate obtained by the Monte Carlo method.
4.6 Detection threshold is 0.015. Left: the arc angle α(h) predicted by our model. Right: the detection rate obtained by Monte Carlo experiment.
4.7 Detection threshold is 0.01. Left: the arc angle α(h) predicted by our model. Right: the detection rate obtained by Monte Carlo experiment.
5.1 The second and third steps of our algorithm.
5.2 The worst overlapping case.
6.1 The two-image protocol for establishing a supraliminal channel from an image hash. Alice and Bob's transmissions are each split across two images, with the semantic content of the second keying the steganographic content of the first.
6.2 The transmission pipeline. Data is fed to a ciphertext server, polled by a doctored PRNG patch in an animation.
6.3 The 14-line ciphertext server, adapted from a 41-line web server in (1).
6.4 Quartz Composer flow-graph for our sample animation. At the top, the "Ciphertext Gateway" is polled by a clock to update the state of the display. The animation contains two nested iterators to draw columns and rows of sprites (blocks).
6.5 Average occlusion of background pixels over 31369 frames (16 minutes 28 seconds) of a video chat session.
6.6 BER (left) and block rows transmitted per second (right) with our animation/decoder pair, for several block sizes and attempted periods.
6.7 Supraliminal backdrops. On the left, colored blocks. On the right, the user's office superimposed with rain clouds. The animations are driven either by pseudo-random bits or ciphertext, estimable by the recipient.
7.1 White noise pattern image as shooting target.
7.2 Two B-G channel CA patterns with one lens working at different focal lengths.
7.3 Two B-G channel CA patterns with one lens working at different focal distances.
7.4 Mismatch plot. Left: CA pattern of p13 was used as reference. Right: CA pattern of p1 was used as reference.
7.5 Mismatch plot. Left: CA pattern of p31 was used as reference. Right: CA pattern of p22 was used as reference.
7.6 Bokeh pattern: the polygon shapes are due to the 8-blade aperture diaphragm being slightly closed.
8.1 Different motion ranges and their resulting histograms.
8.2 Image crops: the original image and three synthetically blurred images with different motion ranges.
8.3 The reference picture. Left: its histogram; right: the center crop.
8.4 From left to right: center crops of pictures 2, 9, 5, 10.
8.5 Our motion blur metric.
1 Introduction
This dissertation encompasses research that falls mainly into three fields: reverse engineering watermarking systems, security in steganography, and camera-related digital image processing and forensics.

In essence, detection is the keyword common to all three fields. Specifically:
• Reversing a digital watermark is about how to defeat a watermark detector by first learning its inner workings.

• Circumventing the Square Root Law in steganography is about how to achieve, at least in theory, a larger embedding payload while preserving the statistics of the cover media, so that the act of embedding is hard to detect.

• Covert communication via Cultural Engineering is about how to smuggle secret data through popular communication software without raising suspicion.

• DSLR lens identification is about how to distinguish each individual lens by using the lens's chromatic aberration pattern as its fingerprint.

• Benchmarking camera Image Stabilization systems is about how to detect and measure camera-incurred motion blur in an automatic and straightforward way.
For detection problems, as long as security is involved, there is always a war between system designers and attackers. In many application scenarios, the state of the art is a cat-and-mouse game. We hope our research can shed some light on the final judgment as to which side is more likely to win in the scenarios listed above.
1.1 Organization
Chapters 2, 3, and 4 present several techniques for reverse-engineering a watermarking system, in an evolving order. Chapter 2 begins by introducing a generic 3-stage framework for modeling watermarking systems. It then applies this model to a real-life open challenge, the first BOWS¹ contest. The reversing technique of super-robustness, which enabled us to win the first phase of BOWS, is described here. Chapter 3 further develops the idea of super-robustness for watermark detectors based on normalized correlation. It describes how to automatically generate large false-positive vectors that can be used to reveal the geometry of the watermark detection region. Chapter 3 employs a simplified geometric model which makes the computation tractable, and succeeds in handling watermark feature space dimensionality on the scale of 10². Chapter 4 explores watermarking systems with much larger feature space dimensionality (on the scale of 10⁴) by using a more precise geometric model. It delves into the mathematics describing the situation where a hyper-cone intersects a hyper-sphere in high-dimensional space. Although neither a symbolic nor a complete numerical solution is obtainable due to the complex nature of the model, one pattern is still found that discloses information about the watermark detector.
Chapter 5 discusses the possibility of circumventing the Square Root Law in steganography. Both the i.i.d. and the first-order Markov cases are analyzed.
Rather than mimicking natural noise or preserving the statistics of the cover media, Chapter 6 proposes a new philosophy of covert communication: Cultural Engineering. It then implements this idea by exploiting a video-conferencing software's random background animation to smuggle secret data.
Chapter 7 examines the problem of DSLR camera lens identification. The lens chromatic aberration (CA) pattern is used as each lens's unique fingerprint. Focal distance is discovered to be another important factor shaping the lens CA pattern, and this discovery completes the feature space of lens CA data. After taking all the relevant factors into account, this approach succeeds in distinguishing different copies of the same lens.
Chapter 8 presents a fast and automatic way of measuring camera-motion-incurred motion blur,² which is put to a very practical use: benchmarking different camera Image Stabilization systems. It introduces a white noise pattern image as the shooting target, which enables the blur measure to be derived directly from test-picture histograms.

¹ BOWS stands for 'Break Our Watermarking System'.
1.2 Specific Contributions
This section provides a summary of my contributions in this dissertation.
1. Super-robustness is found to be a security flaw rather than a blessing for a watermarking system. Along with two other team members (Professor Craver and Idris Atakli), by identifying traces of super-robustness, we were able to reverse the type of transform and the watermark feature space used by the BOWS watermark system, which enabled us to design 'guided attacks' later. I came up with the attack configurations for all three contest images, which gave us the winning PSNR.
2. Noise snakes are large false positives grown along the boundary of a normalized-correlation-based watermark detection region; they can be used to reverse the geometry of that region. The idea is inspired by the concept of super-robustness: while identifying super-robustness requires human expert knowledge, noise snakes are generated automatically by our algorithm. Along with Professor Craver, I designed the noise snake technique.
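The general idea can be illustrated with a crude hill-climbing sketch (this is an illustration only, not the dissertation's actual snake-growth algorithm, and the dimensionality, threshold, and step size are arbitrary toy choices): starting from a detected point, random steps are kept only if they stay inside the detection cone while increasing the vector's length, so the point snakes outward along the inside of the region.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200                        # toy feature-space dimensionality
u = np.zeros(n); u[0] = 1.0    # secret watermark direction
tau = 0.5                      # normalized-correlation threshold

def detect(x):                 # a black box from the attacker's viewpoint
    return x @ u / np.linalg.norm(x) >= tau

# Grow a large false positive: keep random steps that remain detected
# while increasing length, so the point crawls outward inside the cone.
x = u.copy()
for _ in range(5000):
    cand = x + 0.05 * rng.standard_normal(n)
    if detect(cand) and np.linalg.norm(cand) > np.linalg.norm(x):
        x = cand

print(round(float(np.linalg.norm(x)), 1), bool(detect(x)))
```

The grown vector is far from the watermark yet still detected; pairs of such vectors can then be probed against each other to leak the cone's geometry.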
3. For normalized-correlation-based watermarking systems using a feature space of very large dimensionality, a precise geometric model is needed in order to ensure accurate reversing results. I delved into the mathematical description of this model, which boils down to a quartic equation. After my attempts at both analytical and numerical solutions failed due to the complex nature of the model, I observed one pattern that can still be used to aid our reversing task. To be more specific, I noticed that as the watermark detection rate crosses 50%, there is a corresponding pattern change in the tail part of the arc angle curve α(h)³ with respect to the noise strength R.

² Note that this is different from moving-object-incurred motion blur.
³ The intersection angle of the watermark detection region and the noise sphere, in high-dimensional space.
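The 50% crossing itself is easy to reproduce in a toy Monte Carlo experiment (the dimensionality, threshold, watermark strength, and noise amplitudes below are illustrative assumptions, not the dissertation's parameters): a watermarked feature vector is attacked by noise drawn uniformly from a sphere of radius R, and the fraction of attacked works still inside the detection cone falls through 50% as R grows.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500                       # toy feature-space dimensionality
u = np.zeros(n); u[0] = 1.0   # cone axis (watermark direction)
tau = 0.1                     # normalized-correlation threshold
v = 50.0 * u                  # watermarked feature vector

def detection_rate(R, trials=2000):
    """Attack v with noise of amplitude R, uniform on the sphere, and
    count how often the result stays inside the detection cone."""
    noise = rng.standard_normal((trials, n))
    noise *= R / np.linalg.norm(noise, axis=1, keepdims=True)
    x = v + noise
    corr = x[:, 0] / np.linalg.norm(x, axis=1)
    return float(np.mean(corr >= tau))

rates = [detection_rate(R) for R in (200, 400, 500, 600, 800)]
print([round(r, 2) for r in rates])  # falls through 0.5 near R = 500
```

Observing where this rate crosses 50% in practice is what pins down the corresponding change in the α(h) curve.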
4. Let g denote the steganographic embedding function, with the Kullback-Leibler divergence used as the security measure. A subset of the cover whose pdf f_g is a fixed point under g can be selected to achieve a linear embedding rate without leaving statistical traces. I analyzed the embedding payload of this subset selection technique for both the i.i.d. and the first-order Markov case.
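A minimal i.i.d. sketch of the fixed-point idea (the pmf, LSB-replacement embedding, and the chosen subset are all hypothetical illustrations, not the dissertation's construction): if a pair of LSB-partner values is equiprobable in the cover, then LSB replacement with uniform message bits leaves that subset's distribution unchanged, so embedding only there yields a payload linear in the cover size with essentially zero KL divergence between cover and stego statistics.

```python
import numpy as np

rng = np.random.default_rng(7)

def embed_lsb(samples, bits):
    """g: LSB replacement, mapping value v to (v & ~1) | message_bit."""
    return (samples & ~1) | bits

def kl_divergence(p, q, eps=1e-12):
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical cover pmf over 8 values; the LSB pair {2, 3} is equiprobable,
# so on the subset {2, 3} the pmf is a fixed point of LSB replacement.
pmf = np.array([0.20, 0.10, 0.15, 0.15, 0.05, 0.25, 0.06, 0.04])
n = 200_000
cover = rng.choice(8, size=n, p=pmf)

subset = (cover == 2) | (cover == 3)   # embed only in the fixed-point subset
bits = rng.integers(0, 2, size=int(subset.sum()))
stego = cover.copy()
stego[subset] = embed_lsb(stego[subset], bits)

h_cover = np.bincount(cover, minlength=8) / n
h_stego = np.bincount(stego, minlength=8) / n
kl = kl_divergence(h_cover, h_stego)
print(int(subset.sum()), "bits embedded; KL divergence:", kl)  # KL ~ 0
```

The payload grows linearly with n (roughly 30% of samples here), while the histogram, and hence the KL-based detectability measure, stays at the sampling-noise floor.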
5. The term Cultural Engineering refers to the act of tricking people into doing certain things by first creating or promoting a relevant culture, then using this culture to influence people to do things they would be unlikely to do otherwise. Together with Professor Craver and Enping Li, I implemented a covert communication channel through Apple iChat's video-conferencing facility. Specifically, I designed the 'rain drop' animation using the Apple Quartz Composer programming environment, and I implemented a matched filter for decoding the covert data hidden in the 'rain drop' animation.
6. Chromatic aberration (CA) is a class of inevitable artifacts in all optical lens systems. I discovered that the lens focal distance is another important factor shaping the lens CA pattern, one that had not received attention in previous research. This discovery completes the feature space of lens CA identification data. Together with the new white noise pattern shooting target, which is a contribution from Professor Craver, I was able to distinguish different copies of the same lens via their CA patterns.
7. The Image Stabilization (IS) system is a very useful add-on in most of today's digital cameras. However, there has been no practical way to fairly benchmark and compare IS systems from different camera manufacturers. Thanks to our white noise pattern shooting target, I was able to build a motion blur measure directly from test-picture histograms, which can be used to benchmark the effectiveness of a camera IS system in a fast and accurate manner.
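The exact metric is developed in Chapter 8; as an illustration of the general idea only, the sketch below uses a hypothetical L1 histogram-distance metric. Because a white-noise target has a flat, framing-invariant histogram, motion blur (simulated here by averaging horizontal shifts) narrows the histogram, and the distance from the sharp reference histogram grows monotonically with the motion range.

```python
import numpy as np

rng = np.random.default_rng(0)

def horizontal_blur(img, length):
    """Simulate camera-motion blur by averaging `length` horizontal shifts."""
    if length <= 1:
        return img.astype(float)
    acc = np.zeros(img.shape, dtype=float)
    for s in range(length):
        acc += np.roll(img, s, axis=1)
    return acc / length

def histogram(img, bins=64):
    h, _ = np.histogram(img, bins=bins, range=(0, 256))
    return h / h.sum()

def blur_metric(test_img, ref_hist, bins=64):
    """Hypothetical metric: L1 distance between the test picture's histogram
    and the sharp reference histogram."""
    return float(np.abs(histogram(test_img, bins) - ref_hist).sum())

# White noise target: i.i.d. uniform pixels give a flat histogram that is
# invariant to framing and registration of the shot.
target = rng.integers(0, 256, size=(512, 512))
ref_hist = histogram(target)

scores = [blur_metric(horizontal_blur(target, L), ref_hist) for L in (1, 4, 8, 16)]
print([round(s, 3) for s in scores])  # increases with motion-blur length
```

No registration step and no human assessment are needed: the histogram alone orders the test shots by blur severity.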
2 Breaking the BOWS Watermark by Super-robustness
2.1 Introduction

2.1.1 Reverse Engineering
The concept of reverse engineering probably dates back to the days of the industrial revolution. As defined in (2), reverse engineering is the process of analyzing a subject system to identify the system's components and their interrelationships, and to create representations of the system in another form or at a higher level of abstraction. Reverse engineering is very similar to scientific research; the difference lies in their subjects: the subject of reverse engineering is man-made, while the subject of scientific research is a natural phenomenon that no one knows was ever engineered. (3)

Perhaps the best-known application of reverse engineering is developing competing products, a practice found in most industries. Due to computer software's ever-growing complexity, reverse engineering has been widely used for visualizing a subject software's structure, its modes of operation, and the features that drive its behavior. In particular, it has been used in both designing and defending against malicious software, in evaluating cryptographic algorithms, and in defeating digital rights management (DRM), which is prohibited by the Digital Millennium Copyright Act. (3)

In fact, reverse engineering applies to virtually any system that can be modeled as a classifier, in which case the reverser has control over the input signal and gets feedback from the classifier for each input. Problems including watermarking, biometrics, and steganalysis all fall into this category.
2.1.2 Reversing Digital Watermark
During the transition from the 20th to the 21st century, digital watermarking was deemed a promising solution to the DRM (Digital Rights Management) problem. After years of research and commercialization efforts, however, things have not gone as expected, for a variety of reasons.

On the technical side, this is primarily due to the following three reasons:

• Unlike the objects of cryptography, the multimedia objects in digital watermarking, such as digital images, audio, and video, are all "continuous" in some sense. In practice, given a watermarked media work and its corresponding un-watermarked version¹, a binary search between the two can be performed to locate the local watermark detection boundary. As a consequence, the geometry of a watermark detection region can be gradually disclosed by an oracle attack, and the knowledge obtained can finally be used to design a very efficient attack. For example, sensitivity attacks can often achieve a PSNR even higher than that of the watermark embedding process. (4, 5)

• In practical scenarios, watermarking systems are open to a variety of attacks. (6, 7, 8)

• How to model the behavior of human attackers is an unsolved problem, which implies that there are no well-proven security guidelines available for watermarking system designers.
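The binary search mentioned in the first point above can be sketched in a few lines (the detector below is a toy normalized-correlation oracle with made-up parameters, standing in for any black-box detector the attacker can query):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256

# Toy black-box detector: normalized correlation with a secret pattern w
# against threshold tau. The attacker never sees w or tau directly.
w = rng.standard_normal(n); w /= np.linalg.norm(w)
tau = 0.1
def detector(x):
    return np.dot(x, w) / np.linalg.norm(x) >= tau

cover = rng.standard_normal(n)
cover -= np.dot(cover, w) * w      # remove any chance alignment with w
marked = cover + 3.0 * w           # watermarked version of the work
assert detector(marked) and not detector(cover)

def boundary_point(inside, outside, steps=40):
    """Binary search along the segment between a detected and an undetected
    work; converges to a point on the local detection boundary."""
    lo, hi = 0.0, 1.0              # lo maps to `inside`, hi to `outside`
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if detector((1 - mid) * inside + mid * outside):
            lo = mid
        else:
            hi = mid
    return (1 - lo) * inside + lo * outside

b = boundary_point(marked, cover)
corr = np.dot(b, w) / np.linalg.norm(b)
print(round(corr, 3))  # ~= tau: b lies on the detection boundary
```

Each detector query costs nothing but time, and repeating the search from many directions maps out the geometry of the detection region, which is exactly what oracle and sensitivity attacks exploit.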
Reverse engineering digital watermarks is important because it provides a way of
quantifying the security of watermarking systems. In return, the lessons learned during
reverse engineering could serve as a design guide towards a more secure watermarking
system. This chapter examines a real-life example showing how a watermarking system can be reversed and then attacked. It begins by presenting a 3-stage framework for modeling generic watermarking systems; the reversing techniques we used to break the BOWS watermark are then covered.
1
Un-watermarked media could either be the original media, or a watermarked version subjected to such a heavy attack that the embedded watermark can no longer be detected by the watermark detector.
2.2 A 3-Stage Watermarking System Model
In the field of software reverse engineering, the cracking procedure is normally divided into two phases: system-level reversing and code-level reversing. (3) We take a similar approach by first identifying the structure of the watermark detection system.
Most known watermark detection systems can be decomposed into three stages: the transform, the feature space, and the detection algorithm, as illustrated in Figure 2.1. A media work1 under inspection is first subjected to a transform so that it is represented in a suitable domain. Then a subset of the feature space, or a subband, is selected as the suspected watermark signal carrier. Lastly, this subset is fed into a detection algorithm, and the resulting detection statistic is compared with a threshold to produce the final watermark detection decision.
Figure 2.1: Three stage decomposition of watermark detection system.
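The three stages can be sketched as a single function. The full-frame DCT, the boolean subband mask, and the linear-correlation statistic below are illustrative stand-ins chosen for concreteness, not the components of any particular published system:

```python
import numpy as np
from scipy.fft import dctn

def detect(image, watermark, mask, threshold):
    """Illustrative 3-stage watermark detector."""
    # Stage 1: transform -- represent the work in a transform domain.
    coeffs = dctn(image, norm="ortho")
    # Stage 2: feature space -- select a subband as the suspected carrier.
    features = coeffs[mask]
    # Stage 3: detection -- compare a detection statistic to a threshold.
    statistic = float(features @ watermark)
    return statistic > threshold
```

Swapping any one stage (say, a wavelet transform for the DCT, or normalized correlation for the linear statistic) leaves the other two untouched, which is what makes the decomposition useful for reverse engineering.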
It is worth mentioning that this decomposition model also fits many non-watermarking applications that can essentially be modeled as classifiers. Therefore, the reversing experience from our 3-stage watermark detector model may be readily transferred to other non-watermarking systems that can be modeled this way.
2.2.1 Common Transforms and Detection Algorithms
To build useful background knowledge for reverse engineering, we performed a paper survey to record the most commonly used transforms and detection algorithms. Tables 2.1 and 2.2 suggest that watermarking researchers as a whole tend to choose common transforms and detection algorithms.
It should be pointed out that the numbers listed in Tables 2.1 and 2.2 are based on around 70 papers we collected; hence they may not reflect the exact usage frequencies across all watermarking research documents. However, given an unknown
1
Media works include digital images, audio, video, etc.
Transform Type                       # of Papers
Wavelets                                      21
Spatial Domain                                15
DCT                                           14
Fourier                                        5
Vector Quantization                            2
Independent Component Analysis                 2
Moment                                         1
Histogram                                      1
Fractal                                        1
Orthogonal Basis Function                      1
Scale Space Representation                     1

Table 2.1: The distribution of transform types.
Detector Type            # of Papers
Correlation                       47
Maximum Likelihood                 7
QIM                                7
Rao Test                           1
Table 2.2: The distribution of detection algorithms.
watermarking system, those two tables could be used as a priority guide for attackers
who want to schedule their attacks in a statistically more efficient way.
2.3 Breaking BOWS by Identifying Super-robustness
2.3.1 INTRODUCTION
From December 15, 2005 to March 15, 2006, the Break Our Watermarking System
(BOWS) contest challenged researchers to break an unnamed image watermarking system, by rendering a watermark undetectable in three separate images. (9) Doctored
images had to meet sufficient quality standards to be considered successfully attacked,
and all three had to be successfully attacked to qualify for the prize. The prize went to
the team which achieved the highest quality level by March 15, 2006. Our team from
Binghamton University was fortunate enough to have the highest quality level on the
15th, winning the prize of 300 dollars and a digital camera.
2.3.1.1 Goals of the Contest
The specific goal was to alter three separate 512-by-512 grayscale images so that all three fail to trigger a watermark detector. Relative to the originals, the quality of a successful attack had to exceed 30 dB PSNR, and the winner was the team that achieved the highest PSNR, where the overall PSNR was computed from the MSE averaged over all three images:
\[
\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\frac{1}{3}\sum_{k=1}^{3} \frac{1}{262144} \sum_{i,j} \bigl(I_k[i,j] - J_k[i,j]\bigr)^2}
\]
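As a sketch, the contest metric can be computed directly from this formula; the function name and the assumption that images arrive as numpy arrays are ours:

```python
import numpy as np

def contest_psnr(originals, attacked):
    """BOWS quality score: PSNR computed from the MSE averaged over the
    three 512-by-512 images (262144 pixels each, handled by np.mean)."""
    mses = [np.mean((np.asarray(I, float) - np.asarray(J, float)) ** 2)
            for I, J in zip(originals, attacked)]
    return 10 * np.log10(255 ** 2 / np.mean(mses))
```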
The three images are illustrated below.
Figure 2.2: The three images from the BOWS contest.
2.3.1.2 A Comparison to the SDMI Challenge
It is natural to compare this watermarking challenge to a previous one, the SDMI challenge of 2000. (8) In that contest, the Secure Digital Music Initiative (SDMI) challenged “hackers” to break four audio watermarking technologies and two supplemental digital signature technologies. Several teams passed the preliminary phase of the challenge for all four watermark technologies, although representatives of Verance, Inc. said that their own watermarking technology was not broken by any participant. (10, 11, 12)
There are some specific facets of the two contests which are worth comparing and
contrasting:
• The SDMI challenge lasted three weeks; the BOWS contest lasted three months. The organizers of the BOWS contest also arranged a post-challenge period of three months after the watermarking algorithm was made public.
• The BOWS contest oracle took seconds, whereas the SDMI challenge oracle often
took hours. The SDMI challenge oracle presumably involved watermark tests and
listening tests by a human being, and the delay between submission and results
could take varying amounts of time. (10)
• The BOWS contest oracle’s behavior was well-defined. For example, the minimum
quality metric of the BOWS contest is explicitly defined at 30.00 dB PSNR, and
the detector was automated: it would accept any properly formatted image, even
a blank image or wrong image. This allowed reverse-engineering attacks of the
variety we employed. In the SDMI challenge, the human’s behavior was unclear.
Some researchers submitted properly signed TOC files to the signature verification
oracle, and received a rejection from the oracle; they could not tell if the oracle
was defective or if the human had a policy of rejecting trivial submissions. (10)
• The BOWS oracle told the user both the detector output and the image quality
(which could also be computed at the user’s end, since the quality metric was
explicitly defined.) The SDMI oracle would not reveal the specific reason for a
rejection: either the detector found the watermark or the audio quality was poor,
or both.
• The SDMI challenge provided watermarked and un-watermarked samples; the BOWS contest provided only watermarked samples, with no reference images. It is likely the SDMI challenge would not have been won had there been no un-watermarked samples. With few opportunities to query an oracle, one could only analyze the difference between the before and after clips, and possibly look for anomalies in the watermarked clips.
One major difference is that the SDMI challenge was not amenable to oracle attacks: the BOWS contest allowed potentially millions of serial queries, versus the hundreds possible over the lifetime of the SDMI challenge.
2.3.1.3 Results
Our team achieved a PSNR of 39.22 dB by the deadline, winning the contest. (9) We achieved this first by reverse-engineering the algorithm from experimental oracle submissions, and then by exploiting an observed sensitivity to noise spikes in the feature space. John Earl of Cambridge University deserves special recognition for rapidly increasing his PSNR results just before the deadline; extrapolating from his submissions, he could have beaten us had the contest run just a few hours longer. (9)
The algorithm was then revealed to be one published by M. L. Miller, G. J. Doerr and I. J. Cox in 2004. (13) We confirmed that our guesses for the image transform and subband matched the paper, and their detector partially explained the weakness we used to defeat it.
In the subsequent three-month challenge with a known algorithm, Andreas Westfeld
achieved an astonishing 58.07 dB, a quality level exceeding even the minor error incurred
by digitizing the image in the first place. (14)
2.3.2 REVERSE ENGINEERING
The reverse-engineering of the algorithm was performed in two steps: reverse-engineering
the feature space by submitting experimental images, and reverse-engineering the subband of the feature space by carving out selected portions of the feature space once it
was known.
2.3.2.1 Determining the Feature Space
In retrospect, the feature space was very clear given the obvious block artifacts in the images (see Figure 2.3), but we had to be certain. Our deduction of the feature space followed this principle: for each suspected image feature space, find a severe attack that leaves that feature space untouched.
The philosophy behind this attack is that a severe modification of an image should
break all other watermarks, providing a test with high selectivity. Curiously, our approach was the opposite of the contest goals: we were to damage the image quality as
much as possible, while failing to remove the watermark.
For example, we would invert the image in luminance (this broke the watermark), add DC offsets to the pixels (this did not break the watermark), mirror-flip the image, or shuffle
Figure 2.3: Some visible artifacts in the image, suggesting either severe compression or
a block-based watermark.
blocks of the image, et cetera. After a small set of tests, we submitted a test image which convinced us that the watermark was embedded in 8-by-8 DCT coefficients (Figure 2.4).
Here, each 8-by-8 block was given the worst possible DC offset without changing any
AC coefficients in an FFT or DCT of the block. This preserved the watermark, but
reduced image quality to 3.94 dB PSNR.
Figure 2.4: Some severe transformations of an image. Left, a shuffling of blocks aimed
at preserving block-based FFT magnitudes. Right, a severe addition of 8-by-8 DC offsets.
This preserved the watermark with only 3.94 dB PSNR.
Our attacks included flipping signs of common transform coefficients, reasoning that
if a watermark is embedded multiplicatively rather than additively, then only changes
in magnitude would damage the watermark. We found that changes in sign broke the
watermark. Thus this approach could tell us not only what feature space was used, but
also what kind of detector might be used. The precise dividing line between a choice of
feature space and choice of detector is unclear: if only magnitude information is used
in a detector, is this a restriction of the feature space?
2.3.2.2 Determining the Sub-band
Once we were convinced that the watermark was embedded in 8-by-8 AC DCT coefficients, we then submitted images with bands removed from each block. We took
the largest bands we could remove without hurting the watermark, and computed the
complement of their union. The result, illustrated below, matches the subband used in
the watermarking algorithm (13).
Figure 2.5: Left: AC removal pattern 1; Center: AC removal pattern 2; Right: intersection of patterns 1 and 2.
In this process we were guided by knowledge of common watermarking algorithms.
Watermarks have commonly resided in low-frequency and middle-frequency bands, a
tactic described in the seminal paper by Cox et al., in order to make the watermark
difficult to separate from the image content. (15) We expect common subbands to be
upper-triangular regions, and square regions in the DCT coefficient matrix. Hence,
our attacks struck out lower-triangular sections of the matrix, and gnomonic sections.
Taking the union of the largest lower-triangular and the largest gnomonic section, we
found the largest “pattern” we could remove without damaging the watermark, the
largest region of geometric significance.
A second, parallel attempt at determining the sub-band focused on eliminating
DCT coefficients arbitrarily until the watermark broke. This produced a contradictory
result discussed in the next section: far fewer coefficients would need to be damaged
to eliminate the watermark than our sub-band estimates (and later, the published watermarking algorithm) would suggest.
2.3.3 BREAKING THE WATERMARK
Once we determined the secret algorithm, we could focus our work directly on the
feature space in hopes of inflicting maximum damage. Our initial approach was to
experimentally damage DCT coefficients until the watermark was removed, and then
develop a working set of coefficients and multiplier that incurred minimum damage.
We did not, however, adopt any sophisticated sensitivity attack, and performed
only thousands of oracle queries, by hand.
2.3.3.1 A Curious Anomaly
Early on, we discovered by experiment that an unusually small number of damaged coefficients could render the watermark undetectable. We tried the following algorithm:
1. Sort the DCT coefficients by magnitude.
2. Let k ← 200, d ← 50
3. while d >= 1:
(a) Tentatively eliminate the k largest coefficients.
(b) If the watermark remains, k ← k + d.
(c) If the watermark is removed, k ← k − d; d ← ⌊d/2⌋.
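This shrinking-step search can be sketched as follows; `watermark_detected` stands in for the contest oracle, which we could only query, not inspect:

```python
import numpy as np

def find_k(coeffs, watermark_detected, k=200, d=50):
    """Shrinking-step search for how many of the largest-magnitude
    coefficients must be eliminated to remove the watermark."""
    order = np.argsort(-np.abs(coeffs))       # largest magnitudes first
    while d >= 1:
        trial = coeffs.copy()
        trial[order[:k]] = 0.0                # tentatively eliminate the k largest
        if watermark_detected(trial):
            k += d                            # watermark survived: damage more
        else:
            k -= d                            # watermark removed: back off
            d //= 2
    return k
```

The step size halves only on a successful removal, so the search converges onto the boundary between a surviving and a broken watermark.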
Initially, the elimination rule set each coefficient to 0. Later we improved our result
by amplifying rather than zeroing each coefficient, for example multiplying each of k
coefficients by 3. Doing so, we found that a few hundred coefficients were enough to
destroy the watermark in all three images.
However, our previous analysis gave us a sub-band of 49152 coefficients (a 512-by-512 grayscale image contains 4096 8-by-8 blocks, with 12 AC coefficients taken per block). It seems unlikely that a few hundred zeroed coefficients would destroy a signal of 49152 samples.
We suspected that some aspect of the detector made it sensitive to noise spikes in the
right locations.
The watermark detector, which we could not guess, was revealed to be a Viterbi
decoder that extracted a long message from the 49152 coefficients, and compared that
to a reference message. (13) We believe that this is the reason that the resulting detector
could be derailed by a carefully chosen set of noise spikes. Whatever the cause, we then
sought to reduce the number of samples we zeroed or amplified.
2.3.3.2 The Rest of the Algorithm
Once we could eliminate the watermark by amplifying the k highest-magnitude coefficients, we then reduced the set size incrementally:
1. Choose k and D, so that multiplying the k highest-magnitude DCT coefficients
by D destroys the watermark.
2. For j = 1 · · · k:
(a) Reinstate the original value of coefficient j
(b) If the watermark becomes detectable, return coefficient j to its damaged
state.
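This greedy reinstatement pass can be sketched similarly, again with a stand-in oracle:

```python
import numpy as np

def prune(coeffs, damaged_idx, D, watermark_detected):
    """Greedy reduction: reinstate each damaged coefficient in turn and
    keep it reinstated unless the watermark becomes detectable again."""
    attacked = coeffs.copy()
    attacked[damaged_idx] *= D            # damage the chosen coefficients
    needed = []
    for j in damaged_idx:
        saved = attacked[j]
        attacked[j] = coeffs[j]           # reinstate the original value
        if watermark_detected(attacked):
            attacked[j] = saved           # still needed: re-damage it
            needed.append(j)
    return attacked, needed
```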
We therefore wiped from our list of damaged coefficients all those which were not needed to remove the watermark. The resulting set was surprisingly small, reducing hundreds of coefficients to fewer than ten per image, which we then reduced further by hand while increasing the amplification factor.
Tables 2.3 and 2.4 illustrate the reduction process for images 1 and 2: raising the value D from 3 to 3.55, we found that some coefficients could be removed from the attack set for an overall increase in PSNR. We stopped when removing the next coefficient required an amplification that reduced the PSNR.
2.3.4 SUPER-ROBUSTNESS AND ORACLE ATTACKS
For purposes of reverse-engineering, we adopted the strategy of making severe changes that will not affect one type of watermark. For example, if a watermark is embedded in block AC DCT coefficients, a random DC offset to each block has no effect on the watermark, even though it is an extreme amount of noise which renders the image useless.
(Coefficient numbers) * amplification      PSNR
(12,69,107,127,132,140,141) * 3.4      37.53 dB
(12,69,127,132,140,141) * 3.4          38.19 dB
(12,69,127,140,141) * 3.4              38.97 dB
(12,69,127,141) * 3.55                 39.67 dB
Table 2.3: Successive attacks for image 1. By amplifying a few AC coefficients by the
right value, the detector will fail.
                        AC coefficients
Multiplier   206  216  223  236  279  286  310  314  316
3.1           R
3.2           R    R
3.2           R    R    R
9.5           R    R    R         R    R    R    R    R
Table 2.4: Successive attacks for image 2. We start with 9 AC coefficients. ’R’ denotes
that the coefficient is no longer needed when the gain increases.
Watermarks are typically designed so that the watermark remains detectable even
if the image is reasonably degraded. A layperson might also expect the watermark to
become undetectable if image distortion is ridiculously severe—however, some carefully
chosen forms of degradation have no effect whatsoever on a detector, for example if they
miss the feature space.
We call this property super-robustness: the property of a watermarking algorithm
to survive select types of quality degradation far beyond what any reasonable person
would expect. If we can render an image unrecognizable and maintain the watermark,
then we have found an attack to which it is super-robust.
This may seem like an admirable property for a watermark to possess, but it is
in fact a security weakness. There is no need for a watermark to survive complete
destruction of an image, and in doing so it only gives information to an attacker.
The destructive operation is likely to destroy all other types of watermarks, giving an
attacker a reverse-engineering tool with high selectivity.
Figure 2.6: The three images after the attack. Note the obvious block artifacts.
2.3.4.1 Oracle Attacks Exploiting Super-robustness
The super-robustness attack exploits severe false alarms, by inflicting noise that would
break most other watermarks. As human beings, we may use expertise and common
sense in choosing such noise. A computer is not so gifted, and must determine severe
false alarms blindly.
We developed an oracle attack based on the principle of super-robustness, in which
an incremental amount of noise is added to an image as long as the watermark remains
detectable. (16, 17) We find that the noise component within the feature space quickly
converges to the detection region boundary and stays there; several properties of this
noise can be used to estimate the feature space size and detector threshold, if we guess
in advance what type of detector is used. However, this takes far more oracle queries
than we needed to break the more complicated BOWS watermark.
2.3.4.2 Mitigation Strategies
There is a simple strategy to prevent super-robustness attacks: prevent severe inputs from reaching the detector. This is easier said than done, because it is not clear how to recognize a “severe” input. We offer several recommendations to the designer of a watermark detector:
• Make every effort to normalize an image before detection.
• Place hard limits on feature coefficient magnitude, and overall energy.
• Whenever possible, give the detector access to the watermarked image for comparison’s sake. For example, a watermark can contain an identifier, which can be
17
2. BREAKING THE BOWS WATERMARK BY SUPER-ROBUSTNESS
used to look up a copy of a watermarked image. If the test image does not resemble the marked image in some rudimentary way, then the oracle should declare
the watermark undetectable.
• Reject inputs that fail to match a reasonable model of the image.
• Use “cocktail watermarking,” as described in (18), and write a detector that does
not reveal which watermarks are missing after an attack.
• Choose the right application. In a fingerprinting or traitor-tracing application,
an attacker doesn’t have direct access to the detector at all. Indeed, attackers
may not even know if content is watermarked, until someone gets in legal trouble.
Meanwhile, for content-control watermarks, especially those detected in consumer
electronics devices like television sets or portable music players, oracle attacks will
be very difficult to prevent.
Note that these are properties of a detector, not the watermark itself. Any proper
watermark feature sub-space will be super-robust to some attack—namely, destroying
all image data outside the sub-space—so prevention must focus on masking this effect,
using a sophisticated detector.
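As a minimal sketch of the first two recommendations (the limits, names, and thresholds here are our own illustration, not a published design), such masking amounts to a guard placed in front of any underlying detector:

```python
import numpy as np

def guarded_detect(image, detect, max_coeff=1e4, max_energy=1e9):
    """Illustrative guard: reject severe inputs before they reach the
    underlying detector, so they cannot serve as oracle probes."""
    x = np.asarray(image, dtype=float)
    # Normalize: strip the overall DC offset before detection.
    x = x - x.mean()
    # Hard limits on coefficient magnitude and overall energy.
    if np.abs(x).max() > max_coeff or np.sum(x * x) > max_energy:
        return False          # declare "no watermark" for severe inputs
    return detect(x)
```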
2.3.4.3 Other Advice to the Watermark Designer
Beyond the problem of super-robustness, several other exploitable weaknesses are worth
highlighting. We provide some tips to the watermark designer who wishes to avoid our
attacks.
1. Do not use a steganographic algorithm to embed one bit. If you simply wish to watermark an image with a fixed copyright label which is either present or not (e.g., a one-bit message), do not adapt a stego algorithm intended to encode arbitrarily long messages.
It seems reasonable to adapt a stego algorithm into a watermark algorithm by embedding a long reference message, and later comparing that reference message to the output of the stego-decoder. However, the improved bitrate of a steganographic algorithm often comes at the expense of robustness, and using that payload to send a single bit is suboptimal.
2. Watch for security flaws introduced by detector components. In this case, the watermark message was extracted using a Viterbi decoder, which we feel introduced an exploitable weakness. Since each chip of the signal was detected by correlation, why not simply detect the whole watermark with a single correlation? The answer is that the algorithm was intended to decode an unknown message, rather than search for one specific message known in advance.
3. Consider how stereotypical the feature space is. Many, many image watermarking algorithms use block DCT coefficients, or full-frame DCT coefficients. Many use straight correlators or normalized correlators, although the BOWS watermark did not. An attacker is going to guess these right from the start.
We are not advocating security through obscurity by asking a designer to pick an obscure transform; however, something like the key-dependent random image transforms of Fridrich et al. might increase the guessing entropy of the concealed algorithm, at least because the key-dependent image transform parameterizes more of the algorithm as key (19).
2.3.5 CONCLUSIONS
The BOWS contest gave us an excellent opportunity to test new attack methodologies,
and to further explore the process by which watermarks are broken in the real world.
Because we reverse-engineered the watermark through the oracle, we question the
value of keeping a watermark algorithm secret in practice (although for the purposes
of the contest, keeping the algorithm secret was an excellent idea.) If a watermark
detector can be exploited as an oracle, then the algorithm is essentially leaked; the
extent of this leakage is the subject of further study.
3 Reversing Normalized Correlation based Watermark Detector by Noise Snakes
3.1 INTRODUCTION
Sensitivity attacks, or more generally oracle attacks, have been employed for years to
break watermarking systems and have become increasingly sophisticated. (4, 5) These
attacks seek to remove a watermark by finding a point outside the watermark detection
region that is as close as possible to the watermarked image.
Sometimes, however, we are not initially interested in breaking an unknown watermark, but in learning its structure or its detection algorithm. If we have an unknown algorithm, can an oracle be used experimentally to deduce it? The answer is yes, as has been demonstrated by recent watermarking challenges (10, 20). This use of an oracle for reverse-engineering is ad hoc, however, and based on expert knowledge. An automated watermark reverse-engineer, which uses a detector as an oracle and analyzes its output, would be very useful, and we have begun to explore how components of such a system could be built.
The problem is that embedding algorithms can be arbitrarily complex. However,
if we have some general information about a watermark’s generic structure, we can
then seek to reverse-engineer further details by oracle attack. For example we might
assume that a watermark is feature-based, extracting a vector f of unknown features
and comparing them to a reference watermark by some popular detector structure.
3.1.1 Motivation
This research was primarily motivated by our success in reverse-engineering the unknown BOWS watermark. (20) In this project, we took watermarked images, distorted
them in severe ways, and fed them to the provided watermark detector as experimental
images.
Our goal was to test if a given feature space or detector was in use, by inflicting
severe changes that would not hurt certain types of watermarks. This gave us a test
with high selectivity for certain feature spaces, sub-bands and detector structures.
Extending this concept, we sought to explore how the false positives of a detector leak
information about the underlying watermarking algorithm.
Figure 3.1: An image from the BOWS contest, and a severe distortion that preserves the
watermark. From this experimental image we concluded that the watermark was embedded
in non-DC 8-by-8 DCT coefficients.
3.2 SUPER-ROBUSTNESS AND FALSE POSITIVES
Super-robustness is the property of a watermark to survive specific extreme and absurd
image distortions. Robust watermarks are of course intended to survive distortion if
the image quality remains high; however, one rarely mandates the converse, that the
watermark should break when the image is damaged far beyond its useful value.
Super-robustness may sound positive, but it is a security flaw. A watermark will
be super-robust to operations specific to its feature space and detector, and this makes
the watermark amenable to reverse-engineering, using the detector as an oracle. In
other words, super-robustness results in a detector that leaks information about the
watermarking algorithm.
3.2.1 Examples of Super-robustness
In the BOWS contest, we tested for several types of watermark algorithms by testing
for super-robustness. Here are some properties we used to fashion experimental test
images:
• If a watermark is embedded in AC frequency transform coefficients, it should be
immune to DC offsets. For block-based transforms, an independent DC offset
can be added to each block without hurting the watermark, although many other
types of watermarks would be damaged. See figure 3.1 for an example.
• If a watermark is embedded multiplicatively rather than additively in a signal
or a feature vector, then the signs of those features can be randomized without
changing the embedded signal.
• Any watermark that uses a proper feature subset can be super-robust to the
wiping out of all other information. For example, if a full-frame DCT watermark
uses a sub-band of 50000 AC coefficients, erasing or randomizing all but those
50000 coefficients could render the image unreadable without hurting the mark.
• If a watermark detector uses linear correlation, an amplified signal should always
trigger the detector while an attenuated signal eventually falls on the wrong side
of the detection region. In contrast, a normalized correlation detector is robust
to both severe amplification and attenuation.
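The first property above is easy to check numerically: a per-block DC offset, however severe, changes only the DC term of the block's DCT, leaving every AC coefficient untouched. A small sketch using SciPy's orthonormal DCT:

```python
import numpy as np
from scipy.fft import dctn

rng = np.random.default_rng(1)
block = rng.standard_normal((8, 8))

# An AC-coefficient watermark only sees the non-DC DCT coefficients,
# so a severe per-block DC offset leaves its features untouched.
before = dctn(block, norm="ortho")
after = dctn(block + 500.0, norm="ortho")   # extreme DC offset

ac_before = before.copy(); ac_before[0, 0] = 0.0
ac_after = after.copy();   ac_after[0, 0] = 0.0
assert np.allclose(ac_before, ac_after)     # AC coefficients unchanged
```

Any detector that reads only the AC coefficients therefore returns exactly the same answer for both images, even though the offset image is useless.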
Note that super-robustness is a property of the detector, not of the watermark embedder: ultimately the detector is responsible for admitting a false positive. While a certain feature space or embedding method may suggest a super-robustness attack, a detector does not have to blindly test an attacker's experimental images. This suggests some remedies for the examples of super-robustness described above: build a smarter detector, one that can recognize or rule out certain severe false positives.
3.2.2 Generic Super-robustness
Suppose we have a generic watermarking algorithm, whose detector follows this basic
framework:
1. Extract a vector of n features f = {f1, f2, · · ·, fn} from the image.
2. Compare this vector to a watermark vector w, for example by normalized correlation:
\[
\frac{f \cdot w}{\|f\| \, \|w\|} \gtrless \tau \in (0, 1)
\]
3. Report a successful detection if the correlation exceeds a chosen threshold.
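A minimal sketch of such a detector; because the statistic is invariant to any positive scaling of f, the detection region it defines is unbounded:

```python
import numpy as np

def nc_detect(f, w, tau):
    """Normalized-correlation detector: the detection region is the cone
    of angle arccos(tau) about the watermark vector w."""
    c = f @ w / (np.linalg.norm(f) * np.linalg.norm(w))
    return c > tau
```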
In the case of normalized correlation, the detection region is a cone with its tip at the origin, its axis equal to the watermark vector w, and its interior angle determined by the threshold τ. (21) Specifically, the cone angle is θ = cos⁻¹ τ.
Figure 3.2: A normalized correlation detection region with threshold τ .
A key fact about this and many other detection regions is that it is unbounded: if one can efficiently travel within it, then one can find images that set off the watermark detector yet are very distant from the original. This gives us a means of constructing severe false positives from a starting point within the detection region: gradually add noise to an image, as long as the detector continues to recognize the watermark.
3.3 MEASURING A DETECTION REGION
3.3.1 Noise Snakes
Our generic approach to generating false positives is to grow a “noise snake” using incremental uniform noise vectors. We do not know the watermark vector, and hence the direction in which we should be traveling; however, with an unbounded detection region we expect that an expanding noise vector will move outward into distant climes of the detection region, providing useful information about its shape and orientation.
Our noise snakes are constructed via the following algorithm:
1. Start with test image I, treated here as a vector.
2. Initialize our snake vector to J ← I.
3. Do for k = 1, 2, · · · K:
(a) Choose a vector X uniformly over the n-dimensional unit hypersphere Sn. This can be accomplished by constructing an n-dimensional vector of i.i.d. N(0, 1) Gaussian entries, and scaling the vector to unit length.
(b) Choose an appropriate scaling factor α, which for normalized correlation is
ideally proportional to the length of J.
(c) If J + αX still triggers the watermark detector, J ← J + αX.
(d) Else, discard X and leave J unchanged.
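The snake-growing loop can be sketched as follows; `detector` stands in for whatever oracle is available, and the step length is kept proportional to the current snake length as suggested for normalized correlation:

```python
import numpy as np

def noise_snake(I, detector, alpha, K, seed=0):
    """Grow a noise snake: repeatedly add scaled uniform direction
    vectors, keeping only steps the detector still accepts."""
    rng = np.random.default_rng(seed)
    J = I.astype(float).copy()
    for _ in range(K):
        X = rng.standard_normal(J.size)         # i.i.d. N(0,1) entries...
        X /= np.linalg.norm(X)                  # ...scaled to the unit sphere
        step = alpha * np.linalg.norm(J) * X    # step proportional to |J|
        if detector(J + step):                  # accept only inside the region
            J = J + step
    return J
```

Steps that fall outside the detection region are simply discarded, so the snake never leaves the region and by construction ends at a detectable point.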
We have observed that in high dimensions, a noise snake quickly converges to the
detection region boundary, and grows gradually outward. Thus a pair of these snakes,
constructed separately, can provide useful information about the detection surface.
3.3.2 The Growth Rate of a Noise Snake
When extending a noise snake by adding a uniform noise vector, we must wrestle with
two conflicting factors: large noise vectors are more likely to drop us out of the detection
region, but small noise vectors contribute little length per iteration. The optimal growth
factor α is the one that maximizes the expected increment ‖αX‖ · Pr[δ(J + αX) = 1].
In the specific case of snakes on a cone, the growth factor α is proportional to the length of the snake as it grows. This can be seen by a simple geometric
Figure 3.3: The construction of a noise snake.
Figure 3.4: A pair of constructed noise snakes, used to plumb the detection cone.
argument: the cone is congruent to scaled versions of itself. Thus if there is an optimal length α by which to extend a snake of length 1, then Mα is optimal for extending a snake of length M. We need only determine the appropriate α for a snake of unit length.
The good news is that theoretically, the growth rate is exponential in the number
of queries. The bad news is two-fold: first, the best growth factor depends on the
dimension and cone angle, which we do not know. Secondly, the best growth rate is
nevertheless slow: For feature sets in the thousands, the best α ranges from 0.16 to
0.06. For larger feature sets, growth is small. We find in our experiments that for
realistic feature sizes, a snake of useful length requires a number of queries roughly
proportional to the dimension n.
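The trade-off can be probed by Monte Carlo: place a unit snake on the boundary of a cone with angle π/3, as in figure 3.5, and measure the expected increment ‖αX‖ · Pr[δ = 1] for a few candidate values of α. The dimension and trial counts below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta = 1000, np.pi / 3
J = np.zeros(n)
J[0], J[1] = np.cos(theta), np.sin(theta)         # unit snake on the cone boundary

def expected_increment(alpha, trials=3000):
    X = rng.standard_normal((trials, n))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # uniform unit directions
    Jp = J + alpha * X                             # candidate extensions
    nc = Jp[:, 0] / np.linalg.norm(Jp, axis=1)     # NC against the axis e1
    return alpha * np.mean(nc >= np.cos(theta))    # ||alpha X|| * Pr[accept]

for a in (0.05, 0.1, 0.2, 0.4):
    print(a, expected_increment(a))
```

The acceptance probability collapses for large α, so the product peaks at a small step size, reproducing the shape of the curves in figure 3.5.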
3. REVERSING NORMALIZED CORRELATION BASED WATERMARK DETECTOR BY NOISE SNAKES
Figure 3.5: Optimal growth factors for a snake of unit length. On the left, a cone angle
of π/3 and n = 1000. On the right, π/3 and n = 9000.
3.3.3 Information Leakage from Snakes
We now explain how noise snakes can be used to estimate parameters of a watermark
detector. First, we establish a fact about uniform noise vectors.
Lemma 1. If W is chosen uniformly over the unit n-sphere S_n, and v is an arbitrary
vector, the probability Pr[W · v > cos θ] is

\[ \frac{S_{n-1}}{S_n} \int_0^{\theta} \sin^{n-2}\rho \, d\rho \]

...where S_n is the surface volume of S_n.
Proof 1. Since W is uniform, the probability of any subset of S_n is proportional to its
measure. Our subset of S_n is rotationally symmetric about v, so let us integrate over the
v axis: consider a point t ∈ [−1, 1] representing the v component of the hypersphere. For
each t we have a shell of radius r = √(1 − t²), contributing a total hyper-surface measure
of S_{n−1} r^{n−2} √(dt² + dr²). For example, a sphere in three dimensions is composed of circular
shells, and each circular shell has contribution 2πr√(dt² + dr²) = S₂ r¹ √(dt² + dr²).
The sphere portion with angle beneath θ is

\[
\begin{aligned}
\text{Area} &= \int_{r=0}^{\sin\theta} S_{n-1} r^{n-2} \sqrt{dt^2 + dr^2} \\
&= \int_{r=0}^{\sin\theta} S_{n-1} r^{n-2} \, dr \sqrt{1 + \frac{dt^2}{dr^2}} \\
&= \int_{r=0}^{\sin\theta} S_{n-1} r^{n-2} \, dr \sqrt{1 + \frac{r^2}{t^2}} \\
&= \int_{r=0}^{\sin\theta} S_{n-1} r^{n-2} \, \frac{dr}{t}
\end{aligned}
\]

...since 2t dt + 2r dr = 0 and t² + r² = 1. Substituting r = sin ρ, we get dr/t = dρ and

\[ \text{Area} = S_{n-1} \int_0^{\theta} \sin^{n-2}\rho \, d\rho \]

And we divide by S_n to get the probability of hitting that region.
The area of a unit hypersphere S_n is

\[
S_n = \begin{cases}
\dfrac{2^{(n+1)/2}\,\pi^{(n-1)/2}}{(n-2)!!} & \text{for } n \text{ odd} \\[2mm]
\dfrac{2\,\pi^{n/2}}{(\frac{n}{2}-1)!} & \text{for } n \text{ even}
\end{cases}
\]
The surface fraction C_n = S_{n−1}/S_n therefore has a closed form (22)

\[
C_n = \begin{cases}
\dfrac{1}{2}\,\dfrac{(n-2)!!}{(n-3)!!} & \text{for } n \text{ odd} \\[2mm]
\dfrac{1}{\pi}\,\dfrac{(n-2)!!}{(n-3)!!} & \text{for } n \text{ even}
\end{cases}
\]
It might surprise the reader to know that the surface volume of a unit hypersphere
actually decreases with n, for n > 7.
Altogether, this means that if θ < π/2, then the probability drops exponentially with
n. This is the "equatorial bulge" phenomenon in high dimensions: as the number of
dimensions n gets large, the angle between two uniformly chosen direction vectors will
be within ε of π/2 with high probability.
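A short simulation illustrates this concentration; the dimensions and sample count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# dot products of independent uniform directions concentrate near 0,
# i.e. the angle between them concentrates near pi/2
for n in (10, 100, 1000, 10000):
    U = rng.standard_normal((500, n))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    V = rng.standard_normal((500, n))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    dots = np.sum(U * V, axis=1)             # cosines of the pairwise angles
    print(n, np.std(dots))                   # shrinks roughly like 1/sqrt(n)
```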
Corollary 1. Two independently generated noise snakes have off-axis components S
and T whose angle is very close to π/2.
Proof 2. The density function of the set of noise snakes is rotationally symmetric about
the watermark axis. This is because each component, a uniform vector, has a rotationally
symmetric density, and because if s is a valid noise snake, so is Ts, where T is
any rotation holding the cone axis constant.
Because of this, the probability Pr[S ∈ F] = Pr[TS ∈ TF]. If we subtract w and then
normalize each snake, the symmetric distribution implies that W = (S − w)/‖S − w‖
is uniformly distributed over the unit (n − 1)-sphere.
Figure 3.6: Lemma: two independently generated snakes have approximately perpendicular off-axis components.
This observation gives us a simple means to estimate the cone angle from two
constructed noise snakes X and Y of equal length r. Using trigonometry, (√2 r sin θ)² =
r² + r² − 2r² cos φ, where φ is the angle between the snakes (see figure 3.7). Rearranging,
we get:

\[ \sin^2\theta = 1 - \cos\varphi = 1 - \frac{X \cdot Y}{r^2} \]
We can then estimate the cone angle and detector threshold by constructing two
snakes of sufficient length, and computing their dot product.
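As an illustrative sketch of the estimator, we can simulate two snakes directly on the boundary of a known cone (rather than growing them through an oracle) and recover the angle from their dot product; the dimension, length, and angle below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, theta = 5000, np.pi / 3
r = 40.0                                     # common snake length (illustrative)

def snake():
    # a snake of length r on the cone boundary: an axis component plus a
    # random off-axis direction, mimicking the rotational symmetry above
    u = rng.standard_normal(n)
    u[0] = 0.0
    u /= np.linalg.norm(u)                   # off-axis unit direction
    X = np.zeros(n)
    X[0] = np.cos(theta)
    return r * (X + np.sin(theta) * u)

X, Y = snake(), snake()
cos_phi = X @ Y / (np.linalg.norm(X) * np.linalg.norm(Y))
theta_hat = np.arcsin(np.sqrt(1.0 - cos_phi))    # sin^2(theta) = 1 - cos(phi)
print(theta_hat)                                 # close to pi/3
```

The estimate works because the two off-axis directions are nearly orthogonal, exactly as the corollary predicts.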
Figure 3.7: The relation between the cone angle and the dot product of two independently
generated noise snakes.
3.3.4 Extracting the Size of the Feature Space
Once we have a decent estimate for the cone angle, we can use another technique to
estimate the feature space size, a more useful piece of information. To accomplish this,
we use the detection oracle again to deduce the error rate under two different noise
power levels.
If we have a watermark vector w which falls squarely within the detection cone, and
add a uniform noise vector r, the probability of detection is

\[ \Pr[\delta = 1] = \frac{S_{n-1}}{S_n} \int_0^{\psi} \sin^{n-2} x \, dx \tag{3.1} \]

\[ \psi = \theta + \sin^{-1}\!\left( \frac{\|w\|}{\|r\|} \sin\theta \right) \tag{3.2} \]

where θ is the cone angle, which we estimate using the technique described earlier.
The second equation has one unknown, the watermark length ‖w‖. The first equation
has one unknown, n; the hit rate P_Y can be estimated by experiment.
If we then consider different detection rates P_Y for uniform noise vectors of length
A, and then for noise of length B, we can combine these equations into the following
identity:

\[ \tan\theta = \frac{A \sin\psi_A - B \sin\psi_B}{A \cos\psi_A - B \cos\psi_B} \]

...where ψ_A and ψ_B are the integration limits in equation (3.1). This removes the
unknown w, while n is implicit in the constants ψ_A and ψ_B. Here is our algorithm:
1. Pick two power levels A and B. They can be arbitrary, but make sure that the
error rate under those noise levels is reasonably estimable in a few hundred trials
(e.g., 0.5 rather than 0.0000005).
2. Use the watermark detector to estimate P_A and P_B, the detection rates under
uniform noise of length A and B, respectively.
3. For all suspected values of n:
(a) Use the hypothesized n and estimated detection rates in equation (3.1) to estimate
the parameters ψ_A and ψ_B. The integral does not have a simple closed
form, but the integration limit is easily determined by Newton's method.
(b) Compute ε_n ← tan θ − (A sin ψ_A − B sin ψ_B)/(A cos ψ_A − B cos ψ_B).
4. Choose the value of n that minimizes the error ε_n.
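The algorithm can be exercised in a closed loop: synthesize "measured" hit rates from a ground-truth dimension via equations (3.1) and (3.2), then scan candidate values of n. The parameters below are illustrative, and bisection stands in for the Newton iteration of step 3(a).

```python
import math

def log_cn(n):
    # log(S_{n-1}/S_n) = log Gamma(n/2) - log sqrt(pi) - log Gamma((n-1)/2)
    return math.lgamma(n / 2) - 0.5 * math.log(math.pi) - math.lgamma((n - 1) / 2)

def detect_rate(n, psi, steps=1500):
    # equation (3.1): Pr[delta=1] = (S_{n-1}/S_n) * integral_0^psi sin^{n-2} x dx
    c, dx = math.exp(log_cn(n)), psi / steps
    return c * dx * sum(math.sin((i + 0.5) * dx) ** (n - 2) for i in range(steps))

def invert_rate(n, p):
    # recover the integration limit psi from a measured rate p (bisection in
    # place of the Newton iteration of step 3a; the integrand is nonnegative,
    # so the rate is monotone in psi)
    lo, hi = 0.0, math.pi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if detect_rate(n, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta, w_len, n_true = math.pi / 3, 1.0, 500        # ground truth (illustrative)
A, B = 1.6, 1.9                                     # the two chosen noise levels
psiA = theta + math.asin(w_len / A * math.sin(theta))   # equation (3.2)
psiB = theta + math.asin(w_len / B * math.sin(theta))
PA, PB = detect_rate(n_true, psiA), detect_rate(n_true, psiB)  # "measured" rates

def err(n):
    # step 3b: residual between tan(theta) and the two-level identity
    pa, pb = invert_rate(n, PA), invert_rate(n, PB)
    return abs(math.tan(theta) - (A * math.sin(pa) - B * math.sin(pb))
                                / (A * math.cos(pa) - B * math.cos(pb)))

best = min(range(300, 701, 50), key=err)     # step 4: scan suspected values of n
print(best)                                  # recovers n_true = 500
```

At the true dimension the inversion reproduces ψ_A and ψ_B exactly, so the residual vanishes; neighboring candidates leave residuals orders of magnitude larger.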
3.4 RESULTS
We tested our techniques on a generic watermark detector with a varying number of
DCT feature coefficients. We used normalized correlation with a detection threshold
of 0.5. This requires some justification: a proper detector would probably have a much
higher threshold, since for τ = 0.5 the false alarm rate is unnecessarily low. However,
in our experience watermark detectors are often designed in an ad-hoc manner, and
designers are often prone to choosing round numbers, or in the case of thresholds, a
value squarely between 0 and 1. In fact, a higher threshold would increase the growth
rate of our noise snake.
We first generated noise snakes to deduce a detector threshold. Figure 3.8 shows
our estimates for a detector with τ = 0.5. This required an average of 1016 detector
queries per experiment, to generate two snakes. Figure 3.9 shows the corresponding
dimension estimates, once the threshold is deduced to be 0.5.
It is worth pointing out that in this detector, the asymptotic false alarm probability
is approximately 2.39 × 10⁻³³. This means that we can roughly estimate a very low
false alarm probability in only thousands of trials.
3.5 DISCUSSION AND CONCLUSIONS
3.5.1 Is it Useful?
Our experience reverse-engineering watermarks “by hand” shows us that only dozens of
experimental queries are needed by an expert to identify a watermark using a common
feature space, sub-band, and detector. In contrast, this technique is generic, and utilizes
no expert knowledge or a priori information about likely watermarks. The number of
queries is far larger.
We feel that techniques of this type can be combined with codified expert knowledge into an overall system that dismantles an unknown algorithm and estimates its
parameters far more efficiently.
Figure 3.8: Estimated threshold using two noise snakes. Ground truth is τ = 0.5, n = 500.
Figure 3.9: Estimated dimension using two noise snakes. Ground truth is n = 500.
Another question is whether such detector parameters are very useful. So what if
we know the rough number of features used by a watermark detector? The answer
is that this contains more information than one might initially think. For example,
the BOWS watermark used around 50000 AC DCT coefficients. (13) Consider the
significance of that number. In fact the exact value was 49152, the only genuinely
“round” hexadecimal number near 50000. If one is told that an image watermark uses
“around 50000” coefficients, 49152 is a very good guess for the actual value. It also
turns out that 49152 = 12 × 4096, and the BOWS test images contained 4096 8-by-8
blocks. Hence from even a rough estimate of the feature space size, one can make a
good guess that the algorithm is block-based, and modifies 12 coefficients per block.
This is, in fact, the algorithm used.
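The arithmetic behind this guess is easy to check:

```python
# 49152 is the only "round" hexadecimal number near 50000, and it factors
# as 12 coefficients per block times the 4096 8-by-8 blocks of a BOWS image
n = 49152
print(hex(n), n // 4096)
```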
3.5.2 Attack Prevention
As watermark designers, we might decide that enough is enough, and we have had it
with these snakes on this cone. What are we going to do about it? The most obvious
strategy is to avoid letting a detector be used as an oracle. Beyond that, preventing
super-robustness in a detector could prevent these attacks.
In our geometric analogy, this means that a detection region should be bounded.
Instead of a cone, for example, we can have a truncated cone by rejecting any image
whose feature vector is overlong. The problem with this approach is that it only
prevents super-robustness within the feature space: as long as some other information
is ignored—that is, as long as the feature space is a proper subspace of the image—we
can launch super-robustness attacks to deduce the feature space. When considering
non-feature and feature spaces combined, the detection region is no longer a cone, but
a hyper-cylinder with the cone as a base. This region remains unbounded.
Time will tell if these attacks can be prevented, or if super-robustness can be effectively
utilized to reverse-engineer watermarking algorithms. We feel that great gains
in efficiency can be made, and an overall system for automation of watermark reverse-engineering
can use super-robustness as an exploitable flaw, and use oracle attacks such
as those described here to deduce crucial unknown parameters of a data hiding system.
4 A More Precise Model for Watermark Detector Using Normalized Correlation
4.1 Introduction
As has been shown in previous work, with deliberately designed attacks, detection
statistics obtained from an oracle may shed light on certain parameters of a watermark
detector. One technique, the technique of noise calipers or snakes on a cone, was based
on a general strategy used to win the first phase of the first BOWS contest; (9, 13) the
strategy of locating modes of super-robustness within a detector was formalized into an
estimation technique of constructing “noise snakes” on the surface of a detection cone
by using the detector as an oracle. These results could be used to estimate the detection
region’s geometry. (20) In the subsequent BOWS2 contest, the designers proposed a
series of teeth-shaped structures along the watermark detection boundary specifically
designed to foil this type of reverse engineering technique. (23)
The simplified geometric model of watermark detection leads to a rather compact
analytical result which aids in reverse engineering useful information about the watermark
detector. (16, 17) However, it suffers from a simplistic and unrealistic model
of a watermarked image. We can imagine a cover image as a vector within a feature
space; a watermark vector is added or in some way embedded to push the cover vector
into a detection region. Thus we have two vectors, a cover and a watermark, whose sum
lies within a detection cone whilst the cover alone lies without. In the noise calipers
technique of Craver et al., a watermarked image is modeled as a point along a detection
cone's axis. This greatly simplifies the mathematics needed to estimate the probability
of proper detection when uniform noise is added to the watermarked image, because
a bubble of uniform noise about the image corresponds to a hypersphere intersecting
with a cone, a shape with rotational symmetry about the cone axis. (16, 17)
In reality, a watermarked image is offset from the cone axis: a point along the axis
represents the watermark alone, added to an empty cover image. Thus this simplistic
model is inaccurate, although it produced precise estimates of watermark detector parameters
for feature spaces of hundreds of dimensions. On the other hand, multimedia
watermark algorithms often have a very large watermark feature space, and this simplified
model is not precise enough in very high dimensions. Therefore, we have
inserted a cover vector into the model in hopes of producing more accurate predictions
at the cost of complexity.
In this chapter, we will first derive the analytical expression for the detection rate
of a watermarked work subject to Gaussian or uniform noise. Then we will show
that this model provides us with a practical way to disclose information about a watermark
detector with a large feature space dimension. Computing the measure of intersection
between an n-cone and an offset n-sphere is of such complexity that it requires numerical
integration.
4.2 Problem Setting
Consider a watermarking scheme whose watermark feature space has dimensionality
n, with normalized correlation being adopted as the measure of watermark detection.
The watermark detection boundary can be visualized as a hyper-cone in n-dimensional
Euclidean space.
If a watermark attacker adds in-domain Gaussian noise scaled to a magnitude R
to a watermarked image Iw, the set of possible noised images can be visualized as an
n-sphere¹ of radius R centered at the vector Iw. Note that when a Gaussian vector
is scaled to a fixed length, it is uniformly distributed over a hypersphere. The same
construction is used to visualize uniform noise added to a watermarked image.
The 3D projection of the overall picture is illustrated in figure 4.1.
Let A = (a₁, · · · , aₙ) be any vector in n-dimensional space and θ be the detection
threshold angle; the intersection of the n-sphere and the n-dimensional hyper-cone is defined by
¹ We follow the geometrician's notation here. Under topological notation, a sphere in n spatial
dimensions is considered an (n−1)-sphere, because it is locally homeomorphic to an (n−1)-dimensional
hyperplane.
Figure 4.1: A detection cone intersects with a Gaussian noise ball.
the following two equations:

\[ \frac{A \cdot w}{\|A\| \cdot \|w\|} = \cos\theta \tag{4.1} \]

\[ \|A - I_w\| = R \tag{4.2} \]

4.3 Quartic Equation
For simplicity and without loss of generality, we can choose our feature space coordinate
system so that the watermark cone axis is the unit vector e1 , and the cover image is
approximately a multiple of e2 . Lack of perfect orthogonality means that some of the
energy of the cover image I lies along e1 . We can choose our coordinate system so that
all energy from image and mark are confined to these two vectors:
\[ w = (1, \underbrace{0, \cdots, 0}_{n-1}) \]

\[ I_w = (c_1, c_2, \underbrace{0, \cdots, 0}_{n-2}) \]
From equation (4.1), we have:

\[ a_1^2 = \cos^2\theta \cdot \sum_{i=1}^{n} a_i^2 \tag{4.3} \]

which can be expanded as:

\[ -\sin^2\theta \cdot a_1^2 + \cos^2\theta \cdot a_2^2 + \cos^2\theta \cdot \sum_{i=3}^{n} a_i^2 = 0 \tag{4.4} \]

From equation (4.2), we have:

\[ (a_1 - c_1)^2 + (a_2 - c_2)^2 + \sum_{i=3}^{n} a_i^2 - R^2 = 0 \tag{4.5} \]

Notice that both equation (4.4) and (4.5) share the common term \(\sum_{i=3}^{n} a_i^2\); we then let \(h^2 = \sum_{i=3}^{n} a_i^2\).

From equation (4.4), we have:

\[ a_1 = \cot\theta \sqrt{a_2^2 + h^2} \tag{4.6} \]

Substituting equation (4.6) into equation (4.5), we have:

\[ \cot^2\theta\,(a_2^2 + h^2) + c_1^2 + (a_2 - c_2)^2 + h^2 - R^2 = 2 c_1 \sqrt{a_2^2 + h^2} \cdot \cot\theta \tag{4.7} \]

Squaring both sides and letting \(g = \csc^2\theta\, h^2 + c_1^2 + c_2^2 - R^2\),

\[ (\csc^2\theta\, a_2^2 - 2 c_2 a_2 + g)^2 = 4 c_1^2 (a_2^2 + h^2) \cot^2\theta \tag{4.8} \]

Rearranging gives us a quartic equation with respect to \(a_2\),

\[ A a_2^4 + B a_2^3 + C a_2^2 + D a_2 + E = 0 \tag{4.9} \]

\[
\begin{aligned}
A &= \csc^4\theta \\
B &= -4 \csc^2\theta\, c_2 \\
C &= 2 g \csc^2\theta - 4 \cot^2\theta\, c_1^2 + 4 c_2^2 \\
D &= -4 g c_2 \\
E &= g^2 - 4 h^2 \cot^2\theta\, c_1^2
\end{aligned}
\]
There will be four roots of this quartic equation and only two of them are valid;
these correspond to the components of the two intersection points p₁ and p₂ along
a slice of height h parallel to e₂. For any possible h, after the coordinates (a₁, a₂)
of p₁ and p₂ are determined, we are able to calculate the detection rate within the
projection plane defined by the fixed h. The overall detection rate can then
be calculated using the following double integral,
\[
\begin{aligned}
\Pr[\text{YES}] &= \frac{S_{n-2}}{S_n} \int_{-h_{\sup}}^{h_{\sup}} \frac{h^{n-3}}{R^{n-1}} \int_{p_1}^{p_2} \Pr[a, h]\, da\, dh \\
&= \frac{S_{n-2}}{S_n}\, 2 \int_{0}^{h_{\sup}} \frac{h^{n-3}}{R^{n-1}} \int_{p_1}^{p_2} \Pr[a, h]\, da\, dh
\end{aligned} \tag{4.10}
\]

where h_sup is the supremum of h given a fixed noise amplitude R and a detection
angle θ, i.e. where the two intersection points p₁ and p₂ merge into one. S_n is the
hyper-surface area of an n-sphere of unit radius, and we have:

\[ \frac{S_{n-2}}{S_n} = \frac{n-2}{2\pi} \]

4.4 Numerical Solutions
Due to its inherent complexity, the analytic solutions of equation (4.9) are not obtainable even with the aid of a computer, so we have to resort to numerical analysis instead.
4.4.1 Identify Valid Roots
In order to pick out the two valid roots of equation (4.9), we first calculate a₁ from a₂
by equation (4.6); then all four pairs (a₁, a₂) are subject to two verification criteria.
Since equation (4.6), which is derived from equation (4.1), was just used, we use equation
(4.5), which is derived from equation (4.2), as the first criterion; the second criterion is
that both a₁ and a₂ must be real.
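This filtering step can be sketched numerically; the values of θ, Iw, R, and h below are illustrative, not those of any tested detector.

```python
import numpy as np

theta = 1.2                      # cone angle (illustrative)
c1, c2 = 2.0, 5.0                # watermarked image Iw = (c1, c2, 0, ..., 0)
R, h = 3.0, 1.0                  # noise amplitude and slice height

csc2 = 1.0 / np.sin(theta) ** 2
cot2 = 1.0 / np.tan(theta) ** 2
g = csc2 * h ** 2 + c1 ** 2 + c2 ** 2 - R ** 2

# coefficients A..E of the quartic (4.9) in a2
coeffs = [csc2 ** 2,
          -4.0 * csc2 * c2,
          2.0 * g * csc2 - 4.0 * cot2 * c1 ** 2 + 4.0 * c2 ** 2,
          -4.0 * g * c2,
          g ** 2 - 4.0 * h ** 2 * cot2 * c1 ** 2]

valid = []
for root in np.roots(coeffs):
    if abs(root.imag) > 1e-9:    # criterion 2: a2 (hence a1) must be real
        continue
    a2 = root.real
    a1 = np.sqrt(a2 ** 2 + h ** 2) / np.tan(theta)    # equation (4.6)
    # criterion 1: the pair must also satisfy the sphere equation (4.5),
    # which weeds out the spurious roots introduced by squaring (4.7)
    if abs((a1 - c1) ** 2 + (a2 - c2) ** 2 + h ** 2 - R ** 2) < 1e-6:
        valid.append((a1, a2))

print(valid)                     # the two intersection points p1 and p2
```

The quartic is solved via np.roots (a companion-matrix eigenvalue computation); the spurious roots created by squaring fail the sphere-equation residual by a wide margin, so the tolerance is not delicate.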
4.4.2 Detection Rate
Our numerical analysis has three known inputs: the normalized watermark vector w,
the watermarked image Iw, and the detection cone angle θ. For a given attacking
strength R, the numerical version of formula (4.10) is:
\[
\begin{aligned}
\Pr[\text{YES}] &= \frac{S_{n-2}}{S_n}\, 2 \sum_{h=0}^{h_{\sup}} \frac{h^{n-3}\, \alpha(h) \sqrt{R^2 - h^2}}{R^{n-1}} \\
&= \sum_{h=0}^{h_{\sup}} \frac{n-2}{\pi R}\, \alpha(h) \left(\frac{h}{R}\right)^{n-3} \sqrt{1 - \frac{h^2}{R^2}}
\end{aligned} \tag{4.11}
\]
In formula (4.11), we perform a summation along h. For a fixed h, a hyperplane
cuts both the detection n-cone and the noise n-sphere. Within this 2D
cutting plane, a circle intersects a hyperbola, which in general gives two intersection
points p₁ and p₂. The angle α(h) represents the portion of the arc between
p₁ and p₂ which is within the detection cone. Since the range of the arccos function lies
within [0, π], in order to handle α(h) larger than π we introduce a third point p₃
at (c₁ + √(R² − h²), c₂) and divide α(h) into two angles, ∠1 formed by p₁ and p₃,
and ∠2 formed by p₂ and p₃. ∠1 and ∠2 can be calculated from the coordinates of
the three points using the law of cosines. Note that there are three different geometrical
relationships, as illustrated in figure 4.2, and α(h) should be calculated accordingly.
Figure 4.2: Three possible geometrical relationships. The arc in solid line is within the
detection cone.
The hardest part in formula (4.11) is the term (h/R)^(n−3). The calculation suffers
from a phenomenon we call 'tail bulge', which is due to the very large exponent
n − 3.¹ It means almost all contribution to the whole detection rate lies at the very
tail part of h, where h gets really close to h_sup. Even with the ability of arbitrary-precision
computation, obtaining the final detection rate within a reasonable precision
is not an easy task.
¹ It evokes analogy with the term 'equatorial bulge' in hyper-dimensional geometry.
However, certain watermark detector parameters can still be disclosed in other ways
even if the detection rate is not obtainable. In fact, we found that the arc
angle α(h) exhibits a characteristic behavior near the critical detection rate,
i.e. when the detection rate gets around 50%, which can be used as a good indicator
to reveal the structure of the watermark detector.
4.4.3 The Pattern of α(h)
In order to test our model, we implemented a generic 8-by-8 block based AC DCT
watermark embedding algorithm similar to the one used in the BOWS
contest, and then applied two different sets of parameters for testing. In the first group
of tests, we use the five AC DCT coefficients which are marked with numbers
in figure 4.3 for watermark embedding. In the second group of tests, we use the twelve
coefficients which are marked with gray shade in figure 4.3.
Figure 4.3: Left: AC coefficients used for embedding. Middle: Image attacked by noise
with amplitude R = 10000. Right: Image attacked by noise with amplitude R = 15000.
4.4.3.1 Five AC DCT Embedding
In our first group of tests, the standard 256-by-256 grayscale testing image ‘cameraman’
is used as cover image I. A normalized watermark signal w is generated at random.
8-by-8 DCT transformation of the cover I is performed, and an appropriately magnified
version of w is embedded into the five AC DCT coefficients of each block to obtain the
watermarked image Iw . Hence, the feature space dimension n is 5120 in this test. The
numerical value of the vector Iw is:

\[ I_w \doteq (92.8,\ 5499.8,\ \underbrace{0, \cdots, 0}_{5118}) \]
Figure 4.4: Detection threshold is 0.011. Left: the arc angle α(h) predicted by our model,
for R = 5000, 6000, 7000, 8000. Right: the detection rate obtained by the Monte Carlo method.
We first set the detection threshold cos θ to 0.011, and then check whether the numerical
results predicted by our model agree with the data collected by the Monte Carlo method
using a real image and real random Gaussian noise of amplitude R. Our test results are
illustrated in figure 4.4. The left plot in figure 4.4 shows four separate curves of the
arc angle α(h), each under a different attacking strength R. There is an obvious
sharp transition of the curve shape between the curve for R = 6000 and
the curve for R = 7000. This phenomenon is echoed by the right plot, in which the 50%
detection rate also lies between 6000 and 7000.
According to the structure of formula (4.11), α(h) can be viewed as a weighting
factor of (h/R)^(n−3), and we know that virtually all contribution to the detection rate
lies at the very tail part of h. Therefore, whether the curve of α(h) points upwards
or downwards at its tail, i.e. is larger or smaller than π, suggests whether the
detection rate is greater or less than 50%.
Likewise, we obtained another pair of curves, illustrated in figure 4.5,
for the case of cos θ = 0.008. Note that this time our model predicts the critical
condition would occur within (10000, 11000), while the 50% detection rate appears
Figure 4.5: Detection threshold is 0.008. Left: the arc angle α(h) predicted by our model,
for R = 9000, 10000, 11000, 12000. Right: the detection rate obtained by the Monte Carlo method.
within (9000, 10000) in our Monte Carlo experiment. While there could be other causes,
we suspect this discrepancy is mainly due to the image clipping effect: under severe
attacks like R = 10000, many pixel values will exceed the range [0, 255] and are
subject to clipping before they can be saved as an attacked image. For
example, attacks like the one shown in the middle of figure 4.3 with R = 10000 will
overflow around 9% of the image pixels.
This clipping process breaks the perfect spherical shape of the noise hypersphere,
and it is not considered in our model. Since we chose a coordinate system in which the
watermarked image Iw only has nonzero components along e₁ and e₂, we applied
a rotation transformation whose orientation is determined by the watermarked image
Iw as well as the random watermark w. Hence, we should expect this discrepancy to
be persistent under the same test settings, i.e. the same Iw and w.
In our experiments, we found that the discrepancy only occurs when a relatively
small detection threshold is chosen, or equivalently, when the critical condition occurs at
a relatively large R. Fortunately, many practical watermarking algorithms choose, and in
fact should choose, a relatively large detection threshold for the sake of their own security.
We have already demonstrated in our previous paper how the super-robustness of a
watermark algorithm can actually compromise its security against human attackers. (20)
In all, the curve of α(h) provided by our model matches the detection curve
obtained from Monte Carlo experiments when R is not too large. When R gets very
large, the critical range of R predicted by our model is consistently one step larger than
the results from the Monte Carlo experiments.
4.4.3.2 Twelve AC DCT Embedding
In our second group of tests, we choose the 256-by-256 grayscale testing image 'lena'
as cover image I. A normalized watermark w is generated at random. Then we embed
the mark into twelve AC DCT coefficients of each block to obtain the watermarked image
Iw. This time, the feature space dimension n is 12288, and the numerical value of Iw
is:

\[ I_w \doteq (162.5,\ 5217.3,\ \underbrace{0, \cdots, 0}_{12286}) \]
Figure 4.6: Detection threshold is 0.015. Left: the arc angle α(h) predicted by our model,
for R = 8000, 9000, 10000, 11000. Right: the detection rate obtained by Monte Carlo experiment.
Once again, as shown in figure 4.6, our model's prediction agrees with the results
from the Monte Carlo experiment. On the other hand, when the detection threshold gets
too small, the consistent discrepancy reappears, as shown in figure 4.7.
The attacked image on the right of figure 4.3 is attacked by Gaussian noise with
R = 15000. The average overflow rate for this strength of attack is marginally over 9%
under this group of settings.
Figure 4.7: Detection threshold is 0.01. Left: the arc angle α(h) predicted by our model,
for R = 14000, 15000, 16000, 17000. Right: the detection rate obtained by Monte Carlo experiment.
4.5 Conclusions
In this chapter, we proposed a more precise model for watermark detectors using normalized
correlation, which is a popular choice in practice. The model boils down to a
quartic equation, which is inherently complex due to the broken symmetry
compared with our former model. Since its analytical solutions are not obtainable
even with the aid of a computer, we turned to solving it numerically.
Due to the quirky 'tail bulge' phenomenon encountered when evaluating the detection
rate, the overall detection rate is infeasible to compute. However, also due to the
'tail bulge' effect, information about the watermark detector can be disclosed by checking
the pattern of the arc angle curve, i.e. by catching the transition in the shape of the
α(h) curve as R varies.
We implemented a generic block based AC DCT embedding algorithm and used two
different sets of parameters to verify our model. Our model provides a good match
with the detection results obtained from Monte Carlo experiments when R
is not too large, for feature space dimensions n of 5120 and 12288.
Hence it proved reliable under large feature space dimensions, and could be used
as a practical way to reverse engineer parameters of the target watermark detector.
On the other hand, our model shows a consistent discrepancy when the attacking noise
amplitude R gets too large, and we suspect this is mainly due to the clipping effect
under large R. Fortunately, this issue is largely relieved by the principle that a secure
watermark algorithm should not have a really small detection threshold; in other
words, super-robustness has been shown to be a security weakness of a watermark
algorithm.
5 Circumventing the Square Root Law in Steganography by Subset Selection
5.1 The Square-root Law in Steganography
Cachin introduced a security measure for steganographic systems using the Kullback-Leibler
divergence D_KL(P^(n) || Q^(n)), where P^(n) and Q^(n) denote the distributions of
the cover data and the stego data respectively. (24)
Under this security measure, a steganographic system is called perfectly secure
if D_KL(P^(n) || Q^(n)) = 0, and is called not perfectly secure if D_KL(P^(n) || Q^(n)) > 0.
Likewise, an ε-secure steganographic system is one for which

\[ D_{KL}(P^{(n)} \,\|\, Q^{(n)}) = \epsilon, \quad \epsilon > 0 \tag{5.1} \]
For a not perfectly secure steganographic system as defined above, if the following
three conditions are satisfied, the Square Root Law states: "The secure
payload grows according to the square root of the cover size." (25, 26, 27)
• The steganographic cover can be modeled as a stationary Markov chain.
• The embedding operation can be modeled as independent substitutions of one
state for another.
• The embedding scheme does not preserve all statistical properties of the cover.
Note that for a perfectly secure steganographic system, the embedding rate grows
linearly with the cover size (a positive capacity), while an imperfectly secure steganographic
system has zero capacity. In practice, for complex objects like digital media, it
is impractical (and may never be possible) for either the warden or the steganographer to
have perfect knowledge of the cover source. Therefore, the square root law applies well
to practical steganographic systems, which are all not perfectly secure, and
its secure payload result essentially refines the established zero-capacity result for such
systems. (26, 27)
Also note that the secure payload in this law refers to the number of embedding
changes, and hence may differ from the amount of steganographic information being conveyed.
By using adaptive source coding methods, the square root law implies an
asymptotic information entropy of O(√n log n).
5.2 Circumventing the Law by Subset Selection
5.2.1 Introduction
While the square root law concerns choosing a cover subset small enough to evade
detection, we work on the problem of finding the largest cover subset that possesses a desired
distribution. That is, a substring whose type distribution is close to a fixed point of
the embedding process; in other words, a distribution that is unchanged by the
process that alters the data.
It is worth noting that our scheme does not contradict the square root law, because
our subset selection procedure depends on observation of the cover, which deviates
from the 'independence condition' of the square root law.
5.2.2 The i.i.d. Case
For a stream of i.i.d. random variables $\{X_k\}_{k=1}^{n}$ over a finite outcome set $C = \{C_i\}_{i=1}^{N}$, the histogram is equivalent to the pmf (probability mass function) of a single random variable, which is denoted by $f_X$. The task of finding a subset of coefficients that has a histogram of $f$ can therefore be translated to: find a scale factor $\alpha$ so that for all symbols in $C$,

$\alpha \max_{C}\left[\frac{f(C_i)}{f_X(C_i)}\right] \le 1$  (5.2)

The number of embeddable coefficients is maximized when equality is achieved in the above inequality:

$\alpha_u = \min_{C}\left[\frac{f_X(C_i)}{f(C_i)}\right]$  (5.3)
Also, it is easy to prove that

$0 \le \alpha_u \le 1$  (5.4)

The left side of the inequality follows from the fact that both $f$ and $f_X$ are pmfs and therefore non-negative.

Moreover, letting $C_f$ denote the set of outcomes where $f > 0$ and $C_{f_X}$ the set of outcomes where $f_X > 0$, we have

$0 < \alpha_u \le 1 \quad \text{if } C_f \subseteq C_{f_X}$  (5.5)
The right side of the inequality in (5.4) can be proved by contradiction as follows. Suppose $\alpha_u > 1$; then

$1 = \sum_{C} f_X(C_i) \ge \sum_{C} \alpha_u f(C_i) = \alpha_u > 1,$

which is a contradiction. Therefore $\alpha_u \le 1$, with equality if and only if $f = f_X$.

Consequently, the maximal number of embeddable coefficients is

$n \sum_{C} \alpha_u f(C_i) = \alpha_u n$  (5.6)
Therefore, if the condition in (5.5) is satisfied, the embedding payload is linearly proportional to the total number of coefficients $n$ in the i.i.d. case. Meanwhile, it is worth noting that what the embedding function $g$ really does on this subset of data is no more than a shuffling operation; hence the histogram is preserved.
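The i.i.d. selection above can be sketched in a few lines. The following Python sketch (function and variable names are ours, not the dissertation's) computes $\alpha_u$ via (5.3) and then selects $\alpha_u n f(C_i)$ positions of each symbol:

```python
import random
from collections import Counter

def max_embeddable_subset(cover, f, seed=0):
    """Compute alpha_u (Eq. 5.3) and pick the largest subset of cover
    positions whose histogram follows the target pmf f."""
    n = len(cover)
    counts = Counter(cover)
    fX = {c: counts[c] / n for c in counts}          # empirical pmf of the cover
    # Condition (5.5): the support of f must lie inside the support of fX.
    assert all(c in fX for c in f if f[c] > 0), "support(f) must be within support(fX)"
    alpha_u = min(fX[c] / f[c] for c in f if f[c] > 0)   # Eq. (5.3)
    rng = random.Random(seed)
    subset = []
    for c, p in f.items():
        if p == 0:
            continue
        positions = [i for i, x in enumerate(cover) if x == c]
        rng.shuffle(positions)
        subset.extend(positions[: int(alpha_u * n * p)])  # alpha_u * n * f(Ci) slots
    return alpha_u, sorted(subset)
```

On a cover with histogram (0.6, 0.4) and a uniform target $f$, this yields $\alpha_u = 0.8$, so 80% of the coefficients are embeddable.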
5.2.3 The First Order Markov Case
Let $A$ denote the transition matrix with elements $a_{ij}$, $i, j = 1, \dots, N$, where

$a_{ij} = \Pr\{X_k = C_j \mid X_{k-1} = C_i\}.$

We assume $a_{ij} > 0$ for all $(i, j)$, so that $A$ is irreducible. Let $h_C$ denote the histogram of the cover; it no longer equals $f_X$, the pmf of a single random variable $X_k$. Following the same procedure as in the i.i.d. case, we can choose a subset of coefficients that follows $f$ 'under' the histogram $h_C$. Those coefficients are randomly distributed along the cover coefficient stream. However, the same shuffling operation no longer works, because $a_{ij}$ now links adjacent coefficients together.
On the other hand, even though the coefficients are linked together, we can still perform shuffling if we cut the cover coefficient stream into chunks in a non-overlapping manner. The requirement for a possible shuffle is that two candidate chunks must have the same header and tail coefficients; anything in between does not matter, because we are dealing with a first-order Markov process.
Moreover, chunks three coefficients long are what we are really looking for, since the truly embeddable coefficients are those between the header and the tail.
5.2.3.1 The Algorithm
The probability of a triple with header $C_i$ and tail $C_k$ is

$\Pr\{(C_i, C_*, C_k)\} = \Pr(C_i)\sum_{j} a_{ij}a_{jk}$  (5.7)

Also, we have

$\sum_{i,k} \Pr\{(C_i, C_*, C_k)\} = 1$  (5.8)
The first group of embeddable coefficients is chosen according to the following algorithm:

1. Choose $(i, k)$ with maximal $\Pr\{(C_i, C_*, C_k)\}$ according to (5.7) and name them $(H, T)$.
2. Draw circles along the cover stream wherever there is a triple match $(C_H, C_*, C_T)$, where $C_*$ denotes the wildcard symbol.
3. Pick out as many non-overlapping circles as possible.

Figure 5.1 illustrates the second and third steps. Each point on the line denotes a coefficient in the cover stream.

The worst overlapping case is illustrated in Figure 5.2, where only a fraction of the circles can be picked out in a non-overlapping manner. In this worst case, at most 1/2 of the circles can be picked out when $C_H \ne C_T$, and at most 1/3 when $C_H = C_T$.
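Steps 2 and 3 can be realized with a single greedy left-to-right scan. The sketch below (our own illustrative code, not the dissertation's) circles a matching triple and then jumps past it, so selected circles never overlap:

```python
def select_triples(cover, H, T):
    """Greedy left-to-right scan for non-overlapping triples (H, *, T);
    the middle symbol of each selected triple is embeddable."""
    selected = []            # start indices of the chosen circles
    i = 0
    while i + 2 < len(cover):
        if cover[i] == H and cover[i + 2] == T:
            selected.append(i)   # circle positions i, i+1, i+2
            i += 3               # jump past the circle: no overlap possible
        else:
            i += 1
    return selected
```

Jumping three positions after each match guarantees the chosen circles are disjoint, and a single pass over the stream suffices.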
Figure 5.1: The second and third steps of our algorithm.

Figure 5.2: The worst overlapping case, shown both when $C_H \ne C_T$ and when $C_H = C_T$.
The number of embeddable coefficients equals the number of non-overlapping circles on the cover stream. In the worst case, the maximal number of embeddable coefficients in the first group is

$n_{lb1} = \frac{1}{3}\cdot\frac{n}{3}\Pr\{(C_H, C_*, C_T)\} = \frac{n}{9}\Pr(C_H)\sum_j a_{Hj}a_{jT} \quad \text{if } C_H = C_T$  (5.9)

$n_{lb1} = \frac{1}{2}\cdot\frac{n}{3}\Pr\{(C_H, C_*, C_T)\} = \frac{n}{6}\Pr(C_H)\sum_j a_{Hj}a_{jT} \quad \text{if } C_H \ne C_T$  (5.10)
As we can see, the number of embeddable coefficients given by (5.9) and (5.10) is linear in $n$, even in the worst case.
Likewise, we can pick out a second group of embeddable coefficients by first choosing the header and tail that give the second largest $\Pr\{(C_i, C_*, C_k)\}$ and then drawing non-overlapping circles in the remaining white space along the cover stream. Repeating this method, we can pick out a third and further groups of embeddable coefficients. Note that during embedding, the shuffling operation is always confined within a single group.
The total number of embeddable coefficients is the summation over all groups. Since the number of embeddable coefficients in the first group is already linear in $n$ even in the worst case, we have proved that, by using the technique of subset selection, a linear embedding rate can be achieved in the first-order Markov case.
5.3 Conclusion
The subset selection technique is a theoretical trial of our embedding philosophy: create a stego-friendly cover before embedding. Since no distortion limit is considered in our analysis, in the big picture it is a special case of the more general result established in (28).¹

¹ We thank Professor Fridrich for pointing this out.
6 Covert Communication via Cultural Engineering

6.1 Introduction
The concept of supraliminal channels was introduced in 1998, as a steganographic
primitive used to achieve public key exchange in the presence of an active warden.(29)
Alice and Bob must communicate in the presence of Wendy, a warden; they have
no shared keys in advance, and must enact a key exchange to use a steganographic
algorithm. However, any overt key exchange is evidence of secret communication,
and any steganographic bits embedded with a known key can be damaged by Wendy.
Supraliminal channels allow robust and innocuous transmission of public key data in
this environment.
In this chapter, we describe how a new feature added to the video chat capabilities
of iChat for the Apple Macintosh allows supraliminal content to be attached to a
videoconferencing session. By implementing animation primitives that covertly accept
ciphertext from an external application, and then fold this ciphertext into the evolution
of a computer animation, ciphertext can be transmitted within the semantic content
of the video.
This chapter is organized as follows: in section 6.2, we review supraliminal channels
and outline an efficient protocol to achieve a supraliminal channel from a steganographic
channel and an image hash. In section 6.3 we explain our overall philosophy of data
hiding and how it differs from the traditional approach of imperceptibly modifying
cover data. In section 6.4 we describe our software architecture, and in section 6.5 the
results of our supraliminal channel.
6.2 Supraliminal Channels
Unlike a steganographic channel, in which a multimedia object is undetectably modified, a supraliminal channel chooses or controls the semantic content of the cover data itself. This can be achieved by a cover data generation program, or mimic function, which transforms an input string reversibly into a convincing piece of cover data. (30) However, we will show that overt content generation is not strictly necessary to construct a supraliminal channel. Supraliminal channels are not keyed, and are readable by the warden. The main purpose of such a channel is ensuring total robustness of the message in the presence of an active warden, even if the embedding and detection algorithm is known. This follows from the assumption that the warden is allowed to add noise, but not to make any modifications that affect the semantic meaning of the cover data being sent.
If Alice and Bob use such a channel to send a meaningful message, it will be readable by Wendy; however, they can send strings which resemble channel noise, so that Wendy cannot tell if an image is purposely generated. This is sufficient to transmit key data in order to accomplish a key exchange, after which the shared key can be used in a traditional steganographic channel. The combination of a key exchange using subliminal and supraliminal channels achieves so-called pure steganography, that is, secure steganography without prior exchange of secret data. (29)
6.2.1 A two-image protocol for supraliminal encoding
If one has a very low bitrate cover data hash function, it can be extended into a higher
bitrate supraliminal channel by bootstrapping a keyed steganographic algorithm and
extending the transmission over multiple steps. Suppose that Alice and Bob have access
to a cover data hash algorithm $h : I \to \{0, 1\}^n$. As a toy example, a recipient can choose
a single word that best describes an image I, or other descriptive text suitably derived
from I, and let h(I) be the SHA-256 hash of that text.
This alone is not suitable for a supraliminal channel, firstly because the bit rate
is extremely low, and secondly because the hash cannot be inverted into a generation
function. However, Alice can perform the following:
1. Alice chooses two natural cover images I and J.

2. Alice embeds a message m in J, using a robust keyed steganographic algorithm with the hash of I as the key: Ĵ = Embed(m, J, key = h(I)).

3. Alice sends Ĵ to Bob.

4. Bob confirms receipt of Ĵ.

5. Alice sends I.

6. Bob (and Wendy) can extract the message m = Detect(Ĵ, h(I)).
This two-image protocol is illustrated in figure 6.1. Note that the message is public;
the purpose of the channel is robust transmission without requiring a known key in
advance, essentially robust publication. This approach to public-key steganography
was introduced by Anderson, Petitcolas and Kuhn, requiring a keyed steganographic
transmission followed by a broadcast of the key after the data is known to be securely
received. (31) The authors point out that a broadcast of a key is suspicious, but can
be plausibly explained in certain scenarios; this protocol exhibits such a scenario, by
replacing the broadcast with the transmission of an arbitrary message.
Figure 6.1: The two-image protocol for establishing a supraliminal channel from an image
hash. Alice and Bob’s transmissions are each split across two images, with the semantic
content of the second keying the steganographic content of the first.
This approach to creating a supraliminal channel has several desirable advantages.
First, it can use any cover data hash algorithm, which need not be invertible. Second,
it does not require any algorithm to generate content, which is difficult. Instead, Alice
chooses a key by arbitrary choice of natural content.
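A minimal sketch of the two-image protocol follows, with a toy XOR 'embedder' standing in for a robust keyed steganographic algorithm (all function names, and the image description "lighthouse", are ours for illustration):

```python
import hashlib

def h(description: str) -> bytes:
    """Toy cover-data hash: SHA-256 of a word describing image I."""
    return hashlib.sha256(description.encode()).digest()

def embed(m: bytes, key: bytes) -> bytes:
    """Stand-in for a robust keyed stego embedder: XOR m with a
    key-derived stream (a real system would hide m inside image J)."""
    stream = b""
    counter = 0
    while len(stream) < len(m):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(m, stream))

detect = embed   # XOR embedding is its own inverse

# Alice keys the embedding with the hash of a second image I:
key = h("lighthouse")
j_hat = embed(b"public key bits", key)   # J-hat, sent first
# ...after Bob confirms receipt, Alice sends I; Bob recomputes h(I):
assert detect(j_hat, key) == b"public key bits"
```

The point of the sketch is the ordering: Ĵ is useless to Wendy until I arrives, and by then Ĵ has already been accepted as innocent.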
6.2.2 Public key encoding
As mentioned above, the message m must be as statistically meaningless as channel
noise—that is, it must resemble channel noise that would be found if Wendy attempted
to decode an innocent message. In the sequel, we will show a channel where a computer
animation, originally driven by a pseudo-random number generator, is instead driven
by message bits. This means that if the message bits are computationally indistinguishable from the PRNG output, they can be sent over the channel. This goal is trivially achievable
if the PRNG is, say, a counter-mode AES keystream.
For this reason, we consider the problem of packaging key data to resemble n-bit
strings of i.i.d. fair coin flips. Public keys are randomly generated, so it may seem that
they can be convincingly disguised as random strings; however, keys have structure, as
well as unlikely mathematical properties. An RSA key’s modulus is not any random
number, but one that just happens to be difficult to factor—a relative rarity among
random strings.
To achieve random packaging of a public key, we use the Diffie-Hellman key exchange with a public prime $p$ for which the discrete logarithm function is hard, and a generator $g$ for $\mathbb{Z}/p\mathbb{Z}$. Alice must transmit a single value $A = g^a \pmod p$ for a secret $a$, and Bob transmits the value $B = g^b \pmod p$. Thereafter their session key is $B^a = g^{ab} = A^b$.

The value of $A$ is uniformly chosen over $p - 1$ values, and must be shaped so that it is uniform over all $n$-bit strings, where $n = 1 + \lfloor \log_2 p \rfloor$. The algorithm to do so is simple:

1. Set $n = 1 + \lfloor \log_2 p \rfloor$;
2. Generate a uniform $n$-bit number $a$;
3. If $0 < a \le p - 1$, set $A = g^a \pmod p$;
4. Else if $a \ge p$: repeat { choose $b$ uniformly among $\{1, 2, \dots, p - 1\}$; set $B = g^b \pmod p$ and $A = p + B$ } until $A < 2^n$;
5. Else set $A = 0$. (Key exchange therefore fails with probability $2^{-n}$, a virtual impossibility, but this case is necessary to make the distribution truly uniform.)

This algorithm encodes a number of the form $g^a \pmod p$ such that any $n$-bit string is equally likely to be observed.
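The shaping algorithm translates directly into code. The following Python sketch (the function name is ours; the standard `secrets` module supplies uniform randomness) follows the five steps above:

```python
import secrets

def encode_dh_value(p: int, g: int) -> int:
    """Shape a Diffie-Hellman public value A so that it is spread over
    all n-bit strings, n = 1 + floor(log2 p)."""
    n = p.bit_length()                 # step 1: n = 1 + floor(log2 p)
    a = secrets.randbits(n)            # step 2: uniform n-bit number
    if 0 < a <= p - 1:                 # step 3
        return pow(g, a, p)
    if a >= p:                         # step 4: rejection-sample A = p + B < 2**n
        while True:
            b = 1 + secrets.randbelow(p - 1)   # uniform in {1, ..., p-1}
            A = p + pow(g, b, p)
            if A < 2 ** n:
                return A
    return 0                           # step 5: a == 0, probability 2**-n
```

With the toy parameters $p = 11$, $g = 2$, every output fits in $n = 4$ bits, so to Wendy the transmitted value looks like an ordinary 4-bit string rather than a residue modulo $p$.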
6.3 Manufacturing a Stego-friendly Channel
The most common approach to steganography entails the subtle modification of a fixed
but arbitrary piece of cover data. This must be performed so that no statistical test
can distinguish between natural and doctored data. The challenge of this approach is
that we have no complete statistical model for multimedia data, and cannot guarantee
that any artificial modification of cover data can go undetected.
A second approach to steganography avoids “natural” data and instead places bits
in simpler, artificial content, for example TCP packet timestamps or the ordering of
note-on messages in MIDI data. (32, 33). While this data may be statistically simpler
to characterize, the same modeling problem prevents an absolute guarantee of secrecy;
another problem with this approach is that discrete data is often easy to overwrite
without harming the content.
A third, extreme approach is to design a form of communication that makes an ideal
cover for data embedding, and somehow convince the public to use it. Thus instead
of using natural data, artificial data is made natural by placing it in common use.
As a toy example, one could create a popular multi-player game or other multi-user
application, and design a messaging protocol that makes considerable use of i.i.d. bit
strings as random nonces. These could then be replaced by ciphertext to establish a
covert channel.
The obvious flaw in this plan is convincing a large group of people to start using and sending the type of data we want. Users will not adopt a new media format en masse simply because it makes steganography harder to detect.¹ However, each emerging trend in social networking carries with it the possibility of an ideal stego channel.
6.3.1 Apple iChat
In late 2007, Apple announced version 10.5 of their operating system, Mac OS X. This
included an extension to their bundled iChat instant messaging software that allows
the user to apply special effects to a video chat session. Special effects include camera
distortions, various image filters, and the ability to replace the user’s background with
a specific image or video file. The software allows users to drop in their own image
and video backdrops. As of January 2008, the operating system market share of Mac OS X was a bit more than 7%. (35)
The authors suspected that if the software allowed image and video backdrops, then
it would allow arbitrary multimedia files adhering to Apple’s QuickTime standard. This
includes Quartz Compositions, computer animations represented as programs rather
than static multimedia data. Quartz Compositions are used within Mac OS X as
animations for screen savers and within applications. They are composed of flow graphs
of rendering and computation elements, called patches, that reside in user and system
graphics libraries.
The ramifications of a computer-generated backdrop in a popular video chat application are significant: computer animations are often driven or guided by pseudo-random
number generators, and if by some device we could replace the pseudo-random data
with ciphertext, then we have a supraliminal channel within a video chat application.
As long as an animation uses pseudo-random data transparently enough that the PRNG
bits can be estimated from the video at the receiver, then some of the semantic content
of the video is generated from message bits. The fact that this is possible within the
default instant messaging client of a popular operating system creates the opportunity
for a plausible, but stego-friendly form of communication.
¹ This approach was taken to its logical extreme in 2001, when a software developer announced a simple LSB-embedding steganographic tool on Usenet cryptography fora. When people pointed out the well-known statistical attacks on brute LSB embedding, the author responded with the astonishing argument that if only the public would start randomizing the LSB planes of their images, his stego images would be harder to detect. (34)
6.3.2 Cultural engineering
Social engineering is the act of beating a security system by manipulating or deceiving
people, for example into revealing confidential information or granting access to an
adversary. (36) On a larger scale than the individual, we have the possibility for cultural
engineering: achieving a security goal by swaying society in general towards a certain
behavior. Popularizing a steganographically friendly form of communication is a form
of cultural engineering.
A cultural shift that has greatly affected the status of cryptography is the public
embrace of the Internet for electronic commerce, and other applications requiring secure
HTTP connections. This made the public use of encryption far more common; not only
did this phenomenon make encryption less conspicuous, but it fostered a general public
expectation of privacy-enhancing technologies.
6.4 Software Overview
Our data-hiding channel is concealed in a computer animation, whose pseudo-random
number generator occasionally outputs ciphertext. The overall transmitter architecture
is shown in figure 6.2. The iChat application hosts a computer-animated backdrop; the animation contains a doctored pseudo-random generator that retrieves pseudo-random bits from an external server. The server can be fed data from an application, and outputs either pseudo-random bits or ciphertext bits.
Figure 6.2: The transmission pipeline. Data is fed to a ciphertext server, polled by a
doctored PRNG patch in an animation.
The server is a rudimentary Tcl program with C extensions for generating pseudo-random data. It generates pseudo-random data using 128-bit AES in counter mode; when fed a request of the form "{o|b|h} [number]", the server returns that number of octal, binary, or hexadecimal digits from its store. The request "i {H|C} [data]" causes the server to input a data string specified either in hexadecimal or character format; upon future requests for random bits, this message data is exclusive-ORed with the keystream.
#!/usr/bin/tclsh
set port 8080
proc answer {sock host2 port2} { fileevent $sock readable [list serve $sock] }
proc serve sock {
    fconfigure $sock -blocking 0 -translation binary
    gets $sock line
    if {[fblocked $sock]} return
    fileevent $sock readable ""
    set response [myServe $line]
    puts $sock $response; flush $sock
    fileevent $sock readable [list serve $sock]
}
puts "Bit server running on port $port, socket=[socket -server answer $port]."
vwait forever

Figure 6.3: The 14-line ciphertext server, adapted from a 41-line web server in (1).
6.4.1 Quartz Composer animations
A Quartz Composition is a computer animation represented as a flow graph of fixed
”patches.” Patches are encapsulated procedures for rendering or computation with
published inputs and outputs; they reside in system and user libraries, which can be
extended with user-designed patches.
The animation we used for our experiments is a randomly changing grid of colored
squares diagrammed in figure 6.4. The ”ciphertext server” patch is a customized patch
which connects to our server to retrieve pseudo-random data. The data is requested as
a string of octal digits, which are converted to the hue of each column. The remainder
of the animation uses standard Quartz Composer patches.
Figure 6.4: Quartz composer flow-graph for our sample animation. At the top, the
”Ciphertext Gateway” is polled by a clock to update the state of the display. The animation
contains two nested iterators to draw columns and rows of sprites (blocks).
6.4.2 Receiver architecture and implementation issues
To establish a channel, the captured video at the receiver must be analyzed to estimate
the pseudo-random bits used to drive the background animation. This requires that
the animation uses its bits in a suitably transparent way, and that the background
state can be estimated while partially occluded by a user. For our experiments, we
capture the video on the receiving end as a video file for analysis; in the future we will
implement a real-time receiver that polls the contents of the iChat video window.
Using our ciphertext server and customized patch, a designer only needs to compose
a computer animation, and write a corresponding C function to estimate the PRNG
state given bitmap frames from the received video. The server, patch, and receiver code are already written.
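As an illustration of such an estimator (the actual receiver is a C function; this Python sketch, its names, and its frame layout are ours), one can recover one octal digit per column from the hue of the topmost row of blocks:

```python
import colorsys

def decode_frame(frame, blocks):
    """Estimate one octal digit (3 bits) per column from the hue of the
    topmost row of colored blocks.  `frame` is a 2-D list of (r, g, b)
    pixels with components in [0, 1]."""
    height, width = len(frame), len(frame[0])
    bw = width // blocks                      # block width in pixels
    digits = []
    for col in range(blocks):
        # sample a pixel near the top center of this column's block
        r, g, b = frame[height // 20][col * bw + bw // 2]
        hue, _, _ = colorsys.rgb_to_hsv(r, g, b)
        digits.append(int(hue * 8) % 8)       # quantize hue into 8 bins
    return digits
```

Quantizing the hue circle into eight bins matches the three bits per column described in section 6.5; a production decoder would also average over several pixels to tolerate compression noise and occlusion.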
One implementation issue we encountered is Apple's sandboxing of Quartz Compositions. Because of the clear security risk of video files that can covertly access
one’s file system or connect to the Internet, QuickTime Player refuses to load any
custom patches, and limits native patches that access the outside world, such as RSS
readers. This prevented our customized patch from connecting to our server whenever
it was installed as a backdrop for iChat.
Fortunately, the sandboxing was relatively simple to override. By examining the binary files of existing patches, we found that Quartz Composer patches simply implement the class method +(BOOL) isSafe, which by default outputs NO. Subclassing the patch class circumvents this measure. This cannot be accomplished through Apple's public API, however; the public API does not allow a programmer to implement a patch directly, but only a "plug-in" object, a wrapper through which isSafe cannot be overridden. The ciphertext server patch had to be written directly as a patch, using undocumented calls.¹
Another implementation issue concerns random delays in rendering. When immersed in iChat, our animations do not render regularly, but experience random lag times between updates. This is a minor problem, because iChat does not skip frames of our animation; it merely stretches animation frames over more video frames. This only limits the speed of supraliminal transmission, and does not result in the dropping of message bits. To cope with the delays, our decoder must be asynchronous, identifying from the received video when the animation has changed its pseudo-random state. This is easily accomplished with our animation, because block hues are dramatically altered across the entire frame during a state transition.
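Such an asynchronous state-change detector can be sketched as follows (illustrative Python, ours; `frames` is a list of flattened pixel-intensity lists in [0, 1], and the threshold is an assumed tuning parameter):

```python
def state_transitions(frames, threshold=0.25):
    """Flag the frame indices at which the animation changed its
    pseudo-random state, detected as a large mean absolute pixel
    change across the whole frame."""
    transitions = []
    for k in range(1, len(frames)):
        prev, cur = frames[k - 1], frames[k]
        mean_change = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if mean_change > threshold:
            transitions.append(k)
    return transitions
```

Because a state transition recolors blocks across the entire frame while rendering lag merely repeats frames, thresholding the mean frame-to-frame change separates the two cases.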
6.5 Results
For purposes of designing an optimal animation, we measured per-pixel occlusion of the iChat application. Pixels can be occluded either by the user in the foreground, or by misalignment of the camera or changes in lighting that confuse the background subtraction algorithm. We first replaced iChat's backdrop with a frame of bright green pixels, recorded the output as a video file, and computed the fraction of frames in which each pixel position was blocked. An occlusion graph for a 16-minute video chat is shown in figure 6.5.

¹ We are indebted to Christopher Wright of Kineme.net, whose expertise on the undocumented API made our patch possible.
Figure 6.5: Average occlusion of background pixels over 31369 frames (16 minutes 28
seconds) of a video chat session.
Based on this data, we determined that a pseudo-random state should be estimable given the upper left and right corners of the video. We produced several animations that only utilized the top band of the frame, and could easily embed more data per frame by incorporating more of the sides of the frame. One animation featured a grid of colored blocks. We transmitted this backdrop with varying periods and grid sizes; each column transmits three bits of data. Decoding was accomplished by estimating the hue of the topmost row of blocks. The results of decoding are shown in figure 6.6.
BER, by attempted period (s):

Blocks    0.5      0.25     0.125    0.05
8         0.002    0.005    0.087    0.095
16        0.00     0.004    0.069    0.101
32        0.003    0.095    0.045    0.062
64        0.002    0.0015   0.003    0.006

Block rows transmitted per second, by attempted period (s):

Blocks    0.5      0.25     0.125    0.05
8         2.01     4.25     8.14     7.58
16        2.07     4.08     7.62     8.31
32        2.02     4.12     7.57     7.74
64        2.05     4.01     7.36     8.21

Figure 6.6: BER (top) and block rows transmitted per second (bottom) with our animation/decoder pair, for several block sizes and attempted periods.
These results show a limit of approximately 8 animation frames per second effectively transmitted, and to our surprise the BER decreased as we increased the block size. We believe this occurs because errors are largely intermittent, rather than caused by general noise prohibiting hue estimation. This result indicates that backdrop computer animations can easily leak hundreds or thousands of bits per second. Hence this channel is suitable for public key exchange, and has a suitable bitrate for transmitting text messages as well.
Figure 6.7: Supraliminal backdrops. On the left, colored blocks. On the right, the user’s
office superimposed with rain clouds. The animations are driven either by pseudo-random
bits or ciphertext, estimable by the recipient.
6.6 Discussions and Conclusions
We have illustrated a means to smuggle ciphertext into a pseudo-random data source
driving a computer animation, which can then be added as a novelty backdrop for a
popular video chat application. The pseudo-random bits can be estimated at the other
end of the channel, allowing the leakage of ciphertext.
Rather than hiding our information imperceptibly in the background noise of a
multimedia file, we embed quite literally in the background, in an extremely perceptible fashion. Rather than achieving imperceptibility, we instead seek a different goal:
plausible deniability. The background animation is obvious but plausibly deniable as
innocent. Since it comprises the semantic content of the video, a warden cannot justifiably change it.
Our channel is designed so that new and creative animation/decoder pairs are easy to create. Once a designer has installed our ciphertext/PRNG patch and the corresponding server, designing a transmitter is a simple matter of using the patch in a Quartz Composition, a step that requires no coding. Decoding requires the implementation of a C function that inputs each video frame as a bitmap and estimates message bits. If the designer creates an animation whose bits are suitably estimable, for example our backdrop of colored squares, the decoder can be easily written. Data rates of thousands of bits per second are possible if the animation is properly designed. We hope that this framework will spur creative development of animation/decoder pairs, and encourage further search for possible supraliminal channels in emerging social networking applications.
7 DSLR Lens Identification via Chromatic Aberration

7.1 Introduction to Camera-related Digital Forensics
7.1.1 Forensic Issues in the Information Age
There is no doubt that more and more digital devices will be developed and deployed in people's daily lives to facilitate information capture and access. As these devices play an increasingly important role in our lives, several fundamental security problems take on an unprecedented level of importance.
Digital forensics, a relatively new and ever-evolving research field, is expected to
provide solutions to the following problems:
1. Source Identification: the origin of a certain piece of information.
2. Forgery Detection: the integrity of a certain piece of information that was generated by a digital device.
3. Timing Identification: the timing or the sequence of several pieces of information.
The first problem involves the unique identification features of every digital device. The second problem involves the consistency of the identification features present in that piece of information; the identification feature for forgery detection could come either from the digital device or from the generated content. The third problem involves the gradual evolution of the identification feature over time.¹ All three problems point to the same key: the identification feature.

¹ In practice, this evolution is degradation in most cases, due to the second law of thermodynamics.
7.1.2 Digital Camera: 'Analog' Part and 'Digital' Part
In the big picture given by physics, the diversity of our universe stems from so-called symmetry breaking. In other words, it is symmetry breaking that makes our world so colorful. As a consequence, the identification feature of everything ultimately originates from this symmetry breaking.

From the perspective of an engineer, most digital devices are an integration of hardware and software. For the hardware part of a digital device, engineers commonly use the term imperfection, which is related to symmetry breaking but more specific. For the software part of a digital device, the uniqueness of any algorithm can also be deemed a case of symmetry breaking.
Today’s digital camera is a complex system consists of numerous hardware and
software components, such as glass optics, anti-aliasing filter, infrared filter, color filter
array, micro lens, image sensor, and image processing chips etc. Given the fact that no
man-made device can be free from any imperfection, camera hardware imperfections can
always be used as an identification feature to distinguish different cameras. In addition,
when it comes to the in-camera proprietary software, at least part of the algorithm
or parameter settings is very likely to be different from one camera manufacturer to
another.
In general, while hardware imperfections are capable of distinguishing different copies of the same camera, a software 'signature' only works for distinguishing different models or different manufacturers, since the in-camera processing software in different copies of the same model is most likely the same.¹ This difference in capability is essentially due to the analog nature of any physical device and the digital nature of any in-device software.
7.1.3 Identification Feature Category
From the perspective of the three applications: the task of source identification is to extract the most effective and robust identification feature set reflecting the inner workings of a camera; the task of forgery detection is to detect inconsistency within a produced image, either among the camera fingerprints or within the image's semantic content; and, since all digital cameras age with time, their 'fingerprints' are expected to change with time as well. For example, it turns out that cosmic radiation actually plays an important role in changing the PRNU pattern of a camera's image sensor. (37)

¹ Although the camera firmware could differ due to upgrading.
The identification features explored in previous research fall into four categories: hardware-induced features, software-induced features, picture semantic content features, and statistical features as manifested in the final picture. Unlike the first three categories, each of which focuses on one specific origin, the statistical features are an accumulated result of all camera imaging and processing, including both hardware imperfections and processing software.
1. The hardware-induced ‘analog fingerprints’:
• Lens geometric distortion, such as radial distortion. (38)
• Lens chromatic aberration, such as lateral chromatic distortion. (39, 40)
• Image sensor imperfections, such as the PRNU (41, 42) and the age of the PRNU (43).
2. The software-induced ‘digital fingerprints’:
• The color filter array (CFA) interpolation matrices. (44, 45)
• The double JPEG quantization effects. (46)
• The compression configurations stored in the JPEG header. (47)
3. The identification features coming from the semantic content of a picture:
• The lighting consistency of the objects in a picture. (48)
4. The statistical patterns presented in a picture:
• A group of selected feature patterns relating to color, image quality, and wavelet statistics. (49)
• The camera response function. (50, 51)
As a remark on the statistical identification features, we noticed that few research works have evaluated how camera shooting modes affect detection reliability. Camera shooting mode settings are a frequently visited functionality for many camera users. Take a Canon camera for example: there are quite different shooting modes such as ‘portrait’, ‘landscape’, and ‘standard’. These modes usually have different combinations of sharpness, contrast, saturation, and color tone settings, which are expected to produce different statistical features in the corresponding final pictures. Fortunately, except for the CFA interpolation matrix related features, most of the identification features belonging to the first three categories are much less likely to be affected by the in-camera shooting mode settings.
7.2 DSLR Lens Identification
7.2.1 Motivation
With the popularization of point-and-shoot cameras, Digital Single-Lens Reflex (DSLR) cameras have also become popular, particularly among users dissatisfied with the image quality of compact point-and-shoot cameras. This trend is reflected in statistics released by the image hosting website flickr.com. (52)
One major difference between point-and-shoot and DSLR cameras is that DSLR cameras allow the user to change lenses; indeed, the price of a single DSLR interchangeable lens is often comparable to the price of a DSLR body. For this reason, it is worth exploring forensic identification features inherent to camera lenses, in addition to identifiable features of camera bodies such as the PRNU of the camera’s CCD or CMOS sensor. (41, 42)
On the technical side, across glass lenses in general, lens chromatic aberration features are highly variable, as revealed by previous papers and as will be further demonstrated in this paper. With lens classification as the goal, we want a deeper understanding of the workings of lenses in order to obtain stable identification features. The lens parameters are generally more accessible for interchangeable DSLR lenses than for their counterparts built into point-and-shoot cameras.
7.2.1.1 Lens Aberrations
Due to physical limitations, it is impossible to build a glass lens system free of aberrations. Among others, there are two common categories of lens aberration: geometric distortion and chromatic aberration (hereafter CA). Geometric distortion commonly manifests as pincushion or barrel distortion in real life pictures, while chromatic aberration often appears as color fringes, especially in the corners of pictures.
While many causes can possibly account for the different color fringes in an image taken by a digital camera, there are two major types: lateral CA (LCA) and axial CA (ACA)¹. Both stem from the fact that the refractive index of a lens depends on the light wavelength.
To be more specific, LCA is relevant to obliquely incident light, and can be explained as a lens magnification factor that depends on the light wavelength. It often manifests in telephoto lenses due to their large image magnification factor. ACA, on the other hand, concerns incident light parallel to the lens optical axis. It is due to the inability of a lens to focus different colors on the same focal plane, and it manifests at large lens apertures.
Note that since geometric distortions like pincushion and barrel distortion are due to magnification ratio variation along the radial direction of the image, they will interact with LCA, which is also related to the magnification ratio. In fact, in a real life picture, all sorts of lens aberrations coexist and are intertwined with each other. If our goal is to collect forensic data for lens identification, instead of isolating each of them, it might be more efficient to capture them all at once. With this philosophy in mind, we will use the term CA for the rest of the paper.
7.2.1.2 Previous Works
Currently, there are three applications for the measurement of lens aberration: source identification, forgery detection, and CG-vs-natural image identification. Choi and Lam proposed using the metric of lens geometric distortion to aid in identifying source cameras. (38) Johnson and Farid were the first to propose LCA as a possible approach to camera identification as well as forgery detection. (39) Their estimation is based on a global linear model. Gloe, Borowka, et al. suggested using local estimation instead in order to achieve better computational efficiency. (40) They also found that LCA is not necessarily linear in real cases. Results from both Johnson and Gloe showed relatively large estimation variation, which makes it impossible to perform lens identification for different copies of the same lens.
¹ Lateral CA is also known as transverse CA; axial CA is also known as longitudinal CA.
While both Choi and Gloe noticed the large variation of lens aberration patterns
under different focal length settings, none of the previous papers pointed out another
factor, the lens focal distance. Focal distance plays an equally important role when it
comes to the lens aberration pattern. Consider a real zoom lens for example: a zoom
lens normally changes focal length by rotating or pushing the zoom-ring. By rotating
or pushing, the geometric relationship of the glass lenses changes greatly, which will
result in a quite different lens aberration pattern. Likewise, a lens normally changes its
focal distance by rotating or pushing the focal-ring, which has the same type of impact
on lens aberration patterns. Therefore, in order to obtain a consistent lens aberration pattern, keeping the focal length unchanged is not enough; the focal distance also has to be fixed.
7.2.2 A New Shooting Target

7.2.2.1 Pattern Misalignment Problems
By convention, previous lens forensic papers, as well as many lens distortion calibration papers, use a checkerboard pattern image, or some deformed checkerboard pattern image, as the shooting target. (40, 53, 54, 55) While a checkerboard pattern works well for image calibration, it may not be a good choice for lens identification, for two important reasons.
In practice, even with a very good tripod, there is inevitable wobbling incurred by the action of changing lenses and, to a lesser degree, by the mirror flipping inside any DSLR camera body. Those slight camera movements are translated into a shifting of the shooting area, i.e. the portion of the target in each picture will be slightly different from shot to shot, given the task of comparing multiple lenses. As a result, if a checkerboard pattern is used as the shooting target, the positions of all corner points in each test picture are expected to differ. Consequently, misaligned CA patterns extracted from such misaligned test pictures of a checkerboard pattern cannot provide a fair ground for lens comparison.
With respect to the corner detection algorithm that is to be applied to test pictures of a checkerboard pattern, most of the commonly used corner detectors would also introduce misalignment. Partly because there is no unanimously agreed mathematical definition of a ‘corner point’, most corner detectors are unable to produce robust results. (56, 57) In other words, even with absolutely no camera movement and all shooting settings kept the same, the positions of the detected corner points may not always be consistent (at pixel precision) from shot to shot.
7.2.2.2 White Noise Pattern
In order to address the pattern misalignment problem, we propose using a white noise pattern as the shooting target. By using a white noise image like figure (7.1), we can achieve error-free alignment by applying one rigid block division to all test pictures, without worrying about either minor camera movements or the instability inherent in a corner detector. Furthermore, we save the running time needed for corner detection.
Figure 7.1: White noise pattern image as shooting target.
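Such a target is trivial to generate. A minimal sketch follows; the pattern size, the seed, and the choice of a binary black/white pattern are our own illustrative assumptions, not values specified in the text:

```python
# Generate a binary white-noise target image for printing, in the spirit of
# the shooting target in figure (7.1). Size and RNG seed are arbitrary.
import numpy as np

def white_noise_target(height=2048, width=3072, seed=1):
    """Return a uint8 image whose pixels are independently black or white."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=(height, width), dtype=np.uint8)
    return bits * 255  # 0 -> black, 255 -> white

# Using a fixed seed reproduces the identical target for every print.
target = white_noise_target()
```

Because every block of such a pattern is rich in high-frequency content, a single rigid block grid aligns all test pictures without any corner detection.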
7.2.2.3 Paper Printout vs. LCD Display
It is worth noting that an LCD displaying a white noise pattern cannot be used as the shooting target. Instead, a paper printout of the white noise pattern should be used.
The reason is that each single pixel on a regular LCD screen typically consists of three subpixels, arranged as either R-G-B or B-G-R stripes along the horizontal direction. Therefore, at an edge where a white pixel meets a black pixel, there is 1/3 pixel of red or blue, depending on the display’s subpixel arrangement.
Given that today’s camera sensors greatly outresolve today’s LCD displays, this 1/3 color pixel would be magnified in the test pictures taken. This ‘pseudo CA pattern’ from an LCD display would thus overwhelm the real CA pattern from the camera lens and render the extracted CA pattern useless.
7.2.3 Major Feature-Space Dimensions of the Lens CA Pattern
Among all user-controllable lens parameters, three major ones play an important role in shaping a lens’ CA pattern. While focal length and aperture size have already received attention in previous papers, to the best of our knowledge the focal distance, another equally important factor, has never been revealed. As a consequence, while previous lens forensic papers succeeded in using CA patterns to distinguish lenses of different models, the CA patterns they obtained were not stable enough to fulfill the task of distinguishing different lens copies of the same model.
In this paper, we will demonstrate that after taking this third parameter into account, we are able to obtain a CA pattern stable enough to distinguish different lenses of the same model.
7.2.3.1 Focal Length
Focal length has already been identified as one of the key variables in shaping a zoom lens’ CA pattern. (38, 40) However, it is worth noting that in practice the focal length recorded in the EXIF metadata tag¹ has a certain precision granularity. Take Canon’s EF-S 18-55mm IS and EF 17-40mm lenses for example: the finest focal length step reported in the EXIF metadata tag is either 1mm or 2mm, depending on the focal range. Therefore, given two pictures taken by the same lens with the same EXIF focal length reading, the underlying CA patterns might actually differ. Since the stability of the CA pattern is key to the success of lens identification, the EXIF focal length reading should be taken with caution.
Figure (7.2) shows how the lens CA pattern varies with focal length. A Canon EF-S 18-55mm IS lens was used to produce the two plots, both at the same focal distance but at different focal lengths: 50mm¹ for the left plot and 55mm for the right plot.

¹ EXIF (Exchangeable Image File Format) is a specification for the image file format used by digital cameras.
Figure 7.2: Two B-G channel CA patterns from one lens working at different focal lengths.
7.2.3.2 Focal Distance
Mechanically speaking, the action of adjusting lens focal length (also called zooming) changes the arrangement of the groups of glass elements inside a zoom lens, which is finally reflected as a CA pattern change in the pictures taken.
On the other hand, focal distance is adjusted during the autofocus or manual focus procedure, in order to achieve sharp focus at the distance where the main subject lies. During the focusing procedure, the glass arrangement inside a lens also changes, very much like what happens during a zooming procedure. This means that focal distance is of equal importance to focal length in shaping the lens CA pattern.
Moreover, focal length only plays a part in zoom lenses, while focal distance plays a part in virtually all modern digital lenses, encompassing both zoom lenses and prime lenses².
Figure (7.3) shows how the lens CA pattern varies with focal distance. The same lens used to produce Figure (7.2) was used here. This time, the focal length was fixed at 55mm. The focal distance is around 2.8m for the left plot, and around 2.2m
¹ 50mm is from the EXIF metadata, hence it is not a very precise number.
² The focal length of a prime lens is constant.
for the right plot¹. Note that the rotational shift of the lens focus ring, in response to the focal distance change, was only within 2mm, a very tiny amount.
Figure 7.3: Two B-G channel CA patterns from one lens working at different focal distances.
7.2.3.3 Aperture Size and Other Issues
Another important factor is the lens aperture size. A CA pattern typically shrinks as the lens aperture is stepped down.
Yet another user-controllable lens feature, the lens image stabilization system, may also affect the lens CA pattern. Due to the modelling complexity involved with an image stabilization system, we will not investigate how image stabilization affects the lens CA pattern in this paper. In practice, we can avoid the impact of lens image stabilization by simply turning it off during testing.
7.2.3.4 Size of a Complete Lens CA Identification Feature Space
For the sake of digital forensics, the identification data needed for a camera sensor feature such as PRNU is one matrix of the same size as the sensor’s pixel count per RGB channel. (42) When it comes to the lens CA pattern, one can think of one 2D vector matrix per R-G and B-G channel pair, plus three extra dimensions for the focal length, the focal distance, and the aperture size.
¹ The distance here is the length from the shooting target to the camera image sensor.
Essentially, the difference between lens identification and sensor identification boils down to the difference in complexity of the physical construction of a camera lens versus that of an image sensor.
For a specific camera sensor with resolution x × y, let the CA estimation block size be x_b × y_b. Let f_l and f_d denote the number of distinguishable focal length and focal distance settings, and let A denote the total number of possible lens aperture sizes. We can then formulate the required data volume ratio between the lens CA pattern and the sensor PRNU pattern:

‖D_lens‖ / ‖D_sensor‖ = [(x/x_b) · (y/y_b) · 2 · 2 · f_l · f_d · A] / (x · y · 3)
                      = (4/3) · (f_l · f_d · A) / (x_b · y_b)        (7.1)

(The factor 2 · 2 accounts for the two channel pairs, R-G and B-G, each yielding one 2D shift vector per block.)
In practice, typical values for x_b, y_b, and A are on the scale of 10¹. The ratio then depends on the ‘granularity’ of f_l and f_d. Granularity here means a certain quantization scale within which the lens CA pattern can be deemed unchanged. For a practical lens forensic problem with a lens database, the granularity of f_l and f_d would largely depend on the total number of lenses to be distinguished.
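As a rough numerical illustration of equation (7.1), the ratio can be evaluated for hypothetical parameter values; the specific block size, slice counts, and aperture count below are illustrative assumptions, not measurements:

```python
# Illustrative evaluation of Eq. (7.1): data volume of a lens CA feature
# space relative to a sensor PRNU fingerprint. All parameter values below
# are hypothetical examples chosen for illustration.

def ca_to_prnu_ratio(xb, yb, fl, fd, A):
    """Ratio ||D_lens|| / ||D_sensor|| = (4/3) * fl * fd * A / (xb * yb)."""
    return (4.0 / 3.0) * fl * fd * A / (xb * yb)

# Example: 48x48 blocks, 10 focal-length slices, 10 focal-distance slices,
# and 10 aperture settings.
ratio = ca_to_prnu_ratio(xb=48, yb=48, fl=10, fd=10, A=10)
print(f"{ratio:.4f}")  # -> 0.5787
```

Even with a thousand (f_l, f_d, A) combinations, the lens CA feature set in this example is smaller than a single sensor PRNU fingerprint.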
In the experiment section, we will show that for fixed f_l and f_d, the CA pattern data obtained is enough to distinguish several copies of the same lens. Determining the appropriate granularity of f_l and f_d for the task of distinguishing a large number of lens copies is a topic for future research.
7.2.4 Experiment

7.2.4.1 Test Picture Shooting Settings
The shooting settings adopted in our experiment are as follows:
• Camera ISO is fixed at the minimal value¹ so as to eliminate interference from camera sensor noise.
• The camera’s output format is set to ‘RAW’ in order to eliminate interference from JPEG compression.
¹ ISO = 100 in our case.
• Lens aperture is always fixed at the maximum in order to yield larger ‘overall’ chromatic aberration estimates. For manufacturing precision reasons, both geometric and chromatic aberrations typically deteriorate at maximal aperture, and larger ‘overall’ CA readings are expected considering that all those aberrations are intertwined with each other.
• All in-camera image processing options, such as Canon’s picture style¹ and Lens Peripheral Illumination Correction, are turned off, so as to eliminate contamination from in-camera image processing.
The six Canon DSLR lenses used in our experiment are listed in table (7.1). The shooting sequence of all 36 test pictures is listed in table (7.2). In the ‘Lens ID’ column of table (7.2), the number after the symbol @ denotes the focal length at which each zoom lens was set to work.
Lens Model         Lens ID
EF 50mm f/1.8      L1, L2, L3
EF-S 18-55mm IS    L4, L5
EF 17-40mm         L6

Table 7.1: Six Canon DSLR lenses.
Lens ID    Picture     Lens ID    Picture
L1         p1-3        L4@55mm    p19-21
L2         p4-6        L4@18mm    p22-24
L3         p7-9        L5@18mm    p25-27
L1         p10-12      L4@18mm    p28-30
L4@55mm    p13-15      L6@17mm    p31-33
L5@55mm    p16-18      L6@40mm    p34-36

Table 7.2: The sequential order of test pictures.
The purpose of this shooting order is to prove that our scheme works even in the presence of camera movements incurred by the action of switching lenses. Take the first 12 pictures p1-12 for example: there were three cumulative lens switching actions before pictures p10, p11, and p12 were taken, and fewer switching actions before pictures p4-9 were taken. Therefore, if the CA patterns extracted from p10-12 show more similarity with the CA patterns from p1-3 than the CA patterns from pictures p4-9 do,
¹ Canon’s picture style includes in-camera processing of sharpness, contrast, saturation, and color tone.
it would indicate that our method is robust to the minor camera movements incurred by lens switching actions.
The focal distance is the same, i.e. fixed, for all 36 test pictures.
7.2.4.2 CA Pattern Extraction
A Canon EOS 7D DSLR camera body, which has an 18-megapixel CMOS image sensor, was used in our experiment. In order to reduce the computational complexity, we chose to shoot at a resolution of 2592 × 1728 instead of the native resolution of 5184 × 3456, i.e. 4.5 megapixels instead of the full 18 megapixels.
The block size used in the R-G and B-G channel-wise correlation is 48 × 48 pixels, which gives two 2D vector CA pattern matrices of size 53 × 35¹ per test picture. The standard correlation coefficient was used as the criterion in the search for the best channel-wise CA shift.
Image interpolation is needed in order to extract the CA shift at sub-pixel precision. While 5× bicubic interpolation is enough to reveal the CA patterns of the low-end Canon EF-S 18-55mm IS zoom lens, a higher interpolation rate is needed for the EF 50mm f/1.8 prime lens and the high-end EF 17-40mm zoom lens, which is within expectation. Therefore, 10× bicubic interpolation was used in our experiment.
In the CA pattern extraction procedure, we essentially brute-force the best channel-wise CA shift for every 48 × 48 block of a test picture. In general, the extraction algorithm we used is very similar to the algorithm described in the paper by Gloe and Borowka. (40) The main difference is that ours is free from the hassle of corner detection, thanks to the use of the white noise pattern.
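The block-wise brute-force search can be sketched as follows. The function names and the integer search range are our own assumptions; in the actual procedure each block is first upsampled 10× by bicubic interpolation, so that one integer step in the upsampled block corresponds to a 0.1-pixel shift in the original:

```python
# Sketch of the block-wise CA shift search: for each block, brute-force the
# 2D shift of one color channel that best correlates with the green channel.
import numpy as np

def best_shift(ref_block, mov_block, max_shift=4):
    """Return the (dx, dy) shift of mov_block that maximizes the
    correlation coefficient against ref_block."""
    best, best_dxdy = -2.0, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(mov_block, dy, axis=0), dx, axis=1)
            r = np.corrcoef(ref_block.ravel(), shifted.ravel())[0, 1]
            if r > best:
                best, best_dxdy = r, (dx, dy)
    return best_dxdy

def ca_pattern(green, other, block=48, max_shift=4):
    """One 2D shift vector per block: the per-block CA displacement field."""
    h, w = green.shape
    ny, nx = h // block, w // block
    field = np.zeros((ny, nx, 2))
    for by in range(ny):
        for bx in range(nx):
            sl = np.s_[by*block:(by+1)*block, bx*block:(bx+1)*block]
            field[by, bx] = best_shift(green[sl], other[sl], max_shift)
    return field
```

Running this once with the R channel and once with the B channel against the G channel yields the two 2D vector matrices described above.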
7.2.4.3 CA Pattern Comparison Results
The similarity between the CA patterns extracted from all 36 pictures is partly revealed by the four plots in figures (7.4) and (7.5). For each plot, the CA pattern from one specific picture is used as the reference pattern, and all 36 CA patterns (including the reference itself) are compared with it. The total number of mismatching CA vectors is recorded. Thus, if a picture has only a small number of mismatch counts, its underlying CA pattern is very similar to that of the reference picture, which further indicates that the two pictures were likely taken by the same lens.

¹ Pixels on the picture boundary are not used in our experiment.
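The mismatch counting itself can be sketched as follows; the comparison tolerance is our assumption, the underlying idea being simply to count CA vectors that disagree with the reference:

```python
# Sketch of the mismatch-count comparison used for figures (7.4)-(7.5).
import numpy as np

def mismatch_count(ref_field, test_field, tol=0.0):
    """Number of block positions whose CA shift vectors differ.

    ref_field, test_field: arrays of shape (ny, nx, 2) holding one
    (dx, dy) shift vector per block, as produced by CA extraction.
    """
    diff = np.linalg.norm(ref_field - test_field, axis=-1)
    return int(np.count_nonzero(diff > tol))
```

A count that is small relative to the 53 × 35 = 1855 blocks per pattern suggests that the two pictures were taken by the same lens.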
Figure 7.4: Mismatch plots. Left: CA pattern of p31 used as reference. Right: CA pattern of p22 used as reference.
Figure 7.5: Mismatch plots. Left: CA pattern of p13 used as reference. Right: CA pattern of p1 used as reference.
The left plot of figure (7.4) uses the CA pattern underlying p31 as the reference. All 36 CA patterns were then compared with this reference pattern. As we can see, only the mismatch counts for p31, p32, and p33 are small, which echoes the fact that only those three test pictures were taken by lens L6@17mm. This plot demonstrates that our scheme works well both for distinguishing lenses of different models and for distinguishing a single lens working at different focal lengths.
Likewise, the right plot of figure (7.4) uses p22 as the reference. The mismatch counts for lenses L1, L2, L3, and L6 are very high. More importantly, the mismatch counts for all pictures taken by lens L4 are consistently lower than those for L5, even in the presence of one more cumulative minor camera movement. This plot demonstrates that our scheme is able to distinguish different copies of the same lens working at the same focal length, in the presence of the minor camera movements incurred by the action of switching lenses.
The left plot of figure (7.5) is very similar to the right plot of figure (7.4), only with its reference changed from p22 to p13, namely from a picture by L4@18mm to a picture by L4@55mm. In this plot, the two groups of pictures taken by L4@55mm are well separable from all other groups, including the three pictures taken by L5@55mm.
Perhaps the right plot of figure (7.5) provides the most convincing proof of all four plots, since distinguishing among EF 50mm f/1.8 prime lenses is more challenging than distinguishing among EF-S 18-55mm IS zoom lenses. This is because a prime lens typically has better control of chromatic aberration than a zoom lens, hence yielding smaller CA vectors to follow. As we can see from the plot, while the margin for distinguishing gets smaller, pictures p10, p11, and p12, which were taken by lens L1, are still well separable from the pictures taken by lenses L2 and L3, even after three cumulative lens switching actions.
7.2.5 Outlook

7.2.5.1 Improve the Identification Capability
There are three possible ways to improve the identification capability, all following the philosophy of ‘enriching’ the identification data set. Firstly, we can shoot test pictures at a higher resolution; in our case, at the full 18-megapixel resolution. Secondly, we can use a higher interpolation rate during CA pattern extraction. Lastly, instead of using CA patterns at a fixed focal distance, we can compute several ‘slices’ of CA patterns at different focal distances and combine those ‘slices’ together to form a more complete identification data set.
7.2.5.2 The Potential of the Bokeh Modulation
This discussion stems from an observation in our experiment. For the same lens, with all shooting settings kept the same, two test pictures p_in and p_out were taken in good focus and out of focus respectively. We noticed that the CA pattern extracted from p_out looks like a ‘transformed’ version of the pattern extracted from p_in.
Figure 7.6: Bokeh pattern: the polygon shapes are due to the 8-blade aperture diaphragm
being slightly closed.
From the perspective of photography, the term bokeh refers to the way a lens renders out-of-focus points of light. (58) This bokeh pattern depends on lens structures such as the aperture diaphragm shape and the aperture size, as well as on the focal distance and focal length. Figure (7.6) is a real life picture showing bokeh effects.
The whole point behind this phenomenon is that if the bokeh modulation could be modeled and calculated for each individual lens, there is hope of identifying the corresponding lens by checking certain types of real life pictures, namely those containing well defined image regions that are either in good focus or out of focus. To be more specific, consider three types of image regions containing edges or corner points: (a) in good focus; (b) inside focus; (c) outside focus. For image regions of type (a), the CA pattern extraction is the same as from test pictures of the white noise pattern image. For image regions of types (b) and (c), the extracted CA pattern is expected to be the ‘classical’ lens CA pattern subject to the corresponding ‘bokeh modulation’.
In summary, the ‘bokeh modulation’ might be deemed an extra dimension of the lens identification feature space, one with the potential to enable the identification of an individual lens from certain types of real life pictures. The desired real life picture should meet the two following requirements:
• It should contain well defined objects at discernible distances with respect to the focal plane, so that the modes of bokeh modulation can be retrieved from the ‘more complete’ lens identification feature space¹.
• Those objects should contain regions with abundant edges or corner points, so as to facilitate CA pattern extraction.
7.2.6 Conclusions
In a real life picture, different types of geometric distortion and different types of lens chromatic aberration are always intertwined with each other. For the application of lens identification, there is no need to isolate each of them; in fact, we believe it might be more efficient to capture as much of them as possible as a whole.
For the first time, we proposed using a white noise pattern image as the shooting target. The biggest advantage of a white noise pattern over the conventional checkerboard pattern is that it solves the pattern misalignment problem. In addition, it frees us from the complexity and instability of corner detection.
Another contribution of this paper is the finding that the lens focal distance plays an important role in shaping the lens CA pattern. With this finding in hand, we further demonstrated that if we fix the lens focal distance, we are able to obtain a CA pattern stable enough to fulfil the task of distinguishing different copies of the same lens.
In a complete lens identification feature space in terms of chromatic aberration, a CA pattern at fixed focal length, fixed focal distance, and fixed aperture size is only ‘one slice’ of the overall feature space. In this view, the reason why previous lens forensic papers were unable to obtain CA patterns stable enough to distinguish different copies of the same lens now has a clear explanation: they were most likely dealing with different ‘slices’ of the CA pattern, located at different positions along the focal distance axis of the overall feature space.
¹ This ‘more complete’ lens identification feature space has four dimensions, i.e. aperture size, focal length, focal distance, and the newly added bokeh modulation.
There are two interesting follow-up research topics. Firstly, the ‘granularity’ of the focal length and focal distance deserves further study: lens identification over a large lens database might be feasible if we can fill the extra three dimensions of the lens CA identification feature space with enough ‘slices’ of CA pattern at different focal lengths, focal distances, and aperture sizes. Secondly, the ‘bokeh modulation’ may have the potential to enable lens identification from certain types of real life pictures. In other words, bokeh modulation might become the fourth dimension of a more complete lens identification feature space.
8 Benchmarking Camera IS by the Picture Histograms of a White Noise Pattern

8.1 Motivation
Image stabilization was first proposed in the 1970s, and Canon introduced the first image-stabilized zoom lens to the consumer market in 1995. Ever since then, Image Stabilization (IS hereafter) has proven very useful in reducing camera motion blur in long-exposure hand-held shooting, especially when shooting with a telephoto lens.
Nowadays, IS has become a default functionality for most digital cameras. Since different camera manufacturers have developed their own proprietary IS techniques,¹ it would be interesting to design an accurate and consistent evaluation scheme so that people can compare them.
8.1.1 The Volume of Test Picture Space
Due to the statistical nature of the human hand motion when holding a camera, a good
IS benchmark can only be generated from a good amount of test pictures.
Unfortunately, since most of today’s camera IS reviews still require human intervention, especially human visual assessment, for practical reasons the number of test pictures under each set of shooting settings has to be limited to a very small amount (usually on the order of 10¹). (59, 60) As a consequence, it is unlikely that solid conclusions can be drawn from the rough benchmarks generated from such a small test picture space.

¹ Canon uses IS (Image Stabilization), Nikon uses VR (Vibration Reduction), Sony uses SSS (Super Steady Shot), Panasonic and Leica use MegaOIS, Sigma uses OS (Optical Stabilization), Tamron uses VC (Vibration Compensation), and Pentax uses SR (Shake Reduction).
8.1.2 The Consistency of an IS Metric
Another big issue with human visual assessment lies in the blur thresholding process: even for an expert, the threshold between a 'soft' test picture and a 'very soft' test picture is highly likely to be inconsistent from one review to another, which could bias the results when comparing IS performance across different camera models.
On the other hand, since resolution and contrast vary when making cross comparisons among different camera models, many generic image blur metrics should be used with caution, as they may not scale easily with respect to resolution and contrast variations.
8.1.3 Our Goal
The most straightforward way to increase the evaluation speed is to develop an automatic scheme without human intervention. Additionally, a well-defined IS metric is needed to enable unbiased comparison across different camera models.
With the capability of evaluating a large number of test pictures within a short time, we can obtain test readings at much finer granularity. Furthermore, with an accurate and consistent blur metric, we can use more bins for the different degrees of blur than, for example, only the four bins 'Sharp', 'Mild Blur', 'Heavy Blur', and 'Very Heavy Blur'. (59, 60)
In short, our goal is to generate an IS benchmark that is much more informative than current ones, which either require human intervention or use generic blur metrics that may not be consistent across different camera models.
8.2 Our Automatic IS Benchmarks
8.2.1 A Histogram-based IS Metric
Consider a black letter 'T' on white paper as the shooting target: camera-motion-incurred blurring manifests as a blend of pixel values around the boundary of the letter 'T', i.e. the blurred pixels take values between 'pure black' and 'pure white'. More importantly, there is a clear correlation between the count of those blurred pixels and the degree of camera motion: a larger camera motion is expected to create more 'blurred pixels'.
Furthermore, since both resolution (pixel count) and contrast translate directly into the test picture histogram, this 'blurred pixel count' is easily scalable with respect to resolution and contrast variations, which is an important feature for our camera IS cross comparison application. For example, consider two cameras C1 and C2 of resolution 100 × 100 and 200 × 200 respectively, with the same image sensor size. For the same amount of camera motion, there would be four times as many blurred pixels in test pictures taken by C2 as in those taken by C1.
Lastly, given the same amount of camera motion, by measuring the camera motion remaining after an IS system has acted, we can tell how effective that IS system is. Therefore, it is reasonable to use the count of those 'blurred pixels' to build a metric indicating the effectiveness of a camera IS system.
It is worth pointing out that our model considers the motion range, not the motion timing, as the indicator of motion blur. For a generic motion measure, both the motion range and the motion timing should be considered. However, since we are working with short-time (shutter-time) human hand-held motion, which has its own pattern, we treat the motion range as a more dominant factor than the motion timing in our camera motion blur evaluation process.
8.2.2 A Histogram-Invariant Shooting Target
To simplify the subsequent histogram analysis, a shooting target that is histogram-invariant in the presence of camera shift is desirable. The term histogram-invariant here means that the shooting target always guarantees a 1 : 1 black-white area ratio no matter which part of the target the camera frames. Also note that we use the term camera shift to differentiate from camera motion: camera shift refers to the position change of a hand-held camera from shot to shot, while camera motion refers to the movement of the camera during a single shot.
The commonly used checkerboard pattern image is not histogram-invariant in the presence of camera shift, no matter how small each black or white block is. For a checkerboard pattern, the black-white area ratio depends on the actual image patch being framed. In other words, the black-white ratio would fluctuate as a hand-held camera changes position from shot to shot.
We discovered that a white noise pattern image with fine enough black and white units meets this constant black-white area ratio requirement, no matter which part of the image target is being framed.
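This invariance claim can be checked numerically. The sketch below (pure Python; the 512 × 512 target, the 256 × 256 framed crop, and the 60-pixel checkerboard blocks are illustrative assumptions, not values from our experiments) frames randomly shifted crops of a binary white noise target and of a checkerboard, and reports the white-area fraction of each:

```python
import random

def white_noise_target(h, w, seed=0):
    """Binary white noise pattern: each pixel independently black (0) or white (255)."""
    rng = random.Random(seed)
    return [[255 * rng.randint(0, 1) for _ in range(w)] for _ in range(h)]

def checkerboard_target(h, w, block=60):
    """Checkerboard pattern with block x block black and white squares."""
    return [[255 * ((r // block + c // block) % 2) for c in range(w)] for r in range(h)]

def white_fraction(img, top, left, h, w):
    """Fraction of white pixels inside the h x w crop framed at (top, left)."""
    return sum(v == 255 for row in img[top:top + h]
               for v in row[left:left + w]) / (h * w)

noise = white_noise_target(512, 512)
board = checkerboard_target(512, 512)

rng = random.Random(1)
for _ in range(5):
    top, left = rng.randint(0, 255), rng.randint(0, 255)  # simulated camera shift
    print(f"shift ({top:3d},{left:3d}): "
          f"noise {white_fraction(noise, top, left, 256, 256):.3f}, "
          f"checkerboard {white_fraction(board, top, left, 256, 256):.3f}")
```

The white-noise fraction stays pinned near 0.5 for every shift, while the checkerboard fraction drifts with the framing, which is exactly the fluctuation described above.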
8.2.3 Our Algorithm
With the two crucial building blocks ready, we designed an evaluation procedure as
follows:
1. Under each set of shooting settings, take one motion-free test picture of the white noise target (possibly with a tripod), and use its histogram as the reference.
2. Under each set of shooting settings, take a sufficient number of test pictures of the white noise target with camera motion coming into play.
3. From the histogram of each test picture taken in step 2, subtract the corresponding reference histogram obtained in step 1.
4. For each histogram obtained in step 3, count the number of ‘blurred pixels’ falling
in the middle range of the histogram, and use this count as the IS metric.
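The core of steps 3 and 4 can be sketched in a few lines. This is a minimal pure-Python illustration, not our actual batch code; in particular, the middle-range bounds [64, 192] are an illustrative assumption, since the exact range is a tuning choice:

```python
def histogram(img, bins=256):
    """Pixel-value histogram of a grayscale image given as a list of rows."""
    h = [0] * bins
    for row in img:
        for v in row:
            h[v] += 1
    return h

def blur_metric(test_img, ref_hist, lo=64, hi=192):
    """Step 3: subtract the reference histogram from the test histogram.
    Step 4: count the excess pixels falling in the middle range [lo, hi]."""
    diff = [t - r for t, r in zip(histogram(test_img), ref_hist)]
    return sum(d for d in diff[lo:hi + 1] if d > 0)

# Toy example: a two-valued reference and a version with smeared mid-gray pixels.
ref = [[0, 255, 0, 255]] * 4
blurred = [[0, 128, 128, 255]] * 4
print(blur_metric(blurred, histogram(ref)))  # -> 8
```

A sharper test picture leaves fewer excess pixels in the middle bins, so the returned count is directly the IS metric of step 4.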
8.2.3.1 The Rationale behind the Histogram Subtraction Operation
Although our white noise pattern image consists only of 'pure' black and white, due to imperfections in any optical imaging system as well as variation in the ambient lighting, even the histogram of a motion-free test picture would not consist of only two spikes. Therefore, instead of counting 'histogram middle range pixels' directly in each test picture taken in step 2, we need to perform the histogram subtraction beforehand to make sure that we count only the real 'blurred pixels' caused by camera motion.
For each test picture taken in step 2, which exhibits some degree of motion blur, we can imagine a corresponding motion-free reference picture 'behind' it, so that the motion blur benchmark depends on the comparison between the real test picture and its presumed motion-free reference picture. Thanks to the histogram-invariant property of our white noise pattern target, all those presumed motion-free reference pictures have the same histogram as the reference picture we took in step 1. Therefore, the histogram comparison between each test picture and its respective presumed reference picture is equivalent to the histogram comparison between that test picture and the reference picture we took in step 1. By subtracting off the reference histogram (not the reference picture) obtained in step 1, we make sure that only the real 'blurred pixels' are counted for the final IS metric.
When compared to other evaluation methods that perform subtraction directly on a blur metric, such as (60), histogram subtraction is more meaningful: by the Data Processing Inequality of information theory (61), it preserves more of the valuable information that is helpful in generating the final blur metric. In other words, direct subtraction of generic blur metrics may not carry a tractable physical meaning and hence may not produce a useful camera motion blur metric.
8.3 Experiment
8.3.1 Synthetic Motion Blur
Convolution with a motion blur kernel is one way to model linear motion blur. (62) We first ran our algorithm on synthetically blurred images generated by convolution.
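This synthetic setup can be sketched as follows (pure Python; the box-average kernel, the 256 × 256 binary noise image, and the middle range [64, 192] are illustrative assumptions rather than our exact experimental parameters). A horizontal motion blur of increasing length is applied to a white noise image, and the middle-range pixel count grows with the motion length:

```python
import random

def motion_blur_h(row, length):
    """Convolve one image row with a length-`length` box kernel: a 1-D
    horizontal motion blur kernel with equal non-zero coefficients."""
    out = []
    for c in range(len(row)):
        window = row[max(0, c - length + 1):c + 1]
        out.append(int(round(sum(window) / len(window))))
    return out

def middle_count(img, lo=64, hi=192):
    """Count pixels whose value falls in the middle range of the histogram."""
    return sum(lo <= v <= hi for row in img for v in row)

rng = random.Random(0)
noise = [[255 * rng.randint(0, 1) for _ in range(256)] for _ in range(256)]

for length in (1, 3, 7, 11):
    blurred = [motion_blur_h(row, length) for row in noise]
    print(f"motionLen = {length:2d}: blurred pixels = {middle_count(blurred)}")
```

With length 1 (no blur) the count is zero, and it increases monotonically with the motion length, mirroring the histogram behavior discussed next.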
Figure (8.1) illustrates how the motion range affects the histogram of the synthetically generated images. Note that the motion range corresponds to the number of non-zero coefficients in a motion blur kernel; it is different from the motion length, which denotes the motion translation in pixels. The motion blur kernel is in general a 2D matrix, and it degenerates to a vector for the horizontal and vertical cases. For instance, for the horizontal and vertical cases, the number of non-zero kernel coefficients equals the motion length; for our 45° diagonal case, the number of non-zero kernel coefficients is 7, 17, and 25 when the motion length is 3, 7, and 11 respectively.1
In figure (8.1), the first column shows the histograms of the original motion-free white noise pattern image. The remaining three columns are histograms of images that were subjected to motion blur kernels with the motion length set to 3, 7, and 11 respectively. The histograms in figure (8.1) are organized in three rows, showing the motion blur effect along the horizontal, vertical, and 45° diagonal directions respectively.

1 We use the fspecial('motion', motionLen, angle) function in Matlab to generate those motion blur kernels.

Figure 8.1: Different motion ranges and their resulting histograms. (Each panel plots pixel count against pixel value; rows: horizontal, vertical, and diagonal motion; columns: motion-free, motionLen = 3, 7, and 11.)
As we can see from figure (8.1), a larger motion range results in more 'blurred pixels' in the middle range of the histogram or, equivalently, fewer pure white and pure black pixels at the two ends of the histogram. Additionally, the number of 'blurred pixels' is independent of the motion orientation as long as the number of non-zero kernel coefficients is kept the same, which is desirable for our application.
Figure 8.2: Image crops: the original image and three synthetically blurred images with
different motion ranges.
Figure (8.2) shows crops of the original motion-free white noise pattern image and three synthetically blurred images. The three blurred ones correspond to the histograms at positions (1,2), (2,3), and (3,4) in figure (8.1) respectively.
8.3.2 Real Camera Motion Blur
Following our success on synthetic images, we turned to real camera motion blur. In our experiment, with all shooting settings kept the same, we took a group of 11 test pictures. The third test picture was chosen as the reference picture, since it is the sharpest of all 11. The histogram of this reference picture as well as its center crop are shown in figure (8.3). Note that this histogram is quite different from the ideal case shown in the first column of figure (8.1), which contains only two spikes, one at each extreme end.
Figure 8.3: The reference picture. Left: its histogram; right: the center crop.
Figure (8.4) shows the center crops of four test pictures, and figure (8.5) shows our motion blur metric for all test pictures.
Figure 8.4: From left to right: center crops of pictures 2, 9, 5, and 10.
As we can see, our motion blur metric in figure (8.5) matches very well with the perceived extent of motion blur in figure (8.4). Also note that there is relative camera shift between the test pictures.
Figure 8.5: Our motion blur metric: the blurred pixel count for each of the 11 test pictures.
As for the running speed of our batch code, in practice, loading the test pictures into memory probably consumes more time than the rest of the processing.
8.4 Outlook
In practice, image blur is an overall effect with various origins. For example, lens defocus is another important contributor to the overall image blur. Therefore, in order to build a refined blur metric that reflects only camera motion, removing the blur caused by defocus would be an important pre-processing step.1
In addition, our scheme leaves room for the incorporation of a human visual model specially designed for camera motion blur; i.e. the relationship between the 'blurred pixel count' and the perceived degree of motion blur might be nonlinear. In that case, a weighted blurred pixel count can be used as the final IS benchmark, with the weights based on a suitable Human Visual System model.
To complete the whole working cycle, with the human hand-held motion pattern in mind, we could even assemble a three-dimensional shaking simulator using motors from a fruit blender or other appliances, freeing reviewers from taking enormous numbers of test pictures by hand.
1 We thank Professor Fridrich for pointing this out.
8.5 Conclusion
Since our scheme relies only on the image histogram, a first-order statistic, it runs fast. Besides the speed advantage, this histogram-based metric is readily scalable with respect to resolution and contrast variation when it comes to cross comparison among different camera models.
When compared to the conventional approach requiring human intervention, our method is capable of producing benchmarks at much finer granularity, owing to its capability of handling a much larger test picture space. Furthermore, we rely on numerical threshold values rather than the discretion of human eyes, which makes a fair comparison across different camera models feasible.
Finally, pre-processing such as removing defocus-incurred blur is expected to produce better results.
9 Conclusions
9.1 Reversing Digital Watermarking
9.1.1 Security and Robustness
In the early days of information hiding, people identified three key components of its problem space: imperceptibility, capacity, and robustness. (63) Some believe that more robustness implies more room to reinforce the security of an information hiding scheme.
In this dissertation, we showed that robustness and security should be treated as two independent components of the watermarking problem space by introducing the concept of super-robustness, the property that a watermark survives certain types of attack far beyond what one would expect. Rather than a blessing, super-robustness is in fact a security flaw due to its implicit high selectivity. The power of this technique is demonstrated by our success in the first BOWS contest.
While human expertise is required to identify modes of super-robustness, we further developed an automatic algorithm specifically for watermark detectors based on normalized correlation. Several large false-positive vectors, which we call noise snakes, are automatically grown along the boundary of the detection region, in this case a hyper-cone, and can be used to disclose useful information such as the cone angle and the dimensionality of the watermark feature space. The working assumption here is that we already know the transform and the feature space used by the target watermarking scheme.
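For concreteness, the kind of detector this algorithm targets can be sketched as follows. This is a generic normalized-correlation detector, not the actual contest detector; the threshold value 0.2 is an arbitrary illustration:

```python
import math

def norm_corr(x, w):
    """Normalized correlation between feature vector x and watermark w."""
    dot = sum(a * b for a, b in zip(x, w))
    nx = math.sqrt(sum(a * a for a in x))
    nw = math.sqrt(sum(b * b for b in w))
    return dot / (nx * nw)

def detect(x, w, threshold=0.2):
    """Declare the watermark present iff the normalized correlation exceeds
    the threshold. Geometrically, the detection region is a hyper-cone
    around w with half-angle arccos(threshold), whose boundary is what
    noise snakes probe."""
    return norm_corr(x, w) >= threshold

w = [1.0] * 8
print(detect(w, w))                # -> True (the marked direction is inside the cone)
print(detect([1.0, -1.0] * 4, w))  # -> False (an orthogonal vector falls outside)
```

Because the decision depends only on the angle between x and w, any vector on the cone boundary is accepted regardless of its amplitude, which is precisely what makes noise snakes informative.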
If the feature dimensionality grows very large (> 10^4), a more precise model is needed in order to achieve accurate reversing results. We then explored the mathematical model that describes a hyper-cone intersecting a hyper-sphere, and we successfully reduced this n-dimensional problem to a pseudo 4-D problem whose solution depends on a quartic equation. However, due to the complex nature of the model, neither analytical nor numerical solutions of this quartic equation are obtainable, even with a computer. Luckily, we observed the pattern of the arc angle curve α(h) with respect to the noise amplitude R as the watermark detection rate crosses the critical point (50%).
In general, our mathematical analysis indicates a much harder life for attackers
working in a watermark feature space with very large dimensionality.
9.1.2 Reversing before Attacking
Another important factor that contributed to our success in the BOWS contest is our attacking strategy: instead of concentrating on attacking, we first learn the target watermarking system, especially its detection algorithm, by reverse engineering. The knowledge gained by identifying the modes of super-robustness proved very helpful in designing more guided attacks.
To make our reverse engineering more systematic, we introduced a generic framework for modelling watermark detection systems, consisting of a transform, a feature space, and a detector. Note that this model also applies to many non-watermarking systems that can essentially be modeled as classifiers, which implies that some of the reversing experience gained on this model may be readily transferable to other non-watermarking problems.
9.1.2.1 Caution on Designing Patterns
In our BOWS contest experience, human expertise proved very useful in the sense that it helped us quickly narrow down possible candidates with high confidence. The reason is that watermarking designers tend to follow common design paths, which leaves obvious patterns that let attackers confirm their attacking direction through semantic hints. For example, watermarking researchers as a whole tend to use common transforms, common feature spaces, and common detectors. In particular, in the case of block-DCT embedding, people tend to choose a subband that forms a regular geometric shape, which gives attackers a strong hint that they are on the right track. This observation is backed by our literature survey.
From the perspective of a watermark designer, transforms like (19) might be preferable to those found in textbooks. For the feature space, more randomness and protection should be introduced when selecting the watermark feature space. As for detectors using normalized correlation, the detection cone should at least be truncated in order to prevent information leakage through noise snakes. Some of these design concerns were implemented in the second BOWS contest. (23)
9.2 Circumventing Detection in Steganography
For the task of evading detection, beyond the conventional methodology of mimicking natural noise or preserving the statistics of the cover media, we explore an emerging new philosophy, which works by first creating a stego-friendly channel.
Whereas the subset selection technique described in Chapter 5 is a theoretical effort in this direction, we implemented this idea by introducing the concept of cultural engineering. The term cultural engineering is derived from the term social engineering: it means tricking people into doing certain things by first creating or promoting a related culture that influences them to do things they would not do otherwise. As a real-life implementation, the random animation backdrop in an iChat video-conference session can be driven by our cipher message without raising suspicion, since it looks culturally natural.
In short, while the philosophy of conventional approaches is to make stego data look statistically natural, our goal is to make stego data look culturally natural.
This idea was later extended to audio by another student in my research group, in the form of an iPhone walkie-talkie application. (64)
9.3 Camera-related Image Processing
In Chapter 7, we refined the categorization of camera identification features relative to existing camera forensics survey papers. (65, 66, 67) Additionally, we noticed that the different in-camera shooting modes were a blind spot in previous research on statistical identification features.
9.3.1 White Noise Pattern Image
One of the major contributions of Chapter 7 and Chapter 8 is the introduction of a new shooting target: we use a white noise pattern to replace the commonly used checkerboard pattern image as our shooting target.
For the lens identification scenario, the white noise pattern excels in its position registration capability, so that the extracted CA pattern is perfectly aligned from one test picture to another in the presence of camera movement. For the image stabilization evaluation scenario, this pattern excels in its histogram invariance, so that our motion blur metric can be derived directly from the histograms of test pictures.
9.3.2 Complete Lens Identification Feature Space
Focal distance is another major contribution we made in Chapter 7. Among all user-controllable parameters, focal length and aperture size had previously been found to be important factors in shaping the lens CA pattern. Adding focal distance completes the lens identification feature space and enables us to distinguish different copies of the same lens. Due to its complex mechanical design, the identification feature space of a lens is expected to be larger than that of an image sensor.
9.3.2.1 Outlook for the Bokeh Modulation
If the modes of the bokeh modulation can be modeled and calculated for each individual lens, there is hope of identifying the corresponding lens from certain types of real-life pictures. In other words, the bokeh modulation might be the fourth dimension of a more complete lens identification feature space.
9.3.3 Motion Blur Measure from Histogram
Thanks to the histogram-invariant feature of our white noise pattern image, we can build a motion blur metric directly from the histograms of test pictures, which is very fast.
When compared with current image stabilization evaluation methods, which either require human visual assessment or use generic blur metrics, this consistent blur metric enables a fair comparison across image stabilization systems from different camera manufacturers.
In addition, according to the Data Processing Inequality in information theory (61),
the histogram subtraction operation in our algorithm is better than subtraction between
generic blur metrics as in (60).
Finally, removing the image blur caused by lens defocus would further improve the accuracy of the metric.
References

[13] M. L. Miller, G. J. Doerr, and I. J. Cox. Applying Informed Coding and Embedding to Design a Robust, High Capacity Watermark. IEEE Trans. on Image Processing, 13:792–807, June 2004. 11, 13, 15, 32, 33

[14] A. Westfeld. Lessons from the BOWS Contest. In MM&Sec '06: Proc. 8th Workshop on Multimedia and Security, pages 208–213, 2006. 11

[15] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon. Secure Spread Spectrum Watermarking for Multimedia. IEEE Trans. on Image Processing, 6:1673–1687, 1997. 13
[1] Wikibooks. Tcl Programming. http://en.wikibooks.org/wiki/Tcl_Programming/Print_version, May 2009. xiv, 58
[2] Elliot J. Chikofsky and James H. Cross II. Reverse engineering and design recovery: a taxonomy. IEEE
Software Magazine, 7(1):13–17, Jan 1990. 5
[16] S. Craver and J. Yu. Reverse-engineering a detector with false alarms. In Proc. SPIE, Security, Steganography, and Watermarking of Multimedia Contents IX, 2007. 17, 33, 34

[17] Scott Craver, Idris Atakli, and Jun Yu. Reverse-engineering a watermark detector using an oracle. EURASIP J. Inf. Secur., 2007:5:1–5:16, January 2007. 17, 33, 34
[3] Eldad Eilam. Reversing: Secrets of Reverse Engineering, pages xxiii–23. Wiley Publishing Inc, 2005. 5, 7
[18] Chun-Shien Lu, Shih-Kun Huang, Chwen-Jye Sze, and Hong-Yuan Mark Liao. Cocktail Watermarking for Digital Image Protection. IEEE Trans. on Multimedia, 2:209–224, 2000. 18
[4] Jean-Paul M. G. Linnartz and Marten van Dijk. Analysis of the Sensitivity Attack against Electronic Watermarks in Images. Lecture Notes in Computer Science, 1525:258–272, 1998. 6, 20
[19] J. Fridrich, A. C. Baldoza, and R. J. Simard. Robust
Digital Watermarking Based on Key-Dependent
Basis Functions. Lecture Notes in Computer Science,
1525:143–157, 1998. 19, 93
[5] P. Comesana, L. Perez-Freire, and F. Perez-Gonzalez.
Blind newton sensitivity attack. IEE Proceedings on
Information Security, 153:115–125, Sept. 2006. 6, 20
[20] I. Atakli, S. Craver, and J. Yu. How we Broke the
BOWS Watermark. In Proc. SPIE, Security, Steganography, and Watermarking of Multimedia Contents IX,
2007. 20, 21, 33, 41
[6] Fabien A. P. Petitcolas, Ross J. Anderson, and Markus G.
Kuhn. Attacks on Copyright Marking Systems.
In Proceedings of the Second International Workshop on
Information Hiding, pages 218–238, London, UK, 1998.
Springer-Verlag. 6
[21] I. J. Cox, M. L. Miller, and J. A. Bloom. Digital Watermarking. Morgan Kaufmann Publishers, Inc., San Francisco, CA, USA, 2002. 23
[7] Ingemar Cox, Matthew Miller, Jeffrey Bloom, Jessica
Fridrich, and Ton Kalker. Digital Watermarking and
Steganography, pages 335–374. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2nd edition, 2007.
6
[22] Eric W. Weisstein. Hypersphere. From MathWorld, A Wolfram Web Resource. http://www.mathworld.wolfram.com/Hypersphere.html. 27
[8] Secure Digital Music Initiative. SDMI Public Challenge. http://www.hacksdmi.org, Sept. 2000. 6, 9
[23] Teddy Furon and Patrick Bas.
Broken arrows.
EURASIP J. Inf. Secur., 2008:1–13, 2008. 33, 93
[9] Watermark Virtual Laboratory(Wavila) of the ECRYPT.
The Break Our Watermarking System (BOWS)
Contest. http://lci.det.unifi.it/BOWS/, 2006. 8, 11, 33
[24] Christian Cachin. An information-theoretic model
for steganography. Inf. Comput., 192:41–56, July
2004. 45
[10] S. Craver, M. Wu, Liu, A. Stubblefield, B. Swartzlander, D. S. Dean, and E. Felten. Reading Between the
Lines: Lessons Learned from the SDMI Challenge.
In Proc. of the 10th Usenix Security Symposium, pages
353–363, August 2001. 9, 10, 20
[25] Andrew D. Ker, Tomáš Pevný, Jan Kodovský, and Jessica
Fridrich. The square root law of steganographic
capacity. In Proceedings of the 10th ACM workshop on
Multimedia and security, MMSec ’08, pages 107–116, New
York, NY, USA, 2008. ACM. 45
[11] J. Boeuf and J. P. Stern. An Analysis of One of the
SDMI Candidates. Lecture Notes in Computer Science,
2137:395–??, 2001. 9
[26] Tomáš Filler, Andrew D. Ker, and Jessica J. Fridrich. The square root law of steganographic capacity for Markov covers. In Storage and Retrieval for Image and Video Databases, 2009. 45, 46
[12] R. Petrovic, B. Tehranchi, and J. M. Winograd. Digital
Watermark Security Considerations. In MM&Sec
’06: Proc. 8th Workshop on Multimedia and Security,
pages 152–157, 2006. 9
[27] Andrew D. Ker. Estimating the Information Theoretic Optimal Stego Noise. In IWDW’09, pages 184–
198, 2009. 45, 46
[28] Ying Wang and Pierre Moulin.
Perfectly Secure
Steganography: Capacity, Error Exponents, and
Code Constructions. Computing Research Repository,
abs/cs/070, 2007. 50
[41] Jan Lukáš, Jessica J. Fridrich, and Miroslav Goljan. Digital camera identification from sensor pattern noise. IEEE Transactions on Information Forensics and Security, pages 205–214, 2006. 66, 67
[29] Scott Craver. On Public-Key Steganography in the
Presence of an Active Warden. Lecture Notes in
Computer Science, 1525:355–368, 1998. 51, 52
[42] Mo Chen, Jessica Fridrich, and Miroslav Goljan. Digital imaging sensor identification (further study). In Security, Steganography, and Watermarking of Multimedia Contents IX. Edited by Delp, Edward J., III; Wong, Ping Wah. Proceedings of the SPIE, 6505, 2007. 66, 67, 73
[30] Peter Wayner.
Disappearing Cryptography: Information Hiding: Steganography & Watermarking 2nd Edition.
Morgan Kaufmann, 2002. 52
[43] J. Fridrich and M. Goljan. Determining approximate
age of digital images using sensor defects. In Society
of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, 7880 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, February
2011. 66
[31] Ross J. Anderson and Fabien A. P. Petitcolas. On the
Limits of Steganography. IEEE Journal of Selected
Areas in Communications, 16(4):474–481, 1998. Special
issue on copyright & privacy protection. 53
[44] Alin C. Popescu and Hany Farid. Exposing digital
forgeries in color filter array interpolated images.
IEEE Transactions on Signal Processing, 53:3948–3959,
2005. 66
[32] John Giffin, Rachel Greenstadt, Peter Litwack, and
Richard Tibbetts. Covert messaging through TCP
timestamps. In Proceedings of the 2nd international conference on Privacy enhancing technologies, PET’02, pages
194–208, Berlin, Heidelberg, 2003. Springer-Verlag. 55
[45] Sevinc Bayram, Husrev T. Sencar, and Nasir Memon. Improvements on source camera-model identification based on CFA interpolation. In Proceedings of
WG 11.9 International Conference on Digital Forensics,
2006. 66
[33] Daisuke Inoue and Tsutomu Matsumoto.
Scheme of
standard MIDI files steganography and its evaluation. In SPIE Security and Watermarking of Multimedia
Contents IV, 4675, pages 194–205, 2002. 55
[46] Weihong Wang and Hany Farid. Exposing digital forgeries in video by detecting double quantization.
In Proceedings of the 11th ACM workshop on Multimedia
and security, MM&Sec ’09, pages 39–48, New York, NY,
USA, 2009. ACM. 66
[34] Anthony Steven Szopa. New Steganography Innovation from Ciphile Software. talk.politics.crypto, August 2001. 3B894FCE.FC4B729E@ciphile.com. 56
[35] NetApplications.com. Operating System Market Share for January, 2008. http://marketshare.hitslink.com/report.aspx?qprid=8. 56
[47] Eric Kee, Micah K. Johnson, and Hany Farid. Displayaware Image Editing. IEEE Transactions on Information Forensics and Security, 2011. 66
[48] Micah K. Johnson and Hany Farid. Exposing digital
forgeries by detecting inconsistencies in lighting.
In Proceedings of the 7th workshop on Multimedia and security, MM&Sec ’05, pages 1–10, New York, NY, USA,
2005. ACM. 66
[36] Kevin Mitnick and William Simon. The Art of Deception:
Controlling the Human Element of Security. John Wiley
& Sons, 2002. 57
[37] Jenny Leung, Jozsef Dudas, Glenn H. Chapman, Israel
Koren, and Zahava Koren. Quantitative Analysis of
In-Field Defects in Image Sensor Arrays. Defect
and Fault-Tolerance in VLSI Systems, IEEE International
Symposium on, 0:526–534, 2007. 66
[49] Mehdi Kharrazi, Husrev T. Sencar, and Nasir D. Memon.
Blind source camera identification. In ICIP’04,
pages 709–712, 2004. 66
[50] Alin C. Popescu and Hany Farid. Statistical Tools for Digital Forensics. In 6th International Workshop on Information Hiding, pages 128–147. Springer-Verlag, Berlin-Heidelberg, 2004. 66
[38] K. S. Choi, E. Y. Lam, and K. K. Y. Wong. Source camera identification using footprints from lens aberration. In N. Sampat, J. M. DiCarlo, & R. A. Martin, editor, Society of Photo-Optical Instrumentation Engineers
(SPIE) Conference Series, 6069 of Society of PhotoOptical Instrumentation Engineers (SPIE) Conference Series, pages 172–179, February 2006. 66, 68, 71
[51] Y.-F. Hsu and S.-F. Chang. Image Splicing Detection Using Camera Response Function Consistency and Automatic Segmentation. In International Conference on Multimedia and Expo, 2007. 66
[39] Micah K. Johnson and Hany Farid. Exposing digital
forgeries through chromatic aberration. In MMSec
’06: Proceedings of the 8th workshop on Multimedia and
security, pages 48–55, New York, NY, USA, 2006. ACM.
66, 68
[52] Flickr.com. Most Popular Cameras in the Flickr
Community. http://www.flickr.com/cameras, 2010. 67
[53] John Mallon and Paul F. Whelan. Calibration and removal of lateral chromatic aberration in images.
Pattern Recogn. Lett., 28(1):125–135, 2007. 69
[40] Thomas Gloe, Karsten Borowka, and Antje Winkler.
Efficient estimation and large-scale evaluation of
lateral chromatic aberration for digital image
forensics. In Nasir D. Memon, Jana Dittmann, Adnan M.
Alattar, and Edward J. Delp III, editors, Media Forensics
and Security II, 7541, page 754107. SPIE, 2010. 66, 68,
69, 71, 76
[54] Sing Bing Kang. Automatic Removal of Chromatic
Aberration from a Single Image. In Computer Vision
and Pattern Recognition, IEEE Computer Society Conference on, 0, pages 1–8. IEEE Computer Society, 2007.
69
97
REFERENCES
[55] Hiroshi Shimamoto Takayuki Yamashita. Automatic Removal of Chromatic Aberration from a Single Image. In Morley M. Blouke, editor, Sensors, Cameras, and
Systems for Scientific/Industrial Applications VII, 6068.
SPIE, 2006. 69
[62] Sebastian Schuon and Klaus Diepold. Comparison of
Motion Deblur Algorithms and Real World Deployment. Acta Astronautica, 64, 2009. 86
[63] W. Bender, D. Gruhl, N. Morimoto, and Aiguo Lu.
Techniques for data hiding. IBM Syst. J., 35:313–
336, September 1996. 91
[56] Stephen M. Smith and J. Michael Brady. SUSAN A
New Approach to Low Level Image Processing. International Journal of Computer Vision, 23:45–78, 1997.
10.1023/A:1007963824710. 69
[64] Enping Li and Scott Craver. A supraliminal channel in a wireless phone application. In Proceedings
of the 11th ACM workshop on Multimedia and security,
MM&Sec ’09, pages 151–154, New York, NY, USA, 2009.
ACM. 93
[57] X. C. He and N. H. C. Yung. Curvature Scale Space
Corner Detector with Adaptive Threshold and
Dynamic Region of Support. Pattern Recognition,
International Conference on, 2:791–794, 2004. 69
[65] Tian tsong Ng, Shih fu Chang, Ching yung Lin, and Qibin
Sun. Passive-blind Image Forensics. In In Multimedia
Security Technologies for Digital Rights, 2006. 93
[58] Wikipedia.com.
Bokeh.
http://en.wikipedia.org/wiki/Bokeh, August 2011. 79
[59] dpreview.com.
Digital Photography
http://www.dpreview.com/, 2011. 82, 83
[66] Tran Van Lanh, Kai-Sen Chong, Sabu Emmanuel, and Mohan S. Kankanhalli. A Survey on Digital Camera
Image Forensic Methods. In ICME’07, pages 16–19,
2007. 93
Review.
[60] SLRgear.com.
SLRgear
Image
Stabilization
Testing,
Version
2.
http://www.slrgear.com/articles/ISWP2/ismethods_v2.html,
2011. 82, 83, 86, 95
[67] Nitin Khanna, Aravind K. Mikkilineni, Anthony F. Martone, Gazi N. Ali, George T.-C. Chiu, Jan P. Allebach,
and Edward J. Delp. % bf A survey of forensic characterization methods for physical devices. Digital Investigation, 3(Supplement 1):17 – 28, 2006. The Proceedings
of the 6th Annual Digital Forensic Research Workshop
(DFRWS ’06). 93
[61] Thomas M. Cover and Joy Thomas. Elements of Information Theory, chapter 2, page 32. Wiley, 1991. 86, 95
98