DETECTION PROBLEMS IN WATERMARKING, STEGANOGRAPHY, AND CAMERA IDENTIFICATION

BY

JUN (EUGENE) YU
BS, Lanzhou University, China, 2001
MS, Lanzhou University, China, 2004

DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering in the Graduate School of Binghamton University, State University of New York, 2011

UMI Number: 3474057. Copyright 2011 by ProQuest LLC.

© Copyright by Jun Yu 2011. All Rights Reserved.

Accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering in the Graduate School of Binghamton University, State University of New York, 2011. July 29, 2011

Scott A. Craver, Chair and Faculty Advisor, Department of Electrical Engineering, Binghamton University
Jessica Fridrich, Member, Department of Electrical Engineering, Binghamton University
Mark Fowler, Member, Department of Electrical Engineering, Binghamton University
Kenneth Chiu, Outside Examiner, Department of Computer Science, Binghamton University

Abstract

This dissertation covers research results in three fields: reverse engineering digital watermarking systems, security in steganography, and camera-related digital image processing. The research in the first field follows a new attacking philosophy: learn the target watermarking system before attacking it.
A 3-stage model is introduced for generic watermark detection systems, so as to aid reversing in a systematic manner. Two reversing techniques, super-robustness and noise snakes, are developed and applied to real-life attacks. To handle watermarks residing in feature spaces of very large dimensionality, a more precise geometric model for normalized-correlation-based watermark detectors is presented and analyzed.

The research in the second field follows the philosophy of creating a stego-friendly cover that eases embedding restrictions. As a theoretical trial, the subset selection technique is examined under the i.i.d. and first-order Markov cases. As a practical implementation, this philosophy is realized in the Apple iChat video-conferencing software via cultural engineering. The goal of this cultural engineering approach is to render covert data culturally undetectable rather than statistically undetectable.

The research in the last category includes two topics: DSLR lens identification and benchmarking camera image stabilization (IS) systems. In both applications, a white-noise pattern image is used as the shooting target, superseding the commonly used checkerboard pattern: it excels at position registration in the former task and offers histogram invariance in the latter. For the task of distinguishing different copies of the same lens using chromatic aberration, focal distance is determined to be another important factor shaping the lens chromatic aberration pattern, which completes the lens identification feature space. For the task of evaluating camera IS systems, a camera-motion blur metric generated directly from test-picture histograms is developed to fulfill the benchmarking task without resorting to either human visual assessment or other more complex motion blur metrics.
Acknowledgements

I would like to thank my advisor Scott Craver for guiding me, step by step, in how to perform scientific research as well as reverse engineering¹, and in how to behave as a rigorous researcher. I am also very grateful for his generous and continuous financial support in these years. I would also like to thank Professor Jessica Fridrich for her very systematic way of teaching, her inspiring discussions, and her enlightening comments on this dissertation. I must thank all the professors who showed me human wisdom in the fields of Electrical Engineering and Mathematics in their courses during my studies at Binghamton University. Special thanks to my friends Enping Li, Idris Atakli, Yibo Sun, Hongbing Hu, Xiao Xiao, Kun Huang, Mo Chen, Ming Liu, Pranav, and Yan Zhang for our cherished memories here in Binghamton. Finally, I am very grateful to my parents Kaibei Yu and Xiaoqiao Lian for their wholehearted support in all these years.

¹ For the difference between the two subjects, please refer to Section 2.1.1.

Contents

List of Tables
List of Figures

1 Introduction
   1.1 Organization
   1.2 Specific Contributions

2 Breaking the BOWS Watermark by Super-robustness
   2.1 Introduction
      2.1.1 Reverse Engineering
      2.1.2 Reversing Digital Watermark
   2.2 A 3-Stage Watermarking System Model
      2.2.1 Common Transforms and Detection Algorithms
   2.3 Breaking BOWS by Identifying Super-robustness
      2.3.1 INTRODUCTION
         2.3.1.1 Goals of the Contest
         2.3.1.2 A Comparison to the SDMI Challenge
         2.3.1.3 Results
      2.3.2 REVERSE ENGINEERING
         2.3.2.1 Determining the Feature Space
         2.3.2.2 Determining the Sub-band
      2.3.3 BREAKING THE WATERMARK
         2.3.3.1 A Curious Anomaly
         2.3.3.2 The Rest of the Algorithm
      2.3.4 SUPER-ROBUSTNESS AND ORACLE ATTACKS
         2.3.4.1 Oracle Attacks Exploiting Super-robustness
         2.3.4.2 Mitigation Strategies
         2.3.4.3 Other Advice to the Watermark Designer
      2.3.5 CONCLUSIONS

3 Reversing Normalized Correlation based Watermark Detector by Noise Snakes
   3.1 INTRODUCTION
      3.1.1 Motivation
   3.2 SUPER-ROBUSTNESS AND FALSE POSITIVES
      3.2.1 Examples of Super-robustness
      3.2.2 Generic Super-robustness
   3.3 MEASURING A DETECTION REGION
      3.3.1 Noise Snakes
      3.3.2 The Growth Rate of a Noise Snake
      3.3.3 Information Leakage from Snakes
      3.3.4 Extracting the Size of the Feature Space
   3.4 RESULTS
   3.5 DISCUSSION AND CONCLUSIONS
      3.5.1 Is it Useful?
      3.5.2 Attack Prevention

4 A More Precise Model for Watermark Detector Using Normalized Correlation
   4.1 Introduction
   4.2 Problem Setting
   4.3 Quartic Equation
   4.4 Numerical Solutions
      4.4.1 Identify Valid Roots
      4.4.2 Detection Rate
      4.4.3 The Pattern of α(h)
         4.4.3.1 Five AC DCT Embedding
         4.4.3.2 Twelve AC DCT Embedding
   4.5 Conclusions

5 Circumventing the Square Root Law in Steganography by Subset Selection
   5.1 The Square-root Law in Steganography
   5.2 Circumventing the Law by Subset Selection
      5.2.1 Introduction
      5.2.2 The i.i.d. Case
      5.2.3 The First Order Markov Case
         5.2.3.1 The Algorithm
   5.3 Conclusion

6 Covert Communication via Cultural Engineering
   6.1 Introduction
   6.2 Supraliminal Channels
      6.2.1 A two-image protocol for supraliminal encoding
      6.2.2 Public key encoding
   6.3 Manufacturing a Stego-friendly Channel
      6.3.1 Apple iChat
      6.3.2 Cultural engineering
   6.4 Software Overview
      6.4.1 Quartz Composer animations
      6.4.2 Receiver architecture and implementation issues
   6.5 Results
   6.6 Discussions and Conclusions

7 DSLR Lens Identification via Chromatic Aberration
   7.1 Introduction on Camera-related Digital Forensics
      7.1.1 Forensic Issues in The Information Age
      7.1.2 Digital Camera: 'Analog' Part and 'Digital' Part
      7.1.3 Identification Feature Category
   7.2 DSLR Lens Identification
      7.2.1 Motivation
         7.2.1.1 Lens Aberrations
         7.2.1.2 Previous Works
      7.2.2 A New Shooting Target
         7.2.2.1 Pattern Misalignment Problems
         7.2.2.2 White Noise Pattern
         7.2.2.3 Paper Printout vs LCD Display
      7.2.3 Major Feature-Space Dimensions of Lens CA Pattern
         7.2.3.1 Focal Length
         7.2.3.2 Focal Distance
         7.2.3.3 Aperture Size and Other Issues
         7.2.3.4 Size of a Complete Lens CA Identification Feature Space
      7.2.4 Experiment
         7.2.4.1 Test Picture Shooting Settings
         7.2.4.2 CA Pattern Extraction
         7.2.4.3 CA Pattern Comparison Results
      7.2.5 Outlook
         7.2.5.1 Improve the Identification Capability
         7.2.5.2 The Potential of the Bokeh Modulation
      7.2.6 Conclusions

8 Benchmarking Camera IS by the Picture Histograms of a White Noise Pattern
   8.1 Motivation
      8.1.1 The Volume of Test Picture Space
      8.1.2 The Consistency of an IS Metric
      8.1.3 Our Goal
   8.2 Our Automatic IS Benchmarks
      8.2.1 A Histogram based IS Metric
      8.2.2 A Histogram-Invariant Shooting Target
      8.2.3 Our Algorithm
         8.2.3.1 The Rationale behind the Histogram Subtraction Operation
   8.3 Experiment
      8.3.1 Synthetic Motion Blur
      8.3.2 Real Camera Motion Blur
   8.4 Outlook
   8.5 Conclusion

9 Conclusions
   9.1 Reversing Digital Watermarking
      9.1.1 Security and Robustness
      9.1.2 Reversing before Attacking
         9.1.2.1 Caution on Designing Patterns
   9.2 Circumventing Detection in Steganography
   9.3 Camera related Image Processing
      9.3.1 White Noise Pattern Image
      9.3.2 Complete Lens Identification Feature Space
         9.3.2.1 Outlook for the Bokeh Modulation
      9.3.3 Motion Blur Measure from Histogram

References

List of Tables

2.1 The distribution of transform types.
2.2 The distribution of detection algorithms.
2.3 Successive attacks for image 1. By amplifying a few AC coefficients by the right value, the detector will fail.
2.4 Successive attacks for image 2. We start with 9 AC coefficients. 'R' denotes that the coefficient is no longer needed when the gain increases.
7.1 Six Canon DSLR lenses.
7.2 The sequential order of test pictures.

List of Figures

2.1 Three-stage decomposition of a watermark detection system.
2.2 The three images from the BOWS contest.
2.3 Some visible artifacts in the image, suggesting either severe compression or a block-based watermark.
2.4 Some severe transformations of an image. Left, a shuffling of blocks aimed at preserving block-based FFT magnitudes. Right, a severe addition of 8-by-8 DC offsets. This preserved the watermark with only 3.94 dB PSNR.
2.5 Left: AC removal pattern 1; Center: AC removal pattern 2; Right: intersection of patterns 1 and 2.
2.6 The three images after the attack. Note the obvious block artifacts.
3.1 An image from the BOWS contest, and a severe distortion that preserves the watermark. From this experimental image we concluded that the watermark was embedded in non-DC 8-by-8 DCT coefficients.
3.2 A normalized correlation detection region with threshold τ.
3.3 The construction of a noise snake.
3.4 A pair of constructed noise snakes, used to plumb the detection cone.
3.5 Optimal growth factors for a snake of unit length. On the left, a cone angle of π/3 and n = 1000. On the right, π/3 and n = 9000.
3.6 Lemma: two independently generated snakes have approximately perpendicular off-axis components.
3.7 The relation between the cone angle and the dot product of two independently generated noise snakes.
3.8 Estimated threshold using two noise snakes. Ground truth is τ = 0.5, n = 500.
3.9 Estimated dimension using two noise snakes. Ground truth is n = 500.
4.1 A detection cone intersects with a Gaussian noise ball.
4.2 Three possible geometrical relationships. The arc in solid line is within the detection cone.
4.3 Left: AC coefficients used for embedding. Middle: image attacked by noise with amplitude R = 10000. Right: image attacked by noise with amplitude R = 15000.
4.4 Detection threshold is 0.011. Left: the arc angle α(h) predicted by our model. Right: the detection rate obtained by the Monte Carlo method.
4.5 Detection threshold is 0.015. Left: the arc angle α(h) predicted by our model. Right: the detection rate obtained by Monte Carlo experiment.
4.6 Detection threshold is 0.008. Left: the arc angle α(h) predicted by our model. Right: the detection rate obtained by the Monte Carlo method.
4.7 Detection threshold is 0.01. Left: the arc angle α(h) predicted by our model. Right: the detection rate obtained by Monte Carlo experiment.
5.1 The second and third steps of our algorithm.
5.2 The worst overlapping case.
6.1 The two-image protocol for establishing a supraliminal channel from an image hash. Alice and Bob's transmissions are each split across two images, with the semantic content of the second keying the steganographic content of the first.
6.2 The transmission pipeline. Data is fed to a ciphertext server, polled by a doctored PRNG patch in an animation.
6.3 The 14-line ciphertext server, adapted from a 41-line web server in (1).
6.4 Quartz Composer flow-graph for our sample animation. At the top, the "Ciphertext Gateway" is polled by a clock to update the state of the display. The animation contains two nested iterators to draw columns and rows of sprites (blocks).
6.5 Average occlusion of background pixels over 31369 frames (16 minutes 28 seconds) of a video chat session.
6.6 BER (left) and block rows transmitted per second (right) with our animation/decoder pair, for several block sizes and attempted periods.
6.7 Supraliminal backdrops. On the left, colored blocks. On the right, the user's office superimposed with rain clouds. The animations are driven either by pseudo-random bits or ciphertext, estimable by the recipient.
7.1 White noise pattern image as shooting target.
7.2 Two B-G channel CA patterns with one lens working at different focal lengths.
7.3 Two B-G channel CA patterns with one lens working at different focal distances.
7.4 Mismatch plot. Left: CA pattern of p31 used as reference. Right: CA pattern of p22 used as reference.
7.5 Mismatch plot. Left: CA pattern of p13 used as reference. Right: CA pattern of p1 used as reference.
7.6 Bokeh pattern: the polygon shapes are due to the 8-blade aperture diaphragm being slightly closed.
8.1 Different motion ranges and their resulting histograms.
8.2 Image crops: the original image and three synthetically blurred images with different motion ranges.
8.3 The reference picture. Left: its histogram; right: the center crop.
8.4 From left to right: center crop of pictures 2, 9, 5, 10.
8.5 Our motion blur metric.

1 Introduction

My dissertation encompasses research that can mainly be categorized into three fields: reverse engineering watermarking systems, security in steganography, and camera-related digital image processing and forensics. In essence, detection is the keyword common to all three fields. To be specific,

• Reversing a digital watermark is about defeating a watermark detector by first learning its inner workings.

• Circumventing the Square Root Law in steganography is about achieving, at least in theory, a bigger embedding payload while preserving the statistics of the cover media, so that the act of embedding is hard to detect.
• Covert communication via Cultural Engineering is about smuggling secret data through popular communication software without raising suspicion.

• DSLR lens identification is about distinguishing individual lenses by using each lens's chromatic aberration pattern as its fingerprint.

• Benchmarking camera Image Stabilization systems is about detecting and measuring camera-incurred motion blur in an automatic and straightforward way.

For detection problems, as long as security is involved, there is always a war between system designers and attackers. In many application scenarios, the state of the art is a cat-and-mouse game. We hope our research can shed some light on the final judgement as to which side is more likely to win in the scenarios listed above.

1.1 Organization

Chapters 2, 3, and 4 present several techniques for reverse-engineering a watermarking system, in an evolving order. Chapter 2 begins by introducing a generic 3-stage framework for modeling watermarking systems. It then applies this model to a real-life open challenge, the first BOWS¹ contest. The reversing technique of super-robustness, which enabled us to win the first phase of BOWS, is described there. Chapter 3 further develops the idea of super-robustness for watermark detectors based on normalized correlation. It describes how to automatically generate large false-positive vectors that can be used to reveal the geometry of the watermark detection region. Chapter 3 employs a simplified geometric model which makes the computation tractable, and succeeds in handling watermark feature-space dimensionality on the order of 10². Chapter 4 explores watermarking systems with much larger feature-space dimensionality (on the order of 10⁴) by using a more precise geometric model. It delves into the mathematics describing the situation where a hyper-cone intersects a hypersphere in high-dimensional space.
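The detection-cone geometry that Chapters 3 and 4 analyze can be made concrete with a toy normalized-correlation detector. The sketch below is illustrative only: the function name `nc_detect`, the dimensionality, the embedding gain, and the threshold are assumptions for the example, not parameters of any real system studied here.

```python
import numpy as np

def nc_detect(x, w, tau):
    """Toy normalized-correlation detector: declare the watermark present
    when x.w / (|x||w|) >= tau. Geometrically, the detection region is a
    cone around the watermark direction w with half-angle arccos(tau)."""
    nc = np.dot(x, w) / (np.linalg.norm(x) * np.linalg.norm(w))
    return bool(nc >= tau)

rng = np.random.default_rng(0)
n = 500                                # feature-space dimensionality (made up)
w = rng.standard_normal(n)             # secret watermark direction
host = rng.standard_normal(n)          # unmarked host features
marked = host + 5.0 * w                # illustrative additive embedding

print(nc_detect(host, w, 0.5))         # False: random vectors are near-orthogonal
print(nc_detect(marked, w, 0.5))       # True: marked vector lies inside the cone
```

In high dimensions a random host vector is nearly orthogonal to w, so false positives from noise alone are rare; the noise-snake idea of Chapter 3 instead constructs large false positives deliberately, hugging the cone's boundary.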
Although neither a symbolic solution nor a complete numerical solution is obtainable due to the complex nature of the model, Chapter 4 still finds one way to disclose information about the watermark detector.

Chapter 5 discusses the possibility of circumventing the Square Root Law in steganography. Both the i.i.d. and the first-order Markov cases are analyzed there.

Other than mimicking natural noise or preserving the statistics of the cover media, Chapter 6 provides a new philosophy of covert communication, namely Cultural Engineering. It then implements this idea by exploiting a video-conferencing application's random background animation to smuggle secret data.

Chapter 7 examines the problem of DSLR camera lens identification. The lens chromatic aberration (CA) pattern is used as each lens's unique fingerprint. Focal distance is discovered to be another important factor shaping the lens CA pattern, and this discovery completes the feature space for lens CA data. After taking all the relevant factors into account, Chapter 7 succeeds in distinguishing different copies of the same lens.

Chapter 8 presents a fast and automatic way of measuring camera-motion-incurred blur² for a very practical application: benchmarking different camera Image Stabilization systems. It introduces a white-noise pattern image as the shooting target, which enables the blur measure to be derived directly from test-picture histograms.

¹ BOWS stands for 'Break Our Watermarking System'.

1.2 Specific Contributions

This section provides a summary of my contributions in this dissertation.

1. Super-robustness is found to be a security flaw rather than a blessing for a watermarking system.
Along with two other team members (Professor Craver and Idris Atakli), by identifying traces of super-robustness, we were able to reverse the type of transform and the watermark feature space used by the BOWS watermark system, which enabled us to design a 'guided attack' later. I came up with the attack configurations for all three contest images that gave us the winning PSNR.

2. Noise snakes are large false positives grown along the detection boundary of a normalized-correlation-based watermark detector, which can be used to reverse the geometry of the detection region. This idea is inspired by the concept of super-robustness. While identifying super-robustness requires human expert knowledge, noise snakes are generated automatically by our algorithm. Along with Professor Craver, I designed the noise snake technique.

3. For normalized-correlation-based watermarking systems using a feature space of very large dimensionality, a precise geometric model is needed in order to ensure accurate reversing results. I delved into the mathematical description of this model, which boils down to a quartic equation. After my efforts at seeking analytical and numerical solutions both failed due to the complex nature of the model, I observed one pattern that can still be used to aid our reversing task. To be more specific, I noticed that as the watermark detection rate crosses 50%, there is a corresponding pattern change in the tail part of the arc-angle curve α(h)³ with respect to the noise strength R.

² Note that this is different from motion blur caused by moving objects.
³ The intersection angle of the watermark detection region and the noise sphere, in high-dimensional space.

4. Let g denote the steganographic embedding function, with the Kullback-Leibler divergence used as the security measure. A subset of the cover whose pdf f_g is a fixed point under g can be selected to achieve a linear embedding rate without leaving statistical traces.
I analyzed the embedding payload for both the i.i.d. and first-order Markov cases of this subset selection technique.

5. The term Cultural Engineering refers to the act of tricking people into doing certain things by first creating or promoting a relevant culture, and then using that culture to influence people to do things they would otherwise be unlikely to do. Along with Professor Craver and Enping Li, we implemented a covert communication channel through Apple iChat's video-conferencing facility. To be specific, I designed the 'rain drop' animation using the Apple Quartz Composer programming environment. I also implemented a matched filter for decoding the covert data hidden in the 'rain drop' animation.

6. Chromatic aberration (CA) is a class of inevitable artifacts in all optical lens systems. I discovered that the lens focal distance is another important factor shaping the lens CA pattern, one which had not received attention in previous research. This discovery completes the feature space of the lens CA identification data. Along with the new white-noise pattern shooting target, which is a contribution from Professor Craver, I was able to distinguish different copies of the same lens via their CA patterns.

7. The Image Stabilization (IS) system is a very useful add-on in most of today's digital cameras. However, there is no practical way to fairly benchmark and compare IS systems from different camera manufacturers. Thanks to our white-noise pattern shooting target, I was able to build a motion blur measure directly from test-picture histograms, which can be used to benchmark the effectiveness of a camera IS system in a fast and accurate manner.

2 Breaking the BOWS Watermark by Super-robustness

2.1 Introduction

2.1.1 Reverse Engineering

The concept of reverse engineering probably dates back to the days of the industrial revolution.
As defined in (2), reverse engineering is the process of analyzing a subject system to identify the system's components and their interrelationships, and to create representations of the system in another form or at a higher level of abstraction. Reverse engineering is very similar to scientific research; the difference lies in their subjects: the subject of reverse engineering is man-made, while the subject of scientific research is a natural phenomenon that no one knows was ever engineered. (3)

Perhaps the most well-known application of reverse engineering is the development of competing products, practiced in most industries. Due to computer software's ever-growing complexity, reverse engineering has been widely used for visualizing a subject software's structure, its modes of operation, and the features that drive its behavior. In particular, it has been used in both designing and defending against malicious software, in evaluating cryptographic algorithms, and in defeating digital rights management (DRM), an act prohibited by the Digital Millennium Copyright Act. (3) In fact, reverse engineering applies to virtually any system that can be modeled as a classifier, in which case the reverser has control over the input signal and gets feedback from the classifier for each input. Problems in watermarking, biometrics, and steganalysis all fall into this category.

2.1.2 Reversing Digital Watermark

During the transition from the 20th to the 21st century, digital watermarking was seen as a promising solution to the DRM (Digital Rights Management) problem. After years of research and commercialization efforts, however, things have not gone as expected, for a variety of reasons. On the technical side, this is primarily due to three reasons:

• Unlike the objects of cryptography, the multimedia objects in digital watermarking, such as digital images, audio, and video, are all "continuous" in some sense.
In practice, given a watermarked media object and its corresponding un-watermarked version¹, a binary search between the two can be performed to locate the local watermark detection boundary. As a consequence, the geometry of a watermark detection region can be gradually disclosed by an Oracle Attack, and the knowledge obtained can then be used to design a very efficient attack. For example, Sensitivity Attacks can often achieve a PSNR even higher than that of the watermark embedding process. (4, 5)

• In practical scenarios, watermarking systems are open to a variety of attacks. (6, 7, 8)

• How to model the behavior of human attackers is an unsolved problem, which implies that there are no well-proven security guidelines available for watermarking system designers.

Reverse engineering digital watermarks is important because it provides a way of quantifying the security of watermarking systems. In return, the lessons learned during reverse engineering can serve as design guidance toward more secure watermarking systems. This chapter examines a real-life example showing how a watermarking system can be reversed and then attacked. It begins by presenting a 3-stage framework for modeling generic watermarking systems. The reversing techniques we used to break the BOWS watermark are covered here.

¹ Un-watermarked media could either be the original media, or a watermarked version subjected to an attack so heavy that the embedded watermark can no longer be detected by the watermark detector.

2.2 A 3-Stage Watermarking System Model

In the field of software reverse engineering, the cracking procedure is normally divided into two phases: system-level reversing and code-level reversing. (3) We take a similar approach by first identifying the structure of the watermark detection system. Most known watermark detection systems can be decomposed into three stages:
transform, feature space, and detection algorithm, as illustrated in Figure 2.1. A media work¹ under inspection is first subjected to a certain transform in order to be represented in a proper domain. Then a subset of the feature space, or a subband, is selected as the suspected watermark signal carrier. Lastly, this subset is fed into a detection algorithm, and the detection statistic generated is compared with a threshold to produce the final watermark detection decision.

Figure 2.1: Three-stage decomposition of a watermark detection system.

¹ Media works include digital images, audio, video, etc.

It is worth mentioning that this decomposition model also fits many non-watermarking applications which can essentially be modeled as classifiers. Therefore, the reversing experience from our 3-stage watermark detector model may be readily transferred to other non-watermarking systems that can be modeled this way.

2.2.1 Common Transforms and Detection Algorithms

As useful knowledge to aid reverse engineering, we performed a paper survey to record the most commonly used transforms and detection algorithms. Tables 2.1 and 2.2 imply that watermarking researchers as a whole tend to choose common transforms and detection algorithms.

Transform Type                    # of Papers
Wavelets                          21
Spatial Domain                    15
DCT                               14
Fourier                           5
Vector Quantization               2
Independent Component Analysis    2
Moment                            1
Histogram                         1
Fractal                           1
Orthogonal Basis Function         1
Scale Space Representation        1

Table 2.1: The distribution of transform types.

Detector Type         # of Papers
Correlation           47
Maximum Likelihood    7
QIM                   7
Rao Test              1

Table 2.2: The distribution of detection algorithms.

It should be pointed out that the numbers listed in Tables 2.1 and 2.2 are based on the roughly 70 papers we collected, hence they may not reflect the exact order of the usage frequencies across all watermarking research documents. However, given an unknown
watermarking system, those two tables could serve as a priority guide for attackers who want to schedule their attacks in a statistically more efficient way.

2.3 Breaking BOWS by Identifying Super-robustness

2.3.1 INTRODUCTION

From December 15, 2005 to March 15, 2006, the Break Our Watermarking System (BOWS) contest challenged researchers to break an unnamed image watermarking system by rendering a watermark undetectable in three separate images. (9) Doctored images had to meet sufficient quality standards to be considered successfully attacked, and all three had to be successfully attacked to qualify for the prize. The prize went to the team which achieved the highest quality level by March 15, 2006. Our team from Binghamton University was fortunate enough to have the highest quality level on the 15th, winning the prize of 300 dollars and a digital camera.

2.3.1.1 Goals of the Contest

The specific goal was to alter three separate 512-by-512 grayscale images so that all three fail to trigger a watermark detector. Relative to the originals, the quality of a successful attack had to exceed 30 dB PSNR, and the winner was the team who achieved the highest PSNR, where the PSNR of a submission was computed from the MSE averaged over the three images:

   PSNR = 10 log10 [ 255² / ( (1/3) Σ_{k=1..3} (1/262144) Σ_{i,j} (I_k[i,j] − J_k[i,j])² ) ]

The three images are illustrated below.

Figure 2.2: The three images from the BOWS contest.

2.3.1.2 A Comparison to the SDMI Challenge

It is inevitable to compare this watermarking challenge to a previous one, the SDMI challenge of 2000. (8) In that contest, the Secure Digital Music Initiative (SDMI) challenged "hackers" to break four audio watermarking technologies and two supplemental digital signature technologies.
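Stepping back to the quality metric of Section 2.3.1.1: the averaged-MSE PSNR can be computed directly. A minimal sketch, in which images are flattened lists of 262144 gray values and the helper name is ours, not the contest's:

```python
import math

def bows_psnr(originals, attacked):
    """BOWS contest quality metric: PSNR computed from the MSE averaged
    over the three 512-by-512 (262144-pixel) images."""
    mean_mse = sum(
        sum((a - b) ** 2 for a, b in zip(I, J)) / 262144.0
        for I, J in zip(originals, attacked)
    ) / 3.0
    return 10.0 * math.log10(255.0 ** 2 / mean_mse)

# Example: three flat images, each pixel perturbed by one gray level,
# give a per-image MSE of 1 and hence 10*log10(255^2), about 48.13 dB.
orig = [[128.0] * 262144 for _ in range(3)]
att = [[129.0] * 262144 for _ in range(3)]
```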
Several teams passed the preliminary phase of the challenge for all four watermark technologies, although representatives of Verance, Inc. say that their own watermarking technology was not broken by any participant. (10, 11, 12) There are some specific facets of the two contests which are worth comparing and contrasting:

• The SDMI challenge lasted three weeks; the BOWS contest lasted three months. The organizers of the BOWS contest also arranged a post-challenge period of three months after the watermarking algorithm was made public.

• The BOWS contest oracle took seconds, whereas the SDMI challenge oracle often took hours. The SDMI challenge oracle presumably involved watermark tests and listening tests by a human being, and the delay between submission and results could vary considerably. (10)

• The BOWS contest oracle's behavior was well-defined. For example, the minimum quality metric of the BOWS contest was explicitly defined at 30.00 dB PSNR, and the detector was automated: it would accept any properly formatted image, even a blank or wrong image. This allowed reverse-engineering attacks of the variety we employed. In the SDMI challenge, the human's behavior was unclear. Some researchers submitted properly signed TOC files to the signature verification oracle and received a rejection; they could not tell whether the oracle was defective or whether the human had a policy of rejecting trivial submissions. (10)

• The BOWS oracle told the user both the detector output and the image quality (which could also be computed at the user's end, since the quality metric was explicitly defined). The SDMI oracle would not reveal the specific reason for a rejection: either the detector found the watermark, or the audio quality was poor, or both.

• The SDMI challenge provided watermarked and un-watermarked samples; the BOWS contest provided only watermarked samples, with no reference images.
It is likely that the SDMI challenge would not have been won had there been no un-watermarked samples. With few opportunities to query an oracle, one could only analyze the difference between before and after clips, and possibly look for anomalies in the watermarked clips. One major difference is that the SDMI challenge was not amenable to oracle attacks: the BOWS contest allowed potentially millions of queries in serial, versus hundreds of queries in serial over the lifetime of the SDMI challenge.

2.3.1.3 Results

Our team achieved a PSNR of 39.22 dB by the deadline, winning the contest. (9) We achieved this first by reverse-engineering the algorithm from experimental oracle submissions, and then by exploiting an observed sensitivity to noise spikes in the feature space. John Earl of Cambridge University deserves special recognition for rapidly increasing his PSNR results just before the deadline. Extrapolating from his submissions, he could have beaten us had the contest run just a few hours longer. (9) The algorithm was then revealed to be one published by M. L. Miller, G. J. Doerr and I. J. Cox in 2004. (13) We confirmed that our guess for the image transform and subband matched the paper, while their detector partially explained the weakness we used to defeat it. In the subsequent three-month challenge with a known algorithm, Andreas Westfeld achieved an astonishing 58.07 dB, a quality level exceeding even the minor error incurred by digitizing the image in the first place. (14)

2.3.2 REVERSE ENGINEERING

The reverse-engineering of the algorithm was performed in two steps: reverse-engineering the feature space by submitting experimental images, and reverse-engineering the subband of the feature space by carving out selected portions of the feature space once it was known.
2.3.2.1 Determining the Feature Space

In retrospect, the feature space was very clear given the obvious block artifacts in the images (see Figure 2.3), but we had to be certain. Our deduction of the feature space followed this principle: for each suspected image feature space, find a severe attack that leaves that feature space untouched. The philosophy behind this attack is that a severe modification of an image should break all other watermarks, providing a test with high selectivity. Curiously, our approach was the opposite of the contest goals: we sought to damage the image quality as much as possible while failing to remove the watermark. For example, we would invert the image in luminance (this broke the watermark), add DC offsets to the pixels (this did not break the watermark), mirror-flip the image or blocks of the image, et cetera.

Figure 2.3: Some visible artifacts in the image, suggesting either severe compression or a block-based watermark.

After a small set of tests, we submitted a test image which convinced us that the watermark was embedded in 8-by-8 DCT coefficients (Figure 2.4). Here, each 8-by-8 block was given the worst possible DC offset without changing any AC coefficients in an FFT or DCT of the block. This preserved the watermark, but reduced image quality to 3.94 dB PSNR.

Figure 2.4: Some severe transformations of an image. Left, a shuffling of blocks aimed at preserving block-based FFT magnitudes. Right, a severe addition of 8-by-8 DC offsets. This preserved the watermark with only 3.94 dB PSNR.

Our attacks included flipping the signs of common transform coefficients, reasoning that if a watermark is embedded multiplicatively rather than additively, then only changes in magnitude would damage it. We found that changes in sign broke the watermark. Thus this approach could tell us not only what feature space was used, but also what kind of detector might be used.
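The block-DC-offset experiment is easy to verify numerically: adding a constant to every pixel of an 8-by-8 block moves only the DC term of the block's DCT, leaving every AC coefficient untouched. A minimal check with a plain (unoptimized) orthonormal DCT-II, not the contest's code:

```python
import math, random

def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (a list of 8 rows)."""
    N = 8
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    for x in range(N) for y in range(N))
            out[u][v] = c(u) * c(v) * s
    return out

random.seed(1)
block = [[random.uniform(0.0, 255.0) for _ in range(8)] for _ in range(8)]
offset = 500.0  # an absurd DC shift: it ruins the block, spares the AC band
shifted = [[p + offset for p in row] for row in block]
A, B = dct2_8x8(block), dct2_8x8(shifted)
# Every AC coefficient is unchanged; only B[0][0] moves (by 8 * offset,
# since the orthonormal DC term is the pixel sum divided by 8).
```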
The precise dividing line between a choice of feature space and a choice of detector is unclear: if only magnitude information is used in a detector, is this a restriction of the feature space?

2.3.2.2 Determining the Sub-band

Once we were convinced that the watermark was embedded in 8-by-8 AC DCT coefficients, we then submitted images with bands removed from each block. We took the largest bands we could remove without hurting the watermark, and computed the complement of their union. The result, illustrated below, matches the subband used in the watermarking algorithm. (13)

Figure 2.5: Left: AC removal pattern 1; Center: AC removal pattern 2; Right: intersection of patterns 1 and 2.

In this process we were guided by knowledge of common watermarking algorithms. Watermarks have commonly resided in low-frequency and middle-frequency bands, a tactic described in the seminal paper by Cox et al., in order to make the watermark difficult to separate from the image content. (15) We expected common subbands to be upper-triangular regions and square regions in the DCT coefficient matrix. Hence, our attacks struck out lower-triangular sections of the matrix, and gnomonic sections. Taking the union of the largest lower-triangular and the largest gnomonic section, we found the largest "pattern" we could remove without damaging the watermark, the largest region of geometric significance.

A second, parallel attempt at determining the sub-band focused on eliminating DCT coefficients arbitrarily until the watermark broke. This produced a contradictory result, discussed in the next section: far fewer coefficients needed to be damaged to eliminate the watermark than our sub-band estimates (and later, the published watermarking algorithm) would suggest.
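The removal patterns above can be sketched as boolean masks over the 8-by-8 DCT coefficient matrix; the cut-off values k below are illustrative, not the ones recovered in the contest:

```python
def lower_triangular_mask(k, n=8):
    """Removal pattern 1: strike out the lower-right triangle, i.e.
    every coefficient (u, v) with u + v >= k."""
    return [[u + v >= k for v in range(n)] for u in range(n)]

def gnomonic_mask(k, n=8):
    """Removal pattern 2: strike out the L-shaped gnomon outside the
    top-left k-by-k square, i.e. every (u, v) with u >= k or v >= k."""
    return [[u >= k or v >= k for v in range(n)] for u in range(n)]

def surviving_subband(mask_a, mask_b):
    """Complement of the union of the two largest removable patterns:
    the coefficients the watermark must be hiding in."""
    return [[not (a or b) for a, b in zip(ra, rb)]
            for ra, rb in zip(mask_a, mask_b)]

keep = surviving_subband(lower_triangular_mask(5), gnomonic_mask(4))
kept = [(u, v) for u in range(8) for v in range(8) if keep[u][v]]
# With these particular cut-offs, 13 coefficients survive; excluding the
# DC term (0, 0) leaves 12 low/mid-frequency AC coefficients per block.
```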
2.3.3 BREAKING THE WATERMARK

Once we determined the secret algorithm, we could focus our work directly on the feature space in hopes of inflicting maximum damage. Our initial approach was to experimentally damage DCT coefficients until the watermark was removed, and then develop a working set of coefficients and a multiplier that incurred minimum damage. We did not, however, adopt any sophisticated sensitivity attack, and performed only thousands of oracle queries, by hand.

2.3.3.1 A Curious Anomaly

Early on, we discovered by experiment that an unusually small number of damaged coefficients could render the watermark undetectable. We tried the following algorithm:

1. Sort the DCT coefficients by magnitude.
2. Let k ← 200, d ← 50.
3. While d >= 1:
   (a) Tentatively eliminate the k largest coefficients.
   (b) If the watermark remains, k ← k + d.
   (c) If the watermark is removed, k ← k − d; d ← ⌊d/2⌋.

Initially, the elimination rule set each coefficient to 0. Later we improved our result by amplifying rather than zeroing each coefficient, for example multiplying each of the k coefficients by 3. Doing so, we found that a few hundred coefficients were enough to destroy the watermark in all three images. However, our previous analysis gave us a sub-band of 49152 coefficients (a 512-by-512 grayscale image, 4096 8-by-8 blocks, 12 AC coefficients taken per block). It seems unlikely that a few hundred zeroed coefficients would destroy a signal of 49152 samples. We suspected that some aspect of the detector made it sensitive to noise spikes in the right locations. The watermark detector, which we could not guess, was revealed to be a Viterbi decoder that extracted a long message from the 49152 coefficients and compared it to a reference message. (13) We believe this is the reason that the resulting detector could be derailed by a carefully chosen set of noise spikes.
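The expand-and-halve search above can be sketched against a toy oracle. The real oracle damages the k largest-magnitude coefficients and runs the BOWS detector; here `survives` simply pretends the mark dies at some unknown count:

```python
def find_breaking_count(survives, k=200, d=50):
    """Search for how many of the largest-magnitude coefficients must be
    damaged to kill the watermark.  survives(k) is the oracle: True if
    the mark is still detected after damaging the k largest."""
    while d >= 1:
        if survives(k):      # watermark remains: damage more coefficients
            k += d
        else:                # watermark removed: back off, halve the step
            k -= d
            d //= 2
    return k                 # largest k at which the mark still survives

# Toy oracle standing in for the detector: the mark dies at 317 coefficients.
survives = lambda k: k < 317
k = find_breaking_count(survives)
```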
Whatever the cause, we then sought to reduce the number of samples we zeroed or amplified.

2.3.3.2 The Rest of the Algorithm

Once we could eliminate the watermark by amplifying the k highest-magnitude coefficients, we then reduced the set size incrementally:

1. Choose k and D, so that multiplying the k highest-magnitude DCT coefficients by D destroys the watermark.
2. For j = 1 · · · k:
   (a) Reinstate the original value of coefficient j.
   (b) If the watermark becomes detectable, return coefficient j to its damaged state.

We thereby wiped from our list of damaged coefficients all those which were not needed to remove the watermark. The resulting set was surprisingly small, reducing hundreds of coefficients to fewer than ten per image, which we then reduced further by hand while increasing the amplification factor. Tables 2.3 and 2.4 illustrate the reduction process for images 1 and 2: raising the value D from 3 to 3.55, we found that some coefficients could be removed from the attack set for an overall increase in PSNR. We stopped when removing the next coefficient required an amplification that reduced the PSNR.

2.3.4 SUPER-ROBUSTNESS AND ORACLE ATTACKS

For purposes of reverse-engineering, we adopted the strategy of making severe changes that will not affect one type of watermark. For example, if a watermark is embedded in block AC DCT coefficients, a random DC offset to each block has no effect on the watermark, even though it is an extreme amount of noise which renders the image useless.

(Coefficient numbers) * amplification      PSNR
(12, 69, 107, 127, 132, 140, 141) * 3.4    37.53 dB
(12, 69, 127, 132, 140, 141) * 3.4         38.19 dB
(12, 69, 127, 140, 141) * 3.4              38.97 dB
(12, 69, 127, 141) * 3.55                  39.67 dB

Table 2.3: Successive attacks for image 1. By amplifying a few AC coefficients by the right value, the detector will fail.
Multiplier   206  216  223  236  279  286  310  314  316
3.1           R                   R
3.2           R    R              R    R
3.2           R    R    R         R    R    R
9.5           R    R    R         R    R    R    R

Table 2.4: Successive attacks for image 2. We start with 9 AC coefficients. 'R' denotes that the coefficient is no longer needed when the gain increases.

Watermarks are typically designed so that the watermark remains detectable even if the image is reasonably degraded. A layperson might also expect the watermark to become undetectable if image distortion is ridiculously severe; however, some carefully chosen forms of degradation have no effect whatsoever on a detector, for example if they miss the feature space. We call this property super-robustness: the property of a watermarking algorithm to survive select types of quality degradation far beyond what any reasonable person would expect. If we can render an image unrecognizable and maintain the watermark, then we have found an attack to which it is super-robust.

This may seem like an admirable property for a watermark to possess, but it is in fact a security weakness. There is no need for a watermark to survive complete destruction of an image, and in doing so it only gives information to an attacker. The destructive operation is likely to destroy all other types of watermarks, giving an attacker a reverse-engineering tool with high selectivity.

Figure 2.6: The three images after the attack. Note the obvious block artifacts.

2.3.4.1 Oracle Attacks Exploiting Super-robustness

The super-robustness attack exploits severe false alarms by inflicting noise that would break most other watermarks. As human beings, we may use expertise and common sense in choosing such noise. A computer is not so gifted, and must discover severe false alarms blindly. We developed an oracle attack based on the principle of super-robustness, in which an incremental amount of noise is added to an image as long as the watermark remains detectable.
(16, 17) We find that the noise component within the feature space quickly converges to the detection region boundary and stays there; several properties of this noise can be used to estimate the feature space size and detector threshold, if we guess in advance what type of detector is used. However, this takes far more oracle queries than we needed to break the more complicated BOWS watermark.

2.3.4.2 Mitigation Strategies

There is a simple strategy to prevent super-robustness attacks: prevent severe inputs to the detector. This is easier said than done, because it is not clear how to recognize a "severe" input. We offer several recommendations to the designer of a watermark detector:

• Make every effort to normalize an image before detection.

• Place hard limits on feature coefficient magnitudes and on overall energy.

• Whenever possible, give the detector access to the watermarked image for comparison's sake. For example, a watermark can contain an identifier, which can be used to look up a copy of the watermarked image. If the test image does not resemble the marked image in some rudimentary way, then the oracle should declare the watermark undetectable.

• Reject inputs that fail to match a reasonable model of the image.

• Use "cocktail watermarking," as described in (18), and write a detector that does not reveal which watermarks are missing after an attack.

• Choose the right application. In a fingerprinting or traitor-tracing application, an attacker does not have direct access to the detector at all. Indeed, attackers may not even know whether content is watermarked until someone gets in legal trouble. Meanwhile, for content-control watermarks, especially those detected in consumer electronics devices like television sets or portable music players, oracle attacks will be very difficult to prevent.

Note that these are properties of a detector, not of the watermark itself.
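The second recommendation can be made concrete as a front-end guard; the limits below are placeholder values, and `detect` stands for whatever detector the designer actually uses:

```python
def guarded_detect(features, detect, max_coeff=2048.0, max_energy=1.0e8):
    """Refuse to classify absurd inputs.  Any feature vector with an
    out-of-range coefficient, or with implausibly high total energy, is
    reported as unwatermarked without ever reaching the real detector,
    denying the attacker a super-robust false alarm."""
    if any(abs(f) > max_coeff for f in features):
        return False
    if sum(f * f for f in features) > max_energy:
        return False
    return detect(features)

always_yes = lambda f: True  # a pathologically permissive inner detector
# A single wild coefficient, or a huge overall energy, is rejected
# before the inner detector can leak anything.
```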
Any proper watermark feature sub-space will be super-robust to some attack, namely destroying all image data outside the sub-space, so prevention must focus on masking this effect using a sophisticated detector.

2.3.4.3 Other Advice to the Watermark Designer

Beyond the problem of super-robustness, several other exploitable weaknesses are worth highlighting. We provide some tips to the watermark designer who wishes to avoid our attacks.

1. Do not use a steganographic algorithm to embed one bit. If you simply wish to watermark an image with a fixed copyright label which is either present or not (i.e., a one-bit message), do not adapt a stego algorithm intended to encode arbitrarily long messages. It seems reasonable to adapt a stego algorithm into a watermark algorithm by embedding a long reference message, and later comparing that reference message to the output of the stego-decoder. However, the improved bitrate of a steganographic algorithm often comes at the expense of robustness; using that payload to send a single bit is suboptimal.

2. Watch for security flaws introduced by detector components. In this case, the watermark message was extracted using a Viterbi decoder, which we feel introduced an exploitable weakness. Since each chip of the signal was detected by correlation, why not simply detect the whole watermark with a single correlation? The answer is that the algorithm was intended to decode an unknown message, rather than search for one specific message known in advance.

3. Consider how stereotypical the feature space is. Many, many image watermarking algorithms use block DCT coefficients or full-frame DCT coefficients. Many use straight correlators or normalized correlators, although the BOWS watermark did not. An attacker is going to guess these right from the start.
We are not advocating security through obscurity by asking a designer to pick an obscure transform; however, something like the key-dependent random image transforms of Fridrich et al. might increase the guessing entropy of the concealed algorithm, if only because a key-dependent image transform parameterizes more of the algorithm as key. (19)

2.3.5 CONCLUSIONS

The BOWS contest gave us an excellent opportunity to test new attack methodologies, and to further explore the process by which watermarks are broken in the real world. Because we reverse-engineered the watermark through the oracle, we question the value of keeping a watermark algorithm secret in practice (although for the purposes of the contest, keeping the algorithm secret was an excellent idea). If a watermark detector can be exploited as an oracle, then the algorithm is essentially leaked; the extent of this leakage is the subject of further study.

3 Reversing Normalized Correlation based Watermark Detector by Noise Snakes

3.1 INTRODUCTION

Sensitivity attacks, or more generally oracle attacks, have been employed for years to break watermarking systems and have become increasingly sophisticated. (4, 5) These attacks seek to remove a watermark by finding a point outside the watermark detection region that is as close as possible to the watermarked image. Sometimes, however, we are not initially interested in breaking an unknown watermark, but in learning its structure or its detection algorithm. If we have an unknown algorithm, can an oracle be used experimentally to deduce it? The answer is yes, as has been demonstrated by recent watermarking challenges. (10, 20) This use of an oracle for reverse-engineering is ad hoc, however, and based on expert knowledge. An automated watermark reverse-engineer, which uses a detector as an oracle and analyzes its output, would be very useful, and we have begun to explore how components of such a system could be built.
The problem is that embedding algorithms can be arbitrarily complex. However, if we have some general information about a watermark's generic structure, we can then seek to reverse-engineer further details by oracle attack. For example, we might assume that a watermark is feature-based, extracting a vector f of unknown features and comparing them to a reference watermark by some popular detector structure.

3.1.1 Motivation

This research was primarily motivated by our success in reverse-engineering the unknown BOWS watermark. (20) In that project, we took watermarked images, distorted them in severe ways, and fed them to the provided watermark detector as experimental images. Our goal was to test whether a given feature space or detector was in use, by inflicting severe changes that would not hurt certain types of watermarks. This gave us a test with high selectivity for certain feature spaces, sub-bands and detector structures. Extending this concept, we sought to explore how the false positives of a detector leak information about the underlying watermarking algorithm.

Figure 3.1: An image from the BOWS contest, and a severe distortion that preserves the watermark. From this experimental image we concluded that the watermark was embedded in non-DC 8-by-8 DCT coefficients.

3.2 SUPER-ROBUSTNESS AND FALSE POSITIVES

Super-robustness is the property of a watermark to survive specific extreme and absurd image distortions. Robust watermarks are of course intended to survive distortion as long as the image quality remains high; however, one rarely mandates the converse, that the watermark should break when the image is damaged far beyond its useful value. Super-robustness may sound positive, but it is a security flaw. A watermark will be super-robust to operations specific to its feature space and detector, and this makes the watermark amenable to reverse-engineering, using the detector as an oracle. In
other words, super-robustness results in a detector that leaks information about the watermarking algorithm.

3.2.1 Examples of Super-robustness

In the BOWS contest, we tested for several types of watermark algorithms by testing for super-robustness. Here are some properties we used to fashion experimental test images:

• If a watermark is embedded in AC frequency transform coefficients, it should be immune to DC offsets. For block-based transforms, an independent DC offset can be added to each block without hurting the watermark, although many other types of watermarks would be damaged. See Figure 3.1 for an example.

• If a watermark is embedded multiplicatively rather than additively in a signal or a feature vector, then the signs of those features can be randomized without changing the embedded signal.

• Any watermark that uses a proper feature subset can be super-robust to the wiping out of all other information. For example, if a full-frame DCT watermark uses a sub-band of 50000 AC coefficients, erasing or randomizing all but those 50000 coefficients could render the image unreadable without hurting the mark.

• If a watermark detector uses linear correlation, an amplified signal should always trigger the detector, while an attenuated signal eventually falls on the wrong side of the detection region. In contrast, a normalized correlation detector is robust to both severe amplification and severe attenuation.

Note that super-robustness is a property of the detector, not of the watermark embedder. Ultimately the detector is responsible for admitting a false positive. While a certain feature space or embedding method may suggest a super-robustness attack, a detector does not have to blindly test an attacker's experimental images. This suggests a remedy for the examples of super-robustness described above: build a smarter detector, one that can recognize or rule out certain severe false positives.
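The last bullet can be checked directly: a linear correlator loses an attenuated mark, while a normalized correlator is indifferent to any positive gain. A small sketch with made-up vectors and thresholds:

```python
import math

def lin_corr(f, w):
    return sum(a * b for a, b in zip(f, w))

def norm_corr(f, w):
    nf = math.sqrt(sum(a * a for a in f))
    nw = math.sqrt(sum(b * b for b in w))
    return lin_corr(f, w) / (nf * nw)

w = [1.0, -1.0, 1.0, -1.0]               # toy watermark
marked = [1.2, -0.8, 0.9, -1.1]          # toy watermarked feature vector
tau_lin, tau_norm = 2.0, 0.9             # arbitrary detection thresholds

gains = (0.01, 1.0, 1000.0)
lin_hits = [lin_corr([g * x for x in marked], w) > tau_lin for g in gains]
norm_hits = [norm_corr([g * x for x in marked], w) > tau_norm for g in gains]
# lin_hits: attenuation defeats the linear detector, amplification cannot.
# norm_hits: the normalized detector fires at every positive gain.
```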
3.2.2 Generic Super-robustness

Suppose we have a generic watermarking algorithm whose detector follows this basic framework:

1. Extract a vector of n features f = {f1, f2, · · · , fn} from the image.
2. Compare this vector to a watermark vector, for example by normalized correlation:

   (f · w) / (‖f‖ ‖w‖) ≶ τ ∈ (0, 1)

3. Report a successful detection if the correlation exceeds the chosen threshold.

In the case of normalized correlation, the detection region is a cone with its tip at the origin, its axis equal to the watermark vector w, and its interior angle determined by the threshold τ. (21) Specifically, the cone angle is θ = cos⁻¹ τ.

Figure 3.2: A normalized correlation detection region with threshold τ and cone angle θ = cos⁻¹ τ.

A key fact about this and many other detection regions is that they are unbounded. If one can efficiently travel within such a region, then one can find images that set off the watermark detector yet are very distant from the original. This gives us a means to construct severe false positives from a starting point within the detection region: gradually add noise to an image, as long as the detector continues to recognize the watermark.

3.3 MEASURING A DETECTION REGION

3.3.1 Noise Snakes

Our generic approach to generating false positives is to grow a "noise snake" using incremental uniform noise vectors. We do not know the watermark vector, and hence the direction in which we should be traveling; however, with an unbounded detection region we expect that an expanding noise vector will move outward into distant climes of the detection region, providing useful information about its shape and orientation. Our noise snakes are constructed via the following algorithm:

1. Start with a test image I, treated here as a vector.
2. Initialize our snake vector to J ← I.
3.
Do for k = 1, 2, · · · , K:
   (a) Choose a vector X uniformly over the n-dimensional unit hypersphere Sn. This can be accomplished by constructing an n-dimensional vector of i.i.d. N(0, 1) Gaussians, and scaling the vector to unit length.
   (b) Choose an appropriate scaling factor α, which for normalized correlation is ideally proportional to the length of J.
   (c) If J + αX still triggers the watermark detector, J ← J + αX.
   (d) Else, discard X and leave J unchanged.

We have observed that in high dimensions, a noise snake quickly converges to the detection region boundary, and then grows gradually outward. Thus a pair of these snakes, constructed separately, can provide useful information about the detection surface.

Figure 3.3: The construction of a noise snake.

Figure 3.4: A pair of constructed noise snakes, used to plumb the detection cone.

3.3.2 The Growth Rate of a Noise Snake

When extending a noise snake by adding a uniform noise vector, we must wrestle with two conflicting factors: large noise vectors are more likely to drop us out of the detection region, but small noise vectors contribute little length per iteration. The optimal growth factor α is the one that maximizes the expected increment ‖αX‖ · Pr[δ(J + αX) = 1]. In the specific case of snakes on a cone, the optimal growth factor α is proportional to the length of the snake as it grows. This can be seen by a simple geometric argument: the cone is congruent to scaled versions of itself. Thus if there is an optimal length α by which to extend a snake of length 1, then Mα is optimal for extending a snake of length M. We need only determine the appropriate α for a snake of unit length. The good news is that, theoretically, the growth rate is exponential in the number of queries. The bad news is two-fold: first, the best growth factor depends on the dimension and cone angle, which we do not know.
Secondly, the best growth rate is nevertheless slow: for feature sets in the thousands, the best α ranges from 0.16 down to 0.06, and for larger feature sets the growth per step is small. We find in our experiments that for realistic feature sizes, a snake of useful length requires a number of queries roughly proportional to the dimension n.

Figure 3.5: Optimal growth factors for a snake of unit length (detection probability Py versus step length r). On the left, a cone angle of π/3 and n = 1000. On the right, π/3 and n = 9000.

3.3.3 Information Leakage from Snakes

We now explain how noise snakes can be used to estimate parameters of a watermark detector. First, we establish a fact about uniform noise vectors.

Lemma 1. If W is chosen uniformly over the unit n-sphere Sn, and v is an arbitrary unit vector, the probability Pr[W · v > cos θ] is

   (S_{n−1} / S_n) ∫₀^θ sin^{n−2} ρ dρ

...where S_n is the surface volume of Sn.

Proof. Since W is uniform, the probability of any subset of Sn is proportional to its measure. Our subset of Sn is rotationally symmetric about v, so let us integrate over the v axis: consider a point t ∈ [−1, 1] representing the v component of the hypersphere. For each t we have a shell of radius r = √(1 − t²), contributing a total hyper-surface measure of S_{n−1} r^{n−2} √(dt² + dr²). For example, a sphere in three dimensions is composed of circular shells, and each circular shell has contribution 2πr √(dt² + dr²) = S₂ r¹ √(dt² + dr²). The sphere portion with angle beneath θ is

   Area = ∫_{r=0}^{sin θ} S_{n−1} r^{n−2} √(dt² + dr²)
        = ∫_{r=0}^{sin θ} S_{n−1} r^{n−2} √(1 + dt²/dr²) dr
        = ∫_{r=0}^{sin θ} S_{n−1} r^{n−2} √(1 + r²/t²) dr
        = ∫_{r=0}^{sin θ} S_{n−1} r^{n−2} dr / t

...since 2t dt + 2r dr = 0.
Substituting r = sin ρ, we get dr/t = dρ, and

    Area = S_{n−1} ∫₀^θ sin^{n−2} ρ dρ

And we divide by S_n to get the probability of hitting that region.

The area of a unit hypersphere Sⁿ is

    S_n = 2^{(n+1)/2} π^{(n−1)/2} / (n−2)!!    for n odd
    S_n = 2π^{n/2} / (n/2 − 1)!                for n even

The surface fraction C_n = S_{n−1}/S_n therefore has a closed form (22)

    C_n = (1/2) · (n−2)!!/(n−3)!!    for n odd
    C_n = (1/π) · (n−2)!!/(n−3)!!    for n even

It might surprise the reader to know that the surface volume of a unit hypersphere actually decreases with n, for n > 7. Altogether, this means that if θ < π/2, then the probability drops exponentially with n. This is the "equatorial bulge" phenomenon in high dimensions: as the number of dimensions n gets large, the angle between two uniformly chosen direction vectors will be within ε of π/2 with high probability.

Corollary 1. Two independently generated noise snakes have off-axis components S and T whose angle is very close to π/2.

Proof 2. The density function of the set of noise snakes is rotationally symmetric about the watermark axis. This is because each component, a uniform vector, has a rotationally symmetric density, and because if s is a valid noise snake, so is Ts, where T is any rotation holding the cone axis constant. Because of this, the probability Pr[S ∈ F] = Pr[TS ∈ TF]. If we subtract w and then normalize each snake, the symmetric distribution implies that W = (S − w)/‖S − w‖ is uniformly distributed over the unit (n − 1)-sphere.

Figure 3.6: Lemma: two independently generated snakes have approximately perpendicular off-axis components.

This observation gives us a simple means to estimate the cone angle from two constructed noise snakes X and Y of equal length r. Using trigonometry, (√2 r sin θ)² = r² + r² − 2r² cos φ, where φ is the angle between the snakes (see figure 3.7).
Rearranging, we get:

    sin² θ = 1 − X · Y

...for snakes X and Y normalized to unit length. We can then estimate the cone angle and detector threshold by constructing two snakes of sufficient length, and computing their dot product.

Figure 3.7: The relation between the cone angle and the dot product of two independently generated noise snakes.

3.3.4 Extracting the Size of the Feature Space

Once we have a decent estimate for the cone angle, we can use another technique to estimate the feature space size, a more useful piece of information. To accomplish this, we use the detection oracle again to deduce the error rate under two different noise power levels. If we have a watermark vector w which falls squarely within the detection cone, and add a uniform noise vector r, the probability of detection is

    Pr[δ = 1] = (S_{n−1} / S_n) ∫₀^ψ sin^{n−2} x dx        (3.1)
    ψ = θ + sin⁻¹( (‖w‖/‖r‖) sin θ )                        (3.2)

Where θ is the cone angle, which we estimate using the technique described earlier. The second equation has one unknown, the watermark length ‖w‖. The top equation has one unknown, n; the hit rate P_Y can be estimated by experiment. If we then consider different detection rates P_Y for uniform noise vectors of length A, and then for noise of length B, we can combine these equations into the following identity:

    tan θ = (A sin ψ_A − B sin ψ_B) / (A cos ψ_A − B cos ψ_B)

...where ψ_A and ψ_B are the integration limits in equation (3.1). This removes the unknown ‖w‖, while n is implicit in the constants ψ_A and ψ_B. Here is our algorithm:

1. Pick two power levels A and B. They can be arbitrary, but make sure that the error rate under those noise levels is reasonably estimable in a few hundred trials (e.g., 0.5 rather than 0.0000005.)

2. Use the watermark detector to estimate P_A and P_B, the detection rate under uniform noise of length A and B, respectively.

3. For all suspected values of n:

(a) Use the hypothesized n and estimated detection rates in equation (3.1) to estimate the parameters ψ_A and ψ_B.
The integral does not have a simple closed form, but the integration limit is easily determined by Newton's method.

(b) Compute ε_n ← tan θ − (A sin ψ_A − B sin ψ_B) / (A cos ψ_A − B cos ψ_B)

4. Choose the value of n that minimizes the error |ε_n|.

3.4 RESULTS

We tested our techniques on a generic watermark detector with a varying number of DCT feature coefficients. We used normalized correlation with a detection threshold of 0.5. This requires some justification: a proper detector would probably have a much higher threshold, since for τ = 0.5 the false alarm rate is unnecessarily low. However, in our experience watermark detectors are often designed in an ad-hoc manner, and designers are often prone to choosing round numbers, or in the case of thresholds, a value squarely between 0 and 1. In fact, a higher threshold would increase the growth rate of our noise snake.

We first generated noise snakes to deduce a detector threshold. Figure 3.8 shows our estimates for a detector with τ = 0.5. This required an average of 1016 detector queries per experiment, to generate two snakes. Figure 3.9 shows the corresponding dimension estimates, once the threshold is deduced to be 0.5. It is worth pointing out that in this detector, the asymptotic false alarm probability is approximately 2.39 × 10⁻³³. This means that we can roughly estimate a very low false alarm probability in only thousands of trials.

3.5 DISCUSSION AND CONCLUSIONS

3.5.1 Is it Useful?

Our experience reverse-engineering watermarks "by hand" shows us that only dozens of experimental queries are needed by an expert to identify a watermark using a common feature space, sub-band, and detector. In contrast, this technique is generic, and utilizes no expert knowledge or a priori information about likely watermarks. The number of queries is far larger.
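The estimation loop of section 3.3.4 can be sketched as follows. This is a self-contained numerical sketch under the idealized on-axis model, not the experimental harness: `hit_prob` evaluates equation (3.1) by quadrature, the inversion uses bisection rather than Newton's method for simplicity, and the "measured" rates are generated from equation (3.2) with assumed values ‖w‖ = 1, θ = π/3, and a true dimension of 200.

```python
import math

def hit_prob(psi, n, steps=1500):
    """Equation (3.1): C_n * integral_0^psi sin^(n-2) x dx (composite trapezoid),
    with C_n = S_{n-1}/S_n = Gamma(n/2) / (sqrt(pi) * Gamma((n-1)/2))."""
    log_cn = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0) - 0.5 * math.log(math.pi)
    dx = psi / steps
    acc = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        acc += weight * math.sin(i * dx) ** (n - 2)
    return math.exp(log_cn) * acc * dx

def psi_from_prob(p, n):
    """Invert hit_prob in psi by bisection (hit_prob is monotone in psi on [0, pi])."""
    lo, hi = 0.0, math.pi
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if hit_prob(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def estimate_dimension(theta, A, B, pA, pB, candidates):
    """Step 3: pick the n minimizing |tan(theta) - (A sin psiA - B sin psiB)
    / (A cos psiA - B cos psiB)|."""
    best, best_err = None, float("inf")
    for n in candidates:
        psiA, psiB = psi_from_prob(pA, n), psi_from_prob(pB, n)
        denom = A * math.cos(psiA) - B * math.cos(psiB)
        if abs(denom) < 1e-12:
            continue
        err = abs(math.tan(theta) - (A * math.sin(psiA) - B * math.sin(psiB)) / denom)
        if err < best_err:
            best, best_err = n, err
    return best

# Synthetic ground truth: theta = pi/3, |w| = 1, true dimension 200.
theta, w_len, n_true = math.pi / 3, 1.0, 200
A, B = 1.5, 2.5
psiA = theta + math.asin(w_len / A * math.sin(theta))   # equation (3.2)
psiB = theta + math.asin(w_len / B * math.sin(theta))
pA, pB = hit_prob(psiA, n_true), hit_prob(psiB, n_true)
n_hat = estimate_dimension(theta, A, B, pA, pB, range(150, 251, 10))
```

In a real attack pA and pB would come from Monte Carlo queries to the oracle, so the minimum of |ε_n| would be broader; here the rates are exact, and the search lands on the true dimension.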
We feel that techniques of this type can be combined with codified expert knowledge into an overall system that dismantles an unknown algorithm and estimates its parameters far more efficiently.

Figure 3.8: Estimated threshold using two noise snakes. Ground truth is τ = 0.5, n = 500.

Figure 3.9: Estimated dimension using two noise snakes. Ground truth is n = 500.

Another question is whether such detector parameters are very useful. So what if we know the rough number of features used by a watermark detector? The answer is that this contains more information than one might initially think. For example, the BOWS watermark used around 50000 AC DCT coefficients. (13) Consider the significance of that number. In fact the exact value was 49152, the only genuinely "round" hexadecimal number near 50000. If one is told that an image watermark uses "around 50000" coefficients, 49152 is a very good guess for the actual value. It also turns out that 49152 = 12 × 4096, and the BOWS test images contained 4096 8-by-8 blocks. Hence from even a rough estimate of the feature space size, one can make a good guess that the algorithm is block-based, and embeds 12 coefficients per block. This is, in fact, the algorithm used.

3.5.2 Attack Prevention

As watermark designers, we might decide that enough is enough, and we have had it with these snakes on this cone. What are we going to do about it? The most obvious strategy is to avoid letting a detector be used as an oracle. Beyond that, preventing super-robustness in a detector could prevent these attacks.
In our geometric analogy, this means that a detection region should be bounded. Instead of a cone, for example, we can have a truncated cone by rejecting any image whose feature vector is overlong. The problem with this approach is that it only prevents super-robustness within the feature space: as long as some other information is ignored—that is, as long as the feature space is a proper subspace of the image—we can launch super-robustness attacks to deduce the feature space. When considering non-feature and feature spaces combined, the detection region is no longer a cone, but a hyper-cylinder with the cone as a base. This region remains unbounded.

Time will tell if these attacks can be prevented, or if super-robustness can be effectively utilized to reverse-engineer watermarking algorithms. We feel that great gains in efficiency can be made, and an overall system for automating watermark reverse-engineering can use super-robustness as an exploitable flaw, and use oracle attacks such as those described here to deduce crucial unknown parameters of a data hiding system.

4 A More Precise Model for Watermark Detector Using Normalized Correlation

4.1 Introduction

As has been shown in previous work, with deliberately designed attacks, detection statistics obtained from an oracle may shed light on certain parameters of a watermark detector. One technique, the technique of noise calipers or snakes on a cone, was based on a general strategy used to win the first phase of the first BOWS contest; (9, 13) the strategy of locating modes of super-robustness within a detector was formalized into an estimation technique of constructing "noise snakes" on the surface of a detection cone by using the detector as an oracle. These results could be used to estimate the detection region's geometry.
(20) In the subsequent BOWS2 contest, the designers proposed a series of teeth-shaped structures along the watermark detection boundary specifically designed to foil this type of reverse engineering technique. (23)

The simplified geometric model of watermark detection leads to a rather compact analytical result which aids in reverse engineering useful information about the watermark detector. (16, 17) However, it suffers from a simplistic and unrealistic model of a watermarked image. We can imagine a cover image as a vector within a feature space; a watermark vector is added or in some way embedded to push the cover vector into a detection region. Thus we have two vectors, a cover and a watermark, whose sum lies within a detection cone whilst the cover alone lies without. In the noise calipers technique of Craver et al., a watermarked image is modeled as a point along a detection cone's axis. This greatly simplifies the mathematics needed to estimate the probability of proper detection when uniform noise is added to the watermarked image, because a bubble of uniform noise about the image corresponds to a hypersphere intersecting with a cone, a shape with rotational symmetry about the cone axis. (16, 17)

In reality, a watermarked image is offset from the cone axis – a point along the axis represents the watermark alone, added to an empty cover image. Thus this simplistic model is inaccurate, although it produced precise estimates of watermark detector parameters for feature spaces of hundreds of dimensions. On the other hand, multimedia watermark algorithms often have a very large watermark feature space, and this simplified model is not precise enough under very high dimensions. Therefore, we have inserted a cover vector into the model in hopes of producing more accurate predictions at the cost of complexity.
In this paper, we will first derive the analytical expression for the detection rate of a watermarked work subject to Gaussian or uniform noise. Then we will show that this model provides us with a practical way to disclose information about a watermark detector with large feature space dimensions, even though computing the measure of intersection between an n-cone and an offset n-sphere is of such complexity that it requires numerical integration.

4.2 Problem Setting

Consider a watermarking scheme whose watermark feature space has dimensionality n, with normalized correlation adopted as the measure of watermark detection. The watermark detection boundary can be visualized as a hyper-cone in n-dimensional Euclidean space. If a watermark attacker adds in-domain Gaussian noise scaled to a magnitude R to a watermarked image Iw, the set of possible noised images can be visualized as an n-sphere¹ of radius R centered at the vector Iw. Note that when a Gaussian vector is scaled to a fixed length, it is uniformly distributed over a hypersphere, so the same construction is used to visualize uniform noise added to a watermarked image. The 3D projection version of the overall picture is illustrated in figure 4.1.

Figure 4.1: A detection cone intersects with a Gaussian noise ball.

Let A = (a₁, · · · , aₙ) be any vector in n-d space and θ be the detection threshold angle; the intersection of the n-sphere and the n-d hyper-cone is defined by the following two equations:

    A · w / (‖A‖ · ‖w‖) = cos θ        (4.1)
    ‖A − Iw‖ = R                       (4.2)

¹ We follow the geometrician's notation here. Under topological notation, a sphere in n spatial dimensions is considered an (n−1)-sphere, because it is locally homeomorphic to an (n−1)-dimensional hyperplane.

4.3 Quartic Equation

For simplicity and without loss of generality, we can choose our feature space coordinate system so that the watermark cone axis is the unit vector e₁, and the cover image is approximately a multiple of e₂.
Lack of perfect orthogonality means that some of the energy of the cover image I lies along e₁. We can choose our coordinate system so that all energy from image and mark are confined to these two vectors:

    w = (1, 0, · · · , 0)             (n − 1 trailing zeros)
    Iw = (c₁, c₂, 0, · · · , 0)       (n − 2 trailing zeros)

From equation (4.1), we have:

    a₁² = cos²θ · Σᵢ₌₁ⁿ aᵢ²                                        (4.3)

Which could be expanded as:

    −sin²θ · a₁² + cos²θ · a₂² + cos²θ · Σᵢ₌₃ⁿ aᵢ² = 0             (4.4)

From equation (4.2), we have:

    (a₁ − c₁)² + (a₂ − c₂)² + Σᵢ₌₃ⁿ aᵢ² − R² = 0                   (4.5)

Notice that in both equation (4.4) and (4.5) there is a common term Σᵢ₌₃ⁿ aᵢ². We then let h² = Σᵢ₌₃ⁿ aᵢ².

From equation (4.4), we have:

    a₁ = cot θ · √(a₂² + h²)                                       (4.6)

Substituting equation (4.6) into equation (4.5), we have:

    cot²θ (a₂² + h²) + c₁² + (a₂ − c₂)² + h² − R² = 2c₁ √(a₂² + h²) · cot θ        (4.7)

Taking the square of both sides and letting g = csc²θ h² + c₁² + c₂² − R²,

    (csc²θ a₂² − 2c₂a₂ + g)² = 4c₁² (a₂² + h²) cot²θ               (4.8)

Rearranging gives us a quartic equation with respect to a₂,

    Aa₂⁴ + Ba₂³ + Ca₂² + Da₂ + E = 0                               (4.9)

    A = csc⁴θ
    B = −4 csc²θ c₂
    C = 2g csc²θ − 4 cot²θ c₁² + 4c₂²
    D = −4gc₂
    E = g² − 4h² cot²θ c₁²

There will be four roots for this quartic equation and only two of them are valid, which correspond to the components of the two intersection points p₁ and p₂ along a slice of height h parallel to e₂. For any possible h, after the coordinates (a₁, a₂) of p₁ and p₂ are determined, we are able to calculate the detection rate within the projection plane defined by the fixed h. Then the overall detection rate could be calculated using the following double integral,

    Pr[YES] = (S_{n−2}/S_n) ∫_{−h_sup}^{h_sup} h^{n−3} ( ∫_{p₁}^{p₂} Pr[a, h] da ) dh / R^{n−1}
            = 2 (S_{n−2}/S_n) ∫₀^{h_sup} h^{n−3} ( ∫_{p₁}^{p₂} Pr[a, h] da ) dh / R^{n−1}        (4.10)

Where h_sup is the supremum of h given a fixed noise amplitude R and a detection angle θ, i.e.
where the two intersection points p₁ and p₂ merge into one. S_n is the hyper-surface area of an n-sphere of unit radius, and we have:

    S_{n−2} / S_n = (n − 2) / (2π)

4.4 Numerical Solutions

Due to its inherent complexity, a usable analytic solution of equation (4.9) is not obtainable even with the aid of a computer, so we have to resort to numerical analysis instead.

4.4.1 Identify Valid Roots

In order to pick out the two valid roots of equation (4.9), we first calculate a₁ from a₂ by equation (4.6); then all four pairs of (a₁, a₂) are subject to two verification criteria. Since equation (4.6) was just used and it is derived from equation (4.1), we use equation (4.5), which is derived from equation (4.2), as the first criterion; the second criterion is that both a₁ and a₂ should be real.

4.4.2 Detection Rate

Our numerical analysis has three known inputs, the normalized watermark vector w, the watermarked image Iw, and the detection cone angle θ. For a given attacking strength R, the numerical version of formula (4.10) is:

    Pr[YES] = 2 (S_{n−2}/S_n) Σ_{h=0}^{h_sup} h^{n−3} α(h) √(R² − h²) / R^{n−1}
            = Σ_{h=0}^{h_sup} ((n−2)/(πR)) α(h) (h/R)^{n−3} √(1 − h²/R²)        (4.11)

In formula (4.11), we do a summation along h. For a fixed h, we have a hyperplane cutting both the detection n-cone and the noise n-sphere. Therefore, within this 2D cutting plane, we have a circle intersecting a hyperbola, which gives two intersection points p₁ and p₂ in general. The angle α(h) represents the portion of the arc between p₁ and p₂ which is within the detection cone. Since the range of the arccos function lies within [0, π], in order to address α(h) larger than π, we introduce a third point p₃ at (c₁ + √(R² − h²), c₂) and divide α(h) into two angles, i.e. ∠1 formed by p₁ and p₃, and ∠2 formed by p₂ and p₃. ∠1 and ∠2 can be calculated based on the coordinates of the three points using the law of cosines.
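The coefficients of equation (4.9) are easy to get wrong, so a useful sanity check is to construct a point that lies exactly on the cone boundary (equation (4.6)) at a known distance R from Iw (equation (4.2)), and confirm that its a₂ coordinate is a root of the quartic. The parameter values below are arbitrary choices made for the check, not the values from the experiments:

```python
import math

def quartic_coeffs(theta, c1, c2, R, h):
    """Coefficients A..E of equation (4.9) for the slice at height h."""
    csc2 = 1.0 / math.sin(theta) ** 2
    cot2 = csc2 - 1.0
    g = csc2 * h * h + c1 * c1 + c2 * c2 - R * R
    A = csc2 * csc2
    B = -4.0 * csc2 * c2
    C = 2.0 * g * csc2 - 4.0 * cot2 * c1 * c1 + 4.0 * c2 * c2
    D = -4.0 * g * c2
    E = g * g - 4.0 * h * h * cot2 * c1 * c1
    return A, B, C, D, E

# Build a point ON the cone boundary, then derive the R that puts it on the
# noise sphere; by construction its a2 must satisfy the quartic.
theta = math.radians(78.5)          # cos(theta) ~ 0.2, a wide detection cone
c1, c2 = 1.0, 10.0                  # Iw = (c1, c2, 0, ..., 0)
a2, h = 3.0, 2.0                    # arbitrary slice coordinates
a1 = (math.cos(theta) / math.sin(theta)) * math.sqrt(a2 * a2 + h * h)  # eq (4.6)
R = math.sqrt((a1 - c1) ** 2 + (a2 - c2) ** 2 + h * h)                 # eq (4.2)
A, B, C, D, E = quartic_coeffs(theta, c1, c2, R, h)
residual = A * a2**4 + B * a2**3 + C * a2**2 + D * a2 + E
```

The residual vanishes to floating-point precision, confirming that (4.9) follows from squaring (4.7).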
Note that there are three different geometrical relationships, as illustrated in figure 4.2, and α(h) should be calculated accordingly.

Figure 4.2: Three possible geometrical relationships. The arc in solid line is within the detection cone.

The hardest part in formula (4.11) is the term (h/R)^{n−3}. The calculation suffers from a phenomenon we call 'tail bulge', which is due to the very large exponent n − 3.¹ It means almost all contributions to the whole detection rate lie at the very tail part of h, where h gets really close to h_sup. Even with the ability of arbitrary precision computation, obtaining the final detection rate within a reasonable precision is not an easy task.

¹ It evokes analogy with the term 'equatorial bulge' in hyper-dimensional geometry.

However, certain watermark detector parameters can still be disclosed in some other way even if the detection rate is not obtainable. In fact, we found out that the arc angle α(h) exhibits certain behavior when it gets close to the critical detection rate, i.e. when the detection rate gets around 50%, which can be used as a good indicator to reveal the structure of the watermark detector.

4.4.3 The Pattern of α(h)

In order to test our model, we implemented a generic 8-by-8 block based AC DCT watermark embedding algorithm which is similar to the one that was used in the BOWS contest, and then applied two different sets of parameters for testing. In the first group of tests, we use the five AC DCT coefficients which are marked up with numbers in figure 4.3 for watermark embedding. In the second group of tests, we use the twelve coefficients which are marked up with gray shade in figure 4.3.

Figure 4.3: Left: AC coefficients used for embedding. Middle: Image attacked by noise with amplitude R = 10000. Right: Image attacked by noise with amplitude R = 15000.
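The Monte Carlo side of the tests that follow can be sketched as below. This is a stand-in, not the actual test harness: it works directly in an assumed feature space with the watermark axis along e₁ (so the DCT transform, the real cover images, and pixel clipping are all omitted), and uses a reduced dimension n = 500 with an Iw vector loosely patterned on the one reported in the first test group.

```python
import math
import random

def detection_rate(Iw, tau, R, trials, rng):
    """Monte Carlo estimate of the detection rate: add Gaussian noise scaled to
    amplitude R, then test normalized correlation against the watermark axis e1."""
    n = len(Iw)
    hits = 0
    for _ in range(trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        scale = R / math.sqrt(sum(x * x for x in noise))   # noise scaled to length R
        A = [a + scale * x for a, x in zip(Iw, noise)]
        corr = A[0] / math.sqrt(sum(a * a for a in A))     # w = e1
        if corr >= tau:
            hits += 1
    return hits / trials

rng = random.Random(7)
n = 500                                   # reduced from 5120 for speed
Iw = [92.8, 5499.8] + [0.0] * (n - 2)     # loosely patterned on the test vector
p_weak = detection_rate(Iw, 0.011, 1000.0, 1000, rng)
p_strong = detection_rate(Iw, 0.011, 9000.0, 1000, rng)
```

As expected, the estimated detection rate falls as the attack amplitude R grows; the clipping effect that the text identifies as the source of the model discrepancy is deliberately absent here.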
4.4.3.1 Five AC DCT Embedding

In our first group of tests, the standard 256-by-256 grayscale testing image 'cameraman' is used as cover image I. A normalized watermark signal w is generated at random. An 8-by-8 DCT transformation of the cover I is performed, and an appropriately magnified version of w is embedded into the five AC DCT coefficients of each block to obtain the watermarked image Iw. Hence, the feature space dimension n is 5120 in this test. The numerical value of the vector Iw is:

    Iw = (92.8, 5499.8, 0, · · · , 0)        (5118 trailing zeros)

Figure 4.4: Detection threshold is 0.011. Left: the arc angle α(h) predicted by our model. Right: the detection rate obtained by Monte Carlo method.

We first set the detection threshold cos θ to be 0.011, and then check whether the numerical results predicted by our model agree with the data collected by the Monte Carlo method using a real image and real random Gaussian noise of amplitude R. Our test results are illustrated in figure 4.4. The left plot in figure 4.4 shows four separate curves of the arc angle α(h), each under a different attacking strength R. It is obvious that there is a sharp transition of the curve shape between the curve of R = 6000 and the curve of R = 7000. This phenomenon is echoed by the right plot, in which the 50% detection rate also lies between 6000 and 7000. According to the structure of formula (4.11), α(h) can be deemed a weighting factor of (h/R)^{n−3}, and we know that virtually all contributions to the detection rate lie at the very tail part of h.
Therefore, whether the curve of α(h) is pointing upwards or downwards at its tail part, i.e. larger or smaller than π, would suggest whether the detection rate is greater or less than 50%.

Likewise, we obtained another pair of curves, illustrated in figure 4.5, for the case of cos θ = 0.008. Note that this time our model predicts the critical condition would occur within (10000, 11000), while the 50% detection rate appears within (9000, 10000) in our Monte Carlo experiment.

Figure 4.5: Detection threshold is 0.008. Left: the arc angle α(h) predicted by our model. Right: the detection rate obtained by Monte Carlo method.

While there could be other causes, we suspect this discrepancy is mainly due to the image clipping effect, i.e. under severe attacks like R = 10000, many pixel values will exceed the range of [0, 255] and are subject to a clipping process before they can be saved as an attacked image. For example, attacks like the one shown in the middle of figure 4.3 with R = 10000 will overflow around 9% of the image pixels. This clipping process breaks the perfect spherical shape of the noise hypersphere, and it is not considered in our model. Since we chose a coordinate system so that the watermarked image Iw only has nonzero components in e₁ and e₂, it means we applied a rotation transformation whose orientation is determined by the watermarked image Iw as well as the random watermark w. Hence, we should expect this discrepancy to be persistent under the same test settings, i.e. the same Iw and w.
In our experiments, we found that the discrepancy only occurs when a relatively small detection threshold is chosen, or equivalently, when the critical condition occurs at a relatively large R. Fortunately, many practical watermarking algorithms chose, and in fact should choose, a relatively large detection threshold, for the sake of their own security. We have already demonstrated in our previous paper how the super-robustness of a watermark algorithm can actually compromise its security against human attackers. (20)

In all, the curve of α(h) provided by our model matches the detection curve obtained from Monte Carlo experiments when R is not too large. When R gets very large, the critical range of R predicted by our model is consistently one step larger than the results from Monte Carlo experiments.

4.4.3.2 Twelve AC DCT Embedding

In our second group of tests, we choose the 256-by-256 grayscale testing image 'lena' as cover image I. A normalized watermark w is generated at random. Then we embed the mark into twelve AC DCT coefficients of each block to obtain the watermarked image Iw. This time, the feature space dimension n is 12288, and the numerical value of Iw is:

    Iw = (162.5, 5217.3, 0, · · · , 0)        (12286 trailing zeros)

Figure 4.6: Detection threshold is 0.015. Left: the arc angle α(h) predicted by our model. Right: the detection rate obtained by Monte Carlo experiment.

Once again, as shown in figure 4.6, our model prediction agrees with the results from the Monte Carlo experiment.
On the other hand, when the detection threshold gets too small, the consistent discrepancy reappears, as shown in figure 4.7. The attacked image on the right of figure 4.3 is attacked by Gaussian noise with R = 15000. The average overflow rate for this strength of attack is marginally over 9% under this group of settings.

Figure 4.7: Detection threshold is 0.01. Left: the arc angle α(h) predicted by our model. Right: the detection rate obtained by Monte Carlo experiment.

4.5 Conclusions

In this paper, we proposed a more precise model for watermark detectors using normalized correlation, which is a popular choice in practice. The model boils down to a quartic equation to solve, which is inherently complex due to the symmetry breaking as compared with our former model. Since a usable analytic solution is not obtainable even with the aid of a computer, we turned to solving it numerically. Due to the quirky 'tail bulge' phenomenon encountered when evaluating the detection rate, the overall detection rate is also infeasible to compute. However, also due to the 'tail bulge' effect, information about the watermark detector can be disclosed by checking the pattern of the arc angle curve, i.e. by catching the shape transition of the curve which depicts α(h) versus R. We implemented a generic block based AC DCT embedding algorithm and used two different sets of parameters to verify our model. It turned out that our model provides a good match with the detection results obtained from Monte Carlo experiments when R is not really large, with feature space dimensions n of 5120 and 12288.
Hence it proved to be reliable under large feature space dimensions and could be used as a practical way to reverse engineer parameters of the target watermark detector. On the other hand, our model has a consistent discrepancy when the attacking noise amplitude R gets too large, and we suspect this is mainly due to the clipping effect under large R. Fortunately, this issue is largely relieved by the principle that a secure watermark algorithm should not have a really small detection threshold; in other words, super-robustness has been shown to be a security weakness of a watermark algorithm.

5 Circumventing the Square Root Law in Steganography by Subset Selection

5.1 The Square-root Law in Steganography

Cachin introduced a security measure for steganographic systems using the Kullback-Leibler divergence D_KL(P⁽ⁿ⁾ ‖ Q⁽ⁿ⁾), where P⁽ⁿ⁾ and Q⁽ⁿ⁾ denote the distribution of the cover data and the stego data respectively. (24) Under this security measure, a steganographic system is called perfectly secure if D_KL(P⁽ⁿ⁾ ‖ Q⁽ⁿ⁾) = 0, and is called not perfectly secure if D_KL(P⁽ⁿ⁾ ‖ Q⁽ⁿ⁾) > 0. Likewise, an ε-secure steganographic system satisfies

    D_KL(P⁽ⁿ⁾ ‖ Q⁽ⁿ⁾) ≤ ε,    ε > 0        (5.1)

For a not perfectly secure steganographic system as defined above, if the three conditions listed as follows are satisfied, the Square Root Law states: "The secure payload grows according to the square root of the cover size." (25, 26, 27)

• The steganographic cover can be modeled as a stationary Markov chain.
• The embedding operation can be modeled as independent substitutions of one state for another.
• The embedding scheme does not preserve all statistical properties of the cover.

Note that for a perfectly secure steganographic system, the embedding rate grows linearly with the cover size – a positive capacity – while an imperfectly secure steganographic system has zero capacity.
In practice, for complex objects like digital media, it is impractical (or may never be possible) for either the warden or the steganographer to have perfect knowledge about the cover source. Therefore, the square root law is well applicable to practical steganographic systems, which are all not perfectly secure, and its secure payload result essentially refines the established zero-capacity result for such systems. (26, 27) Also note that the secure payload in this law refers to the number of embedding changes, and hence may be different from the steganographic information being conveyed. By using adaptive source coding methods, the square root law implies an asymptotic information entropy of O(√n log n).

5.2 Circumventing the Law by Subset Selection

5.2.1 Introduction

While the square root law works on a cover subset small enough to evade detection, we work on the problem of finding the largest cover subset that possesses a desired distribution. That is, a substring whose type distribution is close to a fixed point under the embedding processing. In other words, a distribution that is unchanged under the process that alters it. It is worth noting that our scheme does not contradict the square root law, because our subset selection procedure depends on the observation of the cover, which deviates from the 'independence condition' of the square root law.

5.2.2 The i.i.d. Case

For a stream of i.i.d. random variables {X_k}_{k=1}^n over a finite outcome set C = {C_i}_{i=1}^N, its histogram is equivalent to the pmf¹ of a single random variable, which is denoted by f_X. The task of finding a subset of coefficients that has a histogram of f can therefore be translated to: find a scale factor α so that for all symbols in C

    α · max_C [ f(C_i) / f_X(C_i) ] ≤ 1        (5.2)

The number of embeddable coefficients is maximized when equality is achieved in the above inequality:

    α_u = min_C [ f_X(C_i) / f(C_i) ]          (5.3)

¹ Probability mass function.

Also, it is easy to prove

    0 ≤ α_u ≤ 1        (5.4)

The left side of the inequality follows from the fact that both f and f_X are pmfs and therefore non-negative. Moreover, let C_f denote the set of outcomes where f > 0 and C_{f_X} denote the set of outcomes where f_X > 0; we have

    0 < α_u ≤ 1    if C_f ⊆ C_{f_X}        (5.5)

The right side of the inequality in (5.4) can be proved by contradiction in the following manner. Let α_u > 1; then

    1 = Σ_C f_X(C_i) ≥ Σ_C α_u f(C_i) = α_u > 1

which is a contradiction; therefore α_u ≤ 1, with equality if and only if f = f_X. Consequently, the maximal number of embeddable coefficients is

    n Σ_C α_u f(C_i) = α_u n        (5.6)

Therefore, if the condition in (5.5) is satisfied, the embedding payload is linearly proportional to the total number of coefficients n in the i.i.d. case. Meanwhile, it is worth noting that what the embedding function g really does on this subset of data is no more than a shuffling operation, hence the histogram is preserved.

5.2.3 The First Order Markov Case

Let A denote the transition matrix with elements a_ij, {i, j = 1, · · · , N}. We assume that a_ij > 0 for all (i, j), so that A is irreducible.

    a_ij = Pr{X_k = C_j | X_{k−1} = C_i}

Let h_C denote the histogram of the cover; it no longer equals f_X, the pmf of a single random variable X_k. Following the same procedure as in the i.i.d. case, we can choose a subset of coefficients that follows f 'under' the histogram h_C. Those coefficients are randomly distributed along the cover coefficient stream. However, the same shuffling operation would not work anymore, because a_ij now links adjacent coefficients together. On the other hand, due to the fact that coefficients are linked together, we can still perform shuffling if we cut the cover coefficient stream into chunks in a non-overlapping manner.
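Before moving on, the i.i.d. selection rule of equations (5.3)–(5.6) can be sketched directly; the pmfs below are small made-up examples:

```python
def max_scale(f_cover, f_target):
    """Equation (5.3): alpha_u = min over the support of f of fX(c) / f(c)."""
    return min(f_cover.get(c, 0.0) / p for c, p in f_target.items() if p > 0)

# Made-up cover histogram fX and desired stego distribution f.
fX = {0: 0.5, 1: 0.3, 2: 0.2}
f = {0: 0.5, 1: 0.5}
alpha_u = max_scale(fX, f)       # min(0.5/0.5, 0.3/0.5) = 0.6
n = 1000
embeddable = alpha_u * n         # equation (5.6): alpha_u * n coefficients
```

Since the support of f is contained in that of fX, condition (5.5) holds and the payload is linear in n, as claimed.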
The requirement for a possible shuffle is that two candidate chunks must have the same header and tail coefficients; anything in between does not matter, because we are dealing with a first order Markov process. Moreover, only chunks three coefficients long are what we are really looking for, since the truly embeddable coefficients are those between the header and the tail.

5.2.3.1 The Algorithm

The probability of a triple with header C_i and tail C_k is

    Pr{(C_i, C_*, C_k)} = Pr(C_i) Σ_j (a_ij a_jk)    (5.7)

Also, we have

    Σ_{i,k} Pr{(C_i, C_*, C_k)} = 1    (5.8)

The first group of embeddable coefficients is chosen according to the following algorithm:

1. Choose the (i, k) that maximizes Pr{(C_i, C_*, C_k)} according to (5.7) and name the pair (H, T).
2. Draw circles along the cover stream wherever there is a triple match (C_H, C_*, C_T).¹
3. Pick out as many non-overlapping circles as possible.

¹ C_* denotes the wildcard symbol.

Figure 5.1 illustrates the second and the third step; each point on the line denotes a coefficient in the cover stream. The worst overlapping case is illustrated in Figure 5.2, in which only a fraction of all circles can be picked out in a non-overlapping manner: at most 1/2 of the circles when C_H ≠ C_T, and at most 1/3 of the circles when C_H = C_T.

Figure 5.1: The second and third step of our algorithm.

Figure 5.2: The worst overlapping case.

The number of embeddable coefficients equals the number of non-overlapping circles on the cover stream.
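The triple probability of (5.7) and the circle-drawing and picking steps can be sketched as follows; this is a greedy left-to-right scan, and the helper names are illustrative.

```python
def triple_prob(pi, A, i, k):
    # Pr{(C_i, C_*, C_k)} = Pr(C_i) * sum_j a_ij * a_jk   (eq. 5.7)
    return pi[i] * sum(A[i][j] * A[j][k] for j in range(len(A)))

def pick_circles(stream, H, T):
    # Steps 2-3: scan left to right, circling non-overlapping triples
    # (H, *, T); the middle coefficient of each circle is embeddable.
    embeddable = []
    i = 0
    while i + 2 < len(stream):
        if stream[i] == H and stream[i + 2] == T:
            embeddable.append(i + 1)
            i += 3  # skip past this circle so circles never overlap
        else:
            i += 1
    return embeddable
```

A greedy scan suffices here because picking the leftmost available circle never excludes more than one later circle, the standard interval-scheduling argument.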
In the worst case, the maximal number n_lb of the first group of embeddable coefficients is

    n_lb = (1/3) · (n/3) · Pr{(C_H, C_*, C_T)} = (n/9) · Pr(C_H) Σ_j (a_Hj a_jT)   if C_H = C_T    (5.9)

    n_lb = (1/2) · (n/3) · Pr{(C_H, C_*, C_T)} = (n/6) · Pr(C_H) Σ_j (a_Hj a_jT)   if C_H ≠ C_T    (5.10)

As we can see, the number of embeddable coefficients given by (5.9) and (5.10) is linear in n, even in the worst case. Likewise, we can pick out the second group of embeddable coefficients by first choosing the header and tail that give the second largest Pr{(C_i, C_*, C_k)} and then drawing non-overlapping circles on the remaining white spaces along the cover stream. Repeating this method, we can pick out a third and further groups of embeddable coefficients. Note that during embedding, the shuffling operation is always limited to within the same group. The total number of embeddable coefficients is the summation over all groups. Since the number of embeddable coefficients is already linear in n for the worst case of the first group, we have proved that by using the technique of subset selection, a linear embedding rate can be achieved in the first order Markov case.

5.3 Conclusion

The subset selection technique is a theoretical trial of our embedding philosophy, i.e. create a stego-friendly cover before embedding. Since no distortion limit is considered in our analysis, it is, in the big picture, a special case of the more general result established in (28).¹

¹ We thank Professor Fridrich for pointing this out.

6 Covert Communication via Cultural Engineering

6.1 Introduction

The concept of supraliminal channels was introduced in 1998, as a steganographic primitive used to achieve public key exchange in the presence of an active warden. (29) Alice and Bob must communicate in the presence of Wendy, a warden; they have no shared keys in advance, and must enact a key exchange to use a steganographic algorithm.
However, any overt key exchange is evidence of secret communication, and any steganographic bits embedded with a known key can be damaged by Wendy. Supraliminal channels allow robust and innocuous transmission of public key data in this environment. In this paper, we describe how a new feature added to the video chat capabilities of iChat for the Apple Macintosh allows supraliminal content to be attached to a videoconferencing session. By implementing animation primitives that covertly accept ciphertext from an external application, and then fold this ciphertext into the evolution of a computer animation, ciphertext can be transmitted within the semantic content of the video.

This paper is organized as follows: in section 6.2, we review supraliminal channels and outline an efficient protocol to achieve a supraliminal channel from a steganographic channel and an image hash. In section 6.3 we explain our overall philosophy of data hiding and how it differs from the traditional approach of imperceptibly modifying cover data. In section 6.4 we describe our software architecture, and in section 6.5 the results of our supraliminal channel.

6.2 Supraliminal Channels

Unlike a steganographic channel, in which a multimedia object is undetectably modified, a supraliminal channel chooses or controls the semantic content of the cover data itself. This can be achieved by a cover data generation program, or mimic function, which transforms an input string reversibly into a convincing piece of cover data. (30) However, we will show that overt content generation is not strictly necessary to construct a supraliminal channel. Supraliminal channels are not keyed, and are readable by the warden. The main purpose of such a channel is ensuring total robustness of the message in the presence of an active warden, even if the embedding and detection algorithm is known.
This follows from an assumption that the warden is allowed to add noise, but not to make any modifications that affect the semantic meaning of the cover data being sent. If Alice and Bob use such a channel to send a meaningful message, it will be readable by Wendy; however, they can send strings which resemble channel noise, so that Wendy cannot tell whether an image is purposely generated. This is sufficient to transmit key data in order to accomplish key exchange, after which the shared key can be used in a traditional steganographic channel. The combination of a key exchange using subliminal and supraliminal channels achieves so-called pure steganography, that is, secure steganography without prior exchange of secret data. (29)

6.2.1 A two-image protocol for supraliminal encoding

If one has a very low bitrate cover data hash function, it can be extended into a higher bitrate supraliminal channel by bootstrapping a keyed steganographic algorithm and extending the transmission over multiple steps. Suppose that Alice and Bob have access to a cover data hash algorithm h : I → {0, 1}^n. As a toy example, a recipient can choose a single word that best describes an image I, or other descriptive text suitably derived from I, and let h(I) be the SHA-256 hash of that text. This alone is not suitable for a supraliminal channel, firstly because the bit rate is extremely low, and secondly because the hash cannot be inverted into a generation function. However, Alice can perform the following:

1. Alice chooses two natural cover images I and J.
2. Alice embeds a message m in J, using a robust keyed steganographic algorithm with the hash of I as a key: Ĵ = Embed(m, J, key = h(I))
3. Alice sends Ĵ to Bob.
4. Bob confirms receipt of Ĵ.
5. Alice sends I.
6. Bob (and Wendy) can extract the message m = Detect(Ĵ, h(I))

This two-image protocol is illustrated in figure 6.1.
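As a toy illustration of this protocol, the sketch below stands in for the robust keyed steganographic algorithm with a naive keyed LSB scheme over raw bytes; `h`, `embed`, and `detect` are hypothetical stand-ins for the hash, Embed, and Detect above, not the algorithm used in our experiments.

```python
import hashlib
import random

def h(description):
    # Toy cover-data hash: SHA-256 of descriptive text derived from image I.
    return hashlib.sha256(description.encode()).digest()

def embed(message, cover, key):
    # Stand-in for Embed(m, J, key = h(I)): the key seeds a PRNG that
    # selects which LSB positions in the cover bytes carry message bits.
    rng = random.Random(key)
    positions = rng.sample(range(len(cover)), len(message) * 8)
    stego = bytearray(cover)
    for bit_index, pos in enumerate(positions):
        bit = (message[bit_index // 8] >> (7 - bit_index % 8)) & 1
        stego[pos] = (stego[pos] & 0xFE) | bit
    return stego

def detect(stego, key, msg_len):
    # Stand-in for Detect(J_hat, h(I)): anyone holding I, including Wendy,
    # can re-derive the key and read the public message.
    rng = random.Random(key)
    positions = rng.sample(range(len(stego)), msg_len * 8)
    out = bytearray(msg_len)
    for bit_index, pos in enumerate(positions):
        out[bit_index // 8] |= (stego[pos] & 1) << (7 - bit_index % 8)
    return bytes(out)
```

Alice first sends `embed(m, J, h(I))`, then reveals I; Bob recomputes `h(I)` from I's semantic content and calls `detect`.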
Note that the message is public; the purpose of the channel is robust transmission without requiring a known key in advance, essentially robust publication. This approach to public-key steganography was introduced by Anderson, Petitcolas and Kuhn, requiring a keyed steganographic transmission followed by a broadcast of the key after the data is known to be securely received. (31) The authors point out that a broadcast of a key is suspicious, but can be plausibly explained in certain scenarios; this protocol exhibits such a scenario, by replacing the broadcast with the transmission of an arbitrary message.

Figure 6.1: The two-image protocol for establishing a supraliminal channel from an image hash. Alice and Bob's transmissions are each split across two images, with the semantic content of the second keying the steganographic content of the first.

This approach to creating a supraliminal channel has several desirable properties. First, it can use any cover data hash algorithm, which need not be invertible. Second, it does not require an algorithm to generate content, which is difficult; instead, Alice chooses a key by arbitrary choice of natural content.

6.2.2 Public key encoding

As mentioned above, the message m must be as statistically meaningless as channel noise; that is, it must resemble the channel noise that would be found if Wendy attempted to decode an innocent message. In the sequel, we will show a channel where a computer animation, originally driven by a pseudo-random number generator, is instead driven by message bits. This means that if the message bits are computationally indistinguishable from the PRNG output, they can be sent over the channel. This goal is trivially achievable if the PRNG is, say, a counter-mode AES keystream. For this reason, we consider the problem of packaging key data to resemble n-bit strings of i.i.d. fair coin flips.
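One concrete approach, sketched here in Python with illustrative function names, is to shape a Diffie-Hellman value by rejection so that the transmitted string is uniform over all n-bit strings:

```python
import math
import secrets

def encode_dh_value(g, p):
    # Shape A = g^a (mod p) so the transmitted value is uniform over all
    # n-bit strings, n = 1 + floor(log2 p). Returns (secret exponent, A);
    # the draw a = 0 (probability 2**-n) signals failure as (0, 0).
    n = 1 + math.floor(math.log2(p))
    a = secrets.randbits(n)
    if 0 < a <= p - 1:
        return a, pow(g, a, p)
    elif a >= p:
        # Map the slack region [p, 2**n) onto values p + g^b < 2**n.
        while True:
            b = 1 + secrets.randbelow(p - 1)  # uniform over {1, ..., p-1}
            A = p + pow(g, b, p)
            if A < 2 ** n:
                return b, A
    else:
        return 0, 0

def decode_dh_value(A, p):
    # Receiver recovers g^x (mod p) regardless of which branch produced A.
    return A if A < p else A - p
```

The two branches produce the same distribution of group elements; only the bit-string packaging differs, so any n-bit string is equally likely on the wire.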
Public keys are randomly generated, so it may seem that they can be convincingly disguised as random strings; however, keys have structure, as well as unlikely mathematical properties. An RSA key's modulus is not just any random number, but one that happens to be difficult to factor, a relative rarity among random strings. To achieve random packaging of a public key, we use the Diffie-Hellman key exchange with a public prime p for which the discrete logarithm is hard, and a generator g for Z/pZ. Alice must transmit a single value A = g^a (mod p) for a secret a, and Bob transmits the value B = g^b (mod p). Thereafter their session key is B^a = g^ab = A^b. The value of A is uniformly chosen over p − 1 values, and must be shaped so that it is uniform over all n-bit strings, where n = 1 + ⌊log2 p⌋. The algorithm to do so is simple:

1. Set n = 1 + ⌊log2 p⌋;
2. Generate a uniform n-bit number a;
3. if (0 < a ≤ p − 1) { set A = g^a (mod p); }
4. else if (a ≥ p) {
       repeat: {
           choose b uniformly among {1, 2, ..., p − 1};
           set B = g^b (mod p);
           set A = p + B;
       } while (A ≥ 2^n);
   }
5. else set A = 0;¹

This algorithm encodes a number of the form g^a (mod p) such that any n-bit string is equally likely to be observed.

6.3 Manufacturing a Stego-friendly Channel

The most common approach to steganography entails the subtle modification of a fixed but arbitrary piece of cover data. This must be performed so that no statistical test can distinguish between natural and doctored data. The challenge of this approach is that we have no complete statistical model for multimedia data, and cannot guarantee that any artificial modification of cover data will go undetected. A second approach to steganography avoids "natural" data and instead places bits in simpler, artificial content, for example TCP packet timestamps or the ordering of note-on messages in MIDI data. (32, 33)
¹ Key exchange therefore fails with probability 2^−n, a virtual impossibility, but necessary to make the distribution truly uniform.

While this data may be statistically simpler to characterize, the same modeling problem prevents an absolute guarantee of secrecy; another problem with this approach is that discrete data is often easy to overwrite without harming the content. A third, extreme approach is to design a form of communication that makes an ideal cover for data embedding, and somehow convince the public to use it. Thus instead of using natural data, artificial data is made natural by placing it in common use. As a toy example, one could create a popular multi-player game or other multi-user application, and design a messaging protocol that makes considerable use of i.i.d. bit strings as random nonces. These could then be replaced by ciphertext to establish a covert channel. The obvious flaw in this plan is convincing a large group of people to start using and sending the type of data we want. Users will not adopt a new media format en masse simply because it makes steganography harder to detect.¹ However, each emerging trend in social networking carries with it the possibility of an ideal stego channel.

6.3.1 Apple iChat

In late 2007, Apple announced version 10.5 of their operating system, Mac OS X. This included an extension to their bundled iChat instant messaging software that allows the user to apply special effects to a video chat session. Special effects include camera distortions, various image filters, and the ability to replace the user's background with a specific image or video file. The software allows users to drop in their own image and video backdrops. As of January 2008, the operating system market share of Mac OS X was a bit more than 7%.
(35) The authors suspected that if the software allowed image and video backdrops, then it would allow arbitrary multimedia files adhering to Apple's QuickTime standard. This includes Quartz Compositions, computer animations represented as programs rather than static multimedia data. Quartz Compositions are used within Mac OS X as animations for screen savers and within applications. They are composed of flow graphs of rendering and computation elements, called patches, that reside in user and system graphics libraries.

The ramifications of a computer-generated backdrop in a popular video chat application are significant: computer animations are often driven or guided by pseudo-random number generators, and if by some device we could replace the pseudo-random data with ciphertext, then we have a supraliminal channel within a video chat application. As long as an animation uses pseudo-random data transparently enough that the PRNG bits can be estimated from the video at the receiver, then some of the semantic content of the video is generated from message bits. The fact that this is possible within the default instant messaging client of a popular operating system creates the opportunity for a plausible, but stego-friendly, form of communication.

¹ This approach was taken to its logical extreme in 2001, when a software developer announced a simple LSB-embedding steganographic tool on Usenet cryptography fora. When people pointed out the well-known statistical attacks on brute LSB embedding, the author responded with the astonishing argument that if only the public would start randomizing the LSB planes of their images, his stego-images would be harder to detect. (34)

6.3.2 Cultural engineering

Social engineering is the act of beating a security system by manipulating or deceiving people, for example into revealing confidential information or granting access to an adversary.
(36) On a larger scale than the individual, we have the possibility of cultural engineering: achieving a security goal by swaying society in general towards a certain behavior. Popularizing a steganographically friendly form of communication is a form of cultural engineering. A cultural shift that has greatly affected the status of cryptography is the public embrace of the Internet for electronic commerce and other applications requiring secure HTTP connections. This made the public use of encryption far more common; not only did this phenomenon make encryption less conspicuous, but it fostered a general public expectation of privacy-enhancing technologies.

6.4 Software Overview

Our data-hiding channel is concealed in a computer animation whose pseudo-random number generator occasionally outputs ciphertext. The overall transmitter architecture is shown in figure 6.2. The iChat application hosts a computer-animated backdrop; the computer animation contains a doctored pseudo-random generator that retrieves pseudo-random bits from an external server. The server can be fed data from an application, and outputs either pseudo-random bits or ciphertext bits.

Figure 6.2: The transmission pipeline. Data is fed to a ciphertext server, polled by a doctored PRNG patch in an animation.

The server is a rudimentary Tcl program with C extensions for generating pseudo-random data. It generates pseudo-random data using 128-bit AES in counter mode; when fed a request for data, of the form "{o|b|h} [number]", the server returns a number of octal, binary, or hexadecimal digits from its store. The request "i {H|C} [data]" causes the server to input a data string specified either in hexadecimal or character format; upon future requests for random bits, this message data is exclusive-ored with the keystream.
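The server's keystream logic can be sketched as follows; SHA-256 in counter mode stands in here for the server's 128-bit AES (an illustrative model, not the server's actual C extension):

```python
import hashlib

def keystream(key, nbytes):
    # Counter-mode keystream: hash(key || counter) blocks, concatenated.
    # SHA-256 stands in for the AES block cipher used by our server.
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out.extend(hashlib.sha256(key + counter.to_bytes(16, "big")).digest())
        counter += 1
    return bytes(out[:nbytes])

def xor_mask(data, key):
    # Message bits XORed with the keystream are indistinguishable from
    # the keystream itself to anyone who does not hold the key.
    ks = keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))
```

Because `xor_mask` is its own inverse under the same key, the receiver recovers the message by XORing the observed bits with the regenerated keystream.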
#!/usr/bin/tclsh
set port 8080

proc answer {sock host2 port2} {
    fileevent $sock readable [list serve $sock]
}

proc serve sock {
    fconfigure $sock -blocking 0 -translation binary
    gets $sock line
    if {[fblocked $sock]} return
    fileevent $sock readable ""
    set response [myServe $line]
    puts $sock $response; flush $sock
    fileevent $sock readable [list serve $sock]
}

puts "Bit server running on port $port, socket=[socket -server answer $port]."
vwait forever

Figure 6.3: The 14-line ciphertext server, adapted from a 41-line web server in (1).

6.4.1 Quartz Composer animations

A Quartz Composition is a computer animation represented as a flow graph of fixed "patches." Patches are encapsulated procedures for rendering or computation with published inputs and outputs; they reside in system and user libraries, which can be extended with user-designed patches. The animation we used for our experiments is a randomly changing grid of colored squares, diagrammed in figure 6.4. The "ciphertext server" patch is a customized patch which connects to our server to retrieve pseudo-random data. The data is requested as a string of octal digits, which are converted to the hue of each column. The remainder of the animation uses standard Quartz Composer patches.

Figure 6.4: Quartz composer flow-graph for our sample animation. At the top, the "Ciphertext Gateway" is polled by a clock to update the state of the display. The animation contains two nested iterators to draw columns and rows of sprites (blocks).

6.4.2 Receiver architecture and implementation issues

To establish a channel, the captured video at the receiver must be analyzed to estimate the pseudo-random bits used to drive the background animation. This requires that the animation uses its bits in a suitably transparent way, and that the background state can be estimated while partially occluded by a user.
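To make the estimation task concrete, here is a sketch of the octal-digit-to-hue mapping and the receiver-side inverse; our actual patch and decoder differ in detail, so this Python version is purely illustrative.

```python
import colorsys

def digit_to_rgb(d):
    # Each column carries one octal digit (three bits) as a block hue,
    # spacing the eight digits evenly around the color wheel.
    return colorsys.hsv_to_rgb(d / 8.0, 1.0, 1.0)

def rgb_to_digit(rgb):
    # Receiver side: estimate the hue of the observed block and snap it
    # to the nearest of the eight transmitted hues.
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return int(round(h * 8)) % 8
```

Snapping to the nearest hue gives the decoder a tolerance of a sixteenth of the color wheel per digit, which is what makes estimation robust to mild color noise.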
For our experiments, we capture the video on the receiving end as a video file for analysis; in the future we will implement a real-time receiver that polls the contents of the iChat video window. Using our ciphertext server and customized patch, a designer only needs to compose a computer animation and write a corresponding C function to estimate the PRNG state given bitmap frames from the received video. The server, patch, and receiver code are already written.

One implementation issue we have encountered is Apple's sandboxing of Quartz Compositions. Because of the clear security risk of video files that can covertly access one's file system or connect to the Internet, QuickTime Player refuses to load any custom patches, and limits native patches that access the outside world, such as RSS readers. This prevented our customized patch from connecting to our server whenever it was installed as a backdrop for iChat. Fortunately, the sandboxing was relatively simple to override. By examining the binary files of existing patches, we found that Quartz Composer patches simply implement the class method +(BOOL) isSafe, which by default returns NO. Subclassing the patch class circumvents this measure. This cannot be accomplished through Apple's public API, however; the public API does not allow a programmer to implement a patch directly, but only a "plug-in" object, a wrapper through which isSafe cannot be overridden. The ciphertext server patch had to be written directly as a patch, using undocumented calls.¹

Another implementation issue concerns random delays in rendering. When immersed in iChat, our animations do not render regularly, but experience random lag times between updates. This is a minor problem, because iChat does not skip frames of our animation; it merely spreads animation frames over more video frames. This only limits the speed of supraliminal transmission, and does not result in the dropping of message bits.
To cope with the delays, our decoder must be asynchronous, identifying from the received video when the animation has changed its pseudo-random state. This is easily accomplished with our animation, because block hues are dramatically altered across the entire frame during a state transition.

¹ We are indebted to Christopher Wright of Kineme.net, whose expertise on the undocumented API made our patch possible.

6.5 Results

For purposes of designing an optimal animation, we measured per-pixel occlusion of the iChat application. Pixels can be occluded either by the user in the foreground, or by misalignment of the camera or changes in lighting that confuse the background subtraction algorithm. We first replaced iChat's backdrop with a frame of bright green pixels, recorded the output as a video file, and computed the fraction of frames in which each pixel position was blocked. An occlusion graph for a 16-minute video chat is shown in figure 6.5.

Figure 6.5: Average occlusion of background pixels over 31369 frames (16 minutes 28 seconds) of a video chat session.

Based on this data we determined that a pseudo-random state should be estimable given the upper left and right corners of the video. We produced several animations that only utilized the top band of the frame, and could easily embed more data per frame by incorporating more of the sides of the frame. One animation featured a grid of colored blocks. We transmitted this backdrop with varying periods and grid sizes; each column transmits three bits of data. Decoding was accomplished by estimating the hue of the topmost row of blocks. The results of decoding are shown in figure 6.6.
BER                                        Block rows per second
          Attempted period (s)                       Attempted period (s)
Blocks   0.5     0.25    0.125   0.05      Blocks   0.5     0.25    0.125   0.05
8        0.002   0.005   0.087   0.095     8        2.01    4.25    8.14    7.58
16       0.00    0.004   0.069   0.101     16       2.07    4.08    7.62    8.31
32       0.003   0.095   0.045   0.062     32       2.02    4.12    7.57    7.74
64       0.002   0.0015  0.003   0.006     64       2.05    4.01    7.36    8.21

Figure 6.6: BER (left) and block rows transmitted per second (right) with our animation/decoder pair, for several block sizes and attempted periods.

These results show a limit of approximately 8 animation frames per second effectively transmitted, and to our surprise the BER decreased as we increased the block size. We believe this occurs because errors are largely intermittent, rather than caused by general noise prohibiting hue estimation. This result indicates that backdrop computer animations can easily leak hundreds or thousands of bits per second. Hence this channel is suitable for public key exchange, and has a suitable bitrate for transmitting text messages as well.

Figure 6.7: Supraliminal backdrops. On the left, colored blocks. On the right, the user's office superimposed with rain clouds. The animations are driven either by pseudo-random bits or ciphertext, estimable by the recipient.

6.6 Discussions and Conclusions

We have illustrated a means to smuggle ciphertext into a pseudo-random data source driving a computer animation, which can then be added as a novelty backdrop for a popular video chat application. The pseudo-random bits can be estimated at the other end of the channel, allowing the leakage of ciphertext. Rather than hiding our information imperceptibly in the background noise of a multimedia file, we embed quite literally in the background, in an extremely perceptible fashion. Rather than achieving imperceptibility, we instead seek a different goal: plausible deniability. The background animation is obvious but plausibly deniable as innocent.
Since it comprises the semantic content of the video, a warden cannot justifiably change it. Our channel is designed so that new and creative animation/decoder pairs are easy to create. Once a designer has installed our ciphertext/PRNG patch and the corresponding server, designing a transmitter is a simple matter of using the patch in a Quartz Composition, a step that requires no coding. Decoding requires the implementation of a C function that inputs each video frame as a bitmap and estimates the message bits. If the designer creates an animation whose bits are suitably estimable, for example our backdrop of colored squares, the decoder can be easily written. Data rates of thousands of bits per second are possible if the animation is properly designed. We hope that this framework will spur creative development of animation/decoder pairs, and encourage further search for possible supraliminal channels in emerging social networking applications.

7 DSLR Lens Identification via Chromatic Aberration

7.1 Introduction on Camera-related Digital Forensics

7.1.1 Forensic Issues in the Information Age

There is no doubt that more and more digital devices will be developed and deployed in people's daily lives, in order to facilitate information capture and access. As these digital devices play an ever more important role in our lives, a few fundamental security problems take on an unprecedented level of importance. Digital forensics, a relatively new and ever-evolving research field, is expected to provide solutions to the following problems:

1. Source identification: the origin of a certain piece of information.
2. Forgery detection: the integrity of a certain piece of information that was generated by a digital device.
3. Timing identification: the timing or the sequence of several pieces of information.

The first problem involves the unique identification feature of every digital device.
The second problem involves the consistency of the identification feature presented in that piece of information; the identification feature for forgery detection could come either from the digital device or from the information content being generated. The third problem involves the gradual evolution of the identification feature over time.¹ All three problems point to the same key: the identification feature.

¹ In practice, it is degradation in most cases, due to the second law of thermodynamics.

7.1.2 Digital Camera: 'Analog' Part and 'Digital' Part

In the big picture given by physics, the diversity of our universe stems from so-called symmetry breaking. In other words, it is symmetry breaking that makes our world so colorful. As a consequence, the identification feature of everything ultimately originates from this symmetry breaking. From the perspective of an engineer, most digital devices are an integration of hardware and software. For the hardware part of a digital device, engineers commonly use the term imperfection, which is related to symmetry breaking but more specific. For the software part of a digital device, the uniqueness of any algorithm can also be deemed a case of symmetry breaking.

Today's digital camera is a complex system consisting of numerous hardware and software components, such as glass optics, an anti-aliasing filter, an infrared filter, a color filter array, micro lenses, an image sensor, and image processing chips. Given that no man-made device can be free of imperfection, camera hardware imperfections can always be used as an identification feature to distinguish different cameras. In addition, when it comes to in-camera proprietary software, at least part of the algorithm or parameter settings is very likely to differ from one camera manufacturer to another.
In general, while hardware imperfections are capable of distinguishing different copies of the same camera, a software 'signature' only works for distinguishing different models or different manufacturers, since the in-camera processing software in different copies of the same model is most likely identical.¹ This difference in capability is essentially due to the analog nature of any physical device and the digital nature of any in-device software.

¹ Although the camera firmware could differ due to upgrading.

7.1.3 Identification Feature Categories

From the perspective of the three applications: the task of source identification is to extract the most effective and robust identification feature set reflecting the inner workings of a camera; the task of forgery detection is to detect inconsistency within a produced image, either inconsistency within the camera fingerprints or inconsistency within the image semantic content; and since all digital cameras age with time, their 'fingerprints' are expected to change with time as well. For example, it turns out that cosmic radiation actually plays an important role in changing the PRNU pattern of a camera image sensor. (37)

The identification features explored in previous research fall into four categories: hardware induced features, software induced features, picture semantic content features, and statistical features as manifested in the final picture. Unlike the first three, which each focus on one specific origin, the statistical features are an accumulated result of all camera imaging and processing, including both hardware imperfections and processing software.

1.
The hardware induced 'analog fingerprints':
• Lens geometric distortion, such as radial distortion. (38)
• Lens chromatic aberration, such as lateral chromatic distortion. (39, 40)
• Image sensor imperfections, such as the PRNU (41, 42) and the age of the PRNU (43).

2. The software induced 'digital fingerprints':
• The color filter array (CFA) interpolation matrices. (44, 45)
• The double JPEG quantization effects. (46)
• The compression configurations stored in the JPEG header. (47)

3. Identification features coming from the semantic content of a picture:
• The lighting consistency of the objects in a picture. (48)

4. Statistical patterns presented in a picture:
• A group of selected feature patterns relating to color, image quality, and wavelet statistics. (49)
• The camera response function. (50, 51)

As a remark on the statistical identification features, we notice that few research works have evaluated how camera shooting modes affect detection reliability. Camera shooting mode settings are a frequently visited functionality for many camera users. Take Canon cameras for example: there are quite different shooting modes, such as 'portrait', 'landscape', and 'standard'. These modes usually have different combinations of sharpness, contrast, saturation, and color tone settings, which are expected to produce different statistical features in the corresponding final pictures. Fortunately, except for the CFA interpolation matrix related features, most of the identification features belonging to the first three categories are much less likely to be affected by in-camera shooting mode settings.

7.2 DSLR Lens Identification

7.2.1 Motivation

With the popularization of point-and-shoot cameras, Digital Single-Lens Reflex (DSLR) cameras have become popular among users dissatisfied with the image quality produced by compact point-and-shoot cameras.
This trend is reflected in statistics released by the image hosting website flickr.com. (52) One major difference between point-and-shoot cameras and DSLRs is that DSLR cameras allow the user to change lenses. As a result, the price of a single DSLR interchangeable lens is often comparable to the price of a DSLR body. For this reason, it is worth exploring forensic identification features inherent to camera lenses, in addition to the identifiable features of camera bodies such as the PRNU of the camera CCD or CMOS sensor. (41, 42)

On the technical side, across glass lenses in general, lens chromatic aberration features are highly variable, as revealed by previous papers and as will be further demonstrated in this paper. If the goal is lens classification, we need a deeper understanding of the workings of the lenses in order to extract stable identification features. Lens parameters are generally more accessible for interchangeable DSLR lenses than for their counterparts built into point-and-shoot cameras.

7.2.1.1 Lens Aberrations

Due to physical limitations, it is impossible to build a glass lens system without any aberration. Among others, there are two common categories of lens aberration: geometric distortion and chromatic aberration (hereafter CA). Geometric distortion commonly manifests as pincushion or barrel distortion in real life pictures, while chromatic aberration often appears as color fringes, especially in the corners of pictures.

While many causes can possibly account for the color fringes in an image taken by a digital camera, there are two major types: Lateral CA and Axial CA.1 Both stem from the fact that the refractive index of a lens depends on the light wavelength. To be more specific, LCA is relevant to obliquely incident light, and can be explained as a lens magnification factor that depends on the light wavelength.
It often manifests in telephoto lenses due to their large image magnification factor. ACA, on the other hand, concerns incident light that is parallel to the lens optical axis; it is due to the inability of a lens to focus different colors in the same focal plane. ACA manifests at large lens apertures. Note that since geometric distortions like pincushion and barrel distortion are due to magnification ratio variation along the radial direction of the image, they interact with LCA, which is also tied to the magnification ratio. In fact, in a real life picture, all sorts of lens aberrations coexist and are intertwined with each other. If our goal is to collect forensic data for lens identification, instead of isolating each of them, it might be more efficient to capture all of them at once. With this philosophy in mind, we will use the term CA for the rest of the paper.

7.2.1.2 Previous Works

Currently, there are three applications for the measurement of lens aberration: source identification, forgery detection, and CG–natural image identification. Choi and Lam proposed using the metric of lens geometric distortion to aid in identifying source cameras. (38) Johnson and Farid proposed for the first time to use LCA as a possible approach to camera identification as well as forgery detection. (39) Their estimation is based on a global linear model. Gloe and Borowka suggest using local estimation instead in order to achieve better computational efficiency. (40) They also found that LCA is not necessarily linear in real cases. Results from both Johnson and Gloe showed relatively large estimation variation, which makes it impossible to perform lens identification for different copies of the same lens.

1 Lateral CA is also known as Transverse CA; Axial CA is also known as Longitudinal CA.
While both Choi and Gloe noticed the large variation of lens aberration patterns under different focal length settings, none of the previous papers pointed out another factor, the lens focal distance. Focal distance plays an equally important role in shaping the lens aberration pattern. Consider a real zoom lens: a zoom lens normally changes focal length by rotating or pushing the zoom ring. In doing so, the geometric relationship of the glass elements changes greatly, which results in a quite different lens aberration pattern. Likewise, a lens normally changes its focal distance by rotating or pushing the focus ring, which has the same type of impact on lens aberration patterns. Therefore, in order to obtain a consistent lens aberration pattern, keeping the focal length unchanged is not enough; the focal distance also has to be fixed.

7.2.2 A New Shooting Target

7.2.2.1 Misalignment Problems

By convention, previous lens forensic papers as well as many lens distortion calibration papers use a checkerboard pattern image, or some deformed checkerboard pattern image, as the shooting target. (40, 53, 54, 55) While a checkerboard pattern works well for image calibration, it may not be a good choice for lens identification, for two important reasons.

In practice, even with a very good tripod, there is inevitable wobbling incurred by the action of changing lenses, and to a lesser degree, by the mirror flipping inside any DSLR camera body. These slight camera movements translate into a shifting of the shooting area, i.e. the portion of the target captured in each picture will be slightly different from shot to shot when comparing multiple lenses. As a result, if a checkerboard pattern is used as the shooting target, the positions of all corner points in each test picture are expected to be different.
Consequently, misaligned CA patterns extracted from those misaligned test pictures of a checkerboard pattern cannot provide a fair ground for lens comparison.

With respect to the corner detection algorithm to be applied to test pictures of a checkerboard pattern, most of the commonly used corner detectors would also introduce misalignment. Partly because there is no universally agreed mathematical definition of a ‘corner point’, most corner detectors are unable to produce robust results. (56, 57) In other words, even if there is absolutely no camera movement, with all shooting settings kept the same, the positions of the detected corner points may not always be consistent (to pixel precision) from shot to shot.

7.2.2.2 White Noise Pattern

In order to address the pattern misalignment problem, we propose to use a white noise pattern as the shooting target. By using a white noise image like figure (7.1), we can achieve error-free alignment by applying one rigid block division to all test pictures, without worrying about either minor camera movements or the instability inherent in a corner detector. Furthermore, we save the running time needed for corner detection.

Figure 7.1: White noise pattern image as shooting target.

7.2.2.3 Paper Printout vs LCD Display

It is worth noting that an LCD displaying a white noise pattern cannot be used as the shooting target. Instead, a paper printout of the white noise pattern should be used. The reason is that each single pixel on a regular LCD screen typically consists of three subpixels, in the form of either R-G-B or B-G-R stripes along the horizontal direction. Therefore, at an edge where a white pixel meets a black pixel, there is 1/3 pixel of red or blue, depending on the display’s subpixel arrangement.
Given that today’s camera sensors greatly outresolve today’s LCD displays, this 1/3 color pixel would be magnified in the test pictures taken. This ‘pseudo CA pattern’ from an LCD display would overwhelm the real CA pattern from a camera lens and render the extracted CA pattern useless.

7.2.3 Major Feature-Space Dimensions of Lens CA Pattern

Among all user-controllable lens parameters, three major ones play an important role in shaping a lens’ CA pattern. While focal length and aperture size have already received attention in previous papers, to the best of our knowledge, focal distance, another equally important factor, has never been revealed. As a consequence, while previous lens forensic papers succeeded in using CA patterns to distinguish lenses of different models, the CA patterns they obtained were not stable enough to fulfill the task of distinguishing different lens copies of the same model. In this paper, we will demonstrate that after taking this third parameter into account, we are able to obtain a CA pattern stable enough to distinguish different lenses of the same model.

7.2.3.1 Focal Length

Focal length has already been identified as one of the key variables in shaping a zoom lens’ CA pattern. (38, 40) However, in practice, it is worth noting that the focal length recorded in the EXIF metadata tag1 has limited precision. Take Canon’s EF-S 18-55mm IS and EF 17-40mm lenses for example: the finest focal length step reported in the EXIF metadata tag is either 1mm or 2mm, depending on the focal range. Therefore, given two pictures taken by the same lens and with the same EXIF focal length reading, the underlying CA patterns might actually be different. Since the stability of the CA pattern is key to the success of lens identification, the EXIF focal length reading should be taken with caution.
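Stepping back to the white noise target of section 7.2.2, a minimal sketch of how such a target and the rigid block division might be realized is shown below. All function names, sizes, and the `cell` parameter are illustrative; the dissertation does not specify a particular generator.

```python
import numpy as np

def make_white_noise_target(height, width, cell=1, seed=0):
    """Generate a binary (0/255) white-noise pattern image.

    Each cell x cell square is independently black or white with
    probability 1/2, so any framed sub-window is statistically alike;
    no distinguished corner points are needed for alignment.
    """
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=(height // cell, width // cell), dtype=np.uint8)
    # Expand each bit into a cell x cell square, then scale to 0/255.
    return np.kron(bits, np.ones((cell, cell), dtype=np.uint8)) * 255

def rigid_blocks(img, block=48):
    """Return top-left corners of non-overlapping block x block tiles.

    The identical division is applied to every test picture, so the
    extracted CA vectors are aligned by construction.
    """
    h, w = img.shape[:2]
    return [(r, c) for r in range(0, h - block + 1, block)
                   for c in range(0, w - block + 1, block)]

pattern = make_white_noise_target(480, 640)
blocks = rigid_blocks(pattern, block=48)   # 10 rows x 13 cols = 130 tiles
```

A printout of such a pattern can then be photographed by every lens under test, with the same `rigid_blocks` division applied to each picture.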
Figure (7.2) shows how the lens CA pattern varies with focal length. A Canon EF-S 18-55mm IS lens was used to produce the two plots, both with the same focal distance but different focal lengths: 50mm (a reading taken from the EXIF metadata, hence not a very precise number) for the left plot and 55mm for the right plot.

1 EXIF stands for exchangeable image file format; it is a specification for the image file format used by digital cameras.

Figure 7.2: Two B-G channel CA patterns with one lens working at different focal lengths.

7.2.3.2 Focal Distance

Mechanically speaking, the action of adjusting lens focal length (also called zooming) changes the arrangement of the groups of glass elements inside a zoom lens, which is ultimately reflected as a CA pattern change in the pictures taken. On the other hand, focal distance is adjusted during the autofocus or manual focus procedure in order to achieve sharp focus at the distance where the main subject lies. During focusing, the glass arrangement inside a lens also changes, very much as it does during zooming. This means focal distance is of equal importance to focal length in shaping the lens CA pattern. Moreover, focal length only plays a part in zoom lenses, while focal distance plays a part in virtually all modern digital lenses, encompassing both zoom lenses and prime lenses (the focal length of a prime lens is constant).

Figure (7.3) shows how the lens CA pattern varies with focal distance. The identical lens used to produce figure (7.2) was used here. This time, the focal length was fixed at 55mm. The focal distance is around 2.8m for the left plot, and around 2.2m for the right plot1.
1 The distance here is the length from the shooting target to the camera image sensor.

Note that the rotational shift of the lens focus ring in response to the focal distance change was only within 2mm, a very tiny amount.

Figure 7.3: Two B-G channel CA patterns with one lens working at different focal distances.

7.2.3.3 Aperture Size and Other Issues

Another important factor is the lens aperture size. A CA pattern typically gets smaller as the lens aperture is stepped down. Yet another user-controllable lens feature, the lens image stabilization system, may also affect the lens CA pattern. Due to the modelling complexity involved with an image stabilization system, we will not investigate how the operation of image stabilization affects the lens CA pattern in this paper. In practice, we can avoid the impact of lens image stabilization by simply turning it off during testing.

7.2.3.4 Size of a Complete Lens CA Identification Feature Space

For the sake of digital forensics, the identification data needed for a camera sensor, such as the PRNU, is one matrix of the same size as the sensor’s pixel count, per RGB channel. (42) For the lens CA pattern, one can think of one 2D vector matrix per R-G and B-G channel pair, plus three extra dimensions: the focal length, the focal distance, and the aperture size.

Essentially, the difference between lens identification and sensor identification boils down to the difference in complexity of the physical construction of a camera lens versus an image sensor. Consider a camera sensor with resolution x × y, and let the CA estimation block size be xb × yb. Let fl and fd denote the number of distinguishable focal length and focal distance settings, and let A denote the total number of possible lens aperture sizes.
We can formulate the ratio of required data volume between the lens CA pattern and the sensor PRNU pattern:

\[
\frac{\lVert D_{\mathrm{lens}} \rVert}{\lVert D_{\mathrm{sensor}} \rVert}
= \frac{\frac{x}{x_b} \cdot \frac{y}{y_b} \cdot 2 \cdot 2 \cdot f_l \cdot f_d \cdot A}{x \cdot y \cdot 3}
= \frac{4\, f_l\, f_d\, A}{3\, x_b\, y_b}
\tag{7.1}
\]

In practice, typical values for xb, yb, and A are on the scale of 10^1. The ratio then depends on the ‘granularity’ of fl and fd, i.e. the quantization scale within which the lens CA pattern can be deemed unchanged. For a practical lens forensic problem with a lens database, the granularity of fl and fd would largely depend on the total number of lenses to be distinguished. In the experiment section, we will show that for fixed fl and fd, the CA pattern data obtained is enough to distinguish several copies of the same lens. Determining the appropriate granularity of fl and fd for distinguishing a large number of lens copies is a topic for future research.

7.2.4 Experiment

7.2.4.1 Test Picture Shooting Settings

The shooting settings adopted in our experiment are as follows:

• Camera ISO is fixed at the minimal value (ISO 100 in our case) so as to eliminate interference from camera sensor noise.
• The camera’s output format is set to ‘RAW’, in order to eliminate interference from JPEG compression.
• The lens aperture is always fixed at the maximum in order to yield a larger ‘overall’ chromatic aberration estimate. For manufacturing precision reasons, both geometric and chromatic aberrations typically deteriorate at maximal aperture, and larger ‘overall’ CA readings are expected given that all these aberrations are intertwined with each other.
• All in-camera image processing options, such as Canon’s picture style1 and Lens Peripheral Illumination Correction, are turned off so as to eliminate contamination from in-camera image processing.

The six Canon DSLR lenses used in our experiment are listed in table (7.1).
1 Canon’s picture style includes in-camera processing of sharpness, contrast, saturation, and color tone.

The timing sequence of all 36 test pictures is listed in table (7.2), in sequential order. In the ‘Lens ID’ column of table (7.2), the number after the symbol @ denotes the focal length at which each zoom lens was set to work.

Table 7.1: Six Canon DSLR lenses.

Lens Model        Lens ID
EF 50mm f/1.8     L1, L2, L3
EF-S 18-55mm IS   L4, L5
EF 17-40mm        L6

Table 7.2: The sequential order of test pictures.

Lens ID    Picture   | Lens ID    Picture
L1         p1-3      | L4@55mm    p19-21
L2         p4-6      | L4@18mm    p22-24
L3         p7-9      | L5@18mm    p25-27
L1         p10-12    | L4@18mm    p28-30
L4@55mm    p13-15    | L6@17mm    p31-33
L5@55mm    p16-18    | L6@40mm    p34-36

The purpose of this shooting order is to prove that our scheme works even in the presence of camera movements incurred by the action of switching lenses. Take the first 12 pictures p1-12 for example: there were three cumulative lens switching actions before pictures p10, p11, and p12 were taken, and fewer switching actions before pictures p4-9 were taken. Therefore, if the CA patterns extracted from p10-12 show more similarity to the CA patterns from p1-3 than the CA patterns from pictures p4-9 do, it indicates that our method is robust to minor camera movements incurred by lens switching. The focal distance was fixed at the same value for all 36 test pictures.

7.2.4.2 CA Pattern Extraction

A Canon EOS 7D DSLR camera body was used in our experiment; it has an 18-megapixel CMOS image sensor. In order to reduce the computational complexity, we chose to shoot at a resolution of 2592 × 1728 instead of the native resolution of 5184 × 3456, i.e. 4.5 megapixels instead of the full 18 megapixels. The block size used during R-G and B-G channel-wise correlation is 48 × 48 pixels, giving two 2D vector CA pattern matrices of size 53 × 35 (pixels at the picture boundary are not used) for each test picture.
The standard correlation coefficient was used as the criterion in the search for the best channel-wise CA shift. Image interpolation is needed in order to extract the CA shift at sub-pixel precision. While 5× bicubic interpolation is enough to reveal CA patterns for the low-end Canon EF-S 18-55mm IS zoom lens, a higher interpolation rate is needed for the EF 50mm f/1.8 prime lens and the high-end EF 17-40mm zoom lens, which is within expectation. Therefore, 10× bicubic interpolation was used in our experiment. In the CA pattern extraction procedure, we were essentially brute-forcing the best channel-wise CA shift for every 48 × 48 block of a test picture. In general, the extraction algorithm we used is very similar to the algorithm described in the paper by Gloe and Borowka (40); the main difference is that ours is free from the hassle of corner detection, thanks to the use of the white noise pattern.

7.2.4.3 CA Pattern Comparison Results

The similarity between the CA patterns extracted from all 36 pictures is partly revealed by the four plots in figure (7.4) and figure (7.5). For each plot, the CA pattern from one specific picture is used as the reference pattern, and all 36 CA patterns (including the reference itself) are compared with this reference pattern. The total number of mismatching CA vectors is recorded. Thus, if a picture has only a small number of mismatch counts, its underlying CA pattern is very similar to the underlying CA pattern of the reference picture, which further indicates that the two pictures were likely taken by the same lens.
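The per-block brute-force search described above can be sketched as follows. This is a simplified integer-shift search on blocks assumed to be upsampled beforehand (the dissertation uses 10× bicubic interpolation for sub-pixel precision); the search-window size and the synthetic data are illustrative.

```python
import numpy as np

def best_shift(ref, mov, max_shift=3):
    """Brute-force the (dy, dx) shift of `mov` relative to `ref` that
    maximizes the standard correlation coefficient.

    `ref` and `mov` play the role of the G channel and the R (or B)
    channel of one block; a margin of `max_shift` pixels is cropped so
    that every candidate shift compares windows of equal size.
    """
    h, w = ref.shape
    best, best_r = (0, 0), -2.0
    a = ref[max_shift:h - max_shift, max_shift:w - max_shift]
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            b = mov[max_shift + dy:h - max_shift + dy,
                    max_shift + dx:w - max_shift + dx]
            r = np.corrcoef(a.ravel(), b.ravel())[0, 1]
            if r > best_r:
                best_r, best = r, (dy, dx)
    return best

# Sanity check on synthetic data: displace a noise block by (1, -2)
# and verify the search recovers that displacement.
rng = np.random.default_rng(1)
g = rng.standard_normal((48, 48))
r_chan = np.roll(g, shift=(1, -2), axis=(0, 1))
```

Running `best_shift(g, r_chan)` on this synthetic pair recovers the displacement `(1, -2)`; applied per block, the recovered shifts form the 2D vector CA pattern matrix.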
Figure 7.4: Mismatch plots (B-G and R-G mismatch counts of CA vectors versus test picture ID). Left: CA pattern of p31 used as reference. Right: CA pattern of p22 used as reference.

Figure 7.5: Mismatch plots (B-G and R-G mismatch counts of CA vectors versus test picture ID). Left: CA pattern of p13 used as reference. Right: CA pattern of p1 used as reference.

The left plot of figure (7.4) uses the CA pattern underlying p31 as the reference. All 36 CA patterns were compared with this reference pattern. As we can see, only the mismatch counts from p31, p32, and p33 are small, which echoes the fact that only those three test pictures were taken by lens L6@17mm. This plot demonstrates that our scheme works well both for distinguishing lenses of different models and for distinguishing a single lens working at different focal lengths.

Likewise, the right plot of figure (7.4) uses p22 as the reference. The mismatch counts for lenses L1, L2, L3, and L6 are very high. More importantly, the mismatch counts for all pictures taken by lens L4 are consistently lower than those for L5, even in the presence of one additional cumulative minor camera movement. This plot demonstrates that our scheme is able to distinguish different copies of the same lens working at the same focal length, in the presence of minor camera movements incurred by the action of switching lenses.
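The per-picture comparison behind these plots reduces to counting mismatching CA vectors between two extracted patterns. A minimal sketch follows; the exact vector-matching rule is not stated in the text, so exact equality (tolerance 0) is an assumption here.

```python
import numpy as np

def mismatch_count(pattern_a, pattern_b, tol=0):
    """Count CA vectors that differ between two extracted CA patterns.

    Each pattern is an (H, W, 2) array of per-block (dy, dx) CA shifts
    (53 x 35 blocks in the experiment). A vector pair counts as a
    mismatch when either component differs by more than `tol`.
    """
    diff = np.abs(pattern_a.astype(int) - pattern_b.astype(int))
    return int(np.sum(np.any(diff > tol, axis=-1)))

# Tiny example: two 4x4 patterns differing in two of sixteen vectors.
a = np.zeros((4, 4, 2), dtype=int)
b = a.copy()
b[0, 0] = (1, 0)
b[2, 3] = (0, 2)
```

A low `mismatch_count` against a reference pattern then indicates that the two pictures were likely taken by the same lens at the same settings.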
The left plot of figure (7.5) is very similar to the right plot of figure (7.4), only with the reference changed from p22 to p13, namely from a picture taken by L4@18mm to a picture taken by L4@55mm. In this plot, the two groups of pictures taken by L4@55mm are well separable from all other groups, including the three pictures taken by L5@55mm.

Perhaps the right plot of figure (7.5) provides the most convincing proof of all four plots, since distinguishing among EF 50mm f/1.8 prime lenses is more challenging than distinguishing among EF-S 18-55mm IS zoom lenses. This is because a prime lens typically has better control of chromatic aberration than a zoom lens, hence leaving smaller CA vectors to work with. As we can see from the plot, while the margin for distinguishing gets smaller, pictures p10, p11, and p12, which were taken by lens L1, are still well separable from the pictures taken by lenses L2 and L3, even after three cumulative lens switching actions.

7.2.5 Outlook

7.2.5.1 Improving the Identification Capability

There are three possible ways to improve the identification capability under the philosophy of ‘enriching’ the identification data set. Firstly, we can shoot test pictures at higher resolution (in our case, at the full 18-megapixel resolution). Secondly, we can use a higher interpolation rate during CA pattern extraction. Lastly, instead of using CA patterns at a fixed focal distance, we can compute several ‘slices’ of CA patterns at different focal distances and combine those ‘slices’ into a more complete identification data set.

7.2.5.2 The Potential of Bokeh Modulation

This discussion stems from an observation in our experiment. For the same lens, with all shooting settings kept the same, two test pictures p_in and p_out were taken in good focus and out of focus, respectively. We noticed that the CA pattern extracted from p_out looks like a ‘transformed’ version of the pattern extracted from p_in.
Figure 7.6: Bokeh pattern: the polygon shapes are due to the 8-blade aperture diaphragm being slightly closed.

From the perspective of photography, the term bokeh refers to the way a lens renders out-of-focus points of light. (58) The bokeh pattern depends on lens structures such as the aperture diaphragm shape and the aperture size, as well as on focal distance and focal length. Figure (7.6) is a real life picture showing bokeh effects.

The key point behind this phenomenon is the following: if the bokeh modulation could be modeled and calculated for each individual lens, there is hope of identifying the corresponding lens by checking certain types of real life pictures, namely those that contain well defined image regions either in good focus or out of focus. To be more specific, consider three types of image region containing edges or corner points: (a) in good focus; (b) inside focus; (c) outside focus. For an image region of type (a), the CA pattern extraction is the same as from test pictures of the white noise pattern image. For image regions of types (b) and (c), the extracted CA pattern is expected to be the ‘classical’ lens CA pattern subject to the corresponding ‘bokeh modulation’.

In summary, the ‘bokeh modulation’ might be deemed an extra dimension in the lens identification feature space, which has the potential to enable the identification of an individual lens from certain types of real life pictures. The desired real life picture should meet two requirements:

• It should contain well defined objects at discernible distances with respect to the focal plane, so that the modes of bokeh modulation can be retrieved from the ‘more complete’ lens identification feature space.1
• Those objects should contain regions with abundant edges or corner points, so as to facilitate CA pattern extraction.

1 This ‘more complete’ lens identification feature space has four dimensions: aperture size, focal length, focal distance, and the newly added bokeh modulation.
7.2.6 Conclusions

In a real life picture, different types of geometric distortion and different types of lens chromatic aberration are always intertwined with each other. For the application of lens identification, there is no need to isolate each of them; in fact, we believe it might be more efficient to capture them as a whole, as much as possible.

For the first time, we proposed using a white noise pattern image as the shooting target. The biggest advantage of a white noise pattern over the conventional checkerboard pattern is that it solves the pattern misalignment problem. In addition, it frees us from the complexity and the instability of corner detection.

Another contribution of this paper is determining that the lens focal distance plays an important role in shaping the lens CA pattern. With this finding in hand, we further demonstrated that if we fix the lens focal distance, we are able to obtain a CA pattern stable enough to fulfill the task of distinguishing different copies of the same lens.

For a complete lens identification feature space in terms of chromatic aberration, a CA pattern at fixed focal length, fixed focal distance, and fixed aperture size is only ‘one slice’ of the pattern sitting in the overall feature space. In this view, there is now a clear explanation for why previous lens forensic papers were unable to obtain CA patterns stable enough to distinguish different copies of the same lens: they were most likely dealing with different ‘slices’ of the CA pattern located at different positions along the focal distance axis of the overall feature space.

There are two interesting follow-up research topics. Firstly, the ‘granularity’ of the focal length and focal distance is an interesting topic for future research.
Lens identification within a large lens database might be feasible if we can fill the extra three dimensions of the lens CA identification feature space with enough ‘slices’ of CA pattern at different focal lengths, different focal distances, and different aperture sizes. Secondly, the ‘bokeh modulation’ might have the potential to enable lens identification from certain types of real life pictures; in other words, bokeh modulation might become the fourth dimension of a more complete lens identification feature space.

8 Benchmarking Camera IS by the Picture Histograms of a White Noise Pattern

8.1 Motivation

Image stabilization was first proposed in the 1970s, and Canon introduced the first image-stabilized zoom lens to the consumer market in 1995. Ever since then, image stabilization (IS hereafter) has proven very useful in reducing camera motion blur in long-exposure hand-held shooting, especially when shooting with a telephoto lens. Nowadays, IS has become a default functionality for most digital cameras. Since different camera manufacturers have developed their own proprietary IS techniques,1 it would be interesting to design an accurate and consistent evaluation scheme so that people can compare them.

8.1.1 The Volume of the Test Picture Space

Due to the statistical nature of human hand motion when holding a camera, a good IS benchmark can only be generated from a good amount of test pictures. Unfortunately, since most of today’s camera IS reviews still require human intervention, especially human visual assessment, for practical reasons the number of test pictures under each set of shooting settings has to be limited to a very small amount (usually ∼ 10^1).
(59, 60)

1 Canon uses IS (Image Stabilization), Nikon uses VR (Vibration Reduction), Sony uses SSS (Super Steady Shot), Panasonic and Leica use MegaOIS, Sigma uses OS (Optical Stabilization), Tamron uses VC (Vibration Compensation), and Pentax uses SR (Shake Reduction).

As a consequence, it is unlikely that firm conclusions can be drawn from the rough benchmarks generated from such a small test picture space.

8.1.2 The Consistency of an IS Metric

Another big issue associated with human visual assessment lies in the blur thresholding process: even for an expert, the threshold between a ‘soft’ test picture and a ‘very soft’ test picture is highly likely to be inconsistent from one review to another, which can lead to biased results when performing IS comparison across different camera models. On the other hand, since there is resolution variation and contrast variation in cross comparisons among different camera models, many generic image blur metrics should be used with caution, as they may not be easily scalable with respect to resolution and contrast variations.

8.1.3 Our Goal

The most straightforward way to increase the evaluation speed is to develop an automatic scheme without human intervention. Additionally, a well defined IS metric is needed to enable unbiased comparison across different camera models. With the capability of evaluating a large number of test pictures within a short time, we can obtain test readings at much finer granularity. Furthermore, with an accurate and consistent blur metric, we can have more bins of different degrees of blur than, for example, only four bins such as ‘Sharp’, ‘Mild Blur’, ‘Heavy Blur’, and ‘Very Heavy Blur’.
(59, 60) In short, our goal is to generate an IS benchmark that is much more informative than current ones, which either require human intervention or use generic blur metrics that may not be consistent across different camera models.

8.2 Our Automatic IS Benchmarks

8.2.1 A Histogram-based IS Metric

Consider a black letter ‘T’ on white paper as the shooting target. Camera-motion-incurred blurring manifests as pixel value blending around the boundary of the letter ‘T’, i.e. the blurred pixels have values between ‘pure black’ and ‘pure white’. More importantly, there is a clear correlation between the count of these blurred pixels and the degree of camera motion: a larger camera motion is expected to create more ‘blurred pixels’. Furthermore, since both resolution (pixel count) and contrast translate directly into the test picture histogram, this ‘blurred pixel count’ is easily scalable with respect to resolution and contrast variations, which is an important feature for our camera IS cross comparison application. For example, consider two cameras C1 and C2 with resolutions 100 × 100 and 200 × 200 respectively, and with image sensors of the same size. For the same amount of camera motion, there would be four times as many blurred pixels in test pictures taken by C2 as in those taken by C1. Lastly, given the same amount of camera motion, by checking the remaining camera motion after an IS system has acted, we can tell how effective that IS system is. Therefore, it is reasonable to use the count of these ‘blurred pixels’ to build a metric indicating the effectiveness of a camera IS system.

It is worth pointing out that our model considers the motion range, but not the motion timing, as the indicator of motion blur. For a generic motion measure, both the motion range and the motion timing should be considered.
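The resolution-scaling argument above suggests that raw blurred-pixel counts can be made comparable across cameras by normalizing by pixel count. A small sketch follows; the helper function and the choice of a reference resolution are our own, not part of the dissertation's scheme.

```python
def normalized_blur_count(blurred_count, width, height, ref_width=100, ref_height=100):
    """Scale a raw 'blurred pixel' count to a common reference resolution.

    For the same physical camera motion and the same sensor size, the
    number of blurred pixels grows in proportion to the total pixel
    count, so dividing by (width * height) makes counts comparable
    across camera models of different resolution.
    """
    return blurred_count * (ref_width * ref_height) / (width * height)

# The C1/C2 example from the text: C2 has 4x the pixels and therefore
# 4x the blurred pixels, yet both yield the same normalized score.
c1 = normalized_blur_count(250, 100, 100)
c2 = normalized_blur_count(1000, 200, 200)
```

Here `c1` and `c2` come out equal, reflecting that the two cameras experienced the same physical motion.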
However, since we are working with the short-time (shutter-time) human hand-held motion, which has its own pattern, we regard the motion range as a more dominant factor than the motion timing in our camera motion blur evaluation process.

8.2.2 A Histogram-Invariant Shooting Target

To simplify the subsequent histogram analysis, a shooting target whose histogram is invariant in the presence of camera shift would be desirable. The term histogram-invariant here means that this shooting target always guarantees a 1 : 1 black-white area ratio no matter which part of the target a camera is shooting at. Also note that we use the term camera shift here to differentiate from camera motion: camera shift refers to the position change of a hand-held camera from shot to shot, while camera motion refers to the movement of a camera during a single shot. The commonly used checkerboard pattern image is not histogram-invariant in the presence of camera shift, no matter how small each black or white block is. For a checkerboard pattern, the black-white area ratio depends on the actual image patch being framed. In other words, the black-white ratio would fluctuate as a hand-held camera changes position from shot to shot. We discovered that a white noise pattern image with fine enough black and white units meets this constant black-white area ratio requirement no matter which part of the target image is being framed.

8.2.3 Our Algorithm

With the two crucial building blocks ready, we designed an evaluation procedure as follows:

1. Under each set of shooting settings, take one motion-free test picture of the white noise target (possibly with a tripod), and use its histogram for future reference.

2. Under each set of shooting settings, take a sufficient number of test pictures of the white noise target with camera motion coming into play.

3.
Subtract the corresponding reference histogram obtained in step 1 from the histogram of each test picture taken in step 2.

4. For each histogram obtained in step 3, count the number of ‘blurred pixels’ falling in the middle range of the histogram, and use this count as the IS metric.

8.2.3.1 The Rationale behind the Histogram Subtraction Operation

Although our white noise pattern image consists only of ‘pure’ black and white, due to the imperfections in any optical imaging system as well as variation in the environment lighting, even the histogram of a motion-free test picture would not consist of only two spikes. Therefore, instead of counting ‘histogram middle range pixels’ directly in each test picture taken in step 2, we need to perform histogram subtraction beforehand in order to make sure that we are counting the real ‘blurred pixels’ caused by camera motion. For each test picture taken in step 2, which has various degrees of motion blur, we can imagine one corresponding motion-free reference picture ‘behind’ it, so that the motion blur benchmark depends on the comparison between this real test picture and its presumed motion-free reference picture. Thanks to the histogram-invariant feature of our white noise pattern target, all those presumed motion-free reference pictures would have the same histogram as the reference picture we took in step 1. Therefore, the histogram comparison between each test picture and its respective presumed reference picture is equivalent to the histogram comparison between each test picture and the reference picture we took in step 1. By subtracting off the reference histogram (not the reference picture) we obtained in step 1, we make sure that only the real ‘blurred pixels’ are counted toward the final IS metric.
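The four steps above can be sketched in a few lines of code. The following is a minimal illustrative sketch, not the code used in this work: the mid-range thresholds, the synthetic pixel data, and the choice to count only the positive surplus after subtraction are assumptions made for the example; dividing by the pixel count (last line) is one way to make the metric scalable across resolutions, as argued in Section 8.2.1.

```python
import random

def histogram(pixels, bins=256):
    """Steps 1-2: the first-order statistic our metric is built on."""
    h = [0] * bins
    for p in pixels:
        h[p] += 1
    return h

def is_metric(test_pixels, ref_hist, lo=32, hi=224):
    """Steps 3-4: subtract the reference histogram, then count the surplus
    'blurred pixels' in the histogram mid-range (lo/hi are illustrative)."""
    diff = [t - r for t, r in zip(histogram(test_pixels), ref_hist)]
    return sum(d for d in diff[lo:hi] if d > 0)

# Hypothetical data: a near-ideal motion-free reference shot of the white
# noise target, and a test shot where motion blended 1000 pixels to mid-gray.
random.seed(0)
ref = [random.choice((0, 255)) for _ in range(10000)]
blurred = ref[:9000] + [128] * 1000

count = is_metric(blurred, histogram(ref))
print(count)                 # → 1000 surplus 'blurred pixels'
print(count / len(blurred))  # per-pixel fraction, comparable across resolutions
```

A motion-free test shot yields a metric of zero, since its histogram matches the reference histogram.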
When compared to other evaluation methods that perform subtraction directly on a blur metric, such as in (60), histogram subtraction is more meaningful: according to the Data Processing Inequality in information theory (61), it preserves more of the valuable information that is helpful in generating the final blur metric. In other words, direct subtraction of generic blur metrics may not carry a tractable physical meaning and hence may not produce a useful camera motion blur metric.

8.3 Experiment

8.3.1 Synthetic Motion Blur

Convolution with a motion blur kernel is one way to model linear motion blur (62). We first ran our algorithm on synthetically blurred images generated by convolution. Figure (8.1) illustrates how the motion range affects the histogram of the synthetically generated images. Note that the motion range corresponds to the number of non-zero coefficients in a motion blur kernel, and it is different from the motion length, which denotes the motion translation in pixels. The motion blur kernel is a 2D matrix in general, and it degenerates to a vector for the horizontal and vertical cases. For instance, for the horizontal and vertical cases, the number of non-zero kernel coefficients equals the motion length; for our 45° diagonal case, the number of non-zero kernel coefficients is 7, 17, and 25 when the motion length is 3, 7, and 11 respectively.1 In figure (8.1), the first column shows the histograms of the original motion-free white noise pattern image. The remaining three columns are histograms of the images that were subjected to motion blur kernels with motion length set to 3, 7, and 11 respectively.

Figure 8.1: Different motion ranges and their resulting histograms. [Histogram panels (pixel count versus pixel value) for the motion-free image and for motionLen = 3, 7, 11, along the horizontal, vertical, and diagonal directions.]

The histograms in figure (8.1) are also organized in three rows, with motion blur along the horizontal, vertical, and 45° diagonal directions respectively. As we can see from figure (8.1), a larger motion range results in more ‘blurred pixels’ in the middle range of the histogram, or equivalently, fewer pure white and black pixels at the two ends of the histogram. Additionally, the number of ‘blurred pixels’ is unaffected by the motion orientation as long as the number of non-zero kernel coefficients is kept the same, which is desirable for our application.

Figure 8.2: Image crops: the original image and three synthetically blurred images with different motion ranges.

Figure (8.2) shows crops of the original motion-free white noise pattern image and three synthetically blurred images.

1 We use the fspecial(‘motion’,motionLen,angle) function in Matlab to generate those motion blur kernels.
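As a sanity check on the claim that a larger motion range leaves more mid-range pixels, the following toy sketch blurs a random black-and-white row with horizontal motion kernels of increasing length. It is a one-dimensional, pure-Python stand-in for the Matlab fspecial-based experiment, with all data synthetic and illustrative.

```python
import random

def motion_blur_1d(row, length):
    """Convolve a row with a horizontal motion kernel of `length` equal
    taps (a 1-D stand-in for fspecial('motion', length, 0))."""
    return [sum(row[i:i + length]) / length
            for i in range(len(row) - length + 1)]

def blurred_count(vals):
    """Count pixels that are neither pure black nor pure white."""
    return sum(1 for v in vals if 0 < v < 255)

random.seed(1)
row = [random.choice((0, 255)) for _ in range(5000)]  # white noise row

# A longer motion range should leave more mid-range 'blurred pixels'.
counts = [blurred_count(motion_blur_1d(row, L)) for L in (3, 7, 11)]
print(counts)  # monotonically increasing
```

On the unblurred row the count is zero; after blurring, only windows whose pixels are all black or all white survive at the histogram extremes, so the count grows with the kernel length.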
The three blurred ones correspond to the histograms located at positions (1,2), (2,3), and (3,4) in figure (8.1) respectively.

8.3.2 Real Camera Motion Blur

Following our success on synthetic images, we turned to real camera motion blur. In our experiment, with all shooting settings kept the same, we took a group of 11 test pictures. The third test picture was chosen as the reference picture, since it is the sharpest of the 11. The histogram of this reference picture as well as its center crop are shown in figure (8.3). Note that this histogram is quite different from the ideal case shown in the first column of figure (8.1), which contains only two spikes at the two extreme ends.

Figure 8.3: The reference picture. Left: its histogram; right: the center crop.

Figure (8.4) shows the center crops of four test pictures, and figure (8.5) plots our motion blur metric for all test pictures.

Figure 8.4: From left to right: center crops of pictures 2, 9, 5, and 10.

As we can see, our motion blur metric in figure (8.5) matches very well with the perceived extent of motion blur in figure (8.4). Also note that there is relative camera shift between all test pictures.

Figure 8.5: Our motion blur metric (blurred pixel count versus test picture index).

As to the running speed of our batch code, in practice, loading the test pictures into memory probably consumes more time than the rest of the processing.

8.4 Outlook

In practice, image blur is an overall effect with various origins. For example, camera lens defocus is another important contributor to the overall image blur.
Therefore, in order to build a refined blur metric that reflects only camera motion, removing the blur caused by defocusing would be an important pre-processing step.1 In addition, our scheme has room for the possible incorporation of a human visual model specially designed for camera motion blur, i.e. the relationship between the ‘blurred pixel count’ and the perceived degree of motion blur might be nonlinear. In that case, a weighted blurred pixel count can be used as the final IS benchmark, with the weights based on a suitable Human Visual System model. To close the whole working cycle, with the human hand-held motion pattern in mind, we could even assemble a three-dimensional shaking simulator using motors from a fruit blender or other appliances, so as to free reviewers from taking enormous numbers of test pictures by hand.

1 Thanks to Professor Fridrich for pointing this out.

8.5 Conclusion

Since our scheme resorts only to the image histogram, a first-order statistic, it runs fast. Besides the speed advantage, this histogram-based metric is readily scalable in terms of resolution and contrast variation when it comes to cross comparison among different camera models. Compared to the conventional approach requiring human intervention, our method is capable of producing benchmarks at a much finer granularity owing to its capability of handling a much larger volume of test pictures. Furthermore, we rely on numerical threshold values rather than the discretion of human eyes, which makes a fair comparison across different camera models feasible. Finally, pre-processing such as removing defocus-incurred blur is expected to produce better results.

9 Conclusions

9.1 Reversing Digital Watermarking

9.1.1 Security and Robustness

In the early days of information hiding, people summarized three key components in its problem space, i.e. imperceptibility, capacity, and robustness.
(63) Some believe that more robustness implies more space to reinforce the security of an information hiding scheme. In this dissertation, we proved that robustness and security should be treated as two independent components of the watermarking problem space by introducing the concept of super-robustness, which refers to the property that a watermark can survive certain types of attack far beyond what one would expect. Rather than a blessing, super-robustness is in fact a security flaw due to its implicit high selectivity. The power of this technique is demonstrated by our success in the first BOWS contest. While a human expert’s knowledge is required to identify modes of super-robustness, we further developed an automatic algorithm specifically for watermark detectors using normalized correlation. Several large false-positive vectors, which we call noise snakes, are automatically grown along the boundary of the detection region, a hyper-cone in this case, and can be used to disclose useful information such as the cone angle and the dimensionality of the watermark feature space. The working assumption here is that we already know the transform and the feature space used by the target watermarking scheme. If the feature dimensionality grows too high (> 10^4), a more precise model is needed in order to achieve accurate reversing results. We then explored the mathematical model describing a hyper-cone intersecting a hyper-sphere, and we successfully reduced this n-D problem to a pseudo 4-D problem, which depends on the solution of a quartic equation. However, due to the complex nature of the model, neither analytical nor numerical solutions of this quartic equation are obtainable, even with a computer. Luckily, we observed the pattern of the arc angle curve α(h) with respect to the noise amplitude R when the watermark detection rate crosses the critical point (50%).
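For concreteness, the cone-shaped detection region discussed above can be illustrated with a toy normalized-correlation detector. This is a generic sketch, not the BOWS detector; the threshold value, the dimensionality, and the random watermark vector are all arbitrary assumptions for the example.

```python
import math
import random

def normalized_correlation(x, w):
    """NC ignores vector magnitudes, so the region {x : NC(x, w) > tau}
    is a hyper-cone of half-angle arccos(tau) around w."""
    dot = sum(a * b for a, b in zip(x, w))
    return dot / (math.sqrt(sum(a * a for a in x)) *
                  math.sqrt(sum(b * b for b in w)))

tau = 0.9                                    # arbitrary detection threshold
random.seed(2)
w = [random.gauss(0, 1) for _ in range(64)]  # toy watermark vector

# Any scaled copy of w stays inside the cone: detection is independent of
# amplitude, which is what noise snakes exploit along the cone boundary.
assert normalized_correlation([10 * wi for wi in w], w) > tau
print(math.degrees(math.acos(tau)))          # cone half-angle ≈ 25.8 degrees
```

The unbounded extent of this cone is exactly the property that lets large false-positive vectors be grown along its surface.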
In general, our mathematical analysis indicates a much harder life for attackers working in a watermark feature space of very large dimensionality.

9.1.2 Reversing before Attacking

Another important factor contributing to our success in the BOWS contest is our attacking strategy: instead of concentrating on attacking, we first learn the target watermarking system, especially its detection algorithm, by reverse engineering. The knowledge gained by identifying the modes of super-robustness proved very helpful in designing more guided attacks. In order to make our reversing work more systematic, a generic framework for modelling watermark detection systems is introduced, which consists of transform, feature space, and detector. Note that this model also applies to many non-watermarking systems that can essentially be modeled as a classifier, which implies that some of the reversing experience we gained on this model might be readily transferable to other non-watermarking problems.

9.1.2.1 Caution on Designing Patterns

In our BOWS contest experience, human expertise was found to be very useful in the sense that it helped us quickly narrow down possible candidates with high confidence. The reason behind this is that watermarking designers tend to follow common design paths, which leave obvious patterns that let attackers confirm their attacking direction through semantic hints. For example, watermarking researchers as a whole tend to use common transforms, common feature spaces, and common detectors. In particular, for the case of block DCT embedding, people tend to choose a subband that forms a regular geometric shape, which gives a strong hint that the attackers are on the right track. This observation is backed by our literature survey.

From the perspective of a watermark designer, transforms like (19) might be preferred over those written in textbooks.
For the feature space, more randomness and protection should be introduced when selecting the watermark feature space. As to detectors using normalized correlation, for example, the detection cone should at least be truncated in order to prevent information leakage through noise snakes. Some of these design concerns were implemented in the second BOWS contest (23).

9.2 Circumventing Detection in Steganography

For the task of evading detection, beyond the conventional methodology of mimicking natural noise or preserving the statistics of the cover media, we explore an emerging new philosophy, which works by first creating a stego-friendly channel. Whereas the subset selection technique described in Chapter 5 is a theoretical effort in this direction, we implement this idea in practice by introducing the concept of cultural engineering. The term cultural engineering is derived from the term social engineering: it means tricking people into doing certain things, which they would not otherwise be likely to do, by first creating or promoting a related culture that influences them to do such things. As a real-life implementation, the random animation backdrop in an iChat video-conference session can be driven by our cipher message without raising suspicion, since it looks culturally natural. In short, while the philosophy of conventional approaches is to make stego data look statistically natural, our goal is to make stego data look culturally natural. This idea was later extended to audio by another student in my research group, in the form of an iPhone walkie-talkie application (64).

9.3 Camera related Image Processing

In chapter 7, we refined the categorization of camera identification features as compared to camera forensic survey papers (65, 66, 67). Additionally, we noticed that the different in-camera shooting modes were a blind spot in previous research on statistical identification features.
9.3.1 White Noise Pattern Image

One of the major contributions in Chapter 7 and Chapter 8 is the introduction of a new shooting target: we use a white noise pattern to supersede the commonly used checkerboard pattern image. For the lens identification scenario, the white noise pattern excels in its good position registration capability, so that the extracted CA pattern is perfectly aligned from one test picture to another in the presence of camera movement. For the image stabilization evaluation scenario, this pattern excels in its histogram-invariant feature, so that our motion blur metric can be derived directly from the histograms of the test pictures.

9.3.2 Complete Lens Identification Feature Space

Focal distance is another major contribution we made in Chapter 7. Among all user-controllable parameters, focal length and aperture size had been found to be important factors in shaping the lens CA pattern. Adding focal distance completes the lens identification feature space and enables us to distinguish different copies of the same lens. Due to its complex mechanical design, the identification feature space for a lens is expected to be larger than that of an image sensor.

9.3.2.1 Outlook for the Bokeh Modulation

If the modes of the bokeh modulation can be modeled and calculated for each individual lens, there is hope of identifying the corresponding lens from certain types of real-life pictures. In other words, the bokeh modulation might be the fourth dimension of a more complete lens identification feature space.

9.3.3 Motion Blur Measure from Histogram

Thanks to the histogram-invariant feature of our white noise pattern image, we can build a motion blur metric directly from the histograms of test pictures, which is very fast.
When compared with current image stabilization evaluation methods, which either require human visual assessment or use some generic blur metric, this consistent blur metric enables a fair comparison across image stabilization systems from different camera manufacturers. In addition, according to the Data Processing Inequality in information theory (61), the histogram subtraction operation in our algorithm is better than subtraction between generic blur metrics as in (60). Finally, removing image blur caused by lens defocus would further improve the metric accuracy.

References

[13] M. L. Miller, G. J. Doerr, and I. J. Cox. Applying Informed Coding and Embedding to Design a Robust, High capacity Watermark. IEEE Trans. on Image Processing, 13:792–807, June 2004. 11, 13, 15, 32, 33

[14] A. Westfeld. Lessons from the BOWS Contest. In MM&Sec ’06: Proc. 8th Workshop on Multimedia and Security, pages 208–213, 2006. 11

[15] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon. Secure Spread Spectrum Watermarking for Multimedia. IEEE Trans. on Image Processing, 6:1673–1687, 1997. 13

[1] Wikibooks. Tcl Programming. http://en.wikibooks.org/wiki/Tcl_Programming/Print_version, May 2009. xiv, 58

[2] Elliot J. Chikofsky and James H. Cross II. Reverse engineering and design recovery: a taxonomy. IEEE Software Magazine, 7(1):13–17, Jan 1990. 5

[16] S. Craver and J. Yu. Reverse-engineering a detector with false alarms. In Proc. SPIE, Security, Steganography, and Watermarking of Multimedia Contents IX, 2007. 17, 33, 34

[17] Scott Craver, Idris Atakli, and Jun Yu. Reverse-engineering a watermark detector using an oracle. EURASIP J. Inf. Secur., 2007:5:1–5:16, January 2007. 17, 33, 34

[3] Eldad Eilam. Reversing: Secrets of Reverse Engineering, pages xxiii–23. Wiley Publishing Inc, 2005. 5, 7

[18] Chun-Shien Lu, Shih-Kun Huang, Chwen-Jye Sze, and Hong-Yuan Mark Liao. Cocktail Watermarking for Digital Image Protection. IEEE Trans.
on Multimedia, 2:209–224, 2000. 18

[4] Jean-Paul M. G. Linnartz and Marten van Dijk. Analysis of the Sensitivity Attack against Electronic Watermarks in Images. Lecture Notes in Computer Science, 1525:258–272, 1998. 6, 20

[19] J. Fridrich, A. C. Baldoza, and R. J. Simard. Robust Digital Watermarking Based on Key-Dependent Basis Functions. Lecture Notes in Computer Science, 1525:143–157, 1998. 19, 93

[5] P. Comesana, L. Perez-Freire, and F. Perez-Gonzalez. Blind Newton sensitivity attack. IEE Proceedings on Information Security, 153:115–125, Sept. 2006. 6, 20

[20] I. Atakli, S. Craver, and J. Yu. How we Broke the BOWS Watermark. In Proc. SPIE, Security, Steganography, and Watermarking of Multimedia Contents IX, 2007. 20, 21, 33, 41

[6] Fabien A. P. Petitcolas, Ross J. Anderson, and Markus G. Kuhn. Attacks on Copyright Marking Systems. In Proceedings of the Second International Workshop on Information Hiding, pages 218–238, London, UK, 1998. Springer-Verlag. 6

[21] I. J. Cox, M. L. Miller, and J. A. Bloom. Digital Watermarking. Morgan Kaufmann Publishers, Inc., San Francisco, CA, USA, 2002. 23

[7] Ingemar Cox, Matthew Miller, Jeffrey Bloom, Jessica Fridrich, and Ton Kalker. Digital Watermarking and Steganography, pages 335–374. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2nd edition, 2007. 6

[22] Eric W. Weisstein. Hypersphere. From MathWorld, A Wolfram Web Resource. http://www.mathworld.wolfram.com/Hypersphere.html. 27

[8] Secure Digital Music Initiative. SDMI Public Challenge. http://www.hacksdmi.org, Sept. 2000. 6, 9

[23] Teddy Furon and Patrick Bas. Broken arrows. EURASIP J. Inf. Secur., 2008:1–13, 2008. 33, 93

[9] Watermark Virtual Laboratory (Wavila) of the ECRYPT. The Break Our Watermarking System (BOWS) Contest. http://lci.det.unifi.it/BOWS/, 2006. 8, 11, 33

[24] Christian Cachin. An information-theoretic model for steganography. Inf. Comput., 192:41–56, July 2004. 45

[10] S. Craver, M. Wu, B. Liu, A. Stubblefield, B. Swartzlander, D. S. Dean, and E.
Felten. Reading Between the Lines: Lessons Learned from the SDMI Challenge. In Proc. of the 10th Usenix Security Symposium, pages 353–363, August 2001. 9, 10, 20

[25] Andrew D. Ker, Tomáš Pevný, Jan Kodovský, and Jessica Fridrich. The square root law of steganographic capacity. In Proceedings of the 10th ACM workshop on Multimedia and security, MMSec ’08, pages 107–116, New York, NY, USA, 2008. ACM. 45

[11] J. Boeuf and J. P. Stern. An Analysis of One of the SDMI Candidates. Lecture Notes in Computer Science, 2137:395–??, 2001. 9

[26] Tomáš Filler, Andrew D. Ker, and Jessica J. Fridrich. The square root law of steganographic capacity for Markov covers. In Storage and Retrieval for Image and Video Databases, 2009. 45, 46

[12] R. Petrovic, B. Tehranchi, and J. M. Winograd. Digital Watermark Security Considerations. In MM&Sec ’06: Proc. 8th Workshop on Multimedia and Security, pages 152–157, 2006. 9

[27] Andrew D. Ker. Estimating the Information Theoretic Optimal Stego Noise. In IWDW’09, pages 184–198, 2009. 45, 46

[28] Ying Wang and Pierre Moulin. Perfectly Secure Steganography: Capacity, Error Exponents, and Code Constructions. Computing Research Repository, abs/cs/070, 2007. 50

[41] Jan Lukáš, Jessica J. Fridrich, and Miroslav Goljan. Digital camera identification from sensor pattern noise. IEEE Transactions on Information Forensics and Security, pages 205–214, 2006. 66, 67

[29] Scott Craver. On Public-Key Steganography in the Presence of an Active Warden. Lecture Notes in Computer Science, 1525:355–368, 1998. 51, 52

[42] Mo Chen, Jessica Fridrich, and Miroslav Goljan. Digital imaging sensor identification (further study). In Security, Steganography, and Watermarking of Multimedia Contents IX, edited by Edward J. Delp III and Ping Wah Wong, Proceedings of the SPIE, 6505, 2007. 66, 67, 73

[30] Peter Wayner. Disappearing Cryptography: Information Hiding: Steganography & Watermarking, 2nd Edition. Morgan Kaufmann, 2002. 52

[43] J. Fridrich and M.
Goljan. Determining approximate age of digital images using sensor defects. In Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, vol. 7880, February 2011. 66

[31] Ross J. Anderson and Fabien A. P. Petitcolas. On the Limits of Steganography. IEEE Journal of Selected Areas in Communications, 16(4):474–481, 1998. Special issue on copyright & privacy protection. 53

[44] Alin C. Popescu and Hany Farid. Exposing digital forgeries in color filter array interpolated images. IEEE Transactions on Signal Processing, 53:3948–3959, 2005. 66

[32] John Giffin, Rachel Greenstadt, Peter Litwack, and Richard Tibbetts. Covert messaging through TCP timestamps. In Proceedings of the 2nd international conference on Privacy enhancing technologies, PET’02, pages 194–208, Berlin, Heidelberg, 2003. Springer-Verlag. 55

[45] Sevinc Bayram, Husrev T. Sencar, and Nasir Memon. Improvements on source camera-model identification based on CFA interpolation. In Proceedings of WG 11.9 International Conference on Digital Forensics, 2006. 66

[33] Daisuke Inoue and Tsutomu Matsumoto. Scheme of standard MIDI files steganography and its evaluation. In SPIE Security and Watermarking of Multimedia Contents IV, 4675, pages 194–205, 2002. 55

[46] Weihong Wang and Hany Farid. Exposing digital forgeries in video by detecting double quantization. In Proceedings of the 11th ACM workshop on Multimedia and security, MM&Sec ’09, pages 39–48, New York, NY, USA, 2009. ACM. 66

[34] Anthony Steven Szopa. New Steganography Innovation from Ciphile Software. talk.politics.crypto, August 2001. 3B894FCE.FC4B729E@ciphile.com. 56

[35] NetApplications.com. Operating System Market Share for January, 2008. http://marketshare.hitslink.com/report.aspx?qprid=8. 56

[47] Eric Kee, Micah K. Johnson, and Hany Farid. Display-aware Image Editing. IEEE Transactions on Information Forensics and Security, 2011. 66

[48] Micah K.
Johnson and Hany Farid. Exposing digital forgeries by detecting inconsistencies in lighting. In Proceedings of the 7th workshop on Multimedia and security, MM&Sec ’05, pages 1–10, New York, NY, USA, 2005. ACM. 66

[36] Kevin Mitnick and William Simon. The Art of Deception: Controlling the Human Element of Security. John Wiley & Sons, 2002. 57

[37] Jenny Leung, Jozsef Dudas, Glenn H. Chapman, Israel Koren, and Zahava Koren. Quantitative Analysis of In-Field Defects in Image Sensor Arrays. In Defect and Fault-Tolerance in VLSI Systems, IEEE International Symposium on, pages 526–534, 2007. 66

[49] Mehdi Kharrazi, Husrev T. Sencar, and Nasir D. Memon. Blind source camera identification. In ICIP’04, pages 709–712, 2004. 66

[50] Alin C. Popescu and Hany Farid. Statistical Tools for Digital Forensics. In 6th International Workshop on Information Hiding, pages 128–147. Springer-Verlag, Berlin-Heidelberg, 2004. 66

[38] K. S. Choi, E. Y. Lam, and K. K. Y. Wong. Source camera identification using footprints from lens aberration. In N. Sampat, J. M. DiCarlo, and R. A. Martin, editors, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, vol. 6069, pages 172–179, February 2006. 66, 68, 71

[51] Y.-F. Hsu and S.-F. Chang. Image Splicing Detection Using Camera Response Function Consistency and Automatic Segmentation. In International Conference on Multimedia and Expo, 2007. 66

[39] Micah K. Johnson and Hany Farid. Exposing digital forgeries through chromatic aberration. In MMSec ’06: Proceedings of the 8th workshop on Multimedia and security, pages 48–55, New York, NY, USA, 2006. ACM. 66, 68

[52] Flickr.com. Most Popular Cameras in the Flickr Community. http://www.flickr.com/cameras, 2010. 67

[53] John Mallon and Paul F. Whelan. Calibration and removal of lateral chromatic aberration in images. Pattern Recogn. Lett., 28(1):125–135, 2007. 69

[40] Thomas Gloe, Karsten Borowka, and Antje Winkler.
Efficient estimation and large-scale evaluation of lateral chromatic aberration for digital image forensics. In Nasir D. Memon, Jana Dittmann, Adnan M. Alattar, and Edward J. Delp III, editors, Media Forensics and Security II, 7541, page 754107. SPIE, 2010. 66, 68, 69, 71, 76

[54] Sing Bing Kang. Automatic Removal of Chromatic Aberration from a Single Image. In Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, pages 1–8. IEEE Computer Society, 2007. 69

[55] Hiroshi Shimamoto and Takayuki Yamashita. Automatic Removal of Chromatic Aberration from a Single Image. In Morley M. Blouke, editor, Sensors, Cameras, and Systems for Scientific/Industrial Applications VII, 6068. SPIE, 2006. 69

[62] Sebastian Schuon and Klaus Diepold. Comparison of Motion Deblur Algorithms and Real World Deployment. Acta Astronautica, 64, 2009. 86

[63] W. Bender, D. Gruhl, N. Morimoto, and Aiguo Lu. Techniques for data hiding. IBM Syst. J., 35:313–336, September 1996. 91

[56] Stephen M. Smith and J. Michael Brady. SUSAN: A New Approach to Low Level Image Processing. International Journal of Computer Vision, 23:45–78, 1997. doi:10.1023/A:1007963824710. 69

[64] Enping Li and Scott Craver. A supraliminal channel in a wireless phone application. In Proceedings of the 11th ACM workshop on Multimedia and security, MM&Sec ’09, pages 151–154, New York, NY, USA, 2009. ACM. 93

[57] X. C. He and N. H. C. Yung. Curvature Scale Space Corner Detector with Adaptive Threshold and Dynamic Region of Support. Pattern Recognition, International Conference on, 2:791–794, 2004. 69

[65] Tian-Tsong Ng, Shih-Fu Chang, Ching-Yung Lin, and Qibin Sun. Passive-blind Image Forensics. In Multimedia Security Technologies for Digital Rights, 2006. 93

[58] Wikipedia.com. Bokeh. http://en.wikipedia.org/wiki/Bokeh, August 2011. 79

[59] dpreview.com. Digital Photography Review. http://www.dpreview.com/, 2011. 82, 83

[66] Tran Van Lanh, Kai-Sen Chong, Sabu Emmanuel, and Mohan S. Kankanhalli.
A Survey on Digital Camera Image Forensic Methods. In ICME’07, pages 16–19, 2007. 93

[60] SLRgear.com. SLRgear Image Stabilization Testing, Version 2. http://www.slrgear.com/articles/ISWP2/ismethods_v2.html, 2011. 82, 83, 86, 95

[67] Nitin Khanna, Aravind K. Mikkilineni, Anthony F. Martone, Gazi N. Ali, George T.-C. Chiu, Jan P. Allebach, and Edward J. Delp. A survey of forensic characterization methods for physical devices. Digital Investigation, 3(Supplement 1):17–28, 2006. The Proceedings of the 6th Annual Digital Forensic Research Workshop (DFRWS ’06). 93

[61] Thomas M. Cover and Joy Thomas. Elements of Information Theory, chapter 2, page 32. Wiley, 1991. 86, 95