ICDFI 2013 Keynote Speech 1: Quantifying Likelihood in Digital Forensic Investigations

Dr Richard Overill
Department of Informatics, King’s College London
richard.overill@kcl.ac.uk

Synopsis
• Introduction & Background
• Quantitative Tools for Digital Forensics
  – Probability Theory
  – Bayesian Networks
  – Complexity Theory
  – Information Theory
• How can these tools benefit us?
• Summary & Conclusions

Introduction & Background
• Conventional (‘wet’) forensic scientists commonly quantify the outcomes of their investigations, for example:
  – There is a one in a million chance that two identical fingerprints were not produced by the same individual
  – There is a one in a billion chance that two identical DNA samples do not originate from the same individual
• Digital forensic investigators generally don’t do this. Why?

Quantitative Tools for Digital Investigations - I
• Probability Theory – conventional forensic scientists commonly use it
• Example:
  – Potential cosmic ray damage to CMOS and Flash RAM. In the mid-1990s IBM found that a high-energy secondary cosmic ray strike could flip about one bit of CMOS RAM per month. Modern Flash memory is much more susceptible and much more densely packed, so the bit-flip rate is now of the order of one per minute. This has clear implications for digital forensic investigations.
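The bit-flip figures above can be turned into case-relevant probabilities with elementary probability theory. A minimal sketch, assuming bit flips arrive as a Poisson process and using the slide's order-of-magnitude rates (one flip per month for 1990s CMOS, one per minute for modern dense Flash) purely as illustrative inputs:

```python
import math

def p_at_least_one_flip(rate_per_minute: float, minutes: float) -> float:
    """Probability of at least one bit flip in a time window, assuming
    flips arrive as a Poisson process with the given mean rate."""
    return 1.0 - math.exp(-rate_per_minute * minutes)

# Illustrative rates only, taken from the slide's order-of-magnitude claims:
cmos_rate = 1.0 / (30 * 24 * 60)   # ~1 flip per month, in flips per minute
flash_rate = 1.0                   # ~1 flip per minute

# Chance of evidential corruption during one hour of exposure:
print(p_at_least_one_flip(cmos_rate, 60))    # small for 1990s CMOS
print(p_at_least_one_flip(flash_rate, 60))   # near-certain for dense Flash
```

Such a calculation lets an examiner attach a defensible number to the claim that a given discrepancy could (or could not plausibly) be explained by spontaneous memory corruption.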
Quantitative Tools for Digital Investigations - II
• Bayesian Networks (BNs) can be used to reason about digital evidence and hypotheses
• Pioneered by K-P Chow to reason about IP piracy over peer-to-peer networks
• Need to choose conditional probabilities (CPs) for each node, giving the probability of finding each expected evidential trace if its associated hypothesis is (true, false)
• We have shown that the BN’s output is rather insensitive to the choice of CPs, so BNs are valid

Example BN – DDoS Attack

Complexity Theory - I
• Ockham’s Razor and the Principle of Least Contrivance / Contingency
• Hoyle: “A tornado sweeping through a junk-yard might assemble a Boeing 747 from the materials therein”, but what are the chances of that?
• The least complex explanation of all the evidence is the most probable explanation
• Measuring the complexity of alternative explanations (computational work, user role, software effort, etc.) can yield an odds ratio

Complexity Theory - II
• Example: the odds ratios against a Trojan Horse explanation for six common digital crimes have been calculated:
  – BitTorrent IP theft
  – Online auction fraud
  – Cyber locker extortion
  – Online game weapon theft
  – DDoS attack
  – Possession of child pornographic images

Information Theory
• Conventional (Shannon-Weaver) information theory (‘entropy’) measures the degree of unpredictability in the recovered evidence
• Algorithmic information theory (Solomonoff-Kolmogorov) measures the length of the shortest program that can reproduce all the recovered evidence
• So there is a link between Complexity Theory and Information Theory that can be exploited

Benefits of Quantitative Tools
• Enables the forensic investigator / examiner to prioritise cases that have a high chance of success and to abandon cases which have a low chance of going to trial
• Enables prosecution authorities to assess the relative strength of their case versus the defence’s case when deciding whether or not to proceed to trial
• Enables courts to hear digital evidence presented in a similar manner to non-digital evidence

Summary & Conclusions
• I hope I have persuaded you that:
  – quantitative tools exist to produce likelihood ratios and odds ratios for cases in which undisputed digital evidence can be fully accounted for by more than one explanation (hypothesis)
  – the benefits of adopting such tools include improving:
    • the conduct of the digital forensic investigation
    • the decision making of the prosecution authority
    • the conduct of the trial proceedings

Thank you! Comments? Questions?
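As an illustrative footnote to the Bayesian-network and likelihood-ratio slides: the core update they rely on can be sketched as a naive-Bayes simplification, in which each independently found evidential trace contributes a likelihood ratio P(trace | H) / P(trace | not H) that multiplies the odds on the hypothesis. This is a deliberate simplification of a full BN such as Chow's, and every probability below is invented for illustration:

```python
from functools import reduce

def posterior_odds(prior_odds: float, likelihood_ratios: list) -> float:
    """Posterior odds on hypothesis H = prior odds times the product of
    the likelihood ratios of the independently observed traces
    (a naive-Bayes simplification of a full Bayesian network)."""
    return reduce(lambda odds, lr: odds * lr, likelihood_ratios, prior_odds)

# Hypothetical conditional probabilities, NOT Chow's published values:
# each pair is (P(trace found | H true), P(trace found | H false)).
traces = [(0.9, 0.2), (0.8, 0.1), (0.7, 0.3)]
lrs = [p_true / p_false for p_true, p_false in traces]

odds = posterior_odds(1.0, lrs)   # start from even (1:1) prior odds
prob = odds / (1.0 + odds)        # convert odds back to a probability
print(round(odds, 2), round(prob, 4))
```

Read the output as: given even prior odds, these three traces together make the hypothesis roughly 84 times more likely than its negation, which is the kind of figure a court could weigh alongside fingerprint- or DNA-style statements.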