Quantifying Digital Forensics - Department of Informatics

ICDFI 2013 Keynote Speech 1:
Quantifying Likelihood in Digital
Forensic Investigations
Dr Richard Overill
Department of Informatics,
King’s College London
• Introduction & Background
• Quantitative Tools for Digital Forensics
– Probability Theory
– Bayesian Networks
– Complexity Theory
– Information Theory
• How can these tools benefit us?
• Summary & Conclusions
Introduction & Background
• Conventional (‘wet’) forensic scientists
commonly quantify the outcomes of their
investigations, for example:
– There is a one in a million chance that two
identical fingerprints were not produced by the
same individual
– There is a one in a billion chance that two
identical DNA samples do not originate from the
same individual
• Digital forensic investigators generally don’t
do this. Why?
Quantitative Tools for Digital
Investigations - I
• Probability Theory
– conventional forensic scientists commonly use it
• Example:
– Potential cosmic ray damage to CMOS and Flash
RAM. In the mid-1990s IBM found that a high-energy
secondary cosmic ray strike could flip about one
bit of CMOS RAM per month. But modern Flash
memory is much more susceptible and much
more densely packed, so the bit-flip rate is now
about one per minute. This has clear implications
for digital forensic investigations (DFI).
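To make the implication concrete, the following is a minimal sketch that treats bit flips as a Poisson process and asks how likely at least one flip is during a given exposure window. The Poisson model, the 30-day month, the 24-hour window and the function name are assumptions made for illustration; the two rates are only the order-of-magnitude figures quoted on the slide, reading "per minute" as roughly one flip per minute.

```python
import math

def prob_at_least_one_flip(rate_per_hour: float, hours: float) -> float:
    """P(at least one bit flip) in `hours`, assuming a Poisson process."""
    expected_flips = rate_per_hour * hours
    return 1.0 - math.exp(-expected_flips)

HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month

# Order-of-magnitude rates taken from the slide, not measured values.
rate_1990s_cmos = 1 / HOURS_PER_MONTH  # about one flip per month
rate_modern_flash = 60.0               # about one flip per minute

for label, rate in [("1990s CMOS RAM", rate_1990s_cmos),
                    ("modern Flash", rate_modern_flash)]:
    p = prob_at_least_one_flip(rate, hours=24)
    print(f"{label}: P(>=1 flip in 24 h) = {p:.4f}")
```

Under these assumptions a flip within one day is unlikely at the 1990s rate and practically certain at the modern rate, which is why an unexplained single-bit difference between two images of the same device need not indicate tampering.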
Quantitative Tools for Digital
Investigations - II
• Bayesian Networks (BNs) to reason about digital
evidence and hypotheses
• Pioneered by K-P Chow to reason about IP piracy
over peer-to-peer networks
• Need to choose conditional probabilities (CPs) for
each node, giving the probability of finding each
expected evidential trace if its associated
hypothesis is true or false
• We have shown the BN’s output is rather
insensitive to the choice of CPs, so BNs are valid
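The role of the CPs can be illustrated with a deliberately small sketch: a single hypothesis node with conditionally independent evidence nodes, updated by Bayes' rule. This is not the network from the talk or from Chow's work; the prior, the CP values, the `delta` perturbation and both function names are hypothetical, chosen only to show the mechanics and a crude sensitivity check.

```python
from math import prod
from typing import Sequence, Tuple

CP = Tuple[float, float]  # (P(trace found | H true), P(trace found | H false))

def posterior(prior: float, cps: Sequence[CP], observed: Sequence[bool]) -> float:
    """P(H | evidence) for one hypothesis with conditionally independent traces."""
    like_true = prod(t if seen else 1 - t for (t, _), seen in zip(cps, observed))
    like_false = prod(f if seen else 1 - f for (_, f), seen in zip(cps, observed))
    numerator = prior * like_true
    return numerator / (numerator + (1 - prior) * like_false)

def posterior_range(prior, cps, observed, delta=0.05):
    """Min and max posterior when all CPs are shifted together by -delta, 0, +delta."""
    results = []
    for shift in (-delta, 0.0, delta):
        # Clamp shifted values so they remain valid probabilities.
        shifted = [(min(max(t + shift, 0.01), 0.99),
                    min(max(f + shift, 0.01), 0.99)) for t, f in cps]
        results.append(posterior(prior, shifted, observed))
    return min(results), max(results)

# Hypothetical CPs for three expected evidential traces.
cps = [(0.9, 0.2), (0.8, 0.3), (0.7, 0.1)]
observed = [True, True, False]  # third trace was not recovered

print("posterior:", posterior(0.5, cps, observed))
print("range under +/-0.05 shift:", posterior_range(0.5, cps, observed))
```

Comparing the printed range with the central posterior gives a rough feel for how much the conclusion moves when the CPs are nudged; a proper sensitivity analysis would vary each CP independently.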
Example BN – DDoS Attack
Complexity Theory - I
• Ockham’s Razor and the Principle of Least
Contrivance / Contingency
• Hoyle: “A tornado sweeping through a junk-yard
might assemble a Boeing 747 from the materials
therein”, but what are the chances of that?
• The least complex explanation of all the evidence
is the most probable explanation
• Measuring the complexity of alternative
explanations (computational work, user role,
software effort, etc.) can yield an odds ratio
Complexity Theory - II
• Example:
the odds ratios against a Trojan Horse
explanation for six common digital crimes have
been calculated:
– BitTorrent IP theft
– Online auction fraud
– Cyber locker extortion
– Online game weapon theft
– DDoS attack
– Possession of child pornographic images
Information Theory
• Conventional (Shannon-Weaver) information
theory (‘entropy’) measures the degree of
unpredictability in the recovered evidence
• Algorithmic information theory (Solomonoff-
Kolmogorov) measures the length of the
shortest program that can reproduce all the
recovered evidence
• So there is a link between Complexity and
Information Theory that can be exploited
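Both measures can be approximated on recovered data. The sketch below computes the Shannon entropy of a byte string and uses its zlib-compressed length as a stand-in for algorithmic information content. The stand-in is an assumption: the true shortest-program length is uncomputable, and a compressor only gives an upper bound. The function names and sample data are hypothetical.

```python
import math
import os
import zlib
from collections import Counter

def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def compressed_length(data: bytes) -> int:
    """Length in bytes after zlib compression: a crude upper bound on
    algorithmic information content, not the true shortest program."""
    return len(zlib.compress(data, 9))

# Hypothetical samples: a highly regular log fragment and random bytes.
repetitive = b"GET /index.html\n" * 500
random_like = os.urandom(len(repetitive))

for label, sample in [("repetitive", repetitive), ("random-like", random_like)]:
    print(label,
          "entropy:", round(shannon_entropy_bits_per_byte(sample), 3),
          "compressed bytes:", compressed_length(sample))
```

Regular data scores low on both measures and random-looking data scores high on both, which is one way to see the link between the two theories in practice.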
Benefits of Quantitative Tools
• Enables the forensic investigator / examiner to
prioritise cases that have a high chance
of success and to abandon cases which have a
low chance of going to trial
• Enables prosecution authorities to assess the
relative strength of their case versus the
defence’s case when deciding whether or not
to proceed to trial
• Enables courts to hear digital evidence
presented in a similar manner to non-digital
evidence
Summary & Conclusions
• I hope I have persuaded you that:
– quantitative tools exist to produce likelihood
ratios and odds ratios for cases in which
undisputed digital evidence can be fully accounted
for by more than one explanation (hypothesis)
– the benefits of adopting such tools include improving:
• the conduct of the digital forensic investigation
• the decision making of the prosecution authority
• the conduct of the trial proceedings
Thank you!