Design a Framework to Mitigate Inconsistencies in Producing Digital Evidence

Zulfany Erlisa Rasjid (1840001500)

Abstract

Digital evidence is data held in digital storage that is used as supporting evidence in court. The FBI announced that its Computer Analysis and Response Team (CART) conducted more than 13,300 digital forensic examinations in 2012, involving more than 10,500 terabytes of data. Digital evidence is obtained using evidence extraction tools, most of which use the MD5 hashing technique. MD5 is known to have a weakness: two different files can have the same hash value (a collision). Tools operating at different levels of abstraction of the hard drive may produce different results. The extraction process can cause inconsistencies if it fails to extract even one bit of data. Another inconsistency occurs when the data has been tampered with and is therefore not admissible as evidence in court. This research aims to discover whether such inconsistencies occur during extraction, and to design a framework for the extraction process that applies a newly developed algorithm before the actual hashing step, to ensure the integrity of the resulting data and so make it admissible in court. The research will be executed in three stages. The first stage analyzes hard drives, identifying how data gets written to them. The second stage identifies how the extraction tools work and analyzes their results. In the third stage, an algorithm will be developed and, within the new framework, applied to the input stream before the hashing process to ensure the creation of a unique hash value. Finally, tests and experiments will be conducted to extract evidence data using the new framework and to analyze the hashing results.
In conclusion, the reasons for the inconsistencies will be identified, and a framework will be introduced in which a new algorithm is applied to the input string before the hashing process to reduce these failures.

Keywords: Digital evidence, MD5, hashing, consistency, evidence extraction, computer forensics, collision

Literature Review

There are two types of digital evidence: digital recordings, which are a projection of a physical reality, and digital evidence strings, which are verified by some mathematical function [1]. The requirements for digital evidence include unambiguity and security. As technology grows, computer crime grows with it. Analyzing evidence such as the contents of hard drives or memory images may not always yield all of the available information [2]. It is therefore necessary to keep improving computer forensic technology. Issues identified in relation to digital evidence include encryption techniques, hashing techniques, the extraction tools, and the volume of data [3]. The extraction software in use today has evolved, but open problems remain in the analysis of large files [4].

Hashing techniques also play an important role in digital evidence extraction. Digital extraction tools use hashing techniques such as the SHA and MDx series. The hash algorithm is used to verify that the data extraction is complete: any change in the input string should produce a different hash value. How well this holds depends on how reliable the hashing function is. A study of the properties of different hashing algorithms shows that they are considered secure in the field of cryptography, provided that the hash algorithm used has a very low probability of collision [5]. In 2004, cryptographers Xiaoyun Wang, Dengguo Feng, Xuejia Lai, and Hongbo Yu showed that two different files can have the same hash value under MD5. The impact of this collision was addressed by Eric Thompson, who argued that (1) the resulting impact is very small.
He claims that MD5 is still secure against brute-force attacks, (2) changing a single bit has a cascading effect, and (3) the chance of a birthday collision is very remote [6]. In 2012, researchers Tiwari and Asawa [7] built a new 256-bit hashing function based on MD5. They showed that this function is highly sensitive to its input and is secure against generic, birthday, differential, and statistical attacks [7]; however, its probability of collision is unknown. Despite the weakness of the MD5 algorithm, it is still used in most digital evidence extraction tools.

Given these problems, the degree of integrity of the digital evidence produced is unclear, which has a significant impact on the reliability and admissibility of the evidence in court. To identify where the inconsistencies occur, every item related to digital evidence must be analyzed in detail. These items include hard drives and how data gets written to them, how the digital extraction software works, and how the hashing algorithm works. Several formats exist for evidence representation, such as EnCase, Smart, DEB, AFF, and ProDiscover, and most of them use MD5 or SHA hashing [8]. To date, it remains possible for different files to yield the same hash value. Although the probability is low, it is significant for digital evidence because the evidence will be used to support lawsuits. Therefore, to mitigate collisions, another layer will be introduced prior to the hashing process. This layer contains an algorithm to mitigate the collisions that may be produced by the hashing algorithm. A complete experiment will be conducted to show that this extra layer reduces the probability of inconsistencies and therefore increases trust in computer forensics.
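
The role of the hash described above can be illustrated with a minimal Python sketch. It is not the proposed framework or its new algorithm, which this proposal does not yet specify; it only shows, using the standard hashlib module, how an extracted byte stream is hashed in chunks, how flipping a single bit changes the digest, and how the usual birthday-bound approximation estimates the chance of an accidental collision. The function names, the sample data, and the chunk size are assumptions made for illustration. The birthday estimate covers random collisions only, not deliberately constructed ones such as those found for MD5 in 2004.

```python
import hashlib
import io
import math


def hash_stream(stream, algorithm="md5", chunk_size=1 << 20):
    """Hash a binary stream in fixed-size chunks and return the hex digest."""
    digest = hashlib.new(algorithm)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def flip_bit(data, bit_index):
    """Return a copy of data with exactly one bit inverted."""
    altered = bytearray(data)
    altered[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(altered)


def birthday_collision_probability(num_inputs, hash_bits):
    """Approximate the chance that at least two of num_inputs random inputs
    share a hash value, assuming an ideal hash with hash_bits output bits.

    Uses p ~= 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps tiny values accurate.
    """
    exponent = num_inputs * (num_inputs - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # Hypothetical stand-in for an extracted disk image.
    original = b"hypothetical evidence image contents" * 1000
    extracted = flip_bit(original, 12345)  # simulate one wrongly extracted bit

    print("original :", hash_stream(io.BytesIO(original)))
    print("extracted:", hash_stream(io.BytesIO(extracted)))

    # Accidental-collision estimate for one billion files.
    for bits in (128, 256):
        p = birthday_collision_probability(10**9, bits)
        print(f"{bits}-bit hash, 1e9 files: p ~= {p:.3e}")
```

Reading the stream in chunks keeps memory use constant, which matters for the multi-terabyte volumes mentioned in the abstract. Passing "sha256" as the algorithm argument computes a SHA-2 digest with the same code.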
The consistency and reliability of digital evidence will help ensure that it is accepted and admissible in support of lawsuits.

References

[1] U. Maurer, “New approaches to digital evidence,” Proc. IEEE, vol. 92, no. 6, pp. 933–947, 2004.
[2] E. Huebner, D. Bem, and O. Bem, “Computer forensics: past, present and future,” Inf. Secur. Tech. Rep., vol. 8, no. 2, pp. 32–36, 2003.
[3] G. Mohay, “Technical challenges and directions for digital forensics,” Proc. - First Int. Work. Syst. Approaches to Digit. Forensic Eng., vol. 2005, pp. 155–161, 2005.
[4] V. Roussev, C. Quates, and R. Martell, “Real-time digital forensics and triage,” Digit. Investig., vol. 10, no. 2, pp. 158–167, 2013.
[5] H. J. Ke, J. Liu, S. J. Wang, and D. Goyal, “Hash-algorithms output for digital evidence in computer forensics,” Proc. - 2011 Int. Conf. Broadband Wirel. Comput. Commun. Appl. BWCCA 2011, vol. 1, pp. 399–404, 2011.
[6] E. Thompson, “MD5 collisions and the impact on computer forensics,” pp. 36–40, 2005.
[7] H. Tiwari and K. Asawa, “Building a 256-bit hash function on a stronger MD variant,” vol. 4, no. 2, pp. 67–85, 2014.
[8] A. O. Flaglien, A. Mallasvik, M. Mustorp, and A. Årnes, “Storage and exchange formats for digital evidence,” Digit. Investig., vol. 8, no. 2, pp. 122–128, 2011.