Hash Set Additions to the National Software Reference Library

Brayan Hernandez, Computer Science Department, Hartnell College, Salinas, CA 93901
Neil Rowe, Ph.D., Computer Science Department, NPS

Abstract

Hash sets are used in digital forensics to check a file's legitimacy. The primary objective was to see how many new additions to the National Software Reference Library (NSRL) could be made from a very large dataset purchased by NPS. The data was processed, parsed, sorted, and manipulated using Python 2.7 to identify information in the dataset that could be compared to the Library. After the data was parsed and its key information analyzed, it was compared to known datasets. 879,679 hash sets matched those of the corpora provided by the National Software Reference Library (NSRL).

Code Processes

Data Processing

The first step was to replace the MD5 hash code in one set of data with the corresponding SHA1 hash code from the other. The code for this step, provided by Dr. Rowe, also reformats the files so they can be handled on both Windows and Linux machines. Because text files are formatted differently on the two systems, a file prepared on one can later prove difficult to process on the other. The first three lines of the code all open files and assign them to variables. For example:

    fid1 = open(dirname + '/sorted_hashsetsdotcom_data.txt', 'r')

This opens the specific file "sorted_hashsetsdotcom_data.txt" in whatever directory is given, in read-only mode, which is indicated in the code by the letter 'r'.

Conclusions

The data purchased from hashsets.com provided about 879,679 hash sets that matched those of the corpora provided by the National Software Reference Library (NSRL). Because this data was purchased from a private vendor, it was not known whether it would provide sufficient matches, but it proved to contain a significant number of additions to the National Software Reference Library.
In total there were 6,441,457 hash codes in the file, and of those 13.6% were new.

[Chart: Matches. Bars: NSRL, Hashsets.com. Vertical axis: 0 to 10,000,000.]

Introduction and methodology

A hash code for any given set of data is its digital signature, like a digital fingerprint. In practice no two are alike, and when a file is tampered with, even by a single character, its hash code changes completely. Python 2.7 was the primary tool used to process the data because lists of items are easier to manipulate. Figure 1 is an example of the raw purchased data, with details pertaining to each hash code. It also shows why Python is a good fit for handling large amounts of lengthy data.

Figure 1 shows the data from two files, search engine details and search engine primary. The first provides an MD5 hash code along with details of each file. The objective was to find the matching MD5 in the second file and substitute the corresponding SHA1 from that file. The first file is 14 gigabytes and the second is roughly 8 gigabytes, all text, with millions of lines. This project analyzed the purchased hash sets to see whether any of them matched the corpora documented by the team at the Naval Postgraduate School (NPS).

The code uses a slider method: the hashes are sorted by their alphanumeric value, and a slider moves up or down through the data depending on that value until a match is found, at which point the SHA1 is substituted for the MD5.

Output

The following output shows the SHA1 hash code substituted from the other file, the file name of the hash code, its size, the operating system, and the website where the hashes were purchased.
Hash Code: D84270022E57F1850C8464FA432ADFF9955881575
File Name: index.docbox
Operating System Version: Redhat 7.3 (32 bit)
Operating System Type: Linux
File Origin: From hashsets.com

Matches to NSRL

    line1 = fid1.readline()

This line of code assigns to the variable 'line1' the result of a function that reads the next line, here the first, of whichever file it is called on. In this case that file is the one opened by the code from earlier:

    fid1 = open(dirname + '/sorted_hashsetsdotcom_data.txt', 'r')

The script first opens this file and two others, one of which has the attribute 'w' instead of 'r', which means that the file can be written to. It then assigns to variables the lines read from those files.

Ongoing work

Other scripts are being written to determine whether the purchased hash sets belong to operating systems and versions already in the team's corpora. One such script, which partially determines all possible matches, outputs -1 where there is no match, meaning a new addition to the corpora of operating systems and versions.

Acknowledgments

Dr. Neil Rowe, Alison Kerr, Cassandra Martin, Professor Joe Welch, Pat McNeill, Andy Newton, and Kelly Locke. This internship was funded by a Title V Strengthening Transfer Pathways Grant.

For further information

Brayan Hernandez
brayanhernandez@student.hartnell.edu
Neil Rowe, Ph.D.
ncrowe@nps.edu Figure 1 Search_engine_details.sql : INSERT INTO [search_engine_details] ([MD5], [Name], [File_Ext], [Description], [Last_Accessed], [File_Created], [Last_Written], [Full_Path], [Quick_Category], [File_Notes], [Major], [Minor], [Operating_System], [Manufacturer], [Version], [Inside_Compressed_Files], [Processor_Bits], [Record_Date], [Is_Deleted], [key_field], [website], [Geographic_Location], [Extraneous], [Log], [Graphic]) VALUES (N'0003d54ef57f72dc6316a2fbc30ae4bf', N'ConfigFile.pyo', N'pyo', N'File', '2002-03-13 18:42:54', null, '2002-03-13 18:42:54', N'/usr/lib/python1.5/site-packages/mx/Misc/ConfigFile.pyo', N'Redhat 7.3 (32bit)', N'Redhat 7.3 i386', N'Operating Systems', N'Installation', N'Linux', N'Redhat Incorporated', N'Redhat 7.3', N'No', N'32bit', '2010-03-07', N'No', N'13079373', N'www.Redhat.com', N'North America', N'No', null, N'Linux'); INSERT INTO [search_engine_details] ([MD5], [Name], [File_Ext], [Description], [Last_Accessed], [File_Created], [Last_Written], [Full_Path], [Quick_Category], [File_Notes], [Major], [Minor], [Operating_System], [Manufacturer], [Version], [Inside_Compressed_Files], [Processor_Bits], [Record_Date], [Is_Deleted], [key_field], [website], [Geographic_Location], [Extraneous], [Log], [Graphic]) VALUES (N'0003dae769bbca07137e18e171346e34', N'mimetypes.pyc', N'pyc', N'File', '1998-09-03 21:51:56', null, '1998-09-03 21:51:56', N'/lib/python1.5/mimetypes.pyc', N'Redhat 5.2 (32bit)', N'Redhat 5.2 i386', N'Operating Systems', N'Installation', N'Linux', N'Redhat Incorporated', N'Redhat 5.2', N'No', N'32bit', '2010-03-07', N'No', N'13079374', N'www.Redhat.com', N'North America', N'No', null, N'Linux'); INSERT INTO [search_engine_details] ([MD5], [Name], [File_Ext], [Description], [Last_Accessed], [File_Created], [Last_Written], [Full_Path], [Quick_Category], [File_Notes], [Major], [Minor], [Operating_System], [Manufacturer], [Version], [Inside_Compressed_Files], [Processor_Bits], [Record_Date], [Is_Deleted], 
[key_field], [website], [Geographic_Location], [Extraneous], [Log], [Graphic]) VALUES (N'0003dc4db7143f19e6b9ff33290fe180', N'msgCore.h', N'h', N'File', '2001-09-01 11:46:04', null, '2001-09-01 11:46:04', N'/usr/include/mozilla/msgCore.h', N'Redhat 7.2 (32bit)', N'Redhat 7.2 i386', N'Operating Systems', N'Installation', N'Linux', N'Redhat Incorporated', N'Redhat 7.2', N'No', N'32bit', '2010-03-07', N'No', N'13079375', N'www.Redhat.com', N'North America', N'No', null, N'Linux'); INSERT INTO [search_engine_details] ([MD5], [Name], [File_Ext], [Description], [Last_Accessed], [File_Created], [Last_Written], [Full_Path], [Quick_Category], [File_Notes], [Major], [Minor], [Operating_System], [Manufacturer], [Version], [Inside_Compressed_Files], [Processor_Bits], [Record_Date], [Is_Deleted], [key_field], [website], [Geographic_Location], [Extraneous], [Log], [Graphic]) VALUES (N'0003eb513019db8b5b9a77ec08fca298', N'mpzpi.py', N'py', N'File', '2004-07-22 04:23:11', null, '1996-11-27 13:47:00', N'/mnt/usr/local/share/examples/python2.2/scripts/mpzpi.py', N'FreeBSD 4.6 (32bit)', N'FreeBSD 4.6 i386', N'Operating Systems', N'Installation', N'Unix-Like (BSD)', N'FreeBSD Foundation', N'FreeBSD 4.6', N'No', N'32bit', '2010-03-07', N'No', N'13079376', N'www.Freebsd.org', N'North America', N'No', null, N'BSD'); INSERT INTO [search_engine_details] ([MD5], [Name], [File_Ext], [Description], [Last_Accessed], [File_Created], [Last_Written], [Full_Path], [Quick_Category], [File_Notes], [Major], [Minor], [Operating_System], [Manufacturer], [Version], [Inside_Compressed_Files], [Processor_Bits], [Record_Date], [Is_Deleted], [key_field], [website], [Geographic_Location], [Extraneous], [Log], [Graphic]) VALUES (N'0003eb513019db8b5b9a77ec08fca298', N'mpzpi.py', N'py', N'File', '2004-07-22 16:06:29', null, '1996-11-27 13:47:00', N'/usr/doc/python-2.3.4/Demo/scripts/mpzpi.py', N'Slackware 10 (32bit)', N'Slackware 10 i386', N'Operating Systems', N'Installation', N'Linux', N'Slackware Linux 
Incorporated', N'Slackware 10', N'No', N'32bit', '2010-03-07', N'No', N'13079377', N'www.Slackware.com', N'North America', N'No', null, N'Linux'); INSERT INTO [search_engine_details] ([MD5], [Name], [File_Ext], [Description], [Last_Accessed], [File_Created], [Last_Written], [Full_Path], [Quick_Category], [File_Notes], [Major], [Minor], [Operating_System], [Manufacturer], [Version], [Inside_Compressed_Files], [Processor_Bits], [Record_Date], [Is_Deleted], [key_field], [website], [Geographic_Location], [Extraneous], [Log], [Graphic]) VALUES (N'0003eb513019db8b5b9a77ec08fca298', N'mpzpi.py', N'py', N'File', '2004-07-23 11:53:36', null, '1996-11-27 13:47:00', N'/usr/doc/python-2.3.1/Demo/scripts/mpzpi.py', N'Slackware 9.1 (32bit)', N'Slackware 9.1 i386', N'Operating Systems', N'Installation', N'Linux', N'Slackware Linux Incorporated', N'Slackware 9.1', N'No', N'32bit', '2010-03-07', N'No', N'13079378', N'www.Slackware.com', N'North America', N'No', null, N'Linux'); Search_engine_primary.sql: INSERT INTO [search_engine_primary] ([MD5], [SHA_1], [SHA_256], [Fuzzy_Hash], [Fuzzy_Block], [Header_HEX], [128_Bytes_ASCII], [Signature], [Logical_Size], [Extraneous], [Encrypted], [NSRL], [key_field]) VALUES (N'00000040f69913a27ff7401b8bf3cfd1', N'D84270022E57F1850C8464FA432ADFF99588157B', N'A204F44B0DD4B615AA41A7D6D09EF3C637283053DAC7EB1C3459E7793D888D21', N'2jSAftUOuOU+JnyjSBTPlbXJXQyjVr/PkyP1b3+0voC8CKZSvqyS43XQjKXv9/zA:OSUwO4KCq0a+R6cKZz1s2oXGSbR', N'24', N'3c3f786d6c2076657273696f6e3d22312e3022203f3e0a3c21444f4354595045', N'<?xml version="1.0" ?> <!DOCTYPE part PUBLIC "-//KDE//DTD DocBook XML V4.1-Based Variant V1.0//EN" "dtd/kdex.dtd" [ <!ENTITY kio', N'XML Document (Example Signature) - Header [3C] [3F] [78] [6D] [6C] ', N'2225', N'No', N'No', N'No', N'13078775'); INSERT INTO [search_engine_primary] ([MD5], [SHA_1], [SHA_256], [Fuzzy_Hash], [Fuzzy_Block], [Header_HEX], [128_Bytes_ASCII], [Signature], [Logical_Size], [Extraneous], [Encrypted], [NSRL], [key_field]) VALUES 
(N'0000011E03A0F3C00B6E76BAC8EED431', N'B3908B6A87CD94008FA36AE6EEB833FD99926486', N'2CE100E4D575F95FB9B40D9483F365683A9EDDCD7D1FB74C93F46169F485AE16', N'Lb9m59NHrRJNWQ/t0quJBpI6B9nAKLhxHnW0DI6+vB4+QzwANW47HnWtl0xxtl:n9iLbNX/tho3ntw56FBYlSxt', null, N'67706d2d7072696d6172792d3030302e706e6700000000000000000000000000', N'gpm-primary-000.png', null, N'60', N'No', N'No', N'No', N'28018913'); INSERT INTO [search_engine_primary] ([MD5], [SHA_1], [SHA_256], [Fuzzy_Hash], [Fuzzy_Block], [Header_HEX], [128_Bytes_ASCII], [Signature], [Logical_Size], [Extraneous], [Encrypted], [NSRL], [key_field]) VALUES (N'000003943324c3345cf1a21337b79533', N'F304E40A3E01F0A1DF3F17A80DC410E864A8947F', N'E5D7EB05AE079E5F4861EA0677C9B29AE20CEAC4754CD972EE555483413B5616', null, null, N'3c6120687265663d225061636b616765536c69646553686f772e68746d6c223e', N'<a href="PackageSlideShow.html">PackageSlideShow</a> ', N'Unknown', N'53', N'No', N'No', N'No', N'24313937'); INSERT INTO [search_engine_primary] ([MD5], [SHA_1], [SHA_256], [Fuzzy_Hash], [Fuzzy_Block], [Header_HEX], [128_Bytes_ASCII], [Signature], [Logical_Size], [Extraneous], [Encrypted], [NSRL], [key_field]) VALUES (N'00000448C638A2AE7C4CB08960A627D8', N'5F6BFB2C247FA8E6F0114ACD837CC6EEE9EA5557', N'E7A84C30A4C58C8E38F28EDF1CC01E3DE4322C8D4A93792C135348AD9A5CCF0F', N'RgjcQ9lplL+eVeHdlXlcAxW7R5xtKl7QwHPeAgOcZhAvt5kUWm3:O/9jZ+eVeHdtlSDxtQ71PbUZat5nW2', N'192', N'cafebabe0000003001df0a013501360701370701380a000301390a0003013a0a', N'0 5678 9 : 9; 95<5=5>5?5@5A5B5C5D5E5FB5G5HI5JK5L5M5NO PQ P5RS 5T "U5VW &XY (Z5[\ +95]^ .95_` 195a 5bc 59d 7P 5e 5f 5ghi <jk ?l 7', N'Java Class Library - Header [CA] [FE] [BA] [BE] ', N'13526', N'No', N'No', N'No', N'31426793'); INSERT INTO [search_engine_primary] ([MD5], [SHA_1], [SHA_256], [Fuzzy_Hash], [Fuzzy_Block], [Header_HEX], [128_Bytes_ASCII], [Signature], [Logical_Size], [Extraneous], [Encrypted], [NSRL], [key_field]) VALUES (N'00000965bef00c92a18b2b31e75d702c', N'26EEEB25D7005F9FF9EE08A8084C77242702FBAD', 
N'7481A702F0E0DEC4C70C3A714D68D152CC490AF5AC6A26AF90960CBF5F159561', N'oM5b0mfOv3KDIW5sWLDtHLj6TiSFjw+WrQz26zGMyI:PbsOTLBX0jjWrW9qM', N'48', N'feff002200540069006d0065002200200009003d002000220054006900640022', N'"Time" = "Tid"; "Percentage" = "Prosent"; "Icon Only"= "Kun symboler"; "Show" = "Vis"; "Open Energy Saver..." = "pne Strmsparing', null, N'2142', N'No', N'No', N'No', N'13078777'); INSERT INTO [search_engine_primary] ([MD5], [SHA_1], [SHA_256], [Fuzzy_Hash], [Fuzzy_Block], [Header_HEX], [128_Bytes_ASCII], [Signature], [Logical_Size], [Extraneous], [Encrypted], [NSRL], [key_field]) VALUES (N'00000f2907cf8a806bcc3d6dc7699d4c', N'53D2E11C60EF1E9DDB3C04DD0A983983F87A9660', N'7AA01D480D2CE95C555440D6ACFC76532667D0400C50D173D59BCC82E0981F44', N'2dluSBuT3YUv+nVgZehwpv+RgA0ms2v+qzLvgA0msRdtBMhehx2G:cluSuk4+nVgZP+RgAA2+qzrgAARd/yet', N'24', N'3c3f786d6c2076657273696f6e3d22312e302220656e636f64696e673d225554', N'<?xml version="1.0" encoding="UTF-8"?> <assembly xmlns="urn:schemas-microsoft-com:asm.v3" manifestVersion="1.0" description="$(r', N'XML Document (Example Signature) - Header [3C] [3F] [78] [6D] [6C] ', N'1517', N'No', N'No', N'No', N'26797511'); ',
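
The slider method described under Introduction and methodology can be sketched as a single pass over two inputs sorted by MD5. This is a minimal illustration, not the script provided by Dr. Rowe. The function name merge_substitute, the tuple layouts, and the short made-up keys in the demo are all assumptions for the example. It is written for Python 3, although the project used Python 2.7. It compares keys in lowercase because Figure 1 shows MD5 values in mixed case, and it allows repeated MD5 values on the details side because Figure 1 shows the same MD5 under several operating systems. Unlike the slider described above, this version only moves forward, which is sufficient when both inputs are already sorted.

```python
def merge_substitute(details, primary):
    """Pair each details record with the SHA1 of the matching primary record.

    details: iterable of (md5, file_name), sorted by lowercase md5;
             the same md5 may appear more than once.
    primary: iterable of (md5, sha1), sorted by lowercase md5;
             assumed to hold each md5 at most once.
    Yields (sha1, file_name), with sha1 set to None when no match exists.
    """
    primary_iter = iter(primary)
    current = next(primary_iter, None)
    for md5, file_name in details:
        key = md5.lower()
        # Slide forward through primary while its key is still behind ours.
        while current is not None and current[0].lower() < key:
            current = next(primary_iter, None)
        if current is not None and current[0].lower() == key:
            yield current[1], file_name
        else:
            yield None, file_name


# Made-up short keys stand in for real 32-character MD5 values.
details = [("0a", "a.txt"), ("0b", "b.txt"), ("0b", "b_copy.txt"), ("0d", "d.txt")]
primary = [("0A", "SHA_A"), ("0B", "SHA_B"), ("0C", "SHA_C")]

result = list(merge_substitute(details, primary))
for sha1, file_name in result:
    print(sha1, file_name)
```

Because each input is read once from start to finish, neither file has to be held in memory, which matters for text files of 14 and 8 gigabytes. In practice the two arguments could be generators that read and parse one line at a time from the sorted files.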