Hash Set Additions to the National Software Reference Library

Brayan Hernandez
Computer Science Department, Hartnell College, Salinas, CA 93901
Neil Rowe, Ph.D., Computer Science Department, NPS
Abstract
Hash sets are used in digital forensics to check a file's
legitimacy. The primary objective was to determine how many new
additions to the National Software Reference Library (NSRL)
could be made from a very large dataset purchased by NPS.
The data was processed, parsed, sorted, and manipulated
using Python 2.7 to identify information in the data set that
could be compared to the Library. After parsing, the key
information was compared to known datasets. 879,679 hash
sets matched those of the corpora provided by the NSRL.
Code Processes
Data Processing
The first step was substituting the SHA1 hash code for the
MD5 hash code, matching records across the two sets of data.
That code looked like this:
This code, provided by Dr. Rowe, reformats files so they can be handled on
both Windows and Linux machines. Because the two systems format text files
differently, a file produced on one machine can later prove difficult to
process on the other.
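The reformatting code itself is not reproduced here. A frequent cause of this Windows/Linux problem is that Windows ends each text line with a carriage return plus line feed (CRLF), while Linux uses a line feed (LF) alone. Assuming that is the difference being handled, a minimal Python 3 sketch might look like the following; the function names are hypothetical, not Dr. Rowe's.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Replace Windows CRLF line endings with Unix LF endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, converting CRLF to LF line by line.

    Working one line at a time keeps memory use small even for
    multi-gigabyte files such as the purchased data sets.
    """
    # Binary mode stops Python from translating line endings on its own.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # each line ends at, and includes, b"\n"
            dst.write(to_unix_newlines(line))


print(to_unix_newlines(b"a\r\nb\r\n"))  # b'a\nb\n'
```

Python 3's text mode can also translate line endings on read (universal newlines); the explicit byte-level version simply makes the conversion visible.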
The first three lines each open a file and assign it to a variable.
For example:
fid1 = open(dirname + '/sorted_hashsetsdotcom_data.txt', 'r')
This opens the file "sorted_hashsetsdotcom_data.txt" in the directory named
by dirname, in read-only mode, which the code indicates with the letter 'r'.
Conclusions
The data purchased from hashsets.com provided
879,679 hash sets that matched those of the corpora
provided by the National Software Reference Library
(NSRL). Because this data was purchased from a private
vendor, it was not known whether it would provide sufficient
matches, but it proved to contain a significant number of
additions to the NSRL. In total there were 6,441,457 hash
codes in the file, and of those, 13.6% were new.
Figure: bar chart "Matches to NSRL" comparing NSRL and Hashsets.com (vertical axis 0 to 10,000,000)
Introduction and methodology
A hash code for any given set of data is its digital
signature, like a digital fingerprint. Two different files are
extremely unlikely to share a hash code by accident, and if a
file is tampered with, even by a single bit, its hash code
changes completely. Python 2.7 was the primary tool used
to process the data because it makes lists of items easy to
manipulate.
Figure 1 shows an example of the raw purchased data, with
details pertaining to each hash code; it also shows why Python
is a good fit for handling such lengthy, huge amounts of data.
The data comes from two files, search engine details and
search engine primary. The details file identifies each file by
its MD5 hash, while the primary file lists the MD5 together with
the SHA1 hash, so the objective was to match each MD5 in the
details file with the same MD5 in the primary file and substitute
the corresponding SHA1 for it. The first file is 14 gigabytes and
the second roughly 8 gigabytes, all text, with millions of lines.
This project dealt with analyzing the purchased hash sets to see
whether any of them matched the corpora documented by the team
at the Naval Postgraduate School (NPS).
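The fingerprint property can be seen with Python's standard hashlib module (sketch written for Python 3): changing a single character of the input produces an unrelated digest. The inputs below are arbitrary examples, not data from the project.

```python
import hashlib


def digests(data: bytes) -> tuple:
    """Return the (MD5, SHA1) hex digests of data, the two hash types used here."""
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()


original = digests(b"abc")
tampered = digests(b"abd")  # one character changed

print(original[0])  # 900150983cd24fb0d6963f7d28e17f72
print(original[1])  # a9993e364706816aba3e25717850c26c9cd0d89d
print(original != tampered)  # True
```

An MD5 digest is 32 hexadecimal characters and a SHA1 digest is 40, matching the widths of the MD5 and SHA_1 columns in Figure 1.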
The code sorts the hashes by their alphanumeric value and then
uses a slider method: the slider moves up or down through the
sorted data, depending on how the hash at its position compares
with the one being sought, until a match is found; the SHA1 is
then substituted for the MD5.
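The slider code is not shown on the poster. A standard technique that fits this description is a sorted merge join with a moving pointer; the sketch below assumes that is what the slider does, uses hypothetical function names, and works on in-memory lists for clarity. Because both inputs are sorted, the slider here only ever moves forward. Two details come from Figure 1: the MD5 values mix upper and lower case, so they are lower-cased before sorting, and the same MD5 can appear in several details records (mpzpi.py appears three times), so the slider is not advanced after a match.

```python
def prepare(pairs):
    """Lower-case each MD5 key and sort the (md5, value) pairs by it."""
    # Figure 1 mixes upper- and lower-case MD5s; without this step
    # 'A...' and 'a...' would sort into different places.
    return sorted(((md5.lower(), value) for md5, value in pairs),
                  key=lambda pair: pair[0])


def substitute_sha1(details, primary):
    """Merge-join two MD5-sorted lists and swap in the matching SHA1.

    details: (md5, record) pairs; an MD5 may repeat.
    primary: (md5, sha1) pairs with unique MD5s.
    Returns (sha1, record) for every details row whose MD5 is found.
    """
    results = []
    j = 0  # slider position in primary
    for md5, record in details:
        # Slide forward past primary hashes that sort below this MD5.
        while j < len(primary) and primary[j][0] < md5:
            j += 1
        if j == len(primary):
            break  # remaining details MD5s are above every primary hash
        if primary[j][0] == md5:
            # Keep j in place so repeated MD5s in details also match.
            results.append((primary[j][1], record))
    return results


# Hypothetical, shortened keys and values for illustration only.
details = prepare([("0003EB51", "mpzpi.py, FreeBSD 4.6"),
                   ("0000AAAA", "other.txt"),
                   ("0003eb51", "mpzpi.py, Slackware 10")])
primary = prepare([("0003eb51", "sha1-B"), ("00000000", "sha1-A")])
print(substitute_sha1(details, primary))
# [('sha1-B', 'mpzpi.py, FreeBSD 4.6'), ('sha1-B', 'mpzpi.py, Slackware 10')]
```

For the multi-gigabyte files, the same loop would read each side with readline() instead of indexing a list, so only the current line of each file is held in memory.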
Output
The following output shows the SHA1 hash code substituted from
the other file, the name of the file the hash identifies, the operating
system version and type, and the website from which the hashes were purchased.
Hash Code: D84270022E57F1850C8464FA432ADFF99588157B
File Name: index.docbox
Operating System Version: Redhat 7.3 (32 bit)
Operating System Type: Linux
File Origin: From hashsets.com
line1 = fid1.readline()
This line assigns to the variable line1 the value returned by fid1.readline(),
which reads the next line (here, the first line) of the file object fid1.
That file object was created earlier by:
fid1 = open(dirname + '/sorted_hashsetsdotcom_data.txt', 'r')
The script opens this file, and the other two files, the same way; one of them
is opened with mode 'w' instead of 'r', which means the file is opened for writing.
It then assigns to variables the lines read from the input files.
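readline() returns the next line including its newline character, and returns an empty string only at end of file; a blank line comes back as "\n", so the two cases can be told apart. A Python 3 sketch of a reading loop built on that rule follows. The tab-separated, hash-first line layout is an assumption, since the poster does not show the format of sorted_hashsetsdotcom_data.txt, and the function names are hypothetical.

```python
import io


def read_pairs(f):
    """Yield (hash, rest) pairs from a file object, one readline() at a time.

    Hypothetical layout: each line holds a hash, a tab, then other fields.
    """
    while True:
        line = f.readline()
        if line == "":  # empty string: end of file reached
            return
        line = line.rstrip("\r\n")
        if not line:  # blank line (readline returned only a newline)
            continue
        key, _, rest = line.partition("\t")
        yield key, rest


def write_pairs(out, pairs):
    """Write (hash, rest) pairs to a file object opened with mode 'w'."""
    for key, rest in pairs:
        out.write(key + "\t" + rest + "\n")


sample = io.StringIO("0000aaaa\tfoo.txt\n\n0003eb51\tmpzpi.py\n")
print(list(read_pairs(sample)))
# [('0000aaaa', 'foo.txt'), ('0003eb51', 'mpzpi.py')]
```

In the script, sample would be replaced by fid1, and the output file object would come from open(..., 'w'), which creates the file or empties it if it already exists.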
Ongoing work
Other scripts are being written to determine whether the purchased hash
sets come from operating systems and versions already in the team's
corpora. One such script has partially determined all possible matches;
it outputs -1 where there is no match, meaning the operating system or
version would be a new addition to the corpora.
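The matching script is not shown. One simple way to produce the -1 convention, sketched in Python 3 with a hypothetical function name and an example corpus built from version strings in Figure 1, is to look each operating-system version up in the team's list and return its position, or -1 when it is absent:

```python
def match_position(version, corpus_versions):
    """Return the index of version in corpus_versions, or -1 if it is absent.

    -1 marks a version not yet in the corpora, i.e. a candidate new addition.
    """
    try:
        return corpus_versions.index(version)
    except ValueError:
        return -1


corpus = ["Redhat 7.2", "Redhat 7.3", "Slackware 10"]  # hypothetical corpus
purchased = ["Redhat 7.3", "FreeBSD 4.6", "Slackware 9.1"]
for version in purchased:
    print(version, match_position(version, corpus))
# Redhat 7.3 1
# FreeBSD 4.6 -1
# Slackware 9.1 -1
```

For a large corpus, a dict mapping each version to its position would give constant-time lookups instead of scanning the list.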
Acknowledgments
Dr. Neil Rowe, Alison Kerr, Cassandra Martin, Professor
Joe Welch, Pat McNeill, Andy Newton, and Kelly
Locke.
This internship was funded by a Title V Strengthening
Transfer Pathways Grant.
For further information
Brayan Hernandez
brayanhernandez@student.hartnell.edu
Neil Rowe, Ph.D.
ncrowe@nps.edu
Figure 1
Search_engine_details.sql:
INSERT INTO [search_engine_details] ([MD5], [Name], [File_Ext], [Description], [Last_Accessed], [File_Created], [Last_Written], [Full_Path], [Quick_Category], [File_Notes], [Major], [Minor], [Operating_System], [Manufacturer], [Version], [Inside_Compressed_Files], [Processor_Bits], [Record_Date], [Is_Deleted], [key_field], [website], [Geographic_Location], [Extraneous], [Log], [Graphic]) VALUES (N'0003d54ef57f72dc6316a2fbc30ae4bf', N'ConfigFile.pyo', N'pyo', N'File', '2002-03-13 18:42:54', null, '2002-03-13 18:42:54', N'/usr/lib/python1.5/site-packages/mx/Misc/ConfigFile.pyo', N'Redhat 7.3 (32bit)', N'Redhat 7.3 i386', N'Operating Systems', N'Installation', N'Linux', N'Redhat Incorporated', N'Redhat 7.3', N'No', N'32bit', '2010-03-07', N'No', N'13079373', N'www.Redhat.com', N'North America', N'No', null, N'Linux');
INSERT INTO [search_engine_details] ([MD5], [Name], [File_Ext], [Description], [Last_Accessed], [File_Created], [Last_Written], [Full_Path], [Quick_Category], [File_Notes], [Major], [Minor], [Operating_System], [Manufacturer], [Version], [Inside_Compressed_Files], [Processor_Bits], [Record_Date], [Is_Deleted], [key_field], [website], [Geographic_Location], [Extraneous], [Log], [Graphic]) VALUES (N'0003dae769bbca07137e18e171346e34', N'mimetypes.pyc', N'pyc', N'File', '1998-09-03 21:51:56', null, '1998-09-03 21:51:56', N'/lib/python1.5/mimetypes.pyc', N'Redhat 5.2 (32bit)', N'Redhat 5.2 i386', N'Operating Systems', N'Installation', N'Linux', N'Redhat Incorporated', N'Redhat 5.2', N'No', N'32bit', '2010-03-07', N'No', N'13079374', N'www.Redhat.com', N'North America', N'No', null, N'Linux');
INSERT INTO [search_engine_details] ([MD5], [Name], [File_Ext], [Description], [Last_Accessed], [File_Created], [Last_Written], [Full_Path], [Quick_Category], [File_Notes], [Major], [Minor], [Operating_System], [Manufacturer], [Version], [Inside_Compressed_Files], [Processor_Bits], [Record_Date], [Is_Deleted], [key_field], [website], [Geographic_Location], [Extraneous], [Log], [Graphic]) VALUES (N'0003dc4db7143f19e6b9ff33290fe180', N'msgCore.h', N'h', N'File', '2001-09-01 11:46:04', null, '2001-09-01 11:46:04', N'/usr/include/mozilla/msgCore.h', N'Redhat 7.2 (32bit)', N'Redhat 7.2 i386', N'Operating Systems', N'Installation', N'Linux', N'Redhat Incorporated', N'Redhat 7.2', N'No', N'32bit', '2010-03-07', N'No', N'13079375', N'www.Redhat.com', N'North America', N'No', null, N'Linux');
INSERT INTO [search_engine_details] ([MD5], [Name], [File_Ext], [Description], [Last_Accessed], [File_Created], [Last_Written], [Full_Path], [Quick_Category], [File_Notes], [Major], [Minor], [Operating_System], [Manufacturer], [Version], [Inside_Compressed_Files], [Processor_Bits], [Record_Date], [Is_Deleted], [key_field], [website], [Geographic_Location], [Extraneous], [Log], [Graphic]) VALUES (N'0003eb513019db8b5b9a77ec08fca298', N'mpzpi.py', N'py', N'File', '2004-07-22 04:23:11', null, '1996-11-27 13:47:00', N'/mnt/usr/local/share/examples/python2.2/scripts/mpzpi.py', N'FreeBSD 4.6 (32bit)', N'FreeBSD 4.6 i386', N'Operating Systems', N'Installation', N'Unix-Like (BSD)', N'FreeBSD Foundation', N'FreeBSD 4.6', N'No', N'32bit', '2010-03-07', N'No', N'13079376', N'www.Freebsd.org', N'North America', N'No', null, N'BSD');
INSERT INTO [search_engine_details] ([MD5], [Name], [File_Ext], [Description], [Last_Accessed], [File_Created], [Last_Written], [Full_Path], [Quick_Category], [File_Notes], [Major], [Minor], [Operating_System], [Manufacturer], [Version], [Inside_Compressed_Files], [Processor_Bits], [Record_Date], [Is_Deleted], [key_field], [website], [Geographic_Location], [Extraneous], [Log], [Graphic]) VALUES (N'0003eb513019db8b5b9a77ec08fca298', N'mpzpi.py', N'py', N'File', '2004-07-22 16:06:29', null, '1996-11-27 13:47:00', N'/usr/doc/python-2.3.4/Demo/scripts/mpzpi.py', N'Slackware 10 (32bit)', N'Slackware 10 i386', N'Operating Systems', N'Installation', N'Linux', N'Slackware Linux Incorporated', N'Slackware 10', N'No', N'32bit', '2010-03-07', N'No', N'13079377', N'www.Slackware.com', N'North America', N'No', null, N'Linux');
INSERT INTO [search_engine_details] ([MD5], [Name], [File_Ext], [Description], [Last_Accessed], [File_Created], [Last_Written], [Full_Path], [Quick_Category], [File_Notes], [Major], [Minor], [Operating_System], [Manufacturer], [Version], [Inside_Compressed_Files], [Processor_Bits], [Record_Date], [Is_Deleted], [key_field], [website], [Geographic_Location], [Extraneous], [Log], [Graphic]) VALUES (N'0003eb513019db8b5b9a77ec08fca298', N'mpzpi.py', N'py', N'File', '2004-07-23 11:53:36', null, '1996-11-27 13:47:00', N'/usr/doc/python-2.3.1/Demo/scripts/mpzpi.py', N'Slackware 9.1 (32bit)', N'Slackware 9.1 i386', N'Operating Systems', N'Installation', N'Linux', N'Slackware Linux Incorporated', N'Slackware 9.1', N'No', N'32bit', '2010-03-07', N'No', N'13079378', N'www.Slackware.com', N'North America', N'No', null, N'Linux');
Search_engine_primary.sql:
INSERT INTO [search_engine_primary] ([MD5], [SHA_1], [SHA_256], [Fuzzy_Hash], [Fuzzy_Block], [Header_HEX], [128_Bytes_ASCII], [Signature], [Logical_Size], [Extraneous], [Encrypted], [NSRL], [key_field]) VALUES (N'00000040f69913a27ff7401b8bf3cfd1', N'D84270022E57F1850C8464FA432ADFF99588157B', N'A204F44B0DD4B615AA41A7D6D09EF3C637283053DAC7EB1C3459E7793D888D21', N'2jSAftUOuOU+JnyjSBTPlbXJXQyjVr/PkyP1b3+0voC8CKZSvqyS43XQjKXv9/zA:OSUwO4KCq0a+R6cKZz1s2oXGSbR', N'24', N'3c3f786d6c2076657273696f6e3d22312e3022203f3e0a3c21444f4354595045', N'<?xml version="1.0" ?> <!DOCTYPE part PUBLIC "-//KDE//DTD DocBook XML V4.1-Based Variant V1.0//EN" "dtd/kdex.dtd" [ <!ENTITY kio', N'XML Document (Example Signature) - Header [3C] [3F] [78] [6D] [6C] ', N'2225', N'No', N'No', N'No', N'13078775');
INSERT INTO [search_engine_primary] ([MD5], [SHA_1], [SHA_256], [Fuzzy_Hash], [Fuzzy_Block], [Header_HEX], [128_Bytes_ASCII], [Signature], [Logical_Size], [Extraneous], [Encrypted], [NSRL], [key_field]) VALUES (N'0000011E03A0F3C00B6E76BAC8EED431', N'B3908B6A87CD94008FA36AE6EEB833FD99926486', N'2CE100E4D575F95FB9B40D9483F365683A9EDDCD7D1FB74C93F46169F485AE16', N'Lb9m59NHrRJNWQ/t0quJBpI6B9nAKLhxHnW0DI6+vB4+QzwANW47HnWtl0xxtl:n9iLbNX/tho3ntw56FBYlSxt', null, N'67706d2d7072696d6172792d3030302e706e6700000000000000000000000000', N'gpm-primary-000.png', null, N'60', N'No', N'No', N'No', N'28018913');
INSERT INTO [search_engine_primary] ([MD5], [SHA_1], [SHA_256], [Fuzzy_Hash], [Fuzzy_Block], [Header_HEX], [128_Bytes_ASCII], [Signature], [Logical_Size], [Extraneous], [Encrypted], [NSRL], [key_field]) VALUES (N'000003943324c3345cf1a21337b79533', N'F304E40A3E01F0A1DF3F17A80DC410E864A8947F', N'E5D7EB05AE079E5F4861EA0677C9B29AE20CEAC4754CD972EE555483413B5616', null, null, N'3c6120687265663d225061636b616765536c69646553686f772e68746d6c223e', N'<a href="PackageSlideShow.html">PackageSlideShow</a> ', N'Unknown', N'53', N'No', N'No', N'No', N'24313937');
INSERT INTO [search_engine_primary] ([MD5], [SHA_1], [SHA_256], [Fuzzy_Hash], [Fuzzy_Block], [Header_HEX], [128_Bytes_ASCII], [Signature], [Logical_Size], [Extraneous], [Encrypted], [NSRL], [key_field]) VALUES (N'00000448C638A2AE7C4CB08960A627D8', N'5F6BFB2C247FA8E6F0114ACD837CC6EEE9EA5557', N'E7A84C30A4C58C8E38F28EDF1CC01E3DE4322C8D4A93792C135348AD9A5CCF0F', N'RgjcQ9lplL+eVeHdlXlcAxW7R5xtKl7QwHPeAgOcZhAvt5kUWm3:O/9jZ+eVeHdtlSDxtQ71PbUZat5nW2', N'192', N'cafebabe0000003001df0a013501360701370701380a000301390a0003013a0a', N'0 5678 9 : 9; 95<5=5>5?5@5A5B5C5D5E5FB5G5HI5JK5L5M5NO PQ P5RS 5T "U5VW &XY (Z5[\ +95]^ .95_` 195a 5bc 59d 7P 5e 5f 5ghi <jk ?l 7', N'Java Class Library - Header [CA] [FE] [BA] [BE] ', N'13526', N'No', N'No', N'No', N'31426793');
INSERT INTO [search_engine_primary] ([MD5], [SHA_1], [SHA_256], [Fuzzy_Hash], [Fuzzy_Block], [Header_HEX], [128_Bytes_ASCII], [Signature], [Logical_Size], [Extraneous], [Encrypted], [NSRL], [key_field]) VALUES (N'00000965bef00c92a18b2b31e75d702c', N'26EEEB25D7005F9FF9EE08A8084C77242702FBAD', N'7481A702F0E0DEC4C70C3A714D68D152CC490AF5AC6A26AF90960CBF5F159561', N'oM5b0mfOv3KDIW5sWLDtHLj6TiSFjw+WrQz26zGMyI:PbsOTLBX0jjWrW9qM', N'48', N'feff002200540069006d0065002200200009003d002000220054006900640022', N'"Time" = "Tid"; "Percentage" = "Prosent"; "Icon Only"= "Kun symboler"; "Show" = "Vis"; "Open Energy Saver..." = "pne Strmsparing', null, N'2142', N'No', N'No', N'No', N'13078777');
INSERT INTO [search_engine_primary] ([MD5], [SHA_1], [SHA_256], [Fuzzy_Hash], [Fuzzy_Block], [Header_HEX], [128_Bytes_ASCII], [Signature], [Logical_Size], [Extraneous], [Encrypted], [NSRL], [key_field]) VALUES (N'00000f2907cf8a806bcc3d6dc7699d4c', N'53D2E11C60EF1E9DDB3C04DD0A983983F87A9660', N'7AA01D480D2CE95C555440D6ACFC76532667D0400C50D173D59BCC82E0981F44', N'2dluSBuT3YUv+nVgZehwpv+RgA0ms2v+qzLvgA0msRdtBMhehx2G:cluSuk4+nVgZP+RgAA2+qzrgAARd/yet', N'24', N'3c3f786d6c2076657273696f6e3d22312e302220656e636f64696e673d225554', N'<?xml version="1.0" encoding="UTF-8"?> <assembly xmlns="urn:schemas-microsoft-com:asm.v3" manifestVersion="1.0" description="$(r', N'XML Document (Example Signature) - Header [3C] [3F] [78] [6D] [6C] ', N'1517', N'No', N'No', N'No', N'26797511');
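The records in Figure 1 are SQL INSERT statements, one per line. As a hedged sketch, not the project's own parser, the hash columns can be pulled out of such a line with Python's re module. It assumes one row per statement and that embedded quotes are escaped as '' as in standard SQL; the example row is abridged from the second search_engine_primary record above (SHA_256 and Fuzzy_Hash omitted).

```python
import re

# One SQL value: a string written N'...' or '...' (with '' standing for an
# embedded quote), or the bare keyword null.
VALUE_RE = re.compile(r"N?'((?:[^']|'')*)'|(null)")
COLUMN_RE = re.compile(r"\[([^\]]+)\]")


def parse_insert(line):
    """Map column names to values for one single-row INSERT statement."""
    head, _, tail = line.partition(" VALUES ")
    columns = COLUMN_RE.findall(head)[1:]  # first bracketed name is the table
    values = [
        None if m.group(2) else m.group(1).replace("''", "'")
        for m in VALUE_RE.finditer(tail)
    ]
    return dict(zip(columns, values))


row = ("INSERT INTO [search_engine_primary] ([MD5], [SHA_1], [Fuzzy_Block]) "
       "VALUES (N'0000011E03A0F3C00B6E76BAC8EED431', "
       "N'B3908B6A87CD94008FA36AE6EEB833FD99926486', null);")
record = parse_insert(row)
print(record["MD5"].lower(), record["SHA_1"])
# 0000011e03a0f3c00b6e76bac8eed431 B3908B6A87CD94008FA36AE6EEB833FD99926486
```

Lower-casing the MD5 on the way out matters because, as the rows above show, the vendor data mixes upper- and lower-case hashes.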