Automated Mapping of Large Binary Objects
Ben Sangster
Roy Ragsdale
Greg Conti
http://www.loc.gov/loc/lcib/0611/images/map.jpg
The views expressed in this presentation are those of the authors and do not reflect the official policy or position of the United States Military Academy, the Department of the Army, the Department of Defense or the U.S. Government.
http://www.cdcr.ca.gov/News/Images/overcrowding/MuleCreek_071906v1.jpg
Motivation
Hex        Decimal      Description
0400-07FF  1024-2047    Screen memory
0800-9FFF  2048-40959   Basic ROM memory
8000-9FFF  32768-40959  Alternate: ROM plug-in area
A000-BFFF  40960-49151  ROM: Basic
A000-BFFF  40960-49151  Alternate: RAM
C000-CFFF  49152-53247  RAM memory, including alternate
D000-D02E  53248-53294  Video Chip (6566)
D400-D41C  54272-54300  Sound Chip (6581 SID)
D800-DBFF  55296-56319  Color nybble memory
DC00-DC0F  56320-56335  Interface chip 1, IRQ (6526 CIA)
DD00-DD0F  56576-56591  Interface chip 2, NMI (6526 CIA)
D000-DFFF  53248-57343  Alternate: Character set
E000-FFFF  57344-65535  ROM: Operating System
E000-FFFF  57344-65535  Alternate: RAM
FF81-FFF5  65409-65525  Jump Table
Goals
• Accurately identify regions within an arbitrary binary object
• Efficient algorithms
• Extensible framework
• Automated mapping process
• Automated process for generating test data
• Current State: BINMAP Utility
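The goal of an automated process for generating test data can be illustrated with a minimal sketch. It is not the BinMap generator: the function name build_test_object, the fragment labels, and the fragment sizes are all assumptions chosen for illustration. The idea is to concatenate fragments of known type and record a ground-truth map that a mapping tool's output can later be compared against.

```python
import random
import zlib


def build_test_object(seed=0):
    """Concatenate fragments of known type; return (blob, ground_truth_map).

    Hypothetical helper for illustration only, not part of BinMap.
    """
    rng = random.Random(seed)  # fixed seed keeps the object reproducible
    text = b"The quick brown fox jumps over the lazy dog. " * 100
    fragments = [
        ("ASCII text", text),
        ("repeating value (0x00)", bytes(2048)),
        ("compressed (zlib)", zlib.compress(text * 20)),
        ("pseudo-random", bytes(rng.randrange(256) for _ in range(4096))),
    ]
    blob = bytearray()
    truth = []  # (first_offset, last_offset, label), offsets inclusive
    for label, data in fragments:
        start = len(blob)
        blob += data
        truth.append((start, len(blob) - 1, label))
    return bytes(blob), truth


blob, truth = build_test_object()
for first, last, label in truth:
    # Print the ground truth as hex ranges with a label per region.
    print(f"{first:04X}-{last:04X}  {label}")
```

Because the layout is recorded as the object is built, any number of objects with different fragment orders and sizes can be produced and scored without manual labeling.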
[Figure: a ~12MB binary object, offsets 0 to ~12MB, shown first unlabeled and then labeled by region: ASCII Text, Data Structure, Compressed Image 1 through Compressed Image N, Unicode URLs, Data Structure]
[Figure: f(x) over the range 0 to N]
Partial Taxonomy
binary fragment
  high entropy
    compression: RLE, LZW, ...
    encryption: AES, DES, ... (ECB, CBC, ...)
  medium entropy
    machine, human
      data, uncompressed, code, language (EN, FR, RU, ...), structures, media
  low entropy
    repeating values
Goal
Hex        Decimal      Identified region
0400-07FF  1024-2047    ASCII Text (English)
0800-9FFF  2048-40959   Pointer Table
8000-9FFF  32768-40959  Variable Length Array
A000-BFFF  40960-49151  Compressed Data
A000-BFFF  40960-49151  Unicode (Basic Latin)
C000-CFFF  49152-53247  Unknown Region
D000-D02E  53248-53294  Repeating Value (0xFF)
D400-D41C  54272-54300  Encrypted Region (AES)
D800-DBFF  55296-56319  PNG Image
DC00-DC0F  56320-56335  JavaScript
DD00-DD0F  56576-56591  Encrypted Region (RSA Key?)
D000-DFFF  53248-57343  Unknown Region
E000-FFFF  57344-65535  BMP Image
E000-FFFF  57344-65535  Unicode (Hyperlinks?)
FF81-FFF5  65409-65525  Repeating Value (0x00)
f(x)
Fragment type 1  a1-a2
Fragment type 2  a3-a4
Fragment type N  a5-a6

                 Test 1  Test 2  Test 3  Test N
Fragment type 1  a1-a2   b1-b2   c1-c2   z1-z2
Fragment type 2  a3-a4   b3-b4   c3-c4   z3-z4
Fragment type N  a5-a6   b5-b6   c5-c6   z5-z6
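The matrix of tests and per-type ranges suggests a simple lookup: run every test on a fragment and report the fragment types whose ranges contain all of the results. The sketch below is one possible reading of that idea, not BinMap's code. The names TESTS, SIGNATURES, and classify are hypothetical, and the numeric ranges are placeholders standing in for the slide's symbolic a1-a2 style values; real ranges would have to be measured from test data.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    n = len(data)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def ones_fraction(data: bytes) -> float:
    """Hamming weight normalized to the fraction of bits set to 1."""
    if not data:
        return 0.0
    return sum(bin(b).count("1") for b in data) / (8 * len(data))


# Each test is one column of the matrix.
TESTS = {"entropy": shannon_entropy, "ones_fraction": ones_fraction}

# Each fragment type is one row. HYPOTHETICAL ranges, for illustration only.
SIGNATURES = {
    "repeating value (0x00)": {"entropy": (0.0, 0.1), "ones_fraction": (0.0, 0.01)},
    "repeating value (0xFF)": {"entropy": (0.0, 0.1), "ones_fraction": (0.99, 1.0)},
    "text-like": {"entropy": (3.0, 5.5), "ones_fraction": (0.3, 0.6)},
    "compressed or encrypted": {"entropy": (7.0, 8.0), "ones_fraction": (0.45, 0.55)},
}


def classify(fragment: bytes) -> list:
    """Return every fragment type whose ranges contain all test results."""
    results = {name: test(fragment) for name, test in TESTS.items()}
    return [
        ftype
        for ftype, ranges in SIGNATURES.items()
        if all(lo <= results[name] <= hi for name, (lo, hi) in ranges.items())
    ]
```

An empty result corresponds to an unknown region. More than one match means the tests in use cannot separate those types, which is the case for adding further tests (more columns).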
Shannon Entropy
Perl Random Number Sequence  a1-a2
AES Encrypted Word Document  a3-a4
ASCII Text Document          a5-a6
BMP (Single Color)           a7-a8
Shannon Entropy
Shannon entropy H(X) measures uncertainty and quantifies the information contained in a message.
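For a fragment treated as a sequence of bytes, the standard definition is H(X) = -sum over x of p(x) * log2 p(x), where p(x) is the relative frequency of byte value x. The result ranges from 0 bits per byte (one repeated value) to 8 bits per byte (all 256 values equally likely). The sketch below is a minimal byte-level implementation; the function name is an assumption, and whether BinMap computes entropy over bytes or some other symbol size is not stated in the slides.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    n = len(data)
    if n == 0:
        return 0.0
    counts = Counter(data)  # occurrences of each byte value
    # p * log2(1/p) is the same term as -p * log2(p), written to stay non-negative.
    return sum((c / n) * math.log2(n / c) for c in counts.values())
```

A single-color bitmap body scores near 0, plain text lands in the middle of the scale, and well-encrypted or well-compressed data approaches 8.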
Other Techniques
- Hamming Weight
- Index of Coincidence
- Mean / Standard Deviation
- Traditional pattern matching
- <Your ideas?>
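Three of the measures listed above are simple to compute over a byte fragment. The sketch below shows one common formulation of each; the function names and the byte-level granularity are assumptions, not taken from BinMap.

```python
import statistics
from collections import Counter
from typing import Tuple


def hamming_weight(data: bytes) -> float:
    """Fraction of bits set to 1 across the fragment (0.0 to 1.0)."""
    if not data:
        return 0.0
    ones = sum(bin(b).count("1") for b in data)
    return ones / (8 * len(data))


def index_of_coincidence(data: bytes) -> float:
    """Probability that two bytes drawn without replacement are equal."""
    n = len(data)
    if n < 2:
        return 0.0
    counts = Counter(data)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def mean_and_stdev(data: bytes) -> Tuple[float, float]:
    """Mean and population standard deviation of the byte values."""
    if not data:
        return 0.0, 0.0
    values = list(data)
    return statistics.fmean(values), statistics.pstdev(values)
```

Uniformly distributed bytes give a Hamming weight near 0.5, an index of coincidence near 1/256, and a mean near 127.5, so large departures from those values indicate structure in the fragment.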
http://en.wikipedia.org/wiki/Shannon_entropy
Window Size
(Shannon Entropy of AES sample, 4 charts)
(Shannon Entropy of 4 file types, 2 charts)
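Window size matters because entropy is computed per window: a window of w bytes holds at most w distinct values, so its byte entropy cannot exceed log2(w) when w is below 256. The sketch below slides a fixed-size window over a buffer and records the entropy at each offset. The function names, the default window of 256 bytes, and the non-overlapping default step are assumptions for illustration, not BinMap settings.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    n = len(data)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def windowed_entropy(data: bytes, window: int = 256, step: int = 0) -> list:
    """Return [(offset, entropy)] for each full window; step 0 means no overlap."""
    if window <= 0:
        raise ValueError("window must be positive")
    step = step or window
    return [
        (offset, shannon_entropy(data[offset:offset + window]))
        for offset in range(0, len(data) - window + 1, step)
    ]


# 1024 bytes in which every byte value appears equally often.
sample = bytes(range(256)) * 4
large = windowed_entropy(sample, window=256)
small = windowed_entropy(sample, window=16)
print(len(large), max(e for _, e in large))
print(len(small), max(e for _, e in small))
```

On this sample the 16-byte windows can never report more than 4 bits per byte even though the buffer as a whole uses all 256 values evenly, so thresholds chosen for one window size do not carry over to another.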
BinMap Demo
Extensibility
Example
[Chart: Entropy/Evaluating. Series: generic, compressed, encrypted. Vertical axis 3.5 to 8.5; horizontal axis 0 to 60000.]
Future Work
• Improve Framework
– Analyze performance
– Develop & improve plug-ins
• Improve Datasets
• Integrate with visualization, interaction and GUI
• Other identification measures
• Apply data mining techniques
• Increase size of taxonomy
Code repository: http://binmap.googlecode.com
0x3F 0x3F 0x3F
? ? ?