Sri Siri Vineela Kukkadapu

The 15th Annual Network and Distributed System Security Symposium
Automatic Protocol Format Reverse
Engineering through Context-Aware
Monitored Execution
Zhiqiang Lin¹, Xuxian Jiang², Dongyan Xu¹, Xiangyu Zhang¹
¹Purdue University
²George Mason University
February 12th, 2008
Motivation
• Protocol reverse engineering
  - A process to recover protocol specifications
  - E.g., fields and their relationships
• Applications:
  - Network-based intrusion detection: DoS attacks, port scans, computer systems
  - Network management: correctly recognize and monitor traffic
  - Fuzz testing: a software testing technique
  - …
Challenges
0x0040: cd46 4745 5420 2f6e 6577 732e 6874 6d6c
0x0050: 2048 5454 502f 312e 300d 0a55 7365 722d
0x0060: 4167 656e 743a 2057 6765 742f 312e 3130
0x0070: 2e32 2028 5265 6420 4861 7420 6d6f 6469
0x0080: 6669 6564 290d 0a41 6363 6570 743a 202a
0x0090: 2f2a 0d0a 486f 7374 3a20 3132 392e 3137
0x00a0: 342e 3838 2e37 310d 0a43 6f6e 6e65 6374
0x00b0: 696f 6e3a 204b 6565 702d 416c 6976 650d
0x00c0: 0a0d 0a
[Figure: the packet dump above, annotated with the labels Hierarchical, Parallel and Sequential]
• Multiple fields in a single message
• Non-static size of fields
• Complex relationships among protocol fields
Challenges
HTTP-Request = Request-Line
               (( general-header | request-header | entity-header ) CRLF)*
               CRLF
               [ message-body ]
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
A BNF Specification of HTTP Request (RFC 2616)
[Figure annotations: the Request-Line rule is labeled Hierarchical, the header alternatives Parallel, and the ordering of the lines Sequential]
• Hierarchical relation: a field can be further divided into multiple sub-fields.
• Sequential relation: captures the ordering between adjacent fields in a protocol.
• Parallel relation: the positions of two or more fields are exchangeable in the protocol specification.
Note: SP and CRLF are separators.
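The three relations can be made concrete with a small hand-written split of an HTTP request. This is only an illustrative sketch, not part of AutoFormat; the function name `split_request` and the dictionary layout are assumptions made for this example.

```python
def split_request(raw: bytes):
    """Hand-written split of an HTTP request that mirrors the BNF above."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    # Hierarchical: Request-Line is one field that divides into three sub-fields.
    method, uri, version = lines[0].split(b" ", 2)
    # Parallel: header lines may appear in any order, so keep them in a dict.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b": ")
        headers[name.lower()] = value
    # Sequential: Request-Line, then headers, then the body, in this fixed order.
    return {
        "request_line": {"method": method, "uri": uri, "version": version},
        "headers": headers,
        "body": body,
    }

# Two requests that differ only in header order describe the same message.
a = split_request(b"GET /news.html HTTP/1.0\r\nHost: 129.174.88.71\r\nAccept: */*\r\n\r\n")
b = split_request(b"GET /news.html HTTP/1.0\r\nAccept: */*\r\nHost: 129.174.88.71\r\n\r\n")
```

Swapping the two header lines leaves the result unchanged, which is what the parallel relation expresses; swapping Method and Request-URI would not.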
Related Work
• Network Trace
  - Protocol Informatics
  - Discoverer [W. Cui et al., Security'07]
• Binary Analysis
  - Polyglot [J. Caballero et al., CCS'07]
  - Automatic Network Protocol Analysis [G. Wondracek et al., NDSS'08]
Observation
int read_header(int sid) {
    ...
    sgets(line, sizeof(line)-1, conn[sid].socket);
    ...
    /* The REQUEST LINE field is divided into METHOD, REQUEST URI and HTTP VERSION */
    if (sscanf(line, "%[^ ] %[^ ] %[^ ]", conn[sid].dat->in_RequestMethod,
               conn[sid].dat->in_RequestURI, conn[sid].dat->in_Protocol)!=3)
    ...
    while (strlen(line)>0) {
        ...
        /* Cookie, Host and User-Agent are parallel fields */
        if (strncasecmp(line, "Cookie: ", 8)==0)
            strncpy(conn[sid].dat->in_Cookie, (char *)&line+8,
                    sizeof(conn[sid].dat->in_Cookie)-1);
        if (strncasecmp(line, "Host: ", 6)==0)
            strncpy(conn[sid].dat->in_Host, (char *)&line+6,
                    sizeof(conn[sid].dat->in_Host)-1);
        ...
        if (strncasecmp(line, "User-Agent: ", 12)==0)
            strncpy(conn[sid].dat->in_UserAgent, (char *)&line+12,
                    sizeof(conn[sid].dat->in_UserAgent)-1);
    }
    ...
}
Code snippet in http.c (null-httpd-0.5.0)
AutoFormat -- Basic Idea
[Figure: the input bytes "G E T" and "/ n e w s…" are protocol fields. Bytes handled in one execution context form one field; bytes handled in another context form another field.]
System Overview
Input (GET /news.html) -> Context-aware Execution Monitor -> Log

Log columns: input offset, byte, call stack, EIP
0 'G' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667
->apr_brigade_split_line->memchr    EIP: 0x4BA56A2
1 'E' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667
->apr_brigade_split_line->memchr    EIP: 0x4BA56A2
2 'T' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667
->apr_brigade_split_line->memchr    EIP: 0x4BA56A2
…
24 '\n' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667
->apr_brigade_split_line->memchr    EIP: 0x4BA56A2
…
0 'G' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_getword_white    EIP: 0x1F7F3
Protocol Field Identifier
• Analyze the log file
  - Step 1: build the protocol field tree from the logged data
  - Step 2: refine the tree using three heuristics
  - Step 3: output the result
Example: Apache log data
Log columns: input offset, byte, call stack, EIP
0 'G' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667
->apr_brigade_split_line->memchr    EIP: 0x4BA56A2
1 'E' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667
->apr_brigade_split_line->memchr    EIP: 0x4BA56A2
2 'T' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667
->apr_brigade_split_line->memchr    EIP: 0x4BA56A2
…
24 '\n' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667
->apr_brigade_split_line->memchr    EIP: 0x4BA56A2
=> field: GET /news.html HTTP/1.0\r\n
…
24 '\n' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_rgetline_core    EIP: 0x26187
=> field: \n
23 '\r' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_rgetline_core    EIP: 0x26322
=> field: \r
0 'G' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_getword_white    EIP: 0x1F7F3
1 'E' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_getword_white    EIP: 0x1F7F3
2 'T' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection
->0xF5A8->ap_read_request->ap_getword_white    EIP: 0x1F7F3
=> field: GET
…
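The grouping suggested by the Apache log can be sketched in a few lines of Python. This is a minimal illustration, not AutoFormat's implementation: it assumes that consecutive log records sharing the same call stack and EIP belong to one candidate field, and it uses shortened, hypothetical stand-ins (`SCAN`, `LINE`, `WORD`) for the full call stacks.

```python
from itertools import groupby

MESSAGE = b"GET /news.html HTTP/1.0\r\n"

# Shortened, hypothetical stand-ins for the full call stacks in the log.
SCAN = "ap_read_request->ap_rgetline_core->...->memchr"
LINE = "ap_read_request->ap_rgetline_core"
WORD = "ap_read_request->ap_getword_white"

# (offset, call stack, EIP) records in the order they were logged.
LOG = ([(i, SCAN, 0x4BA56A2) for i in range(len(MESSAGE))]
       + [(24, LINE, 0x26187), (23, LINE, 0x26322)]
       + [(i, WORD, 0x1F7F3) for i in range(3)])

def candidate_fields(records):
    """Group consecutive records with the same (call stack, EIP) context."""
    fields = []
    for _, run in groupby(records, key=lambda rec: (rec[1], rec[2])):
        fields.append(sorted(offset for offset, _, _ in run))
    return fields

FIELDS = candidate_fields(LOG)
# Map each group of offsets back to the bytes it covers.
TEXTS = [bytes(MESSAGE[o] for o in offsets) for offsets in FIELDS]
```

With these records the groups cover the whole request line, then `\n`, then `\r`, then `GET`, in the order the contexts appear in the log.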
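Step 1: Building Protocol Field Tree
• The root contains the offsets of all input data.
• A parent node contains the offsets of its children.
[Figure: protocol field tree. Below the root are nodes such as "GET /news.html HTTP/1.0\r\n" and "User-Agent: Wget/1.10.2 (Red Hat modified)\r\nAccept: */*\r\n….", with lower-level nodes such as "GET", "/news.html" and "HTTP/1.0".]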
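One way to realize the "parent contains the offsets of its children" property is to nest byte ranges by containment. The sketch below is an assumption about how such a tree could be built, not the authors' algorithm; `Node`, `build_tree` and the inclusive `(start, end)` ranges are hypothetical, and the ranges are assumed to nest rather than partially overlap.

```python
class Node:
    def __init__(self, start, end):
        self.start, self.end = start, end   # inclusive byte offsets
        self.children = []

def build_tree(message_len, ranges):
    """Nest offset ranges by containment under a root covering all input."""
    root = Node(0, message_len - 1)
    # Sort so that a containing range is always placed before what it contains.
    for start, end in sorted(ranges, key=lambda r: (r[0], -r[1])):
        parent = root
        while True:
            inner = next((c for c in parent.children
                          if c.start <= start and end <= c.end), None)
            if inner is None:
                break
            parent = inner          # descend into the child that contains the range
        parent.children.append(Node(start, end))
    return root

MESSAGE = b"GET /news.html HTTP/1.0\r\n"
# Hypothetical candidate fields, given as byte ranges of MESSAGE.
RANGES = [(0, 2), (4, 13), (0, 13), (15, 22), (23, 23), (24, 24)]
ROOT = build_tree(len(MESSAGE), RANGES)
TOP = [(c.start, c.end) for c in ROOT.children]
NESTED = [(c.start, c.end) for c in ROOT.children[0].children]
```

Because containment is tested non-strictly, a candidate that repeats an existing range ends up as that node's only child, which is the kind of redundancy the refinement step later removes.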
Step 1: Building Protocol Field Tree
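Problems in the initial tree:
• Missing SPACE before "/n"
• Redundancy in fields
• Overly fine-grained fields
[Figure: initial tree for "GET /news.html HTTP/1.0\r\n". The full request line appears in more than one node; other nodes include "GET /news.html", "GET" and "HTTP/1.0"; the leaves include "/", "news.html", "H", "TTP/1.0", "\r" and "\n", some of them repeated.]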
Step 2: Refinement (Tokenization)
Merge two child nodes if their contents can form one token (this heuristic is based on text-based protocols).
[Figure: the tree for "GET /news.html HTTP/1.0\r\n" before and after tokenization. Before: leaves such as "/", "news.html", "H", "TTP/1.0", "\r" and "\n". After: the merged nodes "/news.html", "HTTP/1.0" and "\r\n", next to "GET" and "GET /news.html".]
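The merge rule can be sketched as follows. The slide does not define "token", so this example assumes a hypothetical rule: two sibling ranges are merged when they are contiguous and their combined bytes either contain no delimiter or consist only of delimiters. `DELIMS`, `one_token` and `tokenize` are names invented for this sketch.

```python
DELIMS = b" \t\r\n"   # assumed delimiter bytes for a text-based protocol

def one_token(data: bytes) -> bool:
    """True if the bytes are all delimiters or contain no delimiter at all."""
    flags = [byte in DELIMS for byte in data]
    return all(flags) or not any(flags)

def tokenize(message: bytes, leaves):
    """Merge adjacent sibling ranges whose combined bytes form a single token."""
    merged = []
    for start, end in sorted(leaves):
        if merged:
            prev_start, prev_end = merged[-1]
            contiguous = prev_end + 1 == start
            if contiguous and one_token(message[prev_start:end + 1]):
                merged[-1] = (prev_start, end)   # extend the previous range
                continue
        merged.append((start, end))
    return merged

MESSAGE = b"GET /news.html HTTP/1.0\r\n"
# Overly fine-grained leaves: "GET", "/", "news.html", "H", "TTP/1.0", "\r", "\n".
LEAVES = [(0, 2), (4, 4), (5, 13), (15, 15), (16, 22), (23, 23), (24, 24)]
TOKENS = [MESSAGE[s:e + 1] for s, e in tokenize(MESSAGE, LEAVES)]
```

"GET" and "/news.html" stay separate because the space between them is not covered by any leaf, so their ranges are not contiguous.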
Step 2: Refinement (Redundant Node Deletion)
An internal node is redundant if it has only one child.
[Figure: the tree before and after redundant node deletion. Before: "GET /news.html HTTP/1.0\r\n" appears in more than one node, and "/news.html" and "HTTP/1.0" are repeated. After: the nodes "GET /news.html HTTP/1.0\r\n", "GET /news.html", "GET", "/news.html", "HTTP/1.0" and "\r\n".]
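A possible reading of this rule is sketched below. The slide only says that a single-child internal node is redundant; this example additionally assumes that the node is removed only when its one child covers exactly the same bytes, so that no part of the message is lost. `Node` and `prune` are hypothetical names.

```python
class Node:
    def __init__(self, start, end, children=()):
        self.start, self.end = start, end   # inclusive byte offsets
        self.children = list(children)

def prune(node):
    """Replace an internal node by its only child when both cover the same bytes."""
    node.children = [prune(child) for child in node.children]
    if len(node.children) == 1:
        only = node.children[0]
        if (only.start, only.end) == (node.start, node.end):
            return only
    return node

# The request line (offsets 0-24) wrapped in two redundant copies of itself.
line = Node(0, 24, [Node(0, 13), Node(15, 22), Node(23, 24)])
chain = Node(0, 24, [Node(0, 24, [line])])
tree = prune(chain)
```

After pruning, the chain of three identical nodes collapses to the innermost one, which keeps its three children.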
Step 2: Refinement (Node Insertion)
Insert a new child node into a parent if the offsets of its children do not match the parent's offsets.
[Figure: the tree before and after node insertion. Nodes shown: "GET /news.html HTTP/1.0\r\n", with the children "GET /news.html" (itself divided into "GET" and "/news.html"), "HTTP/1.0" and "\r\n".]
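The insertion rule can be sketched as gap filling. This is an interpretation, not the authors' code: it assumes that "do not match" means some of the parent's offsets are not covered by any child (for instance the space noted as missing in Step 1), and it adds one new child per uncovered range. `Node` and `fill_gaps` are hypothetical names.

```python
class Node:
    def __init__(self, start, end, children=()):
        self.start, self.end = start, end   # inclusive byte offsets
        self.children = list(children)

def fill_gaps(node):
    """Add a child for every byte range of the parent that no child covers."""
    for child in node.children:
        fill_gaps(child)
    if not node.children:
        return                      # a leaf has nothing to reconcile
    filled, cursor = [], node.start
    for child in sorted(node.children, key=lambda c: c.start):
        if child.start > cursor:    # uncovered bytes before this child
            filled.append(Node(cursor, child.start - 1))
        filled.append(child)
        cursor = max(cursor, child.end + 1)
    if cursor <= node.end:          # uncovered bytes after the last child
        filled.append(Node(cursor, node.end))
    node.children = filled

MESSAGE = b"GET /news.html HTTP/1.0\r\n"
root = Node(0, 24, [Node(0, 13, [Node(0, 2), Node(4, 13)]),
                    Node(15, 22), Node(23, 24)])
fill_gaps(root)
TOP = [MESSAGE[c.start:c.end + 1] for c in root.children]
INNER = [MESSAGE[c.start:c.end + 1] for c in root.children[0].children]
```

In this example the two inserted nodes are the single spaces at offsets 3 and 14.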
Step 3: Output the Result
[Figure: the refined tree with numbered nodes (4: "GET /news.html HTTP/1.0\r\n", 3: "GET /news.html") and the leaves "GET", "/news.html", "HTTP/1.0" and "\r\n", annotated "Hierarchical" and "Parallel & Sequential".]
• Parallel:
  - Collect the execution history of each node.
  - For a parent, if its child nodes share similar history, mark it.
• Sequential:
  - Pre-order traversal of the tree
  - Lists the leaf nodes and the parents of multiple parallel nodes
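The two output rules can be sketched together. This is a simplified illustration under stated assumptions: the execution history of a node is reduced to a single hypothetical string, "similar" is taken to mean "identical", and the handler names (`header_loop`, `parse_uri`, and so on) are invented for the example.

```python
class Node:
    def __init__(self, text, history, children=()):
        self.text = text
        self.history = history      # hypothetical summary of the code that handled the node
        self.children = list(children)
        self.parallel = False

def mark_parallel(node):
    """Mark a parent whose children (two or more) all share one execution history."""
    for child in node.children:
        mark_parallel(child)
    histories = {child.history for child in node.children}
    node.parallel = len(node.children) > 1 and len(histories) == 1

def sequence(node):
    """Pre-order walk listing leaves and parents of parallel nodes."""
    if not node.children or node.parallel:
        return [node.text]
    ordered = []
    for child in node.children:
        ordered += sequence(child)
    return ordered

request_line = Node("Request-Line", "read_request_line", [
    Node("GET", "parse_method"),
    Node("/news.html", "parse_uri"),
    Node("HTTP/1.0", "parse_version"),
])
headers = Node("headers", "read_headers", [
    Node("User-Agent: Wget/1.10.2", "header_loop"),
    Node("Accept: */*", "header_loop"),
    Node("Host: 129.174.88.71", "header_loop"),
])
root = Node("HTTP-Request", "read_request", [request_line, headers])
mark_parallel(root)
ORDER = sequence(root)
```

The header lines share one history, so their parent is marked parallel and is listed as a single unit in the sequential order; the three parts of the request line have different histories and are listed one by one.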
Evaluation
• Implemented on top of Valgrind-3.2.3 (for the context-aware execution monitor)
  - Also applies to QEMU, PIN
• Benchmark
  - 30 messages from six known protocols and one unknown protocol
• Evaluation metric
  - Re: ratio of exact match, Re = |A ∩ W| / |W|
  - A: set of fields identified by AutoFormat
  - W: set of fields identified by Wireshark
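The metric is a plain set ratio and can be computed as below. The field sets are made-up byte ranges chosen only to exercise the formula, not data from the evaluation; `exact_match_ratio` is a hypothetical helper name.

```python
def exact_match_ratio(autoformat_fields, wireshark_fields):
    """Re = |A ∩ W| / |W| over fields given as (start, end) byte ranges."""
    a, w = set(autoformat_fields), set(wireshark_fields)
    if not w:
        return None   # undefined when the reference identifies no fields
    return len(a & w) / len(w)

# Hypothetical example: AutoFormat splits "\r\n" into two fields, Wireshark does not.
W = {(0, 2), (4, 13), (15, 22), (23, 24)}
A = {(0, 2), (4, 13), (15, 22), (23, 23), (24, 24)}
RE = exact_match_ratio(A, W)
```

Here three of Wireshark's four fields are matched exactly, so Re is 0.75; fields that AutoFormat reports but Wireshark does not have no effect on the ratio.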
Overall Result
[Table: per-protocol Re(F), Re(H) and Re(P) values, annotated "100% match with Wireshark"]
• Re(F): Re for finest-grained fields
• Re(H): Re for hierarchical fields
• Re(P): Re for parallel fields
• (-): |P| = 0 for Wireshark
Averages:
• Re(F) = 88.5%
• Re(H) = 98.0%
• Re(P) = 100.0%
• Re = 93.4%
Discussion
• Dynamic trace dependency: AutoFormat does not detect message formats that are not present in the execution trace.
• Byte granularity: AutoFormat does not detect protocol fields at the bit level.
• Protocol state machine: AutoFormat does not correlate multiple messages of the same protocol session.
• Obfuscated binaries: AutoFormat does not handle this type of input.
Conclusion
• The paper also includes the Slapper worm messages as part of a second set of experimental results.
• AutoFormat
  - A tool for automatic protocol format extraction
• Key insight
  - A protocol implementation is programmed to recognize the protocol format and usually contains protocol field-specific execution context. We can leverage such context to infer the hierarchical structure of protocol fields, and even obtain their BNF structures.
Q&A
Thank you
For more information:
{zlin, dxu, xyzhang}@cs.purdue.edu
xjiang@gmu.edu