Yubin Xia, Yutao Liu, Haibo Chen, Binyu Zang in DSN 2012
A.C. Chen 2012/09/18 @ ADL
• Introduction
• Performance Monitoring Units (PMU)
• CFI Enforcement by CFIMon
• Implementation
• Experiment
• Performance
• Conclusion
• Many classes of security exploits introduce abnormal control flow transfers
– Code-injection attacks
– Code-reuse attacks
• return-into-libc (RILC)
• return-oriented programming (ROP)
• jump-oriented programming (JOP)
• Countermeasures
– non-executable stacks
– Stack-Guard
– safe C library
– heuristic means
– ….
– usually designed for a specific problem
• Control flow integrity (CFI) [Abadi et al.]
– statically rewrites a program + dynamic inlined guards
• Suffers from coverage problems
• Control flow locking [Tyler Bletsch et al.]
– recompiles a program
• difficult to apply to legacy applications
• Architectural support to validate or enforce control flow integrity [Shi et al.]
– requires redesigning existing processors
• Detect a set of attacks that cause abnormal control flow transfers --CFIMon
– without changes to existing hardware, source code or binaries
– leverage the hardware support for performance counters to monitor the control flow integrity (CFI)
Hardware support for performance monitoring
• perfmon
• Interrupt-based mode (basic mode)
– lacks precise instruction pointer information
• the reported IP may be up to tens of instructions away from the actual IP (instruction pointer) causing the event
• Precision mode
– improve the precision and flexibility of PMUs
– e.g. techniques used in Intel CPU:
• PEBS: Precise Event-Based Sampling
• BTS: Branch Trace Store
• LBR: Last Branch Record
• Event Filtering
• Conditional Counting
Precision Mode of Intel CPU
---Branch Trace Store (BTS) Mechanism
• Records all control transfers precisely into a predefined buffer
– jump, call, return, interrupt and exception
– also record the addresses of branch source and target
• Let a monitor get the trace in a batch
– an interrupt will be delivered when the buffer is nearly full
• Obtains all the branch information of a running application, which helps users locate vulnerabilities
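To illustrate how a monitor might consume such a batch, the sketch below decodes a raw buffer into (source, target) pairs. It assumes a 64-bit record layout of three 8-byte little-endian fields (branch-from address, branch-to address, flags). The function name `decode_bts_buffer` and the sample addresses are hypothetical and are not taken from CFIMon.

```python
import struct

# Assumed layout of one 64-bit BTS record: from, to, flags (8 bytes each).
BTS_RECORD = struct.Struct("<QQQ")

def decode_bts_buffer(buf: bytes):
    """Return a list of (source, target) address pairs from a raw buffer."""
    # Ignore a trailing partial record, if any.
    usable = len(buf) - (len(buf) % BTS_RECORD.size)
    return [(src, dst) for src, dst, _flags in BTS_RECORD.iter_unpack(buf[:usable])]

# Two made-up records, followed by 3 stray bytes that are skipped.
sample = (
    BTS_RECORD.pack(0x400510, 0x400600, 0)
    + BTS_RECORD.pack(0x400620, 0x400515, 0)
    + b"\x00" * 3
)
pairs = decode_bts_buffer(sample)
```

Each decoded pair can then be checked against the legal target sets, one batch at a time.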
• The CFI of an application can be maintained if we can
– get a legal set of branch target addresses for every branch
– check whether the target address of every branch is within the corresponding legal set at runtime
Branch Classification in X86 ISA
---Direct Branch & Its Target Address
• Direct Branch (safe branch) √
– Direct jump
• jnz c2ef0 <__write >
– Direct call
• callq 34df0 <abort >
• Since the code is read-only and cannot be modified at runtime, both direct jumps and direct calls are considered safe
Branch Classification in X86 ISA
---Indirect Branch & Its Target Address
• Indirect Branch (unsafe branch) √
– Indirect jump
• jmpq *%rdx
• it is not possible to obtain the whole target address set by static analysis alone, so dynamic training is used
– Indirect call
• callq *%rax
A call can only transfer control to the start of a function.
• its target address could be obtained by statically scanning the binary code of the application and the libraries it uses
– Return
• retq
In general, the target address of a return has to be the one next to a call
• its target address could also be obtained by scanning the binary code.
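The classification above can be sketched as a small helper that works on AT&T-syntax disassembly text, where an indirect target is written with a leading `*`. The function name `classify_branch` and its return labels are assumptions made for illustration and do not come from CFIMon.

```python
def classify_branch(insn: str) -> str:
    """Classify one AT&T-syntax instruction string, as printed by a disassembler."""
    parts = insn.split(None, 1)
    if not parts:
        return "not a branch"
    mnemonic = parts[0].lower()
    operand = parts[1].strip() if len(parts) > 1 else ""
    if mnemonic.startswith("ret"):
        return "return"
    # AT&T syntax marks indirect targets with a leading '*'.
    indirect = operand.startswith("*")
    if mnemonic.startswith("call"):
        return "indirect call" if indirect else "direct call"
    if mnemonic.startswith("j"):  # jmp, jmpq, and conditional jumps such as jnz
        return "indirect jump" if indirect else "direct jump"
    return "not a branch"

kinds = [classify_branch(s) for s in (
    "jnz c2ef0 <__write >",
    "callq 34df0 <abort >",
    "jmpq *%rdx",
    "callq *%rax",
    "retq",
)]
```

Only the three unsafe kinds (indirect jump, indirect call, return) need a legal target set.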
• Offline phase
– build a legal set of target addresses for each branch instruction
• Online phase
– diagnose possible attacks with legal sets following a number of rules
• determine the status of the branch as legal, illegal, or suspicious
Offline Analysis
--- obtain legal set: ret_set, call_set
• Scans the binary of application and dynamic libraries to get
– ret_set
• contains all addresses of the instructions next to each call
– call_set
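• contains all addresses of the first instruction of each function
[Figure: code example with function int add(int a, int b). In the caller, the statement following the call add(3,4), printf(“TEST!”), is labeled ret_set; the first instruction of add, printf(“1st inst.”), is labeled call_set.]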
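A minimal sketch of this scan, assuming the binary has already been disassembled into (address, size, text) tuples and that function start addresses are available from a symbol table. The helper name `build_static_sets`, the tuple format, and the listing are all made up for illustration.

```python
def build_static_sets(instructions, function_starts):
    """Build ret_set and call_set from a disassembly listing.

    instructions: iterable of (address, size_in_bytes, text), text non-empty.
    function_starts: iterable of function entry addresses.
    """
    ret_set = set()
    for addr, size, text in instructions:
        if text.split()[0].startswith("call"):
            # The instruction right after a call is a legal return target.
            ret_set.add(addr + size)
    call_set = set(function_starts)
    return ret_set, call_set

# Made-up listing: add() at 0x1000, its caller at 0x2000.
listing = [
    (0x1000, 1, "push %rbp"),
    (0x1001, 1, "retq"),
    (0x2000, 5, "callq 1000 <add>"),
    (0x2005, 5, "callq 3000 <printf>"),
    (0x200a, 1, "retq"),
]
ret_set, call_set = build_static_sets(listing, [0x1000, 0x2000])
```

Both sets are computed once offline, so the online check reduces to a set membership test.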
Offline Analysis
--- obtain legal set:train_set
• Use training runs to collect the branch trace (recorded by BTS) for each indirect jump and obtain its legal set:
– train_set
– there could be corner cases which are not covered
• considered as suspicious during online checking
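The training step can be sketched as grouping observed targets by branch source. This assumes the trace is available as (source, target) pairs and that the addresses of indirect jump instructions are already known. The name `build_train_set` and the addresses are hypothetical.

```python
from collections import defaultdict

def build_train_set(trace, indirect_jump_sources):
    """Map each indirect jump source to the set of targets seen during training."""
    train_set = defaultdict(set)
    for src, dst in trace:
        if src in indirect_jump_sources:
            train_set[src].add(dst)
    return dict(train_set)

# Made-up training trace; 0x4005 is the only indirect jump in it.
trace = [(0x4005, 0x4100), (0x4005, 0x4200), (0x5000, 0x6000), (0x4005, 0x4100)]
train_set = build_train_set(trace, {0x4005})
```

A target that never appears in training is simply absent from the set, which is why such corner cases end up as suspicious rather than illegal.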
[Figure: diagnosis flowchart for each <source,target> pair]
• Is <source> a direct branch?
– yes: legal
– no: switch into different cases based on <source>
• Consider the state of a branch depending on <target>
– <source> is return: check <target> against ret_set
– <source> is indirect call: check <target> against call_set
– <source> is indirect jump: check <target> against train_set
• Possible outcomes: legal, illegal, or suspicious, with an additional special-case check
• Suspicious branches are passed to the slide-window mechanism
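A simplified sketch of such a rule-based check follows. The exact edges of the flowchart and the special-case handling are not reproduced here. The mapping of set misses to "illegal" or "suspicious" is an assumption made for illustration, as are the function name `classify` and the string labels.

```python
def classify(kind, source, target, ret_set, call_set, train_set):
    """Return 'legal', 'illegal', or 'suspicious' for one branch (simplified rules)."""
    if kind == "direct":
        return "legal"  # code is read-only, so direct branches are safe
    if kind == "return":
        # Assumption: a return to an address that does not follow a call is illegal.
        return "legal" if target in ret_set else "illegal"
    if kind == "indirect call":
        # Assumption: a call to anything but a function start is illegal.
        return "legal" if target in call_set else "illegal"
    if kind == "indirect jump":
        # Targets not seen in training may be uncovered corner cases.
        return "legal" if target in train_set.get(source, set()) else "suspicious"
    raise ValueError(f"unknown branch kind: {kind}")

# Made-up sets for illustration.
ret_set, call_set = {0x2005}, {0x1000}
train_set = {0x4005: {0x4100}}
results = [
    classify("return", 0x1001, 0x2005, ret_set, call_set, train_set),
    classify("return", 0x1001, 0x7777, ret_set, call_set, train_set),
    classify("indirect call", 0x2000, 0x1000, ret_set, call_set, train_set),
    classify("indirect jump", 0x4005, 0x4999, ret_set, call_set, train_set),
]
```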
Slide-Window Mechanism
---For Suspicious Branches
• The diagnose module makes a flexible decision depending on the pattern of the branches
– maintain a window of the states of recent n branches
– apply a rule that tolerates at most m suspicious branches among the most recent n branches
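The window rule can be sketched with a bounded queue. The class name `SlideWindow` and the values of n and m used below are chosen for illustration only.

```python
from collections import deque

class SlideWindow:
    """Track the states of the most recent n branches, tolerating m suspicious ones."""

    def __init__(self, n, m):
        self.states = deque(maxlen=n)  # the oldest state drops out automatically
        self.m = m

    def observe(self, state):
        """Record one branch state; return True if the alarm threshold is exceeded."""
        self.states.append(state)
        suspicious = sum(s == "suspicious" for s in self.states)
        return suspicious > self.m

window = SlideWindow(n=20, m=3)
alarms = [window.observe("suspicious") for _ in range(4)]
```

With these values the first three suspicious branches are tolerated and the fourth raises the alarm. Isolated suspicious branches age out of the window and do not accumulate.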
• Debian-6 with kernel version 2.6.34
– 2GB 1066MHz main memory
– Intel Core i5 processor with 4 cores
• CFIMon is implemented on top of perf_events
– a unified kernel extension in Linux for user-level performance monitoring
• A kernel extension
– operate the performance samples
– monitor signals
– provide the interfaces to user-level tool
• A user-level tool with 2 modules
– diagnose module
• check the control flow integrity
• receives information from the OS to solve special cases such as signal handling
– control module
• initialize the environment
• launch and synchronize with an application
[Figure: CFIMon architecture, consisting of a user-level tool with 2 modules and a kernel extension]
• The user-level tool is the parent process of the application process, executed as a monitoring process
– use ptrace to synchronize with the application process
– runs security checks at critical points
• e.g. when the child process makes the exec system call
• Use several real-world applications as well as 2 demo programs to detect
– Code-Injection Attacks
– Return-to-libc Attacks
– Return-oriented Programming (Samba, GPSd, and Wuftpd-2.6.0 excluded)
• Use the Metasploit framework to generate a NOP sled before the injected code
– attack each application with injected code 5 times to test the false negatives
– CFIMon detects all these attacks as expected
• report a security alarm
• For example, code-injection attack of Samba
– a heap overflow in function lsa_trans_name overwrites the function pointer destructor
– CFIMon detected such attack since the branches have never appeared in the train_set
• CFIMon successfully detects all these attacks without experiencing false negatives
• Return-to-libc Attack of GPSd (ver. 2.7)
– format string vulnerability in function gpsd_report
– allows remote attackers to execute arbitrary libc function (e.g. system ) via certain GPS requests (via tcp port 2947 )
– CFIMon marks it and the following branches as suspicious since the branches have never appeared in the train_set
– an alarm is triggered since the number of suspicious branches quickly exceeds the threshold
[Figure: sequence of suspicious branches, starting with addr. of system, then addr. of …; window size = 20, tolerating at most 3 suspicious branches]
Evaluation for Return-oriented Programming Attacks
• As in the other evaluations, CFIMon successfully detects all these attacks without experiencing false negatives
• Return-oriented Programming Attack of Squid (ver. 2.5-STABLE1)
– stack overflow bug in its helper module, ntlm, during authentication
– smashes the stack in function ntlm_check_auth by supplying an arbitrary password of at most 300 bytes
– violates the CFIMon rule that the target address of a return instruction must be the one next to a call
• Quantitatively evaluate the performance of CFIMon using several real-world applications
– Apache
– Exim
– Memcached
– Wu-ftpd
• Memory overhead is negligible
– since the size of the tables ( ret_set, call_set and train_set) is quite small
• Performance overhead
– Average overhead of pure BTS is 5.2%
– Average overhead of CFIMon is only 6.1%
• The proposed CFIMon leveraged the branch trace store (BTS) mechanism to detect violation of control flow integrity
• The performance results show that CFIMon can be applied to some real-world server applications on off-the-shelf systems in daily use
• There are several cases in which the calling convention may be violated:
– setjmp / longjmp
• Instead of returning to its own caller, the longjmp returns to the caller of setjmp (also a legal address)
– Unix signal handling
• Instead of returning to the caller (OS), the handler returns to the interrupted process
• the OS is modified so that the monitor suppresses the alarm when a signal handler returns
[Figure: call stack from high to low addresses, holding the stack frames of A(), B(), C(), and D() in that order]
Precision Mode of Intel CPU
---PEBS, BTS
• PEBS (Precise Event-Based Sampling)
– Precise Performance Counter
– atomic-freeze: records the exact IP address
• BTS (Branch Trace Store)
– to capture all control transfer events
• jump, call, return, interrupt and exception
– also record the addresses of branch source and target
– enables the monitoring of the whole control flow of an application
Precision Mode of Intel CPU
---LBR, Event Filtering, Conditional Counting
• LBR (Last Branch Record)
– to record the most recent branches into a register stack
– the size of the register stack is small
• Event Filtering
– to filter out events that are not of interest
– currently available only for LBR, not BTS
• Conditional Counting
– to separate user-level events from kernel-level ones
– only increment counter while the processor is running at a specific privilege level
• e.g. “only counting when at user mode”