A Graph-Game Model for Software Tamper Protection

advertisement
A Graph Game Model for
Software Tamper Protection
Nenad Dedić
Boston University
Mariusz Jakubowski
Ramarathnam Venkatesan
Microsoft Research
Information Hiding ‘07
June 11-13, 2007
Overview
Modeling of software tamper-resistance
•
•
•
•
•
•
Introduction
Past work on software protection
Definitions of tamper-resistance
Anti-tampering transformations
Security analysis
Conclusion
Information Hiding ’07
June 11-13, 2007
2
Software Protection
• Obfuscation
– Making programs “hard to understand”
• Tamper-resistance
– Making programs “hard to modify”
• Obfuscation  tamper-resistance
• Tamper-resistance  obfuscation?
Information Hiding ’07
June 11-13, 2007
3
Formal Obfuscation
• Impossible in general
– Black-box model (Barak et al.):
“Source code” doesn’t help adversary who can
examine input-output behavior.
– Worst-case programs and poly-time attackers
• Possible in specific limited scenarios
– Secret hiding by hashing (Lynn et al.)
– Point functions (Wee, Kalai et al.)
• Results difficult to use in practice.
Information Hiding ’07
June 11-13, 2007
4
Tamper-resistance
• Many techniques used in practice – e.g.:
–
–
–
–
Code-integrity checksums
Anti-debugging and anti-disassembly methods
Virtual machines and interpreters
Polymorphic and metamorphic code
• Never-ending battle on a very active field
– Targets: DRM, CD/DVD protection, games, dongles, licensing,
etc.
– Defenses: Binary packers and “cryptors,” special compilers,
transformation tools, programming strategies, etc.
• Current techniques tend to be “ad hoc:”
– No provable security
– No analysis of time required to crack protected instances
Information Hiding ’07
June 11-13, 2007
5
Problem Definition
We would like an algorithm Protect
roughly with following properties:
For any program P, Protect(P) outputs a new program Q:
• Q uses almost same resources as P.
• For any attacker A, if A(Q) outputs Q’, then either:
o For any input x, Q’(x) = Q(x).
o Q’ “crashes.”
Informally, tamper-protected P either works exactly like P
or fails.
Information Hiding ’07
June 11-13, 2007
6
Problems with the Definition
For any program P, Protect(P) outputs a new program Q:
• Q uses almost same resources as P
• For any attacker A, if A(Q) outputs Q’, then either:
o For any input x, Q’(x) = Q(x).
o Q’ “crashes.”
Definition imprecise, but there is a bigger problem:
It is unattainable.
Example “attack:” A(Q) = “run Q; append 0 to output”.
“Attack” is harmless, but breaks the definition.
No easy way out!
Information Hiding ’07
June 11-13, 2007
7
Towards a Realistic Model
• Give up on complete protection of P.
o Protect mainly some critical code portion L.
o Protect other parts to deflect attention away from L.
• Model restricted (but realistic) attackers.
• Make engineering assumptions about security:
o Code transformations
o Tamper detection
o Dataflow
o Control flow
Information Hiding ’07
June 11-13, 2007
8
Known Techniques and Attacks
Main scenario:
Program P contains some security-critical code L.
For example:
• L verifies that P is licensed software.
• L verifies that P has a license for rendering content.
• L contains important data (e.g., keys and credentials).
•…
Next : Survey of known techniques and attacks
to motivate the model and analysis.
Information Hiding ’07
June 11-13, 2007
9
Single-Point Check
L
L is called from P:
if (L returns 1)
then proceed;
else terminate;
P
Attack:
Control-flow analysis can help identify L.
Calls to L can then be patched.
Information Hiding ’07
June 11-13, 2007
10
Distributed Check
L is broken up into pieces, and/or
individualized copies are replicated.
L is called from P:
if (L returns 1)
then proceed;
else terminate;
P
Attacks – based on flow graph:
1. L is typically weakly connected to rest of P.
2. Guess position of one copy of L. Use graph-diffing to
find other copies (subgraph matching).
Information Hiding ’07
June 11-13, 2007
11
Code Checksums
To prevent tampering, compute
checksums C1,…,Ck of code segments.
C1
During execution, compare checksums
of loaded code segments
with pre-computed values.
P
Ck
C2
Attack:
Reading code segment can be trapped
(some hardware or VM support may be needed).
Correct code segment can then be supplied
by cracked program or VM.
Information Hiding ’07
June 11-13, 2007
12
Oblivious Hashing
Main idea of OH:
•Compute hashes H1,…,Hk of execution traces.
Hk
•Update hashes with values of assigned
variables and identifiers based on control flow.
H2
•Correct hashes can be precomputed and used
to encrypt some data.
•Individualized code replicas can be created; OH
values from each replica should be equal.
P
H1
Attacks:
Precomputed hash values could be discovered.
Code-replica scheme could be attacked using
program analysis (addressed in this work).
Information Hiding ’07
June 11-13, 2007
13
P
Anti-disassembly
Disassembling can be made difficult
by virtualization and individualization.
V1 P1
Vn Pn
Idea is to convert P into instances I=(VI,PI).
V2 P2
VI - virtual machine. PI - implementation of P for VI.
Different instances I, J can have VI  VJ .
So disassembling I is of little help to disassemble J.
Attack:
Vulnerable to attacks which do not use low-level details.
E.g. “copy-attack”:
To find out if branch B is causing crash, save state
before B and try multiple paths.
Information Hiding ’07
June 11-13, 2007
14
Defense Against Copy Attack
1. Crash only after multiple tampering changes detected.
2. Have many possible crash locations.
3. Delay the crash.
4. Randomize execution paths.
Somewhat achievable using known techniques, e.g.:
• Use redundant copies of encrypted data.
• Make many code fragments dependent on checks.
• Overlap code sections for context-dependent semantics.
• Create multiple individualized copies of code.
Information Hiding ’07
June 11-13, 2007
15
Defense Against Program Analysis
Basic notion: “local indistinguishability”
•
•
Ideally, local observations of code/data/execution
should give no useful information to attacker.
In practice, try to satisfy this as much as possible.
1. Small code fragments all look alike.
(E.g., use semantics-preserving peephole transformations.)
2. Control-flow graph looks like a complete graph.
(E.g., use computed jumps and opaque predicates.)
3. Dataflow graph looks like a complete graph.
(E.g., use lightweight encryption and temporary variable corruption.)
Information Hiding ’07
June 11-13, 2007
16
Detecting Unusual Data/Code
Security-related data/code can look unusual and rare
(e.g., XOR used for encryption and random data used
for crypto keys both stand out and can be detected).
To mitigate, can use peephole transformations, nearclear encryption, data scattering, etc.
Information Hiding ’07
June 11-13, 2007
17
Assortment of tools are available.
How to combine them effectively?
How much security can we get?
How to quantify security?
Information Hiding ’07
June 11-13, 2007
18
Basic Model
Abstraction of software tamper-resistance
• Program:
– A graph G
• Execution:
– A “random” walk on G
• Integrity check:
– Group of nodes in G responsible for monitoring a set of
code fragments (probabilistically)
• Check failure:
– Tampering flagged on all code fragments in a check’s set
• Tamper response
– An action taken when a “sufficient” number of checks have
failed
Information Hiding ’07
June 11-13, 2007
19
Elements of Model
Local tamper check:
C=InsertCheck(F1,…,Fs)
• Check C of size s specified by s code fragments F1, …, Fs .
• Each Fi detects tampering with probability p.
• Check fails if each Fi detects tampering.
( Can have many checks C1,…,Ck .)
Tamper response:
InsertResponse(P, (C1,…,Ck), f )
• P “crashes” if at least f checks fail (f is the threshold).
(“Crash” could be any other form of response: slowdown, graceful degradation,
loss of features, self-correction, etc.)
Information Hiding ’07
June 11-13, 2007
20
Elements of Model
Graph transformations:
(V,E)=GraphTransform(P, n)
P is transformed into an equivalent program Q
with flow graph G=(V,E) containing n nodes.
• G is random-looking. (rapid mixing of random walks).
• Execution of Q induces a random-looking walk on G.
Critical-code embedding:
F’=CodeEntangle(F, L)
Critical code L is embedded into code fragment F, yielding F’.
• F’ is equivalent to “ if L returns 1 then execute F ”.
• Desirable to make embedded code hard to remove.
Information Hiding ’07
June 11-13, 2007
21
The Algorithm
Main ideas: 1. Transform the flow graph into a “random” one.
2. Replicate critical code in l random nodes.
3. Randomly insert k checks of size s.
4. Create check response with threshold f.
Harden(P, L, l, n, k, s, f):
let G = (V,E) = GraphTransform(P, n)
for i = 1 to l do
select at random vV
v = CodeEntangle(L, v)
for i = 1 to k do
select at random (v1,…,vs)V
Ci = InsertCheck(v1,…,vs)
InsertResponse(G, (C1,…,Ck), f )
Information Hiding ’07
June 11-13, 2007
22
The Algorithm
• Programmer assistance can help in algorithm:
o Choose places to embed critical code L.
o Identify code/data suitable for checking.
o Identify code/data suitable for tamper response.
Information Hiding ’07
June 11-13, 2007
23
Attack Model
Attacker plays a game on the program graph G.
Goal: Run the program and avoid executing critical code L.
Game moves
• Make a step on G:
o either follow untampered execution of P
o or tamper to change execution
 (tampering detected by checks…)
• Guess a check D=(u1,…,us).
o If D=Ci , then Ci is disabled.
• If P crashes, restart.
Information Hiding ’07
June 11-13, 2007
24
Attack Model
Attacker plays a game on flow-graph G=(V,E).
G
Information Hiding ’07
June 11-13, 2007
25
Attack Model
Attacker plays a game on flow-graph G=(V,E).
G
Check = set of nodes.
Information Hiding ’07
June 11-13, 2007
26
Attack Model
Attacker plays a game on flow-graph G=(V,E).
G
Check = set of nodes.
= critical code
Information Hiding ’07
June 11-13, 2007
27
Attack Model
Attacker plays a game on flow-graph G=(V,E).
G
Check = set of nodes.
Execution = walk on G.
= critical code
Information Hiding ’07
June 11-13, 2007
28
Attack Model
Attacker plays a game on flow-graph G=(V,E).
G
Check = set of nodes.
Execution = walk on G.
In each (random) step A can either:
- observe
-models untampered execution
= critical code
Information Hiding ’07
June 11-13, 2007
29
Attack Model
Attacker plays a game on flow-graph G=(V,E).
G
Check = set of nodes.
Execution = walk on G.
In each (random) step A can either:
- observe
-models untampered execution
- tamper current node
= critical code
Information Hiding ’07
June 11-13, 2007
30
Attack Model
Attacker plays a game on flow-graph G=(V,E).
G
Check = set of nodes.
Execution = walk on G.
In each (random) step A can either:
- observe
-models untampered execution
- tamper current node
Check is activated when
all its nodes are tampered.
= critical code
Information Hiding ’07
June 11-13, 2007
31
Attack Model
Attacker plays a game on flow-graph G=(V,E).
G
Check = set of nodes.
Execution = walk on G.
In each (random) step A can either:
- observe
-models untampered execution
- tamper current node
Check is activated when
all its nodes are tampered.
P crashes when f checks are activated.
Information Hiding ’07
= critical code
June 11-13, 2007
32
Attack Model
Attacker plays a game on flow-graph G=(V,E).
G
Check = set of nodes.
Execution = walk on G.
In each (random) step A can either:
- observe
-models untampered execution
- tamper current node
Check is activated when
all its nodes are tampered.
P crashes when f checks are activated.
A tries to guess a check.
Information Hiding ’07
= critical code
June 11-13, 2007
33
Attack Model
Attacker plays a game on flow-graph G=(V,E).
G
Check = set of nodes.
Execution = walk on G.
In each (random) step A can either:
- observe
-models untampered execution
- tamper current node
Check is activated when
all its nodes are tampered.
P crashes when f checks are activated.
= critical code
A tries to guess a check.
If guess is correct, the check is disabled (can’t be activated).
Information Hiding ’07
June 11-13, 2007
34
Attack Model
Attacker plays a game on flow-graph G=(V,E).
G
Check = set of nodes.
Execution = walk on G.
In each (random) step A can either:
- observe
-models untampered execution
- tamper current node
Check is activated when
all its nodes are tampered.
P crashes when f checks are activated.
= critical code
A tries to guess a check.
If guess is correct, the check is disabled (can’t be activated).
Information Hiding ’07
June 11-13, 2007
35
Attack Model
Attacker plays a game on flow-graph G=(V,E).
G
Check = set of nodes.
Execution = walk on G.
Game moves:
observe, tamper, guess
= critical code
Information Hiding ’07
June 11-13, 2007
36
Attack Model
Attacker plays a game on flow-graph G=(V,E).
G
Check = set of nodes.
Execution = walk on G.
Game moves:
observe, tamper, guess .
Attacker wins if:
- P runs for >N steps without crashing.
- Each step in critical code is tampered.
Information Hiding ’07
= critical code
June 11-13, 2007
37
Security Estimates
Security analysis in graph model.
Parameters:
k = cn
(# of checks proportional to # of nodes)
f = cn/2
(response threshold is half of the checks)
p=1
(tamper detection is perfect)
l=n
(critical code replicated in every node)
N = n1+
(required running time before crash)
Analyzed attacks take (ns) time!
(s = check size)
No proof yet for arbitrary attacks. More work needed…
Information Hiding ’07
June 11-13, 2007
38
Security Arguments
Claim 1:
As long as no check is disabled, A wins with exp. small prob.
(Enough to have “not too many” checks disabled.)
P runs for >N steps

“Long” rapidly mixing random walk

Critical code encountered “many” times

A must tamper “many” nodes

Program crashes
Information Hiding ’07
June 11-13, 2007
39
Security Arguments
Claim 1:
As long as no check is disabled, A wins with exp. small prob.
(Enough to have “not too many” checks disabled.)
Desired claim 2:
Any O(ncs) attacker learns a check location with exp. small prob.
So far we only analyzed some specific attacks.
No complete proof of above claim yet.
Claim 1 + Claim 2  No O(ncs) attacker can win.
Information Hiding ’07
June 11-13, 2007
40
Attack 1: Voting Attack
Let V={1,…,n}.
Each check is an s-tuple of integers (v1,…,vs).
Main idea:
• Suppose A tampers with P, which subsequently crashes.
• Let WV denote the tampered nodes.
• Then any s-tuple (v1,…,vs)Ws is more likely to be a check
than not.
So “vote” for all (v1,…vs) Ws .
Do this D times and output k candidates with most votes.
Information Hiding ’07
June 11-13, 2007
41
Attack 1: Voting Attack
Let V={1,…,n}.
Each check is an s-tuple of integers (v1,…,vs).
1. Fill an s-dimensional nn…n array B with zeros.
2. for i=1 to D do
1. run P and tamper with it arbitrarily until it crashes
(let W be the set of tampered nodes)
2. for each (v1,…,vs)Ws do
B[v1,…,vs] = B[v1,…,vs] + 1
3. Find the k entries of B with highest values
and output their indices as guesses for check nodes.
Can prove: Updating the table of votes takes ns steps.
(Hence ns is lower bound on attack time.)
Information Hiding ’07
June 11-13, 2007
42
Attack 2: Intersection Attack
Let V={1,…,n}.
Each check is an s-tuple of integers (v1,…,vs).
Main idea:
Suppose A considers m tampered runs of P,
with W1,…,Wm denoting sets of tampered nodes in each run.
If some check C= (v1,…,vs) is activated in all m runs,
then C  B = (W1  W2  …  Wm)s .
For large enough m, B could be of tractable size, and
A could search all of it.
But small |B| are unlikely to contain any checks.
Can prove: Expected time to find a check is still >ncs.
Information Hiding ’07
June 11-13, 2007
43
Summary and Further Work
•Main goals of work
o Modeling of software tamper-resistance
o Algorithms for tamper-resistance with analyzable
security
•Extensions
o More realistic model:
• Allow some adversarial steps in walk.
o More realistic parameters:
• p<1 – tamper detection unreliable
• l<n – critical code replicated only in fraction of P
• Other parameters: number of checks, threshold, etc.
o Implementation
Information Hiding ’07
June 11-13, 2007
44
Download