Landon Cox
April 1, 2016
• Crucial goal of a secure system
– Prevent inappropriate information flows
– Can model “appropriateness” with a lattice of tags
– i.e., only allow “low” objects to flow into “high” objects
– Non-interference := all flows are appropriate
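As a rough illustration of the lattice idea above, the sketch below models a tag as a set of categories ordered by subset inclusion. The names `can_flow` and `join` and the example categories are assumptions made for illustration, not part of the lecture.

```python
# Hypothetical sketch: tags are sets of categories, ordered by subset.
# "low" = fewer categories, "high" = more categories.

def can_flow(src_tag: frozenset, dst_tag: frozenset) -> bool:
    """A flow is appropriate only if the destination is at least as high."""
    return src_tag <= dst_tag

def join(a: frozenset, b: frozenset) -> frozenset:
    """Least upper bound: the tag of data derived from both inputs."""
    return a | b

PUBLIC = frozenset()
LOCATION = frozenset({"location"})
CONTACTS = frozenset({"contacts"})

mixed = join(LOCATION, CONTACTS)

print(can_flow(PUBLIC, LOCATION))   # low into high
print(can_flow(LOCATION, PUBLIC))   # high into low
print(can_flow(LOCATION, mixed))    # into the join of both tags
```

Under this model, non-interference amounts to `can_flow` holding for every flow the system performs.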
• Information-flow analysis
– Helps track where sensitive data goes
– Getting this right is tricky
• Building blocks
– Storage objects (information receptacles)
– Processes (move information to/from objects)
• Tracking information
– Tag (or label) describes information sensitivity
– Each storage object is assigned a tag
– Need to update tags as processes execute
• Issue 1: precision
• Say that storage object is an address space
– If process P reads sensitive data item D
– P’s entire address space is tagged
• What must we assume about any of P’s outputs?
– Must assume that they contain sensitive information
• Which processes are allowed to communicate with P?
– Other processes that are allowed to read D
• Why is this problematic?
– Probably want P to communicate with processes that can’t access D
– Hard to do anything useful otherwise
• Example: SSH login
– SSH client sends uid/pw to P
– P reads the password file (sensitive D)
accept uid/pw;
if (pw not in file) {
  return error;
} else {
  fork/exec shell;
}
How do you solve this?
Often use a trusted “declassifier”
Declassifier: small piece of code trusted to remove tags
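A declassifier for the login example might look roughly like the sketch below. The `Tagged` wrapper, the function names, and the plain-text password table are all hypothetical and exist only to show tags being removed by a small trusted function.

```python
# Hypothetical sketch of a declassifier for the login example.

class Tagged:
    """A value paired with a set of sensitivity tags."""

    def __init__(self, value, tags=frozenset()):
        self.value = value
        self.tags = frozenset(tags)

def password_matches(password_file: Tagged, uid: str, pw: str) -> Tagged:
    """Ordinary code: the result depends on the file, so it keeps the tags."""
    ok = password_file.value.get(uid) == pw
    return Tagged(ok, password_file.tags)

def declassify_login_result(result: Tagged) -> Tagged:
    """Trusted code: releases only a single yes/no answer, with tags removed."""
    if not isinstance(result.value, bool):
        raise TypeError("only a boolean login result may be declassified")
    return Tagged(result.value)

pw_file = Tagged({"alice": "s3cret"}, {"password-file"})
tagged = password_matches(pw_file, "alice", "s3cret")
released = declassify_login_result(tagged)
print(tagged.tags, released.value, released.tags)
```

The design choice is to keep the trusted part as small as possible: everything except `declassify_login_result` still propagates tags normally.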
• What else could we do to improve precision?
– Use finer-grained storage objects
– Tag program variables or memory words
• What are the implications for performance?
– Have to update tags much more frequently
– i.e., every time an instruction executes
– Can introduce a lot of overhead
• Propagate taint tags with data flows
– Rule: for c ← a op b, taint(c) ← taint(a) ∪ taint(b)
– setTaint(a, t): taint(a) ← {t}
– c = a + b: taint(c) ← {t} ∪ {} = {t}
– Send(c, foo.net): can foo.net see a?
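The propagation rule above can be sketched as follows. The `Tainted` class and the `send` stand-in are hypothetical helpers; only the rule taint(c) ← taint(a) ∪ taint(b) comes from the slide.

```python
# Hypothetical sketch of the propagation rule taint(c) <- taint(a) U taint(b).

class Tainted:
    """An integer value paired with a set of taint tags."""

    def __init__(self, value, taint=frozenset()):
        self.value = value
        self.taint = frozenset(taint)

    def __add__(self, other):
        # Explicit flow: the result carries the union of both operands' tags.
        return Tainted(self.value + other.value, self.taint | other.taint)

def send(data, host):
    """Stand-in for a network sink: report which tags would leave the device."""
    return {"host": host, "leaks": sorted(data.taint)}

a = Tainted(7, {"t"})      # setTaint(a, t)
b = Tainted(5)             # untainted
c = a + b                  # taint(c) = {t} U {} = {t}
report = send(c, "foo.net")
print(report)
```

Because c still carries tag t at the sink, a monitor can flag that foo.net receives data derived from a.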
• Issue 2: explicit vs implicit flows
• Two ways to propagate information
– Explicitly := direct transfer from one object to another
– Implicitly := indirect transfer usually via control flow
// a is sensitive
int foo (int a) {
  int b, w, x, y, z;
  a = 11;
  b = 5;
  w = a * 2;
  x = b + 1;
  y = w + 1;
  z = x + y;
  print (z);
}
Each line is an explicit flow from source operands to destination operand
Very easy to implement: just interpose on each instruction to update each var’s tag
Where is the implicit flow?
// a is sensitive
void foo (int a) {
  int x, y;
  if (a > 10) {
    x = 1;
  }
  y = 10;
  print (x);
  print (y);
}
How would you update x’s tag?
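One common answer, not stated in the slides, is to keep a tag for the current control-flow context and join it into every assignment made under a tainted branch. The sketch below assumes that approach; the `Var` helper and the `pc_taint` name are hypothetical.

```python
# Hypothetical sketch: track a "program counter" tag for tainted branches.

class Var:
    def __init__(self, value=None, taint=frozenset()):
        self.value = value
        self.taint = frozenset(taint)

def foo(a: Var):
    x, y = Var(0), Var(0)
    pc_taint = frozenset()          # tag of the enclosing control flow

    # Entering "if (a > 10)": the branch condition depends on a.
    saved = pc_taint
    pc_taint = pc_taint | a.taint
    if a.value > 10:
        x = Var(1, pc_taint)        # constant 1, but assigned under a's control
    else:
        # x is not assigned here, yet its value still reveals a <= 10.
        # A purely dynamic tracker only sees the taken path, so x stays
        # untagged unless it also tags what the other branch could write.
        pass
    pc_taint = saved                # leaving the branch

    y = Var(10, pc_taint)           # outside the branch: untainted
    return x, y

x_taken, y_taken = foo(Var(11, {"a"}))
x_skipped, _ = foo(Var(3, {"a"}))
print(x_taken.taint, y_taken.taint, x_skipped.taint)
```

The second call shows the hard part: when the branch is not taken, x carries no tag even though printing it still reveals something about a.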
What is tricky about this code?
// a is sensitive
void foo (int a) {
  int x, y;
  if (a > 10) {
    x = 1;
  } else {
    y = 10;
  }
  print (x);
  print (y);
}
What is trickier about this code?
// a is sensitive
void foo (int a) {
  int x, y;
  if (a > 10) {
    baz (&x);
  } else {
    bar (&y);
  }
  print (x);
  print (y);
}
Where is the implicit flow here?
// a is sensitive
void foo (int a) {
  int x, y;
  if (a > 10) {
    exit(0);
  } else {
    exit(1);
  }
  y = 10;
  print (x);
  print (y);
}
How would you track this?
• Get system to communicate in unintended ways
• Example: Tenex (supposedly secure OS)
– Created a team to break in
– Team had all passwords within 48 hours … oops.
– Goal: require 256^8 tries to see if password is right
Password checker:
for (i = 0; i < 8; i++) {
  if (input[i] != password[i]) {
    break;
  }
}
• How to break?
• (user passes in input buffer, virtual mem faults are visible)
– Specially arrange the input’s layout in memory
– Force a page fault if second character is read
– If you get a fault, the first character was right
– Do again for third, fourth, … eighth character
• Can check the password in 256*8 tries
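The attack steps above can be simulated in a few lines. This is a toy model, not Tenex code: the `SECRET` value, the `PageFault` exception, and the `mapped_len` parameter (how many input bytes sit on a mapped page) are assumptions that stand in for the real memory layout trick.

```python
# Hypothetical simulation of the page-fault attack described above.

SECRET = b"hunter42"            # 8-byte password known only to the checker

class PageFault(Exception):
    """Raised when the checker touches an unmapped byte of the input."""

def check(buf, mapped_len):
    """Character-by-character checker, like the loop on the slide.

    Only the first `mapped_len` bytes of `buf` are on a mapped page.
    """
    for i in range(len(SECRET)):
        if i >= mapped_len:
            raise PageFault()   # the fault is visible to the caller
        if buf[i] != SECRET[i]:
            return False
    return True

def recover_password():
    known = b""
    tries = 0
    for pos in range(len(SECRET)):
        for guess in range(256):
            candidate = known + bytes([guess])
            tries += 1
            try:
                # Map only pos + 1 bytes: the next byte lies on an unmapped page.
                ok = check(candidate, mapped_len=pos + 1)
            except PageFault:
                ok = True       # checker moved past byte `pos`, so it matched
            if ok:
                known = candidate
                break
    return known, tries

password, tries = recover_password()
print(password, tries)
```

Each position costs at most 256 guesses, so the total is bounded by 256*8 rather than 256^8.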
• Project proposals
– Due today (ok if you send it to me by Monday)
– Guidelines in the syllabus
– One page should be fine
• Amount of work
– Two-three weeks of effort
– Focus on answering one interesting question
Sensors: rich, personal data.
Cloud: large-scale analysis, collection, dissemination.
Mobile: present at work, home, and play.
[Figure: login form with username me@gmail.com and a masked password]
• Apps access sensitive information in many contexts
– Location, images, and communication
– Home, work, and play
• Apps run on behalf of many stakeholders
– Users, services, developers, platform providers, advertisers
Permissions are coarse.
No insight into what is collected and by whom.
http://blog.mylookout.com/2010/07/mobile-application-analysis-blackhat/
Consumer: “Why is my wallpaper app sending my phone number to China?”
Enterprise: “Who is collecting information about our workers?”
http://online.wsj.com/article/SB20001424052748703806304576242923804770968.html
[Charts: new mobile malware (source 1); new mobile malware families or variants (source 2)]
1 McAfee Threats Report: Q1 2012 - http://www.mcafee.com/us/resources/reports/rp-quarterly-threat-q1-2012.pdf
2 F-Secure Mobile Threat Report Q1 2012 - http://www.f-secure.com/weblog/archives/MobileThreatReport_Q1_2012.pdf
Where does data go after you grant access?
• Monitor where apps send data
– What happens after you grant access?
– Is observed behavior expected?
• Monitor apps at runtime
– Want users to monitor their own apps
– Must balance accuracy and efficiency
• Solution: TaintDroid
– Original collaboration with Penn State, Intel
– Will Enck (NCSU), Jaeyeon Jung (MSR), others
• TaintDroid: system-wide taint tracking for Android
– Records “explicit” data dependencies via taint tags
– Does not capture “implicit” data dependencies
[Figure: TaintDroid overview]
– Tag data as it enters the app
– Track how information propagates
– Check tags of emitted data
• Key issues for tag propagation
– How are tags stored?
– What is the tag-propagation logic?
– Is tracking precise and efficient?
• Project website: http://appanalysis.org
• Goal: balance precision and efficiency
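[Figure: speed (slow to fast) vs. precision (imprecise to precise)]
– Process-grained tracking: fast but imprecise (all outputs tainted)
– Instruction-grained tracking: precise but slow (2-20x overhead)
– Ideal: fast and precise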
• Variable-level tracking through Dalvik VM (DEX instructions)
• Patch state after native method invocation
• Extend tracking to IPC and file system
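[Figure: TaintDroid architecture, two applications each running application code on a Dalvik VM]
– Message-level tracking: msg passed between the applications
– Variable-level tracking: inside each Dalvik VM
– Method-level tracking: native system libraries
– File-level tracking: file system (shown next to the network)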
• Tag-propagation logic for Dalvik executables (DEX)
• Modified Dalvik VM
– Store and propagate 32-bit tags
• Local vars and args
– Store tags adjacent to vars on stack
– Correspond to VM registers
– 64-bit vars require two tags
• Class fields
– Store tags inside heap objects
• Arrays
– One tag per array
– Trade precision for efficient storage
• Performance optimizations
– Per-variable tags reduce storage overhead
– Adjacent tags provide spatial locality
[Figure: stack frame layout with interleaved taint tags, labeled SP and FP: out0, out0 taint tag, out1, out1 taint tag, (unused), VM goop, v0 == local0, v0 taint tag, v1 == local1, v1 taint tag, v2 == in0, …, v4 taint tag]
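The interleaved layout can be sketched with a flat list in which each register is followed by its tag. The `Frame` class is hypothetical, and treating a 32-bit tag as a bit vector whose union is a bitwise OR is an assumption of this sketch, as are the two example tag bits.

```python
# Hypothetical sketch of interleaved value/tag storage in a stack frame.
# Slot 2*i holds register v_i; slot 2*i + 1 holds its 32-bit taint tag.

TAG_MASK = 0xFFFFFFFF

class Frame:
    def __init__(self, num_registers):
        self.slots = [0] * (2 * num_registers)

    def set(self, reg, value, tag=0):
        self.slots[2 * reg] = value
        self.slots[2 * reg + 1] = tag & TAG_MASK

    def value(self, reg):
        return self.slots[2 * reg]

    def tag(self, reg):
        return self.slots[2 * reg + 1]

    def add(self, dst, src_a, src_b):
        """dst <- src_a + src_b, with tag(dst) <- tag(src_a) | tag(src_b)."""
        self.set(dst,
                 self.value(src_a) + self.value(src_b),
                 self.tag(src_a) | self.tag(src_b))

TAINT_LOCATION = 0x1   # assumed bit assignments, for illustration only
TAINT_IMEI = 0x2

frame = Frame(3)
frame.set(0, 40, TAINT_LOCATION)
frame.set(1, 2, TAINT_IMEI)
frame.add(2, 0, 1)
print(frame.slots)
```

Keeping each tag next to its value means a propagation step touches memory that is likely already cached, which is the spatial-locality argument made above.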
• Huge opportunity for performance gains
– JNI code is often CPU intensive
• Challenge for method-grained tracking
– In worst case, must manually reason about side effects
– Luckily, a very simple heuristic works most of the time
class java.lang.Math {
  public static double cos (double d);
}
• Tainting heuristic
• “Assign union of arguments’ tags to return value on exit.”
• Most JNI methods have no side effects
• Many JNI methods operate on native types
• When it doesn’t work, use method profiles
• Generic framework for defining argument/retval dependencies
• So far, only needed to define for IBM charset converter
• See paper for more details …
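The tainting heuristic can be sketched as a wrapper around a native call. The `call_native` helper and the integer bit-vector tags are assumptions for illustration; Python's `math.cos` and `math.pow` stand in for JNI methods without side effects.

```python
# Hypothetical sketch of method-grained tracking for a native call:
# run the native code untracked, then patch the return value's tag on exit.
import math

def call_native(native_fn, args):
    """`args` is a list of (value, tag) pairs; tags are integer bit vectors."""
    values = [value for value, _ in args]
    result = native_fn(*values)          # no tracking inside the native code
    ret_tag = 0
    for _, tag in args:
        ret_tag |= tag                   # union of the arguments' tags
    return result, ret_tag

TAINT_LOCATION = 0x1                     # assumed tag bits, for illustration
TAINT_IMEI = 0x2

value, tag = call_native(math.cos, [(0.0, TAINT_LOCATION)])
power, power_tag = call_native(math.pow, [(2.0, TAINT_LOCATION), (3.0, TAINT_IMEI)])
print(value, tag, power, power_tag)
```

This only covers methods whose sole output is the return value; a method that writes through an object reference would need a profile describing that dependency.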
• Found 2,844 JNI methods in Android source
– 913 did not use Object references
– Others could induce false negatives
• Third-party JNI is not supported
– Apps must be written entirely in Java
– Survey of Android Market, ~25% used .so file
– Subject of ongoing research
• Is TaintDroid fast and precise?
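[Figure: speed (slow to fast) vs. precision (imprecise to precise)]
– Process-grained tracking: fast but imprecise (all outputs tainted)
– Instruction-grained tracking: precise but slow (2-20x overhead)
– TaintDroid: shown in the fast, precise corner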
[Figure: CaffeineMark 3.0 benchmark scores (higher is better), Android vs. TaintDroid, for sieve, loop, logic, string, float, method, and total; y-axis 0 to 2000]
– Slide annotations: 20% overhead (extra memory accesses); 4.4% memory overhead; 14% overhead; “Not shown”
• Reasons for efficiency
– (1) Method-grained tracking of JNI calls
– (2) Spatial locality of taint tags
– (3) One tag per array
• Selected 30 apps from Android Market
– Biased toward popular apps
– Sampled from 12 categories
• App permissions
– Access to Internet
– Access to location, camera, phone state, mic
– No native libraries
• Ran apps manually under TaintDroid
• Of 105 flagged connections, only 37 to expected servers
• 15 of 30 apps shared location with ad server
– admob.com, ad.qwapi.com, ads.mobclix.com, data.flurry.com
• Most traffic was plaintext (e.g., AdMob HTTP GET)
...&s=a14a4a93f1e4c68&..&t=062A1CB1D476DE85B717D9195A6722A9&d%5Bcoord%5D=47.661227890000006%2C-122.31589477&...
– data.flurry.com used binary format
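To see how readable the plaintext request is, the sketch below decodes the `d%5Bcoord%5D` parameter from the AdMob example using Python's standard `urllib.parse`. Only that one parameter is decoded; the elided parts of the request are left out.

```python
# Sketch: decode the percent-encoded coordinate parameter from the request.
from urllib.parse import parse_qs

fragment = "d%5Bcoord%5D=47.661227890000006%2C-122.31589477"
params = parse_qs(fragment)              # percent-decodes keys and values
lat, lon = (float(part) for part in params["d[coord]"][0].split(","))
print(lat, lon)
```

Anyone on the network path can do the same, which is why plaintext transmission of location matters here.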
• In no cases were users informed by EULA
– In one case, app sent location every 30 seconds
• 7 apps sent device id (IMEI)
• 2 apps sent phone info (Ph. #, IMSI*, ICC-ID)
– Done without informing the user
– One app’s EULA indicated the IMEI was sent
– Another app sent the hash of the IMEI
• Frequency was app-specific
– One sent info every time the phone booted
• Source code available http://appanalysis.org/
– Most recent version is for Android 4.3
• Great platform for research
– Compatible with vast majority of Android apps
– Playground for all kinds of information-flow projects
• Video demo by Peter Gilbert: http://www.youtube.com/watch?v=qnLujX1Dw4Y
• Implicit flows
– Fundamentally difficult problem
– Can handle passwords (SpanDex, USENIX Sec)
• Native code
– Ongoing work
– Talk to Ali and Alex …