Information flow Landon Cox April 1, 2016

Information flow

• Crucial goal of secure system

– Prevent inappropriate information flows

– Can model “appropriateness” with a lattice of tags

– i.e., only allow “low” objects to flow into “high” objects

– Non-interference := all flows are appropriate

• Information-flow analysis

– Helps track where sensitive data goes

– Getting this right is tricky
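The lattice rule above can be sketched with tags as bit sets, ordered by set inclusion. This is only an illustration: the `tag_t` encoding, the category names, and the function names are assumptions, not something the lecture defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: each bit is one sensitivity category. */
typedef uint32_t tag_t;

#define TAG_NONE     ((tag_t)0)
#define TAG_LOCATION ((tag_t)1u << 0)
#define TAG_CONTACTS ((tag_t)1u << 1)

/* Least upper bound (join) of two tags in the subset lattice. */
tag_t tag_join(tag_t a, tag_t b) {
    return a | b;
}

/* A flow from src to dst is appropriate only if dst is at least as
 * "high" as src, i.e. every category in src is also in dst. */
bool flow_allowed(tag_t src, tag_t dst) {
    return (src & ~dst) == 0;
}

/* Small self-check of the rule. */
void lattice_demo(void) {
    tag_t high = tag_join(TAG_LOCATION, TAG_CONTACTS);
    assert(flow_allowed(TAG_LOCATION, high));   /* low -> high: allowed  */
    assert(!flow_allowed(high, TAG_LOCATION));  /* high -> low: rejected */
}
```

With this encoding, non-interference would mean that `flow_allowed` holds for every flow the system performs.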

Information flow

• Building blocks

– Storage objects (information receptacles)

– Processes (move information to/from objects)

• Tracking information

– Tag (or label) describes information sensitivity

– Each storage object is assigned a tag

– Need to update tags as processes execute

Information flow

• Issue 1: precision

• Say that the storage object is an address space

– If process P reads sensitive data item D

– P’s entire address space is tagged

• What must we assume about any of P’s outputs?

– Must assume that they contain sensitive information

• Which processes are allowed to communicate with P?

– Other processes that are allowed to read D

• Why is this problematic?

– Probably want P to communicate with processes that can’t access D

– Hard to do anything useful otherwise
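The precision problem can be made concrete with a small sketch of process-grained tagging. The `struct process` layout, the `clearance` field, and the function names are hypothetical, chosen only to mirror the P and D of the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tag_t;   /* hypothetical bitmask of sensitivity tags */

struct process {
    tag_t label;      /* tag on the whole address space */
    tag_t clearance;  /* tags this process is allowed to read */
};

/* Reading an object taints the entire address space of the reader. */
bool proc_read(struct process *p, tag_t object_tag) {
    if ((object_tag & ~p->clearance) != 0) {
        return false;             /* not allowed to read this object */
    }
    p->label |= object_tag;
    return true;
}

/* Every output of `from` is assumed to carry from->label, so the
 * receiver must be cleared for all of it. */
bool proc_can_send(const struct process *from, const struct process *to) {
    return (from->label & ~to->clearance) == 0;
}

void process_demo(void) {
    const tag_t TAG_D = 1u;               /* tag of data item D */
    struct process p = { 0, TAG_D };      /* P may read D */
    struct process q = { 0, 0 };          /* Q may not    */

    assert(proc_can_send(&p, &q));        /* fine before P reads D */
    assert(proc_read(&p, TAG_D));
    assert(!proc_can_send(&p, &q));       /* now every message to Q is blocked */
}
```

After one read of D, P can no longer talk to Q at all, even about data unrelated to D, which is the imprecision the slide describes.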

Information flow





SSH client (sends uid/pw)

    accept uid/pw;
    if (pw not in file) {
        return error;
    } else {
        fork/exec shell;
    }

Password file

Information flow

• How do you solve this?

– The code that checks the password has read the password file, so everything it outputs must be assumed sensitive

– Often use a trusted “declassifier”

– Small piece of code trusted to remove tags
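A declassifier for the login example might look roughly like the sketch below. The `tagged_bool` type, the tag constant, and both function names are hypothetical. The point is only that a single small, trusted function is allowed to clear the tag, and that it releases one bit of information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t tag_t;
#define TAG_PASSWORD_FILE ((tag_t)1u)   /* hypothetical tag */

struct tagged_bool {
    bool  value;
    tag_t tag;
};

/* Untrusted login logic: the result depends on the password file, so it
 * carries the file's tag. (A real check would compare hashes in
 * constant time; strcmp keeps the sketch short.) */
struct tagged_bool check_login(const char *pw, const char *stored_pw) {
    struct tagged_bool r = { strcmp(pw, stored_pw) == 0, TAG_PASSWORD_FILE };
    return r;
}

/* Trusted declassifier: the only code allowed to strip the tag. It
 * releases a single bit (success or failure) and nothing else. */
struct tagged_bool declassify_login_result(struct tagged_bool r) {
    assert(r.tag == TAG_PASSWORD_FILE);  /* only handles this one tag */
    r.tag = 0;
    return r;
}
```

Only the declassified result would be allowed to reach the untagged SSH client. The rest of the login code stays tagged.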

Information flow





• What else could we do to improve precision?

– Use finer-grained storage objects

– Tag program variables or memory words

• What are the implications for performance?

– Have to update tags much more frequently

– i.e., every time an instruction executes

– Can introduce a lot of overhead

Tracking explicit flows

• Propagate taint tags with data flows

    c ← a op b          taint(c) ← taint(a) ∪ taint(b)

    setTaint(a, t)      taint(a) ← {t}
    c = a + b           taint(c) ← {t} ∪ {} = {t}
    Send(c, foo.net)

Can foo.net see a?
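The propagation rule can be written out as a minimal sketch in which every value carries a taint bit set. The `tvar` struct and the function names are illustrative assumptions, not TaintDroid's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t taint_t;   /* hypothetical: one bit per taint tag */

struct tvar {
    int     val;
    taint_t taint;
};

/* setTaint(a, t): taint(a) <- {t} */
void set_taint(struct tvar *a, taint_t t) {
    a->taint = t;
}

/* c <- a + b: taint(c) <- taint(a) U taint(b) */
struct tvar tadd(struct tvar a, struct tvar b) {
    struct tvar c = { a.val + b.val, a.taint | b.taint };
    return c;
}

/* Check at the sink: only untainted values may leave. */
bool may_send(struct tvar v) {
    return v.taint == 0;
}

void explicit_flow_demo(void) {
    const taint_t t = 1u;
    struct tvar a = { 11, 0 }, b = { 5, 0 };
    set_taint(&a, t);
    struct tvar c = tadd(a, b);
    assert(c.taint == t);    /* {t} U {} = {t} */
    assert(!may_send(c));    /* Send(c, foo.net) would be flagged */
}
```

Because `c` inherits `a`'s tag, the check at the sink can tell that sending `c` to foo.net would reveal information about `a`.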

Information flow

• Issue 2: explicit vs implicit flows

• Two ways to propagate information

– Explicitly := direct transfer from one object to another

– Implicitly := indirect transfer usually via control flow

// a is sensitive
int foo (int a) {
    int b, w, x, y, z;
    a = 11;
    b = 5;
    w = a * 2;
    x = b + 1;
    y = w + 1;
    z = x + y;
    print (z);
}

Each line is an explicit flow from source operands to destination operand

Information flow





Very easy to implement: just interpose on each instruction to update each var’s tag

Information flow





Where is the implicit flow?

// a is sensitive
void foo (int a) {
    int x, y;
    if (a > 10) {
        x = 1;
    }
    y = 10;
    print (x);
    print (y);
}

Information flow





How would you update x’s tag?



Information flow





What is tricky about this code?

// a is sensitive
void foo (int a) {
    int x, y;
    if (a > 10) {
        x = 1;
    } else {
        y = 10;
    }
    print (x);
    print (y);
}
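One conservative way to handle a branch on sensitive `a` that assigns `x` in one arm and `y` in the other is to tag everything written in either arm with the condition's tag. The sketch below assumes a hypothetical `tvar` type and hand-written tracking. A real tracker would need static or dynamic analysis to find the set of variables each arm can write.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t taint_t;

struct tvar {
    int     val;
    taint_t taint;
};

/* Tracked version of the branch: x is written in one arm, y in the other. */
void foo_tracked(struct tvar a, struct tvar *x, struct tvar *y) {
    taint_t pc = a.taint;   /* taint of the branch condition */

    if (a.val > 10) {
        x->val = 1;
    } else {
        y->val = 10;
    }

    /* Conservative rule: everything assigned in *either* arm depends on
     * the condition, including the variable the untaken arm left alone. */
    x->taint |= pc;
    y->taint |= pc;
}

void implicit_flow_demo(void) {
    struct tvar a = { 11, 1u };               /* sensitive input */
    struct tvar x = { 0, 0 }, y = { 0, 0 };
    foo_tracked(a, &x, &y);
    assert(x.taint == 1u);
    assert(y.taint == 1u);   /* y was not written, yet it reveals a > 10 */
}
```

The tricky part is the untaken arm: `y` never changes at runtime, but observing that it did not change still tells you which way the branch went.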

Information flow





What is trickier about this code?

// a is sensitive
void foo (int a) {
    int x, y;
    if (a > 10) {
        baz (&x);
    } else {
        bar (&y);
    }
    print (x);
    print (y);
}

Information flow





Where is the implicit flow here?

// a is sensitive
void foo (int a) {
    int x, y;
    if (a > 10) {
        exit(0);
    } else {
        exit(1);
    }
    y = 10;
    print (x);
    print (y);
}

Information flow





How would you track this?

// a is sensitive
void foo (int a) {
    int x, y;
    if (a > 10) {
        exit(0);
    } else {
        exit(1);
    }
    y = 10;
    print (x);
    print (y);
}

Hidden channels

• Get system to communicate in unintended ways

• Example: tenex (supposedly secure OS)

– Created a team to break in

– Team had all passwords within 48 hours … oops.

    for (i=0; i<8; i++) {
        if (input[i] != password[i]) {
            break;
        }
    }

– Goal: require 256^8 tries to see if password is right

Hidden channels: tenex

Password checker

    for (i=0; i<8; i++) {
        if (input[i] != password[i]) {
            break;
        }
    }

• How to break?

– The user passes in the input buffer, and virtual-memory faults are visible to the user

– Specially arrange the input’s layout in memory (e.g., first character at the end of one page, the rest on the next page)

– Force a page fault if the second character is read

– If you get a fault, the first character was right

– Repeat for the third, fourth, … eighth character

• Can find the password in at most 256*8 = 2,048 tries
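The attack can be simulated without real page faults. In the sketch below, the hypothetical `touched` output parameter stands in for the side channel: it reports how many input bytes the checker read, which is what the attacker learns from observing a fault. Function names and the `PW_LEN` constant are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PW_LEN 8

/* Stand-in for the vulnerable checker. *touched models the page-fault
 * side channel: the number of input bytes the loop read. */
static int check_password(const unsigned char *input,
                          const unsigned char *password,
                          size_t *touched) {
    for (size_t i = 0; i < PW_LEN; i++) {
        *touched = i + 1;
        if (input[i] != password[i]) {
            return 0;
        }
    }
    return 1;
}

/* Recovers the password one byte at a time and returns the number of
 * checker calls used. */
unsigned long recover_password(const unsigned char *password,
                               unsigned char *guess) {
    assert(password != NULL && guess != NULL);
    unsigned long tries = 0;
    memset(guess, 0, PW_LEN);

    for (size_t pos = 0; pos < PW_LEN; pos++) {
        for (unsigned c = 0; c < 256; c++) {
            size_t touched = 0;
            guess[pos] = (unsigned char)c;
            tries++;
            int ok = check_password(guess, password, &touched);
            /* The checker read past byte pos (a "fault"), or accepted
             * the whole input: byte pos must be right. */
            if (ok || touched > pos + 1) {
                break;
            }
        }
    }
    return tries;
}
```

Each position costs at most 256 guesses, so the total is bounded by 256*8 rather than 256^8.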

Course administration

• Project proposals

– Due today (ok if you send it to me by Monday)

– Guidelines in the syllabus

– One page should be fine

• Amount of work

– Two-three weeks of effort

– Focus on answering one interesting question

Sensors: rich, personal data.

Cloud: large-scale analysis, collection, dissemination.

Mobile: present at work, home, and play.

[Figure: login form with username me@gmail.com and a masked password]

App-centric operating systems

• Apps access sensitive information in many contexts

– Location, images, and communication

– Home, work, and play

• Apps run on behalf of many stakeholders

– Users, services, developers, platform providers, advertisers

Monitoring app behavior

Permissions are coarse.

No insight into what is collected and by whom.

http://blog.mylookout.com/2010/07/mobile-application-analysis-blackhat/

Consumer: “Why is my wallpaper app sending my phone number to China?”

Enterprise: “Who is collecting information about our workers?”

Wider interest in the issue

http://online.wsj.com/article/SB20001424052748703806304576242923804770968.html

Emerging malware threat

[Figures: “New mobile malware” (source 1) and “New mobile malware family or variant” (source 2)]

1 McAfee Threats Report: Q1 2012 - http://www.mcafee.com/us/resources/reports/rp-quarterly-threat-q1-2012.pdf

2 F-Secure Mobile Threat Report Q1 2012 - http://www.f-secure.com/weblog/archives/MobileThreatReport_Q1_2012.pdf

Where does data go after you grant access?

Monitoring goals

• Monitor where apps send data

– What happens after you grant access?

– Is observed behavior expected?

• Monitor apps at runtime

– Want users to monitor their own apps

– Must balance accuracy and efficiency

• Solution: TaintDroid

– Original collaboration with Penn State, Intel

– Will Enck (NCSU), Jaeyeon Jung (MSR), others

Taint tracking

• TaintDroid: system-wide taint tracking for Android

– Records “explicit” data dependencies via taint tags

– Does not capture “implicit” data dependencies

[Figure: login form with username me@gmail.com and a masked password, annotated with three steps]

– Tag data as it enters the app

– Track how information propagates

– Check tags of emitted data

Taint tracking

• Key issues for tag propagation

– How are tags stored?

– What is the tag-propagation logic?

– Is tracking precise and efficient?

• Project website: http://appanalysis.org

Tag propagation

• Goal: balance precision and efficiency

– Process-grained: fast but imprecise (all outputs tainted)

– Instruction-grained: precise but slow (2-20x overhead)

– Ideal: fast and precise

Multi-level approach

• Variable-level tracking through Dalvik VM (DEX instructions)

• Patch state after native method invocation

• Extend tracking to IPC and file system

[Figure: TaintDroid architecture. Application code in each Dalvik VM gets variable-level tracking; messages (msg) between applications get message-level tracking; native system libraries get method-level tracking; the file system gets file-level tracking; the network is also shown.]

Variable-level tracking

• Tag-propagation logic for Dalvik executables (DEX)

Variable-level tracking

• Modified Dalvik VM

– Store and propagate 32-bit tags

• Local vars and args

– Store tags adjacent to vars on stack

– Correspond to VM registers

– 64-bit vars require two tags

• Class fields

– Store tags inside heap objects

• Arrays

– One tag per array

– Trade precision for efficient storage

• Performance optimizations

– Per-variable tags reduce storage overhead

– Adjacent tags provide spatial locality

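[Figure: modified Dalvik stack frame with a taint tag stored next to each register; SP and FP mark the stack and frame pointers]

    out0
    out0 taint tag
    out1
    out1 taint tag
    (unused)
    VM goop
    v0 == local0
    v0 taint tag
    v1 == local1
    v1 taint tag
    v2 == in0
    …
    v4 taint tag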
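The interleaved layout can be sketched as an array in which register i sits in slot 2*i and its 32-bit tag in slot 2*i + 1. The `struct frame` type, `NUM_REGS`, and the helper names are hypothetical. The real Dalvik frame also holds the VM bookkeeping shown in the figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 4   /* hypothetical frame size */

/* Register i lives in slot 2*i and its 32-bit taint tag in slot
 * 2*i + 1, so a value and its tag are adjacent in memory. */
struct frame {
    uint32_t slots[2 * NUM_REGS];
};

void reg_set(struct frame *f, size_t i, uint32_t val, uint32_t tag) {
    assert(i < NUM_REGS);
    f->slots[2 * i]     = val;
    f->slots[2 * i + 1] = tag;
}

uint32_t reg_val(const struct frame *f, size_t i) {
    assert(i < NUM_REGS);
    return f->slots[2 * i];
}

uint32_t reg_tag(const struct frame *f, size_t i) {
    assert(i < NUM_REGS);
    return f->slots[2 * i + 1];
}

/* move vDst, vSrc: the value and its tag travel together. */
void op_move(struct frame *f, size_t dst, size_t src) {
    reg_set(f, dst, reg_val(f, src), reg_tag(f, src));
}

/* A 64-bit value spans registers i and i+1, so it needs two tag slots. */
void wide_set(struct frame *f, size_t i, uint64_t val, uint32_t tag) {
    reg_set(f, i,     (uint32_t)(val & 0xffffffffu), tag);
    reg_set(f, i + 1, (uint32_t)(val >> 32),         tag);
}
```

Keeping each tag next to its register means a tag update usually touches memory that is already being accessed for the value, which is the spatial-locality argument on the slide.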

Method-grained tracking

• Huge opportunity for performance gains

– JNI code is often CPU intensive

• Challenge for method-grained tracking

– In worst case, must manually reason about side effects

– Luckily, a very simple heuristic works most of the time

    class java.lang.Math {
        public static double cos (double d);
    }


• Tainting heuristic

– “Assign union of arguments’ tags to return value on exit.”

– Most JNI methods have no side effects

– Many JNI methods operate on native types

• When it doesn’t work, use method profiles

– Generic framework for defining argument/retval dependencies

– So far, only needed to define for IBM charset converter

– See paper for more details …
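The heuristic amounts to a thin wrapper around each native call. In this sketch, `native_scale` is a made-up stand-in for untracked JNI code, and `tdouble`, `return_taint`, and `call_native_scale` are likewise hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t taint_t;

struct tdouble {
    double  val;
    taint_t taint;
};

/* Union of the arguments' tags, assigned to the return value on exit. */
taint_t return_taint(const taint_t *arg_taints, size_t nargs) {
    assert(arg_taints != NULL || nargs == 0);
    taint_t t = 0;
    for (size_t i = 0; i < nargs; i++) {
        t |= arg_taints[i];
    }
    return t;
}

/* Stand-in for a side-effect-free native method on primitive types. */
static double native_scale(double d, double k) {
    return d * k;
}

/* The native body runs with no per-instruction tracking; the return
 * value's tag is patched in afterwards. */
struct tdouble call_native_scale(struct tdouble d, struct tdouble k) {
    taint_t args[2] = { d.taint, k.taint };
    struct tdouble r;
    r.val   = native_scale(d.val, k.val);
    r.taint = return_taint(args, 2);
    return r;
}
```

The rule is sound only when the native method has no side effects on other tracked state, which is why methods taking object references need more care.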


• Found 2,844 JNI methods in Android source

– 913 did not use Object references

– Others could induce false negatives

• Third-party JNI is not supported

– Apps must be written entirely in Java

– In a survey of Android Market apps, ~25% used a .so file

– Subject of ongoing research

Evaluation

• Is TaintDroid fast and precise?

– Process-grained: fast but imprecise (all outputs tainted)

– Instruction-grained: precise but slow (2-20x overhead)

– TaintDroid: placed in the fast, precise corner of the chart

Performance evaluation

[Figure: CaffeineMark 3.0 benchmark scores (higher is better) for Android and TaintDroid on sieve, loop, logic, string, float, method, and total; y-axis from 0 to 2000.]

Chart annotations: “20% overhead (extra memory accesses)”, “14% overhead”, “4.4% memory overhead”, “Not shown”

Performance evaluation

Reasons for efficiency

(1) Method-grained tracking of JNI calls

(2) Spatial locality of taint tags

(3) One tag per array

[Figure: CaffeineMark 3.0 benchmark scores (higher is better) for Android and TaintDroid on sieve, loop, logic, string, float, method, and total; y-axis from 0 to 2000.]

App study

• Selected 30 apps from Android Market

– Biased toward popular apps

– Sampled from 12 categories

• App permissions

– Access to Internet

– Access to location, camera, phone state, mic

– No native libraries

• Ran apps manually under TaintDroid

App study

• Of 105 flagged connections, only 37 to expected servers

App study: location

• 15 of 30 apps shared location with ad server

– admob.com, ad.qwapi.com, ads.mobclix.com, data.flurry.com

• Most traffic was plaintext (e.g., AdMob HTTP GET)

    ...&s=a14a4a93f1e4c68&..&t=062A1CB1D476DE85B717D9195A6722A9&d%5Bcoord%5D=47.661227890000006%2C-122.31589477&...

– data.flurry.com used binary format

• In no cases were users informed by EULA

– In one case, app sent location every 30 seconds

App study: phone identifiers

• 7 apps sent device id (IMEI)

• 2 apps sent phone info (Ph. #, IMSI*, ICC-ID)

– Done without informing the user

– One app’s EULA indicated the IMEI was sent

– Another app sent the hash of the IMEI

• Frequency was app-specific

– One sent info every time the phone booted

appanalysis.org

• Source code available at http://appanalysis.org/

– Most recent version is for Android 4.3

• Great platform for research

– Compatible with vast majority of Android apps

– Playground for all kinds of information-flow projects

• Video demo by Peter Gilbert

TaintDroid demo

http://www.youtube.com/watch?v=qnLujX1Dw4Y

Media coverage

Limitations

• Implicit flows

– Fundamentally difficult problem

– Can handle passwords (SpanDex, USENIX Sec)

• Native code

– Ongoing work

– Talk to Ali and Alex …
