Tappan Zee (North) Bridge
Mining Memory Accesses for Introspection

Brendan Dolan-Gavitt, Tim Leek, Josh Hodosh, and Wenke Lee
ACM CCS 11/6/2013

This work is sponsored by the Assistant Secretary of Defense for Research & Engineering under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Photo by joseph a

Background: Introspection
• Often we want to provide isolation for a security-sensitive process
• This isolation limits visibility: a semantic gap that must be bridged
• Lots of existing work on automatically bridging this gap: Virtuoso, VMST

ACM CCS 2013 Tappan Zee (North) Bridge 11/6/2013

Motivation
• Application introspection is hard
• Finding places to hook: also hard
• Neither scenario is supported by existing techniques
• Currently done by manual reverse engineering

Fond Memories
• We focus on a primitive operation used by (nearly) all apps: memory access
• Tapping memory accesses provides a way to find places to hook for OS and application introspection
• Memory accesses provide streams of data that illuminate the inner workings of apps

Tap Points
• Abstraction for “memory access made at point in program”
• Idea: data seen at a given tap point is of the same “type” (e.g., URLs, dmesg)
• Consists of:
  • Caller
  • Program Counter
  • “Program” (CR3)

Context
[Figure: the same memory accesses grouped with increasing amounts of context]
(a) No context: 00a3bdgoogle.comr2ab.tmpa2bc
(b) By function: strcpy sees google.comr2ab.tmp; memcpy sees 00a3bda2bc
(c) By function and caller: strcpy←open_file sees r2ab.tmp; strcpy←open_url sees google.com; memcpy sees 00a3bda2bc

Sample Tap Points

Tap (Caller, PC, Program)      Content read       Content written
00646517  0064A423  Kernel                        00FFABED
00646517  0064A424  Kernel     00123456           00123456
00646517  0064A427  Kernel                        00ABCDEF
00646517  0064A428  Kernel                        0064A42D
0064A42D  00430E13  Kernel     \Device\Harddisk   \Device\Harddisk
0064A42D  00430E15  Kernel     00430F3C

Code
0064A423  push ebx
0064A424  push [ebp+var_28]
0064A427  push esi
0064A428  call _memcpy

_memcpy:
[...]
00430E08  shr ecx, 2
00430E0B  and edx, 3
00430E0E  cmp ecx, 8
00430E11  jb short loc_430E3C
00430E13  rep movsd
00430E15  jmp off_430F2C[edx*4]
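The tap point abstraction (caller, program counter, CR3) can be illustrated with a small Python sketch. This is only an illustration, not the TZB implementation: the `TapPoint` type, the `group_by_tap_point` helper, and all addresses in the trace are hypothetical, chosen to mirror the strcpy←open_url / strcpy←open_file example from the Context slide.

```python
from collections import defaultdict
from typing import NamedTuple


class TapPoint(NamedTuple):
    """Hypothetical key for a tap point, following the slide's three parts."""
    caller: int  # address identifying the calling context
    pc: int      # program counter of the instruction making the access
    cr3: int     # address-space identifier (the "program")


def group_by_tap_point(accesses):
    """Concatenate the bytes seen at each tap point into one stream per key."""
    streams = defaultdict(bytearray)
    for caller, pc, cr3, data in accesses:
        streams[TapPoint(caller, pc, cr3)] += data
    return streams


# Made-up addresses: one write instruction inside strcpy, reached from two callers.
STRCPY_PC, OPEN_URL, OPEN_FILE, CR3 = 0x1000, 0x2000, 0x3000, 0x5000

accesses = [
    (OPEN_URL, STRCPY_PC, CR3, b"google"),
    (OPEN_FILE, STRCPY_PC, CR3, b"r2ab"),
    (OPEN_URL, STRCPY_PC, CR3, b".com"),
    (OPEN_FILE, STRCPY_PC, CR3, b".tmp"),
]

streams = group_by_tap_point(accesses)
url_stream = bytes(streams[TapPoint(OPEN_URL, STRCPY_PC, CR3)])
file_stream = bytes(streams[TapPoint(OPEN_FILE, STRCPY_PC, CR3)])
```

Keying only on the program counter would interleave the URL and the filename in a single stream; including the caller in the key is what separates them.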
Finding Tap Points
• Millions of tap points in even short slices of system execution
• Gigabytes of data generated
• Need effective, efficient ways of finding useful tap points

Monitoring Memory Accesses
• Can instrument QEMU to log all memory accesses, but this is slow
• To get around this we use whole-system record and replay
• Implemented in our open source dynamic analysis framework (PANDA)

General Strategy
• Identify behavior of interest (“URL access”)
• Training: Create recording in which behavior occurs (“visit google.com”)
• Search: Replay recording with instrumentation and look for tap point with desired content

Search Strategies
• Known knowns
• Known unknowns
• Unknown unknowns

Known Knowns: String Searching
• Can be efficiently implemented using one counter per tap point per search string
• Millions of tap points can be searched with a few megabytes of memory
• We found: URLs, filenames, window titles, SSL/TLS master keys

Known Knowns: TLS/SSL Keys
• If exact key is not known in advance (e.g., malicious server) we cannot string search
• However, we can do trial decryption on all 48-byte strings seen at tap points
• TLS MAC allows us to verify decryption

Known Unknowns: Information Retrieval
• Given some training examples, find tap points containing “similar” data
• We compute bigram byte statistics for each tap point & training examples
• Sort by Jensen-Shannon divergence (similar to mutual information / Kullback-Leibler divergence)

[Background image: excerpt from the paper, Section 5.4, Statistical Search and Clustering. The neighboring column, which describes the callstack plugin, is truncated and not recoverable. The legible column reads:]

Collecting bigram statistics on data that passes through each tap point is an efficient way to enable “fuzzy” search based on some training examples, as well as enabling clustering. To implement this we collect bigram statistics for all tap points seen in execution, as well as for the exemplar; the data seen at each tap point is thus represented as a sparse vector with 65,536 elements (one for each possible pair of bytes).

To search, we can then sort the tap points seen by taking the distance (according to some metric) from the exemplar. For our metric, we have chosen to use Jensen-Shannon divergence [18], which is a smoothed and symmetrized version of the classic Kullback-Leibler divergence [16] (also known as information gain). We also examined the Euclidean and cosine distance metrics, but found their performance to be consistently worse. Jensen-Shannon divergence between probability distributions P and Q is defined as:

$$\mathrm{JSD}(P, Q) = H\left(\frac{P + Q}{2}\right) - \frac{H(P) + H(Q)}{2}$$

where H is Shannon entropy.

Bigram collection is done by maintaining, for each [excerpt truncated]

Finding dmesg (training)
[Figure: content not recoverable]

dmesg
[Figure panels: FreeBSD, MINIX, Haiku, Linux]

Unknown Unknowns: Clustering
• Group tap points containing “similar” data together
• Algorithm: K-means with Jensen-Shannon as distance metric
• Unsupervised learning: no training

FreeBSD Boot Cluster
[Figure: sample data from the tap points grouped into each cluster, e.g. disk labels such as gptid/... and ufsid/..., executable paths such as /sbin/sysctl and /bin/ps, and kernel configuration text]
Clustering Evaluation
• Manually labelled ~3000 tap points
• Compared human labels to clustering
• Very poor agreement
• Humans said most things were “binary pattern”

TZB is General
• Four introspection tasks: URL access, file access, SSL/TLS keys, boot messages
• Five operating systems: Windows 7, Linux, FreeBSD, MINIX, and Haiku
• Two architectures: x86_64, ARM
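The string search behind the URL and file-access tasks keeps one counter per tap point per search string. A minimal Python sketch of that bookkeeping follows. The `StringSearcher` class and its method names are hypothetical, and the reset rule is an assumption about how a single counter would be used. A single counter is not a full Knuth-Morris-Pratt automaton, so a match that begins inside a failed partial match can be missed.

```python
from collections import defaultdict


class StringSearcher:
    """Streaming search for fixed byte strings across many tap points."""

    def __init__(self, needles):
        # Search strings must be non-empty.
        self.needles = [bytes(n) for n in needles]
        # progress[tap][i] = how many leading bytes of needles[i] have matched.
        self.progress = defaultdict(lambda: [0] * len(self.needles))
        self.hits = set()

    def on_access(self, tap, data: bytes) -> None:
        """Feed the bytes of one memory access made at the given tap point."""
        counters = self.progress[tap]
        for byte in data:
            for i, needle in enumerate(self.needles):
                if byte == needle[counters[i]]:
                    counters[i] += 1
                else:
                    # Restart, allowing this byte to begin a new match.
                    counters[i] = 1 if byte == needle[0] else 0
                if counters[i] == len(needle):
                    self.hits.add((tap, needle))
                    counters[i] = 0


# Hypothetical tap point keys (caller, pc, cr3); the addresses are made up.
url_tap = (0x2000, 0x1000, 0x5000)
file_tap = (0x3000, 0x1000, 0x5000)

searcher = StringSearcher([b"google.com"])
searcher.on_access(url_tap, b"http://goo")   # match may span several accesses
searcher.on_access(file_tap, b"r2ab.tmp")
searcher.on_access(url_tap, b"gle.com/")
```

Because the state per tap point is one small integer per search string, memory use grows only with the number of tap points seen, not with the volume of data passing through them.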
Limitations
• Need to know data encoding
• One caller may not be enough context (or may be too much!)
• Contents may be written (temporally) out of order

Future Work
• Clustering results are suggestive, but more investigation is needed
• Application to memory forensics: online vs. offline
• Tap points in JITed code

Summary
• TZB locates tap points: places to interpose for extracting security-relevant information
• Analysis to find tap points is heavyweight, but is automated and only needs to be done once, offline
• Improves on state of the art: expensive and time-consuming manual reverse engineering

Thanks for Listening
• All code related to this project is available at https://github.com/moyix/panda
• Contact: brendan@cc.gatech.edu http://cc.gatech.edu/~brendan/
• Questions?

Performance
• Depends on analysis
• String search: ~10x
• SSL/TLS: ~500x