IEEE Paper Template in A4 (V1) - Academic Science,International

advertisement
Exposing Cryptographic Operations from Malware
Binaries using Dynamic Information Analysis and
Avalanche Effect
Anitha Abraham
Final Year M.Tech Student,
School of computer Sciences,
Mahatma Gandhi University
Varghese Paul
Associate Professor
Department of IT
CUSAT
Abstract—Malwares are often used to disrupt computer
operation, gather sensitive information, or gain access to private
computer systems. Nowadays malwares are becoming
increasingly stealthy. Cryptographic algorithms are used by
most of modern malwares to protect themselves from being
analyzed. The use of cryptographic algorithms and truly
transient cryptographic secrets inside the malware binary
imposes a key obstacle to effective malware analysis and defense.
The proposed malware analysis framework can be used for
effective malware analysis and defence. It can automatically
identify and recover the cryptographic operations and transient
secrets from the execution of binary executables. This framework
is based on the avalanche effect, which is a desirable property of
cryptographic functions. The avalanche effect means that, a
slight change in the input causes a significant change in the
output, ie flipping a single bit in the input changes half of the
output bits. Another feature of the avalanche effect is that it
allows us to accurately pinpoint the location, size and boundary
of both the input and output buffers. Using avalanche effect, this
framework is able to accurately pinpoint the boundary of
cryptographic operation and recover truly transient
cryptographic secrets that only exist in memory for one instant in
between multiple nested cryptographic operations.
Keywords—Binary analysis, Cryptographic Operations, Key
Recovery, Transient secret, Avalanche effect.
I. INTRODUCTION
Malware analysis and reverse engineering seek to
understand the inner workings of malware, which are
invaluable to defending against malware. Cryptographic
algorithms are used by most of modern malwares to protect
themselves from being analyzed. Inorder to prevent the
cryptographic secrets from being recovered by key searching
tools, modern malware can make the cryptographic secrets
truly transient in memory by encrypting or destroying the
secrets right after using them at run-time. The use of
cryptographic algorithms and truly transient cryptographic
secrets inside the malware binary executable imposes a key
obstacle to effective malware analysis and defense.
Different methods [2], [3] have been proposed to
automatically recover the cryptographic keys from process
memory or file. But , these methods require the cryptographic
keys to be stored in plaintext form. These methods are not
effective when the cryptographic keys are stored encrypted or
transient.
Chinju.K
Assistant Professor,
School of Computer Sciences,
Mahatma Gandhi University
Recently different binary analysis approaches [5],[6],[7]
are proposed. These approaches are able to automatically
detect the existence of cryptographic operations from a given
binary execution. These binary analysis approaches are based
on instruction profiling and signature of particular
cryptographic implementations. However, they are not
effective in the presence of multiple rounds of cryptographic
operations.
In order to recover such transient cryptographic secrets,
one needs to reliably identify exactly when and where those
transient cryptographic secrets will be in memory. This
requires one to accurately pinpoint the boundary of each of the
multiple rounds of cryptographic operations. Existing binary
analysis approaches could not accurately pinpoint the
boundary between multiple rounds of cryptographic
operations. Existing methods also fail to recover truly
transient cryptographic secrets from the execution of a given
binary executable.
Malware binary analysis framework can be used to
accurately pinpoint the boundary of individual cryptographic
operation from multiple rounds of cryptographic operations. It
can also recover truly transient secrets from the execution of a
binary executable. It is build upon avalanche effect, which
refers to the desirable property of cryptographic functions.
Avalanche effect means that one bit change in the input or key
would cause significant change in the output.
II. LITERATURE REVIEW
An exhaustive literature survey has been conducted
to identify related research works conducted in this area.
Abstracts of some of the most relevant research works are
included below.
 ReFormat: Automatic Reverse Engineering of Encrypted
Messages
ReFormat, is a system that is used to derive the message
format even when the message is encrypted. An encrypted
input message is assumed to go through two phases, message
decryption and normal protocol processing. Data lifetime
analysis of run-time buffers can pinpoint the memory
locations that contain the decrypted message generated from
the first phase and are later accessed in the second phase. It is
designed to handle applications where there exists a single
boundary between decryption and normal protocol processing.
However, multiple such boundaries may exist. It was not
designed to identify the buffers holding the unencoded data
before encoding. However it is not effective in the presence of
multiple rounds of cryptographic operations.
 Dispatcher:Enabling Active Botnet Infiltration using
Automatic Protocol Reverse engineering
Dispatcher is used to extract the format of protocol
messages sent by an application. It makes a forward pass over
the input execution trace and replicates the callstack of the
application. It then computes the read set to identify the
buffers holding the unencrypted data before encryption. Read
set includes the set of locations read inside the encryption
routine before being written. It also computes the write set to
identify the buffers holding the unencrypted data after
decryption. Write set is the set of locations written inside the
decryption routine and read later in the trace. However these
methods are not effective in the presence of multiple rounds of
cryptographic operations.
C. Automated Identification of Cryptographic Primitives in
Binary Programs
This system includes methods to identify the cryptographic
primitives within a given binary program in an automated
way. Fine-grained dynamic binary analysis is performed and
the collected information is used as input for several heuristics
that characterize unique aspects of cryptographic code. These
methods can successfully extract cryptographic keys from a
given malware binary. In the first stage, it performs finegrained binary instrumentation, and the second stage
implements several heuristics to identify cryptographic code
from the data collected by the first stage. But this method was
not effective in the presence of multiple nested cryptographic
operations.
III. MALWARE ANALYSIS FRAMEWORK
A. Goals and Assumptions
The goal is to uncover the cryptographic operations and
their secrets from the execution of a binary executable.
Specifically, this framework can determine if there is any
cryptographic operation in its execution. If present it can
further pinpoint the location of all the cryptographic functions,
their respective mode and the order of execution. It can also
pinpoint the location, size and boundary of the input and the
output buffers used by each cryptographic function identified.
It also determines exactly when the input and the output of
each cryptographic function will be at which buffers. This
enables us to recover those truly transient input and output of
each cryptographic operation that will be immediately
destroyed or re-encrypted after run-time use. It can also
determine if there is any key used in each cryptographic
operation. If present it can recover the key even if it will be
destroyed right after run-time use.
Malware binary analysis framework is assumed to monitor
the execution of the binary executable. It is also assumed that
the input and the output of any called cryptographic functions
reside in some continuous memory buffer at run-time.
B. The Principle of Malware Analysis Framework
Malware analysis framework is designed upon the
avalanche effect, which refers to the desirable property of all
cryptographic algorithms. A slight change in the input would
cause significant changes in the output. This property is
referred to as avalanche effect. Cryptographic functions are
designed to exhibit the avalanche effect. If no cryptographic
operations are present, it will not be having the avalanche
effect. Hence, the avalanche effect is a fairly unique and
defining characteristic of all good cryptographic functions.
This enables us to reliably identify the cryptographic
operations from executables.
Avalanche effect also allows us to accurately pinpoint the
location, size and boundary of both the input and output
buffers. Changing different bits in the input buffer results
different bits changed in the output buffer. It is also seen that
the changed bits are cohesive within the fixed output buffer.
Let y≥ 1 be the number of bytes of the output of the
cryptographic function. Because of the presence of avalanche
effect, any single bit change in the input would cause 4y bits
changed in the output. Those changed bits in the output are
said to be “touched” by the bit change in the input.
If we know the location of buffers p and q, we can measure
the impact of buffer p to buffer q by repeatedly changing
different bits in buffer p and comparing the corresponding
results in buffer q. Since the location of the input and output
buffers is unknown, this method is not practical. Different
runs of an executable may generate different outputs with the
same input. Therefore, it is desirable to measure the impact
between any two chosen buffers with only one run of a given
binary executable.
If any bit in memory impacts any other bit in memory
during the binary execution, then there must exist information
flow between those two bits. If we can track the information
flow from a given source, we could measure the impact
between memory buffers and detect the avalanche effect with
a single run of the binary executable.
C. Overall Architecture
Fig.1 Overall Architecture
This framework is build upon dynamic binary
analysis (DBA). Figure 1 shows the overall architecture of the
framework. The framework dynamically intercepts and
instruments the runtime instructions of the binary executable.
It also collects valuable run-time information about the
binary’s execution, tracks and records the address and value
of the bytes involved in the taint propagation. It further
analyzes the recorded runtime information and checks the
patterns of instruction execution and memory access for any
avalanche effect. Based on the detected avalanche effect
patterns, it identifies cryptographic operations and determines
the exact location, size and boundary of the input buffer, the
output buffer and any key buffer involved in each of the
identified cryptographic operations.
D. Avalanche Effect Identification
If we know the taint sources and we are able to track the
information flow across cryptographic functions, we can
identify if any part of the information flow from the taint
source exhibits any avalanche effect. Since the location and
size of the input buffer of the cryptographic operation is
unknown, it becomes a major challenge in identifying the
avalanche effect. Inorder to identify the input buffer we need
to taint any memory region and the input sources which could
contain the input buffer of the cryptographic operation. The
monitored execution of the binary must be checked
periodically to identify the presence of avalanche effect.
The execution history is viewed as a sequence of routine
invocations ordered by their completion time. For each routine
invocation, it constructs a set which contains those tainted
buffers that have been updated during that invocation or are
still alive when that invocation completes. The collected run
time information contains the snapshot of the value of tainted
buffers. This framework will be able to detect the presence of
cryptographic operations even if the cryptographic input or
key have been destroyed right after using inside any function
invocation .
For each set, we take every continuous buffer in it as the
potential input buffer, and every continuous buffer in the
remaining sets as the potential output buffer. If there exists
avalanche effect from one buffer to another, then every byte in
first buffer would taint every byte in second buffer. A metric
named taint contribution rate is used to quantitatively measure
avalanche effect between buffers. It is denoted as
C(buffer1,x,buffer2,y), which is defined as the portion of the
first y bytes of buffer2 that would be tainted by every byte
from the first x bytes of buffer1.
This metric will be 100% if and only if there exists the
avalanche effect from the first x bytes of buffer a to the first y
bytes of buffer b. By trying different values for the four
parameters, we can eventually find the right values of them
that make C(buffer1,x,buffer2,y) ≈ 100%. This allows us to
accurately pinpoint the location, size and boundary of both the
input and the output buffers of cryptographic functions. While
searching for avalanche effect, we only need to consider those
bytes that have taint relationship and aggregate them together.
Due to the nature of the avalanche effect, we are able to
discover the exact range of the input buffer and output buffer
even if part of buffers are not involved in the cryptographic
operation.
E. Empirical Evaluation
A prototype have been developed using the
dynamic instrumentation framework Valgrind in Linux.
Dynamic information analysis is done using the tool
Taintgrind. A test program which reads the plaintext from a
disk file and processes it with multiple rounds of block cipher
encryption and decryption followed by a final round of hash
was designed to evaluate the framework's capability in
detecting the cryptographic operations . The test program
performed blowfish encryption followed by AES encryption
and decryption followed by blowfish decryption and sha1
hash on the plaintext.
The two block ciphers use different secret keys. The
plaintext is given as the input for blowfish encryption. The
subsequent steps take the output of the previous step as input.
Taintgrind performs bit level dynamic taint analysis. Taint
propagation calculations are performed for each valuecreating memory or register operation. It traces the flow of
tainted, or marked, input data through an application during
execution and logs the traversal of conditional jumps and
system calls. Bit precise taint propagation allows taintedness
to propagate into bitfields and bit arrays creating a more
accurate view of the impact input has on an application’s
execution.
Table3.1 Information collected from test program
Table3.2 Information collected by the proposed framework
Table 3.1 shows the information about each block cipher
and SHA-1 hash collected by test program 1, which will be
used as the ground truth in validating the framework's analysis
results. Table 3.2 shows the analysis output by proposed
malware binary analysis framework. The runtime execution
information of the block cipher operation includes the entry
point address of the routine who is the first to see the complete
output buffer of the cryptographic operation at its execution
end. By checking OpenSSL’s crypto library’s symbol table,
we can find the routine symbol names which correspond to the
runtime environment addresses. The runtime environment
information reported by the framework matches the ground
truth, which validates the accuracy of framework’s analysis
result
IV. CONCLUSION
This framework is based on the defining characteristic of
all good cryptographic algorithms, avalanche effect. This
framework has been shown to be able to detect block cipher
and hash operations and pinpoint exactly when and where the
cryptographic input, output, and key will be in memory even
if they exist only for a few microseconds. If the execution can
be monitored, current software implementations achieve
hardly any secrecy.
REFERENCES

Xin Li, Xinyuan Wang, Wentao Chang, “CipherXRay: Exposing
Cryptographic Operations and Transient Secrets from Monitored






Binary Execution” IEEE Transactions on Dependable and Secure
computing, vol. 11, no. 2, March/April 2014
A. Shamir and N. van Someren. Playing Hide and Seek with Stored
Keys. In Proceedings of the Third International Conference on
Financial Cryptography (FC 1999), pages 118 – 124, February 1999.
T. Pettersson. Cryptographic Key Recoveryfrom Linux Memory
Dumps.In Presentation, Chaos Communication Camp, August 2007.
J. A. Halderman, S. D. Schoen, N. Heninger, W. Clarkson, W. Paul, J.
A. Calandrino, A. J. Feldman, J. Appelbaum, and E. W. Felten. Lest
We Remember: Cold Boot Attacks on Encryption Keys. In
Proceedings of the 17th USENIX Security Symposium, pages 45–60.
USENIX, August 2008.
Z. Wang, X. Jiang, W. Cui, X. Wang, and M. Grace. ReFormat:
Automatic Reverse Engineering of Encrypted Messages. In
Proceedings of the 14th European Symposium on Research in
Computer Security (ESORICS 2009), pages 200–215, September 2009.
F. Gr¨obert, C. Willems, and T. Holz. Automated Identification of
Cryptographic Primitives in Binary Programs. In Proceedings of the
14th International Symposium on Recent Advances in Intrusion
Detection (RAID 2011), September 2011.
J. Caballero, P. Poosankam, C. Kreibich, and D. Song. Dispatcher:
Enabling Active Botnet Infiltration using Automatic Protocol
Reverseengineering. In Proceedings of the 16th ACM Conference on
Computer and Communications Security (CCS 2009), pages 621–634.
ACM, October 2009.
Download