Using MPI to Break Data Encryption

PROJECT BY: JAMES TOWNSEND
CSE704 SPRING 2011
COMPLETED UNDER DR. RUSS MILLER
Data Security
 Cryptography has been used as far back as Julius Caesar
 Important data cannot be sent in plaintext
 In 1976 a standard was adopted by the NBS (National Bureau of Standards), now NIST
    IBM had been using its Lucifer cipher internally
    Lucifer was adapted into the Data Encryption Standard (DES)
About DES
 DES is a symmetric block cipher, meaning the two communicating parties share a single key
    i.e., the same key both encrypts and decrypts each block
 Messages are encrypted by breaking them into individual 64-bit blocks (8 characters)
 Each block is encrypted with a 56-bit key, stored in 64 bits along with 8 parity bits
 The encrypted message can then be transmitted without exposing the plaintext
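The key layout described above (56 key bits plus 8 parity bits) can be sketched as follows. This is a minimal illustration, assuming the usual DES convention that the low bit of each key byte is an odd-parity bit; the function name `set_odd_parity` is hypothetical and not taken from the project.

```python
def set_odd_parity(key56: int) -> bytes:
    """Expand a 56-bit integer into an 8-byte DES key with odd parity per byte."""
    if not 0 <= key56 < (1 << 56):
        raise ValueError("key must fit in 56 bits")
    out = bytearray()
    for i in range(8):
        # Take 7 key bits at a time, most significant group first.
        seven = (key56 >> (7 * (7 - i))) & 0x7F
        ones = bin(seven).count("1")
        # Choose the parity bit so that the whole byte has an odd number of 1 bits.
        parity = 0 if ones % 2 == 1 else 1
        out.append((seven << 1) | parity)
    return bytes(out)


if __name__ == "__main__":
    print(set_odd_parity(0).hex())
```

Because the parity bits carry no secret information, a brute-force search only has to enumerate the 2^56 values of `key56`.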
Controversy Around DES
 The originally submitted cipher used 128-bit keys
 The NSA reduced the key size to 56 bits and kept the internal design of the substitution boxes (S-boxes) secret
 Some believed this was done so that the NSA could somehow decrypt any DES-encrypted message
 The controversy was not calmed until the internal design of the algorithm was released
 Many still believed it was not secure
Cryptanalysis of DES
 Cryptanalysis is the process of mathematically attacking an algorithm to find weaknesses
    The goal is to discover a relationship between plaintext and ciphertext that allows an attack faster than brute force
 Over 30 years of dedicated work has gone into cryptanalyzing DES with no significant practical results
    Differential cryptanalysis was publicly discovered in 1990
    NSA and IBM had known of it since DES was designed in the 1970s and built DES to be resistant to this attack
Brute Force Attacks
 A brute-force attack searches the entire key-space to find the correct key
 In 1976 such an attack on DES was considered inconceivable
    Even in 1990, estimates were almost 2,500 years for a single computer to brute-force DES
 Proposals were made as early as 1977 that a $20 million machine could brute-force DES in a day
    In 1990, the numbers were down to a $1 million machine that could break it in 7 hours
    None of these machines were publicly built
Distributed.net
 Non-profit organization dedicated to solving large-scale problems
 Created a form of grid computing in which people could volunteer their idle computer cycles to help search the key-space for a reward
 In 1997, the efforts of distributed.net cracked a DES encryption in 96 days
 In 2001, it had an estimated throughput of over 30 teraflops
How They Succeeded
 They used the combined efforts of 78,000 computers
 Users could log on and let their idle cycles be used to try different keys
 The person whose computer found the correct key would receive $4,000 in prize money
 This crowd-sourced style of cracking did much to raise public concern about the security of DES
EFF
 The Electronic Frontier Foundation is a cyberspace civil rights group
 Leading advocates of the need for a new encryption algorithm
 First to publicly build a custom DES-breaking machine
 Used this machine to break a DES-encrypted message in just 56 hours in 1998
How They Succeeded
 They built Deep Crack for just under $250,000
 1,500 “Deep Crack” chips would each search different keys and discard keys that did not produce plausible plaintext
 A head node would periodically retrieve the remaining candidate keys from the chips and run the full decryption with them
 Over 37,000 search units were involved in the first decryption in 1998
    24 search units per Deep Crack chip
 In collaboration with distributed.net, the third DES challenge was solved in just 22.5 hours
How It Worked
 For each key, the nodes would decrypt the first block and check whether the result looked like plaintext
    The decryption returns a 64-bit block
    If it is plaintext, it corresponds to 8 characters of ASCII code
    Normal English text falls into only 69 ASCII values
    Treating that as roughly a quarter of the 256 possible byte values, the odds of a random key returning 8 such bytes are about (1/4)^8 = 1/65,536
 If this check succeeded, the same test was applied to the second block
    The odds of all 16 bytes passing by chance are about 1 in 4 billion (1/65,536 squared)
 Any keys that are still possibilities are returned to a central processor that attempts to decrypt the full text
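The plaintext check described above can be sketched in a few lines. This is only an illustration: the `ALLOWED` alphabet below is a hypothetical choice (the slides do not list the 69 values that were used), and the function names are not from the project.

```python
import string

# Hypothetical alphabet of "plausible text" bytes: letters, digits, space,
# newline, and a little punctuation. The original set of 69 values is not given.
ALLOWED = frozenset(
    (string.ascii_letters + string.digits + " .,;:'\"!?-\n").encode("ascii")
)


def looks_like_text(block: bytes) -> bool:
    """Return True if every byte of a decrypted block is in the allowed set."""
    return all(b in ALLOWED for b in block)


def chance_rate(n_bytes: int, alphabet_size: int) -> float:
    """Probability that n_bytes uniformly random bytes all fall in the alphabet."""
    return (alphabet_size / 256) ** n_bytes


if __name__ == "__main__":
    print(looks_like_text(b"Hello, w"))   # a block that decrypted to text
    print(chance_rate(8, 64))             # one block, quarter-of-a-byte approximation
    print(chance_rate(16, 64))            # two blocks
```

With the quarter-of-a-byte approximation (64 allowed values), one block passes by chance with probability 2^-16 and two blocks with probability 2^-32, which is why so few candidate keys reach the central processor.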
Implementing on the Edge Cluster
 This approach is a natural fit for MPI
 The Edge cluster has the OpenSSL library installed
    It contains many standard encryption algorithms
    The DES Cracker can be extended to test the resilience of many other algorithms
 The key-space is divided among all of the nodes, which report back the possible keys
 To extrapolate, I kept track of the number of keys searched in order to estimate the total time needed
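One way the key-space division could look is sketched below. It is written in plain Python so it runs without an MPI installation; in an actual MPI program `rank` and `n_ranks` would come from the communicator (for example `MPI_Comm_rank` and `MPI_Comm_size`). The function name `key_range` and the contiguous-slice scheme are assumptions, not the project's actual code.

```python
def key_range(rank: int, n_ranks: int, first_key: int, n_keys: int) -> range:
    """Return the contiguous slice of keys assigned to one rank."""
    base, extra = divmod(n_keys, n_ranks)
    # The first `extra` ranks each take one additional key so nothing is skipped.
    start = first_key + rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start, start + size)


if __name__ == "__main__":
    n_ranks = 64
    n_keys = 2 ** 28  # about 268 million keys
    slices = [key_range(r, n_ranks, 0, n_keys) for r in range(n_ranks)]
    print(len(slices[0]), slices[-1].stop)
```

Each rank would then loop over its own slice, apply the plaintext check to every key, and keep a local count of surviving candidates, so no communication is needed until the final tally.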
Results
 Searched 34.3 billion keys
 About one two-millionth of the key-space
 Almost perfect speedup is achieved
 The only communication step is to sum the counted possibilities and confirm that all nodes reported results
 Differences in speedup factors are likely due to load-balancing issues in how the regions are divided
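The speedup values reported for these runs are close to 1, which suggests they are normalized per processor (that is, parallel efficiency); this reading is an assumption. A minimal sketch of both quantities, using made-up timings rather than the measured ones:

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Classic speedup: how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_pes: int) -> float:
    """Speedup divided by the number of processing elements (1.0 is ideal)."""
    return speedup(t_serial, t_parallel) / n_pes


if __name__ == "__main__":
    # Hypothetical timings for illustration only, not the project's data.
    t1, t64 = 240.0, 4.0
    print(speedup(t1, t64), efficiency(t1, t64, 64))
```

Since the only communication is a final sum of candidate counts, efficiency near 1.0 is what one would expect for this kind of search.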
DES Results
[Figure: Time to Run 268 Million Keys. Seconds (0 to 30) versus number of processors (PEs), 8 to 64.]
DES Results
[Figure: Total Keys per Second. Keys/second in millions (0 to 90) versus number of PEs, 1 to 64.]
DES Results
[Figure: Speedup. Speedup (0.92 to 1.01) versus number of processors, 0 to 140.]
Implications
 A single node is capable of searching roughly 1.1 million keys per second
    In comparison, each Deep Crack search unit searched 2.5 million keys per second
    This shows the large difference between running DES in specialized hardware and in software
 However, using 64 PEs, over 80 million keys per second are possible
 Using an entire 1,024-PE Edge partition, roughly 1.2 billion keys could be tested every second
Implications
 Using a completely general-purpose parallel computer, it is possible to approach the key search speeds Deep Crack was able to achieve
 Utilizing an entire Edge partition could crack DES in just 9 months on average
    The Edge partition has a total of just 1,024 PEs, compared to the 37,000 search units on the original Deep Crack machine
 This is still using only non-optimized software versions of the algorithm
    OpenSSL focuses on usability, not efficiency
    Hardware encryption would still take less than half the time of even optimized software encryption
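The kind of extrapolation behind these estimates can be sketched as follows. It assumes a constant key rate and that, on average, the correct key is found after searching half the key-space; the function name `search_years` is illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def search_years(key_bits: int, keys_per_second: float, fraction: float = 1.0) -> float:
    """Years needed to search `fraction` of a 2**key_bits key-space at a fixed rate."""
    return (2 ** key_bits) * fraction / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # DES at 1.2 billion keys/second, expected case (half the key-space).
    print(search_years(56, 1.2e9, fraction=0.5))
    # DES at 1.2 billion keys/second, worst case (entire key-space).
    print(search_years(56, 1.2e9))
```

At exactly 1.2 billion keys per second this gives a little under a year on average, in the same range as the estimate above; the result is sensitive to the assumed rate, so a modestly higher sustained rate shortens it proportionally.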
Introduction of AES
 The Advanced Encryption Standard was brought about largely by the efforts of Deep Crack and Distributed.net
 A new algorithm supporting much larger key sizes (128, 192, or 256 bits) was selected through a public competition
 Much of the controversy that surrounded DES was mitigated by this open process
Implementation on Edge
 The algorithm followed the same concept as the DES Cracker
 AES blocks are twice as many bits (128), so two blocks are even less likely than with DES to be all ASCII by chance
    Returned only 41 possibilities out of 4 trillion keys just by checking the first two blocks
 Keys were harder to determine because only a portion of the key is used per round, but that was accounted for in the process
AES Results on the Edge
[Figure: Time to Run 4.3 Billion Keys. Seconds (0 to 600) versus number of PEs, 8 to 64.]
AES Results on the Edge
[Figure: Total Keys/Second. Keys/second in millions (0 to 70) versus number of PEs, 1 to 64.]
AES Results on the Edge
[Figure: Speedup. Speedup factor (0.94 to 0.96) versus number of PEs, 0 to 70.]
AES Expansion to GPGPUs
 GPGPUs (General-Purpose Graphics Processing Units) offer an exciting opportunity for parallel computing
 They consist of many processing cores, each extremely limited in processing power but streamlined for very fast, simple computations
 Perfect for simple parallel tasks, such as encrypting files with AES
AES Expansion to GPGPUs
 NVIDIA is a leader in scientific computing on GPGPUs
    Opened the CUDA programming platform to developers to run code on its video cards
 Dr. Russ Miller has headed a project to create a supercomputer at the University at Buffalo using NVIDIA cards as the processing power
    Named the MAGIC computer
AES Cracker Implementation
 For simplicity, I used my personal computer with a CUDA-enabled NVIDIA card as the test platform
    Performed on an NVIDIA GeForce 9500 GT
 I was able to find an open-source AES implementation that was suitable as the basis for a similarly styled AES Cracker
 Many optimizations still had to be made to decrypt one block with many keys, as opposed to many blocks with one key
AES Results on GPGPU
 36 million keys per second on a single GPGPU
 By comparison, it took 40 nodes on Edge to reach 37 million keys per second
 Extrapolating the numbers, it would take 2.85x10^23 years for a single card to search the entire keyspace
 The Edge machine would take 1.1x10^22 years, a savings of just one order of magnitude
GPGPU Supercomputers: MAGIC
 The University at Buffalo Cyberinfrastructure Laboratory has an NVIDIA Tesla/Intel Xeon cluster
    Hybrid GPGPU/central processor design
    Hierarchy of Dell PE1950s controlling 15 NVIDIA Tesla S1070s
    Approximately 57.5 TFLOPS
    Total cost of the system was under $100,000
Extension to MAGIC Computer
 NVIDIA GeForce 9500 GT: 134.4 GFLOPS
 NVIDIA Tesla S1070: 4147.2 GFLOPS
    Each of the 15 Tesla S1070 units is roughly 30 times faster than the 9500 GT
 Instead of just 36 million keys/second, MAGIC should be capable of more than 16.7 billion keys/second when scaled by peak GFLOPS
 As GPGPUs become more widespread, speeds will continue to skyrocket while prices fall
    Tesla S2050 units already reach 5152 GFLOPS
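The extrapolation from the single test card to MAGIC can be reproduced as below. It assumes, as the figures above imply, that AES key-search throughput scales linearly with peak GFLOPS; that is a rough approximation rather than a measured result, and the function name is illustrative.

```python
def scaled_key_rate(measured_rate: float, measured_gflops: float,
                    target_gflops: float, n_units: int = 1) -> float:
    """Scale a measured keys/second rate by the ratio of peak GFLOPS."""
    return measured_rate * (target_gflops / measured_gflops) * n_units


if __name__ == "__main__":
    # 36 million keys/s measured on a GeForce 9500 GT (134.4 GFLOPS),
    # scaled to 15 Tesla S1070 units (4147.2 GFLOPS each).
    magic_rate = scaled_key_rate(36e6, 134.4, 4147.2, n_units=15)
    print(f"{magic_rate / 1e9:.2f} billion keys/second")
```

Real throughput would also depend on memory bandwidth and host-to-device coordination, so this number is best read as an upper-end estimate.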
Comparison Between Computers
 The partition of the Edge computer used consists of 128 dual quad-core nodes
    Each node cost upwards of $3,500
    Total machine cost was over $400,000
    More expensive partitions also exist
 A theoretical limit of just over 9,000 GFLOPS for $400,000, compared to MAGIC's 57,500 GFLOPS for just under $100,000
    This shows the real potential of GPGPU supercomputing
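The cost comparison above can be made explicit as GFLOPS per dollar. The sketch uses the rounded figures quoted on the slide; since the Edge cost is a lower bound and the MAGIC cost an upper bound, the resulting ratio is, if anything, an underestimate of MAGIC's advantage.

```python
def gflops_per_dollar(gflops: float, cost_dollars: float) -> float:
    """Peak GFLOPS obtained per dollar spent."""
    return gflops / cost_dollars


if __name__ == "__main__":
    edge = gflops_per_dollar(9000, 400_000)     # Edge partition
    magic = gflops_per_dollar(57500, 100_000)   # MAGIC cluster
    print(f"MAGIC delivers about {magic / edge:.1f}x the peak GFLOPS per dollar")
```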
Conclusion
 DES is certainly no longer secure, as shown by the efforts of Deep Crack and Distributed.net, and the growing role of GPGPUs in the supercomputer market will only weaken it further
 AES is still a very strong algorithm that is completely infeasible to crack by current measures
    Even the MAGIC system would take 6x10^20 years to search the entire keyspace
 Continuing advances in GPGPU supercomputing will make attempts at building an AES cracker more realistic, but none will succeed anytime soon
    It would currently take about 10^21 GPGPUs to reduce the time to crack to within a single year
 Even if the 128-bit key size becomes obsolete, 192- and 256-bit key versions are already in use and can easily be adopted universally
    Each additional key bit doubles the key-space, so these key sizes make a brute-force search exponentially harder
Thanks to Dr. Russ Miller, Kevin Cleary, and Matt Jones for specifications and costs of the CCR systems