Using MPI to Break Data Encryption
PROJECT BY: JAMES TOWNSEND
CSE704 SPRING 2011
COMPLETED UNDER DR. RUSS MILLER

Data Security
Cryptography has been used as far back as Julius Caesar
Important data cannot be sent in plaintext
In 1976 a standard was created by the NBS (National Bureau of Standards), now NIST
IBM was internally using the Lucifer cipher
  Adapted as the Data Encryption Standard (DES)

About DES
DES is a symmetric block cipher, meaning the two communicating parties share a key
  i.e., one key both encrypts and decrypts blocks
Messages are encrypted by breaking them into individual 64-bit blocks (8 characters)
Each block is encrypted with a 56-bit key plus 8 parity bits
The encrypted message can then be transmitted without worry

Controversy Around DES
Original submitted cipher used 128-bit keys
NSA reduced the key size to 56 bits and hid the internal design of the substitution boxes
Some believed they did this so they could somehow decode all of the encryptions
Controversy was not calmed until the release of the internal design of the algorithm
Many still believed it was not secure

Cryptanalysis of DES
Cryptanalysis is the process of mathematically attacking the algorithm to find weaknesses
Goal is to discover a connection between plaintext and ciphertext that allows an attack faster than brute force
Over 30 years of dedicated work has been put into cryptanalyzing DES with no significant results
Differential cryptanalysis was discovered in 1990
  NSA and IBM knew of it 20 years earlier and designed DES to be resistant to this attack

Brute Force Attacks
Process of searching the entire key-space to find the correct key
In 1976 such an attack was inconceivable
Even in 1990, estimates were almost 2500 years for a single computer to brute force DES
Proposals were made as early as 1977 that a $20 million machine could brute force DES in a day
In 1990, the numbers were down to a $1 million machine that could break it in 7 hours
None of these machines were publicly built

Distributed.net
Non-profit organization
dedicated to solving large-scale problems
Created a version of grid computing where people could volunteer their idle computer cycles to help search the key-space for a reward
In 1997, the efforts of distributed.net cracked a DES encryption in 96 days
In 2001, had an estimated throughput of over 30 teraflops

How They Succeeded
They used the combined efforts of 78,000 computers
Users could log on and let their idle cycles be used trying different keys
The person whose computer found the correct key would get $4000 in prize money
Through this crowd-sourcing type of cracking, great strides were made in raising a public outcry

EFF
The Electronic Frontier Foundation is a cyberspace civil rights group
Leading crusaders for the need of a new algorithm
First to publicly implement a custom DES breaker
Used this machine to break a cipher in just 56 hours in 1998

How They Succeeded
They built Deep Crack for just under $250,000
1500 “Deep Crack” chips would each search different keys and eliminate false positives
A head node would periodically retrieve the possibilities from the chips and run the full decryption on them
Over 37,000 search units were involved in the first decryption in 1998
  24 units per Deep Crack chip
In collaboration with distributed.net, just 22.5 hours for the third DES challenge

How It Worked
For each key, the nodes would decrypt the first block and check whether it came out as plaintext
  Returns as a 64-bit block
  If plaintext, it would correspond to 8 characters of ASCII code
Normal English text falls into only 69 ASCII values
Odds of a random key returning 8 bytes of ASCII code are just 1/65536
If this succeeded, try the same with the second block
Odds of all 16 bytes returning as ASCII are just 1 in about 4.3 billion (65536 squared)
Any keys that are still possibilities are returned to a central processor that attempts to decrypt the full text

Implementing on the Edge Cluster
This approach is perfect for MPI
The Edge computer has the OpenSSL library
  Contains many standard encryption techniques
Can
expand DES Cracker to test the survivability of many other algorithms
Divide the key-space among all of the nodes and report back the possible keys
For theoretical purposes, I kept track of the keys searched to estimate the total time needed

Results
Searched 34.3 billion keys
  One 2-millionth of the key-space
Almost perfect speedup is achieved
  Only communication step is to sum up the counted possibilities and make sure all nodes reported results
Differences in speedup factors are likely due to load balancing issues as the regions are divided

DES Results
[Figure: Time to Run 268 Million Keys. Seconds vs. number of PEs (8 to 64)]

DES Results
[Figure: Total Keys per Second. Keys/second (in millions) vs. number of PEs (1 to 64)]

DES Results
[Figure: Speedup. Speedup factor vs. number of processors]

Implications
A single node is capable of searching roughly 1.1 million keys per second
In comparison, each Deep Crack node searches 2.5 million keys per second
Shows the large difference between running DES in specialized hardware vs.
software
However, using 64 PEs, over 80 million keys per second are possible
Using an entire 1024 PE Edge partition, roughly 1.2 billion keys could be tested every second

Implications
Using a completely general-purpose parallel computer, it is possible to approach the key search speeds Deep Crack was able to achieve
Utilizing an entire Edge partition could crack DES on average in just 9 months
The Edge partition has a total of just 1024 PEs, compared to the 37,000 search units on the original Deep Crack machine
This is still just using non-optimized software versions of the algorithm
  OpenSSL focuses on usability, not efficiency
Hardware encryption would still take less than half the time of even optimized software encryption

Introduction of AES
The Advanced Encryption Standard was brought about largely by the efforts of Deep Crack and Distributed.net
A new algorithm using a much larger key size (128, 192, or 256 bits) was selected from a publicly submitted contest
Much of the controversy that surrounded DES was mitigated by this open-source process

Implementation on Edge
Algorithm followed the same concept as the DES Cracker
Blocks are twice as many bits, so using 2 blocks is even less likely to be all ASCII by chance than with DES
Returned only 41 possibilities out of 4 trillion keys just by checking the first two blocks
Keys were harder to determine because only a portion of the key is used per round, but that was accounted for in the process

AES Results on the Edge
[Figure: Time to Run 4.3 Billion Keys. Seconds vs. number of PEs (8 to 64)]

AES Results on the Edge
[Figure: Total Keys/Second. Keys/second (in millions) vs. number of PEs (1 to 64)]

AES Results on the Edge
[Figure: Speedup. Speedup factor vs. number of PEs]

AES Expansion to GPGPUs
GPGPUs (General Purpose Graphics Processing Units) offer an exciting opportunity for parallel computing
Consist of CPUs extremely
limited in processing power, streamlined for very fast, simple computations
Perfect for simple parallel tasks, such as encrypting files with AES

AES Expansion to GPGPUs
NVIDIA is a leader in scientific computing on GPGPUs
Opened the CUDA language to developers to run on their video cards
Dr. Russ Miller has headed a project to create a supercomputer at the University at Buffalo using NVIDIA cards as the processing power
  Entitled the MAGIC computer

AES Cracker Implementation
For simplicity, I used my personal computer with a CUDA-enabled NVIDIA card as a test subject
  Performed on an NVIDIA 9500GT
I was able to find an open-source AES implementation that was suited for a similarly styled AES Cracker
Many optimizations still had to be made to decrypt with many keys per one block, as opposed to many blocks with one key

AES Results on GPGPU
36 million keys per second on a single GPGPU
By comparison, it took 40 nodes to reach 37 million keys per second
Extrapolating the numbers, it would take 2.85x10^23 years for a single card to search the entire keyspace
The Edge machine would take 1.1x10^22 years, a savings of just one order of magnitude

GPGPU Supercomputers - MAGIC
The University at Buffalo Cyberinfrastructure Laboratory has an nVidia Tesla/Intel Xeon cluster
Hybrid GPGPU/Central Processor
Hierarchy of Dell PE1950s controlling 15 nVidia Tesla S1070s
Approximately 57.5 TFLOPS
Total cost of the system was under $100,000

Extension to MAGIC Computer
nVidia GeForce 9500 GT – 134.4 GFLOPS
nVidia Tesla S1070 – 4147.2 GFLOPS
Each of the 15 nodes is about 30 times faster
Instead of just 36 million keys/second, MAGIC is capable of more than 16.7 billion keys/second
As GPGPUs become more widespread, speeds will continue to skyrocket as prices begin to plummet
  Tesla S2050 cards already reach 5152 GFLOPS

Comparison Between Computers
The partition of the Edge computer used consists of 128 dual quad-core nodes
Each node cost upwards of $3500
Total machine cost over $400,000
More expensive
partitions also exist
A theoretical limit of just over 9000 GFLOPS for $400,000 compared to MAGIC's 57,500 GFLOPS for just under $100,000
This shows the real potential for GPGPU supercomputing opportunities

Conclusion
DES is certainly no longer secure due to the efforts of Deep Crack and Distributed.net, as well as the dramatic role GPGPUs will continue to play in the supercomputer market
AES is still a very strong algorithm that is completely infeasible to crack by current measures
  Even the MAGIC system would take 6x10^20 years to search the entire keyspace
Continuing advances in GPGPU supercomputing will make attempts at building a successful AES cracker more realistic, but such attempts will not succeed anytime soon
  Currently it would take 10^21 GPGPUs to reduce the time to crack to within a single year
Even if the 128-bit key size becomes obsolete, 192- and 256-bit key versions are already in use and can be easily adopted universally
  Each added key bit doubles the key-space, so these sizes make a brute-force search exponentially harder

References
EFF and Deep Crack: http://w2.eff.org/Privacy/Crypto/Crypto_misc/DESCracker/HTML/19980716_eff_des_faq.html
DES Information: http://en.wikipedia.org/wiki/Data_Encryption_Standard
AES Information: http://en.wikipedia.org/wiki/Advanced_Encryption_Standard
Distributed.net: http://en.wikipedia.org/wiki/Distributed.net
RSA Security DES Challenges: http://en.wikipedia.org/wiki/DES_Challenges
Standard DES and AES Implementations in C: www.openssl.org
Standard AES Implementation in CUDA: http://shader.kaist.edu/sslshader/libgpucrypto/
nVidia Specifications: http://www.nvidia.com/object/why-choose-tesla.html
nVidia Specifications: http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units
Intel L5520 Specifications: http://www.tecchannel.de/bild-zoom/2019750/11/382245/il-80380865738247327/
CyberInfrastructure Laboratory: http://www.cse.buffalo.edu/faculty/miller/CI/equipment.shtml
Center for Computational Research: www.ccr.buffalo.edu

Thanks to Dr.
Russ Miller, Kevin Cleary, and Matt Jones for specifications and costs of the CCR systems
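
Sketch: dividing the key-space
The division of the key-space among nodes described under "Implementing on the Edge Cluster" might look roughly like the following. This is a minimal sketch in plain Python, not the project's MPI code: the function name partition_keyspace is hypothetical, and rank and size stand in for the values an MPI program would obtain from its communicator. The final summation of candidate counts is imitated with an ordinary sum over made-up per-rank counts.

```python
def partition_keyspace(total_keys: int, size: int, rank: int) -> range:
    """Return the contiguous block of key indices assigned to one rank.

    Hypothetical helper: 'size' is the number of processes and 'rank' is
    this process's index (0-based), as an MPI program would know them.
    """
    base, extra = divmod(total_keys, size)
    # The first 'extra' ranks each take one additional key so that
    # every key is covered exactly once.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    size = 64                      # number of PEs, as in the 64-PE runs
    total = 2 ** 56                # size of the DES key-space
    for rank in (0, 1, size - 1):
        chunk = partition_keyspace(total, size, rank)
        print(rank, hex(chunk.start), hex(chunk.stop))

    # Stand-in for the single communication step: each rank reports how
    # many candidate keys survived, and the counts are summed.
    candidate_counts = [0] * size  # illustrative values only
    print("total candidates:", sum(candidate_counts))
```

Because every rank works on its own block and only the final counts are combined, a scheme like this needs almost no communication, which is consistent with the near-perfect speedup reported in the results.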
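
Sketch: the plaintext filter
The check described under "How It Worked" (decrypt one block, keep the key only if every byte looks like text) can be illustrated as below. This is a sketch under assumptions, not the project's code: the 69-character alphabet is a hypothetical choice (letters, digits, space, and a few punctuation marks), and the decryption itself is left out. The 1/65536 figure on the slide matches the odds when 64 of the 256 possible byte values are accepted; reading it that way is an assumption.

```python
from fractions import Fraction
import string

# Hypothetical alphabet of 69 byte values: 52 letters, 10 digits,
# and 7 other characters (space, newline, and basic punctuation).
ACCEPTED = frozenset((string.ascii_letters + string.digits + " .,!?'\n").encode("ascii"))


def looks_like_text(block: bytes, accepted=ACCEPTED) -> bool:
    """Return True if every byte of a decrypted block is an accepted character."""
    return all(b in accepted for b in block)


def false_positive_odds(n_accepted: int, n_bytes: int) -> Fraction:
    """Chance that n_bytes uniformly random bytes all fall in the accepted set."""
    return Fraction(n_accepted, 256) ** n_bytes


if __name__ == "__main__":
    print(looks_like_text(b"Attack a"))          # an 8-byte block of text
    print(looks_like_text(b"\x00\x9f\x12abcde"))  # a block of garbage
    print(false_positive_odds(64, 8))             # one 8-byte block
    print(false_positive_odds(64, 16))            # two 8-byte blocks
```

Only keys that pass the filter on both blocks would need to be handed to the central processor for a full decryption, which is what keeps the per-key work so small.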
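
Sketch: extrapolating search times
The time estimates quoted in the results and conclusion follow from dividing the size of the key-space by the measured key rate. The sketch below shows that arithmetic; the function name years_to_search and the 365.25-day year are assumptions, and the rates are the rounded figures from the slides, so the outputs only approximate the quoted values.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def years_to_search(key_bits: int, keys_per_second: float) -> float:
    """Years needed to try every key of the given size at a constant rate."""
    return (2 ** key_bits) / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # AES-128 on a single 9500 GT at 36 million keys/second
    print(f"{years_to_search(128, 36e6):.2e} years")
    # AES-128 on MAGIC at 16.7 billion keys/second
    print(f"{years_to_search(128, 16.7e9):.2e} years")
    # DES (56-bit) on a 1024-PE Edge partition at 1.2 billion keys/second
    print(f"{years_to_search(56, 1.2e9):.2f} years for the full key-space")
```

On average the correct key is found after searching half the key-space, so expected crack times are half of the full-search figures this function returns.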