APPLICATIONS AND SOLUTIONS FOR COMPLEXITY Dr. Adam P. Anthony Lecture 27 NEXT WEEK Have your presentations ready! Submit electronic document/project on blackboard ALL PROJECTS ARE DUE NEXT TUESDAY, AT THE START OF CLASS REGARDLESS OF WHEN YOU ARE PRESENTING Presentation Order is posted on the course schedule Complexity in the News November 2010 issue Talks about recent attempt to prove P NP http://science.slashdot.o rg/story/10/08/08/22 6227/Claimed-ProofThat-P--NP Talks about using NPComplete problems to secure elections Dealing with NP-Completeness Not all hope is lost with the class NP! We typically analyze problems in the worst case Many cases are much easier, so we can just use a slow algorithm in this case Even if can’t solve problems perfectly, we can still seek something that is close to perfect An Approximation Algorithm is one that aims to do the best it can at solving a problem, though it may make mistakes Usually we can make a faster algorithm by cutting corners, or making a lucky guess. The Set-Cover Problem Imagine, in a large company that we need: And we have several vendors pushing products: Data Storage (DS) Data Processing (DP) Data Imaging (DI) Image Storage (IS) Image Sharing (IH) Program A can do: (DS, DP) Program B can do: (DP, DI, IH) Program C can do: (IS, IH) Program D can do: (DS) Program E can do: (IS) In large, complex instances this problem is NPComplete! Each product has the same cost, and would equally serve our needs for the specified items. Which do we choose? Greedy Algorithms Remember how nondeterministic machines always make the ‘right’ choice? The efficiency is in the fact that they only make one choice, and throw out all other options Greedy algorithm: don’t know the ‘right’ choice, but do know which one looks best at the time Not always the best! (hindsight is 20/20) Sometimes we tie! (left to guessing) Greedy Set Cover Imagine, in a large company that we need: Data Storage (DS) Data Processing (DP) Data Imaging (DI) Image Storage (IS) Image Sharing (IH) And we have several vendors pushing products: Program A can do: (DS, DP) Program B can do: (DP, DI, IH) Program C can do: (IS, IH) Program D can do: (DS) Program E can do: (IS) GreedySetCover: Repeat Pick the program which solves the most remaining problems Until No Problems Remain This algorithm isn’t perfect, but we can prove that it finds answers that are VERY close to the best answer. Leveraging NP-Completeness for Security Key property of NP: solving the problem is hard, but checking the solution is easy! Imagine a vault with the following lock: Give a particularly difficult instance of the traveling salesman problem If you can solve it, the vault will be opened! TSP is not a good problem for this task, because you can’t ‘reverse engineer’ the problem from a solution A more popular usage is Integer Factorization Cryptography Mostly used in military applications Also Idea: always assume that the enemy will intercept our messages! Can found in business, love we still keep them from reading the message? Cryptography: an ordered process of scrambling (encrypting) a message such that it can only be feasibly unscrambled (decrypted) if you possess the proper knowledge (key) Traditional Cryptography Create message, encrypt using key Deliver message to recipient Separately deliver the key How do we keep it safe? Public Key Cryptography Manufacture 1 key, 1 lock Instead of the SENDER locking up the message, ask the RECIPIENT for a lock Put the message in a case, secure with lock Send case in regular postal mail Recipient already has key, so no separate courier required! For convenience: manufacture 1000+ locks, distribute around country RSA Public Key Encryption System RSA: A popular public key cryptographic algorithm Abbreviation of authors’ last names: Rivest, Shamir Adleman Relies on the (presumed) intractability of the problem of factoring large numbers Public key = 2 numbers, N and e: Used to encrypt messages Private key = 2 numbers, N and d: Used to decrypt messages Replaces the 1000 locks from last slide Replaces the key for that lock N is the same for both public, private key Mathematical relationship between N,e,d is what makes this work Encrypting the Message 10111 Encrypting keys: N = 91 and e = 5 Message: 10111two = 23ten Encryption Step 1: 23e = 235 = 6,436,343 Encryption Step 2: 6,436,343 ÷ 91 has a remainder of 4 4ten = 100two Therefore, encrypted version of 10111 is 100. 12-13 Decrypting the Message 100 Decrypting keys: N = 91, d = 29 Encrypted Message: 100two = 4ten Decryption Step 1: 4d = 429 = 288,230,376,151,711,744 Decryption Step 2: 288,230,376,151,711,744 ÷ 91 has a remainder of 23 Original Message: 23ten = 10111two Therefore, decrypted version of 100 is 10111. 12-14 Establishing an RSA public key encryption system 1. Where did N = 91, e = 5, and d = 29 come from? Pick two prime numbers p, and q 2. 3. 7 and 13 Multiply them together to get N: 7*13 = 91 e and d can be any numbers that satisfy this equation: e*d = k(p-1)(q-1) 4. Short story: Pick p and q and the rest of the numbers fall into place, and the system works! Security of RSA Remember, d is kept secret, and we can’t decrypt without it! But d is computed using p and q! And n = p*q (Uh Oh!) A crook can compute d but only if he/she can factor n into its prime components! Factoring large integers is NP-Complete Easy for small numbers (7,13) Hard for large numbers—RSA uses p,q values with 150+ digits! A (rejected) Homework Problem Find the factors of 107,531 How can we solve this? Maybe we can write a program? http://homepages.bw.edu/~apanthon/courses/CSC18 0/slides/PrimeFactors.sb Parallel Computing If one computer is too slow, try 2! Or 4, or 8, or 16 or 32 or 64 or 128 or 256… Parallel computing involves the process of solving a single problem by: Breaking it into smaller pieces Solving each smaller piece in parallel Putting the small solutions back together Many computers today have 2-8 separate processors, giving it the apparent strength of multiple computers! Types of Parallel Computing Distributed Memory Each processor has its own private RAM storage Total RAM in a distributed computer: Sum of all private RAM sources Must request access to another’s RAM Advantages: different processors can’t interfere with another’s work Easy to expand Shared Memory All processors share a single source of RAM To access memory, just send out a Load/Store command to the Bus Advantages: Hard to program Easier to program, locate data Millions of example computers available Disadvantages: Disadvantages: But what if someone else was using that data??? Memory Bottleneck can serialize the parallelism Have to take care in how data is accessed, used Cost of Parallel Computing Question: A problem is split into 3 parts. Each part can be solved in 5 steps. How long will it take to solve the problem if we do it in parallel? Trick question!! Answer: Cost of parallelism = Cost of Splitting Data + Cost of computing the single largest part + Cost of reassembling partial solutions For Shared memory: cost of splitting/combining is very small, but cost of computing can be much higher For Distributed memory: cost of splitting/combining is larger, and cost of computing is (hopefully) faster Problems with Parallelism Splitting data is not always possible! And when it is, it may be tricky Concurrency problems: Parallel writes Race Conditions Parallel Reads are safe Combining results may be as much work as completing the non-parallel algorithm For these reasons—and more—NP-complete problems remain unsolvable in the infinite case BUT we may be able to solve larger problems than would otherwise be possible Obvious Parallelism Obvious parallelism occurs when data is in a List and the task involves executing a simple action on each item in the list where the action is independent of actions on all other items in the list Many vendors now have limited support for obvious parallelism: apply_parallel( 2 1 3 2 4 ADD-1, List) 3 5 4 5 6 6 7 7 8 8 9 Example problems Which of these are obviously parallel, which are not? Which could still be made parallel? Convert a list of Fahrenheit temperatures to Celsius For each item in a list, compute the product of that number times all the numbers that appear before it in the list 1. 2. Take two lists, multiply each corresponding element in each list, then add the resulting products together. 3. 4. 5. [2 5 8 9 10] computes [2 10 80 270 2700] [2 4 6] and [4 5 9] computes 8 + 20 + 54 = 82 Take a list and confirm whether it is sorted Search dictionary.com and record the definition of each word in a list Quantum Computing Bits can’t get any smaller But electrons can be in multiple quantum states simultaneously (“superpositioning”) qubit: can be in 2 states at once 2 qubits: 4 states at once n qubits: 2n states at once! In effect, we can build massively parallel computers! SciAm Special: How Do Quantum Computers Work? http://www.youtube.com/watch?v=hSr7hyOHO1Q&feature=related Images: ams.org