COS 109 Wednesday October 14 • Housekeeping – Questions about assignments • Continuing our discussion of algorithms – – – – Review of linear time and log time algorithms Sorting algorithms More complex algorithms NP completeness • Programming languages – Differences between algorithms and programs – How programming languages developed over time Linear time algorithms • lots of algorithms have this same basic form: look at each item in turn do the same simple computation on each item: does it match something (looking up a name in a list of names) count it (how many items are in the list) count it if it meets some criterion (how many of some kind in the list) remember some property of items found (largest, smallest, …) transform it in some way (limit size, convert case of letters, …) • amount of work (running time) is proportional to amount of data – twice as many items will take twice as long to process – computation time is linearly proportional to length of input Binary search • Examples – Guessing a number I’m thinking of a number between 1 and N (N = 2n)? – Searching for a word in a dictionary • Every query reduces the problem size by half • Doubling the problem size adds 1 more step to the running time. Logarithms for COS 109 • all logs in 109 are base 2 • all logs in 109 are integers • if N is a power of 2 like 2m, log2 of N is m • if N is not a power of 2, log2 of N is the number of bits needed to represent N the power of 2 that's bigger than N the number of times you can divide N by 2 before it becomes 0 • you don't need a calculator for these! – just figure out how many bits or what's the right power of 2 • logs are related to exponentials: log2 2N is N ; 2 • it's the same as decimal, but with 2 instead of 10 log N 2 is N Algorithms for sorting • binary search needs sorted data • how do we sort names into alphabetical order? • how do we sort numbers into increasing or decreasing order? • how do we sort a deck of cards? • how many comparison operations does sorting take? • "selection sort": – find the smallest using a variant of "find the largest" algorithm – repeat on the remaining names – this is what bridge players typically do when organizing a hand • what other algorithms might work? Selection sort • sorting n items takes time proportional to n2 – twice as many items takes 4 times as long to sort • there are much faster sorting algorithms – time proportional to n log n Why does running time matter? 100 200 400 800 7 8 9 10 n 100 200 400 800 n log n 700 1600 3600 8000 10,000 40,000 160,000 640,000 log n n^2 Quicksort: an n log n sorting algorithm • make one pass through data, putting all small items in one pile and all large items on another pile – there are now two piles, each with about 1/2 of the items – and each item in the first pile is smaller than any item in the second • make a second pass; for each pile, put all small items in one pile and all larger items in another pile – there are now four piles, each with about 1/4 of the items – and each item in a pile is smaller than any item in later piles • repeat until there are n piles – each item is now smaller than any item in a later pile • each pass looks at n items • each pass divides each pile about in half, stops when size is 1 – number of divisions is log n • n log n operations Quicksort: an n log n sorting algorithm • make one pass through data, putting all small items in one pile and all large items on another pile • to make the pass, – start at one end and move forwards until you reach a number larger than the first number, remember where you stopped – then, start at the other end and move backwards until you reach a number smaller than the first number, remember where you stopped – swap the 2 numbers and repeat until your two motions meet – swap the first element with the last number smaller than it (or the first number larger than it) 27 13 25 46 17 29 57 16 ^^^ 27 13 25 16 17 29 57 ^^^ First number smaller 46 Swap First number larger ^^ First number smaller ^^ 17 13 25 16 27 First number larger 29 57 Now, sort (17,13,25,16) and sort (29,57,46) 27 is in the right place 46 Arrows crossed so swap with first Sort/merge • Divide the input into 2 sets (first half and second half) – Sort each half (by Sort/merge) – Merge the 2 halfs What happens in practice when sorting 8 items Divide into first 4 and second 4 Sort first 4 Divide into first 2 and second 2 Sort first 2 Sort second 2 Merge sets of 2 Sort second 4 Divide into first 2 and second 2 Sort first 2 Sort second 2 Merge sets of 2 Merge sets of 4 27 13 25 46 ( ( ) ( 17 ) ( ) ( 29 57 16 ) Split 8 -> 4 and 4 ) ( ) 13 27 25 46 17 29 16 57 13 25 27 46 16 17 29 57 13 16 17 25 27 29 46 57 Split 4 -> 2 and 2 Sort 2’s Merge 2’s Merge 4’s Input is (27,13,25,46,17,29,57,16) sort (27,13,25,46) and sort (17,29,57,16) To sort (27,13,25,46) sort (27,13) and sort (25,46) then merge the 2 lists To sort (17,29,57,16) sort (17,29) and sort (57,16) then merge the 2 lists Recursion (Divide and conquer) • Both Quicksort and Sort/Merge are examples of recursion • I don’t know how to solve your problem but I can solve 2 problems of half the size and then combine the answers • I don’t know how to solve problems of the new size but I can solve 2 problems of half that size and then combine the answers • I don’t know how to solve problems of the new size but I can solve 2 problems of half that size and then combine the answers • I don’t know how to solve problems of the new size but I can solve 2 problems of half that size and then combine the answers • I don’t know how to solve problems of the new size but I can solve 2 problems of half that size and then combine the answers • I don’t know how to solve problems of the new size but I can solve 2 problems of half that size and then combine the answers • I don’t know how to solve problems of the new size but I can solve 2 problems of half that size and then combine the answers • The hope is that the problem size gets so small that I can solve it Recap of algorithms • Algorithms are recipes – Some algorithms make one (or a few passes) across data and so run in linear time – Some algorithms divide the problem size in half at each step and so run in logarithmic time – Other algorithms divide the problem into 2 halves each of which has to be solved and then the results are combined (often run in n log n time) • Key algorithms considered – Finding the maximum (or some property) – Searching – Sorting • Are their algorithms that take significantly more time? – How much time might they take? Recap of algorithms • Algorithms are recipes – Some algorithms make one (or a few passes) across data and so run in linear time – Some algorithms divide the problem size in half at each step and so run in logarithmic time – Other algorithms divide the problem into 2 halves each of which has to be solved and then the results are combined (often run in n log n time) • Key algorithms considered – Finding the maximum (or some property) – Searching – Sorting • Are their algorithms that take significantly more time? – How much time might they take? Towers of Hanoi: an exponential algorithm • Solving for 7 disks Travelling Salesman problem Would like to visit all state capitals with the smallest travel distance There are 1.29x1059 possible tours. A good travelling salesman tour Can we do better? The knapsack problem I have items of weights and a knapsack of capacity 20615 can I exactly pack the knapsack with items from my list? There are 242 (=4.398x1012) possible packings Investigating 106 per second, it would take 50 days to try all possibilities 17 27 44 48 53 54 59 61 78 80 81 90 97 99 104 175 289 356 444 459 657 682 756 888 931 944 948 987 1347 1897 2134 2198 2525 2610 2751 3563 3620 7433 8094 8211 8247 9045 The knapsack problem I have items of weights and a knapsack of capacity 20615 can I exactly pack the knapsack with items from my list? … it would take 50 days to try all possibilities But, 17 + 27 + 44 + 48 + 53 + 54 = 243 657 + 682 + 756 + 888 + 931 = 3914 8211 + 8247 = 16458 243 + 3914 + 16458 = 20615 17 27 44 48 53 54 59 61 78 80 81 90 97 99 And it just takes a few seconds to check the answer 104 175 289 356 444 459 657 682 756 888 931 944 948 987 1347 1897 2134 2198 2525 2610 2751 3563 3620 7433 8094 8211 8247 9045 Further on algorithm complexity log n n n log n .. n2 n3 .. each step divides problem size in half make one (or several) passes over the input data divide (into 2 halves) and conquer (merge answers) 2n all subsets of the input need to be considered each pair of inputs are used together each triple of inputs are used together Complexity hierarchy log n (or part of it) logarithmic n linear polynomial n log n .. n2 quadratic .. n3 cubic .. ******************************************************* 2n exponential (not polynomial) Problems that take exponential time are deemed to not be practically solvable even as computers get faster NP complete problems • There is a class of problems such that – No known method is known to solve each problem in polynomial time – It is easy to verify if a given solution is correct for each problem – If one of the problems can be solved in polynomial time, then all can • These problems are said to be NP-complete – And the underlying problem is the P vs NP (or P=NP?) problem P=NP? Algorithms in Computer Science • study and analysis of algorithms is a major component of CS courses – – – – – what can be done (and what can't) how to do it efficiently (fast, compact memory) finding fundamentally new and better ways to do things basic algorithms like searching and sorting plus lots of applications with specific needs • big programs are usually a lot of simple, straightforward parts, often intricate, occasionally clever, very rarely with a new basic algorithm, sometimes with a new algorithm for a specific task Website (actually YouTube) of the day A projection to the future from 2005 Algorithms versus Programs • An algorithm is the computer science version of a really careful, precise, unambiguous recipe – defined operations (primitives) whose meaning is completely known – defined sequence of steps, with all possible situations covered – defined condition for stopping – an idealized recipe • A program is an algorithm converted into a form that a computer can process directly – like the difference between a blueprint and a building – has to worry about practical issues like finite memory, limited speed, erroneous data, etc. – a guaranteed recipe for a cooking robot Characteristics of a programming language • Has to allow assignment to variables – We did this in Toy with STORE MEM and LOAD MEM • Has to allow arithmetic operations – +,-,*,/; – often more sophisticated like exponentiation, logarithm, sine, cosine • Has to have constructs that allow for looping – IFZERO and GOTO in Toy – Typically IF …. THEN … [ELSE] … for branching IF (condition is true) THEN do these operations ELSE do these other operations – Typically WHILE or FOR or DO for looping WHILE (this condition is true) DO these operations FOR (these values) DO these operations • May have sophisticated ways to structure data What happens to a program in a higher level language • A compiler is written that converts the program text into assembly language – The goal is to have the program work on many machines and the compiler translate it to assembler for a specific machine • The compiler itself has various steps – Parsing (think middle school English class) – Optimizing (using registers/accumulator wisely) – Code generation (creating assembler or machine code) • A language is specified by a grammar which is defined by the standard defining the language • The compiler further defines the language by how it processes Evolution of programming languages • 1940's: machine level – use binary or equivalent notations for actual numeric values • 1950's: "assembly language" – names for instructions: ADD instead of 0110101, etc. – names for locations: assembler keeps track of where things are in memory; translates this more humane language into machine language – this is the level used in the "toy" machine – needs total rewrite if moved to a different kind of CPU loop done sum get # read a number ifzero done # no more input if number is zero add sum # add in accumulated sum store sum # store new value back in sum goto loop # read another number load sum # print sum print stop 0 # sum will be 0 when program starts assembly lang program assembler binary instrs Evolution of programming languages, 1950's • "high level" languages: Fortran, Cobol – write in a more natural notation, e.g., mathematical formulas Fortran (formula translation) for math Cobol (Common business oriented language) for business – a program ("compiler", "translator") converts into assembler – potential disadvantage: lower efficiency in use of machine – enormous advantages: accessible to much wider population of users portable: same program can be translated for different machines more efficient in programmer time John Backus created Fortran compiler at IBM Grace Murray Hopper created Cobol compiler for DoD Samples of Fortran and Cobol Evolution of programming languages, 1950's to 1960’s • Algol – Developed by an international committee in the late 50’s – Designed to be universal – Sabotaged by IBM • BASIC (Beginner's All-purpose Symbolic Instruction Code) – Developed at Dartmouth in 1964 by Kemeny and Kurtz – designed to be simple so that everyone could program • LISP – Designed as a practical notation for computer programs in the late 50’s – Evolved into the language of choice for artificial intelligence programming Evolution of programming languages, 1970's • "system programming" languages: C – efficient and expressive enough to take on any programming task writing assemblers, compilers, operating systems – a program ("compiler", "translator") converts into assembler – enormous advantages: accessible to much wider population of programmers portable: same program can be translated for different machines faster, cheaper hardware helps make this happen – Evolved from B • The initial versions of UNIX were written in C A sample C program to do GET/PRINT/STOP #include <stdio.h> main() { int num, sum = 0; while (scanf("%d", &num) != -1 && num != 0) sum += num; printf("%d\n", sum); } Dennis Ritchie (creator of C)