Handout 1

The University of the Witwatersrand
School of Computer Science
Data Structures and Algorithms: COMS2002

Introduction to Computational Problem-Solving and Critical Thinking

Ritesh Ajoodha
July 23, 2017

What are algorithms? What are the advantages of studying algorithms? What are some applications of algorithms? Why do we care about algorithm efficiency and data structures? In this handout, we will answer these questions.

Algorithms

Any unambiguous procedure that takes input values and produces output values can be considered an algorithm. Algorithms are implemented to solve well-defined computational and real-world problems. The statement of the problem specifies, in general terms, the desired input/output relationship. The algorithm describes a specific computational procedure for achieving that input/output relationship [1]. Algorithms can be used to solve difficult puzzles such as the Rubik's Cube. In fact, algorithms range from simple numerical sorting procedures to complex artificial intelligence tasks.

Figure 1: Rubik's Cubes can be solved by an algorithm.

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, et al. Introduction to algorithms, volume 2. MIT Press, Cambridge, 2001.

Numerical sorting procedures form the foundation for the intermediate design techniques, data structures, and complexity analysis that appear later in this course. Therefore, let us formally define the sorting problem:

Definition 1. Suppose we need to sort a sequence of numbers $(x_1, x_2, x_3, \ldots, x_n)$. Then we should be able to reorder these numbers from smallest to largest as $(x'_1, x'_2, x'_3, \ldots, x'_n)$, such that $x'_1 \le x'_2 \le x'_3 \le \ldots \le x'_n$.

Since many programs require sorting as an intermediate step, sorting is considered very important, and there have been several significant efforts to improve sorting procedures over the years.
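Definition 1 can be restated as a check on a candidate answer: the output must contain exactly the input numbers, and each number must be no larger than the one after it. The following is a minimal Python sketch of that check; the function names `is_sorted` and `solves_sorting_problem` are illustrative choices and do not come from the handout.

```python
from collections import Counter


def is_sorted(seq):
    """Return True if every element is <= the element that follows it."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


def solves_sorting_problem(original, candidate):
    """Check Definition 1: candidate is a reordering of original
    (same numbers, same multiplicities) in non-decreasing order."""
    same_elements = Counter(original) == Counter(candidate)
    return same_elements and is_sorted(candidate)


print(solves_sorting_problem([3, 1, 2], [1, 2, 3]))  # True
print(solves_sorting_problem([3, 1, 2], [2, 1, 3]))  # False: not in order
print(solves_sorting_problem([3, 1, 2], [1, 2]))     # False: an item is missing
```

A check like this says nothing about how the reordering is found; that is the job of a sorting algorithm.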
The effectiveness of a sorting procedure, however, depends on its best-case and worst-case complexity for the intended application. For example, we may want to consider the following factors about the data to be sorted before selecting an appropriate sorting algorithm:

• How many items are to be sorted?
• Are the items already sorted, or partly sorted?
• Are the items in some well-defined range or domain?
• What is the architecture of the system which will perform the sorting operations?
• How much storage, and what type of storage, do we have on the system?

As an example of the sorting problem, suppose we are given the following array (list) of numbers: <25, 65, 45, 78, 32, 19, 05, 97>. Then we should be able to sort these numbers and produce the array: <05, 19, 25, 32, 45, 65, 78, 97>.

The correctness of an algorithm depends on its ability to solve a problem with a minimal error rate, i.e. without too many incorrect outputs. If the algorithm solves the problem with no incorrect outputs, the algorithm is said to solve the problem fully. Algorithms which produce only some correct outputs are also possible solutions to the problem, with specified error rates. Although such algorithms are useful in some cases, we are always expected to minimise the error rate. In this course we will mostly deal with algorithms with an error rate of 0, i.e. algorithms which solve the given problems fully [2].

[2] Any algorithm presented by students with an error rate > 0 will be marked incorrect.

What kinds of problems are solved by algorithms?

Figure 2: A sorting algorithm picture card.

INSERTION SORT (sorting algorithm)
Insertion sort is one of the fastest algorithms for sorting very small arrays.
  Complexity:       $O(n^2)$ worst case, $O(n)$ best case
  Memory usage:     $O(1)$
  Recursion:        No
  In-place:         Yes
  Stability:        Yes
  Comparison sort:  Yes
  Sorting method:   Insertion
  Adaptability:     Yes
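The insertion sort named on the picture card can be sketched in a few lines. The version below is a minimal Python illustration rather than a prescribed implementation; the function name `insertion_sort` is an assumption, and the input is the example array used in this handout. It sorts in place, uses no recursion, and keeps equal items in their original order, which matches the properties listed on the card.

```python
def insertion_sort(items):
    """Sort a list in place into non-decreasing order and return it."""
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        # Shift the larger elements of the sorted prefix one slot to the right.
        # The strict '>' leaves equal elements in their original order (stable).
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return items


numbers = [25, 65, 45, 78, 32, 19, 5, 97]
print(insertion_sort(numbers))  # [5, 19, 25, 32, 45, 65, 78, 97]
```

On an input that is already sorted, the inner loop never runs, which is where the $O(n)$ best case comes from; on a reversed input every element is shifted past all earlier ones, which gives the $O(n^2)$ worst case.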
Sorting objects is not the only computational problem that can be solved by using algorithms. Below are some examples of applications developed using algorithms by Computer Science Honours students at Wits University in 2013:

Context-free Grammars, Gaussian Distributions, and Evolutionary Algorithms in Algorithmic Music Composition [3]
by Ritesh Ajoodha

There have been several attempts to conduct algorithmic music composition by using techniques which resemble statistical models and evolutionary algorithms. There has, however, been no attempt to integrate these techniques into a suitable algorithm for progress in music evolution and structure. The aim of this research is to present an evolutionary approach to algorithmic music composition through machine learning. The research uses genetic algorithms to refine a statistical sample which will be formulated through a Gaussian distribution. Let it be clear that this research only poses procedural traits to develop music and does not attempt to algorithmically mimic emotional traits which are found in human-agents. The results showed that some melodies produced by the algorithm displayed significant interval changes and stylistic sense. The two-phased, Statistical and Evolutionary, model strengthens algorithmic music composition methods, particularly in the genetic and statistic phases where the key melody is defined. The potential for this research can be applied to a wider context of research in education and human-agent composition.

Figure 3: Some pieces of music generated by Ajoodha et al. [2013].

[3] Ritesh Ajoodha, Richard Klein, and Marija Jakovljevic. Using statistical models and evolutionary algorithms in algorithmic music composition. 2014.
Augmenting Chess Evaluation Functions with Artificial Neural Networks [4]
by Steven James

Over the past 50 years, much work has been done in the field of computer chess, to the extent that humans are no longer able to compete against the best chess programs. The large majority of this research is focused on expanding the game tree. Through a combination of improved hardware and search techniques, chess programs of today are able to analyse millions of positions per second. Despite the strength of these programs, many bemoan the fact that chess programs display no real intelligence; they are simply efficient searching machines. Furthermore, top programs do not possess the ability for self-improvement. Artificial neural networks (ANNs) are one paradigm for creating self-improving agents. This paper proposes constructing a pseudo-intelligent chess program that implements an ANN. The program will be evolved using a genetic algorithm through a series of tournaments, with the final result being compared with a traditional, non-learning chess program.

Figure 4: A chess scenario studied in James [2013].

[4] Steven James. Augmenting chess evaluation functions with artificial neural networks. 2013.

Creating A Niche Search Engine: The Crawler Component [5]
by Johnathan Affuta

[5] Johnathan Affuta. Creating a niche search engine: The crawler component. 2013.

The following document is a research proposal for a research project in which we attempt to create a niche search engine, with a narrow but specific domain that is predefined by a user. Our goal is that this niche search engine will return the links of web pages that are more relevant to the user in the specified domain than other commercial search engines. This research project consists of 3 main components. These components are a focused crawler, a support vector machine (SVM) and an indexer.
In order to collect the links to relevant documents, the focused crawler will need to crawl the web, retrieve web documents, and send them to the SVM, which will give them a relevancy score. Only the documents that have a good score will be indexed. This proposal focuses mainly on the focused crawler.

Figure 5: Wolfram Alpha, a niche search engine.

Rapidly Prototyping Connected Devices that Communicate over the 3G Network, for the Internet of Things [6]
by Matthew Holmes

The Internet of Things (IoT) is the latest buzzword amongst technology experts. A world where machines can communicate and interact without human interference is what is envisioned. Rapid prototyping toolkits have helped researchers, engineers, computer scientists, designers and enthusiasts to quickly build and test these Machine-to-Machine (M2M) systems. As the IoT begins to take form, a shift to new and exciting applications can be created as the 2G network is phased out and 3G and LTE become more prevalent. It is relatively simple to build a system that relies on the 2G network for communication, using these toolkits. This case study looks for a fast and simple way to rapidly prototype an IoT application with 3G connectivity. It can give confidence to South Africans who would like to rapidly prototype an application that uses 3G, but due to the limited options available, can't be certain a given method will work. There is a risk involved in importing a 3G module as there are no recorded experimental projects that have been done in South Africa. Using smart phones when rapidly prototyping a system may not always be ideal for a non-technical user; they can add some complexity such as with the Android ADK.

Figure 6: The Internet of Things (IoT).

[6] Matthew Holmes. Rapidly prototyping connected devices that communicate over the 3G network, for the internet of things. 2013.
Data Structures

In this course you will learn about several data structures. A data structure is a way to store and organize data in order to facilitate access and modifications [7]. No single data structure can be used efficiently to solve every problem, so clever use of data structures, with a thorough knowledge of their strengths and weaknesses, is needed to solve most problems.

[7] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, et al. Introduction to algorithms, volume 2. MIT Press, Cambridge, 2001.

Figure 7: A data-structure picture card.

LINKED LIST (data structure)
A linked list is a group of nodes which together represent a sequence.
  Pros: dynamic; easy insertion and deletion; can be used to implement other data structures.
  Cons: pointers need extra storage; nodes are not stored contiguously, which gives poor access time; singly linked lists are hard to traverse backwards; doubly linked lists use extra storage.

Course Objective

Although you will learn many algorithms during this course, you might someday confront a problem whose solution is not explicitly mentioned in this course, or in any other published work on or off the Internet. Therefore, this course will teach you techniques to architect and critically examine algorithms. It is my hope that with a thorough understanding of these concepts, along with a few years of experience, you will eventually demonstrate a wide and superior palette of computational problem-solving abilities.

NP Problems

Imagine we lived in a world where computers were infinitely fast and memory was unlimited. In this world, every correct solution to a problem would work just as well as any other correct solution, so there would be no need to measure the efficiency of a solution. Unfortunately, while computers manipulate data faster than humanly possible, their speed is not infinite. Furthermore, while memory gets cheaper every year, it is certainly not unlimited or free. Therefore, given time and space as resources, we need to build efficient algorithms that solve our problems fully and use these resources cost-effectively.

Every algorithm in this course will be critically evaluated by the amount of time it takes to run. That is, given an input, how long will the algorithm take to produce the desired output? This is a measure of efficiency in terms of speed. However, there are some problems with no known efficient solution, known as NP-complete problems. Although no efficient solution has ever been found, nobody has proved that an efficient solution cannot exist. Furthermore, this intriguing set of problems has the property that if an efficient solution can be found for one of them, then an efficient solution exists for all of them. Lastly, many NP-complete problems are similar to problems that do have efficient solutions. It is interesting how slight alterations to the problem statement can cause a tremendous change in computational efficiency.

As an example of an NP-complete problem, consider a florist with a delivery service. Every morning, the florist loads up each delivery truck and sends it out to deliver flowers to several addresses. At the end of the day, each truck must end up back at the florist so that it is ready to be loaded for the next day. To reduce costs, the florist wants to select an order of delivery stops that yields the lowest overall distance travelled by each truck. This problem is the well-known "travelling-salesman problem", and it is NP-complete. It has no known efficient solution. Under certain assumptions, however, we know of efficient algorithms that give an overall distance which is not too far above the smallest possible.
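To see why the florist's problem becomes hard, consider the most direct approach: try every possible order of delivery stops and keep the shortest round trip. The Python sketch below does exactly that. The function names and the small distance matrix are hypothetical, made up for illustration, with stop 0 standing for the florist.

```python
from itertools import permutations


def tour_length(tour, dist):
    """Total distance of a round trip that returns to its first stop."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def brute_force_tsp(dist):
    """Try every ordering of stops 1..n-1, starting and ending at stop 0."""
    n = len(dist)
    best_tour, best_length = None, float("inf")
    for order in permutations(range(1, n)):
        tour = (0,) + order
        length = tour_length(tour, dist)
        if length < best_length:
            best_tour, best_length = tour, length
    return best_tour, best_length


# Hypothetical distances between the florist (stop 0) and three addresses.
distances = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

print(brute_force_tsp(distances))  # ((0, 1, 3, 2), 18)
```

With n stops this loop examines (n - 1)! orderings: 6 for the four stops here, but already 362,880 for ten stops and more than $10^{17}$ for twenty. The answer is always correct, yet the running time grows too quickly for this approach to count as an efficient solution.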
It is important to understand NP-completeness since these problems can arise in real applications. If you cannot recognise them, you may waste time searching for an efficient solution where none might exist.

THE TRAVELLING SALESMAN (NP problem)
Formulated in 1930, this is one of the most intensively studied problems in optimization. Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city? NP-Hard.

Tutorial 1

1. Implement each of the following algorithms using pen and paper:

   Add two numbers
   Step 1: Start
   Step 2: Declare variables num1, num2 and sum.
   Step 3: Read values num1 and num2.
   Step 4: Add num1 and num2 and assign the result to sum: sum ← num1 + num2
   Step 5: Display sum
   Step 6: Stop

   Find the largest of three numbers
   Step 1: Start
   Step 2: Declare variables a, b and c.
   Step 3: Read variables a, b and c.
   Step 4: If a > b
              If a > c
                 Display a is the largest number.
              Else
                 Display c is the largest number.
           Else
              If b > c
                 Display b is the largest number.
              Else
                 Display c is the largest number.
   Step 5: Stop

   Find the terms of the Fibonacci series that are ≤ 1000
   Step 1: Start
   Step 2: Declare variables first_term, second_term and temp.
   Step 3: Initialize variables first_term ← 0, second_term ← 1
   Step 4: Display first_term and second_term
   Step 5: Repeat the steps while first_term + second_term ≤ 1000
           5.1: temp ← second_term
           5.2: second_term ← second_term + first_term
           5.3: first_term ← temp
           5.4: Display second_term
   Step 6: Stop

   Determine if a number is prime or not
   Step 1: Start
   Step 2: Declare variables n, i and flag.
   Step 3: Initialize variables flag ← 1, i ← 2
   Step 4: Read n from user (assume n ≥ 2).
   Step 5: Repeat the steps while i ≤ n/2
           5.1: If remainder of n ÷ i equals 0
                   flag ← 0
                   Go to step 6
           5.2: i ← i + 1
   Step 6: If flag = 0
              Display n is not prime
           Else
              Display n is prime
   Step 7: Stop

2. Provide some examples of algorithmic procedures (e.g.
how the Google search engine works; learning the lyrics to a song; applying for a visa to go to Germany; or simply baking a cake). Describe all of the processes involved in the system, as well as how these processes depend on the completion of other processes.

3. Suppose we are comparing implementations of two sorting algorithms, Sorting Algorithm 1 and Sorting Algorithm 2, on the same machine. For inputs of size n, Sorting Algorithm 1 runs in $8n^2$ steps, while Sorting Algorithm 2 runs in $64n \lg n$ steps. For which values of n does Sorting Algorithm 1 beat Sorting Algorithm 2?

4. What is the smallest value of n such that Searching Algorithm 1, whose running time is $100n^2$, runs faster than Searching Algorithm 2, whose running time is $2^n$, on the same machine?

5. Come up with a real-world problem in which only the best solution will do. Then come up with one in which a solution that is approximately the best is good enough.

6. Two picture cards are given below: the Travelling Salesman problem and Dijkstra's algorithm. Read the picture cards and answer the following questions: How are Dijkstra's algorithm and the travelling-salesman problem similar? How are they different?

   THE TRAVELLING SALESMAN (NP problem)
   Formulated in 1930, this is one of the most intensively studied problems in optimization. Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city? NP-Hard.

   DIJKSTRA'S ALGORITHM (popular algorithm)
   Dijkstra's algorithm is used to find the shortest path between two nodes in a graph. Worst case: $O(|E| + |V| \log |V|)$. Dijkstra designed the shortest path algorithm in about 20 minutes without the aid of paper and pen, and later implemented it for ARMAC for a slightly simplified transportation map of 64 cities in the Netherlands.

7.
Select a data structure that you have seen previously (for example linked lists, arrays, trees, graphs, stacks, or queues) and discuss its strengths and weaknesses.

8. Other than speed, what other measures of efficiency might one use in a real-world setting?

Laboratory Questions

1. Write a program that prints 'Hello World' to the screen.
2. Write a program that asks the user for his name and greets him with his name.
3. Modify the previous program such that only the users Alice and Bob are greeted with their names.
4. Write a program that asks the user for a number n and prints the sum of the numbers 1 to n.
5. Modify the previous program such that only multiples of three or five are considered in the sum, e.g. 3, 5, 6, 9, 10, 12, 15 for n = 17.
6. Write a program that asks the user for a number n and gives him the possibility to choose between computing the sum and computing the product of 1, ..., n.
7. Write a program that prints a multiplication table for numbers up to 12.
8. Write a program that prints all prime numbers. (Note: if your programming language does not support arbitrary size numbers, printing all primes up to the largest number you can represent is fine too.)
9. Write a guessing game where the user has to guess a secret number. After every guess the program tells the user whether his number was too large or too small. At the end, the number of tries needed should be printed. It counts only as one try if the user inputs the same number consecutively.
10. Write a program that prints the next 20 leap years.
11. Write a program that computes:
    $$4 \cdot \sum_{k=1}^{10^6} \frac{(-1)^{k+1}}{2k-1} = 4 \cdot \left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \frac{1}{11} + \ldots\right) \qquad (1)$$
12. Write a function that returns the largest element in a list.
13. Write a function that reverses a list, preferably in place.
14. Write a function that checks whether an element occurs in a list.
15. Write a function that returns the elements on odd positions in a list.
16.
Write a function that computes the running total of a list.
17. Write a function that tests whether a string is a palindrome.
18. Write three functions that compute the sum of the numbers in a list: using a for-loop, a while-loop and recursion.
19. Write a function on_all that applies a function to every element of a list. Use it to print the first twenty perfect squares.
20. Write a function that concatenates two lists.
21. Write a function that combines two lists by alternately taking elements, e.g. [a,b,c], [1,2,3] → [a,1,b,2,c,3].
22. Write a function that merges two sorted lists into a new list.
23. Write a function that computes the list of the first 100 Fibonacci numbers.
24. Write a function that takes a number and returns a list of its digits.
25. Write functions that add, subtract, and multiply two numbers in their digit-list representation (and return a new digit list). If you're ambitious you can implement Karatsuba multiplication. Try different bases. What is the best base if you care about speed?
26. Implement the following sorting algorithms: Selection sort, Insertion sort, Merge sort, Quick sort, Stooge sort. Check Wikipedia for descriptions.
27. Implement binary search.
28. Write a function that takes a list of strings and prints them, one per line, in a rectangular frame. For example, the list ["Hello", "World", "in", "a", "frame"] gets printed as:

    *********
    * Hello *
    * World *
    * in    *
    * a     *
    * frame *
    *********

29. Write a function that translates a text to Pig Latin and back. English is translated to Pig Latin by taking the first letter of every word, moving it to the end of the word and adding 'ay'. "The quick brown fox" becomes "Hetay uickqay rownbay oxfay".
30. Implement an algorithm to determine if a string has all unique characters. What if you cannot use additional data structures?
31.
Implement a function void reverse(char* str) in C or C++ which reverses a null-terminated string.
32. Given two strings, write a method to decide if one is a permutation of the other.
33. Write a method to replace all spaces in a string with '%20'. You may assume that the string has sufficient space at the end to hold the additional characters, and that you are given the "true" length of the string.
    Example:
    Input: "Mr John Smith"
    Output: "Mr%20John%20Smith"
34. Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2b1c5a3. If the "compressed" string would not become smaller than the original string, your method should return the original string.
35. Given an image represented by an NxN matrix, where each pixel in the image is 4 bytes, write a method to rotate the image by 90 degrees. Can you do this in place?
36. Write an algorithm such that if an element in an MxN matrix is 0, its entire row and column are set to 0.
37. Assume you have a method isSubstring which checks if one word is a substring of another. Given two strings, s1 and s2, write code to check if s2 is a rotation of s1 using only one call to isSubstring (e.g., "waterbottle" is a rotation of "erbottlewat").

References

Johnathan Affuta. Creating a niche search engine: The crawler component. 2013.
Ritesh Ajoodha, Richard Klein, and Marija Jakovljevic. Using statistical models and evolutionary algorithms in algorithmic music composition. 2014.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, et al. Introduction to algorithms, volume 2. MIT Press, Cambridge, 2001.
Matthew Holmes. Rapidly prototyping connected devices that communicate over the 3G network, for the internet of things. 2013.
Steven James. Augmenting chess evaluation functions with artificial neural networks. 2013.