DESCRIPTION OF DATA STRUCTURE

Computer science is a field of study that deals with solving a variety of problems by using computers. To solve a given problem with a computer, you need to design an algorithm for it. Multiple algorithms can be designed to solve a particular problem, and the algorithm that provides the maximum efficiency should be used. The efficiency of an algorithm can be improved by using an appropriate data structure. Data structures help in creating programs that are simple, reusable, and easy to maintain. This module will enable a learner to select and implement an appropriate data structure and algorithm to solve a given programming problem.

Objectives

In this session, you will learn to:
- Explain the role of data structures and algorithms in problem solving through computers
- Identify techniques to design algorithms and measure their efficiency

Role of Algorithms and Data Structures in Problem Solving

Computers are widely used to solve problems in various domains, such as banking, commerce, medicine, manufacturing, and transport. To solve a given problem by using a computer, you need to write a program for it. A program consists of two components: algorithms and data structures.

Role of Algorithms

The word "algorithm" is derived from the name of the Persian mathematician Al-Khwarizmi. An algorithm can be defined as a step-by-step procedure for solving a problem. An algorithm helps the user arrive at the correct result in a finite number of steps.

An algorithm has five important properties:
- Finiteness: an algorithm terminates after a finite number of steps.
- Definiteness: each step in the algorithm is unambiguous.
- Input: an algorithm accepts zero or more inputs.
- Output: it produces at least one output.
- Effectiveness: all of the operations to be performed in the algorithm must be sufficiently basic that they can, in principle, be done exactly and in a finite length of time by a person using paper and pencil.

A problem can be solved using a computer only if an algorithm can be written for it. In addition, algorithms provide the following benefits:
- Help in writing the corresponding program
- Help in dividing difficult problems into a series of small solvable problems
- Help make the process consistent and reliable

Role of Data Structures

A data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently. Different algorithms can be used to solve the same problem, and some may solve it more efficiently than others; the algorithm that provides the maximum efficiency should be used. One of the basic techniques for improving the efficiency of algorithms is to use an appropriate data structure. A data structure can also be defined as a way of organizing the various data elements in memory with respect to each other.

Data can be organized in many different ways, so you can create as many data structures as you want. Some data structures that have proved useful over the years are:
- Arrays
- Linked lists
- Stacks
- Queues
- Trees
- Graphs

Use of an appropriate data structure helps improve the efficiency of a program. The use of appropriate data structures also allows you to overcome some other programming challenges, such as:
- Simplifying complex problems
- Creating standard, reusable code components
- Creating programs that are easy to understand and maintain

Types of Data Structures

Data structures can be classified under the following two categories:
- Static: for example, an array
- Dynamic: for example, a linked list

STATIC VS. DYNAMIC STRUCTURES

Static structures:
- Shape and size of the structure do not change over time
- Easier to manage

Dynamic structures:
- Either the shape or the size of the structure changes over time
- Must deal with adding and deleting data entries, as well as finding the memory space required by a growing data structure

Identifying Techniques for Designing Algorithms

Two commonly used techniques for designing algorithms are:
- Divide and conquer approach
- Greedy approach

Divide and conquer is a powerful approach for solving conceptually difficult problems. It requires you to find a way of:
- Breaking the problem into subproblems
- Solving the trivial cases
- Combining the solutions to the subproblems to solve the original problem

Algorithms based on the greedy approach are used for solving optimization problems, where you need to maximize profits or minimize costs under a given set of conditions. Some examples of optimization problems are:
- Finding the shortest distance from an originating city to a set of destination cities, given the distances between pairs of cities
- Selecting items with maximum value from a given set of items, where the total weight of the selected items cannot exceed a given value

Determining the Efficiency of an Algorithm

Factors that affect the efficiency of a program include:
- Speed of the machine
- Compiler
- Operating system
- Programming language
- Size of the input

In addition to these factors, the way the data of a program is organized and the algorithm used to solve the problem also have a significant impact on the efficiency of a program.

The efficiency of an algorithm can be computed by determining the amount of resources it consumes. The primary resources that an algorithm consumes are:
- Time: the CPU time required to execute the algorithm.
- Space: the amount of memory used by the algorithm for its execution.

The fewer resources an algorithm consumes, the more efficient it is.

Method for Determining Efficiency

To measure the time efficiency of an algorithm, you can write a program based on the algorithm, execute it, and measure the time it takes to run. The execution time that you measure in this case would depend on a number of factors, such as:
- Speed of the machine
- Compiler
- Operating system
- Programming language
- Input data

However, we would like to determine how the execution time is affected by the nature of the algorithm itself.

The execution time of an algorithm is directly proportional to the number of key comparisons involved in the algorithm and is a function of n, where n is the size of the input data. The rate at which the running time of an algorithm increases as a result of an increase in the volume of input data is called the order of growth of the algorithm. The order of growth of an algorithm is defined by using the big O notation, which has been accepted as a fundamental technique for describing the efficiency of an algorithm.

The different orders of growth and their corresponding big O notations are:
- Constant - O(1)
- Logarithmic - O(log n)
- Linear - O(n)
- Loglinear - O(n log n)
- Quadratic - O(n^2)
- Cubic - O(n^3)
- Exponential - O(2^n), O(10^n)

COMPLEXITY

In examining algorithm efficiency, we must understand the idea of complexity:
- Space complexity
- Time complexity

SPACE COMPLEXITY

When memory was expensive, we focused on making programs as space efficient as possible and developed schemes to make memory appear larger than it really was (virtual memory and memory paging schemes). Space complexity is still important in the field of embedded computing (handheld computer-based equipment like cell phones, palm devices, etc.).

TIME COMPLEXITY

- Is the algorithm "fast enough" for my needs?
- How much longer will the algorithm take if I increase the amount of data it must process?
- Given a set of algorithms that accomplish the same thing, which is the right one to choose?

COMPLEXITY

In general, we are not so much interested in the time and space complexity for small inputs.

For example, let us assume two algorithms A and B that solve the same class of problems. For an input with n elements, the time complexity of A is 5,000n, while that of B is 1.1^n. For n = 10, A requires 50,000 steps, but B only 3, so B seems to be superior to A. For n = 1,000, however, A requires 5,000,000 steps, while B requires 2.5·10^41 steps.

This means that algorithm B cannot be used for large inputs, while algorithm A is still feasible. So what is important is the growth of the complexity functions. The growth of time and space complexity with increasing input size n is a suitable measure for the comparison of algorithms.
COMPLEXITY

Comparison: time complexity of algorithms A and B

Input size n | Algorithm A (5,000n) | Algorithm B (1.1^n)
10           | 50,000               | 3
100          | 500,000              | 13,781
1,000        | 5,000,000            | 2.5·10^41
1,000,000    | 5·10^9               | 4.8·10^41392

CASES TO EXAMINE

- Best case: when the algorithm is executed, the fewest number of instructions are executed.
- Average case: executing the algorithm produces path lengths that will, on average, be the same.
- Worst case: executing the algorithm produces path lengths that are always a maximum.

FREQUENCY COUNT

Examine a piece of code and predict the number of instructions to be executed; for each instruction, predict how many times it will be encountered as the code runs.

Inst #  Code                          F.C.
1       for (int i = 0; i < n; i++)   n + 1
2       { cout << i;                  n
3         p = p + i; }                n
                                      ------
Totaling the counts produces the frequency count (F.C.): 3n + 1

ORDER OF MAGNITUDE

In the previous example, best case = average case = worst case, because the example is based on a fixed iteration count n. By itself, the frequency count is relatively meaningless; order of magnitude is an estimate of performance versus the amount of data. To convert an F.C. to an order of magnitude:
- Discard constant terms
- Disregard coefficients
- Pick the most significant term

The worst-case path through the algorithm determines the order of magnitude, written in big O notation (e.g., O(n)).

ANOTHER EXAMPLE

Inst #  Code                            F.C.
1       for (int i = 0; i < n; i++)     n + 1
2         for (int j = 0; j < n; j++)   n(n + 1) = n^2 + n
3         { cout << i;                  n^2
4           p = p + i; }                n^2
                                        ------
Total F.C.: 3n^2 + 2n + 1
Discarding constant terms produces: 3n^2 + 2n
Clearing coefficients: n^2 + n
Picking the most significant term: n^2
Big O = O(n^2)

WHAT IS BIG O

Big O is the rate at which algorithm performance degrades as a function of the amount of data it is asked to handle. For example:
- O(n) -> performance degrades at a linear rate
- O(n^2) -> quadratic degradation

COMMON GROWTH RATES