UNIT I – Overview of Data Structures and Algorithms

Choosing the most appropriate data structure and data description procedure is the key to creating an efficient, easy-to-understand program.

Data Type

In programming, a data type is a set of values from which a variable, constant, function, or other expression may take its value. It can be thought of as a set of the “same kind” of data processed by a computer.

Data Type – Examples
• Integer: whole numbers (no fractional part) – 5, -1, 20, +345
• Character: alphanumeric character (usually not for arithmetic purposes) – ‘A’, ‘@’, ‘6’, ‘+’
• Real numbers: fixed-point and floating-point – 5.35, -89.075, 3E+12
• Boolean: true or false – usually denoted by 0 and 1
• Date/Time: range of date/time – “7/17”, “01:20:35”

Note: Data must be examined carefully so that the most appropriate data type can be used.

Data Structures

In programming, the term data structure refers to a method of organizing a collection of data so that it can be manipulated effectively.
• A collection of variables, possibly of several different data types, connected in various ways.
• A way of collecting and organizing data so that operations can be performed on the data in an effective, and sometimes efficient, way.
• Basically, anything that can store data can be called a data structure.

To solve a problem efficiently, one can do the following:
1. Analyze the problem to determine the resource constraints a solution must meet.
2. Determine the operations that must be supported (e.g., record, search, insertion, deletion, etc.).
3. Quantify the constraints for each operation (e.g., search operations must be very fast).
4. Select the data structure that best meets these requirements.
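As a rough illustration of steps 2 to 4, suppose the required operation is a search by ID and that search must be very fast. The sketch below, in Python, compares a plain list with a dictionary for that one operation. The student records and the function names are hypothetical and serve only as an example; they are not part of these notes.

```python
# Hypothetical student records stored as (id, name) pairs.
records = [(1003, "Cruz"), (1001, "Reyes"), (1002, "Santos")]


def find_in_list(items, student_id):
    """Linear search: examines records one by one until the id matches."""
    for rec_id, name in items:
        if rec_id == student_id:
            return name
    return None


# A dictionary keyed by id supports the same search without scanning
# every record, which suits the "search must be very fast" constraint.
by_id = dict(records)


def find_in_dict(index, student_id):
    """Dictionary lookup: returns the name, or None if the id is absent."""
    return index.get(student_id)
```

Both functions return the same answers; the choice between them depends on which constraint from step 3 matters most for the task.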
Classification of Data Structures

Categories of Data Structure

Basic Data Structure

Basic Data Type (Primitive Data Structures) – a set of individual data items frequently used to create a program; sometimes called atomic data structures because they represent a form in which data can no longer be divided, that is, they have no parts.

Simple Type – the most basic data type, usually declared according to the syntax rules of a language.
• Integer type – represents integers; the maximum and minimum values are determined by the length of one word, which is the unit of data that a computer can process at one time.
• Real number type – fixed-point and floating-point numbers.
• Character type – alphabets, numerals, and symbols as characters; character codes are expressed as binary numbers inside a computer.
• Logical type – values that are used in performing logical operations, such as AND, OR, and NOT.
• Enumeration type – a data type that enumerates all possible values of a variable.
• Partial type – specifies a subset of the values of an existing data type by setting upper and lower limits for a variable.

Pointer Type – holds addresses that are allocated in main memory. Pointer types are used to refer to variables, file records, or functions.

Structured Type (Simple Data Structure) – a data structure that contains a basic data structure or any of the defined data types as its elements.

Array Type – a finite set of elements of the same type referenced under a common name.
– sometimes called a table
– contains data of the same type and size
– each individual data item is called an array element
• One-dimensional Array – data is arranged in a line.
• Two-dimensional Array – data is arranged in both vertical and horizontal directions.
• Three-dimensional Array
• Multidimensional Arrays – n-dimensional arrays can be defined; the number of definable dimensions may be limited, depending on the programming language or compiler.
• Static Array – an array whose required storage area is determined by the program before execution.
• Dynamic Array – an array whose required storage area is determined during program execution, when the expression supplied for its subscript is evaluated.

Record Type – a set of elements of different data types referenced under a common name. Also called a structure.

Abstract Data Type (ADT) – a mathematical model in which a set of data values and associated operations are precisely specified, independent of any implementation.
– A kind of data abstraction in which a type’s internal form is hidden (information hiding) behind a set of access functions.
– Values of the type are created and inspected only by calls to the access functions.
• Data abstraction – any representation of data in which the implementation details are hidden (abstracted) from the user.
• Data encapsulation – hiding data at the level of the data type.

General Idea: view the ADT as a black box (information hiding). Inputs and outputs are known; how things are programmed is not.
General Rule: keep internal calculations private.
Advantage: better modularity. Localizes changes; better division of labor.

Logical and Physical Form of a Data Type
• Logical form – definition of the data item within an ADT. Ex. integers in the mathematical sense, with the operations +, -, *, /
• Physical form – implementation of the data item. Ex. 16- or 32-bit integers

Classification of Data Structures
1. Linear – the data items are arranged in a linear sequence. (Example: Array)
2. Non-Linear – the data items are not in sequence. (Example: Tree, Graph)
3. Homogeneous – a structure whose elements are all of the same type. (Example: Array)
4. Non-Homogeneous – a structure whose elements may or may not be of the same type. (Example: Structures)
5. Static – structures whose size and associated memory locations are fixed at compile time. (Example: Array)
6. Dynamic – structures that expand or shrink depending on the needs of the program during execution. Their associated memory locations also change. (Example: Linked List created using pointers)

Types of Data Structures
• Arrays – store a collection of items at adjoining memory locations. Items of the same type are stored together so that the position of each element can be calculated or retrieved easily. Arrays can be fixed or flexible in length.
• Stacks – store a collection of items in a linear order in which items are added and removed at the same end, so the order of operations is last in, first out (LIFO).
• Queues – store a collection of items similar to a stack; however, the operation order is first in, first out (FIFO).
• Linked Lists – store a collection of items in a linear order. Each element, or node, in a linked list contains a data item as well as a reference, or link, to the next item in the list.
• Trees – store a collection of items in an abstract, hierarchical way. Each node is linked to other nodes and can have multiple sub-values, also known as children.
• Graphs – store a collection of items in a non-linear fashion. Graphs are made up of a finite set of nodes, also known as vertices, and lines that connect them, also known as edges. They are useful for representing real-life systems such as computer networks.
• Tries – also called keyword trees; tree-shaped data structures that store strings so that strings sharing a prefix share the same path from the root.
• Hash Tables – also called hash maps; store a collection of items in an associative array that maps keys to values. A hash table uses a hash function to convert a key into an index into an array of buckets that contain the desired data item.

Uses and Importance of Data Structures
• Implement physical forms of ADTs.
• Used to organize code and information in a digital space.
• Essential for managing large amounts of data (e.g., databases, indexing services).
• Choose the proper data structure for each task.
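The difference between stack ordering and queue ordering can be sketched in Python. This is a minimal illustration under stated assumptions, not an implementation taken from these notes: the Stack class and its method names are hypothetical, and the queue uses collections.deque from the Python standard library. The Stack class also shows the ADT idea of hiding the internal form behind access functions.

```python
from collections import deque


class Stack:
    """A stack ADT: callers use push/pop/is_empty and never touch the list inside."""

    def __init__(self):
        self._items = []  # internal form, hidden from callers by convention

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()  # removes the most recently pushed item

    def is_empty(self):
        return not self._items


stack = Stack()
queue = deque()
for job in ["a", "b", "c"]:
    stack.push(job)    # add to the top of the stack
    queue.append(job)  # add to the back of the queue

stack_order = [stack.pop() for _ in range(3)]      # last in, first out
queue_order = [queue.popleft() for _ in range(3)]  # first in, first out
```

The same three items come out as c, b, a from the stack and as a, b, c from the queue.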
Data Structure Operations

Actions performed as part of the data structure implementation; fundamental actions that can be performed on various data structures to manipulate, access, and manage data.
• Inserting – adding a new record to the structure.
• Deleting – removing a record from the structure.
• Sorting – arranging the records in some logical order.
• Searching – finding the location of the record given a key value.
• Traversing – accessing each record exactly once so that certain items in the record may be processed; that is, visiting and examining all elements or nodes within the data structure. This can be done in different orders, such as in-order, pre-order, or post-order for trees.
• Merging – combining records in two different sorted files into a single sorted file.
• Searching for Min/Max – finding the minimum or maximum element in the data structure.
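The operations above can be tried out on a small Python list. This is only a sketch: the sample values and variable names are made up for illustration, and the standard-library modules bisect and heapq stand in for the searching and merging steps.

```python
import bisect
from heapq import merge

scores = [40, 10, 30]

scores.append(20)   # inserting: add a new record
scores.remove(40)   # deleting: remove a record
scores.sort()       # sorting: arrange in ascending order

# searching: locate the key 20 in the sorted list (binary search)
position = bisect.bisect_left(scores, 20)

# traversing: visit each record exactly once
total = 0
for score in scores:
    total += score

# merging: combine two sorted sequences into one sorted sequence
merged = list(merge(scores, [15, 25]))

# searching for min/max
lowest, highest = min(scores), max(scores)
```

Note that bisect_left and merge both assume their inputs are already sorted, which is why the sort step comes first.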