Data & Storage Structures: Introduction & Overview
• Introduction to Data Structures
• Introduction to Storage Structures
• Course Overview
• Fundamental Building Blocks
• Linear Data Structures
• Non-Linear Data Structures
• Applications
• Some Skills You Should Acquire Through the Course

Motivation for Data Structures
• In a beginning programming course, variables are introduced that store a single datum.
• In all but trivial programs, however, we need to store multiple related data items together.
• For example, data items such as student records may need to be stored under a single name for ease of processing.
• Such convenient structuring of data is called data organization; a container for the data is called a data structure.
• A data structure is a means of organizing data in primary memory in a form that is convenient for a program to process.

Motivation for Data Structures (contd.)
• A data structure may be a construct provided by the programming language, or it may be defined by the programmer.
• Examples of data structures include arrays, linked lists, stacks, queues, trees, and graphs.
• Data structures are applied in sorting, searching, hash tables, graph algorithms, pattern matching, data compression, etc.
• A course in data structures is a core course in most undergraduate Computer Science degree programs.
• The content of this course has not changed much in the last two decades, except for the inclusion of some new algorithms.

Need for Knowledge of Data Structures
• Good programming and problem solving require knowledge of data structures.
• Without a sufficient understanding of data structures, the level at which a programmer can work is severely limited.
• An important distinguishing characteristic of data structures is the manner in which they are organized.
• Data structures can be linear, hierarchical, or unordered.
• Data structures are also categorized as static or dynamic, depending on their allocation strategy.
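As a minimal sketch of the points above, assuming Java (the language this course uses for its OO review): a hypothetical `Student` class groups related data items under one name, a fixed-size array illustrates static allocation, and `java.util.LinkedList` illustrates dynamic allocation. All class, field, and method names here are illustrative assumptions, not part of the course material.

```java
import java.util.LinkedList;
import java.util.List;

class StudentRecordsDemo {

    // Hypothetical record type: several related data items grouped under one name.
    static final class Student {
        final int id;
        final String name;
        final double gpa;

        Student(int id, String name, double gpa) {
            this.id = id;
            this.name = name;
            this.gpa = gpa;
        }
    }

    // Static allocation: the array's capacity is fixed when it is created.
    static Student[] buildStaticRoster() {
        Student[] roster = new Student[3];
        roster[0] = new Student(1, "Amina", 3.6);
        roster[1] = new Student(2, "Bilal", 3.1);
        roster[2] = new Student(3, "Chen", 3.9);
        return roster;
    }

    // Dynamic allocation: the linked list grows one node at a time as records arrive.
    static List<Student> buildDynamicRoster(Student[] source) {
        List<Student> roster = new LinkedList<>();
        for (Student s : source) {
            roster.add(s);
        }
        roster.add(new Student(4, "Dana", 3.4)); // no fixed capacity to exceed
        return roster;
    }

    // Average GPA over any sequence of student records.
    static double averageGpa(Iterable<Student> students) {
        double sum = 0.0;
        int count = 0;
        for (Student s : students) {
            sum += s.gpa;
            count++;
        }
        return count == 0 ? 0.0 : sum / count;
    }

    public static void main(String[] args) {
        Student[] fixed = buildStaticRoster();
        List<Student> growable = buildDynamicRoster(fixed);
        System.out.println("Array capacity: " + fixed.length);
        System.out.println("List size: " + growable.size());
        System.out.println("Average GPA: " + averageGpa(growable));
    }
}
```

Both containers are linear; what differs is the allocation strategy. The array cannot accept a fourth record without being reallocated, whereas the list simply links in a new node.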

Motivation for Storage Structures
• The main purpose of computer systems is to execute programs.
• These programs, together with the data they act upon, must be stored in main memory during execution.
• Ideally, we would want programs and data to reside in main memory permanently.
• However, this is not possible because:
– main memory is usually too small to store all needed data permanently;
– main memory is a volatile storage device that loses its contents when power is turned off or lost.

Motivation for Storage Structures (contd.)
• Since large data sets cannot fit into main memory, secondary storage structures are necessary to handle such data.
• Magnetic disk is the most common form of secondary storage in computer systems.
• With storage structures, a small portion of the data is kept in primary storage, and additional data is read from secondary storage as needed.
• A data storage structure is a means of organizing data as blocks in secondary memory so as to optimize I/O read/write operations.
• Data storage structures include sequential files, random access files, indexed files, B-Trees, and B*-Trees.

Need for Efficient Storage Structures
• Secondary memory (disk) is divided into equal-sized blocks.
• The basic I/O operation transfers the contents of one disk block to or from main memory.
• A disk access is orders of magnitude slower than a typical computer instruction.
• Without efficient data storage structures, a program will spend most of its time retrieving data from secondary storage.
• A binary search tree can be ideal for internal data retrieval, but its performance is inadequate with disk I/O operations.

How B-Trees Minimize Disk Accesses
• B-Trees are used for this purpose, and they are considered the bread and butter of the database world.
• A block of secondary memory is represented as a node in the B-Tree.
• The more records we can fit into a block, the fewer disk accesses are required to find a record.
• A B-Tree is the most flexible storage structure; it exploits the fact that many records can be read in at a time.
• In a B-Tree, data has high locality of reference, a feature that is extremely important when using secondary storage.

Course Overview: Fundamental Building Blocks
• Overview of OO concepts
– We start with a review of fundamental OO concepts in Java.
– In particular, we stress inheritance, abstract classes, and interfaces.
– An understanding of these will be assumed and used widely in the remainder of the course.
• Introduction to design patterns
– These are a record of good practice in software development.
– Software developers increasingly appreciate the need to document good practices in their trade in the form of design patterns.
– The use of design patterns in software development provides a good blend of abstraction, reusability, and flexibility.

Course Overview: The Need for Algorithm Analysis
• Real-life application development requires the ability to choose among competing algorithms for any computer process.
• Important criteria for measuring algorithms are simplicity, clarity, and space- and time-efficiency.
• To select better algorithms from a pool of alternatives, we should be able to compute and compare their performance.
• This process may require making compromises; e.g., space- and time-efficiency can be improved at the cost of increased programming complexity.
• We'll introduce the Big-O notation used to estimate the comparative performance of algorithms.

Course Overview: Linear Data Structures
• Review and Big-O analyses of linked lists
• Arrays and linked lists are the “atoms” of all implementations encountered in data structures.
• They are needed in the implementations of trees, graphs, hashing techniques, and memory management.
• A thorough understanding of them and their Big-O running times is essential in this course.
• Review & application of stacks & queues
• Stacks and queues are revisited using simple applications.
• They are applied further in the implementations of recursive, tree, and graph algorithms.

Course Overview: Recursion & Recursive Algorithms
• Recursion is an important and popular concept in computer science.
• However, recursive algorithms are often not as efficient as equivalent iterative algorithms.
• It is often easier to reason about and prove the correctness of recursive algorithms than of their iterative cousins.
• We study different forms of recursion, concluding with a study of a backtracking algorithm.

Course Overview: Trees
• We dedicate seven lectures to the study of trees and related applications.
• We discuss the motivation for trees and how they can be used to provide efficient solutions to problems.
• We present expression trees and outline their common application areas.
• We study heaps and their applications to heap sort and priority queues.
• We also cover binary search trees and AVL trees, concluding with B-Trees.

Course Overview: Graphs
• We start by introducing graphs and their applications, and outline their possible representations in the computer.
• We then attempt to give a full implementation of graphs and popular graph traversal techniques.
• The remaining sessions exemplify and provide implementations for the following algorithms:
– The topological sort algorithm.
– Algorithms for testing connectedness and detecting cycles.
– The shortest path algorithm.

Course Overview: Hashing Techniques
• Review of searching techniques and introduction to hashing.
• Hash tables and hash functions are then gently introduced using simple examples.
• The problem of collisions is demonstrated, and classical techniques for collision resolution are outlined.
• The possibility of building perfect hash functions is demonstrated.
• Implementations of the different hashing techniques are presented at the end.

Course Overview: Data Compression & Memory Mgt.
• Data Compression
• Presents motivations for, classifications of, and kinds of data compression schemes.
• Gives animated examples of the following compression algorithms:
– Huffman coding
– LZ78 coding
– LZW coding
• Memory management & garbage collection
• So far, we have understood how objects can become garbage and assumed that the space they held will be freed.
• In conclusion, we give an insight into garbage collection schemes and other memory management issues.

Concluding Exercise & Learning Goals
• Concluding exercise:
– Browse through each of the sections outlined in this presentation, as a guided tour through the course material.
• Upon completing this course, you should be able to identify, evaluate, and select suitable and reliable data structures for a range of applications.