Data & Storage Structures: Introductions & Overview

• Introduction to Data Structures
• Introduction to Storage Structures
• Coverage Overview
• Fundamental Building Blocks
• Linear Data Structures
• Non-Linear Information Structures
• Applications
• Some Skills You Should Acquire Through the Course
Motivation for Data Structures
• In a beginning programming course, variables are introduced that
store a single datum.
• In all but trivial programs, however, we need to store multiple
related data items for convenience.
• For example, data items like student records may need to be
stored under a single name for ease of processing.
• Such convenient structuring of data is called data organization; a
container for the data is called a data structure.
• A data structure is a means of organizing data in primary memory
in a form that is convenient for a program to process.
Motivation for Data Structures (contd.)
• A data structure could be a programming construct provided in a
language or it can be defined by the programmer.
• Examples of data structures include:
Arrays, Linked lists, Stacks, Queues, Trees, Graphs
• Data structures are applied in sorting, searching, hash tables, graph
algorithms, pattern matching, data compression, etc.
• A course in data structures is a core course in most undergraduate
Computer Science degree programs.
• The content of this course has not changed much in the last two
decades except for the inclusion of some new algorithms.
Need for Knowledge of Data Structures
• Good programming and problem solving require knowledge of data
structures.
• Without a sufficient understanding of data structures, the level at
which a programmer can work will be severely limited.
• An important distinguishing characteristic of data structures is the
manner in which they are organized.
• Data structures can be linear, hierarchical or unordered.
• Data structures are also categorized as static or dynamic depending
on their allocation strategy.
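As a small illustration of the difference in allocation strategy, the Java sketch below contrasts a fixed-size array with a linked list that grows one node at a time. The class and method names (AllocationDemo, fixedScores, growingScores) are hypothetical and chosen only for this example.

```java
import java.util.LinkedList;
import java.util.List;

final class AllocationDemo {
    // Fixed-size storage: the capacity is decided when the array is created.
    static int[] fixedScores() {
        int[] scores = new int[3];
        scores[0] = 70;
        scores[1] = 85;
        scores[2] = 90;
        return scores; // storing a fourth score would need a new, larger array
    }

    // Dynamic storage: a node is allocated for each element as it arrives.
    static List<Integer> growingScores() {
        List<Integer> scores = new LinkedList<>();
        scores.add(70);
        scores.add(85);
        scores.add(90);
        scores.add(95); // no resizing step is needed
        return scores;
    }

    public static void main(String[] args) {
        System.out.println(fixedScores().length);   // prints 3
        System.out.println(growingScores().size()); // prints 4
    }
}
```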
Motivation for Storage Structures
• The main purpose of computer systems is to execute programs.
• These programs, together with the data they act upon, must be
stored in main memory during execution.
• Ideally, we would want the programs and data to reside in the main
memory permanently.
• However, this is not possible because
• the main memory is usually too small to store all needed data
permanently.
• the main memory is a volatile storage device that loses its contents
when power is turned off or lost.
Motivation for Storage Structures (contd.)
• Since large data sets cannot fit into main memory, secondary
storage structures are necessary to handle such data.
• Magnetic disk is the most common form of secondary storage in
computer systems.
• With storage structures, a small portion of the data is kept in
primary storage, and additional data is read from secondary storage
as needed.
• A data storage structure is a means of organizing data as blocks in
secondary memory in a way that optimizes I/O read/write operations.
• Data storage structures include: Sequential files, Random
access files, Indexed files, B-Trees, B*-Trees.
Need for Efficient Storage Structures
• Secondary memory (disk) is divided into equal-sized blocks.
• The basic I/O operation transfers the contents of one disk block
to/from main memory.
• A disk access is extremely expensive compared to a typical
computer instruction: a magnetic disk access takes milliseconds,
while an instruction takes on the order of nanoseconds.
• Without efficient data storage structures, a program will spend
most of its time retrieving data from secondary storage.
• A binary search tree can be ideal for internal data retrievals, but its
performance is inadequate with disk I/O operations.
How B-Trees Minimize Disk Accesses
• B-Trees are used for this purpose and they are considered the bread
and butter of the database world.
• A block of secondary memory is represented as a node in the
B-tree.
• The more records we can fit into a block the fewer disk accesses are
required to find a record.
• A B-Tree is a flexible storage structure; it exploits the fact
that many records can be read in at a time.
• In a B-tree, data has high locality of reference, a feature that is
extremely important when using secondary storage.
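To make the effect of node size concrete, the following Java sketch gives a rough estimate of how many levels (and hence block reads) a search may touch for a given number of records. It is a back-of-the-envelope model, not a B-Tree implementation: the class name DiskAccessEstimate, the fanout of 100 and the record count are illustrative assumptions, and it assumes one block read per level with no caching.

```java
final class DiskAccessEstimate {
    // Smallest number of levels h such that fanout^h >= records.
    // Each level visited is assumed to cost one block read.
    static int levels(long records, int fanout) {
        int h = 0;
        long reach = 1; // number of records reachable with h levels
        while (reach < records) {
            reach *= fanout;
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        long records = 1_000_000L;
        // Binary tree: two children per node.
        System.out.println(levels(records, 2));   // prints 20
        // Hypothetical B-Tree node with 100 children per block.
        System.out.println(levels(records, 100)); // prints 3
    }
}
```

Under these assumptions, a search that would touch about 20 nodes in a balanced binary tree touches about 3 blocks when each node fills a disk block.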
Course Overview: Fundamental Building Blocks
• Overview of OO concepts
– We start with a review of fundamental OO concepts in Java.
– In particular, we stress inheritance, abstract classes and
interfaces.
– Understanding of these will be assumed and used widely in the
remainder of the course.
• Introduction to design patterns
– These are a record of good practice in software development.
– Software developers increasingly appreciate the need to document
good practices in their trade in the form of design patterns.
– The use of design patterns in software development provides a good
blend of abstraction, reusability and flexibility.
Course Overview: The Need for Algorithm Analysis
• Real-life application development requires the ability to choose
among competing algorithms for any computer process.
• Important criteria for measuring algorithms are: simplicity, clarity,
and space- and time-efficiency.
• To select better algorithms from a pool of alternatives, we should be
able to compute and compare their performances.
• This process may require making compromises; e.g., space- and
time-efficiency can be improved at the cost of increased
programming complexity.
• We'll introduce the Big-O notation used to estimate comparative
performance of algorithms.
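As a minimal sketch of comparing two competing algorithms, the Java code below counts how many array elements a linear search (O(n)) and a binary search (O(log n)) examine when looking for the same key in a sorted array. The class and method names (SearchCost, linearProbes, binaryProbes) are hypothetical and used only for illustration.

```java
final class SearchCost {
    // Linear search: examines elements one by one, O(n) in the worst case.
    static int linearProbes(int[] sorted, int key) {
        int probes = 0;
        for (int value : sorted) {
            probes++;
            if (value == key) break;
        }
        return probes;
    }

    // Binary search: halves the search range each step, O(log n).
    static int binaryProbes(int[] sorted, int key) {
        int probes = 0;
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            probes++;
            if (sorted[mid] == key) break;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid - 1;
        }
        return probes;
    }

    public static void main(String[] args) {
        int[] data = new int[1024];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(linearProbes(data, 1023)); // prints 1024
        System.out.println(binaryProbes(data, 1023)); // prints 11
    }
}
```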
Course Overview: Linear Data Structures
• Review and Big-O analyses of linked lists
• Arrays and linked lists are the “atoms” of all implementations
encountered in data structures.
• They are needed in the implementations of Trees, Graphs, Hashing
techniques and Memory management.
• A thorough understanding of them and their Big-O running time is
essential in this course.
• Review & Application of stacks & queues
• Stacks and queues are revisited using simple applications.
• They are applied further in the implementations of recursive,
tree and graph algorithms.
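One classic simple application of a stack is checking that brackets are balanced. The sketch below is an illustrative example rather than the course's own code; it uses java.util.ArrayDeque as the stack, and the class name BracketChecker is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class BracketChecker {
    // Returns true if every closing bracket matches the most recent
    // unmatched opening bracket and nothing is left open at the end.
    static boolean isBalanced(String text) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : text.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                stack.push(c);
            } else if (c == ')' || c == ']' || c == '}') {
                if (stack.isEmpty()) return false; // nothing to match
                char open = stack.pop();
                if ((c == ')' && open != '(')
                        || (c == ']' && open != '[')
                        || (c == '}' && open != '{')) {
                    return false; // wrong kind of bracket
                }
            }
        }
        return stack.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("{[()]}")); // prints true
        System.out.println(isBalanced("([)]"));   // prints false
    }
}
```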
Course Overview: Recursion & Recursive Algorithms
• Recursion is an important and popular concept in computer science.
• However, recursive algorithms are often not as efficient as their
iterative counterparts.
• It is often easier to reason about recursive algorithms and to prove
them correct than to do so for their iterative cousins.
• We study different forms of recursion concluding with a study of a
backtracking algorithm.
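The efficiency gap between a recursive and an iterative formulation can be seen in a small example. The sketch below computes Fibonacci numbers both ways and counts the recursive calls; it is an illustrative case chosen here (not taken from the course material), and the names FibonacciDemo and calls are hypothetical.

```java
final class FibonacciDemo {
    static long calls = 0; // number of recursive invocations so far

    // Direct recursive definition: short and easy to reason about,
    // but it recomputes the same subproblems many times.
    static long fibRecursive(int n) {
        calls++;
        return n < 2 ? n : fibRecursive(n - 1) + fibRecursive(n - 2);
    }

    // Iterative version: one pass, constant extra space.
    static long fibIterative(int n) {
        long a = 0;
        long b = 1;
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    public static void main(String[] args) {
        calls = 0;
        System.out.println(fibRecursive(20)); // prints 6765
        System.out.println(calls);            // prints 21891
        System.out.println(fibIterative(20)); // prints 6765
    }
}
```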
Course Overview: Trees
• We dedicate seven lectures to the study of trees and related
applications.
• We discuss the motivation for trees and how they can be used to
provide efficient solutions to problems.
• We present expression trees and outline their common application
areas.
• We study heaps and their applications to heap sort and priority
queues.
• We also cover binary search trees and AVL trees, concluding with
B-Trees.
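To give a feel for how a heap supports both sorting and priority queues, the sketch below uses Java's built-in java.util.PriorityQueue (a heap-based priority queue) to output numbers in ascending order by repeatedly removing the smallest element. This shows the idea behind heap sort, not the in-place array version; the class name HeapOrderDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

final class HeapOrderDemo {
    // Builds a min-heap from the input, then removes the minimum
    // repeatedly; the removal order is the sorted order.
    static List<Integer> sortWithHeap(List<Integer> input) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(input);
        List<Integer> sorted = new ArrayList<>();
        while (!heap.isEmpty()) {
            sorted.add(heap.poll());
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortWithHeap(Arrays.asList(5, 1, 4, 2, 3)));
        // prints [1, 2, 3, 4, 5]
    }
}
```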
Course Overview: Graphs
• We start by introducing graphs and their applications, and outline
their possible representations in the computer.
• We then attempt to give a full implementation of graphs and
popular graph traversal techniques.
• The remaining sessions exemplify and provide implementations for
the algorithms:
• The topological sort algorithm.
• Algorithms for tests of connectedness and cycles.
• The shortest path algorithm.
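As a preview of the first algorithm listed, here is a minimal sketch of topological sorting using Kahn's algorithm, one well-known approach that may differ from the one presented in the course. The graph is represented as an adjacency list indexed by vertex number; the class name TopologicalSortDemo and this representation are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class TopologicalSortDemo {
    // adjacency.get(u) lists every vertex v with an edge u -> v.
    // Returns a topological order, or an empty list if there is a cycle.
    static List<Integer> topologicalOrder(List<List<Integer>> adjacency) {
        int n = adjacency.size();
        int[] inDegree = new int[n];
        for (List<Integer> targets : adjacency) {
            for (int v : targets) inDegree[v]++;
        }
        Deque<Integer> ready = new ArrayDeque<>(); // vertices with no pending prerequisites
        for (int v = 0; v < n; v++) {
            if (inDegree[v] == 0) ready.add(v);
        }
        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int u = ready.remove();
            order.add(u);
            for (int v : adjacency.get(u)) {
                if (--inDegree[v] == 0) ready.add(v);
            }
        }
        // If some vertices were never released, the graph contains a cycle.
        return order.size() == n ? order : new ArrayList<>();
    }

    public static void main(String[] args) {
        // Edges: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
        List<List<Integer>> graph = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3),
                Arrays.asList(3), new ArrayList<Integer>());
        System.out.println(topologicalOrder(graph)); // prints [0, 1, 2, 3]
    }
}
```

Because the result is empty when a cycle exists, the same routine also doubles as a simple cycle test for directed graphs.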
Course Overview: Hashing Techniques
• Review of searching techniques and introduction to hashing.
• Hash tables and hash functions are then gently introduced using
simple examples.
• The problem of collisions is demonstrated and classical techniques for
collision resolution are outlined.
• The possibility of building perfect hash functions is demonstrated.
• Implementation of the different hashing techniques is presented at
the end.
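The collision problem and one classical resolution technique, separate chaining, can be sketched in a few lines of Java. This is an illustrative toy for integer keys, not the course's implementation; the class name ChainedHashSet, the modulo hash function and the table size of 7 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class ChainedHashSet {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Hash function: key modulo table size (floorMod keeps negatives in range).
    int indexOf(int key) {
        return Math.floorMod(key, buckets.size());
    }

    // Keys that hash to the same index are kept in that bucket's chain.
    boolean add(int key) {
        LinkedList<Integer> chain = buckets.get(indexOf(key));
        if (chain.contains(key)) return false; // already present
        chain.add(key);
        return true;
    }

    boolean contains(int key) {
        return buckets.get(indexOf(key)).contains(key);
    }

    int chainLength(int index) {
        return buckets.get(index).size();
    }

    public static void main(String[] args) {
        ChainedHashSet set = new ChainedHashSet(7);
        set.add(10);
        set.add(17); // 10 and 17 both hash to index 3: a collision
        System.out.println(set.chainLength(3)); // prints 2
        System.out.println(set.contains(17));   // prints true
    }
}
```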
Course Overview: Data Compression & Memory Management
• Data Compression
• Presents motivations for, classifications of and kinds of data
compression schemes.
• Gives animated examples of the following compression algorithms:
• Huffman coding
• LZ78 coding
• LZW coding
• Memory management & garbage collection
• So far, we understood how objects can become garbage and
assumed the space they held will be freed.
• In conclusion, we give an insight into garbage collection
schemes and other memory management issues.
Concluding Exercise & Learning Goals
• Concluding exercise:
• Browse through each of the sections outlined in this presentation
as a guided tour through the course material.
• Upon completing this course, you should be able to identify,
evaluate, and select suitable and reliable data structures for a
range of applications.