Lecture 3 Data Structure Overview

Data Structures/Containers
Overviews
Standard Containers
plus properties
Consumers vs Producers
• Intelligent Consumers of Data Structures know
– What operations are supported
– Complexity of operations
– Memory costs of operations
– In your code documentation, include costs if not O(1)
• Isn’t this enough?
• Can’t we let the theoreticians build great data
structures and algorithms and use them?
• Sometimes - but
– may need to adapt the algorithms
– Or reuse the ideas
– Or even, be a producer
Review
Data Structure      Retrieve/add       Complexity
Stack               get youngest       O(1)
                    put                O(1)
Queue               get oldest         O(1)
                    put                O(1)
Linked List         get any            O(N)
  unordered         put                O(1)
  ordered           put                O(N)
Search Tree         get or put         O(logN)
Hash Table          get or put         usually O(1)
Priority Queue      get minimum        O(1)
                    delete minimum     O(logN)
                    put                O(logN)
Data Structures
• Why these data structures?
– experience shows these are generally useful building
blocks
– Different classes of programs have different building
blocks
– Maybe more building blocks should be discovered.
• Composition/ Hybrid data structure
– can compose data structures
– e.g. list of trees, hashtable of binary trees, trees can be
implemented as list of lists
– Hybrid algorithms also useful, e.g.
quicksort+bubblesort.
Selecting a Data Structure
• In TSP, suppose we move a city in a tour?
– How should tour be represented?
• In keeping a personal address book, add/delete a
person
• If managing a telephone directory that needs to print
names in order
• add/delete bank transactions
• Spell checker vs spell corrector
Selecting a Data Structure
• In TSP, suppose we add/delete a city to a tour?
– How should tour be represented?
– linked list
• In keeping a personal address book, add/delete a
person
– hash table
• If managing a telephone directory that needs to print
names in order
– sorted tree
• add/delete bank transactions
– queue, to maintain time-order (single point)
– priority queue, multiple entry points
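Two of the choices above can be sketched with the standard java.util containers. This is only an illustration: the class name ContainerChoice, the helper names, and the sample entries are made up for the example. A HashMap stands in for the address book and a TreeMap (a balanced search tree) for the directory that must print in order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ContainerChoice {
    // Adds the same sample entries to whichever map is passed in.
    static Map<String, String> fill(Map<String, String> book) {
        book.put("Turing", "555-0101");
        book.put("Hopper", "555-0102");
        book.put("Knuth", "555-0103");
        return book;
    }

    // Returns the names in the order the map iterates over them.
    static List<String> namesInIterationOrder(Map<String, String> book) {
        return new ArrayList<>(book.keySet());
    }

    public static void main(String[] args) {
        // Address book: add/delete/lookup by name, order does not matter.
        Map<String, String> addressBook = fill(new HashMap<>());
        addressBook.remove("Knuth");
        System.out.println(addressBook.get("Hopper"));

        // Telephone directory: a TreeMap keeps its keys sorted,
        // so printing in order is a plain traversal.
        System.out.println(namesInIterationOrder(fill(new TreeMap<>())));
    }
}
```

With the HashMap the iteration order is unspecified, which is why it suits the address book but not the printed directory.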
Selecting a Data Structure
• Polynomials
– methods: add, multiply, solve, factor, differentiate,
integrate, find extrema,...
– representation:
• dense entries: array
– position implicitly encodes degree
– implicit information is more efficient
• sparse entries: list of pairs (degree, coefficient)
– information explicit
– explicit information is more comprehensible
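The trade-off between the two representations shows up already in addition. The sketch below is an assumption-laden illustration, not the course's implementation: PolyRepr, addDense and addSparse are hypothetical names, and a sorted map from degree to coefficient stands in for the list of (degree, coefficient) pairs.

```java
import java.util.Map;
import java.util.TreeMap;

class PolyRepr {
    // Dense: index i holds the coefficient of x^i, so the degree is implicit.
    static int[] addDense(int[] a, int[] b) {
        int[] sum = new int[Math.max(a.length, b.length)];
        for (int i = 0; i < a.length; i++) sum[i] += a[i];
        for (int i = 0; i < b.length; i++) sum[i] += b[i];
        return sum;
    }

    // Sparse: explicit degree -> coefficient entries; only non-zero terms are stored.
    static Map<Integer, Integer> addSparse(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        Map<Integer, Integer> sum = new TreeMap<>(a);
        for (Map.Entry<Integer, Integer> e : b.entrySet()) {
            sum.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        sum.values().removeIf(c -> c == 0);   // drop terms that cancelled out
        return sum;
    }
}
```

For a polynomial such as 4x^100 + 1 the dense array needs 101 slots while the sparse form stores two entries; for a polynomial with most coefficients present the array is the more compact choice.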
Class Polynomial
• Constructor Polynomial(String s)
e.g. new Polynomial("3*x^3+ x^2+ 1")
Methods:
void add(Polynomial p)
void mult(Polynomial p)
public String toString()
Non-obvious
private void simplify()
private void sort()
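Theorem: for general symbolic expressions, guaranteed, absolute simplification is impossible (polynomials themselves do have a canonical form: sort the terms and combine like terms).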
Polynomial
• Representation/implementation
Array: size = maximum degree+1
Linked list: size = number of terms where
term = pair(coeff, degree)
Useful: class Term implements Comparable
just define int compareTo(Object o)
Term
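class Term implements Comparable
{ int coeff, exp;
  Term(int c, int e)
  {
    coeff = c;
    exp = e;
  }
  // Orders terms by descending exponent, then by descending coefficient.
  public int compareTo(Object o)
  {
    Term t = (Term)o;
    // Integer.compare avoids the overflow that subtraction can cause.
    if (t.exp != exp) return Integer.compare(t.exp, exp);
    else return Integer.compare(t.coeff, coeff);
  } }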
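One plausible reading of the private sort() and simplify() methods is "sort the terms, then combine neighbours with equal exponents". The sketch below follows that reading; it is a guess at the intent, not the course's code. PolyTerm and PolySimplify are hypothetical names, and the generic Comparable<PolyTerm> is used here instead of the raw interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PolyTerm implements Comparable<PolyTerm> {
    final int coeff, exp;

    PolyTerm(int c, int e) { coeff = c; exp = e; }

    // Highest exponent first; ties broken by larger coefficient first.
    public int compareTo(PolyTerm t) {
        if (t.exp != exp) return Integer.compare(t.exp, exp);
        return Integer.compare(t.coeff, coeff);
    }
}

class PolySimplify {
    // Sort the terms, then merge runs of terms that share an exponent.
    static List<PolyTerm> simplify(List<PolyTerm> terms) {
        List<PolyTerm> sorted = new ArrayList<>(terms);
        Collections.sort(sorted);
        List<PolyTerm> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            int exp = sorted.get(i).exp;
            int coeff = 0;
            while (i < sorted.size() && sorted.get(i).exp == exp) {
                coeff += sorted.get(i).coeff;
                i++;
            }
            if (coeff != 0) out.add(new PolyTerm(coeff, exp));   // drop cancelled terms
        }
        return out;
    }
}
```

Sorting first is what makes the merge a single linear pass: after the sort, like terms are always adjacent.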
Polynomial with collections
• Collections.sort(LinkedList l)
– sorts the entries in O(n log n) time by their natural
ordering (i.e. the entries in l implement Comparable)
• Collections.sort(ArrayList a)
– same complexity
• Collections.sort(LinkedList l, Comparator c)
– you can change the ordering by supplying a new object, a
comparator.
– Comparator is an interface with one method to implement,
– int compare(Object o1, Object o2)
– Comparators and Comparables can be used to sort and to
find extrema (minimum or maximum), e.g. with Collections.min and Collections.max
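As a small illustration of natural ordering versus a comparator, the sketch below sorts strings alphabetically and then by length. ComparatorDemo and BY_LENGTH are names invented for the example; Collections.sort, Collections.min and Collections.max are the standard java.util calls.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;

class ComparatorDemo {
    // A comparator that orders strings by length instead of alphabetically.
    static final Comparator<String> BY_LENGTH = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            return Integer.compare(a.length(), b.length());
        }
    };

    // Natural ordering: String implements Comparable, so no comparator is needed.
    static List<String> sortedNaturally(List<String> in) {
        List<String> copy = new LinkedList<>(in);
        Collections.sort(copy);
        return copy;
    }

    // Same list, different ordering, chosen by passing a comparator.
    static List<String> sortedByLength(List<String> in) {
        List<String> copy = new LinkedList<>(in);
        Collections.sort(copy, BY_LENGTH);
        return copy;
    }
}
```

The same comparator also works for extrema: Collections.max(words, ComparatorDemo.BY_LENGTH) returns the longest word without sorting the list.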
Linked List
• Methods
– boolean isEmpty()
– void insert(Object o) O(1); O(N) if ordered
– void delete(Object o) O(N) even if ordered
– … find(Object o) O(N) even if ordered
• Uses:
– languages like LISP, Scheme, CLOS are based on lists
– everything can be done with lists
– one-size fits all => expensive
Ordered Linked List
• Methods
– boolean isEmpty()
– void insert(Object o) O(N)
– void delete(Object o) O(N)
– … find(Object o) O(N)
• Not great performance for work done.
• OK if list short.
Lists
• Types of lists: circular, singly-linked, doubly-linked,
ordered lists, list of lists = trees
• Implementable as dynamic arrays
– if insert(o) overflows the array, allocate a new array that is
twice as large.
• In Collections, LinkedList is a doubly linked list
– boolean contains(Object o)
– boolean add(Object o)
– boolean remove(Object o)
– Iterator iterator()
• supports hasNext(), next() and remove()
• What type of list do you need?
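The iterator's remove() is the supported way to delete elements while walking a java.util.LinkedList. A minimal sketch follows; IteratorDemo and dropEvens are hypothetical names chosen for the example.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class IteratorDemo {
    // Returns a copy of the input with every even number removed.
    static List<Integer> dropEvens(List<Integer> in) {
        LinkedList<Integer> list = new LinkedList<>(in);
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();   // removes the element just returned by next()
            }
        }
        return list;
    }
}
```

Because the iterator already sits at the node being removed, each removal is O(1) on a doubly linked list, whereas list.remove(Object o) would search from the head every time.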
Dynamic Arrays
• What’s the problem with ordinary arrays?
– Overflow
– Replace array by new class DynamicArray.
– When array overflows, allocate twice as much space
and copy old values into new array.
• Comparison with linked list
– Storage: depends on size of objects
– For primitives, dynamic arrays require less storage.
– Time: depends on operations
• adding at head bad, at end good.
– Know your domain. What operations occur?
Frequency?
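The doubling strategy can be sketched in a few lines. This is an illustrative version for int values only; the initial capacity of 2 and the capacity() accessor are assumptions added so the growth is easy to observe.

```java
class DynamicArray {
    private int[] data = new int[2];   // small initial capacity, chosen arbitrarily
    private int size = 0;

    // Appends at the end; doubles the backing array when it is full.
    void add(int value) {
        if (size == data.length) {
            int[] bigger = new int[2 * data.length];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    int size() { return size; }

    int capacity() { return data.length; }
}
```

Doubling means a copy of k elements happens only after k cheap appends, so adding at the end costs O(1) amortized even though a single add can cost O(N).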
Splay Lists
• Splaying is a new idea: probabilistic ordering
• Elements are not moved on inserts, only on finds.
• The goal is to have good average (amortized) performance
for finding elements.
• Insert(object o) O(1) just add to front
• Remove(object o) O(N) no change
• Find(object o) O(N) worst case
– Action: when you find o, move it to the front
– On average, the more probable o is, the cheaper the find: for independent
requests the expected position of oi is 1 + sum over j≠i of pj/(pi+pj)
– General: if p1>p2>…>pn are the probabilities of o1…on,
then the list will (on average) look like o1->o2->…->on.
– Or: the expected ranks follow the probabilities, so oi tends toward rank i.
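The find-then-move-to-front rule is short enough to sketch directly. MoveToFrontList is a hypothetical class name, and returning the number of elements examined from find is an addition made here so the effect of the reordering can be observed.

```java
import java.util.LinkedList;

class MoveToFrontList<T> {
    private final LinkedList<T> items = new LinkedList<>();

    // O(1): new elements simply go to the front.
    void insert(T item) {
        items.addFirst(item);
    }

    // Linear search. On a hit the item is moved to the front.
    // Returns how many elements were examined, or -1 if the item is absent.
    int find(T item) {
        int index = items.indexOf(item);
        if (index < 0) return -1;
        items.remove(index);      // remove(int): delete by position
        items.addFirst(item);
        return index + 1;
    }

    T first() {
        return items.getFirst();
    }
}
```

After a find, a repeated request for the same element examines only one element, which is where the good average behaviour for skewed request probabilities comes from.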
Stack
• Stack: Main Methods
– void push(object) O(1)
– void pop() O(1)
– Object top() O(1)
– boolean isEmpty() O(1)
• Uses
– hold function calls (recursion)
– test for balanced parentheses
– operator parsing
• Easily implemented as singly linked-list
Stack Applications
• Syntax checker
– if next token is a paren, i.e. one of ( { [ ] } )
• if open-paren, push on stack
• if closed-paren, check that it matches the open-paren on top of stack
– if it matches, pop, else return error
– at end of input, the stack must be empty, else return error
• Evaluation of Postfix (or build a tree)
– If token is operand, push on stack
– If token is operator, let k be its arity
• do k pops
• apply operator to those elements
• push result
• Search trees (depth-first search: later in course)
• Backtracking algorithms (later in course)
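The first two applications can be sketched with java.util.ArrayDeque used as a stack. The class and method names are made up for the example, and the postfix evaluator makes simplifying assumptions: tokens are separated by spaces, operands are ints, and only the binary operators + - * / are supported.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackApps {
    // True if every (, [, { is closed by the matching bracket in the right order.
    static boolean isBalanced(String text) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : text.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                stack.push(c);
            } else if (c == ')' || c == ']' || c == '}') {
                if (stack.isEmpty()) return false;
                char open = stack.pop();
                if ((c == ')' && open != '(') || (c == ']' && open != '[')
                        || (c == '}' && open != '{')) {
                    return false;
                }
            }
        }
        return stack.isEmpty();   // leftover open-parens are an error
    }

    // Evaluates a space-separated postfix expression over ints.
    static int evalPostfix(String expr) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String token : expr.trim().split("\\s+")) {
            switch (token) {
                case "+": case "-": case "*": case "/": {
                    int right = stack.pop();   // arity 2, so two pops
                    int left = stack.pop();
                    stack.push(apply(token, left, right));
                    break;
                }
                default:
                    stack.push(Integer.parseInt(token));   // operand
            }
        }
        return stack.pop();
    }

    private static int apply(String op, int a, int b) {
        switch (op) {
            case "+": return a + b;
            case "-": return a - b;
            case "*": return a * b;
            default:  return a / b;
        }
    }
}
```

Note the pop order in evalPostfix: the right operand comes off the stack first, which matters for - and /.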
Queues: FIFO
• Methods
– void enqueue(Object o) adds object
– Object dequeue() returns oldest object
– boolean isEmpty()
– void makeEmpty()
• If no O notation, assume O(1) (time and memory)
• Uses
– model transactions
– model requests
• Easily implementable as a doubly linked list
• Implementing as an array is a little tricky
Queues as Array
• Assume that we have a large enough array
• Otherwise we can use dynamic arrays
• Idea: wrap around
– front points to first entry stored (initialize to 0)
• dequeue: remove and increment front (mod array size)
– back points to last entry stored (initialize to -1)
• enqueue: increment back (mod array size) and insert
– if back + 1 == front (mod array size), the queue is either empty or full, so...
– keep a count of the number of elements stored.
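A minimal sketch of the wrap-around idea, assuming a fixed capacity given at construction. ArrayQueue is a hypothetical class name. In this version both indices advance forward modulo the array size, and the element count is what distinguishes an empty queue from a full one.

```java
import java.util.NoSuchElementException;

class ArrayQueue {
    private final Object[] items;
    private int front = 0;    // index of the oldest entry
    private int back = -1;    // index of the newest entry
    private int count = 0;    // tells empty and full apart

    ArrayQueue(int capacity) {
        items = new Object[capacity];
    }

    boolean isEmpty() {
        return count == 0;
    }

    void enqueue(Object o) {
        if (count == items.length) throw new IllegalStateException("queue full");
        back = (back + 1) % items.length;   // wrap around at the end of the array
        items[back] = o;
        count++;
    }

    Object dequeue() {
        if (count == 0) throw new NoSuchElementException("queue empty");
        Object oldest = items[front];
        items[front] = null;                // drop the reference
        front = (front + 1) % items.length;
        count--;
        return oldest;
    }
}
```

Replacing the "queue full" exception with a copy into an array twice as large would turn this into the dynamic-array variant.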
Queue Applications
• Simulations:
– whenever multiple lines of customers and servers, e.g.
at a bank, grocery store etc.
• Search
– breadth first search (later in course)
• Topological Sorting (later in course)
• File-Servers or printers in a network
– Policy: first-come first serve
– Other Policies: (Priority queues)
• smallest job first
• most important job first
• ….
Basic Trees
• Main Methods
– boolean isEmpty()
– void makeEmpty()
– insert(Object o)/delete(Object o) O(log n) if balanced
– boolean find(Object o) O(log n) if balanced
• Uses
– sorted record keeping, reporting and updating
– dictionary, telephone directory,...
– internal representation for programs in compilation
– Language PROLOG based on trees
– Everything can be done with trees
– Game trees
Applications
• Sorting e.g. heapsort and treesort
• Expression Tree: evaluation of expression
• Parse Tree: Compiler has 3 main steps
– Parse into tokens
– Organize tokens into a Parse Tree
– Generate Code
• Decision Tree
– each internal node is a query
– leaf nodes are conclusions
– e.g. medicine, botany, etc
– can be built automatically from data
Hash Tables
• Main Methods
– void insert(Object key, Object o) usually O(1)
– void remove(Object key, Object o) usually O(1)
– Object retrieve(Object key) usually O(1)
• Amazing!
• Uses
– Whenever add/delete, but don’t care if sorted
– dictionary but not employee records
• problem with weekly/monthly reports (they need sorted output)
– Symbol tables in compilers
• what does a variable/function name refer to
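A compiler's symbol table maps each name to what it refers to, which is exactly the insert/remove/retrieve interface above. The sketch below wraps java.util.HashMap; SymbolTable and its method names are hypothetical, and storing the type as a plain String is a simplification for the example.

```java
import java.util.HashMap;
import java.util.Map;

class SymbolTable {
    private final Map<String, String> types = new HashMap<>();

    // Records (or overwrites) the type a name refers to.
    void declare(String name, String type) {
        types.put(name, type);
    }

    // Forgets a name, e.g. when its scope ends.
    void undeclare(String name) {
        types.remove(name);
    }

    // Returns the declared type, or null if the name is unknown.
    String lookup(String name) {
        return types.get(name);
    }
}
```

Nothing here ever needs the names in sorted order, which is why a hash table fits and a search tree would be unnecessary overhead.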
Priority Queues
• Not a Queue
• Main Methods
– void insert(Object o) O(log n)
– Object findMin() O(1)
– void deleteMin() O(log n)
• Uses
– bank with multiple tellers
• events: customers arrive, depart
• process next event (min)
– How many tellers needed to give good service
– With few events, queueing theory gives an analytic answer
– With many events, simulate.
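The teller question can be explored with a very small simulation. The sketch below is a simplified model, not the course's simulator: customers are served in arrival order, arrival and service times are given as arrays, and a java.util.PriorityQueue holds the time at which each teller next becomes free, so the minimum is always the next teller available. BankSim and totalWait are hypothetical names.

```java
import java.util.PriorityQueue;

class BankSim {
    // arrivals must be in non-decreasing order.
    // Returns the total time customers spend waiting before service starts.
    static long totalWait(int tellers, int[] arrivals, int[] serviceTimes) {
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int t = 0; t < tellers; t++) {
            freeAt.add(0L);                               // every teller starts idle
        }
        long wait = 0;
        for (int i = 0; i < arrivals.length; i++) {
            long tellerFree = freeAt.poll();              // min: next teller to become free
            long start = Math.max(tellerFree, arrivals[i]);
            wait += start - arrivals[i];
            freeAt.add(start + serviceTimes[i]);          // that teller's next departure time
        }
        return wait;
    }
}
```

Running the same arrival data with 1, 2, 3, ... tellers and comparing the total wait gives a rough answer to how many tellers are needed.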
Graphs
• Game Trees are often graphs
– aids checkers and chess
• State-space search (general planning) is a graph
– states (representation of model of world)
– operators: map states into next states
• Path finding is searching through a graph
– 1,000,000 queens problem (solved easily)
– job scheduling
– class/ta/room scheduling
– critical-path analysis
– flow analysis: traffic/water/electric/money/work flow
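As one concrete instance of path finding, the sketch below runs a breadth-first search over adjacency lists and returns the fewest edges between two nodes. The representation (nodes numbered 0..n-1, one list of neighbours per node) and the names GraphSearch and shortestPathLength are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class GraphSearch {
    // Fewest edges from start to goal, or -1 if goal is unreachable.
    static int shortestPathLength(List<List<Integer>> adj, int start, int goal) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);                 // -1 marks "not visited yet"
        dist[start] = 0;
        Queue<Integer> frontier = new ArrayDeque<>();
        frontier.add(start);
        while (!frontier.isEmpty()) {
            int node = frontier.remove();      // FIFO: closest nodes are expanded first
            if (node == goal) return dist[node];
            for (int next : adj.get(node)) {
                if (dist[next] == -1) {
                    dist[next] = dist[node] + 1;
                    frontier.add(next);
                }
            }
        }
        return -1;
    }
}
```

The queue is what makes this breadth-first; swapping it for a stack would give a depth-first search, which still finds a path but not necessarily the shortest one.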
Summary
• Data Structures are the foundation of programs
• Wrong choice of data structure degrades program
significantly.
• Be Data Structure smart.
• Data structures are the engines underlying programs
– a small part of the code
– But major determining factor for performance