
Data Structures and Algorithms Cheat Sheet

Proving a function is of a certain O
• State: must find constants c > 0 and n0 ≥ 1 such that f(n) ≤ c·g(n) for all n ≥ n0
• Simplify the inequality, with the goal being a number on one side and c and n0 on the other
• Choose a value for c that simplifies the inequality the most (many values possible)
• Choose a value for n0 that is ≥ 1 and satisfies the inequality (many values possible)
• If given the equation as general functions, keep c and n0 general and show they exist
• Example: show f(n)·h(n) is O(g(n)·h(n)) if f(n) is O(g(n)) and all functions are non-negative
  o Must show f(n)·h(n) ≤ c·g(n)·h(n) for all n ≥ n0
  o As f(n) is O(g(n)), there are constants c′ > 0 and n0′ ≥ 1 such that f(n) ≤ c′·g(n) for all n ≥ n0′
  o Multiply both sides by h(n) to get f(n)·h(n) ≤ c′·g(n)·h(n) for all n ≥ n0′
  o Choose n0 = n0′ and c = c′, so c and n0 exist, proving the inequality
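The constant-picking steps above can be spot-checked numerically (a sketch only: checking a finite range of n is evidence, not a proof; the claim f(n) = 3n + 10 is O(n) and the constants c = 4, n0 = 10 are illustrative choices):

```python
def holds(f, g, c, n0, n_max=10_000):
    """Check f(n) <= c*g(n) for every n0 <= n <= n_max (a spot check, not a proof)."""
    return all(f(n) <= c * g(n) for n in range(n0, n_max + 1))

# Claim: f(n) = 3n + 10 is O(n). With c = 4, n0 = 10: 3n + 10 <= 4n whenever n >= 10.
print(holds(lambda n: 3 * n + 10, lambda n: n, c=4, n0=10))  # True
print(holds(lambda n: 3 * n + 10, lambda n: n, c=3, n0=10))  # False: 3n + 10 > 3n always
```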
Computing Time Complexity
• Find components that are constant and label them c#
• Find expensive parts (loops and recursive calls) and find out how many times they run
• If there are recursive calls in loops, figure out the # of loop runs without the recursive call,
  o then figure out the effect of the recursive call
• Add together all components, then cross out constants
• The highest time complexity left is the time complexity
• Fastest to slowest time complexities (everything up through polynomial is Class P / efficient algorithms):
  O(1) < O(log n) < O(n) < O(n log n) < O(n²) < O(nᵃ) < O(bⁿ) < O(n!) < O(nⁿ)
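As an illustration of counting how many times the expensive parts run (a hypothetical nested loop, not an example from the notes):

```python
def count_ops(n):
    """Count iterations of a doubly nested loop: each iteration is one
    constant-time unit of work (a c# component from the rules above)."""
    ops = 0
    for i in range(n):        # outer loop runs n times
        for j in range(n):    # inner loop runs n times per outer pass
            ops += 1          # constant work
    return ops

print(count_ops(10))  # 100 -> the count grows as n*n, so the nest is O(n^2)
```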
Hash Tables
• Uses a hash function (containing a hash code and a compression map) to map keys to the hash table (array)
• Hash code converts a key (of any variable type) to an integer
  o Most popular is the polynomial hash code (fewest collisions)
• Collisions can be resolved by:
  o Linear probing → finds the next available index in the table from the starting position
  o Separate chaining → each array index is a linked list
  o Double hashing → uses a secondary hash function after the initial hash function until a free index is found
• Average time complexity of get(k), put(k,d), and remove(k) is O(1) w/ a good hash function, O(n) w/ a bad one
• Average time complexity of smallest(k), largest(k), predecessor(k), and successor(k) is O(n)
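The polynomial hash code plus compression map can be sketched as follows (the function names and the base a = 33 are illustrative choices, not from the notes):

```python
def poly_hash_code(key: str, a: int = 33) -> int:
    """Polynomial hash code: treat the characters of the key as coefficients
    of a polynomial in a, evaluated by Horner's rule."""
    h = 0
    for ch in key:
        h = h * a + ord(ch)
    return h

def compress(code: int, table_size: int) -> int:
    """Compression map: fold the (arbitrarily large) hash code into a table index."""
    return code % table_size

idx = compress(poly_hash_code("key"), 13)  # index into a table of 13 buckets
print(idx)
```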
AVL Trees
• Binary search tree w/ internal node height difference of its subtrees of at most 1
• Height is O(log n)
• Get, put, remove, smallest, largest, successor, and predecessor are O(log n)
• Always rebalance the smallest unbalanced subtree
• If the tree becomes unbalanced b/c of insertion, one rotation will rebalance the tree
• When both a single and a double rotation can be applied to an unbalanced tree, the single rotation rebalances the tree
• If the tree becomes unbalanced after removal, several rotations may be needed to rebalance the tree
Minimum Spanning Trees → Prim's
• A spanning subgraph is a subgraph of a graph G containing all vertices of G
• A spanning tree is a spanning subgraph that is itself a tree
• Minimum spanning trees are spanning trees of a weighted graph w/ minimum total edge weight
• Use Prim's algo to find the minimum spanning tree
• Prim's is O(n²) for adjacency list and adjacency matrix
Binary Search Trees
• Max 2 children, and nodes are ordered (a proper binary tree has exactly 2 children per internal node)
• # leaves = # internal nodes + 1
• Max # of nodes at level i = 2ⁱ, and max # of leaves = 2ʰ, h = height
• Every key in the left subtree of u is smaller than key(u); vice versa, every key in the right subtree is larger
• Performing an in-order traversal returns the keys from smallest to largest
• If removing an internal node u, it is replaced by the smallest key larger than u (its in-order successor)
• If one child of the tree is a leaf and the root is removed, the other child becomes the root
• Smallest, largest, get, put, remove, and successor are O(height)
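A minimal BST sketch illustrating put and the in-order property (class and function names are assumptions for illustration):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def put(root, key):
    """Insert key, preserving the BST invariant (smaller left, larger right)."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = put(root.left, key)
    elif key > root.key:
        root.right = put(root.right, key)
    return root

def inorder(root):
    """In-order traversal yields the keys smallest to largest."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)

root = None
for k in [5, 2, 8, 1, 3]:
    root = put(root, k)
print(inorder(root))  # [1, 2, 3, 5, 8]
```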
Recurrence Equations
• Equations represented in terms of themselves
• Example: f(0) = c1, f(n) = f((n − 1)/2) + c2 for n > 0
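Assuming the recurrence reads f(0) = c1, f(n) = f((n − 1)/2) + c2, it can be evaluated directly to see its logarithmic growth (integer division stands in for the floor):

```python
def f(n, c1=1, c2=1):
    """Evaluate f(0) = c1, f(n) = f((n - 1) // 2) + c2 by direct recursion."""
    if n == 0:
        return c1
    return f((n - 1) // 2, c1, c2) + c2

# The argument roughly halves each call, so f(n) ~ c2*log2(n) + c1:
print([f(n) for n in [1, 3, 7, 15]])  # [2, 3, 4, 5]
```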
Other Time Complexities
• Linear search: O(n)
• Binary search: O(log n)

Multi-Way Search Tree → an ordered tree such that:
• Each internal node has at least 2 children and at most d children; # children = 1 + # data items
• An internal node stores d − 1 data items (ki, Di)
• An internal node storing keys k1 ≤ k2 ≤ … ≤ k(d−1) has d children v1, v2, …, vd
• For convenience we add sentinel keys k0 = −∞ and kd = +∞
• Leaves hold no items and serve as placeholders
• In-order traversal visits keys in increasing order
• Requires a secondary data structure
• Smallest, largest are O(height)
• Get, successor, and predecessor are O(height × log d)
• Put and remove are O(d + height × log d)
Trees
• Height of the tree is the maximum depth of any node
• Degree of a node is the # of children (sum of degrees in a tree = # of edges)
• Depth or level is the # of ancestors
• 3 different tree traversals:
  o Pre-order: visit(u), then left, then right
  o Post-order: left child, then right, then visit(u)
  o In-order: left child, visit(u), right child
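The three traversals can be sketched on tuple-encoded nodes (the (key, left, right) encoding is an illustrative assumption):

```python
# Each node is (key, left_subtree, right_subtree); None is an empty tree.
def preorder(t):
    return [] if t is None else [t[0]] + preorder(t[1]) + preorder(t[2])

def postorder(t):
    return [] if t is None else postorder(t[1]) + postorder(t[2]) + [t[0]]

def inorder(t):
    return [] if t is None else inorder(t[1]) + [t[0]] + inorder(t[2])

#        2
#       / \
#      1   3
tree = (2, (1, None, None), (3, None, None))
print(preorder(tree), inorder(tree), postorder(tree))
# [2, 1, 3] [1, 2, 3] [1, 3, 2]
```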
Shortest Paths
• For weighted graphs (each edge has a # value) use Dijkstra's
• For unweighted graphs use BFS
• A subpath of a shortest path is itself a shortest path
• There is a tree of shortest paths from a start vertex to all other vertices
• Dijkstra's assumes the graph is connected, the edges are undirected, and the edge weights are non-negative
• Edge relaxation is when you recalculate the minimum distance to a node that is not yet in the path
  o Equals min{current min, (path distance + edge length)}
• Dijkstra's is O(n²) for adjacency list and adjacency matrix
B-Trees → multiway search tree of order d such that:
• The root has at least 2 children and at most d
• All internal nodes other than the root have at least ⌈d/2⌉ (round up) and at most d children
• All the leaves are at the same level
• Height is O(log_d n)
• Put and remove are the same as in 2-4 trees and are O(d log n)
Merge step (of merge sort) pseudocode:
  Var initializations
  While both vars in array indices
    If/else: add correct value
  If first is in array indices
    While first not out of array indices
      Add elements
  Else
    While second not out of array indices
      Add elements
  Return
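The merge outline above can be fleshed out as a concrete function (a minimal sketch, assuming the two inputs are already sorted):

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list (the merge step of merge sort)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):   # while both indices are in range
        if a[i] <= b[j]:               # add the smaller front element
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])                  # one of these tails is empty;
    out.extend(b[j:])                  # the other holds the leftovers
    return out

print(merge([1, 4, 9], [2, 3, 10]))  # [1, 2, 3, 4, 9, 10]
```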
Graph representations (n = # vertices, m = # edges):

                        Edge List   Adjacency List          Adjacency Matrix
  Space                 O(n+m)      O(n+m)                  O(n²)
  incidentEdges(u)      O(m)        O(deg(u))               O(n)
  areAdjacent(u,v)      O(m)        O(min{deg(u),deg(v)})   O(1)
  insertVertex(u)       O(1)        O(1)                    O(n²)
  insertEdge(u,v,w)     O(1)        O(1)                    O(1)
  removeVertex(u)       O(m)        O(deg(u))               O(n²)
  removeEdge(u,v)       O(m)        O(deg(u)+deg(v))        O(1)
Graphs
• Graph is connected if there is a path from each node to all other nodes
• A tree is a graph without cycles (m = n − 1) / a forest is a set of trees (m ≤ n − 1)
• Subgraph is a subset of vertices and edges that forms a graph
• Connected component is a maximal connected subgraph
Memory
• External memory is divided into blocks called disk blocks
• Transfer of a block between external and primary memory is called a disk transfer or I/O
• Want to minimize the number of disk transfers, referred to as the I/O complexity
• B-Trees fit in disk blocks better than AVL Trees

DFS pseudocode outline:
  Mark(u)
  If/else (sometimes)
  For each edge (u,v) incident on u
    If unmarked, then: operations + recursive call
  Return
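The DFS outline can be fleshed out as a recursive sketch (the adjacency-dict representation and names are assumptions for illustration):

```python
def dfs(adj, u, visited=None):
    """DFS matching the outline: mark u, then recurse on each
    unvisited neighbor reached via an edge incident on u."""
    if visited is None:
        visited = []
    visited.append(u)            # Mark(u)
    for v in adj[u]:             # for each edge (u, v) incident on u
        if v not in visited:     # unmarked -> recursive call
            dfs(adj, v, visited)
    return visited

adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(dfs(adj, 0))  # [0, 1, 3, 2]
```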
2-4 Trees → multiway search tree such that:
• Every internal node has 2, 3, or 4 children AND all of the leaves are at the same level
• Height is O(log n)
• Put and remove are O(log n)
• Insertion can cause overflow, which is handled with a split() operation around the 2nd key
  o Take the 2nd key and move it to the parent; the nodes left and right of the second key become children
  o Overflow may propagate to the parent node, in which case make the second key the root
• 2 cases for deletion: node with or without leaf children
  o Without: replace the node with its in-order successor/predecessor
  o With (2 cases):
    - Fusion: adjacent siblings are 2-nodes (2 children); underflow may propagate to the parent
      • the parent key from between the children joins the other child
    - Transfer: an adjacent sibling is a 3-node or 4-node (no underflow)
      • take a key from the sibling to the parent; the parent key goes to the underflow node
• A pair (V,E) where V is a set of vertices and E is a collection of edges (u,v)
• Edges can be directed: ordered pair of vertices (u,v); u is the start, v is the end
• Edges can be undirected: unordered pair of vertices (u,v)
• Directed graph: all edges directed / undirected graph: no edges directed
• Mixed graph: directed and undirected edges
• Endpoints or end vertices of an edge: U and V are the endpoints of a / a, d, and b are incident on V
• X has degree 5 / U and V are adjacent
• h and i are parallel edges / j is a self-loop
• A graph without parallel edges or self-loops is a simple graph
• Path is a sequence of adjacent vertices / simple path: path s.t. all vertices and edges are distinct
• Cycle is a circular sequence / simple cycle: all vertices and edges are distinct except start and end
• Sum of degrees in a graph equals 2m, m = # edges
Sorting Methods
• Sorting an ordered dict can have different time complexities depending on the data structure:
  o Unordered array (selection sort) is O(n²)
  o Sorted array (insertion sort) is O(n²)
  o AVL or 2-4 tree is O(n log n)
• In-place sorting algorithms:
  o In-place insertion sort is O(n²)
  o Merge sort is O(n log n): splits the array in half repeatedly, then merges the pieces based on order
  o Quick sort is O(n log n) on avg and O(n²) in the worst case
  o Merge & quick sort use divide and conquer
  o Quick sort chooses a pivot, then moves smaller elements left (vice versa for larger)
  o Then quick sort sorts the left and right pieces and combines it all
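The quick sort description can be sketched as follows (a non-in-place version for clarity; the middle-element pivot is an illustrative choice):

```python
def quick_sort(a):
    """Quick sort sketch: pick a pivot, partition into smaller/equal/larger,
    recurse on the two outer pieces, then combine."""
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    smaller = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    larger  = [x for x in a if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)

print(quick_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```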
Directed Graphs → each edge is one direction
• If the graph is simple, m ≤ n(n − 1)
• BFS and DFS still work if modified
• DFS tree rooted at v: shows all vertices reachable from v via directed paths
• Strong connectivity is where every node can reach all others
• A directed acyclic graph (DAG) is a directed graph that has no directed cycles
• A topological ordering of a directed graph is a numbering v1, …, vn of the vertices such that:
  o For every edge (vi, vj) we have i < j
• An ordering is topological if you can draw all arrows pointing only right
• Must be a DAG to have a topological ordering
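A topological ordering can be computed with Kahn's algorithm (a standard approach, not described in the notes: repeatedly remove vertices of in-degree 0):

```python
from collections import deque

def topological_order(n, edges):
    """Kahn's algorithm on vertices 0..n-1; returns None if the graph has a
    directed cycle (i.e., it is not a DAG, so no ordering exists)."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    q = deque(i for i in range(n) if indeg[i] == 0)
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1           # "remove" edge (u, v)
            if indeg[v] == 0:
                q.append(v)
    return order if len(order) == n else None

print(topological_order(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # [0, 1, 2, 3]
```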
Graph Traversals: BFS & DFS
• Depth-first search visits all the vertices and edges in the connected component of node v
• The discovery edges labeled by DFS form a spanning tree of v called a DFS tree
• Breadth-first search visits all of the nodes adjacent to a node
  o Places found nodes into the queue, then finds their adjacent nodes
• BFS visits all of the vertices and edges of a connected component
• Discovery edges from BFS form a spanning tree called a BFS tree
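The queue-based BFS described above can be sketched as (the adjacency-dict representation is an assumption):

```python
from collections import deque

def bfs(adj, start):
    """BFS: visit a node, queue its unvisited neighbors, repeat level by level."""
    visited = {start}
    order = []
    q = deque([start])
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:            # discover u's adjacent nodes
            if v not in visited:
                visited.add(v)
                q.append(v)         # found node goes into the queue
    return order

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(bfs(adj, 0))  # [0, 1, 2, 3]
```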
Prim's Algorithm
1. Starts the same as Dijkstra's and runs similarly
   • Difference is we only care about the min edge distance
2. Edge with min distance is chosen, then all edges incident on it are evaluated
   a. Node info is updated if there's an edge with smaller length
3. Edge with shortest length is traversed and the process is repeated
4. The end result of Prim's (note that the distance values are the shortest edge to the node)
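A heap-based sketch of Prim's (the notes state O(n²) for the array-based version; this common variant runs in O(m log n), and the adjacency format is an assumption):

```python
import heapq

def prim_mst_weight(adj, start=0):
    """Prim's: grow the tree from start, always taking the cheapest edge that
    reaches a new vertex. adj maps u -> list of (v, weight), undirected."""
    visited = set()
    total = 0
    heap = [(0, start)]              # (edge weight to reach vertex, vertex)
    while heap:
        w, u = heapq.heappop(heap)   # min edge distance is chosen
        if u in visited:
            continue
        visited.add(u)
        total += w
        for v, wv in adj[u]:         # evaluate edges incident on u
            if v not in visited:
                heapq.heappush(heap, (wv, v))
    return total

adj = {0: [(1, 4), (2, 1)], 1: [(0, 4), (2, 2)], 2: [(0, 1), (1, 2)]}
print(prim_mst_weight(adj))  # 3  (edges 0-2 and 2-1)
```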
Dijkstra's Algorithm
1. Estimated shortest path and predecessor node are stored (from the starting node)
   • Nodes that can't be estimated from the discovered nodes have d = ∞ and p = null
2. Path of length 1 is chosen, then all edges incident on it are evaluated
   a. Node info is updated if a shorter path can be found
3. When no more paths of length 1 can be found, one of the paths of length 2 is chosen
   a. then all edges incident on it are evaluated and node info is updated
   • node 4 isn't updated b/c the current shortest path is d=2 and node 4 is d=3 going from 3 to 4
4. The other path of length 2 is chosen and this process is repeated until all nodes are found
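The steps above can be sketched with a priority queue (a common heap-based variant; the adjacency format is an assumption, and the edge-relaxation rule is the min{current min, path distance + edge length} from the Shortest Paths notes):

```python
import heapq

def dijkstra(adj, start):
    """Dijkstra: repeatedly settle the closest unsettled vertex and relax its
    outgoing edges. adj maps u -> list of (v, weight); weights non-negative."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                 # stale heap entry; u already settled closer
        for v, w in adj[u]:
            nd = d + w               # relaxation: path distance + edge length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

adj = {0: [(1, 2), (2, 5)], 1: [(2, 1)], 2: []}
print(dijkstra(adj, 0))  # {0: 0, 1: 2, 2: 3}
```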