cs2110-15-trees

Data Structures:
Trees and Grammars
Readings:
Sections 6.1, 7.1-7.4
(more from Ch. 6 later)
1
Goals for this Unit
• Continue focus on data structures and
algorithms
• Understand concepts of reference-based data structures (e.g. linked lists,
binary trees)
– Some implementation for binary trees
• Understand the usefulness of trees and
hierarchies as data models
– Recursion used to define data organization
2
Taxonomy of Data Structures
• From the text:
– Data type: collection of values and operations
• Compare to Abstract Data Type!
– Simple data types vs. composite data types
• Book: Data structures are composite data
types
– Definition: a collection of elements that are some
combination of primitive and other composite data
types
3
Book’s Classification of Data
Structures
• Four groupings:
– Linear Data Structures
– Hierarchical
– Graph
– Sets and Tables
• When defining these, note an element
has:
– one or more information fields
– relationships with other elements
4
Note on Our Book and Our
Course
• Our book’s strategy
– In Ch. 6, discuss principles of Lists
• Give an interface, then implement from scratch
– In Ch. 7, discuss principles of Trees
– Later, in Ch. 9, see what Java gives us
• Our course’s strategy
– We did Ch. 9 first. Saw List interfaces and
operations
– Then, Ch. 8 on maps and sets
– Now, trees with some implementation too
5
Trees Represent…
• The concept of a tree is very
common and important.
• Tree terminology:
– Nodes have one parent
– A node’s children
• Leaf nodes: no children
• Root node: top or start; no parent
• Data structures that store trees
• Execution or processing that can be
expressed as a tree
– E.g. method calls as a program runs
– Searching a maze or puzzle
6
Trees are Important
• Trees are important for cognition and
computation
– computer science
– language processing (human or computer)
• parse trees
– knowledge representation (or modeling of
the “real world”)
• E.g. family trees; the Linnaean taxonomy
(kingdom, phylum, …, species); etc.
7
Another Tree Example: File System
[Figure: directory tree rooted at C:\ with entries MyMail, school, pers, CS120, CS216, lab1, lab2, lab3, list.h, list.cpp, calc.cpp]
• What about file links (Unix) or shortcuts (Windows)?
8
Another Tree Example: XML and
HTML documents
<HTML>
<HEAD>…</HEAD>
<BODY>
<H1>My Page</H1>
<P> Blah
<PRE>blah blah</PRE>
End
</P>
</BODY>
</HTML>
How is this a tree?
What are the leaves?
9
Tree Data Structures
• Why this now?
– Very useful in coding
• TreeMap in Java Collections Framework
– Example of recursive data structures
– Methods are recursive algorithms
10
Tree Definitions and Terms
• First, general trees vs. binary trees
– Each node in a binary tree has at most two
children
• General tree definition:
– Set of nodes T (possibly empty?) with a
distinguished node, the root
– All other nodes form a set of disjoint subtrees Ti
• each a tree in its own right
• each connected to the root with an edge
• Note the recursive definition
– Each node is the root of a subtree
11
Picture of Tree Definition
[Figure: root node r connected by edges to disjoint subtrees T1, T2, T3]
• And all subtrees are recursively defined as:
– a node with…
– subtrees attached to it
12
Tree Terminology
• A node’s parent
• A node’s children
– Binary tree: left child and right child
– Sibling nodes
– Descendants, ancestors
• A node’s degree (how many children)
• Leaf nodes or terminal nodes
• Internal or non-terminal nodes
13
Recursive Data Structure
Recursive Data Structure: a data structure
that contains a pointer or reference to an
instance of itself:
public class TreeNode<T> {
T nodeItem;
TreeNode<T> left, right;
TreeNode<T> parent;
…
}
• Recursion is a natural way to express many
algorithms.
• For recursive data-structures, recursive
algorithms are a natural choice
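To make the last point concrete, here is a minimal sketch of two recursive algorithms over a stripped-down version of the TreeNode class above. The parent field is omitted, and the helper names size() and height() are illustrative assumptions, not taken from the textbook.

```java
// Sketch only: a simplified TreeNode with two recursive helpers.
// size() and height() are hypothetical names chosen for illustration.
class TreeNode<T> {
    T nodeItem;
    TreeNode<T> left, right;

    TreeNode(T nodeItem) {
        this.nodeItem = nodeItem;
    }

    // Number of nodes in the subtree rooted at n; an empty subtree has 0.
    static <T> int size(TreeNode<T> n) {
        if (n == null) {
            return 0;
        }
        return 1 + size(n.left) + size(n.right);
    }

    // Number of edges on the longest path from n down to a leaf.
    // Convention assumed here: an empty subtree has height -1.
    static <T> int height(TreeNode<T> n) {
        if (n == null) {
            return -1;
        }
        return 1 + Math.max(height(n.left), height(n.right));
    }
}
```

Each method mirrors the data definition: a base case for the empty subtree, and a recursive case that combines the answers for the left and right subtrees.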
14
General Trees
• Representing general trees is a bit
harder
– Each node has a list of child nodes
• Turns out that:
– Binary trees are simpler and still quite
useful
• From now on, let’s focus on binary trees only
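For comparison, before leaving general trees behind, the "list of child nodes" representation mentioned above could be sketched as follows. The class name GeneralTreeNode and the methods addChild(), degree(), and countLeaves() are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: GeneralTreeNode and its methods are hypothetical names.
class GeneralTreeNode<T> {
    T data;
    List<GeneralTreeNode<T>> children = new ArrayList<>();

    GeneralTreeNode(T data) {
        this.data = data;
    }

    // Create a child holding childData, attach it, and return it.
    GeneralTreeNode<T> addChild(T childData) {
        GeneralTreeNode<T> child = new GeneralTreeNode<>(childData);
        children.add(child);
        return child;
    }

    // The degree of a node is its number of children.
    int degree() {
        return children.size();
    }

    // Count the leaves in the subtree rooted at this node.
    int countLeaves() {
        if (children.isEmpty()) {
            return 1;   // a node with no children is itself a leaf
        }
        int total = 0;
        for (GeneralTreeNode<T> child : children) {
            total += child.countLeaves();
        }
        return total;
    }
}
```

Compared with a binary node's two fixed references, every operation here needs a loop over the child list, which is part of why general trees are a bit harder to work with.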
15
ADT Tree
• Remember the definition of an ADT?
– Model of information: we just covered that
– Operations? See pages 405-406 in textbook
• Many are similar to ADT List or any data
structure
– The “CRUD” operations: create, read, update,
delete
• Important about this list of operations
– some are in terms of one specified node, e.g.
hasParent()
– others are “tree-wide”, e.g. size(), traversal
16
Classes for Binary Trees (pp. 416-431)
• class LinkedBinaryTree (p. 425, bottom)
– reference to root BinaryTreeNode
– methods: tree-level operations
• class BinaryTreeNode (p. 416)
– data: an object (of some type)
– left: references root of left-subtree (or null)
– right: references root of right-subtree (or null)
– parent: references this node’s parent node
• Could this be null? When should it be?
– methods: node-level operations
17
Two-class Strategy for Recursive
Data Structures
• Common design: use two classes for a Tree or List
• “Top” class
– has reference to “first” node
– other things that apply to the whole data-structure object
(e.g. the tree-object)
• both methods and fields
• Node class
– Recursive definitions are here as references to other node
objects
– Also data (of course)
– Methods defined in this class are recursive
18
Binary Tree and Node Class
• LinkedBinaryTree class has:
– reference to root node
– reference to a current node, a cursor
– non-recursive methods like:
boolean find(tgt) // see if tgt is in the whole tree
• Node class has:
– data, references to left and right subtrees
– recursive versions of methods like find:
boolean find(tgt) // is tgt here or in my subtrees?
• Note: BinaryTree.find() just calls Node.find()
on the root node!
– Other methods work this way too
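A minimal sketch of this delegation pattern is shown below. The names SketchTree and SketchNode are simplified stand-ins, not the textbook's LinkedBinaryTree and BinaryTreeNode code, and the cursor is left out. Since the tree here is an ordinary binary tree (no ordering of values is assumed), the node-level find() may have to look in both subtrees.

```java
// Sketch only: simplified stand-ins for the two-class design.
class SketchNode<T> {
    T data;
    SketchNode<T> left, right;

    SketchNode(T data) {
        this.data = data;
    }

    // Recursive: is tgt here or in one of my subtrees?
    boolean find(T tgt) {
        if (data.equals(tgt)) {
            return true;
        }
        if (left != null && left.find(tgt)) {
            return true;
        }
        return right != null && right.find(tgt);
    }
}

class SketchTree<T> {
    SketchNode<T> root;   // null when the tree is empty

    // Non-recursive: handle the empty tree, then delegate to the root node.
    boolean find(T tgt) {
        return root != null && root.find(tgt);
    }
}
```

The tree-level method only has to deal with the one case the node class cannot express, the empty tree; everything else is handed to the root node.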
19
Why Does This Matter Now?
• This illustrates (again) important design ideas
• The tree itself is what we’re interested in
– There are tree-level operations on it (“ADT level”
operations)
• The implementation is a recursive data
structure
– There are recursive methods inside the lower-level
classes that are closely related (same name!) to
the ADT-level operation
• Principles? abstraction (hiding details),
delegation (helper classes, methods)
20
ADT Tree Operations: “Navigation”
• Positioning:
– toRoot(), toParent(), toLeftChild(),
toRightChild(), find(Object o)
• Checking:
– hasParent(), hasLeftChild(), etc.
– equals(Object tree2)
• Book calls this a “deep compare”
• Do two distinct objects have the same structure
and contents?
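A deep compare can be sketched recursively: two trees are equal if both are empty, or if their roots hold equal data and their left and right subtrees are equal in turn. The class EqNode and the method sameTree() below are hypothetical names for illustration; the book's equals(Object tree2) may be organized differently.

```java
import java.util.Objects;

// Sketch only: EqNode and sameTree are hypothetical names.
class EqNode {
    Object data;
    EqNode left, right;

    EqNode(Object data, EqNode left, EqNode right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }

    // Deep compare of the subtrees rooted at a and b.
    static boolean sameTree(EqNode a, EqNode b) {
        if (a == null || b == null) {
            return a == b;   // true only when both subtrees are empty
        }
        return Objects.equals(a.data, b.data)
                && sameTree(a.left, b.left)
                && sameTree(a.right, b.right);
    }
}
```

Note that two trees with the same contents but a different shape (for example, a left child versus a right child) are not equal under this definition.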
21
ADT Tree Operations: Mutators
• Mutators:
– insertRight(Object o), insertLeft(Object o)
• create a new node containing new data
• make this new node be the child of the current
node
• Important: We use these to build trees!
– prune()
• delete the subtree rooted by the current node
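The sketch below shows how cursor-based mutators might build and trim a tree. CursorTree is a hypothetical class, not the textbook's code, and several behaviors are assumptions made for this example: inserting into an empty tree creates the root, inserting where a child already exists throws an exception, and prune() moves the cursor to the parent.

```java
// Sketch only: CursorTree is a hypothetical stand-in for the textbook's class.
class CursorTree<T> {
    private static class Node<E> {
        E data;
        Node<E> left, right, parent;

        Node(E data, Node<E> parent) {
            this.data = data;
            this.parent = parent;
        }
    }

    private Node<T> root;      // null when the tree is empty
    private Node<T> current;   // the cursor

    // Positioning: returns false (cursor unchanged) if there is no such child.
    boolean toLeftChild() {
        if (current == null || current.left == null) return false;
        current = current.left;
        return true;
    }

    boolean toRightChild() {
        if (current == null || current.right == null) return false;
        current = current.right;
        return true;
    }

    // Assumption: inserting into an empty tree creates the root.
    void insertLeft(T item) {
        if (root == null) {
            root = new Node<>(item, null);
            current = root;
        } else if (current.left != null) {
            throw new IllegalStateException("left child already exists");
        } else {
            current.left = new Node<>(item, current);
        }
    }

    void insertRight(T item) {
        if (root == null) {
            root = new Node<>(item, null);
            current = root;
        } else if (current.right != null) {
            throw new IllegalStateException("right child already exists");
        } else {
            current.right = new Node<>(item, current);
        }
    }

    // Delete the subtree rooted at the cursor; the cursor moves to the parent.
    void prune() {
        if (current == null) return;
        Node<T> p = current.parent;
        if (p == null) root = null;
        else if (p.left == current) p.left = null;
        else p.right = null;
        current = p;
    }

    int size() {
        return size(root);
    }

    private static <E> int size(Node<E> n) {
        return n == null ? 0 : 1 + size(n.left) + size(n.right);
    }
}
```

Because prune() detaches a whole subtree by clearing one reference in the parent, it removes every descendant of the current node at once.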
22
Next: Implementation
• Next (in the book)
– How to implement Java classes for binary trees
– Class for node, another class for BinTree
– Interface for both, then two implementations
(array and reference)
• But for us:
– We’ll skip some of this, particularly the array
version
– We’ll only look at the reference-based implementation
– After that: concept of a binary search tree
23
Understanding Implementations
• Let’s review some of the methods on
pp. 416-431
– (Done in class, looking at code in book.)
• Some topics discussed:
– Node class. Parent reference or not?
– Are two trees equal?
– Traversal strategies: Section 7.3.2 in book
– visit() method and callback (also 7.3.2)
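As a rough illustration of the traversal and callback ideas, the sketch below implements the three standard depth-first traversals and passes the "visit" action in as a java.util.function.Consumer. The class name TravNode is an assumption, and the book's visit() mechanism in Section 7.3.2 may be structured differently.

```java
import java.util.function.Consumer;

// Sketch only: TravNode is a hypothetical name; the callback is a Consumer.
class TravNode<T> {
    T data;
    TravNode<T> left, right;

    TravNode(T data, TravNode<T> left, TravNode<T> right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }

    // Pre-order: visit this node, then the left subtree, then the right.
    void preOrder(Consumer<T> visit) {
        visit.accept(data);
        if (left != null) left.preOrder(visit);
        if (right != null) right.preOrder(visit);
    }

    // In-order: left subtree, then this node, then the right subtree.
    void inOrder(Consumer<T> visit) {
        if (left != null) left.inOrder(visit);
        visit.accept(data);
        if (right != null) right.inOrder(visit);
    }

    // Post-order: left subtree, then the right subtree, then this node.
    void postOrder(Consumer<T> visit) {
        if (left != null) left.postOrder(visit);
        if (right != null) right.postOrder(visit);
        visit.accept(data);
    }
}
```

The three methods differ only in where the call to visit.accept(data) sits relative to the two recursive calls. The caller decides what "visiting" means, e.g. root.inOrder(x -> System.out.println(x)).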
24
Binary Search Trees
• We often need collections that store items
– Maybe a long series of inserts or deletions
• We want fast lookup, and often we want to
access in sorted order
– Lists: O(n) lookup
– Could sort them for O(lg n) lookup
• Cost to sort is O(n lg n) and we might need to re-sort
often as we insert, remove items
• Solution: search tree
25
Binary Search Trees
• Associated with each node is a key
value that can be compared.
• Binary search tree property:
– every node in the left subtree has key
whose value is less than the value of
the root’s key value, and
– every node in the right subtree has
key whose value is greater than the
value of the root’s key value.
26
Example
[Figure: a BINARY SEARCH TREE with keys 5, 4, 1, 8, 7, 11, 3]
27
Counterexample
[Figure: a binary tree with keys 8, 5, 2, 7, 4, 11, 6, 10, 18, 15, 20, 21 that is NOT A BINARY SEARCH TREE]
28
Find and Insert in BST
• Find: look for where it should be
• If not there, that’s where you insert
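The two bullets above can be sketched as a recursive insert: walk down exactly as a search would, and attach a new leaf at the empty spot where the search fails. The class name BstNode is an assumption, and rejecting duplicate keys (returning false) is a design choice made for this sketch.

```java
// Sketch only: BstNode is a hypothetical name; duplicates are rejected
// by assumption.
class BstNode<K extends Comparable<K>> {
    K key;
    BstNode<K> left, right;

    BstNode(K key) {
        this.key = key;
    }

    // Returns true if newKey was added, false if it was already present.
    boolean insert(K newKey) {
        int cmp = newKey.compareTo(key);
        if (cmp == 0) {
            return false;                       // already in the tree
        }
        if (cmp < 0) {
            if (left == null) {                 // search failed here: insert
                left = new BstNode<>(newKey);
                return true;
            }
            return left.insert(newKey);         // keep looking on the left
        }
        if (right == null) {                    // search failed here: insert
            right = new BstNode<>(newKey);
            return true;
        }
        return right.insert(newKey);            // keep looking on the right
    }
}
```

New keys always become leaves, so the shape of the tree depends on the order in which keys are inserted.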
29
Recursion and Tree Operations
• Recursive code for tree operations is simple,
natural, elegant
• Example: a Java sketch of Node.find()
boolean find(T tgt) {  // assumes Node<T extends Comparable<T>>
    int cmp = tgt.compareTo(this.data);
    if (cmp == 0)
        return true;               // this.data matches tgt
    Node<T> next;
    if (cmp < 0)
        next = this.leftChild;     // tgt < this.data
    else
        next = this.rightChild;    // tgt > this.data
    // next refers to the left or right subtree
    if (next == null)
        return false;              // no subtree
    return next.find(tgt);         // search on
}
30
Order in BSTs
• How could we traverse a BST so that
the nodes are visited in sorted order?
– Does one of our traversal strategies work?
• A very useful property about BSTs
• Consider Java’s TreeSet and TreeMap
– A search tree (not a plain BST, but one of its
better “cousins”)
• In CS2150: AVL trees, Red-Black trees
– Guarantee: search times are O(lg n)
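An in-order traversal (left subtree, node, right subtree) visits the keys of a BST in sorted order. Java's TreeSet exposes the same property through its iterator, which returns elements in ascending order. The short demo below relies only on that documented behavior; the class and method names SortedOrderDemo and sortedCopy are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Sketch only: SortedOrderDemo and sortedCopy are hypothetical names.
class SortedOrderDemo {
    // Add the keys in the order given, then read them back by iterating
    // over the set; TreeSet iterates in ascending order.
    static List<Integer> sortedCopy(List<Integer> keys) {
        TreeSet<Integer> set = new TreeSet<>(keys);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // Keys arrive unsorted but are printed in ascending order.
        System.out.println(sortedCopy(Arrays.asList(5, 4, 8, 1, 7, 11, 3)));
    }
}
```

Since a TreeSet is a set, duplicate keys in the input would appear only once in the result.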
31
Deleting from a BST
• Removing a node requires
– Moving its left and right subtrees
– Not hard if one is not there
– But if both there?
• Answer: not too tough, but wait for
CS2150 to see!
• In CS2110, we’ll not worry about this
32
Next: Grammars, Trees,
Recursion
• Languages are governed by a set of rules
called a grammar
– Is a statement legal?
– Generate or derive a new legal statement
• Natural language grammars
– Language processing by computers
• But grammars are also used a lot in computing
– Grammar for a programming language
– Grammar for valid inputs, messages, data, etc.
33
Backus-Naur Form
• http://en.wikipedia.org/wiki/Backus-Naur_form
• BNF is a widely-used notation for describing
the grammar or formal syntax of
programming languages or data
• BNF specifies a grammar as a set of
derivation rules of this form:
<symbol> ::= <expression with symbols>
• Look at website and example there (also on
next slide)
– How are trees involved here? Is it recursive?
34
BNF for Postal Address
1. <postal-address> ::= <name-part> <street-address> <zip-part>
2. <personal-part> ::= <first-name> | <initial> "."
3. <name-part> ::= <personal-part> <last-name> [<jr-part>] <EOL> | <personal-part> <name-part>
4. <street-address> ::= [<apt>] <house-num> <street-name> <EOL>
5. <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>
Example:
Ann Marie G. Jones
123 Main St.
Hooville, VA 22901
Where’s the recursion?
35
Grammars in Language
• Rule-based grammars describe
– how legal statements can be produced
– how to tell if a statement is legal
• Study textbook, pp. 389-391, to see a rule-based grammar for simple Java-like
arithmetic expressions
– four rules for expressions, terms, factors, and
letter
– Study how a (possibly) legal statement is parsed
to generate a parse tree
36
Computing Parse-Tree Example
• Expression: a * b + c
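To see how a tree can be computed from this expression, here is a small recursive-descent sketch. The grammar it uses is an assumption, since the textbook's four rules are not reproduced here: an expression is one or more terms joined by "+", a term is one or more factors joined by "*", and a factor is a single letter. The sketch also builds a simplified tree with operators at the internal nodes, rather than a full parse tree with one node per grammar rule. All class and method names are hypothetical.

```java
// Sketch only: assumed grammar
//   <expression> ::= <term> { "+" <term> }
//   <term>       ::= <factor> { "*" <factor> }
//   <factor>     ::= <letter>
class ExprParser {
    static class Node {
        String label;
        Node left, right;

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        // Leaves print as their letter; operators print fully parenthesized.
        @Override
        public String toString() {
            return left == null ? label : "(" + left + " " + label + " " + right + ")";
        }
    }

    private final String src;   // input with spaces removed
    private int pos = 0;        // index of the next unread character

    private ExprParser(String src) {
        this.src = src.replace(" ", "");
    }

    static Node parse(String text) {
        ExprParser p = new ExprParser(text);
        Node tree = p.expression();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return tree;
    }

    // Consume c if it is the next character.
    private boolean accept(char c) {
        if (pos < src.length() && src.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private Node expression() {
        Node n = term();
        while (accept('+')) {
            n = new Node("+", n, term());
        }
        return n;
    }

    private Node term() {
        Node n = factor();
        while (accept('*')) {
            n = new Node("*", n, factor());
        }
        return n;
    }

    private Node factor() {
        if (pos < src.length() && Character.isLetter(src.charAt(pos))) {
            return new Node(String.valueOf(src.charAt(pos++)), null, null);
        }
        throw new IllegalArgumentException("letter expected at position " + pos);
    }
}
```

For example, ExprParser.parse("a * b + c").toString() gives ((a * b) + c): the multiplication ends up deeper in the tree because term() is called from inside expression(), which is how the grammar encodes operator precedence.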
37
Grammar Terms and Concepts
• First, this is what’s called a context-free
grammar
– For CS2110, let’s not worry about what
this means! (But in CS2102, you learn
this.)
• A CFG has
– a set of variables (AKA non-terminals)
– a set of terminal symbols
– a set of productions
– a starting symbol
38
Previous Parse Tree
• Terminal symbols:
– <operator> could be: + *
– <letter> could be: a b c
• Production: <factor> → <letter> | <number>
39
Natural Language Parse Tree
• Statement: The man bit the dog
40
How Can We Use Grammars?
• Parsing
– Is a given statement a valid statement in the
language? (Is the statement recognized by the
grammar?)
– Note this is what the Java compiler does as a first
step toward creating an executable form of your
program. (Find errors, or build executable.)
• Production
– Generate a legal statement for this grammar
– Demo: generate random statements!
• See link on website next to slides
41
Demo’s Poem-grammar data file
{
<start>
The <object> <verb> tonight
}
{
<object>
waves
big yellow flowers
slugs
}
{
<verb>
sigh <adverb>
portend like <object>
die <adverb>
}
{
<adverb>
warily
grumpily
}
• Note: no recursive productions in this example!
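The production idea can be sketched in a few lines: start from <start>, and replace each non-terminal with a randomly chosen production until only words remain. This is not the course demo's code. The rules are hard-coded here rather than read from the data file, and the names PoemGrammar and expand() are assumptions for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch only: the poem grammar above, hard-coded as a map from each
// non-terminal to its list of productions.
class PoemGrammar {
    static final Map<String, List<String>> RULES = new HashMap<>();
    static {
        RULES.put("<start>", Arrays.asList("The <object> <verb> tonight"));
        RULES.put("<object>", Arrays.asList("waves", "big yellow flowers", "slugs"));
        RULES.put("<verb>", Arrays.asList("sigh <adverb>", "portend like <object>", "die <adverb>"));
        RULES.put("<adverb>", Arrays.asList("warily", "grumpily"));
    }

    // A terminal word is returned unchanged; a non-terminal is replaced by
    // a randomly chosen production, which is then expanded word by word.
    static String expand(String symbol, Random rng) {
        List<String> productions = RULES.get(symbol);
        if (productions == null) {
            return symbol;   // not a key in RULES, so it is a terminal
        }
        String choice = productions.get(rng.nextInt(productions.size()));
        StringBuilder out = new StringBuilder();
        for (String word : choice.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(expand(word, rng));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("<start>", new Random()));
    }
}
```

The call tree of expand() has the same shape as the parse tree of the sentence it generates. With a recursive production, the same code would still work, but termination would then depend on the random choices.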
42