Data Structures: Trees and Grammars
Readings: Sections 6.1, 7.1-7.4 (more from Ch. 6 later)

Goals for this Unit
• Continue focus on data structures and algorithms
• Understand concepts of reference-based data structures (e.g. linked lists, binary trees)
  – Some implementation for binary trees
• Understand the usefulness of trees and hierarchies as data models
  – Recursion used to define data organization

Taxonomy of Data Structures
• From the text:
  – Data type: collection of values and operations
    • Compare to Abstract Data Type!
  – Simple data types vs. composite data types
• Book: Data structures are composite data types
  – Definition: a collection of elements that are some combination of primitive and other composite data types

Book’s Classification of Data Structures
• Four groupings:
  – Linear Data Structures
  – Hierarchical
  – Graph
  – Sets and Tables
• When defining these, note an element has:
  – one or more information fields
  – relationships with other elements

Note on Our Book and Our Course
• Our book’s strategy
  – In Ch. 6, discuss principles of Lists
    • Give an interface, then implement from scratch
  – In Ch. 7, discuss principles of Trees
  – Later, in Ch. 9, see what Java gives us
• Our course’s strategy
  – We did Ch. 9 first. Saw List interfaces and operations
  – Then, Ch. 8 on maps and sets
  – Now, trees with some implementation too

Trees Represent…
• The concept of a tree is very common and important.
• Tree terminology:
  – Each node has one parent (except the root)
  – A node’s children
    • Leaf nodes: no children
    • Root node: top or start; no parent
• Data structures that store trees
• Execution or processing that can be expressed as a tree
  – E.g. method calls as a program runs
  – Searching a maze or puzzle

Trees are Important
• Trees are important for cognition and computation
  – computer science
  – language processing (human or computer)
    • parse trees
  – knowledge representation (or modeling of the “real world”)
    • E.g. family trees; the Linnaean taxonomy (kingdom, phylum, …, species); etc.

Another Tree Example: File System
[Figure: directory tree rooted at C:\ containing MyMail, CS120, CS216, school, pers, lab1, lab2, lab3, list.h, list.cpp, calc.cpp]
• What about file links (Unix) or shortcuts (Windows)?

Another Tree Example: XML and HTML documents

    <HTML>
      <HEAD>…</HEAD>
      <BODY>
        <H1>My Page</H1>
        <P> Blah <PRE>blah blah</PRE> End </P>
      </BODY>
    </HTML>

How is this a tree? What are the leaves?

Tree Data Structures
• Why this now?
  – Very useful in coding
    • TreeMap in Java Collections Framework
  – Example of recursive data structures
  – Methods are recursive algorithms

Tree Definitions and Terms
• First, general trees vs. binary trees
  – Each node in a binary tree has at most two children
• General tree definition:
  – Set of nodes T (possibly empty?) with a distinguished node, the root
  – All other nodes form a set of disjoint subtrees Ti
    • each a tree in its own right
    • each connected to the root with an edge
• Note the recursive definition
  – Each node is the root of a subtree

Picture of Tree Definition
[Figure: root r with subtrees T1, T2, T3]
• And all subtrees are recursively defined as:
  – a node with…
  – subtrees attached to it

Tree Terminology
• A node’s parent
• A node’s children
  – Binary tree: left child and right child
  – Sibling nodes
  – Descendants, ancestors
• A node’s degree (how many children)
• Leaf nodes or terminal nodes
• Internal or non-terminal nodes

Recursive Data Structure
• Recursive data structure: a data structure that contains a pointer or reference to an instance of itself:

    public class TreeNode<T> {
        T nodeItem;
        TreeNode<T> left, right;
        TreeNode<T> parent;
        …
    }

• Recursion is a natural way to express many algorithms.
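As a sketch of how a recursive algorithm follows the shape of a recursive data structure, the block below counts nodes and measures height on a stripped-down node class. The names `SimpleNode`, `size`, and `height` are illustrative assumptions, not the textbook's classes.

```java
// Hypothetical minimal node class for illustration only
// (not the textbook's TreeNode or BinaryTreeNode).
class SimpleNode<T> {
    T item;
    SimpleNode<T> left, right;

    SimpleNode(T item, SimpleNode<T> left, SimpleNode<T> right) {
        this.item = item;
        this.left = left;
        this.right = right;
    }

    // Number of nodes in the subtree rooted at n.
    // Base case: an empty subtree (null) has size 0.
    static <T> int size(SimpleNode<T> n) {
        if (n == null) return 0;
        return 1 + size(n.left) + size(n.right);
    }

    // Height counted in nodes: an empty subtree has height 0, a leaf has height 1.
    static <T> int height(SimpleNode<T> n) {
        if (n == null) return 0;
        return 1 + Math.max(height(n.left), height(n.right));
    }
}
```

Each method handles the empty subtree as its base case and then combines the answers from the left and right subtrees, which mirrors the recursive definition of a tree.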
• For recursive data structures, recursive algorithms are a natural choice

General Trees
• Representing general trees is a bit harder
  – Each node has a list of child nodes
• Turns out that:
  – Binary trees are simpler and still quite useful
• From now on, let’s focus on binary trees only

ADT Tree
• Remember the definition of an ADT?
  – Model of information: we just covered that
  – Operations? See pages 405-406 in textbook
• Many are similar to ADT List or any data structure
  – The “CRUD” operations: create, read, update, delete
• Important about this list of operations
  – some are in terms of one specified node, e.g. hasParent()
  – others are “tree-wide”, e.g. size(), traversal

Classes for Binary Trees (pp. 416-431)
• class LinkedBinaryTree (p. 425, bottom)
  – reference to root BinaryTreeNode
  – methods: tree-level operations
• class BinaryTreeNode (p. 416)
  – data: an object (of some type)
  – left: references root of left subtree (or null)
  – right: references root of right subtree (or null)
  – parent: references this node’s parent node
    • Could this be null? When should it be?
  – methods: node-level operations

Two-class Strategy for Recursive Data Structures
• Common design: use two classes for a Tree or List
• “Top” class
  – has reference to “first” node
  – other things that apply to the whole data-structure object (e.g. the tree object)
    • both methods and fields
• Node class
  – Recursive definitions are here as references to other node objects
  – Also data (of course)
  – Methods defined in this class are recursive

Binary Tree and Node Class
• LinkedBinaryTree class has:
  – reference to root node
  – reference to a current node, a cursor
  – non-recursive methods like:
        boolean find(tgt) // see if tgt is in the whole tree
• Node class has:
  – data, references to left and right subtrees
  – recursive versions of methods like find:
        boolean find(tgt) // is tgt here or in my subtrees?
• Note: BinaryTree.find() just calls Node.find() on the root node!
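A minimal sketch of this delegation, assuming two hypothetical classes `DemoTree` and `DemoNode` (the textbook's LinkedBinaryTree and BinaryTreeNode have more fields and methods than shown here). The tree-level `find` does no recursion itself; the node-level `find` with the same name does the recursive work.

```java
// Hypothetical node class: holds data and the recursive method.
class DemoNode<T> {
    T data;
    DemoNode<T> left, right;

    DemoNode(T data) { this.data = data; }

    // Recursive: is tgt here or in one of my subtrees?
    // This searches an ordinary binary tree, so both subtrees may need checking.
    boolean find(T tgt) {
        if (data.equals(tgt)) return true;
        if (left != null && left.find(tgt)) return true;
        return right != null && right.find(tgt);
    }
}

// Hypothetical "top" class: holds the root and the tree-level operations.
class DemoTree<T> {
    DemoNode<T> root;  // null means the tree is empty

    // Non-recursive: handle the empty tree, then delegate to the root node.
    boolean find(T tgt) {
        return root != null && root.find(tgt);
    }
}
```

The caller only sees `DemoTree.find`, so the empty-tree check and the node structure stay hidden behind the tree-level operation.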
• Other methods work this way too

Why Does This Matter Now?
• This illustrates (again) important design ideas
• The tree itself is what we’re interested in
  – There are tree-level operations on it (“ADT level” operations)
• The implementation is a recursive data structure
  – There are recursive methods inside the lower-level classes that are closely related (same name!) to the ADT-level operation
• Principles? abstraction (hiding details), delegation (helper classes, methods)

ADT Tree Operations: “Navigation”
• Positioning:
  – toRoot(), toParent(), toLeftChild(), toRightChild(), find(Object o)
• Checking:
  – hasParent(), hasLeftChild(), etc.
  – equals(Object tree2)
    • Book calls this a “deep compare”
    • Do two distinct objects have the same structure and contents?

ADT Tree Operations: Mutators
• Mutators:
  – insertRight(Object o), insertLeft(Object o)
    • create a new node containing new data
    • make this new node be the child of the current node
    • Important: We use these to build trees!
  – prune()
    • delete the subtree rooted at the current node

Next: Implementation
• Next (in the book)
  – How to implement Java classes for binary trees
  – Class for node, another class for BinTree
  – Interface for both, then two implementations (array and reference)
• But for us:
  – We’ll skip some of this, particularly the array version
  – We’ll only look at the reference-based implementation
  – After that: concept of a binary search tree

Understanding Implementations
• Let’s review some of the methods on pp. 416-431
  – (Done in class, looking at code in book.)
• Some topics discussed:
  – Node class. Parent reference or not?
  – Are two trees equal?
  – Traversal strategies: Section 7.3.2 in book
  – visit() method and callback (also 7.3.2)

Binary Search Trees
• We often need collections that store items
  – Maybe a long series of inserts or deletions
• We want fast lookup, and often we want to access in sorted order
  – Lists: O(n) lookup
  – Could sort them for O(lg n) lookup
    • Cost to sort is O(n lg n) and we might need to re-sort often as we insert, remove items
• Solution: search tree

Binary Search Trees
• Associated with each node is a key value that can be compared.
• Binary search tree property:
  – every node in the left subtree has a key whose value is less than the value of the root’s key, and
  – every node in the right subtree has a key whose value is greater than the value of the root’s key.

Example
[Figure: a binary search tree with keys 5, 4, 1, 8, 7, 11, 3]

Counterexample
[Figure: a binary tree with keys 8, 5, 2, 7, 4, 11, 6, 10, 18, 15, 20, 21 that is not a binary search tree]

Find and Insert in BST
• Find: look for where it should be
• If not there, that’s where you insert

Recursion and Tree Operations
• Recursive code for tree operations is simple, natural, elegant
• Example: pseudo-code for Node.find()

    boolean find(Comparable tgt) {
        Node next = null;
        if (this.data matches tgt)
            return true
        else if (tgt’s data < this.data)
            next = this.leftChild
        else // tgt’s data > this.data
            next = this.rightChild
        // next points to left or right subtree
        if (next == null)
            return false // no subtree
        else
            return next.find(tgt) // search on
    }

Order in BSTs
• How could we traverse a BST so that the nodes are visited in sorted order?
  – Does one of our traversal strategies work?
• A very useful property about BSTs
• Consider Java’s TreeSet and TreeMap
  – A search tree (not a plain BST, but one of its better “cousins”)
• In CS2150: AVL trees, Red-Black trees
  – Guarantee: search times are O(lg n)

Deleting from a BST
• Removing a node requires
  – Moving its left and right subtrees
  – Not hard if one is not there
  – But if both are there?
• Answer: not too tough, but wait for CS2150 to see!
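Looking back at find, insert, and sorted order in a BST: the sketch below inserts a key at the spot where a search for it fails, and uses an in-order traversal (left subtree, node, right subtree) to visit keys in ascending order. `BstNode` is a hypothetical class with int keys chosen to keep the example short; it is not the textbook's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical BST node with int keys, for illustration only.
class BstNode {
    int key;
    BstNode left, right;

    BstNode(int key) { this.key = key; }

    // Find-then-insert: walk down as find() would; where the search
    // runs off the tree is where the new node belongs.
    // Returns false if the key is already present (no duplicates stored).
    boolean insert(int k) {
        if (k == key) return false;
        if (k < key) {
            if (left == null) { left = new BstNode(k); return true; }
            return left.insert(k);
        }
        if (right == null) { right = new BstNode(k); return true; }
        return right.insert(k);
    }

    // In-order traversal: left subtree, then this node, then right subtree.
    // Because of the BST property, keys are appended in ascending order.
    void inOrder(List<Integer> out) {
        if (left != null) left.inOrder(out);
        out.add(key);
        if (right != null) right.inOrder(out);
    }
}
```

For example, starting from a root with key 5 and inserting 4, 1, 8, 7, 11, 3 in that order, `inOrder` fills the list with 1, 3, 4, 5, 7, 8, 11.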
• In CS2110, we won’t worry about deletion

Next: Grammars, Trees, Recursion
• Languages are governed by a set of rules called a grammar
  – Is a statement legal?
  – Generate or derive a new legal statement
• Natural language grammars
  – Language processing by computers
• But grammars are used a lot in computing
  – Grammar for a programming language
  – Grammar for valid inputs, messages, data, etc.

Backus-Naur Form
• http://en.wikipedia.org/wiki/Backus-Naur_form
• BNF is a widely-used notation for describing the grammar or formal syntax of programming languages or data
• BNF specifies a grammar as a set of derivation rules of this form:
    <symbol> ::= <expression with symbols>
• Look at website and example there (also on next slide)
  – How are trees involved here? Is it recursive?

BNF for Postal Address
1. <postal-address> ::= <name-part> <street-address> <zip-part>
2. <personal-part> ::= <first-name> | <initial> "."
3. <name-part> ::= <personal-part> <last-name> [<jr-part>] <EOL> | <personal-part> <name-part>
4. <street-address> ::= [<apt>] <house-num> <street-name> <EOL>
5. <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>

Example:
    Ann Marie G. Jones
    123 Main St.
    Hooville, VA 22901
Where’s the recursion?

Grammars in Language
• Rule-based grammars describe
  – how legal statements can be produced
  – how to tell if a statement is legal
• Study textbook, pp. 389-391, to see a rule-based grammar for simple Java-like arithmetic expressions
  – four rules for expressions, terms, factors, and letter
  – Study how a (possibly) legal statement is parsed to generate a parse tree

Computing Parse-Tree Example
• Expression: a * b + c

Grammar Terms and Concepts
• First, this is what’s called a context-free grammar
  – For CS2110, let’s not worry about what this means! (But in CS2102, you learn this.)
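To make parsing concrete, here is a sketch of a recursive-descent parser for expressions such as a * b + c. The grammar is an assumption made for this example, since the textbook's four rules are not reproduced in these slides: an expression is terms joined by +, a term is factors joined by *, and a factor is a single letter. The class `ExprParser` and its output format (a fully parenthesized string standing in for the parse tree) are likewise hypothetical.

```java
// Hypothetical recursive-descent parser; one method per grammar rule.
// Assumed grammar (not taken from the textbook):
//   <expression> ::= <term>   { "+" <term> }
//   <term>       ::= <factor> { "*" <factor> }
//   <factor>     ::= <letter>
class ExprParser {
    private final String src;  // input with spaces removed
    private int pos = 0;       // index of the next unread character

    private ExprParser(String text) { this.src = text.replace(" ", ""); }

    // Returns the tree as a parenthesized string, or throws if the
    // statement is not legal under the assumed grammar.
    static String parse(String text) {
        ExprParser p = new ExprParser(text);
        String tree = p.expression();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return tree;
    }

    private String expression() {
        String left = term();
        while (pos < src.length() && src.charAt(pos) == '+') {
            pos++;  // consume '+'
            left = "(" + left + " + " + term() + ")";
        }
        return left;
    }

    private String term() {
        String left = factor();
        while (pos < src.length() && src.charAt(pos) == '*') {
            pos++;  // consume '*'
            left = "(" + left + " * " + factor() + ")";
        }
        return left;
    }

    private String factor() {
        if (pos < src.length() && Character.isLetter(src.charAt(pos))) {
            return String.valueOf(src.charAt(pos++));
        }
        throw new IllegalArgumentException("letter expected at position " + pos);
    }
}
```

Under these assumptions, `ExprParser.parse("a * b + c")` returns `((a * b) + c)`: the multiplication sits lower in the tree than the addition because the term rule is nested inside the expression rule.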
• A CFG has
  – a set of variables (AKA non-terminals)
  – a set of terminal symbols
  – a set of productions
  – a starting symbol

Previous Parse Tree
• Terminal symbols:
  – <operator> could be: + *
  – <letter> could be: a b c
• Production: <factor> ::= <letter> | <number>

Natural Language Parse Tree
• Statement: The man bit the dog

How Can We Use Grammars?
• Parsing
  – Is a given statement a valid statement in the language? (Is the statement recognized by the grammar?)
  – Note this is what the Java compiler does as a first step toward creating an executable form of your program. (Find errors, or build executable.)
• Production
  – Generate a legal statement for this grammar
  – Demo: generate random statements!
    • See link on website next to slides

Demo’s Poem-grammar data file

    {
    <start>
    The <object> <verb> tonight
    }
    {
    <object>
    waves
    big yellow flowers
    slugs
    }
    {
    <verb>
    sigh <adverb>
    portend like <object>
    die <adverb>
    }
    {
    <adverb>
    warily
    grumpily
    }

• Note: no recursive productions in this example!
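Production from a grammar can be sketched as a short recursive expansion. The block below hard-codes the poem grammar instead of reading the demo's data file, and it assumes the alternatives are grouped as "waves", "big yellow flowers", "slugs" and so on. The class `PoemGrammar` and its `expand` method are hypothetical names, not the demo's actual code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical random-statement generator for the poem grammar.
class PoemGrammar {
    // Maps each non-terminal to its list of alternative productions.
    private final Map<String, List<String>> rules = new HashMap<>();
    private final Random rng;

    PoemGrammar(long seed) {
        rng = new Random(seed);
        // Grouping of alternatives is an assumption about the data file.
        rules.put("<start>", Arrays.asList("The <object> <verb> tonight"));
        rules.put("<object>", Arrays.asList("waves", "big yellow flowers", "slugs"));
        rules.put("<verb>", Arrays.asList("sigh <adverb>", "portend like <object>", "die <adverb>"));
        rules.put("<adverb>", Arrays.asList("warily", "grumpily"));
    }

    // Expands a symbol: a terminal is returned unchanged; for a non-terminal,
    // pick one production at random and expand each of its tokens in turn.
    String expand(String symbol) {
        List<String> options = rules.get(symbol);
        if (options == null) return symbol;  // terminal word
        String choice = options.get(rng.nextInt(options.size()));
        StringBuilder sb = new StringBuilder();
        for (String token : choice.split(" ")) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(expand(token));
        }
        return sb.toString();
    }
}
```

Calling `new PoemGrammar(seed).expand("<start>")` yields a sentence that begins with "The" and ends with "tonight". The sequence of `expand` calls traces out a parse tree for that sentence, and because this grammar has no recursive productions the expansion always terminates.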