Data Structures and Collections
• Principles revisited
• .NET:
  – Two libraries:
    • System.Collections
    • System.Collections.Generic

FEN 2012, UCN Technology - Computer Science

Data Structures and Collections
• Choose and use an ADT, e.g. Map
• Read and write (use) specifications
• Choose and use a data structure, e.g. TreeMap
• Know about data structures and algorithms
[Diagram: the application uses the ADT only through its interface and specification (e.g. Map); a concrete class (e.g. HashMap or TreeMap) supplies the data structure:
    class Appl {
        Map m;
        ...
        m = new XXXMap();
    }]

Collections Library System.Collections
• Data structures in .NET are normally called collections
• They are found in the namespace System.Collections
• Compiled into the mscorlib.dll assembly
• Uses object references and polymorphism to make the containers general
• Deprecated: prefer the generic collections in new code
• Classes:
  – Array (defined in the namespace System)
  – ArrayList
  – Hashtable
  – Stack
  – Queue

Collection Interfaces
• System.Collections implements a range of interfaces in order to provide standard usage of the different containers
  – Classes that implement the same interface provide the same services
  – Makes it easier to learn and to use the library
  – Makes it possible to write generic code against the interface
• Interfaces:
  – ICollection
  – IEnumerable
  – IEnumerator
  – IList
  – IComparer
  – IComparable

ArrayList
• ArrayList stores sequences of elements.
  – Duplicate values are OK
  – Position- (index-) based
  – Elements are stored in a resizable array
  – Implements the IList interface

public class ArrayList : IList, IEnumerable, ...
{
    // IList services
    ...
    // additional services

    // control of memory in the underlying array
    int Capacity { get... set... }
    void TrimToSize()

    int BinarySearch(object value)
    int IndexOf(object value, int startIndex)
    int LastIndexOf(object value, int startIndex)
    ...
    // BinarySearch, IndexOf and LastIndexOf: searching
}

IList Interface
• IList defines sequences of elements
  – Access through index

public interface IList : ICollection
{
    // add new elements
    int Add(object value);
    void Insert(int index, object value);

    // remove
    void Remove(object value);
    void RemoveAt(int index);
    void Clear();

    // containment testing
    bool Contains(object value);
    int IndexOf(object value);

    // read/write existing element
    object this[int index] { get; set; }

    // structural properties
    bool IsReadOnly { get; }
    bool IsFixedSize { get; }
}

Hashtable
• Hashtable supports collections of key/value pairs
  – Keys must be unique; values can hold any data
  – Stores object references for both key and value
  – The GetHashCode method of the key determines the position in the table

    // create
    Hashtable ages = new Hashtable();

    // add
    ages["Ann"] = 27;
    ages["Bob"] = 32;
    ages.Add("Tom", 15);

    // update
    ages["Ann"] = 28;

    // retrieve
    int a = (int) ages["Ann"];

Hashtable Traversal
• Traversal of Hashtable
  – Each element is of type DictionaryEntry (a struct)
  – Data is accessed using the Key and Value properties

    Hashtable ages = new Hashtable();
    ages["Ann"] = 27;
    ages["Bob"] = 32;
    ages["Tom"] = 15;

    // enumerate entries
    foreach (DictionaryEntry entry in ages)
    {
        // get key and value
        string name = (string) entry.Key;
        int age = (int) entry.Value;
        ...
    }

.NET 2: System.Collections.Generic
• ICollection<T>
  – IList<T> (indexable)
    • List<T> (array-based)
  – LinkedList<T>
  – IDictionary<TKey, TValue> ((key, value) pairs)
    • SortedDictionary<TKey, TValue> (balanced search tree)
    • Dictionary<TKey, TValue> (hash table)

Demos
• Lists
• Maps
• LinkedList in C#

How do they work?
• Array-based list: Count marks the used part of the array; the rest is free (waste)
• Linked list

Dynamic vs. Static Data Structures
• Array-based lists:
  – Fixed (static) size (waste of memory).
  – May be able to grow and shrink (ArrayList), but resizing is very expensive in running time (O(n)).
  – Provide direct access to elements by index (O(1)).
• Linked list implementations:
  – Use only the necessary space (grow and shrink as needed).
  – Overhead for references and memory allocation.
  – Only sequential access: access by index requires searching (expensive: O(n)).

Hashing
• Keys are converted to indices in an array.
• A hash function h maps a key to an integer, the hash code.
• The hash code is divided by the array size, and the remainder is used as the index.
• If two or more keys give the same index, we have a collision.

Chaining
• The array doesn't hold the element itself, but a reference to a collection (a linked list, for instance) of all colliding elements.
• On search, that list must be traversed.

Efficiency of Hashing
• Worst case (maximum collisions):
  – retrieve, insert and delete are all O(n)
• The average number of collisions depends on the load factor λ, not on the table size and not on n:
  λ = (number of used entries) / (table size)
• Typically (linear probing), as an approximation:
  average number of collisions = 1 / (1 − λ)
• Example: 75% of the table entries in use:
  – λ = 0.75: 1 / (1 − 0.75) = 4 collisions on average (independent of the table size).

When Hashing Is Inefficient
• Traversing in key order.
• Finding the smallest/largest key.
• Range search (find all keys between low and high).
• Searching on anything other than the designated primary key.
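The hashing and chaining slides above can be sketched in a few lines of C#. This is only an illustration under stated assumptions: ChainedMap, Put, TryGet and HashDemo are hypothetical names, not library classes, and nothing here claims to show how Dictionary<TKey, TValue> is implemented. The load factor is computed as stored entries divided by table size, which is an assumption about how the slide's definition carries over to chaining.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical teaching class; in real code use Dictionary<TKey, TValue>.
class ChainedMap<TKey, TValue>
{
    // Each slot holds a reference to a chain (linked list) of colliding entries.
    private readonly LinkedList<KeyValuePair<TKey, TValue>>[] table;

    public int Count { get; private set; }

    public ChainedMap(int tableSize)
    {
        table = new LinkedList<KeyValuePair<TKey, TValue>>[tableSize];
    }

    // Assumed definition for chaining: stored entries / table size.
    public double LoadFactor
    {
        get { return (double)Count / table.Length; }
    }

    // Hash code modulo table size gives the index. The sign bit is masked
    // because GetHashCode may return a negative number.
    private int Slot(TKey key)
    {
        return (key.GetHashCode() & 0x7FFFFFFF) % table.Length;
    }

    public void Put(TKey key, TValue value)
    {
        int i = Slot(key);
        if (table[i] == null)
            table[i] = new LinkedList<KeyValuePair<TKey, TValue>>();

        // Traverse the chain: if the key is already present, update its value.
        for (var node = table[i].First; node != null; node = node.Next)
        {
            if (EqualityComparer<TKey>.Default.Equals(node.Value.Key, key))
            {
                node.Value = new KeyValuePair<TKey, TValue>(key, value);
                return;
            }
        }
        table[i].AddLast(new KeyValuePair<TKey, TValue>(key, value));
        Count++;
    }

    public bool TryGet(TKey key, out TValue value)
    {
        var chain = table[Slot(key)];
        if (chain != null)
        {
            // On search, the chain of colliding elements must be traversed.
            foreach (var entry in chain)
            {
                if (EqualityComparer<TKey>.Default.Equals(entry.Key, key))
                {
                    value = entry.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }
}

class HashDemo
{
    static void Main()
    {
        var ages = new ChainedMap<string, int>(4);
        ages.Put("Ann", 27);
        ages.Put("Bob", 32);
        ages.Put("Tom", 15);
        ages.Put("Ann", 28);              // update, Count stays 3

        int a;
        if (ages.TryGet("Ann", out a))
            Console.WriteLine(a);         // prints 28
        Console.WriteLine(ages.Count);    // prints 3 (3 entries in 4 slots)
    }
}
```

Note that the sketch offers no way to visit the keys in sorted order without collecting and sorting them first, which is the limitation listed under "When Hashing Is Inefficient".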
(Binary) Search Trees
• Value-based container:
  – The search tree property:
    • For any internal node: the value is greater than every value in its left subtree
    • For any internal node: the value is less than every value in its right subtree
  – Note the recursive nature of this definition:
    • It implies that all subtrees are themselves search trees
    • Every operation must ensure that the search tree property is maintained

Example: A Binary Search Tree Holding Names
[Figure: a binary search tree holding names]

InOrder: Traversal Visits Nodes in Sorted Order
[Figure: in-order traversal of the tree]

Efficiency
• insert
• retrieve
• delete
  – All operations depend on the depth of the tree
  – If balanced: O(log n)
• Most libraries use a balanced version, for instance red-black trees
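As a rough illustration of the search tree property and the in-order traversal described above, here is a minimal, unbalanced binary search tree of names in C#. NameTree, Insert, Contains, InOrder and TreeDemo are hypothetical names chosen for this sketch; the tree does no rebalancing, so its depth (and therefore the cost of each operation) depends on insertion order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical teaching class: an unbalanced binary search tree of strings.
class NameTree
{
    private class Node
    {
        public string Value;
        public Node Left, Right;
        public Node(string value) { Value = value; }
    }

    private Node root;

    // Insert keeps the search tree property; duplicates are ignored.
    public void Insert(string value)
    {
        root = Insert(root, value);
    }

    private static Node Insert(Node node, string value)
    {
        if (node == null) return new Node(value);
        int cmp = string.CompareOrdinal(value, node.Value);
        if (cmp < 0) node.Left = Insert(node.Left, value);        // smaller: go left
        else if (cmp > 0) node.Right = Insert(node.Right, value); // larger: go right
        return node;
    }

    // Retrieval follows one path from the root, so it costs O(depth).
    public bool Contains(string value)
    {
        Node node = root;
        while (node != null)
        {
            int cmp = string.CompareOrdinal(value, node.Value);
            if (cmp == 0) return true;
            node = cmp < 0 ? node.Left : node.Right;
        }
        return false;
    }

    // In-order traversal: left subtree, then the node, then right subtree.
    public IEnumerable<string> InOrder()
    {
        var result = new List<string>();
        Collect(root, result);
        return result;
    }

    private static void Collect(Node node, List<string> result)
    {
        if (node == null) return;
        Collect(node.Left, result);
        result.Add(node.Value);
        Collect(node.Right, result);
    }
}

class TreeDemo
{
    static void Main()
    {
        var tree = new NameTree();
        foreach (var name in new[] { "Tom", "Ann", "Bob", "Eve" })
            tree.Insert(name);

        // In-order traversal yields the names in sorted order.
        Console.WriteLine(string.Join(", ", tree.InOrder())); // prints Ann, Bob, Eve, Tom
    }
}
```

Inserting already sorted names into this sketch produces a tree that degenerates into a list with depth n, which is why library classes such as SortedDictionary<TKey, TValue> rely on a balanced tree instead.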