Programming Interest Group
http://www.comp.hkbu.edu.hk/~chxw/pig/index.htm

Tutorial Two: Data Structures

Data Structures
  Basic data types:
    Integral: integer, character, boolean
    Floating-point types: float, double, long double
  Data structures are methods of organizing large amounts of data:
    Array
    List, Stack, Queue, Deque (double-ended queue)
    Trees: binary tree, binary search tree, AVL tree
    Priority Queues
    Hash table
    Set
    Graph
  COMP1200: Data Structures and Algorithms

Elementary Data Structures
  A data type is a set of values and a collection of operations on those values.
  Basic data types in C and C++:
    Integers (ints): short int, int, long int
    Floating-point numbers (floats): float, double
    Characters (chars): char
  Structure in C and C++

Example 1: Basic Data Types
  This program computes the average and standard deviation of a sequence of integers generated by the library function rand().

    #include <iostream>
    #include <stdlib.h>
    #include <math.h>
    using namespace std;

    typedef int Number;
    Number randNum() { return rand(); }

    int main(int argc, char *argv[])
    {
        if (argc < 2) return 1;   /* expects N on the command line */
        int N = atoi(argv[1]);
        float m1 = 0.0, m2 = 0.0;   /* running mean of x and of x*x */
        for (int i = 0; i < N; i++) {
            Number x = randNum();
            m1 += ((float)x) / N;
            m2 += ((float)x*x) / N;
        }
        cout << "RAND_MAX.: " << RAND_MAX << endl;
        cout << "Avg.:" << m1 << endl;
        cout << "Std. dev.: " << sqrt(m2 - m1 * m1) << endl;
    }

  Question: how can you modify the program to handle a sequence of random floating-point numbers in the range [0, 1]?

Example 2: Structure

    #include <iostream>
    #include <stdlib.h>
    #include <math.h>
    using namespace std;

    struct mypoint { float x; float y; };

    float mydistance(mypoint, mypoint);
    void mypolar(mypoint, float *r, float *theta);

    int main(int argc, char *argv[])
    {
        struct mypoint a, b;
        a.x = 1.0; a.y = 1.0;
        b.x = 4.0; b.y = 5.0;
        cout << "Distance is " << mydistance(a, b) << endl;
        float r, theta;
        mypolar(a, &r, &theta);
        cout << "r : " << r << endl;
        cout << "theta: " << theta << endl;
    }

    /* return the distance between two points */
    float mydistance(mypoint a, mypoint b)
    {
        float dx = a.x - b.x;
        float dy = a.y - b.y;
        return sqrt(dx*dx + dy*dy);
    }

    /* convert from Cartesian to polar coordinates */
    void mypolar(mypoint p, float *r, float *theta)
    {
        *r = sqrt(p.x*p.x + p.y*p.y);
        *theta = atan2(p.y, p.x);
    }

  Result:
    [chxw@csr40 cplus]$ ./a.out
    Distance is 5
    r : 1.41421
    theta: 0.785398

Arrays
  The array is the most fundamental data structure.
  An array is a fixed collection of same-type data that are stored contiguously and are accessible by an index.
  It is the responsibility of the programmer to use indices that are nonnegative and smaller than the array size.
  Two ways to create an array:
    Static allocation: size known to and set by the programmer
    Dynamic allocation: size unknown to the programmer and set by the user at execution time

Example: Sieve of Eratosthenes
  The Sieve of Eratosthenes is a classical method to calculate the table of prime numbers.
  Basic idea: set a[i] to 1 if i is prime, and to 0 if i is not a prime.

    #include <iostream>
    using namespace std;

    static const int N = 1000;

    int main( )
    {
        int i, a[N];
        /* initialization: assume every i >= 2 is prime */
        for (i = 2; i < N; i++) a[i] = 1;
        for (i = 2; i < N; i++)
            if (a[i])
                /* sieve i's multiples up to N-1 */
                for (int j = i; j*i < N; j++) a[i*j] = 0;
        for (i = 2; i < N; i++)
            if (a[i]) cout << " " << i;
        cout << endl;
    }

Dynamic Memory Allocation
  C language: malloc( ) and free( )
  C++ language: operator new and operator delete

    #include <iostream>
    #include <new>
    #include <stdlib.h>
    using namespace std;

    int main(int argc, char *argv[])
    {
        int N = atoi(argv[1]);
        /* plain new throws std::bad_alloc on failure; the nothrow form
           returns a null pointer instead, so the check below is meaningful */
        int *a = new (nothrow) int[N];
        if (a == 0) { cout << "out of memory" << endl; return 0; }
        …
        delete [] a;
    }

Array of Structures
  This program generates N random points in the unit square and counts the pairs of points whose distance is shorter than a threshold d.

    #include <iostream>
    #include <stdlib.h>
    #include <math.h>
    using namespace std;

    struct mypoint { float x; float y; };

    float mydistance(mypoint, mypoint);
    float randfloat( );

    int main(int argc, char *argv[])
    {
        if (argc < 3) return 1;   /* expects N and d on the command line */
        float d = atof(argv[2]);
        int i, cnt = 0, N = atoi(argv[1]);
        mypoint *a = new mypoint[N];
        for (i = 0; i < N; i++) {
            a[i].x = randfloat();
            a[i].y = randfloat();
        }
        /* compare every unordered pair (i, j) with i < j */
        for (i = 0; i < N; i++)
            for (int j = i+1; j < N; j++)
                if (mydistance(a[i], a[j]) < d) cnt++;
        cout << cnt << " pairs within " << d << endl;
        delete [] a;
    }

    /* return the distance between two points */
    float mydistance(mypoint a, mypoint b)
    {
        float dx = a.x - b.x;
        float dy = a.y - b.y;
        return sqrt(dx*dx + dy*dy);
    }

    /* return a random number between 0 and 1 */
    float randfloat( )
    {
        return 1.0 * rand() / RAND_MAX;
    }
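As a possible alternative to the new[] / delete[] version of the pair-counting program, the sketch below keeps the points in a std::vector, which releases its memory automatically. The names Point, point_distance and count_close_pairs are illustrative assumptions, not part of the original tutorial code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration; the tutorial's own code uses mypoint.
struct Point { float x; float y; };

// Euclidean distance between two points.
float point_distance(const Point& a, const Point& b) {
    float dx = a.x - b.x;
    float dy = a.y - b.y;
    return std::sqrt(dx * dx + dy * dy);
}

// Count the unordered pairs (i, j), i < j, whose distance is strictly below d.
int count_close_pairs(const std::vector<Point>& pts, float d) {
    int cnt = 0;
    for (std::size_t i = 0; i < pts.size(); i++)
        for (std::size_t j = i + 1; j < pts.size(); j++)
            if (point_distance(pts[i], pts[j]) < d) cnt++;
    return cnt;
}
```

The double loop is the same quadratic scan as in the array version; only the memory management changes, so no explicit delete is needed when the vector goes out of scope.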
List
  A general list of elements A1, A2, …, AN, associated with a set of operations:
    Insert: add an element
    Delete: remove an element
    Find: find the position of an element (search)
    FindKth: find the kth element
  Each element has a fixed position
  Two different implementations:
    Array-based list
    Linked list

  Linked list: A1 -> A2 -> A3
  Linked list with a header: header -> A1 -> A2 -> A3
  Doubly linked list: A1 <-> A2 <-> A3

Sample C Implementation of Linked List with a Header
  Header files:
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/fatal.h
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/list.h
  Source file:
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/list.h

Circular List Example
  Josephus problem: N people decide to elect a leader as follows:
    Arrange themselves in a circle
    Eliminate every Mth person around the circle
    The last remaining person will be the leader

Simulation of Josephus problem

    #include <iostream>
    #include <stdlib.h>
    using namespace std;

    struct mynode {
        int item;
        mynode* next;
        /* constructor */
        mynode(int x, mynode* t) { item = x; next = t; }
    };
    typedef mynode *mylink;

    int main(int argc, char *argv[])
    {
        if (argc < 3) return 1;   /* expects N and M on the command line */
        int i, N = atoi(argv[1]), M = atoi(argv[2]);

        /* create the first node; it points to itself */
        mylink t = new mynode(1, 0);
        t->next = t;
        mylink x = t;

        /* insert the next N-1 nodes, keeping the list circular */
        for (i = 2; i <= N; i++)
            x = (x->next = new mynode(i, t));

        /* simulate the election process */
        while (x != x->next) {
            for (i = 1; i < M; i++) x = x->next;
            /* delete the next node */
            t = x->next;
            x->next = t->next;
            delete t;
        }
        cout << x->item << endl;
        delete x;   /* release the last remaining node */
    }

Stacks
  A stack is a list with the restriction that insertions and deletions can be performed only at one end of the list, called the top.
  LIFO: last in, first out
  Operations:
    Push(x, s)
    Pop(s)
    MakeEmpty(s)
    IsEmpty(s)
    Top(s)

Stack Implementations
  Using a linked list:
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/stackli.h
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/stackli.c
  Using an array:
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/stackar.h
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/stackar.c
    Remark: you need to define the maximum stack size when creating the stack

Queues
  A queue is a list with the restriction that insertion is done at one end, whereas deletion is done at the other end.
  FIFO: first in, first out
  Operations:
    CreateQueue(x): create a queue with maximum size of x
    Enqueue(x, q): insert an element x at the end of the list
    Dequeue(q): return and remove the element at the start of the list
    IsEmpty(q) and IsFull(q)

Queue Implementation
  Implemented by a circular array
  Need to specify the maximum size of the queue when creating the queue
  One variable for the front of the queue, another one for the rear of the queue
  Sample code:
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/queue.h
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/queue.c

Priority Queues
  A priority queue is a data structure that allows the following operations:
    Insert(x, p): insert item x into priority queue p
    Maximum(p): return the item with the highest priority in priority queue p
    ExtractMax(p): return and remove the item with the highest priority in p
  Note: each element contains a key which represents its priority

Sets
  A set is a collection of unordered elements drawn from a given universal set U.
  Operations:
    Member(x, S): is an item x an element of set S?
    Union(A, B)
    Intersection(A, B)
    Insert(x, S)
    Delete(x, S)

Dictionaries
  Dictionaries permit content-based retrieval.
  Operations:
    Insert(x, d)
    Delete(x, d)
    Search(k, d): return an item with key k
  Note: dictionaries can be implemented with many techniques, such as a linked list, an array, a tree, or hashing.
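To make the dictionary operations concrete, here is a minimal sketch of one of the techniques mentioned, hashing, using separate chaining. The class name ChainedDict, the int-key and string-item types, and the default bucket count are assumptions chosen for illustration; the tutorial does not prescribe any of them.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration: a dictionary from int keys to string items,
// implemented as a hash table with separate chaining.
class ChainedDict {
    typedef std::list<std::pair<int, std::string> > Bucket;

public:
    // Assumes buckets > 0; each bucket is a linked list of (key, item) pairs.
    explicit ChainedDict(std::size_t buckets = 101) : table_(buckets) {}

    // Insert(x, d): store item under key, replacing any previous item.
    void insert(int key, const std::string& item) {
        Bucket& b = table_[index(key)];
        for (Bucket::iterator it = b.begin(); it != b.end(); ++it)
            if (it->first == key) { it->second = item; return; }
        b.push_back(std::make_pair(key, item));
    }

    // Search(k, d): pointer to the item with key k, or a null pointer if absent.
    const std::string* search(int key) const {
        const Bucket& b = table_[index(key)];
        for (Bucket::const_iterator it = b.begin(); it != b.end(); ++it)
            if (it->first == key) return &it->second;
        return 0;
    }

    // Delete(x, d): remove the item with this key; returns true if it existed.
    bool erase(int key) {
        Bucket& b = table_[index(key)];
        for (Bucket::iterator it = b.begin(); it != b.end(); ++it)
            if (it->first == key) { b.erase(it); return true; }
        return false;
    }

private:
    // Map a key (possibly negative) to a bucket index.
    std::size_t index(int key) const {
        return static_cast<std::size_t>(static_cast<unsigned>(key)) % table_.size();
    }

    std::vector<Bucket> table_;
};
```

Keys that hash to the same bucket simply share one list, so all three operations take time proportional to the length of that list. A linked-list or sorted-array dictionary would offer the same interface with different costs.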
C++ Standard Template Library
  The C++ STL provides implementations of many data structures
  Reference:
    http://www.sgi.com/tech/stl/
    http://www.cppreference.com/
  Data structures (containers in C++):
    Sequential containers (see Workshop 7): Vectors, Lists, Double-ended Queues
    Associative containers (see Workshop 7): Sets, Multisets, Maps, Multimaps
    Container adaptors: Stacks, Queues, Priority Queues

List in C++
  List is implemented as a doubly linked list of elements
  Each element in a list has its own segment of memory and refers to its predecessor and its successor
  Disadvantage: lists do not provide random access. General access to an arbitrary element takes linear time, hence lists do not support the [ ] operator
  Advantage: insertion or removal of an element is fast at any position
  http://www.cplusplus.com/reference/stl/list/

List Example 1

    // list1.cpp
    #include <iostream>
    #include <list>
    using namespace std;

    int main()
    {
        list<char> coll;
        for (char c = 'a'; c <= 'z'; ++c)
            coll.push_back(c);
        // print and remove the first element until the list is empty
        while (!coll.empty()) {
            cout << coll.front() << ' ';
            coll.pop_front();
        }
        cout << endl;
        return 0;
    }

    $ g++ list1.cpp
    $ ./a.out
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    $

List Example 2

    // list2.cpp
    #include <iostream>
    #include <list>
    using namespace std;

    int main()
    {
        list<char> coll;
        for (char c='a'; c<='z'; ++c)
            coll.push_back(c);
        // read-only traversal with a const_iterator
        list<char>::const_iterator pos;
        for (pos = coll.begin(); pos != coll.end(); ++pos)
            cout << *pos << ' ';
        cout << endl;
    }

    $ g++ list2.cpp
    $ ./a.out
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    $

List Example 3

    // list3.cpp
    #include <cctype>
    #include <iostream>
    #include <list>
    using namespace std;

    int main()
    {
        list<char> coll;
        for (char c='a'; c<='z'; ++c)
            coll.push_back(c);
        // a non-const iterator allows the elements to be modified in place
        list<char>::iterator pos;
        for (pos = coll.begin(); pos != coll.end(); ++pos) {
            *pos = toupper(*pos);
            cout << *pos << ' ';
        }
        cout << endl;
    }

Stack in C++

    // stack.cpp
    #include <iostream>
    #include <stack>
    using namespace std;

    int main()
    {
        stack<int> s;
        for (int i=1; i<=10; ++i)
            s.push(i);
        while (!s.empty()) {
            cout << s.top() << endl;
            s.pop();
        }
        return 0;
    }

  push(): insert an element
  pop(): remove the top element
  top(): access the top element
  size(): return the number of elements
  empty(): check whether the container is empty
  Remark: pop() removes the top element and returns nothing. So usually we need to call top() to get the top element, then call pop() to remove it.
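The top()-then-pop() idiom from the remark can be seen in a small bracket-matching check. This is only a sketch; the function name is_balanced and the choice of bracket characters are assumptions made for the example.

```cpp
#include <stack>
#include <string>

// Hypothetical helper: returns true if every bracket in text is properly
// matched and nested. Non-bracket characters are ignored.
bool is_balanced(const std::string& text) {
    std::stack<char> open;   // opening brackets still waiting for a partner
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '(' || c == '[' || c == '{') {
            open.push(c);
        } else if (c == ')' || c == ']' || c == '}') {
            if (open.empty()) return false;   // closer with no opener
            char last = open.top();           // read the top element first...
            open.pop();                       // ...then remove it
            if ((c == ')' && last != '(') ||
                (c == ']' && last != '[') ||
                (c == '}' && last != '{'))
                return false;
        }
    }
    return open.empty();   // leftover openers mean the text is unbalanced
}
```

Checking empty() before top() matters here, because calling top() or pop() on an empty std::stack is undefined behavior.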
Queue in C++

    // queue.cpp
    #include <iostream>
    #include <queue>
    using namespace std;

    int main()
    {
        queue<int> s;
        for (int i=1; i<=10; ++i)
            s.push(i);
        while (!s.empty()) {
            cout << s.front() << endl;
            s.pop();
        }
        return 0;
    }

  push(): insert an element
  pop(): remove the first element
  front(): access the first element
  back(): access the last element
  size(): return the number of elements
  empty(): check whether the container is empty

Queue Example II

    // queue2.cpp
    #include <iostream>
    #include <queue>
    #include <string>
    using namespace std;

    int main()
    {
        queue<string> q;
        q.push("These ");
        q.push("are ");
        q.push("more than ");
        // read and print two elements
        cout << q.front(); q.pop();
        cout << q.front(); q.pop();
        // insert two new elements
        q.push("four ");
        q.push("words!");
        // skip one element
        q.pop();
        // read and print the remaining two elements
        cout << q.front(); q.pop();
        cout << q.front() << endl; q.pop();
        cout << "number of elements in the queue: " << q.size() << endl;
        return 0;
    }

Priority Queue in C++

    // pqueue.cpp
    #include <iostream>
    #include <queue>
    using namespace std;

    int main()
    {
        priority_queue<int> s;
        s.push(5); s.push(4); s.push(8);
        s.push(9); s.push(2); s.push(7);
        s.push(6); s.push(3); s.push(10);
        while (!s.empty()) {
            cout << s.top() << endl;
            s.pop();
        }
        return 0;
    }

  push(): insert an element
  pop(): remove the element with the highest priority
  top(): access the element with the highest priority
  size(): return the number of elements
  empty(): check whether the container is empty
  By default, elements are compared with operator <, and the largest element has the highest priority, so repeated top() / pop() calls yield the elements in descending order.
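The default ordering also works when each element carries a separate key, for example a (priority, name) pair: std::pair compares the first member before the second, so the pair with the largest key is served first. The sketch below assumes a hypothetical helper named serve_order and string task names purely for illustration.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the task names in the order a default
// priority_queue serves them, i.e. highest key first.
std::vector<std::string> serve_order(
        const std::vector<std::pair<int, std::string> >& tasks) {
    std::priority_queue<std::pair<int, std::string> > pq;
    for (std::size_t i = 0; i < tasks.size(); ++i)
        pq.push(tasks[i]);

    std::vector<std::string> order;
    while (!pq.empty()) {
        order.push_back(pq.top().second);   // read the highest-priority item
        pq.pop();                           // then remove it
    }
    return order;
}
```

When two keys are equal, the pair comparison falls back to the second member, so ties are broken by the larger string rather than by insertion order.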
Different Sorting Criterion

    // pqueue.cpp
    #include <functional>
    #include <iostream>
    #include <queue>
    #include <vector>
    using namespace std;

    int main()
    {
        // greater<int> makes the smallest element the highest priority
        priority_queue<int, vector<int>, greater<int> > s;
        s.push(5); s.push(4); s.push(8);
        s.push(9); s.push(2); s.push(7);
        s.push(6); s.push(3); s.push(10);
        while (!s.empty()) {
            cout << s.top() << endl;
            s.pop();
        }
        return 0;
    }

  Three parameters when defining a priority queue:
    int: type of element
    vector<int>: the container that is used internally
    greater<int>: the sorting criterion (by default, it is less<>)

Java
  java.util package
    http://java.sun.com/products/jdk
    http://java.sun.com/j2se/1.4.2/docs/api/java/util/package-summary.html

    Data structure    java.util class
    Stack             Stack
    Queue             ArrayList, LinkedList
    Dictionaries      HashMap, Hashtable
    Priority Queue    TreeMap
    Sets              HashSet

What to do now?
  Choose your own weapon:
    C: write a set of data structures
    C++: learn the STL
    Java: learn the java.util package
  Try to solve at least one exercise
  If you still have time, solve more exercises.

Practice
  http://acm.uva.es/p/v100/10038.html
  http://acm.uva.es/p/v100/10044.html
  http://acm.uva.es/p/v100/10050.html
  http://acm.uva.es/p/v101/10149.html
  http://acm.uva.es/p/v102/10205.html
  http://acm.uva.es/p/v102/10258.html
  http://acm.uva.es/p/v103/10315.html
  http://acm.uva.es/p/v8/843.html