Data Structures and Manipulation

By Dan Jones
OCR Specification Points
3.3.5 Data structures and data manipulation
Topics
• Implementation of data structures, including stacks, queues and trees.
• Searching, merging and sorting.
Candidates should be able to:
• explain how static data structures may be used to implement dynamic data structures;
• describe algorithms for the insertion, retrieval and deletion of data items stored in stack, queue and tree structures;
• explain the difference between binary searching and serial searching, highlighting the advantages and disadvantages of each;
• explain how to merge data files;
• explain the differences between the insertion and quick sort methods, highlighting the characteristics, advantages and disadvantages of each.
Data Structures
• A data structure is a way of storing data so that the position of each item has meaning.
- e.g. Listing names in alphabetical order would give their position meaning.
• There are a number of structures that can be used.
- For example:
▪ Arrays and Records
▪ Serial, sequential, indexed sequential and direct access files
- Which of these is used will depend on the circumstances: the most appropriate structure for the data should be used.
• Data Structures can be grouped into two main categories:
1. Dynamic
2. Static
IMPLEMENTATION OF
STATIC DATA STRUCTURES
Arrays and Lists
Static Data Structures
• Data structure whose size is fixed when it is created in memory.
- e.g. Array (or List)
Static Example: Array/List
• This shows an example of an array stored in memory in alphabetical order.
• Decisions about the array must be made before it is used:
- Name
- Data type
- Size
- Shape (number of dimensions)
• This example is a list as it is one dimensional.
• What if an extra item had to be added, but memory location 8 was already assigned to something else?
- The whole of the array must be moved.
- This is very inefficient, especially when dealing with large amounts of data.
Memory address   Contents                         People array index
2                ….                               N/A
3                Ben                              0
4                Harry                            1
5                Janet                            2
6                Louise                           3
7                Terry                            4
8                …. (0xFFEE if already assigned)  N/A
9                ….                               N/A
• You can have arrays with many dimensions.
• Arrays are accessed by providing the array name and its index (from 0).
- e.g. People[2] would output “Janet”
• Array:
- Name = People
- Data Type = String
- Size (length) = 5
- Shape = 1D
• Adding an extra item would require redefining the array.
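The fixed size of the People array can be sketched in Python. This is only an illustration: Python's own lists are dynamic, so `StaticArray` is a hypothetical helper class (not part of the original slides) that refuses any index outside the size chosen when it is created.

```python
class StaticArray:
    """A fixed-size, one-dimensional array (hypothetical helper for illustration)."""

    def __init__(self, size):
        self.size = size              # decided once, when the array is created
        self._cells = [""] * size     # every cell is reserved up front

    def _check(self, index):
        if not 0 <= index < self.size:
            raise IndexError("index outside the fixed size of the array")

    def __getitem__(self, index):
        self._check(index)
        return self._cells[index]

    def __setitem__(self, index, value):
        self._check(index)
        self._cells[index] = value


# Name = people, Data Type = string, Size = 5, Shape = 1D
people = StaticArray(5)
for i, name in enumerate(["Ben", "Harry", "Janet", "Louise", "Terry"]):
    people[i] = name

print(people[2])  # Janet
```

Trying `people[5] = "Kerry"` raises an `IndexError` here, which mirrors the point that adding an extra item means redefining the array.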
Static Data Structures
• Advantages:
▪ Little to no risk of overflow or underflow errors, as the structure will always take up the same space in memory.
  • This space will most likely be reserved so no other program can access it.
▪ The program/memory management system can allocate a fixed amount of memory.
• Disadvantages:
▪ Requires knowledge of the size of the array before it is created. This can result in:
  • Waste of resources: once the space has been reserved it can no longer be used by other processes/data.
  • Running out of space when the predicted size is too small.
IMPLEMENTATION OF
DYNAMIC DATA STRUCTURES
Linked Lists, Queues, Stacks, Binary Trees and their
implementation.
Examples of Dynamic Data Structures
• There are four types of dynamic data structure we need to know:
1. Linked List
2. Queue
3. Stack
4. Binary Tree
Linked List
• A linked list is similar to an array in that it stores a list of values; however, it is dynamic and can be extended or shortened at will to fit the data inside it.
• To do this pointers are used:
- Each data item now holds two pieces of information:
1. Data = the original data the item held.
2. Pointer = the address of the next item in the list.
- The Start Pointer stores the address of the first item in the list.
▪ This is what actually allows the list to be accessed by the program.
- The pointer of the last item in the list will be Null (blank).
▪ This indicates the end of the list.
▪ Called a Null Pointer.
Dynamic Example: Linked Lists
• Inserting an item simply requires:
- Setting the new item’s pointer equal to the previous item’s old pointer value.
- Changing the pointer of the previous item to the new item’s location.
- e.g. Inserting “Kerry” between “Janet” and “Louise”.
• Deleting an item simply requires:
- Taking its pointer value, and replacing the previous item’s pointer value with this.
- e.g. Deleting “Janet”.
▪ Now “Janet” will be skipped, and its space can be used for other data.
• The Start Pointer stores the address of the first item in the list.
Address   Data           Pointer
2         ….
3         Ben            4
4         Harry          7 (5 after “Janet” is deleted)
5         Kerry          8
6         “Misc. Data”
7         Janet          8 (5 after “Kerry” is inserted)
8         Louise         10
9         “Misc. Data”
10        Terry          Null
11        .....
• The Null Pointer is stored in the last item to indicate the last item in the list.
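The insert and delete steps above can be sketched with a plain Python list standing in for memory. This is a minimal sketch, not a definitive implementation: the names `memory`, `traverse`, `insert_after` and `delete_after` are made up for illustration, the addresses follow the example figure, and the start pointer value of 3 is assumed from the address of “Ben”.

```python
NULL = None  # stands in for the null pointer

# Memory cells 0..11; a used cell holds [data, pointer], other cells are None here.
memory = [None] * 12
memory[3] = ["Ben", 4]
memory[4] = ["Harry", 7]
memory[7] = ["Janet", 8]
memory[8] = ["Louise", 10]
memory[10] = ["Terry", NULL]
start = 3  # start pointer: address of the first item (assumed from the figure)


def traverse(memory, start):
    """Follow the pointers from the start pointer and collect the data."""
    items, address = [], start
    while address is not NULL:
        data, pointer = memory[address]
        items.append(data)
        address = pointer
    return items


def insert_after(memory, previous, free, data):
    """Store data at a free address and link it in after the item at `previous`."""
    memory[free] = [data, memory[previous][1]]  # new item takes the old pointer value
    memory[previous][1] = free                  # previous item now points to the new item


def delete_after(memory, previous):
    """Unlink the item that follows the item at `previous`."""
    removed = memory[previous][1]
    memory[previous][1] = memory[removed][1]    # skip over the removed item
    memory[removed] = None                      # the cell can be used for other data


insert_after(memory, previous=7, free=5, data="Kerry")  # between Janet and Louise
delete_after(memory, previous=4)                        # removes Janet
print(traverse(memory, start))
```

Deleting the very first item is not covered by this sketch; it would mean changing the start pointer instead of a previous item's pointer.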
Static and Dynamic Data Structures Comparison
Static
• Data structure whose size is fixed when it is created in memory.
- e.g. Array
- Advantages:
▪ The program/memory management system can allocate a fixed amount of memory.
▪ No risk of overflow or underflow errors.
  • As it will always take up the same space in memory.
- Disadvantages:
▪ Requires knowledge of the size of the array before it has been created. This can result in:
  • Waste of resources: once the space has been reserved it can no longer be used by other processes/data.
  • Running out of space when the prediction of space is too little.
▪ Any manipulation other than adding or taking from the end requires moving large amounts of the data.
  • Inefficient use of memory and CPU time.
Dynamic
• Data structure which will extend and change its size to fit the data.
- e.g. Linked List
- Advantages:
▪ Can extend as far as physically possible, so is more flexible.
▪ Allows for the program to be more easily written, as less must be determined at compilation time.
▪ Inserting, merging and deleting of items is very easy and requires little processing power.
- Disadvantages:
▪ Unnecessary and inefficient for small amounts of data.
  • In this case the size of the data may be even smaller than the extra data needed to make it dynamic.
▪ Data can be highly fragmented over extended use. This may cause a physical bottleneck when the hardware needs to access this data.
Queue
• A queue is a data structure, similar in implementation to a list/array.
- However it implements a “First In First Out” (FIFO, also called LILO) order.
▪ Hence the name queue!
▪ It is therefore a “serial structure where the position is related to the chronological appearance of the data”.
- It can grow and shrink in size.
▪ For example, if items are being processed faster than they are being added to the queue, the queue will shrink.
- It has two pointers:
1. Head Pointer holds the address of the oldest item in the queue (next to be read).
2. Tail Pointer holds the free address just after the most recently added item in the queue (last to be read).
• There are two operations you can do to a queue:
1. Enqueue = putting something on the end of the queue.
▪ Value added to the address pointed to by the tail pointer.
▪ The tail pointer is then incremented.
2. Dequeue = reading and removing the item at the front of the queue.
▪ Value at the head pointer read.
▪ Head pointer incremented.
Dynamic Example: Queue
• Enqueue: the value is added to the address pointed to by the tail pointer, and the tail pointer is then incremented (Tail Pointer = 7 => 8).
• Dequeue: the value at the head pointer is read, and the head pointer is then incremented (Head Pointer = 0 => 1).
- The previous head value is now ignored as if it were free space.
[Figure: queue stored in memory addresses 0 to 7.]
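The enqueue and dequeue steps can be sketched on top of a fixed block of cells. This is a minimal sketch under stated assumptions: the class name `ArrayQueue`, its `capacity` parameter and the choice of exceptions are all hypothetical, and the cells freed by dequeuing are never reused (a circular queue would do that, but it is not shown here).

```python
class ArrayQueue:
    """A queue held in a fixed block of cells, with head and tail pointers."""

    def __init__(self, capacity):
        self.cells = [None] * capacity
        self.head = 0  # address of the oldest item (next to be read)
        self.tail = 0  # free address just after the most recently added item

    def is_empty(self):
        return self.head == self.tail

    def enqueue(self, value):
        if self.tail == len(self.cells):
            raise OverflowError("no free address left at the tail")
        self.cells[self.tail] = value  # value added at the tail pointer
        self.tail += 1                 # tail pointer then incremented

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        value = self.cells[self.head]  # value at the head pointer is read
        self.head += 1                 # old cell is now treated as free space
        return value


queue = ArrayQueue(8)
for name in ["Ben", "Harry", "Janet"]:
    queue.enqueue(name)

print(queue.dequeue())  # Ben (first in, first out)
```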
Dynamic Example: Stack
• A stack is a method of storing data following the first in last out (FILO) principle.
• A stack pointer is used to store the location of the most recently added item of the stack.
- It is a variable which stores the address of the upmost value of the stack.
- Used to read the data.
- Note: Only the value pointed to by the stack pointer can be read at any time.
• Push = the action of adding something to the stack.
- Stack pointer incremented (7 => 8).
- Value is then stored in the address represented by the stack pointer.
• Pop = the action of removing an item from the top of the stack.
- Stack pointer is decremented (8 => 7).
- Note: The data does not need to be deleted as there is no longer any reference to it.
[Figure: stack stored in memory addresses 0 to 13, with Stack Pointer = 7 => 8 after a push and 8 => 7 after a pop.]
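Push and pop can be sketched with an array and a stack pointer. The sketch below is illustrative only: `ArrayStack` and `capacity` are hypothetical names, a stack pointer of -1 is an assumed convention for "empty", and pop is assumed to read the top value before decrementing the pointer.

```python
class ArrayStack:
    """A stack held in a fixed block of cells, accessed only via the stack pointer."""

    def __init__(self, capacity):
        self.cells = [None] * capacity
        self.pointer = -1  # assumed convention: -1 means the stack is empty

    def push(self, value):
        if self.pointer + 1 == len(self.cells):
            raise OverflowError("stack overflow")
        self.pointer += 1                 # stack pointer incremented
        self.cells[self.pointer] = value  # value stored at that address

    def pop(self):
        if self.pointer < 0:
            raise IndexError("stack underflow")
        value = self.cells[self.pointer]  # only the top value can be read
        self.pointer -= 1                 # old data stays in place, unreferenced
        return value


stack = ArrayStack(14)
for name in ["Ben", "Harry", "Janet"]:
    stack.push(name)

print(stack.pop())  # Janet (first in, last out)
```

Note that after the pop, "Janet" is still physically stored in the cell; it is simply overwritten by the next push.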
Implementing Stacks and Queues
• Static data structures can be used to implement dynamic data structures.
- This is because most computers do not implement dynamic data structures natively, and so these must be developed in code.
- Often higher level languages will have implementations built in.
▪ e.g. This is done by VB.NET.
• A stack can be implemented by using an array.
- Methods can be added to the array class to allow popping and pushing to an array.
▪ And it will be stored identically in memory.
- However a linked list can be used as well. This provides a more flexible implementation.
▪ Meaning the stack does not need to take up a contiguous section of memory.
▪ It can also remove the source of some stack overflow errors.
  • When the available contiguous memory runs out, the list can simply point to more free space (as opposed to writing over other data).
• A similar method can be applied to create a queue.
Address   Contents
2         ….
3         Ben
4         Harry
5         Janet
6         Harry
7         Louise
8         ….
Array stored in memory as a stack by restricting access to the top item only (highest index -> “Louise”).
Binary Trees
• A binary tree is a data structure which stores items of data.
- Each item of data points to up to two others. (binary!)
- The direction in which they are pointed gives their position meaning.
▪ Can be used to sort alphabetically (as the example has been), if the traversing algorithm is known.
- The first node is called the root node.
- Each pointer (arrow) is a possible path from the node.
- Each new set of items created is called a new layer.
[Figure: Example of a binary tree. The root “Chloe” points left to “Barry” and right to “Terence”; “Barry” points to “Alex” and “Ben”; “Ben” points to “Becky” and “Bex”.]
• The syllabus specifies one way of traversing trees:
1. If there is a left branch that has yet to be traversed, then follow it and repeat.
2. Read the node if it hasn’t already been read.
3. If there is a right branch, traverse it and go back to 1.
4. Go back up one layer.
• However other algorithms can be used.
▪ Such as the method used to traverse binary trees for Reverse Polish.
[Figure: the path the traversal takes around the tree.] Follow this shape, but recursively.
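The traversal steps above (left branch, read the node, right branch, back up) can be sketched recursively. This is a minimal sketch: the `Node` class and the `in_order` function are hypothetical names, and the tree is built by hand to match the example names used in these notes.

```python
class Node:
    """One item in a binary tree, pointing to up to two child nodes."""

    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def in_order(node, visit):
    """Follow the left branch, read the node, then follow the right branch."""
    if node is None:
        return                     # nothing here: go back up one layer
    in_order(node.left, visit)     # 1. left branch first
    visit(node.data)               # 2. read the node
    in_order(node.right, visit)    # 3. then the right branch


# The example tree, with "Chloe" as the root node.
tree = Node(
    "Chloe",
    Node("Barry", Node("Alex"), Node("Ben", Node("Becky"), Node("Bex"))),
    Node("Terence"),
)

names = []
in_order(tree, names.append)
print(names)
```

Returning from each recursive call is what "go back up one layer" means here, so no explicit record of visited nodes is needed.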
Implementing Binary Trees
• Binary Trees are implemented using a “linked list of arrays”.
- Each node is represented by an array containing:
▪ Data the node represents
▪ Pointer to left hand child node (null if on an end of the tree)
▪ Pointer to right hand child node (null if on an end of the tree)
- The pointers store the address in memory of the node’s children.
▪ In this way it acts as a linked list.
• Example node with associated data:
Data   Left Pointer   Right Pointer
Ben    0xD023F        0xF11A2
• Example binary tree:
Root Node = 0
Data      Left Pointer   Right Pointer
Chloe     7              10
Barry     3              13
Terence   Null           Null
Alex      Null           Null
Ben       27             18
Becky     Null           Null
Bex       Null           Null
- Each row is a node, and each non-null pointer is a path to another node.
- “Barry” and “Terence” form the second layer.
• Example binary tree represented as arrays in memory:
Root Node = 0
Address   Contents     Address   Contents
0         “Chloe”      16        ...
1         7            17        ...
2         10           18        “Bex”
3         “Alex”       19        Null
4         Null         20        Null
5         Null         21        ...
6         ...          22        ...
7         “Barry”      23        ...
8         3            24        ...
9         13           25        ...
10        “Terence”    26        ...
11        Null         27        “Becky”
12        Null         28        Null
13        “Ben”        29        Null
14        27           30        ...
15        18           31        ...
- Each node is represented by an array of length 3 (Data, Left pointer, Right pointer).
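The same example tree can be held in one flat block of memory, with each node taking three consecutive cells (data, left pointer, right pointer). The sketch below is an illustration under assumptions: `memory`, `store`, `in_order` and `find` are hypothetical names, the addresses follow the example, and `find` assumes the tree is ordered alphabetically as the example is.

```python
NULL = None
memory = [None] * 32  # addresses 0..31


def store(address, data, left, right):
    """Write one node as an array of length 3 starting at `address`."""
    memory[address:address + 3] = [data, left, right]


store(0, "Chloe", 7, 10)   # root node
store(7, "Barry", 3, 13)
store(10, "Terence", NULL, NULL)
store(3, "Alex", NULL, NULL)
store(13, "Ben", 27, 18)
store(27, "Becky", NULL, NULL)
store(18, "Bex", NULL, NULL)


def in_order(address):
    """Left branch, then the node itself, then the right branch."""
    if address is NULL:
        return []
    data, left, right = memory[address:address + 3]
    return in_order(left) + [data] + in_order(right)


def find(root, target):
    """Retrieve the address of `target` by following left/right pointers."""
    address = root
    while address is not NULL:
        data, left, right = memory[address:address + 3]
        if target == data:
            return address
        address = left if target < data else right
    return NULL


print(in_order(0))
print(find(0, "Bex"))  # 18
```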
DATA MANIPULATION
SEARCHING
Methods of searching lists. (Recap from F452)
Linear Searches
• A serial search is where a list is searched in order from its first to its last item.
- This list is not necessarily ordered (but can be).
- Can be slow, especially as the dataset increases.
- As there is no order of items, it will have to check each item on the list before it can determine the item does not exist.
▪ Inefficient/waste of CPU time.
• A sequential search is a linear search performed on an ordered dataset.
- Main advantage over serial searching is that if the item does not exist, this can be determined more quickly.
▪ When it passes the point where the item should be, it will stop.
▪ e.g. When looking for “Ben” in Fig.1, it will stop when it gets to “Beth”, as “Beth” is after “Ben” in the alphabet.
Fig.1 Sample ordered list.
Names
Andrew
Beth
Chad
Dave
Fred
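The difference between the two linear searches can be sketched as follows. The function names and the convention of returning -1 for "not found" are assumptions made for this illustration; the list is the sample ordered list from Fig.1.

```python
def serial_search(items, target):
    """Check every item in turn; works on unordered data."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1  # had to look at every item to know it is missing


def sequential_search(sorted_items, target):
    """Linear search on ordered data; stops once the target's place is passed."""
    for index, item in enumerate(sorted_items):
        if item == target:
            return index
        if item > target:
            break  # e.g. "Beth" comes after "Ben", so "Ben" cannot be further on
    return -1


names = ["Andrew", "Beth", "Chad", "Dave", "Fred"]
print(serial_search(names, "Dave"))      # 3
print(sequential_search(names, "Ben"))   # -1, decided on reaching "Beth"
```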
Binary Search
• A binary search is a method of searching data which has been pre-sorted.
- Works by splitting the list in two each time, and taking the section which contains the data item.
▪ Hence the name binary (two).
- Very efficient: much faster than a serial search on large lists.
▪ Will take a maximum of roughly log2(N) iterations to find a specific value, where N is the number of items in the list.
▪ As opposed to up to N iterations for a serial/linear search.
• The algorithm can be summarised as: (in a LIST of length N, counting positions from 1)
1. Find the midpoint value of the list: LIST[N/2]
▪ If N is an odd number, round up e.g. 13/2 = 6.5 = 7 -> LIST[7]
2. If target = midpoint, item found at index of midpoint.
3. If target is greater than midpoint, delete the midpoint and all values above it in the written list (those that come before it).
   If target is smaller than midpoint, delete the midpoint and all values below it in the written list (those that come after it).
4. Go back to 1. If no values are left, the item is not in the list.
Searching Example: Binary
• For example, finding “Dave” in this alphabetical list (N = 12):
Names: Andrew, Beth, Chad, Dave, Fred, Gareth, Harry, Matt, Steve, Terry, Vanessa, Zeffery
• Pass 1 (N = 12):
1. Midpoint, 12/2 = 6, is “Gareth”.
2. “Gareth” does not equal “Dave”.
3. “Dave” is less than “Gareth” (lower in the alphabet), so remove all values below and including “Gareth” (“Gareth” to “Zeffery”).
4. Go back to 1.
• Pass 2 (N = 5): Andrew, Beth, Chad, Dave, Fred
1. Midpoint, 5/2 = 2.5 => 3, is “Chad”.
2. “Chad” does not equal “Dave”.
3. “Dave” is more than “Chad” (higher in the alphabet), so remove all values above and including “Chad” (“Andrew” to “Chad”).
4. Go back to 1.
• Pass 3 (N = 2): Dave, Fred
1. Midpoint, 2/2 = 1, is “Dave”.
2. “Dave” equals “Dave” => “Dave” found.
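A binary search can be sketched in code as below. This is one common way to write it, not necessarily the form the syllabus expects: instead of deleting values it keeps `low` and `high` bounds on the part of the list still in play, it counts indices from 0, and it rounds the midpoint down, so the midpoints it picks can differ from a hand-worked example on other data. The name `binary_search` and the -1 "not found" result are assumptions.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1    # discard the midpoint and everything before it
        else:
            high = mid - 1   # discard the midpoint and everything after it
    return -1                # nothing left to search


names = ["Andrew", "Beth", "Chad", "Dave", "Fred", "Gareth",
         "Harry", "Matt", "Steve", "Terry", "Vanessa", "Zeffery"]
print(binary_search(names, "Dave"))  # 3
```

For this list the search looks at “Gareth”, then “Chad”, then “Dave”: three comparisons rather than the four a serial search would need.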
DATA MANIPULATION
SORTING
Methods of sorting lists.
Insertion Sort
• A method of sorting in which each item is copied from the file into a new file, in the correct position.
- Simple, but has some disadvantages:
▪ Inefficient use of time: very slow on large files.
▪ Requires a lot of space in memory, as a second file is built.
• Algorithm:
1. Read the next value from the old file.
2. Compare it with the values already in the new file to find its correct position.
3. Insert it at that position, moving any later values along by one place.
4. Go back to 1, until every value in the old file has been read.
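The definition above (each item copied into a new file, in its correct position) can be sketched with Python lists standing in for the old and new files. This is a minimal sketch; `insertion_sort` and the sample names are assumptions for illustration only.

```python
def insertion_sort(old_file):
    """Copy each value into a new list at its correct position."""
    new_file = []
    for value in old_file:
        # Walk along the new file until a larger value (or the end) is reached.
        position = 0
        while position < len(new_file) and new_file[position] <= value:
            position += 1
        new_file.insert(position, value)  # later values shift along by one place
    return new_file


print(insertion_sort(["Vanessa", "Gareth", "Chad", "Beth", "Matt"]))
```

The original list is left unchanged, which is why this form of the sort needs room for a second copy of the data.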
Quick Sort
• A quick sort is an alternative method of sorting.
- Complicated and cumbersome method, but...
▪ Becomes increasingly efficient as the number of items increases.
▪ Relatively easy to program.
• Algorithm:
1. Display the list in a row, with a fixed arrow on the first value and a movable arrow on the last (which way round does not actually matter).
2. If the two pointed-to values are in the right order:
▪ Move the movable arrow towards the fixed arrow.
3. Else:
▪ Swap the two values, and swap the arrows with them.
4. Repeat 2-3 until the arrows meet: the item they point to is now in the correct place.
5. Repeat with the sub-lists on either side of the correctly placed item.
▪ This is a good exemplar use of recursion.
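The two-arrow method can be sketched as below. Two assumptions are made that the notes do not spell out: when a pair is out of order the values are swapped as well as the arrows, and the loop runs until the two arrows land on the same item. The names `quick_sort`, `fixed` and `movable` are illustrative only.

```python
def quick_sort(items, first=0, last=None):
    """Sort items[first..last] in place using a fixed and a movable arrow."""
    if last is None:
        last = len(items) - 1
    if first >= last:
        return items                      # zero or one item: already sorted
    fixed, movable = first, last
    while fixed != movable:
        left, right = min(fixed, movable), max(fixed, movable)
        if items[left] <= items[right]:
            # Right order: move the movable arrow one step towards the fixed arrow.
            movable += 1 if movable < fixed else -1
        else:
            # Wrong order: swap the two values, and swap the arrows with them.
            items[left], items[right] = items[right], items[left]
            fixed, movable = movable, fixed
    # The item under the arrows is now in its correct place.
    quick_sort(items, first, fixed - 1)   # sub-list before it
    quick_sort(items, fixed + 1, last)    # sub-list after it
    return items


print(quick_sort(["Vanessa", "Zeffery", "Fred", "Harry", "Steve", "Andrew", "Dave"]))
```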
Sorts Question
• Perform a quick sort on the list of names shown.
[Figure: “Names” columns showing the unsorted list and the sorted answer; the names shown are Andrew, Dave, Fred, Harry, Steve, Terry, Vanessa and Zeffery.]
• Perform an insertion sort on:
Names: Vanessa, Gareth, Chad, Beth, Matt
Answer:
Names: Beth, Chad, Gareth, Matt, Vanessa
DATA MANIPULATION
MERGING
Methods of merging lists.
Merge Sort
• Merging is a method of combining two already sorted (sequential) files into a single sorted file.
• Outline:
1. Read first value from each file.
2. Compare.
3. Write smallest value to new file.
4. Read next value from the file used.
5. Back to 2, until one file has no more items left.
6. Write the remainder of the other file to the new file.
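The outline above can be sketched with two sorted Python lists standing in for the files. This is a minimal sketch: `merge` is a hypothetical function name, and the way the sample names are split between the two input lists is made up for the illustration.

```python
def merge(first_file, second_file):
    """Merge two already sorted lists into one new sorted list."""
    new_file = []
    i = j = 0  # position of the current value in each file
    while i < len(first_file) and j < len(second_file):
        if first_file[i] <= second_file[j]:
            new_file.append(first_file[i])   # write the smallest value
            i += 1                           # read next value from the file used
        else:
            new_file.append(second_file[j])
            j += 1
    # One file is exhausted: write the remainder of the other file.
    new_file.extend(first_file[i:])
    new_file.extend(second_file[j:])
    return new_file


# Hypothetical split of the sample names into two sorted files.
file_a = ["Andrew", "Chad", "Fred", "Harry", "Steve"]
file_b = ["Beth", "Dave", "Gareth", "Matt", "Terry", "Vanessa", "Zeffery"]
print(merge(file_a, file_b))
```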
• For example, merging the sorted “Names” files shown:
[Figure: the sorted “Names” input files.]
Answer:
Names: Andrew, Beth, Chad, Dave, Fred, Gareth, Harry, Matt, Steve, Terry, Vanessa, Zeffery