Data Abstractions

EECE 310: Software Engineering
Learning Objectives
• Define data abstractions and list their elements
• Write the abstraction function (AF) and
representation invariant (RI) of a data abstraction
• Prove that the RI is maintained and that the
implementation matches the abstraction (i.e., AF)
• Enumerate common mistakes in data abstractions
and learn how to avoid them
• Design equality methods for mutable and
immutable data types
Data Abstraction
• Introduction of a new type in the language
– Type can be abstract or concrete
– Has one or more constructors and operations
– Type can be used like a language type
• Both the code and the data associated with the
type are encapsulated in the type definition
– No need to expose the representation to clients
– Prevents clients from depending on implementation
Isn’t this OOP ?
• NO, though OOP is a way to implement ADTs
– OOP is a way of organizing programs into classes
and objects. Data abstraction is a way of
introducing new types (ADTs) with meanings.
– Encapsulation is a goal shared by both. But data
abstraction is more than just creating classes.
– In Java, every data abstraction can be
implemented by a class declaration. But not every
class declaration is a data abstraction.
Elements of a Data Abstraction
• The abstraction specification should:
– Name the data type
– List its operations
– Describe the data abstraction in English
– Specify a procedural abstraction for each
operation
• Public vs. Private
– The abstraction only lists the public operations
– There may be other private procedures inside…
Example: IntSet
• Consider an IntSet data type that we wish to
introduce in the language. It needs to have:
– Constructors to create the data-type from scratch
or from other data types (e.g., lists, IntSets)
– Operations include insert, remove, size and isIn
– A specification of what the data type represents
– Internal representation of the data type
IntSet Abstraction
public class IntSet {
    // OVERVIEW: IntSets are mutable, unbounded sets of integers.
    // A typical IntSet is {x1, …, xn}, where the xi are all integers.

    // Constructors
    public IntSet();
    // EFFECTS: Initializes this to be the empty set

    // Mutators
    public void insert(int x);
    // MODIFIES: this
    // EFFECTS: Adds x to the set this, i.e., this_post = this U {x}

    public void remove(int x);
    // MODIFIES: this
    // EFFECTS: this_post = this - {x}

    // Observers
    public boolean isIn(int x);
    // EFFECTS: Returns true if x is an element of this, false otherwise

    public int size();
    // EFFECTS: Returns the cardinality of this
}
Group Activity
• Consider the Polynomial data-type below.
Write the specifications for its methods.
public class Poly {
public Poly(int c, int n) throws NegException;
public Poly add(Poly p) throws NPException;
public Poly mul(Poly p) throws NPException;
public Poly minus();
public int degree();
}
Abstraction Versus Representation
• Abstraction: External view of a data type
• Representation: Internal variables to represent
the data within a type (e.g., arrays, vectors, lists)
[Figure: abstract objects (e.g., the set { 1, 2, 3 }) and the rep objects that represent them]
Example: Representation
Vector<Integer> ‘elems’ of size N (indexed from 0 to N) to represent an IntSet
• Vector directly holds
the set elements
– if integer e is in the set,
there exists 0 <= i < N,
such that elems[i] = e
• Vector is a bitmap for
denoting set elements
– If integer i is in the set,
then elems[i] = True,
else elems[i] = False
Can you tell how the representation maps to the abstraction ?
Abstraction Function
• Mathematical function to map the
representation to the abstraction
• Captures designer’s intent in choosing the rep
– How do the instance variables relate to the
abstract object that they represent ?
– Makes this mapping explicit in the code
– Advantages: Code maintenance, debugging
IntSet: Abstraction Function
Unsorted Array
AF ( c ) = { c.elems[i].intValue | 0 <= i < c.elems.size }
Boolean Vector
AF ( c ) = { j | 0 <= j < 100 && c.elems[j] }
•The abstraction function is defined for concrete
instances of the class ‘c’, and only includes the
instance variables of the class. Further, it maps the
elements of the representation to the abstraction.
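One way to make the two abstraction functions concrete is to write each one as a method from the rep to a java.util.Set<Integer>. The following is only a sketch: the class name AbstractionFunctions and the method names afUnsorted and afBitmap are invented for illustration, and a Vector<Boolean> is assumed for the bitmap rep.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.Vector;

// Hypothetical helper class: each method is one abstraction function (AF),
// mapping a rep to the abstract set of integers that it denotes.
final class AbstractionFunctions {

    // Unsorted-vector rep: AF(c) = { c.elems[i].intValue | 0 <= i < c.elems.size }
    static Set<Integer> afUnsorted(Vector<Integer> elems) {
        return new TreeSet<>(elems);
    }

    // Boolean-vector rep: AF(c) = { j | 0 <= j < c.elems.size && c.elems[j] }
    static Set<Integer> afBitmap(Vector<Boolean> elems) {
        Set<Integer> result = new TreeSet<>();
        for (int j = 0; j < elems.size(); j++) {
            if (elems.get(j)) {
                result.add(j);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Vector<Integer> unsorted = new Vector<>(Arrays.asList(3, 1, 2));
        Vector<Boolean> bitmap = new Vector<>(Arrays.asList(false, true, true, true));
        System.out.println(afUnsorted(unsorted)); // [1, 2, 3]
        System.out.println(afBitmap(bitmap));     // [1, 2, 3]
    }
}
```

Two very different reps map to the same abstract object here, which is exactly why the AF has to be written down for each representation.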
Abstraction Function: Valid Rep
The abstraction function implicitly assumes that the
representation is valid for the class
– What happens if the vector contains duplicate entries
in the first scenario ?
– What happens in the second scenario if the bitmap
contains values other than 0 or 1 ?
The AF holds only for valid representations. How do
we know whether a representation is valid ?
Representation Invariant
• Captures formally the assumptions on which
the abstraction function is based
• Representation must satisfy this at all times
(except when executing the ADT’s methods)
• Defines whether a particular representation is
valid – invariant satisfied only by valid reps.
IntSet: Representation Invariant
Unsorted Array
1. c.elems != null &&
2. c.elems has no null elements &&
3. there are no duplicates in c.elems, i.e., for 0 <= i, j < N,
c.elems[i].intValue = c.elems[j].intValue => i = j.
Boolean Vector
1. c.elems != null &&
2. c.elems.size = maxValue
NOTE: The types of the instance variables are NOT a part of the Rep
Invariant. So there is no need to repeat what is already in the type signature.
Rep Invariant: Important Points
• Rep invariant always holds before and after the
execution of the ADT’s operations
– Can be violated while executing the ADT’s operations
– Can be violated by private methods of the ADT
• How much shall the rep invariant constrain?
– Just enough for different developers to implement
different operations AND not talk to each other
– Enough so that AF makes sense for the representation
AF and RI: How to implement ?
RI: repOK
• Public method to check if the rep invariant holds
• Useful for testing/debugging
public boolean repOK() {
    // EFFECTS: Returns true if the rep invariant holds,
    // returns false otherwise
}
AF: toString
• Public method to convert a valid rep to a String form
• Useful for debugging/printing
public String toString() {
    // EFFECTS: Returns a string containing the abstraction
    // represented by the rep.
}
Uses of RI and AF
• Documentation of the programmer’s thinking
• repOK method can be called before and after
every public method invocation in the ADT
– Typically during debugging only
• toString method can be used both during
debugging and in production
• Both the RI and AF can be used to formally
prove the correctness of the ADT
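As a rough illustration of these uses, repOK and toString might be filled in as follows for the unsorted-vector rep. The class name CheckedIntSet is hypothetical, and checking repOK on exit from insert is just one possible debugging convention, not something the slides prescribe.

```java
import java.util.Vector;

// Hypothetical class name; the rep and the RI are the unsorted-vector ones.
class CheckedIntSet {
    private final Vector<Integer> elems = new Vector<>();

    // RI: elems != null, elems has no null elements, elems has no duplicates.
    public boolean repOK() {
        if (elems == null) return false;
        for (int i = 0; i < elems.size(); i++) {
            Integer e = elems.get(i);
            if (e == null) return false;
            for (int j = i + 1; j < elems.size(); j++) {
                if (e.equals(elems.get(j))) return false;
            }
        }
        return true;
    }

    // AF rendered as text: prints the abstract set, e.g. {1, 2}.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < elems.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(elems.get(i));
        }
        return sb.append("}").toString();
    }

    public void insert(int x) {
        if (!elems.contains(x)) elems.add(x);
        // Debugging aid: check the RI on exit from a public mutator.
        if (!repOK()) throw new IllegalStateException("rep invariant violated: " + this);
    }
}
```

After inserting 1, 2 and 1 again, toString() yields {1, 2} and repOK() returns true.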
Group Activity
• Assume that the Polynomial data type is
represented as an array trms and a variable
deg. The coefficient of the term x^i is stored
in the i-th element of the trms array, and the
variable deg represents the degree of the
polynomial (i.e., its highest exponent).
1. Write its abstraction function
2. Write its rep-invariant
Reasoning about ADTs - 1
• ADTs have state in the form of representation
– Need to consider what happens over a sequence
of operations on the abstraction
– Correctness of one operation depends on
correctness of previous operations
– We need to reason inductively over the operations
of the ADT
• Show that constructor is correct
• Show that each operation is correct
Reasoning about ADTs - 2
• First, need to show that the rep invariant is
maintained by the constructor & operations
• Then, show that the implementation of the
abstraction matches the specification
– Assume that the rep invariant is maintained
– Use the abstraction function to map the
representation to the abstraction
Why show that Rep Invariant is
maintained ?
• Consider the implementation of the IntSet using
the unsorted vector representation. We wish to
compute the size of the set (i.e., its cardinality).
public int size() {
return elems.size();
}
Is the above implementation correct ?
Why show that Rep Invariant is
maintained ?
Yes, but only if the Rep Invariant holds!
c.elems != null && c.elems has no null elements
&& c.elems has no duplicates
Otherwise (e.g., if elems contains duplicates), size() can return a value greater than the cardinality
public int size() {
return elems.size();
}
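A small experiment makes the point concrete. In this sketch the class SizeWithoutInvariant and its two helper methods are invented for the demonstration; the vector is built by hand so that it deliberately breaks the "no duplicates" clause.

```java
import java.util.HashSet;
import java.util.Vector;

// Hypothetical demo class: compares what size() would report with the
// cardinality of the abstract set when the rep contains a duplicate.
class SizeWithoutInvariant {

    // What the slide's size() returns: the length of the rep.
    static int reportedSize(Vector<Integer> elems) {
        return elems.size();
    }

    // Cardinality of the abstract set that the rep denotes (repeats count once).
    static int cardinality(Vector<Integer> elems) {
        return new HashSet<>(elems).size();
    }

    public static void main(String[] args) {
        Vector<Integer> elems = new Vector<>();
        elems.add(5);
        elems.add(7);
        elems.add(5); // duplicate: breaks the "no duplicates" clause of the RI
        System.out.println(reportedSize(elems) + " vs " + cardinality(elems)); // 3 vs 2
    }
}
```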
Showing Rep Invariant is maintained:
Data Type Induction
• Show that the constructor establishes the Rep Invariant
• For all other operations, show that the function body takes a valid rep to another valid rep:
– Assume that at the time of the call the invariant holds for
1. this and
2. all argument objects of the type
– Demonstrate that the invariant holds on return for
1. this,
2. all argument objects of the type, and
3. all returned objects of the type
IntSet : getIndex
Assume that IntSet has the following private function.
Note that private methods do not need to preserve the RI.
private int getIndex(int x) {
    // EFFECTS: If x is in this, returns the index
    // where x appears in the Vector elems,
    // else returns -1 (does NOT throw an exception)
    for (int i = 0; i < elems.size(); i++)
        if (x == elems.get(i).intValue())
            return i;
    return -1;
}
IntSet: Constructor
Show that the RI is true at the end of the constructor
public IntSet( ) {
// EFFECTS: Initializes this to be empty
elems = new Vector<Integer>();
}
RI: c.elems != null && c.elems has no null elements
&& c.elems has no duplicates
Proof: When the constructor terminates,
Clause 1 is satisfied because the elems vector is initialized by constructor
Clause 2 is satisfied because elems has no elements (and hence no null elements)
Clause 3 is satisfied because elems has no elements (and hence no duplicates)
IntSet: Insert
Show that if RI holds at the beginning, it holds at the end.
public void insert (int x) {
// MODIFIES: this
// EFFECTS: adds x to the set such that this_post = this u {x}
if ( getIndex(x) < 0 )
elems.add( new Integer(x) );
}
RI: c.elems != null && c.elems has no null elements
&& c.elems has no duplicates
Proof:
If clause 1 holds at the beginning, it holds at the end of the procedure.
- Because c.elems is not changed by the procedure.
If clause 2 holds at the beginning, it holds at the end of the procedure
- Because it adds a non-null reference to c.elems
If clause 3 holds at the beginning, it holds at the end of the procedure
- Because getIndex() prevents duplicate elements from being added to the vector
IntSet:Remove
Show that if RI holds at the beginning, it holds at the end.
public void remove(int x)
{
    // MODIFIES: this
    // EFFECTS: this_post = this - {x}
    int i = getIndex(x);
    if (i < 0) return; // Not found
    elems.set(i, elems.lastElement());
    elems.remove(elems.size() - 1);
}
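RI: c.elems != null && c.elems has no null elements
&& c.elems has no duplicates
Proof:
If clause 1 holds at the beginning, it holds at the end of the procedure
- Because c.elems is never reassigned by the procedure.
If clause 2 holds at the beginning, it holds at the end of the procedure
- Because the only value written into c.elems is an existing (non-null) element.
If clause 3 holds at the beginning, it holds at the end of the procedure
- Because the last element overwrites x at index i and the last slot is then removed, so no value appears twice.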
IntSet: Observers
Show that if RI holds at the beginning, it holds at the end.
public int size() {
return elems.size();
}
public boolean isIn(int x) {
return getIndex(x) >= 0;
}
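RI: c.elems != null && c.elems has no null elements
&& c.elems has no duplicates
Proof: Neither observer modifies c.elems, so if the RI holds at the beginning, it holds at the end.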
This completes the proof that the RI holds in the ADT.
In other words, given any sequence of operations in the ADT,
the RI always holds at the beginning and end of this sequence.
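The inductive argument can also be exercised empirically: check repOK after the constructor (base case) and after every operation in a random sequence (inductive step). This is a sanity check under stated assumptions, not a substitute for the proof. The class name InductionCheckIntSet, the method riHoldsAfterEveryStep, and the choice of seed and value range are all made up for the sketch.

```java
import java.util.Random;
import java.util.Vector;

// Hypothetical class: the IntSet implementation from the slides plus a repOK check.
class InductionCheckIntSet {
    private final Vector<Integer> elems = new Vector<>();

    private int getIndex(int x) {
        for (int i = 0; i < elems.size(); i++)
            if (x == elems.get(i).intValue()) return i;
        return -1;
    }

    public void insert(int x) { if (getIndex(x) < 0) elems.add(x); }

    public void remove(int x) {
        int i = getIndex(x);
        if (i < 0) return;                 // not found
        elems.set(i, elems.lastElement()); // overwrite x with the last element
        elems.remove(elems.size() - 1);    // drop the last slot
    }

    public int size() { return elems.size(); }

    public boolean isIn(int x) { return getIndex(x) >= 0; }

    // RI: elems != null, no null elements, no duplicates.
    public boolean repOK() {
        if (elems == null) return false;
        for (int i = 0; i < elems.size(); i++) {
            if (elems.get(i) == null) return false;
            for (int j = i + 1; j < elems.size(); j++)
                if (elems.get(i).equals(elems.get(j))) return false;
        }
        return true;
    }

    // Applies `steps` random operations and returns false as soon as the RI fails.
    static boolean riHoldsAfterEveryStep(long seed, int steps) {
        Random rng = new Random(seed);
        InductionCheckIntSet s = new InductionCheckIntSet();
        if (!s.repOK()) return false;          // base case: constructor
        for (int k = 0; k < steps; k++) {      // inductive step, checked by testing
            int x = rng.nextInt(10);
            switch (rng.nextInt(4)) {
                case 0: s.insert(x); break;
                case 1: s.remove(x); break;
                case 2: s.size(); break;
                default: s.isIn(x); break;
            }
            if (!s.repOK()) return false;
        }
        return true;
    }
}
```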
Group Activity
• Consider the implementation of the
Polynomial Datatype described earlier (also on
the code handout sheet)
• Show using data-type induction that the Rep
Invariant is preserved
Are we done ?
• Thus, we have shown that the RI is established
by the constructor and holds for each
operation (i.e., if RI is true at the beginning, it
is true at the end). Can we stop here ?
No. To see why not, consider an implementation of the operators that does nothing.
Such an implementation will satisfy the rep invariant, but is clearly wrong !!!
To complete the proof, we need to show that the Abstraction provided by the
ADT is correct. For this, we use the (now proven) fact that the RI holds and use
the AF to show that the rep satisfies the AF’s abstraction after each operation.
Abstraction Function: IntSet
Show that the implementation matches the
ADT’s specification (i.e., its abstraction)
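Given: the pre-rep, the abstraction function, and the function implementation
Prove that: the post-rep, mapped through the abstraction function, is the post-abstraction that the function spec requires of the pre-abstraction
[Figure: Pre-Rep to Post-Rep via the function implementation; Pre-Abstraction to Post-Abstraction via the function spec; the abstraction function maps each rep to its abstraction]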
Abstraction Function: Constructor
AF ( c ) = { c.elems[i].intValue | 0 <= i < c.elems.size }
public IntSet( ) {
    // EFFECTS: Initializes this to be empty
    elems = new Vector<Integer>();
}
AF: Empty vector -> Empty set
Proof: The constructor creates an empty vector, which the AF maps to the empty set, so it is correct.
Abstraction Function: Size
AF ( c ) = { c.elems[i].intValue | 0 <= i < c.elems.size }
public int size() {
    // EFFECTS: Returns the cardinality of this
    return elems.size();
}
AF: Number of elements in vector -> Cardinality of the set (Why ?)
Proof: Because the rep invariant guarantees that there are no duplicates in the
vector, the number of elements in the vector denotes the cardinality of the set.
Abstraction Function: Insert
AF ( c ) = { c.elems[i].intValue | 0 <= i < c.elems.size }
public void insert (int x) {
    // MODIFIES: this
    // EFFECTS: adds x to the set
    // such that this_post = this U {x}
    if ( getIndex(x) < 0 )
        elems.add(new Integer(x));
}
Implementation: Vector -> Vector with element added if and only if it did not already exist
AF: Vector -> this; Vector with element added -> this_post = this U {x}
Abstraction Function: Remove
AF ( c ) = { c.elems[i].intValue | 0 <= i < c.elems.size }
public void remove (int x) {
    // MODIFIES: this
    // EFFECTS: this_post = this - {x}
    int i = getIndex(x);
    if (i < 0) return; // Not found
    // Move last element to the index i
    elems.set(i, elems.lastElement());
    elems.remove(elems.size() - 1);
}
Implementation: Vector -> Vector with first instance of element removed if it exists
AF: Vector -> this; Vector with element removed -> this_post = this - {x}
Abstraction Function: IsIn
AF ( c ) = { c.elems[i].intValue | 0 <= i < c.elems.size }
public boolean isIn(int x) {
    // EFFECTS: Returns true if x belongs to this, false otherwise
    return getIndex(x) >= 0;
}
Implementation: vector -> true if and only if x is present in the vector
AF: vector -> this; spec: true if x belongs to this, false otherwise
Proof Summary
• This completes the proof. Thus, we’ve shown
that the ADT implements its spec correctly.
This method is called “Data type induction”,
because it proceeds using induction.
– Step 0: Write the implementation of the ADT
– Step 1: Show that the RI is maintained by the ADT
– Step 2: Assuming that the RI is maintained, show
using the AF that the translation from the rep to
the abstraction matches the method’s spec.
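Step 2 can be mirrored in code by computing the AF before and after an operation and comparing against what the spec demands. The sketch below assumes the unsorted-vector rep; the class name ModelCheckedIntSet, the af() method, and the ...MatchesSpec helpers are hypothetical names introduced only for this illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.Vector;

// Hypothetical class: the slide implementation plus an executable AF.
class ModelCheckedIntSet {
    private final Vector<Integer> elems = new Vector<>();

    private int getIndex(int x) {
        for (int i = 0; i < elems.size(); i++)
            if (x == elems.get(i).intValue()) return i;
        return -1;
    }

    public void insert(int x) { if (getIndex(x) < 0) elems.add(x); }

    public void remove(int x) {
        int i = getIndex(x);
        if (i < 0) return;
        elems.set(i, elems.lastElement());
        elems.remove(elems.size() - 1);
    }

    public int size() { return elems.size(); }

    public boolean isIn(int x) { return getIndex(x) >= 0; }

    // AF(c) = { c.elems[i].intValue | 0 <= i < c.elems.size }
    Set<Integer> af() { return new HashSet<>(elems); }

    static boolean insertMatchesSpec(ModelCheckedIntSet s, int x) {
        Set<Integer> expected = s.af();  // pre-abstraction
        expected.add(x);                 // spec: this_post = this U {x}
        s.insert(x);                     // run the implementation on the rep
        return s.af().equals(expected);  // AF(post-rep) equals post-abstraction?
    }

    static boolean removeMatchesSpec(ModelCheckedIntSet s, int x) {
        Set<Integer> expected = s.af();
        expected.remove(Integer.valueOf(x)); // spec: this_post = this - {x}
        s.remove(x);
        return s.af().equals(expected);
    }

    static boolean observersMatchSpec(ModelCheckedIntSet s, int x) {
        Set<Integer> abstractSet = s.af();
        return s.size() == abstractSet.size()
            && s.isIn(x) == abstractSet.contains(x);
    }
}
```

Each helper follows the square in the proof: apply the spec to the pre-abstraction, apply the implementation to the pre-rep, and compare the results through the AF.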
Group Activity
• Consider the implementation of the
Polynomial Datatype described earlier (also on
the code handout sheet)
• Show that the ADT’s implementation matches
its specification assuming that the RI holds.
Exposing the Rep
• Note that the proof we just wrote assumes
that the only way you can modify the
representation is through its operations
– Otherwise Rep invariant can be violated
• Is this always true ?
– What if you expose the representation outside the
class, so that any outside entity can change it ?
Mistakes that lead to
exposing the rep - 1
• Making rep components public
public class IntSet {
public Vector<Integer> elements;
Your rep must always be private. Otherwise, all bets are off.
Hopefully, your code will not have this bug ….
Mistakes that lead to exposing
the rep - 2
public class IntSet {
    // OVERVIEW: IntSets are mutable, unbounded sets of integers.
    // A typical IntSet is {x1, …, xn}
    private Vector<Integer> elems; // no duplicates in vector

    public Vector<Integer> allElements() {
        // EFFECTS: Returns a vector containing the elements of this,
        // each exactly once, in arbitrary order
        return elems;
    }
}
IntSet intSet = new IntSet();
intSet.allElements().add( new Integer(5) );
intSet.allElements().add( new Integer(5) ); // RI violated: duplicates!
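One possible fix, sketched here under the assumption that the rep is still a Vector<Integer>, is to hand out a copy instead of the rep itself. The class name CopyOnReturnIntSet is hypothetical. A shallow copy is enough in this case because Integer objects are immutable.

```java
import java.util.Vector;

// Hypothetical class: allElements() returns a fresh copy, never the rep.
class CopyOnReturnIntSet {
    private final Vector<Integer> elems = new Vector<>();

    public void insert(int x) {
        if (!elems.contains(x)) elems.add(x);
    }

    public int size() {
        return elems.size();
    }

    // EFFECTS: Returns a new vector containing the elements of this,
    // each exactly once, in arbitrary order.
    public Vector<Integer> allElements() {
        return new Vector<>(elems); // callers can only mutate the copy
    }
}
```

With this version, calling allElements().add(5) twice changes only the temporary copies; the set itself keeps its size and its invariant.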
Mistakes that lead to exposing
the rep - 3
public class IntSet {
    // OVERVIEW: IntSets are mutable, unbounded sets of integers.
    // A typical IntSet is {x1, …, xn}
    private Vector<Integer> elems;

    // Constructors
    public IntSet(Vector<Integer> els) throws NullPointerException {
        // EFFECTS: If els is null, throws NullPointerException, else
        // initializes this to contain as elements all the ints in els.
        if (els == null) throw new NullPointerException();
        elems = els;
    }
}
Vector<Integer> someVector = new Vector<Integer>();
IntSet intSet = new IntSet(someVector);
someVector.add( new Integer(5) );
someVector.add( new Integer(5) ); // RI violated: duplicates!
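A constructor that copies its argument avoids sharing the caller's vector. The following sketch uses the hypothetical class name CopyOnConstructIntSet; skipping null entries and repeated values in els is an assumption made here so that the RI is established, since the original spec does not say how such input should be treated.

```java
import java.util.Vector;

// Hypothetical class: the constructor copies els element by element.
class CopyOnConstructIntSet {
    private final Vector<Integer> elems = new Vector<>();

    // EFFECTS: If els is null, throws NullPointerException, else
    // initializes this to contain as elements all the ints in els.
    public CopyOnConstructIntSet(Vector<Integer> els) {
        if (els == null) throw new NullPointerException();
        for (Integer e : els) {
            // Assumption: null entries and repeated values are skipped,
            // so the RI holds when the constructor returns.
            if (e != null && !elems.contains(e)) elems.add(e);
        }
    }

    public int size() {
        return elems.size();
    }
}
```

Later changes to the caller's vector no longer reach the set's rep.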
Summary of mistakes that expose the
Rep
1. NOT making rep components private
2. Returning a reference to the rep’s mutable
components
3. Initializing rep components with a reference
to an “outside” mutable object
4. NOT performing deep copy of rep elements
1. Use clone method instead
2. Perform manual copies
Group Activity
• For the polynomial example, how many
mistakes of exposing the rep can you find?
How will you fix them ? (refer to code handout
sheet)
Mutable objects
• Objects whose abstract state can be modified
– Applies to the abstraction, not the representation
• Mutable objects: Can be modified once they
are created e.g., IntSet, IntList etc.
• Immutable objects: Cannot be modified
– Examples: Polynomials, Strings
Equality: Equals Method
• All classes inherit from Object, which has
a method “boolean equals(Object o)”
– Returns true if o is the same object as the current one
– Returns false otherwise
• Note that the default equals tests whether two references
denote the same object, not whether two objects have the same state
– If a and b are different objects, a.equals(b) will
return false even if they are functionally identical
Equality: IntSet Example
IntSet a = new IntSet();
a.insert(1); a.insert(2); a.insert(3);
IntSet b = new IntSet();
b.insert(1); b.insert(2); b.insert(3);
if ( a.equals(b) ) {
System.out.println("Equal");
}
What is printed by the above code ?
Equality: IntSet Example
• It prints nothing. Why ?
– Because the IntSets are different objects and the
Object.equals method only compares their references
– Therefore, a.equals(b) returns false
• But this is in fact the correct behavior !
– To see this, assume that you added an element to
a but not b after the equals comparison
– a.equals(b) would no longer be true, even if you
have not changed the references to a or b
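The behavior of the inherited equals can be observed directly. In this sketch, PlainIntSet is a hypothetical stand-in for IntSet that deliberately does not override equals.

```java
import java.util.Vector;

// Hypothetical class: a mutable set that inherits Object.equals unchanged.
class PlainIntSet {
    private final Vector<Integer> elems = new Vector<>();

    public void insert(int x) {
        if (!elems.contains(x)) elems.add(x);
    }

    public static void main(String[] args) {
        PlainIntSet a = new PlainIntSet();
        a.insert(1);
        PlainIntSet b = new PlainIntSet();
        b.insert(1);
        PlainIntSet alias = a; // second reference to the same object

        System.out.println(a.equals(b));     // false: different objects
        System.out.println(a.equals(alias)); // true: same object
    }
}
```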
Rule of Object Equality
• Two objects should be equal if it is impossible
to distinguish between them using any
sequence of calls to the object’s methods
• Corollary: Once two objects are equal, they
should always be equal. Otherwise it is
possible to distinguish between them using
some combination of the object’s methods.
Mutability and the Equals Method
• For mutable objects, you can distinguish between
two objects by mutating them after the
comparison. Therefore, they are NOT equal. The
default equals method does the right thing – i.e.,
returns false.
• If the objects are immutable AND have the same
state, then the equals method should return true.
So we need to override the equals for immutable
objects to do the right thing.
Immutable Abstractions
• ADT does not change once created
– No mutator methods
– Producer methods to create new objects
• Appropriate for modeling objects that do not
change during their existence
– Mathematical entities such as Rational numbers
– Certain objects may be implemented more
efficiently e.g., Strings
Why use immutable ADTs ?
• Safety
– Don’t need to worry about accidental changes
– Can be assured that rep doesn’t change
• Efficiency
– May hurt efficiency if you need to copy the object
– In some cases, it may be more efficient by sharing
representations across objects e.g., Strings
• Ease of Implementation
– May be easier for concurrency control
Equality: Immutable objects
• Immutable objects should define their own
equals method
– Return true if the abstract state matches, even if
the internal state (i.e., rep) is different
• Therefore, methods of an Immutable object
can modify its rep, but not the abstraction
– Such methods are said to have benevolent side effects
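As an illustration of equality on abstract state, here is a sketch of an immutable rational number whose rep is not kept in lowest terms. The class Rational and all of its members are written for this example only; equals compares the values the reps denote, and hashCode is derived from the reduced form so that equal objects hash alike.

```java
// Hypothetical immutable ADT: 1/2 and 2/4 have different reps but the same abstract value.
final class Rational {
    private final int num;
    private final int den; // RI: den > 0

    // Note: this sketch ignores overflow for Integer.MIN_VALUE.
    Rational(int num, int den) {
        if (den == 0) throw new IllegalArgumentException("zero denominator");
        if (den < 0) { // normalize the sign so that the RI holds
            num = -num;
            den = -den;
        }
        this.num = num;
        this.den = den;
    }

    private static int gcd(int a, int b) {
        return b == 0 ? Math.abs(a) : gcd(b, a % b);
    }

    // Equal if the abstract values match: num/den == r.num/r.den.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Rational)) return false;
        Rational r = (Rational) o;
        return (long) num * r.den == (long) r.num * den;
    }

    // Hash the reduced form so that equal rationals have equal hash codes.
    @Override
    public int hashCode() {
        int g = gcd(num, den);
        return 31 * (num / g) + den / g;
    }
}
```

Because the object can never change, two Rationals that compare equal once will compare equal forever, which satisfies the rule of object equality.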
Group Activity
• Design an equals method for two polynomials.
What will you do if the polynomials are not in
their canonical forms ?
To do before next class
• Submit assignment 2 in the lab
• Start working on assignment 3
• Prepare for the midterm exam
– Portions include everything covered so far
– In class on Feb 28th