Data Abstraction
CS 201j: Engineering Software
Nathanael Paul
University of Virginia
nate@virginia.edu
Computer Science

Overview
• Data abstraction
• Specification/Design of Abstract Data Types (ADTs)
• Implementation of ADTs

The Problem
• Programs are complex.
  – Windows XP: ~45 million lines of code
  – Mathematica: over 1.5 million
• Abstraction helps
  – Many-to-one
  – “forget the details”
  – Must separate “what” from “how”

Information Hiding
• Modularity - Procedural abstraction
  – By specification
    • Locality
    • Modifiability
  – By parameterization
• Data Abstraction
  – What you can do with the data is separated from how it is represented

Software development cycle
• Specifications – What do you want to do?
• Design – How will you do what you want?
• Implement – Code it.
• Test – Check if it works.
• Maintain – School projects don’t usually make it this far.
Bugs are cheaper earlier in the cycle!

Database Implementation
• Database on library web-server stores information on users: userID, name, email, etc.
• You are responsible for implementing the interface between the web-server and database
  – What happens when we ask for the email address for a specific user?
[Diagram: the Client asks the Server for an email address; the Server asks the Database “What is email address of nate?”]

Client/Server/Database Interaction
[Diagram: Client (“I need Nate’s email.”) -> Server -> Database]
The interaction between the server and database is your part.
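The request flow on this slide can be sketched in a few lines of Java. This is only an illustration, not the course's code: the class names UserDatabase, Server and EmailLookupDemo, the methods store, lookupEmail and emailFor, and the in-memory map standing in for the real database are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the library database: maps userID -> email.
class UserDatabase {
    private final Map<String, String> emails = new HashMap<>();

    void store(String userID, String email) {
        emails.put(userID, email);
    }

    // Returns the stored email, or null if the userID is unknown.
    String lookupEmail(String userID) {
        return emails.get(userID);
    }
}

// Hypothetical server: forwards the client's question to the database.
class Server {
    private final UserDatabase database;

    Server(UserDatabase database) {
        this.database = database;
    }

    String emailFor(String userID) {
        return database.lookupEmail(userID);
    }
}

class EmailLookupDemo {
    public static void main(String[] args) {
        UserDatabase db = new UserDatabase();
        db.store("nate", "nate@virginia.edu");
        Server server = new Server(db);
        // The client only talks to the server, never to the database.
        System.out.println(server.emailFor("nate"));
    }
}
```

The client never sees how the database stores its data; it only sees what the server's method returns.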
Client/Server/Database Interaction
[Diagram: the Database returns nate@virginia.edu to the Server, which returns it to the Client]

Example: Database System
• Need a new data type
• Abstract Data Types (ADTs)
  – Help separate what from how
  – Client will use the specifications for interaction with data
  – Client of the web database should not know the “guts” of the implementation

Data abstraction in Java
• An ADT is defined by a class
  – The ADT in the web/database application will be a User
  – A private instance variable hides the class internals
  – public String getEmail ();
    • What is private in the implementation?
    • OVERVIEW, EFFECTS, MODIFIES
  – A class does not provide data abstraction by itself

Accessibility
    class User {
        // OVERVIEW: A User is a mutable object
        // representing a library member.
        public String email;
        …
    }

    /* Client code using a User object, myUser */
    String nateEmail = myUser.email;
    sendEmail(nateEmail);

/* The client’s code can only see what is made public in the User class. The user’s email data is public in the User class. This is BAD. */

Program Maintenance
• Suppose storage space is at a premium
  – Everyone in the database is userid@virginia.edu, so we can drop the virginia.edu
    nate@virginia.edu -> nate
  – What kind of problems will occur with the code just seen?
  – What kind of problems could occur had the client code been able to access the email address directly?

Email was public in User class.
    String nateEmail = myUser.email;   // now holds only "nate"
    sendEmail(nateEmail);              // ***ERROR!!!***

Accessibility (fixed)
    class User {
        // OVERVIEW: A User is a mutable object
        // representing a library member.
        private String email;
        …
        public String getEmail() {
            // EFFECTS: returns user’s primary email
            return email;
        }
    }

    // Client code using a User object, myUser
    String nateEmail = myUser.getEmail();
    sendEmail(nateEmail);

/* This code properly uses data abstraction when returning the full email address. */

Accessibility (fixed)
    class User {
        // OVERVIEW: A User is a mutable object
        // representing a library member.
        private String email;
        …
        public String getEmail() {
            // EFFECTS: returns user’s primary email
            return email + "@virginia.edu";
        }
    }

    // Client code using a User object, myUser (unchanged)
    String nateEmail = myUser.getEmail();
    sendEmail(nateEmail);

/* The database dropped the @virginia.edu, and only one line of code needed changing. */

Advantages/Disadvantages of Data Abstraction?
- More code to write and maintain initially
- Overhead of calling a method
- Greater initial time investment
+ Client doesn’t need to know about representation
+ Maintenance is easier.
+ Increases locality and modifiability

Specifying ADTs

Bad Users at the Library
• The library now wants to crack down on bad Users with overdue books, so the code will need to work with a group of Users.
• What should be used to represent the group? What data structures do we know about? How should we integrate this code with what we have?
• What operations should be supported?
  – deleteUser(String userID);
  – isInGroup(String userID);

Library keeping track of “bad” people
• You need to write some code that will manipulate a group of Users that are on the “bad” list.
• The implementation below uses an array

    class GroupUsers {
        // OVERVIEW: Operations provided to manage
        // a mutable group of users
        private User[] latePeople;
        …
        public String toString() {
            // EFFECTS: returns a string listing the user names
            …
        }
    }

Array implementation initialization for GroupUsers
    class GroupUsers {
        // OVERVIEW: Unbounded, mutable group of Users
        private User[] latePeople;
        …
        // A constructor has no return type (not even void).
        public GroupUsers(String[] userIDs) {
            // EFFECTS: Initializes the group from userIDs
            latePeople = new User[userIDs.length + 10];
            for (int i = 0; i < userIDs.length; i++) {
                latePeople[i] = new User(userIDs[i]);
            }
        }
    }

ADT design
• Mutable/Immutable ADTs
  – Mutable – object’s fields or values change
  – Immutable – object’s fields permanently set at creation
  – Is this being modified?
• Tradeoffs
  – Immutability is simpler and safer
  – Immutability can be slower (extra creation/deletion of objects)

Classification of ADT operations
• Creator (constructor) – GroupUsers(String userIDs[ ])
• Producer – addUser(String userID)
• Mutator – setUserEmail(String email)
• Observer – isMember (String userID)

Implementing ADTs

A bad implementation
• Most common characteristics
  – Modifying implementation forces other code to be changed (violates modifiability)
  – Must understand more code than necessary to reason about code (violates locality)
  – Maintenance is difficult

A good implementation
• User class needed a way to store state of a user, so operations will build around the stored state.
• Methods should be (procedure abstraction):
  – As easy to code as possible
  – Efficient
  – Exhibit locality
  – Should enable better testing, maintenance

Changing the group implementation
• The “guts” of the implementation are subject to change.
• What happens in GroupUsers’ deleteUser(String userID)?
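To make the question above concrete, here is a minimal sketch of deleteUser for the array representation. It is an illustration under assumptions, not the course's code: the class name ArrayGroupUsers, the numUsers counter, the helpers indexOf and size, and the stripped-down User class are all hypothetical.

```java
// Minimal stand-in for the deck's User class (only what this sketch needs).
class User {
    private final String userID;

    User(String userID) {
        this.userID = userID;
    }

    String getUserID() {
        return userID;
    }
}

// Array-backed group of users; the class name is hypothetical.
class ArrayGroupUsers {
    private User[] latePeople;
    private int numUsers; // number of slots actually in use

    ArrayGroupUsers(String[] userIDs) {
        latePeople = new User[userIDs.length + 10];
        for (int i = 0; i < userIDs.length; i++) {
            latePeople[i] = new User(userIDs[i]);
        }
        numUsers = userIDs.length;
    }

    boolean isInGroup(String userID) {
        return indexOf(userID) >= 0;
    }

    // MODIFIES: this
    // EFFECTS: removes the user with this userID, if present.
    void deleteUser(String userID) {
        int index = indexOf(userID);
        if (index < 0) {
            return; // not in the group, nothing to do
        }
        // Shift every later element down one slot to close the gap.
        for (int i = index; i < numUsers - 1; i++) {
            latePeople[i] = latePeople[i + 1];
        }
        latePeople[numUsers - 1] = null;
        numUsers--;
    }

    int size() {
        return numUsers;
    }

    // Returns the position of the user, or -1 if absent.
    private int indexOf(String userID) {
        for (int i = 0; i < numUsers; i++) {
            if (latePeople[i].getUserID().equals(userID)) {
                return i;
            }
        }
        return -1;
    }
}
```

The shifting loop is the cost of this representation: every element after the deleted one has to move.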
deleteUser(String userID)
• The array must shift down an average of n/2 items when deleting an element
[Diagram: an array of <user> entries with the deleted entry marked X]

Linked Lists

A new data structure
Each User has its own representation, but we store the collection in a list. In the following implementation, each user object is contained in a Node object.
[Diagram: Head -> User 1 -> User 2 -> User 3 -> X]

List-node implementation
    class Node {
        // OVERVIEW: Mutable node used in a linked list of users
        private User theUser;
        private Node next;   // next points to the next “bad” user
        …
    }
[Diagram: latePeople -> User 1 -> User 2 -> …]

List implementation
    class GroupUsers {
        // OVERVIEW: Mutable, unbounded group of users
        private Node latePeople; /* head of list */
        private int numUsers;
        …
    }
/* Nodes are users with an additional member field called next. The Node class was added, so the User class would not need modification. */

Adding a user into GroupUsers
    /* in GroupUsers.java */
    public void addUser(User newUser) {
        // MODIFIES: this
        // EFFECTS: this_post = this_pre U { newUser }
        latePeople.add(new Node(newUser));
        numUsers++;
    }

Adding a node into a group of nodes (Node.java)
    public void add(Node n) {
        // MODIFIES: this
        // EFFECTS: n is inserted just after this in the list
        n.next = this.next;   // null if n becomes the last node
        this.next = n;
    }

deleteUser(String userID) cont.
[Diagram: before, Head -> User 1 -> User 2 -> User 3 -> X; after deleting User 2, Head -> User 1 -> User 3 -> X]

deleteUser(String userID) Node.java
    public void delete(String userID) {
        // MODIFIES: this
        // EFFECTS: this_post = this_pre - { node }
        //          where node.userID = userID
        Node prevNode = this;
        Node currNode = this.next;
        while (currNode != null) {
            if (userID.equals(currNode.getUserID())) {
                // Unlink currNode; this also covers a user at the end of the list.
                prevNode.next = currNode.next;
                return;
            }
            prevNode = currNode;
            currNode = currNode.next;
        }
    }

Linked List vs. Array
• Array is better for:
  – Random access to an element
• Linked list is better at:
  – Inserting
  – Deleting
  – Dynamic resizing
• Users of your implementation may need to use a list or an array for efficiency, so you need an implementation that can be changed easily.

Questions?
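As a recap of the list-based slides, here is a self-contained sketch of a linked-list group. It rests on assumptions: the head is treated as an empty sentinel node (the deck's add and delete code suggests this but never says so), and the names ListGroupUsers, isInGroup and size, plus the stripped-down User class, are hypothetical.

```java
// Minimal stand-in for the deck's User class (only what this sketch needs).
class User {
    private final String userID;

    User(String userID) {
        this.userID = userID;
    }

    String getUserID() {
        return userID;
    }
}

// One list cell; theUser is null only in the sentinel head.
class Node {
    User theUser;
    Node next;

    Node(User theUser) {
        this.theUser = theUser;
    }
}

// List-backed group of users; the class name is hypothetical.
class ListGroupUsers {
    // Assumption: latePeople is a sentinel head that holds no user.
    private final Node latePeople = new Node(null);
    private int numUsers;

    // MODIFIES: this
    // EFFECTS: inserts newUser just after the head, with no shifting.
    void addUser(User newUser) {
        Node n = new Node(newUser);
        n.next = latePeople.next;
        latePeople.next = n;
        numUsers++;
    }

    boolean isInGroup(String userID) {
        for (Node curr = latePeople.next; curr != null; curr = curr.next) {
            if (curr.theUser.getUserID().equals(userID)) {
                return true;
            }
        }
        return false;
    }

    // MODIFIES: this
    // EFFECTS: unlinks the first node whose user has this userID, if any.
    void deleteUser(String userID) {
        Node prev = latePeople;
        Node curr = latePeople.next;
        while (curr != null) {
            if (curr.theUser.getUserID().equals(userID)) {
                prev.next = curr.next; // bypass curr; no elements move
                numUsers--;
                return;
            }
            prev = curr;
            curr = curr.next;
        }
    }

    int size() {
        return numUsers;
    }
}
```

Client code that only calls addUser, isInGroup and deleteUser does not need to know whether an array or a list sits underneath, which is the point of the data abstraction.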