TDDC74 Programming: Abstraction and Modelling
Supplement Document
SICP, Chapter 02

Contents
1 Overview: SICP 02 – Data Abstraction
2 SICP 2.1-2.3
2.1 Abstract Interfaces, Barriers, & ADTs
2.2 Pairs
2.3 Pretty-printing CONS Structures
2.4 Box-and-pointer Diagrams
2.5 Symbols
2.5.1 Symbols versus Strings
2.5.2 quote and cons/list
2.5.3 Diagramming quoted structures
2.6 Equality
3 Vocabulary

1 Overview: SICP 02 – Data Abstraction
This document contains supplemental information for PRAM. Study and understand this material in addition to the material in SICP, Chapter 02.
As you work on SICP, chapter 02, keep in mind that the main point is to learn about data abstraction. To explore this issue, you need to learn and master the particular details of how to construct and manipulate different types of compound data in Scheme, so we do spend time on these details. However, remember that although the specific details vary from language to language, the main principles remain valid and useful.

2 SICP 2.1-2.3
2.1 Abstract Interfaces, Barriers, & ADTs
See SICP 2.1 and the definitions in the Vocabulary appendix (section 3).

2.2 Pairs
Students often find pair structures and pair operations confusing, especially the difference(s) between cons and list.
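Here is a minimal sketch of the cons/list difference. It is written for Racket (`#lang racket`), which we assume is the Scheme dialect used in the course. The variable names p, q, r, and s are purely illustrative. The sketch shows that cons builds exactly one new pair, while list builds one pair per argument and ends the backbone with the empty list.

```racket
#lang racket
;; cons builds exactly ONE new pair: car = first argument, cdr = second argument.
;; list builds one new pair PER argument; the cdr of the last pair is '().
(define p (cons 1 2))           ; one pair whose cdr is 2: an improper list
(define q (list 1 2))           ; two pairs, same as (cons 1 (cons 2 '()))
(define r (cons 1 (list 2 3)))  ; one new pair placed in front of the list (2 3)
(define s (list 1 (list 2 3)))  ; a two-element list whose second element is (2 3)

;; display each structure on its own line
(for-each (lambda (v) (display v) (newline))
          (list p q r s))
;; prints:
;; (1 . 2)
;; (1 2)
;; (1 2 3)
;; (1 (2 3))
```

Note that r and s differ only in cons versus list, yet r has three elements and s has two.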
Using box-and-pointer notation will really help solidify your understanding here.
Note! Section 2.1.1 of SICP ("Example: Arithmetic Operations for Rational Numbers") introduces important material about pair data structures. Be sure to read that material.
• The empty list is written as '(). You will see it without the quote when it is returned as a value.
• The cdr of a pair is the rest of the structure, not the "second element." (cdr (list 1 2 3 4)) returns (2 3 4), not 2!
• The empty list is not a pair. It is an error to ask for the car or cdr of the empty list.
• The list? predicate returns #f for improper lists: (list? (cons 1 2)) → #f
• An improper list is a structure in which the cdr of the last pair is something other than () (the empty list).
• Similarly, a proper list is a structure in which the cdr of the last pair is () (the empty list).
• Note! Every call to cons or list creates new cons cells (one per pair). Thus, the two calls to cons below create two separate cons cells, even though they look identical:
    (list (cons 1 2) (cons 1 2))

2.3 Pretty-printing CONS Structures
The "pretty printing" of cons structures can be confusing, especially when students implement their own versions of some primitives and see different results. This is particularly true for expressions that use cons more than once.
    (cons 1 (cons 2 3)) → (1 . (2 . 3))                  ; should print
    (cons 1 (cons 2 3)) → (1 2 . 3)                      ; does print

    (cons (cons 1 2) (cons 2 3)) → ((1 . 2) . (2 . 3))   ; should print
    (cons (cons 1 2) (cons 2 3)) → ((1 . 2) 2 . 3)       ; does print
For each of the two pairs of lines above, the first result is what you would expect from the logic of forming improper lists, with every pair written as (car . cdr). However, to increase readability, Scheme "pretty prints" a more compact notation. The printing rule: if the item to the right of a dot (the cdr) is itself a pair, leave out the dot and the parentheses that enclose that inner pair. If the cdr is the empty list, leave out the dot and the () entirely.
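The printing rule can be made concrete with a small sketch. The procedures naive-string, compact-string, and tail-string below are hypothetical names, not built-ins. The first writes every pair as (car . cdr). The second, with its helper, applies the printing rule described above. The sketch assumes Racket (`#lang racket`), where format with "~a" renders a value the way display would. It only handles pairs, the empty list, and simple atoms such as numbers and symbols.

```racket
#lang racket
;; Hypothetical helper: write EVERY pair as (car . cdr), with no abbreviation.
(define (naive-string v)
  (if (pair? v)
      (string-append "(" (naive-string (car v))
                     " . " (naive-string (cdr v)) ")")
      (format "~a" v)))                 ; atoms (and '()) as display shows them

;; Hypothetical helper: apply the printing rule.
(define (compact-string v)
  (if (pair? v)
      (string-append "(" (compact-string (car v)) (tail-string (cdr v)) ")")
      (format "~a" v)))

;; Render the cdr of a pair according to the rule:
;; a pair continues the list without its own dot and parentheses,
;; '() ends the list silently, anything else is printed after " . ".
(define (tail-string tail)
  (cond [(null? tail) ""]                                   ; proper end of list
        [(pair? tail)                                       ; next item is a pair
         (string-append " " (compact-string (car tail))
                        (tail-string (cdr tail)))]
        [else (string-append " . " (compact-string tail))])) ; improper end

(naive-string (cons 1 (cons 2 3)))            ; "(1 . (2 . 3))"
(compact-string (cons 1 (cons 2 3)))          ; "(1 2 . 3)"
(naive-string (cons (cons 1 2) (cons 2 3)))   ; "((1 . 2) . (2 . 3))"
(compact-string (cons (cons 1 2) (cons 2 3))) ; "((1 . 2) 2 . 3)"
```

compact-string on (list 1 2 3) gives "(1 2 3)", because the final '() is dropped. Comparing compact-string with (format "~a" v) for a few structures is a quick way to check the sketch against Scheme's own printer.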
2.4 Box-and-pointer Diagrams
Just as we had substitution diagramming for the Substitution Model of Evaluation, we now introduce a diagramming technique for visualizing the pair structures of returned values.
Note! These diagrams are very important and will be used throughout the rest of the course.
Keep the following in mind when creating box-and-pointer diagrams:
• Start by diagramming the backbone.
• Box-and-pointer diagrams are a visualization of the structure of returned values. Before creating a diagram, scan the expression to make sure it is well-formed! The following expression generates an error:
    (list 1 (2 3)) → ERROR
  Remember, cons and list follow the usual rules of evaluation: their argument expressions (and subexpressions) are evaluated for their values before the resulting structure is returned. So, in the example above, Scheme fails when it evaluates (2 3), because 2 is not a procedure and cannot be applied.
  Warning: if we ask you to diagram a cons or list structure that generates an error, we expect "error" and an explanation as the answer!
• Warning! Always remember to include a pointer to the structure. For the next expression, there is an arrow pointing to the box-and-pointer diagram for the cons of 1 and 2:
    (cons 1 2)
  Similarly, for the expression below, there would be an arrow pointing from bar to the box-and-pointer diagram for the two-element list whose elements are the procedures cons and list:
    (define bar (list cons list))
  Finally, what is the box-and-pointer diagram for the returned value of evaluating the following expression?
    ((car bar) 2 3) → ???
  This issue will become extremely important in SICP 3, when we need to visualize how data structures are modified!

2.5 Symbols
A symbol is a quoted (that is, unevaluated) group of characters, typically alphabetical:
    (symbol? 'a) → #t
    (symbol? 'abaababdb) → #t
    (symbol? 'one) → #t
    (symbol? '1) → #f
    (symbol? '(1 2)) → #f
(Quoting 1 just gives the number 1, and '(1 2) is a list, so neither is a symbol.)
Such a group of characters is symbolic when it is used to refer to something else, as in the case of a variable.
This is potentially confusing, since creating a name/value binding is the one case where a symbol is used without quotation. The special form that creates the binding treats the name as a symbol rather than evaluating it. In other words, when Scheme evaluates this expression:
    (define quux 32)
it treats quux as a symbol (rather than evaluating it). However, evaluating quux afterwards gives its value, so the symbol? predicate receives 32, not the name:
    (symbol? quux) → #f
Only values that are themselves symbols test true. In this course, that means values created by explicit quotation:
    (symbol? (quote quux)) → #t
    (symbol? 'quux) → #t
The same holds when the value bound to a name is a symbol, because symbol? tests the value, not the name:
    (define quux (quote blah))    ; note!
    (symbol? quux) → #t           ; note!
There are a number of issues about symbols and quoting to observe:
• Creation of box-and-pointer diagrams
• Symbols versus strings
• quote and cons/list
Scheme expects that a group of characters without a quote (or parentheses) is a name for a binding. If the binding exists, evaluating that name returns the value bound to it. If the binding does not exist, an error is signaled.
    (define foo 42)
    foo → 42
    (quote foo) → foo
    baz → error: reference to undefined identifier: baz
    (quote baz) → baz
Numbers are already self-evaluating. Quoting them does no harm, but there is no need to do so:
    3 → 3
    '3 → 3

2.5.1 Symbols versus Strings
Although symbols and strings can look the same when displayed on the screen, they are different data types and have different characteristics!
The main point about symbols is that they "point to something else." Thus, variables are symbolic names: the symbol (name) actually refers to something else. In the case of variables, the name is just a convenient way to refer to the thing of interest, whether it is a single value, a procedure, or something else. When we deal with symbols, we are not interested in the "parts" of the symbol; we are interested in accessing and using what the symbol points to.
In fact, technically, a Scheme symbol does not have "parts", even though it may be displayed on the screen so that it looks like a sequence of characters.
A string is a particular sequence of characters. When we identify some sequence of characters as a string, as in "word", we are creating a different data type: 'word is not the same as "word".
    (symbol? 'one) → #t
    (symbol? "one") → #f
    (string? 'one) → #f
    (string? "one") → #t
• Symbols and strings are different data types, which means they support different kinds of operations (as do the symbol one and the number 1).
• Symbols and strings are both made up of characters.
• The characters that make up a symbol cannot be taken apart. By contrast, there are many useful string operations (separating into individual characters, concatenating, etc.).
• A name (symbol) can be bound to only one value in a given scope, and there is only ever one symbol with a given spelling. By contrast, there can be many separate but identical strings.
• Case sensitivity: in older Scheme standards (R5RS), symbols are not case-sensitive, so 'Abc, 'abc, and 'ABC are the same symbol. In Racket (and in R6RS/R7RS Scheme), symbols are case-sensitive, so these are three different symbols.
Students often find the difference between strings and symbols confusing. They see a symbol, such as foo, and automatically "read" it as consisting of a string of characters. But consider the following:
    "123" → ?
    123 → ?
The number is also made of characters, yet we usually understand that it is a number.
This course makes very little use of strings or string processing. Strings appear mostly as error messages in procedures:
    (define (foobaz positive-input)
      (if (<= positive-input 0)
          "ERROR: input a positive number"
          (* positive-input positive-input)))
Yes, a procedure can return a string, just as it can return a number, a list, or any other type of value.
Note that SICP, chapter 2, mentions the primitive display. We will use this and other "display" primitives in SICP 03. For now, all you need to know is that display belongs to the non-functional side of programming.
In other words, in the foobaz example above, the procedure is able to return a string as a value.
In this case, foobaz is a typical functional machine: it returns a value. As a side effect, Scheme is nice enough to display the returned value on the screen. This side effect is not functional. Why? Because it changes the state of the screen, and this change lasts beyond the end of the evaluation of the procedure foobaz.
When we want to directly control what appears on a display screen, and how it appears, we will start using non-functional commands such as display. We will study and use these in part 3 of the course.

2.5.2 quote and cons/list
It is essential to remember that quote and cons/list do not operate the same way!
• Scheme takes the argument to quote without evaluating it, and then creates the appropriate structure.
• Scheme applies list and cons to their arguments, and those arguments are evaluated for their values before cons or list uses them to create a structure.
    (quote (1 (2 3)))      ; this is NOT equivalent
    (list 1 (2 3))         ; to this (which signals an error, see 2.4)

    (quote (1 (2 3)))      ; this expression is NOT equivalent
    (list 1 (list 2 3))    ; to this expression (quote does not evaluate; list does)

    (quote (1 (2 3)))      ; but what this RETURNS, (1 (2 3)),
    (list 1 (list 2 3))    ; is equal to what this RETURNS

2.5.3 Diagramming quoted structures
• Structures created with quote do have a list structure, and it can be diagrammed with box-and-pointer notation.
• Note! The quote is what you use to tell Scheme not to evaluate something. It isn't part of the output. So, do not:
  – quote printed results
  – insert quotes in box-and-pointer diagrams
Test yourself by diagramming the following:
    (cons '(a) '(b)) → ((a) b)
    (define foo '(list +))
    (car foo) → ???
    (define bar (list list +))
    (car bar) → ???
    (define baz (list 'list +))
    (car baz) → ???

2.6 Equality
Just as there is an equality predicate for numbers, there is one for symbols and one for lists:
    (= 3 3) → #t
    (eq? 'foo 'foo) → #t
    (eq? 'foo 'bar) → #f
    (equal? '(1 2) '(1 2)) → #t
    (equal? '(1 2) '(1 3)) → #f
    (equal? '(1 (2 3)) '(1 (2 3))) → #t
    (equal? '(1 2 3) '(1 (2 3))) → #f
For now, please use only the following three tests for equality, and only in the following ways:
• = works on numbers
• eq? works on symbols
• equal? works on lists
In general, we will be checking that code isn't simply using equal? (or other overly general) tests of equality.
Note that there is some deep subtlety about the nature of equality in general, and about the difference between equality and identity. We will explore this further in SICP, chapter 03.

3 Vocabulary
Abstract Data Type (ADT)
In its simplest terms, an ADT is a "compound" form of data created out of primitive (or built-in) data types and operations. More importantly, an ADT allows us to separate how the data is used from how the data is represented. This, in turn, makes it much easier to change underlying implementations for debugging, optimization, and the like.
Abstract Interface
The collection of constructors, selectors, and predicates by which we define a particular data type. This collection functions as an interface to the functionality implemented "below" the interface. The abstract interface is also known as an abstraction barrier.
Accessor
One of the procedure types (constructor, selector, etc.) making up the abstract interface.
Accumulator
A recursive procedure that "accumulates" its results (typically by "consing up a result").
Cons cell
The structure that results from applying cons to two arguments. The value associated with the first part of the cons cell can be returned with car, and the value associated with the rest of the cons cell can be returned with cdr. Loosely, "cons cell" and "pair" are often used as synonyms.
Constructor
A procedure for creating data of a specific data type.
Data
We usually think of data as the "stuff" that procedures operate on. However, we can also define data in terms of an abstract interface: the operations we use to manipulate it, and the formal relationships we specify between arguments and values.
Data-type
A specific abstract interface for one type of data.
Filter
A procedure for separating data that meets some criteria from other data. Note: in SICP, to "filter" means to return the items that match the predicate-argument. For example, if one applies filter to the predicate odd? and a list of numbers, the SICP version of filter returns the odd numbers as the value. In some other texts and code, "filter" is used in the opposite sense: it returns the items that do not match the predicate-argument (that is, it "filters out" the items that match it).
List
Since the cdr of a pair can point to another pair, pairs can be chained into a backbone. Technically, a list is such a chain in which the cdr of the final pair of the backbone is nil (the empty list). If there is only a single cons cell, this applies to the cdr of that cell. The empty list by itself also counts as a list. In everyday language, the term list is not usually used to refer to a single pair, but rather to a series of cons cells.
Note: an improper list is one in which the cdr of the final pair of the backbone is not nil. An improper list tests false for list?.
Map
One-to-one transformation. See also Scheme's map procedure.
Pair
A fundamental two-part compound data structure in Scheme, created with cons (for "construct"). Also known as a cons cell.
Predicate (or "recognizer")
A predicate tests its argument(s) and returns true (#t) or false (#f).
    (number? 2) → #t
    (pair? 2) → #f
    (symbol? 2) → #f
    (pair? (cons 1 2)) → #t
    (symbol? 'two) → #t
Selector
A procedure for selecting different "parts" of data of a specific data type.
Symbol
A particular data type that is created by quoting (that is, preventing the evaluation of) a group of characters, typically alphabetical. In more general terms, a symbol stands for something else, as in the case of a variable name that is bound to a value.
Tree
A self-similar data structure in which branches consist of "leaves" or sub-branches (which may in turn consist of leaves or sub-branches).
A particular kind of tree is a binary tree, in which each branch consists of exactly two leaves or sub-branches.
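Several of the vocabulary terms can be tied together in one minimal sketch: a rational-number ADT in the style of SICP 2.1.1, plus SICP's count-leaves for trees (SICP 2.2.2). The sketch is written for Racket (`#lang racket`), which we assume is the dialect in use. make-rat, numer, denom, add-rat, and count-leaves follow SICP's naming. rat->string, whole?, and rats are hypothetical helpers added only for illustration. The sketch assumes integer arguments and a positive denominator.

```racket
#lang racket
;; --- Abstract interface for rational numbers (in the style of SICP 2.1.1) ---

;; Constructor: the only place that knows a rational is represented as a pair.
(define (make-rat n d)
  (let ([g (gcd n d)])                   ; reduce to lowest terms
    (cons (quotient n g) (quotient d g))))

;; Selectors
(define (numer x) (car x))
(define (denom x) (cdr x))

;; --- Above the abstraction barrier: uses only make-rat, numer, denom ---
(define (add-rat x y)
  (make-rat (+ (* (numer x) (denom y)) (* (numer y) (denom x)))
            (* (denom x) (denom y))))

;; Hypothetical predicate: is this rational a whole number?
(define (whole? x) (= (denom x) 1))

;; Hypothetical helper, used with map (a one-to-one transformation)
(define (rat->string x)
  (format "~a/~a" (numer x) (denom x)))

(define rats (list (make-rat 4 2) (make-rat 1 3) (make-rat 9 3)))
(map rat->string rats)                                ; '("2/1" "1/3" "3/1")
(map rat->string (filter whole? rats))                ; '("2/1" "3/1"): keeps matches, as in SICP
(rat->string (add-rat (make-rat 1 2) (make-rat 1 3))) ; "5/6"

;; --- A tree: count the leaves (SICP 2.2.2) ---
(define (count-leaves x)
  (cond [(null? x) 0]                    ; empty branch
        [(not (pair? x)) 1]              ; a leaf
        [else (+ (count-leaves (car x))  ; leaves in the first branch
                 (count-leaves (cdr x)))])) ; leaves in the remaining branches

(count-leaves (cons (list 1 2) (list 3 4)))           ; 4
```

add-rat never calls car or cdr directly. The representation hidden inside make-rat, numer, and denom could therefore change without touching add-rat. That separation is what the Abstract Interface entry describes.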