ppt

advertisement
CSE 425 Fall 2015 Midterm Exam
• 80 minutes, during the usual lecture/studio time:
10:10am to 11:30am on Monday October 12, 2015
– Arrive early if you can, exam will begin promptly at 10:10am
• Held in Simon Hall, Room 023 (NOT URBAUER 218)
– You may want to locate the exam room in advance
(downstairs in Simon Hall, near room 022)
• Exam is open book, open notes, hard copy only
– I will bring a copy each of the required and optional text for
people to come up to the front and take a look at as needed
– Please feel free to print and bring in slides, your notes, etc.
– ALL ELECTRONICS MUST BE OFF DURING THE EXEM
(including phones, iPads, laptops, tablets, etc.)
CSE 425: Midterm Review
Abstraction in Programming
LD
R1
FIRST
0010
001
000000100
(opcode) (register)
(location)
• Von Neumann Architecture
– Program instructions and data are stored in a memory area
– CPU executes a sequence of instructions
• Machine instruction sets: lowest level of abstraction
– Binary representation that the CPU can process
– Or that a virtual machine can process (e.g., byte code)
• Assembly language is only slightly more abstract
– “Readable” labels: operations, registers, location addresses
CSE 425: Midterm Review
Evolving to Higher Levels of Abstraction
• Algebraic notation and floating point numbers
– E.g., Fortran (John Backus)
• Structured abstractions and machine independence
– E.g., ALGOL (a committee), Pascal (Niklaus Wirth)
• Architecture independence (on beyond Von
Neumann)
– E.g., based on Lambda Calculus (Alonzo Church)
– E.g., Lisp (John McCarthy)
CSE 425: Midterm Review
Some Programming Paradigms
• Imperative/procedural (E.g., C)
– Variables, assignment, other operators
• Functional (E.g., Lisp, Scheme, ML, Haskell)
– Abstract notion of a function, based on lambda calculus
• Logic (E.g., Prolog)
– Based on symbolic logic (e.g., predicate calculus)
• Object-oriented (E.g., Java, Python, C++)
– Based on encapsulation of data and control together
• Generic (E.g., C++ and especially its standard library)
– Based on type abstraction and enforcement mechanisms
– We’ll cover informally via examples throughout the
semester
CSE 425: Midterm Review
Language Design
• Goals (potentially conflicting)
– Efficiency of coding or execution, writability, expressiveness
• Specific design criteria
–
–
–
–
–
–
Regularity (how well language features are integrated)
Generality (how few cases have to be handled specially)
Orthogonality (how widely features still behave the same)
Uniformity (consistent appearance/behavior across features)
Safety or “security” (how difficult it is to produce errors)
Extensibility (how easy and effective is feature addition)
CSE 425: Midterm Review
Syntax and Lexical Structure
• Syntax gives the structure of statements in a language
– E.g., the format of tokens and how they can be arranged
– Lexical structure also describes how to recognize them
• Scanning obtains tokens from a stream characters
– E.g., whitespace delimited vs. regular-expression based
– Tokens include keywords, constants, symbols, identifiers
– Usually based on assumption of taking longest substring
• Parsing recognizes more complex expressions
– E.g., well-formed statements in logic, arithmetic, etc.
– Free-format languages ignore indentation, etc. while fixed
format languages have specific restrictions/requirements
CSE 425: Midterm Review
Regular Expressions, DFAs, NDFAs
• Regular expressions capture lexical structure of
symbols that can be built using 3 composition rules
– Concatenation (ab) , selection (a | b), repetition (b*)
• Finite automata can recognize regular expressions
– Deterministic finite automata (DFAs) associate a unique
state with each sequence generated by a regular expression
– Non-deterministic finite automata (NDFAs) let multiple states
to be reached by the same input sequence (adding “choice”)
• Can generate a unique (minimal) DFA in 3 steps
– Generate NDFA from the regular expression (Scott pp. 56)
– Convert NDFA to (possibly larger) DFA (Scott pp. 56-58)
– Minimize the DFA (Scott pp. 59) to get a unique automaton
• C++11 <regex> library automates all this for you
CSE 425: Midterm Review
Context Free Grammars and BNF
• In context free grammars (CFGs), structures are
independent of the other structures surrounding them
• Backus-Naur form (BNF) notation describes CFGs
– Symbols are either tokens or nonterminal symbols
– Productions are of the form nonterminal → definition
where definition defines the structure of a nonterminal
– Rules may be recursive, with nonterminal symbol appearing
both on left side of a production and in its own definition
– Metasymbols are used to identify the parts of the production
(arrow), alternative definitions of a nonterminal (vertical bar)
– Next time we’ll extend metasymbols for repeated (braces) or
optional (square brackets) structure in a definition (EBNF)
CSE 425: Midterm Review
Ambiguity, Associativity, Precedence
• If any statement in the language has more than one
distinct parse tree, the language is ambiguous
– Ambiguity can be removed implicitly, as in always replacing
the leftmost remaining nonterminal (an implementation hack)
• Recursive production structure also can disambiguate
– E.g., adding another production to the grammar to establish
precedence (lower in parse tree gives higher precedence)
– E.g., replacing exp → exp + exp with alternative productions
exp → exp + term or exp → term + exp
• Recursive productions also define associativity
– I.e., left-recursive form exp → exp + term is left-associative,
right-recursive form exp → term + exp is right-associative
CSE 425: Midterm Review
Extended Backus-Naur Form (EBNF)
• Optional/repeated structure is common in programs
– E.g., whether or not there are any arguments to a function
– E.g., if there are arguments, how many there are
• We can extend BNF with metasymbols
– E.g., square brackets indicate optional elements, as in the
production function → name ‘(‘ [args] ‘)’
– E.g., curly braces to indicate zero or more repetitions of
elements, as in the production args → arg {‘,’ arg}
– Doesn’t change the expressive power of the grammar
• A limitation of EBNF is that it obscures associativity
– Better to use standard BNF to generate parse/syntax trees
CSE 425: Midterm Review
Recursive-Descent Parsing
• Shift-reduce (bottom-up) parsing techniques are
powerful, but complex to design/implement manually
– Further details about them are in another course (CSE 431)
– Still will want to understand how they work, use techniques
• Recursive-descent (top-down) parsing is often more
straightforward, and can be used in many cases
– We’ll focus on these techniques somewhat in this course
• Key idea is to design (potentially recursive) parsing
functions based on the productions’ right-hand sides
– Then, work through a grammar from more general rules to
more specific ones, consuming input tokens upon a match
– EBNF helps with left recursion removal (making a loop) and
left factoring (making remainder of parse function optional)
CSE 425: Midterm Review
Lookahead with First and Follow Sets
• Recursive descent parsing functions are easiest to
write if they only have to consider the current token
– I.e., the head of a stream or list of input tokens
• Optional and repeated elements complicate this a bit
– E.g., function → name ( [args] ) and arg → 0 |…| 9 and
args → arg {, arg} with ( ) 0 |…| 9 , as terminal symbols
• But, EBNF structure helps in handling these two cases
– The set of tokens that can be first in a valid sequence, e.g.,
each digit in 0 |…| 9 is in the first set for arg (and for args)
– The set of tokens that can follow a valid sequence of
tokens, e.g., ‘)’ is in the follow set for args
– A token from the first set gives a parse function permission
to start, while one from the follow set directs it to end
CSE 425: Midterm Review
Bindings
• A binding associates a set of attributes with a name
– E.g., int &i = j; // i is a reference to int j
• Bindings can occur at many different times
–
–
–
–
Language design time: control flow constructs, constructors
Language implementation time: 32 vs 64 bit int, etc.
Programming time: names given to algorithms, objects, etc.
Compile time: templates are instantiated (types are bound),
machine code produced for functions and methods
– Link time: calls between compilation units are linked up
– Load time: virtual addresses mapped to physical ones
– Run time: scopes are entered and left, dynamic allocation of
objects, updates to variables, pointers, and references, etc.
CSE 425: Midterm Review
Static, Stack, and Dynamic Allocation
• In static allocation, unchanging names are introduced
– E.g., program code, global or static objects, etc.
– Manifest constants can be defined at compile time as well
• In stack based allocation, scoped names are added
– Recursion requires a stack to maintain frames for calls
– Elaboration time constants, local variables, parameters, etc.
• In heap based allocation, names appear any time
– I.e., whenever the program calls a dynamic allocation
operator, constructor, etc.
– Deallocation may be either explicit or implicit
– If implicit, garbage collection is often needed
CSE 425: Midterm Review
Declarations, Blocks, and Scope
• A declaration explicitly binds some information to an
identifier, and may bind other information implicitly
– E.g, the statement int x; explicitly binds integer type
attribute int with identifier x, implicitly binds a location for x,
and may or may not implicitly bind a value (i.e., 0) for x
– Declarations that bind all attributes are called definitions
– Declarations that leave some attributes unbound are called
simply declarations, or sometimes prototypes
• A block is a sequence of declarations and definitions
– Declarations that are specific to a block are locally scoped
– Program-wide declarations are globally scoped
– Block-structured languages allow both nesting of blocks and
scoped re-declaration of identifiers (via lexical addressing)
– Classes, structs, namespaces, packages also create scopes
CSE 425: Midterm Review
Symbol Tables for Nested Scopes
• Scope analysis allows declarations and bindings to be
processed in a stack-like manner at run-time
– A symbol table is used to keep track of that information
– E.g., each identifier in a scope has a set of bindings
– Structure/management may range from a single static
symbol table to a dynamic hierarchy of per-scope tables
pi
double
main
3.141
(int, char**)
argc
int
argv
char **
2
function scope
global scope
CSE 425: Midterm Review
Names, Aliases, and Overloading
• Multiple names may refer to same object or variable
– E.g., pointers or references (and pass-by-reference) in C++
• The ability to re-use a name is also commonplace
– E.g., using the + operator for addition or string concatenation
– Since a symbol table usually stores parameter type
attributes for a function identifier, can use to resolve
– I.e., look up the declaration/definition of a function or
operator based on its parameter types as well as its name
• Many languages allow function name overloading and
a smaller number of them allow operator overloading
– Syntactically introduced constraints such as precedence and
associativity of operators must be respected (unless waived)
– Though with tweaks like C++ prefix operator+ call syntax
CSE 425: Midterm Review
Coercion and Polymorphism
• Types often are allowed to be converted to other types
– E.g., while (ifs >> token) // ifstream& to bool
• When the complier forces this to happen its coercion
– I.e., the type conversion happens implicitly
• Polymorphism lets multiple unconverted types be used
– Inheritance polymorphism (E.g., C++ classes)
– Interface polymorphism (E.g., C++ templates)
– Both support subtype polymorphism (Liskov substitution)
• Explicit parametric polymorphism is called “generic”
– E.g., C++ templates with specialization
CSE 425: Midterm Review
Binding of Reference Environments
• Referencing environment depends on the order in
which declarations are encountered
– When is the context in which a function is executed fixed?
– This matters because of nesting of scopes, hiding of names,
and the ability to pass a function (pointer) as a parameter
– Shallow binding is done just before function is called
– Deep binding is done when the parameter is first passed
• Function/subroutine closures
– Bundle referencing environment with reference to subroutine
– E.g., local variable capture list in C++11 lambda expression
• Object closures
– Similar idea, bundle member variables and operator()
CSE 425: Midterm Review
Operational Semantics
• Augment syntax with rules for an abstract machine
– Specifically, a reduction machine that reduces parts of a
program (building up to the program itself) to its values
– Reduction rules repeatedly infer conclusions from premises
• For example, reduction of string “425” to value 425
– Parse off digit ‘4’, convert to value 4
– Parse off digit ‘2’, add value 2 to 10*4 (gives 42)
– Parse off digit ‘5’, add value 5 to 10*42 (gives 425)
• Axioms (rules without a premise) give basic reductions
– E.g., digit and number strings to their corresponding values
• Richer inference rules allow reductions to be nested
– E.g., convert operands of an addition expression, then add
CSE 425: Midterm Review
Environments and Assignment
• If a program handles only (e.g., arithmetic)
expressions, then it can be reduced to a single value
– E.g., ((4+5)*(12-7) – (4*4 + 3*3)) reduces to 425
• However, if assignment statements are added to a
language, an environment is needed to store state
– E.g., the current values of variables at any point in the code
• Environment must be added to reduction rules
– E.g., if variables are used in an expression like (a+2)*c
– Reductions now are made with respect to an environment
• Need additional rules for effects of assignment, etc.
– An identifier can reduce to its current value in environment
– Assignment updates identifier’s value in the environment
– Program reduces to environment after aggregate changes
CSE 425: Midterm Review
Attribute Grammars and Parse Trees
• Attribute grammars describe relationship between
parsing and semantic attributes and their analysis
– Can perform one (inline) or two passes (explicit parse tree)
• Two kinds of augmentations to the grammar
– Copy rules move values from terminals to non-terminals or
between non-terminals, by copying the values over
– Semantic functions perform richer operations like addition,
subtraction, multiplication division, inverse operations, etc.
• Attribute flow may matter if coupled to parse order
– E.g., S-attributed grammars most general for an LR parse
– E.g., L-attributed grammars most general for an LL parse
• Action routines can be interleaved with parsing as well
– Set up nodes of a parse tree, perform evaluation, or both
CSE 425: Midterm Review
Target Machine Details
• Many architectures can be similar in overall structure
– E.g., Von Neumann with CISC instruction sets is common
– RISC architectures also have been used and still are
• For programming languages, architecture of the target
machine also imposes a number of details to consider
– Especially to retarget a compiler to another machine
– Especially to write software that’s portable across machines
– E.g., sizes, other details of memory locations and addresses
• Programming language details often based on them
– Sizes and layouts of data types, aliases, objects, etc.
– Addresses, how incremented/decremented pointers move,
relative to other variables address space (local, array, heap)
– Directions stack and heap grow in memory address space
CSE 425: Midterm Review
Example Data Size Rules (from C++)
• Rules governing data sizes [from Stroustrup 4th Ed]
– R1: 1 == sizeof(char) <= sizeof(short) <= sizeof(int)
<= sizeof(long) <= sizeof(long long)
– R2: 1 <= sizeof(bool) <= sizeof(long)
– R3: sizeof(char) <= sizeof(wchar_t) <= sizeof(long)
– R4: sizeof(float) <= sizeof(double) <= sizeof(long double)
– R5: sizeof(N) == sizeof(signed N) == sizeof(unsigned N)
where N is char, short, int, long, or long long (integral types)
• Minimum data sizes for some types [from LLM 5th Ed]
–
–
–
–
R6: char must be at least 1 byte (8 bits)
R7: wchar_t, char16_t, short, int must be at least 2 bytes
R8: char32_t and long must be at least 4 bytes
R9: long long must be at least 8 bytes
CSE 425: Midterm Review
Expressions vs. Statements
• It is useful to differentiate expressions vs. statements
– Statements (e.g., assignment) have side effects but usually
do not return a value (C++ doesn’t follow this strictly)
– Expressions (e.g., right hand side of a statement) provide a
value but usually don’t have side effects (again except C++)
• Expression syntax may be prefix, infix, postfix
– Prefix and postfix don’t require parentheses for precedence
• Comparison to procedures and functions
– Operands are viewed as arguments or actual parameters
– Referential transparency assumes no side effects
– Side effects may change control flow as well as values
• Some statements mimic complete control constructs
– E.g., conditional operator in C++ acts like if/else construct
CSE 425: Midterm Review
Structured vs. Unstructured Control Flow
• Goto considered too low level
– Still available in some languages
– Rarely necessary, often better to use other features instead
– E.g., break and continue statements in C++
• Many languages offer structured alternatives
– E.g., break to exit a loop or a selected branch in C++
– E.g., continue to skip the rest of an iteration in C++
• Return and multi-level return
– Return may set a value and also transfer control to caller
– Multi-level return (or exception) may unwind farther
• Continuations capture a context for further execution
– E.g., to defer part of the execution until later
CSE 425: Midterm Review
Sequencing
• Ordering of statements
– Happens-before relationship in control flow
– Particularly important when thinking about concurrency
• Data dependences
– Previous write (store) to a variable (location) seen by
subsequent read (load) instruction(s)
• Compound statements
– A block of statements is used in place of a single statement
CSE 425: Midterm Review
Selection
• If statements (e.g., in C++)
–
–
–
–
If statement evaluates expression (not necessarily Boolean)
If true (non-zero) executes statements in its first block
Otherwise executes else block if one was provided
Can next if statements, so else associates with most recent
If that does not already have an else part
• Switch (case) statements
– Condense if/else logic into cases of an ordinal expression
– Default blocks (no case matches)
– Breaks, fall through can be used to emulate ranges of cases
CSE 425: Midterm Review
Iteration
• Implemented through loop constructs (e.g., in C++)
–
–
–
–
E.g., in C++, while, for, do loops
A continue statement skips rest of that iteration of the loop
A break statement exits the loop
Iterators also provide helpful abstractions for iteration
• While loop
– Most basic construct: test guards each iteration of a block
• For loop
– Encodes special case of a while loop (can emulate an
enumeration controlled loop using logical control)
• Do loop
– Ensures execution of the block at least once
CSE 425: Midterm Review
Recursion
• Functional languages severely limit side effects
– Iteration relies on side effects to make progress, terminate
• Recursion is a natural alternative in those cases
– Or where avoiding side effects simplifies flow control logic
• Recursive functions can support lazy evaluation
– E.g., packaging up remaining work as a continuation and
then only performing that work if it’s needed
• Normal order vs. applicative order evaluation
– Operands may not be evaluated until needed (applicative)
CSE 425: Midterm Review
Data and Data Types
• Data may be more abstract than their representation
– E.g., integer (unbounded) vs. 64-bit int (bounded)
• A language or a standard may define representations
– E.g., IEEE floating point standard, sizeof(char) == 1 in C++
• Otherwise representations may be platform-dependent
– E.g., different int sizes on 32-bit vs. 64-bit platform in C++
• Languages also define operations on data
– E.g., strong requirements for arithmetic operators in Java
– E.g., weaker requirements for them in C++
• Languages also differ in when types are checked
– E.g., Scheme is dynamically type checked (at run-time)
– E.g., C++ is (mostly) statically checked at compile-time
CSE 425: Midterm Review
Programs and Type Systems
• A language’s type system has two main parts
– Its type constructors
– Its rules (and algorithms) for evaluating type equivalence,
type compatibility, and type inference
• Type systems must balance design trade-offs
– Static checking can improve efficiency, catch errors earlier
– Dynamic checking increases flexibility by delaying type
evaluation until run-time (common in functional languages),
may consider a larger subset of safe programs to be legal
• Definition of data types (based on sets)
– A data type has a set of values and a type name
– A data type also defines operations on those values, along
with properties of those operations (e.g., integer division)
CSE 425: Midterm Review
Predefined Types, Simple Types
• Languages often predefine types
– Simple types (e.g., int and double in C++ or Java) have only
a basic mathematic (arithmetic or sequential) structure
– Other types with additional structure may be predefined in
the language (e.g., the String type in Java)
• Languages also often allow types to be introduced
– Simple types can be specified within a program (e.g., an
enum in C++ or a sub-range type in Ada)
– Other types (e.g., unions, arrays, structs, classes, pointers)
can be declared explicitly but involve type constructors
CSE 425: Midterm Review
Type Equivalence
• Structural equivalence
– Basic structural equivalence: all types defined as AXB are
the same but are all different than those defined as BXA
– More complex forms arise if elements are named (so they
are not interface polymorphic to member selection operator)
• Name equivalence
– Two types are the same only if they have the same name
– Easy to check, but restrictive
• Declaration equivalence
– Improves on name equivalence by allowing new names to
be given for a declared type, all of which are equivalent
(e.g., in C++ typedefs behave this way)
CSE 425: Midterm Review
Type Checking
• Some languages check statically but weakly
– E.g., C++ warnings rather than errors for C compatibility
• Type compatibility in construction and assignment
– Variable constructed, or l-value/ref assigned from an r-value
– May involve conversion of r-value into a compatible type
• Overloading introduces inference in type checking
– Also may involve type conversion of operands
– E.g., passing an int to a long int parameter
• Casting overrides type checking
– E.g., cannot assign long int result back into a short int in
Java unless you tell the compiler explicitly to recast r-value
– E.g., C++ reinterpret cast tells compiler not to check
CSE 425: Midterm Review
Type Conversion
• Often need to use one type in place of another type
– Safe implicit conversions often are made automatically
– Widening conversions usually don’t lose information
– Narrowing conversions may truncate, overflow, etc.
• Explicit conversions
– Explicit construction from a type provides safe conversion
– Casting offers alternative when safety can only be
determined at run-time (by the program doing the cast)
– Static casts are completed at compile-time
– Dynamic casts check additional information at run-time (e.g.,
if polymorphic) may throw exception, return 0 pointer, etc.
CSE 425: Midterm Review
Survey of Common Types I
• Records
– E.g., structs in C++
– If elements are named, a record is projected into its fields
(e.g., via struct member selection operator as in P.x in C++)
– Variant records (e.g., unions in C++) let different types
overlap in memory
• Arrays
– Usually contiguous in memory and of a homogeneous type
• Layout and access patterns also may affect cache effects, etc.
– Provide an indexing operator (e.g., []) to access elements
– May be allocated on stack or heap (language specific)
– May be compatible with other types (e.g., pointers in C++)
CSE 425: Midterm Review
Survey of Common Types II
• “Utility” types (e.g., strings, sets, lists, files)
– Widely used but not necessarily inherent to a language itself
– Sometimes built into the language (e.g., Pascal)
• E.g., set operators (+, *, -) for union, intersection, and difference
– Sometimes provided by a library (e.g., the C++ STL)
• Strings
– Often just an array of characters (e.g., C style char *)
– Often an object with operators (e.g., C++ style string)
• Sets (similarly, lists)
– Contain unique values (or in the case of lists, ordered
sequences) of a common type
• Files
– Data “streams” coming from and/or going to mass storage
CSE 425: Midterm Review
Survey of Common Types III
• Recursive types
– May contain (an alias for) another instance of the same type
• E.g., a list node contains an alias for the next node in the list
– May require a pointer or other similar abstraction
• To halt the nesting (and/or construction) of recursive type instances
• Pointers and references
– Similar in some languages, different in others
• E.g., pointer/references semantics in Java vs. C++
– Also may be similar to other types
• E.g., (const) pointer and array semantics in C++
• E.g., address arithmetic vs. index arithmetic in C++
CSE 425: Midterm Review
More about Type Constructors
• Used to construct sets from other sets
– E.g., as Cartesian products (tuples)
– If elements are named, a record is projected into its fields
(e.g., via struct member selection operator as in P.x in C++)
• Specialized forms of type constructors are common
– Unions often are at best unnecessary in object-oriented
languages (or worse: in C++ certain examples can exhibit
type problems similar to those seen with reinterpret casting)
– Subset type constructors narrow a set (e.g., in Ada to
construct a sub-range type from a range type)
– Array constructors project an index type into another type
– Pointer/reference constructors create aliases for other types
(pointers have their own operators but references do not)
CSE 425: Midterm Review
Type Checking with Polymorphism
• Hindley-Milner type checking
– Strongly typed functional languages (ML and Haskell) do this
– Unambiguous (deductive) type inference, by propagating
constraints and info up and down an abstract parse tree
– Relies on specific forms of instantiation and unification
• Different types of polymorphism
– Parametric (interface) polymorphism
– Explicit subtype (inheritance) polymorphism
– Liskov substitution principle: whenever S is a subtype of T,
wherever you need an instance of T you can use one of S
• Occurs-check in unification
– Makes sure variable isn’t in an expression substituted for it
– Difficult to do efficiently, only use if necessary (it is for H-M)
CSE 425: Midterm Review
CSE 425 Fall 2015 Midterm Exam
• 80 minutes, during the usual lecture/studio time:
10:10am to 11:30am on Monday October 12, 2015
– Arrive early if you can, exam will begin promptly at 10:10am
• Held in Simon Hall, Room 023 (NOT URBAUER 218)
– You may want to locate the exam room in advance
(downstairs in Simon Hall, near room 022)
• Exam is open book, open notes, hard copy only
– I will bring a copy each of the required and optional text for
people to come up to the front and take a look at as needed
– Please feel free to print and bring in slides, your notes, etc.
– ALL ELECTRONICS MUST BE OFF DURING THE EXEM
(including phones, iPads, laptops, tablets, etc.)
CSE 425: Midterm Review
Download