week1

CS421 Theory - Yoshii - Week 1 - Chap 1
=======================================
##Inter questions will be part of HW1
To find them using unix, do
grep "##" week1.doc
TOPICS:
-------
/Strings
/Languages
/Defining Sets
/INTRO
======
This class is basically about (programming) languages.
We will discuss:
- defining languages
- generating all strings of a language
- answering YES if a string is in a language
/DEFINING LANGUAGES
===================
In order to design a new language or describe an
existing language,
you have to be able to define exactly what it is.
What are parts of a language?
Take English as an example:
It is made up of words but
the words are made up of letters.
Words are combined to create a sentence.
You have to follow grammar rules to create a sentence.
i.e. the sentence must be SYNTACTICALLY correct.
e.g. good: A boy kicked a ball.
e.g. bad:  A kicked boy ball a.
Sentences must also have valid meanings.
i.e. the sentences must be SEMANTICALLY correct.
e.g. bad:  A ball kicked a boy.
So, to define a language, you must define:
- the letters making up words (alphabet),
- the words and punctuation marks of the language (tokens),
- the syntactic rules of the language,
- the semantic rules of the language.
##Inter1* Give one sentence which is syntactically incorrect.
##        Give another sentence which is syntactically correct but
##        is semantically incorrect.
/WHAT SHOULD THE DEFINITION ALLOW US TO DO?
===========================================
Given the definition of a language,
we should be able to systematically generate (i.e. list)
all sentences belonging to the language.
Given the definition of a language,
we should be able to say "YES this belongs to the language"
or "NO this does not belong to the language."
/FOR NATURAL LANGUAGES
======================
Why is it difficult to do the above for natural languages
such as English?
Reason1: Natural languages change constantly.
Reason2: Semantic rules of natural languages are complex,
         requiring knowledge of the world, which
         is huge and also changes constantly.
Reason3: Natural languages are for communication between
         human beings, which is influenced by CONTEXT
         and PRAGMATICS.
         e.g. context:    pronoun references (it, they, that)
         e.g. pragmatics: I sure wish I could see the file.... (as a request).
Thus, natural languages are full of ambiguities.
This difficulty presents serious problems when we want to build
a natural language understanding or translation system.
/FOR PROGRAMMING LANGUAGES
==========================
To make it easy to define a programming language,
you want to make sure its syntax and semantics are
clean, simple and unambiguous.
For a programming language, we should be able to
build a compiler.
Source Language ---> Compiler ----> Target Language
(e.g. C++)              |           (assembly/machine code)
                 error messages
A compiler has two fundamental tasks:
- ANALYSIS:   accept only valid statements of the source language.
- GENERATION: generate an equivalent in the target language.
Note: for other text processors, the concept remains similar.
The source may be a document marked with directives (e.g. LaTeX)
while the target may be the processed document.
The source may be a database query while the target may be
the actions caused on the database.
/COMPILER PARTS and CS421
==========================
source -> [Scanner] - tokens -> [Parser] - syntactic structure ->
[Semantic routines] - IR -> [Optimizer] - IR -> [Generator] -> target
All phases do error checking
In the following, the Scanner and the Parser are closely related
to the goals of this course. Things in " " will be covered in CS421.
1. Scanner (does Lexical Analysis)
WHAT:
- reads character by character; left-to-right
- groups them into the tokens of the language
(identifiers, integers, reserved words, delimiters, etc.)
- eliminates unneeded info (e.g. comments)
- processes compiler control directives (e.g. #define, #ifndef)
HOW:
- "regular expressions" define the tokens
- "finite automata" recognize tokens
- "finite automata" can be created automatically from regular expressions
TYPES:
- hand-coded scanner
- table-driven scanner
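A minimal sketch of the scanner idea above, assuming a tiny C-like token set that is NOT part of the course notes (the names RESERVED, TOKEN_SPEC and scan are hypothetical). Each token class is defined by a regular expression; Python's re module does the automaton work behind the scenes. Comments and whitespace are recognized and then thrown away.

```python
import re

# Hypothetical token classes for a tiny C-like language (illustration only).
RESERVED = {"if", "else", "while", "return"}

TOKEN_SPEC = [
    ("COMMENT",    r"//[^\n]*"),         # unneeded info, discarded below
    ("WHITESPACE", r"\s+"),              # discarded below
    ("INTEGER",    r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("DELIMITER",  r"[(){};=+\-*/<>]"),
]

# One combined regular expression; each alternative is a named group.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Read source left to right and return a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("COMMENT", "WHITESPACE"):
            continue                     # eliminate unneeded info
        if kind == "IDENTIFIER" and text in RESERVED:
            kind = "RESERVED"            # reserved words look like identifiers
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    for token in scan("while (x1 < 10) x1 = x1 + 1; // loop"):
        print(token)
```

This corresponds roughly to a hand-coded scanner driven by regular expressions; a table-driven scanner would instead walk an explicit state-transition table.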
2. Parser (does Syntactic Analysis)
WHAT:
- groups tokens into higher level units (expressions, statements, etc.)
- verifies correct syntax
- recovers from (and sometimes repairs) syntax errors to continue parsing
- builds a syntax tree (or parse tree), or calls semantic routines directly
during parsing.
HOW:
- units are defined by the "production rules of a grammar" (Context Free)
which contain "recursive rules" for nested structures.
TYPES:
- recursive descent parser
- table-driven parser
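A minimal sketch of a recursive descent parser, assuming a two-rule grammar invented for illustration (it is not from the course notes). There is one function per production rule, and the recursive rule (an expr nested inside a term) is what handles nested parentheses. The function name parse and the tuple-shaped syntax tree are assumptions.

```python
# Hypothetical grammar (illustration only):
#   expr -> term { "+" term }
#   term -> INTEGER | "(" expr ")"     <- recursive rule for nesting


def parse(tokens):
    """Return a syntax tree for a list of token strings, or raise SyntaxError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} but found {peek()!r}")
        pos += 1

    def expr():
        nonlocal pos
        tree = term()
        while peek() == "+":
            pos += 1
            tree = ("+", tree, term())
        return tree

    def term():
        nonlocal pos
        tok = peek()
        if tok is not None and tok.isdigit():
            pos += 1
            return int(tok)
        expect("(")
        tree = expr()                  # recursion: an expr inside a term
        expect(")")
        return tree

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse(["1", "+", "(", "2", "+", "3", ")"]))
```

This sketch stops at the first syntax error; it does not attempt the error recovery or repair mentioned above.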
3. Semantic routines (does Semantic Analysis)
WHAT:
- checks static (compile-time) semantics of each construct
(e.g. variables have been declared? operand types are correct?)
and possible operand coercions (e.g. changing integer into real)
- generates the internal representation (IR) suitable for the
target machine
HOW:
- when a production rule is applied, associated semantic routines
are activated
TYPES:
- usually hand coded - bulk of the effort in writing a compiler!
---------------- Week1 Thursday ------------------------------------
/PARTS OF A LANGUAGE
====================
In order to talk about a language, we must talk about
the alphabet, strings made up of symbols from the alphabet,
and rules governing how strings are formed for the language.
We will review the basic terms in discussing
strings, languages, sets of strings and relations.
/Strings
--------
[ In real life, strings may be words (identifiers) of a language or sentences
(statements) of a language, depending on the context in which the term
is used.]
alphabet = a finite set of symbols = E
= an unordered set of symbols enclosed in curly brackets
e.g. E = {a, b, c}
string = a finite sequence of 0 or more symbols from some alphabet
e.g. When E is {a,b}
     aabb  is a string on E
     babab is a string on E
In this class, we will use letters such as u,v,w,x,y,z to name strings.
|x| = length of string x
    = the number of symbols in x
e.g. | aabb | is 4
/\ = an empty string where |/\| = 0
prefix of x = any number of leading symbols of x (includes /\ and x itself)
suffix of x = any number of trailing symbols of x (includes /\ and x itself)
##Inter2* write all the prefixes of abc
##        write all the suffixes of abc
concatenation of x and y = xy,
where /\x and x/\ are x
##Inter3* write xy where x = aaa and y = bbb
palindromes = strings which read the same forward and backward
##Inter4* give an example of a palindrome using the alphabet {a,b,c}
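The string terms above can be tried out with ordinary Python strings. This is only a sketch: the function names are hypothetical, and the empty string /\ is represented by "".

```python
EMPTY = ""  # stands for /\ in these notes


def prefixes(x):
    """All leading parts of x, from the empty string up to x itself."""
    return [x[:i] for i in range(len(x) + 1)]


def suffixes(x):
    """All trailing parts of x, from x itself down to the empty string."""
    return [x[i:] for i in range(len(x) + 1)]


def is_palindrome(x):
    """True if x reads the same forward and backward."""
    return x == x[::-1]


if __name__ == "__main__":
    x = "aabb"
    print(len(x))               # |x|
    print(prefixes(x))
    print(suffixes(x))
    print(EMPTY + x == x and x + EMPTY == x)   # /\x and x/\ are x
    print(is_palindrome("0110"), is_palindrome("0111"))
```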
/Sets of Strings
----------------
(Remember: E is alphabet; /\ is an empty string)
[ In real life, a set of strings may be a set of all identifiers/words
in a language or a set of all sentences/statements in a language ]
Sets are enclosed in curly brackets { }
e.g. {aaa, bbb, ccc} is a set of 3 strings.
E^* = the set of all strings over E
E^k = the set of all length k strings over E
##Inter5* E = {a, b}. What is E^* ? Describe.
##Inter6* E = {a, b}. What is E^2 ? Give the set.
E^0 = the set of all length 0 strings.
= { /\ } (/\ is the only string with length 0)
E^1 = E (because each symbol is a string of length 1)
And E^+ = E^* - { /\ }
(i.e. all but the empty string)
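A sketch of E^k, E^* and E^+ in Python, assuming the alphabet E = {0, 1} (chosen only for illustration). E^k is finite, so it can be returned as a list. E^* has no last element, so it is written as a generator that lists strings shortest first. The function names are hypothetical.

```python
from itertools import count, islice, product


def strings_of_length(alphabet, k):
    """E^k: every string of exactly k symbols over the alphabet."""
    return ["".join(symbols) for symbols in product(sorted(alphabet), repeat=k)]


def all_strings(alphabet):
    """E^*: yields E^0, then E^1, then E^2, ... and never stops."""
    for k in count(0):
        yield from strings_of_length(alphabet, k)


def nonempty_strings(alphabet):
    """E^+: E^* without the empty string (the first string generated)."""
    return islice(all_strings(alphabet), 1, None)


if __name__ == "__main__":
    E = {"0", "1"}
    print(strings_of_length(E, 0))             # E^0 holds only the empty string
    print(strings_of_length(E, 3))             # E^3
    print(list(islice(all_strings(E), 7)))     # first 7 strings of E^*
    print(list(islice(nonempty_strings(E), 6)))
```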
##Inter7* Are E^* and E^+ infinite even when E is finite? Why?
Cardinality of a set is the number of members of the set.
e.g. A = {a,b}
|A| = 2
e.g. A = {a,b,c}
|A| = 3
/Languages
----------
language = a set of strings of symbols from some alphabet
(i.e. a finite/infinite subset of E^*) following
the rules of the language
e.g. { } is a language with no strings
e.g. { /\ } is a language with just the empty string
e.g. a set of palindromes over E = {0, 1} is a language
     which is an infinite set.
     i.e. { /\, 0, 1, 00, 11, 010, 101, ...}
e.g. a set of length 3 palindromes over E = {0, 1} is a language
     which is a finite set.
     i.e. {010, 101, 000, 111}
e.g. English is a language which is a set of strings (sentences)
     from {a,..,z, numbers, punctuation marks}
##Inter8* Is English a finite set? Explain why or why not.
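The two goals stated earlier (list the strings of a language; answer YES or NO for a given string) can be sketched for the palindrome language over E = {0, 1}. The function names are hypothetical, and the generator simply filters E^* in length order, which is one possible approach among several.

```python
from itertools import count, islice, product

ALPHABET = ("0", "1")


def in_language(x):
    """YES/NO test: is x a palindrome over {0, 1}?"""
    return all(symbol in ALPHABET for symbol in x) and x == x[::-1]


def generate():
    """List the palindromes over {0, 1}, shortest first (an infinite listing)."""
    for k in count(0):
        for symbols in product(ALPHABET, repeat=k):
            x = "".join(symbols)
            if in_language(x):
                yield x


def of_length(k):
    """The finite language of length-k palindromes over {0, 1}."""
    return {"".join(s) for s in product(ALPHABET, repeat=k) if in_language("".join(s))}


if __name__ == "__main__":
    print(list(islice(generate(), 7)))
    print(sorted(of_length(3)))
    print(in_language("0110"), in_language("011"), in_language("0a0"))
```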
/Defining Sets
--------------
To define a set of strings, we don't have to list all the strings
in curly brackets (impossible to do this if the set is infinite);
we can specify a set by a set former = { objects | restrictions }
i.e. {x | P(x) }        a set of x's such that P(x) is true
     {x in A | P(x) }   a set of x's from A such that P(x) is true
e.g. {i | i is an integer and there exists an integer j such that i = 2j}
     defines the set of all even integers.
e.g. {u | u is a palindrome over {0, 1} }
##Inter9* define the set of all odd integers using a set former.
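A Python set comprehension has the same shape as a set former: {x for x in A if P(x)} corresponds to {x in A | P(x)}. The sketch below assumes a small finite range of integers and a small hand-picked set of strings, since a program cannot list an infinite set; both are chosen only for illustration.

```python
# Finite stand-in for "the integers" (assumption: -6..6 only).
INTEGERS = range(-6, 7)

# {i | i is an integer and there exists an integer j such that i = 2j}
evens = {i for i in INTEGERS if any(i == 2 * j for j in INTEGERS)}

# {u in A | u is a palindrome}, with A a hypothetical set of strings over {0, 1}
A = {"0", "01", "010", "11", "10"}
palindromes_in_A = {u for u in A if u == u[::-1]}

if __name__ == "__main__":
    print(sorted(evens))
    print(sorted(palindromes_in_A))
```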
/Relating Sets
--------------
In the following, A and B are sets.
A = B   A and B are equal i.e. they have exactly the same members
A ( B   A is a subset of B (every member of A is in B)
  -
A ( B   A is a subset of B but A does not equal B (proper subset)
A U B = the union of A and B
{x | x is in A or x is in B}
e.g. all the words in English plus all the words in French
A ^ B = the intersection of A and B
e.g. all words common to English and French
##Inter10* complete {x | x is in A ?? } for A ^ B
A - B = the set of A members minus the B members
e.g. all words in English minus those from French
##Inter11* complete {x | x is in A ??} for A - B
A and B are disjoint if they do not share members.
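Python's built-in set type supports these relations and operations directly. A small sketch, using two made-up word lists as stand-ins for English and French (the word lists are assumptions, not real vocabularies):

```python
# Hypothetical word lists (illustration only).
english = {"table", "restaurant", "chat", "dog"}
french = {"table", "restaurant", "chat", "chien"}

union = english | french            # A U B
intersection = english & french     # A ^ B
difference = english - french       # A - B

if __name__ == "__main__":
    print(sorted(union))
    print(sorted(intersection))
    print(sorted(difference))
    print(intersection <= english)         # subset
    print(intersection < english)          # proper subset
    print(english == set(english))         # equality: same members
    print(difference.isdisjoint(french))   # disjoint: no shared members
```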
_
A = Universe - A
(i.e. everything except what is in A)
Remember De Morgan's Law?
                    _____
If x is a member of A U B
then x is not a member of A U B
so x is not a member of A and x is not a member of B
                     _                      _
so, x is a member of A and x is a member of B
                       _   _
thus, x is a member of A ^ B
##Inter12*
##                                   _   _                       _____
## Now show that if x is a member of A ^ B then x is a member of A U B
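De Morgan's Law can also be checked by brute force on a small example. This is a sanity check, not a proof: the sketch assumes a hypothetical three-element universe and tests every pair of subsets A and B.

```python
from itertools import combinations

UNIVERSE = frozenset({1, 2, 3})  # hypothetical small universe


def complement(a):
    """Universe - A."""
    return UNIVERSE - a


def all_subsets(s):
    """Every subset of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def de_morgan_holds():
    """Check complement(A U B) == complement(A) ^ complement(B) for all A, B."""
    subsets = all_subsets(UNIVERSE)
    return all(
        complement(a | b) == complement(a) & complement(b)
        for a in subsets
        for b in subsets
    )


if __name__ == "__main__":
    print(de_morgan_holds())
```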
2^A = power set of A = set of all subsets of A including the empty set
and A itself
i.e. a set of sets.
e.g. A = {a,b}
2^A = { {}, {a}, {b}, {a,b}} with 4 members
Cardinality of 2^A = |2^A|
If A is finite then |2^A| = 2^|A|
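A sketch of the power set for a finite set, built by collecting the subsets of every size from 0 up to |A|. The function name power_set is hypothetical.

```python
from itertools import chain, combinations


def power_set(a):
    """2^A: all subsets of the finite set a, smallest subsets first."""
    items = sorted(a)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [set(subset) for subset in subsets]


if __name__ == "__main__":
    A = {"a", "b"}
    print(power_set(A))
    print(len(power_set(A)) == 2 ** len(A))   # |2^A| = 2^|A|
```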
End.