CIS 262
Automata, Computability, and Complexity
Fall 2015
http://www.seas.upenn.edu/~cse262/
Instructor: Rajeev Alur
alur@cis.upenn.edu
Lecture: Aug 27, 2015
Evolving Computing Platforms
ENIAC (1946):
5 thousand operations
per second
Tianhe-2 supercomputer (2015):
33.86 quadrillion floating point
operations per second
2
Evolving Programming Languages
Fortran (1957):
Language with first compiler
JavaScript (1995):
Most popular as of today
3
Theory of Computation
Should we judge how good a computer scientist you are simply by:
 How many programming languages you know?
 How fast you can code?
What is the science behind computing?
Once a problem is formalized as a “computational” problem with
clearly specified inputs and outputs, before you write code, let’s
understand:
 Can the problem be solved at all by a computer?
 How efficiently can it be solved?
 How can we be convinced that the solution is correct?
4
Course Goal
Understand limits of computability
 Independent of programming languages, computing platforms, ...
Need to define “mathematical” models of computation and study
their properties
 Surprisingly robust foundation that has withstood changes in
technology (e.g. advent of quantum computers)
Emphasize mathematical/logical thinking
 Rigorous definitions, theorems, proofs
Course prerequisites
 Discrete math (CIS 160)
 Basic programming (CIS 120)
5
Problem 1: Character Coverage in Documents
Given a text document, check if each of the vowels a, e, i, o, u appears
at least once in the document
As the algorithm scans the document reading one character at a time,
what information should it track?
For each vowel, need to know whether or not it has appeared in the
document read so far
 Maintain one bit, initialized to 0, for each of the vowels
 If the read character is a vowel, set the corresponding bit to 1
 In the end, check if all bits are 1
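The bit-tracking steps above can be sketched in Python as follows. This is a minimal illustration, not code from the course: the name `has_all_vowels` is hypothetical, and the sketch assumes that only lowercase vowels count.

```python
VOWELS = "aeiou"

def has_all_vowels(document: str) -> bool:
    """Return True if each of a, e, i, o, u occurs at least once."""
    # One bit per vowel, all initialized to 0 (False).
    seen = {v: False for v in VOWELS}
    for ch in document:          # read one character at a time
        if ch in seen:           # the character is a (lowercase) vowel
            seen[ch] = True      # set the corresponding bit to 1
    return all(seen.values())    # in the end, check that all bits are 1
```

For example, `has_all_vowels("education")` holds, while `has_all_vowels("automata")` does not, since e and i never appear.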
6
Problem 2: Character Count in Documents
Given a text document, check whether the characters a and e occur the
same number of times
As the algorithm scans the document reading one character at a time,
what information should it track?
Track the difference in number of times a and e have occurred so far
 Count is initially 0
 Increment it if the read character is a
 Decrement it if the read character is e
 In the end, check if the count is 0
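The counting steps above can be sketched in Python as follows. The function name `same_count_a_e` is a hypothetical choice for illustration, and the sketch assumes only lowercase a and e are counted.

```python
def same_count_a_e(document: str) -> bool:
    """Return True if a and e occur equally often in the document."""
    count = 0                    # (number of a's) - (number of e's) so far
    for ch in document:          # read one character at a time
        if ch == "a":
            count += 1           # increment on a
        elif ch == "e":
            count -= 1           # decrement on e
    return count == 0            # equal counts iff the difference is 0
```

Note that `count` can grow as large as the document itself (for instance on an input consisting only of a's), which is the point the next slide picks up.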
7
Finite-State Computation
Problem 1: All five vowels appear at least once
 Need to maintain 5 bits of memory (constant)
 Computing device to solve this problem needs only 2^5 = 32 states
 Number of states is a priori bounded, independent of input length
Problem 2: Count of a’s = count of e’s
 The value of the variable tracking the difference in the two
counts is potentially unbounded
 Memory needed for the computation depends on input length
 If a computing device has only finitely many states, it cannot
solve this problem! (How do we prove such a statement?)
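To make the count of 32 states concrete, here is a minimal Python sketch that treats the 5 bits of Problem 1 as a single integer state and explores every state the device can reach. The names `step`, `accepts`, and `reachable_states` are hypothetical, and the bit assigned to each vowel is an arbitrary choice.

```python
VOWELS = "aeiou"
START = 0              # no vowel seen yet
ACCEPT = 0b11111       # all five bits set

def step(state: int, ch: str) -> int:
    """Transition function: set the bit of the vowel just read, if any."""
    if ch in VOWELS:
        return state | (1 << VOWELS.index(ch))
    return state       # any other character leaves the state unchanged

def accepts(document: str) -> bool:
    state = START
    for ch in document:
        state = step(state, ch)
    return state == ACCEPT

def reachable_states() -> set:
    """Explore all states reachable from START. Only vowels are tried as
    inputs, because other characters never change the state."""
    seen, frontier = {START}, [START]
    while frontier:
        s = frontier.pop()
        for ch in VOWELS:
            t = step(s, ch)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen
```

Here `len(reachable_states())` is 32 no matter how long the input is. For Problem 2, by contrast, the counter takes the value n after reading n copies of a, so this style of solution has no fixed bound on its states; that observation is only suggestive and is not a proof that no finite-state device exists.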
8
Part A: Regular Languages
Finite automata: Formal model of finite-state computation
 Class of problems that can be solved = regular languages
 Define the model, study its properties
 Different characterizations: regular expressions
 Techniques for establishing “non-solvability” by finite automata
Why study this topic?
 Warm-up for defining models of computation
 Beautiful theory with many results!
 Special property: computational problems about automata are
solvable (e.g. minimize the number of states needed)
 Continues to have practical applications: regular expressions
supported by all modern languages (e.g. JavaScript), checking
correctness of distributed protocols, …
9
Problem 3: Syntactic Checks for Programs
Given a program (written in, say, C), check if its text adheres to all
syntax rules (e.g. variables should be declared before use)
Is variable a1 declared earlier?
10
Problem 4: Finding Bugs in Programs
Given a program (written in, say, C), verify that its execution cannot
lead to a semantic error (e.g. a buffer overflow)
Can the value of index i exceed the
size of the array it accesses?
11
Decidable/Solvable vs Undecidable/Unsolvable
Problem 3: Find syntactic errors in program text
 Compiler checks program text for all such rules and either
reports errors or compiles it into binary code
 Problem is solvable!
Problem 4: Decide whether or not a (semantic) error will arise when a
program executes
 Problem is provably unsolvable!
 There does not exist a compiler that can certify this type of
correctness of programs
Turing Machines: Universal model of computation used to formalize
what a computer can do and cannot do
12
Alan Turing
Alan Turing (British mathematician, 1912-1954): Father of CS
Turing Machines: Mathematical model of computation
Fundamental theorem (Turing, 1936): Undecidability of halting problem
for Turing machines
It is not possible to construct a Turing machine that takes as input
the description of a machine and decides whether or not the
execution of the input machine terminates
Turing also helped design the Bombe, a machine used to break messages
encrypted with the German Enigma cipher during World War 2
See 2014 movie: The Imitation Game
13
Part B: Computability
Turing Machines: Mathematical model of computation
 Why is it universal? Can do what any known computer can do!
 Insights into how such machines work and their properties
Undecidable/Unsolvable problems:
 Halting problem for Turing machines is undecidable
 Problem reduction: unsolvability of one problem can imply
unsolvability of another!
 Different shades of unsolvability; recursive enumerability
14
Problem 5: Finding Most Connected Person
Given a graph of “friends” connections on Facebook, find the person
who has maximum number of friends
Suffices to check each
person one by one and
find number of his/her
connections
With n people and a total of
m connections, the
“time complexity” of the
algorithm is roughly
proportional to
n + m
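The person-by-person check can be sketched in Python as below. The representation (a list of names plus a list of unordered friendship pairs) and the name `most_connected` are assumptions made for illustration; ties are broken by order in `people`, and the list is assumed to be non-empty.

```python
def most_connected(people, friendships):
    """Return a person with the maximum number of friends.

    people: list of n names; friendships: list of m unordered pairs.
    """
    degree = {p: 0 for p in people}        # about n steps
    for u, v in friendships:               # about m steps
        degree[u] += 1
        degree[v] += 1
    return max(people, key=lambda p: degree[p])   # about n steps
```

Each person and each connection is touched a constant number of times, which is where the n + m running time comes from.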
15
Problem 6: Finding Largest Mutually Connected Group
Given a graph of “friends” connections on Facebook, find the largest
clique (group of people who are all friends of one another)
For a given group of people, easy
to check if this forms a clique
But for given k, number of possible
groups of size k is about n^k
Only known bound on k is n, so
number of groups is exponential
in n. Checking all groups is too
inefficient!
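The "check all groups" approach can be sketched in Python as follows. It is meant only to show why the search is expensive, not as a recommended algorithm. The names `is_clique` and `largest_clique` are hypothetical, and the input is assumed to be a dictionary mapping each person to the set of their friends, with friendship symmetric.

```python
from itertools import combinations

def is_clique(group, friends):
    """Easy check: every pair in the group must be friends."""
    return all(v in friends[u] for u, v in combinations(group, 2))

def largest_clique(friends):
    """Brute force: try every group, largest sizes first."""
    people = list(friends)
    for k in range(len(people), 0, -1):
        for group in combinations(people, k):   # all groups of size k
            if is_clique(group, friends):
                return set(group)
    return set()                                # no people at all
```

Checking one group takes about k^2 steps, but the outer loops may visit every subset of the n people, so the running time grows exponentially with n.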
16
P vs NP
P: Class of problems with polynomial-time solutions
 Problem 5 belongs to this class
 Precise definition uses time complexity of Turing machines
 These are “efficiently solvable” or tractable problems
NP-complete: A class of problems with no known efficient solutions
 Problem 6 belongs to this class
 Precise definition uses “non-deterministic” Turing machines
 No known proof that such a problem cannot be solved efficiently
 A large class of commonly occurring problems that are all
“equivalent” to one another (if you find a polynomial-time solution
for one NP-complete problem, then every problem in NP has a polynomial-time solution)
Cook’s Theorem (1971): Propositional satisfiability is NP-complete
17
Part C: Complexity
Time and space complexity of problems
 How much time and memory is needed to solve a given problem
 Lower and upper bounds on complexity
Complexity classes of problems
 P
 NP
 PSPACE
Theory of NP-completeness
 Cook’s theorem: SAT is NP-complete
 Problem reductions
THE open problem in computer science/mathematics: Is P = NP ?
18
Course Logistics
Resources
 Website http://www.seas.upenn.edu/~cse262/
 Piazza discussion group
 Canvas for grades
 See website for list of TAs and office hours
 Lecture slides will be posted on Piazza after each class
Recitation
 Mondays at 4:30 in DRLB A1
 Problem solving
Textbook
 Introduction to the theory of computation, Sipser, 2nd/3rd edition
 Supplementary handouts (will be posted on Piazza)
19
Course Logistics
Grades (one third each)
 Ten homeworks (weekly)
 Two in-class midterms: Sept 29 (part A), Nov 5 (part B)
 Final (note date: Dec 18)
Rules
 Switch off all devices during lectures!
 Exams are open book
 Discuss concepts and practice problems with me, TAs, other
students, and on Piazza, but work out homework problems on your
own!
20
Reading Homework
Read Chapter 0 of Sipser
Review
 Mathematical notation/concepts for sets, functions
 Proof by contradiction
 Proof by induction
Recitation on Monday: Review of proofs
21
Encoding Problems
Before we can define mathematical computation models
corresponding to machines/computers, we need a way to encode
computational problems
 Mathematically precise
 Simple
 General
Observation: an instance of a problem, whether it be a text
document or Facebook friends graph, can always be encoded by a
sequence of characters (0’s and 1’s)
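As an illustration of this observation, the sketch below encodes a text document and a small friends graph as strings of 0's and 1's. The specific encodings (8 bits per UTF-8 byte for text, a row-by-row adjacency matrix for graphs) and the function names are assumptions chosen for simplicity; many other encodings would work equally well.

```python
def encode_text(document: str) -> str:
    """Encode text as 0's and 1's, using 8 bits per UTF-8 byte."""
    return "".join(format(byte, "08b") for byte in document.encode("utf-8"))

def decode_text(bits: str) -> str:
    """Inverse of encode_text."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

def encode_graph(n: int, edges) -> str:
    """Encode an undirected graph on vertices 0..n-1 as its n x n
    adjacency matrix, written out row by row."""
    edge_set = {frozenset(e) for e in edges}
    return "".join(
        "1" if frozenset((i, j)) in edge_set else "0"
        for i in range(n)
        for j in range(n)
    )
```

For example, `encode_text("ab")` gives `0110000101100010`, and the path graph with edges (0, 1) and (1, 2) on three vertices is encoded as `010101010`.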
22
Alphabet
Alphabet Σ : Finite set of symbols/characters used for encoding
Examples:
 Σ = { 0, 1 }
 Σ = { a, b }
 Σ = { A, C, G, T }
 Σ = All characters in the Roman alphabet
23
Strings
A string w over an alphabet Σ is a finite sequence of symbols in Σ
Example strings over the alphabet Σ = { a, b }
ε : empty string, i.e., string of length 0
a
b
abb
baaabba
abababababababababababababab
Σ* = Set of all strings over Σ
If Σ has m symbols, how many strings of length k?
How many strings does Σ* contain?
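For the first question, each of the k positions can independently hold any of the m symbols, which gives m^k strings of length k. The sketch below checks this by enumeration and also lists all strings in order of length; since that listing never ends for a non-empty alphabet, the set of all strings is infinite. The function names are hypothetical.

```python
from itertools import count, islice, product

def strings_of_length(alphabet, k):
    """All strings of length exactly k over the alphabet."""
    return ["".join(symbols) for symbols in product(alphabet, repeat=k)]

def all_strings(alphabet):
    """Generate every string over the alphabet, shortest first.
    For a non-empty alphabet this generator never terminates."""
    for k in count():
        for symbols in product(alphabet, repeat=k):
            yield "".join(symbols)
```

For instance, `len(strings_of_length("ab", 3))` is 2^3 = 8, and `list(islice(all_strings("ab"), 7))` yields the empty string followed by a, b, aa, ab, ba, bb.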
24
Operations on Strings
Concatenation of strings: Given strings u and v, u.v denotes their
concatenation (sometimes . is omitted)
A string u is a prefix of string w if there exists a string v such that
w = u.v
Prefixes of aab = ε, a, aa, aab
A string u is a suffix of string w if there exists a string v such that
w = v.u
Suffixes of aab = ε, b, ab, aab
A string u is a substring of string w if there exist strings v and v’
such that w = v.u.v’
Substrings of aaba = ε, a, aa, aab, aaba, ab, aba, b, ba
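The three definitions translate directly into Python slicing, as in the minimal sketch below. The function names are hypothetical, and the empty string is written `""`.

```python
def prefixes(w: str) -> list:
    """All u such that w = u.v for some v, shortest first."""
    return [w[:i] for i in range(len(w) + 1)]

def suffixes(w: str) -> list:
    """All u such that w = v.u for some v, shortest first."""
    return [w[i:] for i in range(len(w), -1, -1)]

def substrings(w: str) -> set:
    """All u such that w = v.u.v' for some v and v'."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}
```

Running these on the slide's examples reproduces the lists above: `prefixes("aab")` gives the four prefixes, `suffixes("aab")` the four suffixes, and `substrings("aaba")` a set of nine strings.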
25
Languages
A language L over an alphabet Σ is a set of strings over Σ
L is a subset of Σ*
A problem is encoded as a language L:
Given an input string w, decide whether or not w is in L
This definition is general enough to encode all “decision problems”,
that is, problems where the output is either yes or no
26
Example Language
Σ = { a, b }
L = { w | w ends with the symbol b }
= { b, ab, bb, aab, abb, bab, bbb, … }
A machine for L needs to check, given an input string w, whether or
not the last symbol of the input w is b
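One way to picture such a machine is as a two-state device that remembers only whether the most recently read symbol was b. The Python sketch below is an informal preview, not the formal definition of finite automata given later in the course; the state names and the function name `ends_with_b` are hypothetical.

```python
def ends_with_b(w: str) -> bool:
    """Decide membership in L = { w | w ends with the symbol b }."""
    state = "last_not_b"              # start state; also covers empty input
    for ch in w:
        if ch not in "ab":
            raise ValueError(f"symbol {ch!r} is not in the alphabet")
        # The next state depends only on the symbol just read.
        state = "last_is_b" if ch == "b" else "last_not_b"
    return state == "last_is_b"       # accept iff the last symbol was b
```

The machine accepts each of the strings b, ab, bb, aab listed above and rejects the empty string, a, and ba.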
27
Example Language
Σ = { A, C, G, T }
L = { w | w contains “ACC” as a substring }
Example string in the language: GTTACCGA
Example string not in the language: ACTGCCATTGTCA
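Membership in this language can also be decided with a fixed number of states: it is enough to remember how much of the pattern ACC has just been matched. The sketch below uses four states, numbered 0 to 3, where state 3 means ACC has already been seen. The table and the function name `contains_acc` are illustrative assumptions, not the course's construction.

```python
# state = length of the longest prefix of "ACC" that ends at the current
# position; state 3 is absorbing (the pattern has been found).
TRANSITIONS = {
    0: {"A": 1},
    1: {"A": 1, "C": 2},
    2: {"A": 1, "C": 3},
    3: {"A": 3, "C": 3, "G": 3, "T": 3},
}

def contains_acc(w: str) -> bool:
    """Decide membership in L = { w | w contains ACC as a substring }."""
    state = 0
    for ch in w:
        if ch not in "ACGT":
            raise ValueError(f"symbol {ch!r} is not in the alphabet")
        # Any symbol not listed for the current state resets the match to 0.
        state = TRANSITIONS[state].get(ch, 0)
    return state == 3
```

On the two example strings, `contains_acc("GTTACCGA")` is true and `contains_acc("ACTGCCATTGTCA")` is false.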
28
Muddy Children Puzzle
There are some kids playing in a muddy pond. Their teacher walks by and
says “some of you have mud on your forehead”. Each kid can look
around and see which of the other kids have muddy foreheads, but cannot
see his/her own forehead. The teacher says “raise your hand if you
know that you have a muddy forehead”. Nobody raises a hand. The
teacher repeats the same question, and it continues like that. If k kids
have muddy foreheads, then after the teacher has asked the question
k times, exactly those kids with muddy foreheads raise their hands.
Figure out why, and then formalize your argument and prove using
induction!
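Before attempting the proof, it can help to experiment with small cases. The sketch below simulates the puzzle by brute force: it keeps the set of configurations that everyone still considers possible and, after each round in which no hand goes up, discards the configurations in which some hand would have gone up. This modelling of "knowing" and all function names are assumptions made for illustration, and the simulation does not replace the induction argument.

```python
from itertools import product

def knows_muddy(i, world, possible):
    """Child i knows they are muddy in `world` if they are muddy in every
    still-possible world that agrees with `world` on all other children."""
    n = len(world)
    candidates = [w for w in possible
                  if all(w[j] == world[j] for j in range(n) if j != i)]
    return all(w[i] for w in candidates)

def hands_raised(world, possible):
    return tuple(knows_muddy(i, world, possible) for i in range(len(world)))

def simulate(actual):
    """actual: tuple of booleans (True = muddy), at least one True.
    Returns (round in which hands first go up, who raises a hand)."""
    if not any(actual):
        raise ValueError("the teacher says at least one child is muddy")
    n = len(actual)
    # After the announcement, the all-clean configuration is ruled out.
    possible = {w for w in product((False, True), repeat=n) if any(w)}
    rounds = 0
    while True:
        rounds += 1
        raised = hands_raised(actual, possible)
        if any(raised):
            return rounds, raised
        # Everyone saw that no hand went up: keep only consistent worlds.
        possible = {w for w in possible if hands_raised(w, possible) == raised}
```

With two muddy kids out of three, `simulate((True, True, False))` reports that hands first go up in round 2 and that exactly the two muddy kids raise them. The search enumerates all 2^n configurations, so it is only practical for a handful of kids.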
29