CIS 262 Automata, Computability, and Complexity
Fall 2015
http://www.seas.upenn.edu/~cse262/
Instructor: Rajeev Alur (alur@cis.upenn.edu)
Lecture: Aug 27, 2015

Evolving Computing Platforms
Eniac (1946): 5 thousand operations per second
Tianhe-2 supercomputer (2015): 33.86 quadrillion floating point operations per second

Evolving Programming Languages
Fortran (1957): first widely used compiled high-level language
Javascript (1995): most popular language as of today

Theory of Computation
Should we judge how good a computer scientist you are simply by how many programming languages you know, or how fast you can code?
What is the science behind computing?
Once a problem is formalized as a "computational" problem with clearly specified inputs and outputs, and before you write any code, let's understand:
Can the problem be solved at all by a computer?
How efficiently can it be solved?
How can we be convinced that the solution is correct?

Course Goal
Understand the limits of computability, independent of programming languages and computing platforms
Need to define "mathematical" models of computation and study their properties
Surprisingly robust foundation that has withstood changes in technology (e.g. the advent of quantum computers)
Emphasize mathematical/logical thinking: rigorous definitions, theorems, proofs
Course prerequisites: Discrete math (CIS 160), Basic programming (CIS 120)

Problem 1: Character Coverage in Documents
Given a text document, check if each of the vowels a, e, i, o, u appears at least once in the document.
As the algorithm scans the document reading one character at a time, what information should it track?
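Below is a minimal Python sketch of such a one-pass scan. It assumes the document is given as a Python string and that only the lowercase letters a, e, i, o, u count as vowels. The function name `all_vowels_appear` is hypothetical. It keeps one bit per vowel, so the variable `seen` only ever takes one of 2^5 = 32 values, however long the document is.

```python
def all_vowels_appear(document: str) -> bool:
    """Single left-to-right scan; memory is five bits regardless of input length.

    Assumption: only lowercase a, e, i, o, u count as vowels.
    """
    vowels = "aeiou"
    seen = 0  # bit i becomes 1 once vowels[i] has been read
    for ch in document:
        i = vowels.find(ch)  # -1 if ch is not a vowel
        if i >= 0:
            seen |= 1 << i
    # accept only if all five bits are set
    return seen == 0b11111


print(all_vowels_appear("education"))  # True
print(all_vowels_appear("banana"))     # False
```

Because `seen` ranges over only 32 possible values, the same check could be drawn as a machine with 32 states.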
For each vowel, need to know whether or not it has appeared in the document read so far:
Maintain one bit, initialized to 0, for each of the vowels
If the read character is a vowel, set the corresponding bit to 1
In the end, check if all bits are 1

Problem 2: Character Count in Documents
Given a text document, check if the numbers of occurrences of the characters a and e are the same.
As the algorithm scans the document reading one character at a time, what information should it track?
Track the difference between the number of times a and e have occurred so far:
Count is initially 0
Increment it if the read character is a
Decrement it if the read character is e
In the end, check if the count is 0

Finite-State Computation
Problem 1: All five vowels appear at least once
Need to maintain 5 bits of memory (constant)
A computing device that solves this problem needs only 2^5 = 32 states
Number of states is bounded a priori, independent of input length
Problem 2: Count of a's = count of e's
The value of the variable tracking the difference in the two counts is potentially unbounded
Memory needed for the computation depends on input length
If a computing device has only finitely many states, it cannot solve this problem! (How do we prove such a statement?)

Part A: Regular Languages
Finite automata: formal model of finite-state computation
Class of problems that can be solved = regular languages
Define the model, study its properties
Different characterizations: regular expressions
Techniques for establishing "non-solvability" by finite automata
Why study this topic?
Warm-up for defining models of computation
Beautiful theory with many results!
Special property: computational problems about automata are solvable (e.g. minimizing the number of states needed)
Continues to have practical applications: regular expressions are supported by all modern languages (e.g.
Javascript), checking correctness of distributed protocols, …

Problem 3: Syntactic Checks for Programs
Given a program (written in, say, C), check if its text adheres to all syntax rules (e.g. variables should be declared before use)
Is variable a1 declared earlier?

Problem 4: Finding Bugs in Programs
Given a program (written in, say, C), verify that its execution cannot lead to a semantic error (e.g. absence of buffer overflow errors)
Can the value of index i be more than the size of the array it accesses?

Decidable/Solvable vs Undecidable/Unsolvable
Problem 3: Find syntactic errors in program text
A compiler checks the program text for all such rules and either reports errors or compiles it into binary code
Problem is solvable!
Problem 4: Decide whether or not a (semantic) error will arise when a program executes
Problem is provably unsolvable in general!
There does not exist a compiler that can certify this type of correctness for all programs
Turing Machines: universal model of computation used to formalize what a computer can and cannot do

Alan Turing
Alan Turing (British mathematician, 1912-1954): Father of CS
Turing Machines: mathematical model of computation
Fundamental theorem (Turing, 1936): undecidability of the halting problem for Turing machines
It is not possible to construct a Turing machine that takes as input the description of a machine and decides whether or not the execution of the input machine terminates
Turing also helped design the Bombe, an electromechanical machine used to decrypt messages enciphered with the German Enigma machine during World War 2
See 2014 movie: The Imitation Game

Part B: Computability
Turing Machines: mathematical model of computation
Why is it universal? Can do what any known computer can do!
Insights into how such machines work and their properties
Undecidable/Unsolvable problems:
Halting problem for Turing machines is undecidable
Problem reduction: unsolvability of one problem can imply unsolvability of another!
Different shades of unsolvability; recursive enumerability

Problem 5: Finding Most Connected Person
Given a graph of "friends" connections on Facebook, find the person who has the maximum number of friends
Suffices to check each person one by one and find the number of his/her connections
If there are n people with a total of m connections, then the "time complexity" of the algorithm is roughly linearly proportional to n + m

Problem 6: Finding Largest Mutually Connected Group
Given a graph of "friends" connections on Facebook, find the largest clique (group of people who are all friends of one another)
For a given group of people, it is easy to check if it forms a clique
But for a given k, the number of possible groups of size k is C(n, k), which is at most n^k
The only known bound on k is n, so the number of groups to consider (2^n in total) is exponential in n: checking all groups is too inefficient!

P vs NP
P: class of problems with polynomial-time solutions
Problem 5 belongs to this class
Precise definition uses time complexity of Turing machines
These are "efficiently solvable" or tractable problems
NP-complete: a class of problems with no known efficient solutions
Problem 6 belongs to this class (in its decision form: is there a clique of size at least k?)
Precise definition uses "non-deterministic" Turing machines
No known proof that such a problem cannot be solved efficiently
A large class of commonly occurring problems that are all "equivalent" to one another (if you find a polynomial-time solution for one NP-complete problem, then every problem in NP has a polynomial-time solution)
Cook's Theorem (1971): Propositional satisfiability is NP-complete

Part C: Complexity
Time and space complexity of problems: how much time and memory is needed to solve a given problem
Lower and upper bounds on complexity
Complexity classes of problems: P, NP, PSPACE
Theory of NP-completeness
Cook's theorem: SAT is NP-complete
Problem reductions
THE open problem in computer science/mathematics: Is P = NP?
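The contrast between Problems 5 and 6 can be seen in a short Python sketch. It assumes a hypothetical input format in which each person maps to the set of his/her friends, with friendship symmetric. The function names and the small `friends` graph are made up for illustration.

`most_connected` looks at each person once, which is linear in the input size. `largest_clique` is plain brute force: it tries every group, largest first, and uses the easy pairwise check `is_clique`. That means it may examine exponentially many groups. This is only the naive approach described on the slide, not a recommended clique algorithm.

```python
from itertools import combinations

# Hypothetical input format: person -> set of his/her friends (symmetric).
Graph = dict[str, set[str]]


def most_connected(g: Graph) -> str:
    """Problem 5: check each person once; linear in the size of the input."""
    return max(g, key=lambda person: len(g[person]))


def is_clique(g: Graph, group) -> bool:
    """The easy check: every pair in the group must be friends."""
    return all(v in g[u] for u, v in combinations(group, 2))


def largest_clique(g: Graph) -> tuple[str, ...]:
    """Problem 6 by brute force: try groups of size n, n-1, ..., 1.

    Up to 2^n groups may be examined, so this is exponential in n.
    """
    people = sorted(g)
    for k in range(len(people), 0, -1):
        for group in combinations(people, k):
            if is_clique(g, group):
                return group
    return ()


friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
}
print(most_connected(friends))  # ann
print(largest_clique(friends))  # ('ann', 'bob', 'cat')
```

Note that `most_connected` assumes a non-empty graph, since `max` raises `ValueError` on an empty dict.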
Course Logistics
Resources
Website http://www.seas.upenn.edu/~cse262/
Piazza discussion group
Canvas for grades
See website for list of TAs and office hours
Lecture slides will be posted on Piazza after each class
Recitation: Mondays at 4.30 in DRLB A1 (problem solving)
Textbook: Introduction to the Theory of Computation, Sipser, 2nd/3rd edition
Supplementary handouts (will be posted on Piazza)

Course Logistics
Grades (one third each):
Ten homeworks (weekly)
Two in-class midterms: Sept 29 (part A), Nov 5 (part B)
Final (note date: Dec 18)
Rules
Switch off all devices during lectures!
Exams are open book
Discuss concepts and practice problems with me, TAs, other students, and on Piazza, but work out homework problems on your own!

Reading Homework
Read Chapter 0 of Sipser
Review mathematical notation/concepts for sets and functions
Proof by contradiction
Proof by induction
Recitation on Monday: review of proofs

Encoding Problems
Before we can define mathematical computation models corresponding to machines/computers, we need a way to encode computational problems that is:
Mathematically precise
Simple
General
Observation: an instance of a problem, whether it be a text document or a Facebook friends graph, can always be encoded by a sequence of characters (e.g. 0's and 1's)

Alphabet
Alphabet Σ: finite set of symbols/characters used for encoding
Examples:
Σ = { 0, 1 }
Σ = { a, b }
Σ = { A, C, G, T }
Σ = all characters in the Roman alphabet

Strings
A string w over an alphabet Σ is a finite sequence of symbols in Σ
Example strings over the alphabet Σ = { a, b }:
ε: empty string, i.e., string of length 0
a
b
abb
baaabba
abababababababababababababab
Σ* = set of all strings over Σ
If Σ has m symbols, how many strings of length k are there?
How many strings does Σ* contain?

Operations on Strings
Concatenation of strings: Given strings u and v, u.v denotes their concatenation (sometimes .
is omitted)
A string u is a prefix of string w if there exists a string v such that w = u.v
Prefixes of aab = ε, a, aa, aab
A string u is a suffix of string w if there exists a string v such that w = v.u
Suffixes of aab = ε, b, ab, aab
A string u is a substring of string w if there exist strings v and v' such that w = v.u.v'
Substrings of aaba = ε, a, aa, aab, aaba, ab, aba, b, ba

Languages
A language L over an alphabet Σ is a set of strings over Σ, i.e., L is a subset of Σ*
A problem is encoded as a language L: given an input string w, decide whether or not w is in L
This definition is general enough to encode all "decision problems", that is, problems where the output is either yes or no

Example Language
Σ = { a, b }
L = { w | w ends with the symbol b } = { b, ab, bb, aab, abb, bab, bbb, … }
A machine for L needs to check, given an input string w, whether or not the last symbol of w is b

Example Language
Σ = { A, C, G, T }
L = { w | w contains "ACC" as a substring }
Example string in the language: GTTACCGA
Example string not in the language: ACTGCCATTGTCA

Muddy Children Puzzle
There are some kids playing in a muddy pond. Their teacher walks by and says "some of you have mud on your forehead". Each kid can look around and see which of the other kids have muddy foreheads, but cannot see his/her own forehead. The teacher says "raise your hand if you know that you have a muddy forehead". If nobody raises a hand, the teacher repeats the same question, and so on. If k kids have muddy foreheads, then when the teacher asks the question for the k-th time, exactly those kids with muddy foreheads raise their hands. Figure out why, then formalize your argument and prove it using induction!
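The string operations and the two example languages defined earlier can be checked mechanically. The Python sketch below uses Python `str` values as strings, with `""` standing for ε and slicing playing the role of splitting w = u.v. All function names are hypothetical.

`contains_acc` is written as a finite-state scan with four states. The state is the length of the longest suffix of the input read so far that is also a prefix of "ACC", and state 3 means "ACC" has been seen. This state numbering is one possible choice, not the only one.

```python
def prefixes(w: str) -> list[str]:
    # u is a prefix of w iff w = u.v for some v; "" stands for the empty string
    return [w[:i] for i in range(len(w) + 1)]


def suffixes(w: str) -> list[str]:
    # u is a suffix of w iff w = v.u for some v
    return [w[i:] for i in range(len(w) + 1)]


def substrings(w: str) -> set[str]:
    # w = v.u.v': choose where u starts (i) and where it ends (j)
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def ends_with_b(w: str) -> bool:
    # L = { w | w ends with the symbol b } over the alphabet {a, b}
    return w.endswith("b")


def contains_acc(w: str) -> bool:
    """Membership in L = { w | w contains "ACC" } using only four states."""
    state = 0
    for ch in w:
        if state == 3:
            break  # "ACC" already seen: stay in the accepting state
        if ch == "ACC"[state]:
            state += 1  # extend the partial match
        else:
            state = 1 if ch == "A" else 0  # restart, reusing an "A" if just read
    return state == 3


print(prefixes("aab"))             # ['', 'a', 'aa', 'aab']
print(sorted(substrings("aaba")))  # ['', 'a', 'aa', 'aab', 'aaba', 'ab', 'aba', 'b', 'ba']
print(contains_acc("GTTACCGA"), contains_acc("ACTGCCATTGTCA"))  # True False
```

Unlike `prefixes` or `substrings`, `contains_acc` never stores the input. It remembers only one of four values, which makes it a finite-state computation in the sense of Part A.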