Class 1

What is this course all about?

We will look at languages in the abstract sense: alphabets, grammar rules, and the machines that recognize the languages. There are four classes of languages usually studied in formal language theory: regular (Type 3), context-free (Type 2), context-sensitive (Type 1), and recursively enumerable or unrestricted (Type 0). Each language type has a different set of rules that its grammars must obey, and each is recognized by a different kind of machine. We will focus primarily on the regular and context-free languages.

We also consider more abstract concepts such as decidability: given a particular question, is it possible to give a correct answer for every instance of the problem? In other words, an algorithm that solves the problem must return a correct answer for every possible instance of the problem.

The classic undecidable problem is the halting problem. It is typically proven for Turing machines: given a Turing machine M and an input string w, does M halt when run on w? Basically, the problem asks: given a program and an input for that program, will the program halt when run on that input? (This would be a handy tool to have, since it would let you avoid infinite loops in a program.)

Suppose such a tool existed. We could think of it as a function called HALT that takes a program and its input and decides (correctly) whether the program will halt on that input. Let P be a program that contains the function HALT. Recall that the input to P is a program and its input. Here’s how P operates:

    Program P (program, input, output)
        read program and its input
        answer ← HALT(program, input)
        if the answer is “no”, then
            stop
        else {answer is “yes”}
            i ← 2
            while i < 5
                i ← i + 0
            end

Now suppose the input to program P is a copy of P itself, together with some input for it. Consider what happens. If HALT says “yes”, it means P halts and does not go into an infinite loop; but by the way P is designed, “yes” is exactly the situation in which P goes into an infinite loop. Likewise, if HALT says “no”, it means P does not halt, yet in that case P stops immediately.
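The construction can be tried out on concrete candidates. The following Python sketch is illustrative only: `refute` and `candidate_halt` are hypothetical names that do not come from the course material, and the sketch assumes the candidate is a deterministic function that always returns True or False. It builds P around any claimed HALT function and reports how the candidate answers wrongly about P run on a copy of itself.

```python
def refute(candidate_halt):
    """Show that candidate_halt(program, data) is wrong on at least one input."""

    def p(program, data):
        # Program P from the text, built around the candidate HALT function.
        if not candidate_halt(program, data):
            return "stopped"          # answer "no": stop
        i = 2                         # answer "yes": loop forever
        while i < 5:
            i = i + 0

    # Ask the candidate about P run on a copy of itself.
    if candidate_halt(p, p):
        # Running p(p, p) would take the "yes" branch and never return,
        # so the contradiction is reported without actually running it.
        return "candidate said yes, but P loops forever"

    # The candidate said "no", so p(p, p) takes the "no" branch and returns.
    assert p(p, p) == "stopped"
    return "candidate said no, but P stopped"


print(refute(lambda program, data: True))
print(refute(lambda program, data: False))
```

Whatever the candidate answers, it is wrong about this one input, which is the informal content of the argument. This is a demonstration on two trivial candidates, not a proof.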
Basically, what we have is that program P goes into an infinite loop if and only if P halts. This is a contradiction, so no such algorithm HALT can exist. In other words, it is not decidable whether an arbitrary program will halt on a given input. That is, there are some questions that cannot be answered by a computer. This is the basic idea behind decidability. We’ll see more of this later when we ask questions like “Is the complement of a context-free language also context-free?”

Now, let’s define some terms we will be using throughout the semester. In order to define a language we need to begin with an alphabet: a finite set of symbols, denoted by Σ. From the individual symbols, strings (finite sequences of symbols) are constructed. A language is any set of strings over an alphabet. The length of a string w, denoted |w|, is the number of symbols in it. The empty string (length 0) is denoted by ε.

Other terms we use are concatenation (“hooking” two strings together), reversal (reversing the order of the symbols in the string), substring (a contiguous sequence of symbols from a string), prefix (a substring that begins with the first symbol, if nonempty), and suffix (a substring that ends with the last symbol, if nonempty).

For example, consider the strings x and y below:

    x = abba
    y = bbba

They can be concatenated in two ways:

    xy = abbabbba   while   yx = bbbaabba

The reversal of y, denoted y^R, is abbb. The prefixes of x are ε, a, ab, abb, and abba. The suffixes of y are ε, a, ba, bba, and bbba.

The notation Σ* denotes the set of all finite-length strings w that can be obtained by using the symbols in the alphabet Σ. Note that while every string in Σ* has finite length, the set itself is infinite (for any nonempty alphabet Σ).
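The string operations above map directly onto ordinary string handling. The following Python sketch is one way to check the examples; the helper names `prefixes`, `suffixes`, and `strings_up_to` are made up for illustration, and the empty string is written as the empty Python string `''`. Since the set of all strings over an alphabet is infinite, `strings_up_to` lists only the strings up to a chosen length.

```python
from itertools import product


def prefixes(w):
    """All prefixes of w, shortest first, including the empty string."""
    return [w[:i] for i in range(len(w) + 1)]


def suffixes(w):
    """All suffixes of w, shortest first, including the empty string."""
    return [w[i:] for i in range(len(w), -1, -1)]


def strings_up_to(alphabet, max_len):
    """All strings over the alphabet with length 0..max_len, by length."""
    result = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            result.append("".join(symbols))
    return result


x, y = "abba", "bbba"
print(x + y)                    # concatenation xy: abbabbba
print(y + x)                    # concatenation yx: bbbaabba
print(y[::-1])                  # reversal of y: abbb
print(len(x))                   # length |x|: 4
print(prefixes(x))              # ['', 'a', 'ab', 'abb', 'abba']
print(suffixes(y))              # ['', 'a', 'ba', 'bba', 'bbba']
print(strings_up_to("ab", 2))   # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Raising `max_len` by one multiplies the number of new strings by the size of the alphabet, which gives a feel for why the full set never ends even though each string in it is finite.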