Formal Languages and Theory of Automata

Chapter 1 – Introduction
Basic Concepts of Finite Automata and Languages

Outline
• Introduction
• Alphabets and Strings
• What is a Formal Language?
• What is Automata Theory?
• General Model of Automaton
• Introduction to Finite Automata

Introduction
• The term formal languages refers to languages that can be described by a body of systematic rules, that is, simple and precise mathematical rules.
• Formal language theory as a discipline started with the work of the linguist Noam Chomsky in the 1950s:
– He attempted to give a precise characterization of the structure of natural languages.
– His goal was to define the syntax of languages using simple and precise mathematical rules.
– Later it was found that the syntax of programming languages can be described using one of Chomsky’s grammatical models, the context-free grammars.

Introduction (cont.)
• Formal language theory mainly deals with the mathematical properties of strings and collections of strings.
• Soon after the advent of modern electronic computers, people realized that all forms of information (numbers, names, pictures, sound waves) can be represented as strings.
• Collections of strings, known as languages, then became central to computer science.
• Theoretical computer science studies the mathematical foundations of computer science: it investigates the power and limitations of computing devices using abstract mathematical models of computers.

Introduction (cont.)
• Models should be as simple as possible so that they can be easily analyzed, yet powerful enough to perform the relevant computations.
– The model ignores the implementation details of individual computers and concentrates on the actual computation process.
• Real computers are modeled with a very simple abstract mathematical model called the Turing machine.
– Using it, we can prove whether a given computational problem is algorithmically solvable (decidable) or not (undecidable).
• Automata theory is the study of abstract machines (automata) and of the computational problems that can be solved using them.

Introduction (cont.)
• An automaton is an abstract model of a machine that performs computation on an input.
• Automata theory and formal language theory are closely related: an automaton is a finite representation of a formal language, which may be an infinite set.
• Throughout the course, we study the relationship between formal languages and automata by constructing accepting devices for four types of formal languages:
– Regular languages and finite automata
– Context-free languages and pushdown automata
– Context-sensitive languages and linear-bounded automata
– Recursively enumerable languages and Turing machines

Definitions
• Symbol – An atomic unit, such as a digit, character, or lowercase letter; sometimes a whole word. [Formal language theory does not deal with the “meaning” of the symbols.]
• Alphabet – A finite set of symbols, usually denoted by Σ. Examples: Σ = {0, 1}, Σ = {0, a, 4}, Σ = {a, b, c, d}.
• String – A finite-length sequence of symbols from some alphabet.

Alphabets and strings
• A common way to talk about words, numbers, pairs of words, etc. is to represent them as strings.
• To define strings, we start with an alphabet: an alphabet is a finite set of symbols.
• Examples
Σ1 = {a, b, c, d, …, z}: the set of letters in English
Σ2 = {0, 1, …, 9}: the set of (base 10) digits
Σ3 = {a, b, …, z, #}: the set of letters plus the special symbol #
Σ4 = {(, )}: the set of open and closed brackets

Strings
A string over an alphabet Σ is a finite sequence of symbols in Σ.
• The empty string is denoted by ε (epsilon).
• Examples
abfbz is a string over Σ1 = {a, b, c, d, …, z}
9021 is a string over Σ2 = {0, 1, …, 9}
ab#bc is a string over Σ3 = {a, b, …, z, #}
))()(() is a string over Σ4 = {(, )}

Strings
Let u = ε, w = 0110, x = aabcaa, y = 0aa, z = 111.
• Concatenation: wz = 0110111
• Length: |w| = 4 and |x| = 6, but |u| = 0
• Reversal: y^R = aa0

Some special sets of strings
• Σ*: all strings of symbols from Σ, including ε
• Σ+ = Σ* - {ε}: all nonempty strings
Example: Σ = {0, 1}
• Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, 001, …}
• Σ+ = {0, 1, 00, 01, 10, 11, 000, 001, …}

What is a Formal Language?
A (formal) language is a set of strings over an alphabet, i.e. any subset L of Σ*.
• Formal languages are grouped into classes (e.g., regular, context-free, context-sensitive, recursive, recursively enumerable).
• Formal languages are related to programming languages and natural languages.

Formal Language Examples
Σ = {0, 1}
L = {x | x is in Σ* and x contains an even number of 0’s} = {00, 010, 11100, 00100, …}

Σ = {0, 1, 2, …, 9, .}
L = {x | x is in Σ* and x forms a finite-length real number} = {0, 1.5, 9.326, …}

Σ = {a, b, c, …, z, A, B, …, Z}
L = {x | x is in Σ* and x is a C++ reserved word} = {while, for, if, int, …}

Formal Language Examples (cont.)
Σ = {C++ reserved words} ∪ {(, ), ., :, ;, …} ∪ {legal C++ identifiers}
L = {x | x is in Σ* and x is a syntactically correct C++ program}

Σ = {English words}
L = {x | x is in Σ* and x is a syntactically correct English sentence}

What is automata theory?
• Automata theory is the study of abstract computational devices.
• Abstract devices are (simplified) models of real computations.
• Computations happen everywhere: on your laptop, on your cell phone, in nature, …
• Why do we need abstract models?
A simple “computer”
[Figure: circuit with a battery, a switch, and a light bulb]
input: switch
output: light bulb
actions: flip switch
states: on, off

A simple “computer” (cont.)
[Diagram: start state off; f takes off to on and on to off]
input: switch
output: light bulb
actions: f for “flip switch”
states: on, off
The bulb is on if and only if there is an odd number of flips.

Another “computer”
[Diagram: circuit with a battery and switches 1 and 2; four states, three labeled off and one labeled on, with transitions labeled 1 and 2]
inputs: switches 1 and 2
actions: 1 for “flip switch 1”, 2 for “flip switch 2”
states: on, off
The bulb is on if and only if both switches were flipped an odd number of times.

A design problem
[Figure: a battery, five switches numbered 1 to 5, and an unknown circuit marked “?”]
Can you design a circuit where the light is on if and only if all the switches were flipped exactly the same number of times?

A design problem (cont.)
• Such devices are difficult to reason about, because they can be designed in an infinite number of ways.
• By representing them as abstract computational devices, or automata, we will learn how to answer such questions.

These abstract devices can model many things
• They can describe the operation of any “small computer”, like the control component of an alarm clock or a microwave.
• They are also used in lexical analyzers to recognize well-formed expressions in programming languages:
ab1 is a legal name of a variable in C++
5u= is not

Different kinds of automata
• This was only one example of a computational device, and there are others.
• We will look at different devices and ask the following questions:
– What can a given type of device compute, and what are its limitations?
– Is one type of device more powerful than another?

Some devices we will see
• Finite automata: devices with a finite amount of memory. Used to model “small” computers.
• Push-down automata: devices with infinite memory that can be accessed in a restricted way. Used to model parsers, etc.
• Turing machines: devices with infinite memory. Used to model any computer.

Preliminaries of automata theory
• How do we formalize the question “Can device A solve problem B?”
• First, we need a formal way of describing the problems that we are interested in solving.

Problems
• Examples of problems we will consider:
– Given a word s, does it contain the subword “fool”?
– Given a number n, is it divisible by 7?
– Given a pair of words s and t, are they the same?
– Given an expression with brackets, e.g. (()()), does every left bracket match with a subsequent right bracket?
• All of these have “yes/no” answers.
• There are other types of problems that ask “Find this” or “How many of that”, but we won’t look at those.

General Model of Automaton
• An automaton is defined as a system where energy, materials and information are transformed, transmitted and used for performing some functions without direct participation of man.
• Examples are automatic machine tools, automatic packing machines, and automatic photo printing machines.

General Model of Automaton (cont.)
• In computer science the term 'automaton' means 'discrete automaton' and is defined in a more abstract way, as shown in the figure below.
[Figure: general model of a discrete automaton]

General Model of Automaton (cont.)
• The characteristics of an automaton are:
4) State relation: The next state of an automaton at any instant of time is determined by the present state and the present input.
5) Output relation: The output is related either to the state only or to both the input and the state.
• Note that at any instant of time the automaton is in some state.
• On 'reading' an input symbol, the automaton moves to a next state, which is given by the state relation.

General Model of Automaton (cont.)
• We can note the following with regard to the output relation:
i. An automaton in which the output depends only on the input is called an automaton without a memory.
ii. An automaton in which the output depends on the states as well is called an automaton with a finite memory.
iii. An automaton in which the output depends only on the states of the machine is called a Moore machine.
iv. An automaton in which the output depends on the state as well as on the input at any instant of time is called a Mealy machine.

Finite Automata
Example of a finite automaton
[Diagram: states off and on; f takes off to on and on to off]
• There are states off and on; the automaton starts in off and tries to reach the “good state” on.
• What sequences of fs lead to the good state?
• Answer: {f, fff, fffff, …} = {f^n : n is odd}
• This is an example of a deterministic finite automaton over the alphabet {f}.

Deterministic finite automata
• A deterministic finite automaton (DFA) is a 5-tuple (Q, Σ, δ, q0, F) where
– Q is a finite set of states
– Σ (sigma) is an alphabet
– δ: Q × Σ → Q is a transition function
– q0 ∈ Q is the initial state
– F ⊆ Q is a set of accepting states (or final states).
• In diagrams, the accepting states will be denoted by double circles.

Deterministic finite automata (cont.)
• Initially, the automaton is assumed to be in the initial state q0, with its input mechanism on the leftmost symbol of the input string.
• During each move of the automaton, the input mechanism advances one position to the right, so each move consumes one input symbol.
• When the end of the string is reached, the string is accepted if the automaton is in one of its final states; otherwise the string is rejected.

Example
[State diagram: q0 loops on 0 and goes to q1 on 1; q1 loops on 1 and goes to q2 on 0; q2 loops on 0 and 1]

State transition diagram
For every transition rule δ(qi, a) = qj, the graph has an edge (qi, qj) labeled a. The vertex associated with q0 is called the initial vertex, while those labeled with qf are the final vertices.

Transition table
alphabet Σ = {0, 1}
set of states Q = {q0, q1, q2}
initial state q0
accepting states F = {q0, q1}

            inputs
states      0       1
q0          q0      q1
q1          q2      q1
q2          q2      q2

Language of a DFA
The language of a DFA (Q, Σ, δ, q0, F) is the set of all strings over Σ that, starting from q0 and following the transitions as the string is read left to right, reach some accepting state.
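The acceptance rule just stated can be tried out on the three-state example. The following is a minimal Python sketch, not part of the original slides: the function name `accepts` and the dictionary encoding of δ are illustrative assumptions, while the states, transitions, and accepting set are copied from the transition table above.

```python
def accepts(delta, start, accepting, string):
    """Return True if the DFA ends in an accepting state after reading string."""
    state = start
    for symbol in string:
        # Each move consumes exactly one input symbol, read left to right.
        state = delta[(state, symbol)]
    return state in accepting


# Transition function of the example DFA, keyed by (state, input symbol).
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q2", ("q1", "1"): "q1",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}
START = "q0"
ACCEPTING = {"q0", "q1"}

if __name__ == "__main__":
    for s in ["", "0011", "10", "0101"]:
        print(repr(s), accepts(DELTA, START, ACCEPTING, s))
```

With this table, q2 can never be left once it is entered, and it is entered exactly when a 0 is read after a 1. The sketch therefore accepts the strings in which no 0 follows a 1, such as the empty string and 0011, and rejects 10 and 0101.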
M: [Diagram: the two-state automaton with states off and on and transitions labeled f]
• The language of M is {f, fff, fffff, …} = {f^n : n is odd}

Examples
[Figure: three DFA state diagrams over the alphabet {0, 1}, with states q0, q1 and, in the last diagram, q2]
What are the languages of these DFAs?

Examples
• Construct a DFA that accepts the language L = {010, 1} (Σ = {0, 1}).
• Answer, first step: name the states after the prefixes read so far, starting from qε:
δ(qε, 0) = q0, δ(q0, 1) = q01, δ(q01, 0) = q010, δ(qε, 1) = q1
• Answer, completed: send every remaining transition to a dead state qdie:
δ(q0, 0) = qdie, δ(q01, 1) = qdie, and δ(q010, a) = δ(q1, a) = δ(qdie, a) = qdie for a in {0, 1}
The accepting states are q1 and q010, which correspond to the two strings in L.

Examples
• Construct a DFA over alphabet {0, 1} that accepts all strings that end in 101.
• Hint: The DFA must “remember” the last 3 bits of the string it is reading.
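One way to follow the hint is to let each state be the last (up to three) bits read so far. The Python sketch below builds that DFA mechanically; it is an illustration under this assumption, not the answer given in the slides, and the names `STATES`, `DELTA`, and `accepts` are hypothetical. The construction yields 15 states, which is more than necessary, but it mirrors the hint directly.

```python
from itertools import product

ALPHABET = "01"

# One state per bit string of length 0 to 3; the empty string is the start state.
STATES = ["".join(bits) for n in range(4) for bits in product(ALPHABET, repeat=n)]

# Reading symbol a in state q appends a and keeps only the three most recent bits.
DELTA = {(q, a): (q + a)[-3:] for q in STATES for a in ALPHABET}

START = ""
ACCEPTING = {"101"}


def accepts(string):
    """Return True if the input string over {0, 1} ends in 101."""
    state = START
    for symbol in string:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


if __name__ == "__main__":
    for s in ["101", "0101", "1010", ""]:
        print(repr(s), accepts(s))
```

Because the state always equals the last three bits of the input (or the whole input if it is shorter), the automaton accepts exactly when those bits are 101. A smaller DFA for the same language exists; finding it is a good follow-up exercise.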