Finite-State Machines Derived from the on-line notes by Dr. Matt Stallmann & Suzanne Balik. A finite-state machine (FSM) is an abstract model of a system (physical, biological, mechanical, electronic, or software). An FSM consists of • States: a finite number of states that represent the internal "memory" of the system. • Inputs to the system (e.g., a user doing something, a character read from the input) • Transitions, which represent the "response" of the system to its environment. Transitions depend upon the current state of the machine as well as the current input and often result in a change of state. • An initial state, which is the state the system is in before it accepts any input. • Final states, states that represent a legal end to a transaction. Simple example: A soda-pop machine Consider an FSM that models a soda-pop machine that dispenses sodapop for 30 cents. The possible inputs to the machine are n - nickel, d - dime, q - quarter, s - select soda. The states of the machine are designated by circles. Each state is labeled with the amount of money that has been deposited so far. State 00 is the initial or start state. This is indicated by the incoming arrow. States which represent a total input of 30 or more cents are considered final states and are designated by double circles. The transitions from state to state are shown as arcs (lines with arrows). Each transition is labeled with the input that caused it. Let’s take an example. Suppose the user inserts a sequence of coins … q takes the machine to state 25; another q takes it to state 50. It dispenses soda pop & change and transitions back to state 00. At this point, the user could select a drink. The machine would also have to give change. An FSM for an ATM A finite-state machine can also be used to model a simple automated teller machine (ATM). In this case, each transition is labeled with an input— • from the user (such as insert card), or • from a central database (such as PIN OK versus bad PIN). Each of the transitions may, in actuality, involve a number of complex steps which are not shown on the diagram. Recognizing input streams FSMs can also be used to precisely define programming-language syntax. Another application of finite-state machines is as a notation for the precise description of a programming language syntax. Consider this description of real constants in Pascal: A real number in PASCAL is written as a string of digits containing a decimal point. There must be at least one digit before and after the decimal point. Real data may also be represented using PASCAL scientific notation. A real data item written in scientific notation consists of a sign followed by a real number, followed by the letter E, another sign and an integer (+ signs may be omitted). The English description is imprecise. Are the strings .9 and 9. valid, or do you have to say 0.9 and 9.0, respectively? This FSM answers the question: A valid real constant in Pascal is any string that leads the FSM from the start state to a final (double-circled) state. What happens when we feed the FSM .9? Starts out in State 0. There’s no transition for a dot, so that’s an illegal input. What happens when we feed it 9.? Starts out in State 0, and then goes to State 2 and then to State 3. But that’s an error, because State 3 is not an accepting state. What about the string 9.0? Starts out in State 0, and then goes to State 2 and then to State 3, and then Step 4. This is an accepting state, so 9.0 is a legal real constant. (Note that .9 and 9. are valid floating-point constants in Java.) Text Processing with FSMs Finite-state machines are often used in text processing. The grep utility takes a string or regular expression and converts it to an FSM before doing a search. Simplifed example: Suppose a file consists of as and bs only and the search is for the string "abba". Here’s an FSM for doing this search: If, for example, this FSM were used to locate the string "abba" in a file containing "aababbbabba...", it would move through these states: Input: a a b a b b b a b b a State: 0 1 1 2 1 2 3 0 1 2 3 4 The states listed above are the states of the machine after the corresponding input. When the final state 4 is reached, the search is successful. We say that the machine has recognized the string. Exercise: Design an FSM that recognizes a string consisting of alternating as and bs, beginning with an a. If the input consists of anything else (e.g., abababaab, the FSM should continue to read it but not recognize it. Here is how an FSM can be developed and translated directly into a program. Consider the following specifications: • A word is a maximal sequence of zero or more upper- and zero or more lower-case letters. Under this definition, what about the sequence "Go State!"? Is "State!" a word? No; it contains a punctuation mark. Is "tat" a word? No, because it is not maximal. How about "irregardless" Yes, because it’s a sequence of (0 or more) upper & lower-case letters. • wc is the word count, initially 0 • lc is the line count, initially 0 • cc is the character count, initially 0 Here’s an FSM to recognize "words." Note that each transition is labeled with an action as well as an input. In state 0, the FSM remembers that we're not currently in the middle of a word, while state 1 remembers that we are. Question: Why are there no final states in this FSM? It’s not trying to recognize anything; it’s just counting characters. An FSM’s behavior can also be specified by a table. Each row corresponds to a state Each column corresponds to an input symbol (or category of input symbols). The table version for the word counting FSM is given below. State A–Z a–z \n other 0 1: ++wc ++cc 0: ++lc, ++cc 0: ++cc 1 1: ++cc 0: ++lc, ++cc 0: ++cc A standard idiom for translating an FSM into a program is the whileswitch idiom. • A while loop gets one character at a time from the input stream. • Inside the loop is a switch statement that causes different code to be executed based on the current state. Here is the code. import java.io.*; public class WordCounter{ public static void main(String[] args) { if (args.length == 1) { try { BufferedReader br = new BufferedReader( new FileReader(args[0])); int wc = 0, lc = 0, cc = 0; char ch; int state = 0; int next; while ((next = br.read()) != −1) { ch = (char) next; switch (state) { case 0: if ( ch == ’\n’ ) { ++lc; ++cc; } else if ( Character.isLetter(ch) ) { state = 1; ++wc; ++cc; } else ++cc; break; case 1: if ( ch == ’\n’ ) { state = 0; ++lc; ++cc; } else if ( Character.isLetter(ch) ) ++cc; else { state = 0; ++cc; } break; default: System.out.println("Invalid state: " + state); } } System.out.println( lc + "\t" + wc + "\t"+ cc); } catch(IOException e) { System.out.println("File error: " + e); System.out.println( "usage: java WordCounter filename"); } } else System.out.println("usage: java WordCounter filename"); } } Within each case, the program makes a decision based on the current input character. Simplifying the program The program can be simplifed by noting that— • ++cc is done on every transition, • whenever the current input is \n we do ++lc and go to (or stay in) state 0, and • otherwise, the state and counters change only when a letter is encountered in state 0 or a character other than a letter (or a newline) is encountered in state 1. The simplified program looks like: import java.io.*; public class WordCounter{ public static void main(String[] args) { if (args.length == 1) { try { BufferedReader br = new BufferedReader( new FileReader(args[0])); int wc = 0, lc = 0, cc = 0; char ch; int state = 0; int next; while ((next = br.read()) != −1) { ch = (char) next; ++cc; if (ch == ‘\n’) { ++lc; state = 0; } else if (state == 0 && Character.isLetter(ch)) { ++wc; state = 1; } else if (state == 1 && !Character.isLetter(ch)) { state = 0; } } System.out.println( lc + "\t" + wc + "\t"+ cc); } catch(IOException e) { System.out.println("File error: " + e); System.out.println( "usage: java WordCounter filename"); } } else System.out.println("usage: java WordCounter filename"); } } So FSMs can sometimes be "optimized" considerably. But, to avoid bugs, it is best to start with the straightforward implementation! Summary 1. Finite-state machines can be used to model the interaction between a system and its environment. 2. The state of an FSM remembers what has occurred so far. In addition to the FSM state, there may be variables that remember other details. The designer has to use judgment to decide what to model with an FSM state and what to leave as a variable. 3. A transition occurs when an event in the environment causes the system to change state (either the FSM state or variables). 4. A FSM can be depicted either by a bubble diagram or a transition table. 5. In text stream applications each transition corresponds to a single input character. 6. The while-switch idiom gives a method for mechanically translating a FSM to a program. Simplifications can be made by taking advantage of special features of the particular FSM.