Using Finite State Machines

BYTE magazine Oct 1979

David E Cortesi, 2340 Tasso St, Palo Alto CA 94301

I was pleased to see a good introductory article on the use of finite state machines appear in BYTE (see "Designing a Command Language" by G A Van den Bout, BYTE, June 1979, page 176). I have found the finite state machine (or finite state automaton, or just FSA) to be a valuable tool in my programmer's toolkit. The finite state machine is an aid to organizing one's thoughts while designing, a good way of producing a really unam- biguous specification document, and as an implemented program it can yield very efficient and reliable code. The finite state machine has long been a plaything of the theoreticians of computer science; you can find it described and analyzed in any textbook on compiler design (it is a good textbook if you can understand the description!). Unfortunately the finite state machine rarely moves out of the textbook and into practical programs. I would like to extend Van den Bout's article with 2 examples from my own experience as a professional programmer that show how the finite state machine solved difficult programming problems in the real world. The first case arose during the design of a timesharing system that was to have a large number of commands. The syntax of the command language was laid down ear- ly in the project, but the specification of the commands themselves kept changing. If I and my colleagues had tried to write detailed code to parse each of the many commands and operands, especially in the face of chang- ing specifications, we would have been swamped. We had to do something to systematize the command-parsing code. We hit on the idea of using finite state machines represented as directed graphs (like the figures in the previous BYTE article). Since we were using a macro-assembler, we created NODE and ARC macroinstructions so that we could "draw" the graph of a command by writing a series of macro calls. Listing 1 shows how some of the chess game commands in the prior article might look in such a macrolanguage. Each macroinstruction assembled to a small group of constants. We thought of these groups as the machine language of an imaginary finite state computer. We then wrote a finite state interpreter which could process these machine instructions. This interpreter program took as its input: (1) the top node of a graph; (2) the tokenized command line from the user; and (3) a small working storage area where semantic routines could leave their Listing 1: A graph representation of a finite state machine as it might look drawn with a macroassembler. The macroinstructions would assemble to machine language for a hypothetical finite state computer; that in turn would be simulated by an interpreter.

Sl	ANODE		; TOP NODE, ARCS SELECT VERB-TOKENS
	ARC TOKEN=KWD.VALUE='MOVE',NEXT=S2M
	ARC TOKEN=KWD.VALUE='CAP',NEXT=S2C 
	ARC TOKEN=KWD.VALUE='TAKE',NEXT-Sll 
	ZNODE 'MOVE, CAP, OR TAKE?' 
S2M 	ANODE VERB=1	; SET VERB-CODE OF MOVE 
	ARC TOKEN=ANY,NEXT=S2
S2C	ANODE VERB=E	; SET VER3-CODE OF CAP 
			; COMMON GRAPH F0R MOVE AND CAP 
S2 	ANODE 
	ARC TOKEN=KWD,VALUE='FROM',NEXT=S3 
	ARC TOKEN=KWD,VALUE='TO',NEXT=58 
	ZNODE '?? PLEASE SAY TO OR FROM' 
			; GRAPH OF 'FROM XX TO YY' PART 
S3	ANODE
	ARC TOKEN=POS,SEMACT=FRPOS,NEXT=S4
	ZNODE 'A POSITION MUST FOLLOW FROM'
S4	ANODE 
	ARC TOKEN=KWD.VALUE='TO',NEXT=S5
	ZNODE 'FROM XX -- EXPECTING TO'
S5	ANODE 
	ARC TOKEN=POS,SEMACT=TOPOS,NEXT=S6 
	ZNODE 'A POSITION MUST FOLLOW TO' 
			; GRAPH OF 'TO XX FROM YY' VARIANT
S8	ANODE ARC TOKEN=POS,SEMACT=TOPOS,NEXT=S9 
	ZNODE 'A POSITION MUST FOLLOW TO' 
S9	ANODE ARC TOKEN=KWD.VALUE='FROM',NEXT=Sl0
	ZNODE 'TO XX -- EXPECTING FROM' 
S10	ANODE ARC TOKEN=POS,SEMACT=FRPOS,NEXT=S6 
	ZNODE 'A POSITION MUST FOLLOW FROM' 
			; END-CHECK FOR MOVE AND CAP 
	ARC TOKEN=END 	; OMITTED tlEXT- MEANS 'ALL DONE' 
	ZNODE 'EXTRA OPERAND' 
Sll	ANODE VERB=3	; SET VERB-CODE OF TAKE 
... etc, etc, etc.

Table 1: A finite state machine for processing numeric constants, represented as an array. Each row is a state of the machine; a column is selected by the next input token. At the intersection is the row number for the next step, and the name of an action to be done.

A0: do nothing 
Al: note negitive 
A2: collect intrger digit 
A3: note rational 
A4: note exponential 
A5: collect fraction digit 
A6: note negitive exponent 
A7: collect exponent digit

El: number(?) is <E>.. 
E2: number is null 
E3: <sign><sign>... 
E4: <sign><E>.. 
E5: <sign><end> 
E6: 0..<sign>.. 
E7: <digit>..<sign>.. 
E8: ..<decimal><sign>.. 
E9: ..<decimal><decimal>.. 
E10: ..<E><decimal>.. 
Ell: ..<E><E>.. 
E12: ..<e><end> 
E13: ..<E><sign><sign>.. 

Zl: exit, value is zero 
ZZ: exit, value is integral 
Z3: exit for rational 
Z4: exit for exponential


row Input token
"+" "-" "0" "1...9" "." "E" <end>
1 2/A0 2/A1 3/A0 4/A2 5/A3 0/E1 0/E2
2 0/E3 0/E3 3/A0 4/A2 5/A3 0/E4 0/E5
3 0/E6 0/E6 3/A0 4/A2 5/A3 6/A4 0/Z1
4 0/E7 0/E7 4/A2 4/A2 5/A3 6/A4 0/Z2
5 0/E8 0/E8 5/A5 5/A5 0/E9 6/A4 0/Z3
6 7/A0 7/A6 7/A0 7/A7 0/E10 0/E11 0/E12
7 0/E13 0/E13 7/A7 7/A7 0/E10 0/E11 0/Z4

row	Input token
row	"+"	"-"	"0"	"1...9"	"."	"E"	<end>
1	2/A0	2/A1	3/A0	4/A2	5/A3	0/E1	0/E2
2	0/E3	0/E3	3/A0	4/A2	5/A3	0/E4	0/E5
3	0/E6	0/E6	3/A0	4/A2	5/A3	6/A4	0/Z1
4	0/E7	0/E7	4/A2	4/A2	5/A3	6/A4	0/Z2
5	0/E8	0/E8	5/A5	5/A5	0/E9	6/A4	0/Z3
6	7/A0	7/A6	7/A0	7/A7	0/E10	0/E11	0/E12
7	0/E13	0/E13	7/A7	7/A7	0/E10	0/E11	0/Z4