Using Finite State Machines

BYTE magazine Oct 1979

David E Cortesi, 2340 Tasso St, Palo Alto CA 94301

I was pleased to see a good introductory article on the use of finite state machines appear in BYTE (see "Designing a Command Language" by G A Van den Bout, BYTE, June 1979, page 176). I have found the finite state machine (or finite state automaton, or just FSA) to be a valuable tool in my programmer's toolkit. The finite state machine is an aid to organizing one's thoughts while designing, a good way of producing a really unam- biguous specification document, and as an implemented program it can yield very efficient and reliable code. The finite state machine has long been a plaything of the theoreticians of computer science; you can find it described and analyzed in any textbook on compiler design (it is a good textbook if you can understand the description!). Unfortunately the finite state machine rarely moves out of the textbook and into practical programs. I would like to extend Van den Bout's article with 2 examples from my own experience as a professional programmer that show how the finite state machine solved difficult programming problems in the real world. The first case arose during the design of a timesharing system that was to have a large number of commands. The syntax of the command language was laid down ear- ly in the project, but the specification of the commands themselves kept changing. If I and my colleagues had tried to write detailed code to parse each of the many commands and operands, especially in the face of chang- ing specifications, we would have been swamped. We had to do something to systematize the command-parsing code. We hit on the idea of using finite state machines represented as directed graphs (like the figures in the previous BYTE article). Since we were using a macro-assembler, we created NODE and ARC macroinstructions so that we could "draw" the graph of a command by writing a series of macro calls. Listing 1 shows how some of the chess game commands in the prior article might look in such a macrolanguage. Each macroinstruction assembled to a small group of constants. We thought of these groups as the machine language of an imaginary finite state computer. We then wrote a finite state interpreter which could process these machine instructions. This interpreter program took as its input: (1) the top node of a graph; (2) the tokenized command line from the user; and (3) a small working storage area where semantic routines could leave their Listing 1: A graph representation of a finite state machine as it might look drawn with a macroassembler. The macroinstructions would assemble to machine language for a hypothetical finite state computer; that in turn would be simulated by an interpreter.

Sl	ANODE		; TOP NODE, ARCS SELECT VERB-TOKENS
	ARC TOKEN=KWD.VALUE='MOVE',NEXT=S2M
	ARC TOKEN=KWD.VALUE='CAP',NEXT=S2C 
	ARC TOKEN=KWD.VALUE='TAKE',NEXT-Sll 
	ZNODE 'MOVE, CAP, OR TAKE?' 
S2M 	ANODE VERB=1	; SET VERB-CODE OF MOVE 
	ARC TOKEN=ANY,NEXT=S2
S2C	ANODE VERB=E	; SET VER3-CODE OF CAP 
			; COMMON GRAPH F0R MOVE AND CAP 
S2 	ANODE 
	ARC TOKEN=KWD,VALUE='FROM',NEXT=S3 
	ARC TOKEN=KWD,VALUE='TO',NEXT=58 
	ZNODE '?? PLEASE SAY TO OR FROM' 
			; GRAPH OF 'FROM XX TO YY' PART 
S3	ANODE
	ARC TOKEN=POS,SEMACT=FRPOS,NEXT=S4
	ZNODE 'A POSITION MUST FOLLOW FROM'
S4	ANODE 
	ARC TOKEN=KWD.VALUE='TO',NEXT=S5
	ZNODE 'FROM XX -- EXPECTING TO'
S5	ANODE 
	ARC TOKEN=POS,SEMACT=TOPOS,NEXT=S6 
	ZNODE 'A POSITION MUST FOLLOW TO' 
			; GRAPH OF 'TO XX FROM YY' VARIANT
S8	ANODE ARC TOKEN=POS,SEMACT=TOPOS,NEXT=S9 
	ZNODE 'A POSITION MUST FOLLOW TO' 
S9	ANODE ARC TOKEN=KWD.VALUE='FROM',NEXT=Sl0
	ZNODE 'TO XX -- EXPECTING FROM' 
S10	ANODE ARC TOKEN=POS,SEMACT=FRPOS,NEXT=S6 
	ZNODE 'A POSITION MUST FOLLOW FROM' 
			; END-CHECK FOR MOVE AND CAP 
	ARC TOKEN=END 	; OMITTED tlEXT- MEANS 'ALL DONE' 
	ZNODE 'EXTRA OPERAND' 
Sll	ANODE VERB=3	; SET VERB-CODE OF TAKE 
... etc, etc, etc. 

Table 1: A finite state machine for processing numeric constants, represented as an array. Each row is a state of the machine; a column is selected by the next input token. At the intersection is the row number for the next step, and the name of an action to be done.

A0: do nothing 
Al: note negitive 
A2: collect intrger digit 
A3: note rational 
A4: note exponential 
A5: collect fraction digit 
A6: note negitive exponent 
A7: collect exponent digit

El: number(?) is <E>.. 
E2: number is null 
E3: <sign><sign>... 
E4: <sign><E>.. 
E5: <sign><end> 
E6: 0..<sign>.. 
E7: <digit>..<sign>.. 
E8: ..<decimal><sign>.. 
E9: ..<decimal><decimal>.. 
E10: ..<E><decimal>.. 
Ell: ..<E><E>.. 
E12: ..<e><end> 
E13: ..<E><sign><sign>.. 

Zl: exit, value is zero 
ZZ: exit, value is integral 
Z3: exit for rational 
Z4: exit for exponential


row

Input token

"+"

"-"

"0"

"1...9"

"."

"E"

<end>

12/A02/A13/A04/A25/A30/E10/E2
20/E30/E33/A04/A25/A30/E40/E5
30/E60/E63/A04/A25/A36/A40/Z1
40/E70/E74/A24/A25/A36/A40/Z2
50/E80/E85/A55/A50/E96/A40/Z3
67/A07/A67/A07/A70/E100/E110/E12
70/E130/E137/A77/A70/E100/E110/Z4