Thus spake Andy Kunz (montana@FAST.NET):

> It's when the tokenizer can't figure out what a character is as soon as it
> sees it that things get touchy. For example, if we were to support "$" as
> both denoting a hex value and the PC, we have to "look ahead" to see the
> next character before we know what to do.
>
>     movlw $0c
>     goto $-1

Yes, but single-character lookahead is required anyway - otherwise you would
not even be able to parse 1+1. It's best to implement your own level of get
and unget rather than relying on ungetc() or equivalent. Apart from anything
else, it makes life much easier if you read a whole line of input at one time.

Lexical analysis can be tricky at times; consider the following fragment of
C code:

    char a[] = {
        0x0E+1,
        0x0D+2,
    };

One of these is a legal expression in ANSI C, the other is not.

Cheers, Clyde

--
Clyde Smith-Stubbs       | HI-TECH Software,       | Voice: +61 7 3354 2411
clyde@htsoft.com         | P.O. Box 103, Alderley, | Fax:   +61 7 3354 2422
http://www.htsoft.com    | QLD, 4051, AUSTRALIA.   |
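
As a footnote to the get/unget suggestion above, here is a minimal sketch in C of a private get/unget layer over a whole line of input, used to decide what a "$" means with one character of lookahead. All names (LineReader, lr_get, lr_unget, classify_dollar) are hypothetical and not taken from any real assembler, and the rule "a hex digit after $ means a hex value, anything else means the PC" is an assumption made for illustration only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names throughout; a sketch, not any real assembler's code. */

typedef struct {
    const char *line;  /* one whole line of input, NUL-terminated */
    size_t pos;        /* index of the next unread character */
} LineReader;

enum { LR_EOL = -1 };  /* returned by lr_get() at end of line */

typedef enum { TOK_HEX, TOK_PC } DollarKind;

static void lr_init(LineReader *r, const char *line)
{
    r->line = line;
    r->pos = 0;
}

/* Return the next character, or LR_EOL at end of line.
   The position does not advance past the terminating NUL. */
static int lr_get(LineReader *r)
{
    unsigned char c = (unsigned char)r->line[r->pos];
    if (c == '\0')
        return LR_EOL;
    r->pos++;
    return c;
}

/* Step back one character. Because the whole line is in memory,
   this can be called as many times as there are consumed characters. */
static void lr_unget(LineReader *r)
{
    assert(r->pos > 0);
    r->pos--;
}

/* Call after a '$' has been consumed. Peeks at the following character:
   a hex digit means a hex value ($0c), anything else means the PC ($-1).
   Assumed rule, for illustration only. */
static DollarKind classify_dollar(LineReader *r)
{
    int c = lr_get(r);
    if (c != LR_EOL)
        lr_unget(r);  /* peek only: leave the character for the next token */
    return (c != LR_EOL && isxdigit(c)) ? TOK_HEX : TOK_PC;
}
```

With the line held in a buffer, unget is just a decrement of the index, so the amount of pushback is limited only by how much of the line has been read, whereas ungetc() guarantees only a single character.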