Thus spake Andy Kunz (montana@FAST.NET):

> It's when the tokenizer can't figure out what a character is as soon as it
> sees it that things get touchy.  For example, if we were to support "$" as
> both denoting a hex value and the PC, we have to "look ahead" to see the
> next character before we know what to do.
>
>         movlw   $0c
>         goto    $-1

Yes, but single-character lookahead is required anyway; otherwise you would
not even be able to parse 1+1, since the tokenizer only knows the first
number has ended when it reads the '+'. It's best to implement your own
get and unget layer rather than relying on ungetc() or equivalent. Apart
from anything else, it makes life much easier if you read a whole line of
input at one time.

Lexical analysis can be tricky at times; consider the following
fragment of C code:

char    a[] = {
        0x0E+1,
        0x0D+2,
};

One of these is a legal expression in ANSI C; the other is not.

Cheers, Clyde

--
Clyde Smith-Stubbs    | HI-TECH Software,       | Voice: +61 7 3354 2411
clyde@htsoft.com      | P.O. Box 103, Alderley, | Fax:   +61 7 3354 2422
http://www.htsoft.com | QLD, 4051, AUSTRALIA.   |