Contributor: SWAG SUPPORT TEAM Type RW_toKEN = Record token_str :String[9]; token_cod :toKEN_CODE; end; RW_Type = Array[0..9] of RW_toKEN; RWT_PTR = ^RW_Type; Const NULL = ''; Rw_2 :RW_Type = ((token_str : 'do'; token_cod : tdo), (token_str : 'if'; token_cod : tif), (token_str : 'in'; token_cod : tin), (token_str : 'of'; token_cod : tof), (token_str : 'or'; token_cod : tor), (token_str : 'to'; token_cod : tto), (token_str : NULL; token_cod : NO_toKEN), (token_str : NULL; token_cod : NO_toKEN), (token_str : NULL; token_cod : NO_toKEN), (token_str : NULL; token_cod : NO_toKEN) ); ...the difference being the explicit declaration of the Constant Record fields. (I'm used to Array Constants, not Record Constants - I was unaware of the requirement) PARSinG NUMBERS Now we'll concentrate on parsing Integer and Real numbers. The Pascal definition of a number begins With an UNSIGNED Integer. An unsigned Integer consists of one or more consecutive DIGITS. The simplest Form of a number token is an unsigned Integer: 1 9 120 12654 A number token can also be an unsigned Integer (the whole part) followed by a fraction part. A fraction part consists of a decimal point followed by an unsigned Integer, such as: 123.45 0.9987564 These numbers have whole parts 123 and 0 respectively, and fraction parts .45 and .9987564 respectively. A number token can also be a whole part followed by an EXPONENT part. An exponent part consists of an "E" (or "e") followed by an unsigned Integer. An optional exponent sign, + or -, can appear between the letter and the first exponent digit. Examples: 134e2 2E99 123e-45 73623E+4 Finally, a number token can be a whole part followed by a fraction part and an exponent part, in that order: 2.3498E7 0.00034e-66 I arbitrarily limit the number of digits to 20, and the exponent value from -37 to +37 - the exact value necessary to limit this value is dependant on how Real values are represented on the Computer. The "get_number" Function is likely to be the biggest Function in your scanner, but it should be relatively straighForward to code...in light of what has already been done With the scanner/ tokenizer module, and the definition of a number. EXERCISE #1 Write the get_number Function to parse Integers and Real numbers. You will need to add the following Types and Variables to your global data segment: Type { add "Real"s to list... } LITERAL_Type = (Integer_LIT, Real_LIT, String_LIT); LITERAL_REC = Record Case lType:LITERAL_Type of Integer_LIT: (ivalue :Integer); Real_LIT : (rvalue :Real ); String_LIT : (svalue :String ); end; Var digit_count :Word; count_error :Boolean; -------------- PART 2 --------------------------------------- The rest of this post will cover two simple topics - parsing Strings inside quotes, and parsing comments. PARSinG COMMENTS {} The Compiler should ignore the input between two curly braces ({}), and the curly braces themselves. My scanner is written so the entire comment is replace by a Single blank (" "), although you could possibly Write the scanner so that comments are _totally_ ignored. EXERCISE #2: Integrate COMMENT detection into the get_Char routine, so that when your Character fetching routine will ignore comments and pass a blank when a comment is encountered, skipping the comment entirely For the next fetch. Make sure that the routine keeps reading Until the right curly brace is detected, even past the end-of-line. if the end-of-File is encountered beFore the right curly brace is found, an "unexpected end" error should be generated. PARSinG StringS (QUOTES) '' The quote Character delimits Strings, any Character between the Strings is ignored by the Compiler, except to stored as a String LITERAL. if you wish a ' (quote) to be included in the literal, and extra ' must precede it. One possible tricky area is the {} (comment) Character. You must be careful not to inadvertently trigger the comment routine within the quote routine While reading a String, otherwise you will have a BUG. EXERCISE #3: Add a quote routine to the get_token routine within your module, to fetch Strings, as a LITERAL IDENTifIER when the QUOTE Character is detected. The following mods to your Types are required: Eof_Char = #$7F; Type Char_CODE = (LETTER, DIGIT, QUOTE, SPECIAL, Eof_CODE); { The following code init's the Character maping table: } Var ch :Byte; begin For ch := 0 to 255 do Char_table[ch] := SPECIAL; For ch := ord('0') to ord('9') do Char_table[ch] := DIGIT; For ch := ord('A') to ord('Z') do Char_table[ch] := LETTER; For ch := ord('a') to ord('z') do Char_table[ch] := LETTER; Char_table[ord(Eof_Char)] := Eof_CODE; Char_table[39] := QUOTE; end; ---------------------------------------------------------------- PLEASE, please let me know what you think about these posts, even if they're negative - I want to have some feedback on the difficulties, and whether or not people are having trouble following the material - I _can_ be more concise at the cost of being more verbose - if it's needed! if you are having problems With your source code, and want me to do a detailed examination of your code, expecially if it's written in a language other than Pascal, send me email via the Internet - to avoid "carpet bombing" the conference with undesired material. NEXT POST: Error codes, and putting your code to the test - our first utility (other than the lister) : a source Program Compactor (not cruncher). FUTURE POSTS: - Review and (hopefully) a status report from "students" - Symbol table - YA utility (cross - referencer) - YA utility (source Program CRUNCHer) - YA utility (source Program UNcruncher) - Parsing simple expressions - Utility : CALC, using infix-to-postfix conversions and stack ops. - Parsing statements - Utility: Pascal syntax checker part I - Parsing declarations (Var, Type, etc) incl's: much improved (and much more Complex) symbol table - Utility: Declarations analyzer. - Syntax Checker part II - Parsing Program, Procedure, and Function declarations (routines). - Syntax checker Part III - Review and discussion?