A dBASE NDX file consists of a header block followed by B-tree root, branch. and leaf blocks. The block size is fixed at 512 bytes. The header block contains the root block number, the end-of-file block number, the length of the search key, the search key expression, and a logical flag that indicates whether the index is generated only for unique keys.-The dBASE header block also contains several other fields--The group length of each key entry (described below!, the maximum number of keys per block, and a flag to indicate whether the key expression uses a date variable type (see Listing 1).
Root, branch, and leaf blocks may be in any order in the file. As entries are inserted, what was once a root block may become a branch block. Thus a number (a long) in the header block indicates the current root block in the file. Also, a second number indicates the end-of-file block where new nodes would be added. The key expression can be up to 100 characters and is stored in lowercase in the header block.
Each B-tree block contains a count (two bytes) of the number of keys stored in that block and two bytes of padding, followed by that number of search key groups. Each search key entry consists of a child node block number (long), a record number (long) in the database file, and the search key expression. The group length is the search key length plus 8 bytes for the two longs along with any padding needed to maintain a 4-byte boundary. The padding happens on the end of each key entry. So for a 30-character key expression (our river name for example) each search key entry would be eight bytes (two longs) plus 30 bytes (search key length) plus two bytes of filler for a group length of 40. Blocks within the file start with block one as the first B-tree block. Block zero is the header, so a block number of zero indicates no children (a leaf block). Database record pointers are stored only in the leaf blocks. All numbers and flags are stored in Intel little-endian format.
dBASE III doesn't bother to fill new blocks with nulls before use so you'll often see garbage bytes in any unused parts of a block. This adds to the confusion when inspecting an NDX file generated by dBASE. The Ashton-Tate developers have traded file size for variable alignment. All variables in the B-tree nodes begin on 4-byte boundaries. The worst case for filler would be a key expression of just one byte over a multiple of four (e.g., 29) that adds three bytes to each key entry. But most key expressions would average two bytes of filler (as in our 30-character river name for example).
dBASE IV supports NDX index files compatible with dBASE 111. dBASE IV also supports a new MDX index file formal. MDX index files can contain up to 47 individual index files in a single MDX file. This allows a large number of index files associated with a single DBF file to be grouped together to free file handles under DOS. MDX files can also be specified with differing block sizes--the default block size is 1024. Only a single MDX file can be associated with a DBF file.
On inspecting MDX files. you'll find a directory structure at the beginning of the file followed by a header block and B-tree blocks for each index file entry. The basic NDX B-tree file block structure is mostly preserved inside the MDX structure except that only a single long (either block number or record number) is used with each key entry. This results in smaller index file-. Again since dBASE doesn't clear out blocks with nulls before writing MDX directory and header blocks are filled with garbage except for the few real directory entries. MDX index files are likely to provide a performance improvement over NDX files when file handles are limited.
THE CLIPPER NTX ALTERNATIVE
Nantucket Software's Clipper (Summer '87 and new version 5.0) creates NTX index files. These index files are similar to dBASE NDX files with a few subtle changes (see Listing 2). First. the block size has been increased to 1()24 bytes This can work to an advantage since the minimum DOS cluster size on hard disks is two 512 byte sectors. With this addition. NTX files support a search key expression up to 256 characters. In some cases I suppose that a search key that long is needed. but this case looks like a poor database design to me. A 200- character search key is not going to be any speed demon. and index file sizes are going to be huge.
Clipper NTX files also add several other refinements. The NTX header block contains a pointer (offset) to a chain of free created by extensive deletions. The NTX B-tree blocks also contain an array of offsets (two-bytes) at the start of each block that points to the beginning of each search key entry in the node. This array handles the order of keys within the block. When new entries are inserted into a node. the group is added at the end of existing keys; the array offsets are shuffled to reflect the proper sequential order. Both branch and leaf blocks can contain pointers to the actual database records. This should slightly reduce the size of the index files, since keys in dBASE branch blocks are also duplicated in the leaf nodes. No filler or padding bytes are used. In addition. NTX files store a data variable as ASCII. differing from what dBASE does.
THE FOX IDX CHALLEN6ER
FoxBASE and FoxPro create an IDX indexfile. In this case I had no help from the vendors. But, stubborn fool that I am, I spent several late nights dumping and analyzing enough IDX files (seeing patterns in my sleep) to break the puzzle. Of course. IDX files are similar to NDX files. But Fox Software developers have made some substantial refinements. The basic information is still in the header block--root node. end-of-file, key length, key expression, and unique flag. The block size is still 512 bytes. The header block adds another pointer--I assume it's to free blocks. But within the B-tree blocks are where the changes occur (see Listing 3).
IDX B-tree blocks start with a flag that indicates the kind of block--root branch, or leaf. All of the record numbers are stored in the leaf blocks. Using the block flag, only a single long pointer is stored with each search key. For branch blocks, this long is the leaf block pointer. And within leaf blocks, each long is the record number of the database entry.
The IDX files also differ in that a block with 12 keys will have only 12 pointers to children--one per key expression. Child pointers always point to the block that compares "less than" to the key. The true B-tree structure has one more child pointer than it has keys in the node. Record numbers are stored as long offsets. in big-endian (Motorola) format while pointers and other flags are stored in little-endian (Intel) format.
As in Clipper, no filler or padding bytes are used. If a search key is 29 characters, then the group length would be 33 characters and some key groups would start on odd-byte boundaries.
Also. each leaf bloch contains two additional pointers to the left and right sibling blocks (with FFFFFFFFh denoting no sibling). In the academic jargon, this structure is the real B+tree. This scheme allows fast sequential scanning of a database file (the Skip command in dBASE) in either ascending or descending order. without ever having to trace up and down the tree structure. IDX files trade a smaller file size and more keys per node for potential odd byte boundaries. For example. using our 30-character river name. an IDX B-tree block will hold 13 key groups, compared to a maximum of 12 key groups in NDX files. The doubly-linked list of leaf nodes delivers very fast indexed sequential access.
Fox also stores dates and numbers in the search key in such a way that the key can be searched sequentially from left to right (big-endian). This differs from dBASE which stores dates as a floating point double (8-byte) value and numbers in Intel (little-endian) format. An interesting trade-off with the more compact index files is the possibility of key records starting on odd-byte boundaries. On older and slower 8088-based PC's. there is no penalty for odd-byte alignment. On the faster '86 and 386 machines. however. one experiences a mild penalty.
/*** dBASE III NDX file structures ***/ #define BLOCK_SIZE 512 /* our 3-tree block size */ #define KEY_LENGTH 30 /* our example search key expression length */ #define NBYTES 2 /* filler bytes to Take KEY LENGTH mod 4 */ #define MAX KEYS 12 /* this number varies based on key length, the follolling example is based a 30-character key expression */ typedef struct index_header { long root_num, /* block number of the root block */ long eof_num /* block number where new blocks are added */ long version; /* uncertain ?? */ short key_len; /* length of the search key expression */ short keys_max; /* Maximum # of keys per block <= 42 */ short int_or_date; /* Flag indicates if key is date or numeric */ short group_len; /* key len plus 8 padded to a multiple of 4 */ short dummy; short unique; /* Flag if index on unique keys only */ char expression[l00];/* Index expression used for search key */ } ndx_header; typedef struct key_group { /* dBASE III NDX key entry */ long block_num: /* For root & branch nodes, the block number */ long rec_num; /* For leaf nodes, the record number */ char searchkey[KEY_LENGTH]; /* our search key */ char filler2[NBYTES]; /* filier bytes to align on 4-byte boundary */ } ndx_key_group; typedef struct index_block { short num_keys; /* number of keys in this block node */ short filler1; /* pad to 4-byte boundary */ struct key_group[MAX KEYS]; long last_block_num; /* block number if greater than key */ } ndx_block;
/*** CLIPPER NTX file structures ***/ #define BLOCK_SIZE 1024 /* our B-tree block size */ #define KEY_LENGTH 30 /* our example search key epression length */ #define MAX_KEYS 24 /* this number varies based on key length, */ the following example is based a 30-character key expression */ typedef struct index_header { short sign; short version; long root_offset; /* offset in file of the root block */ long eof_offset; /* offset in file of free blocks */ short group_len; /* key_len plus 8 */ short key_len; /* length of the search key epression */ short key_dec; /* number of decimals in key epression */ short keys_max; /* Maximum # of keys per block; <= 92 */ short keys_half; /* Maximum # of keys per half-block */ char expression[256]; /* Index expression used for search key */ short unique; /* Flag if index on unique keys only */ } ntx_header; typedef struct key_group { /* Clipper NTX key entry */ long block_offset; /* Block offset for root & branch nodes */ long rec_num; /* For leaf nodes, the record number */ char searchkey[KEY_LENGTH]; /* our search key */ } ntx_key_group; typedef struct index_block { short num_keys; /* number of keys in this block */ short key_order[MAX_KEYS]; /* sorted offsets to keys */ struct key_group[MAX_KEYS]; long last_block_offset; /* block offset if greater than key */ } ntx_block
/*** FoxBASE and FoxPro IDX file structures ***/ #define BLOCK_SIZE 512 /* our 3-tree block size */ #define KEY_LENGTH 30 /* our example search key expression length */ #define MAX_KEYS 13 /* this number vdries bdsed on key length, the following example is based a 30-character key expression */ typedef struct index_header { long root_offset /* offset in file of the root block */ long free_ptr; /* just a guess */ long eof_offset; /* offset in file where new blocks are added */ short key_len; /* length of the search key expression */ short unique; /* Flag if index on unique keys only */ hdr epression[100]; /* Index epression used for the search key */ } idx-header; typedef struct key_group { /* FoxBASE IDX key entry */ union { long block_offset; /* Offset in file for root & branch nodes */ XLONG rec_num; /* In leaf nodes, record number big-endian */ } char searchkey[KEY_LENGTH]; /* our search key */ } idx_key_group; typedef struct index_block { short node_type; /* 0 = branch node, 1 = root, 2 = leaf */ short num_keys; /* number of keys in this block node */ long left_sibling; /* offset in file to left sibling block */ long right_sibling; /* offset in file to right sibling block */ struct key_group[MAX_KEYS]; } idx_block;
Questions:
Comments: