Meta-Language

The "language" proposed here is a type of real time state machine or token interpreter which is targeted at pattern matching and translation using a variation of the BNF grammer. Other examples are CUMP Byte Code, and Minimal Controller Idea.

Examples

| "when you see this" ? "output this" .

| "or change this" ? "into this" .

| @Timer "12:00:00" ? "Good Morning!" @Day = Day + 1 .

| &h1B ? @PCLState "1" | "&f" ? @PCLMacro

| "*p" num | 'X' ...

| @Comment "This is a comment, it will be copied to the bit bucket" .

to get the current exchange rate with Italy:
@http://www.x-rates.com/htmlgraphs/ITL30.htm 'current rate: ' number ? 'Our exchange rate with Italy is ' number .

to translate written text to numbers. "one-two" becomes "1-2" The ":" word causes the OutBuffer to become a local buffer so that the digits that are matched are placed in a local holding area. When digit is referenced after the "?" word, the InBuffer is redirected to that local buffer. Each call to digit adds or removed one entry from the local buffer so that more than one digit can be matched in a single pattern.

| 'digit': |'zero'?'0'.

|'one'?'1'.

...

|'nine'?'9'.

.

| digit '-' digit ? digit '-' digit

Word Description

The idea of very small, punctuation type words rather than english keywords is not commonly seen in programming languages. Years ago, I wondered why the punctuation marks in standard english text were of no use. The Bitscope Command Set is a brilliant example of the application of punctuation to small programming languages. Compair this to the Oricom ICTC command set, NPCI or the PICos commands. CUMP Byte Code, and Minimal Controller Idea are other attempts. Meta-L tries to use the same meaning of punctuation symboles from human text in program code. "(" and ")" denote a comment rather than a memory reference.

Word Description

|

also
;
"Fail to" This word does the following:

At compile time:

checks and calls OutBuf if necessary to get room for the following
places the address (or token) of its run-time code at the OutBuf pointer and increments the pointer
pushes the address of the OutBuf pointer and increments the pointer by a sufficient amount to reserve space for an address or offset which will eventually refer to just past the matching period.

At run-time

Inserts its decision-time address onto the return stack with the address previously stored after the call to the runtime code
Clears the "still-working-on-this-option" flag for the ":" (define) word

At return-time aka decision-time (after the following words have executed and if they return or the "." word has caused a return)

If the "success/fail-flag" is set, changes the IP to the address stored on the stack with the decision-time address.
If the "success/fail-flag" is not set, removes the address stored on the stack with the decision-time address and generates a return.

" "

also
&r
"Match" The quote form is for standard ASCII text. The & form is followed by a code indicating radix and is used for binary data e.g. &h1B or &b00011011 or &o033 are the same value. r can also be a 2 digit decimal number so &h and &16, &o and &08, &b and &02 are the same. (note that Meta-L is all about conditionals, so the logical "and" is implied, and "or" is "|" aka Fail to). Match works differently depending on whether the "?" aka succeed word has appeared. Prior to ?, match rejects or continues testing the value of an input buffer. After ?, match (and its "=" form, see below) is copying the result of a match to the output buffer. The match word does the following :

At compile time if succeed ("?") has not appeared yet:

checks and calls the current OutBuf method if necessary to get room for the following (or at least for the run-time code reference, count and a few bytes of the string)
places the address of its run-time code at the OutBuf pointer and increments the pointer
counts the text (up to 254 bytes to make matching an entire buffer easy or the number of free bytes left in the current Dictionary Chunk) until the next quote and places that count followed by the text at the OutBufPtr and points it past the text.
if bytes remain, the previous two steps are repeated (allocating Dictionary Chunks and traversing InBuf Chunks as needed) until all the text is processed.

At run-time

compares the text at InBufPtr (see "@") to the text following the run-time address (immediate) and causes a return with CF set if they do not match
If there is not enough text at InBufPrt to complete the comparison, calls InBuf to get (or wait for) more text.

At compile time after succeed ("?") has appeared:

checks and calls the current OutBuf method if necessary to get room for the following (or at least for the run-time code reference, count and a few bytes of the string)
places the address (or bytecode) of its run-time code at the OutBufPtr and increments the pointer
counts text (up to 254 bytes or the end of free space in the current OutBuf Chunk) until the next quote or until the "<" word and places that count followed by the text at the OutBufPtr and points past the text
If the "<" word is found, its compile time code is called to process the text.
if bytes remain, the previous three steps are repeated (traversing OutBuf and InBuf Chunks as needed) until all the text is processed.

At run-time

checks and calls the current OutBuf method if necessary to make room then moves the text following the run-time address to OutBufPtr and calls OutBuf to dispose of it. This can be done byte at a time in embedded systems.

=
"Equals" This is a combination of "Match" and "Evaluate" when succeed ("?") has appeared.

At compile time

checks and calls the current OutBuf method if necessary to get room for the following (or at least for the run-time code reference, count and a few bytes of the string)
places the address (or bytecode) of its run-time code at the OutBufPtr and increments the pointer
Begins the evaluation of the words in the section using simple math and variable names defined by the @ word. Constants are expressed in decimal by default. Other radix are indicated by different "&" numbers. For example "&o" is octal and "&b" is binary.

At run-time

checks and calls the current OutBuf method if necessary to make room then continues by executing the words following to produce output. It doesn not affect the "success/fail-flag"

See also: "< >"

?
"Success" or "Then" This word indicates that a match has been made and sets up the system for output by switching the meaning of the "" word to the output version vice the input version. At run-time it resets the "success/fail-flag" so that the "|" (fail-to) word will return rather than continuing to the next possible match.

< >
"Evaluate" Begins the evaluation of the words in the section using simple math and variable names defined by the @ word. Constants are expressed in decimal by default. Other radix are indicated by different "&" numbers. For example "&o" is octal and "&b" is binary. See also: "="

@
"At" changes InBuf (before the ?) or OutBuf (after the ?) to a different run-time address according to the text following the word and calls the method code for that word with a "change-time" message. Think of this as an assembler ORG directive. Actually, @ can just change the outbuffer to point to the InBuf (before the ?) or OutBuf (after the ?) and the word following can just OutBuf its address.
Each defined word has a local buffer to hold the result of execution of that word. These local buffers are accessed by their own words (and tokens) since the word itself is responsible for setting the pointer to the local buffer. Words that always generate fresh content when accessed will point to a zero length buffer (with a method token before the length) so that their method is always called when InBuf is accessed.
Some pre-defined buffer names may include:

For Console IO: @Display, @StdIn, @StdOut
Files, by protocol: @file:d:\path\file.ext, @http://URL/file.ext, @ftp://server/path/file.ext, @telnet
The file buffer method is used to open the file via the local file system, http get, ftp protocol, etc... and the cache is filled with data.
For embedded use: @OutPort , @InPort, @FLASH the address of memory in the onboard FLASH memory, @PCMCIA the address of memory in the external PCMCIA FLASH memory.

When calling MetaL from the command line in an OS, the InBuffer will start as the command line and the OutBuffer will be internal memory. The command line can change the InBuffer and or OutBuffer to StdIO or what ever as needed.

+n
-n Sets InBuf or OutBuf to the buffer name plus an offset.

.
"Return" or "Stop" this word returns to the previous level after setting up the system for input by switching the meaning of the " word to the input version vice the output version.

:
"Define" causes a new word to be defined in the token dictionary. That word will store and keep track of what ever its code produces and allow that product to be recalled and used in other code.
At First:

The following items are placed in the OutBuffer:

The define "builds" token
the value of a new token which is entered into the Token Table to point to:
a new local thread which calls the Define "Does" buffer handling method followed by
a local buffer, allocated right in the OutBuf. (see Memory Allocation for Buffers).
Tread execution continues in order to compile the rest of the definition of the defined word into the OutBuf following the local buffer.

"Builds"

The next execution of a given instance of define (when the code before the ":" succeeds) will cause the token created by that define to be placed in the outbuffer followed by a count of the number of references to this word in the current code. This count is incremented after each reference and cleared when "?" (Then) is compiled.
The reference count is also stored on the stack, and a new thread is launched to finish compiling the code following the reference to this word. After that thread completes (the "." is executed), the count is restored to its value at the point where the defined word was referenced.
If the match for the name of the defined word occured after the "?" (Then), the text following the name of the word is checked for a "#" and a number. This number, or the current reference count, is placed after the token in the Outbuffer.
The defined code itself is not executed. The thread started by execution of the define builds token is stopped (like the "." (stop) word).
"Does"

When the code that uses that definition is executed a new thread will start with the Define "Does" code (basically a buffer handler method) which behaves differently if executed before or after "?" (Then).
Before "?" (Then) it will save the OutBufPtr, and make the local Buffer the OutBuffer,
Each time the word is called, prior to the "?" (Then) word, the reference count for this instance of the word is checked and the local buffer from this entry on will be cleared. Data resulting from the execution of the thread after the original ":" will be appended to the buffer pre-pended with a length byte.
In any pre "?" (Then) case, DefineDoes will execute the code following the local Buffer (the code that was compiled when the word was defined). When that code has completed, the OutBufPtr will be restored and any data OutBuffed by the code following the define will be available to the calling code by reference to the defined tokens local buffer.
After the "?" (Then), executing the defined word will simply tranfer the entry specified by the reference count from the local buffer to the OutBuf, and return without executing the code for the defined word.

( )
"Comment" Text enclosed in parens are ignored or used to describe the word when used after a ":"

Each word address is expected to point to a machine language command (i.e. the "interpreter" just jumps to the code at that address) and that command can be a call (or more likely a software interrupt or "restart") that invokes the interpreter with the address of the word on the stack which will point to the addresses that make up the word.

Rather than using word addresses, we can use a single byte "token" or index to a memory address in a table. To increase the number of available subroutines past 255, some of the routines can implement their own jump table and fetch the value of a second byte following their own token in the code to use as an index in their own table. This process can be repeated as often as needed.

To improve performance, the current thread IP can be held in a register (vice on the stack) and can be pushed to the stack when a new thread is started by the interpreter.

Instead of a conditional word like MATCH popping its thread to fail to the next entry, Just set the CY flag and return to FailTo which will then Jump to the next entry (if CY=1) or return (popping the FailToPtr) to the Parent (if CY=0) when a simple TRet is encountered indicating success at the end of a series of trials.

The following example assumes that the inbuffer contains "good"

Program Stack Comment

addr0 Begin (call interpreter) IPInt <addr0>

Failto <addr1> IPInt <addr0> FailtoRun <addr1>

Match "bad" IPInt <addr0> FailtoRun <addr1> ;carry flag set and return

IPInt <addr1> ;FailtoRun Jumps to <addr1>

addr1 Begin IPInt <addr1>

Failto <addr2> IPInt <addr1> FailtoRun <addr2>

Match "good" IPInt <addr1> FailtoRun <addr2) ;Carry flag clear

IPInt <addr1> ;FailtoRun continues at new IP

TRet ;FailtoRun drops <addr> and returns

Memory Allocation for Buffers

By default, Buffers are initialized as small memory areas (16 bytes) just after their name entries in the dictionary. The pointer compiled into new code that references this buffer will point to the length byte following a pointer to the method used to refill or empty the buffer (see figure 2) If the length is zero, or insufficient for the required operation, the method requesting access to the buffer should call the pointer to the buffers method with a "fill with x bytes" message sent, this will return a pointer to the filled buffer (not necessarily at the original address) with a new length indicating the actual number of bytes now in the new buffer. The method is responsible for interpreting the rest of the data in the buffer. For example: A dynamic method will take the next byte in the buffer as a chunk pointer (and see zero,zero as meaning that there is no data and no chunks) but a virtual pointer will look for a cache len byte or zero followed by a cache chunk pointer. This system allows binary data to be stored without escapes via the length byte, but switches to a zero terminated string of pointer or other data when the length byte is zero. Also, the original pointer to the buffers length byte should not be confused with a buffer pointer (which would point not to a length byte but rather to a position in the data of the buffer).

*Memory image of various buffer types*
[(match)<len)(buffername)](bufdef)(token)(startthread)(method	static)(buffer len:byte=[1...255])(buffer data:15 bytes)
	dynamic)(byte=0)(chunk pointer)[(chunk pointer)](byte=0)
	virtual)(byte=0)(cache len)(cache:14bytes max) or (byte=0)(byte=0)(cache chunk pointer)(byte=0)
	file)(byte=0)(handle len)(handle/zero if not open)(url len)(url)(offset len)(file offset)(cache chunk pointer)

After a static buffer is filled with 15 bytes of data, its method can be called to "make room" at which time, the static method will be replaced with a dynamic method and a chunk allocated, the original data moved to it, a chunk pointer inserted in the buffer, and the length changed to zero. If the dynamic buffer is not often accessed and free chunks are no longer available, the chunk allocator may replace the dynamic method with a virtual method and move the data to a disk file.

It should be kept in mind that all the program code, data, dictionary etc.. that make up the system must also be contained in buffers. Match, fail, Outbuf must all take into account the number of bytes left in the current In and Out buffers and break up output / repeat input operations accordingly. All programs will basically be paragraph aligned to simplify memory management.

Buffers Length

Buffers have no maximum length but are processed in 255 byte chunks. The first byte in a buffer contains its length. As the buffer exceeds 255 bytes an additional index chunk is added that tracks the number of 255 byte chunks that are contained in the buffer, An invalid byte length value (255, or zero for easier testing with zero,zero reserved to flag buffers that need to be refilled) at the start of this new chunk acts as a flag that this index chunk is present. All the bytes in the index chunk point to other chunks. When the buffer exceeds 255 chunks of 255 bytes of 64k of data, the next "meta" index chunk is added indicating one full chunk chunk of data. There are 256 chunks in a segment.
Note that with this system, it is entirely legal to delete part of a chunk without having to move the entire file. I.e. we do not have to fill each chunk completely.
With 14 bytes of data for chunk references, a single small (16 byte) buffer entry in the dictionary would allow access to 255*14=3570 bytes of data without calling a method and using only two buffer pointers; one to the small buffer data to select the chunk pointer and one to select the byte in the chunk pointed to. By increasing the number of levels of index chunks, any amount of data can be stored.
To optimize for buffers that are normally accessed sequentially (and remove the need to store the index chunk in ram which may be critical in embedded implementations) could buffer data be combined with pointers in one chunk? e.g. (byte=0)(chunk pointer)(byte=0)(length byte)(data)

in	of memory there are:	each containing:	which can be selected by a pointer of:
4GB	4G bytes	1 byte	32 bits
4GB	16M chunks	256 bytes
4GB	64k segments	65536 bytes	16 bits
1MB	1M bytes	1 byte	20 bits
1MB	4096 chunks	256 bytes
1MB	16 segments	65536 bytes	4 bits
640KB	2560 chunks	256 bytes
640KB	10 segments	65536 bytes	4 bits
64KB	64k bytes	1 byte	16 bits
64KB	256 chunks	256 bytes	8 bits
64KB	2048 buffer entries	about 32 bytes

Chunk Pointers

To access an unlimited amount of memory without allocating an unlimited sized address pointer, we can use very small relative pointers that are combined by having the first pointer point to a second pointer that points etc.... when needed to reference more distant memory.
One problem with this approach is that when an area of memory becomes completely used, no second pointer may be available within range of the first. To reduce the risk of this occurring, the meaning of the pointers can be changed to allow some values to point to distant memory while others point to local memory with greater granularity. For example, rather than interpret successive values of a pointer as successive local chunks, cause every other value to point to a chunk in another segment. Ideally, most values would still point to local chunks distributed evenly (every other or every 4th etc..), a smaller number of values would point to chunks less local and still a smaller number of values would point to chunks still farther away until finally only a very few values would point to chunks very far away.
The system of value interpretation selected is as follows: shift the chunk pointer left by the number of bits indicated by the lower 2 bits of the chunk pointer themselves shifted left one and add to the result the original lower two bits. This should be easily implemented in most processors with a register A to register B copy, bit mask register B (and with 0FCh), shift register A left by register B value, shift register A left by register B value (again), and an add register B to register A. An A86 macro has been developed. The following is an illustration of the chunks that can be referenced from a chunk pointer relative to any given chuck that contains the pointer via this interpretation:

Each line of pixels represents one segment (256 chunks of 256 bytes or 64k bytes) and each column represents one chunk with in that segment. The illustration was generated with a simple QBASIC program which also prints a list of address offsets in order generated from chunk pointer value and sorted by offset value with distance between offsets. A few chunks almost 16k chunks or 4M bytes away and about every 4th chunk in the local segment can be referenced. This is comparable to the x386 index*scale addressing mode. A second problem with this approach is that each time a new level of pointers is added, space to store the current value of the pointer must be allocated. Since all pointers are being mapped to the physical addresses in the machine (e.g. an index register), we are certainly aware that some actual upper limit on the address will be reached, however we do not wish to impose this limit in the structure of the language but only in the actual implementation on a specific machine. Chunk Allocation

We can specify that some number of chunks be used to hold free/allocated information in each byte rather than a chunk pointer. One chunk per segment suffices if each chunks free / allocated values are limited to 255. Each method that allocates a chunk must select the related byte in the index and increase its value if it is not already 255. For simplicity, it would seem to be best to combine this system with a system for tracking frequency of chunk access for virtual memory management. If we take the value 0 to indicate availability, allocating a new chunk would consist of:

point to the index chunk of the local segment
scan for a zero value in a byte related to a chunk that can be pointed to from the chunk we are adding to, if not found, repeat with each segment that can be referenced by the chunk we are adding to until no segments are available, then call the virtual memory manager to dump to secondary memory the least used chunk from the set to which we can point.
change the value of the index byte related to the chunk to be allocated from zero to one
compute and add a chunk pointer to the new chunk

Virtual Buffers

As buffers grow to large sizes, it may be necessary to move less frequently used parts (chunks) of the buffer to secondary memory (hard disk). This implies the following.

We must track which chunks are used less frequently
We must allocate a file on the disk and track which chunks are stored in it and where they are in the file
When a chunk that is not in memory is needed, some mechanism must retrieve it from the file.

One solution may be to convert the memory buffer to a file buffer and then keep track of the chunks that are in cache. We can specify that some number of chunks be used to hold access count information in each byte rather than a chunk pointer. One chunk per segment suffices if each chunks access count is limited to 255. Each method that accesses a chunk must select the related byte in the index and increase its value if it is not already 255. It should be noted that we wish to remove first groups of chunks pointed to by index chunks (because the overhead required to track the position of the removed chunk is probably greater than the space that would be released by the removal of one chunk) that A) do not contain data often needed and B) are not index chunks themselves that are used to get to data that is often needed. Also, while the relationship between index chunks that point to infrequently used data is related to the structure of buffers, it is entirely possible that one part of a buffer will be constantly in use while another part of the same buffer is never used after its initial allocation.
For simplicity, it would seem to be best to combine this system with a system for tracking unused chunks. See Chunk Allocation elsewhere.
When the virtual memory manager is called to free space:

scan through the segment index chunks for each available segment to find the lowest access count value greater than zero of a chunk index.
verify that the chunks pointed to in the selected chunk index are not frequently used
copy all the chunks pointed to by the index chunk to secondary memory, change their segment index chunk byte value to zero (marking them as free), and place a reference to the location in the index chunk after a zero length value and a method token for virtual memory reloading. This assumes that the code reading the chunks will know to call the method after the zero byte (rather than before the zero length byte as in a buffer) when it encounters an index chunk (first byte 255) with no chunk pointers or a chunk with a zero length.

In the case that only one chunk referenced in a chunk index is repeatedly called when all the other chunks are never used; if we only increment the chunk usage count for the index chunk when we shift from one chunk to the next - since the count will remain low for the index there is a better chance that it will be selected to page out (good) and if the high usage count for the high usage chunk is noted when the indexed chunks are being processed, perhaps the index chunks can be reorganized so that the high usage chunk is moved to a different index. For example a buffer with only one commonly used chunk could end up with chunk pointers to index chunks that are all paged out followed by one chunk pointer directly to the we want to keep handy followed by more chunk pointer to paged out index chunks. Actually all the paged out chunks should be combined into one paged out index chunk before the commonly used chunk and one other after it. This results in only 3 chunks and the buffer area as the worst case minimum memory usage for any size buffer where only one chunk is really needed.

Buffer list

Normally the buffer list would be spread out through memory as new buffers are created in the dictionary and old buffers are destroyed with matches and fail-to's used to search the list. But if the buffers list was rearranged in order by swapping around fail-to pointers so that the last buffer used was always moved to the first position in the list and if the last position in the list was quickly accessible (maybe the XOR linked list pointer could be adapted, this would provide two benefits as the buffer list would sort itself in the most often used to least often used order over time

the chance of finding the buffer you need closer to the top of the list increases and
the buffer at the end of the list is probably the best candidate for swapping out to virtual memory.

Persistent Buffers

Buffers with a persistent method will be saved to NVM when the system receives a power off interrupt or constantly in a background process.

Stack

In order to continue the theme of unlimited size, the stack must be implemented as a buffer which can be upgraded to a virtual method handler. The chunk allocator should know that chunks buried on the bottom of the stack are less likely to be needed that chunks near the top.

Records / Tables

If the name of the buffer contains a '.' dot, the name before the dot is taken to be a table name and the name after the dot will refer to a field in that table. Operators to select records by key value, relative or absolute record number and to add new or delete old records.

Arrays / Indexed Tables

If the name of a buffer (field or table) ends in a [key], the buffer is taken to be an indexed array. The key value is used to find an individual element or record in the buffer and can be any text (Refer to JavaScript arrays)

Sequential File System

In order to simplify the sequential reading of data that is normally accessed serially, we can modify the normal index chunk system to also include a "streaming" linked list where each chunk pointer to the next chunk of data is stored at the end of the current chunk and is logically XOR'd with the address of the previous chunk so that the sequence can be traversed backwards or forwards.

Directory Chunk (each memory device must have one at a known starting place): File: array of FileHeader
FileHeader: file number
starting chunk
ending chunk
length (chunk count and bytes used in last chunk)
DataChunk: method/file number (for redundancy)?
data
chunk link
OpenFile (in ram, like a fcb): file number
chunk pointer
previous chunk pointer
cache

Comparison of variable types
Name Speed Structure Implemented by

register super fast

static var fast direct pointer

dynamic var medium indirect pointer to pointer

virtual medium slow disk/cache VM manager

file slow disk bios code

\| &h1B ? @PCLState "1"	\| "&f" ? @PCLMacro
	\| "*p" num \| 'X' ...

	Program	Stack	Comment
addr0	Begin (call interpreter)	IPInt <addr0>
	Failto <addr1>	IPInt <addr0> FailtoRun <addr1>
	Match "bad"	IPInt <addr0> FailtoRun <addr1>	;carry flag set and return
		IPInt <addr1>	;FailtoRun Jumps to <addr1>
addr1	Begin	IPInt <addr1>
	Failto <addr2>	IPInt <addr1> FailtoRun <addr2>
	Match "good"	IPInt <addr1> FailtoRun <addr2)	;Carry flag clear
		IPInt <addr1>	;FailtoRun continues at new IP

Name	Speed	Structure	Implemented by
register	super fast
static var	fast	direct	pointer
dynamic var	medium	indirect	pointer to pointer
virtual	medium slow	disk/cache	VM manager
file	slow	disk	bios code