After a long hiatus, I'm in the process of getting back into the PIC
development game. I have some projects for my EV that I'm putting together
that requires embedded computing power and the PIC family will serve well
for that purpose.

For the longest I've been trying to put together a development environment
that I fcould feel productive with. From the code blasting side I'm a firm
believer in bootloading because it facilitates self hosted chip
programming.

I've also worked to find something similar for program development. I got
pretty far along building a small compiler, called NPCI, that compiled into
a threaded token target that's interpreted, a la the Basic Stamp. But
there's absolutely no way that could be self hosted.

Along the way I stumbled upon Forth. It represents and intriguing blend of
power, simplicity, and compactness. It provides an extensible development
environment that can be self hosted in small embedded systems environments.
So I've been studying developing Forth and vowed that when the time came,
that I would work on a self hosted Forth system for the PIC.

Well the time has come. 

>From my last foray into comp.lang.forth a couple of years ago, it seems
that most of the Forth community is targeted towards PC sized systems and
commercial Forth packages. Plus I've lost my Usenet access in the meantime.
So I'm posting this hoping to find an interested Forth enthusiast or two
who is well versed in both Forth, which I am admittedly still not, and PICs
to kick around a design discussion. I figure it's going to be an interative
process of getting something that works, the something that works
efficiently enough to actually use.

So I'm going to start the discussion with an overview of the Forth virtual
machine. Then follow up with some of the design decisions I've made, and
those that are still up in the air.

For the sake of keeping the discussion confined, there are three design
decisions that cannot be changed. Discussion about them should be started
in another thread. Specifically...

1. Keep it to Forth and Forth-like systems. I know there are a ton of
languages and development environments out there. I'm really trying to
figure out how to do it this way.

2. It has to be self hosted on the PIC. I know there are Forth PIC cross
compilers out there. I've already checked them out and will happily point
you to them if you are interested. Again my investigation is whether or not
a usable environment can be completely self hosted. So fundametally we're
talking about a PIC and a dumb serial terminal.

3. The target is the 16F family. I'm already seen quite a few of the
limitations. Also there is already a self hosted 18F Forth, FlashForth, out
and about already. It's always easier to move up the chain than to move
down. I'm trying to figure out if the 16F group has enough resources to be
usable.

The Forth Virtual Machine

The Forth Virtual machine isn't too terribly complicated. At a fundamental
level the data model is a 2 stack machine with a handful of other
registers. The two stacks are the data stack, which is the primary target
for most operations, and the return stack, which is mostly used to keep
track of return addresses for routines. The one design decision is the data
width of these stacks, known as the cell size. For small machines, 16 bits
is pretty standard.  There isn't an obligation for these stacks to be too
terribly deep, generally 32 cells for the data stack, and 16 cells for the
return stack.

The registers are pretty limited. There's an instruction pointer, two stack
pointers, and most implementations use a temp register or two.

A 16F PIC can handle these requirements. The biggest problem is the fact
that the underlaying machine doesn't implement any stacks and only has a
single system for indirect addressing. So the FSR/INDF pair are going to be
overstressed.

The execution model is a threaded interpreter concept. The fundamental
Forth execution unit is the word, which represents a parameterless
subroutine. A Forth system comes with a standard set of words. Developers
can then create additional words to implement the application. For the most
part words are globally visible, so any word can call any other, including
with a bit of work, itself. Forth words can either be written in the native
system language, which are called primitive words, or can be a string of
other Forth words. Primitives, Forth words, standard words, and user
defined words are all indistiguishable from one another. They all look and
behave the same no matter where they came from.

So as a simple example here's a definition for a word that adds 3 to the
top value on the stack:

: ADD3 3 + ;

The integer 3 is pushed on the stack, then the '+' word is called, which
adds the top two stack values, leaving the result on the stack.

Once definied that word can be used by others:

: PRINT3MORE ADD3 . ;

This word takes what's on top of the stack, adds 3 to it, then prints the
result for example.

So for execution, a Forth system does little more than fetch the next word
to execute, and call it. Eventually a primitive word written in the native
system language gets called and does some actual work.

Now there are a ton of ways to implement this. One of the simplest is
simply make each word reference a subroutine call. So PRINT3MORE could be
written as:

PRINT3MORE:
	call	push3
	call	plus
	return

All of the other ways make the code into data which is interpreted. For
example the address of each routine can be stored:

PRINT3MORE:
	dw	push,3,plus,exit

But now there has to be a system that actually reads those addresses, then
jumps to the target for them to execute. In Forth speak this system is
known as the inner interpreter.

Also note there is an extra word added to the end of the list. The exit
word functions as the equivalent to the return. Entry/exit is coming up in
a minute.

One problem that needs to be resolved is the fact that natively implement
primitives are actual code, not addresses. So they are not executed by the
inner interpreter, they just run normally at full speed. However, there has
to be a way to get back to the inner interpreter after they finish. So
instead of a return, or an exit word, they jump back into the inner
interpreter at the point where the instruction pointer (IP) of the virtual
machine is advanced to the next word to execute. This point is generally
called next. So in pseudocode a
primitive like add would be implemented something like:

add:
	; pop off the top cell of the stack
	; add to the next cell of the stack
	goto next

One final point is that for each forth word, which is a list of addresses,
there still has to be at least one native instruction for the inner
interpreter to actually jump to. And unlike a primitive, which asks the
inner interpreter to simply get the next address of the next word, when
going into a forth word, the IP needs to be changed to the start of the
list of words for that word, and the old IP needs to be saved so that you
can be back to the calling word when the current word finishes. So each
forth word for these reasons have a real instruction, called the code
field, that is executed when the word starts. For each forth word, this
goes to the entry routine, which sets up the inner interpreter to run the
new word. So PRINT3MORE would actually be something like:


PRINT3MORE:
	goto	entry
	dw	push,3,plus,exit

and entry would do the following:

entry:
	; push the old IP on the return stack
	; Change the IP to point to the first word of the new routine
	goto innerinterpreter

One last option, which is the one that I'm starting with, is instead of
putting the actually addresses of the routines for the words, to instead
replace them with tokens. So the Forth routines are a list of tokens, and a
token table holds the addresses of the targets. So PRINT3MORE would be:

PRINT3MORE:
	goto	entry 	; Set up to run this word
	dt	PUSH,3,PLUS,EXIT

Where PUSH,PLUS, and EXIT are token numbers for their repective routines.
It adds another level of indirection but if you limit to 256 tokens, can
implement with one word per token and a 512 entry token table.

The Dictionary

Once we've established our list of words, in order to create new words, we
need to be able to find the old words on the system. In Forth this is done
using a linked list called the dictionary. It's little more than a pointer
to the next entry, the name of the entry, and some other fields, such as
the token number for the word for example.

The one issue I have (because I haven't implemented it yet) with the
dictionary is the space that names take up. I'm trying to figure out a
balance between using the whole name, and not having it available at all.
The compromise I'm thinking of is hashing up the name and storing the hash
value plus some truncated start of the name (like maybe 3 or 4 chars?).
That way I can use long names, but not have to store them onboard.

The problem with a hash is collisions. And you cannot create a perfect hash
for a name without knowing all the names in advance. What I settled on is
double hashing. Hash with two different functions. Then it's unlikely (but
not impossible) for two different names to hash up to the same value with
both hashes. So storing a 2 2 byte hashs, a length, and the first 3 chars
would give a pretty unique name in only 8 bytes of space.

Compiling new words

The key to the forth system is the fact that you can compile new words.
Since each word is just a simple list, no complicated compiler is required.
Just read a word, look it up in the dictionary, get its token, add that
token to the list of the word being compiled. Then hash up the name, add to
the dictionary, and write the new compiled word to flash. The list of
standard words in forth facilitate this process.

Execution is equally as simple. Find a word, get its address (known as a
execution token in forthspeak) and tell the interpreter to go run it.

Current status

I've already written up a small set of primitives (add,sub,logic, push
numbers) and tested them. I've also put in place the token fetcher (which
uses computed goto/retlw), token to address lookup (also computed goto/retlw),
and the inner interpreter which uses these two to get the next token,
locate its target, and then use a computed goto to jump to it. The next
routine for primitives, which increments the 16 bit IP and drops back into
the inner interpreter is done. Also the entry system is done, though I
haven't written the exit word yet. Currently I'm working on the memory
access routines which are a bit of a problem because there's only FSR/INDF
pair. It would be much cleaner if there were two. Right now I'm copying the
stuff off the stack into a temp area, then using the temp area data to
reset FSR to the target, then putting FSR back to stack before returning.

Design questions

There's quite a bit left to figure out. I haven't figured out the right
balance for the thread model yet. Direct addressing would be faster because
there would be no table lookup. If the address is accessed using computed
goto/retlw then you'd have to have 2 words for each address. I'm also
wondering if maybe directly accessing the program memory via the
EEADDR/EEDATA system would be a win. That way I could encode 14 bit
addresses into each word and only have one computed goto instead of the 4
(one for the token, two for the token address, one to get to the target
word) that I have now. Now that I'm writing it, I'm pretty sure I should go
back a recode it this way. It'll execute faster and I won't have to waste
space tracking tokens.

Banked memory is really awful. Tracking PCLATH and the IRP bit is truly a
pain. Plus I'll save another word in each dictionary definition.

The big issue is what to encode as primitives. Two examples are 1+:

: 1+ 1 + ; 

Just an increment. Only two words in forth. However it would execute faster
being natively coded. Another example is writing and reading ports. One
could implement PORTA as a constant address. Then write two forth routines
to read/write it:

: !PORTA PORTA ! ;
: @PORTA PORTA @ ;

Or should these be written natively for speed?

Interesting questions indeed.

Resources

I learned a lot from a lot of places. Brad Rodriguez has a whole series on
writing forths for small systems (6809, Z80). You can find there here:

http://www.bradrodriguez.com/papers

Jones Forth for the PC is a great tutorial. Explains everything I outlined
above and more. Shows the code for a ton of primitives. The Intel Pentium
is the target, but the concepts are good:

http://www.annexia.org/forth

Sean Barrett's Obfuscated C code entry implements a primitive forth kernel
in C then proceeds to define a fuller forth using it. A ton of nuts and
bolts of manipulating the stack to implement compiling.

http://www.ioccc.org/1992/buzzard.2.design

BAJ
-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist