After a long hiatus, I'm in the process of getting back into the PIC development game. I have some projects for my EV that I'm putting together that requires embedded computing power and the PIC family will serve well for that purpose. For the longest I've been trying to put together a development environment that I fcould feel productive with. From the code blasting side I'm a firm believer in bootloading because it facilitates self hosted chip programming. I've also worked to find something similar for program development. I got pretty far along building a small compiler, called NPCI, that compiled into a threaded token target that's interpreted, a la the Basic Stamp. But there's absolutely no way that could be self hosted. Along the way I stumbled upon Forth. It represents and intriguing blend of power, simplicity, and compactness. It provides an extensible development environment that can be self hosted in small embedded systems environments. So I've been studying developing Forth and vowed that when the time came, that I would work on a self hosted Forth system for the PIC. Well the time has come. >From my last foray into comp.lang.forth a couple of years ago, it seems that most of the Forth community is targeted towards PC sized systems and commercial Forth packages. Plus I've lost my Usenet access in the meantime. So I'm posting this hoping to find an interested Forth enthusiast or two who is well versed in both Forth, which I am admittedly still not, and PICs to kick around a design discussion. I figure it's going to be an interative process of getting something that works, the something that works efficiently enough to actually use. So I'm going to start the discussion with an overview of the Forth virtual machine. Then follow up with some of the design decisions I've made, and those that are still up in the air. For the sake of keeping the discussion confined, there are three design decisions that cannot be changed. Discussion about them should be started in another thread. Specifically... 1. Keep it to Forth and Forth-like systems. I know there are a ton of languages and development environments out there. I'm really trying to figure out how to do it this way. 2. It has to be self hosted on the PIC. I know there are Forth PIC cross compilers out there. I've already checked them out and will happily point you to them if you are interested. Again my investigation is whether or not a usable environment can be completely self hosted. So fundametally we're talking about a PIC and a dumb serial terminal. 3. The target is the 16F family. I'm already seen quite a few of the limitations. Also there is already a self hosted 18F Forth, FlashForth, out and about already. It's always easier to move up the chain than to move down. I'm trying to figure out if the 16F group has enough resources to be usable. The Forth Virtual Machine The Forth Virtual machine isn't too terribly complicated. At a fundamental level the data model is a 2 stack machine with a handful of other registers. The two stacks are the data stack, which is the primary target for most operations, and the return stack, which is mostly used to keep track of return addresses for routines. The one design decision is the data width of these stacks, known as the cell size. For small machines, 16 bits is pretty standard. There isn't an obligation for these stacks to be too terribly deep, generally 32 cells for the data stack, and 16 cells for the return stack. The registers are pretty limited. There's an instruction pointer, two stack pointers, and most implementations use a temp register or two. A 16F PIC can handle these requirements. The biggest problem is the fact that the underlaying machine doesn't implement any stacks and only has a single system for indirect addressing. So the FSR/INDF pair are going to be overstressed. The execution model is a threaded interpreter concept. The fundamental Forth execution unit is the word, which represents a parameterless subroutine. A Forth system comes with a standard set of words. Developers can then create additional words to implement the application. For the most part words are globally visible, so any word can call any other, including with a bit of work, itself. Forth words can either be written in the native system language, which are called primitive words, or can be a string of other Forth words. Primitives, Forth words, standard words, and user defined words are all indistiguishable from one another. They all look and behave the same no matter where they came from. So as a simple example here's a definition for a word that adds 3 to the top value on the stack: : ADD3 3 + ; The integer 3 is pushed on the stack, then the '+' word is called, which adds the top two stack values, leaving the result on the stack. Once definied that word can be used by others: : PRINT3MORE ADD3 . ; This word takes what's on top of the stack, adds 3 to it, then prints the result for example. So for execution, a Forth system does little more than fetch the next word to execute, and call it. Eventually a primitive word written in the native system language gets called and does some actual work. Now there are a ton of ways to implement this. One of the simplest is simply make each word reference a subroutine call. So PRINT3MORE could be written as: PRINT3MORE: call push3 call plus return All of the other ways make the code into data which is interpreted. For example the address of each routine can be stored: PRINT3MORE: dw push,3,plus,exit But now there has to be a system that actually reads those addresses, then jumps to the target for them to execute. In Forth speak this system is known as the inner interpreter. Also note there is an extra word added to the end of the list. The exit word functions as the equivalent to the return. Entry/exit is coming up in a minute. One problem that needs to be resolved is the fact that natively implement primitives are actual code, not addresses. So they are not executed by the inner interpreter, they just run normally at full speed. However, there has to be a way to get back to the inner interpreter after they finish. So instead of a return, or an exit word, they jump back into the inner interpreter at the point where the instruction pointer (IP) of the virtual machine is advanced to the next word to execute. This point is generally called next. So in pseudocode a primitive like add would be implemented something like: add: ; pop off the top cell of the stack ; add to the next cell of the stack goto next One final point is that for each forth word, which is a list of addresses, there still has to be at least one native instruction for the inner interpreter to actually jump to. And unlike a primitive, which asks the inner interpreter to simply get the next address of the next word, when going into a forth word, the IP needs to be changed to the start of the list of words for that word, and the old IP needs to be saved so that you can be back to the calling word when the current word finishes. So each forth word for these reasons have a real instruction, called the code field, that is executed when the word starts. For each forth word, this goes to the entry routine, which sets up the inner interpreter to run the new word. So PRINT3MORE would actually be something like: PRINT3MORE: goto entry dw push,3,plus,exit and entry would do the following: entry: ; push the old IP on the return stack ; Change the IP to point to the first word of the new routine goto innerinterpreter One last option, which is the one that I'm starting with, is instead of putting the actually addresses of the routines for the words, to instead replace them with tokens. So the Forth routines are a list of tokens, and a token table holds the addresses of the targets. So PRINT3MORE would be: PRINT3MORE: goto entry ; Set up to run this word dt PUSH,3,PLUS,EXIT Where PUSH,PLUS, and EXIT are token numbers for their repective routines. It adds another level of indirection but if you limit to 256 tokens, can implement with one word per token and a 512 entry token table. The Dictionary Once we've established our list of words, in order to create new words, we need to be able to find the old words on the system. In Forth this is done using a linked list called the dictionary. It's little more than a pointer to the next entry, the name of the entry, and some other fields, such as the token number for the word for example. The one issue I have (because I haven't implemented it yet) with the dictionary is the space that names take up. I'm trying to figure out a balance between using the whole name, and not having it available at all. The compromise I'm thinking of is hashing up the name and storing the hash value plus some truncated start of the name (like maybe 3 or 4 chars?). That way I can use long names, but not have to store them onboard. The problem with a hash is collisions. And you cannot create a perfect hash for a name without knowing all the names in advance. What I settled on is double hashing. Hash with two different functions. Then it's unlikely (but not impossible) for two different names to hash up to the same value with both hashes. So storing a 2 2 byte hashs, a length, and the first 3 chars would give a pretty unique name in only 8 bytes of space. Compiling new words The key to the forth system is the fact that you can compile new words. Since each word is just a simple list, no complicated compiler is required. Just read a word, look it up in the dictionary, get its token, add that token to the list of the word being compiled. Then hash up the name, add to the dictionary, and write the new compiled word to flash. The list of standard words in forth facilitate this process. Execution is equally as simple. Find a word, get its address (known as a execution token in forthspeak) and tell the interpreter to go run it. Current status I've already written up a small set of primitives (add,sub,logic, push numbers) and tested them. I've also put in place the token fetcher (which uses computed goto/retlw), token to address lookup (also computed goto/retlw), and the inner interpreter which uses these two to get the next token, locate its target, and then use a computed goto to jump to it. The next routine for primitives, which increments the 16 bit IP and drops back into the inner interpreter is done. Also the entry system is done, though I haven't written the exit word yet. Currently I'm working on the memory access routines which are a bit of a problem because there's only FSR/INDF pair. It would be much cleaner if there were two. Right now I'm copying the stuff off the stack into a temp area, then using the temp area data to reset FSR to the target, then putting FSR back to stack before returning. Design questions There's quite a bit left to figure out. I haven't figured out the right balance for the thread model yet. Direct addressing would be faster because there would be no table lookup. If the address is accessed using computed goto/retlw then you'd have to have 2 words for each address. I'm also wondering if maybe directly accessing the program memory via the EEADDR/EEDATA system would be a win. That way I could encode 14 bit addresses into each word and only have one computed goto instead of the 4 (one for the token, two for the token address, one to get to the target word) that I have now. Now that I'm writing it, I'm pretty sure I should go back a recode it this way. It'll execute faster and I won't have to waste space tracking tokens. Banked memory is really awful. Tracking PCLATH and the IRP bit is truly a pain. Plus I'll save another word in each dictionary definition. The big issue is what to encode as primitives. Two examples are 1+: : 1+ 1 + ; Just an increment. Only two words in forth. However it would execute faster being natively coded. Another example is writing and reading ports. One could implement PORTA as a constant address. Then write two forth routines to read/write it: : !PORTA PORTA ! ; : @PORTA PORTA @ ; Or should these be written natively for speed? Interesting questions indeed. Resources I learned a lot from a lot of places. Brad Rodriguez has a whole series on writing forths for small systems (6809, Z80). You can find there here: http://www.bradrodriguez.com/papers Jones Forth for the PC is a great tutorial. Explains everything I outlined above and more. Shows the code for a ton of primitives. The Intel Pentium is the target, but the concepts are good: http://www.annexia.org/forth Sean Barrett's Obfuscated C code entry implements a primitive forth kernel in C then proceeds to define a fuller forth using it. A ton of nuts and bolts of manipulating the stack to implement compiling. http://www.ioccc.org/1992/buzzard.2.design BAJ -- http://www.piclist.com PIC/SX FAQ & list archive View/change your membership options at http://mailman.mit.edu/mailman/listinfo/piclist