On Fri, Jan 07, 2011 at 12:29:53PM -0500, Tamas Rudnai wrote:
> As far as I concern if you do that you will have problems with the strings,
> so if WHILE is a token of '[' then your string cannot contain that
> character. It is because in a tokenized form each character is a token
> itself already. If you need to use state machines to handle strings (aka.
> saying a string is always start and ends with quotation marks, therefore
> does not matter what is inside the string), then we are talking about parser
> rather than a tokenized interpreter.

I don't think this is that big an issue. One simple way to resolve this
problem is to embed the length into the string. Then the interpreter's
level of "parsing" is to add the length to the beginning of the string to
get to the end. Yes, I realize that some entity has to parse. But if there
is a compiler for human readable text, then it isn't that complicated to do.

Say for the sake of argument you use a base 27 encoding starting from the
ASCII '@' symbol for the length. So '@' is length 0, 'A' is 1 and so forth.
Use a leading ' to encode strings. Use a single quote since James picked
double quote for print. Just for kicks let's take some small bit of C's
backslash encoding (with the backslash included in the length). So for the
classic Hello World! program the bytecode would be:

"'NHello World!\n [1]

The 'N' between the ' and the 'H' of the Hello World is the length. Since
it's 14 characters, use the 14th character of the alphabet 'N' as the length.

BTW a 2 character base-32 length (start with '0') would encode up to 1024
characters in a 2 character ASCII format. Or maybe a single Base-64
character which is also quite printable.

Now James, you do realize that expressions would have to be coded in RPN?
That's the only way to do it so that a parser is not required. So instead of:

c:z@e-f+1 ;c is the e'th byte after z plus 1 less f

You would need something like:

Xze@f-1+c:$

Where X and $ bound the expression.
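Going back to the length-prefixed strings for a moment, here is a rough host-side sketch in Python of what I mean. It is only an illustration under my own assumptions: the names encode_string and skip_string are made up for this example, and it covers only the single '@'-based length character (lengths 0 to 26), counting a backslash escape as two characters like in the Hello World example.

```python
BASE = ord('@')   # '@' encodes length 0, 'A' is 1, ... 'Z' is 26
MAX_LEN = 26      # limit of the single-character length assumed here


def encode_string(text):
    """Compiler side: emit ' + length character + text.

    The text is counted exactly as written, so the two characters
    of a backslash escape such as \\n both count toward the length.
    """
    if len(text) > MAX_LEN:
        raise ValueError("single-character length only covers 0..26")
    return "'" + chr(BASE + len(text)) + text


def skip_string(code, pos):
    """Interpreter side: pos points at the ' token.

    Returns the string body and the position of the next token.
    No scanning of the body is needed, so it may contain characters
    that are tokens elsewhere, such as '[' or '"'.
    """
    if code[pos] != "'":
        raise ValueError("not a string token")
    length = ord(code[pos + 1]) - BASE
    start = pos + 2
    end = start + length
    return code[start:end], end


# The PRINT token (") followed by the encoded Hello World string.
hello = '"' + encode_string("Hello World!\\n")
```

With this, the interpreter never looks inside the string body. It reads the length character, steps over that many characters and carries on with the next token, so a '[' inside a string is never mistaken for WHILE.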
Once you do this, fundamentally you end up with a micro version of Forth,
which is exactly why Forth is the target language I'm trying to implement.
RPN facilitates strict left to right execution without parsing. All you need
is a tokenizer and an operand stack.

BAJ

>
> Tamas
>
>
> On Fri, Jan 7, 2011 at 5:12 PM, James Newton wrote:
>
> > I've thought about that sort of thing for a long time...
> >
> > When translating ASCII text keywords into bytecode, my question is: Why
> > can't the bytecode be ASCII and use symbols that relate to their function?
> > This would give you the ability to directly code in the "bytecode" system
> > without having or using a compiler / tokenizer and would make debugging
> > much easier.
> >
> > For example, why can't the bytecode for "IF" be "?"
> >
> > [ WHILE
> > ] WEND
> > { DO
> > } LOOP
> > & AND
> > " PRINT
> > Etc...
> >
> > Operations that don't relate to common ASCII punctuation marks can be
> > translated to uppercase letters:
> > P (Port) set SRC or DST to IO pins. E.g. 2P1 references pin 1 of port 2
> > H set last pin referenced (H)igh and make it an output
> > L set last pin referenced (L)ow and make it an output
> > I set last pin referenced as an (I)nput. Read value to bit accumulator
> > and set true flag if 1
> > Wdd (W)ait dd mSecs
> > Wdddd (W)ait dddd uSecs
> >
> > http://techref.massmind.org/techref/piclist/cump/bytecode.htm
> >
> > The first variable used in the source code, which might be named "myData",
> > could be tokenized as "a", the second as "b". Then you can still read them
> > and have a way of translating that back to the variable name without any
> > code between.
> >
> > If you leave some room between the actual register address defined by "a"
> > and "b", such that "a" references register 0 and "b" references register 4,
> > then multi-byte operations can be specified by a trailing length.
> > So a:b3 copies register 4,5,6, and 7 to registers 0,1,2, and 3. If
> > multibyte values are taken in LSB first (little endian) order, then
> > a:b3+c3 actually moves a LONG at b to a then adds c as a LONG.
> >
> > If you work with the idea of every operation having a destination,
> > operation and source, and you don't execute the operations until one of
> > those is specified more than once, or the line ends then you can make a
> > "bytecode" that is easy to execute in the PIC and that is VERY human
> > readable
> >
> > Examples:
> > c:z@e-f+1 ;c is the e'th byte after z plus 1 less f
> > 1P2L ;set port 1 pin 2 low
> > a:0 ;clear a
> > [a
> > a:a+1 ;a more complex byte code interpreter might handle a++
> > b:8P ;read the value of port 8 (might be ADC) into variable b
> > b>c?1P2H] ;once it reads a value higher than c (set prev) set pin high
> >
> > ;putting the wend on the line with the ? means it will only be executed if
> > ;the condition is false, so we fall out when the desired minimum is read.
> >
> > a=FF?"Timeout! Input Level="b
> >
> > The engine that executes those codes is actually very easy to write (I
> > think) and is described here:
> > http://techref.massmind.org/techref/idea/minimalcontroller.htm I think it
> > could be made to fit into most of the PICs, if not the littlest ones.
> >
> > Different interpretations of the punctuation marks can be used to optimize
> > for different functions. I think a compiler can be written in the same way:
> > http://techref.massmind.org/techref/language/meta-l/index.htm
> >
> >
> > --
> > James Newton
> > 1-970-462-7764
> >
> > -----Original Message-----
> > From: piclist-bounces@mit.edu [mailto:piclist-bounces@mit.edu] On Behalf
> > Of Neil Cherry
> > Sent: Sunday, January 02, 2011 16:02
> > To: Microcontroller discussion list - Public.
> > Subject: Re: [PIC] P-code/Toeknizer
> >
> > So far I've managed to find at least one system that might be of
> > interest, maybe two:
> >
> > http://darjeeling.sourceforge.net/
> > http://www.harbaum.org/till/nanovm/index.shtml
> >
> > And before anyone gets too excited, all I'm really doing nothing more
> > than investigating an idea (I get lots of them). What I was thinking
> > about was something like the BASIC Stamp stuff. Something with an
> > Ethernet and easy to write quick code for. Something not too
> > expensive. So far I'm meeting none of these goals. But it is getting
> > closer.
> >
> > As usual it's for my home automation projects and before anyone gets
> > too excited and suggest I build it myself I don't have the spare
> > cycles. So far my mind is tied up on the main hardware/software
> > architecture and sub-systems. I'll probably come back to this at a
> > later date.
> >
> > --
> > Linux Home Automation Neil Cherry ncherry@linuxha.com
> > http://www.linuxha.com/ Main site
> > http://linuxha.blogspot.com/ My HA Blog
> > Author of: Linux Smart Homes For Dummies
> > --
> > http://www.piclist.com PIC/SX FAQ & list archive
> > View/change your membership options at
> > http://mailman.mit.edu/mailman/listinfo/piclist
> >
>
>
>
> --
> int main() { char *a,*s,*q; printf(s="int main() { char *a,*s,*q;
> printf(s=%s%s%s, q=%s%s%s%s,s,q,q,a=%s%s%s%s,q,q,q,a,a,q); }",
> q="\"",s,q,q,a="\\",q,q,q,a,a,q); }

--
Byron A. Jeff
Department Chair: IT/CS/CNET
College of Information and Mathematical Sciences
Clayton State University
http://cims.clayton.edu/bjeff