Cybiko Bytecode Interpreter

Reasons For Inventing The Wheel
Design Decisions
Structure And Use of Bytecode Modules
Internal Register Assignments
Assembler
Assembler Directives
Instruction Operands
Instruction Set
Recent Additions
What Will Come Next

Reasons For Inventing The Wheel

The Cybiko computer is built around the Hitachi H8S processor, which, from the programmer's point of view, has many nice features, most notably excellent instruction timing, a 32-bit architecture (including a "flat" address space), and plenty of registers. When you need to squeeze the last cycle out of some time-critical piece of code, control hardware, or manage power savings, the H8S is great. But when it is necessary to come up with short programs, the H8S falls short. There are several reasons for this:

1. The H8S is a successor to a 16-bit processor, designed to maintain maximum compatibility with its predecessor. These Hitachi processors might appear to be like the Intel i80x86 processors at first glance. But while (in 32-bit "flat" mode) Intel "sacrificed" 16-bit instructions (whenever you employ an instruction that operates upon 16-bit operand[s], it gets prefixed with an extra byte), Hitachi "sacrificed" some 32-bit instructions. Most notably, 16-bit push/pop take 2 bytes each, while their 32-bit counterparts take 4 (!) bytes each. And remember, these are the instructions that the GNU C compiler uses for passing all function arguments to vararg functions and all arguments (except the first 3) to other functions.

2. The H8S has an extensive set of instructions for operating upon bits, bit masks, etc., and these instructions have rather short opcodes. It is amazing what one can do with a carry bit. This is great for operating system (especially device driver) writers, and for die-hard assembler addicts. But from the compiler's perspective, they are almost useless, and simply eat up opcode space, making "useful" (from the compiler's point of view) opcodes much longer.

3. The H8S has a RISC-like architecture in many respects. First and foremost, the lengths of all its opcodes are multiples of 2 (the length of a machine word). For Hitachi engineers trying to optimize the opcode set for average code length, this was a bad thing. RISC architecture also implies that, in any opcode, all involved registers must be denoted explicitly, and that all you can do with a memory operand is load or store it.

4. It seems that in the famous "time vs. space" tradeoff, Hitachi engineers have always favoured time. The most notable example is the execution flow control instruction subset. There are two "flavours" of most such opcodes: "branches" (opcode names derive from 'branch'; the opcodes are 2-byte and employ 8-bit relative displacements) and "jumps" (names derive from 'jump'; the opcodes are 4-byte and employ 24-bit displacements). The funny thing is that, while instructions must be aligned on 2-byte (word) boundaries, "branch" opcodes address bytes, not words. That is, displacements stored within instructions never have their lowest bit set. Even if you (via some cheating) store 1 there, it will be ignored by the internal processor logic. So why not store disp/2 and "expand" it (i.e. left-shift it) just before use, enlarging the reachable range (and thus reducing the need for the dword-sized jumps) by a factor of two? It seems like this would require one extra processor cycle.
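The arithmetic behind the last point can be sketched in C. This is only an illustration: the helper names are hypothetical, and `next_pc` is assumed to be the address of the instruction following the branch.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of any Hitachi or
   Cybiko API. */

/* Scheme the text describes for H8S branches: the 8-bit field holds a
   byte displacement, so the reach is -128..+127 bytes. */
static int32_t byte_disp_target(int32_t next_pc, int8_t field)
{
    return next_pc + field;
}

/* Alternative the text proposes: the field holds disp/2 and is
   doubled (left-shifted by one) before use, so the reach becomes
   -256..+254 bytes. */
static int32_t word_disp_target(int32_t next_pc, int8_t field)
{
    return next_pc + (int32_t)field * 2;
}
```

With the same 8-bit field, the second scheme reaches twice as far in either direction, which is the "factor of two" mentioned above.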

The features mentioned above are great if you're dealing with a controller, firmware for a "frozen" built-in system, or just something time-critical. But Cybiko features an extensive object-oriented OS (with pre-emptive multitasking), a GUI, an extensive and ever-growing set of applications, and much more. And all of this must fit in only 250k of RAM. So, we started looking for something less expensive in terms of consumed memory, and came up with our bytecode interpreter (which, BTW, currently uses only 950 bytes by itself!).

The above list could be expanded, of course, but there is also another big reason for implementing a bytecode interpreter, which can be explained in even greater detail. Fortunately, Sun Microsystems did the boring work for us by advertising their Java virtual machine with a bytecode interpreter of its own. So to learn more about the huge and undisputed advantages :-) of interpreted vs. native code, you're probably better off with Sun's The Java Language Environment: A White Paper (by James Gosling and Henry McGilton).

Design Decisions

A good interpreter has to be small and fast. These days, "small" is no longer considered a mandatory property, but things look very different when all you have is 250k of RAM shared with other (potentially even more demanding) applications running concurrently. So we have taken some steps to pursue both goals; specifically:

1. Our interpreter, as a virtual machine, works with registers, not with a [virtual] stack. There are two virtual registers, R0 and R1 (which correspond to the real registers er0 and er1 of the H8S processor), which get used as operands. This does not lead to opcode bloat, however: each register has a hard-coded role, so no opcode contains a "register bit". For example, push implies register R0, pop implies register R1, and so on. This may sound weird at first, but there are very clever ways to make use of such opcodes, and our existing C compiler proves that. As for opcode implementations, please consider this one:
add:
   add.l er1, er0     ; 2 bytes, 1 cycle
   bra scheduler     ; 2 bytes, 2 cycles

It is that simple.

2. There is no translation layer between CyOS and the hardware on one side, and the bytecode interpreter on the other. For example, the push, pop, calln, and retn opcodes make use of the regular stack and not an emulated one. Upon calls to CyOS and to so-called extension functions (see below), which expect the first 3 parameters to be in registers, those parameters' values get placed into the proper registers as a result of standard expression evaluation sequences (e.g. extension functions, written in "regular" C, expect their third argument to be 'this' and expect it to be passed in register er2, and the bytecode interpreter always keeps 'this' in er2). That means that the interpreter does not have to move registers' values around before each call. Thus the bytecode interpreter and the rest of the system are pretty well integrated.

3. The bytecode interpreter has neither 'stack frame pointer' nor 'code buffer pointer'. For those of you familiar with the i80x86 processors: 'stack frame pointer' used to be BP in 16-bit programs, and EBP in 32-bit. Compiler writers managed to always compute local variables' addresses relative to ESP, and thus freed EBP for use as a general register. In GNU C/C++, this is known as 'omit frame pointer' optimization. Our C compiler always does this optimization, so we dropped the very notion of 'stack frame pointer' out of our virtual machine specification.

Almost the same applies to 'code buffer pointer' - there is no such thing. Even such an instruction as leag.u contains displacement relative to the very next instruction, not any "absolute" offset. Other instructions, such as calln.s and jump.c are similar in this respect. In other words, the code is essentially position-independent (a.k.a. PIC).

4. The maximum size of a bytecode module is 64k; therefore, the "local" address space is essentially 16-bit and, consequently, "static" offsets (including those used in the calln.s opcode) are 2-byte. Furthermore, stack frames are addressed with unsigned 1-byte displacements (thus, a function cannot have more than ((255-4-1)&~3)==248 bytes of arguments and 'auto' variables in total); objects are also addressed with unsigned 1-byte displacements (therefore, no more than 256 bytes are addressable via 'this', although objects themselves may be of any size up to 64k). But as soon as any address gets loaded into a register (say, as a result of executing the leal.b bytecode, which adds the 1-byte offset it contains to the current value of the stack pointer and places the result in R0), it becomes a valid 32-bit address, fully compatible with those used by the rest of the system.

5. There are bytecodes for "object" commands, that is, opcodes that operate upon data addressed relative to the special 'this' pointer. In other words, there are provisions for object-oriented languages, such as C++. The current implementation of C has some OO extensions implemented via the use of these opcodes (which save considerable space). See leat.b for more information.

6. Not all arithmetic opcodes are 32-bit. Multiplication and division are essentially 16-bit in that if their operands do not fit within the -32768..32767 range, results are unpredictable (this does not apply to the dividend). This is a design decision.

7. Unsigned data types are not supported, just like in Java. The only supported data types are signed char, short (a synonym for int), and long. However, the unsigned shift right and the specialized Unicode character type found in Java are not supported in the Cybiko bytecode interpreter. Note that the H8S has a 24-bit address space and no virtual memory, so any address is guaranteed to have its 8 higher bits clear; in other words, addresses can safely be treated as signed entities and still compare correctly. We used this fact for a major optimization: we excluded all opcodes for unsigned comparisons.

8. Opcodes are bytes, so there may be up to 256 of them. The thing is that we take the famous "profile anything, assume nothing" rule quite seriously. From this perspective, we want to build as perfect an instruction set as possible while maintaining maximum (including full backward) compatibility with current implementation. Therefore, we decided to implement a minimalist set of opcodes, wait until large pieces of software which make use of that set appear, and then "profile" them to see what bytecode sequences are most used, and then make them into new bytecodes. Similar approaches already proved to be highly efficient with our proprietary compressor.

9. The bytecode interpreter is fully re-entrant. Any number of applications executing bytecode modules (even with different sets of extension functions) share a single in-memory image of the respective dynamic library (bytecode.dl).
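The first design decision (two virtual registers with hard-coded roles) can be illustrated with a very small C sketch. It is not the real interpreter, which is written in H8S assembly: the opcode numbers, the `run_sketch` name, and the fixed-size stack are all assumptions made for this example, and there are no bounds checks.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical opcode numbers, chosen for this sketch only. */
enum { OP_LOAD_C, OP_PUSH, OP_POP, OP_MOVE, OP_ADD, OP_RETF };

/* Runs a tiny program and returns the final value of R0. */
static int32_t run_sketch(const uint8_t *code)
{
    int32_t r0 = 0, r1 = 0;   /* virtual registers R0 and R1 */
    int32_t stack[16];        /* toy stack; the real VM uses the CPU stack */
    size_t sp = 0;

    for (;;) {
        switch (*code++) {
        case OP_LOAD_C: r0 = (int8_t)*code++; break; /* signed byte operand */
        case OP_PUSH:   stack[sp++] = r0;     break; /* always pushes R0 */
        case OP_POP:    r1 = stack[--sp];     break; /* always pops into R1 */
        case OP_MOVE:   r1 = r0;              break; /* R0 -> R1 */
        case OP_ADD:    r0 += r1;             break; /* R0 = R0 + R1 */
        case OP_RETF:   return r0;
        default:        return r0;            /* unknown opcode: stop */
        }
    }
}
```

Because no opcode names a register, every instruction without an immediate operand fits in a single byte; the cost is that the compiler must arrange values so that they land in R0 or R1 at the right moment.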

Structure And Use of Bytecode Modules

The structure of a module is simple. At the very beginning, there is a table containing an even number of 16-bit words: one word, then some number of pairs of words, then one more word. The very first word is an offset to the module entry point: function main(), or whatever you call it; please see the VCC1 compiler documentation for more info on how that compiler handles the module entry point (however, your treatment may vary; see below). The pairs are offsets (again, relative to the start of the module) to the data and code of "exported" objects, respectively (remember: all words, including double words, are stored in BE, i.e. big-endian, order!). The very last word is a "terminating NULL". Currently, the following tasks are up to the programmer:

1. How to load the module into memory. You should probably load it like any other resource - from the .app or .dl application archive. You may then keep it in memory (and keep p-code running) until your program terminates, or you may free it upon losing focus (thus effectively suspending p-code execution) and then re-load upon getting focus again.

2. How to interpret data and code pointed to by those offsets at the beginning of the module. For example, Cylandia treats the first offset as the module initialization function's offset (which contains relocation records and objects' ctors), and treats remaining offsets (except for the very last one, of course) as offsets to data and code of game actors. Our C compiler thinks of them as exported structures and respective exported methods.
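The header layout described at the start of this section can be walked with a few lines of C. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the code assumes that a zero word found where a pair would start is the terminating NULL.

```c
#include <stdint.h>
#include <stddef.h>

/* Read one big-endian 16-bit word from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: offset of the module entry point (first word). */
static uint16_t module_entry_offset(const uint8_t *module)
{
    return read_be16(module);
}

/* Hypothetical helper: number of (data offset, code offset) pairs that
   follow the entry-point word, up to the terminating zero word. */
static size_t module_export_count(const uint8_t *module, size_t size)
{
    size_t count = 0;
    size_t pos = 2;                  /* skip the entry-point word */

    while (pos + 2 <= size && read_be16(module + pos) != 0) {
        pos += 4;                    /* one pair = two 16-bit words */
        count++;
    }
    return count;
}
```

Reading the words byte by byte keeps the sketch independent of the host's endianness; on the Cybiko itself the words could be read directly, since the H8S is big-endian too.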

The only way to execute bytecodes is to call one of the vm_exec family of functions found in the bytecode.dl library, like this (below, we assume that the entire module has been loaded into 'word_t* bytecode'):

  /* 1) call module initialization routine */

  vm_exec( NULL,                                                 /* no active objects/actors yet */
           (byte_t*) bytecode +
           bytecode[ bytecode[ I_FIRST_EXPORT ] + I_STARTUP ],   /* startup code */
           extension_functions );                                /* functions imported by p-code */

  /* 2) enter main loop */

  vm_exec_3( NULL,                                               /* no active objects/actors yet */
             (byte_t*) bytecode +
             bytecode[ bytecode[ I_FIRST_EXPORT ] + I_MAIN ],    /* main(): module entry point */
             extension_functions,                                /* functions imported by p-code */
             argc, argv, TRUE );                                 /* arguments */

  /* 3) call module cleanup routine */

  vm_exec( NULL,                                                 /* no active objects/actors yet */
           (byte_t*) bytecode +
           bytecode[ bytecode[ I_FIRST_EXPORT ] + I_CLEANUP ],   /* cleanup code */
           extension_functions );                                /* functions imported by p-code */

Important note: all functions that are called via vm_exec() must return with retf opcode (in other words, they're considered far, as opposed to near functions callable via calln.s).

The very last argument to vm_exec() is the address of the table of functions prototyped as follows (also, see callx.b opcode description):

typedef dword_t (*import_t)( dword_t arg0, dword_t arg1, void* this_ptr );

import_t extension_functions[] = { /* ... */ };

The primary use of extension functions is for implementation of time-critical pieces of code and filling in the gaps which currently exist in the interpreter's import abilities. For example, currently there is no way to import CyOS' global variables, so you'll probably have to provide an extension function which returns the address of a variable (say, a font handle) to get access to it.
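As an illustration of the font-handle case just mentioned, an extension function that hands the address of a global to p-code might look like the sketch below. The names `g_font_handle` and `ext_get_font_handle_addr` are hypothetical, and `dword_t` is defined here as `uintptr_t` only so that the sketch compiles on a desktop host (on the Cybiko it is presumably a 32-bit type).

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for the CyOS type, so the sketch compiles off-device. */
typedef uintptr_t dword_t;
typedef dword_t (*import_t)(dword_t arg0, dword_t arg1, void *this_ptr);

/* Hypothetical global that p-code cannot import directly. */
static int g_font_handle = 7;

/* Extension function: returns the address of the global. */
static dword_t ext_get_font_handle_addr(dword_t arg0, dword_t arg1,
                                        void *this_ptr)
{
    (void)arg0; (void)arg1; (void)this_ptr;   /* unused in this sketch */
    return (dword_t)&g_font_handle;
}

/* Table passed as the last argument to vm_exec(); p-code would reach
   entry 0 through 'callx.b 0'. */
static import_t extension_functions[] = { ext_get_font_handle_addr };
```

The p-code side would then treat the returned value as an ordinary pointer and read through it with one of the indirect load opcodes.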

Please see the attached example program for further details.

Internal Register Assignments

Internally, the bytecode interpreter (i.e. the vm_exec() function found in the bytecode.dl dynamic library) uses the following register assignments:

er0 virtual register R0 (accumulator/pointer for indirect loads),

er1 virtual register R1 (aux data register/pointer for indirect saves),

er2 'this' for the object/actor/struct being processed,

er3 scratch register,

er4 table of extension functions (caller-supplied),

er5 table of built-in opcodes,

er6 p-code instruction pointer (static data/code space),

er7 stack pointer.

Assembler

The bytecode assembler, the vas utility, understands the following options:

-i display version information (logo),

-d disassemble its own output to stdout (check output integrity),

-h display help (usage info, currently quite terse, to say the least),

-o outfile name the output file (stdout by default),

-O optimize the output code,

-v verbose mode: print statistics about the module being compiled.

If no options are given, vas reads stdin, writes to stdout, does not optimize the output code, and does not print statistics. The preferred extension for the output file is .bin, which stands for binary. While parsing input:

empty lines are ignored,
a semicolon (;) is considered a comment character: the semicolon itself and everything up to the end of the line is treated as a comment and is ignored,
if the first non-blank character on the line is a period (.), that line is considered to be an assembler directive,
otherwise, the line must contain a valid instruction,
only one instruction may be placed on a line.
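The parsing rules above amount to a simple classification of each input line. The following C sketch shows one way to express them; `classify_line` and the `line_kind` values are names invented for this example and are not taken from the vas sources.

```c
#include <ctype.h>

enum line_kind { LINE_EMPTY, LINE_DIRECTIVE, LINE_INSTRUCTION };

/* Classify one source line according to the rules listed above. */
static enum line_kind classify_line(const char *line)
{
    /* Skip leading blanks. */
    while (*line != '\0' && isspace((unsigned char)*line))
        line++;

    /* Nothing left, or only a comment: the line carries no code. */
    if (*line == '\0' || *line == ';')
        return LINE_EMPTY;

    /* A leading period marks an assembler directive. */
    if (*line == '.')
        return LINE_DIRECTIVE;

    /* Anything else must be an instruction. */
    return LINE_INSTRUCTION;
}
```

A real assembler would go on to strip the trailing comment and validate the mnemonic; the sketch only decides which of the three paths a line takes.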

Assembler Directives

Note: forward references (i.e. references to labels which have not been defined yet) are OK. Moreover, the assembler operates so that even resolving backward references is always postponed until the second pass.

.ln <line_number>

Associates all the following instructions (until next .ln directive) with a particular line number within source code in a higher-level language (say, C, C++, BASIC, Pascal, etc.) Compilers should use this directive extensively while emitting code so that the assembler can report errors and other problems in association with the source line that caused them.

.label <global_label>

Introduces a global label. Label name is a number in range [99, 32768]. It does not matter (at all) what that label denotes — program code or data, or even something else (like control information); the label is just assigned the current value of the Program Counter (PC). Two, or more, labels on sequential lines are OK; these labels will be identical.

.disp <global_label> <displacement>

Allocates 2 bytes and then stores there the unsigned distance from the current position of the Program Counter forward to the label <global_label> + <displacement>. I.e. the following sequence

      .disp 555 0
      .label 555

stores 0, not 2; while the following sequence stores 7:

      .disp 333 0
      .skip 7
      .label 333

Currently, this directive is used in conjunction with the switch opcode only.
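The two .disp examples above imply that the distance is measured from the byte just past the 2-byte cell. A small C sketch of that computation follows; `disp_value` is a hypothetical helper, and the addresses are plain module offsets.

```c
#include <stdint.h>

/* Hypothetical helper: value that '.disp <label> <displacement>' would
   store, given the address of the directive's own 2-byte cell and the
   address of the label. The distance is measured from the byte that
   follows the cell. */
static uint16_t disp_value(uint32_t disp_addr, uint32_t label_addr,
                           uint32_t displacement)
{
    return (uint16_t)(label_addr + displacement - (disp_addr + 2));
}
```

For a cell at offset 100, a label immediately after it (offset 102) gives 0, and a label placed after a further `.skip 7` (offset 109) gives 7, matching the two sequences shown above.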

.ldisp <global_label> <displacement>

Allocates 4 bytes and then stores there the distance from the current position of the Program Counter to the label <global_label> + <displacement>.

.offset <global_label>

Allocates 2 bytes and stores an absolute offset to the label <global_label>. This directive is currently used for building export tables only.

.even

If the current value of the Program Counter is an odd number, this directive emits a single zero byte so that the Program Counter becomes even. Please note that there is no need to align code, so this directive is only useful for data elements bigger than one byte.

.skip <number_of_bytes>

Advances the Program Counter by the specified <number_of_bytes> (i.e. allocates <number_of_bytes> zero bytes).

.byte <number>

Emits byte <number>. Preferred way to initialize static data.

.word <number>

Emits word <number>.

.string <text_in_double_quotes>

Emits string literal with the value <text_in_double_quotes>, plus terminating 0. Unfortunately, there is currently no way to escape any symbols, notably newline and quotable quotation marks. As a workaround, the .byte directive could be used to emit any bytes, but please be warned that, in the future, characters may no longer be single bytes.

.bss

Starts the 'bss' segment (the name traditionally stands for "Block Started by Symbol"). This is the preferred place for uninitialized global variables.

.text

Starts segment of code.

.data

Starts segment of data.

.rodata

Starts a segment of read-only data. Duplicate segments of this type will be merged by vlink.

.public <label> "<name>"

Declares a public symbol for use in other modules.

.extern <label> "<name>"

Declares an external symbol used in this module.

.end

Specifies logical end of the source file. Source files that do not use this directive may not compile with future versions of the assembler.

Instruction Operands

All bytecodes have one explicit operand, at most. However, assembler instructions for some of them seem to allow for one extra argument (see below). Please note that this is only a notation device — a way to tell assembler to adjust target address after converting label number or base address into a final offset (stack-, object-, or buffer-relative).

Suppose the Cybiko C compiler sees something like this:

    int my_array[ 3 ];
    ...
    my_array[ 2 ] = 1;

Obviously, there is no need to load the index, scale it (since array elements are word-sized), then load the address of the array, then add it to the scaled index, etc. Instead, our compiler will produce something like this:

    .label 777 ; my_array
    .skip 6
    ...
    leag.u 777 4 ; my_array
    move
    load1
    storeis

Here, 777 is the label number (ordinal), 6 is the size of my_array[], and 4 is the extra displacement relative to the label 777. Note that the assembler will turn 'leag.u 777 4' into a single opcode followed by a word-sized displacement (hence the .u suffix).

Note that instruction suffixes designate optional data that may follow (and not the size of the operands):

     .c   char (that is, signed byte),
     .b   byte (unsigned byte),
     .s   short (16-bit signed word),
     .u   unsigned short (16-bit unsigned word),
     .w   word (16-bit unsigned word),
     .l   long word (32-bit signed double word).
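Since the suffix tells how many bytes of data follow the opcode byte, a disassembler or size estimator could derive the operand size from the mnemonic alone. The following C sketch does that; `operand_size` is a hypothetical helper, and it assumes that every suffix is a single letter from the list above.

```c
#include <string.h>

/* Hypothetical helper: number of operand bytes that follow the opcode
   byte, derived from the mnemonic's suffix. Returns 0 when there is no
   suffix and -1 when the suffix is not one of those listed above. */
static int operand_size(const char *mnemonic)
{
    const char *dot = strrchr(mnemonic, '.');

    if (dot == NULL)
        return 0;                      /* no suffix: opcode byte only */
    if (dot[1] == '\0' || dot[2] != '\0')
        return -1;                     /* suffix must be one letter */

    switch (dot[1]) {
    case 'c': case 'b':           return 1;   /* char, byte */
    case 's': case 'u': case 'w': return 2;   /* 16-bit forms */
    case 'l':                     return 4;   /* long word */
    default:                      return -1;
    }
}
```

Note that this covers only the data encoded with the instruction itself; directives that must follow some opcodes (such as the .disp after calld12.w) are separate.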

Instruction Set

seteq

Compares R0 to R1, sets R0 to 1 if registers are equal, or to 0 otherwise.

setlt

Compares R0 to R1, sets R0 to 1 if R1 is less than R0, or to 0 otherwise.

setle

Compares R0 to R1, sets R0 to 1 if R1 is less than or equal to R0, or to 0 otherwise.

setgt

Compares R0 to R1, sets R0 to 1 if R1 is greater than R0, or to 0 otherwise.

setge

Compares R0 to R1, sets R0 to 1 if R1 is greater than or equal to R0, or to 0 otherwise.

setz

Tests R0 and sets it to 1 if it was zero, or to 0 otherwise, effectively performing 'logical not' upon contents of R0. Roughly equivalent to the following opcode sequence: move load0 seteq.

setnz

Tests R0 and sets it to 1 if it was non-zero; roughly equivalent to the following opcode sequence: move load0 setne. This is effectively 'convert to boolean' operator performed upon contents of R0.
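The operand order of the comparison opcodes is easy to get backwards, so here is a C sketch of the semantics as described above. The function names are hypothetical; each function returns the new value of R0.

```c
#include <stdint.h>

/* Each function returns the new value of R0. Note that R1 is the
   left-hand side of every two-register comparison. */
static int32_t op_seteq(int32_t r0, int32_t r1) { return r1 == r0; }
static int32_t op_setlt(int32_t r0, int32_t r1) { return r1 <  r0; }
static int32_t op_setle(int32_t r0, int32_t r1) { return r1 <= r0; }
static int32_t op_setgt(int32_t r0, int32_t r1) { return r1 >  r0; }
static int32_t op_setge(int32_t r0, int32_t r1) { return r1 >= r0; }

/* Single-register tests. */
static int32_t op_setz (int32_t r0) { return r0 == 0; }  /* logical not */
static int32_t op_setnz(int32_t r0) { return r0 != 0; }  /* to boolean  */
```

Under this reading, an expression such as `a < b` would plausibly be compiled by evaluating `a`, moving it to R1, evaluating `b` into R0, and then issuing setlt.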

cmpe.c <num>

Compares R1 to signed byte <num> (-128..127), sets R0 to 1 if they are equal, or to 0 otherwise. Same as the 'load.c <num> seteq' sequence but more efficient.

cmpe.s <num>

Compares R1 to signed word <num> (-32768..32767), sets R0 to 1 if they are equal, or to 0 otherwise. Same as the 'load.s <num> seteq' sequence but more efficient.

cmpe.l <num>

Compares R1 to signed double word <num>, sets R0 to 1 if they are equal, or to 0 otherwise. Same as the 'load.l <num> seteq' sequence but more efficient.

switch

After this opcode, there must be a table of 16-bit words formed with the respective number of .disp directives, which denote jump displacements for switch values 0, 1, 2, etc. The number of .disp directives must be greater than the maximum switch value expected (values 0..N need N+1 entries). No bounds checks are currently performed at run time (for the sake of efficiency), so use of this opcode may be tricky! Used to implement the switchfast C statement.
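The table lookup performed by switch can be sketched in C as follows. This is an assumption-laden illustration: `switch_target` is a hypothetical helper, and the target computation assumes that each entry is a big-endian .disp word measured from the byte just past that entry, as the .disp directive description suggests.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: given the module offset of the first table
   entry (the byte right after the switch opcode) and the switch value,
   return the module offset of the jump target. Like the opcode itself,
   it performs no bounds checks. */
static size_t switch_target(const uint8_t *module, size_t table_off,
                            uint32_t value)
{
    size_t entry = table_off + (size_t)value * 2;   /* 2 bytes per entry */
    uint16_t disp = (uint16_t)((module[entry] << 8) | module[entry + 1]);

    return entry + 2 + disp;   /* distance counts from just past the entry */
}
```

Because nothing checks `value` against the table length, a compiler emitting this opcode has to guarantee the range itself, for example by testing the value before the switch.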

jump.c <label>

Unconditional branch to an instruction that is within the -128..127 range relative to the instruction immediately following this one. If the target <label> is not within the reach of this instruction, the assembler will try to use its 'long' counterpart, jump.s.

jumpz.c <label>

If R0 is zero, then branch to an instruction that is within the -128..127 range relative to the instruction immediately following this one. If the target <label> is not within the reach of this instruction, the assembler will try to use its 'long' counterpart, jumpz.s.

jumpnz.c <label>

If R0 is not zero, then branch to an instruction that is within the -128..127 range relative to the instruction immediately following this one. If the target <label> is not within the reach of this instruction, the assembler will try to use its 'long' counterpart, jumpnz.s.

jump.s <label>

Unconditional branch to instruction that is within the -32768..32767 range relative to the instruction immediately following this one.

jumpz.s <label>

If R0 is zero, then branch to instruction that is within the -32768..32767 range relative to the instruction immediately following this one.

jumpnz.s <label>

If R0 is not zero, then branch to an instruction that is within the -32768..32767 range relative to the instruction immediately following this one.
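The choice between the short (.c) and long (.s) jump forms comes down to a range check on the displacement. A minimal C sketch, with hypothetical helper names, assuming one opcode byte followed by the displacement bytes implied by the suffix:

```c
#include <stdint.h>
#include <stdbool.h>

/* disp is the target address minus the address of the instruction
   that immediately follows the jump. */
static bool fits_short_jump(int32_t disp)
{
    return disp >= -128 && disp <= 127;
}

static bool fits_long_jump(int32_t disp)
{
    return disp >= -32768 && disp <= 32767;
}

/* Hypothetical helper: encoded size in bytes of the jump that would be
   chosen, or 0 if the target is out of reach of both forms. */
static int jump_size(int32_t disp)
{
    if (fits_short_jump(disp))
        return 2;              /* jump.c: opcode + 1-byte displacement */
    if (fits_long_jump(disp))
        return 3;              /* jump.s: opcode + 2-byte displacement */
    return 0;
}
```

In a real assembler the displacement itself depends on the sizes chosen for the jumps in between, so the selection generally has to be repeated until no jump changes size.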

calln.s <label>

Push address of the instruction immediately following call itself on stack, then jump to the instruction that is within the -32768..32767 range from the immediately following one.

calli

Indirect call: push address of the instruction immediately following call itself on stack, then jump to the instruction whose address is currently within R1.

callx.b <index>

Call the extension function identified by its index within the table of extension functions.

calls12.w <index>

Call the CyOS function which takes 0, 1, or 2 arguments. For 1-argument system functions, simply evaluate (just before call) the necessary expression so that the evaluation result is in R0, then use calls12.w. For 2-argument functions, evaluate the second argument, use move, then evaluate the first argument, then use calls12.w (or, if R1 is likely to be lost in the process of evaluation of the first argument, evaluate second argument, use push, evaluate first argument, use pop, then use calls12.w).

calls3.w <index>

Calls a CyOS function which takes 3 or more arguments. This opcode first pops the third argument off the stack, then acts exactly as calls12.w. So you must first evaluate and push arguments 3 and beyond in reverse order, then evaluate arguments 2 and 1, as shown above, then use calls3.w.

calld12.w <index>

Calls a function located within a CyOS dynamic library, or even an application that exports [some of] its functions. Very similar to calls12.w and even uses the same calling convention. However, it differs in that it must be immediately followed by a .disp directive which defines the offset to a 4-byte static variable that holds the address of the export table of the respective dynamic library. It is that table that is indexed with the argument <index>.

calld3.w <index>

Just like calld12.w, calls a function located within a CyOS dynamic library, or even an application that exports [some of] its functions. Very similar to calls3.w (uses the same calling convention) and to calld12.w (requires a .disp directive); see these opcodes for details.

retf

"Return far" — return from vm_exec().

retn

"Return near" — return from function called via calln.s.

retn.c <num>

Same as retn, but pop arguments off the stack upon return (here, <num> is number to be added to stack pointer just before return).

stack.c <num>

Add -128..127 to stack pointer.

push

Push R0.

pop

Pop R1.

popadd

The same as consecutive execution of the pop and add operations. New in virtual machine version 11 (included in SysPack 55).

move

Move R0 to R1; same as "push pop" sequence, but by far more efficient. The "push pop" sequence, however, may also be more convenient sometimes, since it allows you to preserve R1. See calls12.w opcode description.

load0

Load 0 (zero) into R0.

load1

Load 1 (one) into R0.

load.c <num>

Load -128..127 into R0.

load.s <num>

Load -32768..32767 into R0.

load.l <num>

Load double word into R0.

loadic.b <num>

Same as loadic, but the operand <num> is added to the byte pointer before the load. New in virtual machine version 11 (included in SysPack 55).

loadis.b <num>

Same as loadis, but the operand <num> is added to the word pointer before the load. New in virtual machine version 11 (included in SysPack 55).

loadil.b <num>

Same as loadil, but the operand <num> is added to the double word pointer before the load. New in virtual machine version 11 (included in SysPack 55).

leal.b <offset> <displacement>

Load effective address of 'auto' variable into R0.

leat.b <offset> <displacement>

Load effective address of 'object' variable into R0. If byte-sized offset specified in the instruction is 0, then such instruction actually means 'load this into R0'. Compilers for C++ and other OO languages may make use of this feature.

leag.u <label> <displacement>

Load the effective address of a 'static' variable or function into R0. If the target lies before this instruction, the assembler will try to use its 'back' counterpart, leagb.u.

loadic

Load char pointed to by contents of R0 into R0.

loadis

Load short pointed to by contents of R0 into R0.

loadil

Load long pointed to by contents of R0 into R0.

storeic

Store char contained in R0 at the address pointed to by contents of R1.

storeis

Store short contained in R0 at the address pointed to by contents of R1.

storeil

Store long contained in R0 at the address pointed to by contents of R1.
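The indirect loads and stores above can be modelled in C as follows. This is a sketch under stated assumptions: values are big-endian (as noted for module words), loads sign-extend (consistent with the statement that only signed types are supported), and the function names simply mirror the opcodes.

```c
#include <stdint.h>

/* Loads: the argument models the address held in R0; the return value
   is the new R0. */
static int32_t loadic(const uint8_t *p)
{
    return (int8_t)p[0];                          /* sign-extend a byte */
}

static int32_t loadis(const uint8_t *p)
{
    return (int16_t)((p[0] << 8) | p[1]);         /* sign-extend a word */
}

static int32_t loadil(const uint8_t *p)
{
    uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return (int32_t)v;
}

/* Store: p models the address held in R1, r0 the value held in R0. */
static void storeis(uint8_t *p, int32_t r0)
{
    uint32_t v = (uint32_t)r0;
    p[0] = (uint8_t)(v >> 8);                     /* high byte first */
    p[1] = (uint8_t)v;
}
```

On the H8S these are single native moves; the byte-wise form is used here only so that the sketch behaves the same on little-endian hosts.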

loadlc.b <offset> <displacement>

Load char located at the address 'stack pointer' + 'next byte', into R0.

loadls.b <offset> <displacement>

Load short located at the address 'stack pointer' + 'next byte', into R0.

loadll.b <offset> <displacement>

Load long located at the address 'stack pointer' + 'next byte', into R0.

loadtc.b <offset> <displacement>

Load char located at the address 'this' + 'next byte', into R0.

loadts.b <offset> <displacement>

Load short located at the address 'this' + 'next byte', into R0.

loadtl.b <offset> <displacement>

Load long located at the address 'this' + 'next byte', into R0.

storelc.b <offset> <displacement>

Store char from R0 at the address 'stack pointer' + 'next byte'.

storels.b <offset> <displacement>

Store short from R0 at the address 'stack pointer' + 'next byte'.

storell.b <offset> <displacement>

Store long from R0 at the address 'stack pointer' + 'next byte'.

storetc.b <offset> <displacement>

Store char from R0 at the address 'this' + 'next byte'.

storets.b <offset> <displacement>

Store short from R0 at the address 'this' + 'next byte'.

storetl.b <offset> <displacement>

Store long from R0 at the address 'this' + 'next byte'.

loadgc.u <label> <displacement>

Combines leag.u and loadic.

loadgs.u <label> <displacement>

Combines leag.u and loadis.

loadgl.u <label> <displacement>

Combines leag.u and loadil.

load0p

Combines load0 and push.

load1p

Combines load1 and push.

loadp.c

Combines load.c and push.

loadp.s

Combines load.s and push.

loadp.l

Combines load.l and push.

lealp.b

Combines leal.b and push.

leatp.b

Combines leat.b and push.

leagp.s

Combines leag.u and push.

loadicp

Combines loadic and push.

loadisp

Combines loadis and push.

loadilp

Combines loadil and push.

loadlcp.b

Combines loadlc.b and push.

loadlsp.b

Combines loadls.b and push.

loadllp.b

Combines loadll.b and push.

loadtcp.b

Combines loadtc.b and push.

loadtsp.b

Combines loadts.b and push.

loadtlp.b

Combines loadtl.b and push.

loadgcp.s

Combines loadgc.u and push.

loadgsp.s

Combines loadgs.u and push.

loadglp.s

Combines loadgl.u and push.

load0m

Combines load0 and move.

load1m

Combines load1 and move.

loadm.c

Combines load.c and move.

loadm.s

Combines load.s and move.

loadm.l

Combines load.l and move.

lealm.b

Combines leal.b and move.

leatm.b

Combines leat.b and move.

leagm.s

Combines leag.u and move.

loadicm

Combines loadic and move.

loadism

Combines loadis and move.

loadilm

Combines loadil and move.

loadlcm.b

Combines loadlc.b and move.

loadlsm.b

Combines loadls.b and move.

loadllm.b

Combines loadll.b and move.

loadtcm.b

Combines loadtc.b and move.

loadtsm.b

Combines loadts.b and move.

loadtlm.b

Combines loadtl.b and move.

loadgcm.s

Combines loadgc.u and move.

loadgsm.s

Combines loadgs.u and move.

loadglm.s

Combines loadgl.u and move.

inc1

Add 1 to R0.

inc2

Add 2 to R0.

inc4

Add 4 to R0.

incic1

Add 1 to the byte addressed by register R0. New in virtual machine version 11 (included in SysPack 55).

incis1

Add 1 to the word addressed by register R0. New in virtual machine version 11 (included in SysPack 55).

incil1

Add 1 to the double word addressed by register R0. New in virtual machine version 11 (included in SysPack 55).

dec1

Subtract 1 from R0.

dec2

Subtract 2 from R0.

dec4

Subtract 4 from R0.

decic1

Subtract 1 from the byte addressed by register R0. New in virtual machine version 11 (included in SysPack 55).

decis1

Subtract 1 from the word addressed by register R0. New in virtual machine version 11 (included in SysPack 55).

decil1

Subtract 1 from the double word addressed by register R0. New in virtual machine version 11 (included in SysPack 55).

patch

The words following this opcode form a list of patches, terminated by a zero word (.word 0). Each word in the list is a relative offset (.disp) to a label, and each such label marks an .ldisp cell (the address of a variable). To the value of each such .ldisp cell, the absolute address of the byte following it is added. New in virtual machine version 11 (included in SysPack 55).

lshift1

Arithmetically shift R0 left by 1 bit.

lshift2

Arithmetically shift R0 left by 2 bits.

rshift1

Arithmetically shift R0 right by 1 bit.

rshift2

Arithmetically shift R0 right by 2 bits.

add

Add R1 to R0, result in R0.

add.c <num>

Add -128..127 to R0, result in R0.

add.s <num>

Add -32768..32767 to R0, result in R0.

add.l <num>

Add double word to R0, result in R0.

sub

Subtract R0 from R1 (!), result in R0.

neg

Negate R0.

mul

Multiply R0 by R1, result in R0.

mul.c

Multiply R0 by the operand of this instruction. New in virtual machine version 11 (included in SysPack 55).

muladd.c

Same as consecutive execution of the mul.c and add instructions. New in virtual machine version 13 (included in SysPack 56).

jumple.c

Jump if R0 <= R1. If the jump is taken, R0 is 0 after execution; otherwise R0 is 1. New in virtual machine version 13 (included in SysPack 56).

jumpge.c

Jump if R0 >= R1. If the jump is taken, R0 is 0 after execution; otherwise R0 is 1. New in virtual machine version 13 (included in SysPack 56).

div

Divide R1 by R0 (!), result in R0.

mod

Divide R1 by R0 (!), remainder (result) in R0.
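The "(!)" marks above flag a reversed operand order: for sub, div and mod, R1 is the left-hand operand and R0 the right-hand one. A C sketch of that convention follows; the function names are hypothetical, each returns the new value of R0, and the 16-bit operand restriction described under Design Decisions is not modelled.

```c
#include <stdint.h>

/* Each function returns the new value of R0. */
static int32_t op_sub(int32_t r0, int32_t r1) { return r1 - r0; }  /* R1 - R0 */
static int32_t op_div(int32_t r0, int32_t r1) { return r1 / r0; }  /* R1 / R0 */
static int32_t op_mod(int32_t r0, int32_t r1) { return r1 % r0; }  /* R1 % R0 */
```

This order fits the usual evaluation pattern: the left operand is computed first and parked in R1 (via move or push/pop), and the right operand ends up in R0. The sketch does not guard against a zero divisor, and the rounding of negative quotients on the real interpreter may differ from C's.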

and

Bitwise AND of R0 and R1, result in R0.

or

Bitwise OR of R0 and R1, result in R0.

xor

Bitwise eXclusive OR of R0 and R1, result in R0.

Recent Additions

SysPack version	Bytecode version	Version in root.inf	Changes
53, 54	10	N/A	initial implementation
55	11	N/A	decic1, incic1, decis1, incis1, decil1, incil1, popadd, patch, loadic.b, loadis.b, loadil.b, mul.c
56	13	N/A	muladd.c, jumple.c, jumpge.c, leagb.u
57	14	1.3.3	"root.inf" added
N/A	15	1.4.4	lshift, rshift, urshift1, urshift2, urshift, native, bcopy.b, swap, push2, pop2, move2, nop
N/A	16	1.5.1	loadiuc, loadius, loadiuc.b, loadius.b, loadluc.b, loadlus.b, loadtuc.b, loadtus.b, loadguc.u, loadgus.u, cuwd, cubd, deciuc1, inciuc1, decius1, incius1, setb, setbe, seta, setae, mulu.c, mulu, divu, modu
N/A	18	1.7.1	callni.b

What Will Come Next

Since we assumed nothing, we'll profile anything :-) and come up with the rest of 256-147=109 opcodes. Most probably, there will appear direct loads/saves for static variables and better (read: some :-) support for increments and decrements for variables (as opposed to memory locations denoted with address expressions).

Assembler should become macro assembler (not a top priority task, though).

Life should become easier :-).
