[...]
All this talk about expanding the 16CXX instruction set is almost
certainly just talk as far as Microchip is concerned.  However, it may not
be very long before other companies start bringing out "enhanced
PIC-type" chips, much like all the "enhanced" 8051s out there (unlike the
8051, the PIC is a solid foundation to start enhancing).

>> That wouldn't be my first choice for expanding the instruction set.
>> For that I'd like to have two instructions to move any bit directly in
>> and out of the Carry bit.  This would speed up serial I/O and general
>> bit-shuffling.  Also nice would be a way to complement a bit, which
>> the 17CXX has.
>
>I don't really think there's space for those instructions in the 16Cxx
>opcode set.  I agree there are probably more useful instructions than
>"movf" [without flags] but it would have fit very nicely in the "CLRW"
>group [i.e. make it so "clrf xxx,w" maps to "movfnoflags xxx,w"].  At
>this point that could not be changed without forcing people to
>re-assemble their code (substituting "ANDLW 0" for clrw) but otherwise
>I think the hardware mod would be quite simple.

ANDLW 0 will clear W and set Z; MOVLW 0 will clear W without affecting Z.
So the CLRW instruction is quite unnecessary.  But it is obviously an
extension of the CLRF instruction, and making it do something other than
put zero somewhere may be difficult.
>
>As for whether the bit ops would fit... hmm... I think you'd be
>limited to one "bit"-style instruction and for that I already have an
>idea in mind [though not a good name]: bmagic.
>
>The "bmagic" instruction behaves identically to "bsf" *unless* it
>follows a skip(*) in which case it would be *executed* as a "bcf".
>Thus, you could move a bit to carry via:
>
>        btfsc   Mybyte,Mybit
>         bmagic C
>
>and could also move bits to other bits, with or without negation, etc.
>The instruction would also be handy following "incfsz" and "decfsz"
>instructions.
>
Interesting, but likely hard to do since this means tampering with the
pipeline in the PIC.  In the present design, the execution unit
apparently forces the prefetched instruction to all zeros (NOP) if a skip
is found necessary.  The bmagic instruction would require an additional
decoder to determine if the prefetched instruction is bmagic and modify
rather than clear it (there probably isn't time to do this).  Or, rather
than clearing, a 'skip' flag could be added to the execution unit to make
it execute all instructions except bmagic as NOP.  This would likely
double the size of the microcode ROM, so it could be expensive.

>(*) Note: If a "bmagic" instruction immediately followed a "GOTO",
>"CALL", "RETLW", or "RETURN", it would get executed as a "BCF"
>instruction [since the CPU was trying to skip it].  Interrupt handling
>shouldn't pose a problem provided that the interrupt unit allows any
>pending "bmagic" instruction to execute unimpeded.
>
>Would there be any particular use for move-bit to or from carry which
>could not be handled just as well by "bmagic"?  The only one I can
>think of would be shifting (where the carry-flag update is automatic)
>but that's it.

That does happen a lot, for example when inputting or outputting serial
data on a port pin.  I think the move-to-C instructions would be easier to
implement than "bmagic" since they could be done entirely in the
execution unit.  "bmagic" does make it easy to complement the bit in the
process by reversing the polarity of the test, so a nice companion to the
move-bit instruction would be a complement-C instruction, which could be
used with the new mvbc and btfsc to do bit-by-bit xors.  Another use is
the arithmetic shift right, which preserves the value in bit 7:
        mvbc    foo,7
        rrf     foo,f
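
For comparison, here is roughly how the same two jobs get done on the
present 16CXX, as far as I know.  This is only a sketch: STATUS and C are
the usual names from the Microchip include files, and Mybyte, Mybit and
foo are placeholders.

        bcf     STATUS,C        ;Assume the bit is 0
        btfsc   Mybyte,Mybit    ;Is it really 1?
        bsf     STATUS,C        ;Yes, so set C (3 cycles either way)

        rlf     foo,w           ;Copy bit 7 of foo into C (W is destroyed)
        rrf     foo,f           ;Arithmetic shift right, bit 7 preserved

So mvbc would save two instructions on the bit move, and would leave W
alone in the shift.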

A while ago, someone mentioned that the SPARC processor has an option to
either discard or execute the prefetched instruction after a change in
program flow.  (Of course the btfsX instructions would *always* skip,
since there isn't much point if they don't.)  This wouldn't need much
modification to the PIC core and could make table lookups 2 cycles
faster.  With the bit set, just do

        addlw   low(table_start-1)      ;Offset into table (be sure
                                        ;PCLATH set up)
        movwf   PCL                     ;Vector to table
        goto    $+1                     ;After MOVLW in table, do this
                                        ;to get back here
        (next)
The table is a list of movlw's.  The instructions executed are:
fetch:   MOVWF  GOTO    MOVLW   (next)
execute: ADDLW  MOVWF   GOTO    MOVLW   (next)
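
For reference, the conventional lookup on the present PIC looks roughly
like this.  It is a sketch only: index, table and the returned values are
placeholder names, and PCLATH is assumed to be set up for the table's
page.

        movf    index,w         ;Offset into table
        call    table           ;Returns with the entry in W
        (next)

table
        addwf   PCL,f           ;Jump into the list below
        retlw   .10             ;Entry 0
        retlw   .20             ;Entry 1
        retlw   .30             ;Entry 2

The CALL, the write to PCL and the RETLW each take two cycles, so the
lookup itself costs six cycles, and it uses up a stack level as well.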

Also this would call and return subroutines without missing a cycle, but
it would be necessary to move the CALL or RETURN up one, causing the
instruction after the CALL to be executed both once before and once after
the subroutine.

        (prev)
        call    sub
        movlw   5
        (next)

sub
        andlw   2               ;
        return
        addlw   1

Present PIC
fetch:  CALL    MOVLW   ANDLW   RETURN  ADDLW   MOVLW   (next)
exec:   (prev)  CALL    NOP     ANDLW   RETURN  NOP     MOVLW
No skip PIC
fetch:  CALL    MOVLW   ANDLW   RETURN  ADDLW   MOVLW   (next)
exec:   (prev)  CALL    MOVLW   ANDLW   RETURN  ADDLW   MOVLW

This would take quite a bit of getting used to for assembler programmers
like me.  Compilers should be able to handle it better.  But, it may
waste more time and instructions turning this feature off and on than
could be saved by using it.



< about expanding RETFIE so it loads STATUS from RAM at the same time>
>Interesting notion.  I think it would probably be better in practice
>to make the "retfie" load W instead of status; this ability to load W
>would also be handy in many other contexts as well.  I like the notion
>of having a bit in RETxx which specifies whether it should set the
>interrupt-enable; combined with the ability to fetch an operand that
>could be very handy.

I thought about that.  The reason I went with STATUS is that it takes 2
instructions to save or load STATUS and only one to do W (which could be
a MOVFW since STATUS will be rewritten with the return and load).  If the
W register were mapped in file space (a highly desirable feature which
you mentioned below), then the return and load W instruction could be
made to act as a simple return by pointing it at W.  Instructions to load
and save STATUS directly may be useful in ISRs and other circumstances
but I doubt there's space for them.

>
>Shadow registers are a nice concept; if you don't nest interrupts the
>hardware cost is probably not too unreasonable, but they allow
>interrupt latency to be practically nullified.

A shadow for PCLATH would also be useful.  Without hardware support of a
data push-pop stack, nested interrupts are going to get too
time-consuming to be worth attempting.
>
>Anyhow, there are four more magic instructions I'd like to see...
>
>[1] "forcef" this instruction would fetch its operand from a register,
>leave it in the operand buffer, and disable the operand fetch on the
>next instruction.  For example, if W held 5, then the sequence:
>
>        forcef  Foo
>        addwf   Bar,f
>
>would place [Foo+5] into Bar.  A more typical use would be:
>
>        forcef  foo
>        movnf   Bar,f   ; movnf = move, without bothering flags
>
>which would copy Foo into Bar without affecting anything else.

I've never had much occasion to do this.  More general-purpose would be
an instruction which exchanges W and a register:

        xchwf   bar     ;save W into bar
        movfw   foo     ;Get foo
        xchwf   bar     ;bar = foo, recover old W

In the example above, the pesky MOVFW will affect Z.  But, if the value
of foo didn't need to be preserved the movfw could be replaced with an
xchwf, leaving the old value of bar in foo (a complete exchange of foo
and bar).  The xchwf could also be the no-flags load W, if it didn't
matter that the value in RAM was lost (such as when returning from an
interrupt).
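
Until something like xchwf exists, the usual way I know of to exchange W
and a register on the present PIC is three XORs (a sketch; bar is any
file register):

        xorwf   bar,f           ;bar = bar xor W
        xorwf   bar,w           ;W = old bar
        xorwf   bar,f           ;bar = old W

This needs no temporary register, but it takes three cycles and it does
affect Z.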

>[2] "forcel"; this instruction would behave like "forcef" except that
>the operand would be taken from the opcode itself [as in "xxxLW"
>instructions].

The place I would find this useful would be for resetting a counter, e.g.
a counter that goes from 40 down to 1 then starts at 40 again:

        decf    counter,f       ;Decrement counter
        forcel  .40
        skpnz                   ;Did counter reach 0?
        movf    counter,f       ;Yes, restart it.

But this won't work with your forceX instructions since the forced
operand only lasts for one instruction.  So I may as well put the value in
W and save it if necessary.  A decfsnz instruction as on the 17CXX would
be useful in situations like these.
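
On the present 16CXX that comes out something like this (a sketch;
counter is a placeholder name, and W is destroyed):

        movlw   .40             ;Reload value, flags not affected
        decf    counter,f       ;Decrement counter, Z set if it reached 0
        skpnz                   ;Did counter reach 0?
        movwf   counter         ;Yes, restart it at 40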
>
>[maybe] "forcew" [optional]; this instruction would place W in the
>operand buffer (I don't know how easily this could be done).  If W
>were a readable register, this wouldn't be needed (note: there's no
>real reason why WREG needs to be writable since anything that can be
>done with WREG could be done to W).
>
>[3a] "btfgs"; goto if bit set.  This instruction would cause the next
>instruction to be skipped; if the bit was set, the bits in the
>"pending instruction" register would be copied to the PC.  This would
>allow for jumps anywhere in the address space without need for PCLATH,
>and would reduce the time required for a conditional jump from three
>cycles to two.
>
>[3b] "btfgc"; [same as above, but if bit clear]
>
>[4a] and [4b]; "btfcs" and "btfcc"; call if bit set/clear.  I'd handle
>these by using the MSB of the destination address to select GOTO/CALL.
>
This is a real mess; you're introducing two-word instructions now.  So an
entirely new data path from the instruction register to the PC would need
to be added, along with logic to avoid executing the program words that
are long addresses rather than instructions.

What may be nice would be versions of goto and call that use W rather
than PCLATH for the high address.  This would make long gotos or calls
quicker but then it wouldn't be possible to pass parameters in W.

>In addition, I'd like to see the following "register" enhancements:
>
>[1] Make bit 6 of PCLATH a "goto/call" flag (bits 5 and 7 would remain
>unused).  If bit 6 is set, make any writes to PC perform a CALL rather
>than a GOTO.  This would save two cycles off the typical:
>
>Springboard:
>        movwf   PC
>        ...
>        ; put whatever in W, then...
>        call    Springboard

The no-skip logic would also allow 4-cycle table access, albeit with a
lot more confusion.  The place to control that is probably also one of
the high bits of PCLATH.

I've had a couple of times when a no-return RETURN would be useful, i.e.
it would pop (and ignore) the top address on the stack so the last CALL
would be effectively converted to a goto and the next RETURN would go to
the second to last CALL.
>
>[2] Add two new "indf" addresses for "indfi" and "indfd" which would
>post-increment or post-decrement fsr.

Not sure if this is worth dedicating much hardware and file space to.
>
>[3] Add W as a readable register (to allow "btfsc", "rrf", etc. to be
>done on it usefully).  I don't think it needs to be writable (the only
>instructions which would benefit from that ability would be "bsf" and
>"bcf"; these are synonymous with "andlw" or "iorlw" except that the
>latter affect flags).

This is very useful for the test and rotate, such as multiplying or
dividing by a constant to generate an address.
>