> All this talk about expanding the 16CXX instruction set is almost
> certainly just talk as far as Microchip is concerned.  However it may not
> be very long before other companies start bringing out "Enhanced
> pic-type" chips much like all the "enhanced" 8051's out there (unlike the
> 8051, the PIC is a solid foundation to start enhancing)

Right.  I think, though, the 16Cxx is a more solid foundation than the
17Cxx.  The 17Cxx adds a bunch of stuff, but there seems IMHO to be less
consistency about it.

> >I don't really think there's space for those instructions in the 16Cxx
> >opcode set.  I agree there are probably more useful instructions than
> >"movf" [without flags] but it would have fit very nicely in the "CLRW"
> >group [i.e. make it so "clrf xxx,w" maps to "movfnoflags xxx,w"].  At
> >this point that could not be changed without forcing people to
> >re-assemble their code (substituting "ANDLW 0" for clrw) but otherwise
> >I think the hardware mod would be quite simple.
>
> ANDLW 0 will clear W and set Z, MOVLW 0 will clear W without affecting Z.
>  So the CLRW instruction is quite unnecessary.  But it is obviously an
> extension of the CLRF instruction and making it do something other than
> put zero somewhere may be difficult.

I don't know how Microchip actually implements the guts of their PIC, but
I would expect that it shouldn't be that hard a mod.  In particular, you
don't need to worry about the change in functionality until after the
operand fetch; there should be plenty of time in the pipeline to decode
that if the most significant 7 bits are "x", they should be interpreted to
do "y".

> >As for whether the bit ops would fit... hmm... I think you'd be
> >limited to one "bit"-style instruction and for that I already have an
> >idea in mind [though not a good name]: bmagic.
> >
> >The "bmagic" instruction behaves identically to "bsf" *unless* it
> >follows a skip(*) in which case it would be *executed* as a "bcf".
> >Thus, you could move a bit to carry via:
> >
> >        btfsc   Mybyte,Mybit
> >         bmagic C
> >
> >and could also move bits to other bits, with or without negation, etc.
> >The instruction would also be handy following "incfsz" and "decfsz"
> >instructions.
> >
> Interesting, but likely hard to do since this means tampering with the
> pipeline in the PIC.  In the present design, the execution unit
> apparently forces the prefetched instruction to all zeros (NOP) if a skip
> is found necessary.  The bmagic instruction would require an additional
> decoder to determine if the prefetched instruction is bmagic and modify
> rather than clear it (probably not time to do this).  Or, rather than
> clearing, a 'skip' flag could be added to the execution unit to make it
> execute all instructions except bmagic as NOP.  This would likely double
> the size of the microcode ROM so it could be expensive.

I guess it's possible that Microchip physically zeroes the fetched
instruction, but I would think it easier to add a "skip" state to the
unit's state machine.  Given that such things need to be handled to
process interrupts and such properly, that would probably be my choice.
Your point, however, is well-taken.  Perhaps "BMAGIC" could perform a bit
test and flip a bit in the next prefetched instruction if true?
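
For reference, the skip-dependent behavior I originally proposed can be
modeled in a few lines of Python.  This is only a sketch of the proposed
semantics: "bmagic" is hypothetical, and the function name and its
"invert" parameter are made up for illustration.

```python
def bmagic_copy(mybyte, mybit, invert=False):
    """Model of 'btfsc Mybyte,Mybit / bmagic C' (hypothetical instruction).

    With invert=True the test is 'btfss' instead, giving a negated copy.
    Returns the resulting carry bit.
    """
    bit = (mybyte >> mybit) & 1
    # btfsc skips the next instruction when the bit is clear;
    # btfss skips it when the bit is set.
    skipped = (bit == 1) if invert else (bit == 0)
    # bmagic acts as bsf when reached normally, as bcf when "skipped".
    return 0 if skipped else 1
```

Either way the destination bit is always written, which is the whole
point: no separate "clear it first" instruction is needed.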

> That does happen a lot, for example inputting or outputting serial data
> from a port pin.  I think the move to C instructions would be easier to
> implement than "bmagic" since they could be done entirely in the
> execution unit.  "bmagic" does make it easy to complement the bit in the
> process by reversing the polarity of the test so a nice companion to the
> move bit would be a complement C instruction, which could be used with
> the new mvbc and btfsc to do bit-by-bit xors.  Another use is the
> arithmetic shift right, which preserves the value in bit 7:
>         mvbc    foo,7
>         rrf     foo,f

Note that the above can also be done as:

        rlf     foo,w
        rrf     foo,f

if you don't mind trashing W.
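
To convince yourself that the pair really is an arithmetic shift right,
here is a small Python model of the two rotate-through-carry
instructions.  It is a sketch, not a simulator; the helper names are
mine.

```python
def rlf(value, carry):
    """Rotate left through carry: returns (result, new_carry)."""
    result = ((value << 1) | carry) & 0xFF
    return result, (value >> 7) & 1


def rrf(value, carry):
    """Rotate right through carry: returns (result, new_carry)."""
    result = (value >> 1) | (carry << 7)
    return result, value & 1


def asr_via_rlf_rrf(foo, carry_in=0):
    """Model of 'rlf foo,w' followed by 'rrf foo,f'."""
    # rlf foo,w: result lands in W (trashed), foo unchanged, C = old bit 7
    _w, carry = rlf(foo, carry_in)
    # rrf foo,f: the old bit 7 re-enters at the top of foo
    foo, _carry = rrf(foo, carry)
    return foo
```

The incoming carry does not matter, since the "rlf" overwrites it with
bit 7 before the "rrf" uses it.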

I guess my biggest problem with load/store C is that in many applications
it's desirable to invert a bit in the process of moving it, and you end up
having to use a lot of opcode space to support moving bits around.  If
adding one instruction would extend the functionality of others to allow
moving bits, I think that would be better.

> A while ago, someone mentioned that the SPARC processor has an option to
> either discard or execute the prefetched instruction after a change in
> program flow.  (of course the btfsX instructions would *always* skip
> since there isn't much point if they don't)  This wouldn't need much
> modification to the PIC core and could make table lookups 2 cycles
> faster.

Delayed jumps are nice.  Unless you have separate opcodes for delayed and
non-delayed jumps, however, they're a pain.  I personally despise flags
which make major changes to how instructions operate; the delay-jumps flag
would probably rank between the '251's mode flag and the 6502's "decimal"
flag on the annoyance scale.

>         movwf   PCL                     ;Vector to table
>         goto    $+1                     ;After MOVLW in table, do

There's a problem with this: it's not obvious, but it can bite, and hard.
Suppose an interrupt occurs as the "movwf" is about to be executed [so it
will take control just after].  What address should be pushed on the
return stack?

On the 12-bit cores, though, this would have been a good technique since
there were no interrupts to contend with and it would have allowed RETLW
to be replaced with something else useful (like ADDLW).

> < about expanding RETFIE so it loads STATUS from RAM at the same time>
> >Interesting notion.  I think it would probably be better in practice
> >to make the "retfie" load W instead of status;

> I thought about that, the reason I went with STATUS is that it takes 2
> instructions to save or load STATUS and only one to do W (which could be
> a MOVFW since STATUS will be rewritten with the return and load).  If the
> W register were mapped in file space (a highly desirable feature which
> you mentioned below), then the return and load W instruction could be
> made to act as a simple return by pointing it at W.  Instructions to load
> and save STATUS directly may be useful in ISRs and other circumstances
> but I doubt there's space for them.

I think W is probably easier to implement in hardware, and I think that in
the non-interrupt case W will be useful much more often.

> >Shadow registers are a nice concept; if you don't nest interrupts the
> >hardware cost is probably not too unreasonable, but they allow
> >interrupt latency to be practically nullified.
>
> A shadow for PCLATH would also be useful.  Without hardware support of a
> data push-pop stack, nested interrupts are going to get too
> time-consuming to be worth attempting.

Yeah, a shadow for PCLATH might be nice too, but W and PSW are the
essential ones.  Those are needed in 90-99% of ISR's; PCLATH is needed
far less often.

> >Anyhow, there are four more magic instructions I'd like to see...
> >
> >[1] "forcef" this instruction would fetch its operand from a register,
> >leave it in the operand buffer, and disable the operand fetch on the
> >next instruction.  ... A more typical use would be:
> >
> >        forcef  foo
> >        movnf   Bar,f   ; movnf = move, without bothering flags
> >
> >which would copy Foo into Bar without affecting anything else.
>
> I've never had much occasion to do this, more general-purpose is an
> instruction which exchanges W and a register.
>
>         xchwf   bar     ;save W into bar
>         movfw   foo     ;Get foo
>         xchwf   bar     ;bar = foo, recover old W
>
> In the example above, the pesky MOVFW will affect Z.  But, if the value
> of foo didn't need to be preserved the movfw could be replaced with an
> xchwf, leaving the old value of bar in foo (a complete exchange of foo
> and bar).  The xchwf could also be the no-flags load W, if it didn't
> matter that the value in RAM was lost (such as when returning from an
> interrupt).

But as your example itself illustrates, the exchange instruction doesn't
help as much as the "forcef" instruction.  In addition, the "forcef"
version would avoid the side-effects posed by your "xchwf" example above
and would (conjecture here) probably be easier to implement.

Compared with "movfp" and "movpf" (both of which use GOBS of opcode space)
I think "forcef" would be just as useful (not as fast as movpf/movfp, but
not restricted in addressing ability).  I don't know whether you'd use
that instruction, but there are many spots (esp. in ISR's) where the
ability to move register to register without affecting W or PSW would be
very nice indeed.

> >[2] "forcel"; this instruction would behave like "forcef" except that
> >the operand would be taken from the opcode itself [as in "xxxLW"
> >instructions].
>
> The place I would find this useful would be for resetting a counter, e.g.
> a counter that goes from 40 down to 1 then starts at 40 again:
>
>         decf    counter,f       ;Decrement counter
>         forcel  .40
>         skpnz                   ;Did counter reach 0?
>         movf    bar,f           ;Yes, restart it.
>
> But this won't work with your forceX instructions since they only keep
> for one instruction.  So may as well put the value in W and save it if
> necessary.  A decfsnz instruction as on the 17CXX would be useful in
> situations like these.

How about...

        decfsz  counter,f
        btfsc   KZ,0            ; Something that will always be clear
         forcel .40
        movnf   counter,f       ; Move without setting flags

Note that the "movnf" always happens, but if the counter wasn't
decremented to zero the "forcel" won't be executed and the "movnf" will
have no effect.  Note that in the above code, neither W nor PSW is
affected; it could be used in an interrupt without saving those registers.
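
Here is the same four-instruction sequence as a Python sketch, in case
the skip-of-a-skip is hard to follow.  "forcel" and "movnf" are the
hypothetical instructions proposed above; the function name and the
"forced" variable (standing in for the operand buffer) are just for
illustration.

```python
def tick(counter, reload=40):
    """One pass through decfsz / btfsc KZ,0 / forcel / movnf."""
    forced = None                       # operand buffer override, if any
    counter = (counter - 1) & 0xFF      # decfsz counter,f
    if counter == 0:
        # decfsz skipped the btfsc, so the forcel executes
        forced = reload                 # forcel .40
    # else: btfsc KZ,0 executes; the bit is always clear, so forcel is skipped

    # movnf counter,f: uses the forced operand if there is one,
    # otherwise it copies counter onto itself (no effect)
    return forced if forced is not None else counter
```

Starting from 40, forty calls walk the counter down to 1 and then back
to 40 without it ever being left at zero.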

> >[3a] "btfgs"; goto if bit set.  This instruction would cause the next
> >instruction to be skipped; if the bit was set, the bits in the
> >"pending instruction" register would be copied to the PC.  This would
> >allow for jumps anywhere in the address space without need for PCLATH,
> >and would reduce the time required for a conditional jump from three
> >cycles to two.
> >
> >[3b] "btfgc"; [same as above, but if bit clear]
> >
> >[4a] and [4b]; "btfcs" and "btfcc"; call if bit set/clear.  I'd handle
> >these by using the MSB of the destination address to select GOTO/CALL.
> >
> This is a real mess, you're introducing two-word instructions now.  So an
> entirely new data path from the instruction register to the PC would need
> to be added, along with logic to not attempt to execute the program words
> which are long addresses rather than instructions.

I don't think it would be that bad; true there would have to be a
dedicated data path from the pending instruction to the PC, but that
shouldn't be terribly hard.  As for avoiding an attempt to execute the
instruction, the only case I can see where that would be problematical
would be with my original proposed "bmagic" instruction; otherwise the
same type of logic that handles "ordinary" skips could be used.

I guess if Microchip is using a discrete-logic multiplexor on the PC,
adding the other data path could be a pain; if they're using a pass-gate
mux, however, I would think it quite feasible.

> What may be nice would be versions of goto and call that use W rather
> than PCLATH for the high address.  This would make long gotos or calls
> quicker but then it wouldn't be possible to pass parameters in W.

Ehh... I don't know about that.  On a 4K part, at most one BSF/BCF is
needed to set PCLATH before a GOTO or CALL; on an 8K part two would be
needed.  Compared with having to load W, I think PCLATH is more
convenient.
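
The arithmetic behind that count: GOTO and CALL carry an 11-bit address,
so a page is 2K words, and the upper PC bits come from PCLATH<4:3>.  A
quick Python sketch (function names are mine):

```python
def goto_target(pclath, operand):
    """13-bit destination of GOTO/CALL: PCLATH<4:3> joined to the 11-bit operand."""
    return (((pclath >> 3) & 0x3) << 11) | (operand & 0x7FF)


def page_bits_needed(program_words):
    """How many PCLATH page-select bits matter for a given program memory size."""
    pages = max(1, program_words // 2048)   # 2K words per page
    return (pages - 1).bit_length()
```

So a 2K part needs no page bits, a 4K part one, and an 8K part two,
which is where the "one BSF/BCF" and "two" figures come from.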

> >In addition, I'd like to see the following "register" enhancements:
> >
> >[1] Make bit 6 of PCLATH a "goto/call" flag (bits 5 and 7 would remain
> >unused).  If bit 6 is set, make any writes to PC perform a CALL rather
> >than a GOTO.  This would save two cycles off the typical:
> >
> >Springboard:
> >        movwf   PC
> >        ...
> >        ; put whatever in W, then...
> >        call    Springboard
>
> The no-skip logic would also allow 4-cycle table access, albeit with a
> lot more confusion.  The place to control that is probably also one of
> the high bits of PCLATH.

I have seen code which used RLF or RRF on PCLATH (in computing table
values).  Such code could break if anything other than bit 6 were used.  I
think having a "call-mode" bit there would be nicer than a skip-select
bit, but it's probably just a matter of taste.

> I've had a couple of times when a no-return RETURN would be useful, i.e.
> it would pop (and ignore) the top address on the stack so the last CALL
> would be effectively converted to a goto and the next RETURN would go to
> the second to last CALL.

I've been in this situation too.

> >[2] Add two new "indf" addresses for "indfi" and "indfd" which would
> >post-increment or post-decrement fsr.
>
> Not sure if this is worth dedicating much hardware and file space to.

Perhaps; perhaps not.  IMHO it would have been a better investment than
having bits in ALUSTA to control those effects.

> >[3] Add W as a readable register (to allow "btfsc", "rrf", etc. to be
> >done on it usefully).  I don't think it needs to be writable (the only
> >instructions which would benefit from that ability would be "bsf" and
> >"bcf"; these are synonymous with "andlw" or "iorlw" except that the
> >latter affect flags).
>
> This is very useful for the test and rotate, such as multiplying or
> dividing by a constant to generate an address.

Right.  Any reason it should be writable (as a register), though?  Doing
so requires adding another write-data path for IMHO very little payoff.