On Thu, Jan 13, 2011 at 08:36:15AM -0500, smplx wrote:
>
> NOTE: PIC tag added

Thanks. I lost the original subject and forgot to re-add it on the retype.

I'm going to do some snippage. Refer to the original post for details.

>
> On Wed, 12 Jan 2011, Byron Jeff wrote:
> > My interest is how to implement these threads. Forth has several
> > different types of implementations. Without going into all the nuances,
> > there are three basic ways of implementing how to get to a particular
> > subroutine:
> >
> > 1. An actual subroutine call.
> > 2. Specifying the address of the subroutine and doing an indirect
> >    jump/call.
> > 3. Specifying a token for the subroutine.
> >
> > Each has its pros and cons in terms of space and speed. The design
> > decisions are further complicated on a PIC because of paging issues and
> > the fact that in general PICs can only execute code from program memory.
> >
> > So I'm trying to decide the best way to implement these threads. My
> > design considerations are:
> >
> > 1) Threads should have the ability to execute from program memory or RAM.
> >
> > 2) Threads should be able to be located in any part of program memory.
> >
> > 3) Code space and execution efficiency should be optimized.
> >
> > Clearly a balancing act.
> >
> > Specifying just the address has advantages and disadvantages. It does
> > have the advantage of "running" both from program memory and RAM
> > unchanged. With the new CALLW instruction the entire program memory can
> > be reached. And finally, with the new indirect access map, there is a
> > unified RAM and program memory address space.
> >
> > The challenges are primarily space and time considerations. As a
> > compromise, the indirect access map only accesses the lower byte of each
> > program memory word. So that means two fetches would be required to get
> > each address. On the other hand, the traditional EEDATA fetch can get
> > all 14 bits of each word. But two challenges are that 14 bits only gives
> > 16K words of access (which is the entire 16F1939 program space) and that
> > it takes quite a bit of setup to use EEDATA to fetch both words.
> >
> > Now tokens have some possibilities. Tokens can be shoved into 8 bits,
> > which requires only a single fetch. The new BRW instruction makes it
> > trivial to implement a jump table. The biggest problem with tokens is
> > that there's a limit to the number you can implement before you have to
> > do something about it.
> >
> > I'm thinking that maybe a mix of tokens and addresses may be the winner.
> > If tokens have bit 7 clear and addresses have bit 7 set, then tokens can
> > be used for heavily accessed subroutines while addresses can be used for
> > others.
> >
> > Anyway, just some thoughts. If you have any, I'd like to hear them.
>
> Ok, just a couple of thoughts.
>
> Yes, using a mixture of tokens and absolute addresses sounds good. However
> I would forget about using a bit in the token to distinguish between the
> two as this means you lose half your tokens and half your address space
> AND you increase the runtime overhead by having to decide which you are
> decoding every time.

Somewhat good points. Actually, half the address space isn't lost, because
the architecture only has a 15-bit address space. So 7 bits from the token
+ 8 bits from the next fetch = 15 bits.

Decoding is a valid issue.
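To make the 7 + 8 = 15 bit reconstruction concrete, here is a rough sketch
(mine, not part of the original design) of what the address path could look
like on the enhanced mid-range core. It assumes the fetchtoken helper used
later in this message returns the next thread byte in WREG, and that
addr_hi/addr_lo are scratch locations in common RAM; the label names are
placeholders.

absolute
        bcf     WREG,7          ; clear the flag bit, leaving address bits 14:8
        movwf   addr_hi         ; stash the high byte (assumed common-RAM scratch)
        call    fetchtoken      ; next thread byte = low 8 bits of the target
        movwf   addr_lo         ; stash the low byte
        movf    addr_hi,W
        movwf   PCLATH          ; CALLW takes PC<14:8> from PCLATH
        movf    addr_lo,W       ; and PC<7:0> from WREG
        callw                   ; call the target; its RETURN comes back here
        movlp   HIGH(next)      ; PCLATH was clobbered, so reselect our page
        goto    next            ; resume the inner interpreter (assumed label)

The shuffle through addr_hi/addr_lo is only there because CALL and GOTO are
paged through PCLATH, so the target's high byte can't be dropped into PCLATH
until after the second fetch.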
My thought process was that, to get full access to the memory space, the
jump table would need 2-instruction entries (a MOVLP to set PCLATH followed
by a jump/call), and BRW can only reach 256 words forward of the
instruction, so 256/2 = 128 possible tokens. For 256 tokens I would have to
decode the top bit anyway to determine which jump table to access. So I
figured that decoding, and using the absolute address for the second half,
would be a win.

> Instead I would propose that you always use tokens and that you reserve
> one of these to handle an absolute address. In effect all the subroutines
> referenced in your subroutine look-up table get called immediately,
> without worrying if the token is a token or an address, and you have one
> special subroutine that takes the next two bytes of the stack (its 16-bit
> integer parameter) and uses these as the actual address of the real
> subroutine to call (jump to). The return address doesn't need to be
> messed with, and banking is easier because you are only calling
> subroutines via your look-up table and the special indirect subroutine
> handler can do the necessary banking for you. So although it does require
> a little extra work, because it looks like you are doing kind of two
> calls, it should on average actually be more efficient than processing
> each token specially to see if it is a subroutine token or an absolute
> address.

See above. The only problem I see with this approach is the size mix of
tokens + absolute addresses. With my approach you have 1 byte for tokens
and 2 bytes for absolute addresses. With yours there is 1 byte for tokens
and 3 bytes for absolute addresses. And as alluded to above, the top bit of
the token still needs to be decoded. Finally, extending the token set does
not seem to make much sense either, since extended tokens would take 2
bytes, and in 2 bytes you can already reach any absolute address.

The decoding should only take a single btfsx instruction, as the extended
16F architecture maps WREG as an ordinary file register. So the sequence
should be something like:

        call    fetchtoken
        btfsc   WREG,7          ; Skip if token
        goto    absolute        ; process absolute address
        lslf    WREG,F          ; double token for table jump
        brw                     ; to table branch
tokentable
        movlp   HIGH(token0)    ; top part of first token
        goto    token0          ; process first token
        ....

I think with your approach either the same code, with a jump to the second
half of the token table instead of the absolute jump, or a clear
carry/rlf/test on carry would need to be done. I don't think there is more
than one instruction time's worth of difference one way or the other. It
is certainly an interesting analysis.

BTW, I have a design goal of running threads out of RAM so that temporary
words can be written and tested in RAM before being committed to flash.

BAJ

>
> Regards
> Sergio Masci

--
Byron A. Jeff
Department Chair: IT/CS/CNET
College of Information and Mathematical Sciences
Clayton State University
http://cims.clayton.edu/bjeff