This ``goto tree'' idea Bob Ammerman came up with is very clever. Of course, this is only useful on 12 bit PICs -- there's faster and smaller techniques available on 14 bit and 16 bit PICs. I think Bob confused Roman Black by leaving out one tiny detail -- both the XOR table and this ``goto tree'' table can go anywhere in memory, and can be called from anywhere in memory, with no restrictions, period. However, the programmer must add one instruction in a special location:
; jump table
    ....
get_table: ; must be in the ``correct'' half of a 512 byte block !
    goto  get_xxxx ; get_xxxx() itself can be anywhere
add_16:; must be in the ``correct'' half of a 512 byte block !
    goto really_add_16 ; really_add_16() itself can be anywhere
    ....
. Then the programmer can call it from anywhere in memory with
    call get_table
.
Maybe the Original Poster can free up some more space in the "special" half of
each 512 byte block by using indirect subroutine calls like that. That was the
original problem, right -- running out of space in the "special" half ?
If you have N values, this ``goto tree'' table takes up
static instructions = N (retlw table[index]) + N/2 (btfss index,0) + 2*(N/2 - 1)
(btfss... , goto..., in a tree).
So a total of (2.5 N - 2) instructions (a bit worse if N is not a power of 2).
(Not counting the specially-placed GOTO or any CALLs).
It's much faster (fewer dynamic instructions) than the XOR code, but it's not
isochronous, and it takes more static instructions: 2.5 N - 2, vs.  2 N + 1.
Looks like a classic time-space tradoff. I can usually make code run much faster
on average if I am allowed to let the worst-case run really slow. I can usually
make tradeoffs between a fast algorithm that uses more code/RAM, or a slow
algorithm that uses less code/RAM. Once in a long while I feel very proud of
myself for making something run faster (or at least the same) in every case
*and* use less code/RAM.
If the table has enough places with several consecutive duplicate values (for
example, the flat steps near the top of a sine wave), this ``goto tree'' might
take *fewer* static instructions than the XOR code, or even the ADDWF techniques
available on other processors -- the programmer can take out the redundant
test,branch,retlw instructions that all return the same value. In those special
cases, the ``goto tree'' would be faster *and* take less space than any other
table lookup I know.
Bob Ammerman