> This is the kind of thinking that made segmented addressing so popular with
> ppl programming the 80x86 - NOT!!! The FSR is the logical way to use
> pointers in C, and having to bank-switch pointers plays hell with
> efficiency. Anyone remember the HUGE pointer type in PC type C? How
> inefficient it was?
>
> consider this dumb code fragment.
>
> #define BYTE unsigned char
>
> #ifdef BLOAT
> #define MAXARRAY 400
> #else
> #define MAXARRAY 40
> #endif
>
> main()
> { BYTE source_array[MAXARRAY];
>   BYTE dest_array[MAXARRAY];
>   BYTE *source_ptr, *dest_ptr;
>   BYTE idx;
>   source_ptr = source_array;
>   dest_ptr = dest_array;
>   for(idx = 0; idx < MAXARRAY; idx++)
>     *source_ptr++ = *dest_ptr++;
> }
>
> Doesn't do much, but think about the extra code the compiler emits to
> generate bankswitching when BLOAT is defined... - and how many times you
> would use pointers in C - and the slowdown...

For the particular code fragment given, a smart compiler would generate
decent code, since it would know where the page boundaries were going to
occur; consequently, it could just do something like...

        ; Part 1: neither side has crossed a page boundary yet
        movlw   DoublePostIncrement
        movwf   STATUS
        movlw   NumBytesPart1div2
        movwf   Counter
        movlw   StartAddr1
        movwf   FSR0
        movlw   StartAddr2
        movwf   FSR1
Loop1:
        movlr   FirstBank
        movfp   IND0,Temp0
        movfp   IND0,Temp1
        movlr   SecondBank
        movfp   Temp0,IND1
        movfp   Temp1,IND1
        decfsz  Counter
        goto    Loop1

        ; Part 2: the first side has just crossed a page boundary
        movlw   32
        movwf   FSR0
        movlw   NumBytesPart2div2
        movwf   Counter
Loop2:
        movlr   FirstBank
        movfp   IND0,Temp0
        movfp   IND0,Temp1
        movlr   SecondBank
        movfp   Temp0,IND1
        movfp   Temp1,IND1
        decfsz  Counter
        goto    Loop2
        ...

That manages to be only about three times as slow as it oughta be
(9 cycles per 2 bytes [4.5 per byte], instead of 7 cycles for 4 bytes
[1.75 per byte]).

Of course, if the compiler DIDN'T KNOW where the page crossings were
going to be (e.g.

    /* Huge declarations imply pointers that may cross page boundaries */
    void mymemcpy(huge char *ptr1, huge char *ptr2, unsigned char numbytes)
    {
        while (numbytes--)
            *ptr1++ = *ptr2++;
    }

or other such code), then it couldn't do any optimizations like the above
and would be stuck doing something like this [assuming those #*($#
224-byte pages]:

; Note: This code fragment assumes that three bytes of unbanked space are
; available [temp0, temp1, and temp2].

_mymemcpy:
        movlw   0           ; These three instructions can probably
        iorwf   numbytes    ; be replaced with 1, but my 17Cxx book
        btfsc   Z           ; isn't handy.
        return
        movwf   temp2
        movfp   ptr2,FSR0
        movfp   ptr2+1,temp0
        movfp   ptr1,FSR1
        movfp   ptr1+1,temp1
        movlw   NoAutoInc
        movwf   STATUS
loop:
        movpf   temp0,BSR
        movpf   IND0,WREG
        movpf   temp1,BSR
        movpf   WREG,IND1
        infsnz  FSR0
        goto    F0oops
F0okay:
        infsnz  FSR1
        goto    F1oops
F1okay:
        decfsz  numbytes
        goto    loop
        return

F0oops:
        bsf     FSR0,5      ; Advance it to 32
        movlw   16
        addwf   temp0
        goto    F0okay

F1oops:
        bsf     FSR1,5      ; Advance it to 32
        movlw   16
        addwf   temp1
        goto    F1okay

ELEVEN CYCLES PER BYTE--**BEST** CASE! If there weren't room for
Temp0..Temp2 in unbanked memory, the count would go up to FOURTEEN!

Even with only one memory pointer, the 16C6x can do as well:

MyMemCpy:
        movf    NumBytes,w
        btfsc   Z
        return
        movf    Source,w
        movwf   FSR
        subwf   Dest,w
        movwf   TempDS
        sublw   1
        movwf   TempSD
Loop:
        movf    IND,w
        movwf   TempW
        movf    TempDS,w
        addwf   FSR
        movf    TempW,w
        movwf   IND
        movf    TempSD,w
        addwf   FSR
        decfsz  NumBytes
        goto    Loop
        return
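In C terms, the one-pointer trick above works out to roughly the following
(just a sketch, with invented names; it uses signed ptrdiff_t deltas so it
stays legal C on an ordinary compiler, where the PIC version relies on the
FSR wrapping modulo 256):

    /* Sketch of the single-pointer ping-pong copy; names invented. */
    #include <stddef.h>

    void mymemcpy_1fsr(unsigned char *src, unsigned char *dst,
                       unsigned char numbytes)
    {
        unsigned char *p = src;               /* the lone "FSR"             */
        ptrdiff_t to_dst  = dst - src;        /* TempDS: src[i] -> dst[i]   */
        ptrdiff_t to_next = (src - dst) + 1;  /* TempSD: dst[i] -> src[i+1] */
        unsigned char tmp;

        while (numbytes--) {
            tmp = *p;        /* read src[i]          */
            p += to_dst;     /* hop over to dst[i]   */
            *p = tmp;        /* write it             */
            p += to_next;    /* hop back to src[i+1] */
        }
    }

Same design point as the assembly: one address register, two precomputed
deltas, and no bank or page bookkeeping inside the loop.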
[Okay, I'll admit the above isn't a perfect comparison, since the 16C6x
version doesn't deal with pointers crossing pages. If I didn't have to
worry about incrementing pointers through page boundaries, that would
improve the 17Cxx code to seven cycles per byte. But the 17Cxx is supposed
to be much better than the 16Cxx; its performance on this type of thing is
IMHO quite deficient.]
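For what it's worth, the "chunked" strategy a smarter compiler (or a
hand-written library routine) could use for huge pointers looks roughly
like this in C. This is only a sketch: PAGE_SIZE, the function name, and
the flat-address page arithmetic are invented stand-ins for the real
BSR/FSR bookkeeping, but it shows how the boundary test can be paid once
per run instead of once per byte, which is where something close to seven
cycles per byte would come from:

    /* Sketch only: copy in runs that never cross a page boundary, so the
       per-byte page check from the 17Cxx loop is hoisted out of the inner
       loop.  PAGE_SIZE and the uintptr_t arithmetic are illustrative. */
    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SIZE 224u    /* the 17Cxx's 224 usable bytes per bank */

    void mymemcpy_chunked(unsigned char *dst, const unsigned char *src,
                          size_t numbytes)
    {
        while (numbytes) {
            /* Bytes each pointer can advance before hitting a boundary. */
            size_t src_run = PAGE_SIZE - (size_t)((uintptr_t)src % PAGE_SIZE);
            size_t dst_run = PAGE_SIZE - (size_t)((uintptr_t)dst % PAGE_SIZE);
            size_t run = numbytes;
            if (run > src_run) run = src_run;
            if (run > dst_run) run = dst_run;

            numbytes -= run;
            while (run--)               /* tight inner loop, no checks */
                *dst++ = *src++;

            /* A banked compiler would re-load BSR and reset the FSRs here;
               on a flat machine there is nothing to do. */
        }
    }

The same number of bytes gets moved either way; the expensive part just
runs once per 224-byte run instead of once per byte.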