> This is the kind of thinking that made segmented addressing so popular with
> ppl programming the 80x86 - NOT!!! The FSR is the logical way to use
> pointers in C, and having to bank-switch pointers plays hell with
> efficiency. Anyone remember the HUGE pointer type in PC type C? How
> inefficient it was?
>
> consider this dumb code fragment.
>
> #define BYTE unsigned char
>
> #ifdef BLOAT
> #define MAXARRAY 400
> #else
> #define MAXARRAY 40
> #endif
>
> main()
> { BYTE source_array[MAXARRAY];
>   BYTE dest_array[MAXARRAY];
>   BYTE *source_ptr, *dest_ptr;
>   BYTE idx;
>   source_ptr = source_array;
>   dest_ptr = dest_array;
>   for(idx = 0; idx < MAXARRAY; idx++)
>     *source_ptr++ = *dest_ptr++;
> }
>
> Doesn't do much, but think about the extra code the compiler emits to
> generate bankswitching when BLOAT is defined... - and how many times you
> would use pointers in C - and the slowdown...

For the particular code fragment given, a smart compiler would generate
decent code, since it would know where the page boundaries were going to
occur; consequently, it could just do something like...

        ; Part 1: neither side has crossed a page boundary yet
        movlw   DoublePostIncrement
        movwf   STATUS
        movlw   NumBytesPart1div2
        movwf   Counter
        movlw   StartAddr1
        movwf   FSR0
        movlw   StartAddr2
        movwf   FSR1
Loop1:
        movlr   FirstBank
        movfp   IND0,Temp0
        movfp   IND0,Temp1
        movlr   SecondBank
        movfp   Temp0,IND1
        movfp   Temp1,IND1
        decfsz  Counter
        goto    Loop1

        ; Part 2: the first side has just crossed a page boundary
        movlw   32
        movwf   FSR0
        movlw   NumBytesPart2div2
        movwf   Counter
Loop2:
        movlr   FirstBank
        movfp   IND0,Temp0
        movfp   IND0,Temp1
        movlr   SecondBank
        movfp   Temp0,IND1
        movfp   Temp1,IND1
        decfsz  Counter
        goto    Loop2
        ...

That manages to be only about three times as slow as it oughta be
(9 cycles per 2 bytes [4.5 per byte], instead of 7 cycles for 4 bytes
[1.75 per byte]).

Of course, if the compiler DIDN'T KNOW where the page crossings were
going to be (e.g.

    /* Huge declarations imply pointers that may cross page boundaries */
    void mymemcpy(huge char *ptr1, huge char *ptr2, unsigned char numbytes)
    {
        while (numbytes--)
            *ptr1++ = *ptr2++;
    }

or other such code), then it couldn't do any optimizations like the above
and would be stuck doing something like this [assuming those #*($#
224-byte pages]:

; Note: This code fragment assumes that three bytes of unbanked space are
; available [temp0, temp1, and temp2].

_mymemcpy:
        movlw   0           ; These three instructions can probably
        iorwf   numbytes    ; be replaced with 1, but my 17Cxx book
        btfsc   Z           ; isn't handy.
        return
        movwf   temp2
        movfp   ptr2,FSR0
        movfp   ptr2+1,temp0
        movfp   ptr1,FSR1
        movfp   ptr1+1,temp1
        movlw   NoAutoInc
        movwf   STATUS
loop:
        movpf   temp0,BSR
        movpf   IND0,WREG
        movpf   temp1,BSR
        movpf   WREG,IND1
        infsnz  FSR0
        goto    F0oops
F0okay:
        infsnz  FSR1
        goto    F1oops
F1okay:
        decfsz  numbytes
        goto    loop
        return

F0oops:
        bsf     FSR0,5      ; Advance it to 32
        movlw   16
        addwf   temp0
        goto    F0okay

F1oops:
        bsf     FSR1,5      ; Advance it to 32
        movlw   16
        addwf   temp1
        goto    F1okay

ELEVEN CYCLES PER BYTE--**BEST** CASE! If there weren't room for
Temp0..Temp2 in unbanked memory, the count would go up to FOURTEEN!

Even with only one memory pointer, the 16C6x can do as well:

MyMemCpy:
        movf    NumBytes,w
        btfsc   Z
        return
        movf    Source,w
        movwf   FSR
        subwf   Dest,w
        movwf   TempDS
        sublw   1
        movwf   TempSD
Loop:
        movf    IND,w
        movwf   TempW
        movf    TempDS,w
        addwf   FSR
        movf    TempW,w
        movwf   IND
        movf    TempSD,w
        addwf   FSR
        decfsz  NumBytes
        goto    Loop
        return
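In C terms, the one-pointer trick above works out to roughly the following
(just a sketch, with invented names; it uses signed ptrdiff_t deltas so it
stays legal C on an ordinary compiler, where the PIC version relies on the
FSR wrapping modulo 256):

    /* Sketch of the single-pointer ping-pong copy; names invented. */
    #include <stddef.h>

    void mymemcpy_1fsr(unsigned char *src, unsigned char *dst,
                       unsigned char numbytes)
    {
        unsigned char *p = src;               /* the lone "FSR"             */
        ptrdiff_t to_dst  = dst - src;        /* TempDS: src[i] -> dst[i]   */
        ptrdiff_t to_next = (src - dst) + 1;  /* TempSD: dst[i] -> src[i+1] */
        unsigned char tmp;

        while (numbytes--) {
            tmp = *p;        /* read src[i]          */
            p += to_dst;     /* hop over to dst[i]   */
            *p = tmp;        /* write it             */
            p += to_next;    /* hop back to src[i+1] */
        }
    }

Same design point as the assembly: one address register, two precomputed
deltas, and no bank or page bookkeeping inside the loop.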
[Okay, I'll admit the above isn't a perfect comparison, since the 16C6x
version doesn't deal with pointers crossing pages. If I didn't have to
worry about incrementing pointers through page boundaries, that would
improve the 17Cxx code to seven cycles per byte. But the 17Cxx is supposed
to be much better than the 16Cxx; its performance on this type of thing is
IMHO quite deficient.]
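For what it's worth, the "chunked" strategy a smarter compiler (or a
hand-written library routine) could use for huge pointers looks roughly
like this in C. This is only a sketch: PAGE_SIZE, the function name, and
the flat-address page arithmetic are invented stand-ins for the real
BSR/FSR bookkeeping, but it shows how the boundary test can be paid once
per run instead of once per byte, which is where something close to seven
cycles per byte would come from:

    /* Sketch only: copy in runs that never cross a page boundary, so the
       per-byte page check from the 17Cxx loop is hoisted out of the inner
       loop.  PAGE_SIZE and the uintptr_t arithmetic are illustrative. */
    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SIZE 224u    /* the 17Cxx's 224 usable bytes per bank */

    void mymemcpy_chunked(unsigned char *dst, const unsigned char *src,
                          size_t numbytes)
    {
        while (numbytes) {
            /* Bytes each pointer can advance before hitting a boundary. */
            size_t src_run = PAGE_SIZE - (size_t)((uintptr_t)src % PAGE_SIZE);
            size_t dst_run = PAGE_SIZE - (size_t)((uintptr_t)dst % PAGE_SIZE);
            size_t run = numbytes;
            if (run > src_run) run = src_run;
            if (run > dst_run) run = dst_run;

            numbytes -= run;
            while (run--)               /* tight inner loop, no checks */
                *dst++ = *src++;

            /* A banked compiler would re-load BSR and reset the FSRs here;
               on a flat machine there is nothing to do. */
        }
    }

The same number of bytes gets moved either way; the expensive part just
runs once per 224-byte run instead of once per byte.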