A couple of thoughts some to mind: At 12:49 PM 6/5/96, Scott Dattalo wrote: >A week or so ago Andy W. made the claim: > >> I have in front of me a 16-bit square-root routine that takes only >> 126 cycles to execute on a PIC16C5x. > >and it requires: > >> RAM: 4 registers, including the two that hold the 16-bit input. >> ROM: 24 words. > >and finally > >> Anyway, he hasn't given me permission to publish it, so I can't post > ^-- The Author >> the routine here. However, the knowledge that such a solution exists >> should be enough to prod someone into trying to duplicate it. > > >Well, I've been prodded. However, I've come up a little short. My solution >requires: > > RAM: 4 registers, including the two that hold the 16-bit input. (same) > ROM: 28 words. (4 words too big) > >Perhaps someone may see where 4 words can be squeezed out. But, realize that >the slightest change has many ramifications. There are many inconspicuous >dependencies (i.e. tricks). > >Note, I have tested this routine over many but not all of the 64k possible >values. I haven't found any errors.... > > >s1 equ 0x20 >s2 equ 0x21 >N_hi equ 0x22 >N_lo equ 0x23 > > >;--------------------------------------------------------------- >;sqrt >; >; The purpose of this routine is take the square root of a 16 bit >;unsigned integer. Adding in the following documentation (for a little clarity now that we have this wicked code to add to our libaries!): ; N_hi N_lo - Comprise 16 bit number to be rooted ; S1 - Temporary ; S2 - 8 bit integer sqaure root of N >sqrt > MOVLW 0x40 > MOVWF s1 > CLRF s2 > >L2 ADDWF s2,W > SUBWF N_hi,W ;W = N_hi-s1-s2 > > SKPNC ;if N_hi > (s1+s2) > goto L3 ; then replace N_hi with N_hi-s1-s2 > > ;C==0. > *> BTFSC s1,6 ;If this is the first time through the loop *> goto L1 ;then N is < s1+s2 The lines marked with "*>" above can be changed to: BTFSS s1,6 ;If this is the first time through the loop ;then N is < s1+s2 We remove the goto and make use of the goto (now) 2 instructions below. To do so, we flip the bit test. > > ;N may still be greater than s1+s2, so we need to check MS bit of N > BTFSS N_lo,0 ;If MS bit is zero, then N < s1+s2 > goto L1 > >L3 > ;N is greater than s1+s2, so N_hi with N_hi-s1-s2 > MOVWF N_hi > > RLF s1,W > RLF s1,W > ADDWF s2,F ;s2 = s2 | (s1<<1) >L1 Moved the following from below: This saves 6 cycles when we end. Since we don't care what N or S1 contain, no need to shift them! BTFSC s1,6 return > > ;Next, roll N left one bit position. I.e. N = (N<<1) | (N>>15). This puts > the > ;MS bit into the LS bit position. > RLF N_hi,W ;C = MS bit of N > RLF N_lo,F > RLF N_hi,F > > CLRC > RRF s1,W > RRF s1,F ;s1 >>= 1 > Move to above. No longer needed. ;; BTFSC s1,6 ;; return > > BTFSS s1,7 > goto L2 > >L4 > BTFSS N_lo,7 > MOVLW 1 > goto L2 > > > [...............shnip..............] > >s1 = 1<<14 >s2 = 0 >do >{ > if((s1 + s2) <= N) > { > N = N - s2 -s1 > s2 = s2 | (s1<<1) > } > s1 = s1 >> 1 > N = N << 1 >} while(s1 > 2^6) > >return(s2>>8) > reworking the above code to be: s1 = 1<<14 s2 = 0 while (1) { if((s1 + s2) <= N) { N = N - s2 -s1 s2 = s2 | (s1<<1) } if (s1 <= 2^7) // Since low bits of S1 are 0, this is if (s1 == 2^7) break; // Only do the following if we need to // (Save a couple cycles at the end!) s1 = s1 >> 1 N = N << 1 } return(s2>>8) [...............shnip..............] > >Anyway, maybe this will prod someone else too. > > >Scott The net effect: * We made the code 1 instruction smaller. * The timing (worst, average, best) is at least 7 cycles shorter. - Always 6 cycles for the end condition - Save at least 1 cycle through the loop. - Each time we have to loop, we save 1 cycle cheers, eric PS. If I get a chance, I'll do the timing analisys (if no one else does!)