Most of you probably recall the recent square root saga where Andy W. made a claim that someone had written a square root routine with the following features: o) 28 words of program memory o) 4 bytes of RAM, including the two that hold the 16 bit input o) Executes in 95 cycles. The original author requested that it not be released publicly, however Andy hoped that someone would be "prodded" into reproducing the routine. A few weeks later I produced a square root routine with the following features: o) 27 words of program memory (actually it was 28, Eric and Martin independently suggested a way to shave one word). o) 4 bytes of RAM, including the two that hold the 16 bit input o) Executes in ~150 cycles. This version saves one word of program memory at the cost of 50% reduction in execution speed; A trade I would usually decline. I didn't see a way to improve this routine any more, so I just left it alone. Then one day I came across the book The Logic of Computer Arithmetic by Ivan Flores He describes two different algorithms for computing the square root of a number. I wrote PIC programs for each (actually several PIC programs for each) and came up with this implementation o) 30 words of program memory o) 3 bytes of RAM, including the two that hold the 16 bit input o) Executes in ~104 cycles. ;---------------------------------------------------------- ;sqrt ; ; The purpose of this routine is take the square root of a 16 bit ;unsigned integer. ;Inputs: N_hi - High byte of the 16 bit number to be square rooted ; N_lo - Low byte " " " ;Outputs: W register returned with the 8 bit result ; ;Memory used: ; mask ; 30 Instructions sqrt MOVLW 0xc0 ;Initialize value for mask MOVWF mask MOVLW 0x40 ;Initial value for the root sq1 SUBWF N_hi,F ;Subtract the root developed so far SKPC ;If it is larger than N_hi goto sq5 ; then go restore the subtraction sq2 IORWF mask,W ;Set the current bit sq3 RLF N_lo,F ;Shift N left one position RLF N_hi,F RRF mask,F ;Shift the mask right, and pick up msb of N BTFSC mask,7 ;If msb of N is set, then unconditionally goto sq6 ;set the next bit (because subtracting the ;root is guaranteed not to cause a borrow) XORWF mask,W ;Append "01" to the root developed so far SKPC ;If the lsb of mask was shifted into the carry goto sq1 ;then we're done. Otherwise, loop again. ;We are almost done. In the last iteration, we have 7 bits of the root. When ;"01" is appended to it, we will have a 9-bit number that must be subtracted ;from N. SUBWF N_hi,F ; SKPC ;If the upper 7 bits cause a borrow, then RETURN ;the appended "01" will as well: We're done. SKPNZ ;If the result of the subtraction is zero BTFSC N_lo,7 ;AND the msb of N_lo is set then the LSB of the ;root is zero. XORLW 1 ;Otherwise, it is one. RETURN ; sq6 SKPNC ;Need to unconditionally set the current bit of the root. RETURN ;However, if we're through iterating, then leave. Note, ;C is set by shifting the mask right and the LSB of root ;was set by IORWF at sq2. BCF mask,7 ;Clear the MSB of N that got shifted into the mask. XORWF mask,W ;Append "01" to the root developed so far. SUBWF N_hi,F ;Subtract goto sq2 ;Go unconditionally set the current bit. sq5 ADDWF N_hi,F ;Restore N_hi by reversing the SUBWF with a ADDWF goto sq3 ;Don't set the current bit While this version is three words longer, it saves one byte of RAM; this is a trade I would make any day (in the PIC enviroment that is). Also, the execution speed is only slightly slower than the one Andy's mystery programmer produced, yet significantly faster than my original version. Here's an excerpt from one of the messages I sent Andy while I was developing this routine: --------------------------------------- I was wandering around the SRI Library the other day and came across the book: The Logic of Computer Arithmetic by Ivan Flores It's the kind of book that Knuth would love. It discusses all of the _classical_ arithmetic algorithms and how to efficiently implement them with hardware. The thing I found really interesting is that he gives two ways to implement the _fast_ sqrt algorithm that we've been throwing around. He calls the first method the Binary Square Roots - Restoring Method and the other the Binary Square Roots - Non-restoring Method. While they're both very similar, the first one is the way I had implemented the sqrt routine. His explanation of the theory of operation however, follows more closely along the lines of what Bob F(??) had posted as a square root challenge some months back. In other words, you pair the bits together and perform a magic algorithm: Quoting Flores: In general, the procedure consists of taking the square root developed so far, appending 01 to it and subtracting it, properly shifted, from the current remainder. The 0 in 01 corresponds to mutliplying by 2; the 1 is a new trial bit. If the resulting remainder is positive, the new root bit developed is truly 1; if the remainder is negative, the new bit developed is 0 and the remainder must be restored (thus the name) by adding the quantity just subtracted. The example he gives takes the square root of 01011111 1 0 0 1 . ----------------- ) 01 01 11 11 . 00 -1 --- 00 01 <--- positive: first bit is a 1 -1 01 ----- 11 00 <--- negative: 2nd bit is a 0 +1 01 <--- restore the wrong guess ------ 00 01 11 -10 01 --------- 11 11 10 <--- negative: 3rd bit is a zero +10 01 <--- restore the wrong guess --------- 01 11 11 -1 00 01 --------- 0 11 10 <--- positive: 4th bit is a one etc... The other method does not restore the subtraction if the result was negative. Instead, it appends a 11 to the root developed so far and on the next iteration it performs an addition. If the addition causes an overflow, then on the next iteration you go back to the subtraction mode. Before I botch the explanation much further, let me quote Flores again: As long as the remainder is negative, we proceed as in the previous section; we enter a 1 in the corresponding root bit being developed; we append 01 to this number; we shift it the correct number of times and _subtract_ it from the previous remainder. When the remainder goes negative, we do not restore as in the previous section. First we enter a 0 as the next root bit developed. To this we append 11. This result is shifted left the proper number of times and _added_ to the present remainder. Again, the same example: 1 0 0 1 . ----------------- ) 01 01 11 11 . 00 -1 --- 00 01 <--- positive: first bit is a 1 -1 01 <--- Developed root is "1"; appended 01; subtract ----- 11 00 11 <--- negative: 2nd bit is a 0 +10 11 <--- Developed root is "10"; append 11 and add. --------- 11 11 10 11 <--- positive (i.e. didn't cause overflow): 3rd bit is a 0 1 00 11 <--- Developed root is "100"; append 11 and add --------- 1 00 00 11 10 <--- Overflow: 4th bit is a one etc... Pretty neat! ---------------------------------------------------- Anyway, this latest version is also the "restoring" method. The non-restoring version turned out to be significantly longer. If any one can find a way to make this routine shorter and/or faster, please let me know! Scott P.S. Logarithms and Powers are around the corner.