Andy David says:
Here's my 24-bit routine as written for the 17c43, taken from a mail I sent Scott just after I wrote them, hence the comments about the implementations I used. I did a few pencil-and-paper runs through the algorithm to be sure I understood the steps involved. This 24-bit routine resembles those 'manual' steps of shifting along the input string 2 bits at a time and subtracting; as the result is 12-bit, the final subtraction isn't awkward as it is in the 32-bit sqrt. I can't remember the performance figures for this routine, but I've just counted 348 cycles min, 406 max (inc. call & return). A downside of this method is that the loop counter has to be a separate variable, so it uses the same amount of RAM as the 32-bit sqrt. On the other hand, this one is easier to follow and compare to an example on paper than the method I've used for the 32-bit sqrt.
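For anyone who wants to check the pencil-and-paper steps on a PC, here is a rough Python model of the same idea: pull two input bits at a time into a running remainder, try subtracting the root-so-far with '01' appended, and keep or restore depending on the result. The function and variable names are made up for this sketch, and it is not cycle-accurate or a register-for-register copy of the PIC code.

```python
def br_sqrt24(n: int) -> int:
    """Integer square root of a 24-bit value, binary restoring style.

    Illustrative model only; names are hypothetical, not from the PIC source.
    """
    if not 0 <= n < (1 << 24):
        raise ValueError("input must fit in 24 bits")
    root = 0  # root-so-far (the result register pair in the assembly)
    rem = 0   # running remainder (the 'test' register pair)
    for _ in range(12):  # 12 result bits, one per pair of input bits
        # Shift the top two input bits into the bottom of the remainder.
        rem = (rem << 2) | ((n >> 22) & 0b11)
        n = (n << 2) & 0xFFFFFF
        # Trial value: root-so-far shifted up by two with '01' appended.
        trial = (root << 2) | 0b01
        if rem >= trial:
            rem -= trial             # subtraction fits: this result bit is 1
            root = (root << 1) | 1
        else:
            root <<= 1               # would borrow: leave remainder, bit is 0
    return root
```

For example, br_sqrt24(144) works through the twelve bit pairs and ends with a root of 12. The compare-before-subtract is only a shortcut for this model; the assembly subtracts first and adds the value back if the carry shows a borrow.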
I've added a 16-bit square root routine (again for the 17c43) that uses successive approximation to find the square root; I'm fairly sure the binary restoring method would be quicker. This is the only 16-bit sqrt routine I have specifically for the 17cxx PICs, and I've only included it because the original poster requested a 16-bit routine. If you really want speed speed speed, I'd rewrite the 32-bit routine to be 16-bit.
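As a reminder of what successive approximation means here, a minimal Python sketch of the general technique follows: try each result bit from the top down, square the trial value, and keep the bit only if the square does not exceed the input. This is the textbook form and an assumption about the approach, not a transcription of the 17c43 routine; the names are invented for the example.

```python
def sa_sqrt16(n: int) -> int:
    """Integer square root of a 16-bit value by successive approximation.

    Generic illustration; not the actual PIC routine.
    """
    if not 0 <= n < (1 << 16):
        raise ValueError("input must fit in 16 bits")
    root = 0
    bit = 0x80  # an 8-bit result, so start with the top result bit
    while bit:
        trial = root | bit
        if trial * trial <= n:  # keep the bit if the square still fits
            root = trial
        bit >>= 1
    return root
```

Each pass needs a squaring (or an equivalent update) plus a compare, which is one reason a restoring routine that only shifts and subtracts may come out quicker.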
Standard disclaimer applies
;==========================================================================
; brSQRT24
;
; Calculates the square root of a twentyfour bit number using the
; binary restoring method.
;
; Result in ACCaHI:ACCaLO
; Input in ACCcLO:ACCdHI:ACCdLO
; Test ACCbLO:ACCcHI
; Counter in ACCbHI
;
; 40 words long, uses 8 bytes RAM (inc. 3 holding 24-bit input).
;
;--------------------------------------------------------------------------
brSQRT24:
        clrf    ACCaHI,f        ;
        clrf    ACCaLO,f        ;
        clrf    ACCcHI,f        ;
        clrf    ACCbLO,w        ;
        movlw   .12             ;
        movwf   ACCbHI          ; (6 cycle intro, 8 incl. call)
ShftUp: rlcf    ACCdLO,f        ; Shift input up 2 places.
        rlcf    ACCdHI,f        ; (33 cycles per loop if bit is 0)
        rlcf    ACCcLO,f        ; (29 cycles per loop if bit is 1)
        rlcf    ACCcHI,f        ;
        rlcf    ACCbLO,f        ;
        rlcf    ACCdLO,f        ;
        rlcf    ACCdHI,f        ;
        rlcf    ACCcLO,f        ;
        rlcf    ACCcHI,f        ;
        rlcf    ACCbLO,f        ;
        rlcf    ACCaLO,f        ; Shift root-so-far up by two and append
        rlcf    ACCaHI,f        ; ... '01'.
        rlcf    ACCaLO,f        ;
        rlcf    ACCaHI,f        ;
        bcf     ACCaLO,1        ;
        bsf     ACCaLO,0        ;
SubTest:movfp   ACCaLO,WREG     ;
        subwf   ACCcHI,f        ;
        movfp   ACCaHI,WREG     ;
        subwfb  ACCbLO,f        ;
        btfsc   ALUSTA,C        ;
        goto    Set1
        movfp   ACCaLO,WREG     ; Restore the remainder.
        addwf   ACCcHI,f        ; ... (the current bit is 0).
        movfp   ACCaHI,WREG     ;
        addwfc  ACCbLO,f        ;
        goto    Set0            ;
Set1:   bsf     ACCaLO,1        ;
Set0:   rrcf    ACCaHI,f        ;
        rrcf    ACCaLO,f        ;
        bcf     ACCaHI,7        ;
BitLoop:decfsz  ACCbHI,f        ;
        goto    ShftUp
        return