PIC Microcontoller Math Method

32 by 16 Divison from Nikolai Golovchenko,
or
How to make use of a 16 bit division routine for 32 bit division

Division routines sometimes can be used for more than they were designed for! For example, it is possible to divide 16 or 32 bit dividend by 16 bit divisor to 16 bit quotient in just one routine.

The idea is simple: you do 16 bit division as usually, but when the dividend is 32 bit, the lower two bytes are used as dividend and the higher two bytes are used as initial value for remainder (which is zero when doing 16 bit division)

To understand how initializing remainder relates to dividend, let's examine the algorithms for two cases:

     x
q = ---
     y

1) when x, y, and q are 16 bit unsigned integers, and
2) when x and q are 32 bit unsigned integers, but y is a 16 bit unsigned integer

Note that in some cases it is known that the result of 32 bit by 16 bit division is 16 bit maximum. The trick described on this page works only for 16 bit result.
For example, when speed of a motor is calculated by a Hall sensor pulses (one pulse per revolution) the following formula is used:
        60 x 1,000,000         (conversions for minutes and us,
rpm = ------------------        timer is clocked by 1 MHz clock)
      16 bit timer value
Timer is used to measure the Hall sensor pulse period. For this motor and timer the rpm range is within 915-20,000 rpm, which fits perfectly into 16 bits.

1) 16 bit division

x - 16 bit dividend;
y - 16 bit divisor;
q - 16 bit quotient;

rem - 16 bit remainder;
counter - loop counter.

1. Clear remainder, load counter=16

       rem         q
     -------    -------
     |00|00|    |??|??|
     -------    -------
      b1 b0      b1 b0

2. Shift left dividend x, and shift left remainder rem, by 1 bit, so that MSb of x was shifted to the remainder's LSb

       rem         x
     -------    -------
     |r1|r0| <- |x1|x0| <- shift left
     -------    -------
      b1 b0      b1 b0

3. Subtract divisor y from remainder. If subtraction successful (no borrow), next bit of q is 1. If borrow, next bit of q is 0, and remainder should be restored.

      -------
      |r1|r0|  rem
      -------
       b1 b0
    -
      -------
      |y1|y0|  y
      -------
       b1 b0

4. Shift left q and set its LSb according to subtraction result from previous step.

         q       carry(inverted borrow) flag
      -------    ---
      |q1|q0| <- |C|  shift in next result bit
      -------    ---
       b1 b0

5. Decrement counter and if it's not zero repeat from step 2 (16 iterations)

After that, q contains quotient, rem - remainder, x is garbage (it was shifted out completely), and y is untouched.

2) 32 bit division with 16 bit divisor

x - 32 bit dividend;
y - 16 bit divisor;
q - 32 bit quotient;

rem - 17 bit remainder (remainder may be bigger than 16 bit, for example, when x=0x80000000 and y=0xFFFF);
counter - loop counter.

1. Clear remainder, load counter=32

        rem              q
     ----------    -------------
     |00|00|00|    |??|??|??|??|
     ----------    -------------
      b2 b1 b0      b3 b2 b1 b0

2. Shift left dividend x, and shift left remainder rem, by 1 bit, so that MSb of x was shifted to the remainder's LSb

      rem              x
   ----------    -------------
   |r2|r1|r0| <- |x3|x2|x1|x0| <- shift left
   ----------    -------------
    b2 b1 b0      b3 b2 b1 b0

3. Subtract divisor y from remainder. If subtraction successful (no borrow), next bit of q is 1. If borrow, next bit of q is 0, and remainder should be restored.

      ----------
      |r2|r1|r0|  rem
      ----------
       b2 b1 b0
    -
         -------
         |y1|y0|  y
         -------
          b1 b0

4. Shift left q and set its LSb according to subtraction result from previous step.

            q          carry(inverted borrow) flag
      -------------    ---
      |q3|q2|q1|q0| <- |C|  shift in next result bit
      -------------    ---
       b3 b2 b1 b0

5. Decrement counter and if it's not zero repeat from step 2 (32 iterations)

After that, q contains quotient, rem - remainder (16 bit), x is garbage (it was shifted out completely), and y is untouched.

Now imagine that you know the result is always 16 bit (for example). That means that the first 16 iterations through steps 1-5 produce 16 zero bits in quotient (because we know that higher 2 bytes of quotient are zero in that case), x was shifted 16 times to remainder, remainder was never subtracted from. So we can say that with these 16 iterations we actually did the following:

From:

      rem              x
   ----------    -------------
   | 0| 0| 0| <- |x3|x2|x1|x0| <- shift left
   ----------    -------------
    b2 b1 b0      b3 b2 b1 b0
-
       y              q
      -------    -------------
      |y1|y0|    |??|??|??|??|
      -------    -------------
       b1 b0      b3 b2 b1 b0

To:

      rem              x
   ----------    -------------
   | 0|x3|x2| <- |x1|x0|??|??| <- shift left
   ----------    -------------
    b2 b1 b0      b3 b2 b1 b0

-       y              q
      -------    -------------
      |y1|y0|    |??|??| 0| 0|
      -------    -------------
       b1 b0      b3 b2 b1 b0

And after the next 16 iterations:

      rem              x
   ----------    -------------
   | 0|r1|r0| <- |??|??|??|??| <- shift left
   ----------    -------------
    b2 b1 b0      b3 b2 b1 b0

-       y              q
      -------    -------------
      |y1|y0|    | 0| 0|q1|q0|
      -------    -------------
       b1 b0      b3 b2 b1 b0

This is very similar to 16 bit division:

From:

        rem         x
      -------    -------
      | 0| 0| <- |x1|x0| <- shift left
      -------    -------
       b1 b0      b1 b0

-        y          q
      -------    -------
      |y1|y0|    |??|??|
      -------    -------
       b1 b0      b1 b0

To:

        rem         x
      -------    -------
      |r1|r0| <- |??|??| <- shift left
      -------    -------
       b1 b0      b1 b0

-        y          q
      -------    -------
      |y1|y0|    |q1|q0|
      -------    -------
       b1 b0      b1 b0

Except the only thing - we don't start with zero remainder, but instead from:

       rem          x
      -------    -------
      |x3|x2| <- |x1|x0| <- shift left
      -------    -------
       b1 b0      b1 b0

-       y          q
      -------    -------
      |y1|y0|    | 0| 0|
      -------    -------
       b1 b0      b1 b0

x3 should be less than 0x80 though, so that after first left shift the higher bit wouldn't disappear. So we actually can do 31-by-16-to-16 division with a 16-by-16-to-16 one!

Example code

This example uses slightly different algorithm from the one described above. It does not restore the remainder immidiately after a subtraction causes a borrow. However, the trick still aplies, and because this variant of division routine has an extended remainder (necessary for the non-restoring method to hold the current remainder sign), it can do the full 32-by-16-to-16-bit division.

; x = 60*1e6/y
;
; x, x+1 - rpm
; y, y+1 - pulse width in 1 us units
;
FindRPM
        clrf x
        movlw 0x87
        movwf x+1
        movlw 0x93
        movwf x+2
        movlw 0x03
        movwf x+3
        goto div32by16to16

; uint16 x = uint32 x / uint16 y
;
; Input:
;  x, x+1, x+2, x+3 - 32 bit unsigned integer dividend (x - lsb, x+3 - msb)
;  y, y+1 - 16 bit unsigned integer divisor
; Output:
;  x, x+1 - 16 bit unsigned integer quotient
; Temporary:
;  counter
;  temp - remainder extension
;
; Note: result must fit in 16 bits for routine to work
;       correctly

div32by16to16
        goto div16by16loopinit   ;or just move the label

; uint16 x = uint16 x / uint16 y
;
; Input:
;  x, x+1 - 16 bit unsigned integer dividend (x - lsb, x+1 - msb)
;  y, y+1 - 16 bit unsigned integer divisor
; Output:
;  x, x+1 - 16 bit unsigned integer quotient
; Temporary:
;  counter
;  x+2, x+3 - 16 bit remainder
;  temp - remainder extension
; Size: 36 instructions
; Max timing: 6+16*(5+13+3)-1+2+2=345 cycles


div16by16
        clrf x+2        ;clear
        clrf x+3        ;remainder
div16by16loopinit
        clrf temp       ;clear remainder extension
        movlw 16
        movwf counter
        setc            ;first iteration will be subtraction
div16by16loop
        ;shift in next result bit and shift out next
        ;dividend bit to remainder

        rlf x, f        ;shift lsb
        rlf x+1, f      ;shift msb
        rlf x+2, f
        rlf x+3, f
        rlf temp, f

        movf y, w
        btfss x, 0
         goto div16by16add

        ;subtract divisor from remainder
        subwf x+2, f
        movf y+1, w
        skpc
         incfsz y+1, w
           subwf x+3, f
        movlw 1
        skpc
         subwf temp, f
        goto div16by16next

div16by16add
        ;add divisor to remainder
        addwf x+2, f
        movf y+1, w
        skpnc
         incfsz y+1, w
           addwf x+3, f
        movlw 1
        skpnc
         addwf temp, f

div16by16next
        ;carry is next result bit
        decfsz counter, f
         goto div16by16loop

;shift in last bit
        rlf x, f
        rlf x+1, f
        return

Comments:

I can confirm this works! Put it into MPlab and used MPlabSim
Coool! :)
Nikolai, you're a genius! Thanks for the division routine, it'll be used in an ignition mapping system for a motorcycle.
vivekanand.bollapinni@philips.com
Thank You. This is quite a generic algorithm which can be applied to division of any n-bit number by any m-bit number, when properly tuned.
I have a correction here in step-3: According to me, it should be in the following manner to be generic.

if (remainder >= divisor)
{
remainder = remainder - divisor;
next bit of quotient = 1;
}
else
{
next bit of quotient = 1;
/* Remainder in this case remains unchanged */
}
harold@dovesystems.com
The line "movlw 16" assumes a radix of 10. When we cut and paste
routines into our projects, we may not have the default radix
set to 10, causing interesting effects! May I suggest that code
on the site always specify the radix, never using the default?
In this case, I'd modify the line to "movlw d'16'" just to make
sure.

Questions:

bertond@wvi.com asks: " what is skpc and skpnc?????????" James Newton of James Newton's Massmind replies: skpc: "Skip the next instruction if the carry flag is set"
skpnc: "Skip the next instruction if the carry flag is NOT set"
Both of these are actually macros that are built in to MPLAB and defined as btfss 3,0 and btfsc 3,0 respectivly.
See http://www.piclist.com/techref/microchip/mpasmpseudo.htm.

PIC Microcontoller Math Method

32 by 16 Divison from Nikolai Golovchenko, or How to make use of a 16 bit division routine for 32 bit division

1) 16 bit division

2) 32 bit division with 16 bit divisor

Example code

32 by 16 Divison from Nikolai Golovchenko,
or
How to make use of a 16 bit division routine for 32 bit division