On Sat, 25 May 2019, Dwayne Reid wrote:

> Resend with tag.  Thanks for the reminder, Bob.
>
> Good day to all.
>
> I'm having a serious senior moment here.  I'm trying to figure out
> why most of the fixed-constant multiply routines use double-byte math
> when Andrew Warren's 8X8 multiply routine used single-byte addition.

revisiting simple multiply:

// res =3D arg1 * arg2
// where res, arg1 and arg2 are 16 bit numbers

// (1)
res =3D 0;

for (j=3D0; j<8; j=3Dj+1)
{
 	// we are building the result from MSB
 	// to LSB
 	res =3D res << 1;

 	if ((arg1 & 0x80) !=3D 0)
 	{
// (2)
 		res =3D res + arg2;
 	}

 	// do this for all the "1" bits in arg1
 	arg1 =3D arg1 << 1;
}

we make "res" big enough to hold the 16 bit result (multiplying two 8
bit numbers always gives a 16 bit result)

so here is the reason we might want to use a single byte add:

using double byte we have

; (3)
; res =3D res + arg2;
 	movf	arg2+0,w
 	addwf	res+0
 	movf	arg2+1,w
 	addwfc	res+1

the PIC doesn't have an add with carry instruction so the above becomes

; (4)
; res =3D res + arg2;
 	movf	arg2+0,w
 	addwf	res+0
 	btfsc	STATUS,C
 	incf	res+1
 	movf	arg2+1,w
 	addwf	res+1

now if we consider that arg2 is only ever 8 bits wide (i.e. between 0 and=20
255) then the second part of the multiprecision add is always equivalent=20
to adding 0. in other words

this -->	movf	arg2+1,w
 		addwf	res+1

is the same as
 		movlw	0
 		addwf	res+1

which has no effect on res+1

therefore when adding 8 bit arg2 to 16 bit res (4) above becomes

; (5)
; res =3D res + arg2;
 	movf	arg2+0,w
 	addwf	res+0
 	btfsc	STATUS,C
 	incf	res+1

Now the above simple version (1) is using the property of adding two 16=20
bit numbers (2) to capture the carry from the LOW bytes to the HIGH=20
bytes. Andrew's version works in the opposite direction i.e. instead of=20
using << it uses >>. Andrew's version actually looks like this:

// res =3D arg1 * arg2
// where res, arg1 and arg2 are 16 bit numbers

// (6)
res =3D 0;

for (j=3D0; j<8; j=3Dj+1)
{
 	// we are building the result from LSB
 	// to MSB

 	if ((arg1 & 0x01) !=3D 0)
 	{
// (7)
 		res =3D res + (arg2 << 8);
 	}

 	// as we are building starting with LSB
 	// we need to keep shifting right to
 	// eventually LSB of the result will be in
 	// bit 0
 	res =3D res >> 1;

 	if (CARRY)
 	{
// (8)
 		res =3D res | 0x8000;
 	}

 	// do this for all the "1" bits in arg1
 	arg1 =3D arg1 >> 1;
}

However capturing the carry and using later in C is impossible using=20
standard C.

Andrew is also checking for "(arg1 & 0x01) !=3D 0" by shifting the LSB into=
=20
the carry flag. So (6) then becomes:

// (9)
res =3D 0;

for (j=3D0; j<8; j=3Dj+1)
{
 	// we are building the result from LSB
 	// to MSB

// 10)
 	// do this for all the "1" bits in arg1
 	arg1 =3D arg1 >> 1;

 	if (CARRY)
 	{
 		res =3D res + (arg2 << 8);
 	}

// (11)
 	res =3D res >> 1;

 	if (CARRY)
 	{
// (12)
 		res =3D res | 0x8000;
 	}
}

Using assembler (11) and (12) become:

; (13)
 	rrf	res+1
 	rrf	res+0

By arranging for arg1 to be an alias for res+1 he is combining (10) and=20
(11) using (13).


>
> I used to eat this kind of stuff for lunch in years long ago.  I saw
> Andy's routine when he published it and had that sudden bright flash
> of understanding.  Not so much anymore.
>
> I've got two routines below.  One is the unrolled version of Andy's
> routine, the other is a fixed-constant version with a multiplier of
> decimal 171.  I haven't tested these yet - that comes tomorrow.  But
> I would greatly appreciate some low-level explanations of the
> differences between Nikolai's ConstMulDiv code and Andy's code.
>
> You will notice that I turned both of these into macros.  They are
> used in only a single location in my code and I didn't see the need
> to have the overhead of call & return.
>
> Assume that I would be calling MULT8X8 with decimal 171 as the multiplier=
..
>
> MULT8X8         MACRO  resulthi, resultlo
> ;enters with multiplicand in resultlo, multiplier in W
> ;exits with result in resulthi:resultlo
> mult            MACRO
>     skpnc                       ;btfsc       STATUS,C
>       addwf     resulthi,F
>     rrf         resulthi,F
>     rrf         resultlo,F
>     endm
>
>     clrf        resulthi        ;1 cy
>     rrf         resultlo,F      ;1 cy
>
>     mult                        ;4 cy
>     mult                        ;4 cy
>     mult                        ;4 cy
>     mult                        ;4 cy
>     mult                        ;4 cy
>     mult                        ;4 cy
>     mult                        ;4 cy
>     mult                        ;4 cy
>     endm                        ;34 cy
>
>
> ;another variant of above but with fixed constant
> ;.171 =3D 1010 1011
>
> mult171         MACRO  resulthi, resultlo
> ;enters with multiplicand in W
> ;exits with result in resulthi:resultlo
> rotate          MACRO
>     rrf         resulthi,F
>     rrf         resultlo,F
>     endm
>
> ;perform add operation every time C is set.
>     movwf        resulthi       ;save W
>     movlw       .171 >> 1       ;          C
>     movwf       resultlo        ;01010101  1
>     movfw       resulthi        ;
>     clrf        resulthi
>     addwf       resulthi,F
>     rotate                      ;00101010  1
>     addwf       resulthi,F
>     rotate                      ;00010101  0
>     rotate                      ;00001010  1
>     addwf       resulthi,F
>     rotate                      ;00000101  0
>     rotate                      ;00000010  1
>     addwf       resulthi,F
>     rotate                      ;00000001  0
>     rotate                      ;00000000  1
>     addwf       resulthi,F
>     rotate                      ;00000000  0  W * 171 -> resulthi:resultl=
o
>     endm			;27 cy
>
> I can knock a couple of cycles off the above routine if I in-line it
> instead of instantiating it as a macro (I don't have to save, then restor=
e W).
>
> It wouldn't surprise if I made several bone-headed mistakes in the
> above but I'm hoping that my intent is clear.  Guidance and simple
> explanations greatly appreciated.
>
> In particular, I'm looking for comparisons between the above routines
> and the double-byte math operations that Nikolai's and Sergio's
> routines need to use.

The routines I (sergio) produced were generic 16 bit. They where intended=20
to show ***HOW*** to perform multiplication using subtraction. They were=20
not optimised for 8 bit arguments as I did not have a lot of time to spend=
=20
on this. I did note that they could be further optimised. FWIW I did go to=
=20
great lengths to optimise operations between variables of different sizes=20
(e.g. between 8, 16 and 32 bit variables) in my PIC XCSB compiler.

The multiplication using subtraction was particularly interesting to me=20
as a way of multiplying quickly by numbers such as 2^n-1, 2^n-2, 2^n-3...

Regards
Sergio
--=20
http://www.piclist.com/techref/piclist PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist
.