On Sat, 25 May 2019, Dwayne Reid wrote: > Resend with tag. Thanks for the reminder, Bob. > > Good day to all. > > I'm having a serious senior moment here. I'm trying to figure out > why most of the fixed-constant multiply routines use double-byte math > when Andrew Warren's 8X8 multiply routine used single-byte addition. revisiting simple multiply: // res =3D arg1 * arg2 // where res, arg1 and arg2 are 16 bit numbers // (1) res =3D 0; for (j=3D0; j<8; j=3Dj+1) { // we are building the result from MSB // to LSB res =3D res << 1; if ((arg1 & 0x80) !=3D 0) { // (2) res =3D res + arg2; } // do this for all the "1" bits in arg1 arg1 =3D arg1 << 1; } we make "res" big enough to hold the 16 bit result (multiplying two 8 bit numbers always gives a 16 bit result) so here is the reason we might want to use a single byte add: using double byte we have ; (3) ; res =3D res + arg2; movf arg2+0,w addwf res+0 movf arg2+1,w addwfc res+1 the PIC doesn't have an add with carry instruction so the above becomes ; (4) ; res =3D res + arg2; movf arg2+0,w addwf res+0 btfsc STATUS,C incf res+1 movf arg2+1,w addwf res+1 now if we consider that arg2 is only ever 8 bits wide (i.e. between 0 and=20 255) then the second part of the multiprecision add is always equivalent=20 to adding 0. in other words this --> movf arg2+1,w addwf res+1 is the same as movlw 0 addwf res+1 which has no effect on res+1 therefore when adding 8 bit arg2 to 16 bit res (4) above becomes ; (5) ; res =3D res + arg2; movf arg2+0,w addwf res+0 btfsc STATUS,C incf res+1 Now the above simple version (1) is using the property of adding two 16=20 bit numbers (2) to capture the carry from the LOW bytes to the HIGH=20 bytes. Andrew's version works in the opposite direction i.e. instead of=20 using << it uses >>. Andrew's version actually looks like this: // res =3D arg1 * arg2 // where res, arg1 and arg2 are 16 bit numbers // (6) res =3D 0; for (j=3D0; j<8; j=3Dj+1) { // we are building the result from LSB // to MSB if ((arg1 & 0x01) !=3D 0) { // (7) res =3D res + (arg2 << 8); } // as we are building starting with LSB // we need to keep shifting right to // eventually LSB of the result will be in // bit 0 res =3D res >> 1; if (CARRY) { // (8) res =3D res | 0x8000; } // do this for all the "1" bits in arg1 arg1 =3D arg1 >> 1; } However capturing the carry and using later in C is impossible using=20 standard C. Andrew is also checking for "(arg1 & 0x01) !=3D 0" by shifting the LSB into= =20 the carry flag. So (6) then becomes: // (9) res =3D 0; for (j=3D0; j<8; j=3Dj+1) { // we are building the result from LSB // to MSB // 10) // do this for all the "1" bits in arg1 arg1 =3D arg1 >> 1; if (CARRY) { res =3D res + (arg2 << 8); } // (11) res =3D res >> 1; if (CARRY) { // (12) res =3D res | 0x8000; } } Using assembler (11) and (12) become: ; (13) rrf res+1 rrf res+0 By arranging for arg1 to be an alias for res+1 he is combining (10) and=20 (11) using (13). > > I used to eat this kind of stuff for lunch in years long ago. I saw > Andy's routine when he published it and had that sudden bright flash > of understanding. Not so much anymore. > > I've got two routines below. One is the unrolled version of Andy's > routine, the other is a fixed-constant version with a multiplier of > decimal 171. I haven't tested these yet - that comes tomorrow. But > I would greatly appreciate some low-level explanations of the > differences between Nikolai's ConstMulDiv code and Andy's code. > > You will notice that I turned both of these into macros. They are > used in only a single location in my code and I didn't see the need > to have the overhead of call & return. > > Assume that I would be calling MULT8X8 with decimal 171 as the multiplier= .. > > MULT8X8 MACRO resulthi, resultlo > ;enters with multiplicand in resultlo, multiplier in W > ;exits with result in resulthi:resultlo > mult MACRO > skpnc ;btfsc STATUS,C > addwf resulthi,F > rrf resulthi,F > rrf resultlo,F > endm > > clrf resulthi ;1 cy > rrf resultlo,F ;1 cy > > mult ;4 cy > mult ;4 cy > mult ;4 cy > mult ;4 cy > mult ;4 cy > mult ;4 cy > mult ;4 cy > mult ;4 cy > endm ;34 cy > > > ;another variant of above but with fixed constant > ;.171 =3D 1010 1011 > > mult171 MACRO resulthi, resultlo > ;enters with multiplicand in W > ;exits with result in resulthi:resultlo > rotate MACRO > rrf resulthi,F > rrf resultlo,F > endm > > ;perform add operation every time C is set. > movwf resulthi ;save W > movlw .171 >> 1 ; C > movwf resultlo ;01010101 1 > movfw resulthi ; > clrf resulthi > addwf resulthi,F > rotate ;00101010 1 > addwf resulthi,F > rotate ;00010101 0 > rotate ;00001010 1 > addwf resulthi,F > rotate ;00000101 0 > rotate ;00000010 1 > addwf resulthi,F > rotate ;00000001 0 > rotate ;00000000 1 > addwf resulthi,F > rotate ;00000000 0 W * 171 -> resulthi:resultl= o > endm ;27 cy > > I can knock a couple of cycles off the above routine if I in-line it > instead of instantiating it as a macro (I don't have to save, then restor= e W). > > It wouldn't surprise if I made several bone-headed mistakes in the > above but I'm hoping that my intent is clear. Guidance and simple > explanations greatly appreciated. > > In particular, I'm looking for comparisons between the above routines > and the double-byte math operations that Nikolai's and Sergio's > routines need to use. The routines I (sergio) produced were generic 16 bit. They where intended=20 to show ***HOW*** to perform multiplication using subtraction. They were=20 not optimised for 8 bit arguments as I did not have a lot of time to spend= =20 on this. I did note that they could be further optimised. FWIW I did go to= =20 great lengths to optimise operations between variables of different sizes=20 (e.g. between 8, 16 and 32 bit variables) in my PIC XCSB compiler. The multiplication using subtraction was particularly interesting to me=20 as a way of multiplying quickly by numbers such as 2^n-1, 2^n-2, 2^n-3... Regards Sergio --=20 http://www.piclist.com/techref/piclist PIC/SX FAQ & list archive View/change your membership options at http://mailman.mit.edu/mailman/listinfo/piclist .