Untested idea for a N-byte by N-byte multiply on a PIC18.... It works by computing all the subproducts that have the same significance in the output at the same time and then summing them: First we need our values: NSize = 32 ; Test case = 512 bits cblock ? A:MSize ; args B:NSize R:NSize+NSize ; result (the +1 byte will be zeroed!) HITEMP ; temp for hi byte of current multiply group endc Note: values are stored 'little endian' HugeMultiply: ; Do the first operation to get things started. movf A+0,W mulwf B+0 movff PRODL,R+0 movff PRODH,R+1 ; Get ready to continue the summation clrf R+2 clrf HITEMP ; Do operations 2 thru NSize to compute the least significant bytes. ; ; Operation 2 multiplies the two pairs: ; A<0> by B<1> and A<1> by B<0> ; to compute R<3:2:1> ; ; Operation 3 multiplies the three pairs: ; A<0> by B<2>, A<1> by B<1> and A<2> by B<0> ; to compute R<4:3:2> ; ; The NSize'th operation multiplies the NSize pairs: ; A<0> by B, A<1> by B ... A by B<0> ; to compute R ; ; Note that for the I'th operation the sum of the subscript for each pair of A and B ; bytes multipled together is I-1 LFSR 2,R+1 ; Each operation will increment FSR2 I = 2 while I <= NSize ; point at start of A LFSR 0,A ; point at first byte of interest in B LFSR 1,B+NSize-I ; call into the correct location within the MultiplyVector routine ; to multiply I byte pairs, accumulating them in R RCALL MultiplyVector + 16 * (NSize-I) I = I + 1 endw ; Continue computing the output bytes for the high order bytes of the result ; ; Now do operations numbered NSize+1 thru ? ; ; Operation NSize+1 multiplies NSize-1 pairs: ; A<1> by B, A<2> by B, ..., A by B<1> ; to compute R ; ; Operation NSize+2 multiples NSize-2 pairs: ; A<2> by B, A<3> by B, ..., A by B<2> ; to compute R ; ; Operation NSize+3 multiples NSize-3 pairs: ; A<3> by B, A<4> by B,..., A by B<3> ; to compute R ; ; Operation NSize + NSize - 2 multiplies 2 pairs: ; A by B, A by B ; to compute R ; Note: at this point I = NSize+1 while I <= NSize * 2 - 2 ; point at first byte of interest in A LFSR 0,A+I-NSize LFSR 1,B+NSize-1 ; call into the correct location within the MultiplyVector routine ; to multiply 2*NSize-I byte pairs, accumulating them in R RCALL MultiplyVector + 16 * (NSize-(2*NSize-I)) I = I + 1 endw ; Finally the last operation to multiply together A and B and ; finish up computing the result movf A+NSize-1 mulwf B+NSize-1 movf PRODL addwf R+NSize-2,F movf PRODH addwfc R+NSize-1,F return ; Here is the kernel of code that multiplies all the pairs that have the same significance ; Note that each byte's worth of multiplication takes 6 words of instructions MultiplyVector: I = 0 while I < NSize-1 ; NSize copies of the code movf POSTINC0,W ; get byte from A mulwf POSTDEC1,W ; multiply by corresponding byte from B movf PRODL,W addwf POSTINC2,F ; accumulate it in correct location movf PRODH,W addwfc POSTDEC2,F skpnc incf HITEMP,F I = I + 1 endw ; On the last iteration we leave FSR2 pointing to the next byte of the product, ; and also bring in the high byte movf POSTINC0,W ; get byte from A mulwf POSTDEC1,W ; multiply by corresponding byte from B movf PRODL,W addwf POSTINC2,F ; accumulate it correct location movf PRODH,W addwf POSTINC2,F ; point to highest byte movf HITEMP,W movwf POSTDEC2,F ; point to new middle byte clrf HITEMP ; get ready for next operation return ; Timing analysis: ; ; Call, return, initialization and multiply of LSBits takes 12 ; Operations I = 2 thru NSize+NSize-2 take 4 + call on MultiplyVector ; Multiply of MSBits takes 6 instructions ; MultiplyVector takes 8N + 5 instructions including call and return. ; MultiplyVector is called once with N = NSize ; and twice each with N=1..NSize-1 ; ; The total time used, not counting MultiplyVector is: ; 12 + 4 * (NSize*2 - 3) + 6 ; = 12 + 8*NSize - 12 + 6 ; = 8*NSize + 6 ; ; The total time used by MultiplyVector is: ; 2*8*1 + 5 ; + 2*8*2 + 5 ; + 2*8*3 + 5 ; + ... ; + 2*8*NSize-1 + 5 ; + 8*NSize + 5 ; ; = 2*8* ( 1+2+3+..+NSize-1) + 8*NSize+5*NSize ; = 2*8*(NSize-1*NSize)/2 + 8 * NSize+5*NSize ; = 8*(NSize-1*NSize) + 8*NSize + 5*NSize ; = 8*(NSize*NSize-NSize+NSize) + 5*NSize ; = 8*NSize*Nsize + 5 * NSize ; ; The grand total time is then: ; = 8*NSize*NSize + 13*NSize + 6 ; ; So, for NSize = 32 we get: ; 8*32*32 + 13*32 + 6 = 8192 + 416 + 6 = 8614 instructions. ; ; At 10MIPS that will take 0.8614 msec. Not too shabby. Bob Ammerman RAm Systems -- http://www.piclist.com#nomail Going offline? Don't AutoReply us! email listserv@mitvma.mit.edu with SET PICList DIGEST in the body