ON 20100405@10:02:46 AM at page: On a web page you were interested in at: http://www.piclist.com/method/math/muldiv.htm#40272.1494907407 James Newton[JMN-EFP-786] published post 40272.1494907407 curious@bwv190.internetdsl.tpnet.pl asks:
How about 'Vedic' multiplication methods? While not so beautiful without parallelism (i.e. an FPGA array), they are still interesting, and allow multiplication of large numbers in just a few steps (e.g. 6 clock cycles for a 64-bit multiply if an FPGA is used).
The simplest way to perform it is to multiply bit by bit, the way one multiplies on paper, with one shifted row per multiplier bit, e.g.
     101011
*      0010
-----------
     000000
+   101011
+  000000
+ 000000
-----------
   01010110
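As a rough software illustration of the paper method above, here is a minimal Python sketch of the sequential 'shift and add' loop a CPU would run. The function name `shift_and_add` is made up for this example and is not from any library; it assumes non-negative integers.

```python
def shift_and_add(multiplicand: int, multiplier: int) -> int:
    """Multiply two non-negative integers one multiplier bit at a time."""
    product = 0
    shift = 0
    while multiplier:
        if multiplier & 1:
            # This multiplier bit is 1: add the multiplicand, shifted into place.
            product += multiplicand << shift
        multiplier >>= 1  # move on to the next multiplier bit
        shift += 1
    return product


print(bin(shift_and_add(0b101011, 0b0010)))  # 0b1010110
```

The loop body runs once per multiplier bit, which is exactly the repetition a parallel array avoids.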
Notice that while a CPU has to repeat the 'shift and multiply' step for each bit of the multiplier, an FPGA can do them all in parallel in just one cycle. The shift is just a matter of addressing the destination register and moving the data there (masked by the multiplier bit). That takes one cycle at most; actually one clock edge ('slope') is enough, as not even a full cycle is needed.
Then we have an array of $multiplier_bit_count_size sums to make.
For a 64-bit multiplier, adding them one at a time would take as many as 64 operations, so 64 cycles. But we can once again be smart the 'Vedic' way and group our adds into ones that do not influence each other. This means we can add each pair at once, which makes 32 parallel additions in the first clock cycle, then 16 in the 2nd, 8 in the 3rd, 4 in the 4th, 2 in the 5th, and a single final addition in the 6th, and voila.
Including the masked preloading of the array mentioned earlier (which needs only a clock edge), this all comes to just 6 cycles, and of those only two involve the ALU bus: fetching the number and the multiplier is one, and placing the final sum in the destination is the other.
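The pairwise grouping described above can be modelled in software. The following is only a sketch in Python, not FPGA code: `tree_multiply` and its `width` parameter are hypothetical names chosen for this example, and each pass of the loop stands in for one clock cycle in which all pairs would be added at the same time.

```python
def tree_multiply(a: int, b: int, width: int = 64):
    """Return (product, add_levels) for non-negative a and b, with b < 2**width."""
    # Masked preload: one partial product per multiplier bit
    # (a shifted left by i if bit i of b is set, otherwise zero).
    sums = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(sums) > 1:
        if len(sums) % 2:
            sums.append(0)  # pad so every entry has a partner
        # The pairs are independent of each other, so hardware could add them all at once.
        sums = [sums[i] + sums[i + 1] for i in range(0, len(sums), 2)]
        levels += 1
    return sums[0], levels


product, levels = tree_multiply(2**64 - 1, 2**64 - 1)
print(product == (2**64 - 1) ** 2, levels)  # True 6
```

For a 64-bit multiplier the sketch counts six add levels (32, 16, 8, 4, 2 and 1 additions); the preload is not counted as a level here. In plain Python the additions still run one after another, so this shows the structure of the reduction rather than any speedup.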
So, assuming the MUL array is clocked 6x faster than the ALU bus (which is quite doable in CMOS, assuming we are talking about speeds up to ~1 GHz), we can practically deliver a MUL instruction in just 2 ALU bus clock cycles, or, if the registers can be independent (a separate ALU bus to the result register), in just one clock cycle.
In asm it takes a bit more looping, unfortunately, but it still makes multiplying insanely large (64-bit and more) numbers quite a breeze.