On Sat, 20 Oct 2012, Joe Mickley wrote:

> Follow-on to my last post. Started working on the implementation suggested
> in Hacker's Delight.
> I'm working from Figure 9-3 and converting it to ASM as I go. Book in one
> hand, keyboard in one hand, coffee in one hand (wait a minute, how is that
> working ...)
>
> The initial results are NOT good. My original goal was to improve on the
> divide from the Microchip lib implementation, which takes around 400
> instruction cycles and which I consider too long. I have gotten part way
> through the Hacker's Delight implementation and I can clearly see that this
> is not going to produce a substantially better solution (let's just define
> "substantial" as at least 2:1 better in time).
>
> The problem is that the routine starts off by normalizing the denominator
> left in the 32 bit register pair until the first "1" bit in the denominator
> is aligned with the MS bit of the INT32. That means that if the denominator
> is a smallish value there will be a lot of left shifting. Further, the
> routine takes the numerator along with it as it shifts left. Since the
> numerator was already an INT32, internally it now becomes an INT64 as part
> of the shift process.
>
> I reduced the total number of left shifts by first looking to see if there
> were any "1"s at all in the MS word of the INT32 denominator. If there are
> none there, then my routine simply uses MOV instructions to promote the
> INT16 words up by 1 word in the registers. After that it then looks to see
> where the first "1" is in the register and iteratively shifts left until
> that bit aligns with the MS bit of the INT32 register.

Sorry, too busy to look into this in any detail (even though it is an
interesting problem); however, I can offer a tiny bit of advice.

You can use a lookup table to help you quickly find the top bit set in a
byte, and consequently in a 32 bit int, thus:

    /* top_bit[n] = zero-based position of the highest set bit in n.
       Entry 0 is a placeholder, since 0 has no bit set. */
    unsigned char top_bit[] =
    {
        0,0,
        1,1,
        2,2,2,2,
        3,3,3,3,3,3,3,3,
        4,4,4,4,4,4,4,4, 4,4,4,4,4,4,4,4,
        5,5,5,5,5,5,5,5, 5,5,5,5,5,5,5,5,
        5,5,5,5,5,5,5,5, 5,5,5,5,5,5,5,5,
        6... (repeats 64 times)
        7... (repeats 128 times)
    };

    long x;
    int msb;

    msb = (x >> 24) & 0xff;

    if (msb != 0)
    {
        msb = top_bit[msb] + 24;
    }
    else
    {
        msb = (x >> 16) & 0xff;

        if (msb != 0)
        {
            msb = top_bit[msb] + 16;
        }
        else
        {
            msb = (x >> 8) & 0xff;

            if (msb != 0)
            {
                msb = top_bit[msb] + 8;
            }
            else
            {
                /* only the low byte can be non-zero here */
                msb = top_bit[x & 0xff];
            }
        }
    }

Note that the table needs 256 entries, so values 0 and 1 both map to 0, and
x == 0 also yields msb == 0; test for zero separately if that matters.

Of course you can optimise this further depending on the underlying CPU
arch, but I'll leave that as an exercise for the reader :-)

Regards
Sergio Masci

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist
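
For anyone who wants to try the table lookup on a PC before hand-coding it
in ASM, here is a minimal compilable sketch. It is only an illustration under
some assumptions: the names find_msb and norm_shift, the LT macro used to
write out the repeated table entries, and the -1 return for a zero input are
all hypothetical choices, not part of the original post. It needs a C11
compiler for static_assert.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Sixteen copies of n, to avoid typing the long runs by hand. */
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n

/* top_bit[n] = zero-based position of the highest set bit in byte n.
   Entry 0 is a placeholder: 0 has no bit set. */
static const unsigned char top_bit[] = {
    0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
    LT(4),
    LT(5), LT(5),
    LT(6), LT(6), LT(6), LT(6),
    LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
};

/* Catch an off-by-one in the table at compile time. */
static_assert(sizeof top_bit == 256, "top_bit must have 256 entries");

/* Hypothetical helper: position (0..31) of the highest set bit in x,
   or -1 if x is zero (that return value is an assumption). */
int find_msb(uint32_t x)
{
    uint32_t b;

    if ((b = x >> 24) != 0)
        return top_bit[b] + 24;
    if ((b = (x >> 16) & 0xff) != 0)
        return top_bit[b] + 16;
    if ((b = (x >> 8) & 0xff) != 0)
        return top_bit[b] + 8;
    if (x != 0)
        return top_bit[x];      /* x < 256 here, so it indexes directly */
    return -1;
}

/* Hypothetical helper: number of left shifts needed to bring the top
   set bit of a non-zero x up to bit 31 (the normalisation step). */
int norm_shift(uint32_t x)
{
    return 31 - find_msb(x);
}
```

With this, find_msb(0x00012345) is 16, so norm_shift reports 15 left shifts,
and that count can be applied in one go (or split into byte moves plus a few
single-bit shifts) instead of testing the MS bit after every shift.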