The FF1L instruction will let you find the one bit. You should then be able to use a multibit shift with the shift count in the register to do the variable shift at just a couple of instructions per word.

-- Bob Ammerman
RAm Systems

-----Original Message-----
From: piclist-bounces@mit.edu [mailto:piclist-bounces@mit.edu] On Behalf Of Joe Mickley
Sent: Saturday, October 20, 2012 10:47 AM
To: Microcontroller discussion list - Public.
Subject: Re: [PIC] Fast 32 bit by 32 bit divide in a dsPIC30F

Follow on to my last post.

Started working on the implementation suggested in Hacker's Delight. I'm working from Figure 9-3 and converting it to ASM as I go. Book in one hand, keyboard in one hand, coffee in one hand (wait a minute, how is that working ...)

The initial results are NOT good. My original goal was to improve on the divide from the Microchip lib implementation, which takes around 400 instruction cycles and which I consider too long. I have gotten part way thru the Hacker's Delight implementation and I can clearly see that this is not going to produce a substantially better solution (let's just define "substantial" as at least 2:1 better in time).

The problem is that the routine starts off by normalizing the denominator: it shifts it left in the 32 bit register pair until the first "1" bit in the denominator is aligned with the MS bit of the INT32. That means that if the denominator is a smallish value there will be a lot of left shifting. Further, the routine takes the numerator along with it as it shifts left. Since the numerator was already an INT32, internally it now becomes an INT64 as part of the shift process.

I reduced the total number of left shifts by first looking to see if there were any "1"s at all in the MS word of the INT32 denominator. If there are none there, then my routine simply uses MOV instructions to promote the INT16 words up by 1 word in the registers.
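For illustration only, here is a minimal C sketch of the normalization step described above, combined with the find-first-one idea from the reply at the top of this message. It is not the dsPIC assembly under discussion. The function names `leading_zeros32` and `normalize` are hypothetical, and the portable binary search in `leading_zeros32` is only an assumed stand-in for what a hardware find-first-one instruction would provide. The point is that once the position of the first "1" is known, denominator and numerator can each be moved with one variable shift instead of one pass per bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware find-first-one: returns how many
   zero bits sit above the most significant 1 in x. Requires x != 0. */
static unsigned leading_zeros32(uint32_t x)
{
    unsigned n = 0;
    assert(x != 0);
    /* High word empty: same idea as promoting the low word with MOVs. */
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}

/* Shift the denominator left until bit 31 is set and carry the numerator
   along by the same count. The numerator is held in 64 bits because a
   32-bit value shifted left by up to 31 places no longer fits in 32.
   Returns the shift count (0..31). */
static unsigned normalize(uint32_t *denominator, uint64_t *numerator)
{
    unsigned s = leading_zeros32(*denominator);
    *denominator <<= s;   /* one variable shift, not a loop per bit */
    *numerator   <<= s;
    return s;
}
```

With a denominator of 1, the worst case for normalization, `normalize` returns a shift count of 31 and leaves the denominator at 0x80000000.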
After that, the routine looks to see where the first "1" is in the register and iteratively shifts left until that bit aligns with the MS bit of the INT32 register.

The problem is that all of that left shifting (denominator as an INT32 and numerator as an INT64) takes about 11 cycles/bit. The result is that for a small denominator (worst case: denominator = 1) the total time is about 200 cycles. That's before I actually get to the section of code that is going to do the actual dividing.

The routine has 2 DIVIDE operations. I know each one will take 17 cycles. That is 34 more.

There is (not yet coded) an iterative "multiply and compare" function. That too is multi-word. MUL goes fast, but the many compare (CP) instructions will still add up (somewhere in there are some decision functions, else why compare), and I haven't looked at the iteration criteria.

This is not good. I'm not even close to being done and I have already burned about 260 cycles worst case.

The "substantial improvement" idea is dead and it's going to get worse. I am going to keep plugging away for a bit longer, but I don't think this is going to be the solution.

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist